When a file on the workdir has the same or a newer timestamp than the
index, we need to perform a full check of the contents, as the update of
the file may have happened just after we wrote the index.
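
The timestamp comparison described above can be sketched roughly as follows. This is a minimal illustration, not the library's code: `entry_stat` and `needs_content_check` are hypothetical names, and timestamps are simplified to whole seconds.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stat data: whole-second modification time. */
typedef struct {
    int64_t mtime;
} entry_stat;

/*
 * A file whose mtime is the same as or newer than the time the index
 * was written may have been changed right after the index write, so a
 * stat-only comparison cannot be trusted and the contents must be
 * compared in full.
 */
static bool needs_content_check(const entry_stat *file, int64_t index_mtime)
{
    return file->mtime >= index_mtime;
}
```

A file modified in the same second as the index write is treated as suspect, which is exactly the case a stat-only check would miss.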
The iterator changes are such that we can reach inside the workdir
iterator from the diff, though it may be better to have an accessor
instead of moving these structs into the header.
These tests want to test that we don't recalculate entries which match
the index already. This is however something we force when truncating
racily-clean entries.
Tick the index forward as we know that we don't perform the
modifications which the racily-clean code is trying to avoid.
We update the index and then immediately change the contents of the
file. This makes the diff think there are no changes, as the timestamp
of the file agrees with the cached data. This is however a bug, as the
file has obviously changed contents.
The test is a bit fragile, as it assumes that the index writing and the
following modification of the file happen in the same second, but it's
enough to show the issue.
Introduce a new binary diff callback to provide the actual binary
delta contents to callers. Create this data from the diff contents
(instead of directly from the ODB) to support binary diffs including
the workdir, not just things coming out of the ODB.
The signature for the reflog is not something which changes
dynamically. Almost all callers will pass NULL, since we want the
repository's default identity to be used, which makes the parameter noise.
In order to allow for changing the identity, we instead provide
git_repository_set_ident() and git_repository_ident() which allow a user
to override the choice of signature.
The implementation of the hashsig API disallows computing a signature on
small files containing only a few lines. This new flag disables this
behavior.
git_diff_find_similar() sets this flag by default, which means that rename
/ copy detection of small files will now work. This in turn affects the
behavior of the git_status and git_blame APIs, which will now detect renames
of small files, assuming the right options are passed.
We cannot know from looking at .gitmodules whether a directory is a
submodule or not. We need the index or tree we are comparing against to
tell us. Otherwise we have to assume the entry in .gitmodules is stale
or otherwise invalid.
Thus we pass the index of the repository into the workdir iterator, even
if we do not want to compare against it. This follows what git does:
even for `git diff <tree>`, it will consider staged submodules as
such.
The diff code was using an "ignored_prefix" directory to track if
a parent directory was ignored that contained untracked files
alongside tracked files. Unfortunately, when negative ignore rules
were used for directories inside ignored parents, the wrong rules
were applied to untracked files inside the negatively ignored
child directories.
This commit moves the logic for ignore containment into the workdir
iterator (which is a better place for it), so the ignored-ness of
a directory is contained in the frame stack during traversal. This
allows a child directory to override with a negative ignore and yet
still restore the ignored state of the parent when we traverse out
of the child.
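
As a rough sketch of keeping ignored-ness in the frame stack: each frame stores the effective state for its directory, inheriting from the parent unless a rule overrides it, and popping a frame restores the parent's state. All names here (`frame_stack`, `dir_rule`, `stack_push`, and so on) are hypothetical and the fixed depth limit is only there to keep the example small.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEPTH 64

/* Hypothetical result of matching ignore rules against a directory. */
typedef enum { RULE_NONE, RULE_IGNORE, RULE_UNIGNORE } dir_rule;

typedef struct {
    bool ignored[MAX_DEPTH]; /* effective ignored state per open directory */
    size_t depth;
} frame_stack;

static void stack_init(frame_stack *s)
{
    s->depth = 0;
}

/* Enter a directory; returns false if the stack is full. */
static bool stack_push(frame_stack *s, dir_rule rule)
{
    bool parent = s->depth > 0 ? s->ignored[s->depth - 1] : false;

    if (s->depth == MAX_DEPTH)
        return false;

    /* No matching rule: inherit from the parent. Otherwise override. */
    s->ignored[s->depth++] =
        (rule == RULE_NONE) ? parent : (rule == RULE_IGNORE);
    return true;
}

/* Leave a directory; the parent's state becomes current again. */
static void stack_pop(frame_stack *s)
{
    if (s->depth > 0)
        s->depth--;
}

static bool stack_current_ignored(const frame_stack *s)
{
    return s->depth > 0 && s->ignored[s->depth - 1];
}
```

Because the override lives only in the child's frame, nothing has to be recomputed when traversal returns to the ignored parent.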
Along with this, there are some problems with "directory only"
ignore rules on container directories. Given "a/*" and "!a/b/c/"
(where the second rule is a directory rule but the first rule is
just a generic prefix rule), then the directory only constraint
was having "a/b/c/d/file" match the first rule and not the second.
This was fixed by having ignore directory-only rules test a rule
against the prefix of a file with LEADINGDIR enabled.
Lastly, spot checks for ignores using `git_ignore_path_is_ignored`
were tested from the top directory down to the bottom to deal with
the containment problem, but this is wrong. We have to test bottom
to top so that negative subdirectory rules will be checked before
parent ignore rules.
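
The bottom-to-top order can be illustrated with a deliberately simplified model in which rules name exact paths rather than patterns. The `ignore_rule` type and both functions are hypothetical; real ignore matching uses wildcard patterns and per-directory rule files.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical exact-path rule; negative corresponds to a "!path" rule. */
typedef struct {
    const char *path;
    bool negative;
} ignore_rule;

/* Find a rule that names exactly the first len bytes of path. */
static const ignore_rule *find_rule(
    const ignore_rule *rules, size_t n, const char *path, size_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (strlen(rules[i].path) == len &&
            memcmp(rules[i].path, path, len) == 0)
            return &rules[i];
    }
    return NULL;
}

/*
 * Test the full path first, then each parent directory in turn. The
 * deepest rule that matches decides, so a negative rule on a
 * subdirectory is seen before the ignore rule on its parent.
 */
static bool path_is_ignored(
    const ignore_rule *rules, size_t n, const char *path)
{
    size_t len = strlen(path);

    while (len > 0) {
        const ignore_rule *rule = find_rule(rules, n, path, len);
        if (rule)
            return !rule->negative;

        /* Strip the last path component and its separator. */
        while (len > 0 && path[len - 1] != '/')
            len--;
        if (len > 0)
            len--;
    }
    return false;
}
```

Walking top-down instead would stop at the parent's ignore rule and never reach the negative rule below it.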
This does change the behavior of some existing tests, but it seems
only to bring us more in line with core Git, so I think those
changes are acceptable.
This is a proposed adjustment to the trace APIs. This makes the
trace levels into a bitmask so that they can be selectively enabled
and adds a callback-level payload, plus a message-level payload.
This makes it easier for me to add GIT_TRACE_PERF callbacks that
are simply bypassed if the PERF level is not set.
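
A minimal sketch of the bitmask idea, under the assumption that each level is one bit and that a message is dropped before the callback is reached when its bit is not enabled. The names (`TRACE_PERF`, `tracer`, `trace_emit`, `count_cb`) are made up for this example and are not the proposed API.

```c
#include <stddef.h>

/* Hypothetical trace levels, one bit each so they can be combined. */
enum {
    TRACE_ERROR = 1 << 0,
    TRACE_INFO  = 1 << 1,
    TRACE_PERF  = 1 << 2
};

typedef void (*trace_cb)(
    unsigned level, void *cb_payload, void *msg_payload, const char *msg);

typedef struct {
    unsigned enabled;  /* bitmask of enabled levels */
    trace_cb cb;
    void *cb_payload;  /* callback-level payload, set once */
} tracer;

/* Returns 1 if the message was delivered, 0 if it was bypassed. */
static int trace_emit(
    const tracer *t, unsigned level, void *msg_payload, const char *msg)
{
    if (!t->cb || !(t->enabled & level))
        return 0;
    t->cb(level, t->cb_payload, msg_payload, msg);
    return 1;
}

/* Example callback: counts deliveries through the callback payload. */
static void count_cb(
    unsigned level, void *cb_payload, void *msg_payload, const char *msg)
{
    (void)level;
    (void)msg_payload;
    (void)msg;
    ++*(int *)cb_payload;
}
```

With `enabled` set to `TRACE_ERROR | TRACE_PERF`, an INFO message costs only the mask test.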
This adds an option to refresh the stat cache while generating
status. It also rips out the GIT_PERF stuff I had and makes use
of the trace API to keep statistics about what happens during diff.
When diff is scanning the working directory, if it finds a file
where it is not sure if the index entry matches the working dir,
it will recalculate the OID (which is pretty expensive). This
adds a new flag to diff so that if the OID calculation finds that
the file actually has not changed (i.e. just the modified time was
altered or such), then it will refresh the stat cache in the index
so that future calls to diff will not have to check the oid again.
When we think the stat cache in the index seems valid and the size
or mode of a file has definitely changed, then don't bother trying
to recalculate the OID of the workdir bits to confirm that it is
modified - just accept that it is modified.
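
The shortcut can be sketched like this. The types and the function are hypothetical, and the second rule (trusted cache plus matching mtime means unmodified) is an assumption added to make the example complete.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the stat data kept in the index. */
typedef struct {
    uint32_t mode;
    uint64_t size;
} stat_info;

typedef enum {
    CHECK_UNMODIFIED,
    CHECK_MODIFIED,
    CHECK_NEEDS_HASH
} check_result;

static check_result quick_check(
    const stat_info *cached, const stat_info *actual,
    bool cache_trusted, bool mtime_matches)
{
    /* A definite size or mode change needs no hashing to confirm. */
    if (cache_trusted &&
        (cached->size != actual->size || cached->mode != actual->mode))
        return CHECK_MODIFIED;

    /* Assumption: a trusted cache with matching mtime means clean. */
    if (cache_trusted && mtime_matches)
        return CHECK_UNMODIFIED;

    /* Otherwise the contents have to be hashed and compared. */
    return CHECK_NEEDS_HASH;
}
```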
This can result in files that show as modified with no actual diff,
but the behavior actually appears to match Git on the command line.
This also includes a minor optimization to not perform a submodule
lookup on the ".git" directory itself.
This takes the `--stat` and related example options in the example
diff.c program and converts them to use the `git_diff_get_stats`
API which nicely formats stats for you.
I went to add bar-graph scaling to the stats formatter and noticed
that the `git_diff_stats` structure was holding on to all of the
`git_patch` objects. Unfortunately, each of these objects keeps
the full text of the diff in memory, so this is very expensive. I
ended up modifying `git_diff_stats` to keep just the data that it
needs to keep and allowed it to release the patches. Then, I added
width scaling to the output on top of that.
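
One plausible way to scale the bar graph is linear scaling that keeps at least one column for any nonzero count. This is a sketch of that idea, not necessarily the formula the formatter uses; `scale_bar` is a hypothetical name, and overflow of `count * (width - 1)` is ignored for brevity.

```c
#include <stddef.h>

/*
 * Scale a per-file change count to at most `width` columns, where
 * `max_count` is the largest count among all files. Nonzero counts
 * always get at least one column.
 */
static size_t scale_bar(size_t count, size_t max_count, size_t width)
{
    if (count == 0 || width == 0)
        return 0;
    if (count > max_count)
        count = max_count;
    if (max_count <= width)
        return count; /* everything fits, no scaling needed */
    return 1 + count * (width - 1) / max_count;
}
```

Only the counts are needed for this, which is why the stats object can release the patches once the numbers have been collected.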
In making the diff example program match 'git diff' output, I ended
up removing a newline from the summary output which I then had to
compensate for in the email formatting to match the expectations.
Lastly, I went through and refactored the tests to use a couple of
helper functions and reduce the overall amount of code there.
This makes the lock management on the index a little bit broader,
having a number of routines hold the lock across looking up the
item to be modified and actually making the modification. This is
still not true thread safety, but more of the pure index modifications
are now safe, which makes the simple cases (such as starting up a diff
while index modifications are underway) safe enough to get the
snapshot without hitting allocation problems.
As part of this, I simplified the allocation of index entries to
use a flex array and just put the path at the end of the index
entry. This makes every entry self-contained and makes it a
little easier to feel sure that pointers to strings aren't
being accidentally copied and freed while other references are
still being held.
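
The flex-array layout looks roughly like the following. The struct is a simplified, hypothetical stand-in (the real entry carries much more data), but it shows how a single allocation holds both the entry and its path.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified entry with the path stored inline. */
typedef struct {
    unsigned int mode;
    size_t path_len;
    char path[]; /* flexible array member: path bytes follow the struct */
} index_entry;

/* Allocate the entry and its path in one block; NULL on failure. */
static index_entry *index_entry_alloc(const char *path, unsigned int mode)
{
    size_t len = strlen(path);
    index_entry *entry = malloc(sizeof(*entry) + len + 1);

    if (!entry)
        return NULL;

    entry->mode = mode;
    entry->path_len = len;
    memcpy(entry->path, path, len + 1); /* include the terminating NUL */
    return entry;
}
```

A single `free(entry)` releases everything, so there is no separately owned path string that could be freed while another entry still points at it.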
This makes the index iterator honor the GIT_ITERATOR_IGNORE_CASE
and GIT_ITERATOR_DONT_IGNORE_CASE flags without modifying the
index data itself. To take advantage of this, I had to export a
number of the internal index entry comparison functions. I also
wrote some new tests to exercise the capability.
The usefulness of these helpers came up for me while debugging
some of the iterator changes that I was making, so since they
have also been requested (albeit indirectly) I thought I'd include
them.
With the new submodule cache validity checks, we generally don't
need to call git_submodule_reload_all to have up-to-date submodule
data. Some tests are still calling it where I want to actually
test that it can be called safely and doesn't break anything, but
mostly it is not needed.
This also expands some of the existing submodule tests to cover
some variants on the behavior that was already being tested.
When a directory containing a .git directory (or even just a plain
gitlink) was found, libgit2 was going out of its way to treat it
specially. This seemed like it was necessary because the diff
code was not originally emulating Git's behavior for untracked
directories correctly (i.e. scanning for ignored vs untracked items
inside). Now that libgit2 diff mimics Git's untracked directory
behavior, the special handling for contained Git repos is actually
incorrect and this commit rips it out.
`git_submodule` objects were already refcounted internally in case
the submodule name was different from the path at which it was
stored. This makes that refcounting externally used as well, so
`git_submodule_lookup` and `git_submodule_add_setup` return an
object that requires a `git_submodule_free` when done.
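
In outline, the ownership rule is that every lookup hands out a counted reference which the caller must release. The sketch below uses hypothetical names (`sm_object`, `sm_lookup`, `sm_free`) and a flag that exists only so the example can observe destruction; it is not thread-safe.

```c
#include <stdlib.h>

/* Hypothetical refcounted object. */
typedef struct {
    int refcount;
    int *freed_flag; /* set to 1 on destruction, for demonstration only */
} sm_object;

/* Create the object with one reference held by the internal cache. */
static sm_object *sm_new(int *freed_flag)
{
    sm_object *sm = malloc(sizeof(*sm));
    if (!sm)
        return NULL;
    sm->refcount = 1;
    sm->freed_flag = freed_flag;
    return sm;
}

/* A lookup returns the cached object with an extra reference. */
static sm_object *sm_lookup(sm_object *cached)
{
    cached->refcount++;
    return cached;
}

/* Drop one reference; destroy the object when the last one goes. */
static void sm_free(sm_object *sm)
{
    if (!sm)
        return;
    if (--sm->refcount == 0) {
        *sm->freed_flag = 1;
        free(sm);
    }
}
```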
This adds `git_diff_buffers` and `git_patch_from_buffers`. This
also includes a bunch of internal refactoring to increase the
shared code between these functions and the blob-to-blob and
blob-to-buffer APIs, as well as some higher level assert helpers
in the tests to also remove redundancy.
Writing a sample JavaScript driver pointed out some extra
whitespace handling that needed to be done in the diff driver.
This adds some tests with some sample JavaScript code that I
pulled off of GitHub just to see what would happen. Also, to
clean up the userdiff test data, I did a "git gc" and packed
up the test objects.
This moves the expected and actual test data along with the source
data for the userdiff tests into the tests/resources/userdiff test
repo and updates the test to use that.
Reorganize the builtin driver table slightly so that core Git
builtin definitions can be imported verbatim. Then take a few of
the core Git drivers and pull them in.
This also creates a test of diffs with the builtin HTML driver
which led to some small error handling fixes in the driver
selection logic.
It's hard or even impossible to correctly free the string buffer
allocated by git_patch_to_str in some circumstances. Drop the function
so people have to use git_patch_to_buf instead - git_buf has a dedicated
destructor.
Any well-behaved program should write a descriptive message to the
reflog whenever it updates a reference. Let's make this more prominent
by removing the version without the reflog parameters.
This changes the behavior of callbacks so that the callback error
code is not converted into GIT_EUSER and instead we propagate the
return value through to the caller. Instead of using the
giterr_capture and giterr_restore functions, we now rely on all
functions to pass back the return value from a callback.
To avoid having a return value with no error message, the user
can call the public giterr_set_str or some such function to set
an error message. There is a new helper 'giterr_set_callback'
that functions can invoke after making a callback which ensures
that some error message was set in case the callback did not set
one.
In places where the sign of the callback return value is
meaningful (e.g. positive to skip, negative to abort), only the
negative values are returned back to the caller, obviously, since
the other values allow for continuing the loop.
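
A minimal sketch of the propagation rule, assuming a loop where a positive return means "skip" and a negative one means "abort". Everything here is hypothetical: `last_error` stands in for the library's error state and `set_callback_error_if_unset` for the helper that guarantees a message exists.

```c
#include <stddef.h>

typedef int (*item_cb)(int item, void *payload);

/* Hypothetical stand-in for the per-thread error message. */
static const char *last_error = NULL;

/* Make sure an error message exists if the callback did not set one. */
static void set_callback_error_if_unset(int error)
{
    if (error < 0 && last_error == NULL)
        last_error = "callback returned an error";
}

/*
 * Invoke cb for each item. A negative return aborts the loop and is
 * passed back to the caller unchanged; zero and positive values let
 * the loop continue.
 */
static int foreach_item(
    const int *items, size_t n, item_cb cb, void *payload)
{
    for (size_t i = 0; i < n; i++) {
        int rc = cb(items[i], payload);
        if (rc < 0) {
            set_callback_error_if_unset(rc);
            return rc;
        }
    }
    return 0;
}

/* Example callback: sums positives, skips zero, aborts on negatives. */
static int sum_cb(int item, void *payload)
{
    if (item < 0)
        return -42; /* caller-chosen error code */
    if (item == 0)
        return 1;   /* skip */
    *(int *)payload += item;
    return 0;
}
```

The caller of `foreach_item` sees `-42` itself rather than a generic user-error code, and can rely on some message having been set.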
The hardest parts of this were in the checkout code where positive
return values were overloaded as meaningful values for checkout.
I fixed this by adding an output parameter to many of the internal
checkout functions and removing the overload. This added some
code, but it is probably a better implementation.
There is some funkiness in the network code where user-provided
callbacks could be returning a positive or a negative value and
we want to rely on that to cancel the loop. There are still a
couple of places where a user error might get turned into GIT_EUSER
there, I think, though none exercised by the tests.
When doing copy detection, it is often necessary to include
UNMODIFIED records in the git_diff so they are available as source
records for GIT_DIFF_FIND_COPIES_FROM_UNMODIFIED. Yet in the final
diff, often you will not want to have these UNMODIFIED records.
This adds a flag which marks these UNMODIFIED records for deletion
from the diff list so they will be removed after the rename detect
phase is over.
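
The mark-then-remove step might look like this in outline. The `delta` struct, its `to_delete` field, and both functions are hypothetical simplifications of the diff list.

```c
#include <stddef.h>

typedef enum {
    DELTA_UNMODIFIED,
    DELTA_MODIFIED,
    DELTA_COPIED
} delta_status;

/* Hypothetical, simplified delta record. */
typedef struct {
    delta_status status;
    int to_delete;
} delta;

/* Mark UNMODIFIED records; they stay usable as copy sources for now. */
static void mark_unmodified(delta *deltas, size_t n)
{
    for (size_t i = 0; i < n; i++)
        deltas[i].to_delete = (deltas[i].status == DELTA_UNMODIFIED);
}

/* After rename detection: compact in place, return the new length. */
static size_t remove_marked(delta *deltas, size_t n)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (!deltas[i].to_delete)
            deltas[kept++] = deltas[i];
    }
    return kept;
}
```

Deferring the removal keeps the records available while copy sources are being matched, and the final diff still ends up without them.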
When FIND_COPIES is used in combination with BREAK_REWRITES for
rename detection, there was a bug where the split MODIFIED delta
was only used as a target for RENAME records and not for COPIED
records. This fixes that, converting the split into a pair of
DELETED and COPIED deltas when that circumstance arises.