This is a proposed adjustment to the trace APIs. This makes the
trace levels into a bitmask so that they can be selectively enabled
and adds a callback-level payload, plus a message-level payload.
This makes it easier for me to a GIT_TRACE_PERF callbacks that
are simply bypassed if the PERF level is not set.
This adds an option to refresh the stat cache while generating
status. It also rips out the GIT_PERF stuff I had an makes use
of the trace API to keep statistics about what happens during diff.
When we think the stat cache in the index seems valid and the size
or mode of a file has definitely changed, then don't bother trying
to recalculate the OID of the workdir bits to confirm that it is
modified - just accept that it is modified.
This can result in files that show as modified with no actual diff,
but the behavior actually appears to match Git on the command line.
This also includes a minor optimization to not perform a submodule
lookup on the ".git" directory itself.
When considering status of untracked directories, if we find an
explicitly ignored item, even if it is a directory, treat the
parent as an IGNORED item. It was accidentally being treated as
an EMPTY item because we were not looking into the ignored subdir.
In the iterator, distinguish between ignores and empty directories
so that diff and status can ignore empty directories, but checkout
and stash can treat them as untracked items.
When diff finds an untracked directory, it emulates Git behavior
by looking inside the directory to see if there are any untracked
items inside it. If there are only ignored items inside the dir,
then diff considers it ignored, even if there is no direct ignore
rule for it.
Checkout was not copying this behavior - when it found an untracked
directory, it just treated it as untracked. Unfortunately, when
combined with GIT_CHECKOUT_REMOVE_UNTRACKED, this made is seem that
checkout (and stash, which uses checkout) was removing ignored
items when you had only asked it to remove untracked ones.
This commit moves the logic for advancing past an untracked dir
while scanning for non-ignored items into an iterator helper fn,
and uses that for both diff and checkout.
This makes the index iterator honor the GIT_ITERATOR_IGNORE_CASE
and GIT_ITERATOR_DONT_IGNORE_CASE flags without modifying the
index data itself. To take advantage of this, I had to export a
number of the internal index entry comparison functions. I also
wrote some new tests to exercise the capability.
Again, laying groundwork for some index iterator changes, this
contains a bunch of code refactorings for index internals that
should make it easier down the line to add locking around index
modifications. Also this removes the redundant prefix_position
function and fixes some potential memory leaks.
With the changes to how git_path_dirload_with_stat handles things
that look like submodules, submodules could end up sorted in the
wrong order with the workdir iterator. This moves the submodule
check earlier in the iterator processing of a new directory so
that the submodule name updates will happen immediately and the
sort order will be correct.
When a directory containing a .git directory (or even just a plain
gitlink) was found, libgit2 was going out of its way to treat it
specially. This seemed like it was necessary because the diff
code was not originally emulating Git's behavior for untracked
directories correctly (i.e. scanning for ignored vs untracked items
inside). Now that libgit2 diff mimics Git's untracked directory
behavior, the special handling for contained Git repos is actually
incorrect and this commit rips it out.
This renames git_vector_free_all to the better git_vector_free_deep
and also contains a couple of memory leak fixes based on valgrind
checks. The fixes are specifically: failure to free global dir
path variables when not compiled with threading on and failure to
free filters from the filter registry that had not be initialized
fully.
There are a lot of places that we call git__free on each item in
a vector and then call git_vector_free on the vector itself. This
just wraps that up into one convenient helper function.
This adds giterr_user_cancel to return GIT_EUSER and clear any
error message that is sitting around. As a result of using that
in places, we need to be more thorough with capturing errors that
happen inside a callback when used internally. To help with that,
this also adds giterr_capture and giterr_restore so that when we
internally use a foreach-type function that clears errors and
converts them to GIT_EUSER, it is easier to restore not just the
return value, but the actual error message text.
When the filesystem iterator encounters an error with a file, it
returns the error but because of the cleanup code, it was in some
cases erasing the error message. This uses the giterr_detach API
to make sure that the actual error message is restored after the
cleanup code has been run.
This cleans up some additional issues. The main change is that
on a filesystem that doesn't support mode bits, libgit2 will now
create new blobs with GIT_FILEMODE_BLOB always instead of being
at the mercy to the filesystem driver to report executable or not.
This means that if "core.filemode" lies and claims that filemode
is not supported, then we will ignore the executable bit from the
filesystem. Previously we would have allowed it.
This adds an option to the new git_repository_reset_filesystem to
recurse through submodules if desired. There may be other types
of APIs that would like a "recurse submodules" option, but this
one is particularly useful.
This also has a number of cleanups, etc., for related things
including trying to give better error messages when problems come
up from the filesystem. For example, the FAT filesystem driver on
MacOS appears to return errno EINVAL if you attempt to write a
filename with invalid UTF-8 in it. We try to capture that with a
better error message now.
This hooks up git_path_direach and git_path_dirload so that they
will take a flag indicating if directory entry names should be
tested and converted from decomposed unicode to precomposed form.
This code will only come into play on the Apple platform and even
then, only when certain types of filesystems are used.
This involved adding a flag to these functions which involved
changing a lot of places in the code.
This was an opportunity to do a bit of code cleanup here and there,
for example, getting rid of the git_futils_cleanupdir_r function in
favor of a simple flag to git_futils_rmdir_r to not remove the top
level entry. That ended up adding depth tracking during rmdir_r
which led to a safety check for infinite directory recursion. Yay.
This hasn't actually been tested on the Mac filesystems where the
issue occurs. I still need to get test environment for that.
This doesn't actual do string precompose but it puts the hooks in
place into the iterators and the git_path_dirload function so that
the actual precompose work is ready to go.
The routines to push and pop ignore files while traversing a
directory had some issues. In particular, setting up the initial
list would sometimes push an ignore file before it ought to be
applied if the starting path was a directory containing an ignore
file. Also, the pop function was not always matching the right
part of the path and would fail to pop ignores from the list in
some cases.
This adds some tests that exercise a particular problematic case
and then fixes the problems that I could find related to this.
At some point, I'd like to isolate this ignore rule management
code and rewrite it, but that's a larger project and right now,
I'll opt to just try to fix the broken behaviors.
This adds the ability for checkout to write to a target directory
instead of having to use the working directory of the repository.
This makes it easier to do exports of repository data and the like.
This is similar to, but not quite the same as, the --prefix option
to `git checkout-index` (this will always be treated as a directory
name, not just as a simple text prefix).
As part of this, the workdir iterator was extended to take the
path to the working directory as a parameter and fallback on the
git_repository_workdir result only if it's not specified.
Fixes#1332
This is a significant reorganization of the diff code to break it
into a set of more clearly distinct files and to document the new
organization. Hopefully this will make the diff code easier to
understand and to extend.
This adds a new `git_diff_driver` object that looks of diff driver
information from the attributes and the config so that things like
function content in diff headers can be provided. The full driver
spec is not implemented in the commit - this is focused on the
reorganization of the code and putting the driver hooks in place.
This also removes a few #includes from src/repository.h that were
overbroad, but as a result required extra #includes in a variety
of places since including src/repository.h no longer results in
pulling in the whole world.
1. internal iterators now return GIT_ITEROVER when you go past the
last item in the iteration.
2. git_iterator_advance will "advance" to the first item in the
iteration if it is called immediately after creating the
iterator, which allows a simpler idiom for basic iteration.
3. if git_iterator_advance encounters an error reading data (e.g.
a missing tree or an unreadable file), it returns the error
but also attempts to advance past the invalid data to prevent
an infinite loop.
Updated all tests and internal usage of iterators to account for
these new behaviors.
This adds some hooks into the filesystem iterator so that the
workdir iterator can just become a wrapper around it. Then we
remove most of the workdir iterator code and just have it augment
the filesystem iterator with skipping .git entries, updating the
ignore stack, and checking for submodules.
This adds a new variant iterator that is a raw filesystem iterator
for scanning directories from a root. There is still more work to
do to blend this with the working directory iterator.
1. Fix sort order problem with submodules where "mod" was sorting
after "mod-plus" because they were being sorted as "mod/" and
"mod-plus/". This involved pushing the "contains a .git entry"
test significantly lower in the stack.
2. Reinstate behavior that a directory which contains a .git entry
will be treated as a submodule during iteration even if it is
not yet added to the .gitmodules.
3. Now that any directory containing .git is reported as submodule,
we have to be more careful checking for GIT_EEXISTS when we
do a submodule lookup, because that is the error code that is
returned by git_submodule_lookup when you try to look up a
directory containing .git that has no record in gitmodules or
the index.
This updates the tree iterator internals to be more efficient.
The tree_iterator_entry objects are now kept as pointers that are
allocated from a git_pool, so that we may use git__tsort_r for
sorting (which is better than qsort, given that the tree is
likely mostly ordered already).
Those tree_iterator_entry objects now keep direct pointers to the
data they refer to instead of keeping indirect index values. This
simplifies a lot of the data structure traversal code.
This also adds bsearch to find the start item position for range-
limited tree iterators, and is more explicit about using
git_path_cmp instead of reimplementing it. The git_path_cmp
changed a bit to make it easier for tree_iterators to use it (but
it was barely being used previously, so not a big deal).
This adds a git_pool_free_array function that efficiently frees a
list of pool allocated pointers (which the tree_iterator keeps).
Also, added new tests for the git_pool free list functionality
that was not previously being tested (or used).
This fixes two bugs with the workdir iterator depth check: first
that the depth was not being decremented and second that empty
directories were counting against the depth even though a frame
was not being created for them.
This also fixes a bug with the ENOTFOUND return code for workdir
iterators when you attempt to advance_into an empty directory.
Actually, that works correctly, but it was incorrectly being
propogated into regular advance() calls in some circumstances.
Added new tests for the above that create a huge hierarchy on
the fly and try using the workdir iterator to traverse it.
Given a group of case-insensitively equivalent tree iterator
entries, this ensures that the case-sensitively first trees will
be used as the representative items. I.e. if you have conflicting
entries "A/B/x", "a/b/x", and "A/b/x", this change ensures that
the earliest entry "A/B/x" will be returned. The actual choice
is not that important, but it is nice to have it stable and to
have it been either the first or last item, as opposed to a
random item from within the equivalent span.
Tree iterator advance was moving forward without taking the
filemode of the entries into account, equating "a" and "a/".
This makes the tree entry comparison code more easily reusable
and fixes the problem.
This fixes an off by one error for generating full paths for
tree entries in tree iterators when INCLUDE_TREES is set. Also,
contains a bunch of small code cleanups with a couple of small
utility functions and macro changes to eliminate redundant code.
If there are case-ambiguities in the path of a case insensitive
tree iterator, it will now rewrite the entire path when it gives
the path name to an entry, so a tree with "A/b/C/d.txt" and
"a/B/c/E.txt" will give the true full paths (instead of case-
folding them both to "A/B/C/d.txt" or "a/b/c/E.txt" or something
like that.
There is a serious bug in the previous tree iterator implementation.
If case insensitivity resulted in member elements being equivalent
to one another, and those member elements were trees, then the
children of the colliding elements would be processed in sequence
instead of in a single flattened list. This meant that the tree
iterator was not truly acting like a case-insensitive list.
This completely reworks the tree iterator to manage lists with
case insensitive equivalence classes and advance through the items
in a unified manner in a single sorted frame.
It is possible that at a future date we might want to update this
to separate the case insensitive and case sensitive tree iterators
so that the case sensitive one could be a minimal amount of code
and the insensitive one would always know what it needed to do
without checking flags.
But there would be so much shared code between the two, that I'm
not sure it that's a win. For now, this gets what we need.
More tests are needed, though.
This standardizes iterator behavior across all three iterators
(index, tree, and working directory). Previously the working
directory iterator behaved differently from the other two.
Each iterator can now operate in one of three modes:
1. *No tree results, auto expand trees* means that only non-
tree items will be returned and when a tree/directory is
encountered, we will automatically descend into it.
2. *Tree results, auto expand trees* means that results will
be given for every item found, including trees, but you
only need to call normal git_iterator_advance to yield
every item (i.e. trees returned with pre-order iteration).
3. *Tree results, no auto expand* means that calling the
normal git_iterator_advance when looking at a tree will
not descend into the tree, but will skip over it to the
next entry in the parent.
Previously, behavior 1 was the only option for index and tree
iterators, and behavior 3 was the only option for workdir.
The main public API implications of this are that the
`git_iterator_advance_into()` call is now valid for all
iterators, not just working directory iterators, and all the
existing uses of working directory iterators explicitly use
the GIT_ITERATOR_DONT_AUTOEXPAND (for now).
Interestingly, the majority of the implementation was in the
index iterator, since there are no tree entries there and now
have to fake them. The tree and working directory iterators
only required small modifications.
The iterator APIs are not currently consistent with the parameter
ordering of the rest of the codebase. This rearranges the order
of parameters, simplifies the naming of a number of functions, and
makes somewhat better use of macros internally to clean up the
iterator code.
This also expands the test coverage of iterator functionality,
making sure that case sensitive range-limited iteration works
correctly.
With the new code to make tree iterators support ignore_case,
there is a bug in setting the start entry for range bounded
iterators where memcmp was being used instead of strncasecmp.
This fixes that and expands the tree iterator test to cover
the cases that were broken.
This makes tree iterators directly support case insensitivity by
using a secondary index that can be sorted by icase. Also, this
fixes the ambiguity check in the git_status_file API to also be
case insensitive. Lastly, this adds new test cases for case
insensitive range boundary checking for all types of iterators.
With this change, it should be possible to deprecate the spool
and sort iterator, but I haven't done that yet.
This changes the iterator API so that flags can be passed in to
the constructor functions to control the ignore_case behavior.
At this point, the flags are not supported on tree iterators (i.e.
there is no functional change over the old API), but the API
changes are all made to accomodate this.
By the way, I went with a flags parameter because in the future
I have a couple of other ideas for iterator flags that will make
it easier to fix some diff/status/checkout bugs.
In preparation for further iterator changes, this cleans up a few
small things in the iterator API:
* removed the git_iterator_for_repo_index_range API
* made git_iterator_free not be inlined
* minor param name and test function name tweaks