There are a few tests that set up a fake home directory and a
fake GLOBAL search path so that we can test things in global
ignore or attribute or config files. This cleans up that code to
work more robustly even if there is a test failure. This also
fixes some valgrind warnings where scanning search paths for
separators could end up doing a little bit of sketchy data access
when coming to the end of search list.
This is a proposed adjustment to the trace APIs. This makes the
trace levels into a bitmask so that they can be selectively enabled
and adds a callback-level payload, plus a message-level payload.
This makes it easier for me to a GIT_TRACE_PERF callbacks that
are simply bypassed if the PERF level is not set.
This adds an option to refresh the stat cache while generating
status. It also rips out the GIT_PERF stuff I had an makes use
of the trace API to keep statistics about what happens during diff.
When diff is scanning the working directory, if it finds a file
where it is not sure if the index entry matches the working dir,
it will recalculate the OID (which is pretty expensive). This
adds a new flag to diff so that if the OID calculation finds that
the file actually has not changed (i.e. just the modified time was
altered or such), then it will refresh the stat cache in the index
so that future calls to diff will not have to check the oid again.
When we think the stat cache in the index seems valid and the size
or mode of a file has definitely changed, then don't bother trying
to recalculate the OID of the workdir bits to confirm that it is
modified - just accept that it is modified.
This can result in files that show as modified with no actual diff,
but the behavior actually appears to match Git on the command line.
This also includes a minor optimization to not perform a submodule
lookup on the ".git" directory itself.
The current version of the commit creation and amend function are unsafe
to use when passing the update_ref parameter, as they do not check that
the reference at the moment of update points to what the user expects.
Make sure that we're moving history forward when we ask the library to
update the reference for us by checking that the first parent of the new
commit is the current value of the reference. We also make sure that the
ref we're updating hasn't moved between the read and the write.
Similarly, when amending a commit, make sure that the current tip of the
branch is the commit we're amending.
There is an interesting difference with core Git here, though.
Because libgit2 will do rename detection with the working directory,
in the last case where the HEAD and the working directory both
have the decomposed data and the index has the composed data, we
generate a single status record with two renames whereas Git will
generate one rename (head to index) and one untracked file.
The current FETCH_HEAD parsing code assumes that a quote must end the
branch name. Git however allows for quotes as part of a branch name,
which causes us to consider the FETCH_HEAD file as invalid.
Instead of searching for a single quote char, search for a quote char
followed by SP, which is not a valid part of a ref name.
When diff finds an untracked directory, it emulates Git behavior
by looking inside the directory to see if there are any untracked
items inside it. If there are only ignored items inside the dir,
then diff considers it ignored, even if there is no direct ignore
rule for it.
Checkout was not copying this behavior - when it found an untracked
directory, it just treated it as untracked. Unfortunately, when
combined with GIT_CHECKOUT_REMOVE_UNTRACKED, this made is seem that
checkout (and stash, which uses checkout) was removing ignored
items when you had only asked it to remove untracked ones.
This commit moves the logic for advancing past an untracked dir
while scanning for non-ignored items into an iterator helper fn,
and uses that for both diff and checkout.
To emulate git, stash should not remove untracked git repositories
inside the parent repo, and checkout's REMOVE_UNTRACKED should
also skip over these items.
`git stash` actually prints a warning message for these items.
That should be possible with a checkout notify callback if you
wanted to, although it would require a bit of extra logic as things
are at the moment.
This takes the `--stat` and related example options in the example
diff.c program and converts them to use the `git_diff_get_stats`
API which nicely formats stats for you.
I went to add bar-graph scaling to the stats formatter and noticed
that the `git_diff_stats` structure was holding on to all of the
`git_patch` objects. Unfortunately, each of these objects keeps
the full text of the diff in memory, so this is very expensive. I
ended up modifying `git_diff_stats` to keep just the data that it
needs to keep and allowed it to release the patches. Then, I added
width scaling to the output on top of that.
In making the diff example program match 'git diff' output, I ended
up removing an newline from the sumamry output which I then had to
compensate for in the email formatting to match the expectations.
Lastly, I went through and refactored the tests to use a couple of
helper functions and reduce the overall amount of code there.
Ignore patterns that ended with a trailing '/*' were still needing
to match against another actual '/' character in the full path.
This is not the same behavior as core Git.
Instead, we strip a trailing '/*' off of any patterns that were
matching and just take it to imply the FNM_LEADING_DIR behavior.
There was a latent bug where files that use macro definitions
could be parsed before the macro definitions were loaded. Because
of attribute file caching, preloading files that are going to be
used doesn't add a significant amount of overhead, so let's always
preload any files that could contain macros before we assemble the
actual vector of files to scan for attributes.
When traversing the directory structure, the iterator pushes and
pops ignore files using a vector. Some directories don't have
ignore files, so it uses a path comparison to decide when it is
right to actually pop the last ignore file. This was only
comparing directory suffixes, though, so a subdirectory with the
same name as a parent could result in the parent's .gitignore
being popped off the list ignores too early. This changes the
logic to compare the entire relative path of the ignore file.
When we delete an entry, we also want to refresh the configuration to
catch any changes that happened externally.
This allows us to simplify the logic, as we no longer need to delete
these variables internally. The whole state will be refreshed and the
deleted entries won't be there.
With the isolation of complex reads, we can now try to refresh the
on-disk file before reading a value from it.
This changes the semantics a bit, as before we could be sure that a
string we got from the configuration was valid until we wrote or
refreshed. This is no longer the case, as a read can also invalidate the
pointer.
In order to have consistent views of the config files for remotes,
submodules et al. and a configuration that represents what is currently
stored on-disk, we need a way to provide a view of the configuration
that does not change.
The goal here is to provide the snapshotting part by creating a
read-only copy of the state of the configuration at a particular point
in time, which does not change when a repository's main config changes.
On set, we set/add the value written to the config's internal values,
but we do not refresh old values.
Document this in a test in preparation for the refresh changes.
The checks to see if files were out of date in the attibute cache
was wrong because the cache-breaker data wasn't getting stored
correctly. Additionally, when the cache-breaker triggered, the
old file data was being leaked.
In the threading tests, I was still seeing a race condition where
the same item could end up being inserted multiple times into the
index. Preserving the sorted-ness of the index outside of the
`index_insert` call fixes the issue.
This is a big refactoring of the attribute file cache to be a bit
simpler which in turn makes it easier to enforce a lock around any
updates to the cache so that it can be used in a threaded env.
Tons of changes to the attributes and ignores code.
This makes the lock management on the index a little bit broader,
having a number of routines hold the lock across looking up the
item to be modified and actually making the modification. Still
not true thread safety, but more pure index modifications are now
safe which allows the simple cases (such as starting up a diff
while index modifications are underway) safe enough to get the
snapshot without hitting allocation problems.
As part of this, I simplified the allocation of index entries to
use a flex array and just put the path at the end of the index
entry. This makes every entry self-contained and makes it a
little easier to feel sure that pointers to strings aren't
being accidentally copied and freed while other references are
still being held.
This adds a basic test of doing simultaneous diffs on multiple
threads and adds basic locking for the attr file cache because
that was the immediate problem that arose from these tests.
This makes the index iterator honor the GIT_ITERATOR_IGNORE_CASE
and GIT_ITERATOR_DONT_IGNORE_CASE flags without modifying the
index data itself. To take advantage of this, I had to export a
number of the internal index entry comparison functions. I also
wrote some new tests to exercise the capability.
The usefulness of these helpers came up for me while debugging
some of the iterator changes that I was making, so since they
have also been requested (albeit indirectly) I thought I'd include
them.
git_branch_t is an enum so requesting GIT_BRANCH_LOCAL | GIT_BRANCH_REMOTE is not possible as it is not a member of the enum (at least VS2013 C++ complains about it).
This fixes a regression introduced in commit a667ca8298 (PR #1946).
Signed-off-by: Sven Strickroth <email@cs-ware.de>
Ignore rules with slashes in them are matched using FNM_PATHNAME
and use the path to the .gitignore file from the root of the
repository along with the path fragment (including slashes) in
the ignore file itself. Unfortunately, the relative path to the
.gitignore file was being applied to the global core.excludesfile
if that was also named ".gitignore".
This fixes that with more precise matching and includes test for
ignore rules with leading slashes (which were the primary example
of this being broken in the real world).
This also backports an improvement to the file context logic from
the threadsafe-iterators branch where we don't rely on mutating
the key of the attribute file name to generate the context path.
There were a couple bugs in popping ignore files during iteration
that could result in incorrect decisions be made and thus ignore
files below the root either not being loaded correctly or not
being popped at the right time.
One bug was an off-by-one in comparing the path of the gitignore
file with the path being exited during iteration.
The second bug was not correctly truncating the path being tracked
during traversal if there were no ignores on the list (i.e. when
you have no .gitignore at the root, but do have some in contained
directories).
This updates how libgit2 treats submodule-like directories that
actually have tracked content inside of them. This is a strange
corner case, but it seems that many people have abortive submodule
setups and then just went ahead and added the files into the
parent repository. In this case, we should just treat the
submodule as if it was a normal directory.
Libgit2 will still try to skip over real submodules and contained
repositories that do not have tracked files inside them, but this
adds some new handling for cases where the apparently submodule
data is in conflict with the actual list of tracked files.
The internal buffer in the `git_path_iconv_t` structure was not
being reset before the calls to `iconv` were made to convert data,
so if there were multiple decomposed Unicode paths in a single
directory, paths after the first one were being appended to the
first instead of treated as independent data.
The base for the relative urls is determined as follows, with descending
priority:
- remote url of HEAD's remote tracking branch
- remote "origin"
- workdir
This follows git.git behaviour
Cloning from an empty repo must set master's upstream to origin's
master, even if neither of them exist.
Fetching from a non-empty origin must then mark the master branch
for-merge. This currently fails.
One blame test replies on being run from within the libgit2
repository to leverage having a longer history to play with, but
some bundled versions of libgit2 don't have the whole libgit2
history. This just skips that test if the repository can't be
opened.
There was a little bug where the submodule cache thought that the
index date was out of date even when it wasn't that was resulting
in some extra scans of index data even when not needed.
Mostly this commit adds a bunch of new tests including adding and
removing submodules in the index and in the HEAD and seeing if we
can automatically pick them up when refreshing.
With the new submodule cache validity checks, we generally don't
need to call git_submodule_reload_all to have up-to-date submodule
data. Some tests are still calling it where I want to actually
test that it can be called safely and doesn't break anything, but
mostly it is not needed.
This also expands some of the existing submodule tests to cover
some variants on the behavior that was already being tested.