A tree to index rename with no changes was getting erased by
the iteration routine (if the routine actually loaded the data
for the unmodified file). This invokes the code path that was
previously messing up the diff and iterates twice to make sure
that the iteration process itself doesn't modify the data.
This changes the behavior of the status RENAMED flags so that they
will be combined with the MODIFIED flags if appropriate. If a file
is modified in the index and also renamed, then the status code
will have both the GIT_STATUS_INDEX_MODIFIED and INDEX_RENAMED bits
set. If it is renamed but the OID has not changed, then just the
GIT_STATUS_INDEX_RENAMED bit will be set. Similarly, the flags
GIT_STATUS_WT_MODIFIED and GIT_STATUS_WT_RENAMED can both be set
independently of one another.
This fixes a serious bug where the check for unmodified files that
was done at data load time could end up erasing the RENAMED state
of a file that was renamed with no changes.
Lastly, this contains a bunch of new tests for status with renames,
including tests where the only rename changes are case changes.
The expected results of these tests have to vary by whether the
platform uses a case sensitive filesystem or not, so the expected
data covers those platform differences separately.
In a case insensitive index, if you attempt to add a file from
disk with a different case pattern, the old case pattern in the
index should be preserved.
This fixes that (and a couple of minor warnings).
This commit reinstates some changes to git_diff__paired_foreach
that were discarded during the rebase (because the diff_output.c
file had gone away), and also adjusts the case insensitively
logic slightly to hopefully deal with either mismatched icase
diffs and other case insensitivity scenarios.
This adds two new public APIs: git_diff_patch_from_blobs and
git_diff_patch_from_blob_and_buffer, plus it refactors the code
for git_diff_blobs and git_diff_blob_to_buffer so that they code
is almost entirely shared between these APIs, and adds tests for
the new APIs.
This implements the loading of regular expression pattern lists
for diff drivers that search for function context in that way.
This also changes the way that diff drivers update options and
interface with xdiff APIs to make them a little more flexible.
There are all sorts of misconfiguration in the wild. We already rely
on the signature constructor to trim SP. Extend the logic to use
`isspace` to decide whether a character should be trimmed.
This is a significant reorganization of the diff code to break it
into a set of more clearly distinct files and to document the new
organization. Hopefully this will make the diff code easier to
understand and to extend.
This adds a new `git_diff_driver` object that looks of diff driver
information from the attributes and the config so that things like
function content in diff headers can be provided. The full driver
spec is not implemented in the commit - this is focused on the
reorganization of the code and putting the driver hooks in place.
This also removes a few #includes from src/repository.h that were
overbroad, but as a result required extra #includes in a variety
of places since including src/repository.h no longer results in
pulling in the whole world.
1. internal iterators now return GIT_ITEROVER when you go past the
last item in the iteration.
2. git_iterator_advance will "advance" to the first item in the
iteration if it is called immediately after creating the
iterator, which allows a simpler idiom for basic iteration.
3. if git_iterator_advance encounters an error reading data (e.g.
a missing tree or an unreadable file), it returns the error
but also attempts to advance past the invalid data to prevent
an infinite loop.
Updated all tests and internal usage of iterators to account for
these new behaviors.
This adds ~/ prefix expansion for the value of core.attributesfile
and core.excludesfile, plus it fixes the fact that the attributes
cache was holding on to the string data from the config for a long
time (instead of making its own strdup) which could have caused a
problem if the config was refreshed. Adds a test for the new
expansion capability.
This extends the rename tests to make sure that every rename
scenario in the inner loop of git_diff_find_similar is actually
exercised. Also, fixes an incorrect assert that was in one of
the clauses that was not previously being exercised.
This adds a couple more tests of different rename scenarios.
Also, this fixes a problem with the case where you have two
"split" deltas and the left half of one matches the right half of
the other. That case was already being handled, but in the wrong
order in a way that could result in bad output. Also, if the swap
also happened to put the other two halves into the correct place
(i.e. two files exchanged places with each other), then the second
delta was left with the SPLIT flag set when it really should be
cleared.
This flips rename detection around so instead of creating a
forward mapping from deltas to possible rename targets, instead
it creates a reverse mapping, looking at possible targets and
trying to find a source that they could have been renamed or
copied from. This is important because each output can only
have a single source, but a given source could map to multiple
outputs (in the form of COPIED records).
Additionally, this makes a couple of tweaks to the public rename
detection APIs, mostly renaming a couple of options that control
the behavior to make more sense and to be more like core Git.
I walked through the tests looking at the exact results and
updated the expectations based on what I saw. The new code is
different from the old because it cannot give some nonsense
results (like A was renamed to both B and C) which were part of
the outputs previously.
This adds a bunch more rename detection tests including checks
vs the working directory, the new exact match options, some more
whitespace variants, etc.
This also adds a git_futils_writebuffer helper function and uses
it in checkout. This is mainly added because I wanted an easy
way to write out a git_buf to disk inside my test code.
There are a number of bugs in the rename code that only were
obvious when I started testing it against large old repos with
more complex patterns. (The code to do that testing is not ready
to merge with libgit2, but I do plan to add more thorough tests.)
This contains a significant number of changes and also tweaks the
public API slightly to make emulating core git easier.
Most notably, this separates the GIT_DIFF_FIND_AND_BREAK_REWRITES
flag into FIND_REWRITES (which adds a self-similarity score to
every modified file) and BREAK_REWRITES (which splits the modified
deltas into add/remove pairs in the diff list). When you do a raw
output of core git, rewrites show up as M090 or such, not at A and
D output, so I wanted to be able to emulate that.
Publicly, this also changes the flags to be uint16_t since we
don't need values out of that range.
Internally, this contains significant changes from a number of
small bug fixes (like using the wrong side of the diff to decide
if the object could be found in the ODB vs the workdir) to larger
issues about which files can and should be compared and how the
various edge cases of similarity scores should be treated.
Honestly, I don't think this is the last update that will have to
be made to this code, but I think this moves us closer to correct
behavior and I tried to document the code so it would be easier
to follow..
I frequently want to the the first N digits of an OID formatted
as a string and I'd like it to be efficient. This function makes
that easy and I could rewrite the OID formatters in terms of it.
Expose a way to retrieve, along with the target git_object, the reference
pointed at by some revparse expression (`@{<-n>}` or
`<branchname>@{upstream}` syntax).
When the last item in a diff was an untracked directory that only
contained ignored items, the loop to scan the contents would run
off the end of the iterator and dereference a NULL pointer. This
includes a test that reproduces the problem and a fix.
There was a problem found in the Rugged test suite where the
refdb_fs_backend__next function could exit too early in some
very specific hashing patterns for packed refs. This ports
the Rugged test to libgit2 and then fixes the bug.
Nobody should ever be using anything other than ALL at this level, so
remove the option altogether.
As part of this, git_reference_foreach_glob is now implemented in the
frontend using an iterator. Backends will later regain the ability of
doing the glob filtering in the backend.
If you use rename detection, the renamed and copied files would
not show any text diffs because the function that decides if
data should be loaded didn't know which sides of the diff to
load for those cases.
This adds a test that looks at the patch generated for diff
entries that are COPIED or RENAMED.
The git_status_file API was doing a hack to deal with files that
are inside ignored directories. The status scan was not reporting
any file in this case, so git_status_file would attempt a final
"stat()" call, and return IGNORED if the file actually existed.
On case-insensitive filesystems where core.ignorecase is set
incorrectly, this magic check can "succeed" and report a file
as ignored when it should actually return ENOTFOUND.
Now that we have the GIT_STATUS_OPT_RECURSE_IGNORED_DIRS, we can
use that flag to make sure that git_status_file() will look into
ignored directories and eliminate the hack completely, so we give
the correct error.
This clarifies the docs for git_repository_message and also adds
to the tests to explicitly check NUL termination of data when the
output buffer is smaller than the message size. There is a minor
behavior change so that a non-NULL output buffer will always be
NUL terminated (at length zero) if an error occurs.
When a repository is initialised, we need to probe to see if there is
a global config to load. If this is not the case, the user isn't able
to write to the global config without creating the backend and adding
it themselves, which is inconvenient and overly complex.
Unconditionally create and add a backend for the global config file
regardless of whether it exists as a convenience for users.
To enable this, we allow creating backends to files that do not exist
yet, changing the semantics somewhat, and making some tests invalid.
When tagopt is set to '--tags', we should only take the default tags
refspec into account and ignore any configured ones.
Bring the code into compliance.
When a patch contained an eofnl change (i.e. the last line either
gained or lost a newline), the oldno and newno line number values
for the lines in the last hunk of the patch were not useful. This
makes them behave in a more expected manner.
This adds a new line origin constant for the special line that
is used when both files end without a newline.
In the course of writing the tests for this, I was having problems
with modifying a file but not having diff notice because it was
the same size and modified less than one second from the start of
the test, so I decided to start working on nanosecond timestamp
support. This commit doesn't contain the nanosecond support, but
it contains the reorganization of maybe_modified and the hooks so
that if the nanosecond data were being read by stat() (or rather
being copied by git_index_entry__init_from_stat), then the nsec
would be taken into account.
This new stuff could probably use some more tests, although there
is some amount of it here.
Currently git_branch_set_upstream when passed a local branch
creates invalid configuration, for ex. if we setup branch
'tracking_master' to track local 'master' libgit2 generates
the following config
```
[branch "track_master"]
remote = .
merge = .refs/heads/track_master
```
The merge value is invalid and calling git_branch_upstream on
'tracking_master' results in invalid reference error.
It should do:
```
[branch "track_master"]
remote = .
merge = refs/heads/master
```
Older versions of git would only write peeled entries for
items under refs/tags/. Newer versions will write them for
all refs, and we should be prepared to handle that.
When diff encounters an untracked directory, there was a shortcut
that it took which is not compatible with core git. This makes
the default behavior no longer take that shortcut and instead look
inside the untracked directory to see if there are any untracked
files within it. If there are not, then the directory is treated
as an ignore directory instead of an untracked directory. This
has implications for the git_status APIs.
Add a new git_oid_strcmp that compares a string OID with a hex
oid for sort order, and then reimplement git_oid_streq using it.
This actually should speed up git_oid_streq because it only reads
as far into the string as it needs to, whereas previously it would
convert the whole string into an OID and then use git_oid_cmp.
git_oid_ncmp was making some assumptions about the length of
the data - this shifts the check to the top of the loop so it
will work more robustly, limits the max, and adds some tests
to verify the functionality.
This makes diff use the cvar cache for config options where
possible, and also adds support for a number of other config
options to diff including "diff.context", "diff.ignoreSubmodules",
"diff.noprefix", "diff.mnemonicprefix", and "core.abbrev".
To make this natural, this involved a rearrangement of the code
that allocates the diff object vs. the code that initializes it
based on the combination of options passed in by the user and
read from the config.
This commit includes tests for most of these new options as well.
The indexer was creating a packfile object separately from the
code in pack.c which was a problem since I put a call to
git_mutex_init into just pack.c. This commit updates the pack
function for creating a new pack object (i.e. git_packfile_check())
so that it can be used in both places and then makes indexer.c
use the shared initialization routine.
There are also a few minor formatting and warning message fixes.
This adds create and free callback to the git_objects_table so
that more of the creation and destruction of objects can be table
driven instead of using switch statements. This also makes the
semantics of certain object creation functions consistent so that
we can make better use of function pointers. This also fixes a
theoretical error case where an object allocation fails and we
end up storing NULL into the cache.
This adds some basic tests for the oidmap just to make sure that
collisions, etc. are dealt with correctly.
This also adds some tests for the new caching that check if items
are inserted (or not inserted) properly into the cache, and that
the cache can hold up in a multithreaded environment without error.
This moves most of the refdb stuff over to the include/git2/sys
directory, with some minor shifts in function organization.
While I was making the necessary updates, I also removed the
trailing whitespace in a few files that I modified just because I
was there and it was bugging me.
This moves some of the odb_backend stuff that is related to the
internals of an odb_backend implementation into include/git2/sys.
Some of the stuff related to streaming I left in include/git2
because it seemed like it would be reasonably needed by a normal
user who wanted to stream objects into and out of the ODB.
Also, I added APIs for traversing the list of backends so that
some of the tests would not need to access ODB internals.
It used to be separate as an attempt to make the querying easier, but
it didn't work out that way, so put all the data together.
Add git_refspec_string() as well to get the original string, which is
now stored alongside the independent parts.
Introduce git_remote_{fetch,push}_refspecs() to get a list of refspecs
from the remote and rename the refspec-adding functions to a less
silly name.
Use this instead of the vector index hacks in the tests.
A remote can have a multitude of refspecs. Up to now our git_remote's
have supported a single one for each fetch and push out of simplicity
to get something working.
Let the remotes and internal code know about multiple remotes and get
the tests passing with them.
Instead of setting a refspec, the external users can clear all and add
refspecs. This should be enough for most uses, though we're still
missing a querying function.
This adds a new variant iterator that is a raw filesystem iterator
for scanning directories from a root. There is still more work to
do to blend this with the working directory iterator.
The end of the header is signaled by to consecutive LFs and the commit
message starts immediately after. Jumping over LFs at the start of the
message is a bug and leads to creating different commits if
when rebuilding history.
This also fixes an empty commit message being returned as "\n".
This adds tests for diffs with submodules in them and (perhaps
unsurprisingly) requires further fixes to be made. Specifically,
this fixes:
- when considering if a submodule is dirty in the workdir, it was
being treated as dirty even if only the index was dirty.
- git_diff_patch_to_str (and git_diff_patch_print) were "printing"
the headers for files (and submodules) that were unmodified or
had no meaningful content.
- added comment to previous fix and removed unneeded parens.
The purported command output was already inaccurate, as the refs
aren't where it shows. In any event, the labels a reader of this
file really needs are the indices used in commit_sorting_*, to make
it possible to understand them by referring directly from those
arrays to the diagram rather than from the index arrays, to commit_ids,
to the diagram. Add those.
Signed-off-by: Greg Price <price@mit.edu>
Return the size we'd need to write to instead of simply an
error. Split the function into two to be used later by the upstream
configuration functions.
This started out trying to look at the problems from issue #1425
and gradually grew to a broader set of fixes. There are two core
things fixed here:
1. When you had an ignore like "/bin" which is rooted at the top
of your tree, instead of immediately adding the "bin/" entry
as an ignored item in the diff, we were returning all of the
direct descendants of the directory as ignored items. This
changes things to immediately ignore the directory. Note that
this effects the behavior in test_status_ignore__subdirectories
so that we no longer exactly match core gits ignore behavior,
but the new behavior probably makes more sense (i.e. we now
will include an ignored directory inside an untracked directory
that we previously would have left off).
2. When a submodule only contained working directory changes, the
diff code was always considering it unmodified which was just
an outright bug. The HEAD SHA of the submodule matches the SHA
in the parent repo index, and since the SHAs matches, the diff
code was overwriting the actual status with UNMODIFIED.
These fixes broke existing tests test_diff_workdir__submodules and
test_status_ignore__subdirectories but looking it over, I actually
think the new results are correct and the old results were wrong.
@nulltoken had actually commented on the subdirectory ignore issue
previously.
I also included in the tests some debugging versions of the
shared iteration callback routines that print status or diff
information. These aren't used actively in the tests, but can be
quickly swapped in to test code to give a better picture of what
is being scanned in some of the complex test scenarios.
This option has been sitting unimplemented for a while, so I
finally went through and implemented it along with some tests.
As part of this, I improved the implementation of
GIT_DIFF_IGNORE_SUBMODULES so it be more diligent about avoiding
extra work and about leaving off delta records for submodules to
the greatest extent possible (though it may include them still
if you are request TYPECHANGE records).
This implements working versions of GIT_DIFF_RECURSE_IGNORED_DIRS
and GIT_STATUS_OPT_RECURSE_IGNORED_DIRS along with some tests for
the newly available behaviors. This is not turned on by default
for status, but can be accessed via the options to the extended
version of the command.
This adds crlf/lf conversion functions into buf_text with more
efficient implementations that bypass the high level buffer
functions. They attempt to minimize the number of reallocations
done and they directly write the buffer data as needed if they
know that there is enough memory allocated to memcpy data.
Tests are added for these new functions. The crlf.c code is
updated to use the new functions.
Removed the include of buf_text.h from filter.h and just include
it more narrowly in the places that need it.
This fixes of the file contents checks in checkout to give
slightly better error messages by directly calling the underlying
clar assertions so the file and line number of the top level call
can be reported correctly, and renames the helpers to not start
with "test_" since that is kind of reserved by clar.
This also enables some of the CRLF tests on all platforms that
were previously Windows only (by pushing a check of the native
line endings into the test body).
This fixes some places where the new tests were leaving the test
area in a bad state or were freeing data they should not free.
It also removes code that is extraneous to the core issue and
fixes an invalid SHA being looked up in one of the tests (which
was failing, but for the wrong reason).
Currently, the odb cache has a fixed size of 128 slots as defined by
GIT_DEFAULT_CACHE_SIZE. Allow users to set the size of the cache via
git_libgit2_opts().
Fixes#1035.
This adds a helper function for the cases where you want to
quickly set a single boolean config value for a repository.
This allowed me to remove a lot of code.
This makes the size_t comparison test nicer (assuming that the
values are actually not using the full length), and converts
some cases that were using it for pointer comparison to use the
macro that is designed for pointer comparison.
1. Fix sort order problem with submodules where "mod" was sorting
after "mod-plus" because they were being sorted as "mod/" and
"mod-plus/". This involved pushing the "contains a .git entry"
test significantly lower in the stack.
2. Reinstate behavior that a directory which contains a .git entry
will be treated as a submodule during iteration even if it is
not yet added to the .gitmodules.
3. Now that any directory containing .git is reported as submodule,
we have to be more careful checking for GIT_EEXISTS when we
do a submodule lookup, because that is the error code that is
returned by git_submodule_lookup when you try to look up a
directory containing .git that has no record in gitmodules or
the index.
This switches the APIs for setting and getting the global/system
search paths from using git_strarray to using a simple string with
GIT_PATH_LIST_SEPARATOR delimited paths, just as the environment
PATH variable would contain. This makes it simpler to get and set
the value.
I also added code to expand "$PATH" when setting a new value to
embed the old value of the path. This means that I no longer
require separate actions to PREPEND to the value.
The goal of this work is to expose the search logic for "global",
"system", and "xdg" files through the git_libgit2_opts() interface.
Behind the scenes, I changed the logic for finding files to have a
notion of a git_strarray that represents a search path and to store
a separate search path for each of the three tiers of config file.
For each tier, I implemented a function to initialize it to default
values (generally based on environment variables), and then general
interfaces to get it, set it, reset it, and prepend new directories
to it.
Next, I exposed these interfaces through the git_libgit2_opts
interface, reusing the GIT_CONFIG_LEVEL_SYSTEM, etc., constants
for the user to control which search path they were modifying.
There are alternative designs for the opts interface / argument
ordering, so I'm putting this phase out for discussion.
Additionally, I ended up doing a little bit of clean up regarding
attr.h and attr_file.h, adding a new attrcache.h so the other two
files wouldn't have to be included in so many places.
This fixes a number of issues identified by valgrind - mostly
missed free calls. Inside valgrind, mmap() may fail which causes
some of the diff tests to fail. This adds a fallback code path
to diff_output.c:get_workdir_content() where is the mmap() fails
the code will now try to read the file data directly into allocated
memory (which is what it would do if the data needed to be filtered
anyhow).
This updates the tree iterator internals to be more efficient.
The tree_iterator_entry objects are now kept as pointers that are
allocated from a git_pool, so that we may use git__tsort_r for
sorting (which is better than qsort, given that the tree is
likely mostly ordered already).
Those tree_iterator_entry objects now keep direct pointers to the
data they refer to instead of keeping indirect index values. This
simplifies a lot of the data structure traversal code.
This also adds bsearch to find the start item position for range-
limited tree iterators, and is more explicit about using
git_path_cmp instead of reimplementing it. The git_path_cmp
changed a bit to make it easier for tree_iterators to use it (but
it was barely being used previously, so not a big deal).
This adds a git_pool_free_array function that efficiently frees a
list of pool allocated pointers (which the tree_iterator keeps).
Also, added new tests for the git_pool free list functionality
that was not previously being tested (or used).
This fixes two bugs with the workdir iterator depth check: first
that the depth was not being decremented and second that empty
directories were counting against the depth even though a frame
was not being created for them.
This also fixes a bug with the ENOTFOUND return code for workdir
iterators when you attempt to advance_into an empty directory.
Actually, that works correctly, but it was incorrectly being
propogated into regular advance() calls in some circumstances.
Added new tests for the above that create a huge hierarchy on
the fly and try using the workdir iterator to traverse it.
Given a group of case-insensitively equivalent tree iterator
entries, this ensures that the case-sensitively first trees will
be used as the representative items. I.e. if you have conflicting
entries "A/B/x", "a/b/x", and "A/b/x", this change ensures that
the earliest entry "A/B/x" will be returned. The actual choice
is not that important, but it is nice to have it stable and to
have it been either the first or last item, as opposed to a
random item from within the equivalent span.