This adds three new public APIs for manipulating the index:
1. `git_index_add_all` is similar to `git add -A` and will add
files in the working directory that match a pathspec to the
index while honoring ignores, etc.
2. `git_index_remove_all` removes files from the index that match
a pathspec.
3. `git_index_update_all` updates entries in the index based on
the current contents of the working directory, either added
the new information or removing the entry from the index.
This makes the diff rename tracking code more careful about the
order in which it processes renames and more thorough in updating
the mapping of correct renames when an earlier rename update
alters the index of a later matched pair.
This adds parameters to the four functions that allow for blob-to-
blob and blob-to-buffer differencing (either via callbacks or by
making a git_diff_patch object). These parameters let you say
that filename we should pretend the blob has while doing the diff.
If you pass NULL, there should be no change from the existing
behavior, which is to skip using attributes for file type checks
and just look at content. With the parameters, you can plug into
the new diff driver functionality and get binary or non-binary
behavior, plus function context regular expressions, etc.
This commit also fixes things so that the git_diff_delta that is
generated by these functions will actually be populated with the
data that we know about the blobs (or buffers) so you can use it
appropriately. It also fixes a bug in generating patches from
the git_diff_patch objects created via these functions.
Lastly, there is one other behavior change that may matter. If
there is no difference between the two blobs, these functions no
longer generate any diff callbacks / patches unless you have
passed in GIT_DIFF_INCLUDE_UNMODIFIED. This is pretty natural,
but could potentially change the behavior of existing usage.
A tree to index rename with no changes was getting erased by
the iteration routine (if the routine actually loaded the data
for the unmodified file). This invokes the code path that was
previously messing up the diff and iterates twice to make sure
that the iteration process itself doesn't modify the data.
This changes the behavior of the status RENAMED flags so that they
will be combined with the MODIFIED flags if appropriate. If a file
is modified in the index and also renamed, then the status code
will have both the GIT_STATUS_INDEX_MODIFIED and INDEX_RENAMED bits
set. If it is renamed but the OID has not changed, then just the
GIT_STATUS_INDEX_RENAMED bit will be set. Similarly, the flags
GIT_STATUS_WT_MODIFIED and GIT_STATUS_WT_RENAMED can both be set
independently of one another.
This fixes a serious bug where the check for unmodified files that
was done at data load time could end up erasing the RENAMED state
of a file that was renamed with no changes.
Lastly, this contains a bunch of new tests for status with renames,
including tests where the only rename changes are case changes.
The expected results of these tests have to vary by whether the
platform uses a case sensitive filesystem or not, so the expected
data covers those platform differences separately.
In a case insensitive index, if you attempt to add a file from
disk with a different case pattern, the old case pattern in the
index should be preserved.
This fixes that (and a couple of minor warnings).
This commit reinstates some changes to git_diff__paired_foreach
that were discarded during the rebase (because the diff_output.c
file had gone away), and also adjusts the case insensitively
logic slightly to hopefully deal with either mismatched icase
diffs and other case insensitivity scenarios.
This adds two new public APIs: git_diff_patch_from_blobs and
git_diff_patch_from_blob_and_buffer, plus it refactors the code
for git_diff_blobs and git_diff_blob_to_buffer so that they code
is almost entirely shared between these APIs, and adds tests for
the new APIs.
This implements the loading of regular expression pattern lists
for diff drivers that search for function context in that way.
This also changes the way that diff drivers update options and
interface with xdiff APIs to make them a little more flexible.
There are all sorts of misconfiguration in the wild. We already rely
on the signature constructor to trim SP. Extend the logic to use
`isspace` to decide whether a character should be trimmed.
This is a significant reorganization of the diff code to break it
into a set of more clearly distinct files and to document the new
organization. Hopefully this will make the diff code easier to
understand and to extend.
This adds a new `git_diff_driver` object that looks of diff driver
information from the attributes and the config so that things like
function content in diff headers can be provided. The full driver
spec is not implemented in the commit - this is focused on the
reorganization of the code and putting the driver hooks in place.
This also removes a few #includes from src/repository.h that were
overbroad, but as a result required extra #includes in a variety
of places since including src/repository.h no longer results in
pulling in the whole world.
1. internal iterators now return GIT_ITEROVER when you go past the
last item in the iteration.
2. git_iterator_advance will "advance" to the first item in the
iteration if it is called immediately after creating the
iterator, which allows a simpler idiom for basic iteration.
3. if git_iterator_advance encounters an error reading data (e.g.
a missing tree or an unreadable file), it returns the error
but also attempts to advance past the invalid data to prevent
an infinite loop.
Updated all tests and internal usage of iterators to account for
these new behaviors.
This adds ~/ prefix expansion for the value of core.attributesfile
and core.excludesfile, plus it fixes the fact that the attributes
cache was holding on to the string data from the config for a long
time (instead of making its own strdup) which could have caused a
problem if the config was refreshed. Adds a test for the new
expansion capability.
This extends the rename tests to make sure that every rename
scenario in the inner loop of git_diff_find_similar is actually
exercised. Also, fixes an incorrect assert that was in one of
the clauses that was not previously being exercised.
This adds a couple more tests of different rename scenarios.
Also, this fixes a problem with the case where you have two
"split" deltas and the left half of one matches the right half of
the other. That case was already being handled, but in the wrong
order in a way that could result in bad output. Also, if the swap
also happened to put the other two halves into the correct place
(i.e. two files exchanged places with each other), then the second
delta was left with the SPLIT flag set when it really should be
cleared.
This flips rename detection around so instead of creating a
forward mapping from deltas to possible rename targets, instead
it creates a reverse mapping, looking at possible targets and
trying to find a source that they could have been renamed or
copied from. This is important because each output can only
have a single source, but a given source could map to multiple
outputs (in the form of COPIED records).
Additionally, this makes a couple of tweaks to the public rename
detection APIs, mostly renaming a couple of options that control
the behavior to make more sense and to be more like core Git.
I walked through the tests looking at the exact results and
updated the expectations based on what I saw. The new code is
different from the old because it cannot give some nonsense
results (like A was renamed to both B and C) which were part of
the outputs previously.
This adds a bunch more rename detection tests including checks
vs the working directory, the new exact match options, some more
whitespace variants, etc.
This also adds a git_futils_writebuffer helper function and uses
it in checkout. This is mainly added because I wanted an easy
way to write out a git_buf to disk inside my test code.
There are a number of bugs in the rename code that only were
obvious when I started testing it against large old repos with
more complex patterns. (The code to do that testing is not ready
to merge with libgit2, but I do plan to add more thorough tests.)
This contains a significant number of changes and also tweaks the
public API slightly to make emulating core git easier.
Most notably, this separates the GIT_DIFF_FIND_AND_BREAK_REWRITES
flag into FIND_REWRITES (which adds a self-similarity score to
every modified file) and BREAK_REWRITES (which splits the modified
deltas into add/remove pairs in the diff list). When you do a raw
output of core git, rewrites show up as M090 or such, not at A and
D output, so I wanted to be able to emulate that.
Publicly, this also changes the flags to be uint16_t since we
don't need values out of that range.
Internally, this contains significant changes from a number of
small bug fixes (like using the wrong side of the diff to decide
if the object could be found in the ODB vs the workdir) to larger
issues about which files can and should be compared and how the
various edge cases of similarity scores should be treated.
Honestly, I don't think this is the last update that will have to
be made to this code, but I think this moves us closer to correct
behavior and I tried to document the code so it would be easier
to follow..
I frequently want to the the first N digits of an OID formatted
as a string and I'd like it to be efficient. This function makes
that easy and I could rewrite the OID formatters in terms of it.
Expose a way to retrieve, along with the target git_object, the reference
pointed at by some revparse expression (`@{<-n>}` or
`<branchname>@{upstream}` syntax).
When the last item in a diff was an untracked directory that only
contained ignored items, the loop to scan the contents would run
off the end of the iterator and dereference a NULL pointer. This
includes a test that reproduces the problem and a fix.
There was a problem found in the Rugged test suite where the
refdb_fs_backend__next function could exit too early in some
very specific hashing patterns for packed refs. This ports
the Rugged test to libgit2 and then fixes the bug.
Nobody should ever be using anything other than ALL at this level, so
remove the option altogether.
As part of this, git_reference_foreach_glob is now implemented in the
frontend using an iterator. Backends will later regain the ability of
doing the glob filtering in the backend.
If you use rename detection, the renamed and copied files would
not show any text diffs because the function that decides if
data should be loaded didn't know which sides of the diff to
load for those cases.
This adds a test that looks at the patch generated for diff
entries that are COPIED or RENAMED.
The git_status_file API was doing a hack to deal with files that
are inside ignored directories. The status scan was not reporting
any file in this case, so git_status_file would attempt a final
"stat()" call, and return IGNORED if the file actually existed.
On case-insensitive filesystems where core.ignorecase is set
incorrectly, this magic check can "succeed" and report a file
as ignored when it should actually return ENOTFOUND.
Now that we have the GIT_STATUS_OPT_RECURSE_IGNORED_DIRS, we can
use that flag to make sure that git_status_file() will look into
ignored directories and eliminate the hack completely, so we give
the correct error.
This clarifies the docs for git_repository_message and also adds
to the tests to explicitly check NUL termination of data when the
output buffer is smaller than the message size. There is a minor
behavior change so that a non-NULL output buffer will always be
NUL terminated (at length zero) if an error occurs.
When a repository is initialised, we need to probe to see if there is
a global config to load. If this is not the case, the user isn't able
to write to the global config without creating the backend and adding
it themselves, which is inconvenient and overly complex.
Unconditionally create and add a backend for the global config file
regardless of whether it exists as a convenience for users.
To enable this, we allow creating backends to files that do not exist
yet, changing the semantics somewhat, and making some tests invalid.
When tagopt is set to '--tags', we should only take the default tags
refspec into account and ignore any configured ones.
Bring the code into compliance.
When a patch contained an eofnl change (i.e. the last line either
gained or lost a newline), the oldno and newno line number values
for the lines in the last hunk of the patch were not useful. This
makes them behave in a more expected manner.
This adds a new line origin constant for the special line that
is used when both files end without a newline.
In the course of writing the tests for this, I was having problems
with modifying a file but not having diff notice because it was
the same size and modified less than one second from the start of
the test, so I decided to start working on nanosecond timestamp
support. This commit doesn't contain the nanosecond support, but
it contains the reorganization of maybe_modified and the hooks so
that if the nanosecond data were being read by stat() (or rather
being copied by git_index_entry__init_from_stat), then the nsec
would be taken into account.
This new stuff could probably use some more tests, although there
is some amount of it here.
Currently git_branch_set_upstream when passed a local branch
creates invalid configuration, for ex. if we setup branch
'tracking_master' to track local 'master' libgit2 generates
the following config
```
[branch "track_master"]
remote = .
merge = .refs/heads/track_master
```
The merge value is invalid and calling git_branch_upstream on
'tracking_master' results in invalid reference error.
It should do:
```
[branch "track_master"]
remote = .
merge = refs/heads/master
```
Older versions of git would only write peeled entries for
items under refs/tags/. Newer versions will write them for
all refs, and we should be prepared to handle that.
When diff encounters an untracked directory, there was a shortcut
that it took which is not compatible with core git. This makes
the default behavior no longer take that shortcut and instead look
inside the untracked directory to see if there are any untracked
files within it. If there are not, then the directory is treated
as an ignore directory instead of an untracked directory. This
has implications for the git_status APIs.
Add a new git_oid_strcmp that compares a string OID with a hex
oid for sort order, and then reimplement git_oid_streq using it.
This actually should speed up git_oid_streq because it only reads
as far into the string as it needs to, whereas previously it would
convert the whole string into an OID and then use git_oid_cmp.
git_oid_ncmp was making some assumptions about the length of
the data - this shifts the check to the top of the loop so it
will work more robustly, limits the max, and adds some tests
to verify the functionality.
This makes diff use the cvar cache for config options where
possible, and also adds support for a number of other config
options to diff including "diff.context", "diff.ignoreSubmodules",
"diff.noprefix", "diff.mnemonicprefix", and "core.abbrev".
To make this natural, this involved a rearrangement of the code
that allocates the diff object vs. the code that initializes it
based on the combination of options passed in by the user and
read from the config.
This commit includes tests for most of these new options as well.
The indexer was creating a packfile object separately from the
code in pack.c which was a problem since I put a call to
git_mutex_init into just pack.c. This commit updates the pack
function for creating a new pack object (i.e. git_packfile_check())
so that it can be used in both places and then makes indexer.c
use the shared initialization routine.
There are also a few minor formatting and warning message fixes.
This adds create and free callback to the git_objects_table so
that more of the creation and destruction of objects can be table
driven instead of using switch statements. This also makes the
semantics of certain object creation functions consistent so that
we can make better use of function pointers. This also fixes a
theoretical error case where an object allocation fails and we
end up storing NULL into the cache.
This adds some basic tests for the oidmap just to make sure that
collisions, etc. are dealt with correctly.
This also adds some tests for the new caching that check if items
are inserted (or not inserted) properly into the cache, and that
the cache can hold up in a multithreaded environment without error.
This moves most of the refdb stuff over to the include/git2/sys
directory, with some minor shifts in function organization.
While I was making the necessary updates, I also removed the
trailing whitespace in a few files that I modified just because I
was there and it was bugging me.
This moves some of the odb_backend stuff that is related to the
internals of an odb_backend implementation into include/git2/sys.
Some of the stuff related to streaming I left in include/git2
because it seemed like it would be reasonably needed by a normal
user who wanted to stream objects into and out of the ODB.
Also, I added APIs for traversing the list of backends so that
some of the tests would not need to access ODB internals.
It used to be separate as an attempt to make the querying easier, but
it didn't work out that way, so put all the data together.
Add git_refspec_string() as well to get the original string, which is
now stored alongside the independent parts.
Introduce git_remote_{fetch,push}_refspecs() to get a list of refspecs
from the remote and rename the refspec-adding functions to a less
silly name.
Use this instead of the vector index hacks in the tests.
A remote can have a multitude of refspecs. Up to now our git_remote's
have supported a single one for each fetch and push out of simplicity
to get something working.
Let the remotes and internal code know about multiple remotes and get
the tests passing with them.
Instead of setting a refspec, the external users can clear all and add
refspecs. This should be enough for most uses, though we're still
missing a querying function.
This adds a new variant iterator that is a raw filesystem iterator
for scanning directories from a root. There is still more work to
do to blend this with the working directory iterator.