This was the first implementation and its goal was simply to have
something that worked. It is slow and now it's just taking up
space. Remove it and switch the one known usage to use the streaming
indexer.
This adds some new tests that actually exercise the similarity
metric between files to detect renames, copies, and split modified
files that are too heavily modified.
There is still more testing to do - these tests are just partially
covering the cases.
There is also one bug fix in this where a change set with only
MODIFY being broken into ADD/DELETE (due to low self-similarity)
without any additional RENAMED entries would end up not processing
the split requests (because the num_rewrites counter got reset).
This is the initial integration of the similarity metric into
the `git_diff_find_similar()` code path. The existing tests all
pass, but the new functionality isn't currently well tested. The
integration does go through the pluggable metric interface, so it
should be possible to drop in an alternative to the internal
metric that libgit2 implements.
This comes along with a behavior change for an existing interface;
namely, passing two NULLs to git_diff_blobs (or passing NULLs to
git_diff_blob_to_buffer) will now call the file_cb parameter zero
times instead of one time. I know it's strange that that change
is paired with this other change, but it emerged from some
initialization changes that I ended up making.
Previously the git_diff_delta recorded if the delta was binary.
This replaces that (with no net change in structure size) with
a full set of flags. The flag values that were already in use
for individual git_diff_file objects are reused for the delta
flags, too (along with renaming those flags to make it clear that
they are used more generally).
This (a) makes things somewhat more consistent (because I was
using a -1 value in the "boolean" binary field to indicate unset,
whereas now I can just use the flags that are easier to understand),
and (b) will make it easier for me to add some additional flags to
the delta object in the future, such as marking the results of a
copy/rename detection or other deltas that might want a special
indicator.
While making this change, I officially moved some of the flags that
were internal only into the private diff header.
This also allowed me to remove a gross hack in rename/copy detect
code where I was overwriting the status field with an internal
value.
This plugs in the three basic similarity strategies for handling
whitespace via internal use of the pluggable API. In so doing, I
realized that the use of git_buf in the hashsig API was not needed
and actually just made it harder to use, so I tweaked that API as
well.
Note that the similarity metric is still not hooked up in the
find_similarity code - this is just setting out the function that
will be used.
The callback will be called for each file, just before the `git_delta_t` gets inserted into the diff list.
When the callback:
- returns < 0, the diff process will be aborted
- returns > 0, the delta will not be inserted into the diff list, but the diff process continues
- returns 0, the delta is inserted into the diff list, and the diff process continues
This adds a `git_diff_patch_line_stats()` API that gets the total
number of adds, deletes, and context lines in a patch. This will
make it a little easier to emulate `git diff --stat` and the like.
Right now, this relies on generating the `git_diff_patch` object,
which is a pretty heavyweight way to get stat information. At
some future point, it would probably be nice to be able to get
this information without allocating the entire `git_diff_patch`,
but that's a much larger project.
This is a convenience function to get the branch name of a given
ref. The returned branch name is compatible with the name that can
be supplied e.g. to git_branch_lookup(). That is, the prefixes
"refs/heads" or "refs/remotes" are omitted.
Also added a new test for testing the new function.
This adds a new external API git_tree_entry_cmp and a new internal
API git_tree_entry_icmp for sorting tree entries. The case
insensitive one is internal only because general users should
never be seeing case-insensitively sorted trees.
This adds an option to checkout a la the diff option to turn off
fnmatch evaluation for pathspec entries. This can be useful to
make sure your "pattern" in really interpretted as an exact file
match only.
All the ODB backends have a specific refresh interface. When reading an
object, first we attempt every single backend: if the read fails, then
we refresh all the backends and retry the read one more time to see if
the object has appeared.
This moves the implementation of these two APIs into common code
that will be shared between the two. Also, this adds tests for
the `git_diff_blob_to_buffer` API. Lastly, this adds some extra
`const` to a few places that can use it.
This moves a lot of the detailed checkout documentation into a new
file (docs/checkout-internals.md) and simplifies the public docs
for the checkout API.
This adds a new API to the submodule interface that just returns
where information about the submodule was found (e.g. config file
only or in the HEAD, index, or working directory).
Also, the old "refresh" call was potentially keeping some stale
submodule data around, so this simplfies that code and literally
discards the old cache, then reallocates.
Make checkout update entries in the index for all files that are
updated and/or removed, unless flag GIT_CHECKOUT_DONT_UPDATE_INDEX
is given. To do this, iterators were extended to allow a little
more introspection into the index being iterated over, etc.
This flips checkout back to be driven off the changes between
the baseline and the target trees. This reinstates the complex
code for tracking the contents of the working directory, but
overall, I think the resulting logic is easier to follow.
I've tried to map out the detailed behaviors of checkout and make
sure that we're handling the various cases correctly, along with
providing options to allow us to emulate "git checkout" and "git
checkout-index" with the various flags. I've thrown away flags
in the checkout API that seemed like clutter and added some new
ones. Also, I've converted the conflict callback to a general
notification callback so we can emulate "git checkout" output and
display "dirty" files.
As of this commit, the new behavior is not working 100% but some
of that is probably baked into tests that are not testing the
right thing. This is a decent snapshot point, I think, along the
way to getting the update done.
The diff constructor functions had some confusing names, where the
"old" side of the diff was coming after the "new" side. This
reverses the order in the function name to make it less confusing.
Specifically...
* git_diff_index_to_tree becomes git_diff_tree_to_index
* git_diff_workdir_to_index becomes git_diff_index_to_workdir
* git_diff_workdir_to_tree becomes git_diff_tree_to_workdir
This removes the need to explicitly pass the repo into iterators
where the repo is implied by the other parameters. This moves
the repo to be owned by the parent struct. Also, this has some
iterator related updates to the internal diff API to lay the
groundwork for checkout improvements.
Moved it into graph.{c,h} which i created for the new "graph"
functions namespace. Also adjusted the function prototype
to use `size_t` and `const git_oid *`.
This fixes some missed places where we can apply const-ness to
various public APIs.
There are still some index and tree APIs that cannot take const
pointers because we sort our `git_vectors` lazily and so we can't
reliably bsearch the index and tree content without applying a
`git_vector_sort()` first.
This also fixes some missed places where size_t can be used and
where const can be applied to a couple internal functions.
This makes the diff functions that take callbacks both take
the payload parameter after the callback function pointers and
pass the payload as the last argument to the callback function
instead of the first. This should make them consistent with
other callbacks across the API.
A number of diff APIs and the `git_checkout_index` API take a
`git_repository` object an operate on the index. This updates
them to take a `git_index` pointer explicitly and only fall back
on the `git_repository` index if the index input is NULL. This
makes it easier to operate on a temporary index.
This fixes a number of warnings and problems with cross-platform
builds. Among other things, it's not safe to name a member of a
structure "strcmp" because that may be #defined.
This is a major reworking of checkout strategy options. The
checkout code is now sensitive to the contents of the HEAD tree
and the new options allow you to update the working tree so that
it will match the index content only when it previously matched
the contents of the HEAD. This allows you to, for example, to
distinguish between removing files that are in the HEAD but not
in the index, vs just removing all untracked files.
Because of various corner cases that arise, etc., this required
some additional capabilities in rmdir and other utility functions.
This includes the beginnings of an implementation of code to read
a partial tree into the index based on a pathspec, but that is
not enabled because of the possibility of creating conflicting
index entries.
This improves docs in some of the public header files, cleans
up and improves some of the example code, and fixes a couple
of pedantic warnings in places.
This adds a new API that allows users to reload the config if the
file has changed on disk. A new config callback function to
refresh the config was added.
The modified time and file size are used to test if the file needs
to be reloaded (and are now stored in the disk backend object).
In writing tests, just using mtime was a problem / race, so I
wanted to check file size as well. To support that, I extended
`git_futils_readbuffer_updated` to optionally check file size in
addition to mtime, and I added a new function `git_filebuf_stats`
to fetch the mtime and size for an open filebuf (so that the
config could be easily refreshed after a write).
Lastly, I moved some similar file checking code for attributes
into filebuf. It is still only being used for attrs, but it
seems potentially reusable, so I thought I'd move it over.
This improves the naming for the rename related functionality
moving it to be called `git_diff_find_similar()` and renaming
all the associated constants, etc. to make more sense.
I also moved the new code (plus the existing `git_diff_merge`)
into a new file `diff_tform.c` where I can put new functions
related to manipulating git diff lists.
This also updates the implementation significantly from the
last revision fixing some ordering issues (where break-rewrite
needs to be handled prior to copy and rename detection) and
improving config option handling.
This adds a `git_diff_patch_print()` API which is more like the
existing API to "print" a patch from an entire `git_diff_list`
but operates on a single `git_diff_patch` object.
Also, it rewrites the `git_diff_patch_to_str()` API to use that
function (making it very small).
This implements the basis for diff rename and copy detection,
although it is based on simple SHA comparison right now instead
of using a matching algortihm. Just as `git_diff_merge` can be
used as a post-pass on diffs to emulate certain command line
behaviors, there is a new API `git_diff_detect` which will
update a diff list in-place, adjusting some deltas to RENAMED
or COPIED state (and also, eventually, splitting MODIFIED deltas
where the change is too large into DELETED/ADDED pairs).
This also adds a new test repo that will hold rename/copy/split
scenarios. Right now, it just has exact-match rename and copy,
but the tests are written to use tree diffs, so we should be able
to add new test scenarios easily without breaking tests.
Added `struct git_config_entry`: a git_config_entry contains the key, the value, and the config file level from which a config element was found.
Added `git_config_open_level`: build a single-level focused config object from a multi-level one.
We are now storing `git_config_entry`s in the khash of the config_file
git_index_read_tree() was exposing a parameter to provide the user with
a progress indicator. Unfortunately, due to the recursive nature of the
tree walk, the maximum number of items to process was unknown. Thus,
the indicator was only counting processed entries, without providing
any information how the number of remaining items.
Introduce git_remote_stop() which sets a variable that is checked by
the fetch process in a few key places. If this is variable is set, the
fetch is aborted.
To answer if a single given file should be ignored, the path to
that file has to be processed progressively checking that there
are no intermediate ignored directories in getting to the file
in question. This enables that, fixing the broken old behavior,
and adds tests to exercise various ignore situations.