Make our overflow checking look more like gcc and clang's, so that
we can substitute it out with the compiler instrinsics on platforms
that support it. This means dropping the ability to pass `NULL` as
an out parameter.
As a result, the macros also get updated to reflect this as well.
Introduce git__reallocarray that checks the product of the number
of elements and element size for overflow before allocation. Also
introduce git__mallocarray that behaves like calloc, but without the
`c`. (It does not zero memory, for those truly worried about every
cycle.)
This makes the index iterator honor the GIT_ITERATOR_IGNORE_CASE
and GIT_ITERATOR_DONT_IGNORE_CASE flags without modifying the
index data itself. To take advantage of this, I had to export a
number of the internal index entry comparison functions. I also
wrote some new tests to exercise the capability.
This improvement the management of the lock around submodule cache
updates slightly, using the lock to make sure that foreach can
safely make a snapshot of all existing submodules and making sure
that git_submodule_add_setup also grabs a lock before inserting
the new submodule. Cache initialization / refresh should already
have been holding the lock correctly as it adds submodules.
This updates the git_pqueue to simply be a set of specialized
init/insert/pop functions on a git_vector.
To preserve the pqueue feature of having a fixed size heap, I
converted the "sorted" field in git_vectors to a more general
"flags" field so that pqueue could mix in it's own flag. This
had a bunch of ramifications because a number of places were
directly looking at the vector "sorted" field - I added a couple
new git_vector helpers (is_sorted, set_sorted) so the specific
representation of this information could be abstracted.
This renames git_vector_free_all to the better git_vector_free_deep
and also contains a couple of memory leak fixes based on valgrind
checks. The fixes are specifically: failure to free global dir
path variables when not compiled with threading on and failure to
free filters from the filter registry that had not be initialized
fully.
This changes the behavior of callbacks so that the callback error
code is not converted into GIT_EUSER and instead we propagate the
return value through to the caller. Instead of using the
giterr_capture and giterr_restore functions, we now rely on all
functions to pass back the return value from a callback.
To avoid having a return value with no error message, the user
can call the public giterr_set_str or some such function to set
an error message. There is a new helper 'giterr_set_callback'
that functions can invoke after making a callback which ensures
that some error message was set in case the callback did not set
one.
In places where the sign of the callback return value is
meaningful (e.g. positive to skip, negative to abort), only the
negative values are returned back to the caller, obviously, since
the other values allow for continuing the loop.
The hardest parts of this were in the checkout code where positive
return values were overloaded as meaningful values for checkout.
I fixed this by adding an output parameter to many of the internal
checkout functions and removing the overload. This added some
code, but it is probably a better implementation.
There is some funkiness in the network code where user provided
callbacks could be returning a positive or a negative value and
we want to rely on that to cancel the loop. There are still a
couple places where an user error might get turned into GIT_EUSER
there, I think, though none exercised by the tests.
There are a lot of places that we call git__free on each item in
a vector and then call git_vector_free on the vector itself. This
just wraps that up into one convenient helper function.
When diff encounters an untracked directory, there was a shortcut
that it took which is not compatible with core git. This makes
the default behavior no longer take that shortcut and instead look
inside the untracked directory to see if there are any untracked
files within it. If there are not, then the directory is treated
as an ignore directory instead of an untracked directory. This
has implications for the git_status APIs.
This fixes some missed places where we can apply const-ness to
various public APIs.
There are still some index and tree APIs that cannot take const
pointers because we sort our `git_vectors` lazily and so we can't
reliably bsearch the index and tree content without applying a
`git_vector_sort()` first.
This also fixes some missed places where size_t can be used and
where const can be applied to a couple internal functions.
This improves the naming for the rename related functionality
moving it to be called `git_diff_find_similar()` and renaming
all the associated constants, etc. to make more sense.
I also moved the new code (plus the existing `git_diff_merge`)
into a new file `diff_tform.c` where I can put new functions
related to manipulating git diff lists.
This also updates the implementation significantly from the
last revision fixing some ordering issues (where break-rewrite
needs to be handled prior to copy and rename detection) and
improving config option handling.
The goal of this work is to rewrite git_status_file to use the
same underlying code as git_status_foreach.
This is done in 3 phases:
1. Extend iterators to allow ranged iteration with start and
end prefixes for the range of file names to be covered.
2. Improve diff so that when there is a pathspec and there is
a common non-wildcard prefix of the pathspec, it will use
ranged iterators to minimize excess iteration.
3. Rewrite git_status_file to call git_status_foreach_ext
with a pathspec that covers just the one file being checked.
Since ranged iterators underlie the status & diff implementation,
this is actually fairly efficient. The workdir iterator does
end up loading the contents of all the directories down to the
single file, which should ideally be avoided, but it is pretty
good.
This is a major reorganization of the diff code. This changes
the diff functions to use the iterators for traversing the
content. This allowed a lot of code to be simplified. Also,
this moved the functions relating to outputting a diff into a
new file (diff_output.c).
This includes a number of other changes - adding utility
functions, extending iterators, etc. plus more tests for the
diff code. This also takes the example diff.c program much
further in terms of emulating git-diff command line options.
This update addresses all of the feedback in pull request #570.
The biggest change was to create actual linked list stacks for
storing the tree and workdir iterator state. This cleaned up
the code a ton. Additionally, all of the static functions had
their 'git_' prefix removed, and a lot of other unnecessary
changes were removed from the original patch.
This create a new git_iterator type of object that provides a
uniform interface for iterating over the index, an arbitrary
tree, or the working directory of a repository.
As part of this, git ignore support was extended to support
push and pop of directory-based ignore files as the working
directory is being traversed (so the array of ignores does
not have to be recreated at each directory during traveral).
There are a number of other small utility functions in buffer,
path, vector, and fileops that are included in this patch
that made the iterator implementation cleaner.
This updates to implementation of gitattribute macros to be much more
similar to core git (albeit not 100%) and to handle expansion of
macros within macros, etc. It also cleans up the refcounting usage
with macros to be much cleaner.
Also, this adds a new vector function `git_vector_insert_sorted()`
which allows you to maintain a sorted list as you go. In order to
write that function, this changes the function `git__bsearch()` to
take a somewhat different set of parameters, although the core
functionality is still the same.
This adds APIs for querying git attributes. In addition to
the new API in include/git2/attr.h, most of the action is in
src/attr_file.[hc] which contains utilities for dealing with
a single attributes file, and src/attr.[hc] which contains
the implementation of the APIs that merge all applicable
attributes files.
1. The license header is technically not valid if it doesn't have a
copyright signature.
2. The COPYING file has been updated with the different licenses used in
the project.
3. The full GPLv2 header in each file annoys me.
Drop the GLibc implementation of Merge Sort and replace it with Timsort.
The algorithm has been tuned to work on arrays of pointers (void **),
so there's no longer a need to abstract the byte-width of each element
in the array.
All the comparison callbacks now take pointers-to-elements, not
pointers-to-pointers, so there's now one less level of dereferencing.
E.g.
int index_cmp(const void *a, const void *b)
{
- const git_index_entry *entry_a = *(const git_index_entry **)(a);
+ const git_index_entry *entry_a = (const git_index_entry *)(a);
The result is up to a 40% speed-up when sorting vectors. Memory usage
remains lineal.
A new `bsearch` implementation has been added, whose callback also
supplies pointer-to-elements, to uniform the Vector API again.
The routine remove duplictes from the vector. Only the last added element
of elements with equal keys remains in the vector.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Index operation use git_vector_sort() to sort index entries. Since index
support adding duplicates (two or more entries with the same path), it's
important to preserve order of elements. Preserving order of elements
allows to make decisions based on order. For example it's possible to
implement function witch removes all duplicates except last added.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
All `git_object` instances looked up from the repository are reference
counted. User is expected to use the new `git_object_close` when an
object is no longer needed to force freeing it.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
We now store only one sorting callback that does entry comparison. This
is used when sorting the entries using a quicksort, and when looking for
a specific entry with the new search methods.
The following search methods now exist:
git_vector_search(vector, entry)
git_vector_search2(vector, custom_search_callback, key)
git_vector_bsearch(vector, entry)
git_vector_bsearch2(vector, custom_search_callback, key)
The sorting state of the vector is now stored internally.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
These two reference types are now stored separately to eventually allow
the removal/renaming of loose references and rewriting of the refs
packfile.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
- remove() would read one-past array bounds.
- resize() would fail if the initial size was 1, because it multiplied by 1.75
and truncated the resulting value. The buffer would always remain at size 1,
but elements would repeatedly be appended (via insert()) causing a crash.
The types in the git_index_entry struct are now system-defaults, and get
truncated to uint32_t's when written back on the index.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All the operations on the 'git_index_entry' array and the
'git_tree_entry' array have been refactored into common code in the
src/vector.c file.
The new vector methods support:
- insertion: O(1) (avg)
- deletion: O(n)
- searching: O(logn)
- sorting: O(logn)
- r. access: O(1)
Signed-off-by: Vicent Marti <tanoku@gmail.com>