We need this from util.h and posix.h, but the latter includes common.h
which includes util.h, which means p_strlen is not defined by the time
we get to git__strndup().
Split the definition of p_strlen() off into its own header so we can use
it in util.h.
The current code issues a lot of strncmp() calls in order to check for
the end of the header, simply in order to copy it and start going
through it again. These are a lot of calls for something we can check as
we go along. Knowing the number of parents beforehand to reduce
allocations in extreme cases does not make up for them.
Instead start parsing immediately and check for the double-newline after
each header field, leaving the raw_header allocation for the end, which
lets us go through the header once and reduces the number of strncmp()
calls significantly.
In unscientific testing, this has reduced a shortlog-like usage (walking
through the whole history of a branch and extracting data from the
commits) of git.git from ~830ms to ~700ms and makes the time we spend in
strncmp() negligible.
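A minimal sketch of the single-pass idea, assuming a raw commit buffer
whose header ends at the first empty line. The struct and function names
are made up for illustration and are not the ones used in the commit
parser:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for this sketch; not libgit2's git_commit. */
struct header_info {
	size_t parent_count; /* number of "parent " fields seen */
	size_t header_len;   /* bytes before the empty line ending the header */
};

/*
 * Walk the header one field at a time. After each field, an immediately
 * following '\n' is the empty line that terminates the header, so the end
 * is found during the single pass instead of by scanning ahead first.
 * Returns 0 on success, -1 if the header is never terminated.
 */
static int parse_header_once(const char *buf, size_t len, struct header_info *out)
{
	const char *p = buf, *end = buf + len;

	out->parent_count = 0;
	out->header_len = 0;

	while (p < end) {
		const char *eol = memchr(p, '\n', (size_t)(end - p));
		if (eol == NULL)
			return -1; /* unterminated field */

		if ((size_t)(eol - p) >= 7 && memcmp(p, "parent ", 7) == 0)
			out->parent_count++;

		p = eol + 1; /* start of the next field */

		if (p < end && *p == '\n') {
			/* covers every field, including its trailing newline */
			out->header_len = (size_t)(p - buf);
			return 0;
		}
	}
	return -1;
}
```

Only once header_len is known would the raw header copy be allocated,
which is the ordering described above.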
Let the user push committish objects and peel them to figure out which
commit to push to our queue.
This is for convenience and for allowing uses of
git_revwalk_push_glob(w, "tags")
with annotated tags.
Change the name to _matching() instead of _if(), and force _set_target()
to be a conditional update. If the user doesn't care about the old
value, they should use git_reference_create().
This tweaks the pqueue_up and pqueue_down routines so that they
will not do full element swaps but instead carry over the state
of the previous loop iteration and only assign elements for which
we know the final position. This will avoid a little bit of data
assignment which should improve performance in theory.
Also got rid of some vector helpers that I'm no longer using.
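To illustrate the "carry the element instead of swapping" idea, here is a
sketch on a plain min-heap of ints. The names heap_up and heap_down are
placeholders for this example and the real routines work on void
pointers with a comparison callback:

```c
#include <stddef.h>

/* Move the element at pos towards the root. Parents are shifted down into
 * the hole and the carried item is written exactly once at the end. */
static void heap_up(int *heap, size_t pos)
{
	int item = heap[pos];

	while (pos > 0) {
		size_t parent = (pos - 1) / 2;
		if (heap[parent] <= item)
			break;
		heap[pos] = heap[parent]; /* one assignment, not a swap */
		pos = parent;
	}
	heap[pos] = item;
}

/* Move the element at pos towards the leaves, same hole technique. */
static void heap_down(int *heap, size_t len, size_t pos)
{
	int item = heap[pos];

	for (;;) {
		size_t kid = 2 * pos + 1;
		if (kid >= len)
			break;
		if (kid + 1 < len && heap[kid + 1] < heap[kid])
			kid++; /* pick the smaller child */
		if (item <= heap[kid])
			break;
		heap[pos] = heap[kid];
		pos = kid;
	}
	heap[pos] = item;
}
```

Each loop iteration does one assignment where a swap would do three,
which is where the small theoretical saving comes from.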
This fixes a typo I made for setting the sorted flag on the index
after a reload. That typo didn't actually cause any test failures
so I'm also adding a test that explicitly checks that the index is
correctly sorted after a reload when ignoring case and when not.
This updates the git_pqueue to simply be a set of specialized
init/insert/pop functions on a git_vector.
To preserve the pqueue feature of having a fixed size heap, I
converted the "sorted" field in git_vectors to a more general
"flags" field so that pqueue could mix in its own flag. This
had a bunch of ramifications because a number of places were
directly looking at the vector "sorted" field - I added a couple
new git_vector helpers (is_sorted, set_sorted) so the specific
representation of this information could be abstracted.
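Roughly, the flags approach looks like the sketch below. The struct, the
flag names and the helper names are simplified stand-ins chosen for this
example, not the actual git_vector definitions:

```c
#include <stdbool.h>
#include <stdint.h>

#define VEC_FLAG_SORTED    (1u << 0)
/* Hypothetical flag owned by a wrapper such as a priority queue. */
#define PQ_FLAG_FIXED_SIZE (1u << 1)

struct vec {
	uint32_t flags; /* replaces a dedicated "sorted" field */
};

/* Callers ask through helpers and never look at the bits directly. */
static bool vec_is_sorted(const struct vec *v)
{
	return (v->flags & VEC_FLAG_SORTED) != 0;
}

static void vec_set_sorted(struct vec *v, bool sorted)
{
	if (sorted)
		v->flags |= VEC_FLAG_SORTED;
	else
		v->flags &= ~VEC_FLAG_SORTED; /* leaves other flags alone */
}
```

Because only the helpers touch the sorted bit, a wrapper's own flag
survives any number of sort state changes.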
I accidentally wrote a separate priority queue implementation when
I was working on file rename detection as part of the file hash
signature calculation code. To simplify licensing terms, I just
adapted that to a general purpose priority queue and replaced the
old priority queue implementation that was borrowed from elsewhere.
This also removes parts of the COPYING document that no longer
apply to libgit2.
Validating the workdir should not compare HEAD to working
directory - this is both inefficient (as it ignores the cache)
and incorrect. If we had legitimately allowed changes in the
index (identical to the merge result) then comparing HEAD to
workdir would reject these changes as different. Further, this
will identify files that were filtered strangely as modified,
while testing with the cache would prevent this.
Also, it's stupid slow.
The checkout code used to defer removal of "blocking" files in
checkouts until the blocked item was actually being written (since
we have already checked that removing the block is acceptable
according to the update rules). Unfortunately, this resulted in
an intermediate index state where both the blocking and new items
were in the index which is no longer allowed. Now we just remove
the blocking item in the first pass so it never needs to coexist.
In cases where there are typechanges, this could result in a bit
more churn of removing and recreating intermediate directories,
but I'm going to assume that is an unusual case and the churn will
not be too costly.
There were some confusing issues mixing up the number of bytes
written to the zstream output buffer with the number of bytes
consumed from the zstream input. This reorganizes the zstream
API and makes it easier to deflate an arbitrarily large input
while still using a fixed size output.
This removes the fetchRecurse compiler warnings and makes the
behavior match the other submodule options (i.e. the in-memory
setting can be reset to the on-disk value).
When three-way merging indexes, we previously changed each path
as we read them, which would lead to us adding an index entry for
'foo', then removing an index entry for 'foo/file'. With the new
index requirements, this is not allowed. Removing entries from the
merged index before adding new ones resolves this. In the previous
example, we now remove 'foo/file' before adding 'foo'.
In case insensitive index mode, we would stop at a prefixed entry,
treating the provided search key length as a substring, not the
length of the string to match.
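The distinction can be sketched as follows. This is an illustration with
a made-up function name and ASCII-only case folding, not the comparison
routine the index actually uses:

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Compare the first key_len bytes of key against the WHOLE of path,
 * ignoring ASCII case. Returns 0 only when path is exactly key_len long
 * and matches; a path that merely starts with the key is not a match.
 * Assumes key has no NUL byte within its first key_len bytes.
 */
static int cmp_key_icase(const char *key, size_t key_len, const char *path)
{
	size_t i;

	for (i = 0; i < key_len; i++) {
		int a = tolower((unsigned char)key[i]);
		int b = tolower((unsigned char)path[i]);
		if (a != b || b == 0)
			return a - b; /* also stops when path ends early */
	}

	/* Key exhausted: equal only if path ends here as well. */
	return path[key_len] == '\0' ? 0 : -1;
}
```

A plain length-limited comparison would stop after the loop and report a
match for "foobar" when searching for "foo", which is the bug described
above.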
Writing a sample Javascript driver pointed out some extra
whitespace handling that needed to be done in the diff driver.
This adds some tests with some sample javascript code that I
pulled off of GitHub just to see what would happen. Also, to
clean up the userdiff test data, I did a "git gc" and packed
up the test objects.
Since I don't have permission yet on the code from Git, I decided
I'd take a stab at writing patterns for PHP and Javascript myself.
I think these are pretty weak, but probably better than the
default behavior without them.
I contacted a number of Git authors and lined up their permission
to relicense their work for use in libgit2 and copied over their
code for diff driver xfuncname patterns. At this point, the code
I've copied is taken verbatim from core Git although Thomas Rast
warned me that the C++ patterns, at least, really need an update.
I've left off patterns where I don't feel like I have permission
at this point until I hear from more authors.
Reorganize the builtin driver table slightly so that core Git
builtin definitions can be imported verbatim. Then take a few of
the core Git drivers and pull them in.
This also creates a test of diffs with the builtin HTML driver
which led to some small error handling fixes in the driver
selection logic.
This extends the diff driver parser to support multiline driver
definitions along with ! prefixing for negated matches. This
brings the driver function pattern parsing in line with core Git.
This also adds an internal table of driver definitions and a
fallback code path that will look in that table for diff drivers
that are set with attributes without having a definition in the
config file. Right now, I just populated the table with a kind
of simple HTML definition that is similar to the core Git def.
Don't try to determine whether the system supports file modes
when putting the tree data in the index during checkout. The tree's
mode is canonical and did not come from stat(2) in the first place.
It's hard or even impossible to correctly free the string buffer
allocated by git_patch_to_str in some circumstances. Drop the function
so people have to use git_patch_to_buf instead - git_buf has a dedicated
destructor.
Returning library-allocated strings from libgit2 works fine on Linux,
but may cause problems on Windows because there is no one C Runtime that
everything links against. With libgit2 not exposing its own allocator,
freeing the string is a gamble.
git_patch_to_str already serializes to a buffer, then returns the
underlying memory. Expose the functionality directly, so callers can use
the git_buf_free function to free the memory later.
The "merge none" (don't automerge) flag was only to aid in
merge trivial tests. We can easily determine whether merge
trivial resulted in a trivial merge or an automerge by examining
the REUC after automerge has completed.
The default merge_file level was XDL_MERGE_MINIMAL, which will
produce conflicts where there should be none in the case where
both sides were changed identically. Change the default to the
more aggressive XDL_MERGE_ZEALOUS, which compresses non-conflicts
further. This matches git.git's defaults.
Increase testing around reverting a previously reverted commit to
illustrate this problem.
Any well-behaved program should write a descriptive message to the
reflog whenever it updates a reference. Let's make this more prominent
by removing the version without the reflog parameters.
This changes git_signature_dup to actually honor oom conditions raised by
the call to git__strdup. It also aligns it with the error code return
pattern used everywhere else.
Ok, scrap the previous commit. This is the right overflow check that
takes care of 64-bit overflow **and** 32-bit overflow, which needs to be
considered because the pool malloc can only allocate 32-bit elements in
one go.
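One way to express such a check is sketched below. It is an assumption
about the shape of the check, not a copy of the one in the pool code,
and pool_alloc32 is a hypothetical stand-in for an allocator that takes
a 32-bit size:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a pool allocator limited to 32-bit sizes. */
static void *pool_alloc32(uint32_t size)
{
	return malloc(size);
}

static char *checked_strndup(const char *str, size_t n)
{
	char *out;

	/*
	 * n + 1 bytes are needed. Rejecting n >= UINT32_MAX covers both the
	 * case where n + 1 wraps around in size_t and the case where it is
	 * representable in size_t but does not fit in 32 bits.
	 */
	if (n >= UINT32_MAX)
		return NULL;

	out = pool_alloc32((uint32_t)(n + 1));
	if (out != NULL) {
		memcpy(out, str, n);
		out[n] = '\0';
	}
	return out;
}
```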
Note that `git_pool_strdup` cannot really return any error codes,
because the pool doesn't set errors on OOM.
The only place where `giterr_set_oom` is called is in
`git_pool_strndup`, in a conditional check that is always optimized
away. `n + 1` cannot be zero if `n` is unsigned because the compiler
doesn't take wraparound into account.
This check has been removed altogether because `size_t` is not
likely to overflow in practice.
This renames git_vector_free_all to the better git_vector_free_deep
and also contains a couple of memory leak fixes based on valgrind
checks. The fixes are specifically: failure to free global dir
path variables when not compiled with threading on and failure to
free filters from the filter registry that had not been initialized
fully.
This adds tests that try canceling an indexer operation from
within the progress callback.
After writing the tests, I wanted to run this under valgrind and
had a number of errors in that situation because mmap wasn't
working. I added a CMake option to force emulation of mmap and
consolidated the Amiga-specific code into that new place (so we
don't actually need separate Amiga code now, just have to turn on
-DNO_MMAP).
Additionally, I made the indexer code propagate error codes more
reliably than it used to.
Clone callbacks can return non-zero values to cancel the clone.
This adds some tests to verify that this actually works and updates
the documentation to be clearer that this can happen and that the
return value will be propagated back by the clone function.
The checkout notify callback behavior on non-zero return values
was not being tested. This adds tests, fixes a bug with positive
values, and clarifies the documentation to make it clear that the
checkout can be canceled via this mechanism.
The callback to supply data chunks could return a negative value
to stop creation of the blob, but we were neither using GIT_EUSER
nor propagating the return value. This makes things use the new
behavior of returning the negative value back to the user.
This changes the behavior of callbacks so that the callback error
code is not converted into GIT_EUSER and instead we propagate the
return value through to the caller. Instead of using the
giterr_capture and giterr_restore functions, we now rely on all
functions to pass back the return value from a callback.
To avoid having a return value with no error message, the user
can call the public giterr_set_str or some such function to set
an error message. There is a new helper 'giterr_set_callback'
that functions can invoke after making a callback which ensures
that some error message was set in case the callback did not set
one.
In places where the sign of the callback return value is
meaningful (e.g. positive to skip, negative to abort), only the
negative values are returned back to the caller, obviously, since
the other values allow for continuing the loop.
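As a sketch of that convention, with an invented foreach function and
callback type (nothing here is an actual libgit2 signature):

```c
#include <stddef.h>

/* Hypothetical callback: 0 accepts the item, > 0 skips it, < 0 aborts. */
typedef int (*item_cb)(int item, void *payload);

static int foreach_item(const int *items, size_t count,
	item_cb cb, void *payload, size_t *accepted)
{
	size_t i;

	*accepted = 0;

	for (i = 0; i < count; i++) {
		int error = cb(items[i], payload);

		if (error < 0)
			return error; /* passed back as-is, not mapped to a generic code */
		if (error > 0)
			continue; /* positive only means "skip"; keep looping */

		(*accepted)++;
	}
	return 0;
}

/* Example callback: skip zeros, abort with -42 on a negative item. */
static int example_cb(int item, void *payload)
{
	(void)payload;
	if (item < 0)
		return -42;
	return item == 0 ? 1 : 0;
}
```

A caller that gets -42 back knows it is its own value and can act on it
directly.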
The hardest parts of this were in the checkout code where positive
return values were overloaded as meaningful values for checkout.
I fixed this by adding an output parameter to many of the internal
checkout functions and removing the overload. This added some
code, but it is probably a better implementation.
There is some funkiness in the network code where user provided
callbacks could be returning a positive or a negative value and
we want to rely on that to cancel the loop. There are still a
couple of places where a user error might get turned into GIT_EUSER
there, I think, though none exercised by the tests.
There are a lot of places that we call git__free on each item in
a vector and then call git_vector_free on the vector itself. This
just wraps that up into one convenient helper function.
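In spirit, the helper does something like the following. This is a
simplified sketch on a made-up pointer vector using plain free(), not
the git_vector code itself:

```c
#include <stdlib.h>

/* Simplified stand-in for a vector of owned pointers. */
struct ptr_vec {
	void **contents;
	size_t length;
};

/* Free every item, then the vector's own storage, in one call. */
static void ptr_vec_free_deep(struct ptr_vec *v)
{
	size_t i;

	if (v == NULL)
		return;

	for (i = 0; i < v->length; i++)
		free(v->contents[i]);

	free(v->contents);
	v->contents = NULL; /* leave the vector safe to free again */
	v->length = 0;
}
```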
This continues auditing all the places where GIT_EUSER is being
returned and making sure to clear any existing error using the
new giterr_user_cancel helper. As a result, places that relied
on intercepting GIT_EUSER but having the old error preserved also
needed to be cleaned up to correctly stash and then retrieve the
actual error.
Additionally, as I encountered places where error codes were not
being propagated correctly, I tried to fix them up. A number of
those fixes are included in this commit as well.
This adds giterr_user_cancel to return GIT_EUSER and clear any
error message that is sitting around. As a result of using that
in places, we need to be more thorough with capturing errors that
happen inside a callback when used internally. To help with that,
this also adds giterr_capture and giterr_restore so that when we
internally use a foreach-type function that clears errors and
converts them to GIT_EUSER, it is easier to restore not just the
return value, but the actual error message text.
This adds `git_config__lookup_entry` which will look up a key in
a config and return either the entry or NULL if the key was not
present. Optionally, it can either suppress all errors or can
return them (although not finding the key is not an error for this
function). Unlike other accessors, this does not normalize the
config key string, so it must only be used when the key is known
to be in normalized form (i.e. all lower-case before the first dot
and after the last dot, with no invalid characters).
This also adds three high-level helper functions to look up config
values with no errors and a fallback value. The three functions
are for string, bool, and int values, and will resort to the
fallback value for any error that arises. They are:
* `git_config__get_string_force`
* `git_config__get_bool_force`
* `git_config__get_int_force`
None of them normalize the config `key` either, so they can only
be used for internal cases where the key is known to be in normal
format.
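The lookup-with-fallback pattern can be sketched like this. The entry
struct, the function names and the two-value bool parsing are
simplifications for illustration. They are not the signatures of the
helpers listed above, and real config bool parsing accepts more
spellings:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory config entry for this sketch. */
struct cfg_entry {
	const char *name;  /* assumed to be in normalized form already */
	const char *value;
};

/* Return the entry, or NULL if absent. No key normalization is done. */
static const struct cfg_entry *cfg_lookup(
	const struct cfg_entry *entries, size_t count, const char *key)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (strcmp(entries[i].name, key) == 0)
			return &entries[i];
	}
	return NULL;
}

/* Never fails: a missing or unparsable value yields the fallback. */
static int cfg_get_bool_force(
	const struct cfg_entry *entries, size_t count,
	const char *key, int fallback)
{
	const struct cfg_entry *e = cfg_lookup(entries, count, key);

	if (e == NULL || e->value == NULL)
		return fallback;
	if (strcmp(e->value, "true") == 0)
		return 1;
	if (strcmp(e->value, "false") == 0)
		return 0;
	return fallback;
}
```

Since the key is compared byte for byte, passing a key that is not
already normalized simply misses the entry.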
The frontend used to look at the file directly, but that's obviously not
the right thing to do. Expose it on the backend and use that function
instead.
git-core only writes to the reflogs of HEAD, refs/heads/ and
refs/notes/, or if there is already a reflog in place. Adjust our code to
follow these semantics.
When doing copy detection, it is often necessary to include
UNMODIFIED records in the git_diff so they are available as source
records for GIT_DIFF_FIND_COPIES_FROM_UNMODIFIED. Yet in the final
diff, often you will not want to have these UNMODIFIED records.
This adds a flag which marks these UNMODIFIED records for deletion
from the diff list so they will be removed after the rename detect
phase is over.
When FIND_COPIES is used in combination with BREAK_REWRITES for
rename detection, there was a bug where the split MODIFIED delta
was only used as a target for RENAME records and not for COPIED
records. This fixes that, converting the split into a pair of
DELETED and COPIED deltas when that circumstance arises.
Whenever a reference is created or updated, we need to write to the
reflog regardless of whether the user gave us a message, so we shouldn't
leave that to the ref frontend, but integrate it into the backend.
This also eliminates the race between ref update and writing to the
reflog, as we protect the reflog with the ref lock.
As an additional benefit, this reflog append on the backend happens by
appending to the file instead of parsing and rewriting it.
Copy the pointers into temporary vectors instead of assigning them to
the same array so we don't mess with someone else's memory by
accident (e.g. by sorting).
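The idea, sketched with a plain array of strings and qsort(). The
function name is invented for this example and the real code uses
git_vector:

```c
#include <stdlib.h>
#include <string.h>

static int cmp_str(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Return a sorted copy of the pointer array. Only the pointers are
 * copied, so the caller frees the returned array but not the strings,
 * and the order of the source array is left untouched.
 */
static const char **sorted_copy(const char *const *src, size_t count)
{
	const char **copy;

	if (count == 0)
		return NULL;

	copy = malloc(count * sizeof(*copy));
	if (copy == NULL)
		return NULL;

	memcpy(copy, src, count * sizeof(*copy));
	qsort(copy, count, sizeof(*copy), cmp_str);
	return copy;
}
```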
The callback-based method of listing remote references dates back to the
beginning of the network code's lifetime, when we didn't know any
better.
We need to keep the list around for update_tips() after disconnect() so
let's make use of this to simply give the user a pointer to the array so
they can write straightforward code instead of having to go through a
callback.
Removing arbitrary refspecs makes things more complex to reason
about. Instead, let the user set the fetch and push refspec list to
whatever they want it to be.