When parsing a commit, we will treat all bytes left after parsing
the headers as the commit message. When no bytes are left, we
leave the commit's message uninitialized. While it is uncommon to have
a commit without a message, this is the right behavior, as Git
unfortunately allows empty commit messages.
Given that this scenario is so uncommon, most programs acting on
the commit message will never check whether the message is actually
set, which may lead to errors. To avoid such errors and not put
the burden of checking for unset commit messages on the
developer, initialize the commit message with an empty string
when no commit message is given.
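A minimal sketch of that fallback, using hypothetical names rather than libgit2's actual parser:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: treat everything after the headers as the
 * message, falling back to an empty string when no bytes are left. */
static char *parse_commit_message(const char *buf, size_t header_len, size_t total_len)
{
	size_t msg_len = total_len - header_len;
	char *message;

	if (msg_len == 0)
		return strdup("");   /* never leave the message unset */

	if ((message = malloc(msg_len + 1)) == NULL)
		return NULL;

	memcpy(message, buf + header_len, msg_len);
	message[msg_len] = '\0';

	return message;
}
```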
`xlocale.h` only defines `regcomp_l` if `regex.h` was included as well.
Also change the test cases to actually verify that `p_regcomp` works
with a multibyte locale.
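A small platform-specific sketch (BSD/macOS-style `xlocale.h`) illustrating the required include order; the locale name is only an example:

```c
/* regex.h must come first so that xlocale.h declares regcomp_l. */
#include <regex.h>
#include <locale.h>
#include <xlocale.h>
#include <stdio.h>

int main(void)
{
	locale_t loc = newlocale(LC_ALL_MASK, "en_US.UTF-8", NULL);
	regex_t re;

	if (loc == (locale_t)0 ||
	    regcomp_l(&re, "^[[:alpha:]]+$", REG_EXTENDED, loc) != 0) {
		fprintf(stderr, "regcomp_l failed\n");
		return 1;
	}

	regfree(&re);
	freelocale(loc);
	return 0;
}
```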
When parsing tree entries from raw object data, we do not verify
that the tree entry actually has a filename as well as a valid
object ID. Fix this by asserting that the filename length is
non-zero and that there are at least `GIT_OID_RAWSZ` bytes left
when parsing the OID.
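A minimal sketch of the two checks, assuming `buf` points at the entry's filename and `remaining` is the number of bytes left in the raw tree data (the names here are illustrative, not libgit2's internals):

```c
#include <stddef.h>
#include <string.h>

#define GIT_OID_RAWSZ 20   /* size of a raw SHA-1 object ID */

static int check_tree_entry(const char *buf, size_t remaining)
{
	const char *nul = memchr(buf, '\0', remaining);
	size_t filename_len;

	if (nul == NULL)
		return -1;               /* no NUL terminator at all */

	filename_len = (size_t)(nul - buf);
	if (filename_len == 0)
		return -1;               /* entry has no filename */

	/* the bytes after the NUL must hold a complete raw object ID */
	if (remaining - (filename_len + 1) < GIT_OID_RAWSZ)
		return -1;

	return 0;
}
```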
When we read from the list which `limit_list()` gives us, we need to check that
the commit is still interesting, as it might have become uninteresting after it
was added to the list.
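A sketch of that re-check, with a hypothetical `uninteresting` flag standing in for the walker's internal marking:

```c
#include <stdbool.h>
#include <stddef.h>

struct commit_node {
	bool uninteresting;   /* may be set after the commit was enqueued */
};

/* Copy only the commits that are still interesting from the list
 * produced by limit_list() into `out`, returning how many were kept. */
static size_t collect_interesting(struct commit_node **list, size_t len,
                                  struct commit_node **out)
{
	size_t i, kept = 0;

	for (i = 0; i < len; i++) {
		if (list[i]->uninteresting)
			continue;
		out[kept++] = list[i];
	}

	return kept;
}
```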
`git-rebase--merge` does not ask for time sorting, but uses the default. We now
produce the same default time-ordered output as git, so make use of that since
it's not always the same output as our time sorting.
The default sorting changed from being implementation-defined to git's default
sorting, as there are systems (e.g. rebase) which depend on this order. Also
specify more explicitly how you can get git's "date-order".
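A hedged example of the resulting public behavior: with `GIT_SORT_NONE` the walker follows git's default order, and combining topological and time sorting corresponds to git's `--date-order`:

```c
#include <git2.h>

static int walk_date_order(git_repository *repo)
{
	git_revwalk *walk = NULL;
	git_oid oid;
	int error;

	if ((error = git_revwalk_new(&walk, repo)) < 0)
		return error;

	/* GIT_SORT_NONE would give git's default ordering; this
	 * combination is the documented way to get "date-order". */
	git_revwalk_sorting(walk, GIT_SORT_TOPOLOGICAL | GIT_SORT_TIME);

	if ((error = git_revwalk_push_head(walk)) == 0) {
		while (git_revwalk_next(&oid, walk) == 0) {
			/* process `oid` */
		}
	}

	git_revwalk_free(walk);
	return error;
}
```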
After `limit_list()` we already have the list in time-sorted order, which is
what we want in the "default" case. Enqueueing into the "unsorted" list would
just reverse it, and the topological sort will do its own sorting if it needs
to.
We've now moved to code that's closer to git and produces the output
during the preparation phase, so we no longer process the commits as
part of generating the output.
This makes a chunk of code redundant, as we're simply short-circuiting
it by detecting we've already processed the commits.
After porting over the commit hiding and selection we were still left
with mismatching output due to the topological sort.
This ports the topological sorting code so that our equivalents of
`--date-order` and `--topo-order` match the output from `rev-list`.
This is a convenience function to reverse the contents of a vector and a pqueue
in-place.
The pqueue function is useful in the case where we're treating it as a
LIFO queue.
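A generic sketch of the in-place reversal these helpers perform over the backing array of pointers (the names here are illustrative):

```c
#include <stddef.h>

static void reverse_contents(void **contents, size_t length)
{
	size_t i, j;

	if (length < 2)
		return;

	for (i = 0, j = length - 1; i < j; i++, j--) {
		void *tmp = contents[i];
		contents[i] = contents[j];
		contents[j] = tmp;
	}
}
```

Note that reversing a pqueue only makes sense while it is being used as a plain list (the LIFO case), since a reversal does not preserve the heap ordering.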
We had some home-grown logic to figure out which objects to show during
the revision walk, but it was rather inefficient, looking over the same
list multiple times to figure out when we had run out of interesting
commits. We now use the lists in a smarter way.
We also introduce the slop mechanism to determine when to stop
looking. When we run out of interesting objects, we continue preparing
the walk for another 5 rounds in order to make it less likely that we
miss objects in situations with complex graphs.
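A hypothetical sketch of the countdown, not the walker's actual code; `queue_has_interesting` stands in for whatever check the walker performs:

```c
#define SLOP 5   /* extra rounds to keep walking after interest runs out */

/* Returns the new slop value: reset while interesting commits remain,
 * otherwise count down toward zero, at which point the walk stops. */
static int update_slop(int queue_has_interesting, int slop)
{
	if (queue_has_interesting)
		return SLOP;

	return slop > 0 ? slop - 1 : 0;
}
```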
When trying to determine if we can safely overwrite an existing workdir
item, we may need to calculate the oid for the workdir item to determine
if it is identical to the old side (and eligible for removal).
We previously did this regardless of the type of entry in the workdir;
if it was a directory, we would open(2) it and then try to read(2).
The read(2) of a directory fails on many platforms, so we would treat it
as if it were unmodified and continue to perform the checkout.
On FreeBSD, you _can_ read(2) a directory, so this pattern failed. We
would calculate an oid from the data read and determine that the
directory was modified and would therefore generate a checkout conflict.
This reliance on read(2) is silly (and was most likely accidentally
giving us the behavior we wanted); we should be explicit about the
directory test.
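A sketch of the explicit test, using plain POSIX calls; `can_hash_workdir_item` is a hypothetical helper, not libgit2 API:

```c
#include <sys/stat.h>
#include <stdbool.h>

/* Only regular files and symlinks should ever be opened and hashed;
 * whether read(2) on a directory fails is platform-dependent (it
 * succeeds on FreeBSD), so never rely on it. */
static bool can_hash_workdir_item(const char *path)
{
	struct stat st;

	if (lstat(path, &st) < 0)
		return false;

	return S_ISREG(st.st_mode) || S_ISLNK(st.st_mode);
}
```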
When creating and printing diffs, deal specially with binary deltas
that carry the actual binary data, versus diffs that have a binary
file but lack that data.
The `PKG_CHECK_MODULES` function searches for a pkg-config module and
then proceeds to set various variables containing information on
how to link to the library. In contrast to the `FIND_PACKAGE`
function, though, the library path set by `PKG_CHECK_MODULES` will not
necessarily contain linking instructions with a complete path to
the library. So when a library is not installed in a
standard location, the linker might later fail due to being
unable to locate it.
While we already honor this when configuring libssh2 by adding
`LIBSSH2_LIBRARY_DIRS` to the link directories, we fail to do so
for libcurl, preventing us from building libgit2 on e.g. FreeBSD. Fix
the issue by adding the curl library directory to the linker
search path.
Instead of skipping printing a binary diff when there is no data, skip
printing when we have a status of `UNMODIFIED`. This is more in line
with our internal data model and allows us to expand the notion of
binary data.
In the future, a diff may have no data because the files were unmodified
(there was no data to produce), or it may have no data because no data
was given to us in a patch. We want to treat these cases
separately.
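A minimal sketch of the new condition; `should_print_binary` is a hypothetical helper built on the public delta type:

```c
#include <git2.h>

/* Print the binary patch unless the delta is unmodified, instead of
 * keying the decision off whether binary data happens to be present. */
static int should_print_binary(const git_diff_delta *delta)
{
	return delta->status != GIT_DELTA_UNMODIFIED;
}
```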
When generating diffs for binary files, we load and decompress
the blobs in order to generate the actual diff, which can be very
costly. While we cannot avoid this for the case when we are
called with the `GIT_DIFF_SHOW_BINARY` flag, we do not have to
load the blobs in the case where this flag is not set, as the
caller is expected to have no interest in the actual content of
binary files.
Fix the issue by only generating a binary diff when the caller is
actually interested in the diff. As libgit2 uses heuristics to
determine that a blob contains binary data by inspecting its size
without loading from the ODB, this saves us quite some time when
diffing in a repository with binary files.
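A sketch of the short-circuit, assuming `diff_flags` carries the options the diff was generated with (the helper name is illustrative):

```c
#include <git2.h>
#include <stdbool.h>
#include <stdint.h>

/* Only load and deflate the blobs of a binary delta when the caller
 * explicitly asked for the binary contents. */
static bool want_binary_content(uint32_t diff_flags)
{
	return (diff_flags & GIT_DIFF_SHOW_BINARY) != 0;
}
```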