In `pack_entry_find_offset`, we try to find the offset of a
certain object in the pack file. To do so, we first check whether
the packfile has already been opened and open it if not. Opening
the packfile is guarded with a mutex, so concurrent access to it
is in fact safe.
What is not thread-safe, though, is our calculation of offsets
inside the packfile. Assume two threads call
`pack_entry_find_offset` at the same time. We first calculate the
offset and index location and only then determine whether the pack
has already been opened. If it has not been opened yet, we open it
and re-calculate the offset and index address.
Now the case for two threads: thread 1 first calculates the
addresses and is subsequently suspended. The second thread will
now call `pack_index_open` and initialize the pack file,
calculating its addresses correctly. When the first thread is
resumed, it will see that the pack file has already been
initialized and will happily proceed with the addresses it
calculated before the check. As the pack file was not initialized
at that point, these addresses are bogus.
Fix the issue by only calculating the addresses after having
checked if the pack file is open.
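A minimal sketch of the reordering, where `find_offset_range`,
`pack_file` and its fields are hypothetical simplifications of the
real pack internals:

    #include <stddef.h>

    /* Reduced stand-in for the real pack structures; only the
     * ordering of the operations matters here. */
    struct pack_file {
        unsigned char *index_map; /* mmap'ed index, NULL until opened */
        size_t header_len;        /* bytes preceding the first entry  */
        size_t num_objects;
        size_t stride;            /* bytes per index entry            */
    };

    int pack_index_open(struct pack_file *p); /* mutex-guarded elsewhere */

    static int find_offset_range(struct pack_file *p,
                                 const unsigned char **lo,
                                 const unsigned char **hi)
    {
        int error;

        /* make sure the index is opened and mapped first ... */
        if (p->index_map == NULL && (error = pack_index_open(p)) < 0)
            return error;

        /* ... and only then derive addresses from the mapped data, so
         * a thread that lost the race never uses values computed from
         * an uninitialized map */
        *lo = p->index_map + p->header_len;
        *hi = *lo + p->num_objects * p->stride;

        return 0;
    }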
The `git_pqueue` struct can be limited to a fixed total number of
entries. In this case, when the queue is full, we decide whether
to throw away a newly inserted item by examining whether it has a
higher priority than the current smallest one.
This feature somewhat contradicts the fact that our pqueue
implementation is allowed to have no comparison function at all.
In fact, we also fail to check whether the comparison function is
actually set in the case where we add a new item to a full
fixed-size pqueue.
As we cannot determine which item is the smallest in the absence
of a comparison function, fix the `NULL` pointer dereference by
simply dropping any new item that is about to be inserted into a
full fixed-size pqueue.
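A hedged sketch of the guard, using a reduced hypothetical
`fixed_pqueue` type rather than the real `git_pqueue` internals:

    #include <stddef.h>

    /* Bounded array of pointers plus an optional comparison callback. */
    typedef struct {
        void **items;
        size_t length;
        size_t max_size;                        /* bound when fixed-size */
        int (*cmp)(const void *, const void *); /* may be NULL           */
    } fixed_pqueue;

    static int fixed_pqueue_insert(fixed_pqueue *pq, void *item)
    {
        if (pq->length >= pq->max_size) {
            /* without a comparison function we cannot tell whether the
             * new item beats the current minimum, so drop it instead of
             * calling a NULL callback; likewise drop it when it does
             * not beat the minimum */
            if (pq->cmp == NULL || pq->cmp(item, pq->items[0]) <= 0)
                return 0;

            pq->items[0] = item;
            /* ... sift the new root down to restore the heap ... */
            return 0;
        }

        pq->items[pq->length++] = item;
        /* ... sift the new element up ... */
        return 0;
    }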
When parsing a commit, we treat all bytes left after parsing the
headers as the commit message. When no bytes are left, we leave
the commit's message uninitialized. While it is uncommon for a
commit to have no message at all, this is the right behavior, as
Git unfortunately allows empty commit messages.
Given that this scenario is so uncommon, most programs acting on
the commit message will never check whether the message is
actually set, which may lead to errors. To avoid such errors and
not lay the burden of checking for unset commit messages on the
developer, initialize the commit message with an empty string when
no commit message is given.
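A minimal sketch of the idea, with `parsed_commit` and
`commit_parse_message` as hypothetical stand-ins for the real parser:

    #include <stdlib.h>
    #include <string.h>

    /* Reduced commit structure for illustration. */
    struct parsed_commit {
        char *message;
    };

    /* `buffer` points just past the parsed headers and `buffer_end`
     * past the last byte of the object; when nothing is left, the
     * message becomes the empty string instead of staying unset. */
    static int commit_parse_message(struct parsed_commit *commit,
                                    const char *buffer,
                                    const char *buffer_end)
    {
        size_t len = (size_t)(buffer_end - buffer);

        commit->message = malloc(len + 1);
        if (commit->message == NULL)
            return -1;

        memcpy(commit->message, buffer, len);
        commit->message[len] = '\0';

        return 0;
    }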
When parsing tree entries from raw object data, we do not verify
that the tree entry actually has a filename as well as a valid
object ID. Fix this by asserting that the filename length is
non-zero and that there are at least `GIT_OID_RAWSZ` bytes left
when parsing the OID.
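A hedged sketch of the two checks; `validate_tree_entry` and its
parameters are hypothetical, while `GIT_OID_RAWSZ` is libgit2's raw
object-id size:

    #include <stddef.h>

    #define GIT_OID_RAWSZ 20   /* as defined by libgit2 */

    /* `filename_len` is the length of the entry name just parsed;
     * `buffer`/`buffer_end` delimit the remaining raw tree data,
     * starting at the binary object id. */
    static int validate_tree_entry(size_t filename_len,
                                   const char *buffer,
                                   const char *buffer_end)
    {
        /* an entry without a filename is corrupt */
        if (filename_len == 0)
            return -1;

        /* the object id must be fully contained in the remaining data */
        if ((size_t)(buffer_end - buffer) < GIT_OID_RAWSZ)
            return -1;

        return 0;
    }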
When we read from the list which `limit_list()` gives us, we need to check that
the commit is still interesting, as it might have become uninteresting after it
was added to the list.
`git-rebase--merge` does not ask for time sorting, but uses the default. We now
produce the same default time-ordered output as git, so make use of that, since
it's not always the same output as our time sorting.
The default sorting changed from being implementation-defined to git's default
sorting, as there are systems (e.g. rebase) which depend on this order. Also
specify more explicitly how you can get git's "date-order".
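For API users, the mapping should roughly be the following (a hedged
example using the public `git_revwalk` sorting flags; `configure_walk`
is just an illustration):

    #include <git2.h>

    static void configure_walk(git_revwalk *walk, int want_date_order)
    {
        if (want_date_order)
            /* topological constraint with ties broken by commit time */
            git_revwalk_sorting(walk, GIT_SORT_TOPOLOGICAL | GIT_SORT_TIME);
        else
            /* the default now matches `git rev-list`'s output */
            git_revwalk_sorting(walk, GIT_SORT_NONE);
    }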
After `limit_list()` we already have the list in time-sorted order, which is
what we want in the "default" case. Enqueueing into the "unsorted" list would
just reverse it, and the topological sort will do its own sorting if it needs
to.
We've now moved to code that's closer to git and produces the output
during the preparation phase, so we no longer process the commits as
part of generating the output.
This makes a chunk of code redundant, as we're simply short-circuiting
it by detecting that we've already processed the commits.
After porting over the commit hiding and selection we were still left
with mismatching output due to the topological sort.
This ports the topological sorting code so that our equivalents of
`--date-order` and `--topo-order` match the output from `rev-list`.
This is a convenience function to reverse the contents of a vector and a pqueue
in-place.
The pqueue function is useful in the case where we're treating it as a
LIFO queue.
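Both variants boil down to reversing an array of pointers in place;
the signature below is a hypothetical simplification:

    #include <stddef.h>

    static void reverse_contents(void **contents, size_t length)
    {
        size_t i, j;

        if (length < 2)
            return;

        for (i = 0, j = length - 1; i < j; i++, j--) {
            void *tmp = contents[i];
            contents[i] = contents[j];
            contents[j] = tmp;
        }
    }

Reversing the pqueue's backing array only makes sense when no
comparison function is set, i.e. when it is used as a plain LIFO
stack rather than as a heap.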
We had some home-grown logic to figure out which objects to show during
the revision walk, but it was rather inefficient, looking over the same
list multiple times to figure out when we had run out of interesting
commits. We now use the lists in a smarter way.
We also introduce the slop mechanism to determine when to stop
looking. When we run out of interesting objects, we continue preparing
the walk for another 5 rounds in order to make it less likely that we
miss objects in situations with complex graphs.
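A hedged sketch of the countdown; the real walk state is richer (it
also looks at commit timestamps), but the shape of the logic is this:

    /* extra rounds to keep preparing after the last interesting
     * commit was seen */
    #define SLOP 5

    static int update_slop(int found_interesting_commit, int slop)
    {
        /* seeing an interesting commit resets the window ... */
        if (found_interesting_commit)
            return SLOP;

        /* ... otherwise count down; preparation stops when this
         * reaches zero */
        return slop > 0 ? slop - 1 : 0;
    }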
When trying to determine if we can safely overwrite an existing workdir
item, we may need to calculate the oid for the workdir item to determine
if it's identical to the old side (and eligible for removal).
We previously did this regardless of the type of entry in the workdir;
if it was a directory, we would open(2) it and then try to read(2).
The read(2) of a directory fails on many platforms, so we would treat it
as if it were unmodified and continue to perform the checkout.
On FreeBSD, you _can_ read(2) a directory, so this pattern failed. We
would calculate an oid from the data read and determine that the
directory was modified and would therefore generate a checkout conflict.
This reliance on read(2) is silly (and was most likely accidentally
giving us the behavior we wanted); we should be explicit about the
directory test.
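A hedged sketch of what "explicit" means here, using a plain
lstat(2); in the checkout code itself the entry's mode is typically
already known from the workdir iterator:

    #include <sys/stat.h>

    /* Decide up front whether the workdir item is a directory instead
     * of relying on read(2) failing for directories (it does not fail
     * on FreeBSD). */
    static int workdir_item_is_directory(const char *path)
    {
        struct stat st;

        if (lstat(path, &st) < 0)
            return -1;

        return S_ISDIR(st.st_mode) ? 1 : 0;
    }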
When creating and printing diffs, treat binary deltas that actually
contain binary data differently from diffs that mark a file as binary
but lack the actual binary data.
Instead of skipping printing a binary diff when there is no data, skip
printing when we have a status of `UNMODIFIED`. This is more in-line
with our internal data model and allows us to expand the notion of
binary data.
In the future, a diff may have no data because the files were
unmodified (there was no data to produce) or because no data was
given to us in a patch. We want to treat these cases separately.
When generating diffs for binary files, we load and decompress
the blobs in order to generate the actual diff, which can be very
costly. While we cannot avoid this for the case when we are
called with the `GIT_DIFF_SHOW_BINARY` flag, we do not have to
load the blobs in the case where this flag is not set, as the
caller is expected to have no interest in the actual content of
binary files.
Fix the issue by only generating a binary diff when the caller is
actually interested in the diff. As libgit2 uses heuristics to
determine that a blob contains binary data by inspecting its size
without loading from the ODB, this saves us quite some time when
diffing in a repository with binary files.
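From the caller's side, the cheap path simply means not asking for
binary content; this uses the public diff API, and the function name
is just an example:

    #include <git2.h>

    static int diff_without_binary_content(git_diff **out,
                                           git_repository *repo)
    {
        git_diff_options opts = GIT_DIFF_OPTIONS_INIT;

        /* opts.flags deliberately does not include GIT_DIFF_SHOW_BINARY,
         * so binary files are flagged via the size heuristic without
         * loading and inflating their blobs from the ODB */
        return git_diff_index_to_workdir(out, repo, NULL, &opts);
    }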
According to the reference, the `git_checkout_tree` and
`git_checkout_head` functions should accept NULL in the opts field.
This was broken, since the opts field was dereferenced, which led to
a crash.
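A hedged sketch of the kind of normalization that makes NULL
acceptable, using the public options type (the helper name is
hypothetical):

    #include <string.h>
    #include <git2.h>

    /* Start from the defaults when the caller passes NULL, so later
     * code never dereferences a NULL options pointer. */
    static void normalize_checkout_opts(git_checkout_options *normalized,
                                        const git_checkout_options *given)
    {
        git_checkout_options defaults = GIT_CHECKOUT_OPTIONS_INIT;

        memcpy(normalized, given ? given : &defaults, sizeof(*normalized));
    }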
When calling `http_connect` on a subtransport whose stream is already
connected, we first close the stream in case no keep-alive is in use.
When doing so, we do not reset the transport's connection state,
though. Usually this does no harm, because the subsequent connect
will succeed. But when the connection fails, we are left with a
subtransport which is tagged as connected but which has no valid
stream attached.
Fix the issue by resetting the subtransport's connected-state when
closing its stream in `http_connect`.
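A hedged sketch with a reduced stand-in for the subtransport state:
tearing down the stream and clearing the connected flag happen
together, so a failed reconnect cannot leave the two out of sync:

    struct http_subtransport_state {
        void *io;          /* underlying stream, NULL when closed */
        int connected;
    };

    /* stand-in for the real stream teardown */
    extern void stream_close_and_free(void *io);

    static void subtransport_close(struct http_subtransport_state *t)
    {
        if (t->io) {
            stream_close_and_free(t->io);
            t->io = NULL;
        }

        t->connected = 0;  /* keep the flag consistent with the stream */
    }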
The .gitignore file allows for patterns which unignore previous
ignore patterns. When unignoring a previous pattern, there are
basically three cases for how this is matched when no globbing is
used:
1. when a previous file has been ignored, it can be unignored by
   using its exact name, e.g.

       foo/bar
       !foo/bar

2. when a file in a subdirectory has been ignored, it can be
   unignored by using its basename, e.g.

       foo/bar
       !bar

3. when all files with a basename are ignored, a specific file
   can be unignored again by specifying its path in a
   subdirectory, e.g.

       bar
       !foo/bar
The first problem in libgit2 is that we did not correctly treat
the second case. While we verified that the negative pattern
matches the tail of the positive one, we did not verify whether
it matches only the basename of the positive pattern. So e.g. we
would also have negated a pattern like

    foo/fruz_bar
    !bar
Furthermore, we did not check for the third case, where a
basename is being unignored in a certain subdirectory again.
Both issues are fixed with this commit.
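A hedged sketch of the tightened check for the second case (the
helper and its arguments are simplifications; the real matcher also
has to deal with globs and directory-only rules):

    #include <stdbool.h>
    #include <string.h>

    /* A negative pattern `neg` without slashes only cancels the
     * positive pattern `pos` when it matches pos's basename exactly,
     * not merely some tail of it. */
    static bool negates_basename(const char *pos, const char *neg)
    {
        size_t pos_len = strlen(pos), neg_len = strlen(neg);

        if (neg_len > pos_len ||
            strcmp(pos + (pos_len - neg_len), neg) != 0)
            return false;

        /* "foo/bar" is cancelled by "!bar", but "foo/fruz_bar" is
         * not: the match must begin right after a directory separator
         * (or at the very start of the pattern) */
        return pos_len == neg_len || pos[pos_len - neg_len - 1] == '/';
    }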
Support reading and writing index v4. Index v4 uses a very simple
compression scheme for pathnames, but is otherwise similar to index v3.
Signed-off-by: David Turner <dturner@twitter.com>
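The compression stores, for each entry, how many trailing bytes of
the previous entry's path to drop, followed by the NUL-terminated
suffix to append. A hedged sketch of the expansion step (the real
format encodes the strip count as a variable-width integer, which is
omitted here):

    #include <string.h>

    static void expand_v4_path(char *out, const char *previous_path,
                               size_t strip, const char *suffix)
    {
        size_t keep = strlen(previous_path) - strip;

        memcpy(out, previous_path, keep);   /* shared prefix     */
        strcpy(out + keep, suffix);         /* new tail plus NUL */
    }

For example, expanding after "lib/common.c" with a strip count of 8
and the suffix "pack.c" yields "lib/pack.c".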
When failing to initialize a new stransport stream, we try to
release already allocated memory by calling out to
`git_stream_free`, which in turn called out to the stream's
`free` function pointer. As we only initialize the function
pointer later on, this leads to a `NULL` pointer dereference.
Furthermore, plug another memory leak when failing to create the
SSL context.
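A hedged sketch with a reduced stream type: on early failure the
partially initialized object is released directly, because the
vtable's `free` callback has not been assigned yet at that point
(names are illustrative only):

    #include <stdlib.h>

    struct tls_stream {
        void (*free)(struct tls_stream *);
        void *ssl_ctx;
    };

    /* stand-in for the real context setup */
    extern void *create_ssl_context(void);

    static int tls_stream_new(struct tls_stream **out)
    {
        struct tls_stream *st = calloc(1, sizeof(*st));

        if (st == NULL)
            return -1;

        if ((st->ssl_ctx = create_ssl_context()) == NULL) {
            free(st);   /* st->free is still NULL; do not call it */
            return -1;
        }

        /* only now assign st->free and the other callbacks ... */
        *out = st;
        return 0;
    }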
Only provide the empty tree internally, which matches git's behavior.
If we provide the empty blob, then any users trying to write it with
libgit2 would omit it from actually landing in the odb, which appears
to git proper as a broken repository (missing that object).
The `SSLCopyPeerTrust` call can succeed but fail to return a trust
object if it can't load the certificate chain and thus cannot check the
validity of a certificate. This can lead to us calling `CFRelease` on a
`NULL` trust object, causing a crash.
Handle this by returning ECERTIFICATE.
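A hedged sketch of the check; the helper is illustrative, while
`SSLCopyPeerTrust` and `GIT_ECERTIFICATE` are the real Security
framework call and libgit2 error code:

    #include <Security/SecureTransport.h>
    #include <git2.h>

    static int copy_peer_trust(SecTrustRef *out, SSLContextRef ctx)
    {
        SecTrustRef trust = NULL;

        if (SSLCopyPeerTrust(ctx, &trust) != noErr)
            return -1;

        /* noErr with no trust object: the chain could not be loaded,
         * so the certificate cannot be validated */
        if (trust == NULL)
            return GIT_ECERTIFICATE;

        *out = trust;   /* caller CFReleases it when done */
        return 0;
    }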