Note that we're not checking whether the resize succeeds; in OOM cases,
we let it run with a "small" vector and hash table and see if by chance
we can grow it dynamically as we insert the new entries. Nothing to
lose really.
Instead of calling `git_index_add` in a loop, use the new
`git_index_fill` internal API to fill the index with the initial staged
entries.
The new `fill` helper assumes that all the entries will be unique and
valid, so it can append them at the end of the entries vector and only
sort it once at the end. It performs no validation checks.
This prevents the quadratic behavior caused by having to sort the
entries list once after every insertion.
When replacing an index with a new one, we need to iterate
through all index entries in order to determine which entries are
equal. When it is not possible to re-use old entries for the new
index, we move it into a list of entries that are to be removed
and thus free'd.
When we encounter a non-zero error code, though, we skip adding
the current index entry to the remove-queue. `INSERT_MAP_EX`,
which is the function last run before adding to the remove-queue,
may return a positive non-zero code that indicates what exactly
happened while inserting the element. In this case we skip adding
the entry to the remove-queue but still continue the current
operation, leading to a leak of the current entry.
Fix this by checking for a negative return value instead of a
non-zero one when we want to add the current index entry to the
remove-queue.
When adding to the index, we look to see if a portion of the given
path matches a portion of a path in the index. If so, we will use
the existing path information. For example, when adding `foo/bar.c`,
if there is an index entry to `FOO/other` and the filesystem is case
insensitive, then we will put `bar.c` into the existing tree instead
of creating a new one with a different case.
Use `strncmp` to do that instead of `memcmp`. When we `bsearch`
into the index, we locate the position where the new entry would
go. The index entry at that position does not necessarily have
a relation to the entry we're adding, so we cannot make assumptions
and use `memcmp`. Instead, compare them as strings.
When canonicalizing paths, we look for the first index entry that
matches a given substring.
When duplicating a `struct git_tree_entry` with
`git_tree_entry_dup` the resulting structure is not allocated
inside a memory pool. As we do a 1:1 copy of the original struct,
though, we also copy the `pooled` field, which is set to `true`
for pooled entries. This results in a huge memory leak as we
never free tree entries that were duplicated from a pooled
tree entry.
Fix this by marking the newly duplicated entry as un-pooled.
When formatting a patch as email we do not include the commit's
message in the formatted patch output. Implement this and add a
test that verifies behavior.
It is already possible to get a commit's summary with the
`git_commit_summary` function. It is not possible to get the
remaining part of the commit message, that is the commit
message's body.
Fix this by introducing a new function `git_commit_body`.
The `git_blame__entry` struct keeps track of line counts with
`int` fields. Since `int` is only guaranteed to be at least 16
bits we may overflow on certain platforms when line counts exceed
2^15.
Fix this by instead storing line counts in `size_t`.
It is not unreasonable to have versioned files with a line count
exceeding 2^16. Upon blaming such files we fail to correctly keep
track of the lines as `git_blame_hunk` stores them in `uint16_t`
fields.
Fix this by converting the line fields of `git_blame_hunk` to
`size_t`. Add test to verify behavior.
This reduces the size of the struct from 32 to 26 bytes, and leaves a
single padding byte at the end of the struct (which comes from the
zero-length array).
These are rather small allocations, so we end up spending a non-trivial
amount of time asking the OS for memory. Since these entries are tied to
the lifetime of their tree, we can give the tree a pool so we speed up
the allocations.
We've already looked at the filename with `memchr()` and then used
`strlen()` to allocate the entry. We already know how much we have to
advance to get to the object id, so add the filename length instead of
looking at each byte again.
When building a recursive merge base, allow conflicts to occur.
Use the file (with conflict markers) as the common ancestor.
The user has already seen and dealt with this conflict by virtue
of having a criss-cross merge. If they resolved this conflict
identically in both branches, then there will be no conflict in the
result. This is the best case scenario.
If they did not resolve the conflict identically in the two branches,
then we will generate a new conflict. If the user is simply using
standard conflict output then the results will be fairly sensible.
But if the user is using a mergetool or using diff3 output, then the
common ancestor will be a conflict file (itself with diff3 output,
haha!). This is quite terrible, but it matches git's behavior.
Use annotated commits to act as our virtual bases, instead of regular
commits, to avoid polluting the odb with virtual base commits and
trees. Instead, build an annotated commit with an index and pointers
to the commits that it was merged from.
When there are more than two common ancestors, continue merging the
virtual base with the additional common ancestors, effectively
octopus merging a new virtual base.
When examining the working directory and determining whether it's
up-to-date, only consider the nanoseconds in the index entry when
built with `GIT_USE_NSEC`. This prevents us from believing that
the working directory is always dirty when the index was originally
written with a git client that uinderstands nsecs (like git 2.x).
Allow users to set the `git_libgit2_opts` search path for the
`GIT_CONFIG_LEVEL_PROGRAMDATA`. Convert `GIT_CONFIG_LEVEL_PROGRAMDATA`
to `GIT_SYSDIR_PROGRAMDATA` for setting the configuration.
Ensure that `git_index_read_index` clears the uptodate bit on
files that it modifies.
Further, do not propagate the cache from an on-disk index into
another on-disk index. Although this should not be done, as
`git_index_read_index` is used to bring an in-memory index into
another index (that may or may not be on-disk), ensure that we do
not accidentally bring in these bits when misused.
The uptodate bit should have a lifecycle of a single read->write
on the index. Once the index is written, the files within it should
be scanned for racy timestamps against the new index timestamp.
Keep track of entries that we believe are up-to-date, because we
added the index entries since the index was loaded. This prevents
us from unnecessarily examining files that we wrote during the
cleanup of racy entries (when we smudge racily clean files that have
a timestamp newer than or equal to the index's timestamp when we
read it). Without keeping track of this, we would examine every
file that we just checked out for raciness, since all their timestamps
would be newer than the index's timestamp.