In our code writing index entries, we carry around a `disk_size`
representing how much memory we have in total and pass this value to
`git_encode_varint` to do bounds checks. This does not make much sense,
as at the time when passing on this variable it is already out of date.
Fix this by subtracting used memory from `disk_size` as we go along.
Furthermore, assert we've actually got enough space left to do the final
path memcpy.
When using compressed index entries, each entry's path is preceded by a
varint encoding how long the shared prefix with the previous index entry
actually is. We currently encode a length of `(path_len - same_len)`,
which is doubly wrong. First, `path_len` is already set to `path_len -
same_len` previously. Second, we want to encode the shared prefix rather
than the un-shared suffix length.
Fix this by using `same_len` as the varint value instead.
We have a check in place whether the index has enough data left for the
required footer after reading an index entry, but this was only used for
uncompressed entries. Move the check down a bit so that it is executed
for both compressed and uncompressed index entries.
All index entry size computations are now performed in
`index_entry_size`. As such, we do not need the file-scope macros for
computing these sizes anymore. Remove them and move the `entry_size`
macro into the `index_entry_size` function.
Our code to write index entries to disk does not check whether the
entry that is to be written should use prefix compression for the path.
As such, we were overallocating memory and added bogus right-padding
into the resulting index entries. As there is no padding allowed in the
index version 4 format, this should actually result in an invalid index.
Fix this by re-using the newly extracted `index_entry_size` function.
Create a new function `index_entry_size` which encapsulates the logic to
calculate how much space is needed for an index entry, whether it is
simple/extended or compressed/uncompressed. This can later be re-used by
our code writing index entries.
The last written disk entry is currently being written inside of the
function `write_disk_entry`. Make behavior a bit more obviously by
instead setting it inside of `write_entries` while iterating all
entries.
To calculate the path of a compressed index entry, we need to know the
preceding entry's path. While we do actually set the first predecessor
correctly to "", we fail to update this while reading the entries.
Fix the issue by updating `last` inside of the loop. Previously, we've
been passing a double-pointer to `read_entry`, which it didn't update.
As it is more obvious to update the pointer inside the loop itself,
though, we can simply convert it to a normal pointer.
The index version 4 introduced compressed path names for the entries.
From the git.git index-format documentation:
At the beginning of an entry, an integer N in the variable width
encoding [...] is stored, followed by a NUL-terminated string S.
Removing N bytes from the end of the path name for the previous
entry, and replacing it with the string S yields the path name for
this entry.
But instead of stripping N bytes from the previous path's string and
using the remaining prefix, we were instead simply concatenating the
previous path with the current entry path, which is obviously wrong.
Fix the issue by correctly copying the first N bytes of the previous
entry only and concatenating the result with our current entry's path.
Error messages should be sentence fragments, and therefore:
1. Should not begin with a capital letter,
2. Should not conclude with punctuation, and
3. Should not end a sentence and begin a new one
`giterr_set()` is used when it is required to format a string, and since
we don't really require it for this case, it is better to stick to
`giterr_set_str()`.
This also suppresses a warning(-Wformat-security) raised by the compiler.
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
Support reading and writing index v4. Index v4 uses a very simple
compression scheme for pathnames, but is otherwise similar to index v3.
Signed-off-by: David Turner <dturner@twitter.com>
Ensure that we include conflicts when calling `git_index_read_index`,
which will remove conflicts in the index that do not exist in the new
target, and will add conflicts from the new target.
Most of `git_index_read_index` is common to reading any iterator.
Refactor it out in case we want to implement `read_tree` in terms of it
in the future.
When removing an entry from the index by its position, we first
retrieve the position from the index's entries and then try to
remove the retrieved value from the index map with
`DELETE_IN_MAP`. When `index_remove_entry` returns `NULL` we try
to feed it into the `DELETE_IN_MAP` macro, which will
unconditionally call `idxentry_hash` and then happily dereference
the `NULL` entry pointer.
Fix the issue by not passing a `NULL` entry into `DELETE_IN_MAP`.
When adding a new entry to an existing index via `git_index_read_index`,
be sure to remove the tree cache entry for that new path. This will
mark all parent trees as dirty.
Clear any error state upon each iteration. If one of the iterations
ends (with an error of `GIT_ITEROVER`) we need to reset that error to 0,
lest we stop the whole process prematurely.
Android NDK does not have a `struct timespec` in its `struct stat`
for nanosecond support, instead it has a single nanosecond member inside
the struct stat itself. We will use that and use a macro to expand to
the `st_mtim` / `st_mtimespec` definition on other systems (much like
the existing `st_mtime` backcompat definition).
The overflow check in `read_reuc` tries to verify if the
`git__strtol32` parses an integer bigger than UINT_MAX. The `tmp`
variable is casted to an unsigned int for this and then checked
for being greater than UINT_MAX, which obviously can never be
true.
Fix this by instead fixing the `mode` field's size in `struct
git_index_reuc_entry` to `uint32_t`. We can now parse the int
with `git__strtol64`, which can never return a value bigger than
`UINT32_MAX`, and additionally checking if the returned value is
smaller than zero.
We do not need to handle overflows explicitly here, as
`git__strtol64` returns an error when the returned value would
overflow.
Allow `git_index_read` to handle reading existing indexes with
illegal entries. Allow the low-level `git_index_add` to add
properly formed `git_index_entry`s even if they contain paths
that would be illegal for the current filesystem (eg, `AUX`).
Continue to disallow `git_index_add_bypath` from adding entries
that are illegal universally illegal (eg, `.git`, `foo/../bar`).
We don't support using an index object from multiple threads at the same
time, so the locking doesn't have any effect when following the
rules. If not following the rules, things are going to break down
anyway.
Note that we're not checking whether the resize succeeds; in OOM cases,
we let it run with a "small" vector and hash table and see if by chance
we can grow it dynamically as we insert the new entries. Nothing to
lose really.
Instead of calling `git_index_add` in a loop, use the new
`git_index_fill` internal API to fill the index with the initial staged
entries.
The new `fill` helper assumes that all the entries will be unique and
valid, so it can append them at the end of the entries vector and only
sort it once at the end. It performs no validation checks.
This prevents the quadratic behavior caused by having to sort the
entries list once after every insertion.