The `git_buf_init` function has an optional length parameter, which will
cause the buffer to be initialized and allocated in one step. This can
be used instead of static initialization with `GIT_BUF_INIT` followed by
a `git_buf_grow`. This patch does so for two functions where it is
applicable.
Both the `git_buf_init` and `git_buf_attach` functions may call
`git_buf_grow` in case they were given an allocation length as
parameter. As such, it is possible for these functions to fail when we
run out of memory. While it won't probably be used anytime soon, it does
indeed make sense to also record this fact by returning an error code
from both functions. As they belong to the internal API only, this
change does not break our interface.
The `ENSURE_SIZE` macro can be used to grow a buffer if its currently
allocated size does not suffice a required target size. While most of
the code already uses this macro, the `git_buf_join` and `git_buf_join3`
functions do not yet use it. Due to the macro first checking whether we
have to grow the buffer at all, this has the benefit of saving a
function call when it is not needed. While this is nice to have, it will
probably not matter at all performance-wise -- instead, this only serves
for consistency across the code.
While the `ENSURE_SIZE` macro gets a reference to both the buffer that
is to be resized and a new size, we were not consistently referencing
the passed buffer, but instead a variable `buf`, which is not passed in.
Funnily enough, we never noticed because our buffers seem to always be
named `buf` whenever the macro was being used.
Fix the macro by always using the passed-in buffer. While at it, add
braces around all mentions of passed-in variables as should be done with
macros to avoid subtle errors.
Found-by: Edward Thompson
The function `git_buf_try_grow` consistently calls `giterr_set_oom`
whenever growing the buffer fails due to insufficient memory being
available. So in fact, we do not have to do this ourselves when a call
to any buffer-growing function has failed due to an OOM situation. But
we still do so in two functions, which this patch cleans up.
The updated SHA1DC library allows us to use custom includes instead of
using standard includes. Due to requirements with cross-platform, we
provide some custom system includes files like for example the
"stdint.h" file on Win32. Because of this, we want to make sure to avoid
breaking cross-platform compatibility when SHA1DC is enabled.
To use the new mechanism, we can simply define
`SHA1DC_NO_STANDARD_INCLUDES`. Furthermore, we can specify custom
include files via two defines, which we now use to include our
"common.h" header.
OpenSSL v1.1 has introduced a new way of initializing the library
without having to call various functions of different subsystems. In
libgit2, we have been adapting to that change with 88520151f
(openssl_stream: use new initialization function on OpenSSL version
>=1.1, 2017-04-07), where we added an #ifdef depending on the OpenSSL
version. This change broke building with libressl, though, which has not
changed its API in the same way.
Fix the issue by expanding the #ifdef condition to use the old way of
initializing with libressl.
Signed-off-by: Marc-Antoine Perennou <Marc-Antoine@Perennou.com>
The current write test does not trigger some edge-cases in the index
version 4 path compression code. Rewrite the test to start off the an
empty standard repository, creating index entries with interesting paths
itself. This allows for more fine-grained control over checked paths.
Furthermore, we now also verify that entry paths are actually
reconstructed correctly.
While we do have a test which checks whether a written index of version
4 has the correct version set, we do not check whether this actually
enables path compression for index entries. This commit adds a new test
by adding a number of index entries with equal path prefixes to the
index and subsequently flushing that to disk. With suffix compression
enabled by index version 4, only the last few bytes of these paths will
actually have to be written to the index, saving a lot of disk space.
For the test, differences are about an order of magnitude, allowing us
to easily verify without taking a deeper look at actual on-disk
contents.
While we have a simple test to determine whether we can write an index
of version 4, we never verified that we are able to read this kind of
index (and in fact, we were not able to do so). Add a new repository
which has an index of version 4. This repository is then read from a new
test.
The init and cleanup functions for test suites are usually prepended to
our actual tests. The index::version test suite does not adhere to this
stile. Fix this.
In our code writing index entries, we carry around a `disk_size`
representing how much memory we have in total and pass this value to
`git_encode_varint` to do bounds checks. This does not make much sense,
as at the time when passing on this variable it is already out of date.
Fix this by subtracting used memory from `disk_size` as we go along.
Furthermore, assert we've actually got enough space left to do the final
path memcpy.
When using compressed index entries, each entry's path is preceded by a
varint encoding how long the shared prefix with the previous index entry
actually is. We currently encode a length of `(path_len - same_len)`,
which is doubly wrong. First, `path_len` is already set to `path_len -
same_len` previously. Second, we want to encode the shared prefix rather
than the un-shared suffix length.
Fix this by using `same_len` as the varint value instead.
We have a check in place whether the index has enough data left for the
required footer after reading an index entry, but this was only used for
uncompressed entries. Move the check down a bit so that it is executed
for both compressed and uncompressed index entries.
All index entry size computations are now performed in
`index_entry_size`. As such, we do not need the file-scope macros for
computing these sizes anymore. Remove them and move the `entry_size`
macro into the `index_entry_size` function.
Our code to write index entries to disk does not check whether the
entry that is to be written should use prefix compression for the path.
As such, we were overallocating memory and added bogus right-padding
into the resulting index entries. As there is no padding allowed in the
index version 4 format, this should actually result in an invalid index.
Fix this by re-using the newly extracted `index_entry_size` function.
Create a new function `index_entry_size` which encapsulates the logic to
calculate how much space is needed for an index entry, whether it is
simple/extended or compressed/uncompressed. This can later be re-used by
our code writing index entries.
The last written disk entry is currently being written inside of the
function `write_disk_entry`. Make behavior a bit more obviously by
instead setting it inside of `write_entries` while iterating all
entries.
To calculate the path of a compressed index entry, we need to know the
preceding entry's path. While we do actually set the first predecessor
correctly to "", we fail to update this while reading the entries.
Fix the issue by updating `last` inside of the loop. Previously, we've
been passing a double-pointer to `read_entry`, which it didn't update.
As it is more obvious to update the pointer inside the loop itself,
though, we can simply convert it to a normal pointer.
The index version 4 introduced compressed path names for the entries.
From the git.git index-format documentation:
At the beginning of an entry, an integer N in the variable width
encoding [...] is stored, followed by a NUL-terminated string S.
Removing N bytes from the end of the path name for the previous
entry, and replacing it with the string S yields the path name for
this entry.
But instead of stripping N bytes from the previous path's string and
using the remaining prefix, we were instead simply concatenating the
previous path with the current entry path, which is obviously wrong.
Fix the issue by correctly copying the first N bytes of the previous
entry only and concatenating the result with our current entry's path.
When encoding varints to a buffer, we want to remain sure that the
required buffer space does not exceed what is actually available. Our
current check does not do the right thing, though, in that it does not
honor that our `pos` variable counts the position down instead of up. As
such, we will require too much memory for small varints and not enough
memory for big varints.
Fix the issue by correctly calculating the required size as
`(sizeof(varint) - pos)`. Add a test which failed before.
Upstream git.git has changed the way how packfiles are named.
Previously, they were using a hash of the contained object's OIDs, which
has then been changed to use the hash of the complete packfile instead.
See 1190a1acf (pack-objects: name pack files after trailer hash,
2013-12-05) in the git.git repository for more information on this
change.
This commit changes our logic to match the behavior of core git.
To determine if a repository is a worktree or not, we currently check
for the existence of a "gitdir" file inside of the repository's gitdir.
While this is sufficient for non-broken repositories, we have at least
one case of a subtly broken repository where there exists a gitdir file
inside of a gitmodule. This will cause us to misidentify the submodule
as a worktree.
While this is not really a fault of ours, we can do better here by
observing that a repository can only ever be a worktree iff its common
directory and dotgit directory are different. This allows us to make our
check whether a repo is a worktree or not more strict by doing a simple
string comparison of these two directories. This will also allow us to
do the right thing in the above case of a broken repository, as for
submodules these directories will be the same. At the same time, this
allows us to skip the `stat` check for the "gitdir" file for most
repositories.
The check whether a repository is a worktree or not is currently done
inside of `git_repository_open_ext`. As we want to extend this function
later on, pull it out into its own function `repo_is_worktree` to ease
working on it.
The out-parameters of `find_repo` containing found paths of a repository
are a tad confusing, as they are not as obvious as they could be. Rename
them like following to ease reading the code:
- `repo_path` -> `gitdir_path`
- `parent_path` -> `workdir_path`
- `link_path` -> `gitlink_path`
- `common_path` -> `commondir_path`
The `path` out-parameter of `find_repo` is being sanitized initially
such that we do not try to append to existing content. The sanitization
is done via `git_buf_free`, though, which forces us to needlessly
reallocate the buffer later in the function. Fix this by using
`git_buf_clear` instead.