Since writing multiple objects may all already exist in a single
packfile, avoid freshening that packfile repeatedly in a tight loop.
Instead, only freshen pack files every 2 seconds.
Don't try to determine when sysdirs are uninitialized. Instead, simply
initialize them all at `git_libgit2_init` time and never try to
reinitialize, except when consumers explicitly call `git_sysdir_set`.
Looking at the buffer length is especially problematic, since there may
no appropriate path for that value. (For example, the Windows-specific
programdata directory has no value on non-Windows machines.)
Previously we would continually trying to re-lookup these values,
which could get racy if two different threads are each calling
`git_sysdir_get` and trying to lookup / clear the value simultaneously.
According to git-fetch(1), "[t]he colon can be omitted when <dst>
is empty." So according to git, the refspec "refs/heads/master"
is the same as the refspec "refs/heads/master:" when fetching
changes. When trying to fetch from a remote with a trailing
colon with libgit2, though, the fetch actually fails while it
works when the trailing colon is left out. So obviously, libgit2
does _not_ treat these two refspec formats the same for fetches.
The problem results from parsing refspecs, where the resulting
refspec has its destination set to an empty string in the case of
a trailing colon and to a `NULL` pointer in the case of no
trailing colon. When passing this to our DWIM machinery, the
empty string gets translated to "refs/heads/", which is simply
wrong.
Fix the problem by having the parsing machinery treat both cases
the same for fetch refspecs.
After 1cd65991, we were passing a pointer to an `unsigned long` to
a function that now expected a pointer to a `size_t`. These types
differ on 64-bit Windows, which means that we trash the stack.
Use `size_t`s in the packbuilder to avoid this.
Somehow I ended up with the following in my ~/.gitconfig:
[branch "master"]
remote = origin
merge = master
rebase = true
I assume something went crazy while I was running the git.git tests
some time ago, and that I never noticed until now.
This is not a good configuration, but it shouldn't cause problems. But
it does. Specifically, if you have this in your config, and you
perform the following set of actions:
create a remote
fetch from that remote
create a branch off of the remote master branch called "master"
delete the branch
delete the remote
The remote delete fails with the message "Could not find key
'branch.master.rebase' to delete". This is because it's iterating over
the config entries (including the ones in the global config) and
believes that there is a master branch which must therefore have these
config keys.
https://github.com/libgit2/libgit2/issues/3856
Ensure that we include conflicts when calling `git_index_read_index`,
which will remove conflicts in the index that do not exist in the new
target, and will add conflicts from the new target.
Most of `git_index_read_index` is common to reading any iterator.
Refactor it out in case we want to implement `read_tree` in terms of it
in the future.
When we create a blame origin, we try to look up the blob that is
to be blamed at a certain revision. When this lookup fails, e.g.
because the file did not exist at that certain revision, we fail
to create the blame origin and return `NULL`. The blame origin
that we have just allocated is thereby free'd with
`origin_decref`.
The `origin_decref` function does not only decrement reference
counts for the blame origin, though, but also for its commit and
blob. When this is done in the error case, we will cause an
uneven reference count for these objects. This may result in
hard-to-debug failures at seemingly unrelated code paths, where
we try to access these objects when they in fact have already
been free'd.
Fix the issue by refactoring `make_origin` such that we only
allocate the object after the only function that may fail so that
we do not have to call `origin_decref` at all. Also fix the
`pass_blame` function, which indirectly calls `make_origin`, to
free the commit when `make_origin` failed.
When showing copy information because we are duplicating contents,
for example, when performing a `diff --find-copies-harder -M100 -B100`,
then show copy from/to lines in a patch, and do not show context.
Ensure that we can also parse such patches.
find_repo had a complex loop and heavily nested conditionals, making it
difficult to follow. Simplify this as much as possible:
- Separate assignments from conditionals.
- Check the complex loop condition in the only place it can change.
- Break out of the loop on error, rather than going through the rest of
the loop body first.
- Handle error cases by immediately breaking, rather than nesting
conditionals.
- Free repo_link unconditionally on the way out of the function, rather
than in multiple places.
- Add more comments on the remaining complex steps.
git_repository_open_ext provides parameters for the start path, whether
to search across filesystems, and what ceiling directories to stop at.
git commands have standard environment variables and defaults for each
of those, as well as various other parameters of the repository. To
avoid duplicate environment variable handling in users of libgit2, add a
GIT_REPOSITORY_OPEN_FROM_ENV flag, which makes git_repository_open_ext
automatically handle the appropriate environment variables. Commands
that intend to act just like those built into git itself can use this
flag to get the expected default behavior.
git_repository_open_ext with the GIT_REPOSITORY_OPEN_FROM_ENV flag
respects $GIT_DIR, $GIT_DISCOVERY_ACROSS_FILESYSTEM,
$GIT_CEILING_DIRECTORIES, $GIT_INDEX_FILE, $GIT_NAMESPACE,
$GIT_OBJECT_DIRECTORY, and $GIT_ALTERNATE_OBJECT_DIRECTORIES. In the
future, when libgit2 gets worktree support, git_repository_open_env will
also respect $GIT_WORK_TREE and $GIT_COMMON_DIR; until then,
git_repository_open_ext with this flag will error out if either
$GIT_WORK_TREE or $GIT_COMMON_DIR is set.
GIT_REPOSITORY_OPEN_NO_SEARCH does not search up through parent
directories, but still tries the specified path both directly and with
/.git appended. GIT_REPOSITORY_OPEN_BARE avoids appending /.git, but
opens the repository in bare mode even if it has a working directory.
To support the semantics git uses when given $GIT_DIR in the
environment, provide a new GIT_REPOSITORY_OPEN_NO_DOTGIT flag to not try
appending /.git.
git only checks ceiling directories when its search ascends to a parent
directory. A ceiling directory matching the starting directory will not
prevent git from finding a repository in the starting directory or a
parent directory. libgit2 handled the former case correctly, but
differed from git in the latter case: given a ceiling directory matching
the starting directory, but no repository at the starting directory,
libgit2 would stop the search at that point rather than finding a
repository in a parent directory.
Test case using git command-line tools:
/tmp$ git init x
Initialized empty Git repository in /tmp/x/.git/
/tmp$ cd x/
/tmp/x$ mkdir subdir
/tmp/x$ cd subdir/
/tmp/x/subdir$ GIT_CEILING_DIRECTORIES=/tmp/x git rev-parse --git-dir
fatal: Not a git repository (or any of the parent directories): .git
/tmp/x/subdir$ GIT_CEILING_DIRECTORIES=/tmp/x/subdir git rev-parse --git-dir
/tmp/x/.git
Fix the testsuite to test this case (in one case fixing a test that
depended on the current behavior), and then fix find_repo to handle this
case correctly.
In the process, simplify and document the logic in find_repo():
- Separate the concepts of "currently checking a .git directory" and
"number of iterations left before going further counts as a search"
into two separate variables, in_dot_git and min_iterations.
- Move the logic to handle in_dot_git and append /.git to the top of the
loop.
- Only search ceiling_dirs and find ceiling_offset after running out of
min_iterations; since ceiling_offset only tracks the longest matching
ceiling directory, if ceiling_dirs contained both the current
directory and a parent directory, this change makes find_repo stop the
search at the parent directory.
Avoid declaring old-style functions without any parameters.
Functions not accepting any parameters should be declared with
`void fn(void)`. See ISO C89 $3.5.4.3.
The old pthread-file did re-implement the pthreads API with exact symbol
matching. As the thread-abstraction has now been split up between Unix- and
Windows-specific files within the `git_` namespace to avoid symbol-clashes
between libgit2 and pthreads, the rewritten wrappers have nothing to do with
pthreads anymore.
Rename the Windows-specific pthread-files to honor this change.
The thread local storage is used to hold some global state that
is dynamically allocated and should be freed upon exit. On
Windows, we clean up the C run-time right after execution of
registered shutdown callbacks and before cleaning up the TLS.
When we clean up the CRT, we also cause it to analyze for memory
leaks. As we did not free the TLS yet this will lead to false
positives.
Fix the issue by first freeing the TLS and cleaning up the CRT
only afterwards.
When removing an entry from the index by its position, we first
retrieve the position from the index's entries and then try to
remove the retrieved value from the index map with
`DELETE_IN_MAP`. When `index_remove_entry` returns `NULL` we try
to feed it into the `DELETE_IN_MAP` macro, which will
unconditionally call `idxentry_hash` and then happily dereference
the `NULL` entry pointer.
Fix the issue by not passing a `NULL` entry into `DELETE_IN_MAP`.
When we receive a packet of exactly four bytes encoding its
length as those four bytes it can be treated as an empty line.
While it is not really specified how those empty lines should be
treated, we currently ignore them and do not return an error when
trying to parse it but simply advance the data pointer.
Callers invoking `git_pkt_parse_line` are currently not prepared
to handle this case as they do not explicitly check this case.
While they could always reset the passed out-pointer to `NULL`
before calling `git_pkt_parse_line` and determine if the pointer
has been set afterwards, it makes more sense to update
`git_pkt_parse_line` to set the out-pointer to `NULL` itself when
it encounters such an empty packet. Like this it is guaranteed
that there will be no invalid memory references to free'd
pointers.
As such, the issue has been fixed such that `git_pkt_parse_line`
always sets the packet out pointer to `NULL` when an empty packet
has been received and callers check for this condition, skipping
such packets.
When adding a new entry to an existing index via `git_index_read_index`,
be sure to remove the tree cache entry for that new path. This will
mark all parent trees as dirty.
Clear any error state upon each iteration. If one of the iterations
ends (with an error of `GIT_ITEROVER`) we need to reset that error to 0,
lest we stop the whole process prematurely.
It looks like we're getting the operation and not doing anything
with it, when in fact we are asserting that it's not null. Simply
assert that we are within the operation boundary instead of using
the `git_array_get` macro to do this for us.
Parse values up to and including `\377` (`0xff`) when unquoting.
Print octal values as an unsigned char when quoting, lest `printf`
think we're talking about negatives.
`oid_strlen` has meant one more than the length of the string.
This is mighty confusing. Make it mean only the string length!
Whomsoever needs to allocate a buffer to hold a string can null
terminate it like normal.
When a text file is added or deleted, use the file names from the
`diff --git` header instead of the `---` or `+++` lines. This is
for compatibility with git.
Now that `git_diff_delta` data can be produced by reading patch
file data, which may have an abbreviated oid, allow consumers to
know that the id is abbreviated.
Patches can now come from a variety of sources - either internally
generated (from diffing two commits) or as the results of parsing
some external data.
When we are provided some input buffer (with a length) to inflate,
and it contains more data than simply the deflated data, fail.
zlib will helpfully tell us when it is done reading (via Z_STREAM_END),
so if there is data leftover in the input buffer, fail lest we
continually try to inflate it.
Handle the application of binary patches. Include tests that
produce a binary patch (an in-memory `git_patch` object),
then enusre that the patch applies correctly.
Move the delta application functions into `delta.c`, next to the
similar delta creation functions. Make the `git__delta_apply`
functions adhere to other naming and parameter style within the
library.
When we want to remove the file, use the basename as the name of the
entry to remove, instead of the full one, which includes the directories
we've inserted into the stack.
Instead of going through the usual steps of reading a tree recursively
into an index, modifying it and writing it back out as a tree, introduce
a function to perform simple updates more efficiently.
`git_tree_create_updated` avoids reading trees which are not modified
and supports upsert and delete operations. It is not as versatile as
modifying the index, but it makes some common operations much more
efficient.
When calling `git_commit_create` with an empty array of `parents` and `parent_count == 0`
the call will segfault at https://github.com/libgit2/libgit2/blob/master/src/commit.c#L107
when it's trying to compare `current_id` to a null parent oid.
This just puts in a check to stop that segfault.
When determining diffs between two iterators we may need to
recurse into an unmatched directory for the "new" iterator when
it is either a prefix to the current item of the "old" iterator
or when untracked/ignored changes are requested by the user and
the directory is untracked/ignored.
When advancing into the directory and no files are found, we will
get back `GIT_ENOTFOUND`. If so, we simply skip the directory,
handling resulting unmatched old items in the next iteration. The
other case of `iterator_advance_into` returning either
`GIT_NOERROR` or any other error but `GIT_ENOTFOUND` will be
handled by the caller, which will now either compare the first
directory entry of the "new" iterator in case of `GIT_ENOERROR`
or abort on other cases.
Improve readability of the code to make the above logic more
clear.
We compute offsets by executing `off |= (*delta++ << 24)` for
multiple constants, where `off` is of type `size_t` and `delta`
is of type `unsigned char`. The usual arithmetic conversions (see
ISO C89 §3.2.1.5 "Usual arithmetic conversions") kick in here,
causing us to promote both operands to `int` and then extending
the result to an `unsigned long` when OR'ing it with `off`.
The integer promotion to `int` may result in wrong size
calculations for big values.
Fix the issue by making the constants `unsigned long`, causing both
operands to be promoted to `unsigned long`.
An object's size is computed by reading the object header's size
field until the most significant bit is not set anymore. To get
the total size, we increase the shift on each iteration and add
the shifted value to the total size.
We read the current value into a variable of type `unsigned
char`, from which we then take all bits except the most
significant bit and shift the result. We will end up with a
maximum shift of 60, but this exceeds the width of the value's
type, resulting in undefined behavior.
Fix the issue by instead reading the values into a variable of
type `unsigned long`, which matches the required width. This is
equivalent to git.git, which uses an `unsigned long` as well.
When `git_repository__cvar` fails we may end up with a
`ignorecase` value of `-1`. As we subsequently check if
`ignorecase` is non-zero, we may end up reporting that data
should be removed when in fact it should not.
Err on the safer side and set `ignorecase = 0` when
`git_repository__cvar` fails.
The `merge_file__xdiff` function checks if either `ours` or
`theirs` is `NULL`. The function is to be called with existing
files, though, and in fact already unconditionally dereferences
both pointers.
Remove the unnecessary check to silence warnings.
When we read the header, we want to know the size and type of the
object. We're currently inflating the full delta in order to read the
first few bytes. This can mean hundreds of kB needlessly inflated for
large objects.
Instead use a packfile stream to read just enough so we can read the two
varints in the header and avoid inflating most of the delta.
Differentiate between the ref_name used to create an annotated_commit
(that can subsequently be used to look up the reference) and the
description that we resolved this with (which _cannot_ be looked up).
The description is used for things like reflogs (and may be a ref name,
and ID something that we revparsed to get here), while the ref name must
actually be a reference name, and is used for things like rebase to
return to the initial branch.
openssl_read should return -1 in case of error.
SSL_read returns values <= 0 in case of error.
A return value of 0 can lead to an infinite loop, so the return value
of ssl_set_error will be returned if SSL_read is not successful (analog
to openssl_write).
While no extra header fields are defined for tags, git accepts them by
ignoring them and continuing the search for the message. There are a few
tags like this in the wild which git parses just fine, so we should do
the same.