According to git-fetch(1), "[t]he colon can be omitted when <dst>
is empty." So according to git, the refspec "refs/heads/master"
is the same as the refspec "refs/heads/master:" when fetching
changes. When trying to fetch from a remote with a trailing
colon with libgit2, though, the fetch actually fails while it
works when the trailing colon is left out. So obviously, libgit2
does _not_ treat these two refspec formats the same for fetches.
The problem results from parsing refspecs, where the resulting
refspec has its destination set to an empty string in the case of
a trailing colon and to a `NULL` pointer in the case of no
trailing colon. When passing this to our DWIM machinery, the
empty string gets translated to "refs/heads/", which is simply
wrong.
Fix the problem by having the parsing machinery treat both cases
the same for fetch refspecs.
When passing in a specific suite which should be executed by clar
via `-stest::suite`, we try to parse this string and then include
all tests contained in this suite. This also includes all tests
in sub-suites, e.g. 'test::suite::foo'.
In the case where multiple suites start with the same _string_,
for example 'test::foo' and 'test::foobar', we fail to
distinguish this correctly. When passing in `-stest::foobar`,
we wrongly determine that 'test::foo' is a prefix and try to
execute all of its matching functions. But as no function
will now match 'test::foobar', we simply execute nothing.
To fix this, we instead have to check if the prefix is an actual
suite prefix as opposed to a simple string prefix. We do so by by
inspecting if the first two characters trailing the prefix are
our suite delimiters '::', and only consider the filter as
matching in this case.
Ensure that we include conflicts when calling `git_index_read_index`,
which will remove conflicts in the index that do not exist in the new
target, and will add conflicts from the new target.
When showing copy information because we are duplicating contents,
for example, when performing a `diff --find-copies-harder -M100 -B100`,
then show copy from/to lines in a patch, and do not show context.
Ensure that we can also parse such patches.
git_repository_open_ext provides parameters for the start path, whether
to search across filesystems, and what ceiling directories to stop at.
git commands have standard environment variables and defaults for each
of those, as well as various other parameters of the repository. To
avoid duplicate environment variable handling in users of libgit2, add a
GIT_REPOSITORY_OPEN_FROM_ENV flag, which makes git_repository_open_ext
automatically handle the appropriate environment variables. Commands
that intend to act just like those built into git itself can use this
flag to get the expected default behavior.
git_repository_open_ext with the GIT_REPOSITORY_OPEN_FROM_ENV flag
respects $GIT_DIR, $GIT_DISCOVERY_ACROSS_FILESYSTEM,
$GIT_CEILING_DIRECTORIES, $GIT_INDEX_FILE, $GIT_NAMESPACE,
$GIT_OBJECT_DIRECTORY, and $GIT_ALTERNATE_OBJECT_DIRECTORIES. In the
future, when libgit2 gets worktree support, git_repository_open_env will
also respect $GIT_WORK_TREE and $GIT_COMMON_DIR; until then,
git_repository_open_ext with this flag will error out if either
$GIT_WORK_TREE or $GIT_COMMON_DIR is set.
GIT_REPOSITORY_OPEN_NO_SEARCH does not search up through parent
directories, but still tries the specified path both directly and with
/.git appended. GIT_REPOSITORY_OPEN_BARE avoids appending /.git, but
opens the repository in bare mode even if it has a working directory.
To support the semantics git uses when given $GIT_DIR in the
environment, provide a new GIT_REPOSITORY_OPEN_NO_DOTGIT flag to not try
appending /.git.
git only checks ceiling directories when its search ascends to a parent
directory. A ceiling directory matching the starting directory will not
prevent git from finding a repository in the starting directory or a
parent directory. libgit2 handled the former case correctly, but
differed from git in the latter case: given a ceiling directory matching
the starting directory, but no repository at the starting directory,
libgit2 would stop the search at that point rather than finding a
repository in a parent directory.
Test case using git command-line tools:
/tmp$ git init x
Initialized empty Git repository in /tmp/x/.git/
/tmp$ cd x/
/tmp/x$ mkdir subdir
/tmp/x$ cd subdir/
/tmp/x/subdir$ GIT_CEILING_DIRECTORIES=/tmp/x git rev-parse --git-dir
fatal: Not a git repository (or any of the parent directories): .git
/tmp/x/subdir$ GIT_CEILING_DIRECTORIES=/tmp/x/subdir git rev-parse --git-dir
/tmp/x/.git
Fix the testsuite to test this case (in one case fixing a test that
depended on the current behavior), and then fix find_repo to handle this
case correctly.
In the process, simplify and document the logic in find_repo():
- Separate the concepts of "currently checking a .git directory" and
"number of iterations left before going further counts as a search"
into two separate variables, in_dot_git and min_iterations.
- Move the logic to handle in_dot_git and append /.git to the top of the
loop.
- Only search ceiling_dirs and find ceiling_offset after running out of
min_iterations; since ceiling_offset only tracks the longest matching
ceiling directory, if ceiling_dirs contained both the current
directory and a parent directory, this change makes find_repo stop the
search at the parent directory.
Avoid declaring old-style functions without any parameters.
Functions not accepting any parameters should be declared with
`void fn(void)`. See ISO C89 $3.5.4.3.
Read a tree into an index using `git_index_read_index` (by reading
a tree into a new index, then reading that index into the current
index), then write the index back out, ensuring that our new index
is treesame to the tree that we read.
Test with some postimages that actually grow/shrink from the
original, adding new lines or removing them. (Also do so without
context to ensure that we can add/remove from a non-zero part of
the line vector.)
Parse values up to and including `\377` (`0xff`) when unquoting.
Print octal values as an unsigned char when quoting, lest `printf`
think we're talking about negatives.
Patches can now come from a variety of sources - either internally
generated (from diffing two commits) or as the results of parsing
some external data.
When we are provided some input buffer (with a length) to inflate,
and it contains more data than simply the deflated data, fail.
zlib will helpfully tell us when it is done reading (via Z_STREAM_END),
so if there is data leftover in the input buffer, fail lest we
continually try to inflate it.
Handle the application of binary patches. Include tests that
produce a binary patch (an in-memory `git_patch` object),
then enusre that the patch applies correctly.
Instead of going through the usual steps of reading a tree recursively
into an index, modifying it and writing it back out as a tree, introduce
a function to perform simple updates more efficiently.
`git_tree_create_updated` avoids reading trees which are not modified
and supports upsert and delete operations. It is not as versatile as
modifying the index, but it makes some common operations much more
efficient.
`test_commit_commit__create_initial_commit_parent_not_current` was not correctly
testing that `HEAD` was not changed. Now we grab the oid that it was pointing to
before the call to `git_commit_create` and the oid that it's pointing to afterwards
and compare those.
While no extra header fields are defined for tags, git accepts them by
ignoring them and continuing the search for the message. There are a few
tags like this in the wild which git parses just fine, so we should do
the same.
In order to match the star-star, we disable the flag that's looking for
a single path element, but that leads to searching for the pattern in
the middle of elements in the input string.
Mark when we're handing a star-star so we jump over the elements in our
attempt to match the part of the pattern that comes after the star-star.
While here, tighten up the check so we don't allow invalid rules
through.
When we're dealing with proxy addresses, we only want a hostname and
port, and the user would not provide a path, so make it optional so we
can use this same function to parse git as well as proxy URLs.
When running as root, skip the unreadable file tests, because, well,
they're probably _not_ unreadable to root unless you've got some
crazy NSA clearance-level honoring operating system shit going on.
When we turned strict object creation validation on by default, we
forgot to inform the refs::create tests of this. They, in fact,
believed that strict object creation was off by default. As a result,
their cleanup function went and turned strict object creation off for
the remaining tests.
If we cannot dwim the input, set the error message to be explicit about
that. Otherwise we leave the error for the last failed lookup, which
can be rather unexpected as it mentions a remote when the user thought
they were trying to look up a branch.
When passing -DUSE_OPENSSL:BOOL=OFF to cmake the testsuite will
fail with the following error:
core::stream::register_tls [/tmp/libgit2/tests/core/stream.c:40]
Function call failed: (error)
error -1 - <no message>
Fix test to assume failure for tls when built without openssl.
While at it also fix GIT_WIN32 cpp to check if it's defined
or not.
Allow callers to specify a start path with a trailing slash to match
a submodule, instead of just a directory. This is for some legacy
behavior that's sort of dumb, but there it is.
If we're looking for a symlink, realpath will give us the resolved path,
which is not what we're after, but a canonicalized version of the path
the user asked for.
Iterator tests were split over repo::iterator and diff::iterator,
with duplication between the two. Move them to iterator::index,
iterator::tree, and iterator::workdir.
Prior iterator implementations returned `GIT_ENOTFOUND` when
trying to advance into empty directories. Ensure that we no longer
do that and simply handle them gracefully.
tree_iterator was only working properly for a pathlist containing
file paths. In case of directory paths, it didn't match children
which contradicts GIT_DIFF_DISABLE_PATHSPEC_MATCH and
is different from index_iterator and fs_iterator.
As a consequence head-to-index status reporting for a specific
directory did not work properly -- all files have been reported
as added.
Include additional tests.
In the workdir iterator we do some tricky things to step down into
directories to look for things that are in our pathlist. Make sure
that we don't confuse between folders that we're definitely going to
return everything in and folders that we're only stepping down into
to keep looking for matches.
Ensure that we have hit the end of iteration; previously we tested
that we saw all the values that we expected to see. We did not
then ensure that we were at the end of the iteration (and that there
were subsequently values in the iteration that we did *not* expect.)
Refactored the tree iterator to never recurse; simply process the
next entry in order in `advance`. Additionally, reduce the number of
allocations and sorting as much as possible to provide a ~30% speedup
on case-sensitive iteration. (The gains for case-insensitive iteration
are less majestic.)
Disambiguate the reset and reset_range functions. Now reset_range
with a NULL path will clear the start or end; reset will leave the
existing start and end unchanged.
The callback mechanism makes it awkward to write data from an IO
source; move to `_fromstream()` which lets the caller remain in control,
in the same vein as we prefer iterators over foreach callbacks.
By returning when the count goes to zero rather than below it, setting
`howmany` to 7 in fact writes out the string 6 times.
Correct the termination condition to write out the string the amount of
times we specify.
The pair of `git_blob_create_frombuffer()` and
`git_blob_create_frombuffer_commit()` is meant to replace
`git_blob_create_fromchunks()` by providing a way for a user to write a
new blob when they want filtering or they do not know the size.
This approach allows the caller to retain control over when to add data
to this buffer and a more natural fit into higher-level language's own
stream abstractions instead of having to handle IO wait in the callback.
The in-memory buffer size of 2MB is chosen somewhat arbitrarily to be a
round multiple of usual page sizes and a value where most blobs seem
likely to be either going to be way below or way over that size. It's
also a round number of pages.
This implementation re-uses the helper we have from `_fromchunks()` so
we end up writing everything to disk, but hopefully more efficiently
than with a default filebuf. A later optimisation can be to avoid
writing the in-memory contents to disk, with some extra complexity.
This special-casing ignores that we might have a locked file, so the
hashtable does not represent the contents of the file we want to
write. This causes multivar writes to overwrite entries instead of add
to them when under lock.
There is no need for this as the normal code-path will write to the file
just fine, so simply get rid of it.
Since the `apply` callback can defer, the `check` callback is not
necessary. Removing the `check` callback further makes the `payload`
unnecessary along with the `cleanup` callback.
Ensure that setting the merge attribute forces the built-in default
`text` driver and does *not* honor the `merge.default` configuration
option. Further ensure that unsetting the merge attribute forces
a conflict (the `binary` driver).
Consumers can now register custom merged drivers with
`git_merge_driver_register`. This allows consumers to support the
merge drivers, as configured in `.gitattributes`. Consumers will be
asked to perform the file-level merge when a custom driver is
configured.
The function to extract signatures suffers from a similar bug to the
header field finding one by having an unecessary line feed check as a
break condition of its loop.
Fix that and add a test for this single-line signature situation.
This ensures that when using OpenSSL a safe default set of ciphers
is selected. This is done so that the client communicates securely
and we don't accidentally enable unsafe ciphers like RC4, or even
worse some old export ciphers.
Implements the first part of https://github.com/libgit2/libgit2/issues/3682
git_buf_clear does not free allocated memory associated with a
git_buf. Use `git_buf_free` instead to correctly free its memory
and plug the memory leak.
The old implementation had two issues:
1. OIDs that were too short as to be ambiguous were not being handled
properly.
2. If the last OID to expand in the array was missing from the ODB, we
would leak a `GIT_ENOTFOUND` error code from the function.
Sometimes you want to create a commit but not write it out to the
objectdb immediately. For these cases, provide a new function to
retrieve the buffer instead of having to go through the db.
If the underlying filesystem doesn't support better than one
second resolution, then don't expect that turning on `GIT_USE_NSEC`
does anything magical to change that.
Submodules don't exist in the objectdb and the code is making us try to
look for a blob with its commit id, which is obviously not going to
work.
Skip the test if the user wants to insert a submodule.
The index::nsec::staging_maintains_other_nanos test was created to
ensure that when we stage an entry when GIT_USE_NSECS is *unset* that
we truncate the index entry and do not persist the (old, invalid)
nanosec values. Ensure that when GIT_USE_NSECS is *set* that we do
not do that, and actually write the correct nanosecond values.
Test some additional exotic rebase setup behavior: that we are
able to set up properly when already in a detached HEAD state,
that the caller specifies all of branch, upstream and onto,
and that the caller specifies branch, upstream and onto by ID.
Allow `git_index_read` to handle reading existing indexes with
illegal entries. Allow the low-level `git_index_add` to add
properly formed `git_index_entry`s even if they contain paths
that would be illegal for the current filesystem (eg, `AUX`).
Continue to disallow `git_index_add_bypath` from adding entries
that are illegal universally illegal (eg, `.git`, `foo/../bar`).
Introduce a repository that contains some paths that were illegal
on PC-DOS circa 1981 (like `aux`, `con`, `com1`) and that in a
bizarre fit of retrocomputing, remain illegal on some "modern"
computers, despite being "new technology".
Introduce some aspirational tests that suggest that we should be
able to cope with trees and indexes that contain paths that
would be illegal on the filesystem, so that we can at least diff
them. Further ensure that checkout will not write a repository
with forbidden paths.
We should be checking whether the object we're looking up is a commit,
and we should let the caller know whether the not-found return code
comes from a bad object type or just a missing signature.
When performing an in-memory rebase, keep a single index for the
duration, so that callers have the expected index lifecycle and
do not hold on to an index that is free'd out from under them.
When we moved the logic to handle the first one, wrong loop logic was
kept in place which meant we still finished early. But we now notice it
because we're not reading past the last LF we find.
This was not noticed before as the last field in the tested commit was
multi-line which does not trigger the early break.
Introduce the ability to rebase in-memory or in a bare repository.
When `rebase_options.inmemory` is specified, the resultant `git_rebase`
session will not be persisted to disk. Callers may still analyze
the rebase operations, resolve any conflicts against the in-memory
index and create the commits. Neither `HEAD` nor the working
directory will be updated during this process.
We were searching only past the first header field, which meant we were
unable to find e.g. `tree` which is the first field.
While here, make sure to set an error message in case we cannot find the
field.
Include dotfiles when copying template directory, which will handle
both a template directory itself that begins with a dotfile, and
any dotfiles inside the directory.
Fix the file-mode test to expect system umask being applied to the
created file as well (it is currently applied to the directory only).
This fixes the test on systems where umask != 022.
Signed-off-by: Michał Górny <mgorny@gentoo.org>
When formatting a patch as email we do not include the commit's
message in the formatted patch output. Implement this and add a
test that verifies behavior.
It is already possible to get a commit's summary with the
`git_commit_summary` function. It is not possible to get the
remaining part of the commit message, that is the commit
message's body.
Fix this by introducing a new function `git_commit_body`.
It is not unreasonable to have versioned files with a line count
exceeding 2^16. Upon blaming such files we fail to correctly keep
track of the lines as `git_blame_hunk` stores them in `uint16_t`
fields.
Fix this by converting the line fields of `git_blame_hunk` to
`size_t`. Add test to verify behavior.
When building a recursive merge base, allow conflicts to occur.
Use the file (with conflict markers) as the common ancestor.
The user has already seen and dealt with this conflict by virtue
of having a criss-cross merge. If they resolved this conflict
identically in both branches, then there will be no conflict in the
result. This is the best case scenario.
If they did not resolve the conflict identically in the two branches,
then we will generate a new conflict. If the user is simply using
standard conflict output then the results will be fairly sensible.
But if the user is using a mergetool or using diff3 output, then the
common ancestor will be a conflict file (itself with diff3 output,
haha!). This is quite terrible, but it matches git's behavior.
Don't put the configuration in a subdir of the sandbox named
`config`, lest some tests decide to create their own directory
called `config`. Prefix with some underscores for uniqueness.
Ensure that `git_index_read_index` clears the uptodate bit on
files that it modifies.
Further, do not propagate the cache from an on-disk index into
another on-disk index. Although this should not be done, as
`git_index_read_index` is used to bring an in-memory index into
another index (that may or may not be on-disk), ensure that we do
not accidentally bring in these bits when misused.
Test that entries are only smudged when we write the index: the
entry smudging is to prevent us from updating an index in a way
that it would be impossible to tell that an item was racy.
Consider when we load an index: any entries that have the same
(or newer) timestamp than the index itself are considered racy,
and are subject to further scrutiny.
If we *save* that index with the same entries that we loaded,
then the index would now have a newer timestamp than the entries,
and they would no longer be given that additional scrutiny, failing
our racy detection! So test that we smudge those entries only on
writing the new index, but that we can detect them (in diff) without
having to write.
When there's no matching index entry (for whatever reason), don't
try to dereference the null return value to get at the id.
Otherwise when we break something in the index API, the checkout
test crashes for confusing reasons and causes us to step through
it in a debugger thinking that we had broken much more than we
actually did.
Keep track of entries that we believe are up-to-date, because we
added the index entries since the index was loaded. This prevents
us from unnecessarily examining files that we wrote during the
cleanup of racy entries (when we smudge racily clean files that have
a timestamp newer than or equal to the index's timestamp when we
read it). Without keeping track of this, we would examine every
file that we just checked out for raciness, since all their timestamps
would be newer than the index's timestamp.
When creating a filebuf, detect a directory that exists in our
target file location. This prevents a failure later, when we try
to move the lock file to the destination.
Test that on platforms without `core.symlinks`, we preserve symlinks
in `git_index_add_bypath`. (Users should correct the actual index
entry's mode to change a link to a regular file.)