The `git_pqueue` struct allows being fixed in its total number of
entries. In this case, we simply throw away items that are
inserted into the priority queue by examining wether the new item
to be inserted has a higher priority than the previous smallest
one.
This feature somewhat contradicts our pqueue implementation in
that it is allowed to not have a comparison function. In fact, we
also fail to check if the comparison function is actually set in
the case where we add a new item into a fully filled fixed-size
pqueue.
As we cannot determine which item is the smallest item in absence
of a comparison function, we fix the `NULL` pointer dereference
by simply dropping all new items which are about to be inserted
into a full fixed-size pqueue.
When one side of a merge is treesame to the ancestor, we can take the other side and skip all the expensive merge operations. This optimization can only be performed when the generation of REUC extension data is skipped.
Remove overriding the `checkout_strategy` for `update_options` when
performing an update on a submodule. Users should be specifying the
correct checkout strategy in
`update_options.checkout_opts.checkout_strategy`.
When parsing a commit, we will treat all bytes left after parsing
the headers as the commit message. When no bytes are left, we
leave the commit's message uninitialized. While uncommon to have
a commit without message, this is the right behavior as Git
unfortunately allows for empty commit messages.
Given that this scenario is so uncommon, most programs acting on
the commit message will never check if the message is actually
set, which may lead to errors. To work around the error and not
lay the burden of checking for empty commit messages to the
developer, initialize the commit message with an empty string
when no commit message is given.
When parsing tree entries from raw object data, we do not verify
that the tree entry actually has a filename as well as a valid
object ID. Fix this by asserting that the filename length is
non-zero as well as asserting that there are at least
`GIT_OID_RAWSZ` bytes left when parsing the OID.
When we read from the list which `limit_list()` gives us, we need to check that
the commit is still interesting, as it might have become uninteresting after it
was added to the list.
`git-rebase--merge` does not ask for time sorting, but uses the default. We now
produce the same default time-ordered output as git, so make us of that since
it's not always the same output as our time sorting.
It changed from implementation-defined to git's default sorting, as there are
systems (e.g. rebase) which depend on this order. Also specify more explicitly
how you can get git's "date-order".
After `limit_list()` we already have the list in time-sorted order, which is
what we want in the "default" case. Enqueueing into the "unsorted" list would
just reverse it, and the topological sort will do its own sorting if it needs
to.
We've now moved to code that's closer to git and produces the output
during the preparation phase, so we no longer process the commits as
part of generating the output.
This makes a chunk of code redundant, as we're simply short-circuiting
it by detecting we've processed the commits alrady.
After porting over the commit hiding and selection we were still left
with mistmaching output due to the topologial sort.
This ports the topological sorting code to make us match with our
equivalent of `--date-order` and `--topo-order` against the output
from `rev-list`.
This is a convenience function to reverse the contents of a vector and a pqueue
in-place.
The pqueue function is useful in the case where we're treating it as a
LIFO queue.
We had some home-grown logic to figure out which objects to show during
the revision walk, but it was rather inefficient, looking over the same
list multiple times to figure out when we had run out of interesting
commits. We now use the lists in a smarter way.
We also introduce the slop mechanism to determine when to stpo
looking. When we run out of interesting objects, we continue preparing
the walk for another 5 rounds in order to make it less likely that we
miss objects in situations with complex graphs.
When trying to determine if we can safely overwrite an existing workdir
item, we may need to calculate the oid for the workdir item to determine
if its identical to the old side (and eligible for removal).
We previously did this regardless of the type of entry in the workdir;
if it was a directory, we would open(2) it and then try to read(2).
The read(2) of a directory fails on many platforms, so we would treat it
as if it were unmodified and continue to perform the checkout.
On FreeBSD, you _can_ read(2) a directory, so this pattern failed. We
would calculate an oid from the data read and determine that the
directory was modified and would therefore generate a checkout conflict.
This reliance on read(2) is silly (and was most likely accidentally
giving us the behavior we wanted), we should be explicit about the
directory test.
When creating and printing diffs, deal with binary deltas that have
binary data specially, versus diffs that have a binary file but lack the
actual binary data.
Instead of skipping printing a binary diff when there is no data, skip
printing when we have a status of `UNMODIFIED`. This is more in-line
with our internal data model and allows us to expand the notion of
binary data.
In the future, there may have no data because the files were unmodified
(there was no data to produce) or it may have no data because there was
no data given to us in a patch. We want to treat these cases
separately.
When generating diffs for binary files, we load and decompress
the blobs in order to generate the actual diff, which can be very
costly. While we cannot avoid this for the case when we are
called with the `GIT_DIFF_SHOW_BINARY` flag, we do not have to
load the blobs in the case where this flag is not set, as the
caller is expected to have no interest in the actual content of
binary files.
Fix the issue by only generating a binary diff when the caller is
actually interested in the diff. As libgit2 uses heuristics to
determine that a blob contains binary data by inspecting its size
without loading from the ODB, this saves us quite some time when
diffing in a repository with binary files.
According to the reference the git_checkout_tree and git_checkout_head
functions should accept NULL in the opts field
This was broken since the opts field was dereferenced and thus lead to a
crash.
Introduce GIT_OPT_ENABLE_SYMBOLIC_REF_TARGET_VALIDATION option.
Setting this option to 0 allows
validation of a symbolic ref's target to be bypassed.
This option is enabled by default.
This mechanism is added primarily to address a discrepancy between git
behaviour and libgit2 behaviour, whereby the former allows the symbolic
ref target to carry an arbitrary string and the latter does not, so:
$ git symbolic-ref refs/heads/foo bar
$ cat .git/refs/heads/foo
ref: bar
where as attempting the same via libgit2 raises an error:
The given reference name 'bar' is not valid
this mechanism also allows those that might want to make use of
git's more lenient treatment of symbolic ref targets to do so.
When calling `http_connect` on a subtransport whose stream is already
connected, we first close the stream in case no keep-alive is in use.
When doing so, we do not reset the transport's connection state,
though. Usually, this will do no harm in case the subsequent connect
will succeed. But when the connection fails we are left with a
substransport which is tagged as connected but which has no valid
stream attached.
Fix the issue by resetting the subtransport's connected-state when
closing its stream in `http_connect`.
The .gitignore file allows for patterns which unignore previous
ignore patterns. When unignoring a previous pattern, there are
basically three cases how this is matched when no globbing is
used:
1. when a previous file has been ignored, it can be unignored by
using its exact name, e.g.
foo/bar
!foo/bar
2. when a file in a subdirectory has been ignored, it can be
unignored by using its basename, e.g.
foo/bar
!bar
3. when all files with a basename are ignored, a specific file
can be unignored again by specifying its path in a
subdirectory, e.g.
bar
!foo/bar
The first problem in libgit2 is that we did not correctly treat
the second case. While we verified that the negative pattern
matches the tail of the positive one, we did not verify if it
only matches the basename of the positive pattern. So e.g. we
would have also negated a pattern like
foo/fruz_bar
!bar
Furthermore, we did not check for the third case, where a
basename is being unignored in a certain subdirectory again.
Both issues are fixed with this commit.
Support reading and writing index v4. Index v4 uses a very simple
compression scheme for pathnames, but is otherwise similar to index v3.
Signed-off-by: David Turner <dturner@twitter.com>
When failing to initialize a new stransport stream, we try to
release already allocated memory by calling out to
`git_stream_free`, which in turn called out to the stream's
`free` function pointer. As we only initialize the function
pointer later on, this leads to a `NULL` pointer exception.
Furthermore, plug another memory leak when failing to create the
SSL context.
Only provide the empty tree internally, which matches git's behavior.
If we provide the empty blob then any users trying to write it with
libgit2 would omit it from actually landing in the odb, which appear
to git proper as a broken repository (missing that object).
The `SSLCopyPeerTrust` call can succeed but fail to return a trust
object if it can't load the certificate chain and thus cannot check the
validity of a certificate. This can lead to us calling `CFRelease` on a
`NULL` trust object, causing a crash.
Handle this by returning ECERTIFICATE.
Since writing multiple objects may all already exist in a single
packfile, avoid freshening that packfile repeatedly in a tight loop.
Instead, only freshen pack files every 2 seconds.
Don't try to determine when sysdirs are uninitialized. Instead, simply
initialize them all at `git_libgit2_init` time and never try to
reinitialize, except when consumers explicitly call `git_sysdir_set`.
Looking at the buffer length is especially problematic, since there may
no appropriate path for that value. (For example, the Windows-specific
programdata directory has no value on non-Windows machines.)
Previously we would continually trying to re-lookup these values,
which could get racy if two different threads are each calling
`git_sysdir_get` and trying to lookup / clear the value simultaneously.
According to git-fetch(1), "[t]he colon can be omitted when <dst>
is empty." So according to git, the refspec "refs/heads/master"
is the same as the refspec "refs/heads/master:" when fetching
changes. When trying to fetch from a remote with a trailing
colon with libgit2, though, the fetch actually fails while it
works when the trailing colon is left out. So obviously, libgit2
does _not_ treat these two refspec formats the same for fetches.
The problem results from parsing refspecs, where the resulting
refspec has its destination set to an empty string in the case of
a trailing colon and to a `NULL` pointer in the case of no
trailing colon. When passing this to our DWIM machinery, the
empty string gets translated to "refs/heads/", which is simply
wrong.
Fix the problem by having the parsing machinery treat both cases
the same for fetch refspecs.
After 1cd65991, we were passing a pointer to an `unsigned long` to
a function that now expected a pointer to a `size_t`. These types
differ on 64-bit Windows, which means that we trash the stack.
Use `size_t`s in the packbuilder to avoid this.
Somehow I ended up with the following in my ~/.gitconfig:
[branch "master"]
remote = origin
merge = master
rebase = true
I assume something went crazy while I was running the git.git tests
some time ago, and that I never noticed until now.
This is not a good configuration, but it shouldn't cause problems. But
it does. Specifically, if you have this in your config, and you
perform the following set of actions:
create a remote
fetch from that remote
create a branch off of the remote master branch called "master"
delete the branch
delete the remote
The remote delete fails with the message "Could not find key
'branch.master.rebase' to delete". This is because it's iterating over
the config entries (including the ones in the global config) and
believes that there is a master branch which must therefore have these
config keys.
https://github.com/libgit2/libgit2/issues/3856
Ensure that we include conflicts when calling `git_index_read_index`,
which will remove conflicts in the index that do not exist in the new
target, and will add conflicts from the new target.
Most of `git_index_read_index` is common to reading any iterator.
Refactor it out in case we want to implement `read_tree` in terms of it
in the future.
When we create a blame origin, we try to look up the blob that is
to be blamed at a certain revision. When this lookup fails, e.g.
because the file did not exist at that certain revision, we fail
to create the blame origin and return `NULL`. The blame origin
that we have just allocated is thereby free'd with
`origin_decref`.
The `origin_decref` function does not only decrement reference
counts for the blame origin, though, but also for its commit and
blob. When this is done in the error case, we will cause an
uneven reference count for these objects. This may result in
hard-to-debug failures at seemingly unrelated code paths, where
we try to access these objects when they in fact have already
been free'd.
Fix the issue by refactoring `make_origin` such that we only
allocate the object after the only function that may fail so that
we do not have to call `origin_decref` at all. Also fix the
`pass_blame` function, which indirectly calls `make_origin`, to
free the commit when `make_origin` failed.
When showing copy information because we are duplicating contents,
for example, when performing a `diff --find-copies-harder -M100 -B100`,
then show copy from/to lines in a patch, and do not show context.
Ensure that we can also parse such patches.
find_repo had a complex loop and heavily nested conditionals, making it
difficult to follow. Simplify this as much as possible:
- Separate assignments from conditionals.
- Check the complex loop condition in the only place it can change.
- Break out of the loop on error, rather than going through the rest of
the loop body first.
- Handle error cases by immediately breaking, rather than nesting
conditionals.
- Free repo_link unconditionally on the way out of the function, rather
than in multiple places.
- Add more comments on the remaining complex steps.
git_repository_open_ext provides parameters for the start path, whether
to search across filesystems, and what ceiling directories to stop at.
git commands have standard environment variables and defaults for each
of those, as well as various other parameters of the repository. To
avoid duplicate environment variable handling in users of libgit2, add a
GIT_REPOSITORY_OPEN_FROM_ENV flag, which makes git_repository_open_ext
automatically handle the appropriate environment variables. Commands
that intend to act just like those built into git itself can use this
flag to get the expected default behavior.
git_repository_open_ext with the GIT_REPOSITORY_OPEN_FROM_ENV flag
respects $GIT_DIR, $GIT_DISCOVERY_ACROSS_FILESYSTEM,
$GIT_CEILING_DIRECTORIES, $GIT_INDEX_FILE, $GIT_NAMESPACE,
$GIT_OBJECT_DIRECTORY, and $GIT_ALTERNATE_OBJECT_DIRECTORIES. In the
future, when libgit2 gets worktree support, git_repository_open_env will
also respect $GIT_WORK_TREE and $GIT_COMMON_DIR; until then,
git_repository_open_ext with this flag will error out if either
$GIT_WORK_TREE or $GIT_COMMON_DIR is set.
GIT_REPOSITORY_OPEN_NO_SEARCH does not search up through parent
directories, but still tries the specified path both directly and with
/.git appended. GIT_REPOSITORY_OPEN_BARE avoids appending /.git, but
opens the repository in bare mode even if it has a working directory.
To support the semantics git uses when given $GIT_DIR in the
environment, provide a new GIT_REPOSITORY_OPEN_NO_DOTGIT flag to not try
appending /.git.
git only checks ceiling directories when its search ascends to a parent
directory. A ceiling directory matching the starting directory will not
prevent git from finding a repository in the starting directory or a
parent directory. libgit2 handled the former case correctly, but
differed from git in the latter case: given a ceiling directory matching
the starting directory, but no repository at the starting directory,
libgit2 would stop the search at that point rather than finding a
repository in a parent directory.
Test case using git command-line tools:
/tmp$ git init x
Initialized empty Git repository in /tmp/x/.git/
/tmp$ cd x/
/tmp/x$ mkdir subdir
/tmp/x$ cd subdir/
/tmp/x/subdir$ GIT_CEILING_DIRECTORIES=/tmp/x git rev-parse --git-dir
fatal: Not a git repository (or any of the parent directories): .git
/tmp/x/subdir$ GIT_CEILING_DIRECTORIES=/tmp/x/subdir git rev-parse --git-dir
/tmp/x/.git
Fix the testsuite to test this case (in one case fixing a test that
depended on the current behavior), and then fix find_repo to handle this
case correctly.
In the process, simplify and document the logic in find_repo():
- Separate the concepts of "currently checking a .git directory" and
"number of iterations left before going further counts as a search"
into two separate variables, in_dot_git and min_iterations.
- Move the logic to handle in_dot_git and append /.git to the top of the
loop.
- Only search ceiling_dirs and find ceiling_offset after running out of
min_iterations; since ceiling_offset only tracks the longest matching
ceiling directory, if ceiling_dirs contained both the current
directory and a parent directory, this change makes find_repo stop the
search at the parent directory.
Avoid declaring old-style functions without any parameters.
Functions not accepting any parameters should be declared with
`void fn(void)`. See ISO C89 $3.5.4.3.
The old pthread-file did re-implement the pthreads API with exact symbol
matching. As the thread-abstraction has now been split up between Unix- and
Windows-specific files within the `git_` namespace to avoid symbol-clashes
between libgit2 and pthreads, the rewritten wrappers have nothing to do with
pthreads anymore.
Rename the Windows-specific pthread-files to honor this change.
The thread local storage is used to hold some global state that
is dynamically allocated and should be freed upon exit. On
Windows, we clean up the C run-time right after execution of
registered shutdown callbacks and before cleaning up the TLS.
When we clean up the CRT, we also cause it to analyze for memory
leaks. As we did not free the TLS yet this will lead to false
positives.
Fix the issue by first freeing the TLS and cleaning up the CRT
only afterwards.
When removing an entry from the index by its position, we first
retrieve the position from the index's entries and then try to
remove the retrieved value from the index map with
`DELETE_IN_MAP`. When `index_remove_entry` returns `NULL` we try
to feed it into the `DELETE_IN_MAP` macro, which will
unconditionally call `idxentry_hash` and then happily dereference
the `NULL` entry pointer.
Fix the issue by not passing a `NULL` entry into `DELETE_IN_MAP`.
When we receive a packet of exactly four bytes encoding its
length as those four bytes it can be treated as an empty line.
While it is not really specified how those empty lines should be
treated, we currently ignore them and do not return an error when
trying to parse it but simply advance the data pointer.
Callers invoking `git_pkt_parse_line` are currently not prepared
to handle this case as they do not explicitly check this case.
While they could always reset the passed out-pointer to `NULL`
before calling `git_pkt_parse_line` and determine if the pointer
has been set afterwards, it makes more sense to update
`git_pkt_parse_line` to set the out-pointer to `NULL` itself when
it encounters such an empty packet. Like this it is guaranteed
that there will be no invalid memory references to free'd
pointers.
As such, the issue has been fixed such that `git_pkt_parse_line`
always sets the packet out pointer to `NULL` when an empty packet
has been received and callers check for this condition, skipping
such packets.
When adding a new entry to an existing index via `git_index_read_index`,
be sure to remove the tree cache entry for that new path. This will
mark all parent trees as dirty.
Clear any error state upon each iteration. If one of the iterations
ends (with an error of `GIT_ITEROVER`) we need to reset that error to 0,
lest we stop the whole process prematurely.
It looks like we're getting the operation and not doing anything
with it, when in fact we are asserting that it's not null. Simply
assert that we are within the operation boundary instead of using
the `git_array_get` macro to do this for us.
Parse values up to and including `\377` (`0xff`) when unquoting.
Print octal values as an unsigned char when quoting, lest `printf`
think we're talking about negatives.
`oid_strlen` has meant one more than the length of the string.
This is mighty confusing. Make it mean only the string length!
Whomsoever needs to allocate a buffer to hold a string can null
terminate it like normal.
When a text file is added or deleted, use the file names from the
`diff --git` header instead of the `---` or `+++` lines. This is
for compatibility with git.
Now that `git_diff_delta` data can be produced by reading patch
file data, which may have an abbreviated oid, allow consumers to
know that the id is abbreviated.
Patches can now come from a variety of sources - either internally
generated (from diffing two commits) or as the results of parsing
some external data.
When we are provided some input buffer (with a length) to inflate,
and it contains more data than simply the deflated data, fail.
zlib will helpfully tell us when it is done reading (via Z_STREAM_END),
so if there is data leftover in the input buffer, fail lest we
continually try to inflate it.
Handle the application of binary patches. Include tests that
produce a binary patch (an in-memory `git_patch` object),
then enusre that the patch applies correctly.
Move the delta application functions into `delta.c`, next to the
similar delta creation functions. Make the `git__delta_apply`
functions adhere to other naming and parameter style within the
library.
When we want to remove the file, use the basename as the name of the
entry to remove, instead of the full one, which includes the directories
we've inserted into the stack.
Instead of going through the usual steps of reading a tree recursively
into an index, modifying it and writing it back out as a tree, introduce
a function to perform simple updates more efficiently.
`git_tree_create_updated` avoids reading trees which are not modified
and supports upsert and delete operations. It is not as versatile as
modifying the index, but it makes some common operations much more
efficient.
When calling `git_commit_create` with an empty array of `parents` and `parent_count == 0`
the call will segfault at https://github.com/libgit2/libgit2/blob/master/src/commit.c#L107
when it's trying to compare `current_id` to a null parent oid.
This just puts in a check to stop that segfault.
When determining diffs between two iterators we may need to
recurse into an unmatched directory for the "new" iterator when
it is either a prefix to the current item of the "old" iterator
or when untracked/ignored changes are requested by the user and
the directory is untracked/ignored.
When advancing into the directory and no files are found, we will
get back `GIT_ENOTFOUND`. If so, we simply skip the directory,
handling resulting unmatched old items in the next iteration. The
other case of `iterator_advance_into` returning either
`GIT_NOERROR` or any other error but `GIT_ENOTFOUND` will be
handled by the caller, which will now either compare the first
directory entry of the "new" iterator in case of `GIT_ENOERROR`
or abort on other cases.
Improve readability of the code to make the above logic more
clear.
We compute offsets by executing `off |= (*delta++ << 24)` for
multiple constants, where `off` is of type `size_t` and `delta`
is of type `unsigned char`. The usual arithmetic conversions (see
ISO C89 §3.2.1.5 "Usual arithmetic conversions") kick in here,
causing us to promote both operands to `int` and then extending
the result to an `unsigned long` when OR'ing it with `off`.
The integer promotion to `int` may result in wrong size
calculations for big values.
Fix the issue by making the constants `unsigned long`, causing both
operands to be promoted to `unsigned long`.
An object's size is computed by reading the object header's size
field until the most significant bit is not set anymore. To get
the total size, we increase the shift on each iteration and add
the shifted value to the total size.
We read the current value into a variable of type `unsigned
char`, from which we then take all bits except the most
significant bit and shift the result. We will end up with a
maximum shift of 60, but this exceeds the width of the value's
type, resulting in undefined behavior.
Fix the issue by instead reading the values into a variable of
type `unsigned long`, which matches the required width. This is
equivalent to git.git, which uses an `unsigned long` as well.
When `git_repository__cvar` fails we may end up with a
`ignorecase` value of `-1`. As we subsequently check if
`ignorecase` is non-zero, we may end up reporting that data
should be removed when in fact it should not.
Err on the safer side and set `ignorecase = 0` when
`git_repository__cvar` fails.
The `merge_file__xdiff` function checks if either `ours` or
`theirs` is `NULL`. The function is to be called with existing
files, though, and in fact already unconditionally dereferences
both pointers.
Remove the unnecessary check to silence warnings.
When we read the header, we want to know the size and type of the
object. We're currently inflating the full delta in order to read the
first few bytes. This can mean hundreds of kB needlessly inflated for
large objects.
Instead use a packfile stream to read just enough so we can read the two
varints in the header and avoid inflating most of the delta.
Differentiate between the ref_name used to create an annotated_commit
(that can subsequently be used to look up the reference) and the
description that we resolved this with (which _cannot_ be looked up).
The description is used for things like reflogs (and may be a ref name,
and ID something that we revparsed to get here), while the ref name must
actually be a reference name, and is used for things like rebase to
return to the initial branch.
openssl_read should return -1 in case of error.
SSL_read returns values <= 0 in case of error.
A return value of 0 can lead to an infinite loop, so the return value
of ssl_set_error will be returned if SSL_read is not successful (analog
to openssl_write).
While no extra header fields are defined for tags, git accepts them by
ignoring them and continuing the search for the message. There are a few
tags like this in the wild which git parses just fine, so we should do
the same.
When `init`ing a rebase from a detached HEAD, be sure to remember
that we were in a detached HEAD state so that we can correctly
`abort` the object that we just created.
In order to match the star-star, we disable the flag that's looking for
a single path element, but that leads to searching for the pattern in
the middle of elements in the input string.
Mark when we're handing a star-star so we jump over the elements in our
attempt to match the part of the pattern that comes after the star-star.
While here, tighten up the check so we don't allow invalid rules
through.
When we're dealing with proxy addresses, we only want a hostname and
port, and the user would not provide a path, so make it optional so we
can use this same function to parse git as well as proxy URLs.
If we cannot dwim the input, set the error message to be explicit about
that. Otherwise we leave the error for the last failed lookup, which
can be rather unexpected as it mentions a remote when the user thought
they were trying to look up a branch.
Allow callers to specify a start path with a trailing slash to match
a submodule, instead of just a directory. This is for some legacy
behavior that's sort of dumb, but there it is.
If we're looking for a symlink, realpath will give us the resolved path,
which is not what we're after, but a canonicalized version of the path
the user asked for.
The code initializing the merge driver registry accidentally
forgot a `goto done` in case of an error. Because of this the
next line, which registers the global shutdown callback for the
merge drivers, is only called when an error occured.
Fix this by adding the missing `goto done`. This fixes some
memory leaks when the global state is shut down.
The xdl_prepare_env() function may initialise an xdlclassifier_t
data structure via xdl_init_classifier(), which allocates memory
to several fields, for example 'rchash', 'rcrecs' and 'ncha'.
If this function later exits due to the failure of xdl_optimize_ctxs(),
then this xdlclassifier_t structure, and the memory allocated to it,
is not cleaned up.
In order to fix the memory leak, insert a call to xdl_free_classifier()
before returning.
This patch was originally written by Ramsay Jones (see commit
87f16258367a3b9a62663b11f898a4a6f3c19d31 in git.git).
Commit 307ab20b3 ("xdiff: PATIENCE/HISTOGRAM are not independent option
bits", 19-02-2012) introduced the XDF_DIFF_ALG() macro to access the
flag bits used to represent the diff algorithm requested. In addition,
code which had used explicit manipulation of the flag bits was changed
to use the macros.
However, one example of direct manipulation remains. Update this code to
use the XDF_DIFF_ALG() macro.
This patch was originally written by Ramsay Jones (see commit
5cd6978a9cfef58de061a9525f3678ade479564d in git.git).
If we hit the EOF while trying to write a new value, it may be that
we're already in the section that we were looking for. If so, do not
write a (duplicate) section header, just write the value.
Ensure that we have hit the end of iteration; previously we tested
that we saw all the values that we expected to see. We did not
then ensure that we were at the end of the iteration (and that there
were subsequently values in the iteration that we did *not* expect.)
Drop some of the layers of indirection between the workdir and the
filesystem iterators. This makes the code a little bit easier to
follow, and reduces the number of unnecessary allocations a bit as
well. (Prior to this, when we filter entries, we would allocate them,
filter them and then free them; now we do the filtering before
allocation.)
Also, rename `git_iterator_advance_over_with_status` to just
`git_iterator_advance_over`. Mostly because it's a fucking long-ass
function name otherwise.
Many code paths in checkout need the final, full on-disk path of the
file they're writing. (No surprise). However, they all munge the
`data->path` buffer themselves to get there. Provide a nice helper
method for them.
Plus, drop the use `git_iterator_current_workdir_path` which does the
same thing but different. Checkout is the only caller of this silly
function, which lets us remove it.
Refactored the tree iterator to never recurse; simply process the
next entry in order in `advance`. Additionally, reduce the number of
allocations and sorting as much as possible to provide a ~30% speedup
on case-sensitive iteration. (The gains for case-insensitive iteration
are less majestic.)
Disambiguate the reset and reset_range functions. Now reset_range
with a NULL path will clear the start or end; reset will leave the
existing start and end unchanged.
The callback mechanism makes it awkward to write data from an IO
source; move to `_fromstream()` which lets the caller remain in control,
in the same vein as we prefer iterators over foreach callbacks.
The pair of `git_blob_create_frombuffer()` and
`git_blob_create_frombuffer_commit()` is meant to replace
`git_blob_create_fromchunks()` by providing a way for a user to write a
new blob when they want filtering or they do not know the size.
This approach allows the caller to retain control over when to add data
to this buffer and a more natural fit into higher-level language's own
stream abstractions instead of having to handle IO wait in the callback.
The in-memory buffer size of 2MB is chosen somewhat arbitrarily to be a
round multiple of usual page sizes and a value where most blobs seem
likely to be either going to be way below or way over that size. It's
also a round number of pages.
This implementation re-uses the helper we have from `_fromchunks()` so
we end up writing everything to disk, but hopefully more efficiently
than with a default filebuf. A later optimisation can be to avoid
writing the in-memory contents to disk, with some extra complexity.
Allow setting the buffer size on open in order to use this data
structure more generally as a spill buffer, with larger buffer sizes for
specific use-cases.
This special-casing ignores that we might have a locked file, so the
hashtable does not represent the contents of the file we want to
write. This causes multivar writes to overwrite entries instead of add
to them when under lock.
There is no need for this as the normal code-path will write to the file
just fine, so simply get rid of it.
Take advantage of the constant size of tree-owned arrays and store them
in an array instead of a pool. This still lets us free them all at once
but lets the system allocator do the work of fitting them in.
Since the `apply` callback can defer, the `check` callback is not
necessary. Removing the `check` callback further makes the `payload`
unnecessary along with the `cleanup` callback.
Consumers can now register custom merged drivers with
`git_merge_driver_register`. This allows consumers to support the
merge drivers, as configured in `.gitattributes`. Consumers will be
asked to perform the file-level merge when a custom driver is
configured.
The function to extract signatures suffers from a similar bug to the
header field finding one by having an unecessary line feed check as a
break condition of its loop.
Fix that and add a test for this single-line signature situation.
While often similar, these are not the same on Windows. We want to use the page
size on Windows for the pools, but for mmap we need to use the allocation
granularity as the alignment.
On the other platforms these values remain the same.
This ensures that when using OpenSSL a safe default set of ciphers
is selected. This is done so that the client communicates securely
and we don't accidentally enable unsafe ciphers like RC4, or even
worse some old export ciphers.
Implements the first part of https://github.com/libgit2/libgit2/issues/3682
Callers of `git_config__cvar` already handle the case where the
function returns an error due to a failed configuration variable
lookup, but we are actually swallowing errors when calling
`git_config__lookup_entry` inside of the function.
Fix this by returning early when `git_config__lookup_entry`
returns an error. As we call `git_config__lookup_entry` with
`no_errors == false` which leads us to call `get_entry` with
`GET_NO_MISSING` we will not return early when the lookup fails
due to a missing entry. Like this we are still able to set the
default value of the cvar and exit successfully.
When writing to a file with locking not check if writing the
locked file actually succeeds. Fix the issue by returning error
code and message when writing fails.
Accessing the current values map is handled through the
`refcounder_strmap_take` function, which first acquires a mutex
before accessing its values. While this assures everybody is
trying to access the values with the mutex only we do not check
if the locking actually succeeds.
Fix the issue by checking if acquiring the lock succeeds and
returning `NULL` if we encounter an error. Adjust callers.
When normalizing options we try to look up HEAD's OID. While this
action may fail in malformed repositories we never check the
return value of the function.
Fix the issue by converting `normalize_options` to actually
return an error and handle the error in `git_blame_file`.
We usually check entries returned by `git_sortedcache_entry` for
NULL pointers. As we have a write lock in `packed_write`, though,
it really should not happen that the function returns NULL.
Assert that ref is not NULL to silence a Coverity warning.
When the user passes in a diff which has no repository associated
we may call `git_config__get_int_force` with a NULL-pointer
configuration. Even though `git_config__get_int_force` is
designed to swallow errors, it is not intended to be called with
a NULL pointer configuration.
Fix the issue by only calling `git_config__get_int_force` only
when configuration could be retrieved from the repository.
In C89 it is undefined behavior to pass `NULL` pointers to
`strncmp` and later on in C99 it has been explicitly stated that
functions with an argument declared as `size_t nmemb` specifying
the array length shall always have valid parameters, no matter if
`nmemb` is 0 or not (see ISO 9899 §7.21.1.2).
The function `str_equal_no_trailing_slash` always passes its
parameters to `strncmp` if their lengths match. This means if one
parameter is `NULL` and the other one either `NULL` or a string
with length 0 we will pass the pointers to `strncmp` and cause
undefined behavior.
Fix this by explicitly handling the case when both lengths are 0.
When computing a short OID we do this by first copying the
leading parts into the new OID structure and then setting the
trailing part to zero. In the case of the desired length being
`GIT_OID_HEXSZ - 1` we will call `memset` with an out of bounds
pointer and a length of 0. While this seems to cause no problems
for common platforms the C89 standard does not explicitly state
that calling `memset` with an out of bounds pointer and
length of 0 is valid.
Fix the potential issue by using the newly introduced
`git_oid__cpy_prefix` function.
When parsing a section header we expect something along the
format of '[section "subsection"]'. When a section is
mal-formated and is entirely missing its quotation marks we catch
this case by observing that `strchr(line, '"') - strrchr(line,
'"') = NULL - NULL = 0` and error out. Unfortunately, the error
message is misleading though, as we state that we are missing the
closing quotation mark while we in fact miss both quotation
marks.
Improve the error message by explicitly checking if the first
quotation mark could be found and, if not, stating that quotation
marks are completely missing.
The first time may be due to memory fragmentation or just bad luck on a
32-bit system. When we hit the mmap error for the first time, free up
the unused windows and try again.
The old implementation had two issues:
1. OIDs that were too short as to be ambiguous were not being handled
properly.
2. If the last OID to expand in the array was missing from the ODB, we
would leak a `GIT_ENOTFOUND` error code from the function.
Sometimes you want to create a commit but not write it out to the
objectdb immediately. For these cases, provide a new function to
retrieve the buffer instead of having to go through the db.
Submodules don't exist in the objectdb and the code is making us try to
look for a blob with its commit id, which is obviously not going to
work.
Skip the test if the user wants to insert a submodule.
We should have been doing this, but it initializes itself upon first
use, which works as long as nobody's doing concurrent network
operations. Initialize it on our init to make sure it's not getting
initialized concurrently.
If the caller has provided bad authentication, give them another
apportunity to get it right until they give up. This brings WinHTTP in
line with the other transports.
Commit 3d1abc5afc fixes a memory leak in the xdiff code. In the
process of upstreaming the fix it was pointed out by Johannes
Schindelin that there is another memory leak present (see [1]).
Fix the second memory leak by applying the upstream fix to our
code base.
[1]: http://thread.gmane.org/gmane.comp.version-control.git/287034
Android NDK does not have a `struct timespec` in its `struct stat`
for nanosecond support, instead it has a single nanosecond member inside
the struct stat itself. We will use that and use a macro to expand to
the `st_mtim` / `st_mtimespec` definition on other systems (much like
the existing `st_mtime` backcompat definition).
Use the `giterr_set` function, which actually supports `GITERR_OS`.
The `giterr_set_str` function is exposed for external users and will
not append the operating system's error message.
The `normalize_find_opts` function in theory allows for the
incoming diff to have no repository. When the caller does not
pass in diff find options or if the GIT_DIFF_FIND_BY_CONFIG value
is set, though, we try to derive the configuration from the
diff's repository configuration without first verifying that the
repository is actually set to a non-NULL value.
Fix this issue by explicitly checking if the repository is set
and if it is not, fall back to a default value of
GIT_DIFF_FIND_RENAMES.
Convert `rebase_alloc` to use our usual error propagation
patterns, that is accept an out-parameter and return an error
code that is to be checked by the caller. This allows us to use
the GITERR_CHECK_ALLOC macro, which helps static analysis.
Set the error code when an error occurs in any of the called
functions. This ensures we pass the error up to callers and
actually free the remote when an error occurs.
The overflow check in `read_reuc` tries to verify if the
`git__strtol32` parses an integer bigger than UINT_MAX. The `tmp`
variable is casted to an unsigned int for this and then checked
for being greater than UINT_MAX, which obviously can never be
true.
Fix this by instead fixing the `mode` field's size in `struct
git_index_reuc_entry` to `uint32_t`. We can now parse the int
with `git__strtol64`, which can never return a value bigger than
`UINT32_MAX`, and additionally checking if the returned value is
smaller than zero.
We do not need to handle overflows explicitly here, as
`git__strtol64` returns an error when the returned value would
overflow.
The fail-label of `reflog_parse` explicitly checks the entry
poitner for NULL before freeing it. When we jump to the label the
variable has to be set to a non-NULL and valid pointer though: if
the allocation fails we immediately return with an error code and
if the loop was not entered we return with a success code,
withouth executing the label's code.
Remove the useless NULL-check to silence Coverity.
When invoking `diff_print_info_init_frompatch` it is obvious that
the patch should be non-NULL. We explicitly check if the variable
is set and continue afterwards, happily dereferencing the
potential NULL-pointer.
Fix this by instead asserting that patch is set. This also
silences Coverity.
The function `compute_write_order` may return a `NULL`-pointer
when an error occurs. In such cases we jump to the `done`-label
where we try to clean up allocated memory. Unfortunately we try
to deallocate the `write_order` array, though, which may be NULL
here.
Fix this error by returning early instead of jumping to the
`done` label. There is no data to be cleaned up anyway.
When no payload is set for `crlf_apply` we try to compute the
crlf attributes ourselves with `crlf_check`. When the function
determines that the current file does not require any treatment
we return the GIT_PASSTHROUGH error code without actually
allocating the out-pointer, which indicates the file should not
be passed through the filter.
The `crlf_apply` function explicitly checks for the
GIT_PASSTHROUGH return code and ignores it. This means we will
try to apply the crlf-filter to the current file, leading us to
dereference the unallocated payload-pointer.
Fix this obviously incorrect behavior by not treating
GIT_PASSTHROUGH in any special way. This is the correct thing to
do anyway, as the code indicates that the file should not be
passed through the filter.
We commonly have to check if a git_buf has been allocated
correctly or if we ran out of memory. Introduce a new macro
similar to `GITERR_CHECK_ALLOC` which checks if we ran OOM and if
so returns an error. Provide a `#nodef` for Coverity to mark the
error case as an abort path.
When checking for out of memory situations we usually use the
GITERR_CHECK_ALLOC macro. Besides conforming to our current code
base it adds the benefit of silencing errors in Coverity due to
Coverity handling the macro's error path as abort.
Allow `git_index_read` to handle reading existing indexes with
illegal entries. Allow the low-level `git_index_add` to add
properly formed `git_index_entry`s even if they contain paths
that would be illegal for the current filesystem (eg, `AUX`).
Continue to disallow `git_index_add_bypath` from adding entries
that are illegal universally illegal (eg, `.git`, `foo/../bar`).
Although a `tree_iterator` that failed to be properly created
does not have a frame, all other `tree_iterator`s should. Do not
call `pop` in the failure case, but assert that in all other
cases there is a frame.
When Git repository at network locations, sometimes git_iterator_for_tree
fails at iterator__update_ignore_case so it goes to git_iterator_free.
Null pointer will crash the process if not check.
Signed-off-by: Colin Xu <colin.xu@gmail.com>
We should be checking whether the object we're looking up is a commit,
and we should let the caller know whether the not-found return code
comes from a bad object type or just a missing signature.
When performing an in-memory rebase, keep a single index for the
duration, so that callers have the expected index lifecycle and
do not hold on to an index that is free'd out from under them.
When we moved the logic to handle the first one, wrong loop logic was
kept in place which meant we still finished early. But we now notice it
because we're not reading past the last LF we find.
This was not noticed before as the last field in the tested commit was
multi-line which does not trigger the early break.
Introduce the ability to rebase in-memory or in a bare repository.
When `rebase_options.inmemory` is specified, the resultant `git_rebase`
session will not be persisted to disk. Callers may still analyze
the rebase operations, resolve any conflicts against the in-memory
index and create the commits. Neither `HEAD` nor the working
directory will be updated during this process.
The function `git_packfile_stream_open` tries to free the passed
in stream when an error occurs. The only call site is
`git_indexer_append`, though, which passes in the address of a
stream struct which has not been allocated on the heap.
Fix the issue by simply removing the call to free. In case of an
error we did not allocate any memory yet and otherwise it should
be the caller's responsibility to manage it's object's lifetime.
We were searching only past the first header field, which meant we were
unable to find e.g. `tree` which is the first field.
While here, make sure to set an error message in case we cannot find the
field.
Previously we would set the global filter registry structure before
adding filters to the structure, without a lock, which is quite racy.
Now, register default filters during global registration and use an
rwlock to read and write the filter registry (as appopriate).
Standard Windows type systems define CLSID_InternetSecurityManager
and IID_IInternetSecurityManager, but MinGW lacks these definitions.
As a result, we must hardcode these definitions ourselves. However,
we should not use a public struct with those names, lest another
library do the same thing and consumers cannot link to both.
We don't support using an index object from multiple threads at the same
time, so the locking doesn't have any effect when following the
rules. If not following the rules, things are going to break down
anyway.
Include dotfiles when copying template directory, which will handle
both a template directory itself that begins with a dotfile, and
any dotfiles inside the directory.
Fix the possibility of returning successfully from ssh_stream_read()
with *bytes_read < 0. This would occur if stdout channel read resulted
in 0, and stderr channel read failed afterwards.
Note that we're not checking whether the resize succeeds; in OOM cases,
we let it run with a "small" vector and hash table and see if by chance
we can grow it dynamically as we insert the new entries. Nothing to
lose really.
Instead of calling `git_index_add` in a loop, use the new
`git_index_fill` internal API to fill the index with the initial staged
entries.
The new `fill` helper assumes that all the entries will be unique and
valid, so it can append them at the end of the entries vector and only
sort it once at the end. It performs no validation checks.
This prevents the quadratic behavior caused by having to sort the
entries list once after every insertion.
When replacing an index with a new one, we need to iterate
through all index entries in order to determine which entries are
equal. When it is not possible to re-use old entries for the new
index, we move it into a list of entries that are to be removed
and thus free'd.
When we encounter a non-zero error code, though, we skip adding
the current index entry to the remove-queue. `INSERT_MAP_EX`,
which is the function last run before adding to the remove-queue,
may return a positive non-zero code that indicates what exactly
happened while inserting the element. In this case we skip adding
the entry to the remove-queue but still continue the current
operation, leading to a leak of the current entry.
Fix this by checking for a negative return value instead of a
non-zero one when we want to add the current index entry to the
remove-queue.
When adding to the index, we look to see if a portion of the given
path matches a portion of a path in the index. If so, we will use
the existing path information. For example, when adding `foo/bar.c`,
if there is an index entry to `FOO/other` and the filesystem is case
insensitive, then we will put `bar.c` into the existing tree instead
of creating a new one with a different case.
Use `strncmp` to do that instead of `memcmp`. When we `bsearch`
into the index, we locate the position where the new entry would
go. The index entry at that position does not necessarily have
a relation to the entry we're adding, so we cannot make assumptions
and use `memcmp`. Instead, compare them as strings.
When canonicalizing paths, we look for the first index entry that
matches a given substring.
When duplicating a `struct git_tree_entry` with
`git_tree_entry_dup` the resulting structure is not allocated
inside a memory pool. As we do a 1:1 copy of the original struct,
though, we also copy the `pooled` field, which is set to `true`
for pooled entries. This results in a huge memory leak as we
never free tree entries that were duplicated from a pooled
tree entry.
Fix this by marking the newly duplicated entry as un-pooled.
When formatting a patch as email we do not include the commit's
message in the formatted patch output. Implement this and add a
test that verifies behavior.
It is already possible to get a commit's summary with the
`git_commit_summary` function. It is not possible to get the
remaining part of the commit message, that is the commit
message's body.
Fix this by introducing a new function `git_commit_body`.
The `git_blame__entry` struct keeps track of line counts with
`int` fields. Since `int` is only guaranteed to be at least 16
bits we may overflow on certain platforms when line counts exceed
2^15.
Fix this by instead storing line counts in `size_t`.
It is not unreasonable to have versioned files with a line count
exceeding 2^16. Upon blaming such files we fail to correctly keep
track of the lines as `git_blame_hunk` stores them in `uint16_t`
fields.
Fix this by converting the line fields of `git_blame_hunk` to
`size_t`. Add test to verify behavior.