When replacing an index with a new one, we need to iterate
through all index entries in order to determine which entries are
equal. When it is not possible to re-use old entries for the new
index, we move it into a list of entries that are to be removed
and thus free'd.
When we encounter a non-zero error code, though, we skip adding
the current index entry to the remove-queue. `INSERT_MAP_EX`,
which is the function last run before adding to the remove-queue,
may return a positive non-zero code that indicates what exactly
happened while inserting the element. In this case we skip adding
the entry to the remove-queue but still continue the current
operation, leading to a leak of the current entry.
Fix this by checking for a negative return value instead of a
non-zero one when we want to add the current index entry to the
remove-queue.
When adding to the index, we look to see if a portion of the given
path matches a portion of a path in the index. If so, we will use
the existing path information. For example, when adding `foo/bar.c`,
if there is an index entry to `FOO/other` and the filesystem is case
insensitive, then we will put `bar.c` into the existing tree instead
of creating a new one with a different case.
Use `strncmp` to do that instead of `memcmp`. When we `bsearch`
into the index, we locate the position where the new entry would
go. The index entry at that position does not necessarily have
a relation to the entry we're adding, so we cannot make assumptions
and use `memcmp`. Instead, compare them as strings.
When canonicalizing paths, we look for the first index entry that
matches a given substring.
When duplicating a `struct git_tree_entry` with
`git_tree_entry_dup` the resulting structure is not allocated
inside a memory pool. As we do a 1:1 copy of the original struct,
though, we also copy the `pooled` field, which is set to `true`
for pooled entries. This results in a huge memory leak as we
never free tree entries that were duplicated from a pooled
tree entry.
Fix this by marking the newly duplicated entry as un-pooled.
When formatting a patch as email we do not include the commit's
message in the formatted patch output. Implement this and add a
test that verifies behavior.
It is already possible to get a commit's summary with the
`git_commit_summary` function. It is not possible to get the
remaining part of the commit message, that is the commit
message's body.
Fix this by introducing a new function `git_commit_body`.
The `git_blame__entry` struct keeps track of line counts with
`int` fields. Since `int` is only guaranteed to be at least 16
bits we may overflow on certain platforms when line counts exceed
2^15.
Fix this by instead storing line counts in `size_t`.
It is not unreasonable to have versioned files with a line count
exceeding 2^16. Upon blaming such files we fail to correctly keep
track of the lines as `git_blame_hunk` stores them in `uint16_t`
fields.
Fix this by converting the line fields of `git_blame_hunk` to
`size_t`. Add test to verify behavior.
This reduces the size of the struct from 32 to 26 bytes, and leaves a
single padding byte at the end of the struct (which comes from the
zero-length array).
These are rather small allocations, so we end up spending a non-trivial
amount of time asking the OS for memory. Since these entries are tied to
the lifetime of their tree, we can give the tree a pool so we speed up
the allocations.
We've already looked at the filename with `memchr()` and then used
`strlen()` to allocate the entry. We already know how much we have to
advance to get to the object id, so add the filename length instead of
looking at each byte again.
When building a recursive merge base, allow conflicts to occur.
Use the file (with conflict markers) as the common ancestor.
The user has already seen and dealt with this conflict by virtue
of having a criss-cross merge. If they resolved this conflict
identically in both branches, then there will be no conflict in the
result. This is the best case scenario.
If they did not resolve the conflict identically in the two branches,
then we will generate a new conflict. If the user is simply using
standard conflict output then the results will be fairly sensible.
But if the user is using a mergetool or using diff3 output, then the
common ancestor will be a conflict file (itself with diff3 output,
haha!). This is quite terrible, but it matches git's behavior.
Use annotated commits to act as our virtual bases, instead of regular
commits, to avoid polluting the odb with virtual base commits and
trees. Instead, build an annotated commit with an index and pointers
to the commits that it was merged from.
When there are more than two common ancestors, continue merging the
virtual base with the additional common ancestors, effectively
octopus merging a new virtual base.
When examining the working directory and determining whether it's
up-to-date, only consider the nanoseconds in the index entry when
built with `GIT_USE_NSEC`. This prevents us from believing that
the working directory is always dirty when the index was originally
written with a git client that uinderstands nsecs (like git 2.x).
Allow users to set the `git_libgit2_opts` search path for the
`GIT_CONFIG_LEVEL_PROGRAMDATA`. Convert `GIT_CONFIG_LEVEL_PROGRAMDATA`
to `GIT_SYSDIR_PROGRAMDATA` for setting the configuration.
Ensure that `git_index_read_index` clears the uptodate bit on
files that it modifies.
Further, do not propagate the cache from an on-disk index into
another on-disk index. Although this should not be done, as
`git_index_read_index` is used to bring an in-memory index into
another index (that may or may not be on-disk), ensure that we do
not accidentally bring in these bits when misused.
The uptodate bit should have a lifecycle of a single read->write
on the index. Once the index is written, the files within it should
be scanned for racy timestamps against the new index timestamp.
Keep track of entries that we believe are up-to-date, because we
added the index entries since the index was loaded. This prevents
us from unnecessarily examining files that we wrote during the
cleanup of racy entries (when we smudge racily clean files that have
a timestamp newer than or equal to the index's timestamp when we
read it). Without keeping track of this, we would examine every
file that we just checked out for raciness, since all their timestamps
would be newer than the index's timestamp.
When we insert a conflict in a case-insensitive index, accept the
new entry's path as the correct case instead of leaving the path we
already had.
This puts `git_index_conflict_add()` on the same level as
`git_index_add()` in this respect.
Reload the HEAD and index data for a submodule after reading the
configuration. The configuration may specify a `path`, so we must
update HEAD and index data with that path in mind.
When creating a filebuf, detect a directory that exists in our
target file location. This prevents a failure later, when we try
to move the lock file to the destination.
On platforms that lack `core.symlinks`, we should not go looking for
symbolic links and `p_readlink` their target. Instead, we should
examine the file's contents.
This reduces the chances of a crash in the thread tests. This shouldn't
affect general usage too much, since the main usage of these functions
are to read into an empty buffer.
Instead of relying on the size and timestamp, which can hide changes
performed in the same second, hash the file content's when we care about
detecting changes.
Using calloc instead of malloc because the parse error will lead to an immediate free of committer (and its properties, which can segfault on free if undefined - test_refs_reflog_reflog__reading_a_reflog_with_invalid_format_returns_error segfaulted before the fix).
#3458
Inserting new REUC entries can quickly become pathological given that
each insert unsorts the REUC vector, and both subsequent lookups *and*
insertions will require sorting it again before being successful.
To avoid this, we're switching to `git_vector_insert_sorted`: this keeps
the REUC vector constantly sorted and lets us use the `on_dup` callback
to skip an extra binary search on each insertion.
Provide a new merge option, GIT_MERGE_TREE_FAIL_ON_CONFLICT, which
will stop on the first conflict and fail the merge operation with
GIT_EMERGECONFLICT.
Although CMake will correctly configure include directories for us,
some people may use their own build system, and we should reference
`util.h` based on where it actually lives.
Although our index contains the literal time present in the index,
we do not read nanoseconds from disk, and thus we should not use
them in any comparisons, lest we always think our working directory
is dirty.
Guard this behind a `GIT_USE_NSECS` for future improvement.
For most real use cases, repositories with alternates use them as main
object storage. Checking the alternate for objects before the main
repository should result in measurable speedups.
Because of this, we're changing the sorting algorithm to prioritize
alternates *in cases where two backends have the same priority*. This
means that the pack backend for the alternate will be checked before the
pack backend for the main repository *but* both of them will be checked
before any loose backends.
In the current implementation of ODB backends, each backend is tasked
with refreshing itself after a failed lookup. This is standard Git
behavior: we want to e.g. reload the packfiles on disk in case they have
changed and that's the reason we can't find the object we're looking
for.
This behavior, however, becomes pathological in repositories where
multiple alternates have been loaded. Given that each alternate counts
as a separate backend, a miss in the main repository (which can
potentially be very frequent in cases where object storage comes from
the alternate) will result in refreshing all its packfiles before we
move on to the alternate backend where the object will most likely be
found.
To fix this, the code in `odb.c` has been refactored as to perform the
refresh of all the backends externally, once we've verified that the
object is nowhere to be found.
If the refresh is successful, we then perform the lookup sequentially
through all the backends, skipping the ones that we know for sure
weren't refreshed (because they have no refresh API).
The on-disk pack backend has been adjusted accordingly: it no longer
performs refreshes internally.
We moved the "main" parsing to use 64 bits for the timestamp, but the
quick parsing for the revwalk did not. This means that for large
timestamps we fail to parse the time and thus the walk.
Move this parser to use 64 bits as well.
xdiff craps the bed on large files. Treat very large files as binary,
so that it doesn't even have to try.
Refactor our merge binary handling to better match git.git, which
looks for a NUL in the first 8000 bytes.
As refdb and odb backends can be allocated by client code, libgit2
can’t know whether an alternative memory allocator was used, and thus
should not try to call `git__free` on those objects.
Instead, odb and refdb backend implementations must always provide
their own `free` functions to ensure memory gets freed correctly.
When we do not trust the on-disk mode, we use the mode of an existing
index entry. This allows us to preserve executable bits on platforms
that do not honor them on the filesystem.
If there is no stage 0 index entry, also look at conflicts to attempt
to answer this question: prefer the data from the 'ours' side, then
the 'theirs' side before falling back to the common ancestor.
SSL_shutdown() does not like it when we pass an unitialized ssl context
to it. This means that when we fail to connect to a host, we hide the
error message saying so with OpenSSL's indecipherable error message.
git expects an empty line after the binary data:
literal X
...binary data...
<empty_line>
The last literal block of the generated patches were not containing the required empty line. Example:
diff --git a/binary_file b/binary_file
index 3f1b3f9098131cfecea4a50ff8afab349ea66d22..86e5c1008b5ce635d3e3fffa4434c5eccd8f00b6 100644
GIT binary patch
literal 8
Pc${NM&PdElPvrst3ey5{
literal 6
Nc${NM%g@i}0ssZ|0lokL
diff --git a/binary_file2 b/binary_file2
index 31be99be19470da4af5b28b21e27896a2f2f9ee2..86e5c1008b5ce635d3e3fffa4434c5eccd8f00b6 100644
GIT binary patch
literal 8
Pc${NM&PdElPvrst3ey5{
literal 13
Sc${NMEKbZyOexL+Qd|HZV+4u-
git apply of that diff results in:
error: corrupt binary patch at line 9: diff --git a/binary_file2 b/binary_file2
fatal: patch with only garbage at line 10
The proper formating is:
diff --git a/binary_file b/binary_file
index 3f1b3f9098131cfecea4a50ff8afab349ea66d22..86e5c1008b5ce635d3e3fffa4434c5eccd8f00b6 100644
GIT binary patch
literal 8
Pc${NM&PdElPvrst3ey5{
literal 6
Nc${NM%g@i}0ssZ|0lokL
diff --git a/binary_file2 b/binary_file2
index 31be99be19470da4af5b28b21e27896a2f2f9ee2..86e5c1008b5ce635d3e3fffa4434c5eccd8f00b6 100644
GIT binary patch
literal 8
Pc${NM&PdElPvrst3ey5{
literal 13
Sc${NMEKbZyOexL+Qd|HZV+4u-
If the header doesn't look like a header (e.g. if it doesn't have a ":"
or if it has newlines), report "custom HTTP header '%s' is malformed".
If the header has the same name as a header already set by libgit2 (e.g.
"Host"), report "HTTP header '%s' is already set by libgit2".
This allows us to remove OS checks from source code, instead relying
on CMake to detect whether or not `struct stat` has the nanoseconds
members we rely on.
Check that the repository directory is beneath the workdir before
adding it to the list of reserved paths. If it is not, then there
is no possibility of checking out files into it, and it should not
be a reserved word.
This is a particular problem with submodules where the repo directory
may be in the super's .git directory.
When there is a comment at the end of a section, git keeps it there,
while we write the new variable right at the end.
Keep comments buffered and dump them when we're going to output a
variable or section, or reach EOF. This puts us in line with the config
files which git produces.
Don't coalesce all errors into ENOENT. At least identify EACCES.
All callers should be handling this case already, as the POSIX
`lstat` will return this.
`git_futils_mkdir` does not blindly call `git_futils_mkdir_relative`.
`git_futils_mkdir_relative` is used when you have some base directory
and want to create some path inside of it, potentially removing blocking
symlinks and files in the process. This is not suitable for a general
recursive mkdir within the filesystem.
Instead, when `mkdir` is being recursive, locate the first existent
parent directory and use that as the base for `mkdir_relative`.
Untangle git_futils_mkdir from git_futils_mkdir_ext - the latter
assumes that we own everything beneath the base, as if it were
being called with a base of the repository or working directory,
and is tailored towards checkout and ensuring that there is no
bogosity beneath the base that must be cleaned up.
This is (at best) slow and (at worst) unsafe in the larger context
of a filesystem where we do not own things and cannot do things like
unlink symlinks that are in our way.
When a file exists on disk and we're checking out a file that differs
in executableness, remove the old file. This allows us to recreate the
new file with p_open, which will take the new mode into account and
handle setting the umask properly.
Remove any notion of chmod'ing existing files, since it is now handled
by the aforementioned removal and was incorrect, as it did not take
umask into account.
The canonical directory path of the root directory of a volume on
POSIX already ends in a slash (eg, `/`). This is true only at the
root. Do not add a slash to paths in this case.
The canonical directory path of the root directory of a volume on
windows already ends in a slash (eg, `c:/`). This is true only
at the volume root. Do not add a slash to paths in this case.
The previous commit left the comment referencing the earlier state of
the code, change it to explain the current logic. While here, change the
logic to avoid repeating the copy of the base pattern.
When we're not doing pathspec matching, we let the iterator handle
file matching for us. However, we can only trust the iterator to
return *files* that match the pattern, because the iterator must
return directories that are not strictly in the pathlist, but that
are the parents of files that match the pattern, so that diff can
later recurse into them.
Thus, diff must examine non-files explicitly before including them
in the delta list.
While advancing the tree iterator, if we advance over things that
we aren't interested in, then call `current`. Which may *itself*
call advance.
While advancing the tree iterator, if we advance over things that
we aren't interested in, then call `current`. Which may *itself*
call advance.
While advancing the tree iterator, if we advance over things that
we aren't interested in, then call `current`. Which may *itself*
call advance.
While advancing the tree iterator, if we advance over things that
we aren't interested in, then call `current`. Which may *itself*
call advance.
While advancing the tree iterator, if we advance over things that
we aren't interested in, then call `current`. Which may *itself*
call advance.
Error: stack overflow.
When searching for information about a submdoule, let's be more explicit
in what we expect to find. We currently insert a submodule into the map
and change certain parameters when the config callback gets called.
Switch to asking for the configuration we're interested in, rather than
taking it in an arbitrary order.
On case insensitive platforms, allow `git_index_add` to provide a new
path for an existing index entry. Previously, we would maintain the
case in an index entry without the ability to change it (except by
removing an entry and re-adding it.)
Higher-level functions (like `git_index_add_bypath` and
`git_index_add_frombuffers`) continue to keep the old path for easier
usage.
On case insensitive systems, when given a user-provided path in the
higher-level index addition functions (eg `git_index_add_bypath` /
`git_index_add_frombuffer`), examine the index to try to match the
given path to an existing directory.
Various mechanisms can cause the on-disk representation of a folder
to not match the representation in HEAD or the index - for example,
a case changing rename of some file `a/file.txt` to `A/file.txt`
will update the paths in the index, but not rename the folder on
disk.
If a user subsequently adds `a/other.txt`, then this should be stored
in the index as `A/other.txt`.
We create a lockfile to update files under GIT_DIR. Sometimes these
files are actually located elsewhere and a symlink takes their place. In
that case we should lock and update the file at its final location
rather than overwrite the symlink.
Some nicer refactoring for index iteration walks.
The index iterator doesn't binary search through the pathlist space,
since it lacks directory entries, and would have to binary search
each index entry and all its parents (eg, when presented with an index
entry of `foo/bar/file.c`, you would have to look in the pathlist for
`foo/bar/file.c`, `foo/bar` and `foo`). Since the index entries and the
pathlist are both nicely sorted, we walk the index entries in lockstep
with the pathlist like we do for other iteration/diff/merge walks.
When using literal pathspecs in diff with `GIT_DIFF_DISABLE_PATHSPEC_MATCH`
turn on the faster iterator pathlist handling.
Updates iterator pathspecs to include directory prefixes (eg, `foo/`)
for compatibility with `GIT_DIFF_DISABLE_PATHSPEC_MATCH`.
When given a pathlist, don't assume that directories sort before
files. Walk through any list of entries sorting before us to make
sure that we've exhausted all entries that *aren't* directories.
Eg, if we're searching for 'foo/bar', and we have a 'foo.c', keep
advancing the pathlist to keep looking for an entry prefixed with
'foo/'.
libgit2 implementations of smart subtransports can simply reach through
the structure, but external implementors cannot.
Add these two functions as a way for the smart subtransports to get the
callbacks as set by the user.
When parsing user-provided regex patterns for functions, we must not
fail to provide a diff just because a pattern is not well
formed. Ignore it instead.
When we ask for credentials, the user may choose to return EUSER to
indicate that an error has happened on its end and it wants to be given
back control.
We must therefore pass that back to the user instead of mentioning that
it was on_headers_complete() that returned an error code. Since we can,
we return the exact error code from the user (other than PASSTHROUGH)
since it doesn't cost anything, though using other error codes aren't
recommended.
This lock/unlock pair allows for the cller to lock a configuration file
to avoid concurrent operations.
It also allows for a transactional approach to updating a configuration
file. If multiple updates must be made atomically, they can be done
while the config is locked.
When a configuration file is locked, any updates made to it will be done
to the in-memory copy of the file. This allows for multiple updates to
happen while we hold the lock, preventing races during complex
config-file manipulation.
Instead of writing into the filebuf directly, make the functions to
write the modified config file write into a buffer which can then be
dumped into the lockfile for committing.
This allows us to re-use the same code for modifying a locked
configuration, as we can simply skip the last step of dumping the data
to disk.
When we're looking to update a tag, we can't stop if the tag auto-follow
rules don't say to update it. The tag might still match the refspec we
were given.
When curl uses a proxy, it will only use Basic unless we prompt it to
try to use the most secure on it has available.
This is something which git did recently, and it seems like a good idea.
With Visual Studio versions 2008 and older they ignore the full path to files and only check
the basename of the file to find a collision. Additionally, having duplicate basenames can break
other build tools like GYP.
This fixes https://github.com/libgit2/libgit2/issues/3356
Allow restoring a previously captured oom error, by
detecting when the captured message pointer points to the
static oom error message. This means there is no need
to strdup the message in giterr_detach.
Error messages that are detached are assumed to be dynamically
allocated. Passing a pointer to the static oom error message
can cause an attempt to free the static buffer later. This change
checks if the oom error message is about to be detached and detaches
a copy instead.
Instead of allocating a brand new buffer for each error string we want
to store, we can use a per-thread buffer to store the error string and
re-use the underlying storage. We already use the buffer to format the
string, so this mostly makes that more direct.
An error here will typically mean that the directory was removed between
the time we iterated the parent and the time we wanted to visit it in
which case we should ignore it.
Other kinds of errors such as permissions (or transient errors) also
better dealt with by pretending we didn't see it.
When we have an error renaming the lockfile, we need to make sure
that we remove it upon cleanup. For this, we need to keep track of
whether we opened the file and whether the rename succeeded.
If we did create the lockfile but the rename did not succeed, we
remove the lockfile. This won't protect against all errors, but
the most common ones (target file is open) does get handled.
The header src/cc-compat.h defines portable format specifiers PRIuZ, PRIdZ, and PRIxZ. The original report highlighted the need to use these specifiers in examples/network/fetch.c. For this commit, I checked all C source and header files not in deps/ and transitioned to the appropriate format specifier where appropriate.
Removing a reflog upon ref deletion is something which only some
backends might wish to do. Backends which are database-backed may wish
to archive a reflog, log-based ones may not need to do anything.
If we get the path from the gitmodules file, look up the submodule we're
interested in by path, rather then by name. Otherwise we might get
duplicate results.
Upgrade xdiff to version used in core git 2.4.5 (0df0541).
Corrects an issue where an LF is added at EOF while applying
an unrelated change (ba31180), cleans up some unused code (be89977 and
e5b0662), and provides an improved callback to avoid leaking internal
(to xdiff) structures (467d348).
This also adds some additional functionality that we do not yet take
advantage of, namely the ability to ignore changes whose lines are
all blank (36617af).
Versions prior to 0.9.8f did not have this function, rhel/centos5 are still on a
heavily backported version of 0.9.8e and theoretically supported until March 2017
Without this ifdef, I get the following link failure:
```
CMakeFiles/libgit2_clar.dir/src/openssl_stream.c.o: In function `openssl_connect':
openssl_stream.c:(.text+0x45a): undefined reference to `SSL_set_tlsext_host_name'
collect2: error: ld returned 1 exit status
make[6]: *** [libgit2_clar] Error 1
```
Introduce `git__getenv` which is a UTF-8 aware `getenv` everywhere.
Make `cl_getenv` use this to keep consistent memory handling around
return values (free everywhere, as opposed to only some platforms).
The regex we use to look at the gitmodules file does not correctly
delimit the name of submodule which we want to look up and puts '.*'
straight after the name, maching on any submodule which has the seeked
submodule as a prefix of its name.
Add the missing '\.' in the regex so we want a full stop to exist both
before and after the submodule name.
Allow custom filters with wildcard attributes, so that clients
can support some random `filter=foo` in a .gitattributes and look
up the corresponding smudge/clean commands in the configuration file.
Function was added in commit 2c982daa2e on October 5, 2011,
and removed in commit 41fb1ca0ec on October 29, 2012.
Given the length of time it's gone unused, it's safe to remove now.
This reverts commit 969d4b703c.
This was a fluke from Coverity. The length to all the APIs in the
library is supposed to be passed in as nibbles, not bytes. Passing it as
bytes would prevent us from parsing uneven-sized SHA1 strings.
Also, the rest of the library was still using nibbles (including
revparse and the odb_prefix APIs), so this change was seriously breaking
things in unexpected ways. ^^
When diffing the index with the workdir and GIT_DIFF_UPDATE_INDEX has been passed,
the previous implementation was always writing the index to disk even if it wasn't
modified.