In order to match the star-star, we disable the flag that's looking for
a single path element, but that leads to searching for the pattern in
the middle of elements in the input string.
Mark when we're handing a star-star so we jump over the elements in our
attempt to match the part of the pattern that comes after the star-star.
While here, tighten up the check so we don't allow invalid rules
through.
When we're dealing with proxy addresses, we only want a hostname and
port, and the user would not provide a path, so make it optional so we
can use this same function to parse git as well as proxy URLs.
If we cannot dwim the input, set the error message to be explicit about
that. Otherwise we leave the error for the last failed lookup, which
can be rather unexpected as it mentions a remote when the user thought
they were trying to look up a branch.
Allow callers to specify a start path with a trailing slash to match
a submodule, instead of just a directory. This is for some legacy
behavior that's sort of dumb, but there it is.
If we're looking for a symlink, realpath will give us the resolved path,
which is not what we're after, but a canonicalized version of the path
the user asked for.
The code initializing the merge driver registry accidentally
forgot a `goto done` in case of an error. Because of this the
next line, which registers the global shutdown callback for the
merge drivers, is only called when an error occured.
Fix this by adding the missing `goto done`. This fixes some
memory leaks when the global state is shut down.
The xdl_prepare_env() function may initialise an xdlclassifier_t
data structure via xdl_init_classifier(), which allocates memory
to several fields, for example 'rchash', 'rcrecs' and 'ncha'.
If this function later exits due to the failure of xdl_optimize_ctxs(),
then this xdlclassifier_t structure, and the memory allocated to it,
is not cleaned up.
In order to fix the memory leak, insert a call to xdl_free_classifier()
before returning.
This patch was originally written by Ramsay Jones (see commit
87f16258367a3b9a62663b11f898a4a6f3c19d31 in git.git).
Commit 307ab20b3 ("xdiff: PATIENCE/HISTOGRAM are not independent option
bits", 19-02-2012) introduced the XDF_DIFF_ALG() macro to access the
flag bits used to represent the diff algorithm requested. In addition,
code which had used explicit manipulation of the flag bits was changed
to use the macros.
However, one example of direct manipulation remains. Update this code to
use the XDF_DIFF_ALG() macro.
This patch was originally written by Ramsay Jones (see commit
5cd6978a9cfef58de061a9525f3678ade479564d in git.git).
If we hit the EOF while trying to write a new value, it may be that
we're already in the section that we were looking for. If so, do not
write a (duplicate) section header, just write the value.
Ensure that we have hit the end of iteration; previously we tested
that we saw all the values that we expected to see. We did not
then ensure that we were at the end of the iteration (and that there
were subsequently values in the iteration that we did *not* expect.)
Drop some of the layers of indirection between the workdir and the
filesystem iterators. This makes the code a little bit easier to
follow, and reduces the number of unnecessary allocations a bit as
well. (Prior to this, when we filter entries, we would allocate them,
filter them and then free them; now we do the filtering before
allocation.)
Also, rename `git_iterator_advance_over_with_status` to just
`git_iterator_advance_over`. Mostly because it's a fucking long-ass
function name otherwise.
Many code paths in checkout need the final, full on-disk path of the
file they're writing. (No surprise). However, they all munge the
`data->path` buffer themselves to get there. Provide a nice helper
method for them.
Plus, drop the use `git_iterator_current_workdir_path` which does the
same thing but different. Checkout is the only caller of this silly
function, which lets us remove it.
Refactored the tree iterator to never recurse; simply process the
next entry in order in `advance`. Additionally, reduce the number of
allocations and sorting as much as possible to provide a ~30% speedup
on case-sensitive iteration. (The gains for case-insensitive iteration
are less majestic.)
Disambiguate the reset and reset_range functions. Now reset_range
with a NULL path will clear the start or end; reset will leave the
existing start and end unchanged.
The callback mechanism makes it awkward to write data from an IO
source; move to `_fromstream()` which lets the caller remain in control,
in the same vein as we prefer iterators over foreach callbacks.
The pair of `git_blob_create_frombuffer()` and
`git_blob_create_frombuffer_commit()` is meant to replace
`git_blob_create_fromchunks()` by providing a way for a user to write a
new blob when they want filtering or they do not know the size.
This approach allows the caller to retain control over when to add data
to this buffer and a more natural fit into higher-level language's own
stream abstractions instead of having to handle IO wait in the callback.
The in-memory buffer size of 2MB is chosen somewhat arbitrarily to be a
round multiple of usual page sizes and a value where most blobs seem
likely to be either going to be way below or way over that size. It's
also a round number of pages.
This implementation re-uses the helper we have from `_fromchunks()` so
we end up writing everything to disk, but hopefully more efficiently
than with a default filebuf. A later optimisation can be to avoid
writing the in-memory contents to disk, with some extra complexity.
Allow setting the buffer size on open in order to use this data
structure more generally as a spill buffer, with larger buffer sizes for
specific use-cases.
This special-casing ignores that we might have a locked file, so the
hashtable does not represent the contents of the file we want to
write. This causes multivar writes to overwrite entries instead of add
to them when under lock.
There is no need for this as the normal code-path will write to the file
just fine, so simply get rid of it.
Take advantage of the constant size of tree-owned arrays and store them
in an array instead of a pool. This still lets us free them all at once
but lets the system allocator do the work of fitting them in.
Since the `apply` callback can defer, the `check` callback is not
necessary. Removing the `check` callback further makes the `payload`
unnecessary along with the `cleanup` callback.
Consumers can now register custom merged drivers with
`git_merge_driver_register`. This allows consumers to support the
merge drivers, as configured in `.gitattributes`. Consumers will be
asked to perform the file-level merge when a custom driver is
configured.
The function to extract signatures suffers from a similar bug to the
header field finding one by having an unecessary line feed check as a
break condition of its loop.
Fix that and add a test for this single-line signature situation.
While often similar, these are not the same on Windows. We want to use the page
size on Windows for the pools, but for mmap we need to use the allocation
granularity as the alignment.
On the other platforms these values remain the same.
This ensures that when using OpenSSL a safe default set of ciphers
is selected. This is done so that the client communicates securely
and we don't accidentally enable unsafe ciphers like RC4, or even
worse some old export ciphers.
Implements the first part of https://github.com/libgit2/libgit2/issues/3682
Callers of `git_config__cvar` already handle the case where the
function returns an error due to a failed configuration variable
lookup, but we are actually swallowing errors when calling
`git_config__lookup_entry` inside of the function.
Fix this by returning early when `git_config__lookup_entry`
returns an error. As we call `git_config__lookup_entry` with
`no_errors == false` which leads us to call `get_entry` with
`GET_NO_MISSING` we will not return early when the lookup fails
due to a missing entry. Like this we are still able to set the
default value of the cvar and exit successfully.
When writing to a file with locking not check if writing the
locked file actually succeeds. Fix the issue by returning error
code and message when writing fails.
Accessing the current values map is handled through the
`refcounder_strmap_take` function, which first acquires a mutex
before accessing its values. While this assures everybody is
trying to access the values with the mutex only we do not check
if the locking actually succeeds.
Fix the issue by checking if acquiring the lock succeeds and
returning `NULL` if we encounter an error. Adjust callers.
When normalizing options we try to look up HEAD's OID. While this
action may fail in malformed repositories we never check the
return value of the function.
Fix the issue by converting `normalize_options` to actually
return an error and handle the error in `git_blame_file`.
We usually check entries returned by `git_sortedcache_entry` for
NULL pointers. As we have a write lock in `packed_write`, though,
it really should not happen that the function returns NULL.
Assert that ref is not NULL to silence a Coverity warning.
When the user passes in a diff which has no repository associated
we may call `git_config__get_int_force` with a NULL-pointer
configuration. Even though `git_config__get_int_force` is
designed to swallow errors, it is not intended to be called with
a NULL pointer configuration.
Fix the issue by only calling `git_config__get_int_force` only
when configuration could be retrieved from the repository.
In C89 it is undefined behavior to pass `NULL` pointers to
`strncmp` and later on in C99 it has been explicitly stated that
functions with an argument declared as `size_t nmemb` specifying
the array length shall always have valid parameters, no matter if
`nmemb` is 0 or not (see ISO 9899 §7.21.1.2).
The function `str_equal_no_trailing_slash` always passes its
parameters to `strncmp` if their lengths match. This means if one
parameter is `NULL` and the other one either `NULL` or a string
with length 0 we will pass the pointers to `strncmp` and cause
undefined behavior.
Fix this by explicitly handling the case when both lengths are 0.
When computing a short OID we do this by first copying the
leading parts into the new OID structure and then setting the
trailing part to zero. In the case of the desired length being
`GIT_OID_HEXSZ - 1` we will call `memset` with an out of bounds
pointer and a length of 0. While this seems to cause no problems
for common platforms the C89 standard does not explicitly state
that calling `memset` with an out of bounds pointer and
length of 0 is valid.
Fix the potential issue by using the newly introduced
`git_oid__cpy_prefix` function.
When parsing a section header we expect something along the
format of '[section "subsection"]'. When a section is
mal-formated and is entirely missing its quotation marks we catch
this case by observing that `strchr(line, '"') - strrchr(line,
'"') = NULL - NULL = 0` and error out. Unfortunately, the error
message is misleading though, as we state that we are missing the
closing quotation mark while we in fact miss both quotation
marks.
Improve the error message by explicitly checking if the first
quotation mark could be found and, if not, stating that quotation
marks are completely missing.
The first time may be due to memory fragmentation or just bad luck on a
32-bit system. When we hit the mmap error for the first time, free up
the unused windows and try again.
The old implementation had two issues:
1. OIDs that were too short as to be ambiguous were not being handled
properly.
2. If the last OID to expand in the array was missing from the ODB, we
would leak a `GIT_ENOTFOUND` error code from the function.
Sometimes you want to create a commit but not write it out to the
objectdb immediately. For these cases, provide a new function to
retrieve the buffer instead of having to go through the db.
Submodules don't exist in the objectdb and the code is making us try to
look for a blob with its commit id, which is obviously not going to
work.
Skip the test if the user wants to insert a submodule.
We should have been doing this, but it initializes itself upon first
use, which works as long as nobody's doing concurrent network
operations. Initialize it on our init to make sure it's not getting
initialized concurrently.
If the caller has provided bad authentication, give them another
apportunity to get it right until they give up. This brings WinHTTP in
line with the other transports.
Commit 3d1abc5afc fixes a memory leak in the xdiff code. In the
process of upstreaming the fix it was pointed out by Johannes
Schindelin that there is another memory leak present (see [1]).
Fix the second memory leak by applying the upstream fix to our
code base.
[1]: http://thread.gmane.org/gmane.comp.version-control.git/287034
Android NDK does not have a `struct timespec` in its `struct stat`
for nanosecond support, instead it has a single nanosecond member inside
the struct stat itself. We will use that and use a macro to expand to
the `st_mtim` / `st_mtimespec` definition on other systems (much like
the existing `st_mtime` backcompat definition).
Use the `giterr_set` function, which actually supports `GITERR_OS`.
The `giterr_set_str` function is exposed for external users and will
not append the operating system's error message.
The `normalize_find_opts` function in theory allows for the
incoming diff to have no repository. When the caller does not
pass in diff find options or if the GIT_DIFF_FIND_BY_CONFIG value
is set, though, we try to derive the configuration from the
diff's repository configuration without first verifying that the
repository is actually set to a non-NULL value.
Fix this issue by explicitly checking if the repository is set
and if it is not, fall back to a default value of
GIT_DIFF_FIND_RENAMES.
Convert `rebase_alloc` to use our usual error propagation
patterns, that is accept an out-parameter and return an error
code that is to be checked by the caller. This allows us to use
the GITERR_CHECK_ALLOC macro, which helps static analysis.
Set the error code when an error occurs in any of the called
functions. This ensures we pass the error up to callers and
actually free the remote when an error occurs.
The overflow check in `read_reuc` tries to verify if the
`git__strtol32` parses an integer bigger than UINT_MAX. The `tmp`
variable is casted to an unsigned int for this and then checked
for being greater than UINT_MAX, which obviously can never be
true.
Fix this by instead fixing the `mode` field's size in `struct
git_index_reuc_entry` to `uint32_t`. We can now parse the int
with `git__strtol64`, which can never return a value bigger than
`UINT32_MAX`, and additionally checking if the returned value is
smaller than zero.
We do not need to handle overflows explicitly here, as
`git__strtol64` returns an error when the returned value would
overflow.
The fail-label of `reflog_parse` explicitly checks the entry
poitner for NULL before freeing it. When we jump to the label the
variable has to be set to a non-NULL and valid pointer though: if
the allocation fails we immediately return with an error code and
if the loop was not entered we return with a success code,
withouth executing the label's code.
Remove the useless NULL-check to silence Coverity.
When invoking `diff_print_info_init_frompatch` it is obvious that
the patch should be non-NULL. We explicitly check if the variable
is set and continue afterwards, happily dereferencing the
potential NULL-pointer.
Fix this by instead asserting that patch is set. This also
silences Coverity.
The function `compute_write_order` may return a `NULL`-pointer
when an error occurs. In such cases we jump to the `done`-label
where we try to clean up allocated memory. Unfortunately we try
to deallocate the `write_order` array, though, which may be NULL
here.
Fix this error by returning early instead of jumping to the
`done` label. There is no data to be cleaned up anyway.
When no payload is set for `crlf_apply` we try to compute the
crlf attributes ourselves with `crlf_check`. When the function
determines that the current file does not require any treatment
we return the GIT_PASSTHROUGH error code without actually
allocating the out-pointer, which indicates the file should not
be passed through the filter.
The `crlf_apply` function explicitly checks for the
GIT_PASSTHROUGH return code and ignores it. This means we will
try to apply the crlf-filter to the current file, leading us to
dereference the unallocated payload-pointer.
Fix this obviously incorrect behavior by not treating
GIT_PASSTHROUGH in any special way. This is the correct thing to
do anyway, as the code indicates that the file should not be
passed through the filter.
We commonly have to check if a git_buf has been allocated
correctly or if we ran out of memory. Introduce a new macro
similar to `GITERR_CHECK_ALLOC` which checks if we ran OOM and if
so returns an error. Provide a `#nodef` for Coverity to mark the
error case as an abort path.
When checking for out of memory situations we usually use the
GITERR_CHECK_ALLOC macro. Besides conforming to our current code
base it adds the benefit of silencing errors in Coverity due to
Coverity handling the macro's error path as abort.
Allow `git_index_read` to handle reading existing indexes with
illegal entries. Allow the low-level `git_index_add` to add
properly formed `git_index_entry`s even if they contain paths
that would be illegal for the current filesystem (eg, `AUX`).
Continue to disallow `git_index_add_bypath` from adding entries
that are illegal universally illegal (eg, `.git`, `foo/../bar`).
Although a `tree_iterator` that failed to be properly created
does not have a frame, all other `tree_iterator`s should. Do not
call `pop` in the failure case, but assert that in all other
cases there is a frame.
When Git repository at network locations, sometimes git_iterator_for_tree
fails at iterator__update_ignore_case so it goes to git_iterator_free.
Null pointer will crash the process if not check.
Signed-off-by: Colin Xu <colin.xu@gmail.com>
We should be checking whether the object we're looking up is a commit,
and we should let the caller know whether the not-found return code
comes from a bad object type or just a missing signature.
When performing an in-memory rebase, keep a single index for the
duration, so that callers have the expected index lifecycle and
do not hold on to an index that is free'd out from under them.
When we moved the logic to handle the first one, wrong loop logic was
kept in place which meant we still finished early. But we now notice it
because we're not reading past the last LF we find.
This was not noticed before as the last field in the tested commit was
multi-line which does not trigger the early break.
Introduce the ability to rebase in-memory or in a bare repository.
When `rebase_options.inmemory` is specified, the resultant `git_rebase`
session will not be persisted to disk. Callers may still analyze
the rebase operations, resolve any conflicts against the in-memory
index and create the commits. Neither `HEAD` nor the working
directory will be updated during this process.
The function `git_packfile_stream_open` tries to free the passed
in stream when an error occurs. The only call site is
`git_indexer_append`, though, which passes in the address of a
stream struct which has not been allocated on the heap.
Fix the issue by simply removing the call to free. In case of an
error we did not allocate any memory yet and otherwise it should
be the caller's responsibility to manage it's object's lifetime.
We were searching only past the first header field, which meant we were
unable to find e.g. `tree` which is the first field.
While here, make sure to set an error message in case we cannot find the
field.
Previously we would set the global filter registry structure before
adding filters to the structure, without a lock, which is quite racy.
Now, register default filters during global registration and use an
rwlock to read and write the filter registry (as appopriate).
Standard Windows type systems define CLSID_InternetSecurityManager
and IID_IInternetSecurityManager, but MinGW lacks these definitions.
As a result, we must hardcode these definitions ourselves. However,
we should not use a public struct with those names, lest another
library do the same thing and consumers cannot link to both.
We don't support using an index object from multiple threads at the same
time, so the locking doesn't have any effect when following the
rules. If not following the rules, things are going to break down
anyway.
Include dotfiles when copying template directory, which will handle
both a template directory itself that begins with a dotfile, and
any dotfiles inside the directory.
Fix the possibility of returning successfully from ssh_stream_read()
with *bytes_read < 0. This would occur if stdout channel read resulted
in 0, and stderr channel read failed afterwards.
Note that we're not checking whether the resize succeeds; in OOM cases,
we let it run with a "small" vector and hash table and see if by chance
we can grow it dynamically as we insert the new entries. Nothing to
lose really.
Instead of calling `git_index_add` in a loop, use the new
`git_index_fill` internal API to fill the index with the initial staged
entries.
The new `fill` helper assumes that all the entries will be unique and
valid, so it can append them at the end of the entries vector and only
sort it once at the end. It performs no validation checks.
This prevents the quadratic behavior caused by having to sort the
entries list once after every insertion.
When replacing an index with a new one, we need to iterate
through all index entries in order to determine which entries are
equal. When it is not possible to re-use old entries for the new
index, we move it into a list of entries that are to be removed
and thus free'd.
When we encounter a non-zero error code, though, we skip adding
the current index entry to the remove-queue. `INSERT_MAP_EX`,
which is the function last run before adding to the remove-queue,
may return a positive non-zero code that indicates what exactly
happened while inserting the element. In this case we skip adding
the entry to the remove-queue but still continue the current
operation, leading to a leak of the current entry.
Fix this by checking for a negative return value instead of a
non-zero one when we want to add the current index entry to the
remove-queue.
When adding to the index, we look to see if a portion of the given
path matches a portion of a path in the index. If so, we will use
the existing path information. For example, when adding `foo/bar.c`,
if there is an index entry to `FOO/other` and the filesystem is case
insensitive, then we will put `bar.c` into the existing tree instead
of creating a new one with a different case.
Use `strncmp` to do that instead of `memcmp`. When we `bsearch`
into the index, we locate the position where the new entry would
go. The index entry at that position does not necessarily have
a relation to the entry we're adding, so we cannot make assumptions
and use `memcmp`. Instead, compare them as strings.
When canonicalizing paths, we look for the first index entry that
matches a given substring.
When duplicating a `struct git_tree_entry` with
`git_tree_entry_dup` the resulting structure is not allocated
inside a memory pool. As we do a 1:1 copy of the original struct,
though, we also copy the `pooled` field, which is set to `true`
for pooled entries. This results in a huge memory leak as we
never free tree entries that were duplicated from a pooled
tree entry.
Fix this by marking the newly duplicated entry as un-pooled.
When formatting a patch as email we do not include the commit's
message in the formatted patch output. Implement this and add a
test that verifies behavior.
It is already possible to get a commit's summary with the
`git_commit_summary` function. It is not possible to get the
remaining part of the commit message, that is the commit
message's body.
Fix this by introducing a new function `git_commit_body`.
The `git_blame__entry` struct keeps track of line counts with
`int` fields. Since `int` is only guaranteed to be at least 16
bits we may overflow on certain platforms when line counts exceed
2^15.
Fix this by instead storing line counts in `size_t`.
It is not unreasonable to have versioned files with a line count
exceeding 2^16. Upon blaming such files we fail to correctly keep
track of the lines as `git_blame_hunk` stores them in `uint16_t`
fields.
Fix this by converting the line fields of `git_blame_hunk` to
`size_t`. Add test to verify behavior.
This reduces the size of the struct from 32 to 26 bytes, and leaves a
single padding byte at the end of the struct (which comes from the
zero-length array).
These are rather small allocations, so we end up spending a non-trivial
amount of time asking the OS for memory. Since these entries are tied to
the lifetime of their tree, we can give the tree a pool so we speed up
the allocations.
We've already looked at the filename with `memchr()` and then used
`strlen()` to allocate the entry. We already know how much we have to
advance to get to the object id, so add the filename length instead of
looking at each byte again.
When building a recursive merge base, allow conflicts to occur.
Use the file (with conflict markers) as the common ancestor.
The user has already seen and dealt with this conflict by virtue
of having a criss-cross merge. If they resolved this conflict
identically in both branches, then there will be no conflict in the
result. This is the best case scenario.
If they did not resolve the conflict identically in the two branches,
then we will generate a new conflict. If the user is simply using
standard conflict output then the results will be fairly sensible.
But if the user is using a mergetool or using diff3 output, then the
common ancestor will be a conflict file (itself with diff3 output,
haha!). This is quite terrible, but it matches git's behavior.
Use annotated commits to act as our virtual bases, instead of regular
commits, to avoid polluting the odb with virtual base commits and
trees. Instead, build an annotated commit with an index and pointers
to the commits that it was merged from.
When there are more than two common ancestors, continue merging the
virtual base with the additional common ancestors, effectively
octopus merging a new virtual base.
When examining the working directory and determining whether it's
up-to-date, only consider the nanoseconds in the index entry when
built with `GIT_USE_NSEC`. This prevents us from believing that
the working directory is always dirty when the index was originally
written with a git client that uinderstands nsecs (like git 2.x).
Allow users to set the `git_libgit2_opts` search path for the
`GIT_CONFIG_LEVEL_PROGRAMDATA`. Convert `GIT_CONFIG_LEVEL_PROGRAMDATA`
to `GIT_SYSDIR_PROGRAMDATA` for setting the configuration.
Ensure that `git_index_read_index` clears the uptodate bit on
files that it modifies.
Further, do not propagate the cache from an on-disk index into
another on-disk index. Although this should not be done, as
`git_index_read_index` is used to bring an in-memory index into
another index (that may or may not be on-disk), ensure that we do
not accidentally bring in these bits when misused.
The uptodate bit should have a lifecycle of a single read->write
on the index. Once the index is written, the files within it should
be scanned for racy timestamps against the new index timestamp.
Keep track of entries that we believe are up-to-date, because we
added the index entries since the index was loaded. This prevents
us from unnecessarily examining files that we wrote during the
cleanup of racy entries (when we smudge racily clean files that have
a timestamp newer than or equal to the index's timestamp when we
read it). Without keeping track of this, we would examine every
file that we just checked out for raciness, since all their timestamps
would be newer than the index's timestamp.
When we insert a conflict in a case-insensitive index, accept the
new entry's path as the correct case instead of leaving the path we
already had.
This puts `git_index_conflict_add()` on the same level as
`git_index_add()` in this respect.
Reload the HEAD and index data for a submodule after reading the
configuration. The configuration may specify a `path`, so we must
update HEAD and index data with that path in mind.
When creating a filebuf, detect a directory that exists in our
target file location. This prevents a failure later, when we try
to move the lock file to the destination.
On platforms that lack `core.symlinks`, we should not go looking for
symbolic links and `p_readlink` their target. Instead, we should
examine the file's contents.
This reduces the chances of a crash in the thread tests. This shouldn't
affect general usage too much, since the main usage of these functions
are to read into an empty buffer.
Instead of relying on the size and timestamp, which can hide changes
performed in the same second, hash the file content's when we care about
detecting changes.
Using calloc instead of malloc because the parse error will lead to an immediate free of committer (and its properties, which can segfault on free if undefined - test_refs_reflog_reflog__reading_a_reflog_with_invalid_format_returns_error segfaulted before the fix).
#3458
Inserting new REUC entries can quickly become pathological given that
each insert unsorts the REUC vector, and both subsequent lookups *and*
insertions will require sorting it again before being successful.
To avoid this, we're switching to `git_vector_insert_sorted`: this keeps
the REUC vector constantly sorted and lets us use the `on_dup` callback
to skip an extra binary search on each insertion.
Provide a new merge option, GIT_MERGE_TREE_FAIL_ON_CONFLICT, which
will stop on the first conflict and fail the merge operation with
GIT_EMERGECONFLICT.
Although CMake will correctly configure include directories for us,
some people may use their own build system, and we should reference
`util.h` based on where it actually lives.
Although our index contains the literal time present in the index,
we do not read nanoseconds from disk, and thus we should not use
them in any comparisons, lest we always think our working directory
is dirty.
Guard this behind a `GIT_USE_NSECS` for future improvement.
For most real use cases, repositories with alternates use them as main
object storage. Checking the alternate for objects before the main
repository should result in measurable speedups.
Because of this, we're changing the sorting algorithm to prioritize
alternates *in cases where two backends have the same priority*. This
means that the pack backend for the alternate will be checked before the
pack backend for the main repository *but* both of them will be checked
before any loose backends.
In the current implementation of ODB backends, each backend is tasked
with refreshing itself after a failed lookup. This is standard Git
behavior: we want to e.g. reload the packfiles on disk in case they have
changed and that's the reason we can't find the object we're looking
for.
This behavior, however, becomes pathological in repositories where
multiple alternates have been loaded. Given that each alternate counts
as a separate backend, a miss in the main repository (which can
potentially be very frequent in cases where object storage comes from
the alternate) will result in refreshing all its packfiles before we
move on to the alternate backend where the object will most likely be
found.
To fix this, the code in `odb.c` has been refactored as to perform the
refresh of all the backends externally, once we've verified that the
object is nowhere to be found.
If the refresh is successful, we then perform the lookup sequentially
through all the backends, skipping the ones that we know for sure
weren't refreshed (because they have no refresh API).
The on-disk pack backend has been adjusted accordingly: it no longer
performs refreshes internally.
We moved the "main" parsing to use 64 bits for the timestamp, but the
quick parsing for the revwalk did not. This means that for large
timestamps we fail to parse the time and thus the walk.
Move this parser to use 64 bits as well.
xdiff craps the bed on large files. Treat very large files as binary,
so that it doesn't even have to try.
Refactor our merge binary handling to better match git.git, which
looks for a NUL in the first 8000 bytes.
As refdb and odb backends can be allocated by client code, libgit2
can’t know whether an alternative memory allocator was used, and thus
should not try to call `git__free` on those objects.
Instead, odb and refdb backend implementations must always provide
their own `free` functions to ensure memory gets freed correctly.
When we do not trust the on-disk mode, we use the mode of an existing
index entry. This allows us to preserve executable bits on platforms
that do not honor them on the filesystem.
If there is no stage 0 index entry, also look at conflicts to attempt
to answer this question: prefer the data from the 'ours' side, then
the 'theirs' side before falling back to the common ancestor.
SSL_shutdown() does not like it when we pass an unitialized ssl context
to it. This means that when we fail to connect to a host, we hide the
error message saying so with OpenSSL's indecipherable error message.
git expects an empty line after the binary data:
literal X
...binary data...
<empty_line>
The last literal block of the generated patches were not containing the required empty line. Example:
diff --git a/binary_file b/binary_file
index 3f1b3f9098131cfecea4a50ff8afab349ea66d22..86e5c1008b5ce635d3e3fffa4434c5eccd8f00b6 100644
GIT binary patch
literal 8
Pc${NM&PdElPvrst3ey5{
literal 6
Nc${NM%g@i}0ssZ|0lokL
diff --git a/binary_file2 b/binary_file2
index 31be99be19470da4af5b28b21e27896a2f2f9ee2..86e5c1008b5ce635d3e3fffa4434c5eccd8f00b6 100644
GIT binary patch
literal 8
Pc${NM&PdElPvrst3ey5{
literal 13
Sc${NMEKbZyOexL+Qd|HZV+4u-
git apply of that diff results in:
error: corrupt binary patch at line 9: diff --git a/binary_file2 b/binary_file2
fatal: patch with only garbage at line 10
The proper formating is:
diff --git a/binary_file b/binary_file
index 3f1b3f9098131cfecea4a50ff8afab349ea66d22..86e5c1008b5ce635d3e3fffa4434c5eccd8f00b6 100644
GIT binary patch
literal 8
Pc${NM&PdElPvrst3ey5{
literal 6
Nc${NM%g@i}0ssZ|0lokL
diff --git a/binary_file2 b/binary_file2
index 31be99be19470da4af5b28b21e27896a2f2f9ee2..86e5c1008b5ce635d3e3fffa4434c5eccd8f00b6 100644
GIT binary patch
literal 8
Pc${NM&PdElPvrst3ey5{
literal 13
Sc${NMEKbZyOexL+Qd|HZV+4u-
If the header doesn't look like a header (e.g. if it doesn't have a ":"
or if it has newlines), report "custom HTTP header '%s' is malformed".
If the header has the same name as a header already set by libgit2 (e.g.
"Host"), report "HTTP header '%s' is already set by libgit2".
This allows us to remove OS checks from source code, instead relying
on CMake to detect whether or not `struct stat` has the nanoseconds
members we rely on.
Check that the repository directory is beneath the workdir before
adding it to the list of reserved paths. If it is not, then there
is no possibility of checking out files into it, and it should not
be a reserved word.
This is a particular problem with submodules where the repo directory
may be in the super's .git directory.
When there is a comment at the end of a section, git keeps it there,
while we write the new variable right at the end.
Keep comments buffered and dump them when we're going to output a
variable or section, or reach EOF. This puts us in line with the config
files which git produces.
Don't coalesce all errors into ENOENT. At least identify EACCES.
All callers should be handling this case already, as the POSIX
`lstat` will return this.
`git_futils_mkdir` does not blindly call `git_futils_mkdir_relative`.
`git_futils_mkdir_relative` is used when you have some base directory
and want to create some path inside of it, potentially removing blocking
symlinks and files in the process. This is not suitable for a general
recursive mkdir within the filesystem.
Instead, when `mkdir` is being recursive, locate the first existent
parent directory and use that as the base for `mkdir_relative`.
Untangle git_futils_mkdir from git_futils_mkdir_ext - the latter
assumes that we own everything beneath the base, as if it were
being called with a base of the repository or working directory,
and is tailored towards checkout and ensuring that there is no
bogosity beneath the base that must be cleaned up.
This is (at best) slow and (at worst) unsafe in the larger context
of a filesystem where we do not own things and cannot do things like
unlink symlinks that are in our way.
When a file exists on disk and we're checking out a file that differs
in executableness, remove the old file. This allows us to recreate the
new file with p_open, which will take the new mode into account and
handle setting the umask properly.
Remove any notion of chmod'ing existing files, since it is now handled
by the aforementioned removal and was incorrect, as it did not take
umask into account.
The canonical directory path of the root directory of a volume on
POSIX already ends in a slash (eg, `/`). This is true only at the
root. Do not add a slash to paths in this case.
The canonical directory path of the root directory of a volume on
windows already ends in a slash (eg, `c:/`). This is true only
at the volume root. Do not add a slash to paths in this case.
The previous commit left the comment referencing the earlier state of
the code, change it to explain the current logic. While here, change the
logic to avoid repeating the copy of the base pattern.
When we're not doing pathspec matching, we let the iterator handle
file matching for us. However, we can only trust the iterator to
return *files* that match the pattern, because the iterator must
return directories that are not strictly in the pathlist, but that
are the parents of files that match the pattern, so that diff can
later recurse into them.
Thus, diff must examine non-files explicitly before including them
in the delta list.
While advancing the tree iterator, if we advance over things that
we aren't interested in, then call `current`. Which may *itself*
call advance.
While advancing the tree iterator, if we advance over things that
we aren't interested in, then call `current`. Which may *itself*
call advance.
While advancing the tree iterator, if we advance over things that
we aren't interested in, then call `current`. Which may *itself*
call advance.
While advancing the tree iterator, if we advance over things that
we aren't interested in, then call `current`. Which may *itself*
call advance.
While advancing the tree iterator, if we advance over things that
we aren't interested in, then call `current`. Which may *itself*
call advance.
Error: stack overflow.
When searching for information about a submdoule, let's be more explicit
in what we expect to find. We currently insert a submodule into the map
and change certain parameters when the config callback gets called.
Switch to asking for the configuration we're interested in, rather than
taking it in an arbitrary order.
On case insensitive platforms, allow `git_index_add` to provide a new
path for an existing index entry. Previously, we would maintain the
case in an index entry without the ability to change it (except by
removing an entry and re-adding it.)
Higher-level functions (like `git_index_add_bypath` and
`git_index_add_frombuffers`) continue to keep the old path for easier
usage.
On case insensitive systems, when given a user-provided path in the
higher-level index addition functions (eg `git_index_add_bypath` /
`git_index_add_frombuffer`), examine the index to try to match the
given path to an existing directory.
Various mechanisms can cause the on-disk representation of a folder
to not match the representation in HEAD or the index - for example,
a case changing rename of some file `a/file.txt` to `A/file.txt`
will update the paths in the index, but not rename the folder on
disk.
If a user subsequently adds `a/other.txt`, then this should be stored
in the index as `A/other.txt`.