`git_submodule_status` is very slow, bottlenecked on
`git_repository_head_tree`, which it uses through `submodule_update_head`. If
the user has requested submodule caching, assume that they want this status
cached too and skip it.
Signed-off-by: David Turner <dturner@twosigma.com>
Added `git_repository_submodule_cache_all` to initialze a cache of
submodules on the repository so that operations looking up N
submodules are O(N) and not O(N^2). Added a
`git_repository_submodule_cache_clear` function to remove the cache.
Also optimized the function that loads all submodules as it was itself
O(N^2) w.r.t the number of submodules, having to loop through the
`.gitmodules` file once per submodule. I changed it to process the
`.gitmodules` file once, into a map.
Signed-off-by: David Turner <dturner@twosigma.com>
For username/password credentials, support NTLM or Basic (in that order
of priority). Use the WinHTTP built-in authentication support for both,
and maintain a bitfield of the supported mechanisms from the response.
The Git protocol does not specify what should happen in the case
of an empty packet line (that is a packet line "0004"). We
currently indicate success, but do not return a packet in the
case where we hit an empty line. The smart protocol was not
prepared to handle such packets in all cases, though, resulting
in a `NULL` pointer dereference.
Fix the issue by returning an error instead. As such kind of
packets is not even specified by upstream, this is the right
thing to do.
Each packet line in the Git protocol is prefixed by a four-byte
length of how much data will follow, which we parse in
`git_pkt_parse_line`. The transmitted length can either be equal
to zero in case of a flush packet or has to be at least of length
four, as it also includes the encoded length itself. Not
checking this may result in a buffer overflow as we directly pass
the length to functions which accept a `size_t` length as
parameter.
Fix the issue by verifying that non-flush packets have at least a
length of `PKT_LEN_SIZE`.
git_checkout_tree() sets up its working directory iterator to respect the
pathlist if GIT_CHECKOUT_DISABLE_PATHSPEC_MATCH is present, which is great.
What's not so great is that this iterator is then used side-by-side with
an iterator created by git_checkout_iterator(), which did not set up its
pathlist appropriately (although the iterator mirrors all other iterator
options).
This could cause git_checkout_tree() to delete working tree files which
were not specified in the pathlist when GIT_CHECKOUT_DISABLE_PATHSPEC_MATCH
was used, as the unsynchronized iterators causes git_checkout_tree() to think
that files have been deleted between the two trees. Oops.
And added a test which fails without this fix (specifically, the final check
for "testrepo/README" to still be present fails).
Error messages should be sentence fragments, and therefore:
1. Should not begin with a capital letter,
2. Should not conclude with punctuation, and
3. Should not end a sentence and begin a new one
We want to keep the git UA in order for services to recognise that we're
a Git client and not a browser. But in order to stop dumb HTTP some
services have blocked UAs that claim to be pre-1.6.6 git.
Thread these needles by using the "git/2.0" prefix which is still close
enough to git's yet distinct enough that you can tell it's us.
This partially reverts bdec62dce1 which activates
the transport code-paths which allow you to use a custom TLS implementation
without having to have one at build-time.
However the capabilities describe how libgit2 was built, not what it could
potentially support, bring back the ifdefs so we only say we support HTTPS if
libgit2 was itself built with a TLS implementation.
The function to write trees allocates a new buffer for each tree.
This causes problems with performance when performing a lot
of actions involving writing trees, e.g. when doing many merges.
Fix the issue by instead handing in a shared buffer, which is then
re-used across the calls without having to re-allocate between
calls.
When trying to uncompress deltas in a packfile's delta chain, we try to
add object bases to the packfile cache, subsequently decrementing its
reference count if it has been added successfully. This may lead to a
mismatched reference count in the case where we exit the loop early due
to an encountered error.
Fix the issue by decrementing the reference count in error cleanup.
git_rebase_finish relies on head_detached being set, but
rebase_init_merge was only setting it when branch->ref_name was unset.
But branch->ref_name would be set to "HEAD" in the case of detached
HEAD being either implicitly (NULL) or explicitly passed to
git_rebase_init.
Introduce `git_thread_exit`, which will allow threads to terminate at an
arbitrary time, returning a `void *`. On Windows, this means that we
need to store the current `git_thread` in TLS, so that we can set its
`return` value when terminating.
We cannot simply use `ExitThread`, since Win32 returns `DWORD`s from
threads; we return `void *`.
`giterr_set()` is used when it is required to format a string, and since
we don't really require it for this case, it is better to stick to
`giterr_set_str()`.
This also suppresses a warning(-Wformat-security) raised by the compiler.
Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com>
The `CURLINFO_LASTSOCKET` information has been deprecated since
curl version 7.45.0 as it may result in an overflow in the
returned socket on certain systems, most importantly on 64 bit
Windows. Instead, a new call `CURLINFO_ACTIVESOCKET` has been
added which instead returns a `curl_socket_t`, which is always
sufficiently long to store a socket.
As we need to provide backwards compatibility with curl versions
smaller than 7.45.0, alias CURLINFO_ACTIVESOCKET to
CURLINFO_LASTSOCKET on platforms without CURLINFO_ACTIVESOCKET.
We look at whether we're trying to replace a blob with a tree during the
update phase, but we fail to look at whether we've just inserted a blob
where we're now trying to insert a tree.
Update the check to look at both places. The test for this was
previously succeeding due to the bu where we did not look at the sorted
output.
On Windows we can find locked files even when reading a reference or the
packed-refs file. Bubble up the error in this case as well to allow
callers on Windows to retry more intelligently.
It does not help us to check whether the file exists before trying to
unlink it since it might be gone by the time unlink is called.
Instead try to remove it and handle the resulting error if it did not
exist.
Checking the size before we open the file descriptor can lead to the
file being replaced from under us when renames aren't quite atomic, so
we can end up reading too little of the file, leading to us thinking the
file is corrupted.
There might be a few threads or processes working with references
concurrently, so fortify the code to ignore errors which come from
concurrent access which do not stop us from continuing the work.
This includes ignoring an unlinking error. Either someone else removed
it or we leave the file around. In the former case the job is done, and
in the latter case, the ref is still in a valid state.
We need to save the errno, lest we clobber it in the giterr_set()
call. Also add code for reporting that a path component is missing,
which is a distinct failure mode.
In order not to undo concurrent modifications to references, we must
make sure that we only delete a loose reference if it still has the same
value as when we packed it.
This means we need to lock it and then compare the value with the one we
put in the packed file.
When trying to find a discovery, we walk up the directory
structure checking if there is a ".git" file or directory and, if
so, check its validity. But in the case that we've got a ".git"
file, we do not want to unconditionally assume that the file is
in fact a ".git" file and treat it as such, as we would error out
if it is not.
Fix the issue by only treating a file as a gitlink file if it
ends with "/.git". This allows users of the function to discover
a repository by handing in any path contained inside of a git
repository.
The existing code would set a namespace of "" (empty string) with
GIT_NAMESPACE unset. In a repository where refs/heads/namespaces/
exists, that can produce incorrect results. Detect that case and avoid
setting the namespace at all.
Since that makes the last assignment to error conditional, and the
previous assignment can potentially get GIT_ENOTFOUND, set error to 0
explicitly to prevent the call from incorrectly failing with
GIT_ENOTFOUND.
We're recently trying to upgrade to the current master of libgit2
in Cargo but we're unfortunately hitting a segfault in one of our
tests. This particular test is just a small smoke test that https
works (e.g. it's configured in libgit2). It attempts to clone
from a URL which simply immediately drops connections after
they're accepted (e.g. terminate abnormally). We expect to see a
standard error from libgit2 but unfortunately we're seeing a
segfault.
This segfault is happening inside of the `wait_for` function of
`curl_stream.c` at the line `FD_SET(fd, &errfd)` because `fd` is
-1. This ends up doing an out-of-bounds array access that faults
the program. I tracked back to where this -1 came from to the
line here (returned by `CURLINFO_LASTSOCKET`) and added a check
to return an error.
The code correctly detects that forced creation of a branch on a
nonbare repo should not be able to overwrite a branch which is
the HEAD reference. But there's no reason to prevent this on
a bare repo, and in fact, git allows this. I.e.,
git branch -f master new_sha
works on a bare repo with HEAD set to master. This change fixes
that problem, and updates tests so that, for this case, both the
bare and nonbare cases are checked for correct behavior.
We need to include the initialisation and construction functions in all
backend, so we include this header when building against SecureTransport
and WinHTTP as well.
In `pack_entry_find_offset`, we try to find the offset of a
certain object in the pack file. To do so, we first assert if the
packfile has already been opened and open it if not. Opening the
packfile is guarded with a mutex, so concurrent access to this is
in fact safe.
What is not thread-safe though is our calculation of offsets
inside the packfile. Assume two threads calling
`pack_entry_find_offset` at the same time. We first calculate the
offset and index location and only then determine if the pack has
already been opened. If so, we re-calculate the offset and index
address.
Now the case for two threads: thread 1 first calculates the
addresses and is subsequently suspended. The second thread will
now call `pack_index_open` and initialize the pack file,
calculating its addresses correctly. When the first thread is
resumed now, he'll see that the pack file has already been
initialized and will happily proceed with the addresses it has
already calculated before the check. As the pack file was not
initialized before, these addresses are bogus.
Fix the issue by only calculating the addresses after having
checked if the pack file is open.
When trying to receive packets from the remote, we loop until
either an error distinct to `GIT_EBUFS` occurs or until we
successfully parsed the packet. This does not honor the case
where we are looping over an already closed socket which has no
more data, leaving us in an infinite loop if we got a bogus
packet size or if the remote hang up.
Fix the issue by returning `GIT_EEOF` when we cannot read data
from the socket anymore.
When reading a server's reference announcements via the smart
protocol, we expect the server to send multiple flushes before
the protocol is finished. If we fail to receive new data from the
socket, we will only return an end of stream error if we have not
seen any flush yet.
This logic is flawed in that we may run into an infinite loop
when receiving a server's reference announcement with a bogus
flush packet. E.g. assume the last flushing package is changed to
not be '0000' but instead any other value. In this case, we will
still await one more flush package and ignore the fact that we
are not receiving any data from the socket, causing an infinite
loop.
Fix the issue by always returning `GIT_EEOF` if the socket
indicates an end of stream.