When a refspec contains no rhs and thus won't cause an explicit update,
we skip all the logic, but that means that we don't update FETCH_HEAD
with it, which is what the implicit rhs is.
Add another bit of logic which puts those remote heads in the list of
updates so we put them into FETCH_HEAD.
When we don't own a buffer (asize=0) we currently allow the usage of
grow to copy the memory into a buffer we do own. This muddles the
meaning of grow, and lets us be a bit cavalier with ownership semantics.
Don't allow this any more. Usage of grow should be restricted to buffers
which we know own their own memory. If unsure, we must not attempt to
modify it.
Always set `GIT_DIFF_PATCH_DIFFABLE` for all files, regardless of
binary-ness, so that the binary callback is invoked to either
show the binary contents, or just print the standard "Binary files
differ" message. We may need to do deeper inspection for binary
files where we have avoided loading the contents into a file map.
If the libcurl stream is available, use that as the underlying stream
instead of the socket stream. This allows us to set a proxy for HTTPS
connections.
The TLS streams talk over the curl stream themselves, so we don't need
to ask for it explicitly. Do so in the case of the non-encrypted one so
we can still make use proxies in that case.
When linking against libcurl, use it as the underlying transport instead
of straight sockets. We can't quite just give over the file descriptor,
as curl puts it into non-blocking mode, so we build a custom BIO so
OpenSSL sends the data through our stream, be it the socket or curl
streams.
If the stream claims to support this feature, we can let the transport
set the proxy.
We also set HTTPPROXYTUNNEL option so curl can create a tunnel through
the proxy which lets us create our own TLS session (if needed).
cURL has a mode in which it acts a lot like our streams, providing send
and recv functions and taking care of the TLS and proxy setup for us.
Implement a new stream which uses libcurl instead of raw sockets or the
TLS libraries directly. This version does not support reporting
certificates or proxies yet.
When stashing the workdir tree, examine the index as well. Using
a mechanism similar to `git_diff_tree_to_workdir_with_index`
allows us to determine that a file was added in the index and
subsequently modified in the working directory. Without examining
the index, we would erroneously believe that this file was
untracked and fail to include it in the working directory tree.
Use a slightly modified `git_diff_tree_to_workdir_with_index` in
order to avoid some of the behavior custom to `git diff`. In
particular, be sure to include the working directory side of a
file when it was deleted in the index.
This is something we do on re-init but not when opening a
repository. This hasn't particularly mattered up to now as the version
has been 0 ever since the first release of git, but the times, they're
a-changing and we will soon see version 1 in the wild. We need to make
sure we don't open those.
If an index entry for a file that is not in HEAD is in conflicted state,
when diffing HEAD with the index, the status field of the corresponding git_diff_delta was incorrectly reported as GIT_DELTA_ADDED instead of GIT_DELTA_CONFLICTED.
This was due to handle_unmatched_new_item() initially setting the status
to GIT_DELTA_CONFLICTED but then overriding it later with GIT_DELTA_ADDED.
We currently do not handle those enum values which require us to set
"true" or unset variables in all cases. Use a common function which does
understand this by looking at our mapping directly.
Similarly to the other ones. In this test we copy over testing
`RECURSE_YES` which shows an error in our handling of the `YES` variant
which we may have to port to the rest.
During the cache deletion, the check for whether we consider a submodule
to exist got changed regarding submodules which are in the worktree but
not configured.
Instead of checking for the url field to be populated, check the
location where we've found it.
This lets us specify in the status call which ignore rules we want to
use (optionally falling back to whatever the submodule has in its
configuration).
This removes one of the reasons for having `_set_ignore()` set the value
in-memory. We re-use the `IGNORE_RESET` value for this as it is no
longer relevant but has a similar purpose to `IGNORE_FALLBACK`.
Similarly, we remove `IGNORE_DEFAULT` which does not have use outside of
initializers and move that to fall back to the configuration as well.
As submodules are becomes more like values, we should not let a status
check to update its properties. Instead of taking a submodule, have
status take a repo and submodule name.
Having this cache and giving them out goes against our multithreading
guarantees and it makes it impossible to use submodules in a
multi-threaded environment, as any thread can ask for a refresh which
may reallocate some string in the submodule struct which we've accessed
in a different one via a getter.
This makes the submodules behave more like remotes, where each object is
created upon request and not shared except explicitly by the user. This
means that some tests won't pass yet, as they assume they can affect the
submodule objects in the cache and that will affect later operations.
This allows the user to look up fields which we don't parse in libgit2,
and allows them to access gpgsig or mergetag fields if they wish to
check the signature.
When an entry has a racy timestamp, we need to check whether the file
itself has changed since we put its entry in the index. Only then do we
smudge the size field to force a check the next time around.
When a file on the workdir has the same or a newer timestamp than the
index, we need to perform a full check of the contents, as the update of
the file may have happened just after we wrote the index.
The iterator changes are such that we can reach inside the workdir
iterator from the diff, though it may be better to have an accessor
instead of moving these structs into the header.
When updating the index during a diff, preserve the original mode,
which prevents us from dropping the mode to what we have interpreted
as on our system (eg, what the working directory claims it to be,
which may be a lie on some systems.)
This is used by the submodule in order to figure out if the index has
changed since it last read it. Using a timestamp is racy, so let's make
it use the checksum, just like we now do for reloading the index itself.
We currently use a timetamp to check whether an index file has been
modified since we last read it, but this is racy. If two updates happen
in the same second and we read after the first one, we won't detect the
second one.
Instead read the SHA-1 checksum of the file, which are its last 20 bytes which
gives us a sure-fire way to detect whether the file has changed since we
last read it.
As we're now keeping track of it, expose an accessor to this data.
When checking out some file 'foo' that has been modified in the
working directory, allow the checkout to proceed (do not conflict)
if 'foo' is identical to the target of the checkout.
In order to avoid racy-git, we zero out the file size for entries with
the same timestamp as the index (or during the initial checkout). This
is the case in a couple of crlf tests, as the code is fast enough to do
everything in the same second.
As we know that we do not perform the modification just after writing
out the index, which is what this is designed to work around, tick the
mtime of the index file such that it doesn't agree with the files
anymore, and we do not zero out these entries.
If a file entry has the same timestamp as the index itself, it is
considered racily-clean, as it may have been modified after the index
was written, but during the same second. We take extra steps to check
the contents, but this is just one part of avoiding races.
For files which do have changes but have not been updated in the index,
updating the on-disk index means updating its timestamp, which means we
would no longer recognise these entries as racy and we would trust the
timestamp to tell us whether they have changed.
In order to work around this, git zeroes out the file-size field in
entries with the same timestamp as the index in order to force the next
diff to check the contents. Do so in libgit2 as well.
Arguably all uses of readdir_r are unnecessary, but in this case
especially so, as the directory handle only exists within this function,
so we don't race with anybody.
Introduce a new binary diff callback to provide the actual binary
delta contents to callers. Create this data from the diff contents
(instead of directly from the ODB) to support binary diffs including
the workdir, not just things coming out of the ODB.
The read and write callbacks passed to SSLSetIOFuncs() have been
rewritten to match the implementation used on opensource.apple.com and
other open source projects like VLC.
This change also fixes a bug where the read callback could get into
an infinite loop when 0 bytes were read.
Some tools create multiple author fields. git is rather lax when parsing
them, although fsck does complain about them. This means that they exist
in the wild.
As it's not too taxing to check for them, and there shouldn't be a
noticeable slowdown when dealing with correct commits, add logic to skip
over these extra fields when parsing the commit.
When we hit an error writing to the next stream from a file, we jump to
'done' which currently skips over closing the file descriptor.
Make sure to close the descriptor if it has been set to a valid value.
We take in a possibly partial ID by taking a length and working off of
that to figure out whether to just look up the object or ask the
backends for a prefix lookup.
Unfortunately we've been checking the size against `GIT_OID_HEXSZ` which
is the size of a *string* containing a full ID, whereas we need to check
against the size we can have when it's a 20-byte array.
Change the checks and comment to use `GIT_OID_RAWSZ` which is the
correct size of a git_oid to have when full.
The way we currently do it depends on the subtlety of strlen vs sizeof
and the fact that .pack is one longer than .idx. Let's use a git_buf so
we can express the manipulation we want much more clearly.
`merge_diff_list_count_candidates()` takes pointers to the source and
target counts, but when it comes time to increase them, we're increasing
the pointer, rather than the value it's pointing to.
Dereference the value to increase.
Coverity complains about the git_rawobj ones because we use a loop in
which we keep remembering the old version, and we end up copying our
object as the base, so we want to have the data pointer be NULL.
Let `ssh_stream_free()` take a NULL stream, as free functions should,
and remove the check from the connection setup.
The connection setup would not need the check anyhow, as we always have
a stream by the time we reach this code.
When the callback returns an error, we should stop immediately. This
broke when trying to make sure we pass specific errors up the chain.
This broke cancelling out of the loose backend's foreach.
We've been using `p_ftruncate()` to extend the packfile in order to mmap
it and write the new data into it. This works well in the general case,
but as truncation does not allocate space in the filesystem, it must do
so when we write data to it.
The only way the OS has to indicate a failure to allocate space is via
SIGBUS which means we tried to write outside the file. This will cause
everyone to crash as they don't expect to handle this signal.
Switch to using `p_lseek()` and `p_write()` to extend the file in a way
which tells the filesystem to allocate the space for the missing
data. We can then be sure that we have space to write into.
We use heuristics to make a decent guess at when we can save time and
space by linking object files during a clone. Unfortunately checking the
device id isn't enough, as those would be the same during e.g. a bind-mount,
but the OS still does not allow us to link between mounts of the same
filesystem.
If we fail to perform the links, fall back to copying the contents into
a new file as a last attempt.
A remote's URLs are now modified according to the url.*.insteadOf
and url.*.pushInsteadOf configurations. This allows a user to
replace URL prefixes by setting the corresponding keys. E.g.
"url.foo.insteadOf = bar" would replace the prefix "bar" with the
new prefix "foo".
Some brain damaged tolower() implementations appear to want to
take the locale into account, and this may require taking some
insanely aggressive lock on the locale and slowing down what should
be the most trivial of trivial calls for people who just want to
downcase ASCII.
Treat input bytes as unsigned before doing arithmetic on them,
lest we look at some non-ASCII byte (like a UTF-8 character) as a
negative value and perform the comparison incorrectly.