This is used by the submodule in order to figure out if the index has
changed since it last read it. Using a timestamp is racy, so let's make
it use the checksum, just like we now do for reloading the index itself.
We currently use a timestamp to check whether an index file has been
modified since we last read it, but this is racy. If two updates happen
in the same second and we read after the first one, we won't detect the
second one.
Instead, read the SHA-1 checksum of the file, which is stored in its
last 20 bytes; this gives us a sure-fire way to detect whether the file
has changed since we last read it.
As we're now keeping track of it, expose an accessor to this data.
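A minimal sketch of the comparison, with hypothetical names
(`index_has_changed`, `last_checksum`); the real code keeps the checksum
it read alongside the index data:

    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    #define CHECKSUM_SIZE 20 /* SHA-1 */

    /* Returns 1 if the on-disk index no longer matches the checksum we
     * remembered, 0 if unchanged, -1 on error. */
    static int index_has_changed(const char *index_path,
                                 const unsigned char last_checksum[CHECKSUM_SIZE])
    {
        unsigned char current[CHECKSUM_SIZE];
        int fd;

        if ((fd = open(index_path, O_RDONLY)) < 0)
            return -1;

        /* The checksum lives in the last 20 bytes of the file. */
        if (lseek(fd, -(off_t)CHECKSUM_SIZE, SEEK_END) < 0 ||
            read(fd, current, CHECKSUM_SIZE) != CHECKSUM_SIZE) {
            close(fd);
            return -1;
        }

        close(fd);
        return memcmp(current, last_checksum, CHECKSUM_SIZE) != 0;
    }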
When checking out some file 'foo' that has been modified in the
working directory, allow the checkout to proceed (do not conflict)
if 'foo' is identical to the target of the checkout.
In order to avoid racy-git, we zero out the file size for entries with
the same timestamp as the index (or during the initial checkout). This
is the case in a couple of crlf tests, as the code is fast enough to do
everything in the same second.
As we know that we do not perform the modification just after writing
out the index (which is what this mechanism is designed to work around),
tick the mtime of the index file so that it no longer agrees with the
files' timestamps, and we do not zero out these entries.
If a file entry has the same timestamp as the index itself, it is
considered racily-clean, as it may have been modified after the index
was written, but during the same second. We take extra steps to check
the contents, but this is just one part of avoiding races.
For files which do have changes but have not been updated in the index,
updating the on-disk index means updating its timestamp, which means we
would no longer recognise these entries as racy and we would trust the
timestamp to tell us whether they have changed.
In order to work around this, git zeroes out the file-size field in
entries with the same timestamp as the index in order to force the next
diff to check the contents. Do so in libgit2 as well.
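Schematically (the types and the `index_mtime` parameter here are
illustrative, not libgit2's actual ones):

    #include <stddef.h>
    #include <stdint.h>

    struct entry {
        int64_t mtime;      /* entry's cached modification time */
        uint32_t file_size;
    };

    static void smudge_racily_clean(struct entry *entries, size_t count,
                                    int64_t index_mtime)
    {
        size_t i;

        for (i = 0; i < count; i++) {
            /* Same second as the index write? The timestamp alone
             * cannot be trusted; zero the size to force a content
             * check on the next diff. */
            if (entries[i].mtime >= index_mtime)
                entries[i].file_size = 0;
        }
    }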
Arguably all uses of readdir_r are unnecessary, but in this case
especially so, as the directory handle only exists within this function,
so we don't race with anybody.
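For reference, the plain readdir() shape this reduces to (a
self-contained sketch, not the function in question):

    #include <dirent.h>

    static int count_entries(const char *path)
    {
        DIR *dir = opendir(path);
        struct dirent *de;
        int count = 0;

        if (!dir)
            return -1;

        /* The handle never escapes this function, so nobody can race
         * with us on it and readdir() is perfectly safe. */
        while ((de = readdir(dir)) != NULL)
            count++;

        closedir(dir);
        return count;
    }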
Introduce a new binary diff callback to provide the actual binary
delta contents to callers. Create this data from the diff contents
(instead of directly from the ODB) to support binary diffs including
the workdir, not just things coming out of the ODB.
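A hedged usage sketch via `git_diff_foreach()`; the callback shape and
the `git_diff_binary` fields follow the public API this introduces, but
check the shipped headers for the exact details:

    #include <git2.h>
    #include <stdio.h>

    static int print_binary(const git_diff_delta *delta,
                            const git_diff_binary *binary,
                            void *payload)
    {
        (void)delta; (void)payload;
        printf("binary delta, new side inflates to %zu bytes\n",
               binary->new_file.inflatedlen);
        return 0;
    }

    /* elsewhere:
     *   git_diff_foreach(diff, NULL, print_binary, NULL, NULL, NULL);
     */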
The read and write callbacks passed to SSLSetIOFuncs() have been
rewritten to match the implementation used on opensource.apple.com and
other open source projects like VLC.
This change also fixes a bug where the read callback could get into
an infinite loop when 0 bytes were read.
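The crux of that bug fix, sketched as a SecureTransport read callback
(the socket extraction from the connection ref is simplified here):

    #include <Security/SecureTransport.h>
    #include <errno.h>
    #include <unistd.h>

    static OSStatus read_cb(SSLConnectionRef connection, void *data, size_t *len)
    {
        int fd = *(const int *)connection;
        size_t requested = *len, received = 0;

        while (received < requested) {
            ssize_t n = read(fd, (char *)data + received, requested - received);

            if (n > 0) {
                received += n;
            } else if (n == 0) {
                /* Peer closed the connection. Returning here is what
                 * breaks the old infinite loop on 0-byte reads. */
                *len = received;
                return errSSLClosedGraceful;
            } else {
                *len = received;
                return errno == EAGAIN ? errSSLWouldBlock : errSSLInternal;
            }
        }

        *len = received;
        return noErr;
    }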
Some tools create multiple author fields. git is rather lax when parsing
them, although fsck does complain about them. This means that they exist
in the wild.
As it's not too taxing to check for them, and there shouldn't be a
noticeable slowdown when dealing with correct commits, add logic to skip
over these extra fields when parsing the commit.
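In outline, the skipping looks like this (the helper name is made up;
the real parser works on the raw commit buffer):

    #include <string.h>

    /* Advance past any additional "author " lines following the first. */
    static const char *skip_extra_author_fields(const char *buf)
    {
        static const char prefix[] = "author ";

        while (strncmp(buf, prefix, sizeof(prefix) - 1) == 0) {
            const char *eol = strchr(buf, '\n');

            if (!eol)
                break;
            buf = eol + 1;
        }

        return buf;
    }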
When we hit an error writing to the next stream from a file, we jump to
'done' which currently skips over closing the file descriptor.
Make sure to close the descriptor if it has been set to a valid value.
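The shape of the fix, with `write_to_stream` standing in for the real
streaming call:

    #include <fcntl.h>
    #include <unistd.h>

    extern int write_to_stream(int fd); /* hypothetical stand-in */

    static int stream_file(const char *path)
    {
        int error = 0, fd = -1;

        if ((fd = open(path, O_RDONLY)) < 0) {
            error = -1;
            goto done;
        }

        if (write_to_stream(fd) < 0)
            error = -1;

    done:
        /* Previously skipped when we jumped straight here on error. */
        if (fd >= 0)
            close(fd);

        return error;
    }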
We take in a possibly partial ID by taking a length and working off of
that to figure out whether to just look up the object or ask the
backends for a prefix lookup.
Unfortunately we've been checking the size against `GIT_OID_HEXSZ` which
is the size of a *string* containing a full ID, whereas we need to check
against the size we can have when it's a 20-byte array.
Change the checks and comment to use `GIT_OID_RAWSZ` which is the
correct size of a git_oid to have when full.
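In other words (illustrative only):

    #include <git2/oid.h>
    #include <stddef.h>

    /* `len` counts bytes of a raw git_oid, so a full id is
     * GIT_OID_RAWSZ (20) bytes, not GIT_OID_HEXSZ (40) characters. */
    static int is_full_oid(size_t len)
    {
        return len >= GIT_OID_RAWSZ;
    }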
The way we currently do it depends on the subtlety of strlen vs sizeof
and the fact that .pack is one longer than .idx. Let's use a git_buf so
we can express the manipulation we want much more clearly.
`merge_diff_list_count_candidates()` takes pointers to the source and
target counts, but when it comes time to increase them, we're increasing
the pointer, rather than the value it's pointing to.
Dereference the value to increase.
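The bug in miniature:

    #include <stddef.h>

    static void count_candidate(size_t *src_count)
    {
        /* src_count++;      wrong: advances the pointer itself   */
        (*src_count)++;   /* right: increments the caller's count */
    }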
Coverity complains about the git_rawobj ones because we use a loop in
which we keep remembering the old version, and we end up copying our
object as the base, so we want to have the data pointer be NULL.
Let `ssh_stream_free()` take a NULL stream, as free functions should,
and remove the check from the connection setup.
The connection setup would not need the check anyhow, as we always have
a stream by the time we reach this code.
When the callback returns an error, we should stop immediately. This
behaviour broke while trying to make sure we pass specific errors up the
chain, which in turn broke cancelling out of the loose backend's
foreach.
We've been using `p_ftruncate()` to extend the packfile in order to mmap
it and write the new data into it. This works well in the general case,
but as truncation does not allocate space in the filesystem, it must do
so when we write data to it.
The only way the OS has to indicate a failure to allocate space is via
SIGBUS which means we tried to write outside the file. This will cause
everyone to crash as they don't expect to handle this signal.
Switch to using `p_lseek()` and `p_write()` to extend the file in a way
which tells the filesystem to allocate the space for the missing
data. We can then be sure that we have space to write into.
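A sketch of the approach (error handling trimmed):

    #include <unistd.h>

    static int extend_file(int fd, off_t new_size)
    {
        /* Seek to one byte short of the target size and write a byte,
         * so the filesystem allocates the space for the missing data
         * instead of leaving a hole that can later fault with SIGBUS
         * under mmap. */
        if (lseek(fd, new_size - 1, SEEK_SET) < 0)
            return -1;

        if (write(fd, "", 1) != 1)
            return -1;

        return 0;
    }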
We use heuristics to make a decent guess at when we can save time and
space by linking object files during a clone. Unfortunately checking the
device id isn't enough, as those would be the same during e.g. a bind-mount,
but the OS still does not allow us to link between mounts of the same
filesystem.
If we fail to perform the links, fall back to copying the contents into
a new file as a last attempt.
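Roughly:

    #include <unistd.h>

    extern int copy_file(const char *src, const char *dst); /* hypothetical */

    static int link_or_copy(const char *src, const char *dst)
    {
        if (link(src, dst) == 0)
            return 0;

        /* A matching device id is no guarantee link() will succeed
         * (e.g. across a bind-mount), so fall back to copying. */
        return copy_file(src, dst);
    }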
A remote's URLs are now modified according to the url.*.insteadOf
and url.*.pushInsteadOf configurations. This allows a user to
replace URL prefixes by setting the corresponding keys. E.g.
"url.foo.insteadOf = bar" would replace the prefix "bar" with the
new prefix "foo".
Some brain damaged tolower() implementations appear to want to
take the locale into account, and this may require taking some
insanely aggressive lock on the locale and slowing down what should
be the most trivial of trivial calls for people who just want to
downcase ASCII.
Treat input bytes as unsigned before doing arithmetic on them,
lest we look at some non-ASCII byte (like a UTF-8 character) as a
negative value and perform the comparison incorrectly.
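The whole helper can be this small (libgit2's real one is
`git__tolower`; this sketch shows the same idea):

    static int ascii_tolower(int c)
    {
        /* Treat the byte as unsigned first, so a UTF-8 lead byte does
         * not turn negative and slip past the range check. */
        unsigned char uc = (unsigned char)c;

        return (uc >= 'A' && uc <= 'Z') ? uc + ('a' - 'A') : uc;
    }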
We do not error on "merge conflicts"; on the contrary, merge conflicts
are a normal part of merging. We only error on "checkout conflicts",
where a change exists in the index or the working directory that would
otherwise be overwritten by performing the checkout.
This *may* happen during merge (after the production of the new index
that we're going to checkout) but it could happen during any checkout.
If there exists a conflict in the index, but no file in the working
directory, this implies that the user wants to accept the resolution
by removing the file. Thus, remove the conflict entry from the
index, instead of trying to add a (nonexistent) file.
Mark the `old_file` and `new_file` sides of a delta with a new bit,
`GIT_DIFF_FLAG_EXISTS`, that indicates that a particular side of
the delta exists in the diff.
This is useful for indicating whether a working directory item exists
or not, in the presence of a conflict. Diff users may have previously
used DELETED to determine this information.
The mapping between stage level and conflict-ness is not always
obvious. More importantly, this can lead otherwise sane people to write
constructs like `if (!git_index_entry_stage(entry))`, which (while
technically correct) is unreadable.
Provide a nice method to help avoid such messy thinking.
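A usage sketch; in libgit2 the helper is `git_index_entry_is_conflict()`:

    #include <git2.h>

    static void handle(const git_index_entry *entry)
    {
        /* instead of: if (!git_index_entry_stage(entry)) ... */
        if (git_index_entry_is_conflict(entry)) {
            /* deal with the conflicted entry */
        }
    }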
Since a diff entry only concerns a single entry, zero the information
for the index side of a conflict. (The index entry would otherwise
erroneously include the lowest-stage index entry - generally the
ancestor of a conflict.)
Test that during status, the index side of the conflict is empty.
When diffing against an index, return a new `GIT_DELTA_CONFLICTED`
delta type for items that are conflicted. For a single file path,
only one delta will be produced (despite the fact that there are
multiple entries in the index).
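Consumers can then spot conflicts directly from the delta status; a
hedged sketch using the public file-callback signature:

    #include <git2.h>
    #include <stdio.h>

    static int file_cb(const git_diff_delta *delta, float progress, void *payload)
    {
        (void)progress; (void)payload;

        /* One delta per conflicted path, despite the multiple
         * (stage > 0) entries backing it in the index. */
        if (delta->status == GIT_DELTA_CONFLICTED)
            printf("conflicted: %s\n", delta->old_file.path);

        return 0;
    }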
Index iterators now have the (optional) ability to return conflicts
in the index. Prior to this change, they would be omitted, and callers
(like diff) would omit conflicted index entries entirely.
Wrap the iterator current / advance functions so that we can extend
them, but also handle GIT_ITEROVER cases in the iterator funcs
instead of the callers.
When we moved from acting on the instance to acting on the
configuration, we dropped the validation of the passed refspec, which
can lead to writing an invalid refspec to the configuration. Bring that
validation back.
The public key field is optional and as such can take NULL. Account for
that and do not call strlen() on NULL values. Also assert() for non-NULL
values of username & private key.
Declare GIT_CREDTYPE_SSH_MEMORY to have consistent API independently of
whether libgit2 was built with or without in-memory key passing support.
Or rather, to have it at all since build-time definitions are not stored
in headers.
When we look for which remote corresponds to a remote-tracking branch,
we look in the refspecs to see which one matches. If none do, we should
abort. We currently ignore the error message from this operation, so
let's not do that anymore.
As part of the test we're writing, let's test for the expected behaviour
if we cannot find a refspec which tells us what the remote-tracking
branch for a remote would look like.
When we find out that we're dealing with a matching refspec, we set the
flag and return immediately. This leaves the strings as NULL, which
breaks the contract.
Assign these pointers to a string with the correct values.
When we discover that we want to keep a negative rule, make sure to
clear the error variable, as we would otherwise return whatever was left
by the previous loop iteration.
This can be used by tools to show messages about failing to communicate
with the server. The error message in this case will often contain the
server's error message, as far as it managed to send anything.
When we fail to read from stdout, it's typically because the URL was
wrong and the server process has sent some output over its stderr
channel.
Read that output and set the error message to whatever we read from it.
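Something along these lines, where `giterr_set()` is assumed to be the
error-reporting call of this era of libgit2 (declared in its internal
common.h):

    #include <unistd.h>

    static int report_stderr(int stderr_fd)
    {
        char buf[1024];
        ssize_t n = read(stderr_fd, buf, sizeof(buf) - 1);

        if (n <= 0)
            return -1;

        buf[n] = '\0';
        giterr_set(GITERR_NET, "%s", buf); /* assumed API name */
        return -1;
    }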
We set an error if we get an error when reading, but we don't bother
setting an error message when a write fails. This causes a cryptic error
to be shown to the user when the target filesystem is full.
These were left over from the culling, as it was not clear which
use-cases might benefit from them. It is not clear that we want to
support any use-case which depends on changing the remote's idea of the
base refspecs rather than passing in a different per-operation refspec
list, so remove these functions.
This function calls into functions doing IO, which means the number of
errors that can happen is quite large. It does not help if we always
overwrite the underlying error message with a less understandable
version of "something went wrong".
Instead, only use this generic message if there was no error set by the
callback.
Instead of going through each entry we have and re-adding it, which may
not even be correct for certain crlf options and has bad performance,
use the function which performs a diff against the worktree and try to
add and remove files from that list.
We currently iterate over all the entries and re-add them to the
index. While this provides correctness, it is wasteful as we try to
re-insert files which have not changed.
Instead, take a diff between the index and the worktree and only re-add
those which we already know have changed.
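A hedged sketch of that approach using the public API:

    #include <git2.h>

    static int update_index_from_diff(git_repository *repo, git_index *index)
    {
        git_diff *diff = NULL;
        size_t i;
        int error;

        /* Diff the index against the worktree... */
        if ((error = git_diff_index_to_workdir(&diff, repo, index, NULL)) < 0)
            return error;

        /* ...and only touch the paths that actually changed. */
        for (i = 0; i < git_diff_num_deltas(diff); i++) {
            const git_diff_delta *delta = git_diff_get_delta(diff, i);

            if (delta->status == GIT_DELTA_DELETED)
                error = git_index_remove_bypath(index, delta->old_file.path);
            else
                error = git_index_add_bypath(index, delta->new_file.path);

            if (error < 0)
                break;
        }

        git_diff_free(diff);
        return error;
    }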
This is useful to send to the client while we're performing the work.
The reporting function has a force parameter which makes sure that we
do send out the 100%-complete message, even if it comes before the next
update window.
Instead of copying each object individually, as we'd been doing, use the
packbuilder which should be faster and give us some feedback.
While performing this change, we can hook up the packbuilder's writing
to the push progress so the caller knows how far along we are.
We currently first look in the loose object dir and then in the packs
for objects. When performing operations on recent history this has a
higher likelihood of hitting, but when we deal with operations which
look further back into the past, we start spending a large amount of
time getting ENOENT from `access`.
Reversing the priorities means that long-running operations can get to
their objects faster, as we can look at the index data we have in memory
(or rather mapped) to figure out whether we have an object, which is
faster than going out to the filesystem.
The packed backend already implements an optimistic read algorithm by
first looking at the packs we know about and only going out to disk to
refresh if the object is not found. This means that in the case where
we do have the object (which will be the majority for anything that
traverses the graph) we can avoid going to disk entirely to determine
whether an object exists.
Operations which look at recent history may take a slight impact, but
those operations look at far fewer objects and thus take less time
regardless.
Changing the base refspecs can be a cause of confusion as to what the
current base refspec set is, and it complicates saving the remote's
configuration.
Change `git_remote_add_{fetch,push}()` to update the configuration
instead of an instance.
This finally makes `git_remote_save()` a no-op; it will be removed in a
later commit.
While this will rarely be different from the default, having it in the
remote adds yet another setting it has to keep around and can affect its
behaviour. Move it to the options.
Instead of having it set in a different place from every other callback,
put it in the main structure. This removes some state from the remote and
makes it behave more like clone, where the constructors are passed via
the options.
As a first step in removing the repository-saving logic, don't allow
changing the url or push url from a remote object, but change the
configuration immediately instead.
Having the setting of options be separate from the actions they affect
was not a great idea, and it catered to the wrong kind of convenience.
Instead of that, accept either fetch options, push options or the
callbacks when dealing with the remote. The fetch options are currently
only the callbacks, but more options will be moved from setters and
getters on the remote to the options.
This does mean passing the same struct along to the different
functions, but the typical use-case will only call git_remote_fetch() or
git_remote_push() and so won't notice much difference.
The push object knows which remote it's associated with, and therefore
does not need to keep its own copy of the callbacks stored in the
remote.
Remove the copy and simply access the callbacks struct within the
remote.
Restricting files to size_t is a silly limitation. The loose backend
writes to a file directly, so there is no issue in using 63 bits for the
size.
We still assume that the header is going to fit in 64 bytes, which does
mean quite a bit smaller files due to the run-length encoding, but it's
still a much larger size than you would want Git to handle.
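Formatting such a header with a 64-bit size is straightforward
(illustrative sketch, not the project's actual helper):

    #include <inttypes.h>
    #include <stdio.h>

    /* e.g. "blob 123456789012" plus a trailing NUL; returns the header
     * length including the NUL, or -1 if it would not fit. */
    static int format_loose_header(char *out, size_t outlen,
                                   const char *type, uint64_t size)
    {
        int n = snprintf(out, outlen, "%s %" PRIu64, type, size);

        if (n < 0 || (size_t)n + 1 > outlen)
            return -1;

        return n + 1;
    }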
When handling attr matching, simply compare the directory path where the
attribute file resides to the path being matched. Skip over commonality
to allow us to compare the contents of the attribute file to the remainder
of the path.
This allows us to more easily compare the pattern directly to the path,
instead of trying to guess whether we want to compare the path's basename
or the full path based on whether the match was inside a containing
directory or not.
This also allows us to do fewer translations on the pattern (trying to
re-prefix it.)
When determining whether some file matches an attr pattern, do
not try to truncate the path to pass to fnmatch. When there is
no containing directory for an item (eg, from a .gitignore in the
root) this will cause us to truncate our path, which means that
we cannot do meaningful comparisons on it and we may have false
positives when trying to determine whether a given file is actually
a file or a folder (as we have lost the path's base information.)
This mangling was to allow fnmatch to compare a directory on disk to
the name of a directory, but it is unnecessary as our fnmatch accepts
FNM_LEADING_DIR.
We use a blocking socket and set the mode to AUTO_RETRY, which means
that `SSL_write` and `SSL_read` will only return once the read or write
has been completed. We therefore don't need to handle partial writes or
retry reads due to a renegotiation.
While here, consider that a zero also indicates an error condition.
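In OpenSSL terms, the setup and the check reduce to the following
(standard calls; sketch only):

    #include <openssl/ssl.h>

    /* On a blocking socket, AUTO_RETRY makes SSL_read/SSL_write return
     * only once the operation fully completes, hiding renegotiations. */
    static void configure(SSL_CTX *ctx)
    {
        SSL_CTX_set_mode(ctx, SSL_MODE_AUTO_RETRY);
    }

    static int write_all(SSL *ssl, const char *data, int len)
    {
        int ret = SSL_write(ssl, data, len);

        /* Zero is an error condition too, not just negative values. */
        if (ret <= 0)
            return -1;

        return ret;
    }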
When writing a configuration file, we want to take a lock on the
new file (eg, `config.lock`) before opening the configuration file
(`config`) for reading so that we can prevent somebody from changing
the contents underneath us.
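The locking order, sketched with O_EXCL providing the atomicity (the
paths are examples):

    #include <fcntl.h>

    static int lock_config(const char *lock_path /* e.g. "config.lock" */)
    {
        /* O_CREAT|O_EXCL fails if the lockfile already exists, so only
         * one writer can hold the lock at a time. */
        int lock_fd = open(lock_path, O_RDWR | O_CREAT | O_EXCL, 0666);

        if (lock_fd < 0)
            return -1;

        /* ...now open and read `config`; nobody can sneak a change in
         * between our read and the eventual rename over it... */

        return lock_fd;
    }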