When using literal pathspecs in diff with `GIT_DIFF_DISABLE_PATHSPEC_MATCH`
turn on the faster iterator pathlist handling.
Updates iterator pathspecs to include directory prefixes (eg, `foo/`)
for compatibility with `GIT_DIFF_DISABLE_PATHSPEC_MATCH`.
When given a pathlist, don't assume that directories sort before
files. Walk through any list of entries sorting before us to make
sure that we've exhausted all entries that *aren't* directories.
Eg, if we're searching for 'foo/bar', and we have a 'foo.c', keep
advancing the pathlist to keep looking for an entry prefixed with
'foo/'.
libgit2 implementations of smart subtransports can simply reach through
the structure, but external implementors cannot.
Add these two functions as a way for the smart subtransports to get the
callbacks as set by the user.
When parsing user-provided regex patterns for functions, we must not
fail to provide a diff just because a pattern is not well
formed. Ignore it instead.
When we ask for credentials, the user may choose to return EUSER to
indicate that an error has happened on its end and it wants to be given
back control.
We must therefore pass that back to the user instead of mentioning that
it was on_headers_complete() that returned an error code. Since we can,
we return the exact error code from the user (other than PASSTHROUGH)
since it doesn't cost anything, though using other error codes aren't
recommended.
This lock/unlock pair allows for the cller to lock a configuration file
to avoid concurrent operations.
It also allows for a transactional approach to updating a configuration
file. If multiple updates must be made atomically, they can be done
while the config is locked.
When a configuration file is locked, any updates made to it will be done
to the in-memory copy of the file. This allows for multiple updates to
happen while we hold the lock, preventing races during complex
config-file manipulation.
Instead of writing into the filebuf directly, make the functions to
write the modified config file write into a buffer which can then be
dumped into the lockfile for committing.
This allows us to re-use the same code for modifying a locked
configuration, as we can simply skip the last step of dumping the data
to disk.
When we're looking to update a tag, we can't stop if the tag auto-follow
rules don't say to update it. The tag might still match the refspec we
were given.
When curl uses a proxy, it will only use Basic unless we prompt it to
try to use the most secure on it has available.
This is something which git did recently, and it seems like a good idea.
With Visual Studio versions 2008 and older they ignore the full path to files and only check
the basename of the file to find a collision. Additionally, having duplicate basenames can break
other build tools like GYP.
This fixes https://github.com/libgit2/libgit2/issues/3356
Allow restoring a previously captured oom error, by
detecting when the captured message pointer points to the
static oom error message. This means there is no need
to strdup the message in giterr_detach.
Error messages that are detached are assumed to be dynamically
allocated. Passing a pointer to the static oom error message
can cause an attempt to free the static buffer later. This change
checks if the oom error message is about to be detached and detaches
a copy instead.
Instead of allocating a brand new buffer for each error string we want
to store, we can use a per-thread buffer to store the error string and
re-use the underlying storage. We already use the buffer to format the
string, so this mostly makes that more direct.
An error here will typically mean that the directory was removed between
the time we iterated the parent and the time we wanted to visit it in
which case we should ignore it.
Other kinds of errors such as permissions (or transient errors) also
better dealt with by pretending we didn't see it.
When we have an error renaming the lockfile, we need to make sure
that we remove it upon cleanup. For this, we need to keep track of
whether we opened the file and whether the rename succeeded.
If we did create the lockfile but the rename did not succeed, we
remove the lockfile. This won't protect against all errors, but
the most common ones (target file is open) does get handled.
The header src/cc-compat.h defines portable format specifiers PRIuZ, PRIdZ, and PRIxZ. The original report highlighted the need to use these specifiers in examples/network/fetch.c. For this commit, I checked all C source and header files not in deps/ and transitioned to the appropriate format specifier where appropriate.
Removing a reflog upon ref deletion is something which only some
backends might wish to do. Backends which are database-backed may wish
to archive a reflog, log-based ones may not need to do anything.
If we get the path from the gitmodules file, look up the submodule we're
interested in by path, rather then by name. Otherwise we might get
duplicate results.
Upgrade xdiff to version used in core git 2.4.5 (0df0541).
Corrects an issue where an LF is added at EOF while applying
an unrelated change (ba31180), cleans up some unused code (be89977 and
e5b0662), and provides an improved callback to avoid leaking internal
(to xdiff) structures (467d348).
This also adds some additional functionality that we do not yet take
advantage of, namely the ability to ignore changes whose lines are
all blank (36617af).
Versions prior to 0.9.8f did not have this function, rhel/centos5 are still on a
heavily backported version of 0.9.8e and theoretically supported until March 2017
Without this ifdef, I get the following link failure:
```
CMakeFiles/libgit2_clar.dir/src/openssl_stream.c.o: In function `openssl_connect':
openssl_stream.c:(.text+0x45a): undefined reference to `SSL_set_tlsext_host_name'
collect2: error: ld returned 1 exit status
make[6]: *** [libgit2_clar] Error 1
```
Introduce `git__getenv` which is a UTF-8 aware `getenv` everywhere.
Make `cl_getenv` use this to keep consistent memory handling around
return values (free everywhere, as opposed to only some platforms).
The regex we use to look at the gitmodules file does not correctly
delimit the name of submodule which we want to look up and puts '.*'
straight after the name, maching on any submodule which has the seeked
submodule as a prefix of its name.
Add the missing '\.' in the regex so we want a full stop to exist both
before and after the submodule name.
Allow custom filters with wildcard attributes, so that clients
can support some random `filter=foo` in a .gitattributes and look
up the corresponding smudge/clean commands in the configuration file.
Function was added in commit 2c982daa2e on October 5, 2011,
and removed in commit 41fb1ca0ec on October 29, 2012.
Given the length of time it's gone unused, it's safe to remove now.
This reverts commit 969d4b703c.
This was a fluke from Coverity. The length to all the APIs in the
library is supposed to be passed in as nibbles, not bytes. Passing it as
bytes would prevent us from parsing uneven-sized SHA1 strings.
Also, the rest of the library was still using nibbles (including
revparse and the odb_prefix APIs), so this change was seriously breaking
things in unexpected ways. ^^
When diffing the index with the workdir and GIT_DIFF_UPDATE_INDEX has been passed,
the previous implementation was always writing the index to disk even if it wasn't
modified.
When a refspec contains no rhs and thus won't cause an explicit update,
we skip all the logic, but that means that we don't update FETCH_HEAD
with it, which is what the implicit rhs is.
Add another bit of logic which puts those remote heads in the list of
updates so we put them into FETCH_HEAD.
When we don't own a buffer (asize=0) we currently allow the usage of
grow to copy the memory into a buffer we do own. This muddles the
meaning of grow, and lets us be a bit cavalier with ownership semantics.
Don't allow this any more. Usage of grow should be restricted to buffers
which we know own their own memory. If unsure, we must not attempt to
modify it.
Always set `GIT_DIFF_PATCH_DIFFABLE` for all files, regardless of
binary-ness, so that the binary callback is invoked to either
show the binary contents, or just print the standard "Binary files
differ" message. We may need to do deeper inspection for binary
files where we have avoided loading the contents into a file map.
If the libcurl stream is available, use that as the underlying stream
instead of the socket stream. This allows us to set a proxy for HTTPS
connections.
The TLS streams talk over the curl stream themselves, so we don't need
to ask for it explicitly. Do so in the case of the non-encrypted one so
we can still make use proxies in that case.
When linking against libcurl, use it as the underlying transport instead
of straight sockets. We can't quite just give over the file descriptor,
as curl puts it into non-blocking mode, so we build a custom BIO so
OpenSSL sends the data through our stream, be it the socket or curl
streams.
If the stream claims to support this feature, we can let the transport
set the proxy.
We also set HTTPPROXYTUNNEL option so curl can create a tunnel through
the proxy which lets us create our own TLS session (if needed).
cURL has a mode in which it acts a lot like our streams, providing send
and recv functions and taking care of the TLS and proxy setup for us.
Implement a new stream which uses libcurl instead of raw sockets or the
TLS libraries directly. This version does not support reporting
certificates or proxies yet.
When stashing the workdir tree, examine the index as well. Using
a mechanism similar to `git_diff_tree_to_workdir_with_index`
allows us to determine that a file was added in the index and
subsequently modified in the working directory. Without examining
the index, we would erroneously believe that this file was
untracked and fail to include it in the working directory tree.
Use a slightly modified `git_diff_tree_to_workdir_with_index` in
order to avoid some of the behavior custom to `git diff`. In
particular, be sure to include the working directory side of a
file when it was deleted in the index.
This is something we do on re-init but not when opening a
repository. This hasn't particularly mattered up to now as the version
has been 0 ever since the first release of git, but the times, they're
a-changing and we will soon see version 1 in the wild. We need to make
sure we don't open those.
If an index entry for a file that is not in HEAD is in conflicted state,
when diffing HEAD with the index, the status field of the corresponding git_diff_delta was incorrectly reported as GIT_DELTA_ADDED instead of GIT_DELTA_CONFLICTED.
This was due to handle_unmatched_new_item() initially setting the status
to GIT_DELTA_CONFLICTED but then overriding it later with GIT_DELTA_ADDED.
We currently do not handle those enum values which require us to set
"true" or unset variables in all cases. Use a common function which does
understand this by looking at our mapping directly.
Similarly to the other ones. In this test we copy over testing
`RECURSE_YES` which shows an error in our handling of the `YES` variant
which we may have to port to the rest.
During the cache deletion, the check for whether we consider a submodule
to exist got changed regarding submodules which are in the worktree but
not configured.
Instead of checking for the url field to be populated, check the
location where we've found it.
This lets us specify in the status call which ignore rules we want to
use (optionally falling back to whatever the submodule has in its
configuration).
This removes one of the reasons for having `_set_ignore()` set the value
in-memory. We re-use the `IGNORE_RESET` value for this as it is no
longer relevant but has a similar purpose to `IGNORE_FALLBACK`.
Similarly, we remove `IGNORE_DEFAULT` which does not have use outside of
initializers and move that to fall back to the configuration as well.
As submodules are becomes more like values, we should not let a status
check to update its properties. Instead of taking a submodule, have
status take a repo and submodule name.
Having this cache and giving them out goes against our multithreading
guarantees and it makes it impossible to use submodules in a
multi-threaded environment, as any thread can ask for a refresh which
may reallocate some string in the submodule struct which we've accessed
in a different one via a getter.
This makes the submodules behave more like remotes, where each object is
created upon request and not shared except explicitly by the user. This
means that some tests won't pass yet, as they assume they can affect the
submodule objects in the cache and that will affect later operations.
This allows the user to look up fields which we don't parse in libgit2,
and allows them to access gpgsig or mergetag fields if they wish to
check the signature.
When an entry has a racy timestamp, we need to check whether the file
itself has changed since we put its entry in the index. Only then do we
smudge the size field to force a check the next time around.
When a file on the workdir has the same or a newer timestamp than the
index, we need to perform a full check of the contents, as the update of
the file may have happened just after we wrote the index.
The iterator changes are such that we can reach inside the workdir
iterator from the diff, though it may be better to have an accessor
instead of moving these structs into the header.
When updating the index during a diff, preserve the original mode,
which prevents us from dropping the mode to what we have interpreted
as on our system (eg, what the working directory claims it to be,
which may be a lie on some systems.)
This is used by the submodule in order to figure out if the index has
changed since it last read it. Using a timestamp is racy, so let's make
it use the checksum, just like we now do for reloading the index itself.
We currently use a timetamp to check whether an index file has been
modified since we last read it, but this is racy. If two updates happen
in the same second and we read after the first one, we won't detect the
second one.
Instead read the SHA-1 checksum of the file, which are its last 20 bytes which
gives us a sure-fire way to detect whether the file has changed since we
last read it.
As we're now keeping track of it, expose an accessor to this data.
When checking out some file 'foo' that has been modified in the
working directory, allow the checkout to proceed (do not conflict)
if 'foo' is identical to the target of the checkout.
In order to avoid racy-git, we zero out the file size for entries with
the same timestamp as the index (or during the initial checkout). This
is the case in a couple of crlf tests, as the code is fast enough to do
everything in the same second.
As we know that we do not perform the modification just after writing
out the index, which is what this is designed to work around, tick the
mtime of the index file such that it doesn't agree with the files
anymore, and we do not zero out these entries.
If a file entry has the same timestamp as the index itself, it is
considered racily-clean, as it may have been modified after the index
was written, but during the same second. We take extra steps to check
the contents, but this is just one part of avoiding races.
For files which do have changes but have not been updated in the index,
updating the on-disk index means updating its timestamp, which means we
would no longer recognise these entries as racy and we would trust the
timestamp to tell us whether they have changed.
In order to work around this, git zeroes out the file-size field in
entries with the same timestamp as the index in order to force the next
diff to check the contents. Do so in libgit2 as well.
Arguably all uses of readdir_r are unnecessary, but in this case
especially so, as the directory handle only exists within this function,
so we don't race with anybody.
Introduce a new binary diff callback to provide the actual binary
delta contents to callers. Create this data from the diff contents
(instead of directly from the ODB) to support binary diffs including
the workdir, not just things coming out of the ODB.
The read and write callbacks passed to SSLSetIOFuncs() have been
rewritten to match the implementation used on opensource.apple.com and
other open source projects like VLC.
This change also fixes a bug where the read callback could get into
an infinite loop when 0 bytes were read.
Some tools create multiple author fields. git is rather lax when parsing
them, although fsck does complain about them. This means that they exist
in the wild.
As it's not too taxing to check for them, and there shouldn't be a
noticeable slowdown when dealing with correct commits, add logic to skip
over these extra fields when parsing the commit.
When we hit an error writing to the next stream from a file, we jump to
'done' which currently skips over closing the file descriptor.
Make sure to close the descriptor if it has been set to a valid value.
We take in a possibly partial ID by taking a length and working off of
that to figure out whether to just look up the object or ask the
backends for a prefix lookup.
Unfortunately we've been checking the size against `GIT_OID_HEXSZ` which
is the size of a *string* containing a full ID, whereas we need to check
against the size we can have when it's a 20-byte array.
Change the checks and comment to use `GIT_OID_RAWSZ` which is the
correct size of a git_oid to have when full.