Don't put the configuration in a subdir of the sandbox named
`config`, lest some tests decide to create their own directory
called `config`. Prefix with some underscores for uniqueness.
Ensure that `git_index_read_index` clears the uptodate bit on
files that it modifies.
Further, do not propagate the cache from an on-disk index into
another on-disk index. Although this should not be done, as
`git_index_read_index` is used to bring an in-memory index into
another index (that may or may not be on-disk), ensure that we do
not accidentally bring in these bits when misused.
Test that entries are only smudged when we write the index: the
entry smudging is to prevent us from updating an index in a way
that it would be impossible to tell that an item was racy.
Consider when we load an index: any entries that have the same
(or newer) timestamp than the index itself are considered racy,
and are subject to further scrutiny.
If we *save* that index with the same entries that we loaded,
then the index would now have a newer timestamp than the entries,
and they would no longer be given that additional scrutiny, failing
our racy detection! So test that we smudge those entries only on
writing the new index, but that we can detect them (in diff) without
having to write.
When there's no matching index entry (for whatever reason), don't
try to dereference the null return value to get at the id.
Otherwise when we break something in the index API, the checkout
test crashes for confusing reasons and causes us to step through
it in a debugger thinking that we had broken much more than we
actually did.
Keep track of entries that we believe are up-to-date, because we
added the index entries since the index was loaded. This prevents
us from unnecessarily examining files that we wrote during the
cleanup of racy entries (when we smudge racily clean files that have
a timestamp newer than or equal to the index's timestamp when we
read it). Without keeping track of this, we would examine every
file that we just checked out for raciness, since all their timestamps
would be newer than the index's timestamp.
When creating a filebuf, detect a directory that exists in our
target file location. This prevents a failure later, when we try
to move the lock file to the destination.
Test that on platforms without `core.symlinks`, we preserve symlinks
in `git_index_add_bypath`. (Users should correct the actual index
entry's mode to change a link to a regular file.)
When `core.symlinks = false`, we write the symlinks content (target)
to a regular file. We should ensure that when we later see that
regular file, we treat it specially - and that changing that regular
file would actually change the symlink target. (For compatibility
with Git for Windows).
We currently use the timestamp in order to decide whether a config file
has changed since we last read it.
This scheme falls down if the file is written twice within the same
second, as we fail to detect the file change after the first read in
that second.
Using calloc instead of malloc because the parse error will lead to an immediate free of committer (and its properties, which can segfault on free if undefined - test_refs_reflog_reflog__reading_a_reflog_with_invalid_format_returns_error segfaulted before the fix).
#3458
Provide a new merge option, GIT_MERGE_TREE_FAIL_ON_CONFLICT, which
will stop on the first conflict and fail the merge operation with
GIT_EMERGECONFLICT.
Test that nanoseconds are round-tripped correctly when we read
an index file that contains them. We should, however, ignore them
because we don't understand them, and any new entries in the index
should contain a `0` nsecs field, while existing preserving entries.
For most real use cases, repositories with alternates use them as main
object storage. Checking the alternate for objects before the main
repository should result in measurable speedups.
Because of this, we're changing the sorting algorithm to prioritize
alternates *in cases where two backends have the same priority*. This
means that the pack backend for the alternate will be checked before the
pack backend for the main repository *but* both of them will be checked
before any loose backends.
We moved the "main" parsing to use 64 bits for the timestamp, but the
quick parsing for the revwalk did not. This means that for large
timestamps we fail to parse the time and thus the walk.
Move this parser to use 64 bits as well.
xdiff craps the bed on large files. Treat very large files as binary,
so that it doesn't even have to try.
Refactor our merge binary handling to better match git.git, which
looks for a NUL in the first 8000 bytes.
As refdb and odb backends can be allocated by client code, libgit2
can’t know whether an alternative memory allocator was used, and thus
should not try to call `git__free` on those objects.
Instead, odb and refdb backend implementations must always provide
their own `free` functions to ensure memory gets freed correctly.
When we do not trust the on-disk mode, we use the mode of an existing
index entry. This allows us to preserve executable bits on platforms
that do not honor them on the filesystem.
If there is no stage 0 index entry, also look at conflicts to attempt
to answer this question: prefer the data from the 'ours' side, then
the 'theirs' side before falling back to the common ancestor.
git expects an empty line after the binary data:
literal X
...binary data...
<empty_line>
The last literal block of the generated patches were not containing the required empty line. Example:
diff --git a/binary_file b/binary_file
index 3f1b3f9098131cfecea4a50ff8afab349ea66d22..86e5c1008b5ce635d3e3fffa4434c5eccd8f00b6 100644
GIT binary patch
literal 8
Pc${NM&PdElPvrst3ey5{
literal 6
Nc${NM%g@i}0ssZ|0lokL
diff --git a/binary_file2 b/binary_file2
index 31be99be19470da4af5b28b21e27896a2f2f9ee2..86e5c1008b5ce635d3e3fffa4434c5eccd8f00b6 100644
GIT binary patch
literal 8
Pc${NM&PdElPvrst3ey5{
literal 13
Sc${NMEKbZyOexL+Qd|HZV+4u-
git apply of that diff results in:
error: corrupt binary patch at line 9: diff --git a/binary_file2 b/binary_file2
fatal: patch with only garbage at line 10
The proper formating is:
diff --git a/binary_file b/binary_file
index 3f1b3f9098131cfecea4a50ff8afab349ea66d22..86e5c1008b5ce635d3e3fffa4434c5eccd8f00b6 100644
GIT binary patch
literal 8
Pc${NM&PdElPvrst3ey5{
literal 6
Nc${NM%g@i}0ssZ|0lokL
diff --git a/binary_file2 b/binary_file2
index 31be99be19470da4af5b28b21e27896a2f2f9ee2..86e5c1008b5ce635d3e3fffa4434c5eccd8f00b6 100644
GIT binary patch
literal 8
Pc${NM&PdElPvrst3ey5{
literal 13
Sc${NMEKbZyOexL+Qd|HZV+4u-
This allows us to remove OS checks from source code, instead relying
on CMake to detect whether or not `struct stat` has the nanoseconds
members we rely on.
Test an initial submodule update, where we are trying to checkout
the submodule for the first time, and placing a file within the
submodule working directory with the same name as the submodule
(and consequently, the same name as the repository itself).
`git_futils_mkdir` does not blindly call `git_futils_mkdir_relative`.
`git_futils_mkdir_relative` is used when you have some base directory
and want to create some path inside of it, potentially removing blocking
symlinks and files in the process. This is not suitable for a general
recursive mkdir within the filesystem.
Instead, when `mkdir` is being recursive, locate the first existent
parent directory and use that as the base for `mkdir_relative`.
Untangle git_futils_mkdir from git_futils_mkdir_ext - the latter
assumes that we own everything beneath the base, as if it were
being called with a base of the repository or working directory,
and is tailored towards checkout and ensuring that there is no
bogosity beneath the base that must be cleaned up.
This is (at best) slow and (at worst) unsafe in the larger context
of a filesystem where we do not own things and cannot do things like
unlink symlinks that are in our way.
When a file exists on disk and we're checking out a file that differs
in executableness, remove the old file. This allows us to recreate the
new file with p_open, which will take the new mode into account and
handle setting the umask properly.
Remove any notion of chmod'ing existing files, since it is now handled
by the aforementioned removal and was incorrect, as it did not take
umask into account.
Ensure that we can iterate the filesystem root and that paths come
back well-formed, not with an additional '/'. (eg, when iterating
`c:/`, expect that we do not get some path like `c://autoexec.bat`).
The previous commit left the comment referencing the earlier state of
the code, change it to explain the current logic. While here, change the
logic to avoid repeating the copy of the base pattern.
These are small pieces of data, so there is no advantage to allocating
them separately. Include the two ids inline in the struct we use to
check that the expected and actual ids match.
On case insensitive platforms, allow `git_index_add` to provide a new
path for an existing index entry. Previously, we would maintain the
case in an index entry without the ability to change it (except by
removing an entry and re-adding it.)
Higher-level functions (like `git_index_add_bypath` and
`git_index_add_frombuffers`) continue to keep the old path for easier
usage.
On case insensitive systems, when given a user-provided path in the
higher-level index addition functions (eg `git_index_add_bypath` /
`git_index_add_frombuffer`), examine the index to try to match the
given path to an existing directory.
Various mechanisms can cause the on-disk representation of a folder
to not match the representation in HEAD or the index - for example,
a case changing rename of some file `a/file.txt` to `A/file.txt`
will update the paths in the index, but not rename the folder on
disk.
If a user subsequently adds `a/other.txt`, then this should be stored
in the index as `A/other.txt`.
We create a lockfile to update files under GIT_DIR. Sometimes these
files are actually located elsewhere and a symlink takes their place. In
that case we should lock and update the file at its final location
rather than overwrite the symlink.
Some nicer refactoring for index iteration walks.
The index iterator doesn't binary search through the pathlist space,
since it lacks directory entries, and would have to binary search
each index entry and all its parents (eg, when presented with an index
entry of `foo/bar/file.c`, you would have to look in the pathlist for
`foo/bar/file.c`, `foo/bar` and `foo`). Since the index entries and the
pathlist are both nicely sorted, we walk the index entries in lockstep
with the pathlist like we do for other iteration/diff/merge walks.
When using literal pathspecs in diff with `GIT_DIFF_DISABLE_PATHSPEC_MATCH`
turn on the faster iterator pathlist handling.
Updates iterator pathspecs to include directory prefixes (eg, `foo/`)
for compatibility with `GIT_DIFF_DISABLE_PATHSPEC_MATCH`.
Document that `GIT_DIFF_PATHSPEC_DISABLE` is not necessarily about
explicit path matching, but also includes matching of directory
names. Enforce this in a test.
When given a pathlist, don't assume that directories sort before
files. Walk through any list of entries sorting before us to make
sure that we've exhausted all entries that *aren't* directories.
Eg, if we're searching for 'foo/bar', and we have a 'foo.c', keep
advancing the pathlist to keep looking for an entry prefixed with
'foo/'.
When parsing user-provided regex patterns for functions, we must not
fail to provide a diff just because a pattern is not well
formed. Ignore it instead.
This lock/unlock pair allows for the cller to lock a configuration file
to avoid concurrent operations.
It also allows for a transactional approach to updating a configuration
file. If multiple updates must be made atomically, they can be done
while the config is locked.
When a configuration file is locked, any updates made to it will be done
to the in-memory copy of the file. This allows for multiple updates to
happen while we hold the lock, preventing races during complex
config-file manipulation.
While we download the remote's remote-tracking branches, we don't
download the tag. This points to the tag auto-follow rules interfering
with the refspec.
When we pass the path of a repository to `_bypath()`, we should behave
like git and stage it as a `_COMMIT` regardless of whether it is
registered a a submodule.
When we have an error renaming the lockfile, we need to make sure
that we remove it upon cleanup. For this, we need to keep track of
whether we opened the file and whether the rename succeeded.
If we did create the lockfile but the rename did not succeed, we
remove the lockfile. This won't protect against all errors, but
the most common ones (target file is open) does get handled.
The header src/cc-compat.h defines portable format specifiers PRIuZ, PRIdZ, and PRIxZ. The original report highlighted the need to use these specifiers in examples/network/fetch.c. For this commit, I checked all C source and header files not in deps/ and transitioned to the appropriate format specifier where appropriate.
Removing a reflog upon ref deletion is something which only some
backends might wish to do. Backends which are database-backed may wish
to archive a reflog, log-based ones may not need to do anything.
When we rename a submodule, we should be merging two sets of information
based on whether their path is the same. We currently only deduplicate
on equal name, which causes us to double-report.
Introduce `git__getenv` which is a UTF-8 aware `getenv` everywhere.
Make `cl_getenv` use this to keep consistent memory handling around
return values (free everywhere, as opposed to only some platforms).
Don't use the filter's free callback to free the actual data structure
holding the filter, as we may not always actually initialize it (the
test may be skipped).
The function was removed, but its declaration and changelog entry about
its removal were forgotten.
The comment in the test doesn't make any sense as the function doesn't
exist anymore, so get rid of it as well.
Allow custom filters with wildcard attributes, so that clients
can support some random `filter=foo` in a .gitattributes and look
up the corresponding smudge/clean commands in the configuration file.
When a refspec contains no rhs and thus won't cause an explicit update,
we skip all the logic, but that means that we don't update FETCH_HEAD
with it, which is what the implicit rhs is.
Add another bit of logic which puts those remote heads in the list of
updates so we put them into FETCH_HEAD.
When we don't own a buffer (asize=0) we currently allow the usage of
grow to copy the memory into a buffer we do own. This muddles the
meaning of grow, and lets us be a bit cavalier with ownership semantics.
Don't allow this any more. Usage of grow should be restricted to buffers
which we know own their own memory. If unsure, we must not attempt to
modify it.
We test the generation of the textual patch via the patch function,
which are just one of two possibilities to get the output.
Add a second patch generation via the diff function to make sure both
outputs are in sync.
Ensure that when a file is added in the index and subsequently
modified in the working directory, the stashed working directory
tree contains the actual working directory contents.
This is something we do on re-init but not when opening a
repository. This hasn't particularly mattered up to now as the version
has been 0 ever since the first release of git, but the times, they're
a-changing and we will soon see version 1 in the wild. We need to make
sure we don't open those.
If an index entry for a file that is not in HEAD is in conflicted state,
when diffing HEAD with the index, the status field of the corresponding git_diff_delta was incorrectly reported as GIT_DELTA_ADDED instead of GIT_DELTA_CONFLICTED.
This was due to handle_unmatched_new_item() initially setting the status
to GIT_DELTA_CONFLICTED but then overriding it later with GIT_DELTA_ADDED.
Support hierarchical test resource data, such that you can have
`tests/resources/foo/bar` and move the `bar` directory in as
a fixture.
Calling `cl_fixture_sandbox` on a path that is not directly beneath
the test resources directory succeeds, placing that directory into
the test fixture. (For example, `cl_fixture_sandbox("foo/bar")`
will sandbox the `foo/bar` directory as `bar`).
Add support for cleaning up directories created this way, by only
cleaning up the basename (in this example, `bar`) from the fixture
directory.
Given a variety of combinations of core.autocrlf settings and
attributes settings, test that we check out data into the working
directory the same as a known-good test resource created by git.git.
The current code will always fail, but only because it's asking for a
string on a live config. Take a snapshot and make sure we fail with
ENOTFOUND instead of any old error.
Similarly to the other ones. In this test we copy over testing
`RECURSE_YES` which shows an error in our handling of the `YES` variant
which we may have to port to the rest.
This lets us specify in the status call which ignore rules we want to
use (optionally falling back to whatever the submodule has in its
configuration).
This removes one of the reasons for having `_set_ignore()` set the value
in-memory. We re-use the `IGNORE_RESET` value for this as it is no
longer relevant but has a similar purpose to `IGNORE_FALLBACK`.
Similarly, we remove `IGNORE_DEFAULT` which does not have use outside of
initializers and move that to fall back to the configuration as well.
As submodules are becomes more like values, we should not let a status
check to update its properties. Instead of taking a submodule, have
status take a repo and submodule name.
Having this cache and giving them out goes against our multithreading
guarantees and it makes it impossible to use submodules in a
multi-threaded environment, as any thread can ask for a refresh which
may reallocate some string in the submodule struct which we've accessed
in a different one via a getter.
This makes the submodules behave more like remotes, where each object is
created upon request and not shared except explicitly by the user. This
means that some tests won't pass yet, as they assume they can affect the
submodule objects in the cache and that will affect later operations.
As we attempt to replicate a situation in which an older checkout has
put a file on disk with different filtering settings from us, set the
timestamp on the entry and file to a second before we're performing the
operation so the entry in the index counts as old.
This way we can test that we're not looking at the on-disk file when the
index has the entry and we detect it as clean.
This allows the user to look up fields which we don't parse in libgit2,
and allows them to access gpgsig or mergetag fields if they wish to
check the signature.
When a file on the workdir has the same or a newer timestamp than the
index, we need to perform a full check of the contents, as the update of
the file may have happened just after we wrote the index.
The iterator changes are such that we can reach inside the workdir
iterator from the diff, though it may be better to have an accessor
instead of moving these structs into the header.
When ticking over one second, it can happen that the actual time ticks
over the same second between the time that we undermine our own race
protections and the time in which we perform the index update. Such
timing would make the time in the entries match the index' timestamp and
we have not gained anything.
Ticking over five seconds makes it so that if real-time rolls over that
second, our index is still ahead. This is still suboptimal as we're
dealing with timing, but five seconds should be long enough for any
reasonable test runner to finish the tests.
These tests want to test that we don't recalculate entries which match
the index already. This is however something we force when truncating
racily-clean entries.
Tick the index forward as we know that we don't perform the
modifications which the racily-clean code is trying to avoid.
In order to avoid racy-git, we zero out the file size for entries with
the same timestamp as the index (or during the initial checkout). This
is the case in a couple of crlf tests, as the code is fast enough to do
everything in the same second.
As we know that we do not perform the modification just after writing
out the index, which is what this is designed to work around, tick the
mtime of the index file such that it doesn't agree with the files
anymore, and we do not zero out these entries.
We update the index and then immediately change the contents of the
file. This makes the diff think there are no changes, as the timestamp
of the file agrees with the cached data. This is however a bug, as the
file has obviously changed contents.
The test is a bit fragile, as it assumes that the index writing and the
following modification of the file happen in the same second, but it's
enough to show the issue.
Introduce a new binary diff callback to provide the actual binary
delta contents to callers. Create this data from the diff contents
(instead of directly from the ODB) to support binary diffs including
the workdir, not just things coming out of the ODB.
Some tools create multiple author fields. git is rather lax when parsing
them, although fsck does complain about them. This means that they exist
in the wild.
As it's not too taxing to check for them, and there shouldn't be a
noticeable slowdown when dealing with correct commits, add logic to skip
over these extra fields when parsing the commit.
When the callback returns an error, we should stop immediately. This
broke when trying to make sure we pass specific errors up the chain.
This broke cancelling out of the loose backend's foreach.
A remote's URLs are now modified according to the url.*.insteadOf
and url.*.pushInsteadOf configurations. This allows a user to
replace URL prefixes by setting the corresponding keys. E.g.
"url.foo.insteadOf = bar" would replace the prefix "bar" with the
new prefix "foo".
Treat input bytes as unsigned before doing arithmetic on them,
lest we look at some non-ASCII byte (like a UTF-8 character) as a
negative value and perform the comparison incorrectly.
We do not error on "merge conflicts"; on the contrary, merge conflicts
are a normal part of merging. We only error on "checkout conflicts",
where a change exists in the index or the working directory that would
otherwise be overwritten by performing the checkout.
This *may* happen during merge (after the production of the new index
that we're going to checkout) but it could happen during any checkout.
When confronted with a conflict in the index, `git_index_add_all`
should stage the working directory copy. If there is no file in the
working directory, the conflict should simply be removed.
It's not always obvious the mapping between stage level and
conflict-ness. More importantly, this can lead otherwise sane
people to write constructs like `if (!git_index_entry_stage(entry))`,
which (while technically correct) is unreadable.
Provide a nice method to help avoid such messy thinking.
Since a diff entry only concerns a single entry, zero the information
for the index side of a conflict. (The index entry would otherwise
erroneously include the lowest-stage index entry - generally the
ancestor of a conflict.)
Test that during status, the index side of the conflict is empty.
When diffing against an index, return a new `GIT_DELTA_CONFLICTED`
delta type for items that are conflicted. For a single file path,
only one delta will be produced (despite the fact that there are
multiple entries in the index).
Index iterators now have the (optional) ability to return conflicts
in the index. Prior to this change, they would be omitted, and callers
(like diff) would omit conflicted index entries entirely.
When we moved from acting on the instance to acting on the
configuration, we dropped the validation of the passed refspec, which
can lead to writing an invalid refspec to the configuration. Bring that
validation back.
When we look for which remote corresponds to a remote-tracking branch,
we look in the refspecs to see which ones matches. If none do, we should
abort. We currently ignore the error message from this operation, so
let's not do that anymore.
As part of the test we're writing, let's test for the expected behaviour
if we cannot find a refspec which tells us what the remote-tracking
branch for a remote would look like.
When we find out that we're dealing with a matching refspec, we set the
flag and return immediately. This leaves the strings as NULL, which
breaks the contract.
Assign these pointers to a string with the correct values.
When we discover that we want to keep a negative rule, make sure to
clear the error variable, as it we otherwise return whatever was left by
the previous loop iteration.
The code used to rely on the clone code calling the remote's save, which
does not happen anymore, meaning that the configuration settings the
remote expected were not being written to disk.
The run-time configuration was still being affected, so the right branch
was being cloned. The tests continued to pass as we did not check for
the configuration entires. Fix this by creating the remote with the
single-branch refspec we want and checking for its existence in the
configuration.
A couple of tests use the wrong remote to push to. We did not notice up
to now because the local push would copy individual objects, and those
already existed, so it became a no-op.
Once we made local push create the packfile, it became noticeable that
there was a new packfile where it didn't belong.
The base refspecs changing can be a cause of confusion as to what is the
current base refspec set and complicate saving the remote's
configuration.
Change `git_remote_add_{fetch,push}()` to update the configuration
instead of an instance.
This finally makes `git_remote_save()` a no-op, it will be removed in a
later commit.
While this will rarely be different from the default, having it in the
remote adds yet another setting it has to keep around and can affect its
behaviour. Move it to the options.
Instead of having it set in a different place from every other callback,
put it the main structure. This removes some state from the remote and
makes it behave more like clone, where the constructors are passed via
the options.
As a first step in removing the repository-saving logic, don't allow
chaning the url or push url from a remote object, but change the
configuration on the configuration immediately.
Having the setting be different from calling its actions was not a great
idea and made for the sake of the wrong convenience.
Instead of that, accept either fetch options, push options or the
callbacks when dealing with the remote. The fetch options are currently
only the callbacks, but more options will be moved from setters and
getters on the remote to the options.
This does mean passing the same struct along the different functions but
the typical use-case will only call git_remote_fetch() or
git_remote_push() and so won't notice much difference.
Ensure that when examining a .gitignore in a subdirectory, we do not
erroneously apply the paths contained therein to the root of the
repository. (Fixed in c02a0e4).
We have a few tests checking each step, but we do not yet have a test
which tests the documented workflow for creating a submodule, namely
`setup_add` followed by cloning into it, followed by `add_finalize`.
Add such a test to protect against regressions in this workflow.
Add a test that exposes a bug in config_write.
It is valid to have multiple separate headers for the same config section, but
config_write will exit after finding the first matching section in certain
situations.
This test proves that config_write will duplicate a variable that already
exists instead of overwriting it if the variable is defined under a duplicate
section header.
Previously we would try to be clever when writing the configuration
file and try to stop parsing (and simply copy the rest of the old
file) when we either found the value we were trying to write,
or when we left the section that value was in, the assumption being
that there was no more work to do.
Regrettably, you can have another section with the same name later
in the file, and we must cope with that gracefully, thus we read the
whole file in order to write a new file.
Now, writing a file looks even more than reading. Pull the config
parsing out into its own function that can be used by both reading
and writing the configuration.
On Mac OS, `realpath` is deficient in determining the actual filename
on-disk as it will simply provide the string you gave it if that file
exists, instead of returning the filename as it exists. Instead we
must read the directory entries for the parent directory to get the
canonical filename.
Ensure that on a case insensitive filesystem that we can checkout
into some folder 'FOLDER' that exists on disk, even if the target
of the checkout is a different case (eg 'folder').
On Windows, you might sloppily rewrite a file (or have a sloppy
text editor that does it for you) and accidentally change its
case. (eg, "README" -> "readme"). Git ignores this accidental
case changing rename during checkout and will happily write the
new content to the file despite the name change. We should, too.
The _next method shouldn't take a path pointer (and a path_len
pointer) as 100% of current users use the full path and ignore
the filename.
Plus let's add some docs and a unit test.
Moved offending tests from network to online so they will get skipped
when there is a lack of network connectivity:
-test_online_remotes__single_branch
-test_online_remotes__restricted_refspecs