The pair of `git_blob_create_frombuffer()` and
`git_blob_create_frombuffer_commit()` is meant to replace
`git_blob_create_fromchunks()` by providing a way for a user to write a
new blob when they want filtering or they do not know the size.
This approach allows the caller to retain control over when to add data
to this buffer and a more natural fit into higher-level language's own
stream abstractions instead of having to handle IO wait in the callback.
The in-memory buffer size of 2MB is chosen somewhat arbitrarily to be a
round multiple of usual page sizes and a value where most blobs seem
likely to be either going to be way below or way over that size. It's
also a round number of pages.
This implementation re-uses the helper we have from `_fromchunks()` so
we end up writing everything to disk, but hopefully more efficiently
than with a default filebuf. A later optimisation can be to avoid
writing the in-memory contents to disk, with some extra complexity.
This special-casing ignores that we might have a locked file, so the
hashtable does not represent the contents of the file we want to
write. This causes multivar writes to overwrite entries instead of add
to them when under lock.
There is no need for this as the normal code-path will write to the file
just fine, so simply get rid of it.
Since the `apply` callback can defer, the `check` callback is not
necessary. Removing the `check` callback further makes the `payload`
unnecessary along with the `cleanup` callback.
Ensure that setting the merge attribute forces the built-in default
`text` driver and does *not* honor the `merge.default` configuration
option. Further ensure that unsetting the merge attribute forces
a conflict (the `binary` driver).
Consumers can now register custom merged drivers with
`git_merge_driver_register`. This allows consumers to support the
merge drivers, as configured in `.gitattributes`. Consumers will be
asked to perform the file-level merge when a custom driver is
configured.
The function to extract signatures suffers from a similar bug to the
header field finding one by having an unecessary line feed check as a
break condition of its loop.
Fix that and add a test for this single-line signature situation.
This ensures that when using OpenSSL a safe default set of ciphers
is selected. This is done so that the client communicates securely
and we don't accidentally enable unsafe ciphers like RC4, or even
worse some old export ciphers.
Implements the first part of https://github.com/libgit2/libgit2/issues/3682
git_buf_clear does not free allocated memory associated with a
git_buf. Use `git_buf_free` instead to correctly free its memory
and plug the memory leak.
The old implementation had two issues:
1. OIDs that were too short as to be ambiguous were not being handled
properly.
2. If the last OID to expand in the array was missing from the ODB, we
would leak a `GIT_ENOTFOUND` error code from the function.
Sometimes you want to create a commit but not write it out to the
objectdb immediately. For these cases, provide a new function to
retrieve the buffer instead of having to go through the db.
If the underlying filesystem doesn't support better than one
second resolution, then don't expect that turning on `GIT_USE_NSEC`
does anything magical to change that.
Submodules don't exist in the objectdb and the code is making us try to
look for a blob with its commit id, which is obviously not going to
work.
Skip the test if the user wants to insert a submodule.
The index::nsec::staging_maintains_other_nanos test was created to
ensure that when we stage an entry when GIT_USE_NSECS is *unset* that
we truncate the index entry and do not persist the (old, invalid)
nanosec values. Ensure that when GIT_USE_NSECS is *set* that we do
not do that, and actually write the correct nanosecond values.
Test some additional exotic rebase setup behavior: that we are
able to set up properly when already in a detached HEAD state,
that the caller specifies all of branch, upstream and onto,
and that the caller specifies branch, upstream and onto by ID.
Allow `git_index_read` to handle reading existing indexes with
illegal entries. Allow the low-level `git_index_add` to add
properly formed `git_index_entry`s even if they contain paths
that would be illegal for the current filesystem (eg, `AUX`).
Continue to disallow `git_index_add_bypath` from adding entries
that are illegal universally illegal (eg, `.git`, `foo/../bar`).
Introduce a repository that contains some paths that were illegal
on PC-DOS circa 1981 (like `aux`, `con`, `com1`) and that in a
bizarre fit of retrocomputing, remain illegal on some "modern"
computers, despite being "new technology".
Introduce some aspirational tests that suggest that we should be
able to cope with trees and indexes that contain paths that
would be illegal on the filesystem, so that we can at least diff
them. Further ensure that checkout will not write a repository
with forbidden paths.
We should be checking whether the object we're looking up is a commit,
and we should let the caller know whether the not-found return code
comes from a bad object type or just a missing signature.
When performing an in-memory rebase, keep a single index for the
duration, so that callers have the expected index lifecycle and
do not hold on to an index that is free'd out from under them.
When we moved the logic to handle the first one, wrong loop logic was
kept in place which meant we still finished early. But we now notice it
because we're not reading past the last LF we find.
This was not noticed before as the last field in the tested commit was
multi-line which does not trigger the early break.
Introduce the ability to rebase in-memory or in a bare repository.
When `rebase_options.inmemory` is specified, the resultant `git_rebase`
session will not be persisted to disk. Callers may still analyze
the rebase operations, resolve any conflicts against the in-memory
index and create the commits. Neither `HEAD` nor the working
directory will be updated during this process.
We were searching only past the first header field, which meant we were
unable to find e.g. `tree` which is the first field.
While here, make sure to set an error message in case we cannot find the
field.
Include dotfiles when copying template directory, which will handle
both a template directory itself that begins with a dotfile, and
any dotfiles inside the directory.
Fix the file-mode test to expect system umask being applied to the
created file as well (it is currently applied to the directory only).
This fixes the test on systems where umask != 022.
Signed-off-by: Michał Górny <mgorny@gentoo.org>
When formatting a patch as email we do not include the commit's
message in the formatted patch output. Implement this and add a
test that verifies behavior.
It is already possible to get a commit's summary with the
`git_commit_summary` function. It is not possible to get the
remaining part of the commit message, that is the commit
message's body.
Fix this by introducing a new function `git_commit_body`.
It is not unreasonable to have versioned files with a line count
exceeding 2^16. Upon blaming such files we fail to correctly keep
track of the lines as `git_blame_hunk` stores them in `uint16_t`
fields.
Fix this by converting the line fields of `git_blame_hunk` to
`size_t`. Add test to verify behavior.
When building a recursive merge base, allow conflicts to occur.
Use the file (with conflict markers) as the common ancestor.
The user has already seen and dealt with this conflict by virtue
of having a criss-cross merge. If they resolved this conflict
identically in both branches, then there will be no conflict in the
result. This is the best case scenario.
If they did not resolve the conflict identically in the two branches,
then we will generate a new conflict. If the user is simply using
standard conflict output then the results will be fairly sensible.
But if the user is using a mergetool or using diff3 output, then the
common ancestor will be a conflict file (itself with diff3 output,
haha!). This is quite terrible, but it matches git's behavior.
Don't put the configuration in a subdir of the sandbox named
`config`, lest some tests decide to create their own directory
called `config`. Prefix with some underscores for uniqueness.
Ensure that `git_index_read_index` clears the uptodate bit on
files that it modifies.
Further, do not propagate the cache from an on-disk index into
another on-disk index. Although this should not be done, as
`git_index_read_index` is used to bring an in-memory index into
another index (that may or may not be on-disk), ensure that we do
not accidentally bring in these bits when misused.
Test that entries are only smudged when we write the index: the
entry smudging is to prevent us from updating an index in a way
that it would be impossible to tell that an item was racy.
Consider when we load an index: any entries that have the same
(or newer) timestamp than the index itself are considered racy,
and are subject to further scrutiny.
If we *save* that index with the same entries that we loaded,
then the index would now have a newer timestamp than the entries,
and they would no longer be given that additional scrutiny, failing
our racy detection! So test that we smudge those entries only on
writing the new index, but that we can detect them (in diff) without
having to write.
When there's no matching index entry (for whatever reason), don't
try to dereference the null return value to get at the id.
Otherwise when we break something in the index API, the checkout
test crashes for confusing reasons and causes us to step through
it in a debugger thinking that we had broken much more than we
actually did.
Keep track of entries that we believe are up-to-date, because we
added the index entries since the index was loaded. This prevents
us from unnecessarily examining files that we wrote during the
cleanup of racy entries (when we smudge racily clean files that have
a timestamp newer than or equal to the index's timestamp when we
read it). Without keeping track of this, we would examine every
file that we just checked out for raciness, since all their timestamps
would be newer than the index's timestamp.
When creating a filebuf, detect a directory that exists in our
target file location. This prevents a failure later, when we try
to move the lock file to the destination.
Test that on platforms without `core.symlinks`, we preserve symlinks
in `git_index_add_bypath`. (Users should correct the actual index
entry's mode to change a link to a regular file.)
When `core.symlinks = false`, we write the symlinks content (target)
to a regular file. We should ensure that when we later see that
regular file, we treat it specially - and that changing that regular
file would actually change the symlink target. (For compatibility
with Git for Windows).
We currently use the timestamp in order to decide whether a config file
has changed since we last read it.
This scheme falls down if the file is written twice within the same
second, as we fail to detect the file change after the first read in
that second.
Using calloc instead of malloc because the parse error will lead to an immediate free of committer (and its properties, which can segfault on free if undefined - test_refs_reflog_reflog__reading_a_reflog_with_invalid_format_returns_error segfaulted before the fix).
#3458
Provide a new merge option, GIT_MERGE_TREE_FAIL_ON_CONFLICT, which
will stop on the first conflict and fail the merge operation with
GIT_EMERGECONFLICT.
Test that nanoseconds are round-tripped correctly when we read
an index file that contains them. We should, however, ignore them
because we don't understand them, and any new entries in the index
should contain a `0` nsecs field, while existing preserving entries.
For most real use cases, repositories with alternates use them as main
object storage. Checking the alternate for objects before the main
repository should result in measurable speedups.
Because of this, we're changing the sorting algorithm to prioritize
alternates *in cases where two backends have the same priority*. This
means that the pack backend for the alternate will be checked before the
pack backend for the main repository *but* both of them will be checked
before any loose backends.
We moved the "main" parsing to use 64 bits for the timestamp, but the
quick parsing for the revwalk did not. This means that for large
timestamps we fail to parse the time and thus the walk.
Move this parser to use 64 bits as well.
xdiff craps the bed on large files. Treat very large files as binary,
so that it doesn't even have to try.
Refactor our merge binary handling to better match git.git, which
looks for a NUL in the first 8000 bytes.
As refdb and odb backends can be allocated by client code, libgit2
can’t know whether an alternative memory allocator was used, and thus
should not try to call `git__free` on those objects.
Instead, odb and refdb backend implementations must always provide
their own `free` functions to ensure memory gets freed correctly.
When we do not trust the on-disk mode, we use the mode of an existing
index entry. This allows us to preserve executable bits on platforms
that do not honor them on the filesystem.
If there is no stage 0 index entry, also look at conflicts to attempt
to answer this question: prefer the data from the 'ours' side, then
the 'theirs' side before falling back to the common ancestor.
git expects an empty line after the binary data:
literal X
...binary data...
<empty_line>
The last literal block of the generated patches were not containing the required empty line. Example:
diff --git a/binary_file b/binary_file
index 3f1b3f9098131cfecea4a50ff8afab349ea66d22..86e5c1008b5ce635d3e3fffa4434c5eccd8f00b6 100644
GIT binary patch
literal 8
Pc${NM&PdElPvrst3ey5{
literal 6
Nc${NM%g@i}0ssZ|0lokL
diff --git a/binary_file2 b/binary_file2
index 31be99be19470da4af5b28b21e27896a2f2f9ee2..86e5c1008b5ce635d3e3fffa4434c5eccd8f00b6 100644
GIT binary patch
literal 8
Pc${NM&PdElPvrst3ey5{
literal 13
Sc${NMEKbZyOexL+Qd|HZV+4u-
git apply of that diff results in:
error: corrupt binary patch at line 9: diff --git a/binary_file2 b/binary_file2
fatal: patch with only garbage at line 10
The proper formating is:
diff --git a/binary_file b/binary_file
index 3f1b3f9098131cfecea4a50ff8afab349ea66d22..86e5c1008b5ce635d3e3fffa4434c5eccd8f00b6 100644
GIT binary patch
literal 8
Pc${NM&PdElPvrst3ey5{
literal 6
Nc${NM%g@i}0ssZ|0lokL
diff --git a/binary_file2 b/binary_file2
index 31be99be19470da4af5b28b21e27896a2f2f9ee2..86e5c1008b5ce635d3e3fffa4434c5eccd8f00b6 100644
GIT binary patch
literal 8
Pc${NM&PdElPvrst3ey5{
literal 13
Sc${NMEKbZyOexL+Qd|HZV+4u-
This allows us to remove OS checks from source code, instead relying
on CMake to detect whether or not `struct stat` has the nanoseconds
members we rely on.
Test an initial submodule update, where we are trying to checkout
the submodule for the first time, and placing a file within the
submodule working directory with the same name as the submodule
(and consequently, the same name as the repository itself).
`git_futils_mkdir` does not blindly call `git_futils_mkdir_relative`.
`git_futils_mkdir_relative` is used when you have some base directory
and want to create some path inside of it, potentially removing blocking
symlinks and files in the process. This is not suitable for a general
recursive mkdir within the filesystem.
Instead, when `mkdir` is being recursive, locate the first existent
parent directory and use that as the base for `mkdir_relative`.
Untangle git_futils_mkdir from git_futils_mkdir_ext - the latter
assumes that we own everything beneath the base, as if it were
being called with a base of the repository or working directory,
and is tailored towards checkout and ensuring that there is no
bogosity beneath the base that must be cleaned up.
This is (at best) slow and (at worst) unsafe in the larger context
of a filesystem where we do not own things and cannot do things like
unlink symlinks that are in our way.
When a file exists on disk and we're checking out a file that differs
in executableness, remove the old file. This allows us to recreate the
new file with p_open, which will take the new mode into account and
handle setting the umask properly.
Remove any notion of chmod'ing existing files, since it is now handled
by the aforementioned removal and was incorrect, as it did not take
umask into account.
Ensure that we can iterate the filesystem root and that paths come
back well-formed, not with an additional '/'. (eg, when iterating
`c:/`, expect that we do not get some path like `c://autoexec.bat`).
The previous commit left the comment referencing the earlier state of
the code, change it to explain the current logic. While here, change the
logic to avoid repeating the copy of the base pattern.
These are small pieces of data, so there is no advantage to allocating
them separately. Include the two ids inline in the struct we use to
check that the expected and actual ids match.
On case insensitive platforms, allow `git_index_add` to provide a new
path for an existing index entry. Previously, we would maintain the
case in an index entry without the ability to change it (except by
removing an entry and re-adding it.)
Higher-level functions (like `git_index_add_bypath` and
`git_index_add_frombuffers`) continue to keep the old path for easier
usage.
On case insensitive systems, when given a user-provided path in the
higher-level index addition functions (eg `git_index_add_bypath` /
`git_index_add_frombuffer`), examine the index to try to match the
given path to an existing directory.
Various mechanisms can cause the on-disk representation of a folder
to not match the representation in HEAD or the index - for example,
a case changing rename of some file `a/file.txt` to `A/file.txt`
will update the paths in the index, but not rename the folder on
disk.
If a user subsequently adds `a/other.txt`, then this should be stored
in the index as `A/other.txt`.
We create a lockfile to update files under GIT_DIR. Sometimes these
files are actually located elsewhere and a symlink takes their place. In
that case we should lock and update the file at its final location
rather than overwrite the symlink.
Some nicer refactoring for index iteration walks.
The index iterator doesn't binary search through the pathlist space,
since it lacks directory entries, and would have to binary search
each index entry and all its parents (eg, when presented with an index
entry of `foo/bar/file.c`, you would have to look in the pathlist for
`foo/bar/file.c`, `foo/bar` and `foo`). Since the index entries and the
pathlist are both nicely sorted, we walk the index entries in lockstep
with the pathlist like we do for other iteration/diff/merge walks.
When using literal pathspecs in diff with `GIT_DIFF_DISABLE_PATHSPEC_MATCH`
turn on the faster iterator pathlist handling.
Updates iterator pathspecs to include directory prefixes (eg, `foo/`)
for compatibility with `GIT_DIFF_DISABLE_PATHSPEC_MATCH`.
Document that `GIT_DIFF_PATHSPEC_DISABLE` is not necessarily about
explicit path matching, but also includes matching of directory
names. Enforce this in a test.
When given a pathlist, don't assume that directories sort before
files. Walk through any list of entries sorting before us to make
sure that we've exhausted all entries that *aren't* directories.
Eg, if we're searching for 'foo/bar', and we have a 'foo.c', keep
advancing the pathlist to keep looking for an entry prefixed with
'foo/'.
When parsing user-provided regex patterns for functions, we must not
fail to provide a diff just because a pattern is not well
formed. Ignore it instead.
This lock/unlock pair allows for the cller to lock a configuration file
to avoid concurrent operations.
It also allows for a transactional approach to updating a configuration
file. If multiple updates must be made atomically, they can be done
while the config is locked.
When a configuration file is locked, any updates made to it will be done
to the in-memory copy of the file. This allows for multiple updates to
happen while we hold the lock, preventing races during complex
config-file manipulation.
While we download the remote's remote-tracking branches, we don't
download the tag. This points to the tag auto-follow rules interfering
with the refspec.
When we pass the path of a repository to `_bypath()`, we should behave
like git and stage it as a `_COMMIT` regardless of whether it is
registered a a submodule.
When we have an error renaming the lockfile, we need to make sure
that we remove it upon cleanup. For this, we need to keep track of
whether we opened the file and whether the rename succeeded.
If we did create the lockfile but the rename did not succeed, we
remove the lockfile. This won't protect against all errors, but
the most common ones (target file is open) does get handled.
The header src/cc-compat.h defines portable format specifiers PRIuZ, PRIdZ, and PRIxZ. The original report highlighted the need to use these specifiers in examples/network/fetch.c. For this commit, I checked all C source and header files not in deps/ and transitioned to the appropriate format specifier where appropriate.
Removing a reflog upon ref deletion is something which only some
backends might wish to do. Backends which are database-backed may wish
to archive a reflog, log-based ones may not need to do anything.
When we rename a submodule, we should be merging two sets of information
based on whether their path is the same. We currently only deduplicate
on equal name, which causes us to double-report.