The `pack_entry_find_prefix` function receives a `git_rawobj` structure
as argument. While the function first initializes the structure to a
sensible state, Coverity is unable to correctly detect this, resulting
in a warning.
Fix this warning by initializing the object to all-zeroes before passing
it to the function.
While parsing section headers, we use a buffer to store the actual
section name. We do not check though if the buffer runs out of memory at
any stage. Do so.
The function `pass_whole_blame` performs an object lookup but does not
check if the lookup actually succeeds. Convert the function to return an
error code and check for it in the calling function.
The OpenSSL library may require multiple locks to work correctly, where
it is the caller's responsibility to initialize and release the locks.
While we correctly initialized up to `n` locks, as determined by
`CRYPTO_num_locks`, we were repeatedly freeing the same mutex in our
shutdown procedure.
Fix the issue by freeing locks at the correct index.
The `map_free` functions were not implemented as functions but instead
as macros which also set the map to NULL. While this is most certainly
sensible in most cases, we should prefer the more obvious behavior,
namingly leaving the map pointer intact.
Furthermore, this macro has been refactored incorrectly during the
map-refactorings: the two statements are not actually grouped together
by a `do { ... } while (0)` block, as it is required for macros to
match the behavior of functions more closely. This has led to at least
one subtle nesting error in `pack-objects.c`. The following code block
```
if (pb->object_ix)
git_oidmap_free(pb->object_ix);
```
would be expanded to
```
if (pb->object_ix)
git_oidmap__free(pb->object_ix); pb->object_ix = NULL;
```
which is not what one woudl expect. While it is not a bug here as it
would simply become a no-op, the wrong implementation could lead to bugs
in other occasions.
Fix this by simply removing the macro altogether and replacing it with
real function calls. This leaves the burden of setting the pointer to
NULL afterwards to the caller, but this is actually expected and behaves
like other `free` functions.
We currently call `git_strmap_free` on `checkout_data.mkdir_map` in the
`checkout_data_clear` function. The only thing protecting us from a
double-free is that the `git_strmap_free` function is in fact not a
function, but a macro that also sets the map to NULL.
Remove the second call to `git_strmap_free` and explicitly set the map
member to NULL.
It is possible to specify submodule URLs relative to the repository
location. E.g. having a submodule with URL "../submodule" will look for
the submodule at "repo/../submodule".
With the introduction of worktrees, though, we cannot simply resolve the
URL relative to the repository location itself. If the repository for
which a URL is to be resolved is a working tree, we have to resolve the
URL relative to the parent's repository path. Otherwise, the URL would
change depending on where the working tree is located.
Fix this by special-casing when we have a working tree while getting the
URL base.
References for a repository are usually created inside of its gitdir.
When using worktrees, though, these references are not to be created
inside the worktree gitdir, but instead inside the gitdir of its parent
repository, which is the commondir. Like this, branches will still be
available after the worktree itself has been deleted.
The filesystem refdb currently still creates new references inside of
the gitdir. Fix this and have it create references in commondir.
The three link files "worktree/.git", ".git/worktrees/<name>/commondir"
and ".git/worktrees/<name>/gitdir" should always contain absolute and
resolved paths. Adjust the logic creating new worktrees to first use
`git_path_prettify_dir` before writing out these files, so that paths
are resolved first.
When creating a new worktree, we have to set up the initial data
structures. Next to others, this also includes the HEAD pseudo-ref.
We currently set it to the worktree respectively branch name, which is
actually not fully qualified.
Use the fully qualified branch name instead.
The working tree's parent path should not point to the parent's gitdir,
but to the parent's working directory. Pointing to the gitdir would not
make any sense, as the parent's working directory is actually equal to
both repository's common directory.
Fix the issue.
While we already provide functionality to look up a worktree from a
repository, we cannot do so the other way round. That is given a
repository, we want to look up its worktree if it actually exists.
Getting the worktree of a repository is useful when we want to get
certain meta information like the parent's location, getting the locked
status, etc.
Separate the logic of finding the worktree directory of a repository and
actually opening the working tree's directory. This is a preparatory
step for opening the worktree structure of a repository itself.
When calling `git_submodule_update` on a submodule, we have to retrieve
the ID of the submodule entry in the index. If the function is called on
a submodule which is only partly initialized, the submodule entry may
not be added to the index yet. This leads to an assert when trying to
look up the blob later on.
Fix the issue by checking if the index actually holds the submodule's
ID and erroring out if it does not.
The function `diff_parsed_alloc` allocates and initializes a
`git_diff_parsed` structure. This structure also contains diff options.
While we initialize its flags, we fail to do a real initialization of
its values. This bites us when we want to actually use the generated
diff as we do not se the option's version field, which is required to
operate correctly.
Fix the issue by executing `git_diff_init_options` on the embedded
struct.
In a diff, the shortest possible hunk with a modification (that is, no
deletion) results from a file with only one line with a single character
which is removed. Thus the following hunk
@@ -1 +1 @@
-a
+
is the shortest valid hunk modifying a line. The function parsing the
hunk body though assumes that there must always be at least 4 bytes
present to make up a valid hunk, which is obviously wrong in this case.
The absolute minimum number of bytes required for a modification is
actually 2 bytes, that is the "+" and the following newline. Note: if
there is no trailing newline, the assumption will not be offended as the
diff will have a line "\ No trailing newline" at its end.
This patch fixes the issue by lowering the amount of bytes required.
Now that the `git_diff_foreach` function does not depend on internals of
the `git_patch_generated` structure anymore, we can easily move it to
the actual diff code.
The current logic of `git_diff_foreach` makes the assumption that all
diffs passed in are actually derived from generated diffs. With these
assumptions we try to derive the actual diff by inspecting either the
working directory files or blobs of a repository. This obviously cannot
work for diffs parsed from a file, where we do not necessarily have a
repository at hand.
Since the introduced split of parsed and generated patches, there are
multiple functions which help us to handle patches generically, being
indifferent from where they stem from. Use these functions and remove
the old logic specific to generated patches. This allows re-using the
same code for invoking the callbacks on the deltas.
Under the existing logic, we try to load patch contents differently,
depending on whether the patch files stem from the working directory or
not. But actually, the executed code paths are completely equal to each
other -- so we were always the code despite the condition.
Remove the condition altogether and conflate both code paths.
Don't compute the sha-1 in `git_futils_readbuffer_updated` unless the
checksum was requested. This means that `git_futils_readbuffer` will
not calculate the checksum unnecessarily.
An untracked file in a submodule should not prevent a rebase from
starting. Even if the submodule's SHA is changed, and that file would
conflict with a new tracked file, it's still OK to start the rebase
and discover the conflict later.
Signed-off-by: David Turner <dturner@twosigma.com>
When fsync'ing files, fsync the parent directory in the case where we
rename a file into place, or create a new file, to ensure that the
directory entry is flushed correctly.
Add a custom `O_FSYNC` bit (if it's not been defined by the operating
system`) so that `git_futils_writebuffer` can optionally do an `fsync`
when it's done writing.
We call `fsync` ourselves, even on systems that define `O_FSYNC` because
its definition is no guarantee of its actual support. Mac, for
instance, defines it but doesn't support it in an `open(2)` call.
Introduce a simple counter that `p_fsync` implements. This is useful
for ensuring that `p_fsync` is called when we expect it to be, for
example when we have enabled an odb backend to perform `fsync`s when
writing objects.
Fixes a regression from #4092. This is a crash on 32-bit and I assume that
it doesn't do the right thing on 64-bit either. MSVC emits a warning for this,
but of course, it's easy to get lost among all of the similar 'possible loss
of data' warnings.
Provide a descriptive error message when compiling THREADSAFE on gcc
versions < 4.1. We require the atomic primitives (eg
`__sync_synchronize`) that were introduced in that version.
(Note, clang setes `__GNUC__` but appears to set its version > 4.1.)
Remove useless indirection from `git_attr_cache__init` to
`git_attr_cache__do_init`. The difference is that the
`git_attr_cache__init` macro first checks if the cache is already
initialized and, if so, not call `git_attr_cache__do_init`. But
actually, `git_attr_cache__do_init` already does the same thing and
returns immediately if the cache is already initialized.
Remove the indirection.
When doing an upsert of a file, we used to use `git__compare_and_swap`,
comparing the entry's file which is to be replaced with itself. This can
be more easily formulated by using `git__swap`, which unconditionally
replaces the value.
`snprintf` requires a _format_ but does not require _arguments_ to the
format. eg: `snprintf(buf, 42, "hi")` is perfectly legal. Expand the
macro to match.
Without this, `p_sprintf(buf, 42, "hi")` errors with:
```
error: expected expression
p_snprintf(msg, 42, "hi");
^
src/unix/posix.h:53:34: note: expanded from macro 'p_snprintf'
^
/usr/include/secure/_stdio.h:57:73: note: expanded from macro 'snprintf'
__builtin___snprintf_chk (str, len, 0, __darwin_obsz(str),
__VA_ARGS__)
```
The upstream git.git project currently identifies all references inside
of `refs/bisect/` as well as `HEAD` as per-worktree references. This is
already incorrect and is currently being fixed by an in-flight topic
[1]. The new behavior will be to match all pseudo-references outside of
the `refs/` hierarchy as well as `refs/bisect/`.
Our current behavior is to mark a selection of pseudo-references as
per-worktree, only. This matches more pseudo-references than current
git, but forgets about `refs/bisect/`. Adjust behavior to match the
in-flight topic, that is classify the following references as
per-worktree:
- everything outside of `refs/`
- everything inside of `refs/bisect/`
[1]: <20170213152011.12050-1-pclouds@gmail.com>
When extracting a commit's signature, we first free the object and only
afterwards put its signature contents into the result buffer. This works
in most cases - the free'd object will normally be cached anyway, so we
only end up decrementing its reference count without actually freeing
its contents. But in some more exotic setups, where caching is disabled,
this can definitly be a problem, as we might be the only instance
currently holding a reference to this object.
Fix this issue by first extracting the contents and freeing the object
afterwards only.
The functions `git_commit_header_field` and
`git_commit_extract_signature` both receive buffers used to hand back
the results to the user. While these functions called `git_buf_sanitize`
on these buffers, this is not the right thing to do, as it will simply
initialize or zero-terminate passed buffers. As we want to overwrite
contents, we instead have to call `git_buf_clear` to completely reset
them.
When `git_buf_sanitize` gets called, it converts a buffer with NULL
content to be correctly initialized. This is done by pointing it to
`git_buf__initbuf`. While the method's documentation states this
clearly, it may also lead to the conclusion that it will do the same to
buffers which do _not_ have NULL contents.
Clarify behavior when passing a buffer with non-NULL contents, where
`git_buf_sanitize` will ensure that the contents are `\0`-terminated.
When opening a worktree via the gitdir of its parent repository
we fail to correctly set up the worktree's working directory. The
problem here is two-fold: we first fail to see that the gitdir
actually is a gitdir of a working tree and then subsequently
fail to determine the working tree location from the gitdir.
The first problem of not noticing a gitdir belongs to a worktree
can be solved by checking for the existence of a `gitdir` file in
the gitdir. This file points back to the gitlink file located in
the working tree's working directory. As this file only exists
for worktrees, it should be sufficient indication of the gitdir
belonging to a worktree.
The second problem, that is determining the location of the
worktree's working directory, can then be solved by reading the
`gitdir` file in the working directory's gitdir. When we now
resolve relative paths and strip the final `.git` component, we
have the actual worktree's working directory location.
The `path_repository` variable is actually confusing to think
about, as it is not always clear what the repository actually is.
It may either be the path to the folder containing worktree and
.git directory, the path to .git itself, a worktree or something
entirely different. Actually, the intent of the variable is to
hold the path to the gitdir, which is either the .git directory
or the bare repository.
Rename the variable to `gitdir` to avoid confusion. While at it,
also rename `path_gitlink` to `gitlink` to improve consistency.
If a branch is already checked out in a working tree we are not
allowed to check out that branch in another repository. Introduce
this restriction when setting a repository's HEAD.
Implement a new function that is able to determine if a branch is
checked out in any repository connected to the current
repository. In particular, this is required to check if for a
given repository and branch, there exists any working tree
connected to that repository that is referencing this branch.
Implement `git_repository_head_for_worktree` and
`git_repository_head_detached_for_worktree` for directly accessing a
worktree's HEAD without opening it as a `git_repository` first.
Implement the `git_worktree_prune` function. This function can be
used to delete working trees from a repository. According to the
flags passed to it, it can either delete the working tree's
gitdir only or both gitdir and the working directory.
Working trees support locking by creating a file `locked` inside
the tree's gitdir with an optional reason inside. Support this
feature by adding functions to get and set the locking status.
Add a new function that checks wether a given `struct
git_worktree` is valid. The validation includes checking if the
gitdir, parent directory and common directory are present.
Introduce a new `struct git_worktree`, which holds information
about a possible working tree connected to a repository.
Introduce functions to allow opening working trees for a
repository.
A repository's configuartion file can always be found in the
GIT_COMMON_DIR, which has been newly introduced. For normal
repositories this does change nothing, but for working trees this
change allows to access the shared configuration file.
The refdb_fs_backend is not aware of the git commondir, which
stores common objects like the o bject database and packed/loose
refereensces when worktrees are used.
Make refdb_fs_backend aware of the common directory by
introducing a new commonpath variable that points to the actual
common path of the database and using it instead of the gitdir
for the mentioned objects.
The variable '.path' of the refdb_fs_backend struct becomes
confusing regarding the introduction of the git commondir. It
does not immediatly become obvious what it should point to.
Fix this problem by renaming the variable to `gitpath`,
clarifying that it acutally points to the `.git` directory of the
repository, in contrast to the commonpath directory, which points
to the directory containing shared objects like references and
the object store.
The recent introduction of the commondir variable of a repository
requires callers to distinguish whether their files are part of
the dot-git directory or the common directory shared between
multpile worktrees. In order to take the burden from callers and
unify knowledge on which files reside where, the
`git_repository_item_path` function has been introduced which
encapsulate this knowledge.
Modify most existing callers of `git_repository_path` to use
`git_repository_item_path` instead, thus making them implicitly
aware of the common directory.
The commondir variable stores the path to the common directory.
The common directory is used to store objects and references
shared across multiple repositories. A current use case is the
newly introduced `git worktree` feature, which sets up a separate
working copy, where the backing git object store and references
are pointed to by the common directory.
When calling `git_path_dirname_r` on a Win32 prefix, e.g. a drive
or network share prefix, we always want to return the trailing
'/'. This does not work currently when passing in a path like
'C:', where the '/' would not be appended correctly.
Fix this by appending a '/' if we try to normalize a Win32 prefix
and there is no trailing '/'.
Getting the dirname of a filesystem root should return the filesystem
root itself. E.g. the dirname of "/" is always "/". On Windows, we
emulate this behavior and as such, we should return e.g. "C:/" if
calling dirname on "C:/". But we currently fail to do so and instead
return ".", as we do not check if we actually have a Windows prefix
before stripping off the last directory component.
Fix this by calling out to `win32_prefix_length` immediately after
stripping trailing slashes, returning early if we have a prefix.
Extract code which determines if a path is at a Windows system's root.
This incluses drive prefixes (e.g. "C:\") as well as network computer
names (e.g. "//computername/").
The code reversing a vector initially determines the rear-pointer by
simply subtracting 1 from the vector's length. Obviously, this fails if
the vector is empty, in which case we have an integer overflow.
Fix the issue by returning early if the vector is empty.
Fix the following warning emitted by clang:
[ 16%] Building C object CMakeFiles/libgit2_clar.dir/src/submodule.c.o
/Users/mplough/devel/external/libgit2/src/submodule.c:408:6: warning: variable 'i' is used uninitialized whenever 'if' condition is true
[-Wsometimes-uninitialized]
if ((error = load_submodule_names(names, cfg)))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/mplough/devel/external/libgit2/src/submodule.c:448:20: note: uninitialized use occurs here
git_iterator_free(i);
^
/Users/mplough/devel/external/libgit2/src/submodule.c:408:2: note: remove the 'if' if its condition is always false
if ((error = load_submodule_names(names, cfg)))
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/Users/mplough/devel/external/libgit2/src/submodule.c:404:17: note: initialize the variable 'i' to silence this warning
git_iterator *i;
^
= NULL
1 warning generated.
Set up a WinHTTP status callback; inspect the WinHTTP status for
WINHTTP_CALLBACK_STATUS_SECURE_FAILURE, and convert the status code
to a useful message for callers.
`git_submodule_status` is very slow, bottlenecked on
`git_repository_head_tree`, which it uses through `submodule_update_head`. If
the user has requested submodule caching, assume that they want this status
cached too and skip it.
Signed-off-by: David Turner <dturner@twosigma.com>
Added `git_repository_submodule_cache_all` to initialze a cache of
submodules on the repository so that operations looking up N
submodules are O(N) and not O(N^2). Added a
`git_repository_submodule_cache_clear` function to remove the cache.
Also optimized the function that loads all submodules as it was itself
O(N^2) w.r.t the number of submodules, having to loop through the
`.gitmodules` file once per submodule. I changed it to process the
`.gitmodules` file once, into a map.
Signed-off-by: David Turner <dturner@twosigma.com>
For username/password credentials, support NTLM or Basic (in that order
of priority). Use the WinHTTP built-in authentication support for both,
and maintain a bitfield of the supported mechanisms from the response.
The Git protocol does not specify what should happen in the case
of an empty packet line (that is a packet line "0004"). We
currently indicate success, but do not return a packet in the
case where we hit an empty line. The smart protocol was not
prepared to handle such packets in all cases, though, resulting
in a `NULL` pointer dereference.
Fix the issue by returning an error instead. As such kind of
packets is not even specified by upstream, this is the right
thing to do.
Each packet line in the Git protocol is prefixed by a four-byte
length of how much data will follow, which we parse in
`git_pkt_parse_line`. The transmitted length can either be equal
to zero in case of a flush packet or has to be at least of length
four, as it also includes the encoded length itself. Not
checking this may result in a buffer overflow as we directly pass
the length to functions which accept a `size_t` length as
parameter.
Fix the issue by verifying that non-flush packets have at least a
length of `PKT_LEN_SIZE`.
git_checkout_tree() sets up its working directory iterator to respect the
pathlist if GIT_CHECKOUT_DISABLE_PATHSPEC_MATCH is present, which is great.
What's not so great is that this iterator is then used side-by-side with
an iterator created by git_checkout_iterator(), which did not set up its
pathlist appropriately (although the iterator mirrors all other iterator
options).
This could cause git_checkout_tree() to delete working tree files which
were not specified in the pathlist when GIT_CHECKOUT_DISABLE_PATHSPEC_MATCH
was used, as the unsynchronized iterators causes git_checkout_tree() to think
that files have been deleted between the two trees. Oops.
And added a test which fails without this fix (specifically, the final check
for "testrepo/README" to still be present fails).
Error messages should be sentence fragments, and therefore:
1. Should not begin with a capital letter,
2. Should not conclude with punctuation, and
3. Should not end a sentence and begin a new one
We want to keep the git UA in order for services to recognise that we're
a Git client and not a browser. But in order to stop dumb HTTP some
services have blocked UAs that claim to be pre-1.6.6 git.
Thread these needles by using the "git/2.0" prefix which is still close
enough to git's yet distinct enough that you can tell it's us.
This partially reverts bdec62dce1 which activates
the transport code-paths which allow you to use a custom TLS implementation
without having to have one at build-time.
However the capabilities describe how libgit2 was built, not what it could
potentially support, bring back the ifdefs so we only say we support HTTPS if
libgit2 was itself built with a TLS implementation.
The function to write trees allocates a new buffer for each tree.
This causes problems with performance when performing a lot
of actions involving writing trees, e.g. when doing many merges.
Fix the issue by instead handing in a shared buffer, which is then
re-used across the calls without having to re-allocate between
calls.
When trying to uncompress deltas in a packfile's delta chain, we try to
add object bases to the packfile cache, subsequently decrementing its
reference count if it has been added successfully. This may lead to a
mismatched reference count in the case where we exit the loop early due
to an encountered error.
Fix the issue by decrementing the reference count in error cleanup.
git_rebase_finish relies on head_detached being set, but
rebase_init_merge was only setting it when branch->ref_name was unset.
But branch->ref_name would be set to "HEAD" in the case of detached
HEAD being either implicitly (NULL) or explicitly passed to
git_rebase_init.