The upstream git.git project verifies objects when looking them up from
disk. This avoids scenarios where objects have somehow become corrupt on
disk, e.g. due to hardware failures or bit flips. While our mantra is
usually to follow upstream behavior, we do not do so in this case, as we
never check hashes of objects we have just read from disk.
To fix this, we introduce a new error code `GIT_EMISMATCH`, which
denotes that we have looked up an object with a hashsum mismatch.
`odb_read_1` will then, after having read the object from its backend,
hash the object and compare the resulting hash to the expected hash. If
the hashes do not match, it returns an error.
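Conceptually, the check boils down to something like the following
sketch; the helper name and error string are illustrative, not the
actual implementation, and assume the object's raw data, size and type
are at hand after the backend read:

    static int check_object_hash(const git_oid *expected, const void *data,
        size_t len, git_otype type)
    {
        git_oid actual;

        /* Re-hash the raw contents we have just read from the backend */
        if (git_odb_hash(&actual, data, len, type) < 0)
            return -1;

        /* Hand back the new error code on a hashsum mismatch */
        if (git_oid_cmp(expected, &actual) != 0) {
            giterr_set_str(GITERR_ODB, "hashsum mismatch in object database");
            return GIT_EMISMATCH;
        }

        return 0;
    }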
This obviously introduces another hash computation and could
potentially impact performance. Note though that we usually perform I/O
directly before doing this computation, so the actual overhead should
be drowned out by the I/O. Running our test suite seems to confirm this
assumption: on a Linux system with best-of-five timings, we measured
21.592s with the check enabled and 21.590s with the check disabled.
Note though that our test suite mostly contains very small blobs;
repositories with bigger blobs may see a more noticeable hit from this
check.
In addition to a new test, we also had to change the
odb::backend::nonrefreshing test suite, which now triggers a hashsum
mismatch when looking up the commit "deadbeef...". This is expected, as
the fake backend allocated inside the test returns an empty object for
the OID "deadbeef...", which will obviously not hash back to
"deadbeef..." again. To fix the test, we simply adjust the looked-up
hash to equal the hash of the empty object.
This can prevent sharing violations (ERROR_SHARING_VIOLATION) when
libgit2 is used alongside tools such as the TortoiseGit TGitCache,
because files opened with FILE_SHARE_DELETE are no longer locked
against deletion.
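As an illustration of the flag in question (not code from the change
itself), opening a file with FILE_SHARE_DELETE in the sharing mode
looks roughly like this, assuming `path` holds a wide-character path:

    /* Include FILE_SHARE_DELETE in the sharing mode so other processes
     * may delete or rename the file while we hold it open. */
    HANDLE h = CreateFileW(path, GENERIC_READ,
        FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
        NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);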
Signed-off-by: Sven Strickroth <email@cs-ware.de>
Provide a macro that will allow us to run a function with posix-like
return values multiple times in a retry loop, with an optional cleanup
function called between invocations.
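A rough sketch of the shape such a macro could take; the macro name,
the retry count and the `p_rename`/`p_unlink` usage are illustrative
only and not taken from the actual change:

    /* Run `fn` up to `tries` times, invoking `cleanup` between failed
     * attempts; `err` receives the return value of the last attempt. */
    #define RETRY_WITH_CLEANUP(err, tries, fn, cleanup)     \
        do {                                                \
            int __remaining = (tries);                      \
            while ((err = (fn)) < 0 && --__remaining > 0) { \
                cleanup;                                    \
            }                                               \
        } while (0)

    /* Possible usage, assuming `from` and `to` hold paths: retry a
     * rename up to five times, removing a conflicting target between
     * attempts. */
    int error;
    RETRY_WITH_CLEANUP(error, 5, p_rename(from, to), p_unlink(to));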
While we already provide functionality to look up a repository from a
worktree, we cannot do so the other way round. That is, given a
repository, we want to look up its worktree, if one actually exists.
Getting the worktree of a repository is useful when we want to retrieve
certain meta information, like the parent's location or the locked
status.
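A minimal usage sketch, assuming the new lookup is exposed as
`git_worktree_open_from_repository` and that `repo` is an already
opened `git_repository`; error handling is omitted:

    git_worktree *wt = NULL;

    /* Get a handle to the worktree backing the opened repository */
    if (git_worktree_open_from_repository(&wt, repo) == 0) {
        /* ... query meta information, e.g. the locked status ... */
        git_worktree_free(wt);
    }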
The functions `git_commit_header_field` and
`git_commit_extract_signature` both receive buffers used to hand back
the results to the user. These functions used to call
`git_buf_sanitize` on those buffers, but that is not the right thing to
do, as it merely initializes or NUL-terminates the passed buffers. As
we want to overwrite their contents, we instead have to call
`git_buf_clear` to completely reset them.
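A small sketch of the intended behaviour, assuming a valid `commit` and
ignoring error handling: the same buffer can now be handed to
consecutive calls and is fully reset each time instead of being
appended to.

    git_buf field = GIT_BUF_INIT;

    /* Each call clears the buffer before writing its result into it */
    git_commit_header_field(&field, commit, "author");
    git_commit_header_field(&field, commit, "committer");

    git_buf_free(&field);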
Implement a new function that determines whether a branch is checked
out in any repository connected to the current repository. In
particular, this is required to check whether, for a given repository
and branch, there exists any working tree connected to that repository
which references this branch.
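A minimal sketch of how this could be used, assuming the function is
exposed as `git_branch_is_checked_out` and takes a branch reference;
the branch name "topic" is hypothetical and errors are mostly ignored:

    /* Returns 1 if the branch "topic" is checked out in `repo` or in
     * any working tree connected to it, 0 otherwise. */
    static int topic_is_checked_out(git_repository *repo)
    {
        git_reference *branch = NULL;
        int checked_out = 0;

        if (git_branch_lookup(&branch, repo, "topic", GIT_BRANCH_LOCAL) == 0) {
            checked_out = git_branch_is_checked_out(branch);
            git_reference_free(branch);
        }

        return checked_out;
    }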
Implement `git_repository_head_for_worktree` and
`git_repository_head_detached_for_worktree` for directly accessing a
worktree's HEAD without opening it as a `git_repository` first.
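A brief usage sketch, assuming an opened parent `repo`, a linked
worktree named "hotfix" (a hypothetical name) and signatures mirroring
the existing `git_repository_head` variants; error handling is omitted:

    git_reference *head = NULL;

    /* Inspect the HEAD of the worktree "hotfix" without opening it */
    if (git_repository_head_detached_for_worktree(repo, "hotfix") == 0 &&
        git_repository_head_for_worktree(&head, repo, "hotfix") == 0) {
        /* ... e.g. read the branch via git_reference_name(head) ... */
        git_reference_free(head);
    }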
Implement the `git_worktree_prune` function. This function can be
used to delete working trees from a repository. Depending on the
flags passed to it, it deletes either only the working tree's gitdir
or both the gitdir and the working directory.
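A rough sketch under the assumption that the function accepts a flags
bitmask and that a flag along the lines of
`GIT_WORKTREE_PRUNE_WORKING_TREE` selects whether the checked-out
directory is removed as well; both the names and the exact signature
are assumptions:

    int error;

    /* Remove the worktree's gitdir together with its working directory */
    error = git_worktree_prune(wt, GIT_WORKTREE_PRUNE_WORKING_TREE);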
Working trees support locking by creating a file `locked`, which may
contain an optional reason, inside the tree's gitdir. Support this
feature by adding functions to get and set the locking status.
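A short sketch of the expected round trip, assuming getter/setter names
along the lines of `git_worktree_lock`, `git_worktree_is_locked` and
`git_worktree_unlock`, a valid `wt` handle and no error handling:

    git_buf reason = GIT_BUF_INIT;

    /* Lock the worktree with a reason, query it back, then unlock */
    git_worktree_lock(wt, "on a portable device");
    if (git_worktree_is_locked(&reason, wt) > 0)
        printf("locked: %s\n", reason.ptr);
    git_worktree_unlock(wt);

    git_buf_free(&reason);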
Add a new function that checks whether a given `struct
git_worktree` is valid. The validation includes checking that the
gitdir, parent directory and common directory are present.
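A tiny usage sketch, assuming a `git_worktree *wt` obtained from a
lookup:

    /* Fails if the gitdir, parent directory or common directory is
     * missing */
    if (git_worktree_validate(wt) < 0)
        fprintf(stderr, "worktree is broken: %s\n", giterr_last()->message);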
Introduce a new `struct git_worktree`, which holds information
about a possible working tree connected to a repository.
Introduce functions to allow opening working trees for a
repository.
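A sketch of how the new type could be enumerated and opened, assuming
functions named `git_worktree_list` and `git_worktree_lookup`, an
opened `repo`, and omitting error handling:

    git_strarray names = { NULL, 0 };
    size_t i;

    /* Enumerate the repository's working trees and open each by name */
    git_worktree_list(&names, repo);
    for (i = 0; i < names.count; i++) {
        git_worktree *wt = NULL;
        if (git_worktree_lookup(&wt, repo, names.strings[i]) == 0)
            git_worktree_free(wt);
    }

    git_strarray_free(&names);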
The commondir variable stores the path to the common directory.
The common directory is used to store objects and references
shared across multiple repositories. A current use case is the
newly introduced `git worktree` feature, which sets up a separate
working copy whose backing git object store and references are
located via the common directory.
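A minimal sketch, assuming the path is exposed through an accessor
named `git_repository_commondir` and that `repo` is an opened
repository:

    /* For a linked worktree this points into the parent repository's
     * gitdir, where the shared objects and references live. */
    printf("common directory: %s\n", git_repository_commondir(repo));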
Added `git_repository_submodule_cache_all` to initialize a cache of
submodules on the repository so that operations looking up N
submodules are O(N) and not O(N^2). Added a
`git_repository_submodule_cache_clear` function to remove the cache.
Also optimized the function that loads all submodules, as it was itself
O(N^2) w.r.t. the number of submodules, having to loop through the
`.gitmodules` file once per submodule. I changed it to process the
`.gitmodules` file once, into a map.
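A brief usage sketch, assuming an opened `repo` and a purely
hypothetical submodule path "vendor/libfoo"; error handling is omitted:

    git_submodule *sm = NULL;

    /* Build the submodule cache once up front ... */
    git_repository_submodule_cache_all(repo);

    /* ... so repeated lookups no longer re-parse .gitmodules each time */
    if (git_submodule_lookup(&sm, repo, "vendor/libfoo") == 0)
        git_submodule_free(sm);

    /* Drop the cache once the batch of lookups is done */
    git_repository_submodule_cache_clear(repo);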
Signed-off-by: David Turner <dturner@twosigma.com>
The function to write trees allocates a new buffer for each tree.
This causes performance problems when performing a lot of actions
that involve writing trees, e.g. when doing many merges. Fix the
issue by instead handing in a shared buffer, which is re-used across
calls without having to re-allocate for each one.