To determine if a repository is a worktree or not, we currently check
for the existence of a "gitdir" file inside of the repository's gitdir.
While this is sufficient for non-broken repositories, we have at least
one case of a subtly broken repository where there exists a gitdir file
inside of a gitmodule. This will cause us to misidentify the submodule
as a worktree.
While this is not really a fault of ours, we can do better here by
observing that a repository can only ever be a worktree iff its common
directory and dotgit directory are different. This allows us to make our
check whether a repo is a worktree or not more strict by doing a simple
string comparison of these two directories. This will also allow us to
do the right thing in the above case of a broken repository, as for
submodules these directories will be the same. At the same time, this
allows us to skip the `stat` check for the "gitdir" file for most
repositories.
The check whether a repository is a worktree or not is currently done
inside of `git_repository_open_ext`. As we want to extend this function
later on, pull it out into its own function `repo_is_worktree` to ease
working on it.
The out-parameters of `find_repo` containing found paths of a repository
are a tad confusing, as they are not as obvious as they could be. Rename
them like following to ease reading the code:
- `repo_path` -> `gitdir_path`
- `parent_path` -> `workdir_path`
- `link_path` -> `gitlink_path`
- `common_path` -> `commondir_path`
The `path` out-parameter of `find_repo` is being sanitized initially
such that we do not try to append to existing content. The sanitization
is done via `git_buf_free`, though, which forces us to needlessly
reallocate the buffer later in the function. Fix this by using
`git_buf_clear` instead.
While we already provide functions to get the current repository's HEAD,
it is quite involved to iterate over HEADs of both the repository and
all linked work trees. This commit implements a function
`git_repository_foreach_head`, which accepts a callback which is then
called for all HEAD files.
The functions `git_repository_head_for_worktree` and
`git_repository_detached_head_for_worktree` both implement their
own logic to read the HEAD reference file. Use the new function
`git_reference__read_head` instead to unify the code paths.
The function `read_worktree_head` has the logic embedded to construct
the path to `HEAD` in the work tree's git directory, which is quite
useful for other callers. Extract the logic into its own function to
make it reusable by others.
If trying to set the HEAD of a repository to another reference, we have
to check whether this reference is already checked out in another linked
work tree. If it is, we will refuse setting the HEAD and return an
error, but do not set a meaningful error message. Add one.
When opening a worktree via the gitdir of its parent repository
we fail to correctly set up the worktree's working directory. The
problem here is two-fold: we first fail to see that the gitdir
actually is a gitdir of a working tree and then subsequently
fail to determine the working tree location from the gitdir.
The first problem of not noticing a gitdir belongs to a worktree
can be solved by checking for the existence of a `gitdir` file in
the gitdir. This file points back to the gitlink file located in
the working tree's working directory. As this file only exists
for worktrees, it should be sufficient indication of the gitdir
belonging to a worktree.
The second problem, that is determining the location of the
worktree's working directory, can then be solved by reading the
`gitdir` file in the working directory's gitdir. When we now
resolve relative paths and strip the final `.git` component, we
have the actual worktree's working directory location.
The `path_repository` variable is actually confusing to think
about, as it is not always clear what the repository actually is.
It may either be the path to the folder containing worktree and
.git directory, the path to .git itself, a worktree or something
entirely different. Actually, the intent of the variable is to
hold the path to the gitdir, which is either the .git directory
or the bare repository.
Rename the variable to `gitdir` to avoid confusion. While at it,
also rename `path_gitlink` to `gitlink` to improve consistency.
If a branch is already checked out in a working tree we are not
allowed to check out that branch in another repository. Introduce
this restriction when setting a repository's HEAD.
Implement `git_repository_head_for_worktree` and
`git_repository_head_detached_for_worktree` for directly accessing a
worktree's HEAD without opening it as a `git_repository` first.
A repository's configuartion file can always be found in the
GIT_COMMON_DIR, which has been newly introduced. For normal
repositories this does change nothing, but for working trees this
change allows to access the shared configuration file.
The recent introduction of the commondir variable of a repository
requires callers to distinguish whether their files are part of
the dot-git directory or the common directory shared between
multpile worktrees. In order to take the burden from callers and
unify knowledge on which files reside where, the
`git_repository_item_path` function has been introduced which
encapsulate this knowledge.
Modify most existing callers of `git_repository_path` to use
`git_repository_item_path` instead, thus making them implicitly
aware of the common directory.
The commondir variable stores the path to the common directory.
The common directory is used to store objects and references
shared across multiple repositories. A current use case is the
newly introduced `git worktree` feature, which sets up a separate
working copy, where the backing git object store and references
are pointed to by the common directory.
Added `git_repository_submodule_cache_all` to initialze a cache of
submodules on the repository so that operations looking up N
submodules are O(N) and not O(N^2). Added a
`git_repository_submodule_cache_clear` function to remove the cache.
Also optimized the function that loads all submodules as it was itself
O(N^2) w.r.t the number of submodules, having to loop through the
`.gitmodules` file once per submodule. I changed it to process the
`.gitmodules` file once, into a map.
Signed-off-by: David Turner <dturner@twosigma.com>
Error messages should be sentence fragments, and therefore:
1. Should not begin with a capital letter,
2. Should not conclude with punctuation, and
3. Should not end a sentence and begin a new one
When trying to find a discovery, we walk up the directory
structure checking if there is a ".git" file or directory and, if
so, check its validity. But in the case that we've got a ".git"
file, we do not want to unconditionally assume that the file is
in fact a ".git" file and treat it as such, as we would error out
if it is not.
Fix the issue by only treating a file as a gitlink file if it
ends with "/.git". This allows users of the function to discover
a repository by handing in any path contained inside of a git
repository.
The existing code would set a namespace of "" (empty string) with
GIT_NAMESPACE unset. In a repository where refs/heads/namespaces/
exists, that can produce incorrect results. Detect that case and avoid
setting the namespace at all.
Since that makes the last assignment to error conditional, and the
previous assignment can potentially get GIT_ENOTFOUND, set error to 0
explicitly to prevent the call from incorrectly failing with
GIT_ENOTFOUND.
find_repo had a complex loop and heavily nested conditionals, making it
difficult to follow. Simplify this as much as possible:
- Separate assignments from conditionals.
- Check the complex loop condition in the only place it can change.
- Break out of the loop on error, rather than going through the rest of
the loop body first.
- Handle error cases by immediately breaking, rather than nesting
conditionals.
- Free repo_link unconditionally on the way out of the function, rather
than in multiple places.
- Add more comments on the remaining complex steps.
git_repository_open_ext provides parameters for the start path, whether
to search across filesystems, and what ceiling directories to stop at.
git commands have standard environment variables and defaults for each
of those, as well as various other parameters of the repository. To
avoid duplicate environment variable handling in users of libgit2, add a
GIT_REPOSITORY_OPEN_FROM_ENV flag, which makes git_repository_open_ext
automatically handle the appropriate environment variables. Commands
that intend to act just like those built into git itself can use this
flag to get the expected default behavior.
git_repository_open_ext with the GIT_REPOSITORY_OPEN_FROM_ENV flag
respects $GIT_DIR, $GIT_DISCOVERY_ACROSS_FILESYSTEM,
$GIT_CEILING_DIRECTORIES, $GIT_INDEX_FILE, $GIT_NAMESPACE,
$GIT_OBJECT_DIRECTORY, and $GIT_ALTERNATE_OBJECT_DIRECTORIES. In the
future, when libgit2 gets worktree support, git_repository_open_env will
also respect $GIT_WORK_TREE and $GIT_COMMON_DIR; until then,
git_repository_open_ext with this flag will error out if either
$GIT_WORK_TREE or $GIT_COMMON_DIR is set.
GIT_REPOSITORY_OPEN_NO_SEARCH does not search up through parent
directories, but still tries the specified path both directly and with
/.git appended. GIT_REPOSITORY_OPEN_BARE avoids appending /.git, but
opens the repository in bare mode even if it has a working directory.
To support the semantics git uses when given $GIT_DIR in the
environment, provide a new GIT_REPOSITORY_OPEN_NO_DOTGIT flag to not try
appending /.git.
git only checks ceiling directories when its search ascends to a parent
directory. A ceiling directory matching the starting directory will not
prevent git from finding a repository in the starting directory or a
parent directory. libgit2 handled the former case correctly, but
differed from git in the latter case: given a ceiling directory matching
the starting directory, but no repository at the starting directory,
libgit2 would stop the search at that point rather than finding a
repository in a parent directory.
Test case using git command-line tools:
/tmp$ git init x
Initialized empty Git repository in /tmp/x/.git/
/tmp$ cd x/
/tmp/x$ mkdir subdir
/tmp/x$ cd subdir/
/tmp/x/subdir$ GIT_CEILING_DIRECTORIES=/tmp/x git rev-parse --git-dir
fatal: Not a git repository (or any of the parent directories): .git
/tmp/x/subdir$ GIT_CEILING_DIRECTORIES=/tmp/x/subdir git rev-parse --git-dir
/tmp/x/.git
Fix the testsuite to test this case (in one case fixing a test that
depended on the current behavior), and then fix find_repo to handle this
case correctly.
In the process, simplify and document the logic in find_repo():
- Separate the concepts of "currently checking a .git directory" and
"number of iterations left before going further counts as a search"
into two separate variables, in_dot_git and min_iterations.
- Move the logic to handle in_dot_git and append /.git to the top of the
loop.
- Only search ceiling_dirs and find ceiling_offset after running out of
min_iterations; since ceiling_offset only tracks the longest matching
ceiling directory, if ceiling_dirs contained both the current
directory and a parent directory, this change makes find_repo stop the
search at the parent directory.
Differentiate between the ref_name used to create an annotated_commit
(that can subsequently be used to look up the reference) and the
description that we resolved this with (which _cannot_ be looked up).
The description is used for things like reflogs (and may be a ref name,
and ID something that we revparsed to get here), while the ref name must
actually be a reference name, and is used for things like rebase to
return to the initial branch.
Include dotfiles when copying template directory, which will handle
both a template directory itself that begins with a dotfile, and
any dotfiles inside the directory.
Check that the repository directory is beneath the workdir before
adding it to the list of reserved paths. If it is not, then there
is no possibility of checking out files into it, and it should not
be a reserved word.
This is a particular problem with submodules where the repo directory
may be in the super's .git directory.
Untangle git_futils_mkdir from git_futils_mkdir_ext - the latter
assumes that we own everything beneath the base, as if it were
being called with a base of the repository or working directory,
and is tailored towards checkout and ensuring that there is no
bogosity beneath the base that must be cleaned up.
This is (at best) slow and (at worst) unsafe in the larger context
of a filesystem where we do not own things and cannot do things like
unlink symlinks that are in our way.
This is something we do on re-init but not when opening a
repository. This hasn't particularly mattered up to now as the version
has been 0 ever since the first release of git, but the times, they're
a-changing and we will soon see version 1 in the wild. We need to make
sure we don't open those.
Having this cache and giving them out goes against our multithreading
guarantees and it makes it impossible to use submodules in a
multi-threaded environment, as any thread can ask for a refresh which
may reallocate some string in the submodule struct which we've accessed
in a different one via a getter.
This makes the submodules behave more like remotes, where each object is
created upon request and not shared except explicitly by the user. This
means that some tests won't pass yet, as they assume they can affect the
submodule objects in the cache and that will affect later operations.