When trying to find a discovery, we walk up the directory
structure checking if there is a ".git" file or directory and, if
so, check its validity. But in the case that we've got a ".git"
file, we do not want to unconditionally assume that the file is
in fact a ".git" file and treat it as such, as we would error out
if it is not.
Fix the issue by only treating a file as a gitlink file if it
ends with "/.git". This allows users of the function to discover
a repository by handing in any path contained inside of a git
repository.
The existing code would set a namespace of "" (empty string) with
GIT_NAMESPACE unset. In a repository where refs/heads/namespaces/
exists, that can produce incorrect results. Detect that case and avoid
setting the namespace at all.
Since that makes the last assignment to error conditional, and the
previous assignment can potentially get GIT_ENOTFOUND, set error to 0
explicitly to prevent the call from incorrectly failing with
GIT_ENOTFOUND.
find_repo had a complex loop and heavily nested conditionals, making it
difficult to follow. Simplify this as much as possible:
- Separate assignments from conditionals.
- Check the complex loop condition in the only place it can change.
- Break out of the loop on error, rather than going through the rest of
the loop body first.
- Handle error cases by immediately breaking, rather than nesting
conditionals.
- Free repo_link unconditionally on the way out of the function, rather
than in multiple places.
- Add more comments on the remaining complex steps.
git_repository_open_ext provides parameters for the start path, whether
to search across filesystems, and what ceiling directories to stop at.
git commands have standard environment variables and defaults for each
of those, as well as various other parameters of the repository. To
avoid duplicate environment variable handling in users of libgit2, add a
GIT_REPOSITORY_OPEN_FROM_ENV flag, which makes git_repository_open_ext
automatically handle the appropriate environment variables. Commands
that intend to act just like those built into git itself can use this
flag to get the expected default behavior.
git_repository_open_ext with the GIT_REPOSITORY_OPEN_FROM_ENV flag
respects $GIT_DIR, $GIT_DISCOVERY_ACROSS_FILESYSTEM,
$GIT_CEILING_DIRECTORIES, $GIT_INDEX_FILE, $GIT_NAMESPACE,
$GIT_OBJECT_DIRECTORY, and $GIT_ALTERNATE_OBJECT_DIRECTORIES. In the
future, when libgit2 gets worktree support, git_repository_open_env will
also respect $GIT_WORK_TREE and $GIT_COMMON_DIR; until then,
git_repository_open_ext with this flag will error out if either
$GIT_WORK_TREE or $GIT_COMMON_DIR is set.
GIT_REPOSITORY_OPEN_NO_SEARCH does not search up through parent
directories, but still tries the specified path both directly and with
/.git appended. GIT_REPOSITORY_OPEN_BARE avoids appending /.git, but
opens the repository in bare mode even if it has a working directory.
To support the semantics git uses when given $GIT_DIR in the
environment, provide a new GIT_REPOSITORY_OPEN_NO_DOTGIT flag to not try
appending /.git.
git only checks ceiling directories when its search ascends to a parent
directory. A ceiling directory matching the starting directory will not
prevent git from finding a repository in the starting directory or a
parent directory. libgit2 handled the former case correctly, but
differed from git in the latter case: given a ceiling directory matching
the starting directory, but no repository at the starting directory,
libgit2 would stop the search at that point rather than finding a
repository in a parent directory.
Test case using git command-line tools:
/tmp$ git init x
Initialized empty Git repository in /tmp/x/.git/
/tmp$ cd x/
/tmp/x$ mkdir subdir
/tmp/x$ cd subdir/
/tmp/x/subdir$ GIT_CEILING_DIRECTORIES=/tmp/x git rev-parse --git-dir
fatal: Not a git repository (or any of the parent directories): .git
/tmp/x/subdir$ GIT_CEILING_DIRECTORIES=/tmp/x/subdir git rev-parse --git-dir
/tmp/x/.git
Fix the testsuite to test this case (in one case fixing a test that
depended on the current behavior), and then fix find_repo to handle this
case correctly.
In the process, simplify and document the logic in find_repo():
- Separate the concepts of "currently checking a .git directory" and
"number of iterations left before going further counts as a search"
into two separate variables, in_dot_git and min_iterations.
- Move the logic to handle in_dot_git and append /.git to the top of the
loop.
- Only search ceiling_dirs and find ceiling_offset after running out of
min_iterations; since ceiling_offset only tracks the longest matching
ceiling directory, if ceiling_dirs contained both the current
directory and a parent directory, this change makes find_repo stop the
search at the parent directory.
Differentiate between the ref_name used to create an annotated_commit
(that can subsequently be used to look up the reference) and the
description that we resolved this with (which _cannot_ be looked up).
The description is used for things like reflogs (and may be a ref name,
and ID something that we revparsed to get here), while the ref name must
actually be a reference name, and is used for things like rebase to
return to the initial branch.
Include dotfiles when copying template directory, which will handle
both a template directory itself that begins with a dotfile, and
any dotfiles inside the directory.
Check that the repository directory is beneath the workdir before
adding it to the list of reserved paths. If it is not, then there
is no possibility of checking out files into it, and it should not
be a reserved word.
This is a particular problem with submodules where the repo directory
may be in the super's .git directory.
Untangle git_futils_mkdir from git_futils_mkdir_ext - the latter
assumes that we own everything beneath the base, as if it were
being called with a base of the repository or working directory,
and is tailored towards checkout and ensuring that there is no
bogosity beneath the base that must be cleaned up.
This is (at best) slow and (at worst) unsafe in the larger context
of a filesystem where we do not own things and cannot do things like
unlink symlinks that are in our way.
This is something we do on re-init but not when opening a
repository. This hasn't particularly mattered up to now as the version
has been 0 ever since the first release of git, but the times, they're
a-changing and we will soon see version 1 in the wild. We need to make
sure we don't open those.
Having this cache and giving them out goes against our multithreading
guarantees and it makes it impossible to use submodules in a
multi-threaded environment, as any thread can ask for a refresh which
may reallocate some string in the submodule struct which we've accessed
in a different one via a getter.
This makes the submodules behave more like remotes, where each object is
created upon request and not shared except explicitly by the user. This
means that some tests won't pass yet, as they assume they can affect the
submodule objects in the cache and that will affect later operations.
We do not always want to put the id directly into the reflog, but we
want to speicfy what a user typed. For this use-case we provide
annotated version of a few functions which let the caller specify what
user-friendly name was used when asking for the operation.
This changes the get_entry() method to return a refcounted version of
the config entry, which you have to free when you're done.
This allows us to avoid freeing the memory in which the entry is stored
on a refresh, which may happen at any time for a live config.
For this reason, get_string() has been forbidden on live configs and a
new function get_string_buf() has been added, which stores the string in
a git_buf which the user then owns.
The functions which parse the string value takea advantage of the
borrowing to parse safely and then release the entry.
We want to use the "checkout: moving from ..." message in order to let
git know when a change of branch has happened. Make the convenience
functions for this goal write this message.
The signature for the reflog is not something which changes
dynamically. Almost all uses will be NULL, since we want for the
repository's default identity to be used, making it noise.
In order to allow for changing the identity, we instead provide
git_repository_set_ident() and git_repository_ident() which allow a user
to override the choice of signature.
A repository can have multiple "reserved names" now, not just
a single "short name" for the repository folder itself. Refactor
to include a git_repository__reserved_names that returns all the
reserved names for a repository.
This is supported by the underlying set_index() implementation
and setting the repository index to NULL is recommended by the
git_repository_set_bare() documentation.
Disallow:
1. paths with trailing dot
2. paths with trailing space
3. paths with trailing colon
4. paths that are 8.3 short names of .git folders ("GIT~1")
5. paths that are reserved path names (COM1, LPT1, etc).
6. paths with reserved DOS characters (colons, asterisks, etc)
These paths would (without \\?\ syntax) be elided to other paths - for
example, ".git." would be written as ".git". As a result, writing these
paths literally (using \\?\ syntax) makes them hard to operate with from
the shell, Windows Explorer or other tools. Disallow these.
When using a bare repo with an index, libgit2 attempts to read
files from the index. It caches those files based on the path
to the file, specifically the path to the directory that contains
the file.
If there is no working directory, we use `git_path_dirname_r` to
get the path to the containing directory. However, for the
`.gitattributes` file in the root of the repository, this ends up
normalizing the containing path to `"."` instead of the empty
string and the lookup the `.gitattributes` data fails.
This adds a test of attribute lookups on bare repos and also
fixes the problem by simply rewriting `"."` to be `""`.
Teach git_repository_init_ext to use relative paths for the gitlink
to the work directory. This is used when creating a sub repository
where the sub repository resides in the parent repository's
.git directory.