git hardocodes these as objects which exist regardless of whether they
are in the odb and uses them in the shell interface as a way of
expressing the lack of a blob or tree for one side of e.g. a diff.
In the library we use each language's natural way of declaring a lack of
value which makes a workaround like this unnecessary. Since git uses it,
it does however mean each shell application would need to perform this
check themselves.
This makes it common work across a range of applications and an issue
with compatibility with git, which fits right into what the library aims
to provide.
Thus we introduce the hard-coded empty blob and tree in the odb
frontend. These hard-coded objects are checked for before going to the
backends, but after the cache check, which means the second time they're
used, they will be treated as normal cached objects instead of creating
new ones.
If the remote is anonymous, then we cannot check for any configuration,
as there is no name. Check for this before we try to use the name, which
may be a NULL pointer.
This fixes#2697.
This function has one output but can match multiple files, which can be
unexpected for the user, which would usually path the exact path of the
file he wants the status of.
We cannot know from looking at .gitmodules whether a directory is a
submodule or not. We need the index or tree we are comparing against to
tell us. Otherwise we have to assume the entry in .gitmodules is stale
or otherwise invalid.
Thus we pass the index of the repository into the workdir iterator, even
if we do not want to compare against it. This follows what git does,
which even for `git diff <tree>`, it will consider staged submodules as
such.
We consider an entry in .gitmodules to mean that we have a submodule at
a particular path, even if HEAD^{tree} and the index do not contain any
reference to it.
We should ignore that submodule entry and simply consider that path to
be a regular directory.
We currently consider CR to start the end of the line, but that means
that we miss cases with CR CR LF which can be used with git to match
files whose names have CR at the end of their names.
The fix from the patch comes from Russell's comment in the issue.
This fixes#2536.
* Error-handling is cleaned up to only let a file-not-found error
through, not other sorts of errors. And when a file-not-found
error happens, we clean up the error.
* Test now checks that file-not-found introduces no error. And
other minor cleanups.
An anonymous remote wouldn't create remote-tracking branches, so testing
we don't create them for TAGS_ALL is nonsensical. Furthermore, the name
of the supposed remote-tracking branch was also not one which would have
been created had it had a name.
Give the remote a name and test that we only create the tags when we
pass TAGS_ALL and that we do create the remote-branch branch when given
TAGS_AUTO.
When we update FETCH_HEAD we check whether the remote is the current
branch's upstream remote. The code does not check whether the current
refspec is relevant for this reference but always tries to perform the
reverse transformation, which causes it to error out if the refspec
doesn't match the reference.
Thanks to Pierre-Olivier Latour for the reproduction recipe.
For example, if you have
[include]
path = foo
and foo didn't exist, git_config_open_ondisk() would just give up
on the rest of the file. Now it ignores the unresolved include
without error and continues reading the rest of the file.
Already cherry-picked commits should not be re-included. If all changes
included in a commit exist in the upstream, then we should error with
GIT_EAPPLIED.
`git_rebase_next` will apply the next patch (or cherry-pick)
operation, leaving the results checked out in the index / working
directory so that consumers can resolve any conflicts, as appropriate.
Remote objects are not meant to be changed from under the user. We did
this in rename, but only the name and left the refspecs, such that a
save would save the wrong refspecs (and a fetch and anything else would
use the wrong refspecs).
Instead, let's simply take a name and not change any loaded remote from
under the user.
This function does not in fact tell us anything, as almost anything with
a colon in it is a valid rsync-style SSH path; it can not tell us that
we do not support ftp or afp or similar as those are still valid SSH
paths and we do support that.
Git for Windows will handle UNC paths only when in forward-slash
format, eg "//server/path". When given a UNC path as a remote,
rewrite standard format ("\\server\path") into this ridiculous
format.
The entry_count field is the amount of index entries covered by a
particular cache entry, that is how many files are there (recursively)
under a particular directory.
The current code that attemps to do this is severely defincient and is
trying to count the amount of children, which always comes up to zero.
We don't even need to recount, since we have the information during the
cache creation. We can take that number and keep it, as we only ever
invalidate or replace.
Keeping the cache around after read-tree is only one part of the
optimisation opportunities. In order to share the cache between program
instances, we need to write the TREE extension to the index.
Do so, taking the opportunity to rename 'entries' to 'entry_count' to
match the name given in the format description. The included test is
rather trivial, but works as a sanity check.
We don't need the remote loaded, and the function extracted both of
these from the git_remote in order to do its work, so let's remote a
step and not ask for the loaded remote at all.
This fixes#2390.
A transaction allows you to lock multiple references and set up changes
for them before applying the changes all at once (or as close as the
backend supports).
This can be used for replication purposes, or for making sure some
operations run when the reference is locked and thus cannot be changed.
When a list of refspecs is passed to fetch (what git would consider
refspec passed on the command-line), we not only need to perform the
updates described in that refspec, but also update the remote-tracking
branch of the fetched remote heads according to the remote's configured
refspecs.
These "fetches" are not however to be written to FETCH_HEAD as they
would be duplicate data, and it's not what the user asked for.
With opportunistic ref updates, git has introduced the concept of having
base refspecs *and* refspecs that are active for a particular fetch.
Let's start by letting the user override the refspecs for download.
When we describe the workdir, we perform a describe on HEAD and then
check to see if the worktree is dirty. If it is and we have a suffix
string, we append that to the buffer.
Instead of printing out to the buffer inside the information-gathering
phase, write the data to a intermediate result structure.
This allows us to split the options into gathering options and
formatting options, simplifying the gathering code.
Instead of spreading the data in function arguments, some of which
aren't used for ssh and having a struct only for ssh, use a struct for
both, using a common parent to pass to the callback.
We should let the user decide whether to cancel the connection or not
regardless of whether our checks have decided that the certificate is
fine. We provide our own assessment to the callback to let the user fall
back to our checks if they so desire.
If the certificate validation fails (or always in the case of ssh),
let the user decide whether to allow the connection.
The data structure passed to the user is the native certificate
information from the underlying implementation, namely OpenSSL or
WinHTTP.
When using a bare repo with an index, libgit2 attempts to read
files from the index. It caches those files based on the path
to the file, specifically the path to the directory that contains
the file.
If there is no working directory, we use `git_path_dirname_r` to
get the path to the containing directory. However, for the
`.gitattributes` file in the root of the repository, this ends up
normalizing the containing path to `"."` instead of the empty
string and the lookup the `.gitattributes` data fails.
This adds a test of attribute lookups on bare repos and also
fixes the problem by simply rewriting `"."` to be `""`.
A signature is made up of a non-empty name and a non-empty email so
let's validate that. This also brings us more in line with git, which
also rejects ident with an empty email.
Teach git_repository_init_ext to use relative paths for the gitlink
to the work directory. This is used when creating a sub repository
where the sub repository resides in the parent repository's
.git directory.
When the fetch refspec does not include the remote's default branch, it
indicates an error in user expectations or programmer error. Error out
in that case.
This lets us get rid of the dummy refspec which can never work as its
zeroed out. In the cases where we did not find a default branch, we set
HEAD detached immediately, which lets us refactor the "normal" path,
removing `found_branch`.
Add tests for the case when there are no branches on the remote and when
HEAD is detached but has the id of a non-branch. In both of these cases,
we should return ENOTFOUND.
It does the same as git_remote_supported_url() but has a name which
implies we'd check the URL for correctness while we're simply looking at
the scheme and looking it up in our lists.
While here, fix up the tests so we check all the combination of what's
supported.
Our mkdir helper was failing is a parent directory was not
accessible even if the child directory could be created.
This changes the helper to keep trying child directories
even when the parent is unwritable.
The old `allocfmt` is of no use to callers, as they are not able to free
the returned buffer. Export a new API that returns a static string that
doesn't need to be freed.
The online::push::notes test pushes a note but leaves it hanging
around for other tests to stumble across when they're validating
that they're seeing the refs they expect to see. Clean it up on
exit.
* Move the transport registration mechanisms into a new header under
'sys/' because this is advanced stuff.
* Remove the 'priority' argument from the registration as it adds
unnecessary complexity. (Since transports cannot decline to operate,
only the highest priority transport is ever executed.) Users who
require per-priority transports can implement that in their custom
transport themselves.
* Simplify registration further by taking a scheme (eg "http") instead
of a prefix (eg "http://").
In the check for multiline, we traverse the backslashes from the end
backwards and int the end assert that we haven't gone past the beginning
of the line. We make sure of this in the loop condition, but we also
check in the return value.
However, for certain configurations, a line in a multiline variable
might be empty to aid formatting. In that case, 'end' == 'start', since
we ended up looking at the first char which made it a multiline.
There is no need for the (end > start) check in the return, since the
loop guarantees we won't go further back than the first char in the
line, and we do accept the first char to be the final backslash.
This fixes#2483.
`git help ignore` has this to say about trailing slashes:
> If the pattern ends with a slash, it is removed for the purpose of
> the following description, but it would only find a match with a
> directory. In other words, foo/ will match a directory foo and
> paths underneath it, but will not match a regular file or a
> symbolic link foo (this is consistent with the way how pathspec
> works in general in Git).
Sure enough, having manually performed the same steps as this test,
`git status` tells us the following:
# On branch master
#
# Initial commit
#
# Changes to be committed:
# (use "git rm --cached <file>..." to unstage)
#
# new file: force.txt
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# ../.gitignore
# child1/
# child2/
i.e. neither child1 nor child2 is ignored.
When writing 'bin/*' in the rules, this means we ignore very file inside
bin/ individually, but do not ignore the directory itself. Thus the
status listing should list both files under bin/, one untracked and one
ignored.
We always calculate multiple merge bases, but up to now we had only
exposed the "best" merge base.
Introduce git_oidarray which analogously to git_strarray lets us return
multiple ids.
git allows you to set which paths to use for the git server programs
when connecting over ssh; and we want to provide something similar.
We do this by providing a factory function which can be set as the
remote's transport callback which will set the given paths upon
creation.
We used to assume a refspec would only have an asterisk in the middle of
their respective pattern. This has not been a valid assumption for some
time now with git.
Instead of assuming where the asterisk is going to be, change the logic
to treat each pattern as having two halves with a replacement bit in the
middle, where the asterisk is.
Move the definition of git_thread_yield() to the test which needs it and
add the correct definition for it for FreeBSD and derivatives.
Original patch adding FreeBSD and derivatives by @jacquesg.
In order to connect to a remote server, we need to provide a path to the
repository we're interested in. Consider the lack of path in the url an
error.
As git_clone now has callbacks to configure the details of the
repository and remote, remove the lower-level functions from the public
API, as they lack some of the logic from git_clone proper.
git_checkout_index can now check out other git_index's (that are not
necessarily the repository index). This allows checkout_index to use
the repository's index for stat cache information instead of the index
data being checked out. git_merge and friends now check out their
indexes directly instead of trying to blend it into the running index.
To make sure that items returned from pool allocations are aligned
on nice boundaries, this rounds up all pool allocation sizes to a
multiple of 8. This adds a small amount of overhead to each item.
The rounding up could be made optional with an extra parameter to
the pool initialization that turned on rounding only for pools
where item alignment actually matters, but I think for the extra
code and complexity that would be involved, that it makes sense
just to burn a little bit of extra memory and enable this all the
time.
git_remote_set_transport now takes a transport factory rather than a transport
git_clone_options now allows the caller to specify a remote creation callback
For urls where we do not specify a username, we must handle the case
where the ssh transport asks us for the username.
Test also that switching username fails.
In order to know which authentication methods are supported/allowed by
the ssh server, we need to send a NONE auth request, which needs a
username associated with it.
Most ssh server implementations do not allow switching the username
between authentication attempts, which means we cannot use a dummy
username and then switch. There are two ways around this.
The first is to use a different connection, which an earlier commit
implements, but this increases how long it takes to get set up, and
without knowing the right username, we cannot guarantee that the
list we get in response is the right one.
The second is what's implemented here: if there is no username specified
in the url, ask for it first. We can then ask for the list of auth
methods and use the user's credentials in the same connection.
The checkout::conflict type conflict tests were failing because
they were overly assertive about the resultant mode, testing
group & other bits, which failed miserably for people who had a
umask less restrictive than 022. Only test the resultant owner bits.
Git for Windows 1.9.4 changed the behavior when the text=auto
attribute is specified and core.autocrlf=false. Previous observed
behavior would *not* filter files when going into the working
directory, the new behavior *does* filter. Update our behavior to match.
When checking out files, we're performing conversion into the user's
native line endings, but we only want to do it for files which have
consistent line endings. Refuse to perform the conversion for mixed-EOL
files.
The CRLF->LF filter is left as-is, as that conversion is considered to be
normalization by git and should force a conversion of the line endings.
Opening the same repository multiple times will currently open the same
file multiple times, as well as map the same region of the file multiple
times. This is not necessary, as the packfile data is immutable.
Instead of opening and closing packfiles directly, introduce an
indirection and allocate packfiles globally. This does mean locking on
each packfile open, but we already use this lock for the global mwindow
list so it doesn't introduce a new contention point.
Before calling the credentials callback, ask the sever which
authentication methods it supports and report that to the user, instead
of simply reporting everything that the transport supports.
In case of an error, we do fall back to listing all of them.
Finding a filename in a vector means we need to resort it every time we
want to read from it, which includes every time we want to write to it
as well, as we want to find duplicate keys.
A hash-map fits what we want to do much more accurately, as we do not
care about sorting, but just the particular filename.
We still keep removed entries around, as the interface let you assume
they were going to be around until the treebuilder is cleared or freed,
but in this case that involves an append to a vector in the filter case,
which can now fail.
The only time we care about sorting is when we write out the tree, so
let's make that the only time we do any sorting.
There is no reason why we need to use a callback here. A string array
fits better with the usage, as this is not an event and we don't need
anything from the user.
Inside `git_remote_load`, the calls to `get_optional_config` use
`giterr_clear` to unset any errors that are set due to missing config
keys. If neither a fetch nor a push url config was found for a remote,
we should set an error again.
This adds another assertion to ensure that the reference name inside
the git_reference struct returned by `git_branch_create` is returned as
precomposed if `core.precomposeunicode` is enabled.
If requested, git_clone_local_into() will try to link the object files
instead of copying them.
This only works on non-Windows (since it doesn't have this) when both
are on the same filesystem (which are unix semantics).
If you enabled core.safecrlf on an LF-ending platform, we would
error even for files with all LFs. We should only be warning on
irreversible mappings, I think.
There are a number of tests that modify the global or system
search paths during the tests. This adds a helper function to
make it easier to restore those paths and makes sure that they
are getting restored in a manner that preserves test isolation.
It seems that with the various recent changes to reference updating
and reflog writing, that the thread safety of refdb updates has
been reduced (either that or it was never thread safe and the
window for error has increased). Either way, this test is now
sometimes segfaulting which is no good, so let's disable the test
for now. We don't really make any public promises about thread
safety for this type of operation, so I think this is acceptable,
at least in the short term.
Only on a filesystem that is composed/decomposed insensitive,
should be testing that a branch can be looked up by the opposite
form and still work correctly.
One of the test helpers provides a quick way for looking up a
boolean key. But if the key way missing completely, the check
would actually raise an error. Given the way we use this helper,
if the key is missing, this should just return false, I think.
When using Iconv to convert unicode data and iconv doesn't like
the source data (because it thinks that it's not actual UTF-8),
instead of stopping the operation, just use the unconverted data.
This will generally do the right thing on the filesystem, since
that is the source of the non-UTF-8 path data anyhow.
This adds some tests for creating and looking up branches with
messy Unicode names. Also, this takes the helper function that
was previously internal to `git_repository_init` and makes it
into `git_path_does_fs_decompose_unicode` which is a useful in
tests to understand what the expected results should be.
This adds in missing calls to `git_buf_sanitize` and fixes a
number of places where `git_buf` APIs could inadvertently write
NUL terminator bytes into invalid buffers. This also changes the
behavior of `git_buf_sanitize` to NUL terminate a buffer if it can
and of `git_buf_shorten` to do nothing if it can.
Adds tests of filtering code with zeroed (i.e. unsanitized) buffer
which was previously triggering a segfault.
Diff and status do not want core.safecrlf to actually raise an
error regardless of the setting, so this extends the filter API
with an additional options flags parameter and adds a flag so that
filters can be applied with GIT_FILTER_OPT_ALLOW_UNSAFE, indicating
that unsafe filter application should be downgraded from a failure
to a warning.
The diff code was using an "ignored_prefix" directory to track if
a parent directory was ignored that contained untracked files
alongside tracked files. Unfortunately, when negative ignore rules
were used for directories inside ignored parents, the wrong rules
were applied to untracked files inside the negatively ignored
child directories.
This commit moves the logic for ignore containment into the workdir
iterator (which is a better place for it), so the ignored-ness of
a directory is contained in the frame stack during traversal. This
allows a child directory to override with a negative ignore and yet
still restore the ignored state of the parent when we traverse out
of the child.
Along with this, there are some problems with "directory only"
ignore rules on container directories. Given "a/*" and "!a/b/c/"
(where the second rule is a directory rule but the first rule is
just a generic prefix rule), then the directory only constraint
was having "a/b/c/d/file" match the first rule and not the second.
This was fixed by having ignore directory-only rules test a rule
against the prefix of a file with LEADINGDIR enabled.
Lastly, spot checks for ignores using `git_ignore_path_is_ignored`
were tested from the top directory down to the bottom to deal with
the containment problem, but this is wrong. We have to test bottom
to top so that negative subdirectory rules will be checked before
parent ignore rules.
This does change the behavior of some existing tests, but it seems
only to bring us more in line with core Git, so I think those
changes are acceptable.
We assume that everything under GIT_DIR/objects/ is a directory. This is
not necessarily the case if some process left a stray file in there.
Check beforehand if we do have a directory and ignore the entry
otherwise.
There are a few tests that set up a fake home directory and a
fake GLOBAL search path so that we can test things in global
ignore or attribute or config files. This cleans up that code to
work more robustly even if there is a test failure. This also
fixes some valgrind warnings where scanning search paths for
separators could end up doing a little bit of sketchy data access
when coming to the end of search list.
This is a proposed adjustment to the trace APIs. This makes the
trace levels into a bitmask so that they can be selectively enabled
and adds a callback-level payload, plus a message-level payload.
This makes it easier for me to a GIT_TRACE_PERF callbacks that
are simply bypassed if the PERF level is not set.
This adds an option to refresh the stat cache while generating
status. It also rips out the GIT_PERF stuff I had an makes use
of the trace API to keep statistics about what happens during diff.
When diff is scanning the working directory, if it finds a file
where it is not sure if the index entry matches the working dir,
it will recalculate the OID (which is pretty expensive). This
adds a new flag to diff so that if the OID calculation finds that
the file actually has not changed (i.e. just the modified time was
altered or such), then it will refresh the stat cache in the index
so that future calls to diff will not have to check the oid again.
When we think the stat cache in the index seems valid and the size
or mode of a file has definitely changed, then don't bother trying
to recalculate the OID of the workdir bits to confirm that it is
modified - just accept that it is modified.
This can result in files that show as modified with no actual diff,
but the behavior actually appears to match Git on the command line.
This also includes a minor optimization to not perform a submodule
lookup on the ".git" directory itself.
The current version of the commit creation and amend function are unsafe
to use when passing the update_ref parameter, as they do not check that
the reference at the moment of update points to what the user expects.
Make sure that we're moving history forward when we ask the library to
update the reference for us by checking that the first parent of the new
commit is the current value of the reference. We also make sure that the
ref we're updating hasn't moved between the read and the write.
Similarly, when amending a commit, make sure that the current tip of the
branch is the commit we're amending.
There is an interesting difference with core Git here, though.
Because libgit2 will do rename detection with the working directory,
in the last case where the HEAD and the working directory both
have the decomposed data and the index has the composed data, we
generate a single status record with two renames whereas Git will
generate one rename (head to index) and one untracked file.
The current FETCH_HEAD parsing code assumes that a quote must end the
branch name. Git however allows for quotes as part of a branch name,
which causes us to consider the FETCH_HEAD file as invalid.
Instead of searching for a single quote char, search for a quote char
followed by SP, which is not a valid part of a ref name.
When diff finds an untracked directory, it emulates Git behavior
by looking inside the directory to see if there are any untracked
items inside it. If there are only ignored items inside the dir,
then diff considers it ignored, even if there is no direct ignore
rule for it.
Checkout was not copying this behavior - when it found an untracked
directory, it just treated it as untracked. Unfortunately, when
combined with GIT_CHECKOUT_REMOVE_UNTRACKED, this made is seem that
checkout (and stash, which uses checkout) was removing ignored
items when you had only asked it to remove untracked ones.
This commit moves the logic for advancing past an untracked dir
while scanning for non-ignored items into an iterator helper fn,
and uses that for both diff and checkout.
To emulate git, stash should not remove untracked git repositories
inside the parent repo, and checkout's REMOVE_UNTRACKED should
also skip over these items.
`git stash` actually prints a warning message for these items.
That should be possible with a checkout notify callback if you
wanted to, although it would require a bit of extra logic as things
are at the moment.
This takes the `--stat` and related example options in the example
diff.c program and converts them to use the `git_diff_get_stats`
API which nicely formats stats for you.
I went to add bar-graph scaling to the stats formatter and noticed
that the `git_diff_stats` structure was holding on to all of the
`git_patch` objects. Unfortunately, each of these objects keeps
the full text of the diff in memory, so this is very expensive. I
ended up modifying `git_diff_stats` to keep just the data that it
needs to keep and allowed it to release the patches. Then, I added
width scaling to the output on top of that.
In making the diff example program match 'git diff' output, I ended
up removing an newline from the sumamry output which I then had to
compensate for in the email formatting to match the expectations.
Lastly, I went through and refactored the tests to use a couple of
helper functions and reduce the overall amount of code there.
Ignore patterns that ended with a trailing '/*' were still needing
to match against another actual '/' character in the full path.
This is not the same behavior as core Git.
Instead, we strip a trailing '/*' off of any patterns that were
matching and just take it to imply the FNM_LEADING_DIR behavior.
There was a latent bug where files that use macro definitions
could be parsed before the macro definitions were loaded. Because
of attribute file caching, preloading files that are going to be
used doesn't add a significant amount of overhead, so let's always
preload any files that could contain macros before we assemble the
actual vector of files to scan for attributes.
When traversing the directory structure, the iterator pushes and
pops ignore files using a vector. Some directories don't have
ignore files, so it uses a path comparison to decide when it is
right to actually pop the last ignore file. This was only
comparing directory suffixes, though, so a subdirectory with the
same name as a parent could result in the parent's .gitignore
being popped off the list ignores too early. This changes the
logic to compare the entire relative path of the ignore file.
When we delete an entry, we also want to refresh the configuration to
catch any changes that happened externally.
This allows us to simplify the logic, as we no longer need to delete
these variables internally. The whole state will be refreshed and the
deleted entries won't be there.
With the isolation of complex reads, we can now try to refresh the
on-disk file before reading a value from it.
This changes the semantics a bit, as before we could be sure that a
string we got from the configuration was valid until we wrote or
refreshed. This is no longer the case, as a read can also invalidate the
pointer.
In order to have consistent views of the config files for remotes,
submodules et al. and a configuration that represents what is currently
stored on-disk, we need a way to provide a view of the configuration
that does not change.
The goal here is to provide the snapshotting part by creating a
read-only copy of the state of the configuration at a particular point
in time, which does not change when a repository's main config changes.
On set, we set/add the value written to the config's internal values,
but we do not refresh old values.
Document this in a test in preparation for the refresh changes.
The checks to see if files were out of date in the attibute cache
was wrong because the cache-breaker data wasn't getting stored
correctly. Additionally, when the cache-breaker triggered, the
old file data was being leaked.
In the threading tests, I was still seeing a race condition where
the same item could end up being inserted multiple times into the
index. Preserving the sorted-ness of the index outside of the
`index_insert` call fixes the issue.
This is a big refactoring of the attribute file cache to be a bit
simpler which in turn makes it easier to enforce a lock around any
updates to the cache so that it can be used in a threaded env.
Tons of changes to the attributes and ignores code.
This makes the lock management on the index a little bit broader,
having a number of routines hold the lock across looking up the
item to be modified and actually making the modification. Still
not true thread safety, but more pure index modifications are now
safe which allows the simple cases (such as starting up a diff
while index modifications are underway) safe enough to get the
snapshot without hitting allocation problems.
As part of this, I simplified the allocation of index entries to
use a flex array and just put the path at the end of the index
entry. This makes every entry self-contained and makes it a
little easier to feel sure that pointers to strings aren't
being accidentally copied and freed while other references are
still being held.
This adds a basic test of doing simultaneous diffs on multiple
threads and adds basic locking for the attr file cache because
that was the immediate problem that arose from these tests.
This makes the index iterator honor the GIT_ITERATOR_IGNORE_CASE
and GIT_ITERATOR_DONT_IGNORE_CASE flags without modifying the
index data itself. To take advantage of this, I had to export a
number of the internal index entry comparison functions. I also
wrote some new tests to exercise the capability.
The usefulness of these helpers came up for me while debugging
some of the iterator changes that I was making, so since they
have also been requested (albeit indirectly) I thought I'd include
them.
git_branch_t is an enum so requesting GIT_BRANCH_LOCAL | GIT_BRANCH_REMOTE is not possible as it is not a member of the enum (at least VS2013 C++ complains about it).
This fixes a regression introduced in commit a667ca8298 (PR #1946).
Signed-off-by: Sven Strickroth <email@cs-ware.de>
Ignore rules with slashes in them are matched using FNM_PATHNAME
and use the path to the .gitignore file from the root of the
repository along with the path fragment (including slashes) in
the ignore file itself. Unfortunately, the relative path to the
.gitignore file was being applied to the global core.excludesfile
if that was also named ".gitignore".
This fixes that with more precise matching and includes test for
ignore rules with leading slashes (which were the primary example
of this being broken in the real world).
This also backports an improvement to the file context logic from
the threadsafe-iterators branch where we don't rely on mutating
the key of the attribute file name to generate the context path.
There were a couple bugs in popping ignore files during iteration
that could result in incorrect decisions be made and thus ignore
files below the root either not being loaded correctly or not
being popped at the right time.
One bug was an off-by-one in comparing the path of the gitignore
file with the path being exited during iteration.
The second bug was not correctly truncating the path being tracked
during traversal if there were no ignores on the list (i.e. when
you have no .gitignore at the root, but do have some in contained
directories).
This updates how libgit2 treats submodule-like directories that
actually have tracked content inside of them. This is a strange
corner case, but it seems that many people have abortive submodule
setups and then just went ahead and added the files into the
parent repository. In this case, we should just treat the
submodule as if it was a normal directory.
Libgit2 will still try to skip over real submodules and contained
repositories that do not have tracked files inside them, but this
adds some new handling for cases where the apparently submodule
data is in conflict with the actual list of tracked files.
The internal buffer in the `git_path_iconv_t` structure was not
being reset before the calls to `iconv` were made to convert data,
so if there were multiple decomposed Unicode paths in a single
directory, paths after the first one were being appended to the
first instead of treated as independent data.
The base for the relative urls is determined as follows, with descending
priority:
- remote url of HEAD's remote tracking branch
- remote "origin"
- workdir
This follows git.git behaviour
Cloning from an empty repo must set master's upstream to origin's
master, even if neither of them exist.
Fetching from a non-empty origin must then mark the master branch
for-merge. This currently fails.
One blame test replies on being run from within the libgit2
repository to leverage having a longer history to play with, but
some bundled versions of libgit2 don't have the whole libgit2
history. This just skips that test if the repository can't be
opened.
There was a little bug where the submodule cache thought that the
index date was out of date even when it wasn't that was resulting
in some extra scans of index data even when not needed.
Mostly this commit adds a bunch of new tests including adding and
removing submodules in the index and in the HEAD and seeing if we
can automatically pick them up when refreshing.
With the new submodule cache validity checks, we generally don't
need to call git_submodule_reload_all to have up-to-date submodule
data. Some tests are still calling it where I want to actually
test that it can be called safely and doesn't break anything, but
mostly it is not needed.
This also expands some of the existing submodule tests to cover
some variants on the behavior that was already being tested.
Wrote tests that try adding, removing, and updating the name of
submodules which showed a number of problems with how we account
for changes when incrementally updating the submodule info. Most
of these issues didn't exist before because reloading would always
blow away the old submodule data.
There are a few places where we need to join three strings to
assemble a path. This adds a simple join3 function to avoid the
comparatively expensive join_n (which calls strlen on each string
twice).
The order in this function is the opposite to what
create_with_fetchspec() has, so change this one, as url-then-refspec is
what git does.
As we need to break compilation and the swap doesn't do that, let's take
this opportunity to rename in-memory remotes to anonymous as that's
really what sets them apart.
When a submodule was inserted with a different path and name, the
return value from khash greater than zero was allowed to propagate
back out to the caller when it should really be zeroed. This led
to a possible crash when reloading submodules if that was the
first time that submodule data was loaded.
The reload_all call could end up dereferencing a NULL pointer if
there was an error while attempting to load the submodules config
data (i.e. invalid content in the gitmodules file). This fixes it.
When a directory containing a .git directory (or even just a plain
gitlink) was found, libgit2 was going out of its way to treat it
specially. This seemed like it was necessary because the diff
code was not originally emulating Git's behavior for untracked
directories correctly (i.e. scanning for ignored vs untracked items
inside). Now that libgit2 diff mimics Git's untracked directory
behavior, the special handling for contained Git repos is actually
incorrect and this commit rips it out.
`git_submodule` objects were already refcounted internally in case
the submodule name was different from the path at which it was
stored. This makes that refcounting externally used as well, so
`git_submodule_lookup` and `git_submodule_add_setup` return an
object that requires a `git_submodule_free` when done.
The reflog append function was overzealous in its checking. When passed
an old and new ids, it should not do any checking, but just serialize
the data to a reflog entry.
The existing ones lack checking zeroed ids when switching back from an
unborn branch as well as what happens when detaching.
The reflog appending function mistakenly wrote zeros when dealing with a
detached HEAD. This explicitly checks for those situations and fixes
them.
When we update the current branch, we must also append to HEAD's reflog
to keep them in sync.
This is a bit of a hack, but as git.git says, it covers 100% of
default cases.
This is not something anybody would ever do; removing HEAD makes the
.git/ directory no longer be a repository, so we wouldn't be expected to
handle such a situation.
If the pqueue comparison fn returned just 0 or 1 (think "a<b")
then the sort order of returned items could be wrong because there
was a "< 0" that really needed to be "<= 0". Yikes!!!
The git_odb_exists_prefix API was not dealing correctly when a
later backend returned GIT_ENOTFOUND even if an earlier backend
had found the object.
Additionally, the unit tests were not properly exercising the API
and had a couple mistakes in checking the results.
Lastly, since the backends are not expected to behavior correctly
unless all bytes of the short id are zero except for the prefix,
this makes the ODB prefix APIs explicitly clear out the extra
bytes so the user doesn't have to be as careful.
This finds a short id string that will unambiguously select the
given object, starting with the core.abbrev length (usually 7)
and growing until it is no longer ambiguous.
This adds `git_diff_buffers` and `git_patch_from_buffers`. This
also includes a bunch of internal refactoring to increase the
shared code between these functions and the blob-to-blob and
blob-to-buffer APIs, as well as some higher level assert helpers
in the tests to also remove redundancy.
* Make GIT_INLINE an internal definition so it cannot be used in
public headers
* Fix language in CONTRIBUTING
* Make index caps API use signed instead of unsigned values
This adds an API to amend an existing commit, basically a shorthand
for creating a new commit filling in missing parameters from the
values of an existing commit. As part of this, I also added a new
"sys" API to create a commit using a callback to get the parents.
This allowed me to rewrite all the other commit creation APIs so
that temporary allocations are no longer needed.
This fixes a number of warnings with the Windows 64-bit build
including a test failure in test_repo_message__message where an
invalid pointer to a git_buf was being used.
This fixes a typo I made for setting the sorted flag on the index
after a reload. That typo didn't actually cause any test failures
so I'm also adding a test that explicitly checks that the index is
correctly sorted after a reload when ignoring case and when not.
This updates the git_pqueue to simply be a set of specialized
init/insert/pop functions on a git_vector.
To preserve the pqueue feature of having a fixed size heap, I
converted the "sorted" field in git_vectors to a more general
"flags" field so that pqueue could mix in it's own flag. This
had a bunch of ramifications because a number of places were
directly looking at the vector "sorted" field - I added a couple
new git_vector helpers (is_sorted, set_sorted) so the specific
representation of this information could be abstracted.
There were some confusing issues mixing up the number of bytes
written to the zstream output buffer with the number of bytes
consumed from the zstream input. This reorganizes the zstream
API and makes it easier to deflate an arbitrarily large input
while still using a fixed size output.
This removes the fetchRecurse compiler warnings and makes the
behavior match the other submodule options (i.e. the in-memory
setting can be reset to the on-disk value).
Writing a sample Javascript driver pointed out some extra
whitespace handling that needed to be done in the diff driver.
This adds some tests with some sample javascript code that I
pulled off of GitHub just to see what would happen. Also, to
clean up the userdiff test data, I did a "git gc" and packed
up the test objects.
This moves the expected and actual test data along with the source
data for the userdiff tests into the tests/resources/userdiff test
repo and updates the test to use that.
Reorganize the builtin driver table slightly so that core Git
builtin definitions can be imported verbatim. Then take a few of
the core Git drivers and pull them in.
This also creates a test of diffs with the builtin HTML driver
which led to some small error handling fixes in the driver
selection logic.
Don't try to determine whether the system supports file modes
when putting the tree data in the index during checkout. The tree's
mode is canonical and did not come from stat(2) in the first place.
It's hard or even impossible to correctly free the string buffer
allocated by git_patch_to_str in some circumstances. Drop the function
so people have to use git_patch_to_buf instead - git_buf has a dedicated
destructor.
The "merge none" (don't automerge) flag was only to aide in
merge trivial tests. We can easily determine whether merge
trivial resulted in a trivial merge or an automerge by examining
the REUC after automerge has completed.
The default merge_file level was XDL_MERGE_MINIMAL, which will
produce conflicts where there should not be in the case where
both sides were changed identically. Change the defaults to be
more aggressive (XDL_MERGE_ZEALOUS) which will more aggressively
compress non-conflicts. This matches git.git's defaults.
Increase testing around reverting a previously reverted commit to
illustrate this problem.
Extend the "unmodified" submodule workdir test to include
uninitialized submodules, to prevent reporting submodules as
modified when they're not in the workdir at all.
Any well-behaved program should write a descriptive message to the
reflog whenever it updates a reference. Let's make this more prominent
by removing the version without the reflog parameters.
Ok, scrap the previous commit. This is the right overflow check that
takes care of 64 bit overflow **and** 32-bit overflow, which needs to be
considered because the pool malloc can only allocate 32-bit elements in
one go.
This wasn't being tested and since it has a callback, I fixed it
even though the return value of this callback is not treated like
any of the other callbacks in the API.
This adds tests that try canceling an indexer operation from
within the progress callback.
After writing the tests, I wanted to run this under valgrind and
had a number of errors in that situation because mmap wasn't
working. I added a CMake option to force emulation of mmap and
consolidated the Amiga-specific code into that new place (so we
don't actually need separate Amiga code now, just have to turn on
-DNO_MMAP).
Additionally, I made the indexer code propagate error codes more
reliably than it used to.
Clone callbacks can return non-zero values to cancel the clone.
This adds some tests to verify that this actually works and updates
the documentation to be clearer that this can happen and that the
return value will be propagated back by the clone function.
The checkout notify callback behavior on non-zero return values
was not being tested. This adds tests, fixes a bug with positive
values, and clarifies the documentation to make it clear that the
checkout can be canceled via this mechanism.
The callback to supply data chunks could return a negative value
to stop creation of the blob, but we were neither using GIT_EUSER
nor propagating the return value. This makes things use the new
behavior of returning the negative value back to the user.
This changes the behavior of callbacks so that the callback error
code is not converted into GIT_EUSER and instead we propagate the
return value through to the caller. Instead of using the
giterr_capture and giterr_restore functions, we now rely on all
functions to pass back the return value from a callback.
To avoid having a return value with no error message, the user
can call the public giterr_set_str or some such function to set
an error message. There is a new helper 'giterr_set_callback'
that functions can invoke after making a callback which ensures
that some error message was set in case the callback did not set
one.
In places where the sign of the callback return value is
meaningful (e.g. positive to skip, negative to abort), only the
negative values are returned back to the caller, obviously, since
the other values allow for continuing the loop.
The hardest parts of this were in the checkout code where positive
return values were overloaded as meaningful values for checkout.
I fixed this by adding an output parameter to many of the internal
checkout functions and removing the overload. This added some
code, but it is probably a better implementation.
There is some funkiness in the network code where user provided
callbacks could be returning a positive or a negative value and
we want to rely on that to cancel the loop. There are still a
couple places where an user error might get turned into GIT_EUSER
there, I think, though none exercised by the tests.
This continues auditing all the places where GIT_EUSER is being
returned and making sure to clear any existing error using the
new giterr_user_cancel helper. As a result, places that relied
on intercepting GIT_EUSER but having the old error preserved also
needed to be cleaned up to correctly stash and then retrieve the
actual error.
Additionally, as I encountered places where error codes were not
being propagated correctly, I tried to fix them up. A number of
those fixes are included in the this commit as well.
This adds giterr_user_cancel to return GIT_EUSER and clear any
error message that is sitting around. As a result of using that
in places, we need to be more thorough with capturing errors that
happen inside a callback when used internally. To help with that,
this also adds giterr_capture and giterr_restore so that when we
internally use a foreach-type function that clears errors and
converts them to GIT_EUSER, it is easier to restore not just the
return value, but the actual error message text.
The frontend used to look at the file directly, but that's obviously not
the right thing to do. Expose it on the backend and use that function
instead.
git-core only writes to the reflogs of HEAD, refs/heads/ and,
refs/notes/ or if there is already a reflog in place. Adjust our code to
follow these semantics.
When doing copy detection, it is often necessary to include
UNMODIFIED records in the git_diff so they are available as source
records for GIT_DIFF_FIND_COPIES_FROM_UNMODIFIED. Yet in the final
diff, often you will not want to have these UNMODIFIED records.
This adds a flag which marks these UNMODIFIED records for deletion
from the diff list so they will be removed after the rename detect
phase is over.
When FIND_COPIES is used in combination with BREAK_REWRITES for
rename detection, there was a bug where the split MODIFIED delta
was only used as a target for RENAME records and not for COPIED
records. This fixes that, converting the split into a pair of
DELETED and COPIED deltas when that circumstance arises.
Whenever a reference is created or updated, we need to write to the
reflog regardless of whether the user gave us a message, so we shouldn't
leave that to the ref frontend, but integrate it into the backend.
This also eliminates the race between ref update and writing to the
reflog, as we protect the reflog with the ref lock.
As an additional benefit, this reflog append on the backend happens by
appending to the file instead of parsing and rewriting it.
This renamed `git_khash_str` to `git_strmap`, `git_hash_oid` to
`git_oidmap`, and deletes `git_hashtable` from the tree, plus
adds unit tests for `git_strmap`.
There was an error in the tree iterator where it would
delete two tree levels instead of just one when popping
up a tree level. Unfortunately the test data for the
tree iterator did not have any deep trees with subtrees
in the middle of the tree items, so this problem went
unnoticed. This contains the 1-line fix plus new test
data and tests that reveal the issue.
This gives `git_status_foreach()` back its old behavior of
emulating the "--untracked=all" behavior of git. You can
get any of the various --untracked options by passing flags
to `git_status_foreach_ext()` but the basic version will
keep the behavior it has always had.
This "fixes" the broken t18 status tests to accurately reflect
the new behavior for "created" untracked subdirectories. See
discussion in the PR for more details.
This also contains the submodules unit test that I forgot to
git add, and ports most of the t18-status.c tests to clar (still
missing a couple of the git_status_file() single file tests).