When we insert e.g. a tag or tagged object into the packfile, we must
make sure to insert any referenced objects as well, or we will have
broken links.
Use the recursive version of packfile insertion to make sure we send
over not just the tagged object but also the objects it references.
This function recursively inserts the given object and any referenced
ones. It can be thought of as a more general version of the functions to
insert a commit or tree.
When the user has a certificate check callback set, we still have to
check whether the stream we're using is even capable of providing a
certificate.
In the case of an unencrypted certificate, do not ask for it from the
stream, and do not call the callback.
This extra constructor will be useful for the annotated versions of
ref-modifying functions, as it allows us to create a commit with the
extended sha syntax which was used to retrieve it.
We do not always want to put the id directly into the reflog, but we
want to speicfy what a user typed. For this use-case we provide
annotated version of a few functions which let the caller specify what
user-friendly name was used when asking for the operation.
It turns out that erroring out on duplicate commits is the right thing
to do, but git was not hitting the bug on the server-side.
Bring back a descriptive error message in case of duplicate entries and
error out.
If a packfile includes duplicate objects, we can choose to use the
secon copy instead of the first by using the same logic as if it were
the first.
Change the error condition from 0 to -1, which indicates a bad resize,
and set the OOM message in that case.
This does mean we will leak the first copy of the object. We can deal
with that later, but making fetches work is more important.
While this is not even close to a fix, we can at least set an error
message so we know which error we are facing. Up to know we just
returned an error without a message.
Currently git_submodule_sync writes the submodule's URL to the
key 'branch.<REMOTE_NAME>.remote' while the reference
implementation of `git submodule sync` writes to
'remote.<REMOTE_NAME>.url', which is the intended behavior
according to git-submodule(1).
The current implementation does not set 'fire_callback' back to 0 for failed updates so the callback still fires.
Instead of adding yet another condition check to set 'fire_callback' to 0 if needed, considering this function should be a no-op for failed updates anyway, the best fix is to simplify its logic to check upfront if the update is a failed one.
The default behaviour for the packbuilder is to perform the work in a
single thread, which is fine for the public API, but we currently have
no way for a user to determine the number of threads to use when
creating the packfile, which makes our clone behaviour over the
filesystem quite a bit slower than what git offers.
This is a very particular scenario, in which we avoid spawning git by
being ourselves the server-side, so it's probably ok to auto-set the
threading, as the upload-pack process would do if we were talking to
git.
Currently we use the most naïve and inefficient method for figuring out
which objects to send to the remote whereby we end up trying to insert
subdirs which have not changed multiple times.
Instead, make use of the packbuilder's built-in more efficient method
which uses the walk to feed the object list and avoids inserting an
object and its descendants.
Most use-cases for the object packer communicate in terms of commits
which each side has. We already have an object to specify this
relationship between commits, namely git_revwalk.
By knowing which commits we want to pack and which the other side
already has, we can perform similar optimisations to git, by marking
each tree as interesting or uninteresting only once, and not sending
those trees which we know the other side has.
Keep the definitions in the headers, while putting the declarations in
the C files. Putting the function definitions in headers causes
them to be duplicated if you include two headers with them.
If a refspec could not be parsed, the git_refspec__parse function would
return an error value but would not provide additional error information
for the callers. This commit amends that function to set a more useful
error message.
When we rename a reference, we want the old and new ids to be the same
one (as we did not change it). The normal code path looks up the old id
from the current value of the brtanch, but by the time we look it up, it
does not exist anymore and thus we write a zero id.
Pass the old id explicitly instead.
Clear the error message on git_libgit2_shutdown for all versions of
the library (no threads and Win32 threads). Drop the giterr_clear
in clar, as that shouldn't be necessary.
This changes the get_entry() method to return a refcounted version of
the config entry, which you have to free when you're done.
This allows us to avoid freeing the memory in which the entry is stored
on a refresh, which may happen at any time for a live config.
For this reason, get_string() has been forbidden on live configs and a
new function get_string_buf() has been added, which stores the string in
a git_buf which the user then owns.
The functions which parse the string value takea advantage of the
borrowing to parse safely and then release the entry.
The user may decide to return any type of credential, including ones we
did not say we support. Add a check to make sure the user returned an
object of the right type and error out if not.
We want to use the "checkout: moving from ..." message in order to let
git know when a change of branch has happened. Make the convenience
functions for this goal write this message.
This namespace is about behaving like git's branch command, so let's do
exactly that instead of taking a reflog message.
This override is still available via the reference namespace.
The signature for the reflog is not something which changes
dynamically. Almost all uses will be NULL, since we want for the
repository's default identity to be used, making it noise.
In order to allow for changing the identity, we instead provide
git_repository_set_ident() and git_repository_ident() which allow a user
to override the choice of signature.
Win32 DLLs have four fields for the version number (major, minor,
teeny, patch). If a consumer wants to build a custom DLL, it may
be useful to set the patchlevel version number in the DLL.
This value only affects the DLL version number, it does not affect
the resultant "version number", which remains major.minor.teeny.
Windows headers #define some names that openssl uses too. Openssl
headers #undef the offending names before reusing them. But if those
offending Windows headers get included after the openssl headers the
namespace is polluted and nothing good happens.
Fixes issue #2850.
When the repository does not contain an index, emulate git's behavior
and upgrade to `SAFE_CREATE`. This allows us to check out repositories
created with `git clone --no-checkout`.
A repository can have multiple "reserved names" now, not just
a single "short name" for the repository folder itself. Refactor
to include a git_repository__reserved_names that returns all the
reserved names for a repository.
git_index_add_frombuffer enables now to store a memory buffer in the odb
and to store an entry in the index directly if the index is attached to a
repository.
Provide a convenience function that creates a buffer that can be provided
to callers but will not be freed via `git_buf_free`, so the buffer
creator maintains the allocation lifecycle of the buffer's contents.
Add structures and preliminary functions to take a buffer, file or
blob and write the contents in chunks through an arbitrary number
of chained filters, finally writing into a user-provided function
accept the contents.
Increment refcount of newly added cache entries just like existing
entries looked up from the cache. Otherwise the new entry can be
evicted from the cache and destroyed while it's still in use.
Always lock the index when we begin the merge, before we write
any of the metdata files. This prevents a race where another
client may run a commit after we have written the MERGE_HEAD but
before we have updated the index, which will produce a merge
commit that is treesame to one parent. The merge will finish and
update the index and the resultant commit would not be a merge at
all.
Introduce `git_indexwriter`, to allow us to lock the index while
performing additional operations, then complete the write (or abort,
unlocking the index).
Make our overflow checking look more like gcc and clang's, so that
we can substitute it out with the compiler instrinsics on platforms
that support it. This means dropping the ability to pass `NULL` as
an out parameter.
As a result, the macros also get updated to reflect this as well.
Win32 generally ignores Unix-like mode bits that don't make any
sense on the platform (eg `0644` makes no sense to Windows). But
WINE complains loudly when presented with POSIXy bits. Remove them.
(Thanks @phkelley)
Use `size_t` to hold the size of arrays to ease overflow checking,
lest we check for overflow of a `size_t` then promptly truncate
by packing the length into a smaller type.
Introduce git__reallocarray that checks the product of the number
of elements and element size for overflow before allocation. Also
introduce git__mallocarray that behaves like calloc, but without the
`c`. (It does not zero memory, for those truly worried about every
cycle.)
Fixes#2869. If included file includes more files, it may reallocate
cfg_file->readers, hence invalidate not only `r` pointer, but `result`
pointer as well.
`p_stat` calls `git_win32_path_from_utf8`, which canonicalizes the
path. Do not further try to modify the path, else we trim the
trailing slash from a root directory and try to access `C:` instead
of `C:/`.
During checkout, assume that the .gitattributes files aren't
modified during the checkout. Instead, create an "attribute session"
during checkout. Assume that attribute data read in the same
checkout "session" hasn't been modified since the checkout started.
(But allow subsequent checkouts to invalidate the cache.)
Further, cache nonexistent git_attr_file data even when .gitattributes
files are not found to prevent re-scanning for nonexistent files.
pathspec_match_free() should not dereference a NULL passed to it.
I found this issue when I tried to run example log program with
nonexistent branch:
./example/log help
Such call leads to segmentation fault.
This fixes the build at least on FreeBSD, where those types were not
defined indirectly:
src/openssl_stream.c💯18: error: variable has incomplete type 'struct in6_addr'
struct in6_addr addr6;
^
src/openssl_stream.c💯9: note: forward declaration of 'struct in6_addr'
struct in6_addr addr6;
^
src/openssl_stream.c:111:18: error: use of undeclared identifier 'AF_INET'
if (p_inet_pton(AF_INET, host, &addr4)) {
^
src/unix/posix.h:31:40: note: expanded from macro 'p_inet_pton'
^
src/openssl_stream.c:115:18: error: use of undeclared identifier 'AF_INET6'
if(p_inet_pton(AF_INET6, host, &addr6)) {
^
src/unix/posix.h:31:40: note: expanded from macro 'p_inet_pton'
^
This is supported by the underlying set_index() implementation
and setting the repository index to NULL is recommended by the
git_repository_set_bare() documentation.
On case insensitive filesystems, we may have files in the working
directory that case fold to a name we want to write. Remove those
files (by default) so that we will not end up with a filename that
has the unexpected case.
The documentation for `git_path_join_unrooted` states that the base
length will be returned, so that consumers like checkout know where
to start creating directories instead of always creating directories
at the directory root.
The implementation of the hashsig API disallows computing a signature on
small files containing only a few lines. This new flag disables this
behavior.
git_diff_find_similar() sets this flag by default which means that rename
/ copy detection of small files will now work. This in turn affects the
behavior of the git_status and git_blame APIs which will now detect rename
of small files assuming the right options are passed.
git_remote_download() must also clear the internal push state resulting from a possible earlier push operation. Otherwise calling git_remote_update_tips() will execute the push version instead of the fetch version and among other things, tags won't be updated.
The clock_gettime() function is normally not available under
AmigaOS, hence another solution is required. We are using now
GetUpTime() that is present in current versions of this
operating system.
The openssl setup function needs to be GIT_EXPORT'ed, be sure
to include the `sys/openssl.h` header so that it is appropriately
decorated as an export function.
A byte value of 0x85 is not whitespace, we were conflating that with
U+0085 (UTF8: 0xc2 0x85). This caused us to incorrectly treat valid
multibyte characters like U+88C5 (UTF8: 0xe8 0xa3 0x85) as whitespace.
On a case-insensitive filesystem, we need to deal with case-changing
renames (eg, foo -> FOO) by removing the old and adding the new,
exactly as if we were on a case-sensitive filesystem.
Update the `checkout::tree::can_cancel_checkout_from_notify` test, now
that notifications are always sent case sensitively.
For the REUC and NAME entries, we use size_t internally, and we take
size_t for the get_byindex() functions, but the entrycount() functions
strangely cast to an unsigned int instead.
This introduces the functionality of submodule update in
'git_submodule_do_update'. The existing 'git_submodule_update' function is
renamed to 'git_submodule_update_strategy'. The 'git_submodule_update'
function now refers to functionality similar to `git submodule update`,
while `git_submodule_update_strategy` is used to get the configured value
of submodule.<name>.update.
Path validation may be influenced by `core.protectHFS` and
`core.protectNTFS` configuration settings, thus treebuilders
can take a repository to influence their configuration.
HFS filesystems ignore some characters like U+200C. When these
characters are included in a path, they will be ignored for the
purposes of comparison with other paths. Thus, if you have a ".git"
folder, a folder of ".git<U+200C>" will also match. Protect our
".git" folder by ensuring that ".git<U+200C>" and friends do not match it.
Disallow:
1. paths with trailing dot
2. paths with trailing space
3. paths with trailing colon
4. paths that are 8.3 short names of .git folders ("GIT~1")
5. paths that are reserved path names (COM1, LPT1, etc).
6. paths with reserved DOS characters (colons, asterisks, etc)
These paths would (without \\?\ syntax) be elided to other paths - for
example, ".git." would be written as ".git". As a result, writing these
paths literally (using \\?\ syntax) makes them hard to operate with from
the shell, Windows Explorer or other tools. Disallow these.
When turning UTF-8 paths into UCS-2 paths for Windows, always use
the \\?\-prefixed paths. Because this bypasses the system's
path canonicalization, handle the canonicalization functions ourselves.
We must:
1. always use a backslash as a directory separator
2. only use a single backslash between directories
3. not rely on the system to translate "." and ".." in paths
4. remove trailing backslashes, except at the drive root (C:\)
This option does not get persisted to disk, which makes it different
from the rest of the setters. Remove it until we go all the way.
We still respect the configuration option, and it's still possible to
perform a one-time prune by calling the function.
For each remote-tracking branch we want to remove, we need to consider
it against every other refspec in case we have overlapping refspecs,
such as with
refs/heads/*:refs/remotes/origin/*
refs/pull/*/head:refs/remotes/origin/pr/*
as we'd otherwise remove too many refspecs.
Create a list of condidates, which are the references matching the rhs
of any active refspec and then filter that list by removing those
entries for which we find a remove reference with any active
refspec. Those which are left after this are removed.
Most of the network-facing facilities have been copied to the socket and
openssl streams. No code now uses these functions directly anymore, so
we can now remove them.
Having an ssh stream would require extra work for stream capabilities we
don't need anywhere else (oob auth and command execution) so for now
let's move away from the gitno connection to use socket_stream.
We can introduce an ssh stream interface if and as we need it.
This unfortunately isn't as stackable as could be possible, as it
hard-codes the socket stream. This is because the method of using a
custom openssl BIO is not clear, and we do not need this for now. We can
still bring this in if and as we need it.
We currently have gitno for talking over TCP, but this needs to know
about both plaintext and OpenSSL connections and the code has gotten
somewhat messy with ifdefs determining which version of the function
should be called.
In order to clean this up and abstract away the details of sending over
the different types of streams, we can instead use an interface and
stack stream implementations.
We may not be able to use the stackability with all streams, but we
are definitely be able to use the abstraction which is currently spread
between different bits of gitno.
A rule can only negate something which was explicitly mentioned in the
rules before it. Change our parsing to ignore a negative rule which does
not negate something mentioned in the rules above it.
While here, fix a wrong allocator usage. The memory for the match string
comes from pool allocator. We must not free it with the general
allocator. We can instead simply forget the string and it will be
cleaned up.
We no longer have NULL strings, but empty ones and duplicate the sides
if necessar, so the first check will never do anything.
While in the area, remove unnecessary ifs and early returns.
There are some combination of objects and target types which we know
cannot be fulfilled. Return EINVALIDSPEC for those to signify that there
is a mismatch in the user-provided data and what the object model is
capable of satisfying.
If we start at a tag and in the course of peeling find out that we
cannot reach a particular type, we return EPEEL.
That's a bad assumption to make, even though right now it holds
(because of the way we've implemented decompression of packfiles),
this may change in the future, given that ODB objects can be
binary data.
Furthermore, the ODB object can return a NULL pointer if the object
is empty. Copying the NULL pointer to the strbuf lets us handle it
like an empty string. Again, the NULL pointer is valid behavior because
you're supposed to check the *size* of the object before working
on it.
This is a contract that we made in the library and that we need to uphold. The
contents of a blob can never be NULL because several parts of the library (including
the filter and attributes code) expect `git_blob_rawcontent` to always return a
valid pointer.
When we fetch twice with the same remote object, we did not properly
clear the connection flags, so we would leak state from the last
connection.
This can cause the second fetch with the same remote object to fail if
using a HTTP URL where the server redirects to HTTPS, as the second
fetch would see `use_ssl` set and think the initial connection wanted to
downgrade the connection.
When attempting to update a reference on a remote during push, and the
reference on the remote refers to a commit that does not exist locally,
then we should report a more clear error message.
When creating a new remote, contrary to loading one from disk,
active_refspecs was not populated. This means that if using the new
remote to push, git_push_update_tips() will be a no-op since it
checks the refspecs passed during the push against the base ones
i.e. active_refspecs. And therefore the local refs won't be created
or updated after the push operation.
There is one well-known and well-tested parser which we should use,
instead of implementing parsing a second time.
The common parser is also augmented to copy the LHS into the RHS if the
latter is empty.
The expressions test had to change a bit, as we now catch a bad RHS of a
refspec locally.
This function, similar in style to git_remote_fetch(), performs all the
steps required for a push, with a similar interface.
The remote callbacks struct has learnt about the push callbacks, letting
us set the callbacks a single time instead of setting some in the remote
and some in the push operation.
This describes their purpose better, as we now initialize ssl and some
other global stuff in there. Calling the init function is not something
which has been optional for a while now.
git hardocodes these as objects which exist regardless of whether they
are in the odb and uses them in the shell interface as a way of
expressing the lack of a blob or tree for one side of e.g. a diff.
In the library we use each language's natural way of declaring a lack of
value which makes a workaround like this unnecessary. Since git uses it,
it does however mean each shell application would need to perform this
check themselves.
This makes it common work across a range of applications and an issue
with compatibility with git, which fits right into what the library aims
to provide.
Thus we introduce the hard-coded empty blob and tree in the odb
frontend. These hard-coded objects are checked for before going to the
backends, but after the cache check, which means the second time they're
used, they will be treated as normal cached objects instead of creating
new ones.
If the remote is anonymous, then we cannot check for any configuration,
as there is no name. Check for this before we try to use the name, which
may be a NULL pointer.
This fixes#2697.
This function has one output but can match multiple files, which can be
unexpected for the user, which would usually path the exact path of the
file he wants the status of.
We cannot know from looking at .gitmodules whether a directory is a
submodule or not. We need the index or tree we are comparing against to
tell us. Otherwise we have to assume the entry in .gitmodules is stale
or otherwise invalid.
Thus we pass the index of the repository into the workdir iterator, even
if we do not want to compare against it. This follows what git does,
which even for `git diff <tree>`, it will consider staged submodules as
such.
A rule "src" in src/.gitignore must only match subdirectories of
src/. The current code does not include this context in the match rule
and would thus consider this rule to match the top-level src/ directory
instead of the intended src/src/.
Keep track fo the context in which the rule was defined so we can
perform a prefix match.
We currently consider CR to start the end of the line, but that means
that we miss cases with CR CR LF which can be used with git to match
files whose names have CR at the end of their names.
The fix from the patch comes from Russell's comment in the issue.
This fixes#2536.
Before trying to rtransform using the given refspec to figure out what
the name of the upstream branch is on the remote, we must make sure that
the target of the refspec applies to the current branch's upstream.
* Error-handling is cleaned up to only let a file-not-found error
through, not other sorts of errors. And when a file-not-found
error happens, we clean up the error.
* Test now checks that file-not-found introduces no error. And
other minor cleanups.
The create function with default refspec is the same as the one with a
custom refspec, but it has the default refspec, so we can create the one
on top of the other.
When we first ask OpenSSL to verify the certfiicate itself (rather
than the HTTPS specifics), we should also return
GIT_ECERTIFICATE. Otherwise, the caller would consider this as a failed
operation rather than a failed validation and not call the user's own
validation.
For example, if you have
[include]
path = foo
and foo didn't exist, git_config_open_ondisk() would just give up
on the rest of the file. Now it ignores the unresolved include
without error and continues reading the rest of the file.
Already cherry-picked commits should not be re-included. If all changes
included in a commit exist in the upstream, then we should error with
GIT_EAPPLIED.
`git_rebase_next` will apply the next patch (or cherry-pick)
operation, leaving the results checked out in the index / working
directory so that consumers can resolve any conflicts, as appropriate.
Remote objects are not meant to be changed from under the user. We did
this in rename, but only the name and left the refspecs, such that a
save would save the wrong refspecs (and a fetch and anything else would
use the wrong refspecs).
Instead, let's simply take a name and not change any loaded remote from
under the user.
This function does not in fact tell us anything, as almost anything with
a colon in it is a valid rsync-style SSH path; it can not tell us that
we do not support ftp or afp or similar as those are still valid SSH
paths and we do support that.
All versions of SSL are considered deprecated now, so let's ask OpenSSl
to only use TLSv1. We still ask it to load those ciphers for
compatibility with servers which want to use an older hello but will use
TLS for encryption.
For good measure we also disable compression, which can be exploitable,
if the OpenSSL version supports it.
Git for Windows will handle UNC paths only when in forward-slash
format, eg "//server/path". When given a UNC path as a remote,
rewrite standard format ("\\server\path") into this ridiculous
format.
The entry_count field is the amount of index entries covered by a
particular cache entry, that is how many files are there (recursively)
under a particular directory.
The current code that attemps to do this is severely defincient and is
trying to count the amount of children, which always comes up to zero.
We don't even need to recount, since we have the information during the
cache creation. We can take that number and keep it, as we only ever
invalidate or replace.
FindFirstFile will fail with INVALID_HANDLE_VALUE if there are no
children to the given path, which can happen if the given path is a
file (and obviously has no children) or if the given path is an empty
mount point. (Most directories have at least directory entries '.'
and '..', but ridiculously another volume mounted in another drive
letter's path space do not, and thus have nothing to enumerate.)
If FindFirstFile fails, check if this is a directory-like thing
(a mount point).
A reparse point that is an IO_REPARSE_TAG_MOUNT_POINT could be
a junction or an actual filesystem mount point. (Who knew?)
If it's the latter, its reparse point will report the actual
volume information \??\Volume{GUID}\ and we should not attempt
to dereference that further, instead readlink should report
EINVAL since it's not a symlink / junction and its original
path was canonical.
Yes, really.
An obvious place to fill the tree cache is on write-tree, as we're
guaranteed to be able to fill in the whole tree cache.
The way this commit does this is not the most efficient, as we read the
root tree from the odb instead of filling in the cache as we go along,
but it fills the cache such that successive operations (and persisting
the index to disk) will be able to take advantage of the cache, and it
reuses the code we already have for filling the cache.
Filling in the cache as we create the trees would require some
reallocation of the children vector, which is currently not possible
with out pool implementation. A different data structure would likely
allow us to perform this operation at a later date.
Keeping the cache around after read-tree is only one part of the
optimisation opportunities. In order to share the cache between program
instances, we need to write the TREE extension to the index.
Do so, taking the opportunity to rename 'entries' to 'entry_count' to
match the name given in the format description. The included test is
rather trivial, but works as a sanity check.
When reading from a tree, we know what every tree is going to look like,
so we can fill in the tree cache completely, making use of the index for
modification of trees a lot quicker.
This simplifies freeing the entries quite a bit; though there aren't
that many failure paths right now, introducing filling the cache from a
tree will introduce more. This makes sure not to leak memory on errors.
If there have been no pushes, we can immediately return ITEROVER. If
there have been no hides, we must not run the uninteresting pre-mark
phase, as we do not want to hide anything and this would simply cause us
to spend time loading objects.
This introduces a phase at the start of preparing a walk which pre-marks
uninteresting commits, but only up to the common ancestors.
We do this in a similar way to git, by walking down the history and
marking (which is what we used to do), but we keep a time-sorted
priority queue of commits and stop marking as soon as there are only
uninteresting commits in this queue.
This is a similar rule to the one used to find the merge-base. As we
keep inserting commits regardless of the uninteresting bit, if there are
only uninteresting commits in the queue, it means we've run out of
interesting commits in our walk, so we can stop.
The old mark_unintesting() logic is still in place, but that stops
walking if it finds an already-uninteresting commit, so it will stop on
the ones we've pre-marked; but keeping it allows us to also hide those
that are hidden via the callback.
We don't need the remote loaded, and the function extracted both of
these from the git_remote in order to do its work, so let's remote a
step and not ask for the loaded remote at all.
This fixes#2390.
The stash is implemented as the refs/stash reference and its reflog. In
order to modify the reflog, we need avoid races by making sure we're the
only ones allowed to modify the reflog.
We achieve this via the transactions API. Locking the reference gives us
exclusive write access, letting us modify and write it without races.
A transaction allows you to lock multiple references and set up changes
for them before applying the changes all at once (or as close as the
backend supports).
This can be used for replication purposes, or for making sure some
operations run when the reference is locked and thus cannot be changed.
When a list of refspecs is passed to fetch (what git would consider
refspec passed on the command-line), we not only need to perform the
updates described in that refspec, but also update the remote-tracking
branch of the fetched remote heads according to the remote's configured
refspecs.
These "fetches" are not however to be written to FETCH_HEAD as they
would be duplicate data, and it's not what the user asked for.
The configured/base fetch refspecs need to be taken into account in
order to implement opportunistic remote-tracking branch updates. DWIM
them and store them in the struct, but don't do anything with them yet.
We can only DWIM when we've connected to the remote and have the list of
the remote's references. Adding or setting the refspecs should not
trigger an attempt to DWIM the refspecs as we typically cannot do it,
and even if we did, we would not use them for the current fetch.
With opportunistic ref updates, git has introduced the concept of having
base refspecs *and* refspecs that are active for a particular fetch.
Let's start by letting the user override the refspecs for download.
When we describe the workdir, we perform a describe on HEAD and then
check to see if the worktree is dirty. If it is and we have a suffix
string, we append that to the buffer.
Instead of printing out to the buffer inside the information-gathering
phase, write the data to a intermediate result structure.
This allows us to split the options into gathering options and
formatting options, simplifying the gathering code.
The getaddrinfo function indicates failure with a non-zero return code,
but this code is not necessarily negative. On platforms like Android
where the code is positive, a failed call causes libgit2 to segfault.
Instead of spreading the data in function arguments, some of which
aren't used for ssh and having a struct only for ssh, use a struct for
both, using a common parent to pass to the callback.
This option make it easy to ignore anything about the server we're
connecting to, which is bad security practice. This was necessary as we
didn't use to expose detailed information about the certificate, but now
that we do, we should get rid of this.
If the user wants to ignore everything, they can still provide a
callback which ignores all the information passed.
We should let the user decide whether to cancel the connection or not
regardless of whether our checks have decided that the certificate is
fine. We provide our own assessment to the callback to let the user fall
back to our checks if they so desire.
If the certificate validation fails (or always in the case of ssh),
let the user decide whether to allow the connection.
The data structure passed to the user is the native certificate
information from the underlying implementation, namely OpenSSL or
WinHTTP.
When using a bare repo with an index, libgit2 attempts to read
files from the index. It caches those files based on the path
to the file, specifically the path to the directory that contains
the file.
If there is no working directory, we use `git_path_dirname_r` to
get the path to the containing directory. However, for the
`.gitattributes` file in the root of the repository, this ends up
normalizing the containing path to `"."` instead of the empty
string and the lookup the `.gitattributes` data fails.
This adds a test of attribute lookups on bare repos and also
fixes the problem by simply rewriting `"."` to be `""`.
Passing 0 as the length of the paths to check to git_diff_index_to_workdir
results in all files being treated as conflicting, that is, all untracked or
modified files in the worktree is reported as conflicting
A signature is made up of a non-empty name and a non-empty email so
let's validate that. This also brings us more in line with git, which
also rejects ident with an empty email.
The compiler was generating a bunch of warnings for
git_mutex_init and git_mutex_lock when GIT_THREADS
was not defined (i.e. when not using -DTHREADSAFE=ON).
Also remove an unused variable from tests/path/core.c.
When the call to the agent fails, we must retrieve the error message
just after the function call, as other calls may overwrite it.
As the agent authentication is the only one which has a teardown and
there does not seem to be a way to get the error message from a stored
error number, this tries to introduce some small changes to store the
error from the agent.
Clearing the error at the beginning of the loop lets us know whether the
agent has already set the libgit2 error message and we should skip it,
or if we should set it.
Teach git_repository_init_ext to use relative paths for the gitlink
to the work directory. This is used when creating a sub repository
where the sub repository resides in the parent repository's
.git directory.
When the fetch refspec does not include the remote's default branch, it
indicates an error in user expectations or programmer error. Error out
in that case.
This lets us get rid of the dummy refspec which can never work as its
zeroed out. In the cases where we did not find a default branch, we set
HEAD detached immediately, which lets us refactor the "normal" path,
removing `found_branch`.
It does the same as git_remote_supported_url() but has a name which
implies we'd check the URL for correctness while we're simply looking at
the scheme and looking it up in our lists.
While here, fix up the tests so we check all the combination of what's
supported.
The previous commit makes it harder to figure out if the library was
built with support for a particular transport. Roll back some of the
changes and remove ssh:// and https:// from the list if we're being
built without support for them.
Even when built without a SSH support, we know about this transport. It
is implemented, but the current code makes us return an error message
saying it's not.
This is a leftover from the initial implementation of the transports
when there were in fact transports we knew about but were not
implemented.
Instead, let the SSH transport itself say it cannot run, the same as we
do for HTTPS.
A repository can have any number of references which we're not
interested in such as notes or tags. For the default branch calculation
we only care about branches. Make the decision about the number of
branches rather than the number of refs in general.
When we see PARENT1, it means there is a local commit and thus we are
ahead. Likewise, seeing PARENT2 means that the upstream branch has a
commit and we are one more behind.
The logic is currently reversed. Correct it.
This fixes#2501.
W/o this patch it is not possible to have a third party ssh transport_cb if GIT_SSH is disabled or a third party transport_cb which has a higher priority than the default one.
Signed-off-by: Sven Strickroth <email@cs-ware.de>
The callers of git_packfile_unpack() expect the obj_offset argument to
be set to the beginning of the next object. We were mistakenly returning
the the offset of the object's data, which causes the CRC function to
try to use the wrong offset.
Set obj_offset to curpos instead of elem->offset to point to the next
element and bring back expected behaviour.
Our mkdir helper was failing is a parent directory was not
accessible even if the child directory could be created.
This changes the helper to keep trying child directories
even when the parent is unwritable.
The old `allocfmt` is of no use to callers, as they are not able to free
the returned buffer. Export a new API that returns a static string that
doesn't need to be freed.
The recv buffer (parse_buffer) and the buffer have independent sizes and
offsets. We try to fill in parse_buffer as much as possible before
passing it to the http parser. This is fine most of the time, but fails
us when the buffer is almost full.
In those situations, parse_buffer can have more data than we would be
able to put into the buffer (which may be getting full if we're towards
the end of a data sideband packet).
To work around this, we check if the space we have left on our buffer is
smaller than what could come from the network. If this happens, we make
parse_buffer think that it has as much space left as our buffer, so it
won't try to retrieve more data than we can deal with.
As the start of the data may no longer be at the start of the buffer, we
need to keep track of where it really starts (data_offset) and use that
in our calculations for the real size of the data we received from the
network.
This fixes#2518.
* Move the transport registration mechanisms into a new header under
'sys/' because this is advanced stuff.
* Remove the 'priority' argument from the registration as it adds
unnecessary complexity. (Since transports cannot decline to operate,
only the highest priority transport is ever executed.) Users who
require per-priority transports can implement that in their custom
transport themselves.
* Simplify registration further by taking a scheme (eg "http") instead
of a prefix (eg "http://").
In the check for multiline, we traverse the backslashes from the end
backwards and int the end assert that we haven't gone past the beginning
of the line. We make sure of this in the loop condition, but we also
check in the return value.
However, for certain configurations, a line in a multiline variable
might be empty to aid formatting. In that case, 'end' == 'start', since
we ended up looking at the first char which made it a multiline.
There is no need for the (end > start) check in the return, since the
loop guarantees we won't go further back than the first char in the
line, and we do accept the first char to be the final backslash.
This fixes#2483.
While scanning through a directory hierarchy, this prevents a
positive ignore match on a parent directory from blocking the scan
of a directory when a negative match rule exists for files inside
the directory.
* Removes mingw-compat.h
* Cleans up separation of compiler/platform idiosyncrasies
* Unifies mingw/msvc stat structures and functions
* (Tries to) hide more compiler specific implementation details (even in our internal API)
We always calculate multiple merge bases, but up to now we had only
exposed the "best" merge base.
Introduce git_oidarray which analogously to git_strarray lets us return
multiple ids.
This works around strict aliasing rules letting some versions of
GCC (particularly on RHEL 6) thinking that they can skip updating the
size of the array when calculating the next element's offset.
Preallocating two commits doesn't make much sense as leaving allocation
to the first array usage will allocate a sensible size with room for
growth.
This preallocation has also been hiding issues with strict aliasing in
the tests, as we have fairly simple histories and never trigger the
growth.
git allows you to set which paths to use for the git server programs
when connecting over ssh; and we want to provide something similar.
We do this by providing a factory function which can be set as the
remote's transport callback which will set the given paths upon
creation.
We used to assume a refspec would only have an asterisk in the middle of
their respective pattern. This has not been a valid assumption for some
time now with git.
Instead of assuming where the asterisk is going to be, change the logic
to treat each pattern as having two halves with a replacement bit in the
middle, where the asterisk is.
When transforming a non-pattern refspec, we simply need to copy over the
opposite string. Move that logic up to the wrapper so we can assume a
pattern refspec in the transformation function.
Move the definition of git_thread_yield() to the test which needs it and
add the correct definition for it for FreeBSD and derivatives.
Original patch adding FreeBSD and derivatives by @jacquesg.
In order to connect to a remote server, we need to provide a path to the
repository we're interested in. Consider the lack of path in the url an
error.
When the stream writing function was written, it assume that
libssh2_channel_write() would always write all of the data to the
wire. This is only true for the first 32k of data, which it tries to
fit into one ssh packet.
Since it can perform short writes, call it in a loop like we do for
send(), advancing the buffer offset.
As git_clone now has callbacks to configure the details of the
repository and remote, remove the lower-level functions from the public
API, as they lack some of the logic from git_clone proper.
Analogously to the remote creation callback, provide a way for the user
of git_clone() to create the repository with whichever options they
desire via callback.
git_checkout_index can now check out other git_index's (that are not
necessarily the repository index). This allows checkout_index to use
the repository's index for stat cache information instead of the index
data being checked out. git_merge and friends now check out their
indexes directly instead of trying to blend it into the running index.
To make sure that items returned from pool allocations are aligned
on nice boundaries, this rounds up all pool allocation sizes to a
multiple of 8. This adds a small amount of overhead to each item.
The rounding up could be made optional with an extra parameter to
the pool initialization that turned on rounding only for pools
where item alignment actually matters, but I think for the extra
code and complexity that would be involved, that it makes sense
just to burn a little bit of extra memory and enable this all the
time.
The OpenSSL library-loading functions do not expect to be called
multiple times. Add a flag in the non-threaded libgit2 init so we only
call once.
This fixes#2446.
git_remote_set_transport now takes a transport factory rather than a transport
git_clone_options now allows the caller to specify a remote creation callback
In order to know which authentication methods are supported/allowed by
the ssh server, we need to send a NONE auth request, which needs a
username associated with it.
Most ssh server implementations do not allow switching the username
between authentication attempts, which means we cannot use a dummy
username and then switch. There are two ways around this.
The first is to use a different connection, which an earlier commit
implements, but this increases how long it takes to get set up, and
without knowing the right username, we cannot guarantee that the
list we get in response is the right one.
The second is what's implemented here: if there is no username specified
in the url, ask for it first. We can then ask for the list of auth
methods and use the user's credentials in the same connection.
Set a message when we fail to lock.
Also make the put function void, since it's called from free, which
cannot report errors. The only errors we can experience here are
internal state corruption, so we assert that we are trying to put a
pack which we have previously got.
If we fail to insert the packfile in the map, make sure to free it.
This makes the free function only attempt to remove its mwindows from
the global list if we have opened the packfile to avoid accessing the
list unlocked.
Git for Windows 1.9.4 changed the behavior when the text=auto
attribute is specified and core.autocrlf=false. Previous observed
behavior would *not* filter files when going into the working
directory, the new behavior *does* filter. Update our behavior to match.
When checking out files, we're performing conversion into the user's
native line endings, but we only want to do it for files which have
consistent line endings. Refuse to perform the conversion for mixed-EOL
files.
The CRLF->LF filter is left as-is, as that conversion is considered to be
normalization by git and should force a conversion of the line endings.
Opening the same repository multiple times will currently open the same
file multiple times, as well as map the same region of the file multiple
times. This is not necessary, as the packfile data is immutable.
Instead of opening and closing packfiles directly, introduce an
indirection and allocate packfiles globally. This does mean locking on
each packfile open, but we already use this lock for the global mwindow
list so it doesn't introduce a new contention point.
Before calling the credentials callback, ask the sever which
authentication methods it supports and report that to the user, instead
of simply reporting everything that the transport supports.
In case of an error, we do fall back to listing all of them.
Bring together all of the OpenSSL initialization to
git_threads_init() so it's together and doesn't need locks.
Moving it here also gives us libssh2 thread safety (when built against
openssl).
When using in a multithreaded context, OpenSSL needs to lock, and leaves
it up to application to provide said locks.
We were not doing this, and it's just luck that's kept us from crashing
up to now.
The OpenSSL init functions are not reentrant, which means that running
multiple fetches in parallel can cause us to crash.
Use a mutex to init OpenSSL, and since we're adding this extra checks,
init it only once.
Instead of using a sentinel empty value to detect the last commit, let's
check for when we get a NULL from popping the stack, which lets us know
when we're done.
The current code causes us to read uninitialized data, although only on
RHEL/CentOS 6 in release mode. This is a readability win overall.
If the user wants to keep a copy for themselves, they should make a
copy. It adds unnecessary complexity to make sure the returned entries
are valid until the builder is cleared.
Finding a filename in a vector means we need to resort it every time we
want to read from it, which includes every time we want to write to it
as well, as we want to find duplicate keys.
A hash-map fits what we want to do much more accurately, as we do not
care about sorting, but just the particular filename.
We still keep removed entries around, as the interface let you assume
they were going to be around until the treebuilder is cleared or freed,
but in this case that involves an append to a vector in the filter case,
which can now fail.
The only time we care about sorting is when we write out the tree, so
let's make that the only time we do any sorting.
A symref inside the namespace gets renamed, we should make it point to
the target's new name.
This is for the origin/HEAD -> origin/master type of situations.
There is no reason why we need to use a callback here. A string array
fits better with the usage, as this is not an event and we don't need
anything from the user.
We must make sure that the name pointer remains valid, so make sure to
allocate the new one before freeing the old one and swap them so the
user never sees an invalid pointer.
We don't allow renames of anonymous remotes, so there's no need to
handle them.
A remote is always associated with a repository, so there's no need to
check for that.
Tighten up which references we consider for renaming so we don't try to
rename unrelated ones and end up with unexplained references.
If there is a reference on the target namespace, git overwrites it, so
let's do the same.
When renaming a lock file to its final location, we need to make sure
that it is replaced atomically.
We currently have a workaround for Windows by removing the target file.
This means that the target file, which may be a ref or a packfile, may
cease to exist for a short wile, which shold be avoided.
Implement the workaround only in Windows, by making sure that the file
we want to replace is writable.
Whe already worked out the kinks with the function used in the local
transport. Expose it and make use of it in the local clone method
instead of trying to work it out again.
When removing the remote-tracking branches, build up the list and remove
in two steps, working around an issue with the iterator. Removing while
we're iterating over the refs can cause us to miss references.
Inside `git_remote_load`, the calls to `get_optional_config` use
`giterr_clear` to unset any errors that are set due to missing config
keys. If neither a fetch nor a push url config was found for a remote,
we should set an error again.
This fixes two issues I found when core.precomposeunicode is enabled:
* When creating a reference with a NFD string, the returned
git_reference would return this NFD string as the reference’s
name. But when looking up the reference later, the name would
then be returned as NFC string.
* Renaming a reference would not honor the core.precomposeunicode and
apply no normalization to the new reference name.
If requested, git_clone_local_into() will try to link the object files
instead of copying them.
This only works on non-Windows (since it doesn't have this) when both
are on the same filesystem (which are unix semantics).
A call like git_clone("./foo", "./foo1") writes origin's url as './foo',
which makes it unusable, as they're relative to different things.
Go with git's behaviour and store the realpath as the url.
When git is given such a path, it will perform a "local clone",
bypassing the git-aware protocol and simply copying over all objects
that exist in the source.
Copy this behaviour when given a local path.
We have too many places where we repeat free code, so when adding the
new free to the generic code, it didn't take for the local transport.
While there, fix a C99-ism that sneaked through.
The protocol has a capability which allows the server to tell us which
refs are symrefs, so we can e.g. know which is the default branch.
This capability is different from the ones we already support, as it's
not setting a flag to true, but requires us to store a list of
refspec-formatted mappings.
This commit does not yet expose the information in the reference
listing.
Use size_t for page size, instead of long. Check result of sysconf.
Use size_t for page offset so no cast to size_t (second arg to p_mmap).
Use mod instead div/mult pair, so no cast to size_t is necessary.
If you enabled core.safecrlf on an LF-ending platform, we would
error even for files with all LFs. We should only be warning on
irreversible mappings, I think.
We set up the current branch after we fetch from the remote. This means
that the user's refspec may have already created this reference. It is
therefore not an error if we cannot create the branch because it already
exists.
This allows for the user to replicate git-clone's --mirror option.
Instead of changing the user-provided remote, duplicate it so we can add
the extra refspec without having to worry about unsetting it before
returning.
Windows has its own ftruncate() called _chsize_s().
p_mkstemp() is changed to use p_open() so we can make sure we open for
writing; the addition of exclusive create is a good thing to do
regardless, as we want a temporary path for ourselves.
Lastly, MSVC doesn't quite know how to add two numbers if one of them is a
void pointer, so let's alias it to unsigned char.C
Some OSs cannot keep their ideas about file content straight when mixing
standard IO with file mapping. As we use mmap for reading from the
packfile, let's make writing to the pack file use mmap.
When running multithreaded, it is not enough to check for the offmap
allocation. Move the call to cache_init() to packfile allocation so we
can be sure it is always allocated free of races.
This fixes#2355.
The switch makes the loop somewhat unwieldy. Let's assume it's fine and
perform the check when we're accessing the data.
This makes our code look a lot more like git's.
Dependency chains are often large and require a few
reallocations. Allocate a 64-element chain before doing anything else to
avoid allocations during the loop.
This value comes from the stack-allocated one git uses. We still
allocate this on the heap, but it does help performance a little bit.
Bring back the use of the delta base cache for unpacking objects. When
generating the delta chain, we stop when we find a delta base in the
pack's cache and use that as the starting point.
We currently make use of recursive function calls to unpack an object,
resolving the deltas as we come back down the chain. This means that we
have unbounded stack growth as we look up objects in a pack.
This is now done in two steps: first we figure out what the dependency
chain is by looking up the delta bases until we reach a non-delta
object, pushing the information we need onto a stack and then we pop
from that stack and apply the deltas until there are no more left.
This version of the code does not make use of the delta base cache so it
is slower than what's in the mainline. A later commit will reintroduce
it.
Repeating this error message makes it harder to find out where we
actually are finding the error, and they don't really describe what
we're trying to do.
When using Iconv to convert unicode data and iconv doesn't like
the source data (because it thinks that it's not actual UTF-8),
instead of stopping the operation, just use the unconverted data.
This will generally do the right thing on the filesystem, since
that is the source of the non-UTF-8 path data anyhow.
This adds some tests for creating and looking up branches with
messy Unicode names. Also, this takes the helper function that
was previously internal to `git_repository_init` and makes it
into `git_path_does_fs_decompose_unicode` which is a useful in
tests to understand what the expected results should be.
Our vector does a move of the rest of the array when we remove an
item. Doing this repeatedly can be expensive, and we do this a lot in
the indexer. Instead, set the value to NULL and skip those entries.
perf reported around 30% of `index-pack` time was going into
memmove. With this change, that goes away and we spent most of the time
hashing and inflating data.
This adds in missing calls to `git_buf_sanitize` and fixes a
number of places where `git_buf` APIs could inadvertently write
NUL terminator bytes into invalid buffers. This also changes the
behavior of `git_buf_sanitize` to NUL terminate a buffer if it can
and of `git_buf_shorten` to do nothing if it can.
Adds tests of filtering code with zeroed (i.e. unsanitized) buffer
which was previously triggering a segfault.
Diff and status do not want core.safecrlf to actually raise an
error regardless of the setting, so this extends the filter API
with an additional options flags parameter and adds a flag so that
filters can be applied with GIT_FILTER_OPT_ALLOW_UNSAFE, indicating
that unsafe filter application should be downgraded from a failure
to a warning.
1) Call to git_shutdown results in setting git__n_shutdown_callbacks
to -1. Next call to git__on_shutdown results in ABW (Array Bound Write)
for array git__shutdown_callbacks. In the current Implementation,
git_atomic_dec is called git__n_shutdown_callbacks + 1 times. I have
modified it to a for loop so that it is more readable. It would not
set git__n_shutdown_callbacks to a negative number and reset the
elements of git__shutdown_callbacks to NULL.
2) In function git_sysdir_get, shutdown function is registered only if
git_sysdir__dirs_shutdown_set is set to 0. However, after this variable
is set to 1, it is never reset to 0. If git_sysdir_global_init is
called again from synchronized_threads_init it does not register
shutdown function for this subsystem.
The diff code was using an "ignored_prefix" directory to track if
a parent directory was ignored that contained untracked files
alongside tracked files. Unfortunately, when negative ignore rules
were used for directories inside ignored parents, the wrong rules
were applied to untracked files inside the negatively ignored
child directories.
This commit moves the logic for ignore containment into the workdir
iterator (which is a better place for it), so the ignored-ness of
a directory is contained in the frame stack during traversal. This
allows a child directory to override with a negative ignore and yet
still restore the ignored state of the parent when we traverse out
of the child.
Along with this, there are some problems with "directory only"
ignore rules on container directories. Given "a/*" and "!a/b/c/"
(where the second rule is a directory rule but the first rule is
just a generic prefix rule), then the directory only constraint
was having "a/b/c/d/file" match the first rule and not the second.
This was fixed by having ignore directory-only rules test a rule
against the prefix of a file with LEADINGDIR enabled.
Lastly, spot checks for ignores using `git_ignore_path_is_ignored`
were tested from the top directory down to the bottom to deal with
the containment problem, but this is wrong. We have to test bottom
to top so that negative subdirectory rules will be checked before
parent ignore rules.
This does change the behavior of some existing tests, but it seems
only to bring us more in line with core Git, so I think those
changes are acceptable.
The brace in the check for peel's return was surrounding the wrong
thing, which made 'error' be set to 1 when there was an error instead of
the error code.
We assume that everything under GIT_DIR/objects/ is a directory. This is
not necessarily the case if some process left a stray file in there.
Check beforehand if we do have a directory and ignore the entry
otherwise.
There are a few tests that set up a fake home directory and a
fake GLOBAL search path so that we can test things in global
ignore or attribute or config files. This cleans up that code to
work more robustly even if there is a test failure. This also
fixes some valgrind warnings where scanning search paths for
separators could end up doing a little bit of sketchy data access
when coming to the end of search list.
This is a proposed adjustment to the trace APIs. This makes the
trace levels into a bitmask so that they can be selectively enabled
and adds a callback-level payload, plus a message-level payload.
This makes it easier for me to a GIT_TRACE_PERF callbacks that
are simply bypassed if the PERF level is not set.
This adds an option to refresh the stat cache while generating
status. It also rips out the GIT_PERF stuff I had an makes use
of the trace API to keep statistics about what happens during diff.
When diff is scanning the working directory, if it finds a file
where it is not sure if the index entry matches the working dir,
it will recalculate the OID (which is pretty expensive). This
adds a new flag to diff so that if the OID calculation finds that
the file actually has not changed (i.e. just the modified time was
altered or such), then it will refresh the stat cache in the index
so that future calls to diff will not have to check the oid again.
This reorganized the diff OID calculation to make it easier to
correctly update the stat cache during a diff once the flags to
do so are enabled.
This includes marking the path of a git_index_entry as const so
we can make a "fake" git_index_entry with a "const char *" path
and not get warnings. I was a little surprised at how unobtrusive
this change was, but I think it's probably a good thing.
When we think the stat cache in the index seems valid and the size
or mode of a file has definitely changed, then don't bother trying
to recalculate the OID of the workdir bits to confirm that it is
modified - just accept that it is modified.
This can result in files that show as modified with no actual diff,
but the behavior actually appears to match Git on the command line.
This also includes a minor optimization to not perform a submodule
lookup on the ".git" directory itself.
The current version of the commit creation and amend function are unsafe
to use when passing the update_ref parameter, as they do not check that
the reference at the moment of update points to what the user expects.
Make sure that we're moving history forward when we ask the library to
update the reference for us by checking that the first parent of the new
commit is the current value of the reference. We also make sure that the
ref we're updating hasn't moved between the read and the write.
Similarly, when amending a commit, make sure that the current tip of the
branch is the commit we're amending.
When deflating data, we might need to grow the buffer. Currently we
add a guess on top of the currently-allocated buffer size.
When we re-use the buffer, it already has some memory allocated; adding
to that means that we always grow the buffer regardless of how much we
need to use.
Instead, increase on top of the currently-used size. This still leaves
us with the allocated size of the largest object we compress, but it's a
minor pain compared to unbounded growth.
This fixes#2285.
It's possible for an encrypted connection not have a certificate. In
this case, SSL_get_verify_result() will return OK because no error
happened (as it never even tried to validate anything).
SSL_get_peer_certificate() will return NULL in this case so we need to
catch that. On the upside, the current code would segfault in this
situation instead of letting it through as a valid cert.
When considering status of untracked directories, if we find an
explicitly ignored item, even if it is a directory, treat the
parent as an IGNORED item. It was accidentally being treated as
an EMPTY item because we were not looking into the ignored subdir.
The current FETCH_HEAD parsing code assumes that a quote must end the
branch name. Git however allows for quotes as part of a branch name,
which causes us to consider the FETCH_HEAD file as invalid.
Instead of searching for a single quote char, search for a quote char
followed by SP, which is not a valid part of a ref name.
In the iterator, distinguish between ignores and empty directories
so that diff and status can ignore empty directories, but checkout
and stash can treat them as untracked items.
When diff finds an untracked directory, it emulates Git behavior
by looking inside the directory to see if there are any untracked
items inside it. If there are only ignored items inside the dir,
then diff considers it ignored, even if there is no direct ignore
rule for it.
Checkout was not copying this behavior - when it found an untracked
directory, it just treated it as untracked. Unfortunately, when
combined with GIT_CHECKOUT_REMOVE_UNTRACKED, this made is seem that
checkout (and stash, which uses checkout) was removing ignored
items when you had only asked it to remove untracked ones.
This commit moves the logic for advancing past an untracked dir
while scanning for non-ignored items into an iterator helper fn,
and uses that for both diff and checkout.
To emulate git, stash should not remove untracked git repositories
inside the parent repo, and checkout's REMOVE_UNTRACKED should
also skip over these items.
`git stash` actually prints a warning message for these items.
That should be possible with a checkout notify callback if you
wanted to, although it would require a bit of extra logic as things
are at the moment.
This takes the `--stat` and related example options in the example
diff.c program and converts them to use the `git_diff_get_stats`
API which nicely formats stats for you.
I went to add bar-graph scaling to the stats formatter and noticed
that the `git_diff_stats` structure was holding on to all of the
`git_patch` objects. Unfortunately, each of these objects keeps
the full text of the diff in memory, so this is very expensive. I
ended up modifying `git_diff_stats` to keep just the data that it
needs to keep and allowed it to release the patches. Then, I added
width scaling to the output on top of that.
In making the diff example program match 'git diff' output, I ended
up removing an newline from the sumamry output which I then had to
compensate for in the email formatting to match the expectations.
Lastly, I went through and refactored the tests to use a couple of
helper functions and reduce the overall amount of code there.
I was playing with "git diff-index" and wanted to be able to
emulate that behavior a little more closely with the diff example.
Also, I wanted to play with running `git_diff_tree_to_workdir`
directly even though core Git doesn't exactly have the equivalent,
so I added a command line option for that and tweaked some other
things in the example code.
This changes a minor output thing in that the "raw" print helper
function will no longer add ellipses (...) if the OID is not
actually abbreviated.
Allow the credentials callback to return GIT_PASSTHROUGH to make the
transports code behave as though none was set.
This should make it easier for bindings to behave closer to the C code
when there is no credentials callback set at their level.
Only apply LEADING_DIR pattern munging to patterns in ignore and
attribute files, not to pathspecs used to select files to operate
on. Also, allow internal macro definitions to be evaluated before
loading all external ones (important so that external ones can
make use of internal `binary` definition).
Ignore patterns that ended with a trailing '/*' were still needing
to match against another actual '/' character in the full path.
This is not the same behavior as core Git.
Instead, we strip a trailing '/*' off of any patterns that were
matching and just take it to imply the FNM_LEADING_DIR behavior.
There was a latent bug where files that use macro definitions
could be parsed before the macro definitions were loaded. Because
of attribute file caching, preloading files that are going to be
used doesn't add a significant amount of overhead, so let's always
preload any files that could contain macros before we assemble the
actual vector of files to scan for attributes.
When traversing the directory structure, the iterator pushes and
pops ignore files using a vector. Some directories don't have
ignore files, so it uses a path comparison to decide when it is
right to actually pop the last ignore file. This was only
comparing directory suffixes, though, so a subdirectory with the
same name as a parent could result in the parent's .gitignore
being popped off the list ignores too early. This changes the
logic to compare the entire relative path of the ignore file.
The ssh-specific credentials allow the username to be missing. The idea
being that the ssh transport will then use the username provided in the
url, if it's available. There are two main issues with this.
The credential callback already knows what username was provided by the
url and needs to figure out whether it wants to ask the user for it or
it can reuse it, so passing NULL as the username means the credential
callback is suspicious.
The username provided in the url is not in fact used by the
transport. The only time it even considers it is for the user/pass
credential, which asserts the existence of a username in its
constructor. For the ssh-specific ones, it passes in the username stored
in the credential, which is NULL. The libssh2 macro we use runs strlen()
against this value (which is no different from what we would be doing
ourselves), so we then crash.
As the documentation doesn't suggest to leave out the username, assert
the need for a username in the code, which removes this buggy behavior
and removes implicit state.
git_cred_has_username() becomes a blacklist of credential types that do
not have a username. The only one at the moment is the 'default' one,
which is meant to call up some Microsoft magic.
Now that our strmap is no longer modified but replaced, we can use the
same strmap for the snapshot's values and it will be freed when we don't
need it anymore.
When we delete an entry, we also want to refresh the configuration to
catch any changes that happened externally.
This allows us to simplify the logic, as we no longer need to delete
these variables internally. The whole state will be refreshed and the
deleted entries won't be there.
With the isolation of complex reads, we can now try to refresh the
on-disk file before reading a value from it.
This changes the semantics a bit, as before we could be sure that a
string we got from the configuration was valid until we wrote or
refreshed. This is no longer the case, as a read can also invalidate the
pointer.
Current code sets the active map to a new one and builds it whilst it's
active. This is a race condition with someone else trying to access the
same config.
Instead, let's build up our new map and swap the active and new one.
In order to have consistent views of the config files for remotes,
submodules et al. and a configuration that represents what is currently
stored on-disk, we need a way to provide a view of the configuration
that does not change.
The goal here is to provide the snapshotting part by creating a
read-only copy of the state of the configuration at a particular point
in time, which does not change when a repository's main config changes.
The checks to see if files were out of date in the attibute cache
was wrong because the cache-breaker data wasn't getting stored
correctly. Additionally, when the cache-breaker triggered, the
old file data was being leaked.
I don't love this approach, but achieving thread-safety for
attribute and ignore data while reloading files would require a
larger rewrite in order to avoid this. If an attribute or ignore
file is out of date, this holds a lock on the file while we are
reloading the data so that another thread won't try to reload the
data at the same time.
In the threading tests, I was still seeing a race condition where
the same item could end up being inserted multiple times into the
index. Preserving the sorted-ness of the index outside of the
`index_insert` call fixes the issue.
This is a big refactoring of the attribute file cache to be a bit
simpler which in turn makes it easier to enforce a lock around any
updates to the cache so that it can be used in a threaded env.
Tons of changes to the attributes and ignores code.
While I was looking at the conflict cleanup code, I looked over at
the tree cache code, since we clear the tree cache for each entry
that gets removed and there is some redundancy there. I made some
small tweaks to avoid extra calls to strchr and strlen in a few
circumstances.
I introduced a leak into conflict cleanup by removing items from
inside the git_vector_remove_matching call. This simplifies the
code to just use one common way for the two conflict cleanup APIs.
When an index has an active snapshot, removing an item can cause
an error (inserting into the deferred deletion vector), so I made
the git_index_conflict_cleanup API return an error code. I felt
like this wasn't so bad since it is just like the other APIs.
I fixed up a couple of comments while I was changing the header.
The iterator pushes and pops ignores incrementally onto a list as
it traverses the directory structure so that it doesn't have to
constantly recheck which ignore files apply. With the new ref
counting, it wasn't decrementing the refcount on the ignores that
it removed from the vector.
This makes the lock management on the index a little bit broader,
having a number of routines hold the lock across looking up the
item to be modified and actually making the modification. Still
not true thread safety, but more pure index modifications are now
safe which allows the simple cases (such as starting up a diff
while index modifications are underway) safe enough to get the
snapshot without hitting allocation problems.
As part of this, I simplified the allocation of index entries to
use a flex array and just put the path at the end of the index
entry. This makes every entry self-contained and makes it a
little easier to feel sure that pointers to strings aren't
being accidentally copied and freed while other references are
still being held.
This adds a basic test of doing simultaneous diffs on multiple
threads and adds basic locking for the attr file cache because
that was the immediate problem that arose from these tests.
This makes the index iterator honor the GIT_ITERATOR_IGNORE_CASE
and GIT_ITERATOR_DONT_IGNORE_CASE flags without modifying the
index data itself. To take advantage of this, I had to export a
number of the internal index entry comparison functions. I also
wrote some new tests to exercise the capability.
The usefulness of these helpers came up for me while debugging
some of the iterator changes that I was making, so since they
have also been requested (albeit indirectly) I thought I'd include
them.
Again, laying groundwork for some index iterator changes, this
contains a bunch of code refactorings for index internals that
should make it easier down the line to add locking around index
modifications. Also this removes the redundant prefix_position
function and fixes some potential memory leaks.
Ignore rules with slashes in them are matched using FNM_PATHNAME
and use the path to the .gitignore file from the root of the
repository along with the path fragment (including slashes) in
the ignore file itself. Unfortunately, the relative path to the
.gitignore file was being applied to the global core.excludesfile
if that was also named ".gitignore".
This fixes that with more precise matching and includes test for
ignore rules with leading slashes (which were the primary example
of this being broken in the real world).
This also backports an improvement to the file context logic from
the threadsafe-iterators branch where we don't rely on mutating
the key of the attribute file name to generate the context path.
There were a couple bugs in popping ignore files during iteration
that could result in incorrect decisions be made and thus ignore
files below the root either not being loaded correctly or not
being popped at the right time.
One bug was an off-by-one in comparing the path of the gitignore
file with the path being exited during iteration.
The second bug was not correctly truncating the path being tracked
during traversal if there were no ignores on the list (i.e. when
you have no .gitignore at the root, but do have some in contained
directories).
This updates how libgit2 treats submodule-like directories that
actually have tracked content inside of them. This is a strange
corner case, but it seems that many people have abortive submodule
setups and then just went ahead and added the files into the
parent repository. In this case, we should just treat the
submodule as if it was a normal directory.
Libgit2 will still try to skip over real submodules and contained
repositories that do not have tracked files inside them, but this
adds some new handling for cases where the apparently submodule
data is in conflict with the actual list of tracked files.
git_merge_base() returns GIT_ENOTFOUND when it cannot find a merge
base. graph_desdendant_of() returns a boolean value (barring any
errors), so it needs to catch the NOTFOUND return value and convert it
into false, as not merge base means it cannot be a descendant.