The documentation for `git_path_join_unrooted` states that the base
length will be returned, so that consumers like checkout know where
to start creating directories instead of always creating directories
at the directory root.
The implementation of the hashsig API disallows computing a signature on
small files containing only a few lines. This new flag disables this
behavior.
git_diff_find_similar() sets this flag by default which means that rename
/ copy detection of small files will now work. This in turn affects the
behavior of the git_status and git_blame APIs which will now detect rename
of small files assuming the right options are passed.
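A minimal sketch of enabling the detection from the diff API, assuming
current libgit2 signatures:

    #include <git2.h>

    /* Sketch: with GIT_DIFF_FIND_RENAMES set, the hashsig-based
     * similarity scoring (now permitted on small files) kicks in. */
    static int detect_renames(git_diff *diff)
    {
        git_diff_find_options opts = GIT_DIFF_FIND_OPTIONS_INIT;
        opts.flags = GIT_DIFF_FIND_RENAMES | GIT_DIFF_FIND_COPIES;
        return git_diff_find_similar(diff, &opts);
    }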
git_remote_download() must also clear the internal push state resulting
from a possible earlier push operation. Otherwise calling
git_remote_update_tips() will execute the push version instead of the
fetch version and, among other things, tags won't be updated.
The clock_gettime() function is normally not available under
AmigaOS, hence another solution is required. We now use
GetUpTime(), which is present in current versions of this
operating system.
The openssl setup function needs to be GIT_EXPORT'ed; be sure
to include the `sys/openssl.h` header so that it is appropriately
decorated as an export function.
A byte value of 0x85 is not whitespace; we were conflating it with
U+0085 (UTF8: 0xc2 0x85). This caused us to incorrectly treat valid
multibyte characters like U+88C5 (UTF8: 0xe8 0xa3 0x85) as whitespace.
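As a sketch, the distinction is between the single byte and the
two-byte UTF-8 sequence:

    #include <stddef.h>

    /* U+0085 (NEL) is whitespace only as the UTF-8 sequence 0xC2 0x85;
     * a lone 0x85 byte (e.g. a trail byte of U+88C5) is not. */
    static int is_utf8_nel(const unsigned char *p, size_t len)
    {
        return len >= 2 && p[0] == 0xc2 && p[1] == 0x85;
    }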
On a case-insensitive filesystem, we need to deal with case-changing
renames (eg, foo -> FOO) by removing the old and adding the new,
exactly as if we were on a case-sensitive filesystem.
Update the `checkout::tree::can_cancel_checkout_from_notify` test, now
that notifications are always sent case sensitively.
For the REUC and NAME entries, we use size_t internally, and we take
size_t for the get_byindex() functions, but the entrycount() functions
strangely cast to an unsigned int instead.
This introduces the functionality of submodule update in
'git_submodule_do_update'. The existing 'git_submodule_update' function is
renamed to 'git_submodule_update_strategy'. The 'git_submodule_update'
function now refers to functionality similar to `git submodule update`,
while `git_submodule_update_strategy` is used to get the configured value
of submodule.<name>.update.
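A short usage sketch under the renamed API; signatures follow current
libgit2 and the submodule name is illustrative:

    git_submodule *sm;
    if (git_submodule_lookup(&sm, repo, "vendor/libfoo") == 0) {
        git_submodule_update_options opts =
            GIT_SUBMODULE_UPDATE_OPTIONS_INIT;

        /* what `git submodule update` does for this submodule */
        git_submodule_update(sm, 1 /* init */, &opts);

        /* the configured submodule.<name>.update strategy,
         * e.g. GIT_SUBMODULE_UPDATE_CHECKOUT */
        git_submodule_update_strategy(sm);
        git_submodule_free(sm);
    }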
Path validation may be influenced by `core.protectHFS` and
`core.protectNTFS` configuration settings, thus treebuilders
can take a repository to influence their configuration.
HFS filesystems ignore some characters like U+200C. When these
characters are included in a path, they will be ignored for the
purposes of comparison with other paths. Thus, if you have a ".git"
folder, a folder of ".git<U+200C>" will also match. Protect our
".git" folder by ensuring that ".git<U+200C>" and friends do not match it.
Disallow:
1. paths with trailing dot
2. paths with trailing space
3. paths with trailing colon
4. paths that are 8.3 short names of .git folders ("GIT~1")
5. paths that are reserved path names (COM1, LPT1, etc.)
6. paths with reserved DOS characters (colons, asterisks, etc.)
These paths would (without \\?\ syntax) be elided to other paths - for
example, ".git." would be written as ".git". As a result, writing these
paths literally (using \\?\ syntax) makes them hard to operate with from
the shell, Windows Explorer or other tools. Disallow these.
When turning UTF-8 paths into UCS-2 paths for Windows, always use
the \\?\-prefixed paths. Because this bypasses the system's
path canonicalization, handle the canonicalization functions ourselves.
We must:
1. always use a backslash as a directory separator
2. only use a single backslash between directories
3. not rely on the system to translate "." and ".." in paths
4. remove trailing backslashes, except at the drive root (C:\)
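A rough sketch of rules 1, 2 and 4 (dot-segment and UNC-prefix
handling omitted); the helper name is hypothetical:

    static void win32_canonicalize(char *path)
    {
        char *in = path, *out = path;

        while (*in) {
            char c = (*in == '/') ? '\\' : *in;        /* rule 1 */
            if (!(c == '\\' && out > path && out[-1] == '\\'))
                *out++ = c;                            /* rule 2 */
            in++;
        }

        /* rule 4: strip a trailing backslash, except at a
         * drive root such as "C:\" */
        if (out - path > 3 && out[-1] == '\\')
            out--;

        *out = '\0';
    }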
This option does not get persisted to disk, which makes it different
from the rest of the setters. Remove it until we go all the way.
We still respect the configuration option, and it's still possible to
perform a one-time prune by calling the function.
For each remote-tracking branch we want to remove, we need to consider
it against every other refspec in case we have overlapping refspecs,
such as with
refs/heads/*:refs/remotes/origin/*
refs/pull/*/head:refs/remotes/origin/pr/*
as we'd otherwise remove too many refspecs.
Create a list of candidates, which are the references matching the rhs
of any active refspec, and then filter that list by removing those
entries for which we find a remote reference with any active
refspec. Those which are left after this are removed.
Most of the network-facing facilities have been copied to the socket and
openssl streams. No code uses these functions directly anymore, so we
can remove them.
Having an ssh stream would require extra work for stream capabilities we
don't need anywhere else (oob auth and command execution) so for now
let's move away from the gitno connection to use socket_stream.
We can introduce an ssh stream interface if and as we need it.
This unfortunately isn't as stackable as could be possible, as it
hard-codes the socket stream. This is because the method of using a
custom openssl BIO is not clear, and we do not need this for now. We can
still bring this in if and as we need it.
We currently have gitno for talking over TCP, but this needs to know
about both plaintext and OpenSSL connections and the code has gotten
somewhat messy with ifdefs determining which version of the function
should be called.
In order to clean this up and abstract away the details of sending over
the different types of streams, we can instead use an interface and
stack stream implementations.
We may not be able to use the stackability with all streams, but we
are definitely able to use the abstraction which is currently spread
between different bits of gitno.
A rule can only negate something which was explicitly mentioned in the
rules before it. Change our parsing to ignore a negative rule which does
not negate something mentioned in the rules above it.
While here, fix a wrong allocator usage. The memory for the match string
comes from pool allocator. We must not free it with the general
allocator. We can instead simply forget the string and it will be
cleaned up.
We no longer have NULL strings, but empty ones, and we duplicate the
sides if necessary, so the first check will never do anything.
While in the area, remove unnecessary ifs and early returns.
There are some combination of objects and target types which we know
cannot be fulfilled. Return EINVALIDSPEC for those to signify that there
is a mismatch in the user-provided data and what the object model is
capable of satisfying.
If we start at a tag and in the course of peeling find out that we
cannot reach a particular type, we return EPEEL.
That's a bad assumption to make, even though right now it holds
(because of the way we've implemented decompression of packfiles),
this may change in the future, given that ODB objects can be
binary data.
Furthermore, the ODB object can return a NULL pointer if the object
is empty. Copying the NULL pointer to the strbuf lets us handle it
like an empty string. Again, the NULL pointer is valid behavior because
you're supposed to check the *size* of the object before working
on it.
This is a contract that we made in the library and that we need to uphold. The
contents of a blob can never be NULL because several parts of the library (including
the filter and attributes code) expect `git_blob_rawcontent` to always return a
valid pointer.
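In other words, consumers should look like this sketch (current
signatures):

    #include <stdio.h>
    #include <git2.h>

    static void print_blob(git_blob *blob)
    {
        /* check the *size* first; the content pointer itself must
         * always be valid, per the contract above */
        git_off_t size = git_blob_rawsize(blob);
        const char *data = git_blob_rawcontent(blob);

        if (size > 0)
            fwrite(data, 1, (size_t)size, stdout);
    }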
When we fetch twice with the same remote object, we did not properly
clear the connection flags, so we would leak state from the last
connection.
This can cause the second fetch with the same remote object to fail if
using a HTTP URL where the server redirects to HTTPS, as the second
fetch would see `use_ssl` set and think the initial connection wanted to
downgrade the connection.
When attempting to update a reference on a remote during push, and the
reference on the remote refers to a commit that does not exist locally,
then we should report a more clear error message.
When creating a new remote, contrary to loading one from disk,
active_refspecs was not populated. This means that if using the new
remote to push, git_push_update_tips() will be a no-op since it
checks the refspecs passed during the push against the base ones
i.e. active_refspecs. And therefore the local refs won't be created
or updated after the push operation.
There is one well-known and well-tested parser which we should use,
instead of implementing parsing a second time.
The common parser is also augmented to copy the LHS into the RHS if the
latter is empty.
The expressions test had to change a bit, as we now catch a bad RHS of a
refspec locally.
This function, similar in style to git_remote_fetch(), performs all the
steps required for a push, with a similar interface.
The remote callbacks struct has learnt about the push callbacks, letting
us set the callbacks a single time instead of setting some in the remote
and some in the push operation.
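A minimal sketch of the resulting call; the refspec is illustrative,
the remote is assumed in scope, and signatures follow current libgit2:

    char *spec = "refs/heads/master";
    git_strarray refspecs = { &spec, 1 };
    git_push_options opts = GIT_PUSH_OPTIONS_INIT;

    /* connect, negotiate, push and update tips in one call,
     * mirroring git_remote_fetch() */
    int error = git_remote_push(remote, &refspecs, &opts);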
This describes their purpose better, as we now initialize ssl and some
other global stuff in there. Calling the init function is not something
which has been optional for a while now.
git hardcodes these as objects which exist regardless of whether they
are in the odb and uses them in the shell interface as a way of
expressing the lack of a blob or tree for one side of e.g. a diff.
In the library we use each language's natural way of declaring a lack of
value which makes a workaround like this unnecessary. Since git uses it,
it does however mean each shell application would need to perform this
check itself.
This makes it common work across a range of applications and an issue
with compatibility with git, which fits right into what the library aims
to provide.
Thus we introduce the hard-coded empty blob and tree in the odb
frontend. These hard-coded objects are checked for before going to the
backends, but after the cache check, which means the second time they're
used, they will be treated as normal cached objects instead of creating
new ones.
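The ids involved are the well-known ones; a sketch of a check against
the empty tree:

    #include <git2.h>

    /* git's hard-coded empty-tree id; the empty blob is
     * e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 */
    static int is_empty_tree(const git_oid *id)
    {
        git_oid empty;
        git_oid_fromstr(&empty,
            "4b825dc642cb6eb9a060e54bf8d69288fbee4904");
        return git_oid_equal(id, &empty);
    }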
If the remote is anonymous, then we cannot check for any configuration,
as there is no name. Check for this before we try to use the name, which
may be a NULL pointer.
This fixes #2697.
This function has one output but can match multiple files, which can be
unexpected for the user, who would usually pass the exact path of the
file they want the status of.
We cannot know from looking at .gitmodules whether a directory is a
submodule or not. We need the index or tree we are comparing against to
tell us. Otherwise we have to assume the entry in .gitmodules is stale
or otherwise invalid.
Thus we pass the index of the repository into the workdir iterator, even
if we do not want to compare against it. This follows what git does,
which even for `git diff <tree>`, it will consider staged submodules as
such.
A rule "src" in src/.gitignore must only match subdirectories of
src/. The current code does not include this context in the match rule
and would thus consider this rule to match the top-level src/ directory
instead of the intended src/src/.
Keep track of the context in which the rule was defined so we can
perform a prefix match.
We currently consider CR to start the end of the line, but that means
that we miss cases with CR CR LF which can be used with git to match
files whose names have CR at the end of their names.
The fix from the patch comes from Russell's comment in the issue.
This fixes #2536.
Before trying to rtransform using the given refspec to figure out what
the name of the upstream branch is on the remote, we must make sure that
the target of the refspec applies to the current branch's upstream.
* Error-handling is cleaned up to only let a file-not-found error
through, not other sorts of errors. And when a file-not-found
error happens, we clean up the error.
* Test now checks that file-not-found introduces no error. And
other minor cleanups.
The create function with default refspec is the same as the one with a
custom refspec, but it has the default refspec, so we can create the one
on top of the other.
When we first ask OpenSSL to verify the certificate itself (rather
than the HTTPS specifics), we should also return
GIT_ECERTIFICATE. Otherwise, the caller would consider this a failed
operation rather than a failed validation, and not call the user's own
validation.
For example, if you have
[include]
path = foo
and foo didn't exist, git_config_open_ondisk() would just give up
on the rest of the file. Now it ignores the unresolved include
without error and continues reading the rest of the file.
Already cherry-picked commits should not be re-included. If all changes
included in a commit exist in the upstream, then we should error with
GIT_EAPPLIED.
`git_rebase_next` will apply the next patch (or cherry-pick)
operation, leaving the results checked out in the index / working
directory so that consumers can resolve any conflicts, as appropriate.
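A sketch of the driving loop, assuming a prepared `rebase` and a
`committer` signature in scope (current signatures; error handling
trimmed):

    git_rebase_operation *op;
    int error;

    while ((error = git_rebase_next(&op, rebase)) == 0) {
        /* resolve any conflicts left in the index / workdir, then: */
        git_oid id;
        git_rebase_commit(&id, rebase, NULL /* keep author */,
                          committer, NULL, NULL);
    }

    if (error == GIT_ITEROVER)
        git_rebase_finish(rebase, committer);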
Remote objects are not meant to be changed from under the user. We did
this in rename, but only the name and left the refspecs, such that a
save would save the wrong refspecs (and a fetch and anything else would
use the wrong refspecs).
Instead, let's simply take a name and not change any loaded remote from
under the user.
This function does not in fact tell us anything, as almost anything with
a colon in it is a valid rsync-style SSH path; it cannot tell us that
we do not support ftp or afp or similar, as those are still valid SSH
paths and we do support them.
All versions of SSL are considered deprecated now, so let's ask OpenSSL
to only use TLSv1. We still ask it to load those ciphers for
compatibility with servers which want to use an older hello but will use
TLS for encryption.
For good measure we also disable compression, which can be exploitable,
if the OpenSSL version supports it.
Git for Windows will handle UNC paths only when in forward-slash
format, eg "//server/path". When given a UNC path as a remote,
rewrite standard format ("\\server\path") into this ridiculous
format.
The entry_count field is the number of index entries covered by a
particular cache entry, that is, how many files there are (recursively)
under a particular directory.
The current code that attempts to do this is severely deficient and is
trying to count the number of children, which always comes up to zero.
We don't even need to recount, since we have the information during the
cache creation. We can take that number and keep it, as we only ever
invalidate or replace.
FindFirstFile will fail with INVALID_HANDLE_VALUE if there are no
children to the given path, which can happen if the given path is a
file (and obviously has no children) or if the given path is an empty
mount point. (Most directories have at least the directory entries '.'
and '..', but, ridiculously, a volume mounted in another drive letter's
path space does not, and thus has nothing to enumerate.)
If FindFirstFile fails, check if this is a directory-like thing
(a mount point).
A reparse point that is an IO_REPARSE_TAG_MOUNT_POINT could be
a junction or an actual filesystem mount point. (Who knew?)
If it's the latter, its reparse point will report the actual
volume information \??\Volume{GUID}\ and we should not attempt
to dereference that further, instead readlink should report
EINVAL since it's not a symlink / junction and its original
path was canonical.
Yes, really.
An obvious place to fill the tree cache is on write-tree, as we're
guaranteed to be able to fill in the whole tree cache.
The way this commit does this is not the most efficient, as we read the
root tree from the odb instead of filling in the cache as we go along,
but it fills the cache such that successive operations (and persisting
the index to disk) will be able to take advantage of the cache, and it
reuses the code we already have for filling the cache.
Filling in the cache as we create the trees would require some
reallocation of the children vector, which is currently not possible
with our pool implementation. A different data structure would likely
allow us to perform this operation at a later date.
Keeping the cache around after read-tree is only one part of the
optimisation opportunities. In order to share the cache between program
instances, we need to write the TREE extension to the index.
Do so, taking the opportunity to rename 'entries' to 'entry_count' to
match the name given in the format description. The included test is
rather trivial, but works as a sanity check.
When reading from a tree, we know what every tree is going to look like,
so we can fill in the tree cache completely, making modification of
trees via the index a lot quicker.
This simplifies freeing the entries quite a bit; though there aren't
that many failure paths right now, introducing filling the cache from a
tree will introduce more. This makes sure not to leak memory on errors.
If there have been no pushes, we can immediately return ITEROVER. If
there have been no hides, we must not run the uninteresting pre-mark
phase, as we do not want to hide anything and this would simply cause us
to spend time loading objects.
This introduces a phase at the start of preparing a walk which pre-marks
uninteresting commits, but only up to the common ancestors.
We do this in a similar way to git, by walking down the history and
marking (which is what we used to do), but we keep a time-sorted
priority queue of commits and stop marking as soon as there are only
uninteresting commits in this queue.
This is a similar rule to the one used to find the merge-base. As we
keep inserting commits regardless of the uninteresting bit, if there are
only uninteresting commits in the queue, it means we've run out of
interesting commits in our walk, so we can stop.
The old mark_uninteresting() logic is still in place, but that stops
walking if it finds an already-uninteresting commit, so it will stop on
the ones we've pre-marked; but keeping it allows us to also hide those
that are hidden via the callback.
We don't need the remote loaded, and the function extracted both of
these from the git_remote in order to do its work, so let's remove a
step and not ask for the loaded remote at all.
This fixes #2390.
The stash is implemented as the refs/stash reference and its reflog. In
order to modify the reflog, we need to avoid races by making sure we're
only ones allowed to modify the reflog.
We achieve this via the transactions API. Locking the reference gives us
exclusive write access, letting us modify and write it without races.
A transaction allows you to lock multiple references and set up changes
for them before applying the changes all at once (or as close as the
backend supports).
This can be used for replication purposes, or for making sure some
operations run when the reference is locked and thus cannot be changed.
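A sketch of the API as used for something like the stash reflog, with
`repo` and the new target id assumed in scope (names per the public
headers):

    git_transaction *tx;

    git_transaction_new(&tx, repo);
    git_transaction_lock_ref(tx, "refs/stash");   /* exclusive access */
    git_transaction_set_target(tx, "refs/stash", &new_id, NULL,
                               "stash message");
    git_transaction_commit(tx);                   /* apply all at once */
    git_transaction_free(tx);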
When a list of refspecs is passed to fetch (what git would consider
refspec passed on the command-line), we not only need to perform the
updates described in that refspec, but also update the remote-tracking
branch of the fetched remote heads according to the remote's configured
refspecs.
These "fetches" are not however to be written to FETCH_HEAD as they
would be duplicate data, and it's not what the user asked for.
The configured/base fetch refspecs need to be taken into account in
order to implement opportunistic remote-tracking branch updates. DWIM
them and store them in the struct, but don't do anything with them yet.
We can only DWIM when we've connected to the remote and have the list of
the remote's references. Adding or setting the refspecs should not
trigger an attempt to DWIM the refspecs as we typically cannot do it,
and even if we did, we would not use them for the current fetch.
With opportunistic ref updates, git has introduced the concept of having
base refspecs *and* refspecs that are active for a particular fetch.
Let's start by letting the user override the refspecs for download.
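With that, a one-off fetch can pass its own refspecs, as sketched here
with current signatures (the refspec is illustrative and `remote` is
assumed in scope):

    char *spec = "+refs/pull/*/head:refs/remotes/origin/pr/*";
    git_strarray refspecs = { &spec, 1 };

    /* these act like command-line refspecs: they drive this download,
     * while the configured ones still drive opportunistic updates */
    git_remote_fetch(remote, &refspecs, NULL, NULL);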
When we describe the workdir, we perform a describe on HEAD and then
check to see if the worktree is dirty. If it is and we have a suffix
string, we append that to the buffer.
Instead of printing out to the buffer inside the information-gathering
phase, write the data to an intermediate result structure.
This allows us to split the options into gathering options and
formatting options, simplifying the gathering code.
The getaddrinfo function indicates failure with a non-zero return code,
but this code is not necessarily negative. On platforms like Android
where the code is positive, a failed call causes libgit2 to segfault.
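The portable check is against non-zero, not negative:

    #include <stdio.h>
    #include <netdb.h>

    static int resolve(const char *host, const char *port,
                       struct addrinfo **info)
    {
        struct addrinfo hints = {0};
        int ret = getaddrinfo(host, port, &hints, info);

        if (ret != 0) {  /* failure; the sign of ret is unspecified */
            fprintf(stderr, "resolve failed: %s\n", gai_strerror(ret));
            return -1;
        }

        return 0;
    }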
Instead of spreading the data in function arguments, some of which
aren't used for ssh and having a struct only for ssh, use a struct for
both, using a common parent to pass to the callback.
This option makes it easy to ignore anything about the server we're
connecting to, which is bad security practice. This was necessary as we
didn't use to expose detailed information about the certificate, but now
that we do, we should get rid of this.
If the user wants to ignore everything, they can still provide a
callback which ignores all the information passed.
We should let the user decide whether to cancel the connection or not
regardless of whether our checks have decided that the certificate is
fine. We provide our own assessment to the callback to let the user fall
back to our checks if they so desire.
If the certificate validation fails (or always in the case of ssh),
let the user decide whether to allow the connection.
The data structure passed to the user is the native certificate
information from the underlying implementation, namely OpenSSL or
WinHTTP.
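A sketch of such a callback; returning 0 accepts the connection and a
negative value rejects it:

    #include <git2.h>

    static int check_cert(git_cert *cert, int valid,
                          const char *host, void *payload)
    {
        (void)host; (void)payload;

        if (valid)
            return 0;      /* defer to the library's own assessment */

        if (cert->cert_type == GIT_CERT_X509) {
            /* inspect ((git_cert_x509 *)cert)->data / ->len here */
        }

        return GIT_ECERTIFICATE;   /* reject the connection */
    }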
When using a bare repo with an index, libgit2 attempts to read
files from the index. It caches those files based on the path
to the file, specifically the path to the directory that contains
the file.
If there is no working directory, we use `git_path_dirname_r` to
get the path to the containing directory. However, for the
`.gitattributes` file in the root of the repository, this ends up
normalizing the containing path to `"."` instead of the empty
string and the lookup the `.gitattributes` data fails.
This adds a test of attribute lookups on bare repos and also
fixes the problem by simply rewriting `"."` to be `""`.
Passing 0 as the length of the paths to check to git_diff_index_to_workdir
results in all files being treated as conflicting, that is, all untracked or
modified files in the worktree are reported as conflicting.
A signature is made up of a non-empty name and a non-empty email so
let's validate that. This also brings us more in line with git, which
also rejects ident with an empty email.
The compiler was generating a bunch of warnings for
git_mutex_init and git_mutex_lock when GIT_THREADS
was not defined (i.e. when not using -DTHREADSAFE=ON).
Also remove an unused variable from tests/path/core.c.
When the call to the agent fails, we must retrieve the error message
just after the function call, as other calls may overwrite it.
As the agent authentication is the only one which has a teardown and
there does not seem to be a way to get the error message from a stored
error number, this tries to introduce some small changes to store the
error from the agent.
Clearing the error at the beginning of the loop lets us know whether the
agent has already set the libgit2 error message and we should skip it,
or if we should set it.
Teach git_repository_init_ext to use relative paths for the gitlink
to the work directory. This is used when creating a sub repository
where the sub repository resides in the parent repository's
.git directory.
When the fetch refspec does not include the remote's default branch, it
indicates an error in user expectations or programmer error. Error out
in that case.
This lets us get rid of the dummy refspec which can never work as its
zeroed out. In the cases where we did not find a default branch, we set
HEAD detached immediately, which lets us refactor the "normal" path,
removing `found_branch`.
It does the same as git_remote_supported_url() but has a name which
implies we'd check the URL for correctness while we're simply looking at
the scheme and looking it up in our lists.
While here, fix up the tests so we check all the combinations of what's
supported.
The previous commit makes it harder to figure out if the library was
built with support for a particular transport. Roll back some of the
changes and remove ssh:// and https:// from the list if we're being
built without support for them.
Even when built without SSH support, we know about this transport. It
is implemented, but the current code makes us return an error message
saying it's not.
This is a leftover from the initial implementation of the transports
when there were in fact transports we knew about but were not
implemented.
Instead, let the SSH transport itself say it cannot run, the same as we
do for HTTPS.
A repository can have any number of references which we're not
interested in such as notes or tags. For the default branch calculation
we only care about branches. Make the decision about the number of
branches rather than the number of refs in general.
When we see PARENT1, it means there is a local commit and thus we are
ahead. Likewise, seeing PARENT2 means that the upstream branch has a
commit and we are one more behind.
The logic is currently reversed. Correct it.
This fixes #2501.
Without this patch it is not possible to have a third-party ssh
transport_cb if GIT_SSH is disabled, or a third-party transport_cb which
has a higher priority than the default one.
Signed-off-by: Sven Strickroth <email@cs-ware.de>
The callers of git_packfile_unpack() expect the obj_offset argument to
be set to the beginning of the next object. We were mistakenly returning
the the offset of the object's data, which causes the CRC function to
try to use the wrong offset.
Set obj_offset to curpos instead of elem->offset to point to the next
element and bring back expected behaviour.
Our mkdir helper was failing if a parent directory was not
accessible even if the child directory could be created.
This changes the helper to keep trying child directories
even when the parent is unwritable.
The old `allocfmt` is of no use to callers, as they are not able to free
the returned buffer. Export a new API that returns a static string that
doesn't need to be freed.
The recv buffer (parse_buffer) and the buffer have independent sizes and
offsets. We try to fill in parse_buffer as much as possible before
passing it to the http parser. This is fine most of the time, but fails
us when the buffer is almost full.
In those situations, parse_buffer can have more data than we would be
able to put into the buffer (which may be getting full if we're towards
the end of a data sideband packet).
To work around this, we check if the space we have left on our buffer is
smaller than what could come from the network. If this happens, we make
parse_buffer think that it has as much space left as our buffer, so it
won't try to retrieve more data than we can deal with.
As the start of the data may no longer be at the start of the buffer, we
need to keep track of where it really starts (data_offset) and use that
in our calculations for the real size of the data we received from the
network.
This fixes #2518.
* Move the transport registration mechanisms into a new header under
'sys/' because this is advanced stuff.
* Remove the 'priority' argument from the registration as it adds
unnecessary complexity. (Since transports cannot decline to operate,
only the highest priority transport is ever executed.) Users who
require per-priority transports can implement that in their custom
transport themselves.
* Simplify registration further by taking a scheme (eg "http") instead
of a prefix (eg "http://").
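Registration then looks like this sketch (the factory is hypothetical):

    #include <git2/sys/transport.h>

    /* hypothetical factory matching git_transport_cb */
    static int my_transport_cb(git_transport **out, git_remote *owner,
                               void *param);

    /* a scheme, not a prefix: "foo", not "foo://" */
    git_transport_register("foo", my_transport_cb, NULL);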
In the check for multiline, we traverse the backslashes from the end
backwards and in the end assert that we haven't gone past the beginning
of the line. We make sure of this in the loop condition, but we also
check it in the return value.
However, for certain configurations, a line in a multiline variable
might be empty to aid formatting. In that case, 'end' == 'start', since
we ended up looking at the first char which made it a multiline.
There is no need for the (end > start) check in the return, since the
loop guarantees we won't go further back than the first char in the
line, and we do accept the first char to be the final backslash.
This fixes #2483.
While scanning through a directory hierarchy, this prevents a
positive ignore match on a parent directory from blocking the scan
of a directory when a negative match rule exists for files inside
the directory.
* Removes mingw-compat.h
* Cleans up separation of compiler/platform idiosyncrasies
* Unifies mingw/msvc stat structures and functions
* (Tries to) hide more compiler specific implementation details (even in our internal API)
We always calculate multiple merge bases, but up to now we had only
exposed the "best" merge base.
Introduce git_oidarray which analogously to git_strarray lets us return
multiple ids.
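Consumers iterate the array much like a git_strarray, as in this
sketch with `repo` and the two input ids assumed in scope:

    git_oidarray bases;

    if (git_merge_bases(&bases, repo, &one, &two) == 0) {
        for (size_t i = 0; i < bases.count; i++) {
            char hex[GIT_OID_HEXSZ + 1] = {0};
            git_oid_fmt(hex, &bases.ids[i]);
            printf("base: %s\n", hex);
        }
        git_oidarray_free(&bases);
    }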
This works around strict aliasing rules letting some versions of
GCC (particularly on RHEL 6) thinking that they can skip updating the
size of the array when calculating the next element's offset.
Preallocating two commits doesn't make much sense as leaving allocation
to the first array usage will allocate a sensible size with room for
growth.
This preallocation has also been hiding issues with strict aliasing in
the tests, as we have fairly simple histories and never trigger the
growth.
git allows you to set which paths to use for the git server programs
when connecting over ssh; and we want to provide something similar.
We do this by providing a factory function which can be set as the
remote's transport callback which will set the given paths upon
creation.
We used to assume a refspec would only have an asterisk in the middle of
their respective pattern. This has not been a valid assumption for some
time now with git.
Instead of assuming where the asterisk is going to be, change the logic
to treat each pattern as having two halves with a replacement bit in the
middle, where the asterisk is.
When transforming a non-pattern refspec, we simply need to copy over the
opposite string. Move that logic up to the wrapper so we can assume a
pattern refspec in the transformation function.
Move the definition of git_thread_yield() to the test which needs it and
add the correct definition for it for FreeBSD and derivatives.
Original patch adding FreeBSD and derivatives by @jacquesg.
In order to connect to a remote server, we need to provide a path to the
repository we're interested in. Consider the lack of path in the url an
error.
When the stream writing function was written, it assumed that
libssh2_channel_write() would always write all of the data to the
wire. This is only true for the first 32k of data, which it tries to
fit into one ssh packet.
Since it can perform short writes, call it in a loop like we do for
send(), advancing the buffer offset.
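The loop looks roughly like this (LIBSSH2_ERROR_EAGAIN handling for
non-blocking sessions elided):

    #include <libssh2.h>

    static int write_all(LIBSSH2_CHANNEL *channel, const char *buf,
                         size_t len)
    {
        size_t off = 0;

        while (off < len) {
            ssize_t ret = libssh2_channel_write(channel, buf + off,
                                                len - off);
            if (ret < 0)
                return -1;  /* short writes are fine; errors are not */
            off += (size_t)ret;
        }

        return 0;
    }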
As git_clone now has callbacks to configure the details of the
repository and remote, remove the lower-level functions from the public
API, as they lack some of the logic from git_clone proper.
Analogously to the remote creation callback, provide a way for the user
of git_clone() to create the repository with whichever options they
desire via callback.
git_checkout_index can now check out other git_index's (that are not
necessarily the repository index). This allows checkout_index to use
the repository's index for stat cache information instead of the index
data being checked out. git_merge and friends now check out their
indexes directly instead of trying to blend it into the running index.
To make sure that items returned from pool allocations are aligned
on nice boundaries, this rounds up all pool allocation sizes to a
multiple of 8. This adds a small amount of overhead to each item.
The rounding up could be made optional with an extra parameter to
the pool initialization that turned on rounding only for pools
where item alignment actually matters, but I think for the extra
code and complexity that would be involved, that it makes sense
just to burn a little bit of extra memory and enable this all the
time.
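The rounding itself is the usual power-of-two trick:

    /* round a requested size up to a multiple of 8 so that returned
     * items land on 8-byte boundaries */
    #define POOL_ALIGN(sz) (((sz) + 7) & ~(size_t)7)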
The OpenSSL library-loading functions do not expect to be called
multiple times. Add a flag in the non-threaded libgit2 init so we only
call once.
This fixes #2446.
git_remote_set_transport now takes a transport factory rather than a transport
git_clone_options now allows the caller to specify a remote creation callback
In order to know which authentication methods are supported/allowed by
the ssh server, we need to send a NONE auth request, which needs a
username associated with it.
Most ssh server implementations do not allow switching the username
between authentication attempts, which means we cannot use a dummy
username and then switch. There are two ways around this.
The first is to use a different connection, which an earlier commit
implements, but this increases how long it takes to get set up, and
without knowing the right username, we cannot guarantee that the
list we get in response is the right one.
The second is what's implemented here: if there is no username specified
in the url, ask for it first. We can then ask for the list of auth
methods and use the user's credentials in the same connection.
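A credential callback then sees a plain username request first,
roughly as in this sketch (the key path is hypothetical, and by the
later rounds a username is known):

    #include <git2.h>

    static int cred_cb(git_cred **out, const char *url,
                       const char *user_from_url,
                       unsigned int allowed_types, void *payload)
    {
        if (allowed_types & GIT_CREDTYPE_USERNAME)
            return git_cred_username_new(out, "git");

        if (allowed_types & GIT_CREDTYPE_SSH_KEY)
            return git_cred_ssh_key_new(out, user_from_url, NULL,
                                        "/home/me/.ssh/id_rsa", NULL);

        return GIT_PASSTHROUGH;
    }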
Set a message when we fail to lock.
Also make the put function void, since it's called from free, which
cannot report errors. The only errors we can experience here are
internal state corruption, so we assert that we are trying to put a
pack which we have previously got.
If we fail to insert the packfile in the map, make sure to free it.
This makes the free function only attempt to remove its mwindows from
the global list if we have opened the packfile to avoid accessing the
list unlocked.
Git for Windows 1.9.4 changed the behavior when the text=auto
attribute is specified and core.autocrlf=false. The previously observed
behavior would *not* filter files when going into the working
directory; the new behavior *does* filter. Update our behavior to match.
When checking out files, we're performing conversion into the user's
native line endings, but we only want to do it for files which have
consistent line endings. Refuse to perform the conversion for mixed-EOL
files.
The CRLF->LF filter is left as-is, as that conversion is considered to be
normalization by git and should force a conversion of the line endings.
Opening the same repository multiple times will currently open the same
file multiple times, as well as map the same region of the file multiple
times. This is not necessary, as the packfile data is immutable.
Instead of opening and closing packfiles directly, introduce an
indirection and allocate packfiles globally. This does mean locking on
each packfile open, but we already use this lock for the global mwindow
list so it doesn't introduce a new contention point.
Before calling the credentials callback, ask the server which
authentication methods it supports and report that to the user, instead
of simply reporting everything that the transport supports.
In case of an error, we do fall back to listing all of them.
Bring together all of the OpenSSL initialization to
git_threads_init() so it's together and doesn't need locks.
Moving it here also gives us libssh2 thread safety (when built against
openssl).
When using in a multithreaded context, OpenSSL needs to lock, and leaves
it up to application to provide said locks.
We were not doing this, and it's just luck that's kept us from crashing
up to now.
The OpenSSL init functions are not reentrant, which means that running
multiple fetches in parallel can cause us to crash.
Use a mutex to init OpenSSL, and since we're adding this extra checks,
init it only once.
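The shape of the fix, sketched against the OpenSSL 1.0-era init
functions:

    #include <pthread.h>
    #include <openssl/ssl.h>
    #include <openssl/err.h>

    static pthread_mutex_t ssl_init_lock = PTHREAD_MUTEX_INITIALIZER;
    static int ssl_initialized;

    static void init_ssl_once(void)
    {
        pthread_mutex_lock(&ssl_init_lock);
        if (!ssl_initialized) {
            SSL_library_init();        /* not reentrant; call once */
            SSL_load_error_strings();
            ssl_initialized = 1;
        }
        pthread_mutex_unlock(&ssl_init_lock);
    }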
Instead of using a sentinel empty value to detect the last commit, let's
check for when we get a NULL from popping the stack, which lets us know
when we're done.
The current code causes us to read uninitialized data, although only on
RHEL/CentOS 6 in release mode. This is a readability win overall.
If the user wants to keep a copy for themselves, they should make a
copy. It adds unnecessary complexity to make sure the returned entries
are valid until the builder is cleared.
Finding a filename in a vector means we need to resort it every time we
want to read from it, which includes every time we want to write to it
as well, as we want to find duplicate keys.
A hash-map fits what we want to do much more accurately, as we do not
care about sorting, but just the particular filename.
We still keep removed entries around, as the interface let you assume
they were going to be around until the treebuilder is cleared or freed,
but in this case that involves an append to a vector in the filter case,
which can now fail.
The only time we care about sorting is when we write out the tree, so
let's make that the only time we do any sorting.
When a symref inside the namespace gets renamed, we should make it
point to the target's new name.
This is for the origin/HEAD -> origin/master type of situations.
There is no reason why we need to use a callback here. A string array
fits better with the usage, as this is not an event and we don't need
anything from the user.
We must make sure that the name pointer remains valid, so make sure to
allocate the new one before freeing the old one and swap them so the
user never sees an invalid pointer.
We don't allow renames of anonymous remotes, so there's no need to
handle them.
A remote is always associated with a repository, so there's no need to
check for that.
Tighten up which references we consider for renaming so we don't try to
rename unrelated ones and end up with unexplained references.
If there is a reference on the target namespace, git overwrites it, so
let's do the same.
When renaming a lock file to its final location, we need to make sure
that it is replaced atomically.
We currently have a workaround for Windows by removing the target file.
This means that the target file, which may be a ref or a packfile, may
cease to exist for a short while, which should be avoided.
Implement the workaround only on Windows, by making sure that the file
we want to replace is writable.
We already worked out the kinks with the function used in the local
transport. Expose it and make use of it in the local clone method
instead of trying to work it out again.
When removing the remote-tracking branches, build up the list and remove
in two steps, working around an issue with the iterator. Removing while
we're iterating over the refs can cause us to miss references.
Inside `git_remote_load`, the calls to `get_optional_config` use
`giterr_clear` to unset any errors that are set due to missing config
keys. If neither a fetch nor a push url config was found for a remote,
we should set an error again.
This fixes two issues I found when core.precomposeunicode is enabled:
* When creating a reference with a NFD string, the returned
git_reference would return this NFD string as the reference’s
name. But when looking up the reference later, the name would
then be returned as NFC string.
* Renaming a reference would not honor the core.precomposeunicode and
apply no normalization to the new reference name.
If requested, git_clone_local_into() will try to link the object files
instead of copying them.
This only works on non-Windows (since it doesn't have this) when both
are on the same filesystem (which are unix semantics).
A call like git_clone("./foo", "./foo1") writes origin's url as './foo',
which makes it unusable, as they're relative to different things.
Go with git's behaviour and store the realpath as the url.
When git is given such a path, it will perform a "local clone",
bypassing the git-aware protocol and simply copying over all objects
that exist in the source.
Copy this behaviour when given a local path.
We have too many places where we repeat free code, so when adding the
new free to the generic code, it didn't take for the local transport.
While there, fix a C99-ism that sneaked through.
The protocol has a capability which allows the server to tell us which
refs are symrefs, so we can e.g. know which is the default branch.
This capability is different from the ones we already support, as it's
not setting a flag to true, but requires us to store a list of
refspec-formatted mappings.
This commit does not yet expose the information in the reference
listing.
Use size_t for page size, instead of long. Check result of sysconf.
Use size_t for page offset so no cast to size_t (second arg to p_mmap).
Use mod instead of a div/mult pair, so no cast to size_t is necessary.
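Sketch of the resulting computation (the helper name is hypothetical):

    #include <unistd.h>
    #include <sys/mman.h>

    static void *map_at(int fd, off_t offset, size_t len)
    {
        long ps = sysconf(_SC_PAGE_SIZE);
        if (ps < 0)
            return MAP_FAILED;              /* sysconf can fail */

        size_t pagesize = (size_t)ps;
        /* mod, rather than a div/mult pair */
        size_t shift = (size_t)(offset % (off_t)pagesize);

        return mmap(NULL, len + shift, PROT_READ, MAP_PRIVATE,
                    fd, offset - (off_t)shift);
    }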
If you enabled core.safecrlf on an LF-ending platform, we would
error even for files with all LFs. We should only be warning on
irreversible mappings, I think.
We set up the current branch after we fetch from the remote. This means
that the user's refspec may have already created this reference. It is
therefore not an error if we cannot create the branch because it already
exists.
This allows for the user to replicate git-clone's --mirror option.
Instead of changing the user-provided remote, duplicate it so we can add
the extra refspec without having to worry about unsetting it before
returning.
Windows has its own ftruncate() called _chsize_s().
p_mkstemp() is changed to use p_open() so we can make sure we open for
writing; the addition of exclusive create is a good thing to do
regardless, as we want a temporary path for ourselves.
Lastly, MSVC doesn't quite know how to add two numbers if one of them is a
void pointer, so let's alias it to unsigned char.
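The portability shim amounts to something like:

    /* Windows spells ftruncate() as _chsize_s() */
    #ifdef _WIN32
    # include <io.h>
    # define p_ftruncate(fd, sz) _chsize_s((fd), (sz))
    #else
    # include <unistd.h>
    # define p_ftruncate(fd, sz) ftruncate((fd), (sz))
    #endif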
Some OSs cannot keep their ideas about file content straight when mixing
standard IO with file mapping. As we use mmap for reading from the
packfile, let's make writing to the pack file use mmap.
When running multithreaded, it is not enough to check for the offmap
allocation. Move the call to cache_init() to packfile allocation so we
can be sure it is always allocated free of races.
This fixes #2355.
The switch makes the loop somewhat unwieldy. Let's assume it's fine and
perform the check when we're accessing the data.
This makes our code look a lot more like git's.
Dependency chains are often large and require a few
reallocations. Allocate a 64-element chain before doing anything else to
avoid allocations during the loop.
This value comes from the stack-allocated one git uses. We still
allocate this on the heap, but it does help performance a little bit.
Bring back the use of the delta base cache for unpacking objects. When
generating the delta chain, we stop when we find a delta base in the
pack's cache and use that as the starting point.
We currently make use of recursive function calls to unpack an object,
resolving the deltas as we come back down the chain. This means that we
have unbounded stack growth as we look up objects in a pack.
This is now done in two steps: first we figure out what the dependency
chain is by looking up the delta bases until we reach a non-delta
object, pushing the information we need onto a stack and then we pop
from that stack and apply the deltas until there are no more left.
This version of the code does not make use of the delta base cache so it
is slower than what's in the mainline. A later commit will reintroduce
it.
Repeating this error message makes it harder to find out where we
actually are finding the error, and they don't really describe what
we're trying to do.
When using Iconv to convert unicode data and iconv doesn't like
the source data (because it thinks that it's not actual UTF-8),
instead of stopping the operation, just use the unconverted data.
This will generally do the right thing on the filesystem, since
that is the source of the non-UTF-8 path data anyhow.
This adds some tests for creating and looking up branches with
messy Unicode names. Also, this takes the helper function that
was previously internal to `git_repository_init` and makes it
into `git_path_does_fs_decompose_unicode` which is a useful in
tests to understand what the expected results should be.
Our vector does a move of the rest of the array when we remove an
item. Doing this repeatedly can be expensive, and we do this a lot in
the indexer. Instead, set the value to NULL and skip those entries.
perf reported around 30% of `index-pack` time was going into
memmove. With this change, that goes away and we spent most of the time
hashing and inflating data.
This adds in missing calls to `git_buf_sanitize` and fixes a
number of places where `git_buf` APIs could inadvertently write
NUL terminator bytes into invalid buffers. This also changes the
behavior of `git_buf_sanitize` to NUL terminate a buffer if it can
and of `git_buf_shorten` to do nothing if it can.
Adds tests of filtering code with zeroed (i.e. unsanitized) buffer
which was previously triggering a segfault.
Diff and status do not want core.safecrlf to actually raise an
error regardless of the setting, so this extends the filter API
with an additional options flags parameter and adds a flag so that
filters can be applied with GIT_FILTER_OPT_ALLOW_UNSAFE, indicating
that unsafe filter application should be downgraded from a failure
to a warning.
1) A call to git_shutdown results in setting git__n_shutdown_callbacks
to -1. The next call to git__on_shutdown then results in an ABW (Array
Bound Write) for the array git__shutdown_callbacks. In the current
implementation, git_atomic_dec is called git__n_shutdown_callbacks + 1
times. I have modified it into a for loop so that it is more readable;
it no longer sets git__n_shutdown_callbacks to a negative number, and it
resets the elements of git__shutdown_callbacks to NULL.
2) In the function git_sysdir_get, the shutdown function is registered
only if git_sysdir__dirs_shutdown_set is set to 0. However, after this
variable is set to 1, it is never reset to 0. If git_sysdir_global_init
is called again from synchronized_threads_init, it does not register the
shutdown function for this subsystem.
The diff code was using an "ignored_prefix" directory to track if
a parent directory was ignored that contained untracked files
alongside tracked files. Unfortunately, when negative ignore rules
were used for directories inside ignored parents, the wrong rules
were applied to untracked files inside the negatively ignored
child directories.
This commit moves the logic for ignore containment into the workdir
iterator (which is a better place for it), so the ignored-ness of
a directory is contained in the frame stack during traversal. This
allows a child directory to override with a negative ignore and yet
still restore the ignored state of the parent when we traverse out
of the child.
Along with this, there are some problems with "directory only"
ignore rules on container directories. Given "a/*" and "!a/b/c/"
(where the second rule is a directory rule but the first rule is
just a generic prefix rule), then the directory only constraint
was having "a/b/c/d/file" match the first rule and not the second.
This was fixed by having ignore directory-only rules test a rule
against the prefix of a file with LEADINGDIR enabled.
Lastly, spot checks for ignores using `git_ignore_path_is_ignored`
were tested from the top directory down to the bottom to deal with
the containment problem, but this is wrong. We have to test bottom
to top so that negative subdirectory rules will be checked before
parent ignore rules.
This does change the behavior of some existing tests, but it seems
only to bring us more in line with core Git, so I think those
changes are acceptable.
The brace in the check for peel's return was surrounding the wrong
thing, which made 'error' be set to 1 when there was an error instead of
the error code.
We assume that everything under GIT_DIR/objects/ is a directory. This is
not necessarily the case if some process left a stray file in there.
Check beforehand if we do have a directory and ignore the entry
otherwise.
There are a few tests that set up a fake home directory and a
fake GLOBAL search path so that we can test things in global
ignore or attribute or config files. This cleans up that code to
work more robustly even if there is a test failure. This also
fixes some valgrind warnings where scanning search paths for
separators could end up doing a little bit of sketchy data access
when coming to the end of search list.
This is a proposed adjustment to the trace APIs. This makes the
trace levels into a bitmask so that they can be selectively enabled
and adds a callback-level payload, plus a message-level payload.
This makes it easier for me to add GIT_TRACE_PERF callbacks that
are simply bypassed if the PERF level is not set.
This adds an option to refresh the stat cache while generating
status. It also rips out the GIT_PERF stuff I had and makes use
of the trace API to keep statistics about what happens during diff.
When diff is scanning the working directory, if it finds a file
where it is not sure if the index entry matches the working dir,
it will recalculate the OID (which is pretty expensive). This
adds a new flag to diff so that if the OID calculation finds that
the file actually has not changed (i.e. just the modified time was
altered or such), then it will refresh the stat cache in the index
so that future calls to diff will not have to check the oid again.
This reorganized the diff OID calculation to make it easier to
correctly update the stat cache during a diff once the flags to
do so are enabled.
This includes marking the path of a git_index_entry as const so
we can make a "fake" git_index_entry with a "const char *" path
and not get warnings. I was a little surprised at how unobtrusive
this change was, but I think it's probably a good thing.
When we think the stat cache in the index seems valid and the size
or mode of a file has definitely changed, then don't bother trying
to recalculate the OID of the workdir bits to confirm that it is
modified - just accept that it is modified.
This can result in files that show as modified with no actual diff,
but the behavior actually appears to match Git on the command line.
This also includes a minor optimization to not perform a submodule
lookup on the ".git" directory itself.
The current version of the commit creation and amend function are unsafe
to use when passing the update_ref parameter, as they do not check that
the reference at the moment of update points to what the user expects.
Make sure that we're moving history forward when we ask the library to
update the reference for us by checking that the first parent of the new
commit is the current value of the reference. We also make sure that the
ref we're updating hasn't moved between the read and the write.
Similarly, when amending a commit, make sure that the current tip of the
branch is the commit we're amending.
When deflating data, we might need to grow the buffer. Currently we
add a guess on top of the currently-allocated buffer size.
When we re-use the buffer, it already has some memory allocated; adding
to that means that we always grow the buffer regardless of how much we
need to use.
Instead, increase on top of the currently-used size. This still leaves
us with the allocated size of the largest object we compress, but it's a
minor pain compared to unbounded growth.
This fixes #2285.
It's possible for an encrypted connection not to have a certificate. In
this case, SSL_get_verify_result() will return OK because no error
happened (as it never even tried to validate anything).
SSL_get_peer_certificate() will return NULL in this case so we need to
catch that. On the upside, the current code would segfault in this
situation instead of letting it through as a valid cert.
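The check, sketched against the OpenSSL API:

    #include <openssl/ssl.h>

    static int check_peer(SSL *ssl)
    {
        X509 *cert = SSL_get_peer_certificate(ssl);

        if (cert == NULL)  /* encrypted, but nothing was validated */
            return -1;

        X509_free(cert);
        return SSL_get_verify_result(ssl) == X509_V_OK ? 0 : -1;
    }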
When considering status of untracked directories, if we find an
explicitly ignored item, even if it is a directory, treat the
parent as an IGNORED item. It was accidentally being treated as
an EMPTY item because we were not looking into the ignored subdir.
The current FETCH_HEAD parsing code assumes that a quote must end the
branch name. Git however allows for quotes as part of a branch name,
which causes us to consider the FETCH_HEAD file as invalid.
Instead of searching for a single quote char, search for a quote char
followed by SP, which is not a valid part of a ref name.
In the iterator, distinguish between ignores and empty directories
so that diff and status can ignore empty directories, but checkout
and stash can treat them as untracked items.
When diff finds an untracked directory, it emulates Git behavior
by looking inside the directory to see if there are any untracked
items inside it. If there are only ignored items inside the dir,
then diff considers it ignored, even if there is no direct ignore
rule for it.
Checkout was not copying this behavior - when it found an untracked
directory, it just treated it as untracked. Unfortunately, when
combined with GIT_CHECKOUT_REMOVE_UNTRACKED, this made it seem that
checkout (and stash, which uses checkout) was removing ignored
items when you had only asked it to remove untracked ones.
This commit moves the logic for advancing past an untracked dir
while scanning for non-ignored items into an iterator helper fn,
and uses that for both diff and checkout.
To emulate git, stash should not remove untracked git repositories
inside the parent repo, and checkout's REMOVE_UNTRACKED should
also skip over these items.
`git stash` actually prints a warning message for these items.
That should be possible with a checkout notify callback if you
wanted to, although it would require a bit of extra logic as things
are at the moment.
This takes the `--stat` and related example options in the example
diff.c program and converts them to use the `git_diff_get_stats`
API which nicely formats stats for you.
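The example code now boils down to this sketch, with `diff` assumed in
scope:

    git_diff_stats *stats;
    git_buf buf = GIT_BUF_INIT;

    git_diff_get_stats(&stats, diff);
    git_diff_stats_to_buf(&buf, stats,
            GIT_DIFF_STATS_FULL | GIT_DIFF_STATS_INCLUDE_SUMMARY, 80);
    printf("%s", buf.ptr);

    git_buf_free(&buf);
    git_diff_stats_free(stats);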
I went to add bar-graph scaling to the stats formatter and noticed
that the `git_diff_stats` structure was holding on to all of the
`git_patch` objects. Unfortunately, each of these objects keeps
the full text of the diff in memory, so this is very expensive. I
ended up modifying `git_diff_stats` to keep just the data that it
needs to keep and allowed it to release the patches. Then, I added
width scaling to the output on top of that.
In making the diff example program match 'git diff' output, I ended
up removing a newline from the summary output which I then had to
compensate for in the email formatting to match the expectations.
Lastly, I went through and refactored the tests to use a couple of
helper functions and reduce the overall amount of code there.
I was playing with "git diff-index" and wanted to be able to
emulate that behavior a little more closely with the diff example.
Also, I wanted to play with running `git_diff_tree_to_workdir`
directly even though core Git doesn't exactly have the equivalent,
so I added a command line option for that and tweaked some other
things in the example code.
This changes a minor output thing in that the "raw" print helper
function will no longer add ellipses (...) if the OID is not
actually abbreviated.
Allow the credentials callback to return GIT_PASSTHROUGH to make the
transports code behave as though none was set.
This should make it easier for bindings to behave closer to the C code
when there is no credentials callback set at their level.
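For a binding, the sketch is a callback that simply declines:

    #include <git2.h>

    static int cred_cb(git_cred **out, const char *url,
                       const char *username,
                       unsigned int allowed_types, void *payload)
    {
        (void)out; (void)url; (void)username;
        (void)allowed_types; (void)payload;

        return GIT_PASSTHROUGH;  /* as though no callback was set */
    }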
Only apply LEADING_DIR pattern munging to patterns in ignore and
attribute files, not to pathspecs used to select files to operate
on. Also, allow internal macro definitions to be evaluated before
loading all external ones (important so that external ones can
make use of internal `binary` definition).
Ignore patterns that ended with a trailing '/*' were still needing
to match against another actual '/' character in the full path.
This is not the same behavior as core Git.
Instead, we strip a trailing '/*' off of any patterns that were
matching and just take it to imply the FNM_LEADING_DIR behavior.
There was a latent bug where files that use macro definitions
could be parsed before the macro definitions were loaded. Because
of attribute file caching, preloading files that are going to be
used doesn't add a significant amount of overhead, so let's always
preload any files that could contain macros before we assemble the
actual vector of files to scan for attributes.
When traversing the directory structure, the iterator pushes and
pops ignore files using a vector. Some directories don't have
ignore files, so it uses a path comparison to decide when it is
right to actually pop the last ignore file. This was only
comparing directory suffixes, though, so a subdirectory with the
same name as a parent could result in the parent's .gitignore
being popped off the list ignores too early. This changes the
logic to compare the entire relative path of the ignore file.
The ssh-specific credentials allow the username to be missing; the
idea is that the ssh transport will then use the username provided in
the url, if it's available. There are two main issues with this.
The credential callback already knows what username was provided by the
url and needs to figure out whether it wants to ask the user for it or
it can reuse it, so passing NULL as the username means the credential
callback is suspicious.
The username provided in the url is not in fact used by the
transport. The only time it even considers it is for the user/pass
credential, which asserts the existence of a username in its
constructor. For the ssh-specific ones, it passes in the username stored
in the credential, which is NULL. The libssh2 macro we use runs strlen()
against this value (which is no different from what we would be doing
ourselves), so we then crash.
As the documentation doesn't suggest to leave out the username, assert
the need for a username in the code, which removes this buggy behavior
and removes implicit state.
git_cred_has_username() becomes a blacklist of credential types that do
not have a username. The only one at the moment is the 'default' one,
which is meant to call up some Microsoft magic.
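In practice this means callers always pass the username explicitly;
for example (the key paths below are placeholders):

    git_cred *cred = NULL;

    /* the username argument may no longer be NULL */
    git_cred_ssh_key_new(&cred, "git",
        "/home/me/.ssh/id_rsa.pub", /* placeholder public key path */
        "/home/me/.ssh/id_rsa",     /* placeholder private key path */
        NULL);                      /* no passphrase */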
Now that our strmap is no longer modified but replaced, we can use the
same strmap for the snapshot's values and it will be freed when we don't
need it anymore.
When we delete an entry, we also want to refresh the configuration to
catch any changes that happened externally.
This allows us to simplify the logic, as we no longer need to delete
these variables internally. The whole state will be refreshed and the
deleted entries won't be there.
With the isolation of complex reads, we can now try to refresh the
on-disk file before reading a value from it.
This changes the semantics a bit, as before we could be sure that a
string we got from the configuration was valid until we wrote or
refreshed. This is no longer the case, as a read can also invalidate the
pointer.
Current code sets the active map to a new one and builds it whilst it's
active. This is a race condition with someone else trying to access the
same config.
Instead, let's build up our new map and swap the active and new one.
In order to have consistent views of the config files for remotes,
submodules et al. and a configuration that represents what is currently
stored on-disk, we need a way to provide a view of the configuration
that does not change.
The goal here is to provide the snapshotting part by creating a
read-only copy of the state of the configuration at a particular point
in time, which does not change when a repository's main config changes.
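A sketch of the intended use, assuming `repo` is an open repository
(error handling elided):

    git_config *cfg = NULL, *snap = NULL;
    const char *name = NULL;

    git_repository_config(&cfg, repo);
    git_config_snapshot(&snap, cfg); /* read-only, consistent view */

    /* strings read from the snapshot stay valid until it is freed */
    git_config_get_string(&name, snap, "user.name");

    git_config_free(snap);
    git_config_free(cfg);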
The checks to see if files were out of date in the attribute cache
were wrong because the cache-breaker data wasn't getting stored
correctly. Additionally, when the cache-breaker triggered, the
old file data was being leaked.
I don't love this approach, but achieving thread-safety for
attribute and ignore data while reloading files would require a
larger rewrite in order to avoid this. If an attribute or ignore
file is out of date, this holds a lock on the file while we are
reloading the data so that another thread won't try to reload the
data at the same time.
In the threading tests, I was still seeing a race condition where
the same item could end up being inserted multiple times into the
index. Preserving the sorted-ness of the index outside of the
`index_insert` call fixes the issue.
This is a big refactoring of the attribute file cache to be a bit
simpler which in turn makes it easier to enforce a lock around any
updates to the cache so that it can be used in a threaded env.
Tons of changes to the attributes and ignores code.
While I was looking at the conflict cleanup code, I looked over at
the tree cache code, since we clear the tree cache for each entry
that gets removed and there is some redundancy there. I made some
small tweaks to avoid extra calls to strchr and strlen in a few
circumstances.
I introduced a leak into conflict cleanup by removing items from
inside the git_vector_remove_matching call. This simplifies the
code to just use one common way for the two conflict cleanup APIs.
When an index has an active snapshot, removing an item can cause
an error (inserting into the deferred deletion vector), so I made
the git_index_conflict_cleanup API return an error code. I felt
like this wasn't so bad since it is just like the other APIs.
I fixed up a couple of comments while I was changing the header.
The iterator pushes and pops ignores incrementally onto a list as
it traverses the directory structure so that it doesn't have to
constantly recheck which ignore files apply. With the new ref
counting, it wasn't decrementing the refcount on the ignores that
it removed from the vector.
This makes the lock management on the index a little bit broader,
having a number of routines hold the lock across looking up the
item to be modified and actually making the modification. This is
still not true thread safety, but more pure index modifications are
now safe, which makes the simple cases (such as starting up a diff
while index modifications are underway) safe enough to get the
snapshot without hitting allocation problems.
As part of this, I simplified the allocation of index entries to
use a flex array and just put the path at the end of the index
entry. This makes every entry self-contained and makes it a
little easier to feel sure that pointers to strings aren't
being accidentally copied and freed while other references are
still being held.
This adds a basic test of doing simultaneous diffs on multiple
threads and adds basic locking for the attr file cache because
that was the immediate problem that arose from these tests.
This makes the index iterator honor the GIT_ITERATOR_IGNORE_CASE
and GIT_ITERATOR_DONT_IGNORE_CASE flags without modifying the
index data itself. To take advantage of this, I had to export a
number of the internal index entry comparison functions. I also
wrote some new tests to exercise the capability.
The usefulness of these helpers came up for me while debugging
some of the iterator changes that I was making, so since they
have also been requested (albeit indirectly) I thought I'd include
them.
Again, laying groundwork for some index iterator changes, this
contains a bunch of code refactorings for index internals that
should make it easier down the line to add locking around index
modifications. Also this removes the redundant prefix_position
function and fixes some potential memory leaks.
Ignore rules with slashes in them are matched using FNM_PATHNAME
and use the path to the .gitignore file from the root of the
repository along with the path fragment (including slashes) in
the ignore file itself. Unfortunately, the relative path to the
.gitignore file was being applied to the global core.excludesfile
if that was also named ".gitignore".
This fixes that with more precise matching and includes test for
ignore rules with leading slashes (which were the primary example
of this being broken in the real world).
This also backports an improvement to the file context logic from
the threadsafe-iterators branch where we don't rely on mutating
the key of the attribute file name to generate the context path.
There were a couple of bugs in popping ignore files during iteration
that could result in incorrect decisions being made, and thus in
ignore files below the root either not being loaded correctly or
not being popped at the right time.
One bug was an off-by-one in comparing the path of the gitignore
file with the path being exited during iteration.
The second bug was not correctly truncating the path being tracked
during traversal if there were no ignores on the list (i.e. when
you have no .gitignore at the root, but do have some in contained
directories).
This updates how libgit2 treats submodule-like directories that
actually have tracked content inside of them. This is a strange
corner case, but it seems that many people have abortive submodule
setups and then just go ahead and add the files into the parent
repository. In this case, we should just treat the submodule as if
it were a normal directory.
Libgit2 will still try to skip over real submodules and contained
repositories that do not have tracked files inside them, but this
adds some new handling for cases where the apparent submodule data
is in conflict with the actual list of tracked files.
git_merge_base() returns GIT_ENOTFOUND when it cannot find a merge
base. graph_descendant_of() returns a boolean value (barring any
errors), so it needs to catch the GIT_ENOTFOUND return value and
convert it into false, as no merge base means it cannot be a
descendant.
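The shape of the fix is roughly the following (a sketch, not the
verbatim source):

    int git_graph_descendant_of(git_repository *repo,
        const git_oid *commit, const git_oid *ancestor)
    {
        git_oid merge_base;
        int error = git_merge_base(&merge_base, repo, commit, ancestor);

        if (error == GIT_ENOTFOUND)
            return 0;     /* no merge base: cannot be a descendant */
        if (error < 0)
            return error; /* pass real errors through */

        return git_oid_equal(&merge_base, ancestor);
    }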
The internal buffer in the `git_path_iconv_t` structure was not
being reset before the calls to `iconv` were made to convert data,
so if there were multiple decomposed Unicode paths in a single
directory, paths after the first one were being appended to the
first instead of treated as independent data.
The base for the relative urls is determined as follows, with descending
priority:
- remote url of HEAD's remote tracking branch
- remote "origin"
- workdir
This follows git.git behaviour.
When the current branch is unborn, git will still mark the current
branch's upstream for-merge if there is an upstream configuration. The
only non-contrived case is cloning from an empty repository which then
gains history; origin's master should be marked for-merge.
In order to do this, we cannot use the high-level wrappers that expect a
reference, as we may not have one. Move over to the internal ones that
expect a reference name, which we do have.
When doing a diff for use in status, we should never show the
content of a git repository contained inside another one. The
logic to do this was looking for a .git directory and so when a
gitlink plain .git file was used, it was failing to exclude the
directory content.
There was a little bug where the submodule cache thought that the
index was out of date even when it wasn't, which resulted in some
extra scans of index data even when not needed.
Mostly this commit adds a bunch of new tests including adding and
removing submodules in the index and in the HEAD and seeing if we
can automatically pick them up when refreshing.
Wrote tests that try adding, removing, and updating the name of
submodules which showed a number of problems with how we account
for changes when incrementally updating the submodule info. Most
of these issues didn't exist before because reloading would always
blow away the old submodule data.
This slightly improves the management of the lock around submodule
cache updates, using the lock to make sure that foreach can
safely make a snapshot of all existing submodules and making sure
that git_submodule_add_setup also grabs a lock before inserting
the new submodule. Cache initialization / refresh should already
have been holding the lock correctly as it adds submodules.
When forcing cache flushes or reload, etc., it is easier to keep
track of intent using enums instead of plain bools. Also, this
fixes a bug where the cache was not being properly refreshed by
a git_submodule_reload_all.
This makes submodule cache refresh actually look at the timestamps
from the data sources for submodules and reload as needed if they
have changed since the last refresh.
This takes the old submodule cache which was just a git_strmap
and makes a real git_submodule_cache object that can contain other
things like a lock and timestamp-ish data to control refreshing of
submodule info.
This fixes `git_submodule_sync` to correctly update the remote URL
of the default branch of the submodule along with the URL in the
parent repository config (i.e. match core Git's behavior).
Also move some useful helper logic from the submodule code into
a shared config API `git_config__update_entry` that can either set
or delete an entry with constraints like not overwriting or not
creating a new entry. I used that helper to update a couple other
places in the code.
There are a few places where we need to join three strings to
assemble a path. This adds a simple join3 function to avoid the
comparatively expensive join_n (which calls strlen on each string
twice).
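A simplified standalone sketch of the idea (the real helper builds
into a git_buf): measure each component once, then copy:

    #include <stdlib.h>
    #include <string.h>

    static char *join3(char sep, const char *a, const char *b,
                       const char *c)
    {
        size_t la = strlen(a), lb = strlen(b), lc = strlen(c);
        char *out = malloc(la + lb + lc + 3); /* 2 separators + NUL */
        char *p = out;

        if (!out)
            return NULL;

        memcpy(p, a, la); p += la;
        if (la && p[-1] != sep)
            *p++ = sep;
        memcpy(p, b, lb); p += lb;
        if (lb && p[-1] != sep)
            *p++ = sep;
        memcpy(p, c, lc); p += lc;
        *p = '\0';

        return out;
    }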
The order in this function is the opposite to what
create_with_fetchspec() has, so change this one, as url-then-refspec is
what git does.
As we need to break compilation and the swap doesn't do that, let's take
this opportunity to rename in-memory remotes to anonymous as that's
really what sets them apart.
With the changes to how git_path_dirload_with_stat handles things
that look like submodules, submodules could end up sorted in the
wrong order with the workdir iterator. This moves the submodule
check earlier in the iterator processing of a new directory so
that the submodule name updates will happen immediately and the
sort order will be correct.
While looping over multiple heads, an up-to-date head will clobber the `remote->need_pack` setting, preventing the rest of the machinery from building and downloading a pack-file, breaking fetches.
If the first call to release a no-longer-existent submodule freed
the object, the check if a second is needed would dereference the
data that was just freed.
When a file is open for reading (without shared-delete permission), and
then a different thread/process called p_rename, that would fail, even
if the file was only open for reading for a few milliseconds. This
change lets p_rename wait up to 50ms for the file to be closed by the
reader. Applies only to win32.
This is especially important for git_filebuf_commit, because writes
should not fail if the file is read simultaneously.
Fixes #2207
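A hedged sketch of the retry loop using the win32 API (the real code
lives in the posix emulation layer and considers more error codes):

    #include <windows.h>

    static int rename_with_retry(const wchar_t *from, const wchar_t *to)
    {
        int waited_ms = 0;

        while (!MoveFileExW(from, to,
                MOVEFILE_REPLACE_EXISTING | MOVEFILE_COPY_ALLOWED)) {
            if (GetLastError() != ERROR_SHARING_VIOLATION ||
                waited_ms >= 50)
                return -1;
            Sleep(5); /* give the reader a chance to close the file */
            waited_ms += 5;
        }

        return 0;
    }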
When a submodule was inserted with a different path and name, the
greater-than-zero return value from khash was allowed to propagate
back out to the caller when it should really have been zeroed. This
led to a possible crash when reloading submodules if that was the
first time that submodule data was loaded.
The reload_all call could end up dereferencing a NULL pointer if
there was an error while attempting to load the submodules config
data (i.e. invalid content in the gitmodules file). This fixes it.
This cleans up some places I missed that could hold onto submodule
references and cleans up the way in which the repository cache is
both reloaded and released so that existing submodule references
aren't destroyed inappropriately.
When a directory containing a .git directory (or even just a plain
gitlink) was found, libgit2 was going out of its way to treat it
specially. This seemed like it was necessary because the diff
code was not originally emulating Git's behavior for untracked
directories correctly (i.e. scanning for ignored vs untracked items
inside). Now that libgit2 diff mimics Git's untracked directory
behavior, the special handling for contained Git repos is actually
incorrect and this commit rips it out.
`git_submodule` objects were already refcounted internally in case
the submodule name was different from the path at which it was
stored. This makes that refcounting externally used as well, so
`git_submodule_lookup` and `git_submodule_add_setup` return an
object that requires a `git_submodule_free` when done.
As a way to speed up the cases where we need to hide some commits, we
find out what the merge bases are so we know to stop marking commits as
uninteresting and avoid walking down a potentially very large number
of commits which we will never see. There are, however, two
oversights in the current code.
The merge-base finding algorithm fails to recognize that if it is only
given one commit, there can be no merge base. It instead walks down the
whole ancestor chain needlessly. Make it return an empty list
immediately in this situation.
The revwalk does not know whether the user has asked to hide any commits
at all. In situation where the user pushes multiple commits but doesn't
hide any, the above fix wouldn't do the trick. Keep track of whether the
user wants to hide any commits and only run the merge-base finding
algorithm when it's needed.
The reflog append function was overzealous in its checking. When passed
old and new ids, it should not do any checking, but just serialize
the data to a reflog entry.
The existing tests lack checks for zeroed ids when switching back from
an unborn branch, as well as for what happens when detaching.
The reflog appending function mistakenly wrote zeros when dealing with a
detached HEAD. This explicitly checks for those situations and fixes
them.
When we update the current branch, we must also append to HEAD's reflog
to keep them in sync.
This is a bit of a hack, but as git.git says, it covers 100% of
default cases.
If the pqueue comparison fn returned just 0 or 1 (think "a<b")
then the sort order of returned items could be wrong because there
was a "< 0" that really needed to be "<= 0". Yikes!!!
The git_odb_exists_prefix API was not dealing correctly when a
later backend returned GIT_ENOTFOUND even if an earlier backend
had found the object.
Additionally, the unit tests were not properly exercising the API
and had a couple mistakes in checking the results.
Lastly, since the backends are not expected to behave correctly
unless all bytes of the short id beyond the prefix are zero,
this makes the ODB prefix APIs explicitly clear out the extra
bytes so the user doesn't have to be as careful.
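Usage sketch, assuming `odb` is an open object database:

    git_oid short_id = {{0}}, full_id;

    /* only the first 6 hex digits matter; the API now zeroes the
     * remaining bytes of the short id for you */
    git_oid_fromstrn(&short_id, "abc123", 6);

    if (git_odb_exists_prefix(&full_id, odb, &short_id, 6) == 0)
        ; /* full_id now holds the complete, unambiguous id */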
We look up a reference in order to figure out if it's the current
branch, which we need to free once we're done with the check.
As a bonus, only perform the check when we're passed the force flag, as
it's a useless check otherwise.
We can make use of git_object_dup to use refcounting instead of pointer
comparison to make sure we don't free the caller's object.
This also lets us simplify the case for '~0' which is now just an
assignment instead of looking up the object we have at hand.
Previously the hunk_byfinalline_search_cmp function was called with
different data types (size_t and uint32_t) for the key argument but
expected only the former, resulting in an invalid memory access when
passed the latter on a 64-bit machine.
The following patch makes sure that the function is called and works
with the same type (size_t).
If a directory disappears between the time we look up the entries of its
parent and the time when we go to look at it, we should ignore the error
and move forward.
This fixes #2046.
This finds a short id string that will unambiguously select the
given object, starting with the core.abbrev length (usually 7)
and growing until it is no longer ambiguous.
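This is exposed as git_object_short_id; for example, assuming `obj`
is a looked-up git_object:

    git_buf shorty = GIT_BUF_INIT;

    if (git_object_short_id(&shorty, obj) == 0)
        printf("%s\n", shorty.ptr); /* 7+ hex chars, unambiguous */

    git_buf_free(&shorty);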
A few fixes have accumulated in this area which have made the freeing of
data a bit muddy. Make sure to free the data only when needed and once.
When we are going to write a delta to the packfile, we need to free the
data, otherwise leave it. The current version of the code mixes up the
checks for po->data and po->delta_data.
This adds `git_diff_buffers` and `git_patch_from_buffers`. This
also includes a bunch of internal refactoring to increase the
shared code between these functions and the blob-to-blob and
blob-to-buffer APIs, as well as some higher level assert helpers
in the tests to also remove redundancy.
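A quick sketch of the buffer-to-buffer path (error handling elided):

    const char *a = "hello\n", *b = "hello world\n";
    git_patch *patch = NULL;
    git_buf out = GIT_BUF_INIT;

    if (git_patch_from_buffers(&patch, a, strlen(a), "file.txt",
            b, strlen(b), "file.txt", NULL) == 0 &&
        git_patch_to_buf(&out, patch) == 0)
        printf("%s", out.ptr);

    git_buf_free(&out);
    git_patch_free(patch);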
- added MSVC cmake definitions to disable warnings
- general.c is rewritten so it is ANSI C compatible and compiles
  cleanly on Microsoft Windows
- fixes for some MSVC-reported warnings
* Make GIT_INLINE an internal definition so it cannot be used in
public headers
* Fix language in CONTRIBUTING
* Make index caps API use signed instead of unsigned values
On some systems, notably HP PA-RISC systems running Linux or HP-UX,
EWOULDBLOCK and EAGAIN are not the same value. POSIX (and these OSes) allow
EWOULDBLOCK to occur on write(2) (and send(2), etc.), so check explicitly
for this case as well as EAGAIN by defining and using a macro GIT_ISBLOCKED
that considers both.
The macro is necessary because MSYS does not provide EWOULDBLOCK and
compilation fails if an attempt is made to use it unconditionally. On most
systems, where the two values are the same, the compiler will simply
optimize this check out and it will have no effect.
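The macro amounts to something like this:

    #ifdef EWOULDBLOCK
    # define GIT_ISBLOCKED(e) ((e) == EAGAIN || (e) == EWOULDBLOCK)
    #else
    # define GIT_ISBLOCKED(e) ((e) == EAGAIN)
    #endif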
This adds an API to amend an existing commit, basically a shorthand
for creating a new commit filling in missing parameters from the
values of an existing commit. As part of this, I also added a new
"sys" API to create a commit using a callback to get the parents.
This allowed me to rewrite all the other commit creation APIs so
that temporary allocations are no longer needed.
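A sketch of the amend API: NULL parameters are filled in from the
commit being amended (`head_commit` is assumed to have been looked up
already):

    git_oid new_id;

    git_commit_amend(&new_id, head_commit, "HEAD",
        NULL,                       /* keep original author */
        NULL,                       /* keep original committer */
        NULL,                       /* keep message encoding */
        "fixup: corrected message", /* new message */
        NULL);                      /* keep original tree */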
This fixes a number of warnings with the Windows 64-bit build
including a test failure in test_repo_message__message where an
invalid pointer to a git_buf was being used.
We need this from util.h and posix.h, but the latter includes common.h
which includes util.h, which means p_strlen is not defined by the time
we get to git__strndup().
Split the definition on p_strlen() off into its own header so we can use
it in util.h.
The current code issues a lot of strncmp() calls in order to check for
the end of the header, simply in order to copy it and start going
through it again. These are a lot of calls for something we can check as
we go along. Knowing the number of parents beforehand to reduce
allocations in extreme cases does not make up for them.
Instead start parsing immediately and check for the double-newline after
each header field, leaving the raw_header allocation for the end, which
lets us go through the header once and reduces the number of strncmp()
calls significantly.
In unscientific testing, this has reduced a shortlog-like usage (walking
through the whole history of a branch and extracting data from the
commits) of git.git from ~830ms to ~700ms and makes the time we spend in
strncmp() negligible.
Let the user push committish objects and peel them to figure out which
commit to push to our queue.
This is for convenience and for allowing uses of
git_revwalk_push_glob(w, "tags")
with annotated tags.
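For example (assuming `repo` is open; error handling elided):

    git_revwalk *walk = NULL;
    git_oid oid;

    git_revwalk_new(&walk, repo);
    git_revwalk_push_glob(walk, "tags"); /* annotated tags get peeled */

    while (git_revwalk_next(&oid, walk) == 0)
        ; /* visit each reachable commit */

    git_revwalk_free(walk);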
Change the name to _matching() instead of _if(), and force _set_target()
to be a conditional update. If the user doesn't care about the old
value, they should use git_reference_create().
This tweaks the pqueue_up and pqueue_down routines so that they
will not do full element swaps but instead carry over the state
of the previous loop iteration and only assign elements for which
we know the final position. This will avoid a little bit of data
assignment which should improve performance in theory.
Also got rid of some vector helpers that I'm no longer using.
This fixes a typo I made for setting the sorted flag on the index
after a reload. That typo didn't actually cause any test failures
so I'm also adding a test that explicitly checks that the index is
correctly sorted after a reload when ignoring case and when not.
This updates the git_pqueue to simply be a set of specialized
init/insert/pop functions on a git_vector.
To preserve the pqueue feature of having a fixed size heap, I
converted the "sorted" field in git_vectors to a more general
"flags" field so that pqueue could mix in it's own flag. This
had a bunch of ramifications because a number of places were
directly looking at the vector "sorted" field - I added a couple
new git_vector helpers (is_sorted, set_sorted) so the specific
representation of this information could be abstracted.
I accidentally wrote a separate priority queue implementation when
I was working on file rename detection as part of the file hash
signature calculation code. To simplify licensing terms, I just
adapted that to a general purpose priority queue and replace the
old priority queue implementation that was borrowed from elsewhere.
This also removes parts of the COPYING document that no longer
apply to libgit2.
Validating the workdir should not compare HEAD to working
directory - this is both inefficient (as it ignores the cache)
and incorrect. If we had legitimately allowed changes in the
index (identical to the merge result) then comparing HEAD to
workdir would reject these changes as different. Further, this
will identify files that were filtered strangely as modified,
while testing with the cache would prevent this.
Also, it's stupid slow.
The checkout code used to defer removal of "blocking" files in
checkouts until the blocked item was actually being written (since
we have already checked that removing the block is acceptable
according to the update rules). Unfortunately, this resulted in
an intermediate index state where both the blocking and new items
were in the index which is no longer allowed. Now we just remove
the blocking item in the first pass so it never needs to coexist.
In cases where there are typechanges, this could result in a bit
more churn of removing and recreating intermediate directories,
but I'm going to assume that is an unusual case and the churn will
not be too costly.
There were some confusing issues mixing up the number of bytes
written to the zstream output buffer with the number of bytes
consumed from the zstream input. This reorganizes the zstream
API and makes it easier to deflate an arbitrarily large input
while still using a fixed size output.
This removes the fetchRecurse compiler warnings and makes the
behavior match the other submodule options (i.e. the in-memory
setting can be reset to the on-disk value).
When three-way merging indexes, we previously changed each path
as we read them, which would lead to us adding an index entry for
'foo', then removing an index entry for 'foo/file'. With the new
index requirements, this is not allowed. Removing entries in the
merged index, then adding them, resolves this. In the previous
example, we now remove 'foo/file' before adding 'foo'.
In case-insensitive index mode, we would stop at a prefixed entry,
because the provided search key length was treated as a
substring-comparison length, not as the length of the string to match.
Writing a sample Javascript driver pointed out some extra
whitespace handling that needed to be done in the diff driver.
This adds some tests with some sample javascript code that I
pulled off of GitHub just to see what would happen. Also, to
clean up the userdiff test data, I did a "git gc" and packed
up the test objects.
Since I don't have permission yet on the code from Git, I decided
I'd take a stab at writing patterns for PHP and Javascript myself.
I think these are pretty weak, but probably better than the
default behavior without them.
I contacted a number of Git authors and lined up their permission
to relicense their work for use in libgit2 and copied over their
code for diff driver xfuncname patterns. At this point, the code
I've copied is taken verbatim from core Git although Thomas Rast
warned me that the C++ patterns, at least, really need an update.
I've left off patterns where I don't feel like I have permission
at this point until I hear from more authors.
Reorganize the builtin driver table slightly so that core Git
builtin definitions can be imported verbatim. Then take a few of
the core Git drivers and pull them in.
This also creates a test of diffs with the builtin HTML driver
which led to some small error handling fixes in the driver
selection logic.
This extends the diff driver parser to support multiline driver
definitions along with ! prefixing for negated matches. This
brings the driver function pattern parsing in line with core Git.
This also adds an internal table of driver definitions and a
fallback code path that will look in that table for diff drivers
that are set with attributes without having a definition in the
config file. Right now, I just populated the table with a kind
of simple HTML definition that is similar to the core Git def.
Don't try to determine whether the system supports file modes
when putting the tree data in the index during checkout. The tree's
mode is canonical and did not come from stat(2) in the first place.
It's hard or even impossible to correctly free the string buffer
allocated by git_patch_to_str in some circumstances. Drop the function
so people have to use git_patch_to_buf instead - git_buf has a dedicated
destructor.
Returning library-allocated strings from libgit2 works fine on Linux,
but may cause problems on Windows because there is no one C Runtime that
everything links against. With libgit2 not exposing its own allocator,
freeing the string is a gamble.
git_patch_to_str already serializes to a buffer, then returns the
underlying memory. Expose the functionality directly, so callers can use
the git_buf_free function to free the memory later.
The "merge none" (don't automerge) flag was only to aide in
merge trivial tests. We can easily determine whether merge
trivial resulted in a trivial merge or an automerge by examining
the REUC after automerge has completed.
The default merge_file level was XDL_MERGE_MINIMAL, which will
produce conflicts where there should not be in the case where
both sides were changed identically. Change the defaults to be
more aggressive (XDL_MERGE_ZEALOUS) which will more aggressively
compress non-conflicts. This matches git.git's defaults.
Increase testing around reverting a previously reverted commit to
illustrate this problem.
Any well-behaved program should write a descriptive message to the
reflog whenever it updates a reference. Let's make this more prominent
by removing the version without the reflog parameters.
This changes git_signature_dup to actually honor oom conditions raised by
the call to git__strdup. It also aligns it with the error code return
pattern used everywhere else.
Ok, scrap the previous commit. This is the right overflow check that
takes care of 64 bit overflow **and** 32-bit overflow, which needs to be
considered because the pool malloc can only allocate 32-bit elements in
one go.
Note that `git_pool_strdup` cannot really return any error codes,
because the pool doesn't set errors on OOM.
The only place where `giterr_set_oom` is called is in
`git_pool_strndup`, in a conditional check that is always optimized
away. `n + 1` cannot be zero if `n` is unsigned because the compiler
doesn't take wraparound into account.
This check has been removed altogether because `size_t` is not
particularly going to overflow.
This renames git_vector_free_all to the better git_vector_free_deep
and also contains a couple of memory leak fixes based on valgrind
checks. The fixes are specifically: failure to free global dir
path variables when not compiled with threading on, and failure to
free filters from the filter registry that had not been fully
initialized.
This adds tests that try canceling an indexer operation from
within the progress callback.
After writing the tests, I wanted to run this under valgrind and
had a number of errors in that situation because mmap wasn't
working. I added a CMake option to force emulation of mmap and
consolidated the Amiga-specific code into that new place (so we
don't actually need separate Amiga code now, just have to turn on
-DNO_MMAP).
Additionally, I made the indexer code propagate error codes more
reliably than it used to.
Clone callbacks can return non-zero values to cancel the clone.
This adds some tests to verify that this actually works and updates
the documentation to be clearer that this can happen and that the
return value will be propagated back by the clone function.
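For instance, a transfer progress callback can abort the clone by
returning a non-zero value, which clone will propagate back (a
sketch; the payload is assumed to point at a cancellation flag):

    static int transfer_cb(const git_transfer_progress *stats,
                           void *payload)
    {
        int *cancel_requested = payload;

        if (*cancel_requested)
            return -1; /* any non-zero value cancels the clone */

        return 0;
    }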
The checkout notify callback behavior on non-zero return values
was not being tested. This adds tests, fixes a bug with positive
values, and clarifies the documentation to make it clear that the
checkout can be canceled via this mechanism.
The callback to supply data chunks could return a negative value
to stop creation of the blob, but we were neither using GIT_EUSER
nor propagating the return value. This makes things use the new
behavior of returning the negative value back to the user.
This changes the behavior of callbacks so that the callback error
code is not converted into GIT_EUSER and instead we propagate the
return value through to the caller. Instead of using the
giterr_capture and giterr_restore functions, we now rely on all
functions to pass back the return value from a callback.
To avoid having a return value with no error message, the user
can call the public giterr_set_str or some such function to set
an error message. There is a new helper 'giterr_set_callback'
that functions can invoke after making a callback which ensures
that some error message was set in case the callback did not set
one.
In places where the sign of the callback return value is
meaningful (e.g. positive to skip, negative to abort), only the
negative values are returned back to the caller, obviously, since
the other values allow for continuing the loop.
The hardest parts of this were in the checkout code where positive
return values were overloaded as meaningful values for checkout.
I fixed this by adding an output parameter to many of the internal
checkout functions and removing the overload. This added some
code, but it is probably a better implementation.
There is some funkiness in the network code where user provided
callbacks could be returning a positive or a negative value and
we want to rely on that to cancel the loop. There are still a
couple of places where a user error might get turned into GIT_EUSER
there, I think, though none exercised by the tests.
There are a lot of places that we call git__free on each item in
a vector and then call git_vector_free on the vector itself. This
just wraps that up into one convenient helper function.
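Roughly, the helper (later renamed git_vector_free_deep, as noted
earlier) looks like this sketch:

    void git_vector_free_all(git_vector *v)
    {
        size_t i;

        for (i = 0; i < v->length; ++i) {
            git__free(v->contents[i]);
            v->contents[i] = NULL;
        }

        git_vector_free(v);
    }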
This continues auditing all the places where GIT_EUSER is being
returned and making sure to clear any existing error using the
new giterr_user_cancel helper. As a result, places that relied
on intercepting GIT_EUSER but having the old error preserved also
needed to be cleaned up to correctly stash and then retrieve the
actual error.
Additionally, as I encountered places where error codes were not
being propagated correctly, I tried to fix them up. A number of
those fixes are included in this commit as well.
This adds giterr_user_cancel to return GIT_EUSER and clear any
error message that is sitting around. As a result of using that
in places, we need to be more thorough with capturing errors that
happen inside a callback when used internally. To help with that,
this also adds giterr_capture and giterr_restore so that when we
internally use a foreach-type function that clears errors and
converts them to GIT_EUSER, it is easier to restore not just the
return value, but the actual error message text.
This adds `git_config__lookup_entry` which will look up a key in
a config and return either the entry or NULL if the key was not
present. Optionally, it can either suppress all errors or can
return them (although not finding the key is not an error for this
function). Unlike other accessors, this does not normalize the
config key string, so it must only be used when the key is known
to be in normalized form (i.e. all lower-case before the first dot
and after the last dot, with no invalid characters).
This also adds three high-level helper functions to look up config
values with no errors and a fallback value. The three functions
are for string, bool, and int values, and will resort to the
fallback value for any error that arises. They are:
* `git_config__get_string_force`
* `git_config__get_bool_force`
* `git_config__get_int_force`
None of them normalize the config `key` either, so they can only
be used for internal cases where the key is known to be in normal
format.
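Assumed shape of one of these helpers, sketched from the description
above (it is an internal API, so the exact signature may differ):

    /* returns the value for `key`, or `fallback_value` on any error */
    int git_config__get_bool_force(
        const git_config *cfg, const char *key, int fallback_value);

    /* e.g.: */
    int bare = git_config__get_bool_force(cfg, "core.bare", 0);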
The frontend used to look at the file directly, but that's obviously not
the right thing to do. Expose it on the backend and use that function
instead.
git-core only writes to the reflogs of HEAD, refs/heads/ and
refs/notes/, or when there is already a reflog in place. Adjust our
code to follow these semantics.
When doing copy detection, it is often necessary to include
UNMODIFIED records in the git_diff so they are available as source
records for GIT_DIFF_FIND_COPIES_FROM_UNMODIFIED. Yet in the final
diff, often you will not want to have these UNMODIFIED records.
This adds a flag which marks these UNMODIFIED records for deletion
from the diff list so they will be removed after the rename
detection phase is over.
When FIND_COPIES is used in combination with BREAK_REWRITES for
rename detection, there was a bug where the split MODIFIED delta
was only used as a target for RENAME records and not for COPIED
records. This fixes that, converting the split into a pair of
DELETED and COPIED deltas when that circumstance arises.
Whenever a reference is created or updated, we need to write to the
reflog regardless of whether the user gave us a message, so we shouldn't
leave that to the ref frontend, but integrate it into the backend.
This also eliminates the race between ref update and writing to the
reflog, as we protect the reflog with the ref lock.
As an additional benefit, this reflog append on the backend happens by
appending to the file instead of parsing and rewriting it.
Copy the pointers into temporary vectors instead of assigning them to
the same array, so we don't mess with someone else's memory by
accident (e.g. by sorting).
The callback-based method of listing remote references dates back to the
beginning of the network code's lifetime, when we didn't know any
better.
We need to keep the list around for update_tips() after disconnect() so
let's make use of this to simply give the user a pointer to the array so
they can write straightforward code instead of having to go through a
callback.
Removing arbitrary refspecs makes things more complex to reason
about. Instead, let the user set the fetch and push refspec list to
whatever they want it to be.
Create a git_branch_iterator type which is equivalent to the foreach but
lets us write loops instead of callbacks.
Since the introduction of git_reference_shorthand(), the added value of
passing the name is reduced.
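A loop over local branches now reads naturally (error handling
elided):

    git_branch_iterator *iter = NULL;
    git_reference *ref = NULL;
    git_branch_t type;

    git_branch_iterator_new(&iter, repo, GIT_BRANCH_LOCAL);

    while (git_branch_next(&ref, &type, iter) == 0) {
        printf("%s\n", git_reference_shorthand(ref));
        git_reference_free(ref);
    }

    git_branch_iterator_free(iter);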
When the filesystem iterator encounters an error with a file, it
returns the error but because of the cleanup code, it was in some
cases erasing the error message. This uses the giterr_detach API
to make sure that the actual error message is restored after the
cleanup code has been run.
There are a number of cases where it is convenient to be able to
fetch and "claim" the current error string, clearing the error.
This is helpful when you need to call some code that may alter
the error and you want to restore it later on and/or report it via
some other mechanism.
We used to move `data_start` forward, which is wrong as that needs to
point to the beginning of the buffer in order to perform size
calculations.
Introduce a `write_start` variable which indicates where we should
start writing from, which is what `data_start` was wrongly being
reused for.
The last commit taught git_checkout_tree to actually do something
meaningful when treeish is NULL. This lets us rewrite
git_checkout_head to simply call git_checkout_tree without giving it a
treeish.
In git_checkout_tree, the first check tests if either repo or treeish is
NULL and says that either of them has to have a valid value. But there
is no code to handle the treeish == NULL case.
So, do something meaningful in that case: use HEAD instead.
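With that in place, git_checkout_head reduces to roughly:

    int git_checkout_head(git_repository *repo,
        const git_checkout_options *opts)
    {
        /* a NULL treeish now means "use HEAD's tree" */
        return git_checkout_tree(repo, NULL, opts);
    }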
When downloading the default branch due to lack of refspecs, we still
need to write out FETCH_HEAD with the tip we downloaded, unfortunately
with a format that doesn't match what we already have.
This avoids sending our whole history bit by bit to the remote in cases
where there is no common history, just to give up in the end.
The number comes from the canonical implementation.
The correct behaviour when a remote has no refspecs (e.g. a URL from the
command-line) is to download the remote's HEAD. Let's do that.
This fixes #1261.
This was never really working right because we were checking the
wrong flag and not checking it in all the places that we need to
be checking it. I finally got around to writing a test and adding
actual support for it.
Sometimes the static initializer for git_diff_options cannot be
used and since setting them to all zeroes doesn't actually work
quite right, this adds a new helper for that situation.
This also adds an explicit new value to the submodule settings
options to be used when those enums need static initialization.
This changes `git_index_read` to have two modes - a hard index
reload that always resets the index to match the on-disk data
(which was the old behavior) and a soft index reload that uses
the timestamp / file size information and only replaces the index
data if the file on disk has been modified.
This then updates the git_status code to do a soft reload unless
the new GIT_STATUS_OPT_NO_REFRESH flag is passed in.
This also changes the behavior of the git_diff functions that use
the index so that when an index is not explicitly passed in (i.e.
when the functions call git_repository_index for you), they will
also do a soft reload for you.
This intentionally breaks the file signature of git_index_read
because there has been some confusion about the behavior previously
and it seems like all existing uses of the API should probably be
examined to select the desired behavior.
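The new signature takes a force flag selecting between the two modes:

    git_index *index = NULL;

    git_repository_index(&index, repo);

    git_index_read(index, 1); /* hard: always reset to on-disk state */
    git_index_read(index, 0); /* soft: reload only if the file changed */

    git_index_free(index);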
These changes fix the basic problem with GIT_DIFF_REVERSE being
broken for text diffs. The reversed diff entries were getting
added to the git_diff correctly, but some of the metadata was kept
incorrectly in a way that prevented the text diffs from being
generated correctly. Once I fixed that, it became clear that it
was not possible to merge reversed diffs correctly. This has a
first pass at fixing that problem. We probably need more tests
to make sure that is really fixed thoroughly.
Seems that regexps on Mac OS X and Linux behave differently:
while on OS X the empty string didn't match any value, on Linux
it matched all of them, so the second fetch refspec was
overwriting the first one, instead of creating a new one.
Using an unmatchable regular expression solves the problem
(and seems to be portable).
At some point git_config_delete_entry lost the ability to delete one
entry of a multivar configuration. The moment you had more than one
fetch or push refspec for a remote, you could no longer save that
remote. The changes in network::remote::remotes::save show that
problem.
I needed to create a new git_config_delete_multivar because I was not
able to remove one or several entries of a multivar config with the
current API. Several attempts at modifying how
git_config_set_multivar(..., NULL) behaves were not successful.
git_config_delete_multivar is very similar to git_config_set_multivar,
and delegates to config_delete_multivar in config_file. This function
searches for the cvar_t entries that will be deleted, storing them in
a temporary array and rebuilding the linked list. After calling
config_write to delete the entries, the cvar_t entries stored in the
temporary array are freed.
There is also a little fix in config_write: it avoids an infinite loop
when using a regular expression (the multivar case). This error was
found by the test network::remote::remotes::tagopt.
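Usage sketch of the new API (error handling elided):

    git_config *cfg = NULL;

    git_repository_config(&cfg, repo);

    /* remove every matching fetch refspec in one call */
    git_config_delete_multivar(cfg, "remote.origin.fetch", ".*");

    git_config_free(cfg);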
This tells the server that we speak it, but we don't make use of its
extra information to determine if there's a better place to stop
negotiating.
In a somewhat-related change, reorder the capabilities so we ask for
them in the same order as git does.
Also take this opportunity to factor out a fairly-indented portion of
the negotiation logic.
It was there to keep it apart from the one which read in from a file on
disk. This other indexer does not exist anymore, so there is no need for
anything other than git_indexer to refer to it.
While here, rename _add() function to _append() and _finalize() to
_commit(). The former change is cosmetic, while the latter avoids
talking about "finalizing", which OO languages use to mean something
completely different.
When building libgit2 for the ia32 architecture on an x64 machine,
including "config.h" without "common.h" would result in the following
errors:
C:\Program Files\Microsoft SDKs\Windows\v7.1\include\winbase.h(2288): error C2373: 'InterlockedIncrement' : redefinition; different type modifiers [C:\cygwin\home\zcbenz\codes\git-utils\build\libgit2.vcxproj]
C:\Program Files\Microsoft SDKs\Windows\v7.1\include\winbase.h(2295): error C2373: 'InterlockedDecrement' : redefinition; different type modifiers [C:\cygwin\home\zcbenz\codes\git-utils\build\libgit2.vcxproj]
C:\Program Files\Microsoft SDKs\Windows\v7.1\include\winbase.h(2303): error C2373: 'InterlockedExchange' : redefinition; different type modifiers [C:\cygwin\home\zcbenz\codes\git-utils\build\libgit2.vcxproj]
C:\Program Files\Microsoft SDKs\Windows\v7.1\include\winbase.h(2314): error C2373: 'InterlockedExchangeAdd' : redefinition; different type modifiers [C:\cygwin\home\zcbenz\codes\git-utils\build\libgit2.vcxproj]
The user is unable to derive the number of deltas in the pack, as that
would require them to capture the stats exactly in the moment between
download and final processing, which is abstracted away in the fetch.
Capture these numbers for the user and expose them in the progress
struct. The clone and fetch examples now also present this information
to the user.
The names from libssh2 are somewhat obtuse for us. We can simplify the
usual key/passphrase credential's name, as well as make clearer what the
custom signature function is.
It seems that to implement these options, we just have to pass
the appropriate flags through to the libxdiff code taken from
core git. So let's do it (and add a test).
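The options map onto flags in git_diff_options, e.g.:

    git_diff_options opts = GIT_DIFF_OPTIONS_INIT;

    /* ignore all whitespace; _CHANGE and _EOL variants also exist */
    opts.flags |= GIT_DIFF_IGNORE_WHITESPACE;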
Instead of having functions with so very many parameters to pass
hunk and line data, this takes the existing git_diff_hunk struct
and extends it with more hunk data, plus adds a git_diff_line.
Those structs are used to pass back hunk and line data instead of
the old APIs that took tons of parameters.
Some work that was previously only being done for git_diff_patch
creation (scanning the diff content for exact line counts) is now
done for all callbacks, but the performance difference should not
be noticeable.
While the base git_diff_delta structure always contains two files,
when we introduce conflict data, it will be helpful to have an
indicator when an additional file is involved.
Move conflict handling into two steps: load the conflicts and
then apply the conflicts. This is more compatible with the
existing checkout implementation and makes progress reporting
more sane.
If a D/F conflict or rename 2->1 conflict occurs,
we write the file sides as filename~branchname. If
a file with that name already exists in the working
directory, write as filename~branchname_0 instead.
(Incrementing 0 until a unique filename is found.)
This lays groundwork for separating formatting options from diff
creation options. This groups the formatting flags separately
from the diff list creation flags and reorders the options. This
also tweaks some APIs to further separate code that uses patches
from code that just looks at git_diffs.
This makes no functional change to diff but renames a couple of
the objects and splits the new git_patch (formerly git_diff_patch)
into a new header file.
Don't increase the number of total objects, as that can produce
surprising progress output. The only change compared to pre-thin is
the addition of local_objects, to allow output similar to git's
"completed with %d local objects".
The iconv init was accidentally clearing the default error state
during reference normalization. This resets so that normalization
errors will be detected correctly.
Before these changes, looking up a reference would return the
same precomposed or decomposed form of the reference name that
was used to look it up, so on MacOS which ignores the difference
between the two, a single reference could be looked up either way
and git_reference_name would return the form of the name that was
used to look it up! This change makes lookup always return the
precomposed name if core.precomposeunicode is set regardless of
which version was used to look it up. The reference iterator was
already returning the precomposed form from earlier work.
This also updates the CMakeLists.txt rules for enabling iconv
usage because the clar tests for this code were actually not being
activated properly with the old version.
Finally, this moves git_repository_reset_filesystem from include/
git2/repository.h to include/git2/sys/repository.h since it is not
really a function that normal library users should have to think
about very often.
This cleans up some additional issues. The main change is that
on a filesystem that doesn't support mode bits, libgit2 will now
create new blobs with GIT_FILEMODE_BLOB always instead of being
at the mercy to the filesystem driver to report executable or not.
This means that if "core.filemode" lies and claims that filemode
is not supported, then we will ignore the executable bit from the
filesystem. Previously we would have allowed it.
This adds an option to the new git_repository_reset_filesystem to
recurse through submodules if desired. There may be other types
of APIs that would like a "recurse submodules" option, but this
one is particularly useful.
This also has a number of cleanups, etc., for related things
including trying to give better error messages when problems come
up from the filesystem. For example, the FAT filesystem driver on
MacOS appears to return errno EINVAL if you attempt to write a
filename with invalid UTF-8 in it. We try to capture that with a
better error message now.
There may be multiple deltas referencing the same base as well as OFS
deltas which rely on a thin delta. Deal with both at the same time by
injecting a single object and going back up to the main
delta-resolving loop.
When a tool needs to recreate the tree object (for example an
interface to another VCS), it needs to use the raw attributes,
forgoing any normalization.
When a repository is transferred from one file system to another,
many of the config settings that represent the properties of the
file system may be wrong. This adds a new public API that will
refresh the config settings of the repository to account for the
change of file system. This doesn't do a full "reinitialize" and
operates on an existing git_repository object, refreshing the config
when done.
This commit then makes use of the new API in clar as each test
repository is set up.
This commit also has a number of other clar test fixes where we
were making assumptions about the type of filesystem, either based
on outdated config data or based on the OS instead of the FS.
When given an ODB from which to read objects, the indexer will attempt
to inject the missing bases at the end of the pack and update the
header and trailer to reflect the new contents.
Though unusual, a packfile may contain a delta whose base is a delta
that comes later. In order to index such a packfile, we must not give up
on the first failure to resolve a delta, but keep it around.
If there is a pass which makes no progress, this indicates that the
packfile is broken, so fail accordingly.
The repo init code was assuming Windows == no filemode, and
Mac or Windows == no case sensitivity. Those assumptions are not
consistently true depending on the mounted file system. This is a
first step to removing those assumptions. It focuses on the repo
init code and the tests of that code. There are still many other
tests that are broken when those assumptions don't hold true, but
this clears up one area of the code.
Also, this moves the core.precomposeunicode logic to be closer to
the current logic in core Git where it will be set to true on any
filesystem where composed unicode is decomposed when read back.
The indexer code was generating warnings on Windows 64-bit. I
looked closely at the logic and was able to simplify it a bit.
Also this fixes some other Windows and Linux warnings.
This adds a simple wrapper around the iconv APIs and uses it
instead of the old code that was inlining the iconv stuff. This
makes it possible for me to test the iconv logic in isolation.
A "no iconv" version of the API was defined with macros so that
I could have fewer ifdefs in the code itself.
This simplifies git_path_is_empty_dir on both Windows (getting rid
of git_buf allocation inside the function) and other platforms (by
just using git_path_direach), and adds tests for the function, and
uses the function to simplify some existing tests.
This hooks up git_path_direach and git_path_dirload so that they
will take a flag indicating if directory entry names should be
tested and converted from decomposed unicode to precomposed form.
This code will only come into play on the Apple platform and even
then, only when certain types of filesystems are used.
This involved adding a flag to these functions which involved
changing a lot of places in the code.
This was an opportunity to do a bit of code cleanup here and there,
for example, getting rid of the git_futils_cleanupdir_r function in
favor of a simple flag to git_futils_rmdir_r to not remove the top
level entry. That ended up adding depth tracking during rmdir_r
which led to a safety check for infinite directory recursion. Yay.
This hasn't actually been tested on the Mac filesystems where the
issue occurs. I still need to get test environment for that.
This doesn't actually do string precomposition, but it puts the hooks in
place into the iterators and the git_path_dirload function so that
the actual precompose work is ready to go.
This adds initialization of core.precomposeunicode to repo init
on Mac. This is necessary because when a Mac accesses a repo on
a VFAT or SAMBA file system, it will return directory entries in
decomposed unicode even if the filesystem entry is precomposed.
This also removes caching of a number of repo properties from the
repo init pipeline because these are properties of the specific
filesystem on which the repo is created, not of the system as a
whole.
This commit adds cancellation for the push operation. This work consists of:
1) Support cancellation during push operation
- During object counting phase
- During network transfer phase
- Propagate GIT_EUSER error code out to caller
2) Improve cancellation support during fetch
- Handle cancellation request during network transfer phase
- Clear error string when cancelled during indexing
3) Fix error handling in git_smart__download_pack
Cancellation during push is still only handled in the object counting
and network transfer stages, and not yet during packbuilding.
References and their logs are logically coupled, let's make it so in
the code by moving the fs-based reflog implementation to live next to
the fs-based refs one.
As part of the change, make the function take names rather than
references, as only the names are relevant when looking up and
handling reflogs.
The basic clone function is there to make it easy to create a "normal"
clone. Remove a bunch of options that are about changing the remote's
configuration.
The text progress and update_tips callbacks are already part of the
struct, which was meant to unify the callback setup, but the download
one was left out.
This adds the basics of progress reporting during push. While progress
for all aspects of a push operation are not reported with this change,
it lays the foundation to add these later. Push progress reporting
can be improved in the future - and consumers of the API should
just get more accurate information at that point.
The main areas where this is lacking are:
1) packbuilding progress: does not report progress during deltafication,
as this involves coordinating progress from multiple threads.
2) network progress: reports progress as objects and bytes are going
to be written to the subtransport (instead of as client gets
confirmation that they have been received by the server) and leaves
out some of the bytes that are transferred as part of the push protocol.
Basically, this reports the pack bytes that are written to the
subtransport. It does not report the bytes sent on the wire that
are received by the server. This should be a good estimate of
progress (and an improvement over no progress).
The subtransport path was relying on pointing to data owned by
the remote which meant that after a redirect, the updated path
was getting lost for future requests. This updates the http
transport to strdup the path and maintain its own lifetime.
This also pulls responsibility for parsing the URL back into the
http transport and isolates the functions that parse and free that
connection data so that they can be reused between the initial
parsing and the redirect parsing.
On occasion, files can disappear while we're iterating the
filesystem, between calls to readdir and stat. Let's pretend
those didn't exist in the first place.
The git_buf_text_gather_stats call returns a boolean indicating if
the file looks like binary data. That shouldn't be an error; it
should be used to skip CRLF processing though.
This replaces some git_buf_printf calls with simple calls to
git_buf_put instead. Also, it fixes a missing va_end inside
the git_buf_vprintf implementation.
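For reference, the rule being fixed: every `va_list` that is started or copied must be matched by a `va_end` on every exit path. A generic sketch of the vprintf-with-retry shape, not libgit2's actual implementation:

```c
#include <stdarg.h>
#include <stdio.h>

static int buf_vprintf(char *buf, size_t len, const char *fmt, va_list args)
{
	va_list args2;
	int n;

	va_copy(args2, args);          /* keep a copy in case we must retry */
	n = vsnprintf(buf, len, fmt, args);

	if (n >= 0 && (size_t)n >= len) {
		/* ... grow the buffer, then format again using args2 ... */
	}

	va_end(args2);                 /* the easy-to-miss cleanup */
	return n;
}
```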
The attempt to "clean up warnings" seems to have introduced some
new warnings on compliant compilers. This fixes those in a way
that I suspect will also be okay for the non-compliant compilers.
Also this fixes what appears to be an extra semicolon in the
repo initialization template dir handling (and, as part of that
fix, correctly handles the case where an error occurs).
In revwalk, we are doing a very simple check to see if a string
contains wildcard characters, so a full regular expression match
is not needed.
In remote listing, now that we have git_config_foreach_match with
full regular expression matching, we can take advantage of that
and eliminate the regex here, replacing it with much simpler string
manipulation.
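Something on the order of a strpbrk call suffices for the revwalk check; a sketch, not the exact code:

```c
#include <string.h>

/* True if the glob contains any wildcard character. */
static int has_wildcard(const char *spec)
{
	return strpbrk(spec, "*?[") != NULL;
}
```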
This contains a few bug fixes and some header and API cleanups.
The main API change is that filters should now use GIT_PASSTHROUGH
to indicate that they wish to skip processing a file instead of
GIT_ENOTFOUND.
The bug fixes include a possible out-of-range buffer access in
the ident filter, a filter ordering problem I introduced into the
custom filter tests on Windows, and a filter buf NUL termination
issue that was coming up on Linux.
This adds more tests of filters, including the ident filter when
mixed with custom filters. I was able to combine it with the reverse
filter and demonstrate that the order of filter application with
the default priority constants matches the order of core Git.
Also, this fixes two issues in the ident filter: preventing ident
expansion on binary files and avoiding a NULL dereference when
dollar sign characters are found without Id.
Fixed the filter order to match core Git, too.
This test demonstrates an interesting behavior of core Git (which
is totally reasonable and which libgit2 matches, although mostly
by coincidence). If you use the ident filter and commit a file
with a garbage ident in it, like '$Id: this is just garbage$' and
then immediately do a 'git checkout-index' with the new file, Git
will not consider the file out of date and will not overwrite the
file with an updated $Id$. Libgit2 has the same behavior. If you
remove the file and then do a checkout-index, it will be replaced
with a filtered version that has injected the OID correctly.
These are a couple of new clar helpers for testing that a file
has expected contents that I extracted from the checkout code.
Actually wrote this as part of an abandoned earlier attempt at a
new filters API, but it will be useful now for some of the tests
I'm going to write.
I wish MSVC understood that "const char **" is not a const ptr,
but is a non-const pointer to an array of const ptrs. Is that
too much to ask?
Checkout should not reject binary files from filters, as a filter
may actually wish to operate on binary files. The CRLF filter should
reject binary files itself if it wishes to. Moreover, the CRLF
filter requires this logic so that users can emulate the checkout
data in their odb -> workdir filtering.
This makes the git_buf struct that was used internally into an
externally available structure and eliminates the git_buffer.
As part of that, some of the special cases that arose with the
externally used git_buffer were blended into the git_buf, such as
being careful about git_buf objects that may have a NULL ptr and
allowing for bufs with a valid ptr and size but zero asize as a
way of referring to externally owned data.
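Concretely, with the public field layout of the time (`ptr`, `asize`, `size`; `asize` was later renamed `reserved`), an asize of zero marks memory the buf does not own and must not grow or free. A sketch:

```c
#include <git2.h>
#include <string.h>

/* Point a git_buf at externally owned memory without copying it. */
static git_buf wrap_external(const char *data)
{
	git_buf buf = { (char *)data, 0, strlen(data) };
	return buf;
}
```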
This adds the ident filter (that knows how to replace $Id$) and
tweaks the filter APIs and code so that git_filter_source objects
actually have the updated OID of the object being filtered when
it is a known value.
Extend the git2/sys/filter API with functions to look up a filter
and add it manually to a filter list. This requires some trickery
because the regular attribute lookups and checks are bypassed when
this happens, but in the right hands, it will allow a user to have
granular control over applying filters.
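In rough terms (current signatures, error handling abbreviated), the manual path looks like this:

```c
#include <git2.h>
#include <git2/sys/filter.h>

static int make_manual_list(git_filter_list **out, git_repository *repo)
{
	git_filter *ident = git_filter_lookup(GIT_FILTER_IDENT);
	int error;

	if (ident == NULL)
		return -1; /* filter not registered */

	/* Build a list by hand, bypassing attribute lookups entirely. */
	if ((error = git_filter_list_new(out, repo,
			GIT_FILTER_TO_WORKTREE, 0)) < 0)
		return error;

	return git_filter_list_push(*out, ident, NULL);
}
```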
Increasingly there are a number of components that want to do some
cleanup at global shutdown time (at least if there are not going
to be memory leaks). This creates a very simple system of shutdown
hooks that will be invoked by git_threads_shutdown. Right now, the
maximum number of hooks is hardcoded, but since adding a hook is
not a public API, it should be fine and I thought it was better to
start off with really simple code.
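The mechanism boils down to a fixed-size table of function pointers; a simplified sketch with hypothetical names (the real registration call is internal, not public API):

```c
#define MAX_SHUTDOWN_CB 8

typedef void (*shutdown_fn)(void);

static shutdown_fn shutdown_cb[MAX_SHUTDOWN_CB];
static int shutdown_cb_count;

/* Components call this once to be cleaned up at shutdown. */
static void register_shutdown(shutdown_fn fn)
{
	if (shutdown_cb_count < MAX_SHUTDOWN_CB)
		shutdown_cb[shutdown_cb_count++] = fn;
}

/* Invoked from git_threads_shutdown: run the hooks, newest first. */
static void run_shutdown_hooks(void)
{
	while (shutdown_cb_count > 0)
		shutdown_cb[--shutdown_cb_count]();
}
```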
This moves the git_filter_list into the public API so that users
can create, apply, and dispose of filter lists. This allows more
granular application of filters to user data outside of libgit2
internals.
This also converts all the internal usage of filters to the public
APIs along with a few small tweaks to make it easier to use the
public git_buffer stuff alongside the internal git_buf.
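The public workflow, roughly (signatures per current libgit2; in this era the output type was the short-lived git_buffer rather than git_buf):

```c
#include <git2.h>

static int filter_to_worktree(git_buf *out, git_repository *repo,
	const char *path, git_buf *in)
{
	git_filter_list *fl = NULL;
	int error;

	/* Load whatever filters apply to `path` in the checkout direction. */
	if ((error = git_filter_list_load(&fl, repo, NULL, path,
			GIT_FILTER_TO_WORKTREE, 0)) < 0)
		return error;

	error = git_filter_list_apply_to_data(out, fl, in);
	git_filter_list_free(fl);
	return error;
}
```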
The filter registry as implemented was too primitive to actually
work once multiple filters were coming into play. This expands
the implementation of the registry to handle multiple prioritized
filters correctly.
Additionally, this adds an "attributes" field to a filter that
makes it really really easy to implement filters that are based
on one or more attribute values. The lookup and even simple value
checking can all happen automatically without custom filter code.
Lastly, with the registry improvements, this fills out the filter
lifecycle callbacks, with initialize and shutdown callbacks that
will be called before the filter is first used and after it is
last invoked. This allows for system-wide initialization and
cleanup by the filter.
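Sketching the shape of such a filter (callback signatures follow the classic sys/filter.h layout and have shifted across versions; the filter itself is hypothetical):

```c
#include <string.h>
#include <git2.h>
#include <git2/sys/filter.h>

static int my_init(git_filter *self) { (void)self; return 0; /* first use */ }
static void my_shutdown(git_filter *self) { (void)self; /* last use */ }

static int my_apply(git_filter *self, void **payload,
	git_buf *to, const git_buf *from, const git_filter_source *src)
{
	(void)self; (void)payload; (void)to; (void)from; (void)src;
	/* Decline with GIT_PASSTHROUGH (not GIT_ENOTFOUND) to skip a file. */
	return GIT_PASSTHROUGH;
}

static git_filter my_filter;

static int register_my_filter(void)
{
	memset(&my_filter, 0, sizeof(my_filter));
	my_filter.version    = GIT_FILTER_VERSION;
	my_filter.attributes = "myfilter";   /* automatic attribute lookup */
	my_filter.initialize = my_init;      /* before the filter's first use */
	my_filter.shutdown   = my_shutdown;  /* after its last invocation */
	my_filter.apply      = my_apply;

	/* Priority orders this filter relative to the built-ins. */
	return git_filter_register("myfilter", &my_filter, 100);
}
```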
This creates include/sys/filter.h with a basic definition of a
git_filter and then converts the internal code to use it. There
are related internal objects (git_filter_list) that we will want
to publish at some point, but this is a first step.
This begins the process of exposing git_filter objects to the
public API. This includes:
* new public type and API for `git_buffer` through which an
allocated buffer can be passed to the user
* new API `git_blob_filtered_content`
* make the git_filter type and GIT_FILTER_TO_... constants public
Unfortunately git-core uses the terms "unborn branch" and "orphan
branch" interchangeably. However, "orphan" is only really there for
the checkout command, which has the `--orphan` option; even that
doesn't actually create the branch.
Branches never have parents, so the distinction of a branch with no
parents is odd to begin with. Crucially, the error messages deal with
unborn branches, so let's use that.
As the include depth increases, the chance of a realloc increases,
and a realloc may move the reader array, invalidating any pointers
into it. This means that whenever we run git_array_alloc() or call
config_parse(), we need to remember our reader's index so we can
look it up again afterwards.
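The rule at work is the standard one for growable arrays: a realloc may move the storage, so hold indices rather than element pointers across any call that can append. A generic sketch, not the config parser itself:

```c
#include <stdlib.h>

struct reader { int depth; /* ... */ };

struct reader_array {
	struct reader *items;
	size_t count, alloc;
};

/* Append a slot, doubling the allocation when full (may realloc). */
static struct reader *array_grow(struct reader_array *arr)
{
	if (arr->count == arr->alloc) {
		size_t n = arr->alloc ? arr->alloc * 2 : 4;
		void *p = realloc(arr->items, n * sizeof(*arr->items));
		if (!p)
			return NULL;
		arr->items = p; /* old element pointers are now invalid */
		arr->alloc = n;
	}
	return &arr->items[arr->count++];
}

static void include_file(struct reader_array *readers, size_t reader_idx)
{
	struct reader *r;

	if (array_grow(readers) == NULL) /* may move the whole array... */
		return;

	/* ...so re-derive the pointer from the saved index afterwards
	 * instead of holding one across the call. */
	r = &readers->items[reader_idx];
	(void)r;
}
```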
We need to refresh the variables from the included files if they are
changed, so loop over all included files and re-parse the files if any
of them has changed.
Now that #1785 is merged, git_odb_stream_finalize_write() calculates the object id before invoking the odb backend.
This commit gives the backend a chance to check whether it already knows this object.
The GIT_MODE_TYPE macro was looking at all bits above the
permissions, but it should really just look at the file-type bits,
so that it gives the right results for a setgid or setuid entry.
Since we're now using these macros in the tests, this was causing
a test failure on platforms that don't support setgid.
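The distinction in octal, as a hypothetical before/after (the real macro text may differ):

```c
#include <sys/stat.h>

/* Before: keeps the setuid/setgid/sticky bits, so a setgid directory
 * entry (mode 042755) masks to 042000 and no longer equals S_IFDIR. */
#define MODE_TYPE_BROKEN(m)  ((m) & ~0777)

/* After: mask with the file-type bits only (S_IFMT == 0170000), so
 * MODE_TYPE_FIXED(042755) == S_IFDIR (040000) regardless of set-id bits. */
#define MODE_TYPE_FIXED(m)   ((m) & S_IFMT)
```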