If the remote is anonymous, then we cannot check for any configuration,
as there is no name. Check for this before we try to use the name, which
may be a NULL pointer.
This fixes#2697.
This function has one output but can match multiple files, which can be
unexpected for the user, which would usually path the exact path of the
file he wants the status of.
We cannot know from looking at .gitmodules whether a directory is a
submodule or not. We need the index or tree we are comparing against to
tell us. Otherwise we have to assume the entry in .gitmodules is stale
or otherwise invalid.
Thus we pass the index of the repository into the workdir iterator, even
if we do not want to compare against it. This follows what git does,
which even for `git diff <tree>`, it will consider staged submodules as
such.
A rule "src" in src/.gitignore must only match subdirectories of
src/. The current code does not include this context in the match rule
and would thus consider this rule to match the top-level src/ directory
instead of the intended src/src/.
Keep track fo the context in which the rule was defined so we can
perform a prefix match.
We currently consider CR to start the end of the line, but that means
that we miss cases with CR CR LF which can be used with git to match
files whose names have CR at the end of their names.
The fix from the patch comes from Russell's comment in the issue.
This fixes#2536.
Before trying to rtransform using the given refspec to figure out what
the name of the upstream branch is on the remote, we must make sure that
the target of the refspec applies to the current branch's upstream.
* Error-handling is cleaned up to only let a file-not-found error
through, not other sorts of errors. And when a file-not-found
error happens, we clean up the error.
* Test now checks that file-not-found introduces no error. And
other minor cleanups.
The create function with default refspec is the same as the one with a
custom refspec, but it has the default refspec, so we can create the one
on top of the other.
When we first ask OpenSSL to verify the certfiicate itself (rather
than the HTTPS specifics), we should also return
GIT_ECERTIFICATE. Otherwise, the caller would consider this as a failed
operation rather than a failed validation and not call the user's own
validation.
For example, if you have
[include]
path = foo
and foo didn't exist, git_config_open_ondisk() would just give up
on the rest of the file. Now it ignores the unresolved include
without error and continues reading the rest of the file.
Already cherry-picked commits should not be re-included. If all changes
included in a commit exist in the upstream, then we should error with
GIT_EAPPLIED.
`git_rebase_next` will apply the next patch (or cherry-pick)
operation, leaving the results checked out in the index / working
directory so that consumers can resolve any conflicts, as appropriate.
Remote objects are not meant to be changed from under the user. We did
this in rename, but only the name and left the refspecs, such that a
save would save the wrong refspecs (and a fetch and anything else would
use the wrong refspecs).
Instead, let's simply take a name and not change any loaded remote from
under the user.
This function does not in fact tell us anything, as almost anything with
a colon in it is a valid rsync-style SSH path; it can not tell us that
we do not support ftp or afp or similar as those are still valid SSH
paths and we do support that.
All versions of SSL are considered deprecated now, so let's ask OpenSSl
to only use TLSv1. We still ask it to load those ciphers for
compatibility with servers which want to use an older hello but will use
TLS for encryption.
For good measure we also disable compression, which can be exploitable,
if the OpenSSL version supports it.
Git for Windows will handle UNC paths only when in forward-slash
format, eg "//server/path". When given a UNC path as a remote,
rewrite standard format ("\\server\path") into this ridiculous
format.
The entry_count field is the amount of index entries covered by a
particular cache entry, that is how many files are there (recursively)
under a particular directory.
The current code that attemps to do this is severely defincient and is
trying to count the amount of children, which always comes up to zero.
We don't even need to recount, since we have the information during the
cache creation. We can take that number and keep it, as we only ever
invalidate or replace.
FindFirstFile will fail with INVALID_HANDLE_VALUE if there are no
children to the given path, which can happen if the given path is a
file (and obviously has no children) or if the given path is an empty
mount point. (Most directories have at least directory entries '.'
and '..', but ridiculously another volume mounted in another drive
letter's path space do not, and thus have nothing to enumerate.)
If FindFirstFile fails, check if this is a directory-like thing
(a mount point).
A reparse point that is an IO_REPARSE_TAG_MOUNT_POINT could be
a junction or an actual filesystem mount point. (Who knew?)
If it's the latter, its reparse point will report the actual
volume information \??\Volume{GUID}\ and we should not attempt
to dereference that further, instead readlink should report
EINVAL since it's not a symlink / junction and its original
path was canonical.
Yes, really.
An obvious place to fill the tree cache is on write-tree, as we're
guaranteed to be able to fill in the whole tree cache.
The way this commit does this is not the most efficient, as we read the
root tree from the odb instead of filling in the cache as we go along,
but it fills the cache such that successive operations (and persisting
the index to disk) will be able to take advantage of the cache, and it
reuses the code we already have for filling the cache.
Filling in the cache as we create the trees would require some
reallocation of the children vector, which is currently not possible
with out pool implementation. A different data structure would likely
allow us to perform this operation at a later date.
Keeping the cache around after read-tree is only one part of the
optimisation opportunities. In order to share the cache between program
instances, we need to write the TREE extension to the index.
Do so, taking the opportunity to rename 'entries' to 'entry_count' to
match the name given in the format description. The included test is
rather trivial, but works as a sanity check.
When reading from a tree, we know what every tree is going to look like,
so we can fill in the tree cache completely, making use of the index for
modification of trees a lot quicker.
This simplifies freeing the entries quite a bit; though there aren't
that many failure paths right now, introducing filling the cache from a
tree will introduce more. This makes sure not to leak memory on errors.
If there have been no pushes, we can immediately return ITEROVER. If
there have been no hides, we must not run the uninteresting pre-mark
phase, as we do not want to hide anything and this would simply cause us
to spend time loading objects.
This introduces a phase at the start of preparing a walk which pre-marks
uninteresting commits, but only up to the common ancestors.
We do this in a similar way to git, by walking down the history and
marking (which is what we used to do), but we keep a time-sorted
priority queue of commits and stop marking as soon as there are only
uninteresting commits in this queue.
This is a similar rule to the one used to find the merge-base. As we
keep inserting commits regardless of the uninteresting bit, if there are
only uninteresting commits in the queue, it means we've run out of
interesting commits in our walk, so we can stop.
The old mark_unintesting() logic is still in place, but that stops
walking if it finds an already-uninteresting commit, so it will stop on
the ones we've pre-marked; but keeping it allows us to also hide those
that are hidden via the callback.
We don't need the remote loaded, and the function extracted both of
these from the git_remote in order to do its work, so let's remote a
step and not ask for the loaded remote at all.
This fixes#2390.
The stash is implemented as the refs/stash reference and its reflog. In
order to modify the reflog, we need avoid races by making sure we're the
only ones allowed to modify the reflog.
We achieve this via the transactions API. Locking the reference gives us
exclusive write access, letting us modify and write it without races.
A transaction allows you to lock multiple references and set up changes
for them before applying the changes all at once (or as close as the
backend supports).
This can be used for replication purposes, or for making sure some
operations run when the reference is locked and thus cannot be changed.
When a list of refspecs is passed to fetch (what git would consider
refspec passed on the command-line), we not only need to perform the
updates described in that refspec, but also update the remote-tracking
branch of the fetched remote heads according to the remote's configured
refspecs.
These "fetches" are not however to be written to FETCH_HEAD as they
would be duplicate data, and it's not what the user asked for.
The configured/base fetch refspecs need to be taken into account in
order to implement opportunistic remote-tracking branch updates. DWIM
them and store them in the struct, but don't do anything with them yet.
We can only DWIM when we've connected to the remote and have the list of
the remote's references. Adding or setting the refspecs should not
trigger an attempt to DWIM the refspecs as we typically cannot do it,
and even if we did, we would not use them for the current fetch.
With opportunistic ref updates, git has introduced the concept of having
base refspecs *and* refspecs that are active for a particular fetch.
Let's start by letting the user override the refspecs for download.
When we describe the workdir, we perform a describe on HEAD and then
check to see if the worktree is dirty. If it is and we have a suffix
string, we append that to the buffer.
Instead of printing out to the buffer inside the information-gathering
phase, write the data to a intermediate result structure.
This allows us to split the options into gathering options and
formatting options, simplifying the gathering code.
The getaddrinfo function indicates failure with a non-zero return code,
but this code is not necessarily negative. On platforms like Android
where the code is positive, a failed call causes libgit2 to segfault.
Instead of spreading the data in function arguments, some of which
aren't used for ssh and having a struct only for ssh, use a struct for
both, using a common parent to pass to the callback.
This option make it easy to ignore anything about the server we're
connecting to, which is bad security practice. This was necessary as we
didn't use to expose detailed information about the certificate, but now
that we do, we should get rid of this.
If the user wants to ignore everything, they can still provide a
callback which ignores all the information passed.
We should let the user decide whether to cancel the connection or not
regardless of whether our checks have decided that the certificate is
fine. We provide our own assessment to the callback to let the user fall
back to our checks if they so desire.
If the certificate validation fails (or always in the case of ssh),
let the user decide whether to allow the connection.
The data structure passed to the user is the native certificate
information from the underlying implementation, namely OpenSSL or
WinHTTP.
When using a bare repo with an index, libgit2 attempts to read
files from the index. It caches those files based on the path
to the file, specifically the path to the directory that contains
the file.
If there is no working directory, we use `git_path_dirname_r` to
get the path to the containing directory. However, for the
`.gitattributes` file in the root of the repository, this ends up
normalizing the containing path to `"."` instead of the empty
string and the lookup the `.gitattributes` data fails.
This adds a test of attribute lookups on bare repos and also
fixes the problem by simply rewriting `"."` to be `""`.
Passing 0 as the length of the paths to check to git_diff_index_to_workdir
results in all files being treated as conflicting, that is, all untracked or
modified files in the worktree is reported as conflicting
A signature is made up of a non-empty name and a non-empty email so
let's validate that. This also brings us more in line with git, which
also rejects ident with an empty email.
The compiler was generating a bunch of warnings for
git_mutex_init and git_mutex_lock when GIT_THREADS
was not defined (i.e. when not using -DTHREADSAFE=ON).
Also remove an unused variable from tests/path/core.c.
When the call to the agent fails, we must retrieve the error message
just after the function call, as other calls may overwrite it.
As the agent authentication is the only one which has a teardown and
there does not seem to be a way to get the error message from a stored
error number, this tries to introduce some small changes to store the
error from the agent.
Clearing the error at the beginning of the loop lets us know whether the
agent has already set the libgit2 error message and we should skip it,
or if we should set it.
Teach git_repository_init_ext to use relative paths for the gitlink
to the work directory. This is used when creating a sub repository
where the sub repository resides in the parent repository's
.git directory.
When the fetch refspec does not include the remote's default branch, it
indicates an error in user expectations or programmer error. Error out
in that case.
This lets us get rid of the dummy refspec which can never work as its
zeroed out. In the cases where we did not find a default branch, we set
HEAD detached immediately, which lets us refactor the "normal" path,
removing `found_branch`.
It does the same as git_remote_supported_url() but has a name which
implies we'd check the URL for correctness while we're simply looking at
the scheme and looking it up in our lists.
While here, fix up the tests so we check all the combination of what's
supported.
The previous commit makes it harder to figure out if the library was
built with support for a particular transport. Roll back some of the
changes and remove ssh:// and https:// from the list if we're being
built without support for them.
Even when built without a SSH support, we know about this transport. It
is implemented, but the current code makes us return an error message
saying it's not.
This is a leftover from the initial implementation of the transports
when there were in fact transports we knew about but were not
implemented.
Instead, let the SSH transport itself say it cannot run, the same as we
do for HTTPS.
A repository can have any number of references which we're not
interested in such as notes or tags. For the default branch calculation
we only care about branches. Make the decision about the number of
branches rather than the number of refs in general.
When we see PARENT1, it means there is a local commit and thus we are
ahead. Likewise, seeing PARENT2 means that the upstream branch has a
commit and we are one more behind.
The logic is currently reversed. Correct it.
This fixes#2501.
W/o this patch it is not possible to have a third party ssh transport_cb if GIT_SSH is disabled or a third party transport_cb which has a higher priority than the default one.
Signed-off-by: Sven Strickroth <email@cs-ware.de>
The callers of git_packfile_unpack() expect the obj_offset argument to
be set to the beginning of the next object. We were mistakenly returning
the the offset of the object's data, which causes the CRC function to
try to use the wrong offset.
Set obj_offset to curpos instead of elem->offset to point to the next
element and bring back expected behaviour.
Our mkdir helper was failing is a parent directory was not
accessible even if the child directory could be created.
This changes the helper to keep trying child directories
even when the parent is unwritable.
The old `allocfmt` is of no use to callers, as they are not able to free
the returned buffer. Export a new API that returns a static string that
doesn't need to be freed.
The recv buffer (parse_buffer) and the buffer have independent sizes and
offsets. We try to fill in parse_buffer as much as possible before
passing it to the http parser. This is fine most of the time, but fails
us when the buffer is almost full.
In those situations, parse_buffer can have more data than we would be
able to put into the buffer (which may be getting full if we're towards
the end of a data sideband packet).
To work around this, we check if the space we have left on our buffer is
smaller than what could come from the network. If this happens, we make
parse_buffer think that it has as much space left as our buffer, so it
won't try to retrieve more data than we can deal with.
As the start of the data may no longer be at the start of the buffer, we
need to keep track of where it really starts (data_offset) and use that
in our calculations for the real size of the data we received from the
network.
This fixes#2518.
* Move the transport registration mechanisms into a new header under
'sys/' because this is advanced stuff.
* Remove the 'priority' argument from the registration as it adds
unnecessary complexity. (Since transports cannot decline to operate,
only the highest priority transport is ever executed.) Users who
require per-priority transports can implement that in their custom
transport themselves.
* Simplify registration further by taking a scheme (eg "http") instead
of a prefix (eg "http://").
In the check for multiline, we traverse the backslashes from the end
backwards and int the end assert that we haven't gone past the beginning
of the line. We make sure of this in the loop condition, but we also
check in the return value.
However, for certain configurations, a line in a multiline variable
might be empty to aid formatting. In that case, 'end' == 'start', since
we ended up looking at the first char which made it a multiline.
There is no need for the (end > start) check in the return, since the
loop guarantees we won't go further back than the first char in the
line, and we do accept the first char to be the final backslash.
This fixes#2483.
While scanning through a directory hierarchy, this prevents a
positive ignore match on a parent directory from blocking the scan
of a directory when a negative match rule exists for files inside
the directory.
* Removes mingw-compat.h
* Cleans up separation of compiler/platform idiosyncrasies
* Unifies mingw/msvc stat structures and functions
* (Tries to) hide more compiler specific implementation details (even in our internal API)
We always calculate multiple merge bases, but up to now we had only
exposed the "best" merge base.
Introduce git_oidarray which analogously to git_strarray lets us return
multiple ids.
This works around strict aliasing rules letting some versions of
GCC (particularly on RHEL 6) thinking that they can skip updating the
size of the array when calculating the next element's offset.
Preallocating two commits doesn't make much sense as leaving allocation
to the first array usage will allocate a sensible size with room for
growth.
This preallocation has also been hiding issues with strict aliasing in
the tests, as we have fairly simple histories and never trigger the
growth.
git allows you to set which paths to use for the git server programs
when connecting over ssh; and we want to provide something similar.
We do this by providing a factory function which can be set as the
remote's transport callback which will set the given paths upon
creation.
We used to assume a refspec would only have an asterisk in the middle of
their respective pattern. This has not been a valid assumption for some
time now with git.
Instead of assuming where the asterisk is going to be, change the logic
to treat each pattern as having two halves with a replacement bit in the
middle, where the asterisk is.
When transforming a non-pattern refspec, we simply need to copy over the
opposite string. Move that logic up to the wrapper so we can assume a
pattern refspec in the transformation function.
Move the definition of git_thread_yield() to the test which needs it and
add the correct definition for it for FreeBSD and derivatives.
Original patch adding FreeBSD and derivatives by @jacquesg.
In order to connect to a remote server, we need to provide a path to the
repository we're interested in. Consider the lack of path in the url an
error.
When the stream writing function was written, it assume that
libssh2_channel_write() would always write all of the data to the
wire. This is only true for the first 32k of data, which it tries to
fit into one ssh packet.
Since it can perform short writes, call it in a loop like we do for
send(), advancing the buffer offset.
As git_clone now has callbacks to configure the details of the
repository and remote, remove the lower-level functions from the public
API, as they lack some of the logic from git_clone proper.
Analogously to the remote creation callback, provide a way for the user
of git_clone() to create the repository with whichever options they
desire via callback.
git_checkout_index can now check out other git_index's (that are not
necessarily the repository index). This allows checkout_index to use
the repository's index for stat cache information instead of the index
data being checked out. git_merge and friends now check out their
indexes directly instead of trying to blend it into the running index.
To make sure that items returned from pool allocations are aligned
on nice boundaries, this rounds up all pool allocation sizes to a
multiple of 8. This adds a small amount of overhead to each item.
The rounding up could be made optional with an extra parameter to
the pool initialization that turned on rounding only for pools
where item alignment actually matters, but I think for the extra
code and complexity that would be involved, that it makes sense
just to burn a little bit of extra memory and enable this all the
time.
The OpenSSL library-loading functions do not expect to be called
multiple times. Add a flag in the non-threaded libgit2 init so we only
call once.
This fixes#2446.
git_remote_set_transport now takes a transport factory rather than a transport
git_clone_options now allows the caller to specify a remote creation callback
In order to know which authentication methods are supported/allowed by
the ssh server, we need to send a NONE auth request, which needs a
username associated with it.
Most ssh server implementations do not allow switching the username
between authentication attempts, which means we cannot use a dummy
username and then switch. There are two ways around this.
The first is to use a different connection, which an earlier commit
implements, but this increases how long it takes to get set up, and
without knowing the right username, we cannot guarantee that the
list we get in response is the right one.
The second is what's implemented here: if there is no username specified
in the url, ask for it first. We can then ask for the list of auth
methods and use the user's credentials in the same connection.
Set a message when we fail to lock.
Also make the put function void, since it's called from free, which
cannot report errors. The only errors we can experience here are
internal state corruption, so we assert that we are trying to put a
pack which we have previously got.
If we fail to insert the packfile in the map, make sure to free it.
This makes the free function only attempt to remove its mwindows from
the global list if we have opened the packfile to avoid accessing the
list unlocked.
Git for Windows 1.9.4 changed the behavior when the text=auto
attribute is specified and core.autocrlf=false. Previous observed
behavior would *not* filter files when going into the working
directory, the new behavior *does* filter. Update our behavior to match.
When checking out files, we're performing conversion into the user's
native line endings, but we only want to do it for files which have
consistent line endings. Refuse to perform the conversion for mixed-EOL
files.
The CRLF->LF filter is left as-is, as that conversion is considered to be
normalization by git and should force a conversion of the line endings.
Opening the same repository multiple times will currently open the same
file multiple times, as well as map the same region of the file multiple
times. This is not necessary, as the packfile data is immutable.
Instead of opening and closing packfiles directly, introduce an
indirection and allocate packfiles globally. This does mean locking on
each packfile open, but we already use this lock for the global mwindow
list so it doesn't introduce a new contention point.
Before calling the credentials callback, ask the sever which
authentication methods it supports and report that to the user, instead
of simply reporting everything that the transport supports.
In case of an error, we do fall back to listing all of them.
Bring together all of the OpenSSL initialization to
git_threads_init() so it's together and doesn't need locks.
Moving it here also gives us libssh2 thread safety (when built against
openssl).
When using in a multithreaded context, OpenSSL needs to lock, and leaves
it up to application to provide said locks.
We were not doing this, and it's just luck that's kept us from crashing
up to now.
The OpenSSL init functions are not reentrant, which means that running
multiple fetches in parallel can cause us to crash.
Use a mutex to init OpenSSL, and since we're adding this extra checks,
init it only once.
Instead of using a sentinel empty value to detect the last commit, let's
check for when we get a NULL from popping the stack, which lets us know
when we're done.
The current code causes us to read uninitialized data, although only on
RHEL/CentOS 6 in release mode. This is a readability win overall.
If the user wants to keep a copy for themselves, they should make a
copy. It adds unnecessary complexity to make sure the returned entries
are valid until the builder is cleared.
Finding a filename in a vector means we need to resort it every time we
want to read from it, which includes every time we want to write to it
as well, as we want to find duplicate keys.
A hash-map fits what we want to do much more accurately, as we do not
care about sorting, but just the particular filename.
We still keep removed entries around, as the interface let you assume
they were going to be around until the treebuilder is cleared or freed,
but in this case that involves an append to a vector in the filter case,
which can now fail.
The only time we care about sorting is when we write out the tree, so
let's make that the only time we do any sorting.
A symref inside the namespace gets renamed, we should make it point to
the target's new name.
This is for the origin/HEAD -> origin/master type of situations.
There is no reason why we need to use a callback here. A string array
fits better with the usage, as this is not an event and we don't need
anything from the user.
We must make sure that the name pointer remains valid, so make sure to
allocate the new one before freeing the old one and swap them so the
user never sees an invalid pointer.
We don't allow renames of anonymous remotes, so there's no need to
handle them.
A remote is always associated with a repository, so there's no need to
check for that.
Tighten up which references we consider for renaming so we don't try to
rename unrelated ones and end up with unexplained references.
If there is a reference on the target namespace, git overwrites it, so
let's do the same.
When renaming a lock file to its final location, we need to make sure
that it is replaced atomically.
We currently have a workaround for Windows by removing the target file.
This means that the target file, which may be a ref or a packfile, may
cease to exist for a short wile, which shold be avoided.
Implement the workaround only in Windows, by making sure that the file
we want to replace is writable.
Whe already worked out the kinks with the function used in the local
transport. Expose it and make use of it in the local clone method
instead of trying to work it out again.
When removing the remote-tracking branches, build up the list and remove
in two steps, working around an issue with the iterator. Removing while
we're iterating over the refs can cause us to miss references.
Inside `git_remote_load`, the calls to `get_optional_config` use
`giterr_clear` to unset any errors that are set due to missing config
keys. If neither a fetch nor a push url config was found for a remote,
we should set an error again.
This fixes two issues I found when core.precomposeunicode is enabled:
* When creating a reference with a NFD string, the returned
git_reference would return this NFD string as the reference’s
name. But when looking up the reference later, the name would
then be returned as NFC string.
* Renaming a reference would not honor the core.precomposeunicode and
apply no normalization to the new reference name.
If requested, git_clone_local_into() will try to link the object files
instead of copying them.
This only works on non-Windows (since it doesn't have this) when both
are on the same filesystem (which are unix semantics).
A call like git_clone("./foo", "./foo1") writes origin's url as './foo',
which makes it unusable, as they're relative to different things.
Go with git's behaviour and store the realpath as the url.
When git is given such a path, it will perform a "local clone",
bypassing the git-aware protocol and simply copying over all objects
that exist in the source.
Copy this behaviour when given a local path.
We have too many places where we repeat free code, so when adding the
new free to the generic code, it didn't take for the local transport.
While there, fix a C99-ism that sneaked through.
The protocol has a capability which allows the server to tell us which
refs are symrefs, so we can e.g. know which is the default branch.
This capability is different from the ones we already support, as it's
not setting a flag to true, but requires us to store a list of
refspec-formatted mappings.
This commit does not yet expose the information in the reference
listing.
Use size_t for page size, instead of long. Check result of sysconf.
Use size_t for page offset so no cast to size_t (second arg to p_mmap).
Use mod instead div/mult pair, so no cast to size_t is necessary.
If you enabled core.safecrlf on an LF-ending platform, we would
error even for files with all LFs. We should only be warning on
irreversible mappings, I think.
We set up the current branch after we fetch from the remote. This means
that the user's refspec may have already created this reference. It is
therefore not an error if we cannot create the branch because it already
exists.
This allows for the user to replicate git-clone's --mirror option.
Instead of changing the user-provided remote, duplicate it so we can add
the extra refspec without having to worry about unsetting it before
returning.
Windows has its own ftruncate() called _chsize_s().
p_mkstemp() is changed to use p_open() so we can make sure we open for
writing; the addition of exclusive create is a good thing to do
regardless, as we want a temporary path for ourselves.
Lastly, MSVC doesn't quite know how to add two numbers if one of them is a
void pointer, so let's alias it to unsigned char.C
Some OSs cannot keep their ideas about file content straight when mixing
standard IO with file mapping. As we use mmap for reading from the
packfile, let's make writing to the pack file use mmap.
When running multithreaded, it is not enough to check for the offmap
allocation. Move the call to cache_init() to packfile allocation so we
can be sure it is always allocated free of races.
This fixes#2355.
The switch makes the loop somewhat unwieldy. Let's assume it's fine and
perform the check when we're accessing the data.
This makes our code look a lot more like git's.
Dependency chains are often large and require a few
reallocations. Allocate a 64-element chain before doing anything else to
avoid allocations during the loop.
This value comes from the stack-allocated one git uses. We still
allocate this on the heap, but it does help performance a little bit.
Bring back the use of the delta base cache for unpacking objects. When
generating the delta chain, we stop when we find a delta base in the
pack's cache and use that as the starting point.
We currently make use of recursive function calls to unpack an object,
resolving the deltas as we come back down the chain. This means that we
have unbounded stack growth as we look up objects in a pack.
This is now done in two steps: first we figure out what the dependency
chain is by looking up the delta bases until we reach a non-delta
object, pushing the information we need onto a stack and then we pop
from that stack and apply the deltas until there are no more left.
This version of the code does not make use of the delta base cache so it
is slower than what's in the mainline. A later commit will reintroduce
it.
Repeating this error message makes it harder to find out where we
actually are finding the error, and they don't really describe what
we're trying to do.
When using Iconv to convert unicode data and iconv doesn't like
the source data (because it thinks that it's not actual UTF-8),
instead of stopping the operation, just use the unconverted data.
This will generally do the right thing on the filesystem, since
that is the source of the non-UTF-8 path data anyhow.
This adds some tests for creating and looking up branches with
messy Unicode names. Also, this takes the helper function that
was previously internal to `git_repository_init` and makes it
into `git_path_does_fs_decompose_unicode` which is a useful in
tests to understand what the expected results should be.
Our vector does a move of the rest of the array when we remove an
item. Doing this repeatedly can be expensive, and we do this a lot in
the indexer. Instead, set the value to NULL and skip those entries.
perf reported around 30% of `index-pack` time was going into
memmove. With this change, that goes away and we spent most of the time
hashing and inflating data.
This adds in missing calls to `git_buf_sanitize` and fixes a
number of places where `git_buf` APIs could inadvertently write
NUL terminator bytes into invalid buffers. This also changes the
behavior of `git_buf_sanitize` to NUL terminate a buffer if it can
and of `git_buf_shorten` to do nothing if it can.
Adds tests of filtering code with zeroed (i.e. unsanitized) buffer
which was previously triggering a segfault.
Diff and status do not want core.safecrlf to actually raise an
error regardless of the setting, so this extends the filter API
with an additional options flags parameter and adds a flag so that
filters can be applied with GIT_FILTER_OPT_ALLOW_UNSAFE, indicating
that unsafe filter application should be downgraded from a failure
to a warning.
1) Call to git_shutdown results in setting git__n_shutdown_callbacks
to -1. Next call to git__on_shutdown results in ABW (Array Bound Write)
for array git__shutdown_callbacks. In the current Implementation,
git_atomic_dec is called git__n_shutdown_callbacks + 1 times. I have
modified it to a for loop so that it is more readable. It would not
set git__n_shutdown_callbacks to a negative number and reset the
elements of git__shutdown_callbacks to NULL.
2) In function git_sysdir_get, shutdown function is registered only if
git_sysdir__dirs_shutdown_set is set to 0. However, after this variable
is set to 1, it is never reset to 0. If git_sysdir_global_init is
called again from synchronized_threads_init it does not register
shutdown function for this subsystem.
The diff code was using an "ignored_prefix" directory to track if
a parent directory was ignored that contained untracked files
alongside tracked files. Unfortunately, when negative ignore rules
were used for directories inside ignored parents, the wrong rules
were applied to untracked files inside the negatively ignored
child directories.
This commit moves the logic for ignore containment into the workdir
iterator (which is a better place for it), so the ignored-ness of
a directory is contained in the frame stack during traversal. This
allows a child directory to override with a negative ignore and yet
still restore the ignored state of the parent when we traverse out
of the child.
Along with this, there are some problems with "directory only"
ignore rules on container directories. Given "a/*" and "!a/b/c/"
(where the second rule is a directory rule but the first rule is
just a generic prefix rule), then the directory only constraint
was having "a/b/c/d/file" match the first rule and not the second.
This was fixed by having ignore directory-only rules test a rule
against the prefix of a file with LEADINGDIR enabled.
Lastly, spot checks for ignores using `git_ignore_path_is_ignored`
were tested from the top directory down to the bottom to deal with
the containment problem, but this is wrong. We have to test bottom
to top so that negative subdirectory rules will be checked before
parent ignore rules.
This does change the behavior of some existing tests, but it seems
only to bring us more in line with core Git, so I think those
changes are acceptable.
The brace in the check for peel's return was surrounding the wrong
thing, which made 'error' be set to 1 when there was an error instead of
the error code.
We assume that everything under GIT_DIR/objects/ is a directory. This is
not necessarily the case if some process left a stray file in there.
Check beforehand if we do have a directory and ignore the entry
otherwise.
There are a few tests that set up a fake home directory and a
fake GLOBAL search path so that we can test things in global
ignore or attribute or config files. This cleans up that code to
work more robustly even if there is a test failure. This also
fixes some valgrind warnings where scanning search paths for
separators could end up doing a little bit of sketchy data access
when coming to the end of search list.
This is a proposed adjustment to the trace APIs. This makes the
trace levels into a bitmask so that they can be selectively enabled
and adds a callback-level payload, plus a message-level payload.
This makes it easier for me to a GIT_TRACE_PERF callbacks that
are simply bypassed if the PERF level is not set.
This adds an option to refresh the stat cache while generating
status. It also rips out the GIT_PERF stuff I had an makes use
of the trace API to keep statistics about what happens during diff.
When diff is scanning the working directory, if it finds a file
where it is not sure if the index entry matches the working dir,
it will recalculate the OID (which is pretty expensive). This
adds a new flag to diff so that if the OID calculation finds that
the file actually has not changed (i.e. just the modified time was
altered or such), then it will refresh the stat cache in the index
so that future calls to diff will not have to check the oid again.
This reorganized the diff OID calculation to make it easier to
correctly update the stat cache during a diff once the flags to
do so are enabled.
This includes marking the path of a git_index_entry as const so
we can make a "fake" git_index_entry with a "const char *" path
and not get warnings. I was a little surprised at how unobtrusive
this change was, but I think it's probably a good thing.
When we think the stat cache in the index seems valid and the size
or mode of a file has definitely changed, then don't bother trying
to recalculate the OID of the workdir bits to confirm that it is
modified - just accept that it is modified.
This can result in files that show as modified with no actual diff,
but the behavior actually appears to match Git on the command line.
This also includes a minor optimization to not perform a submodule
lookup on the ".git" directory itself.
The current version of the commit creation and amend function are unsafe
to use when passing the update_ref parameter, as they do not check that
the reference at the moment of update points to what the user expects.
Make sure that we're moving history forward when we ask the library to
update the reference for us by checking that the first parent of the new
commit is the current value of the reference. We also make sure that the
ref we're updating hasn't moved between the read and the write.
Similarly, when amending a commit, make sure that the current tip of the
branch is the commit we're amending.
When deflating data, we might need to grow the buffer. Currently we
add a guess on top of the currently-allocated buffer size.
When we re-use the buffer, it already has some memory allocated; adding
to that means that we always grow the buffer regardless of how much we
need to use.
Instead, increase on top of the currently-used size. This still leaves
us with the allocated size of the largest object we compress, but it's a
minor pain compared to unbounded growth.
This fixes#2285.
It's possible for an encrypted connection not have a certificate. In
this case, SSL_get_verify_result() will return OK because no error
happened (as it never even tried to validate anything).
SSL_get_peer_certificate() will return NULL in this case so we need to
catch that. On the upside, the current code would segfault in this
situation instead of letting it through as a valid cert.