Before calling the credentials callback, ask the sever which
authentication methods it supports and report that to the user, instead
of simply reporting everything that the transport supports.
In case of an error, we do fall back to listing all of them.
Bring together all of the OpenSSL initialization to
git_threads_init() so it's together and doesn't need locks.
Moving it here also gives us libssh2 thread safety (when built against
openssl).
When using in a multithreaded context, OpenSSL needs to lock, and leaves
it up to application to provide said locks.
We were not doing this, and it's just luck that's kept us from crashing
up to now.
The OpenSSL init functions are not reentrant, which means that running
multiple fetches in parallel can cause us to crash.
Use a mutex to init OpenSSL, and since we're adding this extra checks,
init it only once.
Instead of using a sentinel empty value to detect the last commit, let's
check for when we get a NULL from popping the stack, which lets us know
when we're done.
The current code causes us to read uninitialized data, although only on
RHEL/CentOS 6 in release mode. This is a readability win overall.
If the user wants to keep a copy for themselves, they should make a
copy. It adds unnecessary complexity to make sure the returned entries
are valid until the builder is cleared.
Finding a filename in a vector means we need to resort it every time we
want to read from it, which includes every time we want to write to it
as well, as we want to find duplicate keys.
A hash-map fits what we want to do much more accurately, as we do not
care about sorting, but just the particular filename.
We still keep removed entries around, as the interface let you assume
they were going to be around until the treebuilder is cleared or freed,
but in this case that involves an append to a vector in the filter case,
which can now fail.
The only time we care about sorting is when we write out the tree, so
let's make that the only time we do any sorting.
A symref inside the namespace gets renamed, we should make it point to
the target's new name.
This is for the origin/HEAD -> origin/master type of situations.
There is no reason why we need to use a callback here. A string array
fits better with the usage, as this is not an event and we don't need
anything from the user.
We must make sure that the name pointer remains valid, so make sure to
allocate the new one before freeing the old one and swap them so the
user never sees an invalid pointer.
We don't allow renames of anonymous remotes, so there's no need to
handle them.
A remote is always associated with a repository, so there's no need to
check for that.
Tighten up which references we consider for renaming so we don't try to
rename unrelated ones and end up with unexplained references.
If there is a reference on the target namespace, git overwrites it, so
let's do the same.
When renaming a lock file to its final location, we need to make sure
that it is replaced atomically.
We currently have a workaround for Windows by removing the target file.
This means that the target file, which may be a ref or a packfile, may
cease to exist for a short wile, which shold be avoided.
Implement the workaround only in Windows, by making sure that the file
we want to replace is writable.
Whe already worked out the kinks with the function used in the local
transport. Expose it and make use of it in the local clone method
instead of trying to work it out again.
When removing the remote-tracking branches, build up the list and remove
in two steps, working around an issue with the iterator. Removing while
we're iterating over the refs can cause us to miss references.
Inside `git_remote_load`, the calls to `get_optional_config` use
`giterr_clear` to unset any errors that are set due to missing config
keys. If neither a fetch nor a push url config was found for a remote,
we should set an error again.
This fixes two issues I found when core.precomposeunicode is enabled:
* When creating a reference with a NFD string, the returned
git_reference would return this NFD string as the reference’s
name. But when looking up the reference later, the name would
then be returned as NFC string.
* Renaming a reference would not honor the core.precomposeunicode and
apply no normalization to the new reference name.
If requested, git_clone_local_into() will try to link the object files
instead of copying them.
This only works on non-Windows (since it doesn't have this) when both
are on the same filesystem (which are unix semantics).
A call like git_clone("./foo", "./foo1") writes origin's url as './foo',
which makes it unusable, as they're relative to different things.
Go with git's behaviour and store the realpath as the url.
When git is given such a path, it will perform a "local clone",
bypassing the git-aware protocol and simply copying over all objects
that exist in the source.
Copy this behaviour when given a local path.
We have too many places where we repeat free code, so when adding the
new free to the generic code, it didn't take for the local transport.
While there, fix a C99-ism that sneaked through.
The protocol has a capability which allows the server to tell us which
refs are symrefs, so we can e.g. know which is the default branch.
This capability is different from the ones we already support, as it's
not setting a flag to true, but requires us to store a list of
refspec-formatted mappings.
This commit does not yet expose the information in the reference
listing.
Use size_t for page size, instead of long. Check result of sysconf.
Use size_t for page offset so no cast to size_t (second arg to p_mmap).
Use mod instead div/mult pair, so no cast to size_t is necessary.
If you enabled core.safecrlf on an LF-ending platform, we would
error even for files with all LFs. We should only be warning on
irreversible mappings, I think.
We set up the current branch after we fetch from the remote. This means
that the user's refspec may have already created this reference. It is
therefore not an error if we cannot create the branch because it already
exists.
This allows for the user to replicate git-clone's --mirror option.
Instead of changing the user-provided remote, duplicate it so we can add
the extra refspec without having to worry about unsetting it before
returning.
Windows has its own ftruncate() called _chsize_s().
p_mkstemp() is changed to use p_open() so we can make sure we open for
writing; the addition of exclusive create is a good thing to do
regardless, as we want a temporary path for ourselves.
Lastly, MSVC doesn't quite know how to add two numbers if one of them is a
void pointer, so let's alias it to unsigned char.C
Some OSs cannot keep their ideas about file content straight when mixing
standard IO with file mapping. As we use mmap for reading from the
packfile, let's make writing to the pack file use mmap.
When running multithreaded, it is not enough to check for the offmap
allocation. Move the call to cache_init() to packfile allocation so we
can be sure it is always allocated free of races.
This fixes#2355.
The switch makes the loop somewhat unwieldy. Let's assume it's fine and
perform the check when we're accessing the data.
This makes our code look a lot more like git's.
Dependency chains are often large and require a few
reallocations. Allocate a 64-element chain before doing anything else to
avoid allocations during the loop.
This value comes from the stack-allocated one git uses. We still
allocate this on the heap, but it does help performance a little bit.
Bring back the use of the delta base cache for unpacking objects. When
generating the delta chain, we stop when we find a delta base in the
pack's cache and use that as the starting point.
We currently make use of recursive function calls to unpack an object,
resolving the deltas as we come back down the chain. This means that we
have unbounded stack growth as we look up objects in a pack.
This is now done in two steps: first we figure out what the dependency
chain is by looking up the delta bases until we reach a non-delta
object, pushing the information we need onto a stack and then we pop
from that stack and apply the deltas until there are no more left.
This version of the code does not make use of the delta base cache so it
is slower than what's in the mainline. A later commit will reintroduce
it.
Repeating this error message makes it harder to find out where we
actually are finding the error, and they don't really describe what
we're trying to do.
When using Iconv to convert unicode data and iconv doesn't like
the source data (because it thinks that it's not actual UTF-8),
instead of stopping the operation, just use the unconverted data.
This will generally do the right thing on the filesystem, since
that is the source of the non-UTF-8 path data anyhow.
This adds some tests for creating and looking up branches with
messy Unicode names. Also, this takes the helper function that
was previously internal to `git_repository_init` and makes it
into `git_path_does_fs_decompose_unicode` which is a useful in
tests to understand what the expected results should be.
Our vector does a move of the rest of the array when we remove an
item. Doing this repeatedly can be expensive, and we do this a lot in
the indexer. Instead, set the value to NULL and skip those entries.
perf reported around 30% of `index-pack` time was going into
memmove. With this change, that goes away and we spent most of the time
hashing and inflating data.
This adds in missing calls to `git_buf_sanitize` and fixes a
number of places where `git_buf` APIs could inadvertently write
NUL terminator bytes into invalid buffers. This also changes the
behavior of `git_buf_sanitize` to NUL terminate a buffer if it can
and of `git_buf_shorten` to do nothing if it can.
Adds tests of filtering code with zeroed (i.e. unsanitized) buffer
which was previously triggering a segfault.
Diff and status do not want core.safecrlf to actually raise an
error regardless of the setting, so this extends the filter API
with an additional options flags parameter and adds a flag so that
filters can be applied with GIT_FILTER_OPT_ALLOW_UNSAFE, indicating
that unsafe filter application should be downgraded from a failure
to a warning.
1) Call to git_shutdown results in setting git__n_shutdown_callbacks
to -1. Next call to git__on_shutdown results in ABW (Array Bound Write)
for array git__shutdown_callbacks. In the current Implementation,
git_atomic_dec is called git__n_shutdown_callbacks + 1 times. I have
modified it to a for loop so that it is more readable. It would not
set git__n_shutdown_callbacks to a negative number and reset the
elements of git__shutdown_callbacks to NULL.
2) In function git_sysdir_get, shutdown function is registered only if
git_sysdir__dirs_shutdown_set is set to 0. However, after this variable
is set to 1, it is never reset to 0. If git_sysdir_global_init is
called again from synchronized_threads_init it does not register
shutdown function for this subsystem.
The diff code was using an "ignored_prefix" directory to track if
a parent directory was ignored that contained untracked files
alongside tracked files. Unfortunately, when negative ignore rules
were used for directories inside ignored parents, the wrong rules
were applied to untracked files inside the negatively ignored
child directories.
This commit moves the logic for ignore containment into the workdir
iterator (which is a better place for it), so the ignored-ness of
a directory is contained in the frame stack during traversal. This
allows a child directory to override with a negative ignore and yet
still restore the ignored state of the parent when we traverse out
of the child.
Along with this, there are some problems with "directory only"
ignore rules on container directories. Given "a/*" and "!a/b/c/"
(where the second rule is a directory rule but the first rule is
just a generic prefix rule), then the directory only constraint
was having "a/b/c/d/file" match the first rule and not the second.
This was fixed by having ignore directory-only rules test a rule
against the prefix of a file with LEADINGDIR enabled.
Lastly, spot checks for ignores using `git_ignore_path_is_ignored`
were tested from the top directory down to the bottom to deal with
the containment problem, but this is wrong. We have to test bottom
to top so that negative subdirectory rules will be checked before
parent ignore rules.
This does change the behavior of some existing tests, but it seems
only to bring us more in line with core Git, so I think those
changes are acceptable.
The brace in the check for peel's return was surrounding the wrong
thing, which made 'error' be set to 1 when there was an error instead of
the error code.
We assume that everything under GIT_DIR/objects/ is a directory. This is
not necessarily the case if some process left a stray file in there.
Check beforehand if we do have a directory and ignore the entry
otherwise.
There are a few tests that set up a fake home directory and a
fake GLOBAL search path so that we can test things in global
ignore or attribute or config files. This cleans up that code to
work more robustly even if there is a test failure. This also
fixes some valgrind warnings where scanning search paths for
separators could end up doing a little bit of sketchy data access
when coming to the end of search list.
This is a proposed adjustment to the trace APIs. This makes the
trace levels into a bitmask so that they can be selectively enabled
and adds a callback-level payload, plus a message-level payload.
This makes it easier for me to a GIT_TRACE_PERF callbacks that
are simply bypassed if the PERF level is not set.
This adds an option to refresh the stat cache while generating
status. It also rips out the GIT_PERF stuff I had an makes use
of the trace API to keep statistics about what happens during diff.
When diff is scanning the working directory, if it finds a file
where it is not sure if the index entry matches the working dir,
it will recalculate the OID (which is pretty expensive). This
adds a new flag to diff so that if the OID calculation finds that
the file actually has not changed (i.e. just the modified time was
altered or such), then it will refresh the stat cache in the index
so that future calls to diff will not have to check the oid again.
This reorganized the diff OID calculation to make it easier to
correctly update the stat cache during a diff once the flags to
do so are enabled.
This includes marking the path of a git_index_entry as const so
we can make a "fake" git_index_entry with a "const char *" path
and not get warnings. I was a little surprised at how unobtrusive
this change was, but I think it's probably a good thing.
When we think the stat cache in the index seems valid and the size
or mode of a file has definitely changed, then don't bother trying
to recalculate the OID of the workdir bits to confirm that it is
modified - just accept that it is modified.
This can result in files that show as modified with no actual diff,
but the behavior actually appears to match Git on the command line.
This also includes a minor optimization to not perform a submodule
lookup on the ".git" directory itself.
The current version of the commit creation and amend function are unsafe
to use when passing the update_ref parameter, as they do not check that
the reference at the moment of update points to what the user expects.
Make sure that we're moving history forward when we ask the library to
update the reference for us by checking that the first parent of the new
commit is the current value of the reference. We also make sure that the
ref we're updating hasn't moved between the read and the write.
Similarly, when amending a commit, make sure that the current tip of the
branch is the commit we're amending.
When deflating data, we might need to grow the buffer. Currently we
add a guess on top of the currently-allocated buffer size.
When we re-use the buffer, it already has some memory allocated; adding
to that means that we always grow the buffer regardless of how much we
need to use.
Instead, increase on top of the currently-used size. This still leaves
us with the allocated size of the largest object we compress, but it's a
minor pain compared to unbounded growth.
This fixes#2285.
It's possible for an encrypted connection not have a certificate. In
this case, SSL_get_verify_result() will return OK because no error
happened (as it never even tried to validate anything).
SSL_get_peer_certificate() will return NULL in this case so we need to
catch that. On the upside, the current code would segfault in this
situation instead of letting it through as a valid cert.
When considering status of untracked directories, if we find an
explicitly ignored item, even if it is a directory, treat the
parent as an IGNORED item. It was accidentally being treated as
an EMPTY item because we were not looking into the ignored subdir.
The current FETCH_HEAD parsing code assumes that a quote must end the
branch name. Git however allows for quotes as part of a branch name,
which causes us to consider the FETCH_HEAD file as invalid.
Instead of searching for a single quote char, search for a quote char
followed by SP, which is not a valid part of a ref name.
In the iterator, distinguish between ignores and empty directories
so that diff and status can ignore empty directories, but checkout
and stash can treat them as untracked items.
When diff finds an untracked directory, it emulates Git behavior
by looking inside the directory to see if there are any untracked
items inside it. If there are only ignored items inside the dir,
then diff considers it ignored, even if there is no direct ignore
rule for it.
Checkout was not copying this behavior - when it found an untracked
directory, it just treated it as untracked. Unfortunately, when
combined with GIT_CHECKOUT_REMOVE_UNTRACKED, this made is seem that
checkout (and stash, which uses checkout) was removing ignored
items when you had only asked it to remove untracked ones.
This commit moves the logic for advancing past an untracked dir
while scanning for non-ignored items into an iterator helper fn,
and uses that for both diff and checkout.
To emulate git, stash should not remove untracked git repositories
inside the parent repo, and checkout's REMOVE_UNTRACKED should
also skip over these items.
`git stash` actually prints a warning message for these items.
That should be possible with a checkout notify callback if you
wanted to, although it would require a bit of extra logic as things
are at the moment.
This takes the `--stat` and related example options in the example
diff.c program and converts them to use the `git_diff_get_stats`
API which nicely formats stats for you.
I went to add bar-graph scaling to the stats formatter and noticed
that the `git_diff_stats` structure was holding on to all of the
`git_patch` objects. Unfortunately, each of these objects keeps
the full text of the diff in memory, so this is very expensive. I
ended up modifying `git_diff_stats` to keep just the data that it
needs to keep and allowed it to release the patches. Then, I added
width scaling to the output on top of that.
In making the diff example program match 'git diff' output, I ended
up removing an newline from the sumamry output which I then had to
compensate for in the email formatting to match the expectations.
Lastly, I went through and refactored the tests to use a couple of
helper functions and reduce the overall amount of code there.
I was playing with "git diff-index" and wanted to be able to
emulate that behavior a little more closely with the diff example.
Also, I wanted to play with running `git_diff_tree_to_workdir`
directly even though core Git doesn't exactly have the equivalent,
so I added a command line option for that and tweaked some other
things in the example code.
This changes a minor output thing in that the "raw" print helper
function will no longer add ellipses (...) if the OID is not
actually abbreviated.
Allow the credentials callback to return GIT_PASSTHROUGH to make the
transports code behave as though none was set.
This should make it easier for bindings to behave closer to the C code
when there is no credentials callback set at their level.
Only apply LEADING_DIR pattern munging to patterns in ignore and
attribute files, not to pathspecs used to select files to operate
on. Also, allow internal macro definitions to be evaluated before
loading all external ones (important so that external ones can
make use of internal `binary` definition).
Ignore patterns that ended with a trailing '/*' were still needing
to match against another actual '/' character in the full path.
This is not the same behavior as core Git.
Instead, we strip a trailing '/*' off of any patterns that were
matching and just take it to imply the FNM_LEADING_DIR behavior.
There was a latent bug where files that use macro definitions
could be parsed before the macro definitions were loaded. Because
of attribute file caching, preloading files that are going to be
used doesn't add a significant amount of overhead, so let's always
preload any files that could contain macros before we assemble the
actual vector of files to scan for attributes.
When traversing the directory structure, the iterator pushes and
pops ignore files using a vector. Some directories don't have
ignore files, so it uses a path comparison to decide when it is
right to actually pop the last ignore file. This was only
comparing directory suffixes, though, so a subdirectory with the
same name as a parent could result in the parent's .gitignore
being popped off the list ignores too early. This changes the
logic to compare the entire relative path of the ignore file.
The ssh-specific credentials allow the username to be missing. The idea
being that the ssh transport will then use the username provided in the
url, if it's available. There are two main issues with this.
The credential callback already knows what username was provided by the
url and needs to figure out whether it wants to ask the user for it or
it can reuse it, so passing NULL as the username means the credential
callback is suspicious.
The username provided in the url is not in fact used by the
transport. The only time it even considers it is for the user/pass
credential, which asserts the existence of a username in its
constructor. For the ssh-specific ones, it passes in the username stored
in the credential, which is NULL. The libssh2 macro we use runs strlen()
against this value (which is no different from what we would be doing
ourselves), so we then crash.
As the documentation doesn't suggest to leave out the username, assert
the need for a username in the code, which removes this buggy behavior
and removes implicit state.
git_cred_has_username() becomes a blacklist of credential types that do
not have a username. The only one at the moment is the 'default' one,
which is meant to call up some Microsoft magic.
Now that our strmap is no longer modified but replaced, we can use the
same strmap for the snapshot's values and it will be freed when we don't
need it anymore.
When we delete an entry, we also want to refresh the configuration to
catch any changes that happened externally.
This allows us to simplify the logic, as we no longer need to delete
these variables internally. The whole state will be refreshed and the
deleted entries won't be there.
With the isolation of complex reads, we can now try to refresh the
on-disk file before reading a value from it.
This changes the semantics a bit, as before we could be sure that a
string we got from the configuration was valid until we wrote or
refreshed. This is no longer the case, as a read can also invalidate the
pointer.
Current code sets the active map to a new one and builds it whilst it's
active. This is a race condition with someone else trying to access the
same config.
Instead, let's build up our new map and swap the active and new one.
In order to have consistent views of the config files for remotes,
submodules et al. and a configuration that represents what is currently
stored on-disk, we need a way to provide a view of the configuration
that does not change.
The goal here is to provide the snapshotting part by creating a
read-only copy of the state of the configuration at a particular point
in time, which does not change when a repository's main config changes.
The checks to see if files were out of date in the attibute cache
was wrong because the cache-breaker data wasn't getting stored
correctly. Additionally, when the cache-breaker triggered, the
old file data was being leaked.
I don't love this approach, but achieving thread-safety for
attribute and ignore data while reloading files would require a
larger rewrite in order to avoid this. If an attribute or ignore
file is out of date, this holds a lock on the file while we are
reloading the data so that another thread won't try to reload the
data at the same time.
In the threading tests, I was still seeing a race condition where
the same item could end up being inserted multiple times into the
index. Preserving the sorted-ness of the index outside of the
`index_insert` call fixes the issue.
This is a big refactoring of the attribute file cache to be a bit
simpler which in turn makes it easier to enforce a lock around any
updates to the cache so that it can be used in a threaded env.
Tons of changes to the attributes and ignores code.
While I was looking at the conflict cleanup code, I looked over at
the tree cache code, since we clear the tree cache for each entry
that gets removed and there is some redundancy there. I made some
small tweaks to avoid extra calls to strchr and strlen in a few
circumstances.
I introduced a leak into conflict cleanup by removing items from
inside the git_vector_remove_matching call. This simplifies the
code to just use one common way for the two conflict cleanup APIs.
When an index has an active snapshot, removing an item can cause
an error (inserting into the deferred deletion vector), so I made
the git_index_conflict_cleanup API return an error code. I felt
like this wasn't so bad since it is just like the other APIs.
I fixed up a couple of comments while I was changing the header.
The iterator pushes and pops ignores incrementally onto a list as
it traverses the directory structure so that it doesn't have to
constantly recheck which ignore files apply. With the new ref
counting, it wasn't decrementing the refcount on the ignores that
it removed from the vector.
This makes the lock management on the index a little bit broader,
having a number of routines hold the lock across looking up the
item to be modified and actually making the modification. Still
not true thread safety, but more pure index modifications are now
safe which allows the simple cases (such as starting up a diff
while index modifications are underway) safe enough to get the
snapshot without hitting allocation problems.
As part of this, I simplified the allocation of index entries to
use a flex array and just put the path at the end of the index
entry. This makes every entry self-contained and makes it a
little easier to feel sure that pointers to strings aren't
being accidentally copied and freed while other references are
still being held.
This adds a basic test of doing simultaneous diffs on multiple
threads and adds basic locking for the attr file cache because
that was the immediate problem that arose from these tests.
This makes the index iterator honor the GIT_ITERATOR_IGNORE_CASE
and GIT_ITERATOR_DONT_IGNORE_CASE flags without modifying the
index data itself. To take advantage of this, I had to export a
number of the internal index entry comparison functions. I also
wrote some new tests to exercise the capability.
The usefulness of these helpers came up for me while debugging
some of the iterator changes that I was making, so since they
have also been requested (albeit indirectly) I thought I'd include
them.
Again, laying groundwork for some index iterator changes, this
contains a bunch of code refactorings for index internals that
should make it easier down the line to add locking around index
modifications. Also this removes the redundant prefix_position
function and fixes some potential memory leaks.
Ignore rules with slashes in them are matched using FNM_PATHNAME
and use the path to the .gitignore file from the root of the
repository along with the path fragment (including slashes) in
the ignore file itself. Unfortunately, the relative path to the
.gitignore file was being applied to the global core.excludesfile
if that was also named ".gitignore".
This fixes that with more precise matching and includes test for
ignore rules with leading slashes (which were the primary example
of this being broken in the real world).
This also backports an improvement to the file context logic from
the threadsafe-iterators branch where we don't rely on mutating
the key of the attribute file name to generate the context path.
There were a couple bugs in popping ignore files during iteration
that could result in incorrect decisions be made and thus ignore
files below the root either not being loaded correctly or not
being popped at the right time.
One bug was an off-by-one in comparing the path of the gitignore
file with the path being exited during iteration.
The second bug was not correctly truncating the path being tracked
during traversal if there were no ignores on the list (i.e. when
you have no .gitignore at the root, but do have some in contained
directories).
This updates how libgit2 treats submodule-like directories that
actually have tracked content inside of them. This is a strange
corner case, but it seems that many people have abortive submodule
setups and then just went ahead and added the files into the
parent repository. In this case, we should just treat the
submodule as if it was a normal directory.
Libgit2 will still try to skip over real submodules and contained
repositories that do not have tracked files inside them, but this
adds some new handling for cases where the apparently submodule
data is in conflict with the actual list of tracked files.
git_merge_base() returns GIT_ENOTFOUND when it cannot find a merge
base. graph_desdendant_of() returns a boolean value (barring any
errors), so it needs to catch the NOTFOUND return value and convert it
into false, as not merge base means it cannot be a descendant.
The internal buffer in the `git_path_iconv_t` structure was not
being reset before the calls to `iconv` were made to convert data,
so if there were multiple decomposed Unicode paths in a single
directory, paths after the first one were being appended to the
first instead of treated as independent data.
The base for the relative urls is determined as follows, with descending
priority:
- remote url of HEAD's remote tracking branch
- remote "origin"
- workdir
This follows git.git behaviour
When the current branch is unborn, git will still mark the current
branch's upstream for-merge if there is an upstream configuration. The
only non-constrived case is cloning from an empty repository which then
gains history. origin's master should be marked for-merge.
In order to do this, we cannot use the high-level wrappers that expect a
reference, as we may not have one. Move over to the internal ones that
expect a reference name, which we do have.
When doing a diff for use in status, we should never show the
content of a git repository contained inside another one. The
logic to do this was looking for a .git directory and so when a
gitlink plain .git file was used, it was failing to exclude the
directory content.
There was a little bug where the submodule cache thought that the
index date was out of date even when it wasn't that was resulting
in some extra scans of index data even when not needed.
Mostly this commit adds a bunch of new tests including adding and
removing submodules in the index and in the HEAD and seeing if we
can automatically pick them up when refreshing.
Wrote tests that try adding, removing, and updating the name of
submodules which showed a number of problems with how we account
for changes when incrementally updating the submodule info. Most
of these issues didn't exist before because reloading would always
blow away the old submodule data.
This improvement the management of the lock around submodule cache
updates slightly, using the lock to make sure that foreach can
safely make a snapshot of all existing submodules and making sure
that git_submodule_add_setup also grabs a lock before inserting
the new submodule. Cache initialization / refresh should already
have been holding the lock correctly as it adds submodules.
When forcing cache flushes or reload, etc., it is easier to keep
track of intent using enums instead of plain bools. Also, this
fixes a bug where the cache was not being properly refreshes by
a git_submodule_reload_all.
This makes submodule cache refresh actually look at the timestamps
from the data sources for submodules and reload as needed if they
have changed since the last refresh.
This takes the old submodule cache which was just a git_strmap
and makes a real git_submodule_cache object that can contain other
things like a lock and timestamp-ish data to control refreshing of
submodule info.
This fixes `git_submodule_sync` to correctly update the remote URL
of the default branch of the submodule along with the URL in the
parent repository config (i.e. match core Git's behavior).
Also move some useful helper logic from the submodule code into
a shared config API `git_config__update_entry` that can either set
or delete an entry with constraints like not overwriting or not
creating a new entry. I used that helper to update a couple other
places in the code.
There are a few places where we need to join three strings to
assemble a path. This adds a simple join3 function to avoid the
comparatively expensive join_n (which calls strlen on each string
twice).
The order in this function is the opposite to what
create_with_fetchspec() has, so change this one, as url-then-refspec is
what git does.
As we need to break compilation and the swap doesn't do that, let's take
this opportunity to rename in-memory remotes to anonymous as that's
really what sets them apart.
With the changes to how git_path_dirload_with_stat handles things
that look like submodules, submodules could end up sorted in the
wrong order with the workdir iterator. This moves the submodule
check earlier in the iterator processing of a new directory so
that the submodule name updates will happen immediately and the
sort order will be correct.
While looping over multiple heads, an up-to-date head will clobber the `remote->need_pack` setting, preventing the rest of the machinery from building and downloading a pack-file, breaking fetches.
If the first call to release a no-longer-existent submodule freed
the object, the check if a second is needed would dereference the
data that was just freed.
When a file is open for reading (without shared-delete permission), and
then a different thread/process called p_rename, that would fail, even
if the file was only open for reading for a few milliseconds. This
change lets p_rename wait up to 50ms for the file to be closed by the
reader. Applies only to win32.
This is especially important for git_filebuf_commit, because writes
should not fail if the file is read simultaneously.
Fixes#2207
When a submodule was inserted with a different path and name, the
return value from khash greater than zero was allowed to propagate
back out to the caller when it should really be zeroed. This led
to a possible crash when reloading submodules if that was the
first time that submodule data was loaded.
The reload_all call could end up dereferencing a NULL pointer if
there was an error while attempting to load the submodules config
data (i.e. invalid content in the gitmodules file). This fixes it.
This cleans up some places I missed that could hold onto submodule
references and cleans up the way in which the repository cache is
both reloaded and released so that existing submodule references
aren't destroyed inappropriately.
When a directory containing a .git directory (or even just a plain
gitlink) was found, libgit2 was going out of its way to treat it
specially. This seemed like it was necessary because the diff
code was not originally emulating Git's behavior for untracked
directories correctly (i.e. scanning for ignored vs untracked items
inside). Now that libgit2 diff mimics Git's untracked directory
behavior, the special handling for contained Git repos is actually
incorrect and this commit rips it out.
`git_submodule` objects were already refcounted internally in case
the submodule name was different from the path at which it was
stored. This makes that refcounting externally used as well, so
`git_submodule_lookup` and `git_submodule_add_setup` return an
object that requires a `git_submodule_free` when done.
As a way to speed up the cases where we need to hide some commits, we
find out what the merge bases are so we know to stop marking commits as
uninteresting and avoid walking down a potentially very large amount of
commits which we will never see. There are however two oversights in
current code.
The merge-base finding algorithm fails to recognize that if it is only
given one commit, there can be no merge base. It instead walks down the
whole ancestor chain needlessly. Make it return an empty list
immediately in this situation.
The revwalk does not know whether the user has asked to hide any commits
at all. In situation where the user pushes multiple commits but doesn't
hide any, the above fix wouldn't do the trick. Keep track of whether the
user wants to hide any commits and only run the merge-base finding
algorithm when it's needed.
The reflog append function was overzealous in its checking. When passed
an old and new ids, it should not do any checking, but just serialize
the data to a reflog entry.
The existing ones lack checking zeroed ids when switching back from an
unborn branch as well as what happens when detaching.
The reflog appending function mistakenly wrote zeros when dealing with a
detached HEAD. This explicitly checks for those situations and fixes
them.
When we update the current branch, we must also append to HEAD's reflog
to keep them in sync.
This is a bit of a hack, but as git.git says, it covers 100% of
default cases.
If the pqueue comparison fn returned just 0 or 1 (think "a<b")
then the sort order of returned items could be wrong because there
was a "< 0" that really needed to be "<= 0". Yikes!!!
The git_odb_exists_prefix API was not dealing correctly when a
later backend returned GIT_ENOTFOUND even if an earlier backend
had found the object.
Additionally, the unit tests were not properly exercising the API
and had a couple mistakes in checking the results.
Lastly, since the backends are not expected to behavior correctly
unless all bytes of the short id are zero except for the prefix,
this makes the ODB prefix APIs explicitly clear out the extra
bytes so the user doesn't have to be as careful.
We look up a reference in order to figure out if it's the current
branch, which we need to free once we're done with the check.
As a bonus, only perform the check when we're passed the force flag, as
it's a useless check otherwise.
We can make use of git_object_dup to use refcounting instead of pointer
comparison to make sure we don't free the caller's object.
This also lets us simplify the case for '~0' which is now just an
assignment instead of looking up the object we have at hand.
Previously the hunk_byfinalline_search_cmp function was called with different
data types (size_t and uint32_t) for the key argument but expected only the
former resulting in an invalid memory access when passed the latter on a 64 bit
machine.
The following patch makes sure that the function is called and works with the
same type (size_t).
If a directory disappears between the time we look up the entries of its
parent and the time when we go to look at it, we should ignore the error
and move forward.
This fixes#2046.
This finds a short id string that will unambiguously select the
given object, starting with the core.abbrev length (usually 7)
and growing until it is no longer ambiguous.
A few fixes have accumulated in this area which have made the freeing of
data a bit muddy. Make sure to free the data only when needed and once.
When we are going to write a delta to the packfile, we need to free the
data, otherwise leave it. The current version of the code mixes up the
checks for po->data and po->delta_data.
This adds `git_diff_buffers` and `git_patch_from_buffers`. This
also includes a bunch of internal refactoring to increase the
shared code between these functions and the blob-to-blob and
blob-to-buffer APIs, as well as some higher level assert helpers
in the tests to also remove redundancy.
- added MSVC cmake definitions to disable warnings
- general.c is rewritten so it is ansi-c compatible and compiles ok on microsoft windows
- some MSVC reported warning fixes
* Make GIT_INLINE an internal definition so it cannot be used in
public headers
* Fix language in CONTRIBUTING
* Make index caps API use signed instead of unsigned values
On some systems, notably HP PA-RISC systems running Linux or HP-UX,
EWOULDBLOCK and EAGAIN are not the same value. POSIX (and these OSes) allow
EWOULDBLOCK to occur on write(2) (and send(2), etc.), so check explicitly
for this case as well as EAGAIN by defining and using a macro GIT_ISBLOCKED
that considers both.
The macro is necessary because MSYS does not provide EWOULDBLOCK and
compilation fails if an attempt is made to use it unconditionally. On most
systems, where the two values are the same, the compiler will simply
optimize this check out and it will have no effect.
This adds an API to amend an existing commit, basically a shorthand
for creating a new commit filling in missing parameters from the
values of an existing commit. As part of this, I also added a new
"sys" API to create a commit using a callback to get the parents.
This allowed me to rewrite all the other commit creation APIs so
that temporary allocations are no longer needed.
This fixes a number of warnings with the Windows 64-bit build
including a test failure in test_repo_message__message where an
invalid pointer to a git_buf was being used.
We need this from util.h and posix.h, but the latter includes common.h
which includes util.h, which means p_strlen is not defined by the time
we get to git__strndup().
Split the definition on p_strlen() off into its own header so we can use
it in util.h.
The current code issues a lot of strncmp() calls in order to check for
the end of the header, simply in order to copy it and start going
through it again. These are a lot of calls for something we can check as
we go along. Knowing the amount of parents beforehand to reduce
allocations in extreme cases does not make up for them.
Instead start parsing immediately and check for the double-newline after
each header field, leaving the raw_header allocation for the end, which
lets us go through the header once and reduces the amount of strncmp()
calls significantly.
In unscientific testing, this has reduced a shortlog-like usage (walking
though the whole history of a branch and extracting data from the
commits) of git.git from ~830ms to ~700ms and makes the time we spend in
strncmp() negligible.
Let the user push committish objects and peel them to figure out which
commit to push to our queue.
This is for convenience and for allowing uses of
git_revwalk_push_glob(w, "tags")
with annotated tags.
Change the name to _matching() intead of _if(), and force _set_target()
to be a conditional update. If the user doesn't care about the old
value, they should use git_reference_create().
This tweaks the pqueue_up and pqueue_down routines so that they
will not do full element swaps but instead carry over the state
of the previous loop iteration and only assign elements for which
we know the final position. This will avoid a little bit of data
assignment which should improve performance in theory.
Also got rid of some vector helpers that I'm no longer using.
This fixes a typo I made for setting the sorted flag on the index
after a reload. That typo didn't actually cause any test failures
so I'm also adding a test that explicitly checks that the index is
correctly sorted after a reload when ignoring case and when not.
This updates the git_pqueue to simply be a set of specialized
init/insert/pop functions on a git_vector.
To preserve the pqueue feature of having a fixed size heap, I
converted the "sorted" field in git_vectors to a more general
"flags" field so that pqueue could mix in it's own flag. This
had a bunch of ramifications because a number of places were
directly looking at the vector "sorted" field - I added a couple
new git_vector helpers (is_sorted, set_sorted) so the specific
representation of this information could be abstracted.
I accidentally wrote a separate priority queue implementation when
I was working on file rename detection as part of the file hash
signature calculation code. To simplify licensing terms, I just
adapted that to a general purpose priority queue and replace the
old priority queue implementation that was borrowed from elsewhere.
This also removes parts of the COPYING document that no longer
apply to libgit2.
Validating the workdir should not compare HEAD to working
directory - this is both inefficient (as it ignores the cache)
and incorrect. If we had legitimately allowed changes in the
index (identical to the merge result) then comparing HEAD to
workdir would reject these changes as different. Further, this
will identify files that were filtered strangely as modified,
while testing with the cache would prevent this.
Also, it's stupid slow.
The checkout code used to defer removal of "blocking" files in
checkouts until the blocked item was actually being written (since
we have already checked that the removing the block is acceptable
according to the update rules). Unfortunately, this resulted in
an intermediate index state where both the blocking and new items
were in the index which is no longer allowed. Now we just remove
the blocking item in the first pass so it never needs to coexist.
In cases where there are typechanges, this could result in a bit
more churn of removing and recreating intermediate directories,
but I'm going to assume that is an unusual case and the churn will
not be too costly.
There were some confusing issues mixing up the number of bytes
written to the zstream output buffer with the number of bytes
consumed from the zstream input. This reorganizes the zstream
API and makes it easier to deflate an arbitrarily large input
while still using a fixed size output.
This removes the fetchRecurse compiler warnings and makes the
behavior match the other submodule options (i.e. the in-memory
setting can be reset to the on-disk value).
When three-way merging indexes, we previously changed each path
as we read them, which would lead to us adding an index entry for
'foo', then removing an index entry for 'foo/file'. With the new
index requirements, this is not allowed. Removing entries in the
merged index, then adding them, resolves this. In the previous
example, we now remove 'foo/file' before adding 'foo'.
In case insensitive index mode, we would stop at a prefixed entry,
treating the provided search key length as a substring, not the
length of the string to match.
Writing a sample Javascript driver pointed out some extra
whitespace handling that needed to be done in the diff driver.
This adds some tests with some sample javascript code that I
pulled off of GitHub just to see what would happen. Also, to
clean up the userdiff test data, I did a "git gc" and packed
up the test objects.
Since I don't have permission yet on the code from Git, I decided
I'd take a stab at writing patterns for PHP and Javascript myself.
I think these are pretty weak, but probably better than the
default behavior without them.
I contacted a number of Git authors and lined up their permission
to relicense their work for use in libgit2 and copied over their
code for diff driver xfuncname patterns. At this point, the code
I've copied is taken verbatim from core Git although Thomas Rast
warned me that the C++ patterns, at least, really need an update.
I've left off patterns where I don't feel like I have permission
at this point until I hear from more authors.
Reorganize the builtin driver table slightly so that core Git
builtin definitions can be imported verbatim. Then take a few of
the core Git drivers and pull them in.
This also creates a test of diffs with the builtin HTML driver
which led to some small error handling fixes in the driver
selection logic.
This extends the diff driver parser to support multiline driver
definitions along with ! prefixing for negated matches. This
brings the driver function pattern parsing in line with core Git.
This also adds an internal table of driver definitions and a
fallback code path that will look in that table for diff drivers
that are set with attributes without having a definition in the
config file. Right now, I just populated the table with a kind
of simple HTML definition that is similar to the core Git def.
Don't try to determine whether the system supports file modes
when putting the tree data in the index during checkout. The tree's
mode is canonical and did not come from stat(2) in the first place.
It's hard or even impossible to correctly free the string buffer
allocated by git_patch_to_str in some circumstances. Drop the function
so people have to use git_patch_to_buf instead - git_buf has a dedicated
destructor.
Returning library-allocated strings from libgit2 works fine on Linux,
but may cause problems on Windows because there is no one C Runtime that
everything links against. With libgit2 not exposing its own allocator,
freeing the string is a gamble.
git_patch_to_str already serializes to a buffer, then returns the
underlying memory. Expose the functionality directly, so callers can use
the git_buf_free function to free the memory later.
The "merge none" (don't automerge) flag was only to aide in
merge trivial tests. We can easily determine whether merge
trivial resulted in a trivial merge or an automerge by examining
the REUC after automerge has completed.
The default merge_file level was XDL_MERGE_MINIMAL, which will
produce conflicts where there should not be in the case where
both sides were changed identically. Change the defaults to be
more aggressive (XDL_MERGE_ZEALOUS) which will more aggressively
compress non-conflicts. This matches git.git's defaults.
Increase testing around reverting a previously reverted commit to
illustrate this problem.
Any well-behaved program should write a descriptive message to the
reflog whenever it updates a reference. Let's make this more prominent
by removing the version without the reflog parameters.
This changes git_signature_dup to actually honor oom conditions raised by
the call to git__strdup. It also aligns it with the error code return
pattern used everywhere else.
Ok, scrap the previous commit. This is the right overflow check that
takes care of 64 bit overflow **and** 32-bit overflow, which needs to be
considered because the pool malloc can only allocate 32-bit elements in
one go.
Note that `git_pool_strdup` cannot really return any error codes,
because the pool doesn't set errors on OOM.
The only place where `giterr_set_oom` is called is in
`git_pool_strndup`, in a conditional check that is always optimized
away. `n + 1` cannot be zero if `n` is unsigned because the compiler
doesn't take wraparound into account.
This check has been removed altogether because `size_t` is not
particularly going to overflow.
This renames git_vector_free_all to the better git_vector_free_deep
and also contains a couple of memory leak fixes based on valgrind
checks. The fixes are specifically: failure to free global dir
path variables when not compiled with threading on and failure to
free filters from the filter registry that had not be initialized
fully.
This adds tests that try canceling an indexer operation from
within the progress callback.
After writing the tests, I wanted to run this under valgrind and
had a number of errors in that situation because mmap wasn't
working. I added a CMake option to force emulation of mmap and
consolidated the Amiga-specific code into that new place (so we
don't actually need separate Amiga code now, just have to turn on
-DNO_MMAP).
Additionally, I made the indexer code propagate error codes more
reliably than it used to.
Clone callbacks can return non-zero values to cancel the clone.
This adds some tests to verify that this actually works and updates
the documentation to be clearer that this can happen and that the
return value will be propagated back by the clone function.
The checkout notify callback behavior on non-zero return values
was not being tested. This adds tests, fixes a bug with positive
values, and clarifies the documentation to make it clear that the
checkout can be canceled via this mechanism.
The callback to supply data chunks could return a negative value
to stop creation of the blob, but we were neither using GIT_EUSER
nor propagating the return value. This makes things use the new
behavior of returning the negative value back to the user.
This changes the behavior of callbacks so that the callback error
code is not converted into GIT_EUSER and instead we propagate the
return value through to the caller. Instead of using the
giterr_capture and giterr_restore functions, we now rely on all
functions to pass back the return value from a callback.
To avoid having a return value with no error message, the user
can call the public giterr_set_str or some such function to set
an error message. There is a new helper 'giterr_set_callback'
that functions can invoke after making a callback which ensures
that some error message was set in case the callback did not set
one.
In places where the sign of the callback return value is
meaningful (e.g. positive to skip, negative to abort), only the
negative values are returned back to the caller, obviously, since
the other values allow for continuing the loop.
The hardest parts of this were in the checkout code where positive
return values were overloaded as meaningful values for checkout.
I fixed this by adding an output parameter to many of the internal
checkout functions and removing the overload. This added some
code, but it is probably a better implementation.
There is some funkiness in the network code where user provided
callbacks could be returning a positive or a negative value and
we want to rely on that to cancel the loop. There are still a
couple places where an user error might get turned into GIT_EUSER
there, I think, though none exercised by the tests.
There are a lot of places that we call git__free on each item in
a vector and then call git_vector_free on the vector itself. This
just wraps that up into one convenient helper function.
This continues auditing all the places where GIT_EUSER is being
returned and making sure to clear any existing error using the
new giterr_user_cancel helper. As a result, places that relied
on intercepting GIT_EUSER but having the old error preserved also
needed to be cleaned up to correctly stash and then retrieve the
actual error.
Additionally, as I encountered places where error codes were not
being propagated correctly, I tried to fix them up. A number of
those fixes are included in the this commit as well.
This adds giterr_user_cancel to return GIT_EUSER and clear any
error message that is sitting around. As a result of using that
in places, we need to be more thorough with capturing errors that
happen inside a callback when used internally. To help with that,
this also adds giterr_capture and giterr_restore so that when we
internally use a foreach-type function that clears errors and
converts them to GIT_EUSER, it is easier to restore not just the
return value, but the actual error message text.
This adds `git_config__lookup_entry` which will look up a key in
a config and return either the entry or NULL if the key was not
present. Optionally, it can either suppress all errors or can
return them (although not finding the key is not an error for this
function). Unlike other accessors, this does not normalize the
config key string, so it must only be used when the key is known
to be in normalized form (i.e. all lower-case before the first dot
and after the last dot, with no invalid characters).
This also adds three high-level helper functions to look up config
values with no errors and a fallback value. The three functions
are for string, bool, and int values, and will resort to the
fallback value for any error that arises. They are:
* `git_config__get_string_force`
* `git_config__get_bool_force`
* `git_config__get_int_force`
None of them normalize the config `key` either, so they can only
be used for internal cases where the key is known to be in normal
format.
The frontend used to look at the file directly, but that's obviously not
the right thing to do. Expose it on the backend and use that function
instead.
git-core only writes to the reflogs of HEAD, refs/heads/ and,
refs/notes/ or if there is already a reflog in place. Adjust our code to
follow these semantics.
When doing copy detection, it is often necessary to include
UNMODIFIED records in the git_diff so they are available as source
records for GIT_DIFF_FIND_COPIES_FROM_UNMODIFIED. Yet in the final
diff, often you will not want to have these UNMODIFIED records.
This adds a flag which marks these UNMODIFIED records for deletion
from the diff list so they will be removed after the rename detect
phase is over.
When FIND_COPIES is used in combination with BREAK_REWRITES for
rename detection, there was a bug where the split MODIFIED delta
was only used as a target for RENAME records and not for COPIED
records. This fixes that, converting the split into a pair of
DELETED and COPIED deltas when that circumstance arises.
Whenever a reference is created or updated, we need to write to the
reflog regardless of whether the user gave us a message, so we shouldn't
leave that to the ref frontend, but integrate it into the backend.
This also eliminates the race between ref update and writing to the
reflog, as we protect the reflog with the ref lock.
As an additional benefit, this reflog append on the backend happens by
appending to the file instead of parsing and rewriting it.
Copy the pointers into temporary vectors instead of assigning them tot
he same array so we don't mess up with someone else's memory by
accident (e.g. by sorting).
The callback-based method of listing remote references dates back to the
beginning of the network code's lifetime, when we didn't know any
better.
We need to keep the list around for update_tips() after disconnect() so
let's make use of this to simply give the user a pointer to the array so
they can write straightforward code instead of having to go through a
callback.
Removing arbitrary refspecs makes things more complex to reason
about. Instead, let the user set the fetch and push refspec list to
whatever they want it to be.
Create a git_branch_iterator type which is equivalent to the foreach but
lets us write loops instead of callbacks.
Since the introduction of git_reference_shorthand(), the added value of
passing the name is reduced.
When the filesystem iterator encounters an error with a file, it
returns the error but because of the cleanup code, it was in some
cases erasing the error message. This uses the giterr_detach API
to make sure that the actual error message is restored after the
cleanup code has been run.
There are a number of cases where it is convenient to be able to
fetch and "claim" the current error string, clearing the error.
This is helpful when you need to call some code that may alter
the error and you want to restore it later on and/or report it via
some other mechanism.
We used to move `data_start` forward, which is wrong as that needs to
point to the beginning of the buffer in order to perform size
calculations.
Introduce a `write_start` variable which indicates where we should start
writing from, which is what the `data_start` was being wrongly reused to
be.
The last commit taught git_checkout_tree to actually do something
meaningfull, when treeish was NULL. This lets us rewrite
git_checkout_head to simply call git_checkout_tree without giving it a
treeish.
In git_checkout_tree, the first check tests if either repo or treeish is
NULL and says that eithor of them has to have a valid value. But there
is no code to handle the treeish == NULL case.
So, do something meaningful in that case: use HEAD instead.
When downloading the default branch due to lack of refspecs, we still
need to write out FETCH_HEAD with the tip we downloaded, unfortunately
with a format that doesn't match what we already have.
This avoids sending our whole history bit by bit to the remote in cases
where there is no common history, just to give up in the end.
The number comes from the canonical implementation.
The correct behaviour when a remote has no refspecs (e.g. a URL from the
command-line) is to download the remote's HEAD. Let's do that.
This fixes#1261.
This was never really working right because we were checking the
wrong flag and not checking it in all the places that we need to
be checking it. I finally got around to writing a test and adding
actual support for it.
Sometimes the static initializer for git_diff_options cannot be
used and since setting them to all zeroes doesn't actually work
quite right, this adds a new helper for that situation.
This also adds an explicit new value to the submodule settings
options to be used when those enums need static initialization.
This changes `git_index_read` to have two modes - a hard index
reload that always resets the index to match the on-disk data
(which was the old behavior) and a soft index reload that uses
the timestamp / file size information and only replaces the index
data if the file on disk has been modified.
This then updates the git_status code to do a soft reload unless
the new GIT_STATUS_OPT_NO_REFRESH flag is passed in.
This also changes the behavior of the git_diff functions that use
the index so that when an index is not explicitly passed in (i.e.
when the functions call git_repository_index for you), they will
also do a soft reload for you.
This intentionally breaks the file signature of git_index_read
because there has been some confusion about the behavior previously
and it seems like all existing uses of the API should probably be
examined to select the desired behavior.
These changes fix the basic problem with GIT_DIFF_REVERSE being
broken for text diffs. The reversed diff entries were getting
added to the git_diff correctly, but some of the metadata was kept
incorrectly in a way that prevented the text diffs from being
generated correctly. Once I fixed that, it became clear that it
was not possible to merge reversed diffs correctly. This has a
first pass at fixing that problem. We probably need more tests
to make sure that is really fixed thoroughly.
Seems that regexp in Mac OS X and Linux were behaving
differently: while in OS X the empty string didn't
match any value, in Linux it was matching all of them,
so the the second fetch refspec was overwritting the
first one, instead of creating a new one.
Using an unmatcheable regular expression solves the
problem (and seems to be portable).
At some moment git_config_delete_entry lost the ability to delete one entry of
a multivar configuration. The moment you had more than one fetch or push
ref spec for a remote you will not be able to save that remote anymore. The
changes in network::remote::remotes::save show that problem.
I needed to create a new git_config_delete_multivar because I was not able to
remove one or several entries of a multivar config with the current API.
Several tries modifying how git_config_set_multivar(..., NULL) behaved were
not successful.
git_config_delete_multivar is very similar to git_config_set_multivar, and
delegates into config_delete_multivar of config_file. This function search
for the cvar_t that will be deleted, storing them in a temporal array, and
rebuilding the linked list. After calling config_write to delete the entries,
the cvar_t stored in the temporal array are freed.
There is a little fix in config_write, it avoids an infinite loop when using
a regular expression (case for the multivars). This error was found by the
test network::remote::remotes::tagopt.
This tells the server that we speak it, but we don't make use of its
extra information to determine if there's a better place to stop
negotiating.
In a somewhat-related change, reorder the capabilities so we ask for
them in the same order as git does.
Also take this opportunity to factor out a fairly-indented portion of
the negotiation logic.
It was there to keep it apart from the one which read in from a file on
disk. This other indexer does not exist anymore, so there is no need for
anything other than git_indexer to refer to it.
While here, rename _add() function to _append() and _finalize() to
_commit(). The former change is cosmetic, while the latter avoids
talking about "finalizing", which OO languages use to mean something
completely different.
When building libgit2 for ia32 architecture on a x64 machine, including
"config.h" without a "common.h" would result the following error:
C:\Program Files\Microsoft SDKs\Windows\v7.1\include\winbase.h(2288): error C2373: 'InterlockedIncrement' : redefinition; different type modifiers [C:\cygwin\home\zcbenz\codes\git-utils\build\libgit2.vcxproj]
C:\Program Files\Microsoft SDKs\Windows\v7.1\include\winbase.h(2295): error C2373: 'InterlockedDecrement' : redefinition; different type modifiers [C:\cygwin\home\zcbenz\codes\git-utils\build\libgit2.vcxproj]
C:\Program Files\Microsoft SDKs\Windows\v7.1\include\winbase.h(2303): error C2373: 'InterlockedExchange' : redefinition; different type modifiers [C:\cygwin\home\zcbenz\codes\git-utils\build\libgit2.vcxproj]
C:\Program Files\Microsoft SDKs\Windows\v7.1\include\winbase.h(2314): error C2373: 'InterlockedExchangeAdd' : redefinition; different type modifiers [C:\cygwin\home\zcbenz\codes\git-utils\build\libgit2.vcxproj]
The user is unable to derive the number of deltas in the pack, as that
would require them to capture the stats exactly in the moment between
download and final processing, which is abstracted away in the fetch.
Capture these numbers for the user and expose them in the progress
struct. The clone and fetch examples now also present this information
to the user.
The names from libssh2 are somewhat obtuse for us. We can simplify the
usual key/passphrase credential's name, as well as make clearer what the
custom signature function is.
It seems that to implement these options, we just have to pass
the appropriate flags through to the libxdiff code taken from
core git. So let's do it (and add a test).
Instead of having functions with so very many parameters to pass
hunk and line data, this takes the existing git_diff_hunk struct
and extends it with more hunk data, plus adds a git_diff_line.
Those structs are used to pass back hunk and line data instead of
the old APIs that took tons of parameters.
Some work that was previously only being done for git_diff_patch
creation (scanning the diff content for exact line counts) is now
done for all callbacks, but the performance difference should not
be noticable.
While the base git_diff_delta structure always contains two files,
when we introduce conflict data, it will be helpful to have an
indicator when an additional file is involved.
Move conflict handling into two steps: load the conflicts and
then apply the conflicts. This is more compatible with the
existing checkout implementation and makes progress reporting
more sane.
If a D/F conflict or rename 2->1 conflict occurs,
we write the file sides as filename~branchname. If
a file with that name already exists in the working
directory, write as filename~branchname_0 instead.
(Incrementing 0 until a unique filename is found.)
This lays groundwork for separating formatting options from diff
creation options. This groups the formatting flags separately
from the diff list creation flags and reorders the options. This
also tweaks some APIs to further separate code that uses patches
from code that just looks at git_diffs.
This makes no functional change to diff but renames a couple of
the objects and splits the new git_patch (formerly git_diff_patch)
into a new header file.
Don't increase the number of total objects, as it can produce
suprising progress output. The only addition compared to pre-thin is
the addition of local_objects to allow an output similar to git's
"completed with %d local objects".
The iconv init was accidentally clearing the default error state
during reference normalization. This resets so that normalization
errors will be detected correctly.
Before these changes, looking up a reference would return the
same precomposed or decomposed form of the reference name that
was used to look it up, so on MacOS which ignores the difference
between the two, a single reference could be looked up either way
and git_reference_name would return the form of the name that was
used to look it up! This change makes lookup always return the
precomposed name if core.precomposeunicode is set regardless of
which version was used to look it up. The reference iterator was
already returning the precomposed form from earlier work.
This also updates the CMakeLists.txt rules for enabling iconv
usage because the clar tests for this code were actually not being
activated properly with the old version.
Finally, this moves git_repository_reset_filesystem from include/
git2/repository.h to include/git2/sys/repository.h since it is not
really a function that normal library users should have to think
about very often.
This cleans up some additional issues. The main change is that
on a filesystem that doesn't support mode bits, libgit2 will now
create new blobs with GIT_FILEMODE_BLOB always instead of being
at the mercy to the filesystem driver to report executable or not.
This means that if "core.filemode" lies and claims that filemode
is not supported, then we will ignore the executable bit from the
filesystem. Previously we would have allowed it.
This adds an option to the new git_repository_reset_filesystem to
recurse through submodules if desired. There may be other types
of APIs that would like a "recurse submodules" option, but this
one is particularly useful.
This also has a number of cleanups, etc., for related things
including trying to give better error messages when problems come
up from the filesystem. For example, the FAT filesystem driver on
MacOS appears to return errno EINVAL if you attempt to write a
filename with invalid UTF-8 in it. We try to capture that with a
better error message now.
There may be multiple deltas referencing the same base as well as OFS
deltas which rely on a thin delta. Deal with both at the same time by
injecting a single object and going back up to the main
delta-resolving loop.
When a tool needs to recreate the tree object (for example an
interface to another VCS), it needs to use the raw attributes,
forgoing any normalization.
When a repository is transferred from one file system to another,
many of the config settings that represent the properties of the
file system may be wrong. This adds a new public API that will
refresh the config settings of the repository to account for the
change of file system. This doesn't do a full "reinitialize" and
operates on a existing git_repository object refreshing the config
when done.
This commit then makes use of the new API in clar as each test
repository is set up.
This commit also has a number of other clar test fixes where we
were making assumptions about the type of filesystem, either based
on outdated config data or based on the OS instead of the FS.
When given an ODB from which to read objects, the indexer will attempt
to inject the missing bases at the end of the pack and update the
header and trailer to reflect the new contents.
Though unusual, a packfile may contain a delta whose base is a delta
that comes later. In order index such a packfile, we must not give up
on the first failure to resolve a delta, but keep it around.
If there is a pass which makes no progress, this indicates that the
packfile is broken, so fail accordingly.
The repo init code was assuming Windows == no filemode, and
Mac or Windows == no case sensitivity. Those assumptions are not
consistently true depending on the mounted file system. This is a
first step to removing those assumptions. It focuses on the repo
init code and the tests of that code. There are still many other
tests that are broken when those assumptions don't hold true, but
this clears up one area of the code.
Also, this moves the core.precomposeunicode logic to be closer to
the current logic in core Git where it will be set to true on any
filesystem where composed unicode is decomposed when read back.
The indexer code was generating warnings on Windows 64-bit. I
looked closely at the logic and was able to simplify it a bit.
Also this fixes some other Windows and Linux warnings.
This adds a simple wrapper around the iconv APIs and uses it
instead of the old code that was inlining the iconv stuff. This
makes it possible for me to test the iconv logic in isolation.
A "no iconv" version of the API was defined with macros so that
I could have fewer ifdefs in the code itself.
This simplifies git_path_is_empty_dir on both Windows (getting rid
of git_buf allocation inside the function) and other platforms (by
just using git_path_direach), and adds tests for the function, and
uses the function to simplify some existing tests.
This hooks up git_path_direach and git_path_dirload so that they
will take a flag indicating if directory entry names should be
tested and converted from decomposed unicode to precomposed form.
This code will only come into play on the Apple platform and even
then, only when certain types of filesystems are used.
This involved adding a flag to these functions which involved
changing a lot of places in the code.
This was an opportunity to do a bit of code cleanup here and there,
for example, getting rid of the git_futils_cleanupdir_r function in
favor of a simple flag to git_futils_rmdir_r to not remove the top
level entry. That ended up adding depth tracking during rmdir_r
which led to a safety check for infinite directory recursion. Yay.
This hasn't actually been tested on the Mac filesystems where the
issue occurs. I still need to get test environment for that.
This doesn't actual do string precompose but it puts the hooks in
place into the iterators and the git_path_dirload function so that
the actual precompose work is ready to go.
This adds initialization of core.precomposeunicode to repo init
on Mac. This is necessary because when a Mac accesses a repo on
a VFAT or SAMBA file system, it will return directory entries in
decomposed unicode even if the filesystem entry is precomposed.
This also removes caching of a number of repo properties from the
repo init pipeline because these are properties of the specific
filesystem on which the repo is created, not of the system as a
whole.
This commit adds cancellation for the push operation. This work consists of:
1) Support cancellation during push operation
- During object counting phase
- During network transfer phase
- Propagate GIT_EUSER error code out to caller
2) Improve cancellation support during fetch
- Handle cancellation request during network transfer phase
- Clear error string when cancelled during indexing
3) Fix error handling in git_smart__download_pack
Cancellation during push is still only handled in the pack building and
network transfer stages of push (and not during packbuilding).
References and their logs are logically coupled, let's make it so in
the code by moving the fs-based reflog implementation to live next to
the fs-based refs one.
As part of the change, make the function take names rather than
references, as only the names are relevant when looking up and
handling reflogs.
The basic clone function is there to make it easy to create a "normal"
clone. Remove a bunch of options that are about changing the remote's
configuration.
The text progress and update_tips callbacks are already part of the
struct, which was meant to unify the callback setup, but the download
one was left out.
This adds the basics of progress reporting during push. While progress
for all aspects of a push operation are not reported with this change,
it lays the foundation to add these later. Push progress reporting
can be improved in the future - and consumers of the API should
just get more accurate information at that point.
The main areas where this is lacking are:
1) packbuilding progress: does not report progress during deltafication,
as this involves coordinating progress from multiple threads.
2) network progress: reports progress as objects and bytes are going
to be written to the subtransport (instead of as client gets
confirmation that they have been received by the server) and leaves
out some of the bytes that are transfered as part of the push protocol.
Basically, this reports the pack bytes that are written to the
subtransport. It does not report the bytes sent on the wire that
are received by the server. This should be a good estimate of
progress (and an improvement over no progress).
The subtransport path was relying on pointing to data owned by
the remote which meant that after a redirect, the updated path
was getting lost for future requests. This updates the http
transport to strdup the path and maintain its own lifetime.
This also pulls responsibility for parsing the URL back into the
http transport and isolates the functions that parse and free that
connection data so that they can be reused between the initial
parsing and the redirect parsing.
On occasion, files can disappear while we're iterating the
filesystem, between calls to readdir and stat. Let's pretend
those didn't exist in the first place.
The git_buf_text_gather_stats call returns a boolean indicating if
the file looks like binary data. That shouldn't be an error; it
should be used to skip CRLF processing though.
This replaces some git_buf_printf calls with simple calls to
git_buf_put instead. Also, it fixes a missing va_end inside
the git_buf_vprintf implementation.
The attempt to "clean up warnings" seems to have introduced some
new warnings on compliant compilers. This fixes those in a way
that I suspect will also be okay for the non-compliant compilers.
Also this fixes what appears to be an extra semicolon in the
repo initialization template dir handling (and as part of that
fix, handles the case where an error occurs correctly).
In revwalk, we are doing a very simple check to see if a string
contains wildcard characters, so a full regular expression match
is not needed.
In remote listing, now that we have git_config_foreach_match with
full regular expression matching, we can take advantage of that
and eliminate the regex here, replacing it with much simpler string
manipulation.
This contains a few bug fixes and some header and API cleanups.
The main API change is that filters should now use GIT_PASSTHROUGH
to indicate that they wish to skip processing a file instead of
GIT_ENOTFOUND.
The bug fixes include a possible out-of-range buffer access in
the ident filter, a filter ordering problem I introduced into the
custom filter tests on Windows, and a filter buf NUL termination
issue that was coming up on Linux.
This adds more tests of filters, including the ident filter when
mixed with custom filters. I was able to combine with the reverse
filter and demonstrate that the order of filter application with
the default priority constants matches the order of core Git.
Also, this fixes two issues in the ident filter: preventing ident
expansion on binary files and avoiding a NULL dereference when
dollar sign characters are found without Id.
Fixed the filter order to match core Git, too.
This test demonstrates an interesting behavior of core Git (which
is totally reasonable and which libgit2 matches, although mostly
by coincidence). If you use the ident filter and commit a file
with a garbage ident in it, like '$Id: this is just garbage$' and
then immediately do a 'git checkout-index' with the new file, Git
will not consider the file out of date and will not overwrite the
file with an updated $Id$. Libgit2 has the same behavior. If you
remove the file and then do a checkout-index, it will be replaced
with a filtered version that has injected the OID correctly.
These are a couple of new clar helpers for testing that a file
has expected contents that I extracted from the checkout code.
Actually wrote this as part of an abandoned earlier attempt at a
new filters API, but it will be useful now for some of the tests
I'm going to write.
I wish MSVC understood that "const char **" is not a const ptr,
but it a non-const pointer to an array of const ptrs. Does that
seem like too much to ask.
Checkout should not reject binary files from filters, as a filter
may actually wish to operate on binary files. The CRLF filter should
reject binary files itself if it wishes to. Moreover, the CRLF
filter requires this logic so that users can emulate the checkout
data in their odb -> workdir filtering.
Conflicts:
src/checkout.c
src/crlf.c
This makes the git_buf struct that was used internally into an
externally available structure and eliminates the git_buffer.
As part of that, some of the special cases that arose with the
externally used git_buffer were blended into the git_buf, such as
being careful about git_buf objects that may have a NULL ptr and
allowing for bufs with a valid ptr and size but zero asize as a
way of referring to externally owned data.
This adds the ident filter (that knows how to replace $Id$) and
tweaks the filter APIs and code so that git_filter_source objects
actually have the updated OID of the object being filtered when
it is a known value.
Extend the git2/sys/filter API with functions to look up a filter
and add it manually to a filter list. This requires some trickery
because the regular attribute lookups and checks are bypassed when
this happens, but in the right hands, it will allow a user to have
granular control over applying filters.
Increasingly there are a number of components that want to do some
cleanup at global shutdown time (at least if there are not going
to be memory leaks). This creates a very simple system of shutdown
hooks that will be invoked by git_threads_shutdown. Right now, the
maximum number of hooks is hardcoded, but since adding a hook is
not a public API, it should be fine and I thought it was better to
start off with really simple code.
This moves the git_filter_list into the public API so that users
can create, apply, and dispose of filter lists. This allows more
granular application of filters to user data outside of libgit2
internals.
This also converts all the internal usage of filters to the public
APIs along with a few small tweaks to make it easier to use the
public git_buffer stuff alongside the internal git_buf.
The filter registry as implemented was too primitive to actually
work once multiple filters were coming into play. This expands
the implementation of the registry to handle multiple prioritized
filters correctly.
Additionally, this adds an "attributes" field to a filter that
makes it really really easy to implement filters that are based
on one or more attribute values. The lookup and even simple value
checking can all happen automatically without custom filter code.
Lastly, with the registry improvements, this fills out the filter
lifecycle callbacks, with initialize and shutdown callbacks that
will be called before the filter is first used and after it is
last invoked. This allows for system-wide initialization and
cleanup by the filter.
This creates include/sys/filter.h with a basic definition of a
git_filter and then converts the internal code to use it. There
are related internal objects (git_filter_list) that we will want
to publish at some point, but this is a first step.
This begins the process of exposing git_filter objects to the
public API. This includes:
* new public type and API for `git_buffer` through which an
allocated buffer can be passed to the user
* new API `git_blob_filtered_content`
* make the git_filter type and GIT_FILTER_TO_... constants public
Unfortunately git-core uses the term "unborn branch" and "orphan
branch" interchangeably. However, "orphan" is only really there for
the checkout command, which has the `--orphan` option so it doesn't
actually create the branch.
Branches never have parents, so the distinction of a branch with no
parents is odd to begin with. Crucially, the error messages deal with
unborn branches, so let's use that.
As the include depth increases, the chance of a realloc
increases. This means that whenever we run git_array_alloc() or call
config_parse(), we need to remember what our reader's index is so we
can look it up again.
We need to refresh the variables from the included files if they are
changed, so loop over all included files and re-parse the files if any
of them has changed.
Now that #1785 is merged, git_odb_stream_finalize_write() calculates the object id before invoking the odb backend.
This commit gives a chance to the backend to check if it already knows this object.
The GIT_MODE_TYPE macro was looking at all bits above the
permissions, but it should really just look at the top bits so
that it will give the right results for a setgid or setuid entry.
Since we're now using these macros in the tests, this was causing
a test failure on platforms that don't support setgid.
This adds some more macros for some standard operations on file
modes, particularly related to permissions, and then updates a
number of places around the code base to use the new macros.
In order to support config includes, we must differentiate between the
backend's main file and the file we are currently parsing.
This lays the groundwork for includes, keeping the current behaviours.
Previously, `git_object_read()`, `git_object_read_prefix()` and
`git_object_exists()` were implementing an auto refresh logic. When the
expected object couldn't be found in any backend, a call to
`git_odb_refresh()` was triggered and the lookup was once again performed
against all backends.
This commit removes this auto-refresh logic from the odb layer and pushes
it down into the pack-backend (as it's the only one currently exposing
a `refresh()` endpoint).
This simplifies the git_repository_is_empty a bit so that a
detached HEAD is just taken to mean the repo is not empty, since
a newly initialized repo will not have a detached HEAD.
Ensure that we apply splits to rewrites, even if we're not
interested in examining it closely for rename/copy detection.
In keeping with core git, status should not display rewrites,
it should simply show files as "modified".
In order to be loaded, a remote needs to be configured with at least a `url` or a `pushurl`.
ENOTFOUND will be returned when trying to git_remote_load() a remote with neither of these entries defined.
This loads SRWLock APIs at runtime and in their absence (i.e. on
Windows before Vista) falls back on a regular CRITICAL_SECTION
that will not permit concurrent readers.
9e9aee6 added an include <netinet/in.h> to fix the build on FreeBSD.
Sometime since then the same header is included ifndef _WIN32, so
remove the duplicate include.
This converts an internal lock from a write lock to a read lock
where write isn't needed, and also clarifies some doc things about
where various locks are acquired and how various APIs are intended
to be used.
This adds thread safety to the refdb_fs by using the new
git_sortedcache object and also by relaxing the handling of some
filesystem errors where the fs may be changed out from under us.
This also adds some new threading tests that hammer on the refdb.
The refdb_fs implementation calls realloc directly on a reference
object when it wants to rename it. It is not a public object, so
this doesn't mess with the immutability of references, but it does
assume certain constraints on the reference representation. This
commit wraps that assumption in an isolated API to isolate it.
This adds a convenient new data type for caching the contents of
file in memory when each item in that file corresponds to a name
and you need to both be able to lookup items by name and iterate
over them in some sorted order. The new data type has locks in
place to manage usage in a threaded environment.
If there were symbolic refs among the loose refs then the code
to create packed-refs would fail trying to parse the OID out of
them (where Git just skips trying to pack them). This fixes it.
When a git_buf contains a UTF-8 BOM, the three bytes comprising
that BOM are treated as unprintable characters. For a small git_buf,
the three BOM characters overwhelm the printable characters. This
is problematic when trying to check out a small file as the CR/LF
filtering will not apply.
p_inet_pton on Windows should set errno properly for callers.
Rewrite p_inet_pton to handle error cases correctly and add
test cases to exercise this function.
Report the index being locked with its own error code in order to be
able to differentiate, as a locked index is typically the result of a
crashed process or concurrent access, both of which often require user
intervention to fix.
If none of the backends support direct writes and we must stream the
whole file, we already know what the object's id should be; so use the
stream's functions directly, bypassing the frontend's hashing and
overwriting of our existing id.
The frontend is in charge of calculating the id of the objects. Thus
the backends should treat it as a read-only value. The positioning in
the function signature made it seem as though it was an output
parameter.
Make the id const and move it from the front to behind the subject
(backend or stream).
When dealing with a chain of tags, we need to enqueue each of them
individually, which means we can't use `git_tag_peel` as that jumps
over the intermediate tags.
Do the peeling manually so we can look at each object and take the
appropriate action.
Hash the data as it's coming into the stream and tell the backend what
its name is when finalizing the write. This makes it consistent with
the way a plain git_odb_write() performs the write.
This is in preparation for moving the hashing to the frontend, which
requires us to handle the incoming data before passing it to the
backend's stream.
This fixes a small memory leak in git_revparse where early returns on
errors from git_revparse_single cause a free() on the (reallocated) left
side of the revspec to be skipped.
Accept any value for the remote's url, including an empty string which
we used to reject as invalid configuration.
This is not quite what git does (although it has its own problems with
such configurations) and it makes it harder to fix the issue, by not
letting the user modify it.
As we already need to check for a valid URL when we try to connect to
the network, let that perform the check, as we don't need to do it
anywhere else.
This reverts refactoring done in 13224ea4aa
that introduces a performance regression for NFS when reading files that
don't exist. open() forces a cache invalidation on NFS, while stat()ing a
file just uses the cache and is very quick.
To give a specific example, say you have a repo with a thousand packed
refs. Before this change, looking up every single one ould incur a thousand
slow open() calls. With this change, it's a thousand fast stat() calls.
This is just a bunch of small fixes that I noticed while looking
at the UTF8 and UTF16 path stuff. It fixes a slowdown in looking
for an empty directory (not exiting loop asap), makes the dir name
in the git__DIR structure be a GIT_FLEX_ARRAY to save an allocation,
and fixes some slightly odd assumptions in the cl_getenv helper.
Key-based authentication also needs an username, so include it in each
one.
Also stop assuming a default username of "git" in the ssh transport
which has no business making such a decision.
The routines to push and pop ignore files while traversing a
directory had some issues. In particular, setting up the initial
list would sometimes push an ignore file before it ought to be
applied if the starting path was a directory containing an ignore
file. Also, the pop function was not always matching the right
part of the path and would fail to pop ignores from the list in
some cases.
This adds some tests that exercise a particular problematic case
and then fixes the problems that I could find related to this.
At some point, I'd like to isolate this ignore rule management
code and rewrite it, but that's a larger project and right now,
I'll opt to just try to fix the broken behaviors.
This rolls back the changes to fnmatch parsing from commit
2e40a60e84 except for the tests
that were added. Instead this adds couple of new flags that can
be passed in when attempting to parse an fnmatch pattern. Also,
this changes the pathspec match logic to special case matching a
filename with a '!' prefix against a negative pattern.
This fixes the build.
`git_config_set_string(config, "config.section", "")` fails when
escaping the value.
The buffer in `escape_value` is allocated without NULL-termination. And
in case of empty string 0 is passed for buffer size in `git_buf_grow`.
`git_buf_detach` returns NULL when the allocated size is 0 and that
leads to an error return in `GITERR_CHECK_ALLOC` called after
`escape_value`
The change in `config_file.c` was suggested by Russell Belfer <rb@github.com>
new functions in struct git_config_backend:
* iterator_new(...)
* iterator_free(...)
* next(...)
The old callback based foreach style can still be used with `git_config_backend_foreach_match`
This step is needed to easily add iterators to git_config_backend
As well use these new git_strmap functions to implement foreach
* git_strmap_iter
* git_strmap_has_data(...)
* git_strmap_begin(...)
* git_strmap_end(...)
* git_strmap_next(...)
In git_diff_paired_foreach, temporarily resort the
index->workdir diff list by index path so that we can
track a rename in the workdir from head->index->workdir.
When using a rename source that is actually a to-be-split record,
we have to update the best-fit mapping data in both the case where
the target is also a split record and the case where the target
is a simple added record. Before this commit, we were only doing
the update when the target was itself a split record (and even in
that case, the test was slightly wrong).
After doing further profiling, I found that a lot of time was
being spent attempting to insert hashes into the file hash
signature when using the rolling hash because the rolling hash
approach generates a hash per byte of the file instead of one
per run/line of data.
To optimize this, I decided to convert back to a run-based file
signature algorithm which would be more like core Git.
After changing this, a number of the existing tests started to
fail. In some cases, this appears to have been because the test
was coded to be too specific to the particular results of the file
similarity metric and in some cases there appear to have been bugs
in the core rename detection code where only by the coincidence
of the file similarity scoring were the expected results being
generated.
This renames all the variables in the core rename detection code
to be more consistent and hopefully easier to follow which made it
a bit easier to reason about the behavior of that code and fix the
problems that I was seeing. I think it's in better shape now.
There are a couple of tests now that attempt to stress test the
rename detection code and they are quite slow. Most of the time
is spent setting up the test data on disk and in the index. When
we roll out performance improvements for index insertion, it
should also speed up these tests I hope.
The size data in the index may not reflect the actual size of the
blob data from the ODB when content filtering comes into play.
This commit fixes rename detection to use the actual blob size when
calculating data signatures instead of the value from the index.
Because of a misunderstanding on my part, I first converted the
git_index_add_bypath API to use the post-filtered blob data size
in creating the index entry. I backed that change out, but I
kept the overall refactoring of that routine and the new internal
git_blob__create_from_paths API because it eliminates an extra
stat() call from the code that adds a file to the index.
The existing tests actually cover this code path, at least when
running on Windows, so at this point I'm not adding new tests to
cover the changes.
The previous fix for checking file sizes with rename detection
always loads the blob. In this version, if the odb backend can
get the object header without loading the whole thing into memory,
then we'll just use that, so that we can eliminate possible rename
sources & targets without loading them.
The performance improvements I introduced for rename detection
were not able to run successfully for tree-to-tree diffs because
the blob size was not known early enough and so the file signature
always had to be calculated nonetheless.
This change separates loading blobs into memory from calculating
the signature. I can't avoid having to load the large blobs into
memory, but by moving it forward, I'm able to avoid the signature
calculation if the blob won't come into play for renames.
This restores the usage of GIT_DIFF_LINE_BINARY for the diff
output line that reads "Binary files x and y differ" so that it
can be optionally colorized independently of the file header.
This allows git_diff_patch_size to account for hunk headers and
file headers in the returned size. This required some refactoring
of the code that is used to print file headers so that it could be
invoked by the git_diff_patch_size API.
Also this increases the test coverage and fixes an off-by-one bug
in the size calculation when newline changes happen at the end of
the file.
Instead of using lots of strdup calls, this adds a memory pool to
the loose refs iteration code and uses it for keeping track of the
loose refs array. Memory usage could probably be reduced even
further by eliminating the vector and just scanning by adding the
strlen of each ref, but that would be a more intrusive changes.
This also updates the error handling to be more thorough about
checking for failed allocations, etc.
The git_reference_next API silently skips invalid references when
scanning the loose refs. The git_reference_next_name API should
skip the same ones even though it isn't creating the reference
object.
This adds a test with a an invalid loose reference and makes sure
that both APIs skip the same entries and generate the same results.
This makes git__swap use the __sync_lock_test_and_set primitive
with GCC and the InterlockedExchangePointer primitive with MSVC.
Previously is used compare_and_swap in a way that was probably
unintuitive for most thinking (i.e. it could fail to swap in the
value if another thread raced in). Now it will always succeed
and the last thread to run in a race will win instead of the
first thread.
This also fixes up a little confusion between volatile void **
and void * volatile * that came up with the Win32 compiler.
This restores a behavior that was accidentally lost during some
diff refactoring where an untracked directory that contains a .git
item should be treated as IGNORED, not as UNTRACKED. The submodule
code already detects this, but the diff code was not handling the
scenario right.
This also updates a number of existing tests that were actually
exercising the behavior but did not have the right expectations in
place. It actually makes the new
`test_diff_submodules__diff_ignore_options` test feel much better
because the "not-a-submodule" entries are now ignored instead of
showing up as untracked items.
Fixes#1697
This adds correct support for an equivalent to --ignore-submodules
in diff, where an actual ignore value can be passed to diff to
override the per submodule settings in the configuration.
This required tweaking the constants for ignore values so that
zero would not be used and could represent an unset option to the
diff. This was an opportunity to move the submodule values into
include/git2/types.h and to rename the poorly named DEFAULT values
for ignore and update constants to RESET instead.
Now the GIT_DIFF_IGNORE_SUBMODULES flag is exactly the same as
setting the ignore_submodules option to GIT_SUBMODULE_IGNORE_ALL
(which is actually a minor change from the old behavior in that
submodules will now be treated as UNMODIFIED deltas instead of
being left out totally - if you set GIT_DIFF_INCLUDE_UNMODIFIED).
This includes tests for the various new settings.
Submodules now expose an internal status API that allows diff to
get back the OID values from the submodule very easily and also
to avoiding caching issues and to override the ignore setting for
the submodule.
This fixes the way that submodule status is checked to bypass just
about all of the caching in the submodule object. Based on the
ignore value, it will try to do the minimum work necessary to find
the current status of the submodule - but it will actually go to
disk to get all of the current values.
This also removes the custom refcounting stuff in favor of the
common git_refcount style. Right now, it is still for internal
purposes only, but it should make it easier to add true submodule
refcounting in the future with a public git_submodule_free call
that will allow bindings not to worry about the submodule object
getting freed from underneath them.
This adds a BARE option to git_repository_open_ext which allows
a fast open path that still knows how to read gitlinks and to
search for the actual .git directory from a subdirectory.
`git_repository_open_bare` is still simpler and faster, but having
a gitlink aware fast open is very useful for submodules where we
want to quickly be able to peek at the HEAD and index data without
doing any other meaningful repo operations.
This is probably not the final form of this change, but this is
a preliminary version of checking a timestamp to see if the cached
working directory HEAD OID matches the current. Right now, this
uses the timestamp on the index and is, like most of our timestamp
checking, subject to having only second accuracy.
This adds an additional pathspec API that will match a pathspec
against a diff object. This is convenient if you want to handle
renames (so you need the whole diff and can't use the pathspec
constraint built into the diff API) but still want to tell if the
diff had any files that matched the pathspec.
When the pathspec is matched against a diff, instead of keeping
a list of filenames that matched, instead the API keeps the list
of git_diff_deltas that matched and they can be retrieved via a
new API git_pathspec_match_list_diff_entry.
There are a couple of other minor API extensions here that were
mostly for the sake of convenience and to reduce dependencies
on knowing the internal data structure between files inside the
library.
This is a simple bit vector object that is not resizable after
the initial allocation but can be of arbitrary size. It will
keep the bti vector entirely on the stack for vectors 64 bits
or less, and will allocate the vector on the heap for larger
sizes. The API is uniform regardless of storage location.
This is very basic right now and all the APIs are inline functions,
but it is useful for storing an array of boolean values.
This converts the array of parent SHAs from a git_vector where
each SHA has to be separately allocated to a git_array_t where
all the SHAs can be kept in one block. Since the two collections
have almost identical APIs, there isn't much involved in making
the change. I did add an API to git_array_t so that it could be
allocated at a precise initial size.
This fixes the way the example log program decides if a merge
commit should be shown when a pathspec is given. Also makes it
easier to use the pathspec API to just check "does a tree match
anything in the pathspec" without allocating a match list.
This adds a new public API for compiling pathspecs and matching
them against the working directory, the index, or a tree from the
repository. This also reworks the pathspec internals to allow the
sharing of code between the existing internal usage of pathspec
matching and the new external API.
While this is working and the new API is ready for discussion, I
think there is still an incorrect behavior in which patterns are
always matched against the full path of an entry without taking
the subdirectories into account (so "s*" will match "subdir/file"
even though it wouldn't with core Git). Further enhancements are
coming, but this was a good place to take a functional snapshot.
The SSH error checking and reporting could still be further
improved by using the libssh2 native methods to get error info,
but at least this ensures that all error codes are checked and
translated into libgit2 error messages.
If there is not an error, the return value was always the return value
of the last call to file->get_multivar
With this commit GIT_ENOTFOUND is only returned if all the calls to
filge-get_multivar return GIT_ENOTFOUND.
This makes all of the credential objects use the same pattern to
clear the contents and call git__memzero when done. Much of this
information is probably not sensitive, but it also seems better
to just clear consistently.
Much of the SSH credential creation API can be left enabled even
on platforms with no SSH support. We really just have to give an
error when you attempt to open the SSH connection.
The diff hunk context string that is returned to xdiff need not
be NUL terminated because the xdiff code just copies the number of
bytes that you report directly into the output. There was an off
by one in the diff driver code when the header context was longer
than the output buffer size, the output buffer length included
the NUL byte which was copied into the hunk header.
Fixes#1710
This option serves no benefit now that the git_status_list API
is available. It was of questionable value before and now it
would just be a bad idea to use it rather than the indexed API.
The index isn't really thread safe for the most part, but we can
easily be more careful and avoid double frees and the like, which
are serious problems (as opposed to a lookup which might return
the incorrect value but if the index in being updated, that is
much harder to avoid).
In both of these cases, the submodule data should still be loaded
just (obviously) without the data that comes from either the index
or the HEAD.
This fixes a bug in the orphaned head case.
There was a bug where submodules whose HEAD had not been moved
were being marked as having an UNMODIFIED delta record instead
of being left MODIFIED. This fixes that and fixes the tests to
notice if a submodule has been incorrectly marked as UNMODIFIED.
In theory, p_stat should never return an S_ISLNK result, but due
to the current implementation on Windows with mount points it is
possible that it will. For now, work around that by allowing a
link in the path to a directory being created. If it is really a
problem, then the issue will be caught on the next iteration of
the loop, but typically this will be the right thing to do.
This updates the calls that make the subdirectories for objects
to use a base directory above which git_futils_mkdir won't walk
any higher. This prevents attempts to mkdir all the way up to
the root of the filesystem.
Also, this moves the objects_dir into the loose backend structure
and removes the separate allocation, plus does some preformatting
of the objects_dir value to guarantee a trailing slash, etc.
With the new target directory option to checkout, the non-bareness
of the repository should be checked much later in the parameter
validation process - actually that check was already in place, but
I was doing it redundantly in the checkout APIs.
This removes the now unnecessary early check for bare repos. It
also adds some other parameter validation and makes it so that
implied parameters can actually be passed as NULL (i.e. if you
pass a git_index, you don't have to pass the git_repository - we
can get it from index).
This adds the ability for checkout to write to a target directory
instead of having to use the working directory of the repository.
This makes it easier to do exports of repository data and the like.
This is similar to, but not quite the same as, the --prefix option
to `git checkout-index` (this will always be treated as a directory
name, not just as a simple text prefix).
As part of this, the workdir iterator was extended to take the
path to the working directory as a parameter and fallback on the
git_repository_workdir result only if it's not specified.
Fixes#1332
This fixes the checkout case when a file is modified between the
baseline and the target and yet missing in the working directory.
The logic for that case appears to have been wrong.
This also adds a useful checkout notify callback to the checkout
test helpers that will count notifications and also has a debug
mode to visualize what checkout thinks that it's doing.
Files in status will, be default, be sorted according to the case
insensitivity of the filesystem that we're running on. However,
in some cases, this is not desirable. Even on case insensitive
file systems, 'git status' at the command line will generally use
a case sensitive sort (like 'ls'). Some GUIs prefer to display a
list of file case insensitively even on case-sensitive platforms.
This adds two new flags: GIT_STATUS_OPT_SORT_CASE_SENSITIVELY
and GIT_STATUS_OPT_SORT_CASE_INSENSITIVELY that will override the
default sort order of the status output and give the user control.
This includes tests for exercising these new options and makes
the examples/status.c program emulate core Git and always use a
case sensitive sort.
This adds some tests for updating the index and having it remove
items to make sure that the iteration over the index still works
even as earlier items are removed.
In testing with valgrind, this found a path that would use the
path string from the index entry after it had been freed. The
bug fix is simply to copy the path of the index entry before
doing any actual index manipulation.
This adds three new public APIs for manipulating the index:
1. `git_index_add_all` is similar to `git add -A` and will add
files in the working directory that match a pathspec to the
index while honoring ignores, etc.
2. `git_index_remove_all` removes files from the index that match
a pathspec.
3. `git_index_update_all` updates entries in the index based on
the current contents of the working directory, either added
the new information or removing the entry from the index.
Command line Git sometimes generates an error message if given a
pathspec that contains an exact match to an ignored file (provided
--force isn't also given). This adds an internal function that
makes it easy to check it that has happened. Right now, I'm not
creating a public API for this because that would get a little
more complicated with a need for callbacks for all invalid paths.
Right now, setting up a pathspec to be parsed and processed
requires several data structures and a couple of API calls. This
adds a new high level data structure that contains all the items
that you'll need and high-level APIs that do all of the setup and
all of the teardown. This will make it easier to use pathspecs
in various places with less repeated code.
This makes the diff rename tracking code more careful about the
order in which it processes renames and more thorough in updating
the mapping of correct renames when an earlier rename update
alters the index of a later matched pair.
This adds parameters to the four functions that allow for blob-to-
blob and blob-to-buffer differencing (either via callbacks or by
making a git_diff_patch object). These parameters let you say
that filename we should pretend the blob has while doing the diff.
If you pass NULL, there should be no change from the existing
behavior, which is to skip using attributes for file type checks
and just look at content. With the parameters, you can plug into
the new diff driver functionality and get binary or non-binary
behavior, plus function context regular expressions, etc.
This commit also fixes things so that the git_diff_delta that is
generated by these functions will actually be populated with the
data that we know about the blobs (or buffers) so you can use it
appropriately. It also fixes a bug in generating patches from
the git_diff_patch objects created via these functions.
Lastly, there is one other behavior change that may matter. If
there is no difference between the two blobs, these functions no
longer generate any diff callbacks / patches unless you have
passed in GIT_DIFF_INCLUDE_UNMODIFIED. This is pretty natural,
but could potentially change the behavior of existing usage.
This changes the behavior of the status RENAMED flags so that they
will be combined with the MODIFIED flags if appropriate. If a file
is modified in the index and also renamed, then the status code
will have both the GIT_STATUS_INDEX_MODIFIED and INDEX_RENAMED bits
set. If it is renamed but the OID has not changed, then just the
GIT_STATUS_INDEX_RENAMED bit will be set. Similarly, the flags
GIT_STATUS_WT_MODIFIED and GIT_STATUS_WT_RENAMED can both be set
independently of one another.
This fixes a serious bug where the check for unmodified files that
was done at data load time could end up erasing the RENAMED state
of a file that was renamed with no changes.
Lastly, this contains a bunch of new tests for status with renames,
including tests where the only rename changes are case changes.
The expected results of these tests have to vary by whether the
platform uses a case sensitive filesystem or not, so the expected
data covers those platform differences separately.
Trees are always case sensitive. The index is always case
preserving and will be case sensitive when it is turned into a
tree. Therefore the tree and the index can and should always
be compared to one another case sensitively.
This will restore the index to case insensitive order after the
diff has been generated.
Consider this a short-term fix. The long term fix is to have the
index always stored both case sensitively and case insensitively
(at least on platforms that sometimes require case insensitivity).
In a case insensitive index, if you attempt to add a file from
disk with a different case pattern, the old case pattern in the
index should be preserved.
This fixes that (and a couple of minor warnings).
This commit reinstates some changes to git_diff__paired_foreach
that were discarded during the rebase (because the diff_output.c
file had gone away), and also adjusts the case insensitively
logic slightly to hopefully deal with either mismatched icase
diffs and other case insensitivity scenarios.
This makes diff more careful about picking the canonical path
when generating a delta so that it won't accidentally pick up a
case-mismatched path on a case-insensitive file system. This
should make sure we use the "most accurate" case correct version
of the path (i.e. from the tree if possible, or the index if
need be).
The exclude submodules flag was not doing the right thing, in
that a file with no diff between the head and the index and just
a delete in the workdir could be excluded if submodules were
excluded.
On Linux: fix a warning message related to the volatile qualifier (cast)
On Windows: use SecureZeroMemory()
On both, inline the call, so that no entry point can lead back to this "secure" memory zeroing.
This makes the git_diff_patch definition private to diff_patch.c
and fixes a number of other header file naming inconsistencies to
use `git_` prefixes on functions and structures that are shared
between files.
This changes the size data to uint32_t, fixes the array growth
logic to use a simple 1.5x multiplier, and uses a generic inline
function for growing the array to make the git_array_alloc API
feel more natural (i.e. it returns a pointer to the new item).
This adds two new public APIs: git_diff_patch_from_blobs and
git_diff_patch_from_blob_and_buffer, plus it refactors the code
for git_diff_blobs and git_diff_blob_to_buffer so that they code
is almost entirely shared between these APIs, and adds tests for
the new APIs.
This implements the loading of regular expression pattern lists
for diff drivers that search for function context in that way.
This also changes the way that diff drivers update options and
interface with xdiff APIs to make them a little more flexible.
There are all sorts of misconfiguration in the wild. We already rely
on the signature constructor to trim SP. Extend the logic to use
`isspace` to decide whether a character should be trimmed.
This is a significant reorganization of the diff code to break it
into a set of more clearly distinct files and to document the new
organization. Hopefully this will make the diff code easier to
understand and to extend.
This adds a new `git_diff_driver` object that looks of diff driver
information from the attributes and the config so that things like
function content in diff headers can be provided. The full driver
spec is not implemented in the commit - this is focused on the
reorganization of the code and putting the driver hooks in place.
This also removes a few #includes from src/repository.h that were
overbroad, but as a result required extra #includes in a variety
of places since including src/repository.h no longer results in
pulling in the whole world.
There are two places where git_futils_mkdir should exit early or
at least do less. The first is when using GIT_MKDIR_SKIP_LAST
and having that flag leave no directory left to create; it was
being handled previously, but the behavior was subtle. Now I put
in a clear explicit check that exits early in that case.
The second is when there is no directory to create, but there is
a valid path that should be verified. I shifted the logic a bit
so we'll be better about not entering the loop than that happens.
This implements a basic callback to extract function context for
a diff. It always uses the same search heuristic right now with
no regular expressions or language-specific variants. Those will
come next, I think.
This makes sure that git_futils_mkdir always skips over the root
directory at a minimum, even on platforms where the root is not
simply '/'. Also, this removes the GIT_WIN32 ifdef in favor of
making EACCES as a potentially recoverable error on all platforms.
We ran into an issue where cloning a repository to a folder
directly underneath the root of a volume (e.g. 'd:\libgit2')
would fail with an access denied error. This was traced down
to a call to make a directory that is the root (e.g. 'd:') could
return an error indicated access denied instead of an error
indicating the path already exists. This change now handles
the access denied error on Win32 and checks for the existence
of the folder.
git doesn't do that, and it's not something that's usually
actionable to fix. if you have a git repository with one bad
timezone in the history, it's too late to change it most likely.
Instead of just blowing away the stat cache data when loading a
new tree into the index, this checks if each loaded item has a
corresponding existing item with the same OID and if so, copies
the stat data from the old item to the new one so it will not be
blown away.
It is obviously quite a serious problem if this happens, but mutex
initialization can fail and we should detect it. It's a bit like
a memory allocation failure, in that you're probably pretty screwed
if this occurs, but at least we'll catch it.
By zeroing out the memory when we free larger objects (i.e. those
that serve as collections of other data, such as repos, odb, refdb),
I'm hoping that it will be easier for libgit2 bindings to find
errors in their object management code.
1. internal iterators now return GIT_ITEROVER when you go past the
last item in the iteration.
2. git_iterator_advance will "advance" to the first item in the
iteration if it is called immediately after creating the
iterator, which allows a simpler idiom for basic iteration.
3. if git_iterator_advance encounters an error reading data (e.g.
a missing tree or an unreadable file), it returns the error
but also attempts to advance past the invalid data to prevent
an infinite loop.
Updated all tests and internal usage of iterators to account for
these new behaviors.
`lpExitCode` is a pointer to a long. A long is 32 bits wide on Windows.
It means that on Windows 64bits, `GetExitCodeThread()` doesn't set/clear the high-order bytes of the 64 bits memory space pointed at by `value_ptr`.
git_packbuilder_write() used to write a packfile to the passed file
path. Instead, ask for a destination directory and create both the
packfile and an index, as most users probably do expect.
This adds ~/ prefix expansion for the value of core.attributesfile
and core.excludesfile, plus it fixes the fact that the attributes
cache was holding on to the string data from the config for a long
time (instead of making its own strdup) which could have caused a
problem if the config was refreshed. Adds a test for the new
expansion capability.
This improves the docs for GIT_DIFF_INCLUDE_UNTRACKED_CONTENT as
well as the other flags related to UNTRACKED items in diff, plus
it makes that flag now automatically turn on
GIT_DIFF_INCLUDE_UNTRACKED which seems like a reasonable dwim type
of change.
The GIT_CONFIG_LEVEL constants actually work well as an enum
because they are mutually exclusive, so this adds a typedef to
the enum and uses that everywhere that one of these constants are
expected, instead of the old code that typically used an unsigned
int.
This adds docs for the cache control options to git_libgit2_opts
and also tweaks the cache code so that if the cache is disabled,
then the next time we attempt to insert something into the cache
in question, we will actually clear any old cached objects.
In theory, if there was a problem reading the REUC data, the
read_reuc() routine could have left uninitialized and invalid
data in the git_index vector. This moves the line that inserts a
new entry into the vector down to the bottom of the routine so we
know all the content is already valid. Also, per @linquize, this
uses calloc to ensure no uninitialized data.
This extends the rename tests to make sure that every rename
scenario in the inner loop of git_diff_find_similar is actually
exercised. Also, fixes an incorrect assert that was in one of
the clauses that was not previously being exercised.
This moves the GIT_CVAR_ABBREV lookup out of the loop. Also, this
fixes git_diff_print_raw to actually use that constant instead of
hardcoding 7 characters.
This adds a couple more tests of different rename scenarios.
Also, this fixes a problem with the case where you have two
"split" deltas and the left half of one matches the right half of
the other. That case was already being handled, but in the wrong
order in a way that could result in bad output. Also, if the swap
also happened to put the other two halves into the correct place
(i.e. two files exchanged places with each other), then the second
delta was left with the SPLIT flag set when it really should be
cleared.
This flips rename detection around so instead of creating a
forward mapping from deltas to possible rename targets, instead
it creates a reverse mapping, looking at possible targets and
trying to find a source that they could have been renamed or
copied from. This is important because each output can only
have a single source, but a given source could map to multiple
outputs (in the form of COPIED records).
Additionally, this makes a couple of tweaks to the public rename
detection APIs, mostly renaming a couple of options that control
the behavior to make more sense and to be more like core Git.
I walked through the tests looking at the exact results and
updated the expectations based on what I saw. The new code is
different from the old because it cannot give some nonsense
results (like A was renamed to both B and C) which were part of
the outputs previously.
This adds a bunch more rename detection tests including checks
vs the working directory, the new exact match options, some more
whitespace variants, etc.
This also adds a git_futils_writebuffer helper function and uses
it in checkout. This is mainly added because I wanted an easy
way to write out a git_buf to disk inside my test code.
- Add new GIT_DIFF_FIND_EXACT_MATCH_ONLY flag to do similarity
matching without using the similarity metric (i.e. only compare
the SHA).
- Clean up the similarity measurement code to more rigorously
distinguish between files that are not similar and files that
are not comparable (previously, a 0 could either mean that the
files could not be compared or that they were totally different)
- When splitting a MODIFIED file into a DELETE/ADD pair, actually
make a DELETED/UNTRACKED pair if the right side of the diff is
from the working directory. This prevents an odd mix of ADDED
and UNTRACKED files on workdir diffs.
There are a number of bugs in the rename code that only were
obvious when I started testing it against large old repos with
more complex patterns. (The code to do that testing is not ready
to merge with libgit2, but I do plan to add more thorough tests.)
This contains a significant number of changes and also tweaks the
public API slightly to make emulating core git easier.
Most notably, this separates the GIT_DIFF_FIND_AND_BREAK_REWRITES
flag into FIND_REWRITES (which adds a self-similarity score to
every modified file) and BREAK_REWRITES (which splits the modified
deltas into add/remove pairs in the diff list). When you do a raw
output of core git, rewrites show up as M090 or such, not at A and
D output, so I wanted to be able to emulate that.
Publicly, this also changes the flags to be uint16_t since we
don't need values out of that range.
Internally, this contains significant changes from a number of
small bug fixes (like using the wrong side of the diff to decide
if the object could be found in the ODB vs the workdir) to larger
issues about which files can and should be compared and how the
various edge cases of similarity scores should be treated.
Honestly, I don't think this is the last update that will have to
be made to this code, but I think this moves us closer to correct
behavior and I tried to document the code so it would be easier
to follow..
I frequently want to the the first N digits of an OID formatted
as a string and I'd like it to be efficient. This function makes
that easy and I could rewrite the OID formatters in terms of it.
Expose a way to retrieve, along with the target git_object, the reference
pointed at by some revparse expression (`@{<-n>}` or
`<branchname>@{upstream}` syntax).
In theory, if there was a problem reading the REUC data, the
read_reuc() routine could have left uninitialized and invalid
data in the git_index vector. This moves the line that inserts a
new entry into the vector down to the bottom of the routine so we
know all the content is already valid. Also, per @linquize, this
uses calloc to ensure no uninitialized data.
This adds an example implementation that emulates git cat-file.
It is a convenient and relatively simple example of getting data
out of a repository.
Implementing this also revealed that there are a number of APIs
that are still not using const pointers to objects that really
ought to be. The main cause of this is that `git_vector_bsearch`
may need to call `git_vector_sort` before doing the search, so a
const pointer to the vector is not allowed. However, for tree
objects, with a little care, we can ensure that the vector of
tree entries is always sorted and allow lookups to take a const
pointer. Also, the missing const in commit objects just looks
like an oversight.
Since I added the GIT_IDXENTRY_STAGE macro to extract the stage
from a git_index_entry, we probably don't need an internal inline
function to do the same thing.
Under some strange circumstances, diffs can end up listing files
that we can't actually open successfully. Instead of aborting
the git_diff_find_similar, this makes it so that those files just
won't be considered as valid rename/copy targets instead.
It is possible for there to be a submodule in a repository with
no .gitmodules file (for example, if the user forgot to commit
the .gitmodules file). In this case, core Git will just create
an empty directory as a placeholder for the submodule but
otherwise ignore it. We were generating an error and stopping
the checkout. This makes our behavior match that of core git.
Unlike blob updates, symlink updates cannot be done "in place"
writing over an old symlink. This means that in checkout when we
realize that we can safely update a symlink, we still need to
remove the old one before writing the new.
When the last item in a diff was an untracked directory that only
contained ignored items, the loop to scan the contents would run
off the end of the iterator and dereference a NULL pointer. This
includes a test that reproduces the problem and a fix.
There was a problem found in the Rugged test suite where the
refdb_fs_backend__next function could exit too early in some
very specific hashing patterns for packed refs. This ports
the Rugged test to libgit2 and then fixes the bug.
Nobody should ever be using anything other than ALL at this level, so
remove the option altogether.
As part of this, git_reference_foreach_glob is now implemented in the
frontend using an iterator. Backends will later regain the ability of
doing the glob filtering in the backend.
If you use rename detection, the renamed and copied files would
not show any text diffs because the function that decides if
data should be loaded didn't know which sides of the diff to
load for those cases.
This adds a test that looks at the patch generated for diff
entries that are COPIED or RENAMED.
The git_status_file API was doing a hack to deal with files that
are inside ignored directories. The status scan was not reporting
any file in this case, so git_status_file would attempt a final
"stat()" call, and return IGNORED if the file actually existed.
On case-insensitive filesystems where core.ignorecase is set
incorrectly, this magic check can "succeed" and report a file
as ignored when it should actually return ENOTFOUND.
Now that we have the GIT_STATUS_OPT_RECURSE_IGNORED_DIRS, we can
use that flag to make sure that git_status_file() will look into
ignored directories and eliminate the hack completely, so we give
the correct error.
This clarifies the docs for git_repository_message and also adds
to the tests to explicitly check NUL termination of data when the
output buffer is smaller than the message size. There is a minor
behavior change so that a non-NULL output buffer will always be
NUL terminated (at length zero) if an error occurs.
When a repository is initialised, we need to probe to see if there is
a global config to load. If this is not the case, the user isn't able
to write to the global config without creating the backend and adding
it themselves, which is inconvenient and overly complex.
Unconditionally create and add a backend for the global config file
regardless of whether it exists as a convenience for users.
To enable this, we allow creating backends to files that do not exist
yet, changing the semantics somewhat, and making some tests invalid.
When tagopt is set to '--tags', we should only take the default tags
refspec into account and ignore any configured ones.
Bring the code into compliance.
When a patch contained an eofnl change (i.e. the last line either
gained or lost a newline), the oldno and newno line number values
for the lines in the last hunk of the patch were not useful. This
makes them behave in a more expected manner.
This adds a new line origin constant for the special line that
is used when both files end without a newline.
In the course of writing the tests for this, I was having problems
with modifying a file but not having diff notice because it was
the same size and modified less than one second from the start of
the test, so I decided to start working on nanosecond timestamp
support. This commit doesn't contain the nanosecond support, but
it contains the reorganization of maybe_modified and the hooks so
that if the nanosecond data were being read by stat() (or rather
being copied by git_index_entry__init_from_stat), then the nsec
would be taken into account.
This new stuff could probably use some more tests, although there
is some amount of it here.
Currently git_branch_set_upstream when passed a local branch
creates invalid configuration, for ex. if we setup branch
'tracking_master' to track local 'master' libgit2 generates
the following config
```
[branch "track_master"]
remote = .
merge = .refs/heads/track_master
```
The merge value is invalid and calling git_branch_upstream on
'tracking_master' results in invalid reference error.
It should do:
```
[branch "track_master"]
remote = .
merge = refs/heads/master
```
We use p->index_map.data to check whether the struct has been set up
and all the information about the index is stored there. This variable
gets set up halfway through the setup process, however, and a thread
can come along and use fields that haven't been written to yet.
Crucially, pack_entry_find_offset() needs to read the index version
(which is written after index_map) to know the offset and stride
length to pass to sha1_entry_pos(). If these values are wrong,
assertions in it will fail, as it will be reading bogus data.
Make index_version the last field to be written and switch from using
p->index_map.data to p->index_version as "git_pack_file is ready" flag
as we can use it to know if every field has been written.
Older versions of git would only write peeled entries for
items under refs/tags/. Newer versions will write them for
all refs, and we should be prepared to handle that.
There is an occasional assertion failure in sha1_entry_pos from
pack_entry_find_index when running threaded. Holding the mutex
around the code that grabs the index_map data and processes it
makes this assertion failure go away.
There are many paths through revparse that may return an error
code without reporting an error, I believe. This fixes one of
them. Because of the backtracking in revparse, it is pretty
complicated to fix the others.
A number of places were looking up option config values and then
not clearing the error codes if the values were not found. This
moves the repeated pattern into a shared routine and adds the
extra call to giterr_clear() when needed.
There are some cases, particularly where no loaded ODB backends
support a particular operation, where we would return an error
code without having set an error. This catches those cases and
reports that no ODB backends support the operation in question.
When diff encounters an untracked directory, there was a shortcut
that it took which is not compatible with core git. This makes
the default behavior no longer take that shortcut and instead look
inside the untracked directory to see if there are any untracked
files within it. If there are not, then the directory is treated
as an ignore directory instead of an untracked directory. This
has implications for the git_status APIs.
In preparation for more changes to the internal diff logic, it
seemed wise to split the very large git_diff__from_iterators into
separate functions that handle the four main cases (unmatched old
item, unmatched new item, unmatched new directory, and matched
old and new items). Hopefully this will keep the logic easier to
follow even as more cases have to be added to this code.
This removes the GIT_INLINE versions of the simple git_object
accessors and standardizes them with a helper macro in src/object.h
to build the function bodies.
Add a new git_oid_strcmp that compares a string OID with a hex
oid for sort order, and then reimplement git_oid_streq using it.
This actually should speed up git_oid_streq because it only reads
as far into the string as it needs to, whereas previously it would
convert the whole string into an OID and then use git_oid_cmp.
git_oid_ncmp was making some assumptions about the length of
the data - this shifts the check to the top of the loop so it
will work more robustly, limits the max, and adds some tests
to verify the functionality.
For update and create commands where all the objects are known to
exist in the remote, we must send an empty packfile. However, if all
we issue are delete commands, no packfile must be sent.
Take this into consideration for push.
This makes diff use the cvar cache for config options where
possible, and also adds support for a number of other config
options to diff including "diff.context", "diff.ignoreSubmodules",
"diff.noprefix", "diff.mnemonicprefix", and "core.abbrev".
To make this natural, this involved a rearrangement of the code
that allocates the diff object vs. the code that initializes it
based on the combination of options passed in by the user and
read from the config.
This commit includes tests for most of these new options as well.
This converts many of the config lookups that are done around the
library to use the repository config cache. This was everything I
could find that wasn't part of diff (which requires a larger fix).
This adds a bunch of additional config values to the repository
config value cache and makes it easier to add a simple boolean
config without creating enum values for each possible setting.
Also, this fixes a bug in git_config_refresh where the config
cache was not being cleared which could lead to potential
incorrect values.
The work to start using the new cached configs will come in the
next couple of commits...
When case insensitive tree iterators were added, we started reading
the case sensitivity of the index to decide if the tree should be
case sensitive. This is good for index-to-tree comparisons, but
for tree-to-tree comparisons, we should really default to doing a
case sensitive comparison unless the user really wants otherwise.
Rename git_packfile_check to git_packfile_alloc since it is now
being used more in that capacity. Fix the various places that use
it. Consolidate some repeated code in odb_pack.c related to the
allocation of a new pack_backend.
The indexer was creating a packfile object separately from the
code in pack.c which was a problem since I put a call to
git_mutex_init into just pack.c. This commit updates the pack
function for creating a new pack object (i.e. git_packfile_check())
so that it can be used in both places and then makes indexer.c
use the shared initialization routine.
There are also a few minor formatting and warning message fixes.
This removes the lock from the repository object and changes the
internals to use the new atomic git__compare_and_swap to update
the _odb, _config, _index, and _refdb variables in a threadsafe
manner.
This builds on the earlier thread safety work to make it so that
setting the odb, index, refdb, or config for a repository is done
in a threadsafe manner with minimized locking time. This is done
by adding a lock to the repository object and using it to guard
the assignment of the above listed pointers. The lock is only
held to assign the pointer value.
This also contains some minor fixes to the other work with pack
files to reduce the time that locks are being held to and fix an
apparently memory leak.
This adds create and free callback to the git_objects_table so
that more of the creation and destruction of objects can be table
driven instead of using switch statements. This also makes the
semantics of certain object creation functions consistent so that
we can make better use of function pointers. This also fixes a
theoretical error case where an object allocation fails and we
end up storing NULL into the cache.
When I was writing threading tests for the new cache, the main
error I kept running into was a pack file having it's content
unmapped underneath the running thread. This adds a lock around
the routines that map and unmap the pack data so that threads can
effectively reload the data when they need it.
This also required reworking the error handling paths in a couple
places in the code which I tried to make consistent.
Add a git_cache_set_max_object_size method that does more checking
around setting the max object size. Also add a git_cache_size to
read the number of objects currently in the cache. This makes it
easier to write tests.
This moves most of the refdb stuff over to the include/git2/sys
directory, with some minor shifts in function organization.
While I was making the necessary updates, I also removed the
trailing whitespace in a few files that I modified just because I
was there and it was bugging me.
Actually this renames git_commit_create_oid to
git_commit_create_from_oids and moves the API declaration to
include/git2/sys/commit.h since it is a dangerous API for general
use (because it doesn't check that the OID list items actually
refer to real objects).
This moves some of the odb_backend stuff that is related to the
internals of an odb_backend implementation into include/git2/sys.
Some of the stuff related to streaming I left in include/git2
because it seemed like it would be reasonably needed by a normal
user who wanted to stream objects into and out of the ODB.
Also, I added APIs for traversing the list of backends so that
some of the tests would not need to access ODB internals.
It used to be separate as an attempt to make the querying easier, but
it didn't work out that way, so put all the data together.
Add git_refspec_string() as well to get the original string, which is
now stored alongside the independent parts.
Introduce git_remote_{fetch,push}_refspecs() to get a list of refspecs
from the remote and rename the refspec-adding functions to a less
silly name.
Use this instead of the vector index hacks in the tests.
A remote can have a multitude of refspecs. Up to now our git_remote's
have supported a single one for each fetch and push out of simplicity
to get something working.
Let the remotes and internal code know about multiple remotes and get
the tests passing with them.
Instead of setting a refspec, the external users can clear all and add
refspecs. This should be enough for most uses, though we're still
missing a querying function.
This adds some hooks into the filesystem iterator so that the
workdir iterator can just become a wrapper around it. Then we
remove most of the workdir iterator code and just have it augment
the filesystem iterator with skipping .git entries, updating the
ignore stack, and checking for submodules.
This adds a new variant iterator that is a raw filesystem iterator
for scanning directories from a root. There is still more work to
do to blend this with the working directory iterator.
If the on-disk file has been staged (it's stat data matches the stat data
in the cache) then we need not hash the file to determine whether it
differs from the checkout target; instead we can simply use the oid in
the index.
This prevents recomputing a file's hash unnecessarily, prevents loading
the file (when filtering) and prevents edge cases where filters suggest
that a file is dirty immediately after git writes the file.
Don't try to update anything if there are no heads to update. This
saves us from trying to look into a fetch refspec when there is none.
A better fix for compatibility with git when using remotes without
refspecs is still needed, but this stops us from segfaulting.
The end of the header is signaled by to consecutive LFs and the commit
message starts immediately after. Jumping over LFs at the start of the
message is a bug and leads to creating different commits if
when rebuilding history.
This also fixes an empty commit message being returned as "\n".
This adds tests for diffs with submodules in them and (perhaps
unsurprisingly) requires further fixes to be made. Specifically,
this fixes:
- when considering if a submodule is dirty in the workdir, it was
being treated as dirty even if only the index was dirty.
- git_diff_patch_to_str (and git_diff_patch_print) were "printing"
the headers for files (and submodules) that were unmodified or
had no meaningful content.
- added comment to previous fix and removed unneeded parens.
GIT_DIFF_PATCH_DIFFABLE was not set, so the diff content was not shown
When submodule is dirty, the hash may be the same, but the length is different because -dirty is appended
We can therefore compare the length or hash
Return the size we'd need to write to instead of simply an
error. Split the function into two to be used later by the upstream
configuration functions.
This started out trying to look at the problems from issue #1425
and gradually grew to a broader set of fixes. There are two core
things fixed here:
1. When you had an ignore like "/bin" which is rooted at the top
of your tree, instead of immediately adding the "bin/" entry
as an ignored item in the diff, we were returning all of the
direct descendants of the directory as ignored items. This
changes things to immediately ignore the directory. Note that
this effects the behavior in test_status_ignore__subdirectories
so that we no longer exactly match core gits ignore behavior,
but the new behavior probably makes more sense (i.e. we now
will include an ignored directory inside an untracked directory
that we previously would have left off).
2. When a submodule only contained working directory changes, the
diff code was always considering it unmodified which was just
an outright bug. The HEAD SHA of the submodule matches the SHA
in the parent repo index, and since the SHAs matches, the diff
code was overwriting the actual status with UNMODIFIED.
These fixes broke existing tests test_diff_workdir__submodules and
test_status_ignore__subdirectories but looking it over, I actually
think the new results are correct and the old results were wrong.
@nulltoken had actually commented on the subdirectory ignore issue
previously.
I also included in the tests some debugging versions of the
shared iteration callback routines that print status or diff
information. These aren't used actively in the tests, but can be
quickly swapped in to test code to give a better picture of what
is being scanned in some of the complex test scenarios.
This option has been sitting unimplemented for a while, so I
finally went through and implemented it along with some tests.
As part of this, I improved the implementation of
GIT_DIFF_IGNORE_SUBMODULES so it be more diligent about avoiding
extra work and about leaving off delta records for submodules to
the greatest extent possible (though it may include them still
if you are request TYPECHANGE records).
This implements working versions of GIT_DIFF_RECURSE_IGNORED_DIRS
and GIT_STATUS_OPT_RECURSE_IGNORED_DIRS along with some tests for
the newly available behaviors. This is not turned on by default
for status, but can be accessed via the options to the extended
version of the command.
This adds crlf/lf conversion functions into buf_text with more
efficient implementations that bypass the high level buffer
functions. They attempt to minimize the number of reallocations
done and they directly write the buffer data as needed if they
know that there is enough memory allocated to memcpy data.
Tests are added for these new functions. The crlf.c code is
updated to use the new functions.
Removed the include of buf_text.h from filter.h and just include
it more narrowly in the places that need it.
This adds a check to the drop_crlf filter path to check it the
file in the index already has a CR in it, in which case this will
not drop the CRs from the workdir file contents.
This uncovered a "bug" in `git_blob_create_fromworkdir` where the
full path to the file was passed to look up the attributes instead
of the relative path from the working directory root. This meant
that the check in the index for a pre-existing entry of the same
name was failing.
Currently, the odb cache has a fixed size of 128 slots as defined by
GIT_DEFAULT_CACHE_SIZE. Allow users to set the size of the cache via
git_libgit2_opts().
Fixes#1035.
1. Fix sort order problem with submodules where "mod" was sorting
after "mod-plus" because they were being sorted as "mod/" and
"mod-plus/". This involved pushing the "contains a .git entry"
test significantly lower in the stack.
2. Reinstate behavior that a directory which contains a .git entry
will be treated as a submodule during iteration even if it is
not yet added to the .gitmodules.
3. Now that any directory containing .git is reported as submodule,
we have to be more careful checking for GIT_EEXISTS when we
do a submodule lookup, because that is the error code that is
returned by git_submodule_lookup when you try to look up a
directory containing .git that has no record in gitmodules or
the index.
This switches the APIs for setting and getting the global/system
search paths from using git_strarray to using a simple string with
GIT_PATH_LIST_SEPARATOR delimited paths, just as the environment
PATH variable would contain. This makes it simpler to get and set
the value.
I also added code to expand "$PATH" when setting a new value to
embed the old value of the path. This means that I no longer
require separate actions to PREPEND to the value.
Implicit type conversion argument of function to size_t type
Suspicious sequence of types castings: size_t -> int -> size_t
Consider reviewing the expression of the 'A = B == C' kind. The expression is calculated as following: 'A = (B == C)'
Unsigned type is never < 0
The goal of this work is to expose the search logic for "global",
"system", and "xdg" files through the git_libgit2_opts() interface.
Behind the scenes, I changed the logic for finding files to have a
notion of a git_strarray that represents a search path and to store
a separate search path for each of the three tiers of config file.
For each tier, I implemented a function to initialize it to default
values (generally based on environment variables), and then general
interfaces to get it, set it, reset it, and prepend new directories
to it.
Next, I exposed these interfaces through the git_libgit2_opts
interface, reusing the GIT_CONFIG_LEVEL_SYSTEM, etc., constants
for the user to control which search path they were modifying.
There are alternative designs for the opts interface / argument
ordering, so I'm putting this phase out for discussion.
Additionally, I ended up doing a little bit of clean up regarding
attr.h and attr_file.h, adding a new attrcache.h so the other two
files wouldn't have to be included in so many places.
This adds a git_pool_freelist_item struct that makes it a little
easier to follow what's going on with the pool free list block
management code. It is functionally neutral.
This fixes a number of issues identified by valgrind - mostly
missed free calls. Inside valgrind, mmap() may fail which causes
some of the diff tests to fail. This adds a fallback code path
to diff_output.c:get_workdir_content() where is the mmap() fails
the code will now try to read the file data directly into allocated
memory (which is what it would do if the data needed to be filtered
anyhow).
This updates the tree iterator internals to be more efficient.
The tree_iterator_entry objects are now kept as pointers that are
allocated from a git_pool, so that we may use git__tsort_r for
sorting (which is better than qsort, given that the tree is
likely mostly ordered already).
Those tree_iterator_entry objects now keep direct pointers to the
data they refer to instead of keeping indirect index values. This
simplifies a lot of the data structure traversal code.
This also adds bsearch to find the start item position for range-
limited tree iterators, and is more explicit about using
git_path_cmp instead of reimplementing it. The git_path_cmp
changed a bit to make it easier for tree_iterators to use it (but
it was barely being used previously, so not a big deal).
This adds a git_pool_free_array function that efficiently frees a
list of pool allocated pointers (which the tree_iterator keeps).
Also, added new tests for the git_pool free list functionality
that was not previously being tested (or used).
This fixes two bugs with the workdir iterator depth check: first
that the depth was not being decremented and second that empty
directories were counting against the depth even though a frame
was not being created for them.
This also fixes a bug with the ENOTFOUND return code for workdir
iterators when you attempt to advance_into an empty directory.
Actually, that works correctly, but it was incorrectly being
propogated into regular advance() calls in some circumstances.
Added new tests for the above that create a huge hierarchy on
the fly and try using the workdir iterator to traverse it.
Clean up some sorting function stuff including fixing qsort_r
on MinGW, common function pointer type for comparison, and basic
insertion sort implementation (which we, regrettably, fall back
on for MinGW).
Given a group of case-insensitively equivalent tree iterator
entries, this ensures that the case-sensitively first trees will
be used as the representative items. I.e. if you have conflicting
entries "A/B/x", "a/b/x", and "A/b/x", this change ensures that
the earliest entry "A/B/x" will be returned. The actual choice
is not that important, but it is nice to have it stable and to
have it been either the first or last item, as opposed to a
random item from within the equivalent span.
Tree iterator advance was moving forward without taking the
filemode of the entries into account, equating "a" and "a/".
This makes the tree entry comparison code more easily reusable
and fixes the problem.
This fixes an off by one error for generating full paths for
tree entries in tree iterators when INCLUDE_TREES is set. Also,
contains a bunch of small code cleanups with a couple of small
utility functions and macro changes to eliminate redundant code.
If there are case-ambiguities in the path of a case insensitive
tree iterator, it will now rewrite the entire path when it gives
the path name to an entry, so a tree with "A/b/C/d.txt" and
"a/B/c/E.txt" will give the true full paths (instead of case-
folding them both to "A/B/C/d.txt" or "a/b/c/E.txt" or something
like that.
Previously, 0 meant default. This is problematic, as asking for 0
context lines is a valid thing to do.
Change GIT_DIFF_OPTIONS_INIT to default to three and stop treating 0
as a magic value. In case no options are provided, make sure the
options in the diff object default to 3.
Passing NULL is non-sensical. The error message leaves to be desired,
though, as it leaks internal implementation details. Catch it at the
`git_config_set_string` level and set an appropriate error message.
There is a serious bug in the previous tree iterator implementation.
If case insensitivity resulted in member elements being equivalent
to one another, and those member elements were trees, then the
children of the colliding elements would be processed in sequence
instead of in a single flattened list. This meant that the tree
iterator was not truly acting like a case-insensitive list.
This completely reworks the tree iterator to manage lists with
case insensitive equivalence classes and advance through the items
in a unified manner in a single sorted frame.
It is possible that at a future date we might want to update this
to separate the case insensitive and case sensitive tree iterators
so that the case sensitive one could be a minimal amount of code
and the insensitive one would always know what it needed to do
without checking flags.
But there would be so much shared code between the two, that I'm
not sure it that's a win. For now, this gets what we need.
More tests are needed, though.
It's somewhat common to try to write "/refs/tags/something". There is
no easy way to catch it during the main body of the function, as there
is no way to distinguish whether it's a leading slash or a double
slash somewhere in the middle.
Catch this at the beginning so we don't trigger the assert in
is_all_caps_and_underscore().
This standardizes iterator behavior across all three iterators
(index, tree, and working directory). Previously the working
directory iterator behaved differently from the other two.
Each iterator can now operate in one of three modes:
1. *No tree results, auto expand trees* means that only non-
tree items will be returned and when a tree/directory is
encountered, we will automatically descend into it.
2. *Tree results, auto expand trees* means that results will
be given for every item found, including trees, but you
only need to call normal git_iterator_advance to yield
every item (i.e. trees returned with pre-order iteration).
3. *Tree results, no auto expand* means that calling the
normal git_iterator_advance when looking at a tree will
not descend into the tree, but will skip over it to the
next entry in the parent.
Previously, behavior 1 was the only option for index and tree
iterators, and behavior 3 was the only option for workdir.
The main public API implications of this are that the
`git_iterator_advance_into()` call is now valid for all
iterators, not just working directory iterators, and all the
existing uses of working directory iterators explicitly use
the GIT_ITERATOR_DONT_AUTOEXPAND (for now).
Interestingly, the majority of the implementation was in the
index iterator, since there are no tree entries there and now
have to fake them. The tree and working directory iterators
only required small modifications.
The iterator APIs are not currently consistent with the parameter
ordering of the rest of the codebase. This rearranges the order
of parameters, simplifies the naming of a number of functions, and
makes somewhat better use of macros internally to clean up the
iterator code.
This also expands the test coverage of iterator functionality,
making sure that case sensitive range-limited iteration works
correctly.
`git_diff_get_patch()` would unconditionally load the patch object and
then simply leak it if the user hadn't requested it. Short-circuit
loading the object if the user doesn't want it.
The rest of the plugs are simply calling the free functions of objects
allocated during the tests.
These offsets are needed for REF_DELTA objects, which encode which
object they use as a base, but not where it lies in the packfile, so
we need a list.
These objects are mostly from older packfiles, before OFS_DELTA was
widely spread. The time spent in indexing these packfiles is greatly
reduced, though remains above what git is able to do.
This was the first implementation and its goal was simply to have
something that worked. It is slow and now it's just taking up
space. Remove it and switch the one known usage to use the streaming
indexer.
This removes assertions that prevent us from having an empty
git_config object and then updates some tests that were
dependent on global config state to use an empty config before
running anything.
This removes the one-off GIT_CDECL and adds a new standard way of
doing this named GIT_STDLIB_CALL with a src/win32 specific def
when on the Windows platform.
When creating files, instead of actually using GIT_FILEMODE_BLOB
and the other various constants that happen to correspond to
mode values, apparently I should be just using 0666 and 0777, and
relying on the umask to clear bits and make the value sane.
This fixes the rules for copying a template directory and fixes
the checks to match that new behavior. (Further changes to the
checkout logic to follow separately.)
The new tests were not taking core.filemode into account when
testing file modes after repo initialization. Fixed that and some
other Windows warnings that have crept in.
When PR #1359 removed the hooks from the test resources/template
directory, it made me realize that the tests for
git_repository_init_ext using templates must be pretty shabby
because we could not have been testing if the hooks were getting
created correctly.
So, this started with me recreating a couple of hooks, including
a sample and symlink, and adding tests that they got created
correctly in the various circumstances, including with the SHARED
modes, etc. Unfortunately this uncovered some issues with how
directories and symlinks were copied and chmod'ed. Also, there
was a FIXME in the code related to the chmod behavior as well.
Going back over the directory creation logic for setting up a
repository, I found it was a little difficult to read and could
result in creating and/or chmod'ing directories that the user
almost certainly didn't intend.
So that let to this work which makes repo initialization much
more careful (and hopefully easier to follow). It required a
couple of extensions / changes to core fileops utilities, but I
also think those are for the better, at least for git_futils_cp_r
in terms of being careful about what actions it takes.
This is designed to fix libgit2sharp #350 where if .gitignore is
a directory we abort all operations that process ignores instead
of just skipping it as core git does.
Also added test that fails without this change and passes with it.
This moves a couple of checks outside of the inner loop of the
find_similar rename/copy detection phase that are only dependent
on the "from" side of a detection.
Also, this replaces the inefficient initialization of the
options structure when a value is not provided explicitly by the
user.
Instead of creating three git_diff_similarity_metric statically
for the various config options, just create the metric structure
on demand and populate it, using the payload to specific the
extra flags that should be passed to the hashsig. This removes
a level of obfuscation from the code, I think.
This adds some new tests that actually exercise the similarity
metric between files to detect renames, copies, and split modified
files that are too heavily modified.
There is still more testing to do - these tests are just partially
covering the cases.
There is also one bug fix in this where a change set with only
MODIFY being broken into ADD/DELETE (due to low self-similarity)
without any additional RENAMED entries would end up not processing
the split requests (because the num_rewrites counter got reset).
This is the initial integration of the similarity metric into
the `git_diff_find_similar()` code path. The existing tests all
pass, but the new functionality isn't currently well tested. The
integration does go through the pluggable metric interface, so it
should be possible to drop in an alternative to the internal
metric that libgit2 implements.
This comes along with a behavior change for an existing interface;
namely, passing two NULLs to git_diff_blobs (or passing NULLs to
git_diff_blob_to_buffer) will now call the file_cb parameter zero
times instead of one time. I know it's strange that that change
is paired with this other change, but it emerged from some
initialization changes that I ended up making.
Previously the git_diff_delta recorded if the delta was binary.
This replaces that (with no net change in structure size) with
a full set of flags. The flag values that were already in use
for individual git_diff_file objects are reused for the delta
flags, too (along with renaming those flags to make it clear that
they are used more generally).
This (a) makes things somewhat more consistent (because I was
using a -1 value in the "boolean" binary field to indicate unset,
whereas now I can just use the flags that are easier to understand),
and (b) will make it easier for me to add some additional flags to
the delta object in the future, such as marking the results of a
copy/rename detection or other deltas that might want a special
indicator.
While making this change, I officially moved some of the flags that
were internal only into the private diff header.
This also allowed me to remove a gross hack in rename/copy detect
code where I was overwriting the status field with an internal
value.
This plugs in the three basic similarity strategies for handling
whitespace via internal use of the pluggable API. In so doing, I
realized that the use of git_buf in the hashsig API was not needed
and actually just made it harder to use, so I tweaked that API as
well.
Note that the similarity metric is still not hooked up in the
find_similarity code - this is just setting out the function that
will be used.
This moves the similarity metric code out of buf_text and into a
new file. Also, this implements a different approach to similarity
measurement based on a Rabin-Karp rolling hash where we only keep
the top 100 and bottom 100 hashes. In theory, that should be
sufficient samples to given a fairly accurate measurement while
limiting the amount of data we keep for file signatures no matter
how large the file is.
This makes the text similarity metric treat \r as equivalent
to \n and makes it skip whitespace immediately following a line
terminator, so line indentation will have less effect on the
difference measurement (and so \r\n will be treated as just a
single line terminator).
This also separates the text and binary hash calculators into
two separate functions instead of have more if statements inside
the loop. This should make it easier to have more differentiated
heuristics in the future if we so wish.
This adds a new `git_buf_text_hashsig` type and functions to
generate these hash signatures and compare them to give a
similarity score. This can be plugged into diff similarity
scoring.
This replaces most of the explicit vector iteration with calls
to git_vector_foreach, adds in some git__free and giterr_clear
calls to clean up during some error paths, and a couple of
other code simplifications.
The treebuilder entries vector flags removed items which means
we can't rely on the entries vector length to accurately get the
number of entries. This adds an entrycount value and maintains it
while updating the treebuilder entries.
The cppcheck static analyzer generates warnings for a bunch of
places in the libgit2 code base. All the ones fixed in this
commit are actually false positives, but I've reorganized the
code to hopefully make it easier for static analysis tools to
correctly understand the structure. I wouldn't do this if I
felt like it was making the code harder to read or worse for
humans, but in this case, these fixes don't seem too bad and will
hopefully make it easier for better analysis tools to get at any
real issues.
If gethostbyname() fails on platforms with NO_ADDRINFO, the code
leaks the struct addrinfo that was allocated. This fixes that
(and a number of code formatting issues in that area of code in
src/posix.c).
`git_diff_blobs` and `git_diff_blob_to_buffer` skip the step
where we check file attributes because they don't have a filename
associated with the data. Unfortunately, this meant they were also
skipping the check for the GIT_DIFF_FORCE_TEXT option and so you
could not force a diff of an apparent binary file. This adds the
force text check into their code path.
The callback will be called for each file, just before the `git_delta_t` gets inserted into the diff list.
When the callback:
- returns < 0, the diff process will be aborted
- returns > 0, the delta will not be inserted into the diff list, but the diff process continues
- returns 0, the delta is inserted into the diff list, and the diff process continues
Instead of returning directly the pattern as the return value, I used an
out parameter, because the function also tests if the passed pathspecs
vector is empty. If yes, it considers that the path "matches", but in
that case there is no matched pattern per se.
W/o this a libgit2 error message could have a mixed encoding:
e.g. a filename in UTF-8 combined with a native Windows error message
encoded with the local code page.
Signed-off-by: Sven Strickroth <email@cs-ware.de>
A leading slash confuses the name normalization code when the flags
include ALLOW_ONELEVEL. Catch this case in particular to avoid
triggering an assertion in the uppercase check which expects us not to
pass it an empty string.
The existing tests don't catch this as they simply use the NORMAL
flag.
This fixes#1300.
This adds a `git_diff_patch_line_stats()` API that gets the total
number of adds, deletes, and context lines in a patch. This will
make it a little easier to emulate `git diff --stat` and the like.
Right now, this relies on generating the `git_diff_patch` object,
which is a pretty heavyweight way to get stat information. At
some future point, it would probably be nice to be able to get
this information without allocating the entire `git_diff_patch`,
but that's a much larger project.
This is a new implementation of core git's config key checking
rules that prevents non-alphanumeric characters (and '-') for
the top-level section and key names inside of config files.
This also validates the target section name when renaming
sections.
OpenBSD's realpath(3) doesn't require the last part of the path to
exist. Override p_realpath in this OS to bring it in line with the
library's assumptions.
Check whether the backslash at the end of the line is being escaped or
not so as not to consider it a continuation marker when it's e.g. a
Windows-style path.
This is a convenience function to get the branch name of a given
ref. The returned branch name is compatible with the name that can
be supplied e.g. to git_branch_lookup(). That is, the prefixes
"refs/heads" or "refs/remotes" are omitted.
Also added a new test for testing the new function.
With the new code to make tree iterators support ignore_case,
there is a bug in setting the start entry for range bounded
iterators where memcmp was being used instead of strncasecmp.
This fixes that and expands the tree iterator test to cover
the cases that were broken.
The commit time is already stored as a git_time_t, but we were
parsing is as a uint32_t. This just switches the parser to use
uint64_t which will handle dates further in the future (and adds
some tests of those future dates).
When the encoding header changed to be treated as an additional
header, the EOL pointer started to point to the byte after the LF,
making the git__strndup call copy the LF into the value.
Increase the EOL pointer value after copying the data to keep the rest
of the semantics but avoid copying LF.
This moves the check for the "encoding" header into a loop which
is just scanning for non-required headers at the end of a commit
header. That loop will skip unrecognized lines (including header
continuation lines) until a terminating completely blank line is
found, and only then does it move to reading the commit message.
This makes tree iterators directly support case insensitivity by
using a secondary index that can be sorted by icase. Also, this
fixes the ambiguity check in the git_status_file API to also be
case insensitive. Lastly, this adds new test cases for case
insensitive range boundary checking for all types of iterators.
With this change, it should be possible to deprecate the spool
and sort iterator, but I haven't done that yet.
This adds a new external API git_tree_entry_cmp and a new internal
API git_tree_entry_icmp for sorting tree entries. The case
insensitive one is internal only because general users should
never be seeing case-insensitively sorted trees.
git__bsearch and git__tsort did not pass a payload through to the
comparison function. This makes it impossible to implement sorted
lists where the sort order depends on external data (e.g. building
a secondary sort order for the entries in a tree). This commit
adds git__bsearch_r and git__tsort_r versions that pass a third
parameter to the cmp function of a user payload.
This changes the iterator API so that flags can be passed in to
the constructor functions to control the ignore_case behavior.
At this point, the flags are not supported on tree iterators (i.e.
there is no functional change over the old API), but the API
changes are all made to accomodate this.
By the way, I went with a flags parameter because in the future
I have a couple of other ideas for iterator flags that will make
it easier to fix some diff/status/checkout bugs.
Returning GIT_EAMBIGUOUS from inside the status callback gets
overridden with GIT_EUSER. `git_status_file` accounted for this
via the callback payload, but was allowing the error message to
be cleared. Move the `giterr_set` call outside the callback to
where the EUSER case was being dealt with.
In preparation for further iterator changes, this cleans up a few
small things in the iterator API:
* removed the git_iterator_for_repo_index_range API
* made git_iterator_free not be inlined
* minor param name and test function name tweaks
Somewhat surprisingly, this can increase the speed considerably, as we
don't bother trying to decide what to evict, and the most used entries
are quickly back into the cache.
This is an intermin solution. While this essentially disables the
--shared flag feature, previously external templates did not work
at all. This change fixes the previously corrected, and since
then failing, repo_init__extended_with_template() test.
The problem is now documented in the source code comments.
The indexer needs to call the packfile's free function so it takes care of
freeing the caches.
We still need to close the mwf descriptor manually so we can rename the
packfile into its final name on Windows.
Core git just looks for NUL bytes in files when deciding about
is-binary inside diff (although it uses a better algorithm in
checkout, when deciding if CRLF conversion should be done).
Libgit2 was using the better algorithm in both places, but that
is causing some confusion. For now, this makes diff just look
for NUL bytes to decide if a file is binary by content in diff.
This was just wrong. Added a test that verifying patch line
numbers even for hunks further into a file and then fixed the
algorithm. I needed to add a little extra state into the patch
so that I could track old and new file numbers independently,
but it should be okay.
Many delta bases are re-used. Cache them to avoid inflating the same
data repeatedly.
This version doesn't limit the amount of entries to store, so it can
end up using a considerable amound of memory.
This adds an option to checkout a la the diff option to turn off
fnmatch evaluation for pathspec entries. This can be useful to
make sure your "pattern" in really interpretted as an exact file
match only.
All the ODB backends have a specific refresh interface. When reading an
object, first we attempt every single backend: if the read fails, then
we refresh all the backends and retry the read one more time to see if
the object has appeared.
It is not legal inside our `p_mmap` function to mmap a zero length
file. This adds a test that exercises that case inside diff and
fixes the code path where we would try to do that.
The fix turns out not to be a lot of code since our default file
content is already initialized to "" which works in this case.
Fixes#1210