This fixes `git_submodule_sync` to correctly update the remote URL
of the default branch of the submodule along with the URL in the
parent repository config (i.e. match core Git's behavior).
Also move some useful helper logic from the submodule code into
a shared config API `git_config__update_entry` that can either set
or delete an entry with constraints like not overwriting or not
creating a new entry. I used that helper to update a couple of other
places in the code.
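For illustration, calls to the new helper might look like this (a hedged sketch; the exact internal signature and parameter names are assumptions based on the description above, with a NULL value meaning "delete"):

    /* sketch only -- assumed internal signature:
     *   int git_config__update_entry(git_config *cfg, const char *key,
     *       const char *value, bool overwrite_existing, bool only_existing);
     */

    /* set the key, but do not clobber an existing value */
    error = git_config__update_entry(
        cfg, "branch.master.remote", "origin",
        false /* overwrite_existing */, false /* only_existing */);

    /* delete the key, but only if it already exists */
    error = git_config__update_entry(
        cfg, "branch.master.merge", NULL,
        true /* overwrite_existing */, true /* only_existing */);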
There are a few places where we need to join three strings to
assemble a path. This adds a simple join3 function to avoid the
comparatively expensive join_n (which calls strlen on each string
twice).
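A minimal sketch of the idea, built on git_buf (the helper name, signature, and the path pieces are assumptions for illustration):

    /* join three strings with a separator, measuring each string only once */
    int git_buf_join3(
        git_buf *buf, char separator,
        const char *str_a, const char *str_b, const char *str_c);

    /* e.g. assemble "<workdir>/.git/config" in a single pass */
    git_buf path = GIT_BUF_INIT;
    error = git_buf_join3(&path, '/', workdir, ".git", "config");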
The order in this function is the opposite to what
create_with_fetchspec() has, so change this one, as url-then-refspec is
what git does.
As we need to break compilation and the swap doesn't do that, let's take
this opportunity to rename in-memory remotes to anonymous as that's
really what sets them apart.
With the changes to how git_path_dirload_with_stat handles things
that look like submodules, submodules could end up sorted in the
wrong order with the workdir iterator. This moves the submodule
check earlier in the iterator processing of a new directory so
that the submodule name updates will happen immediately and the
sort order will be correct.
While looping over multiple heads, an up-to-date head will clobber the `remote->need_pack` setting, preventing the rest of the machinery from building and downloading a pack-file, breaking fetches.
If the first call to release a no-longer-existent submodule freed
the object, the check for whether a second release was needed would
dereference the data that had just been freed.
When a file is open for reading (without shared-delete permission) and
a different thread or process calls p_rename, the rename fails, even
if the file is only open for reading for a few milliseconds. This
change lets p_rename wait up to 50ms for the file to be closed by the
reader. This applies only to win32.
This is especially important for git_filebuf_commit, because writes
should not fail if the file is read simultaneously.
Fixes #2207
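A minimal sketch of the retry idea (win32 only; the function name, error handling, and timing breakdown here are illustrative rather than the exact implementation):

    #include <windows.h>

    static int rename_with_retries(const wchar_t *from, const wchar_t *to)
    {
        int tries;

        for (tries = 0; tries < 10; tries++) {   /* 10 tries x 5ms = 50ms max */
            if (MoveFileExW(from, to,
                    MOVEFILE_REPLACE_EXISTING | MOVEFILE_COPY_ALLOWED))
                return 0;

            if (GetLastError() != ERROR_SHARING_VIOLATION)
                break;                           /* not a "file is in use" error */

            Sleep(5);                            /* give the reader time to close it */
        }

        return -1;
    }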
When a submodule was inserted with a different path and name, a
return value from khash greater than zero was allowed to propagate
back out to the caller when it should really have been zeroed. This led
to a possible crash when reloading submodules if that was the
first time that submodule data was loaded.
The reload_all call could end up dereferencing a NULL pointer if
there was an error while attempting to load the submodules config
data (i.e. invalid content in the gitmodules file). This fixes it.
This cleans up some places I missed that could hold onto submodule
references and cleans up the way in which the repository cache is
both reloaded and released so that existing submodule references
aren't destroyed inappropriately.
When a directory containing a .git directory (or even just a plain
gitlink) was found, libgit2 was going out of its way to treat it
specially. This seemed like it was necessary because the diff
code was not originally emulating Git's behavior for untracked
directories correctly (i.e. scanning for ignored vs untracked items
inside). Now that libgit2 diff mimics Git's untracked directory
behavior, the special handling for contained Git repos is actually
incorrect and this commit rips it out.
`git_submodule` objects were already refcounted internally in case
the submodule name was different from the path at which it was
stored. This makes that refcounting externally used as well, so
`git_submodule_lookup` and `git_submodule_add_setup` return an
object that requires a `git_submodule_free` when done.
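Callers now own the returned object, e.g. (the submodule path here is made up):

    git_submodule *sm = NULL;

    if (git_submodule_lookup(&sm, repo, "vendor/libfoo") == 0) {
        /* ... use the submodule ... */
        git_submodule_free(sm);   /* now required once you are done */
    }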
As a way to speed up the cases where we need to hide some commits, we
find out what the merge bases are so we know to stop marking commits as
uninteresting and avoid walking down a potentially very large amount of
commits which we will never see. There are, however, two oversights in
the current code.
The merge-base finding algorithm fails to recognize that if it is only
given one commit, there can be no merge base. It instead walks down the
whole ancestor chain needlessly. Make it return an empty list
immediately in this situation.
The revwalk does not know whether the user has asked to hide any commits
at all. In situations where the user pushes multiple commits but doesn't
hide any, the above fix wouldn't do the trick. Keep track of whether the
user wants to hide any commits and only run the merge-base finding
algorithm when it's needed.
The reflog append function was overzealous in its checking. When passed
both an old and a new id, it should not do any checking, but just
serialize the data to a reflog entry.
The existing tests lack checks for zeroed ids when switching back from
an unborn branch, as well as for what happens when detaching.
The reflog appending function mistakenly wrote zeros when dealing with a
detached HEAD. This explicitly checks for those situations and fixes
them.
When we update the current branch, we must also append to HEAD's reflog
to keep them in sync.
This is a bit of a hack, but as git.git says, it covers 100% of
default cases.
If the pqueue comparison fn returned just 0 or 1 (think "a<b")
then the sort order of returned items could be wrong because there
was a "< 0" that really needed to be "<= 0". Yikes!!!
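In other words, comparison callbacks need to be strcmp-style. A correct callback for integer payloads might look like this (an illustrative example, not code from the tree):

    /* must return < 0, 0 or > 0 -- a bare "a < b" boolean is not enough */
    static int compare_ints(const void *a, const void *b)
    {
        int ia = *(const int *)a, ib = *(const int *)b;
        return (ia > ib) - (ia < ib);
    }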
The git_odb_exists_prefix API was not behaving correctly when a
later backend returned GIT_ENOTFOUND even though an earlier backend
had found the object.
Additionally, the unit tests were not properly exercising the API
and had a couple mistakes in checking the results.
Lastly, since the backends are not expected to behave correctly
unless all bytes of the short id are zero except for the prefix,
this makes the ODB prefix APIs explicitly clear out the extra
bytes so the user doesn't have to be as careful.
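Typical usage of the prefix API, for reference (the prefix value here is made up):

    git_oid full_id, short_id;

    /* only the first 6 hex chars of short_id are meaningful here */
    git_oid_fromstrn(&short_id, "abc123", 6);

    if (git_odb_exists_prefix(&full_id, odb, &short_id, 6) == 0) {
        /* full_id now holds the complete id of the single matching object */
    }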
We look up a reference in order to figure out if it's the current
branch, which we need to free once we're done with the check.
As a bonus, only perform the check when we're passed the force flag, as
it's a useless check otherwise.
We can make use of git_object_dup to use refcounting instead of pointer
comparison to make sure we don't free the caller's object.
This also lets us simplify the case for '~0' which is now just an
assignment instead of looking up the object we have at hand.
Previously the hunk_byfinalline_search_cmp function was called with different
data types (size_t and uint32_t) for the key argument but expected only the
former, resulting in an invalid memory access when passed the latter on a
64-bit machine.
The following patch makes sure that the function is called and works with the
same type (size_t).
If a directory disappears between the time we look up the entries of its
parent and the time when we go to look at it, we should ignore the error
and move forward.
This fixes #2046.
This finds a short id string that will unambiguously select the
given object, starting with the core.abbrev length (usually 7)
and growing until it is no longer ambiguous.
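Usage looks roughly like this (the printed value is illustrative):

    git_buf abbrev = GIT_BUF_INIT;

    if (git_object_short_id(&abbrev, obj) == 0) {
        printf("%s\n", abbrev.ptr);   /* e.g. "0ec5f43", longer if ambiguous */
        git_buf_free(&abbrev);
    }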
A few fixes have accumulated in this area which have made the freeing of
data a bit muddy. Make sure to free the data only when needed and once.
When we are going to write a delta to the packfile, we need to free the
data, otherwise leave it. The current version of the code mixes up the
checks for po->data and po->delta_data.
This adds `git_diff_buffers` and `git_patch_from_buffers`. This
also includes a bunch of internal refactoring to increase the
shared code between these functions and the blob-to-blob and
blob-to-buffer APIs, as well as some higher level assert helpers
in the tests to also remove redundancy.
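For example, diffing two in-memory buffers without touching the ODB or workdir (a small sketch; the file name and contents are made up):

    #include <string.h>

    git_patch *patch = NULL;
    const char *a = "hello\n", *b = "goodbye\n";

    if (git_patch_from_buffers(&patch, a, strlen(a), "greeting.txt",
                               b, strlen(b), "greeting.txt", NULL) == 0) {
        /* ... inspect hunks/lines or print the patch ... */
        git_patch_free(patch);
    }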
- added MSVC CMake definitions to disable warnings
- general.c is rewritten so it is ANSI C compatible and compiles cleanly on Microsoft Windows
- some MSVC-reported warning fixes
* Make GIT_INLINE an internal definition so it cannot be used in
public headers
* Fix language in CONTRIBUTING
* Make index caps API use signed instead of unsigned values
On some systems, notably HP PA-RISC systems running Linux or HP-UX,
EWOULDBLOCK and EAGAIN are not the same value. POSIX (and these OSes) allow
EWOULDBLOCK to occur on write(2) (and send(2), etc.), so check explicitly
for this case as well as EAGAIN by defining and using a macro GIT_ISBLOCKED
that considers both.
The macro is necessary because MSYS does not provide EWOULDBLOCK and
compilation fails if an attempt is made to use it unconditionally. On most
systems, where the two values are the same, the compiler will simply
optimize this check out and it will have no effect.
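A minimal sketch of such a macro and its use (the exact definition in the tree may differ):

    #ifdef EWOULDBLOCK
    # define GIT_ISBLOCKED(e) ((e) == EAGAIN || (e) == EWOULDBLOCK)
    #else
    # define GIT_ISBLOCKED(e) ((e) == EAGAIN)
    #endif

    /* usage: retry the send when the socket would block */
    while ((ret = send(fd, buf, len, 0)) < 0 && GIT_ISBLOCKED(errno))
        /* retry */;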
This adds an API to amend an existing commit, basically a shorthand
for creating a new commit filling in missing parameters from the
values of an existing commit. As part of this, I also added a new
"sys" API to create a commit using a callback to get the parents.
This allowed me to rewrite all the other commit creation APIs so
that temporary allocations are no longer needed.
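For example, amending only the message of an existing commit (a sketch; per the description above, NULL arguments are assumed to mean "keep the value from the commit being amended"):

    git_oid new_id;
    int error = git_commit_amend(
        &new_id, commit_to_amend, "HEAD",
        NULL,                          /* keep original author */
        NULL,                          /* keep original committer */
        NULL,                          /* keep original message encoding */
        "A better commit message\n",
        NULL);                         /* keep original tree */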
This fixes a number of warnings with the Windows 64-bit build
including a test failure in test_repo_message__message where an
invalid pointer to a git_buf was being used.
We need this from util.h and posix.h, but the latter includes common.h
which includes util.h, which means p_strlen is not defined by the time
we get to git__strndup().
Split the definition of p_strlen() off into its own header so we can use
it in util.h.
The current code issues a lot of strncmp() calls in order to check for
the end of the header, simply in order to copy it and start going
through it again. These are a lot of calls for something we can check as
we go along. Knowing the number of parents beforehand to reduce
allocations in extreme cases does not make up for them.
Instead start parsing immediately and check for the double-newline after
each header field, leaving the raw_header allocation for the end, which
lets us go through the header once and reduces the number of strncmp()
calls significantly.
In unscientific testing, this has reduced a shortlog-like usage (walking
through the whole history of a branch and extracting data from the
commits) of git.git from ~830ms to ~700ms and makes the time we spend in
strncmp() negligible.
Let the user push committish objects and peel them to figure out which
commit to push to our queue.
This is for convenience and for allowing uses of
git_revwalk_push_glob(w, "tags")
with annotated tags.
Change the name to _matching() instead of _if(), and force _set_target()
to be a conditional update. If the user doesn't care about the old
value, they should use git_reference_create().
This tweaks the pqueue_up and pqueue_down routines so that they
will not do full element swaps but instead carry over the state
of the previous loop iteration and only assign elements for which
we know the final position. This will avoid a little bit of data
assignment which should improve performance in theory.
Also got rid of some vector helpers that I'm no longer using.
This fixes a typo I made for setting the sorted flag on the index
after a reload. That typo didn't actually cause any test failures
so I'm also adding a test that explicitly checks that the index is
correctly sorted after a reload when ignoring case and when not.
This updates the git_pqueue to simply be a set of specialized
init/insert/pop functions on a git_vector.
To preserve the pqueue feature of having a fixed size heap, I
converted the "sorted" field in git_vectors to a more general
"flags" field so that pqueue could mix in its own flag. This
had a bunch of ramifications because a number of places were
directly looking at the vector "sorted" field - I added a couple of
new git_vector helpers (is_sorted, set_sorted) so the specific
representation of this information could be abstracted.
I accidentally wrote a separate priority queue implementation when
I was working on file rename detection as part of the file hash
signature calculation code. To simplify licensing terms, I just
adapted that to a general purpose priority queue and replaced the
old priority queue implementation that was borrowed from elsewhere.
This also removes parts of the COPYING document that no longer
apply to libgit2.
Validating the workdir should not compare HEAD to working
directory - this is both inefficient (as it ignores the cache)
and incorrect. If we had legitimately allowed changes in the
index (identical to the merge result) then comparing HEAD to
workdir would reject these changes as different. Further, this
will identify files that were filtered strangely as modified,
while testing with the cache would prevent this.
Also, it's stupid slow.
The checkout code used to defer removal of "blocking" files in
checkouts until the blocked item was actually being written (since
we have already checked that the removing the block is acceptable
according to the update rules). Unfortunately, this resulted in
an intermediate index state where both the blocking and new items
were in the index which is no longer allowed. Now we just remove
the blocking item in the first pass so it never needs to coexist.
In cases where there are typechanges, this could result in a bit
more churn of removing and recreating intermediate directories,
but I'm going to assume that is an unusual case and the churn will
not be too costly.
There were some confusing issues mixing up the number of bytes
written to the zstream output buffer with the number of bytes
consumed from the zstream input. This reorganizes the zstream
API and makes it easier to deflate an arbitrarily large input
while still using a fixed size output.
This removes the fetchRecurse compiler warnings and makes the
behavior match the other submodule options (i.e. the in-memory
setting can be reset to the on-disk value).
When three-way merging indexes, we previously changed each path
as we read them, which would lead to us adding an index entry for
'foo', then removing an index entry for 'foo/file'. With the new
index requirements, this is not allowed. Removing entries in the
merged index, then adding them, resolves this. In the previous
example, we now remove 'foo/file' before adding 'foo'.
In case-insensitive index mode, we would stop at a prefixed entry,
treating the provided search key length as a substring length, not as
the length of the string to match.
Writing a sample JavaScript driver pointed out some extra
whitespace handling that needed to be done in the diff driver.
This adds some tests with some sample JavaScript code that I
pulled off of GitHub just to see what would happen. Also, to
clean up the userdiff test data, I did a "git gc" and packed
up the test objects.
Since I don't have permission yet on the code from Git, I decided
I'd take a stab at writing patterns for PHP and JavaScript myself.
I think these are pretty weak, but probably better than the
default behavior without them.
I contacted a number of Git authors and lined up their permission
to relicense their work for use in libgit2 and copied over their
code for diff driver xfuncname patterns. At this point, the code
I've copied is taken verbatim from core Git although Thomas Rast
warned me that the C++ patterns, at least, really need an update.
I've left off patterns where I don't feel like I have permission
at this point until I hear from more authors.
Reorganize the builtin driver table slightly so that core Git
builtin definitions can be imported verbatim. Then take a few of
the core Git drivers and pull them in.
This also creates a test of diffs with the builtin HTML driver
which led to some small error handling fixes in the driver
selection logic.
This extends the diff driver parser to support multiline driver
definitions along with ! prefixing for negated matches. This
brings the driver function pattern parsing in line with core Git.
This also adds an internal table of driver definitions and a
fallback code path that will look in that table for diff drivers
that are set with attributes without having a definition in the
config file. Right now, I just populated the table with a kind
of simple HTML definition that is similar to the core Git def.
Don't try to determine whether the system supports file modes
when putting the tree data in the index during checkout. The tree's
mode is canonical and did not come from stat(2) in the first place.
It's hard or even impossible to correctly free the string buffer
allocated by git_patch_to_str in some circumstances. Drop the function
so people have to use git_patch_to_buf instead - git_buf has a dedicated
destructor.
Returning library-allocated strings from libgit2 works fine on Linux,
but may cause problems on Windows because there is no one C Runtime that
everything links against. With libgit2 not exposing its own allocator,
freeing the string is a gamble.
git_patch_to_str already serializes to a buffer, then returns the
underlying memory. Expose the functionality directly, so callers can use
the git_buf_free function to free the memory later.
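Callers now use git_patch_to_buf and release the result through the buffer's own destructor, e.g.:

    git_buf out = GIT_BUF_INIT;

    if (git_patch_to_buf(&out, patch) == 0) {
        fwrite(out.ptr, 1, out.size, stdout);
        git_buf_free(&out);   /* freed by the library's own allocator */
    }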
The "merge none" (don't automerge) flag was only to aid in
merge trivial tests. We can easily determine whether merge
trivial resulted in a trivial merge or an automerge by examining
the REUC after automerge has completed.
The default merge_file level was XDL_MERGE_MINIMAL, which will
produce conflicts where there should not be in the case where
both sides were changed identically. Change the defaults to be
more aggressive (XDL_MERGE_ZEALOUS) which will more aggressively
compress non-conflicts. This matches git.git's defaults.
Increase testing around reverting a previously reverted commit to
illustrate this problem.
Any well-behaved program should write a descriptive message to the
reflog whenever it updates a reference. Let's make this more prominent
by removing the version without the reflog parameters.
This changes git_signature_dup to actually honor oom conditions raised by
the call to git__strdup. It also aligns it with the error code return
pattern used everywhere else.
Ok, scrap the previous commit. This is the right overflow check that
takes care of 64 bit overflow **and** 32-bit overflow, which needs to be
considered because the pool malloc can only allocate 32-bit elements in
one go.
Note that `git_pool_strdup` cannot really return any error codes,
because the pool doesn't set errors on OOM.
The only place where `giterr_set_oom` is called is in
`git_pool_strndup`, in a conditional check that is always optimized
away. `n + 1` cannot be zero if `n` is unsigned because the compiler
doesn't take wraparound into account.
This check has been removed altogether because `size_t` is not
particularly going to overflow.
This renames git_vector_free_all to the better git_vector_free_deep
and also contains a couple of memory leak fixes based on valgrind
checks. The fixes are specifically: failure to free global dir
path variables when not compiled with threading on and failure to
free filters from the filter registry that had not been initialized
fully.
This adds tests that try canceling an indexer operation from
within the progress callback.
After writing the tests, I wanted to run this under valgrind and
had a number of errors in that situation because mmap wasn't
working. I added a CMake option to force emulation of mmap and
consolidated the Amiga-specific code into that new place (so we
don't actually need separate Amiga code now, just have to turn on
-DNO_MMAP).
Additionally, I made the indexer code propagate error codes more
reliably than it used to.
Clone callbacks can return non-zero values to cancel the clone.
This adds some tests to verify that this actually works and updates
the documentation to be clearer that this can happen and that the
return value will be propagated back by the clone function.
The checkout notify callback behavior on non-zero return values
was not being tested. This adds tests, fixes a bug with positive
values, and clarifies the documentation to make it clear that the
checkout can be canceled via this mechanism.
The callback to supply data chunks could return a negative value
to stop creation of the blob, but we were neither using GIT_EUSER
nor propagating the return value. This makes things use the new
behavior of returning the negative value back to the user.
This changes the behavior of callbacks so that the callback error
code is not converted into GIT_EUSER and instead we propagate the
return value through to the caller. Instead of using the
giterr_capture and giterr_restore functions, we now rely on all
functions to pass back the return value from a callback.
To avoid having a return value with no error message, the user
can call the public giterr_set_str or some such function to set
an error message. There is a new helper 'giterr_set_callback'
that functions can invoke after making a callback which ensures
that some error message was set in case the callback did not set
one.
In places where the sign of the callback return value is
meaningful (e.g. positive to skip, negative to abort), only the
negative values are returned back to the caller, obviously, since
the other values allow for continuing the loop.
The hardest parts of this were in the checkout code where positive
return values were overloaded as meaningful values for checkout.
I fixed this by adding an output parameter to many of the internal
checkout functions and removing the overload. This added some
code, but it is probably a better implementation.
There is some funkiness in the network code where user-provided
callbacks could be returning a positive or a negative value and
we want to rely on that to cancel the loop. There are still a
couple of places where a user error might get turned into GIT_EUSER
there, I think, though none exercised by the tests.
There are a lot of places that we call git__free on each item in
a vector and then call git_vector_free on the vector itself. This
just wraps that up into one convenient helper function.
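Conceptually the helper boils down to the following sketch (internal API; this is the function that an earlier entry in this list renames to git_vector_free_deep):

    void git_vector_free_deep(git_vector *v)
    {
        size_t i;

        if (!v)
            return;

        for (i = 0; i < v->length; ++i) {
            git__free(v->contents[i]);
            v->contents[i] = NULL;
        }

        git_vector_free(v);
    }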
This continues auditing all the places where GIT_EUSER is being
returned and making sure to clear any existing error using the
new giterr_user_cancel helper. As a result, places that relied
on intercepting GIT_EUSER but having the old error preserved also
needed to be cleaned up to correctly stash and then retrieve the
actual error.
Additionally, as I encountered places where error codes were not
being propagated correctly, I tried to fix them up. A number of
those fixes are included in this commit as well.
This adds giterr_user_cancel to return GIT_EUSER and clear any
error message that is sitting around. As a result of using that
in places, we need to be more thorough with capturing errors that
happen inside a callback when used internally. To help with that,
this also adds giterr_capture and giterr_restore so that when we
internally use a foreach-type function that clears errors and
converts them to GIT_EUSER, it is easier to restore not just the
return value, but the actual error message text.
This adds `git_config__lookup_entry` which will look up a key in
a config and return either the entry or NULL if the key was not
present. Optionally, it can either suppress all errors or can
return them (although not finding the key is not an error for this
function). Unlike other accessors, this does not normalize the
config key string, so it must only be used when the key is known
to be in normalized form (i.e. all lower-case before the first dot
and after the last dot, with no invalid characters).
This also adds three high-level helper functions to look up config
values with no errors and a fallback value. The three functions
are for string, bool, and int values, and will resort to the
fallback value for any error that arises. They are:
* `git_config__get_string_force`
* `git_config__get_bool_force`
* `git_config__get_int_force`
None of them normalize the config `key` either, so they can only
be used for internal cases where the key is known to be in normal
format.
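Intended usage is along these lines (a hedged sketch; the exact internal signatures are assumptions):

    /* fall back to sensible defaults on any error, including a missing key */
    int bare   = git_config__get_bool_force(cfg, "core.bare", 0);
    int abbrev = git_config__get_int_force(cfg, "core.abbrev", 7);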
The frontend used to look at the file directly, but that's obviously not
the right thing to do. Expose it on the backend and use that function
instead.
git-core only writes to the reflogs of HEAD, refs/heads/ and
refs/notes/, or if there is already a reflog in place. Adjust our code to
follow these semantics.
When doing copy detection, it is often necessary to include
UNMODIFIED records in the git_diff so they are available as source
records for GIT_DIFF_FIND_COPIES_FROM_UNMODIFIED. Yet in the final
diff, often you will not want to have these UNMODIFIED records.
This adds a flag which marks these UNMODIFIED records for deletion
from the diff list so they will be removed after the rename detect
phase is over.
When FIND_COPIES is used in combination with BREAK_REWRITES for
rename detection, there was a bug where the split MODIFIED delta
was only used as a target for RENAME records and not for COPIED
records. This fixes that, converting the split into a pair of
DELETED and COPIED deltas when that circumstance arises.
Whenever a reference is created or updated, we need to write to the
reflog regardless of whether the user gave us a message, so we shouldn't
leave that to the ref frontend, but integrate it into the backend.
This also eliminates the race between ref update and writing to the
reflog, as we protect the reflog with the ref lock.
As an additional benefit, this reflog append on the backend happens by
appending to the file instead of parsing and rewriting it.
Copy the pointers into temporary vectors instead of assigning them to
the same array so we don't mess with someone else's memory by
accident (e.g. by sorting).
The callback-based method of listing remote references dates back to the
beginning of the network code's lifetime, when we didn't know any
better.
We need to keep the list around for update_tips() after disconnect() so
let's make use of this to simply give the user a pointer to the array so
they can write straightforward code instead of having to go through a
callback.
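With the change, listing the advertised refs becomes a plain loop, e.g.:

    #include <git2.h>
    #include <stdio.h>

    /* list the refs advertised by an already-connected remote */
    static int list_heads(git_remote *remote)
    {
        const git_remote_head **heads;
        size_t i, count;
        int error;

        if ((error = git_remote_ls(&heads, &count, remote)) < 0)
            return error;

        for (i = 0; i < count; i++)
            printf("%s\n", heads[i]->name);

        return 0;   /* the array is owned by the remote; nothing to free */
    }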
Removing arbitrary refspecs makes things more complex to reason
about. Instead, let the user set the fetch and push refspec list to
whatever they want it to be.
Create a git_branch_iterator type which is equivalent to the foreach but
lets us write loops instead of callbacks.
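For example (a sketch using the new iterator API):

    #include <git2.h>
    #include <stdio.h>

    static int print_local_branches(git_repository *repo)
    {
        git_branch_iterator *iter;
        git_reference *ref;
        git_branch_t type;
        int error;

        if ((error = git_branch_iterator_new(&iter, repo, GIT_BRANCH_LOCAL)) < 0)
            return error;

        while ((error = git_branch_next(&ref, &type, iter)) == 0) {
            printf("%s\n", git_reference_shorthand(ref));
            git_reference_free(ref);
        }

        git_branch_iterator_free(iter);
        return error == GIT_ITEROVER ? 0 : error;
    }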
Since the introduction of git_reference_shorthand(), the added value of
passing the name is reduced.
When the filesystem iterator encounters an error with a file, it
returns the error but because of the cleanup code, it was in some
cases erasing the error message. This uses the giterr_detach API
to make sure that the actual error message is restored after the
cleanup code has been run.
There are a number of cases where it is convenient to be able to
fetch and "claim" the current error string, clearing the error.
This is helpful when you need to call some code that may alter
the error and you want to restore it later on and/or report it via
some other mechanism.
We used to move `data_start` forward, which is wrong as that needs to
point to the beginning of the buffer in order to perform size
calculations.
Introduce a `write_start` variable which indicates where we should start
writing from, which is the role `data_start` was wrongly being reused
for.
The last commit taught git_checkout_tree to actually do something
meaningful when treeish is NULL. This lets us rewrite
git_checkout_head to simply call git_checkout_tree without giving it a
treeish.
In git_checkout_tree, the first check tests if either repo or treeish is
NULL and says that either of them has to have a valid value. But there
is no code to handle the treeish == NULL case.
So, do something meaningful in that case: use HEAD instead.
When downloading the default branch due to lack of refspecs, we still
need to write out FETCH_HEAD with the tip we downloaded, unfortunately
with a format that doesn't match what we already have.
This avoids sending our whole history bit by bit to the remote in cases
where there is no common history, just to give up in the end.
The number comes from the canonical implementation.
The correct behaviour when a remote has no refspecs (e.g. a URL from the
command-line) is to download the remote's HEAD. Let's do that.
This fixes #1261.
This was never really working right because we were checking the
wrong flag and not checking it in all the places that we need to
be checking it. I finally got around to writing a test and adding
actual support for it.
Sometimes the static initializer for git_diff_options cannot be
used and since setting them to all zeroes doesn't actually work
quite right, this adds a new helper for that situation.
This also adds an explicit new value to the submodule settings
options to be used when those enums need static initialization.
This changes `git_index_read` to have two modes - a hard index
reload that always resets the index to match the on-disk data
(which was the old behavior) and a soft index reload that uses
the timestamp / file size information and only replaces the index
data if the file on disk has been modified.
This then updates the git_status code to do a soft reload unless
the new GIT_STATUS_OPT_NO_REFRESH flag is passed in.
This also changes the behavior of the git_diff functions that use
the index so that when an index is not explicitly passed in (i.e.
when the functions call git_repository_index for you), they will
also do a soft reload for you.
This intentionally breaks the file signature of git_index_read
because there has been some confusion about the behavior previously
and it seems like all existing uses of the API should probably be
examined to select the desired behavior.
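After this change the force parameter selects the mode, e.g.:

    /* soft reload: only re-read if the on-disk index has changed */
    git_index_read(index, 0);

    /* hard reload: always reset to the on-disk contents */
    git_index_read(index, 1);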
These changes fix the basic problem with GIT_DIFF_REVERSE being
broken for text diffs. The reversed diff entries were getting
added to the git_diff correctly, but some of the metadata was kept
incorrectly in a way that prevented the text diffs from being
generated correctly. Once I fixed that, it became clear that it
was not possible to merge reversed diffs correctly. This has a
first pass at fixing that problem. We probably need more tests
to make sure that is really fixed thoroughly.
It seems that regexps on Mac OS X and Linux were behaving
differently: while on OS X the empty string didn't
match any value, on Linux it matched all of them,
so the second fetch refspec was overwriting the
first one instead of creating a new one.
Using an unmatchable regular expression solves the
problem (and seems to be portable).
At some point git_config_delete_entry lost the ability to delete one entry of
a multivar configuration. The moment you had more than one fetch or push
refspec for a remote, you would not be able to save that remote anymore. The
changes in network::remote::remotes::save show that problem.
I needed to create a new git_config_delete_multivar because I was not able to
remove one or several entries of a multivar config with the current API.
Several attempts at modifying how git_config_set_multivar(..., NULL) behaved
were not successful.
git_config_delete_multivar is very similar to git_config_set_multivar, and
delegates to config_delete_multivar in config_file. This function searches
for the cvar_t entries that will be deleted, storing them in a temporary
array and rebuilding the linked list. After calling config_write to delete
the entries, the cvar_t entries stored in the temporary array are freed.
There is also a little fix in config_write: it avoids an infinite loop when
using a regular expression (the multivar case). This error was found by the
test network::remote::remotes::tagopt.
This tells the server that we speak it, but we don't make use of its
extra information to determine if there's a better place to stop
negotiating.
In a somewhat-related change, reorder the capabilities so we ask for
them in the same order as git does.
Also take this opportunity to factor out a fairly-indented portion of
the negotiation logic.
It was there to keep it apart from the one which read in from a file on
disk. This other indexer does not exist anymore, so there is no need for
anything other than git_indexer to refer to it.
While here, rename _add() function to _append() and _finalize() to
_commit(). The former change is cosmetic, while the latter avoids
talking about "finalizing", which OO languages use to mean something
completely different.
When building libgit2 for the ia32 architecture on an x64 machine, including
"config.h" without a "common.h" would result in the following errors:
C:\Program Files\Microsoft SDKs\Windows\v7.1\include\winbase.h(2288): error C2373: 'InterlockedIncrement' : redefinition; different type modifiers [C:\cygwin\home\zcbenz\codes\git-utils\build\libgit2.vcxproj]
C:\Program Files\Microsoft SDKs\Windows\v7.1\include\winbase.h(2295): error C2373: 'InterlockedDecrement' : redefinition; different type modifiers [C:\cygwin\home\zcbenz\codes\git-utils\build\libgit2.vcxproj]
C:\Program Files\Microsoft SDKs\Windows\v7.1\include\winbase.h(2303): error C2373: 'InterlockedExchange' : redefinition; different type modifiers [C:\cygwin\home\zcbenz\codes\git-utils\build\libgit2.vcxproj]
C:\Program Files\Microsoft SDKs\Windows\v7.1\include\winbase.h(2314): error C2373: 'InterlockedExchangeAdd' : redefinition; different type modifiers [C:\cygwin\home\zcbenz\codes\git-utils\build\libgit2.vcxproj]
The user is unable to derive the number of deltas in the pack, as that
would require them to capture the stats at exactly the moment between
download and final processing, which is abstracted away in the fetch.
Capture these numbers for the user and expose them in the progress
struct. The clone and fetch examples now also present this information
to the user.
The names from libssh2 are somewhat obtuse for us. We can simplify the
usual key/passphrase credential's name, as well as make clearer what the
custom signature function is.
It seems that to implement these options, we just have to pass
the appropriate flags through to the libxdiff code taken from
core git. So let's do it (and add a test).
Instead of having functions with so very many parameters to pass
hunk and line data, this takes the existing git_diff_hunk struct
and extends it with more hunk data, plus adds a git_diff_line.
Those structs are used to pass back hunk and line data instead of
the old APIs that took tons of parameters.
Some work that was previously only being done for git_diff_patch
creation (scanning the diff content for exact line counts) is now
done for all callbacks, but the performance difference should not
be noticeable.
While the base git_diff_delta structure always contains two files,
when we introduce conflict data, it will be helpful to have an
indicator when an additional file is involved.
Move conflict handling into two steps: load the conflicts and
then apply the conflicts. This is more compatible with the
existing checkout implementation and makes progress reporting
more sane.
If a D/F conflict or rename 2->1 conflict occurs,
we write the file sides as filename~branchname. If
a file with that name already exists in the working
directory, write as filename~branchname_0 instead.
(Incrementing 0 until a unique filename is found.)
This lays groundwork for separating formatting options from diff
creation options. This groups the formatting flags separately
from the diff list creation flags and reorders the options. This
also tweaks some APIs to further separate code that uses patches
from code that just looks at git_diffs.
This makes no functional change to diff but renames a couple of
the objects and splits the new git_patch (formerly git_diff_patch)
into a new header file.
Don't increase the number of total objects, as it can produce
surprising progress output. The only addition compared to pre-thin is
the addition of local_objects to allow an output similar to git's
"completed with %d local objects".
The iconv init was accidentally clearing the default error state
during reference normalization. This resets so that normalization
errors will be detected correctly.
Before these changes, looking up a reference would return the
same precomposed or decomposed form of the reference name that
was used to look it up, so on MacOS which ignores the difference
between the two, a single reference could be looked up either way
and git_reference_name would return the form of the name that was
used to look it up! This change makes lookup always return the
precomposed name if core.precomposeunicode is set regardless of
which version was used to look it up. The reference iterator was
already returning the precomposed form from earlier work.
This also updates the CMakeLists.txt rules for enabling iconv
usage because the clar tests for this code were actually not being
activated properly with the old version.
Finally, this moves git_repository_reset_filesystem from include/
git2/repository.h to include/git2/sys/repository.h since it is not
really a function that normal library users should have to think
about very often.
This cleans up some additional issues. The main change is that
on a filesystem that doesn't support mode bits, libgit2 will now
create new blobs with GIT_FILEMODE_BLOB always instead of being
at the mercy to the filesystem driver to report executable or not.
This means that if "core.filemode" lies and claims that filemode
is not supported, then we will ignore the executable bit from the
filesystem. Previously we would have allowed it.
This adds an option to the new git_repository_reset_filesystem to
recurse through submodules if desired. There may be other types
of APIs that would like a "recurse submodules" option, but this
one is particularly useful.
This also has a number of cleanups, etc., for related things
including trying to give better error messages when problems come
up from the filesystem. For example, the FAT filesystem driver on
MacOS appears to return errno EINVAL if you attempt to write a
filename with invalid UTF-8 in it. We try to capture that with a
better error message now.
There may be multiple deltas referencing the same base as well as OFS
deltas which rely on a thin delta. Deal with both at the same time by
injecting a single object and going back up to the main
delta-resolving loop.
When a tool needs to recreate the tree object (for example an
interface to another VCS), it needs to use the raw attributes,
forgoing any normalization.
When a repository is transferred from one file system to another,
many of the config settings that represent the properties of the
file system may be wrong. This adds a new public API that will
refresh the config settings of the repository to account for the
change of file system. This doesn't do a full "reinitialize" and
operates on an existing git_repository object, refreshing the config
when done.
This commit then makes use of the new API in clar as each test
repository is set up.
This commit also has a number of other clar test fixes where we
were making assumptions about the type of filesystem, either based
on outdated config data or based on the OS instead of the FS.
When given an ODB from which to read objects, the indexer will attempt
to inject the missing bases at the end of the pack and update the
header and trailer to reflect the new contents.
Though unusual, a packfile may contain a delta whose base is a delta
that comes later. In order to index such a packfile, we must not give up
on the first failure to resolve a delta, but keep it around.
If there is a pass which makes no progress, this indicates that the
packfile is broken, so fail accordingly.
The repo init code was assuming Windows == no filemode, and
Mac or Windows == no case sensitivity. Those assumptions are not
consistently true depending on the mounted file system. This is a
first step to removing those assumptions. It focuses on the repo
init code and the tests of that code. There are still many other
tests that are broken when those assumptions don't hold true, but
this clears up one area of the code.
Also, this moves the core.precomposeunicode logic to be closer to
the current logic in core Git where it will be set to true on any
filesystem where composed unicode is decomposed when read back.
The indexer code was generating warnings on Windows 64-bit. I
looked closely at the logic and was able to simplify it a bit.
Also this fixes some other Windows and Linux warnings.
This adds a simple wrapper around the iconv APIs and uses it
instead of the old code that was inlining the iconv stuff. This
makes it possible for me to test the iconv logic in isolation.
A "no iconv" version of the API was defined with macros so that
I could have fewer ifdefs in the code itself.
This simplifies git_path_is_empty_dir on both Windows (getting rid
of git_buf allocation inside the function) and other platforms (by
just using git_path_direach), and adds tests for the function, and
uses the function to simplify some existing tests.
This hooks up git_path_direach and git_path_dirload so that they
will take a flag indicating if directory entry names should be
tested and converted from decomposed unicode to precomposed form.
This code will only come into play on the Apple platform and even
then, only when certain types of filesystems are used.
This involved adding a flag to these functions which involved
changing a lot of places in the code.
This was an opportunity to do a bit of code cleanup here and there,
for example, getting rid of the git_futils_cleanupdir_r function in
favor of a simple flag to git_futils_rmdir_r to not remove the top
level entry. That ended up adding depth tracking during rmdir_r
which led to a safety check for infinite directory recursion. Yay.
This hasn't actually been tested on the Mac filesystems where the
issue occurs. I still need to get a test environment for that.
This doesn't actually do the string precompose work, but it puts the hooks in
place into the iterators and the git_path_dirload function so that
the actual precompose work is ready to go.
This adds initialization of core.precomposeunicode to repo init
on Mac. This is necessary because when a Mac accesses a repo on
a VFAT or SAMBA file system, it will return directory entries in
decomposed unicode even if the filesystem entry is precomposed.
This also removes caching of a number of repo properties from the
repo init pipeline because these are properties of the specific
filesystem on which the repo is created, not of the system as a
whole.
This commit adds cancellation for the push operation. This work consists of:
1) Support cancellation during push operation
- During object counting phase
- During network transfer phase
- Propagate GIT_EUSER error code out to caller
2) Improve cancellation support during fetch
- Handle cancellation request during network transfer phase
- Clear error string when cancelled during indexing
3) Fix error handling in git_smart__download_pack
Cancellation during push is still only handled in the object counting and
network transfer stages of push (and not during packbuilding).
References and their logs are logically coupled, let's make it so in
the code by moving the fs-based reflog implementation to live next to
the fs-based refs one.
As part of the change, make the function take names rather than
references, as only the names are relevant when looking up and
handling reflogs.
The basic clone function is there to make it easy to create a "normal"
clone. Remove a bunch of options that are about changing the remote's
configuration.
The text progress and update_tips callbacks are already part of the
struct, which was meant to unify the callback setup, but the download
one was left out.
This adds the basics of progress reporting during push. While progress
for all aspects of a push operation are not reported with this change,
it lays the foundation to add these later. Push progress reporting
can be improved in the future - and consumers of the API should
just get more accurate information at that point.
The main areas where this is lacking are:
1) packbuilding progress: does not report progress during deltafication,
as this involves coordinating progress from multiple threads.
2) network progress: reports progress as objects and bytes are going
to be written to the subtransport (instead of as client gets
confirmation that they have been received by the server) and leaves
out some of the bytes that are transferred as part of the push protocol.
Basically, this reports the pack bytes that are written to the
subtransport. It does not report the bytes sent on the wire that
are received by the server. This should be a good estimate of
progress (and an improvement over no progress).
The subtransport path was relying on pointing to data owned by
the remote which meant that after a redirect, the updated path
was getting lost for future requests. This updates the http
transport to strdup the path and maintain its own lifetime.
This also pulls responsibility for parsing the URL back into the
http transport and isolates the functions that parse and free that
connection data so that they can be reused between the initial
parsing and the redirect parsing.
On occasion, files can disappear while we're iterating the
filesystem, between calls to readdir and stat. Let's pretend
those didn't exist in the first place.
The git_buf_text_gather_stats call returns a boolean indicating if
the file looks like binary data. That shouldn't be an error; it
should be used to skip CRLF processing though.
This replaces some git_buf_printf calls with simple calls to
git_buf_put instead. Also, it fixes a missing va_end inside
the git_buf_vprintf implementation.
The attempt to "clean up warnings" seems to have introduced some
new warnings on compliant compilers. This fixes those in a way
that I suspect will also be okay for the non-compliant compilers.
Also this fixes what appears to be an extra semicolon in the
repo initialization template dir handling (and as part of that
fix, handles the case where an error occurs correctly).
In revwalk, we are doing a very simple check to see if a string
contains wildcard characters, so a full regular expression match
is not needed.
In remote listing, now that we have git_config_foreach_match with
full regular expression matching, we can take advantage of that
and eliminate the regex here, replacing it with much simpler string
manipulation.
This contains a few bug fixes and some header and API cleanups.
The main API change is that filters should now use GIT_PASSTHROUGH
to indicate that they wish to skip processing a file instead of
GIT_ENOTFOUND.
The bug fixes include a possible out-of-range buffer access in
the ident filter, a filter ordering problem I introduced into the
custom filter tests on Windows, and a filter buf NUL termination
issue that was coming up on Linux.
This adds more tests of filters, including the ident filter when
mixed with custom filters. I was able to combine with the reverse
filter and demonstrate that the order of filter application with
the default priority constants matches the order of core Git.
Also, this fixes two issues in the ident filter: preventing ident
expansion on binary files and avoiding a NULL dereference when
dollar sign characters are found without Id.
Fixed the filter order to match core Git, too.
This test demonstrates an interesting behavior of core Git (which
is totally reasonable and which libgit2 matches, although mostly
by coincidence). If you use the ident filter and commit a file
with a garbage ident in it, like '$Id: this is just garbage$' and
then immediately do a 'git checkout-index' with the new file, Git
will not consider the file out of date and will not overwrite the
file with an updated $Id$. Libgit2 has the same behavior. If you
remove the file and then do a checkout-index, it will be replaced
with a filtered version that has injected the OID correctly.
These are a couple of new clar helpers for testing that a file
has expected contents that I extracted from the checkout code.
Actually wrote this as part of an abandoned earlier attempt at a
new filters API, but it will be useful now for some of the tests
I'm going to write.
I wish MSVC understood that "const char **" is not a const ptr,
but a non-const pointer to an array of const ptrs. Does that
seem like too much to ask?
Checkout should not reject binary files from filters, as a filter
may actually wish to operate on binary files. The CRLF filter should
reject binary files itself if it wishes to. Moreover, the CRLF
filter requires this logic so that users can emulate the checkout
data in their odb -> workdir filtering.
This makes the git_buf struct that was used internally into an
externally available structure and eliminates the git_buffer.
As part of that, some of the special cases that arose with the
externally used git_buffer were blended into the git_buf, such as
being careful about git_buf objects that may have a NULL ptr and
allowing for bufs with a valid ptr and size but zero asize as a
way of referring to externally owned data.
This adds the ident filter (that knows how to replace $Id$) and
tweaks the filter APIs and code so that git_filter_source objects
actually have the updated OID of the object being filtered when
it is a known value.
Extend the git2/sys/filter API with functions to look up a filter
and add it manually to a filter list. This requires some trickery
because the regular attribute lookups and checks are bypassed when
this happens, but in the right hands, it will allow a user to have
granular control over applying filters.
Increasingly there are a number of components that want to do some
cleanup at global shutdown time (at least if there are not going
to be memory leaks). This creates a very simple system of shutdown
hooks that will be invoked by git_threads_shutdown. Right now, the
maximum number of hooks is hardcoded, but since adding a hook is
not a public API, it should be fine and I thought it was better to
start off with really simple code.
This moves the git_filter_list into the public API so that users
can create, apply, and dispose of filter lists. This allows more
granular application of filters to user data outside of libgit2
internals.
This also converts all the internal usage of filters to the public
APIs along with a few small tweaks to make it easier to use the
public git_buffer stuff alongside the internal git_buf.
The filter registry as implemented was too primitive to actually
work once multiple filters were coming into play. This expands
the implementation of the registry to handle multiple prioritized
filters correctly.
Additionally, this adds an "attributes" field to a filter that
makes it really really easy to implement filters that are based
on one or more attribute values. The lookup and even simple value
checking can all happen automatically without custom filter code.
Lastly, with the registry improvements, this fills out the filter
lifecycle callbacks, with initialize and shutdown callbacks that
will be called before the filter is first used and after it is
last invoked. This allows for system-wide initialization and
cleanup by the filter.
This creates include/sys/filter.h with a basic definition of a
git_filter and then converts the internal code to use it. There
are related internal objects (git_filter_list) that we will want
to publish at some point, but this is a first step.
This begins the process of exposing git_filter objects to the
public API. This includes:
* new public type and API for `git_buffer` through which an
allocated buffer can be passed to the user
* new API `git_blob_filtered_content`
* make the git_filter type and GIT_FILTER_TO_... constants public
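A small usage sketch of `git_blob_filtered_content`, as the API looks once the buffer type is unified into git_buf (see the earlier note about eliminating git_buffer); the path and flag values here are illustrative:

    git_buf filtered = GIT_BUF_INIT;

    /* apply the filters the blob would get if checked out as "README.md" */
    if (git_blob_filtered_content(&filtered, blob, "README.md", 1) == 0) {
        /* filtered.ptr / filtered.size hold the would-be worktree contents */
        git_buf_free(&filtered);
    }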
Unfortunately git-core uses the term "unborn branch" and "orphan
branch" interchangeably. However, "orphan" is only really there for
the checkout command, which has the `--orphan` option so it doesn't
actually create the branch.
Branches never have parents, so the distinction of a branch with no
parents is odd to begin with. Crucially, the error messages deal with
unborn branches, so let's use that.
As the include depth increases, the chance of a realloc
increases. This means that whenever we run git_array_alloc() or call
config_parse(), we need to remember what our reader's index is so we
can look it up again.
We need to refresh the variables from the included files if they are
changed, so loop over all included files and re-parse the files if any
of them has changed.
Now that #1785 is merged, git_odb_stream_finalize_write() calculates the object id before invoking the odb backend.
This commit gives the backend a chance to check whether it already knows this object.
The GIT_MODE_TYPE macro was looking at all bits above the
permissions, but it should really just look at the top bits so
that it will give the right results for a setgid or setuid entry.
Since we're now using these macros in the tests, this was causing
a test failure on platforms that don't support setgid.
This adds some more macros for some standard operations on file
modes, particularly related to permissions, and then updates a
number of places around the code base to use the new macros.
In order to support config includes, we must differentiate between the
backend's main file and the file we are currently parsing.
This lays the groundwork for includes, keeping the current behaviours.
Previously, `git_object_read()`, `git_object_read_prefix()` and
`git_object_exists()` were implementing an auto refresh logic. When the
expected object couldn't be found in any backend, a call to
`git_odb_refresh()` was triggered and the lookup was once again performed
against all backends.
This commit removes this auto-refresh logic from the odb layer and pushes
it down into the pack-backend (as it's the only one currently exposing
a `refresh()` endpoint).
This simplifies git_repository_is_empty a bit so that a
detached HEAD is just taken to mean the repo is not empty, since
a newly initialized repo will not have a detached HEAD.
Ensure that we apply splits to rewrites, even if we're not
interested in examining them closely for rename/copy detection.
In keeping with core git, status should not display rewrites,
it should simply show files as "modified".
In order to be loaded, a remote needs to be configured with at least a `url` or a `pushurl`.
ENOTFOUND will be returned when trying to git_remote_load() a remote with neither of these entries defined.
This loads SRWLock APIs at runtime and in their absence (i.e. on
Windows before Vista) falls back on a regular CRITICAL_SECTION
that will not permit concurrent readers.
9e9aee6 added an include <netinet/in.h> to fix the build on FreeBSD.
Sometime since then, the same header has come to be included ifndef _WIN32, so
remove the duplicate include.
This converts an internal lock from a write lock to a read lock
where write isn't needed, and also clarifies some doc things about
where various locks are acquired and how various APIs are intended
to be used.
This adds thread safety to the refdb_fs by using the new
git_sortedcache object and also by relaxing the handling of some
filesystem errors where the fs may be changed out from under us.
This also adds some new threading tests that hammer on the refdb.
The refdb_fs implementation calls realloc directly on a reference
object when it wants to rename it. It is not a public object, so
this doesn't mess with the immutability of references, but it does
assume certain constraints on the reference representation. This
commit wraps that assumption in a small internal API to isolate it.
This adds a convenient new data type for caching the contents of
a file in memory when each item in that file corresponds to a name
and you need to both be able to lookup items by name and iterate
over them in some sorted order. The new data type has locks in
place to manage usage in a threaded environment.
If there were symbolic refs among the loose refs then the code
to create packed-refs would fail trying to parse the OID out of
them (where Git just skips trying to pack them). This fixes it.
When a git_buf contains a UTF-8 BOM, the three bytes comprising
that BOM are treated as unprintable characters. For a small git_buf,
the three BOM characters overwhelm the printable characters. This
is problematic when trying to check out a small file as the CR/LF
filtering will not apply.
p_inet_pton on Windows should set errno properly for callers.
Rewrite p_inet_pton to handle error cases correctly and add
test cases to exercise this function.
Report the index being locked with its own error code in order to be
able to differentiate, as a locked index is typically the result of a
crashed process or concurrent access, both of which often require user
intervention to fix.
If none of the backends support direct writes and we must stream the
whole file, we already know what the object's id should be; so use the
stream's functions directly, bypassing the frontend's hashing and
overwriting of our existing id.
The frontend is in charge of calculating the id of the objects. Thus
the backends should treat it as a read-only value. The positioning in
the function signature made it seem as though it was an output
parameter.
Make the id const and move it from the front to behind the subject
(backend or stream).
When dealing with a chain of tags, we need to enqueue each of them
individually, which means we can't use `git_tag_peel` as that jumps
over the intermediate tags.
Do the peeling manually so we can look at each object and take the
appropriate action.
Hash the data as it's coming into the stream and tell the backend what
its name is when finalizing the write. This makes it consistent with
the way a plain git_odb_write() performs the write.
This is in preparation for moving the hashing to the frontend, which
requires us to handle the incoming data before passing it to the
backend's stream.
This fixes a small memory leak in git_revparse where early returns on
errors from git_revparse_single cause a free() on the (reallocated) left
side of the revspec to be skipped.
Accept any value for the remote's url, including an empty string which
we used to reject as invalid configuration.
This is not quite what git does (although it has its own problems with
such configurations), and rejecting the value makes the issue harder to
fix, since it keeps the user from modifying it.
As we already need to check for a valid URL when we try to connect to
the network, let that perform the check, as we don't need to do it
anywhere else.
This reverts refactoring done in 13224ea4aa
that introduced a performance regression for NFS when reading files that
don't exist. open() forces a cache invalidation on NFS, while stat()ing a
file just uses the cache and is very quick.
To give a specific example, say you have a repo with a thousand packed
refs. Before this change, looking up every single one would incur a thousand
slow open() calls. With this change, it's a thousand fast stat() calls.
This is just a bunch of small fixes that I noticed while looking
at the UTF8 and UTF16 path stuff. It fixes a slowdown in looking
for an empty directory (not exiting the loop as soon as possible),
makes the dir name in the git__DIR structure a GIT_FLEX_ARRAY to
save an allocation, and fixes some slightly odd assumptions in the
cl_getenv helper.
Key-based authentication also needs a username, so include it in each
one.
Also stop assuming a default username of "git" in the ssh transport,
which has no business making such a decision.
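For illustration, a credentials callback now supplies the username itself; this sketch uses the public credential API names as they later settled (git_cred_ssh_key_new, GIT_CREDTYPE_SSH_KEY), and the key paths and the "git" fallback are placeholders chosen by the callback, not by the transport:

    static int acquire_cred(
        git_cred **out, const char *url, const char *username_from_url,
        unsigned int allowed_types, void *payload)
    {
        /* the caller decides the username; the transport no longer
         * silently falls back to "git" */
        const char *user = username_from_url ? username_from_url : "git";

        if (allowed_types & GIT_CREDTYPE_SSH_KEY)
            return git_cred_ssh_key_new(
                out, user,
                "/home/me/.ssh/id_rsa.pub",   /* placeholder paths */
                "/home/me/.ssh/id_rsa",
                NULL);                        /* passphrase, if any */

        return -1; /* no supported credential type offered */
    }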
The routines to push and pop ignore files while traversing a
directory had some issues. In particular, setting up the initial
list would sometimes push an ignore file before it ought to be
applied if the starting path was a directory containing an ignore
file. Also, the pop function was not always matching the right
part of the path and would fail to pop ignores from the list in
some cases.
This adds some tests that exercise a particular problematic case
and then fixes the problems that I could find related to this.
At some point, I'd like to isolate this ignore rule management
code and rewrite it, but that's a larger project and right now,
I'll opt to just try to fix the broken behaviors.
This rolls back the changes to fnmatch parsing from commit
2e40a60e84 except for the tests
that were added. Instead this adds a couple of new flags that can
be passed in when attempting to parse an fnmatch pattern. Also,
this changes the pathspec match logic to special case matching a
filename with a '!' prefix against a negative pattern.
This fixes the build.
`git_config_set_string(config, "config.section", "")` fails when
escaping the value.
The buffer in `escape_value` is allocated without NUL-termination, and
in the case of an empty string, 0 is passed as the buffer size to
`git_buf_grow`.
`git_buf_detach` returns NULL when the allocated size is 0, and that
leads to an error return from `GITERR_CHECK_ALLOC` called after
`escape_value`.
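A minimal reproduction, assuming `config` is an open git_config handle (the key name mirrors the example above):

    const char *value = NULL;

    /* before this fix, the empty value made escape_value fail */
    if (git_config_set_string(config, "config.section", "") < 0)
        return -1;

    /* after the fix, the empty string round-trips */
    if (git_config_get_string(&value, config, "config.section") < 0)
        return -1;
    /* value is now "" */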
The change in `config_file.c` was suggested by Russell Belfer <rb@github.com>
new functions in struct git_config_backend:
* iterator_new(...)
* iterator_free(...)
* next(...)
The old callback-based foreach style can still be used with `git_config_backend_foreach_match`.
This step is needed to easily add iterators to git_config_backend.
Also use these new git_strmap functions to implement the foreach:
* git_strmap_iter
* git_strmap_has_data(...)
* git_strmap_begin(...)
* git_strmap_end(...)
* git_strmap_next(...)
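At the frontend level, iteration ends up looking roughly like this; a sketch assuming the public wrappers git_config_iterator_new, git_config_next and git_config_iterator_free that sit on top of the new backend entry points, with `cfg` an open git_config:

    git_config_iterator *iter = NULL;
    git_config_entry *entry = NULL;
    int error;

    if (git_config_iterator_new(&iter, cfg) < 0)
        return -1;

    while ((error = git_config_next(&entry, iter)) == 0) {
        /* inspect entry->name / entry->value here */
    }

    git_config_iterator_free(iter);

    /* GIT_ITEROVER marks the normal end of iteration */
    return (error == GIT_ITEROVER) ? 0 : error;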
In git_diff_paired_foreach, temporarily resort the
index->workdir diff list by index path so that we can
track a rename in the workdir from head->index->workdir.
When using a rename source that is actually a to-be-split record,
we have to update the best-fit mapping data in both the case where
the target is also a split record and the case where the target
is a simple added record. Before this commit, we were only doing
the update when the target was itself a split record (and even in
that case, the test was slightly wrong).
After doing further profiling, I found that a lot of time was
being spent attempting to insert hashes into the file hash
signature when using the rolling hash because the rolling hash
approach generates a hash per byte of the file instead of one
per run/line of data.
To optimize this, I decided to convert back to a run-based file
signature algorithm which would be more like core Git.
After changing this, a number of the existing tests started to
fail. In some cases, this appears to have been because the test
was coded to be too specific to the particular results of the file
similarity metric and in some cases there appear to have been bugs
in the core rename detection code where only by the coincidence
of the file similarity scoring were the expected results being
generated.
This renames all the variables in the core rename detection code
to be more consistent and hopefully easier to follow which made it
a bit easier to reason about the behavior of that code and fix the
problems that I was seeing. I think it's in better shape now.
There are a couple of tests now that attempt to stress test the
rename detection code and they are quite slow. Most of the time
is spent setting up the test data on disk and in the index. When
we roll out performance improvements for index insertion, it
should also speed up these tests I hope.
The size data in the index may not reflect the actual size of the
blob data from the ODB when content filtering comes into play.
This commit fixes rename detection to use the actual blob size when
calculating data signatures instead of the value from the index.
Because of a misunderstanding on my part, I first converted the
git_index_add_bypath API to use the post-filtered blob data size
in creating the index entry. I backed that change out, but I
kept the overall refactoring of that routine and the new internal
git_blob__create_from_paths API because it eliminates an extra
stat() call from the code that adds a file to the index.
The existing tests actually cover this code path, at least when
running on Windows, so at this point I'm not adding new tests to
cover the changes.
The previous fix for checking file sizes with rename detection
always loads the blob. In this version, if the odb backend can
get the object header without loading the whole thing into memory,
then we'll just use that, so that we can eliminate possible rename
sources & targets without loading them.
The performance improvements I introduced for rename detection
were not able to run successfully for tree-to-tree diffs because
the blob size was not known early enough and so the file signature
always had to be calculated nonetheless.
This change separates loading blobs into memory from calculating
the signature. I can't avoid having to load the large blobs into
memory, but by moving it forward, I'm able to avoid the signature
calculation if the blob won't come into play for renames.
This restores the usage of GIT_DIFF_LINE_BINARY for the diff
output line that reads "Binary files x and y differ" so that it
can be optionally colorized independently of the file header.
This allows git_diff_patch_size to account for hunk headers and
file headers in the returned size. This required some refactoring
of the code that is used to print file headers so that it could be
invoked by the git_diff_patch_size API.
Also this increases the test coverage and fixes an off-by-one bug
in the size calculation when newline changes happen at the end of
the file.
Instead of using lots of strdup calls, this adds a memory pool to
the loose refs iteration code and uses it for keeping track of the
loose refs array. Memory usage could probably be reduced even
further by eliminating the vector and just scanning by adding the
strlen of each ref, but that would be a more intrusive change.
This also updates the error handling to be more thorough about
checking for failed allocations, etc.
The git_reference_next API silently skips invalid references when
scanning the loose refs. The git_reference_next_name API should
skip the same ones even though it isn't creating the reference
object.
This adds a test with an invalid loose reference and makes sure
that both APIs skip the same entries and generate the same results.
This makes git__swap use the __sync_lock_test_and_set primitive
with GCC and the InterlockedExchangePointer primitive with MSVC.
Previously it used compare_and_swap in a way that was probably
unintuitive for most uses (i.e. it could fail to swap in the
value if another thread raced in). Now it will always succeed
and the last thread to run in a race will win instead of the
first thread.
This also fixes up a little confusion between volatile void **
and void * volatile * that came up with the Win32 compiler.
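A simplified sketch of the resulting primitive; the real definition is a macro in libgit2's thread helpers and also deals with the volatile qualification mentioned above:

    /* atomically exchange *ptr with newval and return the previous value;
     * in a race, the last thread to run wins */
    static void *swap_ptr(void **ptr, void *newval)
    {
    #if defined(_MSC_VER)
        return InterlockedExchangePointer((PVOID volatile *)ptr, newval);
    #else
        return __sync_lock_test_and_set(ptr, newval);
    #endif
    }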
This restores a behavior that was accidentally lost during some
diff refactoring where an untracked directory that contains a .git
item should be treated as IGNORED, not as UNTRACKED. The submodule
code already detects this, but the diff code was not handling the
scenario right.
This also updates a number of existing tests that were actually
exercising the behavior but did not have the right expectations in
place. It actually makes the new
`test_diff_submodules__diff_ignore_options` test feel much better
because the "not-a-submodule" entries are now ignored instead of
showing up as untracked items.
Fixes#1697
This adds correct support for an equivalent to --ignore-submodules
in diff, where an actual ignore value can be passed to diff to
override the per submodule settings in the configuration.
This required tweaking the constants for ignore values so that
zero would not be used and could represent an unset option to the
diff. This was an opportunity to move the submodule values into
include/git2/types.h and to rename the poorly named DEFAULT values
for ignore and update constants to RESET instead.
Now the GIT_DIFF_IGNORE_SUBMODULES flag is exactly the same as
setting the ignore_submodules option to GIT_SUBMODULE_IGNORE_ALL
(which is actually a minor change from the old behavior in that
submodules will now be treated as UNMODIFIED deltas instead of
being left out totally - if you set GIT_DIFF_INCLUDE_UNMODIFIED).
This includes tests for the various new settings.
Submodules now expose an internal status API that allows diff to
get back the OID values from the submodule very easily, to avoid
caching issues, and to override the ignore setting for
the submodule.
This fixes the way that submodule status is checked to bypass just
about all of the caching in the submodule object. Based on the
ignore value, it will try to do the minimum work necessary to find
the current status of the submodule - but it will actually go to
disk to get all of the current values.
This also removes the custom refcounting stuff in favor of the
common git_refcount style. Right now, it is still for internal
purposes only, but it should make it easier to add true submodule
refcounting in the future with a public git_submodule_free call
that will allow bindings not to worry about the submodule object
getting freed from underneath them.
This adds a BARE option to git_repository_open_ext which allows
a fast open path that still knows how to read gitlinks and to
search for the actual .git directory from a subdirectory.
`git_repository_open_bare` is still simpler and faster, but having
a gitlink aware fast open is very useful for submodules where we
want to quickly be able to peek at the HEAD and index data without
doing any other meaningful repo operations.
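A sketch of the intended fast-peek usage (the path is a placeholder):

    git_repository *sm_repo = NULL;

    /* fast open: still reads a .git gitlink and searches for the actual
     * .git directory from a subdirectory */
    if (git_repository_open_ext(&sm_repo, "path/to/submodule",
            GIT_REPOSITORY_OPEN_BARE, NULL) < 0)
        return -1;

    /* peek at HEAD and index data here, then drop the handle */
    git_repository_free(sm_repo);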
This is probably not the final form of this change, but this is
a preliminary version of checking a timestamp to see if the cached
working directory HEAD OID matches the current. Right now, this
uses the timestamp on the index and is, like most of our timestamp
checking, subject to having only second accuracy.
This adds an additional pathspec API that will match a pathspec
against a diff object. This is convenient if you want to handle
renames (so you need the whole diff and can't use the pathspec
constraint built into the diff API) but still want to tell if the
diff had any files that matched the pathspec.
When the pathspec is matched against a diff, instead of keeping
a list of filenames that matched, the API keeps the list
of git_diff_deltas that matched, and they can be retrieved via a
new API git_pathspec_match_list_diff_entry.
There are a couple of other minor API extensions here that were
mostly for the sake of convenience and to reduce dependencies
on knowing the internal data structure between files inside the
library.
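Roughly, usage looks like this (a sketch; `diff` is assumed to be an already-generated diff with rename detection applied, and the pattern is a placeholder):

    git_pathspec *ps = NULL;
    git_pathspec_match_list *matches = NULL;
    char *patterns[] = { "src/*.c" };
    git_strarray strs = { patterns, 1 };
    size_t i;

    if (git_pathspec_new(&ps, &strs) < 0 ||
        git_pathspec_match_diff(&matches, diff, 0, ps) < 0)
        goto done;

    for (i = 0; i < git_pathspec_match_list_entrycount(matches); ++i) {
        const git_diff_delta *delta =
            git_pathspec_match_list_diff_entry(matches, i);
        /* delta->old_file.path and delta->new_file.path cover renames */
    }

    done:
    git_pathspec_match_list_free(matches);
    git_pathspec_free(ps);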
This is a simple bit vector object that is not resizable after
the initial allocation but can be of arbitrary size. It will
keep the bit vector entirely on the stack for vectors of 64 bits
or less, and will allocate the vector on the heap for larger
sizes. The API is uniform regardless of storage location.
This is very basic right now and all the APIs are inline functions,
but it is useful for storing an array of boolean values.
This converts the array of parent SHAs from a git_vector where
each SHA has to be separately allocated to a git_array_t where
all the SHAs can be kept in one block. Since the two collections
have almost identical APIs, there isn't much involved in making
the change. I did add an API to git_array_t so that it could be
allocated at a precise initial size.
This fixes the way the example log program decides if a merge
commit should be shown when a pathspec is given. Also makes it
easier to use the pathspec API to just check "does a tree match
anything in the pathspec" without allocating a match list.
This adds a new public API for compiling pathspecs and matching
them against the working directory, the index, or a tree from the
repository. This also reworks the pathspec internals to allow the
sharing of code between the existing internal usage of pathspec
matching and the new external API.
While this is working and the new API is ready for discussion, I
think there is still an incorrect behavior in which patterns are
always matched against the full path of an entry without taking
the subdirectories into account (so "s*" will match "subdir/file"
even though it wouldn't with core Git). Further enhancements are
coming, but this was a good place to take a functional snapshot.
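As a sketch of the new public API (patterns and flags here are illustrative, and `repo` is an open repository):

    git_pathspec *ps = NULL;
    git_pathspec_match_list *list = NULL;
    char *patterns[] = { "*.h", "docs/*" };
    git_strarray strs = { patterns, 2 };

    if (git_pathspec_new(&ps, &strs) < 0)
        return -1;

    /* match against the working directory; the index and tree variants
     * work the same way against those data sources */
    if (git_pathspec_match_workdir(&list, repo,
            GIT_PATHSPEC_FIND_FAILURES, ps) == 0) {
        size_t matched   = git_pathspec_match_list_entrycount(list);
        size_t unmatched = git_pathspec_match_list_failed_entrycount(list);
        /* ... */
        git_pathspec_match_list_free(list);
    }

    git_pathspec_free(ps);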
The SSH error checking and reporting could still be further
improved by using the libssh2 native methods to get error info,
but at least this ensures that all error codes are checked and
translated into libgit2 error messages.
If there was no error, the return value was always the return value
of the last call to file->get_multivar.
With this commit GIT_ENOTFOUND is only returned if all the calls to
file->get_multivar return GIT_ENOTFOUND.
This makes all of the credential objects use the same pattern to
clear the contents and call git__memzero when done. Much of this
information is probably not sensitive, but it also seems better
to just clear consistently.
Much of the SSH credential creation API can be left enabled even
on platforms with no SSH support. We really just have to give an
error when you attempt to open the SSH connection.
The diff hunk context string that is returned to xdiff need not
be NUL terminated because the xdiff code just copies the number of
bytes that you report directly into the output. There was an off-
by-one in the diff driver code when the header context was longer
than the output buffer size: the output buffer length included
the NUL byte, which was then copied into the hunk header.
Fixes#1710
This option serves no benefit now that the git_status_list API
is available. It was of questionable value before and now it
would just be a bad idea to use it rather than the indexed API.
The index isn't really thread safe for the most part, but we can
easily be more careful and avoid double frees and the like, which
are serious problems (as opposed to a lookup which might return
an incorrect value if the index is being updated, which is
much harder to avoid).
In both of these cases, the submodule data should still be loaded
just (obviously) without the data that comes from either the index
or the HEAD.
This fixes a bug in the orphaned head case.
There was a bug where submodules whose HEAD had not been moved
were being marked as having an UNMODIFIED delta record instead
of being left MODIFIED. This fixes that and fixes the tests to
notice if a submodule has been incorrectly marked as UNMODIFIED.
In theory, p_stat should never return an S_ISLNK result, but due
to the current implementation on Windows with mount points it is
possible that it will. For now, work around that by allowing a
link in the path to a directory being created. If it is really a
problem, then the issue will be caught on the next iteration of
the loop, but typically this will be the right thing to do.
This updates the calls that make the subdirectories for objects
to use a base directory above which git_futils_mkdir won't walk
any higher. This prevents attempts to mkdir all the way up to
the root of the filesystem.
Also, this moves the objects_dir into the loose backend structure
and removes the separate allocation, plus does some preformatting
of the objects_dir value to guarantee a trailing slash, etc.
With the new target directory option to checkout, the non-bareness
of the repository should be checked much later in the parameter
validation process - actually that check was already in place, but
I was doing it redundantly in the checkout APIs.
This removes the now unnecessary early check for bare repos. It
also adds some other parameter validation and makes it so that
implied parameters can actually be passed as NULL (i.e. if you
pass a git_index, you don't have to pass the git_repository - we
can get it from the index).
This adds the ability for checkout to write to a target directory
instead of having to use the working directory of the repository.
This makes it easier to do exports of repository data and the like.
This is similar to, but not quite the same as, the --prefix option
to `git checkout-index` (this will always be treated as a directory
name, not just as a simple text prefix).
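A sketch of the new option (struct and field names as in the checkout options of this era, i.e. git_checkout_opts; `repo` and `treeish` are assumed to exist and the target path is a placeholder):

    git_checkout_opts opts = GIT_CHECKOUT_OPTS_INIT;

    /* write the tree's content under an export directory instead of the
     * repository's working directory */
    opts.checkout_strategy = GIT_CHECKOUT_FORCE;
    opts.target_directory  = "/tmp/export";

    if (git_checkout_tree(repo, treeish, &opts) < 0)
        return -1;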
As part of this, the workdir iterator was extended to take the
path to the working directory as a parameter and fall back on the
git_repository_workdir result only if it's not specified.
Fixes#1332
This fixes the checkout case when a file is modified between the
baseline and the target and yet missing in the working directory.
The logic for that case appears to have been wrong.
This also adds a useful checkout notify callback to the checkout
test helpers that will count notifications and also has a debug
mode to visualize what checkout thinks that it's doing.
Files in status will, by default, be sorted according to the case
sensitivity of the filesystem that we're running on. However,
in some cases, this is not desirable. Even on case insensitive
file systems, 'git status' at the command line will generally use
a case sensitive sort (like 'ls'). Some GUIs prefer to display a
list of files case insensitively even on case-sensitive platforms.
This adds two new flags: GIT_STATUS_OPT_SORT_CASE_SENSITIVELY
and GIT_STATUS_OPT_SORT_CASE_INSENSITIVELY that will override the
default sort order of the status output and give the user control.
This includes tests for exercising these new options and makes
the examples/status.c program emulate core Git and always use a
case sensitive sort.
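For example, using the status list API (a sketch; `repo` is an open repository):

    git_status_options opts = GIT_STATUS_OPTIONS_INIT;
    git_status_list *status = NULL;
    size_t i;

    /* force a case sensitive sort even on a case insensitive filesystem */
    opts.flags = GIT_STATUS_OPT_SORT_CASE_SENSITIVELY;

    if (git_status_list_new(&status, repo, &opts) < 0)
        return -1;

    for (i = 0; i < git_status_list_entrycount(status); ++i) {
        const git_status_entry *e = git_status_byindex(status, i);
        /* entries come back in case sensitive path order */
    }

    git_status_list_free(status);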
This adds some tests for updating the index and having it remove
items to make sure that the iteration over the index still works
even as earlier items are removed.
In testing with valgrind, this found a path that would use the
path string from the index entry after it had been freed. The
bug fix is simply to copy the path of the index entry before
doing any actual index manipulation.
This adds three new public APIs for manipulating the index:
1. `git_index_add_all` is similar to `git add -A` and will add
files in the working directory that match a pathspec to the
index while honoring ignores, etc.
2. `git_index_remove_all` removes files from the index that match
a pathspec.
3. `git_index_update_all` updates entries in the index based on
the current contents of the working directory, either adding
the new information or removing the entry from the index.
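A rough sketch of typical usage (assuming `index` came from git_repository_index; the pathspec is illustrative, and the NULL arguments are the optional matched-path callback and its payload):

    char *patterns[] = { "src/" };
    git_strarray pathspec = { patterns, 1 };

    /* roughly `git add -A src/`: add or update matching workdir files,
     * honoring ignores (GIT_INDEX_ADD_FORCE would add ignored files too) */
    if (git_index_add_all(index, &pathspec,
            GIT_INDEX_ADD_DEFAULT, NULL, NULL) < 0)
        return -1;

    /* refresh existing entries from the workdir, dropping ones whose
     * files were deleted, then persist the index */
    if (git_index_update_all(index, &pathspec, NULL, NULL) < 0 ||
        git_index_write(index) < 0)
        return -1;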
Command line Git sometimes generates an error message if given a
pathspec that contains an exact match to an ignored file (provided
--force isn't also given). This adds an internal function that
makes it easy to check if that has happened. Right now, I'm not
creating a public API for this because that would get a little
more complicated with a need for callbacks for all invalid paths.
Right now, setting up a pathspec to be parsed and processed
requires several data structures and a couple of API calls. This
adds a new high level data structure that contains all the items
that you'll need and high-level APIs that do all of the setup and
all of the teardown. This will make it easier to use pathspecs
in various places with less repeated code.
This makes the diff rename tracking code more careful about the
order in which it processes renames and more thorough in updating
the mapping of correct renames when an earlier rename update
alters the index of a later matched pair.
This adds parameters to the four functions that allow for blob-to-
blob and blob-to-buffer differencing (either via callbacks or by
making a git_diff_patch object). These parameters let you say
what filename we should pretend the blob has while doing the diff.
If you pass NULL, there should be no change from the existing
behavior, which is to skip using attributes for file type checks
and just look at content. With the parameters, you can plug into
the new diff driver functionality and get binary or non-binary
behavior, plus function context regular expressions, etc.
This commit also fixes things so that the git_diff_delta that is
generated by these functions will actually be populated with the
data that we know about the blobs (or buffers) so you can use it
appropriately. It also fixes a bug in generating patches from
the git_diff_patch objects created via these functions.
Lastly, there is one other behavior change that may matter. If
there is no difference between the two blobs, these functions no
longer generate any diff callbacks / patches unless you have
passed in GIT_DIFF_INCLUDE_UNMODIFIED. This is pretty natural,
but could potentially change the behavior of existing usage.
This changes the behavior of the status RENAMED flags so that they
will be combined with the MODIFIED flags if appropriate. If a file
is modified in the index and also renamed, then the status code
will have both the GIT_STATUS_INDEX_MODIFIED and INDEX_RENAMED bits
set. If it is renamed but the OID has not changed, then just the
GIT_STATUS_INDEX_RENAMED bit will be set. Similarly, the flags
GIT_STATUS_WT_MODIFIED and GIT_STATUS_WT_RENAMED can both be set
independently of one another.
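For instance, a consumer that wants to distinguish a pure rename from a rename-plus-edit can test the bits together (a small sketch; the status bitmask is assumed to come from a query run with rename detection enabled):

    /* classify the index side of a status bitmask `s` */
    static const char *describe_index_change(unsigned int s)
    {
        if ((s & GIT_STATUS_INDEX_RENAMED) && (s & GIT_STATUS_INDEX_MODIFIED))
            return "renamed and modified in the index";
        if (s & GIT_STATUS_INDEX_RENAMED)
            return "renamed in the index (content unchanged)";
        if (s & GIT_STATUS_INDEX_MODIFIED)
            return "modified in the index";
        return "no index-side change";
    }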
This fixes a serious bug where the check for unmodified files that
was done at data load time could end up erasing the RENAMED state
of a file that was renamed with no changes.
Lastly, this contains a bunch of new tests for status with renames,
including tests where the only rename changes are case changes.
The expected results of these tests have to vary by whether the
platform uses a case sensitive filesystem or not, so the expected
data covers those platform differences separately.
Trees are always case sensitive. The index is always case
preserving and will be case sensitive when it is turned into a
tree. Therefore the tree and the index can and should always
be compared to one another case sensitively.
This will restore the index to case insensitive order after the
diff has been generated.
Consider this a short-term fix. The long term fix is to have the
index always stored both case sensitively and case insensitively
(at least on platforms that sometimes require case insensitivity).
In a case insensitive index, if you attempt to add a file from
disk with a different case pattern, the old case pattern in the
index should be preserved.
This fixes that (and a couple of minor warnings).
This commit reinstates some changes to git_diff__paired_foreach
that were discarded during the rebase (because the diff_output.c
file had gone away), and also adjusts the case insensitivity
logic slightly to hopefully deal with both mismatched icase
diffs and other case insensitivity scenarios.
This makes diff more careful about picking the canonical path
when generating a delta so that it won't accidentally pick up a
case-mismatched path on a case-insensitive file system. This
should make sure we use the "most accurate" case correct version
of the path (i.e. from the tree if possible, or the index if
need be).
The exclude submodules flag was not doing the right thing, in
that a file with no diff between the head and the index and just
a delete in the workdir could be excluded if submodules were
excluded.
On Linux: fix a warning message related to the volatile qualifier (cast)
On Windows: use SecureZeroMemory()
On both, inline the call, so that no entry point can lead back to this "secure" memory zeroing.
This makes the git_diff_patch definition private to diff_patch.c
and fixes a number of other header file naming inconsistencies to
use `git_` prefixes on functions and structures that are shared
between files.
This changes the size data to uint32_t, fixes the array growth
logic to use a simple 1.5x multiplier, and uses a generic inline
function for growing the array to make the git_array_alloc API
feel more natural (i.e. it returns a pointer to the new item).
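The resulting usage pattern looks roughly like this (a sketch of the internal macros; the names and the initializer are approximate since this is not public API, and some_commit_oid is a placeholder):

    git_array_t(git_oid) parents = GIT_ARRAY_INIT;
    git_oid *id;

    /* git_array_alloc grows the backing store (by ~1.5x when full) and
     * returns a pointer to the newly added slot, or NULL on failure */
    if ((id = git_array_alloc(parents)) == NULL)
        return -1;
    git_oid_cpy(id, &some_commit_oid);   /* placeholder value */

    git_array_clear(parents);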
This adds two new public APIs: git_diff_patch_from_blobs and
git_diff_patch_from_blob_and_buffer, plus it refactors the code
for git_diff_blobs and git_diff_blob_to_buffer so that the code
is almost entirely shared between these APIs, and adds tests for
the new APIs.
This implements the loading of regular expression pattern lists
for diff drivers that search for function context in that way.
This also changes the way that diff drivers update options and
interface with xdiff APIs to make them a little more flexible.
There are all sorts of misconfiguration in the wild. We already rely
on the signature constructor to trim SP. Extend the logic to use
`isspace` to decide whether a character should be trimmed.
This is a significant reorganization of the diff code to break it
into a set of more clearly distinct files and to document the new
organization. Hopefully this will make the diff code easier to
understand and to extend.
This adds a new `git_diff_driver` object that looks up diff driver
information from the attributes and the config so that things like
function content in diff headers can be provided. The full driver
spec is not implemented in this commit - this is focused on the
reorganization of the code and putting the driver hooks in place.
This also removes a few #includes from src/repository.h that were
overbroad, but as a result required extra #includes in a variety
of places since including src/repository.h no longer results in
pulling in the whole world.
There are two places where git_futils_mkdir should exit early or
at least do less. The first is when using GIT_MKDIR_SKIP_LAST
and having that flag leave no directory left to create; it was
being handled previously, but the behavior was subtle. Now I put
in a clear explicit check that exits early in that case.
The second is when there is no directory to create, but there is
a valid path that should be verified. I shifted the logic a bit
so we'll be better about not entering the loop when that happens.
This implements a basic callback to extract function context for
a diff. It always uses the same search heuristic right now with
no regular expressions or language-specific variants. Those will
come next, I think.
This makes sure that git_futils_mkdir always skips over the root
directory at a minimum, even on platforms where the root is not
simply '/'. Also, this removes the GIT_WIN32 ifdef in favor of
treating EACCES as a potentially recoverable error on all platforms.
We ran into an issue where cloning a repository to a folder
directly underneath the root of a volume (e.g. 'd:\libgit2')
would fail with an access denied error. This was traced down
to the fact that a call to make a directory that is the root
(e.g. 'd:') could return an error indicating access denied instead
of an error indicating the path already exists. This change now handles
the access denied error on Win32 and checks for the existence
of the folder.
git doesn't do that, and it's not something that's usually
actionable to fix. If you have a git repository with one bad
timezone in the history, it's most likely too late to change it.
Instead of just blowing away the stat cache data when loading a
new tree into the index, this checks if each loaded item has a
corresponding existing item with the same OID and if so, copies
the stat data from the old item to the new one so it will not be
blown away.
It is obviously quite a serious problem if this happens, but mutex
initialization can fail and we should detect it. It's a bit like
a memory allocation failure, in that you're probably pretty screwed
if this occurs, but at least we'll catch it.
By zeroing out the memory when we free larger objects (i.e. those
that serve as collections of other data, such as repos, odb, refdb),
I'm hoping that it will be easier for libgit2 bindings to find
errors in their object management code.
1. internal iterators now return GIT_ITEROVER when you go past the
last item in the iteration.
2. git_iterator_advance will "advance" to the first item in the
iteration if it is called immediately after creating the
iterator, which allows a simpler idiom for basic iteration.
3. if git_iterator_advance encounters an error reading data (e.g.
a missing tree or an unreadable file), it returns the error
but also attempts to advance past the invalid data to prevent
an infinite loop.
Updated all tests and internal usage of iterators to account for
these new behaviors.
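The resulting iteration idiom looks roughly like this (a sketch; git_iterator is internal, so the exact signature of git_iterator_advance is approximate, and `iter` is assumed to be a freshly created iterator):

    const git_index_entry *entry;
    int error;

    /* the first advance yields the first item (behavior 2 above) */
    while ((error = git_iterator_advance(&entry, iter)) == 0) {
        /* process entry */
    }

    if (error != GIT_ITEROVER)
        return error;   /* a real failure, not just the end of iteration */
    return 0;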