* Make GIT_INLINE an internal definition so it cannot be used in
public headers
* Fix language in CONTRIBUTING
* Make index caps API use signed instead of unsigned values
On some systems, notably HP PA-RISC systems running Linux or HP-UX,
EWOULDBLOCK and EAGAIN are not the same value. POSIX (and these OSes) allow
EWOULDBLOCK to occur on write(2) (and send(2), etc.), so check explicitly
for this case as well as EAGAIN by defining and using a macro GIT_ISBLOCKED
that considers both.
The macro is necessary because MSYS does not provide EWOULDBLOCK and
compilation fails if an attempt is made to use it unconditionally. On most
systems, where the two values are the same, the compiler will simply
optimize this check out and it will have no effect.
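For illustration, a minimal sketch of the shape such a macro can take (the real definition lives in libgit2's portability headers and may differ):

    #include <errno.h>

    /* EWOULDBLOCK may be missing entirely (e.g. on MSYS); where it exists and
     * equals EAGAIN, the second comparison is optimized away. */
    #ifdef EWOULDBLOCK
    # define GIT_ISBLOCKED(e) ((e) == EAGAIN || (e) == EWOULDBLOCK)
    #else
    # define GIT_ISBLOCKED(e) ((e) == EAGAIN)
    #endif

    /* typical use after a short write on a non-blocking socket:
     *     if (written < 0 && GIT_ISBLOCKED(errno)) retry();
     */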
This adds an API to amend an existing commit, basically a shorthand
for creating a new commit filling in missing parameters from the
values of an existing commit. As part of this, I also added a new
"sys" API to create a commit using a callback to get the parents.
This allowed me to rewrite all the other commit creation APIs so
that temporary allocations are no longer needed.
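A hedged usage sketch of the amend API, where NULL arguments fall back to the values of the commit being amended (error handling omitted):

    #include <git2.h>

    static int amend_message(git_commit *original)
    {
        git_oid new_id;

        return git_commit_amend(
            &new_id,
            original,
            "HEAD",                        /* also move this ref, or NULL */
            NULL, NULL,                    /* keep author and committer */
            NULL,                          /* keep the message encoding */
            "corrected commit message",    /* new message */
            NULL);                         /* keep the tree */
    }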
This fixes a number of warnings with the Windows 64-bit build
including a test failure in test_repo_message__message where an
invalid pointer to a git_buf was being used.
We need this from util.h and posix.h, but the latter includes common.h
which includes util.h, which means p_strlen is not defined by the time
we get to git__strndup().
Split the definition of p_strlen() off into its own header so we can use
it in util.h.
The current code issues a lot of strncmp() calls just to find the end of
the header so that it can copy it and start going through it again. That
is a lot of calls for something we can check as we go along. Knowing the
number of parents beforehand to reduce allocations in extreme cases does
not make up for them.
Instead start parsing immediately and check for the double-newline after
each header field, leaving the raw_header allocation for the end, which
lets us go through the header once and reduces the number of strncmp()
calls significantly.
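A rough sketch of the single-pass idea (not the actual parser), assuming each header field ends at a newline and an empty line terminates the header:

    #include <stddef.h>
    #include <string.h>

    static const char *find_message_start(const char *buf, size_t len)
    {
        const char *p = buf, *end = buf + len;

        while (p < end && *p != '\n') {       /* a blank line ends the header */
            const char *eol = memchr(p, '\n', (size_t)(end - p));
            if (eol == NULL)
                return NULL;                  /* truncated header */
            /* ...parse the field in [p, eol) as we go... */
            p = eol + 1;
        }
        return p < end ? p + 1 : NULL;        /* first byte of the message */
    }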
In unscientific testing, this has reduced a shortlog-like usage (walking
through the whole history of a branch and extracting data from the
commits) of git.git from ~830ms to ~700ms and makes the time we spend in
strncmp() negligible.
Let the user push committish objects and peel them to figure out which
commit to push to our queue.
This is for convenience and for allowing uses of
git_revwalk_push_glob(w, "tags")
with annotated tags.
Change the name to _matching() instead of _if(), and force _set_target()
to be a conditional update. If the user doesn't care about the old
value, they should use git_reference_create().
This tweaks the pqueue_up and pqueue_down routines so that they
will not do full element swaps but instead carry over the state
of the previous loop iteration and only assign elements for which
we know the final position. This will avoid a little bit of data
assignment which should improve performance in theory.
Also got rid of some vector helpers that I'm no longer using.
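An illustrative sketch of the swap-free sift-up described above (names here are not libgit2's): the moving element is held aside while parents are shifted down, and it is stored once, in its final slot:

    #include <stddef.h>

    static void sift_up(void **heap, size_t pos,
                        int (*cmp)(const void *, const void *))
    {
        void *item = heap[pos];

        while (pos > 0) {
            size_t parent = (pos - 1) / 2;
            if (cmp(heap[parent], item) <= 0)
                break;
            heap[pos] = heap[parent];   /* shift parent down, no full swap */
            pos = parent;
        }
        heap[pos] = item;               /* single final assignment */
    }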
This fixes a typo I made for setting the sorted flag on the index
after a reload. That typo didn't actually cause any test failures
so I'm also adding a test that explicitly checks that the index is
correctly sorted after a reload when ignoring case and when not.
This updates the git_pqueue to simply be a set of specialized
init/insert/pop functions on a git_vector.
To preserve the pqueue feature of having a fixed size heap, I
converted the "sorted" field in git_vectors to a more general
"flags" field so that pqueue could mix in its own flag. This
had a bunch of ramifications because a number of places were
directly looking at the vector "sorted" field - I added a couple
new git_vector helpers (is_sorted, set_sorted) so the specific
representation of this information could be abstracted.
I accidentally wrote a separate priority queue implementation when
I was working on file rename detection as part of the file hash
signature calculation code. To simplify licensing terms, I just
adapted that to a general purpose priority queue and replaced the
old priority queue implementation that was borrowed from elsewhere.
This also removes parts of the COPYING document that no longer
apply to libgit2.
Validating the workdir should not compare HEAD to working
directory - this is both inefficient (as it ignores the cache)
and incorrect. If we had legitimately allowed changes in the
index (identical to the merge result) then comparing HEAD to
workdir would reject these changes as different. Further, this
will identify files that were filtered strangely as modified,
while testing with the cache would prevent this.
Also, it's stupid slow.
The checkout code used to defer removal of "blocking" files in
checkouts until the blocked item was actually being written (since
we have already checked that removing the blocking item is acceptable
according to the update rules). Unfortunately, this resulted in
an intermediate index state where both the blocking and new items
were in the index which is no longer allowed. Now we just remove
the blocking item in the first pass so it never needs to coexist.
In cases where there are typechanges, this could result in a bit
more churn of removing and recreating intermediate directories,
but I'm going to assume that is an unusual case and the churn will
not be too costly.
There were some confusing issues mixing up the number of bytes
written to the zstream output buffer with the number of bytes
consumed from the zstream input. This reorganizes the zstream
API and makes it easier to deflate an arbitrarily large input
while still using a fixed size output.
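As an illustration of the distinction, here is a generic zlib loop (not libgit2's git_zstream code) where bytes consumed are tracked through avail_in and bytes produced through avail_out, so a fixed-size output buffer can be drained repeatedly:

    #include <stdio.h>
    #include <string.h>
    #include <zlib.h>

    static int deflate_with_fixed_output(const unsigned char *in, size_t in_len)
    {
        unsigned char out[4096];          /* fixed-size output buffer */
        z_stream strm;
        int ret;

        memset(&strm, 0, sizeof(strm));
        if (deflateInit(&strm, Z_DEFAULT_COMPRESSION) != Z_OK)
            return -1;

        strm.next_in  = (unsigned char *)in;
        strm.avail_in = (uInt)in_len;     /* input left to be consumed */

        do {
            strm.next_out  = out;
            strm.avail_out = sizeof(out);
            ret = deflate(&strm, Z_FINISH);
            /* bytes produced this round, independent of bytes consumed */
            fwrite(out, 1, sizeof(out) - strm.avail_out, stdout);
        } while (ret == Z_OK);            /* Z_STREAM_END when finished */

        deflateEnd(&strm);
        return ret == Z_STREAM_END ? 0 : -1;
    }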
This removes the fetchRecurse compiler warnings and makes the
behavior match the other submodule options (i.e. the in-memory
setting can be reset to the on-disk value).
When three-way merging indexes, we previously changed each path
as we read them, which would lead to us adding an index entry for
'foo', then removing an index entry for 'foo/file'. With the new
index requirements, this is not allowed. Removing entries in the
merged index, then adding them, resolves this. In the previous
example, we now remove 'foo/file' before adding 'foo'.
In case insensitive index mode, we would stop at a prefixed entry,
treating the provided search key length as a substring, not the
length of the string to match.
Writing a sample Javascript driver pointed out some extra
whitespace handling that needed to be done in the diff driver.
This adds some tests with some sample javascript code that I
pulled off of GitHub just to see what would happen. Also, to
clean up the userdiff test data, I did a "git gc" and packed
up the test objects.
Since I don't yet have permission to use the code from Git, I decided
I'd take a stab at writing patterns for PHP and Javascript myself.
I think these are pretty weak, but probably better than the
default behavior without them.
I contacted a number of Git authors and lined up their permission
to relicense their work for use in libgit2 and copied over their
code for diff driver xfuncname patterns. At this point, the code
I've copied is taken verbatim from core Git although Thomas Rast
warned me that the C++ patterns, at least, really need an update.
I've left off patterns where I don't feel like I have permission
at this point until I hear from more authors.
Reorganize the builtin driver table slightly so that core Git
builtin definitions can be imported verbatim. Then take a few of
the core Git drivers and pull them in.
This also creates a test of diffs with the builtin HTML driver
which led to some small error handling fixes in the driver
selection logic.
This extends the diff driver parser to support multiline driver
definitions along with ! prefixing for negated matches. This
brings the driver function pattern parsing in line with core Git.
This also adds an internal table of driver definitions and a
fallback code path that will look in that table for diff drivers
that are set with attributes without having a definition in the
config file. Right now, I just populated the table with a kind
of simple HTML definition that is similar to the core Git def.
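A hypothetical driver definition in that multiline form, assuming the parser reads repeated xfuncname entries (with ! negation) as one multiline pattern, might look like this in a config file; the patterns are purely illustrative and the driver would still be attached via a diff=html attribute:

    [diff "html"]
        xfuncname = "!^[ \t]*</"
        xfuncname = "^[ \t]*<(h[1-6]|div|p)[ >]"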
Don't try to determine whether the system supports file modes
when putting the tree data in the index during checkout. The tree's
mode is canonical and did not come from stat(2) in the first place.
It's hard or even impossible to correctly free the string buffer
allocated by git_patch_to_str in some circumstances. Drop the function
so people have to use git_patch_to_buf instead - git_buf has a dedicated
destructor.
Returning library-allocated strings from libgit2 works fine on Linux,
but may cause problems on Windows because there is no one C Runtime that
everything links against. With libgit2 not exposing its own allocator,
freeing the string is a gamble.
git_patch_to_str already serializes to a buffer, then returns the
underlying memory. Expose the functionality directly, so callers can use
the git_buf_free function to free the memory later.
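A hedged usage sketch of the replacement API (error handling trimmed):

    #include <stdio.h>
    #include <git2.h>

    static void print_patch(git_patch *patch)
    {
        git_buf buf = {0};            /* zero-initialized libgit2 buffer */

        if (git_patch_to_buf(&buf, patch) == 0) {
            fwrite(buf.ptr, 1, buf.size, stdout);
            git_buf_free(&buf);       /* the buffer's own destructor */
        }
    }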
The "merge none" (don't automerge) flag was only to aide in
merge trivial tests. We can easily determine whether merge
trivial resulted in a trivial merge or an automerge by examining
the REUC after automerge has completed.
The default merge_file level was XDL_MERGE_MINIMAL, which would
produce conflicts where there should be none when both sides were
changed identically. Change the defaults to be
more aggressive (XDL_MERGE_ZEALOUS) which will more aggressively
compress non-conflicts. This matches git.git's defaults.
Increase testing around reverting a previously reverted commit to
illustrate this problem.
Any well-behaved program should write a descriptive message to the
reflog whenever it updates a reference. Let's make this more prominent
by removing the version without the reflog parameters.
This changes git_signature_dup to actually honor oom conditions raised by
the call to git__strdup. It also aligns it with the error code return
pattern used everywhere else.
Ok, scrap the previous commit. This is the right overflow check that
takes care of 64-bit overflow **and** 32-bit overflow, which needs to be
considered because the pool malloc can only allocate 32-bit elements in
one go.
Note that `git_pool_strdup` cannot really return any error codes,
because the pool doesn't set errors on OOM.
The only place where `giterr_set_oom` is called is in
`git_pool_strndup`, in a conditional check that is always optimized
away. `n + 1` cannot be zero if `n` is unsigned because the compiler
doesn't take wraparound into account.
This check has been removed altogether because a `size_t` is not
realistically going to overflow here.
This renames git_vector_free_all to the better git_vector_free_deep
and also contains a couple of memory leak fixes based on valgrind
checks. The fixes are specifically: failure to free global dir
path variables when not compiled with threading on and failure to
free filters from the filter registry that had not been fully
initialized.
This adds tests that try canceling an indexer operation from
within the progress callback.
After writing the tests, I wanted to run this under valgrind and
had a number of errors in that situation because mmap wasn't
working. I added a CMake option to force emulation of mmap and
consolidated the Amiga-specific code into that new place (so we
don't actually need separate Amiga code now, just have to turn on
-DNO_MMAP).
Additionally, I made the indexer code propagate error codes more
reliably than it used to.
Clone callbacks can return non-zero values to cancel the clone.
This adds some tests to verify that this actually works and updates
the documentation to be clearer that this can happen and that the
return value will be propagated back by the clone function.
The checkout notify callback behavior on non-zero return values
was not being tested. This adds tests, fixes a bug with positive
values, and clarifies the documentation to make it clear that the
checkout can be canceled via this mechanism.
The callback to supply data chunks could return a negative value
to stop creation of the blob, but we were neither using GIT_EUSER
nor propagating the return value. This makes things use the new
behavior of returning the negative value back to the user.
This changes the behavior of callbacks so that the callback error
code is not converted into GIT_EUSER and instead we propagate the
return value through to the caller. Instead of using the
giterr_capture and giterr_restore functions, we now rely on all
functions to pass back the return value from a callback.
To avoid having a return value with no error message, the user
can call the public giterr_set_str or some such function to set
an error message. There is a new helper 'giterr_set_callback'
that functions can invoke after making a callback which ensures
that some error message was set in case the callback did not set
one.
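A sketch of a user callback following these conventions; the callback type and error class used here are just illustrative:

    static int transfer_progress_cb(const git_transfer_progress *stats, void *payload)
    {
        int *cancel_requested = payload;
        (void)stats;

        if (*cancel_requested) {
            giterr_set_str(GITERR_INVALID, "operation canceled by user");
            return -1;   /* negative value is propagated back to the caller */
        }
        return 0;        /* zero lets the operation continue */
    }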
In places where the sign of the callback return value is
meaningful (e.g. positive to skip, negative to abort), only the
negative values are returned back to the caller, obviously, since
the other values allow for continuing the loop.
The hardest parts of this were in the checkout code where positive
return values were overloaded as meaningful values for checkout.
I fixed this by adding an output parameter to many of the internal
checkout functions and removing the overload. This added some
code, but it is probably a better implementation.
There is some funkiness in the network code where user provided
callbacks could be returning a positive or a negative value and
we want to rely on that to cancel the loop. There are still a
couple of places where a user error might get turned into GIT_EUSER
there, I think, though none are exercised by the tests.
There are a lot of places that we call git__free on each item in
a vector and then call git_vector_free on the vector itself. This
just wraps that up into one convenient helper function.
This continues auditing all the places where GIT_EUSER is being
returned and making sure to clear any existing error using the
new giterr_user_cancel helper. As a result, places that relied
on intercepting GIT_EUSER but having the old error preserved also
needed to be cleaned up to correctly stash and then retrieve the
actual error.
Additionally, as I encountered places where error codes were not
being propagated correctly, I tried to fix them up. A number of
those fixes are included in this commit as well.
This adds giterr_user_cancel to return GIT_EUSER and clear any
error message that is sitting around. As a result of using that
in places, we need to be more thorough with capturing errors that
happen inside a callback when used internally. To help with that,
this also adds giterr_capture and giterr_restore so that when we
internally use a foreach-type function that clears errors and
converts them to GIT_EUSER, it is easier to restore not just the
return value, but the actual error message text.
This adds `git_config__lookup_entry` which will look up a key in
a config and return either the entry or NULL if the key was not
present. Optionally, it can either suppress all errors or can
return them (although not finding the key is not an error for this
function). Unlike other accessors, this does not normalize the
config key string, so it must only be used when the key is known
to be in normalized form (i.e. all lower-case before the first dot
and after the last dot, with no invalid characters).
This also adds three high-level helper functions to look up config
values with no errors and a fallback value. The three functions
are for string, bool, and int values, and will resort to the
fallback value for any error that arises. They are:
* `git_config__get_string_force`
* `git_config__get_bool_force`
* `git_config__get_int_force`
None of them normalize the config `key` either, so they can only
be used for internal cases where the key is known to be in normal
format.
The frontend used to look at the file directly, but that's obviously not
the right thing to do. Expose it on the backend and use that function
instead.
git-core only writes to the reflogs of HEAD, refs/heads/ and
refs/notes/, or if there is already a reflog in place. Adjust our code to
follow these semantics.
When doing copy detection, it is often necessary to include
UNMODIFIED records in the git_diff so they are available as source
records for GIT_DIFF_FIND_COPIES_FROM_UNMODIFIED. Yet in the final
diff, often you will not want to have these UNMODIFIED records.
This adds a flag which marks these UNMODIFIED records for deletion
from the diff list so they will be removed after the rename detect
phase is over.
When FIND_COPIES is used in combination with BREAK_REWRITES for
rename detection, there was a bug where the split MODIFIED delta
was only used as a target for RENAME records and not for COPIED
records. This fixes that, converting the split into a pair of
DELETED and COPIED deltas when that circumstance arises.
Whenever a reference is created or updated, we need to write to the
reflog regardless of whether the user gave us a message, so we shouldn't
leave that to the ref frontend, but integrate it into the backend.
This also eliminates the race between ref update and writing to the
reflog, as we protect the reflog with the ref lock.
As an additional benefit, this reflog append on the backend happens by
appending to the file instead of parsing and rewriting it.
Copy the pointers into temporary vectors instead of assigning them to
the same array, so we don't mess with someone else's memory by
accident (e.g. by sorting).
The callback-based method of listing remote references dates back to the
beginning of the network code's lifetime, when we didn't know any
better.
We need to keep the list around for update_tips() after disconnect() so
let's make use of this to simply give the user a pointer to the array so
they can write straightforward code instead of having to go through a
callback.
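A hedged usage sketch of the array-returning form as it appears in the public headers (the array stays owned by the remote):

    #include <stdio.h>
    #include <git2.h>

    /* "remote" must already be connected for ls to return anything */
    static void list_remote_heads(git_remote *remote)
    {
        const git_remote_head **heads;
        size_t i, count;

        if (git_remote_ls(&heads, &count, remote) == 0) {
            for (i = 0; i < count; i++)
                printf("%s\n", heads[i]->name);
        }
        /* the array is owned by the remote; nothing to free here */
    }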
Removing arbitrary refspecs makes things more complex to reason
about. Instead, let the user set the fetch and push refspec list to
whatever they want it to be.
Create a git_branch_iterator type which is equivalent to the foreach but
lets us write loops instead of callbacks.
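A hedged loop sketch using the iterator (function names as they appear in the public API):

    #include <git2.h>

    static void list_local_branches(git_repository *repo)
    {
        git_branch_iterator *iter;
        git_reference *ref;
        git_branch_t type;

        if (git_branch_iterator_new(&iter, repo, GIT_BRANCH_LOCAL) != 0)
            return;

        while (git_branch_next(&ref, &type, iter) == 0) {
            /* ...inspect the branch reference... */
            git_reference_free(ref);
        }                                 /* loop ends on GIT_ITEROVER */

        git_branch_iterator_free(iter);
    }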
Since the introduction of git_reference_shorthand(), the added value of
passing the name is reduced.
When the filesystem iterator encounters an error with a file, it
returns the error but because of the cleanup code, it was in some
cases erasing the error message. This uses the giterr_detach API
to make sure that the actual error message is restored after the
cleanup code has been run.
There are a number of cases where it is convenient to be able to
fetch and "claim" the current error string, clearing the error.
This is helpful when you need to call some code that may alter
the error and you want to restore it later on and/or report it via
some other mechanism.
We used to move `data_start` forward, which is wrong as that needs to
point to the beginning of the buffer in order to perform size
calculations.
Introduce a `write_start` variable which indicates where we should start
writing from, which is what the `data_start` was being wrongly reused to
be.
The last commit taught git_checkout_tree to actually do something
meaningful when treeish is NULL. This lets us rewrite
git_checkout_head to simply call git_checkout_tree without giving it a
treeish.
In git_checkout_tree, the first check tests if either repo or treeish is
NULL and says that either of them has to have a valid value. But there
is no code to handle the treeish == NULL case.
So, do something meaningful in that case: use HEAD instead.
When downloading the default branch due to lack of refspecs, we still
need to write out FETCH_HEAD with the tip we downloaded, unfortunately
with a format that doesn't match what we already have.
This avoids sending our whole history bit by bit to the remote in cases
where there is no common history, just to give up in the end.
The number comes from the canonical implementation.
The correct behaviour when a remote has no refspecs (e.g. a URL from the
command-line) is to download the remote's HEAD. Let's do that.
This fixes #1261.
This was never really working right because we were checking the
wrong flag and not checking it in all the places that we need to
be checking it. I finally got around to writing a test and adding
actual support for it.
Sometimes the static initializer for git_diff_options cannot be
used and since setting them to all zeroes doesn't actually work
quite right, this adds a new helper for that situation.
This also adds an explicit new value to the submodule settings
options to be used when those enums need static initialization.
This changes `git_index_read` to have two modes - a hard index
reload that always resets the index to match the on-disk data
(which was the old behavior) and a soft index reload that uses
the timestamp / file size information and only replaces the index
data if the file on disk has been modified.
This then updates the git_status code to do a soft reload unless
the new GIT_STATUS_OPT_NO_REFRESH flag is passed in.
This also changes the behavior of the git_diff functions that use
the index so that when an index is not explicitly passed in (i.e.
when the functions call git_repository_index for you), they will
also do a soft reload for you.
This intentionally breaks the file signature of git_index_read
because there has been some confusion about the behavior previously
and it seems like all existing uses of the API should probably be
examined to select the desired behavior.
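A short sketch of the two modes, assuming the boolean force parameter described above:

    #include <git2.h>

    static void refresh_index(git_repository *repo)
    {
        git_index *index;

        if (git_repository_index(&index, repo) != 0)
            return;

        git_index_read(index, 1);   /* hard reload: always reset to on-disk data */
        git_index_read(index, 0);   /* soft reload: only if the file changed */

        git_index_free(index);
    }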
These changes fix the basic problem with GIT_DIFF_REVERSE being
broken for text diffs. The reversed diff entries were getting
added to the git_diff correctly, but some of the metadata was kept
incorrectly in a way that prevented the text diffs from being
generated correctly. Once I fixed that, it became clear that it
was not possible to merge reversed diffs correctly. This has a
first pass at fixing that problem. We probably need more tests
to make sure that is really fixed thoroughly.
It seems that regexps on Mac OS X and Linux were behaving
differently: while on OS X the empty string didn't match any value,
on Linux it matched all of them, so the second fetch refspec was
overwriting the first one instead of creating a new one.
Using an unmatchable regular expression solves the problem (and seems
to be portable).
At some point git_config_delete_entry lost the ability to delete one entry of
a multivar configuration. The moment you had more than one fetch or push
refspec for a remote, you could no longer save that remote. The changes in
network::remote::remotes::save show that problem.
I needed to create a new git_config_delete_multivar because I was not able to
remove one or several entries of a multivar config with the current API.
Several attempts at changing how git_config_set_multivar(..., NULL) behaved
were not successful.
git_config_delete_multivar is very similar to git_config_set_multivar, and
delegates to config_delete_multivar in config_file. This function searches
for the cvar_t entries to be deleted, stores them in a temporary array, and
rebuilds the linked list. After calling config_write to delete the entries,
the cvar_t entries stored in the temporary array are freed.
There is also a small fix in config_write that avoids an infinite loop when
using a regular expression (the multivar case). This error was found by the
test network::remote::remotes::tagopt.
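A hedged usage sketch of the new call; the config key and pattern are only examples:

    #include <git2.h>

    static void drop_old_fetch_refspecs(git_repository *repo)
    {
        git_config *cfg;

        if (git_repository_config(&cfg, repo) != 0)
            return;

        /* delete only the multivar entries matching the regular expression */
        git_config_delete_multivar(cfg, "remote.origin.fetch",
                                   "refs/heads/old/.*");
        git_config_free(cfg);
    }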
This tells the server that we speak it, but we don't make use of its
extra information to determine if there's a better place to stop
negotiating.
In a somewhat-related change, reorder the capabilities so we ask for
them in the same order as git does.
Also take this opportunity to factor out a fairly-indented portion of
the negotiation logic.
It was there to keep it apart from the one which read in from a file on
disk. This other indexer does not exist anymore, so there is no need for
anything other than git_indexer to refer to it.
While here, rename _add() function to _append() and _finalize() to
_commit(). The former change is cosmetic, while the latter avoids
talking about "finalizing", which OO languages use to mean something
completely different.
When building libgit2 for ia32 architecture on an x64 machine, including
"config.h" without a "common.h" would result in the following errors:
C:\Program Files\Microsoft SDKs\Windows\v7.1\include\winbase.h(2288): error C2373: 'InterlockedIncrement' : redefinition; different type modifiers [C:\cygwin\home\zcbenz\codes\git-utils\build\libgit2.vcxproj]
C:\Program Files\Microsoft SDKs\Windows\v7.1\include\winbase.h(2295): error C2373: 'InterlockedDecrement' : redefinition; different type modifiers [C:\cygwin\home\zcbenz\codes\git-utils\build\libgit2.vcxproj]
C:\Program Files\Microsoft SDKs\Windows\v7.1\include\winbase.h(2303): error C2373: 'InterlockedExchange' : redefinition; different type modifiers [C:\cygwin\home\zcbenz\codes\git-utils\build\libgit2.vcxproj]
C:\Program Files\Microsoft SDKs\Windows\v7.1\include\winbase.h(2314): error C2373: 'InterlockedExchangeAdd' : redefinition; different type modifiers [C:\cygwin\home\zcbenz\codes\git-utils\build\libgit2.vcxproj]
The user is unable to derive the number of deltas in the pack, as that
would require them to capture the stats at the exact moment between
download and final processing, which is abstracted away in the fetch.
Capture these numbers for the user and expose them in the progress
struct. The clone and fetch examples now also present this information
to the user.
The names from libssh2 are somewhat obtuse for us. We can simplify the
usual key/passphrase credential's name, as well as make clearer what the
custom signature function is.
It seems that to implement these options, we just have to pass
the appropriate flags through to the libxdiff code taken from
core git. So let's do it (and add a test).
Instead of having functions with so very many parameters to pass
hunk and line data, this takes the existing git_diff_hunk struct
and extends it with more hunk data, plus adds a git_diff_line.
Those structs are used to pass back hunk and line data instead of
the old APIs that took tons of parameters.
Some work that was previously only being done for git_diff_patch
creation (scanning the diff content for exact line counts) is now
done for all callbacks, but the performance difference should not
be noticeable.
While the base git_diff_delta structure always contains two files,
when we introduce conflict data, it will be helpful to have an
indicator when an additional file is involved.
Move conflict handling into two steps: load the conflicts and
then apply the conflicts. This is more compatible with the
existing checkout implementation and makes progress reporting
more sane.
If a D/F conflict or rename 2->1 conflict occurs,
we write the file sides as filename~branchname. If
a file with that name already exists in the working
directory, write as filename~branchname_0 instead.
(Incrementing 0 until a unique filename is found.)
This lays groundwork for separating formatting options from diff
creation options. This groups the formatting flags separately
from the diff list creation flags and reorders the options. This
also tweaks some APIs to further separate code that uses patches
from code that just looks at git_diffs.
This makes no functional change to diff but renames a couple of
the objects and splits the new git_patch (formerly git_diff_patch)
into a new header file.
Don't increase the number of total objects, as it can produce
surprising progress output. The only addition compared to pre-thin is
the addition of local_objects to allow an output similar to git's
"completed with %d local objects".
The iconv init was accidentally clearing the default error state
during reference normalization. This resets so that normalization
errors will be detected correctly.
Before these changes, looking up a reference would return the
same precomposed or decomposed form of the reference name that
was used to look it up, so on MacOS, which ignores the difference
between the two, a single reference could be looked up either way
and git_reference_name would return the form of the name that was
used to look it up! This change makes lookup always return the
precomposed name if core.precomposeunicode is set regardless of
which version was used to look it up. The reference iterator was
already returning the precomposed form from earlier work.
This also updates the CMakeLists.txt rules for enabling iconv
usage because the clar tests for this code were actually not being
activated properly with the old version.
Finally, this moves git_repository_reset_filesystem from include/
git2/repository.h to include/git2/sys/repository.h since it is not
really a function that normal library users should have to think
about very often.
This cleans up some additional issues. The main change is that
on a filesystem that doesn't support mode bits, libgit2 will now
create new blobs with GIT_FILEMODE_BLOB always instead of being
at the mercy of the filesystem driver to report executable or not.
This means that if "core.filemode" lies and claims that filemode
is not supported, then we will ignore the executable bit from the
filesystem. Previously we would have allowed it.
This adds an option to the new git_repository_reset_filesystem to
recurse through submodules if desired. There may be other types
of APIs that would like a "recurse submodules" option, but this
one is particularly useful.
This also has a number of cleanups, etc., for related things
including trying to give better error messages when problems come
up from the filesystem. For example, the FAT filesystem driver on
MacOS appears to return errno EINVAL if you attempt to write a
filename with invalid UTF-8 in it. We try to capture that with a
better error message now.
There may be multiple deltas referencing the same base as well as OFS
deltas which rely on a thin delta. Deal with both at the same time by
injecting a single object and going back up to the main
delta-resolving loop.
When a tool needs to recreate the tree object (for example an
interface to another VCS), it needs to use the raw attributes,
forgoing any normalization.
When a repository is transferred from one file system to another,
many of the config settings that represent the properties of the
file system may be wrong. This adds a new public API that will
refresh the config settings of the repository to account for the
change of file system. This doesn't do a full "reinitialize" and
operates on an existing git_repository object, refreshing the config
when done.
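A hedged call sketch of the new API (the non-zero second argument requests recursion into submodules, per the option described earlier):

    #include <git2.h>

    static void after_filesystem_move(git_repository *repo)
    {
        /* re-detect filemode, case sensitivity, symlink support, etc.;
         * a non-zero second argument also refreshes submodule repositories */
        git_repository_reset_filesystem(repo, 0);
    }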
This commit then makes use of the new API in clar as each test
repository is set up.
This commit also has a number of other clar test fixes where we
were making assumptions about the type of filesystem, either based
on outdated config data or based on the OS instead of the FS.
When given an ODB from which to read objects, the indexer will attempt
to inject the missing bases at the end of the pack and update the
header and trailer to reflect the new contents.
Though unusual, a packfile may contain a delta whose base is a delta
that comes later. In order to index such a packfile, we must not give up
on the first failure to resolve a delta, but keep it around.
If there is a pass which makes no progress, this indicates that the
packfile is broken, so fail accordingly.
The repo init code was assuming Windows == no filemode, and
Mac or Windows == no case sensitivity. Those assumptions are not
consistently true depending on the mounted file system. This is a
first step to removing those assumptions. It focuses on the repo
init code and the tests of that code. There are still many other
tests that are broken when those assumptions don't hold true, but
this clears up one area of the code.
Also, this moves the core.precomposeunicode logic to be closer to
the current logic in core Git where it will be set to true on any
filesystem where composed unicode is decomposed when read back.
The indexer code was generating warnings on Windows 64-bit. I
looked closely at the logic and was able to simplify it a bit.
Also this fixes some other Windows and Linux warnings.
This adds a simple wrapper around the iconv APIs and uses it
instead of the old code that was inlining the iconv stuff. This
makes it possible for me to test the iconv logic in isolation.
A "no iconv" version of the API was defined with macros so that
I could have fewer ifdefs in the code itself.
This simplifies git_path_is_empty_dir on both Windows (getting rid
of git_buf allocation inside the function) and other platforms (by
just using git_path_direach), adds tests for the function, and uses
it to simplify some existing tests.
This hooks up git_path_direach and git_path_dirload so that they
will take a flag indicating if directory entry names should be
tested and converted from decomposed unicode to precomposed form.
This code will only come into play on the Apple platform and even
then, only when certain types of filesystems are used.
This involved adding a flag to these functions which involved
changing a lot of places in the code.
This was an opportunity to do a bit of code cleanup here and there,
for example, getting rid of the git_futils_cleanupdir_r function in
favor of a simple flag to git_futils_rmdir_r to not remove the top
level entry. That ended up adding depth tracking during rmdir_r
which led to a safety check for infinite directory recursion. Yay.
This hasn't actually been tested on the Mac filesystems where the
issue occurs. I still need to set up a test environment for that.
This doesn't actually do string precomposition but it puts the hooks in
place into the iterators and the git_path_dirload function so that
the actual precompose work is ready to go.
This adds initialization of core.precomposeunicode to repo init
on Mac. This is necessary because when a Mac accesses a repo on
a VFAT or SAMBA file system, it will return directory entries in
decomposed unicode even if the filesystem entry is precomposed.
This also removes caching of a number of repo properties from the
repo init pipeline because these are properties of the specific
filesystem on which the repo is created, not of the system as a
whole.
This commit adds cancellation for the push operation. This work consists of:
1) Support cancellation during push operation
- During object counting phase
- During network transfer phase
- Propagate GIT_EUSER error code out to caller
2) Improve cancellation support during fetch
- Handle cancellation request during network transfer phase
- Clear error string when cancelled during indexing
3) Fix error handling in git_smart__download_pack
Cancellation during push is still only handled in the object counting and
network transfer stages (and not during packbuilding).
References and their logs are logically coupled, let's make it so in
the code by moving the fs-based reflog implementation to live next to
the fs-based refs one.
As part of the change, make the function take names rather than
references, as only the names are relevant when looking up and
handling reflogs.
The basic clone function is there to make it easy to create a "normal"
clone. Remove a bunch of options that are about changing the remote's
configuration.
The text progress and update_tips callbacks are already part of the
struct, which was meant to unify the callback setup, but the download
one was left out.
This adds the basics of progress reporting during push. While progress
for all aspects of a push operation are not reported with this change,
it lays the foundation to add these later. Push progress reporting
can be improved in the future - and consumers of the API should
just get more accurate information at that point.
The main areas where this is lacking are:
1) packbuilding progress: does not report progress during deltafication,
as this involves coordinating progress from multiple threads.
2) network progress: reports progress as objects and bytes are going
to be written to the subtransport (instead of as client gets
confirmation that they have been received by the server) and leaves
out some of the bytes that are transfered as part of the push protocol.
Basically, this reports the pack bytes that are written to the
subtransport. It does not report the bytes sent on the wire that
are received by the server. This should be a good estimate of
progress (and an improvement over no progress).
The subtransport path was relying on pointing to data owned by
the remote which meant that after a redirect, the updated path
was getting lost for future requests. This updates the http
transport to strdup the path and maintain its own lifetime.
This also pulls responsibility for parsing the URL back into the
http transport and isolates the functions that parse and free that
connection data so that they can be reused between the initial
parsing and the redirect parsing.
On occasion, files can disappear while we're iterating the
filesystem, between calls to readdir and stat. Let's pretend
those didn't exist in the first place.
The git_buf_text_gather_stats call returns a boolean indicating if
the file looks like binary data. That shouldn't be an error; it
should be used to skip CRLF processing though.
This replaces some git_buf_printf calls with simple calls to
git_buf_put instead. Also, it fixes a missing va_end inside
the git_buf_vprintf implementation.
The attempt to "clean up warnings" seems to have introduced some
new warnings on compliant compilers. This fixes those in a way
that I suspect will also be okay for the non-compliant compilers.
Also this fixes what appears to be an extra semicolon in the
repo initialization template dir handling (and as part of that
fix, correctly handles the case where an error occurs).
In revwalk, we are doing a very simple check to see if a string
contains wildcard characters, so a full regular expression match
is not needed.
In remote listing, now that we have git_config_foreach_match with
full regular expression matching, we can take advantage of that
and eliminate the regex here, replacing it with much simpler string
manipulation.
This contains a few bug fixes and some header and API cleanups.
The main API change is that filters should now use GIT_PASSTHROUGH
to indicate that they wish to skip processing a file instead of
GIT_ENOTFOUND.
The bug fixes include a possible out-of-range buffer access in
the ident filter, a filter ordering problem I introduced into the
custom filter tests on Windows, and a filter buf NUL termination
issue that was coming up on Linux.
This adds more tests of filters, including the ident filter when
mixed with custom filters. I was able to combine with the reverse
filter and demonstrate that the order of filter application with
the default priority constants matches the order of core Git.
Also, this fixes two issues in the ident filter: preventing ident
expansion on binary files and avoiding a NULL dereference when
dollar sign characters are found without Id.
Fixed the filter order to match core Git, too.
This test demonstrates an interesting behavior of core Git (which
is totally reasonable and which libgit2 matches, although mostly
by coincidence). If you use the ident filter and commit a file
with a garbage ident in it, like '$Id: this is just garbage$' and
then immediately do a 'git checkout-index' with the new file, Git
will not consider the file out of date and will not overwrite the
file with an updated $Id$. Libgit2 has the same behavior. If you
remove the file and then do a checkout-index, it will be replaced
with a filtered version that has injected the OID correctly.
These are a couple of new clar helpers for testing that a file
has expected contents that I extracted from the checkout code.
Actually wrote this as part of an abandoned earlier attempt at a
new filters API, but it will be useful now for some of the tests
I'm going to write.
I wish MSVC understood that "const char **" is not a const ptr,
but is a non-const pointer to an array of const ptrs. Does that
seem like too much to ask?
Checkout should not reject binary files from filters, as a filter
may actually wish to operate on binary files. The CRLF filter should
reject binary files itself if it wishes to. Moreover, the CRLF
filter requires this logic so that users can emulate the checkout
data in their odb -> workdir filtering.
This makes the git_buf struct that was used internally into an
externally available structure and eliminates the git_buffer.
As part of that, some of the special cases that arose with the
externally used git_buffer were blended into the git_buf, such as
being careful about git_buf objects that may have a NULL ptr and
allowing for bufs with a valid ptr and size but zero asize as a
way of referring to externally owned data.
This adds the ident filter (that knows how to replace $Id$) and
tweaks the filter APIs and code so that git_filter_source objects
actually have the updated OID of the object being filtered when
it is a known value.
Extend the git2/sys/filter API with functions to look up a filter
and add it manually to a filter list. This requires some trickery
because the regular attribute lookups and checks are bypassed when
this happens, but in the right hands, it will allow a user to have
granular control over applying filters.
Increasingly there are a number of components that want to do some
cleanup at global shutdown time (at least if there are not going
to be memory leaks). This creates a very simple system of shutdown
hooks that will be invoked by git_threads_shutdown. Right now, the
maximum number of hooks is hardcoded, but since adding a hook is
not a public API, it should be fine and I thought it was better to
start off with really simple code.
This moves the git_filter_list into the public API so that users
can create, apply, and dispose of filter lists. This allows more
granular application of filters to user data outside of libgit2
internals.
This also converts all the internal usage of filters to the public
APIs along with a few small tweaks to make it easier to use the
public git_buffer stuff alongside the internal git_buf.
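A hedged sketch of applying a filter list to in-memory data, using signatures from later public releases (the buffer types at the time of this change differed slightly, as noted above):

    #include <string.h>
    #include <git2.h>

    static void filter_to_odb(git_repository *repo, const char *path,
                              const char *text)
    {
        git_filter_list *filters;
        git_buf in = {0}, out = {0};

        /* load whatever filters (crlf, ident, custom drivers) apply to "path"
         * in the worktree-to-odb direction */
        if (git_filter_list_load(&filters, repo, NULL, path,
                                 GIT_FILTER_TO_ODB, GIT_FILTER_DEFAULT) != 0)
            return;

        git_buf_set(&in, text, strlen(text));
        if (git_filter_list_apply_to_data(&out, filters, &in) == 0) {
            /* out.ptr / out.size now hold the filtered content */
        }

        git_buf_free(&in);
        git_buf_free(&out);
        git_filter_list_free(filters);
    }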
The filter registry as implemented was too primitive to actually
work once multiple filters were coming into play. This expands
the implementation of the registry to handle multiple prioritized
filters correctly.
Additionally, this adds an "attributes" field to a filter that
makes it really really easy to implement filters that are based
on one or more attribute values. The lookup and even simple value
checking can all happen automatically without custom filter code.
Lastly, with the registry improvements, this fills out the filter
lifecycle callbacks, with initialize and shutdown callbacks that
will be called before the filter is first used and after it is
last invoked. This allows for system-wide initialization and
cleanup by the filter.
This creates include/sys/filter.h with a basic definition of a
git_filter and then converts the internal code to use it. There
are related internal objects (git_filter_list) that we will want
to publish at some point, but this is a first step.
This begins the process of exposing git_filter objects to the
public API. This includes:
* new public type and API for `git_buffer` through which an
allocated buffer can be passed to the user
* new API `git_blob_filtered_content`
* make the git_filter type and GIT_FILTER_TO_... constants public
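A hedged sketch of the new blob-filtering call named above, with the git_buf-based signature from later releases standing in for the original git_buffer one:

    #include <git2.h>

    static void get_filtered(git_blob *blob)
    {
        git_buf out = {0};

        /* filter the blob's content as if checking it out to "README.md";
         * the final flag asks to skip filtering when the data looks binary */
        if (git_blob_filtered_content(&out, blob, "README.md", 1) == 0) {
            /* out.ptr / out.size hold the filtered content */
            git_buf_free(&out);
        }
    }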