This implements working versions of GIT_DIFF_RECURSE_IGNORED_DIRS
and GIT_STATUS_OPT_RECURSE_IGNORED_DIRS along with some tests for
the newly available behaviors. This is not turned on by default
for status, but can be accessed via the options to the extended
version of the command.
This adds crlf/lf conversion functions into buf_text with more
efficient implementations that bypass the high level buffer
functions. They attempt to minimize the number of reallocations
done and they directly write the buffer data as needed if they
know that there is enough memory allocated to memcpy data.
Tests are added for these new functions. The crlf.c code is
updated to use the new functions.
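As a rough illustration of the "write directly when the memory is already there" idea, here is a minimal sketch of a CRLF to LF pass. The name `crlf_to_lf` and its signature are hypothetical and are not the buf_text API; the sketch assumes the caller has already allocated an output buffer at least as large as the input, so no reallocation is needed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the libgit2 API: copies `len` bytes from `src`
 * into `dst`, dropping the CR of every CRLF pair. `dst` must have room for
 * `len` bytes, which is always enough because the output never grows.
 * Returns the number of bytes written. */
size_t crlf_to_lf(char *dst, const char *src, size_t len)
{
    const char *end = src + len;
    char *out = dst;

    while (src < end) {
        /* Find the next CR and bulk-copy everything before it. */
        const char *cr = memchr(src, '\r', (size_t)(end - src));
        size_t chunk = cr ? (size_t)(cr - src) : (size_t)(end - src);

        memcpy(out, src, chunk);
        out += chunk;
        src += chunk;

        if (!cr)
            break;

        /* Keep a lone CR; drop it only when an LF follows. */
        if (src + 1 == end || src[1] != '\n')
            *out++ = '\r';
        src++;
    }
    return (size_t)(out - dst);
}
```

Copying whole runs with memcpy between CRs is what keeps the per-byte overhead low in this sketch.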
Removed the include of buf_text.h from filter.h and just include
it more narrowly in the places that need it.
This adds a check to the drop_crlf filter path to check if the
file in the index already has a CR in it, in which case this will
not drop the CRs from the workdir file contents.
This uncovered a "bug" in `git_blob_create_fromworkdir` where the
full path to the file was passed to look up the attributes instead
of the relative path from the working directory root. This meant
that the check in the index for a pre-existing entry of the same
name was failing.
Currently, the odb cache has a fixed size of 128 slots as defined by
GIT_DEFAULT_CACHE_SIZE. Allow users to set the size of the cache via
git_libgit2_opts().
Fixes #1035.
1. Fix sort order problem with submodules where "mod" was sorting
after "mod-plus" because they were being sorted as "mod/" and
"mod-plus/". This involved pushing the "contains a .git entry"
test significantly lower in the stack.
2. Reinstate behavior that a directory which contains a .git entry
will be treated as a submodule during iteration even if it is
not yet added to the .gitmodules.
3. Now that any directory containing .git is reported as submodule,
we have to be more careful checking for GIT_EEXISTS when we
do a submodule lookup, because that is the error code that is
returned by git_submodule_lookup when you try to look up a
directory containing .git that has no record in gitmodules or
the index.
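The sort order problem in item 1 comes down to byte values: '-' (0x2D) sorts before '/' (0x2F). A minimal sketch of a name comparison that treats directories as if they ended in '/' shows the effect. The function `entry_name_cmp` is hypothetical and only illustrates the ordering; it is not the libgit2 comparison code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical comparison helper, not libgit2 code: compares two entry
 * names bytewise, treating a directory name as if it had a trailing '/'. */
int entry_name_cmp(const char *a, int a_is_dir, const char *b, int b_is_dir)
{
    size_t alen = strlen(a), blen = strlen(b);
    size_t min = alen < blen ? alen : blen;
    int cmp = memcmp(a, b, min);
    unsigned char ca, cb;

    if (cmp != 0)
        return cmp;

    /* One name is a prefix of the other: compare the next byte, using
     * '/' as the implied terminator of a directory name. */
    ca = (unsigned char)a[min];
    cb = (unsigned char)b[min];
    if (ca == '\0' && a_is_dir)
        ca = '/';
    if (cb == '\0' && b_is_dir)
        cb = '/';
    return (int)ca - (int)cb;
}
```

With both names flagged as directories, "mod" compares greater than "mod-plus"; with neither flagged, it compares less.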
This switches the APIs for setting and getting the global/system
search paths from using git_strarray to using a simple string with
GIT_PATH_LIST_SEPARATOR delimited paths, just as the environment
PATH variable would contain. This makes it simpler to get and set
the value.
I also added code to expand "$PATH" when setting a new value to
embed the old value of the path. This means that I no longer
require separate actions to PREPEND to the value.
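A minimal sketch of the "$PATH" expansion, assuming a single token per value. The helper `expand_path_token` is hypothetical and not the function in this change, and the ':' separator in the usage is just the POSIX flavor of GIT_PATH_LIST_SEPARATOR.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly allocated copy of `value` in which
 * the first "$PATH" token is replaced by `old_path`. The caller frees the
 * result. Returns NULL on allocation failure. */
char *expand_path_token(const char *value, const char *old_path)
{
    static const char token[] = "$PATH";
    const size_t token_len = sizeof(token) - 1;
    const char *hit = strstr(value, token);
    size_t prefix_len, old_len, suffix_len;
    char *out;

    if (!hit) {
        /* No token: plain replacement of the old value. */
        out = malloc(strlen(value) + 1);
        if (out)
            strcpy(out, value);
        return out;
    }

    prefix_len = (size_t)(hit - value);
    old_len = strlen(old_path);
    suffix_len = strlen(hit + token_len);

    out = malloc(prefix_len + old_len + suffix_len + 1);
    if (!out)
        return NULL;

    memcpy(out, value, prefix_len);
    memcpy(out + prefix_len, old_path, old_len);
    /* The +1 copies the terminating NUL along with the suffix. */
    memcpy(out + prefix_len + old_len, hit + token_len, suffix_len + 1);
    return out;
}
```

Setting "/opt/etc:$PATH" on top of an old value of "/etc:/usr/etc" then behaves like a prepend, and "$PATH:/opt/etc" like an append.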
Implicit type conversion argument of function to size_t type
Suspicious sequence of types castings: size_t -> int -> size_t
Consider reviewing the expression of the 'A = B == C' kind. The expression is calculated as following: 'A = (B == C)'
Unsigned type is never < 0
The goal of this work is to expose the search logic for "global",
"system", and "xdg" files through the git_libgit2_opts() interface.
Behind the scenes, I changed the logic for finding files to have a
notion of a git_strarray that represents a search path and to store
a separate search path for each of the three tiers of config file.
For each tier, I implemented a function to initialize it to default
values (generally based on environment variables), and then general
interfaces to get it, set it, reset it, and prepend new directories
to it.
Next, I exposed these interfaces through the git_libgit2_opts
interface, reusing the GIT_CONFIG_LEVEL_SYSTEM, etc., constants
for the user to control which search path they were modifying.
There are alternative designs for the opts interface / argument
ordering, so I'm putting this phase out for discussion.
Additionally, I ended up doing a little bit of clean up regarding
attr.h and attr_file.h, adding a new attrcache.h so the other two
files wouldn't have to be included in so many places.
This adds a git_pool_freelist_item struct that makes it a little
easier to follow what's going on with the pool free list block
management code. It is functionally neutral.
This fixes a number of issues identified by valgrind - mostly
missed free calls. Inside valgrind, mmap() may fail which causes
some of the diff tests to fail. This adds a fallback code path
to diff_output.c:get_workdir_content() where, if the mmap() fails,
the code will now try to read the file data directly into allocated
memory (which is what it would do if the data needed to be filtered
anyhow).
This updates the tree iterator internals to be more efficient.
The tree_iterator_entry objects are now kept as pointers that are
allocated from a git_pool, so that we may use git__tsort_r for
sorting (which is better than qsort, given that the tree is
likely mostly ordered already).
Those tree_iterator_entry objects now keep direct pointers to the
data they refer to instead of keeping indirect index values. This
simplifies a lot of the data structure traversal code.
This also adds bsearch to find the start item position for range-
limited tree iterators, and is more explicit about using
git_path_cmp instead of reimplementing it. The git_path_cmp
changed a bit to make it easier for tree_iterators to use it (but
it was barely being used previously, so not a big deal).
This adds a git_pool_free_array function that efficiently frees a
list of pool allocated pointers (which the tree_iterator keeps).
Also, added new tests for the git_pool free list functionality
that was not previously being tested (or used).
This fixes two bugs with the workdir iterator depth check: first
that the depth was not being decremented and second that empty
directories were counting against the depth even though a frame
was not being created for them.
This also fixes a bug with the ENOTFOUND return code for workdir
iterators when you attempt to advance_into an empty directory.
Actually, that works correctly, but it was incorrectly being
propagated into regular advance() calls in some circumstances.
Added new tests for the above that create a huge hierarchy on
the fly and try using the workdir iterator to traverse it.
Clean up some sorting function stuff including fixing qsort_r
on MinGW, common function pointer type for comparison, and basic
insertion sort implementation (which we, regrettably, fall back
on for MinGW).
Given a group of case-insensitively equivalent tree iterator
entries, this ensures that the case-sensitively first trees will
be used as the representative items. I.e. if you have conflicting
entries "A/B/x", "a/b/x", and "A/b/x", this change ensures that
the earliest entry "A/B/x" will be returned. The actual choice
is not that important, but it is nice to have it stable and to
have it be either the first or last item, as opposed to a
random item from within the equivalent span.
Tree iterator advance was moving forward without taking the
filemode of the entries into account, equating "a" and "a/".
This makes the tree entry comparison code more easily reusable
and fixes the problem.
This fixes an off by one error for generating full paths for
tree entries in tree iterators when INCLUDE_TREES is set. Also,
contains a bunch of small code cleanups with a couple of small
utility functions and macro changes to eliminate redundant code.
If there are case-ambiguities in the path of a case insensitive
tree iterator, it will now rewrite the entire path when it gives
the path name to an entry, so a tree with "A/b/C/d.txt" and
"a/B/c/E.txt" will give the true full paths (instead of case-
folding them both to "A/B/C/d.txt" or "a/b/c/E.txt" or something
like that).
Previously, 0 meant default. This is problematic, as asking for 0
context lines is a valid thing to do.
Change GIT_DIFF_OPTIONS_INIT to default to three and stop treating 0
as a magic value. In case no options are provided, make sure the
options in the diff object default to 3.
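A minimal sketch of the resulting semantics, using a hypothetical trimmed-down options struct (the real options structure has more fields, and `diff_opts`, `DIFF_OPTS_INIT`, and `effective_context_lines` are illustrative names only):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down options struct for illustration. */
typedef struct {
    uint32_t context_lines;
} diff_opts;

/* The initializer carries the default, so 0 needs no special meaning. */
#define DIFF_OPTS_INIT { 3 }

/* Resolve the effective context: NULL options mean "use the default",
 * while an explicit 0 is honored as zero lines of context. */
uint32_t effective_context_lines(const diff_opts *opts)
{
    return opts ? opts->context_lines : 3;
}
```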
Passing NULL is nonsensical. The error message leaves something to be
desired, though, as it leaks internal implementation details. Catch it at the
`git_config_set_string` level and set an appropriate error message.
There is a serious bug in the previous tree iterator implementation.
If case insensitivity resulted in member elements being equivalent
to one another, and those member elements were trees, then the
children of the colliding elements would be processed in sequence
instead of in a single flattened list. This meant that the tree
iterator was not truly acting like a case-insensitive list.
This completely reworks the tree iterator to manage lists with
case insensitive equivalence classes and advance through the items
in a unified manner in a single sorted frame.
It is possible that at a future date we might want to update this
to separate the case insensitive and case sensitive tree iterators
so that the case sensitive one could be a minimal amount of code
and the insensitive one would always know what it needed to do
without checking flags.
But there would be so much shared code between the two, that I'm
not sure if that's a win. For now, this gets what we need.
More tests are needed, though.
It's somewhat common to try to write "/refs/tags/something". There is
no easy way to catch it during the main body of the function, as there
is no way to distinguish whether it's a leading slash or a double
slash somewhere in the middle.
Catch this at the beginning so we don't trigger the assert in
is_all_caps_and_underscore().
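Roughly, the up-front check looks like the sketch below. `refname_precheck` is a hypothetical stand-in and not the actual normalization code; it only shows the leading slash being rejected before any per-component validation runs.

```c
#include <stddef.h>

/* Hypothetical pre-check, not the libgit2 implementation: rejects empty
 * names and names that start with '/'. Returns 0 when the name may be
 * processed further and -1 when it must be rejected. */
int refname_precheck(const char *name)
{
    if (name == NULL || name[0] == '\0' || name[0] == '/')
        return -1;
    return 0;
}
```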
This standardizes iterator behavior across all three iterators
(index, tree, and working directory). Previously the working
directory iterator behaved differently from the other two.
Each iterator can now operate in one of three modes:
1. *No tree results, auto expand trees* means that only non-
tree items will be returned and when a tree/directory is
encountered, we will automatically descend into it.
2. *Tree results, auto expand trees* means that results will
be given for every item found, including trees, but you
only need to call normal git_iterator_advance to yield
every item (i.e. trees returned with pre-order iteration).
3. *Tree results, no auto expand* means that calling the
normal git_iterator_advance when looking at a tree will
not descend into the tree, but will skip over it to the
next entry in the parent.
Previously, behavior 1 was the only option for index and tree
iterators, and behavior 3 was the only option for workdir.
The main public API implications of this are that the
`git_iterator_advance_into()` call is now valid for all
iterators, not just working directory iterators, and all the
existing uses of working directory iterators explicitly use
the GIT_ITERATOR_DONT_AUTOEXPAND (for now).
Interestingly, the majority of the implementation was in the
index iterator, since there are no tree entries there and we now
have to fake them. The tree and working directory iterators
only required small modifications.
The iterator APIs are not currently consistent with the parameter
ordering of the rest of the codebase. This rearranges the order
of parameters, simplifies the naming of a number of functions, and
makes somewhat better use of macros internally to clean up the
iterator code.
This also expands the test coverage of iterator functionality,
making sure that case sensitive range-limited iteration works
correctly.
`git_diff_get_patch()` would unconditionally load the patch object and
then simply leak it if the user hadn't requested it. Short-circuit
loading the object if the user doesn't want it.
The rest of the plugs are simply calling the free functions of objects
allocated during the tests.
These offsets are needed for REF_DELTA objects, which encode which
object they use as a base, but not where it lies in the packfile, so
we need a list.
These objects are mostly from older packfiles, before OFS_DELTA was
widely spread. The time spent in indexing these packfiles is greatly
reduced, though remains above what git is able to do.
This was the first implementation and its goal was simply to have
something that worked. It is slow and now it's just taking up
space. Remove it and switch the one known usage to use the streaming
indexer.
This removes assertions that prevent us from having an empty
git_config object and then updates some tests that were
dependent on global config state to use an empty config before
running anything.
This removes the one-off GIT_CDECL and adds a new standard way of
doing this named GIT_STDLIB_CALL with a src/win32 specific def
when on the Windows platform.
When creating files, instead of actually using GIT_FILEMODE_BLOB
and the other various constants that happen to correspond to
mode values, apparently I should be just using 0666 and 0777, and
relying on the umask to clear bits and make the value sane.
This fixes the rules for copying a template directory and fixes
the checks to match that new behavior. (Further changes to the
checkout logic to follow separately.)
The new tests were not taking core.filemode into account when
testing file modes after repo initialization. Fixed that and some
other Windows warnings that have crept in.
When PR #1359 removed the hooks from the test resources/template
directory, it made me realize that the tests for
git_repository_init_ext using templates must be pretty shabby
because we could not have been testing if the hooks were getting
created correctly.
So, this started with me recreating a couple of hooks, including
a sample and symlink, and adding tests that they got created
correctly in the various circumstances, including with the SHARED
modes, etc. Unfortunately this uncovered some issues with how
directories and symlinks were copied and chmod'ed. Also, there
was a FIXME in the code related to the chmod behavior as well.
Going back over the directory creation logic for setting up a
repository, I found it was a little difficult to read and could
result in creating and/or chmod'ing directories that the user
almost certainly didn't intend.
So that led to this work which makes repo initialization much
more careful (and hopefully easier to follow). It required a
couple of extensions / changes to core fileops utilities, but I
also think those are for the better, at least for git_futils_cp_r
in terms of being careful about what actions it takes.
This is designed to fix libgit2sharp #350 where if .gitignore is
a directory we abort all operations that process ignores instead
of just skipping it as core git does.
Also added test that fails without this change and passes with it.
This moves a couple of checks outside of the inner loop of the
find_similar rename/copy detection phase that are only dependent
on the "from" side of a detection.
Also, this replaces the inefficient initialization of the
options structure when a value is not provided explicitly by the
user.
Instead of creating three git_diff_similarity_metric statically
for the various config options, just create the metric structure
on demand and populate it, using the payload to specify the
extra flags that should be passed to the hashsig. This removes
a level of obfuscation from the code, I think.
This adds some new tests that actually exercise the similarity
metric between files to detect renames, copies, and split modified
files that are too heavily modified.
There is still more testing to do - these tests are just partially
covering the cases.
There is also one bug fix in this where a change set with only
MODIFY being broken into ADD/DELETE (due to low self-similarity)
without any additional RENAMED entries would end up not processing
the split requests (because the num_rewrites counter got reset).
This is the initial integration of the similarity metric into
the `git_diff_find_similar()` code path. The existing tests all
pass, but the new functionality isn't currently well tested. The
integration does go through the pluggable metric interface, so it
should be possible to drop in an alternative to the internal
metric that libgit2 implements.
This comes along with a behavior change for an existing interface;
namely, passing two NULLs to git_diff_blobs (or passing NULLs to
git_diff_blob_to_buffer) will now call the file_cb parameter zero
times instead of one time. I know it's strange that that change
is paired with this other change, but it emerged from some
initialization changes that I ended up making.
Previously the git_diff_delta recorded if the delta was binary.
This replaces that (with no net change in structure size) with
a full set of flags. The flag values that were already in use
for individual git_diff_file objects are reused for the delta
flags, too (along with renaming those flags to make it clear that
they are used more generally).
This (a) makes things somewhat more consistent (because I was
using a -1 value in the "boolean" binary field to indicate unset,
whereas now I can just use the flags that are easier to understand),
and (b) will make it easier for me to add some additional flags to
the delta object in the future, such as marking the results of a
copy/rename detection or other deltas that might want a special
indicator.
While making this change, I officially moved some of the flags that
were internal only into the private diff header.
This also allowed me to remove a gross hack in rename/copy detect
code where I was overwriting the status field with an internal
value.
This plugs in the three basic similarity strategies for handling
whitespace via internal use of the pluggable API. In so doing, I
realized that the use of git_buf in the hashsig API was not needed
and actually just made it harder to use, so I tweaked that API as
well.
Note that the similarity metric is still not hooked up in the
find_similarity code - this is just setting out the function that
will be used.
This moves the similarity metric code out of buf_text and into a
new file. Also, this implements a different approach to similarity
measurement based on a Rabin-Karp rolling hash where we only keep
the top 100 and bottom 100 hashes. In theory, that should be
sufficient samples to give a fairly accurate measurement while
limiting the amount of data we keep for file signatures no matter
how large the file is.
This makes the text similarity metric treat \r as equivalent
to \n and makes it skip whitespace immediately following a line
terminator, so line indentation will have less effect on the
difference measurement (and so \r\n will be treated as just a
single line terminator).
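To make the normalization concrete, here is a minimal sketch of the byte stream the text metric would effectively see. `normalize_for_similarity` is a hypothetical helper; the actual code folds this into the hashing loop rather than producing a normalized copy.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical normalizer: writes the normalized form of `src` into
 * `dst` (which must hold at least `len` bytes) and returns the number
 * of bytes written. */
size_t normalize_for_similarity(char *dst, const char *src, size_t len)
{
    size_t i = 0, n = 0;

    while (i < len) {
        char c = src[i++];

        if (c == '\r')
            c = '\n';   /* CR counts as a line terminator */
        dst[n++] = c;

        if (c == '\n') {
            /* Skip indentation, and the LF of a CRLF pair. */
            while (i < len && isspace((unsigned char)src[i]))
                i++;
        }
    }
    return n;
}
```

Because '\n' is itself whitespace, this sketch also collapses runs of blank lines into a single terminator.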
This also separates the text and binary hash calculators into
two separate functions instead of having more if statements inside
the loop. This should make it easier to have more differentiated
heuristics in the future if we so wish.
This adds a new `git_buf_text_hashsig` type and functions to
generate these hash signatures and compare them to give a
similarity score. This can be plugged into diff similarity
scoring.
This replaces most of the explicit vector iteration with calls
to git_vector_foreach, adds in some git__free and giterr_clear
calls to clean up during some error paths, and a couple of
other code simplifications.
The treebuilder entries vector flags removed items which means
we can't rely on the entries vector length to accurately get the
number of entries. This adds an entrycount value and maintains it
while updating the treebuilder entries.
The cppcheck static analyzer generates warnings for a bunch of
places in the libgit2 code base. All the ones fixed in this
commit are actually false positives, but I've reorganized the
code to hopefully make it easier for static analysis tools to
correctly understand the structure. I wouldn't do this if I
felt like it was making the code harder to read or worse for
humans, but in this case, these fixes don't seem too bad and will
hopefully make it easier for better analysis tools to get at any
real issues.
If gethostbyname() fails on platforms with NO_ADDRINFO, the code
leaks the struct addrinfo that was allocated. This fixes that
(and a number of code formatting issues in that area of code in
src/posix.c).
`git_diff_blobs` and `git_diff_blob_to_buffer` skip the step
where we check file attributes because they don't have a filename
associated with the data. Unfortunately, this meant they were also
skipping the check for the GIT_DIFF_FORCE_TEXT option and so you
could not force a diff of an apparent binary file. This adds the
force text check into their code path.
The callback will be called for each file, just before the `git_delta_t` gets inserted into the diff list.
When the callback:
- returns < 0, the diff process will be aborted
- returns > 0, the delta will not be inserted into the diff list, but the diff process continues
- returns 0, the delta is inserted into the diff list, and the diff process continues
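The three return value cases can be sketched with a small driver loop. Everything here (`notify_cb`, `build_list`, `skip_objects`) is hypothetical and only mirrors the contract described above, using plain paths instead of deltas.

```c
#include <stddef.h>
#include <string.h>

typedef int (*notify_cb)(const char *path, void *payload);

/* Hypothetical driver loop: offers each path to `cb` just before it
 * would be inserted into `out`. Returns 0 on completion, or the
 * callback's negative value if it aborted the process. */
int build_list(const char **paths, size_t n,
               const char **out, size_t *out_len,
               notify_cb cb, void *payload)
{
    size_t i;

    *out_len = 0;
    for (i = 0; i < n; i++) {
        int rc = cb ? cb(paths[i], payload) : 0;

        if (rc < 0)
            return rc;      /* abort the whole process */
        if (rc > 0)
            continue;       /* skip this entry, keep going */
        out[(*out_len)++] = paths[i];
    }
    return 0;
}

/* Example callback: skips object files, aborts on a path named "STOP". */
int skip_objects(const char *path, void *payload)
{
    size_t len = strlen(path);

    (void)payload;
    if (strcmp(path, "STOP") == 0)
        return -1;
    return (len > 2 && strcmp(path + len - 2, ".o") == 0) ? 1 : 0;
}
```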
Instead of returning directly the pattern as the return value, I used an
out parameter, because the function also tests if the passed pathspecs
vector is empty. If yes, it considers that the path "matches", but in
that case there is no matched pattern per se.
W/o this a libgit2 error message could have a mixed encoding:
e.g. a filename in UTF-8 combined with a native Windows error message
encoded with the local code page.
Signed-off-by: Sven Strickroth <email@cs-ware.de>
A leading slash confuses the name normalization code when the flags
include ALLOW_ONELEVEL. Catch this case in particular to avoid
triggering an assertion in the uppercase check which expects us not to
pass it an empty string.
The existing tests don't catch this as they simply use the NORMAL
flag.
This fixes #1300.
This adds a `git_diff_patch_line_stats()` API that gets the total
number of adds, deletes, and context lines in a patch. This will
make it a little easier to emulate `git diff --stat` and the like.
Right now, this relies on generating the `git_diff_patch` object,
which is a pretty heavyweight way to get stat information. At
some future point, it would probably be nice to be able to get
this information without allocating the entire `git_diff_patch`,
but that's a much larger project.
This is a new implementation of core git's config key checking
rules that prevents non-alphanumeric characters (and '-') for
the top-level section and key names inside of config files.
This also validates the target section name when renaming
sections.
OpenBSD's realpath(3) doesn't require the last part of the path to
exist. Override p_realpath in this OS to bring it in line with the
library's assumptions.
Check whether the backslash at the end of the line is being escaped or
not so as not to consider it a continuation marker when it's e.g. a
Windows-style path.
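One way to express the check is to count the backslashes at the end of the line: an odd count leaves one unescaped backslash, which marks a continuation. The sketch below assumes that approach; `ends_with_continuation` is a hypothetical helper and may not match how the parser is actually structured.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `line` (without its newline) ends in
 * an unescaped backslash, i.e. an odd number of trailing backslashes,
 * and 0 otherwise. */
int ends_with_continuation(const char *line)
{
    size_t len = strlen(line), count = 0;

    while (count < len && line[len - 1 - count] == '\\')
        count++;
    return (count % 2) == 1;
}
```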
This is a convenience function to get the branch name of a given
ref. The returned branch name is compatible with the name that can
be supplied e.g. to git_branch_lookup(). That is, the prefixes
"refs/heads" or "refs/remotes" are omitted.
Also added a new test for testing the new function.
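The prefix handling amounts to something like the following sketch. `branch_shorthand` is a hypothetical illustration of the idea, not the new function or its signature.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer into `refname` just past the
 * "refs/heads/" or "refs/remotes/" prefix, or NULL if the reference is
 * not a branch. */
const char *branch_shorthand(const char *refname)
{
    static const char heads[] = "refs/heads/";
    static const char remotes[] = "refs/remotes/";

    if (strncmp(refname, heads, sizeof(heads) - 1) == 0)
        return refname + sizeof(heads) - 1;
    if (strncmp(refname, remotes, sizeof(remotes) - 1) == 0)
        return refname + sizeof(remotes) - 1;
    return NULL;
}
```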
With the new code to make tree iterators support ignore_case,
there is a bug in setting the start entry for range bounded
iterators where memcmp was being used instead of strncasecmp.
This fixes that and expands the tree iterator test to cover
the cases that were broken.
The commit time is already stored as a git_time_t, but we were
parsing it as a uint32_t. This just switches the parser to use
uint64_t which will handle dates further in the future (and adds
some tests of those future dates).
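To show why the wider type matters, here is a minimal sketch that parses a timestamp past the 32-bit range. `parse_commit_time` is hypothetical and uses the standard strtoll with a signed 64-bit result, which is an assumption of this sketch rather than what the parser in this change does.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical parser: reads decimal seconds since the epoch into a
 * 64-bit value. Returns 0 on success and -1 on malformed, negative, or
 * out-of-range input. */
int parse_commit_time(int64_t *out, const char *text)
{
    char *end;
    long long value;

    errno = 0;
    value = strtoll(text, &end, 10);
    if (end == text || errno == ERANGE || value < 0)
        return -1;

    *out = (int64_t)value;
    return 0;
}
```

A value such as 4294967296 (2^32) does not fit in a uint32_t but parses cleanly here.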
When the encoding header changed to be treated as an additional
header, the EOL pointer started to point to the byte after the LF,
making the git__strndup call copy the LF into the value.
Increase the EOL pointer value after copying the data to keep the rest
of the semantics but avoid copying LF.
This moves the check for the "encoding" header into a loop which
is just scanning for non-required headers at the end of a commit
header. That loop will skip unrecognized lines (including header
continuation lines) until a terminating completely blank line is
found, and only then does it move to reading the commit message.
This makes tree iterators directly support case insensitivity by
using a secondary index that can be sorted by icase. Also, this
fixes the ambiguity check in the git_status_file API to also be
case insensitive. Lastly, this adds new test cases for case
insensitive range boundary checking for all types of iterators.
With this change, it should be possible to deprecate the spool
and sort iterator, but I haven't done that yet.
This adds a new external API git_tree_entry_cmp and a new internal
API git_tree_entry_icmp for sorting tree entries. The case
insensitive one is internal only because general users should
never be seeing case-insensitively sorted trees.
git__bsearch and git__tsort did not pass a payload through to the
comparison function. This makes it impossible to implement sorted
lists where the sort order depends on external data (e.g. building
a secondary sort order for the entries in a tree). This commit
adds git__bsearch_r and git__tsort_r versions that pass a third
parameter to the cmp function of a user payload.
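A minimal sketch of a payload-carrying binary search, with a comparator whose order depends on external data. The names `sketch_bsearch_r` and `cmp_by_name` and the exact signature are hypothetical; they are not git__bsearch_r itself.

```c
#include <stddef.h>
#include <string.h>

typedef int (*cmp_r_fn)(const void *key, const void *elem, void *payload);

/* Hypothetical binary search over an array of pointers; `payload` is
 * forwarded to every comparison. Returns 0 and sets *pos to the match,
 * or returns -1 and sets *pos to the insertion point. */
int sketch_bsearch_r(void **array, size_t n, const void *key,
                     cmp_r_fn cmp, void *payload, size_t *pos)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = cmp(key, array[mid], payload);

        if (c == 0) {
            *pos = mid;
            return 0;
        }
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    *pos = lo;
    return -1;
}

/* Comparator whose order depends on external data: each element is an
 * index into a name table that is passed in as the payload. */
int cmp_by_name(const void *key, const void *elem, void *payload)
{
    const char **names = payload;

    return strcmp((const char *)key, names[*(const size_t *)elem]);
}
```

The array here is a secondary sort order: it holds indexes sorted by name while the name table itself stays in its original order.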
This changes the iterator API so that flags can be passed in to
the constructor functions to control the ignore_case behavior.
At this point, the flags are not supported on tree iterators (i.e.
there is no functional change over the old API), but the API
changes are all made to accommodate this.
By the way, I went with a flags parameter because in the future
I have a couple of other ideas for iterator flags that will make
it easier to fix some diff/status/checkout bugs.
Returning GIT_EAMBIGUOUS from inside the status callback gets
overridden with GIT_EUSER. `git_status_file` accounted for this
via the callback payload, but was allowing the error message to
be cleared. Move the `giterr_set` call outside the callback to
where the EUSER case was being dealt with.
In preparation for further iterator changes, this cleans up a few
small things in the iterator API:
* removed the git_iterator_for_repo_index_range API
* made git_iterator_free not be inlined
* minor param name and test function name tweaks
Somewhat surprisingly, this can increase the speed considerably, as we
don't bother trying to decide what to evict, and the most used entries
are quickly back into the cache.
This is an interim solution. While this essentially disables the
--shared flag feature, previously external templates did not work
at all. This change fixes the previously corrected, and since
then failing, repo_init__extended_with_template() test.
The problem is now documented in the source code comments.
The indexer needs to call the packfile's free function so it takes care of
freeing the caches.
We still need to close the mwf descriptor manually so we can rename the
packfile into its final name on Windows.
Core git just looks for NUL bytes in files when deciding about
is-binary inside diff (although it uses a better algorithm in
checkout, when deciding if CRLF conversion should be done).
Libgit2 was using the better algorithm in both places, but that
is causing some confusion. For now, this makes diff just look
for NUL bytes to decide if a file is binary by content in diff.
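The NUL byte rule is simple enough to sketch in a few lines. `looks_binary` is a hypothetical name, and how much of the file gets inspected is left to the caller in this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: content counts as binary if the inspected bytes
 * contain a NUL. Returns 1 for binary and 0 for text. */
int looks_binary(const char *data, size_t len)
{
    return len > 0 && memchr(data, '\0', len) != NULL;
}
```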
This was just wrong. Added a test that verifies patch line
numbers even for hunks further into a file and then fixed the
algorithm. I needed to add a little extra state into the patch
so that I could track old and new file numbers independently,
but it should be okay.
Many delta bases are re-used. Cache them to avoid inflating the same
data repeatedly.
This version doesn't limit the amount of entries to store, so it can
end up using a considerable amount of memory.
This adds an option to checkout a la the diff option to turn off
fnmatch evaluation for pathspec entries. This can be useful to
make sure your "pattern" is really interpreted as an exact file
match only.
All the ODB backends have a specific refresh interface. When reading an
object, first we attempt every single backend: if the read fails, then
we refresh all the backends and retry the read one more time to see if
the object has appeared.
It is not legal inside our `p_mmap` function to mmap a zero length
file. This adds a test that exercises that case inside diff and
fixes the code path where we would try to do that.
The fix turns out not to be a lot of code since our default file
content is already initialized to "" which works in this case.
Fixes #1210
This moves the implementation of these two APIs into common code
that will be shared between the two. Also, this adds tests for
the `git_diff_blob_to_buffer` API. Lastly, this adds some extra
`const` to a few places that can use it.
Before this, we error out from `reference_matches_remote_head` if the
reference we're searching for does not exist.
Since we explicitly check if master is existing in `update_head_to_remote`
and error out if it doesn't, a repository without master branch could
not be cloned.
In fact this was later clobbered by what is fixed in #1194.
However, this patch introduces a `found` member in `head_info` and sets
it accordingly. That also saves us from checking the string length of
`branchname` a few times.
As a function that appears to only be called on error paths, I don't
think it makes sense for it to return an error, or clobber the global
giterr. Note that no existing callsites actually check the return
code.
In my own application, there are errors where the real error ends
up being hidden, as git_mwindow_file_deregister() clobbers the
global giterr. I'm not sure this error is even relevant?
I saw a repo in the wild today which had a master branch ref which was packed, but had no trailing newline. Git handled it fine, but libgit2 choked on it. Fix seems simple enough. If we don't see a newline, assume the end of the buffer is the end of the ref line.
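A minimal sketch of that fallback, assuming the parser works on a start pointer and an end-of-buffer pointer. `ref_line_end` is a hypothetical helper, not the packed-refs parser itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the end of the line that starts at
 * `line`. If no newline is found before `buffer_end`, the end of the
 * buffer is treated as the end of the ref line. */
const char *ref_line_end(const char *line, const char *buffer_end)
{
    const char *eol = memchr(line, '\n', (size_t)(buffer_end - line));

    return eol ? eol : buffer_end;
}
```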
There are a couple of checkout bugs fixed here. One is with
untracked working directory entries that are prefixes of tree
entries but not in a meaningful way (i.e. "read" is a prefix of
"readme.txt" but doesn't interfere in any way). The second bug
is actually a redo of 07edfa0fc640f85f95507c3101e77accd7d2bf0d
where directory entries in the index that are not in the diff
were not being removed correctly. That fix remedied one case
but broke another.
When checking out with the GIT_CHECKOUT_REMOVE_UNTRACKED option
and there was an entire tree in the working directory and in the
index that is not in the baseline nor target commit, the tree was
correctly(?) removed from the working directory but was not
successfully removed from the index. This fixes that and adds a
test of the functionality.
This moves a lot of the detailed checkout documentation into a new
file (docs/checkout-internals.md) and simplifies the public docs
for the checkout API.
There were a bunch of small bugs in the checkout code where I was
assuming that a typechange was always from a tree to a blob or
vice versa. This fixes up most of those cases. Also, there were
circumstances where the submodule definitions were changed by the
checkout and the submodule data was not getting reloaded properly
before the new submodules were checked out.
The notifications were broken from the various iterations over
this code and were not returning working dir item data correctly.
Also, workdir items that were alphabetically after the last item
in diff were not being processed.
The spoolandsort iterator changes got sort-of cherry picked out of
this branch and so I dropped the commit when rebasing; however,
there were a few small changes that got dropped as well (since the
version merged upstream wasn't quite the same as what I dropped).
This adds a new API to the submodule interface that just returns
where information about the submodule was found (e.g. config file
only or in the HEAD, index, or working directory).
Also, the old "refresh" call was potentially keeping some stale
submodule data around, so this simplifies that code and literally
discards the old cache, then reallocates.
Stash was sometimes obscuring the actual error code, replacing it
with a -1 when there was a more descriptive value. This updates
stash to preserve the original error code more reliably along
with a variety of other error handling tweaks.
I believe this is an improvement, but arguably, preserving the
underlying error code may result in values that are harder to
interpret by the caller who does not understand the internals.
Discussion is welcome!
Previously a NULL oid was handled like an empty buffer and
returned a static empty string. This makes git_oid_tostr()
set the output buffer to the empty string instead.
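As a rough sketch of the NULL handling described above (the function name, the 20-byte oid size, and the truncation rule are illustrative assumptions, not the actual git_oid_tostr() code):

```c
#include <stddef.h>

#define SKETCH_OID_RAWSZ 20 /* raw bytes in a SHA-1 (assumption for this sketch) */

/*
 * Hypothetical stand-in for git_oid_tostr(). Writes up to n-1 hex digits
 * of `oid` into `out` and NUL-terminates it. A NULL oid leaves "" in the
 * caller's buffer; only a missing or zero-sized buffer falls back to a
 * shared static empty string.
 */
static char *sketch_oid_tostr(char *out, size_t n, const unsigned char *oid)
{
    static const char hex[] = "0123456789abcdef";
    static char empty[1] = "";
    size_t i, max;

    if (out == NULL || n == 0)
        return empty;

    if (oid == NULL) {
        out[0] = '\0'; /* the caller's buffer becomes the empty string */
        return out;
    }

    max = n - 1;
    if (max > SKETCH_OID_RAWSZ * 2)
        max = SKETCH_OID_RAWSZ * 2;

    for (i = 0; i < max; i++) {
        unsigned char byte = oid[i / 2];
        out[i] = hex[(i % 2 == 0) ? (byte >> 4) : (byte & 0x0f)];
    }
    out[max] = '\0';
    return out;
}
```

With this shape the caller always gets back its own buffer when it supplied one, so the result can be compared or modified without special-casing NULL oids.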
Make checkout update entries in the index for all files that are
updated and/or removed, unless flag GIT_CHECKOUT_DONT_UPDATE_INDEX
is given. To do this, iterators were extended to allow a little
more introspection into the index being iterated over, etc.
This flips checkout back to be driven off the changes between
the baseline and the target trees. This reinstates the complex
code for tracking the contents of the working directory, but
overall, I think the resulting logic is easier to follow.
I've tried to map out the detailed behaviors of checkout and make
sure that we're handling the various cases correctly, along with
providing options to allow us to emulate "git checkout" and "git
checkout-index" with the various flags. I've thrown away flags
in the checkout API that seemed like clutter and added some new
ones. Also, I've converted the conflict callback to a general
notification callback so we can emulate "git checkout" output and
display "dirty" files.
As of this commit, the new behavior is not working 100% but some
of that is probably baked into tests that are not testing the
right thing. This is a decent snapshot point, I think, along the
way to getting the update done.
This corrects the order of operations in git reset so that the
checkout to reset the working directory content is done before
the HEAD is moved. This allows us to use the HEAD and the index
content to know what files can / should safely be reset.
Unfortunately, there are still some cases where the behavior of
this revision differs from core git. Notably, a file which has
been added to the index but is not present in the HEAD is
considered to be tracked by core git (and thus removable by a
reset command) whereas since this loads the target state into
the index prior to resetting, it will consider such a file to be
untracked and won't touch it. That is a larger fix that I'll
defer to a future commit.
* gen_pktline() in smart_protocol.c was skipping refspecs that deleted
refs that were not advertised by the server. The new behavior is to
send a delete command with an old-id of zero, which matches the behavior
of the official git client.
* Update test_network_push__delete() in reaction to above fix.
* Obviate messy logic that handles missing push_spec rrefs by canonicalizing
push_spec. After calculate_work(), loid, roid, and rref are filled in with
exactly what is sent to the server.
The original libpqueue files were licensed under Apache 2.0 so
therefore should retain their copyrights and header as per the
license terms at http://www.apache.org/licenses/LICENSE-2.0
When normalizing a reference name, if there is an error because
the name is invalid, then the memory allocated for storing the
name could be leaked if the caller was not careful and assumed
that the error return code meant that no allocation had occurred.
This fixes that by explicitly deallocating the reference name
buffer if there is an error in normalizing the name.
An earlier change to `git_diff_from_iterators` introduced a
memory leak where the allocated spoolandsort iterator was not
returned to the caller and thus not freed.
One proposal changes all iterator APIs to use git_iterator** so
we can reallocate the iterator at will, but that seems unexpected.
This commit makes it so that an iterator can be changed in place.
The callbacks are isolated in a separate structure and a pointer
to that structure can be reassigned by the spoolandsort extension.
This means that spoolandsort doesn't create a new iterator; it
just allocates a new block of callbacks (along with space for its
own extra data) and swaps that into the iterator.
Additionally, since spoolandsort is only needed to switch the
case sensitivity of an iterator, this simplifies the API to only
take the ignore_case boolean and to be a no-op if the iterator
already matches the requested case sensitivity.
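A minimal sketch of the in-place swap, under the assumption that the callbacks reduce to a single comparison function (all struct and function names here are hypothetical, not the real iterator internals):

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Callbacks live in their own block so they can be replaced as a unit. */
struct sketch_iter_cb {
    int (*compare)(const char *a, const char *b);
    void *extra; /* room for extension-specific data */
};

struct sketch_iter {
    struct sketch_iter_cb *cb; /* reassignable; the iterator itself stays put */
    int ignore_case;
};

static int sketch_cmp_case(const char *a, const char *b)
{
    return strcmp(a, b);
}

static int sketch_cmp_icase(const char *a, const char *b)
{
    for (; *a != '\0' && *b != '\0'; a++, b++) {
        int d = tolower((unsigned char)*a) - tolower((unsigned char)*b);
        if (d != 0)
            return d;
    }
    return tolower((unsigned char)*a) - tolower((unsigned char)*b);
}

static struct sketch_iter_cb sketch_default_cb = { sketch_cmp_case, NULL };

static void sketch_iter_init(struct sketch_iter *it)
{
    it->cb = &sketch_default_cb;
    it->ignore_case = 0;
}

/* Returns 0 on success (including the no-op case), -1 if allocation fails. */
static int sketch_iter_set_ignore_case(struct sketch_iter *it, int ignore_case)
{
    struct sketch_iter_cb *cb;

    ignore_case = (ignore_case != 0);
    if (it->ignore_case == ignore_case)
        return 0; /* already matches the requested case sensitivity */

    cb = malloc(sizeof(*cb));
    if (cb == NULL)
        return -1;
    cb->compare = ignore_case ? sketch_cmp_icase : sketch_cmp_case;
    cb->extra = NULL;

    if (it->cb != &sketch_default_cb)
        free(it->cb);
    it->cb = cb; /* swap the callback block; `it` keeps its address */
    it->ignore_case = ignore_case;
    return 0;
}

static void sketch_iter_clear(struct sketch_iter *it)
{
    if (it->cb != &sketch_default_cb)
        free(it->cb);
    sketch_iter_init(it);
}
```

Because only the `cb` pointer changes, callers holding a plain pointer to the iterator never see it move, which is the property the git_iterator** proposal was trying to work around.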
The diff constructor functions had some confusing names, where the
"old" side of the diff was coming after the "new" side. This
reverses the order in the function name to make it less confusing.
Specifically...
* git_diff_index_to_tree becomes git_diff_tree_to_index
* git_diff_workdir_to_index becomes git_diff_index_to_workdir
* git_diff_workdir_to_tree becomes git_diff_tree_to_workdir
According to man 3 SSL_shutdown / TLS, "If a unidirectional shutdown is
enough (the underlying connection shall be closed anyway), this first
call to SSL_shutdown() is sufficient."
Currently, a unidirectional shutdown is enough, since
gitno_ssl_teardown is called by gitno_close only. Do so to avoid further
errors (by misbehaving peers for example).
Fixes #1129.
While C Git has been writing entry count -1 (i.e. never other negative
numbers) as invalid since day 1, it accepts all negative entry counts
as invalid. JGit follows the same rule. libgit2 should also follow, or
the index that works with C Git or JGit may someday be rejected by
libgit2.
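The rule amounts to something like the following sketch (the helper name and its decimal-text input are assumptions for illustration, not the tree cache parser itself):

```c
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal entry count and report whether the
 * tree cache entry should be treated as valid. Any negative count, not
 * just -1, marks the entry as invalid.
 * Returns 0 on success, -1 if no number could be parsed.
 */
static int sketch_parse_entry_count(const char *text, long *count_out, int *valid_out)
{
    char *end;
    long count = strtol(text, &end, 10);

    if (end == text)
        return -1; /* not a number at all */

    *count_out = count;
    *valid_out = (count >= 0);
    return 0;
}
```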
Other reimplementations like dulwich and grit have not bothered with
parsing or writing tree cache.
The `git_iterator_reset` command has not been working in all cases
particularly when there is a start and end range. This fixes it
and adds tests for it, and also extends it with the ability to
update the start/end range strings when an iterator is reset.
This removes the need to explicitly pass the repo into iterators
where the repo is implied by the other parameters. This moves
the repo to be owned by the parent struct. Also, this has some
iterator related updates to the internal diff API to lay the
groundwork for checkout improvements.
If commit timestamps are off, we're more likely to hit a traversal
where the first path ends up traversing past the root commit of the tree.
If that happens, it's possible that the loop will complete before the second
path marks some of those final parents. This fix keeps track of the root
nodes that are encountered in the traversal and verifies that they are
properly marked.
In the best case, with accurate timestamps, the traversal will continue
to terminate when all the commits are STALE (parents of a merge-base), as
it did before. In the worst case, where one path makes a complete traversal
past a root commit, we will continue the loop until the root commit itself
is marked.
This could also use PTHREAD_MUTEX_INITIALIZER, but a dynamic initializer seems like a more portable concept, and we won't need another #define on top of git_mutex_init()
Storing 4kB or 8kB in the stack is not very gentle. As this part has
to be linear, put the buffer into the indexer object so we allocate it
once in the heap.
There are many different broken filemodes in the wild so we need to
protect against them and give something useful up the chain. Don't
fail when reading a tree from the ODB but normalize the mode as best
we can.
As 664 is no longer a mode that we consider to be valid and gets
normalized to 644, we can stop accepting it in the treebuilder. The
library won't expose it to the user, so any invalid modes are a bug.
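A sketch of what "normalize as best we can" might look like, assuming the usual git mode values; the function name and the order of the checks are illustrative and may not match the library exactly:

```c
#define SKETCH_MODE_TYPE_MASK 0170000u

/*
 * Hypothetical normalizer: keep the object type bits and collapse the
 * permission bits to one of the modes git actually stores.
 */
static unsigned int sketch_normalize_filemode(unsigned int mode)
{
    switch (mode & SKETCH_MODE_TYPE_MASK) {
    case 0040000u: /* tree */
        return 0040000u;
    case 0160000u: /* commit (submodule entry) */
        return 0160000u;
    case 0120000u: /* symlink */
        return 0120000u;
    default:
        break;
    }

    /* Anything else is treated as a blob; any x bit makes it executable. */
    return (mode & 0111u) ? 0100755u : 0100644u;
}
```

Under these assumptions a group-writable 0100664 entry comes back as 0100644, which is why the treebuilder no longer needs to accept 664 on input.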
To paraphrase @peff:
You can get both size and type from a packed object reasonably cheaply.
If you have:
* An object that is not a delta; both type and size are available in the
packfile header.
* An object that is a delta. The packfile type will be OBJ_*_DELTA, and
you have to resolve back to the base to find the real type. That means
potentially a lot of packfile index lookups, but each one is
relatively cheap. For the size, you inflate the first few bytes of the
delta, whose header will tell you the resulting size of applying the
delta to the base.
For simplicity, we just decompress the whole delta for now.
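For the non-delta case, reading type and size out of the packfile object header can be sketched as below. The function name is made up for this example; the bit layout (3 type bits and 4 size bits in the first byte, then 7 size bits per continuation byte) is the standard pack entry header:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical parser for a packfile object header.
 * Returns the number of header bytes consumed, or 0 if the buffer is
 * truncated or the size would not fit in 64 bits.
 * Types 6 and 7 are the OFS_DELTA and REF_DELTA types; for those, `size`
 * is the size of the delta data, not of the resulting object.
 */
static size_t sketch_parse_pack_header(const unsigned char *buf, size_t len,
                                       int *type, uint64_t *size)
{
    size_t used = 0;
    unsigned int shift = 4;
    unsigned char c;

    if (len == 0)
        return 0;

    c = buf[used++];
    *type = (c >> 4) & 7;
    *size = c & 0x0f;

    while (c & 0x80) { /* high bit set: another size byte follows */
        if (used >= len || shift > 57)
            return 0;
        c = buf[used++];
        *size |= (uint64_t)(c & 0x7f) << shift;
        shift += 7;
    }
    return used;
}
```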
A mmap-window is not guaranteed to give you the whole object, but the
indexer currently assumes so.
Loop asking for more data until we've successfully CRC'd all of the
packed data.
Up to now, deltas needed to be entirely in the packfile, and we tried
to decompress them in their entirety over and over again.
Adjust the logic so we read them as they come, just as we do for full
objects. This also allows us to simplify the logic and have less
nested code. The delta resolving phase still needs to decompress the
whole object into memory, as there is not yet any streaming
delta-apply support, but it helps in speeding up the downloading
process and reduces the amount of memory allocations we need to do.
The new API allows us to read the object bit by bit from the packfile,
instead of needing it all at once in the packfile. This also allows us
to hash the object as it comes in from the network instead of having
to try to read it all and failing repeatedly for larger objects.
This is only the first step, but it already shows huge improvements
when dealing with objects over a few megabytes in size. It reduces the
memory needs in some cases, but delta objects still need to be
completely in memory and the old inefficient method is still used for
that.
`revwalk.h:commit_lookup()` -> `git_revwalk__commit_lookup()`
and make `git_commit_list_parse()` do real error checking that
the item in the list is an actual commit object. Also fixed an
apparent typo in a test name.
Moved it into graph.{c,h} which I created for the new "graph"
functions namespace. Also adjusted the function prototype
to use `size_t` and `const git_oid *`.
There are many scattered functions that look into the contents of
buffers to do various text manipulations (such as escaping or
unescaping data, calculating text stats, guessing if content is
binary, etc). This groups all those functions together into a
new file and converts the code to use that.
This has two enhancements to existing functionality. The old
text stats function is significantly rewritten and the BOM
detection code was extended (although largely we can't deal with
anything other than a UTF8 BOM).
clang-SVN HEAD kindly provided me the info that sm_repo may be
uninitialized when we want to free it (if the expressions on line 358 or
359/360 evaluate to true, we jump to "cleanup", where we'd use sm_repo
uninitialized).
This fixes some missed places where we can apply const-ness to
various public APIs.
There are still some index and tree APIs that cannot take const
pointers because we sort our `git_vectors` lazily and so we can't
reliably bsearch the index and tree content without applying a
`git_vector_sort()` first.
This also fixes some missed places where size_t can be used and
where const can be applied to a couple internal functions.
This makes the diff functions that take callbacks both take
the payload parameter after the callback function pointers and
pass the payload as the last argument to the callback function
instead of the first. This should make them consistent with
other callbacks across the API.
3f9eb1e introduced support for SSL certificates issued for IP
addresses, making use of in_addr and in_addr6 structs. On FreeBSD
these are defined in (a file included in) <netinet/in.h>, so include
that file on FreeBSD and get the build working again.
The workdir iterator has always tried to ignore .git files, but
it turns out there were some bugs. This makes it more robust at
ignoring .git files.
This also makes iterators always check ".git" case insensitively
regardless of the properties of the system. This will make libgit2
skip ".GIT" and the like. This is different from core git, but on
systems with case insensitive but case preserving file systems,
allowing ".GIT" to be added is problematic.
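The case-insensitive check could look roughly like this (a standalone sketch with a made-up function name, not the iterator code itself):

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical check: returns 1 if `name` is ".git" in any letter case
 * (".GIT", ".Git", ...), 0 otherwise. Longer names such as ".gitignore"
 * do not match.
 */
static int sketch_is_dot_git(const char *name)
{
    static const char dotgit[] = ".git";
    size_t i;

    for (i = 0; dotgit[i] != '\0'; i++) {
        /* A short name hits its NUL here and fails the comparison. */
        if (tolower((unsigned char)name[i]) != dotgit[i])
            return 0;
    }
    return name[i] == '\0';
}
```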
This checks for a leading '.' before looking for the invalid
tree entry names. Even on pretty high levels of optimization,
this seems to make a measurable improvement.
I accidentally used && in the check initially instead of || and
while debugging ended up improving the error reporting of issues
with adding tree entries. I thought I'd leave those changes, too.
A number of diff APIs and the `git_checkout_index` API take a
`git_repository` object and operate on the index. This updates
them to take a `git_index` pointer explicitly and only fall back
on the `git_repository` index if the index input is NULL. This
makes it easier to operate on a temporary index.
The index iterator could previously only be created from a repo
object, but this allows creating an iterator from a `git_index`
object instead (while keeping, though renaming, the old function).
The existing p_lstat implementation on win32 is not quite POSIX
compliant when setting errno to ENOTDIR. This adds an option to
make it compliant so that code (such as checkout) that cares
to have separate behavior for ENOTDIR can use it portably.
This also contains a couple of other minor cleanups in the
posix_w32.c implementations to avoid unnecessary work.
Using the builtin strcmp and strcasecmp as function pointers is
problematic on win32. This adds internal implementations and
divorces us from the platform linkage.
Returning NULL for the string when we haven't signaled an error
condition is counter-intuitive and causes unnecessary edge
cases. Return an empty string when asking for a string value for a
configuration variable such as '[section] var' to avoid these edge
cases.
If the distinction between no value and an empty value is needed, this
can be retrieved from the entry directly. As a side-effect, this
change stops the int parsing functions from segfaulting on such a
variable.
This fixes a number of warnings and problems with cross-platform
builds. Among other things, it's not safe to name a member of a
structure "strcmp" because that may be #defined.
This is a major reworking of checkout strategy options. The
checkout code is now sensitive to the contents of the HEAD tree
and the new options allow you to update the working tree so that
it will match the index content only when it previously matched
the contents of the HEAD. This allows you, for example, to
distinguish between removing files that are in the HEAD but not
in the index, vs just removing all untracked files.
Because of various corner cases that arise, etc., this required
some additional capabilities in rmdir and other utility functions.
This includes the beginnings of an implementation of code to read
a partial tree into the index based on a pathspec, but that is
not enabled because of the possibility of creating conflicting
index entries.
There are some diff functions that are useful in a rewritten
checkout and this lays some groundwork for that. This contains
three main things:
1. Share the function diff uses to calculate the OID for a file
in the working directory (now named `git_diff__oid_for_file`)
2. Add a `git_diff__paired_foreach` function to iterate over
two diff lists concurrently. Convert status to use it.
3. Move all the string/prefix/index entry comparisons into
function pointers inside the `git_diff_list` object so they
can be switched between case sensitive and insensitive
versions. This makes them easier to reuse in various
functions without replicating logic. As part of this, move
a couple of index functions out of diff.c and into index.c.
Diff uses a `git_strarray` of path specs to represent a subset
of all files to be processed. It is useful to be able to reuse
this filtering in other places outside diff, so I've moved it
into a standalone set of utilities.
This makes it so that the check if a file is ignored will be
deferred until requested on the workdir iterator, instead of
aggressively evaluating the ignore rules for each entry. This
should improve performance because there will be no need to
check ignore rules for files that are already in the index.
So, @nulltoken created a failing test case for checkout that
proved to be particularly daunting. If checkout is given only
a very limited strategy mask (e.g. just GIT_CHECKOUT_CREATE_MISSING)
then it is possible for typechange/rename modifications to leave it
unable to complete the request. That's okay, but the existing code
did not have enough information not to generate an error (at least
for tree/blob conflicts).
This led me to a significant reorganization of the code to handle
the failing case, but it has three benefits:
1. The test case is handled correctly (I think)
2. The new code should actually be much faster than the old code
since I decided to make checkout aware of diff list internals.
3. The progress value accuracy is hugely increased since I added
a fourth pass which calculates exactly what work needs to be
done before doing anything.
* Rework GIT_DIRREMOVAL values to GIT_RMDIR flags, allowing
combinations of flags
* Add GIT_RMDIR_EMPTY_PARENTS flag to remove parent dirs that
are left empty after removal
* Add GIT_MKDIR_VERIFY_DIR to give an error if item is a file,
not a dir (previously an EEXISTS error was ignored, even for
files) and enable this flag for git_futils_mkpath2file call
* Improve accuracy of error messages from git_futils_mkdir
This fix makes libgit2 capable of parsing annotated tag objects that lack
the optional message/description field.
Previously, libgit2 treated this field as mandatory and raised a tag_error on
such tags. However, the message field is optional.
An example of such a tag is refs/tags/v2.6.16.31-rc1 in Linux:
$ git cat-file tag refs/tags/v2.6.16.31-rc1
object afaa018cefb6af63befef1df7d8febaae904434f
type commit
tag v2.6.16.31-rc1
tagger Adrian Bunk <bunk@stusta.de> 1162716505 +0100
$
This improves docs in some of the public header files, cleans
up and improves some of the example code, and fixes a couple
of pedantic warnings in places.
This adds a new API that allows users to reload the config if the
file has changed on disk. A new config callback function to
refresh the config was added.
The modified time and file size are used to test if the file needs
to be reloaded (and are now stored in the disk backend object).
In writing tests, just using mtime was a problem / race, so I
wanted to check file size as well. To support that, I extended
`git_futils_readbuffer_updated` to optionally check file size in
addition to mtime, and I added a new function `git_filebuf_stats`
to fetch the mtime and size for an open filebuf (so that the
config could be easily refreshed after a write).
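The mtime-plus-size test can be sketched with plain stat(2) as follows. The struct and function names are hypothetical and this is not the `git_futils_readbuffer_updated` implementation, just the idea behind it:

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Hypothetical record of what the file looked like when last loaded. */
struct sketch_file_stamp {
    time_t mtime;
    off_t size;
};

/* Returns 1 if either the mtime or the size differs from the stamp. */
static int sketch_stamp_differs(const struct sketch_file_stamp *stamp,
                                time_t mtime, off_t size)
{
    return stamp->mtime != mtime || stamp->size != size;
}

/*
 * Returns 1 if the file changed (and updates the stamp), 0 if it looks
 * unchanged, -1 if the file cannot be stat'ed.
 */
static int sketch_file_updated(const char *path, struct sketch_file_stamp *stamp)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;

    if (!sketch_stamp_differs(stamp, st.st_mtime, st.st_size))
        return 0;

    stamp->mtime = st.st_mtime;
    stamp->size = st.st_size;
    return 1;
}
```

Checking the size as well catches a rewrite that lands within the same mtime tick, as long as the new content has a different length.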
Lastly, I moved some similar file checking code for attributes
into filebuf. It is still only being used for attrs, but it
seems potentially reusable, so I thought I'd move it over.
This improves the naming for the rename related functionality
moving it to be called `git_diff_find_similar()` and renaming
all the associated constants, etc. to make more sense.
I also moved the new code (plus the existing `git_diff_merge`)
into a new file `diff_tform.c` where I can put new functions
related to manipulating git diff lists.
This also updates the implementation significantly from the
last revision fixing some ordering issues (where break-rewrite
needs to be handled prior to copy and rename detection) and
improving config option handling.
This adds a `git_diff_patch_print()` API which is more like the
existing API to "print" a patch from an entire `git_diff_list`
but operates on a single `git_diff_patch` object.
Also, it rewrites the `git_diff_patch_to_str()` API to use that
function (making it very small).
This implements the basis for diff rename and copy detection,
although it is based on simple SHA comparison right now instead
of using a matching algorithm. Just as `git_diff_merge` can be
used as a post-pass on diffs to emulate certain command line
behaviors, there is a new API `git_diff_detect` which will
update a diff list in-place, adjusting some deltas to RENAMED
or COPIED state (and also, eventually, splitting MODIFIED deltas
where the change is too large into DELETED/ADDED pairs).
This also adds a new test repo that will hold rename/copy/split
scenarios. Right now, it just has exact-match rename and copy,
but the tests are written to use tree diffs, so we should be able
to add new test scenarios easily without breaking tests.
libcrypto's SHA-1 implementation is measurably better than the one that
ships with the library. If we link to it for HTTPS support already,
use that implementation instead.
Testing on ~600MB of the linux repository, this reduces indexing
time by 40% and removes the hashing from the top spot in the perf
output.
Added `struct git_config_entry`: a git_config_entry contains the key, the value, and the config file level from which a config element was found.
Added `git_config_open_level`: build a single-level focused config object from a multi-level one.
We are now storing `git_config_entry`s in the khash of the config_file
- make sure temporary streamed blobs are created under the
.git/objects folder and not in the current path, whatever it is.
- do not make the name of the temp file depend on the hintpath.
git_index_read_tree() was exposing a parameter to provide the user with
a progress indicator. Unfortunately, due to the recursive nature of the
tree walk, the maximum number of items to process was unknown. Thus,
the indicator was only counting processed entries, without providing
any information about the number of remaining items.
The new Win32 global path search was not working with the
environment variable tests. But when I fixed the test, the new
code's use of getenv() was causing more failures (presumably because
of caching on Windows ???). This fixes the global file lookup to
always go directly to the Win32 API in a predictable way.
Introduce git_remote_stop() which sets a variable that is checked by
the fetch process in a few key places. If this variable is set, the
fetch is aborted.
Fixed no-submodule speedup of new checkout code. Fixed missing
final update to progress (which may go away, I realize). Fixed
unused structure in header and incorrect comment.
To answer if a single given file should be ignored, the path to
that file has to be processed progressively checking that there
are no intermediate ignored directories in getting to the file
in question. This enables that, fixing the broken old behavior,
and adds tests to exercise various ignore situations.
Because fnmatch uses recursion, there were some input sequences
that cause seriously degenerate behavior. This imports a fix
that imposes a max recursion limiter to avoid the worst of it.
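To illustrate the idea of a recursion limiter, here is a simplified wildcard matcher that only understands '*' and '?'. The names, return codes, and the limit of 64 are assumptions for this sketch; it is not the imported fnmatch code:

```c
#define SKETCH_MATCH 0
#define SKETCH_NOMATCH 1
#define SKETCH_NORES 2 /* gave up: recursion limit reached */
#define SKETCH_MAX_RECURSION 64

static int sketch_match_r(const char *pat, const char *str, int depth)
{
    if (depth > SKETCH_MAX_RECURSION)
        return SKETCH_NORES;

    for (;; pat++, str++) {
        switch (*pat) {
        case '\0':
            return *str == '\0' ? SKETCH_MATCH : SKETCH_NOMATCH;
        case '?':
            if (*str == '\0')
                return SKETCH_NOMATCH;
            break;
        case '*':
            while (*pat == '*') /* collapse runs of '*' */
                pat++;
            if (*pat == '\0')
                return SKETCH_MATCH;
            /* Each '*' costs one level of recursion. */
            for (; *str != '\0'; str++) {
                int r = sketch_match_r(pat, str, depth + 1);
                if (r != SKETCH_NOMATCH)
                    return r; /* match, or limit hit: stop searching */
            }
            return SKETCH_NOMATCH;
        default:
            if (*pat != *str)
                return SKETCH_NOMATCH;
            break;
        }
    }
}

static int sketch_match(const char *pat, const char *str)
{
    return sketch_match_r(pat, str, 0);
}
```

A pattern with more '*' groups than the limit is reported as SKETCH_NORES instead of being explored exhaustively, which trades a rare refusal for bounded work.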
We used to require loose references to contain only an OID (possibly
after trimming the string). This is however not enough for letting us
lookup FETCH_HEAD, which can have a lot of content after the initial
OID.
Change the parsing rules so that a loose reference must be at least 40
bytes long and the 41st (if it's there) must be accepted by
isspace(3). This makes the trim unnecessary, so only do it for
symrefs. This fixes #977.
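The new rule can be sketched as a small predicate. The function name is hypothetical, and the hex-digit check is an added assumption standing in for the actual OID parsing:

```c
#include <ctype.h>
#include <stddef.h>

#define SKETCH_OID_HEXSZ 40

/*
 * Hypothetical check for the contents of a loose (non-symbolic) ref:
 * at least 40 bytes, the first 40 being hex digits (assumption), and the
 * 41st byte, if present, accepted by isspace(3). Anything after that,
 * such as the rest of a FETCH_HEAD line, is ignored.
 */
static int sketch_loose_ref_has_oid(const char *buf, size_t len)
{
    size_t i;

    if (len < SKETCH_OID_HEXSZ)
        return 0;

    for (i = 0; i < SKETCH_OID_HEXSZ; i++) {
        if (!isxdigit((unsigned char)buf[i]))
            return 0;
    }

    if (len > SKETCH_OID_HEXSZ && !isspace((unsigned char)buf[SKETCH_OID_HEXSZ]))
        return 0;

    return 1;
}
```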
The fix for fetching from empty repositories (22935b06d protocol:
don't store flushes; 2012-10-07) forgot to take into account the
deletion of the flush pkt in the HTTP transport. As a result, the HEAD
ref advertisement where we detect the remote's capabilities was
deleted instead. Fix this.
This started as a complex new test for checkout going through the
"typechanges" test repository, but that revealed numerous issues
with checkout, including:
* complete failure with submodules
* failure to create blobs with exec bits
* problems when replacing a tree with a blob because the tree
"example/" sorts after the blob "example" so the delete was
being processed after the single file blob was created
This fixes most of those problems and includes a number of other
minor changes that made it easier to do that, including improving
the TYPECHANGE support in diff/status, etc.
This is just some cleanup code, rearranging some of the checkout
code where TYPECHANGE support was added and adding some comments
to the diff header regarding the constants.
When I wrote the diff code, I based it on core git's diff output
which tends to split a type change into an add and a delete. But
core git's status has the notion of a T (typechange) flag for a
file. This introduces that into our status APIs and modifies the
diff code so it can be forced to not split type changes.
This adds a test for the submodule diff capabilities and then
fixes a few bugs with how the output is generated. It improves
the accuracy of OIDs in the diff delta object and makes the
submodule output more closely mirror the OIDs that will be used
by core git.
There are a few cases where diff should leave directories in
the diff list if we want to match core git, such as when the
directory contains a .git dir. That feature was lost when I
introduced some of the new submodule handling.
This restores that and then fixes a couple of issues related to diff
output that are triggered by having diffs with directories in
them.
Also, this adds a new flag that can be passed to diff if you
want diff output to actually include the file content of any
untracked files.
The reference is only needed inside the function. We mistakenly
increased the reference counter causing the ODB not to get freed and
leaking descriptors.
Storing flushes in the refs vector doesn't let us recognize when the
remote is empty, as we'd always introduce at least one element into
it. These flushes aren't necessary, so we can simply ignore them.
We don't have anything useful that we could do with that oid anyway (We
need to query the submodule for the HEAD commit instead).
Without this, the following code creates the error "Failed to read
descriptor: Is a directory" when run against the submod2 test-case:
const char* oidstr = "873585b94bdeabccea991ea5e3ec1a277895b698";
git_tree* tree = resolve_commit_oid_to_tree(g_repo, oidstr);
git_diff_list* diff = NULL;
cl_assert(tree);
cl_git_pass(git_diff_workdir_to_tree(g_repo, NULL, tree, &diff));
1. teach diff.c:maybe_modified to query git_submodule_status for the
modification state of a submodule. According to the
git_submodule_status docs, it will filter for to-ignore states
already.
2. teach diff_output.c:get_workdir_content to check the submodule status
again and create a line like:
Subproject commit <SHA-1>\n
or
Subproject commit <SHA-1>-dirty\n
like git.git does.
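Generating that line is simple enough to sketch; the helper below is hypothetical and only shows the text format, with the dirty flag standing in for whatever the submodule status check reports:

```c
#include <stdio.h>

/*
 * Hypothetical formatter for the synthetic submodule "content".
 * Returns the number of characters written (excluding the NUL),
 * or -1 if the buffer is too small.
 */
static int sketch_submodule_text(char *out, size_t n, const char *sha_hex, int dirty)
{
    int len = snprintf(out, n, "Subproject commit %s%s\n",
                       sha_hex, dirty ? "-dirty" : "");

    if (len < 0 || (size_t)len >= n)
        return -1;
    return len;
}
```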
diff_output.c:get_blob_content used to try to read the submodule commit
as a blob in the superproject's odb. Of course it cannot find it and
errors out with GIT_ENOTFOUND, implcitly terminating the whole diff
output.
This patch teaches it to create a text that describes the submodule
instead. The text looks like:
Subproject commit <SHA1>\n
which is what git.git does, too.
Together with include-tag, this makes us behave more like git. After a
fetch, try to create any tags the remote told us about for which we
have objects locally.
Indicate whether the error comes from the ref already existing or
elsewhere. We always perform the check and this lets the user write
more concise code.
There are a lot of places where the diff API gives the user access
to internal data structures and many of these were being exposed
through non-const pointers. This replaces them all with const
pointers for any object that the user can access but is still
owned internally to the git_diff_list or git_diff_patch objects.
This will probably break some bindings... Sorry!
This fixes all the bugs in the new diff patch code. The only
really interesting one is that when we merge two diffs, we now
have to actually exclude diff delta records that are not supposed
to be tracked, as opposed to before where they could be included
because they would be skipped silently by `git_diff_foreach()`.
Other than that, there are just minor errors.
Replacing the `git_iterator` object, this creates a simple API
for accessing the "patch" for any file pair in a diff list and
then gives indexed access to the hunks in the patch and the lines
in the hunk. This is the initial implementation of this revised
API - it is still broken, but at least builds cleanly.
This file is not only read when the global config file (%HOME%/.gitconfig)
is not found; it is always used, but with lower priority.
Signed-off-by: Sven Strickroth <email@cs-ware.de>
Do not hardcode the installation path of msysgit, but read installation path from registry.
Also "%PROGRAMFILES%\Git\etc" won't work on x64 systems with 64-bit libgit2, because
msysgit is x86 only and located in "%ProgramFiles(x86)%\Git\etc".
Signed-off-by: Sven Strickroth <email@cs-ware.de>
On most systems %USERPROFILE% is the same as %HOMEDRIVE%\%HOMEPATH%,
however, for windows machines in an AD or domain environment this
might be different and %HOMEDRIVE%\%HOMEPATH% seems to be better.
Signed-off-by: Sven Strickroth <email@cs-ware.de>
Use %HOME% before trying to figure out the windows user directory.
Users might set this as they are used to on *nix systems.
Signed-off-by: Sven Strickroth <email@cs-ware.de>
For quite a while now, git_branch_foreach has been able to list branches
without the 'refs/heads/' or 'refs/remotes' prefixes.
This patch teaches git_tag_list to do the same for listing tags.
There has been discussion for a while about making some set of
the `giterr_set` type functions part of the public API for code
that is implementing new backends to libgit2. This makes the
`giterr_set_str()` and `giterr_set_oom()` functions public.
The old method was avoiding re-loading of packfiles by watching the mtime of the
pack directory. This causes the ODB to become stale if the directory and packfile
are written within the same clock millisecond, as when cloning a fairly small
repo.
This method tries to find the object in the cached packs, and forces a refresh when
that fails. This will cause extra stat'ing on a miss, but speeds up the success
case and avoids this race condition.
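The lookup strategy can be sketched with a toy model in which objects are plain integers and "refreshing" just re-reads a list standing in for the pack directory. Every name here is hypothetical; the real backend deals with packfiles and oids:

```c
#include <stddef.h>

/* Toy model of a pack backend: a cached view plus what is really on disk. */
struct sketch_pack_backend {
    const int *cached;
    size_t cached_n;
    const int *on_disk;
    size_t on_disk_n;
    unsigned int refresh_count;
};

static int sketch_find(const int *ids, size_t n, int id)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (ids[i] == id)
            return 1;
    }
    return 0;
}

/* Stand-in for rescanning the pack directory. */
static void sketch_refresh(struct sketch_pack_backend *b)
{
    b->cached = b->on_disk;
    b->cached_n = b->on_disk_n;
    b->refresh_count++;
}

/* Look in the cached packs first; only a miss forces a refresh and retry. */
static int sketch_pack_exists(struct sketch_pack_backend *b, int id)
{
    if (sketch_find(b->cached, b->cached_n, id))
        return 1;

    sketch_refresh(b);
    return sketch_find(b->cached, b->cached_n, id);
}
```

Hits never touch the directory at all, while a genuine miss pays for exactly one rescan, which matches the trade-off described above.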