`git_diff_get_patch()` would unconditionally load the patch object and
then simply leak it if the user hadn't requested it. Short-circuit
loading the object if the user doesn't want it.
The rest of the plugs are simply calling the free functions of objects
allocated during the tests.
This was the first implementation and its goal was simply to have
something that worked. It is slow and now it's just taking up
space. Remove it and switch the one known usage to use the streaming
indexer.
This removes assertions that prevent us from having an empty
git_config object and then updates some tests that were
dependent on global config state to use an empty config before
running anything.
When creating files, instead of actually using GIT_FILEMODE_BLOB
and the other various constants that happen to correspond to
mode values, apparently I should be just using 0666 and 0777, and
relying on the umask to clear bits and make the value sane.
This fixes the rules for copying a template directory and fixes
the checks to match that new behavior. (Further changes to the
checkout logic to follow separately.)
The new tests were not taking core.filemode into account when
testing file modes after repo initialization. Fixed that and some
other Windows warnings that have crept in.
When PR #1359 removed the hooks from the test resources/template
directory, it made me realize that the tests for
git_repository_init_ext using templates must be pretty shabby
because we could not have been testing if the hooks were getting
created correctly.
So, this started with me recreating a couple of hooks, including
a sample and symlink, and adding tests that they got created
correctly in the various circumstances, including with the SHARED
modes, etc. Unfortunately this uncovered some issues with how
directories and symlinks were copied and chmod'ed. Also, there
was a FIXME in the code related to the chmod behavior as well.
Going back over the directory creation logic for setting up a
repository, I found it was a little difficult to read and could
result in creating and/or chmod'ing directories that the user
almost certainly didn't intend.
So that let to this work which makes repo initialization much
more careful (and hopefully easier to follow). It required a
couple of extensions / changes to core fileops utilities, but I
also think those are for the better, at least for git_futils_cp_r
in terms of being careful about what actions it takes.
This is designed to fix libgit2sharp #350 where if .gitignore is
a directory we abort all operations that process ignores instead
of just skipping it as core git does.
Also added test that fails without this change and passes with it.
This includes tests for crlf changes, whitespace changes with the
default comparison and with the ignore whitespace comparison, and
more sensitivity checking for the comparison code.
This fixes both a test that I broke in diff::patch where I was
relying on the current state of the working directory for the
renames test data and fixes an unstable test in diff::rename
where the environment setting for the "diff.renames" config was
being allowed to influence the test results.
This adds some new tests that actually exercise the similarity
metric between files to detect renames, copies, and split modified
files that are too heavily modified.
There is still more testing to do - these tests are just partially
covering the cases.
There is also one bug fix in this where a change set with only
MODIFY being broken into ADD/DELETE (due to low self-similarity)
without any additional RENAMED entries would end up not processing
the split requests (because the num_rewrites counter got reset).
This is the initial integration of the similarity metric into
the `git_diff_find_similar()` code path. The existing tests all
pass, but the new functionality isn't currently well tested. The
integration does go through the pluggable metric interface, so it
should be possible to drop in an alternative to the internal
metric that libgit2 implements.
This comes along with a behavior change for an existing interface;
namely, passing two NULLs to git_diff_blobs (or passing NULLs to
git_diff_blob_to_buffer) will now call the file_cb parameter zero
times instead of one time. I know it's strange that that change
is paired with this other change, but it emerged from some
initialization changes that I ended up making.
Previously the git_diff_delta recorded if the delta was binary.
This replaces that (with no net change in structure size) with
a full set of flags. The flag values that were already in use
for individual git_diff_file objects are reused for the delta
flags, too (along with renaming those flags to make it clear that
they are used more generally).
This (a) makes things somewhat more consistent (because I was
using a -1 value in the "boolean" binary field to indicate unset,
whereas now I can just use the flags that are easier to understand),
and (b) will make it easier for me to add some additional flags to
the delta object in the future, such as marking the results of a
copy/rename detection or other deltas that might want a special
indicator.
While making this change, I officially moved some of the flags that
were internal only into the private diff header.
This also allowed me to remove a gross hack in rename/copy detect
code where I was overwriting the status field with an internal
value.
This plugs in the three basic similarity strategies for handling
whitespace via internal use of the pluggable API. In so doing, I
realized that the use of git_buf in the hashsig API was not needed
and actually just made it harder to use, so I tweaked that API as
well.
Note that the similarity metric is still not hooked up in the
find_similarity code - this is just setting out the function that
will be used.
This moves the similarity metric code out of buf_text and into a
new file. Also, this implements a different approach to similarity
measurement based on a Rabin-Karp rolling hash where we only keep
the top 100 and bottom 100 hashes. In theory, that should be
sufficient samples to given a fairly accurate measurement while
limiting the amount of data we keep for file signatures no matter
how large the file is.
This adds a new `git_buf_text_hashsig` type and functions to
generate these hash signatures and compare them to give a
similarity score. This can be plugged into diff similarity
scoring.
The recent changes with git_treebuilder_entrycount point out that
the test coverage for git_treebuilder_remove and
git_treebuilder_entrycount is completely absent. This adds tests.
The cppcheck static analyzer generates warnings for a bunch of
places in the libgit2 code base. All the ones fixed in this
commit are actually false positives, but I've reorganized the
code to hopefully make it easier for static analysis tools to
correctly understand the structure. I wouldn't do this if I
felt like it was making the code harder to read or worse for
humans, but in this case, these fixes don't seem too bad and will
hopefully make it easier for better analysis tools to get at any
real issues.
The callback will be called for each file, just before the `git_delta_t` gets inserted into the diff list.
When the callback:
- returns < 0, the diff process will be aborted
- returns > 0, the delta will not be inserted into the diff list, but the diff process continues
- returns 0, the delta is inserted into the diff list, and the diff process continues
A leading slash confuses the name normalization code when the flags
include ALLOW_ONELEVEL. Catch this case in particular to avoid
triggering an assertion in the uppercase check which expects us not to
pass it an empty string.
The existing tests don't catch this as they simply use the NORMAL
flag.
This fixes#1300.
This adds a `git_diff_patch_line_stats()` API that gets the total
number of adds, deletes, and context lines in a patch. This will
make it a little easier to emulate `git diff --stat` and the like.
Right now, this relies on generating the `git_diff_patch` object,
which is a pretty heavyweight way to get stat information. At
some future point, it would probably be nice to be able to get
this information without allocating the entire `git_diff_patch`,
but that's a much larger project.
- Fix stack corruption introduced in 9bccf33c due to passing pointer to
local variable _cred_acquire_called.
- Fix strcmp in do_verify_push_status when expected or actual push_status
is NULL
Check whether the backslash at the end of the line is being escaped or
not so as not to consider it a continuation marker when it's e.g. a
Windows-style path.
This is a convenience function to get the branch name of a given
ref. The returned branch name is compatible with the name that can
be supplied e.g. to git_branch_lookup(). That is, the prefixes
"refs/heads" or "refs/remotes" are omitted.
Also added a new test for testing the new function.
With the new code to make tree iterators support ignore_case,
there is a bug in setting the start entry for range bounded
iterators where memcmp was being used instead of strncasecmp.
This fixes that and expands the tree iterator test to cover
the cases that were broken.
The commit time is already stored as a git_time_t, but we were
parsing is as a uint32_t. This just switches the parser to use
uint64_t which will handle dates further in the future (and adds
some tests of those future dates).
This moves the check for the "encoding" header into a loop which
is just scanning for non-required headers at the end of a commit
header. That loop will skip unrecognized lines (including header
continuation lines) until a terminating completely blank line is
found, and only then does it move to reading the commit message.
This makes tree iterators directly support case insensitivity by
using a secondary index that can be sorted by icase. Also, this
fixes the ambiguity check in the git_status_file API to also be
case insensitive. Lastly, this adds new test cases for case
insensitive range boundary checking for all types of iterators.
With this change, it should be possible to deprecate the spool
and sort iterator, but I haven't done that yet.
This adds a test that confirms that the working directory iterator
can actually correctly process ranges of files case insensitively
with proper sorting and proper boundaries.
This changes the iterator API so that flags can be passed in to
the constructor functions to control the ignore_case behavior.
At this point, the flags are not supported on tree iterators (i.e.
there is no functional change over the old API), but the API
changes are all made to accomodate this.
By the way, I went with a flags parameter because in the future
I have a couple of other ideas for iterator flags that will make
it easier to fix some diff/status/checkout bugs.
In preparation for further iterator changes, this cleans up a few
small things in the iterator API:
* removed the git_iterator_for_repo_index_range API
* made git_iterator_free not be inlined
* minor param name and test function name tweaks
Core git just looks for NUL bytes in files when deciding about
is-binary inside diff (although it uses a better algorithm in
checkout, when deciding if CRLF conversion should be done).
Libgit2 was using the better algorithm in both places, but that
is causing some confusion. For now, this makes diff just look
for NUL bytes to decide if a file is binary by content in diff.
This was just wrong. Added a test that verifying patch line
numbers even for hunks further into a file and then fixed the
algorithm. I needed to add a little extra state into the patch
so that I could track old and new file numbers independently,
but it should be okay.
This adds an option to checkout a la the diff option to turn off
fnmatch evaluation for pathspec entries. This can be useful to
make sure your "pattern" in really interpretted as an exact file
match only.
It is not legal inside our `p_mmap` function to mmap a zero length
file. This adds a test that exercises that case inside diff and
fixes the code path where we would try to do that.
The fix turns out not to be a lot of code since our default file
content is already initialized to "" which works in this case.
Fixes#1210
This moves the implementation of these two APIs into common code
that will be shared between the two. Also, this adds tests for
the `git_diff_blob_to_buffer` API. Lastly, this adds some extra
`const` to a few places that can use it.
It turns out that using REMOVE_UNTRACKED with checkout for this
particular test was causing the .gitattributes file to be removed
and so we do have to allow for the CRs in the created file...
There are a couple of checkout bugs fixed here. One is with
untracked working directory entries that are prefixes of tree
entries but not in a meaningful way (i.e. "read" is a prefix of
"readme.txt" but doesn't interfere in any way). The second bug
is actually a redo of 07edfa0fc640f85f95507c3101e77accd7d2bf0d
where directory entries in the index that are not in the diff
were not being removed correctly. That fix remedied one case
but broke another.
When checking out with the GIT_CHECKOUT_REMOVE_UNTRACKED option
and there was an entire tree in the working directory and in the
index that is not in the baseline nor target commit, the tree was
correctly(?) removed from the working directory but was not
successfully removed from the index. This fixes that and adds a
test of the functionality.
For clone to work as expected, it should be using a SAFE_CREATE
checkout (i.e. create files that are missing, even if the target
tree matches the current HEAD).
Test a number of other cases, including intentionally forced
conflicts and deeper inspection that trees get created properly.
There is a still a bug in checkout because the first test here
(i.e. test_checkout_typechange__checkout_typechanges_safe) should
be able to pass with GIT_CHECKOUT_SAFE as a strategy, but it will
not because of some lingering submodule checkout issues.
Stash was sometimes obscuring the actual error code, replacing it
with a -1 when there was more descriptive value. This updates
stash to preserve the original error code more reliably along
with a variety of other error handling tweaks.
I believe this is an improvement, but arguably, preserving the
underlying error code may result in values that are harder to
interpret by the caller who does not understand the internals.
Discussion is welcome!
Previously a NULL oid was handled like an empty buffer and
returned a status empty string. This makes git_oid_tostr()
set the output buffer to the empty string instead.
Make checkout update entries in the index for all files that are
updated and/or removed, unless flag GIT_CHECKOUT_DONT_UPDATE_INDEX
is given. To do this, iterators were extended to allow a little
more introspection into the index being iterated over, etc.
This flips checkout back to be driven off the changes between
the baseline and the target trees. This reinstates the complex
code for tracking the contents of the working directory, but
overall, I think the resulting logic is easier to follow.
I've tried to map out the detailed behaviors of checkout and make
sure that we're handling the various cases correctly, along with
providing options to allow us to emulate "git checkout" and "git
checkout-index" with the various flags. I've thrown away flags
in the checkout API that seemed like clutter and added some new
ones. Also, I've converted the conflict callback to a general
notification callback so we can emulate "git checkout" output and
display "dirty" files.
As of this commit, the new behavior is not working 100% but some
of that is probably baked into tests that are not testing the
right thing. This is a decent snapshot point, I think, along the
way to getting the update done.
This adds a failure reporting function that is called by
cl_git_pass which captures the actual error return code and
the error message if available in the failure report.
* gen_pktline() in smart_protocol.c was skipping refspecs that deleted
refs that were not advertised by the server. The new behavior is to
send a delete command with an old-id of zero, which matches the behavior
of the official git client.
* Update test_network_push__delete() in reaction to above fix.
* Obviate messy logic that handles missing push_spec rrefs by canonicalizing
push_spec. After calculate_work(), loid, roid, and rref, are filled in with
exactly what is sent to the server
The diff constructor functions had some confusing names, where the
"old" side of the diff was coming after the "new" side. This
reverses the order in the function name to make it less confusing.
Specifically...
* git_diff_index_to_tree becomes git_diff_tree_to_index
* git_diff_workdir_to_index becomes git_diff_index_to_workdir
* git_diff_workdir_to_tree becomes git_diff_tree_to_workdir
The `git_iterator_reset` command has not been working in all cases
particularly when there is a start and end range. This fixes it
and adds tests for it, and also extends it with the ability to
update the start/end range strings when an iterator is reset.
This removes the need to explicitly pass the repo into iterators
where the repo is implied by the other parameters. This moves
the repo to be owned by the parent struct. Also, this has some
iterator related updates to the internal diff API to lay the
groundwork for checkout improvements.
Ahead-behind count is still a valid operation, even if the two
commits don't have a common merge-base. The old implementation was
buggy, so it returned ENOTFOUND. Fixed now.
There are many different broken filemodes in the wild so we need to
protect against them and give something useful up the chain. Don't
fail when reading a tree from the ODB but normalize the mode as best
we can.
As 664 is no longer a mode that we consider to be valid and gets
normalized to 644, we can stop accepting it in the treebuilder. The
library won't expose it to the user, so any invalid modes are a bug.
`revwalk.h:commit_lookup()` -> `git_revwalk__commit_lookup()`
and make `git_commit_list_parse()` do real error checking that
the item in the list is an actual commit object. Also fixed an
apparent typo in a test name.
Moved it into graph.{c,h} which i created for the new "graph"
functions namespace. Also adjusted the function prototype
to use `size_t` and `const git_oid *`.
There are many scattered functions that look into the contents of
buffers to do various text manipulations (such as escaping or
unescaping data, calculating text stats, guessing if content is
binary, etc). This groups all those functions together into a
new file and converts the code to use that.
This has two enhancements to existing functionality. The old
text stats function is significantly rewritten and the BOM
detection code was extended (although largely we can't deal with
anything other than a UTF8 BOM).
This fixes some missed places where we can apply const-ness to
various public APIs.
There are still some index and tree APIs that cannot take const
pointers because we sort our `git_vectors` lazily and so we can't
reliably bsearch the index and tree content without applying a
`git_vector_sort()` first.
This also fixes some missed places where size_t can be used and
where const can be applied to a couple internal functions.
This makes the diff functions that take callbacks both take
the payload parameter after the callback function pointers and
pass the payload as the last argument to the callback function
instead of the first. This should make them consistent with
other callbacks across the API.
Without this change, any failed assertion in the second (or a later) test
inside a test suite has a chance of double deleting memory, resulting in
a heap corruption. See #1096 for details.
This leaves alone the test cases where we "just" use cl_git_sandbox_init()
and cl_git_sandbox_cleanup(). These methods already take good care to not
double delete a repository.
Fixes#1096
The workdir iterator has always tried to ignore .git files, but
it turns out there were some bugs. This makes it more robust at
ignoring .git files.
This also makes iterators always check ".git" case insensitively
regardless of the properties of the system. This will make libgit2
skip ".GIT" and the like. This is different from core git, but on
systems with case insensitive but case preserving file systems,
allowing ".GIT" to be added is problematic.
A number of diff APIs and the `git_checkout_index` API take a
`git_repository` object an operate on the index. This updates
them to take a `git_index` pointer explicitly and only fall back
on the `git_repository` index if the index input is NULL. This
makes it easier to operate on a temporary index.
The index iterator could previously only be created from a repo
object, but this allows creating an iterator from a `git_index`
object instead (while keeping, though renaming, the old function).
The reset hard tests had hardcoded expected file content and was
not correctly compensating for CRLF filtering when a file needed
to be reverted by the reset hard. This fixes that.
The existing p_lstat implementation on win32 is not quite POSIX
compliant when setting errno to ENOTDIR. This adds an option to
make is be compliant so that code (such as checkout) that cares
to have separate behavior for ENOTDIR can use it portably.
This also contains a couple of other minor cleanups in the
posix_w32.c implementations to avoid unnecessary work.
Returning NULL for the string when we haven't signaled an error
condition is counter-intuitive and causes unnecessary edge
cases. Return an empty string when asking for a string value for a
configuration variable such as '[section] var' to avoid these edge
cases.
If the distinction between no value and an empty value is needed, this
can be retrieved from the entry directly. As a side-effect, this
change stops the int parsing functions from segfaulting on such a
variable.
The `git_reset` API with the HARD option is still slightly
broken, but this test now does exercise the ability of the
command to revert modified files.
This is a major reworking of checkout strategy options. The
checkout code is now sensitive to the contents of the HEAD tree
and the new options allow you to update the working tree so that
it will match the index content only when it previously matched
the contents of the HEAD. This allows you to, for example, to
distinguish between removing files that are in the HEAD but not
in the index, vs just removing all untracked files.
Because of various corner cases that arise, etc., this required
some additional capabilities in rmdir and other utility functions.
This includes the beginnings of an implementation of code to read
a partial tree into the index based on a pathspec, but that is
not enabled because of the possibility of creating conflicting
index entries.
So, @nulltoken created a failing test case for checkout that
proved to be particularly daunting. If checkout is given only
a very limited strategy mask (e.g. just GIT_CHECKOUT_CREATE_MISSING)
then it is possible for typechange/rename modifications to leave it
unable to complete the request. That's okay, but the existing code
did not have enough information not to generate an error (at least
for tree/blob conflicts).
This led me to a significant reorganization of the code to handle
the failing case, but it has three benefits:
1. The test case is handled correctly (I think)
2. The new code should actually be much faster than the old code
since I decided to make checkout aware of diff list internals.
3. The progress value accuracy is hugely increased since I added
a fourth pass which calculates exactly what work needs to be
done before doing anything.
* Rework GIT_DIRREMOVAL values to GIT_RMDIR flags, allowing
combinations of flags
* Add GIT_RMDIR_EMPTY_PARENTS flag to remove parent dirs that
are left empty after removal
* Add GIT_MKDIR_VERIFY_DIR to give an error if item is a file,
not a dir (previously an EEXISTS error was ignored, even for
files) and enable this flag for git_futils_mkpath2file call
* Improve accuracy of error messages from git_futils_mkdir
Apparently git_remote_list() includes even remotes for which git_remote_load() would fail. Sorry @nulltoken, false alarm.
This reverts commit f358ec143c.
This fix makes libgit2 capable of parsing annotated tag objects that lack
the optional message/description field.
Previously, libgit2 treated this field as mandatory and raised a tag_error on
such tags. However, the message field is optional.
An example of such a tag is refs/tags/v2.6.16.31-rc1 in Linux:
$ git cat-file tag refs/tags/v2.6.16.31-rc1
object afaa018cefb6af63befef1df7d8febaae904434f
type commit
tag v2.6.16.31-rc1
tagger Adrian Bunk <bunk@stusta.de> 1162716505 +0100
$
This improves docs in some of the public header files, cleans
up and improves some of the example code, and fixes a couple
of pedantic warnings in places.
This adds a new API that allows users to reload the config if the
file has changed on disk. A new config callback function to
refresh the config was added.
The modified time and file size are used to test if the file needs
to be reloaded (and are now stored in the disk backend object).
In writing tests, just using mtime was a problem / race, so I
wanted to check file size as well. To support that, I extended
`git_futils_readbuffer_updated` to optionally check file size in
addition to mtime, and I added a new function `git_filebuf_stats`
to fetch the mtime and size for an open filebuf (so that the
config could be easily refreshed after a write).
Lastly, I moved some similar file checking code for attributes
into filebuf. It is still only being used for attrs, but it
seems potentially reusable, so I thought I'd move it over.
This improves the naming for the rename related functionality
moving it to be called `git_diff_find_similar()` and renaming
all the associated constants, etc. to make more sense.
I also moved the new code (plus the existing `git_diff_merge`)
into a new file `diff_tform.c` where I can put new functions
related to manipulating git diff lists.
This also updates the implementation significantly from the
last revision fixing some ordering issues (where break-rewrite
needs to be handled prior to copy and rename detection) and
improving config option handling.
This implements the basis for diff rename and copy detection,
although it is based on simple SHA comparison right now instead
of using a matching algortihm. Just as `git_diff_merge` can be
used as a post-pass on diffs to emulate certain command line
behaviors, there is a new API `git_diff_detect` which will
update a diff list in-place, adjusting some deltas to RENAMED
or COPIED state (and also, eventually, splitting MODIFIED deltas
where the change is too large into DELETED/ADDED pairs).
This also adds a new test repo that will hold rename/copy/split
scenarios. Right now, it just has exact-match rename and copy,
but the tests are written to use tree diffs, so we should be able
to add new test scenarios easily without breaking tests.
Added `struct git_config_entry`: a git_config_entry contains the key, the value, and the config file level from which a config element was found.
Added `git_config_open_level`: build a single-level focused config object from a multi-level one.
We are now storing `git_config_entry`s in the khash of the config_file
git_index_read_tree() was exposing a parameter to provide the user with
a progress indicator. Unfortunately, due to the recursive nature of the
tree walk, the maximum number of items to process was unknown. Thus,
the indicator was only counting processed entries, without providing
any information how the number of remaining items.