When diff is scanning the working directory, if it finds a file
where it is not sure if the index entry matches the working dir,
it will recalculate the OID (which is pretty expensive). This
adds a new flag to diff so that if the OID calculation finds that
the file actually has not changed (i.e. just the modified time was
altered or such), then it will refresh the stat cache in the index
so that future calls to diff will not have to check the oid again.
When we think the stat cache in the index seems valid and the size
or mode of a file has definitely changed, then don't bother trying
to recalculate the OID of the workdir bits to confirm that it is
modified - just accept that it is modified.
This can result in files that show as modified with no actual diff,
but the behavior actually appears to match Git on the command line.
This also includes a minor optimization to not perform a submodule
lookup on the ".git" directory itself.
The current version of the commit creation and amend function are unsafe
to use when passing the update_ref parameter, as they do not check that
the reference at the moment of update points to what the user expects.
Make sure that we're moving history forward when we ask the library to
update the reference for us by checking that the first parent of the new
commit is the current value of the reference. We also make sure that the
ref we're updating hasn't moved between the read and the write.
Similarly, when amending a commit, make sure that the current tip of the
branch is the commit we're amending.
There is an interesting difference with core Git here, though.
Because libgit2 will do rename detection with the working directory,
in the last case where the HEAD and the working directory both
have the decomposed data and the index has the composed data, we
generate a single status record with two renames whereas Git will
generate one rename (head to index) and one untracked file.
The current FETCH_HEAD parsing code assumes that a quote must end the
branch name. Git however allows for quotes as part of a branch name,
which causes us to consider the FETCH_HEAD file as invalid.
Instead of searching for a single quote char, search for a quote char
followed by SP, which is not a valid part of a ref name.
When diff finds an untracked directory, it emulates Git behavior
by looking inside the directory to see if there are any untracked
items inside it. If there are only ignored items inside the dir,
then diff considers it ignored, even if there is no direct ignore
rule for it.
Checkout was not copying this behavior - when it found an untracked
directory, it just treated it as untracked. Unfortunately, when
combined with GIT_CHECKOUT_REMOVE_UNTRACKED, this made is seem that
checkout (and stash, which uses checkout) was removing ignored
items when you had only asked it to remove untracked ones.
This commit moves the logic for advancing past an untracked dir
while scanning for non-ignored items into an iterator helper fn,
and uses that for both diff and checkout.
To emulate git, stash should not remove untracked git repositories
inside the parent repo, and checkout's REMOVE_UNTRACKED should
also skip over these items.
`git stash` actually prints a warning message for these items.
That should be possible with a checkout notify callback if you
wanted to, although it would require a bit of extra logic as things
are at the moment.
This takes the `--stat` and related example options in the example
diff.c program and converts them to use the `git_diff_get_stats`
API which nicely formats stats for you.
I went to add bar-graph scaling to the stats formatter and noticed
that the `git_diff_stats` structure was holding on to all of the
`git_patch` objects. Unfortunately, each of these objects keeps
the full text of the diff in memory, so this is very expensive. I
ended up modifying `git_diff_stats` to keep just the data that it
needs to keep and allowed it to release the patches. Then, I added
width scaling to the output on top of that.
In making the diff example program match 'git diff' output, I ended
up removing an newline from the sumamry output which I then had to
compensate for in the email formatting to match the expectations.
Lastly, I went through and refactored the tests to use a couple of
helper functions and reduce the overall amount of code there.
Ignore patterns that ended with a trailing '/*' were still needing
to match against another actual '/' character in the full path.
This is not the same behavior as core Git.
Instead, we strip a trailing '/*' off of any patterns that were
matching and just take it to imply the FNM_LEADING_DIR behavior.
There was a latent bug where files that use macro definitions
could be parsed before the macro definitions were loaded. Because
of attribute file caching, preloading files that are going to be
used doesn't add a significant amount of overhead, so let's always
preload any files that could contain macros before we assemble the
actual vector of files to scan for attributes.
When traversing the directory structure, the iterator pushes and
pops ignore files using a vector. Some directories don't have
ignore files, so it uses a path comparison to decide when it is
right to actually pop the last ignore file. This was only
comparing directory suffixes, though, so a subdirectory with the
same name as a parent could result in the parent's .gitignore
being popped off the list ignores too early. This changes the
logic to compare the entire relative path of the ignore file.
When we delete an entry, we also want to refresh the configuration to
catch any changes that happened externally.
This allows us to simplify the logic, as we no longer need to delete
these variables internally. The whole state will be refreshed and the
deleted entries won't be there.
With the isolation of complex reads, we can now try to refresh the
on-disk file before reading a value from it.
This changes the semantics a bit, as before we could be sure that a
string we got from the configuration was valid until we wrote or
refreshed. This is no longer the case, as a read can also invalidate the
pointer.
In order to have consistent views of the config files for remotes,
submodules et al. and a configuration that represents what is currently
stored on-disk, we need a way to provide a view of the configuration
that does not change.
The goal here is to provide the snapshotting part by creating a
read-only copy of the state of the configuration at a particular point
in time, which does not change when a repository's main config changes.
On set, we set/add the value written to the config's internal values,
but we do not refresh old values.
Document this in a test in preparation for the refresh changes.
The checks to see if files were out of date in the attibute cache
was wrong because the cache-breaker data wasn't getting stored
correctly. Additionally, when the cache-breaker triggered, the
old file data was being leaked.
In the threading tests, I was still seeing a race condition where
the same item could end up being inserted multiple times into the
index. Preserving the sorted-ness of the index outside of the
`index_insert` call fixes the issue.
This is a big refactoring of the attribute file cache to be a bit
simpler which in turn makes it easier to enforce a lock around any
updates to the cache so that it can be used in a threaded env.
Tons of changes to the attributes and ignores code.
This makes the lock management on the index a little bit broader,
having a number of routines hold the lock across looking up the
item to be modified and actually making the modification. Still
not true thread safety, but more pure index modifications are now
safe which allows the simple cases (such as starting up a diff
while index modifications are underway) safe enough to get the
snapshot without hitting allocation problems.
As part of this, I simplified the allocation of index entries to
use a flex array and just put the path at the end of the index
entry. This makes every entry self-contained and makes it a
little easier to feel sure that pointers to strings aren't
being accidentally copied and freed while other references are
still being held.
This adds a basic test of doing simultaneous diffs on multiple
threads and adds basic locking for the attr file cache because
that was the immediate problem that arose from these tests.
This makes the index iterator honor the GIT_ITERATOR_IGNORE_CASE
and GIT_ITERATOR_DONT_IGNORE_CASE flags without modifying the
index data itself. To take advantage of this, I had to export a
number of the internal index entry comparison functions. I also
wrote some new tests to exercise the capability.
The usefulness of these helpers came up for me while debugging
some of the iterator changes that I was making, so since they
have also been requested (albeit indirectly) I thought I'd include
them.
git_branch_t is an enum so requesting GIT_BRANCH_LOCAL | GIT_BRANCH_REMOTE is not possible as it is not a member of the enum (at least VS2013 C++ complains about it).
This fixes a regression introduced in commit a667ca8298 (PR #1946).
Signed-off-by: Sven Strickroth <email@cs-ware.de>
Ignore rules with slashes in them are matched using FNM_PATHNAME
and use the path to the .gitignore file from the root of the
repository along with the path fragment (including slashes) in
the ignore file itself. Unfortunately, the relative path to the
.gitignore file was being applied to the global core.excludesfile
if that was also named ".gitignore".
This fixes that with more precise matching and includes test for
ignore rules with leading slashes (which were the primary example
of this being broken in the real world).
This also backports an improvement to the file context logic from
the threadsafe-iterators branch where we don't rely on mutating
the key of the attribute file name to generate the context path.
There were a couple bugs in popping ignore files during iteration
that could result in incorrect decisions be made and thus ignore
files below the root either not being loaded correctly or not
being popped at the right time.
One bug was an off-by-one in comparing the path of the gitignore
file with the path being exited during iteration.
The second bug was not correctly truncating the path being tracked
during traversal if there were no ignores on the list (i.e. when
you have no .gitignore at the root, but do have some in contained
directories).
This updates how libgit2 treats submodule-like directories that
actually have tracked content inside of them. This is a strange
corner case, but it seems that many people have abortive submodule
setups and then just went ahead and added the files into the
parent repository. In this case, we should just treat the
submodule as if it was a normal directory.
Libgit2 will still try to skip over real submodules and contained
repositories that do not have tracked files inside them, but this
adds some new handling for cases where the apparently submodule
data is in conflict with the actual list of tracked files.
The internal buffer in the `git_path_iconv_t` structure was not
being reset before the calls to `iconv` were made to convert data,
so if there were multiple decomposed Unicode paths in a single
directory, paths after the first one were being appended to the
first instead of treated as independent data.
The base for the relative urls is determined as follows, with descending
priority:
- remote url of HEAD's remote tracking branch
- remote "origin"
- workdir
This follows git.git behaviour
Cloning from an empty repo must set master's upstream to origin's
master, even if neither of them exist.
Fetching from a non-empty origin must then mark the master branch
for-merge. This currently fails.
One blame test replies on being run from within the libgit2
repository to leverage having a longer history to play with, but
some bundled versions of libgit2 don't have the whole libgit2
history. This just skips that test if the repository can't be
opened.
There was a little bug where the submodule cache thought that the
index date was out of date even when it wasn't that was resulting
in some extra scans of index data even when not needed.
Mostly this commit adds a bunch of new tests including adding and
removing submodules in the index and in the HEAD and seeing if we
can automatically pick them up when refreshing.
With the new submodule cache validity checks, we generally don't
need to call git_submodule_reload_all to have up-to-date submodule
data. Some tests are still calling it where I want to actually
test that it can be called safely and doesn't break anything, but
mostly it is not needed.
This also expands some of the existing submodule tests to cover
some variants on the behavior that was already being tested.
Wrote tests that try adding, removing, and updating the name of
submodules which showed a number of problems with how we account
for changes when incrementally updating the submodule info. Most
of these issues didn't exist before because reloading would always
blow away the old submodule data.
There are a few places where we need to join three strings to
assemble a path. This adds a simple join3 function to avoid the
comparatively expensive join_n (which calls strlen on each string
twice).
The order in this function is the opposite to what
create_with_fetchspec() has, so change this one, as url-then-refspec is
what git does.
As we need to break compilation and the swap doesn't do that, let's take
this opportunity to rename in-memory remotes to anonymous as that's
really what sets them apart.
When a submodule was inserted with a different path and name, the
return value from khash greater than zero was allowed to propagate
back out to the caller when it should really be zeroed. This led
to a possible crash when reloading submodules if that was the
first time that submodule data was loaded.
The reload_all call could end up dereferencing a NULL pointer if
there was an error while attempting to load the submodules config
data (i.e. invalid content in the gitmodules file). This fixes it.
When a directory containing a .git directory (or even just a plain
gitlink) was found, libgit2 was going out of its way to treat it
specially. This seemed like it was necessary because the diff
code was not originally emulating Git's behavior for untracked
directories correctly (i.e. scanning for ignored vs untracked items
inside). Now that libgit2 diff mimics Git's untracked directory
behavior, the special handling for contained Git repos is actually
incorrect and this commit rips it out.
`git_submodule` objects were already refcounted internally in case
the submodule name was different from the path at which it was
stored. This makes that refcounting externally used as well, so
`git_submodule_lookup` and `git_submodule_add_setup` return an
object that requires a `git_submodule_free` when done.
The reflog append function was overzealous in its checking. When passed
an old and new ids, it should not do any checking, but just serialize
the data to a reflog entry.
The existing ones lack checking zeroed ids when switching back from an
unborn branch as well as what happens when detaching.
The reflog appending function mistakenly wrote zeros when dealing with a
detached HEAD. This explicitly checks for those situations and fixes
them.
When we update the current branch, we must also append to HEAD's reflog
to keep them in sync.
This is a bit of a hack, but as git.git says, it covers 100% of
default cases.
This is not something anybody would ever do; removing HEAD makes the
.git/ directory no longer be a repository, so we wouldn't be expected to
handle such a situation.
If the pqueue comparison fn returned just 0 or 1 (think "a<b")
then the sort order of returned items could be wrong because there
was a "< 0" that really needed to be "<= 0". Yikes!!!
The git_odb_exists_prefix API was not dealing correctly when a
later backend returned GIT_ENOTFOUND even if an earlier backend
had found the object.
Additionally, the unit tests were not properly exercising the API
and had a couple mistakes in checking the results.
Lastly, since the backends are not expected to behavior correctly
unless all bytes of the short id are zero except for the prefix,
this makes the ODB prefix APIs explicitly clear out the extra
bytes so the user doesn't have to be as careful.
This finds a short id string that will unambiguously select the
given object, starting with the core.abbrev length (usually 7)
and growing until it is no longer ambiguous.
This adds `git_diff_buffers` and `git_patch_from_buffers`. This
also includes a bunch of internal refactoring to increase the
shared code between these functions and the blob-to-blob and
blob-to-buffer APIs, as well as some higher level assert helpers
in the tests to also remove redundancy.
* Make GIT_INLINE an internal definition so it cannot be used in
public headers
* Fix language in CONTRIBUTING
* Make index caps API use signed instead of unsigned values
This adds an API to amend an existing commit, basically a shorthand
for creating a new commit filling in missing parameters from the
values of an existing commit. As part of this, I also added a new
"sys" API to create a commit using a callback to get the parents.
This allowed me to rewrite all the other commit creation APIs so
that temporary allocations are no longer needed.
This fixes a number of warnings with the Windows 64-bit build
including a test failure in test_repo_message__message where an
invalid pointer to a git_buf was being used.
This fixes a typo I made for setting the sorted flag on the index
after a reload. That typo didn't actually cause any test failures
so I'm also adding a test that explicitly checks that the index is
correctly sorted after a reload when ignoring case and when not.
This updates the git_pqueue to simply be a set of specialized
init/insert/pop functions on a git_vector.
To preserve the pqueue feature of having a fixed size heap, I
converted the "sorted" field in git_vectors to a more general
"flags" field so that pqueue could mix in it's own flag. This
had a bunch of ramifications because a number of places were
directly looking at the vector "sorted" field - I added a couple
new git_vector helpers (is_sorted, set_sorted) so the specific
representation of this information could be abstracted.
There were some confusing issues mixing up the number of bytes
written to the zstream output buffer with the number of bytes
consumed from the zstream input. This reorganizes the zstream
API and makes it easier to deflate an arbitrarily large input
while still using a fixed size output.
This removes the fetchRecurse compiler warnings and makes the
behavior match the other submodule options (i.e. the in-memory
setting can be reset to the on-disk value).
Writing a sample Javascript driver pointed out some extra
whitespace handling that needed to be done in the diff driver.
This adds some tests with some sample javascript code that I
pulled off of GitHub just to see what would happen. Also, to
clean up the userdiff test data, I did a "git gc" and packed
up the test objects.
This moves the expected and actual test data along with the source
data for the userdiff tests into the tests/resources/userdiff test
repo and updates the test to use that.
Reorganize the builtin driver table slightly so that core Git
builtin definitions can be imported verbatim. Then take a few of
the core Git drivers and pull them in.
This also creates a test of diffs with the builtin HTML driver
which led to some small error handling fixes in the driver
selection logic.
Don't try to determine whether the system supports file modes
when putting the tree data in the index during checkout. The tree's
mode is canonical and did not come from stat(2) in the first place.
It's hard or even impossible to correctly free the string buffer
allocated by git_patch_to_str in some circumstances. Drop the function
so people have to use git_patch_to_buf instead - git_buf has a dedicated
destructor.
The "merge none" (don't automerge) flag was only to aide in
merge trivial tests. We can easily determine whether merge
trivial resulted in a trivial merge or an automerge by examining
the REUC after automerge has completed.
The default merge_file level was XDL_MERGE_MINIMAL, which will
produce conflicts where there should not be in the case where
both sides were changed identically. Change the defaults to be
more aggressive (XDL_MERGE_ZEALOUS) which will more aggressively
compress non-conflicts. This matches git.git's defaults.
Increase testing around reverting a previously reverted commit to
illustrate this problem.
Extend the "unmodified" submodule workdir test to include
uninitialized submodules, to prevent reporting submodules as
modified when they're not in the workdir at all.
Any well-behaved program should write a descriptive message to the
reflog whenever it updates a reference. Let's make this more prominent
by removing the version without the reflog parameters.
Ok, scrap the previous commit. This is the right overflow check that
takes care of 64 bit overflow **and** 32-bit overflow, which needs to be
considered because the pool malloc can only allocate 32-bit elements in
one go.
This wasn't being tested and since it has a callback, I fixed it
even though the return value of this callback is not treated like
any of the other callbacks in the API.
This adds tests that try canceling an indexer operation from
within the progress callback.
After writing the tests, I wanted to run this under valgrind and
had a number of errors in that situation because mmap wasn't
working. I added a CMake option to force emulation of mmap and
consolidated the Amiga-specific code into that new place (so we
don't actually need separate Amiga code now, just have to turn on
-DNO_MMAP).
Additionally, I made the indexer code propagate error codes more
reliably than it used to.
Clone callbacks can return non-zero values to cancel the clone.
This adds some tests to verify that this actually works and updates
the documentation to be clearer that this can happen and that the
return value will be propagated back by the clone function.
The checkout notify callback behavior on non-zero return values
was not being tested. This adds tests, fixes a bug with positive
values, and clarifies the documentation to make it clear that the
checkout can be canceled via this mechanism.
The callback to supply data chunks could return a negative value
to stop creation of the blob, but we were neither using GIT_EUSER
nor propagating the return value. This makes things use the new
behavior of returning the negative value back to the user.
This changes the behavior of callbacks so that the callback error
code is not converted into GIT_EUSER and instead we propagate the
return value through to the caller. Instead of using the
giterr_capture and giterr_restore functions, we now rely on all
functions to pass back the return value from a callback.
To avoid having a return value with no error message, the user
can call the public giterr_set_str or some such function to set
an error message. There is a new helper 'giterr_set_callback'
that functions can invoke after making a callback which ensures
that some error message was set in case the callback did not set
one.
In places where the sign of the callback return value is
meaningful (e.g. positive to skip, negative to abort), only the
negative values are returned back to the caller, obviously, since
the other values allow for continuing the loop.
The hardest parts of this were in the checkout code where positive
return values were overloaded as meaningful values for checkout.
I fixed this by adding an output parameter to many of the internal
checkout functions and removing the overload. This added some
code, but it is probably a better implementation.
There is some funkiness in the network code where user provided
callbacks could be returning a positive or a negative value and
we want to rely on that to cancel the loop. There are still a
couple places where an user error might get turned into GIT_EUSER
there, I think, though none exercised by the tests.
This continues auditing all the places where GIT_EUSER is being
returned and making sure to clear any existing error using the
new giterr_user_cancel helper. As a result, places that relied
on intercepting GIT_EUSER but having the old error preserved also
needed to be cleaned up to correctly stash and then retrieve the
actual error.
Additionally, as I encountered places where error codes were not
being propagated correctly, I tried to fix them up. A number of
those fixes are included in the this commit as well.
This adds giterr_user_cancel to return GIT_EUSER and clear any
error message that is sitting around. As a result of using that
in places, we need to be more thorough with capturing errors that
happen inside a callback when used internally. To help with that,
this also adds giterr_capture and giterr_restore so that when we
internally use a foreach-type function that clears errors and
converts them to GIT_EUSER, it is easier to restore not just the
return value, but the actual error message text.
The frontend used to look at the file directly, but that's obviously not
the right thing to do. Expose it on the backend and use that function
instead.
git-core only writes to the reflogs of HEAD, refs/heads/ and,
refs/notes/ or if there is already a reflog in place. Adjust our code to
follow these semantics.
When doing copy detection, it is often necessary to include
UNMODIFIED records in the git_diff so they are available as source
records for GIT_DIFF_FIND_COPIES_FROM_UNMODIFIED. Yet in the final
diff, often you will not want to have these UNMODIFIED records.
This adds a flag which marks these UNMODIFIED records for deletion
from the diff list so they will be removed after the rename detect
phase is over.
When FIND_COPIES is used in combination with BREAK_REWRITES for
rename detection, there was a bug where the split MODIFIED delta
was only used as a target for RENAME records and not for COPIED
records. This fixes that, converting the split into a pair of
DELETED and COPIED deltas when that circumstance arises.
Whenever a reference is created or updated, we need to write to the
reflog regardless of whether the user gave us a message, so we shouldn't
leave that to the ref frontend, but integrate it into the backend.
This also eliminates the race between ref update and writing to the
reflog, as we protect the reflog with the ref lock.
As an additional benefit, this reflog append on the backend happens by
appending to the file instead of parsing and rewriting it.
This renamed `git_khash_str` to `git_strmap`, `git_hash_oid` to
`git_oidmap`, and deletes `git_hashtable` from the tree, plus
adds unit tests for `git_strmap`.
There was an error in the tree iterator where it would
delete two tree levels instead of just one when popping
up a tree level. Unfortunately the test data for the
tree iterator did not have any deep trees with subtrees
in the middle of the tree items, so this problem went
unnoticed. This contains the 1-line fix plus new test
data and tests that reveal the issue.
This gives `git_status_foreach()` back its old behavior of
emulating the "--untracked=all" behavior of git. You can
get any of the various --untracked options by passing flags
to `git_status_foreach_ext()` but the basic version will
keep the behavior it has always had.
This "fixes" the broken t18 status tests to accurately reflect
the new behavior for "created" untracked subdirectories. See
discussion in the PR for more details.
This also contains the submodules unit test that I forgot to
git add, and ports most of the t18-status.c tests to clar (still
missing a couple of the git_status_file() single file tests).
This is a work in progress. This adds two new sets of tests,
the issue_592 tests from @nulltoken's pull request #601 and
some new tests for submodules. The submodule tests still have
issues where the status is not reported correctly. That needs
to be fixed before merge.
Created a copy of tests/resources/testrepo.git that is compatible
with the Clar sandboxing helpers.
Restructured commit test suites to use Clar sandbox helpers.
Now using typed data arrays rather than lots of macros to define test
cases.
This also includes droping `git_buf_lasterror` because it makes no sense
in the new system. Note that in most of the places were it has been
dropped, the code needs cleanup. I.e. GIT_ENOMEM is going away, so
instead it should return a generic `-1` and obviously not throw
anything.
This reverts the changes to the GIT_STATUS constants and adds a
new enumeration to describe the type of change in a git_diff_delta.
I don't love this solution, but it should prevent strange errors
from occurring for now. Eventually, I would like to unify the
various status constants, but it needs a larger plan and I just
wanted to eliminate this breakage quickly.
It turns out that commit 31e9cfc4cbcaf1b38cdd3dbe3282a8f57e5366a5
did not fix the GIT_USUSED behavior on all platforms. This commit
walks through and really cleans things up more thoroughly, getting
rid of the unnecessary stuff.
To remove the use of some GIT_UNUSED, I ended up adding a couple
of new iterators for hashtables that allow you to iterator just
over keys or just over values.
In making this change, I found a bug in the clar tests (where we
were doing *count++ but meant to do (*count)++ to increment the
value). I fixed that but then found the test failing because it
was not really using an empty repo. So, I took some of the code
that I wrote for iterator testing and moved it to clar_helpers.c,
then made use of that to make it easier to open fixtures on a
per test basis even within a single test file.
This is a major reorganization of the diff code. This changes
the diff functions to use the iterators for traversing the
content. This allowed a lot of code to be simplified. Also,
this moved the functions relating to outputting a diff into a
new file (diff_output.c).
This includes a number of other changes - adding utility
functions, extending iterators, etc. plus more tests for the
diff code. This also takes the example diff.c program much
further in terms of emulating git-diff command line options.
Once I added tests for the whitespace handling options of
diff, I realized that there were some bugs. This fixes
those and adds the new tests into the test suite.
This fixes several bugs, updates tests and docs, eliminates the
FILE* assumption in favor of printing callbacks for the diff patch
formatter helpers, and adds a "diff" example function that can
perform a diff from the command line.
This gets the basic plumbing in place for git_diff_blob.
There is a known issue where additional parameters like
the number of lines of context to display on the diff
are not working correctly (which leads one of the new
unit tests to fail).
Since casting to void works to eliminate errors with unused
parameters on all platforms, avoid the various special cases.
Over time, it will make sense to eliminate the GIT_UNUSED
macro completely and just have GIT_UNUSED_ARG.
This makes so much sense that I can't believe it hasn't been done
before. Kill the old `git_fbuffer` and read files straight into
`git_buf` objects.
Also: In order to fully support 4GB files in 32-bit systems, the
`git_buf` implementation has been changed from using `ssize_t` for
storage and storing negative values on allocation failure, to using
`size_t` and changing the buffer pointer to a magical pointer on
allocation failure.
Hopefully this won't break anything.
Add unit tests to confirm ignore directory pattern matches and
to confirm that ignore and attribute files are loaded properly
into the attribute file cache.
This takes all of the functions that look up simple data about
paths (such as `git_futils_isdir`) and moves them over to path.h
(becoming `git_path_isdir`). This leaves fileops.h just with
functions that actually manipulate the filesystem or look at
the file contents in some way.
As part of this, the dir.h header which is really just for win32
support was moved into win32 (with some minor changes).
Per issue #533, the handling of relative paths in attribute
and ignore files was not right. Fixed this by pre-joining
the relative path of the attribute/ignore file onto the match
string when a full path match is required.
Unfortunately, fixing this required a bit more code than I
would have liked because I had to juggle things around so that
the fnmatch parser would have sufficient information to prepend
the relative path when it was needed.
This fixes issue 532 that attributes (and gitignores) could not
be checked for files that don't exist. It should be possible to
query such things regardless of the existence of the file.
Adds support for .gitignore files to git_status_foreach() and
git_status_file(). This includes refactoring the gitattributes
code to share logic where possible. The GIT_STATUS_IGNORED flag
will now be passed in for files that are ignored (provided they
are not already in the index or the head of repo).
This updates to implementation of gitattribute macros to be much more
similar to core git (albeit not 100%) and to handle expansion of
macros within macros, etc. It also cleans up the refcounting usage
with macros to be much cleaner.
Also, this adds a new vector function `git_vector_insert_sorted()`
which allows you to maintain a sorted list as you go. In order to
write that function, this changes the function `git__bsearch()` to
take a somewhat different set of parameters, although the core
functionality is still the same.
Add support for git attribute macro definitions. Also, add
support for cache flush API to clear the attribute file content
cache when needed.
Additionally, improved the handling of global and system files,
making common utility functions in fileops and converting config
and attr to both use the common functions.
Adds a bunch more tests and fixed some memory leaks. Note that
adding macros required me to use refcounted attribute assignment
definitions, which complicated, but probably improved memory usage.
This adds APIs for querying git attributes. In addition to
the new API in include/git2/attr.h, most of the action is in
src/attr_file.[hc] which contains utilities for dealing with
a single attributes file, and src/attr.[hc] which contains
the implementation of the APIs that merge all applicable
attributes files.
It was not safe for git_buf_joinpath to be used with a pointer
into the buf itself because a reallocation could invalidate
the input parameter that pointed into the buffer. This patch
makes it safe to self join, at least for the leading input to
the join, which is the common "append" case for self joins.
Also added unit tests to explicitly cover this case.
This should actually fix#511
This converts virtually all of the places that allocate GIT_PATH_MAX
buffers on the stack for manipulating paths to use git_buf objects
instead. The patch is pretty careful not to touch the public API
for libgit2, so there are a few places that still use GIT_PATH_MAX.
This extends and changes some details of the git_buf implementation
to add a couple of extra functions and to make error handling easier.
This includes serious alterations to all the path.c functions, and
several of the fileops.c ones, too. Also, there are a number of new
functions that parallel existing ones except that use a git_buf
instead of a stack-based buffer (such as git_config_find_global_r
that exists alongsize git_config_find_global).
This also modifies the win32 version of p_realpath to allocate whatever
buffer size is needed to accommodate the realpath instead of hardcoding
a GIT_PATH_MAX limit, but that change needs to be tested still.
The ownership semantics have been changed all over the library to be
consistent. There are no more "borrowed" or duplicated references.
Main changes:
- `git_repository_open2` and `3` have been dropped.
- Added setters and getters to hotswap all the repository owned
objects:
`git_repository_index`
`git_repository_set_index`
`git_repository_odb`
`git_repository_set_odb`
`git_repository_config`
`git_repository_set_config`
`git_repository_workdir`
`git_repository_set_workdir`
Now working directories/index files/ODBs and so on can be
hot-swapped after creating a repository and between operations.
- All these objects now have proper ownership semantics with
refcounting: they all require freeing after they are no longer
needed (the repository always keeps its internal reference).
- Repository open and initialization has been updated to keep in
mind the configuration files. Bare repositories are now always
detected, and a default config file is created on init.
- All the tests affected by these changes have been dropped from the
old test suite and ported to the new one.
Update all stack allocations of git_filebuf to use GIT_FILEBUF_INIT
and make git_filebuf_open and git_filebuf_cleanup safe to be called
multiple times on the same buffer.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
This new version of the references code is significantly faster and
hopefully easier to read.
External API stays the same. A new method `git_reference_reload()` has
been added to force updating a memory reference from disk. In-memory
references are no longer updated automagically -- this was killing us.
If a reference is deleted externally and the user doesn't reload the
memory object, nothing critical happens: any functions using that
reference should fail gracefully (e.g. deletion, renaming, and so on).
All generated references from the API are read only and must be free'd
by the user. There is no reference counting and no traces of generated
references are kept in the library.
There is no longer an internal representation for references. There is
only one reference struct `git_reference`, and symbolic/oid targets are
stored inside an union.
Packfile references are stored using an optimized struct with flex array
for reference names. This should significantly reduce the memory cost of
loading the packfile from disk.
Currently libgit2 shares pointers to its internal reference cache with
the user. This leads to several problems like invalidation of reference
pointers when reordering the cache or manipulation of the cache from
user side.
Give each user its own git_reference instead of leaking the internal
representation (struct reference).
Add the following new API functions:
* git_reference_free
* git_reference_is_packed
Signed-off-by: schu <schu-github@schulog.org>
The functions loose_object_mode and loose_object_dir_mode call stat
inside an assert statement which isn't evaluated when compiling in
Release mode (NDEBUG) and leads to failing tests. Replace it.
Signed-off-by: schu <schu-github@schulog.org>
This makes it slightly easier to debug test failures when one test
opens a repo, has a failure, and doesn't get a chance to close it for
the next test. Now, instead of getting no feedback, we at least see
test failure information.
The following files now have 0444 permissions:
- loose objects
- pack indexes
- pack files
- packs downloaded by fetch
- packs downloaded by the HTTP transport
And the following files now have 0666 permissions:
- config files
- repository indexes
- reflogs
- refs
This brings libgit2 more in line with Git.
Note that git_filebuf_commit() and git_filebuf_commit_at() have both
gained a new mode parameter.
The latter change fixes an important issue where filebufs created with
GIT_FILEBUF_TEMPORARY received 0600 permissions (due to mkstemp(3)
usage). Now we chmod() the file before renaming it into place.
Tests have been added to confirm that new commit, tag, and tree
objects are created with the right permissions. I don't have access to
Windows, so for now I've guarded the tests with "#ifndef GIT_WIN32".
To further match how Git behaves, this change makes most of the
directories libgit2 creates in a git repo have a file mode of
0777. Specifically:
- Intermediate directories created with git_futils_mkpath2file() have
0777 permissions. This affects odb_loose, reflog, and refs.
- The top level folder for bare repos is created with 0777
permissions.
- The top level folder for non-bare repos is created with 0755
permissions.
- /objects/info/, /objects/pack/, /refs/heads/, and /refs/tags/ are
created with 0777 permissions.
Additionally, the following changes have been made:
- fileops functions that create intermediate directories have grown a
new dirmode parameter. The only exception to this is filebuf's
lock_file(), which unconditionally creates intermediate directories
with 0777 permissions when GIT_FILEBUF_FORCE is set.
- The test runner now sets the umask to 0 before running any
tests. This ensurses all file mode checks are consistent across
systems.
- t09-tree.c now does a directory permissions check. I've avoided
adding this check to other tests that might reuse existing
directories from the prefabricated test repos. Because they're
checked into the repo, they have 0755 permissions.
- Other assorted directories created by tests have 0777 permissions.
When trying to find the end of an email, instead of starting at the
beginning of the signature, we start at the end of the name (after the
first '<').
This brings libgit2 more in line with Git's behavior when reading out
existing signatures.
However, note that Git does not allow names like these through the
usual porcelain; instead, it silently strips any '>' characters it
sees.
The v0.99 tag in the Git repo triggers this behavior:
http://git.kernel.org/?p=git/git.git;a=tag;h=d6602ec5194c87b0fc87103ca4d67251c76f233a
Ideally, we'd allow the tag to be instantiated even though the tagger
field is missing, but this at the very least prevents libgit2 from
crashing.
To test this bug, a new repository has been added based on the test
branch in testrepo.git. It contains a "e90810b" tag that looks like
this:
object e90810b8df3e80c413d903f631643c716887138d
type commit
tag e90810b
This is a very simple tag.
This ensures commit->message is always non-NULL, even if the commit
message is empty or consists of only a newline.
One such commit can be found in the wild in the jQuery repository:
25b424134f
The documentation is a bit misleading. The subsection name is always
case-sensitive, but with a [section.subsection] header, the subsection
is transformed to lowercase when the configuration is parsed.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
git_repository_config wants to take the global and system paths again
so that one can be explicit if needed.
The git_repository_config_autoload function is provided for the cases
when it's good enough for the library to guess where those files are
located.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
git_signature_new() and git_signature_now() currently don't return error
codes. Change the API to return error codes and not pointers to let the
user handle errors properly.
Signed-off-by: schu <schu-github@schulog.org>
When you open an index with git_index_open, the file is read before
the function returns. Thus, calling git_index_read after that is
useless.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
/home/kas/git/public/libgit2/tests/t00-core.c: In function ‘test_cmp’:
/home/kas/git/public/libgit2/tests/t00-core.c:78:10: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
/home/kas/git/public/libgit2/tests/t00-core.c:78:22: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
/home/kas/git/public/libgit2/tests/t07-hashtable.c: In function ‘hash_func’:
/home/kas/git/public/libgit2/tests/t07-hashtable.c:42:7: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
/home/kas/git/public/libgit2/tests/t08-tag.c: In function ‘_gittest__write0’:
/home/kas/git/public/libgit2/tests/t08-tag.c:141:21: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
/home/kas/git/public/libgit2/tests/t08-tag.c: In function ‘_gittest__write2’:
/home/kas/git/public/libgit2/tests/t08-tag.c:192:21: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
/home/kas/git/public/libgit2/tests/t08-tag.c: In function ‘_gittest__write3’:
/home/kas/git/public/libgit2/tests/t08-tag.c:227:21: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
/home/kas/git/public/libgit2/tests/t04-commit.c: In function ‘_gittest__write0’:
/home/kas/git/public/libgit2/tests/t04-commit.c:650:21: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
/home/kas/git/public/libgit2/tests/t04-commit.c:651:21: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
/home/kas/git/public/libgit2/tests/t04-commit.c: In function ‘_gittest__root0’:
/home/kas/git/public/libgit2/tests/t04-commit.c:723:21: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
/home/kas/git/public/libgit2/tests/t04-commit.c:724:21: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
/home/kas/git/public/libgit2/tests/t12-repo.c: In function ‘write_file’:
/home/kas/git/public/libgit2/tests/t12-repo.c:360:24: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
The `hashfile` function has been moved to ODB, next to `git_odb_hash`.
Global state has been removed from the dirent call in `status.c`,
because global state is killing the rainforest and causing global
warming.
Add git_status_hashfile() to get blob's object id for a file without adding
it to the object database or needing a repository at all.
This functionality is similar to `git hash-object` without '-w'.
The direct-writes commit left some (slow) internals methods that
were no longer needed. These have been removed.
Also, the Reflog code was using the old `git_signature__write`, so
it has been rewritten to use a normal buffer and the new `writebuf`
signature writer. It's now slightly simpler and faster.
DIRECT WRITES ARE BACK AND FASTER THAN EVER. The streaming writer to the
ODB was an overkill for the smaller objects like Commit and Tags; most
of the streaming logic was taking too long.
This commit makes Commits, Tags and Trees to be built-up in memory, and
then written to disk in 2 pushes (header + data), instead of streaming
everything.
This is *always* faster, even for big files (since the git_filebuf class
still does streaming writes when the memory cache overflows). This is
also a gazillion lines of code smaller, because we don't have to
precompute the final size of the object before starting the stream (this
was kind of defeating the point of streaming, anyway).
Blobs are still written with full streaming instead of loading them in
memory, since this is still the fastest way.
A new `git_buf` class has been added. It's missing some features, but
it'll get there.
So far libgit2 didn't support reference logs (reflog). Add a new
git_reflog_* API for basic reading and writing of reflogs:
* git_reflog_read
* git_reflog_write
* git_reflog_free
Signed-off-by: schu <schu-github@schulog.org>
Drop the GLibc implementation of Merge Sort and replace it with Timsort.
The algorithm has been tuned to work on arrays of pointers (void **),
so there's no longer a need to abstract the byte-width of each element
in the array.
All the comparison callbacks now take pointers-to-elements, not
pointers-to-pointers, so there's now one less level of dereferencing.
E.g.
int index_cmp(const void *a, const void *b)
{
- const git_index_entry *entry_a = *(const git_index_entry **)(a);
+ const git_index_entry *entry_a = (const git_index_entry *)(a);
The result is up to a 40% speed-up when sorting vectors. Memory usage
remains lineal.
A new `bsearch` implementation has been added, whose callback also
supplies pointer-to-elements, to uniform the Vector API again.
`git_futils_rmdir_r`: rename, clean up.
`git_reference_rename`: cleanup. Do not use 3x4096 buffers on the stack
or things will get ugly very fast. We can reuse the same buffer.
So far libgit2 didn't handle the following scenarios:
* Rename of reference m -> m/m
* Rename of reference n/n -> n
Fixed.
Since we don't write reflogs, we have to delete any old reflog for the
renamed reference. Otherwise git.git will possibly fail when it finds
invalid logs.
Reported-by: nulltoken <emeric.fermas@gmail.com>
Signed-off-by: schu <schu-github@schulog.org>
the old name succeeds, e.g. refs/heads/foo -> refs/heads/foo/bar
Reported-by: nulltoken <emeric.fermas@gmail.com>
Signed-off-by: schu <schu-github@schulog.org>
Removing a section variable doesn't remove its section
header. Overwrite the config10 file so there are no changes after the
test is run.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
The routine remove duplictes from the vector. Only the last added element
of elements with equal keys remains in the vector.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
git_signature__parse used to be very strict about what's a well-formed
signature. Add tests checking git_signature__parse can stick with
"unexpected" signatures (IOW no author name and / or no email, etc).
Signed-off-by: schu <schu-github@schulog.org>
The old `git_fileops_prettify_path` has been replaced with
`git_path_prettify`. This is a much simpler method that uses the OS's
`realpath` call to obtain the full path for directories and resolve
symlinks.
The `realpath` syscall is the original POSIX call in Unix system and
an emulated version under Windows using the Windows API.
Cleaned up the structure of the whole OS-abstraction layer.
fileops.c now contains a set of utility methods for file management used
by the library. These are abstractions on top of the original POSIX
calls.
There's a new file called `posix.c` that contains
emulations/reimplementations of all the POSIX calls the library uses.
These are prefixed with `p_`. There's a specific posix file for each
platform (win32 and unix).
All the path-related methods have been moved from `utils.c` to `path.c`
and have their own prefix.
A bunch of redundant methods have been removed from the external API.
- All the reference/tag creation methods with `_f` are gone. The force
flag is now passed as an argument to the normal create methods.
- All the different commit creation methods are gone; commit creation
now always requires a `git_commit` pointer for parents and a `git_tree`
pointer for tree, to ensure that corrupted commits cannot be generated.
- All the different tag creation methods are gone; tag creation now
always requires a `git_object` pointer to ensure that tags are not
created to inexisting objects.
If a config has several files, we need to check all of them before we
can say that a variable doesn't exist.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Make sure the user immediately sees the feedback, '.' or 'F', for a
test. If it's only in the buffer, it may gets "lost" in case of error.
Signed-off-by: schu <schu-github@schulog.org>