We used to require loose references to contain only an OID (possibly
after trimming the string). This is however not enough for letting us
lookup FETCH_HEAD, which can have a lot of content after the initial
OID.
Change the parsing rules so that a loose refernce must e at least 40
bytes long and the 41st (if it's there) must be accepted by
isspace(3). This makes the trim unnecessary, so only do it for
symrefs. This fixes#977.
The fix for fetching from empty repositories (22935b06d protocol:
don't store flushes; 2012-10-07) forgot to take into account the
deletion of the flush pkt in the HTTP transport. As a result, the HEAD
ref advertisement where we detect the remote's capabilities was
deleted instead. Fix this.
This started as a complex new test for checkout going through the
"typechanges" test repository, but that revealed numerous issues
with checkout, including:
* complete failure with submodules
* failure to create blobs with exec bits
* problems when replacing a tree with a blob because the tree
"example/" sorts after the blob "example" so the delete was
being processed after the single file blob was created
This fixes most of those problems and includes a number of other
minor changes that made it easier to do that, including improving
the TYPECHANGE support in diff/status, etc.
This is just some cleanup code, rearranging some of the checkout
code where TYPECHANGE support was added and adding some comments
to the diff header regarding the constants.
When I wrote the diff code, I based it on core git's diff output
which tends to split a type change into an add and a delete. But
core git's status has the notion of a T (typechange) flag for a
file. This introduces that into our status APIs and modifies the
diff code so it can be forced to not split type changes.
The adds a test for the submodule diff capabilities and then
fixes a few bugs with how the output is generated. It improves
the accuracy of OIDs in the diff delta object and makes the
submodule output more closely mirror the OIDs that will be used
by core git.
There are a few cases where diff should leave directories in
the diff list if we want to match core git, such as when the
directory contains a .git dir. That feature was lost when I
introduced some of the new submodule handling.
This restores that and then fixes a couple of related to diff
output that are triggered by having diffs with directories in
them.
Also, this adds a new flag that can be passed to diff if you
want diff output to actually include the file content of any
untracked files.
The reference is only needed inside the function. We mistakenly
increased the reference counter causing the ODB not to get freed and
leaking descriptors.
Storing flushes in the refs vector doesn't let us recognize when the
remote is empty, as we'd always introduce at least one element into
it. These flushes aren't necessary, so we can simply ignore them.
We don't have anything useful that we could do with that oid anyway (We
need to query the submodule for the HEAD commit instead).
Without this, the following code creates the error "Failed to read
descriptor: Is a directory" when run against the submod2 test-case:
const char* oidstr = "873585b94bdeabccea991ea5e3ec1a277895b698";
git_tree* tree = resolve_commit_oid_to_tree(g_repo, oidstr);
git_diff_list* diff = NULL;
cl_assert(tree);
cl_git_pass(git_diff_workdir_to_tree(g_repo, NULL, tree, &diff));
1. teach diff.c:maybe_modified to query git_submodule_status for the
modification state of a submodule. According to the
git_submodule_status docs, it will filter for to-ignore states
already.
2. teach diff_output.c:get_workdir_content to check the submodule status
again and create a line like:
Subproject commit <SHA-1>\n
or
Subproject comimt <SHA-1>-dirty\n
like git.git does.
diff_output.c:get_blob_content used to try to read the submodule commit
as a blob in the superproject's odb. Of course it cannot find it and
errors out with GIT_ENOTFOUND, implcitly terminating the whole diff
output.
This patch teaches it to create a text that describes the submodule
instead. The text looks like:
Subproject commit <SHA1>\n
which is what git.git does, too.
Together with include-tag, this make us behave more like git. After a
fetch, try to create any tags the remote told us about for which we
have objects locally.
Indicate whether the error comes from the ref already existing or
elsewhere. We always perform the check and this lets the user write
more concise code.
There are a lot of places where the diff API gives the user access
to internal data structures and many of these were being exposed
through non-const pointers. This replaces them all with const
pointers for any object that the user can access but is still
owned internally to the git_diff_list or git_diff_patch objects.
This will probably break some bindings... Sorry!
This fixes all the bugs in the new diff patch code. The only
really interesting one is that when we merge two diffs, we now
have to actually exclude diff delta records that are not supposed
to be tracked, as opposed to before where they could be included
because they would be skipped silently by `git_diff_foreach()`.
Other than that, there are just minor errors.
Replacing the `git_iterator` object, this creates a simple API
for accessing the "patch" for any file pair in a diff list and
then gives indexed access to the hunks in the patch and the lines
in the hunk. This is the initial implementation of this revised
API - it is still broken, but at least builds cleanly.
This file is not just read if the global config file (%HOME%/.gitconfig)
is not found, however, it is used everytime but with lower priority.
Signed-off-by: Sven Strickroth <email@cs-ware.de>
Do not hardcode the installation path of msysgit, but read installation path from registry.
Also "%PROGRAMFILES%\Git\etc" won't work on x64 systems with 64-bit libgit2, because
msysgit is x86 only and located in "%ProgramFiles(x86)%\Git\etc".
Signed-off-by: Sven Strickroth <email@cs-ware.de>
On most systems %USERPROFILE% is the same as %HOMEDRIVE%\%HOMEPATH%,
however, for windows machines in an AD or domain environment this
might be different and %HOMEDRIVE%\%HOMEPATH% seems to be better.
Signed-off-by: Sven Strickroth <email@cs-ware.de>
Use %HOME% before trying to figure out the windows user directory.
Users might set this as they are used on *nix systems.
Signed-off-by: Sven Strickroth <email@cs-ware.de>
Since quite a while now, git_branch_foreach has learnt to list branches
without the 'refs/heads/' or 'refs/remotes' prefixes.
This patch teaches git_tag_list to do the same for listing tags.
There has been discussion for a while about making some set of
the `giterr_set` type functions part of the public API for code
that is implementing new backends to libgit2. This makes the
`giterr_set_str()` and `giterr_set_oom()` functions public.
The old method was avoiding re-loading of packfiles by watching the mtime of the
pack directory. This causes the ODB to become stale if the directory and packfile
are written within the same clock millisecond, as when cloning a fairly small
repo.
This method tries to find the object in the cached packs, and forces a refresh when
that fails. This will cause extra stat'ing on a miss, but speeds up the success
case and avoids this race condition.
last_found is the last packfile a wanted object was found in. Since
last_found is shared among all searching threads, it might changes while
we're searching. As suggested by @arrbee, put a copy on the stack to fix
the race condition.
Defining the BOM as a string makes the array include the
NUL-terminator, which means that the memcpy is going to check for that
as well and thus never match for a nonempty file.
Define the array as three chars, which makes the size correct.
Wondows has its own HTTP library. Use that one when possible instead of
our own.
As we don't depend on them anymore, remove the http-parser library from
the Windows build, as well as the search for OpenSSL.
There is a bug in building the linked list of line records in the
diff iterator and also an off by one element error in the hunk
counts. This fixes both of these, adds some test data with more
complex sets of hunk and line diffs to exercise this code better.
This reduces the rate of syscalls for the common case of sequences of
object reads from the same pack.
Best of 5 timings for libgit2_clar before this patch:
real 0m5.375s
user 0m0.392s
sys 0m3.564s
After applying this patch:
real 0m5.285s
user 0m0.356s
sys 0m3.544s
0.6% improvement in system time.
9.2% improvement in user time.
1.7% improvement in elapsed time.
Confirmed a 0.6% reduction in number of system calls with strace.
Expect greater improvement for graph-traversal with large packs.
Fixed some minor `git_repository_hashfile` issues:
- Fixed incorrect doc (saying that repo could be NULL)
- Added checking of object type value to acceptable ones
- Added more tests for various parameter permutations
The existing `git_odb_hashfile` does not apply text filtering
rules because it doesn't have a repository context to evaluate
the correct rules to apply. This adds a new hashfile function
that will apply repository-specific filters (based on config,
attributes, and filename) before calculating the hash.
In the process of adding tests for the max file size threshold
(which treats files over a certain size as binary) there seem to
be a number of problems in the new code with detecting binaries.
This should fix those up, as well as add a test for the file
size threshold stuff.
Also, this un-deprecates `GIT_DIFF_LINE_ADD_EOFNL`, since I
finally found a legitimate situation where it would be returned.
Example: a cached node is owned only by the cache (refcount == 1).
Thread A holds the lock and determines that the entry which should get
cached equals the node (git_oid_cmp(&node->oid, &entry->oid) == 0).
It frees the given entry to instead return the cached node to the user
(entry = node). Now, before Thread A happens to increment the refcount
of the node *outside* the cache lock, Thread B tries to store another
entry and hits the slot of the node before, decrements its refcount and
frees it *before* Thread A gets a chance to increment for the user.
git_cached_obj_incref(entry);
git_mutex_lock(&cache->lock);
{
git_cached_obj *node = cache->nodes[hash & cache->size_mask];
if (node == NULL) {
cache->nodes[hash & cache->size_mask] = entry;
} else if (git_oid_cmp(&node->oid, &entry->oid) == 0) {
git_cached_obj_decref(entry, cache->free_obj);
entry = node;
} else {
git_cached_obj_decref(node, cache->free_obj);
// Thread B is here
cache->nodes[hash & cache->size_mask] = entry;
}
}
git_mutex_unlock(&cache->lock);
// Thread A is here
/* increase the refcount again, because we are
* returning it to the user */
git_cached_obj_incref(entry);
Often `git_odb_read_header` will "fail" and have to read the
entire object into memory instead of just the header. When this
happens, the object is loaded and then disposed of immediately,
which makes it difficult to efficiently use the header information
to decide if the object should be loaded (since attempting to do
so will often result in loading the object twice).
This commit takes the existing code and reorganizes it to have
two new functions:
- `git_odb__read_header_or_object` which acts just like the old
read header function except that it returns the object, too, if
it was forced to load the whole thing. It then becomes the
callers responsibility to free the `git_odb_object`.
- `git_object__from_odb_object` which was extracted from the old
`git_object_lookup` and creates a subclass of `git_object` from
an existing `git_odb_object` (separating the ODB lookup from the
`git_object` creation). This allows you to use the first header
reading function efficiently without instantiating the
`git_odb_object` twice.
There is no net change to the behavior of any of the existing
functions, but this allows internal code to tap into the ODB
lookup and object creation to be more efficient.
This commit adds a max_size value in the public `git_diff_options`
structure so that the user can automatically flag blobs over a
certain size as binary regardless of other properties.
Also, and perhaps more importantly, this moves binary detection
to be as early as possible in the diff traversal inner loop and
makes sure that we stop loading objects as soon as we decide that
they are binary.
The `git_diff_iterator_num_files` API was problematic, since we
don't actually know the exact number of files to be iterated over
until we load those files into memory. This replaces it with a
new `git_diff_iterator_progress` API that goes from 0 to 1, and
moves and renamed the old API for the internal places that can
tolerate a max value instead of an exact value.
Previously when diffing blobs, the diff code just ran with a NULL
repository object. Of course, that's not necessary and the test
for a NULL repo was confusing. This makes the blob diff run with
the repo that contains the blobs and clarifies the test that it
is possible to be diffing data where the path is unknown.
This adds support to diff and status for running filters (a la crlf)
on blobs in the workdir before computing SHAs and before generating
text diffs. This ended up being a bit more code change than I had
thought since I had to reorganize some of the diff logic to minimize
peak memory use when filtering blobs in a diff.
This also adds a cap on the maximum size of data that will be loaded
to diff. I set it at 512Mb which should match core git. Right now
it is a #define in src/diff.h but it could be moved into the public
API if desired.