This is an intermin solution. While this essentially disables the
--shared flag feature, previously external templates did not work
at all. This change fixes the previously corrected, and since
then failing, repo_init__extended_with_template() test.
The problem is now documented in the source code comments.
The indexer needs to call the packfile's free function so it takes care of
freeing the caches.
We still need to close the mwf descriptor manually so we can rename the
packfile into its final name on Windows.
Core git just looks for NUL bytes in files when deciding about
is-binary inside diff (although it uses a better algorithm in
checkout, when deciding if CRLF conversion should be done).
Libgit2 was using the better algorithm in both places, but that
is causing some confusion. For now, this makes diff just look
for NUL bytes to decide if a file is binary by content in diff.
This was just wrong. Added a test that verifying patch line
numbers even for hunks further into a file and then fixed the
algorithm. I needed to add a little extra state into the patch
so that I could track old and new file numbers independently,
but it should be okay.
Many delta bases are re-used. Cache them to avoid inflating the same
data repeatedly.
This version doesn't limit the amount of entries to store, so it can
end up using a considerable amound of memory.
This adds an option to checkout a la the diff option to turn off
fnmatch evaluation for pathspec entries. This can be useful to
make sure your "pattern" in really interpretted as an exact file
match only.
All the ODB backends have a specific refresh interface. When reading an
object, first we attempt every single backend: if the read fails, then
we refresh all the backends and retry the read one more time to see if
the object has appeared.
It is not legal inside our `p_mmap` function to mmap a zero length
file. This adds a test that exercises that case inside diff and
fixes the code path where we would try to do that.
The fix turns out not to be a lot of code since our default file
content is already initialized to "" which works in this case.
Fixes#1210
This moves the implementation of these two APIs into common code
that will be shared between the two. Also, this adds tests for
the `git_diff_blob_to_buffer` API. Lastly, this adds some extra
`const` to a few places that can use it.
Before this, we error out from `reference_matches_remote_head` if the
reference we're searching for does not exist.
Since we explicitly check if master is existing in `update_head_to_remote`
and error out if it doesn't, a repository without master branch could
not be cloned.
In fact this was later clobbered by what is fixed in #1194.
However, this patch introduces a `found` member in `head_info` and sets
it accordingly. That also saves us from checking the string length of
`branchname` a few times.
As a function that appears to only be called on error paths, I don't
think it makes sense for it to return an error, or clobber the global
giterr. Note that no existing callsites actually check the return
code.
In my own application, there are errors where the real error ends
up being hidden, as git_mwindow_file_deregister() clobbers the
global giterr. I'm not sure this error is even relevant?
I saw a repo in the wild today which had a master branch ref which was packed, but had no trailing newline. Git handled it fine, but libgit2 choked on it. Fix seems simple enough. If we don't see a newline, assume the end of the buffer is the end of the ref line.
There are a couple of checkout bugs fixed here. One is with
untracked working directory entries that are prefixes of tree
entries but not in a meaningful way (i.e. "read" is a prefix of
"readme.txt" but doesn't interfere in any way). The second bug
is actually a redo of 07edfa0fc640f85f95507c3101e77accd7d2bf0d
where directory entries in the index that are not in the diff
were not being removed correctly. That fix remedied one case
but broke another.
When checking out with the GIT_CHECKOUT_REMOVE_UNTRACKED option
and there was an entire tree in the working directory and in the
index that is not in the baseline nor target commit, the tree was
correctly(?) removed from the working directory but was not
successfully removed from the index. This fixes that and adds a
test of the functionality.
This moves a lot of the detailed checkout documentation into a new
file (docs/checkout-internals.md) and simplifies the public docs
for the checkout API.
There were a bunch of small bugs in the checkout code where I was
assuming that a typechange was always from a tree to a blob or
vice versa. This fixes up most of those cases. Also, there were
circumstances where the submodule definitions were changed by the
checkout and the submodule data was not getting reloaded properly
before the new submodules were checked out.
The notifications were broken from the various iterations over
this code and were not returning working dir item data correctly.
Also, workdir items that were alphabetically after the last item
in diff were not being processed.
The spoolandsort iterator changes got sort-of cherry picked out of
this branch and so I dropped the commit when rebasing; however,
there were a few small changes that got dropped as well (since the
version merged upstream wasn't quite the same as what I dropped).
This adds a new API to the submodule interface that just returns
where information about the submodule was found (e.g. config file
only or in the HEAD, index, or working directory).
Also, the old "refresh" call was potentially keeping some stale
submodule data around, so this simplfies that code and literally
discards the old cache, then reallocates.
Stash was sometimes obscuring the actual error code, replacing it
with a -1 when there was more descriptive value. This updates
stash to preserve the original error code more reliably along
with a variety of other error handling tweaks.
I believe this is an improvement, but arguably, preserving the
underlying error code may result in values that are harder to
interpret by the caller who does not understand the internals.
Discussion is welcome!
Previously a NULL oid was handled like an empty buffer and
returned a status empty string. This makes git_oid_tostr()
set the output buffer to the empty string instead.
Make checkout update entries in the index for all files that are
updated and/or removed, unless flag GIT_CHECKOUT_DONT_UPDATE_INDEX
is given. To do this, iterators were extended to allow a little
more introspection into the index being iterated over, etc.
This flips checkout back to be driven off the changes between
the baseline and the target trees. This reinstates the complex
code for tracking the contents of the working directory, but
overall, I think the resulting logic is easier to follow.
I've tried to map out the detailed behaviors of checkout and make
sure that we're handling the various cases correctly, along with
providing options to allow us to emulate "git checkout" and "git
checkout-index" with the various flags. I've thrown away flags
in the checkout API that seemed like clutter and added some new
ones. Also, I've converted the conflict callback to a general
notification callback so we can emulate "git checkout" output and
display "dirty" files.
As of this commit, the new behavior is not working 100% but some
of that is probably baked into tests that are not testing the
right thing. This is a decent snapshot point, I think, along the
way to getting the update done.
This corrects the order of operations in git reset so that the
checkout to reset the working directory content is done before
the HEAD is moved. This allows us to use the HEAD and the index
content to know what files can / should safely be reset.
Unfortunately, there are still some cases where the behavior of
this revision differs from core git. Notable, a file which has
been added to the index but is not present in the HEAD is
considered to be tracked by core git (and thus removable by a
reset command) whereas since this loads the target state into
the index prior to resetting, it will consider such a file to be
untracked and won't touch it. That is a larger fix that I'll
defer to a future commit.
* gen_pktline() in smart_protocol.c was skipping refspecs that deleted
refs that were not advertised by the server. The new behavior is to
send a delete command with an old-id of zero, which matches the behavior
of the official git client.
* Update test_network_push__delete() in reaction to above fix.
* Obviate messy logic that handles missing push_spec rrefs by canonicalizing
push_spec. After calculate_work(), loid, roid, and rref, are filled in with
exactly what is sent to the server
The original libpqueue file were licensed under Apache 2.0 so
therefore should retain their copyrights and header as per the
license terms at http://www.apache.org/licenses/LICENSE-2.0
When normalizing a reference name, if there is an error because
the name is invalid, then the memory allocated for storing the
name could be leaked if the caller was not careful and assumed
that the error return code meant that no allocation had occurred.
This fixes that by explicitly deallocating the reference name
buffer if there is an error in normalizing the name.
An earlier change to `git_diff_from_iterators` introduced a
memory leak where the allocated spoolandsort iterator was not
returned to the caller and thus not freed.
One proposal changes all iterator APIs to use git_iterator** so
we can reallocate the iterator at will, but that seems unexpected.
This commit makes it so that an iterator can be changed in place.
The callbacks are isolated in a separate structure and a pointer
to that structure can be reassigned by the spoolandsort extension.
This means that spoolandsort doesn't create a new iterator; it
just allocates a new block of callbacks (along with space for its
own extra data) and swaps that into the iterator.
Additionally, since spoolandsort is only needed to switch the
case sensitivity of an iterator, this simplifies the API to only
take the ignore_case boolean and to be a no-op if the iterator
already matches the requested case sensitivity.
The diff constructor functions had some confusing names, where the
"old" side of the diff was coming after the "new" side. This
reverses the order in the function name to make it less confusing.
Specifically...
* git_diff_index_to_tree becomes git_diff_tree_to_index
* git_diff_workdir_to_index becomes git_diff_index_to_workdir
* git_diff_workdir_to_tree becomes git_diff_tree_to_workdir
According to man 3 SSL_shutdown / TLS, "If a unidirectional shutdown is
enough (the underlying connection shall be closed anyway), this first
call to SSL_shutdown() is sufficient."
Currently, an unidirectional shutdown is enough, since
gitno_ssl_teardown is called by gitno_close only. Do so to avoid further
errors (by misbehaving peers for example).
Fixes#1129.
While C Git has been writing entry count -1 (ie. never other negative
numbers) as invalid since day 1, it accepts all negative entry counts
as invalid. JGit follows the same rule. libgit2 should also follow, or
the index that works with C Git or JGit may someday be rejected by
libgit2.
Other reimplementations like dulwich and grit have not bothered with
parsing or writing tree cache.
The `git_iterator_reset` command has not been working in all cases
particularly when there is a start and end range. This fixes it
and adds tests for it, and also extends it with the ability to
update the start/end range strings when an iterator is reset.
This removes the need to explicitly pass the repo into iterators
where the repo is implied by the other parameters. This moves
the repo to be owned by the parent struct. Also, this has some
iterator related updates to the internal diff API to lay the
groundwork for checkout improvements.
If commit timestamps are off, we're more likely to hit a traversal
where the first path ends up traversing past the root commit of the tree.
If that happens, it's possible that the loop will complete before the second
path marks some of those final parents. This fix keeps track of the root
nodes that are encountered in the traversal, and verify that they are
properly marked.
In the best case, with accurate timestamps, the traversal will continue
to terminate when all the commits are STALE (parents of a merge-base), as
it did before. In the worst case, where one path makes a complete traversal
past a root commit, we will continue the loop until the root commit itself
is marked.
This could also use PTHREAD_MUTEX_INITIALIZER, but a dynamic initializer seems like a more portable concept, and we won't need another #define on top of git_mutex_init()
Storing 4kB or 8kB in the stack is not very gentle. As this part has
to be linear, put the buffer into the indexer object so we allocate it
once in the heap.
There are many different broken filemodes in the wild so we need to
protect against them and give something useful up the chain. Don't
fail when reading a tree from the ODB but normalize the mode as best
we can.
As 664 is no longer a mode that we consider to be valid and gets
normalized to 644, we can stop accepting it in the treebuilder. The
library won't expose it to the user, so any invalid modes are a bug.
To paraphrase @peff:
You can get both size and type from a packed object reasonably cheaply.
If you have:
* An object that is not a delta; both type and size are available in the
packfile header.
* An object that is a delta. The packfile type will be OBJ_*_DELTA, and
you have to resolve back to the base to find the real type. That means
potentially a lot of packfile index lookups, but each one is
relatively cheap. For the size, you inflate the first few bytes of the
delta, whose header will tell you the resulting size of applying the
delta to the base.
For simplicity, we just decompress the whole delta for now.
A mmap-window is not guaranteed to give you the whole object, but the
indexer currently assumes so.
Loop asking for more data until we've successfully CRC'd all of the
packed data.
Up to now, deltas needed to be enterily in the packfile, and we tried
to decompress then in their entirety over and over again.
Adjust the logic so we read them as they come, just as we do for full
objects. This also allows us to simplify the logic and have less
nested code. The delta resolving phase still needs to decompress the
whole object into memory, as there is not yet any streaming
delta-apply support, but it helps in speeding up the downloading
process and reduces the amount of memory allocations we need to do.
The new API allows us to read the object bit by bit from the packfile,
instead of needing it all at once in the packfile. This also allows us
to hash the object as it comes in from the network instead of having
to try to read it all and failing repeatedly for larger objects.
This is only the first step, but it already shows huge improvements
when dealing with objects over a few megabytes in size. It reduces the
memory needs in some cases, but delta objects still need to be
completely in memory and the old inefficent method is still used for
that.
`revwalk.h:commit_lookup()` -> `git_revwalk__commit_lookup()`
and make `git_commit_list_parse()` do real error checking that
the item in the list is an actual commit object. Also fixed an
apparent typo in a test name.
Moved it into graph.{c,h} which i created for the new "graph"
functions namespace. Also adjusted the function prototype
to use `size_t` and `const git_oid *`.
There are many scattered functions that look into the contents of
buffers to do various text manipulations (such as escaping or
unescaping data, calculating text stats, guessing if content is
binary, etc). This groups all those functions together into a
new file and converts the code to use that.
This has two enhancements to existing functionality. The old
text stats function is significantly rewritten and the BOM
detection code was extended (although largely we can't deal with
anything other than a UTF8 BOM).
clang-SVN HEAD kindly provided my the info, that sm_repo maybe
uninitialized when we want to free it (If the expression in line 358 or
359/360 evaluate to true, we jump to "cleanup", where we'd use sm_repo
uninitialized).
This fixes some missed places where we can apply const-ness to
various public APIs.
There are still some index and tree APIs that cannot take const
pointers because we sort our `git_vectors` lazily and so we can't
reliably bsearch the index and tree content without applying a
`git_vector_sort()` first.
This also fixes some missed places where size_t can be used and
where const can be applied to a couple internal functions.
This makes the diff functions that take callbacks both take
the payload parameter after the callback function pointers and
pass the payload as the last argument to the callback function
instead of the first. This should make them consistent with
other callbacks across the API.
3f9eb1e introduced support for SSL certificates issued for IP
addresses, making use of in_addr and in_addr6 structs. On FreeBSD
these are defined in (a file included in) <netinet/in.h>, so include
that file on FreeBSD and get the build working again.
The workdir iterator has always tried to ignore .git files, but
it turns out there were some bugs. This makes it more robust at
ignoring .git files.
This also makes iterators always check ".git" case insensitively
regardless of the properties of the system. This will make libgit2
skip ".GIT" and the like. This is different from core git, but on
systems with case insensitive but case preserving file systems,
allowing ".GIT" to be added is problematic.
This checks for a leading '.' before looking for the invalid
tree entry names. Even on pretty high levels of optimization,
this seems to make a measurable improvement.
I accidentally used && in the check initially instead of || and
while debugging ended up improving the error reporting of issues
with adding tree entries. I thought I'd leave those changes, too.
A number of diff APIs and the `git_checkout_index` API take a
`git_repository` object an operate on the index. This updates
them to take a `git_index` pointer explicitly and only fall back
on the `git_repository` index if the index input is NULL. This
makes it easier to operate on a temporary index.
The index iterator could previously only be created from a repo
object, but this allows creating an iterator from a `git_index`
object instead (while keeping, though renaming, the old function).
The existing p_lstat implementation on win32 is not quite POSIX
compliant when setting errno to ENOTDIR. This adds an option to
make is be compliant so that code (such as checkout) that cares
to have separate behavior for ENOTDIR can use it portably.
This also contains a couple of other minor cleanups in the
posix_w32.c implementations to avoid unnecessary work.
Using the builtin strcmp and strcasecmp as function pointers is
problematic on win32. This adds internal implementations and
divorces us from the platform linkage.
Returning NULL for the string when we haven't signaled an error
condition is counter-intuitive and causes unnecessary edge
cases. Return an empty string when asking for a string value for a
configuration variable such as '[section] var' to avoid these edge
cases.
If the distinction between no value and an empty value is needed, this
can be retrieved from the entry directly. As a side-effect, this
change stops the int parsing functions from segfaulting on such a
variable.
This fixes a number of warnings and problems with cross-platform
builds. Among other things, it's not safe to name a member of a
structure "strcmp" because that may be #defined.
This is a major reworking of checkout strategy options. The
checkout code is now sensitive to the contents of the HEAD tree
and the new options allow you to update the working tree so that
it will match the index content only when it previously matched
the contents of the HEAD. This allows you to, for example, to
distinguish between removing files that are in the HEAD but not
in the index, vs just removing all untracked files.
Because of various corner cases that arise, etc., this required
some additional capabilities in rmdir and other utility functions.
This includes the beginnings of an implementation of code to read
a partial tree into the index based on a pathspec, but that is
not enabled because of the possibility of creating conflicting
index entries.
There are some diff functions that are useful in a rewritten
checkout and this lays some groundwork for that. This contains
three main things:
1. Share the function diff uses to calculate the OID for a file
in the working directory (now named `git_diff__oid_for_file`
2. Add a `git_diff__paired_foreach` function to iterator over
two diff lists concurrently. Convert status to use it.
3. Move all the string/prefix/index entry comparisons into
function pointers inside the `git_diff_list` object so they
can be switched between case sensitive and insensitive
versions. This makes them easier to reuse in various
functions without replicating logic. As part of this, move
a couple of index functions out of diff.c and into index.c.
Diff uses a `git_strarray` of path specs to represent a subset
of all files to be processed. It is useful to be able to reuse
this filtering in other places outside diff, so I've moved it
into a standalone set of utilities.
This makes it so that the check if a file is ignored will be
deferred until requested on the workdir iterator, instead of
aggressively evaluating the ignore rules for each entry. This
should improve performance because there will be no need to
check ignore rules for files that are already in the index.
So, @nulltoken created a failing test case for checkout that
proved to be particularly daunting. If checkout is given only
a very limited strategy mask (e.g. just GIT_CHECKOUT_CREATE_MISSING)
then it is possible for typechange/rename modifications to leave it
unable to complete the request. That's okay, but the existing code
did not have enough information not to generate an error (at least
for tree/blob conflicts).
This led me to a significant reorganization of the code to handle
the failing case, but it has three benefits:
1. The test case is handled correctly (I think)
2. The new code should actually be much faster than the old code
since I decided to make checkout aware of diff list internals.
3. The progress value accuracy is hugely increased since I added
a fourth pass which calculates exactly what work needs to be
done before doing anything.
* Rework GIT_DIRREMOVAL values to GIT_RMDIR flags, allowing
combinations of flags
* Add GIT_RMDIR_EMPTY_PARENTS flag to remove parent dirs that
are left empty after removal
* Add GIT_MKDIR_VERIFY_DIR to give an error if item is a file,
not a dir (previously an EEXISTS error was ignored, even for
files) and enable this flag for git_futils_mkpath2file call
* Improve accuracy of error messages from git_futils_mkdir
This fix makes libgit2 capable of parsing annotated tag objects that lack
the optional message/description field.
Previously, libgit2 treated this field as mandatory and raised a tag_error on
such tags. However, the message field is optional.
An example of such a tag is refs/tags/v2.6.16.31-rc1 in Linux:
$ git cat-file tag refs/tags/v2.6.16.31-rc1
object afaa018cefb6af63befef1df7d8febaae904434f
type commit
tag v2.6.16.31-rc1
tagger Adrian Bunk <bunk@stusta.de> 1162716505 +0100
$
This improves docs in some of the public header files, cleans
up and improves some of the example code, and fixes a couple
of pedantic warnings in places.
This adds a new API that allows users to reload the config if the
file has changed on disk. A new config callback function to
refresh the config was added.
The modified time and file size are used to test if the file needs
to be reloaded (and are now stored in the disk backend object).
In writing tests, just using mtime was a problem / race, so I
wanted to check file size as well. To support that, I extended
`git_futils_readbuffer_updated` to optionally check file size in
addition to mtime, and I added a new function `git_filebuf_stats`
to fetch the mtime and size for an open filebuf (so that the
config could be easily refreshed after a write).
Lastly, I moved some similar file checking code for attributes
into filebuf. It is still only being used for attrs, but it
seems potentially reusable, so I thought I'd move it over.
This improves the naming for the rename related functionality
moving it to be called `git_diff_find_similar()` and renaming
all the associated constants, etc. to make more sense.
I also moved the new code (plus the existing `git_diff_merge`)
into a new file `diff_tform.c` where I can put new functions
related to manipulating git diff lists.
This also updates the implementation significantly from the
last revision fixing some ordering issues (where break-rewrite
needs to be handled prior to copy and rename detection) and
improving config option handling.
This adds a `git_diff_patch_print()` API which is more like the
existing API to "print" a patch from an entire `git_diff_list`
but operates on a single `git_diff_patch` object.
Also, it rewrites the `git_diff_patch_to_str()` API to use that
function (making it very small).
This implements the basis for diff rename and copy detection,
although it is based on simple SHA comparison right now instead
of using a matching algortihm. Just as `git_diff_merge` can be
used as a post-pass on diffs to emulate certain command line
behaviors, there is a new API `git_diff_detect` which will
update a diff list in-place, adjusting some deltas to RENAMED
or COPIED state (and also, eventually, splitting MODIFIED deltas
where the change is too large into DELETED/ADDED pairs).
This also adds a new test repo that will hold rename/copy/split
scenarios. Right now, it just has exact-match rename and copy,
but the tests are written to use tree diffs, so we should be able
to add new test scenarios easily without breaking tests.
libcryto's SHA-1 implementation is measurably better than the one that
ships with the library. If we link to it for HTTPS support already,
use that implementation instead.
Testing on a ~600MB of the linux repository, this reduces indexing
time by 40% and removes the hashing from the top spot in the perf
output.
Added `struct git_config_entry`: a git_config_entry contains the key, the value, and the config file level from which a config element was found.
Added `git_config_open_level`: build a single-level focused config object from a multi-level one.
We are now storing `git_config_entry`s in the khash of the config_file
- make sure temporary streamed blobs are created under the
.git/objects folder and not in the current path, whatever it is.
- do not make the name of the temp file depend on the hintpath.
git_index_read_tree() was exposing a parameter to provide the user with
a progress indicator. Unfortunately, due to the recursive nature of the
tree walk, the maximum number of items to process was unknown. Thus,
the indicator was only counting processed entries, without providing
any information how the number of remaining items.
The new Win32 global path search was not working with the
environment variable tests. But when I fixed the test, the new
codes use of getenv() was causing more failures (presumably because
of caching on Windows ???). This fixes the global file lookup to
always go directly to the Win32 API in a predictable way.
Introduce git_remote_stop() which sets a variable that is checked by
the fetch process in a few key places. If this is variable is set, the
fetch is aborted.
Fixed no-submodule speedup of new checkout code. Fixed missing
final update to progress (which may go away, I realize). Fixed
unused structure in header and incorrect comment.
To answer if a single given file should be ignored, the path to
that file has to be processed progressively checking that there
are no intermediate ignored directories in getting to the file
in question. This enables that, fixing the broken old behavior,
and adds tests to exercise various ignore situations.
Because fnmatch uses recursion, there were some input sequences
that cause seriously degenerate behavior. This imports a fix
that imposes a max recursion limiter to avoid the worst of it.
We used to require loose references to contain only an OID (possibly
after trimming the string). This is however not enough for letting us
lookup FETCH_HEAD, which can have a lot of content after the initial
OID.
Change the parsing rules so that a loose refernce must e at least 40
bytes long and the 41st (if it's there) must be accepted by
isspace(3). This makes the trim unnecessary, so only do it for
symrefs. This fixes#977.
The fix for fetching from empty repositories (22935b06d protocol:
don't store flushes; 2012-10-07) forgot to take into account the
deletion of the flush pkt in the HTTP transport. As a result, the HEAD
ref advertisement where we detect the remote's capabilities was
deleted instead. Fix this.
This started as a complex new test for checkout going through the
"typechanges" test repository, but that revealed numerous issues
with checkout, including:
* complete failure with submodules
* failure to create blobs with exec bits
* problems when replacing a tree with a blob because the tree
"example/" sorts after the blob "example" so the delete was
being processed after the single file blob was created
This fixes most of those problems and includes a number of other
minor changes that made it easier to do that, including improving
the TYPECHANGE support in diff/status, etc.
This is just some cleanup code, rearranging some of the checkout
code where TYPECHANGE support was added and adding some comments
to the diff header regarding the constants.
When I wrote the diff code, I based it on core git's diff output
which tends to split a type change into an add and a delete. But
core git's status has the notion of a T (typechange) flag for a
file. This introduces that into our status APIs and modifies the
diff code so it can be forced to not split type changes.
The adds a test for the submodule diff capabilities and then
fixes a few bugs with how the output is generated. It improves
the accuracy of OIDs in the diff delta object and makes the
submodule output more closely mirror the OIDs that will be used
by core git.
There are a few cases where diff should leave directories in
the diff list if we want to match core git, such as when the
directory contains a .git dir. That feature was lost when I
introduced some of the new submodule handling.
This restores that and then fixes a couple of related to diff
output that are triggered by having diffs with directories in
them.
Also, this adds a new flag that can be passed to diff if you
want diff output to actually include the file content of any
untracked files.
The reference is only needed inside the function. We mistakenly
increased the reference counter causing the ODB not to get freed and
leaking descriptors.
Storing flushes in the refs vector doesn't let us recognize when the
remote is empty, as we'd always introduce at least one element into
it. These flushes aren't necessary, so we can simply ignore them.
We don't have anything useful that we could do with that oid anyway (We
need to query the submodule for the HEAD commit instead).
Without this, the following code creates the error "Failed to read
descriptor: Is a directory" when run against the submod2 test-case:
const char* oidstr = "873585b94bdeabccea991ea5e3ec1a277895b698";
git_tree* tree = resolve_commit_oid_to_tree(g_repo, oidstr);
git_diff_list* diff = NULL;
cl_assert(tree);
cl_git_pass(git_diff_workdir_to_tree(g_repo, NULL, tree, &diff));
1. teach diff.c:maybe_modified to query git_submodule_status for the
modification state of a submodule. According to the
git_submodule_status docs, it will filter for to-ignore states
already.
2. teach diff_output.c:get_workdir_content to check the submodule status
again and create a line like:
Subproject commit <SHA-1>\n
or
Subproject comimt <SHA-1>-dirty\n
like git.git does.
diff_output.c:get_blob_content used to try to read the submodule commit
as a blob in the superproject's odb. Of course it cannot find it and
errors out with GIT_ENOTFOUND, implcitly terminating the whole diff
output.
This patch teaches it to create a text that describes the submodule
instead. The text looks like:
Subproject commit <SHA1>\n
which is what git.git does, too.
Together with include-tag, this make us behave more like git. After a
fetch, try to create any tags the remote told us about for which we
have objects locally.
Indicate whether the error comes from the ref already existing or
elsewhere. We always perform the check and this lets the user write
more concise code.
There are a lot of places where the diff API gives the user access
to internal data structures and many of these were being exposed
through non-const pointers. This replaces them all with const
pointers for any object that the user can access but is still
owned internally to the git_diff_list or git_diff_patch objects.
This will probably break some bindings... Sorry!
This fixes all the bugs in the new diff patch code. The only
really interesting one is that when we merge two diffs, we now
have to actually exclude diff delta records that are not supposed
to be tracked, as opposed to before where they could be included
because they would be skipped silently by `git_diff_foreach()`.
Other than that, there are just minor errors.
Replacing the `git_iterator` object, this creates a simple API
for accessing the "patch" for any file pair in a diff list and
then gives indexed access to the hunks in the patch and the lines
in the hunk. This is the initial implementation of this revised
API - it is still broken, but at least builds cleanly.
This file is not just read if the global config file (%HOME%/.gitconfig)
is not found, however, it is used everytime but with lower priority.
Signed-off-by: Sven Strickroth <email@cs-ware.de>
Do not hardcode the installation path of msysgit, but read installation path from registry.
Also "%PROGRAMFILES%\Git\etc" won't work on x64 systems with 64-bit libgit2, because
msysgit is x86 only and located in "%ProgramFiles(x86)%\Git\etc".
Signed-off-by: Sven Strickroth <email@cs-ware.de>
On most systems %USERPROFILE% is the same as %HOMEDRIVE%\%HOMEPATH%,
however, for windows machines in an AD or domain environment this
might be different and %HOMEDRIVE%\%HOMEPATH% seems to be better.
Signed-off-by: Sven Strickroth <email@cs-ware.de>
Use %HOME% before trying to figure out the windows user directory.
Users might set this as they are used on *nix systems.
Signed-off-by: Sven Strickroth <email@cs-ware.de>
Since quite a while now, git_branch_foreach has learnt to list branches
without the 'refs/heads/' or 'refs/remotes' prefixes.
This patch teaches git_tag_list to do the same for listing tags.
There has been discussion for a while about making some set of
the `giterr_set` type functions part of the public API for code
that is implementing new backends to libgit2. This makes the
`giterr_set_str()` and `giterr_set_oom()` functions public.
The old method was avoiding re-loading of packfiles by watching the mtime of the
pack directory. This causes the ODB to become stale if the directory and packfile
are written within the same clock millisecond, as when cloning a fairly small
repo.
This method tries to find the object in the cached packs, and forces a refresh when
that fails. This will cause extra stat'ing on a miss, but speeds up the success
case and avoids this race condition.
last_found is the last packfile a wanted object was found in. Since
last_found is shared among all searching threads, it might changes while
we're searching. As suggested by @arrbee, put a copy on the stack to fix
the race condition.
Defining the BOM as a string makes the array include the
NUL-terminator, which means that the memcpy is going to check for that
as well and thus never match for a nonempty file.
Define the array as three chars, which makes the size correct.
Wondows has its own HTTP library. Use that one when possible instead of
our own.
As we don't depend on them anymore, remove the http-parser library from
the Windows build, as well as the search for OpenSSL.
There is a bug in building the linked list of line records in the
diff iterator and also an off by one element error in the hunk
counts. This fixes both of these, adds some test data with more
complex sets of hunk and line diffs to exercise this code better.
This reduces the rate of syscalls for the common case of sequences of
object reads from the same pack.
Best of 5 timings for libgit2_clar before this patch:
real 0m5.375s
user 0m0.392s
sys 0m3.564s
After applying this patch:
real 0m5.285s
user 0m0.356s
sys 0m3.544s
0.6% improvement in system time.
9.2% improvement in user time.
1.7% improvement in elapsed time.
Confirmed a 0.6% reduction in number of system calls with strace.
Expect greater improvement for graph-traversal with large packs.
Fixed some minor `git_repository_hashfile` issues:
- Fixed incorrect doc (saying that repo could be NULL)
- Added checking of object type value to acceptable ones
- Added more tests for various parameter permutations
The existing `git_odb_hashfile` does not apply text filtering
rules because it doesn't have a repository context to evaluate
the correct rules to apply. This adds a new hashfile function
that will apply repository-specific filters (based on config,
attributes, and filename) before calculating the hash.
In the process of adding tests for the max file size threshold
(which treats files over a certain size as binary) there seem to
be a number of problems in the new code with detecting binaries.
This should fix those up, as well as add a test for the file
size threshold stuff.
Also, this un-deprecates `GIT_DIFF_LINE_ADD_EOFNL`, since I
finally found a legitimate situation where it would be returned.
Example: a cached node is owned only by the cache (refcount == 1).
Thread A holds the lock and determines that the entry which should get
cached equals the node (git_oid_cmp(&node->oid, &entry->oid) == 0).
It frees the given entry to instead return the cached node to the user
(entry = node). Now, before Thread A happens to increment the refcount
of the node *outside* the cache lock, Thread B tries to store another
entry and hits the slot of the node before, decrements its refcount and
frees it *before* Thread A gets a chance to increment for the user.
git_cached_obj_incref(entry);
git_mutex_lock(&cache->lock);
{
git_cached_obj *node = cache->nodes[hash & cache->size_mask];
if (node == NULL) {
cache->nodes[hash & cache->size_mask] = entry;
} else if (git_oid_cmp(&node->oid, &entry->oid) == 0) {
git_cached_obj_decref(entry, cache->free_obj);
entry = node;
} else {
git_cached_obj_decref(node, cache->free_obj);
// Thread B is here
cache->nodes[hash & cache->size_mask] = entry;
}
}
git_mutex_unlock(&cache->lock);
// Thread A is here
/* increase the refcount again, because we are
* returning it to the user */
git_cached_obj_incref(entry);
Often `git_odb_read_header` will "fail" and have to read the
entire object into memory instead of just the header. When this
happens, the object is loaded and then disposed of immediately,
which makes it difficult to efficiently use the header information
to decide if the object should be loaded (since attempting to do
so will often result in loading the object twice).
This commit takes the existing code and reorganizes it to have
two new functions:
- `git_odb__read_header_or_object` which acts just like the old
read header function except that it returns the object, too, if
it was forced to load the whole thing. It then becomes the
callers responsibility to free the `git_odb_object`.
- `git_object__from_odb_object` which was extracted from the old
`git_object_lookup` and creates a subclass of `git_object` from
an existing `git_odb_object` (separating the ODB lookup from the
`git_object` creation). This allows you to use the first header
reading function efficiently without instantiating the
`git_odb_object` twice.
There is no net change to the behavior of any of the existing
functions, but this allows internal code to tap into the ODB
lookup and object creation to be more efficient.
This commit adds a max_size value in the public `git_diff_options`
structure so that the user can automatically flag blobs over a
certain size as binary regardless of other properties.
Also, and perhaps more importantly, this moves binary detection
to be as early as possible in the diff traversal inner loop and
makes sure that we stop loading objects as soon as we decide that
they are binary.
The `git_diff_iterator_num_files` API was problematic, since we
don't actually know the exact number of files to be iterated over
until we load those files into memory. This replaces it with a
new `git_diff_iterator_progress` API that goes from 0 to 1, and
moves and renamed the old API for the internal places that can
tolerate a max value instead of an exact value.
Previously when diffing blobs, the diff code just ran with a NULL
repository object. Of course, that's not necessary and the test
for a NULL repo was confusing. This makes the blob diff run with
the repo that contains the blobs and clarifies the test that it
is possible to be diffing data where the path is unknown.
This adds support to diff and status for running filters (a la crlf)
on blobs in the workdir before computing SHAs and before generating
text diffs. This ended up being a bit more code change than I had
thought since I had to reorganize some of the diff logic to minimize
peak memory use when filtering blobs in a diff.
This also adds a cap on the maximum size of data that will be loaded
to diff. I set it at 512Mb which should match core git. Right now
it is a #define in src/diff.h but it could be moved into the public
API if desired.
This refactors the diff output code so that an iterator object
can be used to traverse and generate the diffs, instead of just
the `foreach()` style with callbacks. The code has been rearranged
so that the two styles can still share most functions.
This also replaces `GIT_REVWALKOVER` with `GIT_ITEROVER` and uses
that as a common error code for marking the end of iteration when
using a iterator style of object.
SSL_get_error() allows to receive a result code for various SSL
operations. Depending on the return value (see man (3) SSL_get_error)
there might be additional information in the OpenSSL error queue. Return
the queued message if available, otherwise set an error message
corresponding to the return code.
There is a better and less fragile way to calculate time offsets. Let
the OS take care of dealing with DST and simply take the the offset
between the local time and UTC that it gives us.
Passing SSL_VERIFY_PEER makes OpenSSL shut down the connection if the
certificate is invalid, without giving us a chance to ignore that
error. Pass SSL_VERIFY_NONE and call SSL_get_verify_result if the user
wanted us to check.
When no CNs match, we used to jump to on_error which gave a bogus
error as that's for OpenSSL errors. Jump to cert_fail so we tell the
user that the error came from checking the certificate.
This expands the types of peeling that `git_object_peel` knows
how to do to include TAG -> BLOB peeling, and makes the errors
slightly more consistent depending on the situation. It also
adds a new special behavior where peeling to ANY will peel until
the object type changes (e.g. chases TAGs to a non-TAG).
Using this expanded peeling, this replaces peeling code that was
embedded in `git_tag_peel` and `git_reset`.
It's not really needed with the current code as we have EOS and the
sideband's flush to tell us we're done.
Keep the distinction between processed and received objects.
Just clean up valgrind warnings about uninitialized memory
and also clear out errno in some cases where it results in
a false error message being generated at a later point.
This is a big redesign of the git_submodule_status API and the
implementation of the redesigned API. It also fixes a number of
bugs that I found in other parts of the submodule API while
writing the tests for the status part.
This also fixes a couple of bugs in the iterators that had not
been noticed before - one with iterating when there is a gitlink
(i.e. separate-work-dir) and one where I was treating anything
even vaguely submodule-like as a submodule, more aggressively
than core git does.
This fixes up a number of problems flagged by valgrind and also
cleans up the internal `git_submodule` allocation handling
overall with a simpler model.
This cleans up a number of items suggested during code review
with @vmg, including:
* renaming "outside repo" config API to `git_config_open_default`
* killing the `git_config_open_global` API
* removing the `git_` prefix from the static functions in fileops
* removing some unnecessary functionality from the "cp" command
This extends git_repository_init_ext further with support for
initializing the repository from an external template directory
and with support for the "create shared" type flags that make a
set GID repository directory.
This also adds tests for much of the new functionality to the
existing `repo/init.c` test suite.
Also, this adds a bunch of new utility functions including a
very general purpose `git_futils_mkdir` (with the ability to
make paths and to chmod the paths post-creation) and a file
tree copying function `git_futils_cp_r`. Also, this includes
some new path functions that were useful to keep the code
simple.
The extended version of repository init adds support for many
of the things that you can do with `git init` and sets up
structures that will make it easier to extend further in the
future.
In looking at PR #878, I found a few small bugs in the diff code,
mostly related to work that can be avoided when processing tree-
to-tree diffs that was always being carried out. This commit has
some small fixes in it.
This creates a public API for adding to the internal ignores
list, which already existing but was not accessible.
This adds the new default value for core.excludesfile also.
Up to now, the idea was that the user would do all the operations for
one repository in the same thread. Thus we could have the
memory-mapped window information thread-local and avoid any locking.
This is not practical in a few environments, such as Apple's GCD which
allocates threads arbitrarily or the .NET CLR, where the OS-level
thread can change at any moment.
Make the control structure global and protect it with a mutex so we
don't depend on the thread currently executing the code.
If you want to be absolutely safe with git_message_prettify, you
can now pass a NULL pointer for the buffer and get back the number
of bytes that would be copied into the buffer.
This means that an error is a non-negative return code and a
success will be greater than zero from this function.
Returning a negative cancels the walk, and returning a positive one
causes us to skip an entry, which was previously done by a negative
value.
This allows us to stay consistent with the rest of the functions that
take a callback and keeps the skipping functionality.
There is a little cleanup necessary from PR #843. Since the
new callbacks return `GIT_EUSER` we have to be a little careful
about return values when they are used internally to the library.
Also, callbacks should be checked for non-zero return values,
not just less than zero.
This updates all the `foreach()` type functions across the library
that take callbacks from the user to have a consistent behavior.
The rules are:
* A callback terminates the loop by returning any non-zero value
* Once the callback returns non-zero, it will not be called again
(i.e. the loop stops all iteration regardless of state)
* If the callback returns non-zero, the parent fn returns GIT_EUSER
* Although the parent returns GIT_EUSER, no error will be set in
the library and `giterr_last()` will return NULL if called.
This commit makes those changes across the library and adds tests
for most of the iteration APIs to make sure that they follow the
above rules.
Fixes#824
Exporting variables in a dynamic library is a PITA. Let's keep
these values internally and wrap them through a helper method.
This doesn't break the external API. @arrbee, aren't you glad I turned
the `GIT_ATTR_` macros into function macros? ✨
The 'git revert/cherry-pick/merge -n' commands leave .git/MERGE_MSG
behind so that git-commit can find it. As we don't yet support these
operations, users who are shelling out to let git perform these
operations haven't had a convenient way to get this message.
These functions allow the user to retrieve the message and remove it
when she's created the commit.
Instad of each transport having its own function and logic to get to
its refs, store them directly in transport.
Leverage the new gitno_buffer to make the parsing and storing of the
refs use common code and get rid of the git_protocol struct.
This allows us to add capabilitites to both at the same time, keeps
them in sync and removes a lot of code.
gitno_buffer now uses a callback to fill its buffer, allowing us to
use the same interface for git and http (which uses callbacks).
For the transition, http is going to keep its own logic until the
git/common code catches up with the implied multi_ack that http
has. This also has the side-effect of making the code cleaner and more
correct regardingt he protocol.
git.git uses an inlined hashcmp function instead of memcmp, since it
performes much better when comparing hashes (most hashes compared
diverge within the first byte).
Measurements and rationale for the curious reader:
http://thread.gmane.org/gmane.comp.version-control.git/172286
Renamed git_checkout_index to what it really was,
and removed duplicate code from clone.c. Added
git_checkout_ref, which updates HEAD and hands off
to git_checkout_head.
Added tests for the options the caller can pass to
git_checkout_*.
This makes sure that an error code returned by the callback function
of `git_tree_walk` will stop the iteration and get propagated back
to the caller verbatim.
Also, this adds a minor helper function `git_tree_entry_byoid` that
searches a `git_tree` for an entry with the given OID. This isn't
a fast function, but it's easier than writing the loop yourself as
an external user of the library.
A diff that is created with a NULL options parameter could result
in a NULL prefix string, but diff merge was unconditionally
strdup'ing it. I added a test to replicate the issue and then a
new method that does the right thing with NULL values.
Josh Triplett noticed libgit2 actually does preorder entries in
tree_walk_post instead of postorder. Also, we continued walking even
when an error occured in the callback.
Fix#773; also, allow both pre- and postorder walking.
Also removes the unnecessary check for filter
length, since git_filters_apply does the right
thing when there are none, and it's more efficient
than this.
Not all delta bases are available on the first try. By delaying
resolving all deltas until the end, we avoid decompressing some of the
data twice or even more times, saving effort and time.
The correct way to advertise out capabilities is by appending them to
the first 'want' line, using SP as separator, instead of NUL as the
server does. Inconsistent documentation lead to the use of NUL in
libgit2.
Fix this so we can request much more efficient packs from the
remote which reduces the indexing time considerably.
passing 0 to git_strol(32|64) let the implementation guess if it's
dealing with an octal number or a decimal one.
Let's make it safe and ensure that both 'HEAD@{010}' and 'HEAD@{10}'
point at the same commit.
This added a flag to the `git_repository_set_workdir()` function
that enables generation of a `.git` gitlink file that links the
new workdir to the parent repository. Essentially, the flag tells
the function to write out the changes to disk to permanently set
the workdir of the repository to the new path.
If you pass this flag as true, then setting the workdir to something
other than the default workdir (i.e. the parent of the .git repo
directory), will create a plain file named ".git" with the standard
gitlink contents "gitdir: <repo-path>", and also update the
"core.worktree" and "core.bare" config values.
Setting the workdir to the default repo workdir will clear the
core.worktree flag (but still permanently set core.bare to false).
BTW, the libgit2 API does not currently provide a function for
clearing the workdir and converting a non-bare repo into a bare one.
Adding a new config iteration function that let's you iterate
over just the config entries that match a particular regular
expression. The old foreach becomes a simple use of this with
an empty pattern.
This also fixes an apparent bug in the existing `git_config_foreach`
where returning a non-zero value from the iteration callback was
not correctly aborting the iteration and the returned value was
not being propogated back to the caller of foreach.
Added to tests to cover all these changes.
This makes it easy to take a buffer containing a path with relative
references (i.e. .. or . path segments) and resolve all of those
into a clean path. This can be applied to URLs as well as file
paths which can be useful.
As part of this, I made the drive-letter detection apply on all
platforms, not just windows. If you give a path that looks like
"c:/..." on any platform, it seems like we might as well detect
that as a rooted path. I suppose if you create a directory named
"x:" on another platform and want to use that as the beginning
of a relative path under the root directory of your repo, this
could cause a problem, but then it seems like you're asking for
trouble.
* `git_buf_rfind` (with tests and tests for `git_buf_rfind_next`)
* `git_buf_puts_escaped` and `git_buf_puts_escaped_regex` (with tests)
to copy strings into a buffer while injecting an escape sequence
(e.g. '\') in front of particular characters.
passing 0 to git_strol(32|64) let the implementation guess if it's
dealing with an octal number or a decimal one.
Let's make it safe and ensure that both 'HEAD@{010}' and 'HEAD@{10}'
point at the same commit.
On GNU, the d_name field of the dirent structure is defined as "char d_name[1]",
so we must allocate more than sizeof(struct dirent) bytes, just like on Sun.
Once a file is registered, there is no way to deregister it, even
after the structure that contains it is no longer needed and has been
freed. This may be the source of #624.
Allow and use the deregister function to remove our file from the
global list.
Currently, the first call of git_indexer_stream_add adds the data to the
underlying pack file and opens it for later use, but doesn't start
parsing the already available data.
This means, git_indexer_stream_finalize only works if
git_indexer_stream_add was called at least twice. Kill this limitation
by parsing available data immediately.
When the repository was reinitialized, every configuration change in repo_init_config() was directly performed against the file on the filesystem. However, a previous version of the configuration had previously been loaded in memory and attached to the repository, in repo_init_reinit().
The repository was unaware of the change and the stale cached version of the configuration never refreshed.
So far they only create a repo, setup the "origin"
remote, and fetch. The API probably needs work as
well; there's no way to get progress information
at this point.
Also uncovered a shortcoming; git_remote_download
doesn't fetch over local transport.
1. compile warning:
D:\libgit2.git\src\win32\posix_w32.c: In function 'p_open':
D:\libgit2.git\src\win32\posix_w32.c:235:10: warning: 'mode_t' is promoted to 'int' when passed through '...' [enabled by default]
D:\libgit2.git\src\win32\posix_w32.c:235:10: note: (so you should pass 'int' not 'mode_t' to 'va_arg')
D:\libgit2.git\src\win32\posix_w32.c:235:10: note: if this code is reached, the program will abort
2. test crash.
3. the above two issues are same root cause. please see http://www.eskimo.com/~scs/cclass/int/sx11c.html
The call to repo_init_reinit already takes care of opening the
repository and giving us a git_repository object to give to the
caller. There is no need to call git_repository_open again.
If we find several objects with the same prefix, we need to free the
memory where we stored the earlier object. Keep track of the raw.data
pointer across read_prefix calls and free it if we find another
object.
oid_for_tree_path may not always find the path in the tree, in which
case we need to return an error. The current code doesn't do this and
results in undefined behavior.
This fixes git_index_add and git_index_append to behave more like
core git, preserving old filemode data in the index when adding
and/or appending with core.filemode = false.
This also has placeholder support for core.symlinks and
core.ignorecase, but those flags are not implemented (well,
symlinks has partial support for preserving mode information in
the same way that git does, but it isn't tested).
git_commit() and git_tag() no longer prettify the
message by default. This has to be taken care of
by the caller.
This has the nice side effect of putting the
caller in position to actually choose to strip
the comments or not.
Needs AmigaOS.cmake now from CMake package at OS4Depot, or contents below:
--8<--
SET(AMIGA 1)
SET(CMAKE_SHARED_LIBRARY_C_FLAGS "-fPIC")
SET(CMAKE_SHARED_LIBRARY_CREATE_C_FLAGS "-shared")
--8<--
When a configuration option is set, we didn't check to see whether
there was any escaping needed. Escape the available characters so we
can unescape them correctly when we read them.
On RAM: the .idx and .pack files become links to a .lock and the original download respectively.
Assume some feature (such as record locking) supported by SFS but not JXFS or RAM: is required.
When checking for a drive letter on windows, instead of using
isalpha(), it is better to just check for a..z and A..Z, I think,
particularly because the MS isalpha implementation appears to
assert when given an 0xFF byte.
There are three actual changes in this commit:
1. When the trailing newline of a file is removed in a diff, the
change will now be reported with `GIT_DIFF_LINE_DEL_EOFNL` passed
to the callback. Previously, the `ADD_EOFNL` constant was given
which was just an error in my understanding of when the various
circumstances arose. `GIT_DIFF_LINE_ADD_EOFNL` is deprecated and
should never be generated. A new newline is simply an `ADD`.
2. Rewrote the `diff_delta__merge_like_cgit` function that contains
the core logic of the `git_diff_merge` implementation. The new
version doesn't actually have significantly different behavior,
but the logic should be much more obvious, I think.
3. Fixed a bug in `git_diff_merge` where it freed a string pool
while some of the string data was still in use. This led to
`git_diff_print_patch` accessing memory that had been freed.
The rest of this commit contains improved documentation in `diff.h`
to make the behavior and the equivalencies with core git clearer,
and a bunch of new tests to cover the various cases, oh and a minor
simplification of `examples/diff.c`.
File modes were both not being ignored properly on platforms
where they should be ignored, nor be diffed consistently on
platforms where they are supported.
This change adds a number of diff and status filemode change
tests. This also makes sure that filemode-only changes are
included in the diff output when they occur and that filemode
changes are ignored successfully when core.filemode is false.
There is no code that automatically toggles core.filemode
based on the capabilities of the current platform, so the user
still needs to be careful in their .git/config file.
- Do not create new levels of fanout when creating notes from libgit2
- Insert a note in an existing matching fanout
- Remove a note from an existing fanout
- Cleanup git_note_read, git_note_remove, git_note_foreach, git_note_create methods in order use tree structures instead of tree_oids
git_status_file would always return GIT_ENOTFOUND for these files.
The underlying bug was that git__strcmp_cb, which is used by
git_path_with_stat_cmp to sort entries in the working directory,
compares strings based on unsigned chars (this is confirmed by the
strcmp(3) manpage), while git__prefixcmp, which is used by
workdir_iterator__entry_cmp to search for a path in the working
directory, compares strings based on char. So the sort puts this path at
the end of the list, while the search expects it to be at the beginning.
The fix was simply to make git__prefixcmp compare using unsigned chars,
just like strcmp(3). The rest of the change is just adding/updating
tests.
Converted an internal utility to return an oid,
rather than a tree entry (whose lifetime is tied
to the parent tree, which was freed before
returning).
The error codes from failed lookups of system and global files
on Windows were not consistent with the codes returned on other
platforms. This makes the error detection patterns match and
adds a unit test for the various errors.
This fixes two bugs:
* Issue #728 where git_status_file was not working for files
that contain spaces. This was caused by reusing the "fnmatch"
parsing code from ignore and attribute files to interpret the
"pathspec" that constrained the files to apply the status to.
In that code, unescaped whitespace was considered terminal to
the pattern, so a file with internal whitespace was excluded
from the matched files. The fix was to add a mode to that code
that allows spaces and tabs inside patterns. This mode only
comes into play when parsing in-memory strings.
* The other issue was undetected, but it was in the recently
added code to reload gitattributes / gitignores when they were
changed on disk. That code was not clearing out the old values
from the cached file content before reparsing which meant that
newly added patterns would be read in, but deleted patterns
would not be removed. The fix was to clear the vector of
patterns in a cached file before reparsing the file.
The function to convert UTF-16 to UTF-8 was only allocating a
buffer of wcslen(utf16str) bytes for the UTF-8 string, but that
is not sufficient if you have multibyte characters, and so when
those occured, the conversion was failing. This updates the
conversion functions to use the Win APIs to calculate the correct
buffer lengths.
Also fixes a comparison in the unit tests that would fail if
you did not have a particular environment variable set.
We used to consider a missing core.bare option to mean that the
repository was corrupt. This is too strict. Consider it a non-bare
repository if it's not set.
On Windows, we are having problems with home directories
that have non-ascii characters in them. This rewrites the
relevant code to fetch environment variables as UTF-16 and
then explicitly map then into UTF-8 for our internal usage.
If it's not available, an error saying so will be returned when trying
to use a https:// URL.
This also unifies a lot of the network code to use git_transport in
many places instead of an socket descriptor.
Local fetch isn't implemented yet. Don't segfault on call, but set a
dummy for negotiate_fetch and terminate gracefully.
Reported-by: Brad Harder <bch@methodlogic.net>
Creating a workdir iterator on a directory with absolutely
no files was returning an error (GIT_ENOTFOUND) instead of
an iterator for nothing. This fixes that and includes two
new tests that cover that case.
GProf shows `git_text_gather_stats` as the most expensive call
in large diffs. The function calculates a lot of information
that is not actually used and does not do so in a optimal
order. This introduces a tuned `git_buf_is_binary` function
that executes the same algorithm in a fraction of the time.
There was a bug where tracked files inside directories that were
inside ignored directories where not being found by status. To
make that a little clearer, if you have a .gitignore with:
ignore/
And then have the following files:
ignore/dir/tracked <-- actually a tracked file
ignore/dir/untracked <-- should be ignored
Then we would show the tracked file as being removed (because
when we got the to contained item "dir/" inside the ignored
directory, we decided it was safe to skip -- bzzt, wrong!).
This update is much more careful about checking that we are
not skipping over any prefix of a tracked item, regardless of
whether it is ignored or not.
As documented in diff.c, this commit does create behavior that
still differs from core git with regards to the handling of
untracked files contained inside ignored directories. With
libgit2, those files will just not show up in status or diff.
With core git, those files don't show up in status or diff
either *unless* they are explicitly ignored by a .gitignore
pattern in which case they show up as ignored files.
Needless to say, this is a local behavior difference only, so
it should not be important and (to me) the libgit2 behavior
seems more consistent.
Ported the win32 implementations of gmtime_r,
localtime_r, and gettimeofday to be part of the
posix compatibility layer, and fixed
git_signature_now to use them.
The goal of this work is to rewrite git_status_file to use the
same underlying code as git_status_foreach.
This is done in 3 phases:
1. Extend iterators to allow ranged iteration with start and
end prefixes for the range of file names to be covered.
2. Improve diff so that when there is a pathspec and there is
a common non-wildcard prefix of the pathspec, it will use
ranged iterators to minimize excess iteration.
3. Rewrite git_status_file to call git_status_foreach_ext
with a pathspec that covers just the one file being checked.
Since ranged iterators underlie the status & diff implementation,
this is actually fairly efficient. The workdir iterator does
end up loading the contents of all the directories down to the
single file, which should ideally be avoided, but it is pretty
good.
From the description of git_revwalk_reset in revwalk.h the function should
clear all pushed and hidden commits, and leave the walker in a blank state (just like at creation).
Apparently everything gets reseted appart of pushed commits (walk->one and walk->twos)
This fix should reset the walker properly.