In order to be loaded, a remote needs to be configured with at least a `url` or a `pushurl`.
ENOTFOUND will be returned when trying to git_remote_load() a remote with neither of these entries defined.
This loads SRWLock APIs at runtime and in their absence (i.e. on
Windows before Vista) falls back on a regular CRITICAL_SECTION
that will not permit concurrent readers.
9e9aee6 added an include of <netinet/in.h> to fix the build on FreeBSD.
Since then the same header has come to be included under #ifndef _WIN32,
so remove the duplicate include.
This converts an internal lock from a write lock to a read lock
where write isn't needed, and also clarifies some doc things about
where various locks are acquired and how various APIs are intended
to be used.
This adds thread safety to the refdb_fs by using the new
git_sortedcache object and also by relaxing the handling of some
filesystem errors where the fs may be changed out from under us.
This also adds some new threading tests that hammer on the refdb.
The refdb_fs implementation calls realloc directly on a reference
object when it wants to rename it. It is not a public object, so
this doesn't mess with the immutability of references, but it does
assume certain constraints on the reference representation. This
commit wraps that assumption in a dedicated internal API to isolate it.
This adds a convenient new data type for caching the contents of
a file in memory when each item in that file corresponds to a name
and you need to both be able to look up items by name and iterate
over them in some sorted order. The new data type has locks in
place to manage usage in a threaded environment.
If there were symbolic refs among the loose refs then the code
to create packed-refs would fail trying to parse the OID out of
them (where Git just skips trying to pack them). This fixes it.
When a git_buf contains a UTF-8 BOM, the three bytes comprising
that BOM are treated as unprintable characters. For a small git_buf,
the three BOM characters overwhelm the printable characters. This
is problematic when trying to check out a small file as the CR/LF
filtering will not apply.
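
A minimal sketch of the idea, not the actual git_buf code: skip a leading
UTF-8 BOM before counting printable and non-printable bytes. The names
`utf8_bom_len` and `looks_like_text`, the ASCII-only notion of
"printable", and the 1/128 ratio are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of a leading UTF-8 BOM (EF BB BF), or 0. */
static size_t utf8_bom_len(const unsigned char *data, size_t len)
{
	static const unsigned char bom[3] = { 0xEF, 0xBB, 0xBF };
	return (len >= 3 && memcmp(data, bom, 3) == 0) ? 3 : 0;
}

/*
 * Hypothetical text heuristic: returns 1 if the buffer looks like text.
 * The BOM is skipped first so its three bytes are not counted as
 * non-printable. Treating every byte >= 0x80 as non-printable and the
 * 1/128 ratio are assumptions of this sketch.
 */
static int looks_like_text(const unsigned char *data, size_t len)
{
	size_t i, printable = 0, nonprintable = 0;

	assert(data != NULL || len == 0);

	for (i = utf8_bom_len(data, len); i < len; i++) {
		unsigned char c = data[i];

		if (c == 0)
			return 0; /* a NUL byte is treated as binary */
		if ((c >= 0x20 && c < 0x7F) || c == '\n' || c == '\r' || c == '\t')
			printable++;
		else
			nonprintable++;
	}

	return (printable >> 7) >= nonprintable;
}
```

With the BOM skipped, a tiny file such as a BOM followed by "hi\n" is
classified as text, so CR/LF filtering would still be considered for it.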
p_inet_pton on Windows should set errno properly for callers.
Rewrite p_inet_pton to handle error cases correctly and add
test cases to exercise this function.
Report the index being locked with its own error code in order to be
able to differentiate, as a locked index is typically the result of a
crashed process or concurrent access, both of which often require user
intervention to fix.
If none of the backends support direct writes and we must stream the
whole file, we already know what the object's id should be; so use the
stream's functions directly, bypassing the frontend's hashing and
overwriting of our existing id.
The frontend is in charge of calculating the id of the objects. Thus
the backends should treat it as a read-only value. The positioning in
the function signature made it seem as though it was an output
parameter.
Make the id const and move it from the front to behind the subject
(backend or stream).
When dealing with a chain of tags, we need to enqueue each of them
individually, which means we can't use `git_tag_peel` as that jumps
over the intermediate tags.
Do the peeling manually so we can look at each object and take the
appropriate action.
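
The manual peel can be sketched roughly as follows. The object model
here (`struct obj`, `peel_visiting_each`, `record_id`) is hypothetical
and stands in for the real tag and object types; the point is only that
every intermediate tag is visited instead of being jumped over.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory object model; these are not libgit2 types. */
enum obj_type { OBJ_COMMIT, OBJ_TAG };

struct obj {
	enum obj_type type;
	const struct obj *target; /* only meaningful for OBJ_TAG */
	int id;
};

struct id_list {
	int ids[8];
	size_t len;
};

/* Sample visitor: remember the id of each object we are shown. */
static void record_id(const struct obj *o, void *payload)
{
	struct id_list *list = payload;

	if (list->len < sizeof(list->ids) / sizeof(list->ids[0]))
		list->ids[list->len++] = o->id;
}

/*
 * Walk a chain of tags by hand, calling `visit` for every object on the
 * way: each intermediate tag and the final non-tag object. Returns the
 * number of objects visited.
 */
static size_t peel_visiting_each(const struct obj *start,
	void (*visit)(const struct obj *, void *), void *payload)
{
	const struct obj *cur = start;
	size_t n = 0;

	assert(start != NULL && visit != NULL);

	while (cur != NULL) {
		visit(cur, payload);
		n++;
		if (cur->type != OBJ_TAG)
			break;
		cur = cur->target;
	}

	return n;
}
```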
Hash the data as it's coming into the stream and tell the backend what
its name is when finalizing the write. This makes it consistent with
the way a plain git_odb_write() performs the write.
This is in preparation for moving the hashing to the frontend, which
requires us to handle the incoming data before passing it to the
backend's stream.
This fixes a small memory leak in git_revparse where early returns on
errors from git_revparse_single cause a free() on the (reallocated) left
side of the revspec to be skipped.
Accept any value for the remote's url, including an empty string which
we used to reject as invalid configuration.
This is not quite what git does (although it has its own problems with
such configurations) and it makes it harder to fix the issue, by not
letting the user modify it.
As we already need to check for a valid URL when we try to connect to
the network, let that perform the check, as we don't need to do it
anywhere else.
This reverts refactoring done in 13224ea4aa
that introduces a performance regression for NFS when reading files that
don't exist. open() forces a cache invalidation on NFS, while stat()ing a
file just uses the cache and is very quick.
To give a specific example, say you have a repo with a thousand packed
refs. Before this change, looking up every single one would incur a thousand
slow open() calls. With this change, it's a thousand fast stat() calls.
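
A rough sketch of the stat-before-open pattern described above, using
plain POSIX calls. The helper name `open_if_exists` is hypothetical and
this is not the reverted code itself. Note that the check is inherently
racy: the file may appear or vanish between stat() and open().

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: open `path` read-only only if stat() reports a
 * regular file. Returns a file descriptor, or -1 with errno set. For a
 * missing file we return after the stat() and never call open().
 */
static int open_if_exists(const char *path)
{
	struct stat st;

	assert(path != NULL);

	if (stat(path, &st) < 0)
		return -1; /* typically ENOENT */

	if (!S_ISREG(st.st_mode)) {
		errno = EINVAL;
		return -1;
	}

	return open(path, O_RDONLY);
}
```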
This is just a bunch of small fixes that I noticed while looking
at the UTF8 and UTF16 path stuff. It fixes a slowdown in looking
for an empty directory (not exiting loop asap), makes the dir name
in the git__DIR structure be a GIT_FLEX_ARRAY to save an allocation,
and fixes some slightly odd assumptions in the cl_getenv helper.
Key-based authentication also needs a username, so include it in each
one.
Also stop assuming a default username of "git" in the ssh transport
which has no business making such a decision.
The routines to push and pop ignore files while traversing a
directory had some issues. In particular, setting up the initial
list would sometimes push an ignore file before it ought to be
applied if the starting path was a directory containing an ignore
file. Also, the pop function was not always matching the right
part of the path and would fail to pop ignores from the list in
some cases.
This adds some tests that exercise a particular problematic case
and then fixes the problems that I could find related to this.
At some point, I'd like to isolate this ignore rule management
code and rewrite it, but that's a larger project and right now,
I'll opt to just try to fix the broken behaviors.
This rolls back the changes to fnmatch parsing from commit
2e40a60e84 except for the tests
that were added. Instead this adds a couple of new flags that can
be passed in when attempting to parse an fnmatch pattern. Also,
this changes the pathspec match logic to special case matching a
filename with a '!' prefix against a negative pattern.
This fixes the build.
`git_config_set_string(config, "config.section", "")` fails when
escaping the value.
The buffer in `escape_value` is allocated without room for a NUL
terminator, and for an empty string a size of 0 is passed to `git_buf_grow`.
`git_buf_detach` returns NULL when the allocated size is 0, which
leads to an error return from the `GITERR_CHECK_ALLOC` called after
`escape_value`.
The change in `config_file.c` was suggested by Russell Belfer <rb@github.com>
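
To illustrate the failure mode without reproducing the git_buf code, here
is a sketch of an escaper that always reserves one byte for the
terminator, so an empty input yields a valid empty string rather than
NULL. The function name `escape_value_sketch` and the set of escaped
characters are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical escaper: returns a malloc'd, NUL-terminated copy of `in`
 * with newline, tab, double quote and backslash escaped. Returns NULL
 * only on allocation failure, so "" is not mistaken for an error.
 */
static char *escape_value_sketch(const char *in)
{
	size_t len, i, o = 0;
	char *out;

	assert(in != NULL);

	len = strlen(in);
	/* worst case every byte is escaped, plus one byte for the NUL */
	out = malloc(len * 2 + 1);
	if (out == NULL)
		return NULL;

	for (i = 0; i < len; i++) {
		switch (in[i]) {
		case '\n': out[o++] = '\\'; out[o++] = 'n'; break;
		case '\t': out[o++] = '\\'; out[o++] = 't'; break;
		case '"':  out[o++] = '\\'; out[o++] = '"'; break;
		case '\\': out[o++] = '\\'; out[o++] = '\\'; break;
		default:   out[o++] = in[i]; break;
		}
	}
	out[o] = '\0';

	return out;
}
```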
New functions in struct git_config_backend:
* iterator_new(...)
* iterator_free(...)
* next(...)
The old callback-based foreach style can still be used with `git_config_backend_foreach_match`.
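
How a callback-style foreach can sit on top of an iterator is sketched
below. The types and names (`struct entry`, `struct iter`, `iter_next`,
`foreach_via_iter`, `SKETCH_ITEROVER`) are hypothetical stand-ins and
not the git_config_backend signatures.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ITEROVER (-31) /* stand-in for an "iteration finished" code */

/* Hypothetical entry and iterator types. */
struct entry {
	const char *name;
	const char *value;
};

struct iter {
	const struct entry *entries;
	size_t count;
	size_t pos;
};

typedef int (*entry_cb)(const struct entry *entry, void *payload);

/* Hand out the next entry, or SKETCH_ITEROVER when none are left. */
static int iter_next(struct iter *it, const struct entry **out)
{
	if (it->pos >= it->count)
		return SKETCH_ITEROVER;
	*out = &it->entries[it->pos++];
	return 0;
}

/* Callback-style foreach built purely from iter_next(). */
static int foreach_via_iter(struct iter *it, entry_cb cb, void *payload)
{
	const struct entry *e;
	int rc;

	assert(it != NULL && cb != NULL);

	while ((rc = iter_next(it, &e)) == 0) {
		rc = cb(e, payload);
		if (rc != 0)
			return rc; /* the callback asked us to stop */
	}

	return rc == SKETCH_ITEROVER ? 0 : rc;
}

/* Sample callback: count the entries we are shown. */
static int count_entry(const struct entry *entry, void *payload)
{
	(void)entry;
	(*(size_t *)payload)++;
	return 0;
}
```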
This step is needed to easily add iterators to git_config_backend
Also use these new git_strmap functions to implement foreach:
* git_strmap_iter
* git_strmap_has_data(...)
* git_strmap_begin(...)
* git_strmap_end(...)
* git_strmap_next(...)
In git_diff_paired_foreach, temporarily re-sort the
index->workdir diff list by index path so that we can
track a rename in the workdir from head->index->workdir.
When using a rename source that is actually a to-be-split record,
we have to update the best-fit mapping data in both the case where
the target is also a split record and the case where the target
is a simple added record. Before this commit, we were only doing
the update when the target was itself a split record (and even in
that case, the test was slightly wrong).
After doing further profiling, I found that a lot of time was
being spent attempting to insert hashes into the file hash
signature when using the rolling hash because the rolling hash
approach generates a hash per byte of the file instead of one
per run/line of data.
To optimize this, I decided to convert back to a run-based file
signature algorithm which would be more like core Git.
After changing this, a number of the existing tests started to
fail. In some cases, this appears to have been because the test
was coded to be too specific to the particular results of the file
similarity metric and in some cases there appear to have been bugs
in the core rename detection code where only by the coincidence
of the file similarity scoring were the expected results being
generated.
This renames all the variables in the core rename detection code
to be more consistent and hopefully easier to follow which made it
a bit easier to reason about the behavior of that code and fix the
problems that I was seeing. I think it's in better shape now.
There are a couple of tests now that attempt to stress test the
rename detection code and they are quite slow. Most of the time
is spent setting up the test data on disk and in the index. When
we roll out performance improvements for index insertion, it
should also speed up these tests I hope.
The size data in the index may not reflect the actual size of the
blob data from the ODB when content filtering comes into play.
This commit fixes rename detection to use the actual blob size when
calculating data signatures instead of the value from the index.
Because of a misunderstanding on my part, I first converted the
git_index_add_bypath API to use the post-filtered blob data size
in creating the index entry. I backed that change out, but I
kept the overall refactoring of that routine and the new internal
git_blob__create_from_paths API because it eliminates an extra
stat() call from the code that adds a file to the index.
The existing tests actually cover this code path, at least when
running on Windows, so at this point I'm not adding new tests to
cover the changes.
The previous fix for checking file sizes with rename detection
always loads the blob. In this version, if the odb backend can
get the object header without loading the whole thing into memory,
then we'll just use that, so that we can eliminate possible rename
sources & targets without loading them.
The performance improvements I introduced for rename detection
were not able to run successfully for tree-to-tree diffs because
the blob size was not known early enough and so the file signature
always had to be calculated nonetheless.
This change separates loading blobs into memory from calculating
the signature. I can't avoid having to load the large blobs into
memory, but by moving it forward, I'm able to avoid the signature
calculation if the blob won't come into play for renames.
This restores the usage of GIT_DIFF_LINE_BINARY for the diff
output line that reads "Binary files x and y differ" so that it
can be optionally colorized independently of the file header.
This allows git_diff_patch_size to account for hunk headers and
file headers in the returned size. This required some refactoring
of the code that is used to print file headers so that it could be
invoked by the git_diff_patch_size API.
Also this increases the test coverage and fixes an off-by-one bug
in the size calculation when newline changes happen at the end of
the file.
Instead of using lots of strdup calls, this adds a memory pool to
the loose refs iteration code and uses it for keeping track of the
loose refs array. Memory usage could probably be reduced even
further by eliminating the vector and just scanning by adding the
strlen of each ref, but that would be a more intrusive change.
This also updates the error handling to be more thorough about
checking for failed allocations, etc.
The git_reference_next API silently skips invalid references when
scanning the loose refs. The git_reference_next_name API should
skip the same ones even though it isn't creating the reference
object.
This adds a test with an invalid loose reference and makes sure
that both APIs skip the same entries and generate the same results.
This makes git__swap use the __sync_lock_test_and_set primitive
with GCC and the InterlockedExchangePointer primitive with MSVC.
Previously it used compare_and_swap in a way that was probably
unintuitive for most people (i.e. it could fail to swap in the
value if another thread raced in). Now it will always succeed
and the last thread to run in a race will win instead of the
first thread.
This also fixes up a little confusion between volatile void **
and void * volatile * that came up with the Win32 compiler.
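
The difference between the two primitives can be sketched with the GCC
builtins alone (the MSVC side is omitted). The wrapper names `swap_ptr`
and `cas_ptr` are hypothetical; only the `__sync_*` builtins are real.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Unconditional exchange: always stores `newval` and returns whatever
 * was in the slot before. In a race the last writer wins.
 */
static void *swap_ptr(void * volatile *ptr, void *newval)
{
	assert(ptr != NULL);
	return __sync_lock_test_and_set(ptr, newval);
}

/*
 * Compare-and-swap, for contrast: stores `newval` only if the slot
 * still holds `expected`. Returns nonzero if the store happened, so a
 * caller with a stale expectation silently fails to swap.
 */
static int cas_ptr(void * volatile *ptr, void *expected, void *newval)
{
	assert(ptr != NULL);
	return __sync_bool_compare_and_swap(ptr, expected, newval);
}
```

Note the parameter type: `void * volatile *` is a pointer to a volatile
pointer slot, whereas `volatile void **` would be a pointer to a plain
pointer to volatile data.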
This restores a behavior that was accidentally lost during some
diff refactoring where an untracked directory that contains a .git
item should be treated as IGNORED, not as UNTRACKED. The submodule
code already detects this, but the diff code was not handling the
scenario right.
This also updates a number of existing tests that were actually
exercising the behavior but did not have the right expectations in
place. It actually makes the new
`test_diff_submodules__diff_ignore_options` test feel much better
because the "not-a-submodule" entries are now ignored instead of
showing up as untracked items.
Fixes #1697
This adds correct support for an equivalent to --ignore-submodules
in diff, where an actual ignore value can be passed to diff to
override the per submodule settings in the configuration.
This required tweaking the constants for ignore values so that
zero would not be used and could represent an unset option to the
diff. This was an opportunity to move the submodule values into
include/git2/types.h and to rename the poorly named DEFAULT values
for ignore and update constants to RESET instead.
Now the GIT_DIFF_IGNORE_SUBMODULES flag is exactly the same as
setting the ignore_submodules option to GIT_SUBMODULE_IGNORE_ALL
(which is actually a minor change from the old behavior in that
submodules will now be treated as UNMODIFIED deltas instead of
being left out totally - if you set GIT_DIFF_INCLUDE_UNMODIFIED).
This includes tests for the various new settings.
Submodules now expose an internal status API that allows diff to
get back the OID values from the submodule very easily and also
to avoid caching issues and to override the ignore setting for
the submodule.
This fixes the way that submodule status is checked to bypass just
about all of the caching in the submodule object. Based on the
ignore value, it will try to do the minimum work necessary to find
the current status of the submodule - but it will actually go to
disk to get all of the current values.
This also removes the custom refcounting stuff in favor of the
common git_refcount style. Right now, it is still for internal
purposes only, but it should make it easier to add true submodule
refcounting in the future with a public git_submodule_free call
that will allow bindings not to worry about the submodule object
getting freed from underneath them.
This adds a BARE option to git_repository_open_ext which allows
a fast open path that still knows how to read gitlinks and to
search for the actual .git directory from a subdirectory.
`git_repository_open_bare` is still simpler and faster, but having
a gitlink aware fast open is very useful for submodules where we
want to quickly be able to peek at the HEAD and index data without
doing any other meaningful repo operations.
This is probably not the final form of this change, but this is
a preliminary version of checking a timestamp to see if the cached
working directory HEAD OID matches the current. Right now, this
uses the timestamp on the index and is, like most of our timestamp
checking, subject to having only second accuracy.
This adds an additional pathspec API that will match a pathspec
against a diff object. This is convenient if you want to handle
renames (so you need the whole diff and can't use the pathspec
constraint built into the diff API) but still want to tell if the
diff had any files that matched the pathspec.
When the pathspec is matched against a diff, instead of keeping
a list of filenames that matched, the API keeps the list
of git_diff_deltas that matched and they can be retrieved via a
new API git_pathspec_match_list_diff_entry.
There are a couple of other minor API extensions here that were
mostly for the sake of convenience and to reduce dependencies
on knowing the internal data structure between files inside the
library.
This is a simple bit vector object that is not resizable after
the initial allocation but can be of arbitrary size. It will
keep the bit vector entirely on the stack for vectors 64 bits
or less, and will allocate the vector on the heap for larger
sizes. The API is uniform regardless of storage location.
This is very basic right now and all the APIs are inline functions,
but it is useful for storing an array of boolean values.
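
A minimal sketch of such a fixed-size bit vector, assuming hypothetical
names (`bitvec`, `bitvec_init`, and so on) rather than the ones used in
the library. Vectors of up to 64 bits live inline in the struct; larger
ones get a heap allocation. Callers must keep bit indexes below the
capacity they asked for.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
	size_t nwords;        /* 0 means the inline word is in use */
	union {
		uint64_t bits;    /* inline storage for up to 64 bits */
		uint64_t *words;  /* heap storage for larger vectors */
	} u;
} bitvec;

/* Returns 0 on success, -1 if the heap allocation fails. */
static int bitvec_init(bitvec *bv, size_t capacity)
{
	assert(bv != NULL);
	memset(bv, 0, sizeof(*bv));

	if (capacity > 64) {
		bv->nwords = (capacity + 63) / 64;
		bv->u.words = calloc(bv->nwords, sizeof(uint64_t));
		if (bv->u.words == NULL)
			return -1;
	}
	return 0;
}

/* Locate the 64-bit word that holds `bit`, wherever it is stored. */
static uint64_t *bitvec_word(bitvec *bv, size_t bit)
{
	return bv->nwords ? &bv->u.words[bit / 64] : &bv->u.bits;
}

static void bitvec_set(bitvec *bv, size_t bit, int on)
{
	uint64_t mask = (uint64_t)1 << (bit % 64);
	uint64_t *word = bitvec_word(bv, bit);

	if (on)
		*word |= mask;
	else
		*word &= ~mask;
}

static int bitvec_get(bitvec *bv, size_t bit)
{
	return (int)((*bitvec_word(bv, bit) >> (bit % 64)) & 1);
}

static void bitvec_free(bitvec *bv)
{
	if (bv->nwords)
		free(bv->u.words);
	memset(bv, 0, sizeof(*bv));
}
```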
This converts the array of parent SHAs from a git_vector where
each SHA has to be separately allocated to a git_array_t where
all the SHAs can be kept in one block. Since the two collections
have almost identical APIs, there isn't much involved in making
the change. I did add an API to git_array_t so that it could be
allocated at a precise initial size.
This fixes the way the example log program decides if a merge
commit should be shown when a pathspec is given. Also makes it
easier to use the pathspec API to just check "does a tree match
anything in the pathspec" without allocating a match list.
This adds a new public API for compiling pathspecs and matching
them against the working directory, the index, or a tree from the
repository. This also reworks the pathspec internals to allow the
sharing of code between the existing internal usage of pathspec
matching and the new external API.
While this is working and the new API is ready for discussion, I
think there is still an incorrect behavior in which patterns are
always matched against the full path of an entry without taking
the subdirectories into account (so "s*" will match "subdir/file"
even though it wouldn't with core Git). Further enhancements are
coming, but this was a good place to take a functional snapshot.
The SSH error checking and reporting could still be further
improved by using the libssh2 native methods to get error info,
but at least this ensures that all error codes are checked and
translated into libgit2 error messages.
If there was no error, the return value was always the return value
of the last call to file->get_multivar.
With this commit, GIT_ENOTFOUND is only returned if all the calls to
file->get_multivar return GIT_ENOTFOUND.
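
The intended aggregation can be sketched as below. Everything here is a
stand-in: `SKETCH_ENOTFOUND` replaces GIT_ENOTFOUND, and `lookup_fn`,
`lookup_all`, `lookup_hit` and `lookup_miss` are hypothetical names, not
the config file interface.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ENOTFOUND (-3) /* stand-in for GIT_ENOTFOUND */

typedef int (*lookup_fn)(const char *name);

/* Sample lookups used to exercise the aggregation. */
static int lookup_hit(const char *name)  { (void)name; return 0; }
static int lookup_miss(const char *name) { (void)name; return SKETCH_ENOTFOUND; }

/*
 * Query every source. A real error stops the loop and is returned as
 * is; "not found" is only reported when no source had the value, no
 * matter which source happened to be asked last.
 */
static int lookup_all(const lookup_fn *fns, size_t n, const char *name)
{
	size_t i;
	int found = 0;

	assert(fns != NULL || n == 0);

	for (i = 0; i < n; i++) {
		int rc = fns[i](name);

		if (rc == 0)
			found = 1;
		else if (rc != SKETCH_ENOTFOUND)
			return rc;
	}

	return found ? 0 : SKETCH_ENOTFOUND;
}
```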
This makes all of the credential objects use the same pattern to
clear the contents and call git__memzero when done. Much of this
information is probably not sensitive, but it also seems better
to just clear consistently.
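
One common way to write a zeroing routine that the compiler is less
likely to optimize away is to store through a volatile pointer. This is
a sketch of that general technique, not necessarily how git__memzero is
implemented; `memzero_sketch` and `struct cred_sketch` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Overwrite `size` bytes with zeros through a volatile pointer. */
static void memzero_sketch(void *data, size_t size)
{
	volatile unsigned char *p = data;

	while (size--)
		*p++ = 0;
}

/* Hypothetical credential with inline storage, to keep the sketch small. */
struct cred_sketch {
	char username[32];
	char password[32];
};

/* Clear every field the same way, sensitive or not. */
static void cred_sketch_clear(struct cred_sketch *cred)
{
	assert(cred != NULL);
	memzero_sketch(cred, sizeof(*cred));
}
```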
Much of the SSH credential creation API can be left enabled even
on platforms with no SSH support. We really just have to give an
error when you attempt to open the SSH connection.
The diff hunk context string that is returned to xdiff need not
be NUL terminated because the xdiff code just copies the number of
bytes that you report directly into the output. There was an
off-by-one error in the diff driver code: when the header context was
longer than the output buffer size, the output buffer length included
the NUL byte, which was then copied into the hunk header.
Fixes #1710
This option serves no benefit now that the git_status_list API
is available. It was of questionable value before and now it
would just be a bad idea to use it rather than the indexed API.
The index isn't really thread safe for the most part, but we can
easily be more careful and avoid double frees and the like, which
are serious problems (as opposed to a lookup which might return
the incorrect value, but if the index is being updated, that is
much harder to avoid).
In both of these cases, the submodule data should still be loaded
just (obviously) without the data that comes from either the index
or the HEAD.
This fixes a bug in the orphaned head case.
There was a bug where submodules whose HEAD had not been moved
were being marked as having an UNMODIFIED delta record instead
of being left MODIFIED. This fixes that and fixes the tests to
notice if a submodule has been incorrectly marked as UNMODIFIED.