This simplifies the git_repository_is_empty a bit so that a
detached HEAD is just taken to mean the repo is not empty, since
a newly initialized repo will not have a detached HEAD.
Ensure that we apply splits to rewrites, even if we're not
interested in examining it closely for rename/copy detection.
In keeping with core git, status should not display rewrites,
it should simply show files as "modified".
In order to be loaded, a remote needs to be configured with at least a `url` or a `pushurl`.
ENOTFOUND will be returned when trying to git_remote_load() a remote with neither of these entries defined.
This loads SRWLock APIs at runtime and in their absence (i.e. on
Windows before Vista) falls back on a regular CRITICAL_SECTION
that will not permit concurrent readers.
9e9aee6 added an include <netinet/in.h> to fix the build on FreeBSD.
Sometime since then the same header is included ifndef _WIN32, so
remove the duplicate include.
This converts an internal lock from a write lock to a read lock
where write isn't needed, and also clarifies some doc things about
where various locks are acquired and how various APIs are intended
to be used.
This adds thread safety to the refdb_fs by using the new
git_sortedcache object and also by relaxing the handling of some
filesystem errors where the fs may be changed out from under us.
This also adds some new threading tests that hammer on the refdb.
The refdb_fs implementation calls realloc directly on a reference
object when it wants to rename it. It is not a public object, so
this doesn't mess with the immutability of references, but it does
assume certain constraints on the reference representation. This
commit wraps that assumption in an isolated API to isolate it.
This adds a convenient new data type for caching the contents of
file in memory when each item in that file corresponds to a name
and you need to both be able to lookup items by name and iterate
over them in some sorted order. The new data type has locks in
place to manage usage in a threaded environment.
If there were symbolic refs among the loose refs then the code
to create packed-refs would fail trying to parse the OID out of
them (where Git just skips trying to pack them). This fixes it.
When a git_buf contains a UTF-8 BOM, the three bytes comprising
that BOM are treated as unprintable characters. For a small git_buf,
the three BOM characters overwhelm the printable characters. This
is problematic when trying to check out a small file as the CR/LF
filtering will not apply.
p_inet_pton on Windows should set errno properly for callers.
Rewrite p_inet_pton to handle error cases correctly and add
test cases to exercise this function.
Report the index being locked with its own error code in order to be
able to differentiate, as a locked index is typically the result of a
crashed process or concurrent access, both of which often require user
intervention to fix.
If none of the backends support direct writes and we must stream the
whole file, we already know what the object's id should be; so use the
stream's functions directly, bypassing the frontend's hashing and
overwriting of our existing id.
The frontend is in charge of calculating the id of the objects. Thus
the backends should treat it as a read-only value. The positioning in
the function signature made it seem as though it was an output
parameter.
Make the id const and move it from the front to behind the subject
(backend or stream).
When dealing with a chain of tags, we need to enqueue each of them
individually, which means we can't use `git_tag_peel` as that jumps
over the intermediate tags.
Do the peeling manually so we can look at each object and take the
appropriate action.
Hash the data as it's coming into the stream and tell the backend what
its name is when finalizing the write. This makes it consistent with
the way a plain git_odb_write() performs the write.
This is in preparation for moving the hashing to the frontend, which
requires us to handle the incoming data before passing it to the
backend's stream.
This fixes a small memory leak in git_revparse where early returns on
errors from git_revparse_single cause a free() on the (reallocated) left
side of the revspec to be skipped.
Accept any value for the remote's url, including an empty string which
we used to reject as invalid configuration.
This is not quite what git does (although it has its own problems with
such configurations) and it makes it harder to fix the issue, by not
letting the user modify it.
As we already need to check for a valid URL when we try to connect to
the network, let that perform the check, as we don't need to do it
anywhere else.
This reverts refactoring done in 13224ea4aa
that introduces a performance regression for NFS when reading files that
don't exist. open() forces a cache invalidation on NFS, while stat()ing a
file just uses the cache and is very quick.
To give a specific example, say you have a repo with a thousand packed
refs. Before this change, looking up every single one ould incur a thousand
slow open() calls. With this change, it's a thousand fast stat() calls.
This is just a bunch of small fixes that I noticed while looking
at the UTF8 and UTF16 path stuff. It fixes a slowdown in looking
for an empty directory (not exiting loop asap), makes the dir name
in the git__DIR structure be a GIT_FLEX_ARRAY to save an allocation,
and fixes some slightly odd assumptions in the cl_getenv helper.
Key-based authentication also needs an username, so include it in each
one.
Also stop assuming a default username of "git" in the ssh transport
which has no business making such a decision.
The routines to push and pop ignore files while traversing a
directory had some issues. In particular, setting up the initial
list would sometimes push an ignore file before it ought to be
applied if the starting path was a directory containing an ignore
file. Also, the pop function was not always matching the right
part of the path and would fail to pop ignores from the list in
some cases.
This adds some tests that exercise a particular problematic case
and then fixes the problems that I could find related to this.
At some point, I'd like to isolate this ignore rule management
code and rewrite it, but that's a larger project and right now,
I'll opt to just try to fix the broken behaviors.
This rolls back the changes to fnmatch parsing from commit
2e40a60e84 except for the tests
that were added. Instead this adds couple of new flags that can
be passed in when attempting to parse an fnmatch pattern. Also,
this changes the pathspec match logic to special case matching a
filename with a '!' prefix against a negative pattern.
This fixes the build.
`git_config_set_string(config, "config.section", "")` fails when
escaping the value.
The buffer in `escape_value` is allocated without NULL-termination. And
in case of empty string 0 is passed for buffer size in `git_buf_grow`.
`git_buf_detach` returns NULL when the allocated size is 0 and that
leads to an error return in `GITERR_CHECK_ALLOC` called after
`escape_value`
The change in `config_file.c` was suggested by Russell Belfer <rb@github.com>
new functions in struct git_config_backend:
* iterator_new(...)
* iterator_free(...)
* next(...)
The old callback based foreach style can still be used with `git_config_backend_foreach_match`
This step is needed to easily add iterators to git_config_backend
As well use these new git_strmap functions to implement foreach
* git_strmap_iter
* git_strmap_has_data(...)
* git_strmap_begin(...)
* git_strmap_end(...)
* git_strmap_next(...)
In git_diff_paired_foreach, temporarily resort the
index->workdir diff list by index path so that we can
track a rename in the workdir from head->index->workdir.
When using a rename source that is actually a to-be-split record,
we have to update the best-fit mapping data in both the case where
the target is also a split record and the case where the target
is a simple added record. Before this commit, we were only doing
the update when the target was itself a split record (and even in
that case, the test was slightly wrong).
After doing further profiling, I found that a lot of time was
being spent attempting to insert hashes into the file hash
signature when using the rolling hash because the rolling hash
approach generates a hash per byte of the file instead of one
per run/line of data.
To optimize this, I decided to convert back to a run-based file
signature algorithm which would be more like core Git.
After changing this, a number of the existing tests started to
fail. In some cases, this appears to have been because the test
was coded to be too specific to the particular results of the file
similarity metric and in some cases there appear to have been bugs
in the core rename detection code where only by the coincidence
of the file similarity scoring were the expected results being
generated.
This renames all the variables in the core rename detection code
to be more consistent and hopefully easier to follow which made it
a bit easier to reason about the behavior of that code and fix the
problems that I was seeing. I think it's in better shape now.
There are a couple of tests now that attempt to stress test the
rename detection code and they are quite slow. Most of the time
is spent setting up the test data on disk and in the index. When
we roll out performance improvements for index insertion, it
should also speed up these tests I hope.
The size data in the index may not reflect the actual size of the
blob data from the ODB when content filtering comes into play.
This commit fixes rename detection to use the actual blob size when
calculating data signatures instead of the value from the index.
Because of a misunderstanding on my part, I first converted the
git_index_add_bypath API to use the post-filtered blob data size
in creating the index entry. I backed that change out, but I
kept the overall refactoring of that routine and the new internal
git_blob__create_from_paths API because it eliminates an extra
stat() call from the code that adds a file to the index.
The existing tests actually cover this code path, at least when
running on Windows, so at this point I'm not adding new tests to
cover the changes.
The previous fix for checking file sizes with rename detection
always loads the blob. In this version, if the odb backend can
get the object header without loading the whole thing into memory,
then we'll just use that, so that we can eliminate possible rename
sources & targets without loading them.
The performance improvements I introduced for rename detection
were not able to run successfully for tree-to-tree diffs because
the blob size was not known early enough and so the file signature
always had to be calculated nonetheless.
This change separates loading blobs into memory from calculating
the signature. I can't avoid having to load the large blobs into
memory, but by moving it forward, I'm able to avoid the signature
calculation if the blob won't come into play for renames.
This restores the usage of GIT_DIFF_LINE_BINARY for the diff
output line that reads "Binary files x and y differ" so that it
can be optionally colorized independently of the file header.
This allows git_diff_patch_size to account for hunk headers and
file headers in the returned size. This required some refactoring
of the code that is used to print file headers so that it could be
invoked by the git_diff_patch_size API.
Also this increases the test coverage and fixes an off-by-one bug
in the size calculation when newline changes happen at the end of
the file.
Instead of using lots of strdup calls, this adds a memory pool to
the loose refs iteration code and uses it for keeping track of the
loose refs array. Memory usage could probably be reduced even
further by eliminating the vector and just scanning by adding the
strlen of each ref, but that would be a more intrusive changes.
This also updates the error handling to be more thorough about
checking for failed allocations, etc.
The git_reference_next API silently skips invalid references when
scanning the loose refs. The git_reference_next_name API should
skip the same ones even though it isn't creating the reference
object.
This adds a test with a an invalid loose reference and makes sure
that both APIs skip the same entries and generate the same results.
This makes git__swap use the __sync_lock_test_and_set primitive
with GCC and the InterlockedExchangePointer primitive with MSVC.
Previously is used compare_and_swap in a way that was probably
unintuitive for most thinking (i.e. it could fail to swap in the
value if another thread raced in). Now it will always succeed
and the last thread to run in a race will win instead of the
first thread.
This also fixes up a little confusion between volatile void **
and void * volatile * that came up with the Win32 compiler.
This restores a behavior that was accidentally lost during some
diff refactoring where an untracked directory that contains a .git
item should be treated as IGNORED, not as UNTRACKED. The submodule
code already detects this, but the diff code was not handling the
scenario right.
This also updates a number of existing tests that were actually
exercising the behavior but did not have the right expectations in
place. It actually makes the new
`test_diff_submodules__diff_ignore_options` test feel much better
because the "not-a-submodule" entries are now ignored instead of
showing up as untracked items.
Fixes#1697
This adds correct support for an equivalent to --ignore-submodules
in diff, where an actual ignore value can be passed to diff to
override the per submodule settings in the configuration.
This required tweaking the constants for ignore values so that
zero would not be used and could represent an unset option to the
diff. This was an opportunity to move the submodule values into
include/git2/types.h and to rename the poorly named DEFAULT values
for ignore and update constants to RESET instead.
Now the GIT_DIFF_IGNORE_SUBMODULES flag is exactly the same as
setting the ignore_submodules option to GIT_SUBMODULE_IGNORE_ALL
(which is actually a minor change from the old behavior in that
submodules will now be treated as UNMODIFIED deltas instead of
being left out totally - if you set GIT_DIFF_INCLUDE_UNMODIFIED).
This includes tests for the various new settings.
Submodules now expose an internal status API that allows diff to
get back the OID values from the submodule very easily and also
to avoiding caching issues and to override the ignore setting for
the submodule.
This fixes the way that submodule status is checked to bypass just
about all of the caching in the submodule object. Based on the
ignore value, it will try to do the minimum work necessary to find
the current status of the submodule - but it will actually go to
disk to get all of the current values.
This also removes the custom refcounting stuff in favor of the
common git_refcount style. Right now, it is still for internal
purposes only, but it should make it easier to add true submodule
refcounting in the future with a public git_submodule_free call
that will allow bindings not to worry about the submodule object
getting freed from underneath them.
This adds a BARE option to git_repository_open_ext which allows
a fast open path that still knows how to read gitlinks and to
search for the actual .git directory from a subdirectory.
`git_repository_open_bare` is still simpler and faster, but having
a gitlink aware fast open is very useful for submodules where we
want to quickly be able to peek at the HEAD and index data without
doing any other meaningful repo operations.
This is probably not the final form of this change, but this is
a preliminary version of checking a timestamp to see if the cached
working directory HEAD OID matches the current. Right now, this
uses the timestamp on the index and is, like most of our timestamp
checking, subject to having only second accuracy.
This adds an additional pathspec API that will match a pathspec
against a diff object. This is convenient if you want to handle
renames (so you need the whole diff and can't use the pathspec
constraint built into the diff API) but still want to tell if the
diff had any files that matched the pathspec.
When the pathspec is matched against a diff, instead of keeping
a list of filenames that matched, instead the API keeps the list
of git_diff_deltas that matched and they can be retrieved via a
new API git_pathspec_match_list_diff_entry.
There are a couple of other minor API extensions here that were
mostly for the sake of convenience and to reduce dependencies
on knowing the internal data structure between files inside the
library.
This is a simple bit vector object that is not resizable after
the initial allocation but can be of arbitrary size. It will
keep the bti vector entirely on the stack for vectors 64 bits
or less, and will allocate the vector on the heap for larger
sizes. The API is uniform regardless of storage location.
This is very basic right now and all the APIs are inline functions,
but it is useful for storing an array of boolean values.
This converts the array of parent SHAs from a git_vector where
each SHA has to be separately allocated to a git_array_t where
all the SHAs can be kept in one block. Since the two collections
have almost identical APIs, there isn't much involved in making
the change. I did add an API to git_array_t so that it could be
allocated at a precise initial size.
This fixes the way the example log program decides if a merge
commit should be shown when a pathspec is given. Also makes it
easier to use the pathspec API to just check "does a tree match
anything in the pathspec" without allocating a match list.
This adds a new public API for compiling pathspecs and matching
them against the working directory, the index, or a tree from the
repository. This also reworks the pathspec internals to allow the
sharing of code between the existing internal usage of pathspec
matching and the new external API.
While this is working and the new API is ready for discussion, I
think there is still an incorrect behavior in which patterns are
always matched against the full path of an entry without taking
the subdirectories into account (so "s*" will match "subdir/file"
even though it wouldn't with core Git). Further enhancements are
coming, but this was a good place to take a functional snapshot.
The SSH error checking and reporting could still be further
improved by using the libssh2 native methods to get error info,
but at least this ensures that all error codes are checked and
translated into libgit2 error messages.
If there is not an error, the return value was always the return value
of the last call to file->get_multivar
With this commit GIT_ENOTFOUND is only returned if all the calls to
filge-get_multivar return GIT_ENOTFOUND.
This makes all of the credential objects use the same pattern to
clear the contents and call git__memzero when done. Much of this
information is probably not sensitive, but it also seems better
to just clear consistently.
Much of the SSH credential creation API can be left enabled even
on platforms with no SSH support. We really just have to give an
error when you attempt to open the SSH connection.
The diff hunk context string that is returned to xdiff need not
be NUL terminated because the xdiff code just copies the number of
bytes that you report directly into the output. There was an off
by one in the diff driver code when the header context was longer
than the output buffer size, the output buffer length included
the NUL byte which was copied into the hunk header.
Fixes#1710
This option serves no benefit now that the git_status_list API
is available. It was of questionable value before and now it
would just be a bad idea to use it rather than the indexed API.
The index isn't really thread safe for the most part, but we can
easily be more careful and avoid double frees and the like, which
are serious problems (as opposed to a lookup which might return
the incorrect value but if the index in being updated, that is
much harder to avoid).
In both of these cases, the submodule data should still be loaded
just (obviously) without the data that comes from either the index
or the HEAD.
This fixes a bug in the orphaned head case.
There was a bug where submodules whose HEAD had not been moved
were being marked as having an UNMODIFIED delta record instead
of being left MODIFIED. This fixes that and fixes the tests to
notice if a submodule has been incorrectly marked as UNMODIFIED.
In theory, p_stat should never return an S_ISLNK result, but due
to the current implementation on Windows with mount points it is
possible that it will. For now, work around that by allowing a
link in the path to a directory being created. If it is really a
problem, then the issue will be caught on the next iteration of
the loop, but typically this will be the right thing to do.
This updates the calls that make the subdirectories for objects
to use a base directory above which git_futils_mkdir won't walk
any higher. This prevents attempts to mkdir all the way up to
the root of the filesystem.
Also, this moves the objects_dir into the loose backend structure
and removes the separate allocation, plus does some preformatting
of the objects_dir value to guarantee a trailing slash, etc.
With the new target directory option to checkout, the non-bareness
of the repository should be checked much later in the parameter
validation process - actually that check was already in place, but
I was doing it redundantly in the checkout APIs.
This removes the now unnecessary early check for bare repos. It
also adds some other parameter validation and makes it so that
implied parameters can actually be passed as NULL (i.e. if you
pass a git_index, you don't have to pass the git_repository - we
can get it from index).
This adds the ability for checkout to write to a target directory
instead of having to use the working directory of the repository.
This makes it easier to do exports of repository data and the like.
This is similar to, but not quite the same as, the --prefix option
to `git checkout-index` (this will always be treated as a directory
name, not just as a simple text prefix).
As part of this, the workdir iterator was extended to take the
path to the working directory as a parameter and fallback on the
git_repository_workdir result only if it's not specified.
Fixes#1332
This fixes the checkout case when a file is modified between the
baseline and the target and yet missing in the working directory.
The logic for that case appears to have been wrong.
This also adds a useful checkout notify callback to the checkout
test helpers that will count notifications and also has a debug
mode to visualize what checkout thinks that it's doing.
Files in status will, be default, be sorted according to the case
insensitivity of the filesystem that we're running on. However,
in some cases, this is not desirable. Even on case insensitive
file systems, 'git status' at the command line will generally use
a case sensitive sort (like 'ls'). Some GUIs prefer to display a
list of file case insensitively even on case-sensitive platforms.
This adds two new flags: GIT_STATUS_OPT_SORT_CASE_SENSITIVELY
and GIT_STATUS_OPT_SORT_CASE_INSENSITIVELY that will override the
default sort order of the status output and give the user control.
This includes tests for exercising these new options and makes
the examples/status.c program emulate core Git and always use a
case sensitive sort.
This adds some tests for updating the index and having it remove
items to make sure that the iteration over the index still works
even as earlier items are removed.
In testing with valgrind, this found a path that would use the
path string from the index entry after it had been freed. The
bug fix is simply to copy the path of the index entry before
doing any actual index manipulation.
This adds three new public APIs for manipulating the index:
1. `git_index_add_all` is similar to `git add -A` and will add
files in the working directory that match a pathspec to the
index while honoring ignores, etc.
2. `git_index_remove_all` removes files from the index that match
a pathspec.
3. `git_index_update_all` updates entries in the index based on
the current contents of the working directory, either added
the new information or removing the entry from the index.
Command line Git sometimes generates an error message if given a
pathspec that contains an exact match to an ignored file (provided
--force isn't also given). This adds an internal function that
makes it easy to check it that has happened. Right now, I'm not
creating a public API for this because that would get a little
more complicated with a need for callbacks for all invalid paths.
Right now, setting up a pathspec to be parsed and processed
requires several data structures and a couple of API calls. This
adds a new high level data structure that contains all the items
that you'll need and high-level APIs that do all of the setup and
all of the teardown. This will make it easier to use pathspecs
in various places with less repeated code.
This makes the diff rename tracking code more careful about the
order in which it processes renames and more thorough in updating
the mapping of correct renames when an earlier rename update
alters the index of a later matched pair.
This adds parameters to the four functions that allow for blob-to-
blob and blob-to-buffer differencing (either via callbacks or by
making a git_diff_patch object). These parameters let you say
that filename we should pretend the blob has while doing the diff.
If you pass NULL, there should be no change from the existing
behavior, which is to skip using attributes for file type checks
and just look at content. With the parameters, you can plug into
the new diff driver functionality and get binary or non-binary
behavior, plus function context regular expressions, etc.
This commit also fixes things so that the git_diff_delta that is
generated by these functions will actually be populated with the
data that we know about the blobs (or buffers) so you can use it
appropriately. It also fixes a bug in generating patches from
the git_diff_patch objects created via these functions.
Lastly, there is one other behavior change that may matter. If
there is no difference between the two blobs, these functions no
longer generate any diff callbacks / patches unless you have
passed in GIT_DIFF_INCLUDE_UNMODIFIED. This is pretty natural,
but could potentially change the behavior of existing usage.
This changes the behavior of the status RENAMED flags so that they
will be combined with the MODIFIED flags if appropriate. If a file
is modified in the index and also renamed, then the status code
will have both the GIT_STATUS_INDEX_MODIFIED and INDEX_RENAMED bits
set. If it is renamed but the OID has not changed, then just the
GIT_STATUS_INDEX_RENAMED bit will be set. Similarly, the flags
GIT_STATUS_WT_MODIFIED and GIT_STATUS_WT_RENAMED can both be set
independently of one another.
This fixes a serious bug where the check for unmodified files that
was done at data load time could end up erasing the RENAMED state
of a file that was renamed with no changes.
Lastly, this contains a bunch of new tests for status with renames,
including tests where the only rename changes are case changes.
The expected results of these tests have to vary by whether the
platform uses a case sensitive filesystem or not, so the expected
data covers those platform differences separately.
Trees are always case sensitive. The index is always case
preserving and will be case sensitive when it is turned into a
tree. Therefore the tree and the index can and should always
be compared to one another case sensitively.
This will restore the index to case insensitive order after the
diff has been generated.
Consider this a short-term fix. The long term fix is to have the
index always stored both case sensitively and case insensitively
(at least on platforms that sometimes require case insensitivity).
In a case insensitive index, if you attempt to add a file from
disk with a different case pattern, the old case pattern in the
index should be preserved.
This fixes that (and a couple of minor warnings).
This commit reinstates some changes to git_diff__paired_foreach
that were discarded during the rebase (because the diff_output.c
file had gone away), and also adjusts the case insensitively
logic slightly to hopefully deal with either mismatched icase
diffs and other case insensitivity scenarios.
This makes diff more careful about picking the canonical path
when generating a delta so that it won't accidentally pick up a
case-mismatched path on a case-insensitive file system. This
should make sure we use the "most accurate" case correct version
of the path (i.e. from the tree if possible, or the index if
need be).
The exclude submodules flag was not doing the right thing, in
that a file with no diff between the head and the index and just
a delete in the workdir could be excluded if submodules were
excluded.
On Linux: fix a warning message related to the volatile qualifier (cast)
On Windows: use SecureZeroMemory()
On both, inline the call, so that no entry point can lead back to this "secure" memory zeroing.
This makes the git_diff_patch definition private to diff_patch.c
and fixes a number of other header file naming inconsistencies to
use `git_` prefixes on functions and structures that are shared
between files.
This changes the size data to uint32_t, fixes the array growth
logic to use a simple 1.5x multiplier, and uses a generic inline
function for growing the array to make the git_array_alloc API
feel more natural (i.e. it returns a pointer to the new item).
This adds two new public APIs: git_diff_patch_from_blobs and
git_diff_patch_from_blob_and_buffer, plus it refactors the code
for git_diff_blobs and git_diff_blob_to_buffer so that they code
is almost entirely shared between these APIs, and adds tests for
the new APIs.
This implements the loading of regular expression pattern lists
for diff drivers that search for function context in that way.
This also changes the way that diff drivers update options and
interface with xdiff APIs to make them a little more flexible.
There are all sorts of misconfiguration in the wild. We already rely
on the signature constructor to trim SP. Extend the logic to use
`isspace` to decide whether a character should be trimmed.
This is a significant reorganization of the diff code to break it
into a set of more clearly distinct files and to document the new
organization. Hopefully this will make the diff code easier to
understand and to extend.
This adds a new `git_diff_driver` object that looks of diff driver
information from the attributes and the config so that things like
function content in diff headers can be provided. The full driver
spec is not implemented in the commit - this is focused on the
reorganization of the code and putting the driver hooks in place.
This also removes a few #includes from src/repository.h that were
overbroad, but as a result required extra #includes in a variety
of places since including src/repository.h no longer results in
pulling in the whole world.
There are two places where git_futils_mkdir should exit early or
at least do less. The first is when using GIT_MKDIR_SKIP_LAST
and having that flag leave no directory left to create; it was
being handled previously, but the behavior was subtle. Now I put
in a clear explicit check that exits early in that case.
The second is when there is no directory to create, but there is
a valid path that should be verified. I shifted the logic a bit
so we'll be better about not entering the loop than that happens.
This implements a basic callback to extract function context for
a diff. It always uses the same search heuristic right now with
no regular expressions or language-specific variants. Those will
come next, I think.
This makes sure that git_futils_mkdir always skips over the root
directory at a minimum, even on platforms where the root is not
simply '/'. Also, this removes the GIT_WIN32 ifdef in favor of
making EACCES as a potentially recoverable error on all platforms.
We ran into an issue where cloning a repository to a folder
directly underneath the root of a volume (e.g. 'd:\libgit2')
would fail with an access denied error. This was traced down
to a call to make a directory that is the root (e.g. 'd:') could
return an error indicated access denied instead of an error
indicating the path already exists. This change now handles
the access denied error on Win32 and checks for the existence
of the folder.
git doesn't do that, and it's not something that's usually
actionable to fix. if you have a git repository with one bad
timezone in the history, it's too late to change it most likely.
Instead of just blowing away the stat cache data when loading a
new tree into the index, this checks if each loaded item has a
corresponding existing item with the same OID and if so, copies
the stat data from the old item to the new one so it will not be
blown away.
It is obviously quite a serious problem if this happens, but mutex
initialization can fail and we should detect it. It's a bit like
a memory allocation failure, in that you're probably pretty screwed
if this occurs, but at least we'll catch it.
By zeroing out the memory when we free larger objects (i.e. those
that serve as collections of other data, such as repos, odb, refdb),
I'm hoping that it will be easier for libgit2 bindings to find
errors in their object management code.
1. internal iterators now return GIT_ITEROVER when you go past the
last item in the iteration.
2. git_iterator_advance will "advance" to the first item in the
iteration if it is called immediately after creating the
iterator, which allows a simpler idiom for basic iteration.
3. if git_iterator_advance encounters an error reading data (e.g.
a missing tree or an unreadable file), it returns the error
but also attempts to advance past the invalid data to prevent
an infinite loop.
Updated all tests and internal usage of iterators to account for
these new behaviors.
`lpExitCode` is a pointer to a long. A long is 32 bits wide on Windows.
It means that on Windows 64bits, `GetExitCodeThread()` doesn't set/clear the high-order bytes of the 64 bits memory space pointed at by `value_ptr`.
git_packbuilder_write() used to write a packfile to the passed file
path. Instead, ask for a destination directory and create both the
packfile and an index, as most users probably do expect.
This adds ~/ prefix expansion for the value of core.attributesfile
and core.excludesfile, plus it fixes the fact that the attributes
cache was holding on to the string data from the config for a long
time (instead of making its own strdup) which could have caused a
problem if the config was refreshed. Adds a test for the new
expansion capability.
This improves the docs for GIT_DIFF_INCLUDE_UNTRACKED_CONTENT as
well as the other flags related to UNTRACKED items in diff, plus
it makes that flag now automatically turn on
GIT_DIFF_INCLUDE_UNTRACKED which seems like a reasonable dwim type
of change.
The GIT_CONFIG_LEVEL constants actually work well as an enum
because they are mutually exclusive, so this adds a typedef to
the enum and uses that everywhere that one of these constants are
expected, instead of the old code that typically used an unsigned
int.
This adds docs for the cache control options to git_libgit2_opts
and also tweaks the cache code so that if the cache is disabled,
then the next time we attempt to insert something into the cache
in question, we will actually clear any old cached objects.
In theory, if there was a problem reading the REUC data, the
read_reuc() routine could have left uninitialized and invalid
data in the git_index vector. This moves the line that inserts a
new entry into the vector down to the bottom of the routine so we
know all the content is already valid. Also, per @linquize, this
uses calloc to ensure no uninitialized data.
This extends the rename tests to make sure that every rename
scenario in the inner loop of git_diff_find_similar is actually
exercised. Also, fixes an incorrect assert that was in one of
the clauses that was not previously being exercised.
This moves the GIT_CVAR_ABBREV lookup out of the loop. Also, this
fixes git_diff_print_raw to actually use that constant instead of
hardcoding 7 characters.
This adds a couple more tests of different rename scenarios.
Also, this fixes a problem with the case where you have two
"split" deltas and the left half of one matches the right half of
the other. That case was already being handled, but in the wrong
order in a way that could result in bad output. Also, if the swap
also happened to put the other two halves into the correct place
(i.e. two files exchanged places with each other), then the second
delta was left with the SPLIT flag set when it really should be
cleared.
This flips rename detection around so instead of creating a
forward mapping from deltas to possible rename targets, instead
it creates a reverse mapping, looking at possible targets and
trying to find a source that they could have been renamed or
copied from. This is important because each output can only
have a single source, but a given source could map to multiple
outputs (in the form of COPIED records).
Additionally, this makes a couple of tweaks to the public rename
detection APIs, mostly renaming a couple of options that control
the behavior to make more sense and to be more like core Git.
I walked through the tests looking at the exact results and
updated the expectations based on what I saw. The new code is
different from the old because it cannot give some nonsense
results (like A was renamed to both B and C) which were part of
the outputs previously.
This adds a bunch more rename detection tests including checks
vs the working directory, the new exact match options, some more
whitespace variants, etc.
This also adds a git_futils_writebuffer helper function and uses
it in checkout. This is mainly added because I wanted an easy
way to write out a git_buf to disk inside my test code.
- Add new GIT_DIFF_FIND_EXACT_MATCH_ONLY flag to do similarity
matching without using the similarity metric (i.e. only compare
the SHA).
- Clean up the similarity measurement code to more rigorously
distinguish between files that are not similar and files that
are not comparable (previously, a 0 could either mean that the
files could not be compared or that they were totally different)
- When splitting a MODIFIED file into a DELETE/ADD pair, actually
make a DELETED/UNTRACKED pair if the right side of the diff is
from the working directory. This prevents an odd mix of ADDED
and UNTRACKED files on workdir diffs.
There are a number of bugs in the rename code that only were
obvious when I started testing it against large old repos with
more complex patterns. (The code to do that testing is not ready
to merge with libgit2, but I do plan to add more thorough tests.)
This contains a significant number of changes and also tweaks the
public API slightly to make emulating core git easier.
Most notably, this separates the GIT_DIFF_FIND_AND_BREAK_REWRITES
flag into FIND_REWRITES (which adds a self-similarity score to
every modified file) and BREAK_REWRITES (which splits the modified
deltas into add/remove pairs in the diff list). When you do a raw
output of core git, rewrites show up as M090 or such, not at A and
D output, so I wanted to be able to emulate that.
Publicly, this also changes the flags to be uint16_t since we
don't need values out of that range.
Internally, this contains significant changes from a number of
small bug fixes (like using the wrong side of the diff to decide
if the object could be found in the ODB vs the workdir) to larger
issues about which files can and should be compared and how the
various edge cases of similarity scores should be treated.
Honestly, I don't think this is the last update that will have to
be made to this code, but I think this moves us closer to correct
behavior and I tried to document the code so it would be easier
to follow..
I frequently want to the the first N digits of an OID formatted
as a string and I'd like it to be efficient. This function makes
that easy and I could rewrite the OID formatters in terms of it.
Expose a way to retrieve, along with the target git_object, the reference
pointed at by some revparse expression (`@{<-n>}` or
`<branchname>@{upstream}` syntax).
In theory, if there was a problem reading the REUC data, the
read_reuc() routine could have left uninitialized and invalid
data in the git_index vector. This moves the line that inserts a
new entry into the vector down to the bottom of the routine so we
know all the content is already valid. Also, per @linquize, this
uses calloc to ensure no uninitialized data.
This adds an example implementation that emulates git cat-file.
It is a convenient and relatively simple example of getting data
out of a repository.
Implementing this also revealed that there are a number of APIs
that are still not using const pointers to objects that really
ought to be. The main cause of this is that `git_vector_bsearch`
may need to call `git_vector_sort` before doing the search, so a
const pointer to the vector is not allowed. However, for tree
objects, with a little care, we can ensure that the vector of
tree entries is always sorted and allow lookups to take a const
pointer. Also, the missing const in commit objects just looks
like an oversight.
Since I added the GIT_IDXENTRY_STAGE macro to extract the stage
from a git_index_entry, we probably don't need an internal inline
function to do the same thing.
Under some strange circumstances, diffs can end up listing files
that we can't actually open successfully. Instead of aborting
the git_diff_find_similar, this makes it so that those files just
won't be considered as valid rename/copy targets instead.
It is possible for there to be a submodule in a repository with
no .gitmodules file (for example, if the user forgot to commit
the .gitmodules file). In this case, core Git will just create
an empty directory as a placeholder for the submodule but
otherwise ignore it. We were generating an error and stopping
the checkout. This makes our behavior match that of core git.
Unlike blob updates, symlink updates cannot be done "in place"
writing over an old symlink. This means that in checkout when we
realize that we can safely update a symlink, we still need to
remove the old one before writing the new.
When the last item in a diff was an untracked directory that only
contained ignored items, the loop to scan the contents would run
off the end of the iterator and dereference a NULL pointer. This
includes a test that reproduces the problem and a fix.
There was a problem found in the Rugged test suite where the
refdb_fs_backend__next function could exit too early in some
very specific hashing patterns for packed refs. This ports
the Rugged test to libgit2 and then fixes the bug.
Nobody should ever be using anything other than ALL at this level, so
remove the option altogether.
As part of this, git_reference_foreach_glob is now implemented in the
frontend using an iterator. Backends will later regain the ability of
doing the glob filtering in the backend.
If you use rename detection, the renamed and copied files would
not show any text diffs because the function that decides if
data should be loaded didn't know which sides of the diff to
load for those cases.
This adds a test that looks at the patch generated for diff
entries that are COPIED or RENAMED.
The git_status_file API was doing a hack to deal with files that
are inside ignored directories. The status scan was not reporting
any file in this case, so git_status_file would attempt a final
"stat()" call, and return IGNORED if the file actually existed.
On case-insensitive filesystems where core.ignorecase is set
incorrectly, this magic check can "succeed" and report a file
as ignored when it should actually return ENOTFOUND.
Now that we have the GIT_STATUS_OPT_RECURSE_IGNORED_DIRS, we can
use that flag to make sure that git_status_file() will look into
ignored directories and eliminate the hack completely, so we give
the correct error.
This clarifies the docs for git_repository_message and also adds
to the tests to explicitly check NUL termination of data when the
output buffer is smaller than the message size. There is a minor
behavior change so that a non-NULL output buffer will always be
NUL terminated (at length zero) if an error occurs.
When a repository is initialised, we need to probe to see if there is
a global config to load. If this is not the case, the user isn't able
to write to the global config without creating the backend and adding
it themselves, which is inconvenient and overly complex.
Unconditionally create and add a backend for the global config file
regardless of whether it exists as a convenience for users.
To enable this, we allow creating backends to files that do not exist
yet, changing the semantics somewhat, and making some tests invalid.
When tagopt is set to '--tags', we should only take the default tags
refspec into account and ignore any configured ones.
Bring the code into compliance.
When a patch contained an eofnl change (i.e. the last line either
gained or lost a newline), the oldno and newno line number values
for the lines in the last hunk of the patch were not useful. This
makes them behave in a more expected manner.
This adds a new line origin constant for the special line that
is used when both files end without a newline.
In the course of writing the tests for this, I was having problems
with modifying a file but not having diff notice because it was
the same size and modified less than one second from the start of
the test, so I decided to start working on nanosecond timestamp
support. This commit doesn't contain the nanosecond support, but
it contains the reorganization of maybe_modified and the hooks so
that if the nanosecond data were being read by stat() (or rather
being copied by git_index_entry__init_from_stat), then the nsec
would be taken into account.
This new stuff could probably use some more tests, although there
is some amount of it here.
Currently git_branch_set_upstream when passed a local branch
creates invalid configuration, for ex. if we setup branch
'tracking_master' to track local 'master' libgit2 generates
the following config
```
[branch "track_master"]
remote = .
merge = .refs/heads/track_master
```
The merge value is invalid and calling git_branch_upstream on
'tracking_master' results in invalid reference error.
It should do:
```
[branch "track_master"]
remote = .
merge = refs/heads/master
```
We use p->index_map.data to check whether the struct has been set up
and all the information about the index is stored there. This variable
gets set up halfway through the setup process, however, and a thread
can come along and use fields that haven't been written to yet.
Crucially, pack_entry_find_offset() needs to read the index version
(which is written after index_map) to know the offset and stride
length to pass to sha1_entry_pos(). If these values are wrong,
assertions in it will fail, as it will be reading bogus data.
Make index_version the last field to be written and switch from using
p->index_map.data to p->index_version as "git_pack_file is ready" flag
as we can use it to know if every field has been written.
Older versions of git would only write peeled entries for
items under refs/tags/. Newer versions will write them for
all refs, and we should be prepared to handle that.
There is an occasional assertion failure in sha1_entry_pos from
pack_entry_find_index when running threaded. Holding the mutex
around the code that grabs the index_map data and processes it
makes this assertion failure go away.
There are many paths through revparse that may return an error
code without reporting an error, I believe. This fixes one of
them. Because of the backtracking in revparse, it is pretty
complicated to fix the others.
A number of places were looking up option config values and then
not clearing the error codes if the values were not found. This
moves the repeated pattern into a shared routine and adds the
extra call to giterr_clear() when needed.
There are some cases, particularly where no loaded ODB backends
support a particular operation, where we would return an error
code without having set an error. This catches those cases and
reports that no ODB backends support the operation in question.
When diff encounters an untracked directory, there was a shortcut
that it took which is not compatible with core git. This makes
the default behavior no longer take that shortcut and instead look
inside the untracked directory to see if there are any untracked
files within it. If there are not, then the directory is treated
as an ignore directory instead of an untracked directory. This
has implications for the git_status APIs.
In preparation for more changes to the internal diff logic, it
seemed wise to split the very large git_diff__from_iterators into
separate functions that handle the four main cases (unmatched old
item, unmatched new item, unmatched new directory, and matched
old and new items). Hopefully this will keep the logic easier to
follow even as more cases have to be added to this code.
This removes the GIT_INLINE versions of the simple git_object
accessors and standardizes them with a helper macro in src/object.h
to build the function bodies.
Add a new git_oid_strcmp that compares a string OID with a hex
oid for sort order, and then reimplement git_oid_streq using it.
This actually should speed up git_oid_streq because it only reads
as far into the string as it needs to, whereas previously it would
convert the whole string into an OID and then use git_oid_cmp.
git_oid_ncmp was making some assumptions about the length of
the data - this shifts the check to the top of the loop so it
will work more robustly, limits the max, and adds some tests
to verify the functionality.
For update and create commands where all the objects are known to
exist in the remote, we must send an empty packfile. However, if all
we issue are delete commands, no packfile must be sent.
Take this into consideration for push.
This makes diff use the cvar cache for config options where
possible, and also adds support for a number of other config
options to diff including "diff.context", "diff.ignoreSubmodules",
"diff.noprefix", "diff.mnemonicprefix", and "core.abbrev".
To make this natural, this involved a rearrangement of the code
that allocates the diff object vs. the code that initializes it
based on the combination of options passed in by the user and
read from the config.
This commit includes tests for most of these new options as well.
This converts many of the config lookups that are done around the
library to use the repository config cache. This was everything I
could find that wasn't part of diff (which requires a larger fix).
This adds a bunch of additional config values to the repository
config value cache and makes it easier to add a simple boolean
config without creating enum values for each possible setting.
Also, this fixes a bug in git_config_refresh where the config
cache was not being cleared which could lead to potential
incorrect values.
The work to start using the new cached configs will come in the
next couple of commits...
When case insensitive tree iterators were added, we started reading
the case sensitivity of the index to decide if the tree should be
case sensitive. This is good for index-to-tree comparisons, but
for tree-to-tree comparisons, we should really default to doing a
case sensitive comparison unless the user really wants otherwise.