This adds some basic tests for the oidmap just to make sure that
collisions, etc. are dealt with correctly.
This also adds some tests for the new caching that check if items
are inserted (or not inserted) properly into the cache, and that
the cache can hold up in a multithreaded environment without error.
This adds crlf/lf conversion functions into buf_text with more
efficient implementations that bypass the high level buffer
functions. They attempt to minimize the number of reallocations
done and they directly write the buffer data as needed if they
know that there is enough memory allocated to memcpy data.
Tests are added for these new functions. The crlf.c code is
updated to use the new functions.
Removed the include of buf_text.h from filter.h and just include
it more narrowly in the places that need it.
Currently, the odb cache has a fixed size of 128 slots as defined by
GIT_DEFAULT_CACHE_SIZE. Allow users to set the size of the cache via
git_libgit2_opts().
Fixes#1035.
This switches the APIs for setting and getting the global/system
search paths from using git_strarray to using a simple string with
GIT_PATH_LIST_SEPARATOR delimited paths, just as the environment
PATH variable would contain. This makes it simpler to get and set
the value.
I also added code to expand "$PATH" when setting a new value to
embed the old value of the path. This means that I no longer
require separate actions to PREPEND to the value.
The goal of this work is to expose the search logic for "global",
"system", and "xdg" files through the git_libgit2_opts() interface.
Behind the scenes, I changed the logic for finding files to have a
notion of a git_strarray that represents a search path and to store
a separate search path for each of the three tiers of config file.
For each tier, I implemented a function to initialize it to default
values (generally based on environment variables), and then general
interfaces to get it, set it, reset it, and prepend new directories
to it.
Next, I exposed these interfaces through the git_libgit2_opts
interface, reusing the GIT_CONFIG_LEVEL_SYSTEM, etc., constants
for the user to control which search path they were modifying.
There are alternative designs for the opts interface / argument
ordering, so I'm putting this phase out for discussion.
Additionally, I ended up doing a little bit of clean up regarding
attr.h and attr_file.h, adding a new attrcache.h so the other two
files wouldn't have to be included in so many places.
This updates the tree iterator internals to be more efficient.
The tree_iterator_entry objects are now kept as pointers that are
allocated from a git_pool, so that we may use git__tsort_r for
sorting (which is better than qsort, given that the tree is
likely mostly ordered already).
Those tree_iterator_entry objects now keep direct pointers to the
data they refer to instead of keeping indirect index values. This
simplifies a lot of the data structure traversal code.
This also adds bsearch to find the start item position for range-
limited tree iterators, and is more explicit about using
git_path_cmp instead of reimplementing it. The git_path_cmp
changed a bit to make it easier for tree_iterators to use it (but
it was barely being used previously, so not a big deal).
This adds a git_pool_free_array function that efficiently frees a
list of pool allocated pointers (which the tree_iterator keeps).
Also, added new tests for the git_pool free list functionality
that was not previously being tested (or used).
This plugs in the three basic similarity strategies for handling
whitespace via internal use of the pluggable API. In so doing, I
realized that the use of git_buf in the hashsig API was not needed
and actually just made it harder to use, so I tweaked that API as
well.
Note that the similarity metric is still not hooked up in the
find_similarity code - this is just setting out the function that
will be used.
This moves the similarity metric code out of buf_text and into a
new file. Also, this implements a different approach to similarity
measurement based on a Rabin-Karp rolling hash where we only keep
the top 100 and bottom 100 hashes. In theory, that should be
sufficient samples to given a fairly accurate measurement while
limiting the amount of data we keep for file signatures no matter
how large the file is.
This adds a new `git_buf_text_hashsig` type and functions to
generate these hash signatures and compare them to give a
similarity score. This can be plugged into diff similarity
scoring.
The cppcheck static analyzer generates warnings for a bunch of
places in the libgit2 code base. All the ones fixed in this
commit are actually false positives, but I've reorganized the
code to hopefully make it easier for static analysis tools to
correctly understand the structure. I wouldn't do this if I
felt like it was making the code harder to read or worse for
humans, but in this case, these fixes don't seem too bad and will
hopefully make it easier for better analysis tools to get at any
real issues.
Core git just looks for NUL bytes in files when deciding about
is-binary inside diff (although it uses a better algorithm in
checkout, when deciding if CRLF conversion should be done).
Libgit2 was using the better algorithm in both places, but that
is causing some confusion. For now, this makes diff just look
for NUL bytes to decide if a file is binary by content in diff.
There are many scattered functions that look into the contents of
buffers to do various text manipulations (such as escaping or
unescaping data, calculating text stats, guessing if content is
binary, etc). This groups all those functions together into a
new file and converts the code to use that.
This has two enhancements to existing functionality. The old
text stats function is significantly rewritten and the BOM
detection code was extended (although largely we can't deal with
anything other than a UTF8 BOM).
This fixes some missed places where we can apply const-ness to
various public APIs.
There are still some index and tree APIs that cannot take const
pointers because we sort our `git_vectors` lazily and so we can't
reliably bsearch the index and tree content without applying a
`git_vector_sort()` first.
This also fixes some missed places where size_t can be used and
where const can be applied to a couple internal functions.
Without this change, any failed assertion in the second (or a later) test
inside a test suite has a chance of double deleting memory, resulting in
a heap corruption. See #1096 for details.
This leaves alone the test cases where we "just" use cl_git_sandbox_init()
and cl_git_sandbox_cleanup(). These methods already take good care to not
double delete a repository.
Fixes#1096
The existing p_lstat implementation on win32 is not quite POSIX
compliant when setting errno to ENOTDIR. This adds an option to
make is be compliant so that code (such as checkout) that cares
to have separate behavior for ENOTDIR can use it portably.
This also contains a couple of other minor cleanups in the
posix_w32.c implementations to avoid unnecessary work.
* Rework GIT_DIRREMOVAL values to GIT_RMDIR flags, allowing
combinations of flags
* Add GIT_RMDIR_EMPTY_PARENTS flag to remove parent dirs that
are left empty after removal
* Add GIT_MKDIR_VERIFY_DIR to give an error if item is a file,
not a dir (previously an EEXISTS error was ignored, even for
files) and enable this flag for git_futils_mkpath2file call
* Improve accuracy of error messages from git_futils_mkdir
The new Win32 global path search was not working with the
environment variable tests. But when I fixed the test, the new
codes use of getenv() was causing more failures (presumably because
of caching on Windows ???). This fixes the global file lookup to
always go directly to the Win32 API in a predictable way.
This started as a complex new test for checkout going through the
"typechanges" test repository, but that revealed numerous issues
with checkout, including:
* complete failure with submodules
* failure to create blobs with exec bits
* problems when replacing a tree with a blob because the tree
"example/" sorts after the blob "example" so the delete was
being processed after the single file blob was created
This fixes most of those problems and includes a number of other
minor changes that made it easier to do that, including improving
the TYPECHANGE support in diff/status, etc.
There has been discussion for a while about making some set of
the `giterr_set` type functions part of the public API for code
that is implementing new backends to libgit2. This makes the
`giterr_set_str()` and `giterr_set_oom()` functions public.
This cleans up a number of items suggested during code review
with @vmg, including:
* renaming "outside repo" config API to `git_config_open_default`
* killing the `git_config_open_global` API
* removing the `git_` prefix from the static functions in fileops
* removing some unnecessary functionality from the "cp" command
If you use the clar cleanup callback function, you can't pass a
reference pointer to a stack allocated variable because when the
cleanup function runs, the stack won't exist anymore.
This extends git_repository_init_ext further with support for
initializing the repository from an external template directory
and with support for the "create shared" type flags that make a
set GID repository directory.
This also adds tests for much of the new functionality to the
existing `repo/init.c` test suite.
Also, this adds a bunch of new utility functions including a
very general purpose `git_futils_mkdir` (with the ability to
make paths and to chmod the paths post-creation) and a file
tree copying function `git_futils_cp_r`. Also, this includes
some new path functions that were useful to keep the code
simple.
This makes it easy to take a buffer containing a path with relative
references (i.e. .. or . path segments) and resolve all of those
into a clean path. This can be applied to URLs as well as file
paths which can be useful.
As part of this, I made the drive-letter detection apply on all
platforms, not just windows. If you give a path that looks like
"c:/..." on any platform, it seems like we might as well detect
that as a rooted path. I suppose if you create a directory named
"x:" on another platform and want to use that as the beginning
of a relative path under the root directory of your repo, this
could cause a problem, but then it seems like you're asking for
trouble.
* `git_buf_rfind` (with tests and tests for `git_buf_rfind_next`)
* `git_buf_puts_escaped` and `git_buf_puts_escaped_regex` (with tests)
to copy strings into a buffer while injecting an escape sequence
(e.g. '\') in front of particular characters.