The old method was avoiding re-loading of packfiles by watching the mtime of the
pack directory. This causes the ODB to become stale if the directory and packfile
are written within the same clock millisecond, as when cloning a fairly small
repo.
This method tries to find the object in the cached packs, and forces a refresh when
that fails. This will cause extra stat'ing on a miss, but speeds up the success
case and avoids this race condition.
last_found is the last packfile a wanted object was found in. Since
last_found is shared among all searching threads, it might changes while
we're searching. As suggested by @arrbee, put a copy on the stack to fix
the race condition.
This reduces the rate of syscalls for the common case of sequences of
object reads from the same pack.
Best of 5 timings for libgit2_clar before this patch:
real 0m5.375s
user 0m0.392s
sys 0m3.564s
After applying this patch:
real 0m5.285s
user 0m0.356s
sys 0m3.544s
0.6% improvement in system time.
9.2% improvement in user time.
1.7% improvement in elapsed time.
Confirmed a 0.6% reduction in number of system calls with strace.
Expect greater improvement for graph-traversal with large packs.
This updates all the `foreach()` type functions across the library
that take callbacks from the user to have a consistent behavior.
The rules are:
* A callback terminates the loop by returning any non-zero value
* Once the callback returns non-zero, it will not be called again
(i.e. the loop stops all iteration regardless of state)
* If the callback returns non-zero, the parent fn returns GIT_EUSER
* Although the parent returns GIT_EUSER, no error will be set in
the library and `giterr_last()` will return NULL if called.
This commit makes those changes across the library and adds tests
for most of the iteration APIs to make sure that they follow the
above rules.
There are three changes here:
- correctly propogate error code from failed object lookups
- make zlib inflate use our allocators
- add OID to notfound error in ODB lookups
This migrates odb.c, odb_loose.c, odb_pack.c and pack.c to
the new style of error handling. Also got the unix and win32
versions of map.c. There are some minor changes to other
files but no others were completely converted.
This also contains an update to filebuf so that a zeroed out
filebuf will not think that the fd (== 0) is actually open
(and inadvertently call close() on fd 0 if cleaned up).
Lastly, this was built and tested on win32 and contains a
bunch of fixes for the win32 build which was pretty broken.
It turns out that commit 31e9cfc4cbcaf1b38cdd3dbe3282a8f57e5366a5
did not fix the GIT_USUSED behavior on all platforms. This commit
walks through and really cleans things up more thoroughly, getting
rid of the unnecessary stuff.
To remove the use of some GIT_UNUSED, I ended up adding a couple
of new iterators for hashtables that allow you to iterator just
over keys or just over values.
In making this change, I found a bug in the clar tests (where we
were doing *count++ but meant to do (*count)++ to increment the
value). I fixed that but then found the test failing because it
was not really using an empty repo. So, I took some of the code
that I wrote for iterator testing and moved it to clar_helpers.c,
then made use of that to make it easier to open fixtures on a
per test basis even within a single test file.
This is legacy compat stuff for when `deflateBound` is not defined, but
we're not embedding zlib and that function is always available. Kill
that with fire.
This takes all of the functions that look up simple data about
paths (such as `git_futils_isdir`) and moves them over to path.h
(becoming `git_path_isdir`). This leaves fileops.h just with
functions that actually manipulate the filesystem or look at
the file contents in some way.
As part of this, the dir.h header which is really just for win32
support was moved into win32 (with some minor changes).
This converts virtually all of the places that allocate GIT_PATH_MAX
buffers on the stack for manipulating paths to use git_buf objects
instead. The patch is pretty careful not to touch the public API
for libgit2, so there are a few places that still use GIT_PATH_MAX.
This extends and changes some details of the git_buf implementation
to add a couple of extra functions and to make error handling easier.
This includes serious alterations to all the path.c functions, and
several of the fileops.c ones, too. Also, there are a number of new
functions that parallel existing ones except that use a git_buf
instead of a stack-based buffer (such as git_config_find_global_r
that exists alongsize git_config_find_global).
This also modifies the win32 version of p_realpath to allocate whatever
buffer size is needed to accommodate the realpath instead of hardcoding
a GIT_PATH_MAX limit, but that change needs to be tested still.
There were quite a few places were spaces were being used instead of
tabs. Try to catch them all. This should hopefully not break anything.
Except for `git blame`. Oh well.
1. The license header is technically not valid if it doesn't have a
copyright signature.
2. The COPYING file has been updated with the different licenses used in
the project.
3. The full GPLv2 header in each file annoys me.
The callers immediately throw away the offset, so we don't need any
logical changes in any of them. This will be useful for the indexer,
as it does need to know where the compressed data ends.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
This code is useful for more things than just the packfile handling
code. Factor it out so it can be reused.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Drop the GLibc implementation of Merge Sort and replace it with Timsort.
The algorithm has been tuned to work on arrays of pointers (void **),
so there's no longer a need to abstract the byte-width of each element
in the array.
All the comparison callbacks now take pointers-to-elements, not
pointers-to-pointers, so there's now one less level of dereferencing.
E.g.
int index_cmp(const void *a, const void *b)
{
- const git_index_entry *entry_a = *(const git_index_entry **)(a);
+ const git_index_entry *entry_a = (const git_index_entry *)(a);
The result is up to a 40% speed-up when sorting vectors. Memory usage
remains lineal.
A new `bsearch` implementation has been added, whose callback also
supplies pointer-to-elements, to uniform the Vector API again.
Check if the window structure has actually been allocated before
trying to access it, and don't leak said structure if the map fails.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Cleaned up the structure of the whole OS-abstraction layer.
fileops.c now contains a set of utility methods for file management used
by the library. These are abstractions on top of the original POSIX
calls.
There's a new file called `posix.c` that contains
emulations/reimplementations of all the POSIX calls the library uses.
These are prefixed with `p_`. There's a specific posix file for each
platform (win32 and unix).
All the path-related methods have been moved from `utils.c` to `path.c`
and have their own prefix.
Also allow space for the null-terminator when allocating the buffer in
packfile_unpack_compressed. Up to now, the last newline had served as
a terminator, but 858ef372 searches for a double-newline and exposes
the problem.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
As suggested by carlosmn, git_oid_ncmp would probably
be a better name than git_oid_match, for it does the same
as git_oid_cmp but only up to a certain amount of hex digits.