This changes the behavior of callbacks so that the callback error
code is not converted into GIT_EUSER and instead we propagate the
return value through to the caller. Instead of using the
giterr_capture and giterr_restore functions, we now rely on all
functions to pass back the return value from a callback.
To avoid having a return value with no error message, the user
can call the public giterr_set_str or some such function to set
an error message. There is a new helper 'giterr_set_callback'
that functions can invoke after making a callback which ensures
that some error message was set in case the callback did not set
one.
In places where the sign of the callback return value is
meaningful (e.g. positive to skip, negative to abort), only the
negative values are returned back to the caller, obviously, since
the other values allow for continuing the loop.
The hardest parts of this were in the checkout code where positive
return values were overloaded as meaningful values for checkout.
I fixed this by adding an output parameter to many of the internal
checkout functions and removing the overload. This added some
code, but it is probably a better implementation.
There is some funkiness in the network code where user provided
callbacks could be returning a positive or a negative value and
we want to rely on that to cancel the loop. There are still a
couple places where an user error might get turned into GIT_EUSER
there, I think, though none exercised by the tests.
This continues auditing all the places where GIT_EUSER is being
returned and making sure to clear any existing error using the
new giterr_user_cancel helper. As a result, places that relied
on intercepting GIT_EUSER but having the old error preserved also
needed to be cleaned up to correctly stash and then retrieve the
actual error.
Additionally, as I encountered places where error codes were not
being propagated correctly, I tried to fix them up. A number of
those fixes are included in the this commit as well.
It is obviously quite a serious problem if this happens, but mutex
initialization can fail and we should detect it. It's a bit like
a memory allocation failure, in that you're probably pretty screwed
if this occurs, but at least we'll catch it.
By zeroing out the memory when we free larger objects (i.e. those
that serve as collections of other data, such as repos, odb, refdb),
I'm hoping that it will be easier for libgit2 bindings to find
errors in their object management code.
We use p->index_map.data to check whether the struct has been set up
and all the information about the index is stored there. This variable
gets set up halfway through the setup process, however, and a thread
can come along and use fields that haven't been written to yet.
Crucially, pack_entry_find_offset() needs to read the index version
(which is written after index_map) to know the offset and stride
length to pass to sha1_entry_pos(). If these values are wrong,
assertions in it will fail, as it will be reading bogus data.
Make index_version the last field to be written and switch from using
p->index_map.data to p->index_version as "git_pack_file is ready" flag
as we can use it to know if every field has been written.
There is an occasional assertion failure in sha1_entry_pos from
pack_entry_find_index when running threaded. Holding the mutex
around the code that grabs the index_map data and processes it
makes this assertion failure go away.
Rename git_packfile_check to git_packfile_alloc since it is now
being used more in that capacity. Fix the various places that use
it. Consolidate some repeated code in odb_pack.c related to the
allocation of a new pack_backend.
The indexer was creating a packfile object separately from the
code in pack.c which was a problem since I put a call to
git_mutex_init into just pack.c. This commit updates the pack
function for creating a new pack object (i.e. git_packfile_check())
so that it can be used in both places and then makes indexer.c
use the shared initialization routine.
There are also a few minor formatting and warning message fixes.
This builds on the earlier thread safety work to make it so that
setting the odb, index, refdb, or config for a repository is done
in a threadsafe manner with minimized locking time. This is done
by adding a lock to the repository object and using it to guard
the assignment of the above listed pointers. The lock is only
held to assign the pointer value.
This also contains some minor fixes to the other work with pack
files to reduce the time that locks are being held to and fix an
apparently memory leak.
When I was writing threading tests for the new cache, the main
error I kept running into was a pack file having it's content
unmapped underneath the running thread. This adds a lock around
the routines that map and unmap the pack data so that threads can
effectively reload the data when they need it.
This also required reworking the error handling paths in a couple
places in the code which I tried to make consistent.
These offsets are needed for REF_DELTA objects, which encode which
object they use as a base, but not where it lies in the packfile, so
we need a list.
These objects are mostly from older packfiles, before OFS_DELTA was
widely spread. The time spent in indexing these packfiles is greatly
reduced, though remains above what git is able to do.
Somewhat surprisingly, this can increase the speed considerably, as we
don't bother trying to decide what to evict, and the most used entries
are quickly back into the cache.
The indexer needs to call the packfile's free function so it takes care of
freeing the caches.
We still need to close the mwf descriptor manually so we can rename the
packfile into its final name on Windows.
Many delta bases are re-used. Cache them to avoid inflating the same
data repeatedly.
This version doesn't limit the amount of entries to store, so it can
end up using a considerable amound of memory.
To paraphrase @peff:
You can get both size and type from a packed object reasonably cheaply.
If you have:
* An object that is not a delta; both type and size are available in the
packfile header.
* An object that is a delta. The packfile type will be OBJ_*_DELTA, and
you have to resolve back to the base to find the real type. That means
potentially a lot of packfile index lookups, but each one is
relatively cheap. For the size, you inflate the first few bytes of the
delta, whose header will tell you the resulting size of applying the
delta to the base.
For simplicity, we just decompress the whole delta for now.
This updates all the `foreach()` type functions across the library
that take callbacks from the user to have a consistent behavior.
The rules are:
* A callback terminates the loop by returning any non-zero value
* Once the callback returns non-zero, it will not be called again
(i.e. the loop stops all iteration regardless of state)
* If the callback returns non-zero, the parent fn returns GIT_EUSER
* Although the parent returns GIT_EUSER, no error will be set in
the library and `giterr_last()` will return NULL if called.
This commit makes those changes across the library and adds tests
for most of the iteration APIs to make sure that they follow the
above rules.
Once a file is registered, there is no way to deregister it, even
after the structure that contains it is no longer needed and has been
freed. This may be the source of #624.
Allow and use the deregister function to remove our file from the
global list.
On RAM: the .idx and .pack files become links to a .lock and the original download respectively.
Assume some feature (such as record locking) supported by SFS but not JXFS or RAM: is required.
There are three changes here:
- correctly propogate error code from failed object lookups
- make zlib inflate use our allocators
- add OID to notfound error in ODB lookups
This migrates odb.c, odb_loose.c, odb_pack.c and pack.c to
the new style of error handling. Also got the unix and win32
versions of map.c. There are some minor changes to other
files but no others were completely converted.
This also contains an update to filebuf so that a zeroed out
filebuf will not think that the fd (== 0) is actually open
(and inadvertently call close() on fd 0 if cleaned up).
Lastly, this was built and tested on win32 and contains a
bunch of fixes for the win32 build which was pretty broken.
This is legacy compat stuff for when `deflateBound` is not defined, but
we're not embedding zlib and that function is always available. Kill
that with fire.
This takes all of the functions that look up simple data about
paths (such as `git_futils_isdir`) and moves them over to path.h
(becoming `git_path_isdir`). This leaves fileops.h just with
functions that actually manipulate the filesystem or look at
the file contents in some way.
As part of this, the dir.h header which is really just for win32
support was moved into win32 (with some minor changes).
off_t is always 32 bits in Windows, which is beyond stupid, but we just
don't care anymore because we're using `git_off_t` which is assured to
be 64 bits on all platforms, regardless of compilation mode. Just
ensure that no casts to `off_t` are performed.
Also, the check for `off_t` overflows has been dropped, once again,
because the size of our offsets is always 64 bits on all platforms.
Fixes#534
See `global.c` for a description of what we're doing.
When libgit2 is built with GIT_THREADS support, the threading system
must be explicitly initialized with `git_threads_init()`.
This makes libgit2 more closely match Git, which only checks for
ambiguous pack entries when given short hashes.
Note that the only time this is ever relevant is when a pack has the
same object more than once (it's happened in the wild, I promise).
There were quite a few places were spaces were being used instead of
tabs. Try to catch them all. This should hopefully not break anything.
Except for `git blame`. Oh well.
1. The license header is technically not valid if it doesn't have a
copyright signature.
2. The COPYING file has been updated with the different licenses used in
the project.
3. The full GPLv2 header in each file annoys me.
When indexing a file with ref deltas, a temporary cache for the
offsets has to be built, as we don't have an index file yet. If the
user takes the responsiblity for filling the cache, the packing code
will look there first when it finds a ref delta.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
The callers immediately throw away the offset, so we don't need any
logical changes in any of them. This will be useful for the indexer,
as it does need to know where the compressed data ends.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>