Error messages should be sentence fragments, and therefore:
1. Should not begin with a capital letter,
2. Should not conclude with punctuation, and
3. Should not end a sentence and begin a new one.
When trying to uncompress the deltas in a packfile's delta chain, we try
to add each object base to the packfile cache and subsequently decrement
its reference count if it has been added successfully. This may lead to
a mismatched reference count when we exit the loop early because we
encountered an error, as the decrement is then skipped.
Fix the issue by decrementing the reference count in error cleanup.
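A minimal sketch of the pattern, assuming a hypothetical `cached_base` type with a plain integer reference count and a hypothetical `process_base()` step that can fail. Neither is libgit2's actual API. Every reference taken inside the loop is dropped on the success path and on the error path alike.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a cached delta base. */
typedef struct {
	int refcount;
} cached_base;

/* Hypothetical cache insertion: the cache hands back a new reference. */
static void cache_add(cached_base *base)
{
	base->refcount++;
}

static void cache_release(cached_base *base)
{
	assert(base->refcount > 0);
	base->refcount--;
}

/* Hypothetical per-base work; fails for the base at index `fail_at`. */
static int process_base(size_t i, size_t fail_at)
{
	return i == fail_at ? -1 : 0;
}

/* Walks the chain, taking and dropping one reference per base.
 * Returns 0 on success or -1 on the first error. */
static int walk_chain(cached_base *bases, size_t n, size_t fail_at)
{
	size_t i;
	int error = 0;

	for (i = 0; i < n; i++) {
		cache_add(&bases[i]);

		if ((error = process_base(i, fail_at)) < 0)
			goto cleanup;

		cache_release(&bases[i]);
	}

	return 0;

cleanup:
	/* The fix: drop the reference taken for the base we bailed out on. */
	cache_release(&bases[i]);
	return error;
}
```

After a failed walk, every base is back to a reference count of zero, just as after a successful one.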
In `pack_entry_find_offset`, we try to find the offset of a
certain object in the pack file. To do so, we first check whether the
packfile has already been opened and open it if not. Opening the
packfile is guarded by a mutex, so concurrent access to it is
in fact safe.
What is not thread-safe, though, is our calculation of offsets
inside the packfile. Assume two threads call
`pack_entry_find_offset` at the same time. We first calculate the
offset and index location and only then determine whether the pack
has already been opened. If it has not, we open it and re-calculate
the offset and index address.
Now consider the case of two threads: thread 1 first calculates the
addresses and is subsequently suspended. Thread 2 now calls
`pack_index_open` and initializes the pack file, calculating its
addresses correctly. When thread 1 resumes, it sees that the pack
file has already been initialized and happily proceeds with the
addresses it calculated before the check. As the pack file had not
been initialized at that point, these addresses are bogus.
Fix the issue by only calculating the addresses after having
checked if the pack file is open.
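Here is a heavily simplified sketch of the fixed ordering. The `struct pack` layout, `index_open()` and `index_base()` are hypothetical stand-ins, not libgit2's code. For simplicity, the sketch always goes through the mutex-guarded open function instead of peeking at the "opened" flag without the lock. Any address is derived from the index only after that call returns.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical pack: `index_data` stays NULL and `index_version`
 * stays -1 until the index has been loaded. */
struct pack {
	pthread_mutex_t lock;
	const unsigned char *index_data;
	int index_version;
};

/* Stand-in for the mapped .idx file contents. */
static const unsigned char fake_index[8];

/* Loads the index at most once; the mutex makes concurrent calls safe. */
static int index_open(struct pack *p)
{
	pthread_mutex_lock(&p->lock);
	if (p->index_version == -1) {
		p->index_data = fake_index;
		p->index_version = 2;
	}
	pthread_mutex_unlock(&p->lock);
	return 0;
}

/* Fixed ordering: make sure the index is open first, and only then
 * derive addresses from it. Reading `index_data` before this point
 * could capture the NULL pre-open value that another thread replaces
 * in the meantime. */
static const unsigned char *index_base(struct pack *p)
{
	const unsigned char *index;

	if (index_open(p) < 0)
		return NULL;

	index = p->index_data;
	assert(index != NULL);
	return index;
}
```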
Move the delta application functions into `delta.c`, next to the
similar delta creation functions. Make the `git__delta_apply`
functions adhere to the naming and parameter style used elsewhere in
the library.
When we read the header, we want to know the size and type of the
object. We're currently inflating the full delta in order to read the
first few bytes. This can mean hundreds of kB needlessly inflated for
large objects.
Instead, use a packfile stream to inflate just enough data to read the
two varints in the header, and avoid inflating most of the delta.
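The two varints are the base size and the result size. Each one is stored seven bits per byte, least significant group first, with the high bit set while more bytes follow. The sketch below decodes them from however many bytes have been inflated so far, and returns 0 to signal that more input is needed. The function names are hypothetical, and the packfile stream itself is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decodes one size from a delta header. Returns the number of bytes
 * consumed, or 0 if the input ends before the number does or the
 * number is too long to fit in 64 bits. */
static size_t delta_hdr_varint(const unsigned char *buf, size_t len,
	uint64_t *out)
{
	uint64_t value = 0;
	unsigned shift = 0;
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char c = buf[i];

		if (shift > 63)
			return 0;
		value |= (uint64_t)(c & 0x7f) << shift;
		shift += 7;

		if (!(c & 0x80)) {
			*out = value;
			return i + 1;
		}
	}

	return 0; /* need more inflated bytes */
}

/* Reads the base size and the result size that start every delta.
 * Returns the header length, or 0 if `buf` does not yet hold both. */
static size_t delta_read_header(const unsigned char *buf, size_t len,
	uint64_t *base_size, uint64_t *result_size)
{
	size_t a, b;

	assert(base_size != NULL && result_size != NULL);

	if ((a = delta_hdr_varint(buf, len, base_size)) == 0)
		return 0;
	if ((b = delta_hdr_varint(buf + a, len - a, result_size)) == 0)
		return 0;
	return a + b;
}
```

For example, the bytes `0x90 0x01 0x05` decode to a base size of 144 (16 + 1 * 128) and a result size of 5. Only three bytes of the delta are needed to get there.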
The function `git_packfile_stream_open` tries to free the passed-in
stream when an error occurs. The only call site is
`git_indexer_append`, though, which passes in the address of a
stream struct that has not been allocated on the heap.
Fix the issue by simply removing the call to free. In case of an
error we have not allocated any memory yet, and otherwise it is the
caller's responsibility to manage its object's lifetime.
The way we currently derive the index file name from the pack file
name depends on the subtlety of `strlen` vs `sizeof` and on the fact
that ".pack" is one character longer than ".idx". Let's use a
`git_buf` so we can express the manipulation we want much more clearly.
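The sketch below assumes the manipulation is turning "<name>.pack" into "<name>.idx". It uses a tiny growable buffer, a hypothetical stand-in for `git_buf` rather than its real API. The buffer lets each step be stated directly: copy the name without ".pack", then append ".idx".

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Minimal growable string buffer, a hypothetical stand-in for git_buf. */
struct buf {
	char *ptr;
	size_t size;   /* bytes used, excluding the trailing NUL */
	size_t asize;  /* bytes allocated */
};

/* Appends `len` bytes of `data` and keeps the contents NUL-terminated. */
static int buf_put(struct buf *b, const char *data, size_t len)
{
	size_t needed = b->size + len + 1;

	if (needed > b->asize) {
		char *p = realloc(b->ptr, needed);
		if (!p)
			return -1;
		b->ptr = p;
		b->asize = needed;
	}

	memcpy(b->ptr + b->size, data, len);
	b->size += len;
	b->ptr[b->size] = '\0';
	return 0;
}

static int buf_puts(struct buf *b, const char *s)
{
	return buf_put(b, s, strlen(s));
}

/* Builds "<name>.idx" from "<name>.pack" without relying on how the
 * lengths of the two extensions relate to each other. */
static int pack_idx_name(struct buf *out, const char *pack_name)
{
	size_t name_len, ext_len = strlen(".pack");

	assert(out != NULL && pack_name != NULL);
	name_len = strlen(pack_name);

	if (name_len < ext_len ||
	    strcmp(pack_name + name_len - ext_len, ".pack") != 0)
		return -1;

	if (buf_put(out, pack_name, name_len - ext_len) < 0 ||
	    buf_puts(out, ".idx") < 0)
		return -1;

	return 0;
}
```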
Keep the declarations in the headers, while putting the definitions in
the C files. Putting the function definitions in headers causes
them to be duplicated if you include two headers with them.
Increment refcount of newly added cache entries just like existing
entries looked up from the cache. Otherwise the new entry can be
evicted from the cache and destroyed while it's still in use.
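The sketch below uses a hypothetical one-slot cache. The `entry` and `cache` types are illustrations, not libgit2's cache. Lookups and insertions both hand the caller a reference of its own, so evicting an entry only drops the cache's reference.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted cache entry and one-slot cache. */
typedef struct {
	int refcount;
	int key;
} entry;

typedef struct {
	entry *slot;
} cache;

static void entry_decref(entry *e)
{
	assert(e->refcount > 0);
	if (--e->refcount == 0)
		free(e);
}

static entry *cache_get_or_add(cache *c, int key)
{
	entry *e;

	if (c->slot && c->slot->key == key) {
		c->slot->refcount++;      /* existing entry: caller's reference */
		return c->slot;
	}

	if ((e = malloc(sizeof(*e))) == NULL)
		return NULL;
	e->key = key;
	e->refcount = 1;              /* the cache's own reference */

	if (c->slot)
		entry_decref(c->slot);    /* evict the previous entry */
	c->slot = e;

	e->refcount++;                /* the fix: caller's reference, as above */
	return e;
}
```

Without the final increment, the second insertion below would free `a` while the caller still holds it.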
Make our overflow checking look more like GCC's and Clang's, so that
we can replace it with the compiler intrinsics on platforms that
support them. This means dropping the ability to pass `NULL` as an
out parameter.
As a result, the macros are updated to reflect this as well.
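Here is a sketch of the shape this implies, assuming `size_t` operands. The function and macro names are hypothetical, not libgit2's. Like `__builtin_add_overflow` and `__builtin_mul_overflow`, the fallbacks return true on overflow and require a non-NULL out pointer. That requirement is what allows the macro to map straight onto the intrinsic where the compiler reports having it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Portable fallbacks: return true on overflow, otherwise store the
 * result. Unlike the builtins, they leave *out untouched on overflow. */
static bool add_overflow_size(size_t one, size_t two, size_t *out)
{
	assert(out != NULL);
	if (SIZE_MAX - one < two)
		return true;
	*out = one + two;
	return false;
}

static bool mul_overflow_size(size_t one, size_t two, size_t *out)
{
	assert(out != NULL);
	if (one && SIZE_MAX / one < two)
		return true;
	*out = one * two;
	return false;
}

/* Use the compiler intrinsic when it is known to exist. */
#if defined(__has_builtin)
# if __has_builtin(__builtin_add_overflow)
#  define HAVE_BUILTIN_ADD_OVERFLOW 1
# endif
#endif

#ifdef HAVE_BUILTIN_ADD_OVERFLOW
# define ADD_SIZET_OVERFLOW(out, one, two) __builtin_add_overflow(one, two, out)
#else
# define ADD_SIZET_OVERFLOW(out, one, two) add_overflow_size(one, two, out)
#endif
```

Callers should only read `*out` when the check reports no overflow. That keeps them correct with either implementation behind the macro.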
The callers of git_packfile_unpack() expect the obj_offset argument to
be set to the beginning of the next object. We were mistakenly returning
the offset of the object's data, which caused the CRC function to
use the wrong offset.
Set obj_offset to curpos instead of elem->offset to point to the next
element and restore the expected behaviour.
If we fail to insert the packfile into the map, make sure to free it.
This also makes the free function attempt to remove the packfile's
mwindows from the global list only if we have opened the packfile, to
avoid accessing the list unlocked.
Opening the same repository multiple times will currently open the same
file multiple times, as well as map the same region of the file multiple
times. This is not necessary, as the packfile data is immutable.
Instead of opening and closing packfiles directly, introduce an
indirection and allocate packfiles globally. This does mean locking on
each packfile open, but we already use this lock for the global mwindow
list so it doesn't introduce a new contention point.
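The sketch below shows one way to share packfiles globally: a mutex-protected list keyed by path, with a reference count per entry. `struct shared_pack`, `pack_get()` and `pack_put()` are hypothetical. A real entry would also own the file descriptor and mapped windows.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared packfile record. */
struct shared_pack {
	char *path;
	int refcount;
	struct shared_pack *next;
};

static pthread_mutex_t registry_lock = PTHREAD_MUTEX_INITIALIZER;
static struct shared_pack *registry;

/* Returns the existing record for `path` or creates one, so every
 * repository opening the same pack shares a single instance. */
static struct shared_pack *pack_get(const char *path)
{
	struct shared_pack *p;

	pthread_mutex_lock(&registry_lock);

	for (p = registry; p; p = p->next) {
		if (strcmp(p->path, path) == 0) {
			p->refcount++;
			goto out;
		}
	}

	if ((p = calloc(1, sizeof(*p))) == NULL)
		goto out;
	if ((p->path = malloc(strlen(path) + 1)) == NULL) {
		free(p);
		p = NULL;
		goto out;
	}
	strcpy(p->path, path);
	p->refcount = 1;
	p->next = registry;
	registry = p;

out:
	pthread_mutex_unlock(&registry_lock);
	return p;
}

/* Drops one reference; the last one unlinks and frees the record. */
static void pack_put(struct shared_pack *pack)
{
	struct shared_pack **pp;

	pthread_mutex_lock(&registry_lock);

	assert(pack->refcount > 0);
	if (--pack->refcount == 0) {
		for (pp = &registry; *pp; pp = &(*pp)->next) {
			if (*pp == pack) {
				*pp = pack->next;
				break;
			}
		}
		free(pack->path);
		free(pack);
	}

	pthread_mutex_unlock(&registry_lock);
}
```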
When running multithreaded, it is not enough to check for the offmap
allocation. Move the call to cache_init() to packfile allocation so we
can be sure it is always allocated free of races.
This fixes #2355.
The switch makes the loop somewhat unwieldy. Let's assume it's fine and
perform the check when we're accessing the data.
This makes our code look a lot more like git's.
Dependency chains are often large and require a few
reallocations. Allocate a 64-element chain before doing anything else to
avoid allocations during the loop.
This size comes from the stack-allocated array git uses. We still
allocate the chain on the heap, but it does help performance a little.
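Here is a sketch of the preallocation, assuming a hypothetical `struct chain` of chain elements. The element fields are illustrative only. Room for 64 elements is reserved up front, and the array only grows, by doubling, when a chain is longer than that.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CHAIN_PREALLOC 64

/* Hypothetical element of a delta dependency chain. */
struct chain_elem {
	long long offset;
	int type;
};

struct chain {
	struct chain_elem *items;
	size_t len;
	size_t alloc;
};

/* Reserves room for CHAIN_PREALLOC elements before the loop starts. */
static int chain_init(struct chain *c)
{
	c->items = malloc(CHAIN_PREALLOC * sizeof(*c->items));
	c->len = 0;
	c->alloc = c->items ? CHAIN_PREALLOC : 0;
	return c->items ? 0 : -1;
}

/* Appends one element, reallocating only for unusually long chains. */
static int chain_push(struct chain *c, struct chain_elem elem)
{
	assert(c->len <= c->alloc);

	if (c->len == c->alloc) {
		size_t new_alloc = c->alloc ? c->alloc * 2 : CHAIN_PREALLOC;
		struct chain_elem *p;

		/* Refuse sizes whose byte count would overflow size_t. */
		if (new_alloc <= c->alloc ||
		    new_alloc > SIZE_MAX / sizeof(*c->items))
			return -1;

		p = realloc(c->items, new_alloc * sizeof(*c->items));
		if (!p)
			return -1;
		c->items = p;
		c->alloc = new_alloc;
	}

	c->items[c->len++] = elem;
	return 0;
}
```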
Bring back the use of the delta base cache for unpacking objects. When
generating the delta chain, we stop when we find a delta base in the
pack's cache and use that as the starting point.
We currently make use of recursive function calls to unpack an object,
resolving the deltas as we come back down the chain. This means that we
have unbounded stack growth as we look up objects in a pack.
This is now done in two steps: first, we figure out the dependency
chain by looking up the delta bases until we reach a non-delta object,
pushing the information we need onto a stack; then we pop entries from
that stack and apply the deltas until there are none left.
This version of the code does not make use of the delta base cache so it
is slower than what's in the mainline. A later commit will reintroduce
it.
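The two-step approach can be illustrated with a toy model rather than the real pack format. Here a "delta" turns its base value `v` into `v * 10 + amount`. That operation is not commutative, so applying deltas in the wrong order would give a wrong answer. The chain is a fixed 64-entry array for brevity, and the delta base cache is not modelled. `struct toy_obj` and `toy_unpack()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object: either a base holding `amount` as its value, or a delta
 * on the object at index `base`. */
struct toy_obj {
	int is_delta;
	size_t base;
	long amount;
};

#define MAX_CHAIN 64

/* Resolves object `idx` without recursion. Returns 0 on success, or
 * -1 if the chain is longer than MAX_CHAIN (which also stops cycles). */
static int toy_unpack(const struct toy_obj *pack, size_t idx, long *out)
{
	size_t stack[MAX_CHAIN];
	size_t depth = 0;
	long value;

	assert(out != NULL);

	/* Step 1: follow delta bases until a non-delta object, pushing
	 * every delta met on the way. */
	while (pack[idx].is_delta) {
		if (depth == MAX_CHAIN)
			return -1;
		stack[depth++] = idx;
		idx = pack[idx].base;
	}
	value = pack[idx].amount;

	/* Step 2: pop and apply, starting with the delta closest to the base. */
	while (depth > 0) {
		const struct toy_obj *delta = &pack[stack[--depth]];
		value = value * 10 + delta->amount;
	}

	*out = value;
	return 0;
}
```

With a base of 1, a delta of 2 on top of it and a delta of 3 on top of that, resolving the outermost object yields 123. Stack usage stays constant no matter how long the chain is.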
Repeating this error message makes it harder to find out where the
error actually occurs, and the message doesn't really describe what
we're trying to do.