Rename git_packfile_check to git_packfile_alloc since it is now
being used more in that capacity. Fix the various places that use
it. Consolidate some repeated code in odb_pack.c related to the
allocation of a new pack_backend.
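A minimal sketch of the kind of consolidation described; the helper
name, struct layout and field names below are assumptions, not the
actual odb_pack.c code:

    /* Sketch only: one allocation helper that both pack-backend
     * constructors in odb_pack.c could share. */
    struct pack_backend {
        git_odb_backend parent;
        git_vector packs;                /* loaded packfiles */
        struct git_pack_file *last_found;
        char *pack_folder;
    };

    static int pack_backend__alloc(struct pack_backend **out, size_t initial_size)
    {
        struct pack_backend *backend = git__calloc(1, sizeof(*backend));
        if (backend == NULL)
            return -1;

        if (git_vector_init(&backend->packs, initial_size, NULL) < 0) {
            git__free(backend);
            return -1;
        }

        *out = backend;
        return 0;
    }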
The indexer was creating a packfile object separately from the code in
pack.c, which was a problem since I put the call to git_mutex_init only
into pack.c. This commit updates pack.c's function for creating a new
pack object (i.e. git_packfile_check()) so that it can be used in both
places, and then makes indexer.c use the shared initialization routine.
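For illustration, the shared path looks roughly like this (the
function was later renamed to git_packfile_alloc, and the path used
here is made up):

    struct git_pack_file *pack;

    /* extension checks, allocation and git_mutex_init all happen
     * inside the one shared routine, wherever a pack object is
     * created */
    if (git_packfile_check(&pack, "objects/pack/pack-xyz.pack") < 0)
        return -1;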
There are also a few minor formatting and warning message fixes.
Implicit type conversion of a function argument to size_t
Suspicious sequence of type casts: size_t -> int -> size_t
Consider reviewing an expression of the 'A = B == C' kind; it is evaluated as 'A = (B == C)'
An unsigned type is never < 0
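In miniature, the last two warning classes look like this
(illustration only, not the affected libgit2 code):

    #include <stddef.h>

    void examples(size_t len, int b, int c)
    {
        int a;

        a = b == c;      /* 'A = B == C': parsed as a = (b == c) */

        if (len < 0)     /* always false: an unsigned type is never < 0 */
            a = 0;

        (void)a;
    }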
These offsets are needed for REF_DELTA objects, which encode which
object they use as a base, but not where it lies in the packfile, so
we need a list.
These objects are mostly from older packfiles, from before OFS_DELTA
became widespread. The time spent indexing these packfiles is greatly
reduced, though it remains above what git is able to do.
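A sketch of the bookkeeping this implies; the struct, field and helper
names below are assumptions, not the actual indexer code:

    /* REF_DELTA entries name their base by OID rather than by offset,
     * so remember where every indexed object starts and search this
     * list when a ref delta needs its base. */
    struct entry_offset {
        git_oid oid;        /* id of the object at this position */
        git_off_t offset;   /* where it begins in the packfile   */
    };

    static int remember_offset(git_vector *offsets, const git_oid *oid, git_off_t start)
    {
        struct entry_offset *e = git__malloc(sizeof(*e));
        if (e == NULL)
            return -1;

        git_oid_cpy(&e->oid, oid);
        e->offset = start;
        return git_vector_insert(offsets, e);
    }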
This was the first implementation and its goal was simply to have
something that worked. It is slow and now it's just taking up
space. Remove it and switch the one known usage to use the streaming
indexer.
The indexer needs to call the packfile's free function so it takes care of
freeing the caches.
We still need to close the mwf descriptor manually so that we can
rename the packfile to its final name on Windows.
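Roughly, the finalize path has to do something like this before the
rename (a sketch using libgit2's internal mwindow helpers; the exact
field names may differ):

    /* unmap any windows and close the descriptor ourselves, since
     * renaming a file that is still open fails on Windows */
    git_mwindow_free_all(&idx->pack->mwf);
    p_close(idx->pack->mwf.fd);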
Storing 4kB or 8kB on the stack is not very gentle. As this part has to
be linear, put the buffer into the indexer object so we allocate it
once on the heap.
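In struct terms, something like the following (the field name and size
are illustrative):

    struct git_indexer_stream {
        /* ... existing fields ... */
        char objbuf[8 * 1024]; /* scratch buffer, reused for every object */
    };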
An mmap window is not guaranteed to give you the whole object, but the
indexer currently assumes it does.
Loop asking for more data until we've successfully CRC'd all of the
packed data.
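The loop is along these lines (a sketch: the mwindow call is
approximated from libgit2's internal API of the time, and the helper
name is made up):

    #include <zlib.h>

    static int crc_entry(git_mwindow_file *mwf, git_off_t offset, git_off_t size,
            uint32_t *crc_out)
    {
        git_mwindow *w = NULL;
        uint32_t crc = crc32(0L, Z_NULL, 0);

        /* a single window may cover only part of the entry, so keep
         * asking for more until all `size` bytes are checksummed */
        while (size > 0) {
            unsigned int left;
            const unsigned char *ptr = git_mwindow_open(mwf, &w, offset, 0, &left);
            size_t avail;

            if (ptr == NULL)
                return -1;

            avail = (git_off_t)left < size ? left : (size_t)size;
            crc = crc32(crc, ptr, (uInt)avail);

            offset += avail;
            size -= avail;
        }

        git_mwindow_close(&w);
        *crc_out = crc;
        return 0;
    }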
Up to now, deltas needed to be entirely in the packfile, and we tried
to decompress them in their entirety over and over again.
Adjust the logic so we read them as they come, just as we do for full
objects. This also allows us to simplify the logic and have less
nested code. The delta resolving phase still needs to decompress the
whole object into memory, as there is not yet any streaming
delta-apply support, but it helps in speeding up the downloading
process and reduces the number of memory allocations we need to do.
The new API allows us to read the object bit by bit from the packfile,
instead of needing all of it at once. This also allows us
to hash the object as it comes in from the network instead of having
to try to read it all and failing repeatedly for larger objects.
This is only the first step, but it already shows huge improvements
when dealing with objects over a few megabytes in size. It reduces the
memory needs in some cases, but delta objects still need to be
completely in memory and the old inefficient method is still used for
that.
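The shape of the read loop is roughly the following. The stream calls
mirror libgit2's internal packfile-stream API from this period, but
the exact signatures and the hashing calls are approximated, so treat
this as a sketch rather than the actual indexer code:

    static int hash_entry(struct git_pack_file *pack, git_off_t entry_start,
            git_hash_ctx *ctx)
    {
        git_packfile_stream stream;
        char buf[8192];
        ssize_t read;

        if (git_packfile_stream_open(&stream, pack, entry_start) < 0)
            return -1;

        /* consume the entry in small chunks, hashing as we go (the
         * real code also has to distinguish "need more data" from
         * hard errors here, so it can resume when more bytes arrive) */
        do {
            read = git_packfile_stream_read(&stream, buf, sizeof(buf));
            if (read < 0)
                break;

            git_hash_update(ctx, buf, (size_t)read);
        } while (read > 0);

        git_packfile_stream_free(&stream);
        return read < 0 ? -1 : 0;
    }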
It's not really needed with the current code as we have EOS and the
sideband's flush to tell us we're done.
Keep the distinction between processed and received objects.
Not all delta bases are available on the first try. By delaying
resolving all deltas until the end, we avoid decompressing some of the
data twice or even more times, saving effort and time.
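Schematically, the second pass looks like this (the struct and helper
names are assumptions):

    /* deltas whose base was not available yet are queued during the
     * first pass... */
    struct delta_info {
        git_off_t delta_off; /* start of the delta entry in the pack */
    };

    /* ...and resolved in one go at finalize time, when every base is
     * guaranteed to be present */
    static int resolve_deltas(struct indexer_state *idx, git_vector *deltas)
    {
        struct delta_info *di;
        size_t i;

        git_vector_foreach(deltas, i, di) {
            if (resolve_one(idx, di->delta_off) < 0)
                return -1;
        }

        return 0;
    }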
Once a file is registered, there is no way to deregister it, even
after the structure that contains it is no longer needed and has been
freed. This may be the source of #624.
Add a deregister function and use it to remove our file from the
global list.
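The fix pairs the existing registration with a matching removal, along
these lines (the deregister call's exact signature may differ from
this sketch):

    /* on open: */
    if (git_mwindow_file_register(&p->mwf) < 0)
        return -1;

    /* on free, so the global window list never points at memory that
     * has been released: */
    git_mwindow_file_deregister(&p->mwf);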
Currently, the first call of git_indexer_stream_add adds the data to the
underlying pack file and opens it for later use, but doesn't start
parsing the already available data.
This means git_indexer_stream_finalize only works if
git_indexer_stream_add was called at least twice. Kill this limitation
by parsing available data immediately.
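With that fixed, a single add followed by finalize is enough. The
following usage sketch approximates the git_indexer_stream API of this
period; the constructor arguments and the stats type may not match the
headers exactly:

    git_indexer_stream *idx;
    git_transfer_progress stats = {0};

    if (git_indexer_stream_new(&idx, "objects/pack") < 0)
        return -1;

    /* one add call with all the data we have, then finalize */
    if (git_indexer_stream_add(idx, data, len, &stats) < 0 ||
        git_indexer_stream_finalize(idx, &stats) < 0) {
        git_indexer_stream_free(idx);
        return -1;
    }

    git_indexer_stream_free(idx);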
On RAM:, the .idx and .pack files become links to a .lock file and to
the original download, respectively. Assume some feature (such as
record locking) that SFS supports, but JXFS or RAM: does not, is
required.
Trying to send every single line immediately won't give us any speed
improvement and duplicates the code we need for other transports. Make
the git transport use the same buffer functions as HTTP.
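The idea, reduced to plain C (this is an illustration of the buffering
approach, not the transport code itself):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* format several pkt-lines into one buffer and write them with a
     * single call, instead of one write per line */
    static int send_lines(int fd, const char *const *lines, size_t n)
    {
        char buf[4096];
        size_t used = 0, i;

        for (i = 0; i < n; i++) {
            int w = snprintf(buf + used, sizeof(buf) - used, "%04lx%s",
                (unsigned long)(strlen(lines[i]) + 4), lines[i]);
            if (w < 0 || (size_t)w >= sizeof(buf) - used)
                return -1;
            used += (size_t)w;
        }

        return write(fd, buf, used) == (ssize_t)used ? 0 : -1;
    }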