To paraphrase @peff:
You can get both size and type from a packed object reasonably cheaply.
If you have:
* An object that is not a delta; both type and size are available in the
packfile header.
* An object that is a delta. The packfile type will be OBJ_*_DELTA, and
you have to resolve back to the base to find the real type. That means
potentially a lot of packfile index lookups, but each one is
relatively cheap. For the size, you inflate the first few bytes of the
delta, whose header will tell you the resulting size of applying the
delta to the base.
For simplicity, we just decompress the whole delta for now.
A mmap-window is not guaranteed to give you the whole object, but the
indexer currently assumes so.
Loop asking for more data until we've successfully CRC'd all of the
packed data.
Up to now, deltas needed to be enterily in the packfile, and we tried
to decompress then in their entirety over and over again.
Adjust the logic so we read them as they come, just as we do for full
objects. This also allows us to simplify the logic and have less
nested code. The delta resolving phase still needs to decompress the
whole object into memory, as there is not yet any streaming
delta-apply support, but it helps in speeding up the downloading
process and reduces the amount of memory allocations we need to do.
The new API allows us to read the object bit by bit from the packfile,
instead of needing it all at once in the packfile. This also allows us
to hash the object as it comes in from the network instead of having
to try to read it all and failing repeatedly for larger objects.
This is only the first step, but it already shows huge improvements
when dealing with objects over a few megabytes in size. It reduces the
memory needs in some cases, but delta objects still need to be
completely in memory and the old inefficent method is still used for
that.
`revwalk.h:commit_lookup()` -> `git_revwalk__commit_lookup()`
and make `git_commit_list_parse()` do real error checking that
the item in the list is an actual commit object. Also fixed an
apparent typo in a test name.
Moved it into graph.{c,h} which i created for the new "graph"
functions namespace. Also adjusted the function prototype
to use `size_t` and `const git_oid *`.
There are many scattered functions that look into the contents of
buffers to do various text manipulations (such as escaping or
unescaping data, calculating text stats, guessing if content is
binary, etc). This groups all those functions together into a
new file and converts the code to use that.
This has two enhancements to existing functionality. The old
text stats function is significantly rewritten and the BOM
detection code was extended (although largely we can't deal with
anything other than a UTF8 BOM).
clang-SVN HEAD kindly provided my the info, that sm_repo maybe
uninitialized when we want to free it (If the expression in line 358 or
359/360 evaluate to true, we jump to "cleanup", where we'd use sm_repo
uninitialized).
This fixes some missed places where we can apply const-ness to
various public APIs.
There are still some index and tree APIs that cannot take const
pointers because we sort our `git_vectors` lazily and so we can't
reliably bsearch the index and tree content without applying a
`git_vector_sort()` first.
This also fixes some missed places where size_t can be used and
where const can be applied to a couple internal functions.
This makes the diff functions that take callbacks both take
the payload parameter after the callback function pointers and
pass the payload as the last argument to the callback function
instead of the first. This should make them consistent with
other callbacks across the API.
3f9eb1e introduced support for SSL certificates issued for IP
addresses, making use of in_addr and in_addr6 structs. On FreeBSD
these are defined in (a file included in) <netinet/in.h>, so include
that file on FreeBSD and get the build working again.
The workdir iterator has always tried to ignore .git files, but
it turns out there were some bugs. This makes it more robust at
ignoring .git files.
This also makes iterators always check ".git" case insensitively
regardless of the properties of the system. This will make libgit2
skip ".GIT" and the like. This is different from core git, but on
systems with case insensitive but case preserving file systems,
allowing ".GIT" to be added is problematic.
This checks for a leading '.' before looking for the invalid
tree entry names. Even on pretty high levels of optimization,
this seems to make a measurable improvement.
I accidentally used && in the check initially instead of || and
while debugging ended up improving the error reporting of issues
with adding tree entries. I thought I'd leave those changes, too.