The frontend is in charge of calculating the id of the objects. Thus
the backends should treat it as a read-only value. The positioning in
the function signature made it seem as though it was an output
parameter.
Make the id const and move it from the front to behind the subject
(backend or stream).
Hash the data as it's coming into the stream and tell the backend what
its name is when finalizing the write. This makes it consistent with
the way a plain git_odb_write() performs the write.
This is in preparation for moving the hashing to the frontend, which
requires us to handle the incoming data before passing it to the
backend's stream.
By zeroing out the memory when we free larger objects (i.e. those
that serve as collections of other data, such as repos, odb, refdb),
I'm hoping that it will be easier for libgit2 bindings to find
errors in their object management code.
There are some cases, particularly where no loaded ODB backends
support a particular operation, where we would return an error
code without having set an error. This catches those cases and
reports that no ODB backends support the operation in question.
This adds create and free callback to the git_objects_table so
that more of the creation and destruction of objects can be table
driven instead of using switch statements. This also makes the
semantics of certain object creation functions consistent so that
we can make better use of function pointers. This also fixes a
theoretical error case where an object allocation fails and we
end up storing NULL into the cache.
This moves some of the odb_backend stuff that is related to the
internals of an odb_backend implementation into include/git2/sys.
Some of the stuff related to streaming I left in include/git2
because it seemed like it would be reasonably needed by a normal
user who wanted to stream objects into and out of the ODB.
Also, I added APIs for traversing the list of backends so that
some of the tests would not need to access ODB internals.
Currently, the odb cache has a fixed size of 128 slots as defined by
GIT_DEFAULT_CACHE_SIZE. Allow users to set the size of the cache via
git_libgit2_opts().
Fixes#1035.
Implicit type conversion argument of function to size_t type
Suspicious sequence of types castings: size_t -> int -> size_t
Consider reviewing the expression of the 'A = B == C' kind. The expression is calculated as following: 'A = (B == C)'
Unsigned type is never < 0
All the ODB backends have a specific refresh interface. When reading an
object, first we attempt every single backend: if the read fails, then
we refresh all the backends and retry the read one more time to see if
the object has appeared.
The new API allows us to read the object bit by bit from the packfile,
instead of needing it all at once in the packfile. This also allows us
to hash the object as it comes in from the network instead of having
to try to read it all and failing repeatedly for larger objects.
This is only the first step, but it already shows huge improvements
when dealing with objects over a few megabytes in size. It reduces the
memory needs in some cases, but delta objects still need to be
completely in memory and the old inefficent method is still used for
that.
Fixed some minor `git_repository_hashfile` issues:
- Fixed incorrect doc (saying that repo could be NULL)
- Added checking of object type value to acceptable ones
- Added more tests for various parameter permutations
Often `git_odb_read_header` will "fail" and have to read the
entire object into memory instead of just the header. When this
happens, the object is loaded and then disposed of immediately,
which makes it difficult to efficiently use the header information
to decide if the object should be loaded (since attempting to do
so will often result in loading the object twice).
This commit takes the existing code and reorganizes it to have
two new functions:
- `git_odb__read_header_or_object` which acts just like the old
read header function except that it returns the object, too, if
it was forced to load the whole thing. It then becomes the
callers responsibility to free the `git_odb_object`.
- `git_object__from_odb_object` which was extracted from the old
`git_object_lookup` and creates a subclass of `git_object` from
an existing `git_odb_object` (separating the ODB lookup from the
`git_object` creation). This allows you to use the first header
reading function efficiently without instantiating the
`git_odb_object` twice.
There is no net change to the behavior of any of the existing
functions, but this allows internal code to tap into the ODB
lookup and object creation to be more efficient.
This adds support to diff and status for running filters (a la crlf)
on blobs in the workdir before computing SHAs and before generating
text diffs. This ended up being a bit more code change than I had
thought since I had to reorganize some of the diff logic to minimize
peak memory use when filtering blobs in a diff.
This also adds a cap on the maximum size of data that will be loaded
to diff. I set it at 512Mb which should match core git. Right now
it is a #define in src/diff.h but it could be moved into the public
API if desired.