Commit Graph

230 Commits

Author SHA1 Message Date
Patrick Steinhardt
e0973bc0fc odb: verify hashes in read_prefix_1
While the function reading an object from the complete OID already
verifies OIDs, we do not yet do so for reading objects from a partial
OID. Do so when strict OID verification is enabled.
2017-04-28 14:10:37 +02:00
Patrick Steinhardt
141096202b odb: improve error handling in read_prefix_1
The read_prefix_1 function has several return statements sprinkled
throughout the code. As we have to free memory upon getting an error,
the free code has to be repeated at every single return -- which it is
not, so we have a memory leak here.

Refactor the code to use the typical `goto out` pattern, which will free
data when an error has occurred. While we're at it, we can also improve
the error message thrown when multiple ambiguous prefixes are found. It
will now include the colliding prefixes.
2017-04-28 14:10:37 +02:00
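For readers unfamiliar with the `goto out` pattern named in 141096202b, a minimal sketch follows. It is not the actual `read_prefix_1` code; `copy_nonempty` and its parameters are hypothetical and only illustrate how a single exit label keeps the cleanup in one place.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not libgit2 code: copies `src` into a fresh buffer
 * and fails when `src` is empty. Every exit passes through `out`. */
static int copy_nonempty(char **result, const char *src)
{
	char *buf = NULL;
	int error = -1;

	*result = NULL;

	buf = malloc(strlen(src) + 1);
	if (buf == NULL)
		goto out;

	strcpy(buf, src);

	if (buf[0] == '\0')
		goto out;      /* error path: buf is freed below */

	*result = buf;
	buf = NULL;        /* ownership moved to the caller */
	error = 0;

out:
	free(buf);         /* free(NULL) is a no-op on the success path */
	return error;
}
```

Because the buffer is released only at `out`, adding another early exit cannot reintroduce the leak as long as it jumps to the label instead of returning directly.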
Patrick Steinhardt
35079f507b odb: add option to turn off hash verification
Verifying hashsums of objects we are reading from the ODB may be costly
as we have to perform an additional hashsum calculation on the object.
Especially when reading large objects, the penalty can be as high as
35%, as can be seen when executing the equivalent of `git cat-file` with
and without verification enabled. To mitigate this, we add a global
option for libgit2 which enables the developer to turn off the
verification, e.g. when they can be reasonably sure that the objects on
disk won't be corrupted.
2017-04-28 14:05:45 +02:00
Patrick Steinhardt
28a0741f1a odb: verify object hashes
The upstream git.git project verifies objects when looking them up from
disk. This avoids scenarios where objects have somehow become corrupt on
disk, e.g. due to hardware failures or bit flips. While our mantra is
usually to follow upstream behavior, we do not do so in this case, as we
never check hashes of objects we have just read from disk.

To fix this, we create a new error class `GIT_EMISMATCH` which denotes
that we have looked up an object with a hashsum mismatch. `odb_read_1`
will then, after having read the object from its backend, hash the
object and compare the resulting hash to the expected hash. If hashes do
not match, it will return an error.

This obviously introduces another computation of checksums and could
potentially impact performance. Note though that we usually perform I/O
operations directly before doing this computation, and as such the
actual overhead should be drowned out by I/O. Running our test suite
seems to confirm this guess. On a Linux system with best-of-five
timings, we had 21.592s with the check enabled and 21.590s with the
check disabled. Note though that our test suite mostly contains very
small blobs only. It is expected that repositories with bigger blobs may
notice an increased hit by this check.

In addition to a new test, we also had to change the
odb::backend::nonrefreshing test suite, which now triggers a hashsum
mismatch when looking up the commit "deadbeef...". This is expected, as
the fake backend allocated inside of the test will return an empty
object for the OID "deadbeef...", which will obviously not hash back to
"deadbeef..." again. We can simply adjust the hash to equal the hash of
the empty object here to fix this test.
2017-04-28 14:05:45 +02:00
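The read-then-verify step described in 28a0741f1a, together with the global switch from 35079f507b, can be sketched roughly as below. This is an illustration under loud assumptions: the checksum is FNV-1a standing in for SHA-1, and `sketch_hash`, `sketch_verify`, `sketch_strict_verification` and `SKETCH_EMISMATCH` are hypothetical names, not libgit2 symbols.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_OK          0
#define SKETCH_EMISMATCH (-1)  /* stand-in for the GIT_EMISMATCH error class */

/* Stand-in checksum (FNV-1a, 32 bit). The real ODB hashes with SHA-1. */
static uint32_t sketch_hash(const unsigned char *data, size_t len)
{
	uint32_t h = 2166136261u;
	size_t i;

	for (i = 0; i < len; i++) {
		h ^= data[i];
		h *= 16777619u;
	}
	return h;
}

/* Hypothetical global switch: non-zero means "verify what we read". */
static int sketch_strict_verification = 1;

/* After a backend has returned `data`, re-hash it and compare the result
 * with the id the caller asked for. */
static int sketch_verify(uint32_t expected, const unsigned char *data, size_t len)
{
	if (!sketch_strict_verification)
		return SKETCH_OK;

	return sketch_hash(data, len) == expected ? SKETCH_OK : SKETCH_EMISMATCH;
}
```

With verification switched off the extra hash computation is skipped entirely, which is the trade-off the option exposes.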
Edward Thomson
6fd6c67824 Merge pull request #4030 from libgit2/ethomson/fsync
fsync all the things
2017-03-22 20:29:22 +00:00
Edward Thomson
52d03f37f7 git_commit_create: freshen tree objects in commit
Freshen the tree object that a commit points to during commit time.
2017-03-03 14:12:00 +00:00
Edward Thomson
1c04a96b25 Honor core.fsyncObjectFiles
2017-03-02 09:11:33 +00:00
Edward Thomson
909d549436 giterr_set: consistent error messages
Error messages should be sentence fragments, and therefore:

1. Should not begin with a capital letter,
2. Should not conclude with punctuation, and
3. Should not end a sentence and begin a new one
2016-12-29 12:26:03 +00:00
Patrick Steinhardt
901434b00f common: cast precision specifiers to int
2016-11-14 10:07:55 +01:00
Edward Thomson
becadafca8 odb: only provide the empty tree
Only provide the empty tree internally, which matches git's behavior.
If we provide the empty blob then any users trying to write it with
libgit2 would omit it from actually landing in the odb, which would appear
to git proper as a broken repository (missing that object).
2016-08-05 19:30:56 -04:00
Edward Thomson
8f09a98e18 odb: freshen existing objects when writing
When writing an object, we calculate its OID and see if it exists in the
object database.  If it does, we need to freshen the file that contains
it.
2016-08-04 15:12:04 -04:00
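The write-or-freshen decision from 8f09a98e18 might look like the sketch below. It assumes a POSIX system and treats "freshen" as bumping the file's modification time; `write_or_freshen` is a hypothetical helper and not the libgit2 implementation, which works on object ids and backends rather than on a plain path.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <utime.h>

/* Hypothetical helper: if `path` already exists, only bump its timestamps;
 * otherwise create it with `data`.
 * Returns 1 if freshened, 0 if newly written, -1 on error. */
static int write_or_freshen(const char *path, const char *data)
{
	struct stat st;
	FILE *f;

	if (stat(path, &st) == 0)
		return utime(path, NULL) == 0 ? 1 : -1;  /* NULL means "now" */

	if ((f = fopen(path, "w")) == NULL)
		return -1;

	if (fputs(data, f) == EOF) {
		fclose(f);
		return -1;
	}

	return fclose(f) == 0 ? 0 : -1;
}
```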
Edward Thomson
20302aa437 Merge pull request #3223 from ethomson/apply
Reading patch files
2016-06-25 23:33:05 -04:00
Sim Domingo
2076d3291c fix error message SHA truncation in git_odb__error_notfound()
2016-06-20 11:15:23 -04:00
Edward Thomson
6a2d2f8aa1 delta: move delta application to delta.c
Move the delta application functions into `delta.c`, next to the
similar delta creation functions.  Make the `git__delta_apply`
functions adhere to other naming and parameter style within the
library.
2016-05-26 13:01:03 -05:00
Vicent Marti
1bbcb2b279 odb: Try to lookup headers in all backends before passthrough
2016-03-09 18:17:37 +01:00
Vicent Marti
e78d2ac939 odb: Refactor git_odb_expand_ids
2016-03-09 16:43:43 +01:00
Vicent Marti
4416aa7749 odb: Implement new helper to read types without refreshing
2016-03-09 16:43:17 +01:00
Vicent Marti
9a78665005 odb: Handle corner cases in git_odb_expand_ids
The old implementation had two issues:

1. OIDs that were so short as to be ambiguous were not being handled
properly.

2. If the last OID to expand in the array was missing from the ODB, we
would leak a `GIT_ENOTFOUND` error code from the function.
2016-03-09 11:00:27 +01:00
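The two corner cases listed in 9a78665005 can be illustrated with a small sketch. It works on hex strings instead of `git_oid` values, and `expand_prefix`, `expand_many` and the `SKETCH_*` codes are hypothetical names chosen for this example only.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_ENOTFOUND  (-1)
#define SKETCH_EAMBIGUOUS (-2)

/* Find the single id in `ids` that starts with `prefix`.
 * On a unique match, store its index in *out and return 0. */
static int expand_prefix(size_t *out, const char *const *ids, size_t count,
                         const char *prefix)
{
	size_t i, plen = strlen(prefix), matches = 0;

	for (i = 0; i < count; i++) {
		if (strncmp(ids[i], prefix, plen) == 0) {
			if (matches++ == 0)
				*out = i;
		}
	}

	if (matches == 0)
		return SKETCH_ENOTFOUND;
	if (matches > 1)
		return SKETCH_EAMBIGUOUS;   /* prefix too short to be unique */
	return 0;
}

/* Expand several prefixes at once. A missing or ambiguous entry is recorded
 * per entry (found[i] == 0) instead of leaking out as the return value,
 * even when it is the last entry in the array. */
static int expand_many(int *found, size_t *index,
                       const char *const *ids, size_t count,
                       const char *const *prefixes, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		found[i] = (expand_prefix(&index[i], ids, count, prefixes[i]) == 0);

	return 0;
}
```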
Edward Thomson
62484f52d1 git_odb_expand_ids: accept git_odb_expand_id array
Take (and write to) an array of a struct, `git_odb_expand_id`.
2016-03-08 14:57:20 -05:00
Edward Thomson
4b1f0f79ac git_odb_expand_ids: rename func, return the type
2016-03-08 11:44:21 -05:00
Edward Thomson
6c04269c8f git_odb_exists_many_prefixes: query odb for multiple short ids
Query the object database for multiple objects at a time, given their
object ID (which may be abbreviated) and optional type.
2016-03-07 16:10:25 -05:00
Edward Thomson
e10144ae57 odb: improved not found error messages
When looking up an abbreviated oid, show the actual (abbreviated) oid
the caller passed instead of a full (but ambiguously truncated) oid.
2016-03-07 10:20:01 -05:00
Vicent Marti
a0a1b19ab0 odb: Prioritize alternate backends
For most real use cases, repositories with alternates use them as main
object storage. Checking the alternate for objects before the main
repository should result in measurable speedups.

Because of this, we're changing the sorting algorithm to prioritize
alternates *in cases where two backends have the same priority*. This
means that the pack backend for the alternate will be checked before the
pack backend for the main repository *but* both of them will be checked
before any loose backends.
2015-10-14 20:53:01 +02:00
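The tie-breaking rule from a0a1b19ab0 can be sketched as a `qsort` comparator. The struct layout and the priority numbers are assumptions made for this example (packs are given the higher priority here); only the ordering rule itself is taken from the commit message.

```c
#include <stdlib.h>

/* Hypothetical, simplified view of a registered backend. */
typedef struct {
	const char *name;
	int priority;      /* higher values are checked first */
	int is_alternate;  /* 1 if the backend belongs to an alternate */
} sketch_backend;

static int sketch_backend_cmp(const void *a, const void *b)
{
	const sketch_backend *ba = a, *bb = b;

	if (ba->priority != bb->priority)
		return bb->priority - ba->priority;      /* higher priority first */

	return bb->is_alternate - ba->is_alternate;  /* alternates win ties */
}

static void sketch_sort_backends(sketch_backend *backends, size_t n)
{
	qsort(backends, n, sizeof(*backends), sketch_backend_cmp);
}
```

Sorting a main and an alternate repository, each with one pack and one loose backend, yields the alternate's pack backend, the main pack backend, the alternate's loose backend and finally the main loose backend.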
Vicent Marti
43820f204e odb: Be smarter when refreshing backends
In the current implementation of ODB backends, each backend is tasked
with refreshing itself after a failed lookup. This is standard Git
behavior: we want to e.g. reload the packfiles on disk in case they have
changed and that's the reason we can't find the object we're looking
for.

This behavior, however, becomes pathological in repositories where
multiple alternates have been loaded. Given that each alternate counts
as a separate backend, a miss in the main repository (which can
potentially be very frequent in cases where object storage comes from
the alternate) will result in refreshing all its packfiles before we
move on to the alternate backend where the object will most likely be
found.

To fix this, the code in `odb.c` has been refactored to perform the
refresh of all the backends externally, once we've verified that the
object is nowhere to be found.

If the refresh is successful, we then perform the lookup sequentially
through all the backends, skipping the ones that we know for sure
weren't refreshed (because they have no refresh API).

The on-disk pack backend has been adjusted accordingly: it no longer
performs refreshes internally.
2015-10-14 19:24:07 +02:00
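A rough model of the lookup flow described in 43820f204e is sketched below. Backends are reduced to plain structs holding one object each, and all names (`rb_backend`, `rb_read`, `rb_read_pass`) are hypothetical; the point is only the control flow: try every backend first, refresh once from the outside, then retry the backends that were actually refreshed.

```c
#include <stddef.h>

/* Hypothetical backend model: it "knows" one object and would discover
 * another one after reloading its on-disk state. */
typedef struct {
	int loaded_id;    /* object the backend currently knows about */
	int on_disk_id;   /* object it would see after a refresh */
	int can_refresh;  /* backend offers a refresh API */
	int refreshes;    /* how often it has been refreshed */
} rb_backend;

static int rb_read_pass(rb_backend *b, size_t n, int id, int only_refreshed)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (only_refreshed && !b[i].can_refresh)
			continue;   /* nothing can have changed in this backend */
		if (b[i].loaded_id == id)
			return 0;
	}
	return -1;
}

static int rb_read(rb_backend *b, size_t n, int id)
{
	size_t i;

	/* First pass: a miss in one backend does not trigger a refresh. */
	if (rb_read_pass(b, n, id, 0) == 0)
		return 0;

	/* The object is nowhere to be found: refresh all backends once. */
	for (i = 0; i < n; i++) {
		if (b[i].can_refresh) {
			b[i].loaded_id = b[i].on_disk_id;
			b[i].refreshes++;
		}
	}

	return rb_read_pass(b, n, id, 1);
}
```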
Arthur Schreiber
d3b29fb94b refdb and odb backends must provide free function
As refdb and odb backends can be allocated by client code, libgit2
can’t know whether an alternative memory allocator was used, and thus
should not try to call `git__free` on those objects.

Instead, odb and refdb backend implementations must always provide
their own `free` functions to ensure memory gets freed correctly.
2015-10-01 00:50:37 +02:00
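The ownership rule from d3b29fb94b can be sketched as follows. The types are hypothetical stand-ins for the real backend structs, and rejecting a backend without a `free` callback is an assumption of this sketch rather than a statement about how libgit2 reacts.

```c
#include <stdlib.h>

/* Hypothetical base type shared by all backends. */
typedef struct fb_backend {
	void (*free)(struct fb_backend *self);
} fb_backend;

/* A client-defined backend that owns extra memory. */
typedef struct {
	fb_backend parent;   /* must be first so the casts below are valid */
	char *scratch;
} fb_custom;

static int fb_custom_frees = 0;   /* counts calls, for demonstration only */

static void fb_custom_free(fb_backend *self)
{
	fb_custom *c = (fb_custom *)self;

	free(c->scratch);   /* released with the allocator that created it */
	free(c);
	fb_custom_frees++;
}

static fb_backend *fb_custom_new(void)
{
	fb_custom *c = calloc(1, sizeof(*c));

	if (c == NULL)
		return NULL;

	c->scratch = malloc(16);
	c->parent.free = fb_custom_free;
	return &c->parent;
}

/* The owner never frees the backend memory itself; it only calls the
 * callback and refuses backends that do not provide one. */
static int fb_release(fb_backend *b)
{
	if (b == NULL || b->free == NULL)
		return -1;

	b->free(b);
	return 0;
}
```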
Edward Thomson
e5f9df7b0f odb: cast to long long for printf
2015-06-29 21:45:04 +00:00
Pierre-Olivier Latour
9f3c18e2ac Fixed build warnings on Xcode 6.1
2015-06-02 11:49:38 -07:00
Edward Thomson
a6f2ceaf48 Merge pull request #3118 from libgit2/cmn/stream-size
odb: make the writestream's size a git_off_t
2015-05-13 12:11:55 -04:00
Carlos Martín Nieto
b0d7f329a8 odb: reverse the default backend priorities
We currently first look in the loose object dir and then in the packs
for objects. When performing operations on recent history this has a
higher likelihood of hitting, but when we deal with operations which
look further back into the past, we start spending a large amount of
time getting ENOENT from `access`.

Reversing the priorities means that long-running operations can get to
their objects faster, as we can look at the index data we have in memory
(or rather mapped) to figure out whether we have an object, which is
faster than going out to the filesystem.

The packed backend already implements an optimistic read algorithm by
first looking at the packs we know about and only going out to disk to
refresh if the object is not found, which means that in the case where
we do have the object (which will be in the majority for anything that
traverses the graph) we can avoid going to disk entirely to determine
whether an object exists.

Operations which look at recent history may take a slight hit, but
these would be operations which look at far fewer objects and thus take
less time regardless.
2015-05-13 10:23:19 +02:00
Carlos Martín Nieto
77b339f7b6 odb: make the writestream's size a git_off_t
Restricting files to size_t is a silly limitation. The loose backend
writes to a file directly, so there is no issue in using 63 bits for the
size.

We still assume that the header is going to fit in 64 bytes, which does
mean quite a bit smaller files due to the run-length encoding, but it's
still a much larger size than you would want Git to handle.
2015-05-13 09:34:20 +02:00
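As a sketch of writing a 64-bit size into a fixed 64-byte header buffer, as 77b339f7b6 discusses: the layout assumed here is the `<type> <decimal size>` text followed by a NUL byte that git uses for loose objects, and `sketch_format_header` is a hypothetical helper, not the libgit2 function.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

#define SKETCH_HDR_MAX 64

/* Format "<type> <size>" plus the terminating NUL into `hdr`.
 * Returns the header length including the NUL, or -1 if the size is
 * negative or the header does not fit into `n` bytes. */
static int sketch_format_header(char *hdr, size_t n, const char *type,
                                int64_t size)
{
	int len;

	if (size < 0)
		return -1;

	len = snprintf(hdr, n, "%s %" PRId64, type, size);
	if (len < 0 || (size_t)len >= n)
		return -1;

	return len + 1;
}
```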
J Wyman
7dd2253826 centralizing all IO buffer size values
2015-05-11 10:32:08 -07:00
Edward Thomson
f1453c59b2 Make our overflow check look more like gcc/clang's
Make our overflow checking look more like gcc and clang's, so that
we can substitute it out with the compiler intrinsics on platforms
that support it.  This means dropping the ability to pass `NULL` as
an out parameter.

As a result, the macros also get updated to reflect this as well.
2015-02-13 09:27:33 -05:00
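An overflow check shaped like the gcc/clang builtins, as f1453c59b2 describes, might look like the sketch below. `sketch_add_sizet_overflow` is a hypothetical name; the compiler test used to select `__builtin_add_overflow` is an assumption about which toolchains provide it, and the portable branch is the fallback.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if a + b overflows size_t. On success the sum is stored in
 * *out. `out` must not be NULL, and its value after an overflow should not
 * be relied upon. */
static bool sketch_add_sizet_overflow(size_t *out, size_t a, size_t b)
{
#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 5)
	return __builtin_add_overflow(a, b, out);
#else
	if (a > SIZE_MAX - b)
		return true;

	*out = a + b;
	return false;
#endif
}
```

Keeping the out parameter mandatory is what allows both branches to share one signature, since the builtin always writes through its result pointer.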
Edward Thomson
15d54fdd34 odb__hashlink: check st.st_size before casting
2015-02-12 22:54:46 -05:00
Edward Thomson
392702ee2c allocations: test for overflow of requested size
Introduce some helper macros to test integer overflow from arithmetic
and set error message appropriately.
2015-02-12 22:54:46 -05:00
Edward Thomson
c251f3bbe7 win32: remember to cleanup our hash_ctx
2014-12-09 12:04:47 -05:00
Vicent Marti
e015665142 odb: git_odb_object contents are never NULL
This is a contract that we made in the library and that we need to uphold. The
contents of a blob can never be NULL because several parts of the library (including
the filter and attributes code) expect `git_blob_rawcontent` to always return a
valid pointer.
2014-11-21 14:09:53 +01:00
Carlos Martín Nieto
e1ac010148 odb: hardcode the empty blob and tree
git hardcodes these as objects which exist regardless of whether they
are in the odb and uses them in the shell interface as a way of
expressing the lack of a blob or tree for one side of e.g. a diff.

In the library we use each language's natural way of declaring a lack of
value which makes a workaround like this unnecessary. Since git uses it,
it does however mean each shell application would need to perform this
check themselves.

This makes it common work across a range of applications and an issue
with compatibility with git, which fits right into what the library aims
to provide.

Thus we introduce the hard-coded empty blob and tree in the odb
frontend. These hard-coded objects are checked for before going to the
backends, but after the cache check, which means the second time they're
used, they will be treated as normal cached objects instead of creating
new ones.
2014-11-08 20:53:38 +01:00
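The lookup order described in e1ac010148 (cache, then hard-coded objects, then backends) can be sketched as below. The two ids are the well-known SHA-1 ids of the empty blob and the empty tree; everything else (`sk_hardcoded`, `sk_read`, the callback type and the `sk_miss` stub) is hypothetical scaffolding for the example.

```c
#include <stddef.h>
#include <string.h>

#define EMPTY_BLOB_ID "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
#define EMPTY_TREE_ID "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

typedef enum { SK_NONE = 0, SK_BLOB, SK_TREE } sk_type;

/* Returns the type of the hard-coded object, or SK_NONE. */
static sk_type sk_hardcoded(const char *hex_id)
{
	if (strcmp(hex_id, EMPTY_BLOB_ID) == 0)
		return SK_BLOB;
	if (strcmp(hex_id, EMPTY_TREE_ID) == 0)
		return SK_TREE;
	return SK_NONE;
}

/* Hypothetical lookup callback: returns 1 if the id was found. */
typedef int (*sk_lookup_fn)(const char *hex_id);

/* Stub that never finds anything, standing in for an empty cache or odb. */
static int sk_miss(const char *hex_id)
{
	(void)hex_id;
	return 0;
}

/* Reports where the object was found, or NULL if it is missing. */
static const char *sk_read(const char *hex_id, sk_lookup_fn cache,
                           sk_lookup_fn backends)
{
	if (cache(hex_id))
		return "cache";
	if (sk_hardcoded(hex_id) != SK_NONE)
		return "hardcoded";
	if (backends(hex_id))
		return "backend";
	return NULL;
}
```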
Carlos Martín Nieto
530594c0aa odb: clear backend errors on successful read
We go through the different backends in order, so it's not an error if
at least one of the backends has the data we want.
2014-05-23 06:01:57 +02:00
Russell Belfer
bc91347b58 Fix remaining init_options inconsistencies
There were a couple of "init_opts()" functions and a few more cases
of structure initialization that I somehow missed.
2014-05-02 09:21:33 -07:00
Jacques Germishuys
48e60ae75e Don't redefine the same callback types, their signatures may change
2014-04-21 11:28:49 +02:00
Edward Thomson
3ab5781601 Merge pull request #2178 from libgit2/rb/fix-short-id
Fix git_odb_short_id and git_odb_exists_prefix bugs
2014-03-31 23:23:32 -07:00
Linquize
31a14982a0 Fix wrong assertion
Fixes issue #2196
2014-03-21 17:36:34 +08:00
Russell Belfer
8949907887 Fix a number of git_odb_exists_prefix bugs
The git_odb_exists_prefix API did not correctly handle the case where a
later backend returned GIT_ENOTFOUND even though an earlier backend
had found the object.

Additionally, the unit tests were not properly exercising the API
and had a couple mistakes in checking the results.

Lastly, since the backends are not expected to behave correctly
unless all bytes of the short id are zero except for the prefix,
this makes the ODB prefix APIs explicitly clear out the extra
bytes so the user doesn't have to be as careful.
2014-03-10 11:34:50 -07:00
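Clearing the bytes beyond a short id's prefix, as described in 8949907887, can be sketched like this. The function works on a raw 20-byte id, and `sk_oid_copy_prefix` and `SK_OID_RAWSZ` are hypothetical names used only for this illustration.

```c
#include <stddef.h>
#include <string.h>

#define SK_OID_RAWSZ 20   /* raw size of a SHA-1 id in bytes */

/* Copy the first `hexlen` hex digits of `id` into `out` and zero the rest.
 * Each byte holds two hex digits, so an odd length keeps only the high
 * nibble of the last copied byte. */
static void sk_oid_copy_prefix(unsigned char *out, const unsigned char *id,
                               size_t hexlen)
{
	if (hexlen > SK_OID_RAWSZ * 2)
		hexlen = SK_OID_RAWSZ * 2;

	memset(out, 0, SK_OID_RAWSZ);
	memcpy(out, id, (hexlen + 1) / 2);

	if (hexlen & 1)
		out[hexlen / 2] &= 0xF0;
}
```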
Matthew Bowen
b9f819978c Added function-based initializers for every options struct.
The basic structure of each function is courtesy of arrbee.
2014-03-05 21:49:23 -05:00
Vicent Marti
a064dc2d0b Merge pull request #2159 from libgit2/rb/odb-exists-prefix
Add ODB API to check for existence by prefix and object id shortener
2014-03-06 00:47:05 +01:00
Russell Belfer
26875825df Check short OID len in odb, not in backends
2014-03-05 13:06:22 -08:00
Edward Thomson
7bd2f40154 ODB writing fails gracefully when unsupported
If no ODB backends support writing, we should fail gracefully.
2014-03-05 11:35:47 -08:00
Russell Belfer
f5753999e4 Add exists_prefix to ODB backend and ODB API
2014-03-04 15:34:23 -08:00
Brodie Rao
ae3b6d612d odb: handle NULL pointers passed to git_odb_stream_free
Signed-off-by: Brodie Rao <brodie@sf.io>
2014-01-12 23:33:59 -08:00
Edward Thomson
dd64c71c26 Allow backend consumers to specify file mode
2013-11-04 14:50:25 -05:00