Fixes#127 (that was quite an outstanding issue).
Rationale:
The tree objects on Git are stored and read following a very specific
sorting algorithm that places folders before files. That original sort
was the sort we were storing on memory, but this sort was being queried
with a binary search that used a simple `strcmp` for comparison, so
there were many instances where the search was failing.
Obviously, the most straightforward way to fix this is changing the
binary search CB to use the same comparison method as the sorting CB.
The problem with this is that the binary search callback compares a path
and an entry, so there is no way to know if the given path is a folder
or a standard file.
How do we work around this? Instead of splitting the `entry_byname`
method in two (one for searching directories and one for searching
normal files), we just assume that the path we are searching for is of
the same kind as the path it's being compared at the moment.
return git_futils_cmp_path(
ksearch->filename, ksearch->filename_len, entry->attr & 040000,
entry->filename, entry->filename_len, entry->attr & 040000);
Since there cannot be a folder and a regular file with the same name on
the same tree, the most basic equality check will always fail
for all comparsions, until our path is compared with the actual entry we
are looking for; in this case, the matching will succeed with the file
type of the entry -- whatever it was initially.
I hope that makes sense.
PS: While I was at it, I switched the cmp methods to use cached values
for the length of each filename. That makes searches and sorts
retardedly fast -- I was wondering the reason of the performance hiccups
on massive trees; it's because of 2*strlen for each comparsion call.
mode field of git_index_entry_unmerged is array of unsigned ints. It's
unsafe to cast pointer to an element of the array to long int *. It may
cause overflow in git_strtol32().
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Type casting usually points to some trick or bug. It's better not hide
it between useless type castings.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
The `hashfile` function has been moved to ODB, next to `git_odb_hash`.
Global state has been removed from the dirent call in `status.c`,
because global state is killing the rainforest and causing global
warming.
Add git_status_hashfile() to get blob's object id for a file without adding
it to the object database or needing a repository at all.
This functionality is similar to `git hash-object` without '-w'.
The direct-writes commit left some (slow) internals methods that
were no longer needed. These have been removed.
Also, the Reflog code was using the old `git_signature__write`, so
it has been rewritten to use a normal buffer and the new `writebuf`
signature writer. It's now slightly simpler and faster.
DIRECT WRITES ARE BACK AND FASTER THAN EVER. The streaming writer to the
ODB was an overkill for the smaller objects like Commit and Tags; most
of the streaming logic was taking too long.
This commit makes Commits, Tags and Trees to be built-up in memory, and
then written to disk in 2 pushes (header + data), instead of streaming
everything.
This is *always* faster, even for big files (since the git_filebuf class
still does streaming writes when the memory cache overflows). This is
also a gazillion lines of code smaller, because we don't have to
precompute the final size of the object before starting the stream (this
was kind of defeating the point of streaming, anyway).
Blobs are still written with full streaming instead of loading them in
memory, since this is still the fastest way.
A new `git_buf` class has been added. It's missing some features, but
it'll get there.
Our good, lovely folks at Microsoft decided that there was no good
reason to make `vsnprintf` compilant with the C standard, so that
function in Windows returns -1 on overflow, instead of returning the
actual byte count needed to write the full string.
We now handle this situation more gracefully with the POSIX
compatibility layer, by returning the needed byte size using an
auxiliary method instead of blindly resizing the target buffer until it
fits.
This means we can now support `printf`s of any size by allocating a
temporary buffer. That's good.
So far libgit2 didn't support reference logs (reflog). Add a new
git_reflog_* API for basic reading and writing of reflogs:
* git_reflog_read
* git_reflog_write
* git_reflog_free
Signed-off-by: schu <schu-github@schulog.org>