The goal of this work is to rewrite git_status_file to use the
same underlying code as git_status_foreach.
This is done in 3 phases:
1. Extend iterators to allow ranged iteration with start and
end prefixes for the range of file names to be covered.
2. Improve diff so that when there is a pathspec and there is
a common non-wildcard prefix of the pathspec, it will use
ranged iterators to minimize excess iteration.
3. Rewrite git_status_file to call git_status_foreach_ext
with a pathspec that covers just the one file being checked.
Since ranged iterators underlie the status & diff implementation,
this is actually fairly efficient. The workdir iterator does
end up loading the contents of all the directories down to the
single file, which should ideally be avoided, but it is pretty
good.
This also includes droping `git_buf_lasterror` because it makes no sense
in the new system. Note that in most of the places were it has been
dropped, the code needs cleanup. I.e. GIT_ENOMEM is going away, so
instead it should return a generic `-1` and obviously not throw
anything.
This takes all of the functions that look up simple data about
paths (such as `git_futils_isdir`) and moves them over to path.h
(becoming `git_path_isdir`). This leaves fileops.h just with
functions that actually manipulate the filesystem or look at
the file contents in some way.
As part of this, the dir.h header which is really just for win32
support was moved into win32 (with some minor changes).
Currently, diff_index passes the full relative path from the
repository root to the callback. In case of an addition, it passes
the tree entry instead of the index entry.
This change fixes the path used for addition, and it passes only
the basename of the path. This mimics the current behavior of
git_tree_diff.
This converts virtually all of the places that allocate GIT_PATH_MAX
buffers on the stack for manipulating paths to use git_buf objects
instead. The patch is pretty careful not to touch the public API
for libgit2, so there are a few places that still use GIT_PATH_MAX.
This extends and changes some details of the git_buf implementation
to add a couple of extra functions and to make error handling easier.
This includes serious alterations to all the path.c functions, and
several of the fileops.c ones, too. Also, there are a number of new
functions that parallel existing ones except that use a git_buf
instead of a stack-based buffer (such as git_config_find_global_r
that exists alongsize git_config_find_global).
This also modifies the win32 version of p_realpath to allocate whatever
buffer size is needed to accommodate the realpath instead of hardcoding
a GIT_PATH_MAX limit, but that change needs to be tested still.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Signed-off-by: Vicent Marti <tanoku@gmail.com>
# Please enter the commit message for your changes. Lines starting
# with '#' will be ignored, and an empty message aborts the commit.
#
# Author: Carlos Martín Nieto <carlos@cmartin.tk>
#
# On branch development
# Your branch is ahead of 'origin/development' by 11 commits.
#
# Changes to be committed:
# (use "git reset HEAD^1 <file>..." to unstage)
#
# modified: include/git2/tree.h
# modified: src/tree.c
# modified: tests-clay/clay_main.c
# modified: tests-clay/object/tree/diff.c
#
# Untracked files:
# (use "git add <file>..." to include in what will be committed)
#
# 0001-remote-Cleanup-the-remotes-code.patch
# 466.patch
# 466.patch.1
# 488.patch
# Makefile
# libgit2.0.15.0.dylib
# libgit2.0.dylib
# libgit2.dylib
# libgit2_clay
# libgit2_test
# tests-clay/object/tree/
For each difference in the trees, the callback gets called with the
relevant information so the user can fill in their own data
structures.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
The ownership semantics have been changed all over the library to be
consistent. There are no more "borrowed" or duplicated references.
Main changes:
- `git_repository_open2` and `3` have been dropped.
- Added setters and getters to hotswap all the repository owned
objects:
`git_repository_index`
`git_repository_set_index`
`git_repository_odb`
`git_repository_set_odb`
`git_repository_config`
`git_repository_set_config`
`git_repository_workdir`
`git_repository_set_workdir`
Now working directories/index files/ODBs and so on can be
hot-swapped after creating a repository and between operations.
- All these objects now have proper ownership semantics with
refcounting: they all require freeing after they are no longer
needed (the repository always keeps its internal reference).
- Repository open and initialization has been updated to keep in
mind the configuration files. Bare repositories are now always
detected, and a default config file is created on init.
- All the tests affected by these changes have been dropped from the
old test suite and ported to the new one.
Taking advantage of the tree cache, git_tree_create_fromindex becomes
comparable in speed to git write-tree when the cache is available.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
1. The license header is technically not valid if it doesn't have a
copyright signature.
2. The COPYING file has been updated with the different licenses used in
the project.
3. The full GPLv2 header in each file annoys me.
There is no point in reinventing the wheel when using the treebuilder
is much more straightforward and makes the code more readable. There
is no optimisation, and the performance is no worse than when writing
the tree object ourselves.
/home/kas/git/public/libgit2/src/tree.c: In function ‘entry_search_cmp’:
/home/kas/git/public/libgit2/src/tree.c:47:36: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
/home/kas/git/public/libgit2/src/tree.c: In function ‘git_treebuilder_remove’:
/home/kas/git/public/libgit2/src/tree.c:443:31: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
The old matcher was returning fake matches when given stupid entry
names. E.g.
`git2` could be matched by `git2 /`, `git2/foobar`, git2/////`
and other stupid stuff
Fixes#127 (that was quite an outstanding issue).
Rationale:
The tree objects on Git are stored and read following a very specific
sorting algorithm that places folders before files. That original sort
was the sort we were storing on memory, but this sort was being queried
with a binary search that used a simple `strcmp` for comparison, so
there were many instances where the search was failing.
Obviously, the most straightforward way to fix this is changing the
binary search CB to use the same comparison method as the sorting CB.
The problem with this is that the binary search callback compares a path
and an entry, so there is no way to know if the given path is a folder
or a standard file.
How do we work around this? Instead of splitting the `entry_byname`
method in two (one for searching directories and one for searching
normal files), we just assume that the path we are searching for is of
the same kind as the path it's being compared at the moment.
return git_futils_cmp_path(
ksearch->filename, ksearch->filename_len, entry->attr & 040000,
entry->filename, entry->filename_len, entry->attr & 040000);
Since there cannot be a folder and a regular file with the same name on
the same tree, the most basic equality check will always fail
for all comparsions, until our path is compared with the actual entry we
are looking for; in this case, the matching will succeed with the file
type of the entry -- whatever it was initially.
I hope that makes sense.
PS: While I was at it, I switched the cmp methods to use cached values
for the length of each filename. That makes searches and sorts
retardedly fast -- I was wondering the reason of the performance hiccups
on massive trees; it's because of 2*strlen for each comparsion call.
DIRECT WRITES ARE BACK AND FASTER THAN EVER. The streaming writer to the
ODB was an overkill for the smaller objects like Commit and Tags; most
of the streaming logic was taking too long.
This commit makes Commits, Tags and Trees to be built-up in memory, and
then written to disk in 2 pushes (header + data), instead of streaming
everything.
This is *always* faster, even for big files (since the git_filebuf class
still does streaming writes when the memory cache overflows). This is
also a gazillion lines of code smaller, because we don't have to
precompute the final size of the object before starting the stream (this
was kind of defeating the point of streaming, anyway).
Blobs are still written with full streaming instead of loading them in
memory, since this is still the fastest way.
A new `git_buf` class has been added. It's missing some features, but
it'll get there.
Drop the GLibc implementation of Merge Sort and replace it with Timsort.
The algorithm has been tuned to work on arrays of pointers (void **),
so there's no longer a need to abstract the byte-width of each element
in the array.
All the comparison callbacks now take pointers-to-elements, not
pointers-to-pointers, so there's now one less level of dereferencing.
E.g.
int index_cmp(const void *a, const void *b)
{
- const git_index_entry *entry_a = *(const git_index_entry **)(a);
+ const git_index_entry *entry_a = (const git_index_entry *)(a);
The result is up to a 40% speed-up when sorting vectors. Memory usage
remains lineal.
A new `bsearch` implementation has been added, whose callback also
supplies pointer-to-elements, to uniform the Vector API again.
Cleaned up the structure of the whole OS-abstraction layer.
fileops.c now contains a set of utility methods for file management used
by the library. These are abstractions on top of the original POSIX
calls.
There's a new file called `posix.c` that contains
emulations/reimplementations of all the POSIX calls the library uses.
These are prefixed with `p_`. There's a specific posix file for each
platform (win32 and unix).
All the path-related methods have been moved from `utils.c` to `path.c`
and have their own prefix.
Magic constant replaced by direct to-string covertion because of:
1) with value length 6 (040000 - subtree) final tree will be corrupted;
2) for wrong values length <6 final tree will be corrupted too.
Hey. Apologies in advance -- I broke your bindings.
This is a major commit that includes a long-overdue redesign of the
whole object-database structure. This is expected to be the last major
external API redesign of the library until the first non-alpha release.
Please get your bindings up to date with these changes. They will be
included in the next minor release. Sorry again!
Major features include:
- Real caching and refcounting on parsed objects
- Real caching and refcounting on objects read from the ODB
- Streaming writes & reads from the ODB
- Single-method writes for all object types
- The external API is now partially thread-safe
The speed increases are significant in all aspects, specially when
reading an object several times from the ODB (revwalking) and when
writing big objects to the ODB.
Here's a full changelog for the external API:
blob.h
------
- Remove `git_blob_new`
- Remove `git_blob_set_rawcontent`
- Remove `git_blob_set_rawcontent_fromfile`
- Rename `git_blob_writefile` -> `git_blob_create_fromfile`
- Change `git_blob_create_fromfile`:
The `path` argument is now relative to the repository's working dir
- Add `git_blob_create_frombuffer`
commit.h
--------
- Remove `git_commit_new`
- Remove `git_commit_add_parent`
- Remove `git_commit_set_message`
- Remove `git_commit_set_committer`
- Remove `git_commit_set_author`
- Remove `git_commit_set_tree`
- Add `git_commit_create`
- Add `git_commit_create_v`
- Add `git_commit_create_o`
- Add `git_commit_create_ov`
tag.h
-----
- Remove `git_tag_new`
- Remove `git_tag_set_target`
- Remove `git_tag_set_name`
- Remove `git_tag_set_tagger`
- Remove `git_tag_set_message`
- Add `git_tag_create`
- Add `git_tag_create_o`
tree.h
------
- Change `git_tree_entry_2object`:
New signature is `(git_object **object_out, git_repository *repo, git_tree_entry *entry)`
- Remove `git_tree_new`
- Remove `git_tree_add_entry`
- Remove `git_tree_remove_entry_byindex`
- Remove `git_tree_remove_entry_byname`
- Remove `git_tree_clearentries`
- Remove `git_tree_entry_set_id`
- Remove `git_tree_entry_set_name`
- Remove `git_tree_entry_set_attributes`
object.h
------------
- Remove `git_object_new
- Remove `git_object_write`
- Change `git_object_close`:
This method is now *mandatory*. Not closing an object causes a
memory leak.
odb.h
-----
- Remove type `git_rawobj`
- Remove `git_rawobj_close`
- Rename `git_rawobj_hash` -> `git_odb_hash`
- Change `git_odb_hash`:
New signature is `(git_oid *id, const void *data, size_t len, git_otype type)`
- Add type `git_odb_object`
- Add `git_odb_object_close`
- Change `git_odb_read`:
New signature is `(git_odb_object **out, git_odb *db, const git_oid *id)`
- Change `git_odb_read_header`:
New signature is `(size_t *len_p, git_otype *type_p, git_odb *db, const git_oid *id)`
- Remove `git_odb_write`
- Add `git_odb_open_wstream`
- Add `git_odb_open_rstream`
odb_backend.h
-------------
- Change type `git_odb_backend`:
New internal signatures are as follows
int (* read)(void **, size_t *, git_otype *, struct git_odb_backend *, const git_oid *)
int (* read_header)(size_t *, git_otype *, struct git_odb_backend *, const git_oid *)
int (* writestream)(struct git_odb_stream **, struct git_odb_backend *, size_t, git_otype)
int (* readstream)( struct git_odb_stream **, struct git_odb_backend *, const git_oid *)
- Add type `git_odb_stream`
- Add enum `git_odb_streammode`
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All `git_object` instances looked up from the repository are reference
counted. User is expected to use the new `git_object_close` when an
object is no longer needed to force freeing it.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
We now store only one sorting callback that does entry comparison. This
is used when sorting the entries using a quicksort, and when looking for
a specific entry with the new search methods.
The following search methods now exist:
git_vector_search(vector, entry)
git_vector_search2(vector, custom_search_callback, key)
git_vector_bsearch(vector, entry)
git_vector_bsearch2(vector, custom_search_callback, key)
The sorting state of the vector is now stored internally.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The methods previously known as
git_repository_lookup
git_repository_newobject
git_repository_lookup_ref
are now part of their respective namespaces:
git_object_lookup
git_object_new
git_reference_lookup
This makes the API more consistent with the new references API.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Removed `git_tree_add_entry_unsorted`. Now the `git_tree_add_entry`
method doesn't sort the entries array by default; entries are only
sorted lazily when required. This is done automatically by the library
(the `git_tree_sort_entries` call has been removed).
This should improve performance. No point on sorting entries all the time, anyway.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Don't allow access to any tree entries whilst the entries array is
unsorted. We keep track on when the array is unsorted, and any methods
that access the array while it is unsorted now sort the array before
accessing it.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
If plain strcmp is used, as this code did before, the final sorting may
end up different from what git-add would do (for example, 'boost'
appearing before 'boost-build.jam', because Git sorts as if it were
spelled 'boost/').
If the sorting is incorrect like this, Git 1.7.4 insists that unmodified
files have been modified. For example, my test repository has these
four entries:
drwxr-xr-x 199 johnw wheel 6766 Feb 2 17:21 boost
-rw-r--r-- 1 johnw wheel 849 Feb 2 17:22 boost-build.jam
-rw-r--r-- 1 johnw wheel 989 Feb 2 17:21 boost.css
-rw-r--r-- 1 johnw wheel 6308 Feb 2 17:21 boost.png
Here is the output from git-ls-tree for these files, in a commit tree
created using git-add and git-commit:
100644 blob 8b8775433aef73e9e12609610ae2e35cf1e7ec2c boost-build.jam
100644 blob 986c4050fa96d825a1311c8e871cdcc9a3e0d2c3 boost.css
100644 blob b4d51fcd5c9149fd77f5ca6ed2b6b1b70e8fe24f boost.png
040000 tree 46537eeaa4d577010f19b1c9e940cae9a670ff5c boost
Here is the output for the same commit produced using libgit2:
040000 tree c27c0fd1436f28a6ba99acd0a6c17d178ed58288 boost
100644 blob 8b8775433aef73e9e12609610ae2e35cf1e7ec2c boost-build.jam
100644 blob 986c4050fa96d825a1311c8e871cdcc9a3e0d2c3 boost.css
100644 blob b4d51fcd5c9149fd77f5ca6ed2b6b1b70e8fe24f boost.png
Due to this reordering, git-status claims the three blobs are always
modified, no matter what I do using git-read-tree or git-reset or
git-checkout to update the index.
Yes, we are breaking the API. Alpha software, deal with it.
We need a way of getting a pointer to each newly added entry to the
index, because manually looking up the entry after creation is
outrageously expensive.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
In-memory tree objects were not being properly initialized, because the
internal entries vector was created on the 'parse' method.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Libgit2 is now officially include as
#include "<git2.h>"
or indidividual files may be included as
#include <git2/index.h>
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The maze with include dependencies has been fixed.
There is now a global include:
#include <git.h>
The git_odb_backend API has been exposed.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All the operations on the 'git_index_entry' array and the
'git_tree_entry' array have been refactored into common code in the
src/vector.c file.
The new vector methods support:
- insertion: O(1) (avg)
- deletion: O(n)
- searching: O(logn)
- sorting: O(logn)
- r. access: O(1)
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All initialization functions now return error codes instead of pointers.
Error codes are now properly propagated on most functions. Several new
and more specific error codes have been added in common.h
Signed-off-by: Vicent Marti <tanoku@gmail.com>
String mememory is now managed in a much more sane manner.
Fixes include:
- git_person email and name is no longer limited to 64 characters
- git_tree_entry filename is no longer limited to 255 characters
- raw objects are properly opened & closed the minimum amount of
times required for parsing
- unit tests no longer leak
- removed 5 other misc memory leaks as reported by Valgrind
- tree writeback no longer segfaults on rare ocassions
The git_person struct is no longer public. It is now managed by the
library, and getter methods are in place to access its internal
attributes.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The tree array wasn't being initialized when instantiating a tree object
in memory instead of loading it from disk.
New unit tests added to check for the problem.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
git_tree_entry_byname was dereferencing a NULL pointer when the searched
file couldn't be found on the tree.
New test cases have been added to check for entry access methods.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All the setter methods for git_tree have been added, including the
setters for attributes on each git_tree_entry and methods to add/remove
entries of the tree.
Modified trees and trees created in-memory from scratch can be written
back to the repository using git_object_write().
Signed-off-by: Vicent Marti <tanoku@gmail.com>