Hey. Apologies in advance -- I broke your bindings.
This is a major commit that includes a long-overdue redesign of the
whole object-database structure. This is expected to be the last major
external API redesign of the library until the first non-alpha release.
Please get your bindings up to date with these changes. They will be
included in the next minor release. Sorry again!
Major features include:
- Real caching and refcounting on parsed objects
- Real caching and refcounting on objects read from the ODB
- Streaming writes & reads from the ODB
- Single-method writes for all object types
- The external API is now partially thread-safe
The speed increases are significant in all aspects, specially when
reading an object several times from the ODB (revwalking) and when
writing big objects to the ODB.
Here's a full changelog for the external API:
blob.h
------
- Remove `git_blob_new`
- Remove `git_blob_set_rawcontent`
- Remove `git_blob_set_rawcontent_fromfile`
- Rename `git_blob_writefile` -> `git_blob_create_fromfile`
- Change `git_blob_create_fromfile`:
The `path` argument is now relative to the repository's working dir
- Add `git_blob_create_frombuffer`
commit.h
--------
- Remove `git_commit_new`
- Remove `git_commit_add_parent`
- Remove `git_commit_set_message`
- Remove `git_commit_set_committer`
- Remove `git_commit_set_author`
- Remove `git_commit_set_tree`
- Add `git_commit_create`
- Add `git_commit_create_v`
- Add `git_commit_create_o`
- Add `git_commit_create_ov`
tag.h
-----
- Remove `git_tag_new`
- Remove `git_tag_set_target`
- Remove `git_tag_set_name`
- Remove `git_tag_set_tagger`
- Remove `git_tag_set_message`
- Add `git_tag_create`
- Add `git_tag_create_o`
tree.h
------
- Change `git_tree_entry_2object`:
New signature is `(git_object **object_out, git_repository *repo, git_tree_entry *entry)`
- Remove `git_tree_new`
- Remove `git_tree_add_entry`
- Remove `git_tree_remove_entry_byindex`
- Remove `git_tree_remove_entry_byname`
- Remove `git_tree_clearentries`
- Remove `git_tree_entry_set_id`
- Remove `git_tree_entry_set_name`
- Remove `git_tree_entry_set_attributes`
object.h
------------
- Remove `git_object_new
- Remove `git_object_write`
- Change `git_object_close`:
This method is now *mandatory*. Not closing an object causes a
memory leak.
odb.h
-----
- Remove type `git_rawobj`
- Remove `git_rawobj_close`
- Rename `git_rawobj_hash` -> `git_odb_hash`
- Change `git_odb_hash`:
New signature is `(git_oid *id, const void *data, size_t len, git_otype type)`
- Add type `git_odb_object`
- Add `git_odb_object_close`
- Change `git_odb_read`:
New signature is `(git_odb_object **out, git_odb *db, const git_oid *id)`
- Change `git_odb_read_header`:
New signature is `(size_t *len_p, git_otype *type_p, git_odb *db, const git_oid *id)`
- Remove `git_odb_write`
- Add `git_odb_open_wstream`
- Add `git_odb_open_rstream`
odb_backend.h
-------------
- Change type `git_odb_backend`:
New internal signatures are as follows
int (* read)(void **, size_t *, git_otype *, struct git_odb_backend *, const git_oid *)
int (* read_header)(size_t *, git_otype *, struct git_odb_backend *, const git_oid *)
int (* writestream)(struct git_odb_stream **, struct git_odb_backend *, size_t, git_otype)
int (* readstream)( struct git_odb_stream **, struct git_odb_backend *, const git_oid *)
- Add type `git_odb_stream`
- Add enum `git_odb_streammode`
Signed-off-by: Vicent Marti <tanoku@gmail.com>
We now depend on libpthread on all Unix platforms (should be installed
by default) and use a simple wrapper for Windows threads under Win32.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
It's no longer retarded. All object interdependencies are stored as OIDs
instead of actual objects. This should be hundreds of times faster,
specially on big repositories. Heck, who knows, maye it doesn't even
segfault -- wouldn't that be awesome?
What has changed on the API?
`git_commit_parent`, `git_commit_tree`, `git_tag_target` now return
their values through a pointer-to-pointer, and have an error code.
`git_commit_set_tree` and `git_tag_set_target` now return an error
code and may fail.
`git_repository_free__no_gc` has been deprecated because it's
stupid. Since there are no longer any interdependencies between
objects, we don't need internal reference counting, and GC
never fails or double-free's pointers.
`git_object_close` now does a very sane thing: marks an object
as unused. Closed objects will be eventually free'd from the
object cache based on LRU. Please use `git_object_close` from
the garbage collector `destroy` method on your bindings. It's
100% safe.
`git_repository_gc` is a new method that forces a garbage collector
pass through the repo, to free as many LRU objects as possible.
This is useful if we are running out of memory.
The new revision walker uses an internal Commit object storage system,
custom memory allocator and much improved topological and time sorting
algorithms. It's about 20x times faster than the previous implementation
when browsing big repositories.
The following external API calls have changed:
`git_revwalk_next` returns an OID instead of a full commit object.
The initial call to `git_revwalk_next` is no longer blocking when
iterating through a repo with a time-sorting mode.
Iterating with Topological or inverted modes still makes the initial
call blocking to preprocess the commit list, but this block should be
mostly unnoticeable on most repositories (topological preprocessing
times at 0.3s on the git.git repo).
`git_revwalk_push` and `git_revwalk_hide` now take an OID instead
of a full commit object.
Set of methods to find the minimal-length to uniquely identify every OID
in a list. Useful for GUI applications, commit logs and so on.
Includes stress test.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All `git_object` instances looked up from the repository are reference
counted. User is expected to use the new `git_object_close` when an
object is no longer needed to force freeing it.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
We now store only one sorting callback that does entry comparison. This
is used when sorting the entries using a quicksort, and when looking for
a specific entry with the new search methods.
The following search methods now exist:
git_vector_search(vector, entry)
git_vector_search2(vector, custom_search_callback, key)
git_vector_bsearch(vector, entry)
git_vector_bsearch2(vector, custom_search_callback, key)
The sorting state of the vector is now stored internally.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The methods previously known as
git_repository_lookup
git_repository_newobject
git_repository_lookup_ref
are now part of their respective namespaces:
git_object_lookup
git_object_new
git_reference_lookup
This makes the API more consistent with the new references API.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Tests are now declared with detailed descriptions and a short test name:
BEGIN_TEST(the_test0, "this is an example test that does something")
...
END_TEST
Modules are declared through a simple macro interface:
BEGIN_MODULE(mod_name)
ADD_TEST(the_test0);
...
END_MODULE
Error messages when tests fail have been greatly improved.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Test helper function which recursively copies the content of a
directory. This function has been tweaked to prevent stack overflows by
reusing the same path buffers on all recursive calls.
The following methods have been implemented:
git_reference_packall
git_reference_rename
git_reference_delete
The library now has full support for packed references, including
partial and total writing. Internal documentation has been updated with
the details.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
These two reference types are now stored separately to eventually allow
the removal/renaming of loose references and rewriting of the refs
packfile.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The old hash table with chained buckets has been replaced by a new one
using Cuckoo hashing, which offers guaranteed constant lookup times.
This should improve speeds on most use cases, since hash tables in
libgit2 are usually used as caches where the objects are stored once and
queried several times.
The Cuckoo hash implementation is based off the one in the Basekit
library [1] for the IO language, but rewritten to support an arbritrary
number of hashes. We currently use 3 to maximize the usage of the nodes pool.
[1]: https://github.com/stevedekorte/basekit/blob/master/source/CHash.c
Signed-off-by: Vicent Marti <tanoku@gmail.com>
In response to issue #60 (git_index_write really slow), the write_index
function has been rewritten to improve its performance -- it should now
be in par with the performance of git.git.
On top of that, if Posix Threads are available when compiling libgit2, a
new threaded writing system will be used (3 separate threads take care
of solving byte-endianness, hashing the contents of the index and
writing to disk, respectively). For very long Index files, this method
is up to 3x times faster than git.git.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The priority value for different backends has been removed from the
public `git_odb_backend` struct. We handle that internally. The priority
value is specified on the `git_odb_add_alternate`.
This is convenient because it allows us to poll a backend twice with
different priorities without having to instantiate it twice.
We also differentiate between main backends and alternates; alternates have
lower priority and cannot be written to.
These changes come with some unit tests to make sure that the backend
sorting is consistent.
The libgit2 version has been bumped to 0.4.0.
This commit changes the external API:
CHANGED:
struct git_odb_backend
No longer has a `priority` attribute; priority for the backend
in managed internally by the library.
git_odb_add_backend(git_odb *odb, git_odb_backend *backend, int priority)
Now takes an additional priority parameter, the priority that
will be given to the backend.
ADDED:
git_odb_add_alternate(git_odb *odb, git_odb_backend *backend, int priority)
Add a backend as an alternate. Alternate backends have always
lower priority than main backends, and writing is disabled on
them.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The `git__joinpath` function has been changed to use a statically
allocated buffer; we assume the buffer to be 4096 bytes, because fuck
you.
The new method also supports an arbritrary number of paths to join,
which may come in handy in the future.
Some methods which were manually joining paths with `strcpy` now use the
new function, namely those in `index.c` and `refs.c`.
Based on Emeric Fermas' original patch, which was using the old
`git__joinpath` because I'm stupid. Thanks!
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Configure again the build system to look for SQLite3. If the library is
found, the SQLite backend will be automatically compiled.
Enjoy *very* fast reads and writes.
MASTER PROTIP: Initialize the backend with ":memory" as the path to the
SQLite database for fully-hosted in-memory repositories. Rejoice.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The `dirname` and `dirbase` methods have been replaced with the Android
implementation, which is actually compilant to some kind of standard.
A new method `topdir` has been added, which returns the topmost
directory in a path.
These changes fix issue #49:
`gitfo_prettify_dir_path` converts "./.git/" to ".git/", so
the code at src/repository.c:190 goes out of bounds when
trying to find the topmost directory.
The new `git__topdir` method handles this gracefully, and the
fixed `git__dirname` now returns the proper value for the
repository's working dir.
E.g.
/repo/.git/ ==> working dir '/repo/'
.git/ ==> working dir '.'
Signed-off-by: Vicent Marti <tanoku@gmail.com>
git_revwalk_next now returns an error code when the iteration is over.
git_repository_index now returns an error code when the index file could
not be opened.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
NIH Enterprises presents: a new testing system based on CuTesT, which is
faster than our previous one and fortunately uses no preprocessing on
the source files, which means we can run that from CMake.
The test suites have been gathered together into bigger files (one file
per suite, testing each of the different submodules of the library).
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Several changes have been committed to allow the user to create
in-memory references and write back to disk. Peeling of symbolic
references has been made explicit. Added getter and setter methods for
all attributes on a reference. Added corresponding documentation.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All the commits have been squashed into a single one before refactoring
the final code, to keep everything tidy.
Individual commit messages are as follows:
Added repository reference looking up functionality placeholder.
Added basic reference database definition and caching infrastructure.
Removed useless constant.
Added GIT_EINVALIDREFNAME error and description. Added missing description for GIT_EBAREINDEX.
Added GIT_EREFCORRUPTED error and description.
Added GIT_ETOONESTEDSYMREF error and description.
Added resolving of direct and symbolic references.
Prepared the packed-refs parsing.
Added parsing of the packed-refs file content.
When no loose reference has been found, the full content of the packed-refs file is parsed. All of the new (i.e. not previously parsed as a loose reference) references are eagerly stored in the cached references storage.
The method packed_reference_file__parse() is in deer need of some refactoring. :-)
Extracted to a method the parsing of the peeled target of a tag.
Extracted to a method the parsing of a standard packed ref.
Fixed leaky removal of the cached references.
Ensured that a previously parsed packed reference isn't returned if a more up-to-date loose reference exists.
Enhanced documentation of git_repository_reference_lookup().
Moved some refs related constants from repository.c to refs.h.
Made parsing of a packed tag reference more robust.
Updated git_repository_reference_lookup() documentation.
Added some references to the test repository.
Added some tests covering tag references looking up.
Added some tests covering symbolic and head references looking up.
Added some tests covering packed references looking up.
Yes, we are breaking the API. Alpha software, deal with it.
We need a way of getting a pointer to each newly added entry to the
index, because manually looking up the entry after creation is
outrageously expensive.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Clean up a provided absolute or relative directory path.
This prettification relies on basic operations such as coalescing multiple forward slashes into a single slash, removing '.' and './' current directory segments, and removing parent directory whenever '..' is encountered. If not empty, the returned path ends with a forward slash.
For instance, this will turn "d1/s1///s2/..//../s3" into "d1/s3/".
This only performs a string based analysis of the path. No checks are done to make sure the path actually makes sense from the file system perspective.
- remove() would read one-past array bounds.
- resize() would fail if the initial size was 1, because it multiplied by 1.75
and truncated the resulting value. The buffer would always remain at size 1,
but elements would repeatedly be appended (via insert()) causing a crash.
It was not being used by any methods (only by malloc and calloc), and
since it needs to be TLS, it cannot be exported on DLLs on Windows.
Burn it with fire. The API always returns error codes!
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The new signature struct is public, and contains information about the
timezone offset. Must be free'd manually by the user.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The types in the git_index_entry struct are now system-defaults, and get
truncated to uint32_t's when written back on the index.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Libgit2 is now officially include as
#include "<git2.h>"
or indidividual files may be included as
#include <git2/index.h>
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The maze with include dependencies has been fixed.
There is now a global include:
#include <git.h>
The git_odb_backend API has been exposed.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All the operations on the 'git_index_entry' array and the
'git_tree_entry' array have been refactored into common code in the
src/vector.c file.
The new vector methods support:
- insertion: O(1) (avg)
- deletion: O(n)
- searching: O(logn)
- sorting: O(logn)
- r. access: O(1)
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Actually add files to the index by creating their corresponding blob and
storing it on the repository, then getting the hash and updating the
index file.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All initialization functions now return error codes instead of pointers.
Error codes are now properly propagated on most functions. Several new
and more specific error codes have been added in common.h
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The constructor to git_repository is now called
'git_repository_open(path)'
and takes a path to a git repository instead of an existing ODB object.
Unit tests have been updated accordingly and the two test repositories
have been merged into one.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Test the new method by loading all the objects in the sample ODB with a
full-read and a header-only read, and comparing the types and sizes.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
String mememory is now managed in a much more sane manner.
Fixes include:
- git_person email and name is no longer limited to 64 characters
- git_tree_entry filename is no longer limited to 255 characters
- raw objects are properly opened & closed the minimum amount of
times required for parsing
- unit tests no longer leak
- removed 5 other misc memory leaks as reported by Valgrind
- tree writeback no longer segfaults on rare ocassions
The git_person struct is no longer public. It is now managed by the
library, and getter methods are in place to access its internal
attributes.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
In particular, msvc complains thus:
t0603-sort.c(23) : warning C4244: 'function' : conversion from \
'time_t' to 'unsigned int', possible loss of data
Note that msvc, by default, defines time_t as a 64-bit type, whereas
srand() is expecting an (32-bit) unsigned int. In order to suppress
the warning, we simply cast the return value of the time() function
call to 'unsigned int'.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
The tree array wasn't being initialized when instantiating a tree object
in memory instead of loading it from disk.
New unit tests added to check for the problem.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Basic write-back & in-memory editing for objects is now tested in t0403
(commits), t0802 (tags) and t0902 (trees).
Add new helper functions in test_helpers.c to remove the loose objects
created when doing write-back tests.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
git_tree_entry_byname was dereferencing a NULL pointer when the searched
file couldn't be found on the tree.
New test cases have been added to check for entry access methods.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All the setter methods for git_tree have been added, including the
setters for attributes on each git_tree_entry and methods to add/remove
entries of the tree.
Modified trees and trees created in-memory from scratch can be written
back to the repository using git_object_write().
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All the required git_commit_set_XXX methods have been implemented; all
the attributes of a commit object can now be modified in-memory.
The new method git_object_write() automatically writes back the
in-memory changes of any object to the repository. So far it only
supports git_commit objects.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The 'git_obj' structure is now called 'git_rawobj', since
it represents a raw object read from the ODB.
The 'git_repository_object' structure is now called 'git_object',
since it's the base object class for all objects.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The interface for loading and parsing tree objects from a repository has
been completed with all the required accesor methods for attributes,
support for manipulating individual tree entries and a new unit test
t0901-readtree which tries to load and parse a tree object from a
repository.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Three new unit tests, t06XX files have been added.
t0601-read: tests for loading index files from disk,
for creating in-memory indexes and for accessing
index entries.
t0602-write: tests for writing index files back to disk
t0603-sort: tests for properly sorting the entries array of an index
Two test indexes have been added in 'tests/resources/':
test/resources/index: a sample index from a libgit2 repository
test/resources/gitgit.index: a sample index from a git.git
repository (includes TREE extension data)
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All the external resources used by the tests are now placed inside the
common 'tests/resources' directory.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The old 'git_revpool' object has been removed and
split into two distinct objects with separate
functionality, in order to have separate methods for
object management and object walking.
* A new object 'git_repository' does the high-level
management of a repository's objects (commits, trees,
tags, etc) on top of a 'git_odb'.
Eventually, it will also manage other repository
attributes (e.g. tag resolution, references, etc).
See: src/git/repository.h
* A new external method
'git_repository_lookup(repo, oid, type)'
has been added to the 'git_repository' API.
All object lookups (git_XXX_lookup()) are now
wrappers to this method, and duplicated code
has been removed. The method does automatic type
checking and returns a generic 'git_revpool_object'
that can be cast to any specific object.
See: src/git/repository.h
* The external methods for object parsing of repository
objects (git_XXX_parse()) have been removed.
Loading objects from the repository is now managed
through the 'lookup' functions. These objects are
loaded with minimal information, and the relevant
parsing is done automatically when the user requests
any of the parsed attributes through accessor methods.
An attribute has been added to 'git_repository' in
order to force the parsing of all the repository objects
immediately after lookup.
See: src/git/commit.h
See: src/git/tag.h
See: src/git/tree.h
* The previous walking functionality of the revpool
is now found in 'git_revwalk', which does the actual
revision walking on a repository; the attributes
when walking through commits in a database have been
decoupled from the actual commit objects.
This increases performance when accessing commits
during the walk and allows to have several
'git_revwalk' instances working at the same time on
top of the same repository, without having to load
commits in memory several times.
See: src/git/revwalk.h
* The old 'git_revpool_table' has been renamed to
'git_hashtable' and now works as a generic hashtable
with support for any kind of object and custom hash
functions.
See: src/hashtable.h
* All the relevant unit tests have been updated, renamed
and grouped accordingly.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Tag objects are now properly loaded from the revision pool.
New test t0801 checks for loading a parsing a series of tags, including
the tag of a tag.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The 'parse_oid' and 'parse_person' methods which were used by the commit
parser are now global so they can be used when parsing other objects.
The 'git_commit_person' struct has been changed to a generic
'git_person'.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Packed objects inside packfiles are now properly unpacked when calling
the git_odb__read_packed() method; delta'ed objects are also properly
generated when needed.
A new unit test 0204-readpack tries to read a couple hundred packed
objects from a standard packed repository.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The following new external methods have been added:
GIT_EXTERN(const char *) git_commit_message_short(git_commit *commit);
GIT_EXTERN(const char *) git_commit_message(git_commit *commit);
GIT_EXTERN(time_t) git_commit_time(git_commit *commit);
GIT_EXTERN(const git_commit_person *) git_commit_committer(git_commit *commit);
GIT_EXTERN(const git_commit_person *) git_commit_author(git_commit *commit);
GIT_EXTERN(const git_tree *) git_commit_tree(git_commit *commit);
A new structure, git_commit_person has been added to represent a
commit's author or committer.
The parsing of a commit has been split in two phases.
When adding a commit to the revision pool:
- the commit's ODB object is opened
- its raw contents are parsed for commit TIME, PARENTS and TREE
(the minimal amount of data required to traverse the pool)
- the commit's ODB object is closed
When querying for extended information on a commit:
- the commit's ODB object is reopened
- its raw contents are parsed for the requested information
- the commit's ODB object remains open to handle additional queries
New unit tests have been added for the new functionality:
In t0401-parse: parse_person_test
In t0402-details: query_details_test
Signed-off-by: Vicent Marti <tanoku@gmail.com>
"t0502-table" tests for basic functionality of the objects
table:
table_create (creating a new object table)
table_populate (fill & lookup on the object table)
table_resize (dynamically resize the table)
"t0503-tableit" tests the iterator for object tables:
table_iterator (make sure the iterator reaches all objects)
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Created commit objects in t0401-parse weren't being freed properly.
Updated the API documentation to note that commit objects are owned
by the revision pool and should not be freed manually.
The parents list of each commit was being freed twice after each test.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
When git_oid_to_string() was passed a buffer size larger than
GIT_OID_HEXSZ+1, the function placed the c-string NUL char at
the wrong position. Fix the code to place the NUL at the end
of the (possibly truncated) oid string.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>