"git_config_backend" has been renamed to "git_config_file", which
implements a generic interface to access a configuration file -- be it
on disk, in a DB or whatever mumbojumbo.
I think this makes more sense.
Regarding "initialize" vs. "initialise", www.dict.cc says the first is American
English whereas the latter is British English. For consistency, we should
stick to American English.
When I changed it over to use different strings for the variable and
the name, cvar_name_normalize was left behind. Fix this and rename to
cvar_normalize_name to reflect the incompatible change.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Now we use a simple closed-addressing cache. Cuckoo hashing was creating
too many issues with race conditions. Fuck that.
Let's see what happens performance-wise; we may have to roll back or
come up with another way to implement an efficient multi-threaded cache.
Configuration options can come from different sources. Currently,
there is only support for reading them from a flat file, but it might
make sense to read it from a database at some point.
Move the parsing code into src/config_file.c and create an include
file include/git2/config_backend.h to allow for other backends to be
developed.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Ok, this is the real deal. Hopefully. Here's how it's going to work:
- One main method, called `git__throw`, that sets the error
code and error message when an error happens.
This method must be called in every single place where an error
code was being returned previously, setting an error message
instead.
Example, instead of:
    return GIT_EOBJCORRUPTED;
Use:
    return git__throw(GIT_EOBJCORRUPTED,
        "The object is missing a finalizing line feed");
And instead of:
    [...] {
        error = GIT_EOBJCORRUPTED;
        goto cleanup;
    }
Use:
    [...] {
        error = git__throw(GIT_EOBJCORRUPTED, "What an error!");
        goto cleanup;
    }
The **only** exception to this are the allocation methods, which
return NULL on failure but already set the message manually.
    /* only place where an error code can be returned directly,
       because the error message has already been set by the wrapper */
    if (foo == NULL)
        return GIT_ENOMEM;
- One secondary method, called `git__rethrow`, which can be used to
fine-grain an error message and build an error stack.
Example, instead of:
    if ((error = foobar(baz)) < GIT_SUCCESS)
        return error;
You can now do:
    if ((error = foobar(baz)) < GIT_SUCCESS)
        return git__rethrow(error, "Failed to do a major operation");
The return of the `git_lasterror` method will be a string in the
shape of:
"Failed to do a major operation. (Failed to do an internal
operation)"
E.g.
"Failed to open the index. (Not enough permissions to access
'/path/to/index')."
NOTE: do not abuse this method. Try to write all `git__throw`
messages in a descriptive manner, to avoid having to rethrow them to
clarify their meaning.
This method should only be used in the places where the original
error message set by a subroutine is not specific enough.
It is encouraged to continue using this style as much as possible to
enforce error propagation:
    if ((error = foobar(baz)) < GIT_SUCCESS)
        return error; /* `foobar` has set an error message, and
                         we are just propagating it */
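The throw/rethrow scheme described above can be sketched roughly as follows. This is only an illustration under assumptions: the `sketch_` names, the single static buffer and its size are hypothetical, and the real `git__throw`/`git__rethrow` may store and format messages differently.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_ERRMSG_MAX 1024 /* hypothetical limit, not from the library */

/* Hypothetical storage for the last error message (not thread-safe). */
static char sketch_last_error[SKETCH_ERRMSG_MAX];

/* Set the error message and hand the error code back to the caller. */
int sketch_throw(int error, const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    va_start(ap, fmt);
    vsnprintf(sketch_last_error, sizeof(sketch_last_error), fmt, ap);
    va_end(ap);
    return error;
}

/* Prepend a more specific message, keeping the old one in parentheses. */
int sketch_rethrow(int error, const char *fmt, ...)
{
    char old_msg[SKETCH_ERRMSG_MAX];
    char new_msg[SKETCH_ERRMSG_MAX];
    va_list ap;

    assert(fmt != NULL);
    snprintf(old_msg, sizeof(old_msg), "%s", sketch_last_error);
    va_start(ap, fmt);
    vsnprintf(new_msg, sizeof(new_msg), fmt, ap);
    va_end(ap);
    /* Overlong combinations are silently truncated in this sketch. */
    snprintf(sketch_last_error, sizeof(sketch_last_error),
             "%s. (%s)", new_msg, old_msg);
    return error;
}

const char *sketch_lasterror(void)
{
    return sketch_last_error;
}
```

Because both helpers return the error code they were given, they can be dropped into existing `return` statements without changing control flow.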
The error handling revamp will take place in two phases:
- Phase 1: Replace all pieces of code that return direct error codes
with calls to `git__throw`. This can be done semi-automatically
using `ack` to locate all the error codes that must be replaced.
- Phase 2: Add some `git__rethrow` calls in those cases where the
original error messages are not specific enough.
Phase 1 is the main goal. A minor libgit2 release will be shipped once
Phase 1 is ready, and the work will start on gradually improving the
error handling mechanism by refining specific error messages.
OTHER NOTES:
- When writing error messages, please refrain from using weasel
words. They add verbosity to the message without giving any real
information. (<3 Emeric)
E.g.
"The reference file appears to be missing a carriage return"
Nope.
"The reference file is missing a carriage return"
Yes.
- When calling `git__throw`, please try to use more generic error
codes so we can eventually reduce the list of error codes to
something more reasonable. Feel free to add new, more generic error
codes if these are going to replace several of the old ones.
E.g.
return GIT_EREFCORRUPTED;
Can be turned into:
return git__throw(GIT_EOBJCORRUPTED,
"The reference is corrupted");
Win32 critical section objects (CRITICAL_SECTION) are not kernel objects.
Only kernel objects are destroyed by using CloseHandle. Critical sections
are supposed to be deleted with the DeleteCriticalSection API
(http://msdn.microsoft.com/en-us/library/ms682552(VS.85).aspx).
The section and variable names use different rules, so store them as
two different variables internally.
This will simplify the configuration-writing code as well later on,
but even with parsing, the code is simpler.
Take this opportunity to add a variable to the list directly when
parsing instead of passing through config_set.
A root commit is a commit whose branch (usually what HEAD points to)
doesn't exist (yet). This situation can happen when the commit is the
first after 1) a repository is initialized or 2) an orphan checkout has
been performed.
Take this opportunity to remove the symbolic link check, as
git_reference_resolve works on OID refs as well.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Typical use is git_reference_resolve(&ref, ref). Currently, if there is
an error, ref will point to NULL, causing the user to lose that
reference.
Always update resolved_ref instead of just on finding an OID ref,
storing the last valid reference in it.
This change helps simplify the code for allowing root commits.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Magic constant replaced by direct to-string conversion because of:
1) with value length 6 (040000 - subtree) final tree will be corrupted;
2) for wrong values length <6 final tree will be corrupted too.
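A direct to-string conversion of an entry mode might look like the sketch below. It assumes the goal is the unpadded octal form that git uses inside tree objects (`40000` for a subtree, `100644` for a regular file); the helper name and signature are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write `mode` as unpadded octal text into `out`.
 * Returns the number of characters written, or -1 if `out` is too small.
 * No fixed width is assumed, so 5- and 6-digit modes both work.
 */
int sketch_mode_to_string(char *out, size_t out_size, unsigned int mode)
{
    int written;

    assert(out != NULL && out_size > 0);
    written = snprintf(out, out_size, "%o", mode);
    if (written < 0 || (size_t)written >= out_size)
        return -1;
    return written;
}
```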
Removed the optional `replace` argument, we now have 4 add methods:
`git_index_add`: add or update from path
`git_index_add2`: add or update from struct
`git_index_append`: add without replacing from path
`git_index_append2`: add without replacing from struct
Yes, this breaks the bindings.
New external functions:
- git_index_unmerged_entrycount: Counts the unmerged entries in
the index
- git_index_get_unmerged: Gets an unmerged entry from the index
by name
New internal functions:
- read_unmerged: Wrapper for read_unmerged_internal
- read_unmerged_internal: Reads unmerged entries from the index
if the index has the INDEX_EXT_UNMERGED_SIG set
- unmerged_srch: Search function for unmerged vector
- unmerged_cmp: Compare function for unmerged vector
New data structures:
- git_index now contains a git_vector unmerged that stores
unmerged entries
- git_index_entry_unmerged: Representation of an unmerged file
entry. It represents all three versions of the file at the
same time, with one name, three modes and three OIDs
When in the middle of a merge, the index needs to contain several files
with the same name. git_index_insert() used to prevent this by not adding a new entry if an entry with the same name already existed.
Most tags will have a timestamp of whenever the code is running and
dealing with time and timezones is error-prone. Optimize for this case
by adding a function which causes the signature to be created with a
current timestamp.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
git_repository_path() and git_repository_workdir() respectively return the path to the git repository and the working directory. Those paths are absolute and normalized.
Don't blindly pass the target type to git_tag_type2string as it will
give an empty string on GIT_OBJ_ANY which would cause us to create an
invalid tag object.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
We cannot assume that Redis is never going to return an error code; when
Redis fails, we cannot crash our library, we need to handle the crash
gracefully.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
long int is a safer type than int unless the user knows that the
variable is going to be quite small.
The code has been reworked to use strtol instead of the more
complicated sscanf.
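For illustration, a `strtol`-based parser for a `long int` value could look like this. The function name and the choice of base 10 are assumptions made for the sketch, not details from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse `value` as a base-10 long.
 * Returns 0 and stores the number in `out` on success, -1 otherwise.
 */
int sketch_parse_long(long *out, const char *value)
{
    char *end;
    long num;

    assert(out != NULL);
    if (value == NULL || *value == '\0')
        return -1;

    errno = 0;
    num = strtol(value, &end, 10);
    if (errno == ERANGE)
        return -1; /* does not fit in a long */
    if (end == value || *end != '\0')
        return -1; /* no digits at all, or trailing garbage */

    *out = num;
    return 0;
}
```

Checking both `errno` and the end pointer is what makes `strtol` easier to use safely than `sscanf` here: overflow and trailing characters are each reported explicitly.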
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Many error paths freed their local data although it is freed later on
when the end of the function notices that there was an error. This can
cause double frees and invalid memory access.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Make cvar_free return void instead of the next element, as it was
mostly a hack to make cvar_list_free shorter but it's now using the
list macros.
Also check if the input is NULL and return immediately in that case.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
There is no need to keep the config file in memory until the
configuration is freed. Free the buffer immediately after the
configuration has been parsed.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
If a variable name appears on its own in a line, it's assumed the
value is true. Store the variable's value as NULL in that case.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
If a line ends at EOF there is no need to check for the newline
character and doing so will cause us to read memory beyond the
allocated memory as we check for the Windows-style new-line, which is
two bytes long.
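A bounds-checked look at the line terminator, in the spirit of this fix, might be written as below. The function name and the explicit `len`/`pos` parameters are hypothetical; the actual parser keeps its own reader state.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: length of the line terminator at `pos` in a buffer
 * of `len` bytes. Returns 0 at end of buffer or when no terminator starts
 * there, 1 for "\n" and 2 for "\r\n". Never reads past `len`.
 */
size_t sketch_eol_length(const char *buf, size_t len, size_t pos)
{
    assert(buf != NULL && pos <= len);

    if (pos == len)
        return 0; /* line ends at EOF: nothing to inspect */
    if (buf[pos] == '\n')
        return 1;
    if (buf[pos] == '\r' && pos + 1 < len && buf[pos + 1] == '\n')
        return 2; /* second byte is only read when it is in bounds */
    return 0;
}
```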
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
If a variable value has the traditional continuation character (\) as
the last non-space character in the line, then we continue reading the
value on the next line.
Using more than two lines is also supported.
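The continuation rule can be sketched as follows. This is a simplified, hypothetical helper that works on a NUL-terminated string and ignores quoting and comments, which the real parser has to handle as well.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: copy the value starting at `text` into `out`,
 * joining every line whose last non-space character is a backslash with
 * the line that follows it. Returns 0 on success, -1 if `out` is too small.
 */
int sketch_read_value(char *out, size_t out_size, const char *text)
{
    size_t pos = 0, used = 0;

    assert(out != NULL && out_size > 0 && text != NULL);
    for (;;) {
        const char *line = text + pos;
        size_t line_len = strcspn(line, "\n");
        size_t end = line_len;
        int more = 0;

        /* Drop trailing whitespace, then look for the continuation mark. */
        while (end > 0 && isspace((unsigned char)line[end - 1]))
            end--;
        if (end > 0 && line[end - 1] == '\\') {
            more = 1;
            end--; /* the backslash itself is not part of the value */
        }

        if (used + end + 1 > out_size)
            return -1;
        memcpy(out + used, line, end);
        used += end;

        pos += line_len;
        if (text[pos] == '\n')
            pos++;
        if (!more || text[pos] == '\0')
            break;
    }
    out[used] = '\0';
    return 0;
}
```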
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Make header and variable parse functions use their own buffers instead
of giving them the line they need to read as a parameter which they
mostly ignore.
This is in preparation for multiline configuration variables.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Streaming writes will no longer fail when writing to a backend that
doesn't support streaming writes but supports direct ones.
Now we create a fake stream on memory and then write it as a single
block using the backend `write` callback.
A variable name is stored internally with its section the way it
appeared in the configuration file in order to have the information
about what parts are case-sensitive inline.
Really implement parse_section_header_ext and move the assignment of
variables to config_parse.
The variable name matching is now done in a case-aware way by
cvar_name_match and cvar_section_match. Before the user sees it, it's
normalized to the two- or three-dot version.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Such a list preserves the order the variables were first read in which
will be useful later for merging different data-sets. Furthermore,
reading and writing out the same configuration should not reorganize
the variables, which could happen when iterating through all the items
in a hash table.
A hash table is overkill for this small a data-set anyway.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
git_config_open shouldn't have to initialise variables that are only
used inside config_parse and its callees.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Config variables should be interpreted at run-time, as we don't know if a
zero means false or zero, or if yes means true or "yes".
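The idea of interpreting the stored string only when the caller asks for a type can be sketched like this. The function names are hypothetical, and the accepted boolean spellings follow git's usual `true/yes/on` and `false/no/off` convention rather than anything stated in this commit.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/* Whole-string comparison ignoring ASCII case. Returns 1 when equal. */
static int sketch_equals_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Interpret the raw string as a number. Returns 0 on success, -1 otherwise. */
int sketch_value_as_long(long *out, const char *value)
{
    char *end;
    long num;

    assert(out != NULL);
    if (value == NULL || *value == '\0')
        return -1;
    errno = 0;
    num = strtol(value, &end, 10);
    if (errno != 0 || end == value || *end != '\0')
        return -1;
    *out = num;
    return 0;
}

/* Interpret the same raw string as a boolean. A NULL value counts as true. */
int sketch_value_as_bool(int *out, const char *value)
{
    long num;

    assert(out != NULL);
    if (value == NULL) {
        *out = 1;
        return 0;
    }
    if (sketch_equals_nocase(value, "true") ||
        sketch_equals_nocase(value, "yes") ||
        sketch_equals_nocase(value, "on")) {
        *out = 1;
        return 0;
    }
    if (sketch_equals_nocase(value, "false") ||
        sketch_equals_nocase(value, "no") ||
        sketch_equals_nocase(value, "off")) {
        *out = 0;
        return 0;
    }
    if (sketch_value_as_long(&num, value) < 0)
        return -1;
    *out = (num != 0);
    return 0;
}
```

The same stored string `"0"` yields the number zero through one accessor and `false` through the other, which is why no type is attached to the variable itself.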
As a variable has no intrinsic type, git_cvtype is gone and the public
API takes care of enforcing a few rules.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Allow any well-formed reference name to live under refs/ removing the
condition that they be under refs/{heads,tags,remotes}/ as was the
design of git.
An exception is made for HEAD which is allowed to contain an OID
reference in detached HEAD state.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Add internal reference create and rename functions which take a force
parameter, telling them to overwrite an existing reference if it
exists.
These functions try to update the reference if it's of the same type
as the one it's going to be replaced by. Otherwise the old reference
becomes invalid.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
These functions can be used to query or modify the variables in a
given configuration. No sanity checking is done on the variable names.
This is mostly meant as an API preview.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
If cfg_readline consumes the line, then parse_section_header will read
past it and if we read a character, parse_variable won't have the full
name.
This solution is a bit hackish, but it's the simplest way to get the
code to parse correctly.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Save the location of the name in section_out instead of returning it
as an int. Use the return code to signal success or failure.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Expose the tag parsing capabilities already present in the
library.
Exporting this function makes it possible to implement the
mktag command without duplicating this functionality.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
List all the references in the repository, calling a custom
callback for each one.
The listed references may be filtered by type, or using
a bitwise OR of several types. Use the magic value
`GIT_REF_LISTALL` to obtain all references, including
packed ones.
The `callback` function will be called for each of the references
in the repository, and will receive the name of the reference and
the `payload` value passed to this method.
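The callback-plus-filter pattern described here might be sketched as below. Everything in it is hypothetical: the `SKETCH_` flag values, the `sketch_ref` struct and the in-memory array stand in for the library's own types and for references read from disk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type flags; a packed OID ref carries two bits. */
#define SKETCH_REF_OID      1u
#define SKETCH_REF_SYMBOLIC 2u
#define SKETCH_REF_PACKED   4u
#define SKETCH_REF_LISTALL  (SKETCH_REF_OID | SKETCH_REF_SYMBOLIC | SKETCH_REF_PACKED)

typedef struct {
    const char *name;
    unsigned int type;
} sketch_ref;

typedef int (*sketch_ref_cb)(const char *name, void *payload);

/*
 * Call `callback` for every reference whose type bits are all present in
 * `flags`. A non-zero return from the callback stops the iteration and is
 * passed back to the caller.
 */
int sketch_ref_foreach(const sketch_ref *refs, size_t count,
                       unsigned int flags, sketch_ref_cb callback,
                       void *payload)
{
    size_t i;

    assert(callback != NULL);
    for (i = 0; i < count; i++) {
        int error;

        if ((refs[i].type & flags) != refs[i].type)
            continue; /* filtered out by the caller's flags */
        error = callback(refs[i].name, payload);
        if (error != 0)
            return error;
    }
    return 0;
}

/* Example callback: counts references, using `payload` as an int counter. */
int sketch_count_cb(const char *name, void *payload)
{
    (void)name;
    (*(int *)payload)++;
    return 0;
}
```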
The current behaviour of git_index_open{bare,inrepo}() is unexpected.
When an index is opened, an in-memory index object is created that is
linked to the index discovered by git_repository_open(). However, this
index object is empty, as the on-disk index is not read. To fully open
the on-disk index file, git_index_read() has to be called. This leads to
confusing behaviour. Consider the following code:
git_index *idx;
git_index_open_inrepo(&idx, repo);
git_index_write(idx);
You would expect this to have no effect, as the index is never
ostensibly manipulated. However, what actually happens is that the index
entries are removed from the on-disk index because the empty in-memory
index object created by open_inrepo() is written back to the disk.
This patch reads the index after opening it.
Temporary files when doing streaming writes are now stored inside the
Objects folder, to prevent issues when moving files between
disks/partitions.
Add support for block writes to the ODB again (for those backends that
cannot implement streaming).
When the system temporary folder is located on a different volume than the working directory into which libgit2 is executing, MoveFileEx() requires an additional flag.
Hey. Apologies in advance -- I broke your bindings.
This is a major commit that includes a long-overdue redesign of the
whole object-database structure. This is expected to be the last major
external API redesign of the library until the first non-alpha release.
Please get your bindings up to date with these changes. They will be
included in the next minor release. Sorry again!
Major features include:
- Real caching and refcounting on parsed objects
- Real caching and refcounting on objects read from the ODB
- Streaming writes & reads from the ODB
- Single-method writes for all object types
- The external API is now partially thread-safe
The speed increases are significant in all aspects, especially when
reading an object several times from the ODB (revwalking) and when
writing big objects to the ODB.
Here's a full changelog for the external API:
blob.h
------
- Remove `git_blob_new`
- Remove `git_blob_set_rawcontent`
- Remove `git_blob_set_rawcontent_fromfile`
- Rename `git_blob_writefile` -> `git_blob_create_fromfile`
- Change `git_blob_create_fromfile`:
The `path` argument is now relative to the repository's working dir
- Add `git_blob_create_frombuffer`
commit.h
--------
- Remove `git_commit_new`
- Remove `git_commit_add_parent`
- Remove `git_commit_set_message`
- Remove `git_commit_set_committer`
- Remove `git_commit_set_author`
- Remove `git_commit_set_tree`
- Add `git_commit_create`
- Add `git_commit_create_v`
- Add `git_commit_create_o`
- Add `git_commit_create_ov`
tag.h
-----
- Remove `git_tag_new`
- Remove `git_tag_set_target`
- Remove `git_tag_set_name`
- Remove `git_tag_set_tagger`
- Remove `git_tag_set_message`
- Add `git_tag_create`
- Add `git_tag_create_o`
tree.h
------
- Change `git_tree_entry_2object`:
New signature is `(git_object **object_out, git_repository *repo, git_tree_entry *entry)`
- Remove `git_tree_new`
- Remove `git_tree_add_entry`
- Remove `git_tree_remove_entry_byindex`
- Remove `git_tree_remove_entry_byname`
- Remove `git_tree_clearentries`
- Remove `git_tree_entry_set_id`
- Remove `git_tree_entry_set_name`
- Remove `git_tree_entry_set_attributes`
object.h
--------
- Remove `git_object_new`
- Remove `git_object_write`
- Change `git_object_close`:
This method is now *mandatory*. Not closing an object causes a
memory leak.
odb.h
-----
- Remove type `git_rawobj`
- Remove `git_rawobj_close`
- Rename `git_rawobj_hash` -> `git_odb_hash`
- Change `git_odb_hash`:
New signature is `(git_oid *id, const void *data, size_t len, git_otype type)`
- Add type `git_odb_object`
- Add `git_odb_object_close`
- Change `git_odb_read`:
New signature is `(git_odb_object **out, git_odb *db, const git_oid *id)`
- Change `git_odb_read_header`:
New signature is `(size_t *len_p, git_otype *type_p, git_odb *db, const git_oid *id)`
- Remove `git_odb_write`
- Add `git_odb_open_wstream`
- Add `git_odb_open_rstream`
odb_backend.h
-------------
- Change type `git_odb_backend`:
New internal signatures are as follows
int (* read)(void **, size_t *, git_otype *, struct git_odb_backend *, const git_oid *)
int (* read_header)(size_t *, git_otype *, struct git_odb_backend *, const git_oid *)
int (* writestream)(struct git_odb_stream **, struct git_odb_backend *, size_t, git_otype)
int (* readstream)( struct git_odb_stream **, struct git_odb_backend *, const git_oid *)
- Add type `git_odb_stream`
- Add enum `git_odb_streammode`
Signed-off-by: Vicent Marti <tanoku@gmail.com>
We now depend on libpthread on all Unix platforms (should be installed
by default) and use a simple wrapper for Windows threads under Win32.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
It's no longer retarded. All object interdependencies are stored as OIDs
instead of actual objects. This should be hundreds of times faster,
especially on big repositories. Heck, who knows, maybe it doesn't even
segfault -- wouldn't that be awesome?
What has changed on the API?
`git_commit_parent`, `git_commit_tree`, `git_tag_target` now return
their values through a pointer-to-pointer, and have an error code.
`git_commit_set_tree` and `git_tag_set_target` now return an error
code and may fail.
`git_repository_free__no_gc` has been deprecated because it's
stupid. Since there are no longer any interdependencies between
objects, we don't need internal reference counting, and GC
never fails or double-free's pointers.
`git_object_close` now does a very sane thing: marks an object
as unused. Closed objects will be eventually free'd from the
object cache based on LRU. Please use `git_object_close` from
the garbage collector `destroy` method on your bindings. It's
100% safe.
`git_repository_gc` is a new method that forces a garbage collector
pass through the repo, to free as many LRU objects as possible.
This is useful if we are running out of memory.
The new pack backend is an adaptation of the original git.git code in
`sha1_file.c`. It's slightly faster than the previous version and
severely less memory-hungry.
The call-stack of a normal pack backend query has been properly
documented in the top of the header for future reference. And by
properly I mean with ASCII diagrams 'n shit.
The new revision walker uses an internal Commit object storage system,
custom memory allocator and much improved topological and time sorting
algorithms. It's about 20 times faster than the previous implementation
when browsing big repositories.
The following external API calls have changed:
`git_revwalk_next` returns an OID instead of a full commit object.
The initial call to `git_revwalk_next` is no longer blocking when
iterating through a repo with a time-sorting mode.
Iterating with Topological or inverted modes still makes the initial
call blocking to preprocess the commit list, but this block should be
mostly unnoticeable on most repositories (topological preprocessing
times at 0.3s on the git.git repo).
`git_revwalk_push` and `git_revwalk_hide` now take an OID instead
of a full commit object.
Set of methods to find the minimal length needed to uniquely identify every OID
in a list. Useful for GUI applications, commit logs and so on.
Includes stress test.
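One simple way to compute such a minimal length is to sort the hex strings and compare neighbours, as in the sketch below. This is not necessarily the approach taken by the commit; the function names are hypothetical and the input array is reordered in place.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int sketch_cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

static size_t sketch_common_prefix(const char *a, const char *b)
{
    size_t n = 0;

    while (a[n] != '\0' && a[n] == b[n])
        n++;
    return n;
}

/*
 * Hypothetical helper: smallest prefix length that tells all hex OIDs in
 * `oids` apart. After sorting, the longest prefix any OID shares with
 * another one is always shared with a direct neighbour, so checking
 * adjacent pairs is enough. Duplicate OIDs yield "length + 1", meaning
 * they cannot be told apart.
 */
size_t sketch_min_unique_length(const char **oids, size_t count)
{
    size_t i, min_len = 1;

    assert(oids != NULL || count == 0);
    if (count > 1)
        qsort(oids, count, sizeof(*oids), sketch_cmp_str);
    for (i = 1; i < count; i++) {
        size_t shared = sketch_common_prefix(oids[i - 1], oids[i]);

        if (shared + 1 > min_len)
            min_len = shared + 1;
    }
    return min_len;
}
```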
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Set of methods to find the minimal length needed to uniquely identify every OID
in a list.
Includes stress test.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
We cannot make sure that the user doesn't use the same buffer as source
and destination, so write to it using memmove.
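The aliasing concern can be illustrated with a small, hypothetical helper that drops a prefix from a string. `memcpy` would be undefined when `out` and `src` overlap, whereas `memmove` is specified to handle it.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: copy `src` minus its first `skip` bytes into `out`,
 * including the terminating NUL. `out` may be the very same buffer as `src`.
 */
void sketch_skip_prefix(char *out, const char *src, size_t skip)
{
    size_t len = strlen(src);

    assert(out != NULL && skip <= len);
    memmove(out, src + skip, len - skip + 1);
}
```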
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Disable garbage collection of cross-references to prevent
double-freeing. Internal reference management is now done
with a separate method.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
- Added several missing reference increases
- Add new destructor to the repository that does not GC the objects
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All `git_object` instances looked up from the repository are reference
counted. User is expected to use the new `git_object_close` when an
object is no longer needed to force freeing it.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
We now store only one sorting callback that does entry comparison. This
is used when sorting the entries using a quicksort, and when looking for
a specific entry with the new search methods.
The following search methods now exist:
git_vector_search(vector, entry)
git_vector_search2(vector, custom_search_callback, key)
git_vector_bsearch(vector, entry)
git_vector_bsearch2(vector, custom_search_callback, key)
The sorting state of the vector is now stored internally.
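A minimal sketch of a vector that keeps one comparison callback and remembers its sorting state is shown below. The struct layout and names are assumptions for illustration, and the standard `qsort`/`bsearch` stand in for whatever the library uses internally.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The callback receives pointers to the stored elements (void **). */
typedef int (*sketch_vector_cmp)(const void *a, const void *b);

typedef struct {
    void **contents;
    size_t length;
    sketch_vector_cmp cmp;
    int sorted; /* non-zero once `contents` is known to be in order */
} sketch_vector;

/* Sort only when needed; repeated calls on a sorted vector are free. */
void sketch_vector_sort(sketch_vector *v)
{
    assert(v != NULL && v->cmp != NULL);
    if (v->sorted)
        return;
    if (v->length > 1)
        qsort(v->contents, v->length, sizeof(void *), v->cmp);
    v->sorted = 1;
}

/* Binary search with the same callback. Returns the index, or -1. */
int sketch_vector_bsearch(sketch_vector *v, const void *entry)
{
    void **found;

    sketch_vector_sort(v);
    if (v->length == 0)
        return -1;
    found = bsearch(&entry, v->contents, v->length, sizeof(void *), v->cmp);
    return found != NULL ? (int)(found - v->contents) : -1;
}

/* Example callback for a vector of C strings. */
int sketch_str_cmp(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}
```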
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The methods previously known as
git_repository_lookup
git_repository_newobject
git_repository_lookup_ref
are now part of their respective namespaces:
git_object_lookup
git_object_new
git_reference_lookup
This makes the API more consistent with the new references API.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The following methods have been implemented:
git_reference_packall
git_reference_rename
git_reference_delete
The library now has full support for packed references, including
partial and total writing. Internal documentation has been updated with
the details.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
These two reference types are now stored separately to eventually allow
the removal/renaming of loose references and rewriting of the refs
packfile.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
We now use MoveFileEx, which is not assured to be atomic but always
works (both if the destination exists, or if it doesn't) and is
available in MinGW.
Since this is a Win32 API call, complaints about lost or overwritten files
should be forwarded to Steve Ballmer.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The `rename` call doesn't quite work on Win32: it expects the destination
file to not exist. We're using a native Win32 call in those cases --
that should do the trick.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The old hash table with chained buckets has been replaced by a new one
using Cuckoo hashing, which offers guaranteed constant lookup times.
This should improve speeds on most use cases, since hash tables in
libgit2 are usually used as caches where the objects are stored once and
queried several times.
The Cuckoo hash implementation is based off the one in the Basekit
library [1] for the IO language, but rewritten to support an arbitrary
number of hashes. We currently use 3 to maximize the usage of the nodes pool.
[1]: https://github.com/stevedekorte/basekit/blob/master/source/CHash.c
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The new `git_filebuf` structure provides atomic high-performance writes
to disk by using a write cache, and optionally a double-buffered scheme
through a worker thread (not enabled yet).
Writes can be done 3-layered, like in git.git (user code -> write cache
-> disk), or 2-layered, by writing directly on the cache. This makes
index writing considerably faster.
The `git_filebuf` structure contains all the old functionality of
`git_filelock` for atomic file writes and reads. The `git_filelock`
structure has been removed.
Additionally, the `git_filebuf` API can automatically hash (SHA1)
all the data as it is written to disk (hashing is done smartly on big
chunks to improve performance).
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The interlocking on the write threads was not being done properly (index
entries were sometimes written out of order). With proper interlocking,
the threaded write is only marginally faster on big index files, and
slower on the smaller ones because of the overhead when creating
threads.
The threaded index writing has been temporarily disabled; after more
accurate benchmarks, it might be possible to enable it again only when
writing very large index files (> 1000 entries).
Signed-off-by: Vicent Marti <tanoku@gmail.com>
64-bit types stored in memory have to be truncated into 32 bits when
writing to disk. Was causing warnings in MSVC.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
In response to issue #60 (git_index_write really slow), the write_index
function has been rewritten to improve its performance -- it should now
be on par with the performance of git.git.
On top of that, if Posix Threads are available when compiling libgit2, a
new threaded writing system will be used (3 separate threads take care
of solving byte-endianness, hashing the contents of the index and
writing to disk, respectively). For very long Index files, this method
is up to 3 times faster than git.git.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The priority value for different backends has been removed from the
public `git_odb_backend` struct. We handle that internally. The priority
value is specified on the `git_odb_add_alternate`.
This is convenient because it allows us to poll a backend twice with
different priorities without having to instantiate it twice.
We also differentiate between main backends and alternates; alternates have
lower priority and cannot be written to.
These changes come with some unit tests to make sure that the backend
sorting is consistent.
The libgit2 version has been bumped to 0.4.0.
This commit changes the external API:
CHANGED:
struct git_odb_backend
No longer has a `priority` attribute; priority for the backend
is managed internally by the library.
git_odb_add_backend(git_odb *odb, git_odb_backend *backend, int priority)
Now takes an additional priority parameter, the priority that
will be given to the backend.
ADDED:
git_odb_add_alternate(git_odb *odb, git_odb_backend *backend, int priority)
Add a backend as an alternate. Alternate backends always have
lower priority than main backends, and writing is disabled on
them.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The alternates file is now parsed, and the alternate ODB folders are
added as separate backends. This allows the library to efficiently query
the alternate folders.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The `git__joinpath` function has been changed to use a statically
allocated buffer; we assume the buffer to be 4096 bytes, because fuck
you.
The new method also supports an arbitrary number of paths to join,
which may come in handy in the future.
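Joining an arbitrary number of path components can be sketched with a variadic function like the one below. Unlike the change described here, this hypothetical helper takes the buffer size explicitly instead of assuming 4096 bytes, and takes the number of components as an argument.

```c
#include <assert.h>
#include <stdarg.h>
#include <string.h>

/*
 * Hypothetical helper: join `count` path components (all `const char *`)
 * into `out`, putting exactly one '/' between them.
 * Returns 0 on success, -1 if `out` is too small (output is truncated).
 */
int sketch_joinpath_n(char *out, size_t out_size, int count, ...)
{
    va_list ap;
    size_t used = 0;
    int i, result = 0;

    assert(out != NULL && out_size > 0);
    va_start(ap, count);
    for (i = 0; i < count; i++) {
        const char *part = va_arg(ap, const char *);
        size_t len;

        /* Leading slashes only matter on the very first component. */
        if (used > 0) {
            while (*part == '/')
                part++;
        }
        len = strlen(part);
        if (len == 0)
            continue;

        if (used > 0 && out[used - 1] != '/') {
            if (used + 1 >= out_size) {
                result = -1;
                break;
            }
            out[used++] = '/';
        }
        if (used + len >= out_size) {
            result = -1;
            break;
        }
        memcpy(out + used, part, len);
        used += len;
    }
    va_end(ap);
    out[used] = '\0';
    return result;
}
```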
Some methods which were manually joining paths with `strcpy` now use the
new function, namely those in `index.c` and `refs.c`.
Based on Emeric Fermas' original patch, which was using the old
`git__joinpath` because I'm stupid. Thanks!
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Removed `git_tree_add_entry_unsorted`. Now the `git_tree_add_entry`
method doesn't sort the entries array by default; entries are only
sorted lazily when required. This is done automatically by the library
(the `git_tree_sort_entries` call has been removed).
This should improve performance. No point in sorting entries all the time, anyway.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
We now have proper sonames in Mac OS X and Linux, proper versioning on
the pkg-config file and proper DLL naming in Windows.
The version of the library is defined exclusively in 'src/git2.h'; the build scripts
read it from there automatically.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Configure again the build system to look for SQLite3. If the library is
found, the SQLite backend will be automatically compiled.
Enjoy *very* fast reads and writes.
MASTER PROTIP: Initialize the backend with ":memory:" as the path to the
SQLite database for fully-hosted in-memory repositories. Rejoice.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The `dirname` and `dirbase` methods have been replaced with the Android
implementation, which is actually compliant with some kind of standard.
A new method `topdir` has been added, which returns the topmost
directory in a path.
These changes fix issue #49:
`gitfo_prettify_dir_path` converts "./.git/" to ".git/", so
the code at src/repository.c:190 goes out of bounds when
trying to find the topmost directory.
The new `git__topdir` method handles this gracefully, and the
fixed `git__dirname` now returns the proper value for the
repository's working dir.
E.g.
/repo/.git/ ==> working dir '/repo/'
.git/ ==> working dir '.'
Signed-off-by: Vicent Marti <tanoku@gmail.com>
git_revwalk_next now returns an error code when the iteration is over.
git_repository_index now returns an error code when the index file could
not be opened.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Don't allow access to any tree entries whilst the entries array is
unsorted. We keep track of when the array is unsorted, and any methods
that access the array while it is unsorted now sort the array before
accessing it.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
If plain strcmp is used, as this code did before, the final sorting may
end up different from what git-add would do (for example, 'boost'
appearing before 'boost-build.jam', because Git sorts as if it were
spelled 'boost/').
If the sorting is incorrect like this, Git 1.7.4 insists that unmodified
files have been modified. For example, my test repository has these
four entries:
drwxr-xr-x 199 johnw wheel 6766 Feb 2 17:21 boost
-rw-r--r-- 1 johnw wheel 849 Feb 2 17:22 boost-build.jam
-rw-r--r-- 1 johnw wheel 989 Feb 2 17:21 boost.css
-rw-r--r-- 1 johnw wheel 6308 Feb 2 17:21 boost.png
Here is the output from git-ls-tree for these files, in a commit tree
created using git-add and git-commit:
100644 blob 8b8775433aef73e9e12609610ae2e35cf1e7ec2c boost-build.jam
100644 blob 986c4050fa96d825a1311c8e871cdcc9a3e0d2c3 boost.css
100644 blob b4d51fcd5c9149fd77f5ca6ed2b6b1b70e8fe24f boost.png
040000 tree 46537eeaa4d577010f19b1c9e940cae9a670ff5c boost
Here is the output for the same commit produced using libgit2:
040000 tree c27c0fd1436f28a6ba99acd0a6c17d178ed58288 boost
100644 blob 8b8775433aef73e9e12609610ae2e35cf1e7ec2c boost-build.jam
100644 blob 986c4050fa96d825a1311c8e871cdcc9a3e0d2c3 boost.css
100644 blob b4d51fcd5c9149fd77f5ca6ed2b6b1b70e8fe24f boost.png
Due to this reordering, git-status claims the three blobs are always
modified, no matter what I do using git-read-tree or git-reset or
git-checkout to update the index.
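The ordering rule behind this report can be sketched as a comparison function in which a directory name is treated as if it ended in '/'. The function name and the boolean `is_dir` parameters are hypothetical; the actual fix may work from entry attributes instead.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical comparison for tree entries: compare the common prefix,
 * then compare the next character, where a directory whose name has
 * ended counts as '/' instead of '\0'.
 */
int sketch_entry_cmp(const char *name1, int is_dir1,
                     const char *name2, int is_dir2)
{
    size_t len1 = strlen(name1), len2 = strlen(name2);
    size_t len = len1 < len2 ? len1 : len2;
    unsigned char c1, c2;
    int cmp;

    cmp = memcmp(name1, name2, len);
    if (cmp != 0)
        return cmp;

    c1 = (unsigned char)name1[len];
    c2 = (unsigned char)name2[len];
    if (c1 == '\0' && is_dir1)
        c1 = '/';
    if (c2 == '\0' && is_dir2)
        c2 = '/';
    return (c1 > c2) - (c1 < c2);
}
```

With this rule the directory `boost` sorts after `boost-build.jam`, `boost.css` and `boost.png`, because '/' compares greater than both '-' and '.', which matches the git-ls-tree output shown for the tree created with git-add.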
Several changes have been committed to allow the user to create
in-memory references and write back to disk. Peeling of symbolic
references has been made explicit. Added getter and setter methods for
all attributes on a reference. Added corresponding documentation.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All the commits have been squashed into a single one before refactoring
the final code, to keep everything tidy.
Individual commit messages are as follows:
Added repository reference looking up functionality placeholder.
Added basic reference database definition and caching infrastructure.
Removed useless constant.
Added GIT_EINVALIDREFNAME error and description. Added missing description for GIT_EBAREINDEX.
Added GIT_EREFCORRUPTED error and description.
Added GIT_ETOONESTEDSYMREF error and description.
Added resolving of direct and symbolic references.
Prepared the packed-refs parsing.
Added parsing of the packed-refs file content.
When no loose reference has been found, the full content of the packed-refs file is parsed. All of the new (i.e. not previously parsed as a loose reference) references are eagerly stored in the cached references storage.
The method packed_reference_file__parse() is in dire need of some refactoring. :-)
Extracted to a method the parsing of the peeled target of a tag.
Extracted to a method the parsing of a standard packed ref.
Fixed leaky removal of the cached references.
Ensured that a previously parsed packed reference isn't returned if a more up-to-date loose reference exists.
Enhanced documentation of git_repository_reference_lookup().
Moved some refs related constants from repository.c to refs.h.
Made parsing of a packed tag reference more robust.
Updated git_repository_reference_lookup() documentation.
Added some references to the test repository.
Added some tests covering tag references looking up.
Added some tests covering symbolic and head references looking up.
Added some tests covering packed references looking up.
Yes, we are breaking the API. Alpha software, deal with it.
We need a way of getting a pointer to each newly added entry to the
index, because manually looking up the entry after creation is
outrageously expensive.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
In-memory tree objects were not being properly initialized, because the
internal entries vector was created on the 'parse' method.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Don't need a brand new header for two typedefs when we already have a
types.h header.
Change comment style to ANSI C.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Clean up a provided absolute or relative directory path.
This prettification relies on basic operations such as coalescing multiple forward slashes into a single slash, removing '.' and './' current directory segments, and removing parent directory whenever '..' is encountered. If not empty, the returned path ends with a forward slash.
For instance, this will turn "d1/s1///s2/..//../s3" into "d1/s3/".
This only performs a string based analysis of the path. No checks are done to make sure the path actually makes sense from the file system perspective.
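A minimal sketch of such a string-only cleanup follows. The name 'prettify_dir', its signature, and the choice to fail when '..' would climb above the start of the path are assumptions made for illustration; the actual function may behave differently in those cases.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the library's function. Writes the cleaned-up
 * directory path into out. Returns 0 on success, -1 if out is too small
 * or a '..' segment has no parent directory left to remove. */
static int prettify_dir(char *out, size_t outsz, const char *in)
{
    size_t len = 0;   /* bytes written to out so far */
    size_t root = 0;  /* 1 when the path is absolute (leading '/') */

    assert(out != NULL && in != NULL);
    if (outsz == 0)
        return -1;

    if (*in == '/') {
        if (outsz < 2)
            return -1;
        out[len++] = '/';
        root = 1;
    }

    while (*in != '\0') {
        const char *seg = in;
        size_t seglen;

        /* Isolate the next segment; runs of '/' yield empty segments. */
        while (*in != '\0' && *in != '/')
            in++;
        seglen = (size_t)(in - seg);
        if (*in == '/')
            in++;

        /* Skip empty segments and '.' segments. */
        if (seglen == 0 || (seglen == 1 && seg[0] == '.'))
            continue;

        /* '..' removes the previously written segment. */
        if (seglen == 2 && seg[0] == '.' && seg[1] == '.') {
            if (len <= root)
                return -1;
            len--; /* drop the trailing '/' */
            while (len > root && out[len - 1] != '/')
                len--;
            continue;
        }

        /* Room for the segment, its trailing '/' and the final NUL. */
        if (len + seglen + 2 > outsz)
            return -1;
        memcpy(out + len, seg, seglen);
        len += seglen;
        out[len++] = '/';
    }

    out[len] = '\0';
    return 0;
}
```

With a sufficiently large buffer, passing "d1/s1///s2/..//../s3" yields "d1/s3/", matching the example above.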
Windows uses a 64 bit time_t by default and assigning to unsigned int causes a
64 -> 32 bit truncation warning. This change forces the truncation,
acknowledging the implications detailed in the file comments. Also, blobs are
limited to 32 bit file sizes for the same reason (on all platforms).
Off_t is not cool. It can be 32 or 64 bits depending on the platform,
but on the Index format, it's always 32 bits.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
- remove() would read one-past array bounds.
- resize() would fail if the initial size was 1, because it multiplied by 1.75
and truncated the resulting value. The buffer would always remain at size 1,
but elements would repeatedly be appended (via insert()) causing a crash.
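The resize() problem can be reproduced in isolation. In the sketch below, 'grow_truncating' mirrors the rule described above, while 'grow_guarded' is just one possible guard, an assumption and not necessarily the fix that was applied.

```c
#include <assert.h>
#include <stddef.h>

/* Growth rule as described: multiply by 1.75 and truncate.
 * For size == 1 this yields 1, so the buffer never grows. */
static size_t grow_truncating(size_t size)
{
    return (size_t)((double)size * 1.75);
}

/* One possible guard (hypothetical): always grow by at least one slot. */
static size_t grow_guarded(size_t size)
{
    size_t next;

    assert(size < (size_t)-1 / 2); /* keep the multiplication in range */
    next = (size_t)((double)size * 1.75);
    return next > size ? next : size + 1;
}
```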
It's MurmurHash3 slightly edited to make it
cross-platform. Fast and neat.
Use this for hashing strings on hash tables instead
of a full SHA1 hash. It's very fast and well distributed.
Obviously not crypto-secure.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
It is not a good idea to export these internal symbols now that they are
not required to run the unit tests.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
It was not being used by any methods (only by malloc and calloc), and
since it needs to be TLS, it cannot be exported on DLLs on Windows.
Burn it with fire. The API always returns error codes!
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Some external functions were not being exported because they were using
the 'extern' keyword instead of the generic GIT_EXTERN() macro.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The new signature struct is public, and contains information about the
timezone offset. Must be free'd manually by the user.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The types in the git_index_entry struct are now system-defaults, and get
truncated to uint32_t's when written back on the index.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Libgit2 is now officially included as
#include <git2.h>
or individual files may be included as
#include <git2/index.h>
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The maze with include dependencies has been fixed.
There is now a global include:
#include <git.h>
The git_odb_backend API has been exposed.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All the operations on the 'git_index_entry' array and the
'git_tree_entry' array have been refactored into common code in the
src/vector.c file.
The new vector methods support:
- insertion: O(1) (avg)
- deletion: O(n)
- searching: O(logn)
- sorting: O(n log n)
- r. access: O(1)
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Yes, if you are wondering why the shared library was
failing to build under MSVC, it's because it was empty.
Oh wow.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
We cannot assume that non-bare repositories have an index file, because
'git init' doesn't create it by default.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Actually add files to the index by creating their corresponding blob and
storing it on the repository, then getting the hash and updating the
index file.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Several private methods of the Index API are now public, including the
methods to remove, get and add index entries.
All the methods only take an integer value for the position of the entry
to get/remove. To get or remove entries based on their path names, look
them up first using the git_index_find method.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All initialization functions now return error codes instead of pointers.
Error codes are now properly propagated on most functions. Several new
and more specific error codes have been added in common.h
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The constructor to git_repository is now called
'git_repository_open(path)'
and takes a path to a git repository instead of an existing ODB object.
Unit tests have been updated accordingly and the two test repositories
have been merged into one.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Issue 9 on the tracker. The commit object getters for in-memory objects
were trying to parse a nonexistent on-disk object when one of the commit
attributes which were still not set was queried.
We now return a NULL value when this happens.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Before changing the attributes of a commit, make sure that the internal
status is consistent with the one in the repository.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
You can now access the owning repository of any existing object, or the
repository on which a revision walker is working.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
As requested, git_odb_read_header looks up an object on the ODB, but loads
only the header information (type & size) without loading any of the
actual file contents in memory.
It is significantly faster than doing a git_odb_read if you only need an
object's information and not its contents.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
String memory is now managed in a much more sane manner.
Fixes include:
- git_person email and name is no longer limited to 64 characters
- git_tree_entry filename is no longer limited to 255 characters
- raw objects are properly opened & closed the minimum amount of
times required for parsing
- unit tests no longer leak
- removed 5 other misc memory leaks as reported by Valgrind
- tree writeback no longer segfaults on rare occasions
The git_person struct is no longer public. It is now managed by the
library, and getter methods are in place to access its internal
attributes.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Since commit 70aab459, the msvc and MinGW builds have relied on
the built-in implementation of ntohl() and htonl(), rather than
linking the wsock32 library. The new index manipulation code now
calls ntohs()/htons() in addition to ntohl()/htonl(), so we need
to provide a built-in implementation of the 16-bit functions.
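A portable 16-bit conversion does not need to know the host byte order if it works on individual bytes. The following is a sketch only; the names 'my_ntohs' and 'my_htons' and the helper functions are hypothetical and are not the built-in implementation referred to above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 16-bit big-endian (network order) value from two bytes. */
static uint16_t read_be16(const unsigned char *p)
{
    assert(p != NULL);
    return (uint16_t)(((unsigned int)p[0] << 8) | p[1]);
}

/* Store a 16-bit value as two big-endian (network order) bytes. */
static void write_be16(unsigned char *p, uint16_t v)
{
    assert(p != NULL);
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)(v & 0xff);
}

/* Hypothetical stand-in for ntohs(): network order to host order. */
static uint16_t my_ntohs(uint16_t net)
{
    unsigned char b[2];

    memcpy(b, &net, sizeof(b));
    return read_be16(b);
}

/* Hypothetical stand-in for htons(): host order to network order. */
static uint16_t my_htons(uint16_t host)
{
    unsigned char b[2];
    uint16_t net;

    write_be16(b, host);
    memcpy(&net, b, sizeof(net));
    return net;
}
```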
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
The tree array wasn't being initialized when instantiating a tree object
in memory instead of loading it from disk.
New unit tests added to check for the problem.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Tag files can now be created and modified in-memory (all the setter
methods have been implemented), and written back to disk using the
generic git_object_write() method.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
git_tree_entry_byname was dereferencing a NULL pointer when the searched
file couldn't be found on the tree.
New test cases have been added to check for entry access methods.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All the setter methods for git_tree have been added, including the
setters for attributes on each git_tree_entry and methods to add/remove
entries of the tree.
Modified trees and trees created in-memory from scratch can be written
back to the repository using git_object_write().
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All repository objects can now be created from scratch in memory using
either the git_object_new() method, or the corresponding git_XXX_new()
for each object.
So far, only git_commits can be written back to disk once created in
memory.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
All the required git_commit_set_XXX methods have been implemented; all
the attributes of a commit object can now be modified in-memory.
The new method git_object_write() automatically writes back the
in-memory changes of any object to the repository. So far it only
supports git_commit objects.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The new 'git__source_printf' does an overflow-safe printf on a source
buffer.
The new 'git__source_write' does an overflow-safe byte write on a source
buffer.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The 'git_obj' structure is now called 'git_rawobj', since
it represents a raw object read from the ODB.
The 'git_repository_object' structure is now called 'git_object',
since it's the base object class for all objects.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
git_repository_object has now several internal methods to write back the
object information in the repository.
- git_repository__dbo_prepare_write()
Prepares the DBO object to be modified
- git_repository__dbo_write()
Writes new bytes to the DBO object
- git_repository__dbo_writeback()
Writes back the changes to the repository
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Added several methods to access:
- The ODB behind a repo
- The SHA1 id behind a generic repo object
- The type of a generic repo object
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Some compilers give linking problems when exporting 'uint32_t' as a
return type in the external API. Use generic types instead.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
A new method 'git_repository_object_free' allows the user to manually force
the freeing of a repository object, even though such objects are still
automatically managed by the repository and don't need to be freed by the user.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The interface for loading and parsing tree objects from a repository has
been completed with all the required accessor methods for attributes,
support for manipulating individual tree entries and a new unit test
t0901-readtree which tries to load and parse a tree object from a
repository.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The new 'git_index' structure is an in-memory representation
of a git index on disk; the 'git_index_entry' structures represent
each one of the file entries on the index.
The following calls for index instantiation have been added:
git_index_alloc(): instantiate a new index structure
git_index_free(): free an existing index
git_index_clear(): clear all the entries in an existing index
The following calls for index reading and writing have been added:
git_index_read(): update the contents of the index structure from
its file on disk.
Internally implemented through:
git_index__parse()
Index files are stored on disk in network byte order; all integer fields
inside them are properly converted to the machine's byte order when
loading them in memory. The parsing engine also distinguishes
between normal index entries and extended entries with 2 extra bytes
of flags.
The 'TREE' extension for index entries is also loaded into memory:
Tree caches stored in Index files are loaded into the
'git_index_tree' structure pointed by the 'tree' pointer inside
'git_index'.
'index->tree' points to the root node of the tree cache; the full tree
can be traversed through each of the node's 'tree->children'.
Index files can be written back to disk through:
git_index_write(): atomic writing of existing index objects
backed by internal method git_index__write()
The following calls for entry manipulation have been added:
git_index_add(): insert an empty entry to the index
git_index_find(): search an entry by its path name
git_index__append(): appends a new index entry to the end of the
list, resizing the entries array if required
New index entries are always inserted at the end of the array; since the
index entries must be sorted for it to be internally consistent, the
index object is only sorted once, and if required, before accessing the
whole entries array (e.g. before writing to disk, before traversing,
etc).
git_index__remove_pos(): remove an index entry in a specific position
git_index__sort(): sort the entries in the array by path name
The entries array is sorted stably and in place using an
insertion sort, which ought to be the most efficient approach
since the entries array is always mostly-sorted.
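As an illustration of why insertion sort suits a mostly-sorted array, here is a minimal stable, in-place version over an array of path names. It is a sketch under stated assumptions: 'sort_paths' is a hypothetical name, and the real code sorts index entry structures rather than plain strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stable, in-place insertion sort of path names. On input that is
 * already mostly sorted, the inner loop rarely runs more than a step. */
static void sort_paths(const char **paths, size_t n)
{
    size_t i, j;

    assert(paths != NULL || n == 0);
    for (i = 1; i < n; i++) {
        const char *key = paths[i];

        /* Shift strictly greater entries one slot to the right.
         * Equal entries are not moved, which keeps the sort stable. */
        for (j = i; j > 0 && strcmp(paths[j - 1], key) > 0; j--)
            paths[j] = paths[j - 1];
        paths[j] = key;
    }
}
```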
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The struct 'git_filelock' represents an atomically-locked
file, git-style.
Locked files can be modified atomically through the new file lock
interface:
int git_filelock_init(git_filelock *lock, const char *path);
int git_filelock_lock(git_filelock *lock, int append);
void git_filelock_unlock(git_filelock *lock);
int git_filelock_commit(git_filelock *lock);
int git_filelock_write(git_filelock *lock, const char *buffer, size_t length);
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The old 'git_revpool' object has been removed and
split into two distinct objects with separate
functionality, in order to have separate methods for
object management and object walking.
* A new object 'git_repository' does the high-level
management of a repository's objects (commits, trees,
tags, etc) on top of a 'git_odb'.
Eventually, it will also manage other repository
attributes (e.g. tag resolution, references, etc).
See: src/git/repository.h
* A new external method
'git_repository_lookup(repo, oid, type)'
has been added to the 'git_repository' API.
All object lookups (git_XXX_lookup()) are now
wrappers to this method, and duplicated code
has been removed. The method does automatic type
checking and returns a generic 'git_revpool_object'
that can be cast to any specific object.
See: src/git/repository.h
* The external methods for object parsing of repository
objects (git_XXX_parse()) have been removed.
Loading objects from the repository is now managed
through the 'lookup' functions. These objects are
loaded with minimal information, and the relevant
parsing is done automatically when the user requests
any of the parsed attributes through accessor methods.
An attribute has been added to 'git_repository' in
order to force the parsing of all the repository objects
immediately after lookup.
See: src/git/commit.h
See: src/git/tag.h
See: src/git/tree.h
* The previous walking functionality of the revpool
is now found in 'git_revwalk', which does the actual
revision walking on a repository; the attributes
when walking through commits in a database have been
decoupled from the actual commit objects.
This increases performance when accessing commits
during the walk and allows several 'git_revwalk'
instances to work at the same time on top of the
same repository, without having to load commits in
memory several times.
See: src/git/revwalk.h
* The old 'git_revpool_table' has been renamed to
'git_hashtable' and now works as a generic hashtable
with support for any kind of object and custom hash
functions.
See: src/hashtable.h
* All the relevant unit tests have been updated, renamed
and grouped accordingly.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Tag objects are now properly loaded from the revision pool.
New test t0801 checks for loading and parsing a series of tags, including
the tag of a tag.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The 'parse_oid' and 'parse_person' methods which were used by the commit
parser are now global so they can be used when parsing other objects.
The 'git_commit_person' struct has been changed to a generic
'git_person'.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Packed objects inside packfiles are now properly unpacked when calling
the git_odb__read_packed() method; delta'ed objects are also properly
generated when needed.
A new unit test 0204-readpack tries to read a couple hundred packed
objects from a standard packed repository.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The basic information (pointed trees and blobs) of each tree object in a
revision pool can now be parsed and queried.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The following new external methods have been added:
GIT_EXTERN(const char *) git_commit_message_short(git_commit *commit);
GIT_EXTERN(const char *) git_commit_message(git_commit *commit);
GIT_EXTERN(time_t) git_commit_time(git_commit *commit);
GIT_EXTERN(const git_commit_person *) git_commit_committer(git_commit *commit);
GIT_EXTERN(const git_commit_person *) git_commit_author(git_commit *commit);
GIT_EXTERN(const git_tree *) git_commit_tree(git_commit *commit);
A new structure, git_commit_person has been added to represent a
commit's author or committer.
The parsing of a commit has been split in two phases.
When adding a commit to the revision pool:
- the commit's ODB object is opened
- its raw contents are parsed for commit TIME, PARENTS and TREE
(the minimal amount of data required to traverse the pool)
- the commit's ODB object is closed
When querying for extended information on a commit:
- the commit's ODB object is reopened
- its raw contents are parsed for the requested information
- the commit's ODB object remains open to handle additional queries
New unit tests have been added for the new functionality:
In t0401-parse: parse_person_test
In t0402-details: query_details_test
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Commits now store pointers to their tree objects.
Tree objects now work as separate git_revpool_object
entities.
Tree objects can be loaded and parsed independently
from commits.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
git_revpool_object now has a type identifier for each object
type in a revpool (commits, trees, blobs, etc).
Trees can now be stored in the revision pool.
git_revpool_tableit now supports filtering objects by their
type when iterating through the object table.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Created commit objects in t0401-parse weren't being freed properly.
Updated the API documentation to note that commit objects are owned
by the revision pool and should not be freed manually.
The parents list of each commit was being freed twice after each test.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Previously the objects table was being freed, but not
the actual commits. All git_commit objects are freed
and hence invalidated when freeing the git_rp object
they belong to.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
This fix had been delayed by Ramsay because on 32-bit systems it
highlights the fact that off_t is set to an invalid value.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
This reduces the global namespace pollution. These functions
were the only remaining external symbols (with the exception
of a PPC_SHA1 build) which did not start with 'git', and
since these are private library symbols the 'git__' prefix is
appropriate.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Given that the sha1.h header file should never be included into
any other file, since it represents an implementation detail of
hash.c, we remove the header and inline its content.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
On Intel machines, the msvc compiler defines the CPU architecture
macros _M_IX86 and _M_X64 (equivalent to __i386__ and __x86_64__
respectively). Use these macros in the pre-processor expression
to select the "fast" definition of the {get,put}_be32() macros.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
When git_oid_to_string() was passed a buffer size larger than
GIT_OID_HEXSZ+1, the function placed the c-string NUL char at
the wrong position. Fix the code to place the NUL at the end
of the (possibly truncated) oid string.
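The corrected behaviour can be sketched as follows. 'id_to_string', 'RAW_SZ' and 'HEX_SZ' are hypothetical stand-ins, not the library's function or constants; the point is only that the NUL goes right after the last hex digit written, not at the end of the caller's buffer.

```c
#include <assert.h>
#include <stddef.h>

#define RAW_SZ 20            /* bytes in a raw SHA-1 id */
#define HEX_SZ (RAW_SZ * 2)  /* hex digits needed to print it */

/* Hypothetical sketch: format id as hex into out, which holds n bytes.
 * The output is truncated if n is too small and is NUL-terminated
 * whenever n > 0. */
static char *id_to_string(char *out, size_t n, const unsigned char *id)
{
    static const char hex[] = "0123456789abcdef";
    size_t i, len;

    assert(id != NULL);
    if (out == NULL || n == 0)
        return out;

    /* Number of hex digits that fit while leaving room for the NUL. */
    len = n - 1 < HEX_SZ ? n - 1 : HEX_SZ;
    for (i = 0; i < len; i++) {
        unsigned char byte = id[i / 2];
        out[i] = hex[(i % 2 == 0) ? (byte >> 4) : (byte & 0x0f)];
    }

    /* Terminate after the digits written, not at out[n - 1]. */
    out[len] = '\0';
    return out;
}
```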
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
In order to avoid inconsistent definitions of type off_t, all
compilation units should include the "common.h" header file
before certain system headers (those which directly or indirectly
lead to the definition of off_t). The "common.h" header contains
the definition of _FILE_OFFSET_BITS to select 64-bit file offsets.
The symptom of this inconsistency, while compiling with -Wextra, is
the following warning:
In file included from src/common.h:50,
from src/commit.c:28:
src/util.h: In function git__is_sizet:
src/util.h:41: warning: comparison between signed and unsigned
In order to fix the problem, we simply remove the #include <time.h>
statement at the head of src/commit.c. Note that src/commit.h also
includes <time.h>.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Andreas Ericsson <ae@op5.se>
gcc (4.4.0) issues the following warning:
src/revobject.c:33: warning: dereferencing type-punned pointer \
will break strict-aliasing rules
We suppress the warning by copying the first 4 bytes from the oid
structure into an 'unsigned int' using memcpy(). This will also
fix any potential alignment issues on certain platforms.
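The memcpy() approach looks roughly like the sketch below. 'struct my_oid' and 'oid_hash' are hypothetical names used for illustration, and the sketch assumes 'unsigned int' is no wider than the id array.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the oid structure. */
struct my_oid {
    unsigned char id[20];
};

/* Take the leading bytes of the id as a hash value. Copying with
 * memcpy() avoids the cast *(unsigned int *)oid->id, which breaks
 * strict-aliasing rules and may read from a misaligned address. */
static unsigned int oid_hash(const struct my_oid *oid)
{
    unsigned int h;

    assert(oid != NULL);
    memcpy(&h, oid->id, sizeof(h));
    return h;
}
```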
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Andreas Ericsson <ae@op5.se>
In particular, doxygen issues the following warning:
.../src/git/revwalk.h:86: Warning: The following parameters of \
gitrp_sorting(git_revpool *pool, unsigned int sort_mode) are \
not documented:
parameter 'pool'
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Andreas Ericsson <ae@op5.se>
In particular, sparse issues the following warnings:
src/revobject.c:29:14: warning: symbol 'max_load_factor' was \
not declared. Should it be static?
src/revobject.c:31:14: warning: symbol 'git_revpool_table__hash' was \
not declared. Should it be static?
In order to suppress these warnings, we simply declare them as
static, since they are not (currently) referenced outside of this
file.
In the case of max_load_factor, this is probably correct. However,
this may not be appropriate for git_revpool_table__hash(), given
how it is named. So, this should either be re-named to reflect its
non-external status, or a declaration needs to be added to the
revobject.h header file.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Andreas Ericsson <ae@op5.se>
In order to suppress this warning, we could simply replace the
constant 0 with NULL. However, in this case, replacing the
comparison with 0 by !buffer is more idiomatic.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Andreas Ericsson <ae@op5.se>
In particular, the compiler issues the following warning:
src/revwalk.c(61) : warning C4244: '=' : conversion from \
'unsigned int' to 'unsigned char', possible loss of data
In order to suppress the warning, we change the type of the
sorting "enum" field of the git_revpool structure to be consistent
with the sort_mode parameter of the gitrp_sorting() function.
Note that if the size of the git_revpool structure is an issue,
then we could change the type of the sort_mode parameter instead.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Andreas Ericsson <ae@op5.se>
In particular, the compiler issues the following warnings:
src/revobject.c(29) : warning C4305: 'initializing' : truncation \
from 'double' to 'const float'
src/revobject.c(56) : warning C4244: '=' : conversion from \
'const float' to 'unsigned int', possible loss of data
src/revobject.c(149) : warning C4244: '=' : conversion from \
'const float' to 'unsigned int', possible loss of data
In order to suppress the warnings we change the type of max_load_factor
to double, rather than change the initialiser to 0.65f, and cast the
result type of the expressions to 'unsigned int' as expected by the
assignment operators. Note that double should be able to represent all
unsigned int values without loss.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Andreas Ericsson <ae@op5.se>
These warnings are issued by both gcc (-Wextra) and msvc (-W3).
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Andreas Ericsson <ae@op5.se>
On the msvc build, the tests t0401-parse and t0501-walk both
crash with a runtime error (ACCESS_VIOLATION). This is caused
by writing to un-allocated memory due to an under-allocation
of a git_revpool_table data structure.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Andreas Ericsson <ae@op5.se>
sorting ('prev' pointers in the linked list are no longer lost).
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Andreas Ericsson <ae@op5.se>
The GIT_RPSORT_XXX flags have been moved to the external API,
and a new method 'gitrp_sorting(...)' has been added to safely
change the sorting method of a revision pool.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Andreas Ericsson <ae@op5.se>
'git_commit_list_toposort()' and 'git_commit_list_timesort()' now
sort a commit list by topological and time order respectively.
Both sorts are stable and in place.
'git_commit_list_append' has been replaced by 'git_commit_list_push_back'
and 'git_commit_list_push_front'.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Andreas Ericsson <ae@op5.se>
Fixed issue when generating pending commits list during iteration.
The 'git_commit_lookup' function will now check the pool's cache
for commits which have been previously loaded/parsed; there can only
be a single 'git_commit' structure for each commit on the same pool.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Andreas Ericsson <ae@op5.se>
All the objects which will eventually be traversable from
a revision pool (commits, trees, etc) now inherit from the
'git_revpool_object' structure which identifies them with their
own OID.
Furthermore, the 'git_revpool_table' and related functions have
been added, which allow for constant time lookup (hash table)
of the loaded revpool objects based on their OID.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Andreas Ericsson <ae@op5.se>
The 'gitrp_next()' method now correctly does a revision walking
of all the pushed revisions in arbitrary ordering.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Andreas Ericsson <ae@op5.se>
git_commit_lookup() now creates commit references
without loading them from the ODB.
git_commit_parse() creates a commit reference, loads
it and parses it from the ODB.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Andreas Ericsson <ae@op5.se>
Basic support for iterating the revpool.
The following functions of the revwalk API have been partially
implemented:
void gitrp_reset(git_revpool *pool);
void gitrp_push(git_revpool *pool, git_commit *commit);
void gitrp_prepare_walk(git_revpool *pool);
git_commit *gitrp_next(git_revpool *pool);
Parsed commits' parents are now also parsed and stored in a
"git_commit_list" structure (linked list).
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Andreas Ericsson <ae@op5.se>
A few initial tests for commit parsing:
"parse_buffer_test" tests git_commit__parse_buffer() with
several malformed commit messages and a few corner cases
which should pass.
"parse_oid_test" tests git_commit__parse_oid() with several
malformed commit lines containing broken SHA1 OIDs.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Andreas Ericsson <ae@op5.se>
The external API function "git_commit_parse" has been renamed
to "git_commit_lookup" and has been partially implemented with
support for loading commits straight from the ODB. It still lacks
the functionality to lookup cached commits in the revpool and to
resolve tags to commits.
The following internal functions have been partially implemented:
int git_commit__parse_buffer(...);
int git_commit__parse_time(...);
int git_commit__parse_oid(...);
Commits are now fully parsed but the generated parent and tree
references are not handled yet.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Signed-off-by: Andreas Ericsson <ae@op5.se>
In particular, using the normal (or production) compiler
warning level (-W3), msvc complains as follows:
.../sha1.c(244) : warning C4018: '<' : signed/unsigned mismatch
.../sha1.c(270) : warning C4244: 'function' : conversion from \
'unsigned __int64' to 'unsigned long', possible loss of data
.../sha1.c(271) : warning C4244: 'function' : conversion from \
'unsigned __int64' to 'unsigned long', possible loss of data
Note that gcc issues a similar complaint about line 244 when
compiling with -Wextra.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Commit 5dddf7c (Add block-sha1 in favour of the mozilla routines
2010-04-14) introduced the "bswap.h" header file which contains
an inline function (default_swab32()). The msvc compiler does
not support the inline keyword which causes the build to fail
with a syntax error.
However, msvc does support inline functions using the __inline
keyword language extension. We already have the GIT_INLINE()
macro that allows us to hide this syntactic difference. In order
to fix the build, we simply use GIT_INLINE() in the definition
of the default_swab32() function.
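A macro of this kind might look like the sketch below. 'MY_INLINE' and 'my_swab32' are hypothetical names written in the spirit of GIT_INLINE() and default_swab32(); the actual definitions in the tree may differ.

```c
#include <stdint.h>

/* Hypothetical macro: pick the inline spelling the compiler accepts. */
#if defined(_MSC_VER)
# define MY_INLINE(type) static __inline type
#else
# define MY_INLINE(type) static inline type
#endif

/* Reverse the byte order of a 32-bit value. */
MY_INLINE(uint32_t) my_swab32(uint32_t val)
{
    return ((val & 0xff000000u) >> 24) |
           ((val & 0x00ff0000u) >>  8) |
           ((val & 0x0000ff00u) <<  8) |
           ((val & 0x000000ffu) << 24);
}
```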
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
* ramsay/dev:
Add a pack index 'virtual function' to fetch an index entry
Add a pack index 'virtual function' to search by file offset
Change the interface of the pack index search function
Add a 64-bit offset table index bounds check for v2 pack index
Add a minimum size check when opening a v2 pack index file
win32: Add separate MinGW and MSVC compatibility header files
Makefile: Add support for custom build options in config.mak file
Fix some coding style issues
Since block-sha1 from git.git has such excellent performance, we
can also get rid of the openssl dependency. It's rather simple
to add it back later as an optional extra, but we really needn't
bother to pull in the entire ssl library and have to deal with
linking issues now that we have the portable and, performance-wise,
truly excellent block-sha1 code to fall back on.
Since this requires a slight revamp of the build rules anyway, we
take the opportunity to fix including EXTRA_OBJS in the final build
as well.
The block-sha1 code was originally implemented for git.git by
Linus Torvalds <torvalds@linux-foundation.org> and was later
polished by Nicolas Pitre <nico@cam.org>.
Signed-off-by: Andreas Ericsson <ae@op5.se>
We don't use it yet, but now we have it there at least.
All the non-trivial parts of it appear to have been written
and contributed to git.git by some anonymous genius. The original
implementation was done by Paul Mackerras <paulus@samba.org>.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Given an index entry number, the idx_get() function returns a
(version-agnostic) index_entry structure containing all of the
information required to unpack the corresponding object from
the '.pack' file.
Since the v1 and v2 file formats differ in the layout of the
object records, we provide two implementations of the get
function and initialise the function pointer appropriately.
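The dispatch could look roughly like the sketch below. All names
(ex_index, ex_entry, ex_get_v1, ex_get_v2) are hypothetical, and the
record layouts are simplified assumptions: v1 is taken to be 24-byte
records of (4-byte offset, 20-byte id), v2 to be an id table followed
by a CRC table and a 4-byte offset table. 64-bit offsets are ignored.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct ex_entry {
    unsigned char oid[20];
    uint32_t offset;
};

struct ex_index {
    uint32_t count;            /* number of objects in the index */
    const unsigned char *data; /* start of the object records */
    int (*get)(const struct ex_index *, uint32_t, struct ex_entry *);
};

/* Read a 32-bit big-endian (network order) value. */
static uint32_t ex_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Assumed v1 layout: interleaved (offset, id) records. */
static int ex_get_v1(const struct ex_index *idx, uint32_t n,
                     struct ex_entry *e)
{
    const unsigned char *rec;

    if (n >= idx->count)
        return -1;
    rec = idx->data + (size_t)n * 24;
    e->offset = ex_be32(rec);
    memcpy(e->oid, rec + 4, 20);
    return 0;
}

/* Assumed v2 layout: id table, CRC table, then offset table. */
static int ex_get_v2(const struct ex_index *idx, uint32_t n,
                     struct ex_entry *e)
{
    size_t count = idx->count;

    if (n >= idx->count)
        return -1;
    memcpy(e->oid, idx->data + (size_t)n * 20, 20);
    e->offset = ex_be32(idx->data + count * 24 + (size_t)n * 4);
    return 0;
}

/* Pick the implementation once, when the index is opened. */
static int ex_index_init(struct ex_index *idx, int version,
                         uint32_t count, const unsigned char *data)
{
    idx->count = count;
    idx->data = data;
    if (version == 1)
        idx->get = ex_get_v1;
    else if (version == 2)
        idx->get = ex_get_v2;
    else
        return -1;
    return 0;
}
```

Callers then use idx->get(idx, n, &entry) without caring which
version of the file is underneath.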
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
In addition to searching the index by oid, we need to search by
'.pack' file offset, particularly when processing OBJ_OFS_DELTA
objects. Since the v1 and v2 file formats differ in the layout
of the object records, we provide two implementations of the
search function and initialise the (virtual) function pointer
appropriately.
Note that, as part of the creation of the 'offset index', we also
add a check that the offset data in the index is within the bounds
of the '.pack' file. Having sorted the file offsets, while creating
the index, we only need to check the smallest and largest values.
The offset index consists of the im_off_idx array, which contains
the index entry numbers sorted into file offset order, and the
im_off_next mapping array. The im_off_next array maps an index
entry number to the 'next' index entry in file offset order.
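As an illustration only, the two arrays could be built along these
lines. The function name, the "no next entry" sentinel (n) and the
EX_PACK_HEADER_SIZE lower bound are assumptions made for this sketch;
the comparator relies on a file-scope pointer, so it is not
thread-safe.

```c
#include <stdint.h>
#include <stdlib.h>

#define EX_PACK_HEADER_SIZE 12 /* assumed: no object starts before this */

/* Offsets being sorted; used by the qsort comparator below. */
static const uint64_t *ex_sort_offsets;

static int ex_cmp_by_offset(const void *a, const void *b)
{
    uint64_t oa = ex_sort_offsets[*(const uint32_t *)a];
    uint64_t ob = ex_sort_offsets[*(const uint32_t *)b];
    return (oa > ob) - (oa < ob);
}

/*
 * off_idx[i]  = index entry number of the i-th object in file order.
 * off_next[e] = entry number following entry e in file order, or n.
 * Returns 0 on success, -1 if an offset lies outside the pack file.
 */
static int ex_build_offset_index(const uint64_t *offsets, uint32_t n,
                                 uint64_t pack_size,
                                 uint32_t *off_idx, uint32_t *off_next)
{
    uint32_t i;

    if (n == 0)
        return 0;
    for (i = 0; i < n; i++)
        off_idx[i] = i;
    ex_sort_offsets = offsets;
    qsort(off_idx, n, sizeof(*off_idx), ex_cmp_by_offset);

    /* Once sorted, only the smallest and largest need checking. */
    if (offsets[off_idx[0]] < EX_PACK_HEADER_SIZE ||
        offsets[off_idx[n - 1]] >= pack_size)
        return -1;

    for (i = 0; i + 1 < n; i++)
        off_next[off_idx[i]] = off_idx[i + 1];
    off_next[off_idx[n - 1]] = n;
    return 0;
}
```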
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
In particular, on a successful search, we now return the index
entry number of the object rather than the '.pack' file offset.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
This reduces the global namespace pollution and allows for
a win32 compiler (eg. Open Watcom) to provide these routines
in a header other than <dirent.h> (eg in <io.h>).
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Some win32 compilers define the SSIZE_T type, with the same
meaning and intent as ssize_t. If available, make ssize_t a
synonym of SSIZE_T.
At present, the Digital-Mars compiler is known not to define
SSIZE_T, so we provide an SSIZE_T macro to use in the typedef.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
In addition to removing the inline #define, commit 209849a also
removed a #pragma to disable msvc deprecated function warnings.
Without this #pragma, msvc currently issues 19 warnings related
to "deprecated insecure c-library functions", such as strcpy()
and 22 warnings related to "deprecated POSIX function names",
such as open().
In order to suppress these warnings, reinstate the #pragma.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
No need to define inline as __inline because libgit2 code
should be using GIT_INLINE instead.
Signed-off-by: Julio Espinoza-Sokal <julioes@gmail.com>
Signed-off-by: Andreas Ericsson <ae@op5.se>
For information on FlushFileBuffers(), see the msdn document
at msdn.microsoft.com/en-us/library/aa364439(VS.85).aspx
Note that Windows 2000 is shown as the minimum windows version
to support FlushFileBuffers(), so if we wish to support Win9X
and NT4, we will need to add code to dynamically check if
kernel32.dll contains the function.
The only error return mentioned in the msdn document is
ERROR_INVALID_HANDLE, which is returned if the file/device
(eg console) is not buffered. The fsync(2) manpage says that
EINVAL is returned in errno, if "fd is bound to a special
file which does not support synchronization".
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
When setting the default value, the macro name was specified
as GIT_FLEX_ARRAY, which is inconsistent with its earlier
usage in the file. This caused a compilation error, using the
MS Visual C/C++ compiler, when compiling the git_packlist
struct definition in src/odb.c.
In addition to changing the spelling of the FLEX_ARRAY macro
to GIT_FLEX_ARRAY, including its use in src/odb.c, we also
rename the TYPEOF macro to GIT_TYPEOF.
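For readers unfamiliar with the idiom, a flexible-array macro is
typically used as in the sketch below. EX_FLEX_ARRAY, struct ex_name
and ex_name_new() are hypothetical; the compiler detection is an
assumption and is not the logic used for GIT_FLEX_ARRAY.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#ifndef EX_FLEX_ARRAY
# if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#  define EX_FLEX_ARRAY   /* C99 flexible array member: empty brackets */
# else
#  define EX_FLEX_ARRAY 1 /* older compilers: one-element trailing array */
# endif
#endif

struct ex_name {
    size_t len;
    char text[EX_FLEX_ARRAY]; /* storage continues past the struct */
};

/* Allocate the header and the string in a single block. */
static struct ex_name *ex_name_new(const char *s)
{
    size_t len = strlen(s);
    struct ex_name *n = malloc(offsetof(struct ex_name, text) + len + 1);

    if (!n)
        return NULL;
    n->len = len;
    memcpy(n->text, s, len + 1);
    return n;
}
```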
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
This suppresses some "conversion from 'size_t' to 'unsigned int',
possible loss of data" warning messages from the MS Visual C/C++
compiler with -Wp64.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Andreas Ericsson <ae@op5.se>
In particular, in standard C, a struct or union must have at
least one member declared (ie. structs and unions cannot be
empty). Some compilers allow empty structs as an extension
and won't even issue a warning unless asked for it (eg, gcc
requires -pedantic). Some compilers allow empty structs as
an extension and will only treat it as an error if asked for
strict checking (eg Digital-Mars with -A). Some compilers
simply treat it as an error (eg MS Visual C/C++).
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Andreas Ericsson <ae@op5.se>
In 82324ac, the new static function exists_loose() called
object_file_name() and, in order to detect an error return,
tested for a negative value. This usage is incorrect, as
the error return is indicated by a positive return value.
(A successful call is indicated by a zero return value)
The only error return from object_file_name() relates to
insufficient buffer space and the return value gives the
required minimum buffer size (which will always be >0).
If the caller requires a dynamically allocated buffer,
this allows something like the following call sequence:
size_t len = object_file_name(NULL, 0, db->object_dir, id);
char *buf = git__malloc(len);
if (!buf)
error(...);
object_file_name(buf, len, db->object_dir, id);
...
No current callers take advantage of this capability.
Fix up the call site and change the return type of the
function, from int to size_t, which more accurately
reflects the implementation.
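To make the contract concrete, here is a sketch of a function with
the same return convention. ex_object_file_name() is a hypothetical
stand-in, and the "<dir>/xx/yyyy..." path layout for a 20-byte id is
an assumption of this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Returns 0 on success. If the buffer is too small, nothing is
 * written and the required minimum size (always >0) is returned.
 */
static size_t ex_object_file_name(char *name, size_t n, const char *dir,
                                  const unsigned char id[20])
{
    static const char hex[] = "0123456789abcdef";
    size_t len = strlen(dir);
    size_t need = len + 43; /* '/' + 2 hex + '/' + 38 hex + NUL */
    size_t i;
    char *p;

    if (need > n)
        return need;

    memcpy(name, dir, len);
    p = name + len;
    *p++ = '/';
    for (i = 0; i < 20; i++) {
        *p++ = hex[id[i] >> 4];
        *p++ = hex[id[i] & 0x0f];
        if (i == 0)
            *p++ = '/'; /* first byte names the fan-out directory */
    }
    *p = '\0';
    return 0;
}
```

Because the error value is a size, a test such as "< 0" can never
fire; callers have to compare against zero instead.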
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Andreas Ericsson <ae@op5.se>
This test assumed that it was invoked in an empty directory,
which is true when run from the Makefile, and so would fail
if run standalone. In order to allow the test to work when
run from any directory, create a sub directory "dir-walk"
and chdir() into this directory while running the tests.
Also, add some additional tests.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In particular, the git__mmap() and git__munmap() routines provide
the interface to platform specific memory-mapped file facilities.
We provide implementations for unix and win32, which can be found
in their own sub-directories.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
On windows, unless we use the O_BINARY flag in the open()
call, the file I/O routines will perform line ending
conversion (\r\n => \n on input, \n => \r\n on output).
In addition to the performance penalty, most files in the
object database are binary and will, therefore, become
corrupted by this conversion.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In particular, conditional expressions which contain an
assignment statement, where the expression type is not
explicitly made to be boolean, elicit the following
message:
warning 2: possible unintended assignment
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In particular, using pointer arithmetic on void pointers,
despite being quite useful, is not legal in standard C.
Avoiding non-standard C constructs will help in porting
the library to other compilers/platforms.
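The portable form casts to a character pointer before doing the
arithmetic. A small sketch, with ex_ptr_add() being a hypothetical
helper rather than anything in the library:

```c
#include <stddef.h>

/* Advance a generic pointer by n bytes using only standard C. */
static void *ex_ptr_add(void *base, size_t n)
{
    /* "base + n" on a void pointer is a compiler extension. */
    return (unsigned char *)base + n;
}
```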
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Paul agreed to the GCC-exception license by email:
|
| From: Paul Kocher <paul@cryptography.com>
| Date: Sun, 15 Mar 2009 11:37:23 -0700
| Subject: Re: Adding Mozilla SHA1 implementation to libgit2
|
| Yes - that's fine.
|
| At 01:56 AM 3/5/2009, Andreas Ericsson wrote:
| > Hi Paul. We spoke earlier about this, if you remember?
| > We'd like to add the GCC-exception to the GPL license
| > for these files.
Signed-off-by: Paul Kocher <paul@cryptography.com>
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This function determines if the given object can be found
in the object database. At present, only the local object
database is searched.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In particular, the test for z-stream input completion
(zs.avail_in != 0) logically belongs with the test for
the Z_STREAM_END stream status. This is also consistent
with the identical check in finish_inflate().
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
At present, it is sufficient to ensure that an error return
from inflateInit() is not ignored. Most error returns, like
Z_VERSION_ERROR and Z_STREAM_ERROR, indicate programming or
build errors. These errors could, perhaps, be handled with
simple asserts. However, for a Z_MEM_ERROR, we may want to
perform some further error handling in the future.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In particular, neglecting to call inflateEnd() along various
codepaths in the inflate_tail() routine, would result in the
failure to release zlib internal state.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
These routines are intended to extract the directory and
base name from a path string. Note that these routines
do not interact with any filesystem and work only on the
text of the path.
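A rough sketch of the base name half, working purely on the string.
ex_basename() and its signature are hypothetical, and the edge cases
(empty path gives ".", trailing slashes are ignored, "/" stays "/")
are assumed to follow the usual POSIX basename conventions.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the base name of path into out. Returns the length of the
 * base name, or -1 if out is too small. No filesystem access.
 */
static int ex_basename(char *out, size_t outlen, const char *path)
{
    const char *start, *end;
    size_t len;

    if (path == NULL || *path == '\0') {
        start = ".";
        len = 1;
    } else {
        /* Skip trailing slashes. */
        end = path + strlen(path) - 1;
        while (end > path && *end == '/')
            end--;

        if (end == path && *end == '/') {
            start = "/"; /* the path was all slashes */
            len = 1;
        } else {
            /* Walk back to the character after the previous slash. */
            start = end;
            while (start > path && start[-1] != '/')
                start--;
            len = (size_t)(end - start) + 1;
        }
    }

    if (len + 1 > outlen)
        return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return (int)len;
}
```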
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In particular, the git__delta_apply() function has not been
declared prior to its definition. In order to suppress the
warning, include the delta-apply.h header which provides the
public interface. This ensures that the declaration and
definition are consistent.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The git__delta_apply() function can be used to apply a Git style
delta, such as those used in pack files or in git patch files,
to recover the original object stream.
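For orientation, a compact sketch of applying a Git style delta
follows. It is not the git__delta_apply() implementation; the names
and the signature are hypothetical. It assumes the usual encoding:
two variable-length sizes (base, result), then a stream of copy
commands (high bit set) and insert commands (1..127 literal bytes).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Decode one little-endian base-128 size from the delta header. */
static int ex_read_size(const unsigned char **p, const unsigned char *end,
                        size_t *out)
{
    size_t value = 0;
    unsigned shift = 0;
    unsigned char c;

    do {
        if (*p == end || shift >= sizeof(size_t) * 8)
            return -1;
        c = *(*p)++;
        value |= (size_t)(c & 0x7f) << shift;
        shift += 7;
    } while (c & 0x80);
    *out = value;
    return 0;
}

/* Returns a malloc'd result (caller frees) or NULL on a bad delta. */
static unsigned char *ex_delta_apply(const unsigned char *base,
                                     size_t base_len,
                                     const unsigned char *delta,
                                     size_t delta_len, size_t *out_len)
{
    const unsigned char *p = delta, *end = delta + delta_len;
    size_t src_size, dst_size, pos = 0;
    unsigned char *out;

    if (ex_read_size(&p, end, &src_size) < 0 || src_size != base_len)
        return NULL;
    if (ex_read_size(&p, end, &dst_size) < 0)
        return NULL;
    out = malloc(dst_size ? dst_size : 1);
    if (!out)
        return NULL;

    while (p < end) {
        unsigned char cmd = *p++;

        if (cmd & 0x80) {
            /* Copy: low bits say which offset/length bytes follow. */
            size_t off = 0, len = 0;
            int i;

            for (i = 0; i < 4; i++)
                if (cmd & (1 << i)) {
                    if (p == end)
                        goto fail;
                    off |= (size_t)*p++ << (8 * i);
                }
            for (i = 0; i < 3; i++)
                if (cmd & (0x10 << i)) {
                    if (p == end)
                        goto fail;
                    len |= (size_t)*p++ << (8 * i);
                }
            if (len == 0)
                len = 0x10000;
            if (off > base_len || len > base_len - off ||
                len > dst_size - pos)
                goto fail;
            memcpy(out + pos, base + off, len);
            pos += len;
        } else if (cmd) {
            /* Insert: cmd literal bytes follow in the delta. */
            if ((size_t)(end - p) < cmd || cmd > dst_size - pos)
                goto fail;
            memcpy(out + pos, p, cmd);
            p += cmd;
            pos += cmd;
        } else {
            goto fail; /* command byte 0 is reserved */
        }
    }
    if (pos != dst_size)
        goto fail;
    *out_len = dst_size;
    return out;

fail:
    free(out);
    return NULL;
}
```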
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The fanout table is accessed fairly often; we need to read it
twice for each object we look up in any given pack file. Most of
the processors running Git are running in little-endian mode, as
they are variants of the x86 platform, so reading the fanout is
a costly operation as we need to convert from network byte order
to local byte order. By decoding the fanout table into a
malloc-obtained buffer, we can save these 2 decode operations per
lookup and make the search go more quickly.
This also cleans up the initialization of the search functions
by cutting out a few instructions, saving a small amount of time.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The index data is mapped into memory and then scanned using a
binary search algorithm to locate the matching entry for the
supplied git_oid. The standard fanout hash trick is applied to
reduce the search space by 8 iterations.
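The fanout trick can be sketched as below. The names are hypothetical
and the id table is assumed to be a plain sorted array of 20-byte
ids; fanout[b] holds the number of ids whose first byte is <= b, so
the first byte narrows the range to 1/256 of the table (8 halvings)
before the binary search starts.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Build a host-order fanout table from a sorted id array. */
static void ex_build_fanout(const unsigned char *oids, uint32_t count,
                            uint32_t fanout[256])
{
    uint32_t i;
    int b;

    memset(fanout, 0, 256 * sizeof(uint32_t));
    for (i = 0; i < count; i++)
        fanout[oids[(size_t)i * 20]]++;
    for (b = 1; b < 256; b++)
        fanout[b] += fanout[b - 1];
}

/* Returns 0 and sets *pos on a match, -1 if the id is not present. */
static int ex_idx_search(const uint32_t fanout[256],
                         const unsigned char *oids,
                         const unsigned char id[20], uint32_t *pos)
{
    uint32_t lo = id[0] ? fanout[id[0] - 1] : 0;
    uint32_t hi = fanout[id[0]];

    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        int cmp = memcmp(id, oids + (size_t)mid * 20, 20);

        if (cmp == 0) {
            *pos = mid;
            return 0;
        }
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}
```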
Since the v1 and v2 file formats differ in their search function,
due to the different layouts used for the object records, we use
two different search implementations and a virtual function pointer
to jump to the correct version of code for the current pack index.
The single function jump per-pack should be faster than computing
a branch point inside the inner loop of a common binary search.
To improve concurrency during read operations the pack lock is only
held while verifying the index is actually open, or while opening
the index for the first time. This permits multiple concurrent
readers to scan through the same index.
If an invalid index file is opened we close it and mark the
git_pack's invalid bit to true. The git_pack structure is kept
around in its parent git_packlist, but the invalid bit will cause
all future readers to skip over the pack entirely. Pruning the
invalid entries is relatively unimportant because they shouldn't
be very common; a $GIT_DIRECTORY/objects/pack directory tends to
only have valid pack files.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Win32 has a variant of mmap that is harder to use than POSIX, but
to run natively and efficiently on Win32 we need some form of it.
gitfo_map_ro() provides a basic mmap function for use in locations
where we need read-only random data access to large ranges of a file,
such as a pack-*.idx.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Using an atomic reference counter is difficult to make
cross-platform, as the reference count implementations
are generally processor specific. It's also hard to do
a proper multi-read/single-write implementation.
We now use a simple mutex around the reference count for the list
of packs. Readers grab the mutex and either build the list, or
increment the existing one's reference count. When the reader is
done with the list, the reference count is decremented. In this way
parallel readers are able to operate on the list without worrying
about it being deallocated out from under them.
Individual pack structures are held by reference counts, but we
only care about the list the pack structure is held in. There is
no need to increment/decrement the pack reference counts as we
scan through them during a read operation; the caller holds the
git_packlist and that is sufficient to hold the packs it references.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
As far as gcc is concerned, the "z size specifier" is available as
an extension to the language, which is available with or without any
-std= switch. (I think you have to go back to 2.95 for a version
of gcc which doesn't work.) Many other compilers have this as an
extension as well (ie without the equivalent of -std=c99).
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
These headers aren't always available; they typically come from the
Linux kernel, but aren't supposed to be exported into the userspace
/usr/include. Modern kernels won't install these and some distros
rm -rf the directory post kernel header install.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Currently we only catalog the available pack files into a table,
storing their path names relative to the pack directory.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When scanning the pack directory we need to see if the path
name is present for ".idx" when we discover a ".pack" file.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Checking the return value of snprintf is a pain, as it must be
>= 0 and < sizeof(buffer). git__fmt is a simple wrapper to
perform these checks.
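A wrapper of this kind might look like the following sketch. The name
ex_fmt() and the choice of -1 as the failure value are assumptions;
only the two checks themselves come from the description above.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format into buf. Returns the number of characters written, or -1
 * if formatting failed or the result did not fit (was truncated).
 */
static int ex_fmt(char *buf, size_t buf_sz, const char *fmt, ...)
{
    va_list va;
    int r;

    va_start(va, fmt);
    r = vsnprintf(buf, buf_sz, fmt, va);
    va_end(va);

    if (r < 0 || (size_t)r >= buf_sz)
        return -1;
    return r;
}
```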
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Our fileops API is currently private. We aren't planning on supplying
a cross-platform file API to applications that link to us. If we did,
we'd probably publish fileops wholesale, not just the dirent code.
By moving it to be private we can also change the call signature to
permit the buffer to be passed down through the call chain. This is
very helpful when we are doing a recursive scan as we can reuse just
one buffer in all stack frames, reducing the impact the recursion has
on the stack frames in the data cache.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We grab the lock while accessing the alternates list, ensuring that
we only initialize it once for the given git_odb.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
These abstractions can be used to implement an efficient resource
reference counter and simple mutual exclusion. On pthreads we use
pthread_mutex_t, except when we are also on glibc and can directly
use its asm/atomic.h definitions.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
If we are using threads we need to make sure pthread.h comes
in before just about anything else. Some platforms enable
macros that alter what other headers define.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This error code indicates the OS error code has a better value
describing the last error, as it is likely a network or local
file IO problem identified by a C library function call.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We now forbid direct use of malloc, strdup or calloc within the
library and instead use wrapper functions git__malloc, etc. to
invoke the underlying library malloc and set git_errno to a no
memory error code if the allocation fails.
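In outline, the wrappers could look like the sketch below. ex_errno
and EX_ENOMEM are hypothetical stand-ins for git_errno and its
no-memory code, and a plain global is used here instead of a
thread-local variable. The functions are deliberately ordinary
(non-inline) functions.

```c
#include <stdlib.h>
#include <string.h>

enum { EX_SUCCESS = 0, EX_ENOMEM = -1 }; /* hypothetical error codes */

int ex_errno = EX_SUCCESS; /* stand-in for the library error variable */

void *ex_malloc(size_t n)
{
    void *p = malloc(n);
    if (!p)
        ex_errno = EX_ENOMEM;
    return p;
}

void *ex_calloc(size_t nelem, size_t elsize)
{
    void *p = calloc(nelem, elsize);
    if (!p)
        ex_errno = EX_ENOMEM;
    return p;
}

char *ex_strdup(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = ex_malloc(len); /* sets ex_errno on failure */
    if (p)
        memcpy(p, s, len);
    return p;
}
```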
In the future once we have pack objects in memory we are likely
to enhance these routines with garbage collection logic to purge
cached pack data when allocations fail. Because the size of the
function will grow somewhat large, we don't want to mark them for
inline as gcc tends to aggressively inline, creating larger than
expected executables.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We're likely to add additional path data, like the path of the
refs or the path to the config file into the git_odb structure,
as it may grow into the repository wrapper. Changing the name
of the objects directory reference makes it more clear should
we later add something else.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We didn't search for the object, so we cannot possibly promise it
to the caller of git_odb_read().
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This is the correct C99 format code for the size_t type when passed
as an argument to the *printf family. If the platform doesn't
define it, we assume %lu and just cross our fingers that it's the
proper setting for a size_t on this system. On most sane platforms,
"unsigned long" is the underlying type of "size_t".
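A sketch of how such a format macro is typically defined and used.
EX_PRIuZ, EX_SIZE_ARG() and ex_format_size() are hypothetical names,
the C99 detection is an assumption, and the cast in the fallback
branch is an addition of this example rather than something the
commit describes.

```c
#include <stddef.h>
#include <stdio.h>

#ifndef EX_PRIuZ
# if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#  define EX_PRIuZ "zu"
#  define EX_SIZE_ARG(n) (n)
# else
#  define EX_PRIuZ "lu" /* fallback: hope size_t is unsigned long */
#  define EX_SIZE_ARG(n) ((unsigned long)(n))
# endif
#endif

/* Format a size_t using the macro; returns snprintf's result. */
static int ex_format_size(char *buf, size_t buf_sz, size_t n)
{
    return snprintf(buf, buf_sz, "%" EX_PRIuZ " bytes", EX_SIZE_ARG(n));
}
```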
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The Mach-O format does not permit gcc to implement the __thread
TLS specification, so we must instead emulate it using a single
int cell allocated from memory and stored inside of the thread
specific data associated with the current pthread.
What makes this tricky is git_errno must be a valid lvalue, so
we really need to return a pointer to the caller and dereference it
as part of the git_errno macro.
The GCC-specific __attribute__((constructor)) extension is used
to ensure the pthread_key_t is allocated before any Git functions
are executed in the library, as this is necessary to access our
thread specific storage.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
[sp: Changed signature for output to use git_oid, and added
a test case to verify an allocated git_hash_ctx can be
reinitialized and reused.]
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In particular, when asked to read an empty file, this function
calls malloc() with a zero size allocation request. Standard C
says that the behaviour of malloc() in this case is implementation
defined.
[C99, 7.20.3 says "... If the size of the space requested is zero,
the behavior is implementation-defined: either a null pointer is
returned, or the behavior is as if the size were some nonzero
value, except that the returned pointer shall not be used to
access an object."]
Finesse the issue by over-allocating by one byte. Setting the extra
byte to '\0' may also provide a useful sentinel for text files.
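The idea, sketched with stdio rather than the library's own file
routines. struct ex_buf and ex_read_stream() are hypothetical; the
point is only that the allocation request is len + 1, so it is never
zero, and that the spare byte is set to '\0'.

```c
#include <stdio.h>
#include <stdlib.h>

struct ex_buf {
    unsigned char *data;
    size_t len;
};

/* Read the whole stream into a malloc'd buffer. Returns 0 or -1. */
static int ex_read_stream(FILE *f, struct ex_buf *out)
{
    long size;
    unsigned char *data;

    if (fseek(f, 0, SEEK_END) != 0)
        return -1;
    size = ftell(f);
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0)
        return -1;

    /* Over-allocate by one byte: never a zero-size request. */
    data = malloc((size_t)size + 1);
    if (!data)
        return -1;

    if (size > 0 && fread(data, 1, (size_t)size, f) != (size_t)size) {
        free(data);
        return -1;
    }
    data[size] = '\0'; /* sentinel, handy for text files */

    out->data = data;
    out->len = (size_t)size;
    return 0;
}
```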
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In particular, the gitfo_read_file() routine can be used to slurp
the complete file contents into a gitfo_buf structure. The buffer
content will be allocated by malloc() and may be released by the
gitfo_free_buf() routine. The io buffer type can be initialised
on the stack with the GITFO_BUF_INIT macro.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In particular, the warning relates to malloc(), which is
declared in <stdlib.h>. This header is now included,
indirectly, via the "common.h" header.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The PATH_MAX symbol is often, but not always, defined
in the <limits.h> header. In particular, on cygwin you
need to include this header to avoid a compilation error.
However, some systems define PATH_MAX to be something as
small as 256, which POSIX is happy to allow, while others
allow much larger values. In general it can vary from
one filesystem to another.
In order to avoid the vagaries of different systems, define
our own symbol.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
[sp: Credit for some of this implementation goes to Pieter, I
started off a patch he proposed for libgit2 but reworked
enough of it that I don't want to blame him for any bugs.]
Suggested-by: Pieter de Bie <pdebie@ai.rug.nl>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Commit dff79e27d3 renamed the (small object) "git_sobj" to a
plain "git_obj", but neglected to update some of the
documentation to reflect that change.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since at least MS have something like GetFirstDirEnt() and
GetNextDirEnt() (presumably with superior performance), we
can let MS hackers add support for a dirent walker using
that API instead, while we stick with the posix-style
readdir() calls.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The idea is taken from Junio's work in read-cache.c, where
it's used for writing out the index without tap-dancing on
the poor harddrive. Since it's almost certainly useful for
cached writing of packfiles too, we turn it into a generic
API, making it perfectly simple to reuse it later.
gitfo_write_cached() has the same contract as gitfo_write(): it
returns GIT_SUCCESS if all bytes are successfully written (or were
at least buffered for later writing), and <0 if an error occurs
during buffer writing.
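A sketch of a write cache with that contract. Everything here is
hypothetical (names, the tiny cache size, the callback standing in
for the real write); 0 plays the role of GIT_SUCCESS.

```c
#include <stddef.h>
#include <string.h>

#define EX_CACHE_SIZE 8 /* tiny on purpose, to make flushing visible */

/* Performs the real write; returns 0 on success, <0 on error. */
typedef int (*ex_sink_fn)(void *ctx, const void *data, size_t len);

struct ex_wcache {
    unsigned char buf[EX_CACHE_SIZE];
    size_t pos;
    ex_sink_fn sink;
    void *ctx;
};

/* Push any buffered bytes to the sink. */
static int ex_wcache_flush(struct ex_wcache *c)
{
    int r = 0;

    if (c->pos) {
        r = c->sink(c->ctx, c->buf, c->pos);
        c->pos = 0;
    }
    return r;
}

/* Returns 0 if all bytes were written or buffered, <0 on error. */
static int ex_wcache_write(struct ex_wcache *c, const void *data,
                           size_t len)
{
    const unsigned char *p = data;

    while (len) {
        size_t space = EX_CACHE_SIZE - c->pos;
        size_t n = len < space ? len : space;

        memcpy(c->buf + c->pos, p, n);
        c->pos += n;
        p += n;
        len -= n;
        if (c->pos == EX_CACHE_SIZE) {
            int r = ex_wcache_flush(c);
            if (r < 0)
                return r;
        }
    }
    return 0;
}

/* Demonstration sink: appends to a fixed in-memory array. */
struct ex_mem {
    unsigned char bytes[64];
    size_t len;
};

static int ex_mem_sink(void *ctx, const void *data, size_t len)
{
    struct ex_mem *m = ctx;

    if (len > sizeof(m->bytes) - m->len)
        return -1;
    memcpy(m->bytes + m->len, data, len);
    m->len += len;
    return 0;
}
```

The caller is expected to flush once more before closing, since the
last partial buffer is otherwise never handed to the sink.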
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since it doesn't make sense to make the disk access stuff
portable *AND* public (that's a job for each application
imo), we can take a shortcut and just support unixy stuff
for now and get away with coding most of it as macros.
Since we go with an internal API for starters and only
provide higher-level APIs to the libgit users, we'll be
ok with this approach.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since it's being added when we install the headers anyway,
we might as well get rid of it. If anything, we should point
coders to the COPYING file in the project's root directory
instead of duplicating the same (large-ish) text everywhere.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This adds the per-thread global variable git_errno to the
system, which callers can examine to get information about
an error.
Two helper functions are added to reduce LoC-count for the
library code itself.
Also, some exceptions are made for running sparse on GIT_TLS
definitions, since it doesn't grok thread-local variables at
all.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
ARRAY_SIZE() et al go in util.h, included from common.h
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This one pulls in compiler compatibility macros, some
common header files, and also the public common.h header.
C source files are modified to use the private common.h
in favour of the public one.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Otherwise their prototypes don't match their definitions.
Detected by 'sparse', which is obviously good to run
before each commit.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Using it in the first place means something's wrong.
This patch replaces it with an internal header which
carries the previously "protected" code instead.
Internal source-files simply include "commit.h" and
they're done. The internal header includes the public
one to make sure we always use the proper prototype.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
It doesn't cover all cases, but we can work on those as
we go along. For now, gcc, MSVC++, Intel C/C++, IBM XL C/C++,
Sun Studio C/C++ and Borland C++ Builder are the supported
compilers (although we boldly assume that they all are of
a recent enough version to support thread-local storage).
This is intended to be used in upcoming patches that implement
graceful (but TLS-dependent) error-handling in the library.
As an added bonus, we also bring the online_cpus() function
from git.git to detect the number of usable CPUs.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
It's arguably smoother to keep them close to the source,
as that's where one's working when modifying them. More
important, though, is the ability to use private headers
in the src/ dir that simply include "git/$samename.h" to
get to the public API at the same time.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
git_revp is something I personally can't stop pronouncing
"rev pointer". I'm sure others would suffer the same
problem.
Also, rename the git_revp_ sub-api "gitrp_". This is the
first of many such renames, primarily done to prevent
extreme inflation in the "git_" namespace, which we'd like
to reserve for a higher-level API.
While we're at it, we remove the noise-char "c" from a lot
of functions. Since revision walking is all about commits,
the common case should be that we're dealing with commits.
Exceptions can get a more mnemonic description as needed.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The 's' never really made sense, since it's not a "small"
object at all, but rather a plain object. As such, it should
have a "plain" object name.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We never want to accept a short read or a short write when
transferring data to or from a local file.
Either the entire read (or write) completes or the operation
failed and we will not recover gracefully from it.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
These are easily built off the standard C library functions memcpy
and memcmp. By marking these inline we stand a good chance of
the C compiler replacing the entire thing with tight machine code,
because many compilers will actually inline a memcmp or memcpy when
the 3rd argument (the size) is a constant value.
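For illustration, such helpers might be written as follows. The
ex_oid type and function names are hypothetical; the relevant detail
is that the size passed to memcpy and memcmp is a compile-time
constant.

```c
#include <string.h>

typedef struct {
    unsigned char id[20];
} ex_oid;

/* Copy one id into another; the size is a constant expression. */
static inline void ex_oid_cpy(ex_oid *out, const ex_oid *src)
{
    memcpy(out->id, src->id, sizeof(src->id));
}

/* Compare two ids; <0, 0 or >0, like memcmp. */
static inline int ex_oid_cmp(const ex_oid *a, const ex_oid *b)
{
    return memcmp(a->id, b->id, sizeof(a->id));
}
```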
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This way we can start to write IO code to read and write files in the
Git object database, but provide a hook to inject native Win32 APIs
instead so libgit2 can be ported to run natively on that platform.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This isn't the best idea I've had. Pierre Habouzit was suggesting
a technique of assigning a unique integer to each commit and then
allocating storage out of auxiliary pools, using the commit's unique
integer to index into any auxiliary pool in constant time. This way
both applications and the library can efficiently attach arbitrary
data onto a commit, such as rewritten parents, or flags, and have
them disconnected from the main object hash table.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This way only structures we ask the caller to allocate on their
call stack or which we want to allow them to use members from
are shown in the API docs.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Most read calls will use the small object format, as the
majority of the content within the database is very small
objects (under 20 KB when inflated).
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>