The current code issues a lot of strncmp() calls in order to check for
the end of the header, simply in order to copy it and start going
through it again. These are a lot of calls for something we can check as
we go along. Knowing the amount of parents beforehand to reduce
allocations in extreme cases does not make up for them.
Instead start parsing immediately and check for the double-newline after
each header field, leaving the raw_header allocation for the end, which
lets us go through the header once and reduces the amount of strncmp()
calls significantly.
In unscientific testing, this has reduced a shortlog-like usage (walking
though the whole history of a branch and extracting data from the
commits) of git.git from ~830ms to ~700ms and makes the time we spend in
strncmp() negligible.
Any well-behaved program should write a descriptive message to the
reflog whenever it updates a reference. Let's make this more prominent
by removing the version without the reflog parameters.
This converts the array of parent SHAs from a git_vector where
each SHA has to be separately allocated to a git_array_t where
all the SHAs can be kept in one block. Since the two collections
have almost identical APIs, there isn't much involved in making
the change. I did add an API to git_array_t so that it could be
allocated at a precise initial size.
This adds an example implementation that emulates git cat-file.
It is a convenient and relatively simple example of getting data
out of a repository.
Implementing this also revealed that there are a number of APIs
that are still not using const pointers to objects that really
ought to be. The main cause of this is that `git_vector_bsearch`
may need to call `git_vector_sort` before doing the search, so a
const pointer to the vector is not allowed. However, for tree
objects, with a little care, we can ensure that the vector of
tree entries is always sorted and allow lookups to take a const
pointer. Also, the missing const in commit objects just looks
like an oversight.
This adds create and free callback to the git_objects_table so
that more of the creation and destruction of objects can be table
driven instead of using switch statements. This also makes the
semantics of certain object creation functions consistent so that
we can make better use of function pointers. This also fixes a
theoretical error case where an object allocation fails and we
end up storing NULL into the cache.
Actually this renames git_commit_create_oid to
git_commit_create_from_oids and moves the API declaration to
include/git2/sys/commit.h since it is a dangerous API for general
use (because it doesn't check that the OID list items actually
refer to real objects).
The end of the header is signaled by to consecutive LFs and the commit
message starts immediately after. Jumping over LFs at the start of the
message is a bug and leads to creating different commits if
when rebuilding history.
This also fixes an empty commit message being returned as "\n".
Implicit type conversion argument of function to size_t type
Suspicious sequence of types castings: size_t -> int -> size_t
Consider reviewing the expression of the 'A = B == C' kind. The expression is calculated as following: 'A = (B == C)'
Unsigned type is never < 0
When the encoding header changed to be treated as an additional
header, the EOL pointer started to point to the byte after the LF,
making the git__strndup call copy the LF into the value.
Increase the EOL pointer value after copying the data to keep the rest
of the semantics but avoid copying LF.
This moves the check for the "encoding" header into a loop which
is just scanning for non-required headers at the end of a commit
header. That loop will skip unrecognized lines (including header
continuation lines) until a terminating completely blank line is
found, and only then does it move to reading the commit message.
'git commit' and 'git tag -a' enforce some conventions, like cleaning up excess whitespace and making sure that the last line ends with a '\n'. This fix replicates this behavior.
Fixlibgit2/libgit2sharp#117
This also includes droping `git_buf_lasterror` because it makes no sense
in the new system. Note that in most of the places were it has been
dropped, the code needs cleanup. I.e. GIT_ENOMEM is going away, so
instead it should return a generic `-1` and obviously not throw
anything.
git_commit_create is supposed to update the given reference
"update_ref", but segfaulted in case of a yet to be born
reference. Fix it.
Signed-off-by: schu <schu-github@schulog.org>
This converts virtually all of the places that allocate GIT_PATH_MAX
buffers on the stack for manipulating paths to use git_buf objects
instead. The patch is pretty careful not to touch the public API
for libgit2, so there are a few places that still use GIT_PATH_MAX.
This extends and changes some details of the git_buf implementation
to add a couple of extra functions and to make error handling easier.
This includes serious alterations to all the path.c functions, and
several of the fileops.c ones, too. Also, there are a number of new
functions that parallel existing ones except that use a git_buf
instead of a stack-based buffer (such as git_config_find_global_r
that exists alongsize git_config_find_global).
This also modifies the win32 version of p_realpath to allocate whatever
buffer size is needed to accommodate the realpath instead of hardcoding
a GIT_PATH_MAX limit, but that change needs to be tested still.
The ownership semantics have been changed all over the library to be
consistent. There are no more "borrowed" or duplicated references.
Main changes:
- `git_repository_open2` and `3` have been dropped.
- Added setters and getters to hotswap all the repository owned
objects:
`git_repository_index`
`git_repository_set_index`
`git_repository_odb`
`git_repository_set_odb`
`git_repository_config`
`git_repository_set_config`
`git_repository_workdir`
`git_repository_set_workdir`
Now working directories/index files/ODBs and so on can be
hot-swapped after creating a repository and between operations.
- All these objects now have proper ownership semantics with
refcounting: they all require freeing after they are no longer
needed (the repository always keeps its internal reference).
- Repository open and initialization has been updated to keep in
mind the configuration files. Bare repositories are now always
detected, and a default config file is created on init.
- All the tests affected by these changes have been dropped from the
old test suite and ported to the new one.
Currently libgit2 shares pointers to its internal reference cache with
the user. This leads to several problems like invalidation of reference
pointers when reordering the cache or manipulation of the cache from
user side.
Give each user its own git_reference instead of leaking the internal
representation (struct reference).
Add the following new API functions:
* git_reference_free
* git_reference_is_packed
Signed-off-by: schu <schu-github@schulog.org>