When extracting a commit's signature, we first free the object and only
afterwards put its signature contents into the result buffer. This works
in most cases - the free'd object will normally be cached anyway, so we
only end up decrementing its reference count without actually freeing
its contents. But in some more exotic setups, where caching is disabled,
this can definitly be a problem, as we might be the only instance
currently holding a reference to this object.
Fix this issue by first extracting the contents and freeing the object
afterwards only.
The functions `git_commit_header_field` and
`git_commit_extract_signature` both receive buffers used to hand back
the results to the user. While these functions called `git_buf_sanitize`
on these buffers, this is not the right thing to do, as it will simply
initialize or zero-terminate passed buffers. As we want to overwrite
contents, we instead have to call `git_buf_clear` to completely reset
them.
Error messages should be sentence fragments, and therefore:
1. Should not begin with a capital letter,
2. Should not conclude with punctuation, and
3. Should not end a sentence and begin a new one
When parsing a commit, we will treat all bytes left after parsing
the headers as the commit message. When no bytes are left, we
leave the commit's message uninitialized. While uncommon to have
a commit without message, this is the right behavior as Git
unfortunately allows for empty commit messages.
Given that this scenario is so uncommon, most programs acting on
the commit message will never check if the message is actually
set, which may lead to errors. To work around the error and not
lay the burden of checking for empty commit messages to the
developer, initialize the commit message with an empty string
when no commit message is given.
When calling `git_commit_create` with an empty array of `parents` and `parent_count == 0`
the call will segfault at https://github.com/libgit2/libgit2/blob/master/src/commit.c#L107
when it's trying to compare `current_id` to a null parent oid.
This just puts in a check to stop that segfault.
The function to extract signatures suffers from a similar bug to the
header field finding one by having an unecessary line feed check as a
break condition of its loop.
Fix that and add a test for this single-line signature situation.
Sometimes you want to create a commit but not write it out to the
objectdb immediately. For these cases, provide a new function to
retrieve the buffer instead of having to go through the db.
We should be checking whether the object we're looking up is a commit,
and we should let the caller know whether the not-found return code
comes from a bad object type or just a missing signature.
When we moved the logic to handle the first one, wrong loop logic was
kept in place which meant we still finished early. But we now notice it
because we're not reading past the last LF we find.
This was not noticed before as the last field in the tested commit was
multi-line which does not trigger the early break.
We were searching only past the first header field, which meant we were
unable to find e.g. `tree` which is the first field.
While here, make sure to set an error message in case we cannot find the
field.
It is already possible to get a commit's summary with the
`git_commit_summary` function. It is not possible to get the
remaining part of the commit message, that is the commit
message's body.
Fix this by introducing a new function `git_commit_body`.
This allows the user to look up fields which we don't parse in libgit2,
and allows them to access gpgsig or mergetag fields if they wish to
check the signature.
Some tools create multiple author fields. git is rather lax when parsing
them, although fsck does complain about them. This means that they exist
in the wild.
As it's not too taxing to check for them, and there shouldn't be a
noticeable slowdown when dealing with correct commits, add logic to skip
over these extra fields when parsing the commit.
The signature for the reflog is not something which changes
dynamically. Almost all uses will be NULL, since we want for the
repository's default identity to be used, making it noise.
In order to allow for changing the identity, we instead provide
git_repository_set_ident() and git_repository_ident() which allow a user
to override the choice of signature.
The current version of the commit creation and amend function are unsafe
to use when passing the update_ref parameter, as they do not check that
the reference at the moment of update points to what the user expects.
Make sure that we're moving history forward when we ask the library to
update the reference for us by checking that the first parent of the new
commit is the current value of the reference. We also make sure that the
ref we're updating hasn't moved between the read and the write.
Similarly, when amending a commit, make sure that the current tip of the
branch is the commit we're amending.
We can make use of git_object_dup to use refcounting instead of pointer
comparison to make sure we don't free the caller's object.
This also lets us simplify the case for '~0' which is now just an
assignment instead of looking up the object we have at hand.
This adds an API to amend an existing commit, basically a shorthand
for creating a new commit filling in missing parameters from the
values of an existing commit. As part of this, I also added a new
"sys" API to create a commit using a callback to get the parents.
This allowed me to rewrite all the other commit creation APIs so
that temporary allocations are no longer needed.
The current code issues a lot of strncmp() calls in order to check for
the end of the header, simply in order to copy it and start going
through it again. These are a lot of calls for something we can check as
we go along. Knowing the amount of parents beforehand to reduce
allocations in extreme cases does not make up for them.
Instead start parsing immediately and check for the double-newline after
each header field, leaving the raw_header allocation for the end, which
lets us go through the header once and reduces the amount of strncmp()
calls significantly.
In unscientific testing, this has reduced a shortlog-like usage (walking
though the whole history of a branch and extracting data from the
commits) of git.git from ~830ms to ~700ms and makes the time we spend in
strncmp() negligible.
Any well-behaved program should write a descriptive message to the
reflog whenever it updates a reference. Let's make this more prominent
by removing the version without the reflog parameters.
This converts the array of parent SHAs from a git_vector where
each SHA has to be separately allocated to a git_array_t where
all the SHAs can be kept in one block. Since the two collections
have almost identical APIs, there isn't much involved in making
the change. I did add an API to git_array_t so that it could be
allocated at a precise initial size.
This adds an example implementation that emulates git cat-file.
It is a convenient and relatively simple example of getting data
out of a repository.
Implementing this also revealed that there are a number of APIs
that are still not using const pointers to objects that really
ought to be. The main cause of this is that `git_vector_bsearch`
may need to call `git_vector_sort` before doing the search, so a
const pointer to the vector is not allowed. However, for tree
objects, with a little care, we can ensure that the vector of
tree entries is always sorted and allow lookups to take a const
pointer. Also, the missing const in commit objects just looks
like an oversight.
This adds create and free callback to the git_objects_table so
that more of the creation and destruction of objects can be table
driven instead of using switch statements. This also makes the
semantics of certain object creation functions consistent so that
we can make better use of function pointers. This also fixes a
theoretical error case where an object allocation fails and we
end up storing NULL into the cache.