Error messages should be sentence fragments, and therefore:
1. Should not begin with a capital letter,
2. Should not conclude with punctuation, and
3. Should not end a sentence and begin a new one.
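As a rough illustration of the three rules above, the following sketch checks a
message against them. The helper name `errmsg_is_fragment` is hypothetical and
not part of any library, and rule 3 is only approximated by looking for a
period followed by a space.

```c
#include <stdbool.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: true if msg looks like a sentence fragment. */
static bool errmsg_is_fragment(const char *msg)
{
	size_t len = strlen(msg);

	if (len == 0)
		return false;

	/* Rule 1: must not begin with a capital letter. */
	if (isupper((unsigned char)msg[0]))
		return false;

	/* Rule 2: must not conclude with sentence punctuation. */
	if (strchr(".!?", msg[len - 1]) != NULL)
		return false;

	/* Rule 3 (approximation): must not end a sentence and begin a new one. */
	if (strstr(msg, ". ") != NULL)
		return false;

	return true;
}
```

Under these assumptions, "could not open file" passes, while "Could not open
file", "could not open file." and "could not open file. try again" do not.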
When we read from the list which `limit_list()` gives us, we need to check that
the commit is still interesting, as it might have become uninteresting after it
was added to the list.
The default sort order changed from implementation-defined to git's default sorting, as there are
systems (e.g. rebase) which depend on this order. Also specify more explicitly
how you can get git's "date-order".
After `limit_list()` we already have the list in time-sorted order, which is
what we want in the "default" case. Enqueueing into the "unsorted" list would
just reverse it, and the topological sort will do its own sorting if it needs
to.
We've now moved to code that's closer to git and produces the output
during the preparation phase, so we no longer process the commits as
part of generating the output.
This makes a chunk of code redundant, as we're simply short-circuiting
it by detecting we've processed the commits already.
After porting over the commit hiding and selection we were still left
with mismatching output due to the topological sort.
This ports the topological sorting code to make us match with our
equivalent of `--date-order` and `--topo-order` against the output
from `rev-list`.
We had some home-grown logic to figure out which objects to show during
the revision walk, but it was rather inefficient, looking over the same
list multiple times to figure out when we had run out of interesting
commits. We now use the lists in a smarter way.
We also introduce the slop mechanism to determine when to stop
looking. When we run out of interesting objects, we continue preparing
the walk for another 5 rounds in order to make it less likely that we
miss objects in situations with complex graphs.
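The slop idea can be sketched roughly as a counter that is reset while
interesting commits remain and counts down once they run out. This is a
simplified illustration only: the names `SLOP`, `update_slop` and
`rounds_before_stop` are assumptions made for this example, and the real
check also has to decide what "interesting commits remain" means.

```c
#include <stdbool.h>

#define SLOP 5 /* extra rounds to keep going after running out */

/* Returns the remaining slop; 0 means we can stop preparing. */
static int update_slop(bool interesting_left, int slop)
{
	if (interesting_left)
		return SLOP; /* reset while there is still something to find */
	return slop - 1;
}

/*
 * Simulates the preparation loop over per-round flags and returns how
 * many rounds were processed before stopping.
 */
static int rounds_before_stop(const bool *interesting_left, int n)
{
	int slop = SLOP;
	int i;

	for (i = 0; i < n; i++) {
		slop = update_slop(interesting_left[i], slop);
		if (slop == 0)
			return i + 1;
	}
	return n;
}
```

If only the first round sees an interesting commit, this loop runs that round
plus five more before giving up; any interesting commit seen in between
resets the countdown.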
When walking backwards and marking parents uninteresting, make sure we
detect when the list of commits we have left has run out of
uninteresting commits so we can stop marking commits as
uninteresting. Failing to do so can mean that we walk the whole history
marking everything uninteresting, which eats up time, CPU and IO with
useless work.
While pre-marking does look for this, we still need to check during the
main traversal as there are setups for which pre-marking does not leave
enough information in the commits. This can happen if we push a commit
and hide its parent.
When a commit is first set as uninteresting and then pushed, we must take
care not to put it into the commit list. We assume that we want anything
in the commit list, so inserting it would make us return at least that
commit, and possibly more.
Keep the declarations in the headers, while putting the definitions in
the C files. Putting the function definitions in headers causes
them to be duplicated if you include two headers with them.
There are some combinations of objects and target types which we know
cannot be fulfilled. Return EINVALIDSPEC for those to signify that there
is a mismatch in the user-provided data and what the object model is
capable of satisfying.
If we start at a tag and in the course of peeling find out that we
cannot reach a particular type, we return EPEEL.
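The distinction between the two errors can be sketched with a toy object
model. Everything here is an assumption made for illustration: the types,
the `peel` function and the `ERR_INVALIDSPEC`/`ERR_PEEL` constants are
stand-ins, not the library's API, and which combinations count as
impossible is a simplification.

```c
#include <stddef.h>

enum obj_type { OBJ_COMMIT, OBJ_TREE, OBJ_BLOB, OBJ_TAG };
enum { PEEL_OK = 0, ERR_INVALIDSPEC = -1, ERR_PEEL = -2 };

struct object {
	enum obj_type type;
	/* A tag's target or a commit's tree; NULL for trees and blobs. */
	const struct object *target;
};

static int peel(const struct object **out, const struct object *obj,
		enum obj_type want)
{
	const struct object *cur = obj;

	if (obj->type == want) {
		*out = obj;
		return PEEL_OK;
	}

	/* Combinations we know up front can never be fulfilled. */
	if (obj->type == OBJ_BLOB || obj->type == OBJ_TREE)
		return ERR_INVALIDSPEC;
	if (obj->type == OBJ_COMMIT && want != OBJ_TREE)
		return ERR_INVALIDSPEC;

	/* Otherwise follow the chain until the requested type shows up. */
	while (cur->target != NULL) {
		cur = cur->target;
		if (cur->type == want) {
			*out = cur;
			return PEEL_OK;
		}
	}

	/* We had to look at the objects to find out it cannot be reached. */
	return ERR_PEEL;
}
```

In this model a blob asked to peel to a commit fails immediately with the
"invalid spec" error, whereas a tag pointing at a blob only fails with the
"peel" error after the chain has been followed.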
If there have been no pushes, we can immediately return ITEROVER. If
there have been no hides, we must not run the uninteresting pre-mark
phase, as we do not want to hide anything and this would simply cause us
to spend time loading objects.
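A minimal sketch of these two shortcuts, assuming the walker records
whether anything was pushed or hidden. The struct, its field names and the
`WALK_ITEROVER` constant are hypothetical stand-ins for this example.

```c
#include <stdbool.h>

enum { WALK_OK = 0, WALK_ITEROVER = -1 };

struct walk_state {
	bool did_push;    /* at least one commit was pushed */
	bool did_hide;    /* at least one commit was hidden */
	int premark_runs; /* how often the pre-mark phase ran (for the demo) */
};

/* Stand-in for the expensive phase that loads and marks commits. */
static void premark_uninteresting(struct walk_state *w)
{
	w->premark_runs++;
}

static int prepare_walk(struct walk_state *w)
{
	/* Nothing pushed: there is nothing to iterate over. */
	if (!w->did_push)
		return WALK_ITEROVER;

	/* Nothing hidden: skip the pre-mark phase entirely. */
	if (w->did_hide)
		premark_uninteresting(w);

	return WALK_OK;
}
```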
This introduces a phase at the start of preparing a walk which pre-marks
uninteresting commits, but only up to the common ancestors.
We do this in a similar way to git, by walking down the history and
marking (which is what we used to do), but we keep a time-sorted
priority queue of commits and stop marking as soon as there are only
uninteresting commits in this queue.
This is a similar rule to the one used to find the merge-base. As we
keep inserting commits regardless of the uninteresting bit, if there are
only uninteresting commits in the queue, it means we've run out of
interesting commits in our walk, so we can stop.
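The stopping rule can be sketched on a toy history as follows. This is an
illustration under simplifying assumptions, not the actual implementation:
the structs and function names are made up for the example, a linear scan
stands in for a real priority queue, and commit times are assumed to
decrease from child to parent.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_PARENTS 2
#define MAX_QUEUE 32

struct commit {
	long time;
	bool uninteresting;
	bool queued; /* already inserted once; never insert twice */
	struct commit *parents[MAX_PARENTS];
	size_t nparents;
};

struct queue {
	struct commit *items[MAX_QUEUE];
	size_t len;
};

static void queue_insert(struct queue *q, struct commit *c)
{
	if (c->queued || q->len == MAX_QUEUE)
		return;
	c->queued = true;
	q->items[q->len++] = c;
}

/* Pops the newest commit, or NULL if the queue is empty. */
static struct commit *queue_pop(struct queue *q)
{
	size_t i, best = 0;
	struct commit *c;

	if (q->len == 0)
		return NULL;
	for (i = 1; i < q->len; i++)
		if (q->items[i]->time > q->items[best]->time)
			best = i;
	c = q->items[best];
	q->items[best] = q->items[--q->len];
	return c;
}

/* True while at least one queued commit is still interesting. */
static bool interesting_left(const struct queue *q)
{
	size_t i;

	for (i = 0; i < q->len; i++)
		if (!q->items[i]->uninteresting)
			return true;
	return false;
}

/* Pre-marks from the pushed and hidden roots; returns commits visited. */
static int premark(struct commit **roots, size_t nroots)
{
	struct queue q = { {0}, 0 };
	struct commit *c;
	size_t i;
	int visited = 0;

	for (i = 0; i < nroots; i++)
		queue_insert(&q, roots[i]);

	while (interesting_left(&q) && (c = queue_pop(&q)) != NULL) {
		visited++;
		for (i = 0; i < c->nparents; i++) {
			if (c->uninteresting)
				c->parents[i]->uninteresting = true;
			/* Insert regardless of the uninteresting bit. */
			queue_insert(&q, c->parents[i]);
		}
	}
	return visited;
}
```

With two branches that share an ancestor, one pushed and one hidden, this
loop marks the shared ancestor and then stops, leaving everything older
than it untouched.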
The old mark_uninteresting() logic is still in place, but that stops
walking if it finds an already-uninteresting commit, so it will stop on
the ones we've pre-marked; but keeping it allows us to also hide those
that are hidden via the callback.
Preallocating two commits doesn't make much sense as leaving allocation
to the first array usage will allocate a sensible size with room for
growth.
This preallocation has also been hiding issues with strict aliasing in
the tests, as we have fairly simple histories and never trigger the
growth.
Instead of using a sentinel empty value to detect the last commit, let's
check for when we get a NULL from popping the stack, which lets us know
when we're done.
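The pattern looks roughly like this on a toy stack. The `stack` type and
its functions are made up for this sketch and are not the structures used
in the actual code.

```c
#include <stddef.h>

#define STACK_MAX 16

struct stack {
	const char *items[STACK_MAX];
	size_t len;
};

static int stack_push(struct stack *s, const char *item)
{
	if (s->len == STACK_MAX)
		return -1;
	s->items[s->len++] = item;
	return 0;
}

/* Returns NULL once the stack is empty; no sentinel entry is needed. */
static const char *stack_pop(struct stack *s)
{
	return s->len ? s->items[--s->len] : NULL;
}

/* Pops until NULL and returns how many entries were processed. */
static size_t drain(struct stack *s)
{
	size_t n = 0;

	while (stack_pop(s) != NULL)
		n++;
	return n;
}
```

Because the end of the walk is signalled by the pop itself, there is no
extra element whose contents could be read before being initialized.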
The current code causes us to read uninitialized data, although only on
RHEL/CentOS 6 in release mode. This is a readability win overall.
As a way to speed up the cases where we need to hide some commits, we
find out what the merge bases are so we know to stop marking commits as
uninteresting and avoid walking down a potentially very large number of
commits which we will never see. There are however two oversights in
the current code.
The merge-base finding algorithm fails to recognize that if it is only
given one commit, there can be no merge base. It instead walks down the
whole ancestor chain needlessly. Make it return an empty list
immediately in this situation.
The revwalk does not know whether the user has asked to hide any commits
at all. In situations where the user pushes multiple commits but doesn't
hide any, the above fix wouldn't do the trick. Keep track of whether the
user wants to hide any commits and only run the merge-base finding
algorithm when it's needed.
Let the user push committish objects and peel them to figure out which
commit to push to our queue.
This is for convenience and for allowing uses of
    git_revwalk_push_glob(w, "tags")
with annotated tags.
This updates the git_pqueue to simply be a set of specialized
init/insert/pop functions on a git_vector.
To preserve the pqueue feature of having a fixed size heap, I
converted the "sorted" field in git_vectors to a more general
"flags" field so that pqueue could mix in its own flag. This
had a bunch of ramifications because a number of places were
directly looking at the vector "sorted" field - I added a couple
new git_vector helpers (is_sorted, set_sorted) so the specific
representation of this information could be abstracted.
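The shape of that change can be sketched as below. The struct, flag and
helper names are illustrative assumptions rather than the library's exact
definitions; the point is only that callers go through helpers instead of
reading a dedicated "sorted" field.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; the second one is what a pqueue might mix in. */
enum {
	VEC_SORTED = 1 << 0,
	VEC_FIXED_SIZE = 1 << 1
};

struct vec {
	void **contents;
	size_t length;
	unsigned int flags; /* replaces a dedicated "sorted" field */
};

static bool vec_is_sorted(const struct vec *v)
{
	return (v->flags & VEC_SORTED) != 0;
}

static void vec_set_sorted(struct vec *v, bool sorted)
{
	if (sorted)
		v->flags |= VEC_SORTED;
	else
		v->flags &= ~(unsigned int)VEC_SORTED;
}
```

Toggling the sorted state this way leaves any other flag bits, such as the
fixed-size one, untouched.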
I accidentally wrote a separate priority queue implementation when
I was working on file rename detection as part of the file hash
signature calculation code. To simplify licensing terms, I just
adapted that to a general purpose priority queue and replaced the
old priority queue implementation that was borrowed from elsewhere.
This also removes parts of the COPYING document that no longer
apply to libgit2.