This makes the text similarity metric treat \r as equivalent
to \n and makes it skip whitespace immediately following a line
terminator, so line indentation will have less effect on the
difference measurement (and so \r\n will be treated as just a
single line terminator).
This also separates the text and binary hash calculators into
two separate functions instead of have more if statements inside
the loop. This should make it easier to have more differentiated
heuristics in the future if we so wish.
This adds a new `git_buf_text_hashsig` type and functions to
generate these hash signatures and compare them to give a
similarity score. This can be plugged into diff similarity
scoring.
The recent changes with git_treebuilder_entrycount point out that
the test coverage for git_treebuilder_remove and
git_treebuilder_entrycount is completely absent. This adds tests.
This replaces most of the explicit vector iteration with calls
to git_vector_foreach, adds in some git__free and giterr_clear
calls to clean up during some error paths, and a couple of
other code simplifications.
The treebuilder entries vector flags removed items which means
we can't rely on the entries vector length to accurately get the
number of entries. This adds an entrycount value and maintains it
while updating the treebuilder entries.
The cppcheck static analyzer generates warnings for a bunch of
places in the libgit2 code base. All the ones fixed in this
commit are actually false positives, but I've reorganized the
code to hopefully make it easier for static analysis tools to
correctly understand the structure. I wouldn't do this if I
felt like it was making the code harder to read or worse for
humans, but in this case, these fixes don't seem too bad and will
hopefully make it easier for better analysis tools to get at any
real issues.
If gethostbyname() fails on platforms with NO_ADDRINFO, the code
leaks the struct addrinfo that was allocated. This fixes that
(and a number of code formatting issues in that area of code in
src/posix.c).
There were a number of functions assigning their return value to
`error` without much explanation. I added in some rudimentary
error checking to help flesh out the example.
Also, I reformatted all of the comments down to 80 cols (and in
some cases, slightly updated the wording).
`git_diff_blobs` and `git_diff_blob_to_buffer` skip the step
where we check file attributes because they don't have a filename
associated with the data. Unfortunately, this meant they were also
skipping the check for the GIT_DIFF_FORCE_TEXT option and so you
could not force a diff of an apparent binary file. This adds the
force text check into their code path.