4455 Commits

Author SHA1 Message Date
Russell Belfer
960a04dd56 Initial integration of similarity metric to diff
This is the initial integration of the similarity metric into
the `git_diff_find_similar()` code path.  The existing tests all
pass, but the new functionality isn't currently well tested.  The
integration does go through the pluggable metric interface, so it
should be possible to drop in an alternative to the internal
metric that libgit2 implements.

This comes along with a behavior change for an existing interface;
namely, passing two NULLs to git_diff_blobs (or passing NULLs to
git_diff_blob_to_buffer) will now call the file_cb parameter zero
times instead of one time.  I know it's strange that that change
is paired with this other change, but it emerged from some
initialization changes that I ended up making.
2013-02-21 12:40:33 -08:00
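The pluggable metric interface mentioned in this commit can be pictured as a table of callbacks plus an opaque payload. The following is a minimal sketch under that assumption. The names similarity_metric, score_buffers and the byte-histogram metric are hypothetical and for illustration only; they are not libgit2's actual declarations.

```c
#include <stdlib.h>

/* Hypothetical pluggable metric: callbacks plus an opaque payload.
 * Illustrative names only, not libgit2's real declarations. */
typedef struct {
	/* Build a signature for a buffer into *out; return 0 on success. */
	int (*buffer_signature)(void **out, const char *buf, size_t len, void *payload);
	/* Release a signature created by buffer_signature. */
	void (*free_signature)(void *sig, void *payload);
	/* Compare two signatures, writing a 0..100 score to *score. */
	int (*similarity)(int *score, void *a, void *b, void *payload);
	void *payload;
} similarity_metric;

/* Toy drop-in metric: the signature is a 256-bucket byte histogram. */
static int hist_signature(void **out, const char *buf, size_t len, void *payload)
{
	size_t i;
	unsigned *hist = calloc(256, sizeof(unsigned));
	(void)payload;
	if (!hist)
		return -1;
	for (i = 0; i < len; i++)
		hist[(unsigned char)buf[i]]++;
	*out = hist;
	return 0;
}

static void hist_free(void *sig, void *payload)
{
	(void)payload;
	free(sig);
}

/* Score = 100 * (bytes in common) / (size of the larger input). */
static int hist_similarity(int *score, void *a, void *b, void *payload)
{
	const unsigned *ha = a, *hb = b;
	unsigned long common = 0, ta = 0, tb = 0;
	int i;
	(void)payload;
	for (i = 0; i < 256; i++) {
		common += ha[i] < hb[i] ? ha[i] : hb[i];
		ta += ha[i];
		tb += hb[i];
	}
	if (ta == 0 && tb == 0) {
		*score = 100; /* two empty inputs count as identical */
		return 0;
	}
	*score = (int)(common * 100 / (ta > tb ? ta : tb));
	return 0;
}

/* The caller only sees the struct, so any metric can be plugged in. */
static int score_buffers(const similarity_metric *m,
	const char *a, size_t alen, const char *b, size_t blen, int *score)
{
	void *sa = NULL, *sb = NULL;
	int error;

	if ((error = m->buffer_signature(&sa, a, alen, m->payload)) < 0)
		return error;
	if ((error = m->buffer_signature(&sb, b, blen, m->payload)) < 0) {
		m->free_signature(sa, m->payload);
		return error;
	}
	error = m->similarity(score, sa, sb, m->payload);
	m->free_signature(sa, m->payload);
	m->free_signature(sb, m->payload);
	return error;
}

static const similarity_metric histogram_metric = {
	hist_signature, hist_free, hist_similarity, NULL
};
```

Because score_buffers talks only to the struct, replacing the internal metric means supplying a different table, which is the property the commit message relies on.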
Vicent Martí
0309e85045 Merge pull request #1352 from ethomson/reuc_sort
add a sorter to the reuc on index creation
2013-02-21 09:05:48 -08:00
Edward Thomson
eb5ffd1944 add a sorter to the reuc on index creation 2013-02-21 11:00:29 -06:00
Russell Belfer
71a3d27ea6 Replace diff delta binary with flags
Previously the git_diff_delta recorded if the delta was binary.
This replaces that (with no net change in structure size) with
a full set of flags.  The flag values that were already in use
for individual git_diff_file objects are reused for the delta
flags, too (along with renaming those flags to make it clear that
they are used more generally).

This (a) makes things somewhat more consistent (because I was
using a -1 value in the "boolean" binary field to indicate unset,
whereas now I can just use the flags that are easier to understand),
and (b) will make it easier for me to add some additional flags to
the delta object in the future, such as marking the results of a
copy/rename detection or other deltas that might want a special
indicator.

While making this change, I officially moved some of the flags that
were internal only into the private diff header.

This also allowed me to remove a gross hack in rename/copy detect
code where I was overwriting the status field with an internal
value.
2013-02-20 15:10:21 -08:00
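The switch from a "boolean" binary field that used -1 for "unset" to a set of flags can be sketched with two bits encoding the three states. The flag names and the delta struct below are hypothetical stand-ins, not libgit2's actual definitions.

```c
#include <stdint.h>

/* Hypothetical flag values illustrating the idea; not libgit2's definitions. */
enum {
	DELTA_FLAG_BINARY     = (1u << 0), /* content known to be binary */
	DELTA_FLAG_NOT_BINARY = (1u << 1), /* content known to be text */
	DELTA_FLAG_VALID_OID  = (1u << 2)  /* object id has been computed */
};

typedef struct {
	uint32_t flags; /* replaces an "int binary" field where -1 meant unset */
} delta;

/* Tri-state answer: 1 binary, 0 text, -1 not yet determined. */
static int delta_is_binary(const delta *d)
{
	if (d->flags & DELTA_FLAG_BINARY)
		return 1;
	if (d->flags & DELTA_FLAG_NOT_BINARY)
		return 0;
	return -1;
}

/* Record the binary decision, clearing any earlier one first. */
static void delta_set_binary(delta *d, int is_binary)
{
	d->flags &= ~(uint32_t)(DELTA_FLAG_BINARY | DELTA_FLAG_NOT_BINARY);
	d->flags |= is_binary ? DELTA_FLAG_BINARY : DELTA_FLAG_NOT_BINARY;
}
```

Unrelated flags such as DELTA_FLAG_VALID_OID share the same word, so adding future markers (for example, for rename/copy results) does not grow the struct.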
Russell Belfer
9bc8be3d7e Refine pluggable similarity API
This plugs in the three basic similarity strategies for handling
whitespace via internal use of the pluggable API.  In so doing, I
realized that the use of git_buf in the hashsig API was not needed
and actually just made it harder to use, so I tweaked that API as
well.

Note that the similarity metric is still not hooked up in the
find_similarity code - this is just setting out the function that
will be used.
2013-02-20 15:09:41 -08:00
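One way to serve three whitespace strategies through a single pluggable callback is to pass the chosen strategy in the opaque payload. This is a minimal sketch under that assumption; ws_mode, ws_hash and the djb2-style hash are hypothetical choices for brevity, not the code from this commit.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical whitespace strategies; names are illustrative only. */
typedef enum {
	WS_NORMAL = 0,     /* hash every byte as-is */
	WS_IGNORE_ALL = 1, /* skip all whitespace, including newlines */
	WS_SMART = 2       /* skip indentation (whitespace at line start) */
} ws_mode;

/* One callback serves all three strategies; the mode arrives through the
 * opaque payload, the way a pluggable metric receives per-instance options. */
static unsigned long ws_hash(const char *buf, size_t len, void *payload)
{
	ws_mode mode = *(const ws_mode *)payload;
	unsigned long h = 5381; /* djb2-style hash, chosen only for brevity */
	int at_line_start = 1;
	size_t i;

	for (i = 0; i < len; i++) {
		unsigned char c = (unsigned char)buf[i];
		int space = isspace(c) && c != '\n'; /* whitespace other than newline */

		if (mode == WS_IGNORE_ALL && isspace(c))
			continue;
		if (mode == WS_SMART && space && at_line_start)
			continue; /* still in the line's indentation */
		at_line_start = (c == '\n');
		h = h * 33 + c;
	}
	return h;
}

/* Three metric "instances" share the code and differ only in payload. */
static ws_mode ws_modes[3] = { WS_NORMAL, WS_IGNORE_ALL, WS_SMART };
```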
Russell Belfer
a235e9d355 Pluggable similarity metric API 2013-02-20 15:09:41 -08:00
Russell Belfer
aa6432604e More tests of file signatures with whitespace opts
Seems to be working pretty well...
2013-02-20 15:09:41 -08:00
Russell Belfer
5e5848eb15 Change similarity metric to sampled hashes
This moves the similarity metric code out of buf_text and into a
new file.  Also, this implements a different approach to similarity
measurement based on a Rabin-Karp rolling hash where we only keep
the top 100 and bottom 100 hashes.  In theory, that should provide
enough samples to give a fairly accurate measurement while
limiting the amount of data we keep for file signatures no matter
how large the file is.
2013-02-20 15:09:40 -08:00
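The sampled-hash signature can be sketched as a polynomial (Rabin-Karp style) rolling hash over fixed-size windows, keeping only the 100 smallest and 100 largest window hashes. The window length (8 bytes), the base (257), the modulus (2^32 via unsigned overflow), and all names here are assumptions for illustration; libgit2's real implementation may differ.

```c
#include <stddef.h>
#include <stdint.h>

#define SIG_KEEP   100  /* samples kept at each end, per the commit message */
#define SIG_WINDOW 8    /* window length in bytes: an assumption */
#define SIG_BASE   257u /* polynomial base: an assumption */

typedef struct {
	uint32_t lo[SIG_KEEP]; /* smallest window hashes seen */
	uint32_t hi[SIG_KEEP]; /* largest window hashes seen */
	size_t nlo, nhi;
} hashsig_sketch;

/* Keep the SIG_KEEP smallest values by replacing the current maximum.
 * Linear scan for clarity; a heap would make each step O(log n). */
static void keep_lowest(uint32_t *set, size_t *n, uint32_t v)
{
	size_t i, worst = 0;
	if (*n < SIG_KEEP) { set[(*n)++] = v; return; }
	for (i = 1; i < SIG_KEEP; i++)
		if (set[i] > set[worst]) worst = i;
	if (v < set[worst]) set[worst] = v;
}

/* Keep the SIG_KEEP largest values by replacing the current minimum. */
static void keep_highest(uint32_t *set, size_t *n, uint32_t v)
{
	size_t i, worst = 0;
	if (*n < SIG_KEEP) { set[(*n)++] = v; return; }
	for (i = 1; i < SIG_KEEP; i++)
		if (set[i] < set[worst]) worst = i;
	if (v > set[worst]) set[worst] = v;
}

/* Roll the hash across every SIG_WINDOW-byte window and sample it, so the
 * signature size is bounded no matter how large the input is. */
static void hashsig_build(hashsig_sketch *sig, const unsigned char *buf, size_t len)
{
	uint32_t h = 0, top = 1; /* top = SIG_BASE^(SIG_WINDOW-1) mod 2^32 */
	size_t i;

	sig->nlo = sig->nhi = 0;
	for (i = 1; i < SIG_WINDOW; i++)
		top *= SIG_BASE;

	for (i = 0; i < len; i++) {
		if (i >= SIG_WINDOW)
			h -= buf[i - SIG_WINDOW] * top; /* drop byte leaving the window */
		h = h * SIG_BASE + buf[i];          /* shift in the new byte */
		if (i + 1 >= SIG_WINDOW) {          /* a full window is available */
			keep_lowest(sig->lo, &sig->nlo, h);
			keep_highest(sig->hi, &sig->nhi, h);
		}
	}
}
```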
Russell Belfer
99ba8f2322 wip: adding metric to diff 2013-02-20 15:09:40 -08:00
Russell Belfer
f3327cac1d Some similarity metric adjustments
This makes the text similarity metric treat \r as equivalent
to \n and makes it skip whitespace immediately following a line
terminator, so line indentation will have less effect on the
difference measurement (and so \r\n will be treated as just a
single line terminator).

This also separates the text and binary hash calculators into
two separate functions instead of having more if statements inside
the loop. This should make it easier to have more differentiated
heuristics in the future if we so wish.
2013-02-20 15:09:40 -08:00
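The adjustments in this commit (treat \r like \n, skip whitespace right after a line terminator, and keep separate text and binary hash functions) can be sketched as follows. The function names and the multiply-by-31 hash are hypothetical. Note that because newlines count as whitespace here, blank lines also collapse, which is how \r\n ends up as a single terminator.

```c
#include <stddef.h>
#include <stdint.h>

/* Text hashing: any \r or \n hashes as '\n', and whitespace that follows
 * a terminator (the \n of \r\n, blank lines, indentation) is skipped. */
static uint32_t hash_text(const char *buf, size_t len)
{
	uint32_t h = 0;
	size_t i = 0;

	while (i < len) {
		unsigned char c = (unsigned char)buf[i++];

		if (c == '\r' || c == '\n') {
			h = h * 31u + '\n';
			while (i < len && (buf[i] == ' ' || buf[i] == '\t' ||
			                   buf[i] == '\r' || buf[i] == '\n'))
				i++;
			continue;
		}
		h = h * 31u + c;
	}
	return h;
}

/* Binary hashing: no special cases, every byte counts. */
static uint32_t hash_binary(const unsigned char *buf, size_t len)
{
	uint32_t h = 0;
	size_t i;
	for (i = 0; i < len; i++)
		h = h * 31u + buf[i];
	return h;
}
```

Keeping the two loops apart means the text path can grow more heuristics without adding branches to the binary path.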
Russell Belfer
9c454b007b Initial implementation of similarity scoring algo
This adds a new `git_buf_text_hashsig` type and functions to
generate these hash signatures and compare them to give a
similarity score.  This can be plugged into diff similarity
scoring.
2013-02-20 15:09:40 -08:00
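Comparing two hash signatures to get a similarity score can be sketched by sorting both samples and counting shared values. This sketch assumes a Dice-style score, 100 * 2 * shared / (na + nb); the scoring formula and the name hashsig_score are assumptions, not necessarily what git_buf_text_hashsig used.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u32(const void *a, const void *b)
{
	uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
	return (x > y) - (x < y);
}

/* Score two hash samples from 0 to 100 as 2 * shared / (na + nb).
 * Both arrays are sorted in place. */
static int hashsig_score(uint32_t *a, size_t na, uint32_t *b, size_t nb)
{
	size_t i = 0, j = 0, common = 0;

	if (na + nb == 0)
		return 100; /* two empty signatures: treat as identical */

	qsort(a, na, sizeof(*a), cmp_u32);
	qsort(b, nb, sizeof(*b), cmp_u32);

	/* Merge-style walk over the two sorted arrays. */
	while (i < na && j < nb) {
		if (a[i] < b[j])
			i++;
		else if (a[i] > b[j])
			j++;
		else {
			common++;
			i++;
			j++;
		}
	}
	return (int)(200 * common / (na + nb));
}
```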
Vicent Martí
f2e1d06064 Merge pull request #1351 from arrbee/moar-treebuilder-tests
Add more treebuilder tests
2013-02-20 12:00:51 -08:00
Russell Belfer
0cfce06d08 Add more treebuilder tests
The recent changes with git_treebuilder_entrycount point out that
the test coverage for git_treebuilder_remove and
git_treebuilder_entrycount is completely absent.  This adds tests.
2013-02-20 11:58:21 -08:00
Vicent Martí
6ec37f7232 Merge pull request #1350 from arrbee/fix-1292
Add explicit entrycount to tree builder
2013-02-20 11:42:15 -08:00
Russell Belfer
e223717902 Some code cleanups in tree.c
This replaces most of the explicit vector iteration with calls
to git_vector_foreach, adds in some git__free and giterr_clear
calls to clean up during some error paths, and a couple of
other code simplifications.
2013-02-20 10:58:56 -08:00
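A foreach macro over a vector, in the spirit of git_vector_foreach, can be sketched like this. The vec type, vec_foreach and count_non_null are hypothetical stand-ins written for illustration; libgit2's own macro may differ in detail.

```c
#include <stddef.h>

/* Hypothetical minimal vector, for illustration only. */
typedef struct {
	void **contents;
	size_t length;
} vec;

/* Iterate with an index and the current element. The comma expression
 * assigns elem and then yields 1, so only the bounds check ends the loop. */
#define vec_foreach(v, iter, elem) \
	for ((iter) = 0; (iter) < (v)->length && ((elem) = (v)->contents[(iter)], 1); (iter)++)

/* Example use: count the non-NULL elements. */
static size_t count_non_null(const vec *v)
{
	size_t i, n = 0;
	void *e;

	vec_foreach(v, i, e) {
		if (e != NULL)
			n++;
	}
	return n;
}
```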
Russell Belfer
93ab370b53 Store treebuilder length separately from entries vec
The treebuilder entries vector flags removed items, which means
we can't rely on the entries vector length to accurately get the
number of entries.  This adds an entrycount value and maintains it
while updating the treebuilder entries.
2013-02-20 10:50:01 -08:00
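Why the vector length stops matching the entry count is easiest to see with a sketch: removal only marks an entry, so a separate counter has to be maintained. The builder and entry types below are hypothetical, with a fixed capacity to keep the example short.

```c
#include <stddef.h>

#define MAX_ENTRIES 16 /* fixed capacity keeps the sketch short */

/* Hypothetical entry: removal sets a flag instead of shrinking storage. */
typedef struct {
	const char *name;
	int removed;
} entry;

typedef struct {
	entry entries[MAX_ENTRIES];
	size_t length;     /* slots used, including removed entries */
	size_t entrycount; /* live entries only, maintained explicitly */
} builder;

static int builder_insert(builder *b, const char *name)
{
	if (b->length == MAX_ENTRIES)
		return -1;
	b->entries[b->length].name = name;
	b->entries[b->length].removed = 0;
	b->length++;
	b->entrycount++;
	return 0;
}

static int builder_remove(builder *b, size_t idx)
{
	if (idx >= b->length || b->entries[idx].removed)
		return -1; /* out of range or already removed */
	b->entries[idx].removed = 1;
	b->entrycount--; /* length is unchanged; only the live count drops */
	return 0;
}
```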
Russell Belfer
f7511c2c69 Merge pull request #1348 from libgit2/signatures-2
Simplify signature parsing
2013-02-20 10:19:58 -08:00
Vicent Martí
fd48d84317 Merge pull request #1349 from libgit2/clar-no-cache
Disable caching in Clar
2013-02-20 10:07:14 -08:00
Vicent Marti
63964c891b Disable caching in Clar 2013-02-20 18:49:00 +01:00
Vicent Marti
cf80993a50 signature: Small cleanup 2013-02-20 18:46:10 +01:00
Vicent Marti
41051e3fe1 signature: Shut up MSVC, you silly goose 2013-02-20 17:09:51 +01:00
Vicent Marti
c51880eeaf Simplify signature parsing 2013-02-20 17:03:18 +01:00
Vicent Martí
fd69c7bf9a Merge pull request #1344 from arrbee/fix-static-analyzer-issues
Fix static analyzer issues
2013-02-17 02:41:58 -08:00
Russell Belfer
56543a609a Clear up warnings from cppcheck
The cppcheck static analyzer generates warnings for a bunch of
places in the libgit2 code base.  All the ones fixed in this
commit are actually false positives, but I've reorganized the
code to hopefully make it easier for static analysis tools to
correctly understand the structure.  I wouldn't do this if I
felt like it was making the code harder to read or worse for
humans, but in this case, these fixes don't seem too bad and will
hopefully make it easier for better analysis tools to get at any
real issues.
2013-02-15 16:02:45 -08:00
Russell Belfer
71d62d3905 Fix memory leak in p_getaddrinfo on Amiga
If gethostbyname() fails on platforms with NO_ADDRINFO, the code
leaks the struct addrinfo that was allocated.  This fixes that
(and a number of code formatting issues in that area of code in
src/posix.c).
2013-02-15 16:01:31 -08:00
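The leak pattern fixed here, an allocation that is not released when the host lookup fails, can be sketched with stand-ins. fake_addrinfo and fake_lookup are hypothetical substitutes for struct addrinfo and gethostbyname() so the sketch runs without a network; the free counter exists only so a test can observe the error path.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for struct addrinfo and gethostbyname(). */
typedef struct {
	int family;
	const char *canonname;
} fake_addrinfo;

/* Returns 0 when the host "resolves"; an empty name fails. */
static int fake_lookup(const char *host)
{
	return host[0] == '\0' ? -1 : 0;
}

static int frees_done; /* counts frees so the error path can be observed */

static void release(fake_addrinfo *ai)
{
	frees_done++;
	free(ai);
}

static int resolve(const char *host, fake_addrinfo **out)
{
	fake_addrinfo *ai = calloc(1, sizeof(*ai));

	*out = NULL;
	if (!ai)
		return -1;

	if (fake_lookup(host) < 0) {
		release(ai); /* the fix: free the allocation on the failure path */
		return -1;
	}

	ai->family = 2; /* placeholder value for the sketch */
	ai->canonname = host;
	*out = ai;
	return 0;
}
```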
Russell Belfer
a7ed746093 Add rudimentary error checks and reformat comments
There were a number of functions assigning their return value to
`error` without much explanation.  I added in some rudimentary
error checking to help flesh out the example.

Also, I reformatted all of the comments down to 80 cols (and in
some cases, slightly updated the wording).
2013-02-15 15:58:13 -08:00
Vicent Martí
1d75acf7b7 Merge pull request #1342 from ghedo/development
push: fix typo in git_push_finish() doc
2013-02-15 04:21:41 -08:00
Alessandro Ghedini
91f7335e1c push: fix typo in git_push_finish() doc 2013-02-15 13:12:16 +01:00
Vicent Martí
fcd7733ded Merge pull request #1318 from nulltoken/topic/diff-tree-coverage
Topic/diff tree coverage
2013-02-14 12:49:46 -08:00
Vicent Martí
c9d17120ce Merge pull request #1340 from schu/push-docs
push: improve docs on success / failure of git_push_finish
2013-02-14 11:33:47 -08:00
Michael Schubert
a53b5e5fc3 push: improve docs on success / failure of git_push_finish 2013-02-14 20:22:48 +01:00
Ben Straub
a9e1339c06 Fix a leak when canceling a network operation 2013-02-14 08:12:55 -08:00
Philip Kelley
2fe67aeb10 Fix a git_filebuf leak (fixes Win32 clone::can_cancel) 2013-02-14 08:46:58 -05:00
Vicent Martí
b78600255c Merge pull request #1335 from phkelley/development
Improve MSVC compiler, linker flags
2013-02-14 03:58:11 -08:00
Philip Kelley
5f633e911e Change git2.rc to identify git.dll as VOS_NT_WINDOWS32 2013-02-13 18:12:51 -05:00
Philip Kelley
19be3f9e65 Improve MSVC compiler, linker flags 2013-02-13 16:01:14 -05:00
Ben Straub
6a0ffe84a7 Merge pull request #1333 from phkelley/push_options
Add git_push_options, to set packbuilder parallelism
2013-02-12 10:50:55 -08:00
Russell Belfer
fbe67de997 Merge pull request #1246 from arrbee/fix-force-text-for-diff-blobs
Add FORCE_TEXT check into git_diff_blobs code path
2013-02-12 10:16:30 -08:00
Russell Belfer
9c258af094 Merge pull request #1316 from ben/clone-cancel
Allow network operations to cancel
2013-02-12 10:13:56 -08:00
Russell Belfer
c2c0874de2 More diff tests with binary data 2013-02-11 14:45:46 -08:00
Russell Belfer
ed55fd8bf8 Reorganize FORCE_TEXT diff flag checks 2013-02-11 14:45:46 -08:00
Russell Belfer
c2907575ec Add FORCE_TEXT check into git_diff_blobs code path
`git_diff_blobs` and `git_diff_blob_to_buffer` skip the step
where we check file attributes because they don't have a filename
associated with the data. Unfortunately, this meant they were also
skipping the check for the GIT_DIFF_FORCE_TEXT option and so you
could not force a diff of an apparent binary file.  This adds the
force text check into their code path.
2013-02-11 14:45:46 -08:00
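The fix amounts to consulting the force-text option before the binary heuristic, even on the blob path where no filename (and so no attribute lookup) is available. This sketch assumes a NUL-byte check over the first 8000 bytes, similar in spirit to git's heuristic; OPT_FORCE_TEXT is a hypothetical stand-in for GIT_DIFF_FORCE_TEXT.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical option bit standing in for GIT_DIFF_FORCE_TEXT. */
#define OPT_FORCE_TEXT (1u << 0)

/* Assumed heuristic: data is binary if a NUL appears in the first 8000 bytes. */
#define BINARY_SNIFF_LEN 8000

static int looks_binary(const char *buf, size_t len)
{
	size_t n = len < BINARY_SNIFF_LEN ? len : BINARY_SNIFF_LEN;
	return memchr(buf, '\0', n) != NULL;
}

/* Blob diffs skip attribute lookup, but the force-text option must
 * still override the content sniffing. */
static int treat_as_binary(const char *buf, size_t len, unsigned opts)
{
	if (opts & OPT_FORCE_TEXT)
		return 0;
	return looks_binary(buf, len);
}
```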
Russell Belfer
40a605104c Merge pull request #1324 from nulltoken/topic/remote_isvalidname
Topic/remote isvalidname
2013-02-11 14:35:41 -08:00
nulltoken
2bca5b679b remote: Introduce git_remote_is_valid_name()
Fix libgit2/libgit2sharp#318
2013-02-11 23:19:41 +01:00
nulltoken
4d811c3b77 refs: No component of a refname can end with '.lock' 2013-02-11 23:19:40 +01:00
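The rule in this commit applies to every slash-separated component, not only the end of the full refname. A minimal checker for just that rule can be sketched as below; component_ends_with_lock is a hypothetical name, and the other refname rules are deliberately out of scope.

```c
#include <string.h>

/* Return 1 if any '/'-separated component of refname ends in ".lock".
 * Only this one rule is checked; other refname rules are out of scope. */
static int component_ends_with_lock(const char *refname)
{
	static const char suffix[] = ".lock";
	const size_t slen = sizeof(suffix) - 1;
	const char *start = refname;

	for (;;) {
		const char *end = strchr(start, '/');
		size_t len = end ? (size_t)(end - start) : strlen(start);

		if (len >= slen && memcmp(start + len - slen, suffix, slen) == 0)
			return 1;
		if (!end)
			return 0;
		start = end + 1; /* move to the next component */
	}
}
```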
nulltoken
624924e876 remote: reorganize tests 2013-02-11 23:19:39 +01:00
Russell Belfer
390a3c8141 Merge pull request #1190 from nulltoken/topic/reset-paths
reset: Allow the selective reset of pathspecs
2013-02-11 11:44:00 -08:00
Philip Kelley
e026cfee00 Merge pull request #1323 from jamill/resolve_remote
Resolve a remote branch's remote
2013-02-11 09:12:39 -08:00
Jameson Miller
db4bb4158f Teach refspec to transform destination reference to source reference 2013-02-11 11:36:28 -05:00
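Transforming a destination reference back to its source reference can be sketched for refspecs with at most one '*' per side, such as "refs/heads/*:refs/remotes/origin/*". The function names are hypothetical and the single-glob restriction is an assumption of the sketch; the caller frees the result.

```c
#include <stdlib.h>
#include <string.h>

/* Copy n bytes of s into a new NUL-terminated string. */
static char *dup_n(const char *s, size_t n)
{
	char *p = malloc(n + 1);
	if (p) {
		memcpy(p, s, n);
		p[n] = '\0';
	}
	return p;
}

/* Map a name matching the dst pattern back through the src pattern.
 * Returns NULL when the name does not match dst (or on allocation failure). */
static char *refspec_rtransform(const char *src, const char *dst, const char *name)
{
	const char *sstar = strchr(src, '*'), *dstar = strchr(dst, '*');
	size_t nlen = strlen(name), dpre, dsuf, mid, spre, ssuf;
	char *out;

	if (!sstar || !dstar) /* no glob: only an exact match maps back */
		return strcmp(dst, name) == 0 ? dup_n(src, strlen(src)) : NULL;

	dpre = (size_t)(dstar - dst); /* length before '*' in dst */
	dsuf = strlen(dstar + 1);     /* length after '*' in dst */
	if (nlen < dpre + dsuf ||
	    strncmp(name, dst, dpre) != 0 ||
	    strcmp(name + nlen - dsuf, dstar + 1) != 0)
		return NULL;

	mid = nlen - dpre - dsuf;     /* the text the '*' matched */
	spre = (size_t)(sstar - src);
	ssuf = strlen(sstar + 1);

	if (!(out = malloc(spre + mid + ssuf + 1)))
		return NULL;
	memcpy(out, src, spre);
	memcpy(out + spre, name + dpre, mid);
	memcpy(out + spre + mid, sstar + 1, ssuf + 1); /* includes the NUL */
	return out;
}
```

For example, with src "refs/heads/*" and dst "refs/remotes/origin/*", the name "refs/remotes/origin/master" maps back to "refs/heads/master".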
Jameson Miller
2e3e8c889b Teach remote branch to return its remote 2013-02-11 11:36:22 -05:00