libgit2

proxmox-mirrors/libgit2


Mirror of https://git.proxmox.com/git/libgit2, synced 2025-12-07 12:08:42 +00:00

Commit Graph


Author	SHA1	Message	Date

Russell Belfer

9bc8be3d7e

Refine pluggable similarity API

This plugs in the three basic similarity strategies for handling
whitespace via internal use of the pluggable API.  In so doing, I
realized that the use of git_buf in the hashsig API was not needed
and actually just made it harder to use, so I tweaked that API as
well.

Note that the similarity metric is still not hooked up in the
find_similarity code - this is just setting out the function that
will be used.

2013-02-20 15:09:41 -08:00
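The commit above routes its whitespace strategies through a pluggable similarity API and builds signatures from a raw buffer instead of a `git_buf`. The C sketch below illustrates that shape under stated assumptions. The `similarity_metric` struct, the three `ws_mode` values, `metric_for`, `compare_buffers`, and the byte-histogram scoring are all hypothetical stand-ins. They are not libgit2's hashsig code or the exact API introduced in this commit.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical whitespace strategies; names are illustrative only. */
typedef enum { WS_NORMAL = 0, WS_IGNORE = 1, WS_SMART = 2 } ws_mode;

/* Hypothetical pluggable metric: signatures are built from pointer + length. */
typedef struct {
    int (*buffer_signature)(void **out, const char *buf, size_t len, void *payload);
    void (*free_signature)(void *sig, void *payload);
    int (*similarity)(int *score, void *a, void *b, void *payload); /* score in [0, 100] */
    void *payload;
} similarity_metric;

/* Stand-in signature: byte histogram of the whitespace-normalized text. */
typedef struct {
    size_t count[256];
    size_t total;
} hist_sig;

static int hist_signature(void **out, const char *buf, size_t len, void *payload)
{
    ws_mode mode = *(const ws_mode *)payload;
    hist_sig *sig = calloc(1, sizeof(*sig));
    int in_space = 0;
    size_t i;

    if (!sig)
        return -1;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)buf[i];
        if (isspace(c)) {
            if (mode == WS_IGNORE)
                continue;              /* drop every whitespace byte */
            if (mode == WS_SMART) {    /* collapse each whitespace run to one space */
                if (in_space)
                    continue;
                in_space = 1;
                c = ' ';
            }
        } else {
            in_space = 0;
        }
        sig->count[c]++;
        sig->total++;
    }
    *out = sig;
    return 0;
}

static void hist_free(void *sig, void *payload)
{
    (void)payload;
    free(sig);
}

/* Score = shared byte occurrences as a percentage of the larger input. */
static int hist_similarity(int *score, void *a, void *b, void *payload)
{
    const hist_sig *x = a, *y = b;
    size_t shared = 0, most, i;

    (void)payload;
    for (i = 0; i < 256; i++)
        shared += x->count[i] < y->count[i] ? x->count[i] : y->count[i];
    most = x->total > y->total ? x->total : y->total;
    *score = most ? (int)(100 * shared / most) : 100; /* two empty inputs are equal */
    return 0;
}

static ws_mode mode_table[3] = { WS_NORMAL, WS_IGNORE, WS_SMART };

/* One shared implementation, three metrics that differ only in their payload. */
similarity_metric metric_for(ws_mode mode)
{
    similarity_metric m = { hist_signature, hist_free, hist_similarity, &mode_table[mode] };
    return m;
}

/* Caller side: uses only the function pointers, never the signature layout. */
int compare_buffers(const similarity_metric *m, const char *a, size_t alen,
                    const char *b, size_t blen, int *score)
{
    void *sa = NULL, *sb = NULL;
    int err = m->buffer_signature(&sa, a, alen, m->payload);

    if (!err)
        err = m->buffer_signature(&sb, b, blen, m->payload);
    if (!err)
        err = m->similarity(score, sa, sb, m->payload);
    if (sa)
        m->free_signature(sa, m->payload);
    if (sb)
        m->free_signature(sb, m->payload);
    return err;
}
```

Comparing `"a b"` with `"a  b"` scores 75 under `WS_NORMAL` and 100 under both `WS_IGNORE` and `WS_SMART`. Because `compare_buffers` depends only on the function pointers and payload, a different signature scheme could replace the histogram without changing the caller.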

Russell Belfer

5e5848eb15

Change similarity metric to sampled hashes

This moves the similarity metric code out of buf_text and into a
new file.  Also, this implements a different approach to similarity
measurement based on a Rabin-Karp rolling hash where we only keep
the top 100 and bottom 100 hashes.  In theory, that should provide
enough samples to give a fairly accurate measurement while
limiting the amount of data we keep for file signatures no matter
how large the file is.

2013-02-20 15:09:40 -08:00
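The sampling idea in this commit can be sketched in C as follows. A Rabin-Karp polynomial hash is rolled over fixed-width windows, only the 100 smallest and 100 largest distinct values are kept, and two files are scored by how many samples they share. Several details are assumptions for illustration, not libgit2's hashsig implementation: the 8-byte window, the base 257, the Murmur3-style finalizer applied to each hash, and the scoring formula. The names `sample_sig`, `sample_sig_build`, and `sample_sig_compare` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SIG_SAMPLES 100  /* keep the 100 smallest and 100 largest hashes */
#define SIG_WINDOW  8    /* assumed rolling-window width in bytes */
#define SIG_BASE    257u /* assumed base; uint32_t arithmetic wraps mod 2^32 */

typedef struct {
    uint32_t lo[SIG_SAMPLES]; /* smallest distinct hashes, ascending */
    uint32_t hi[SIG_SAMPLES]; /* largest distinct hashes, descending */
    size_t nlo, nhi;
} sample_sig;

/* Murmur3 finalizer (an assumption here): bijective mixing spreads the samples. */
static uint32_t mix32(uint32_t h)
{
    h ^= h >> 16; h *= 0x85ebca6bu;
    h ^= h >> 13; h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Insert h into a bounded array kept sorted ascending (want_small) or descending. */
static void keep_extreme(uint32_t *arr, size_t *n, uint32_t h, int want_small)
{
    size_t i, last;

    for (i = 0; i < *n; i++)
        if (arr[i] == h)
            return;                    /* already sampled */
    for (i = 0; i < *n; i++)
        if (want_small ? h < arr[i] : h > arr[i])
            break;
    if (i == SIG_SAMPLES)
        return;                        /* full, and h is not extreme enough */
    last = (*n < SIG_SAMPLES) ? *n : SIG_SAMPLES - 1;
    memmove(&arr[i + 1], &arr[i], (last - i) * sizeof(*arr));
    arr[i] = h;
    if (*n < SIG_SAMPLES)
        (*n)++;
}

/* Roll the hash across every SIG_WINDOW-byte window of buf and sample it. */
void sample_sig_build(sample_sig *sig, const unsigned char *buf, size_t len)
{
    uint32_t h = 0, top = 1;
    size_t i;

    memset(sig, 0, sizeof(*sig));
    for (i = 0; i < SIG_WINDOW; i++)
        top *= SIG_BASE;               /* weight of the byte leaving the window */

    for (i = 0; i < len; i++) {
        h = h * SIG_BASE + buf[i];     /* shift in the new byte */
        if (i >= SIG_WINDOW)
            h -= top * buf[i - SIG_WINDOW]; /* remove the byte that left */
        if (i + 1 >= SIG_WINDOW) {     /* window is full: sample it */
            uint32_t m = mix32(h);
            keep_extreme(sig->lo, &sig->nlo, m, 1);
            keep_extreme(sig->hi, &sig->nhi, m, 0);
        }
    }
}

/* Count values present in both sorted, duplicate-free arrays. */
static size_t count_common(const uint32_t *a, size_t na,
                           const uint32_t *b, size_t nb, int ascending)
{
    size_t i = 0, j = 0, common = 0;

    while (i < na && j < nb) {
        if (a[i] == b[j]) { common++; i++; j++; }
        else if (ascending ? a[i] < b[j] : a[i] > b[j]) i++;
        else j++;
    }
    return common;
}

/* Rough similarity in [0, 100]: the share of samples the two files have in common. */
int sample_sig_compare(const sample_sig *a, const sample_sig *b)
{
    size_t common = count_common(a->lo, a->nlo, b->lo, b->nlo, 1) +
                    count_common(a->hi, a->nhi, b->hi, b->nhi, 0);
    size_t total = (a->nlo > b->nlo ? a->nlo : b->nlo) +
                   (a->nhi > b->nhi ? a->nhi : b->nhi);

    return total ? (int)(100 * common / total) : 100; /* two empty inputs are equal */
}
```

Each signature holds at most 200 hashes, so its size stays fixed however large the input is. That bounded size is the property the commit message describes. The trade-off is that a small edit may not change any kept sample, so the score is an estimate rather than an exact measure.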

2 Commits