Commit Graph

5164 Commits

Author SHA1 Message Date
Edward Thomson
a42c2a8c89 Rename test for multiple similar matches
A rename test that illustrates a source matching multiple targets.
2013-08-04 13:44:51 -07:00
Russell Belfer
d730d3f4f0 Major rename detection changes
After doing further profiling, I found that a lot of time was
being spent attempting to insert hashes into the file hash
signature when using the rolling hash because the rolling hash
approach generates a hash per byte of the file instead of one
per run/line of data.

To optimize this, I decided to convert back to a run-based file
signature algorithm which would be more like core Git.

After changing this, a number of the existing tests started to
fail.  In some cases, this appears to have been because the test
was coded to be too specific to the particular results of the file
similarity metric and in some cases there appear to have been bugs
in the core rename detection code where only by the coincidence
of the file similarity scoring were the expected results being
generated.

This renames all the variables in the core rename detection code
to be more consistent and hopefully easier to follow which made it
a bit easier to reason about the behavior of that code and fix the
problems that I was seeing.  I think it's in better shape now.

There are a couple of tests now that attempt to stress test the
rename detection code and they are quite slow.  Most of the time
is spent setting up the test data on disk and in the index.  When
we roll out performance improvements for index insertion, it
should also speed up these tests I hope.
2013-07-31 16:40:42 -07:00
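
The difference between a per-byte rolling hash and a run-based signature, as described in the message above, can be illustrated with a small sketch. This is not libgit2's actual implementation: the function names (`line_signature`, `signature_similarity`), the use of FNV-1a, and the scoring formula are all assumptions chosen for illustration. The point is only that one hash is produced per line of input rather than one per byte, so far fewer hashes need to be stored and compared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* FNV-1a over one run (here: one line) of bytes. Hypothetical choice of hash. */
static uint32_t hash_run(const char *p, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= (unsigned char)p[i];
        h *= 16777619u;
    }
    return h;
}

static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/*
 * Build a sorted array holding one hash per line of `buf`.
 * Returns the number of hashes; the caller frees `*out`.
 * On allocation failure, returns 0 and sets `*out` to NULL.
 */
size_t line_signature(const char *buf, size_t len, uint32_t **out)
{
    assert(buf != NULL && out != NULL);

    size_t cap = 1; /* room for a final line without a newline */
    for (size_t i = 0; i < len; i++)
        if (buf[i] == '\n')
            cap++;

    uint32_t *sig = malloc(cap * sizeof *sig);
    if (sig == NULL) {
        *out = NULL;
        return 0;
    }

    size_t n = 0, start = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == '\n') {
            sig[n++] = hash_run(buf + start, i + 1 - start);
            start = i + 1;
        }
    }
    if (start < len) /* trailing run with no newline */
        sig[n++] = hash_run(buf + start, len - start);

    qsort(sig, n, sizeof *sig, cmp_u32);
    *out = sig;
    return n;
}

/* Score 0..100: twice the number of shared hashes over the total count. */
int signature_similarity(const uint32_t *a, size_t na,
                         const uint32_t *b, size_t nb)
{
    size_t i = 0, j = 0, common = 0;

    if (na + nb == 0)
        return 100; /* two empty signatures are treated as identical */

    while (i < na && j < nb) {
        if (a[i] == b[j]) { common++; i++; j++; }
        else if (a[i] < b[j]) i++;
        else j++;
    }
    return (int)(200 * common / (na + nb));
}
```

A caller would build one signature per candidate file and compare pairs; because the arrays are sorted, each comparison is a single linear merge.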
Russell Belfer
8dd8aa480b Fix some warnings 2013-07-26 10:28:57 -07:00
Russell Belfer
a16e41729d Fix rename detection to use actual blob size
The size data in the index may not reflect the actual size of the
blob data from the ODB when content filtering comes into play.
This commit fixes rename detection to use the actual blob size when
calculating data signatures instead of the value from the index.

Because of a misunderstanding on my part, I first converted the
git_index_add_bypath API to use the post-filtered blob data size
in creating the index entry.  I backed that change out, but I
kept the overall refactoring of that routine and the new internal
git_blob__create_from_paths API because it eliminates an extra
stat() call from the code that adds a file to the index.

The existing tests actually cover this code path, at least when
running on Windows, so at this point I'm not adding new tests to
cover the changes.
2013-07-25 12:27:39 -07:00
Russell Belfer
effdbeb323 Make rename detection file size fix better
The previous fix for checking file sizes with rename detection
always loads the blob.  In this version, if the odb backend can
get the object header without loading the whole thing into memory,
then we'll just use that, so that we can eliminate possible rename
sources & targets without loading them.
2013-07-24 17:48:37 -07:00
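
Ruling out rename candidates by size alone, before any blob is loaded, can be sketched as follows. This is an illustrative assumption, not the check libgit2 performs: the function name `sizes_could_match` and the bound itself (the smaller file must be at least `threshold_pct` percent of the larger one) are hypothetical. The reasoning is that if one file is much smaller than the other, the two cannot share enough content to reach the similarity threshold, so the more expensive signature comparison can be skipped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return false when the two sizes differ so much that a similarity of
 * `threshold_pct` percent is out of reach, so the pair can be skipped
 * without reading either file's content.
 *
 * The bound is conservative: it rounds down, so borderline pairs are
 * kept and left for the full comparison to decide.
 */
bool sizes_could_match(uint64_t size_a, uint64_t size_b, unsigned threshold_pct)
{
    assert(threshold_pct <= 100);

    uint64_t lo = size_a < size_b ? size_a : size_b;
    uint64_t hi = size_a < size_b ? size_b : size_a;

    if (hi == 0)
        return true; /* two empty files are trivially identical */

    /* floor(hi * threshold_pct / 100), computed without overflow */
    uint64_t needed = (hi / 100) * threshold_pct
                    + ((hi % 100) * threshold_pct) / 100;

    return lo >= needed;
}
```

In this sketch the sizes would come from wherever they are cheapest to obtain, for example an object header, so that only pairs passing the check are ever loaded.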
Russell Belfer
a5140f4dda Fix rename detection for tree-to-tree diffs
The performance improvements I introduced for rename detection
were not able to run successfully for tree-to-tree diffs because
the blob size was not known early enough and so the file signature
always had to be calculated nonetheless.

This change separates loading blobs into memory from calculating
the signature.  I can't avoid having to load the large blobs into
memory, but by moving it forward, I'm able to avoid the signature
calculation if the blob won't come into play for renames.
2013-07-24 17:11:49 -07:00
Russell Belfer
f5c4d02251 Fix incorrect comment 2013-07-24 13:44:35 -07:00
Russell Belfer
397357a048 Add rename test that used to be really slow
Before the optimization commits, this test used to take about 20
seconds to run on my machine.  Afterwards, there are still a couple of
seconds of data setup, but the actual diff and rename detection
runs in a fraction of a second.
2013-07-24 13:12:00 -07:00
Russell Belfer
427cc255df Use local variables in hash calc to avoid aliasing 2013-07-24 13:11:11 -07:00
Russell Belfer
18e9efc425 Don't check rename if file size difference is huge 2013-07-24 13:10:16 -07:00
Russell Belfer
69c66b554e Don't do text diff unless content will be used 2013-07-24 13:09:33 -07:00
Russell Belfer
39a1a66242 Don't unload diff data unless loaded 2013-07-24 13:09:07 -07:00
Russell Belfer
cdbcb8dd80 Merge pull request #1745 from libgit2/doc-fixes
Doc fixes
2013-07-23 09:43:07 -07:00
Carlos Martín Nieto
64061d4a14 remote: fix git_remote_download() documentation
The description of what the function does hasn't been true for quite a
while. Change it to reflect the way it currently works.

While here, remove an even older comment about missing features that
have been implemented.
2013-07-23 10:51:14 +02:00
Carlos Martín Nieto
c05a55b056 Clean up some documentation
clang's docparser highlighted these.
2013-07-23 09:40:19 +02:00
Vicent Martí
e5bdf82976 Merge pull request #1732 from libgit2/revwalk-glob-should-ignore-invalid
Invalid refs on disk cause revwalk globbing to fail
2013-07-22 23:59:08 -07:00
Russell Belfer
4cee9b8618 Update init and clean for revwalk::basic tests
The new tests don't always want to use the same fixture data as
the old ones so this makes it configurable on a per-test basis.
2013-07-22 11:41:23 -07:00
Russell Belfer
989710d982 Fix warning message about mismatched types 2013-07-22 11:22:55 -07:00
Russell Belfer
c77342ef1c Use pool for loose refdb string allocations
Instead of using lots of strdup calls, this adds a memory pool to
the loose refs iteration code and uses it for keeping track of the
loose refs array.  Memory usage could probably be reduced even
further by eliminating the vector and just scanning by adding the
strlen of each ref, but that would be a more intrusive change.

This also updates the error handling to be more thorough about
checking for failed allocations, etc.
2013-07-22 11:20:34 -07:00
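
Replacing many individual strdup calls with a pool, as the message above describes, can be sketched like this. The names (`str_pool`, `str_pool_strdup`, `str_pool_clear`) and the page size are hypothetical and do not reflect libgit2's internal pool API. Strings are copied back to back into large pages, and everything is released in one pass instead of one free() per string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define POOL_PAGE_SIZE 4096 /* assumed page size for this sketch */

struct pool_page {
    struct pool_page *next;
    size_t used;
    size_t cap;
    char data[]; /* string storage follows the header */
};

struct str_pool {
    struct pool_page *pages; /* most recently allocated page first */
};

void str_pool_init(struct str_pool *pool)
{
    assert(pool != NULL);
    pool->pages = NULL;
}

/* Copy `s` into the pool. Returns NULL if a new page cannot be allocated. */
char *str_pool_strdup(struct str_pool *pool, const char *s)
{
    assert(pool != NULL && s != NULL);

    size_t need = strlen(s) + 1;
    struct pool_page *page = pool->pages;

    if (page == NULL || page->cap - page->used < need) {
        /* oversized strings get a page of their own */
        size_t cap = need > POOL_PAGE_SIZE ? need : POOL_PAGE_SIZE;

        page = malloc(sizeof *page + cap);
        if (page == NULL)
            return NULL;
        page->cap = cap;
        page->used = 0;
        page->next = pool->pages;
        pool->pages = page;
    }

    char *dst = page->data + page->used;
    memcpy(dst, s, need);
    page->used += need;
    return dst;
}

/* Release every string in the pool at once. */
void str_pool_clear(struct str_pool *pool)
{
    assert(pool != NULL);

    struct pool_page *page = pool->pages;
    while (page != NULL) {
        struct pool_page *next = page->next;
        free(page);
        page = next;
    }
    pool->pages = NULL;
}
```

Pointers returned by `str_pool_strdup` stay valid until `str_pool_clear` is called, so an array of ref names can simply hold them without owning each one separately.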
Russell Belfer
b71071313f git_reference_next_name must match git_reference_next
The git_reference_next API silently skips invalid references when
scanning the loose refs.  The git_reference_next_name API should
skip the same ones even though it isn't creating the reference
object.

This adds a test with an invalid loose reference and makes sure
that both APIs skip the same entries and generate the same results.
2013-07-22 11:01:19 -07:00
Martin Woodward
1cd9dc29b7 Merge pull request #1743 from ethomson/readme
Clarify when to use github issues
2013-07-19 11:14:22 -07:00
Edward Thomson
bef59b1be4 Update README.md 2013-07-19 12:56:47 -05:00
Ben Straub
97309dd025 Merge pull request #1726 from crazymaster/development
git_buf_text_gather_stats doesn't work for multi-byte characters
2013-07-19 10:43:53 -07:00
Edward Thomson
41a93cc6e5 Clarify when to use github issues
Suggest that github issues are to be used for bug reports, while questions about usage should be directed to StackOverflow.
2013-07-19 12:43:08 -05:00
Ben Straub
847b8e0e44 Merge pull request #1742 from martinwoodward/Refresh-Readme
Refresh readme and contributing guidance
2013-07-19 10:29:47 -07:00
Martin Woodward
6ca83665c7 Update contributing guidance to explain PR flow
Updating the contributing guidance to explain a bit more about how we use
PRs
2013-07-19 18:20:58 +01:00
Martin Woodward
3e3d332b4c Tidy up the methods of contacting the project
Updated the methods of getting involved with the project and asking
questions.
2013-07-19 18:04:11 +01:00
Ben Straub
275d8d55b2 Typo 2013-07-18 09:37:59 -07:00
Vicent Martí
794003650e Merge pull request #1736 from ben/default-to-cdecl
Switch default calling convention to cdecl
2013-07-18 06:26:25 -07:00
Ben Straub
99a9c86cb6 Merge pull request #1722 from libgit2/ntk/fix/issue_1722
git_revparse_ext: should return a NULL reference when the revparse expression doesn't lead to a reference
2013-07-17 20:08:15 -07:00
Vicent Martí
d2db351cf6 Merge pull request #1735 from ethomson/ignored_are_not_rename_candidates
don't include ignored as rename candidates
2013-07-17 16:12:15 -07:00
Edward Thomson
d55bed1a25 don't include ignored as rename candidates 2013-07-17 16:55:00 -05:00
Ben Straub
e49dc6872d Switch default calling convention to cdecl. 2013-07-17 14:06:31 -07:00
Ben Straub
4e05fa7db4 Merge pull request #1731 from alindeman/patch-1
Small grammar fix in docs
2013-07-15 20:45:18 -07:00
Andy Lindeman
51b0397a66 Small grammar fix in docs 2013-07-15 23:40:57 -04:00
Vicent Martí
f538515079 Merge pull request #1728 from ivoire/small_fixes
Small fixes
2013-07-15 09:45:04 -07:00
Vicent Martí
3f8086e069 Merge pull request #1729 from tiennou/remote-owner
Add `git_remote_owner`.
2013-07-15 09:44:02 -07:00
Etienne Samson
85e1eded6a Add git_remote_owner 2013-07-15 16:31:25 +02:00
Rémi Duraffort
c6451624c4 Fix some more memory leaks in error path 2013-07-15 16:29:18 +02:00
Rémi Duraffort
050af8bbe0 pack: fix memory leak in error path 2013-07-15 16:29:13 +02:00
Rémi Duraffort
8d6ef4bf78 index: fix potential memory leaks 2013-07-15 16:29:09 +02:00
Rémi Duraffort
9146f1e57e repository: clarify assignment and test order 2013-07-15 16:29:00 +02:00
crazymaster
d0b25d9dff Fix 2013-07-15 08:14:00 +09:00
crazymaster
2185dd6f99 Fix typo 2013-07-15 08:06:09 +09:00
crazymaster
b74d4478df Fix the initial line 2013-07-15 07:44:08 +09:00
crazymaster
19bee769d4 Revert "Replace Japanese characters with the encoded hexadecimal values"
This reverts commit a91e4d6b21.
2013-07-15 07:39:16 +09:00
crazymaster
a91e4d6b21 Replace Japanese characters with the encoded hexadecimal values 2013-07-15 07:30:18 +09:00
Russell Belfer
351733128c Merge pull request #1727 from alindeman/lookup-object-doc-fix
Fixes return type documentation
2013-07-14 15:16:08 -07:00
Andy Lindeman
960431c380 Fixes return type documentation 2013-07-14 18:08:54 -04:00
crazymaster
6550565af3 Fix gather_stats 2013-07-14 21:08:45 +09:00