Commit Graph

37 Commits

Author SHA1 Message Date
Carlos Martín Nieto
a3ffbf230e pack: expose a cached delta base directly
Instead of going through a special entry in the chain, let's pass it as
an output parameter.
2014-05-13 02:48:48 +02:00
Carlos Martín Nieto
a332e91c92 pack: use a cache for delta bases when unpacking
Bring back the use of the delta base cache for unpacking objects. When
generating the delta chain, we stop when we find a delta base in the
pack's cache and use that as the starting point.
2014-05-09 09:40:29 +02:00
Carlos Martín Nieto
2acdf4b854 pack: unpack using a loop
We currently make use of recursive function calls to unpack an object,
resolving the deltas as we come back down the chain. This means that we
have unbounded stack growth as we look up objects in a pack.

This is now done in two steps: first we figure out what the dependency
chain is by looking up the delta bases until we reach a non-delta
object, pushing the information we need onto a stack and then we pop
from that stack and apply the deltas until there are no more left.

This version of the code does not make use of the delta base cache so it
is slower than what's in the mainline. A later commit will reintroduce
it.
2014-05-09 09:40:29 +02:00
Linquize
8610487cd3 Drop parsing pack filename SHA1 part, no one cares the filename 2014-01-23 23:28:28 +08:00
Vicent Marti
51a3dfb595 pack: __object_header always returns unsigned values 2013-11-01 17:36:09 +01:00
Linquize
3343b5ffd3 Fix warning on win64 2013-11-01 17:36:04 +01:00
Carlos Martín Nieto
51e82492ef pack: move the object header function here 2013-10-04 10:18:20 +02:00
Russell Belfer
5d2d21e536 Consolidate packfile allocation further
Rename git_packfile_check to git_packfile_alloc since it is now
being used more in that capacity.  Fix the various places that use
it.  Consolidate some repeated code in odb_pack.c related to the
allocation of a new pack_backend.
2013-04-22 16:52:07 +02:00
Russell Belfer
5360786885 Further threading fixes
This builds on the earlier thread safety work to make it so that
setting the odb, index, refdb, or config for a repository is done
in a threadsafe manner with minimized locking time.  This is done
by adding a lock to the repository object and using it to guard
the assignment of the above listed pointers.  The lock is only
held to assign the pointer value.

This also contains some minor fixes to the other work with pack
files to reduce the time that locks are being held to and fix an
apparently memory leak.
2013-04-22 16:52:07 +02:00
Russell Belfer
24c70804e8 Add mutex around mapping and unmapping pack files
When I was writing threading tests for the new cache, the main
error I kept running into was a pack file having it's content
unmapped underneath the running thread.  This adds a lock around
the routines that map and unmap the pack data so that threads can
effectively reload the data when they need it.

This also required reworking the error handling paths in a couple
places in the code which I tried to make consistent.
2013-04-22 16:50:51 +02:00
Carlos Martín Nieto
0e040c031e indexer: use a hashtable for keeping track of offsets
These offsets are needed for REF_DELTA objects, which encode which
object they use as a base, but not where it lies in the packfile, so
we need a list.

These objects are mostly from older packfiles, before OFS_DELTA was
widely spread. The time spent in indexing these packfiles is greatly
reduced, though remains above what git is able to do.
2013-03-03 23:18:29 +01:00
Carlos Martín Nieto
96c9b9f0e5 indexer: properly free the packfile resources
The indexer needs to call the packfile's free function so it takes care of
freeing the caches.

We still need to close the mwf descriptor manually so we can rename the
packfile into its final name on Windows.
2013-01-12 18:44:58 +01:00
Carlos Martín Nieto
80d647adc3 Revert "pack: packfile_free -> git_packfile_free and use it in the indexers"
This reverts commit f289f886cb, which
makes the tests fail on Windows. Revert until we can figure out a
solution.
2013-01-11 20:17:21 +01:00
Vicent Marti
d0b14cea0e pack: That declaration 2013-01-11 18:21:09 +01:00
Carlos Martín Nieto
0ed7562006 pack: limit the amount of memory the base delta cache can use
Currently limited to 16MB (like git) and to objects up to 1MB in
size.
2013-01-11 17:32:59 +01:00
Carlos Martín Nieto
c8f79c2bdf pack: abstract out the cache into its own functions 2013-01-11 17:32:59 +01:00
Carlos Martín Nieto
525d961c24 pack: refcount entries and add a mutex around cache access 2013-01-11 16:55:37 +01:00
Carlos Martín Nieto
c0f4a0118d pack: introduce a delta base cache
Many delta bases are re-used. Cache them to avoid inflating the same
data repeatedly.

This version doesn't limit the amount of entries to store, so it can
end up using a considerable amound of memory.
2013-01-11 16:55:37 +01:00
Edward Thomson
359fc2d241 update copyrights 2013-01-08 17:31:27 -06:00
Vicent Martí
0249a5032e Merge pull request #1091 from carlosmn/stream-object
Indexer speedup with large objects
2012-12-07 09:40:21 -08:00
David Michael Barr
44f9f54797 pack: add git_packfile_resolve_header
To paraphrase @peff:

You can get both size and type from a packed object reasonably cheaply.
If you have:

* An object that is not a delta; both type and size are available in the
  packfile header.
* An object that is a delta. The packfile type will be OBJ_*_DELTA, and
  you have to resolve back to the base to find the real type. That means
  potentially a lot of packfile index lookups, but each one is
  relatively cheap. For the size, you inflate the first few bytes of the
  delta, whose header will tell you the resulting size of applying the
  delta to the base.

For simplicity, we just decompress the whole delta for now.
2012-12-03 10:39:17 +11:00
Carlos Martín Nieto
46635339e9 pack: introduce a streaming API for raw objects
This allows us to take objects from the packfile as a stream instead
of having to keep it all in memory.
2012-11-30 15:55:23 +01:00
Russell Belfer
c3fb7d04ed Make git_odb_foreach_cb take const param
This makes the first OID param of the ODB callback a const pointer
and also propogates that change all the way to the backends.
2012-11-27 15:00:49 -08:00
David Michael Barr
60ecdf59d3 pack: iterate objects in offset order
Compute the ordering on demand and persist until the index is freed.
2012-09-14 15:52:41 +10:00
nulltoken
b8457baae2 portability: Improve x86/amd64 compatibility 2012-07-24 16:10:12 +02:00
Carlos Martín Nieto
521aedad30 odb: add git_odb_foreach()
Go through each backend and list every objects that exists in
them. This allows fsck-like uses.
2012-07-03 12:50:51 +02:00
Carlos Martín Nieto
fa679339c4 Add packfile_unpack_compressed() to the internal header 2012-04-13 22:16:48 +02:00
Russell Belfer
e1de726c15 Migrate ODB files to new error handling
This migrates odb.c, odb_loose.c, odb_pack.c and pack.c to
the new style of error handling.  Also got the unix and win32
versions of map.c.  There are some minor changes to other
files but no others were completely converted.

This also contains an update to filebuf so that a zeroed out
filebuf will not think that the fd (== 0) is actually open
(and inadvertently call close() on fd 0 if cleaned up).

Lastly, this was built and tested on win32 and contains a
bunch of fixes for the win32 build which was pretty broken.
2012-03-12 22:55:40 -07:00
schu
5e0de32818 Update Copyright header
Signed-off-by: schu <schu-github@schulog.org>
2012-02-13 17:11:09 +01:00
Brodie Rao
01ad7b3a9e *: correct and codify various file permissions
The following files now have 0444 permissions:

- loose objects
- pack indexes
- pack files
- packs downloaded by fetch
- packs downloaded by the HTTP transport

And the following files now have 0666 permissions:

- config files
- repository indexes
- reflogs
- refs

This brings libgit2 more in line with Git.

Note that git_filebuf_commit() and git_filebuf_commit_at() have both
gained a new mode parameter.

The latter change fixes an important issue where filebufs created with
GIT_FILEBUF_TEMPORARY received 0600 permissions (due to mkstemp(3)
usage). Now we chmod() the file before renaming it into place.

Tests have been added to confirm that new commit, tag, and tree
objects are created with the right permissions. I don't have access to
Windows, so for now I've guarded the tests with "#ifndef GIT_WIN32".
2011-10-14 16:07:47 -07:00
Vicent Marti
87d9869fc3 Tabify everything
There were quite a few places were spaces were being used instead of
tabs. Try to catch them all. This should hopefully not break anything.
Except for `git blame`. Oh well.
2011-09-19 03:34:49 +03:00
Vicent Marti
bb742ede3d Cleanup legal data
1. The license header is technically not valid if it doesn't have a
copyright signature.

2. The COPYING file has been updated with the different licenses used in
the project.

3. The full GPLv2 header in each file annoys me.
2011-09-19 01:54:32 +03:00
Carlos Martín Nieto
c1af5a3935 Implement cooperative caching
When indexing a file with ref deltas, a temporary cache for the
offsets has to be built, as we don't have an index file yet. If the
user takes the responsiblity for filling the cache, the packing code
will look there first when it finds a ref delta.

Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
2011-08-18 02:34:08 +02:00
Carlos Martín Nieto
b5b474dd0d Modify the given offset in git_packfile_unpack
The callers immediately throw away the offset, so we don't need any
logical changes in any of them. This will be useful for the indexer,
as it does need to know where the compressed data ends.

Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
2011-08-02 21:42:03 +02:00
Carlos Martín Nieto
a070f152bd Move pack functions to their own file 2011-08-02 21:42:03 +02:00
Carlos Martín Nieto
7d0cdf82be Make packfile_unpack_header more generic
On the way, store the fd and the size in the mwindow file.

Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
2011-08-02 21:42:03 +02:00
Carlos Martín Nieto
c7c9e18388 Move the pack structs to an internal header 2011-08-02 20:53:51 +02:00