This removes the need to explicitly pass the repo into iterators
where the repo is implied by the other parameters. This moves
the repo to be owned by the parent struct. Also, this has some
iterator related updates to the internal diff API to lay the
groundwork for checkout improvements.
This fixes some missed places where we can apply const-ness to
various public APIs.
There are still some index and tree APIs that cannot take const
pointers because we sort our `git_vectors` lazily and so we can't
reliably bsearch the index and tree content without applying a
`git_vector_sort()` first.
This also fixes some missed places where size_t can be used and
where const can be applied to a couple internal functions.
The index iterator could previously only be created from a repo
object, but this allows creating an iterator from a `git_index`
object instead (while keeping, though renaming, the old function).
This is a major reworking of checkout strategy options. The
checkout code is now sensitive to the contents of the HEAD tree
and the new options allow you to update the working tree so that
it will match the index content only when it previously matched
the contents of the HEAD. This allows you to, for example, to
distinguish between removing files that are in the HEAD but not
in the index, vs just removing all untracked files.
Because of various corner cases that arise, etc., this required
some additional capabilities in rmdir and other utility functions.
This includes the beginnings of an implementation of code to read
a partial tree into the index based on a pathspec, but that is
not enabled because of the possibility of creating conflicting
index entries.
There are some diff functions that are useful in a rewritten
checkout and this lays some groundwork for that. This contains
three main things:
1. Share the function diff uses to calculate the OID for a file
in the working directory (now named `git_diff__oid_for_file`
2. Add a `git_diff__paired_foreach` function to iterator over
two diff lists concurrently. Convert status to use it.
3. Move all the string/prefix/index entry comparisons into
function pointers inside the `git_diff_list` object so they
can be switched between case sensitive and insensitive
versions. This makes them easier to reuse in various
functions without replicating logic. As part of this, move
a couple of index functions out of diff.c and into index.c.
This adds a new API that allows users to reload the config if the
file has changed on disk. A new config callback function to
refresh the config was added.
The modified time and file size are used to test if the file needs
to be reloaded (and are now stored in the disk backend object).
In writing tests, just using mtime was a problem / race, so I
wanted to check file size as well. To support that, I extended
`git_futils_readbuffer_updated` to optionally check file size in
addition to mtime, and I added a new function `git_filebuf_stats`
to fetch the mtime and size for an open filebuf (so that the
config could be easily refreshed after a write).
Lastly, I moved some similar file checking code for attributes
into filebuf. It is still only being used for attrs, but it
seems potentially reusable, so I thought I'd move it over.
git_index_read_tree() was exposing a parameter to provide the user with
a progress indicator. Unfortunately, due to the recursive nature of the
tree walk, the maximum number of items to process was unknown. Thus,
the indicator was only counting processed entries, without providing
any information how the number of remaining items.
This fixes git_index_add and git_index_append to behave more like
core git, preserving old filemode data in the index when adding
and/or appending with core.filemode = false.
This also has placeholder support for core.symlinks and
core.ignorecase, but those flags are not implemented (well,
symlinks has partial support for preserving mode information in
the same way that git does, but it isn't tested).
The goal of this work is to rewrite git_status_file to use the
same underlying code as git_status_foreach.
This is done in 3 phases:
1. Extend iterators to allow ranged iteration with start and
end prefixes for the range of file names to be covered.
2. Improve diff so that when there is a pathspec and there is
a common non-wildcard prefix of the pathspec, it will use
ranged iterators to minimize excess iteration.
3. Rewrite git_status_file to call git_status_foreach_ext
with a pathspec that covers just the one file being checked.
Since ranged iterators underlie the status & diff implementation,
this is actually fairly efficient. The workdir iterator does
end up loading the contents of all the directories down to the
single file, which should ideally be avoided, but it is pretty
good.
This includes a few cleanups that came up while converting
these files.
This commit introduces a could new git error classes, including
the catchall class: GITERR_INVALID which I'm using as the class
for invalid and out of range values which are detected at too low
a level of library to use a higher level classification. For
example, an overflow error in parsing an integer or a bad letter
in parsing an OID string would generate an error in this class.
This converts blob.c, fileops.c, and all of the win32 files.
Also, various minor cleanups throughout the code. Plus, in
testing the win32 build, I cleaned up a bunch (although not
all) of the warnings with the 64-bit build.
This makes so much sense that I can't believe it hasn't been done
before. Kill the old `git_fbuffer` and read files straight into
`git_buf` objects.
Also: In order to fully support 4GB files in 32-bit systems, the
`git_buf` implementation has been changed from using `ssize_t` for
storage and storing negative values on allocation failure, to using
`size_t` and changing the buffer pointer to a magical pointer on
allocation failure.
Hopefully this won't break anything.
This takes all of the functions that look up simple data about
paths (such as `git_futils_isdir`) and moves them over to path.h
(becoming `git_path_isdir`). This leaves fileops.h just with
functions that actually manipulate the filesystem or look at
the file contents in some way.
As part of this, the dir.h header which is really just for win32
support was moved into win32 (with some minor changes).
This converts virtually all of the places that allocate GIT_PATH_MAX
buffers on the stack for manipulating paths to use git_buf objects
instead. The patch is pretty careful not to touch the public API
for libgit2, so there are a few places that still use GIT_PATH_MAX.
This extends and changes some details of the git_buf implementation
to add a couple of extra functions and to make error handling easier.
This includes serious alterations to all the path.c functions, and
several of the fileops.c ones, too. Also, there are a number of new
functions that parallel existing ones except that use a git_buf
instead of a stack-based buffer (such as git_config_find_global_r
that exists alongsize git_config_find_global).
This also modifies the win32 version of p_realpath to allocate whatever
buffer size is needed to accommodate the realpath instead of hardcoding
a GIT_PATH_MAX limit, but that change needs to be tested still.
The ownership semantics have been changed all over the library to be
consistent. There are no more "borrowed" or duplicated references.
Main changes:
- `git_repository_open2` and `3` have been dropped.
- Added setters and getters to hotswap all the repository owned
objects:
`git_repository_index`
`git_repository_set_index`
`git_repository_odb`
`git_repository_set_odb`
`git_repository_config`
`git_repository_set_config`
`git_repository_workdir`
`git_repository_set_workdir`
Now working directories/index files/ODBs and so on can be
hot-swapped after creating a repository and between operations.
- All these objects now have proper ownership semantics with
refcounting: they all require freeing after they are no longer
needed (the repository always keeps its internal reference).
- Repository open and initialization has been updated to keep in
mind the configuration files. Bare repositories are now always
detected, and a default config file is created on init.
- All the tests affected by these changes have been dropped from the
old test suite and ported to the new one.
Update all stack allocations of git_filebuf to use GIT_FILEBUF_INIT
and make git_filebuf_open and git_filebuf_cleanup safe to be called
multiple times on the same buffer.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The following files now have 0444 permissions:
- loose objects
- pack indexes
- pack files
- packs downloaded by fetch
- packs downloaded by the HTTP transport
And the following files now have 0666 permissions:
- config files
- repository indexes
- reflogs
- refs
This brings libgit2 more in line with Git.
Note that git_filebuf_commit() and git_filebuf_commit_at() have both
gained a new mode parameter.
The latter change fixes an important issue where filebufs created with
GIT_FILEBUF_TEMPORARY received 0600 permissions (due to mkstemp(3)
usage). Now we chmod() the file before renaming it into place.
Tests have been added to confirm that new commit, tag, and tree
objects are created with the right permissions. I don't have access to
Windows, so for now I've guarded the tests with "#ifndef GIT_WIN32".
When a file is updated in the index, it's path needs to be invalidated
in the tree cache as the hash is no longer correct.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
There were quite a few places were spaces were being used instead of
tabs. Try to catch them all. This should hopefully not break anything.
Except for `git blame`. Oh well.
1. The license header is technically not valid if it doesn't have a
copyright signature.
2. The COPYING file has been updated with the different licenses used in
the project.
3. The full GPLv2 header in each file annoys me.
index_init_entry() renamed to index_entry_init(). Now it allocates entry
on its own.
git_index_add() and git_index_append() reworked accordingly.
This commit fixes warning:
/home/kas/git/public/libgit2/src/index.c: In function ‘index_init_entry’:
/home/kas/git/public/libgit2/src/index.c:452:14: warning: assignment discards ‘const’ qualifier from pointer target type [enabled by default]
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
/home/kas/git/public/libgit2/src/index.c: In function ‘git_index_clear’:
/home/kas/git/public/libgit2/src/index.c:228:8: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
/home/kas/git/public/libgit2/src/index.c:235:8: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
/home/kas/git/public/libgit2/src/index.c: In function ‘index_insert’:
/home/kas/git/public/libgit2/src/index.c:392:7: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
/home/kas/git/public/libgit2/src/index.c:399:7: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
/home/kas/git/public/libgit2/src/index.c: In function ‘read_unmerged’:
/home/kas/git/public/libgit2/src/index.c:681:35: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
/home/kas/git/public/libgit2/src/index.c: In function ‘read_entry’:
/home/kas/git/public/libgit2/src/index.c:716:33: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
mode field of git_index_entry_unmerged is array of unsigned ints. It's
unsafe to cast pointer to an element of the array to long int *. It may
cause overflow in git_strtol32().
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Type casting usually points to some trick or bug. It's better not hide
it between useless type castings.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Drop the GLibc implementation of Merge Sort and replace it with Timsort.
The algorithm has been tuned to work on arrays of pointers (void **),
so there's no longer a need to abstract the byte-width of each element
in the array.
All the comparison callbacks now take pointers-to-elements, not
pointers-to-pointers, so there's now one less level of dereferencing.
E.g.
int index_cmp(const void *a, const void *b)
{
- const git_index_entry *entry_a = *(const git_index_entry **)(a);
+ const git_index_entry *entry_a = (const git_index_entry *)(a);
The result is up to a 40% speed-up when sorting vectors. Memory usage
remains lineal.
A new `bsearch` implementation has been added, whose callback also
supplies pointer-to-elements, to uniform the Vector API again.
It removes all entries with equal path except last added.
On large indexes git_index_append() + git_index_uniq() before writing is
*much* faster, than git_index_add().
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
git_index_find() in index_insert() is useless if replace is not
requested (append). Do not call it in this case.
It speedup git_index_append() *dramatically* on large indexes.
$ cat index_test.c
int main(int argc, char **argv)
{
git_index *index;
git_repository *repo;
git_odb *odb;
struct git_index_entry entry;
git_oid tree_oid;
char tree_hex[41];
int i;
git_repository_init(&repo, "/tmp/myrepo", 0);
odb = git_repository_database(repo);
git_repository_index(&index, repo);
memset(&entry, 0, sizeof(entry));
git_odb_write(&entry.oid, odb, "", 0, GIT_OBJ_BLOB);
entry.path = "test.file";
for (i = 0; i < 50000; i++)
git_index_append2(index, &entry);
git_tree_create_fromindex(&tree_oid, index);
git_oid_fmt(tree_hex, &tree_oid);
tree_hex[40] = '\0';
printf("tree: %s\n", tree_hex);
git_index_free(index);
git_repository_free(repo);
return 0;
}
Before:
$ time ./index_test
tree: 43f73659c43b651588cc81459d9e25b08721b95d
./index_test 151.19s user 0.05s system 99% cpu 2:31.78 total
After:
$ time ./index_test
tree: 43f73659c43b651588cc81459d9e25b08721b95d
./index_test 0.05s user 0.00s system 94% cpu 0.059 total
About 2573 times speedup on this test :)
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Cleaned up the structure of the whole OS-abstraction layer.
fileops.c now contains a set of utility methods for file management used
by the library. These are abstractions on top of the original POSIX
calls.
There's a new file called `posix.c` that contains
emulations/reimplementations of all the POSIX calls the library uses.
These are prefixed with `p_`. There's a specific posix file for each
platform (win32 and unix).
All the path-related methods have been moved from `utils.c` to `path.c`
and have their own prefix.
Handle Symlinks if they can be handled in Win32. This is not even
compiled. Needs review.
The lstat implementation is modified from core Git.
The readlink implementation is modified from PHP.
When calling gitfo_exists() on a symbolic link, sometimes we need to
simply check whether the link exists and sometimes we need to check
whether the file pointed to by the symlink exists.
Introduce a new function gitfo_shallow_exists that only checks if the
link exists and revert gitfo_exists to the original functionality of
checking whether the file pointed to by the link exists.
00582bc introduced a change that required the caller of
git_blob_create_fromfile() to pass a struct stat with the stat
information for the file. Several developers pointed out that this would
make life hard for the bindings developers as struct stat isn't widely
supported by other languages.
Make git_blob_create_fromfile() stat the path itself, eliminating the
need for the file to be stat'ed by the caller. This makes
index_init_entry() more costly as the file will be stat'ed twice but
makes life easier for everyone else.