/home/kas/git/public/libgit2/src/revwalk.c: In function ‘object_table_hash’:
/home/kas/git/public/libgit2/src/revwalk.c:120:7: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
home/kas/git/public/libgit2/src/transport_local.c: In function ‘cmp_refs’:
/home/kas/git/public/libgit2/src/transport_local.c:19:22: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
/home/kas/git/public/libgit2/src/transport_local.c:20:22: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
/home/kas/git/public/libgit2/src/reflog.c: In function ‘reflog_parse’:
/home/kas/git/public/libgit2/src/reflog.c:148:17: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
/home/kas/git/public/libgit2/src/commit.c: In function ‘commit_parse_buffer’:
/home/kas/git/public/libgit2/src/commit.c:186:23: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
/home/kas/git/public/libgit2/src/commit.c:187:27: warning: cast discards ‘__attribute__((const))’ qualifier from pointer target type [-Wcast-qual]
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
In reference_read we stat a file and then call futils which stats it
again. Use git_futils_readbuffer_updated to avoid the extra stat
call. This introduces another parameter which is used to tell the
caller whether the file was read or not.
Modify the callers to take advantage of this new feature. This change
removes ~140 stat calls from the test suite.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
This extends the git_fuitls_readbuffer function to only read in if the
file's modification date is later than the given one. Some code paths
want to check a file's modification date in order to decide whether
they should read it or not. If they do want to read it, another stat
call is done by futils. This function combines these two operations so
we avoid one stat call each time we read a new or updated file.
The git_futils_readbuffer functions is now a wrapper around the new
function.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Current implementation of git_reference_rename() removes 'ref' from
loose cache, but not frees it. In result 'ref' is not reachable any more
and we have got memory leak.
Let's re-add 'ref' with corrected name to loose cache instead of
'new_ref' and free 'new_ref' properly.
'rollback' path seems leak too. git_reference_rename() need to be rewritten
for proper resource management.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
It's not obvious that recurse_tree_entries or recurse_tree_entry
should free a resource that wasn't allocated by them. Do this
explicitely and plug a leak while we're at it.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
The old matcher was returning fake matches when given stupid entry
names. E.g.
`git2` could be matched by `git2 /`, `git2/foobar`, git2/////`
and other stupid stuff
Fixes#127 (that was quite an outstanding issue).
Rationale:
The tree objects on Git are stored and read following a very specific
sorting algorithm that places folders before files. That original sort
was the sort we were storing on memory, but this sort was being queried
with a binary search that used a simple `strcmp` for comparison, so
there were many instances where the search was failing.
Obviously, the most straightforward way to fix this is changing the
binary search CB to use the same comparison method as the sorting CB.
The problem with this is that the binary search callback compares a path
and an entry, so there is no way to know if the given path is a folder
or a standard file.
How do we work around this? Instead of splitting the `entry_byname`
method in two (one for searching directories and one for searching
normal files), we just assume that the path we are searching for is of
the same kind as the path it's being compared at the moment.
return git_futils_cmp_path(
ksearch->filename, ksearch->filename_len, entry->attr & 040000,
entry->filename, entry->filename_len, entry->attr & 040000);
Since there cannot be a folder and a regular file with the same name on
the same tree, the most basic equality check will always fail
for all comparsions, until our path is compared with the actual entry we
are looking for; in this case, the matching will succeed with the file
type of the entry -- whatever it was initially.
I hope that makes sense.
PS: While I was at it, I switched the cmp methods to use cached values
for the length of each filename. That makes searches and sorts
retardedly fast -- I was wondering the reason of the performance hiccups
on massive trees; it's because of 2*strlen for each comparsion call.
mode field of git_index_entry_unmerged is array of unsigned ints. It's
unsafe to cast pointer to an element of the array to long int *. It may
cause overflow in git_strtol32().
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Type casting usually points to some trick or bug. It's better not hide
it between useless type castings.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
The `hashfile` function has been moved to ODB, next to `git_odb_hash`.
Global state has been removed from the dirent call in `status.c`,
because global state is killing the rainforest and causing global
warming.
Add git_status_hashfile() to get blob's object id for a file without adding
it to the object database or needing a repository at all.
This functionality is similar to `git hash-object` without '-w'.
The direct-writes commit left some (slow) internals methods that
were no longer needed. These have been removed.
Also, the Reflog code was using the old `git_signature__write`, so
it has been rewritten to use a normal buffer and the new `writebuf`
signature writer. It's now slightly simpler and faster.
DIRECT WRITES ARE BACK AND FASTER THAN EVER. The streaming writer to the
ODB was an overkill for the smaller objects like Commit and Tags; most
of the streaming logic was taking too long.
This commit makes Commits, Tags and Trees to be built-up in memory, and
then written to disk in 2 pushes (header + data), instead of streaming
everything.
This is *always* faster, even for big files (since the git_filebuf class
still does streaming writes when the memory cache overflows). This is
also a gazillion lines of code smaller, because we don't have to
precompute the final size of the object before starting the stream (this
was kind of defeating the point of streaming, anyway).
Blobs are still written with full streaming instead of loading them in
memory, since this is still the fastest way.
A new `git_buf` class has been added. It's missing some features, but
it'll get there.
Our good, lovely folks at Microsoft decided that there was no good
reason to make `vsnprintf` compilant with the C standard, so that
function in Windows returns -1 on overflow, instead of returning the
actual byte count needed to write the full string.
We now handle this situation more gracefully with the POSIX
compatibility layer, by returning the needed byte size using an
auxiliary method instead of blindly resizing the target buffer until it
fits.
This means we can now support `printf`s of any size by allocating a
temporary buffer. That's good.
So far libgit2 didn't support reference logs (reflog). Add a new
git_reflog_* API for basic reading and writing of reflogs:
* git_reflog_read
* git_reflog_write
* git_reflog_free
Signed-off-by: schu <schu-github@schulog.org>
Drop the GLibc implementation of Merge Sort and replace it with Timsort.
The algorithm has been tuned to work on arrays of pointers (void **),
so there's no longer a need to abstract the byte-width of each element
in the array.
All the comparison callbacks now take pointers-to-elements, not
pointers-to-pointers, so there's now one less level of dereferencing.
E.g.
int index_cmp(const void *a, const void *b)
{
- const git_index_entry *entry_a = *(const git_index_entry **)(a);
+ const git_index_entry *entry_a = (const git_index_entry *)(a);
The result is up to a 40% speed-up when sorting vectors. Memory usage
remains lineal.
A new `bsearch` implementation has been added, whose callback also
supplies pointer-to-elements, to uniform the Vector API again.
`git_futils_rmdir_r`: rename, clean up.
`git_reference_rename`: cleanup. Do not use 3x4096 buffers on the stack
or things will get ugly very fast. We can reuse the same buffer.
MSYS/MinGW uses winsock but obviously doesn't set _MSC_VER. Use _WIN32
to decide whether to use winsock or BSD headers. Also remove these
headers from src/transport_git.c altogether, as they are not needed.
MSYS is very conservative, so we have to tell it that we don't care
about versions of Windows lower than WindowsXP. We also need to tell
CMake to add ws2_32 to the libraries list and we shouldn't add the
-fPIC option, to MSYS because it complains that it does it anyway.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
So far libgit2 didn't handle the following scenarios:
* Rename of reference m -> m/m
* Rename of reference n/n -> n
Fixed.
Since we don't write reflogs, we have to delete any old reflog for the
renamed reference. Otherwise git.git will possibly fail when it finds
invalid logs.
Reported-by: nulltoken <emeric.fermas@gmail.com>
Signed-off-by: schu <schu-github@schulog.org>
git_futils_rmdir_recurs() shall remove the given directory and all
subdirectories. This happens only if the directories are empty.
Signed-off-by: schu <schu-github@schulog.org>
It removes all entries with equal path except last added.
On large indexes git_index_append() + git_index_uniq() before writing is
*much* faster, than git_index_add().
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
The routine remove duplictes from the vector. Only the last added element
of elements with equal keys remains in the vector.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Index operation use git_vector_sort() to sort index entries. Since index
support adding duplicates (two or more entries with the same path), it's
important to preserve order of elements. Preserving order of elements
allows to make decisions based on order. For example it's possible to
implement function witch removes all duplicates except last added.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
In some cases it's important to preserve order of elements with equal
keys (stable sort). qsort(3) doesn't define order of elements with
equal keys.
git__msort() implements merge sort which is stable sort.
Implementation taken from git. Function renamed git_qsort() -> git__msort().
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
git_index_find() in index_insert() is useless if replace is not
requested (append). Do not call it in this case.
It speedup git_index_append() *dramatically* on large indexes.
$ cat index_test.c
int main(int argc, char **argv)
{
git_index *index;
git_repository *repo;
git_odb *odb;
struct git_index_entry entry;
git_oid tree_oid;
char tree_hex[41];
int i;
git_repository_init(&repo, "/tmp/myrepo", 0);
odb = git_repository_database(repo);
git_repository_index(&index, repo);
memset(&entry, 0, sizeof(entry));
git_odb_write(&entry.oid, odb, "", 0, GIT_OBJ_BLOB);
entry.path = "test.file";
for (i = 0; i < 50000; i++)
git_index_append2(index, &entry);
git_tree_create_fromindex(&tree_oid, index);
git_oid_fmt(tree_hex, &tree_oid);
tree_hex[40] = '\0';
printf("tree: %s\n", tree_hex);
git_index_free(index);
git_repository_free(repo);
return 0;
}
Before:
$ time ./index_test
tree: 43f73659c43b651588cc81459d9e25b08721b95d
./index_test 151.19s user 0.05s system 99% cpu 2:31.78 total
After:
$ time ./index_test
tree: 43f73659c43b651588cc81459d9e25b08721b95d
./index_test 0.05s user 0.00s system 94% cpu 0.059 total
About 2573 times speedup on this test :)
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Check if the window structure has actually been allocated before
trying to access it, and don't leak said structure if the map fails.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Without this, hashtable.h doesn't know what uint32_t is and the
compiler thinks that it's a function type.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
If the section header is the last line in the file,
parse_section_header would incorrectly decide that the input had been
truncated.
Fix this by checking whether the actual input line is correctly
formatted.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
git_signature__parse used to be very strict about what's a well-formed
signature. Since git_signature__parse is used only when reading already
existing signatures, we should not care about if it's a valid signature
too much but rather show what we got.
Reported-by: nulltoken <emeric.fermas@gmail.com>
Signed-off-by: schu <schu-github@schulog.org>
The `stat` methods were having issues when called with a trailing slash
in Windows platforms.
We now use GetFileAttributes() where possible, which doesn't have this
restriction.
The old `git_fileops_prettify_path` has been replaced with
`git_path_prettify`. This is a much simpler method that uses the OS's
`realpath` call to obtain the full path for directories and resolve
symlinks.
The `realpath` syscall is the original POSIX call in Unix system and
an emulated version under Windows using the Windows API.
Cleaned up the structure of the whole OS-abstraction layer.
fileops.c now contains a set of utility methods for file management used
by the library. These are abstractions on top of the original POSIX
calls.
There's a new file called `posix.c` that contains
emulations/reimplementations of all the POSIX calls the library uses.
These are prefixed with `p_`. There's a specific posix file for each
platform (win32 and unix).
All the path-related methods have been moved from `utils.c` to `path.c`
and have their own prefix.
The assertion in line 360 was there to check that only loose refs were
being written as loose, but there are times when we need to re-write a
packed reference as loose.
Core Git doesn't care when people use spaces in the email address, even
though this kind of foolery receives the capital punishment in several
states of the USA.
We must obey.
A bunch of redundant methods have been removed from the external API.
- All the reference/tag creation methods with `_f` are gone. The force
flag is now passed as an argument to the normal create methods.
- All the different commit creation methods are gone; commit creation
now always requires a `git_commit` pointer for parents and a `git_tree`
pointer for tree, to ensure that corrupted commits cannot be generated.
- All the different tag creation methods are gone; tag creation now
always requires a `git_object` pointer to ensure that tags are not
created to inexisting objects.
Remove the unused repo and private pointers and make the direction a
flag, as it can only have two states. Change the connect signature to
use an int instead of git_net_direction and remove that enum.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Remove call of gitfo_size, since we call gitfo_lstat anyway; remove some
old workaround code for gitfo_read, which is obsolete now.
Signed-off-by: schu <schu-github@schulog.org>
Rather than an 'private' pointer, make the private structures inherit
from the generic git_transport struct. This way, we only have to worry
about one memory allocation instead of two. The structures are so
simple that this may even make the code use less memory overall.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
By pre-sorting the references, they are already in the right order if
we want to peel them.
With this, we get output-parity with git.git's ls-remote.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
This makes it easier to send a requqest for an URL. It assumes there
is a socket where the string should go out to.
Make git_pkt_gen_proto accept a command parameter, which defaults to
git-upload-pack
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Add a parameter to git_pkt_parse_line to tell it how much data you
have in your buffer. If the buffer is too short, it returns an error
saying so. Adapt the git transport to use this and fix the offset
calculation.
Add the GIT_ESHORTBUFFER error code.
A pkt-line's length are described in its first four bytes in ASCII
hex. Copy this substring to another string before feeding it to
git__strtol32. Otherwise, it will read the whole hash.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
This are the types I intend to use for pkt-line parsing and (later)
creation. git_pkt serves as a base pointer type and once you know what
type it is you can use the real one (command, tip list, etc.)
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
If the strings match, git__fnmatch returns GIT_SUCCESS and
GIT_ENOMATCH on failure to match.
MSVC fixes: Added a test for _MSC_VER and (in that case) defined
HAVE_STRING_H to 1 so it doesn't try to include <strings.h> which
doesn't exist in the MSVC world. Moved the function declarations to
use the modern inline ones so MSVC doesn't have a fit. Added casts
everywhere so MSVC doesn't crap its pants.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Move them to their own functions to avoid duplication and to make it
easier to ignore missing configuration.
Not finding 'fetch' is considered fatal, though this might not be
correct behaviour (push-only remotes?)
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Call gitfo_lstat with the full pathname instead of the relative one,
which fails in case the current working directory is different from
the workdir.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Also allow space for the null-terminator when allocating the buffer in
packfile_unpack_compressed. Up to now, the last newline had served as
a terminator, but 858ef372 searches for a double-newline and exposes
the problem.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
If a config has several files, we need to check all of them before we
can say that a variable doesn't exist.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
This function puts the global and repository configurations in one
git_config object and gives it to the user.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
There is no need to store the format outselves, as the library
provides git_filebuf_printf which takes care of the formatting itself.
Also get rid of an use of strcat + strcpy which is always a nice
thing.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
The better solution would probably be to turn the gitfo_lstat /
gitfo_readlink macros into real functions that wrap either lstat or
gitfo_lstat__w32 (and readlink or gitfo_readlink__w32). However, that
would introduce an indirection unless inlined. For now, this is the less
intrusive change.
Handle Symlinks if they can be handled in Win32. This is not even
compiled. Needs review.
The lstat implementation is modified from core Git.
The readlink implementation is modified from PHP.
In git, the short message of a commit is the part of the commit message before 2 consecutive line breaks. In the short message, line breaks are replaced by space characters.
After each variable gets set, we store it in our list (not completely
in the right position, but the close enough). Then we write out the
new config file in the same way that git.git does it (keep the rest of
the file intact and insert or replace the variable in its line).
Overwriting variables and adding new ones is supported (even on new
sections), though deleting isn't yet.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
The section name should be stored in its case-sensitive variant when
we are adding a new variable. Use the internalize_section function to
do just that.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
The (rather late) early-exit code, which provides a negligible
optimisation causes cvar_match_section to return false negatives when
it's called with a section name instead of a full variable name.
Remove this optimisation.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Add a check for the file descriptor in git_filebuf_cleanup. Without
it, an existing lockfile would be deleted if we tried to acquire it
(but failed, as the lockfile already existed).
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
When calling gitfo_exists() on a symbolic link, sometimes we need to
simply check whether the link exists and sometimes we need to check
whether the file pointed to by the symlink exists.
Introduce a new function gitfo_shallow_exists that only checks if the
link exists and revert gitfo_exists to the original functionality of
checking whether the file pointed to by the link exists.
This reverts commit df1c98ab6d6171ed63729195bd190b54b67fe530.
As 8a27b6b reverts the exposition of struct stat to the external API, we
do not need - indeed, do not want - struct stat to be in the outer
include layer.
00582bc introduced a change that required the caller of
git_blob_create_fromfile() to pass a struct stat with the stat
information for the file. Several developers pointed out that this would
make life hard for the bindings developers as struct stat isn't widely
supported by other languages.
Make git_blob_create_fromfile() stat the path itself, eliminating the
need for the file to be stat'ed by the caller. This makes
index_init_entry() more costly as the file will be stat'ed twice but
makes life easier for everyone else.
00582bcb introduced a change to git_blob_create_fromfile() that required
the caller to pass a stat struct. This means that we need to include
stat.h higher in the hierarchy of includes.
The entry mode flags for an entry created from a path name were not
correctly written if the entry was a symlink. The st_mode of a statted
symlink is 0120777, however git requires the mode to read 0120000,
because it does not care about permissions of symlinks.
Introduce index_create_mode() that correctly writes the mode flags in
the form expected by git.
gitfo_exists() used to error out if the given file was a symbolic link,
due to access() returning an error code. This is not expected behaviour,
as gitfo_exists() should only check whether the file itself exists, not
its link target if it is a symbolic link.
Fix this by calling gitfo_lstat() instead, which is just a wrapper for
lstat().
Also fix the same error in index_init_entry().
In order to be able to write symlinks with git_blob_create_fromfile(),
we need to check whether the file to be written is a symbolic link or
not. Since the calling function of git_blob_create_fromfile() is likely to have
stated the file before calling, we make it pass the stat.
The reason for this is that writing symbolic link blobs is significantly
different from writing ordinary files - we do not want to open the link
destination but instead want to write the link itself, regardless of
whether it exists or not.
Previously, index_init_entry() used to error out if the file to be added
was a symlink that pointed to a nonexistent file. Fix this behaviour to
add the file regardless of whether it exists. This mimics git.git's
behaviour.
As suggested by carlosmn, git_oid_ncmp would probably
be a better name than git_oid_match, for it does the same
as git_oid_cmp but only up to a certain amount of hex digits.
Feature Added: Search an unmerged entry by path (git_index_get_unmerged
renamed to git_index_get_unmerged_bypath) or by index (git_index_get_unmerged_byindex).
Now the ceiling_dirs are compared with their symbolic free version (like base_path).
The ceiling dirs check is now performed after getting the parent directory.
retrieve_device returns the file device for a given path (so that we can detect device change while walking through parent directories).
abspath returns a canonicalized path, symbolic link free.
retrieive_ceiling_directories_offset returns the biggest path offset that path match in the ceiling directory list (so that we can stop at ceiling directories).