Update all stack allocations of git_filebuf to use GIT_FILEBUF_INIT
and make git_filebuf_open and git_filebuf_cleanup safe to be called
multiple times on the same buffer.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
reference_rename used to delete an old reflog file when renaming a
reference to not confuse git.git. Don't do this anymore but let the user
take care of writing a reflog entry.
Signed-off-by: schu <schu-github@schulog.org>
There is no good reason to expose the negotiation as a different step
to downloading the packfile.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
It's redundant to do this (git doesn't) and Windows doesn't allow us
to overwrite a read-only file (which objects are).
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
See `global.c` for a description of what we're doing.
When libgit2 is built with GIT_THREADS support, the threading system
must be explicitly initialized with `git_threads_init()`.
This new version of the references code is significantly faster and
hopefully easier to read.
External API stays the same. A new method `git_reference_reload()` has
been added to force updating a memory reference from disk. In-memory
references are no longer updated automagically -- this was killing us.
If a reference is deleted externally and the user doesn't reload the
memory object, nothing critical happens: any functions using that
reference should fail gracefully (e.g. deletion, renaming, and so on).
All generated references from the API are read only and must be free'd
by the user. There is no reference counting and no traces of generated
references are kept in the library.
There is no longer an internal representation for references. There is
only one reference struct `git_reference`, and symbolic/oid targets are
stored inside an union.
Packfile references are stored using an optimized struct with flex array
for reference names. This should significantly reduce the memory cost of
loading the packfile from disk.
git_reference_rename() didn't properly cleanup old references given by
the user to not break some ugly old tests. Since references don't point
to libgit's internal cache anymore we can cleanup git_reference_rename()
to be somewhat less messy.
Signed-off-by: schu <schu-github@schulog.org>
Currently libgit2 shares pointers to its internal reference cache with
the user. This leads to several problems like invalidation of reference
pointers when reordering the cache or manipulation of the cache from
user side.
Give each user its own git_reference instead of leaking the internal
representation (struct reference).
Add the following new API functions:
* git_reference_free
* git_reference_is_packed
Signed-off-by: schu <schu-github@schulog.org>
This ensures that entries from the working directory are retrieved according to the following rules:
- The file "subdir" should appear before the file "subdir.txt"
- The folder "subdir" should appear after the file "subdir.txt"
The only caller has been changed to treat a NULL tree as a special
case and use the existing git_tree_entry_byindex.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
This function is already implemented (better) as git_index_get. Change
the only caller to use that function.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Our previous assumption that all paths in Windows are encoded in UTF-8
is rather weak, specially when considering that Git is
encoding-agnostic.
These set of functions allow the user to change the library's active
codepage globally, so it is possible to access paths and files on all
international versions of Windows.
Note that the default encoding here is UTF-8 because we assume that 99%
of all Git repositories will be in UTF-8.
Also, if you use non-ascii characters in paths, anywhere, please burn on
a fire.
libgit2 currently identifies loose objects as corrupt if they've been
deflated using a window size less than 32Kb, because the
is_zlib_compressed_data() function doesn't recognise the header
byte as a zlib header. This patch makes the method tolerant of
all valid window sizes (15-bit to 8-bit) - but doesn't sacrifice
it's accuracy in distingushing the standard loose-object format
from the experimental (now abandoned) format. It's based on a patch
which has been merged into C-Git master branch:
https://github.com/git/git/commit/7f684a2aff636f44a506
On memory constrained systems zlib may use a much smaller window
size - working on Agit, I found that Android uses a 4KB window;
giving a header byte of 0x48, not 0x78. Consequently all loose
objects generated by the Android platform appear 'corrupt' :(
It might appear that this patch changes isStandardFormat() to the
point where it could incorrectly identify the experimental format as
the standard one, but the two criteria (bitmask & checksum) can only
give a false result for an experimental object where both of the
following are true:
1) object size is exactly 8 bytes when uncompressed (bitmask)
2) [single-byte in-pack git type&size header] * 256
+ [1st byte of the following zlib header] % 31 = 0 (checksum)
As it happens, for all possible combinations of valid object type
(1-4) and window bits (0-7), the only time when the checksum will be
divisible by 31 is for 0x1838 - ie object type *1*, a Commit - which,
due the fields all Commit objects must contain, could never be as
small as 8 bytes in size.
Given this, the combination of the two criteria (bitmask & checksum)
always correctly determines the buffer format, and is more tolerant
than the previous version.
References:
Android uses a 4KB window for deflation:
http://android.git.kernel.org/?p=platform/libcore.git;a=blob;f=luni/src/main/native/java_util_zip_Deflater.cpp;h=c0b2feff196e63a7b85d97cf9ae5bb258
Code snippet searching for false positives with the zlib checksum:
https://gist.github.com/1118177
Change-Id: Ifd84cd2bd6b46f087c9984fb4cbd8309f483dec0
Remove a wrong call to git_mwindow_close which caused a segfault if it
ever did run. In that same piece of code, if the LRU was from the
first wiindow in the list in a different file, we didn't update that
list, so the first element had been freed.
Fix these two issues.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
The following files now have 0444 permissions:
- loose objects
- pack indexes
- pack files
- packs downloaded by fetch
- packs downloaded by the HTTP transport
And the following files now have 0666 permissions:
- config files
- repository indexes
- reflogs
- refs
This brings libgit2 more in line with Git.
Note that git_filebuf_commit() and git_filebuf_commit_at() have both
gained a new mode parameter.
The latter change fixes an important issue where filebufs created with
GIT_FILEBUF_TEMPORARY received 0600 permissions (due to mkstemp(3)
usage). Now we chmod() the file before renaming it into place.
Tests have been added to confirm that new commit, tag, and tree
objects are created with the right permissions. I don't have access to
Windows, so for now I've guarded the tests with "#ifndef GIT_WIN32".
To further match how Git behaves, this change makes most of the
directories libgit2 creates in a git repo have a file mode of
0777. Specifically:
- Intermediate directories created with git_futils_mkpath2file() have
0777 permissions. This affects odb_loose, reflog, and refs.
- The top level folder for bare repos is created with 0777
permissions.
- The top level folder for non-bare repos is created with 0755
permissions.
- /objects/info/, /objects/pack/, /refs/heads/, and /refs/tags/ are
created with 0777 permissions.
Additionally, the following changes have been made:
- fileops functions that create intermediate directories have grown a
new dirmode parameter. The only exception to this is filebuf's
lock_file(), which unconditionally creates intermediate directories
with 0777 permissions when GIT_FILEBUF_FORCE is set.
- The test runner now sets the umask to 0 before running any
tests. This ensurses all file mode checks are consistent across
systems.
- t09-tree.c now does a directory permissions check. I've avoided
adding this check to other tests that might reuse existing
directories from the prefabricated test repos. Because they're
checked into the repo, they have 0755 permissions.
- Other assorted directories created by tests have 0777 permissions.
This makes libgit2 more closely match Git, which only checks for
ambiguous pack entries when given short hashes.
Note that the only time this is ever relevant is when a pack has the
same object more than once (it's happened in the wild, I promise).
When trying to find the end of an email, instead of starting at the
beginning of the signature, we start at the end of the name (after the
first '<').
This brings libgit2 more in line with Git's behavior when reading out
existing signatures.
However, note that Git does not allow names like these through the
usual porcelain; instead, it silently strips any '>' characters it
sees.
The v0.99 tag in the Git repo triggers this behavior:
http://git.kernel.org/?p=git/git.git;a=tag;h=d6602ec5194c87b0fc87103ca4d67251c76f233a
Ideally, we'd allow the tag to be instantiated even though the tagger
field is missing, but this at the very least prevents libgit2 from
crashing.
To test this bug, a new repository has been added based on the test
branch in testrepo.git. It contains a "e90810b" tag that looks like
this:
object e90810b8df3e80c413d903f631643c716887138d
type commit
tag e90810b
This is a very simple tag.
This ensures commit->message is always non-NULL, even if the commit
message is empty or consists of only a newline.
One such commit can be found in the wild in the jQuery repository:
25b424134f
Unfortunately, we can't use the function in fetch.c due to chunked
encoding and keep-alive connections.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Using a different buffer in each function means that some data might
get lost. Store all the data in a buffer in the transport object.
Take this opportunity to use the generic download-pack function.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Taken mostly from the git transport's version, this can be used by any
transport that takes its pack data from the network.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
It's a bit awkward to run it as an extra step, and HTTP may need to
send the wants list several times.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
As we don't know the length of the message we want to send to the
other end, we send a chunk size before each message. In later
versions, sending the wants might benefit from batching the lines
together.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Not every request needs a new connection if we're using a keep-alive
connection. Store the HTTP parser, host and port in the transport in
order to have it available in later calls.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
It's rare for a configured remote, but for one given as an URL on the
command line, it's more often than not the case.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
As we no longer use the STRLEN macro, the NUL-terminator in the string
was not copied over. Fix this.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
The documentation is a bit misleading. The subsection name is always
case-sensitive, but with a [section.subsection] header, the subsection
is transformed to lowercase when the configuration is parsed.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Winsock wants us to use closesocket() instead of close(), so introduce
the gitno_close function, which does the right thing.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
memset the structure on initialisation and don't try to dereference
the vector with the heads if we didn't find a repository.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Windows wants us to initialise the networking DLL before we're allowed
to send data through a socket. Call WSASetup and WSACleanup if
GIT_WIN32 is defined.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
This is clearer and sidesteps the issue of what the return value of
snprintf is on the particular OS we're running on.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
If ExpandEnvironmentStringsW is successful, it returns the amount of
characters written, including the NUL terminator.
Thanks to Emeric for reading the MSDN documentation correctly.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
6c8b458 removed an "unused" variable needed for git_hashtable_insert2(),
causing a segfault in reference_rename(). Instead, use
git_hashtable_insert().
Signed-off-by: schu <schu-github@schulog.org>
In libgit2: Move an enum out of an int bitfield in the HTTP transport.
In the parser: Use int bitfields and change some variable sizes to
better fit thir use. Variables that count the size of the data chunk
can only ever be as large as off_t. Warning 4127 can be ignored, as
nobody takes it seriously anyway.
From Emeric: change some variable declarations to keep MSVC happy.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
git_repository_config wants to take the global and system paths again
so that one can be explicit if needed.
The git_repository_config_autoload function is provided for the cases
when it's good enough for the library to guess where those files are
located.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Taking advantage of the tree cache, git_tree_create_fromindex becomes
comparable in speed to git write-tree when the cache is available.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
When a file is updated in the index, it's path needs to be invalidated
in the tree cache as the hash is no longer correct.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
Whenever a file is updated in the index, each tree leading towards it
needs to be invalidated. Provide the supporting function.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
The fix introduced in a02fc2cd1 (2011-05-24; index: correctly parse
invalidated TREE extensions) threw out the rest of the data in the
extension if it found an invalidated entry. This was the result of
incorrect reading of the documentation.
Insted, keep reading the extension, as there may be cached data we can
use.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
reference_rename() recently failed when renaming an existing reference
refs/heads/foo/bar -> refs/heads/foo because of a change in the
underlying functions / error codes. Fixes#412.
Signed-off-by: schu <schu-github@schulog.org>
GIT_EOVERFLOW means something different. Use GIT_ESHORTBUFFER. On the
way, remove a redundant sizeof(char).
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
There were quite a few places were spaces were being used instead of
tabs. Try to catch them all. This should hopefully not break anything.
Except for `git blame`. Oh well.
1. The license header is technically not valid if it doesn't have a
copyright signature.
2. The COPYING file has been updated with the different licenses used in
the project.
3. The full GPLv2 header in each file annoys me.
For unsigned types, the comparison >= 0 is always true, so avoid it by using
a post-decrement and integrating the initial assigment into the loop body.
No change in behavior is intended.
When Content-Type is the last field, we only know when we can store it
when we reach on_headers_complete.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Depending on what we want to do, we expect the Content-Type field to
have different contents. Store which service to expect instead of
hard-coding the string.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
There is no point in reinventing the wheel when using the treebuilder
is much more straightforward and makes the code more readable. There
is no optimisation, and the performance is no worse than when writing
the tree object ourselves.
write_deflate() used to ignore errors by zlib's deflate function when
not compiling in DEBUG mode. Always read $result and throw an error
instead.
Signed-off-by: schu <schu-github@schulog.org>
As we no longer expose the transport functions, this is now the only
way to connect to a remote when given an URL instead of a remote name
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Sockets on Windows are unsigned, so define a type GIT_SOCKET which is
signed or unsigned depending on the platform.
Thanks to Em for his patience with this.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
GCC produces several -Wuninitialized warnings. Most of them can be fixed
if we make visible for gcc that git__throw() and git__rethrow() always
return first argument.
Signed-off-by: Kirill A. Shutemov <kirill@shutemov.name>
Some servers take a long time to answer and expect us to keep sending
want lines; otherwise they close the connection. Avoid this by waiting
for one second for the server to answer. If the timeout runs out,
treat is as a NAK and keep sending want lines.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Only signal that we need a pack if we do need it and don't send a want
just because it's the first. If we don't need to download the pack,
then we can skip all of the negotiation and just return success.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
There are many ways how a transport might negotiate with the server,
so instead of making it fit into the smart protocol model, let the
transport do its thing. For now, the git protocol limits itself to
send only 160 "have" lines so we don't flood the server.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
The original was written before any code was written and had nothing
to do with the way things are actually done.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
There is no need to inspect what the local repository is like. Only
check whether the objects exist locally.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Make sure we only try to download the pack if we find the pack header
in the stream, and not if the server takes a bit longer to send us the
last NAK.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
This function updates the references in the local reference storage to
match the ones in the remote.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
When indexing a file with ref deltas, a temporary cache for the
offsets has to be built, as we don't have an index file yet. If the
user takes the responsiblity for filling the cache, the packing code
will look there first when it finds a ref delta.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Provide the git_remote_download function to instruct the library to
downlad the packfile and let the user know the temporary location.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Move the generation of the want-list to be done from the negotiate
function, and keep the filtered references inside the remote
structure.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
Configurations when taken from a repository and remotes should be
identifiable as coming from a particular repository. This allows us to
reduce the amount of variables that the user has to keep track of.
Signed-off-by: Carlos Martín Nieto <carlos@cmartin.tk>
We need to do an unsigned comparison, as otherwise UTF-8 characters
might look like they have the sign bit set and the check will fail.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>