Add structures and preliminary functions to take a buffer, file or
blob and write the contents in chunks through an arbitrary number
of chained filters, finally writing into a user-provided function
that accepts the contents.
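As a rough sketch of the shape involved (the names below are
illustrative, not the library's API), the user-provided end of such a
chain is essentially a write/close callback pair:

#include <stddef.h>

/* Illustrative sink interface; names and layout are assumptions. */
typedef struct filter_sink filter_sink;
struct filter_sink {
	int (*write)(filter_sink *sink, const char *buffer, size_t len);
	int (*close)(filter_sink *sink);
};

/* Example sink that simply counts the filtered bytes it receives. */
typedef struct {
	filter_sink parent;
	size_t total;
} counting_sink;

static int counting_write(filter_sink *sink, const char *buffer, size_t len)
{
	(void)buffer;
	((counting_sink *)sink)->total += len;
	return 0;
}

static int counting_close(filter_sink *sink)
{
	(void)sink;
	return 0;
}

static void counting_sink_init(counting_sink *sink)
{
	sink->parent.write = counting_write;
	sink->parent.close = counting_close;
	sink->total = 0;
}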
Increment refcount of newly added cache entries just like existing
entries looked up from the cache. Otherwise the new entry can be
evicted from the cache and destroyed while it's still in use.
Always lock the index when we begin the merge, before we write
any of the metadata files. This prevents a race where another
client may run a commit after we have written the MERGE_HEAD but
before we have updated the index, which will produce a merge
commit that is treesame to one parent. The merge will finish and
update the index and the resultant commit would not be a merge at
all.
Introduce `git_indexwriter`, to allow us to lock the index while
performing additional operations, then complete the write (or abort,
unlocking the index).
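The intended usage pattern is roughly as follows (git_indexwriter lives
in libgit2's internal src/index.h, so treat the exact signatures and
the update_in_memory() placeholder here as an illustration):

static int update_in_memory(git_index *index); /* hypothetical work */

static int do_locked_update(git_index *index)
{
	git_indexwriter writer = GIT_INDEXWRITER_INIT;
	int error;

	/* Lock the index before writing any other metadata. */
	if ((error = git_indexwriter_init(&writer, index)) < 0)
		return error;

	/* ... write MERGE_HEAD etc., update the index in memory ... */
	if ((error = update_in_memory(index)) < 0) {
		/* Abort: release the lock without writing the index. */
		git_indexwriter_cleanup(&writer);
		return error;
	}

	/* Complete the write and release the lock. */
	return git_indexwriter_commit(&writer);
}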
Make our overflow checking look more like gcc and clang's, so that
we can substitute it out with the compiler intrinsics on platforms
that support it. This means dropping the ability to pass `NULL` as
an out parameter.
As a result, the macros get updated to reflect this as well.
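For reference, the gcc/clang builtins take an out pointer for the
result and return whether the operation overflowed; a portable
fallback in the same shape might look like this (the macro name here
is illustrative, not the library's):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)  /* version checks omitted */
# define ADD_SIZET_OVERFLOW(out, one, two) \
	__builtin_add_overflow(one, two, out)
#else
/* Returns true on overflow; `out` may no longer be NULL. */
static bool add_sizet_overflow(size_t *out, size_t one, size_t two)
{
	if (SIZE_MAX - one < two)
		return true;
	*out = one + two;
	return false;
}
# define ADD_SIZET_OVERFLOW(out, one, two) add_sizet_overflow(out, one, two)
#endif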
Win32 generally ignores Unix-like mode bits that don't make any
sense on the platform (eg `0644` makes no sense to Windows). But
WINE complains loudly when presented with POSIXy bits. Remove them.
(Thanks @phkelley)
Use `size_t` to hold the size of arrays to ease overflow checking,
lest we check for overflow of a `size_t` then promptly truncate
by packing the length into a smaller type.
Introduce git__reallocarray that checks the product of the number
of elements and element size for overflow before allocation. Also
introduce git__mallocarray that behaves like calloc, but without the
`c`. (It does not zero memory, for those truly worried about every
cycle.)
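A simplified sketch of the check (function names are illustrative; the
in-tree versions also report out-of-memory through the library's error
machinery):

#include <stdint.h>
#include <stdlib.h>

static void *xreallocarray(void *ptr, size_t nelem, size_t elsize)
{
	/* Refuse the allocation if nelem * elsize would overflow size_t. */
	if (elsize && nelem > SIZE_MAX / elsize)
		return NULL;
	return realloc(ptr, nelem * elsize);
}

static void *xmallocarray(size_t nelem, size_t elsize)
{
	/* Like calloc, but without the `c`: the memory is not zeroed. */
	return xreallocarray(NULL, nelem, elsize);
}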
Fixes #2869. If an included file includes more files, it may reallocate
cfg_file->readers, hence invalidate not only `r` pointer, but `result`
pointer as well.
`p_stat` calls `git_win32_path_from_utf8`, which canonicalizes the
path. Do not further try to modify the path, else we trim the
trailing slash from a root directory and try to access `C:` instead
of `C:/`.
During checkout, assume that the .gitattributes files aren't
modified during the checkout. Instead, create an "attribute session"
during checkout. Assume that attribute data read in the same
checkout "session" hasn't been modified since the checkout started.
(But allow subsequent checkouts to invalidate the cache.)
Further, cache nonexistent git_attr_file data even when .gitattributes
files are not found to prevent re-scanning for nonexistent files.
pathspec_match_free() should not dereference a NULL passed to it.
I found this issue when I tried to run the example log program with a
nonexistent branch:
./example/log help
Such a call leads to a segmentation fault.
This fixes the build at least on FreeBSD, where those types were not
defined indirectly:
src/openssl_stream.c:100:18: error: variable has incomplete type 'struct in6_addr'
struct in6_addr addr6;
^
src/openssl_stream.c:100:9: note: forward declaration of 'struct in6_addr'
struct in6_addr addr6;
^
src/openssl_stream.c:111:18: error: use of undeclared identifier 'AF_INET'
if (p_inet_pton(AF_INET, host, &addr4)) {
^
src/unix/posix.h:31:40: note: expanded from macro 'p_inet_pton'
^
src/openssl_stream.c:115:18: error: use of undeclared identifier 'AF_INET6'
if(p_inet_pton(AF_INET6, host, &addr6)) {
^
src/unix/posix.h:31:40: note: expanded from macro 'p_inet_pton'
^
This is supported by the underlying set_index() implementation
and setting the repository index to NULL is recommended by the
git_repository_set_bare() documentation.
On case insensitive filesystems, we may have files in the working
directory that case fold to a name we want to write. Remove those
files (by default) so that we will not end up with a filename that
has the unexpected case.
The documentation for `git_path_join_unrooted` states that the base
length will be returned, so that consumers like checkout know where
to start creating directories instead of always creating directories
at the directory root.
The implementation of the hashsig API disallows computing a signature on
small files containing only a few lines. This new flag disables this
behavior.
git_diff_find_similar() sets this flag by default which means that rename
/ copy detection of small files will now work. This in turn affects the
behavior of the git_status and git_blame APIs which will now detect rename
of small files assuming the right options are passed.
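A minimal usage sketch for callers driving this through the diff API
(error handling elided):

#include <git2.h>

/* Run rename/copy detection on an existing diff; with the new hashsig
 * flag set by default, similarity is computed for small files too. */
static int detect_renames(git_diff *diff)
{
	git_diff_find_options opts = GIT_DIFF_FIND_OPTIONS_INIT;

	opts.flags = GIT_DIFF_FIND_RENAMES | GIT_DIFF_FIND_COPIES;

	return git_diff_find_similar(diff, &opts);
}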
git_remote_download() must also clear the internal push state resulting
from a possible earlier push operation. Otherwise calling
git_remote_update_tips() will execute the push version instead of the
fetch version and, among other things, tags won't be updated.
The clock_gettime() function is normally not available under
AmigaOS, hence another solution is required. We are using now
GetUpTime() that is present in current versions of this
operating system.
The openssl setup function needs to be GIT_EXPORT'ed; be sure
to include the `sys/openssl.h` header so that it is appropriately
decorated as an export function.
A byte value of 0x85 is not whitespace, we were conflating that with
U+0085 (UTF8: 0xc2 0x85). This caused us to incorrectly treat valid
multibyte characters like U+88C5 (UTF8: 0xe8 0xa3 0x85) as whitespace.
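In other words, a 0x85 byte only denotes U+0085 (NEL) when it follows a
0xc2 lead byte; a standalone version of the corrected check might read
(illustrative, not the in-tree code):

#include <stdbool.h>
#include <stddef.h>

/* True only for the two-byte sequence 0xc2 0x85 (U+0085, NEL).
 * A bare 0x85 is a continuation byte of some other character,
 * e.g. the last byte of U+88C5 (0xe8 0xa3 0x85). */
static bool is_utf8_nel(const unsigned char *buf, size_t i)
{
	return i > 0 && buf[i] == 0x85 && buf[i - 1] == 0xc2;
}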
On a case-insensitive filesystem, we need to deal with case-changing
renames (eg, foo -> FOO) by removing the old and adding the new,
exactly as if we were on a case-sensitive filesystem.
Update the `checkout::tree::can_cancel_checkout_from_notify` test, now
that notifications are always sent case sensitively.
For the REUC and NAME entries, we use size_t internally, and we take
size_t for the get_byindex() functions, but the entrycount() functions
strangely cast to an unsigned int instead.
This introduces the functionality of submodule update in
'git_submodule_do_update'. The existing 'git_submodule_update' function is
renamed to 'git_submodule_update_strategy'. The 'git_submodule_update'
function now refers to functionality similar to `git submodule update`,
while `git_submodule_update_strategy` is used to get the configured value
of submodule.<name>.update.
Path validation may be influenced by `core.protectHFS` and
`core.protectNTFS` configuration settings, thus treebuilders
can take a repository to influence their configuration.
HFS filesystems ignore some characters like U+200C. When these
characters are included in a path, they will be ignored for the
purposes of comparison with other paths. Thus, if you have a ".git"
folder, a folder of ".git<U+200C>" will also match. Protect our
".git" folder by ensuring that ".git<U+200C>" and friends do not match it.
Disallow:
1. paths with trailing dot
2. paths with trailing space
3. paths with trailing colon
4. paths that are 8.3 short names of .git folders ("GIT~1")
5. paths that are reserved path names (COM1, LPT1, etc).
6. paths with reserved DOS characters (colons, asterisks, etc)
These paths would (without \\?\ syntax) be elided to other paths - for
example, ".git." would be written as ".git". As a result, writing these
paths literally (using \\?\ syntax) makes them hard to operate with from
the shell, Windows Explorer or other tools. Disallow these.
When turning UTF-8 paths into UCS-2 paths for Windows, always use
the \\?\-prefixed paths. Because this bypasses the system's
path canonicalization, handle the canonicalization functions ourselves.
We must:
1. always use a backslash as a directory separator
2. only use a single backslash between directories
3. not rely on the system to translate "." and ".." in paths
4. remove trailing backslashes, except at the drive root (C:\)
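A rough sketch of rules 1, 2 and 4 on a narrow-character path (the real
code operates on the wchar_t result of the UTF-8 conversion, squashes
"." and ".." itself, and special-cases the \\?\ and UNC prefixes, all of
which is omitted here):

static void tidy_win32_path(char *path)
{
	char *in = path, *out = path;

	while (*in) {
		if (*in == '/' || *in == '\\') {
			/* rules 1 and 2: a single backslash between components */
			*out++ = '\\';
			while (*in == '/' || *in == '\\')
				in++;
		} else {
			*out++ = *in++;
		}
	}

	/* rule 4: drop a trailing backslash, except at a drive root (C:\) */
	if (out > path + 1 && out[-1] == '\\' &&
	    !(out == path + 3 && path[1] == ':'))
		out--;

	*out = '\0';
}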
This option does not get persisted to disk, which makes it different
from the rest of the setters. Remove it until we go all the way.
We still respect the configuration option, and it's still possible to
perform a one-time prune by calling the function.
For each remote-tracking branch we want to remove, we need to consider
it against every other refspec in case we have overlapping refspecs,
such as with
refs/heads/*:refs/remotes/origin/*
refs/pull/*/head:refs/remotes/origin/pr/*
as we'd otherwise remove too many references.
Create a list of candidates, which are the references matching the rhs
of any active refspec and then filter that list by removing those
entries for which we find a remote reference with any active
refspec. Those which are left after this are removed.
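A sketch of that filtering step against the refspec API
(remote_ref_exists() is a hypothetical lookup into the remote's
advertised refs, and the git_buf usage follows current headers):

#include <git2.h>
#include <stdbool.h>

static bool remote_ref_exists(git_remote *remote, const char *name); /* hypothetical */

/* A candidate is pruned only if no active refspec maps it back to a
 * ref that the remote still advertises. */
static bool should_prune(git_remote *remote, const char *candidate)
{
	size_t i, n = git_remote_refspec_count(remote);

	for (i = 0; i < n; i++) {
		const git_refspec *spec = git_remote_get_refspec(remote, i);
		git_buf src = GIT_BUF_INIT;
		bool still_tracked = false;

		if (!git_refspec_dst_matches(spec, candidate))
			continue;

		if (git_refspec_rtransform(&src, spec, candidate) == 0)
			still_tracked = remote_ref_exists(remote, src.ptr);

		git_buf_dispose(&src);

		if (still_tracked)
			return false;
	}

	return true;
}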
Most of the network-facing facilities have been copied to the socket and
openssl streams. No code uses these functions directly anymore, so
we can remove them.
Having an ssh stream would require extra work for stream capabilities we
don't need anywhere else (oob auth and command execution) so for now
let's move away from the gitno connection to use socket_stream.
We can introduce an ssh stream interface if and as we need it.
This unfortunately isn't as stackable as it could be, as it
hard-codes the socket stream. This is because the method of using a
custom openssl BIO is not clear, and we do not need this for now. We can
still bring this in if and as we need it.
We currently have gitno for talking over TCP, but this needs to know
about both plaintext and OpenSSL connections and the code has gotten
somewhat messy with ifdefs determining which version of the function
should be called.
In order to clean this up and abstract away the details of sending over
the different types of streams, we can instead use an interface and
stack stream implementations.
We may not be able to use the stackability with all streams, but we
are definitely able to use the abstraction which is currently spread
between different bits of gitno.
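The interface is essentially a small vtable that both the socket and
openssl streams implement; roughly (the field and type names here are a
sketch, not the in-tree definitions):

#include <stddef.h>
#include <sys/types.h>  /* ssize_t */

typedef struct stream stream;
struct stream {
	int encrypted;  /* does this stream provide transport encryption? */
	int (*connect)(stream *st);
	ssize_t (*read)(stream *st, void *data, size_t len);
	ssize_t (*write)(stream *st, const void *data, size_t len, int flags);
	int (*close)(stream *st);
	void (*free)(stream *st);
};

/* A TLS stream can then stack: it holds a `stream *` to a plain socket
 * stream and forwards its I/O through it after encryption. */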
A rule can only negate something which was explicitly mentioned in the
rules before it. Change our parsing to ignore a negative rule which does
not negate something mentioned in the rules above it.
While here, fix a wrong allocator usage. The memory for the match string
comes from the pool allocator. We must not free it with the general
allocator. We can instead simply forget the string and it will be
cleaned up.
We no longer have NULL strings, but empty ones, and we duplicate the
sides if necessary, so the first check will never do anything.
While in the area, remove unnecessary ifs and early returns.
There are some combination of objects and target types which we know
cannot be fulfilled. Return EINVALIDSPEC for those to signify that there
is a mismatch in the user-provided data and what the object model is
capable of satisfying.
If we start at a tag and in the course of peeling find out that we
cannot reach a particular type, we return EPEEL.
That's a bad assumption to make. Even though it holds right now
(because of the way we've implemented decompression of packfiles),
this may change in the future, given that ODB objects can be
binary data.
Furthermore, the ODB object can return a NULL pointer if the object
is empty. Copying the NULL pointer to the strbuf lets us handle it
like an empty string. Again, the NULL pointer is valid behavior because
you're supposed to check the *size* of the object before working
on it.
This is a contract that we made in the library and that we need to uphold. The
contents of a blob can never be NULL because several parts of the library (including
the filter and attributes code) expect `git_blob_rawcontent` to always return a
valid pointer.
When fetching twice with the same remote object, we did not properly
clear the connection flags, so we would leak state from the last
connection.
This can cause the second fetch with the same remote object to fail if
using a HTTP URL where the server redirects to HTTPS, as the second
fetch would see `use_ssl` set and think the initial connection wanted to
downgrade the connection.
When attempting to update a reference on a remote during push, and the
reference on the remote refers to a commit that does not exist locally,
then we should report a clearer error message.
When creating a new remote, contrary to loading one from disk,
active_refspecs was not populated. This means that if using the new
remote to push, git_push_update_tips() will be a no-op since it
checks the refspecs passed during the push against the base ones
i.e. active_refspecs. And therefore the local refs won't be created
or updated after the push operation.
There is one well-known and well-tested parser which we should use,
instead of implementing parsing a second time.
The common parser is also augmented to copy the LHS into the RHS if the
latter is empty.
The expressions test had to change a bit, as we now catch a bad RHS of a
refspec locally.
This function, similar in style to git_remote_fetch(), performs all the
steps required for a push, with a similar interface.
The remote callbacks struct has learnt about the push callbacks, letting
us set the callbacks a single time instead of setting some in the remote
and some in the push operation.
This describes their purpose better, as we now initialize ssl and some
other global stuff in there. Calling the init function has not been
optional for a while now.
git hardcodes these as objects which exist regardless of whether they
are in the odb and uses them in the shell interface as a way of
expressing the lack of a blob or tree for one side of e.g. a diff.
In the library we use each language's natural way of declaring a lack of
value which makes a workaround like this unnecessary. Since git uses it,
it does however mean each shell application would need to perform this
check itself.
This check is common work across a range of applications and a matter
of compatibility with git, which fits right into what the library aims
to provide.
Thus we introduce the hard-coded empty blob and tree in the odb
frontend. These hard-coded objects are checked for before going to the
backends, but after the cache check, which means the second time they're
used, they will be treated as normal cached objects instead of creating
new ones.
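A sketch of the frontend check, using the two well-known object ids
(the surrounding lookup machinery is omitted):

#include <git2.h>

static int is_empty_blob(const git_oid *id)
{
	git_oid empty_blob;
	/* the id of a zero-length blob is constant */
	git_oid_fromstr(&empty_blob, "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391");
	return git_oid_equal(id, &empty_blob);
}

static int is_empty_tree(const git_oid *id)
{
	git_oid empty_tree;
	/* the id of a tree with no entries is constant */
	git_oid_fromstr(&empty_tree, "4b825dc642cb6eb9a060e54bf8d69288fbee4904");
	return git_oid_equal(id, &empty_tree);
}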
If the remote is anonymous, then we cannot check for any configuration,
as there is no name. Check for this before we try to use the name, which
may be a NULL pointer.
This fixes #2697.
This function has one output but can match multiple files, which can be
unexpected for the user, who would usually pass the exact path of the
file they want the status of.
We cannot know from looking at .gitmodules whether a directory is a
submodule or not. We need the index or tree we are comparing against to
tell us. Otherwise we have to assume the entry in .gitmodules is stale
or otherwise invalid.
Thus we pass the index of the repository into the workdir iterator, even
if we do not want to compare against it. This follows what git does:
even for `git diff <tree>`, it considers staged submodules as such.
A rule "src" in src/.gitignore must only match subdirectories of
src/. The current code does not include this context in the match rule
and would thus consider this rule to match the top-level src/ directory
instead of the intended src/src/.
Keep track of the context in which the rule was defined so we can
perform a prefix match.
We currently consider CR to start the end of the line, but that means
that we miss cases with CR CR LF which can be used with git to match
files whose names have CR at the end of their names.
The fix from the patch comes from Russell's comment in the issue.
This fixes #2536.
Before trying to rtransform using the given refspec to figure out what
the name of the upstream branch is on the remote, we must make sure that
the target of the refspec applies to the current branch's upstream.
* Error-handling is cleaned up to only let a file-not-found error
through, not other sorts of errors. And when a file-not-found
error happens, we clean up the error.
* Test now checks that file-not-found introduces no error. And
other minor cleanups.
The create function with default refspec is the same as the one with a
custom refspec, but it has the default refspec, so we can create the one
on top of the other.
When we first ask OpenSSL to verify the certificate itself (rather
than the HTTPS specifics), we should also return
GIT_ECERTIFICATE. Otherwise, the caller would consider this as a failed
operation rather than a failed validation and not call the user's own
validation.
For example, if you have
[include]
path = foo
and foo didn't exist, git_config_open_ondisk() would just give up
on the rest of the file. Now it ignores the unresolved include
without error and continues reading the rest of the file.
Already cherry-picked commits should not be re-included. If all changes
included in a commit exist in the upstream, then we should error with
GIT_EAPPLIED.
`git_rebase_next` will apply the next patch (or cherry-pick)
operation, leaving the results checked out in the index / working
directory so that consumers can resolve any conflicts, as appropriate.
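The intended driver loop looks roughly like this (signatures as in
current libgit2; they may have differed when this was introduced):

#include <git2.h>

static int run_rebase(git_rebase *rebase, const git_signature *committer)
{
	git_rebase_operation *op;
	git_oid commit_id;
	int error;

	while ((error = git_rebase_next(&op, rebase)) == 0) {
		/* The patch is now applied to the index / working directory;
		 * resolve any conflicts here before committing. */
		if ((error = git_rebase_commit(&commit_id, rebase, NULL,
				committer, NULL, NULL)) < 0)
			return error;
	}

	if (error != GIT_ITEROVER)
		return error;

	return git_rebase_finish(rebase, committer);
}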
Remote objects are not meant to be changed from under the user. We did
this in rename, but changed only the name and left the refspecs, such
that
save would save the wrong refspecs (and a fetch and anything else would
use the wrong refspecs).
Instead, let's simply take a name and not change any loaded remote from
under the user.
This function does not in fact tell us anything, as almost anything with
a colon in it is a valid rsync-style SSH path; it cannot tell us that
we do not support ftp or afp or similar, as those are still valid SSH
paths, which we do support.
All versions of SSL are considered deprecated now, so let's ask OpenSSL
to only use TLSv1. We still ask it to load those ciphers for
compatibility with servers which want to use an older hello but will use
TLS for encryption.
For good measure we also disable compression, which can be exploitable,
if the OpenSSL version supports it.
Git for Windows will handle UNC paths only when in forward-slash
format, eg "//server/path". When given a UNC path as a remote,
rewrite standard format ("\\server\path") into this ridiculous
format.
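The rewrite itself is a simple separator substitution; for
illustration:

#include <string.h>

/* Rewrite "\\server\path" into the "//server/path" form that
 * Git for Windows accepts as a remote URL. */
static void unc_to_forward_slashes(char *path)
{
	char *p;

	if (strncmp(path, "\\\\", 2) != 0)
		return;  /* not a UNC path */

	for (p = path; *p; p++)
		if (*p == '\\')
			*p = '/';
}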
The entry_count field is the number of index entries covered by a
particular cache entry, that is, how many files there are (recursively)
under a particular directory.
The current code that attempts to compute this is severely deficient:
it tries to count the number of children, which always comes up as zero.
We don't even need to recount, since we have the information during the
cache creation. We can take that number and keep it, as we only ever
invalidate or replace.
FindFirstFile will fail with INVALID_HANDLE_VALUE if there are no
children to the given path, which can happen if the given path is a
file (and obviously has no children) or if the given path is an empty
mount point. (Most directories have at least directory entries '.'
and '..', but ridiculously another volume mounted in another drive
letter's path space does not, and thus has nothing to enumerate.)
If FindFirstFile fails, check if this is a directory-like thing
(a mount point).
A reparse point that is an IO_REPARSE_TAG_MOUNT_POINT could be
a junction or an actual filesystem mount point. (Who knew?)
If it's the latter, its reparse point will report the actual
volume information \??\Volume{GUID}\ and we should not attempt
to dereference that further, instead readlink should report
EINVAL since it's not a symlink / junction and its original
path was canonical.
Yes, really.
An obvious place to fill the tree cache is on write-tree, as we're
guaranteed to be able to fill in the whole tree cache.
The way this commit does this is not the most efficient, as we read the
root tree from the odb instead of filling in the cache as we go along,
but it fills the cache such that successive operations (and persisting
the index to disk) will be able to take advantage of the cache, and it
reuses the code we already have for filling the cache.
Filling in the cache as we create the trees would require some
reallocation of the children vector, which is currently not possible
with our pool implementation. A different data structure would likely
allow us to perform this operation at a later date.
Keeping the cache around after read-tree is only one part of the
optimisation opportunities. In order to share the cache between program
instances, we need to write the TREE extension to the index.
Do so, taking the opportunity to rename 'entries' to 'entry_count' to
match the name given in the format description. The included test is
rather trivial, but works as a sanity check.
When reading from a tree, we know what every tree is going to look like,
so we can fill in the tree cache completely, making use of the index for
modification of trees a lot quicker.
This simplifies freeing the entries quite a bit; though there aren't
that many failure paths right now, introducing filling the cache from a
tree will introduce more. This makes sure not to leak memory on errors.