The clock_gettime() function is normally not available under
AmigaOS, hence another solution is required. We are using now
GetUpTime() that is present in current versions of this
operating system.
The openssl setup function needs to be GIT_EXPORT'ed, be sure
to include the `sys/openssl.h` header so that it is appropriately
decorated as an export function.
A byte value of 0x85 is not whitespace, we were conflating that with
U+0085 (UTF8: 0xc2 0x85). This caused us to incorrectly treat valid
multibyte characters like U+88C5 (UTF8: 0xe8 0xa3 0x85) as whitespace.
On a case-insensitive filesystem, we need to deal with case-changing
renames (eg, foo -> FOO) by removing the old and adding the new,
exactly as if we were on a case-sensitive filesystem.
Update the `checkout::tree::can_cancel_checkout_from_notify` test, now
that notifications are always sent case sensitively.
For the REUC and NAME entries, we use size_t internally, and we take
size_t for the get_byindex() functions, but the entrycount() functions
strangely cast to an unsigned int instead.
This introduces the functionality of submodule update in
'git_submodule_do_update'. The existing 'git_submodule_update' function is
renamed to 'git_submodule_update_strategy'. The 'git_submodule_update'
function now refers to functionality similar to `git submodule update`,
while `git_submodule_update_strategy` is used to get the configured value
of submodule.<name>.update.
Path validation may be influenced by `core.protectHFS` and
`core.protectNTFS` configuration settings, thus treebuilders
can take a repository to influence their configuration.
HFS filesystems ignore some characters like U+200C. When these
characters are included in a path, they will be ignored for the
purposes of comparison with other paths. Thus, if you have a ".git"
folder, a folder of ".git<U+200C>" will also match. Protect our
".git" folder by ensuring that ".git<U+200C>" and friends do not match it.
Disallow:
1. paths with trailing dot
2. paths with trailing space
3. paths with trailing colon
4. paths that are 8.3 short names of .git folders ("GIT~1")
5. paths that are reserved path names (COM1, LPT1, etc).
6. paths with reserved DOS characters (colons, asterisks, etc)
These paths would (without \\?\ syntax) be elided to other paths - for
example, ".git." would be written as ".git". As a result, writing these
paths literally (using \\?\ syntax) makes them hard to operate with from
the shell, Windows Explorer or other tools. Disallow these.
When turning UTF-8 paths into UCS-2 paths for Windows, always use
the \\?\-prefixed paths. Because this bypasses the system's
path canonicalization, handle the canonicalization functions ourselves.
We must:
1. always use a backslash as a directory separator
2. only use a single backslash between directories
3. not rely on the system to translate "." and ".." in paths
4. remove trailing backslashes, except at the drive root (C:\)
This option does not get persisted to disk, which makes it different
from the rest of the setters. Remove it until we go all the way.
We still respect the configuration option, and it's still possible to
perform a one-time prune by calling the function.
For each remote-tracking branch we want to remove, we need to consider
it against every other refspec in case we have overlapping refspecs,
such as with
refs/heads/*:refs/remotes/origin/*
refs/pull/*/head:refs/remotes/origin/pr/*
as we'd otherwise remove too many refspecs.
Create a list of condidates, which are the references matching the rhs
of any active refspec and then filter that list by removing those
entries for which we find a remove reference with any active
refspec. Those which are left after this are removed.
Most of the network-facing facilities have been copied to the socket and
openssl streams. No code now uses these functions directly anymore, so
we can now remove them.
Having an ssh stream would require extra work for stream capabilities we
don't need anywhere else (oob auth and command execution) so for now
let's move away from the gitno connection to use socket_stream.
We can introduce an ssh stream interface if and as we need it.
This unfortunately isn't as stackable as could be possible, as it
hard-codes the socket stream. This is because the method of using a
custom openssl BIO is not clear, and we do not need this for now. We can
still bring this in if and as we need it.
We currently have gitno for talking over TCP, but this needs to know
about both plaintext and OpenSSL connections and the code has gotten
somewhat messy with ifdefs determining which version of the function
should be called.
In order to clean this up and abstract away the details of sending over
the different types of streams, we can instead use an interface and
stack stream implementations.
We may not be able to use the stackability with all streams, but we
are definitely be able to use the abstraction which is currently spread
between different bits of gitno.
A rule can only negate something which was explicitly mentioned in the
rules before it. Change our parsing to ignore a negative rule which does
not negate something mentioned in the rules above it.
While here, fix a wrong allocator usage. The memory for the match string
comes from pool allocator. We must not free it with the general
allocator. We can instead simply forget the string and it will be
cleaned up.
We no longer have NULL strings, but empty ones and duplicate the sides
if necessar, so the first check will never do anything.
While in the area, remove unnecessary ifs and early returns.
There are some combination of objects and target types which we know
cannot be fulfilled. Return EINVALIDSPEC for those to signify that there
is a mismatch in the user-provided data and what the object model is
capable of satisfying.
If we start at a tag and in the course of peeling find out that we
cannot reach a particular type, we return EPEEL.
That's a bad assumption to make, even though right now it holds
(because of the way we've implemented decompression of packfiles),
this may change in the future, given that ODB objects can be
binary data.
Furthermore, the ODB object can return a NULL pointer if the object
is empty. Copying the NULL pointer to the strbuf lets us handle it
like an empty string. Again, the NULL pointer is valid behavior because
you're supposed to check the *size* of the object before working
on it.
This is a contract that we made in the library and that we need to uphold. The
contents of a blob can never be NULL because several parts of the library (including
the filter and attributes code) expect `git_blob_rawcontent` to always return a
valid pointer.
When we fetch twice with the same remote object, we did not properly
clear the connection flags, so we would leak state from the last
connection.
This can cause the second fetch with the same remote object to fail if
using a HTTP URL where the server redirects to HTTPS, as the second
fetch would see `use_ssl` set and think the initial connection wanted to
downgrade the connection.
When attempting to update a reference on a remote during push, and the
reference on the remote refers to a commit that does not exist locally,
then we should report a more clear error message.
When creating a new remote, contrary to loading one from disk,
active_refspecs was not populated. This means that if using the new
remote to push, git_push_update_tips() will be a no-op since it
checks the refspecs passed during the push against the base ones
i.e. active_refspecs. And therefore the local refs won't be created
or updated after the push operation.
There is one well-known and well-tested parser which we should use,
instead of implementing parsing a second time.
The common parser is also augmented to copy the LHS into the RHS if the
latter is empty.
The expressions test had to change a bit, as we now catch a bad RHS of a
refspec locally.
This function, similar in style to git_remote_fetch(), performs all the
steps required for a push, with a similar interface.
The remote callbacks struct has learnt about the push callbacks, letting
us set the callbacks a single time instead of setting some in the remote
and some in the push operation.
This describes their purpose better, as we now initialize ssl and some
other global stuff in there. Calling the init function is not something
which has been optional for a while now.
git hardocodes these as objects which exist regardless of whether they
are in the odb and uses them in the shell interface as a way of
expressing the lack of a blob or tree for one side of e.g. a diff.
In the library we use each language's natural way of declaring a lack of
value which makes a workaround like this unnecessary. Since git uses it,
it does however mean each shell application would need to perform this
check themselves.
This makes it common work across a range of applications and an issue
with compatibility with git, which fits right into what the library aims
to provide.
Thus we introduce the hard-coded empty blob and tree in the odb
frontend. These hard-coded objects are checked for before going to the
backends, but after the cache check, which means the second time they're
used, they will be treated as normal cached objects instead of creating
new ones.
If the remote is anonymous, then we cannot check for any configuration,
as there is no name. Check for this before we try to use the name, which
may be a NULL pointer.
This fixes#2697.
This function has one output but can match multiple files, which can be
unexpected for the user, which would usually path the exact path of the
file he wants the status of.
We cannot know from looking at .gitmodules whether a directory is a
submodule or not. We need the index or tree we are comparing against to
tell us. Otherwise we have to assume the entry in .gitmodules is stale
or otherwise invalid.
Thus we pass the index of the repository into the workdir iterator, even
if we do not want to compare against it. This follows what git does,
which even for `git diff <tree>`, it will consider staged submodules as
such.
A rule "src" in src/.gitignore must only match subdirectories of
src/. The current code does not include this context in the match rule
and would thus consider this rule to match the top-level src/ directory
instead of the intended src/src/.
Keep track fo the context in which the rule was defined so we can
perform a prefix match.
We currently consider CR to start the end of the line, but that means
that we miss cases with CR CR LF which can be used with git to match
files whose names have CR at the end of their names.
The fix from the patch comes from Russell's comment in the issue.
This fixes#2536.
Before trying to rtransform using the given refspec to figure out what
the name of the upstream branch is on the remote, we must make sure that
the target of the refspec applies to the current branch's upstream.
* Error-handling is cleaned up to only let a file-not-found error
through, not other sorts of errors. And when a file-not-found
error happens, we clean up the error.
* Test now checks that file-not-found introduces no error. And
other minor cleanups.
The create function with default refspec is the same as the one with a
custom refspec, but it has the default refspec, so we can create the one
on top of the other.
When we first ask OpenSSL to verify the certfiicate itself (rather
than the HTTPS specifics), we should also return
GIT_ECERTIFICATE. Otherwise, the caller would consider this as a failed
operation rather than a failed validation and not call the user's own
validation.
For example, if you have
[include]
path = foo
and foo didn't exist, git_config_open_ondisk() would just give up
on the rest of the file. Now it ignores the unresolved include
without error and continues reading the rest of the file.
Already cherry-picked commits should not be re-included. If all changes
included in a commit exist in the upstream, then we should error with
GIT_EAPPLIED.
`git_rebase_next` will apply the next patch (or cherry-pick)
operation, leaving the results checked out in the index / working
directory so that consumers can resolve any conflicts, as appropriate.
Remote objects are not meant to be changed from under the user. We did
this in rename, but only the name and left the refspecs, such that a
save would save the wrong refspecs (and a fetch and anything else would
use the wrong refspecs).
Instead, let's simply take a name and not change any loaded remote from
under the user.
This function does not in fact tell us anything, as almost anything with
a colon in it is a valid rsync-style SSH path; it can not tell us that
we do not support ftp or afp or similar as those are still valid SSH
paths and we do support that.
All versions of SSL are considered deprecated now, so let's ask OpenSSl
to only use TLSv1. We still ask it to load those ciphers for
compatibility with servers which want to use an older hello but will use
TLS for encryption.
For good measure we also disable compression, which can be exploitable,
if the OpenSSL version supports it.
Git for Windows will handle UNC paths only when in forward-slash
format, eg "//server/path". When given a UNC path as a remote,
rewrite standard format ("\\server\path") into this ridiculous
format.
The entry_count field is the amount of index entries covered by a
particular cache entry, that is how many files are there (recursively)
under a particular directory.
The current code that attemps to do this is severely defincient and is
trying to count the amount of children, which always comes up to zero.
We don't even need to recount, since we have the information during the
cache creation. We can take that number and keep it, as we only ever
invalidate or replace.
FindFirstFile will fail with INVALID_HANDLE_VALUE if there are no
children to the given path, which can happen if the given path is a
file (and obviously has no children) or if the given path is an empty
mount point. (Most directories have at least directory entries '.'
and '..', but ridiculously another volume mounted in another drive
letter's path space do not, and thus have nothing to enumerate.)
If FindFirstFile fails, check if this is a directory-like thing
(a mount point).
A reparse point that is an IO_REPARSE_TAG_MOUNT_POINT could be
a junction or an actual filesystem mount point. (Who knew?)
If it's the latter, its reparse point will report the actual
volume information \??\Volume{GUID}\ and we should not attempt
to dereference that further, instead readlink should report
EINVAL since it's not a symlink / junction and its original
path was canonical.
Yes, really.
An obvious place to fill the tree cache is on write-tree, as we're
guaranteed to be able to fill in the whole tree cache.
The way this commit does this is not the most efficient, as we read the
root tree from the odb instead of filling in the cache as we go along,
but it fills the cache such that successive operations (and persisting
the index to disk) will be able to take advantage of the cache, and it
reuses the code we already have for filling the cache.
Filling in the cache as we create the trees would require some
reallocation of the children vector, which is currently not possible
with out pool implementation. A different data structure would likely
allow us to perform this operation at a later date.
Keeping the cache around after read-tree is only one part of the
optimisation opportunities. In order to share the cache between program
instances, we need to write the TREE extension to the index.
Do so, taking the opportunity to rename 'entries' to 'entry_count' to
match the name given in the format description. The included test is
rather trivial, but works as a sanity check.
When reading from a tree, we know what every tree is going to look like,
so we can fill in the tree cache completely, making use of the index for
modification of trees a lot quicker.
This simplifies freeing the entries quite a bit; though there aren't
that many failure paths right now, introducing filling the cache from a
tree will introduce more. This makes sure not to leak memory on errors.
If there have been no pushes, we can immediately return ITEROVER. If
there have been no hides, we must not run the uninteresting pre-mark
phase, as we do not want to hide anything and this would simply cause us
to spend time loading objects.
This introduces a phase at the start of preparing a walk which pre-marks
uninteresting commits, but only up to the common ancestors.
We do this in a similar way to git, by walking down the history and
marking (which is what we used to do), but we keep a time-sorted
priority queue of commits and stop marking as soon as there are only
uninteresting commits in this queue.
This is a similar rule to the one used to find the merge-base. As we
keep inserting commits regardless of the uninteresting bit, if there are
only uninteresting commits in the queue, it means we've run out of
interesting commits in our walk, so we can stop.
The old mark_unintesting() logic is still in place, but that stops
walking if it finds an already-uninteresting commit, so it will stop on
the ones we've pre-marked; but keeping it allows us to also hide those
that are hidden via the callback.
We don't need the remote loaded, and the function extracted both of
these from the git_remote in order to do its work, so let's remote a
step and not ask for the loaded remote at all.
This fixes#2390.
The stash is implemented as the refs/stash reference and its reflog. In
order to modify the reflog, we need avoid races by making sure we're the
only ones allowed to modify the reflog.
We achieve this via the transactions API. Locking the reference gives us
exclusive write access, letting us modify and write it without races.
A transaction allows you to lock multiple references and set up changes
for them before applying the changes all at once (or as close as the
backend supports).
This can be used for replication purposes, or for making sure some
operations run when the reference is locked and thus cannot be changed.
When a list of refspecs is passed to fetch (what git would consider
refspec passed on the command-line), we not only need to perform the
updates described in that refspec, but also update the remote-tracking
branch of the fetched remote heads according to the remote's configured
refspecs.
These "fetches" are not however to be written to FETCH_HEAD as they
would be duplicate data, and it's not what the user asked for.
The configured/base fetch refspecs need to be taken into account in
order to implement opportunistic remote-tracking branch updates. DWIM
them and store them in the struct, but don't do anything with them yet.
We can only DWIM when we've connected to the remote and have the list of
the remote's references. Adding or setting the refspecs should not
trigger an attempt to DWIM the refspecs as we typically cannot do it,
and even if we did, we would not use them for the current fetch.
With opportunistic ref updates, git has introduced the concept of having
base refspecs *and* refspecs that are active for a particular fetch.
Let's start by letting the user override the refspecs for download.
When we describe the workdir, we perform a describe on HEAD and then
check to see if the worktree is dirty. If it is and we have a suffix
string, we append that to the buffer.
Instead of printing out to the buffer inside the information-gathering
phase, write the data to a intermediate result structure.
This allows us to split the options into gathering options and
formatting options, simplifying the gathering code.
The getaddrinfo function indicates failure with a non-zero return code,
but this code is not necessarily negative. On platforms like Android
where the code is positive, a failed call causes libgit2 to segfault.
Instead of spreading the data in function arguments, some of which
aren't used for ssh and having a struct only for ssh, use a struct for
both, using a common parent to pass to the callback.
This option make it easy to ignore anything about the server we're
connecting to, which is bad security practice. This was necessary as we
didn't use to expose detailed information about the certificate, but now
that we do, we should get rid of this.
If the user wants to ignore everything, they can still provide a
callback which ignores all the information passed.
We should let the user decide whether to cancel the connection or not
regardless of whether our checks have decided that the certificate is
fine. We provide our own assessment to the callback to let the user fall
back to our checks if they so desire.
If the certificate validation fails (or always in the case of ssh),
let the user decide whether to allow the connection.
The data structure passed to the user is the native certificate
information from the underlying implementation, namely OpenSSL or
WinHTTP.
When using a bare repo with an index, libgit2 attempts to read
files from the index. It caches those files based on the path
to the file, specifically the path to the directory that contains
the file.
If there is no working directory, we use `git_path_dirname_r` to
get the path to the containing directory. However, for the
`.gitattributes` file in the root of the repository, this ends up
normalizing the containing path to `"."` instead of the empty
string and the lookup the `.gitattributes` data fails.
This adds a test of attribute lookups on bare repos and also
fixes the problem by simply rewriting `"."` to be `""`.
Passing 0 as the length of the paths to check to git_diff_index_to_workdir
results in all files being treated as conflicting, that is, all untracked or
modified files in the worktree is reported as conflicting
A signature is made up of a non-empty name and a non-empty email so
let's validate that. This also brings us more in line with git, which
also rejects ident with an empty email.
The compiler was generating a bunch of warnings for
git_mutex_init and git_mutex_lock when GIT_THREADS
was not defined (i.e. when not using -DTHREADSAFE=ON).
Also remove an unused variable from tests/path/core.c.
When the call to the agent fails, we must retrieve the error message
just after the function call, as other calls may overwrite it.
As the agent authentication is the only one which has a teardown and
there does not seem to be a way to get the error message from a stored
error number, this tries to introduce some small changes to store the
error from the agent.
Clearing the error at the beginning of the loop lets us know whether the
agent has already set the libgit2 error message and we should skip it,
or if we should set it.
Teach git_repository_init_ext to use relative paths for the gitlink
to the work directory. This is used when creating a sub repository
where the sub repository resides in the parent repository's
.git directory.
When the fetch refspec does not include the remote's default branch, it
indicates an error in user expectations or programmer error. Error out
in that case.
This lets us get rid of the dummy refspec which can never work as its
zeroed out. In the cases where we did not find a default branch, we set
HEAD detached immediately, which lets us refactor the "normal" path,
removing `found_branch`.
It does the same as git_remote_supported_url() but has a name which
implies we'd check the URL for correctness while we're simply looking at
the scheme and looking it up in our lists.
While here, fix up the tests so we check all the combination of what's
supported.
The previous commit makes it harder to figure out if the library was
built with support for a particular transport. Roll back some of the
changes and remove ssh:// and https:// from the list if we're being
built without support for them.
Even when built without a SSH support, we know about this transport. It
is implemented, but the current code makes us return an error message
saying it's not.
This is a leftover from the initial implementation of the transports
when there were in fact transports we knew about but were not
implemented.
Instead, let the SSH transport itself say it cannot run, the same as we
do for HTTPS.
A repository can have any number of references which we're not
interested in such as notes or tags. For the default branch calculation
we only care about branches. Make the decision about the number of
branches rather than the number of refs in general.
When we see PARENT1, it means there is a local commit and thus we are
ahead. Likewise, seeing PARENT2 means that the upstream branch has a
commit and we are one more behind.
The logic is currently reversed. Correct it.
This fixes#2501.
W/o this patch it is not possible to have a third party ssh transport_cb if GIT_SSH is disabled or a third party transport_cb which has a higher priority than the default one.
Signed-off-by: Sven Strickroth <email@cs-ware.de>
The callers of git_packfile_unpack() expect the obj_offset argument to
be set to the beginning of the next object. We were mistakenly returning
the the offset of the object's data, which causes the CRC function to
try to use the wrong offset.
Set obj_offset to curpos instead of elem->offset to point to the next
element and bring back expected behaviour.
Our mkdir helper was failing is a parent directory was not
accessible even if the child directory could be created.
This changes the helper to keep trying child directories
even when the parent is unwritable.
The old `allocfmt` is of no use to callers, as they are not able to free
the returned buffer. Export a new API that returns a static string that
doesn't need to be freed.
The recv buffer (parse_buffer) and the buffer have independent sizes and
offsets. We try to fill in parse_buffer as much as possible before
passing it to the http parser. This is fine most of the time, but fails
us when the buffer is almost full.
In those situations, parse_buffer can have more data than we would be
able to put into the buffer (which may be getting full if we're towards
the end of a data sideband packet).
To work around this, we check if the space we have left on our buffer is
smaller than what could come from the network. If this happens, we make
parse_buffer think that it has as much space left as our buffer, so it
won't try to retrieve more data than we can deal with.
As the start of the data may no longer be at the start of the buffer, we
need to keep track of where it really starts (data_offset) and use that
in our calculations for the real size of the data we received from the
network.
This fixes#2518.
* Move the transport registration mechanisms into a new header under
'sys/' because this is advanced stuff.
* Remove the 'priority' argument from the registration as it adds
unnecessary complexity. (Since transports cannot decline to operate,
only the highest priority transport is ever executed.) Users who
require per-priority transports can implement that in their custom
transport themselves.
* Simplify registration further by taking a scheme (eg "http") instead
of a prefix (eg "http://").
In the check for multiline, we traverse the backslashes from the end
backwards and int the end assert that we haven't gone past the beginning
of the line. We make sure of this in the loop condition, but we also
check in the return value.
However, for certain configurations, a line in a multiline variable
might be empty to aid formatting. In that case, 'end' == 'start', since
we ended up looking at the first char which made it a multiline.
There is no need for the (end > start) check in the return, since the
loop guarantees we won't go further back than the first char in the
line, and we do accept the first char to be the final backslash.
This fixes#2483.
While scanning through a directory hierarchy, this prevents a
positive ignore match on a parent directory from blocking the scan
of a directory when a negative match rule exists for files inside
the directory.
* Removes mingw-compat.h
* Cleans up separation of compiler/platform idiosyncrasies
* Unifies mingw/msvc stat structures and functions
* (Tries to) hide more compiler specific implementation details (even in our internal API)
We always calculate multiple merge bases, but up to now we had only
exposed the "best" merge base.
Introduce git_oidarray which analogously to git_strarray lets us return
multiple ids.
This works around strict aliasing rules letting some versions of
GCC (particularly on RHEL 6) thinking that they can skip updating the
size of the array when calculating the next element's offset.
Preallocating two commits doesn't make much sense as leaving allocation
to the first array usage will allocate a sensible size with room for
growth.
This preallocation has also been hiding issues with strict aliasing in
the tests, as we have fairly simple histories and never trigger the
growth.
git allows you to set which paths to use for the git server programs
when connecting over ssh; and we want to provide something similar.
We do this by providing a factory function which can be set as the
remote's transport callback which will set the given paths upon
creation.
We used to assume a refspec would only have an asterisk in the middle of
their respective pattern. This has not been a valid assumption for some
time now with git.
Instead of assuming where the asterisk is going to be, change the logic
to treat each pattern as having two halves with a replacement bit in the
middle, where the asterisk is.
When transforming a non-pattern refspec, we simply need to copy over the
opposite string. Move that logic up to the wrapper so we can assume a
pattern refspec in the transformation function.
Move the definition of git_thread_yield() to the test which needs it and
add the correct definition for it for FreeBSD and derivatives.
Original patch adding FreeBSD and derivatives by @jacquesg.
In order to connect to a remote server, we need to provide a path to the
repository we're interested in. Consider the lack of path in the url an
error.