For each remote-tracking branch we want to remove, we need to consider
it against every other refspec in case we have overlapping refspecs,
such as with
refs/heads/*:refs/remotes/origin/*
refs/pull/*/head:refs/remotes/origin/pr/*
as we'd otherwise remove too many refspecs.
Create a list of condidates, which are the references matching the rhs
of any active refspec and then filter that list by removing those
entries for which we find a remove reference with any active
refspec. Those which are left after this are removed.
Most of the network-facing facilities have been copied to the socket and
openssl streams. No code now uses these functions directly anymore, so
we can now remove them.
Having an ssh stream would require extra work for stream capabilities we
don't need anywhere else (oob auth and command execution) so for now
let's move away from the gitno connection to use socket_stream.
We can introduce an ssh stream interface if and as we need it.
This unfortunately isn't as stackable as could be possible, as it
hard-codes the socket stream. This is because the method of using a
custom openssl BIO is not clear, and we do not need this for now. We can
still bring this in if and as we need it.
We currently have gitno for talking over TCP, but this needs to know
about both plaintext and OpenSSL connections and the code has gotten
somewhat messy with ifdefs determining which version of the function
should be called.
In order to clean this up and abstract away the details of sending over
the different types of streams, we can instead use an interface and
stack stream implementations.
We may not be able to use the stackability with all streams, but we
are definitely be able to use the abstraction which is currently spread
between different bits of gitno.
A rule can only negate something which was explicitly mentioned in the
rules before it. Change our parsing to ignore a negative rule which does
not negate something mentioned in the rules above it.
While here, fix a wrong allocator usage. The memory for the match string
comes from pool allocator. We must not free it with the general
allocator. We can instead simply forget the string and it will be
cleaned up.
We no longer have NULL strings, but empty ones and duplicate the sides
if necessar, so the first check will never do anything.
While in the area, remove unnecessary ifs and early returns.
There are some combination of objects and target types which we know
cannot be fulfilled. Return EINVALIDSPEC for those to signify that there
is a mismatch in the user-provided data and what the object model is
capable of satisfying.
If we start at a tag and in the course of peeling find out that we
cannot reach a particular type, we return EPEEL.
That's a bad assumption to make, even though right now it holds
(because of the way we've implemented decompression of packfiles),
this may change in the future, given that ODB objects can be
binary data.
Furthermore, the ODB object can return a NULL pointer if the object
is empty. Copying the NULL pointer to the strbuf lets us handle it
like an empty string. Again, the NULL pointer is valid behavior because
you're supposed to check the *size* of the object before working
on it.
This is a contract that we made in the library and that we need to uphold. The
contents of a blob can never be NULL because several parts of the library (including
the filter and attributes code) expect `git_blob_rawcontent` to always return a
valid pointer.
When we fetch twice with the same remote object, we did not properly
clear the connection flags, so we would leak state from the last
connection.
This can cause the second fetch with the same remote object to fail if
using a HTTP URL where the server redirects to HTTPS, as the second
fetch would see `use_ssl` set and think the initial connection wanted to
downgrade the connection.
When attempting to update a reference on a remote during push, and the
reference on the remote refers to a commit that does not exist locally,
then we should report a more clear error message.
When creating a new remote, contrary to loading one from disk,
active_refspecs was not populated. This means that if using the new
remote to push, git_push_update_tips() will be a no-op since it
checks the refspecs passed during the push against the base ones
i.e. active_refspecs. And therefore the local refs won't be created
or updated after the push operation.
There is one well-known and well-tested parser which we should use,
instead of implementing parsing a second time.
The common parser is also augmented to copy the LHS into the RHS if the
latter is empty.
The expressions test had to change a bit, as we now catch a bad RHS of a
refspec locally.
This function, similar in style to git_remote_fetch(), performs all the
steps required for a push, with a similar interface.
The remote callbacks struct has learnt about the push callbacks, letting
us set the callbacks a single time instead of setting some in the remote
and some in the push operation.
This describes their purpose better, as we now initialize ssl and some
other global stuff in there. Calling the init function is not something
which has been optional for a while now.
git hardocodes these as objects which exist regardless of whether they
are in the odb and uses them in the shell interface as a way of
expressing the lack of a blob or tree for one side of e.g. a diff.
In the library we use each language's natural way of declaring a lack of
value which makes a workaround like this unnecessary. Since git uses it,
it does however mean each shell application would need to perform this
check themselves.
This makes it common work across a range of applications and an issue
with compatibility with git, which fits right into what the library aims
to provide.
Thus we introduce the hard-coded empty blob and tree in the odb
frontend. These hard-coded objects are checked for before going to the
backends, but after the cache check, which means the second time they're
used, they will be treated as normal cached objects instead of creating
new ones.
If the remote is anonymous, then we cannot check for any configuration,
as there is no name. Check for this before we try to use the name, which
may be a NULL pointer.
This fixes#2697.
This function has one output but can match multiple files, which can be
unexpected for the user, which would usually path the exact path of the
file he wants the status of.
We cannot know from looking at .gitmodules whether a directory is a
submodule or not. We need the index or tree we are comparing against to
tell us. Otherwise we have to assume the entry in .gitmodules is stale
or otherwise invalid.
Thus we pass the index of the repository into the workdir iterator, even
if we do not want to compare against it. This follows what git does,
which even for `git diff <tree>`, it will consider staged submodules as
such.
A rule "src" in src/.gitignore must only match subdirectories of
src/. The current code does not include this context in the match rule
and would thus consider this rule to match the top-level src/ directory
instead of the intended src/src/.
Keep track fo the context in which the rule was defined so we can
perform a prefix match.
We currently consider CR to start the end of the line, but that means
that we miss cases with CR CR LF which can be used with git to match
files whose names have CR at the end of their names.
The fix from the patch comes from Russell's comment in the issue.
This fixes#2536.
Before trying to rtransform using the given refspec to figure out what
the name of the upstream branch is on the remote, we must make sure that
the target of the refspec applies to the current branch's upstream.
* Error-handling is cleaned up to only let a file-not-found error
through, not other sorts of errors. And when a file-not-found
error happens, we clean up the error.
* Test now checks that file-not-found introduces no error. And
other minor cleanups.
The create function with default refspec is the same as the one with a
custom refspec, but it has the default refspec, so we can create the one
on top of the other.
When we first ask OpenSSL to verify the certfiicate itself (rather
than the HTTPS specifics), we should also return
GIT_ECERTIFICATE. Otherwise, the caller would consider this as a failed
operation rather than a failed validation and not call the user's own
validation.
For example, if you have
[include]
path = foo
and foo didn't exist, git_config_open_ondisk() would just give up
on the rest of the file. Now it ignores the unresolved include
without error and continues reading the rest of the file.
Already cherry-picked commits should not be re-included. If all changes
included in a commit exist in the upstream, then we should error with
GIT_EAPPLIED.
`git_rebase_next` will apply the next patch (or cherry-pick)
operation, leaving the results checked out in the index / working
directory so that consumers can resolve any conflicts, as appropriate.
Remote objects are not meant to be changed from under the user. We did
this in rename, but only the name and left the refspecs, such that a
save would save the wrong refspecs (and a fetch and anything else would
use the wrong refspecs).
Instead, let's simply take a name and not change any loaded remote from
under the user.
This function does not in fact tell us anything, as almost anything with
a colon in it is a valid rsync-style SSH path; it can not tell us that
we do not support ftp or afp or similar as those are still valid SSH
paths and we do support that.
All versions of SSL are considered deprecated now, so let's ask OpenSSl
to only use TLSv1. We still ask it to load those ciphers for
compatibility with servers which want to use an older hello but will use
TLS for encryption.
For good measure we also disable compression, which can be exploitable,
if the OpenSSL version supports it.
Git for Windows will handle UNC paths only when in forward-slash
format, eg "//server/path". When given a UNC path as a remote,
rewrite standard format ("\\server\path") into this ridiculous
format.
The entry_count field is the amount of index entries covered by a
particular cache entry, that is how many files are there (recursively)
under a particular directory.
The current code that attemps to do this is severely defincient and is
trying to count the amount of children, which always comes up to zero.
We don't even need to recount, since we have the information during the
cache creation. We can take that number and keep it, as we only ever
invalidate or replace.
FindFirstFile will fail with INVALID_HANDLE_VALUE if there are no
children to the given path, which can happen if the given path is a
file (and obviously has no children) or if the given path is an empty
mount point. (Most directories have at least directory entries '.'
and '..', but ridiculously another volume mounted in another drive
letter's path space do not, and thus have nothing to enumerate.)
If FindFirstFile fails, check if this is a directory-like thing
(a mount point).
A reparse point that is an IO_REPARSE_TAG_MOUNT_POINT could be
a junction or an actual filesystem mount point. (Who knew?)
If it's the latter, its reparse point will report the actual
volume information \??\Volume{GUID}\ and we should not attempt
to dereference that further, instead readlink should report
EINVAL since it's not a symlink / junction and its original
path was canonical.
Yes, really.
An obvious place to fill the tree cache is on write-tree, as we're
guaranteed to be able to fill in the whole tree cache.
The way this commit does this is not the most efficient, as we read the
root tree from the odb instead of filling in the cache as we go along,
but it fills the cache such that successive operations (and persisting
the index to disk) will be able to take advantage of the cache, and it
reuses the code we already have for filling the cache.
Filling in the cache as we create the trees would require some
reallocation of the children vector, which is currently not possible
with out pool implementation. A different data structure would likely
allow us to perform this operation at a later date.
Keeping the cache around after read-tree is only one part of the
optimisation opportunities. In order to share the cache between program
instances, we need to write the TREE extension to the index.
Do so, taking the opportunity to rename 'entries' to 'entry_count' to
match the name given in the format description. The included test is
rather trivial, but works as a sanity check.
When reading from a tree, we know what every tree is going to look like,
so we can fill in the tree cache completely, making use of the index for
modification of trees a lot quicker.
This simplifies freeing the entries quite a bit; though there aren't
that many failure paths right now, introducing filling the cache from a
tree will introduce more. This makes sure not to leak memory on errors.
If there have been no pushes, we can immediately return ITEROVER. If
there have been no hides, we must not run the uninteresting pre-mark
phase, as we do not want to hide anything and this would simply cause us
to spend time loading objects.
This introduces a phase at the start of preparing a walk which pre-marks
uninteresting commits, but only up to the common ancestors.
We do this in a similar way to git, by walking down the history and
marking (which is what we used to do), but we keep a time-sorted
priority queue of commits and stop marking as soon as there are only
uninteresting commits in this queue.
This is a similar rule to the one used to find the merge-base. As we
keep inserting commits regardless of the uninteresting bit, if there are
only uninteresting commits in the queue, it means we've run out of
interesting commits in our walk, so we can stop.
The old mark_unintesting() logic is still in place, but that stops
walking if it finds an already-uninteresting commit, so it will stop on
the ones we've pre-marked; but keeping it allows us to also hide those
that are hidden via the callback.
We don't need the remote loaded, and the function extracted both of
these from the git_remote in order to do its work, so let's remote a
step and not ask for the loaded remote at all.
This fixes#2390.
The stash is implemented as the refs/stash reference and its reflog. In
order to modify the reflog, we need avoid races by making sure we're the
only ones allowed to modify the reflog.
We achieve this via the transactions API. Locking the reference gives us
exclusive write access, letting us modify and write it without races.
A transaction allows you to lock multiple references and set up changes
for them before applying the changes all at once (or as close as the
backend supports).
This can be used for replication purposes, or for making sure some
operations run when the reference is locked and thus cannot be changed.
When a list of refspecs is passed to fetch (what git would consider
refspec passed on the command-line), we not only need to perform the
updates described in that refspec, but also update the remote-tracking
branch of the fetched remote heads according to the remote's configured
refspecs.
These "fetches" are not however to be written to FETCH_HEAD as they
would be duplicate data, and it's not what the user asked for.
The configured/base fetch refspecs need to be taken into account in
order to implement opportunistic remote-tracking branch updates. DWIM
them and store them in the struct, but don't do anything with them yet.
We can only DWIM when we've connected to the remote and have the list of
the remote's references. Adding or setting the refspecs should not
trigger an attempt to DWIM the refspecs as we typically cannot do it,
and even if we did, we would not use them for the current fetch.
With opportunistic ref updates, git has introduced the concept of having
base refspecs *and* refspecs that are active for a particular fetch.
Let's start by letting the user override the refspecs for download.
When we describe the workdir, we perform a describe on HEAD and then
check to see if the worktree is dirty. If it is and we have a suffix
string, we append that to the buffer.
Instead of printing out to the buffer inside the information-gathering
phase, write the data to a intermediate result structure.
This allows us to split the options into gathering options and
formatting options, simplifying the gathering code.
The getaddrinfo function indicates failure with a non-zero return code,
but this code is not necessarily negative. On platforms like Android
where the code is positive, a failed call causes libgit2 to segfault.
Instead of spreading the data in function arguments, some of which
aren't used for ssh and having a struct only for ssh, use a struct for
both, using a common parent to pass to the callback.
This option make it easy to ignore anything about the server we're
connecting to, which is bad security practice. This was necessary as we
didn't use to expose detailed information about the certificate, but now
that we do, we should get rid of this.
If the user wants to ignore everything, they can still provide a
callback which ignores all the information passed.
We should let the user decide whether to cancel the connection or not
regardless of whether our checks have decided that the certificate is
fine. We provide our own assessment to the callback to let the user fall
back to our checks if they so desire.
If the certificate validation fails (or always in the case of ssh),
let the user decide whether to allow the connection.
The data structure passed to the user is the native certificate
information from the underlying implementation, namely OpenSSL or
WinHTTP.
When using a bare repo with an index, libgit2 attempts to read
files from the index. It caches those files based on the path
to the file, specifically the path to the directory that contains
the file.
If there is no working directory, we use `git_path_dirname_r` to
get the path to the containing directory. However, for the
`.gitattributes` file in the root of the repository, this ends up
normalizing the containing path to `"."` instead of the empty
string and the lookup the `.gitattributes` data fails.
This adds a test of attribute lookups on bare repos and also
fixes the problem by simply rewriting `"."` to be `""`.