This is useful to send to the client while we're performing the work.
The reporting function has a force parameter which makes sure that we
do send out the 100% completed message, even if it comes before the
next update window.
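A minimal sketch of such a rate-limited reporter, assuming a
hypothetical report_progress() helper and a one-second update window
(the real function's name and window differ):

    #include <stdio.h>
    #include <time.h>

    static time_t last_report;

    /* Hypothetical throttled reporter: at most one message per second,
     * unless `force` is set, which guarantees the final 100% message
     * goes out even if it arrives before the next update window. */
    static void report_progress(unsigned int current, unsigned int total, int force)
    {
        time_t now = time(NULL);

        if (!force && now - last_report < 1)
            return;                 /* still inside the current window */

        last_report = now;
        printf("Writing objects: %3u%% (%u/%u)%s",
               total ? 100 * current / total : 100,
               current, total, current == total ? ", done.\n" : "\r");
    }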
Instead of copying each object individually, as we'd been doing, use
the packbuilder, which should be faster and give us some feedback.
While performing this change, we can hook up the packbuilder's writing
to the push progress so the caller knows how far along we are.
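A sketch of that direction, using only packbuilder calls from libgit2's
public API (`git_packbuilder_new`, `git_packbuilder_insert_commit`,
`git_packbuilder_foreach`); the actual internal wiring to the push
progress differs:

    #include <git2.h>

    static int pack_write_cb(void *buf, size_t size, void *payload)
    {
        size_t *total_bytes = payload;
        (void)buf;

        /* send `buf` to the client here, then report progress
         * (the reporting call itself is elided in this sketch) */
        *total_bytes += size;
        return 0;
    }

    static int send_pack(git_repository *repo, const git_oid *tip)
    {
        git_packbuilder *pb;
        size_t total_bytes = 0;
        int error;

        if ((error = git_packbuilder_new(&pb, repo)) < 0)
            return error;

        if ((error = git_packbuilder_insert_commit(pb, tip)) == 0)
            error = git_packbuilder_foreach(pb, pack_write_cb, &total_bytes);

        git_packbuilder_free(pb);
        return error;
    }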
We currently look first in the loose object dir and then in the packs
for objects. When performing operations on recent history this has a
higher likelihood of hitting, but when we deal with operations which
look further back into the past, we start spending a large amount of
time getting ENOENT from `access`.
Reversing the priorities means that long-running operations can get to
their objects faster, as we can look at the index data we have in memory
(or rather mapped) to figure out whether we have an object, which is
faster than going out to the filesystem.
The packed backend already implements an optimistic read algorithm by
first looking at the packs we know about and only going out to disk to
refresh if the object is not found, which means that in the case where
we do have the object (which will be the majority of cases for anything
that traverses the graph) we can avoid going to disk entirely to
determine whether an object exists.
Operations which look at recent history may take a slight performance
hit, but these are operations which look at far fewer objects and thus
take less time regardless.
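An illustrative, heavily simplified model of that lookup order (not
libgit2's actual backend code): the pack data already in memory is
consulted before any filesystem access.

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    struct fake_odb {
        const char **packed_ids;    /* ids known from the mapped pack indexes */
        size_t       packed_count;
        const char  *loose_dir;     /* e.g. ".git/objects" */
    };

    static bool object_exists(const struct fake_odb *odb, const char *hex_id)
    {
        char path[4096];
        size_t i;

        /* in-memory (mmap'd) lookup first: no syscalls at all */
        for (i = 0; i < odb->packed_count; i++)
            if (strcmp(odb->packed_ids[i], hex_id) == 0)
                return true;

        /* only on a miss do we hit the filesystem, which is where the
         * ENOENT churn for old history used to come from */
        snprintf(path, sizeof(path), "%s/%.2s/%s",
                 odb->loose_dir, hex_id, hex_id + 2);
        return access(path, F_OK) == 0;
    }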
Changing the base refspecs can be a cause of confusion as to what the
current base refspec set is, and complicates saving the remote's
configuration.
Change `git_remote_add_{fetch,push}()` to update the configuration
instead of an instance.
This finally makes `git_remote_save()` a no-op; it will be removed in a
later commit.
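With the post-change API (the remote identified by name, as in current
libgit2), adding a refspec writes straight to the configuration:

    #include <git2.h>

    static int add_mirror_refspec(git_repository *repo)
    {
        /* The refspec is stored in the repository configuration for
         * "origin" immediately; no remote instance to mutate and no
         * git_remote_save() call needed afterwards. */
        return git_remote_add_fetch(repo, "origin",
                                    "+refs/heads/*:refs/remotes/origin/*");
    }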
While this will rarely be different from the default, having it in the
remote adds yet another setting it has to keep around and can affect its
behaviour. Move it to the options.
Instead of having it set in a different place from every other callback,
put it in the main structure. This removes some state from the remote and
makes it behave more like clone, where the constructors are passed via
the options.
As a first step in removing the repository-saving logic, don't allow
changing the url or push url on a remote object; instead, change the
configuration immediately.
Having the settings be configured separately from the actions that use
them was not a great idea and optimized for the wrong kind of
convenience.
Instead of that, accept either fetch options, push options or the
callbacks when dealing with the remote. The fetch options are currently
only the callbacks, but more options will be moved from setters and
getters on the remote to the options.
This does mean passing the same struct along to the different functions, but
the typical use-case will only call git_remote_fetch() or
git_remote_push() and so won't notice much difference.
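A typical caller after this change looks roughly like the following;
the callbacks live inside `git_fetch_options` rather than being set on
the remote beforehand:

    #include <git2.h>
    #include <stdio.h>

    static int show_progress(const git_transfer_progress *stats, void *payload)
    {
        (void)payload;
        printf("received %u/%u objects\r",
               stats->received_objects, stats->total_objects);
        return 0;
    }

    static int fetch_origin(git_repository *repo)
    {
        git_remote *remote = NULL;
        git_fetch_options opts = GIT_FETCH_OPTIONS_INIT;
        int error;

        opts.callbacks.transfer_progress = show_progress;

        if ((error = git_remote_lookup(&remote, repo, "origin")) < 0)
            return error;

        error = git_remote_fetch(remote, NULL, &opts, "fetch: example");
        git_remote_free(remote);
        return error;
    }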
The push object knows which remote it's associated with, and therefore
does not need to keep its own copy of the callbacks stored in the
remote.
Remove the copy and simply access the callbacks struct within the
remote.
Restricting files to size_t is a silly limitation. The loose backend
writes to a file directly, so there is no issue in using 63 bits for the
size.
We still assume that the header is going to fit in 64 bytes, which does
mean somewhat smaller maximum file sizes due to the run-length encoding,
but that is still a much larger size than you would want Git to handle.
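A quick check of the arithmetic: even the largest 63-bit size is only
19 decimal digits, so a header of the form "<type> <size>\0" stays far
below the 64-byte assumption.

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        char hdr[64];

        /* "commit " (7) + 19 digits + NUL = 27 bytes at most */
        int len = snprintf(hdr, sizeof(hdr), "commit %" PRId64, INT64_MAX) + 1;

        printf("worst-case header is %d bytes\n", len);   /* prints 27 */
        return 0;
    }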
When handling attr matching, simply compare the directory path where the
attribute file resides to the path being matched. Skip over commonality
to allow us to compare the contents of the attribute file to the remainder
of the path.
This allows us to more easily compare the pattern directly to the path,
instead of trying to guess whether we want to compare the path's basename
or the full path based on whether the match was inside a containing
directory or not.
This also allows us to do fewer translations on the pattern (trying to
re-prefix it.)
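Illustrative only (not the actual attr code): skip the prefix that the
path shares with the attribute file's directory, then match against the
remainder instead of re-prefixing the pattern.

    #include <string.h>

    static const char *remainder_after(const char *attr_dir, const char *path)
    {
        size_t len = strlen(attr_dir);

        /* e.g. attr_dir "src", path "src/util/buf.c" -> "util/buf.c" */
        if (len > 0 && strncmp(path, attr_dir, len) == 0 && path[len] == '/')
            return path + len + 1;

        return path;    /* attribute file lives in the root */
    }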
When determining whether some file matches an attr pattern, do
not try to truncate the path to pass to fnmatch. When there is
no containing directory for an item (eg, from a .gitignore in the
root) this will cause us to truncate our path, which means that
we cannot do meaningful comparisons on it and we may have false
positives when trying to determine whether a given file is actually
a file or a folder (as we have lost the path's base information.)
This mangling was to allow fnmatch to compare a directory on disk to
the name of a directory, but it is unnecessary as our fnmatch accepts
FNM_LEADING_DIR.
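For reference, FNM_LEADING_DIR (a GNU extension which libgit2's bundled
fnmatch also supports) lets a pattern naming a directory match
everything beneath it without any path mangling:

    #define _GNU_SOURCE         /* for FNM_LEADING_DIR with glibc */
    #include <fnmatch.h>
    #include <stdio.h>

    int main(void)
    {
        const char *path = "Debug/obj/main.o";

        /* "Debug" matches the leading segment; the rest follows a '/' */
        printf("%s\n", fnmatch("Debug", path, FNM_LEADING_DIR) == 0
                           ? "matches" : "no match");
        return 0;
    }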
We use a blocking socket and set the mode to AUTO_RETRY, which means
that `SSL_write` and `SSL_read` will only return once the read or write
has completed. We therefore don't need to handle partial writes or
retry reads due to a renegotiation.
While here, treat a return value of zero as an error condition as well.
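The corresponding OpenSSL calls, sketched with generic error handling
(the surrounding transport code is elided):

    #include <openssl/ssl.h>

    static int ssl_send_all(SSL *ssl, const char *buf, int len)
    {
        int ret;

        /* blocking socket + AUTO_RETRY: SSL_write/SSL_read complete
         * fully or fail; renegotiation is handled internally, so there
         * is no partial-write loop to maintain */
        SSL_set_mode(ssl, SSL_MODE_AUTO_RETRY);

        ret = SSL_write(ssl, buf, len);
        if (ret <= 0)           /* zero and negative both indicate failure */
            return -1;

        return ret;             /* all `len` bytes were written */
    }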
When writing a configuration file, we want to take a lock on the
new file (eg, `config.lock`) before opening the configuration file
(`config`) for reading so that we can prevent somebody from changing
the contents underneath us.
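A plain POSIX sketch of that ordering (libgit2 actually goes through
its filebuf machinery, so treat names and flags here as illustrative):

    #include <fcntl.h>
    #include <unistd.h>

    /* Create `config.lock` exclusively *before* opening `config` for
     * reading, so nobody can change the contents between our read and
     * the final rename of the lock file over the original. */
    static int lock_then_open(const char *lock_path, const char *cfg_path,
                              int *lock_fd, int *cfg_fd)
    {
        if ((*lock_fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0666)) < 0)
            return -1;          /* somebody else holds the lock */

        if ((*cfg_fd = open(cfg_path, O_RDONLY)) < 0) {
            close(*lock_fd);
            unlink(lock_path);
            return -1;
        }

        return 0;
    }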
When updating a configuration file, we want to copy the old data
from the file to preserve comments and funny whitespace, instead
of writing it in some "canonical" format. Thus, we keep a
pointer to the start of the line and the line length to preserve
these things we don't care to rewrite.
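The bookkeeping amounts to remembering, for each parsed entry, where
its original text starts and how long it is (the struct below is
illustrative, not the exact libgit2 layout):

    #include <stddef.h>

    struct parsed_line {
        const char *start;   /* points into the buffer read from `config` */
        size_t      len;     /* length of the original line, quirks included */
    };

    /* Untouched entries are copied back verbatim from
     * [start, start + len), so comments and odd whitespace survive. */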
Previously we would try to be clever when writing the configuration
file and try to stop parsing (and simply copy the rest of the old
file) when we either found the value we were trying to write,
or when we left the section that value was in, the assumption being
that there was no more work to do.
Regrettably, you can have another section with the same name later in
the file, and we must cope with that gracefully, so we now read the
whole file in order to write out the new one.
Now, writing a file looks even more like reading. Pull the config
parsing out into its own function that can be used by both reading
and writing the configuration.
When checking out with a case-insensitive working directory, we
want to change the case of items in the working directory to
reflect changes that occurred in the checkout target. Diff now
has an option to break case-changing renames into delete/add.
Using FindFirstFile and FindNextFile in win32 allows us to
use the directory information that is returned, instead of
us having to get the file attributes all over again, which
is a distinct cost savings on win32.
The _next method shouldn't take a path pointer (and a path_len
pointer) as 100% of current users use the full path and ignore
the filename.
Plus let's add some docs and a unit test.
Changed win32/path_w32.c to utilize NTFS' FindFirst..FindNext data
instead of doing an lstat per file. Avoiding unnecessary directory
opens and file scans reduces IO, improving overall performance. Effect
is magnified due to NTFS being a kernel mode file system (as opposed to
user mode).
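The Win32 pattern being referred to looks like this; each FindNextFileW
call already yields the entry's attributes, so no per-file lstat()-style
query is needed:

    #include <windows.h>
    #include <stdio.h>

    static void list_dir(const wchar_t *pattern)    /* e.g. L"C:\\repo\\*" */
    {
        WIN32_FIND_DATAW ffd;
        HANDLE h = FindFirstFileW(pattern, &ffd);

        if (h == INVALID_HANDLE_VALUE)
            return;

        do {
            /* attributes come back with the name: no extra stat round-trip */
            int is_dir = (ffd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY) != 0;
            wprintf(L"%ls%ls\n", ffd.cFileName, is_dir ? L"/" : L"");
        } while (FindNextFileW(h, &ffd));

        FindClose(h);
    }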
In checkout.c and filter.c we were casting a sub struct to a parent
struct, which breaks the strict aliasing rules in C. However, we can
use .parent or .base to access the parent struct to avoid the build
warnings.
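A minimal illustration of the pattern (generic struct names, not the
ones in checkout.c or filter.c):

    struct parent { int kind; };

    struct child {
        struct parent base;     /* embedded parent comes first */
        int extra;
    };

    static void use_parent(struct parent *p) { (void)p; }

    static void example(void)
    {
        struct child c = { { 1 }, 2 };

        use_parent(&c.base);            /* fine: no aliasing violation      */
        /* use_parent((struct parent *)&c);  the cast the warning was about */
    }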
In remote.c the local variable `error` was not initialized or updated
in some cases. An uninitialized `error` generates a build warning, so
always keep the `error` variable up to date.
Do not automatically fail on a bad certificate, but let the caller
decide. This means we don't need our switch on errors anymore but can
return a string representation from the security framework.
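On the caller's side this surfaces through the certificate-check
callback in `git_remote_callbacks`; a sketch of what "letting the
caller decide" looks like:

    #include <git2.h>
    #include <stdio.h>

    static int check_cert(git_cert *cert, int valid, const char *host, void *payload)
    {
        (void)cert; (void)payload;

        if (valid)
            return 0;       /* the library found nothing wrong */

        fprintf(stderr, "certificate for %s failed validation\n", host);
        return -1;          /* reject; return 0 here to accept anyway */
    }

    /* ... callbacks.certificate_check = check_cert; ... */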
If git_config_delete is to work properly in the presence of duplicate section
headers, it cannot stop searching at the end of the first matching section, as
there may be another matching section later.
When config_write is used for deletion (value = NULL), it may only terminate
when the desired key is found or there are no sections left to parse.
The idea: sometimes, a filemode is user-specified via an explicit
git_index_entry. In this case, believe the user, always.
Sometimes, it is instead built up by statting the file system. In
those cases, go with the existing logic we have to determine
whether the file system supports all filemodes and symlinks, and
make the best guess.
On file systems which have full filemode and symlink support, this
commit should make no difference. On others (most notably Windows),
this will fix problems like:
* git_index_add and git_index_add_frombuffer() should be believed.
* As a consequence, git_checkout_tree should make the filemodes in
the index match the ones in the tree.
* And diffs with GIT_DIFF_UPDATE_INDEX don't write the wrong filemodes.
* And merges, and probably other downstream stuff now fixed, too.
This makes my previous changes to checkout.c unnecessary,
so they are now reverted.
Also, added a test for index_entry permissions from git_index_add
and git_index_add_frombuffer, both of which failed before these changes.
In `git_rebase_operation_current()`, indicate when a rebase has not
started (with `GIT_REBASE_NO_OPERATION`) rather than conflating that
with the first operation being in-progress.
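Caller-side usage, using the public rebase API:

    #include <git2.h>
    #include <stdio.h>

    static void show_rebase_position(git_rebase *rebase)
    {
        size_t current = git_rebase_operation_current(rebase);

        if (current == GIT_REBASE_NO_OPERATION)
            printf("rebase has not started yet\n");
        else
            printf("applying operation %zu of %zu\n", current + 1,
                   git_rebase_operation_entrycount(rebase));
    }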
It can be useful for the caller to know which update commands will be
sent to the server before the packfile is pushed up. git does this via
the pre-push hook.
We don't have hooks, but as it adds introspection into what is
happening, we can add a callback which performs the same function.
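A sketch of such a callback; the `git_push_update` fields shown follow
libgit2's push-negotiation API, but treat the exact wiring as
approximate:

    #include <git2.h>
    #include <stdio.h>

    static int preview_updates(const git_push_update **updates, size_t len,
                               void *payload)
    {
        size_t i;
        (void)payload;

        for (i = 0; i < len; i++)
            printf("will update %s -> %s\n",
                   updates[i]->src_refname, updates[i]->dst_refname);

        return 0;   /* a negative return cancels before the packfile is sent */
    }

    /* ... opts.callbacks.push_negotiation = preview_updates; ... */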
In the prior implementation, enabling GIT_DIFF_UPDATE_INDEX would
overwrite entries in the index with the ones generated from scanning
the working directory if the OID was the same.
Because this OID comparison ignores file modes, a file in the workdir
with only an exec bit difference from the one in the index would end up
being overwritten, resulting in the exec bit being lost. There might be
other related bugs, but the fix of comparing both OIDs and file modes
should address them all.
When walking backwards and marking parents uninteresting, make sure we
detect when the list of commits we have left has run out of
uninteresting commits so we can stop marking commits as
uninteresting. Failing to do so can mean that we walk the whole history
marking everything uninteresting, which eats up time, CPU and IO doing
useless work.
While pre-marking does look for this, we still need to check during the
main traversal as there are setups for which pre-marking does not leave
enough information in the commits. This can happen if we push a commit
and hide its parent.
The regcomp function returns a non-zero value if compilation of
a regular expression fails. In most places we only check for
negative values, but positive values indicate an error, as well.
Fix this tree-wide, fixing a segmentation fault when calling
git_config_iterator_glob_new with an invalid regexp.
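The correct check, for reference (plain POSIX regex API):

    #include <regex.h>
    #include <stdio.h>

    static int compile_pattern(regex_t *preg, const char *pattern)
    {
        int result = regcomp(preg, pattern, REG_EXTENDED);

        if (result != 0) {                  /* not `result < 0` */
            char errbuf[256];
            regerror(result, preg, errbuf, sizeof(errbuf));
            fprintf(stderr, "invalid regex '%s': %s\n", pattern, errbuf);
            return -1;
        }

        return 0;
    }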
When a commit is first set as uninteresting and then pushed, we must
take care not to put it into the commit list, as that would make us
return at least that commit (but maybe more), since we assume that
anything in the commit list is wanted.
When no reference names could be found, we used to error out when
trying to describe a commit. This is wrong, though, when the option to
fall back to a commit's object ID is set.
git_checkout_tree() has some fallback behaviors for file systems
which don't have full support of filemodes. This generally works fine,
but if a given file had a filemode change from 0644 to 0755 (i.e.,
you add executable permissions), the fallback behavior incorrectly
triggers when writing the updated index.
This would cause a git_checkout_tree() command, even with the
GIT_CHECKOUT_FORCE option set, to leave a dirty index on Windows.
Also added checks to an existing test to catch this case.
When diffs are generated, the value for the 'nfiles' field of 'git_diff_delta'
will be consistent with the value in the 'status' field. Merging diffs can
modify the 'status' field of some deltas and the 'nfiles' field needs to be
updated accordingly.