The uptodate bit should have a lifecycle of a single read->write
on the index. Once the index is written, the files within it should
be scanned for racy timestamps against the new index timestamp.
Keep track of entries that we believe are up-to-date, because we
added the index entries since the index was loaded. This prevents
us from unnecessarily examining files that we wrote during the
cleanup of racy entries (when we smudge racily clean files that have
a timestamp newer than or equal to the index's timestamp when we
read it). Without keeping track of this, we would examine every
file that we just checked out for raciness, since all their timestamps
would be newer than the index's timestamp.
When we insert a conflict in a case-insensitive index, accept the
new entry's path as the correct case instead of leaving the path we
already had.
This puts `git_index_conflict_add()` on the same level as
`git_index_add()` in this respect.
Inserting new REUC entries can quickly become pathological given that
each insert unsorts the REUC vector, and both subsequent lookups *and*
insertions will require sorting it again before being successful.
To avoid this, we're switching to `git_vector_insert_sorted`: this keeps
the REUC vector constantly sorted and lets us use the `on_dup` callback
to skip an extra binary search on each insertion.
When we do not trust the on-disk mode, we use the mode of an existing
index entry. This allows us to preserve executable bits on platforms
that do not honor them on the filesystem.
If there is no stage 0 index entry, also look at conflicts to attempt
to answer this question: prefer the data from the 'ours' side, then
the 'theirs' side before falling back to the common ancestor.
This allows us to remove OS checks from source code, instead relying
on CMake to detect whether or not `struct stat` has the nanoseconds
members we rely on.
On case insensitive platforms, allow `git_index_add` to provide a new
path for an existing index entry. Previously, we would maintain the
case in an index entry without the ability to change it (except by
removing an entry and re-adding it.)
Higher-level functions (like `git_index_add_bypath` and
`git_index_add_frombuffers`) continue to keep the old path for easier
usage.
On case insensitive systems, when given a user-provided path in the
higher-level index addition functions (eg `git_index_add_bypath` /
`git_index_add_frombuffer`), examine the index to try to match the
given path to an existing directory.
Various mechanisms can cause the on-disk representation of a folder
to not match the representation in HEAD or the index - for example,
a case changing rename of some file `a/file.txt` to `A/file.txt`
will update the paths in the index, but not rename the folder on
disk.
If a user subsequently adds `a/other.txt`, then this should be stored
in the index as `A/other.txt`.
When an entry has a racy timestamp, we need to check whether the file
itself has changed since we put its entry in the index. Only then do we
smudge the size field to force a check the next time around.
This is used by the submodule in order to figure out if the index has
changed since it last read it. Using a timestamp is racy, so let's make
it use the checksum, just like we now do for reloading the index itself.
We currently use a timetamp to check whether an index file has been
modified since we last read it, but this is racy. If two updates happen
in the same second and we read after the first one, we won't detect the
second one.
Instead read the SHA-1 checksum of the file, which are its last 20 bytes which
gives us a sure-fire way to detect whether the file has changed since we
last read it.
As we're now keeping track of it, expose an accessor to this data.
If a file entry has the same timestamp as the index itself, it is
considered racily-clean, as it may have been modified after the index
was written, but during the same second. We take extra steps to check
the contents, but this is just one part of avoiding races.
For files which do have changes but have not been updated in the index,
updating the on-disk index means updating its timestamp, which means we
would no longer recognise these entries as racy and we would trust the
timestamp to tell us whether they have changed.
In order to work around this, git zeroes out the file-size field in
entries with the same timestamp as the index in order to force the next
diff to check the contents. Do so in libgit2 as well.
Introduce a new binary diff callback to provide the actual binary
delta contents to callers. Create this data from the diff contents
(instead of directly from the ODB) to support binary diffs including
the workdir, not just things coming out of the ODB.
If there exists a conflict in the index, but no file in the working
directory, this implies that the user wants to accept the resolution
by removing the file. Thus, remove the conflict entry from the
index, instead of trying to add a (nonexistent) file.
It's not always obvious the mapping between stage level and
conflict-ness. More importantly, this can lead otherwise sane
people to write constructs like `if (!git_index_entry_stage(entry))`,
which (while technically correct) is unreadable.
Provide a nice method to help avoid such messy thinking.
Instead of going through each entry we have and re-adding, which may not
even be correct for certain crlf options and has bad performance, use
the function which performs a diff against the worktree and try to add
and remove files from that list.
We currently iterate over all the entries and re-add them to the
index. While this provides correctness, it is wasteful as we try to
re-insert files which have not changed.
Instead, take a diff between the index and the worktree and only re-add
those which we already know have changed.
The idea...sometimes, a filemode is user-specified via an
explicit git_index_entry. In this case, believe the user, always.
Sometimes, it is instead built up by statting the file system. In
those cases, go with the existing logic we have to determine
whether the file system supports all filemodes and symlinks, and
make the best guess.
On file systems which have full filemode and symlink support, this
commit should make no difference. On others (most notably Windows),
this will fix problems things like:
* git_index_add and git_index_add_frombuffer() should be believed.
* As a consequence, git_checkout_tree should make the filemodes in
the index match the ones in the tree.
* And diffs with GIT_DIFF_UPDATE_INDEX don't write the wrong filemodes.
* And merges, and probably other downstream stuff now fixed, too.
This makes my previous changes to checkout.c unnecessary,
so they are now reverted.
Also, added a test for index_entry permissions from git_index_add
and git_index_add_frombuffer, both of which failed before these changes.
git_index_add_frombuffer enables now to store a memory buffer in the odb
and to store an entry in the index directly if the index is attached to a
repository.
Introduce `git_indexwriter`, to allow us to lock the index while
performing additional operations, then complete the write (or abort,
unlocking the index).