Support reading and writing index v4. Index v4 uses a very simple
compression scheme for pathnames, but is otherwise similar to index v3.
Signed-off-by: David Turner <dturner@twitter.com>
When we do not trust the on-disk mode, we use the mode of an existing
index entry. This allows us to preserve executable bits on platforms
that do not honor them on the filesystem.
If there is no stage 0 index entry, also look at conflicts to attempt
to answer this question: prefer the data from the 'ours' side, then
the 'theirs' side before falling back to the common ancestor.
We currently use a timetamp to check whether an index file has been
modified since we last read it, but this is racy. If two updates happen
in the same second and we read after the first one, we won't detect the
second one.
Instead read the SHA-1 checksum of the file, which are its last 20 bytes which
gives us a sure-fire way to detect whether the file has changed since we
last read it.
As we're now keeping track of it, expose an accessor to this data.
It's not always obvious the mapping between stage level and
conflict-ness. More importantly, this can lead otherwise sane
people to write constructs like `if (!git_index_entry_stage(entry))`,
which (while technically correct) is unreadable.
Provide a nice method to help avoid such messy thinking.
While we are confident about the size of an int in architectures we're
likely to care about, the index format is defined by the exact size of
the fields. Use the definitions which show the exact width of the entry
fields.
As part of that, bring back 32-bit time and size fields, which currently
are 64 bits wide and can bring a false sense of security in how much
data they really store. Document that these fields are not to be taken
as authoritative.
git_index_add_frombuffer enables now to store a memory buffer in the odb
and to store an entry in the index directly if the index is attached to a
repository.
This makes them show up in the reference, even if the text itself isn't
the most descriptive.
These have been found with
grep -Przon '\n\ntypedef struct.*?\{' -- include
grep -Przon '\n\ntypedef enum.*?\{' -- include
The documentation has shown this as a single enum for a long time. These
should in fact be two enums. One with the bits for the flags and another
with the bits for the extended flags.
This reorganized the diff OID calculation to make it easier to
correctly update the stat cache during a diff once the flags to
do so are enabled.
This includes marking the path of a git_index_entry as const so
we can make a "fake" git_index_entry with a "const char *" path
and not get warnings. I was a little surprised at how unobtrusive
this change was, but I think it's probably a good thing.
I introduced a leak into conflict cleanup by removing items from
inside the git_vector_remove_matching call. This simplifies the
code to just use one common way for the two conflict cleanup APIs.
When an index has an active snapshot, removing an item can cause
an error (inserting into the deferred deletion vector), so I made
the git_index_conflict_cleanup API return an error code. I felt
like this wasn't so bad since it is just like the other APIs.
I fixed up a couple of comments while I was changing the header.
Again, laying groundwork for some index iterator changes, this
contains a bunch of code refactorings for index internals that
should make it easier down the line to add locking around index
modifications. Also this removes the redundant prefix_position
function and fixes some potential memory leaks.
* Make GIT_INLINE an internal definition so it cannot be used in
public headers
* Fix language in CONTRIBUTING
* Make index caps API use signed instead of unsigned values
This changes `git_index_read` to have two modes - a hard index
reload that always resets the index to match the on-disk data
(which was the old behavior) and a soft index reload that uses
the timestamp / file size information and only replaces the index
data if the file on disk has been modified.
This then updates the git_status code to do a soft reload unless
the new GIT_STATUS_OPT_NO_REFRESH flag is passed in.
This also changes the behavior of the git_diff functions that use
the index so that when an index is not explicitly passed in (i.e.
when the functions call git_repository_index for you), they will
also do a soft reload for you.
This intentionally breaks the file signature of git_index_read
because there has been some confusion about the behavior previously
and it seems like all existing uses of the API should probably be
examined to select the desired behavior.
The attempt to "clean up warnings" seems to have introduced some
new warnings on compliant compilers. This fixes those in a way
that I suspect will also be okay for the non-compliant compilers.
Also this fixes what appears to be an extra semicolon in the
repo initialization template dir handling (and as part of that
fix, handles the case where an error occurs correctly).
This adds a new public API for compiling pathspecs and matching
them against the working directory, the index, or a tree from the
repository. This also reworks the pathspec internals to allow the
sharing of code between the existing internal usage of pathspec
matching and the new external API.
While this is working and the new API is ready for discussion, I
think there is still an incorrect behavior in which patterns are
always matched against the full path of an entry without taking
the subdirectories into account (so "s*" will match "subdir/file"
even though it wouldn't with core Git). Further enhancements are
coming, but this was a good place to take a functional snapshot.
Fixed a few header @param and @return typos with the help of -Wdocumentation in Xcode.
The following warnings have not been fixed:
common.h:213 - Not sure how the documentation format is for '...'
notes.h:102 - Correct @param name but empty text
notes.h:111 - Correct @param name but empty text
pack.h:140 - @return missing text
pack.h:148 - @return missing text
This adds three new public APIs for manipulating the index:
1. `git_index_add_all` is similar to `git add -A` and will add
files in the working directory that match a pathspec to the
index while honoring ignores, etc.
2. `git_index_remove_all` removes files from the index that match
a pathspec.
3. `git_index_update_all` updates entries in the index based on
the current contents of the working directory, either added
the new information or removing the entry from the index.
Move the git_index_entry to the very top, since it provides the
main structure that needs to be understood by the reader, then
move the bitmasks for the flags and the flags_extended under that
since they are details for looking at particular fields of the
structure.
This removes the functions to duplicate and free copies of a
git_index_entry and updates the comments to explain that you
should just use the public definition of the struct as needed.
This adds git_index_entry_dup to make a copy of an existing entry
and git_index_entry_free to release the memory of the copy. It
also updates the documentation for git_index_get_bypath and
git_index_get_byindex to make it clear that the returned structure
should *not* be modified.
The constants for extracting data from git_index_entry flags and
flags_extended are not named in a way that makes it easy to know
where to use each one. This improves the docs for the flags (and
slightly reorganizes them), so it should be more obvious.
It is possible for there to be a submodule in a repository with
no .gitmodules file (for example, if the user forgot to commit
the .gitmodules file). In this case, core Git will just create
an empty directory as a placeholder for the submodule but
otherwise ignore it. We were generating an error and stopping
the checkout. This makes our behavior match that of core git.