That's a bad assumption to make, even though right now it holds
(because of the way we've implemented decompression of packfiles),
this may change in the future, given that ODB objects can be
binary data.
Furthermore, the ODB object can return a NULL pointer if the object
is empty. Copying the NULL pointer to the strbuf lets us handle it
like an empty string. Again, the NULL pointer is valid behavior because
you're supposed to check the *size* of the object before working
on it.
A rule "src" in src/.gitignore must only match subdirectories of
src/. The current code does not include this context in the match rule
and would thus consider this rule to match the top-level src/ directory
instead of the intended src/src/.
Keep track fo the context in which the rule was defined so we can
perform a prefix match.
We currently consider CR to start the end of the line, but that means
that we miss cases with CR CR LF which can be used with git to match
files whose names have CR at the end of their names.
The fix from the patch comes from Russell's comment in the issue.
This fixes#2536.
While scanning through a directory hierarchy, this prevents a
positive ignore match on a parent directory from blocking the scan
of a directory when a negative match rule exists for files inside
the directory.
The diff code was using an "ignored_prefix" directory to track if
a parent directory was ignored that contained untracked files
alongside tracked files. Unfortunately, when negative ignore rules
were used for directories inside ignored parents, the wrong rules
were applied to untracked files inside the negatively ignored
child directories.
This commit moves the logic for ignore containment into the workdir
iterator (which is a better place for it), so the ignored-ness of
a directory is contained in the frame stack during traversal. This
allows a child directory to override with a negative ignore and yet
still restore the ignored state of the parent when we traverse out
of the child.
Along with this, there are some problems with "directory only"
ignore rules on container directories. Given "a/*" and "!a/b/c/"
(where the second rule is a directory rule but the first rule is
just a generic prefix rule), then the directory only constraint
was having "a/b/c/d/file" match the first rule and not the second.
This was fixed by having ignore directory-only rules test a rule
against the prefix of a file with LEADINGDIR enabled.
Lastly, spot checks for ignores using `git_ignore_path_is_ignored`
were tested from the top directory down to the bottom to deal with
the containment problem, but this is wrong. We have to test bottom
to top so that negative subdirectory rules will be checked before
parent ignore rules.
This does change the behavior of some existing tests, but it seems
only to bring us more in line with core Git, so I think those
changes are acceptable.
Only apply LEADING_DIR pattern munging to patterns in ignore and
attribute files, not to pathspecs used to select files to operate
on. Also, allow internal macro definitions to be evaluated before
loading all external ones (important so that external ones can
make use of internal `binary` definition).
Ignore patterns that ended with a trailing '/*' were still needing
to match against another actual '/' character in the full path.
This is not the same behavior as core Git.
Instead, we strip a trailing '/*' off of any patterns that were
matching and just take it to imply the FNM_LEADING_DIR behavior.
There was a latent bug where files that use macro definitions
could be parsed before the macro definitions were loaded. Because
of attribute file caching, preloading files that are going to be
used doesn't add a significant amount of overhead, so let's always
preload any files that could contain macros before we assemble the
actual vector of files to scan for attributes.
The checks to see if files were out of date in the attibute cache
was wrong because the cache-breaker data wasn't getting stored
correctly. Additionally, when the cache-breaker triggered, the
old file data was being leaked.
I don't love this approach, but achieving thread-safety for
attribute and ignore data while reloading files would require a
larger rewrite in order to avoid this. If an attribute or ignore
file is out of date, this holds a lock on the file while we are
reloading the data so that another thread won't try to reload the
data at the same time.
This is a big refactoring of the attribute file cache to be a bit
simpler which in turn makes it easier to enforce a lock around any
updates to the cache so that it can be used in a threaded env.
Tons of changes to the attributes and ignores code.
This adds a basic test of doing simultaneous diffs on multiple
threads and adds basic locking for the attr file cache because
that was the immediate problem that arose from these tests.
Ignore rules with slashes in them are matched using FNM_PATHNAME
and use the path to the .gitignore file from the root of the
repository along with the path fragment (including slashes) in
the ignore file itself. Unfortunately, the relative path to the
.gitignore file was being applied to the global core.excludesfile
if that was also named ".gitignore".
This fixes that with more precise matching and includes test for
ignore rules with leading slashes (which were the primary example
of this being broken in the real world).
This also backports an improvement to the file context logic from
the threadsafe-iterators branch where we don't rely on mutating
the key of the attribute file name to generate the context path.
This rolls back the changes to fnmatch parsing from commit
2e40a60e84 except for the tests
that were added. Instead this adds couple of new flags that can
be passed in when attempting to parse an fnmatch pattern. Also,
this changes the pathspec match logic to special case matching a
filename with a '!' prefix against a negative pattern.
This fixes the build.
Files in status will, be default, be sorted according to the case
insensitivity of the filesystem that we're running on. However,
in some cases, this is not desirable. Even on case insensitive
file systems, 'git status' at the command line will generally use
a case sensitive sort (like 'ls'). Some GUIs prefer to display a
list of file case insensitively even on case-sensitive platforms.
This adds two new flags: GIT_STATUS_OPT_SORT_CASE_SENSITIVELY
and GIT_STATUS_OPT_SORT_CASE_INSENSITIVELY that will override the
default sort order of the status output and give the user control.
This includes tests for exercising these new options and makes
the examples/status.c program emulate core Git and always use a
case sensitive sort.
The goal of this work is to expose the search logic for "global",
"system", and "xdg" files through the git_libgit2_opts() interface.
Behind the scenes, I changed the logic for finding files to have a
notion of a git_strarray that represents a search path and to store
a separate search path for each of the three tiers of config file.
For each tier, I implemented a function to initialize it to default
values (generally based on environment variables), and then general
interfaces to get it, set it, reset it, and prepend new directories
to it.
Next, I exposed these interfaces through the git_libgit2_opts
interface, reusing the GIT_CONFIG_LEVEL_SYSTEM, etc., constants
for the user to control which search path they were modifying.
There are alternative designs for the opts interface / argument
ordering, so I'm putting this phase out for discussion.
Additionally, I ended up doing a little bit of clean up regarding
attr.h and attr_file.h, adding a new attrcache.h so the other two
files wouldn't have to be included in so many places.
This extends git_repository_init_ext further with support for
initializing the repository from an external template directory
and with support for the "create shared" type flags that make a
set GID repository directory.
This also adds tests for much of the new functionality to the
existing `repo/init.c` test suite.
Also, this adds a bunch of new utility functions including a
very general purpose `git_futils_mkdir` (with the ability to
make paths and to chmod the paths post-creation) and a file
tree copying function `git_futils_cp_r`. Also, this includes
some new path functions that were useful to keep the code
simple.
Fixes#824
Exporting variables in a dynamic library is a PITA. Let's keep
these values internally and wrap them through a helper method.
This doesn't break the external API. @arrbee, aren't you glad I turned
the `GIT_ATTR_` macros into function macros? ✨
This fixes two bugs:
* Issue #728 where git_status_file was not working for files
that contain spaces. This was caused by reusing the "fnmatch"
parsing code from ignore and attribute files to interpret the
"pathspec" that constrained the files to apply the status to.
In that code, unescaped whitespace was considered terminal to
the pattern, so a file with internal whitespace was excluded
from the matched files. The fix was to add a mode to that code
that allows spaces and tabs inside patterns. This mode only
comes into play when parsing in-memory strings.
* The other issue was undetected, but it was in the recently
added code to reload gitattributes / gitignores when they were
changed on disk. That code was not clearing out the old values
from the cached file content before reparsing which meant that
newly added patterns would be read in, but deleted patterns
would not be removed. The fix was to clear the vector of
patterns in a cached file before reparsing the file.
The goal of this work is to rewrite git_status_file to use the
same underlying code as git_status_foreach.
This is done in 3 phases:
1. Extend iterators to allow ranged iteration with start and
end prefixes for the range of file names to be covered.
2. Improve diff so that when there is a pathspec and there is
a common non-wildcard prefix of the pathspec, it will use
ranged iterators to minimize excess iteration.
3. Rewrite git_status_file to call git_status_foreach_ext
with a pathspec that covers just the one file being checked.
Since ranged iterators underlie the status & diff implementation,
this is actually fairly efficient. The workdir iterator does
end up loading the contents of all the directories down to the
single file, which should ideally be avoided, but it is pretty
good.
Depending on the operation, we need to consider gitattributes
in both the work dir and the index. This adds a parameter to
all of the gitattributes related functions that allows user
control of attribute reading behavior (i.e. prefer workdir,
prefer index, only use index).
This fix also covers allowing us to check attributes (and
hence do diff and status) on bare repositories.
This was a somewhat larger change that I hoped because it had
to change the cache key used for gitattributes files.
We were not following the git behavior for leading slashes
in path names when matching git ignores and git attribute
file patterns. This should fix issue #638.
This updates khash.h with some extra features (like error checking
on allocations, ability to use wrapped malloc, foreach calls, etc),
creates two high-level wrappers around khash: `git_khash_str` and
`git_khash_oid` for string-to-void-ptr and oid-to-void-ptr tables,
then converts all of the old usage of `git_hashtable` over to use
these new hashtables.
For `git_khash_str`, I've tried to create a set of macros that
yield an API not too unlike the old `git_hashtable` API. Since
the oid hashtable is only used in one file, I haven't bother to
set up all those macros and just use the khash APIs directly for
now.
This converts the git attr related code (including ignores) and
the git diff related code (and implicitly the status code) to use
`git_pools` for storing strings. This reduces the number of small
blocks allocated dramatically.
This adds preliminary support for pathspecs to diff and status.
The implementation is not very optimized (it still looks at
every single file and evaluated the the pathspec match against
them), but it works.
This adds support for a bunch of core.* settings that affect
diff and status, plus fixes up some incorrect implementations
of those settings from before. Also, this cleans up the
handling of config settings in the new submodules code and
in the old attrs/ignore code.
This continues to add other files to the new error handling
style. I think the only real concerns here are that there are
a couple of error return cases that I have converted to asserts,
but I think that it was the correct thing to do given the new
error style.
The point of having `GIT_ATTR_TRUE` and `GIT_ATTR_FALSE` macros is to be
able to change the way that true and false values are stored inside of
the returned gitattributes value pointer.
However, if these macros are implemented as a simple rename for the
`git_attr__true` pointer, they will always be used with the `==`
operator, and hence we cannot really change the implementation to any
other way that doesn't imply using special pointer values and comparing
them!
We need to do the same thing that core Git does, which is using a
function macro. With `GIT_ATTR_TRUE(attr)`, we can change
internally the way that these values are stored to anything we want.
This commit does that, and rewrites a large chunk of the attributes test
suite to remove duplicated code for expected attributes, and to
properly test the function macro behavior instead of comparing
pointers.
This makes so much sense that I can't believe it hasn't been done
before. Kill the old `git_fbuffer` and read files straight into
`git_buf` objects.
Also: In order to fully support 4GB files in 32-bit systems, the
`git_buf` implementation has been changed from using `ssize_t` for
storage and storing negative values on allocation failure, to using
`size_t` and changing the buffer pointer to a magical pointer on
allocation failure.
Hopefully this won't break anything.