This is just a bunch of small fixes that I noticed while looking
at the UTF8 and UTF16 path stuff. It fixes a slowdown in looking
for an empty directory (not exiting loop asap), makes the dir name
in the git__DIR structure be a GIT_FLEX_ARRAY to save an allocation,
and fixes some slightly odd assumptions in the cl_getenv helper.
The routines to push and pop ignore files while traversing a
directory had some issues. In particular, setting up the initial
list would sometimes push an ignore file before it ought to be
applied if the starting path was a directory containing an ignore
file. Also, the pop function was not always matching the right
part of the path and would fail to pop ignores from the list in
some cases.
This adds some tests that exercise a particular problematic case
and then fixes the problems that I could find related to this.
At some point, I'd like to isolate this ignore rule management
code and rewrite it, but that's a larger project and right now,
I'll opt to just try to fix the broken behaviors.
1. Fix sort order problem with submodules where "mod" was sorting
after "mod-plus" because they were being sorted as "mod/" and
"mod-plus/". This involved pushing the "contains a .git entry"
test significantly lower in the stack.
2. Reinstate behavior that a directory which contains a .git entry
will be treated as a submodule during iteration even if it is
not yet added to the .gitmodules.
3. Now that any directory containing .git is reported as submodule,
we have to be more careful checking for GIT_EEXISTS when we
do a submodule lookup, because that is the error code that is
returned by git_submodule_lookup when you try to look up a
directory containing .git that has no record in gitmodules or
the index.
This updates the tree iterator internals to be more efficient.
The tree_iterator_entry objects are now kept as pointers that are
allocated from a git_pool, so that we may use git__tsort_r for
sorting (which is better than qsort, given that the tree is
likely mostly ordered already).
Those tree_iterator_entry objects now keep direct pointers to the
data they refer to instead of keeping indirect index values. This
simplifies a lot of the data structure traversal code.
This also adds bsearch to find the start item position for range-
limited tree iterators, and is more explicit about using
git_path_cmp instead of reimplementing it. The git_path_cmp
changed a bit to make it easier for tree_iterators to use it (but
it was barely being used previously, so not a big deal).
This adds a git_pool_free_array function that efficiently frees a
list of pool allocated pointers (which the tree_iterator keeps).
Also, added new tests for the git_pool free list functionality
that was not previously being tested (or used).
The `git_iterator_reset` command has not been working in all cases
particularly when there is a start and end range. This fixes it
and adds tests for it, and also extends it with the ability to
update the start/end range strings when an iterator is reset.
There are many scattered functions that look into the contents of
buffers to do various text manipulations (such as escaping or
unescaping data, calculating text stats, guessing if content is
binary, etc). This groups all those functions together into a
new file and converts the code to use that.
This has two enhancements to existing functionality. The old
text stats function is significantly rewritten and the BOM
detection code was extended (although largely we can't deal with
anything other than a UTF8 BOM).
This is a major reworking of checkout strategy options. The
checkout code is now sensitive to the contents of the HEAD tree
and the new options allow you to update the working tree so that
it will match the index content only when it previously matched
the contents of the HEAD. This allows you to, for example, to
distinguish between removing files that are in the HEAD but not
in the index, vs just removing all untracked files.
Because of various corner cases that arise, etc., this required
some additional capabilities in rmdir and other utility functions.
This includes the beginnings of an implementation of code to read
a partial tree into the index based on a pathspec, but that is
not enabled because of the possibility of creating conflicting
index entries.
This extends git_repository_init_ext further with support for
initializing the repository from an external template directory
and with support for the "create shared" type flags that make a
set GID repository directory.
This also adds tests for much of the new functionality to the
existing `repo/init.c` test suite.
Also, this adds a bunch of new utility functions including a
very general purpose `git_futils_mkdir` (with the ability to
make paths and to chmod the paths post-creation) and a file
tree copying function `git_futils_cp_r`. Also, this includes
some new path functions that were useful to keep the code
simple.
This makes it easy to take a buffer containing a path with relative
references (i.e. .. or . path segments) and resolve all of those
into a clean path. This can be applied to URLs as well as file
paths which can be useful.
As part of this, I made the drive-letter detection apply on all
platforms, not just windows. If you give a path that looks like
"c:/..." on any platform, it seems like we might as well detect
that as a rooted path. I suppose if you create a directory named
"x:" on another platform and want to use that as the beginning
of a relative path under the root directory of your repo, this
could cause a problem, but then it seems like you're asking for
trouble.
On GNU, the d_name field of the dirent structure is defined as "char d_name[1]",
so we must allocate more than sizeof(struct dirent) bytes, just like on Sun.
When checking for a drive letter on windows, instead of using
isalpha(), it is better to just check for a..z and A..Z, I think,
particularly because the MS isalpha implementation appears to
assert when given an 0xFF byte.
On Solaris, struct dirent is defined differently than Linux. The field
containing the path name is of size 0, rather than NAME_MAX. So, we need to
use a properly sized buffer on Solaris to avoid a stack overflow.
Also fix some DIR* leaks on cleanup.
Add a new command `git_repository_open_ext` with extended options
that control how searching for a repository will be done. The
existing `git_repository_open` and `git_repository_discover` are
reimplemented on top of it. We may want to change the default
behavior of `git_repository_open` but this commit does not do that.
Improve support for "gitdir" files where the work dir is separate
from the repo and support for the "separate-git-dir" config. Also,
add support for opening repos created with `git-new-workdir` script
(although I have only confirmed that they can be opened, not that
all functions work correctly).
There are also a few minor changes that came up:
- Fix `git_path_prettify` to allow in-place prettifying.
- Fix `git_path_root` to support backslashes on Win32. This fix
should help many repo open/discover scenarios - it is the one
function called when opening before prettifying the path.
- Tweak `git_config_get_string` to set the "out" pointer to NULL
if the config value is not found. Allows some other cleanup.
- Fix a couple places that should have been calling
`git_repository_config__weakptr` and were not.
- Fix `cl_git_sandbox_init` clar helper to support bare repos.
This converts blob.c, fileops.c, and all of the win32 files.
Also, various minor cleanups throughout the code. Plus, in
testing the win32 build, I cleaned up a bunch (although not
all) of the warnings with the 64-bit build.
This continues to add other files to the new error handling
style. I think the only real concerns here are that there are
a couple of error return cases that I have converted to asserts,
but I think that it was the correct thing to do given the new
error style.
This also includes droping `git_buf_lasterror` because it makes no sense
in the new system. Note that in most of the places were it has been
dropped, the code needs cleanup. I.e. GIT_ENOMEM is going away, so
instead it should return a generic `-1` and obviously not throw
anything.