This is a major reorganization of the diff code. This changes
the diff functions to use the iterators for traversing the
content. This allowed a lot of code to be simplified. Also,
this moved the functions relating to outputting a diff into a
new file (diff_output.c).
This includes a number of other changes - adding utility
functions, extending iterators, etc. plus more tests for the
diff code. This also takes the example diff.c program much
further in terms of emulating git-diff command line options.
This makes so much sense that I can't believe it hasn't been done
before. Kill the old `git_fbuffer` and read files straight into
`git_buf` objects.
Also: In order to fully support 4GB files in 32-bit systems, the
`git_buf` implementation has been changed from using `ssize_t` for
storage and storing negative values on allocation failure, to using
`size_t` and changing the buffer pointer to a magical pointer on
allocation failure.
Hopefully this won't break anything.
This create a new git_iterator type of object that provides a
uniform interface for iterating over the index, an arbitrary
tree, or the working directory of a repository.
As part of this, git ignore support was extended to support
push and pop of directory-based ignore files as the working
directory is being traversed (so the array of ignores does
not have to be recreated at each directory during traveral).
There are a number of other small utility functions in buffer,
path, vector, and fileops that are included in this patch
that made the iterator implementation cleaner.
This takes all of the functions that look up simple data about
paths (such as `git_futils_isdir`) and moves them over to path.h
(becoming `git_path_isdir`). This leaves fileops.h just with
functions that actually manipulate the filesystem or look at
the file contents in some way.
As part of this, the dir.h header which is really just for win32
support was moved into win32 (with some minor changes).
This fixes issue 532 that attributes (and gitignores) could not
be checked for files that don't exist. It should be possible to
query such things regardless of the existence of the file.
Adds support for .gitignore files to git_status_foreach() and
git_status_file(). This includes refactoring the gitattributes
code to share logic where possible. The GIT_STATUS_IGNORED flag
will now be passed in for files that are ignored (provided they
are not already in the index or the head of repo).
Add support for git attribute macro definitions. Also, add
support for cache flush API to clear the attribute file content
cache when needed.
Additionally, improved the handling of global and system files,
making common utility functions in fileops and converting config
and attr to both use the common functions.
Adds a bunch more tests and fixed some memory leaks. Note that
adding macros required me to use refcounted attribute assignment
definitions, which complicated, but probably improved memory usage.
This converts virtually all of the places that allocate GIT_PATH_MAX
buffers on the stack for manipulating paths to use git_buf objects
instead. The patch is pretty careful not to touch the public API
for libgit2, so there are a few places that still use GIT_PATH_MAX.
This extends and changes some details of the git_buf implementation
to add a couple of extra functions and to make error handling easier.
This includes serious alterations to all the path.c functions, and
several of the fileops.c ones, too. Also, there are a number of new
functions that parallel existing ones except that use a git_buf
instead of a stack-based buffer (such as git_config_find_global_r
that exists alongsize git_config_find_global).
This also modifies the win32 version of p_realpath to allocate whatever
buffer size is needed to accommodate the realpath instead of hardcoding
a GIT_PATH_MAX limit, but that change needs to be tested still.
To further match how Git behaves, this change makes most of the
directories libgit2 creates in a git repo have a file mode of
0777. Specifically:
- Intermediate directories created with git_futils_mkpath2file() have
0777 permissions. This affects odb_loose, reflog, and refs.
- The top level folder for bare repos is created with 0777
permissions.
- The top level folder for non-bare repos is created with 0755
permissions.
- /objects/info/, /objects/pack/, /refs/heads/, and /refs/tags/ are
created with 0777 permissions.
Additionally, the following changes have been made:
- fileops functions that create intermediate directories have grown a
new dirmode parameter. The only exception to this is filebuf's
lock_file(), which unconditionally creates intermediate directories
with 0777 permissions when GIT_FILEBUF_FORCE is set.
- The test runner now sets the umask to 0 before running any
tests. This ensurses all file mode checks are consistent across
systems.
- t09-tree.c now does a directory permissions check. I've avoided
adding this check to other tests that might reuse existing
directories from the prefabricated test repos. Because they're
checked into the repo, they have 0755 permissions.
- Other assorted directories created by tests have 0777 permissions.
There were quite a few places were spaces were being used instead of
tabs. Try to catch them all. This should hopefully not break anything.
Except for `git blame`. Oh well.
1. The license header is technically not valid if it doesn't have a
copyright signature.
2. The COPYING file has been updated with the different licenses used in
the project.
3. The full GPLv2 header in each file annoys me.
This extends the git_fuitls_readbuffer function to only read in if the
file's modification date is later than the given one. Some code paths
want to check a file's modification date in order to decide whether
they should read it or not. If they do want to read it, another stat
call is done by futils. This function combines these two operations so
we avoid one stat call each time we read a new or updated file.
The git_futils_readbuffer functions is now a wrapper around the new
function.
Signed-off-by: Carlos Martín Nieto <cmn@elego.de>
`git_futils_rmdir_r`: rename, clean up.
`git_reference_rename`: cleanup. Do not use 3x4096 buffers on the stack
or things will get ugly very fast. We can reuse the same buffer.
git_futils_rmdir_recurs() shall remove the given directory and all
subdirectories. This happens only if the directories are empty.
Signed-off-by: schu <schu-github@schulog.org>
The `stat` methods were having issues when called with a trailing slash
in Windows platforms.
We now use GetFileAttributes() where possible, which doesn't have this
restriction.
The old `git_fileops_prettify_path` has been replaced with
`git_path_prettify`. This is a much simpler method that uses the OS's
`realpath` call to obtain the full path for directories and resolve
symlinks.
The `realpath` syscall is the original POSIX call in Unix system and
an emulated version under Windows using the Windows API.
Cleaned up the structure of the whole OS-abstraction layer.
fileops.c now contains a set of utility methods for file management used
by the library. These are abstractions on top of the original POSIX
calls.
There's a new file called `posix.c` that contains
emulations/reimplementations of all the POSIX calls the library uses.
These are prefixed with `p_`. There's a specific posix file for each
platform (win32 and unix).
All the path-related methods have been moved from `utils.c` to `path.c`
and have their own prefix.
Handle Symlinks if they can be handled in Win32. This is not even
compiled. Needs review.
The lstat implementation is modified from core Git.
The readlink implementation is modified from PHP.
When calling gitfo_exists() on a symbolic link, sometimes we need to
simply check whether the link exists and sometimes we need to check
whether the file pointed to by the symlink exists.
Introduce a new function gitfo_shallow_exists that only checks if the
link exists and revert gitfo_exists to the original functionality of
checking whether the file pointed to by the link exists.
gitfo_exists() used to error out if the given file was a symbolic link,
due to access() returning an error code. This is not expected behaviour,
as gitfo_exists() should only check whether the file itself exists, not
its link target if it is a symbolic link.
Fix this by calling gitfo_lstat() instead, which is just a wrapper for
lstat().
Also fix the same error in index_init_entry().
retrieve_device returns the file device for a given path (so that we can detect device change while walking through parent directories).
abspath returns a canonicalized path, symbolic link free.
retrieive_ceiling_directories_offset returns the biggest path offset that path match in the ceiling directory list (so that we can stop at ceiling directories).
Temporary files when doing streaming writes are now stored inside the
Objects folder, to prevent issues when moving files between
disks/partitions.
Add support for block writes to the ODB again (for those backends that
cannot implement streaming).
When the system temporary folder is located on a different volume than the working directory into which libgit2 is executing, MoveFileEx() requires an additional flag.
Hey. Apologies in advance -- I broke your bindings.
This is a major commit that includes a long-overdue redesign of the
whole object-database structure. This is expected to be the last major
external API redesign of the library until the first non-alpha release.
Please get your bindings up to date with these changes. They will be
included in the next minor release. Sorry again!
Major features include:
- Real caching and refcounting on parsed objects
- Real caching and refcounting on objects read from the ODB
- Streaming writes & reads from the ODB
- Single-method writes for all object types
- The external API is now partially thread-safe
The speed increases are significant in all aspects, specially when
reading an object several times from the ODB (revwalking) and when
writing big objects to the ODB.
Here's a full changelog for the external API:
blob.h
------
- Remove `git_blob_new`
- Remove `git_blob_set_rawcontent`
- Remove `git_blob_set_rawcontent_fromfile`
- Rename `git_blob_writefile` -> `git_blob_create_fromfile`
- Change `git_blob_create_fromfile`:
The `path` argument is now relative to the repository's working dir
- Add `git_blob_create_frombuffer`
commit.h
--------
- Remove `git_commit_new`
- Remove `git_commit_add_parent`
- Remove `git_commit_set_message`
- Remove `git_commit_set_committer`
- Remove `git_commit_set_author`
- Remove `git_commit_set_tree`
- Add `git_commit_create`
- Add `git_commit_create_v`
- Add `git_commit_create_o`
- Add `git_commit_create_ov`
tag.h
-----
- Remove `git_tag_new`
- Remove `git_tag_set_target`
- Remove `git_tag_set_name`
- Remove `git_tag_set_tagger`
- Remove `git_tag_set_message`
- Add `git_tag_create`
- Add `git_tag_create_o`
tree.h
------
- Change `git_tree_entry_2object`:
New signature is `(git_object **object_out, git_repository *repo, git_tree_entry *entry)`
- Remove `git_tree_new`
- Remove `git_tree_add_entry`
- Remove `git_tree_remove_entry_byindex`
- Remove `git_tree_remove_entry_byname`
- Remove `git_tree_clearentries`
- Remove `git_tree_entry_set_id`
- Remove `git_tree_entry_set_name`
- Remove `git_tree_entry_set_attributes`
object.h
------------
- Remove `git_object_new
- Remove `git_object_write`
- Change `git_object_close`:
This method is now *mandatory*. Not closing an object causes a
memory leak.
odb.h
-----
- Remove type `git_rawobj`
- Remove `git_rawobj_close`
- Rename `git_rawobj_hash` -> `git_odb_hash`
- Change `git_odb_hash`:
New signature is `(git_oid *id, const void *data, size_t len, git_otype type)`
- Add type `git_odb_object`
- Add `git_odb_object_close`
- Change `git_odb_read`:
New signature is `(git_odb_object **out, git_odb *db, const git_oid *id)`
- Change `git_odb_read_header`:
New signature is `(size_t *len_p, git_otype *type_p, git_odb *db, const git_oid *id)`
- Remove `git_odb_write`
- Add `git_odb_open_wstream`
- Add `git_odb_open_rstream`
odb_backend.h
-------------
- Change type `git_odb_backend`:
New internal signatures are as follows
int (* read)(void **, size_t *, git_otype *, struct git_odb_backend *, const git_oid *)
int (* read_header)(size_t *, git_otype *, struct git_odb_backend *, const git_oid *)
int (* writestream)(struct git_odb_stream **, struct git_odb_backend *, size_t, git_otype)
int (* readstream)( struct git_odb_stream **, struct git_odb_backend *, const git_oid *)
- Add type `git_odb_stream`
- Add enum `git_odb_streammode`
Signed-off-by: Vicent Marti <tanoku@gmail.com>
We now use MoveFileEx, which is not assured to be atomic but works for
always (both if the destination exists, or if it doesn't) and is
available in MinGW.
Since this is a Win32 API call, complaint about lost or overwritten files
should be forwarded at Steve Ballmer.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The `rename` call doesn't quite work on Win32: expects the destination
file to not exist. We're using a native Win32 call in those cases --
that should do the trick.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
Clean up a provided absolute or relative directory path.
This prettification relies on basic operations such as coalescing multiple forward slashes into a single slash, removing '.' and './' current directory segments, and removing parent directory whenever '..' is encountered. If not empty, the returned path ends with a forward slash.
For instance, this will turn "d1/s1///s2/..//../s3" into "d1/s3/".
This only performs a string based analysis of the path. No checks are done to make sure the path actually makes sense from the file system perspective.
It was not being used by any methods (only by malloc and calloc), and
since it needs to be TLS, it cannot be exported on DLLs on Windows.
Burn it with fire. The API always returns error codes!
Signed-off-by: Vicent Marti <tanoku@gmail.com>
The constructor to git_repository is now called
'git_repository_open(path)'
and takes a path to a git repository instead of an existing ODB object.
Unit tests have been updated accordingly and the two test repositories
have been merged into one.
Signed-off-by: Vicent Marti <tanoku@gmail.com>
This supresses some "conversion from 'size_t' to 'unsigned int',
possible loss of data" warning messages from the MS Visual C/C++
compiler with -Wp64.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Andreas Ericsson <ae@op5.se>
In particular, the git__mmap() and git__munmap() routines provide
the interface to platform specific memory-mapped file facilities.
We provide implementations for unix and win32, which can be found
in their own sub-directories.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
On windows, unless we use the O_BINARY flag in the open()
call, the file I/O routines will perform line ending
conversion (\r\n => \n on input, \n => \r\n on output).
In addition to the performance penalty, most files in the
object database are binary and will, therefore, become
corrupted by this conversion.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In particular, conditional expressions which contain an
assignment statement, where the expression type is not
explicitly made to be boolean, elicits the following
message:
warning 2: possible unintended assignment
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In particular, using pointer arithmetic on void pointers,
despite being quite useful, is not legal in standard C.
Avoiding non-standard C constructs will help in porting
the library to other compilers/platforms.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Win32 has a variant of mmap that is harder to use than POSIX, but
to run natively and efficiently on Win32 we need some form of it.
gitfo_map_ro() provides a basic mmap function for use in locations
where we need read-only random data access to large ranges of a file,
such as a pack-*.idx.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
When scanning the pack directory we need to see if the path
name is present for ".idx" when we discover a ".pack" file.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Our fileops API is currently private. We aren't planning on supplying
a cross-platform file API to applications that link to us. If we did,
we'd probably whole-sale publish fileops, not just the dirent code.
By moving it to be private we can also change the call signature to
permit the buffer to be passed down through the call chain. This is
very helpful when we are doing a recursive scan as we can reuse just
one buffer in all stack frames, reducing the impact the recursion has
on the stack frames in the data cache.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
This error code indicates the OS error code has a better value
describing the last error, as it is likely a network or local
file IO problem identified by a C library function call.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
We now forbid direct use of malloc, strdup or calloc within the
library and instead use wrapper functions git__malloc, etc. to
invoke the underlying library malloc and set git_errno to a no
memory error code if the allocation fails.
In the future once we have pack objects in memory we are likely
to enhance these routines with garbage collection logic to purge
cached pack data when allocations fail. Because the size of the
function will grow somewhat large, we don't want to mark them for
inline as gcc tends to aggressively inline, creating larger than
expected executables.
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In particular, when asked to read an empty file, this function
calls malloc() with a zero size allocation request. Standard C
says that the behaviour of malloc() in this case is implementation
defined.
[C99, 7.20.3 says "... If the size of the space requested is zero,
the behavior is implementation-defined: either a null pointer is
returned, or the behavior is as if the size were some nonzero
value, except that the returned pointer shall not be used to
access an object."]
Finesse the issue by over-allocating by one byte. Setting the extra
byte to '\0' may also provide a useful sentinel for text files.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
In particular, the gitfo_read_file() routine can be used to slurp
the complete file contents into an gitfo_buf structure. The buffer
content will be allocated by malloc() and may be released by the
gitfo_free_buf() routine. The io buffer type can be initialised
on the stack with the GITFO_BUF_INIT macro.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The PATH_MAX symbol is often, but not always, defined
in the <limits.h> header. In particular, on cygwin you
need to include this header to avoid a compilation error.
However, some systems define PATH_MAX to be something as
small as 256, which POSIX is happy to allow, while others
allow much larger values. In general it can vary from
one filesystem to another.
In order to avoid the vagaries of different systems, define
our own symbol.
Signed-off-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since at least MS have something like GetFirstDirEnt() and
GetNextDirEnt() (presumably with superior performance), we
can let MS hackers add support for a dirent walker using
that API instead, while we stick with the posix-style
readdir() calls.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
The idea is taken from Junio's work in read-cache.c, where
it's used for writing out the index without tap-dancing on
the poor harddrive. Since it's almost certainly useful for
cached writing of packfiles too, we turn it into a generic
API, making it perfectly simple to reuse it later.
gitfo_write_cached() has the same contract as gitfo_write(), it
returns GIT_SUCCESS if all bytes are successfully written (or were
at least buffered for later writing), and <0 if an error occurs
during buffer writing.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>
Since it doesn't make sense to make the disk access stuff
portable *AND* public (that's a job for each application
imo), we can take a shortcut and just support unixy stuff
for now and get away with coding most of it as macros.
Since we go with an internal API for starters and only
provide higher-level API's to the libgit users, we'll be
ok with this approach.
Signed-off-by: Andreas Ericsson <ae@op5.se>
Signed-off-by: Shawn O. Pearce <spearce@spearce.org>