When no payload is set for `crlf_apply` we try to compute the
crlf attributes ourselves with `crlf_check`. When the function
determines that the current file does not require any treatment
we return the GIT_PASSTHROUGH error code without actually
allocating the out-pointer, which indicates the file should not
be passed through the filter.
The `crlf_apply` function explicitly checks for the
GIT_PASSTHROUGH return code and ignores it. This means we will
try to apply the crlf-filter to the current file, leading us to
dereference the unallocated payload-pointer.
Fix this obviously incorrect behavior by not treating
GIT_PASSTHROUGH in any special way. This is the correct thing to
do anyway, as the code indicates that the file should not be
passed through the filter.
Git for Windows 1.9.4 changed the behavior when the text=auto
attribute is specified and core.autocrlf=false. Previous observed
behavior would *not* filter files when going into the working
directory, the new behavior *does* filter. Update our behavior to match.
If you enabled core.safecrlf on an LF-ending platform, we would
error even for files with all LFs. We should only be warning on
irreversible mappings, I think.
Diff and status do not want core.safecrlf to actually raise an
error regardless of the setting, so this extends the filter API
with an additional options flags parameter and adds a flag so that
filters can be applied with GIT_FILTER_OPT_ALLOW_UNSAFE, indicating
that unsafe filter application should be downgraded from a failure
to a warning.
The git_buf_text_gather_stats call returns a boolean indicating if
the file looks like binary data. That shouldn't be an error; it
should be used to skip CRLF processing though.
This contains a few bug fixes and some header and API cleanups.
The main API change is that filters should now use GIT_PASSTHROUGH
to indicate that they wish to skip processing a file instead of
GIT_ENOTFOUND.
The bug fixes include a possible out-of-range buffer access in
the ident filter, a filter ordering problem I introduced into the
custom filter tests on Windows, and a filter buf NUL termination
issue that was coming up on Linux.
Checkout should not reject binary files from filters, as a filter
may actually wish to operate on binary files. The CRLF filter should
reject binary files itself if it wishes to. Moreover, the CRLF
filter requires this logic so that users can emulate the checkout
data in their odb -> workdir filtering.
Conflicts:
src/checkout.c
src/crlf.c
This makes the git_buf struct that was used internally into an
externally available structure and eliminates the git_buffer.
As part of that, some of the special cases that arose with the
externally used git_buffer were blended into the git_buf, such as
being careful about git_buf objects that may have a NULL ptr and
allowing for bufs with a valid ptr and size but zero asize as a
way of referring to externally owned data.
This adds the ident filter (that knows how to replace $Id$) and
tweaks the filter APIs and code so that git_filter_source objects
actually have the updated OID of the object being filtered when
it is a known value.
Extend the git2/sys/filter API with functions to look up a filter
and add it manually to a filter list. This requires some trickery
because the regular attribute lookups and checks are bypassed when
this happens, but in the right hands, it will allow a user to have
granular control over applying filters.
This moves the git_filter_list into the public API so that users
can create, apply, and dispose of filter lists. This allows more
granular application of filters to user data outside of libgit2
internals.
This also converts all the internal usage of filters to the public
APIs along with a few small tweaks to make it easier to use the
public git_buffer stuff alongside the internal git_buf.
The filter registry as implemented was too primitive to actually
work once multiple filters were coming into play. This expands
the implementation of the registry to handle multiple prioritized
filters correctly.
Additionally, this adds an "attributes" field to a filter that
makes it really really easy to implement filters that are based
on one or more attribute values. The lookup and even simple value
checking can all happen automatically without custom filter code.
Lastly, with the registry improvements, this fills out the filter
lifecycle callbacks, with initialize and shutdown callbacks that
will be called before the filter is first used and after it is
last invoked. This allows for system-wide initialization and
cleanup by the filter.
This creates include/sys/filter.h with a basic definition of a
git_filter and then converts the internal code to use it. There
are related internal objects (git_filter_list) that we will want
to publish at some point, but this is a first step.
This begins the process of exposing git_filter objects to the
public API. This includes:
* new public type and API for `git_buffer` through which an
allocated buffer can be passed to the user
* new API `git_blob_filtered_content`
* make the git_filter type and GIT_FILTER_TO_... constants public
This is a significant reorganization of the diff code to break it
into a set of more clearly distinct files and to document the new
organization. Hopefully this will make the diff code easier to
understand and to extend.
This adds a new `git_diff_driver` object that looks of diff driver
information from the attributes and the config so that things like
function content in diff headers can be provided. The full driver
spec is not implemented in the commit - this is focused on the
reorganization of the code and putting the driver hooks in place.
This also removes a few #includes from src/repository.h that were
overbroad, but as a result required extra #includes in a variety
of places since including src/repository.h no longer results in
pulling in the whole world.
This adds crlf/lf conversion functions into buf_text with more
efficient implementations that bypass the high level buffer
functions. They attempt to minimize the number of reallocations
done and they directly write the buffer data as needed if they
know that there is enough memory allocated to memcpy data.
Tests are added for these new functions. The crlf.c code is
updated to use the new functions.
Removed the include of buf_text.h from filter.h and just include
it more narrowly in the places that need it.
This adds a check to the drop_crlf filter path to check it the
file in the index already has a CR in it, in which case this will
not drop the CRs from the workdir file contents.
This uncovered a "bug" in `git_blob_create_fromworkdir` where the
full path to the file was passed to look up the attributes instead
of the relative path from the working directory root. This meant
that the check in the index for a pre-existing entry of the same
name was failing.
There are many scattered functions that look into the contents of
buffers to do various text manipulations (such as escaping or
unescaping data, calculating text stats, guessing if content is
binary, etc). This groups all those functions together into a
new file and converts the code to use that.
This has two enhancements to existing functionality. The old
text stats function is significantly rewritten and the BOM
detection code was extended (although largely we can't deal with
anything other than a UTF8 BOM).
The existing `git_odb_hashfile` does not apply text filtering
rules because it doesn't have a repository context to evaluate
the correct rules to apply. This adds a new hashfile function
that will apply repository-specific filters (based on config,
attributes, and filename) before calculating the hash.
Depending on the operation, we need to consider gitattributes
in both the work dir and the index. This adds a parameter to
all of the gitattributes related functions that allows user
control of attribute reading behavior (i.e. prefer workdir,
prefer index, only use index).
This fix also covers allowing us to check attributes (and
hence do diff and status) on bare repositories.
This was a somewhat larger change that I hoped because it had
to change the cache key used for gitattributes files.
This also includes droping `git_buf_lasterror` because it makes no sense
in the new system. Note that in most of the places were it has been
dropped, the code needs cleanup. I.e. GIT_ENOMEM is going away, so
instead it should return a generic `-1` and obviously not throw
anything.
The point of having `GIT_ATTR_TRUE` and `GIT_ATTR_FALSE` macros is to be
able to change the way that true and false values are stored inside of
the returned gitattributes value pointer.
However, if these macros are implemented as a simple rename for the
`git_attr__true` pointer, they will always be used with the `==`
operator, and hence we cannot really change the implementation to any
other way that doesn't imply using special pointer values and comparing
them!
We need to do the same thing that core Git does, which is using a
function macro. With `GIT_ATTR_TRUE(attr)`, we can change
internally the way that these values are stored to anything we want.
This commit does that, and rewrites a large chunk of the attributes test
suite to remove duplicated code for expected attributes, and to
properly test the function macro behavior instead of comparing
pointers.