If an image has an alpha map that has wide components, then we need to
use 64 bit processing for that image. We detect this situation in
pixman-image.c and remove the FAST_PATH_NARROW_FORMAT flag.
In pixman-general, the wide/narrow decision is now based on the flags
instead of on the formats.
This avoids a negative in the name. Also, by renaming the "wide"
variable in pixman-general.c to "narrow" and fixing up the logic
correspondingly, the code there reads a lot more straightforwardly.
- Test many more combinations of formats
- Test destination alpha maps
- Test various different alpha origins
Also add a transformation to the destination, but comment it out
because it is actually broken at the moment (and pretty difficult to
fix).
These variants of malloc() and free() try to surround the allocated
memory with protected pages so that out-of-bounds accessess will cause
a segmentation fault.
If mprotect() and getpagesize() are not available, these functions are
simply equivalent to malloc() and free().
This is the first demo implementation, it should be possible to
generalize it later to cover more operations with less lines of code.
It should be also possible to introduce the use of '__builtin_constant_p'
gcc builtin function for an efficient way of checking if 'unit_x' is known
to be zero at compile time (when processing padding pixels for NONE, or
PAD repeat).
Benchmarks from Intel Core i7 860:
== before (nearest OVER) ==
op=3, src_fmt=20028888, dst_fmt=20028888, speed=142.01 MPix/s
== after (nearest OVER) ==
op=3, src_fmt=20028888, dst_fmt=20028888, speed=314.99 MPix/s
== performance of nonscaled operation as a reference ==
op=3, src_fmt=20028888, dst_fmt=20028888, speed=652.09 MPix/s
Implemented very similar to PAD repeat.
And gcc also seems to be able to completely eliminate the
code responsible for left and right padding pixels for OVER
operation with NONE repeat.
When processing pixels from the left and right padding, the same
scanline function is used with 'unit_x' set to 0.
Actually appears that gcc can handle this quite efficiently. When
using 'restrict' keyword, it is able to optimize the whole operation
performed on left or right padding pixels to a small unrolled loop
(the code is reduced to a simple fill implementation):
9b30: 89 08 mov %ecx,(%rax)
9b32: 89 48 04 mov %ecx,0x4(%rax)
9b35: 48 83 c0 08 add $0x8,%rax
9b39: 49 39 c0 cmp %rax,%r8
9b3c: 75 f2 jne 9b30
Without 'restrict' keyword, there is one instruction more: reloading
source pixel data from memory in the beginning of each iteration. That
is slower, but also acceptable.
We need to implement a true PIXMAN_REPEAT_NONE support later (padding
the source with zero pixels). So it's better not to use PIXMAN_REPEAT_NONE
for handling FAST_PATH_SAMPLES_COVER_CLIP special case.
Scanline processing is now split into a separate function. This provides
an easy way of overriding it with a platform specific implementation,
which may use SIMD optimizations. Only basic C data types are used as
the arguments for this function, so it may be implemented entirely in
assembly or be generated by some JIT engine.
Also as a result of this split, the complexity of code is reduced a
bit and now it should be easier to introduce support for the currently
missing NONE, PAD and REFLECT repeat types.
These macros with some modifications can can be reused later by
various platform specific implementations, introducing SIMD
optimizations for nearest scaling fast paths.
Added a pair of macros which can help to detect corruption
of floating point registers after a function call. This may
happen if _mm_empty() call is forgotten in MMX/SSE2 fast
path code, or ARM NEON assembly optimized function
forgets to save/restore d8-d15 registers before use.
The rule is that the region passed in must be initialized and that the
region returned will still be valid. Ie., the lifecycle is the
responsibility of the caller, regardless of what the function returns.
Previously, compute_composite_region32() would finalize the region and
then return FALSE, and then the caller would finalize the region
again, leading to memory corruption in some cases.
This patch adresses the issue discussed in
http://lists.freedesktop.org/archives/pixman/2010-April/000163.html
There were only two clashing identifiers. The first one is IN, which
obviously causes problems in Pixman for lines like
PIXMAN_STD_FAST_PATH (IN, solid, a8, a8, fast_composite_in_n_8_8),
Fortunately the mingw headers provide a solution: by defining
_NO_W32_PSEUDO_MODIFIERS, these stupid symbols are skipped.
The other name is UINT64, used in pixman-mmx.c. I renamed that
function to to_uint64, but may be another name is more appropriate.
From time to time people run into issues where the configure script
detects GTK+ when it is either not installed, or not functional due to
a missing pixman. Most recently:
https://bugs.freedesktop.org/show_bug.cgi?id=29736
This patch makes the configure script more paranoid by
- always using PKG_CHECK_MODULES and not PKG_CHECK_EXISTS, since it
seems PKG_CHECK_EXISTS will sometimes return true even if a dependency
of GTK+, such as pixman-1, is missing.
- explicitly checking that pixman-1 is installed before enabling GTK+.
Cc: my.somewhat.lengthy.loginname@gmail.com
There is not much point having a separate function that just validates
the images. Also add a boolean return to lookup_composite_function()
so that we can return if no composite function is found.
Fixes the region-translate test case by clipping region translations to
the newly defined PIXMAN_REGION_MIN/MAX and using the newly introduced
type overflow_int_t to check for the overflow.
Also uses INT16_MAX or INT32_MAX for these values instead of relying on
the size of short and int types.
It doesn't make sense in other cases, and the computation would make
use of image->bits.{width,height} which lead to uninitialized memory
accesses when the image wasn't of type BITS.
This flag is set whenever the pixels of a bits image don't have an
alpha channel. Together with FAST_PATH_SAMPLES_COVER_CLIP it implies
that the image effectively is opaque, so we can do operator reductions
such as OVER->SRC.
If someone tries to set an alpha map that itself has an alpha map,
simply return. Also, if someone tries to add an alpha map to an image
that is being _used_ as an alpha map, simply return.
This ensures that an alpha map can never have an alpha map.
This tests what happens if you attempt to make an image with an alpha
map that has the image as its alpha map. This results in an infinite
loop in _pixman_image_validate(), so the test sets up a SIGALRM to
exit if it runs for more than five seconds.
Similarly to how the fast paths are done, put the various bits_image
fetchers in a table, so that we can quickly find the best one based on
the image's flags and format.
The flags are:
* AFFINE_TRANSFORM, for affine transforms
* Y_UNIT_ZERO, for when the 10 entry in the transformation is zero
* FILTER_BILINEAR, for when the image has a bilinear filter
* NO_NORMAL_REPEAT, for when the repeat mode is not NORMAL
* HAS_TRANSFORM, for when the transform is not NULL
Also add some new FAST_PATH_REPEAT_* macros. These are just shorthands
for the image not having any of the other repeat modes. For example
REPEAT_NORMAL is (NO_NONE | NO_PAD | NO_REFLECT).
These functions can simply be passed as arguments to the various pixel
fetchers. We don't need to store them. Since they are known at compile
time and the pixel fetchers are force_inline, this is not a
performance issue.
Also temporarily make all pixel access go through the alpha path.
This commit fixes two separate problems: 1. Incorrect computation of
the FAST_PATH_SAMPLES_COVER_CLIP flag, and 2. FAST_PATH_16BIT_SAFE is
a nonsensical thing to compute.
== 1. Incorrect computation of SAMPLES_COVER_CLIP:
Previously we were using pixman_transform_bounds() to compute which
source samples would be used for a composite operation. This is
incorrect for several reasons:
(a) pixman_transform_bounds() is transforming the integer bounding box
of the destination samples, where it should be transforming the
bounding box of the samples themselves. In other words, it is too
pessimistic in some cases.
(b) pixman_transform_bounds() is not rounding the same way as we do
during sampling. For example, for a NEAREST filter we subtract
pixman_fixed_e before rounding off to the nearest sample so that a
transformed value of 1 will round to the sample at 0.5 and not to the
one at 1.5. However, pixman_transform_bounds() would simply truncate
to 1 which would imply that the first sample to be used was the one at
1.5. In other words, it is too optimistic in some cases.
(c) The result of pixman_transform_bounds() does not account for the
interpolation filter applied to the source.
== 2. FAST_PATH_16BIT_SAFE is nonsensical
The FAST_PATH_16BIT_SAFE is a flag that indicates that various
computations can be safely done within a 16.16 fixed-point
variable. It was used by certain fast paths who relied on those
computations succeeding. The problem is that many other compositing
functions were making similar assumptions but not actually requiring
the flag to be set. Notably, all the general compositing functions
simply walk the source region using 16.16 variables. If the
transformation happens to overflow, strange things will happen.
So instead of computing this flag in certain cases, it is better to
simply detect that overflows will happen and not try to composite at
all in that case. This has the advantage that most compositing
functions can be written naturally way.
It does have the disadvantage that we are giving up on some cases that
previously worked, but those are all corner cases where the areas
involved were very close to the limits of the coordinate
system. Relying on these working reliably was always a somewhat
dubious proposition. The most important case that might have worked
previously was untransformed compositing involving images larger than
32 bits. But even in those cases, if you had REPEAT_PAD or
REPEAT_REFLECT turned on, you would hit bits_image_fetch_transformed()
which has the 16 bit limitations.
== Fixes
This patch fixes both problems by introducing a new function called
analyze_extents() that has the responsibility to reject corner cases,
and to compute flags based on the extents.
It does this through a new compute_sample_extents() function that will
compute a conservative (but tight) approximation to the bounding box
of the samples that will actually be needed. By basing the computation
on the positions of the _sample_ locations in the destination, and by
taking the interpolation filter into account, it fixes problem one.
The same function is also used with a one-pixel expanded version of
the destination extents. By checking if the transformed bounding box
will overflow 16.16 fixed point, it fixes problem two.