When b is 0, avoid the division by zero and just return transparent
black.
When the solution t would have an invalid radius (negative or outside
[0,1] for none-extended gradients), return transparent black.
All objects using test/util.c fail to link:
| CCLD region-test
| /usr/bin/ld: utils.o: in function enable_fp_exceptions:utils.c(.text+0x939): error: undefined reference to 'feenableexcept'
There's indeed no explicit dependency on -lm, and if HAVE_FEENABLEEXCEPT
happens to be set, test/util.c uses feenableexcept(), which is nowhere
to be found while linking.
Fix this by adding -lm to TEST_LDADD, although two alternatives could be
thought of:
- Only specifying -lm for objects using util.c.
- Introducing a conditional to add -lm only when configure detects
have_feenableexcept=yes.
Signed-off-by: Cyril Brulebois <kibi@debian.org>
Remove a stray #include <fenv.h> added in commit 2444b2265a
to fix compilation on platforms which don't have fenv.h
Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
This test program tries to use as many rarely-used features as
possible, including alpha maps, accessor functions, oddly-sized
images, strange transformations, conical gradients, etc.
The hope is to provoke crashes or irregular behavior in pixman.
The temporary scanline buffer allocated on stack was declared
as uint8_t array. As a result, the compiler was free to select
any arbitrary alignment for it (even though there is typically
no reason to use really weird alignments here and the stack is
normally at least 4 bytes aligned on most platforms). Having
improper alignment is non-portable and can impact performance
or even make the code misbehave depending on the target platform.
Using uint64_t type for this array should ensure that any possible
memory accesses done by pixman code are going to be handled correctly
(pixman-combine64.c can access this buffer via uint64_t * pointer).
Some alignment related problem was reported in:
http://lists.freedesktop.org/archives/pixman/2010-November/000747.html
With this optimization added, pixman assisted conversion from
non-premultiplied to premultiplied alpha format is now fully
NEON optimized (both with and without R/B color components
swapping in the process).
Not all types of operations can be skipped when having transparent
solid source or transparent solid mask. Add an extra flags parameter
for providing this information to the wrappers.
Renamed suppementary macros from 'over_n_8_0565' to 'over_8888_8_0565',
because they can actually support all variants of this operation:
over_8888_8_0565/over_n_8_0565/over_8888_n_0565.
Also 'over_8888_8_0565' now uses more optimized common code instead of its
own variant, improving performance a bit. Even though this operation is
still memory bandwidth limited, scaled variants of these fast paths may
put more stress on CPU later.
Benchmarked on ARM Cortex-A8 @500MHz:
== before ==
over_8888_8_0565 = L1: 67.10 L2: 53.82 M: 44.70 (105.17%)
HT: 18.73 VT: 16.91 R: 14.25 RT: 4.80 (52Kops/s)
== after ==
over_8888_8_0565 = L1: 77.83 L2: 58.14 M: 44.82 (105.52%)
HT: 20.58 VT: 17.44 R: 15.05 RT: 4.88 (52Kops/s)
Code rearranged to get better instructions scheduling for ARM Cortex-A8/A9.
Now it is ~30% faster for the pixel data in L1 cache and makes better use
of memory bandwidth when running at lower clock frequencies (ex. 500MHz).
Also register d24 (pixels from the mask image) is now not clobbered by
supplementary macros, which allows to reuse them for the other variants
of compositing operations later.
Benchmark from ARM Cortex-A8 @500MHz:
== before ==
over_n_8_0565 = L1: 63.90 L2: 63.15 M: 60.97 ( 73.53%)
HT: 28.89 VT: 24.14 R: 21.33 RT: 6.78 ( 67Kops/s)
== after ==
over_n_8_0565 = L1: 82.64 L2: 75.19 M: 71.52 ( 84.14%)
HT: 30.49 VT: 25.56 R: 22.36 RT: 6.89 ( 68Kops/s)
This macro hides the implementation details of pixels fetching
for the mask image just like 'fetch_src_pixblock' does for the
source image. This provides more possibilities for reusing the
same code blocks in different compositing functions.
This patch does not introduce any functional changes and the
resulting code in the compiled object file is exactly the same.
There are versions for all combinations of x8r8g8b8/a8r8g8b8 and
pad/repeat/none/normal repeat modes. The bulk of each function is an
inline function that takes a format and a repeat mode as parameters.
Radial gradients are "conical", thus they can have some non-opaque
parts even if all of their stops are completely opaque.
To guarantee that a radial gradient is actually opaque, it needs to
also have one of the two circles containing the other one. In this
case when extrapolating, the whole plane is completely covered (as
explained in the comment in pixman-radial-gradient.c).
The performance improvement is only in the ballpark of 5% when
compared against C code built with a reasonably good compiler
(gcc 4.5.1). But gcc 4.4 produces approximately 30% slower code
here, so assembly optimization makes sense to avoid dependency
on the compiler quality and/or optimization options.
Benchmark from ARM11:
== before ==
op=1, src_fmt=10020565, dst_fmt=10020565, speed=34.86 MPix/s
== after ==
op=1, src_fmt=10020565, dst_fmt=10020565, speed=36.62 MPix/s
Benchmark from ARM Cortex-A8:
== before ==
op=1, src_fmt=10020565, dst_fmt=10020565, speed=89.55 MPix/s
== after ==
op=1, src_fmt=10020565, dst_fmt=10020565, speed=94.91 MPix/s