1929 Commits

Author SHA1 Message Date
Søren Sandmann Pedersen
51a5e949f3 Virtualize iterator initialization
Make src_iter_init() and dest_iter_init() virtual methods in the
implementation struct. This allows individual implementations to plug
in their own CPU specific scanline fetchers.
2011-01-18 12:42:26 -05:00
Søren Sandmann Pedersen
6503c6edcc Move iterator initialization to the respective image files
Instead of calling _pixman_image_get_scanline_32/64(), move the
iterator initialization into the respective image implementations and
call the scanline generators directly.
2011-01-18 12:42:26 -05:00
Søren Sandmann Pedersen
23c6e1d2c0 Eliminate the _pixman_image_store_scanline_32/64 functions
They were only called from next_line_write_narrow/wide, so they could
simply be absorbed into those functions.
2011-01-18 12:42:25 -05:00
Søren Sandmann Pedersen
b2c9eaa502 Move initialization of iterators for bits images to pixman-bits-image.c
pixman_iter_t is now defined in pixman-private.h, and iterators for
bits images are being initialized in pixman-bits-image.c
2011-01-18 12:42:25 -05:00
Søren Sandmann Pedersen
15b1645c7b Add iterators in the general implementation
We add a new structure called a pixman_iter_t that encapsulates the
information required to read scanlines from an image. It contains two
functions, get_scanline() and write_back(). The get_scanline()
function will generate pixels for the current scanline. For iterators
for source images, it will also advance to the next scanline. The
write_back() function is only called for destination images. Its
function is to write back the modified pixels to the image and then
advance to the next scanline.

When an iterator is initialized, it is passed this information:

   - The image to iterate

   - The rectangle to be iterated

   - A buffer that the iterator may (but is not required to) use. This
     buffer is guaranteed to have space for at least width pixels.

   - A flag indicating whether a8r8g8b8 or a16r16g16b16 pixels should
     be fetched

There are a number of (eventual) benefits to the iterators:

   - The initialization of the iterator can be virtualized such that
     implementations can plug in their own CPU specific get_scanline()
     and write_back() functions.

   - If an image is horizontal, it can simply plug in an appropriate
     get_scanline(). This way we can get rid of the annoying
     classify() virtual function.

   - In general, iterators can remember what they did on the last
     scanline, so for example a REPEAT_NONE image might reuse the same
     data for all the empty scanlines generated by the zero-extension.

   - More detailed information can be passed to the iterator, allowing
     more specialized fetchers to be used.

   - We can fix the bug where destination filters and transformations
     are not currently being ignored as they should be.

However, this initial implementation is not optimized at all. We lose
several existing optimizations:

   - The ability to composite directly in the destination
   - The ability to only fetch one scanline for horizontal images
   - The ability to avoid fetching the src and mask for the CLEAR
     operator

Later patches will re-introduce these optimizations.
2011-01-18 12:42:25 -05:00
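The iterator described in commit 15b1645c7b can be sketched roughly as follows. This is a minimal illustration, not pixman's actual pixman_iter_t: every name prefixed with sketch_ is hypothetical, only a8r8g8b8 pixels are handled (the a16r16g16b16 flag is omitted), and the "composite" step is a plain copy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an image: a8r8g8b8 pixels only. */
typedef struct
{
    uint32_t *bits;
    int       width, height;
    int       stride;              /* row pitch, in pixels */
} sketch_image_t;

typedef struct sketch_iter sketch_iter_t;

struct sketch_iter
{
    sketch_image_t *image;
    int             x, y, width;   /* left edge, current row, pixels per row */
    uint32_t       *buffer;        /* scratch space for at least width pixels */
    uint32_t     *(*get_scanline) (sketch_iter_t *iter);
    void          (*write_back)   (sketch_iter_t *iter);
};

static uint32_t *
current_row (sketch_iter_t *iter)
{
    return iter->image->bits + (size_t) iter->y * iter->image->stride + iter->x;
}

/* Fetch the current scanline into the scratch buffer without advancing. */
static uint32_t *
dest_get_scanline (sketch_iter_t *iter)
{
    uint32_t *row = current_row (iter);
    int i;

    for (i = 0; i < iter->width; i++)
        iter->buffer[i] = row[i];
    return iter->buffer;
}

/* Source iterators fetch the current scanline and then advance. */
static uint32_t *
src_get_scanline (sketch_iter_t *iter)
{
    uint32_t *pixels = dest_get_scanline (iter);

    iter->y++;
    return pixels;
}

/* Destination iterators advance only after the pixels are written back. */
static void
dest_write_back (sketch_iter_t *iter)
{
    uint32_t *row = current_row (iter);
    int i;

    for (i = 0; i < iter->width; i++)
        row[i] = iter->buffer[i];
    iter->y++;
}

static void
sketch_iter_init (sketch_iter_t *iter, sketch_image_t *image,
                  int x, int y, int width, uint32_t *buffer, int is_dest)
{
    iter->image = image;
    iter->x = x;
    iter->y = y;
    iter->width = width;
    iter->buffer = buffer;
    iter->get_scanline = is_dest ? dest_get_scanline : src_get_scanline;
    iter->write_back = is_dest ? dest_write_back : NULL;
}

/* Copy a rectangle from src to dest, one scanline at a time. */
static void
sketch_copy_rect (sketch_image_t *src, sketch_image_t *dest,
                  int x, int y, int width, int height,
                  uint32_t *src_buf, uint32_t *dest_buf)
{
    sketch_iter_t s, d;
    int i, j;

    sketch_iter_init (&s, src, x, y, width, src_buf, 0);
    sketch_iter_init (&d, dest, x, y, width, dest_buf, 1);

    for (j = 0; j < height; j++)
    {
        uint32_t *sp = s.get_scanline (&s);
        uint32_t *dp = d.get_scanline (&d);

        for (i = 0; i < width; i++)
            dp[i] = sp[i];
        d.write_back (&d);
    }
}
```

Because the two function pointers are filled in at initialization time, a CPU-specific implementation could substitute its own fetchers in a function like sketch_iter_init() without touching the loop in sketch_copy_rect().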
Siarhei Siamashka
255d624e50 ARM: do /proc/self/auxv based cpu features detection only in linux
This method is Linux-specific, but previously it was tried on any platform
that did not define the _MSC_VER macro.
2011-01-16 23:40:38 +02:00
Siarhei Siamashka
2bbd553bd2 A new configure option --enable-static-testprogs
This option can be used for building fully static binaries of the test
programs so that they can be easily run using qemu-user. With binfmt-misc
configured, 'make check' works fine for crosscompiled pixman builds.
2011-01-16 23:40:34 +02:00
Siarhei Siamashka
55bbccf84e Make 'fast_composite_scaled_nearest_*' less suspicious
Taking the address of a variable and then using it as an array looks suspicious
to static code analyzers, so change it into an array with 1 element to make
them happy. Both the old and new variants of this code are correct because the
'vx' and 'unit_x' arguments are set to 0, which means that the called scanline
function can only access a single element of the 'zero' buffer.
2011-01-16 22:32:33 +02:00
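The change in commit 55bbccf84e can be illustrated with a small sketch. The functions sketch_scanline() and sketch_fill_zero() are hypothetical and much simpler than the real fast paths; they only show why a one-element array is equivalent to the address of a scalar when both 'vx' and 'unit_x' are 0.

```c
#include <stdint.h>

/* Hypothetical nearest-neighbour scanline function: for each output pixel
 * it reads src[vx >> 16] and then steps vx by unit_x (16.16 fixed point). */
static void
sketch_scanline (uint32_t *dst, const uint32_t *src, int width,
                 int32_t vx, int32_t unit_x)
{
    int i;

    for (i = 0; i < width; i++)
    {
        dst[i] = src[vx >> 16];
        vx += unit_x;
    }
}

static void
sketch_fill_zero (uint32_t *dst, int width)
{
    /* A one-element array instead of "uint32_t zero = 0; ... &zero".
     * With vx == 0 and unit_x == 0 only zero[0] is ever read. */
    static const uint32_t zero[1] = { 0 };

    sketch_scanline (dst, zero, width, 0, 0);
}
```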
Siarhei Siamashka
ae70b38d40 Bugfix for a corner case in 'pixman_transform_is_inverse'
When 'pixman_transform_multiply' fails, the result of the multiplication
cannot be the identity matrix (one of the values in the resulting
matrix can't be represented as a 16.16 fixed-point value), so it is safe
to return FALSE.
2011-01-16 22:32:02 +02:00
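The reasoning in commit ae70b38d40 can be sketched as below. The types and the functions sketch_multiply() and sketch_is_inverse() are hypothetical stand-ins, not pixman's own code; the multiplication here simply reports failure when an entry does not fit in a 32-bit 16.16 value.

```c
#include <stdbool.h>
#include <stdint.h>

typedef int32_t sketch_fixed_t;                 /* 16.16 fixed point */
#define SKETCH_FIXED_1 ((sketch_fixed_t) 0x10000)

typedef struct { sketch_fixed_t m[3][3]; } sketch_transform_t;

/* Multiply two 3x3 fixed-point matrices. Returns false if any entry of the
 * product cannot be represented as a 16.16 value. */
static bool
sketch_multiply (sketch_transform_t *dst,
                 const sketch_transform_t *l, const sketch_transform_t *r)
{
    sketch_transform_t d;
    int i, j, k;

    for (i = 0; i < 3; i++)
    {
        for (j = 0; j < 3; j++)
        {
            int64_t v = 0;

            for (k = 0; k < 3; k++)
                v += ((int64_t) l->m[i][k] * r->m[k][j]) / 65536;
            if (v > INT32_MAX || v < INT32_MIN)
                return false;
            d.m[i][j] = (sketch_fixed_t) v;
        }
    }
    *dst = d;
    return true;
}

/* A failed multiplication means some entry is out of range, so the product
 * cannot be the identity matrix and the answer is simply "no". */
static bool
sketch_is_inverse (const sketch_transform_t *a, const sketch_transform_t *b)
{
    sketch_transform_t t;
    int i, j;

    if (!sketch_multiply (&t, a, b))
        return false;

    for (i = 0; i < 3; i++)
        for (j = 0; j < 3; j++)
            if (t.m[i][j] != (i == j ? SKETCH_FIXED_1 : 0))
                return false;
    return true;
}
```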
Siarhei Siamashka
ab3809f4da Workaround for a preprocessor issue in old Sun Studio
Patch from Peter O'Gorman with some modifications

https://bugs.freedesktop.org//show_bug.cgi?id=32764
2011-01-16 20:48:39 +02:00
Siarhei Siamashka
f5c0a60ac8 Fix for "syntax error: empty declaration" Solaris Studio warnings
2011-01-16 20:48:13 +02:00
Siarhei Siamashka
c71e24c9fc Revert "Fix "syntax error: empty declaration" warnings."
This reverts commit b924bb1f81.

There is a better fix for these Solaris Studio warnings.
2011-01-16 20:47:56 +02:00
Andrea Canciani
29439bd772 Improve handling of tangent circles
When b is 0, avoid the division by zero and just return transparent
black.

When the solution t would have an invalid radius (negative or outside
[0,1] for non-extended gradients), return transparent black.
2011-01-12 22:04:33 +01:00
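The two checks in commit 29439bd772 might look roughly like the sketch below. It assumes the degenerate (tangent) case reduces to a linear equation of the form -2*b*t + c = 0 and that the radius at parameter t is r1 + t*dr; both the equation form and every name here are assumptions made for illustration, not taken from the commit itself.

```c
#include <stdbool.h>

/* Returns true and stores the solution in *t_out when it is usable.
 * Returns false when the caller should emit transparent black instead. */
static bool
sketch_solve_tangent (double b, double c, double r1, double dr,
                      bool extended, double *t_out)
{
    double t;

    /* Avoid dividing by zero when b is 0. */
    if (b == 0.0)
        return false;

    t = 0.5 * c / b;

    /* Non-extended gradients only cover t in [0,1]. */
    if (!extended && (t < 0.0 || t > 1.0))
        return false;

    /* A negative radius is never valid. */
    if (r1 + t * dr < 0.0)
        return false;

    *t_out = t;
    return true;
}
```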
Søren Sandmann Pedersen
a484a9c49c sse2: Skip src pixels that are zero in sse2_composite_over_8888_n_8888()
This is a big speed-up in the SVG helicopter game:

   http://ie.microsoft.com/testdrive/Performance/Helicopter/Default.xhtml

when rendered by Firefox 4 since it is compositing big images
consisting almost entirely of zeros.
2010-12-20 19:37:11 -05:00
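The idea behind commit a484a9c49c can be shown in scalar C. The real change is in SSE2 code; the sketch below is a hypothetical, unvectorized equivalent that assumes premultiplied a8r8g8b8 pixels and a solid mask given as a single alpha value.

```c
#include <stdint.h>

/* Multiply one 8-bit channel by an 8-bit alpha, i.e. round (c * a / 255). */
static uint8_t
mul_un8 (uint8_t c, uint8_t a)
{
    unsigned t = (unsigned) c * a + 0x80;

    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* Scale all four channels of a pixel by alpha. */
static uint32_t
scale_pixel (uint32_t p, uint8_t a)
{
    return ((uint32_t) mul_un8 ((uint8_t) (p >> 24), a) << 24) |
           ((uint32_t) mul_un8 ((uint8_t) (p >> 16), a) << 16) |
           ((uint32_t) mul_un8 ((uint8_t) (p >> 8), a) << 8) |
            (uint32_t) mul_un8 ((uint8_t) p, a);
}

/* OVER for premultiplied pixels: s + d * (255 - alpha (s)) / 255.
 * With valid premultiplied input no channel overflows into its neighbour. */
static uint32_t
over (uint32_t s, uint32_t d)
{
    return s + scale_pixel (d, (uint8_t) (255 - (s >> 24)));
}

static void
sketch_over_8888_n_8888 (uint32_t *dst, const uint32_t *src,
                         uint8_t mask_alpha, int width)
{
    int i;

    for (i = 0; i < width; i++)
    {
        uint32_t s = src[i];

        /* A fully zero source pixel leaves the destination unchanged,
         * so the multiply and the blend can be skipped entirely. */
        if (s == 0)
            continue;

        dst[i] = over (scale_pixel (s, mask_alpha), dst[i]);
    }
}
```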
Søren Sandmann Pedersen
2610323545 Fix divide-by-zero in set_lum().
When (l - min) or (max - l) are zero, simply set all the channels to
the limit, 0 in the case of (l - min), and a in the case of (max - l).
2010-12-20 19:37:11 -05:00
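The guard described in commit 2610323545 can be sketched in floating point. The function sketch_clip_color() is hypothetical and only mirrors the clipping step the message talks about (l is the luminosity, a the alpha limit); pixman's own set_lum() may differ in types and details.

```c
/* Clip three colour channels into [0, a] while keeping luminosity l.
 * When (l - min) or (max - l) is zero the usual scaling would divide by
 * zero, so the channels are set straight to the limit instead. */
static void
sketch_clip_color (double c[3], double l, double a)
{
    double min = c[0], max = c[0];
    int i;

    for (i = 1; i < 3; i++)
    {
        if (c[i] < min)
            min = c[i];
        if (c[i] > max)
            max = c[i];
    }

    if (min < 0.0)
    {
        for (i = 0; i < 3; i++)
        {
            if (l - min == 0.0)
                c[i] = 0.0;
            else
                c[i] = l + (c[i] - l) * l / (l - min);
        }
    }

    if (max > a)
    {
        for (i = 0; i < 3; i++)
        {
            if (max - l == 0.0)
                c[i] = a;
            else
                c[i] = l + (c[i] - l) * (a - l) / (max - l);
        }
    }
}
```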
Søren Sandmann Pedersen
3479050216 Add a test compositing with the various PDF operators.
The test has floating point exceptions enabled, and currently fails
with a divide-by-zero.
2010-12-20 19:37:11 -05:00
Cyril Brulebois
45a2d01077 Fix linking issues when HAVE_FEENABLEEXCEPT is set.
All objects using test/utils.c fail to link:
|   CCLD   region-test
| /usr/bin/ld: utils.o: in function enable_fp_exceptions:utils.c(.text+0x939): error: undefined reference to 'feenableexcept'

There's indeed no explicit dependency on -lm, and if HAVE_FEENABLEEXCEPT
happens to be set, test/utils.c uses feenableexcept(), which is nowhere
to be found while linking.

Fix this by adding -lm to TEST_LDADD, although two alternatives could be
thought of:
 - Only specifying -lm for objects using utils.c.
 - Introducing a conditional to add -lm only when configure detects
   have_feenableexcept=yes.

Signed-off-by: Cyril Brulebois <kibi@debian.org>
2010-12-20 09:55:07 -05:00
Jon TURNEY
303de045ff Remove stray #include <fenv.h>
Remove a stray #include <fenv.h> added in commit 2444b2265a
to fix compilation on platforms which don't have fenv.h

Signed-off-by: Jon TURNEY <jon.turney@dronecode.org.uk>
2010-12-18 16:29:12 -05:00
Søren Sandmann Pedersen
f914cf4486 Add a stress-test program.
This test program tries to use as many rarely-used features as
possible, including alpha maps, accessor functions, oddly-sized
images, strange transformations, conical gradients, etc.

The hope is to provoke crashes or irregular behavior in pixman.
2010-12-17 17:03:29 -05:00
Søren Sandmann Pedersen
7d7b03c091 Make the argument to fence_malloc() an int64_t
That way we can detect if someone attempts to allocate a negative size
and abort instead of just returning NULL and segfaulting later.
2010-12-17 17:01:52 -05:00
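The benefit of the signed argument in commit 7d7b03c091 can be shown with a simplified sketch. The function sketch_checked_malloc() is hypothetical and uses plain malloc(); it does not reproduce whatever fence_malloc() does beyond the size check the message describes.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* With a size_t parameter, a negative size computed by the caller would
 * silently wrap to a huge value and the failure would surface later as a
 * NULL dereference. With int64_t the mistake is visible right here. */
static void *
sketch_checked_malloc (int64_t len)
{
    if (len < 0)
    {
        fprintf (stderr, "negative allocation size: %lld\n", (long long) len);
        abort ();
    }

    if ((uint64_t) len > SIZE_MAX)
        return NULL;

    return malloc ((size_t) len);
}
```

A call such as sketch_checked_malloc ((int64_t) width * height * 4) with a negative width would then abort at the call site rather than return NULL.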
Søren Sandmann Pedersen
d41522113e test/utils.c: Initialize palette->rgba to 0.
That way it can be used with palettes that are not statically
allocated, without causing valgrind issues.
2010-12-17 16:57:53 -05:00
Søren Sandmann Pedersen
337f0bff0d test: Move palette initialization to utils.[ch]
2010-12-17 16:57:53 -05:00
Søren Sandmann Pedersen
2444b2265a Extend gradient-crash-test
Test the gradients with various transformations, and test cases where
the gradients are specified with two identical points.
2010-12-17 16:57:38 -05:00
Søren Sandmann Pedersen
de2e51dacb Add enable_fp_exceptions() function in utils.[ch]
This function enables floating point traps if possible.
2010-12-17 16:57:18 -05:00
Søren Sandmann Pedersen
a2afcc9ba4 test: Make composite test use some existing macros instead of defining its own
Also move the ARRAY_LENGTH macro into utils.h so it can be used elsewhere.
2010-12-17 16:57:18 -05:00
Siarhei Siamashka
4d8d2fa47e COPYING: added Nokia to the list of copyright holders
2010-12-17 15:34:16 +02:00
Siarhei Siamashka
3d094997b1 Fix for potential unaligned memory accesses
The temporary scanline buffer allocated on the stack was declared
as a uint8_t array. As a result, the compiler was free to select
any arbitrary alignment for it (even though there is typically
no reason to use really weird alignments here and the stack is
normally at least 4-byte aligned on most platforms). Relying on
improper alignment is non-portable and can impact performance
or even make the code misbehave depending on the target platform.

Using uint64_t type for this array should ensure that any possible
memory accesses done by pixman code are going to be handled correctly
(pixman-combine64.c can access this buffer via uint64_t * pointer).

An alignment-related problem was reported in:
http://lists.freedesktop.org/archives/pixman/2010-November/000747.html
2010-12-07 02:10:51 +02:00
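The fix in commit 3d094997b1 amounts to changing the declared element type of the stack buffer. The sketch below is hypothetical (the buffer size and function name are made up); it only shows that a buffer declared as uint64_t can be filled byte by byte and still be read safely as 64-bit values.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BUFFER_BYTES 8192    /* hypothetical size */

static uint64_t
sketch_first_wide_pixel (uint8_t fill)
{
    /* Declared as uint64_t, so the compiler must align it for 64-bit
     * access. A "uint8_t buf[SKETCH_BUFFER_BYTES]" carries no such
     * guarantee. */
    uint64_t scanline_buffer[SKETCH_BUFFER_BYTES / sizeof (uint64_t)];
    uint8_t *narrow = (uint8_t *) scanline_buffer;
    size_t i;

    /* Byte-wise access through the narrow pointer is always fine. */
    for (i = 0; i < sizeof (scanline_buffer); i++)
        narrow[i] = fill;

    /* Wide access is fine too, because of the declared type. */
    return scanline_buffer[0];
}
```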
Siarhei Siamashka
985e59a82f ARM: added 'neon_src_rpixbuf_8888' fast path
With this optimization added, pixman assisted conversion from
non-premultiplied to premultiplied alpha format is now fully
NEON optimized (both with and without R/B color components
swapping in the process).
2010-12-07 02:10:35 +02:00
Siarhei Siamashka
733f68912f ARM: added 'neon_composite_in_n_8' fast path
2010-12-03 15:38:04 +02:00
Siarhei Siamashka
af7a69d90e ARM: added flags parameter to some asm fast path wrapper macros
Not all types of operations can be skipped when having transparent
solid source or transparent solid mask. Add an extra flags parameter
for providing this information to the wrappers.
2010-12-03 15:38:00 +02:00
Siarhei Siamashka
f6843e3797 ARM: added 'neon_composite_add_8888_n_8888' fast path
2010-12-03 15:37:54 +02:00
Siarhei Siamashka
b066b520df ARM: added 'neon_composite_add_n_8_8888' fast path
2010-12-03 15:37:49 +02:00
Siarhei Siamashka
1fba779036 ARM: better NEON instructions scheduling for add_8888_8888_8888
Provides a minor performance improvement by using pipelining and hiding
instruction latencies. Also do not clobber the d0-d3 registers (source
image pixels) while doing calculations, so that the same macro can be
used for the add_n_8_8888 fast path later.

Benchmark from ARM Cortex-A8 @500MHz:

== before ==

  add_8888_8888_8888 = L1:  95.94  L2:  42.27  M: 25.60 (121.09%)
                       HT:  14.54  VT:  13.13  R: 12.77  RT:  4.49 (48Kops/s)
     add_8888_8_8888 = L1: 104.51  L2:  57.81  M: 36.06 (106.62%)
                       HT:  19.24  VT:  16.45  R: 14.71  RT:  4.80 (51Kops/s)

== after ==

  add_8888_8888_8888 = L1: 106.66  L2:  47.82  M: 27.32 (129.30%)
                       HT:  15.44  VT:  13.96  R: 12.86  RT:  4.48 (48Kops/s)
     add_8888_8_8888 = L1: 107.72  L2:  61.02  M: 38.26 (113.16%)
                       HT:  19.48  VT:  16.72  R: 14.82  RT:  4.80 (51Kops/s)
2010-12-03 15:37:44 +02:00
Siarhei Siamashka
c3f48b6aa2 ARM: added 'neon_composite_add_8888_8_8888' fast path
2010-12-03 15:37:40 +02:00
Siarhei Siamashka
6d2f7f981b ARM: added 'neon_composite_over_0565_n_0565' fast path
2010-12-03 15:37:23 +02:00
Siarhei Siamashka
3990931bf6 ARM: reuse common NEON code for over_{n_8|8888_n|8888_8}_0565
Renamed supplementary macros from 'over_n_8_0565' to 'over_8888_8_0565',
because they can actually support all variants of this operation:
over_8888_8_0565/over_n_8_0565/over_8888_n_0565.

Also 'over_8888_8_0565' now uses more optimized common code instead of its
own variant, improving performance a bit. Even though this operation is
still memory bandwidth limited, scaled variants of these fast paths may
put more stress on CPU later.

Benchmarked on ARM Cortex-A8 @500MHz:

== before ==

    over_8888_8_0565 =  L1:  67.10  L2:  53.82  M: 44.70 (105.17%)
                        HT:  18.73  VT:  16.91  R: 14.25  RT:  4.80 (52Kops/s)

== after ==

    over_8888_8_0565 =  L1:  77.83  L2:  58.14  M: 44.82 (105.52%)
                        HT:  20.58  VT:  17.44  R: 15.05  RT:  4.88 (52Kops/s)
2010-12-03 15:37:19 +02:00
Siarhei Siamashka
a7c36681c0 ARM: added 'neon_composite_over_8888_n_0565' fast path
2010-12-03 15:37:15 +02:00
Siarhei Siamashka
e6814837a6 ARM: better NEON instructions scheduling for over_n_8_0565
Code rearranged to get better instruction scheduling for ARM Cortex-A8/A9.
Now it is ~30% faster for pixel data in L1 cache and makes better use
of memory bandwidth when running at lower clock frequencies (e.g. 500MHz).
Also register d24 (pixels from the mask image) is no longer clobbered by
the supplementary macros, which allows them to be reused for the other
variants of compositing operations later.

Benchmark from ARM Cortex-A8 @500MHz:

== before ==

    over_n_8_0565 =  L1:  63.90  L2:  63.15  M: 60.97 ( 73.53%)
                     HT:  28.89  VT:  24.14  R: 21.33  RT:  6.78 (  67Kops/s)

== after ==

    over_n_8_0565 =  L1:  82.64  L2:  75.19  M: 71.52 ( 84.14%)
                     HT:  30.49  VT:  25.56  R: 22.36  RT:  6.89 (  68Kops/s)
2010-12-03 15:37:11 +02:00
Siarhei Siamashka
3be86a92cc ARM: introduced 'fetch_mask_pixblock' macro to simplify code
This macro hides the implementation details of pixels fetching
for the mask image just like 'fetch_src_pixblock' does for the
source image. This provides more possibilities for reusing the
same code blocks in different compositing functions.

This patch does not introduce any functional changes and the
resulting code in the compiled object file is exactly the same.
2010-12-03 15:37:06 +02:00
Siarhei Siamashka
98d08b37f1 ARM: added 'neon_composite_over_n_8_8' fast path
2010-12-03 15:37:01 +02:00
Siarhei Siamashka
4b5b5a2a83 C fast path for a1 fill operation
Can be used as one of the solutions to fix bug
https://bugs.freedesktop.org/show_bug.cgi?id=31604
2010-11-23 00:54:19 +02:00
Alan Coopersmith
654961efe4 Sun's copyrights belong to Oracle now
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
2010-11-21 11:42:22 -08:00
Cyril Brulebois
e7ee43c39d Fix argument quoting for AC_INIT.
This gets rid of the following warnings:
| autoreconf -vfi
| autoreconf: Entering directory `.'
| autoreconf: configure.ac: not using Gettext
| autoreconf: running: aclocal --force
| configure.ac:61: warning: AC_INIT: not a literal: "pixman@lists.freedesktop.org"
| autoreconf: configure.ac: tracing
| configure.ac:61: warning: AC_INIT: not a literal: "pixman@lists.freedesktop.org"

Signed-off-by: Cyril Brulebois <kibi@debian.org>
2010-11-19 13:57:47 -05:00
Cyril Brulebois
149ed6b1f0 Upload to experimental.
2010-11-17 15:56:52 +01:00
Cyril Brulebois
865e06cab0 Update debian/copyright from upstream's COPYING.
2010-11-17 15:28:15 +01:00
Cyril Brulebois
868ed1e2a0 Update changelogs.
2010-11-17 15:27:13 +01:00
Cyril Brulebois
bed147b523 Merge branch 'upstream-experimental' into debian-experimental
2010-11-17 15:25:39 +01:00
Søren Sandmann Pedersen
c59db8af66 Post-release version bump to 0.21.3
2010-11-16 17:14:47 -05:00
Søren Sandmann Pedersen
4646c23858 Pre-release version bump
2010-11-16 16:43:26 -05:00
Søren Sandmann Pedersen
536cf4dd3b Generate {a,x}8r8g8b8, a8, 565 fetchers for nearest/affine images
There are versions for all combinations of x8r8g8b8/a8r8g8b8 and
pad/repeat/none/normal repeat modes. The bulk of each function is an
inline function that takes a format and a repeat mode as parameters.
2010-11-16 16:41:42 -05:00
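The pattern described in commit 536cf4dd3b, one inline function parameterized by format and repeat mode plus thin generated wrappers, can be sketched as follows. All names are hypothetical, only three repeat modes and two formats are shown, and the fetcher walks a single row without any transformation.

```c
#include <stdint.h>

typedef enum { SKETCH_A8R8G8B8, SKETCH_X8R8G8B8 } sketch_format_t;
typedef enum { SKETCH_NONE, SKETCH_NORMAL, SKETCH_PAD } sketch_repeat_t;

/* The bulk of the work. Because format and repeat are compile-time
 * constants at every call site below, the compiler can drop the branches
 * that do not apply once this function is inlined. */
static inline uint32_t
sketch_fetch_pixel (const uint32_t *row, int width, int x,
                    sketch_format_t format, sketch_repeat_t repeat)
{
    if (repeat == SKETCH_NORMAL)
    {
        x %= width;
        if (x < 0)
            x += width;
    }
    else if (repeat == SKETCH_PAD)
    {
        if (x < 0)
            x = 0;
        else if (x >= width)
            x = width - 1;
    }
    else if (x < 0 || x >= width)
    {
        return 0;   /* transparent black outside the image */
    }

    if (format == SKETCH_X8R8G8B8)
        return row[x] | 0xff000000;     /* x8r8g8b8 is always opaque */
    return row[x];
}

/* Generate one specialized scanline fetcher per format/repeat pair. */
#define SKETCH_MAKE_FETCHER(name, format, repeat)                          \
    static void                                                            \
    sketch_fetch_ ## name (const uint32_t *row, int width,                 \
                           int x0, int n, uint32_t *out)                   \
    {                                                                      \
        int i;                                                             \
        for (i = 0; i < n; i++)                                            \
            out[i] = sketch_fetch_pixel (row, width, x0 + i,               \
                                         format, repeat);                  \
    }

SKETCH_MAKE_FETCHER (a8r8g8b8_none, SKETCH_A8R8G8B8, SKETCH_NONE)
SKETCH_MAKE_FETCHER (a8r8g8b8_normal, SKETCH_A8R8G8B8, SKETCH_NORMAL)
SKETCH_MAKE_FETCHER (x8r8g8b8_pad, SKETCH_X8R8G8B8, SKETCH_PAD)
```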