Commit Graph

2394 Commits

Author SHA1 Message Date
Søren Sandmann Pedersen
1820131fe6 utils.[ch]: Add pixel_checker_get_masks()
This function returns the a, r, g, and b masks corresponding to the
pixel checker's format.
2013-02-13 02:18:01 -05:00
Søren Sandmann Pedersen
5eb61f72ea test/utils.[ch]: Add pixel_checker_convert_pixel_to_color()
This function takes a pixel in the format corresponding to the pixel
checker, and converts to a color_t.
2013-02-13 02:18:01 -05:00
Søren Sandmann Pedersen
3ae717f71a test: Move do_composite() function from composite.c to utils.c
So that it can be used in other tests.
2013-02-13 02:18:01 -05:00
Søren Sandmann Pedersen
958bd334b3 Post-release version bump to 0.29.3 2013-01-29 21:42:02 -05:00
Søren Sandmann Pedersen
a56707e23b Pre-release version bump to 0.29.2 2013-01-29 21:14:51 -05:00
Søren Sandmann Pedersen
349015e1fc stresstest: Ensure that the rasterizer is only given alpha formats
In c2cb303d33, return_if_fail()s were added to
prevent the trapezoid rasterizers from being called with non-alpha
formats. However, stress-test actually does call the rasterizers with
non-alpha formats, but because _pixman_log_error() is disabled in
versions with an odd minor number, the errors never materialized.

Fix this by changing the argument to random format to an enum of three
values DONT_CARE, PREFER_ALPHA, or REQUIRE_ALPHA, and then in the
switch that calls the trapezoid rasterizers, pass the appropriate
value for the function in question.
2013-01-29 20:43:51 -05:00
Søren Sandmann Pedersen
afde862928 Change default GPGKEY to 3892336E, which is soren.sandmann@gmail.com
The old one belongs to the email address sandmann@daimi.au.dk, which
doesn't work anyore.

Also use gpg to get the name and address for the "(Signed by ...)"
line since that works more reliably for me than using git.
2013-01-29 15:24:22 -05:00
Ben Avison
69a7a9b6b6 Improve L1 and L2 benchmark tests for caches that don't use allocate-on-write
In particular this affects single-core ARMs (e.g. ARM11, Cortex-A8), which
are usually configured this way. For other CPUs, this should only add a
constant time, which will be cancelled out by the EXCLUDE_OVERHEAD runs.

The problems were caused by cachelines becoming permanently evicted from
the cache, because the code that was intended to pull them back in again on
each iteration assumed too long a cache line (for the L1 test) or failed to
read memory beyond the first pixel row (for the L2 test). Also, the reloading
of the source buffer was unnecessary.

These issues were identified by Siarhei in this post:
http://lists.freedesktop.org/archives/pixman/2013-January/002543.html
2013-01-29 15:23:05 -05:00
Søren Sandmann Pedersen
1fa67f499d pixman-combine-float.c: Use IS_ZERO() in clip_color() and set_sat()
The clip_color() function has some checks to avoid division by zero,
but they are done by comparing the value to 4 * FLT_EPSILON, where a
better choice is the IS_ZERO() macro that compares to +/- FLT_MIN.

In set_sat(), the check is that *max > *min before dividing by *max -
*min, but that has the potential problem that interactions between GCC
optimizions and 80 bit x87 registers could mean that (*max > *min) is
true in 80 bits, but (*max - *min) is 0 in 32 bits, so that the
division by zero is not prevented. Using IS_ZERO() here as well
prevents this.
2013-01-29 15:23:05 -05:00
Ben Avison
7e53e58664 ARMv6: Replacement add_8_8, over_8888_8888, over_8888_n_8888 and over_n_8_8888 routines
Improved by adding preloads, combining writes and using the SEL
instruction.

add_8_8

    Before          After
    Mean   StdDev   Mean   StdDev  Confidence  Change
L1  62.1   0.2      543.4  12.4    100.0%      +774.9%
L2  38.7   0.4      116.8  1.7     100.0%      +201.8%
M   40.0   0.1      110.1  0.5     100.0%      +175.3%
HT  30.9   0.2      43.4   0.5     100.0%      +40.4%
VT  30.6   0.3      39.2   0.5     100.0%      +28.0%
R   21.3   0.2      35.4   0.4     100.0%      +66.6%
RT  8.6    0.2      10.2   0.3     100.0%      +19.4%

over_8888_8888

    Before          After
    Mean   StdDev   Mean   StdDev  Confidence  Change
L1  32.3   0.1      38.0   0.2     100.0%      +17.7%
L2  15.9   0.4      30.6   0.5     100.0%      +92.8%
M   13.3   0.0      25.6   0.0     100.0%      +92.9%
HT  10.5   0.1      15.5   0.1     100.0%      +47.1%
VT  10.4   0.1      14.6   0.1     100.0%      +40.8%
R   10.3   0.1      15.8   0.1     100.0%      +53.3%
RT  6.0    0.1      7.6    0.1     100.0%      +25.9%

over_8888_n_8888

    Before          After
    Mean   StdDev   Mean   StdDev  Confidence  Change
L1  17.6   0.1      21.0   0.1     100.0%      +19.2%
L2  11.2   0.2      19.2   0.1     100.0%      +71.2%
M   10.2   0.0      19.6   0.0     100.0%      +92.6%
HT  8.4    0.0      11.9   0.1     100.0%      +41.7%
VT  8.3    0.0      11.3   0.1     100.0%      +36.4%
R   8.3    0.0      11.8   0.1     100.0%      +43.1%
RT  5.1    0.1      6.2    0.1     100.0%      +21.3%

over_n_8_8888

    Before          After
    Mean   StdDev   Mean   StdDev  Confidence  Change
L1  17.5   0.1      22.8   0.8     100.0%      +30.1%
L2  14.2   0.3      21.7   0.2     100.0%      +52.6%
M   12.0   0.0      22.3   0.0     100.0%      +84.8%
HT  10.5   0.1      14.1   0.1     100.0%      +34.5%
VT  10.0   0.1      13.5   0.1     100.0%      +35.3%
R   9.4    0.0      12.9   0.2     100.0%      +37.7%
RT  5.5    0.1      6.5    0.2     100.0%      +19.2%
2013-01-29 21:48:03 +02:00
Ben Avison
f87dfd6f37 ARMv6: New conversion routines
There was no previous attempt at accelerating these specifically for
ARMv6.

src_x888_8888

    Before          After
    Mean   StdDev   Mean   StdDev  Confidence  Change
L1  96.7   0.5      270.4  2.6     100.0%      +179.5%
L2  44.6   2.7      110.6  9.7     100.0%      +148.0%
M   26.9   0.1      87.6   0.5     100.0%      +226.1%
HT  19.3   0.2      37.5   0.4     100.0%      +93.7%
VT  18.6   0.1      33.7   0.4     100.0%      +81.6%
R   18.4   0.1      32.2   0.3     100.0%      +75.2%
RT  9.2    0.2      12.1   0.3     100.0%      +31.4%

src_0565_8888

    Before          After
    Mean   StdDev   Mean   StdDev  Confidence  Change
L1  37.0   0.3      66.9   0.2     100.0%      +80.8%
L2  30.3   0.2      55.9   0.3     100.0%      +84.4%
M   25.9   0.0      62.3   0.2     100.0%      +140.3%
HT  15.2   0.1      33.1   0.3     100.0%      +116.9%
VT  15.1   0.1      30.7   0.3     100.0%      +103.6%
R   14.2   0.1      27.6   0.3     100.0%      +94.0%
RT  6.0    0.1      11.2   0.3     100.0%      +87.2%
2013-01-29 21:47:59 +02:00
Ben Avison
a0f59f3b28 ARMv6: New blit routines
These are usable either as various composite operations, or via the
top-level function pixman_blt() which now does some blitting for the
first time on an ARMv6 platform (previously it just returned FALSE).

src_8888_8888

    Before          After
    Mean   StdDev   Mean   StdDev  Confidence  Change
L1  414.5  9.4      445.8  3.6     100.0%      +7.6%
L2  93.3   20.7     114.5  12.9    100.0%      +22.7%
M   57.0   0.2      89.2   0.5     100.0%      +56.4%
HT  28.7   0.3      39.6   0.4     100.0%      +37.9%
VT  25.5   0.2      35.3   0.4     100.0%      +38.4%
R   20.1   0.1      33.8   0.3     100.0%      +67.8%
RT  7.8    0.2      12.7   0.4     100.0%      +62.7%

src_0565_0565

    Before          After
    Mean   StdDev   Mean   StdDev  Confidence  Change
L1  397.4  6.1      412.5  5.2     100.0%      +3.8%
L2  143.2  10.9     141.9  6.5     68.9%       -0.9%  (insignificant)
M   90.7   0.4      133.5  0.7     100.0%      +47.1%
HT  38.6   0.3      53.7   0.7     100.0%      +39.0%
VT  33.0   0.3      47.3   0.6     100.0%      +43.3%
R   25.7   0.2      42.1   0.5     100.0%      +64.1%
RT  8.0    0.2      13.3   0.3     100.0%      +65.6%

src_8_8

    Before          After
    Mean   StdDev   Mean   StdDev  Confidence  Change
L1  716.5  9.8      768.2  20.4    100.0%      +7.2%
L2  246.2  12.7     260.5  8.8     100.0%      +5.8%
M   146.8  0.7      227.9  0.7     100.0%      +55.2%
HT  44.9   0.6      62.1   1.0     100.0%      +38.2%
VT  35.6   0.4      53.4   0.7     100.0%      +50.0%
R   29.7   0.3      48.2   0.6     100.0%      +62.2%
RT  8.6    0.2      12.9   0.4     100.0%      +49.3%
2013-01-29 21:47:54 +02:00
Ben Avison
3cff56c5b0 ARMv6: New fill routines
Note that this also effectively accelerates src_n_8888, src_n_0565 and
src_n_8 composite types, because of the fast paths in
pixman-fast-path.c implemented by fast_composite_solid_fill(), which
end up dispatching these platform-specific fill routines.

src_n_8888

    Before          After
    Mean   StdDev   Mean   StdDev  Confidence  Change
L1  157.3  1.1      574.2  8.7     100.0%      +265.0%
L2  94.2   0.5      364.8  4.2     100.0%      +287.3%
M   92.7   0.4      358.7  1.1     100.0%      +287.1%
HT  68.5   0.9      133.6  4.0     100.0%      +95.2%
VT  61.3   0.8      111.8  2.6     100.0%      +82.4%
R   61.1   0.9      108.7  2.8     100.0%      +78.1%
RT  24.6   1.0      28.6   1.6     100.0%      +16.0%

src_n_0565

    Before          After
    Mean   StdDev   Mean   StdDev  Confidence  Change
L1  157.4  1.0      983.1  38.5    100.0%      +524.6%
L2  93.6   0.5      696.0  14.3    100.0%      +643.4%
M   92.7   0.4      680.5  1.0     100.0%      +634.0%
HT  68.3   0.9      160.3  6.6     100.0%      +134.6%
VT  61.1   0.8      130.1  3.4     100.0%      +112.9%
R   61.0   0.8      125.4  4.1     100.0%      +105.7%
RT  24.9   1.3      29.5   1.5     100.0%      +18.2%

src_n_8

    Before          After
    Mean   StdDev   Mean   StdDev  Confidence  Change
L1  154.7  1.0      1324.4 48.5    100.0%      +756.3%
L2  92.4   0.4      1178.4 10.9    100.0%      +1175.6%
M   92.9   0.4      1275.7 2.1     100.0%      +1273.5%
HT  68.2   1.0      169.8  5.5     100.0%      +149.0%
VT  61.2   1.0      138.5  3.6     100.0%      +126.3%
R   61.3   0.9      130.1  3.8     100.0%      +112.4%
RT  25.5   1.3      29.2   1.9     100.0%      +14.6%
2013-01-29 21:47:49 +02:00
Ben Avison
2e173326aa ARMv6: Lay the groundwork for later patches in the series
Move the entire contents of pixman-arm-simd-asm.S to a new file;
ultimately this will only retain the scaled operations, so it is
named pixman-arm-simd-asm-scaled.S. Added new header file
pixman-arm-simd-asm.h, containing the macros which are the basis of
all the new ARMv6 implementations, although at this point in the
series, nothing uses them and the library should be binary-identical.
2013-01-29 21:47:42 +02:00
Søren Sandmann Pedersen
65fc1adb65 demo/scale: Add a spin button to set the number of subsample bits
For large upscalings the level of subsampling for the filter has a
quite visible effect, so make it settable in the UI so that people can
experiment with various values.
2013-01-27 23:06:28 -05:00
Siarhei Siamashka
ed39992564 Use pixman_transform_point_31_16() from pixman_transform_point()
Old functions pixman_transform_point() and pixman_transform_point_3d()
now become just wrappers for pixman_transform_point_31_16() and
pixman_transform_point_31_16_3d(). Eventually their uses should be
completely eliminated in the pixman code and replaced with their
extended range counterparts. This is needed in order to be able
to correctly handle any matrices and parameters that may come
to pixman from the code responsible for XRender implementation.
2013-01-27 20:50:38 +02:00
Siarhei Siamashka
5a78d74ccc test: Added matrix-test for testing projective transform accuracy
This test uses __float128 data type when it is available
for implementing a "perfect" reference implementation. The
output from from pixman_transform_point_31_16() and
pixman_transform_point_31_16_affine() is compared with the
reference implementation to make sure that the rounding
errors may only show up in a single least significant bit.

The platforms and compilers, which do not support __float128
data type, can rely on crc32 checksum for the pseudorandom
transform results.
2013-01-27 20:50:31 +02:00
Siarhei Siamashka
09600ae7e3 configure.ac: Added detection for __float128 support
GCC supports 128-bit floating point data type on some platforms (including
but not limited to x86 and x86-64). This may be useful for tests, which
need prefectly accurate reference implementations of certain algorithms.
2013-01-27 20:50:26 +02:00
Siarhei Siamashka
c3deb8334a Add higher precision "pixman_transform_point_*" functions
The following new functions are added:

pixman_transform_point_31_16_3d() -
    Calculates the product of a matrix and a vector multiplication.

pixman_transform_point_31_16() -
    Calculates the product of a matrix and a vector multiplication.
    Then converts the homogenous resulting vector [x, y, z] to
    cartesian [x', y', 1] variant, where x' = x / z, and y' = y / z.

pixman_transform_point_31_16_affine() -
    A faster sibling of the other two functions, which assumes affine
    transformation, where the bottom row of the matrix is [0, 0, 1] and
    the last element of the input vector is set to 1.

These functions transform a point with 31.16 fixed point coordinates from
the destination space to a point with 48.16 fixed point coordinates in
the source space.

The results are accurate and the rounding errors may only show up in
the least significant bit. No overflows are possible for the affine
transformations as long as the input data is provided in 31.16 format.
In the case of projective transformations, some output values may be not
representable using 48.16 fixed point format. In this case the results
are clamped to return maximum or minimum 48.16 values (so that the caller
can at least handle NONE and PAD repeats correctly).
2013-01-27 20:49:43 +02:00
Siarhei Siamashka
a47ed2c311 Faster fetch for the C variant of r5g6b5 src/dest iterator
Processing two pixels at once is used to reduce the number of
arithmetic operations.

The speedup relative to the generic fetch_scanline_r5g6b5() from
"pixman-access.c" (pixman was compiled with gcc 4.7.2):

    MIPS 74K        480MHz  :  20.32 MPix/s ->  26.47 MPix/s
    ARM11           700MHz  :  34.95 MPix/s ->  38.22 MPix/s
    ARM Cortex-A8  1000MHz  :  87.44 MPix/s -> 100.92 MPix/s
    ARM Cortex-A9  1700MHz  : 150.95 MPix/s -> 158.13 MPix/s
    ARM Cortex-A15 1700MHz  : 148.91 MPix/s -> 155.42 MPix/s
    IBM Cell PPU   3200MHz  :  75.29 MPix/s ->  98.33 MPix/s
    Intel Core i7  2800MHz  : 257.02 MPix/s -> 376.93 MPix/s

That's the performance for C code (SIMD and assembly optimizations
are disabled via PIXMAN_DISABLE environment variable).
2013-01-27 20:48:31 +02:00
Siarhei Siamashka
e66fd5ccb6 Faster write-back for the C variant of r5g6b5 dest iterator
Unrolling loops improves performance, so just use it here.

Also GCC can't properly optimize this code for RISC processors and
allocate 0x1F001F constant in a register. Because this constant is
too large to be represented as an immediate operand in instructions,
GCC inserts some redundant arithmetics. This problem can be workarounded
by explicitly using a variable for 0x1F001F constant and also initializing
it by a read from another volatile variable. In this case GCC is forced
to allocate a register for it, because it is not seen as a constant anymore.

The speedup relative to the generic store_scanline_r5g6b5() from
"pixman-access.c" (pixman was compiled with gcc 4.7.2):

    MIPS 74K        480MHz  :  33.22 MPix/s ->  43.42 MPix/s
    ARM11           700MHz  :  50.16 MPix/s ->  78.23 MPix/s
    ARM Cortex-A8  1000MHz  : 117.75 MPix/s -> 196.34 MPix/s
    ARM Cortex-A9  1700MHz  : 177.04 MPix/s -> 320.32 MPix/s
    ARM Cortex-A15 1700MHz  : 231.44 MPix/s -> 261.64 MPix/s
    IBM Cell PPU   3200MHz  : 130.25 MPix/s -> 145.61 MPix/s
    Intel Core i7  2800MHz  : 502.21 MPix/s -> 721.73 MPix/s

That's the performance for C code (SIMD and assembly optimizations
are disabled via PIXMAN_DISABLE environment variable).
2013-01-27 20:48:26 +02:00
Siarhei Siamashka
a9f6669416 Added C variants of r5g6b5 fetch/write-back iterators
Adding specialized iterators for r5g6b5 color format allows us to work
on fine tuning performance of r5g6b5 fetch/write-back operations in the
pixman general "fetch -> combine -> store" pipeline.

These iterators also make "src_x888_0565" fast path redundant, so it can
be removed.
2013-01-27 20:48:22 +02:00
Chris Wilson
794033ed43 Eliminate duplicate copies of channel flags for pixman_image_composite32()
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-27 14:04:16 +00:00
Chris Wilson
a59f081df4 Always return a valid function from lookup_combiner()
We should always have at least a C combiner available, so we never
expect the search to fail. If it does, emit an error and return a
dummy function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-27 14:04:16 +00:00
Chris Wilson
520230914b Always return a valid function from lookup_composite()
We never expect to fail to find the appropriate function as the
general_composite_rect should always match. So if somehow we fallthrough
the search, emit a _pixman_log_error() and return a dummy function.

Note that we remove some conditionals and a level of indentation hence a
large amount of code movement. This also reveals that in a few places we
are duplicating stack variables that can be eliminated later.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-27 14:04:15 +00:00
Chris Wilson
b283c864a3 sse2: Add fast paths for bilinear source with a solid mask
Based on the existing sse2_8888_n_8888 nearest scaling routines.

fishbowl on an i5-2500: 60.9s -> 56.9s

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-27 14:04:15 +00:00
Chris Wilson
d00ce40912 sse2: Add a fast path for add_n_8_8888
This path is being exercised by compositing of trapezoids for clipmasks, for
instance as used in the firefox-asteroids cairo-trace.

IVB i7-3720qm ./tests/lowlevel-blt-bench add_n_8_8888:

reference memcpy speed = 14846.7MB/s (3711.7MP/s for 32bpp fills)

before: L1: 681.10  L2: 735.14  M:701.44 ( 28.35%)  HT:283.32  VT:213.23  R:208.93  RT: 77.89 ( 793Kops/s)

after:  L1: 992.91  L2:1017.33  M:982.58 ( 39.88%)  HT:458.93  VT:332.32  R:326.13  RT:136.66 (1287Kops/s)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-27 14:04:15 +00:00
Chris Wilson
7ced3beec9 sse2: Add a fast path for add_n_8888
This path is being exercised by inplace compositing of trapezoids, for
instance as used in the firefox-asteroids cairo-trace.

IVB i3-3720qm ./tests/lowlevel-blt-bench add_n_888:

reference memcpy speed = 14918.3MB/s (3729.6MP/s for 32bpp fills)

before: L1:1752.44  L2:2259.48  M:2215.73 ( 58.80%)  HT:589.49   VT:404.04   R:424.69  RT:134.68 (1182Kops/s)

after:  L1:3931.21  L2:6132.78  M:3440.17 ( 92.24%)  HT:1337.70  VT:1357.64  R:1270.27  RT:359.78 (2161Kops/s)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2013-01-27 14:04:15 +00:00
Jeff Muizelaar
b7f523e3bc Add a version of bilinear_interpolation for precision <=4
Having 4 or fewer bits means we can do two components at
a time in a single 32 bit register.

Here are the results for firefox-fishtank on a Pandaboard with
4.6.3 and PIXMAN_DISABLE="arm-neon"

Before:
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image           t-firefox-fishtank    7.841    7.910   0.70%    6/6

After:
[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image           t-firefox-fishtank    6.951    6.995   1.11%    6/6
2013-01-25 13:14:37 -05:00
Ben Avison
24e83cae64 Tweaks to lowlevel-blt-bench
This adds two extra tests, src_n_8 and src_8_8, which I have been
using to benchmark my ARMv6 changes.

I'd also like to propose that it requires an exact test name as the
executable's argument, as achieved by this strstr to strcmp change.
Without this, it is impossible to only benchmark (for example)
add_8_8, add_n_8 or src_n_8, due to those also being substrings of
many other test names.
2013-01-25 11:13:07 -05:00
Søren Sandmann Pedersen
b527a0e615 test: Use operator_name() and format_name() in composite.c
With the operator_name() and format_name() functions there is no
longer any reason for composite.c to have its own table of format and
operator names.
2013-01-23 12:24:31 -05:00
Søren Sandmann Pedersen
4eb9a24aba utils.[ch]: Add new format_name() function
This function returns the name of the given format code, which is
useful for printing out debug information. The function is written as
a switch without a default value so that the compiler will warn if new
formats are added in the future. The fake formats used in the fast
path tables are also recognized.

The function is used in alpha_map.c, where it replaces an existing
format_name() function, and in blitters-test.c, affine-test.c, and
scaling-test.c.
2013-01-23 12:24:31 -05:00
Søren Sandmann Pedersen
1676b49389 test/utils.[ch]: Add new function operator_name()
This function returns the name of the given operator, which is useful
for printing out debug information. The function is done as a switch
without a default value so that the compiler will warn if new
operators are added in the future.

The function is used in affine-test.c, scaling-test.c, and
blitters-test.c.
2013-01-23 12:24:31 -05:00
Søren Sandmann Pedersen
8d85311143 README: Add guidelines on how to contribute patches
Ben Avison pointed out here:

   http://lists.freedesktop.org/archives/pixman/2013-January/002485.html

that there isn't really any documentation about how to submit patches
to pixman. This patch adds some information to the README file.

v2: Incorporate some comments from Ben Avison
v3: Change gitweb URL to cgit
2013-01-23 12:22:40 -05:00
Matt Turner
61dacffaf4 Convert INCLUDES to AM_CPPFLAGS
INCLUDES has been deprecated starting with automake 1.13. Convert all
occurrences with the recommended AM_CPPFLAGS replacement.
2013-01-22 22:08:30 -08:00
Matt Turner
c7c28f440d Add new demos and tests to .gitignore 2013-01-22 22:08:30 -08:00
Nemanja Lukic
2c6577476e MIPS: DSPr2: Added more fast-paths:
- over_reverse_n_8888
 - in_n_8_8

Performance numbers before/after on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Referent (before):
        over_reverse_n_8888 =  L1:  19.42  L2:  19.07  M: 15.38 ( 40.80%)  HT: 13.35  VT: 13.10  R: 12.92  RT:  8.27 (  49Kops/s)
                   in_n_8_8 =  L1:  21.20  L2:  22.86  M: 21.42 ( 14.21%)  HT: 15.97  VT: 15.69  R: 15.47  RT:  8.00 (  48Kops/s)

Optimized:
        over_reverse_n_8888 =  L1:  60.09  L2:  47.87  M: 28.65 ( 76.02%)  HT: 23.58  VT: 22.51  R: 21.99  RT: 12.28 (  60Kops/s)
                   in_n_8_8 =  L1:  89.38  L2:  86.07  M: 65.48 ( 43.44%)  HT: 44.64  VT: 41.50  R: 40.77  RT: 16.94 (  66Kops/s)
2013-01-22 03:12:59 +01:00
Nemanja Lukic
a67b0e24d7 MIPS: DSPr2: Added more fast-paths for REVERSE operation:
- out_reverse_8_0565
 - out_reverse_8_8888

Performance numbers before/after on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Referent (before):
        out_reverse_8_0565 =  L1:  14.29  L2:  13.58  M: 12.14 ( 24.16%)  HT:  9.23  VT:  9.12  R:  8.84  RT:  4.75 (  36Kops/s)
        out_reverse_8_8888 =  L1:  27.46  L2:  23.24  M: 17.41 ( 57.73%)  HT: 12.61  VT: 12.47  R: 11.79  RT:  5.86 (  41Kops/s)

Optimized:
        out_reverse_8_0565 =  L1:  28.24  L2:  25.64  M: 20.63 ( 41.05%)  HT: 16.69  VT: 16.14  R: 15.50  RT:  8.69 (  52Kops/s)
        out_reverse_8_8888 =  L1:  52.78  L2:  41.44  M: 23.50 ( 77.94%)  HT: 18.79  VT: 18.16  R: 16.90  RT:  9.11 (  53Kops/s)
2013-01-22 03:10:31 +01:00
Maarten Lankhorst
01c2431ef8 Add 00-unexport-symbol.diff
* Add 00-unexport-symbol.diff
  - remove test-only use of _pixman_internal_only_get_implementation
  - zap the only test requiring the use of this symbol
2013-01-08 18:16:23 +01:00
Maarten Lankhorst
d6b69d4f63 update symbols file and addd lintian override for hidden symbol 2013-01-08 17:10:12 +01:00
Maarten Lankhorst
0f8c56fe52 new upstream release 2013-01-08 16:12:25 +01:00
Maarten Lankhorst
818af795d4 pixman 0.28.2 release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iEYEABECAAYFAlDF0BkACgkQmxfmIW/3waiEegCcCVDzXL2gGouDGCBqJVOmzUcv
 ZnMAoI50IhP5KXKKEEx2dJlfFkzKVo5N
 =J62R
 -----END PGP SIGNATURE-----

Merge tag 'pixman-0.28.2' into debian-experimental

pixman 0.28.2 release
2013-01-08 16:10:57 +01:00
Søren Sandmann Pedersen
35cc965514 pixman-filter.c: Cope with NULL returns from malloc()
v2: Don't return a pointer to uninitialized memory when the allocation
of horz and vert fails, but allocation of params doesn't.
2013-01-06 17:38:23 -05:00
Søren Sandmann Pedersen
58526cfc72 Handle solid images in the noop iterator
The noop src iterator already has code to handle solid images, but
that code never actually runs currently because it is not possible for
an image to have both a format code of PIXMAN_solid and a flag of
FAST_PATH_BITS_IMAGE.

If these two were to be set at the same time, the
fast_composite_tiled_repeat() fast path would trigger for solid images
(because it triggers for PIXMAN_any formats, which includes
PIXMAN_solid), but for solid images we can usually do better than that
fast path.

So this patch removes _pixman_solid_fill_iter_init() and instead
handles such images (along with repeating 1x1 bits images without an
alpha map) in pixman-noop.c.

When a 1x1R image is involved in the general composite path, before
this patch, it would hit this code in repeat() in pixman-inlines.h:

        while (*c >= size)
            *c -= size;
        while (*c < 0)
            *c += size;

and those loops could run for a huge number of iteratons (proportional
to the composite width). For such cases, the performance improvement
is really big:

./test/lowlevel-blt-bench -n add_n_8888:

Before:

    add_n_8888 =  L1:   3.86  L2:   3.78  M:  1.40 (  0.06%)  HT:  1.43  VT:  1.41  R:  1.41  RT:  1.38 (  19Kops/s)

After:

    add_n_8888 =  L1:1236.86  L2:2468.49  M:1097.88 ( 49.04%)  HT:476.49  VT:429.05  R:417.04  RT:155.12 ( 817Kops/s)
2013-01-06 17:30:12 -05:00
Marko Lindqvist
480dd38fd1 Fix build with automake-1.13
Automake-1.13 has removed long obsolete AM_CONFIG_HEADER macro (
http://lists.gnu.org/archive/html/automake/2012-12/msg00038.html )
and autoreconf errors out upon seeing it.

Attached patch replaces obsolete AM_CONFIG_HEADER with now proper
AC_CONFIG_HEADERS.
2013-01-04 01:54:10 +02:00
Siarhei Siamashka
1abde88ae6 Use more appropriate types and remove a magic constant 2013-01-04 01:27:06 +02:00
Siarhei Siamashka
c1fd5a4243 Define SIZE_MAX if it is not provided by the standard C headers
C++ compilers do not define SIZE_MAX. It is also not available
if the code is compiled by some C compilers:
    http://lists.freedesktop.org/archives/pixman/2012-August/002196.html
2013-01-04 01:26:55 +02:00
Siarhei Siamashka
66c4292822 Rename 'xor' variable to 'filler' (because 'xor' is a C++ keyword) 2012-12-20 03:14:21 +02:00
Søren Sandmann Pedersen
4dfda2adfe float-combiner.c: Change tests for x == 0.0 tests to - FLT_MIN < x < FLT_MIN
pixman-float-combiner.c currently uses checks like these:

    if (x == 0.0f)
        ...
    else
        ... / x;

to prevent division by 0. In theory this is correct: a division-by-zero
exception is only supposed to happen when the floating point numerator is
exactly equal to a positive or negative zero.

However, in practice, the combination of x87 and gcc optimizations
causes issues. The x87 registers are 80 bits wide, which means the
initial test:

	if (x == 0.0f)

may be false when x is an 80 bit floating point number, but when x is
rounded to a 32 bit single precision number, it becomes equal to
0.0. In principle, gcc should compensate for this quirk of x87, and
there are some options such as -ffloat-store, -fexcess-precision=standard,
and -std=c99 that will make it do so, but these all have a performance
cost.  It is also possible to set the FPU to a mode that makes it do
all computation with single or double precision, but that would
require pixman to save the existing mode before doing anything with
floating point and restore it afterwards.

Instead, this patch side-steps the issue by replacing exact checks for
equality with zero with a new macro that checkes whether the value is
between -FLT_MIN and FLT_MIN.

There is extensive reading material about this issue linked off the
infamous gcc bug 323:

    http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323
2012-12-19 13:49:32 -05:00
Siarhei Siamashka
2734071d7b ARM: make use of UQADD8 instruction even in generic C code paths
ARMv6 has UQADD8 instruction, which implements unsigned saturated
addition for 8-bit values packed in 32-bit registers. It is very useful
for UN8x4_ADD_UN8x4, UN8_rb_ADD_UN8_rb and ADD_UN8 macros (which would
otherwise need a lot of arithmetic operations to simulate this operation).
Since most of the major ARM linux distros are built for ARMv7, we are
much less dependent on runtime CPU detection and can get practical
benefits from conditional compilation here for a lot of users.

The results of cairo-perf-trace benchmark on ARM Cortex-A15 with pixman
compiled by gcc 4.7.2 and PIXMAN_DISABLE set to "arm-simd arm-neon":

Speedups
========
image    firefox-talos-gfx  (29938.22 0.12%) ->  (27814.76 0.51%) : 1.08x speedup
image    firefox-asteroids  (23241.11 0.07%) ->  (21795.19 0.07%) : 1.07x speedup
image firefox-canvas-alpha (174519.85 0.08%) -> (164788.64 0.20%) : 1.06x speedup
image              poppler   (9464.46 1.61%) ->   (8991.53 0.14%) : 1.05x speedup
2012-12-18 20:49:58 +02:00