2407 Commits

Author SHA1 Message Date
Julien Cristau
d4898ac139 Upload to unstable 2013-08-13 12:08:22 +02:00
Julien Cristau
105c249996 Increase alpha-loop test timeout some more. 2013-08-13 12:03:40 +02:00
Julien Cristau
9b844940ba Includes big-endian matrix-test fix 2013-08-13 12:01:40 +02:00
Julien Cristau
2fc06503f6 Bump changelogs 2013-08-13 12:00:48 +02:00
Julien Cristau
a781ff50e7 pixman 0.30.2 release

Merge tag 'pixman-0.30.2' into debian-unstable

pixman 0.30.2 release
2013-08-13 12:00:07 +02:00
Søren Sandmann Pedersen
f8a0812b1c Pre-release version bump to 0.30.2 2013-08-07 10:07:35 -04:00
Siarhei Siamashka
b5167b8a54 test: fix matrix-test on big endian systems 2013-08-05 01:45:59 +03:00
Julien Cristau
bbb3765faf Upload to unstable 2013-08-03 10:24:43 +02:00
Julien Cristau
2e13b569cb Increase timeout for the alpha-loop test.
That will hopefully let it pass on the mips buildd.
2013-08-03 10:23:41 +02:00
Andrea Canciani
a82b95a264 test: Fix build on MSVC
The MSVC compiler is very strict about variable declarations after
statements.

Move all the declarations of each block before any statement in the
same block to fix multiple instances of:

alpha-loop.c(XX) : error C2275: 'pixman_image_t' : illegal use of this
type as an expression
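
As an illustration of the rule described above, here is a minimal sketch of C89-style declaration placement. The function `sum_bytes` is hypothetical and is not taken from alpha-loop.c; it only shows the layout that a strict C89 compiler accepts.

```c
#include <stddef.h>

/* Hypothetical helper, not taken from alpha-loop.c: sums n bytes.
 * Every declaration comes before the first statement of its block,
 * which is what a strict C89 compiler expects. */
static unsigned int
sum_bytes (const unsigned char *data, size_t n)
{
    unsigned int total = 0;     /* declarations first ... */
    size_t i;

    if (data == NULL)           /* ... statements afterwards */
        return 0;

    for (i = 0; i < n; i++)
    {
        unsigned char b = data[i];  /* a nested block may open with its own declarations */

        total += b;
    }

    return total;
}
```

Declaring `i` or `b` after a statement in the same block is valid C99, but it is the kind of construct that triggers error C2275 on the compiler mentioned in the message.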
2013-08-01 09:08:15 -07:00
Søren Sandmann Pedersen
4c04a86c68 Version bump to 0.30.1 2013-08-01 07:19:21 -04:00
Alexander Troosh
6300452952 Require GTK+ version >= 2.16
I got a build failure on my system:

lcc: "scale.c", line 374: warning: function "gtk_scale_add_mark" declared
          implicitly [-Wimplicit-function-declaration]
      gtk_scale_add_mark (GTK_SCALE (widget), 0.0, GTK_POS_LEFT, NULL);
      ^

  CCLD   scale
scale.o: In function `app_new':
(.text+0x23e4): undefined reference to `gtk_scale_add_mark'
scale.o: In function `app_new':
(.text+0x250c): undefined reference to `gtk_scale_add_mark'
scale.o: In function `app_new':
(.text+0x2634): undefined reference to `gtk_scale_add_mark'
make[2]: *** [scale] Error 1
make[2]: Target `all' not remade because of errors.

$ pkg-config --modversion gtk+-2.0
2.12.1

demos/scale.c calls gtk_scale_add_mark(), which is only available in
GTK+ 2.16 and later. We either need to support older GTK+ versions (by
rewriting scale.c) or simply require a newer version of GTK+, like this:
2013-07-30 08:18:35 -04:00
Matthieu Herrb
02869a1229 configure.ac: Don't use '+=' since it's not POSIX
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Matthieu Herrb <matthieu.herrb@laas.fr>
2013-07-30 08:18:25 -04:00
Markos Chandras
35da06c828 Use AC_LINK_IFELSE to check if the Loongson MMI code can link
The Loongson code is compiled with -march=loongson2f to enable the MMI
instructions, but binutils refuses to link object code compiled with
different -march settings, leading to link failures later in the
compile. This avoids that problem by checking if we can link code
compiled for Loongson.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
2013-07-30 08:18:02 -04:00
ingmar@irsoft.de
e14f5a739f Fix broken build when HAVE_CONFIG_H is undefined, e.g. on Win32.
Build fix for platforms without a generated config.h, for example Win32.
2013-07-30 08:17:49 -04:00
Julien Cristau
3f0d759608 Upload to unstable 2013-07-27 21:40:50 +02:00
Julien Cristau
3c4dac9a7c Fix matrix-test on big endian
Patch from Siarhei Siamashka.
2013-07-27 21:40:09 +02:00
Julien Cristau
3473a947da Disable arm iwmmxt fast paths. It breaks the build. 2013-07-27 14:48:50 +02:00
Julien Cristau
dc29515934 Disable silent Makefile rules. 2013-07-27 14:37:23 +02:00
Julien Cristau
2084b2d3bd Upload to unstable 2013-07-26 14:58:46 +02:00
Julien Cristau
317b3c3eea Add more test-only exported functions to symbols file 2013-07-26 14:47:35 +02:00
Julien Cristau
73ff58c119 Remove png file missing from the tarball 2013-07-26 14:36:14 +02:00
Julien Cristau
d2fbfbc23c Bump changelog and symbols for 0.30.0 2013-07-26 14:31:38 +02:00
Julien Cristau
5de927bd3e Merge branch 'upstream-merge' into debian-unstable 2013-07-26 14:26:43 +02:00
Julien Cristau
0ef6350c3d Revert "Add 00-unexport-symbol.diff"
This reverts commit 01c2431ef8.
2013-07-26 14:26:30 +02:00
Julien Cristau
07473e703e Merge remote-tracking branch 'origin/debian-experimental' into debian-unstable
Conflicts:
	debian/changelog
2013-07-26 14:26:11 +02:00
Julien Cristau
be9bb76118 Merge remote-tracking branch 'origin/upstream-experimental' into upstream-merge 2013-07-26 14:24:21 +02:00
Søren Sandmann Pedersen
41daf50aae Pre-release version bump to 0.30.0 2013-05-08 19:31:22 -04:00
Søren Sandmann Pedersen
5a7179191d Post-release version bump to 0.29.5 2013-04-30 18:57:43 -04:00
Søren Sandmann Pedersen
2714b5d201 Pre-release version bump to 0.29.4 2013-04-30 18:50:04 -04:00
Søren Sandmann Pedersen
7fc2654a1f pixman/refactor: Delete this file
Essentially all of it is obsolete by now.
2013-04-30 16:25:10 -04:00
Nemanja Lukic
cb928a77c0 MIPS: DSPr2: Added rpixbuf fast path.
Performance numbers before/after on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Referent (before):
       rpixbuf =  L1:  14.63  L2:  13.55  M:  9.91 ( 79.53%)  HT:  8.47  VT:  8.32  R:  8.17  RT:  4.90 (  33Kops/s)

Optimized:
       rpixbuf =  L1:  45.69  L2:  37.30  M: 17.24 (138.31%)  HT: 15.66  VT: 14.88  R: 13.97  RT:  8.38 (  44Kops/s)
2013-04-30 15:38:43 -04:00
Nemanja Lukic
c6a6fbdcd3 MIPS: DSPr2: Added pixbuf fast path.
Performance numbers before/after on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Referent (before):
        pixbuf =  L1:  18.18  L2:  16.47  M: 13.36 (107.27%)  HT: 10.16  VT: 10.07  R:  9.84  RT:  5.54 (  35Kops/s)

Optimized:
        pixbuf =  L1:  43.54  L2:  36.02  M: 17.08 (137.09%)  HT: 15.58  VT: 14.85  R: 13.87  RT:  8.38 (  44Kops/s)
2013-04-30 15:38:43 -04:00
Nemanja Lukic
f69335d529 test: add "pixbuf" and "rpixbuf" to lowlevel-blt-bench
Add the necessary support to the lowlevel-blt benchmark for benchmarking the
pixbuf and rpixbuf fast paths. The bench_composite function now checks for the
string "pixbuf" in testname and, if it is found, uses the same bits for the src
and mask images.
2013-04-30 15:38:43 -04:00
Nemanja Lukic
3dc9e3827e test: add "src_0888_8888_rev" and "src_0888_0565_rev" to lowlevel-blt-bench 2013-04-30 15:38:43 -04:00
Nemanja Lukic
44174ce51d MIPS: DSPr2: Fix for bug in in_n_8 routine.
The rounding logic was not implemented correctly: logical shifts were used
instead of the rounding version of the 8-bit shift. The code also used
unnecessary multiplications, which can be avoided by packing 4 destination (a8)
pixels into one 32-bit register, and there were unnecessary spills to the
stack. The code has been rewritten to address these issues.

The bug was revealed by increasing the number of iterations in blitters-test.
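
To illustrate the difference between a plain logical shift and a rounding one, here is a generic C sketch. It is not the DSPr2 assembly from this commit, and the function names are made up for the example. The rounding variant uses the common "add one half, fold the high byte back in" approximation of a * b / 255.

```c
#include <stdint.h>

/* Truncating multiply of two 8-bit values: a plain logical shift
 * drops the fractional part, so 255 * 255 comes out as 254. */
static uint8_t
mul_un8_truncate (uint8_t a, uint8_t b)
{
    return (uint8_t) (((uint32_t) a * b) >> 8);
}

/* Rounding multiply, computing a * b / 255 rounded to nearest:
 * add one half before shifting and fold the high byte back in. */
static uint8_t
mul_un8_round (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 0x80;

    return (uint8_t) ((t + (t >> 8)) >> 8);
}
```

With these definitions, multiplying full coverage by full coverage (255 by 255) stays at 255 only in the rounding variant, which is why a truncating shift shows up as off-by-one pixel differences in the tests.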

Performance numbers on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Referent (before):
                   in_n_8 =  L1:  21.20  L2:  22.86  M: 21.42 ( 14.21%)  HT: 15.97  VT: 15.69  R: 15.47  RT:  8.00 (  48Kops/s)
Optimized (first implementation, with bug):
                   in_n_8 =  L1:  89.38  L2:  86.07  M: 65.48 ( 43.44%)  HT: 44.64  VT: 41.50  R: 40.77  RT: 16.94 (  66Kops/s)
Optimized (with bug fix, and code revisited):
                   in_n_8 =  L1: 102.33  L2:  95.65  M: 70.54 ( 46.84%)  HT: 48.35  VT: 45.06  R: 43.20  RT: 17.60 (  66Kops/s)
2013-04-30 15:38:43 -04:00
Nemanja Lukic
5858f09d26 MIPS: DSPr2: Added src_0565_8888 nearest neighbor fast path.
Performance numbers before/after on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Referent (before):
         src_0565_8888 =  L1:  20.70  L2:  19.22  M: 12.50 ( 49.79%)  HT: 10.45  VT: 10.18  R:  9.99  RT:  5.31 (  31Kops/s)

Optimized:
         src_0565_8888 =  L1:  62.98  L2:  53.44  M: 23.07 ( 91.87%)  HT: 19.85  VT: 19.15  R: 17.70  RT:  9.68 (  43Kops/s)
2013-04-30 15:38:43 -04:00
Nemanja Lukic
311d55b6d8 MIPS: DSPr2: Added over_8888_0565 nearest neighbor fast path.
Performance numbers before/after on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Referent (before):
        over_8888_0565 =  L1:  13.22  L2:  12.02  M:  9.77 ( 38.92%)  HT:  8.58  VT:  8.35  R:  8.38  RT:  5.78 (  35Kops/s)

Optimized:
        over_8888_0565 =  L1:  26.20  L2:  22.97  M: 15.92 ( 63.40%)  HT: 13.33  VT: 13.13  R: 12.72  RT:  7.65 (  39Kops/s)
2013-04-30 15:38:43 -04:00
Nemanja Lukic
bd487ee34c MIPS: DSPr2: Added over_8888_8888 nearest neighbor fast path.
Performance numbers before/after on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Referent (before):
        over_8888_8888 =  L1:  19.47  L2:  16.30  M: 11.24 ( 59.69%)  HT:  9.54  VT:  9.29  R:  9.47  RT:  6.24 (  37Kops/s)

Optimized:
        over_8888_8888 =  L1:  43.67  L2:  33.30  M: 16.32 ( 86.65%)  HT: 14.10  VT: 13.78  R: 12.96  RT:  7.85 (  39Kops/s)
2013-04-30 15:38:43 -04:00
Nemanja Lukic
66def909ad MIPS: DSPr2: Fix bug in over_n_8888_8888_ca/over_n_8888_0565_ca routines
After the new PRNG (pseudorandom number generator) was introduced, a bug in two
DSPr2 routines was revealed. It showed up as wrong results in the composite and
glyph tests, which caused make check to fail for the MIPS DSPr2 optimizations.

The bug was in the calculation of:
*dst = over (src, *dst) when ma == 0xffffffff

In this case src was only negated, instead of being negated and shifted right
by 24 bits. When I first implemented this routine, I misplaced those shifts,
which allowed me to combine the code for the over operation and:
    UN8x4_MUL_UN8x4 (s, ma);
    UN8x4_MUL_UN8 (ma, srca);
    ma = ~ma;
    UN8x4_MUL_UN8x4_ADD_UN8x4 (d, ma, s);
So I decided to rewrite that piece of code from scratch. I changed the logic so
that the assembly code now mimics the code from pixman-fast-path.c but processes
two pixels at a time. This code should be easier to debug and maintain.
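
For reference, the over operation discussed above can be sketched per channel in plain C. This is an illustrative version written for this note, assuming premultiplied a8r8g8b8 pixels; `over_sketch` and `mul_un8` are names chosen here, and the code is neither the DSPr2 assembly nor a copy of pixman-fast-path.c.

```c
#include <stdint.h>

/* Rounded a * b / 255 for 8-bit values. */
static uint8_t
mul_un8 (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t) a * b + 0x80;

    return (uint8_t) ((t + (t >> 8)) >> 8);
}

/* OVER for premultiplied a8r8g8b8:
 *     dst = src + dst * (255 - alpha (src)) / 255
 * The inverse alpha is (~src) >> 24: negate AND shift. */
static uint32_t
over_sketch (uint32_t src, uint32_t dst)
{
    uint8_t  ia = (uint8_t) (~src >> 24);
    uint32_t result = 0;
    int      shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = mul_un8 ((uint8_t) (dst >> shift), ia);
        uint32_t sum = s + d;

        if (sum > 0xff)         /* saturate each channel */
            sum = 0xff;

        result |= sum << shift;
    }

    return result;
}
```

An opaque source replaces the destination and a fully transparent source leaves it untouched. Dropping the shift by 24 would feed the whole negated pixel into the multiply instead of just its alpha byte.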

The bug was revealed in commit b31a6962. Errors were detected by composite
and glyph tests.
2013-04-30 15:38:43 -04:00
Siarhei Siamashka
d768558ce1 sse2: faster bilinear interpolation (get rid of XOR instruction)
The old code was calculating horizontal weights for right pixels
in the following way (for simplicity assume 8-bit interpolation
precision):

  Start with "x = vx" and do increment "x += ux" after each pixel.
  In this case right pixel weight for interpolation can be calculated
  as "((x >> 8) ^ 0xFF) + 1", which is the same as "256 - (x >> 8)".

The new code instead:

  Starts with "x = -(vx + 1)", performs increment "x += -ux" after
  each pixel and calculates right weights as just "(x >> 8) + 1",
  eliminating the need for XOR operation in the inner loop.
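
The equivalence of the two schemes can be checked with a small scalar sketch. It assumes, as a simplification of this note, that x is a 16-bit fractional coordinate that wraps around modulo 2^16; the function names are hypothetical and the code is plain C, not the SSE2 implementation.

```c
#include <stdint.h>

/* Old scheme: the weight is derived from the top 8 bits of x
 * with an XOR, giving 256 - (x >> 8). */
static unsigned int
weight_old (uint16_t x)
{
    return ((unsigned int) (x >> 8) ^ 0xFF) + 1;
}

/* New scheme: nx holds -(x + 1) modulo 2^16, so its top 8 bits are
 * already the complement and no XOR is needed. */
static unsigned int
weight_new (uint16_t nx)
{
    return (unsigned int) (nx >> 8) + 1;
}

/* Walk `steps` pixels with both schemes; returns 1 if every weight agrees. */
static int
weights_agree (uint16_t vx, uint16_t ux, int steps)
{
    uint16_t x  = vx;
    uint16_t nx = (uint16_t) -(vx + 1);     /* x = -(vx + 1) */
    int i;

    for (i = 0; i < steps; i++)
    {
        if (weight_old (x) != weight_new (nx))
            return 0;

        x  = (uint16_t) (x + ux);           /* old: x += ux  */
        nx = (uint16_t) (nx - ux);          /* new: x += -ux */
    }

    return 1;
}
```

The two agree because -(x + 1) is the bitwise complement of x in two's complement arithmetic, and stepping by -ux preserves that relation for every pixel.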

So we have one instruction less on the critical path. Benchmarks
with "lowlevel-blt-bench -b src_8888_8888" using GCC 4.7.2 on
x86-64 system and default optimizations:

Intel Core i7 860 (2.8GHz):
    before: src_8888_8888 =  L1: 291.37  L2: 288.58  M:285.38
    after:  src_8888_8888 =  L1: 319.66  L2: 316.47  M:312.06

Intel Core2 T7300 (2GHz):
    before: src_8888_8888 =  L1: 121.95  L2: 118.38  M:118.52
    after:  src_8888_8888 =  L1: 128.82  L2: 125.12  M:124.88

Intel Atom N450 (1.67GHz):
    before: src_8888_8888 =  L1:  64.25  L2:  62.37  M: 61.80
    after:  src_8888_8888 =  L1:  64.23  L2:  62.37  M: 61.82

Inspired by the "sse2_bilinear_interpolation" function (single
pixel interpolation) from:
    http://lists.freedesktop.org/archives/pixman/2013-January/002575.html
2013-04-28 23:22:41 +03:00
Siarhei Siamashka
59109f3293 test: larger 0xFF/0x00 filled clusters in random images for blitters-test
The current blitters-test program had difficulty detecting a bug in the
over_n_8888_8888_ca implementation for MIPS DSPr2:

    http://lists.freedesktop.org/archives/pixman/2013-March/002645.html

In order to hit the buggy code path, two consecutive mask values had
to be equal to 0xFFFFFFFF because of loop unrolling. The current
blitters-test generates random images in such a way that each byte
has a 25% probability of being 0xFF. Hence each 32-bit mask value
has a ~0.4% probability of being 0xFFFFFFFF. Because we are testing
many compositing operations with many pixels, encountering at least
one 0xFFFFFFFF mask value reasonably fast is not a problem. If a
bug related to a 0xFFFFFFFF mask value is artificially introduced into
the generic over_n_8888_8888_ca C function, it gets detected at
iteration 675591 of blitters-test (out of 2000000).

However two consecutive 0xFFFFFFFF mask values are much less likely
to be generated, so the bug was missed by blitters-test.
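
The probabilities involved can be worked out with a few lines of C. This is only a back-of-the-envelope sketch that assumes every byte is drawn independently; the function names are made up for the example.

```c
/* Probability that a 32-bit value made of four independent bytes is
 * 0xFFFFFFFF when each byte is 0xFF with probability p_byte. */
static double
prob_all_ff (double p_byte)
{
    return p_byte * p_byte * p_byte * p_byte;
}

/* Probability that a given pair of adjacent 32-bit values are both
 * 0xFFFFFFFF, assuming the two values are independent. */
static double
prob_two_consecutive_ff (double p_byte)
{
    double p = prob_all_ff (p_byte);

    return p * p;
}
```

For p_byte = 0.25 the first function gives 0.00390625 (the ~0.4% quoted above), while the second gives 1/65536, or about 0.0015% for any given pair.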

This patch addresses the problem by also randomly setting the 32-bit
values in images to either 0xFFFFFFFF or 0x00000000 (also with 25%
probability). This allows larger clusters of consecutive 0x00 or 0xFF
bytes in images, for which unrolled or SIMD-optimized code may have
special shortcuts.
2013-04-28 22:14:47 +03:00
Stefan Weil
a99147d1ea Trivial spelling fixes in comments
They were found by codespell.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2013-04-27 04:08:45 -04:00
Peter Breitenlohner
9d0bb10312 Check for missing sqrtf() as, e.g., for Solaris 9
Signed-off-by: Peter Breitenlohner <peb@mppmu.mpg.de>
2013-04-08 14:33:25 -04:00
Søren Sandmann Pedersen
d8ac35af12 Improve precision of calculations in pixman-gradient-walker.c
The computations in pixman-gradient-walker.c currently take place at
very limited 8 bit precision which results in quite visible artefacts
in gradients. An example is the one produced by demos/linear-gradient
which currently looks like this:

    http://i.imgur.com/kQbX8nd.png

With the changes in this commit, the gradient looks like this:

    http://i.imgur.com/nUlyuKI.png

The images are also available here:

    http://people.freedesktop.org/~sandmann/gradients/before.png
    http://people.freedesktop.org/~sandmann/gradients/after.png

This patch computes pixels using floating point, but uses a faster
algorithm, which makes up for the loss of performance.

== Theory:

In both the new and the old algorithm, the various gradient
implementations compute a parameter x that indicates how far along the
gradient the current scanline is. The current algorithm has a cache of
the two color stops surrounding the last parameter; those are used in
a SIMD-within-register fashion in this way:

    t1 = walker->left_rb * idist + walker->right_rb * dist;

where dist and idist are the distances to the left and right color
stops respectively normalized to the distance between the left and
right stops. The normalization (which involves a division) is captured
in another cached variable "stepper". The cached values are recomputed
whenever the parameter moves in between two different stops (called
"reset" in the implementation).

Because idist and dist are computed in 8 bits only, a lot of
information is lost, which is quite visible as the image linked above
shows.

The new algorithm caches more information in the following way. When
interpolating between stops, the formula to be used is this:

     t = ((x - left) / (right - left));

     result = lc * (1 - t) + rc * t;

where

    - x is the parameter as computed by the main gradient code,
    - left is the position of the left color stop,
    - right is the position of the right color stop
    - lc is the color of the left color stop
    - rc is the color of the right color stop

That formula can also be written like this:

    result
      = lc * (1 - t) + rc * t
      = lc + (rc - lc) * t
      = lc + (rc - lc) * ((x - left) / (right - left))
      = (rc - lc) / (right - left) * x +
        lc - (left * (rc - lc)) / (right - left)
      = s * x + b

where

    s = (rc - lc) / (right - left)

and

    b = lc - left * (rc - lc) / (right - left)
      = (lc * (right - left) - left * (rc - lc)) / (right - left)
      = (lc * right - rc * left) / (right - left)

To summarize, setting w = (right - left):

    s = (rc - lc) / w
    b = (lc * right - rc * left) / w

    r = s * x + b

Since s and b only depend on the two active stops, both can be cached
so that the computation only needs to do one multiplication and one
addition per pixel (followed by premultiplication of the alpha
channel). That is, seven multiplications in total, which is the same
number as the old SIMD-within-register implementation had.
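
The caching scheme summarized above can be sketched for a single color channel as follows. The type and function names (`segment_t`, `segment_reset`, `segment_eval`) are hypothetical and chosen for this note; the sketch omits the alpha premultiplication and is not the code from pixman-gradient-walker.c.

```c
/* Cached per-segment coefficients for one color channel. */
typedef struct
{
    float s;    /* slope:     (rc - lc) / w                */
    float b;    /* intercept: (lc * right - rc * left) / w */
} segment_t;

/* Recompute the coefficients when the parameter moves in between a
 * different pair of stops ("reset"). Assumes right != left. */
static segment_t
segment_reset (float left, float lc, float right, float rc)
{
    segment_t seg;
    float     w_rec = 1.0f / (right - left);    /* one reciprocal, reused */

    seg.s = (rc - lc) * w_rec;
    seg.b = (lc * right - rc * left) * w_rec;

    return seg;
}

/* Per-pixel work: one multiplication and one addition. */
static float
segment_eval (const segment_t *seg, float x)
{
    return seg->s * x + seg->b;
}
```

For stops at left = 1 (color 10) and right = 3 (color 20), the reset step yields s = 5 and b = 5, so evaluating at x = 2 returns 15, the midpoint color, and the endpoints return 10 and 20.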

== Implementation notes:

The new formula described above is implemented in single precision
floating point, and the eight divisions necessary to compute the
cached values are done by multiplication with the reciprocal of the
distance between the color stops.

The alpha values used in the cached computation are scaled by 255.0,
whereas the RGB values are kept in the [0, 1] interval. This ensures
that after premultiplication, all values will be in the [0, 255]
interval.

This scaling is done by first dividing all the channels by
257, and then later on dividing the r, g, b channels by 255. It would
be more natural to do all this scaling in only one place, but
inexplicably, that results in a (substantial) slowdown on Sandy Bridge
with GCC v 4.7.

== Performance impact (median of three runs of radial-perf-test):

   == Intel Sandy Bridge, Core i3 @ 1.2GHz

   Before: 0.014553
   After:  0.014410
   Change: 1.0% faster

   == AMD Barcelona @ 1.2 GHz

   Before: 0.021735
   After:  0.021328
   Change: 1.9% faster

I.e., slightly faster, though conceivably there could be a negative
impact on machines with a bigger difference between integer and
floating point performance.

V2:

- Use 's' and 'b' in the variable names instead of 'm' and 'd'. This
  way they match the explanation above

- Move variable declarations to the top of the function

- Remove unused stepper field

- Some formatting fixes

- Don't pointlessly include pixman-combine32.h

- Don't offset x for each pixel; go back to offsetting left_x and
  right_x at reset time. The offsets cancel out in the formula above,
  so there is no impact on the calculations.
2013-03-16 01:14:22 -04:00
Søren Sandmann Pedersen
a1c2331e0e Move the IS_ZERO() to pixman-private.h and rename to FLOAT_IS_ZERO()
Some upcoming changes to pixman-gradient-walker.c will need this
macro.
2013-03-11 22:41:55 -04:00
Søren Sandmann Pedersen
2c953e572f test: Add radial-perf-test, a microbenchmark for radial gradients
This benchmark renders one of the radial gradients used in the
swfdec-youtube cairo trace 500 times and reports the average time it
took.

V2: Update .gitignore
2013-03-11 22:41:45 -04:00
Søren Sandmann Pedersen
460faaa411 demos: Add linear-gradient demo program
This program displays a linear gradient from blue to yellow. Due to
limited precision in pixman-gradient-walker.c, it currently has some
ugly artefacts that give it a 'brushed metal' appearance.

V2: Update .gitignore
2013-03-11 22:40:05 -04:00
Behdad Esfahbod
aaae3d8eef Remove unused macro 2013-03-08 06:00:00 -05:00
Nemanja Lukic
5feda20fc3 MIPS: DSPr2: Added more fast-paths for SRC operation:
- src_0888_8888_rev
- src_0888_0565_rev

Performance numbers before/after on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Referent (before):
        src_0888_8888_rev =  L1:  51.88  L2:  42.00  M: 19.04 ( 88.50%)  HT: 15.27  VT: 14.62  R: 14.13  RT:  7.12 (  45Kops/s)
        src_0888_0565_rev =  L1:  31.96  L2:  30.90  M: 22.60 ( 75.03%)  HT: 15.32  VT: 15.11  R: 14.49  RT:  6.64 (  43Kops/s)

Optimized:
        src_0888_8888_rev =  L1: 222.73  L2: 113.70  M: 20.97 ( 97.35%)  HT: 18.31  VT: 17.14  R: 16.71  RT:  9.74 (  54Kops/s)
        src_0888_0565_rev =  L1: 100.37  L2:  74.27  M: 29.43 ( 97.63%)  HT: 22.92  VT: 21.59  R: 20.52  RT: 10.56 (  56Kops/s)
2013-02-27 14:40:51 +01:00