Commit Graph

2114 Commits

Author SHA1 Message Date
Matt Turner
e927d23971 configure.ac: require >= gcc-4.5 for ARM iwMMXt
We're using a patched gcc-4.5, and having to modify configure.ac and
autoreconf between changes is annoying. And besides, 4.5, 4.6, and 4.7's
iwMMXt intrinsic support is equally broken, and we test a known broken
intrinsic in the configure test program, so the version check is rather
meaningless.
2012-04-15 14:00:17 -04:00
Matt Turner
0531170436 mmx: Use force_inline instead of __inline__ (bug 46906)
Fixes the build on MSVC.
2012-04-05 17:36:05 -04:00
Matt Turner
b950bb12dc mmx: enable over_n_0565 for b5g6r5
Signed-off-by: Matt Turner <mattst88@gmail.com>
2012-04-05 17:34:26 -04:00
Søren Sandmann Pedersen
87ecec8d72 gtk-utils.c: In pixbuf_from_argb32() use a8r8g8b8_to_rgba_np()
Instead of inlining a copy of that functionality.
2012-04-02 15:25:00 -04:00
Søren Sandmann Pedersen
d1ec1467f6 test/utils.c: Rename and export the pngify_pixels() function.
This function converts from a8r8g8b8 to non-premultiplied RGBA (the
PNG or GdkPixbuf format that has the channels in this order: R, G, B,
A in memory regardless of the computer's endianness). The function's
new name is a8r8g8b8_to_rgba_np().
2012-04-02 15:24:56 -04:00
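The conversion described in d1ec1467f6 can be sketched as follows. This is an illustration only, not the code in test/utils.c: the function name argb32_to_rgba_np_sketch is hypothetical, and it assumes the a8r8g8b8 input is premultiplied, so each color channel is divided by alpha (with rounding) before being written out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not pixman's code): converts premultiplied a8r8g8b8
 * pixels to non-premultiplied bytes stored as R, G, B, A in memory. */
void
argb32_to_rgba_np_sketch (uint8_t *dst, const uint32_t *src, size_t n_pixels)
{
    size_t i;

    assert (dst != NULL && src != NULL);

    for (i = 0; i < n_pixels; ++i)
    {
        uint32_t p = src[i];
        uint32_t a = p >> 24;
        uint32_t r = (p >> 16) & 0xff;
        uint32_t g = (p >> 8) & 0xff;
        uint32_t b = p & 0xff;

        if (a == 0)
        {
            /* Fully transparent: the color is undefined, so emit zeros. */
            r = g = b = 0;
        }
        else
        {
            /* Undo the premultiplication, rounding to nearest. */
            r = (r * 255 + a / 2) / a;
            g = (g * 255 + a / 2) / a;
            b = (b * 255 + a / 2) / a;

            /* Clamp in case the input was not validly premultiplied. */
            if (r > 255) r = 255;
            if (g > 255) g = 255;
            if (b > 255) b = 255;
        }

        /* Byte-wise stores give R, G, B, A order on any endianness. */
        dst[4 * i + 0] = (uint8_t) r;
        dst[4 * i + 1] = (uint8_t) g;
        dst[4 * i + 2] = (uint8_t) b;
        dst[4 * i + 3] = (uint8_t) a;
    }
}
```

Writing the output one byte at a time, rather than storing a packed 32-bit word, is what makes the memory order independent of the host's endianness.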
Søren Sandmann Pedersen
b16ddf1782 gtk-utils.c: Don't include pixman-private.h
Use pixman_image_get_format() instead of image->bits.format.
2012-04-02 14:59:02 -04:00
Søren Sandmann Pedersen
b9ca23a9c7 Rename fast_composite_add_1000_1000 to _add_1_1()
The 1000_1000 name is a relic from before the refactoring.
2012-03-27 22:04:37 -04:00
Søren Sandmann Pedersen
746291a19e Add the original parrot image.
This is the Parrot image that was downscaled and cropped before being
used in the composite-test.c demo.
2012-03-27 22:04:36 -04:00
Søren Sandmann Pedersen
451b25ae90 composite-test.c: Add a parrot image
Instead of the yellow square, use a parrot as the source image. This
demonstrates the various blend modes much better.

The parrot is a cropped version of a finger painting by Rubens LP:

    http://www.flickr.com/photos/dorubens/4030604504/in/set-72157622586088192/

where the background has been removed. Used here under Creative
Commons Attribution. The artist's web site:

     http://www.rubenslp.com.br/
2012-03-27 22:04:32 -04:00
Søren Sandmann Pedersen
3aa45d62e4 composite-test.c: Use similar gradient to the one in the PDF spec.
2012-03-24 16:41:47 -04:00
Søren Sandmann Pedersen
e1b8969e78 demos: Add checkerboard demo
This is a simple demo that displays a checkerboard with a projective
transformation.
2012-03-24 16:29:36 -04:00
Søren Sandmann Pedersen
41863fbabb demos: Add quad2quad program
This program can compute the projective transformation that transforms
one quadrilateral into another. The code is basically maxima[1] output
translated into C.

[1] http://maxima.sourceforge.net/
2012-03-24 16:29:27 -04:00
Søren Sandmann Pedersen
cf0d0d6364 Use "=a" and "=d" constraints for rdtsc inline assembly
In 32 bit mode the "=A" constraint refers to the register pair
edx:eax, but according to GCC developers this is not the case in 64
bit mode, where it refers to "rax".

Hence, using "=A" for rdtsc is incorrect in 64 bit mode.

See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21249
2012-03-24 16:26:07 -04:00
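The constraint change in cf0d0d6364 can be illustrated with the sketch below. It is not the pixman source: the function names are hypothetical, the inline assembly is only compiled for GCC-compatible compilers on x86, and the fallback branch returning 0 is a placeholder added here so the example builds elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: joins the two 32-bit halves that rdtsc returns. */
uint64_t
combine_halves_sketch (uint32_t hi, uint32_t lo)
{
    return ((uint64_t) hi << 32) | lo;
}

/* Hypothetical helper: reads the time stamp counter. */
uint64_t
read_tsc_sketch (void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
    uint32_t lo, hi;

    /* rdtsc writes the low half to eax and the high half to edx.
     * "=a" and "=d" name those registers explicitly, so this works in
     * both 32 bit and 64 bit mode, unlike a single "=A" output. */
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));

    return combine_halves_sketch (hi, lo);
#else
    /* Placeholder for other architectures (an assumption of this sketch). */
    return 0;
#endif
}
```

Using two 32-bit outputs and combining them in C avoids relying on how the compiler interprets "=A" in each mode.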
Jeremy Huddleston
8a8aabf05c configure.ac: Fix a copy-paste-o in TLS detection
Regression from: a069da6c66

Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Tested-by: Matt Turner <mattst88@gmail.com>
2012-03-16 12:41:14 -07:00
Matt Turner
ee6bac11c2 Use AC_LANG_SOURCE for DSPr2 configure program
Signed-off-by: Matt Turner <mattst88@gmail.com>
2012-03-15 16:49:29 -04:00
Chun-wei Fan
21eeecffa9 Just include xmmintrin.h on MSVC as well
The xmmintrin.h as shipped with recent Visual C++ (2003+) provides
_mm_shuffle_pi16 and _mm_mulhi_pu16, so including that header
will do for using these functions, and MSVC does not like the GCC-specific
implementations of _mm_shuffle_pi16 and _mm_mulhi_pu16 that are
currently in the code.

_MM_SHUFFLE is declared in the same way in MSVC's xmmintrin.h, so don't
re-define it here to avoid a compilation warning.
2012-03-15 15:18:11 -04:00
Jeremy Huddleston
94aea2e868 Fix a false-negative in MMX check
Silence warnings that could make -Werror give a false negative
Use signed char to avoid cases where int8_t isn't declared

Reported-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
2012-03-14 19:10:22 -07:00
Nemanja Lukic
d2ee5631ae MIPS: DSPr2: Added over_n_8888_8888_ca and over_n_8888_0565_ca fast paths.
Performance numbers before/after on MIPS-74kc @ 1GHz

Reference (before):

lowlevel-blt-bench:
     over_n_8888_8888_ca =  L1:   8.32  L2:   7.65  M:  6.38 ( 51.08%)  HT:  5.78  VT:  5.74  R:  5.84  RT:  4.39 (  37Kops/s)
     over_n_8888_0565_ca =  L1:   7.40  L2:   6.95  M:  6.16 ( 41.06%)  HT:  5.72  VT:  5.52  R:  5.63  RT:  4.28 (  36Kops/s)
cairo-perf-trace:
[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.25.3
[  0]    image            xfce4-terminal-a1  138.223  139.070   0.33%    6/6
[ # ]  image16: pixman 0.25.3
[  0]  image16            xfce4-terminal-a1  132.763  132.939   0.06%    5/6

Optimized:

lowlevel-blt-bench:
     over_n_8888_8888_ca =  L1:  19.35  L2:  23.84  M: 13.68 (109.39%)  HT: 11.39  VT: 11.19  R: 11.27  RT:  6.90 (  47Kops/s)
     over_n_8888_0565_ca =  L1:  18.68  L2:  17.00  M: 12.56 ( 83.70%)  HT: 10.72  VT: 10.45  R: 10.43  RT:  5.79 (  43Kops/s)
cairo-perf-trace:
[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.25.3
[  0]    image            xfce4-terminal-a1  130.400  131.720   0.46%    6/6
[ # ]  image16: pixman 0.25.3
[  0]  image16            xfce4-terminal-a1  125.830  126.604   0.34%    6/6
2012-03-13 18:04:31 -04:00
Jeremy Huddleston
a069da6c66 Expand TLS support beyond __thread to __declspec(thread)
This code was pretty much copied from a similar commit that I made to
xorg-server in April.

cf: xorg/xserver: bb4d145bd25e2aee988b100ecf1105ea3b6a40b8

Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
2012-03-13 18:02:26 -04:00
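The idea behind a069da6c66, choosing between __thread and __declspec(thread), might look like the sketch below. The macro name TLS_SKETCH and the counter function are hypothetical, and the compiler tests here are a simplification: pixman's actual detection is done by configure.ac, not by these preprocessor checks.

```c
#include <assert.h>
#include <limits.h>

/* Pick the thread-local storage keyword the compiler understands. */
#if defined(_MSC_VER)
#  define TLS_SKETCH __declspec(thread)
#elif defined(__GNUC__) || defined(__clang__)
#  define TLS_SKETCH __thread
#else
#  error "No known TLS keyword; another mechanism would be needed here"
#endif

/* Each thread gets its own zero-initialized copy of this variable. */
static TLS_SKETCH int call_count;

/* Hypothetical helper: counts calls made by the calling thread only. */
int
bump_call_count_sketch (void)
{
    assert (call_count < INT_MAX);
    return ++call_count;
}
```

Because the variable is per-thread, no locking is needed around the increment.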
Jeremy Huddleston
61d999b910 Disable MMX when incompatible clang is being used.
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
2012-03-13 18:02:26 -04:00
Jeremy Huddleston
ad4b6922f2 Silence a warning about unused pixman_have_mmx
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
2012-03-13 18:02:25 -04:00
Jeremy Huddleston
bb5ff26878 Revert "Disable MMX when Clang is being used."
This reverts commit 5eb4c12a79.
2012-03-13 18:02:25 -04:00
Cyril Brulebois
c6b4daedbc Upload to experimental.
2012-03-09 13:17:30 +01:00
Cyril Brulebois
b3db603f91 Add new symbols and bump shlibs accordingly.
2012-03-09 13:15:11 +01:00
Cyril Brulebois
e6c37e621b Bump changelogs.
2012-03-09 13:03:52 +01:00
Cyril Brulebois
e4e7b8fcb8 Merge branch 'debian-unstable' into debian-experimental
2012-03-09 13:03:07 +01:00
Cyril Brulebois
44abaa5132 Merge branch 'upstream-unstable' into debian-experimental
2012-03-09 13:03:04 +01:00
Søren Sandmann Pedersen
a6ad5120f7 Post-release version bump to 0.25.3
2012-03-08 10:11:20 -05:00
Søren Sandmann Pedersen
f73f798531 Pre-release version bump to 0.25.2
2012-03-08 09:33:16 -05:00
Søren Sandmann Pedersen
62df04eb25 mmx: Squash a warning by making the argument to ldl_u() const
2012-03-08 09:29:46 -05:00
Alan Coopersmith
85943733cb Just use xmmintrin.h when building with Solaris Studio compilers
Since the Solaris Studio compilers don't have a mode where MMX
instructions are available and SSE instructions are not, we can
just use the <xmmintrin.h> header directly.

Fixes build failure due to Studio not supporting the __gnu_inline__
or __artificial__ attributes.

Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Acked-by: Matt Turner <mattst88@gmail.com>
2012-03-05 18:57:26 -08:00
Nemanja Lukic
304f57644a MIPS: DSPr2: Added mips_dspr2_blt and mips_dspr2_fill routines.
Performance numbers before/after on MIPS-74kc @ 1GHz

Reference (before):

lowlevel-blt-bench:
              src_n_0565 =  L1: 238.14  L2: 233.15  M: 57.88 ( 77.23%)  HT: 53.22  VT: 49.99  R: 47.73  RT: 24.79 (  91Kops/s)
              src_n_8888 =  L1: 190.19  L2: 187.57  M: 28.94 ( 77.23%)  HT: 27.91  VT: 27.33  R: 26.64  RT: 14.68 (  77Kops/s)
cairo-perf-trace:
[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.25.1
[  0]    image         gnome-system-monitor  268.460  269.712   0.22%    6/6

Optimized:

lowlevel-blt-bench:
              src_n_0565 =  L1:1081.39  L2: 258.22  M:189.59 (252.91%)  HT: 60.23  VT: 55.01  R: 53.44  RT: 23.68 (  89Kops/s)
              src_n_8888 =  L1: 653.46  L2: 113.55  M:135.26 (360.86%)  HT: 38.99  VT: 37.38  R: 34.95  RT: 18.67 (  84Kops/s)
cairo-perf-trace:
[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.25.1
[  0]    image         gnome-system-monitor  246.565  246.706   0.04%    6/6
2012-03-04 01:09:56 -05:00
Søren Sandmann Pedersen
999e72b80b pixman-access.c: Remove some unused macros
The macros related to palette entries:

RGB15_TO_ENTRY,
RGB24_TO_ENTRY,
RGB24_TO_ENTRY_Y

are not used anywhere.
2012-03-01 23:49:51 -05:00
Søren Sandmann Pedersen
c0cb48aae0 pixman-accessors.h: Delete unused macros
The MEMCPY_WRAPPED and ACCESS macros are not used anymore.
2012-03-01 23:49:51 -05:00
Søren Sandmann Pedersen
5adf569317 Move fetching for solid bits images to pixman-noop.c
This should be a bit faster because it can reuse the scanline on each iteration.
2012-03-01 23:49:50 -05:00
Matt Turner
3c3c70fa0b lowlevel-blt-bench: add in_8_8 and in_n_8_8
Signed-off-by: Matt Turner <mattst88@gmail.com>
2012-03-01 17:42:37 -05:00
Søren Sandmann Pedersen
fcea053561 Disable implementations mentioned in the PIXMAN_DISABLE environment variable.
With this, it becomes possible to do

     PIXMAN_DISABLE="sse2 mmx" some_app

which will run some_app without SSE2 and MMX enabled. This is useful
for benchmarking, testing and narrowing down bugs.

The current list of implementations that can be disabled:

    fast
    mmx
    sse2
    arm-simd
    arm-iwmmxt
    arm-neon
    mips-dspr2
    vmx

The general and noop implementations can't be disabled because pixman
depends on those being available for correct operation.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2012-02-28 15:46:13 -05:00
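The lookup that fcea053561 describes, checking whether an implementation name appears in PIXMAN_DISABLE, can be sketched as below. This is a minimal illustration under stated assumptions, not pixman's code: both function names are hypothetical, and the sketch assumes names are separated by spaces, as in the example in the commit message.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 1 if `name` is one of the space-separated
 * tokens in `list`, 0 otherwise. A NULL list matches nothing. */
int
name_in_list_sketch (const char *list, const char *name)
{
    size_t name_len;

    assert (name != NULL);

    name_len = strlen (name);
    if (list == NULL || name_len == 0)
        return 0;

    while (*list != '\0')
    {
        size_t tok_len;

        list += strspn (list, " ");     /* skip separators */
        tok_len = strcspn (list, " ");  /* length of the next token */

        /* Compare whole tokens so "sse" does not match "sse2". */
        if (tok_len == name_len && strncmp (list, name, name_len) == 0)
            return 1;

        list += tok_len;
    }

    return 0;
}

/* Hypothetical helper: consults the PIXMAN_DISABLE environment variable. */
int
implementation_disabled_sketch (const char *name)
{
    return name_in_list_sketch (getenv ("PIXMAN_DISABLE"), name);
}
```

Keeping the string matching separate from the getenv() call makes the matching logic easy to test without touching the process environment.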
Nemanja Lukic
e7574d336b MIPS: DSPr2: Added fast-paths for SRC operation.
The following fast-path functions are implemented (routines 4, 5 and 6 use the
same fast-memcpy routine):
    1. src_x888_8888
    2. src_8888_0565
    3. src_0565_8888
    4. src_0565_0565
    5. src_8888_8888
    6. src_0888_0888

Performance numbers before/after on MIPS-74kc @ 1GHz

Reference (before):

lowlevel-blt-bench:
        src_x888_8888 =  L1: 199.35  L2:  96.54  M: 18.87 (100.68%)  HT: 17.12  VT: 16.24  R: 15.43  RT:  9.33 (  61Kops/s)
        src_8888_0565 =  L1:  71.22  L2:  51.95  M: 24.19 ( 96.17%)  HT: 20.71  VT: 19.92  R: 18.15  RT:  9.92 (  63Kops/s)
        src_0565_8888 =  L1:  38.82  L2:  36.22  M: 18.60 ( 73.95%)  HT: 14.47  VT: 13.19  R: 12.97  RT:  6.61 (  49Kops/s)
        src_0565_0565 =  L1: 286.05  L2: 155.02  M: 37.68 (100.54%)  HT: 31.08  VT: 28.07  R: 26.26  RT: 11.93 (  68Kops/s)
        src_8888_8888 =  L1: 454.32  L2: 139.15  M: 19.30 (102.98%)  HT: 17.73  VT: 16.08  R: 16.62  RT: 10.45 (  64Kops/s)
        src_0888_0888 =  L1: 190.47  L2: 106.14  M: 25.26 (101.08%)  HT: 21.88  VT: 20.32  R: 18.83  RT: 10.10 (  63Kops/s)
cairo-perf-trace:
[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.25.1
[  0]    image            firefox-asteroids  421.215  421.325   0.01%    4/6
[  1]    image         firefox-planet-gnome  647.708  648.486   0.13%    6/6
[  2]    image         gnome-system-monitor  276.073  277.506   0.38%    6/6
[  3]    image           gnome-terminal-vim  263.866  265.229   0.39%    6/6
[  4]    image                      poppler  123.576  124.003   0.15%    6/6

Optimized (with these optimizations):

lowlevel-blt-bench:
        src_x888_8888 =  L1: 369.50  L2:  99.37  M: 27.19 (145.07%)  HT: 20.24  VT: 19.48  R: 19.00  RT: 10.22 (  63Kops/s)
        src_8888_0565 =  L1: 105.65  L2:  67.87  M: 25.41 (101.00%)  HT: 20.78  VT: 19.84  R: 18.52  RT:  9.81 (  63Kops/s)
        src_0565_8888 =  L1:  77.10  L2:  63.04  M: 23.37 ( 92.90%)  HT: 20.29  VT: 19.37  R: 18.14  RT: 10.02 (  63Kops/s)
        src_0565_0565 =  L1: 519.02  L2: 241.32  M: 62.35 (166.34%)  HT: 33.74  VT: 27.63  R: 26.12  RT: 11.70 (  67Kops/s)
        src_8888_8888 =  L1: 390.48  L2: 113.99  M: 30.32 (161.77%)  HT: 19.55  VT: 17.05  R: 17.13  RT: 10.19 (  63Kops/s)
        src_0888_0888 =  L1: 349.74  L2: 156.68  M: 40.68 (162.78%)  HT: 25.58  VT: 20.57  R: 20.20  RT:  9.96 (  63Kops/s)
cairo-perf-trace:
[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.25.1
[  0]    image            firefox-asteroids  400.050  400.308   0.04%    6/6
[  1]    image         firefox-planet-gnome  628.978  629.364   0.07%    6/6
[  2]    image         gnome-system-monitor  270.247  270.313   0.03%    6/6
[  3]    image           gnome-terminal-vim  256.413  257.641   0.21%    6/6
[  4]    image                      poppler  119.540  120.023   0.21%    6/6
2012-02-25 15:06:43 -05:00
Nemanja Lukic
1364c91bd1 MIPS: DSPr2: Basic infrastructure for MIPS architecture
MIPS DSP instruction set extensions
2012-02-25 15:06:43 -05:00
Matt Turner
e43d65d49d lowlevel-blt: add over_x888_n_8888
Signed-off-by: Matt Turner <mattst88@gmail.com>
2012-02-24 20:02:55 -05:00
Matt Turner
9f60704995 lowlevel-blt: add over_8888_8888
Signed-off-by: Matt Turner <mattst88@gmail.com>
2012-02-24 19:58:09 -05:00
Søren Sandmann Pedersen
5eb4c12a79 Disable MMX when Clang is being used.
There are several issues with the Clang compiler and pixman-mmx.c:

- When not optimizing, it doesn't seem to recognize that an argument
  to an __always_inline__ function is compile-time constant. This
  results in this error being produced:

      fatal error: error in backend: Invalid operand for inline asm
              constraint 'K'!

- This inline assembly:

      asm ("pmulhuw %1, %0\n\t"
          : "+y" (__A)
          : "y" (__B)
      );

  results in

      fatal error: error in backend: Unsupported asm: input constraint
              with a matching output constraint of incompatible type!

So disable MMX when the compiler is Clang.
2012-02-24 16:30:41 -05:00
Matt Turner
350e231b3f mmx: make load8888 take a pointer to data instead of the data itself
Allows us to tune how we load data into the vector registers.

Signed-off-by: Matt Turner <mattst88@gmail.com>

And squashed in:

mmx: define and use load8888u function

For unaligned loads.

Signed-off-by: Matt Turner <mattst88@gmail.com>
2012-02-24 08:46:48 -05:00
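Commit 350e231b3f mentions a separate load8888u function for unaligned loads. One portable way to read a 32-bit value from a possibly unaligned address is sketched below. This is a generic technique and not necessarily how pixman-mmx.c does it; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a uint32_t from an address that may not be
 * 4-byte aligned. Dereferencing a misaligned uint32_t pointer directly is
 * undefined behavior in C; memcpy into a local variable is well defined and
 * is typically compiled to a single load where the hardware allows it. */
uint32_t
load_u32_unaligned_sketch (const void *p)
{
    uint32_t v;

    assert (p != NULL);

    memcpy (&v, p, sizeof v);
    return v;
}
```

Taking a pointer rather than the loaded value, as the commit does for load8888, is what lets the loader choose between an aligned and an unaligned strategy.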
Matt Turner
ab68316eda mmx: make store8888 take uint32_t *dest as argument
Allows us to tune how we store data from the vector registers.

Signed-off-by: Matt Turner <mattst88@gmail.com>
2012-02-24 08:46:28 -05:00
Matt Turner
57a245a6e0 Update .gitignore with more demos and tests
Signed-off-by: Matt Turner <mattst88@gmail.com>
2012-02-22 16:32:46 -05:00
Søren Sandmann Pedersen
51ae3f2d7f mmx: Delete unused function in_over_full_src_alpha()
Also a few minor formatting fixes.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2012-02-22 14:14:30 -05:00
Søren Sandmann Pedersen
bbd1e6941b mmx: Enable over_x888_8_8888() for x86 as well
It used to be slower than the generic code (with the gcc that was
current in 2007), but that doesn't seem to be the case anymore:

over_x888_8_8888 =  L1:  22.97  L2:  22.88  M: 22.27 (  5.29%)  HT: 18.30  VT: 15.81  R: 15.54  RT: 10.35 ( 131Kops/s)
over_x888_8_8888 =  L1:  53.56  L2:  53.20  M: 50.50 ( 11.99%)  HT: 38.60  VT: 31.19  R: 29.00  RT: 17.37 ( 208Kops/s)

Reviewed-by: Matt Turner <mattst88@gmail.com>
2012-02-22 14:14:08 -05:00
Matt Turner
4fc586c3df mmx: fix typo in pix_add_mul on MSVC
Typo introduced in commit a075a870.

Signed-off-by: Matt Turner <mattst88@gmail.com>
2012-02-21 16:28:37 -05:00
Matt Turner
84221f4c16 mmx: Use _mm_shuffle_pi16
The pshufw x86 instruction is part of Extended 3DNow! and SSE1. The
equivalent ARM wshufh instruction was available from the first iwMMXt
instruction set.

This instruction is already used in the SSE2 code.

Reduces code size by ~9%.

amd64
  text    data     bss     dec     hex filename
 29925    2240       0   32165    7da5 .libs/libpixman_mmx_la-pixman-mmx.o
 27237    2240       0   29477    7325 .libs/libpixman_mmx_la-pixman-mmx.o

x86
  text    data     bss     dec     hex filename
 27677    1792       0   29469    731d .libs/libpixman_mmx_la-pixman-mmx.o
 24959    1792       0   26751    687f .libs/libpixman_mmx_la-pixman-mmx.o

arm
  text    data     bss     dec     hex filename
 30176    1792       0   31968    7ce0 .libs/libpixman_iwmmxt_la-pixman-mmx.o
 27384    1792       0   29176    71f8 .libs/libpixman_iwmmxt_la-pixman-mmx.o

Signed-off-by: Matt Turner <mattst88@gmail.com>
2012-02-21 12:47:49 -05:00
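As a reference for what the pshufw instruction behind _mm_shuffle_pi16 (84221f4c16) computes, here is a scalar model in plain C. It is a sketch for illustration, not pixman code; the function name is hypothetical and the four 16-bit lanes are represented as an array with lane 0 first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of pshufw: output lane i is the input lane
 * selected by bits [2i+1 : 2i] of the 8-bit immediate. */
void
shuffle_words_sketch (uint16_t dst[4], const uint16_t src[4], unsigned imm8)
{
    uint16_t tmp[4];
    int i;

    assert (dst != NULL && src != NULL && imm8 <= 0xff);

    for (i = 0; i < 4; ++i)
        tmp[i] = src[(imm8 >> (2 * i)) & 3];

    /* Copy through a temporary so dst and src may be the same array. */
    memcpy (dst, tmp, sizeof tmp);
}
```

For example, an immediate of 0xff, which is _MM_SHUFFLE (3, 3, 3, 3), broadcasts lane 3 to all four lanes, and 0x1b reverses the lane order.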
Matt Turner
1420834496 mmx: Use _mm_mulhi_pu16
The pmulhuw x86 instruction is part of Extended 3DNow! and SSE1. The
equivalent ARM wmuluh instruction was available from the first iwMMXt
instruction set.

This instruction is already used in the SSE2 code.

Reduces code size by ~5%.

amd64
  text    data     bss     dec     hex filename
 31325    2240       0   33565    831d .libs/libpixman_mmx_la-pixman-mmx.o
 29925    2240       0   32165    7da5 .libs/libpixman_mmx_la-pixman-mmx.o

x86
  text    data     bss     dec     hex filename
 29165    1792       0   30957    78ed .libs/libpixman_mmx_la-pixman-mmx.o
 27677    1792       0   29469    731d .libs/libpixman_mmx_la-pixman-mmx.o

arm
  text    data     bss     dec     hex filename
 31632    1792       0   33424    8290 .libs/libpixman_iwmmxt_la-pixman-mmx.o
 30176    1792       0   31968    7ce0 .libs/libpixman_iwmmxt_la-pixman-mmx.o

Signed-off-by: Matt Turner <mattst88@gmail.com>
2012-02-21 12:46:02 -05:00