Commit Graph

2096 Commits

Author SHA1 Message Date
Cyril Brulebois
ae5a109768 Upload to experimental. 2012-05-20 17:56:41 +02:00
Cyril Brulebois
a2283057a6 Remove demos/parrot.jpg before building the source package.
Let's avoid “binary file contents changed” until it's shipped in the
upstream tarball.
2012-05-20 17:56:18 +02:00
Cyril Brulebois
5cb7202a34 Bump changelogs. 2012-05-20 17:41:34 +02:00
Cyril Brulebois
4ed6f63c09 Merge branch 'upstream-experimental' into debian-experimental 2012-05-20 17:40:56 +02:00
Søren Sandmann Pedersen
1e1a00e964 Pre-release version bump to 0.25.6
Note that 0.25.4 was a botched release that doesn't have a tag and
doesn't correspond to any commit ID. It was however uploaded and
announced, so I'll just use the 0.25.6 version number.
2012-05-15 13:20:09 -04:00
Søren Sandmann Pedersen
b2c16aaadf demos/Makefile.am: Add parrot.c to EXTRA_DIST
To get 'make distcheck' to pass.
2012-05-15 13:19:19 -04:00
Matt Turner
50d3088d78 configure.ac: Rename loongson -> loongson-mmi
This makes it match the other fast paths, and the PIXMAN_DISABLE value is
already loongson-mmi.
2012-05-11 21:59:13 -04:00
Matt Turner
a0a40cb822 configure.ac: Fix loongson-mmi out-of-tree builds
When building out-of-tree, gcc wasn't able to find loongson-mmintrin.h
to compile the test program. Add -I$srcdir to CFLAGS to point gcc to it.
2012-05-11 21:49:42 -04:00
Nemanja Lukic
618a08e6aa MIPS: DSPr2: Added over_n_8_8888 and over_n_8_0565 fast paths.
Performance numbers before/after on MIPS-74kc @ 1GHz

Reference (before):

lowlevel-blt-bench:
     over_n_8_8888 =  L1:  10.40  L2:   9.79  M:  8.47 ( 33.62%)  HT:  7.64  VT:  7.59  R:  7.48  RT:  5.30 (  40Kops/s)
     over_n_8_0565 =  L1:   7.40  L2:   7.23  M:  6.78 ( 17.94%)  HT:  6.23  VT:  6.17  R:  6.14  RT:  4.62 (  37Kops/s)

Optimized:

lowlevel-blt-bench:
     over_n_8_8888 =  L1:  27.25  L2:  26.24  M: 18.15 ( 72.12%)  HT: 14.52  VT: 14.31  R: 13.83  RT:  7.57 (  48Kops/s)
     over_n_8_0565 =  L1:  18.91  L2:  17.59  M: 15.06 ( 39.90%)  HT: 12.18  VT: 11.98  R: 11.83  RT:  6.80 (  46Kops/s)
2012-05-11 17:11:27 -04:00
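In pixman's fast-path names, `over_n_8_8888` composites a solid source (`n`) through an a8 mask (`8`) onto an a8r8g8b8 destination (`8888`). The following is a minimal scalar sketch of that per-pixel arithmetic, offered only as a reference for what commit 618a08e6aa accelerates; it is not the DSPr2 assembly. The helper names are hypothetical, and the rounded divide-by-255 is an assumption modeled on the usual pixman-style approximation.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply two 8-bit values and divide by 255 with rounding:
 * t = a*b + 128; result = (t + (t >> 8)) >> 8. */
static uint8_t
mul_un8 (uint8_t a, uint8_t b)
{
    uint32_t t = (uint32_t)a * b + 0x80;
    return (uint8_t)((t + (t >> 8)) >> 8);
}

/* One pixel of OVER with a solid premultiplied a8r8g8b8 source `src`,
 * an 8-bit mask value `m` and an a8r8g8b8 destination `dst`:
 *   s   = src IN m                       (every channel scaled by m / 255)
 *   out = s + dst * (255 - alpha(s)) / 255
 */
static uint32_t
over_n_8_8888_pixel (uint32_t src, uint8_t m, uint32_t dst)
{
    uint32_t out = 0;
    uint8_t sa = mul_un8 ((uint8_t)(src >> 24), m);
    uint8_t ia = (uint8_t)(255 - sa);

    for (int shift = 0; shift < 32; shift += 8)
    {
        uint32_t s = mul_un8 ((uint8_t)(src >> shift), m);
        uint32_t d = mul_un8 ((uint8_t)(dst >> shift), ia);
        uint32_t sum = s + d;
        /* Saturate as a guard; premultiplied input should not exceed 255. */
        out |= (sum > 255 ? 255 : sum) << shift;
    }
    return out;
}
```

With a mask value of 255 the result is the source itself for an opaque source, and with a mask value of 0 the destination is left unchanged.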
Matt Turner
7d4beedc61 mmx: add and use pack_4x565 function
The pack_4x565 function makes use of the pack_4xpacked565 function, which uses pmadd.

Some of the speed up is probably attributable to removing the artificial
serialization imposed by the
	vdest = pack_565 (..., vdest, 0);
	vdest = pack_565 (..., vdest, 1);
	...
pattern.

Loongson:
        over_n_0565 =  L1:  16.44  L2:  16.42  M: 13.83 (  9.85%)  HT: 12.83  VT: 12.61  R: 12.34  RT:  8.90 (  93Kops/s)
        over_n_0565 =  L1:  42.48  L2:  42.53  M: 29.83 ( 21.20%)  HT: 23.39  VT: 23.72  R: 21.80  RT: 11.60 ( 113Kops/s)

     over_8888_0565 =  L1:  15.61  L2:  15.42  M: 12.11 ( 25.79%)  HT: 11.07  VT: 10.70  R: 10.37  RT:  7.25 (  82Kops/s)
     over_8888_0565 =  L1:  35.01  L2:  35.20  M: 21.42 ( 45.57%)  HT: 18.12  VT: 17.61  R: 16.09  RT:  9.01 (  97Kops/s)

      over_n_8_0565 =  L1:  15.17  L2:  14.94  M: 12.57 ( 17.86%)  HT: 11.96  VT: 11.52  R: 10.79  RT:  7.31 (  79Kops/s)
      over_n_8_0565 =  L1:  29.83  L2:  29.79  M: 21.85 ( 30.94%)  HT: 18.82  VT: 18.25  R: 16.15  RT:  8.72 (  91Kops/s)

over_n_8888_0565_ca =  L1:  15.25  L2:  15.02  M: 11.64 ( 41.39%)  HT: 11.08  VT: 10.72  R: 10.02  RT:  7.00 (  77Kops/s)
over_n_8888_0565_ca =  L1:  30.12  L2:  29.99  M: 19.47 ( 68.99%)  HT: 17.05  VT: 16.55  R: 14.67  RT:  8.38 (  88Kops/s)

ARM/iwMMXt:
        over_n_0565 =  L1:  19.29  L2:  19.88  M: 17.38 ( 10.54%)  HT: 15.53  VT: 16.11  R: 13.69  RT: 11.00 (  96Kops/s)
        over_n_0565 =  L1:  36.02  L2:  34.85  M: 28.04 ( 16.97%)  HT: 22.12  VT: 24.21  R: 22.36  RT: 12.22 ( 103Kops/s)

     over_8888_0565 =  L1:  18.38  L2:  16.59  M: 12.34 ( 22.29%)  HT: 11.67  VT: 11.71  R: 11.02  RT:  6.89 (  72Kops/s)
     over_8888_0565 =  L1:  24.96  L2:  22.17  M: 15.11 ( 26.81%)  HT: 14.14  VT: 13.71  R: 13.18  RT:  8.13 (  78Kops/s)

      over_n_8_0565 =  L1:  14.65  L2:  12.44  M: 11.56 ( 14.50%)  HT: 10.93  VT: 10.39  R: 10.06  RT:  7.05 (  70Kops/s)
      over_n_8_0565 =  L1:  18.37  L2:  14.98  M: 13.97 ( 16.51%)  HT: 12.67  VT: 10.35  R: 11.80  RT:  8.14 (  74Kops/s)

over_n_8888_0565_ca =  L1:  14.27  L2:  12.93  M: 10.52 ( 33.23%)  HT:  9.70  VT:  9.90  R:  9.31  RT:  6.34 (  65Kops/s)
over_n_8888_0565_ca =  L1:  19.69  L2:  17.58  M: 13.40 ( 42.35%)  HT: 11.75  VT: 11.33  R: 11.17  RT:  7.49 (  73Kops/s)
2012-05-10 16:21:07 -04:00
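pack_4x565 in commit 7d4beedc61 produces four r5g6b5 values in one 64-bit word using pmadd. The scalar sketch below only reproduces the result, not the MMX technique; `pack_565_scalar` and `pack_4x565_scalar` are hypothetical names, and placing pixel i in bits 16*i to 16*i+15 is an assumption chosen to match little-endian lane order. Each lane is computed independently and OR'ed in, so no lane depends on the previous partial result, which is the serialization the commit message describes removing.

```c
#include <assert.h>
#include <stdint.h>

/* Truncate one a8r8g8b8 pixel to r5g6b5 (alpha is dropped). */
static uint16_t
pack_565_scalar (uint32_t p)
{
    uint32_t r = (p >> 16) & 0xff, g = (p >> 8) & 0xff, b = p & 0xff;
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Pack four pixels into one 64-bit word, pixel i in bits [16*i, 16*i + 15].
 * Each lane is computed on its own and combined with OR. */
static uint64_t
pack_4x565_scalar (const uint32_t p[4])
{
    uint64_t out = 0;
    for (int i = 0; i < 4; i++)
        out |= (uint64_t)pack_565_scalar (p[i]) << (16 * i);
    return out;
}
```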
Matt Turner
2beabd9fed configure.ac: make -march=loongson2f come before CFLAGS
Otherwise we'd have -march=loongson2f being overridden by automake's
CFLAGS ordering, which causes build failures when -march=<not loongson2f>
is specified by the user.
2012-05-10 16:15:34 -04:00
Søren Sandmann Pedersen
dadb9a318b Add Makefile.win32 and Makefile.win32.common to EXTRA_DIST
https://bugs.freedesktop.org/show_bug.cgi?id=46905
2012-05-10 15:54:32 -04:00
Matt Turner
3c57ec471e .gitignore: add demos/checkerboard and demos/quad2quad 2012-05-09 22:50:50 -04:00
Matt Turner
2d431b53d3 mmx: Use wpackhus in src_x888_0565 on iwMMXt
iwMMXt has an unsigned saturation pack instruction, while MMX/EXT
and Loongson don't.

ARM/iwMMXt:
src_8888_0565 =  L1: 110.38  L2:  82.33  M: 40.92 ( 73.22%)  HT: 35.63  VT: 32.22  R: 30.07  RT: 18.40 ( 132Kops/s)
src_8888_0565 =  L1: 117.91  L2:  83.05  M: 41.52 ( 75.58%)  HT: 37.63  VT: 35.40  R: 29.37  RT: 19.39 ( 134Kops/s)
2012-04-27 16:39:13 -04:00
Matt Turner
2ddd1c498b mmx: add src_8888_0565
Uses the pmadd technique described in
http://software.intel.com/sites/landingpage/legacy/mmx/MMX_App_24-16_Bit_Conversion.pdf

The technique uses the packssdw instruction, which uses signed
saturation. This works in their example because they pack 888 to 555,
leaving the high bit as zero. For packing to 565 it is unsuitable, so
we replace it with an or+shuffle.

Loongson:
src_8888_0565 =  L1: 106.13  L2:  83.57  M: 33.46 ( 68.90%)  HT: 30.29  VT: 27.67  R: 26.11  RT: 15.06 ( 135Kops/s)
src_8888_0565 =  L1: 122.10  L2: 117.53  M: 37.97 ( 78.58%)  HT: 33.14  VT: 30.09  R: 29.01  RT: 15.76 ( 139Kops/s)

ARM/iwMMXt:
src_8888_0565 =  L1:  67.88  L2:  56.61  M: 31.20 ( 56.74%)  HT: 29.22  VT: 27.01  R: 25.39  RT: 19.29 ( 130Kops/s)
src_8888_0565 =  L1: 110.38  L2:  82.33  M: 40.92 ( 73.22%)  HT: 35.63  VT: 32.22  R: 30.07  RT: 18.40 ( 132Kops/s)
2012-04-27 14:12:28 -04:00
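The signed-saturation problem described in commit 2ddd1c498b can be reproduced without any SIMD code. This sketch emulates one lane of packssdw (signed 32-bit to signed 16-bit, saturating); `packssdw_lane` and `survives_signed_pack` are hypothetical helpers. A 555 value never sets bit 15, so it always fits in int16_t, whereas any 565 value with a red field of 16 or more sets bit 15 and is clamped to 0x7fff.

```c
#include <assert.h>
#include <stdint.h>

/* Emulate one lane of packssdw: clamp a signed 32-bit value to int16_t. */
static int16_t
packssdw_lane (int32_t v)
{
    if (v > INT16_MAX)
        return INT16_MAX;
    if (v < INT16_MIN)
        return INT16_MIN;
    return (int16_t)v;
}

/* Returns nonzero if a 16-bit packed pixel (held in a 32-bit lane)
 * comes through signed saturation unchanged. */
static int
survives_signed_pack (uint32_t packed)
{
    return (uint16_t)packssdw_lane ((int32_t)packed) == packed;
}
```

For example, 0x7fff (white in x1r5g5b5) survives, while 0xf800 (pure red in r5g6b5) comes out as 0x7fff.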
Matt Turner
3e8fe65a08 mmx: add x8f8g8b8 fetcher
Loongson:
   add_x888_x888 =  L1:  29.36  L2:  27.81  M: 14.05 ( 38.74%)  HT: 12.45  VT: 11.78  R: 11.52  RT:  7.23 (  75Kops/s)
   add_x888_x888 =  L1:  36.06  L2:  34.55  M: 14.81 ( 41.03%)  HT: 14.01  VT: 13.41  R: 13.06  RT:  9.06 (  90Kops/s)

 src_x888_8_x888 =  L1:  21.92  L2:  20.15  M: 13.35 ( 41.42%)  HT: 11.70  VT: 10.95  R: 10.53  RT:  6.18 (  65Kops/s)
 src_x888_8_x888 =  L1:  25.43  L2:  23.51  M: 14.12 ( 44.00%)  HT: 13.14  VT: 12.50  R: 11.86  RT:  7.49 (  76Kops/s)

over_x888_8_0565 =  L1:  10.64  L2:  10.17  M:  7.74 ( 21.35%)  HT:  6.83  VT:  6.55  R:  6.34  RT:  4.03 (  46Kops/s)
over_x888_8_0565 =  L1:  11.41  L2:  10.97  M:  8.07 ( 22.36%)  HT:  7.42  VT:  7.18  R:  6.92  RT:  4.62 (  52Kops/s)

ARM/iwMMXt:
   add_x888_x888 =  L1:  22.10  L2:  18.93  M: 13.48 ( 32.29%)  HT: 11.32  VT: 10.64  R: 10.36  RT:  6.51 (  61Kops/s)
   add_x888_x888 =  L1:  24.26  L2:  20.83  M: 14.52 ( 35.64%)  HT: 12.66  VT: 12.98  R: 11.34  RT:  7.69 (  72Kops/s)

 src_x888_8_x888 =  L1:  19.33  L2:  17.66  M: 14.26 ( 38.43%)  HT: 11.53  VT: 10.83  R: 10.57  RT:  6.12 (  58Kops/s)
 src_x888_8_x888 =  L1:  21.23  L2:  19.60  M: 15.41 ( 42.55%)  HT: 12.66  VT: 13.30  R: 11.55  RT:  7.32 (  67Kops/s)

over_x888_8_0565 =  L1:   8.15  L2:   7.56  M:  6.50 ( 15.58%)  HT:  5.73  VT:  5.49  R:  5.50  RT:  3.53 (  38Kops/s)
over_x888_8_0565 =  L1:   8.35  L2:   7.85  M:  6.68 ( 16.40%)  HT:  6.12  VT:  5.97  R:  5.78  RT:  4.03 (  43Kops/s)
2012-04-27 13:42:36 -04:00
Matt Turner
c2b1630d96 mmx: add a8 fetcher
oprofile of xfce4-terminal-a1
210535    9.0407  libpixman-1.so.0.25.3    fetch_scanline_a8
144802    6.0054  libpixman-1.so.0.25.3    mmx_fetch_a8

Loongson:
       add_8_8_8 =  L1:  17.98  L2:  17.28  M: 14.28 ( 19.79%)  HT: 11.11  VT: 10.38  R:  9.97  RT:  5.14 (  55Kops/s)
       add_8_8_8 =  L1:  20.44  L2:  19.65  M: 15.62 ( 21.53%)  HT: 12.86  VT: 11.98  R: 11.32  RT:  6.13 (  64Kops/s)

 src_8888_8_0565 =  L1:  19.97  L2:  18.59  M: 13.42 ( 32.55%)  HT: 11.46  VT: 10.78  R: 10.33  RT:  5.87 (  61Kops/s)
 src_8888_8_0565 =  L1:  21.16  L2:  19.68  M: 13.94 ( 33.64%)  HT: 12.31  VT: 11.52  R: 11.02  RT:  6.54 (  68Kops/s)

 src_x888_8_x888 =  L1:  20.54  L2:  18.88  M: 13.07 ( 40.74%)  HT: 11.05  VT: 10.36  R: 10.02  RT:  5.68 (  60Kops/s)
 src_x888_8_x888 =  L1:  21.92  L2:  20.15  M: 13.35 ( 41.42%)  HT: 11.70  VT: 10.95  R: 10.53  RT:  6.18 (  65Kops/s)

over_x888_8_0565 =  L1:  10.32  L2:   9.85  M:  7.63 ( 21.13%)  HT:  6.56  VT:  6.30  R:  6.12  RT:  3.80 (  43Kops/s)
over_x888_8_0565 =  L1:  10.64  L2:  10.17  M:  7.74 ( 21.35%)  HT:  6.83  VT:  6.55  R:  6.34  RT:  4.03 (  46Kops/s)

ARM/iwMMXt:
       add_8_8_8 =  L1:  13.10  L2:  11.67  M: 10.74 ( 13.46%)  HT:  8.62  VT:  8.15  R:  7.94  RT:  4.39 (  44Kops/s)
       add_8_8_8 =  L1:  13.81  L2:  12.79  M: 11.63 ( 13.93%)  HT:  9.33  VT:  9.20  R:  9.04  RT:  5.43 (  52Kops/s)

 src_8888_8_0565 =  L1:  16.62  L2:  15.07  M: 12.52 ( 27.46%)  HT: 10.07  VT: 10.17  R:  9.95  RT:  5.64 (  54Kops/s)
 src_8888_8_0565 =  L1:  16.84  L2:  16.11  M: 13.22 ( 27.71%)  HT: 11.74  VT: 10.90  R: 10.80  RT:  6.66 (  62Kops/s)

 src_x888_8_x888 =  L1:  17.49  L2:  16.22  M: 13.73 ( 38.73%)  HT: 10.10  VT: 10.33  R:  9.55  RT:  5.21 (  52Kops/s)
 src_x888_8_x888 =  L1:  19.33  L2:  17.66  M: 14.26 ( 38.43%)  HT: 11.53  VT: 10.83  R: 10.57  RT:  6.12 (  58Kops/s)

over_x888_8_0565 =  L1:   7.57  L2:   7.29  M:  6.37 ( 15.97%)  HT:  5.53  VT:  5.33  R:  5.21  RT:  3.22 (  35Kops/s)
over_x888_8_0565 =  L1:   8.15  L2:   7.56  M:  6.50 ( 15.58%)  HT:  5.73  VT:  5.49  R:  5.50  RT:  3.53 (  38Kops/s)
2012-04-27 13:42:26 -04:00
Matt Turner
20bad64d9a mmx: add r5g6b5 fetcher
Loongson:
add_0565_0565 =  L1:  12.73  L2:  12.26  M: 10.05 ( 13.87%)  HT:  8.77  VT:  8.50  R:  8.25  RT:  5.28 (  58Kops/s)
add_0565_0565 =  L1:  14.04  L2:  13.63  M: 10.96 ( 15.19%)  HT:  9.73  VT:  9.43  R:  9.11  RT:  5.93 (  64Kops/s)

ARM/iwMMXt:
add_0565_0565 =  L1:  10.36  L2:  10.03  M:  9.04 ( 10.88%)  HT:  3.11  VT:  7.16  R:  7.72  RT:  5.12 (  51Kops/s)
add_0565_0565 =  L1:  10.84  L2:  10.20  M:  9.15 ( 11.46%)  HT:  7.60  VT:  7.82  R:  7.70  RT:  5.41 (  53Kops/s)
2012-04-27 13:42:16 -04:00
Matt Turner
c136e535ad mmx: Use Loongson pextrh instruction in expand565
Same story as pinsrh in the previous commit.

 text	data	bss	dec	hex filename
25336	1952	  0   27288    6a98 .libs/libpixman_loongson_mmi_la-pixman-mmx.o
25072	1952	  0   27024    6990 .libs/libpixman_loongson_mmi_la-pixman-mmx.o

-dsll: 95
+dsll: 70
-dsrl: 135
+dsrl: 105
-ldc1: 462
+ldc1: 445
-lw: 721
+lw: 700
+pextrh: 30
2012-04-27 13:42:07 -04:00
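expand565 (commit c136e535ad) goes the other way, widening r5g6b5 to 8 bits per channel; the r5g6b5 fetcher in commit 20bad64d9a needs the same conversion. The scalar sketch below assumes the common bit-replication approach, in which each field's top bits are copied into the newly opened low bits so that full intensity maps to 0xff. `expand_565_scalar` is a hypothetical name, and this is not the pinsrh/pextrh code.

```c
#include <assert.h>
#include <stdint.h>

/* Widen r5g6b5 to a8r8g8b8 with opaque alpha, by bit replication:
 * a 5-bit field f becomes (f << 3) | (f >> 2), a 6-bit field (f << 2) | (f >> 4). */
static uint32_t
expand_565_scalar (uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1f, g = (p >> 5) & 0x3f, b = p & 0x1f;
    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    return 0xff000000u | (r << 16) | (g << 8) | b;
}
```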
Matt Turner
facceb4a1f mmx: Use Loongson pinsrh instruction in pack_565
The pinsrh instruction is analogous to MMX EXT's pinsrw, except that, like
other Loongson vector instructions, it cannot access the general purpose
registers. For other Loongson vector instructions this is a headache,
but here it is actually a good thing. Since the instruction is
different from MMX, I've named the intrinsic loongson_insert_pi16.

 text	data	bss	dec	 hex filename
25976	1952	  0   27928	6d18 .libs/libpixman_loongson_mmi_la-pixman-mmx.o
25336	1952	  0   27288	6a98 .libs/libpixman_loongson_mmi_la-pixman-mmx.o

-and: 181
+and: 147
-dsll: 143
+dsll: 95
-dsrl: 87
+dsrl: 135
-ldc1: 523
+ldc1: 462
-lw: 767
+lw: 721
+pinsrh: 35
2012-04-27 13:41:47 -04:00
Matt Turner
6d29b7d755 mmx: don't pack and unpack src unnecessarily
The combine function was store8888'ing the result, and all consumers
were immediately load8888'ing it, causing lots of unnecessary pack and
unpack instructions.

It's a very straightforward conversion, except for mmx_combine_over_u
and mmx_combine_saturate_u. mmx_combine_over_u was testing the integer
result to skip pixels, so we use the is_* functions to test the __m64
data directly without loading it into an integer register.

For mmx_combine_saturate_u there's not a lot we can do, since it uses
DIV_UN8.
2012-04-27 13:35:31 -04:00
Matt Turner
ee75003425 mmx: introduce is_equal, is_opaque, and is_zero functions
To be used by the next commit.
2012-04-27 13:35:25 -04:00
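Commits 6d29b7d755 and ee75003425 let mmx_combine_over_u test pixels for being fully opaque or fully transparent so OVER can skip work. The sketch below shows that idea on packed premultiplied a8r8g8b8 values; `is_opaque` and `is_zero` mirror the commit's names, but these scalar bodies (and `over_pixel`) are hypothetical stand-ins for the __m64 versions, and the rounded divide-by-255 is an assumption.

```c
#include <assert.h>
#include <stdint.h>

static int is_opaque (uint32_t p) { return (p >> 24) == 0xff; }
static int is_zero (uint32_t p) { return p == 0; }

/* OVER for one premultiplied pixel, with the two shortcuts first.
 * General case per channel: dst = src + dst * (255 - alpha(src)) / 255. */
static uint32_t
over_pixel (uint32_t src, uint32_t dst)
{
    if (is_opaque (src))
        return src;   /* source completely covers the destination */
    if (is_zero (src))
        return dst;   /* fully transparent source: nothing to do */

    uint32_t ia = 255 - (src >> 24), out = 0;
    for (int shift = 0; shift < 32; shift += 8)
    {
        uint32_t t = ((dst >> shift) & 0xff) * ia + 0x80;
        uint32_t d = (t + (t >> 8)) >> 8;   /* rounded divide by 255 */
        uint32_t sum = ((src >> shift) & 0xff) + d;
        out |= (sum > 255 ? 255 : sum) << shift;
    }
    return out;
}
```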
Matt Turner
10c77b339f mmx: simplify srcsrcsrcsrc calculation in over_n_8_0565 2012-04-27 13:35:19 -04:00
Matt Turner
e06947d101 mmx: remove unnecessary uint64_t<->__m64 conversions
Loongson:
add_8888_8888 =  L1:  68.73  L2:  55.09  M: 25.39 ( 68.18%)  HT: 25.28 VT: 22.42  R: 20.74  RT: 13.26 ( 131Kops/s)
add_8888_8888 =  L1: 159.19  L2: 114.10  M: 30.74 ( 77.91%)  HT: 27.63 VT: 24.99  R: 24.61  RT: 14.49 ( 141Kops/s)
2012-04-27 13:35:14 -04:00
Matt Turner
c78e986085 mmx: compile on MIPS for Loongson MMI optimizations
                             image              image16
           evolution   32.985 ->  29.667    27.314 ->  23.870
firefox-planet-gnome  197.982 -> 180.437   220.986 -> 205.057
gnome-system-monitor   48.482 ->  49.752    52.820 ->  49.528
  gnome-terminal-vim   60.799 ->  50.528    51.655 ->  44.131
      grads-heat-map    3.167 ->   3.181     3.328 ->   3.321
                gvim   38.646 ->  32.552    38.126 ->  34.453
       midori-zoomed   44.371 ->  43.338    28.860 ->  28.865
           ocitysmap   23.065 ->  18.057    23.046 ->  18.055
             poppler   43.676 ->  36.077    43.065 ->  36.090
  swfdec-giant-steps   20.166 ->  20.365    22.354 ->  16.578
      swfdec-youtube   31.502 ->  28.118    44.052 ->  41.771
   xfce4-terminal-a1   69.517 ->  51.288    62.225 ->  53.309
2012-04-27 13:35:05 -04:00
Matt Turner
4e0c7902b2 mmx: make ldq_u take __m64* directly
Before, if __m64 is allocated in vector or floating-point registers,

	__m64 vs = ldq_u((uint64_t *)src);

would cause src to be loaded into an integer register and then
transferred to an __m64 register. By switching ldq_u's argument type to
__m64 *, we give the compiler enough information to recognize that it can
load into the vector register directly.

This patch is necessary for the Loongson optimizations when __m64 is
typedef'd as double.
2012-04-27 13:34:59 -04:00
Matt Turner
2e54b76a2d mmx: add load function and use it in add_8888_8888 2012-04-27 13:34:53 -04:00
Matt Turner
084e3f2f4b mmx: add store function and use it in add_8888_8888 2012-04-27 13:34:45 -04:00
Søren Sandmann Pedersen
e24c1c849d bits_image_fetch_pixel_convolution(): Make sure channels are signed
In the computation:

    srtot += RED_8 (pixel) * f

RED_8 (pixel) is an unsigned quantity, which means the signed filter
coefficient f gets converted to an unsigned integer before the
multiplication. We get away with this because when the 32 bit unsigned
result is converted to int32_t, the correct sign is produced. But if
srtot had been an int64_t, the result would have been a very large
positive number.

Fix this by explicitly casting the channels to int.
2012-04-20 10:17:13 -04:00
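The pitfall fixed in commit e24c1c849d is easy to reproduce in isolation. In this sketch `RED_8` is a hypothetical stand-in that, like the macro the commit describes, yields an unsigned value for a uint32_t pixel (assuming uint32_t is unsigned int, as on common platforms). The 64-bit accumulator is the "if srtot had been an int64_t" case from the message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in: for a uint32_t pixel the result is unsigned. */
#define RED_8(x) (((x) >> 16) & 0xff)

/* Buggy pattern: RED_8 (pixel) * f is evaluated in unsigned arithmetic,
 * so a negative f wraps modulo 2^32 before reaching the 64-bit sum. */
static int64_t
accumulate_unsigned (uint32_t pixel, int32_t f)
{
    int64_t srtot = 0;
    srtot += RED_8 (pixel) * f;
    return srtot;
}

/* The fix described in the commit: cast the channel to int first. */
static int64_t
accumulate_signed (uint32_t pixel, int32_t f)
{
    int64_t srtot = 0;
    srtot += (int)RED_8 (pixel) * f;
    return srtot;
}
```

With a red channel of 200 and f = -3, the unsigned version yields 4294966696 (2^32 - 600) and the signed version yields -600.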
Søren Sandmann Pedersen
4d2fee1406 test/utils.c: Clip values to the [0, 255] interval
Unpremultiplying a superluminescent pixel can result in values greater
than 255.
2012-04-20 10:17:13 -04:00
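A "superluminescent" pixel has a color channel larger than its alpha, so un-premultiplying it produces a value above 255, which is why commit 4d2fee1406 clips to [0, 255]. A minimal sketch of one channel; `unpremultiply_channel` is a hypothetical name, and the round-to-nearest choice is an assumption, not necessarily what test/utils.c does.

```c
#include <assert.h>
#include <stdint.h>

/* Un-premultiply one channel: c * 255 / a, rounded to nearest, clipped to
 * [0, 255]. Fully transparent pixels map to 0. */
static uint8_t
unpremultiply_channel (uint8_t c, uint8_t a)
{
    if (a == 0)
        return 0;
    uint32_t v = ((uint32_t)c * 255 + a / 2) / a;
    return (uint8_t)(v > 255 ? 255 : v);
}
```

For example, c = 200 with a = 100 would give 510 without the clip; with it the result is 255.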
Matt Turner
e291764584 configure.ac: fix iwMMXt/gcc version error message 2012-04-18 18:14:13 -04:00
Matt Turner
b87cd1f605 mmx: fix _mm_shuffle_pi16 function when compiling without optimization
The last argument must be an immediate value, and when compiling without
optimization the compiler might not recognize this. So use a macro if
not optimizing.
2012-04-15 14:03:08 -04:00
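The last argument of _mm_shuffle_pi16 (pshufw) packs four 2-bit selectors, one per 16-bit destination lane, and the instruction encodes it in the opcode, so it has to be a compile-time constant. The portable emulation below only makes that encoding concrete; `shuffle_pi16_scalar` and `SHUFFLE_IMM` are hypothetical names, with `SHUFFLE_IMM` following the same bit layout as _MM_SHUFFLE.

```c
#include <assert.h>
#include <stdint.h>

/* Same bit layout as _MM_SHUFFLE: lane 3's selector in the top two bits. */
#define SHUFFLE_IMM(d3, d2, d1, d0) \
    (((d3) << 6) | ((d2) << 4) | ((d1) << 2) | (d0))

/* Destination lane i receives source lane ((imm >> 2*i) & 3). */
static uint64_t
shuffle_pi16_scalar (uint64_t src, unsigned imm)
{
    uint64_t out = 0;
    for (int i = 0; i < 4; i++)
    {
        unsigned sel = (imm >> (2 * i)) & 3;
        uint64_t lane = (src >> (16 * sel)) & 0xffff;
        out |= lane << (16 * i);
    }
    return out;
}
```

SHUFFLE_IMM (3, 2, 1, 0) is the identity, and SHUFFLE_IMM (0, 1, 2, 3) reverses the four lanes.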
Matt Turner
e927d23971 configure.ac: require >= gcc-4.5 for ARM iwMMXt
We're using a patched gcc-4.5, and having to modify configure.ac and
autoreconf between changes is annoying. And besides, 4.5, 4.6, and 4.7's
iwMMXt intrinsic support is equally broken, and we test a known broken
intrinsic in the configure test program, so the version check is rather
meaningless.
2012-04-15 14:00:17 -04:00
Matt Turner
0531170436 mmx: Use force_inline instead of __inline__ (bug 46906)
Fixes the build on MSVC.
2012-04-05 17:36:05 -04:00
Matt Turner
b950bb12dc mmx: enable over_n_0565 for b5g6r5
Signed-off-by: Matt Turner <mattst88@gmail.com>
2012-04-05 17:34:26 -04:00
Søren Sandmann Pedersen
87ecec8d72 gtk-utils.c: In pixbuf_from_argb32() use a8r8g8b8_to_rgba_np()
Instead of inlining a copy of that functionality.
2012-04-02 15:25:00 -04:00
Søren Sandmann Pedersen
d1ec1467f6 test/utils.c: Rename and export the pngify_pixels() function.
This function converts from a8r8g8b8 to non-premultiplied RGBA (the
PNG or GdkPixbuf format that has the channels in this order: R, G, B,
A in memory regardless of the computer's endianness). The function's
new name is a8r8g8b8_to_rgba_np().
2012-04-02 15:24:56 -04:00
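Per pixel, the conversion described in commit d1ec1467f6 reads a native-endian premultiplied a8r8g8b8 word and writes four non-premultiplied bytes in the fixed memory order R, G, B, A. The sketch below is not the function from test/utils.c; `argb32_to_rgba_np_pixel` is a hypothetical name, and the rounding and clipping are assumptions. Writing individual bytes is what makes the output order independent of host endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Convert one premultiplied a8r8g8b8 pixel into non-premultiplied bytes
 * out[0..3] = R, G, B, A, regardless of host byte order. */
static void
argb32_to_rgba_np_pixel (uint32_t p, uint8_t out[4])
{
    uint32_t a = p >> 24;
    for (int i = 0; i < 3; i++)
    {
        uint32_t c = (p >> (16 - 8 * i)) & 0xff;   /* i = 0: R, 1: G, 2: B */
        uint32_t v = a ? (c * 255 + a / 2) / a : 0;
        out[i] = (uint8_t)(v > 255 ? 255 : v);     /* clip superluminescent values */
    }
    out[3] = (uint8_t)a;
}
```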
Søren Sandmann Pedersen
b16ddf1782 gtk-utils.c: Don't include pixman-private.h
Use pixman_image_get_format() instead of image->bits.format.
2012-04-02 14:59:02 -04:00
Søren Sandmann Pedersen
b9ca23a9c7 Rename fast_composite_add_1000_1000 to _add_1_1()
The 1000_1000 name is a relic from before the refactoring.
2012-03-27 22:04:37 -04:00
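For the a1 format, saturating ADD reduces to bitwise OR (0+0 = 0, 0+1 = 1, 1+1 saturates to 1), so a 1 bpp ADD can in principle handle 32 pixels per word. The sketch illustrates only that arithmetic and is not fast_composite_add_1_1; `add_1bit_saturate` and `add_1_1_words` are hypothetical, and the word loop assumes word-aligned scanlines, ignoring partial words at the edges.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Saturating add of two 1-bit values; equal to s | d. */
static unsigned
add_1bit_saturate (unsigned s, unsigned d)
{
    unsigned sum = s + d;
    return sum > 1 ? 1 : sum;
}

/* ADD over n words of a1 pixels (32 pixels per word), assuming both
 * scanlines start on a word boundary. */
static void
add_1_1_words (const uint32_t *src, uint32_t *dst, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] |= src[i];
}
```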
Søren Sandmann Pedersen
746291a19e Add the original parrot image.
This is the Parrot image that was downscaled and cropped before being
used in the composite-test.c demo.
2012-03-27 22:04:36 -04:00
Søren Sandmann Pedersen
451b25ae90 composite-test.c: Add a parrot image
Instead of the yellow square, use a parrot as the source image. This
demonstrates the various blend modes much better.

The parrot is a cropped version of a finger painting by Rubens LP:

    http://www.flickr.com/photos/dorubens/4030604504/in/set-72157622586088192/

where the background has been removed. Used here under Creative
Commons Attribution. The artist's web site:

     http://www.rubenslp.com.br/
2012-03-27 22:04:32 -04:00
Søren Sandmann Pedersen
3aa45d62e4 composite-test.c: Use similar gradient to the one in the PDF spec. 2012-03-24 16:41:47 -04:00
Søren Sandmann Pedersen
e1b8969e78 demos: Add checkerboard demo
This is a simple demo that displays a checkerboard with a projective
transformation.
2012-03-24 16:29:36 -04:00
Søren Sandmann Pedersen
41863fbabb demos: Add quad2quad program
This program can compute the projective transformation that transforms
one quadrilateral into another. The code is basically maxima[1] output
translated into C.

[1] http://maxima.sourceforge.net/
2012-03-24 16:29:27 -04:00
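The quad2quad demo (commit 41863fbabb) derives its matrix from maxima output. As an alternative sketch of the same mapping, the code below uses the standard closed-form transform from the unit square onto a quadrilateral and composes it with an inverse (quad a to square to quad b). It is not taken from the demo; all names are hypothetical, and it assumes non-degenerate convex quadrilaterals with corners listed in matching order.

```c
#include <assert.h>

/* Row-major 3x3 homogeneous transform: (x, y, 1) -> (x', y', w). */
typedef struct { double m[3][3]; } mat3;

/* Map (0,0), (1,0), (1,1), (0,1) onto q[0], q[1], q[2], q[3]. */
static mat3
square_to_quad (double q[4][2])
{
    double x0 = q[0][0], y0 = q[0][1], x1 = q[1][0], y1 = q[1][1];
    double x2 = q[2][0], y2 = q[2][1], x3 = q[3][0], y3 = q[3][1];
    double sx = x0 - x1 + x2 - x3, sy = y0 - y1 + y2 - y3;
    double g = 0.0, h = 0.0;   /* stay 0 for a parallelogram (affine case) */

    if (sx != 0.0 || sy != 0.0)
    {
        double dx1 = x1 - x2, dx2 = x3 - x2, dy1 = y1 - y2, dy2 = y3 - y2;
        double det = dx1 * dy2 - dx2 * dy1;
        g = (sx * dy2 - dx2 * sy) / det;
        h = (dx1 * sy - sx * dy1) / det;
    }
    mat3 r = { {
        { x1 - x0 + g * x1, x3 - x0 + h * x3, x0 },
        { y1 - y0 + g * y1, y3 - y0 + h * y3, y0 },
        { g,                h,                1.0 },
    } };
    return r;
}

/* Inverse via the adjugate divided by the determinant. */
static mat3
invert (mat3 a)
{
    double (*m)[3] = a.m;
    double det = m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
               - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
               + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
    mat3 r = { {
        { (m[1][1] * m[2][2] - m[1][2] * m[2][1]) / det,
          (m[0][2] * m[2][1] - m[0][1] * m[2][2]) / det,
          (m[0][1] * m[1][2] - m[0][2] * m[1][1]) / det },
        { (m[1][2] * m[2][0] - m[1][0] * m[2][2]) / det,
          (m[0][0] * m[2][2] - m[0][2] * m[2][0]) / det,
          (m[0][2] * m[1][0] - m[0][0] * m[1][2]) / det },
        { (m[1][0] * m[2][1] - m[1][1] * m[2][0]) / det,
          (m[0][1] * m[2][0] - m[0][0] * m[2][1]) / det,
          (m[0][0] * m[1][1] - m[0][1] * m[1][0]) / det },
    } };
    return r;
}

static mat3
mul (mat3 a, mat3 b)
{
    mat3 r;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            r.m[i][j] = a.m[i][0] * b.m[0][j] + a.m[i][1] * b.m[1][j]
                      + a.m[i][2] * b.m[2][j];
    return r;
}

/* quad a -> unit square -> quad b */
static mat3
quad_to_quad (double a[4][2], double b[4][2])
{
    return mul (square_to_quad (b), invert (square_to_quad (a)));
}

/* Apply the transform, including the projective divide by w. */
static void
apply (mat3 t, double x, double y, double *ox, double *oy)
{
    double w = t.m[2][0] * x + t.m[2][1] * y + t.m[2][2];
    *ox = (t.m[0][0] * x + t.m[0][1] * y + t.m[0][2]) / w;
    *oy = (t.m[1][0] * x + t.m[1][1] * y + t.m[1][2]) / w;
}
```

Each corner of quad a should land on the matching corner of quad b, up to floating-point error.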
Søren Sandmann Pedersen
cf0d0d6364 Use "=a" and "=d" constraints for rdtsc inline assembly
In 32 bit mode the "=A" constraint refers to the register pair
edx:eax, but according to GCC developers this is not the case in 64
bit mode, where it refers to "rax".

Hence, using "=A" for rdtsc is incorrect in 64 bit mode.

See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21249
2012-03-24 16:26:07 -04:00
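A minimal GCC-style sketch of the pattern adopted in commit cf0d0d6364: rdtsc leaves the low 32 bits in eax and the high 32 bits in edx, and naming those registers with "=a" and "=d" is correct in both 32- and 64-bit mode. `read_tsc` is a hypothetical name, and the non-x86 fallback returning 0 exists only so the snippet compiles elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Read the x86 time-stamp counter. "=A" would mean edx:eax only in
 * 32-bit mode; on x86-64 it refers to rax alone, so split outputs are used. */
static inline uint64_t
read_tsc (void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    uint32_t lo, hi;
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;   /* placeholder on non-x86 targets */
#endif
}
```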
Jeremy Huddleston
8a8aabf05c configure.ac: Fix a copy-paste-o in TLS detection
Regression from: a069da6c66

Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
Tested-by: Matt Turner <mattst88@gmail.com>
2012-03-16 12:41:14 -07:00
Matt Turner
ee6bac11c2 Use AC_LANG_SOURCE for DSPr2 configure program
Signed-off-by: Matt Turner <mattst88@gmail.com>
2012-03-15 16:49:29 -04:00
Chun-wei Fan
21eeecffa9 Just include xmmintrin.h on MSVC as well
The xmmintrin.h as shipped with recent Visual C++ (2003+) provides
_mm_shuffle_pi16 and _mm_mulhi_pu16, so including that header
will do for using these functions, and MSVC does not like the GCC-specific
implementations of _mm_shuffle_pi16 and _mm_mulhi_pu16 that are
currently in the code.

_MM_SHUFFLE is declared in the same way in MSVC's xmmintrin.h, so don't
re-define it here to avoid a compilation warning.
2012-03-15 15:18:11 -04:00
Jeremy Huddleston
94aea2e868 Fix a false-negative in MMX check
Silence warnings that could make -Werror give a false negative
Use signed char to avoid cases where int8_t isn't declared

Reported-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>
2012-03-14 19:10:22 -07:00
Nemanja Lukic
d2ee5631ae MIPS: DSPr2: Added over_n_8888_8888_ca and over_n_8888_0565_ca fast paths.
Performance numbers before/after on MIPS-74kc @ 1GHz

Reference (before):

lowlevel-blt-bench:
     over_n_8888_8888_ca =  L1:   8.32  L2:   7.65  M:  6.38 ( 51.08%)  HT:  5.78  VT:  5.74  R:  5.84  RT:  4.39 (  37Kops/s)
     over_n_8888_0565_ca =  L1:   7.40  L2:   6.95  M:  6.16 ( 41.06%)  HT:  5.72  VT:  5.52  R:  5.63  RT:  4.28 (  36Kops/s)
cairo-perf-trace:
[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.25.3
[  0]    image            xfce4-terminal-a1  138.223  139.070   0.33%    6/6
[ # ]  image16: pixman 0.25.3
[  0]  image16            xfce4-terminal-a1  132.763  132.939   0.06%    5/6

Optimized:

lowlevel-blt-bench:
     over_n_8888_8888_ca =  L1:  19.35  L2:  23.84  M: 13.68 (109.39%)  HT: 11.39  VT: 11.19  R: 11.27  RT:  6.90 (  47Kops/s)
     over_n_8888_0565_ca =  L1:  18.68  L2:  17.00  M: 12.56 ( 83.70%)  HT: 10.72  VT: 10.45  R: 10.43  RT:  5.79 (  43Kops/s)
cairo-perf-trace:
[ # ]  backend                         test   min(s) median(s) stddev. count
[ # ]    image: pixman 0.25.3
[  0]    image            xfce4-terminal-a1  130.400  131.720   0.46%    6/6
[ # ]  image16: pixman 0.25.3
[  0]  image16            xfce4-terminal-a1  125.830  126.604   0.34%    6/6
2012-03-13 18:04:31 -04:00
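In the `_ca` (component-alpha) fast paths added by commit d2ee5631ae, the mask supplies a separate alpha for each channel. The sketch below shows one pixel of over_n_8888_8888_ca in scalar C, assuming the usual component-alpha OVER definition: the source is scaled channel-wise by the mask, and each destination channel is attenuated by (255 - src_alpha * mask_channel / 255). The helper names and the rounded divide-by-255 are hypothetical; this is not the DSPr2 code.

```c
#include <assert.h>
#include <stdint.h>

/* a * b / 255, rounded. */
static uint32_t
mul_un8 (uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 0x80;
    return (t + (t >> 8)) >> 8;
}

/* One pixel of component-alpha OVER with a solid premultiplied source.
 * For each channel c (alpha included):
 *   out_c = src_c * m_c / 255 + dst_c * (255 - src_a * m_c / 255) / 255 */
static uint32_t
over_n_8888_8888_ca_pixel (uint32_t src, uint32_t mask, uint32_t dst)
{
    uint32_t sa = src >> 24, out = 0;
    for (int shift = 0; shift < 32; shift += 8)
    {
        uint32_t m = (mask >> shift) & 0xff;
        uint32_t s = mul_un8 ((src >> shift) & 0xff, m);
        uint32_t ia = 255 - mul_un8 (sa, m);
        uint32_t sum = s + mul_un8 ((dst >> shift) & 0xff, ia);
        out |= (sum > 255 ? 255 : sum) << shift;
    }
    return out;
}
```

With an opaque white source and a mask that covers only the red channel, an opaque blue destination becomes opaque magenta: red is replaced, while blue and alpha are kept.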