Commit Graph

2788 Commits

Author SHA1 Message Date
Søren Sandmann Pedersen
f10b5449a8 general: Ensure that iter buffers are aligned to 16 bytes
At the moment iter buffers are only guaranteed to be aligned to a 4
byte boundary. SIMD implementations benefit from the buffers being
aligned to 16 bytes, so ensure this is the case.

V2:
- Use uintptr_t instead of unsigned long
- allocate 3 * SCANLINE_BUFFER_LENGTH bytes on the stack rather than just
  SCANLINE_BUFFER_LENGTH
- use sizeof (stack_scanline_buffer) instead of SCANLINE_BUFFER_LENGTH
  to determine overflow
2013-09-16 16:50:35 -04:00
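A minimal sketch of the alignment idea in f10b5449a8, not the actual pixman code. The helper names align_16() and three_lines_fit() and the value of SCANLINE_BUFFER_LENGTH are assumptions made for illustration. It rounds a pointer up to a 16-byte boundary through uintptr_t and checks for overflow against the real size of the stack buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value, chosen only for this sketch. */
#define SCANLINE_BUFFER_LENGTH 8192

/* Round a pointer up to the next 16-byte boundary. Going through
 * uintptr_t (rather than unsigned long) stays correct on platforms
 * where long is narrower than a pointer. */
static uint8_t *
align_16 (uint8_t *p)
{
    return (uint8_t *) (((uintptr_t) p + 15) & ~(uintptr_t) 15);
}

/* Return 1 if three scanlines of line_bytes each, with every scanline
 * padded to a multiple of 16 bytes, fit into buf once its start has
 * been aligned; return 0 otherwise. */
static int
three_lines_fit (const uint8_t *buf, size_t buf_size, size_t line_bytes)
{
    size_t skipped = (size_t) (align_16 ((uint8_t *) buf) - buf);
    size_t aligned_line = (line_bytes + 15) & ~(size_t) 15;

    if (skipped > buf_size)
        return 0;

    return aligned_line <= (buf_size - skipped) / 3;
}
```

A caller would declare something like uint8_t stack_scanline_buffer[3 * SCANLINE_BUFFER_LENGTH], pass sizeof (stack_scanline_buffer) as buf_size, and fall back to heap allocation when three_lines_fit() returns 0.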
Siarhei Siamashka
700db9d872 sse2: faster bilinear scaling (pack 4 pixels to write with MOVDQA)
The loops are already unrolled, so it was just a matter of packing
4 pixels into a single XMM register and doing aligned 128-bit
writes to memory via MOVDQA instructions for the SRC compositing
operator fast path. For the other fast paths, this XMM register
is also directly routed to further processing instead of doing
extra reshuffling. This replaces "8 PACKSSDW/PACKUSWB + 4 MOVD"
instructions with "3 PACKSSDW/PACKUSWB + 1 MOVDQA" per 4 pixels,
which results in a clear performance improvement.

There are also some other (less important) tweaks:

1. Convert 'pixman_fixed_t' to 'intptr_t' before using it as an
   index for addressing memory. The problem is that 'pixman_fixed_t'
   is a 32-bit data type and it has to be extended to 64-bit
   offsets, which needs extra instructions on 64-bit systems.

2. Allow the horizontal interpolation weights to be recalculated only
   once per 4 pixels by treating the XMM register as four pairs
   of 16-bit values. Each of these 16-bit/16-bit pairs can be
   replicated to fill the whole 128-bit register by using PSHUFD
   instructions. So we get "3 PADDW/PSRLW + 4 PSHUFD" instructions
   per 4 pixels instead of "12 PADDW/PSRLW" per 4 pixels
   (or "3 PADDW/PSRLW" per each pixel).

   Now a good question is whether replacing "9 PADDW/PSRLW" with
   "4 PSHUFD" is a favourable exchange. As it turns out, PSHUFD
   instructions are very fast on new Intel processors (including
   Atoms), but are rather slow on the first generation of Core2
   (Merom) and on the other processors from that time or older.
   A good instruction latency/throughput table, covering all the
   relevant processors, can be found at:
        http://www.agner.org/optimize/instruction_tables.pdf

   Enabling this optimization is controlled by the PSHUFD_IS_FAST
   define in "pixman-sse2.c".

3. One use of the PSHUFD instruction (_mm_shuffle_epi32 intrinsic) in
   the older code has also been replaced by its PUNPCKLQDQ equivalent
   (_mm_unpacklo_epi64 intrinsic) in the PSHUFD_IS_FAST=0 configuration.
   The PUNPCKLQDQ instruction is usually faster on older processors,
   but has some side effects (instead of fully overwriting the
   destination register like PSHUFD does, it retains half of the
   original value, which may inhibit some compiler optimizations).

Benchmarks were run with "lowlevel-blt-bench -b src_8888_8888" using GCC 4.8.1
on an x86-64 system with default optimizations. The results are in MPix/s:

====== Intel Core2 T7300 (2GHz) ======

old:                     src_8888_8888 =  L1: 128.69  L2: 125.07  M:124.86
                        over_8888_8888 =  L1:  83.19  L2:  81.73  M: 80.63
                      over_8888_n_8888 =  L1:  79.56  L2:  78.61  M: 77.85
                      over_8888_8_8888 =  L1:  77.15  L2:  75.79  M: 74.63

new (PSHUFD_IS_FAST=0):  src_8888_8888 =  L1: 168.67  L2: 163.26  M:162.44
                        over_8888_8888 =  L1: 102.91  L2: 100.43  M: 99.01
                      over_8888_n_8888 =  L1:  97.40  L2:  95.64  M: 94.24
                      over_8888_8_8888 =  L1:  98.04  L2:  95.83  M: 94.33

new (PSHUFD_IS_FAST=1):  src_8888_8888 =  L1: 154.67  L2: 149.16  M:148.48
                        over_8888_8888 =  L1:  95.97  L2:  93.90  M: 91.85
                      over_8888_n_8888 =  L1:  93.18  L2:  91.47  M: 90.15
                      over_8888_8_8888 =  L1:  95.33  L2:  93.32  M: 91.42

====== Intel Core i7 860 (2.8GHz) ======

old:                     src_8888_8888 =  L1: 323.48  L2: 318.86  M:314.81
                        over_8888_8888 =  L1: 187.38  L2: 186.74  M:182.46

new (PSHUFD_IS_FAST=0):  src_8888_8888 =  L1: 373.06  L2: 370.94  M:368.32
                        over_8888_8888 =  L1: 217.28  L2: 215.57  M:211.32

new (PSHUFD_IS_FAST=1):  src_8888_8888 =  L1: 401.98  L2: 397.65  M:395.61
                        over_8888_8888 =  L1: 218.89  L2: 217.56  M:213.48

The most interesting benchmark is "src_8888_8888" (because this code can
be reused for a generic non-separable SSE2 bilinear fetch iterator).

The results show that PSHUFD instructions are bad for Intel Core2 T7300
(Merom core) and good for Intel Core i7 860 (Nehalem core). Both of these
processors support SSSE3 instructions though, so they are not the primary
targets for SSE2 code. But without having any other more relevant hardware
to test, PSHUFD_IS_FAST=0 seems to be a reasonable default for SSE2 code
and old processors (until the runtime CPU features detection becomes
clever enough to recognize different microarchitectures).

(Rebased on top of the patch that removes support for 8-bit bilinear
 filtering -ssp)
2013-09-16 16:48:44 -04:00
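A hedged illustration of tweak 1 from 700db9d872, reduced to a portable nearest-neighbour loop instead of the SSE2 bilinear code. The type fixed_16_16_t stands in for pixman_fixed_t, and scale_nearest_row() is a hypothetical helper. The point is that the 32-bit fixed-point position is widened to intptr_t once, outside the loop, so that on 64-bit targets the compiler does not have to extend it again for every memory access.

```c
#include <stdint.h>

/* Stand-in for pixman_fixed_t: a 32-bit 16.16 fixed-point value. */
typedef int32_t fixed_16_16_t;

/* Copy `width` pixels from src to dst, stepping through src by unit_x.
 * Assumes x and unit_x are non-negative and that every visited index
 * lies inside src. */
static void
scale_nearest_row (uint32_t       *dst,
                   const uint32_t *src,
                   int             width,
                   fixed_16_16_t   x,
                   fixed_16_16_t   unit_x)
{
    /* Widen once; all later index arithmetic is pointer-sized. */
    intptr_t vx = x;
    intptr_t ux = unit_x;
    int i;

    for (i = 0; i < width; i++)
    {
        dst[i] = src[vx >> 16];  /* integer part selects the source pixel */
        vx += ux;
    }
}
```

With unit_x = 0x8000 (0.5 in 16.16), each source pixel is visited twice, which doubles the width of the row.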
Siarhei Siamashka
e43cc9c902 test: safeguard the scaling-bench test against COW
The calloc call from pixman_image_create_bits may still
rely on http://en.wikipedia.org/wiki/Copy-on-write
Explicitly initializing the destination image results in
a more predictable behaviour.

V2:
 - allocate a 16-byte-aligned buffer with an aligned stride instead
   of delegating this to pixman_image_create_bits
 - use memset for the allocated buffer instead of pixman solid fill
 - repeat tests 3 times and select best results in order to filter
   out even more measurement noise
2013-09-07 17:20:09 -04:00
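The V2 points of e43cc9c902 could look roughly like the sketch below. It is an assumption-laden illustration, not the code in scaling-bench: test_buffer_t, test_buffer_init() and best_time() are hypothetical names, and the alignment is done by hand on top of malloc() to stay portable. Writing every byte with memset() forces the pages to be backed by real memory before any timing starts.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    void     *raw;     /* pointer to hand back to free() */
    uint32_t *bits;    /* 16-byte-aligned start of the pixel data */
    int       stride;  /* row stride in bytes, a multiple of 16 */
} test_buffer_t;

/* Allocate a 32 bpp buffer with aligned base and stride, then touch
 * all of it. Returns 1 on success and 0 on failure. */
static int
test_buffer_init (test_buffer_t *buf, int width, int height)
{
    size_t stride, size;

    if (width <= 0 || height <= 0)
        return 0;

    stride = ((size_t) width * 4 + 15) & ~(size_t) 15;
    size = stride * (size_t) height;

    buf->raw = malloc (size + 15);  /* 15 spare bytes for alignment */
    if (!buf->raw)
        return 0;

    buf->bits = (uint32_t *) (((uintptr_t) buf->raw + 15) & ~(uintptr_t) 15);
    buf->stride = (int) stride;

    memset (buf->bits, 0xff, size);
    return 1;
}

/* Pick the smallest of n measured times (n >= 1) to filter out noise. */
static double
best_time (const double *times, int n)
{
    double best = times[0];
    int i;

    for (i = 1; i < n; i++)
    {
        if (times[i] < best)
            best = times[i];
    }
    return best;
}
```

The aligned pointer and stride can then be handed to an image constructor that accepts caller-owned bits; the caller releases the memory with free (buf.raw).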
Søren Sandmann Pedersen
a4c79d695d Drop support for 8-bit precision in bilinear filtering
The default has been 7-bit for a while now, and the quality
improvement with 8-bit precision is not enough to justify keeping the
code around as a compile-time option.
2013-09-07 17:19:50 -04:00
Søren Sandmann Pedersen
80a232db68 Make the first argument to scanline fetchers have type bits_image_t *
Scanline fetchers haven't been used for images other than bits for a
long time, so by making the type reflect this fact, a bit of casting
can be saved in various places.
2013-09-07 17:12:18 -04:00
Matt Turner
8ad63f90cd iwmmxt: Disallow if gcc version is < 4.8.
Later versions of gcc-4.7.x are capable of generating iwMMXt
instructions properly, but gcc-4.8 contains better support and other
fixes, including iwMMXt in conjunction with hardfp. The existing 4.5
requirement was based on attempts to have OLPC use a patched gcc to
build pixman. Let's just require gcc-4.8.
2013-09-04 23:48:52 -07:00
Søren Sandmann Pedersen
02906e57bd fast_bilinear_cover_init: Don't install a finalizer on the error path
No memory is allocated in the error case, so a finalizer is not
necessary, and will cause problems if the data pointer is not
initialized to NULL.
2013-08-31 14:19:58 -04:00
Julien Cristau
d4898ac139 Upload to unstable 2013-08-13 12:08:22 +02:00
Julien Cristau
105c249996 Increase alpha-loop test timeout some more. 2013-08-13 12:03:40 +02:00
Julien Cristau
9b844940ba Includes big-endian matrix-test fix 2013-08-13 12:01:40 +02:00
Julien Cristau
2fc06503f6 Bump changelogs 2013-08-13 12:00:48 +02:00
Julien Cristau
a781ff50e7 pixman 0.30.2 release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJSAlYRAAoJEIWlZJw4kjNuBQYIAKwOAc0rKtX5c/z5iuf90akR
 EfEKK5ICQ8iE55Jvmn3e9ny12yrRbP/S6++W2kKkaF6gEmab2/3YswN42/ZPn3gJ
 1RER7b+x/CxsJbJVNPbRBLdkfF2HH8RicJru7cQ98TjR2mSC9uKAyiC/podWQZvO
 96rcnXZZBZMMjZLCUYfhiNz71Frhjh3fZrodx9GUJ6Lbka74bvWJ3fB4PXoTtbbr
 H8OPkxJQw5OjGtqgwB8lbLQZmZLhuZYUGOF0wbSA2+2HvylxlPlpUgC1c3r8yn77
 MQsD/ex+CfswwxxMTrINkHSVllaoJafM8cjk8HFG3EPkW/ohdpDthhtZpmSsM5E=
 =09FF
 -----END PGP SIGNATURE-----

Merge tag 'pixman-0.30.2' into debian-unstable

pixman 0.30.2 release
2013-08-13 12:00:07 +02:00
Søren Sandmann Pedersen
3518a0dafa Add an iterator that can fetch bilinearly scaled images
This new iterator works in a separable way; that is, for a destination
scanline, it scales the two involved source scanlines and then caches
them so that they can be reused for the next destination scanlines.

There are two versions of the code, one that uses 64 bit arithmetic,
and one that uses 32 bit arithmetic only. The latter version is
used on 32 bit systems, where it is expected to be faster.

This scheme saves a substantial amount of arithmetic for larger
scalings; the per-pixel times for various configurations as reported
by scaling-bench are graphed here:

	http://people.freedesktop.org/~sandmann/separable.v2/v2.png

The "sse2" graph is the current default on x86, "mmx" is with sse2
disabled, "old c" is with sse2 and mmx disabled. The "new 32" and "new
64" graphs show times for the new code. As the graphs show, the 64 bit
version of the new code beats the "old c" for all scaling ratios.

The data was taken on a Sandy Bridge Core i3-2350M CPU @ 2.0 GHz
running in 64 bit mode.

The data used to generate the graph is available in this directory:

    http://people.freedesktop.org/~sandmann/separable.v2/

There is also a Gnumeric spreadsheet v2.gnumeric containing the
per-pixel values and the graph.

V2:
- Add error message in the OOM/bad matrix case
- Save some shifts by storing the cached scanlines in AGBR order
- Special cased version that uses 32 bit arithmetic when sizeof(long) <= 4
2013-08-10 11:18:23 -04:00
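The separable scheme described in 3518a0dafa can be sketched as follows. This is a deliberately simplified model and not the pixman iterator: it handles a single 8-bit channel, uses 7-bit weights, assumes non-negative 16.16 coordinates, and all names (cached_line_t, scale_row, get_line, scale_image) are invented for the example. Each source row is scaled horizontally at most once while it stays in the two-entry cache, and every destination row is then a vertical blend of two cached rows.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct
{
    int      y;     /* source row held in line, or -1 if empty */
    uint8_t *line;  /* dst_w horizontally interpolated samples */
} cached_line_t;

/* Horizontally scale one source row to dst_w samples. */
static void
scale_row (uint8_t *out, const uint8_t *src, int src_w, int dst_w,
           int32_t unit_x)
{
    int32_t x = 0;
    int i;

    for (i = 0; i < dst_w; i++, x += unit_x)
    {
        int x0 = x >> 16;
        int x1 = x0 + 1 < src_w ? x0 + 1 : src_w - 1;
        int w = (x >> 9) & 0x7f;  /* top 7 bits of the fraction */

        if (x0 > src_w - 1)
            x0 = src_w - 1;
        out[i] = (uint8_t) ((src[x0] * (128 - w) + src[x1] * w) >> 7);
    }
}

/* Return the scaled version of source row `row`, reusing the cache. */
static const uint8_t *
get_line (cached_line_t *cache, int row, const uint8_t *src, int src_w,
          int dst_w, int32_t unit_x, int *rows_scaled)
{
    cached_line_t *c = &cache[row & 1];

    if (c->y != row)
    {
        scale_row (c->line, src + (size_t) row * src_w, src_w, dst_w, unit_x);
        c->y = row;
        (*rows_scaled)++;
    }
    return c->line;
}

/* Scale src (src_w x src_h) into dst (dst_w x dst_h). Returns 0 on
 * allocation failure. *rows_scaled counts horizontal passes. */
static int
scale_image (uint8_t *dst, int dst_w, int dst_h, const uint8_t *src,
             int src_w, int src_h, int32_t unit_x, int32_t unit_y,
             int *rows_scaled)
{
    cached_line_t cache[2];
    int32_t y = 0;
    int i, j;

    *rows_scaled = 0;
    cache[0].y = cache[1].y = -1;
    cache[0].line = malloc ((size_t) dst_w);
    cache[1].line = malloc ((size_t) dst_w);
    if (!cache[0].line || !cache[1].line)
    {
        free (cache[0].line);
        free (cache[1].line);
        return 0;
    }

    for (j = 0; j < dst_h; j++, y += unit_y)
    {
        int y0 = y >> 16, y1, w = (y >> 9) & 0x7f;
        const uint8_t *top, *bot;

        if (y0 > src_h - 1)
            y0 = src_h - 1;
        y1 = y0 + 1 < src_h ? y0 + 1 : src_h - 1;

        top = get_line (cache, y0, src, src_w, dst_w, unit_x, rows_scaled);
        bot = get_line (cache, y1, src, src_w, dst_w, unit_x, rows_scaled);

        for (i = 0; i < dst_w; i++)
            dst[j * dst_w + i] =
                (uint8_t) ((top[i] * (128 - w) + bot[i] * w) >> 7);
    }

    free (cache[0].line);
    free (cache[1].line);
    return 1;
}
```

Scaling a 2 x 2 image up to 4 x 4 with this sketch performs only two horizontal passes, whereas a non-separable fetch would do horizontal interpolation for both source rows of every one of the four destination rows.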
Søren Sandmann Pedersen
146116eff4 Add support for iter finalizers
Iterators may sometimes need to allocate auxiliary memory. In order to
be able to free this memory, optional iterator finalizers are
required.
2013-08-10 11:18:23 -04:00
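One way to picture an optional finalizer, as introduced by 146116eff4, is the sketch below. The struct layout and the names example_iter_t, example_iter_init() and example_iter_finish() are assumptions for illustration and do not reproduce pixman's pixman_iter_t. The finalizer is installed only after the allocation has succeeded, which also matches the reasoning in 02906e57bd about the error path.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct example_iter example_iter_t;

struct example_iter
{
    void  *data;                      /* auxiliary memory, may be NULL */
    void (*fini) (example_iter_t *);  /* optional finalizer, may be NULL */
};

/* Finalizer that releases the auxiliary memory. */
static void
free_data_fini (example_iter_t *iter)
{
    free (iter->data);
    iter->data = NULL;
}

/* Returns 1 on success, 0 if the auxiliary allocation failed. */
static int
example_iter_init (example_iter_t *iter, size_t aux_bytes)
{
    iter->data = NULL;
    iter->fini = NULL;

    if (aux_bytes == 0)
        return 1;  /* nothing to allocate, nothing to finalize */

    iter->data = malloc (aux_bytes);
    if (!iter->data)
        return 0;  /* error path: no memory, so no finalizer */

    iter->fini = free_data_fini;
    return 1;
}

/* Called by the owner when it is done with the iterator. */
static void
example_iter_finish (example_iter_t *iter)
{
    if (iter->fini)
        iter->fini (iter);
}
```

Because the finalizer is optional, iterators that never allocate pay only for a NULL check when they are torn down.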
Søren Sandmann Pedersen
1be9208e04 test/scaling-bench.c: New benchmark for bilinear scaling
This new benchmark scales a 320 x 240 a8r8g8b8 test image by all
ratios from 0.1, 0.2, ... up to 10.0 and reports the time it took
to do each of the scaling operations, and the time spent per
destination pixel.

The times reported for the scaling operations are given in
milliseconds, the times-per-pixel are in nanoseconds.

V2: Format output better
2013-08-10 11:18:23 -04:00
Søren Sandmann Pedersen
fedd6b192d RELEASING: Add note about changing the topic of the #cairo IRC channel 2013-08-07 10:22:25 -04:00
Søren Sandmann Pedersen
f8a0812b1c Pre-release version bump to 0.30.2 2013-08-07 10:07:35 -04:00
Siarhei Siamashka
b5167b8a54 test: fix matrix-test on big endian systems 2013-08-05 01:45:59 +03:00
Siarhei Siamashka
d87601ffc3 test: fix matrix-test on big endian systems 2013-08-05 01:42:29 +03:00
Julien Cristau
bbb3765faf Upload to unstable 2013-08-03 10:24:43 +02:00
Julien Cristau
2e13b569cb Increase timeout for the alpha-loop test.
That will hopefully let it pass on the mips buildd.
2013-08-03 10:23:41 +02:00
Andrea Canciani
a82b95a264 test: Fix build on MSVC
The MSVC compiler is very strict about variable declarations after
statements.

Move all the declarations of each block before any statement in the
same block to fix multiple instances of:

alpha-loop.c(XX) : error C2275: 'pixman_image_t' : illegal use of this
type as an expression
2013-08-01 09:08:15 -07:00
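The fix pattern from a82b95a264 is shown below on a toy function rather than on alpha-loop.c; sum_to() is a made-up example. In C89, which is what the MSVC compilers of that era accept for C sources, every declaration in a block has to appear before the first statement of that block.

```c
/* Sum the integers 1..n. All declarations sit at the top of the block,
 * so the function compiles as C89 as well as C99. */
static int
sum_to (int n)
{
    int i;
    int total = 0;

    /* Declaring `int i` here, after a statement, or inside the for
     * header, is what a C89-only compiler rejects. */
    for (i = 1; i <= n; i++)
        total += i;

    return total;
}
```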
Søren Sandmann Pedersen
4c04a86c68 Version bump to 0.30.1 2013-08-01 07:19:21 -04:00
Alexander Troosh
6300452952 Require GTK+ version >= 2.16
I hit this bug on my system:

lcc: "scale.c", line 374: warning: function "gtk_scale_add_mark" declared
          implicitly [-Wimplicit-function-declaration]
      gtk_scale_add_mark (GTK_SCALE (widget), 0.0, GTK_POS_LEFT, NULL);
      ^

  CCLD   scale
scale.o: In function `app_new':
(.text+0x23e4): undefined reference to `gtk_scale_add_mark'
scale.o: In function `app_new':
(.text+0x250c): undefined reference to `gtk_scale_add_mark'
scale.o: In function `app_new':
(.text+0x2634): undefined reference to `gtk_scale_add_mark'
make[2]: *** [scale] Error 1
make[2]: Target `all' not remade because of errors.

$ pkg-config --modversion gtk+-2.0
2.12.1

demos/scale.c calls gtk_scale_add_mark(), which is only available in
GTK+ 2.16 and later. We either need to support old GTK+ (by rewriting
scale.c) or simply require a newer version of GTK+, like this:
2013-07-30 08:18:35 -04:00
Matthieu Herrb
02869a1229 configure.ac: Don't use '+=' since it's not POSIX
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Matthieu Herrb <matthieu.herrb@laas.fr>
2013-07-30 08:18:25 -04:00
Markos Chandras
35da06c828 Use AC_LINK_IFELSE to check if the Loongson MMI code can link
The Loongson code is compiled with -march=loongson2f to enable the MMI
instructions, but binutils refuses to link object code compiled with
different -march settings, leading to link failures later in the
compile. This avoids that problem by checking if we can link code
compiled for Loongson.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Markos Chandras <markos.chandras@imgtec.com>
2013-07-30 08:18:02 -04:00
ingmar@irsoft.de
e14f5a739f Fix broken build when HAVE_CONFIG_H is undefined, e.g. on Win32.
Build fix for platforms without a generated config.h, for example Win32.
2013-07-30 08:17:49 -04:00
Julien Cristau
3f0d759608 Upload to unstable 2013-07-27 21:40:50 +02:00
Julien Cristau
3c4dac9a7c Fix matrix-test on big endian
Patch from Siarhei Siamashka.
2013-07-27 21:40:09 +02:00
Julien Cristau
3473a947da Disable arm iwmmxt fast paths. It breaks the build. 2013-07-27 14:48:50 +02:00
Julien Cristau
dc29515934 Disable silent Makefile rules. 2013-07-27 14:37:23 +02:00
Julien Cristau
2084b2d3bd Upload to unstable 2013-07-26 14:58:46 +02:00
Julien Cristau
317b3c3eea Add more test-only exported functions to symbols file 2013-07-26 14:47:35 +02:00
Julien Cristau
73ff58c119 Remove png file missing from the tarball 2013-07-26 14:36:14 +02:00
Julien Cristau
d2fbfbc23c Bump changelog and symbols for 0.30.0 2013-07-26 14:31:38 +02:00
Julien Cristau
5de927bd3e Merge branch 'upstream-merge' into debian-unstable 2013-07-26 14:26:43 +02:00
Julien Cristau
0ef6350c3d Revert "Add 00-unexport-symbol.diff"
This reverts commit 01c2431ef8.
2013-07-26 14:26:30 +02:00
Julien Cristau
07473e703e Merge remote-tracking branch 'origin/debian-experimental' into debian-unstable
Conflicts:
	debian/changelog
2013-07-26 14:26:11 +02:00
Julien Cristau
be9bb76118 Merge remote-tracking branch 'origin/upstream-experimental' into upstream-merge 2013-07-26 14:24:21 +02:00
Andrea Canciani
1e49329333 test: Fix build on MSVC
The MSVC compiler is very strict about variable declarations after
statements.

Move all the declarations of each block before any statement in the
same block to fix multiple instances of:

alpha-loop.c(XX) : error C2275: 'pixman_image_t' : illegal use of this
type as an expression
2013-06-25 16:55:24 +02:00
Alexander Troosh
279bdcda7e Require GTK+ version >= 2.16
I hit this bug on my system:

lcc: "scale.c", line 374: warning: function "gtk_scale_add_mark" declared
          implicitly [-Wimplicit-function-declaration]
      gtk_scale_add_mark (GTK_SCALE (widget), 0.0, GTK_POS_LEFT, NULL);
      ^

  CCLD   scale
scale.o: In function `app_new':
(.text+0x23e4): undefined reference to `gtk_scale_add_mark'
scale.o: In function `app_new':
(.text+0x250c): undefined reference to `gtk_scale_add_mark'
scale.o: In function `app_new':
(.text+0x2634): undefined reference to `gtk_scale_add_mark'
make[2]: *** [scale] Error 1
make[2]: Target `all' not remade because of errors.

$ pkg-config --modversion gtk+-2.0
2.12.1

demos/scale.c calls gtk_scale_add_mark(), which is only available in
GTK+ 2.16 and later. We either need to support old GTK+ (by rewriting
scale.c) or simply require a newer version of GTK+, like this:
2013-06-11 12:09:49 -04:00
Matthieu Herrb
889f118946 configure.ac: Don't use '+=' since it's not POSIX
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Matthieu Herrb <matthieu.herrb@laas.fr>
2013-06-08 10:21:54 -07:00
Søren Sandmann Pedersen
2acfac5f8e Consolidate all the iter_init_bits_stride functions
The SSE2, MMX, and fast implementations all have a copy of the
function iter_init_bits_stride that computes an image buffer and
stride.

Move that function to pixman-utils.c and share it among all the
implementations.
2013-05-22 09:43:21 -04:00
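As a rough picture of what a shared helper like the one in 2acfac5f8e computes, the sketch below derives a byte pointer and a byte stride from an image and an iterator position. The struct fields and the names example_bits_t, example_pos_iter_t and example_init_bits_stride() are assumptions for illustration, not the definitions in pixman-utils.c.

```c
#include <stdint.h>

typedef struct
{
    uint32_t *bits;       /* first pixel of the image */
    int       rowstride;  /* row stride in uint32_t units */
    int       bpp;        /* bits per pixel, a multiple of 8 here */
} example_bits_t;

typedef struct
{
    const example_bits_t *image;
    int      x, y;    /* position of the first pixel to visit */
    uint8_t *bits;    /* computed: address of pixel (x, y) */
    int      stride;  /* computed: row stride in bytes */
} example_pos_iter_t;

/* Fill in iter->bits and iter->stride from the image and position. */
static void
example_init_bits_stride (example_pos_iter_t *iter)
{
    const example_bits_t *image = iter->image;
    int stride = image->rowstride * 4;  /* uint32_t units to bytes */

    iter->bits = (uint8_t *) image->bits
                 + iter->y * stride
                 + iter->x * (image->bpp / 8);
    iter->stride = stride;
}
```

Keeping one copy of this arithmetic avoids the SSE2, MMX and fast implementations drifting apart.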
Søren Sandmann Pedersen
533f54430a Delete the old src/dest_iter_init() functions
Now that we are using the new _pixman_implementation_iter_init(), the
old _src/_dest_iter_init() functions are no longer needed, so they can
be deleted, and the corresponding fields in pixman_implementation_t
can be removed.
2013-05-22 09:43:21 -04:00
Søren Sandmann Pedersen
125a4fd36f Add _pixman_implementation_iter_init() and use instead of _src/_dest_init()
A new field, 'iter_info', is added to the implementation struct, and
all the implementations store a pointer to their iterator tables in
it. A new function, _pixman_implementation_iter_init(), is then added
that searches those tables, and the new function is called in
pixman-general.c and pixman-image.c instead of the old
_pixman_implementation_src_init() and _pixman_implementation_dest_init().
2013-05-22 09:43:21 -04:00
Søren Sandmann Pedersen
245d0090c5 general: Store the iter initializer in a one-entry pixman_iter_info_t table
In preparation for sharing all iterator initialization code from all
the implementations, move the general implementation to use a table of
pixman_iter_info_t.

The existing src_iter_init and dest_iter_init functions are
consolidated into one general_iter_init() function that checks the
iter_flags for whether it is dealing with a source or destination
iterator.

Unlike in the other implementations, the general_iter_init() function
stores its own get_scanline() and write_back() functions in the
iterator, so it relies on the initializer being called after
get_scanline and write_back have been copied from the struct to the
iterator.
2013-05-22 09:43:21 -04:00
Søren Sandmann Pedersen
9c15afb105 fast: Replace the fetcher_info_t table with a pixman_iter_info_t table
Similar to the SSE2 and MMX patches, this commit replaces a table of
fetcher_info_t with a table of pixman_iter_info_t, and similar to the
noop patch, both fast_src_iter_init() and fast_dest_iter_init() are
now doing exactly the same thing, so their code can be shared in a new
function called fast_iter_init_common().
2013-05-22 09:43:21 -04:00
Søren Sandmann Pedersen
71c2d519d0 mmx: Replace the fetcher_info_t table with a pixman_iter_info_t table
Similar to the SSE2 commit, information about the iterators is stored
in a table of pixman_iter_info_t.
2013-05-22 09:43:21 -04:00
Søren Sandmann Pedersen
78f437d61e sse2: Replace the fetcher_info_t table with a pixman_iter_info_t table
Similar to the changes to noop, put all the iterators into a table of
pixman_iter_info_t and then do a generic search of that table during
iterator initialization.
2013-05-22 09:43:20 -04:00
Søren Sandmann Pedersen
c7b0da8a96 noop: Keep information about iterators in an array of pixman_iter_info_t
Instead of having a nest of if statements, store the information about
iterators in a table of a new struct type, pixman_iter_info_t, and
then walk that table when initializing iterators.

The new struct contains a format, a set of image flags, and a set of
iter flags, plus a pixman_iter_get_scanline_t, a
pixman_iter_write_back_t, and a new function type
pixman_iter_initializer_t.

If the iterator matches an entry, it is first initialized with the
given get_scanline and write_back functions, and then the provided
iter_initializer (if present) is run. Running the iter_initializer
after setting get_scanline and write_back allows the initializer to
override those fields if it wishes.

The table contains both source and destination iterators,
distinguished based on the recently-added ITER_SRC and ITER_DEST;
similarly, wide iterators are recognized with the ITER_WIDE
flag. Having both source and destination iterators in the table means
the noop_src_iter_init() and noop_dest_iter_init() functions become
identical, so this patch factors out their code in a new function
noop_iter_init_common() that both call.

The following patches in this series will change all the
implementations to use an iterator table, and then move the table
search code to pixman-implementation.c.
2013-05-22 09:43:20 -04:00
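The table walk described in c7b0da8a96 might look roughly like this. Only the flag names ITER_SRC, ITER_DEST and ITER_WIDE come from the commit message; their values, the struct layouts, the zero-format sentinel, and every example_* or sample function name are assumptions made for the sketch. A matching entry first supplies get_scanline and write_back, and its initializer, if any, runs afterwards so it can override them.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag names from the commit message; the values are made up. */
enum { ITER_SRC = 1 << 0, ITER_DEST = 1 << 1, ITER_WIDE = 1 << 2 };

#define EXAMPLE_FORMAT_NULL 0u  /* hypothetical end-of-table marker */

typedef struct example_iter example_iter_t;
typedef struct example_iter_info example_iter_info_t;

struct example_iter
{
    uint32_t   format, image_flags, iter_flags;
    uint32_t *(*get_scanline) (example_iter_t *);
    void      (*write_back) (example_iter_t *);
};

struct example_iter_info
{
    uint32_t   format, image_flags, iter_flags;
    void      (*initializer) (example_iter_t *, const example_iter_info_t *);
    uint32_t *(*get_scanline) (example_iter_t *);
    void      (*write_back) (example_iter_t *);
};

/* Walk the table; return 1 if an entry matched, 0 otherwise. */
static int
example_iter_init (example_iter_t *iter, const example_iter_info_t *table)
{
    const example_iter_info_t *info;

    for (info = table; info->format != EXAMPLE_FORMAT_NULL; info++)
    {
        if (info->format == iter->format &&
            (iter->image_flags & info->image_flags) == info->image_flags &&
            (iter->iter_flags & info->iter_flags) == info->iter_flags)
        {
            iter->get_scanline = info->get_scanline;
            iter->write_back = info->write_back;

            /* Runs last so that it may override the two fields above. */
            if (info->initializer)
                info->initializer (iter, info);
            return 1;
        }
    }
    return 0;
}

/* Sample callbacks and table, present only to make the sketch usable. */
static uint32_t *plain_get_scanline (example_iter_t *i) { (void) i; return NULL; }
static uint32_t *wide_get_scanline (example_iter_t *i) { (void) i; return NULL; }
static void plain_write_back (example_iter_t *i) { (void) i; }

static void
wide_initializer (example_iter_t *iter, const example_iter_info_t *info)
{
    (void) info;
    if (iter->iter_flags & ITER_WIDE)
        iter->get_scanline = wide_get_scanline;
}

static const example_iter_info_t example_table[] =
{
    { 1, 0, ITER_SRC, NULL, plain_get_scanline, NULL },
    { 2, 0, ITER_DEST, wide_initializer, plain_get_scanline, plain_write_back },
    { EXAMPLE_FORMAT_NULL, 0, 0, NULL, NULL, NULL }
};
```

Because source and destination entries live in the same table and are told apart by their flags, a single search function can serve both kinds of iterator.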