Commit Graph

1948 Commits

Author SHA1 Message Date
Siarhei Siamashka
a732d3baeb ARM: added 'neon_composite_src_0888_0565_rev' fast path
This is an ARM NEON optimized conversion of the native RGB format used by
GTK/GDK into the r5g6b5 format.
2009-12-09 15:22:03 +02:00
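As a scalar illustration of what this fast path computes per pixel, the sketch below packs 8-bit R, G, B bytes into a 16-bit r5g6b5 value by truncation. The function names and the R, G, B byte order in memory are assumptions made for the example; the commit itself implements the conversion in NEON assembly, which is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack 8-bit channels into r5g6b5 by dropping
 * the low 3, 2 and 3 bits of red, green and blue respectively. */
static uint16_t pack_r5g6b5(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t) (((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Hypothetical scalar loop over one scanline. Assumes the source
 * stores three bytes per pixel in R, G, B order. */
static void convert_row_rgb24_to_r5g6b5(uint16_t *dst, const uint8_t *src,
                                        size_t width)
{
    size_t x;

    assert(dst != NULL && src != NULL);
    for (x = 0; x < width; x++)
        dst[x] = pack_r5g6b5(src[3 * x], src[3 * x + 1], src[3 * x + 2]);
}
```

Pure red, green and blue map to 0xF800, 0x07E0 and 0x001F, which makes the 5/6/5 bit layout easy to check by eye.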
Siarhei Siamashka
a1386a1ceb ARM: added 'neon_composite_src_0888_8888_rev' fast path
This is an ARM NEON optimized conversion of the native RGB format used by
GTK/GDK into the native 32bpp RGB format used by cairo/pixman.
2009-12-09 15:21:57 +02:00
Siarhei Siamashka
78a60047ac ARM: added 'neon_composite_over_n_8888' fast path 2009-12-09 11:29:13 +02:00
Siarhei Siamashka
96fd17488f ARM: added 'neon_composite_over_n_0565' fast path 2009-12-09 11:27:57 +02:00
Siarhei Siamashka
2d332c7a56 ARM: added 'neon_composite_src_0565_8888' fast path 2009-12-09 10:33:01 +02:00
Siarhei Siamashka
062da411d8 ARM: added 'neon_composite_add_8888_8888_8888' fast path 2009-12-09 10:26:47 +02:00
Siarhei Siamashka
3d0eedb5d9 ARM: added 'neon_composite_add_8888_8888' fast path 2009-12-09 10:25:03 +02:00
Siarhei Siamashka
86b54c6701 ARM: added 'neon_composite_over_8888_8_8888' fast path 2009-12-09 10:24:30 +02:00
Siarhei Siamashka
aec1524e77 ARM: added 'neon_composite_over_8888_8888_8888' fast path 2009-12-09 10:19:37 +02:00
Siarhei Siamashka
ba59d53d0b ARM: minor source formatting changes
Now it's a bit harder to exceed the 80-character line limit
when binding assembly functions.
2009-12-09 10:17:23 +02:00
Siarhei Siamashka
a47b5167c4 ARM: added '.arch armv7a' directive to NEON assembly file
This fix prevents a build failure caused by the assembler rejecting the PLD
instruction when compiling for an armv4 CPU with the relevant -mcpu/-march
options set in CFLAGS.
2009-12-08 08:52:34 +02:00
Benjamin Otte
3fba7dc6fa Make test program not throw warnings about undefined variables 2009-12-04 15:04:24 +01:00
Benjamin Otte
10ab592d57 Fix bug that prevented pixman_fill MMX and SSE paths for 16 and 8bpp 2009-12-04 15:04:24 +01:00
Siarhei Siamashka
7c7b6f5de7 ARM: NEON optimized pixman_blt
The NEON unit has fast access to the L1/L2 caches, and even a simple
copy of memory buffers using NEON provides more than a 1.5x
performance improvement on ARM Cortex-A8.
2009-11-30 22:21:08 +02:00
Siarhei Siamashka
dce6e1bd68 test: support for testing pixbuf fast path functions in blitters-test 2009-11-27 15:50:26 +02:00
Benjamin Otte
0901ef41fb Remove nonexistent function from header 2009-11-22 10:57:06 +01:00
Søren Sandmann Pedersen
c97b1e803f Post-release version bump 2009-11-20 12:02:50 +01:00
Søren Sandmann Pedersen
5a7597f818 Pre-release version bump 2009-11-20 11:55:40 +01:00
Søren Sandmann Pedersen
95a08dece3 Remove stray semicolon from blitters-test.c
Pointed out by scottmc2@gmail.com in bug 25137.
2009-11-20 11:18:58 +01:00
Siarhei Siamashka
6e2c7d54c6 C fast path function for 'over_n_1_0565'
This function is needed to improve the performance of the xfce4 terminal when
using bitmap fonts and running with a 16bpp desktop. Some other applications
may potentially benefit too.

After applying this patch, the top functions of the Xorg process in the
oprofile log change from

samples  %        image name               symbol name
13296    29.1528  libpixman-1.so.0.17.1    combine_over_u
6452     14.1466  libpixman-1.so.0.17.1    fetch_scanline_r5g6b5
5516     12.0944  libpixman-1.so.0.17.1    fetch_scanline_a1
2273      4.9838  libpixman-1.so.0.17.1    store_scanline_r5g6b5
1741      3.8173  libpixman-1.so.0.17.1    fast_composite_add_1000_1000
1718      3.7669  libc-2.9.so              memcpy

to

samples  %        image name               symbol name
5594     14.7033  libpixman-1.so.0.17.1    fast_composite_over_n_1_0565
4323     11.3626  libc-2.9.so              memcpy
3695      9.7119  libpixman-1.so.0.17.1    fast_composite_add_1000_1000

when scrolling text in the terminal (reading a man page).
2009-11-20 11:18:58 +01:00
Søren Sandmann Pedersen
282f5cf8b8 Round horizontal sampling points towards northwest.
This is a similar change to the top/bottom one, but in this case the
rounding is simpler because it always rounds down.

Based on a patch by M Joonas Pihlaja.
2009-11-17 01:58:01 -05:00
Søren Sandmann Pedersen
f44431986f Fix rounding of top and bottom coordinates.
The rule for trap rasterization is that coordinates are rounded
towards north-west.

The pixman_sample_ceil() function is used to compute the first
(top-most) sample row included in the trap, so when the input
coordinate is already exactly on a sample row, no rounding should take
place.

On the other hand, pixman_sample_floor() is used to compute the final
(bottom-most) sample row, so if the input is precisely on a sample
row, it needs to be rounded down to the previous row.

This commit fixes the rounding computation. The idea of the
computation is like this:

Floor operation that rounds exact matches down: First subtract
pixman_fixed_e to make sure input already on a sample row gets rounded
down. Then find out how many small steps are between the input and the
first fraction. Then add those small steps to the first fraction.

The ceil operation first adds (small_step - pixman_fixed_e), then runs a
plain floor (one that does not subtract pixman_fixed_e). This ensures that
exact matches are not rounded off.

Based on a patch by M Joonas Pihlaja.
2009-11-17 01:58:01 -05:00
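The rounding described in commit f44431986f can be sketched in C as follows. This is a minimal illustration and not the pixman source: the type and function names are hypothetical, the grid is fixed at 15 sample rows per pixel in 16.16 fixed point, the first fraction is assumed to be half the big step, and only non-negative inputs are handled.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fx_t;                  /* stand-in for a 16.16 fixed-point type */

#define FX_1       ((fx_t) 65536)      /* 1.0 */
#define FX_E       ((fx_t) 1)          /* smallest representable step */
#define N_ROWS     15                  /* assumed sample rows per pixel */
#define STEP_SMALL (FX_1 / N_ROWS)                        /* 4369 */
#define STEP_BIG   (FX_1 - (N_ROWS - 1) * STEP_SMALL)     /* 4370 */
#define FRAC_FIRST (STEP_BIG / 2)                         /* 2185 */
#define FRAC_LAST  (FRAC_FIRST + (N_ROWS - 1) * STEP_SMALL) /* 63351 */

/* Integer division that rounds toward negative infinity (b > 0). */
static fx_t floor_div(fx_t a, fx_t b)
{
    assert(b > 0);
    return (a >= 0) ? a / b : -((-a + b - 1) / b);
}

/* Last sample row strictly above y: exact matches round down a row. */
static fx_t sample_floor_sketch(fx_t y)
{
    fx_t f, i;

    assert(y >= 0);
    f = y % FX_1;                      /* fractional part */
    i = y - f;                         /* integer part */
    f = floor_div(f - FX_E - FRAC_FIRST, STEP_SMALL) * STEP_SMALL + FRAC_FIRST;
    if (f < FRAC_FIRST) {              /* fell off the top of this pixel */
        f = FRAC_LAST;
        i -= FX_1;
    }
    return i + f;
}

/* First sample row at or below y: exact matches are left alone. */
static fx_t sample_ceil_sketch(fx_t y)
{
    fx_t f, i;

    assert(y >= 0);
    f = y % FX_1;
    i = y - f;
    f = floor_div(f - FRAC_FIRST + (STEP_SMALL - FX_E), STEP_SMALL)
        * STEP_SMALL + FRAC_FIRST;
    if (f > FRAC_LAST) {               /* fell off the bottom of this pixel */
        f = FRAC_FIRST;
        i += FX_1;
    }
    return i + f;
}
```

With these constants, an input exactly on the first sample row of pixel 1 (FX_1 + 2185) is returned unchanged by sample_ceil_sketch, while sample_floor_sketch moves it to the last row of pixel 0 (63351).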
Søren Sandmann Pedersen
3bea18e3ea Fix slightly skewed sampling grid for antialiased traps
The sampling grid is slightly skewed in the antialiased case. Consider
the case where we have n = 8 bits of alpha.

The small step is

     small_step = fixed_1 / 15 = 65536 / 15 = 4369

The first fraction is then

     frac_first = (small_step / 2) = 4369 / 2 = 2184

and the last fraction becomes

     frac_last
          = frac_first + (15 - 1) * small_step = 2184 + 14 * 4369 = 63350

which means the size of the last bit of the pixel is

     65536 - 63350 = 2186

which is 2 bigger than the first fraction. This is not the end of the
world, but it would be more correct to have 2185 and 2185, and we can
accomplish that simply by making the first fraction half the *big*
step instead of half the small step.

If we ever move to coordinates with 8 fractional bits, the
corresponding values become 8 and 10 out of 256, where 9 and 9 would
be better.

Similarly in the X direction.
2009-11-17 01:58:01 -05:00
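The arithmetic in the message for commit 3bea18e3ea can be checked with a small C sketch. The struct and function names are hypothetical, and the big step is assumed to be whatever remains of the unit after (n_samples - 1) small steps; the sketch only reproduces the numbers quoted in the message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record of the grid values discussed in the message. */
typedef struct {
    int32_t small_step;   /* one / n_samples, truncated */
    int32_t big_step;     /* what is left after n_samples - 1 small steps */
    int32_t first_old;    /* first fraction as half the small step */
    int32_t tail_old;     /* distance from the last sample to the pixel edge */
    int32_t first_new;    /* first fraction as half the big step */
    int32_t tail_new;     /* resulting distance to the pixel edge */
} grid_sketch_t;

/* Compute both variants for a unit of size `one` split into n_samples rows. */
static grid_sketch_t grid_sketch(int32_t one, int n_samples)
{
    grid_sketch_t g;
    int32_t span;

    assert(one > 0 && n_samples > 1);
    g.small_step = one / n_samples;
    span = (n_samples - 1) * g.small_step;  /* first sample to last sample */
    g.big_step = one - span;

    g.first_old = g.small_step / 2;
    g.tail_old = one - (g.first_old + span);

    g.first_new = g.big_step / 2;
    g.tail_new = one - (g.first_new + span);
    return g;
}
```

Calling grid_sketch(65536, 15) gives 2184 and 2186 for the old first fraction and tail, and 2185 and 2185 for the new ones; grid_sketch(256, 15) gives 8 and 10 versus 9 and 9, matching the figures for 8 fractional bits.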
Søren Sandmann Pedersen
98bb0a509f Delete the flags field from fast_path_info_t 2009-11-17 00:47:49 -05:00
Søren Sandmann Pedersen
b7fb7e6c70 Eliminate NEED_PIXBUF flag.
Instead introduce two new fake formats

	PIXMAN_pixbuf
	PIXMAN_rpixbuf

and compute whether the source and mask have them in
find_fast_path(). This led to some duplicate entries in the fast path
tables that could then be removed.
2009-11-17 00:42:21 -05:00
Søren Sandmann Pedersen
542b79c30d Compute src_format outside the fast path loop.
Inside the loop all we have to do is check that the formats match.
2009-11-17 00:42:21 -05:00
Søren Sandmann Pedersen
12108ecbe4 Eliminate the NEED_COMPONENT_ALPHA flag.
Instead introduce two new fake formats

	PIXMAN_a8r8g8b8_ca
	PIXMAN_a8b8g8r8_ca

that are used in the fast path tables for this case.
2009-11-17 00:42:21 -05:00
Søren Sandmann Pedersen
4686d1f53b Eliminate the NEED_SOLID_MASK flag
This flag was used to indicate that the mask was solid while still
allowing a specific format to be required. However, there is not
actually any need for this because the fast paths all used
_pixman_image_get_solid() which already allowed arbitrary formats.

The one thing that had to be dealt with was component alpha. In
addition to interpreting the presence of the NEED_COMPONENT_ALPHA
flag, we now also interpret the *absence* of this flag as a
requirement that the mask does *not* have component alpha.

Siarhei Siamashka pointed out that the first version of this commit
had a bug, in which a NEED_SOLID_MASK was accidentally not turned into
a PIXMAN_solid in the ARM NEON implementation.
2009-11-17 00:42:21 -05:00
Søren Sandmann Pedersen
2ef8b394d7 Use the destination buffer directly in more cases instead of fetching.
When the destination buffer is either a8r8g8b8 or x8r8g8b8, we can use
it directly instead of fetching into a temporary buffer. When the
format is x8r8g8b8, we require the operator to not make use of
destination alpha, but when it is a8r8g8b8, there are no restrictions.

This is approximately a 5% speedup on the poppler cairo benchmark:

[ # ]  backend                         test   min(s) median(s) stddev. count

Before:
[  0]    image                      poppler    6.661    6.709   0.59%    6/6

After:
[  0]    image                      poppler    6.307    6.320   0.12%    5/6
2009-11-17 00:42:21 -05:00
Søren Sandmann Pedersen
13f4e02b14 test: Move image_endian_swap() from blitters-test.c to utils.[ch] 2009-11-17 00:32:03 -05:00
Søren Sandmann Pedersen
24e203a8a8 test: Move random number generator from blitters/scaling-test to utils.[ch] 2009-11-17 00:32:03 -05:00
Søren Sandmann Pedersen
cc34554652 test: In scaling-test use the crc32 from utils.c 2009-11-17 00:32:03 -05:00
Søren Sandmann Pedersen
b465b8b79d test: Move CRC32 code from blitters-test to new files utils.[ch] 2009-11-17 00:32:03 -05:00
Søren Sandmann Pedersen
56bd913401 test: Rename utils.[ch] to gtk-utils.[ch] 2009-11-17 00:32:03 -05:00
Søren Sandmann Pedersen
7be529f3bd sse2: Add a fast path for OVER 8888 x 8 x 8888
This is a small speedup on the swfdec-youtube benchmark:

Before:
[  0]    image               swfdec-youtube    5.789    5.806   0.20%    6/6

After:
[  0]    image               swfdec-youtube    5.489    5.524   0.27%    6/6

I.e., approximately 5% faster.
2009-11-13 15:57:48 -05:00
Siarhei Siamashka
abefe68ae2 ARM: enabled 'neon_composite_add_8000_8000' fast path 2009-11-11 18:12:58 +02:00
Siarhei Siamashka
635f389ff4 ARM: enabled 'neon_composite_add_8_8_8' fast path 2009-11-11 18:12:58 +02:00
Siarhei Siamashka
7e1bfed676 ARM: enabled 'neon_composite_add_n_8_8' fast path 2009-11-11 18:12:58 +02:00
Siarhei Siamashka
deeb67b13a ARM: enabled 'neon_composite_over_8888_8888' fast path 2009-11-11 18:12:58 +02:00
Siarhei Siamashka
f449364849 ARM: enabled 'neon_composite_over_8888_0565' fast path 2009-11-11 18:12:57 +02:00
Siarhei Siamashka
2dfbf6c4a5 ARM: enabled 'neon_composite_over_8888_n_8888' fast path 2009-11-11 18:12:57 +02:00
Siarhei Siamashka
43824f98f1 ARM: enabled 'neon_composite_over_n_8_8888' fast path 2009-11-11 18:12:57 +02:00
Siarhei Siamashka
189d0d783c ARM: enabled 'neon_composite_over_n_8_0565' fast path 2009-11-11 18:12:57 +02:00
Siarhei Siamashka
cccfc87f4f ARM: enabled 'neon_composite_src_0888_0888' fast path 2009-11-11 18:12:57 +02:00
Siarhei Siamashka
e89b4f8105 ARM: enabled 'neon_composite_src_8888_0565' fast path 2009-11-11 18:12:56 +02:00
Siarhei Siamashka
2d54ed46fb ARM: enabled 'neon_composite_src_0565_0565' fast path 2009-11-11 18:12:56 +02:00
Siarhei Siamashka
5d695cb86e ARM: added 'bindings' for NEON assembly optimized functions
These functions serve as 'adaptors', converting standard internal
pixman fast path function arguments into arguments expected
by assembly functions.
2009-11-11 18:12:56 +02:00
Siarhei Siamashka
dcfade3df9 ARM: enabled new implementation for pixman_fill_neon 2009-11-11 18:12:56 +02:00
Siarhei Siamashka
bcb4bc7932 ARM: introduction of the new framework for NEON fast path optimizations
GNU assembler and its macro preprocessor are now used to generate
NEON optimized functions from a common template. This automatically
takes care of nuisances like ensuring optimal alignment, dealing with
leading/trailing pixels, doing prefetch, etc.

Implementations for a lot of compositing functions are also added,
but not enabled.
2009-11-11 18:12:56 +02:00
Siarhei Siamashka
1eff0ab487 ARM: removed old ARM NEON optimizations 2009-11-11 18:12:55 +02:00