Commit Graph

1635 Commits

Author SHA1 Message Date
Siarhei Siamashka
7c7b6f5de7 ARM: NEON optimized pixman_blt
The NEON unit has fast access to the L1/L2 caches, and even a simple
copy of memory buffers using NEON provides more than a 1.5x
performance improvement on ARM Cortex-A8.
2009-11-30 22:21:08 +02:00
Siarhei Siamashka
dce6e1bd68 test: support for testing pixbuf fast path functions in blitters-test 2009-11-27 15:50:26 +02:00
Benjamin Otte
0901ef41fb Remove nonexistent function from header 2009-11-22 10:57:06 +01:00
Søren Sandmann Pedersen
c97b1e803f Post-release version bump 2009-11-20 12:02:50 +01:00
Søren Sandmann Pedersen
5a7597f818 Pre-release version bump 2009-11-20 11:55:40 +01:00
Søren Sandmann Pedersen
95a08dece3 Remove stray semicolon from blitters-test.c
Pointed out by scottmc2@gmail.com in bug 25137.
2009-11-20 11:18:58 +01:00
Siarhei Siamashka
6e2c7d54c6 C fast path function for 'over_n_1_0565'
This function is needed to improve performance of xfce4 terminal when
using bitmap fonts and running with 16bpp desktop. Some other applications
may potentially benefit too.

After applying this patch, top functions from Xorg process in
oprofile log change from

samples  %        image name               symbol name
13296    29.1528  libpixman-1.so.0.17.1    combine_over_u
6452     14.1466  libpixman-1.so.0.17.1    fetch_scanline_r5g6b5
5516     12.0944  libpixman-1.so.0.17.1    fetch_scanline_a1
2273      4.9838  libpixman-1.so.0.17.1    store_scanline_r5g6b5
1741      3.8173  libpixman-1.so.0.17.1    fast_composite_add_1000_1000
1718      3.7669  libc-2.9.so              memcpy

to

samples  %        image name               symbol name
5594     14.7033  libpixman-1.so.0.17.1    fast_composite_over_n_1_0565
4323     11.3626  libc-2.9.so              memcpy
3695      9.7119  libpixman-1.so.0.17.1    fast_composite_add_1000_1000

when scrolling text in terminal (reading man page).
2009-11-20 11:18:58 +01:00
Søren Sandmann Pedersen
282f5cf8b8 Round horizontal sampling points towards northwest.
This change is similar to the top/bottom one, but in this case the
rounding is simpler because it's just always rounding down.

Based on a patch by M Joonas Pihlaja.
2009-11-17 01:58:01 -05:00
Søren Sandmann Pedersen
f44431986f Fix rounding of top and bottom coordinates.
The rule for trap rasterization is that coordinates are rounded
towards north-west.

The pixman_sample_ceil() function is used to compute the first
(top-most) sample row included in the trap, so when the input
coordinate is already exactly on a sample row, no rounding should take
place.

On the other hand, pixman_sample_floor() is used to compute the final
(bottom-most) sample row, so if the input is precisely on a sample
row, it needs to be rounded down to the previous row.

This commit fixes the rounding computation. The idea of the
computation is like this:

Floor operation that rounds exact matches down: First subtract
pixman_fixed_e to make sure input already on a sample row gets rounded
down. Then find out how many small steps are between the input and the
first fraction. Then add those small steps to the first fraction.

The ceil operation first adds (small_step + pixman_fixed_e), then runs a
floor. This ensures that exact matches are not rounded off.

Based on a patch by M Joonas Pihlaja.
2009-11-17 01:58:01 -05:00
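The rounding rules described in this commit can be sketched as follows. This is a minimal illustration and not the code from the patch: the names `sample_floor`, `sample_ceil`, `div_floor` and the `FIXED_*`/`STEP_*`/`FRAC_*` macros are hypothetical, the grid assumes 15 sample rows per pixel in 16.16 fixed point, and inputs are assumed to be non-negative. Both operations are written in terms of a plain floor division, so the offsets used here (minus e for the floor, plus small_step minus e for the ceil) are this sketch's own formulation and may differ from the patch.

```c
#include <stdint.h>

typedef int32_t fixed_t;                 /* 16.16 fixed point (assumption) */
#define FIXED_1    ((fixed_t) 0x10000)
#define FIXED_E    ((fixed_t) 1)         /* smallest representable step */

#define N_FRAC     15                    /* sample rows per pixel (assumption) */
#define STEP_SMALL (FIXED_1 / N_FRAC)
#define STEP_BIG   (FIXED_1 - (N_FRAC - 1) * STEP_SMALL)
#define FRAC_FIRST (STEP_BIG / 2)
#define FRAC_LAST  (FRAC_FIRST + (N_FRAC - 1) * STEP_SMALL)

/* Integer division that rounds towards negative infinity (b > 0). */
static fixed_t div_floor(fixed_t a, fixed_t b)
{
    return (a >= 0) ? a / b : -((-a + b - 1) / b);
}

/* Last sample row strictly above y: an exact match moves to the previous row. */
static fixed_t sample_floor(fixed_t y)
{
    fixed_t f = y & (FIXED_1 - 1);       /* fractional part */
    fixed_t i = y - f;                   /* integer part */

    f = div_floor(f - FIXED_E - FRAC_FIRST, STEP_SMALL) * STEP_SMALL + FRAC_FIRST;
    if (f < FRAC_FIRST) {                /* fell off the top of this pixel */
        f = FRAC_LAST;
        i -= FIXED_1;
    }
    return i + f;
}

/* First sample row at or below y: an exact match is left where it is. */
static fixed_t sample_ceil(fixed_t y)
{
    fixed_t f = y & (FIXED_1 - 1);
    fixed_t i = y - f;

    f = div_floor(f - FRAC_FIRST + (STEP_SMALL - FIXED_E), STEP_SMALL) * STEP_SMALL + FRAC_FIRST;
    if (f > FRAC_LAST) {                 /* fell off the bottom of this pixel */
        f = FRAC_FIRST;
        i += FIXED_1;
    }
    return i + f;
}
```

With this grid, an input that lies exactly on the first sample row of pixel 1 is returned unchanged by `sample_ceil`, while `sample_floor` moves it to the last sample row of pixel 0.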
Søren Sandmann Pedersen
3bea18e3ea Fix slightly skewed sampling grid for antialiased traps
The sampling grid is slightly skewed in the antialiased case. Consider
the case where we have n = 8 bits of alpha.

The small step is

     small_step = fixed_1 / 15 = 65536 / 15 = 4369

The first fraction is then

     frac_first = (small_step / 2) = 4369 / 2 = 2184

and the last fraction becomes

     frac_last
          = frac_first + (15 - 1) * small_step = 2184 + 14 * 4369 = 63350

which means the size of the last bit of the pixel is

     65536 - 63350 = 2186

which is 2 bigger than the first fraction. This is not the end of the
world, but it would be more correct to have 2185 and 2185, and we can
accomplish that simply by making the first fraction half the *big*
step instead of half the small step.

If we ever move to coordinates with 8 fractional bits, the
corresponding values become 8 and 10 out of 256, where 9 and 9 would
be better.

Similarly in the X direction.
2009-11-17 01:58:01 -05:00
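The arithmetic in this commit message can be reproduced with a small helper. This is only a sketch: `grid_t` and `make_grid` are hypothetical names, and the sample count of 15 is taken from the message rather than derived from the bit depth.

```c
#include <stdint.h>

typedef struct {
    int32_t small_step;  /* distance between neighbouring sample rows */
    int32_t first;       /* offset of the first sample row in a pixel */
    int32_t last;        /* offset of the last sample row in a pixel */
    int32_t tail;        /* distance from the last row to the pixel edge */
} grid_t;

/* one:          fixed-point value of 1.0 (65536 for 16 fractional bits)
 * n_samples:    number of sample rows per pixel
 * use_big_step: 0 = first row at half the small step (old behaviour),
 *               1 = first row at half the big step (new behaviour) */
static grid_t make_grid(int32_t one, int32_t n_samples, int use_big_step)
{
    grid_t g;
    int32_t big_step;

    g.small_step = one / n_samples;
    /* The big step absorbs the remainder left by the integer division. */
    big_step = one - (n_samples - 1) * g.small_step;

    g.first = (use_big_step ? big_step : g.small_step) / 2;
    g.last = g.first + (n_samples - 1) * g.small_step;
    g.tail = one - g.last;
    return g;
}
```

Calling `make_grid(65536, 15, 0)` gives a first offset of 2184 and a tail of 2186, while `make_grid(65536, 15, 1)` gives 2185 for both. With 8 fractional bits, `make_grid(256, 15, 0)` gives 8 and 10, and `make_grid(256, 15, 1)` gives 9 and 9.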
Søren Sandmann Pedersen
98bb0a509f Delete the flags field from fast_path_info_t 2009-11-17 00:47:49 -05:00
Søren Sandmann Pedersen
b7fb7e6c70 Eliminate NEED_PIXBUF flag.
Instead introduce two new fake formats

	PIXMAN_pixbuf
	PIXMAN_rpixbuf

and compute whether the source and mask have them in
find_fast_path(). This led to some duplicate entries in the fast path
tables that could then be removed.
2009-11-17 00:42:21 -05:00
Søren Sandmann Pedersen
542b79c30d Compute src_format outside the fast path loop.
Inside the loop all we have to do is check that the formats match.
2009-11-17 00:42:21 -05:00
Søren Sandmann Pedersen
12108ecbe4 Eliminate the NEED_COMPONENT_ALPHA flag.
Instead introduce two new fake formats

	PIXMAN_a8r8g8b8_ca
	PIXMAN_a8b8g8r8_ca

that are used in the fast path tables for this case.
2009-11-17 00:42:21 -05:00
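The idea of replacing a flag with a fake format, and of computing the formats once before scanning the table, can be sketched like this. Everything here is hypothetical and chosen for illustration only: the `fmt_t` codes, the `image_t` and `fast_path_t` layouts, and the `mask_format` and `lookup` functions are not pixman's actual types or API.

```c
#include <stddef.h>

/* Hypothetical format codes. FMT_a8r8g8b8_ca is a "fake" format that exists
 * only so a table entry can require a component-alpha mask. */
typedef enum {
    FMT_null, FMT_a8, FMT_a8r8g8b8, FMT_a8r8g8b8_ca, FMT_r5g6b5,
    FMT_unsupported
} fmt_t;

typedef struct { fmt_t format; int component_alpha; } image_t;
typedef struct { fmt_t src, mask, dest; const char *name; } fast_path_t;

/* Fold the component-alpha property into the format code. */
static fmt_t mask_format(const image_t *mask)
{
    if (mask == NULL)
        return FMT_null;
    if (!mask->component_alpha)
        return mask->format;
    /* A component-alpha mask only matches entries that ask for the _ca format. */
    return mask->format == FMT_a8r8g8b8 ? FMT_a8r8g8b8_ca : FMT_unsupported;
}

static const fast_path_t *lookup(const fast_path_t *table, size_t n,
                                 const image_t *src, const image_t *mask,
                                 const image_t *dest)
{
    /* Formats are computed once, outside the loop. */
    const fmt_t s = src->format, m = mask_format(mask), d = dest->format;
    size_t i;

    /* Inside the loop, matching is plain equality with no flag checks. */
    for (i = 0; i < n; i++)
        if (table[i].src == s && table[i].mask == m && table[i].dest == d)
            return &table[i];
    return NULL;
}
```

In this sketch a table entry with a plain `FMT_a8r8g8b8` mask never matches a component-alpha mask, which mirrors the rule that the absence of the requirement means the mask must not have component alpha.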
Søren Sandmann Pedersen
4686d1f53b Eliminate the NEED_SOLID_MASK flag
This flag was used to indicate that the mask was solid while still
allowing a specific format to be required. However, there is not
actually any need for this because the fast paths all used
_pixman_image_get_solid() which already allowed arbitrary formats.

The one thing that had to be dealt with was component alpha. In
addition to interpreting the presence of the NEED_COMPONENT_ALPHA
flag, we now also interpret the *absence* of this flag as a
requirement that the mask does *not* have component alpha.

Siarhei Siamashka pointed out that the first version of this commit
had a bug, in which a NEED_SOLID_MASK was accidentally not turned into
a PIXMAN_solid in the ARM NEON implementation.
2009-11-17 00:42:21 -05:00
Søren Sandmann Pedersen
2ef8b394d7 Use the destination buffer directly in more cases instead of fetching.
When the destination buffer is either a8r8g8b8 or x8r8g8b8, we can use
it directly instead of fetching into a temporary buffer. When the
format is x8r8g8b8, we require the operator to not make use of
destination alpha, but when it is a8r8g8b8, there are no restrictions.

This is approximately a 5% speedup on the poppler cairo benchmark:

[ # ]  backend                         test   min(s) median(s) stddev. count

Before:
[  0]    image                      poppler    6.661    6.709   0.59%    6/6

After:
[  0]    image                      poppler    6.307    6.320   0.12%    5/6
2009-11-17 00:42:21 -05:00
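The condition described in this commit can be written as a small predicate. This is a sketch under stated assumptions: `dest_format_t` and `can_use_dest_directly` are hypothetical names, and how the real code decides whether an operator uses destination alpha is not shown here.

```c
/* Hypothetical destination format codes for this sketch. */
typedef enum { DEST_a8r8g8b8, DEST_x8r8g8b8, DEST_other } dest_format_t;

/* Returns 1 when the compositing code may read and write the destination
 * scanline in place, 0 when it has to fetch into a temporary buffer. */
static int can_use_dest_directly(dest_format_t format, int op_uses_dest_alpha)
{
    switch (format) {
    case DEST_a8r8g8b8:
        return 1;                    /* no restrictions */
    case DEST_x8r8g8b8:
        return !op_uses_dest_alpha;  /* the x byte carries no valid alpha */
    default:
        return 0;                    /* other formats still need a fetch */
    }
}
```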
Søren Sandmann Pedersen
13f4e02b14 test: Move image_endian_swap() from blitters-test.c to utils.[ch] 2009-11-17 00:32:03 -05:00
Søren Sandmann Pedersen
24e203a8a8 test: Move random number generator from blitters/scaling-test to utils.[ch] 2009-11-17 00:32:03 -05:00
Søren Sandmann Pedersen
cc34554652 test: In scaling-test use the crc32 from utils.c 2009-11-17 00:32:03 -05:00
Søren Sandmann Pedersen
b465b8b79d test: Move CRC32 code from blitters-test to new files utils.[ch] 2009-11-17 00:32:03 -05:00
Søren Sandmann Pedersen
56bd913401 test: Rename utils.[ch] to gtk-utils.[ch] 2009-11-17 00:32:03 -05:00
Søren Sandmann Pedersen
7be529f3bd sse2: Add a fast path for OVER 8888 x 8 x 8888
This is a small speedup on the swfdec-youtube benchmark:

Before:
[  0]    image               swfdec-youtube    5.789    5.806   0.20%    6/6

After:
[  0]    image               swfdec-youtube    5.489    5.524   0.27%    6/6

I.e., approximately 5% faster.
2009-11-13 15:57:48 -05:00
Siarhei Siamashka
abefe68ae2 ARM: enabled 'neon_composite_add_8000_8000' fast path 2009-11-11 18:12:58 +02:00
Siarhei Siamashka
635f389ff4 ARM: enabled 'neon_composite_add_8_8_8' fast path 2009-11-11 18:12:58 +02:00
Siarhei Siamashka
7e1bfed676 ARM: enabled 'neon_composite_add_n_8_8' fast path 2009-11-11 18:12:58 +02:00
Siarhei Siamashka
deeb67b13a ARM: enabled 'neon_composite_over_8888_8888' fast path 2009-11-11 18:12:58 +02:00
Siarhei Siamashka
f449364849 ARM: enabled 'neon_composite_over_8888_0565' fast path 2009-11-11 18:12:57 +02:00
Siarhei Siamashka
2dfbf6c4a5 ARM: enabled 'neon_composite_over_8888_n_8888' fast path 2009-11-11 18:12:57 +02:00
Siarhei Siamashka
43824f98f1 ARM: enabled 'neon_composite_over_n_8_8888' fast path 2009-11-11 18:12:57 +02:00
Siarhei Siamashka
189d0d783c ARM: enabled 'neon_composite_over_n_8_0565' fast path 2009-11-11 18:12:57 +02:00
Siarhei Siamashka
cccfc87f4f ARM: enabled 'neon_composite_src_0888_0888' fast path 2009-11-11 18:12:57 +02:00
Siarhei Siamashka
e89b4f8105 ARM: enabled 'neon_composite_src_8888_0565' fast path 2009-11-11 18:12:56 +02:00
Siarhei Siamashka
2d54ed46fb ARM: enabled 'neon_composite_src_0565_0565' fast path 2009-11-11 18:12:56 +02:00
Siarhei Siamashka
5d695cb86e ARM: added 'bindings' for NEON assembly optimized functions
These functions serve as 'adaptors', converting standard internal
pixman fast path function arguments into arguments expected
by assembly functions.
2009-11-11 18:12:56 +02:00
Siarhei Siamashka
dcfade3df9 ARM: enabled new implementation for pixman_fill_neon 2009-11-11 18:12:56 +02:00
Siarhei Siamashka
bcb4bc7932 ARM: introduction of the new framework for NEON fast path optimizations
The GNU assembler and its macro preprocessor are now used to generate
NEON optimized functions from a common template. This automatically
takes care of nuisances like ensuring optimal alignment, dealing with
leading/trailing pixels, doing prefetch, etc.

Implementations for a lot of compositing functions are also added,
but not enabled.
2009-11-11 18:12:56 +02:00
Siarhei Siamashka
1eff0ab487 ARM: removed old ARM NEON optimizations 2009-11-11 18:12:55 +02:00
Søren Sandmann Pedersen
b8898d77d0 Define PIXMAN_USE_INTERNAL_API in pixman-private.h
Instead of mucking around with CFLAGS in configure.ac, preventing
users from setting their own CFLAGS, just define the
PIXMAN_USE_INTERNAL_API and PIXMAN_DISABLE_DEPRECATED in
pixman-private.h
2009-11-07 14:47:22 -05:00
Søren Sandmann Pedersen
67bf739187 Include <inttypes.h> when compiled with HP's C compiler.
Fixes bug 23169.
2009-10-27 09:11:28 -04:00
Siarhei Siamashka
384fb88b90 C fast path function for 'over_n_1_8888'
This function is needed to improve performance of xfce4 terminal.
Some other applications may potentially benefit too.
2009-10-27 12:32:04 +02:00
Siarhei Siamashka
a2985da947 C fast path function for 'add_1000_1000'
This function is needed to improve performance of xfce4 terminal.
Some other applications may potentially benefit too.
2009-10-27 12:31:59 +02:00
Siarhei Siamashka
5f429e4510 blitters-test updated to also randomly generate mask_x/mask_y 2009-10-27 12:31:55 +02:00
André Tupinambá
0d5562747c Add fast path scaled, bilinear fetcher.
This adds a bilinear fetcher for the case where the image has a scaled
transformation, does not repeat, and the format is {ax}8r8g8b8.

Results for the swfdec-youtube benchmark

Before:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image               swfdec-youtube    7.841    7.915   0.72%    6/6

After:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image               swfdec-youtube    6.677    6.780   0.94%    6/6

These results were measured on a faster machine than the ones in the
previous commit, so the numbers are not comparable.

Signed-off-by: Søren Sandmann Pedersen <sandmann@redhat.com>
2009-10-26 13:04:21 -04:00
André Tupinambá
88323c5abe Speed up bilinear interpolation.
Speed up bilinear interpolation by processing more than one component
at a time on 64 bit architectures, and by precomputing the dist{ixiy}
products on 32 bit architectures.

Previously bilinear interpolation for one pixel would take 24
multiplications. With this improvement it takes 12 on 64 bit, and 20
on 32 bit.

This is a small but consistent speedup on the swfdec-youtube
benchmark:

[ # ]  backend                         test   min(s) median(s) stddev. count
Before:
[  0]    image               swfdec-youtube   18.010   18.020   0.09%    4/5

After:
[  0]    image               swfdec-youtube   17.488   17.584   0.22%    5/6

Signed-off-by: Søren Sandmann Pedersen <sandmann@redhat.com>
2009-10-26 13:04:21 -04:00
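The multiplication counts quoted in this commit can be illustrated with a sketch of both approaches. This is not the code from the patch: `bilinear_ref`, `bilinear_packed` and `spread` are hypothetical names, and the sketch assumes 8-bit channels packed in a 32-bit pixel with weights `distx` and `disty` in the range 0 to 255. The reference version precomputes the four weight products (4 multiplications) and then handles each channel separately (16 more). The packed version places two channels in one 64-bit word, so each multiplication covers two channels (8 plus the 4 weight products).

```c
#include <stdint.h>

/* One channel at a time: 4 weight products + 16 channel multiplications. */
static uint32_t bilinear_ref(uint32_t tl, uint32_t tr, uint32_t bl, uint32_t br,
                             int distx, int disty)
{
    /* The four weights always sum to 256 * 256 = 65536. */
    uint32_t w_tl = (uint32_t) ((256 - distx) * (256 - disty));
    uint32_t w_tr = (uint32_t) (distx * (256 - disty));
    uint32_t w_bl = (uint32_t) ((256 - distx) * disty);
    uint32_t w_br = (uint32_t) (distx * disty);
    uint32_t out = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8) {
        uint32_t c = ((tl >> shift) & 0xff) * w_tl + ((tr >> shift) & 0xff) * w_tr +
                     ((bl >> shift) & 0xff) * w_bl + ((br >> shift) & 0xff) * w_br;
        out |= (c >> 16) << shift;
    }
    return out;
}

/* Move channels 0 and 2 of p into separate 32-bit lanes of a 64-bit word. */
static uint64_t spread(uint32_t p)
{
    return (uint64_t) (p & 0xff) | ((uint64_t) (p & 0xff0000) << 16);
}

/* Two channels per multiplication: 4 weight products + 8 multiplications. */
static uint32_t bilinear_packed(uint32_t tl, uint32_t tr, uint32_t bl, uint32_t br,
                                int distx, int disty)
{
    uint64_t w_tl = (uint64_t) ((256 - distx) * (256 - disty));
    uint64_t w_tr = (uint64_t) (distx * (256 - disty));
    uint64_t w_bl = (uint64_t) ((256 - distx) * disty);
    uint64_t w_br = (uint64_t) (distx * disty);

    /* Each lane stays below 2^24, so the lanes never overlap. */
    uint64_t even = spread(tl) * w_tl + spread(tr) * w_tr +
                    spread(bl) * w_bl + spread(br) * w_br;
    uint64_t odd = spread(tl >> 8) * w_tl + spread(tr >> 8) * w_tr +
                   spread(bl >> 8) * w_bl + spread(br >> 8) * w_br;

    uint32_t c02 = (uint32_t) ((even >> 16) & 0xff) | (uint32_t) ((even >> 32) & 0xff0000);
    uint32_t c13 = (uint32_t) ((odd >> 16) & 0xff) | (uint32_t) ((odd >> 32) & 0xff0000);

    return c02 | (c13 << 8);
}
```

Both functions are meant to return the same pixel for the same inputs. The packed variant only pays off where 64-bit multiplication is cheap, which fits the commit's distinction between 64 bit and 32 bit architectures.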
Søren Sandmann Pedersen
f0c157f888 Extend scaling-test to also test bilinear filtering. 2009-10-26 13:04:21 -04:00
Jeremy Huddleston
eab882ef38 This is not a GNU project, so declare it foreign.
On Wed, 2009-10-21 at 13:36 +1000, Peter Hutterer wrote:
> On Tue, Oct 20, 2009 at 08:23:55PM -0700, Jeremy Huddleston wrote:
> > I noticed an INSTALL file in xlsclients and libXvMC today, and it
> > was quite annoying to work around since 'autoreconf -fvi' replaces
> > it and git wants to commit it.  Should these files even be in git?
> > Can I nuke them for the betterment of humanity and since they get
> > created by autoreconf anyways?
>
> See https://bugs.freedesktop.org/show_bug.cgi?id=24206

As an interim measure, replace AM_INIT_AUTOMAKE([dist-bzip2]) with
AM_INIT_AUTOMAKE([foreign dist-bzip2]). This will prevent the generation
of the INSTALL file. It is also part of the 24206 solution.

Signed-off-by: Jeremy Huddleston <jeremyhu@freedesktop.org>
2009-10-21 12:47:27 -07:00
Søren Sandmann Pedersen
dc46ad274a Make walk_region_internal() use 32 bit dimensions 2009-10-19 20:32:37 -04:00
Søren Sandmann Pedersen
bb3698d479 Make pixman_compute_composite_region32() use 32 bit dimensions 2009-10-19 20:31:54 -04:00
Søren Sandmann Pedersen
895c281c40 Change prototype of _pixman_walk_composite_region from int16_t to int32_t 2009-10-19 20:30:22 -04:00
Søren Sandmann Pedersen
9cd470665b Remove unused color_table and color_table_size fields 2009-10-19 20:27:36 -04:00