1929 Commits

Author SHA1 Message Date
Siarhei Siamashka
70a923882c ARM: a bit faster NEON bilinear scaling for r5g6b5 source images
Instruction scheduling is improved in the code responsible for fetching r5g6b5
pixels and converting them to the intermediate x8r8g8b8 color format used in
the interpolation part of the code. A lot of NEON stalls still remain,
which can be resolved later by the use of pipelining.
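
For illustration only, the r5g6b5 to x8r8g8b8 conversion mentioned above can be sketched in scalar C. This is not the NEON code from the commit. The function name is hypothetical, and the bit-replication scheme (copying the top bits of each channel into the low bits) is an assumption about how the expansion is done.

```c
#include <stdint.h>

/* Hypothetical helper: expand one r5g6b5 pixel to x8r8g8b8.
 * Assumption: the top bits of each channel are replicated into the
 * low bits, so that full intensity maps to 0xff. */
static uint32_t
expand_0565_to_x888 (uint16_t s)
{
    uint32_t r = (s >> 11) & 0x1f;   /* 5 bits of red */
    uint32_t g = (s >> 5) & 0x3f;    /* 6 bits of green */
    uint32_t b = s & 0x1f;           /* 5 bits of blue */

    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);

    return (r << 16) | (g << 8) | b; /* unused x8 byte is left as zero */
}
```

A NEON version would perform the same shifts and ORs on several pixels per instruction, which is where instruction scheduling starts to matter.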

Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=10020565, dst=10020565, speed=32.29 MPix/s
          op=1, src=10020565, dst=20020888, speed=36.82 MPix/s
  after:  op=1, src=10020565, dst=10020565, speed=41.35 MPix/s
          op=1, src=10020565, dst=20020888, speed=49.16 MPix/s
2011-03-12 21:30:22 +02:00
Siarhei Siamashka
fe99673719 ARM: NEON optimization for bilinear scaled 'src_0565_0565'
Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=10020565, dst=10020565, speed=3.30 MPix/s
  after:  op=1, src=10020565, dst=10020565, speed=32.29 MPix/s
2011-03-12 21:30:18 +02:00
Siarhei Siamashka
29003c3bef ARM: NEON optimization for bilinear scaled 'src_0565_x888'
Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=10020565, dst=20020888, speed=3.39 MPix/s
  after:  op=1, src=10020565, dst=20020888, speed=36.82 MPix/s
2011-03-12 21:30:13 +02:00
Siarhei Siamashka
2ee27e7d79 ARM: NEON optimization for bilinear scaled 'src_8888_0565'
Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=10020565, speed=6.56 MPix/s
  after:  op=1, src=20028888, dst=10020565, speed=61.65 MPix/s
2011-03-12 21:30:09 +02:00
Siarhei Siamashka
11a0c5badb ARM: use common macro template for bilinear scaled 'src_8888_8888'
This is a cleanup of old and now duplicated code. The performance improvement
comes mostly from enabling software prefetch, but instruction
scheduling is also slightly better.

Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=20028888, speed=53.24 MPix/s
  after:  op=1, src=20028888, dst=20028888, speed=74.36 MPix/s
2011-03-12 21:30:05 +02:00
Siarhei Siamashka
34098dba67 ARM: NEON: common macro template for bilinear scanline scalers
This allows generating bilinear scanline scaling functions targeting
various source and destination color formats. Right now a8r8g8b8/x8r8g8b8
and r5g6b5 color formats are supported. More formats can be added if needed.
2011-03-12 21:30:00 +02:00
Siarhei Siamashka
66f4ee1b3b ARM: new bilinear fast path template macro in 'pixman-arm-common.h'
It can be reused in different ARM NEON bilinear scaling fast path functions.
2011-03-12 21:29:56 +02:00
Siarhei Siamashka
5921c17639 ARM: assembly optimized nearest scaled 'src_8888_8888'
Benchmark on ARM Cortex-A8 r1p3 @500MHz, 32-bit LPDDR @166MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=20028888, speed=44.36 MPix/s
  after:  op=1, src=20028888, dst=20028888, speed=39.79 MPix/s

Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=20028888, speed=102.36 MPix/s
  after:  op=1, src=20028888, dst=20028888, speed=163.12 MPix/s
2011-03-12 21:26:05 +02:00
Siarhei Siamashka
f3e17872f5 ARM: common macro for nearest scaling fast paths
The code of nearest scaled 'src_0565_0565' function was generalized
and moved to a common macro, so that it can be reused for other
fast paths.
2011-03-12 21:24:40 +02:00
Siarhei Siamashka
bb3d1b67fd ARM: use prefetch in nearest scaled 'src_0565_0565'
Benchmark on ARM Cortex-A8 r1p3 @500MHz, 32-bit LPDDR @166MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=10020565, dst=10020565, speed=75.02 MPix/s
  after:  op=1, src=10020565, dst=10020565, speed=73.63 MPix/s

Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=10020565, dst=10020565, speed=176.12 MPix/s
  after:  op=1, src=10020565, dst=10020565, speed=267.50 MPix/s
2011-03-12 21:23:54 +02:00
Cyril Brulebois
3503f7956f Upload to experimental. 2011-03-09 04:08:04 +01:00
Cyril Brulebois
19f2d3d9c1 Bump Standards-Version to 3.9.1 (no changes needed). 2011-03-09 04:07:54 +01:00
Cyril Brulebois
bec6320b0e Add a quilt series placeholder file. 2011-03-09 04:04:13 +01:00
Cyril Brulebois
43375c5d66 Switch to dh. 2011-03-09 03:55:08 +01:00
Cyril Brulebois
d3975d7ff9 Update Uploaders list. Thanks, David! 2011-03-09 03:42:00 +01:00
Cyril Brulebois
b03a2e477b Remove libpixman1-dev from Conflicts, last seen in etch! 2011-03-09 03:41:05 +01:00
Cyril Brulebois
61363cc614 Wrap Build-Depends. 2011-03-09 03:40:06 +01:00
Cyril Brulebois
b98292b4d5 Bump shlibs accordingly. 2011-03-09 03:39:07 +01:00
Cyril Brulebois
1e6491fdde Update symbols file with new symbols. 2011-03-09 03:38:42 +01:00
Cyril Brulebois
1d60bb92f7 Bump changelogs. 2011-03-09 03:21:07 +01:00
Cyril Brulebois
a0ab0aecb2 Merge branch 'upstream-experimental' into debian-experimental 2011-03-09 03:20:36 +01:00
Søren Sandmann Pedersen
84e361c8e3 test: Do endian swapping of the source and destination images.
Otherwise the test fails on big endian. Fix for bug 34767, reported by
Siarhei Siamashka.
2011-03-07 14:08:00 -05:00
Søren Sandmann Pedersen
84f3c5a71a test: In image_endian_swap() use pixman_image_get_format() to get the bpp.
There is no reason to pass in the bpp as an argument; it can be gotten
directly from the image.
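
As a rough sketch of why the bpp is recoverable from the format alone: assuming the format code layout of this era (bits per pixel in the top byte, then a type byte, then one nibble each for the a, r, g and b channel widths), a format code can be unpacked as below. The struct and function names are hypothetical. Under the same assumption, the `src=`/`dst=` values printed by the microbenchmarks in this log read as hexadecimal format codes.

```c
#include <stdint.h>

/* Hypothetical helper type holding the unpacked fields. */
typedef struct
{
    unsigned bpp, type, a, r, g, b;
} format_fields_t;

/* Assumed layout: bpp << 24 | type << 16 | a << 12 | r << 8 | g << 4 | b */
static format_fields_t
decode_format (uint32_t f)
{
    format_fields_t d;

    d.bpp  = (f >> 24) & 0xff;
    d.type = (f >> 16) & 0xff;
    d.a    = (f >> 12) & 0x0f;
    d.r    = (f >> 8) & 0x0f;
    d.g    = (f >> 4) & 0x0f;
    d.b    = f & 0x0f;

    return d;
}
```

With this layout, `decode_format (0x20028888)` yields 32 bpp with four 8-bit channels, and `decode_format (0x10020565)` yields 16 bpp with 5/6/5-bit color channels and no alpha.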
2011-03-07 14:07:44 -05:00
Siarhei Siamashka
17feaa9c50 ARM: NEON optimization for bilinear scaled 'src_8888_8888'
Initial NEON optimization for bilinear scaling. It can probably be
improved further.

Benchmark on ARM Cortex-A8:
 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=20028888, speed=6.70 MPix/s
  after:  op=1, src=20028888, dst=20028888, speed=44.27 MPix/s
2011-02-28 15:47:58 +02:00
Siarhei Siamashka
350029396d SSE2 optimization for bilinear scaled 'src_8888_8888'
A simple, naive implementation of bilinear scaling using SSE2 intrinsics,
which only handles one pixel at a time. It is approximately 2x faster than
the general pixman compositing path. Single-pass processing without an intermediate
temporary buffer contributes ~15% and loop unrolling contributes ~20%
of this speedup.
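
To make "one pixel at a time" concrete, here is a scalar C sketch of bilinear interpolation between four neighbouring 32-bit pixels. It is not the SSE2 code from the commit. The function name, the 0..256 weight range and the 16-bit fixed-point scaling are all assumptions chosen for clarity.

```c
#include <stdint.h>

/* Hypothetical scalar reference: blend four 8-bit-per-channel pixels.
 * distx and disty are the fractional offsets from the top-left pixel,
 * each in the range 0..256 (assumed fixed-point convention). */
static uint32_t
bilinear_interpolate (uint32_t tl, uint32_t tr,
                      uint32_t bl, uint32_t br,
                      unsigned distx, unsigned disty)
{
    /* The four weights always sum to 256 * 256 = 65536. */
    uint32_t w_tl = (256 - distx) * (256 - disty);
    uint32_t w_tr = distx * (256 - disty);
    uint32_t w_bl = (256 - distx) * disty;
    uint32_t w_br = distx * disty;
    uint32_t result = 0;
    int shift;

    /* Process the four 8-bit channels independently. */
    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t c = ((tl >> shift) & 0xff) * w_tl
                   + ((tr >> shift) & 0xff) * w_tr
                   + ((bl >> shift) & 0xff) * w_bl
                   + ((br >> shift) & 0xff) * w_br;

        result |= (c >> 16) << shift;   /* divide by 65536, truncating */
    }

    return result;
}
```

A SIMD implementation would compute all four channels of a pixel in a single register instead of looping over them.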

Benchmark on Intel Core i7 (x86-64):
 Using cairo-perf-trace:
  before: image        firefox-planet-gnome   12.566   12.610   0.23%    6/6
  after:  image        firefox-planet-gnome   10.961   11.013   0.19%    5/6

 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x):
  before: op=1, src=20028888, dst=20028888, speed=70.48 MPix/s
  after:  op=1, src=20028888, dst=20028888, speed=165.38 MPix/s
2011-02-28 15:47:52 +02:00
Siarhei Siamashka
0df43b8ae5 test: check correctness of 'bilinear_pad_repeat_get_scanline_bounds'
Individual correctness check for the new bilinear scaling related
supplementary function. This test program uses a somewhat wider range
of input arguments than other tests cover.
2011-02-28 15:29:23 +02:00
Siarhei Siamashka
d506bf68fd Main loop template for fast single pass bilinear scaling
Can be used for implementing SIMD optimized fast path
functions which work with bilinear scaled source images.

Similar to the template for nearest scaling main loop, the
following types of mask are supported:
1. no mask
2. non-scaled a8 mask with SAMPLES_COVER_CLIP flag
3. solid mask

PAD repeat is fully supported. NONE repeat is partially
supported (right now it only works if the source image has an alpha
channel or when the alpha channel of the source image does not
have any effect on the compositing operation).
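
The difference between the two repeat modes can be sketched as a scalar pixel fetch. This is an illustration under assumptions, not the template's code: the enum and function names are hypothetical. With PAD, out-of-bounds coordinates are clamped to the nearest edge pixel. With NONE, out-of-bounds pixels read as transparent black, which is why a source without an alpha channel needs special care.

```c
#include <stdint.h>

/* Hypothetical names, for illustration only. */
typedef enum { REPEAT_NONE, REPEAT_PAD } repeat_t;

/* Fetch pixel x from a scanline of 'width' pixels (width > 0). */
static uint32_t
fetch_pixel (const uint32_t *row, int width, int x, repeat_t repeat)
{
    if (x >= 0 && x < width)
        return row[x];                      /* inside the image */

    if (repeat == REPEAT_PAD)
        return row[x < 0 ? 0 : width - 1];  /* clamp to the nearest edge */

    return 0;                               /* NONE: transparent black */
}
```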
2011-02-28 15:29:16 +02:00
Andrea Canciani
9ebde285fa test: Silence MSVC warnings
MSVC does not notice non-returning functions (abort() / assert(0))
and warns about paths which end with them in non-void functions:

c:\cygwin\home\ranma42\code\fdo\pixman\test\fetch-test.c(114) :
warning C4715: 'reader' : not all control paths return a value
c:\cygwin\home\ranma42\code\fdo\pixman\test\stress-test.c(133) :
warning C4715: 'real_reader' : not all control paths return a value
c:\cygwin\home\ranma42\code\fdo\pixman\test\composite.c(431) :
warning C4715: 'calc_op' : not all control paths return a value

These warnings can be silenced by adding a return after the
termination call.
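
A minimal sketch of the pattern, using a hypothetical function rather than the actual test code:

```c
#include <stdlib.h>

/* Hypothetical example of a non-void function whose last path aborts. */
static int
bytes_per_pixel (int bpp)
{
    switch (bpp)
    {
    case 8:
        return 1;
    case 16:
        return 2;
    case 32:
        return 4;
    }

    abort ();   /* never returns, but MSVC does not know that */
    return 0;   /* unreachable; present only to silence warning C4715 */
}
```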
2011-02-28 10:38:02 +01:00
Andrea Canciani
8868778ea1 Do not include unused headers
pixman-combine32.h is included without being used both in
pixman-image.c and in pixman-general.c.
2011-02-28 10:38:02 +01:00
Andrea Canciani
72f5e5f608 test: Add Makefile for Win32 2011-02-28 10:38:02 +01:00
Andrea Canciani
11305b4ecd test: Fix tests for compilation on Windows
The Microsoft C compiler cannot handle subobject initialization and
Win32 does not provide snprintf.

Work around these limitations by using normal struct initialization
and using sprintf (a manual check shows that the buffer size is
sufficient).
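
The workaround might look roughly like the sketch below. It assumes that "subobject initialization" refers to C99 designated initializers. The struct and function names are hypothetical and not taken from the tests.

```c
#include <stdio.h>

/* Hypothetical type, for illustration only. */
typedef struct
{
    int x, y, width, height;
} box_t;

/* Writes a description of a fixed box into 'out' (at least 64 bytes)
 * and returns the number of characters written. */
static int
format_box (char *out)
{
    /* C99 form (assumed to be what the compiler rejects):
     *     box_t b = { .x = 1, .y = 2, .width = 30, .height = 40 };
     * Positional initialization works with C89 compilers too. */
    box_t b = { 1, 2, 30, 40 };

    /* Manual size check: four ints of at most 11 characters each,
     * three separators and the terminator need 48 bytes at most. */
    return sprintf (out, "%d,%d %dx%d", b.x, b.y, b.width, b.height);
}
```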
2011-02-28 10:38:02 +01:00
Andrea Canciani
20ed723a5a Fix compilation on Win32
Makefile.win32 contained a typo and was missing the dependency on
the built sources.
2011-02-28 10:38:01 +01:00
Søren Sandmann Pedersen
48e951000c Post-release version bump to 0.21.7 2011-02-22 16:13:32 -05:00
Søren Sandmann Pedersen
8b33321660 Pre-release version bump to 0.21.6 2011-02-22 15:43:41 -05:00
Søren Sandmann Pedersen
2cb67d2a0b Minor fix to the RELEASING file 2011-02-22 15:40:34 -05:00
Søren Sandmann Pedersen
3cdf74257b Delete pixman-x64-mmx-emulation.h from pixman/Makefile.am 2011-02-22 15:28:17 -05:00
Siarhei Siamashka
65919ad17f Ensure that tests run as the last step of a build for 'make check'
Previously 'make check' would compile and run tests first, and only
then proceed to compiling demos. This is inconvenient because one
has to scroll back through the console output to see the
test verdict. Swapping the order of the SUBDIRS variable entries in
Makefile.am resolves this.
2011-02-22 19:43:57 +02:00
Søren Sandmann Pedersen
34a7ac0474 sse2: Minor coding style cleanups.
Also make pixman_fill_sse2() static.
2011-02-18 16:03:30 -05:00
Søren Sandmann Pedersen
10f69e5ec8 sse2: Remove pixman-x64-mmx-emulation.h
Also stop including mmintrin.h
2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen
984be4def2 sse2: Delete obsolete or redundant comments 2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen
33d9890226 sse2: Remove all the core_combine_* functions
Now that _mm_empty() is not used anymore, they are no longer different
from the sse2_combine_* functions, so they can be consolidated.
2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen
87cd6b8056 sse2: Don't compile pixman-sse2.c with -mmmx anymore
It's not necessary now that the file doesn't use MMX instructions.
2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen
e7fe5e35e9 sse2: Delete unused MMX functions and constants and all _mm_empty()s
These are not needed because the SSE2 implementation doesn't use MMX
anymore.
2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen
f88ae14c15 sse2: Convert all uses of MMX registers to use SSE2 registers instead.
By avoiding use of MMX registers we won't need to call emms all over
the place, which avoids various miscompilation issues.
2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen
7fb75bb3e6 Coding style: core_combine_in_u_pixelsse2 -> core_combine_in_u_pixel_sse2 2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen
510c0d088a In pixman_image_set_transform() allow NULL for transform
Previously, this would crash unless the existing transform were also
NULL.
2011-02-18 06:21:38 -05:00
Søren Sandmann Pedersen
7feb710e60 Avoid marking images dirty when properties are reset
When an image property is set to the same value that it already has,
there is no reason to mark the image dirty and incur a recomputation
of the flags.
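
The idea can be sketched with a hypothetical image type and setter. The real pixman structures and function names differ.

```c
/* Hypothetical, simplified image type. */
typedef struct
{
    int repeat;   /* some property of the image */
    int dirty;    /* nonzero when derived flags must be recomputed */
} image_t;

static void
image_set_repeat (image_t *image, int repeat)
{
    if (image->repeat == repeat)
        return;              /* unchanged: keep the cached flags valid */

    image->repeat = repeat;
    image->dirty = 1;        /* changed: flags must be recomputed */
}
```

Callers that reset the same property on every frame then pay for the flag recomputation only when the value actually changes.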
2011-02-18 06:21:37 -05:00
Søren Sandmann Pedersen
3598ec26ec Add new public function pixman_add_triangles()
This allows some more code to be deleted from the X server. The
implementation consists of converting to trapezoids, and is shared
with pixman_composite_triangles().
2011-02-18 06:21:37 -05:00
Søren Sandmann Pedersen
964c7e7cd2 Optimize adding opaque trapezoids onto a8 destination.
When the source is opaque and the destination is alpha only, we can
avoid the temporary mask and just add the trapezoids directly.
2011-02-18 06:21:37 -05:00
Søren Sandmann Pedersen
0bc03482f1 Add a test program, tri-test
This program tests whether the new triangle support works.
2011-02-18 06:21:31 -05:00