pixman

mirror of https://salsa.debian.org/xorg-team/lib/pixman synced 2025-09-02 12:15:01 +00:00

Author	SHA1	Message	Date
Søren Sandmann Pedersen	e75e6a4ef5	ARM: Add 'neon_composite_over_n_8888_0565_ca' fast path This improves the performance of the firefox-talos-gfx benchmark with the image16 backend. Benchmark on an 800 MHz ARM Cortex A8: Before: [ # ] backend test min(s) median(s) stddev. count [ 0] image16 firefox-talos-gfx 121.773 122.218 0.15% 6/6 After: [ # ] backend test min(s) median(s) stddev. count [ 0] image16 firefox-talos-gfx 85.247 85.563 0.22% 6/6 V2: Slightly better instruction scheduling based on comments from Taekyun Kim. V3: Eliminate all stalls from the inner loop. Also based on comments from Taekyun Kim.	2011-04-18 16:25:36 -04:00
Gilles Espinasse	1670b95214	Fix OpenMP not supported case PIXMAN_LINK_WITH_ENV did not fail unless -Wall -Werror is used. So even when the compiler did not support OpenMP, USE_OPENMP was defined. Fix that by running the second OpenMP test only when the first AC_OPENMP test finds support. configure was tested in these cases: gcc without libgomp support (no openmp option, --enable-openmp and --disable-openmp); gcc with libgomp support (no openmp option, --enable-openmp and --disable-openmp). Not tested with autoconf versions that do not know OpenMP (<2.62). Warn when --enable-openmp is requested but no support is found. Signed-off-by: Gilles Espinasse <g.esp@free.fr>	2011-04-18 16:13:58 -04:00
Gilles Espinasse	b9e8f7fb74	Fix missing AC_MSG_RESULT value from Werror test Use the correct variable name Signed-off-by: Gilles Espinasse <g.esp@free.fr>	2011-04-18 16:13:58 -04:00
Siarhei Siamashka	caae4e82ff	ARM: pipelined NEON implementation of bilinear scaled 'src_8888_0565' Benchmark on ARM Cortex-A8 r1p3 @600MHz, 32-bit LPDDR @166MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=20028888, dst=10020565, speed=33.59 MPix/s after: op=1, src=20028888, dst=10020565, speed=46.25 MPix/s Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=20028888, dst=10020565, speed=63.86 MPix/s after: op=1, src=20028888, dst=10020565, speed=84.22 MPix/s	2011-04-11 10:48:35 +03:00
Siarhei Siamashka	d080d59b80	ARM: pipelined NEON implementation of bilinear scaled 'src_8888_8888' Performance of the inner loop when working with the data in L1 cache: ARM Cortex-A8: 41 cycles per 4 pixels (no stalls and partial dual issue) ARM Cortex-A9: 48 cycles per 4 pixels (no stalls) It might be still possible to improve performance even more on ARM Cortex-A8 with a better use of dual issue. Benchmark on ARM Cortex-A8 r1p3 @600MHz, 32-bit LPDDR @166MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=20028888, dst=20028888, speed=40.38 MPix/s after: op=1, src=20028888, dst=20028888, speed=48.47 MPix/s Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=20028888, dst=20028888, speed=79.68 MPix/s after: op=1, src=20028888, dst=20028888, speed=93.11 MPix/s	2011-04-11 10:48:30 +03:00
Siarhei Siamashka	b496a8b279	ARM: support different levels of loop unrolling in bilinear scaler Now an extra 'flag' parameter is supported in the bilinear scanline scaling function generation macro. It can be used to enable 4 or 8 pixels per loop iteration unrolling and provide save/restore code for d8-d15 registers.	2011-04-11 10:48:24 +03:00
Siarhei Siamashka	34ca9cf03f	ARM: use less ARM instructions in NEON bilinear scaling code This reduces code size and also puts less pressure on the instruction decoder.	2011-04-11 10:48:14 +03:00
Siarhei Siamashka	0f7be9f72e	ARM: support for software pipelining in bilinear macros Now it's possible to override the main loop of bilinear scaling code with optimized pipelined implementation.	2011-04-11 10:48:10 +03:00
Siarhei Siamashka	9638af9583	ARM: use aligned memory writes in NEON bilinear scaling code	2011-04-11 10:48:05 +03:00
Siarhei Siamashka	8bba3a0e1e	ARM: tweaked horizontal weights update in NEON bilinear scaling code Moving horizontal interpolation weights update instructions from the beginning of loop to its end allows to hide some pipeline stalls and improve performance.	2011-04-11 10:48:01 +03:00
Cyril Brulebois	eade7b4dbd	Upload to unstable.	2011-04-10 23:08:45 +02:00
Søren Sandmann Pedersen	a215322267	ARM: Tiny improvement in over_n_8888_8888_ca_process_pixblock_head Instead of two mvn d24, d24 mvn d25, d25 use just one mvn q12, q12 Also move another vmvn instruction into the created pipeline bubble, as pointed out by Siarhei.	2011-04-06 23:03:19 -04:00
Søren Sandmann Pedersen	44f99735d9	Makefile.am: Put development releases in "snapshots" directory Up until now, all pixman releases, both snapshots and releases, were uploaded to the "releases" directory on www.cairographics.org, but it's better to put development snapshots in the "snapshots" directory. This patch changes Makefile.am to do that.	2011-04-06 23:03:10 -04:00
Steve Langasek	c6ce22e73a	build for multiarch	2011-03-26 00:30:06 -07:00
Søren Sandmann Pedersen	ad3cbfb073	test: Fix infinite loop in composite When run in PIXMAN_RANDOMIZE_TESTS mode, this test would go into an infinite loop because the loop started at 'seed' but the stop condition was still N_TESTS.	2011-03-22 13:43:29 -04:00
Alexandros Frantzis	b514e63cfc	Add support for the r8g8b8a8 and r8g8b8x8 formats to the tests.	2011-03-22 13:43:29 -04:00
Alexandros Frantzis	f05a90e5f8	Add simple support for the r8g8b8a8 and r8g8b8x8 formats. This format is particularly useful on big-endian architectures, where RGBA in memory/file order corresponds to r8g8b8a8 as an uint32_t. This is important because RGBA is in some cases the only available choice (for example as a pixel format in OpenGL ES 2.0).	2011-03-22 13:43:29 -04:00
Søren Sandmann Pedersen	7eb0abb5e8	test: Randomize some tests if PIXMAN_RANDOMIZE_TESTS is set This patch makes it so that composite and stress-test will start from a random seed if the PIXMAN_RANDOMIZE_TESTS environment variable is set. Running the test suite in this mode is useful to get more test coverage. Also, in stress-test.c make it so that setting the initial seed causes threads to be turned off. This makes it much easier to see when something fails.	2011-03-19 08:51:35 -04:00
Søren Sandmann Pedersen	6b27768d81	Simplify the prototype for iterator initializers. All of the information previously passed to the iterator initializers is now available in the iterator itself, so there is no need to pass it as arguments anymore.	2011-03-18 16:23:10 -04:00
Søren Sandmann Pedersen	74d0f44b6d	Fill out parts of iters in _pixman_implementation_{src,dest}_iter_init() This makes _pixman_implementation_{src,dest}_iter_init() responsible for filling parts of the information in the iterators. Specifically, the information passed as arguments is stored in the iterator. Also add a height field to pixman_iter_t().	2011-03-18 16:23:10 -04:00
Søren Sandmann Pedersen	be4eaa0e4f	In delegate_{src,dest}_iter_init() call delegate directly. There is no reason to go through _pixman_implementation_{src,dest}_iter_init(), especially since _pixman_implementation_src_iter_init() is doing various other checks that only need to be done once. Also call delegate->src_iter_init() directly in pixman-sse2.c	2011-03-18 16:23:10 -04:00
Siarhei Siamashka	70a923882c	ARM: a bit faster NEON bilinear scaling for r5g6b5 source images Instructions scheduling improved in the code responsible for fetching r5g6b5 pixels and converting them to the intermediate x8r8g8b8 color format used in the interpolation part of code. Still a lot of NEON stalls are remaining, which can be resolved later by the use of pipelining. Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=10020565, dst=10020565, speed=32.29 MPix/s op=1, src=10020565, dst=20020888, speed=36.82 MPix/s after: op=1, src=10020565, dst=10020565, speed=41.35 MPix/s op=1, src=10020565, dst=20020888, speed=49.16 MPix/s	2011-03-12 21:30:22 +02:00
Siarhei Siamashka	fe99673719	ARM: NEON optimization for bilinear scaled 'src_0565_0565' Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=10020565, dst=10020565, speed=3.30 MPix/s after: op=1, src=10020565, dst=10020565, speed=32.29 MPix/s	2011-03-12 21:30:18 +02:00
Siarhei Siamashka	29003c3bef	ARM: NEON optimization for bilinear scaled 'src_0565_x888' Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=10020565, dst=20020888, speed=3.39 MPix/s after: op=1, src=10020565, dst=20020888, speed=36.82 MPix/s	2011-03-12 21:30:13 +02:00
Siarhei Siamashka	2ee27e7d79	ARM: NEON optimization for bilinear scaled 'src_8888_0565' Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=20028888, dst=10020565, speed=6.56 MPix/s after: op=1, src=20028888, dst=10020565, speed=61.65 MPix/s	2011-03-12 21:30:09 +02:00
Siarhei Siamashka	11a0c5badb	ARM: use common macro template for bilinear scaled 'src_8888_8888' This is a cleanup for old and now duplicated code. The performance improvement is mostly coming from the enabled use of software prefetch, but instructions scheduling is also slightly better. Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=20028888, dst=20028888, speed=53.24 MPix/s after: op=1, src=20028888, dst=20028888, speed=74.36 MPix/s	2011-03-12 21:30:05 +02:00
Siarhei Siamashka	34098dba67	ARM: NEON: common macro template for bilinear scanline scalers This allows to generate bilinear scanline scaling functions targeting various source and destination color formats. Right now a8r8g8b8/x8r8g8b8 and r5g6b5 color formats are supported. More formats can be added if needed.	2011-03-12 21:30:00 +02:00
Siarhei Siamashka	66f4ee1b3b	ARM: new bilinear fast path template macro in 'pixman-arm-common.h' It can be reused in different ARM NEON bilinear scaling fast path functions.	2011-03-12 21:29:56 +02:00
Siarhei Siamashka	5921c17639	ARM: assembly optimized nearest scaled 'src_8888_8888' Benchmark on ARM Cortex-A8 r1p3 @500MHz, 32-bit LPDDR @166MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=20028888, dst=20028888, speed=44.36 MPix/s after: op=1, src=20028888, dst=20028888, speed=39.79 MPix/s Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=20028888, dst=20028888, speed=102.36 MPix/s after: op=1, src=20028888, dst=20028888, speed=163.12 MPix/s	2011-03-12 21:26:05 +02:00
Siarhei Siamashka	f3e17872f5	ARM: common macro for nearest scaling fast paths The code of nearest scaled 'src_0565_0565' function was generalized and moved to a common macro, so that it can be reused for other fast paths.	2011-03-12 21:24:40 +02:00
Siarhei Siamashka	bb3d1b67fd	ARM: use prefetch in nearest scaled 'src_0565_0565' Benchmark on ARM Cortex-A8 r1p3 @500MHz, 32-bit LPDDR @166MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=10020565, dst=10020565, speed=75.02 MPix/s after: op=1, src=10020565, dst=10020565, speed=73.63 MPix/s Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=10020565, dst=10020565, speed=176.12 MPix/s after: op=1, src=10020565, dst=10020565, speed=267.50 MPix/s	2011-03-12 21:23:54 +02:00
Cyril Brulebois	3503f7956f	Upload to experimental.	2011-03-09 04:08:04 +01:00
Cyril Brulebois	19f2d3d9c1	Bump Standards-Version to 3.9.1 (no changes needed).	2011-03-09 04:07:54 +01:00
Cyril Brulebois	bec6320b0e	Add a quilt series placeholder file.	2011-03-09 04:04:13 +01:00
Cyril Brulebois	43375c5d66	Switch to dh.	2011-03-09 03:55:08 +01:00
Cyril Brulebois	d3975d7ff9	Update Uploaders list. Thanks, David!	2011-03-09 03:42:00 +01:00
Cyril Brulebois	b03a2e477b	Remove libpixman1-dev from Conflicts, last seen in etch!	2011-03-09 03:41:05 +01:00
Cyril Brulebois	61363cc614	Wrap Build-Depends.	2011-03-09 03:40:06 +01:00
Cyril Brulebois	b98292b4d5	Bump shlibs accordingly.	2011-03-09 03:39:07 +01:00
Cyril Brulebois	1e6491fdde	Update symbols file with new symbols.	2011-03-09 03:38:42 +01:00
Cyril Brulebois	1d60bb92f7	Bump changelogs.	2011-03-09 03:21:07 +01:00
Cyril Brulebois	a0ab0aecb2	Merge branch 'upstream-experimental' into debian-experimental	2011-03-09 03:20:36 +01:00
Søren Sandmann Pedersen	84e361c8e3	test: Do endian swapping of the source and destination images. Otherwise the test fails on big endian. Fix for bug 34767, reported by Siarhei Siamashka.	2011-03-07 14:08:00 -05:00
Søren Sandmann Pedersen	84f3c5a71a	test: In image_endian_swap() use pixman_image_get_format() to get the bpp. There is no reason to pass in the bpp as an argument; it can be gotten directly from the image.	2011-03-07 14:07:44 -05:00
Siarhei Siamashka	17feaa9c50	ARM: NEON optimization for bilinear scaled 'src_8888_8888' Initial NEON optimization for bilinear scaling. Can be probably improved more. Benchmark on ARM Cortex-A8: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=20028888, dst=20028888, speed=6.70 MPix/s after: op=1, src=20028888, dst=20028888, speed=44.27 MPix/s	2011-02-28 15:47:58 +02:00
Siarhei Siamashka	350029396d	SSE2 optimization for bilinear scaled 'src_8888_8888' A primitive naive implementation of bilinear scaling using SSE2 intrinsics, which only handles one pixel at a time. It is approximately 2x faster than pixman general compositing path. Single pass processing without intermediate temporary buffer contributes to ~15% and loop unrolling contributes to ~20% of this speedup. Benchmark on Intel Core i7 (x86-64): Using cairo-perf-trace: before: image firefox-planet-gnome 12.566 12.610 0.23% 6/6 after: image firefox-planet-gnome 10.961 11.013 0.19% 5/6 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=20028888, dst=20028888, speed=70.48 MPix/s after: op=1, src=20028888, dst=20028888, speed=165.38 MPix/s	2011-02-28 15:47:52 +02:00
Siarhei Siamashka	0df43b8ae5	test: check correctness of 'bilinear_pad_repeat_get_scanline_bounds' Individual correctness check for the new bilinear scaling related supplementary function. This test program uses a bit wider range of input arguments, not covered by other tests.	2011-02-28 15:29:23 +02:00
Siarhei Siamashka	d506bf68fd	Main loop template for fast single pass bilinear scaling Can be used for implementing SIMD optimized fast path functions which work with bilinear scaled source images. Similar to the template for nearest scaling main loop, the following types of mask are supported: 1. no mask 2. non-scaled a8 mask with SAMPLES_COVER_CLIP flag 3. solid mask PAD repeat is fully supported. NONE repeat is partially supported (right now only works if source image has alpha channel or when alpha channel of the source image does not have any effect on the compositing operation).	2011-02-28 15:29:16 +02:00
Andrea Canciani	9ebde285fa	test: Silence MSVC warnings MSVC does not notice non-returning functions (abort() / assert(0)) and warns about paths which end with them in non-void functions: c:\cygwin\home\ranma42\code\fdo\pixman\test\fetch-test.c(114) : warning C4715: 'reader' : not all control paths return a value c:\cygwin\home\ranma42\code\fdo\pixman\test\stress-test.c(133) : warning C4715: 'real_reader' : not all control paths return a value c:\cygwin\home\ranma42\code\fdo\pixman\test\composite.c(431) : warning C4715: 'calc_op' : not all control paths return a value These warnings can be silenced by adding a return after the termination call.	2011-02-28 10:38:02 +01:00
Andrea Canciani	8868778ea1	Do not include unused headers pixman-combine32.h is included without being used both in pixman-image.c and in pixman-general.c.	2011-02-28 10:38:02 +01:00
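
Note on commit f05a90e5f8 (r8g8b8a8 and r8g8b8x8 formats): the message says that on big-endian architectures, RGBA in memory/file order corresponds to r8g8b8a8 as an uint32_t. The sketch below illustrates that claim in plain C without using pixman itself. The helper names pack_r8g8b8a8 and load_be32 are hypothetical and are not part of the pixman API; load_be32 only emulates what a big-endian CPU would read from four consecutive bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack channels the way the format name reads for a
 * uint32_t, with red in the most significant byte and alpha in the least. */
static uint32_t pack_r8g8b8a8(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return ((uint32_t)r << 24) | ((uint32_t)g << 16) |
           ((uint32_t)b << 8)  |  (uint32_t)a;
}

/* Hypothetical helper: the value a big-endian CPU would load from four
 * consecutive bytes, computed portably so it gives the same result on any host. */
static uint32_t load_be32(const uint8_t bytes[4])
{
    return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8)  |  (uint32_t)bytes[3];
}

/* Self-check: bytes stored as R, G, B, A read back as one r8g8b8a8 pixel. */
static void check_rgba_memory_order(void)
{
    const uint8_t rgba_bytes[4] = { 0x11, 0x22, 0x33, 0x44 }; /* R, G, B, A */
    assert(load_be32(rgba_bytes) == pack_r8g8b8a8(0x11, 0x22, 0x33, 0x44));
    assert(load_be32(rgba_bytes) == 0x11223344u);
}
```

Calling check_rgba_memory_order() from a test program passes on any host, because the big-endian load is emulated rather than performed with a native pointer cast. On a little-endian machine a native 32-bit load of the same bytes would give 0x44332211 instead, so the correspondence the commit describes holds only on big-endian targets.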


1850 Commits
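
Note on commit ad3cbfb073 (infinite loop in composite): the message describes a loop that started at 'seed' while its stop condition was still N_TESTS. The actual patch is not shown in this log, so the following is only a sketch of one way to bound such a loop, under the assumption that the test should run a fixed number of cases starting from the seed. The function name count_cases_from and the small N_TESTS value are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define N_TESTS 8u /* hypothetical small value; the real test uses its own constant */

/* Hypothetical sketch: run exactly N_TESTS cases starting at 'seed'.
 * The stop condition is expressed relative to the start value, so the loop
 * terminates for every seed, including seeds larger than N_TESTS.
 * Unsigned arithmetic wraps, which keeps the bound valid near UINT32_MAX. */
static uint32_t count_cases_from(uint32_t seed)
{
    const uint32_t end = (uint32_t)(seed + N_TESTS);
    uint32_t ran = 0;

    for (uint32_t i = seed; i != end; i++)
        ran++; /* a real test would run case number i here */

    return ran;
}
```

With this form, count_cases_from(0), count_cases_from(1000) and count_cases_from(UINT32_MAX - 2) all return N_TESTS. A loop that starts at the seed but compares against the fixed constant N_TESTS has no such guarantee once the seed is chosen at random, which is the situation the commit message describes.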