pixman

mirror of https://salsa.debian.org/xorg-team/lib/pixman synced 2025-09-02 21:33:55 +00:00

Author	SHA1	Message	Date
Siarhei Siamashka	70a923882c	ARM: a bit faster NEON bilinear scaling for r5g6b5 source images Instructions scheduling improved in the code responsible for fetching r5g6b5 pixels and converting them to the intermediate x8r8g8b8 color format used in the interpolation part of code. Still a lot of NEON stalls are remaining, which can be resolved later by the use of pipelining. Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=10020565, dst=10020565, speed=32.29 MPix/s op=1, src=10020565, dst=20020888, speed=36.82 MPix/s after: op=1, src=10020565, dst=10020565, speed=41.35 MPix/s op=1, src=10020565, dst=20020888, speed=49.16 MPix/s	2011-03-12 21:30:22 +02:00
Siarhei Siamashka	fe99673719	ARM: NEON optimization for bilinear scaled 'src_0565_0565' Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=10020565, dst=10020565, speed=3.30 MPix/s after: op=1, src=10020565, dst=10020565, speed=32.29 MPix/s	2011-03-12 21:30:18 +02:00
Siarhei Siamashka	29003c3bef	ARM: NEON optimization for bilinear scaled 'src_0565_x888' Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=10020565, dst=20020888, speed=3.39 MPix/s after: op=1, src=10020565, dst=20020888, speed=36.82 MPix/s	2011-03-12 21:30:13 +02:00
Siarhei Siamashka	2ee27e7d79	ARM: NEON optimization for bilinear scaled 'src_8888_0565' Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=20028888, dst=10020565, speed=6.56 MPix/s after: op=1, src=20028888, dst=10020565, speed=61.65 MPix/s	2011-03-12 21:30:09 +02:00
Siarhei Siamashka	11a0c5badb	ARM: use common macro template for bilinear scaled 'src_8888_8888' This is a cleanup for old and now duplicated code. The performance improvement is mostly coming from the enabled use of software prefetch, but instructions scheduling is also slightly better. Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=20028888, dst=20028888, speed=53.24 MPix/s after: op=1, src=20028888, dst=20028888, speed=74.36 MPix/s	2011-03-12 21:30:05 +02:00
Siarhei Siamashka	34098dba67	ARM: NEON: common macro template for bilinear scanline scalers This allows to generate bilinear scanline scaling functions targeting various source and destination color formats. Right now a8r8g8b8/x8r8g8b8 and r5g6b5 color formats are supported. More formats can be added if needed.	2011-03-12 21:30:00 +02:00
Siarhei Siamashka	66f4ee1b3b	ARM: new bilinear fast path template macro in 'pixman-arm-common.h' It can be reused in different ARM NEON bilinear scaling fast path functions.	2011-03-12 21:29:56 +02:00
Siarhei Siamashka	5921c17639	ARM: assembly optimized nearest scaled 'src_8888_8888' Benchmark on ARM Cortex-A8 r1p3 @500MHz, 32-bit LPDDR @166MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=20028888, dst=20028888, speed=44.36 MPix/s after: op=1, src=20028888, dst=20028888, speed=39.79 MPix/s Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=20028888, dst=20028888, speed=102.36 MPix/s after: op=1, src=20028888, dst=20028888, speed=163.12 MPix/s	2011-03-12 21:26:05 +02:00
Siarhei Siamashka	f3e17872f5	ARM: common macro for nearest scaling fast paths The code of nearest scaled 'src_0565_0565' function was generalized and moved to a common macro, so that it can be reused for other fast paths.	2011-03-12 21:24:40 +02:00
Siarhei Siamashka	bb3d1b67fd	ARM: use prefetch in nearest scaled 'src_0565_0565' Benchmark on ARM Cortex-A8 r1p3 @500MHz, 32-bit LPDDR @166MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=10020565, dst=10020565, speed=75.02 MPix/s after: op=1, src=10020565, dst=10020565, speed=73.63 MPix/s Benchmark on ARM Cortex-A8 r2p2 @1GHz, 32-bit LPDDR @200MHz: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=10020565, dst=10020565, speed=176.12 MPix/s after: op=1, src=10020565, dst=10020565, speed=267.50 MPix/s	2011-03-12 21:23:54 +02:00
Cyril Brulebois	3503f7956f	Upload to experimental.	2011-03-09 04:08:04 +01:00
Cyril Brulebois	19f2d3d9c1	Bump Standards-Version to 3.9.1 (no changes needed).	2011-03-09 04:07:54 +01:00
Cyril Brulebois	bec6320b0e	Add a quilt series placeholder file.	2011-03-09 04:04:13 +01:00
Cyril Brulebois	43375c5d66	Switch to dh.	2011-03-09 03:55:08 +01:00
Cyril Brulebois	d3975d7ff9	Update Uploaders list. Thanks, David!	2011-03-09 03:42:00 +01:00
Cyril Brulebois	b03a2e477b	Remove libpixman1-dev from Conflicts, last seen in etch!	2011-03-09 03:41:05 +01:00
Cyril Brulebois	61363cc614	Wrap Build-Depends.	2011-03-09 03:40:06 +01:00
Cyril Brulebois	b98292b4d5	Bump shlibs accordingly.	2011-03-09 03:39:07 +01:00
Cyril Brulebois	1e6491fdde	Update symbols file with new symbols.	2011-03-09 03:38:42 +01:00
Cyril Brulebois	1d60bb92f7	Bump changelogs.	2011-03-09 03:21:07 +01:00
Cyril Brulebois	a0ab0aecb2	Merge branch 'upstream-experimental' into debian-experimental	2011-03-09 03:20:36 +01:00
Søren Sandmann Pedersen	84e361c8e3	test: Do endian swapping of the source and destination images. Otherwise the test fails on big endian. Fix for bug 34767, reported by Siarhei Siamashka.	2011-03-07 14:08:00 -05:00
Søren Sandmann Pedersen	84f3c5a71a	test: In image_endian_swap() use pixman_image_get_format() to get the bpp. There is no reason to pass in the bpp as an argument; it can be gotten directly from the image.	2011-03-07 14:07:44 -05:00
Siarhei Siamashka	17feaa9c50	ARM: NEON optimization for bilinear scaled 'src_8888_8888' Initial NEON optimization for bilinear scaling. Can be probably improved more. Benchmark on ARM Cortex-A8: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=20028888, dst=20028888, speed=6.70 MPix/s after: op=1, src=20028888, dst=20028888, speed=44.27 MPix/s	2011-02-28 15:47:58 +02:00
Siarhei Siamashka	350029396d	SSE2 optimization for bilinear scaled 'src_8888_8888' A primitive naive implementation of bilinear scaling using SSE2 intrinsics, which only handles one pixel at a time. It is approximately 2x faster than pixman general compositing path. Single pass processing without intermediate temporary buffer contributes to ~15% and loop unrolling contributes to ~20% of this speedup. Benchmark on Intel Core i7 (x86-64): Using cairo-perf-trace: before: image firefox-planet-gnome 12.566 12.610 0.23% 6/6 after: image firefox-planet-gnome 10.961 11.013 0.19% 5/6 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=20028888, dst=20028888, speed=70.48 MPix/s after: op=1, src=20028888, dst=20028888, speed=165.38 MPix/s	2011-02-28 15:47:52 +02:00
Siarhei Siamashka	0df43b8ae5	test: check correctness of 'bilinear_pad_repeat_get_scanline_bounds' Individual correctness check for the new bilinear scaling related supplementary function. This test program uses a bit wider range of input arguments, not covered by other tests.	2011-02-28 15:29:23 +02:00
Siarhei Siamashka	d506bf68fd	Main loop template for fast single pass bilinear scaling Can be used for implementing SIMD optimized fast path functions which work with bilinear scaled source images. Similar to the template for nearest scaling main loop, the following types of mask are supported: 1. no mask 2. non-scaled a8 mask with SAMPLES_COVER_CLIP flag 3. solid mask PAD repeat is fully supported. NONE repeat is partially supported (right now only works if source image has alpha channel or when alpha channel of the source image does not have any effect on the compositing operation).	2011-02-28 15:29:16 +02:00
Andrea Canciani	9ebde285fa	test: Silence MSVC warnings MSVC does not notice non-returning functions (abort() / assert(0)) and warns about paths which end with them in non-void functions: c:\cygwin\home\ranma42\code\fdo\pixman\test\fetch-test.c(114) : warning C4715: 'reader' : not all control paths return a value c:\cygwin\home\ranma42\code\fdo\pixman\test\stress-test.c(133) : warning C4715: 'real_reader' : not all control paths return a value c:\cygwin\home\ranma42\code\fdo\pixman\test\composite.c(431) : warning C4715: 'calc_op' : not all control paths return a value These warnings can be silenced by adding a return after the termination call.	2011-02-28 10:38:02 +01:00
Andrea Canciani	8868778ea1	Do not include unused headers pixman-combine32.h is included without being used both in pixman-image.c and in pixman-general.c.	2011-02-28 10:38:02 +01:00
Andrea Canciani	72f5e5f608	test: Add Makefile for Win32	2011-02-28 10:38:02 +01:00
Andrea Canciani	11305b4ecd	test: Fix tests for compilation on Windows The Microsoft C compiler cannot handle subobject initialization and Win32 does not provide snprintf. Work around these limitations by using normal struct initialization and using sprintf (a manual check shows that the buffer size is sufficient).	2011-02-28 10:38:02 +01:00
Andrea Canciani	20ed723a5a	Fix compilation on Win32 Makefile.win32 contained a typo and was missing the dependency from the built sources.	2011-02-28 10:38:01 +01:00
Søren Sandmann Pedersen	48e951000c	Post-release version bump to 0.21.7	2011-02-22 16:13:32 -05:00
Søren Sandmann Pedersen	8b33321660	Pre-release version bump to 0.21.6	2011-02-22 15:43:41 -05:00
Søren Sandmann Pedersen	2cb67d2a0b	Minor fix to the RELEASING file	2011-02-22 15:40:34 -05:00
Søren Sandmann Pedersen	3cdf74257b	Delete pixman-x64-mmx-emulation.h from pixman/Makefile.am	2011-02-22 15:28:17 -05:00
Siarhei Siamashka	65919ad17f	Ensure that tests run as the last step of a build for 'make check' Previously 'make check' would compile and run tests first, and only then proceed to compiling demos. Which is not very convenient because of the need to scroll back console output to see the tests verdict. Swapping order of SUBDIRS variable entries in Makefile.am resolves this.	2011-02-22 19:43:57 +02:00
Søren Sandmann Pedersen	34a7ac0474	sse2: Minor coding style cleanups. Also make pixman_fill_sse2() static.	2011-02-18 16:03:30 -05:00
Søren Sandmann Pedersen	10f69e5ec8	sse2: Remove pixman-x64-mmx-emulation.h Also stop including mmintrin.h	2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen	984be4def2	sse2: Delete obsolete or redundant comments	2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen	33d9890226	sse2: Remove all the core_combine_* functions Now that _mm_empty() is not used anymore, they are no longer different from the sse2_combine_* functions, so they can be consolidated.	2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen	87cd6b8056	sse2: Don't compile pixman-sse2.c with -mmmx anymore It's not necessary now that the file doesn't use MMX instructions.	2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen	e7fe5e35e9	sse2: Delete unused MMX functions and constants and all _mm_empty()s These are not needed because the SSE2 implementation doesn't use MMX anymore.	2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen	f88ae14c15	sse2: Convert all uses of MMX registers to use SSE2 registers instead. By avoiding use of MMX registers we won't need to call emms all over the place, which avoids various miscompilation issues.	2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen	7fb75bb3e6	Coding style: core_combine_in_u_pixelsse2 -> core_combine_in_u_pixel_sse2	2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen	510c0d088a	In pixman_image_set_transform() allow NULL for transform Previously, this would crash unless the existing transform were also NULL.	2011-02-18 06:21:38 -05:00
Søren Sandmann Pedersen	7feb710e60	Avoid marking images dirty when properties are reset When an image property is set to the same value that it already is, there is no reason to mark the image dirty and incur a recomputation of the flags.	2011-02-18 06:21:37 -05:00
Søren Sandmann Pedersen	3598ec26ec	Add new public function pixman_add_triangles() This allows some more code to be deleted from the X server. The implementation consists of converting to trapezoids, and is shared with pixman_composite_triangles().	2011-02-18 06:21:37 -05:00
Søren Sandmann Pedersen	964c7e7cd2	Optimize adding opaque trapezoids onto a8 destination. When the source is opaque and the destination is alpha only, we can avoid the temporary mask and just add the trapezoids directly.	2011-02-18 06:21:37 -05:00
Søren Sandmann Pedersen	0bc03482f1	Add a test program, tri-test This program tests whether the new triangle support works.	2011-02-18 06:21:31 -05:00
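
Note on 9ebde285fa ("test: Silence MSVC warnings"): the commit message says C4715 is silenced by adding a return after the termination call. The pattern might look like the sketch below. The enum and function names are hypothetical and are not taken from the pixman test suite.

```c
#include <stdlib.h>

/* Hypothetical operator codes, for illustration only. */
typedef enum { OP_CLEAR, OP_SRC, OP_DST } op_t;

/* Returns the source blend factor for an operator. */
int
src_factor (op_t op)
{
    switch (op)
    {
    case OP_CLEAR:
        return 0;
    case OP_SRC:
        return 1;
    case OP_DST:
        return 0;
    }

    /* Not reachable for valid input. */
    abort ();

    /* Per the commit message, MSVC does not notice that abort() never
     * returns and reports C4715 without this otherwise dead return. */
    return 0;
}
```

The trailing return is never executed; it only gives the compiler a value on every control path.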

1929 Commits
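
Note on 70a923882c: the commit message describes fetching r5g6b5 pixels and converting them to the intermediate x8r8g8b8 format before interpolation. A scalar C sketch of such a conversion is given below, assuming the usual bit-replication approach for widening 5-bit and 6-bit channels to 8 bits. The function name is illustrative, and the NEON assembly in the commit may do this differently.

```c
#include <stdint.h>

/* Expand one r5g6b5 pixel to x8r8g8b8.
 * Each channel is widened to 8 bits by replicating its top bits into the
 * low bits, so 0x1f maps to 0xff and 0 maps to 0. The unused x8 byte is
 * left as zero (an assumption made for this sketch). */
uint32_t
convert_0565_to_x888 (uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1f;   /* 5 bits of red   */
    uint32_t g = (p >> 5) & 0x3f;    /* 6 bits of green */
    uint32_t b = p & 0x1f;           /* 5 bits of blue  */

    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);

    return (r << 16) | (g << 8) | b;
}
```

For example, `convert_0565_to_x888 (0xF800)` yields `0x00FF0000` (pure red), and `0xFFFF` yields `0x00FFFFFF`.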