pixman

mirror of https://salsa.debian.org/xorg-team/lib/pixman synced 2025-09-04 08:22:14 +00:00

Author	SHA1	Message	Date
Siarhei Siamashka	17feaa9c50	ARM: NEON optimization for bilinear scaled 'src_8888_8888' Initial NEON optimization for bilinear scaling. Can be probably improved more. Benchmark on ARM Cortex-A8: Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=20028888, dst=20028888, speed=6.70 MPix/s after: op=1, src=20028888, dst=20028888, speed=44.27 MPix/s	2011-02-28 15:47:58 +02:00
Siarhei Siamashka	350029396d	SSE2 optimization for bilinear scaled 'src_8888_8888' A primitive naive implementation of bilinear scaling using SSE2 intrinsics, which only handles one pixel at a time. It is approximately 2x faster than pixman general compositing path. Single pass processing without intermediate temporary buffer contributes to ~15% and loop unrolling contributes to ~20% of this speedup. Benchmark on Intel Core i7 (x86-64): Using cairo-perf-trace: before: image firefox-planet-gnome 12.566 12.610 0.23% 6/6 after: image firefox-planet-gnome 10.961 11.013 0.19% 5/6 Microbenchmark (scaling 2000x2000 image with scale factor close to 1x): before: op=1, src=20028888, dst=20028888, speed=70.48 MPix/s after: op=1, src=20028888, dst=20028888, speed=165.38 MPix/s	2011-02-28 15:47:52 +02:00
Siarhei Siamashka	0df43b8ae5	test: check correctness of 'bilinear_pad_repeat_get_scanline_bounds' Individual correctness check for the new bilinear scaling related supplementary function. This test program uses a bit wider range of input arguments, not covered by other tests.	2011-02-28 15:29:23 +02:00
Siarhei Siamashka	d506bf68fd	Main loop template for fast single pass bilinear scaling Can be used for implementing SIMD optimized fast path functions which work with bilinear scaled source images. Similar to the template for nearest scaling main loop, the following types of mask are supported: 1. no mask 2. non-scaled a8 mask with SAMPLES_COVER_CLIP flag 3. solid mask PAD repeat is fully supported. NONE repeat is partially supported (right now only works if source image has alpha channel or when alpha channel of the source image does not have any effect on the compositing operation).	2011-02-28 15:29:16 +02:00
Andrea Canciani	9ebde285fa	test: Silence MSVC warnings MSVC does not notice non-returning functions (abort() / assert(0)) and warns about paths which end with them in non-void functions: c:\cygwin\home\ranma42\code\fdo\pixman\test\fetch-test.c(114) : warning C4715: 'reader' : not all control paths return a value c:\cygwin\home\ranma42\code\fdo\pixman\test\stress-test.c(133) : warning C4715: 'real_reader' : not all control paths return a value c:\cygwin\home\ranma42\code\fdo\pixman\test\composite.c(431) : warning C4715: 'calc_op' : not all control paths return a value These warnings can be silenced by adding a return after the termination call.	2011-02-28 10:38:02 +01:00
Andrea Canciani	8868778ea1	Do not include unused headers pixman-combine32.h is included without being used both in pixman-image.c and in pixman-general.c.	2011-02-28 10:38:02 +01:00
Andrea Canciani	72f5e5f608	test: Add Makefile for Win32	2011-02-28 10:38:02 +01:00
Andrea Canciani	11305b4ecd	test: Fix tests for compilation on Windows The Microsoft C compiler cannot handle subobject initialization and Win32 does not provide snprintf. Work around these limitations by using normal struct initialization and using sprintf (a manual check shows that the buffer size is sufficient).	2011-02-28 10:38:02 +01:00
Andrea Canciani	20ed723a5a	Fix compilation on Win32 Makefile.win32 contained a typo and was missing the dependency from the built sources.	2011-02-28 10:38:01 +01:00
Søren Sandmann Pedersen	48e951000c	Post-release version bump to 0.21.7	2011-02-22 16:13:32 -05:00
Søren Sandmann Pedersen	8b33321660	Pre-release version bump to 0.21.6	2011-02-22 15:43:41 -05:00
Søren Sandmann Pedersen	2cb67d2a0b	Minor fix to the RELEASING file	2011-02-22 15:40:34 -05:00
Søren Sandmann Pedersen	3cdf74257b	Delete pixman-x64-mmx-emulation.h from pixman/Makefile.am	2011-02-22 15:28:17 -05:00
Siarhei Siamashka	65919ad17f	Ensure that tests run as the last step of a build for 'make check' Previously 'make check' would compile and run tests first, and only then proceed to compiling demos. Which is not very convenient because of the need to scroll back console output to see the tests verdict. Swapping order of SUBDIRS variable entries in Makefile.am resolves this.	2011-02-22 19:43:57 +02:00
Søren Sandmann Pedersen	34a7ac0474	sse2: Minor coding style cleanups. Also make pixman_fill_sse2() static.	2011-02-18 16:03:30 -05:00
Søren Sandmann Pedersen	10f69e5ec8	sse2: Remove pixman-x64-mmx-emulation.h Also stop including mmintrin.h	2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen	984be4def2	sse2: Delete obsolete or redundant comments	2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen	33d9890226	sse2: Remove all the core_combine_* functions Now that _mm_empty() is not used anymore, they are no longer different from the sse2_combine_* functions, so they can be consolidated.	2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen	87cd6b8056	sse2: Don't compile pixman-sse2.c with -mmmx anymore It's not necessary now that the file doesn't use MMX instructions.	2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen	e7fe5e35e9	sse2: Delete unused MMX functions and constants and all _mm_empty()s These are not needed because the SSE2 implementation doesn't use MMX anymore.	2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen	f88ae14c15	sse2: Convert all uses of MMX registers to use SSE2 registers instead. By avoiding use of MMX registers we won't need to call emms all over the place, which avoids various miscompilation issues.	2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen	7fb75bb3e6	Coding style: core_combine_in_u_pixelsse2 -> core_combine_in_u_pixel_sse2	2011-02-18 16:03:29 -05:00
Søren Sandmann Pedersen	510c0d088a	In pixman_image_set_transform() allow NULL for transform Previously, this would crash unless the existing transform were also NULL.	2011-02-18 06:21:38 -05:00
Søren Sandmann Pedersen	7feb710e60	Avoid marking images dirty when properties are reset When an image property is set to the same value that it already is, there is no reason to mark the image dirty and incur a recomputation of the flags.	2011-02-18 06:21:37 -05:00
Søren Sandmann Pedersen	3598ec26ec	Add new public function pixman_add_triangles() This allows some more code to be deleted from the X server. The implementation consists of converting to trapezoids, and is shared with pixman_composite_triangles().	2011-02-18 06:21:37 -05:00
Søren Sandmann Pedersen	964c7e7cd2	Optimize adding opaque trapezoids onto a8 destination. When the source is opaque and the destination is alpha only, we can avoid the temporary mask and just add the trapezoids directly.	2011-02-18 06:21:37 -05:00
Søren Sandmann Pedersen	0bc03482f1	Add a test program, tri-test This program tests whether the new triangle support works.	2011-02-18 06:21:31 -05:00
Søren Sandmann Pedersen	79e69aac8c	Add support for triangles to pixman. The Render X extension can draw triangles as well as trapezoids, but the implementation has always converted them to trapezoids. This patch moves the X server's triangle conversion code into pixman, where we can reuse the pixman_composite_trapezoid() code.	2011-02-15 09:25:18 -05:00
Søren Sandmann Pedersen	4e6dd4928d	Add a test program for pixman_composite_trapezoids(). A CRC32 based test program to check that pixman_composite_trapezoids() actually works.	2011-02-15 09:25:18 -05:00
Søren Sandmann Pedersen	803272e38c	Add pixman_composite_trapezoids(). This function is an implementation of the X server request Trapezoids. That request is what the X backend of cairo is using all the time; by moving it into pixman we can hopefully make it faster.	2011-02-15 09:25:18 -05:00
Søren Sandmann Pedersen	1feaf6bea7	test/Makefile.am: Move all the TEST_LDADD into a new global LDADD. This gets rid of a bunch of replicated *_LDADD clauses	2011-02-15 09:25:17 -05:00
Søren Sandmann Pedersen	1237fd9bc8	Add @TESTPROGS_EXTRA_LDFLAGS@ to AM_LDFLAGS Instead of explicitly adding it to each test program.	2011-02-15 09:25:17 -05:00
Søren Sandmann Pedersen	7dfe845786	Move all the GTK+ based test programs to a new subdir, "demos" This separates the test suite from the random gtk+ using test programs. "demos" is somewhat misleading because the programs there are not particularly exciting (with the possible exception of composite-test which shows off all the compositing operators).	2011-02-15 09:25:17 -05:00
Siarhei Siamashka	8e4100260b	SSE2 optimization for nearest scaled over_8888_n_8888 This operation shows up a little bit in some of the html5 based games from http://www.kesiev.com/akihabara/ === Cairo trace of the game intro animation for 'Legend of Sadness' === before: [ 0] image firefox-legend-of-sadness 46.286 46.298 0.01% 5/6 after: [ 0] image firefox-legend-of-sadness 45.088 45.102 0.04% 6/6 === Microbenchmark (scaling ~2000x~2000 -> ~2000x~2000) === before: translucent: op=3, src=8888, mask=s dst=8888, speed=131.30 MPix/s transparent: op=3, src=8888, mask=s dst=8888, speed=132.38 MPix/s opaque: op=3, src=8888, mask=s dst=8888, speed=167.90 MPix/s after: translucent: op=3, src=8888, mask=s dst=8888, speed=301.93 MPix/s transparent: op=3, src=8888, mask=s dst=8888, speed=770.70 MPix/s opaque: op=3, src=8888, mask=s dst=8888, speed=301.80 MPix/s	2011-02-15 14:32:41 +02:00
Siarhei Siamashka	39b86b032d	ARM: NEON optimization for nearest scaled over_0565_8_0565 In some cases may be used for html5 video when hardware acceleration is not available.	2011-02-15 14:32:34 +02:00
Siarhei Siamashka	9a90c1c90f	ARM: NEON optimization for nearest scaled over_8888_8_0565 In some cases may be used for html5 video when hardware acceleration is not available.	2011-02-15 14:32:28 +02:00
Siarhei Siamashka	cd1062ded4	ARM: new macro template for using scaled fast paths with a8 mask	2011-02-15 14:32:23 +02:00
Siarhei Siamashka	b099957887	Better support for NONE repeat in nearest scaling main loop template Scaling function now gets an extra boolean argument, which is set to TRUE when we are fetching padding pixels for NONE repeat. This allows to make a decision whether to interpret alpha as 0xFF or 0x00 for such pixels when working with formats which don't have alpha channel (for example x8r8g8b8 and r5g6b5).	2011-02-15 14:32:16 +02:00
Siarhei Siamashka	14f82083a1	Support for a8 and solid mask in nearest scaling main loop template In addition to the most common case of not having any mask at all, two variants of scaling with mask show up in cairo traces: 1. non-scaled a8 mask with SAMPLES_COVER_CLIP flag 2. solid mask This patch extends the nearest scaling main loop template to also support these cases.	2011-02-15 14:32:06 +02:00
Siarhei Siamashka	e83cee5aac	test: Extend scaling-test to support a8/solid mask and ADD operation Image width also has been increased because SIMD optimizations typically do more unrolling in the inner loops, and this needs to be tested.	2011-02-15 14:32:01 +02:00
Siarhei Siamashka	97447f440f	Use const modifiers for source buffers in nearest scaling fast paths	2011-02-15 14:29:54 +02:00
Siarhei Siamashka	8d359b00c5	C fast paths for a simple 90/270 degrees rotation Depending on CPU architecture, performance is in the range of 1.5 to 4 times slower than simple nonrotated copy (which would be an ideal case, perfectly utilizing memory bandwidth), but still is more than 7 times faster if compared to general path. This implementation sets a performance baseline for rotation. The use of SIMD instructions may further improve memory bandwidth utilization.	2011-02-10 16:18:01 +02:00
Siarhei Siamashka	e0c7948c97	New flags for 90/180/270 rotation These flags are set when the transform is a simple nonscaled 90/180/270 degrees rotation.	2011-02-10 16:17:24 +02:00
Siarhei Siamashka	3b68c295fd	test: affine-test updated to stress 90/180/270 degrees rotation more	2011-02-10 16:17:18 +02:00
Søren Sandmann Pedersen	56f173f0af	Add pixman-conical-gradient.c to Makefile.win32. Pointed out by Kirill Tishin.	2011-02-10 05:21:42 -05:00
Cyril Brulebois	fc1b85f258	Upload to unstable.	2011-02-06 05:31:27 +01:00
Cyril Brulebois	84bb9a7605	Mention upstream git URL in a comment.	2011-02-06 05:30:48 +01:00
Søren Sandmann Pedersen	7fd4897730	Add SSE2 fetcher for 0565 Before: add_0565_0565 = L1: 61.08 L2: 61.03 M: 60.57 ( 10.95%) HT: 46.85 VT: 45.25 R: 39.99 RT: 20.41 ( 233Kops/s) After: add_0565_0565 = L1: 77.84 L2: 76.25 M: 75.38 ( 13.71%) HT: 55.99 VT: 54.56 R: 45.41 RT: 21.95 ( 255Kops/s)	2011-02-03 03:25:05 -05:00
Søren Sandmann Pedersen	8414aa76c2	Improve performance of sse2_combine_over_u() Split this function into two, one that has a mask, and one that doesn't. This is a fairly substantial speed-up in many cases. New output of lowlevel-blt-bench over_x888_8_0565: over_x888_8_0565 = L1: 63.76 L2: 62.75 M: 59.37 ( 21.55%) HT: 45.89 VT: 43.55 R: 34.51 RT: 16.80 ( 201Kops/s)	2011-02-03 03:25:05 -05:00
Søren Sandmann Pedersen	08e855f15c	Add SSE2 fetcher for a8 New output of lowlevel-blt-bench over_x888_8_0565: over_x888_8_0565 = L1: 57.85 L2: 56.80 M: 54.14 ( 19.50%) HT: 42.64 VT: 40.56 R: 32.67 RT: 16.22 ( 195Kops/s) Based in part on code by Steve Snyder from https://bugs.freedesktop.org/show_bug.cgi?id=21173	2011-02-03 03:25:05 -05:00

1956 Commits