pixman

mirror of https://salsa.debian.org/xorg-team/lib/pixman synced 2025-09-09 07:43:38 +00:00

Author	SHA1	Message	Date
Siarhei Siamashka	2f6c7b4f9d	ARM: NEON: added forgotten cache preload for over_n_8888/over_n_0565. Prefetch provides up to 40-50% better performance when working with large images and/or when there are many L2 cache misses on ARM Cortex-A8 @ 720MHz: == before == over_n_8888 = L1: 225.83 L2: 181.02 M: 55.57 ( 41.41%) HT: 38.96 VT: 36.92 R: 32.84 RT: 14.15 ( 123Kops/s) over_n_0565 = L1: 153.91 L2: 149.69 M: 83.17 ( 30.95%) HT: 50.41 VT: 49.15 R: 40.56 RT: 15.45 ( 131Kops/s) == after == over_n_8888 = L1: 222.39 L2: 170.95 M: 76.86 ( 57.27%) HT: 58.80 VT: 53.03 R: 45.51 RT: 14.13 ( 124Kops/s) over_n_0565 = L1: 151.87 L2: 149.54 M:125.63 ( 46.80%) HT: 67.85 VT: 57.54 R: 50.21 RT: 15.32 ( 130Kops/s)	2010-10-04 23:05:24 +03:00
Mika Yrjola	b924bb1f81	Fix "syntax error: empty declaration" warnings. These minor changes should fix a large number of macro-declaration-related "syntax error: empty declaration" warnings that appear when compiling the code with the Solaris Studio compiler.	2010-10-04 11:20:01 -04:00
Søren Sandmann Pedersen	73c1fefa1b	Delete simple repeat code. This was supposedly an optimization, but it has pathological cases where it definitely isn't. For example, a 1 x n image will cause it to have terrible memory access patterns and to generate a ton of modulus operations. Since no one has ever measured whether it actually is an improvement, since it does the repeating at the wrong stage in the pipeline, and since with the previous commit it can't be triggered anymore because we now require SAMPLES_COVER_CLIP for regular fast paths, just delete it.	2010-10-04 11:19:27 -04:00
Søren Sandmann Pedersen	a4d1c9d383	Fix bug in FAST_PATH_STD_FAST_PATH. The standard fast paths deal with two kinds of images: solids and bits. These two image types require different flags, but PIXMAN_STD_FAST_PATH uses the same ones for both. This patch makes it so that solid images just get the standard flags, while bits images must be untransformed and contain the destination clip within the sample grid. This means that the old FAST_PATH_COVERS_CLIP flag is no longer used, so it can be deleted.	2010-10-04 11:17:53 -04:00
Dmitri Vorobiev	10e13135c3	Some clean-ups in fence_malloc() and fence_free(). This patch removes an unnecessary typecast of MAP_FAILED, replaces an erroneous free() with the correct munmap() in the error path for a failing mprotect(), and, finally, removes calls to mprotect() that are unnecessary because munmap() does not require any specific memory protection.	2010-09-29 02:15:12 -04:00
Søren Sandmann Pedersen	ba693d2e88	Fix search-and-replace issue in lowlevel-blt-bench.c	2010-09-28 02:52:17 -04:00
Søren Sandmann Pedersen	77d3e5f6ff	Rename all the fast paths with _8000 in their names to _8. This inconsistent naming somehow survived the refactoring from a while back.	2010-09-28 00:07:47 -04:00
Liu Xinyun	ba69989374	Remove cache prefetch code. Performance decreases with cache prefetch, especially on Atom, so remove this code. The experiment follows. old: 0.19.5-with-cache-prefetch new: 0.19.5-without-cache-prefetch CPU: Intel Atom N270@1.6GHz OS: MeeGo (32 bits) Speedups ======== image-rgba poppler-0 17125.68 (17279.58 0.92%) -> 14765.36 (15926.49 3.54%): 1.16x speedup image-rgba ocitysmap-0 9008.25 (9040.41 7.50%) -> 8277.94 (8343.09 5.44%): 1.09x speedup image-rgba xfce4-terminal-a1-0 18020.76 (18230.68 0.97%) -> 16703.77 (16712.42 1.22%): 1.08x speedup image-rgba gnome-terminal-vim-0 25081.38 (25133.38 0.24%) -> 23407.47 (23652.98 0.54%): 1.07x speedup image-rgba firefox-talos-gfx-0 57916.97 (57973.20 0.11%) -> 54556.64 (54624.55 0.39%): 1.06x speedup image-rgba firefox-planet-gnome-0 102377.47 (103496.63 0.70%) -> 96816.65 (97075.54 0.15%): 1.06x speedup image-rgba swfdec-giant-steps-0 12376.24 (12616.84 1.02%) -> 11705.30 (11825.20 1.06%): 1.06x speedup CPU: Intel Core(TM)2 Duo CPU T9600@2.80GHz OS: Ubuntu 10.04 (64bits) Speedups ======== image-rgba ocitysmap-0 2671.46 (2691.82 8.55%) -> 2296.20 (2307.26 5.77%): 1.16x speedup image-rgba swfdec-giant-steps-0 1614.55 (1615.18 1.68%) -> 1532.84 (1538.52 0.72%): 1.05x speedup Signed-off-by: Liu Xinyun <xinyun.liu@intel.com> Signed-off-by: Chen Miaobo <miaobo.chen@intel.com>	2010-09-27 23:44:09 -04:00
Dmitri Vorobiev	56777f3f67	Use <sys/mman.h> macros only when they are available. Not all systems are regular Unices, so let's be careful with the mmap()-related stuff, which might be unavailable. This patch makes sure that mmap() and friends are used only when the <sys/mman.h> header is found.	2010-09-23 16:02:29 -04:00
Søren Sandmann Pedersen	39524a4687	Revert "add enable-cache-prefetch option". Revert this accidentally committed patch. This reverts commit `19ea0e16b9`.	2010-09-21 14:20:43 -04:00
Søren Sandmann Pedersen	e97da21049	If MAP_ANONYMOUS is not defined, define it to MAP_ANON. This hopefully fixes the build failure on OS X.	2010-09-21 14:12:00 -04:00
Liu Xinyun	19ea0e16b9	add enable-cache-prefetch option. OK, here is the work to clear all cache prefetch. Please review it. Thanks. On Tue, Sep 21, 2010 at 11:36:30PM +0800, Soeren Sandmann wrote: "Liu Xinyun <xinyun.liu@intel.com> writes: 'This patch is to add a new configuration option: enable-cache-prefetch, which is default yes. Here is a link which talks on cache issue. http://lists.freedesktop.org/archives/pixman/2010-June/000218.html When disable it on Atom CPU(configured with --enable-cache-prefetch=no), it will have a little performance gain. Here is the patch.' I think the cache prefetch code should just be deleted outright. No benchmarks that I'm aware of show it to be an improvement. Thanks, Soren" From bca2192ef524bcae4eea84d0ffed9e8c4855675f Mon Sep 17 00:00:00 2001 From: Liu Xinyun <xinyun.liu@intel.com> Date: Wed, 22 Sep 2010 00:11:56 +0800 Subject: [PATCH] remove cache prefetch	2010-09-21 12:35:51 -04:00
Søren Sandmann Pedersen	edd1733966	Post-release version bump to 0.19.5	2010-09-21 10:18:44 -04:00
Søren Sandmann Pedersen	e5b3a6e710	Pre-release version bump to 0.19.4	2010-09-21 10:11:34 -04:00
Søren Sandmann Pedersen	0742ba4164	compute_composite_region32: Zero extents before returning FALSE. If the extents of the composite region are broken such that x2 <= x1 or y2 <= y1, then we need to zero the extents before returning so that the region won't be completely broken when calling pixman_region32_fini().	2010-09-21 10:05:52 -04:00
Jonathan Morton	7cd4f2fa20	Add a lowlevel blitter benchmark. This test is a modified version of Siarhei's compositor throughput benchmark. It's expanded with explicit reporting of memory bandwidth consumption for the M-test, and with an additional 8x8-random test intended to determine peak ops/sec capability. Many more operations are also tested.	2010-09-21 08:50:18 -04:00
Dmitri Vorobiev	eab3a77877	Add noinline macro. This patch adds a noinline macro, which expands to compiler-dependent keywords that tell the compiler never to inline a function.	2010-09-21 08:50:17 -04:00
Dmitri Vorobiev	cab3261c0d	Add gettime() routine to test utils. Upcoming benchmark code will need a function to get the current time in seconds, and this patch introduces such a routine. We try to use the POSIX gettimeofday() function when available, and fall back to clock() when not.	2010-09-21 08:50:17 -04:00
Dmitri Vorobiev	fd3c87d460	Move aligned_malloc() to utils. The aligned_malloc() routine will be used in more than one test utility; at least the low-level blitter benchmark needs it. Therefore, let's make this function part of the common test utilities code.	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	f474783607	Enable bits_image_fetch_bilinear_affine_normal_r5g6b5	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	91521d30ab	Enable bits_image_fetch_bilinear_affine_reflect_r5g6b5	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	372d7b954a	Enable bits_image_fetch_bilinear_affine_none_r5g6b5	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	a826ae0e3a	Enable bits_image_fetch_bilinear_affine_pad_r5g6b5	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	c5238bd180	Enable bits_image_fetch_bilinear_affine_normal_a8	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	d12daefcdb	Enable bits_image_fetch_bilinear_affine_reflect_a8	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	9388be3293	Enable bits_image_fetch_bilinear_affine_none_a8	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	8e4d4e8d11	Enable bits_image_fetch_bilinear_affine_pad_a8	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	ce1f6c50b4	Enable bits_image_fetch_bilinear_affine_normal_x8r8g8b8	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	83f2ee3e95	Enable bits_image_fetch_bilinear_affine_reflect_x8r8g8b8	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	be37ae331c	Enable bits_image_fetch_bilinear_affine_none_x8r8g8b8	2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen	5f8a9bebc0	Enable bits_image_fetch_bilinear_affine_pad_x8r8g8b8	2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen	c59584cb86	Enable bits_image_fetch_bilinear_affine_normal_a8r8g8b8	2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen	2292cff304	Enable bits_image_fetch_bilinear_affine_reflect_a8r8g8b8	2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen	8b29162693	Enable bits_image_fetch_bilinear_affine_none_a8r8g8b8	2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen	e8555874e1	Enable bits_image_fetch_bilinear_affine_pad_a8r8g8b8	2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen	f9778c15e9	Use a macro to generate some {a,x}8r8g8b8, a8, and r5g6b5 bilinear fetchers. There are versions for all combinations of the a8r8g8b8, x8r8g8b8, a8 and r5g6b5 formats and the pad/reflect/none/normal repeat modes. The bulk of each scaler is an inline function that takes a format and a repeat mode as parameters. The new scalers are all commented out, but the next commits will enable them one at a time to facilitate bisecting.	2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen	6d1e10a8b5	test: Add affine-test. This test tests compositing with various affine transformations. It is almost identical to scaling-test, except that it also applies a random rotation in addition to the random scaling and translation.	2010-09-21 08:31:09 -04:00
Søren Sandmann Pedersen	4fa33537d7	analyze_extents: Fast path for non-transformed BITS images. Profiling various cairo traces showed that we were spending a lot of time in analyze_extents() and compute_sample_extents(). This was especially bad for glyphs, where all this computation was completely unnecessary. This patch adds a fast path for the case of non-transformed BITS images. The result is approximately a 6% improvement on the firefox-talos-gfx benchmark (image backend, 6/6 runs): before, min 13.797 s, median 13.848 s, stddev 0.20%; after, min 12.946 s, median 13.018 s, stddev 0.39%.	2010-09-21 08:31:09 -04:00
Søren Sandmann Pedersen	c97881fe3c	Move some of the FAST_PATH_COVERS_CLIP computation to pixman-image.c. When an image is solid or repeating, the FAST_PATH_COVERS_CLIP flag can be set in compute_image_info(). Also, the code that turned this flag off in pixman.c was not correct; it didn't take transformations into account. With this patch, pixman.c doesn't set the flag by default, but instead relies on the call to compute_sample_extents() to set it when possible.	2010-09-21 08:31:09 -04:00
Tor Lillqvist	3411f9399c	Support __thread on MINGW 4.5. By the way, it seems that with gcc 4.5.0 from mingw.org, __thread, SSE and MMX work fine. I added the below to pixman 0.18 and, as far as I can see, it works: make check reports no problems. (Earlier I had to use --disable-mmx and --disable-sse2.) Also, gtk-demo and gimp run fine. (Also a change to get rid of the warnings about -fvisibility being ignored.)	2010-09-21 08:31:08 -04:00
Søren Sandmann Pedersen	add0fd1bac	Clip composite region against the destination alpha map extents. Otherwise we can end up writing outside the alpha map.	2010-09-21 08:31:08 -04:00
Søren Sandmann Pedersen	af2f0080fe	Remove FAST_PATH_NARROW_FORMAT flag if there is a wide alpha map. If an image has an alpha map with wide components, then we need to use 64-bit processing for that image. We detect this situation in pixman-image.c and remove the FAST_PATH_NARROW_FORMAT flag. In pixman-general, the wide/narrow decision is now based on the flags instead of on the formats.	2010-09-21 08:31:08 -04:00
Søren Sandmann Pedersen	0afc613415	Rename FAST_PATH_NO_WIDE_FORMAT to FAST_PATH_NARROW_FORMAT. This avoids a negative in the name. Also, by renaming the "wide" variable in pixman-general.c to "narrow" and fixing up the logic correspondingly, the code there reads a lot more straightforwardly.	2010-09-21 08:31:08 -04:00
Søren Sandmann Pedersen	ae77548f0d	Update and extend the alphamap test. Test many more combinations of formats, test destination alpha maps, and test various different alpha origins. Also add a transformation to the destination, but comment it out because it is actually broken at the moment (and pretty difficult to fix).	2010-09-21 08:28:55 -04:00
Søren Sandmann Pedersen	dc9fe269ea	Add fence_malloc() and fence_free(). These variants of malloc() and free() try to surround the allocated memory with protected pages so that out-of-bounds accesses will cause a segmentation fault. If mprotect() and getpagesize() are not available, these functions are simply equivalent to malloc() and free().	2010-09-21 08:28:55 -04:00
Søren Sandmann Pedersen	f4dc73bad4	Do opacity computation with shifts instead of comparing with 0. Also add a COMPILE_TIME_ASSERT() macro and use it to assert that the shift is correct.	2010-09-21 08:28:55 -04:00
Siarhei Siamashka	517a77a992	SSE2 optimization for scaled over_8888_8888 operation with nearest filter. This is the first demo implementation; it should be possible to generalize it later to cover more operations with fewer lines of code. It should also be possible to use the '__builtin_constant_p' gcc builtin function as an efficient way of checking whether 'unit_x' is known to be zero at compile time (when processing padding pixels for NONE or PAD repeat). Benchmarks from Intel Core i7 860 (op=3, src_fmt=20028888, dst_fmt=20028888): nearest OVER before, 142.01 MPix/s; nearest OVER after, 314.99 MPix/s; nonscaled operation as a reference, 652.09 MPix/s.	2010-09-21 13:33:57 +03:00
Siarhei Siamashka	abc90dad57	NONE repeat support for fast scaling with nearest filter. Implemented very similarly to PAD repeat. gcc also seems to be able to completely eliminate the code responsible for left and right padding pixels for OVER operation with NONE repeat.	2010-09-21 13:33:08 +03:00
Siarhei Siamashka	45833d5b19	PAD repeat support for fast scaling with nearest filter. When processing pixels from the left and right padding, the same scanline function is used with 'unit_x' set to 0. It actually appears that gcc can handle this quite efficiently. When using the 'restrict' keyword, it is able to optimize the whole operation performed on left or right padding pixels to a small unrolled loop (the code is reduced to a simple fill implementation): 9b30: 89 08 mov %ecx,(%rax) 9b32: 89 48 04 mov %ecx,0x4(%rax) 9b35: 48 83 c0 08 add $0x8,%rax 9b39: 49 39 c0 cmp %rax,%r8 9b3c: 75 f2 jne 9b30 Without the 'restrict' keyword, there is one more instruction: reloading source pixel data from memory at the beginning of each iteration. That is slower, but also acceptable.	2010-09-21 13:32:11 +03:00
Siarhei Siamashka	3db0cc5c75	Introduce a fake PIXMAN_REPEAT_COVER constant. We need to implement true PIXMAN_REPEAT_NONE support later (padding the source with zero pixels), so it's better not to use PIXMAN_REPEAT_NONE for handling the FAST_PATH_SAMPLES_COVER_CLIP special case.	2010-09-21 13:30:59 +03:00
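
The guard-page idea from commit dc9fe269ea, combined with the later fixes in 10e13135c3 (munmap() instead of free() on the mprotect() error path, no mprotect() before munmap()), 56777f3f67 (only use mmap() when <sys/mman.h> exists) and e97da21049 (fall back to MAP_ANON), can be sketched in C as follows. This is a minimal sketch, not pixman's test code: the names fence_malloc_sketch()/fence_free_sketch(), the header layout, the 16-byte rounding and the HAVE_SYS_MMAN_H guess are all assumptions. The payload is placed directly against a trailing PROT_NONE page, so writing past its 16-byte-rounded end faults immediately.

```c
#define _DEFAULT_SOURCE 1 /* exposes MAP_ANONYMOUS under strict -std modes on glibc */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: a configure script would normally define HAVE_SYS_MMAN_H. */
#if !defined(HAVE_SYS_MMAN_H) && (defined(__unix__) || defined(__APPLE__))
#define HAVE_SYS_MMAN_H 1
#endif

#ifdef HAVE_SYS_MMAN_H
#include <sys/mman.h>
#include <unistd.h>

#if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
#define MAP_ANONYMOUS MAP_ANON /* some systems (e.g. older OS X) only spell it MAP_ANON */
#endif

typedef struct {
    uint8_t *base; /* start of the whole mapping */
    size_t total;  /* size of the whole mapping in bytes */
} fence_header_t;

void *fence_malloc_sketch(size_t len)
{
    size_t page = (size_t) sysconf(_SC_PAGESIZE);
    size_t payload = (len + 15) & ~(size_t) 15; /* keep the returned pointer 16-byte aligned */
    size_t data_pages = (sizeof(fence_header_t) + payload + page - 1) / page;
    size_t total = (data_pages + 2) * page;     /* leading guard + data + trailing guard */
    fence_header_t hdr;
    uint8_t *base, *trailing, *result;

    base = mmap(NULL, total, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    trailing = base + total - page;
    if (mprotect(base, page, PROT_NONE) != 0 ||
        mprotect(trailing, page, PROT_NONE) != 0)
    {
        munmap(base, total); /* memory came from mmap(), so release it with munmap(), not free() */
        return NULL;
    }

    /* Payload ends exactly at the trailing guard page; the header sits just before it. */
    result = trailing - payload;
    hdr.base = base;
    hdr.total = total;
    memcpy(result - sizeof hdr, &hdr, sizeof hdr);
    return result;
}

void fence_free_sketch(void *p)
{
    fence_header_t hdr;

    if (!p)
        return;
    memcpy(&hdr, (uint8_t *) p - sizeof hdr, sizeof hdr);
    assert(hdr.total > 0);
    /* No mprotect() first: munmap() releases pages whatever their protection is. */
    munmap(hdr.base, hdr.total);
}

#else /* no <sys/mman.h>: plain malloc()/free() */

void *fence_malloc_sketch(size_t len) { return malloc(len); }
void fence_free_sketch(void *p) { free(p); }

#endif
```

The trade-off of end-alignment is that underruns before the payload hit the header area rather than a guard page; a variant could place the payload right after the leading guard page instead.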
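
Commit 0742ba4164 zeroes the composite extents before compute_composite_region32() returns FALSE, so that an inverted box (x2 <= x1 or y2 <= y1) is never left behind for pixman_region32_fini(). The sketch below shows only that pattern on a plain box; box32_sketch_t and clip_composite_extents() are hypothetical names, and pixman's real code works on a pixman_region32_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit box; x2 and y2 are exclusive. */
typedef struct { int32_t x1, y1, x2, y2; } box32_sketch_t;

/* Intersect the composite extents with a clip box. If the result is empty,
 * zero the extents before returning false so that later cleanup code never
 * sees an inverted box. */
static bool
clip_composite_extents(box32_sketch_t *ext, const box32_sketch_t *clip)
{
    if (ext->x1 < clip->x1) ext->x1 = clip->x1;
    if (ext->y1 < clip->y1) ext->y1 = clip->y1;
    if (ext->x2 > clip->x2) ext->x2 = clip->x2;
    if (ext->y2 > clip->y2) ext->y2 = clip->y2;

    if (ext->x2 <= ext->x1 || ext->y2 <= ext->y1)
    {
        ext->x1 = ext->y1 = ext->x2 = ext->y2 = 0;
        return false;
    }
    return true;
}
```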
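
Commits eab3a77877, cab3261c0d and fd3c87d460 add a noinline macro, a gettime() helper and a shared aligned_malloc() to the test utilities. A minimal sketch of all three follows. It assumes a configure-style HAVE_GETTIMEOFDAY macro and POSIX posix_memalign(); it is not the pixman source, and sum_u32() is a hypothetical benchmark target.

```c
#define _POSIX_C_SOURCE 200112L /* for posix_memalign() under strict -std modes */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>
#ifdef HAVE_GETTIMEOFDAY
#include <sys/time.h>
#endif

/* noinline: tell the compiler never to inline a function (compiler-specific). */
#if defined(__GNUC__)
#define noinline __attribute__((noinline))
#elif defined(_MSC_VER)
#define noinline __declspec(noinline)
#else
#define noinline
#endif

/* Current time in seconds. HAVE_GETTIMEOFDAY is an assumed configure macro. */
double gettime(void)
{
#ifdef HAVE_GETTIMEOFDAY
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double) tv.tv_sec + (double) tv.tv_usec / 1000000.0;
#else
    /* Fallback: clock() measures processor time, not wall-clock time. */
    return (double) clock() / (double) CLOCKS_PER_SEC;
#endif
}

/* Allocate size bytes aligned to align (a power of two that is a multiple
 * of sizeof(void *)). Release the memory with free(). */
void *aligned_malloc(size_t align, size_t size)
{
    void *p = NULL;
    if (posix_memalign(&p, align, size) != 0)
        return NULL;
    return p;
}

/* Hypothetical benchmark target, kept out of line so that the call is
 * measured the same way regardless of where it is called from. */
static noinline uint32_t sum_u32(const uint32_t *buf, size_t n)
{
    uint32_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += buf[i];
    return s;
}
```

A benchmark would typically call gettime() before and after a loop of sum_u32() calls and divide the bytes processed by the elapsed time.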
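
Commit f9778c15e9 generates its bilinear fetchers from one inline function that takes the format and the repeat mode as parameters. The same specialization technique can be sketched with a much simpler nearest-pixel fetch for a single 32-bit format; repeat_mode_t, fetch_pixel_generic() and MAKE_FETCHER are hypothetical names, and no bilinear filtering or format conversion is done here. Because each generated function passes a constant mode into an inline function, an optimizing compiler can usually drop the switch from every specialization.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical enum mirroring the four repeat modes. */
typedef enum { REPEAT_NONE, REPEAT_NORMAL, REPEAT_PAD, REPEAT_REFLECT } repeat_mode_t;

/* Generic fetch of pixel x from a row of the given width. */
static inline uint32_t
fetch_pixel_generic(const uint32_t *row, int width, int x, repeat_mode_t repeat)
{
    switch (repeat)
    {
    case REPEAT_NONE:    /* outside the image: 0 (transparent black) */
        return (x < 0 || x >= width) ? 0 : row[x];
    case REPEAT_NORMAL:  /* tile: x mod width, kept non-negative */
        x %= width;
        if (x < 0)
            x += width;
        return row[x];
    case REPEAT_PAD:     /* clamp to the nearest edge pixel */
        if (x < 0)
            x = 0;
        else if (x >= width)
            x = width - 1;
        return row[x];
    case REPEAT_REFLECT: /* mirror: the pattern repeats every 2 * width pixels */
        x %= 2 * width;
        if (x < 0)
            x += 2 * width;
        return row[x < width ? x : 2 * width - 1 - x];
    }
    return 0;
}

/* Stamp out one fetcher per repeat mode, each with the mode fixed. */
#define MAKE_FETCHER(name, mode)                                        \
    static void name(const uint32_t *row, int width, int x, int n,      \
                     uint32_t *out)                                     \
    {                                                                   \
        for (int i = 0; i < n; i++)                                     \
            out[i] = fetch_pixel_generic(row, width, x + i, mode);      \
    }

MAKE_FETCHER(fetch_none, REPEAT_NONE)
MAKE_FETCHER(fetch_normal, REPEAT_NORMAL)
MAKE_FETCHER(fetch_pad, REPEAT_PAD)
MAKE_FETCHER(fetch_reflect, REPEAT_REFLECT)
```

Keeping the generated functions separate also makes it easy to enable them one at a time, which is how the subsequent commits in this log proceed.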
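
Commit f4dc73bad4 computes opacity with shifts instead of comparisons and guards the shift amount with COMPILE_TIME_ASSERT(). A sketch of that idea follows, assuming (hypothetically) that the opaque flag lives at bit 13; the negative-array-size typedef is one common way to write a compile-time assertion in C89/C99 and is not claimed to be the exact pixman macro.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag layout: the opaque flag is assumed to be bit 13. */
#define FLAG_IS_OPAQUE  0x2000u
#define OPAQUE_SHIFT    13

/* Fails to compile (negative array size) when x is false. */
#define COMPILE_TIME_ASSERT(x)                                          \
    do {                                                                \
        typedef int compile_time_assertion[(x) ? 1 : -1];               \
        (void) sizeof (compile_time_assertion);                         \
    } while (0)

/* Returns 0..3: bit 1 is set if the source is opaque, bit 0 if the
 * destination is opaque. Uses masks and shifts only, no comparisons. */
static unsigned
opacity_index(uint32_t src_flags, uint32_t dest_flags)
{
    COMPILE_TIME_ASSERT(FLAG_IS_OPAQUE == (1u << OPAQUE_SHIFT));

    unsigned is_dest_opaque = (dest_flags & FLAG_IS_OPAQUE) >> OPAQUE_SHIFT;
    unsigned is_source_opaque = (src_flags & FLAG_IS_OPAQUE) >> (OPAQUE_SHIFT - 1);

    return is_dest_opaque | is_source_opaque;
}
```

If someone later moves the flag to another bit without updating OPAQUE_SHIFT, the build breaks instead of silently producing a wrong index. Such an index could then select among four precomputed variants of an operator.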
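
Commit 45833d5b19 handles the left and right PAD pixels by calling the same nearest-neighbour scanline function with unit_x set to 0. The sketch below illustrates that with a hypothetical SRC-only scanline in 16.16 fixed point; scanline_nearest() and scale_row_pad() are assumed names, the pad-width arithmetic is this sketch's own, and it assumes unit_x > 0, src_width > 0 and coordinates that stay within int32 range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical nearest SRC scanline: dst[i] = src[(vx + i * unit_x) >> 16].
 * With vx == 0 and unit_x == 0 every pixel reads src[0], i.e. a fill. */
static void
scanline_nearest(uint32_t *restrict dst, const uint32_t *restrict src,
                 int32_t w, int32_t vx, int32_t unit_x)
{
    while (w-- > 0)
    {
        *dst++ = src[vx >> 16];
        vx += unit_x;
    }
}

/* Scale one row with PAD repeat; positions left of the image replicate
 * src[0], positions right of it replicate src[src_width - 1]. */
static void
scale_row_pad(uint32_t *dst, int32_t w, const uint32_t *src, int32_t src_width,
              int32_t vx, int32_t unit_x)
{
    int64_t max_vx = (int64_t) src_width << 16;
    int64_t left = 0, below_max;

    /* Pixels whose position is < 0: ceil(-vx / unit_x) of them. */
    if (vx < 0)
        left = ((int64_t) unit_x - 1 - vx) / unit_x;
    if (left > w)
        left = w;

    /* Pixels whose position is < max_vx: ceil((max_vx - vx) / unit_x). */
    below_max = (max_vx > vx) ? (max_vx - vx + unit_x - 1) / unit_x : 0;
    if (below_max > w)
        below_max = w;

    int32_t middle = (int32_t) (below_max > left ? below_max - left : 0);
    int32_t right = w - (int32_t) left - middle;

    /* Left padding: same function, unit_x = 0, reading src[0]. */
    scanline_nearest(dst, src, (int32_t) left, 0, 0);
    /* Real scaling over the pixels that land inside the source row. */
    scanline_nearest(dst + left, src, middle, (int32_t) (vx + left * unit_x), unit_x);
    /* Right padding: same function, unit_x = 0, reading the last pixel. */
    scanline_nearest(dst + left + middle, src + src_width - 1, right, 0, 0);
}
```

With restrict on both pointers the compiler may keep the source pixel in a register during the unit_x == 0 calls, which matches the fill-like loop quoted in the commit message.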

1970 Commits