pixman

mirror of https://salsa.debian.org/xorg-team/lib/pixman synced 2025-09-07 13:40:37 +00:00

Author	SHA1	Message	Date
Søren Sandmann Pedersen	3f7da59352	test: Parallelize composite.c with OpenMP Each test uses the test number as the random number seed; if it didn't, all the threads would run the same tests since they would all start from the same seed.	2010-10-11 12:06:20 -04:00
Søren Sandmann Pedersen	a10ccc9f30	test: Change composite so that it tests randomly generated images Previously this test would try to exhaustively test all combinations of formats and operators, which meant that it would take hours to run. Instead, generate images randomly and test compositing those. Cc: chris@chris-wilson.co.uk	2010-10-11 12:06:20 -04:00
Søren Sandmann Pedersen	55e4065cbb	test: Fix eval_diff() so that it provides useful error values. Previously, this function would evaluate the error under the assumption that the format was 565 or wider. This patch changes it to take the actual format into account. With that fixed, we can turn on testing for the rest of the formats. Cc: chris@chris-wilson.co.uk	2010-10-11 12:06:20 -04:00
Søren Sandmann Pedersen	fe411cf2ac	test: Fix bug in color_correct() in composite.c This function was using the number of bits in a channel as if it were a mask, which led to many spurious errors. With that fixed, we can turn on testing for all formats where all channels have 5 or more bits. Cc: chris@chris-wilson.co.uk	2010-10-11 12:06:20 -04:00
Søren Sandmann Pedersen	4e89a5b7f3	Remove broken optimizations in combine_disjoint_over_u() The first broken optimization is that it checks "a != 0x00" where it should check "s != 0x00". The other is that it skips the computation when alpha is 0xff. That is wrong because in the formula: min (1, (1 - Aa)/Ab) the render specification states that if Ab is 0, the quotient is defined to positive infinity. That is the case even if (1 - Aa) is 0.	2010-10-11 12:06:20 -04:00
Siarhei Siamashka	8d76c1b339	ARM: restore fallback to ARMv6 implementation from NEON in the delegate chain After fast path cache introduction, the overhead of having this fallback is insignificant. On the other hand, some of the ARM assembly optimizations (for example nearest neighbor scaling) do not need NEON.	2010-10-11 01:07:07 +03:00
Siarhei Siamashka	c748650d70	Use more unrolling for scaled src_0565_0565 with nearest filter Benchmark from Intel Core i7 860: == before == op=1, src_fmt=10020565, dst_fmt=10020565, speed=1335.29 MPix/s == after == op=1, src_fmt=10020565, dst_fmt=10020565, speed=1550.96 MPix/s == performance of nonscaled src_0565_0565 operation as a reference == op=1, src_fmt=10020565, dst_fmt=10020565, speed=2401.31 MPix/s Benchmark from ARM Cortex-A8: == before == op=1, src_fmt=10020565, dst_fmt=10020565, speed=81.79 MPix/s == after == op=1, src_fmt=10020565, dst_fmt=10020565, speed=89.55 MPix/s == performance of nonscaled src_0565_0565 operation as a reference == op=1, src_fmt=10020565, dst_fmt=10020565, speed=197.44 MPix/s	2010-10-11 01:07:01 +03:00
Siarhei Siamashka	a520c15e11	ARM: added 'neon_composite_out_reverse_8_0565' fast path == before == outrev_8_0565 = L1: 22.91 L2: 22.40 M: 18.75 ( 10.47%) HT: 12.62 VT: 12.22 R: 11.32 RT: 5.30 ( 58Kops/s) == after == outrev_8_0565 = L1: 176.27 L2: 151.70 M:108.79 ( 60.81%) HT: 50.43 VT: 37.16 R: 32.26 RT: 9.62 ( 97Kops/s)	2010-10-04 23:08:54 +03:00
Siarhei Siamashka	d8820360f7	ARM: added 'neon_composite_add_0565_8_0565' fast path == before == add_0565_8_0565 = L1: 14.05 L2: 14.03 M: 11.57 ( 12.94%) HT: 8.31 VT: 8.10 R: 7.47 RT: 3.64 ( 42Kops/s) == after == add_0565_8_0565 = L1: 123.36 L2: 94.70 M: 74.36 ( 83.15%) HT: 31.17 VT: 23.97 R: 21.06 RT: 6.42 ( 70Kops/s)	2010-10-04 23:08:47 +03:00
Siarhei Siamashka	2f6c7b4f9d	ARM: NEON: added forgotten cache preload for over_n_8888/over_n_0565 Prefetch provides up to 40-50% better performance when working with large images and/or when having lots of L2 cache misses on ARM Cortex-A8 @ 720MHz: == before == over_n_8888 = L1: 225.83 L2: 181.02 M: 55.57 ( 41.41%) HT: 38.96 VT: 36.92 R: 32.84 RT: 14.15 ( 123Kops/s) over_n_0565 = L1: 153.91 L2: 149.69 M: 83.17 ( 30.95%) HT: 50.41 VT: 49.15 R: 40.56 RT: 15.45 ( 131Kops/s) == after == over_n_8888 = L1: 222.39 L2: 170.95 M: 76.86 ( 57.27%) HT: 58.80 VT: 53.03 R: 45.51 RT: 14.13 ( 124Kops/s) over_n_0565 = L1: 151.87 L2: 149.54 M:125.63 ( 46.80%) HT: 67.85 VT: 57.54 R: 50.21 RT: 15.32 ( 130Kops/s)	2010-10-04 23:05:24 +03:00
Mika Yrjola	b924bb1f81	Fix "syntax error: empty declaration" warnings. These minor changes should fix a large number of macro declaration - related "syntax error: empty declaration" warnings which are seen while compiling the code with the Solaris Studio compiler.	2010-10-04 11:20:01 -04:00
Søren Sandmann Pedersen	73c1fefa1b	Delete simple repeat code This was supposedly an optimization, but it has pathological cases where it definitely isn't. For example a 1 x n image will cause it to have terrible memory access patterns and to generate a ton of modulus operations. Since no one has ever measured whether it actually is an improvement, and since it is doing the repeating at the wrong stage in the pipeline, and since with the previous commit it can't be triggered anymore because we now require SAMPLES_COVER_CLIP for regular fast paths, just delete it.	2010-10-04 11:19:27 -04:00
Søren Sandmann Pedersen	a4d1c9d383	Fix bug in FAST_PATH_STD_FAST_PATH The standard fast paths deal with two kinds of images: solids and bits. These two image types require different flags, but PIXMAN_STD_FAST_PATH uses the same ones for both. This patch makes it so that solid images just get the standard flags, while bits images must be untransformed and contain the destination clip within the sample grid. This means that the old FAST_PATH_COVERS_CLIP flag is now not used anymore, so it can be deleted.	2010-10-04 11:17:53 -04:00
Dmitri Vorobiev	10e13135c3	Some clean-ups in fence_malloc() and fence_free() This patch removes an unnecessary typecast of MAP_FAILED, replaces an erroneous free() by the correct munmap() in the error path for a failing mprotect(), and, finally, removes redundant calls to mprotect() that aren't necessary, because munmap() doesn't call for any specific memory protection.	2010-09-29 02:15:12 -04:00
Søren Sandmann Pedersen	ba693d2e88	Fix search-and-replace issue in lowlevel-blt-bench.c	2010-09-28 02:52:17 -04:00
Søren Sandmann Pedersen	77d3e5f6ff	Rename all the fast paths with _8000 in their names to _8 This inconsistent naming somehow survived the refactoring from a while back.	2010-09-28 00:07:47 -04:00
Liu Xinyun	ba69989374	Remove cache prefetch code. The performance is decreased with cache prefetch, especially for ATOM. So remove this code. Following is the experiment. old: 0.19.5-with-cache-prefetch new: 0.19.5-without-cache-prefetch CPU: Intel Atom N270@1.6GHz OS: MeeGo (32 bits) Speedups ======== image-rgba poppler-0 17125.68 (17279.58 0.92%) -> 14765.36 (15926.49 3.54%): 1.16x speedup image-rgba ocitysmap-0 9008.25 (9040.41 7.50%) -> 8277.94 (8343.09 5.44%): 1.09x speedup image-rgba xfce4-terminal-a1-0 18020.76 (18230.68 0.97%) -> 16703.77 (16712.42 1.22%): 1.08x speedup image-rgba gnome-terminal-vim-0 25081.38 (25133.38 0.24%) -> 23407.47 (23652.98 0.54%): 1.07x speedup image-rgba firefox-talos-gfx-0 57916.97 (57973.20 0.11%) -> 54556.64 (54624.55 0.39%): 1.06x speedup image-rgba firefox-planet-gnome-0 102377.47 (103496.63 0.70%) -> 96816.65 (97075.54 0.15%): 1.06x speedup image-rgba swfdec-giant-steps-0 12376.24 (12616.84 1.02%) -> 11705.30 (11825.20 1.06%): 1.06x speedup CPU: Intel Core(TM)2 Duo CPU T9600@2.80GHz OS: Ubuntu 10.04 (64bits) Speedups ======== image-rgba ocitysmap-0 2671.46 (2691.82 8.55%) -> 2296.20 (2307.26 5.77%): 1.16x speedup image-rgba swfdec-giant-steps-0 1614.55 (1615.18 1.68%) -> 1532.84 (1538.52 0.72%): 1.05x speedup Signed-off-by: Liu Xinyun <xinyun.liu@intel.com> Signed-off-by: Chen Miaobo <miaobo.chen@intel.com>	2010-09-27 23:44:09 -04:00
Dmitri Vorobiev	56777f3f67	Use <sys/mman.h> macros only when they are available Not all systems are regular Unices, so let's be careful with the mmap()-related stuff, which might be unavailable. This patch makes sure that mmap() and friends are used only when the <sys/mman.h> header is found.	2010-09-23 16:02:29 -04:00
Søren Sandmann Pedersen	39524a4687	Revert "add enable-cache-prefetch option" Revert this accidentally committed patch. This reverts commit `19ea0e16b9`.	2010-09-21 14:20:43 -04:00
Søren Sandmann Pedersen	e97da21049	If MAP_ANONYMOUS is not defined, define it to MAP_ANON. This hopefully fixes the build failure on OS X.	2010-09-21 14:12:00 -04:00
Liu Xinyun	19ea0e16b9	add enable-cache-prefetch option OK. here is the work to clear all cache prefetch. Please review it. 3x On Tue, Sep 21, 2010 at 11:36:30PM +0800, Soeren Sandmann wrote: > Liu Xinyun <xinyun.liu@intel.com> writes: > > > This patch is to add a new configuration option: enable-cache-prefetch, > > which is default yes. > > > > Here is a link which talks on cache issue. > > http://lists.freedesktop.org/archives/pixman/2010-June/000218.html > > > > When disable it on Atom CPU(configured with --enable-cache-prefetch=no), > > it will have a little performance gain. Here is the patch. > > I think the cache prefetch code should just be deleted outright. No > benchmarks that I'm aware of show it to be an improvement. > > > Thanks, > Soren >From bca2192ef524bcae4eea84d0ffed9e8c4855675f Mon Sep 17 00:00:00 2001 From: Liu Xinyun <xinyun.liu@intel.com> Date: Wed, 22 Sep 2010 00:11:56 +0800 Subject: [PATCH] remove cache prefetch	2010-09-21 12:35:51 -04:00
Søren Sandmann Pedersen	edd1733966	Post-release version bump to 0.19.5	2010-09-21 10:18:44 -04:00
Søren Sandmann Pedersen	e5b3a6e710	Pre-release version bump to 0.19.4	2010-09-21 10:11:34 -04:00
Søren Sandmann Pedersen	0742ba4164	compute_composite_region32: Zero extents before returning FALSE. If the extents of the composite region are broken such that x2 <= x1 or y2 <= y1, then we need to zero the extents before returning so that the region won't be completely broken when calling pixman_region32_fini().	2010-09-21 10:05:52 -04:00
Jonathan Morton	7cd4f2fa20	Add a lowlevel blitter benchmark This test is a modified version of Siarhei's compositor throughput benchmark. It's expanded with explicit reporting of memory bandwidth consumption for the M-test, and with an additional 8x8-random test intended to determine peak ops/sec capability. There are also quite a lot more operations tested for.	2010-09-21 08:50:18 -04:00
Dmitri Vorobiev	eab3a77877	Add noinline macro This patch adds a noinline macro, which expands to compiler-dependent keywords that tell the compiler to never inline a function.	2010-09-21 08:50:17 -04:00
Dmitri Vorobiev	cab3261c0d	Add gettime() routine to test utils Impending benchmark code will need a function to get current time in seconds, and this patch introduces such routine. We try to use the POSIX gettimeofday() function when available, and fall back to clock() when not.	2010-09-21 08:50:17 -04:00
Dmitri Vorobiev	fd3c87d460	Move aligned_malloc() to utils The aligned_malloc() routine will be used in more than one test utility. At least, a low-level blitter benchmark needs it. Therefore, let's make this function a part of common test utilities code.	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	f474783607	Enable bits_image_fetch_bilinear_affine_normal_r5g6b5	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	91521d30ab	Enable bits_image_fetch_bilinear_affine_reflect_r5g6b5	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	372d7b954a	Enable bits_image_fetch_bilinear_affine_none_r5g6b5	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	a826ae0e3a	Enable bits_image_fetch_bilinear_affine_pad_r5g6b5	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	c5238bd180	Enable bits_image_fetch_bilinear_affine_normal_a8	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	d12daefcdb	Enable bits_image_fetch_bilinear_affine_reflect_a8	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	9388be3293	Enable bits_image_fetch_bilinear_affine_none_a8	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	8e4d4e8d11	Enable bits_image_fetch_bilinear_affine_pad_a8	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	ce1f6c50b4	Enable bits_image_fetch_bilinear_affine_normal_x8r8g8b8	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	83f2ee3e95	Enable bits_image_fetch_bilinear_affine_reflect_x8r8g8b8	2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen	be37ae331c	Enable bits_image_fetch_bilinear_affine_none_x8r8g8b8	2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen	5f8a9bebc0	Enable bits_image_fetch_bilinear_affine_pad_x8r8g8b8	2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen	c59584cb86	Enable bits_image_fetch_bilinear_affine_normal_a8r8g8b8	2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen	2292cff304	Enable bits_image_fetch_bilinear_affine_reflect_a8r8g8b8	2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen	8b29162693	Enable bits_image_fetch_bilinear_affine_none_a8r8g8b8	2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen	e8555874e1	Enable bits_image_fetch_bilinear_affine_pad_a8r8g8b8	2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen	f9778c15e9	Use a macro to generate some {a,x}8r8g8b8, a8, and r5g6b5 bilinear fetchers. There are versions for all combinations of x8r8g8b8/a8r8g8b8 and pad/repeat/none/normal repeat modes. The bulk of each scaler is an inline function that takes a format and a repeat mode as parameters. The new scalers are all commented out, but the next commits will enable them one at a time to facilitate bisecting.	2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen	6d1e10a8b5	test: Add affine-test This test tests compositing with various affine transformations. It is almost identical to scaling-test, except that it also applies a random rotation in addition to the random scaling and translation.	2010-09-21 08:31:09 -04:00
Søren Sandmann Pedersen	4fa33537d7	analyze_extents: Fast path for non-transformed BITS images Profiling various cairo traces showed that we were spending a lot of time in analyze_extents and compute_sample_extents(). This was especially bad for glyphs where all this computation was completely unnecessary. This patch adds a fast path for the case of non-transformed BITS images. The result is approximately a 6% improvement on the firefox-talos-gfx benchmark: Before: [ # ] backend test min(s) median(s) stddev. count [ 0] image firefox-talos-gfx 13.797 13.848 0.20% 6/6 After: [ # ] backend test min(s) median(s) stddev. count [ 0] image firefox-talos-gfx 12.946 13.018 0.39% 6/6	2010-09-21 08:31:09 -04:00
Søren Sandmann Pedersen	c97881fe3c	Move some of the FAST_PATH_COVERS_CLIP computation to pixman-image.c When an image is solid or repeating, the FAST_PATH_COVERS_CLIP flag can be set in compute_image_info(). Also the code that turned this flag off in pixman.c was not correct; it didn't take transformations into account. With this patch, pixman.c doesn't set the flag by default, but instead relies on the call to compute_samples_extents() to set it when possible.	2010-09-21 08:31:09 -04:00
Tor Lillqvist	3411f9399c	Support __thread on MINGW 4.5 By the way, it seems that with gcc 4.5.0 from mingw.org, __thread, sse and mmx work fine. I added the below to pixman 0.18 and as far as I can see, it works. make check reports no problems. (Earlier I had to use --disable-mmx and --disable-sse2.) Also gtk-demo and gimp run fine. (Also a change to get rid of the warnings about -fvisibility being ignored.)	2010-09-21 08:31:08 -04:00
Søren Sandmann Pedersen	add0fd1bac	Clip composite region against the destination alpha map extents. Otherwise we can end up writing outside the alpha map.	2010-09-21 08:31:08 -04:00
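
Commit 4e89a5b7f3 above quotes the factor min (1, (1 - Aa)/Ab) and notes that the quotient counts as positive infinity whenever Ab is 0, even if (1 - Aa) is also 0. The following is only a sketch of that rule in 8-bit arithmetic, not the code from pixman: the function name `disjoint_over_factor` and the rounding are assumptions made for illustration.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not a pixman function): evaluate
 *     min (1, (1 - Aa) / Ab)
 * on 8-bit alpha values, where 0xff stands for 1.0.
 */
static uint8_t
disjoint_over_factor (uint8_t aa, uint8_t ab)
{
    uint8_t inv_aa = 0xff - aa;   /* (1 - Aa) */

    /* Ab == 0: the quotient is defined as +infinity, so the minimum
     * is 1. This must hold even when (1 - Aa) is 0, which is why
     * skipping the computation for Aa == 0xff is wrong. */
    if (ab == 0)
        return 0xff;

    /* Quotient >= 1: clamp to 1. */
    if (inv_aa >= ab)
        return 0xff;

    /* Quotient < 1: scale to 0..0xff with rounding (assumed). */
    return (uint8_t) ((inv_aa * 0xff + ab / 2) / ab);
}
```

With a fully opaque source (Aa = 0xff), the factor is 0 for any non-zero Ab but 0xff when Ab is 0, so the two cases cannot share a single shortcut.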


1929 Commits
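
Commit cab3261c0d above describes a gettime() routine that returns the current time in seconds, using gettimeofday() when available and clock() otherwise. A minimal sketch of that approach could look like the following; the name `gettime_sketch` and the `HAVE_GETTIMEOFDAY` guard are assumptions made for illustration, and the actual test utility may differ.

```c
#include <stddef.h>
#include <time.h>

/* HAVE_GETTIMEOFDAY is an assumed configure-style macro. */
#ifdef HAVE_GETTIMEOFDAY
#include <sys/time.h>
#endif

/* Hypothetical helper: current time in seconds as a double. */
double
gettime_sketch (void)
{
#ifdef HAVE_GETTIMEOFDAY
    struct timeval tv;

    /* Wall-clock time with microsecond resolution. */
    gettimeofday (&tv, NULL);
    return (double) tv.tv_sec + (double) tv.tv_usec / 1000000.0;
#else
    /* Fallback: processor time used by the program so far. */
    return (double) clock () / (double) CLOCKS_PER_SEC;
#endif
}
```

The two branches do not measure the same thing: gettimeofday() reports wall-clock time, while clock() reports CPU time consumed by the process, so timings are only comparable within one build configuration.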