pixman

mirror of https://salsa.debian.org/xorg-team/lib/pixman synced 2025-09-01 02:10:49 +00:00

Author	SHA1	Message	Date
Søren Sandmann Pedersen	1820131fe6	utils.[ch]: Add pixel_checker_get_masks() This function returns the a, r, g, and b masks corresponding to the pixel checker's format.	2013-02-13 02:18:01 -05:00
Søren Sandmann Pedersen	5eb61f72ea	test/utils.[ch]: Add pixel_checker_convert_pixel_to_color() This function takes a pixel in the format corresponding to the pixel checker, and converts to a color_t.	2013-02-13 02:18:01 -05:00
Søren Sandmann Pedersen	3ae717f71a	test: Move do_composite() function from composite.c to utils.c So that it can be used in other tests.	2013-02-13 02:18:01 -05:00
Søren Sandmann Pedersen	958bd334b3	Post-release version bump to 0.29.3	2013-01-29 21:42:02 -05:00
Søren Sandmann Pedersen	a56707e23b	Pre-release version bump to 0.29.2	2013-01-29 21:14:51 -05:00
Søren Sandmann Pedersen	349015e1fc	stresstest: Ensure that the rasterizer is only given alpha formats In `c2cb303d33`, return_if_fail()s were added to prevent the trapezoid rasterizers from being called with non-alpha formats. However, stress-test actually does call the rasterizers with non-alpha formats, but because _pixman_log_error() is disabled in versions with an odd minor number, the errors never materialized. Fix this by changing the argument to random format to an enum of three values DONT_CARE, PREFER_ALPHA, or REQUIRE_ALPHA, and then in the switch that calls the trapezoid rasterizers, pass the appropriate value for the function in question.	2013-01-29 20:43:51 -05:00
Søren Sandmann Pedersen	afde862928	Change default GPGKEY to 3892336E, which is soren.sandmann@gmail.com The old one belongs to the email address sandmann@daimi.au.dk, which doesn't work anymore. Also use gpg to get the name and address for the "(Signed by ...)" line since that works more reliably for me than using git.	2013-01-29 15:24:22 -05:00
Ben Avison	69a7a9b6b6	Improve L1 and L2 benchmark tests for caches that don't use allocate-on-write In particular this affects single-core ARMs (e.g. ARM11, Cortex-A8), which are usually configured this way. For other CPUs, this should only add a constant time, which will be cancelled out by the EXCLUDE_OVERHEAD runs. The problems were caused by cachelines becoming permanently evicted from the cache, because the code that was intended to pull them back in again on each iteration assumed too long a cache line (for the L1 test) or failed to read memory beyond the first pixel row (for the L2 test). Also, the reloading of the source buffer was unnecessary. These issues were identified by Siarhei in this post: http://lists.freedesktop.org/archives/pixman/2013-January/002543.html	2013-01-29 15:23:05 -05:00
Søren Sandmann Pedersen	1fa67f499d	pixman-combine-float.c: Use IS_ZERO() in clip_color() and set_sat() The clip_color() function has some checks to avoid division by zero, but they are done by comparing the value to 4 * FLT_EPSILON, where a better choice is the IS_ZERO() macro that compares to +/- FLT_MIN. In set_sat(), the check is that max > min before dividing by max - min, but that has the potential problem that interactions between GCC optimizations and 80 bit x87 registers could mean that (max > min) is true in 80 bits, but (max - min) is 0 in 32 bits, so that the division by zero is not prevented. Using IS_ZERO() here as well prevents this.	2013-01-29 15:23:05 -05:00
Ben Avison	7e53e58664	ARMv6: Replacement add_8_8, over_8888_8888, over_8888_n_8888 and over_n_8_8888 routines Improved by adding preloads, combining writes and using the SEL instruction. add_8_8 Before After Mean StdDev Mean StdDev Confidence Change L1 62.1 0.2 543.4 12.4 100.0% +774.9% L2 38.7 0.4 116.8 1.7 100.0% +201.8% M 40.0 0.1 110.1 0.5 100.0% +175.3% HT 30.9 0.2 43.4 0.5 100.0% +40.4% VT 30.6 0.3 39.2 0.5 100.0% +28.0% R 21.3 0.2 35.4 0.4 100.0% +66.6% RT 8.6 0.2 10.2 0.3 100.0% +19.4% over_8888_8888 Before After Mean StdDev Mean StdDev Confidence Change L1 32.3 0.1 38.0 0.2 100.0% +17.7% L2 15.9 0.4 30.6 0.5 100.0% +92.8% M 13.3 0.0 25.6 0.0 100.0% +92.9% HT 10.5 0.1 15.5 0.1 100.0% +47.1% VT 10.4 0.1 14.6 0.1 100.0% +40.8% R 10.3 0.1 15.8 0.1 100.0% +53.3% RT 6.0 0.1 7.6 0.1 100.0% +25.9% over_8888_n_8888 Before After Mean StdDev Mean StdDev Confidence Change L1 17.6 0.1 21.0 0.1 100.0% +19.2% L2 11.2 0.2 19.2 0.1 100.0% +71.2% M 10.2 0.0 19.6 0.0 100.0% +92.6% HT 8.4 0.0 11.9 0.1 100.0% +41.7% VT 8.3 0.0 11.3 0.1 100.0% +36.4% R 8.3 0.0 11.8 0.1 100.0% +43.1% RT 5.1 0.1 6.2 0.1 100.0% +21.3% over_n_8_8888 Before After Mean StdDev Mean StdDev Confidence Change L1 17.5 0.1 22.8 0.8 100.0% +30.1% L2 14.2 0.3 21.7 0.2 100.0% +52.6% M 12.0 0.0 22.3 0.0 100.0% +84.8% HT 10.5 0.1 14.1 0.1 100.0% +34.5% VT 10.0 0.1 13.5 0.1 100.0% +35.3% R 9.4 0.0 12.9 0.2 100.0% +37.7% RT 5.5 0.1 6.5 0.2 100.0% +19.2%	2013-01-29 21:48:03 +02:00
Ben Avison	f87dfd6f37	ARMv6: New conversion routines There was no previous attempt at accelerating these specifically for ARMv6. src_x888_8888 Before After Mean StdDev Mean StdDev Confidence Change L1 96.7 0.5 270.4 2.6 100.0% +179.5% L2 44.6 2.7 110.6 9.7 100.0% +148.0% M 26.9 0.1 87.6 0.5 100.0% +226.1% HT 19.3 0.2 37.5 0.4 100.0% +93.7% VT 18.6 0.1 33.7 0.4 100.0% +81.6% R 18.4 0.1 32.2 0.3 100.0% +75.2% RT 9.2 0.2 12.1 0.3 100.0% +31.4% src_0565_8888 Before After Mean StdDev Mean StdDev Confidence Change L1 37.0 0.3 66.9 0.2 100.0% +80.8% L2 30.3 0.2 55.9 0.3 100.0% +84.4% M 25.9 0.0 62.3 0.2 100.0% +140.3% HT 15.2 0.1 33.1 0.3 100.0% +116.9% VT 15.1 0.1 30.7 0.3 100.0% +103.6% R 14.2 0.1 27.6 0.3 100.0% +94.0% RT 6.0 0.1 11.2 0.3 100.0% +87.2%	2013-01-29 21:47:59 +02:00
Ben Avison	a0f59f3b28	ARMv6: New blit routines These are usable either as various composite operations, or via the top-level function pixman_blt() which now does some blitting for the first time on an ARMv6 platform (previously it just returned FALSE). src_8888_8888 Before After Mean StdDev Mean StdDev Confidence Change L1 414.5 9.4 445.8 3.6 100.0% +7.6% L2 93.3 20.7 114.5 12.9 100.0% +22.7% M 57.0 0.2 89.2 0.5 100.0% +56.4% HT 28.7 0.3 39.6 0.4 100.0% +37.9% VT 25.5 0.2 35.3 0.4 100.0% +38.4% R 20.1 0.1 33.8 0.3 100.0% +67.8% RT 7.8 0.2 12.7 0.4 100.0% +62.7% src_0565_0565 Before After Mean StdDev Mean StdDev Confidence Change L1 397.4 6.1 412.5 5.2 100.0% +3.8% L2 143.2 10.9 141.9 6.5 68.9% -0.9% (insignificant) M 90.7 0.4 133.5 0.7 100.0% +47.1% HT 38.6 0.3 53.7 0.7 100.0% +39.0% VT 33.0 0.3 47.3 0.6 100.0% +43.3% R 25.7 0.2 42.1 0.5 100.0% +64.1% RT 8.0 0.2 13.3 0.3 100.0% +65.6% src_8_8 Before After Mean StdDev Mean StdDev Confidence Change L1 716.5 9.8 768.2 20.4 100.0% +7.2% L2 246.2 12.7 260.5 8.8 100.0% +5.8% M 146.8 0.7 227.9 0.7 100.0% +55.2% HT 44.9 0.6 62.1 1.0 100.0% +38.2% VT 35.6 0.4 53.4 0.7 100.0% +50.0% R 29.7 0.3 48.2 0.6 100.0% +62.2% RT 8.6 0.2 12.9 0.4 100.0% +49.3%	2013-01-29 21:47:54 +02:00
Ben Avison	3cff56c5b0	ARMv6: New fill routines Note that this also effectively accelerates src_n_8888, src_n_0565 and src_n_8 composite types, because of the fast paths in pixman-fast-path.c implemented by fast_composite_solid_fill(), which end up dispatching these platform-specific fill routines. src_n_8888 Before After Mean StdDev Mean StdDev Confidence Change L1 157.3 1.1 574.2 8.7 100.0% +265.0% L2 94.2 0.5 364.8 4.2 100.0% +287.3% M 92.7 0.4 358.7 1.1 100.0% +287.1% HT 68.5 0.9 133.6 4.0 100.0% +95.2% VT 61.3 0.8 111.8 2.6 100.0% +82.4% R 61.1 0.9 108.7 2.8 100.0% +78.1% RT 24.6 1.0 28.6 1.6 100.0% +16.0% src_n_0565 Before After Mean StdDev Mean StdDev Confidence Change L1 157.4 1.0 983.1 38.5 100.0% +524.6% L2 93.6 0.5 696.0 14.3 100.0% +643.4% M 92.7 0.4 680.5 1.0 100.0% +634.0% HT 68.3 0.9 160.3 6.6 100.0% +134.6% VT 61.1 0.8 130.1 3.4 100.0% +112.9% R 61.0 0.8 125.4 4.1 100.0% +105.7% RT 24.9 1.3 29.5 1.5 100.0% +18.2% src_n_8 Before After Mean StdDev Mean StdDev Confidence Change L1 154.7 1.0 1324.4 48.5 100.0% +756.3% L2 92.4 0.4 1178.4 10.9 100.0% +1175.6% M 92.9 0.4 1275.7 2.1 100.0% +1273.5% HT 68.2 1.0 169.8 5.5 100.0% +149.0% VT 61.2 1.0 138.5 3.6 100.0% +126.3% R 61.3 0.9 130.1 3.8 100.0% +112.4% RT 25.5 1.3 29.2 1.9 100.0% +14.6%	2013-01-29 21:47:49 +02:00
Ben Avison	2e173326aa	ARMv6: Lay the groundwork for later patches in the series Move the entire contents of pixman-arm-simd-asm.S to a new file; ultimately this will only retain the scaled operations, so it is named pixman-arm-simd-asm-scaled.S. Added new header file pixman-arm-simd-asm.h, containing the macros which are the basis of all the new ARMv6 implementations, although at this point in the series, nothing uses them and the library should be binary-identical.	2013-01-29 21:47:42 +02:00
Søren Sandmann Pedersen	65fc1adb65	demo/scale: Add a spin button to set the number of subsample bits For large upscalings the level of subsampling for the filter has a quite visible effect, so make it settable in the UI so that people can experiment with various values.	2013-01-27 23:06:28 -05:00
Siarhei Siamashka	ed39992564	Use pixman_transform_point_31_16() from pixman_transform_point() Old functions pixman_transform_point() and pixman_transform_point_3d() now become just wrappers for pixman_transform_point_31_16() and pixman_transform_point_31_16_3d(). Eventually their uses should be completely eliminated in the pixman code and replaced with their extended range counterparts. This is needed in order to be able to correctly handle any matrices and parameters that may come to pixman from the code responsible for XRender implementation.	2013-01-27 20:50:38 +02:00
Siarhei Siamashka	5a78d74ccc	test: Added matrix-test for testing projective transform accuracy This test uses __float128 data type when it is available for implementing a "perfect" reference implementation. The output from pixman_transform_point_31_16() and pixman_transform_point_31_16_affine() is compared with the reference implementation to make sure that the rounding errors may only show up in a single least significant bit. The platforms and compilers, which do not support __float128 data type, can rely on crc32 checksum for the pseudorandom transform results.	2013-01-27 20:50:31 +02:00
Siarhei Siamashka	09600ae7e3	configure.ac: Added detection for __float128 support GCC supports 128-bit floating point data type on some platforms (including but not limited to x86 and x86-64). This may be useful for tests, which need perfectly accurate reference implementations of certain algorithms.	2013-01-27 20:50:26 +02:00
Siarhei Siamashka	c3deb8334a	Add higher precision "pixman_transform_point_*" functions The following new functions are added: pixman_transform_point_31_16_3d() - Calculates the product of a matrix and a vector. pixman_transform_point_31_16() - Calculates the product of a matrix and a vector. Then converts the homogeneous resulting vector [x, y, z] to cartesian [x', y', 1] variant, where x' = x / z, and y' = y / z. pixman_transform_point_31_16_affine() - A faster sibling of the other two functions, which assumes affine transformation, where the bottom row of the matrix is [0, 0, 1] and the last element of the input vector is set to 1. These functions transform a point with 31.16 fixed point coordinates from the destination space to a point with 48.16 fixed point coordinates in the source space. The results are accurate and the rounding errors may only show up in the least significant bit. No overflows are possible for the affine transformations as long as the input data is provided in 31.16 format. In the case of projective transformations, some output values may not be representable using 48.16 fixed point format. In this case the results are clamped to return maximum or minimum 48.16 values (so that the caller can at least handle NONE and PAD repeats correctly).	2013-01-27 20:49:43 +02:00
Siarhei Siamashka	a47ed2c311	Faster fetch for the C variant of r5g6b5 src/dest iterator Processing two pixels at once is used to reduce the number of arithmetic operations. The speedup relative to the generic fetch_scanline_r5g6b5() from "pixman-access.c" (pixman was compiled with gcc 4.7.2): MIPS 74K 480MHz : 20.32 MPix/s -> 26.47 MPix/s ARM11 700MHz : 34.95 MPix/s -> 38.22 MPix/s ARM Cortex-A8 1000MHz : 87.44 MPix/s -> 100.92 MPix/s ARM Cortex-A9 1700MHz : 150.95 MPix/s -> 158.13 MPix/s ARM Cortex-A15 1700MHz : 148.91 MPix/s -> 155.42 MPix/s IBM Cell PPU 3200MHz : 75.29 MPix/s -> 98.33 MPix/s Intel Core i7 2800MHz : 257.02 MPix/s -> 376.93 MPix/s That's the performance for C code (SIMD and assembly optimizations are disabled via PIXMAN_DISABLE environment variable).	2013-01-27 20:48:31 +02:00
Siarhei Siamashka	e66fd5ccb6	Faster write-back for the C variant of r5g6b5 dest iterator Unrolling loops improves performance, so just use it here. Also GCC can't properly optimize this code for RISC processors and allocate 0x1F001F constant in a register. Because this constant is too large to be represented as an immediate operand in instructions, GCC inserts some redundant arithmetic. This problem can be worked around by explicitly using a variable for 0x1F001F constant and also initializing it by a read from another volatile variable. In this case GCC is forced to allocate a register for it, because it is not seen as a constant anymore. The speedup relative to the generic store_scanline_r5g6b5() from "pixman-access.c" (pixman was compiled with gcc 4.7.2): MIPS 74K 480MHz : 33.22 MPix/s -> 43.42 MPix/s ARM11 700MHz : 50.16 MPix/s -> 78.23 MPix/s ARM Cortex-A8 1000MHz : 117.75 MPix/s -> 196.34 MPix/s ARM Cortex-A9 1700MHz : 177.04 MPix/s -> 320.32 MPix/s ARM Cortex-A15 1700MHz : 231.44 MPix/s -> 261.64 MPix/s IBM Cell PPU 3200MHz : 130.25 MPix/s -> 145.61 MPix/s Intel Core i7 2800MHz : 502.21 MPix/s -> 721.73 MPix/s That's the performance for C code (SIMD and assembly optimizations are disabled via PIXMAN_DISABLE environment variable).	2013-01-27 20:48:26 +02:00
Siarhei Siamashka	a9f6669416	Added C variants of r5g6b5 fetch/write-back iterators Adding specialized iterators for r5g6b5 color format allows us to work on fine tuning performance of r5g6b5 fetch/write-back operations in the pixman general "fetch -> combine -> store" pipeline. These iterators also make "src_x888_0565" fast path redundant, so it can be removed.	2013-01-27 20:48:22 +02:00
Chris Wilson	794033ed43	Eliminate duplicate copies of channel flags for pixman_image_composite32() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2013-01-27 14:04:16 +00:00
Chris Wilson	a59f081df4	Always return a valid function from lookup_combiner() We should always have at least a C combiner available, so we never expect the search to fail. If it does, emit an error and return a dummy function. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2013-01-27 14:04:16 +00:00
Chris Wilson	520230914b	Always return a valid function from lookup_composite() We never expect to fail to find the appropriate function as the general_composite_rect should always match. So if somehow we fallthrough the search, emit a _pixman_log_error() and return a dummy function. Note that we remove some conditionals and a level of indentation hence a large amount of code movement. This also reveals that in a few places we are duplicating stack variables that can be eliminated later. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2013-01-27 14:04:15 +00:00
Chris Wilson	b283c864a3	sse2: Add fast paths for bilinear source with a solid mask Based on the existing sse2_8888_n_8888 nearest scaling routines. fishbowl on an i5-2500: 60.9s -> 56.9s Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2013-01-27 14:04:15 +00:00
Chris Wilson	d00ce40912	sse2: Add a fast path for add_n_8_8888 This path is being exercised by compositing of trapezoids for clipmasks, for instance as used in the firefox-asteroids cairo-trace. IVB i7-3720qm ./tests/lowlevel-blt-bench add_n_8_8888: reference memcpy speed = 14846.7MB/s (3711.7MP/s for 32bpp fills) before: L1: 681.10 L2: 735.14 M:701.44 ( 28.35%) HT:283.32 VT:213.23 R:208.93 RT: 77.89 ( 793Kops/s) after: L1: 992.91 L2:1017.33 M:982.58 ( 39.88%) HT:458.93 VT:332.32 R:326.13 RT:136.66 (1287Kops/s) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2013-01-27 14:04:15 +00:00
Chris Wilson	7ced3beec9	sse2: Add a fast path for add_n_8888 This path is being exercised by inplace compositing of trapezoids, for instance as used in the firefox-asteroids cairo-trace. IVB i7-3720qm ./tests/lowlevel-blt-bench add_n_8888: reference memcpy speed = 14918.3MB/s (3729.6MP/s for 32bpp fills) before: L1:1752.44 L2:2259.48 M:2215.73 ( 58.80%) HT:589.49 VT:404.04 R:424.69 RT:134.68 (1182Kops/s) after: L1:3931.21 L2:6132.78 M:3440.17 ( 92.24%) HT:1337.70 VT:1357.64 R:1270.27 RT:359.78 (2161Kops/s) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2013-01-27 14:04:15 +00:00
Jeff Muizelaar	b7f523e3bc	Add a version of bilinear_interpolation for precision <=4 Having 4 or fewer bits means we can do two components at a time in a single 32 bit register. Here are the results for firefox-fishtank on a Pandaboard with 4.6.3 and PIXMAN_DISABLE="arm-neon" Before: [ # ] backend test min(s) median(s) stddev. count [ 0] image t-firefox-fishtank 7.841 7.910 0.70% 6/6 After: [ # ] backend test min(s) median(s) stddev. count [ 0] image t-firefox-fishtank 6.951 6.995 1.11% 6/6	2013-01-25 13:14:37 -05:00
Ben Avison	24e83cae64	Tweaks to lowlevel-blt-bench This adds two extra tests, src_n_8 and src_8_8, which I have been using to benchmark my ARMv6 changes. I'd also like to propose that it requires an exact test name as the executable's argument, as achieved by this strstr to strcmp change. Without this, it is impossible to only benchmark (for example) add_8_8, add_n_8 or src_n_8, due to those also being substrings of many other test names.	2013-01-25 11:13:07 -05:00
Søren Sandmann Pedersen	b527a0e615	test: Use operator_name() and format_name() in composite.c With the operator_name() and format_name() functions there is no longer any reason for composite.c to have its own table of format and operator names.	2013-01-23 12:24:31 -05:00
Søren Sandmann Pedersen	4eb9a24aba	utils.[ch]: Add new format_name() function This function returns the name of the given format code, which is useful for printing out debug information. The function is written as a switch without a default value so that the compiler will warn if new formats are added in the future. The fake formats used in the fast path tables are also recognized. The function is used in alpha_map.c, where it replaces an existing format_name() function, and in blitters-test.c, affine-test.c, and scaling-test.c.	2013-01-23 12:24:31 -05:00
Søren Sandmann Pedersen	1676b49389	test/utils.[ch]: Add new function operator_name() This function returns the name of the given operator, which is useful for printing out debug information. The function is done as a switch without a default value so that the compiler will warn if new operators are added in the future. The function is used in affine-test.c, scaling-test.c, and blitters-test.c.	2013-01-23 12:24:31 -05:00
Søren Sandmann Pedersen	8d85311143	README: Add guidelines on how to contribute patches Ben Avison pointed out here: http://lists.freedesktop.org/archives/pixman/2013-January/002485.html that there isn't really any documentation about how to submit patches to pixman. This patch adds some information to the README file. v2: Incorporate some comments from Ben Avison v3: Change gitweb URL to cgit	2013-01-23 12:22:40 -05:00
Matt Turner	61dacffaf4	Convert INCLUDES to AM_CPPFLAGS INCLUDES has been deprecated starting with automake 1.13. Convert all occurrences with the recommended AM_CPPFLAGS replacement.	2013-01-22 22:08:30 -08:00
Matt Turner	c7c28f440d	Add new demos and tests to .gitignore	2013-01-22 22:08:30 -08:00
Nemanja Lukic	2c6577476e	MIPS: DSPr2: Added more fast-paths: - over_reverse_n_8888 - in_n_8_8 Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): over_reverse_n_8888 = L1: 19.42 L2: 19.07 M: 15.38 ( 40.80%) HT: 13.35 VT: 13.10 R: 12.92 RT: 8.27 ( 49Kops/s) in_n_8_8 = L1: 21.20 L2: 22.86 M: 21.42 ( 14.21%) HT: 15.97 VT: 15.69 R: 15.47 RT: 8.00 ( 48Kops/s) Optimized: over_reverse_n_8888 = L1: 60.09 L2: 47.87 M: 28.65 ( 76.02%) HT: 23.58 VT: 22.51 R: 21.99 RT: 12.28 ( 60Kops/s) in_n_8_8 = L1: 89.38 L2: 86.07 M: 65.48 ( 43.44%) HT: 44.64 VT: 41.50 R: 40.77 RT: 16.94 ( 66Kops/s)	2013-01-22 03:12:59 +01:00
Nemanja Lukic	a67b0e24d7	MIPS: DSPr2: Added more fast-paths for REVERSE operation: - out_reverse_8_0565 - out_reverse_8_8888 Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Referent (before): out_reverse_8_0565 = L1: 14.29 L2: 13.58 M: 12.14 ( 24.16%) HT: 9.23 VT: 9.12 R: 8.84 RT: 4.75 ( 36Kops/s) out_reverse_8_8888 = L1: 27.46 L2: 23.24 M: 17.41 ( 57.73%) HT: 12.61 VT: 12.47 R: 11.79 RT: 5.86 ( 41Kops/s) Optimized: out_reverse_8_0565 = L1: 28.24 L2: 25.64 M: 20.63 ( 41.05%) HT: 16.69 VT: 16.14 R: 15.50 RT: 8.69 ( 52Kops/s) out_reverse_8_8888 = L1: 52.78 L2: 41.44 M: 23.50 ( 77.94%) HT: 18.79 VT: 18.16 R: 16.90 RT: 9.11 ( 53Kops/s)	2013-01-22 03:10:31 +01:00
Maarten Lankhorst	01c2431ef8	Add 00-unexport-symbol.diff * Add 00-unexport-symbol.diff - remove test-only use of _pixman_internal_only_get_implementation - zap the only test requiring the use of this symbol	2013-01-08 18:16:23 +01:00
Maarten Lankhorst	d6b69d4f63	update symbols file and add lintian override for hidden symbol	2013-01-08 17:10:12 +01:00
Maarten Lankhorst	0f8c56fe52	new upstream release	2013-01-08 16:12:25 +01:00
Maarten Lankhorst	818af795d4	pixman 0.28.2 release -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.11 (GNU/Linux) iEYEABECAAYFAlDF0BkACgkQmxfmIW/3waiEegCcCVDzXL2gGouDGCBqJVOmzUcv ZnMAoI50IhP5KXKKEEx2dJlfFkzKVo5N =J62R -----END PGP SIGNATURE----- Merge tag 'pixman-0.28.2' into debian-experimental pixman 0.28.2 release	2013-01-08 16:10:57 +01:00
Søren Sandmann Pedersen	35cc965514	pixman-filter.c: Cope with NULL returns from malloc() v2: Don't return a pointer to uninitialized memory when the allocation of horz and vert fails, but allocation of params doesn't.	2013-01-06 17:38:23 -05:00
Søren Sandmann Pedersen	58526cfc72	Handle solid images in the noop iterator The noop src iterator already has code to handle solid images, but that code never actually runs currently because it is not possible for an image to have both a format code of PIXMAN_solid and a flag of FAST_PATH_BITS_IMAGE. If these two were to be set at the same time, the fast_composite_tiled_repeat() fast path would trigger for solid images (because it triggers for PIXMAN_any formats, which includes PIXMAN_solid), but for solid images we can usually do better than that fast path. So this patch removes _pixman_solid_fill_iter_init() and instead handles such images (along with repeating 1x1 bits images without an alpha map) in pixman-noop.c. When a 1x1R image is involved in the general composite path, before this patch, it would hit this code in repeat() in pixman-inlines.h: while (c >= size) c -= size; while (c < 0) c += size; and those loops could run for a huge number of iterations (proportional to the composite width). For such cases, the performance improvement is really big: ./test/lowlevel-blt-bench -n add_n_8888: Before: add_n_8888 = L1: 3.86 L2: 3.78 M: 1.40 ( 0.06%) HT: 1.43 VT: 1.41 R: 1.41 RT: 1.38 ( 19Kops/s) After: add_n_8888 = L1:1236.86 L2:2468.49 M:1097.88 ( 49.04%) HT:476.49 VT:429.05 R:417.04 RT:155.12 ( 817Kops/s)	2013-01-06 17:30:12 -05:00
Marko Lindqvist	480dd38fd1	Fix build with automake-1.13 Automake-1.13 has removed long obsolete AM_CONFIG_HEADER macro ( http://lists.gnu.org/archive/html/automake/2012-12/msg00038.html ) and autoreconf errors out upon seeing it. Attached patch replaces obsolete AM_CONFIG_HEADER with now proper AC_CONFIG_HEADERS.	2013-01-04 01:54:10 +02:00
Siarhei Siamashka	1abde88ae6	Use more appropriate types and remove a magic constant	2013-01-04 01:27:06 +02:00
Siarhei Siamashka	c1fd5a4243	Define SIZE_MAX if it is not provided by the standard C headers C++ compilers do not define SIZE_MAX. It is also not available if the code is compiled by some C compilers: http://lists.freedesktop.org/archives/pixman/2012-August/002196.html	2013-01-04 01:26:55 +02:00
Siarhei Siamashka	66c4292822	Rename 'xor' variable to 'filler' (because 'xor' is a C++ keyword)	2012-12-20 03:14:21 +02:00
Søren Sandmann Pedersen	4dfda2adfe	float-combiner.c: Change tests for x == 0.0 to - FLT_MIN < x < FLT_MIN pixman-float-combiner.c currently uses checks like these: if (x == 0.0f) ... else ... / x; to prevent division by 0. In theory this is correct: a division-by-zero exception is only supposed to happen when the floating point denominator is exactly equal to a positive or negative zero. However, in practice, the combination of x87 and gcc optimizations causes issues. The x87 registers are 80 bits wide, which means the initial test: if (x == 0.0f) may be false when x is an 80 bit floating point number, but when x is rounded to a 32 bit single precision number, it becomes equal to 0.0. In principle, gcc should compensate for this quirk of x87, and there are some options such as -ffloat-store, -fexcess-precision=standard, and -std=c99 that will make it do so, but these all have a performance cost. It is also possible to set the FPU to a mode that makes it do all computation with single or double precision, but that would require pixman to save the existing mode before doing anything with floating point and restore it afterwards. Instead, this patch side-steps the issue by replacing exact checks for equality with zero with a new macro that checks whether the value is between -FLT_MIN and FLT_MIN. There is extensive reading material about this issue linked off the infamous gcc bug 323: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=323	2012-12-19 13:49:32 -05:00
Siarhei Siamashka	2734071d7b	ARM: make use of UQADD8 instruction even in generic C code paths ARMv6 has UQADD8 instruction, which implements unsigned saturated addition for 8-bit values packed in 32-bit registers. It is very useful for UN8x4_ADD_UN8x4, UN8_rb_ADD_UN8_rb and ADD_UN8 macros (which would otherwise need a lot of arithmetic operations to simulate this operation). Since most of the major ARM linux distros are built for ARMv7, we are much less dependent on runtime CPU detection and can get practical benefits from conditional compilation here for a lot of users. The results of cairo-perf-trace benchmark on ARM Cortex-A15 with pixman compiled by gcc 4.7.2 and PIXMAN_DISABLE set to "arm-simd arm-neon": Speedups ======== image firefox-talos-gfx (29938.22 0.12%) -> (27814.76 0.51%) : 1.08x speedup image firefox-asteroids (23241.11 0.07%) -> (21795.19 0.07%) : 1.07x speedup image firefox-canvas-alpha (174519.85 0.08%) -> (164788.64 0.20%) : 1.06x speedup image poppler (9464.46 1.61%) -> (8991.53 0.14%) : 1.05x speedup	2012-12-18 20:49:58 +02:00
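
Note on commit 24e83cae64 (Tweaks to lowlevel-blt-bench): the message argues for replacing a strstr match with a strcmp match when selecting a test by name. The sketch below shows the difference between the two matching rules. The helper names matches_substring() and matches_exact() are hypothetical and are not taken from the benchmark source.

```c
#include <string.h>

/* Substring rule: "add_n_8" also selects "add_n_8_8888" and similar names. */
static int
matches_substring (const char *test_name, const char *pattern)
{
    return strstr (test_name, pattern) != NULL;
}

/* Exact rule: only the test whose full name equals the pattern is selected. */
static int
matches_exact (const char *test_name, const char *pattern)
{
    return strcmp (test_name, pattern) == 0;
}
```

With the substring rule, asking for add_n_8 would also run add_n_8_8888; with the exact rule it would not.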
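
Note on commits 1676b49389 and 4eb9a24aba (operator_name() and format_name()): both messages describe a switch written without a default case so that the compiler warns when a new enumerator is added. A minimal sketch of that pattern follows. The enum demo_op_t and the function demo_operator_name() are hypothetical stand-ins, not pixman's types, and the warning assumes a compiler option such as GCC's or Clang's -Wswitch (enabled by -Wall).

```c
/* Hypothetical operator enum used only for this illustration. */
typedef enum
{
    DEMO_OP_CLEAR,
    DEMO_OP_SRC,
    DEMO_OP_OVER
} demo_op_t;

static const char *
demo_operator_name (demo_op_t op)
{
    /* No default label: if an enumerator is added to demo_op_t and not
     * handled here, -Wswitch reports the missing case at compile time. */
    switch (op)
    {
    case DEMO_OP_CLEAR: return "CLEAR";
    case DEMO_OP_SRC:   return "SRC";
    case DEMO_OP_OVER:  return "OVER";
    }

    /* Reached only for values outside the enum. */
    return "<unknown operator>";
}
```

A default label would silence the warning, which is why the pattern places the fallback after the switch instead.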
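
Note on commit 58526cfc72 (Handle solid images in the noop iterator): the message quotes two loops from repeat() and says they can run for a number of iterations proportional to the composite width when the image is 1x1. The sketch below reproduces those loops with an added counter to make the cost visible. The counter, the function names, and the remainder-based variant are illustration-only assumptions; the commit itself avoids the loops by handling such images in pixman-noop.c, not by switching to a remainder.

```c
/* The two loops quoted in the commit message, plus an iteration counter
 * that exists only for this illustration. */
static int
repeat_by_loop (int c, int size, long *iterations)
{
    *iterations = 0;

    while (c >= size)
    {
        c -= size;
        (*iterations)++;
    }
    while (c < 0)
    {
        c += size;
        (*iterations)++;
    }

    return c;
}

/* Constant-time equivalent for size > 0, shown for comparison only. */
static int
repeat_by_mod (int c, int size)
{
    int r = c % size;

    return r < 0 ? r + size : r;
}
```

For size 1 the first loop subtracts 1 per iteration, so a coordinate of one million costs one million iterations.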
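
Note on commits 4dfda2adfe and 1fa67f499d (IS_ZERO()): both messages replace exact comparisons against zero with a check that the value lies between -FLT_MIN and FLT_MIN. A minimal sketch of that check follows. The macro body is written from the description in the commit message and may differ from the definition in pixman; safe_div() is a hypothetical helper used only to show the macro guarding a division.

```c
#include <float.h>

/* Treat any value strictly between -FLT_MIN and FLT_MIN as zero.
 * Written from the commit description; an assumption, not copied code. */
#define IS_ZERO(f) (-FLT_MIN < (f) && (f) < FLT_MIN)

/* Hypothetical helper: return 0 instead of dividing by a (near) zero value. */
static float
safe_div (float num, float den)
{
    if (IS_ZERO (den))
        return 0.0f;

    return num / den;
}
```

Because the range covers every value that rounds to zero in single precision, the guard still holds if the comparison is carried out in a wider x87 register.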
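
Note on commit 2734071d7b (UQADD8 in generic C code paths): the message describes UQADD8 as unsigned saturated addition of 8-bit values packed in 32-bit registers. The portable C sketch below computes the same result one byte lane at a time, which gives a sense of the arithmetic the single instruction replaces. The function name uqadd8_portable() is hypothetical; this is neither the pixman macro nor the inline assembly used by the commit.

```c
#include <stdint.h>

/* Add four packed unsigned bytes lane by lane, clamping each lane at 0xFF. */
static uint32_t
uqadd8_portable (uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    int shift;

    for (shift = 0; shift < 32; shift += 8)
    {
        uint32_t sum = ((a >> shift) & 0xFF) + ((b >> shift) & 0xFF);

        if (sum > 0xFF)
            sum = 0xFF;   /* saturate instead of wrapping around */

        result |= sum << shift;
    }

    return result;
}
```

Each lane is independent, so an overflow in one byte never carries into its neighbour.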

2394 Commits