pixman

mirror of https://salsa.debian.org/xorg-team/lib/pixman synced 2025-09-01 10:35:16 +00:00

Author	SHA1	Message	Date
Søren Sandmann Pedersen	6dfdd8534f	Fix for infinite-loop test The infinite loop detected by "affine-test 212944861" is caused by an overflow in this expression: max_x = pixman_fixed_to_int (vx + (width - 1) * unit_x) + 1; where (width - 1) * unit_x doesn't fit in a signed int. This causes max_x to be too small so that this: src_width = 0 while (src_width < REPEAT_NORMAL_MIN_WIDTH && src_width <= max_x) src_width += src_image->bits.width; results in src_width being 0. Later on when src_width is used for repeat calculations, we get the infinite loop. By casting unit_x to int64_t, the expression no longer overflows and affine-test 212944861 and infinite-loop no longer loop forever. (cherry picked from commit `de60e2e0e3`)	2013-02-18 19:58:06 +01:00
Søren Sandmann Pedersen	2156fb51b3	gtk-utils.c: Use cairo in show_image() rather than GdkPixbuf GdkPixbufs are not premultiplied, so when using them to display pixman images, there are some unnecessary conversions going on: First the image is converted to non-premultiplied, and then GdkPixbuf premultiplies before sending the result to the X server. These conversions may cause the displayed image to not be exactly identical to the original. This patch just uses a cairo image surface instead, which avoids these conversions. Also make the comment about sRGB a little more concise.	2013-02-15 18:57:24 -05:00
Ben Avison	5e207f825b	Fix to lowlevel-blt-bench The source, mask and destination buffers are initialised to 0xCC just after they are allocated. Between each benchmark, there are a pair of memcpys, from the destination buffer to the source buffer and back again (there are no explanatory comments, but presumably this is an effort to flush the caches). However, it has an unintended consequence, which is to change the contents of the buffers on entry to subsequent benchmarks. This means it is not a fair test: for example, with over_n_8888 (featured in the following patches) it reports L2 and even M tests as being faster than the L1 test, because after the L1 test, the source buffer is filled with fully opaque pixels, for which over_n_8888 has a shortcut. The fix here is simply to reverse the order of the memcpys, so src and destination are both filled with 0xCC on entry to all tests.	2013-02-13 02:24:34 -05:00
Stefan Weil	d26f922dc1	sse2: Use uintptr_t in type casts from pointer to integral value Some recent code added new type casts from pointer to unsigned long. These type casts result in compiler warnings for systems like MinGW-w64 (64 bit Windows) where sizeof(unsigned long) != sizeof(void *). Signed-off-by: Stefan Weil <sw@weilnetz.de> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2013-02-13 02:18:01 -05:00
Søren Sandmann Pedersen	dc80eb09e2	lookup_composite: Don't update cache in case of error If we fail to find a composite function, don't update the fast path cache with the dummy compositing function. Also make the error message state that the bug is likely caused by issues with thread local storage.	2013-02-13 02:18:01 -05:00
Søren Sandmann Pedersen	4dced81c91	Turn on error logging at all times While releasing 0.29.2 the distcheck run produced a number of error messages that had to be fixed in `349015e1fc`. These were not caught before so nobody had actually run pixman with debugging turned on. It's not the first time this has happened, see `5b0563f39e` for example. So this patch makes the return_if_fail() macros use unlikely() around the expressions and then turns on error logging at all times. The performance hit should be negligible since we were already evaluating the expressions. The place where DEBUG actually does cause a performance hit is in the region selfcheck code, and that will still only be enabled in development snapshots.	2013-02-13 02:18:01 -05:00
Søren Sandmann Pedersen	f4c9492c12	pixman-compiler.h: Add unlikely() macro When compiling with GCC this macro expands to __builtin_expect((expr), 0). On other compilers, it just expands to (expr).	2013-02-13 02:18:01 -05:00
Søren Sandmann Pedersen	5ebb5ac380	utils.c: Increase acceptable deviation to 0.0064 in pixel_checker_t The check-formats program reveals that the 8 bit pipeline cannot meet the current 0.004 acceptable deviation specified in utils.c, so we have to increase it. Some of the failing pixels were captured in pixel-test, which with this commit now passes. == a4r4g4b4 DISJOINT_XOR a8r8g8b8 == The DISJOINT_XOR operator applied to an a4r4g4b4 source pixel of 0xd0c0 and a destination pixel of 0x5300ea00 results in the exact value: fa = (1 - da) / sa = (1 - 0x53 / 255.0) / (0xd / 15.0) = 0.7782 fb = (1 - sa) / da = (1 - 0xd / 15.0) / (0x53 / 255.0) = 0.4096 r = fa * (0xc / 15.0) + fb * (0xea / 255.0) = 0.99853 But when computing in 8 bits, we get: fa8 = ((255 - 0x53) * 255 + 0xdd / 2) / 0xdd = 0xc6 fb8 = ((255 - 0xdd) * 255 + 0x53 / 2) / 0x53 = 0x68 r8 = (fa8 * 0xcc + 127) / 255 + (fb8 * 0xea + 127) / 255 = 0xfd and 0xfd / 255.0 = 0.9921568627450981 for a deviation of 0.00637118610187, which we then have to consider acceptable given the current implementation. By switching to computing the result with r = (fa * s + fb * d + 127) / 255 rather than r = (fa * s + 127) / 255 + (fb * d + 127) / 255 the deviation would be only 0.00244961747442, so at some point it may be worth doing either this, or switching to floating point for operators that involve divisions. Note that the conversion from 4 bits to 8 bits does not cause any error in this case because both rounding and bit replication produce an exact result when the number of from-bits divides the number of to-bits. 
== a8r8g8b8 OVER r5g6b5 == When OVER compositing the a8r8g8b8 pixel 0x0f00c300 with the x14r6g6b6 pixel 0x03c0, the true floating point value of the resulting green channel is: 0xc3 / 255.0 + (1.0 - 0x0f / 255.0) * (0x0f / 63.0) = 0.9887955 but when compositing 8 bit values, where the 6-bit green channel is converted to 8 bit through bit replication, the 8-bit result is: 0xc3 + ((255 - 0x0f) * 0x3c + 127) / 255 = 251 which corresponds to a real value of 0.984314. The difference from the true value is 0.004482 which is bigger than the acceptable deviation of 0.004. So, if we were to compute all the CONJOINT/DISJOINT operators in floating point, or otherwise make them more accurate, the acceptable deviation could be set at 0.0045. If we were doing the 6-bit conversion with rounding: (x / 63.0 * 255.0 + 0.5) instead of bit replication, the deviation in this particular case would be only 0.0005, so we may want to consider this at some point.	2013-02-13 02:18:01 -05:00
Søren Sandmann Pedersen	f2ba7fe1d8	test: Add new pixel-test regression test This test program contains a table of individual operator/pixel combinations. For each pixel combination, images of various sizes are filled with the pixels and then composited. The result is then verified against the output of do_composite(). If the result doesn't match, detailed error information is printed. The initial 14 pixel combinations currently all fail.	2013-02-13 02:18:01 -05:00
Søren Sandmann Pedersen	6781636740	a1-trap-test: Add tests for operator_name and format_name() The check-formats.c test depends on the exact format of the strings returned from these functions, so add a test here. a1-trap-test isn't the ideal place, but it seems like overkill to add a new test just for these trivial checks.	2013-02-13 02:18:01 -05:00
Søren Sandmann Pedersen	d1434d112c	test: Add new check-formats utility Given an operator and two formats, this program will composite and check all pixels where the red and blue channels are 0. That is, if the two formats are a8r8g8b8 and a4r4g4b4, all source pixels matching the mask 0xff00ff00 are composited with the given operator against all destination pixels matching the mask 0xf0f0 and the result is then verified against the do_composite() function that was moved to utils.c earlier. This program reveals that a number of operators and format combinations are not computed to within the precision currently accepted by pixel_checker_t. For example: check-formats over a8r8g8b8 r5g6b5 | grep failed | wc -l 30 reveals that there are 30 pixel combinations where OVER produces insufficiently precise results for the a8r8g8b8 and r5g6b5 formats.	2013-02-13 02:18:01 -05:00
Søren Sandmann Pedersen	1820131fe6	utils.[ch]: Add pixel_checker_get_masks() This function returns the a, r, g, and b masks corresponding to the pixel checker's format.	2013-02-13 02:18:01 -05:00
Søren Sandmann Pedersen	5eb61f72ea	test/utils.[ch]: Add pixel_checker_convert_pixel_to_color() This function takes a pixel in the format corresponding to the pixel checker, and converts to a color_t.	2013-02-13 02:18:01 -05:00
Søren Sandmann Pedersen	3ae717f71a	test: Move do_composite() function from composite.c to utils.c So that it can be used in other tests.	2013-02-13 02:18:01 -05:00
Søren Sandmann Pedersen	958bd334b3	Post-release version bump to 0.29.3	2013-01-29 21:42:02 -05:00
Søren Sandmann Pedersen	a56707e23b	Pre-release version bump to 0.29.2	2013-01-29 21:14:51 -05:00
Søren Sandmann Pedersen	349015e1fc	stresstest: Ensure that the rasterizer is only given alpha formats In `c2cb303d33`, return_if_fail()s were added to prevent the trapezoid rasterizers from being called with non-alpha formats. However, stress-test actually does call the rasterizers with non-alpha formats, but because _pixman_log_error() is disabled in versions with an odd minor number, the errors never materialized. Fix this by changing the argument to random format to an enum of three values DONT_CARE, PREFER_ALPHA, or REQUIRE_ALPHA, and then in the switch that calls the trapezoid rasterizers, pass the appropriate value for the function in question.	2013-01-29 20:43:51 -05:00
Søren Sandmann Pedersen	afde862928	Change default GPGKEY to 3892336E, which is soren.sandmann@gmail.com The old one belongs to the email address sandmann@daimi.au.dk, which doesn't work anymore. Also use gpg to get the name and address for the "(Signed by ...)" line since that works more reliably for me than using git.	2013-01-29 15:24:22 -05:00
Ben Avison	69a7a9b6b6	Improve L1 and L2 benchmark tests for caches that don't use allocate-on-write In particular this affects single-core ARMs (e.g. ARM11, Cortex-A8), which are usually configured this way. For other CPUs, this should only add a constant time, which will be cancelled out by the EXCLUDE_OVERHEAD runs. The problems were caused by cachelines becoming permanently evicted from the cache, because the code that was intended to pull them back in again on each iteration assumed too long a cache line (for the L1 test) or failed to read memory beyond the first pixel row (for the L2 test). Also, the reloading of the source buffer was unnecessary. These issues were identified by Siarhei in this post: http://lists.freedesktop.org/archives/pixman/2013-January/002543.html	2013-01-29 15:23:05 -05:00
Søren Sandmann Pedersen	1fa67f499d	pixman-combine-float.c: Use IS_ZERO() in clip_color() and set_sat() The clip_color() function has some checks to avoid division by zero, but they are done by comparing the value to 4 * FLT_EPSILON, where a better choice is the IS_ZERO() macro that compares to +/- FLT_MIN. In set_sat(), the check is that max > min before dividing by max - min, but that has the potential problem that interactions between GCC optimizations and 80 bit x87 registers could mean that (max > min) is true in 80 bits, but (max - min) is 0 in 32 bits, so that the division by zero is not prevented. Using IS_ZERO() here as well prevents this.	2013-01-29 15:23:05 -05:00
Ben Avison	7e53e58664	ARMv6: Replacement add_8_8, over_8888_8888, over_8888_n_8888 and over_n_8_8888 routines Improved by adding preloads, combining writes and using the SEL instruction. add_8_8 Before After Mean StdDev Mean StdDev Confidence Change L1 62.1 0.2 543.4 12.4 100.0% +774.9% L2 38.7 0.4 116.8 1.7 100.0% +201.8% M 40.0 0.1 110.1 0.5 100.0% +175.3% HT 30.9 0.2 43.4 0.5 100.0% +40.4% VT 30.6 0.3 39.2 0.5 100.0% +28.0% R 21.3 0.2 35.4 0.4 100.0% +66.6% RT 8.6 0.2 10.2 0.3 100.0% +19.4% over_8888_8888 Before After Mean StdDev Mean StdDev Confidence Change L1 32.3 0.1 38.0 0.2 100.0% +17.7% L2 15.9 0.4 30.6 0.5 100.0% +92.8% M 13.3 0.0 25.6 0.0 100.0% +92.9% HT 10.5 0.1 15.5 0.1 100.0% +47.1% VT 10.4 0.1 14.6 0.1 100.0% +40.8% R 10.3 0.1 15.8 0.1 100.0% +53.3% RT 6.0 0.1 7.6 0.1 100.0% +25.9% over_8888_n_8888 Before After Mean StdDev Mean StdDev Confidence Change L1 17.6 0.1 21.0 0.1 100.0% +19.2% L2 11.2 0.2 19.2 0.1 100.0% +71.2% M 10.2 0.0 19.6 0.0 100.0% +92.6% HT 8.4 0.0 11.9 0.1 100.0% +41.7% VT 8.3 0.0 11.3 0.1 100.0% +36.4% R 8.3 0.0 11.8 0.1 100.0% +43.1% RT 5.1 0.1 6.2 0.1 100.0% +21.3% over_n_8_8888 Before After Mean StdDev Mean StdDev Confidence Change L1 17.5 0.1 22.8 0.8 100.0% +30.1% L2 14.2 0.3 21.7 0.2 100.0% +52.6% M 12.0 0.0 22.3 0.0 100.0% +84.8% HT 10.5 0.1 14.1 0.1 100.0% +34.5% VT 10.0 0.1 13.5 0.1 100.0% +35.3% R 9.4 0.0 12.9 0.2 100.0% +37.7% RT 5.5 0.1 6.5 0.2 100.0% +19.2%	2013-01-29 21:48:03 +02:00
Ben Avison	f87dfd6f37	ARMv6: New conversion routines There was no previous attempt at accelerating these specifically for ARMv6. src_x888_8888 Before After Mean StdDev Mean StdDev Confidence Change L1 96.7 0.5 270.4 2.6 100.0% +179.5% L2 44.6 2.7 110.6 9.7 100.0% +148.0% M 26.9 0.1 87.6 0.5 100.0% +226.1% HT 19.3 0.2 37.5 0.4 100.0% +93.7% VT 18.6 0.1 33.7 0.4 100.0% +81.6% R 18.4 0.1 32.2 0.3 100.0% +75.2% RT 9.2 0.2 12.1 0.3 100.0% +31.4% src_0565_8888 Before After Mean StdDev Mean StdDev Confidence Change L1 37.0 0.3 66.9 0.2 100.0% +80.8% L2 30.3 0.2 55.9 0.3 100.0% +84.4% M 25.9 0.0 62.3 0.2 100.0% +140.3% HT 15.2 0.1 33.1 0.3 100.0% +116.9% VT 15.1 0.1 30.7 0.3 100.0% +103.6% R 14.2 0.1 27.6 0.3 100.0% +94.0% RT 6.0 0.1 11.2 0.3 100.0% +87.2%	2013-01-29 21:47:59 +02:00
Ben Avison	a0f59f3b28	ARMv6: New blit routines These are usable either as various composite operations, or via the top-level function pixman_blt() which now does some blitting for the first time on an ARMv6 platform (previously it just returned FALSE). src_8888_8888 Before After Mean StdDev Mean StdDev Confidence Change L1 414.5 9.4 445.8 3.6 100.0% +7.6% L2 93.3 20.7 114.5 12.9 100.0% +22.7% M 57.0 0.2 89.2 0.5 100.0% +56.4% HT 28.7 0.3 39.6 0.4 100.0% +37.9% VT 25.5 0.2 35.3 0.4 100.0% +38.4% R 20.1 0.1 33.8 0.3 100.0% +67.8% RT 7.8 0.2 12.7 0.4 100.0% +62.7% src_0565_0565 Before After Mean StdDev Mean StdDev Confidence Change L1 397.4 6.1 412.5 5.2 100.0% +3.8% L2 143.2 10.9 141.9 6.5 68.9% -0.9% (insignificant) M 90.7 0.4 133.5 0.7 100.0% +47.1% HT 38.6 0.3 53.7 0.7 100.0% +39.0% VT 33.0 0.3 47.3 0.6 100.0% +43.3% R 25.7 0.2 42.1 0.5 100.0% +64.1% RT 8.0 0.2 13.3 0.3 100.0% +65.6% src_8_8 Before After Mean StdDev Mean StdDev Confidence Change L1 716.5 9.8 768.2 20.4 100.0% +7.2% L2 246.2 12.7 260.5 8.8 100.0% +5.8% M 146.8 0.7 227.9 0.7 100.0% +55.2% HT 44.9 0.6 62.1 1.0 100.0% +38.2% VT 35.6 0.4 53.4 0.7 100.0% +50.0% R 29.7 0.3 48.2 0.6 100.0% +62.2% RT 8.6 0.2 12.9 0.4 100.0% +49.3%	2013-01-29 21:47:54 +02:00
Ben Avison	3cff56c5b0	ARMv6: New fill routines Note that this also effectively accelerates src_n_8888, src_n_0565 and src_n_8 composite types, because of the fast paths in pixman-fast-path.c implemented by fast_composite_solid_fill(), which end up dispatching these platform-specific fill routines. src_n_8888 Before After Mean StdDev Mean StdDev Confidence Change L1 157.3 1.1 574.2 8.7 100.0% +265.0% L2 94.2 0.5 364.8 4.2 100.0% +287.3% M 92.7 0.4 358.7 1.1 100.0% +287.1% HT 68.5 0.9 133.6 4.0 100.0% +95.2% VT 61.3 0.8 111.8 2.6 100.0% +82.4% R 61.1 0.9 108.7 2.8 100.0% +78.1% RT 24.6 1.0 28.6 1.6 100.0% +16.0% src_n_0565 Before After Mean StdDev Mean StdDev Confidence Change L1 157.4 1.0 983.1 38.5 100.0% +524.6% L2 93.6 0.5 696.0 14.3 100.0% +643.4% M 92.7 0.4 680.5 1.0 100.0% +634.0% HT 68.3 0.9 160.3 6.6 100.0% +134.6% VT 61.1 0.8 130.1 3.4 100.0% +112.9% R 61.0 0.8 125.4 4.1 100.0% +105.7% RT 24.9 1.3 29.5 1.5 100.0% +18.2% src_n_8 Before After Mean StdDev Mean StdDev Confidence Change L1 154.7 1.0 1324.4 48.5 100.0% +756.3% L2 92.4 0.4 1178.4 10.9 100.0% +1175.6% M 92.9 0.4 1275.7 2.1 100.0% +1273.5% HT 68.2 1.0 169.8 5.5 100.0% +149.0% VT 61.2 1.0 138.5 3.6 100.0% +126.3% R 61.3 0.9 130.1 3.8 100.0% +112.4% RT 25.5 1.3 29.2 1.9 100.0% +14.6%	2013-01-29 21:47:49 +02:00
Ben Avison	2e173326aa	ARMv6: Lay the groundwork for later patches in the series Move the entire contents of pixman-arm-simd-asm.S to a new file; ultimately this will only retain the scaled operations, so it is named pixman-arm-simd-asm-scaled.S. Added new header file pixman-arm-simd-asm.h, containing the macros which are the basis of all the new ARMv6 implementations, although at this point in the series, nothing uses them and the library should be binary-identical.	2013-01-29 21:47:42 +02:00
Søren Sandmann Pedersen	65fc1adb65	demo/scale: Add a spin button to set the number of subsample bits For large upscalings the level of subsampling for the filter has a quite visible effect, so make it settable in the UI so that people can experiment with various values.	2013-01-27 23:06:28 -05:00
Siarhei Siamashka	ed39992564	Use pixman_transform_point_31_16() from pixman_transform_point() Old functions pixman_transform_point() and pixman_transform_point_3d() now become just wrappers for pixman_transform_point_31_16() and pixman_transform_point_31_16_3d(). Eventually their uses should be completely eliminated in the pixman code and replaced with their extended range counterparts. This is needed in order to be able to correctly handle any matrices and parameters that may come to pixman from the code responsible for XRender implementation.	2013-01-27 20:50:38 +02:00
Siarhei Siamashka	5a78d74ccc	test: Added matrix-test for testing projective transform accuracy This test uses __float128 data type when it is available for implementing a "perfect" reference implementation. The output from pixman_transform_point_31_16() and pixman_transform_point_31_16_affine() is compared with the reference implementation to make sure that the rounding errors may only show up in a single least significant bit. The platforms and compilers, which do not support __float128 data type, can rely on crc32 checksum for the pseudorandom transform results.	2013-01-27 20:50:31 +02:00
Siarhei Siamashka	09600ae7e3	configure.ac: Added detection for __float128 support GCC supports 128-bit floating point data type on some platforms (including but not limited to x86 and x86-64). This may be useful for tests, which need perfectly accurate reference implementations of certain algorithms.	2013-01-27 20:50:26 +02:00
Siarhei Siamashka	c3deb8334a	Add higher precision "pixman_transform_point_*" functions The following new functions are added: pixman_transform_point_31_16_3d() - Calculates the product of a matrix and a vector. pixman_transform_point_31_16() - Calculates the product of a matrix and a vector. Then converts the homogeneous resulting vector [x, y, z] to Cartesian [x', y', 1] variant, where x' = x / z, and y' = y / z. pixman_transform_point_31_16_affine() - A faster sibling of the other two functions, which assumes affine transformation, where the bottom row of the matrix is [0, 0, 1] and the last element of the input vector is set to 1. These functions transform a point with 31.16 fixed point coordinates from the destination space to a point with 48.16 fixed point coordinates in the source space. The results are accurate and the rounding errors may only show up in the least significant bit. No overflows are possible for the affine transformations as long as the input data is provided in 31.16 format. In the case of projective transformations, some output values may not be representable using 48.16 fixed point format. In this case the results are clamped to return maximum or minimum 48.16 values (so that the caller can at least handle NONE and PAD repeats correctly).	2013-01-27 20:49:43 +02:00
Siarhei Siamashka	a47ed2c311	Faster fetch for the C variant of r5g6b5 src/dest iterator Processing two pixels at once is used to reduce the number of arithmetic operations. The speedup relative to the generic fetch_scanline_r5g6b5() from "pixman-access.c" (pixman was compiled with gcc 4.7.2): MIPS 74K 480MHz : 20.32 MPix/s -> 26.47 MPix/s ARM11 700MHz : 34.95 MPix/s -> 38.22 MPix/s ARM Cortex-A8 1000MHz : 87.44 MPix/s -> 100.92 MPix/s ARM Cortex-A9 1700MHz : 150.95 MPix/s -> 158.13 MPix/s ARM Cortex-A15 1700MHz : 148.91 MPix/s -> 155.42 MPix/s IBM Cell PPU 3200MHz : 75.29 MPix/s -> 98.33 MPix/s Intel Core i7 2800MHz : 257.02 MPix/s -> 376.93 MPix/s That's the performance for C code (SIMD and assembly optimizations are disabled via PIXMAN_DISABLE environment variable).	2013-01-27 20:48:31 +02:00
Siarhei Siamashka	e66fd5ccb6	Faster write-back for the C variant of r5g6b5 dest iterator Unrolling loops improves performance, so just use it here. Also GCC can't properly optimize this code for RISC processors and allocate 0x1F001F constant in a register. Because this constant is too large to be represented as an immediate operand in instructions, GCC inserts some redundant arithmetic. This problem can be worked around by explicitly using a variable for 0x1F001F constant and also initializing it by a read from another volatile variable. In this case GCC is forced to allocate a register for it, because it is not seen as a constant anymore. The speedup relative to the generic store_scanline_r5g6b5() from "pixman-access.c" (pixman was compiled with gcc 4.7.2): MIPS 74K 480MHz : 33.22 MPix/s -> 43.42 MPix/s ARM11 700MHz : 50.16 MPix/s -> 78.23 MPix/s ARM Cortex-A8 1000MHz : 117.75 MPix/s -> 196.34 MPix/s ARM Cortex-A9 1700MHz : 177.04 MPix/s -> 320.32 MPix/s ARM Cortex-A15 1700MHz : 231.44 MPix/s -> 261.64 MPix/s IBM Cell PPU 3200MHz : 130.25 MPix/s -> 145.61 MPix/s Intel Core i7 2800MHz : 502.21 MPix/s -> 721.73 MPix/s That's the performance for C code (SIMD and assembly optimizations are disabled via PIXMAN_DISABLE environment variable).	2013-01-27 20:48:26 +02:00
Siarhei Siamashka	a9f6669416	Added C variants of r5g6b5 fetch/write-back iterators Adding specialized iterators for r5g6b5 color format allows us to work on fine tuning performance of r5g6b5 fetch/write-back operations in the pixman general "fetch -> combine -> store" pipeline. These iterators also make "src_x888_0565" fast path redundant, so it can be removed.	2013-01-27 20:48:22 +02:00
Chris Wilson	794033ed43	Eliminate duplicate copies of channel flags for pixman_image_composite32() Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2013-01-27 14:04:16 +00:00
Chris Wilson	a59f081df4	Always return a valid function from lookup_combiner() We should always have at least a C combiner available, so we never expect the search to fail. If it does, emit an error and return a dummy function. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2013-01-27 14:04:16 +00:00
Chris Wilson	520230914b	Always return a valid function from lookup_composite() We never expect to fail to find the appropriate function as the general_composite_rect should always match. So if somehow we fallthrough the search, emit a _pixman_log_error() and return a dummy function. Note that we remove some conditionals and a level of indentation hence a large amount of code movement. This also reveals that in a few places we are duplicating stack variables that can be eliminated later. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2013-01-27 14:04:15 +00:00
Chris Wilson	b283c864a3	sse2: Add fast paths for bilinear source with a solid mask Based on the existing sse2_8888_n_8888 nearest scaling routines. fishbowl on an i5-2500: 60.9s -> 56.9s Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2013-01-27 14:04:15 +00:00
Chris Wilson	d00ce40912	sse2: Add a fast path for add_n_8_8888 This path is being exercised by compositing of trapezoids for clipmasks, for instance as used in the firefox-asteroids cairo-trace. IVB i7-3720qm ./tests/lowlevel-blt-bench add_n_8_8888: reference memcpy speed = 14846.7MB/s (3711.7MP/s for 32bpp fills) before: L1: 681.10 L2: 735.14 M:701.44 ( 28.35%) HT:283.32 VT:213.23 R:208.93 RT: 77.89 ( 793Kops/s) after: L1: 992.91 L2:1017.33 M:982.58 ( 39.88%) HT:458.93 VT:332.32 R:326.13 RT:136.66 (1287Kops/s) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2013-01-27 14:04:15 +00:00
Chris Wilson	7ced3beec9	sse2: Add a fast path for add_n_8888 This path is being exercised by inplace compositing of trapezoids, for instance as used in the firefox-asteroids cairo-trace. IVB i7-3720qm ./tests/lowlevel-blt-bench add_n_8888: reference memcpy speed = 14918.3MB/s (3729.6MP/s for 32bpp fills) before: L1:1752.44 L2:2259.48 M:2215.73 ( 58.80%) HT:589.49 VT:404.04 R:424.69 RT:134.68 (1182Kops/s) after: L1:3931.21 L2:6132.78 M:3440.17 ( 92.24%) HT:1337.70 VT:1357.64 R:1270.27 RT:359.78 (2161Kops/s) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>	2013-01-27 14:04:15 +00:00
Jeff Muizelaar	b7f523e3bc	Add a version of bilinear_interpolation for precision <=4 Having 4 or fewer bits means we can do two components at a time in a single 32 bit register. Here are the results for firefox-fishtank on a Pandaboard with 4.6.3 and PIXMAN_DISABLE="arm-neon" Before: [ # ] backend test min(s) median(s) stddev. count [ 0] image t-firefox-fishtank 7.841 7.910 0.70% 6/6 After: [ # ] backend test min(s) median(s) stddev. count [ 0] image t-firefox-fishtank 6.951 6.995 1.11% 6/6	2013-01-25 13:14:37 -05:00
Ben Avison	24e83cae64	Tweaks to lowlevel-blt-bench This adds two extra tests, src_n_8 and src_8_8, which I have been using to benchmark my ARMv6 changes. I'd also like to propose that it requires an exact test name as the executable's argument, as achieved by this strstr to strcmp change. Without this, it is impossible to only benchmark (for example) add_8_8, add_n_8 or src_n_8, due to those also being substrings of many other test names.	2013-01-25 11:13:07 -05:00
Søren Sandmann Pedersen	b527a0e615	test: Use operator_name() and format_name() in composite.c With the operator_name() and format_name() functions there is no longer any reason for composite.c to have its own table of format and operator names.	2013-01-23 12:24:31 -05:00
Søren Sandmann Pedersen	4eb9a24aba	utils.[ch]: Add new format_name() function This function returns the name of the given format code, which is useful for printing out debug information. The function is written as a switch without a default value so that the compiler will warn if new formats are added in the future. The fake formats used in the fast path tables are also recognized. The function is used in alpha_map.c, where it replaces an existing format_name() function, and in blitters-test.c, affine-test.c, and scaling-test.c.	2013-01-23 12:24:31 -05:00
Søren Sandmann Pedersen	1676b49389	test/utils.[ch]: Add new function operator_name() This function returns the name of the given operator, which is useful for printing out debug information. The function is done as a switch without a default value so that the compiler will warn if new operators are added in the future. The function is used in affine-test.c, scaling-test.c, and blitters-test.c.	2013-01-23 12:24:31 -05:00
Søren Sandmann Pedersen	8d85311143	README: Add guidelines on how to contribute patches Ben Avison pointed out here: http://lists.freedesktop.org/archives/pixman/2013-January/002485.html that there isn't really any documentation about how to submit patches to pixman. This patch adds some information to the README file. v2: Incorporate some comments from Ben Avison v3: Change gitweb URL to cgit	2013-01-23 12:22:40 -05:00
Matt Turner	61dacffaf4	Convert INCLUDES to AM_CPPFLAGS INCLUDES has been deprecated starting with automake 1.13. Replace all occurrences with the recommended AM_CPPFLAGS.	2013-01-22 22:08:30 -08:00
Matt Turner	c7c28f440d	Add new demos and tests to .gitignore	2013-01-22 22:08:30 -08:00
Nemanja Lukic	2c6577476e	MIPS: DSPr2: Added more fast-paths: - over_reverse_n_8888 - in_n_8_8 Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Reference (before): over_reverse_n_8888 = L1: 19.42 L2: 19.07 M: 15.38 ( 40.80%) HT: 13.35 VT: 13.10 R: 12.92 RT: 8.27 ( 49Kops/s) in_n_8_8 = L1: 21.20 L2: 22.86 M: 21.42 ( 14.21%) HT: 15.97 VT: 15.69 R: 15.47 RT: 8.00 ( 48Kops/s) Optimized: over_reverse_n_8888 = L1: 60.09 L2: 47.87 M: 28.65 ( 76.02%) HT: 23.58 VT: 22.51 R: 21.99 RT: 12.28 ( 60Kops/s) in_n_8_8 = L1: 89.38 L2: 86.07 M: 65.48 ( 43.44%) HT: 44.64 VT: 41.50 R: 40.77 RT: 16.94 ( 66Kops/s)	2013-01-22 03:12:59 +01:00
Nemanja Lukic	a67b0e24d7	MIPS: DSPr2: Added more fast-paths for REVERSE operation: - out_reverse_8_0565 - out_reverse_8_8888 Performance numbers before/after on MIPS-74kc @ 1GHz: lowlevel-blt-bench results Reference (before): out_reverse_8_0565 = L1: 14.29 L2: 13.58 M: 12.14 ( 24.16%) HT: 9.23 VT: 9.12 R: 8.84 RT: 4.75 ( 36Kops/s) out_reverse_8_8888 = L1: 27.46 L2: 23.24 M: 17.41 ( 57.73%) HT: 12.61 VT: 12.47 R: 11.79 RT: 5.86 ( 41Kops/s) Optimized: out_reverse_8_0565 = L1: 28.24 L2: 25.64 M: 20.63 ( 41.05%) HT: 16.69 VT: 16.14 R: 15.50 RT: 8.69 ( 52Kops/s) out_reverse_8_8888 = L1: 52.78 L2: 41.44 M: 23.50 ( 77.94%) HT: 18.79 VT: 18.16 R: 16.90 RT: 9.11 ( 53Kops/s)	2013-01-22 03:10:31 +01:00
Maarten Lankhorst	01c2431ef8	Add 00-unexport-symbol.diff * Add 00-unexport-symbol.diff - remove test-only use of _pixman_internal_only_get_implementation - zap the only test requiring the use of this symbol	2013-01-08 18:16:23 +01:00
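
Note on `6dfdd8534f` (infinite-loop fix): the message attributes the bug to `(width - 1) * unit_x` overflowing a signed int. The following is a minimal sketch of the widened computation, not the code from the commit. The type `fixed_16_16_t` and the helper `max_x_wide()` are hypothetical names introduced here, and the fixed-point format is assumed to be 16.16 stored in an `int32_t`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 16.16 fixed point stored in a signed 32-bit integer. */
typedef int32_t fixed_16_16_t;

/* Hypothetical helper: one past the last source column a scanline reaches.
 * Every operand is widened to int64_t before the multiply, so the product
 * (width - 1) * unit_x cannot overflow the way a 32-bit multiply can. */
static int64_t
max_x_wide (fixed_16_16_t vx, fixed_16_16_t unit_x, int width)
{
    int64_t end;

    assert (width > 0);
    assert (vx >= 0 && unit_x >= 0);   /* keep the sketch to non-negative steps */

    end = (int64_t) vx + (int64_t) (width - 1) * (int64_t) unit_x;
    return (end >> 16) + 1;            /* drop the 16 fractional bits */
}
```

With `unit_x = 0x7FFF0000` (32767.0) and `width = 3`, the product is 0xFFFE0000, which does not fit in a signed 32-bit int; the 64-bit version returns 65535.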
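
Note on `f4c9492c12` and `4dced81c91` (unlikely() and always-on error logging): the messages say the macro expands to `__builtin_expect((expr), 0)` under GCC and to `(expr)` elsewhere, and that the return_if_fail() macros wrap their expressions in it. A rough sketch of that pattern follows. `sketch_return_val_if_fail()` and `safe_divide()` are hypothetical stand-ins written for this note, not pixman's own macros, and the log format is invented.

```c
#include <stdio.h>

/* As described in the commit message: a branch hint on GCC, a no-op elsewhere. */
#if defined(__GNUC__)
#  define unlikely(expr) __builtin_expect ((expr), 0)
#else
#  define unlikely(expr) (expr)
#endif

/* Hypothetical stand-in for a return_val_if_fail()-style macro: the check is
 * always evaluated and logged; the hint only marks the failure branch as cold. */
#define sketch_return_val_if_fail(expr, retval)                      \
    do {                                                             \
        if (unlikely (!(expr))) {                                    \
            fprintf (stderr, "%s: expression \"%s\" failed\n",       \
                     __func__, #expr);                               \
            return (retval);                                         \
        }                                                            \
    } while (0)

/* Illustrative caller: returns 1 on success, 0 if a precondition fails. */
static int
safe_divide (int num, int den, int *out)
{
    sketch_return_val_if_fail (out != NULL, 0);
    sketch_return_val_if_fail (den != 0, 0);

    *out = num / den;
    return 1;
}
```

Because the condition is evaluated either way, the hint changes only how the compiler lays out the failure branch, which is consistent with the message's expectation of a negligible cost.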
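
Note on `5ebb5ac380` (DISJOINT_XOR deviation): the arithmetic in the message can be replayed with a short sketch. The helper names below are made up for this note, and the integer formulas are transcribed from the message rather than taken from pixman's sources. The channel values are the 4-bit source values 0xd and 0xc expanded to 0xdd and 0xcc, as the message describes.

```c
/* Rounded 8-bit division and multiplication, as written in the commit message. */
static unsigned
div_255 (unsigned num, unsigned den)
{
    return (num * 255 + den / 2) / den;
}

static unsigned
mul_255 (unsigned a, unsigned b)
{
    return (a * b + 127) / 255;
}

typedef struct
{
    unsigned fa8, fb8;    /* 8-bit blend factors                   */
    unsigned separate;    /* each product rounded on its own       */
    unsigned combined;    /* one rounding after summing products   */
    double   exact;       /* floating point reference value        */
} xor_example_t;

/* Hypothetical helper reproducing the worked example from the message. */
static xor_example_t
disjoint_xor_example (void)
{
    const unsigned sa = 0xdd, s = 0xcc;   /* source alpha / channel      */
    const unsigned da = 0x53, d = 0xea;   /* destination alpha / channel */
    xor_example_t r;
    double fa, fb;

    fa = (1 - da / 255.0) / (0xd / 15.0);
    fb = (1 - 0xd / 15.0) / (da / 255.0);
    r.exact = fa * (0xc / 15.0) + fb * (d / 255.0);

    r.fa8 = div_255 (255 - da, sa);
    r.fb8 = div_255 (255 - sa, da);
    r.separate = mul_255 (r.fa8, s) + mul_255 (r.fb8, d);
    r.combined = (r.fa8 * s + r.fb8 * d + 127) / 255;
    return r;
}
```

Comparing `exact` with `separate / 255.0` and `combined / 255.0` gives the two deviations quoted in the message, roughly 0.0064 and 0.0024.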
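
Note on `5ebb5ac380` (OVER with a 6-bit green channel): the second half of the message contrasts bit replication with rounding when widening a 6-bit channel to 8 bits. The sketch below shows both conversions and the 8-bit OVER step from the message. All three function names are hypothetical and exist only for this illustration.

```c
/* Widen a 6-bit channel to 8 bits by repeating its top bits in the low bits. */
static unsigned
expand6_replicate (unsigned v6)
{
    return (v6 << 2) | (v6 >> 4);
}

/* Widen a 6-bit channel to 8 bits by rounding, as the message suggests:
 * (x / 63.0 * 255.0 + 0.5). */
static unsigned
expand6_round (unsigned v6)
{
    return (unsigned) (v6 / 63.0 * 255.0 + 0.5);
}

/* 8-bit OVER for one premultiplied channel: s + (255 - sa) * d / 255, rounded. */
static unsigned
over_un8 (unsigned s, unsigned sa, unsigned d)
{
    return s + ((255 - sa) * d + 127) / 255;
}
```

For the pixel pair in the message (source green 0xc3, source alpha 0x0f, destination green 0x0f in 6 bits), replication widens the destination to 0x3c and OVER yields 251, while rounding widens it to 61 and OVER yields 252, which is closer to the floating point value of 0.9887955.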
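
Note on `24e83cae64` (lowlevel-blt-bench test selection): the message proposes switching the name match from strstr to strcmp so that a single test can be selected. A small sketch of the difference, using hypothetical helper names and not the benchmark's actual code:

```c
#include <string.h>

/* Substring match: selects every test whose name contains the pattern. */
static int
matches_substring (const char *test_name, const char *pattern)
{
    return strstr (test_name, pattern) != NULL;
}

/* Exact match: selects only the test with exactly this name. */
static int
matches_exact (const char *test_name, const char *pattern)
{
    return strcmp (test_name, pattern) == 0;
}
```

With substring matching, asking for `add_8_8` also selects any longer name that contains it, such as `add_8_8_8`; with exact matching, only `add_8_8` itself is run.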

2505 Commits