pixman

mirror of https://salsa.debian.org/xorg-team/lib/pixman synced 2025-09-06 12:25:16 +00:00

Author	SHA1	Message	Date
Søren Sandmann Pedersen	4ac0a1d60f	Move PowerPC specific CPU detection to its own file pixman-ppc.c	2012-07-07 01:09:23 -04:00
Søren Sandmann Pedersen	8590415f0e	Move ARM specific CPU detection to a new file pixman-arm.c Similar to the x86 commit, this moves the ARM specific CPU detection to its own file which exports a pixman_arm_get_implementations() function that is supposed to be a noop on non-ARM.	2012-07-07 01:09:22 -04:00
Søren Sandmann Pedersen	39ac18570a	Move x86 specific CPU detection to a new file pixman-x86.c Extract the x86 specific parts of pixman-cpu.c and put them in their own file called pixman-x86.c which exports one function pixman_x86_get_implementations() that creates the MMX and SSE2 implementations. This file is supposed to be compiled on all architectures, but pixman_x86_get_implementations() should be a noop on non-x86.	2012-07-06 23:53:19 -04:00
Søren Sandmann Pedersen	1a3b7614a9	pixman-cpu.c: Rename disabled to _pixman_disabled() and export it	2012-07-06 23:52:14 -04:00
Sebastian Bauer	d4aa82fb91	Qualify the static variables in pixman_f_transform_invert() with the const keyword. Their contents are not overwritten.	2012-07-06 23:50:21 -04:00
Søren Sandmann Pedersen	f9c91ee2f2	Use a compile-time constant for the "K" constraint in the MMX detection. When compiling with -O0, gcc doesn't understand that in signed char x = 0; ... asm ("...", : "K" (x)); x is constant. Fix this by using an immediate constant instead of a variable.	2012-07-02 18:21:21 -04:00
Søren Sandmann Pedersen	cd7ecf548a	In fast_composite_tiled_repeat() don't clone images with a palette In fast_composite_tiled_repeat() if the source image is less than a certain constant width, a clone is created which is then pre-repeated. However, the source image's palette, if it has one, is not cloned, so for indexed images, the pre-repeating would crash. Fix this by not doing any pre-repeating for images with a palette set.	2012-07-02 18:21:21 -04:00
Søren Sandmann Pedersen	7b20ad39f7	test: Make stress-test more likely to actually composite something stress-test currently almost never composites anything because the clip rectangles and transformations are such that either _pixman_compute_composite_region32() or analyze_extent() will return FALSE. Fix this by: - making log_rand() return smaller numbers so that the clip rectangles are more likely to be within the destination image - adding rand_x() and rand_y() functions that pick positions within an image and using them for positioning alpha maps and source/mask positions. - making it less likely that clip regions are used in general These changes make the test take longer, so speed it up a little by making most images smaller and by reducing the maximum convolution filter from 17x19 to 3x4. With these changes, stress-test reveals a crash in iteration 0xd39 where fast_composite_tiled_repeat() creates an indexed image without a palette.	2012-07-02 18:21:21 -04:00
Matt Turner	4cdf8e9f3a	sse2: add missing ABGR entries for bilinear src_8888_8888	2012-07-01 16:35:46 -04:00
Matt Turner	ef99f9e972	loongson: optimize _mm_set_pi* functions with shuffle instructions	2012-07-01 16:34:45 -04:00
Matt Turner	9aa8e3a260	mmx: optimize bilinear function when using 7-bit precision Loongson: image firefox-fishtank 1037.738 1040.218 0.19% 3/3 image firefox-fishtank 1056.611 1057.581 0.20% 3/3 ARM/iwMMXt: image firefox-fishtank 1487.282 1492.640 0.17% 3/3 image firefox-fishtank 1363.913 1364.366 0.11% 3/3	2012-07-01 16:34:21 -04:00
Matt Turner	1ad6ae6ee8	mmx: add scaled bilinear over_8888_8_8888 Loongson: image firefox-fishtank 1665.163 1670.370 0.17% 3/3 image firefox-fishtank 1037.738 1040.218 0.19% 3/3 ARM/iwMMXt: image firefox-fishtank 2042.723 2045.308 0.10% 3/3 image firefox-fishtank 1487.282 1492.640 0.17% 3/3	2012-07-01 16:34:14 -04:00
Matt Turner	c43de364cb	mmx: add scaled bilinear over_8888_8888 Loongson: image firefox-planet-gnome 157.012 158.087 0.30% 6/6 image firefox-planet-gnome 156.617 157.109 0.15% 5/6 ARM/iwMMXt: image firefox-planet-gnome 148.086 149.339 0.76% 6/6 image firefox-planet-gnome 144.939 146.123 0.61% 6/6	2012-07-01 16:33:19 -04:00
Matt Turner	9209cd746b	mmx: add scaled bilinear src_8888_8888 Loongson: image firefox-planet-gnome 170.025 170.229 0.09% 3/4 image firefox-planet-gnome 157.012 158.087 0.30% 6/6 ARM/iwMMXt: image firefox-planet-gnome 164.192 164.875 0.34% 3/4 image firefox-planet-gnome 148.086 149.339 0.76% 6/6	2012-07-01 16:33:08 -04:00
Matt Turner	51f27d7364	mmx: Use expand_alpha instead of mask/shift	2012-07-01 16:25:30 -04:00
Siarhei Siamashka	b0855f095a	Change default bilinear interpolation precision to 7 bits This improves performance for the current SSE2 code. Further reduction to 4 bits may be considered later if it proves to allow additional speedup.	2012-07-01 23:00:34 +03:00
Siarhei Siamashka	c430b1dba7	sse2: _mm_madd_epi16 for faster bilinear scaling with 7-bit precision Reducing interpolation precision allows the use of PMADDWD instruction. This makes bilinear scaling much faster (on Intel Core i7): 8-bit: image firefox-fishtank 57.584 58.349 0.74% 3/3 7-bit: image firefox-fishtank 51.139 51.229 0.30% 3/3 8-bit: src_8888_8888 = L1: 228.71 L2: 226.52 M:224.82 ( 14.95%) HT:183.22 VT:154.02 R:171.72 RT:109.36 7-bit: src_8888_8888 = L1: 320.45 L2: 317.43 M:314.38 ( 20.77%) HT:215.13 VT:177.35 R:204.46 RT:121.93	2012-07-01 22:40:23 +03:00
Siarhei Siamashka	ccd31896bc	Bilinear interpolation precision is now configurable at compile time Macro BILINEAR_INTERPOLATION_BITS in pixman-private.h selects the number of fractional bits used for bilinear interpolation. scaling-test and affine-test have checksums for 4-bit, 7-bit and 8-bit configurations.	2012-07-01 21:45:43 +03:00
Matt Turner	ad9f1d0201	Fix distcheck due to custom iwMMXt rules	2012-06-29 14:24:30 -04:00
Siarhei Siamashka	ff5d041b88	sse2: faster bilinear scaling (use _mm_loadl_epi64) Using _mm_loadl_epi64() to load two pixels at once (pairs of top and bottom pixels) is faster than loading each pixel separately and combining them with _mm_set_epi32(). === cairo-perf-trace === before: image firefox-fishtank 66.912 66.931 0.13% 3/3 after: image firefox-fishtank 57.584 58.349 0.74% 3/3 === lowlevel-blt-bench === before: src_8888_8888 = L1: 181.10 L2: 179.14 M:178.08 ( 11.02%) HT:153.22 VT:133.45 R:142.24 RT: 95.32 after: src_8888_8888 = L1: 228.68 L2: 225.75 M:223.98 ( 14.23%) HT:185.32 VT:155.06 R:162.73 RT:102.52 This improvement was suggested by Matt Turner on irc.	2012-06-29 03:29:32 +03:00
Siarhei Siamashka	fc162bad56	test: support nearest/bilinear scaling in lowlevel-blt-bench Scale factor is selected to be nearly 1x, so that the MPix/s results can be directly compared with the results of non-scaled compositing operations.	2012-06-29 03:24:29 +03:00
Siarhei Siamashka	387e9bcddb	test: Fix for strict aliasing issue in 'get_random_seed' Gets rid of gcc warning when compiled with -fstrict-aliasing option in CFLAGS	2012-06-29 03:23:09 +03:00
Andrea Canciani	4cbeb0aedc	build: Fix compilation on win32 When compiling using the win32 build system, config.h is not available nor needed. Fixes: pixman-glyph.c(26) : fatal error C1083: Cannot open include file: 'config.h': No such file or directory	2012-06-20 17:13:33 +02:00
Matt Turner	21077e1b83	sse2: add src_x888_0565 Port of `2ddd1c498b` to SSE2. Uses the pmadd technique described in http://software.intel.com/sites/landingpage/legacy/mmx/MMX_App_24-16_Bit_Conversion.pdf Works around lack of packusdw instruction by first sign extending the values. fast: src_8888_0565 = L1: 681.40 L2: 689.20 M: 644.76 ( 25.51%) HT:404.42 VT:288.04 R:306.07 RT:150.80 (1619Kops/s) mmx: src_8888_0565 = L1:2056.03 L2:1985.44 M:1574.91 ( 61.87%) HT:533.10 VT:376.35 R:416.10 RT:178.79 (1833Kops/s) sse2: src_8888_0565 = L1:3793.42 L2:3653.44 M:1878.83 ( 73.94%) HT:535.03 VT:407.96 R:421.46 RT:163.31 (1727Kops/s) and for reference, using packusdw sse4: src_8888_0565 = L1:4396.18 L2:4229.25 M:1904.04 ( 75.18%) HT:559.79 VT:427.96 R:440.06 RT:165.71 (1744Kops/s) Notice that MMX is faster in the RT case because it can operate on 8-bytes instead of the current 16-bytes for SSE2.	2012-06-16 16:00:00 -04:00
Cyril Brulebois	3acc1ffc32	Upload to unstable.	2012-06-15 01:25:23 +02:00
Cyril Brulebois	1952e2a77b	Document the cherry-pick, fixing FTBFS on *i386.	2012-06-15 01:20:14 +02:00
Matt Turner	1701defb49	mmx: add missing _mm_empty calls Fixes spurious test failures on x86-32. (cherry picked from commit `da6193b1fc`)	2012-06-15 01:19:04 +02:00
Cyril Brulebois	8940c5222e	Upload to unstable.	2012-06-15 00:16:59 +02:00
Cyril Brulebois	0181d422ab	Bump changelogs.	2012-06-15 00:15:43 +02:00
Cyril Brulebois	f53c40a739	Merge branch 'upstream-unstable' into debian-unstable	2012-06-15 00:15:23 +02:00
Matt Turner	7db07cb731	sse2: enable over_n_0565 for b5g6r5 Same as `b950bb12` for MMX.	2012-06-13 19:32:21 -04:00
Matt Turner	45946c5fa1	.gitignore: add test/glyph-test	2012-06-13 19:32:21 -04:00
Søren Sandmann Pedersen	eadb442b5c	test: Add missing break in stress-test.c Found by coverity: https://bugzilla.redhat.com/show_bug.cgi?id=756069	2012-06-13 07:30:06 -04:00
Siarhei Siamashka	492dac7593	test: fix bisecting issue in fuzzer-find-diff.pl Before bisecting to find the exact test which has failed, we first need to make sure that the first test is fine (the first test is "good" and the whole range is "bad"). Otherwise test 2 gets incorrectly flagged as problematic in the case where we already got a failure on test 1 right from the start.	2012-06-12 04:21:57 +03:00
Siarhei Siamashka	40a0d10eea	test: OpenMP 2.5 requires signed loop iteration variables Unsigned loop variables are only supported since version 3.0 of OpenMP specification. Changing loop variables to use int32_t type fixes pixman build problems with path64 compiler.	2012-06-12 04:21:07 +03:00
Søren Sandmann Pedersen	619a60d201	test: Make glyph test pass on big endian The destination buffer was initialized with random uint32_t values, so it started out different on big endian vs. little endian. Fix that by initializing the buffer with random uint8_t values instead.	2012-06-11 19:19:23 -04:00
Søren Sandmann Pedersen	f80e7ad3cb	bits-image: Turn all the fetchers into iterator getters Instead of caching these fetchers in the image structure, and then have the iterator getter call them from there, simply change them to be iterator getters themselves. This avoids an extra indirect function call and lets us get rid of the get_scanline_32/64 fields in pixman_image_t.	2012-06-11 07:15:00 -04:00
Antti S. Lankila	fd175f9d02	Faster unorm_to_unorm for wide processing. Optimizing the unorm_to_unorm functions allows a speedup from: src_8888_2x10 = L1: 62.08 L2: 60.73 M: 59.61 ( 4.30%) HT: 46.81 VT: 42.17 R: 43.18 RT: 26.01 (325Kops/s) to: src_8888_2x10 = L1: 76.94 L2: 78.43 M: 75.87 ( 5.59%) HT: 56.73 VT: 52.39 R: 53.00 RT: 29.29 (363Kops/s) on an i7 Q720-based laptop. The key to the patch is the observation that unorm_to_unorm's work can more easily be done with a simple multiplication and shift when the function is applied repeatedly and the parameters are not compile-time constants. For instance, converting from 0xfe to 0xfefe (expanding from 8 bits to 16 bits) can be done by calculating c = c * 0x101. However, sometimes the result is not a neat replication of all the bits. For instance, going from 10 bits to 16 bits can be done by calculating c = c * 0x401UL >> 4, where the intermediate result is a 20-bit-wide repetition of the 10-bit pattern, followed by shifting off the unnecessary lowest bits. The patch has the algorithm to calculate the factor and the shift, and converts the code to use it.	2012-06-10 14:23:17 -04:00
Matt Turner	367b78fd5c	configure.ac: add iwmmxt2 configure flag The flag allows the user to select whether pixman-mmx.c is compiled with -march=iwmmxt or -march=iwmmxt2. gcc has scheduling support for the Marvell CPU in the XO 1.75 when building with -march=iwmmxt2.	2012-06-09 16:57:16 -04:00
Matt Turner	31a6563ec5	autotools: use custom build rule to build iwMMXt code gcc has no sane way of enabling iwmmxt code generation, like -msse for SSE, so you have to use -march=iwmmxt{,2}. User CFLAGS are placed after -march=iwmmxt and override the march value, so we have to use a custom build rule to order the CFLAGS such that pixman-mmx.c will be built with the necessary CFLAGS.	2012-06-09 16:57:16 -04:00
Søren Sandmann Pedersen	706bf8264c	Speed up _pixman_image_get_solid() in common cases Make _pixman_image_get_solid() faster by special-casing the common cases where the image is SOLID or a repeating a8r8g8b8 image. This optimization together with the previous one results in a small but reproducible performance improvement on the xfce4-terminal-a1 cairo trace: [ # ] backend test min(s) median(s) stddev. count Before: [ 0] image xfce4-terminal-a1 1.221 1.239 1.21% 100/100 After: [ 0] image xfce4-terminal-a1 1.170 1.199 1.26% 100/100 Either optimization by itself is difficult to separate from noise.	2012-06-02 08:19:38 -04:00
Søren Sandmann Pedersen	934c9d8546	Speed up _pixman_composite_glyphs_no_mask() Bypass much of the overhead of pixman_image_composite32() by only computing the composite region once instead of once per glyph, and by only looking up the composite function whenever the glyph format or flags change. As part of this, pixman_compute_composite_region32() was renamed to _pixman_compute_composite_region32() and exported in pixman-private.h. I couldn't find a trace that would reliably demonstrate that this is actually an improvement by itself (since _pixman_composite_glyphs_no_mask() is called so rarely), but together with the following optimization for solid sources, there is a small but reliable improvement to the xfce4-terminal-a1 cairo trace.	2012-06-02 08:19:38 -04:00
Søren Sandmann Pedersen	a162189dc0	Speed up pixman_composite_glyphs() When adding glyphs to the mask, bypass most of the overhead of pixman_image_composite32() by: - Only looking up the composite function when the glyph changes either format or flags. - Only using a white source when the glyph format is different from the mask format. - Simply intersecting the glyph rectangle with the destination rectangle instead of doing the full _pixman_composite_region32(). Performance results: [ # ] backend test min(s) median(s) stddev. count Before: [ 0] image firefox-talos-gfx 6.570 6.577 0.13% 8/10 After: [ 0] image firefox-talos-gfx 4.272 4.289 0.28% 10/10 V2: Changes to deal with white sources	2012-06-02 08:19:30 -04:00
Søren Sandmann Pedersen	d9710442b4	test: Add glyph-test This test tests the new glyph cache and compositing API. Much of this test is intended to make sure that clipping and alpha map handling survive any optimizations that may be added to the glyph compositing. V2: Evaluating lcg_rand_n() multiple times in an argument list led to undefined behavior.	2012-06-02 07:55:11 -04:00
Søren Sandmann Pedersen	dc92374727	Add support for alpha maps to compute_crc32_for_image(). When a destination image I has an alpha map A, the following rules apply: - If I has an alpha channel itself, the content of that channel is undefined - If A has RGB channels, the content of those channels is undefined. Hence in order to compute the CRC32 for such an image, we have to mask off the alpha channel of the image, and the RGB channels of the alpha map. V2: Shifting by 32 is undefined in C	2012-06-02 07:55:11 -04:00
Søren Sandmann Pedersen	43e029d525	Move CRC32 computation from blitters-test.c into utils.c This way it can be used in other tests.	2012-06-02 07:55:11 -04:00
Søren Sandmann Pedersen	fce31a5ef8	Add pixman_glyph_cache_t API This new API allows entire glyph strings to be composited in one go which reduces overhead compared to multiple calls to pixman_image_composite32(). The pixman_glyph_cache_t is a hash table that maps two keys (a "font" and a "glyph" key, but they are just keys; there is no distinction between them as far as pixman is concerned) to a glyph. Glyphs in the cache can be composited through two new entry points pixman_glyph_cache_composite_glyphs() and pixman_glyph_cache_composite_glyphs_no_mask(). A glyph cache may only be inserted into when it is "frozen", which is achieved by calling pixman_glyph_cache_freeze(). When pixman_glyph_cache_thaw() is later called, if the cache has become too crowded, some glyphs (currently the least-recently-used) will automatically be evicted. This means that a user must ensure that all the required glyphs are present in the cache before compositing a string. The intended way to use the cache is like this: pixman_glyph_t glyphs[MAX_GLYPHS]; pixman_glyph_cache_freeze (cache); for (i = 0; i < n_glyphs; ++i) { const void *g; if (!(g = pixman_glyph_cache_lookup (cache, font_key, glyph_key))) { img = <rasterize glyph as a pixman_image_t>; g = pixman_glyph_cache_insert (cache, font_key, glyph_key, glyph_origin_x, glyph_origin_y, img); if (!g) { /* Clean up out-of-memory condition */ goto oom; } glyphs[i].pos_x = glyph_x_pos; glyphs[i].pos_y = glyph_y_pos; glyphs[i].glyph = g; } } pixman_composite_glyphs (op, src, dest, ..., cache, n_glyphs, glyphs); pixman_glyph_cache_thaw (cache); V2: - Move glyphs to front of the MRU list when they are used. Pointed out by Behdad Esfahbod. - Composite glyphs with (white IN glyph) ADD mask in order to support mixed a8 and a8r8g8b8 glyphs. Also pointed out by Behdad. - Add pixman_glyph_get_mask_format	2012-06-02 07:55:11 -04:00
Søren Sandmann Pedersen	a3ae88b71b	Add doubly linked lists This commit adds some new inline functions to maintain a doubly linked list. The way to use them is to embed a pixman_link_t into the structures that should be linked, and use a pixman_list_t as the head of the list. The new functions are pixman_list_init (pixman_list_t *list); pixman_list_prepend (pixman_list_t *list, pixman_link_t *link); pixman_list_move_to_front (pixman_list_t *list, pixman_link_t *link); There is also a new macro: CONTAINER_OF(type, member, data); that can be used to get from a pointer to a member to the containing structure. V2: Use the C89 macro offsetof() instead of rolling our own - suggested by Alan Coopersmith.	2012-06-02 07:54:48 -04:00
Søren Sandmann Pedersen	c2230fe2af	Make use of image flags in mmx and sse2 iterators Now that we have the full image flags available, the SSE2 and MMX iterators can simply check against SAMPLES_COVER_CLIP_NEAREST (which is computed in pixman_image_composite32()) instead of comparing all the x/y/width/height parameters.	2012-05-30 04:42:29 -04:00
Søren Sandmann Pedersen	c1065a9cb4	Pass the full image flags to iterators When pixman_image_composite32() is called some flags are computed that indicate various things about the composite operation that can't be deduced from the image flags themselves. These additional flags are not currently available to iterators. All they can do is read the image flags in image->common.flags. Fix that by passing the info->{src, mask, dest}_flags on to the iterator initialization and store the flags in the iter struct as "image_flags". At the same time rename the iterator flags variable to "iter_flags" to avoid confusion.	2012-05-30 04:34:29 -04:00
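
Commit fd175f9d02 above describes widening a channel value with one multiplication and one shift. The following is a minimal sketch of how such a factor and shift could be derived, not the code from the patch: the names expand_params_t, compute_expand_params() and expand_unorm() are hypothetical, and the sketch assumes 1 <= from_bits <= to_bits <= 16 so that the product fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type for this sketch; not pixman's own structure. */
typedef struct
{
    uint32_t factor; /* multiplier that replicates the source bit pattern */
    int      shift;  /* surplus low bits to drop after the multiplication */
} expand_params_t;

/* Derive factor and shift for widening from_bits -> to_bits.
 * Assumes 1 <= from_bits <= to_bits <= 16.
 */
static expand_params_t
compute_expand_params (int from_bits, int to_bits)
{
    expand_params_t p = { 0, 0 };
    int total = 0; /* width of the replicated pattern so far */

    assert (from_bits >= 1 && from_bits <= to_bits && to_bits <= 16);

    /* Set one bit per copy of the pattern until it covers to_bits. */
    while (total < to_bits)
    {
        p.factor |= (uint32_t) 1 << total;
        total += from_bits;
    }

    /* The pattern may overshoot the target width; drop the extra bits. */
    p.shift = total - to_bits;
    return p;
}

/* Widen c, which must already be limited to from_bits bits. */
static uint32_t
expand_unorm (uint32_t c, expand_params_t p)
{
    return (c * p.factor) >> p.shift;
}
```

With these assumptions, compute_expand_params (8, 16) gives the factor 0x101 with no shift, and compute_expand_params (10, 16) gives the factor 0x401 with a shift of 4, matching the two cases quoted in the commit message.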


2611 Commits
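
Commit a3ae88b71b describes an intrusive doubly linked list whose links are embedded in the structures being linked. The sketch below illustrates that pattern using the function and macro names from the commit message. The bodies, the sentinel-based layout of pixman_list_t, and the example_glyph_t and front_glyph() helpers are assumptions made for illustration and may differ from the actual pixman code.

```c
#include <assert.h>
#include <stddef.h>

typedef struct pixman_link_t pixman_link_t;
struct pixman_link_t
{
    pixman_link_t *next;
    pixman_link_t *prev;
};

/* Assumed layout: the list head is a sentinel link in a circular list. */
typedef struct
{
    pixman_link_t head;
} pixman_list_t;

/* Get from a pointer to an embedded member back to its container. */
#define CONTAINER_OF(type, member, data) \
    ((type *) ((char *) (data) - offsetof (type, member)))

static void
pixman_list_init (pixman_list_t *list)
{
    list->head.next = &list->head;
    list->head.prev = &list->head;
}

static void
pixman_list_prepend (pixman_list_t *list, pixman_link_t *link)
{
    assert (list && link);
    link->next = list->head.next;
    link->prev = &list->head;
    list->head.next->prev = link;
    list->head.next = link;
}

static void
pixman_list_move_to_front (pixman_list_t *list, pixman_link_t *link)
{
    /* Unlink from the current position, then insert at the front. */
    link->prev->next = link->next;
    link->next->prev = link->prev;
    pixman_list_prepend (list, link);
}

/* Hypothetical structure that embeds a link, e.g. for an MRU list. */
typedef struct
{
    int           id;
    pixman_link_t mru_link;
} example_glyph_t;

/* Return the first element, or NULL if the list is empty. */
static example_glyph_t *
front_glyph (pixman_list_t *list)
{
    if (list->head.next == &list->head)
        return NULL;
    return CONTAINER_OF (example_glyph_t, mru_link, list->head.next);
}
```

Because the link lives inside the element, moving an element to the front needs no allocation and no search, which suits the most-recently-used ordering mentioned in commit fce31a5ef8.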