pixman

mirror of https://salsa.debian.org/xorg-team/lib/pixman synced 2025-09-06 12:25:16 +00:00

Author	SHA1	Message	Date
Matt Turner	da6193b1fc	mmx: add missing _mm_empty calls Fixes spurious test failures on x86-32.	2012-05-27 14:59:56 -04:00
Matt Turner	62c4bdc94f	mmx: add over_reverse_n_8888 Loongson: over_reverse_n_8888 = L1: 16.04 L2: 15.35 M: 10.20 ( 27.96%) HT: 10.95 VT: 10.45 R: 9.18 RT: 6.99 ( 76Kops/s) over_reverse_n_8888 = L1: 27.40 L2: 26.67 M: 16.97 ( 45.78%) HT: 16.66 VT: 15.38 R: 14.15 RT: 9.44 ( 97Kops/s) image poppler 34.106 35.500 1.48% 6/6 image poppler 29.598 30.835 1.70% 6/6 ARM/iwMMXt: over_reverse_n_8888 = L1: 15.63 L2: 14.33 M: 10.83 ( 27.55%) HT: 9.78 VT: 9.91 R: 9.49 RT: 6.96 ( 69Kops/s) over_reverse_n_8888 = L1: 22.79 L2: 19.40 M: 13.76 ( 34.19%) HT: 11.66 VT: 11.86 R: 11.17 RT: 7.85 ( 75Kops/s) image poppler 38.040 38.606 1.10% 6/6 image poppler 31.686 32.278 0.80% 5/6	2012-05-26 20:32:27 -04:00
Matt Turner	17acc7a4c7	mmx: add add_0565_0565 Loongson: add_0565_0565 = L1: 15.37 L2: 14.91 M: 11.83 ( 16.06%) HT: 10.53 VT: 10.15 R: 9.74 RT: 6.19 ( 68Kops/s) add_0565_0565 = L1: 45.06 L2: 46.71 M: 27.45 ( 38.00%) HT: 23.76 VT: 22.84 R: 18.96 RT: 9.79 ( 104Kops/s) ARM/iwMMXt: add_0565_0565 = L1: 12.87 L2: 11.58 M: 10.11 ( 12.50%) HT: 9.06 VT: 8.66 R: 7.70 RT: 5.62 ( 58Kops/s) add_0565_0565 = L1: 31.14 L2: 28.87 M: 22.46 ( 28.60%) HT: 18.61 VT: 17.04 R: 15.21 RT: 9.35 ( 90Kops/s)	2012-05-26 20:32:27 -04:00
Matt Turner	d551dc0494	fast: add add_0565_0565 function I'll need this code for header and tail alignment loops in MMX, so I might as well implement a fast path here.	2012-05-26 20:32:27 -04:00
Matt Turner	f8dc0e9834	mmx: implement expand_4x565 in terms of expand_4xpacked565 Loongson: over_n_0565 = L1: 38.57 L2: 38.88 M: 30.01 ( 20.97%) HT: 23.60 VT: 23.88 R: 21.95 RT: 11.65 ( 113Kops/s) over_n_0565 = L1: 56.28 L2: 55.90 M: 34.20 ( 23.82%) HT: 25.66 VT: 26.60 R: 23.78 RT: 11.80 ( 115Kops/s) over_8888_0565 = L1: 35.89 L2: 36.11 M: 21.56 ( 45.47%) HT: 18.33 VT: 17.90 R: 16.27 RT: 9.07 ( 98Kops/s) over_8888_0565 = L1: 40.91 L2: 41.06 M: 23.13 ( 48.46%) HT: 19.24 VT: 18.71 R: 16.82 RT: 9.18 ( 99Kops/s) over_n_8_0565 = L1: 28.92 L2: 29.12 M: 21.42 ( 30.00%) HT: 18.37 VT: 17.75 R: 16.15 RT: 8.79 ( 91Kops/s) over_n_8_0565 = L1: 32.32 L2: 32.13 M: 22.44 ( 31.27%) HT: 19.15 VT: 18.66 R: 16.62 RT: 8.86 ( 92Kops/s) over_n_8888_0565_ca = L1: 29.33 L2: 29.22 M: 18.99 ( 66.69%) HT: 16.69 VT: 16.22 R: 14.63 RT: 8.42 ( 88Kops/s) over_n_8888_0565_ca = L1: 34.97 L2: 34.14 M: 20.32 ( 71.73%) HT: 17.67 VT: 17.19 R: 15.23 RT: 8.50 ( 89Kops/s) ARM/iwMMXt: over_n_0565 = L1: 29.70 L2: 30.53 M: 24.47 ( 14.84%) HT: 22.28 VT: 21.72 R: 21.13 RT: 12.58 ( 105Kops/s) over_n_0565 = L1: 41.42 L2: 40.00 M: 30.95 ( 19.13%) HT: 27.06 VT: 27.28 R: 23.43 RT: 14.44 ( 114Kops/s) over_8888_0565 = L1: 12.73 L2: 11.53 M: 9.07 ( 16.47%) HT: 9.00 VT: 9.25 R: 8.44 RT: 7.27 ( 76Kops/s) over_8888_0565 = L1: 23.72 L2: 21.76 M: 15.89 ( 29.51%) HT: 14.36 VT: 14.05 R: 12.44 RT: 8.94 ( 86Kops/s) over_n_8_0565 = L1: 6.80 L2: 7.15 M: 6.37 ( 7.90%) HT: 6.58 VT: 6.24 R: 6.49 RT: 5.94 ( 59Kops/s) over_n_8_0565 = L1: 12.06 L2: 11.02 M: 10.16 ( 13.43%) HT: 9.57 VT: 8.49 R: 9.10 RT: 6.86 ( 69Kops/s) over_n_8888_0565_ca = L1: 7.62 L2: 7.01 M: 6.27 ( 20.52%) HT: 6.00 VT: 6.07 R: 5.68 RT: 5.53 ( 57Kops/s) over_n_8888_0565_ca = L1: 13.54 L2: 11.96 M: 9.76 ( 30.66%) HT: 9.72 VT: 8.45 R: 9.37 RT: 6.85 ( 67Kops/s)	2012-05-26 20:32:27 -04:00
Matt Turner	51681a052f	mmx: add and use expand_4xpacked565 function Loongson: add_0565_0565 = L1: 14.39 L2: 13.98 M: 11.28 ( 15.22%) HT: 10.11 VT: 9.74 R: 9.39 RT: 6.05 ( 67Kops/s) add_0565_0565 = L1: 15.37 L2: 14.91 M: 11.83 ( 16.06%) HT: 10.53 VT: 10.15 R: 9.74 RT: 6.19 ( 68Kops/s) ARM/iwMMXt: add_0565_0565 = L1: 11.12 L2: 10.40 M: 8.82 ( 10.65%) HT: 7.98 VT: 7.41 R: 7.57 RT: 5.21 ( 54Kops/s) add_0565_0565 = L1: 12.87 L2: 11.58 M: 10.11 ( 12.50%) HT: 9.06 VT: 8.66 R: 7.70 RT: 5.62 ( 58Kops/s)	2012-05-26 20:32:27 -04:00
Søren Sandmann Pedersen	6491c70e3a	Post-release version bump to 0.27.1	2012-05-26 16:34:13 -04:00
Søren Sandmann Pedersen	b1a401e6c9	Pre-release version bump to 0.26.0	2012-05-26 16:17:14 -04:00
Ingmar Runge	f71e3dba97	Fix MSVC compilation Only up to three SSE intrinsics are supported in a function declaration.	2012-05-25 20:10:31 -04:00
Søren Sandmann Pedersen	1e59e18d73	test: Composite with solid images instead of using pixman_image_fill_* There are a couple of places where the test suite uses the pixman_image_fill_* functions to initialize images. These functions can fail, and will do so if the "fast" implementation is disabled. So to make sure the test suite passes even when using PIXMAN_DISABLE="fast", use pixman_image_composite32() with a solid image instead of pixman_image_fill_*.	2012-05-24 15:30:41 -04:00
Nemanja Lukic	30816e3068	MIPS: DSPr2: Added bilinear over_8888_8_8888 fast path. Performance numbers before/after on MIPS-74kc @ 1GHz Reference (before): cairo-perf-trace: [ # ] backend test min(s) median(s) stddev. count [ # ] image: pixman 0.25.3 [ 0] image firefox-fishtank 2289.180 2290.567 0.05% 5/6 Optimized: cairo-perf-trace: [ # ] backend test min(s) median(s) stddev. count [ # ] image: pixman 0.25.3 [ 0] image firefox-fishtank 1700.925 1708.314 0.22% 5/6	2012-05-23 13:50:05 -04:00
Nemanja Lukic	aea0522f6f	MIPS: DSPr2: Fix bug in over_n_8888_8888_ca/over_n_8888_0565_ca routines In the main loop (unrolled by a factor of 2), instead of negating the mask values multiplied by srca, the value of srca was negated and passed as the alpha argument to the UN8x4_MUL_UN8x4_ADD_UN8x4 macro. Instead of: ma = ~ma; UN8x4_MUL_UN8x4_ADD_UN8x4 (d, ma, s); the code was doing this: ma = ~srca; UN8x4_MUL_UN8x4_ADD_UN8x4 (d, ma, s); The key is substituting registers s0/s1 (containing the srca value) with t0/t1 (containing the mask values multiplied by srca). Register usage is also improved (fewer registers are saved on the stack for the over_n_8888_8888_ca routine). The bug was introduced in commit `d2ee5631` and revealed by the composite test.	2012-05-23 13:41:44 -04:00
Søren Sandmann Pedersen	74bf5dc2f9	demos: Add parrot.jpg to EXTRA_DIST Pointed out by Cyril Brulebois.	2012-05-20 13:09:16 -04:00
Cyril Brulebois	ae5a109768	Upload to experimental.	2012-05-20 17:56:41 +02:00
Cyril Brulebois	a2283057a6	Remove demos/parrot.jpg before building the source package. Let's avoid “binary file contents changed” until it's shipped in the upstream tarball.	2012-05-20 17:56:18 +02:00
Cyril Brulebois	5cb7202a34	Bump changelogs.	2012-05-20 17:41:34 +02:00
Cyril Brulebois	4ed6f63c09	Merge branch 'upstream-experimental' into debian-experimental	2012-05-20 17:40:56 +02:00
Matt Turner	55698584be	configure.ac: Fail the ARM/iwMMXt test if not compiling with -march=iwmmxt If not compiling with -march=iwmmxt, the configure test will still pass, thinking that the __builtin_arm_* intrinsic is a function instead of generating a single instruction. Since no linking is done, the configure test doesn't catch this, and we get linking errors in the build.	2012-05-15 16:41:22 -04:00
Søren Sandmann Pedersen	3682b61515	Post-release version bump to 0.25.7	2012-05-15 13:38:44 -04:00
Søren Sandmann Pedersen	1e1a00e964	Pre-release version bump to 0.25.6 Note that 0.25.4 was a botched release that doesn't have a tag and doesn't correspond to any commit ID. It was however uploaded and announced, so I'll just use the 0.25.6 version number.	2012-05-15 13:20:09 -04:00
Søren Sandmann Pedersen	b2c16aaadf	demos/Makefile.am: Add parrot.c to EXTRA_DIST To get 'make distcheck' to pass.	2012-05-15 13:19:19 -04:00
Matt Turner	50d3088d78	configure.ac: Rename loongson -> loongson-mmi Make it match with the other fast paths, and the PIXMAN_DISABLE value is already loongson-mmi.	2012-05-11 21:59:13 -04:00
Matt Turner	a0a40cb822	configure.ac: Fix loongson-mmi out-of-tree builds When building out-of-tree, gcc wasn't able to find loongson-mmintrin.h to compile the test program. Add -I$srcdir to CFLAGS to point gcc to it.	2012-05-11 21:49:42 -04:00
Nemanja Lukic	618a08e6aa	MIPS: DSPr2: Added over_n_8_8888 and over_n_8_0565 fast paths. Performance numbers before/after on MIPS-74kc @ 1GHz Reference (before): lowlevel-blt-bench: over_n_8_8888 = L1: 10.40 L2: 9.79 M: 8.47 ( 33.62%) HT: 7.64 VT: 7.59 R: 7.48 RT: 5.30 ( 40Kops/s) over_n_8_0565 = L1: 7.40 L2: 7.23 M: 6.78 ( 17.94%) HT: 6.23 VT: 6.17 R: 6.14 RT: 4.62 ( 37Kops/s) Optimized: lowlevel-blt-bench: over_n_8_8888 = L1: 27.25 L2: 26.24 M: 18.15 ( 72.12%) HT: 14.52 VT: 14.31 R: 13.83 RT: 7.57 ( 48Kops/s) over_n_8_0565 = L1: 18.91 L2: 17.59 M: 15.06 ( 39.90%) HT: 12.18 VT: 11.98 R: 11.83 RT: 6.80 ( 46Kops/s)	2012-05-11 17:11:27 -04:00
Matt Turner	7d4beedc61	mmx: add and use pack_4x565 function The pack_4x565 function makes use of the pack_4xpacked565 function, which uses pmadd. Some of the speedup is probably attributable to removing the artificial serialization imposed by the vdest = pack_565 (..., vdest, 0); vdest = pack_565 (..., vdest, 1); ... pattern. Loongson: over_n_0565 = L1: 16.44 L2: 16.42 M: 13.83 ( 9.85%) HT: 12.83 VT: 12.61 R: 12.34 RT: 8.90 ( 93Kops/s) over_n_0565 = L1: 42.48 L2: 42.53 M: 29.83 ( 21.20%) HT: 23.39 VT: 23.72 R: 21.80 RT: 11.60 ( 113Kops/s) over_8888_0565 = L1: 15.61 L2: 15.42 M: 12.11 ( 25.79%) HT: 11.07 VT: 10.70 R: 10.37 RT: 7.25 ( 82Kops/s) over_8888_0565 = L1: 35.01 L2: 35.20 M: 21.42 ( 45.57%) HT: 18.12 VT: 17.61 R: 16.09 RT: 9.01 ( 97Kops/s) over_n_8_0565 = L1: 15.17 L2: 14.94 M: 12.57 ( 17.86%) HT: 11.96 VT: 11.52 R: 10.79 RT: 7.31 ( 79Kops/s) over_n_8_0565 = L1: 29.83 L2: 29.79 M: 21.85 ( 30.94%) HT: 18.82 VT: 18.25 R: 16.15 RT: 8.72 ( 91Kops/s) over_n_8888_0565_ca = L1: 15.25 L2: 15.02 M: 11.64 ( 41.39%) HT: 11.08 VT: 10.72 R: 10.02 RT: 7.00 ( 77Kops/s) over_n_8888_0565_ca = L1: 30.12 L2: 29.99 M: 19.47 ( 68.99%) HT: 17.05 VT: 16.55 R: 14.67 RT: 8.38 ( 88Kops/s) ARM/iwMMXt: over_n_0565 = L1: 19.29 L2: 19.88 M: 17.38 ( 10.54%) HT: 15.53 VT: 16.11 R: 13.69 RT: 11.00 ( 96Kops/s) over_n_0565 = L1: 36.02 L2: 34.85 M: 28.04 ( 16.97%) HT: 22.12 VT: 24.21 R: 22.36 RT: 12.22 ( 103Kops/s) over_8888_0565 = L1: 18.38 L2: 16.59 M: 12.34 ( 22.29%) HT: 11.67 VT: 11.71 R: 11.02 RT: 6.89 ( 72Kops/s) over_8888_0565 = L1: 24.96 L2: 22.17 M: 15.11 ( 26.81%) HT: 14.14 VT: 13.71 R: 13.18 RT: 8.13 ( 78Kops/s) over_n_8_0565 = L1: 14.65 L2: 12.44 M: 11.56 ( 14.50%) HT: 10.93 VT: 10.39 R: 10.06 RT: 7.05 ( 70Kops/s) over_n_8_0565 = L1: 18.37 L2: 14.98 M: 13.97 ( 16.51%) HT: 12.67 VT: 10.35 R: 11.80 RT: 8.14 ( 74Kops/s) over_n_8888_0565_ca = L1: 14.27 L2: 12.93 M: 10.52 ( 33.23%) HT: 9.70 VT: 9.90 R: 9.31 RT: 6.34 ( 65Kops/s) over_n_8888_0565_ca = L1: 19.69 L2: 17.58 M: 13.40 ( 42.35%) HT: 11.75 VT: 11.33 R: 11.17 RT: 7.49 ( 73Kops/s)	2012-05-10 16:21:07 -04:00
Matt Turner	2beabd9fed	configure.ac: make -march=loongson2f come before CFLAGS Otherwise we'd have -march=loongson2f being overridden by automake's CFLAGS ordering which causes build failures when -march=<not loongson2f> is specified by the user.	2012-05-10 16:15:34 -04:00
Søren Sandmann Pedersen	dadb9a318b	Add Makefile.win32 and Makefile.win32.common to EXTRA_DIST https://bugs.freedesktop.org/show_bug.cgi?id=46905	2012-05-10 15:54:32 -04:00
Matt Turner	3c57ec471e	.gitignore: add demos/checkerboard and demos/quad2quad	2012-05-09 22:50:50 -04:00
Matt Turner	2d431b53d3	mmx: Use wpackhus in src_x888_0565 on iwMMXt iwMMXt has an unsigned saturation pack instruction, while MMX/EXT and Loongson don't. ARM/iwMMXt: src_8888_0565 = L1: 110.38 L2: 82.33 M: 40.92 ( 73.22%) HT: 35.63 VT: 32.22 R: 30.07 RT: 18.40 ( 132Kops/s) src_8888_0565 = L1: 117.91 L2: 83.05 M: 41.52 ( 75.58%) HT: 37.63 VT: 35.40 R: 29.37 RT: 19.39 ( 134Kops/s)	2012-04-27 16:39:13 -04:00
Matt Turner	2ddd1c498b	mmx: add src_8888_0565 Uses the pmadd technique described in http://software.intel.com/sites/landingpage/legacy/mmx/MMX_App_24-16_Bit_Conversion.pdf The technique uses the packssdw instruction, which uses signed saturation. This works in their example because they pack 888 to 555, leaving the high bit zero. For packing to 565 it is unsuitable, so we replace it with an or+shuffle. Loongson: src_8888_0565 = L1: 106.13 L2: 83.57 M: 33.46 ( 68.90%) HT: 30.29 VT: 27.67 R: 26.11 RT: 15.06 ( 135Kops/s) src_8888_0565 = L1: 122.10 L2: 117.53 M: 37.97 ( 78.58%) HT: 33.14 VT: 30.09 R: 29.01 RT: 15.76 ( 139Kops/s) ARM/iwMMXt: src_8888_0565 = L1: 67.88 L2: 56.61 M: 31.20 ( 56.74%) HT: 29.22 VT: 27.01 R: 25.39 RT: 19.29 ( 130Kops/s) src_8888_0565 = L1: 110.38 L2: 82.33 M: 40.92 ( 73.22%) HT: 35.63 VT: 32.22 R: 30.07 RT: 18.40 ( 132Kops/s)	2012-04-27 14:12:28 -04:00
Matt Turner	3e8fe65a08	mmx: add x8r8g8b8 fetcher Loongson: add_x888_x888 = L1: 29.36 L2: 27.81 M: 14.05 ( 38.74%) HT: 12.45 VT: 11.78 R: 11.52 RT: 7.23 ( 75Kops/s) add_x888_x888 = L1: 36.06 L2: 34.55 M: 14.81 ( 41.03%) HT: 14.01 VT: 13.41 R: 13.06 RT: 9.06 ( 90Kops/s) src_x888_8_x888 = L1: 21.92 L2: 20.15 M: 13.35 ( 41.42%) HT: 11.70 VT: 10.95 R: 10.53 RT: 6.18 ( 65Kops/s) src_x888_8_x888 = L1: 25.43 L2: 23.51 M: 14.12 ( 44.00%) HT: 13.14 VT: 12.50 R: 11.86 RT: 7.49 ( 76Kops/s) over_x888_8_0565 = L1: 10.64 L2: 10.17 M: 7.74 ( 21.35%) HT: 6.83 VT: 6.55 R: 6.34 RT: 4.03 ( 46Kops/s) over_x888_8_0565 = L1: 11.41 L2: 10.97 M: 8.07 ( 22.36%) HT: 7.42 VT: 7.18 R: 6.92 RT: 4.62 ( 52Kops/s) ARM/iwMMXt: add_x888_x888 = L1: 22.10 L2: 18.93 M: 13.48 ( 32.29%) HT: 11.32 VT: 10.64 R: 10.36 RT: 6.51 ( 61Kops/s) add_x888_x888 = L1: 24.26 L2: 20.83 M: 14.52 ( 35.64%) HT: 12.66 VT: 12.98 R: 11.34 RT: 7.69 ( 72Kops/s) src_x888_8_x888 = L1: 19.33 L2: 17.66 M: 14.26 ( 38.43%) HT: 11.53 VT: 10.83 R: 10.57 RT: 6.12 ( 58Kops/s) src_x888_8_x888 = L1: 21.23 L2: 19.60 M: 15.41 ( 42.55%) HT: 12.66 VT: 13.30 R: 11.55 RT: 7.32 ( 67Kops/s) over_x888_8_0565 = L1: 8.15 L2: 7.56 M: 6.50 ( 15.58%) HT: 5.73 VT: 5.49 R: 5.50 RT: 3.53 ( 38Kops/s) over_x888_8_0565 = L1: 8.35 L2: 7.85 M: 6.68 ( 16.40%) HT: 6.12 VT: 5.97 R: 5.78 RT: 4.03 ( 43Kops/s)	2012-04-27 13:42:36 -04:00
Matt Turner	c2b1630d96	mmx: add a8 fetcher oprofile of xfce4-terminal-a1 210535 9.0407 libpixman-1.so.0.25.3 fetch_scanline_a8 144802 6.0054 libpixman-1.so.0.25.3 mmx_fetch_a8 Loongson: add_8_8_8 = L1: 17.98 L2: 17.28 M: 14.28 ( 19.79%) HT: 11.11 VT: 10.38 R: 9.97 RT: 5.14 ( 55Kops/s) add_8_8_8 = L1: 20.44 L2: 19.65 M: 15.62 ( 21.53%) HT: 12.86 VT: 11.98 R: 11.32 RT: 6.13 ( 64Kops/s) src_8888_8_0565 = L1: 19.97 L2: 18.59 M: 13.42 ( 32.55%) HT: 11.46 VT: 10.78 R: 10.33 RT: 5.87 ( 61Kops/s) src_8888_8_0565 = L1: 21.16 L2: 19.68 M: 13.94 ( 33.64%) HT: 12.31 VT: 11.52 R: 11.02 RT: 6.54 ( 68Kops/s) src_x888_8_x888 = L1: 20.54 L2: 18.88 M: 13.07 ( 40.74%) HT: 11.05 VT: 10.36 R: 10.02 RT: 5.68 ( 60Kops/s) src_x888_8_x888 = L1: 21.92 L2: 20.15 M: 13.35 ( 41.42%) HT: 11.70 VT: 10.95 R: 10.53 RT: 6.18 ( 65Kops/s) over_x888_8_0565 = L1: 10.32 L2: 9.85 M: 7.63 ( 21.13%) HT: 6.56 VT: 6.30 R: 6.12 RT: 3.80 ( 43Kops/s) over_x888_8_0565 = L1: 10.64 L2: 10.17 M: 7.74 ( 21.35%) HT: 6.83 VT: 6.55 R: 6.34 RT: 4.03 ( 46Kops/s) ARM/iwMMXt: add_8_8_8 = L1: 13.10 L2: 11.67 M: 10.74 ( 13.46%) HT: 8.62 VT: 8.15 R: 7.94 RT: 4.39 ( 44Kops/s) add_8_8_8 = L1: 13.81 L2: 12.79 M: 11.63 ( 13.93%) HT: 9.33 VT: 9.20 R: 9.04 RT: 5.43 ( 52Kops/s) src_8888_8_0565 = L1: 16.62 L2: 15.07 M: 12.52 ( 27.46%) HT: 10.07 VT: 10.17 R: 9.95 RT: 5.64 ( 54Kops/s) src_8888_8_0565 = L1: 16.84 L2: 16.11 M: 13.22 ( 27.71%) HT: 11.74 VT: 10.90 R: 10.80 RT: 6.66 ( 62Kops/s) src_x888_8_x888 = L1: 17.49 L2: 16.22 M: 13.73 ( 38.73%) HT: 10.10 VT: 10.33 R: 9.55 RT: 5.21 ( 52Kops/s) src_x888_8_x888 = L1: 19.33 L2: 17.66 M: 14.26 ( 38.43%) HT: 11.53 VT: 10.83 R: 10.57 RT: 6.12 ( 58Kops/s) over_x888_8_0565 = L1: 7.57 L2: 7.29 M: 6.37 ( 15.97%) HT: 5.53 VT: 5.33 R: 5.21 RT: 3.22 ( 35Kops/s) over_x888_8_0565 = L1: 8.15 L2: 7.56 M: 6.50 ( 15.58%) HT: 5.73 VT: 5.49 R: 5.50 RT: 3.53 ( 38Kops/s)	2012-04-27 13:42:26 -04:00
Matt Turner	20bad64d9a	mmx: add r5g6b5 fetcher Loongson: add_0565_0565 = L1: 12.73 L2: 12.26 M: 10.05 ( 13.87%) HT: 8.77 VT: 8.50 R: 8.25 RT: 5.28 ( 58Kops/s) add_0565_0565 = L1: 14.04 L2: 13.63 M: 10.96 ( 15.19%) HT: 9.73 VT: 9.43 R: 9.11 RT: 5.93 ( 64Kops/s) ARM/iwMMXt: add_0565_0565 = L1: 10.36 L2: 10.03 M: 9.04 ( 10.88%) HT: 3.11 VT: 7.16 R: 7.72 RT: 5.12 ( 51Kops/s) add_0565_0565 = L1: 10.84 L2: 10.20 M: 9.15 ( 11.46%) HT: 7.60 VT: 7.82 R: 7.70 RT: 5.41 ( 53Kops/s)	2012-04-27 13:42:16 -04:00
Matt Turner	c136e535ad	mmx: Use Loongson pextrh instruction in expand565 Same story as pinsrh in the previous commit. text data bss dec hex filename 25336 1952 0 27288 6a98 .libs/libpixman_loongson_mmi_la-pixman-mmx.o 25072 1952 0 27024 6990 .libs/libpixman_loongson_mmi_la-pixman-mmx.o -dsll: 95 +dsll: 70 -dsrl: 135 +dsrl: 105 -ldc1: 462 +ldc1: 445 -lw: 721 +lw: 700 +pextrh: 30	2012-04-27 13:42:07 -04:00
Matt Turner	facceb4a1f	mmx: Use Loongson pinsrh instruction in pack_565 The pinsrh instruction is analogous to MMX EXT's pinsrw, except that, like other Loongson vector instructions, it cannot access the general-purpose registers. In the case of other Loongson vector instructions this is a headache, but here it is actually a good thing. Since the instruction is different from MMX, I've named the intrinsic loongson_insert_pi16. text data bss dec hex filename 25976 1952 0 27928 6d18 .libs/libpixman_loongson_mmi_la-pixman-mmx.o 25336 1952 0 27288 6a98 .libs/libpixman_loongson_mmi_la-pixman-mmx.o -and: 181 +and: 147 -dsll: 143 +dsll: 95 -dsrl: 87 +dsrl: 135 -ldc1: 523 +ldc1: 462 -lw: 767 +lw: 721 +pinsrh: 35	2012-04-27 13:41:47 -04:00
Matt Turner	6d29b7d755	mmx: don't pack and unpack src unnecessarily The combine function was store8888'ing the result, and all consumers were immediately load8888'ing it, causing lots of unnecessary pack and unpack instructions. It's a very straightforward conversion, except for mmx_combine_over_u and mmx_combine_saturate_u. mmx_combine_over_u was testing the integer result to skip pixels, so we use the is_* functions to test the __m64 data directly without loading it into an integer register. For mmx_combine_saturate_u there's not a lot we can do, since it uses DIV_UN8.	2012-04-27 13:35:31 -04:00
Matt Turner	ee75003425	mmx: introduce is_equal, is_opaque, and is_zero functions To be used by the next commit.	2012-04-27 13:35:25 -04:00
Matt Turner	10c77b339f	mmx: simplify srcsrcsrcsrc calculation in over_n_8_0565	2012-04-27 13:35:19 -04:00
Matt Turner	e06947d101	mmx: remove unnecessary uint64_t<->__m64 conversions Loongson: add_8888_8888 = L1: 68.73 L2: 55.09 M: 25.39 ( 68.18%) HT: 25.28 VT: 22.42 R: 20.74 RT: 13.26 ( 131Kops/s) add_8888_8888 = L1: 159.19 L2: 114.10 M: 30.74 ( 77.91%) HT: 27.63 VT: 24.99 R: 24.61 RT: 14.49 ( 141Kops/s)	2012-04-27 13:35:14 -04:00
Matt Turner	c78e986085	mmx: compile on MIPS for Loongson MMI optimizations image image16 evolution 32.985 -> 29.667 27.314 -> 23.870 firefox-planet-gnome 197.982 -> 180.437 220.986 -> 205.057 gnome-system-monitor 48.482 -> 49.752 52.820 -> 49.528 gnome-terminal-vim 60.799 -> 50.528 51.655 -> 44.131 grads-heat-map 3.167 -> 3.181 3.328 -> 3.321 gvim 38.646 -> 32.552 38.126 -> 34.453 midori-zoomed 44.371 -> 43.338 28.860 -> 28.865 ocitysmap 23.065 -> 18.057 23.046 -> 18.055 poppler 43.676 -> 36.077 43.065 -> 36.090 swfdec-giant-steps 20.166 -> 20.365 22.354 -> 16.578 swfdec-youtube 31.502 -> 28.118 44.052 -> 41.771 xfce4-terminal-a1 69.517 -> 51.288 62.225 -> 53.309	2012-04-27 13:35:05 -04:00
Matt Turner	4e0c7902b2	mmx: make ldq_u take __m64* directly Before, if __m64 is allocated in vector or floating-point registers, __m64 vs = ldq_u((uint64_t *)src); would cause src to be loaded into an integer register and then transferred to an __m64 register. By switching ldq_u's argument type to __m64 * we give the compiler enough information to recognize that it can load to the vector register directly. This patch is necessary for the Loongson optimizations when __m64 is typedef'd as double.	2012-04-27 13:34:59 -04:00
Matt Turner	2e54b76a2d	mmx: add load function and use it in add_8888_8888	2012-04-27 13:34:53 -04:00
Matt Turner	084e3f2f4b	mmx: add store function and use it in add_8888_8888	2012-04-27 13:34:45 -04:00
Søren Sandmann Pedersen	e24c1c849d	bits_image_fetch_pixel_convolution(): Make sure channels are signed In the computation: srtot += RED_8 (pixel) * f RED_8 (pixel) is an unsigned quantity, which means the signed filter coefficient f gets converted to an unsigned integer before the multiplication. We get away with this because when the 32 bit unsigned result is converted to int32_t, the correct sign is produced. But if srtot had been an int64_t, the result would have been a very large positive number. Fix this by explicitly casting the channels to int.	2012-04-20 10:17:13 -04:00
Søren Sandmann Pedersen	4d2fee1406	test/utils.c: Clip values to the [0, 255] interval Unpremultiplying a superluminescent pixel can result in values greater than 255.	2012-04-20 10:17:13 -04:00
Matt Turner	e291764584	configure.ac: fix iwMMXt/gcc version error message	2012-04-18 18:14:13 -04:00
Matt Turner	b87cd1f605	mmx: fix _mm_shuffle_pi16 function when compiling without optimization The last argument must be an immediate value, and when compiling without optimization the compiler might not recognize this. So use a macro if not optimizing.	2012-04-15 14:03:08 -04:00
Matt Turner	e927d23971	configure.ac: require >= gcc-4.5 for ARM iwMMXt We're using a patched gcc-4.5, and having to modify configure.ac and autoreconf between changes is annoying. And besides, 4.5, 4.6, and 4.7's iwMMXt intrinsic support is equally broken, and we test a known broken intrinsic in the configure test program, so the version check is rather meaningless.	2012-04-15 14:00:17 -04:00
Matt Turner	0531170436	mmx: Use force_inline instead of __inline__ (bug 46906) Fixes the build on MSVC.	2012-04-05 17:36:05 -04:00
Matt Turner	b950bb12dc	mmx: enable over_n_0565 for b5g6r5 Signed-off-by: Matt Turner <mattst88@gmail.com>	2012-04-05 17:34:26 -04:00
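The fix in da6193b1fc ("mmx: add missing _mm_empty calls") matters because the MMX registers alias the x87 floating-point register stack: after MMX code runs, `emms` (exposed as `_mm_empty()`) must execute before any x87 floating-point instruction. On x86-32, doubles normally go through x87, which fits the failures showing up there. A minimal sketch with a hypothetical helper `add_8888_sat` (not pixman code) that uses MMX only when the compiler defines `__MMX__` and falls back to plain C otherwise:

```c
#include <stdint.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: per-byte saturating add of two 8888 pixels. */
static uint32_t add_8888_sat (uint32_t a, uint32_t b)
{
#if defined(__MMX__)
    __m64 r = _mm_adds_pu8 (_mm_cvtsi32_si64 ((int) a),
                            _mm_cvtsi32_si64 ((int) b));
    uint32_t out = (uint32_t) _mm_cvtsi64_si32 (r);
    _mm_empty ();   /* emms: release the MMX state before any x87 code */
    return out;
#else
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8)
    {
        uint32_t s = ((a >> shift) & 0xff) + ((b >> shift) & 0xff);
        out |= (s > 0xff ? 0xff : s) << shift;
    }
    return out;
#endif
}
```

The `_mm_empty()` call sits on the only exit from the MMX section; a function with several returns would need it before each one.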
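The expand_4xpacked565 entries (51681a052f, f8dc0e9834) work on four r5g6b5 pixels held in one 64-bit value. The scalar model below shows that layout. It assumes little-endian lane order (the first pixel in the low 16 bits) and fills alpha with 0xff, and `expand_4x565_scalar` is a hypothetical name rather than the MMX function.

```c
#include <stdint.h>

/* Widen one r5g6b5 pixel to a8r8g8b8, replicating the high bits of each
 * channel into its low bits. Alpha is set to 0xff (an assumption). */
static uint32_t expand_565 (uint16_t p)
{
    uint32_t r = (p >> 11) & 0x1f, g = (p >> 5) & 0x3f, b = p & 0x1f;

    r = (r << 3) | (r >> 2);
    g = (g << 2) | (g >> 4);
    b = (b << 3) | (b >> 2);
    return 0xff000000u | (r << 16) | (g << 8) | b;
}

/* Four r5g6b5 pixels packed in one 64-bit value: lane i occupies bits
 * 16*i .. 16*i+15, so on a little-endian target lane 0 is the first
 * pixel in memory. */
static void expand_4x565_scalar (uint64_t packed, uint32_t out[4])
{
    for (int i = 0; i < 4; i++)
        out[i] = expand_565 ((uint16_t) (packed >> (16 * i)));
}
```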
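The DSPr2 bug in aea0522f6f is easiest to see per channel. With a solid source of alpha `srca` and a component-alpha mask `m`, OVER computes `d = s*m + d*(1 - m*srca)`, with all values as fractions of 255. The sketch below is a hypothetical scalar reference, not the MIPS code: it implements that formula and, with `buggy` set, reproduces the `ma = ~srca` variant described in the message. `mul_un8` is the usual rounded `a*b/255`.

```c
#include <stdint.h>

/* a * b / 255 with rounding, the usual 8-bit multiply. */
static uint32_t mul_un8 (uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 0x80;
    return ((t >> 8) + t) >> 8;
}

/* Component-alpha OVER of a solid source through a per-channel mask. */
static uint32_t over_n_ca (uint32_t src, uint32_t mask, uint32_t dst, int buggy)
{
    uint32_t srca = src >> 24, out = 0;

    for (int sh = 0; sh < 32; sh += 8)
    {
        uint32_t s = (src >> sh) & 0xff;
        uint32_t m = (mask >> sh) & 0xff;
        uint32_t d = (dst >> sh) & 0xff;
        uint32_t sm = mul_un8 (s, m);                 /* source * mask */
        uint32_t ma = buggy ? 255 - srca              /* ma = ~srca (the bug) */
                            : 255 - mul_un8 (m, srca); /* ma = ~(m * srca) */
        uint32_t v = mul_un8 (d, ma) + sm;            /* d * ma + s, saturated */
        out |= (v > 255 ? 255 : v) << sh;
    }
    return out;
}
```

A zero mask is a quick check: the correct version leaves the destination untouched, while the buggy one still scales it by `~srca`.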
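The two src_8888_0565 entries (2d431b53d3, 2ddd1c498b) depend on how a 32-bit lane is narrowed to 16 bits. An r5g6b5 value with any red bit set has bit 15 set, so it exceeds INT16_MAX: signed saturation (packssdw) clamps it to 0x7fff, while an unsigned saturating pack keeps it. A per-lane scalar model, with hypothetical function names:

```c
#include <stdint.h>

/* Narrow a 32-bit lane to 16 bits with signed saturation
 * (packssdw-style): the result is clamped to [-32768, 32767]. */
static uint16_t pack_sat_s16 (int32_t v)
{
    if (v > INT16_MAX) v = INT16_MAX;
    if (v < INT16_MIN) v = INT16_MIN;
    return (uint16_t) v;
}

/* Narrow with unsigned saturation: clamped to [0, 65535]. */
static uint16_t pack_sat_u16 (int32_t v)
{
    if (v > UINT16_MAX) v = UINT16_MAX;
    if (v < 0) v = 0;
    return (uint16_t) v;
}
```

Pure red (0xf800) survives only the unsigned pack, which is why the MMX/Loongson code needs an or+shuffle instead.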
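2ddd1c498b packs x8r8g8b8 to r5g6b5 with pmadd (pmaddwd), which multiplies 16-bit halves by constants and adds the products, so one instruction moves red and blue into place together. The scalar model below uses multipliers derived here for x8r8g8b8 input (4 for blue, 0x2000 for red). They are not copied from pixman, and the commit's or+shuffle step that combines two such results is omitted.

```c
#include <stdint.h>

/* Scalar model of pmaddwd on one 32-bit lane: split it into two signed
 * 16-bit halves, multiply each by its own constant, add the products. */
static int32_t madd_lane (uint32_t x, int16_t m_lo, int16_t m_hi)
{
    int16_t lo = (int16_t) (x & 0xffff);  /* callers keep both halves <= 0xf8 */
    int16_t hi = (int16_t) (x >> 16);
    return (int32_t) lo * m_lo + (int32_t) hi * m_hi;
}

/* x8r8g8b8 -> r5g6b5: one multiply-add shifts red and blue together. */
static uint16_t pack_565_madd (uint32_t p)
{
    uint32_t rb = p & 0x00f800f8u;          /* top 5 bits of r (high half) and b (low half) */
    int32_t t = madd_lane (rb, 4, 0x2000);  /* (b & 0xf8) << 2  +  (r & 0xf8) << 13 */
    t |= (int32_t) (p & 0x0000fc00u);       /* top 6 bits of g already sit at bits 10..15 */
    return (uint16_t) (t >> 5);             /* r: 11..15, g: 5..10, b: 0..4 */
}

/* Plain shift-and-mask version, for comparison. */
static uint16_t pack_565_shift (uint32_t p)
{
    return (uint16_t) (((p >> 8) & 0xf800) | ((p >> 5) & 0x07e0) | ((p >> 3) & 0x001f));
}
```

The red, green and blue fields land in disjoint bit ranges before the final shift, so the multiply-add never carries between them.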
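The fetcher entries (3e8fe65a08, c2b1630d96) convert a source scanline into a8r8g8b8 before combining. These are scalar equivalents of what the x8r8g8b8 and a8 fetchers produce. The function names here are hypothetical, and the commits implement the same conversions with MMX.

```c
#include <stdint.h>

/* x8r8g8b8 -> a8r8g8b8: the unused x byte becomes opaque alpha. */
static void fetch_x8r8g8b8_line (const uint32_t *src, uint32_t *buffer, int width)
{
    for (int i = 0; i < width; i++)
        buffer[i] = src[i] | 0xff000000u;
}

/* a8 -> a8r8g8b8: the 8-bit value becomes alpha, color channels are zero. */
static void fetch_a8_line (const uint8_t *src, uint32_t *buffer, int width)
{
    for (int i = 0; i < width; i++)
        buffer[i] = (uint32_t) src[i] << 24;
}
```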
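pinsrh and pextrh (facceb4a1f, c136e535ad) insert and extract one 16-bit lane of a 64-bit vector. Without them, the same job takes a shift-and-mask sequence like the one below. This is a scalar model of both operations; `insert_pi16` and `extract_pi16` are hypothetical names (the commit calls its intrinsic loongson_insert_pi16).

```c
#include <stdint.h>

/* Replace 16-bit lane n (0..3) of v with x. */
static uint64_t insert_pi16 (uint64_t v, uint16_t x, int n)
{
    uint64_t mask = (uint64_t) 0xffff << (16 * n);
    return (v & ~mask) | ((uint64_t) x << (16 * n));
}

/* Read 16-bit lane n (0..3) of v. */
static uint16_t extract_pi16 (uint64_t v, int n)
{
    return (uint16_t) (v >> (16 * n));
}
```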
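6d29b7d755 keeps mmx_combine_over_u's pixel-skipping tests but moves them onto the vector data via is_opaque and is_zero (ee75003425). Below is a scalar sketch of a unified OVER combiner on premultiplied a8r8g8b8 with those two early-outs. The helpers are hypothetical stand-ins written on plain integers, whereas the point of the commit is to test __m64 values without moving them to integer registers.

```c
#include <stdint.h>

static int is_opaque (uint32_t p) { return (p >> 24) == 0xff; }
static int is_zero (uint32_t p)   { return p == 0; }

/* a * b / 255 with rounding. */
static uint32_t mul_un8 (uint32_t a, uint32_t b)
{
    uint32_t t = a * b + 0x80;
    return ((t >> 8) + t) >> 8;
}

/* Premultiplied OVER: d = s + d * (1 - alpha(s)), per channel. */
static void combine_over_u (uint32_t *dst, const uint32_t *src, int width)
{
    for (int i = 0; i < width; i++)
    {
        uint32_t s = src[i];
        if (is_zero (s))
            continue;               /* transparent source: dst unchanged */
        if (is_opaque (s))
        {
            dst[i] = s;             /* opaque source replaces dst */
            continue;
        }
        uint32_t ia = 255 - (s >> 24), d = dst[i], out = 0;
        for (int sh = 0; sh < 32; sh += 8)
        {
            uint32_t v = ((s >> sh) & 0xff) + mul_un8 ((d >> sh) & 0xff, ia);
            out |= (v > 255 ? 255 : v) << sh;
        }
        dst[i] = out;
    }
}
```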
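4e0c7902b2 changes ldq_u so the compiler sees a load of the vector type rather than of uint64_t. Here is a portable sketch of the two shapes, assuming a 64-bit vector type that is a typedef for double as on Loongson (`vec64` is hypothetical). Both return the same bits; whether the second actually saves a register transfer depends on the compiler and target. The sketch takes `const void *` and uses memcpy so the unaligned read stays well defined, instead of pixman's `__m64 *` parameter.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: a 64-bit vector type that, as on Loongson, is a typedef
 * for double and therefore lives in floating-point registers. */
typedef double vec64;

/* Old shape: read the bytes as an integer, then move them into the
 * vector type (an integer load plus a register transfer). */
static vec64 ldq_u_via_int (const void *p)
{
    uint64_t t;
    vec64 v;
    memcpy (&t, p, sizeof t);
    memcpy (&v, &t, sizeof v);
    return v;
}

/* New shape: read the bytes straight into the vector type, leaving the
 * compiler free to load them into a vector/FP register directly. */
static vec64 ldq_u_direct (const void *p)
{
    vec64 v;
    memcpy (&v, p, sizeof v);
    return v;
}
```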
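e24c1c849d comes down to C's usual arithmetic conversions: an unsigned channel times a signed coefficient is computed in unsigned arithmetic. The sketch assumes a 32-bit `int`; `RED_8` below is a local stand-in that, like the macro in the message, yields an unsigned value from a uint32_t pixel.

```c
#include <stdint.h>

/* Local stand-in: unsigned, because pixel is uint32_t. */
#define RED_8(pixel) (((pixel) >> 16) & 0xff)

/* One multiply-accumulate step into a 64-bit total, as written before the fix. */
static int64_t accumulate_buggy (uint32_t pixel, int32_t f)
{
    int64_t srtot = 0;
    srtot += RED_8 (pixel) * f;        /* f is converted to unsigned first */
    return srtot;
}

/* Same step with the channel cast to int, as in the fix. */
static int64_t accumulate_fixed (uint32_t pixel, int32_t f)
{
    int64_t srtot = 0;
    srtot += (int) RED_8 (pixel) * f;  /* signed multiply */
    return srtot;
}
```

With red = 200 and f = -3, the buggy version yields 2^32 - 600 instead of -600, which is the "very large positive number" the message warns about.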
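4d2fee1406 clips unpremultiplied values. A premultiplied channel should not exceed alpha; when it does (a superluminescent pixel), dividing by alpha gives more than 255. Below is a sketch of an unpremultiply with that clip, using round-to-nearest and returning 0 when alpha is 0. Both choices are assumptions here, not taken from test/utils.c.

```c
#include <stdint.h>

/* Unpremultiply one 8-bit channel c by alpha a, clipped to [0, 255]. */
static uint8_t unpremultiply (uint8_t c, uint8_t a)
{
    if (a == 0)
        return 0;                                  /* assumption: no color at zero alpha */
    uint32_t v = ((uint32_t) c * 255 + a / 2) / a; /* round to nearest */
    return (uint8_t) (v > 255 ? 255 : v);          /* superluminescent: c > a */
}
```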
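0531170436 replaces `__inline__`, a GCC spelling that MSVC does not accept, with force_inline. A portable macro along these lines (a sketch, not a copy of pixman's definition) picks an inlining hint each compiler understands; `clamp_255` is a hypothetical function that shows usage.

```c
/* Strongest inlining hint available for the current compiler. */
#if defined(_MSC_VER)
#  define force_inline __forceinline
#elif defined(__GNUC__)
#  define force_inline __inline__ __attribute__ ((__always_inline__))
#else
#  define force_inline inline
#endif

/* Hypothetical example function. */
static force_inline int
clamp_255 (int v)
{
    return v < 0 ? 0 : v > 255 ? 255 : v;
}
```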


2611 Commits