Note that 0.25.4 was a botched release that doesn't have a tag and
doesn't correspond to any commit ID. It was however uploaded and
announced, so I'll just use the 0.25.6 version number.
Otherwise we'd have -march=loongson2f being overridden by automake's
CFLAGS ordering which causes build failures when -march=<not loongson2f>
is specified by the user.
Uses the pmadd technique described in
http://software.intel.com/sites/landingpage/legacy/mmx/MMX_App_24-16_Bit_Conversion.pdf
The technique uses the packssdw instruction which uses signed
saturatation. This works in their example because they pack 888 to 555
leaving the high bit as zero. For packing to 565, it is unsuitable, so
we replace it with an or+shuffle.
Loongson:
src_8888_0565 = L1: 106.13 L2: 83.57 M: 33.46 ( 68.90%) HT: 30.29 VT: 27.67 R: 26.11 RT: 15.06 ( 135Kops/s)
src_8888_0565 = L1: 122.10 L2: 117.53 M: 37.97 ( 78.58%) HT: 33.14 VT: 30.09 R: 29.01 RT: 15.76 ( 139Kops/s)
ARM/iwMMXt:
src_8888_0565 = L1: 67.88 L2: 56.61 M: 31.20 ( 56.74%) HT: 29.22 VT: 27.01 R: 25.39 RT: 19.29 ( 130Kops/s)
src_8888_0565 = L1: 110.38 L2: 82.33 M: 40.92 ( 73.22%) HT: 35.63 VT: 32.22 R: 30.07 RT: 18.40 ( 132Kops/s)
The pinsrh instruction is analogous to MMX EXT's pinsrw, except like
other Loongson vector instructions it cannot access the general purpose
registers. In the cases of other Loongson vector instructions, this is a
headache, but it is actually a good thing here. Since the instruction is
different from MMX, I've named the intrinsic loongson_insert_pi16.
text data bss dec hex filename
25976 1952 0 27928 6d18 .libs/libpixman_loongson_mmi_la-pixman-mmx.o
25336 1952 0 27288 6a98 .libs/libpixman_loongson_mmi_la-pixman-mmx.o
-and: 181
+and: 147
-dsll: 143
+dsll: 95
-dsrl: 87
+dsrl: 135
-ldc1: 523
+ldc1: 462
-lw: 767
+lw: 721
+pinsrh: 35
The combine function was store8888'ing the result, and all consumers
were immediately load8888'ing it, causing lots of unnecessary pack and
unpack instructions.
It's a very straight forward conversion, except for mmx_combine_over_u
and mmx_combine_saturate_u. mmx_combine_over_u was testing the integer
result to skip pixels, so we use the is_* functions to test the __m64
data directly without loading it into an integer register.
For mmx_combine_saturate_u there's not a lot we can do, since it uses
DIV_UN8.
Before, if __m64 is allocated in vector or floating-point registers,
__m64 vs = ldq_u((uint64_t *)src);
would cause src to be loaded into an integer register and then
transferred to an __m64 register. By switching ldq_u's argument type to
__m64 we give the compile enough information to recognize that it can
load to the vector register directly.
This patch is necessary for the Loongson optimizations when __m64 is
typedef'd as double.
In the computation:
srtot += RED_8 (pixel) * f
RED_8 (pixel) is an unsigned quantity, which means the signed filter
coefficient f gets converted to an unsigned integer before the
multiplication. We get away with this because when the 32 bit unsigned
result is converted to int32_t, the correct sign is produced. But if
srtot had been an int64_t, the result would have been a very large
positive number.
Fix this by explicitly casting the channels to int.
The last argument must be an immediate value, and when compiling without
optimization the compiler might not recognize this. So use a macro if
not optimizing.
We're using a patched gcc-4.5, and having to modify configure.ac and
autoreconf between changes is annoying. And besides, 4.5, 4.6, and 4.7's
iwMMXt intrinsic support is equally broken, and we test a known broken
intrinsic in the configure test program, so the version check is rather
meaningless.
This function converts from a8r8g8b8 to non-premultiplied RGBA (the
PNG or GdkPixbuf format that has the channels in this order: R, G, B,
A in memory regardless of the computer's endianness). The function's
new name is a8r8g8b8_to_rgba_np().
This program can compute the projective transformation that transforms
one quadrilateral into another. The code is basically maxima[1] output
translated into C.
[1] http://maxima.sourceforge.net/
In 32 bit mode the "=A" constraint refers to the register pair
edx:eax, but according to GCC developers this is not the case in 64
bit mode, where it refers to "rax".
Hence, using "=A" for rdtsc is incorrect in 64 bit mode.
See http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21249
The xmmintrin.h as shipped with recent Visual C++ (2003+) provides
_mm_shuffle_pi16 and _mm_mulhi_pu16, so including that header
will do for using these functions, and MSVC does not like the GCC-specific
implementations of _mm_shuffle_pi16 and _mm_mulhi_pu16 that is
currently in the code.
_MM_SHUFFLE is declared in the same way in MSVC's xmmintrin.h, so don't
re-define it here to avoid a compilation warning.
Silence warnings that could make -Werror give a false negative
Use signed char to avoid cases where int8_t isn't declared
Reported-by: Mike Lothian <mike@fireburn.co.uk>
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Jeremy Huddleston <jeremyhu@apple.com>