Commit Graph

2394 Commits

Author SHA1 Message Date
Siarhei Siamashka
f9a41703b2 Faster conversion from a8r8g8b8 to r5g6b5 in C code
This change reduces 3 shifts, 3 ANDs and 2 ORs (total 8 arithmetic
operations) to 3 shifts, 2 ANDs and 2 ORs (total 7 arithmetic
operations).

We get garbage in the high 16 bits of the result, which might need
to be cleared when casting to uint16_t (bringing the total back to
8 arithmetic operations). However, if the result of the
a8r8g8b8->r5g6b5 conversion is immediately stored to memory, no
extra instructions are needed to clear these garbage bits.

This allows the a8r8g8b8->r5g6b5 conversion code to be compiled
into 4 instructions for ARM instead of 5 (assuming a good optimizing
compiler), which has no pipeline stalls on ARM11 as an additional
bonus.
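The two formulations can be compared in a small C sketch. The function names are hypothetical and this is not pixman's code. It assumes only two layouts: a8r8g8b8, with alpha in bits 24-31 followed by red, green and blue, and r5g6b5, with red in bits 11-15, green in bits 5-10 and blue in bits 0-4.

```c
#include <assert.h>
#include <stdint.h>

/* Straightforward version: 3 shifts, 3 ANDs and 2 ORs. */
static inline uint16_t
convert_8888_to_0565_naive (uint32_t s)
{
    return (uint16_t) (((s >> 3) & 0x001f) |   /* blue:  bits 3-7   -> 0-4   */
                       ((s >> 5) & 0x07e0) |   /* green: bits 10-15 -> 5-10  */
                       ((s >> 8) & 0xf800));   /* red:   bits 19-23 -> 11-15 */
}

/* 3 shifts, 2 ANDs and 2 ORs: bits 16-20 of 'a' end up holding a
 * leftover copy of red, which the final uint16_t cast discards. */
static inline uint16_t
convert_8888_to_0565_fast (uint32_t s)
{
    uint32_t a, b;

    a = (s >> 3) & 0x1f001f;   /* red in bits 16-20, blue in bits 0-4 */
    b = s & 0xfc00;            /* top 6 bits of green, in bits 10-15  */
    a |= a >> 5;               /* red also lands in bits 11-15        */
    a |= b >> 5;               /* green into bits 5-10                */
    return (uint16_t) a;
}

/* Exhaustive comparison over all RGB values (both versions ignore alpha). */
static void
check_8888_to_0565 (void)
{
    for (uint32_t rgb = 0; rgb <= 0xffffff; rgb++)
    {
        uint32_t s = 0xff000000u | rgb;
        assert (convert_8888_to_0565_fast (s) == convert_8888_to_0565_naive (s));
    }
}
```

The garbage only ever occupies bits 16-20, so a 16-bit store or the cast is enough to discard it.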

The change in benchmark results for 'lowlevel-blt-bench src_8888_0565'
with PIXMAN_DISABLE="arm-simd arm-neon mips-dspr2 mmx sse2" and pixman
compiled by gcc-4.7.2:

    MIPS 74K        480MHz  :  40.44 MPix/s ->  40.13 MPix/s
    ARM11           700MHz  :  50.28 MPix/s ->  62.85 MPix/s
    ARM Cortex-A8  1000MHz  : 124.38 MPix/s -> 141.85 MPix/s
    ARM Cortex-A15 1700MHz  : 281.07 MPix/s -> 303.29 MPix/s
    Intel Core i7  2800MHz  : 515.92 MPix/s -> 531.16 MPix/s

The same trick was used in xomap (X server for Nokia N800/N810):
    http://repository.maemo.org/pool/diablo/free/x/xorg-server/
    xorg-server_1.3.99.0~git20070321-0osso20083801.tar.gz
2012-12-18 20:45:57 +02:00
Siarhei Siamashka
3922e90c40 Change CONVERT_XXXX_TO_YYYY macros into inline functions
It is easier and safer to modify their code in case the
calculations need some temporary variables, and temporary
variables will be needed soon.
2012-12-18 20:45:47 +02:00
Siarhei Siamashka
e4519360c1 test: add "src_0565_8888" to lowlevel-blt-bench 2012-12-18 20:43:51 +02:00
Søren Sandmann Pedersen
6a6c8c51ed pixman_composite_trapezoids(): Check for NULL return from create_bits()
A check is needed that the creation of the temporary image in
pixman_composite_trapezoids() succeeds.

Fixes crash in stress-test -s 0x313c on my system.
2012-12-13 16:13:11 -05:00
Søren Sandmann Pedersen
c2cb303d33 pixman_composite_trapezoids: Return early if mask_format is not of TYPE_ALPHA
stress-test -s 0x17ee crashes because pixman_composite_trapezoids() is
given a mask_format of PIXMAN_c8, which causes it to create a
temporary image with that format but without a palette. This causes
crashes later.

The only mask formats that we actually support are those of TYPE_ALPHA,
so this patch adds a return_if_fail() to ensure this.

Similarly, although currently it won't crash if given an invalid
format, alpha-only formats have always been the only thing that made
sense for the pixman_rasterize_edges() functions, so add a
return_if_fail() ensuring that the destination format is of type
PIXMAN_TYPE_ALPHA.
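Here is a minimal sketch of this kind of guard. It assumes pixman's 32-bit format-code layout (bpp << 24 | type << 16 | a << 12 | r << 8 | g << 4 | b). It also assumes that the alpha-only type code is 1 and the indexed-color type code is 4. Every name below is a hypothetical local stand-in, not pixman's API. The sketch returns a value only so that the rejection can be observed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical local copies of the assumed format-code layout. */
#define FMT(bpp, type, a, r, g, b) \
    (((uint32_t) (bpp) << 24) | ((type) << 16) | ((a) << 12) | ((r) << 8) | ((g) << 4) | (b))
#define FMT_TYPE(f) (((f) >> 16) & 0xff)

enum { TYPE_A = 1, TYPE_COLOR = 4 };            /* assumed type codes */

#define FMT_a8 FMT (8, TYPE_A, 8, 0, 0, 0)      /* alpha-only, 8 bpp   */
#define FMT_c8 FMT (8, TYPE_COLOR, 0, 0, 0, 0)  /* indexed, no palette */

/* Hypothetical stand-in for pixman's internal return_if_fail(). */
#define return_val_if_fail(expr, val)                                    \
    do {                                                                 \
        if (!(expr)) {                                                   \
            fprintf (stderr, "%s: '%s' was false\n", __func__, #expr);   \
            return (val);                                                \
        }                                                                \
    } while (0)

/* Returns 1 if the trapezoids would be composited, 0 if rejected. */
static int
composite_trapezoids_sketch (uint32_t mask_format)
{
    return_val_if_fail (FMT_TYPE (mask_format) == TYPE_A, 0);
    /* ... create the temporary alpha-only mask and rasterize into it ... */
    return 1;
}
```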
2012-12-13 16:10:41 -05:00
Søren Sandmann Pedersen
1f0c02811e Add testing of trapezoids to stress-test
The entry points add_trapezoids(), rasterize_trapezoid() and
composite_trapezoid() are exercised with random trapezoids.

This uncovers crashes with stress-test seeds 0x17ee and 0x313c.
2012-12-13 15:59:18 -05:00
Søren Sandmann Pedersen
526dc06e56 demos/radial-test: Add checkerboard to display the alpha channel 2012-12-11 09:05:58 -05:00
Søren Sandmann Pedersen
6402b2aa0c demos/conical-test: Use the draw_checkerboard() utility function
Instead of having its own copy.
2012-12-11 09:05:58 -05:00
Søren Sandmann Pedersen
e382e52d67 test/utils.[ch]: Add utility function to draw a checkerboard
This is useful in demo programs to display the alpha channel.
2012-12-11 09:05:58 -05:00
Søren Sandmann Pedersen
b0a6504122 radial: When comparing t to mindr, use >= rather than >
Radial gradients are conceptually rendered as a sequence of circles
generated by linearly extrapolating from the two circles given by the
gradient specification. Any circles in that sequence that would end up
with a negative radius are not drawn, a condition that is enforced by
checking that t * dr is bigger than mindr:

     if (t * dr > mindr)

However, it is legitimate for a circle to have radius exactly 0, so
the test should use >= rather than >.
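The condition can be illustrated with a floating-point sketch. As a simplification, it assumes that a circle at parameter t has radius r1 + t * dr (with dr = r2 - r1) and that mindr equals -r1. Under those assumptions, the comparison is equivalent to requiring a non-negative radius. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Radius of the interpolated circle is r1 + t * dr. With mindr = -r1,
 * "t * dr >= mindr" is the same as "r1 + t * dr >= 0". */
static bool
circle_is_drawn (double t, double r1, double dr)
{
    double mindr = -r1;

    return t * dr >= mindr;   /* ">" would wrongly skip radius-0 circles */
}
```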

This gets rid of the dots in demos/radial-test except for when the c2
circle has radius 0 and a repeat mode of either NONE or NORMAL. Both
those dots correspond to a t value of 1.0, which is outside the
defined interval of [0.0, 1.0) and therefore subject to the repeat
algorithm. As a result, in the NONE case, a value of 1.0 turns into
transparent black. In the NORMAL case, 1.0 wraps around and becomes
0.0 which is red, unlike 0.99 which is blue.
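The two repeat behaviours described above can be sketched as follows. This is a hypothetical helper, not pixman's repeat code, and it uses -1.0 as a stand-in for "transparent black".

```c
#include <assert.h>

enum repeat_mode { REPEAT_NONE, REPEAT_NORMAL };

/* Map a gradient parameter t to a lookup position in [0, 1),
 * or return -1.0 when the result should be transparent black. */
static double
apply_repeat (double t, enum repeat_mode mode)
{
    if (mode == REPEAT_NONE)
        return (t < 0.0 || t >= 1.0) ? -1.0 : t;   /* 1.0 is outside [0, 1) */

    /* REPEAT_NORMAL: keep the fractional part, so 1.0 wraps to 0.0 */
    double f = t - (double) (long long) t;          /* cast truncates toward zero */
    return f < 0.0 ? f + 1.0 : f;
}
```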

Cc: ranma42@gmail.com
2012-12-11 09:05:38 -05:00
Søren Sandmann Pedersen
54aca22058 demos/radial-test: Add zero-radius circles to demonstrate rendering bugs
Add two new gradient columns, one where the start circle has radius
0 and one where the end circle has radius 0. All the new gradients
except for one are rendered with a bright dot in the middle. In most
but not all cases this is incorrect.

Cc: ranma42@gmail.com
2012-12-11 08:20:45 -05:00
Siarhei Siamashka
fdab3c1b6c test: Workaround unaligned MOVDQA bug (http://gcc.gnu.org/PR55614)
Just use SSE2 intrinsics to do unaligned memory accesses as
a workaround for this gcc bug related to vector extensions.
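The workaround replaces vector-extension loads and stores, which GCC may compile to the alignment-requiring MOVDQA, with the unaligned SSE2 intrinsics _mm_loadu_si128() and _mm_storeu_si128() from <emmintrin.h>. The following is a minimal sketch of that approach. The helper name is hypothetical, non-SSE2 builds fall back to memcpy(), and this is not the code from the test suite.

```c
#include <assert.h>
#include <string.h>
#ifdef __SSE2__
#include <emmintrin.h>
#endif

/* Copy 16 bytes between addresses with no alignment guarantee. */
static void
copy16_unaligned (void *dst, const void *src)
{
#ifdef __SSE2__
    __m128i v = _mm_loadu_si128 ((const __m128i *) src);  /* unaligned load  */
    _mm_storeu_si128 ((__m128i *) dst, v);                /* unaligned store */
#else
    memcpy (dst, src, 16);
#endif
}
```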
2012-12-10 20:05:15 +02:00
Siarhei Siamashka
2bc59006d7 Improve performance of combine_over_u
The generic C over_u combiner can be a lot faster with the
addition of special shortcuts for 0xFF and 0x00 alpha/mask
values. This is already implemented in C and SSE2 fast paths.
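The following is a scalar sketch of an OVER combiner with the two shortcuts. It assumes premultiplied a8r8g8b8 pixels and a unified mask of which only the alpha byte is used. The function names are hypothetical, and the per-channel loops are deliberately naive; this is not pixman's combine_over_u.

```c
#include <assert.h>
#include <stdint.h>

/* x * a / 255 with rounding, for 8-bit channel values. */
static inline uint32_t
mul_un8 (uint32_t x, uint32_t a)
{
    uint32_t t = x * a + 0x80;
    return ((t >> 8) + t) >> 8;
}

/* Multiply all four channels of a pixel by a. */
static uint32_t
pixel_mul_un8 (uint32_t p, uint32_t a)
{
    uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 8)
        r |= mul_un8 ((p >> shift) & 0xff, a) << shift;
    return r;
}

/* Per-channel saturating add of two pixels. */
static uint32_t
pixel_add_sat (uint32_t x, uint32_t y)
{
    uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 8)
    {
        uint32_t c = ((x >> shift) & 0xff) + ((y >> shift) & 0xff);
        r |= (c > 0xff ? 0xff : c) << shift;
    }
    return r;
}

/* dest = src IN mask OVER dest; mask may be NULL. */
static void
combine_over_u_sketch (uint32_t *dest, const uint32_t *src,
                       const uint32_t *mask, int width)
{
    for (int i = 0; i < width; i++)
    {
        uint32_t s = src[i];

        if (mask)
        {
            uint32_t m = mask[i] >> 24;
            if (m == 0x00)
                continue;               /* shortcut: transparent mask */
            if (m != 0xff)
                s = pixel_mul_un8 (s, m);
        }

        if ((s >> 24) == 0xff)
            dest[i] = s;                /* shortcut: opaque source */
        else if (s != 0)
            dest[i] = pixel_add_sat (s, pixel_mul_un8 (dest[i], 0xff - (s >> 24)));
        /* s == 0 leaves dest unchanged */
    }
}
```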

Profiling the run of cairo-perf-trace benchmarks with PIXMAN_DISABLE
environment variable set to "fast mmx sse2" on Intel Core i7:

=== before ===

37.32%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] combine_over_u
21.37%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_no_repeat_8888
13.51%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_none_a8r8g8b8
 2.96%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] radial_compute_color
 2.74%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] fetch_scanline_a8
 2.71%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] fetch_scanline_x8r8g8b8
 2.17%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] _pixman_gradient_walker_pixel
 1.86%  cairo-perf-trac  libcairo.so.2.11200.0 [.] _cairo_tor_scan_converter_generate
 1.57%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_pad_a8r8g8b8
 0.97%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] combine_in_reverse_u
 0.96%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] combine_over_ca

=== after ===

28.79%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_no_repeat_8888
18.44%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_none_a8r8g8b8
15.54%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] combine_over_u
 3.94%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] radial_compute_color
 3.69%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] fetch_scanline_a8
 3.69%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] fetch_scanline_x8r8g8b8
 2.94%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] _pixman_gradient_walker_pixel
 2.52%  cairo-perf-trac  libcairo.so.2.11200.0 [.] _cairo_tor_scan_converter_generate
 2.08%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] bits_image_fetch_bilinear_affine_pad_a8r8g8b8
 1.31%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] combine_in_reverse_u
 1.29%  cairo-perf-trac  libpixman-1.so.0.29.1 [.] combine_over_ca
2012-12-10 20:02:08 +02:00
Søren Sandmann Pedersen
a5e5179b56 Pre-release version bump to 0.28.2 2012-12-10 06:46:36 -05:00
Benjamin Gilbert
6e270a7968 Fix thread safety on mingw-w64 and clang
After finding a working TLS storage class specifier, configure was
continuing to test other candidates.  This caused it to prefer
__declspec(thread) over __thread.  However, __declspec(thread) is
ignored with a warning by mingw-w64 [1] and silently ignored by clang [2].
The resulting binary behaved as if PIXMAN_NO_TLS was defined.

Bug introduced by a069da6c.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=57591
[2] http://lists.freedesktop.org/archives/pixman/2012-October/002320.html
2012-12-10 06:46:36 -05:00
Stefan Weil
d91f550b2a Always use xmmintrin.h for 64 bit Windows
MinGW-w64 uses the GNU compiler and does not define _MSC_VER.
Nevertheless, it provides xmmintrin.h and must be handled
here like the MS compiler. Otherwise compilation fails due to
conflicting declarations.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2012-12-10 06:46:36 -05:00
Joshua Root
2092aa0d92 Fix undeclared variable use and sysctlbyname error handling on ppc
Fixes bug 56889.
2012-12-10 06:46:36 -05:00
Søren Sandmann Pedersen
9029026edd Post-release version bump to 0.28.1 2012-12-10 06:46:36 -05:00
Søren Sandmann Pedersen
8ca4e14472 Add fast paths for separable convolution
Similar to the fast paths for general affine access, add some fast
paths for the separable filter for all combinations of formats
x8r8g8b8, a8r8g8b8, r5g6b5, a8 with the four repeat modes.

It is easy to see the speedup in the demos/scale program.
2012-12-08 12:38:58 -05:00
Søren Sandmann Pedersen
4f18ba30ce Add demo program for conical gradients
This new test is derived from radial-test.c and displays conical
gradients at various angles.

It also demonstrates how PIXMAN_REPEAT_NORMAL is supposed to work when
used with a gradient specification where the first stop is not at 0.0:
In this case the gradient is supposed to have a smooth transition from
the last stop back to the first stop with no sharp transitions. It
also shows that the repeat mode is not ignored for conical gradients
as one might be tempted to think.
2012-12-08 10:50:51 -05:00
Søren Sandmann Pedersen
3a98787bdd Add demos/zone_plate.png
The zone plate image is a useful test case for image scalers because
it contains all representable frequencies, so any imperfection in
resampling filters will show up as Moire patterns.

This version is symmetric around the midpoint of the image, so
rotating it is supposed to be a no-op; this means it can also be used
to verify that the resampling filters don't shift the image.

V2: Run the file through OptiPNG to cut the size in half, as suggested
by Siarhei.
2012-12-08 10:50:51 -05:00
Søren Sandmann Pedersen
97491ed26c demos: Add new demo program, "scale"
This program allows interactively scaling and rotating images
using various filters and repeat modes. It uses
pixman_filter_create_separable_convolution() to generate the filters.
2012-12-08 10:50:51 -05:00
Søren Sandmann Pedersen
7f5bb22d17 demos/gtk-utils.[ch]: Add pixman_image_from_file()
This function uses GdkPixbuf to load various common formats such as
.png and .jpg into a pixman image.
2012-12-08 10:50:51 -05:00
Søren Sandmann Pedersen
6915f3e24f Add new pixman_filter_create_separable_convolution() API
This new API is a helper function to create filter parameters suitable
for use with PIXMAN_FILTER_SEPARABLE_CONVOLUTION.

For each dimension, given a scale factor, reconstruction and sample
filter kernels, and a subsampling resolution, this function will
compute a convolution of the two kernels scaled appropriately, then
sample that convolution and return the resulting vectors in a form
suitable for being used as parameters to
PIXMAN_FILTER_SEPARABLE_CONVOLUTION.

The filter kernels offered are the following:

  - IMPULSE:            Dirac delta function, i.e., point sampling
  - BOX:                Box filter
  - LINEAR:             Linear filter, aka. "Tent" filter
  - CUBIC:              Cubic filter, currently Mitchell-Netravali
  - GAUSSIAN:           Gaussian function, sigma=1, support=3*sigma
  - LANCZOS2:           Two-lobed Lanczos filter
  - LANCZOS3:           Three-lobed Lanczos filter
  - LANCZOS3_STRETCHED: Three-lobed Lanczos filter, stretched by 4/3.0.
                        This is the "Nice" filter from Dirty Pixels by
                        Jim Blinn.
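For reference, the Mitchell-Netravali family can be written as a piecewise cubic with parameters B and C. The sketch below assumes the commonly recommended B = C = 1/3. The list above does not say which parameters pixman uses, and the function name is hypothetical.

```c
#include <assert.h>

/* Mitchell-Netravali cubic kernel with parameters B and C; support is |x| < 2. */
static double
mitchell_netravali (double x, double B, double C)
{
    double ax = x < 0 ? -x : x;

    if (ax < 1.0)
        return ((12 - 9 * B - 6 * C) * ax * ax * ax +
                (-18 + 12 * B + 6 * C) * ax * ax +
                (6 - 2 * B)) / 6.0;
    if (ax < 2.0)
        return ((-B - 6 * C) * ax * ax * ax +
                (6 * B + 30 * C) * ax * ax +
                (-12 * B - 48 * C) * ax +
                (8 * B + 24 * C)) / 6.0;
    return 0.0;
}
```

With B = C = 1/3, the kernel is 16/18 at 0 and 1/18 at +/-1, so the integer-spaced samples sum to 1.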

The intended way to use this function is to extract scaling factors
from the transformation and then pass those to this function to get a
filter suitable for compositing with that transformation. The filter
kernels can be chosen according to quality and performance tradeoffs.

To get equivalent quality to GdkPixbuf for downscalings, use BOX for
both reconstruction and sampling. For upscalings, use LINEAR for
reconstruction and IMPULSE for sampling (though note that for
upscaling in both X and Y directions, simply using
PIXMAN_FILTER_BILINEAR will likely be a better choice).
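For the BOX/BOX case recommended above, convolving the two kernels has a simple closed form. The weight of a source pixel is the length of the overlap between its unit-wide footprint and the sampling box, divided by the box width. The sketch below computes one phase of such a 1D filter as 16.16 fixed-point taps. It is a hypothetical illustration of the idea, not pixman's implementation. It assumes pixel centres at integer positions and absorbs rounding error into the largest tap.

```c
#include <assert.h>
#include <stdint.h>

/* Length of the overlap between [a0, a1] and [b0, b1] (0 if disjoint). */
static double
overlap (double a0, double a1, double b0, double b1)
{
    double lo = a0 > b0 ? a0 : b0;
    double hi = a1 < b1 ? a1 : b1;
    return hi > lo ? hi - lo : 0.0;
}

/* Fill taps[] for source pixels -radius..+radius: BOX reconstruction
 * (pixel j covers [j - 0.5, j + 0.5]) convolved with a BOX sampling
 * window of width 'scale' centred at 'phase'. Returns the radius. */
static int
box_box_taps (double scale, double phase, int32_t *taps, int max_taps)
{
    double h = scale / 2.0;          /* half-width of the sampling box */
    int radius = (int) (h + 1.5);    /* covers every overlapping pixel */
    int n = 2 * radius + 1, largest = 0;
    int32_t sum = 0;

    assert (scale > 0.0 && phase >= 0.0 && phase < 1.0 && n <= max_taps);

    for (int k = 0; k < n; k++)
    {
        int j = k - radius;                                   /* source pixel */
        double w = overlap (j - 0.5, j + 0.5, phase - h, phase + h) / scale;
        taps[k] = (int32_t) (w * 65536.0 + 0.5);              /* round to 16.16 */
        sum += taps[k];
        if (taps[k] > taps[largest])
            largest = k;
    }
    taps[largest] += 65536 - sum;    /* make the taps sum to exactly 1.0 */
    return radius;
}
```

For example, a 2x downscale with the box centred on a pixel gives the taps 0.25, 0.5, 0.25 (16384, 32768, 16384 in 16.16).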
2012-12-08 10:50:51 -05:00
Søren Sandmann Pedersen
68760d3fe1 rounding.txt: Describe how SEPARABLE_CONVOLUTION filter works
Add some notes on how to compute the convolution matrices to be used
with the SEPARABLE_CONVOLUTION filter.
2012-12-08 10:50:51 -05:00
Søren Sandmann Pedersen
6fd480b17c Add new filter PIXMAN_FILTER_SEPARABLE_CONVOLUTION
This filter is a new way to use a convolution matrix for filtering.
Compared to the existing CONVOLUTION filter, this new variant differs
in two respects:

- It is subsampled: Instead of just one convolution matrix, this
  filter chooses between a number of matrices based on the subpixel
  sample location, allowing the convolution kernel to be sampled at a
  higher resolution.

- It is separable: Each matrix is specified as the tensor product of
  two vectors. This has the advantages that far fewer values have to
  be stored, and that the filtering can be done separately in the x
  and y dimensions (although the initial implementation doesn't
  actually do that).

The motivation for this new filter is to improve image downsampling
quality. Currently, the best pixman can do is the regular convolution
filter which is limited to coarsely sampled convolution kernels.

With this new feature, any separable filter can be used at any desired
resolution.
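Both properties can be sketched for an 8-bit image. A phase index is taken from the top subsample_bits of the fractional part of a 16.16 coordinate. The 2D weight of each tap is the product of an x weight and a y weight, so only nx + ny values need to be stored instead of nx * ny. Both helpers are hypothetical; this is not pixman's parameter layout or phase-selection rule. Like the initial implementation, the sketch does not filter in two separate passes.

```c
#include <assert.h>
#include <stdint.h>

/* Pick one of 2^subsample_bits phases from a 16.16 coordinate (truncating). */
static int
filter_phase (int32_t x_fixed, int subsample_bits)
{
    return (x_fixed & 0xffff) >> (16 - subsample_bits);
}

/* Weight of tap (i, j) is xw[i] * yw[j], both vectors in 16.16. */
static uint8_t
separable_sample (const uint8_t *img, int stride, int x0, int y0,
                  const int32_t *xw, int nx, const int32_t *yw, int ny)
{
    int64_t acc = 0;

    for (int j = 0; j < ny; j++)
        for (int i = 0; i < nx; i++)
        {
            int64_t w = ((int64_t) xw[i] * yw[j]) >> 16;   /* 16.16 product */
            acc += w * img[(y0 + j) * stride + (x0 + i)];
        }

    acc = (acc + 0x8000) >> 16;                            /* round to 8 bits */
    return (uint8_t) (acc < 0 ? 0 : acc > 255 ? 255 : acc);
}
```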
2012-12-08 10:50:51 -05:00
Benjamin Gilbert
7e39861da3 Fix thread safety on mingw-w64 and clang
After finding a working TLS storage class specifier, configure was
continuing to test other candidates.  This caused it to prefer
__declspec(thread) over __thread.  However, __declspec(thread) is
ignored with a warning by mingw-w64 [1] and silently ignored by clang [2].
The resulting binary behaved as if PIXMAN_NO_TLS was defined.

Bug introduced by a069da6c.

[1] https://bugs.freedesktop.org/show_bug.cgi?id=57591
[2] http://lists.freedesktop.org/archives/pixman/2012-October/002320.html
2012-12-08 16:41:10 +02:00
Siarhei Siamashka
ebedd9a2ad test: Get rid of the obsolete 'prng_rand_N' and 'prng_rand_u32'
They are the same as 'prng_rand_n' and 'prng_rand'.
2012-12-06 17:20:38 +02:00
Siarhei Siamashka
b31a696263 test: Switch to the new PRNG instead of old LCG
Wallclock time for running pixman "make check" (compile time not included):

----------------------------+----------------+-----------------------------+
                            | old PRNG (LCG) |   new PRNG (Bob Jenkins)    |
       Processor type       +----------------+------------+----------------+
                            |    gcc 4.5     |  gcc 4.5   | gcc 4.7 (simd) |
----------------------------+----------------+------------+----------------+
quad Intel Core i7  @2.8GHz |    0m49.494s   |  0m43.722s |    0m37.560s   |
dual ARM Cortex-A15 @1.7GHz |     5m8.465s   |  4m37.375s |    3m45.819s   |
     IBM Cell PPU   @3.2GHz |    23m0.821s   | 20m38.316s |   16m37.513s   |
----------------------------+----------------+------------+----------------+

Some tests got a particularly large boost. For example, benchmarking and
profiling blitters-test on Core i7:

=== before ===

$ time ./blitters-test

real    0m10.907s
user    0m55.650s
sys     0m0.000s

  70.45%  blitters-test  blitters-test       [.] create_random_image
  15.81%  blitters-test  blitters-test       [.] compute_crc32_for_image_internal
   2.26%  blitters-test  blitters-test       [.] _pixman_implementation_lookup_composite
   1.07%  blitters-test  libc-2.15.so        [.] _int_free
   0.89%  blitters-test  libc-2.15.so        [.] malloc_consolidate
   0.87%  blitters-test  libc-2.15.so        [.] _int_malloc
   0.75%  blitters-test  blitters-test       [.] combine_conjoint_general_u
   0.61%  blitters-test  blitters-test       [.] combine_disjoint_general_u
   0.40%  blitters-test  blitters-test       [.] test_composite
   0.31%  blitters-test  libc-2.15.so        [.] _int_memalign
   0.31%  blitters-test  blitters-test       [.] _pixman_bits_image_setup_accessors
   0.28%  blitters-test  libc-2.15.so        [.] malloc

=== after ===

$ time ./blitters-test

real    0m3.655s
user    0m20.550s
sys     0m0.000s

  41.77%  blitters-test.n  blitters-test.new  [.] compute_crc32_for_image_internal
  15.77%  blitters-test.n  blitters-test.new  [.] prng_randmemset_r
   6.15%  blitters-test.n  blitters-test.new  [.] _pixman_implementation_lookup_composite
   3.09%  blitters-test.n  libc-2.15.so       [.] _int_free
   2.68%  blitters-test.n  libc-2.15.so       [.] malloc_consolidate
   2.39%  blitters-test.n  libc-2.15.so       [.] _int_malloc
   2.27%  blitters-test.n  blitters-test.new  [.] create_random_image
   2.22%  blitters-test.n  blitters-test.new  [.] combine_conjoint_general_u
   1.52%  blitters-test.n  blitters-test.new  [.] combine_disjoint_general_u
   1.40%  blitters-test.n  blitters-test.new  [.] test_composite
   1.02%  blitters-test.n  blitters-test.new  [.] prng_srand_r
   1.00%  blitters-test.n  blitters-test.new  [.] _pixman_image_validate
   0.96%  blitters-test.n  blitters-test.new  [.] _pixman_bits_image_setup_accessors
   0.90%  blitters-test.n  libc-2.15.so       [.] malloc
2012-12-06 17:20:35 +02:00
Siarhei Siamashka
309e66f047 test: Search/replace 'lcg_*' -> 'prng_*'
The 'lcg' prefix is going to be misleading if we replace
PRNG algorithm.
2012-12-06 17:20:31 +02:00
Siarhei Siamashka
d6545a2fc6 test: Added a better PRNG (pseudorandom number generator)
This adds a fast SIMD-optimized variant of a small noncryptographic
PRNG originally developed by Bob Jenkins:
    http://www.burtleburtle.net/bob/rand/smallprng.html

The generated pseudorandom data is good enough to pass "Big Crush"
tests from TestU01 (http://en.wikipedia.org/wiki/TestU01).

SIMD code uses http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html
which is a GCC specific extension. There is also a slower alternative
code path, which should work with any C compiler.
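The following is a scalar sketch of the 32-bit generator described on the page above, with rotation constants 27 and 17, seed constant 0xf1ea5eed and 20 warm-up rounds. These values are reproduced as recalled from that page, so check them against it before relying on exact output. The SIMD-optimized variant used in the test suite is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t a, b, c, d; } ranctx;

#define ROT32(x, k) (((x) << (k)) | ((x) >> (32 - (k))))

/* Produce the next 32-bit output and advance the state. */
static uint32_t
ranval (ranctx *x)
{
    uint32_t e = x->a - ROT32 (x->b, 27);
    x->a = x->b ^ ROT32 (x->c, 17);
    x->b = x->c + x->d;
    x->c = x->d + e;
    x->d = e + x->a;
    return x->d;
}

static void
raninit (ranctx *x, uint32_t seed)
{
    x->a = 0xf1ea5eed;
    x->b = x->c = x->d = seed;
    for (int i = 0; i < 20; i++)
        (void) ranval (x);   /* discard early outputs to mix the state */
}
```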

The performance of filling a buffer with random data:
   Intel Core i7  @2.8GHz (SSE2)     : ~5.9 GB/s
   ARM Cortex-A15 @1.7GHz (NEON)     : ~2.2 GB/s
   IBM Cell PPU   @3.2GHz (Altivec)  : ~1.7 GB/s
2012-12-06 17:20:27 +02:00
Siarhei Siamashka
41f98a07fc test: Change is_little_endian() into inline function
Also drop the redundant volatile keyword, because any object
can be accessed via a char* pointer without breaking aliasing
rules. Compilers are able to optimize this function to either
constant 0 or 1.
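A sketch of such a function follows; the exact body in the test suite may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Read the lowest-addressed byte of a 32-bit 1 through a character
 * pointer; character types may alias any object, so no volatile is needed. */
static inline int
is_little_endian (void)
{
    const uint32_t one = 1;
    return *(const unsigned char *) &one;
}
```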
2012-12-06 17:20:23 +02:00
Cyril Brulebois
97a117ef1d New upstream release. 2012-11-27 14:00:27 +01:00
Cyril Brulebois
e33dbc6c69 Merge branch 'upstream-experimental' into debian-experimental 2012-11-27 13:59:51 +01:00
Søren Sandmann Pedersen
978bab253d Add text file rounding.txt describing how rounding works
It is not entirely obvious how pixman gets from "location in the
source image" to "pixel value stored in the destination". This file
describes how the filters work, and in particular how positions are
rounded to samples.
2012-11-22 01:16:54 -05:00
Søren Sandmann Pedersen
74319e9d39 Convolution filter: round color values instead of truncating
The pixel computed by the convolution filter should be rounded off,
not truncated. As a simple example consider a convolution matrix
consisting of five times 0x3333. If all five input pixels are
0xff, then the result of truncating will be

    (5 * 0x3333 * 255) >> 16 = 254

But the real value of the computation is (5 * 0x3333 / 65536.0) * 255
= 254.9961, so the error is almost 1. If the user isn't very careful
about normalizing the convolution kernel so that it sums to one in
fixed point, such error might cause solid images to change color, or
opaque images to become translucent.

The fix is simply to round instead of truncate.
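The arithmetic above can be reproduced with a small hypothetical helper. It applies 16.16 weights to 8-bit samples, either truncating or rounding by adding 0x8000 before the shift, and then clamps the result to 0..255.

```c
#include <assert.h>
#include <stdint.h>

/* Weighted sum of n 8-bit samples with 16.16 weights. */
static uint32_t
convolve_u8 (const uint8_t *pixels, const int32_t *weights, int n, int do_round)
{
    int64_t acc = 0;

    for (int i = 0; i < n; i++)
        acc += (int64_t) weights[i] * pixels[i];
    if (do_round)
        acc += 0x8000;          /* add one half before shifting */
    acc >>= 16;                 /* back from 16.16 to an integer */
    return acc < 0 ? 0 : acc > 0xff ? 0xff : (uint32_t) acc;
}
```

With five weights of 0x3333 and five samples of 0xff, truncation yields 254 while rounding yields 255.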
2012-11-22 01:06:29 -05:00
Søren Sandmann Pedersen
f0816ddaf4 Round fixed-point multiplication
After two fixed-point numbers are multiplied, the result is shifted
into place, but up until now pixman has simply discarded the low-order
bits instead of rounding to the closest number.

Fix that by adding 0x8000 (or 0x2 in one place) before shifting and
update the test checksums to match.
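For 16.16 values, the change amounts to adding one half of the final unit (0x8000) before the shift. The sketch below uses a hypothetical typedef standing in for pixman_fixed_t. Note that this form rounds exact halves upward.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_16_16;   /* hypothetical stand-in for pixman_fixed_t */

static fixed_16_16
fixed_mul_trunc (fixed_16_16 a, fixed_16_16 b)
{
    return (fixed_16_16) (((int64_t) a * b) >> 16);            /* discard low bits */
}

static fixed_16_16
fixed_mul_round (fixed_16_16 a, fixed_16_16 b)
{
    return (fixed_16_16) (((int64_t) a * b + 0x8000) >> 16);   /* round to nearest */
}
```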
2012-11-20 03:23:51 -05:00
Stefan Weil
44dd746bb6 test: Fix compiler warnings caused by unused code
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2012-11-14 18:02:14 -05:00
Stefan Weil
5f96022d3b pixman: Use uintptr_t in type casts from pointer to integral value
These modifications fix lots of compiler warnings for systems where
sizeof(unsigned long) != sizeof(void *).
This is especially true for MinGW-w64 (64 bit Windows).
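A typical case is an alignment test on a pointer. On 64-bit Windows, unsigned long is only 32 bits wide, so casting a pointer to it truncates the value and triggers warnings. uintptr_t, by contrast, is wide enough to hold a pointer. The helper below is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* alignment must be a power of two */
static int
is_aligned (const void *p, uintptr_t alignment)
{
    return ((uintptr_t) p & (alignment - 1)) == 0;
}
```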

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2012-11-14 18:02:14 -05:00
Stefan Weil
a96efd02d6 Always use xmmintrin.h for 64 bit Windows
MinGW-w64 uses the GNU compiler and does not define _MSC_VER.
Nevertheless, it provides xmmintrin.h and must be handled
here like the MS compiler. Otherwise compilation fails due to
conflicting declarations.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
2012-11-14 18:02:13 -05:00
Nemanja Lukic
899e0d6052 MIPS: DSPr2: Added several nearest neighbor fast paths with a8 mask:
Performance numbers before/after on MIPS-74kc @ 1GHz:

lowlevel-blt-bench -n

Reference (before):
        over_8888_8_0565 =  L1:   9.62  L2:   8.85  M:  7.40 ( 39.27%)  HT:  5.67  VT:  5.61  R:  5.45  RT:  2.98 (  22Kops/s)
        over_0565_8_0565 =  L1:   7.90  L2:   7.49  M:  6.72 ( 26.75%)  HT:  5.24  VT:  5.20  R:  5.06  RT:  2.90 (  22Kops/s)

Optimized:
        over_8888_8_0565 =  L1:  18.51  L2:  16.82  M: 12.13 ( 64.43%)  HT: 10.06  VT:  9.88  R:  9.54  RT:  5.63 (  31Kops/s)
        over_0565_8_0565 =  L1:  14.82  L2:  13.94  M: 11.34 ( 45.20%)  HT:  9.45  VT:  9.35  R:  9.03  RT:  5.50 (  31Kops/s)
2012-11-14 18:01:18 -05:00
Nemanja Lukic
a432bdce66 MIPS: DSPr2: Added more fast-paths for OVER operation:
Performance numbers before/after on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Reference (before):
        over_n_0565 =  L1:  14.48  L2:  21.36  M: 17.57 ( 23.30%)  HT:  6.95  VT:  6.44  R:  6.39  RT:  2.16 (  22Kops/s)
        over_n_8888 =  L1:  92.60  L2:  86.13  M: 24.41 ( 64.74%)  HT:  8.94  VT:  8.06  R:  8.00  RT:  2.53 (  25Kops/s)

Optimized:
        over_n_0565 =  L1:  27.65  L2: 189.22  M: 58.19 ( 77.12%)  HT: 52.80  VT: 49.88  R: 47.53  RT: 23.67 (  72Kops/s)
        over_n_8888 =  L1: 235.99  L2: 230.86  M: 29.09 ( 77.11%)  HT: 27.95  VT: 27.24  R: 26.58  RT: 18.10 (  67Kops/s)
2012-11-14 18:01:18 -05:00
Nemanja Lukic
e33e9d3f55 MIPS: DSPr2: Added more fast-paths for SRC operation:
Performance numbers before/after on MIPS-74kc @ 1GHz:

lowlevel-blt-bench results

Reference (before):
        src_n_8_8888 =  L1:  13.79  L2:  22.47  M: 17.55 ( 58.28%)  HT:  6.95  VT:  6.46  R:  6.34  RT:  2.07 (  20Kops/s)
           src_n_8_8 =  L1:  20.22  L2:  20.21  M: 18.20 ( 24.17%)  HT:  6.65  VT:  6.22  R:  6.11  RT:  2.03 (  20Kops/s)

Optimized:
        src_n_8_8888 =  L1:  58.31  L2:  53.34  M: 25.69 ( 85.29%)  HT: 22.55  VT: 21.44  R: 19.91  RT: 10.34 (  48Kops/s)
           src_n_8_8 =  L1: 102.60  L2:  89.43  M: 65.01 ( 86.32%)  HT: 37.87  VT: 37.02  R: 32.43  RT: 12.41 (  51Kops/s)
2012-11-14 18:01:18 -05:00
Søren Sandmann Pedersen
d881e1f580 Allow src and dst to be identical in pixman_f_transform_invert()
It is useful to be able to invert a matrix in place, but currently
pixman_f_transform_invert() will produce wrong results if you pass the
same matrix as both source and destination.

Fix that by inverting into a temporary matrix and then copying that to
the destination.
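Inverting in place goes wrong when elements of the destination are overwritten while later cofactors still need the original values. The sketch below shows the temporary-matrix approach using the adjugate formula. It uses a hypothetical struct that mirrors a 3x3 double matrix, not pixman's actual declarations.

```c
#include <assert.h>

/* Hypothetical 3x3 double matrix, m[row][col]. */
struct f_transform { double m[3][3]; };

/* Invert src into dst; dst may alias src. Returns 0 if src is singular. */
static int
f_transform_invert (struct f_transform *dst, const struct f_transform *src)
{
    const double (*m)[3] = src->m;
    struct f_transform tmp;
    double det = m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
               - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
               + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);

    if (det == 0.0)
        return 0;

    /* inverse = adjugate / det, built entirely in tmp */
    tmp.m[0][0] = (m[1][1] * m[2][2] - m[1][2] * m[2][1]) / det;
    tmp.m[0][1] = (m[0][2] * m[2][1] - m[0][1] * m[2][2]) / det;
    tmp.m[0][2] = (m[0][1] * m[1][2] - m[0][2] * m[1][1]) / det;
    tmp.m[1][0] = (m[1][2] * m[2][0] - m[1][0] * m[2][2]) / det;
    tmp.m[1][1] = (m[0][0] * m[2][2] - m[0][2] * m[2][0]) / det;
    tmp.m[1][2] = (m[0][2] * m[1][0] - m[0][0] * m[1][2]) / det;
    tmp.m[2][0] = (m[1][0] * m[2][1] - m[1][1] * m[2][0]) / det;
    tmp.m[2][1] = (m[0][1] * m[2][0] - m[0][0] * m[2][1]) / det;
    tmp.m[2][2] = (m[0][0] * m[1][1] - m[0][1] * m[1][0]) / det;

    *dst = tmp;   /* only now overwrite the destination */
    return 1;
}
```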
2012-11-11 14:09:22 -05:00
Søren Sandmann Pedersen
614e7aaf14 pixman.h: Add typedefs for pixman_f_transform and pixman_f_vector 2012-11-10 01:46:17 -05:00
Joshua Root
b2e0e240fe Fix undeclared variable use and sysctlbyname error handling on ppc
Fixes bug 56889.
2012-11-09 16:13:31 -05:00
Søren Sandmann Pedersen
400436dc52 pixman_image_composite: Reduce opaque masks to NULL
When the mask is known to be opaque, we might as well reduce it to
NULL to take advantage of the various fast paths that operate on NULL
masks.
2012-11-09 16:13:31 -05:00
Søren Sandmann Pedersen
f2ada9e63f Post-release version bump to 0.29.1 2012-11-07 13:45:09 -05:00
Søren Sandmann Pedersen
8a2ff3e0ef Pre-release version bump to 0.28.0 2012-11-07 13:41:15 -05:00
Søren Sandmann Pedersen
4b91f6ca72 Post-release version bump to 0.27.5 2012-10-25 10:42:26 -04:00