Commit Graph

1850 Commits

Author SHA1 Message Date
Søren Sandmann Pedersen
91521d30ab Enable bits_image_fetch_bilinear_affine_reflect_r5g6b5 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
372d7b954a Enable bits_image_fetch_bilinear_affine_none_r5g6b5 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
a826ae0e3a Enable bits_image_fetch_bilinear_affine_pad_r5g6b5 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
c5238bd180 Enable bits_image_fetch_bilinear_affine_normal_a8 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
d12daefcdb Enable bits_image_fetch_bilinear_affine_reflect_a8 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
9388be3293 Enable bits_image_fetch_bilinear_affine_none_a8 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
8e4d4e8d11 Enable bits_image_fetch_bilinear_affine_pad_a8 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
ce1f6c50b4 Enable bits_image_fetch_bilinear_affine_normal_x8r8g8b8 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
83f2ee3e95 Enable bits_image_fetch_bilinear_affine_reflect_x8r8g8b8 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
be37ae331c Enable bits_image_fetch_bilinear_affine_none_x8r8g8b8 2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen
5f8a9bebc0 Enable bits_image_fetch_bilinear_affine_pad_x8r8g8b8 2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen
c59584cb86 Enable bits_image_fetch_bilinear_affine_normal_a8r8g8b8 2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen
2292cff304 Enable bits_image_fetch_bilinear_affine_reflect_a8r8g8b8 2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen
8b29162693 Enable bits_image_fetch_bilinear_affine_none_a8r8g8b8 2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen
e8555874e1 Enable bits_image_fetch_bilinear_affine_pad_a8r8g8b8 2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen
f9778c15e9 Use a macro to generate some {a,x}8r8g8b8, a8, and r5g6b5 bilinear fetchers.
There are versions for all combinations of x8r8g8b8/a8r8g8b8 and
pad/reflect/none/normal repeat modes. The bulk of each scaler is an
inline function that takes a format and a repeat mode as parameters.

The new scalers are all commented out, but the next commits will
enable them one at a time to facilitate bisecting.
2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen
6d1e10a8b5 test: Add affine-test
This test tests compositing with various affine transformations. It is
almost identical to scaling-test, except that it also applies a random
rotation in addition to the random scaling and translation.
2010-09-21 08:31:09 -04:00
Søren Sandmann Pedersen
4fa33537d7 analyze_extents: Fast path for non-transformed BITS images
Profiling various cairo traces showed that we were spending a lot of
time in analyze_extents and compute_sample_extents(). This was
especially bad for glyphs where all this computation was completely
unnecessary.

This patch adds a fast path for the case of non-transformed BITS
images. The result is approximately a 6% improvement on the
firefox-talos-gfx benchmark:

Before:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image            firefox-talos-gfx   13.797   13.848   0.20%    6/6

After:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image            firefox-talos-gfx   12.946   13.018   0.39%    6/6
2010-09-21 08:31:09 -04:00
Søren Sandmann Pedersen
c97881fe3c Move some of the FAST_PATH_COVERS_CLIP computation to pixman-image.c
When an image is solid or repeating, the FAST_PATH_COVERS_CLIP flag
can be set in compute_image_info().

Also the code that turned this flag off in pixman.c was not correct;
it didn't take transformations into account. With this patch, pixman.c
doesn't set the flag by default, but instead relies on the call to
compute_sample_extents() to set it when possible.
2010-09-21 08:31:09 -04:00
Tor Lillqvist
3411f9399c Support __thread on MINGW 4.5
By the way, it seems that with gcc 4.5.0 from mingw.org, __thread, sse
and mmx work fine.

I added the below to pixman 0.18 and as far as I can see, it works.
make check reports no problems. (Earlier I had to use --disable-mmx
and --disable-sse2.) Also gtk-demo and gimp run fine.

(Also a change to get rid of the warnings about -fvisibility being ignored.)
2010-09-21 08:31:08 -04:00
Søren Sandmann Pedersen
add0fd1bac Clip composite region against the destination alpha map extents.
Otherwise we can end up writing outside the alpha map.
2010-09-21 08:31:08 -04:00
Søren Sandmann Pedersen
af2f0080fe Remove FAST_PATH_NARROW_FORMAT flag if there is a wide alpha map
If an image has an alpha map that has wide components, then we need to
use 64 bit processing for that image. We detect this situation in
pixman-image.c and remove the FAST_PATH_NARROW_FORMAT flag.

In pixman-general, the wide/narrow decision is now based on the flags
instead of on the formats.
2010-09-21 08:31:08 -04:00
Søren Sandmann Pedersen
0afc613415 Rename FAST_PATH_NO_WIDE_FORMAT to FAST_PATH_NARROW_FORMAT
This avoids a negative in the name. Also, by renaming the "wide"
variable in pixman-general.c to "narrow" and fixing up the logic
correspondingly, the code there reads a lot more straightforwardly.
2010-09-21 08:31:08 -04:00
Søren Sandmann Pedersen
ae77548f0d Update and extend the alphamap test
- Test many more combinations of formats

- Test destination alpha maps

- Test various different alpha origins

Also add a transformation to the destination, but comment it out
because it is actually broken at the moment (and pretty difficult to
fix).
2010-09-21 08:28:55 -04:00
Søren Sandmann Pedersen
dc9fe269ea Add fence_malloc() and fence_free().
These variants of malloc() and free() try to surround the allocated
memory with protected pages so that out-of-bounds accesses will cause
a segmentation fault.

If mprotect() and getpagesize() are not available, these functions are
simply equivalent to malloc() and free().
2010-09-21 08:28:55 -04:00
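
The guard-page idea described in dc9fe269ea can be sketched as follows. This is a minimal illustration assuming a POSIX system with mmap() and mprotect(); it is not the code from the commit. The names guarded_alloc(), guarded_free() and guard_header_t are hypothetical. The buffer is placed so that its end touches the trailing protected page, which means a one-byte overrun faults immediately, at the cost of the returned pointer not being aligned for arbitrary lengths.

```c
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical bookkeeping record stored just before the user buffer. */
typedef struct
{
    void  *base;    /* start of the whole mapping */
    size_t total;   /* size of the whole mapping in bytes */
} guard_header_t;

/* Layout: [guard page][header + user bytes, end-aligned][guard page].
 * Overflow of 'len + sizeof header' is not checked in this sketch. */
static void *
guarded_alloc (size_t len)
{
    size_t page = (size_t) sysconf (_SC_PAGESIZE);
    size_t need = len + sizeof (guard_header_t);
    size_t data = (need + page - 1) / page * page;   /* round up to pages */
    size_t total = data + 2 * page;
    guard_header_t header;
    uint8_t *base, *data_end, *user;

    base = mmap (NULL, total, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    data_end = base + page + data;
    user = data_end - len;           /* buffer ends at the trailing guard */
    assert (user - sizeof header >= base + page);

    /* memcpy because 'user' may be unaligned for odd lengths */
    header.base = base;
    header.total = total;
    memcpy (user - sizeof header, &header, sizeof header);

    if (mprotect (base, page, PROT_NONE) != 0 ||
        mprotect (data_end, page, PROT_NONE) != 0)
    {
        munmap (base, total);
        return NULL;
    }

    return user;
}

static void
guarded_free (void *ptr)
{
    guard_header_t header;

    if (!ptr)
        return;

    memcpy (&header, (uint8_t *) ptr - sizeof header, sizeof header);
    munmap (header.base, header.total);
}
```

Reading or writing one byte past the end of a buffer returned by guarded_alloc() lands in a PROT_NONE page and should raise a segmentation fault, which is the behaviour the commit message describes.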
Søren Sandmann Pedersen
f4dc73bad4 Do opacity computation with shifts instead of comparing with 0
Also add a COMPILE_TIME_ASSERT() macro and use it to assert that the
shift is correct.
2010-09-21 08:28:55 -04:00
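
A rough sketch of the technique named in f4dc73bad4: extract a flag bit with a shift instead of a comparison, and use a compile-time assertion to check that the shift matches the flag. The macro name COMPILE_TIME_ASSERT comes from the commit message, but its definition here is an assumption, as are FLAG_IS_OPAQUE, OPAQUE_SHIFT and opacity_index().

```c
#include <assert.h>   /* only needed if callers want runtime checks too */
#include <stdint.h>

/* Assumed definition: a negative array size is a compile error, so the
 * build fails when the condition is false. */
#define COMPILE_TIME_ASSERT(x) ((void) sizeof (char[(x) ? 1 : -1]))

/* Hypothetical flag layout, chosen for illustration only. */
#define OPAQUE_SHIFT   13
#define FLAG_IS_OPAQUE (1u << OPAQUE_SHIFT)

/* Returns 0..3: bit 0 is set when the source is opaque, bit 1 when the
 * destination is opaque. No branches or comparisons with 0 are needed. */
static unsigned
opacity_index (uint32_t src_flags, uint32_t dst_flags)
{
    unsigned src_opaque, dst_opaque;

    COMPILE_TIME_ASSERT (FLAG_IS_OPAQUE == (1u << OPAQUE_SHIFT));

    src_opaque = (src_flags & FLAG_IS_OPAQUE) >> OPAQUE_SHIFT;
    dst_opaque = (dst_flags & FLAG_IS_OPAQUE) >> OPAQUE_SHIFT;

    return src_opaque | (dst_opaque << 1);
}
```

If someone later changes FLAG_IS_OPAQUE without updating OPAQUE_SHIFT, the array size becomes negative and compilation stops, rather than the shift silently producing a wrong index.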
Siarhei Siamashka
517a77a992 SSE2 optimization for scaled over_8888_8888 operation with nearest filter
This is the first demo implementation, it should be possible to
generalize it later to cover more operations with fewer lines of code.

It should also be possible to use the '__builtin_constant_p' gcc
builtin function as an efficient way of checking if 'unit_x' is known
to be zero at compile time (when processing padding pixels for NONE, or
PAD repeat).

Benchmarks from Intel Core i7 860:

== before (nearest OVER) ==
op=3, src_fmt=20028888, dst_fmt=20028888, speed=142.01 MPix/s

== after (nearest OVER) ==
op=3, src_fmt=20028888, dst_fmt=20028888, speed=314.99 MPix/s

== performance of nonscaled operation as a reference ==
op=3, src_fmt=20028888, dst_fmt=20028888, speed=652.09 MPix/s
2010-09-21 13:33:57 +03:00
Siarhei Siamashka
abc90dad57 NONE repeat support for fast scaling with nearest filter
Implemented very similarly to PAD repeat.

And gcc also seems to be able to completely eliminate the
code responsible for left and right padding pixels for OVER
operation with NONE repeat.
2010-09-21 13:33:08 +03:00
Siarhei Siamashka
45833d5b19 PAD repeat support for fast scaling with nearest filter
When processing pixels from the left and right padding, the same
scanline function is used with 'unit_x' set to 0.

It actually appears that gcc can handle this quite efficiently. When
using the 'restrict' keyword, it is able to optimize the whole operation
performed on left or right padding pixels to a small unrolled loop
(the code is reduced to a simple fill implementation):

    9b30:       89 08                   mov    %ecx,(%rax)
    9b32:       89 48 04                mov    %ecx,0x4(%rax)
    9b35:       48 83 c0 08             add    $0x8,%rax
    9b39:       49 39 c0                cmp    %rax,%r8
    9b3c:       75 f2                   jne    9b30

Without the 'restrict' keyword, there is one more instruction: reloading
source pixel data from memory in the beginning of each iteration. That
is slower, but also acceptable.
2010-09-21 13:32:11 +03:00
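
The trick described in 45833d5b19, reusing one scanline function for the padding by passing 'unit_x' as 0, can be sketched roughly as below. This is an illustration under stated assumptions, not the macro-generated code from the commit: the 16.16 fixed-point type and the names scanline_nearest_src() and scale_row_pad() are hypothetical, and only a plain copy (SRC) operation is shown.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_16_16_t;   /* assumed 16.16 fixed-point coordinate */

/* Nearest-neighbour scanline: each destination pixel reads the source
 * pixel at the integer part of vx, then vx advances by unit_x. */
static void
scanline_nearest_src (uint32_t *restrict       dst,
                      const uint32_t *restrict src,
                      int                      width,
                      fixed_16_16_t            vx,
                      fixed_16_16_t            unit_x)
{
    while (width-- > 0)
    {
        *dst++ = src[vx >> 16];
        vx += unit_x;
    }
}

/* PAD repeat for one row: the left padding repeats the first source
 * pixel and the right padding repeats the last one. With unit_x == 0
 * the same scanline function degenerates into a fill. */
static void
scale_row_pad (uint32_t       *dst,
               const uint32_t *src,
               int             src_width,
               int             left_pad,
               int             width,
               int             right_pad,
               fixed_16_16_t   vx,
               fixed_16_16_t   unit_x)
{
    assert (src_width > 0);

    scanline_nearest_src (dst, src, left_pad, 0, 0);
    scanline_nearest_src (dst + left_pad, src, width, vx, unit_x);
    scanline_nearest_src (dst + left_pad + width,
                          src + src_width - 1, right_pad, 0, 0);
}
```

Because 'unit_x' is a literal 0 in the two padding calls, a compiler that inlines the scanline function can see that the source index never changes, which is what allows the loop to collapse into the simple fill shown in the disassembly above.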
Siarhei Siamashka
3db0cc5c75 Introduce a fake PIXMAN_REPEAT_COVER constant
We need to implement true PIXMAN_REPEAT_NONE support later (padding
the source with zero pixels). So it's better not to use PIXMAN_REPEAT_NONE
for handling FAST_PATH_SAMPLES_COVER_CLIP special case.
2010-09-21 13:30:59 +03:00
Siarhei Siamashka
e9b0740af7 Nearest scaling fast path macro split into two parts
Scanline processing is now split into a separate function. This provides
an easy way of overriding it with a platform specific implementation,
which may use SIMD optimizations. Only basic C data types are used as
the arguments for this function, so it may be implemented entirely in
assembly or be generated by some JIT engine.

Also as a result of this split, the complexity of code is reduced a
bit and now it should be easier to introduce support for the currently
missing NONE, PAD and REFLECT repeat types.
2010-09-21 13:29:55 +03:00
Siarhei Siamashka
066ce191a6 Nearest scaling fast path macros moved to 'pixman-fast-path.h'
These macros with some modifications can be reused later by
various platform specific implementations, introducing SIMD
optimizations for nearest scaling fast paths.
2010-09-21 13:28:40 +03:00
Søren Sandmann Pedersen
fb819c0e93 Add FAST_PATH_NO_ALPHA_MAP to the standard destination flags.
We can't in general take a fast path if the destination has an alpha
map.
2010-09-14 08:57:17 -04:00
Siarhei Siamashka
ba6c98fc4b test: detection of possible floating point registers corruption
Added a pair of macros which can help to detect corruption
of floating point registers after a function call. This may
happen if _mm_empty() call is forgotten in MMX/SSE2 fast
path code, or ARM NEON assembly optimized function
forgets to save/restore d8-d15 registers before use.
2010-09-13 18:12:31 +03:00
Siarhei Siamashka
e470c0dc5b ARM: added 'neon_composite_over_0565_8_0565' fast path 2010-09-13 18:10:59 +03:00
Siarhei Siamashka
a5bf7c3b1a ARM: helper macros for conversion between 8888/x888/0565 formats 2010-09-13 18:08:16 +03:00
Siarhei Siamashka
8e299702f3 ARM: common init/cleanup macro for saving/restoring NEON registers
This is a typical prologue/epilogue for many NEON fast path functions, so
it makes sense to provide common reusable macros for it in the header file.
2010-09-13 18:05:53 +03:00
Søren Sandmann Pedersen
e29d9dfcb5 Silence some warnings about uninitialized variables
Neither were real problems, but GCC was complaining about them.
2010-09-08 19:16:21 -04:00
Søren Sandmann Pedersen
27f7852b5a When pixman_compute_composite_region32() returns FALSE, don't fini the region.
The rule is that the region passed in must be initialized and that the
region returned will still be valid. I.e., the lifecycle is the
responsibility of the caller, regardless of what the function returns.

Previously, compute_composite_region32() would finalize the region and
then return FALSE, and then the caller would finalize the region
again, leading to memory corruption in some cases.
2010-09-08 19:15:01 -04:00
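
The ownership rule stated in 27f7852b5a can be illustrated with a toy region type. Everything here (toy_region_t and the toy_* functions) is hypothetical and unrelated to the real pixman region API; the sketch only shows the pattern in which the caller initializes and finalizes the region exactly once, whatever the helper returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a region that owns heap memory. */
typedef struct
{
    int *rects;
    int  n_rects;
} toy_region_t;

static void
toy_region_init (toy_region_t *region)
{
    region->rects = NULL;
    region->n_rects = 0;
}

static void
toy_region_fini (toy_region_t *region)
{
    free (region->rects);
    region->rects = NULL;
    region->n_rects = 0;
}

/* Fills in an already initialized region. On failure it returns false
 * and leaves the region valid; it never finalizes it. */
static bool
toy_compute_region (toy_region_t *region, int width, int height)
{
    assert (region->rects == NULL);   /* caller passed a fresh region */

    if (width <= 0 || height <= 0)
        return false;                 /* no fini here: caller owns it */

    region->rects = malloc (4 * sizeof (int));
    if (!region->rects)
        return false;

    region->rects[0] = 0;
    region->rects[1] = 0;
    region->rects[2] = width;
    region->rects[3] = height;
    region->n_rects = 1;
    return true;
}

/* Caller: one init and one fini on every path. */
static bool
toy_composite (int width, int height)
{
    toy_region_t region;
    bool ok;

    toy_region_init (&region);
    ok = toy_compute_region (&region, width, height);
    /* ... composite using 'region' when ok is true ... */
    toy_region_fini (&region);

    return ok;
}
```

If toy_compute_region() also called toy_region_fini() before returning false, a fini that did not reset its pointer would free the same memory twice, which is the kind of corruption the commit message reports.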
Søren Sandmann Pedersen
df6dbc9024 Store a2b2g2r2 pixel through the WRITE macro
Otherwise, accessor functions won't work.
2010-09-08 19:14:58 -04:00
Siarhei Siamashka
f42419a3e4 ARM: added 'neon_composite_over_8888_8_0565' fast path 2010-09-06 23:56:05 +03:00
Julien Cristau
a4f6c93016 Upload to experimental 2010-09-06 21:15:21 +02:00
Maarten Bosmans
765bde32e0 Add *.exe to .gitignore 2010-08-30 13:41:38 -04:00
Maarten Bosmans
8596408261 Use windows.h directly for mingw32 build
This patch addresses the issue discussed in
http://lists.freedesktop.org/archives/pixman/2010-April/000163.html

There were only two clashing identifiers.  The first one is IN, which
obviously causes problems in Pixman for lines like

    PIXMAN_STD_FAST_PATH (IN, solid, a8, a8, fast_composite_in_n_8_8),

Fortunately the mingw headers provide a solution: by defining
_NO_W32_PSEUDO_MODIFIERS, these stupid symbols are skipped.

The other name is UINT64, used in pixman-mmx.c. I renamed that
function to to_uint64, but maybe another name is more appropriate.
2010-08-30 13:39:48 -04:00
Søren Sandmann Pedersen
5b99710042 Be more paranoid about checking for GTK+
From time to time people run into issues where the configure script
detects GTK+ when it is either not installed, or not functional due to
a missing pixman. Most recently:

  https://bugs.freedesktop.org/show_bug.cgi?id=29736

This patch makes the configure script more paranoid by

- always using PKG_CHECK_MODULES and not PKG_CHECK_EXISTS, since it
seems PKG_CHECK_EXISTS will sometimes return true even if a dependency
of GTK+, such as pixman-1, is missing.

- explicitly checking that pixman-1 is installed before enabling GTK+.

Cc: my.somewhat.lengthy.loginname@gmail.com
2010-08-24 08:12:20 -04:00
Søren Sandmann Pedersen
5530bcab26 Merge pixman_image_composite32() and do_composite().
There is not much point having a separate function that just validates
the images. Also add a boolean return to lookup_composite_function()
so that we can return if no composite function is found.
2010-08-24 08:12:20 -04:00
Benjamin Otte
a8ea889e5e region: Fix pixman_region_translate() clipping bug
Fixes the region-translate test case by clipping region translations to
the newly defined PIXMAN_REGION_MIN/MAX and using the newly introduced
type overflow_int_t to check for the overflow.
Also uses INT16_MAX or INT32_MAX for these values instead of relying on
the size of short and int types.
2010-08-24 12:17:50 +02:00
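
The overflow check mentioned in a8ea889e5e can be sketched for a single 32-bit coordinate as below. The type name overflow_int_t appears in the commit message, but defining it as int64_t is an assumption, and REGION_MIN, REGION_MAX and translate_clamped() are hypothetical names. This only shows a per-coordinate clamp, not the full region translation logic.

```c
#include <assert.h>   /* only needed if callers want runtime checks too */
#include <stdint.h>

/* Assumed: a type strictly wider than the 32-bit coordinate type, so
 * the sum of two coordinates can never wrap. */
typedef int64_t overflow_int_t;

/* Hypothetical limits, taken from fixed-width constants rather than
 * from the sizes of short or int. */
#define REGION_MIN INT32_MIN
#define REGION_MAX INT32_MAX

/* Translate one coordinate and clip the result to the valid range. */
static int32_t
translate_clamped (int32_t coord, int32_t delta)
{
    overflow_int_t t = (overflow_int_t) coord + delta;

    if (t < REGION_MIN)
        return REGION_MIN;
    if (t > REGION_MAX)
        return REGION_MAX;

    return (int32_t) t;
}
```

Doing the addition in the wider type avoids signed overflow, which is undefined behaviour in C, and makes the range test a plain comparison.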
Benjamin Otte
4d8fb1bc01 region: Add a new test region-translate
This test exercises a bug in pixman_region32_translate(). The function
clips the region to int16 coordinates SHRT_MIN/SHRT_MAX.
2010-08-24 12:17:18 +02:00
Søren Sandmann Pedersen
5ff359b8a0 Post-release version bump to 0.19.3 2010-08-21 06:39:44 -04:00
Søren Sandmann Pedersen
39308ed3b0 Pre-release version bump to 0.19.2 2010-08-21 06:33:19 -04:00