2107 Commits

Author SHA1 Message Date
Matt Turner
5d98abb14c mmx: wrap x86/MMX inline assembly in ifdef USE_X86_MMX
Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-27 13:12:55 -04:00
Matt Turner
02c1f1a022 mmx: rename USE_MMX to USE_X86_MMX
This will make upcoming ARM usage of pixman-mmx.c unambiguous.

Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-27 13:12:50 -04:00
Matt Turner
57fd8c37aa mmx: convert while (w) to if (w) when possible
gcc isn't able to see that w is no greater than 1, so it generates
unnecessary loop instructions with while (w).

Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-26 11:30:05 -04:00
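
As an illustration of the while (w) to if (w) change described above, here is a minimal sketch. The helper add_one_row() and its two-pixels-per-iteration main loop are hypothetical and are not taken from pixman-mmx.c; they only show the shape of a loop whose tail can hold at most one pixel.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: adds 1 to every byte of a row, two bytes per step. */
static void
add_one_row (uint8_t *dst, int w)
{
    assert (w >= 0);

    /* Main loop: consumes pixels in pairs. */
    while (w >= 2)
    {
        dst[0]++;
        dst[1]++;
        dst += 2;
        w -= 2;
    }

    /* At most one pixel is left here. Writing "while (w)" would be
     * equivalent, but the compiler may not prove that w <= 1 and would
     * then emit loop instructions that never iterate twice. */
    if (w)
        dst[0]++;
}
```

The behavior is the same either way; only the generated code differs.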
Matt Turner
38a7aae1d9 mmx: fix formats in commented code
b8r8g8 has apparently stopped being supported at some point since this
code was commented out.

Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-26 11:29:58 -04:00
Matt Turner
b6b77488a0 lowlevel-blt: add over_x888_8_8888
Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-26 11:29:51 -04:00
Siarhei Siamashka
9126f36b96 BILINEAR->NEAREST filter optimization for simple rotation and translation
Simple rotation and translation are the additional cases when BILINEAR
filter can be safely reduced to NEAREST.
2011-09-21 18:55:25 -04:00
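
A rough sketch of the kind of matrix check this reduction implies. It assumes that "simple rotation" means a rotation by a multiple of 90 degrees and that the translation must be a whole number of pixels in 16.16 fixed point. The type fixed_16_16, the macro FIXED_1 and both function names are illustrative and are not pixman's own.

```c
#include <assert.h>
#include <stdint.h>

typedef int32_t fixed_16_16;                 /* illustrative 16.16 type */
#define FIXED_1 ((fixed_16_16) 1 << 16)

/* Non-zero if v has a fractional part. */
static int
has_fraction (fixed_16_16 v)
{
    return (v & (FIXED_1 - 1)) != 0;
}

/* m is a 2x3 affine matrix: [ a b tx ; c d ty ].
 * Returns non-zero if every sample lands exactly on a pixel centre,
 * so that bilinear weights degenerate to 0 and 1. */
static int
bilinear_reduces_to_nearest (fixed_16_16 m[2][3])
{
    fixed_16_16 a = m[0][0], b = m[0][1];
    fixed_16_16 c = m[1][0], d = m[1][1];

    assert (m != 0);

    /* A fractional translation moves samples off pixel centres. */
    if (has_fraction (m[0][2]) || has_fraction (m[1][2]))
        return 0;

    /* Rotation by 0 or 180 degrees. */
    if (b == 0 && c == 0)
        return a == d && (a == FIXED_1 || a == -FIXED_1);

    /* Rotation by 90 or 270 degrees. */
    if (a == 0 && d == 0)
        return b == -c && (b == FIXED_1 || b == -FIXED_1);

    return 0;
}
```

Any scale factor other than 1, or a half-pixel offset, makes the check fail and the bilinear filter stays in place.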
Søren Sandmann Pedersen
ad5c6bbb36 Strength-reduce BILINEAR filter to NEAREST filter for identity transforms
An image with a bilinear filter and an identity transform is
equivalent to one with a nearest filter, so there is no reason the
standard fast paths shouldn't be usable.

But because a BILINEAR filter samples a 2x2 pixel block in the source
image, FAST_PATH_SAMPLES_COVER_CLIP can't be set in the case where the
source area is the entire image, because some compositing operations
might then read pixels outside the image.

This patch fixes the problem by splitting the
FAST_PATH_SAMPLES_COVER_CLIP flag into two separate flags
FAST_PATH_SAMPLES_COVER_CLIP_NEAREST and
FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR that indicate that the clip
covers the samples taking into account NEAREST/BILINEAR filters
respectively.

All the existing compositing operations that require
FAST_PATH_SAMPLES_COVER_CLIP then have their flags modified to pick
either COVER_CLIP_NEAREST or COVER_CLIP_BILINEAR depending on which
filter they depend on.

In compute_image_info() both COVER_CLIP_NEAREST and
COVER_CLIP_BILINEAR can be set depending on how much room there is
around the clip rectangle.

Finally, images with an identity transform and a bilinear filter get
FAST_PATH_NEAREST_FILTER set as well as FAST_PATH_BILINEAR_FILTER.

Performance measurements with render_bench against Xephyr:

Before:

*** ROUND 1 ***
---------------------------------------------------------------
Test: Test Xrender doing non-scaled Over blends
Time: 5.720 sec.
---------------------------------------------------------------
Test: Test Xrender (offscreen) doing non-scaled Over blends
Time: 5.149 sec.
---------------------------------------------------------------
Test: Test Imlib2 doing non-scaled Over blends
Time: 6.237 sec.

After:

*** ROUND 1 ***
---------------------------------------------------------------
Test: Test Xrender doing non-scaled Over blends
Time: 4.947 sec.
---------------------------------------------------------------
Test: Test Xrender (offscreen) doing non-scaled Over blends
Time: 4.487 sec.
---------------------------------------------------------------
Test: Test Imlib2 doing non-scaled Over blends
Time: 6.235 sec.
2011-09-21 18:55:25 -04:00
Søren Sandmann Pedersen
eb2e7ed81b test: Occasionally use a BILINEAR filter in blitters-test
To test that reductions of BILINEAR->NEAREST for identity
transformations happen correctly, occasionally use a bilinear filter
in blitters test.
2011-09-21 18:55:25 -04:00
Siarhei Siamashka
2a9f88430e test: better coverage for BILINEAR->NEAREST filter optimization
The upcoming optimization, which will replace the BILINEAR filter with
NEAREST where appropriate, has to analyze the transformation matrix
without making any mistakes.

The changes to affine-test include:
1. Higher chance of using the same scale factor for x and y axes. This can help
   to stress some special cases (for example the case when both x and y scale
   factors are integer). The same applies to x/y translation.
2. Introduced a small chance of "corrupting" the transformation matrix by
   flipping random bits. This can help to identify cases where some of the
   fast paths or other code logic is wrongly activated due to insufficient checks.
2011-09-21 18:55:10 -04:00
Søren Sandmann Pedersen
054922e2fc Eliminate compute_sample_extents() function
In analyze_extents(), instead of calling compute_sample_extents() call
compute_transformed_extents() and inline the remaining part of
compute_sample_extents(). The upcoming bilinear->nearest optimization
will do something different with these two pieces of code.
2011-09-21 18:53:03 -04:00
Søren Sandmann Pedersen
577b6c46fd Split computation of sample area into own function
compute_sample_extents() has two parts: one that computes the
transformed extents, and one that checks whether the computed extents
fit within the 16.16 coordinate space.

Split the first part into its own function
compute_transformed_extents().
2011-09-21 18:52:18 -04:00
Søren Sandmann Pedersen
5064f18031 Remove x and y coordinates from analyze_extents() and compute_sample_extents()
These coordinates were only ever used for subtracting from the extents
box to put it into the coordinate space of the image, so we might as
well do this coordinate translation only once before entering the
functions.
2011-09-21 18:48:55 -04:00
Søren Sandmann Pedersen
dbcb4af60d Use MAKE_ACCESSORS() to generate accessors for paletted formats
Add support in convert_pixel_from_a8r8g8b8() and
convert_pixel_to_a8r8g8b8() for conversion to/from paletted formats,
then use MAKE_ACCESSORS() to generate accessors for the indexed
formats: c8, g8, g4, c4, g1
2011-09-20 06:44:05 -04:00
Søren Sandmann Pedersen
c82c2c3853 Use MAKE_ACCESSORS() to generate accessors for the a1 format.
Add FETCH_1 and STORE_1 macros and use them to add support for 1bpp
pixels to fetch_and_convert_pixel() and convert_and_store_pixel(),
then use MAKE_ACCESSORS() to generate the accessors for the a1
format. (Not the g1 format as it is indexed).
2011-09-20 06:44:05 -04:00
Søren Sandmann Pedersen
2114dd8aa1 Use MAKE_ACCESSORS() to generate accessors for 24bpp formats
Add FETCH_24 and STORE_24 macros and use them to add support for 24bpp
pixels in fetch_and_convert_pixel() and
convert_and_store_pixel(). Then use MAKE_ACCESSORS() to generate
accessors for the 24 bpp formats:

    r8g8b8
    b8g8r8
2011-09-20 06:44:05 -04:00
Søren Sandmann Pedersen
f19f5daa1b Use MAKE_ACCESSORS() to generate accessors for 4 bpp RGB formats
Use FETCH_4 and STORE_4 macros to add support for 4bpp pixels to
fetch_and_convert_pixel() and convert_and_store_pixel(), then use
MAKE_ACCESSORS() to generate accessors for 4 bpp formats, except g4 and
c4 which are indexed:

    a4
    r1g2b1
    b1g2r1
    a1r1g1b1
    a1b1g1r1
2011-09-20 06:44:04 -04:00
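
A minimal sketch of what 4bpp fetch and store helpers can look like. These are plain functions rather than the FETCH_4 and STORE_4 macros named above, and the nibble order (even pixel in the low nibble) is an assumption made for the example; the real macros may order nibbles differently depending on endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: pixel 2n is the low nibble of byte n, pixel 2n+1 the high one. */
static uint32_t
fetch_4 (const uint8_t *row, int offset)
{
    uint8_t byte;

    assert (offset >= 0);
    byte = row[offset >> 1];

    return (offset & 1) ? (uint32_t) (byte >> 4) : (uint32_t) (byte & 0x0F);
}

static void
store_4 (uint8_t *row, int offset, uint32_t value)
{
    uint8_t *byte;

    assert (offset >= 0);
    byte = &row[offset >> 1];
    value &= 0x0F;

    /* Replace one nibble and leave the neighbouring pixel untouched. */
    if (offset & 1)
        *byte = (uint8_t) ((*byte & 0x0F) | (value << 4));
    else
        *byte = (uint8_t) ((*byte & 0xF0) | value);
}
```

With helpers like these in place, a generic accessor only needs the bits-per-pixel value to pick the right fetch and store routine.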
Søren Sandmann Pedersen
af78fe24e4 Use MAKE_ACCESSORS() to generate accessors for 8bpp RGB formats
Add support for 8 bpp formats to fetch_and_convert_pixel() and
convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate the
accessors for all the 8 bpp formats, except g8 and c8, which are
indexed:

    a8
    r3g3b2
    b2g3r3
    a2r2g2b2
    a2b2g2r2
    x4a4
2011-09-20 06:44:04 -04:00
Søren Sandmann Pedersen
5e1b9f8975 Use MAKE_ACCESSORS() to generate accessors for all the 16bpp formats
Add support for 16bpp pixels to fetch_and_convert_pixel() and
convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate
accessors for all the 16bpp formats:

    r5g6b5
    b5g6r5
    a1r5g5b5
    x1r5g5b5
    a1b5g5r5
    x1b5g5r5
    a4r4g4b4
    x4r4g4b4
    a4b4g4r4
    x4b4g4r4
2011-09-20 06:44:04 -04:00
Søren Sandmann Pedersen
a77597bcb8 Use MAKE_ACCESSORS() to generate all the 32 bit accessors
Add support for 32bpp formats in fetch_and_convert_pixel() and
convert_and_store_pixel(), then use MAKE_ACCESSORS() to generate
accessors for all the 32 bpp formats:

    a8r8g8b8
    x8r8g8b8
    a8b8g8r8
    x8b8g8r8
    x14r6g6b6
    b8g8r8a8
    b8g8r8x8
    r8g8b8x8
    r8g8b8a8
2011-09-20 06:44:04 -04:00
Søren Sandmann Pedersen
814af33df3 Add initial version of the MAKE_ACCESSORS() macro
This macro will eventually allow the fetchers and storers to be
generated automatically. For now, it's just a skeleton that doesn't
actually do anything.
2011-09-20 06:44:04 -04:00
Søren Sandmann Pedersen
5cae7a3fe6 Add general pixel converter
This function can convert between any <= 32 bpp formats. Nothing uses
it yet.
2011-09-20 06:44:04 -04:00
Søren Sandmann Pedersen
22f54dde6b Add a generic unorm_to_unorm() conversion utility
This function can convert between normalized numbers of different
depths. When converting to higher bit depths, it will replicate the
existing bits, when converting to lower bit depths, it will simply
truncate.

This function replaces the expand16() function in pixman-utils.c
2011-09-20 06:44:04 -04:00
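
The conversion rule described above (replicate bits when widening, truncate when narrowing) can be sketched as follows. The function name unorm_convert() and the 1 to 16 bit limit are choices made for this example; the signature of the real unorm_to_unorm() is not shown in the log.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a normalized value from from_bits of precision to to_bits. */
static uint32_t
unorm_convert (uint32_t val, int from_bits, int to_bits)
{
    uint32_t result;
    int have;

    assert (from_bits >= 1 && from_bits <= 16);
    assert (to_bits >= 1 && to_bits <= 16);

    val &= (1u << from_bits) - 1;

    /* Narrowing: drop the low bits. */
    if (to_bits <= from_bits)
        return val >> (from_bits - to_bits);

    /* Widening: append copies of the most significant bits until
     * the result is to_bits wide. */
    result = val;
    have = from_bits;
    while (have < to_bits)
    {
        int take = to_bits - have;

        if (take > from_bits)
            take = from_bits;

        result = (result << take) | (val >> (from_bits - take));
        have += take;
    }

    return result;
}
```

Bit replication maps the maximum input to the maximum output (5-bit 0x1F becomes 8-bit 0xFF), which a plain left shift would not do.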
Søren Sandmann Pedersen
d842669a46 A few tweaks to a comment in pixman-combine.c.template
Include a link to

	http://marc.info/?l=xfree-render&m=99792000027857&w=2

where Keith explains how the disjoint/conjoint operators work.
2011-09-19 09:08:33 -04:00
Jon TURNEY
3432e1a344 Fix build on cygwin after commit efdf65c0c4
libutils depends on pixman and so needs to precede it in the link order.

Found by tinderbox, see [1]

[1] http://tinderbox.freedesktop.org/builds/2011-09-15-0005/logs/pixman/#build

Signed-off-by: Jon TURNEY <jon.turney at dronecode.org.uk>
2011-09-19 06:17:58 -04:00
Søren Sandmann Pedersen
f9faf4df44 test: Use smaller boxes in region_contains_test()
The boxes used in region_contains_test() sometimes overflow, causing

    *** BUG ***
    In pixman_region32_union_rect: Invalid rectangle passed
    Set a breakpoint on '_pixman_log_error' to debug

messages to be printed when pixman is compiled with DEBUG. Fix this by
dividing the x, y, w, h coordinates by 4 to prevent overflows.
2011-09-19 06:15:14 -04:00
Andrea Canciani
9623b478f7 build-win32: Add 'check' target
On win32 the tests are built but they are not run automatically by the
build system.

A minimal 'check' target (depending on the tests being built) can
simply run them and log their success/failure to the console.
2011-09-14 07:03:35 -07:00
Andrea Canciani
479d094485 test: Do not include config.h unless HAVE_CONFIG_H is defined
The win32 build system does not generate config.h and correctly runs
the compiler without defining HAVE_CONFIG_H. Nevertheless some files
include config.h without checking for its availability, breaking the
build from a clean directory:

test\utils.h(2) : fatal error C1083: Cannot open include file:
'config.h': No such file or directory
...
2011-09-14 07:03:35 -07:00
Andrea Canciani
d46a9f3ace build-win32: Add root Makefile.win32
Add Makefile.win32 to the pixman root. This makefile can recursively
run the other ones to compile the library or the test suite.
2011-09-14 07:03:35 -07:00
Andrea Canciani
a76b78c2da build-win32: Share targets and variables across win32 makefiles
The win32 build system repeatedly defines some basic variables
(notably program names and flags) and C sources compilation rules.

They can be factored out to a common Makefile, to be included in every
other Makefile.win32.
2011-09-14 07:03:35 -07:00
Andrea Canciani
efdf65c0c4 build: Reuse test sources
Makefile.am and Makefile.win32 should not duplicate content, as this
leads to breaking the build when they are not kept in sync.

This can be avoided by listing sources, headers and common build
variables/rules in a Makefile.sources file.

In order to further simplify the test makefiles, the utility functions
are now in a static library, which gets linked to all the tests and
benchmarks.
2011-09-14 07:03:34 -07:00
Andrea Canciani
a4f95d083b build: Reuse sources and pixman-combine build rules
Makefile.am and Makefile.win32 should not duplicate content, as this
leads to breaking the build when they are not kept in sync.

This can be avoided by listing sources, headers and common build
variables/rules in a Makefile.sources file.
2011-09-14 07:02:59 -07:00
Andrea Canciani
25bd96a3d0 test: Fix compilation on win32
Adding scaling-helpers-test to the testsuite on win32 makes MSVC
complain about int64_t being used as an expression:

scaling-helpers-test.c(27) : error C2275: 'int64_t' : illegal use of
this type as an expression
2011-09-14 07:02:59 -07:00
Søren Sandmann Pedersen
9882d832f6 Use pkg-config to determine the flags to use with libpng
Previously we would unconditionally link with -lpng leading to build
failures on systems without libpng.
2011-09-12 22:39:53 -04:00
Søren Sandmann Pedersen
99a53667da test: New function to save a pixman image to .png
When debugging it is often very useful to be able to save an image as
a png file. This commit adds a function "write_png()" that does that.

If libpng is not available, then the function becomes a no-op.
2011-09-10 04:07:50 -04:00
Søren Sandmann Pedersen
1e1ae0bf6e Post-release version bump to 0.23.5
2011-09-09 23:59:20 -04:00
Søren Sandmann Pedersen
f901e3b58b Pre-release version bump to 0.23.4
2011-09-09 23:51:11 -04:00
Chris Wilson
f5da52b677 bits: optimise fetching width==1 repeats
Profiling ign.com, 20% of the entire render time was absorbed in this
single operation:

<< /content //COLOR_ALPHA /width 480 /height 800 >> surface context
<< /width 1 /height 677 /format //ARGB32 /source <|!!!@jGb!m5gD']#$jFHGWtZcK&2i)Up=!TuR9`G<8;ZQp[FQk;emL9ibhbEL&NTh-j63LhHo$E=mSG,0p71`cRJHcget4%<S\X+~> >> image pattern
  //EXTEND_REPEAT set-extend
  set-source
n 0 0 480 677 rectangle
fill+
pop

which is a simple composition of a single pixel wide image. Sadly this
is a workaround for lack of independent repeat-x/y handling in cairo and
pixman. Worse still is that the worst-case behaviour of the general repeat
path is for width 1 images...

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
2011-09-09 23:43:16 -04:00
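
To make the idea concrete: when a repeating image is one pixel wide, every pixel of a fetched scanline is the same, so it can be read once and replicated. The sketch below is not the code from this commit; fetch_repeat_scanline() and its parameters are hypothetical and only contrast the special case with a general wrap-around path.

```c
#include <assert.h>
#include <stdint.h>

/* Fill buffer[0..width) with pixels from a horizontally repeating row,
 * starting at source coordinate x (which may be negative). */
static void
fetch_repeat_scanline (const uint32_t *row, int image_width,
                       int x, int width, uint32_t *buffer)
{
    int i;

    assert (image_width > 0 && width >= 0);

    if (image_width == 1)
    {
        /* Special case: one load, then a plain fill. */
        uint32_t pixel = row[0];

        for (i = 0; i < width; i++)
            buffer[i] = pixel;
        return;
    }

    /* General path: wrap the coordinate into [0, image_width). */
    x %= image_width;
    if (x < 0)
        x += image_width;

    for (i = 0; i < width; i++)
    {
        buffer[i] = row[x];
        if (++x == image_width)
            x = 0;
    }
}
```

In the general path a width of 1 means the coordinate wraps on every single pixel, which is why that case is the slowest one without the shortcut.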
Taekyun Kim
7ef44cae6b ARM: NEON better instruction scheduling of over_n_8888
New head, tail, tail/head blocks are added and instructions
are reordered to eliminate pipeline stalls

Performance numbers of before/after

- cortex a8 -
before : L1: 375.39  L2: 391.93  M:114.39 ( 40.99%)  HT: 99.37  VT: 98.20  R: 90.24  RT: 32.87 ( 240Kops/s)
after  : L1: 481.90  L2: 483.46  M:114.29 ( 40.69%)  HT:106.91  VT: 93.38  R: 90.74  RT: 29.51 ( 236Kops/s)

- cortex a9 -
before : L1: 324.50  L2: 332.79  M:155.55 ( 47.51%)  HT:111.93  VT: 93.58  R: 71.92  RT: 28.21 ( 233Kops/s)
after  : L1: 355.87  L2: 364.49  M:156.90 ( 47.59%)  HT:111.52  VT: 91.76  R: 72.16  RT: 28.22 ( 234Kops/s)
2011-09-07 11:01:50 +09:00
Taekyun Kim
6aa82b7a72 ARM: NEON better instruction scheduling of over_n_8_8888
tail/head block is expanded and reordered to eliminate stalls

Performance numbers of before/after

- cortex a8 -
before : L1: 201.35  L2: 190.48  M:101.94 ( 54.85%)  HT: 78.41  VT: 63.83  R: 58.25  RT: 21.74 ( 191Kops/s)
after  : L1: 257.65  L2: 255.49  M:102.04 ( 55.33%)  HT: 79.19  VT: 65.46  R: 59.23  RT: 21.12 ( 189Kops/s)

- cortex a9 -
before : L1: 157.35  L2: 159.81  M:133.00 ( 60.94%)  HT: 82.44  VT: 63.64  R: 51.66  RT: 19.15 ( 179Kops/s)
after  : L1: 216.83  L2: 219.40  M:135.83 ( 61.80%)  HT: 85.60  VT: 64.80  R: 52.23  RT: 19.16 ( 179Kops/s)
2011-09-07 11:01:47 +09:00
Andrea Canciani
4ffa077487 Workaround bug in llvm-gcc
llvm-gcc (shipped in Apple Xcode 4.1.1 as the default compiler or in
the 2.9 release of LLVM) performs an invalid optimization which
unifies the empty_region and the bad_region structures because they
have the same content.

A bug report has been filed against Apple Developer Tools for this
issue. This commit works around this bug by making one of the two
structures volatile, so that it cannot be merged.

Fixes region-contains-test.
2011-08-29 07:38:37 +02:00
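
A small sketch of the workaround pattern: two constant structures with identical contents are told apart only by their addresses, so they must not be merged into one object. The type and variable names below are illustrative and do not match the ones in pixman-region.c.

```c
#include <assert.h>
#include <stddef.h>

typedef struct
{
    long size;
    long num_rects;
} region_data_t;

/* Same contents, different meaning: identity is decided by address. */
static const region_data_t empty_data = { 0, 0 };

/* volatile keeps the compiler from folding this object into empty_data. */
static const volatile region_data_t broken_data = { 0, 0 };

static int
region_is_empty (const volatile region_data_t *data)
{
    assert (data != NULL);
    return data == &empty_data;
}

static int
region_is_broken (const volatile region_data_t *data)
{
    assert (data != NULL);
    return data == &broken_data;
}
```

If the two objects shared one address, a broken region would be indistinguishable from an empty one.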
Andrea Canciani
a1ebff0dcb win32: Build benchmarks
Add the makefile rules needed to compile lowlevel-blt-bench on win32
and fix the compilation errors.
2011-08-29 07:37:46 +02:00
Søren Sandmann Pedersen
2644d5a947 Move bilinear interpolation to pixman-inlines.h
2011-08-19 20:01:40 -04:00
Søren Sandmann Pedersen
12ad42dd32 Use repeat() function from pixman-inlines.h in pixman-bits-image.c
The repeat() functionality was duplicated between pixman-bits-image.c
and pixman-inlines.h
2011-08-19 20:01:40 -04:00
Søren Sandmann Pedersen
2f443466bb Rename pixman-fast-path.h to pixman-inlines.h
It is not really specific to pixman-fast-path.c.
2011-08-19 20:01:36 -04:00
Søren Sandmann Pedersen
e58b208958 In pixman_image_create_bits() allow images larger than 2GB
There is no reason for pixman_image_create_bits() to check that the
image size fits in int32_t. The correct check is against size_t since
that is what the argument to calloc() is.

This patch fixes this by adding a new _pixman_multiply_overflows_size()
and using it in create_bits(). Also prepend an underscore to the names
of other similar functions since they are internal to pixman.

V2: Use int, not ssize_t for the arguments in create_bits() since
width/height are still limited to 32 bits, as pointed out by Chris
Wilson.
2011-08-15 09:37:49 -04:00
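
A sketch of an overflow check against size_t, in the spirit of the change described above. The function names are illustrative, and the exact comparison used by _pixman_multiply_overflows_size() is not shown in the log, so treat this as one reasonable way to write it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Non-zero if a * b does not fit in size_t. */
static int
multiply_overflows_size (size_t a, size_t b)
{
    return b != 0 && a > SIZE_MAX / b;
}

/* Allocate zeroed storage for `height` rows of `stride_bytes` each.
 * Returns NULL on overflow or allocation failure. */
static void *
alloc_image_bits (int height, size_t stride_bytes)
{
    assert (height >= 0);

    if (multiply_overflows_size ((size_t) height, stride_bytes))
        return NULL;

    return calloc ((size_t) height * stride_bytes, 1);
}
```

Checking against SIZE_MAX instead of INT32_MAX lets buffers larger than 2GB through on 64-bit systems while still rejecting products that would wrap.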
Søren Sandmann Pedersen
bdfb5944ff Don't include stdint.h in lowlevel-blt-bench.c
Some systems don't have the file, and the types are already defined in
pixman.h.

https://bugs.freedesktop.org//show_bug.cgi?id=37422
2011-08-11 03:32:14 -04:00
Søren Sandmann Pedersen
e5d85ce662 Use find_box_for_y() in pixman_region_contains_point() too
The same binary search from the previous commit can be used in this
function too.

V2: Remove check from loop that is not needed anymore, pointed out by
Andrea Canciani.
2011-08-11 03:32:14 -04:00
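
The binary search that this commit and the one below it rely on can be sketched like this. It assumes the usual banded layout of a region: boxes sorted by band from top to bottom and by x within a band. The box_t type and both function bodies are a reconstruction for illustration, not the code from pixman-region.c.

```c
#include <assert.h>
#include <stddef.h>

typedef struct
{
    int x1, y1, x2, y2;    /* half-open: x1 <= x < x2, y1 <= y < y2 */
} box_t;

/* Return the first box in [begin, end) whose y2 is greater than y. */
static const box_t *
find_box_for_y (const box_t *begin, const box_t *end, int y)
{
    assert (begin != NULL && end != NULL && begin <= end);

    while (begin != end)
    {
        const box_t *mid = begin + (end - begin) / 2;

        if (mid->y2 > y)
            end = mid;          /* answer is mid or earlier */
        else
            begin = mid + 1;    /* mid lies entirely above y */
    }

    return begin;
}

static int
region_contains_point (const box_t *boxes, int n, int x, int y)
{
    const box_t *end = boxes + n;
    const box_t *box = find_box_for_y (boxes, end, y);

    for (; box != end; box++)
    {
        /* Past the band holding y, or already right of x: no hit. */
        if (y < box->y1 || x < box->x1)
            break;

        /* Box ends left of x: try the next one in the band. */
        if (x >= box->x2)
            continue;

        return 1;
    }

    return 0;
}
```

The search replaces a linear walk over every box above the point, which is where the time went for regions with many bands.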
Søren Sandmann Pedersen
04bd4bdca6 Speed up pixman_region{,32}_contains_rectangle()
When someone selects some text in Firefox under a non-composited X
server and initiates a drag, a shaped window is created with a complex
shape corresponding to the outline of the text. Then, on every mouse
movement pixman_region_contains_rectangle() is called many times on
that complicated region. And pixman_region_contains_rectangle() is
doing a linear scan through the rectangles in the region, although the
scan does exit when it finds the first box that can't possibly
intersect the passed-in rectangle.

This patch changes the loop so that it uses a binary search to skip
boxes that don't overlap the current y position.  The performance
improvement for the text dragging case is easily noticeable.

V2: Use the binary search for the "getting up to speed or skipping
remainder of band" as well.
2011-08-11 03:32:14 -04:00
Søren Sandmann Pedersen
795ec5af2f New test of pixman_region_contains_{rectangle,point}
This test generates random regions and checks whether random boxes and
points are contained within them. The results are combined and a CRC32
value is computed and compared to a known-correct one.
2011-08-11 03:32:14 -04:00
Søren Sandmann Pedersen
842591d9d1 Fix lcg_rand_u32() to return 32 random bits.
The lcg_rand() function only returns 15 random bits, so lcg_rand_u32()
would always have 0 in bit 31 and bit 15. Fix that by calling
lcg_rand() three times, to generate 15, 15, and 2 random bits
respectively.

V2: Use the 10/11 most significant bits from the 3 lcg results and mix
them with the low ones from the adjacent one, as suggested by Andrea
Canciani.
2011-08-11 03:32:14 -04:00
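
One possible reading of the V2 scheme, sketched below: three 15-bit results are shifted so that their most significant bits tile the 32-bit word, with the low bits of each overlapping its neighbour. The generator constants and the exact shift amounts are assumptions made for this example and may differ from the test utilities in the tree.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t lcg_state = 1;

/* 15-bit linear congruential generator (constants assumed for the sketch). */
static uint32_t
lcg_rand15 (void)
{
    lcg_state = lcg_state * 1103515245u + 12345u;
    return (lcg_state >> 16) & 0x7FFF;
}

/* Combine three 15-bit results into 32 bits. */
static uint32_t
lcg_rand32 (void)
{
    uint32_t lo  = lcg_rand15 () >> 5;     /* top 10 bits -> bits 0..9   */
    uint32_t mid = lcg_rand15 () << 6;     /* 15 bits     -> bits 6..20  */
    uint32_t hi  = lcg_rand15 () << 17;    /* 15 bits     -> bits 17..31 */

    return hi ^ mid ^ lo;
}

/* OR together n results, to see which bit positions can ever be set. */
static uint32_t
bits_seen (int n)
{
    uint32_t seen = 0;

    assert (n >= 0);
    while (n-- > 0)
        seen |= lcg_rand32 ();

    return seen;
}
```

A naive (a << 16) | b of two 15-bit values leaves bits 31 and 15 permanently zero, which is the defect the commit message describes.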