1929 Commits

Author SHA1 Message Date
Søren Sandmann Pedersen
3f7da59352 test: Parallelize composite.c with OpenMP
Each test uses the test number as the random number seed; if it
didn't, all the threads would run the same tests since they would all
start from the same seed.
2010-10-11 12:06:20 -04:00
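
The seeding scheme described in this commit could be sketched roughly as follows. This is an illustration only, not the code in composite.c: run_test(), run_all_tests(), and the small LCG in lcg_next() are hypothetical names invented here. The point is that each test derives its random values from its own test number, so the result is the same no matter which thread runs it.

```c
#include <stdint.h>

/* Hypothetical per-test PRNG: a small LCG, not pixman's generator. */
static uint32_t lcg_next (uint32_t *state)
{
    *state = *state * 1103515245u + 12345u;
    return (*state >> 16) & 0x7fff;
}

/* Hypothetical test body. All random values come from a state seeded
 * with the test number, so the result is independent of the thread. */
static uint32_t run_test (int test_number)
{
    uint32_t state = (uint32_t) test_number;   /* seed = test number */
    uint32_t checksum = 0;
    int i;

    for (i = 0; i < 8; i++)
        checksum = checksum * 31u + lcg_next (&state);

    return checksum;
}

/* Runs tests [0, n_tests) and stores one result per test. The pragma
 * only takes effect when the compiler has OpenMP enabled. */
static void run_all_tests (uint32_t *results, int n_tests)
{
    int i;

#ifdef _OPENMP
#   pragma omp parallel for
#endif
    for (i = 0; i < n_tests; i++)
        results[i] = run_test (i);
}
```

With a shared seed and shared generator state, the values a test sees would depend on thread scheduling; seeding from the loop index avoids that.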
Søren Sandmann Pedersen
a10ccc9f30 test: Change composite so that it tests randomly generated images
Previously this test would try to exhaustively test all combinations
of formats and operators, which meant that it would take hours to run.
Instead, generate images randomly and test compositing those.

Cc: chris@chris-wilson.co.uk
2010-10-11 12:06:20 -04:00
Søren Sandmann Pedersen
55e4065cbb test: Fix eval_diff() so that it provides useful error values.
Previously, this function would evaluate the error under the
assumption that the format was 565 or wider. This patch changes it to
take the actual format into account.

With that fixed, we can turn on testing for the rest of the formats.

Cc: chris@chris-wilson.co.uk
2010-10-11 12:06:20 -04:00
Søren Sandmann Pedersen
fe411cf2ac test: Fix bug in color_correct() in composite.c
This function was using the number of bits in a channel as if it were
a mask, which led to many spurious errors. With that fixed, we can
turn on testing for all formats where all channels have 5 or more
bits.

Cc: chris@chris-wilson.co.uk
2010-10-11 12:06:20 -04:00
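
The difference between a bit count and a mask can be illustrated with a short sketch. The helpers channel_mask() and extract_channel() are hypothetical names, not functions from composite.c; they only show why a 5-bit channel needs the mask 0x1f rather than the value 5.

```c
#include <stdint.h>

/* Converts a channel width in bits into a mask covering that channel.
 * For example, a 5-bit channel has the mask 0x1f, not 5. */
static uint32_t channel_mask (int n_bits)
{
    if (n_bits >= 32)
        return 0xffffffffu;

    return (1u << n_bits) - 1;
}

/* Extracts a channel of n_bits bits that starts at bit position shift. */
static uint32_t extract_channel (uint32_t pixel, int shift, int n_bits)
{
    return (pixel >> shift) & channel_mask (n_bits);
}
```

For an r5g6b5 pixel, extract_channel (pixel, 11, 5) yields the red channel; masking with the bit count 5 instead would keep only bits 0 and 2.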
Søren Sandmann Pedersen
4e89a5b7f3 Remove broken optimizations in combine_disjoint_over_u()
The first broken optimization is that it checks "a != 0x00" where it
should check "s != 0x00". The other is that it skips the computation
when alpha is 0xff. That is wrong because in the formula:

     min (1, (1 - Aa)/Ab)

the render specification states that if Ab is 0, the quotient is
defined to be positive infinity. That is the case even if (1 - Aa) is 0.
2010-10-11 12:06:20 -04:00
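
The rule quoted in this commit can be written out as a small floating-point sketch. pixman itself works on integer channels, so this is not the actual combiner; disjoint_over_factor() is a hypothetical name, and aa and ab stand for the Aa and Ab of the formula, assumed to lie in [0, 1].

```c
/* Returns min (1, (1 - aa) / ab). The quotient counts as positive
 * infinity whenever ab is 0, even if (1 - aa) is also 0, so the
 * result is 1 in that case. */
static double disjoint_over_factor (double aa, double ab)
{
    double q;

    if (ab == 0.0)
        return 1.0;

    q = (1.0 - aa) / ab;
    return q < 1.0 ? q : 1.0;
}
```

Note that disjoint_over_factor (1.0, 0.0) is 1, not 0, which is why a shortcut that skips the computation whenever alpha is 0xff gives the wrong answer.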
Siarhei Siamashka
8d76c1b339 ARM: restore fallback to ARMv6 implementation from NEON in the delegate chain
After fast path cache introduction, the overhead of having this fallback is
insignificant. On the other hand, some of the ARM assembly optimizations (for
example nearest neighbor scaling) do not need NEON.
2010-10-11 01:07:07 +03:00
Siarhei Siamashka
c748650d70 Use more unrolling for scaled src_0565_0565 with nearest filter
Benchmark from Intel Core i7 860:

    == before ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=1335.29 MPix/s

    == after ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=1550.96 MPix/s

    == performance of nonscaled src_0565_0565 operation as a reference ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=2401.31 MPix/s

Benchmark from ARM Cortex-A8:

    == before ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=81.79 MPix/s

    == after ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=89.55 MPix/s

    == performance of nonscaled src_0565_0565 operation as a reference ==
    op=1, src_fmt=10020565, dst_fmt=10020565, speed=197.44 MPix/s
2010-10-11 01:07:01 +03:00
Siarhei Siamashka
a520c15e11 ARM: added 'neon_composite_out_reverse_8_0565' fast path
== before ==

    outrev_8_0565 =  L1:  22.91  L2:  22.40  M: 18.75 ( 10.47%)
                     HT: 12.62   VT: 12.22   R: 11.32  RT:  5.30 (  58Kops/s)

== after ==

    outrev_8_0565 =  L1: 176.27  L2: 151.70  M:108.79 ( 60.81%)
                     HT: 50.43   VT: 37.16   R: 32.26  RT:  9.62 (  97Kops/s)
2010-10-04 23:08:54 +03:00
Siarhei Siamashka
d8820360f7 ARM: added 'neon_composite_add_0565_8_0565' fast path
== before ==

    add_0565_8_0565 =  L1:  14.05  L2:  14.03  M: 11.57 ( 12.94%)
                       HT:  8.31   VT:  8.10   R:  7.47  RT:  3.64 (  42Kops/s)

== after ==

    add_0565_8_0565 =  L1: 123.36  L2:  94.70  M: 74.36 ( 83.15%)
                       HT: 31.17   VT:  23.97  R: 21.06  RT:  6.42 (  70Kops/s)
2010-10-04 23:08:47 +03:00
Siarhei Siamashka
2f6c7b4f9d ARM: NEON: added forgotten cache preload for over_n_8888/over_n_0565
Prefetch provides up to 40-50% better performance when working
with large images and/or when having lots of L2 cache misses
on ARM Cortex-A8 @ 720MHz:

== before ==

    over_n_8888 =  L1: 225.83  L2: 181.02  M: 55.57 ( 41.41%)
                   HT: 38.96   VT: 36.92   R: 32.84  RT: 14.15 ( 123Kops/s)

    over_n_0565 =  L1: 153.91  L2: 149.69  M: 83.17 ( 30.95%)
                   HT: 50.41   VT: 49.15   R: 40.56  RT: 15.45 ( 131Kops/s)

== after ==

    over_n_8888 =  L1: 222.39  L2: 170.95  M: 76.86 ( 57.27%)
                   HT: 58.80   VT: 53.03   R: 45.51  RT: 14.13 ( 124Kops/s)

    over_n_0565 =  L1: 151.87  L2: 149.54  M:125.63 ( 46.80%)
                   HT: 67.85   VT: 57.54   R: 50.21  RT: 15.32 ( 130Kops/s)
2010-10-04 23:05:24 +03:00
Mika Yrjola
b924bb1f81 Fix "syntax error: empty declaration" warnings.
These minor changes should fix a large number of
macro-declaration-related "syntax error: empty declaration" warnings
that are seen when compiling the code with the Solaris Studio
compiler.
2010-10-04 11:20:01 -04:00
Søren Sandmann Pedersen
73c1fefa1b Delete simple repeat code
This was supposedly an optimization, but it has pathological cases
where it definitely isn't. For example a 1 x n image will cause it to
have terrible memory access patterns and to generate a ton of modulus
operations.

Since no one has ever measured whether it actually is an improvement,
and since it is doing the repeating at the wrong stage in the
pipeline, and since with the previous commit it can't be triggered
anymore because we now require SAMPLES_COVER_CLIP for regular fast
paths, just delete it.
2010-10-04 11:19:27 -04:00
Søren Sandmann Pedersen
a4d1c9d383 Fix bug in FAST_PATH_STD_FAST_PATH
The standard fast paths deal with two kinds of images: solids and
bits. These two image types require different flags, but
PIXMAN_STD_FAST_PATH uses the same ones for both.

This patch makes it so that solid images just get the standard flags,
while bits images must be untransformed and contain the destination clip
within the sample grid.

This means that the old FAST_PATH_COVERS_CLIP flag is now not used
anymore, so it can be deleted.
2010-10-04 11:17:53 -04:00
Dmitri Vorobiev
10e13135c3 Some clean-ups in fence_malloc() and fence_free()
This patch removes an unnecessary typecast of MAP_FAILED,
replaces an erroneous free() with the correct munmap() in the
error path for a failing mprotect(), and, finally, removes
calls to mprotect() that are redundant because munmap() does
not require any particular memory protection.
2010-09-29 02:15:12 -04:00
Søren Sandmann Pedersen
ba693d2e88 Fix search-and-replace issue in lowlevel-blt-bench.c 2010-09-28 02:52:17 -04:00
Søren Sandmann Pedersen
77d3e5f6ff Rename all the fast paths with _8000 in their names to _8
This inconsistent naming somehow survived the refactoring from a while
back.
2010-09-28 00:07:47 -04:00
Liu Xinyun
ba69989374 Remove cache prefetch code.
Performance decreases with cache prefetch, especially on Atom,
so remove this code. The experiment follows.

old: 0.19.5-with-cache-prefetch
new: 0.19.5-without-cache-prefetch

CPU: Intel Atom N270@1.6GHz
OS: MeeGo (32 bits)
Speedups
========
image-rgba                    poppler-0    17125.68 (17279.58 0.92%) -> 14765.36 (15926.49 3.54%):  1.16x speedup
image-rgba                  ocitysmap-0    9008.25 (9040.41 7.50%) -> 8277.94 (8343.09 5.44%):  1.09x speedup
image-rgba          xfce4-terminal-a1-0    18020.76 (18230.68 0.97%) -> 16703.77 (16712.42 1.22%):  1.08x speedup
image-rgba         gnome-terminal-vim-0    25081.38 (25133.38 0.24%) -> 23407.47 (23652.98 0.54%):  1.07x speedup
image-rgba          firefox-talos-gfx-0    57916.97 (57973.20 0.11%) -> 54556.64 (54624.55 0.39%):  1.06x speedup
image-rgba       firefox-planet-gnome-0    102377.47 (103496.63 0.70%) -> 96816.65 (97075.54 0.15%):  1.06x speedup
image-rgba         swfdec-giant-steps-0    12376.24 (12616.84 1.02%) -> 11705.30 (11825.20 1.06%):  1.06x speedup

CPU: Intel Core(TM)2 Duo CPU T9600@2.80GHz
OS: Ubuntu 10.04 (64bits)
Speedups
========
image-rgba                  ocitysmap-0    2671.46 (2691.82 8.55%) -> 2296.20 (2307.26 5.77%):  1.16x speedup
image-rgba         swfdec-giant-steps-0    1614.55 (1615.18 1.68%) -> 1532.84 (1538.52 0.72%):  1.05x speedup

Signed-off-by: Liu Xinyun <xinyun.liu@intel.com>
Signed-off-by: Chen Miaobo <miaobo.chen@intel.com>
2010-09-27 23:44:09 -04:00
Dmitri Vorobiev
56777f3f67 Use <sys/mman.h> macros only when they are available
Not all systems are regular Unices, so let's be careful with the
mmap()-related stuff, which might be unavailable. This patch makes
sure that mmap() and friends are used only when the <sys/mman.h>
header is found.
2010-09-23 16:02:29 -04:00
Søren Sandmann Pedersen
39524a4687 Revert "add enable-cache-prefetch option"
Revert this accidentally committed patch.

This reverts commit 19ea0e16b9.
2010-09-21 14:20:43 -04:00
Søren Sandmann Pedersen
e97da21049 If MAP_ANONYMOUS is not defined, define it to MAP_ANON.
This hopefully fixes the build failure on OS X.
2010-09-21 14:12:00 -04:00
Liu Xinyun
19ea0e16b9 add enable-cache-prefetch option
OK, here is the work to remove all cache prefetch code. Please review it. Thanks.

On Tue, Sep 21, 2010 at 11:36:30PM +0800, Soeren Sandmann wrote:
> Liu Xinyun <xinyun.liu@intel.com> writes:
>
> >    This patch is to add a new configuration option: enable-cache-prefetch,
> > which is default yes.
> >
> >    Here is a link which talks on cache issue.
> >    http://lists.freedesktop.org/archives/pixman/2010-June/000218.html
> >
> >    When disable it on Atom CPU(configured with --enable-cache-prefetch=no),
> > it will have a little performance gain. Here is the patch.
>
> I think the cache prefetch code should just be deleted outright. No
> benchmarks that I'm aware of show it to be an improvement.
>
>
> Thanks,
> Soren

>From bca2192ef524bcae4eea84d0ffed9e8c4855675f Mon Sep 17 00:00:00 2001
From: Liu Xinyun <xinyun.liu@intel.com>
Date: Wed, 22 Sep 2010 00:11:56 +0800
Subject: [PATCH] remove cache prefetch
2010-09-21 12:35:51 -04:00
Søren Sandmann Pedersen
edd1733966 Post-release version bump to 0.19.5 2010-09-21 10:18:44 -04:00
Søren Sandmann Pedersen
e5b3a6e710 Pre-release version bump to 0.19.4 2010-09-21 10:11:34 -04:00
Søren Sandmann Pedersen
0742ba4164 compute_composite_region32: Zero extents before returning FALSE.
If the extents of the composite region are broken such that x2 <= x1
or y2 <= y1, then we need to zero the extents before returning so that
the region won't be completely broken when calling
pixman_region32_fini().
2010-09-21 10:05:52 -04:00
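
The check described in this commit can be sketched as below. box_t and validate_extents() are hypothetical stand-ins, not pixman's region types or the body of compute_composite_region32; the sketch only shows the "zero the box before reporting failure" pattern.

```c
typedef struct
{
    int x1, y1, x2, y2;
} box_t;

/* Returns 1 if the box is non-empty. Otherwise resets it to an
 * all-zero (empty but well-formed) box and returns 0, so that later
 * cleanup code never sees inverted coordinates. */
static int validate_extents (box_t *extents)
{
    if (extents->x2 <= extents->x1 || extents->y2 <= extents->y1)
    {
        extents->x1 = extents->y1 = 0;
        extents->x2 = extents->y2 = 0;
        return 0;
    }

    return 1;
}
```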
Jonathan Morton
7cd4f2fa20 Add a lowlevel blitter benchmark
This test is a modified version of Siarhei's compositor throughput
benchmark.  It's expanded with explicit reporting of memory bandwidth
consumption for the M-test, and with an additional 8x8-random test
intended to determine peak ops/sec capability.  There are also quite a
lot more operations tested for.
2010-09-21 08:50:18 -04:00
Dmitri Vorobiev
eab3a77877 Add noinline macro
This patch adds a noinline macro, which expands to compiler-dependent
keywords that tell the compiler to never inline a function.
2010-09-21 08:50:17 -04:00
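
A macro of this kind is commonly written along the following lines. This is a sketch of the general technique rather than the exact definition added by the commit; the set of compilers covered here is an assumption, and add_one() is a hypothetical function used only to show the macro in place.

```c
/* Expand noinline to whatever the compiler uses to forbid inlining.
 * On unknown compilers it expands to nothing. */
#if defined(__GNUC__)
#   define noinline __attribute__((noinline))
#elif defined(_MSC_VER)
#   define noinline __declspec(noinline)
#else
#   define noinline
#endif

/* Hypothetical function that must stay out of line, for example so
 * that a benchmark measures a real call. */
static noinline int add_one (int x)
{
    return x + 1;
}
```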
Dmitri Vorobiev
cab3261c0d Add gettime() routine to test utils
Impending benchmark code will need a function to get current time
in seconds, and this patch introduces such a routine. We try to use
the POSIX gettimeofday() function when available, and fall back to
clock() when not.
2010-09-21 08:50:17 -04:00
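
The fallback logic described above might look like the sketch below. HAVE_GETTIMEOFDAY is assumed here to be a macro set by the build system when gettimeofday() is available; the actual macro name and function body in the test utilities may differ.

```c
#include <time.h>

#ifdef HAVE_GETTIMEOFDAY
#include <sys/time.h>
#endif

/* Returns the current time in seconds as a double. */
static double gettime (void)
{
#ifdef HAVE_GETTIMEOFDAY
    struct timeval tv;

    gettimeofday (&tv, NULL);
    return (double) tv.tv_sec + (double) tv.tv_usec / 1000000.0;
#else
    /* Fallback: processor time used by the program, not wall time. */
    return (double) clock () / (double) CLOCKS_PER_SEC;
#endif
}
```

The two branches measure different things: gettimeofday() reports wall-clock time, while clock() reports CPU time, so the fallback under-reports time spent sleeping or waiting.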
Dmitri Vorobiev
fd3c87d460 Move aligned_malloc() to utils
The aligned_malloc() routine will be used in more than one test utility.
At least, a low-level blitter benchmark needs it. Therefore, let's make
this function a part of common test utilities code.
2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
f474783607 Enable bits_image_fetch_bilinear_affine_normal_r5g6b5 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
91521d30ab Enable bits_image_fetch_bilinear_affine_reflect_r5g6b5 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
372d7b954a Enable bits_image_fetch_bilinear_affine_none_r5g6b5 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
a826ae0e3a Enable bits_image_fetch_bilinear_affine_pad_r5g6b5 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
c5238bd180 Enable bits_image_fetch_bilinear_affine_normal_a8 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
d12daefcdb Enable bits_image_fetch_bilinear_affine_reflect_a8 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
9388be3293 Enable bits_image_fetch_bilinear_affine_none_a8 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
8e4d4e8d11 Enable bits_image_fetch_bilinear_affine_pad_a8 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
ce1f6c50b4 Enable bits_image_fetch_bilinear_affine_normal_x8r8g8b8 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
83f2ee3e95 Enable bits_image_fetch_bilinear_affine_reflect_x8r8g8b8 2010-09-21 08:50:17 -04:00
Søren Sandmann Pedersen
be37ae331c Enable bits_image_fetch_bilinear_affine_none_x8r8g8b8 2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen
5f8a9bebc0 Enable bits_image_fetch_bilinear_affine_pad_x8r8g8b8 2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen
c59584cb86 Enable bits_image_fetch_bilinear_affine_normal_a8r8g8b8 2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen
2292cff304 Enable bits_image_fetch_bilinear_affine_reflect_a8r8g8b8 2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen
8b29162693 Enable bits_image_fetch_bilinear_affine_none_a8r8g8b8 2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen
e8555874e1 Enable bits_image_fetch_bilinear_affine_pad_a8r8g8b8 2010-09-21 08:50:16 -04:00
Søren Sandmann Pedersen
f9778c15e9 Use a macro to generate some {a,x}8r8g8b8, a8, and r5g6b5 bilinear fetchers.
There are versions for all combinations of the x8r8g8b8, a8r8g8b8, a8,
and r5g6b5 formats and the pad/reflect/none/normal repeat modes. The
bulk of each scaler is an
inline function that takes a format and a repeat mode as parameters.

The new scalers are all commented out, but the next commits will
enable them one at a time to facilitate bisecting.
2010-09-21 08:50:16 -04:00
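
The pattern of one inline worker plus a macro that stamps out specialized entry points can be sketched as follows. To stay short, this sketch fetches a single nearest pixel from one row instead of doing bilinear interpolation, and every name in it (format_t, repeat_t, apply_repeat(), fetch_pixel_general(), MAKE_FETCHER) is hypothetical rather than taken from pixman.

```c
#include <stdint.h>

typedef enum { FORMAT_A8R8G8B8, FORMAT_X8R8G8B8 } format_t;
typedef enum { REPEAT_NONE, REPEAT_PAD, REPEAT_NORMAL, REPEAT_REFLECT } repeat_t;

/* Maps x into [0, size) according to the repeat mode.
 * Returns -1 for REPEAT_NONE when x is outside the image. */
static inline int apply_repeat (repeat_t mode, int x, int size)
{
    int period;

    switch (mode)
    {
    case REPEAT_PAD:
        return x < 0 ? 0 : (x >= size ? size - 1 : x);
    case REPEAT_NORMAL:
        x %= size;
        return x < 0 ? x + size : x;
    case REPEAT_REFLECT:
        period = 2 * size;
        x %= period;
        if (x < 0)
            x += period;
        return x < size ? x : period - 1 - x;
    default:
        return (x < 0 || x >= size) ? -1 : x;
    }
}

/* General worker. When format and repeat are compile-time constants at
 * the call site, the compiler can drop the branches that do not apply. */
static inline uint32_t fetch_pixel_general (const uint32_t *row, int width,
                                            int x, format_t format,
                                            repeat_t repeat)
{
    uint32_t pixel;

    x = apply_repeat (repeat, x, width);
    if (x < 0)
        return 0;                       /* transparent outside the image */

    pixel = row[x];
    if (format == FORMAT_X8R8G8B8)
        pixel |= 0xff000000u;           /* x8r8g8b8 is always opaque */

    return pixel;
}

/* Generates one specialized fetcher per (format, repeat) pair. */
#define MAKE_FETCHER(name, format, repeat)                              \
    static uint32_t fetch_ ## name (const uint32_t *row, int width,     \
                                    int x)                              \
    {                                                                   \
        return fetch_pixel_general (row, width, x, format, repeat);     \
    }

MAKE_FETCHER (pad_a8r8g8b8, FORMAT_A8R8G8B8, REPEAT_PAD)
MAKE_FETCHER (none_a8r8g8b8, FORMAT_A8R8G8B8, REPEAT_NONE)
MAKE_FETCHER (reflect_x8r8g8b8, FORMAT_X8R8G8B8, REPEAT_REFLECT)
MAKE_FETCHER (normal_x8r8g8b8, FORMAT_X8R8G8B8, REPEAT_NORMAL)
```

Generating each variant through a macro keeps the shared logic in one place while still giving every format and repeat-mode pair its own function, which is also what allows the variants to be enabled one commit at a time.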
Søren Sandmann Pedersen
6d1e10a8b5 test: Add affine-test
This test tests compositing with various affine transformations. It is
almost identical to scaling-test, except that it also applies a random
rotation in addition to the random scaling and translation.
2010-09-21 08:31:09 -04:00
Søren Sandmann Pedersen
4fa33537d7 analyze_extents: Fast path for non-transformed BITS images
Profiling various cairo traces showed that we were spending a lot of
time in analyze_extents and compute_sample_extents(). This was
especially bad for glyphs where all this computation was completely
unnecessary.

This patch adds a fast path for the case of non-transformed BITS
images. The result is approximately a 6% improvement on the
firefox-talos-gfx benchmark:

Before:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image            firefox-talos-gfx   13.797   13.848   0.20%    6/6

After:

[ # ]  backend                         test   min(s) median(s) stddev. count
[  0]    image            firefox-talos-gfx   12.946   13.018   0.39%    6/6
2010-09-21 08:31:09 -04:00
Søren Sandmann Pedersen
c97881fe3c Move some of the FAST_PATH_COVERS_CLIP computation to pixman-image.c
When an image is solid or repeating, the FAST_PATH_COVERS_CLIP flag
can be set in compute_image_info().

Also the code that turned this flag off in pixman.c was not correct;
it didn't take transformations into account. With this patch, pixman.c
doesn't set the flag by default, but instead relies on the call to
compute_sample_extents() to set it when possible.
2010-09-21 08:31:09 -04:00
Tor Lillqvist
3411f9399c Support __thread on MINGW 4.5
By the way, it seems that with gcc 4.5.0 from mingw.org, __thread, sse
and mmx work fine.

I added the below to pixman 0.18 and as far as I can see, it works.
make check reports no problems. (Earlier I had to use --disable-mmx
and --disable-sse2.) Also gtk-demo and gimp run fine.

(Also a change to get rid of the warnings about -fvisibility being ignored.)
2010-09-21 08:31:08 -04:00
Søren Sandmann Pedersen
add0fd1bac Clip composite region against the destination alpha map extents.
Otherwise we can end up writing outside the alpha map.
2010-09-21 08:31:08 -04:00