Commit Graph

2610 Commits

Author SHA1 Message Date
Pekka Paalanen
be49f929b6 lowlevel-blt-bench: use the test pattern parser
Let lowlevel-blt-bench parse the test name string from the command line,
allowing to run almost infinitely more tests. One is no longer limited
to the tests listed in the big table.

While you can use the old short-hand names like src_8888_8888, you can
also use all possible operators now, and specify pixel formats exactly
rather than just x888, for instance.

This even allows to run crazy patterns like
conjoint_over_reverse_a8b8g8r8_n_r8g8b8x8.

All individual patterns are now interpreted through the parser. The
pattern "all" runs the same old default test set as before but through
the parser instead of the hard-coded parameters.

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-04-15 12:43:01 +03:00
Pekka Paalanen
5b27912108 lowlevel-blt-bench: add test name parser and self-test
This patch is inspired by "lowlevel-blt-bench: Parse test name strings in
general case" by Ben Avison. From Ben's commit message:

"There are many types of composite operation that are useful to benchmark
but which are omitted from the table. Continually having to add extra
entries to the table is a nuisance and is prone to human error, so this
patch adds the ability to break down unknow strings of the format
  <operation>_<src>[_<mask]_<dst>[_ca]
where bitmap formats are specified by number of bits of each component
(assumed in ARGB order) or 'n' to indicate a solid source or mask."

Add the parser to lowlevel-blt-bench.c, but do not hook it up to the
command line just yet. Instead, make it run a self-test.

As we now dynamically parse strings similar to the test names in the
huge table 'tests_tbl', we should make sure we can parse the old
well-known test names and produce exactly the same test parameters. The
self-test goes through this old table and verifies the parsing results.

Unfortunately the old table is not exactly consistent, it contains some
special cases that cannot be produced by the parsing rules. Whether
these special cases are intentional or just an oversight is not always
clear. Anyway, add a small table to reproduce the special cases
verbatim.

If we wanted, we could remove the big old table in a follow-up commit,
but then we would also lose the parser self-test.

The point of this whole excercise to let lowlevel-blt-bench recognize
novel test patterns in the future, following exactly the conventions
used in the old table.

Ben, from what I see, this parser has one major difference to what you
wrote. For a solid mask, your parser uses a8r8g8b8 format, while mine
uses a8 which comes from the old table.

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-04-15 12:42:51 +03:00
Pekka Paalanen
1f45bd6565 test/utils: add format aliases used by lowlevel-blt-bench
Lowlevel-blt-bench uses several pixel format shorthands. Pick them from
the great table in lowlevel-blt-bench.c and add them here so that
format_from_string() can recognize them.

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-04-15 12:42:45 +03:00
Pekka Paalanen
ef9c28a0e4 test/utils: add operator aliases for lowlevel-blt-bench
Lowlevel-blt-bench uses the operator alias "outrev". Add an alias for it
in the operator-name table.

Also add aliases for overrev, inrev and atoprev, so that
lowlevel-blt-bench can later recognize them for new test cases.

The aliases are added such, that an operator to name lookup will never
return them; it returns the proper names instead.

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-04-15 12:42:40 +03:00
Pekka Paalanen
f1f6cc23ce test/utils: support format name aliases
Previously there was a flat list of formats, used to iterate over all
formats when looking up a format from name or listing them. This cannot
support name aliases.

To support name aliases (multiple name strings mapping to the same
format), create a format-name mapping table. Functions format_name(),
format_from_string(), and list_formats() should keep on working exactly
like before, except format_from_string() now recognizes the additional
formats that format_name() already supported.

The only the formats from the old format list are added with ENTRY, so
that list_formats() works as before. The whole list is verified against
the authoritative list in pixman.h, entries missing from the old list
are commented out.

The extra formats supported by the old format_name() are added as
ALIASes. A side-effect of that is that now also format_from_string()
recognizes the following new names: x4c4 / c8, x4g4 / g8, c4, g4, g1,
yuy2, yv12, null, solid, pixbuf, rpixbuf, unknown.

Name aliases will be useful in follow-up patches, where
lowlevel-blt-bench.c is converted to parse short-hand format names from
strings.

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-04-15 12:42:33 +03:00
Pekka Paalanen
2c5fac9320 test/utils: support operator name aliases
Previously there was a flat list of operators (pixman_op_t), used to
iterate over all operators when looking up an operator from name or
listing them. This cannot support name aliases.

To support name aliases (multiple name strings mapping to the same
operator), create an operator-name mapping table. Functions
operator_name, operator_from_string, and list_operators should keep on
working exactly like before, except operator_from_string now recognizes
a few aliases too.

Name aliases will be useful in follow-up patches, where
lowlevel-blt-bench.c is converted to parse operator names from strings.
Lowlevel-blt-bench uses shorthand names instead of the usual names. This
change allows lowlevel-blt-bench.s to use operator_from_string in the
future.

Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
2015-04-15 12:41:47 +03:00
Ben Avison
f122907dc1 test: Move format and operator string functions to utils.[ch]
This permits format_from_string(), list_formats(), list_operators() and
operator_from_string() to be used from tests other than check-formats.

Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-04-13 10:11:51 +03:00
Ben Avison
9bc025f7cd pixman.c: Coding style
A few violations of coding style were identified in code copied from here
into affine-bench.

Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
2015-04-09 12:04:55 +03:00
Ben Avison
978dd9fc65 armv6: Fix typo in preload macro
Missing "lsl" meant that cases with a 32-bit source and/or mask, and an
8-bit destination, the code would not assemble.
2015-04-01 18:38:36 -07:00
Siarhei Siamashka
594e6a6c93 mmx: Fix _mm_empty problems for over_8888_8888/over_8888_n_8888
Using "--disable-sse2 --disable-ssse3" configure options and
CFLAGS="-m32 -O2 -g" on an x86 system results in pixman "make check"
failures:

    ../test-driver: line 95: 29874 Aborted
    FAIL: affine-test
    ../test-driver: line 95: 29887 Aborted
    FAIL: scaling-test

One _mm_empty () was missing and another one is needed to workaround
an old GCC bug https://gcc.gnu.org/PR47759 (GCC may move MMX instructions
around and cause test suite failures).

Reviewed-by: Matt Turner <mattst88@gmail.com>
2014-10-24 14:25:30 -07:00
Søren Sandmann Pedersen
a8669137b9 Fix comment about BILINEAR_INTERPOLATION_BITS to say < 8 rather than <= 8
Since a4c79d695d the constant
BILINEAR_INTERPOLATION_BITS must be strictly less than 8, so fix the
comment to say this, and also add a COMPILE_TIME_ASSERT in the
bilinear fetcher in pixman-fast-path.c
2014-10-05 12:42:47 -07:00
Matt Turner
f078727f39 mmx: Add nearest over_8888_8888
lowlevel-blt-bench -n, over_8888_8888, 15 iterations on Loongson 2f:

           Before          After
          Mean StdDev     Mean StdDev   Change
    L1    15.8   0.02     24.0   0.06   +52.0%
    L2    14.8   0.15     23.3   0.13   +56.9%
    M     10.3   0.01     13.8   0.03   +33.6%
    HT    10.0   0.02     14.5   0.05   +44.7%
    VT     9.7   0.02     13.5   0.04   +39.2%
    R      9.1   0.01     12.2   0.04   +34.4%
    RT     7.1   0.06      8.9   0.09   +25.2%
2014-09-05 00:22:07 -07:00
Matt Turner
f868ff5e34 mmx: Add nearest over_8888_n_8888
lowlevel-blt-bench -n, over_8888_n_8888, 15 iterations on Loongson 2f:

           Before          After
          Mean StdDev     Mean StdDev   Change
    L1     9.7   0.01     19.2   0.02   +98.2%
    L2     9.6   0.11     19.2   0.16   +99.5%
    M      7.3   0.02     12.5   0.01   +72.0%
    HT     6.6   0.01     13.4   0.02  +103.2%
    VT     6.4   0.01     12.6   0.03   +96.1%
    R      6.3   0.01     11.2   0.01   +76.5%
    RT     4.4   0.01      8.1   0.03   +82.6%
2014-09-05 00:22:04 -07:00
Julien Cristau
b483955605 Upload to unstable 2014-08-23 22:16:47 -07:00
intrigeri
c03d98f8d1 Enable hardening build flags with dpkg-buildflags.
All default dpkg-buildflags, plus the bonus bindnow one, are used.
The last available one (PIE) is not applicable to shared libraries.
2014-08-23 22:16:13 -07:00
Cyril Brulebois
b16d4c7ed7 Upload to unstable 2014-08-18 22:52:38 +02:00
Julien Cristau
f9c2d54a62 Disable vmx on ppc64el (closes: #745547).
Thanks, Breno Leitao!
2014-07-24 22:43:15 +02:00
Julien Cristau
cd23302b1a Upload to unstable 2014-07-13 16:31:09 +02:00
Julien Cristau
98eadfa08b Remove Cyril from Uploaders. 2014-07-13 16:31:02 +02:00
Julien Cristau
9e8362a51f Bump debhelper compat level to 9. 2014-07-13 16:24:29 +02:00
Julien Cristau
6a7a144be1 Bump changelogs 2014-07-13 16:22:42 +02:00
Julien Cristau
fd99d1a9c8 pixman 0.32.6 release
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.13 (GNU/Linux)
 
 iQEcBAABAgAGBQJTuIYBAAoJEIWlZJw4kjNuHA8H/0wBuk7d/7uqCAfqyQ3o5Qs9
 q00UvsZVCymC6f1Hh+bgQGtMHgy2Wo1gvw/usSoxlxqc+T4wWeN912RPZwvprVzn
 v9+J7UyjLH28yUVq9NBn91LqHEWfzLK8gf3Y3i3IIQNd9YtIkqjPMyKDuTaQVUYc
 Op6vzXzjzwf0lKjTTZOWsnm9Zh6vvFoqVOajS6hSvA20/xczknAbU3HfUIBI+G4o
 6/br7A6OpIB08vFAJd1XJpAkrHjjIJCECg3wxsfxuCYcoRSWhUPoul4IEkHXn4p4
 mTKjTzBxuDM85FAadTT7PxygABelcQljMlJPKwY4rJwz5t8/yFLc5h5WXft2laI=
 =z2mk
 -----END PGP SIGNATURE-----

Merge tag 'pixman-0.32.6' into debian-unstable

pixman 0.32.6 release
2014-07-13 16:21:47 +02:00
Søren Sandmann Pedersen
87eea99e44 Pre-release version bump to 0.32.6 2014-07-05 18:55:43 -04:00
Siarhei Siamashka
9f18ea3483 configure.ac: Check if the compiler supports GCC vector extensions
The Intel Compiler 14.0.0 claims version GCC 4.7.3 compatibility
via __GNUC__/__GNUC__MINOR__ macros, but does not provide the same
level of GCC vector extensions support as the original GCC compiler:
    http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html

Which results in the following compilation failure:

In file included from ../test/utils.h(7),
                 from ../test/utils.c(3):
../test/utils-prng.h(138): error: expression must have integral type
      uint32x4 e = x->a - ((x->b << 27) + (x->b >> (32 - 27)));
                            ^

The problem is fixed by doing a special check in configure for
this feature.
2014-07-04 20:52:59 -04:00
Søren Sandmann
50d7b5fa8e create_bits(): Cast the result of height * stride to size_t
In create_bits() both height and stride are ints, so the result is
also an int, which will overflow if height or stride are big enough
and size_t is bigger than int.

This patch simply casts height to size_t to prevent these overflows,
which prevents the crash in:

    https://bugzilla.redhat.com/show_bug.cgi?id=972647

It's not even close to fixing the full problem of supporting big
images in pixman.

See also

    https://bugs.freedesktop.org/show_bug.cgi?id=69014
2014-07-04 20:50:58 -04:00
Nemanja Lukic
6d2cf40166 MIPS: Fix exported symbols in public API. 2014-07-03 13:35:21 -04:00
Nemanja Lukic
c42824ebb5 MIPS: Fix exported symbols in public API. 2014-07-03 13:34:53 -04:00
Søren Sandmann Pedersen
5a2edb3f2c test: Rearrange tests in order of increasing runtime
Making short tests run first is convenient to catch obvious bugs
early.
2014-06-28 19:24:27 -04:00
Søren Sandmann Pedersen
9cd283b2eb pixman-gradient-walker: Make left_x and right_x 64 bit variables
The variables left_x, and right_x in gradient_walker_reset() are
computed from pos, which is a 64 bit quantity, so to avoid overflows,
these variables must be 64 bit as well.

Similarly, the left_x and right_x that are stored in
pixman_gradient_walker_t need to be 64 bit as well; otherwise,
pixman_gradient_walker_pixel() will call reset too often.

This fixes the radial-invalid test, which was generating 'invalid'
floating point exceptions when the overflows caused color values to be
outside of [0, 255].
2014-05-15 13:29:58 -04:00
Søren Sandmann Pedersen
f5f5dbbbc6 test: Add radial-invalid test program
This program demonstrates a bug in gradient walker, where some integer
overflows cause colors outside the range [0, 255] to be generated,
which in turns cause 'invalid' floating point exceptions when those
colors are converted to uint8_t.

The bug was first reported by Owen Taylor on the #cairo IRC channel.
2014-05-15 13:29:38 -04:00
Ben Avison
91f32ce961 ARMv6: Add fast path for src_x888_0565
Benchmark results, "before" is upstream/master
5f661ee719, and "after" contains this
patch on top.

lowlevel-blt-bench, src_8888_0565, 100 iterations:

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    25.9   0.20    115.6   0.70    100.00%    +347.1%
L2    14.4   0.23     52.7   3.48    100.00%    +265.0%
M     14.1   0.01     79.8   0.17    100.00%    +465.9%
HT    10.2   0.03     32.9   0.31    100.00%    +221.2%
VT     9.8   0.03     29.8   0.25    100.00%    +203.4%
R      9.4   0.03     27.8   0.18    100.00%    +194.7%
RT     4.6   0.04     10.9   0.29    100.00%    +135.9%

At most 19 outliers rejected per test per set.

cairo-perf-trace with trimmed traces results were indifferent.

A system-wide perf_3.10 profile on Raspbian shows significant
differences in the X server CPU usage. The following were measured from
a 130x62 char lxterminal running 'dmesg' every 0.5 seconds for roughly
30 seconds. These profiles are libpixman.so symbols only.

Before:

Samples: 63K of event 'cpu-clock', Event count (approx.): 2941348112, DSO: libpixman-1.so.0.33.1
 37.77%  Xorg  [.] fast_fetch_r5g6b5
 14.39%  Xorg  [.] pixman_composite_over_n_8_8888_asm_armv6
  8.51%  Xorg  [.] fast_write_back_r5g6b5
  7.38%  Xorg  [.] pixman_composite_src_8888_8888_asm_armv6
  4.39%  Xorg  [.] pixman_composite_add_8_8_asm_armv6
  3.69%  Xorg  [.] pixman_composite_src_n_8888_asm_armv6
  2.53%  Xorg  [.] _pixman_image_validate
  2.35%  Xorg  [.] pixman_image_composite32

After:

Samples: 31K of event 'cpu-clock', Event count (approx.): 3619782704, DSO: libpixman-1.so.0.33.1
 22.36%  Xorg  [.] pixman_composite_over_n_8_8888_asm_armv6
 13.59%  Xorg  [.] pixman_composite_src_x888_0565_asm_armv6
 12.75%  Xorg  [.] pixman_composite_src_8888_8888_asm_armv6
  6.79%  Xorg  [.] pixman_composite_add_8_8_asm_armv6
  5.95%  Xorg  [.] pixman_composite_src_n_8888_asm_armv6
  4.12%  Xorg  [.] pixman_image_composite32
  3.69%  Xorg  [.] _pixman_image_validate
  3.65%  Xorg  [.] _pixman_bits_image_setup_accessors

Before, fast_fetch_r5g6b5 + fast_write_back_r5g6b5 took 46% of the
samples in libpixman, and probably incurred some memcpy() load, too.
After, pixman_composite_src_x888_0565_asm_armv6 takes 14%. Note, that
the sample counts are very different before/after, as less time is spent
in Pixman and running time is not exactly the same.

Furthermore, in the above test, the CPU idle function was sampled 9%
before, and 15% after.

v4, Pekka Paalanen <pekka.paalanen@collabora.co.uk> :
	Re-benchmarked on Raspberry Pi, commit message.
2014-05-01 15:11:42 -04:00
Pekka Paalanen
5f661ee719 ARM: use pixman_asm_function in internal headers
The two ARM headers contained open-coded copies of pixman_asm_function,
replace these.

Since it seems customary that ARM headers do not use CPP include guards,
rely on the .S files to #include "pixman-arm-asm.h" first. They all
do now.

v2: Fix a build failure on rpi by adding one #include.
2014-04-21 20:38:09 -04:00
Ben Avison
ab587b444c ARMv6: Add fast path for in_reverse_8888_8888
Benchmark results, "before" is the patch
* upstream/master 4b76bbfda6
+ ARMv6: Support for very variable-hungry composite operations
+ ARMv6: Add fast path for over_n_8888_8888_ca
and "after" contains the additional patches on top:
+ ARMv6: Add fast path flag to force no preload of destination buffer
+ ARMv6: Add fast path for in_reverse_8888_8888 (this patch)

lowlevel-blt-bench, in_reverse_8888_8888, 100 iterations:

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    21.1   0.07     32.3   0.08    100.00%     +52.9%
L2    11.6   0.29     18.0   0.52    100.00%     +54.4%
M     10.5   0.01     16.1   0.03    100.00%     +54.1%
HT     8.2   0.02     12.0   0.04    100.00%     +45.9%
VT     8.1   0.02     11.7   0.04    100.00%     +44.5%
R      8.1   0.02     11.3   0.04    100.00%     +39.7%
RT     4.8   0.04      6.1   0.09    100.00%     +27.3%

At most 12 outliers rejected per test per set.

cairo-perf-trace with trimmed traces, 30 iterations:

                                    Before          After
                                   Mean StdDev     Mean StdDev   Confidence   Change
t-firefox-paintball.trace          18.0   0.01     14.1   0.01    100.00%     +27.4%
t-firefox-chalkboard.trace         36.7   0.03     36.0   0.02    100.00%      +1.9%
t-firefox-canvas-alpha.trace       20.7   0.22     20.3   0.22    100.00%      +1.9%
t-swfdec-youtube.trace              7.8   0.03      7.8   0.03    100.00%      +0.9%
t-firefox-talos-gfx.trace          25.8   0.44     25.6   0.29     93.87%      +0.7%  (insignificant)
t-firefox-talos-svg.trace          20.6   0.04     20.6   0.03    100.00%      +0.2%
t-firefox-fishbowl.trace           21.2   0.04     21.1   0.02    100.00%      +0.2%
t-xfce4-terminal-a1.trace           4.8   0.01      4.8   0.01     98.85%      +0.2%  (insignificant)
t-swfdec-giant-steps.trace         14.9   0.03     14.9   0.02     99.99%      +0.2%
t-poppler-reseau.trace             22.4   0.11     22.4   0.08     86.52%      +0.2%  (insignificant)
t-gnome-system-monitor.trace       17.3   0.03     17.2   0.03     99.74%      +0.2%
t-firefox-scrolling.trace          24.8   0.12     24.8   0.11     70.15%      +0.1%  (insignificant)
t-firefox-particles.trace          27.5   0.18     27.5   0.21     48.33%      +0.1%  (insignificant)
t-grads-heat-map.trace              4.4   0.04      4.4   0.04     16.61%      +0.0%  (insignificant)
t-firefox-fishtank.trace           13.2   0.01     13.2   0.01      7.64%      +0.0%  (insignificant)
t-firefox-canvas.trace             18.0   0.05     18.0   0.05      1.31%      -0.0%  (insignificant)
t-midori-zoomed.trace               8.0   0.01      8.0   0.01     78.22%      -0.0%  (insignificant)
t-firefox-planet-gnome.trace       10.9   0.02     10.9   0.02     64.81%      -0.0%  (insignificant)
t-gvim.trace                       33.2   0.21     33.2   0.18     38.61%      -0.1%  (insignificant)
t-firefox-canvas-swscroll.trace    32.2   0.09     32.2   0.11     73.17%      -0.1%  (insignificant)
t-firefox-asteroids.trace          11.1   0.01     11.1   0.01    100.00%      -0.2%
t-evolution.trace                  13.0   0.05     13.0   0.05     91.99%      -0.2%  (insignificant)
t-gnome-terminal-vim.trace         19.9   0.14     20.0   0.14     97.38%      -0.4%  (insignificant)
t-poppler.trace                     9.8   0.06      9.8   0.04     99.91%      -0.5%
t-chromium-tabs.trace               4.9   0.02      4.9   0.02    100.00%      -0.6%

At most 6 outliers rejected per test per set.

Cairo perf reports the running time, but the change is computed for
operations per second instead (inverse of running time).

Confidence is based on Welch's t-test. Absolute changes less than 1%
can be accounted as measurement errors, even if statistically
significant.

There was a question of why FLAG_NO_PRELOAD_DST is used. It makes
lowlevel-blt-bench results worse except for L1, but improves some
Cairo trace benchmarks.

"Ben Avison" <bavison@riscosopen.org> wrote:

> The thing with the lowlevel-blt-bench benchmarks for the more
> sophisticated composite types (as a general rule, anything that involves
> branches at the per-pixel level) is that they are only profiling the case
> where you have mid-level alpha values in the source/mask/destination.
> Real-world images typically have a disproportionate number of fully
> opaque and fully transparent pixels, which is why when there's a
> discrepancy between which implementation performs best with cairo-perf
> trace versus lowlevel-blt-bench, I usually favour the Cairo winner.
>
> The results of removing FLAG_NO_PRELOAD_DST (in other words, adding
> preload of the destination buffer) are easy to explain in the
> lowlevel-blt-bench results. In the L1 case, the destination buffer is
> already in the L1 cache, so adding the preloads is simply adding extra
> instruction cycles that have no effect on memory operations. The "in"
> compositing operator depends upon the alpha of both source and
> destination, so if you use uniform mid-alpha, then you actually do need
> to read your destination pixels, so you benefit from preloading them. But
> for fully opaque or fully transparent source pixels, you don't need to
> read the corresponding destination pixel - it'll either be left alone or
> overwritten. Since the ARM11 doesn't use write-allocate cacheing, both of
> these cases avoid both the time taken to load the extra cachelines, as
> well as increasing the efficiency of the cache for other data. If you
> examine the source images being used by the Cairo test, you'll probably
> find they mostly use transparent or opaque pixels.

v4, Pekka Paalanen <pekka.paalanen@collabora.co.uk> :
	Rebased, re-benchmarked on Raspberry Pi, commit message.

v5, Pekka Paalanen <pekka.paalanen@collabora.co.uk> :
	Rebased, re-benchmarked on Raspberry Pi due to a fix to
	"ARMv6: Add fast path for over_n_8888_8888_ca" patch.
2014-04-21 20:34:26 -04:00
Ben Avison
68d2f7b486 ARMv6: Add fast path flag to force no preload of destination buffer 2014-04-21 20:34:26 -04:00
Ben Avison
4ad769cbec ARMv6: Add fast path for over_n_8888_8888_ca
Benchmark results, "before" is
* upstream/master 4b76bbfda6
"after" contains the additional patches on top:
+ ARMv6: Support for very variable-hungry composite operations
+ ARMv6: Add fast path for over_n_8888_8888_ca (this patch)

lowlevel-blt-bench, over_n_8888_8888_ca, 100 iterations:

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1     2.7   0.00     16.1   0.06    100.00%    +500.7%
L2     2.4   0.01     14.1   0.15    100.00%    +489.9%
M      2.3   0.00     14.3   0.01    100.00%    +510.2%
HT     2.2   0.00      9.7   0.03    100.00%    +345.0%
VT     2.2   0.00      9.4   0.02    100.00%    +333.4%
R      2.2   0.01      9.5   0.03    100.00%    +331.6%
RT     1.9   0.01      5.5   0.07    100.00%    +192.7%

At most 1 outliers rejected per test per set.

cairo-perf-trace with trimmed traces, 30 iterations:

                                    Before          After
                                   Mean StdDev     Mean StdDev   Confidence   Change
t-firefox-talos-gfx.trace          33.1   0.42     25.8   0.44    100.00%     +28.6%
t-firefox-scrolling.trace          31.4   0.11     24.8   0.12    100.00%     +26.3%
t-gnome-terminal-vim.trace         22.4   0.10     19.9   0.14    100.00%     +12.5%
t-evolution.trace                  13.9   0.07     13.0   0.05    100.00%      +6.5%
t-firefox-planet-gnome.trace       11.6   0.02     10.9   0.02    100.00%      +6.5%
t-gvim.trace                       34.0   0.21     33.2   0.21    100.00%      +2.4%
t-chromium-tabs.trace               4.9   0.02      4.9   0.02    100.00%      +1.0%
t-poppler.trace                     9.8   0.05      9.8   0.06    100.00%      +0.7%
t-firefox-canvas-swscroll.trace    32.3   0.10     32.2   0.09    100.00%      +0.4%
t-firefox-paintball.trace          18.1   0.01     18.0   0.01    100.00%      +0.3%
t-poppler-reseau.trace             22.5   0.09     22.4   0.11     99.29%      +0.3%
t-firefox-canvas.trace             18.1   0.06     18.0   0.05     99.29%      +0.2%
t-xfce4-terminal-a1.trace           4.8   0.01      4.8   0.01     99.77%      +0.2%
t-firefox-fishbowl.trace           21.2   0.03     21.2   0.04    100.00%      +0.2%
t-gnome-system-monitor.trace       17.3   0.03     17.3   0.03     99.54%      +0.1%
t-firefox-asteroids.trace          11.1   0.01     11.1   0.01    100.00%      +0.1%
t-midori-zoomed.trace               8.0   0.01      8.0   0.01     99.98%      +0.1%
t-grads-heat-map.trace              4.4   0.04      4.4   0.04     34.08%      +0.1%  (insignificant)
t-firefox-talos-svg.trace          20.6   0.03     20.6   0.04     54.06%      +0.0%  (insignificant)
t-firefox-fishtank.trace           13.2   0.01     13.2   0.01     52.81%      -0.0%  (insignificant)
t-swfdec-giant-steps.trace         14.9   0.02     14.9   0.03     85.50%      -0.1%  (insignificant)
t-firefox-chalkboard.trace         36.6   0.02     36.7   0.03    100.00%      -0.2%
t-firefox-canvas-alpha.trace       20.7   0.32     20.7   0.22     55.76%      -0.3%  (insignificant)
t-swfdec-youtube.trace              7.8   0.02      7.8   0.03    100.00%      -0.5%
t-firefox-particles.trace          27.4   0.16     27.5   0.18     99.94%      -0.6%

At most 4 outliers rejected per test per set.

Cairo perf reports the running time, but the change is computed for
operations per second instead (inverse of running time).

Confidence is based on Welch's t-test. Absolute changes less than 1%
can be accounted as measurement errors, even if statistically
significant.

v4, Pekka Paalanen <pekka.paalanen@collabora.co.uk> :
	Use pixman_asm_function instead of startfunc.
	Rebased. Re-benchmarked on Raspberry Pi.
	Commit message.

v5, Ben Avison <bavison@riscosopen.org> :
	Fixed the bug exposed in blitters-test 4928372.
	15 hours of testing, compared to the 45 minutes to hit
	the bug originally.
    Pekka Paalanen <pekka.paalanen@collabora.co.uk> :
	Squash the fix, re-benchmark on Raspberry Pi.
2014-04-21 20:34:26 -04:00
Ben Avison
73d2f8b61a ARMv6: Support for very variable-hungry composite operations
Previously, the variable ARGS_STACK_OFFSET was available to extract values
from function arguments during the init macro. Now this changes dynamically
around stack operations in the function as a whole so that arguments can be
accessed at any point. It is also joined by LOCALS_STACK_OFFSET, which
allows access to space reserved on the stack during the init macro.

On top of this, composite macros now have the option of using all of WK0-WK3
registers rather than just the subset it was told to use; this requires the
pixel count to be spilled to the stack over the leading pixels at the start
of each line. Thus, at best, each composite operation can use 11 registers,
plus any pointer registers not required for the composite type, plus as much
stack space as it needs, divided up into constants and variables as necessary.
2014-04-21 20:34:26 -04:00
Søren Sandmann
857e40f3d2 create_bits(): Cast the result of height * stride to size_t
In create_bits() both height and stride are ints, so the result is
also an int, which will overflow if height or stride are big enough
and size_t is bigger than int.

This patch simply casts height to size_t to prevent these overflows,
which prevents the crash in:

    https://bugzilla.redhat.com/show_bug.cgi?id=972647

It's not even close to fixing the full problem of supporting big
images in pixman.

See also

    https://bugs.freedesktop.org/show_bug.cgi?id=69014
2014-04-15 14:21:14 -04:00
Pekka Paalanen
4b76bbfda6 ARM: share pixman_asm_function definition
Several files define identically the asm macro pixman_asm_function.
Merge all these definitions into a new asm header.

The original definition is taken from pixman-arm-simd-asm-scaled.S with
the copyright/licence/author blurb verbatim.
2014-04-02 12:48:26 +03:00
Ben Avison
4ee85b0083 ARMv6: Add fast path for over_reverse_n_8888
Benchmark results, "before" is upstream commit
c343846 lowlevel-blt-bench: add in_reverse_8888_8888 test
and "after" is with this patch only added on top.

lowlevel-blt-bench, over_reverse_n_8888, 100 iterations:

       Before          After
      Mean StdDev     Mean StdDev   Confidence   Change
L1    15.1    0.1    274.5    2.3    100.00%   +1718.9%
L2    12.8    0.3    181.8    0.7    100.00%   +1315.5%
M     10.8    0.0     77.9    0.0    100.00%    +621.2%
HT     9.7    0.0     29.4    0.2    100.00%    +204.9%
VT     9.5    0.0     26.7    0.1    100.00%    +179.3%
R      9.3    0.0     25.3    0.1    100.00%    +173.6%
RT     6.0    0.1     11.0    0.2    100.00%     +82.9%

At most 16 outliers rejected per case per set.

cairo-perf-trace with trimmed traces, 30 iterations:

                                    Before          After
                                   Mean StdDev     Mean StdDev   Confidence   Change
t-poppler.trace                    12.9    0.1      9.7    0.0    100.00%     +32.6%
t-firefox-talos-gfx.trace          33.2    0.7     32.9    0.4     95.23%      +0.9%  (insignificant)
t-firefox-particles.trace          27.4    0.1     27.3    0.2     99.65%      +0.4%
t-firefox-canvas-alpha.trace       20.5    0.3     20.5    0.3     57.51%      +0.3%  (insignificant)
t-poppler-reseau.trace             22.4    0.1     22.4    0.1     95.69%      +0.3%  (insignificant)
t-firefox-fishtank.trace           13.2    0.0     13.2    0.0     99.84%      +0.1%
t-swfdec-giant-steps.trace         14.9    0.0     14.9    0.0     87.68%      +0.1%  (insignificant)
t-swfdec-youtube.trace              7.8    0.0      7.8    0.0     35.22%      +0.1%  (insignificant)
t-firefox-planet-gnome.trace       11.5    0.0     11.5    0.0     29.37%      +0.0%  (insignificant)
t-firefox-fishbowl.trace           21.2    0.0     21.2    0.0     18.09%      +0.0%  (insignificant)
t-grads-heat-map.trace              4.4    0.0      4.4    0.0      1.84%      +0.0%  (insignificant)
t-firefox-paintball.trace          18.0    0.0     18.0    0.0     33.43%      -0.0%  (insignificant)
t-firefox-talos-svg.trace          20.5    0.0     20.5    0.1     68.56%      -0.1%  (insignificant)
t-midori-zoomed.trace               8.0    0.0      8.0    0.0     99.98%      -0.1%
t-firefox-canvas-swscroll.trace    32.1    0.1     32.1    0.1     85.27%      -0.1%  (insignificant)
t-gnome-system-monitor.trace       17.2    0.0     17.2    0.0     99.97%      -0.2%
t-firefox-chalkboard.trace         36.5    0.0     36.6    0.0    100.00%      -0.2%
t-firefox-asteroids.trace          11.1    0.0     11.1    0.0    100.00%      -0.2%
t-firefox-canvas.trace             17.9    0.0     18.0    0.0    100.00%      -0.3%
t-chromium-tabs.trace               4.9    0.0      4.9    0.0     97.95%      -0.3%  (insignificant)
t-xfce4-terminal-a1.trace           4.8    0.0      4.8    0.0    100.00%      -0.4%
t-firefox-scrolling.trace          31.1    0.1     31.2    0.1    100.00%      -0.5%
t-evolution.trace                  13.7    0.1     13.8    0.1     99.99%      -0.6%
t-gnome-terminal-vim.trace         22.0    0.2     22.2    0.1     99.99%      -0.7%
t-gvim.trace                       33.2    0.2     33.5    0.2    100.00%      -0.8%

At most 6 outliers rejected per case per set.

Cairo perf reports the running time, but the change is computed for
operations per second instead (inverse of running time).

Changes in the order of +/- 1% can be accounted for measurement errors,
even if they are deemed to be statistically significant. This claim is
based on comparing two 30-iteration identical "before" runs using the
exact same binaries, and observing changes from -0.4% to +0.5% with
>=99% confidence.

Confidence is based on Welch's t-test.

v4, Pekka Paalanen <pekka.paalanen@collabora.co.uk> :
	Rebased, re-benchmarked on Raspberry Pi, commit message.
2014-04-02 12:46:24 +03:00
Siarhei Siamashka
56622140e3 test: Fix OpenMP clauses for the tolerance-test
Compiling with the Intel Compiler reveals a problem:

tolerance-test.c(350): error: index variable "i" of for statement following an OpenMP for pragma must be private
  #       pragma omp parallel for default(none) shared(i) private (result)
  ^

In addition to this, the 'result' variable also should not be private
(otherwise its value does not survive after the end of the loop). It
needs to be either shared or use the reduction clause to describe how
the results from multiple threads are combined together. Reduction
seems to be more appropriate here.
2014-04-02 12:46:09 +03:00
Siarhei Siamashka
840912b311 configure.ac: Check if the compiler supports GCC vector extensions
The Intel Compiler 14.0.0 claims version GCC 4.7.3 compatibility
via __GNUC__/__GNUC__MINOR__ macros, but does not provide the same
level of GCC vector extensions support as the original GCC compiler:
    http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html

Which results in the following compilation failure:

In file included from ../test/utils.h(7),
                 from ../test/utils.c(3):
../test/utils-prng.h(138): error: expression must have integral type
      uint32x4 e = x->a - ((x->b << 27) + (x->b >> (32 - 27)));
                            ^

The problem is fixed by doing a special check in configure for
this feature.
2014-04-02 12:46:04 +03:00
Ben Avison
c343846625 lowlevel-blt-bench: add in_reverse_8888_8888 test
in_reverse_8888_8888 is one of the more commonly used operations in the
cairo-perf-trace suite that hasn't been in lowlevel-blt-bench until now.

v4, Pekka Paalanen <pekka.paalanen@collabora.co.uk> :
	Split from "Add extra test to lowlevel-blt-bench and fix an
	existing one", new summary.
2014-03-20 08:33:05 -04:00
Ben Avison
898859f3d3 lowlevel-blt-bench: over_reverse_n_8888 needs solid source
v4, Pekka Paalanen <pekka.paalanen@collabora.co.uk> :
	Split from "Add extra test to lowlevel-blt-bench and fix an
	existing one", new summary.
2014-03-20 08:33:05 -04:00
Ben Avison
38317cbfde ARMv6: remove 1 instr per row in generate_composite_function
This knocks off one instruction per row. The effect is probably too small to
be measurable, but might as well be included. The second occurrence of this
sequence doesn't actually benefit at all, but is changed for consistency.

The saved instruction comes from combining the "and" inside the .if
statement with an earlier "tst". The "and" was normally needed, except
for in one special case, where bits 4-31 were all shifted off the top of
the register later on in preload_leading_step2, so we didn't care about
their values.

v4, Pekka Paalanen <pekka.paalanen@collabora.co.uk> :
	Remove "bits 0-3" from the comments, update patch summary, and
	augment message with Ben's suggestion.
2014-03-20 08:33:05 -04:00
Ben Avison
763a6d3e67 ARMv6: Fix indentation in the composite macros 2014-03-20 08:33:05 -04:00
Søren Sandmann
82d094654a Remove all the operators that use division from pixman-combine32.c
These are now handled by floating point combiners.
2014-01-04 16:13:27 -05:00
Søren Sandmann
ccb1df0c5e Copy the comments from pixman-combine32.c to pixman-combine-float.c
An upcoming commit will delete many of the operators from
pixman-combine32.c and rely on the ones in pixman-combine-float.c. The
comments about how the operators were derived are still useful though,
so copy them into pixman-combine-float.c before the deletion.
2014-01-04 16:13:27 -05:00
Søren Sandmann Pedersen
94244b0c40 utils.c: Set DEVIATION to 0.0128
Consider a HARD_LIGHT operation with the following pixels:

- source:           15      (6 bits)
- source alpha:     255     (8 bits)
- mask alpha:       223     (8 bits)
- dest              255     (8 bits)
- dest alpha:       0       (8 bits)

Since 2 times the source is less than source alpha, the first branch
of the hard light blend mode is taken:

        (1 - sa) * d + (1 - da) * s + 2 * s * d

Since da is 0 and d is 1, this degenerates to:

        (1 - sa) + 3 * s

Taking (src IN mask) into account along with the fact that sa is 1,
this becomes:

        (1 - ma) + 3 * s * ma

      = (1 - 223/255.0) + 3 * (15/63.0) * (223/255.0)

      = 0.7501400560224089

When computed with the source converted by bit replication to eight
bits, and additionally with the (src IN mask) part rounded to eight
bits, we get:

        ma = 223/255.0

        s * ma = (60 / 255.0) * (223/255.0) which rounds to 52 / 255

and the result is

        (1 - ma) + 3 * s * ma

      = (1 - 223/255.0) + 3 * 52/255.0

      = 0.7372549019607844

so now we have an error of 0.012885.

Without making changes to the way pixman does integer
rounding/arithmetic, this error must then be considered
acceptable. Due to conservative computations in the test suite we can
however get away with 0.0128 as the acceptable deviation.

This fixes the remaining failures in pixel-test.
2014-01-04 16:13:27 -05:00
Søren Sandmann
15aa37adec Use floating point combiners for all operators that involve divisions
Consider a DISJOINT_ATOP operation with the following pixels:

- source:	0xff (8 bits)
- source alpha:	0x01 (8 bits)
- mask alpha:	0x7b (8 bits)
- dest:		0x00 (8 bits)
- dest alpha:	0xff (8 bits)

When (src IN mask) is computed in 8 bits, the resulting alpha channel
is 0 due to rounding:

     floor ((0x01 * 0x7b) / 255.0 + 0.5) = floor (0.9823) = 0

which means that since Render defines any division by zero as
infinity, the Fa and Fb for this operator end up as follows:

     Fa = max (1 - (1 - 1) / 0, 0) = 0

     Fb = min (1, (1 - 0) / 1) = 1

and so since dest is 0x00, the overall result is 0.

However, when computed in full precision, the alpha value no longer
rounds to 0, and so Fa ends up being

     Fa = max (1 - (1 - 1) / 0.0001, 0) = 1

and so the result is now

     s * ma * Fa + d * Fb

   = (1.0 * (0x7b / 255.0) * 1) + d * 0

   = 0x7b / 255.0

   = 0.4823

so the error in this case ends up being 0.48235294, which is clearly
not something that can be considered acceptable.

In order to avoid this problem, we need to do all arithmetic in such a
way that a multiplication of two tiny numbers can never end up being
zero unless one of the input numbers is itself zero.

This patch makes all computations that involve divisions take place in
floating point, which is sufficient to fix the test cases

This brings the number of failures in pixel-test down to 14.
2014-01-04 16:13:27 -05:00
Søren Sandmann
8f38243163 Soft Light: Consistent approach to division by zero
The Soft Light operator has several branches. One them is decided
based on whether 2 * s is less than or equal to 2 * sa. In floating
point implementations, when those two values are very close to each
other, it may not be completely predictable which branch we hit.

This is a problem because in one branch, when destination alpha is
zero, we get the result

      r = d * as

and in the other we get

      r = 0

So when d and as are not 0, this causes two different results to be
returned from essentially identical input values. In other words,
there is a discontinuity in the current implementation.

This patch randomly changes the second branch such that it now returns
d * sa instead. There is no deep meaning behind this, because
essentially this is an attempt to assign meaning to division by zero,
and all that is requires is that that meaning doesn't depend on minute
differences in input values.

This makes the number of failed pixels in pixel-test go down to 347.
2014-01-04 16:13:27 -05:00