Commit Graph

21 Commits

Author SHA1 Message Date
Nemanja Lukic
f69335d529 test: add "pixbuf" and "rpixbuf" to lowlevel-blt-bench
Add necessary support to lowlevel-blt benchmark for benchmarking pixbuf and
rpixbuf fast paths. bench_composite function now checks for pixbuf string in
testname, and if that is detected, use same bits for src and mask images.
2013-04-30 15:38:43 -04:00
Nemanja Lukic
3dc9e3827e test: add "src_0888_8888_rev" and "src_0888_0565_rev" to lowlevel-blt-bench 2013-04-30 15:38:43 -04:00
Ben Avison
5e207f825b Fix to lowlevel-blt-bench
The source, mask and destination buffers are initialised to 0xCC just after
they are allocated. Between each benchmark, there are a pair of memcpys,
from the destination buffer to the source buffer and back again (there are
no explanatory comments, but presumably this is an effort to flush the
caches). However, it has an unintended consequence, which is to change the
contents of the buffers on entry to subsequent benchmarks. This means it is
not a fair test: for example, with over_n_8888 (featured in the following
patches) it reports L2 and even M tests as being faster than the L1 test,
because after the L1 test, the source buffer is filled with fully opaque
pixels, for which over_n_8888 has a shortcut.

The fix here is simply to reverse the order of the memcpys, so src and
destination are both filled with 0xCC on entry to all tests.
2013-02-13 02:24:34 -05:00
Ben Avison
69a7a9b6b6 Improve L1 and L2 benchmark tests for caches that don't use allocate-on-write
In particular this affects single-core ARMs (e.g. ARM11, Cortex-A8), which
are usually configured this way. For other CPUs, this should only add a
constant time, which will be cancelled out by the EXCLUDE_OVERHEAD runs.

The problems were caused by cachelines becoming permanently evicted from
the cache, because the code that was intended to pull them back in again on
each iteration assumed too long a cache line (for the L1 test) or failed to
read memory beyond the first pixel row (for the L2 test). Also, the reloading
of the source buffer was unnecessary.

These issues were identified by Siarhei in this post:
http://lists.freedesktop.org/archives/pixman/2013-January/002543.html
2013-01-29 15:23:05 -05:00
Ben Avison
24e83cae64 Tweaks to lowlevel-blt-bench
This adds two extra tests, src_n_8 and src_8_8, which I have been
using to benchmark my ARMv6 changes.

I'd also like to propose that it requires an exact test name as the
executable's argument, as achieved by this strstr to strcmp change.
Without this, it is impossible to only benchmark (for example)
add_8_8, add_n_8 or src_n_8, due to those also being substrings of
many other test names.
2013-01-25 11:13:07 -05:00
Siarhei Siamashka
e4519360c1 test: add "src_0565_8888" to lowlevel-blt-bench 2012-12-18 20:43:51 +02:00
Siarhei Siamashka
fc162bad56 test: support nearest/bilinear scaling in lowlevel-blt-bench
Scale factor is selected to be nearly 1x, so that the MPix/s results
can be directly compared with the results of non-scaled compositing
operations.
2012-06-29 03:24:29 +03:00
Matt Turner
62c4bdc94f mmx: add over_reverse_n_8888
Loongson:
over_reverse_n_8888 =  L1:  16.04  L2:  15.35  M: 10.20 ( 27.96%)  HT: 10.95  VT: 10.45  R:  9.18  RT:  6.99 (  76Kops/s)
over_reverse_n_8888 =  L1:  27.40  L2:  26.67  M: 16.97 ( 45.78%)  HT: 16.66  VT: 15.38  R: 14.15  RT:  9.44 (  97Kops/s)

image                      poppler   34.106   35.500   1.48%    6/6
image                      poppler   29.598   30.835   1.70%    6/6

ARM/iwMMXt:
over_reverse_n_8888 =  L1:  15.63  L2:  14.33  M: 10.83 ( 27.55%)  HT:  9.78  VT:  9.91  R:  9.49  RT:  6.96 (  69Kops/s)
over_reverse_n_8888 =  L1:  22.79  L2:  19.40  M: 13.76 ( 34.19%)  HT: 11.66  VT: 11.86  R: 11.17  RT:  7.85 (  75Kops/s)

image                      poppler   38.040   38.606   1.10%    6/6
image                      poppler   31.686   32.278   0.80%    5/6
2012-05-26 20:32:27 -04:00
Matt Turner
3c3c70fa0b lowlevel-blt-bench: add in_8_8 and in_n_8_8
Signed-off-by: Matt Turner <mattst88@gmail.com>
2012-03-01 17:42:37 -05:00
Matt Turner
e43d65d49d lowlevel-blt: add over_x888_n_8888
Signed-off-by: Matt Turner <mattst88@gmail.com>
2012-02-24 20:02:55 -05:00
Matt Turner
9f60704995 lowlevel-blt: add over_8888_8888
Signed-off-by: Matt Turner <mattst88@gmail.com>
2012-02-24 19:58:09 -05:00
Andrea Canciani
97b9fa090c Use the ARRAY_LENGTH() macro when possible
This patch has been generated by the following Coccinelle semantic patch:

// Use the ARRAY_LENGTH() macro when possible
//
// Replace open-coded array length computations with the
// ARRAY_LENGTH() macro

@@
type T;
T[] E;
@@
- (sizeof(E)/sizeof(T))
+ ARRAY_LENGTH (E)
2011-11-09 09:17:00 +01:00
Andrea Canciani
06760f5cb0 test: Cleanup includes
All the tests are linked to libutil, hence it makes sence to always
include utils.h and reuse what it provides (config.h inclusion, access
to private pixman APIs, ARRAY_LENGTH, ...).
2011-11-09 09:17:00 +01:00
Matt Turner
b6b77488a0 lowlevel-blt: add over_x888_8_8888
Signed-off-by: Matt Turner <mattst88@gmail.com>
2011-09-26 11:29:51 -04:00
Andrea Canciani
a1ebff0dcb win32: Build benchmarks
Add the makefile rules needed to compile lowlevel-blt-bench on win32
and fix the compilation errors.
2011-08-29 07:37:46 +02:00
Søren Sandmann Pedersen
bdfb5944ff Don't include stdint.h in lowlevel-blt-bench.c
Some systems don't have the file, and the types are already defined in
pixman.h.

https://bugs.freedesktop.org//show_bug.cgi?id=37422
2011-08-11 03:32:14 -04:00
Søren Sandmann Pedersen
a89f8cfaf1 Replace argumentxs to composite functions with a pointer to a struct
This allows more information, such as flags or the composite region,
to be passed to the composite functions.
2011-06-20 02:03:23 -04:00
Søren Sandmann Pedersen
13aed37758 Add a test for over_x888_8_0565 in lowlevel_blt_bench().
The next few commits will speed this up quite a bit.

Current output:

---
reference memcpy speed = 2217.5MB/s (554.4MP/s for 32bpp fills)
---
over_x888_8_0565 =  L1:  54.67  L2:  54.01  M: 52.33 ( 18.88%)  HT: 37.19  VT: 35.54  R: 29.40  RT: 13.63 ( 162Kops/s)
2011-01-28 14:35:17 -05:00
Søren Sandmann Pedersen
ba693d2e88 Fix search-and-replace issue in lowlevel-blt-bench.c 2010-09-28 02:52:17 -04:00
Søren Sandmann Pedersen
77d3e5f6ff Rename all the fast paths with _8000 in their names to _8
This inconsistent naming somehow survived the refactoring from a while
back.
2010-09-28 00:07:47 -04:00
Jonathan Morton
7cd4f2fa20 Add a lowlevel blitter benchmark
This test is a modified version of Siarhei's compositor throughput
benchmark.  It's expanded with explicit reporting of memory bandwidth
consumption for the M-test, and with an additional 8x8-random test
intended to determine peak ops/sec capability.  There are also quite a
lot more operations tested for.
2010-09-21 08:50:18 -04:00