Refer [1], always-inline is not suggested to be used if you have indirect
calls. so replace force_inline with inline to fix error like:
In function ‘combine_inner’,
inlined from ‘combine_soft_light_ca_float’ at ../pixman/pixman-combine-float.c:655:511:
../pixman/pixman-combine-float.c:655:211: error: inlining failed in call to ‘always_inline’ ‘combine_soft_light_c’: function not considered for inlining
Test with gcc-9 and gcc-14, both works well
[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115679
Signed-off-by: Changqing Li <changqing.li@windriver.com>
Temporary version pin of gcovr due to errors in coverage report
generation when running with newly released version 8.x.
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
https://github.com/mesonbuild/meson/pull/13604 got merged and released
with Meson 1.6.0, which we already use in the Docker images, so the
workaround can be dropped.
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
Some targets require different version of LLVM, so now it's possible to
set it in the target's environment. Mind that the highest available
version depends on the base Debian image.
The change bumps LLVM version for all Linux targets:
- by default from 14 to 16,
- from 16 to 18 for riscv64 (based on Sid; for now, LLVM 19 doesn't have
libomp packaged),
- mipsel stays at 14 as there seem to be some missing packages for
higher versions.
Windows targets stay the same, as they use a different source of LLVM
(MinGW-compatible, which is currently version 18).
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
To ensure that the runtime discovery works correctly, and RVV code is
disabled for target without RVV extension.
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
RVV compilation will be enabled for RVV implementation alone, similar to
other platforms. This prevents introducing autovectorized code in the
main library, thus making pixman compatible with RISC-V targets without
RVV.
If rule condition for selectively running Docker image builds was ill
formed. It resulted in build of all images even when not all targets
were selected with ACTIVE_TARGET_PATTERN variable.
If the MR doesn't modify the Docker context, the pipeline should use the
image from upstream.
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
This commit unifies codecov and pltcov build and test stages as single
parametrizable GitLab job templates. This cleans up the pipeline flow in
preparation for LLVM support in the pipeline.
Each target has now a Meson cross file, even when using a native
compiler, so that the job template can be better generalized. This also
allows to move architecture-specific build configuration to the cross
file instead of using the additional Meson flags in the job declaration.
This commit merges codecov and pltcov Dockerfiles into a single,
multi-stage Dockerfile. This results in more streamlined Docker image
builds with some common layers which can be reused by multiple images.
Also, by making a common Dockerfile, all common dependencies have the
same exact description, which decreases disparity between different
images for all the supported architectures. Mind that package version
disparity cannot be prevented 100%, as different base images may be used
for different architectures (e.g., Debian Sid for riscv64 instead of
Bookworm).
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
It replaces CODE- and PLT- specific target enable variables. It is a
ground work for unification of codecov and pltcov flows.
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
It makes per-targe environment declaration more extensible, as it's
possible now to set custom env variables only for the selected target
for the entire pipeline workflow in a centralized way.
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
There was a missing wildcard for Docker directory
change detection, so basically this rule was not
checked correctly.
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
Platform coverage checks if the code builds and executes properly for
architectures that are not officially supported by Debian. They don't
contribute to general code coverage report but provide a valuable
insight if all supported platforms are working correctly.
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
Add images providing an environment for architecture coverage tests.
There is a separate build for Linux and Windows, as the Windows image is
really large compared to Linux one. It decreases the execution time of
both targets, as the images needed to be pulled by runners are smaller.
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
This commit introduces a build and test CI workflow, which tests the
correctness of execution for nearly all configurations supported by
pixman. The notable exception is ARM iWMMXt, which is omitted as it's
soon to be deprecated as mentioned in #98.
The build and test stage is separated, as a single build can be used to
test multiple configurations for a given platform (e.g., MMX, SSE2,
SSSE3 for x86).
Execution is performed using multi-arch Docker images built in the
`docker` stage. The important thing to note is that the runner needs to
have a relatively recent version of Docker and QEMU, and needs to have
the qemu-user-static+binfmt execution enabled.
Once all tests are complete, coverage reports are merged together in the
`summary` stage. Then the result can be used in a GitLab-native coverage
report summary.
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
Used to force feature discovery in CI where /proc/cpuinfo is unreliable.
It can happen, e.g., if executed in qemu-user-static mode.
For such a build, MIPS-specific features need to be manually disabled by
using `PIXMAN_DISABLE` env variable.
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
DSPr2 can be available for targets other than mips32. Some distros
(e.g., Debian) don't support mips32 but still support mipsel. Extending
the check enables use of such images for testing.
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
The image is used in CI pipeline to build and test on different
architectures.
This commit introduces more extensible GitLab CI scheme borrowed from
qemu project.
Signed-off-by: Marek Pikuła <m.pikula@partner.samsung.com>
This avoids callers to have to optimize this codepath, in case this scenario happens.
And definitely it may happen when the function is not explicitly called.
Enable Pointer Authentication Codes (PAC) and Branch Target
Identification (BTI) support for ARM 64 targets.
PAC works by signing the LR with either an A key or B key and verifying
the return address. There are quite a few instructions capable of doing
this, however, the Linux ARM ABI is to use hint compatible instructions
that can be safely NOP'd on older hardware and can be assembled and
linked with older binutils. This limits the instruction set to paciasp,
pacibsp, autiasp and autibsp. Instructions prefixed with pac are for
signing and instructions prefixed with aut are for signing. Both
instructions are then followed with an a or b to indicate which signing
key they are using. The keys can be controlled using
-mbranch-protection=pac-ret for the A key and
-mbranch-protection=pac-ret+b-key for the B key.
BTI works by marking all call and jump positions with bti c and bti
j instructions. If execution control transfers to an instruction other
than a BTI instruction, the execution is killed via SIGILL. Note that
to remove one instruction, the aforementioned pac instructions will
also work as a BTI landing pad for bti c usages.
For BTI to work, all object files linked for a unit of execution,
whether an executable or a library must have the GNU Notes section of
the ELF file marked to indicate BTI support. This is so loader/linkers
can apply the proper permission bits (PROT_BRI) on the memory region.
PAC can also be annotated in the GNU ELF notes section, but it's not
required for enablement, as interleaved PAC and non-pac code works as
expected since it's the callee that performs all the checking. The
linker follows the same rules as BTI for discarding the PAC flag from
the GNU Notes section.
Testing was done under the following CFLAGS and CXXFLAGS for all
combinations:
1. -mbranch-protection=none
2. -mbranch-protection=standard
3. -mbranch-protection=pac-ret
4. -mbranch-protection=pac-ret+b-key
5. -mbranch-protection=bti
Signed-off-by: Bill Roberts <bill.roberts@arm.com>
This reverts commit b4a105d772.
There is an endianness issue in pixman-fast-path.c. In the function
bits_image_fetch_separable_convolution_affine we have this code:
#ifdef WORDS_BIGENDIAN
buffer[k] = (satot << 0) | (srtot << 8) | (sgtot << 16) | (sbtot << 24);
#else
buffer[k] = (satot << 24) | (srtot << 16) | (sgtot << 8) | (sbtot << 0);
#endif
This will write out the pixels as BGRA on big endian systems but
obviously that's wrong. Pixel order should be ARGB on big endian systems
so we don't need any #ifdef for big endian here at all. Instead, the
code should be the same on little and big endian, i.e. it should be just
this line instead of the 5 lines above:
buffer[k] = (satot << 24) | (srtot << 16) | (sgtot << 8) | (sbtot << 0);
Changing the code like this fixes the wrong colors that I get with
pixman on my PowerPC/s390x system.
Here is what cairo.h has to say (which is rooted in pixman):
* @CAIRO_FORMAT_ARGB32: each pixel is a 32-bit quantity, with
* alpha in the upper 8 bits, then red, then green, then blue.
* The 32-bit quantities are stored native-endian. Pre-multiplied
* alpha is used. (That is, 50% transparent red is 0x80800000,
* not 0x80ff0000.) (Since 1.0)
Closes: https://gitlab.freedesktop.org/pixman/pixman/-/issues/78
Signed-off-by: Gayathri Berli <gayathri.berli@ibm.com>
We don't use the historical odd stable release numbering scheme
anymore.
Developers can still enable this debugging code via CFLAGS=-DDEBUG.
Signed-off-by: Simon Ser <contact@emersion.fr>
Closes: https://gitlab.freedesktop.org/pixman/pixman/-/issues/87
The variable is accessed through uint32_t pointer, so it needs to be
aligned to avoid undefined behavior (crashes on architectures which
require aligned accesses).
Closes: https://gitlab.freedesktop.org/pixman/pixman/-/issues/84
GCC 14 introduces a new -Walloc-size included in -Wextra which gives (when forced
to be an error):
```
../pixman/pixman-bits-image.c: In function ‘create_bits’:
../pixman/pixman-bits-image.c:1273:16: error: allocation of insufficient size ‘1’ for type ‘uint32_t’ {aka ‘unsigned int’} with size ‘4’ [-Werror=alloc-size]
1273 | return calloc (buf_size, 1);
| ^~~~~~~~~~~~~~~~~~~~
```
The calloc prototype is:
```
void *calloc(size_t nmemb, size_t size);
```
So, just swap the number of members and size arguments to match the prototype, as
we're initialising 1 element of size `buf_size`. GCC then sees we're not
doing anything wrong.
Signed-off-by: Sam James <sam@gentoo.org>
Based on suggestion from @siamashka.
This lets the compiler pick the vector instruction to use which is
usually the best idea.
Use create_mask_32_128() instead of create_mask_1x32_128() in
vmx_fill(), avoiding loading memory beyond the filler argument on the
stack.
Remove the now-unused create_mask_1x32_128(). This gets rid of some
(correct) warnings from the compiler about indexing beyond the variable
in question.
Since combine4() does not take vector variables as arguments, there's no
need to use a vector variable and casts back and forth to normal scalars
for the arguments.
meson can handle building for win32 (including using visual studio, and
mingw), and does a good deal more than these could. Since we're dropping
autotools, we might as well drop these too.
At this point meson is pretty well tested and seems to pretty much work,
so we can consider dropping an extra build system.
This doesn't solve the problem that pixman's release scripts are part of
the autotools build system (as make targets). One solution might be to
use xorg's release.sh instead.
This enables building the aarch64 assembly with clang.
Changes:
1. Use `.func` or `.endfunc` only if available
2. Prefix macro arg names with `\`
3. Use `\()` instead of `&`
4. Always use commas to separate macro arguments
5. Prefix asm symbols with an undderscore if necessary
As of mingw-w64 commit 463f00975, winnt.h includes emmintrin.h when
compiling with SSE2, causing redefinition errors for our copied MMX
intrinsics. If the build is assuming SSE2 anyway, just use the system
header instead.
Inverse of pixman_region32_not_empty().
Most of the time, callers want to check whether a region is empty,
not whether a region is not empty. This results in code with
double-negatives such as !pixman_region32_not_empty(), which is
confusing to read.
Signed-off-by: Simon Ser <contact@emersion.fr>
After the latest changes and separation of demo- and test-targets,
it was visible that a dependency towards `libtestutils_dep` was
missing in one of the demo-dependencies. This change will fix
this particular problem.
This reverts commit aaf59b0338.
This commit regressed the scaling-test unit test, by apparently allowing
the compiler to emit fused multiply-add instructions in cases they
wouldn't have been allowed before. While using gcc's -ffp-contract=...
flag avoids the issue on amd64, it does not on at least aarch64 and
ppc64.
This is unfortunate, because the commit being reverted resolved
https://gitlab.freedesktop.org/pixman/pixman/-/issues/43 so we will
reintroduce this failure, but after more than a year without a fix for
the unit test, I think it's time to bite the bullet.
Fixes: https://gitlab.freedesktop.org/pixman/pixman/-/issues/49
This explicitly indicates that GNU extensions (like asm) are used.
This fixes build errors when Pixman is used as a Meson subproject.
Signed-off-by: Simon Ser <contact@emersion.fr>
When compiling with MinGW, use of the __thread attribute causes pixman
to gain a dependency on the winpthread DLL. With Autotools, this could
be avoided by configuring with ac_cv_tls=none, causing pixman to fall
back to TlsSetValue() instead.
Add a Meson 'tls' option that can be 'disabled' to skip support for TLS
compiler attributes, or 'enabled' to require a working TLS attribute.
Override the x64 hardware capability autodetection by Solaris Studio
compilers for x64 libraries the same way we do for x86 libraries.
Also fix configure test for this override to work in out-of-tree builds.
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
The library that the pkgconfig file is for should be the first
positional argument. The `libraries:` kwarg is for libraries that the
user must also link against, and which meson does not know about (and
hence cannot automatically add to the `Libs:` or `Requires:` section
in the .pc file).
Fixes:
```
subprojects/pixman/meson.build:564: DEPRECATION: Library pixman-1 was
passed to the "libraries" keyword argument of a previous call to
generate() method instead of first positional argument. Adding
pixman-1 to "Requires" field, but this is a deprecated behaviour that
will change in a future version of Meson. Please report the issue if
this warning cannot be avoided in your case.
```
We use this because of a meson bug that was fixed in 0.52:
https://mesonbuild.com/Release-notes-for-0-52-0.html#improved-support-for-static-libraries
Bump the requirement and remove the extract_all_objects workaround.
This gets rid of a meson warning:
WARNING: extract_all_objects called without setting recursive
keyword argument. Meson currently defaults to
non-recursive to maintain backward compatibility but
the default will be changed in the future.
GTK2 has reached end of life, and GTK3 has been available for a
almost a decade.
Signed-off-by: Manuel Stoeckl <code@mstoeckl.com>
Reviewed-by: Simon Ser <contact@emersion.fr>
Since aarch64 has different neon syntax from aarch32 and has no
support for (older) arm-simd,
there are no SIMD accelerations for pixman on aarch64.
We need new implementations.
This patch also contains Ben Avions's series of patches for aarch32
and now the benchmark results are fine to aarch64.
Please find the result at the below ticket.
Added: https://bugs.freedesktop.org/show_bug.cgi?id=94758
Signed-off-by: Mizuki Asakura <ed6e117f@gmail.com>
This allows callers to pass around const Pixman region in their
APIs, improving type safety and documentation.
Signed-off-by: Simon Ser <contact@emersion.fr>
prng_state and prng_state_data are getting classified as a "Common
symbol" by the compiler due to the convoluted way in which it is
`#include`-ed in various test sources, and that's not read as a valid
symbol by the linker later.
Initializing the symbol clarifies it to the compiler that this
specific declaration is the canonical location for this variable, and
that it's not a "Common symbol".
Fixes https://gitlab.freedesktop.org/pixman/pixman/-/issues/42
Adding const to the return type does nothing and means that the function
pointer types do not match exactly:
error: incompatible function pointer types passing 'const float (int, int)' to parameter of type 'dither_factor_t' (aka 'float (*)(int, int)')
In __bits_image_fetch_affine_no_alpha and __bits_image_fetch_general,
when `wide` is true, the mask is actually an array of argb_t instead
of the array of uint32_t it was cast to, and the access to `mask[i]`
does not correctly detect when the pixel is nontrivial. The code now
uses a check appropriate for argb_t when `wide` is true.
One caveat: this new check only skips entries when the mask pixel data
is binary all zero; this misses cases like `-0.f` which would be caught
by the FLOAT_IS_ZERO macro. As the mask check only appears to be a
performance optimization to avoid loading inconsequential pixels, it
erring on the side of loading more pixels is safe.
Signed-off-by: Manuel Stoeckl <code@mstoeckl.com>
The important changes here are a handful of places where we replace
memcpy(&m, mask++, sizeof(uint32_t));
or similar code with
uint8_t m = *mask++;
because we're only supposed to be reading a single byte from *mask,
and accessing a 32-bit value may read out of bounds (besides that
it reads values we don't actually want; whether this matters would
depend exactly how the value in m is subsequently used).
I've also changed a bunch of other places to use this same pattern
(a local 8-bit variable) when reading individual bytes from the mask;
the code was inconsistent about this, sometimes casting the byte to
a uint32_t instead. This makes no actual difference, it just seemed
better to use a consistent pattern throughout the file.
AFAICT from the git history, what happened is that the gtk demos rely on
gtk being built with pixman support. pkg-config isn't really expressive
enough to have that information, so the solution that was come up with
was to search for pixman as well as gtk+ and hope that pixman being
installed was.
This isn't actually used anywhere in the meson build anyway, and it's
causing problems for projects that want to use pixman as a supproject
(there's a port of cairo underway that's hitting this), because it
confuses meson.
Add option to include cpu-features.[ch] from a given path
into the build for platforms that don't provide this out
of the box. This is needed on Android.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
This should resolve https://gitlab.freedesktop.org/pixman/pixman/-/issues/22
and make the tests pass with clang.
-ftrapping-math is already the default[1] for gcc, so this should not change
behavior when compiling with gcc. However, clang defaults[2] to -fno-trapping-math,
so -ftrapping-math is needed to avoid floating-point expceptions when running the
combiner and stress tests.
The root causes of this issue is that that pixman-combine-float.c guards floating-point
division operations with a FLOAT_IS_ZERO check e.g.
if (FLOAT_IS_ZERO (sa))
f = 1.0f;
else
f = CLAMP (da / sa);
With -fno-trapping-math, the compiler assumes that division will never trap, so it may
re-order the division and the guard and execute the division first. In most cases,
this would not be an issue, because floating-point exceptions are ignored. However,
these tests call enable_divbyzero_exceptions() which causes the SIGFPE signal to
be sent to the program when a divide by zero exception is raised.
[1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
[2] https://clang.llvm.org/docs/UsersManual.html#controlling-floating-point-behavior
The expansion of PIXMAN_DEFINE_THREAD_LOCAL(...) may end in a
function definition, so the following semicolon is considered an
empty top-level declaration, which is not allowed in ISO C.
Reviewed-by: Matt Turner <mattst88@gmail.com>
We want a uint8_t * at the end of this math, because that's what the
function we're about to pass it to takes. But ->bits is a uint32_t, so
if we just do the math in units of that we can avoid the explicit factor
of four which would risk an integer overflow.
Fixes: pixman/pixman#14
MinGW supports __declspec(dllexport) but the current logic that sets
PIXMAN_EXPORT only uses it when building with MSVC, leaving some symbols
hidden when building with MinGW.
This results in an error when trying to link the tests:
-----------------------------------------------------------------------
FAILED: subprojects/pixman/test/combiner-test.exe
x86_64-w64-mingw32-gcc -o subprojects/pixman/test/combiner-test.exe 'subprojects/pixman/test/f48fa9c@@combiner-test@exe/combiner-test.c.obj' -Wl,--allow-shlib-undefined -Wl,--start-group subprojects/pixman/test/libtestutils.a subprojects/pixman/pixman/libpixman-1.dll.a -pthread -fopenmp -fopenmp -lm -mconsole -lkernel32 -luser32 -lgdi32 -lwinspool -lshell32 -lole32 -loleaut32 -luuid -lcomdlg32 -ladvapi32 -Wl,--end-group
/usr/bin/x86_64-w64-mingw32-ld: subprojects/pixman/test/f48fa9c@@combiner-test@exe/combiner-test.c.obj: in function `main':
.../build/../subprojects/pixman/test/combiner-test.c:124: undefined reference to `_pixman_internal_only_get_implementation'
collect2: error: ld returned 1 exit status
ninja: build stopped: subcommand failed.
-----------------------------------------------------------------------
By using PIXMAN_API also when building with MinGW, the tests can link
successfully and the build succeed.
Tested with x86_64-w64-mingw32-gcc (GCC) 8.3-win32 20191201.
...When we don't have a pthreads implementation available, which is
normally the case on Windows. This attempts to make it easier for people
on Windows to verify whether their builds of Pixman (and Cairo component,
if applicable) are thread-safe. Also, make the number of threads
a #define, so if we need to change it at some point, it's easier.
This re-enables the thread-test program on Windows in Meson builds.
Define the existing PIXMAN_EXPORT to be PIXMAN_API, which can overriden
to be __declspec(dllexport) during the build of the pixman DLL on MSVC
builds, which will be in the next patch.
Also, export more private symbols as they are needed for the test
programs.
This prepares to mark the public APIs that we have in pixman.h so that
we can use compiler directives such as __declspec(dllexport) to export
those symbols.
The build system for libpng for MSVC does not generate a pkg-config file
for us, and CMake support in Meson does not work very well. So, look
for libpng manually on MSVC builds if depedency discovery did not work
out via pkg-config or the CMake config files.
Look also for pthread.h if threading support is found by Meson, as the
underlying threading support may not be PThreads, depending on platform.
For now, disable the thread-test test program if pthread.h and if
necessary, the PThreads library, cannot be found, as the current
implementation assumes the use of PThreads.
Also bump the required Meson version to 0.50.0 since we need it for
-cc.get_argument_syntax()
-For a later commit, the has_headers sub-method for cc.find_library()
The implementation of OpenMP is not compliant for our uses, so disable
it for now by just not checking for it on MSVC builds, as we implicitly
add an /openmp switch to the build, which will cause linking the tests
programs to fail, as the OpenMP implementation is not enough.
-For MSVC builds, do not use the GCC-specific CFlags when checking for
these features.
-For the MMX check, assume that we have good enough MMX intrinsics and
inline assembly support (on ix86), since MSVC provides sufficient
support for those since before the times of MSVC 2008, and 2008 is the
oldest version that we can support, as with the pre-C99 GTK+ stack.
Unfortunately due to x64 compiler issues, pre-Visual Studio 2010 will
crash when building SSSE3 code, so we do not enable building SSSE3 code
on pre-2010 Visual Studio.
Also, for all x64 Visual Studio builds, we do not enable USE_X86_MMX
as inline assembly is not allowed for x64 Visual Studio builds, and
instead use the compatibility instrinsics that we already have in the
code.
Does not make the test pass, but does fix this error:
../test/stress-test.c:538:25: runtime error: signed integer overflow: 2147483647 - -2 cannot be represented in type 'int'
../pixman/pixman-combine32.c:657:1: runtime error: left shift of 128 by 24 places cannot be represented in type 'int'
../pixman/pixman-combine32.c:694:1: runtime error: left shift of 232 by 24 places cannot be represented in type 'int'
../pixman/pixman-combine32.c:712:1: runtime error: left shift of 255 by 24 places cannot be represented in type 'int'
../pixman/pixman-combine32.c:786:1: runtime error: left shift of 255 by 24 places cannot be represented in type 'int'
../pixman/pixman-combine32.c:805:1: runtime error: left shift of 255 by 24 places cannot be represented in type 'int'
../pixman/pixman-access.c:389:2: runtime error: left shift of 1 by 31 places cannot be represented in type 'int'
../pixman/pixman-access.c:1101:2: runtime error: left shift of 2 by 30 places cannot be represented in type 'int'
../pixman/pixman-access.c:1152:2: runtime error: left shift of 2 by 30 places cannot be represented in type 'int'
Reported in https://bugzilla.mozilla.org/show_bug.cgi?id=1580352. Casting the argument to uint32_t should avoid invoking undefined behavior here. We'll still have *implementation-defined* behavior when casting the result back to pixman_fixed_t, but that's better than *undefined*.
Meson doesn't do the expected thing when library() creates a static
library. Instead of combining the libraries together into a single
archive it effectively discards them, resulting in missing symbols.
To work around this we manually unpack the archives and shove the .o
files into the final library. This doesn't affect the shared library at
all, but makes the static library have the necessary symbols
Fixes#33
To avoid potential signed integer overflow (undefined behavior), as implicit integer promotion means the operand becomes a (signed) int.
(Issue originally reported at https://bugzilla.mozilla.org/show_bug.cgi?id=1577669)
GCC on Windows complains that "__declspec(thread)" doesn't work, but still
compiles it, so the meson check doesn't work. The warning printed by gcc:
"warning: 'thread' attribute directive ignored [-Wattributes]"
Pass -Werror=attributes to make the check fail instead.
This fixes the test suite (minus gtk tests) on Windows with mingw.
So that passing "-Ddefault_library=both" also creates a static lib.
Note that Libs.private in the .pc file will still be wrong because of
https://github.com/mesonbuild/meson/issues/3934 (it contains things like
-lpixman-mmx)
The dithering code (specifically `dither_factor_bayer_8`) uses a GNU
extension for binary notation, eg 0b001. This is not supported by MSVC
(at least) and breaks the build on this platform [1].
This patches uses hexadecimal notation instead, fixing the build.
[1]: https://lists.freedesktop.org/archives/pixman/2019-June/004883.html
Reviewed-by: Matt Turner <mattst88@gmail.com>
On some screens (typically low quality laptop screens), using Bayer
ordered dithering has been observed to cause color changes depending on
*where the gradient is rendered on the screen*, causing visible
flickering when moving an image on the screen.
To alleviate the issue, this patch adds support for ordered dithering
using a 64x64 matrix tuned from blue noise. In addition to being devoid
of the positional dependency on screen, the blue noise matrix also
generates more pleasing and less discernable patterns. As such, it is
now the method used for PIXMAN_DITHER_GOOD and PIXMAN_DITHER_BEST
dithering methods.
The 64x64 blue noise matrix has been generated using the provided
`pixman/dither/make-blue-noise.c` script, which uses the
void-and-cluster method.
Changes since v1 (thanks Bill):
- Use uint16_t for the blue noise matrix for lower memory usage
- Use bitwise computation for array index
This adds a dither.c which provides a demo of the dithering feature.
This is based on the scale.c demo for scaling and provides a selection
of intermediate formats and dithering operators (currently, only
PIXMAN_DITHER_ORDERED_BAYER_8) to use. Images are first blitted onto a
surface of the intermediate format with the requested dither setup, then
blitted back onto a a8r8g8b8 surface for display.
This adds support for testing dithered destinations in tolerance-test.
When dithering is enabled, the pixel checker allows for an additional
quantization error.
This patch implements dithering in pixman. A "dither" property is added
to BITS images, which is used to:
- Force rendering to the image to go through the floating point
pipeline. Note that this is different from FAST_PATH_NARROW_FORMAT
as it should not enable the floating point pipeline when reading from
the image.
- Enable dithering in dest_write_back_wide. The dithering uses the
destination format to determine noise amplitude.
This does not change pixman's behavior when dithering is disabled (the
default).
Additional types and functions are added to the public API:
- The `pixman_dither_t` enum exposes the available dithering methods.
Currently a single dithering method based on 8x8 Bayer matrices is
implemented (PIXMAN_DITHER_ORDERED_BAYER_8). The PIXMAN_DITHER_FAST,
PIXMAN_DITHER_GOOD and PIXMAN_DITHER_BEST aliases are provided and
should be used to benefit from future specializations.
- The `pixman_image_set_dither` function allows to set the dithering
method to use when rendering to a bits image.
- The `pixman_image_set_dither_offset` function allows to set a
vertical and horizontal offsets for the dither matrix. This can be
used after scrolling to ensure a consistent spatial positioning of
the dither matrix.
Changes since previous version (v2):
- linear_gradient_is_horizontal optimization is still compatible with
the wide pipeline. The code disabling it was a remnant of a previous
patch which performed dithering directly inside linear_get_scanline,
and thus needed to be called independently for each scanline.
Changes since v1:
- Renamed PIXMAN_DITHER_BAYER_8 to PIXMAN_DITHER_ORDERED_BAYER_8
- Disable dithering for channels with 32bpp or more (since they can
represent exactly the wide values already). This makes the patches
compatible with the newly added floating point format.
Dithering is compatible with linear_gradient_is_horizontal
The recently introduced wide pipeline for filters has a typo which
causes it to improperly compute bilinear interpolation positions,
causing various glitches when enabled.
This patch uses the proper computation for bilinear interpolation in the
wide pipeline. It also makes related `if` statements conformant to the
CODING_STYLE:
* If a substatement spans multiple lines, then there must be braces
around it.
* If one substatement of an if statement has braces, then the other
must too.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
These were forgotten in commit 0ea37df428 (meson: store ARM SIMD and
NEON tests as text files) and since autotools doesn't use them make
distcheck still succeeded.
Fixes#30
Signed-off-by: Matt Turner <mattst88@gmail.com>
This is unfortunately required to make the tests work correctly, as
otherwise meson assumes that the files are C code not assembly. I've
opened https://github.com/mesonbuild/meson/issues/5151, to discuss
fixing the issue in meson upstream.
Fixes#29
This issue causes openmp arguments to be injected into compilers that
can support openmp, even if they don't. This issue will be fixed in
0.51 (code already landed in mesonbuild#5116), for older versions lets
work around the issue.
pixman-bits-image's wide helpers first obtains the 8-bits image,
then converts it to float. This destroys all the precision that
the wide path was offering.
Fix this by making get_pixel() take a pointer instead of returning
a value. Floating point will fill in a argb_t, while the 8-bits path
will fill a 32-bits ARGB value. This also requires writing a floating
point bilinear interpolator. With this change pixman can use the full
floating point precision internally in all paths.
Changes since v1:
- Make accum and reduce an argument to convolution functions,
to remove duplication.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Acked-by: Basile Clement <basile-pixman@clement.pm>
This patch modifies the gradient walker to be able to generate floating
point values directly in addition to a8r8g8b8 32 bit values. This is
then used by the various gradient implementations to render in floating
point when asked to do so, instead of rendering to a8r8g8b8 and then
expanding to floating point as they were doing previously.
Changes since v1 (mlankhorst):
- Implement pixman_gradient_walker_pixel_32 without calling
pixman_gradient_walker_pixel_float, to prevent performance degradation.
Suggested by Adam Jackson.
- Fix whitespace errors.
- Remove unnecessary function prototypes in pixman-private.h
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
[mlankhorst: Add comment about pixman_contract_from_float,
based on Basille's suggestion]
Acked-by: Basile Clement <basile-pixman@clement.pm>
This commit adds a meson build system for pixman. It carries the usual
improvements of meson, better clean build time, much better incremental
build times, while being simpler and easier to understand.
This takes advantage of some features from the most recent versions of
meson: the builtin openmp dependency and the feature option type.
There are a couple of things that I've done a bit differently than the
autotools build system, I've built a libdemos which is the utilities
from the demos folder, and I've linked the demos with libtestutils from
tetsts, otherwise I expect that most things will be the same.
I've tested so far cross compiling from x86_64 -> x86, x86_64 ->
Aarch64, and Linux to Windows via mingw, as well as native x86_64 Linux
builds which all work. I've also built with mingw nativly, there are
some test failures there. An MSVC build can be generated, but fails.
v2: - set WORDS_BIGENDIAN in the config for big endian systems.
Add some basic tests to ensure that the newly added formats work as
intended.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Pixman is already using the floating point formats internally, expose
this capability in case someone wants to support higher bit per
component formats.
This is useful for igt which depends on cairo to do the rendering.
It can use it to convert floats internally to planar Y'CbCr formats,
or to F16.
We add a new type PIXMAN_TYPE_RGBA_FLOAT for this format, which is an
all float array of R, G, B, and A. Formats that use mixed float/int
RGBA aren't supported, and will probably need their own type.
Changes since v1:
- Use RGBA 128 bits and RGB 96 bits memory layouts, to better match the opengl format.
Changes since v2:
- Add asserts in accessor and for strides to force alignment.
- Move test changes to their own commit.
Changes since v3:
- Define 32bpc as PIXMAN_FORMAT_PACKED_C32
- Rename pixman accessors from rgb*_float_float to rgb*f_float
Changes since v4:
- Create a new PIXMAN_FORMAT_BYTE for fitting up to 64 bits per component.
(based on Siarhei Siamashka's suggestion)
- Use new format type PIXMAN_TYPE_RGBA_FLOAT
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> #v4
[mlankhorst: Fix missing braces in PIXMAN_FORMAT_RESHIFT macro]
Currently the number of bits per pixel is used instead of the
number of bytes per pixel when calculating image strides. This
does not cause any real problems, but the gaps between scanlines
are excessively large.
This patch actually converts bits to bytes and rounds up the result
to the nearest byte boundary.
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Reviewed-by: soren.sandmann@gmail.com
__builtin_shuffle was removed in clang 5.0.
Build log says:
test/utils-prng.c:207:27: error: use of unknown builtin '__builtin_shuffle' [-Wimplicit-function-declaration]
randdata.vb = __builtin_shuffle (randdata.vb, bswap_shufflemask);
^
test/utils-prng.c:207:25: error: assigning to 'uint8x16' (vector of 16 'uint8_t' values) from incompatible type 'int'
randdata.vb = __builtin_shuffle (randdata.vb, bswap_shufflemask);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 errors generated
Link to original discussion:
http://lists.llvm.org/pipermail/cfe-dev/2017-August/055140.html
It's possible to build pixman if attached patch is applied. Basically
patch adds check for __builtin_shuffle support and in case there is
none, falls back to clang-specific __builtin_shufflevector that do the
same but have different API.
Bugzilla: https://bugs.gentoo.org/646360
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104886
Tested-by: Philip Chimento <philip.chimento@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Just builds on Fedora 28 for x86_64 at the moment, but it's a start.
Credit to Daniel Stone for eliminating the nested docker image.
Signed-off-by: Adam Jackson <ajax@redhat.com>
Use vector intrinsic for loading possibly unaligned data instead of a
typecast.
Bugzilla: https://bugzilla.redhat.com/1572540
Signed-off-by: Dan Horák <dan@danny.cz>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Tested-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Expanded the size slightly (from ~4.25 to 5) to make the cutoff less
noticable. Previouly the value at the cutoff was
gaussian_filter(sqrt(2)*3/2) = 0.00626 which is larger than the
difference between 8-bit pixels (1/255 = 0.003921). New cutoff is
gaussian_filter(2.5) = 0.001089 which is smaller.
v11: added some math to commit message
v14: left SIGMA in there
Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Søren Sandmann <soren.sandmann@gmail.com>
There are a few bugs in the current normalization code
(1) The normalization is based on the sum of the *floating point*
values generated by integral(). But in order to get the sum to be
close to pixman_fixed_1, the sum of the rounded fixed point values
should be used.
(2) The multiplications in the normalization loops often round the
same way, so the residual error can fairly large.
(3) The residual error is added to the sample located at index
(width - width / 2), which is not the midpoint for odd widths (and
for width 1 is in fact outside the array).
This patch fixes these issues by (1) using the sum of the fixed point
values as the total to divide by, (2) doing error diffusion in the
normalization loop, and (3) putting any residual error (which is now
guaranteed to be less than pixman_fixed_e) at the first sample, which
is the only one that didn't get any error diffused into it.
Signed-off-by: Søren Sandmann <soren.sandmann@gmail.com>
The convolution of two BOX filters is simply the length of the
interval where both are non-zero, so we can simply return width from
the integral() function because the integration region has already
been restricted to be such that both functions are non-zero on it.
This is both faster and more accurate than doing numerical integration.
This patch is based on one by Bill Spitzak
https://lists.freedesktop.org/archives/pixman/2016-March/004446.html
with these changes:
- Rebased to not assume any changes in the arguments to integral().
- Dropped the multiplication by scale
- Added more details in the commit message.
Signed-off-by: Søren Sandmann <soren.sandmann@gmail.com>
Reviewed-by: Bill Spitzak <spitzak@gmail.com>
Only the triangle is discontinuous at 0. The other filters resemble a
cubic closely enough that Simpsons integration works without
splitting.
Changes by Søren: Rebase without the changes to the integral function,
update comment to match the new code.
Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Signed-off-by: Søren Sandmann <soren.sandmann@gmail.com>
Reviewed-by: Søren Sandmann <soren.sandmann@gmail.com>
Simpsons uses cubic curve fitting, with 3 samples defining each
cubic. This makes the weights of the samples be in a pattern of
1,4,2,4,2...4,1, and then dividing the result by 3.
The previous code was using weights of 1,2,0,6,0,6...,2,1.
With this fix the integration is accurate enough that the number of
samples could be reduced a lot. Multiples of 12 seem to work best.
v7: Merged with patch to reduce from 128 samples to 16
v9: Changed samples from 16 to 12
v10: Fixed rebase error that made it not compile
v11: minor whitespace change
v14: more whitespace changes
Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Søren Sandmann <soren.sandmann@gmail.com>
Rearranged so that the entire block of memory for the filter pair
is allocated first, and then filled in. Previous version allocated
and freed two temporary buffers for each filter and did an extra
memcpy.
v8: small refactor to remove the filter_width function
v10: Restored filter_width function but with arguments changed to
match later patches
v11: Removed unused arg and pointer from filter_width function
Whitespace fixes.
Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Søren Sandmann <soren.sandmann@gmail.com>
If enable-gnuplot is configured, then you can pipe the output of a
pixman-using program to gnuplot and get a continuously-updated plot of
the horizontal filter. This works well with demos/scale to test the
filter generation.
The plot is all the different subposition filters shuffled
together. This is misleading in a few cases:
IMPULSE.BOX - goes up and down as the subfilters have different
numbers of non-zero samples
IMPULSE.TRIANGLE - somewhat crooked for the same reason
1-wide filters - looks triangular, but a 1-wide box would be more
accurate
Changes by Søren: Rewrote the pixman-filter.c part to
- make it generate correct coordinates
- add a comment on how coordinates are generated
- in rounding.txt, add a ceil() variant of the first-sample
formula
- make the gnuplot output slightly prettier
v7: First time this ability was included
v8: Use config option
Moved code to the filter generator
Modified scale demo to not call filter generator a second time.
v10: Only print if successful generation of plots
Use #ifdef, not #if
v11: small whitespace fixes
v12: output range from -width/2 to width/2 and include y==0, to avoid misleading plots
for subsample_bits==0 and for box filters which may have no small values.
Signed-off-by: Bill Spitzak <spitzak@gmail.com>
This is very useful for comparing the results of SEPARABLE_CONVOLUTION
with BILINEAR and NEAREST.
v14: Removed good/best items
v15: Skip filter generation so gnuplot output continues showing previous value
Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
It now shows the initial value of 4 when the demo is started
Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Søren Sandmann <soren.sandmann@gmail.com>
Instead of using the boundary of xformed rectangle, use the boundary
of xformed ellipse. This is much more accurate and less blurry. In
particular the filtering does not change as the image is rotated.
Signed-off-by: Bill Spitzak <spitzak@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Soren Sandmann <soren.sandmann@gmail.com>
Generalize and simplify the code that reduces BILINEAR to NEAREST so
that the reduction happens for all affine transformations where
t00...t12 are integers and (t00 + t01) and (t10 + t11) are both
odd. This is a sufficient condition for the resulting transformed
coordinates to be exactly at the center of a pixel so that BILINEAR
becomes identical to NEAREST.
V2: Address some comments by Bill Spitzak
Signed-off-by: Søren Sandmann <soren.sandmann@gmail.com>
Reviewed-by: Bill Spitzak <spitzak@gmail.com>
This new test tests a bunch of bilinear downscalings, where many have
a transformation such that the BILINEAR filter can be reduced to
NEAREST (and many don't).
A CRC32 is computed for all the resulting images and compared to a
known-good value for both 4-bit and 7-bit interpolation.
V2: Remove leftover comment, some minor formatting fixes, use a
timestamp as the PRNG seed.
Signed-off-by: Søren Sandmann <soren.sandmann@gmail.com>
Reviewed-by: Bill Spitzak <spitzak@gmail.com>
When a BILINEAR filter is reduced to NEAREST, it is possible for both
types of fast paths to run; in this case, the NEAREST ones should be
preferred as that is the simpler filter.
Signed-off-by: Soren Sandmann <soren.sandmann@gmail.com>
Reviewed-by: Bill Spitzak <spitzak@gmail.com>
<float.h> is included unconditionally by pixman-private.h, which in
turn gets included by assembler files. Unfortunately, with certain C
libraries (like the musl C library), <float.h> cannot be included in
assembler files:
CCLD libpixman-arm-simd.la
/home/test/buildroot/output/host/usr/arm-buildroot-linux-musleabihf/sysroot/usr/include/float.h: Assembler messages:
/home/test/buildroot/output/host/usr/arm-buildroot-linux-musleabihf/sysroot/usr/include/float.h:8: Error: bad instruction `int __flt_rounds(void)'
/home/test/buildroot/output/host/usr/arm-buildroot-linux-musleabihf/sysroot/usr/include/float.h: Assembler messages:
/home/test/buildroot/output/host/usr/arm-buildroot-linux-musleabihf/sysroot/usr/include/float.h:8: Error: bad instruction `int __flt_rounds(void)'
It turns out however that <float.h> is not needed by assembly files,
so we move its inclusion within the #ifndef __ASSEMBLER__ condition,
which solves the problem.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Reviewed-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
The `check` target in test/Makefile.win32 assumed that any non-0 exit
code from the tests was an error, but the testsuite is currently using
77 as a SKIP exit code (based on the convention used in autotools).
Fixes fence-image-self-test and cover-test (now reported as SKIP).
Signed-off-by: Andrea Canciani <ranma42@gmail.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
The `rm` command is not usually available when running on Win32 in a
`cmd.exe` shell. Instead the shell provides the `del` builtin, which
has somewhat more limited wildcars expansion and error handling.
This makes all of the Makefile targets work on Win32 both using
`cmd.exe` and using the MSYS environment.
Signed-off-by: Simon Richter <Simon.Richter@hogyros.de>
Signed-off-by: Andrea Canciani <ranma42@gmail.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
When the build is performed using `cmd.exe` as shell, the `mkdir`
command does not support the `-p` flag. The ability to create multiple
netsted folder is not used, hence it can be easily replaced by only
creating the directory if it does not exist.
This makes the build work on the `cmd.exe` shell, except for the
`clean` targets.
Signed-off-by: Andrea Canciani <ranma42@gmail.com>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Instead of explicitly depending on "pixman" for the "all" and "check"
targets, rely on the dependency to the .lib file
Signed-off-by: Andrea Canciani <ranma42@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Since 3d81d89c29 BUILT_SOURCES is not
used anymore, but it was unintentionally left in Win32 Makefiles.
Signed-off-by: Andrea Canciani <ranma42@gmail.com>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
This patch modifies the SSE2 & SSSE3 tests in configure.ac to use a
global variable to initialize vector variables. In addition, we now
return the value of the computation instead of 0.
This is done so gcc 4.9 (and lower) won't optimize the SSE assembly
instructions (when using -O1 and higher), because then the configure test
might incorrectly pass even though the assembler doesn't support the
SSE instructions (the test will pass because the compiler does support
the intrinsics).
v2: instead of using volatile, use a global variable as input
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Older versions of clang emitted an error on the "K" constraint, but at
least since version 3.7 it is supported. Just like gcc, this
constraint is only allowed for constants, but apparently clang
requires them to be known before inlining.
Using the macro definition _mm_shuffle_pi16(A, N) ensures that the "K"
constraint is always applied to a literal constant, independently from
the compiler optimizations and allows building pixman-mmx on modern
clang.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrea Canciani <ranma42@gmail.com>
On MacOS X, according to the manpage of mprotect(), "When a program
violates the protections of a page, it gets a SIGBUS or SIGSEGV
signal.", but fence-image-self-test was only accepting a SIGSEGV as
notification of invalid access.
Fixes fence-image-self-test
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
We had lots of hacks to handle the inability to include xmmintrin.h
without compiling with -msse (lest SSE instructions be used in
pixman-mmx.c). Some recent version of gcc relaxed this restriction.
Change configure.ac to test that xmmintrin.h can be included and that we
can use some intrinsics from it, and remove the work-around code from
pixman-mmx.c.
Evidently allows gcc 4.9.3 to optimize better as well:
text data bss dec hex filename
657078 30848 680 688606 a81de libpixman-1.so.0.33.3 before
656710 30848 680 688238 a806e libpixman-1.so.0.33.3 after
Reviewed-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Tested-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Signed-off-by: Matt Turner <mattst88@gmail.com>
Patch "Remove the 8e extra safety margin in COVER_CLIP analysis" reduced
the required image area for setting the COVER flags in
pixman.c:analyze_extent(). Do the same reduction in affine-bench.
Leaving the old calculations in place would be very confusing for anyone
reading the code.
Also add a comment that explains how affine-bench wants to hit the COVER
paths. This explains why the intricate extent calculations are copied
from pixman.c.
[Pekka: split patch, change comments, write commit message]
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
As discussed in
http://lists.freedesktop.org/archives/pixman/2015-August/003905.html
the 8 * pixman_fixed_e (8e) adjustment which was applied to the transformed
coordinates is a legacy of rounding errors which used to occur in old
versions of Pixman, but which no longer apply. For any affine transform,
you are now guaranteed to get the same result by transforming the upper
coordinate as though you transform the lower coordinate and add (size-1)
steps of the increment in source coordinate space. No projective
transform routines use the COVER_CLIP flags, so they cannot be affected.
Proof by Siarhei Siamashka:
Let's take a look at the following affine transformation matrix (with 16.16
fixed point values) and two vectors:
| a b c |
M = | d e f |
| 0 0 0x10000 |
| x_dst |
P = | y_dst |
| 0x10000 |
| 0x10000 |
ONE_X = | 0 |
| 0 |
The current matrix multiplication code does the following calculations:
| (a * x_dst + b * y_dst + 0x8000) / 0x10000 + c |
M * P = | (d * x_dst + e * y_dst + 0x8000) / 0x10000 + f |
| 0x10000 |
These calculations are not perfectly exact and we may get rounding
because the integer coordinates are adjusted by 0.5 (or 0x8000 in the
16.16 fixed point format) before doing matrix multiplication. For
example, if the 'a' coefficient is an odd number and 'b' is zero,
then we are losing some of the least significant bits when dividing by
0x10000.
So we need to strictly prove that the following expression is always
true even though we have to deal with rounding:
| a |
M * (P + ONE_X) - M * P = M * ONE_X = | d |
| 0 |
or
((a * (x_dst + 0x10000) + b * y_dst + 0x8000) / 0x10000 + c)
-
((a * x_dst + b * y_dst + 0x8000) / 0x10000 + c)
=
a
It's easy to see that this is equivalent to
a + ((a * x_dst + b * y_dst + 0x8000) / 0x10000 + c)
- ((a * x_dst + b * y_dst + 0x8000) / 0x10000 + c)
=
a
Which means that stepping exactly by one pixel horizontally in the
destination image space (advancing 'x_dst' by 0x10000) is the same as
changing the transformed 'x_src' coordinate in the source image space
exactly by 'a'. The same applies to the vertical direction too.
Repeating these steps, we can reach any pixel in the source image
space and get exactly the same fixed point coordinates as doing
matrix multiplications per each pixel.
By the way, the older matrix multiplication implementation, which was
relying on less accurate calculations with three intermediate roundings
"((a + 0x8000) >> 16) + ((b + 0x8000) >> 16) + ((c + 0x8000) >> 16)",
also has the same properties. However reverting
http://cgit.freedesktop.org/pixman/commit/?id=ed39992564beefe6b12f81e842caba11aff98a9c
and applying this "Remove the 8e extra safety margin in COVER_CLIP
analysis" patch makes the cover test fail. The real reason why it fails
is that the old pixman code was using "pixman_transform_point_3d()"
function
http://cgit.freedesktop.org/pixman/tree/pixman/pixman-matrix.c?id=pixman-0.28.2#n49
for getting the transformed coordinate of the top left corner pixel
in the image scaling code, but at the same time using a different
"pixman_transform_point()" function
http://cgit.freedesktop.org/pixman/tree/pixman/pixman-matrix.c?id=pixman-0.28.2#n82
in the extents calculation code for setting the cover flag. And these
functions did the intermediate rounding differently. That's why the 8e
safety margin was needed.
** proof ends
However, for COVER_CLIP_NEAREST, the actual margins added were not 8e.
Because the half-way cases round down, that is, coordinate 0 hits pixel
index -1 while coordinate e hits pixel index 0, the extra safety margins
were actually 7e to the left and up, and 9e to the right and down. This
patch removes the 7e and 9e margins and restores the -e adjustment
required for NEAREST sampling in Pixman. For reference, see
pixman/rounding.txt.
For COVER_CLIP_BILINEAR, the margins were exactly 8e as there are no
additional offsets to be restored, so simply removing the 8e additions
is enough.
Proof:
All implementations must give the same numerical results as
bits_image_fetch_pixel_nearest() / bits_image_fetch_pixel_bilinear().
The former does
int x0 = pixman_fixed_to_int (x - pixman_fixed_e);
which maps directly to the new test for the nearest flag, when you consider
that x0 must fall in the interval [0,width).
The latter does
x1 = x - pixman_fixed_1 / 2;
x1 = pixman_fixed_to_int (x1);
x2 = x1 + 1;
When you write a COVER path, you take advantage of the assumption that
both x1 and x2 fall in the interval [0, width).
As samplers are allowed to fetch the pixel at x2 unconditionally, we
require
x1 >= 0
x2 < width
so
x - pixman_fixed_1 / 2 >= 0
x - pixman_fixed_1 / 2 + pixman_fixed_1 < width * pixman_fixed_1
so
pixman_fixed_to_int (x - pixman_fixed_1 / 2) >= 0
pixman_fixed_to_int (x + pixman_fixed_1 / 2) < width
which matches the source code lines for the bilinear case, once you delete
the lines that add the 8e margin.
Signed-off-by: Ben Avison <bavison@riscosopen.org>
[Pekka: adjusted commit message, left affine-bench changes for another patch]
[Pekka: add commit message parts from Siarhei]
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
Each of the aligns can only add a maximum of 15 bytes to the space
requirement. This permits some edge cases to use the stack buffer where
previously it would have deduced that a heap buffer was required.
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
As https://bugs.freedesktop.org/show_bug.cgi?id=92027#c6 explains,
the stack is allocated at the very top of the process address space
in some configurations (32-bit x86 systems with ASLR disabled).
And the careless computations done with the 'dest_buffer' pointer
may overflow, failing the buffer upper limit check.
The problem can be reproduced using the 'stress-test' program,
which segfaults when executed via setarch:
export CFLAGS="-O2 -m32" && ./autogen.sh
./configure --disable-libpng --disable-gtk && make
setarch i686 -R test/stress-test
This patch introduces the required corrections. The extra check
for negative 'width' may be redundant (the invalid 'width' value
is not supposed to reach here), but it's better to play safe
when dealing with the buffers allocated on stack.
Reported-by: Ludovic Courtès <ludo@gnu.org>
Signed-off-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Reviewed-by: soren.sandmann@gmail.com
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Some architectures, such as Microblaze and Nios2, currently do not
implement FE_DIVBYZERO, even though they have <fenv.h> and
feenableexcept(). This commit adds a configure.ac check to verify
whether FE_DIVBYZERO is defined or not, and if not, disables the
problematic code in test/utils.c.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Signed-off-by: Marek Vasut <marex@denx.de>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Now that we replaced the expensive functions with better performing
alternatives, we should remove them so they will not be used again.
Running Cairo benchmark on trimmed traces gave the following results:
POWER8, 8 cores, 3.4GHz, RHEL 7.2 ppc64le.
Speedups
========
t-firefox-scrolling 1232.30 -> 1096.55 : 1.12x
t-gnome-terminal-vim 613.86 -> 553.10 : 1.11x
t-evolution 405.54 -> 371.02 : 1.09x
t-firefox-talos-gfx 919.31 -> 862.27 : 1.07x
t-gvim 653.02 -> 616.85 : 1.06x
t-firefox-canvas-alpha 941.29 -> 890.42 : 1.06x
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
This patch optimizes vmx_composite_over_n_8888_8888_ca by removing use
of expand_alpha_1x128, unpack/pack and in_over_2x128 in favor of
splat_alpha, in_over and MUL/ADD macros from pixman_combine32.h.
Running "lowlevel-blt-bench -n over_8888_8888" on POWER8, 8 cores,
3.4GHz, RHEL 7.2 ppc64le gave the following results:
reference memcpy speed = 23475.4MB/s (5868.8MP/s for 32bpp fills)
Before After Change
--------------------------------------------
L1 244.97 474.05 +93.51%
L2 243.74 473.05 +94.08%
M 243.29 467.16 +92.02%
HT 144.03 252.79 +75.51%
VT 174.24 279.03 +60.14%
R 109.86 149.98 +36.52%
RT 47.96 53.18 +10.88%
Kops/s 524 576 +9.92%
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
This patch optimizes scaled_nearest_scanline_vmx_8888_8888_OVER and all
the functions it calls (combine1, combine4 and
core_combine_over_u_pixel_vmx).
The optimization is done by removing use of expand_alpha_1x128 and
expand_alpha_2x128 in favor of splat_alpha and MUL/ADD macros from
pixman_combine32.h.
Running "lowlevel-blt-bench -n over_8888_8888" on POWER8, 8 cores,
3.4GHz, RHEL 7.2 ppc64le gave the following results:
reference memcpy speed = 24847.3MB/s (6211.8MP/s for 32bpp fills)
Before After Change
--------------------------------------------
L1 182.05 210.22 +15.47%
L2 180.6 208.92 +15.68%
M 180.52 208.22 +15.34%
HT 130.17 178.97 +37.49%
VT 145.82 184.22 +26.33%
R 104.51 129.38 +23.80%
RT 48.3 61.54 +27.41%
Kops/s 430 504 +17.21%
v2: Check *pm is not NULL before dereferencing it in combine1()
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Enable the fast path added in the previous patch by moving the lookup
table entries to their proper locations.
Lowlevel-blt-bench benchmark statistics with 30 iterations, showing the
effect of adding this one patch on top of
"armv6: Add over_n_8888 fast path (disabled)", which was applied on
fd59569294.
Before After
Mean StdDev Mean StdDev Confidence Change
L1 12.5 0.04 45.2 0.10 100.00% +263.1%
L2 11.1 0.02 43.2 0.03 100.00% +289.3%
M 9.4 0.00 42.4 0.02 100.00% +351.7%
HT 8.5 0.02 25.4 0.10 100.00% +198.8%
VT 8.4 0.02 22.3 0.07 100.00% +167.0%
R 8.2 0.02 23.1 0.09 100.00% +183.6%
RT 5.4 0.05 11.4 0.21 100.00% +110.3%
At most 3 outliers rejected per test per set.
Iterating here means that lowlevel-blt-bench was executed 30 times, and
the statistics above were computed from the output.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
This new fast path is initially disabled by putting the entries in the
lookup table after the sentinel. The compiler cannot tell the new code
is not used, so it cannot eliminate the code. Also the lookup table size
will include the new fast path. When the follow-up patch then enables
the new fast path, the binary layout (alignments, size, etc.) will stay
the same compared to the disabled case.
Keeping the binary layout identical is important for benchmarking on
Raspberry Pi 1. The addresses at which functions are loaded will have a
significant impact on benchmark results, causing unexpected performance
changes. Keeping all function addresses the same across the patch
enabling a new fast path improves the reliability of benchmarks.
Benchmark results are included in the patch enabling this fast path.
[Pekka: disabled the fast path, commit message]
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
This test aims to verify both numerical correctness and the honouring of
array bounds for scaled plots (both nearest-neighbour and bilinear) at or
close to the boundary conditions for applicability of "cover" type fast paths
and iter fetch routines.
It has a secondary purpose: by setting the env var EXACT (to any value) it
will only test plots that are exactly on the boundary condition. This makes
it possible to ensure that "cover" routines are being used to the maximum,
although this requires the use of a debugger or code instrumentation to
verify.
Changes in v4:
Check the fence page size and skip the test if it is too large. Since
we need to deal with pixman_fixed_t coordinates that go beyond the
real image width, make the page size limit 16 kB. A 32 kB or larger
page size would cause an a8 image width to be 32k or more, which is no
longer representable in pixman_fixed_t.
Use a shorthand variable 'filter' in test_cover().
Whitespace adjustments.
Changes in v5:
Skip if fenced memory is not supported. Do you know of any such
platform?
Signed-off-by: Ben Avison <bavison@riscosopen.org>
[Pekka: changes in v4 and v5]
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
Acked-by: Oded Gabbay <oded.gabbay@gmail.com>
Add a new option to PIXMAN_DISABLE: "wholeops". This option disables all
whole-operation fast paths regardless of implementation level, except
the general path (general_composite_rect).
The purpose is to add a debug option that allows us to test optimized
iterator paths specifically. With this, it is possible to see if:
- fast paths mask bugs in iterators
- compare fast paths with iterator paths for performance
The effect was tested on x86_64 by running:
$ PIXMAN_DISABLE='' ./test/lowlevel-blt-bench over_8888_8888
$ PIXMAN_DISABLE='wholeops' ./test/lowlevel-blt-bench over_8888_8888
In the first case time is spent in sse2_composite_over_8888_8888(), and
in the latter in sse2_combine_over_u().
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Add a function to get the page size used for memory fence purposes, and
use it everywhere where getpagesize() was used.
This offers a single point in code to override the page size, in case
one wants to experiment how the tests work with a higher page size than
what the developer's machine has.
This also offers a clean API, without adding #ifdefs, to tests for
checking the page size.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
Used a wrong variable name, causing:
/home/pq/git/pixman/demos/../test/utils.c: In function ‘fence_image_create_bits’:
/home/pq/git/pixman/demos/../test/utils.c:562:46: error: ‘width’ undeclared (first use in this function)
Use the correct variable.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
Quoting Simon again: "It currently has the same effect as hardening=+bindnow,
but will automatically enable future hardening options and in case the package
will ever build binaries those are immediately protected with PIE as well."
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Quoting Simon Ruderich <simon@ruderich.org>:
"There's no need to use dpkg-buildflags manually in debian/rules.
Debhelper with compat=9 automatically enables the hardening flags when
dh_auto_configure is used. So just by calling dh_auto_configure [...]
the hardening flags get automatically passed to the build system.
DEB_BUILD_MAINT_OPTIONS is also respected."
Signed-off-by: Andreas Boll <andreas.boll.dev@gmail.com>
Tests that fence_malloc and fence_image_create_bits actually work: that
out-of-bounds and out-of-row (unused stride area) accesses trigger
SIGSEGV.
If fence_malloc is a dummy (FENCE_MALLOC_ACTIVE not defined), this test
is skipped.
Changes in v2:
- check FENCE_MALLOC_ACTIVE value, not whether it is defined
- test that reading bytes near the fence pages does not cause a
segmentation fault
Changes in v3:
- Do not print progress messages unless VERBOSE environment variable is
set. Avoid spamming the terminal output of 'make check' on some
versions of autotools.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
Useful for detecting out-of-bounds accesses in composite operations.
This will be used by follow-up patches adding new tests.
Changes in v2:
- fix style on fence_image_create_bits args
- add page to stride only if stride_fence
- add comment on the fallback definition about freeing storage
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
Define a new token to simplify checking whether fence_malloc() actually
can catch out-of-bounds access.
This will be used in the future to skip tests that rely on fence_malloc
checking functionality.
Changes in v2:
- #define FENCE_MALLOC_ACTIVE always, but change its value to help catch
use of it without including utils.h
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
Add mask details to the output.
[Pekka: redo whitespace and print src,dst,mask x and y.]
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
If a user gives multiple patterns or extra arguments, only the last one
was used as the pattern while the former were just ignored. This is a
user error silently converted to something possibly unexpected.
In presence of extra arguments, complain and quit.
Cc: Ben Avison <bavison@riscosopen.org>
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
It was benchmarked against commid id 2be523b from pixman/master
POWER8, 8 cores, 3.4GHz, RHEL 7.1 ppc64le.
reference memcpy speed = 24764.8MB/s (6191.2MP/s for 32bpp fills)
Before After Change
---------------------------------------------
L1 61.92 244.91 +295.53%
L2 62.74 243.3 +287.79%
M 63.03 241.94 +283.85%
HT 59.91 144.22 +140.73%
VT 59.4 174.39 +193.59%
R 53.6 111.37 +107.78%
RT 37.99 46.38 +22.08%
Kops/s 436 506 +16.06%
cairo trimmed benchmarks :
Speedups
========
t-xfce4-terminal-a1 1540.37 -> 1226.14 : 1.26x
t-firefox-talos-gfx 1488.59 -> 1209.19 : 1.23x
Slowdowns
=========
t-evolution 553.88 -> 581.63 : 1.05x
t-poppler 364.99 -> 383.79 : 1.05x
t-firefox-scrolling 1223.65 -> 1304.34 : 1.07x
The slowdowns can be explained in cases where the images are small and
un-aligned to 16-byte boundary. In that case, the function will first
work on the un-aligned area, even in operations of 1 byte. In case of
small images, the overhead of such operations can be more than the
savings we get from using the vmx instructions that are done on the
aligned part of the image.
In the C fast-path implementation, there is no special treatment for the
un-aligned part, as it works in 4 byte quantities on the entire image.
Because llbb is a synthetic test, I would assume it has much less
alignment issues than "real-world" scenario, such as cairo benchmarks,
which are basically recorded traces of real application activity.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
This patch adds the following helper functions for reuse of code,
hiding BE/LE differences and maintainability.
All of the functions were defined as static force_inline.
Names were copied from pixman-sse2.c so conversion of fast-paths between
sse2 and vmx would be easier from now on. Therefore, I tried to keep the
input/output of the functions to be as close as possible to the sse2
definitions.
The functions are:
- load_128_aligned : load 128-bit from a 16-byte aligned memory
address into a vector
- load_128_unaligned : load 128-bit from memory into a vector,
without guarantee of alignment for the
source pointer
- save_128_aligned : save 128-bit vector into a 16-byte aligned
memory address
- create_mask_16_128 : take a 16-bit value and fill with it
a new vector
- create_mask_1x32_128 : take a 32-bit pointer and fill a new
vector with the 32-bit value from that pointer
- create_mask_32_128 : take a 32-bit value and fill with it
a new vector
- unpack_32_1x128 : unpack 32-bit value into a vector
- unpacklo_128_16x8 : unpack the eight low 8-bit values of a vector
- unpackhi_128_16x8 : unpack the eight high 8-bit values of a vector
- unpacklo_128_8x16 : unpack the four low 16-bit values of a vector
- unpackhi_128_8x16 : unpack the four high 16-bit values of a vector
- unpack_128_2x128 : unpack the eight low 8-bit values of a vector
into one vector and the eight high 8-bit
values into another vector
- unpack_128_2x128_16 : unpack the four low 16-bit values of a vector
into one vector and the four high 16-bit
values into another vector
- unpack_565_to_8888 : unpack an RGB_565 vector to 8888 vector
- pack_1x128_32 : pack a vector and return the LSB 32-bit of it
- pack_2x128_128 : pack two vectors into one and return it
- negate_2x128 : xor two vectors with mask_00ff (separately)
- is_opaque : returns whether all the pixels contained in
the vector are opaque
- is_zero : returns whether the vector equals 0
- is_transparent : returns whether all the pixels
contained in the vector are transparent
- expand_pixel_8_1x128 : expand an 8-bit pixel into lower 8 bytes of a
vector
- expand_alpha_1x128 : expand alpha from vector and return the new
vector
- expand_alpha_2x128 : expand alpha from one vector and another alpha
from a second vector
- expand_alpha_rev_2x128 : expand a reversed alpha from one vector and
another reversed alpha from a second vector
- pix_multiply_2x128 : do pix_multiply for two vectors (separately)
- over_2x128 : perform over op. on two vectors
- in_over_2x128 : perform in-over op. on two vectors
v2: removed expand_pixel_32_1x128 as it was not used by any function and
its implementation was erroneous
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
This patch adds a macro for loading a single vector.
It also make the other LOAD_VECTORx macros use this macro as a base so
code would be re-used.
In addition, I fixed minor coding style issues.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
The memcpy speed measurement takes several seconds. When you are running
single tests in a harness that iterates dozens or hundreds of times, the
repeated measurements are redundant and take a lot of time. It is also
an open question whether the measured speed changes over long test runs
due to unidentified platform reasons (Raspberry Pi).
Add a command line option to set the reference memcpy speed, skipping
the measuring.
The speed is mainly used to compute how many iterations do run inside
the bench_*() functions, so for repeated testing on the same hardware,
it makes sense to lock that number to a constant.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
Add a command line option for choosing CSV output mode.
In CSV mode, only the results in Mpixels/s are printed in an easily
machine-parseable format. All user-friendly printing is suppressed.
This is intended for cases where you benchmark one particular operation
at a time. Running the "all" set of benchmarks will print just fine, but
you may have trouble matching rows to operations as you have to look at
the tests_tbl[] to see what row is which.
Reviewed-by: Ben Avison <bavison@riscosopen.org>
v2: don't add a space after comma in CSV.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Refactor the Mpixels/s computations into a function. Easier to read and
better documents what is being computed.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
The bench_* functions, that did not already do it, are modified to
return the number of pixels processed during the benchmark. This moves
the computation to the site that actually determines the number, and
simplifies bench_composite() a bit.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
Move the printing of the memory speed and scaling mode into a new
function. This will help with implementing a machine-readable output
option.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
When given just a single test pattern instead of "all", print the test
details. This can be used to verify the pattern parser agrees with the
user, just like scaling settings are printed.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
We assign string literals to it, so it better be const.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
Move explanation printing to a new function. This will help with
implementing a machine-readable output option.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
Move printing of usage into a new function and use argv[0] as the
program name. This will help printing usage from multiple places.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
vec_mergeh/l operates differently for BE and LE, because of the order of
the vector elements (l->r in BE and r->l in LE).
To fix that, we simply need to swap between the input parameters, in case
we are working in LE.
v2:
- replace _LITTLE_ENDIAN with WORDS_BIGENDIAN for consistency
- fixed whitespaces and indentation issues
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
v2: don't put ';' at the end of macro definition. Instead, move it to
each line the macro is used.
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
v2: fixed whitespaces and indentation issues
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Replaced usage of vec_lvsl to direct unaligned assignment
operation (=). That is because, according to Power ABI Specification,
the usage of lvsl is deprecated on ppc64le.
Changed COMPUTE_SHIFT_{MASK,MASKS,MASKC} macro usage to no-op for powerpc
little endian since unaligned access is supported on ppc64le.
v2:
- replace _LITTLE_ENDIAN with WORDS_BIGENDIAN for consistency
- fixed whitespaces and indentation issues
Signed-off-by: Fernando Seiti Furusato <ferseiti@linux.vnet.ibm.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
The permutation vector isn't correct for LE, so correct its values
in case we are in LE mode.
v2:
- replace _LITTLE_ENDIAN with WORDS_BIGENDIAN for consistency
- change #ifndef to #ifdef for readability
Signed-off-by: Oded Gabbay <oded.gabbay@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
These two architectures were the only place where
SIMPLE_NEAREST_SOLID_MASK_FAST_PATH was used, and in both cases the
equivalent SIMPLE_NEAREST_SOLID_MASK_FAST_PATH_NORMAL macro was used
immediately afterwards, so including the NORMAL case in the main macro
simplifies the fast path table.
[Pekka: removed extra comma from the end of
SIMPLE_NEAREST_SOLID_MASK_FAST_PATH]
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
There is some reordering, but the only significant thing to ensure that
the same routine is chosen is that a COVER fast path for a given
combination of operator and source/destination pixel formats must
precede all the variants of repeated fast paths for the same
combination. This patch (and the other mmx/sse2 one) still follows that
rule.
I believe that in every other case, the set of operations that match any
pair of fast paths that are reordered in these patches are mutually
exclusive. While there will be a very subtle timing difference due to
the distance through the table we have to search to find a match
(sometimes faster, sometime slower) there is no evidence that the tables
have been carefully ordered by frequency of occurrence - just for ease
of copy-and-pasting.
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
This macro does exactly the same thing as the platform-neutral macro
SIMPLE_NEAREST_A8_MASK_FAST_PATH.
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
This macro is a superset of the platform-neutral macro
SIMPLE_NEAREST_A8_MASK_FAST_PATH. In other words, in addition to the
_COVER, _NONE and _PAD suffixes, its expansion includes the _NORMAL suffix.
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
This macro does exactly the same thing as the platform-neutral macro
SIMPLE_NEAREST_FAST_PATH.
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
When generating test data, we need to make sure the interpretation of
the data is the same regardless of endianess. That is, the pixel value
for each channel is the same on both little and big-endians.
This fixes a test failure on ppc64 (big-endian).
Tested-by: Fernando Seiti Furusato <ferseiti@linux.vnet.ibm.com> (ppc64le, ppc64, powerpc)
Tested-by: Ben Avison <bavison@riscosopen.org> (armv6l, armv7l, i686)
[Pekka: added commit message]
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Tested-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk> (x86_64)
This places a heavier emphasis on solid images than the other fuzz testers,
and tests both single-pixel repeating bitmap images as well as those created
using pixman_image_create_solid_fill(). In the former case, it also
exercises the case where the bitmap contents are written to after the
image's first use, which is not a use-case that any other test has
previously covered.
[Pekka: added the default case to the switch in test_solid ().]
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Commit 6d2cf40166 ("MIPS: Fix exported symbols in public API") attempted to
add a .hidden assembly directive, conditional on the code being compiled for an
ELF target. Unfortunately the #ifdef added was already inside a macro and
wasn't expanded properly by the preprocessor.
Fix by removing the check. It's unlikely there are many non-ELF MIPS systems
around anyway.
Fixes: Bug 83358 (https://bugs.freedesktop.org/83358)
Fixes: 6d2cf40166 ("MIPS: Fix exported symbols in public API")
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Cc: Vicente Olivert Riera <Vincent.Riera@imgtec.com>
Cc: Nemanja Lukic <nemanja.lukic@rt-rk.com>
Acked-by: Siarhei Siamashka <siarhei.siamashka@gmail.com>
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Affine-bench is written by following the example of lowlevel-blt-bench.
Affine-bench differs from lowlevel-blt-bench in the following:
- does not test different sized operations fitting to specific caches,
destination is always 1920x1080
- allows defining the affine transformation parameters
- carefully computes operation extents to hit the COVER_CLIP fast paths
Original version by Ben Avison. Changes by Pekka in v3:
- commit message
- style fixes
- more comments
- refactoring (e.g. bench_info_t)
- help output tweak
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
When doing component alpha with a solid mask, use a mask format that has
all the color channels instead of just a8. As Ben Avison explains it:
"Lowlevel-blt-bench initialises all its images using memset(0xCC) so an
a8 solid image would be converted by _pixman_image_get_solid() to
0xCC000000 whereas an a8r8g8b8 would be 0xCCCCCCCC. When you're not in
component alpha mode, only the alpha byte matters for the mask image,
but in the case of component alpha operations, a fast path might decide
that it can save itself a lot of multiplications if it spots that 3
constant mask components are already 0."
No (default) test so far has a solid mask with CA. This is just
future-proofing lowlevel-blt-bench to do what one would expect.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
Let lowlevel-blt-bench parse the test name string from the command line,
allowing to run almost infinitely more tests. One is no longer limited
to the tests listed in the big table.
While you can use the old short-hand names like src_8888_8888, you can
also use all possible operators now, and specify pixel formats exactly
rather than just x888, for instance.
This even allows to run crazy patterns like
conjoint_over_reverse_a8b8g8r8_n_r8g8b8x8.
All individual patterns are now interpreted through the parser. The
pattern "all" runs the same old default test set as before but through
the parser instead of the hard-coded parameters.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
This patch is inspired by "lowlevel-blt-bench: Parse test name strings in
general case" by Ben Avison. From Ben's commit message:
"There are many types of composite operation that are useful to benchmark
but which are omitted from the table. Continually having to add extra
entries to the table is a nuisance and is prone to human error, so this
patch adds the ability to break down unknow strings of the format
<operation>_<src>[_<mask]_<dst>[_ca]
where bitmap formats are specified by number of bits of each component
(assumed in ARGB order) or 'n' to indicate a solid source or mask."
Add the parser to lowlevel-blt-bench.c, but do not hook it up to the
command line just yet. Instead, make it run a self-test.
As we now dynamically parse strings similar to the test names in the
huge table 'tests_tbl', we should make sure we can parse the old
well-known test names and produce exactly the same test parameters. The
self-test goes through this old table and verifies the parsing results.
Unfortunately the old table is not exactly consistent, it contains some
special cases that cannot be produced by the parsing rules. Whether
these special cases are intentional or just an oversight is not always
clear. Anyway, add a small table to reproduce the special cases
verbatim.
If we wanted, we could remove the big old table in a follow-up commit,
but then we would also lose the parser self-test.
The point of this whole excercise to let lowlevel-blt-bench recognize
novel test patterns in the future, following exactly the conventions
used in the old table.
Ben, from what I see, this parser has one major difference to what you
wrote. For a solid mask, your parser uses a8r8g8b8 format, while mine
uses a8 which comes from the old table.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
Lowlevel-blt-bench uses several pixel format shorthands. Pick them from
the great table in lowlevel-blt-bench.c and add them here so that
format_from_string() can recognize them.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
Lowlevel-blt-bench uses the operator alias "outrev". Add an alias for it
in the operator-name table.
Also add aliases for overrev, inrev and atoprev, so that
lowlevel-blt-bench can later recognize them for new test cases.
The aliases are added such, that an operator to name lookup will never
return them; it returns the proper names instead.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
Previously there was a flat list of formats, used to iterate over all
formats when looking up a format from name or listing them. This cannot
support name aliases.
To support name aliases (multiple name strings mapping to the same
format), create a format-name mapping table. Functions format_name(),
format_from_string(), and list_formats() should keep on working exactly
like before, except format_from_string() now recognizes the additional
formats that format_name() already supported.
The only the formats from the old format list are added with ENTRY, so
that list_formats() works as before. The whole list is verified against
the authoritative list in pixman.h, entries missing from the old list
are commented out.
The extra formats supported by the old format_name() are added as
ALIASes. A side-effect of that is that now also format_from_string()
recognizes the following new names: x4c4 / c8, x4g4 / g8, c4, g4, g1,
yuy2, yv12, null, solid, pixbuf, rpixbuf, unknown.
Name aliases will be useful in follow-up patches, where
lowlevel-blt-bench.c is converted to parse short-hand format names from
strings.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
Previously there was a flat list of operators (pixman_op_t), used to
iterate over all operators when looking up an operator from name or
listing them. This cannot support name aliases.
To support name aliases (multiple name strings mapping to the same
operator), create an operator-name mapping table. Functions
operator_name, operator_from_string, and list_operators should keep on
working exactly like before, except operator_from_string now recognizes
a few aliases too.
Name aliases will be useful in follow-up patches, where
lowlevel-blt-bench.c is converted to parse operator names from strings.
Lowlevel-blt-bench uses shorthand names instead of the usual names. This
change allows lowlevel-blt-bench.s to use operator_from_string in the
future.
Signed-off-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Reviewed-by: Ben Avison <bavison@riscosopen.org>
This permits format_from_string(), list_formats(), list_operators() and
operator_from_string() to be used from tests other than check-formats.
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
A few violations of coding style were identified in code copied from here
into affine-bench.
Reviewed-by: Pekka Paalanen <pekka.paalanen@collabora.co.uk>
Using "--disable-sse2 --disable-ssse3" configure options and
CFLAGS="-m32 -O2 -g" on an x86 system results in pixman "make check"
failures:
../test-driver: line 95: 29874 Aborted
FAIL: affine-test
../test-driver: line 95: 29887 Aborted
FAIL: scaling-test
One _mm_empty () was missing and another one is needed to workaround
an old GCC bug https://gcc.gnu.org/PR47759 (GCC may move MMX instructions
around and cause test suite failures).
Reviewed-by: Matt Turner <mattst88@gmail.com>
Since a4c79d695d the constant
BILINEAR_INTERPOLATION_BITS must be strictly less than 8, so fix the
comment to say this, and also add a COMPILE_TIME_ASSERT in the
bilinear fetcher in pixman-fast-path.c
The variables left_x, and right_x in gradient_walker_reset() are
computed from pos, which is a 64 bit quantity, so to avoid overflows,
these variables must be 64 bit as well.
Similarly, the left_x and right_x that are stored in
pixman_gradient_walker_t need to be 64 bit as well; otherwise,
pixman_gradient_walker_pixel() will call reset too often.
This fixes the radial-invalid test, which was generating 'invalid'
floating point exceptions when the overflows caused color values to be
outside of [0, 255].
This program demonstrates a bug in gradient walker, where some integer
overflows cause colors outside the range [0, 255] to be generated,
which in turns cause 'invalid' floating point exceptions when those
colors are converted to uint8_t.
The bug was first reported by Owen Taylor on the #cairo IRC channel.
Benchmark results, "before" is upstream/master
5f661ee719, and "after" contains this
patch on top.
lowlevel-blt-bench, src_8888_0565, 100 iterations:
Before After
Mean StdDev Mean StdDev Confidence Change
L1 25.9 0.20 115.6 0.70 100.00% +347.1%
L2 14.4 0.23 52.7 3.48 100.00% +265.0%
M 14.1 0.01 79.8 0.17 100.00% +465.9%
HT 10.2 0.03 32.9 0.31 100.00% +221.2%
VT 9.8 0.03 29.8 0.25 100.00% +203.4%
R 9.4 0.03 27.8 0.18 100.00% +194.7%
RT 4.6 0.04 10.9 0.29 100.00% +135.9%
At most 19 outliers rejected per test per set.
cairo-perf-trace with trimmed traces results were indifferent.
A system-wide perf_3.10 profile on Raspbian shows significant
differences in the X server CPU usage. The following were measured from
a 130x62 char lxterminal running 'dmesg' every 0.5 seconds for roughly
30 seconds. These profiles are libpixman.so symbols only.
Before:
Samples: 63K of event 'cpu-clock', Event count (approx.): 2941348112, DSO: libpixman-1.so.0.33.1
37.77% Xorg [.] fast_fetch_r5g6b5
14.39% Xorg [.] pixman_composite_over_n_8_8888_asm_armv6
8.51% Xorg [.] fast_write_back_r5g6b5
7.38% Xorg [.] pixman_composite_src_8888_8888_asm_armv6
4.39% Xorg [.] pixman_composite_add_8_8_asm_armv6
3.69% Xorg [.] pixman_composite_src_n_8888_asm_armv6
2.53% Xorg [.] _pixman_image_validate
2.35% Xorg [.] pixman_image_composite32
After:
Samples: 31K of event 'cpu-clock', Event count (approx.): 3619782704, DSO: libpixman-1.so.0.33.1
22.36% Xorg [.] pixman_composite_over_n_8_8888_asm_armv6
13.59% Xorg [.] pixman_composite_src_x888_0565_asm_armv6
12.75% Xorg [.] pixman_composite_src_8888_8888_asm_armv6
6.79% Xorg [.] pixman_composite_add_8_8_asm_armv6
5.95% Xorg [.] pixman_composite_src_n_8888_asm_armv6
4.12% Xorg [.] pixman_image_composite32
3.69% Xorg [.] _pixman_image_validate
3.65% Xorg [.] _pixman_bits_image_setup_accessors
Before, fast_fetch_r5g6b5 + fast_write_back_r5g6b5 took 46% of the
samples in libpixman, and probably incurred some memcpy() load, too.
After, pixman_composite_src_x888_0565_asm_armv6 takes 14%. Note, that
the sample counts are very different before/after, as less time is spent
in Pixman and running time is not exactly the same.
Furthermore, in the above test, the CPU idle function was sampled 9%
before, and 15% after.
v4, Pekka Paalanen <pekka.paalanen@collabora.co.uk> :
Re-benchmarked on Raspberry Pi, commit message.
The two ARM headers contained open-coded copies of pixman_asm_function,
replace these.
Since it seems customary that ARM headers do not use CPP include guards,
rely on the .S files to #include "pixman-arm-asm.h" first. They all
do now.
v2: Fix a build failure on rpi by adding one #include.
Benchmark results, "before" is the patch
* upstream/master 4b76bbfda6
+ ARMv6: Support for very variable-hungry composite operations
+ ARMv6: Add fast path for over_n_8888_8888_ca
and "after" contains the additional patches on top:
+ ARMv6: Add fast path flag to force no preload of destination buffer
+ ARMv6: Add fast path for in_reverse_8888_8888 (this patch)
lowlevel-blt-bench, in_reverse_8888_8888, 100 iterations:
Before After
Mean StdDev Mean StdDev Confidence Change
L1 21.1 0.07 32.3 0.08 100.00% +52.9%
L2 11.6 0.29 18.0 0.52 100.00% +54.4%
M 10.5 0.01 16.1 0.03 100.00% +54.1%
HT 8.2 0.02 12.0 0.04 100.00% +45.9%
VT 8.1 0.02 11.7 0.04 100.00% +44.5%
R 8.1 0.02 11.3 0.04 100.00% +39.7%
RT 4.8 0.04 6.1 0.09 100.00% +27.3%
At most 12 outliers rejected per test per set.
cairo-perf-trace with trimmed traces, 30 iterations:
Before After
Mean StdDev Mean StdDev Confidence Change
t-firefox-paintball.trace 18.0 0.01 14.1 0.01 100.00% +27.4%
t-firefox-chalkboard.trace 36.7 0.03 36.0 0.02 100.00% +1.9%
t-firefox-canvas-alpha.trace 20.7 0.22 20.3 0.22 100.00% +1.9%
t-swfdec-youtube.trace 7.8 0.03 7.8 0.03 100.00% +0.9%
t-firefox-talos-gfx.trace 25.8 0.44 25.6 0.29 93.87% +0.7% (insignificant)
t-firefox-talos-svg.trace 20.6 0.04 20.6 0.03 100.00% +0.2%
t-firefox-fishbowl.trace 21.2 0.04 21.1 0.02 100.00% +0.2%
t-xfce4-terminal-a1.trace 4.8 0.01 4.8 0.01 98.85% +0.2% (insignificant)
t-swfdec-giant-steps.trace 14.9 0.03 14.9 0.02 99.99% +0.2%
t-poppler-reseau.trace 22.4 0.11 22.4 0.08 86.52% +0.2% (insignificant)
t-gnome-system-monitor.trace 17.3 0.03 17.2 0.03 99.74% +0.2%
t-firefox-scrolling.trace 24.8 0.12 24.8 0.11 70.15% +0.1% (insignificant)
t-firefox-particles.trace 27.5 0.18 27.5 0.21 48.33% +0.1% (insignificant)
t-grads-heat-map.trace 4.4 0.04 4.4 0.04 16.61% +0.0% (insignificant)
t-firefox-fishtank.trace 13.2 0.01 13.2 0.01 7.64% +0.0% (insignificant)
t-firefox-canvas.trace 18.0 0.05 18.0 0.05 1.31% -0.0% (insignificant)
t-midori-zoomed.trace 8.0 0.01 8.0 0.01 78.22% -0.0% (insignificant)
t-firefox-planet-gnome.trace 10.9 0.02 10.9 0.02 64.81% -0.0% (insignificant)
t-gvim.trace 33.2 0.21 33.2 0.18 38.61% -0.1% (insignificant)
t-firefox-canvas-swscroll.trace 32.2 0.09 32.2 0.11 73.17% -0.1% (insignificant)
t-firefox-asteroids.trace 11.1 0.01 11.1 0.01 100.00% -0.2%
t-evolution.trace 13.0 0.05 13.0 0.05 91.99% -0.2% (insignificant)
t-gnome-terminal-vim.trace 19.9 0.14 20.0 0.14 97.38% -0.4% (insignificant)
t-poppler.trace 9.8 0.06 9.8 0.04 99.91% -0.5%
t-chromium-tabs.trace 4.9 0.02 4.9 0.02 100.00% -0.6%
At most 6 outliers rejected per test per set.
Cairo perf reports the running time, but the change is computed for
operations per second instead (inverse of running time).
Confidence is based on Welch's t-test. Absolute changes less than 1%
can be accounted as measurement errors, even if statistically
significant.
There was a question of why FLAG_NO_PRELOAD_DST is used. It makes
lowlevel-blt-bench results worse except for L1, but improves some
Cairo trace benchmarks.
"Ben Avison" <bavison@riscosopen.org> wrote:
> The thing with the lowlevel-blt-bench benchmarks for the more
> sophisticated composite types (as a general rule, anything that involves
> branches at the per-pixel level) is that they are only profiling the case
> where you have mid-level alpha values in the source/mask/destination.
> Real-world images typically have a disproportionate number of fully
> opaque and fully transparent pixels, which is why when there's a
> discrepancy between which implementation performs best with cairo-perf
> trace versus lowlevel-blt-bench, I usually favour the Cairo winner.
>
> The results of removing FLAG_NO_PRELOAD_DST (in other words, adding
> preload of the destination buffer) are easy to explain in the
> lowlevel-blt-bench results. In the L1 case, the destination buffer is
> already in the L1 cache, so adding the preloads is simply adding extra
> instruction cycles that have no effect on memory operations. The "in"
> compositing operator depends upon the alpha of both source and
> destination, so if you use uniform mid-alpha, then you actually do need
> to read your destination pixels, so you benefit from preloading them. But
> for fully opaque or fully transparent source pixels, you don't need to
> read the corresponding destination pixel - it'll either be left alone or
> overwritten. Since the ARM11 doesn't use write-allocate cacheing, both of
> these cases avoid both the time taken to load the extra cachelines, as
> well as increasing the efficiency of the cache for other data. If you
> examine the source images being used by the Cairo test, you'll probably
> find they mostly use transparent or opaque pixels.
v4, Pekka Paalanen <pekka.paalanen@collabora.co.uk> :
Rebased, re-benchmarked on Raspberry Pi, commit message.
v5, Pekka Paalanen <pekka.paalanen@collabora.co.uk> :
Rebased, re-benchmarked on Raspberry Pi due to a fix to
"ARMv6: Add fast path for over_n_8888_8888_ca" patch.
Previously, the variable ARGS_STACK_OFFSET was available to extract values
from function arguments during the init macro. Now this changes dynamically
around stack operations in the function as a whole so that arguments can be
accessed at any point. It is also joined by LOCALS_STACK_OFFSET, which
allows access to space reserved on the stack during the init macro.
On top of this, composite macros now have the option of using all of WK0-WK3
registers rather than just the subset it was told to use; this requires the
pixel count to be spilled to the stack over the leading pixels at the start
of each line. Thus, at best, each composite operation can use 11 registers,
plus any pointer registers not required for the composite type, plus as much
stack space as it needs, divided up into constants and variables as necessary.
In create_bits() both height and stride are ints, so the result is
also an int, which will overflow if height or stride are big enough
and size_t is bigger than int.
This patch simply casts height to size_t to prevent these overflows,
which prevents the crash in:
https://bugzilla.redhat.com/show_bug.cgi?id=972647
It's not even close to fixing the full problem of supporting big
images in pixman.
See also
https://bugs.freedesktop.org/show_bug.cgi?id=69014
Several files define identically the asm macro pixman_asm_function.
Merge all these definitions into a new asm header.
The original definition is taken from pixman-arm-simd-asm-scaled.S with
the copyright/licence/author blurb verbatim.
Compiling with the Intel Compiler reveals a problem:
tolerance-test.c(350): error: index variable "i" of for statement following an OpenMP for pragma must be private
# pragma omp parallel for default(none) shared(i) private (result)
^
In addition to this, the 'result' variable also should not be private
(otherwise its value does not survive after the end of the loop). It
needs to be either shared or use the reduction clause to describe how
the results from multiple threads are combined together. Reduction
seems to be more appropriate here.
The Intel Compiler 14.0.0 claims version GCC 4.7.3 compatibility
via __GNUC__/__GNUC__MINOR__ macros, but does not provide the same
level of GCC vector extensions support as the original GCC compiler:
http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html
Which results in the following compilation failure:
In file included from ../test/utils.h(7),
from ../test/utils.c(3):
../test/utils-prng.h(138): error: expression must have integral type
uint32x4 e = x->a - ((x->b << 27) + (x->b >> (32 - 27)));
^
The problem is fixed by doing a special check in configure for
this feature.
in_reverse_8888_8888 is one of the more commonly used operations in the
cairo-perf-trace suite that hasn't been in lowlevel-blt-bench until now.
v4, Pekka Paalanen <pekka.paalanen@collabora.co.uk> :
Split from "Add extra test to lowlevel-blt-bench and fix an
existing one", new summary.
This knocks off one instruction per row. The effect is probably too small to
be measurable, but might as well be included. The second occurrence of this
sequence doesn't actually benefit at all, but is changed for consistency.
The saved instruction comes from combining the "and" inside the .if
statement with an earlier "tst". The "and" was normally needed, except
for in one special case, where bits 4-31 were all shifted off the top of
the register later on in preload_leading_step2, so we didn't care about
their values.
v4, Pekka Paalanen <pekka.paalanen@collabora.co.uk> :
Remove "bits 0-3" from the comments, update patch summary, and
augment message with Ben's suggestion.
An upcoming commit will delete many of the operators from
pixman-combine32.c and rely on the ones in pixman-combine-float.c. The
comments about how the operators were derived are still useful though,
so copy them into pixman-combine-float.c before the deletion.
Consider a HARD_LIGHT operation with the following pixels:
- source: 15 (6 bits)
- source alpha: 255 (8 bits)
- mask alpha: 223 (8 bits)
- dest 255 (8 bits)
- dest alpha: 0 (8 bits)
Since 2 times the source is less than source alpha, the first branch
of the hard light blend mode is taken:
(1 - sa) * d + (1 - da) * s + 2 * s * d
Since da is 0 and d is 1, this degenerates to:
(1 - sa) + 3 * s
Taking (src IN mask) into account along with the fact that sa is 1,
this becomes:
(1 - ma) + 3 * s * ma
= (1 - 223/255.0) + 3 * (15/63.0) * (223/255.0)
= 0.7501400560224089
When computed with the source converted by bit replication to eight
bits, and additionally with the (src IN mask) part rounded to eight
bits, we get:
ma = 223/255.0
s * ma = (60 / 255.0) * (223/255.0) which rounds to 52 / 255
and the result is
(1 - ma) + 3 * s * ma
= (1 - 223/255.0) + 3 * 52/255.0
= 0.7372549019607844
so now we have an error of 0.012885.
Without making changes to the way pixman does integer
rounding/arithmetic, this error must then be considered
acceptable. Due to conservative computations in the test suite we can
however get away with 0.0128 as the acceptable deviation.
This fixes the remaining failures in pixel-test.
Consider a DISJOINT_ATOP operation with the following pixels:
- source: 0xff (8 bits)
- source alpha: 0x01 (8 bits)
- mask alpha: 0x7b (8 bits)
- dest: 0x00 (8 bits)
- dest alpha: 0xff (8 bits)
When (src IN mask) is computed in 8 bits, the resulting alpha channel
is 0 due to rounding:
floor ((0x01 * 0x7b) / 255.0 + 0.5) = floor (0.9823) = 0
which means that since Render defines any division by zero as
infinity, the Fa and Fb for this operator end up as follows:
Fa = max (1 - (1 - 1) / 0, 0) = 0
Fb = min (1, (1 - 0) / 1) = 1
and so since dest is 0x00, the overall result is 0.
However, when computed in full precision, the alpha value no longer
rounds to 0, and so Fa ends up being
Fa = max (1 - (1 - 1) / 0.0001, 0) = 1
and so the result is now
s * ma * Fa + d * Fb
= (1.0 * (0x7b / 255.0) * 1) + d * 0
= 0x7b / 255.0
= 0.4823
so the error in this case ends up being 0.48235294, which is clearly
not something that can be considered acceptable.
In order to avoid this problem, we need to do all arithmetic in such a
way that a multiplication of two tiny numbers can never end up being
zero unless one of the input numbers is itself zero.
This patch makes all computations that involve divisions take place in
floating point, which is sufficient to fix the test cases
This brings the number of failures in pixel-test down to 14.
The Soft Light operator has several branches. One them is decided
based on whether 2 * s is less than or equal to 2 * sa. In floating
point implementations, when those two values are very close to each
other, it may not be completely predictable which branch we hit.
This is a problem because in one branch, when destination alpha is
zero, we get the result
r = d * as
and in the other we get
r = 0
So when d and as are not 0, this causes two different results to be
returned from essentially identical input values. In other words,
there is a discontinuity in the current implementation.
This patch randomly changes the second branch such that it now returns
d * sa instead. There is no deep meaning behind this, because
essentially this is an attempt to assign meaning to division by zero,
and all that is requires is that that meaning doesn't depend on minute
differences in input values.
This makes the number of failed pixels in pixel-test go down to 347.
In the component alpha part of the PDF_SEPARABLE_BLEND_MODE macro, the
expression ~RED_8 (m) is used. Because RED_8(m) gets promoted to int
before ~ is applied, the whole expression typically becomes some
negative value rather than (255 - RED_8(m)) as desired.
Fix this by using unsigned temporary variables.
This reduces the number of failures in pixel-test to 363.
This commit fixes four separate bugs:
1. In the computation
(1 - sa) * d + (1 - da) * s + sa * da * B(s, d)
we were using regular addition for all four channels, but for
superluminescent pixels, the addition could overflow causing
nonsensical results.
2. The variables and return types used for the results of the blend
mode calculations were unsigned, but for various blend modes (and
especially with superluminescent pixels), the blend mode
calculations could be negative, resulting in underflows.
3. The blend mode computations were returned as 8-bit values, which is
not sufficient precision (especially considering that we need
signed results).
4. The value before the final division by 255 was not properly clamped
to [0, 255].
This patch fixes all those bugs. The blend mode computations are now
returned as signed 16 bit values with 1 represented as 255 * 255.
With these fixes, the number of failing pixels in pixel-test goes down
from 431 to 384.
This commit adds a large number of pixel regressions to
pixel-test. All of these have at some point been failing in
blend-mode-test, and most of them do fail currently.
To be specific, with this commit, pixel-test reports 431 failed tests.
This new test program is similar to test/composite in that it relies
on the pixel_checker_t API to do tolerance based verification. But
unlike the composite test, which verifies combinations of a fixed set
of pixels, this one generates random images and verifies that those
composite correctly.
Also unlike composite, tolerance-test supports all the separable blend
mode operators in addition to the original Render operators.
When tests fail, a C struct is printed that can be pasted into
pixel-test for regression purposes.
There is an option "--forever" which causes the random seed to be set
to the current time, and then the test runs until interrupted. This is
useful for overnight runs.
This test currently fails badly due to various bugs in the blend mode
operators. Later commits will fix those.
Support is added to pixel-test for verifying operations involving
masks. If a regression includes a mask, it is verified with the
pixel_checker API in in both unified and component alpha modes.
With GCC 4.8.2 the COMPILE_TIME_ASSERT macro produces a spurious
warning about an unused local typedef:
In file included from pixman.c:29:0:
pixman.c: In function 'optimize_operator':
pixman-private.h:1019:22: warning: typedef 'compile_time_assertion' locally defined but not used [-Wunused-local-typedefs]
The flag -Wno-unused-local-typedefs suppresses that warning.
Currently, if you attempt to use component alpha on source images or
images without RGB channels, Pixman will silently just use unified
alpha instead. This patch makes such images supported for component
alpha.
There is no particularly compelling usecase at the moment, but this
patch does get rid of a bit of special-case code both in
pixman-general.c and in test/composite.c.
The call_test_function() contains some assembly that deliberately
causes the stack to be aligned to 32 bits rather than 128 bits on
x86-32. The intention is to catch bugs that surface when pixman is
called from code that only uses a 32 bit alignment.
However, recent versions of GCC apparently make the assumption (either
accidentally or deliberately) that that the incoming stack is aligned
to 128 bits, where older versions only seemed to make this assumption
when compiling with -msse2. This causes the vector code in the PRNG to
now segfault when called from call_test_function() on x86-32.
This patch fixes that by only making the stack unaligned on 32 bit
Windows, where it would definitely be incorrect for GCC to assume that
the incoming stack is aligned to 128 bits.
V2: Put "defined(...)" around __GNUC__
Reviewed-and-Tested-by: Matt Turner <mattst88@gmail.com>
Bugzilla: https://bugs.gentoo.org/show_bug.cgi?id=491110
SSSE3 is detected by bit 9 of ECX, but we were checking bit 9 of EDX
which is APIC leading to SSSE3 routines being called on CPUs without
SSSE3.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Without this, if tarballs are generated on a system that doesn't have
GTK+ 2 development headers available, the files in EXTRA_DIST will not
be included, which then causes builds from the tarball to fail on
systems that do have GTK+ 2 headers available.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=71465
RELEASE_TYPE=$$(iftest"x$(PIXMAN_VERSION_MINOR)"="x$$(echo "$(PIXMAN_VERSION_MINOR)/2*2" | bc)";thenecho"stable release in the";elseecho"development snapshot leading up to a stable";fi)
release-publish-message:$(HASHFILES)ensure-prev
@echo "Please follow the instructions in RELEASING to push stuff out and"
@echo "send out the announcement mails. Here is the excerpt you need:"
@echo ""
@echo "Lists: $(RELEASE_ANNOUNCE_LIST)"
@echo "Subject: [ANNOUNCE] $(PACKAGE) release $(VERSION) now available"
@echo "============================== CUT HERE =============================="
@echo "A new $(PACKAGE) release $(VERSION) is now available. This is a $(RELEASE_TYPE)"
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.