Previously this routine would compute (x * a + y * b) / 255. Now it
computes (x * a) / 255 + (y * b) / 255, so that the results are
bitwise equivalent to the non-mmx versions.
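The difference between the two formulas can be seen on a single 8-bit
component. The following is a minimal sketch, not the routine itself: the
helper names are hypothetical, and the rounded division by 255 is assumed
to be the usual shift-based approximation.

```c
#include <stdint.h>

/* Assumed helper: integer approximation of t / 255 with rounding. */
static uint32_t
div_255 (uint32_t t)
{
    t += 0x80;
    return (t + (t >> 8)) >> 8;
}

/* Old behaviour: add the two products, then divide once. */
static uint8_t
blend_combined (uint8_t x, uint8_t a, uint8_t y, uint8_t b)
{
    uint32_t r = div_255 ((uint32_t) x * a + (uint32_t) y * b);

    return r > 255 ? 255 : (uint8_t) r;
}

/* New behaviour: divide each product on its own, then add and clamp. */
static uint8_t
blend_separate (uint8_t x, uint8_t a, uint8_t y, uint8_t b)
{
    uint32_t r = div_255 ((uint32_t) x * a) + div_255 ((uint32_t) y * b);

    return r > 255 ? 255 : (uint8_t) r;
}
```

With x = y = 1 and a = b = 100, each product rounds to 0 on its own, while
the sum of the products rounds to 1, so the two formulas disagree by one.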
Previously they were not bit-for-bit equivalent to the one-component
versions. The new code is also simpler and easier to read because it
factors out some common sub-macros.
The x * a + y * b macro now only uses four multiplications - the
previous version used eight.
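As an illustration of how the multiplication count can be halved, two
8-bit components can share one multiplication when they are spread out
with 8 bits of headroom each. This is a sketch in plain C under that
assumption; the function names are hypothetical and the actual macro
operates on MMX registers.

```c
#include <stdint.h>

/* Multiply the two 8-bit components held in bits 0-7 and 16-23 of rb
 * by a, dividing each by 255 with rounding. One multiplication covers
 * both components. */
static uint32_t
mul_rb (uint32_t rb, uint8_t a)
{
    uint32_t t = rb * a + 0x800080;

    t = (t + ((t >> 8) & 0xff00ff)) >> 8;
    return t & 0xff00ff;
}

/* Multiply all four components of a 32-bit pixel by a: two
 * multiplications in total, so x * a + y * b needs four. */
static uint32_t
mul_un8x4 (uint32_t x, uint8_t a)
{
    uint32_t rb = mul_rb (x & 0xff00ff, a);
    uint32_t ag = mul_rb ((x >> 8) & 0xff00ff, a);

    return rb | (ag << 8);
}
```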
Previously the code assumed that an alpha of 0 meant that no change
would take place. This is incorrect because an alpha of 0 can happen
as the result of the source having alpha=0, but rgb != 0.
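To see why alpha=0 does not imply "no change", consider one colour channel
of a premultiplied OVER operation. The operator is an assumption made for
illustration only (the text above does not name one), and over_channel()
is a hypothetical helper.

```c
#include <stdint.h>

/* One channel of premultiplied OVER: s + d * (255 - sa) / 255, clamped.
 * s is the source channel, sa the source alpha, d the destination. */
static uint8_t
over_channel (uint8_t s, uint8_t sa, uint8_t d)
{
    uint32_t t = (uint32_t) d * (255 - sa) + 0x80;

    t = (t + (t >> 8)) >> 8;    /* rounded division by 255 */
    t += s;
    return t > 255 ? 255 : (uint8_t) t;
}
```

With sa = 0, s = 0x40 and d = 0x10 the result is 0x50, not 0x10: the
destination does change even though the source alpha is zero.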
By default both are initialized to bits_image_fetch_pixel_raw(), but if
there is an alpha map, then fetch_pixel_32() is set to
bits_image_fetch_pixel_alpha().
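The selection described above might look roughly like the sketch below.
Everything in it is a simplified stand-in: the struct layout, the
fetch_pixel_raw_32 field name, and the way the alpha map is combined
(alpha taken from the map at the same coordinates) are assumptions, not
the real bits image code.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct image image_t;
typedef uint32_t (*fetch_pixel_fn) (const image_t *img, int x, int y);

/* Hypothetical, stripped-down image structure. */
struct image
{
    const uint32_t *bits;
    int             width;
    const image_t  *alpha_map;          /* NULL when there is none */
    fetch_pixel_fn  fetch_pixel_raw_32; /* hypothetical field name */
    fetch_pixel_fn  fetch_pixel_32;
};

/* Fetch the pixel exactly as stored. */
static uint32_t
fetch_pixel_raw (const image_t *img, int x, int y)
{
    return img->bits[y * img->width + x];
}

/* Fetch the colour from the image and the alpha from its alpha map. */
static uint32_t
fetch_pixel_alpha (const image_t *img, int x, int y)
{
    uint32_t color = fetch_pixel_raw (img, x, y);
    uint32_t alpha = fetch_pixel_raw (img->alpha_map, x, y);

    return (color & 0x00ffffff) | (alpha & 0xff000000);
}

/* Both fetchers default to the raw one; the alpha-aware fetcher is
 * installed only when an alpha map is present. */
static void
image_setup_fetchers (image_t *img)
{
    img->fetch_pixel_raw_32 = fetch_pixel_raw;
    img->fetch_pixel_32 = img->alpha_map ? fetch_pixel_alpha : fetch_pixel_raw;
}
```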
Previously, it would generate a buffer of coordinates, then pass that
off to a pixel fetcher, but this caused a large performance regression
with the swfdec-fill-rate-2xfsaa cairo trace.
This is the first step towards fixing that.
Soeren rightfully complained that I had removed all the comments from
André's patch, most importantly those that explain why the transformation
is valid. So add a few details to show that B varies linearly across the
scanline and how we can therefore reduce the per-pixel cost of evaluating
B.
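A small sketch of the idea, assuming B has the form
-2 * (pdx * cdx + pdy * cdy + r1 * dr) with pdx = px - c1x and
pdy = py - c1y. The struct and function names are hypothetical, and
doubles are used for simplicity. Because px and py advance by a constant
step per pixel, B advances by a constant too, so the inner loop needs one
addition instead of a full evaluation.

```c
/* Hypothetical gradient parameters: first circle centre and radius,
 * and the differences to the second circle. */
typedef struct
{
    double c1x, c1y, r1;
    double cdx, cdy, dr;
} radial_t;

/* Evaluate B from scratch at the point (px, py). */
static double
b_direct (const radial_t *g, double px, double py)
{
    double pdx = px - g->c1x;
    double pdy = py - g->c1y;

    return -2.0 * (pdx * g->cdx + pdy * g->cdy + g->r1 * g->dr);
}

/* Evaluate B for n pixels of a scanline starting at (px, py) and
 * advancing by (step_x, step_y) per pixel. B is computed once and then
 * updated by the constant increment dB. */
static void
b_scanline (const radial_t *g, double px, double py,
            double step_x, double step_y, int n, double *out)
{
    double B = b_direct (g, px, py);
    double dB = -2.0 * (step_x * g->cdx + step_y * g->cdy);
    int i;

    for (i = 0; i < n; i++)
    {
        out[i] = B;
        B += dB;
    }
}
```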
Fixes: Bug 22908 -- Invalid output of radial gradient
http://bugs.freedesktop.org/show_bug.cgi?id=22908
We also include a modified patch by André Tupinambá <andrelrt@gmail.com>,
to pull constant expressions out of the inner radial gradient walker.
Microsoft C++ does not define __m64 or any of the related MMX functions
on x64. However, it still succeeds in generating object files for the
SSE2 code inside pixman.
The real problem happens during linking, when it cannot find the MMX
functions (which are not defined as intrinsics for the AMD64 platform).
I have implemented those missing functions in ordinary, non-intrinsic code.
MMX __m64 is used relatively sparingly within the SSE2 implementation, and
the performance impact is probably negligible.
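As an illustration of the approach only, an MMX-style operation can be
written without intrinsics by treating the 64-bit value as four 16-bit
lanes. The type and function names below are hypothetical and are not
the replacements actually added; the semantics shown are those of a
wrapping 16-bit lane add.

```c
#include <stdint.h>

/* Hypothetical stand-in for __m64: 64 bits viewed as four 16-bit lanes. */
typedef struct
{
    uint64_t bits;
} emu_m64;

/* Lane-wise 16-bit addition with wraparound, written in plain C. */
static emu_m64
emu_add_pi16 (emu_m64 a, emu_m64 b)
{
    emu_m64 r = { 0 };
    int lane;

    for (lane = 0; lane < 4; lane++)
    {
        uint16_t x = (uint16_t) (a.bits >> (16 * lane));
        uint16_t y = (uint16_t) (b.bits >> (16 * lane));
        uint16_t s = (uint16_t) (x + y);    /* wraps modulo 65536 */

        r.bits |= (uint64_t) s << (16 * lane);
    }
    return r;
}
```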
Bug 22390.
During the fast-path query, the read_func and write_func from the bits
structure are queried for the solid image.
==32723== Conditional jump or move depends on uninitialised value(s)
==32723== at 0x412AF20: _pixman_run_fast_path (pixman-utils.c:681)
==32723== by 0x4136319: sse2_composite (pixman-sse2.c:5554)
==32723== by 0x4100CD2: _pixman_implementation_composite (pixman-implementation.c:227)
==32723== by 0x412396E: pixman_image_composite (pixman.c:140)
==32723== by 0x4123D64: pixman_image_fill_rectangles (pixman.c:322)
==32723== by 0x40482B7: _cairo_image_surface_fill_rectangles (cairo-image-surface.c:1180)
==32723== by 0x4063BE7: _cairo_surface_fill_rectangles (cairo-surface.c:1883)
==32723== by 0x4063E38: _cairo_surface_fill_region (cairo-surface.c:1840)
==32723== by 0x4067FDC: _clip_and_composite_trapezoids (cairo-surface-fallback.c:625)
==32723== by 0x40689C5: _cairo_surface_fallback_paint (cairo-surface-fallback.c:835)
==32723== by 0x4065731: _cairo_surface_paint (cairo-surface.c:1923)
==32723== by 0x4044098: _cairo_gstate_paint (cairo-gstate.c:900)
==32723== Uninitialised value was created by a heap allocation
==32723== at 0x402732D: malloc (vg_replace_malloc.c:180)
==32723== by 0x410099F: _pixman_image_allocate (pixman-image.c:100)
==32723== by 0x41265B8: pixman_image_create_solid_fill (pixman-solid-fill.c:75)
==32723== by 0x4123CE1: pixman_image_fill_rectangles (pixman.c:314)
==32723== by 0x40482B7: _cairo_image_surface_fill_rectangles (cairo-image-surface.c:1180)
==32723== by 0x4063BE7: _cairo_surface_fill_rectangles (cairo-surface.c:1883)
==32723== by 0x4063E38: _cairo_surface_fill_region (cairo-surface.c:1840)
==32723== by 0x4067FDC: _clip_and_composite_trapezoids (cairo-surface-fallback.c:625)
==32723== by 0x40689C5: _cairo_surface_fallback_paint (cairo-surface-fallback.c:835)
==32723== by 0x4065731: _cairo_surface_paint (cairo-surface.c:1923)
==32723== by 0x4044098: _cairo_gstate_paint (cairo-gstate.c:900)
==32723== by 0x403C10B: cairo_paint (cairo.c:2052)
This works because the X server always attempts to set a clip region
within the bounds of the drawable, and it only fails at it when it is
computing the wrong translation and therefore needs the workaround.