All of the information previously passed to the iterator initializers
is now available in the iterator itself, so there is no need to pass
it as arguments anymore.
This makes _pixman_implementation_{src,dest}_iter_init() responsible
for filling parts of the information in the iterators. Specifically,
the information passed as arguments is stored in the iterator.
Also add a height field to pixman_iter_t.
Add two new iterator flags, ITER_IGNORE_ALPHA and ITER_IGNORE_RGB, that
are set when the alpha and RGB values are not needed. If both are set,
then we can skip fetching entirely and just use
_pixman_iter_get_scanline_noop.
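A minimal sketch of the flag check described above. Only the flag names
come from this message; the bit values, the sketch_iter struct, the
fetch_count field and both scanline functions are hypothetical stand-ins
and are not the pixman definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit values; only the flag names come from the text. */
enum
{
    ITER_IGNORE_ALPHA = 1 << 0,
    ITER_IGNORE_RGB   = 1 << 1
};

typedef struct sketch_iter sketch_iter_t;
typedef uint32_t *(*sketch_get_scanline_t) (sketch_iter_t *iter);

struct sketch_iter
{
    uint32_t              *buffer;
    uint32_t               flags;
    int                    fetch_count; /* counts real fetches, for illustration */
    sketch_get_scanline_t  get_scanline;
};

/* Stand-in for a no-op fetcher: hands back the buffer untouched. */
static uint32_t *
sketch_get_scanline_noop (sketch_iter_t *iter)
{
    return iter->buffer;
}

/* Stand-in for a real fetcher; it only records that it was called. */
static uint32_t *
sketch_get_scanline_fetch (sketch_iter_t *iter)
{
    iter->fetch_count++;
    return iter->buffer;
}

static void
sketch_iter_init (sketch_iter_t *iter, uint32_t *buffer, uint32_t flags)
{
    const uint32_t ignore_all = ITER_IGNORE_ALPHA | ITER_IGNORE_RGB;

    assert (buffer != NULL);

    iter->buffer = buffer;
    iter->flags = flags;
    iter->fetch_count = 0;

    /* Neither alpha nor RGB is needed, so fetching can be skipped. */
    if ((flags & ignore_all) == ignore_all)
        iter->get_scanline = sketch_get_scanline_noop;
    else
        iter->get_scanline = sketch_get_scanline_fetch;
}
```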
Introduce a new ITER_LOCALIZED_ALPHA flag that indicates that the
alpha value computed is used only for the alpha channel of the output;
it doesn't affect the RGB channels.
Then in pixman-bits-image.c, if a destination is either a8r8g8b8 or
x8r8g8b8 with localized alpha, the iterator will return a pointer
directly into the image.
Make src_iter_init() and dest_iter_init() virtual methods in the
implementation struct. This allows individual implementations to plug
in their own CPU specific scanline fetchers.
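A rough sketch of what a virtual src_iter_init() could look like. The
struct layout, the delegate pointer, the format constants and the
fetcher names are all assumptions made for illustration; only the idea
of a per-implementation init method comes from this message.

```c
#include <assert.h>
#include <stddef.h>

typedef struct sketch_impl sketch_impl_t;

typedef struct
{
    const char *fetcher_name; /* which fetcher was plugged in */
} sketch_iter_t;

typedef void (*sketch_iter_init_t) (sketch_impl_t *imp,
                                    sketch_iter_t *iter,
                                    int            format);

struct sketch_impl
{
    sketch_impl_t      *delegate;      /* assumed fallback implementation */
    sketch_iter_init_t  src_iter_init; /* the virtual method */
};

enum { SKETCH_FORMAT_COMMON, SKETCH_FORMAT_RARE };

/* Generic implementation: handles every format. */
static void
sketch_general_src_iter_init (sketch_impl_t *imp, sketch_iter_t *iter, int format)
{
    (void) imp;
    (void) format;
    iter->fetcher_name = "general";
}

/* CPU specific implementation: handles one format, delegates the rest. */
static void
sketch_simd_src_iter_init (sketch_impl_t *imp, sketch_iter_t *iter, int format)
{
    if (format == SKETCH_FORMAT_COMMON)
    {
        iter->fetcher_name = "simd";
        return;
    }

    assert (imp->delegate != NULL);
    imp->delegate->src_iter_init (imp->delegate, iter, format);
}

static sketch_impl_t sketch_general = { NULL, sketch_general_src_iter_init };
static sketch_impl_t sketch_simd = { &sketch_general, sketch_simd_src_iter_init };
```

Calling sketch_simd.src_iter_init (&sketch_simd, &iter, format) picks the
specialized fetcher when one exists and the general one otherwise.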
Instead of calling _pixman_image_get_scanline_32/64(), move the
iterator initialization into the respective image implementations and
call the scanline generators directly.
We add a new structure called pixman_iter_t that encapsulates the
information required to read scanlines from an image. It contains two
functions, get_scanline() and write_back(). The get_scanline()
function will generate pixels for the current scanline. For iterators
for source images, it will also advance to the next scanline. The
write_back() function is only called for destination images. Its
function is to write back the modified pixels to the image and then
advance to the next scanline.
When an iterator is initialized, it is passed this information:
- The image to iterate
- The rectangle to be iterated
- A buffer that the iterator may (but is not required to) use. This
buffer is guaranteed to have space for at least width pixels.
- A flag indicating whether a8r8g8b8 or a16r16g16b16 pixels should
be fetched
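The iterator protocol described above can be sketched roughly as
follows. This is not the pixman_iter_t definition: the field names, the
init signature and the row-copy loop are assumptions, and the sketch
only handles 32 bit pixels, so the a8r8g8b8/a16r16g16b16 flag is left
out.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct sketch_iter sketch_iter_t;

struct sketch_iter
{
    uint32_t *bits;    /* first pixel of the current scanline in the image */
    int       stride;  /* distance between scanlines, in pixels */
    int       width;
    int       height;
    uint32_t *buffer;  /* caller-provided, at least width pixels */

    uint32_t *(*get_scanline) (sketch_iter_t *iter);
    void      (*write_back)   (sketch_iter_t *iter);
};

/* Source: generate the pixels, then advance to the next scanline. */
static uint32_t *
sketch_src_get_scanline (sketch_iter_t *iter)
{
    memcpy (iter->buffer, iter->bits, (size_t) iter->width * sizeof (uint32_t));
    iter->bits += iter->stride;
    return iter->buffer;
}

/* Destination: generate the pixels but stay on the current scanline. */
static uint32_t *
sketch_dest_get_scanline (sketch_iter_t *iter)
{
    memcpy (iter->buffer, iter->bits, (size_t) iter->width * sizeof (uint32_t));
    return iter->buffer;
}

/* Destination: store the modified pixels, then advance. */
static void
sketch_dest_write_back (sketch_iter_t *iter)
{
    memcpy (iter->bits, iter->buffer, (size_t) iter->width * sizeof (uint32_t));
    iter->bits += iter->stride;
}

static void
sketch_iter_init (sketch_iter_t *iter, uint32_t *image, int stride,
                  int x, int y, int width, int height,
                  uint32_t *buffer, int is_dest)
{
    assert (image != NULL && buffer != NULL && width > 0 && height > 0);

    iter->bits = image + y * stride + x;
    iter->stride = stride;
    iter->width = width;
    iter->height = height;
    iter->buffer = buffer;
    iter->get_scanline = is_dest ? sketch_dest_get_scanline : sketch_src_get_scanline;
    iter->write_back = is_dest ? sketch_dest_write_back : NULL;
}

/* Copy the source rectangle into the destination, one scanline at a time. */
static void
sketch_copy_rect (sketch_iter_t *src, sketch_iter_t *dest)
{
    int i, j;

    for (i = 0; i < dest->height; i++)
    {
        uint32_t *s = src->get_scanline (src);
        uint32_t *d = dest->get_scanline (dest);

        for (j = 0; j < dest->width; j++)
            d[j] = s[j];

        dest->write_back (dest);
    }
}
```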
There are a number of (eventual) benefits to the iterators:
- The initialization of the iterator can be virtualized such that
implementations can plug in their own CPU specific get_scanline()
and write_back() functions.
- If an image is horizontal, it can simply plug in an appropriate
get_scanline(). This way we can get rid of the annoying
classify() virtual function.
- In general, iterators can remember what they did on the last
scanline, so for example a REPEAT_NONE image might reuse the same
data for all the empty scanlines generated by the zero-extension.
- More detailed information can be passed to the iterator, allowing
more specialized fetchers to be used.
- We can fix the bug where destination filters and transformations
are not currently being ignored as they should be.
However, this initial implementation is not optimized at all. We lose
several existing optimizations:
- The ability to composite directly in the destination
- The ability to only fetch one scanline for horizontal images
- The ability to avoid fetching the src and mask for the CLEAR
operator
Later patches will re-introduce these optimizations.
The temporary scanline buffer allocated on the stack was declared
as a uint8_t array. As a result, the compiler was free to select
any arbitrary alignment for it (even though there is typically
no reason to use really weird alignments here and the stack is
normally at least 4-byte aligned on most platforms). Having
improper alignment is non-portable and can impact performance
or even make the code misbehave depending on the target platform.
Using uint64_t type for this array should ensure that any possible
memory accesses done by pixman code are going to be handled correctly
(pixman-combine64.c can access this buffer via uint64_t * pointer).
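A small sketch of the idea, assuming a hypothetical buffer size and
hypothetical function names: the storage is declared with uint64_t
elements so that it is suitably aligned for uint64_t access, and is
then handed around as a byte pointer. The alignment check uses the C11
_Alignof operator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BUFFER_BYTES 1024 /* hypothetical size */

static int
sketch_is_aligned_for_uint64 (const void *p)
{
    return ((uintptr_t) p % _Alignof (uint64_t)) == 0;
}

static uint64_t
sketch_sum_as_uint64 (void)
{
    /* uint64_t elements give the array uint64_t alignment. */
    uint64_t  stack_buffer[(SKETCH_BUFFER_BYTES + 7) / 8];
    uint8_t  *bytes = (uint8_t *) stack_buffer;   /* byte view for generic code */
    uint64_t *wide = (uint64_t *) bytes;          /* 64 bit view, as wide code would use */
    size_t    n = sizeof (stack_buffer) / sizeof (stack_buffer[0]);
    size_t    i;
    uint64_t  sum = 0;

    assert (sketch_is_aligned_for_uint64 (bytes));

    for (i = 0; i < n; i++)
        wide[i] = i;
    for (i = 0; i < n; i++)
        sum += wide[i];

    return sum;
}
```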
An alignment-related problem was reported in:
http://lists.freedesktop.org/archives/pixman/2010-November/000747.html
If an image has an alpha map that has wide components, then we need to
use 64 bit processing for that image. We detect this situation in
pixman-image.c and remove the FAST_PATH_NARROW_FORMAT flag.
In pixman-general, the wide/narrow decision is now based on the flags
instead of on the formats.
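Roughly, the flag-based decision could look like the sketch below. The
flag name comes from this message, but its bit value, both function
names and their parameters are assumptions for illustration only.

```c
#include <stdint.h>

#define FAST_PATH_NARROW_FORMAT (1u << 0) /* hypothetical bit value */

/* An image is narrow only if its own format is narrow and any alpha
 * map it carries is narrow too. */
static uint32_t
sketch_image_flags (int format_is_narrow, int has_alpha_map, int alpha_map_is_narrow)
{
    uint32_t flags = 0;

    if (format_is_narrow && (!has_alpha_map || alpha_map_is_narrow))
        flags |= FAST_PATH_NARROW_FORMAT;

    return flags;
}

/* The 32 bit pipeline is usable only when every image involved is narrow. */
static int
sketch_use_narrow (uint32_t src_flags, uint32_t mask_flags, uint32_t dest_flags)
{
    return (src_flags & mask_flags & dest_flags & FAST_PATH_NARROW_FORMAT) != 0;
}
```

In this sketch a missing mask would be passed as FAST_PATH_NARROW_FORMAT
so that it does not force the wide pipeline.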
This avoids a negative in the name. Also, by renaming the "wide"
variable in pixman-general.c to "narrow" and fixing up the logic
correspondingly, the code there reads a lot more straightforwardly.
Back in the day, the mask_bits argument was used to distinguish
between masks used for component alpha (where it was 0xffffffff) and
masks for unified alpha (where it was 0xff000000). In this way, the
fetchers could check if just the alpha channel was 0 and in that case
avoid fetching the source.
However, we haven't actually used it like that for a long time; it is
currently always either 0xffffffff or 0 (if the mask is NULL). It also
doesn't seem worthwhile resurrecting it because for premultiplied
buffers, if alpha is 0, then the color channels are normally 0 as well.
This patch eliminates the mask_bits and changes the fetchers to just
assume it is 0xffffffff if mask is non-NULL.
They are no longer necessary because we will just walk the fast path
tables, and the general composite path is treated as another fast
path.
This unfortunately means that sse2_composite() can no longer be
responsible for realigning the stack to 16 bytes, so we have to move
that to pixman_image_composite().
We introduce a new PIXMAN_OP_any fake operator and a PIXMAN_any fake
format that match anything. Then general_composite_rect() can be used
as another fast path.
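The wildcard matching can be sketched as a table walk in which the
catch-all entry comes last. The enum values, the table layout and the
entry names below are hypothetical; only the idea of "any" operators
and formats comes from this message.

```c
#include <stddef.h>

enum { SKETCH_OP_SRC, SKETCH_OP_OVER, SKETCH_OP_any = -1 };
enum { SKETCH_a8r8g8b8, SKETCH_x8r8g8b8, SKETCH_r5g6b5, SKETCH_any = -1 };

typedef struct
{
    int         op;
    int         src_format;
    int         dest_format;
    const char *name; /* stands in for the composite function pointer */
} sketch_fast_path_t;

static const sketch_fast_path_t sketch_fast_paths[] =
{
    { SKETCH_OP_OVER, SKETCH_a8r8g8b8, SKETCH_x8r8g8b8, "over_8888_x888" },
    { SKETCH_OP_SRC,  SKETCH_any,      SKETCH_r5g6b5,   "src_any_0565" },
    /* Catch-all entry: the general path is just another fast path. */
    { SKETCH_OP_any,  SKETCH_any,      SKETCH_any,      "general_composite_rect" },
};

static int
sketch_matches (int pattern, int value, int wildcard)
{
    return pattern == wildcard || pattern == value;
}

/* Return the name of the first table entry that matches the request. */
static const char *
sketch_lookup (int op, int src_format, int dest_format)
{
    size_t i;

    for (i = 0; i < sizeof (sketch_fast_paths) / sizeof (sketch_fast_paths[0]); i++)
    {
        const sketch_fast_path_t *p = &sketch_fast_paths[i];

        if (sketch_matches (p->op, op, SKETCH_OP_any) &&
            sketch_matches (p->src_format, src_format, SKETCH_any) &&
            sketch_matches (p->dest_format, dest_format, SKETCH_any))
        {
            return p->name;
        }
    }

    return NULL;
}
```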
Because general_composite_rect() does not require the sources to cover
the clip region, we add a new flag FAST_PATH_COVERS_CLIP which is part
of the set of standard flags for fast paths.
Because this flag cannot be computed until after the clip region is
available, we have to call pixman_compute_composite_region32() before
checking for fast paths. This will resolve itself when we get to the
point where _pixman_run_fast_path() is only called once per composite
operation.
When the destination buffer is either a8r8g8b8 or x8r8g8b8, we can use
it directly instead of fetching into a temporary buffer. When the
format is x8r8g8b8, we require the operator to not make use of
destination alpha, but when it is a8r8g8b8, there are no restrictions.
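The decision described above might be sketched like this. The format
enum, the op_uses_dest_alpha parameter and the function name are
assumptions; for other formats a real fetcher would convert pixels, for
which the memcpy below is only a stand-in.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum
{
    SKETCH_a8r8g8b8,
    SKETCH_x8r8g8b8,
    SKETCH_OTHER
} sketch_format_t;

/* Return the scanline the combiner should operate on: either the image
 * row itself, or a copy in the temporary buffer. */
static uint32_t *
sketch_dest_get_scanline (uint32_t *image_row, uint32_t *buffer, int width,
                          sketch_format_t format, int op_uses_dest_alpha)
{
    int i;

    if (format == SKETCH_a8r8g8b8 ||
        (format == SKETCH_x8r8g8b8 && !op_uses_dest_alpha))
    {
        return image_row; /* composite directly in the destination */
    }

    memcpy (buffer, image_row, (size_t) width * sizeof (uint32_t));

    if (format == SKETCH_x8r8g8b8)
    {
        /* The x bits are undefined, so the fetched alpha must be opaque. */
        for (i = 0; i < width; i++)
            buffer[i] |= 0xff000000u;
    }

    return buffer;
}
```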
This is approximately a 5% speedup on the poppler cairo benchmark:
[ # ]  backend     test   min(s)  median(s)  stddev.  count
Before:
[  0]    image  poppler    6.661      6.709    0.59%    6/6
After:
[  0]    image  poppler    6.307      6.320    0.12%    5/6
For consistency we will probably want to allow component alpha to be
set on all masks at some point, but this commit only enables it for
solid images.
This reverts commit 29e22cf38e.
The general_composite_rect() function has two invocations
of the return_if_fail() macro before any of its variable
declarations. Removing them allows for compilation to
succeed using a pre-C99 compiler.
The new rule is:
- Output is clipped to the destination clip region.
- If a source image has the clip_sources property set, then there
is an additional step, after repeating and transforming, but before
compositing, where pixels that are not in the source clip are
rejected. Rejected means no compositing takes place (not that the
pixel is treated as 0). By default source clipping is turned off;
when it is turned on, only client-set clips are honored.
The old rules were unclear and inconsistently implemented.
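To illustrate the difference between "rejected" and "treated as 0", here
is a one-dimensional sketch of the new rule for a SRC-style copy. It
assumes an identity transform and no repeat, models clips as half-open
spans, and does not model the client-set distinction; every name in it
is hypothetical.

```c
#include <stdint.h>

typedef struct
{
    int x1, x2; /* half-open span [x1, x2) */
} sketch_span_t;

static int
sketch_in_span (sketch_span_t span, int x)
{
    return x >= span.x1 && x < span.x2;
}

static void
sketch_composite_src_row (uint32_t *dest, const uint32_t *src, int width,
                          sketch_span_t dest_clip,
                          int clip_sources, sketch_span_t src_clip)
{
    int x;

    for (x = 0; x < width; x++)
    {
        /* Output is always clipped to the destination clip. */
        if (!sketch_in_span (dest_clip, x))
            continue;

        /* Rejected source pixels leave the destination untouched;
         * they are not composited as if they were 0. */
        if (clip_sources && !sketch_in_span (src_clip, x))
            continue;

        dest[x] = src[x];
    }
}
```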