On migration, destroy_surfaces is called from qxl (qxl_hard_reset), before the device was loaded (on destination).
handle_dev_destroy_surfaces led to red_process_commands, which read the qxl command ring
(which appeared to be not empty), and then when processing the command
it accessed unmapped memory.
This does the following, all to remove any referenced memory on the pci bars:
flush_all_qxl_commands(worker);
flush_all_surfaces(worker);
red_wait_outgoing_item((RedChannel *)worker->display_channel);
red_wait_outgoing_item((RedChannel *)worker->cursor_channel);
The added api is specifically async, i.e. it calls async_complete
when done.
(cherry picked from commit 2a4d97fb78)
The new _ASYNC io's in qxl_dev listed at the end get six new api
functions, and an additional callback function "async_complete". When
the async version of a specific io is used, completion is notified by
calling async_complete, and no READY message is written or expected by
the dispatcher.
update_area has been changed to push QXLRects to the worker thread, where
the conversion to SpiceRect takes place.
A cookie has been added to each async call to QXLWorker, and is passed back via
async_complete.
Added api:
QXLWorker:
update_area_async
add_memslot_async
destroy_surfaces_async
destroy_primary_surface_async
create_primary_surface_async
destroy_surface_wait_async
QXLInterface:
async_complete
(cherry picked from commit 096f49afbf)
In addition (1) make handle_dev_destroy_surfaces call red_release_cursor
(2) call red_wait_outgoing_item(cursor_channel) only after adding msgs to pipe
[3d3066b175 cherry-pick with modifications]
red_pipe_add_drawable can lead to removal of drawables from current tree
(since it calls red_handle_drawable_surfaces_client_synced), which can
also lead to releasing these drawables.
Before the fix, red_current_add_equal, called red_pipe_add_drawable,
without assuring afterwards that the drawables it refers to are still alive or
still in the current tree.
red_handle_drawable_surfaces_client_synced was called only from red_pipe_add_drawable, while it
should also be called from red_pipe_add_drawable_after. Otherwise, the client
might receive a command with a reference to a surface it doesn't hold and crash.
When the worker was stoped, the cursor was copied from guest ram to the host ram,
and its corresponding qxl command was released.
This is unecessary, since the qxl ram still exists (we keep references
to the surfaces in the same manner).
It also led to BSOD on guest upon migration: the device tracks cursor set commands and it stores
a reference to the last one. Then, it replays it to the destination server when migrating to it.
However, the command the qxl replayed has already been released from the pci by the original
worker, upon STOP.
Conflicts:
server/red_worker.c
This is stylish change again. We are talking about a RedStream object,
so let's just name the variable "stream" everywhere, to avoid
confusion with a non existent RedPeer object.
https://bugs.freedesktop.org/show_bug.cgi?id=34795
drawable_count was becoming negative. It tracks the number of
items in the worker->current_list ring. It was decremented correctly,
but incremented only in several cases. The cases it wasn't incremented
where:
red_current_add_equal found an equivalent drawable
by moving the increment to where the item is added to current_list, in
__current_add_drawable, the accounting remains correct.
This has no affect other then correct accounting, as drawable_count isn't
used for anything.
When drawing a drawable with a NULL src bitmap that means we should
be using the previously generated self_bitmap. Not doing this causes
a segfault due to accessing the NULL.
The self_bitmap is the size of self_bitmap_area, not the bbox.
This is especially important since we later copy the self_bitmap_area
into the new bitmap, and if that is larger than bbox then we will
overwrite random memory.
We really need to flush the ring to ensure that we push something on the
release ring. If we don't do this and the ring is not pushed for other
reasons we will timeout in the guest driver waiting for the ring.
We've changed how resources are released so they are now being
freed continuosly, rather than on OOM, since we want to free as early
possible to avoid fragmentation. So, OOM situations should be a bit
less common now and signify a real memory shortage, so we should try
to free up more resources.
This will prevent: 1) rendering problems (commands sent to the client in the wrong order)
2) sending commands for surfaces that haven't been created yet on the client side.
A side effect of the previous red_current_flush, which flushed all the surfaces, and was called on a new display channel connection, was
that red_handle_drawable_surfaces_client_synced sent the most updated surfaces images when needed. However, now, it should
explicitly call red_current_flush.
Moreover, since red_current_flush was called on a new display channel connection only if there was a primary surface,
if the connection of the display channel occurred at the moment of no primary surface, red_handle_drawable_surfaces_client_synced was buggy.
The actual bitmap data was added to the main marshaller rather than
the submarshaller that pointed to the SpiceImage part. This made us
send too short messages failing demarshalling in the client.
When the we reset qxl, we destroy all srufaces. Since surfaces and glz
drawables are no longer dependent, we need to call red_display_clear_glz_drawables explicitly
in order to clear all our drawables references in the server.