When qemu migration completes, we need to stop the streams, and to send
the corresponding upgrade_items to the client.
Otherwise, (1) the client might display lossy regions that we don't track
(streams are not part of the migration data).
(2) streams_timeout may occur after MSG_MIGRATE has been sent, leading
to messages being sent to the client after MSG_MIGRATE and before
MSG_MIGRATE_DATA (e.g., STREAM_CLIP, STREAM_DESTROY, DRAW_COPY).
No message besides MSG_MIGRATE_DATA should be sent after
MSG_MIGRATE.
When a msg other than MIGRATE_DATA reached spice-gtk after MSG_MIGRATE,
spice-gtk sent it to dest server as the migration data, and the dest
server crashed with a "bad message size" assert.
1) This does not buy us much, as red_marshall_monitors_config() also
removes 0x0 sized monitors and does a much better job at it
(also removing intermediate ones, not only tailing ones)
2) The code is wrong, as it allocs space for real_count heads, where
real_count always <= monitors_config->count and then stores
monitors_config->count in worker->monitors_config->count, causing
red_marshall_monitors_config to potentially walk
worker->monitors_config->heads past its boundaries.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
During my dynamic monitor support testing today, I hit the following assert
in red_worker.c:
"red_push_monitors_config: condition `monitors_config != NULL' failed"
This is caused by the following scenario:
1) Guest causes handle_dev_monitors_config_async() to be called
2) handle_dev_monitors_config_async() calls worker_update_monitors_config()
3) handle_dev_monitors_config_async() pushes worker->monitors_config, this
takes a ref on the current monitors_config
4) Guest causes handle_dev_monitors_config_async() to be called *again*
5) handle_dev_monitors_config_async() calls worker_update_monitors_config()
6) worker_update_monitors_config() does a decref on worker->monitors_config,
releasing the workers reference, this monitor_config from step 2 is
not yet free-ed though as the pipe-item still holds a ref
7) worker_update_monitors_config() creates a new monitors_config with an
initial ref-count of 1 and stores that in worker->monitors_config
8) The pipe-item of the *first* monitors_config is send, upon completion
a decref is done on the monitors_config, and monitors_config_decref not
only frees the monitor_config, but *also* sets worker->monitors_config
to NULL, even though worker->monitors_config no longer refers to the
monitor_config being freed, it refers to the 2nd monitor_config!
9) The client which was connected when this all happened disconnects
10) A new client connects, leading to the assert:
at red_worker.c:9519
num_common_caps=1, common_caps=0x5555569b6f60, migrate=0,
stream=<optimized out>, client=<optimized out>, worker=<optimized out>)
at red_worker.c:10423
at red_worker.c:11301
Note that red_worker.c:9519 is:
red_push_monitors_config(dcc);
gdb does not point to the actual line of the assert because the function gets
inlined.
The fix is easy and obvious, don't set worker->monitors_config to NULL in
monitors_config_decref. I'm a bit baffled as to why that code is there in
the first place, the whole point of ref-counting is to not have one single
unique place to store the reference...
This fix should not have any adverse side-effects as the 4 callers of
monitors_config_decref fall into 2 categories:
1) Code which immediately after the decref replaces worker->monitors_config
with a new monitors_config:
worker_update_monitors_config()
set_monitors_config_to_primary()
2) pipe-item freeing code, which should not touch the worker state at all
to being with
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
The stream vis_region should be cleared after the stream region was sent
to the client losslessly. Otherwise, we might send redundant stream upgrades
if we process more drawables that are dependent on the stream region.
resolves: rhbz#891326
Starting from commit 81fe00b08a, red_detach_streams_behind can
trigger modifications in the current tree (by update_area calls). Thus,
after calling red_detach_streams_behind it is not safe to access tree
entries that were calculated before the call.
This patch inserts the drawable to the tree before the call to
red_detach_streams_behind. This change also requires making sure
that rendering operations that can be triggered by
red_detach_streams_behind will not include this drawable (which is now part of the tree).
Reported-by: Michal Luscon <mluscon@redhat.com>
Found by a Coverity scan:
in handle_dev_start -
Checking "worker->display_channel" implies that "worker->display_channel"
might be NULL.
Passing "worker" to function "guest_set_client_capabilities"
in guest_set_client_capabilities -
Directly dereferencing parameter "worker->display_channel"
red_proccess_commands calls were added after calling
guest_set_client_capabilities in order to cleanup the command ring from
old commands that the client might not be able to handle.
However, calling red_process_commands at this stage does send messages
to the client.
In addition, since setting the client capabilities at the guest is not
synchronized, emptying the command ring is not enough in order to make
sure the following commands will be supported by the client.
The call to red_proccess_commands before initializing the display
streams (the call to red_display_start_streams), caused inconsistencies
related to video streaming upon reconnecting (rhbz#883564).
I'm reverting this patch till another solution for the capabilities
mismatch is introduced.
Resolves: rhbz#883564
Internal images are just read from the surface, compressed, and sent to the client.
Then, they are destroyed. I can't find any reason for aligning their memory.
rhbz#876685
The current lz implementation does not support such bitmaps.
The following patch will actually prevent allocating stride > bpp*width
for internal images.
Previously, there was no check for the size of the message received from
the client, and all messages were read into a buffer of size 1024.
However, migration data can be bigger than 1024. In such cases, memory
corruption occurred.
red_wait_outgoing_item only waits till the currently outgoing msg is
completely sent.
red_wait_outgoing_items does the same for multi-clients. handle_dev_stop erroneously called
red_wait_outgoing_items, instead of waiting till all the items in the
pipes are sent.
This waiting is necessary because after drawables are sent to the client, we release them from the
device. The device might have been stopped due to moving to the non-live
phase of migration. Accessing the device memory during this phase can lead
to inconsistencies.
Also, MSG_MIGRATE should be the last message sent to the client, before
MSG_MIGRATE_DATA. Due to this bug, msgs were marshalled and sent after
handle_dev_stop and after handle_dev_display_migrate which sometimes led
to the release of surfaces, and inserting MSG_DISPLAY_DESTROY_SURFACE
after MSG_MIGRATE.
This patch also removes the calls to red_wait_outgoing_items, from
dev_flush_surfaces. They were unnecessary.
fix: rhbz#866929
At migration destination side, we need to restore the client's surfaces
state, before sending surfaces related messages.
Before this patch, we stopped the processing of only the cmd ring, till migration data
arrived.
However, some QXL_IOs require reading and rendering the cmd ring (e.g.,
update_area). Moreover, when the device is reset, after destroying all
surfaces, we assert (in qemu) if the cmd ring is not empty (see
rhbz#866929).
This fix makes the red_worker thread wait till the migration data arrives
(or till a timeout), and not process any input from the device after the
vm is started.
We try to inject an interrupt to the vm in this case, which we cannot do
if it is stopped. Instead log this and update when vm restarts.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=870972
(that bz is on qemu, it will be cloned or just changed, not
sure yet)
When a new client connects, there may be commands in the ring that it
can't understand, so we need to process these before forwarding new
commands to the client. By doing this after changing the capability
bits we ensure that the new client will never see a command that it
doesn't understand (under the assumption that the guest will read and
obey the capability bits).
Acked-by: Alon Levy <alonl@redhat.com>
A new interface
set_client_capabilities (QXLInstance *qin,
uint8_t client_present,
uint8_t caps[58]);
is added to QXLInstance, and spice server is changed to call it
whenever a client connects or disconnects. The QXL device in response
is expected to update the client capability bits in the ROM of the
device and raise the QXL_INTERRUPT_CLIENT interrupt.
There is a potential race condition in the case where a client
disconnects and a new client with fewer capabilities connects. There
may be commands in the ring that the new client can't handle. This
case is handled by first changing the capability bits, then processing
all commands in the ring, and then start forwarding commands to the
new client. As long as the guest obeys the capability bits, the new
client will never see anything it doesn't understand.
Just checks stride vs width times bpp.
This fixes a potential abort on guest generated bad images in
glz_encoder.
Other files touched to move some consts to red_common, they are
static so no problem to be defined in both red_worker.c and
red_parse_qxl.c .
replace add_ref with add for stack allocated SpiceMigrateDataDisplay.
This fixes wrong MIGRATE_DATA message in display channel (symptom is
glz_encoder_max being way too big, and malloc failure at target) seen on
F18 with gcc-4.7.1-5.fc18.x86_64 and glibc-2.16-8.fc18.x86_64 (didn't
appear on RHEL 6).
Graduality is irrelevant for A8 images, so instead of using RGB-ness
as a short-cut, add a new macro BITMAP_FMT_HAS_GRADUALITY() that
returns true for the existing RGB images, but false for A8.
After marshalling MSG_STREAM_CREATE, there is no need to ref and
unref stream->current before and after completing the sending of the
message (correspondingly). The referencing is unnecessary because all
the data that is required from the drawable (the clipping), is copied
during marshalling, and no field in the drawable is referenced (see
spice_marshall_msg_display_stream_create).
Moreover, the referencing was bugous:
While display_channel_hold_pipe_item and
display_channel_client_release_item_after_push referenced and
dereferenced, correspondingly, stream->current, stream->current might
have changed in between these calls, and then we ended up with one drawable
leaking, and one drawable released before its time has come (which
of course led to critical errors).
a SpiceMsgDisplayMonitorsConfig is sent on two occasions:
* as a result of a spice_qxl_monitors_config_async
* whenever a client connects and there is a previously set monitors
config
Sending the new message is protected by a new cap,
SPICE_DISPLAY_CAP_MONITORS_CONFIG
More elaborately:
spice_qxl_monitors_config_async receives a QXLPHYSICAL address of a
QXLMonitorsConfig struct and reads it, caching it in the RedWorker, and
sending it to all clients. Whenever a new client connects it receives
a SpiceMsgDisplayMonitorsConfig message as well.
Specifically all those that the previous patch converted to spice_debug.
spice_debug contains very verbose stuff like update_area that drowns out
those relatively rare (client connect / disconnect generated) messages.
Resolves: rhbz#824384
red_wait_pipe_item_sent mistakingly returned without waiting for sending the given pipe item
when the channel wasn't blocked. As a result, we failed when we had to
destroy a surface (e.g., QXL_IO_DESTROY_ALL_SURFACES) and to release all
the drawables that are depended on it (by removing them or waiting they will be sent).
In addition, red_wait_pipe_item_sent increased and decreased the reference to the pipe item
using channel_cbs->hold_item, and channel_cbs->release_item. However,
these calls can be called only by red_channel, otherwise
display_channel_client_release_item_before_push is called twice and
leads to a double call to ring_remove(&dpi->base).
Instead ref/put_drawable_pipe_item should be called.
The above routine was risky, since red_channel_client_init_send_data
can also be called with item==NULL. Thus, not all pipe items can be tracked.
The one call that was made for this routine was not necessary.
Resolves: rhbz#820669
Fix a segfault caused by a call to __red_is_next_stream_frame made by
red_stream_maintenance, with a drawable that is not a DRAW_COPY one.
The segfault is a reault of __red_is_next_stream_frame accessing
red_drawable->u.copy.src_bitmap for a non DRAW_COPY drawable.
Simplify keeping count of self_bitmap_image by putting it in
RedDrawable. It is allocated on reading from the command pipe and
deallocated when the last reference to the RedDrawable is dropped,
instead of keeping track of it in GlzDrawable and Drawable.