RCC_FOREACH may be dangerous
The following patches replace FOREACH loops with a SAFE version.
Using unsafe loops may cause spice-server to abort (assert fails).
Specifically, a read/write failure in those loops may cause the client
to disconnect, removing the node currently being iterated, which causes spice
to abort in ring_next():
-- assertion `pos->next != NULL && pos->prev != NULL' failed
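A minimal sketch of why the SAFE variant helps, assuming a simplified
intrusive ring. The Node type, the ring_* helpers and RING_FOREACH_SAFE
below are illustrative stand-ins, not the actual spice-server Ring API.
The successor is saved before the loop body runs, so the body may unlink
the current node:

```c
#include <stddef.h>

/* Hypothetical minimal ring: a doubly linked list with a sentinel head. */
typedef struct Node {
    struct Node *prev;
    struct Node *next;
} Node;

static void ring_init(Node *head)
{
    head->prev = head->next = head;
}

/* Inserts item right after the head. */
static void ring_add(Node *head, Node *item)
{
    item->next = head->next;
    item->prev = head;
    head->next->prev = item;
    head->next = item;
}

/* Unlinking clears the links, so asking an unlinked node for its
 * successor afterwards is exactly what an unsafe loop would trip over. */
static void ring_remove(Node *item)
{
    item->prev->next = item->next;
    item->next->prev = item->prev;
    item->prev = item->next = NULL;
}

/* Safe variant: tmp holds the successor before the body executes. */
#define RING_FOREACH_SAFE(pos, tmp, head)               \
    for ((pos) = (head)->next, (tmp) = (pos)->next;     \
         (pos) != (head);                               \
         (pos) = (tmp), (tmp) = (pos)->next)

/* Removes every node while iterating; returns how many were visited. */
static int remove_all(Node *head)
{
    Node *pos, *tmp;
    int visited = 0;

    RING_FOREACH_SAFE(pos, tmp, head) {
        ring_remove(pos);   /* stands in for a client disconnect */
        visited++;
    }
    return visited;
}
```

With a plain FOREACH the loop would read pos->next after ring_remove()
had already cleared it.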
The image descriptor flags shouldn't be copied as is from the flags that
were set by the driver. Specifically, the CACHE_ME flag shouldn't be copied,
since it is possible that (a) the image won't be cached (b) the image
is already cached, but in its lossy version, and we may want to set the bit for
CACHE_REPLACE_ME, in order to cache it in its lossless version.
In case (b), the client first looks for the CACHE_ME flag, and only if
it is not set does it look for CACHE_REPLACE_ME (see canvas_base.c). Since both flags were set,
the client ignored REPLACE_ME, and didn't turn off the lossy flag of the
cache item. Then, when a request for this lossless item reached the
client (FROM_CACHE_LOSSLESS), the client display channel waited
endlessly for the lossless version of the image.
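The flag precedence described above can be sketched as follows. The
flag names, bit values and both helper functions are hypothetical and
only illustrate the ordering; they are not the real canvas_base.c or
red_worker.c code:

```c
#include <stdint.h>

/* Hypothetical flag bits, for illustration only. */
enum {
    IMG_FLAG_CACHE_ME         = 1 << 0,
    IMG_FLAG_CACHE_REPLACE_ME = 1 << 2
};

typedef enum { ACT_NONE, ACT_CACHE, ACT_REPLACE } CacheAction;

/* Client side: CACHE_ME is checked first, and CACHE_REPLACE_ME is only
 * considered when CACHE_ME is clear. */
static CacheAction client_cache_action(uint8_t flags)
{
    if (flags & IMG_FLAG_CACHE_ME) {
        return ACT_CACHE;
    }
    if (flags & IMG_FLAG_CACHE_REPLACE_ME) {
        return ACT_REPLACE;
    }
    return ACT_NONE;
}

/* Server side: drop the driver's caching bits instead of copying them,
 * then set at most one of the two, based on the server's own decision. */
static uint8_t server_image_flags(uint8_t driver_flags,
                                  int will_cache, int cached_lossy)
{
    uint8_t flags = (uint8_t)(driver_flags &
        ~(IMG_FLAG_CACHE_ME | IMG_FLAG_CACHE_REPLACE_ME));

    if (cached_lossy) {
        flags |= IMG_FLAG_CACHE_REPLACE_ME;
    } else if (will_cache) {
        flags |= IMG_FLAG_CACHE_ME;
    }
    return flags;
}
```

If the driver's CACHE_ME bit were copied through, the first branch on
the client would always win and the replace request would be lost.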
When setting an initial video stream bit rate, if the bit rate
wasn't calculated by main_channel_client, and we don't have an
estimate from previous streams, use some default values.
The patch also removes updating dcc->streams_max_bit_rate when
the bit_rate held by the main_channel is larger than it. It is not necessary
since we compare those 2 values each time we set the initial bit rate
for a stream.
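A rough sketch of the selection logic, under stated assumptions: a value
of 0 means "not known", the larger of the two known values is preferred,
and the two default constants are made up for illustration (the message
above does not give the real ones). The function name is hypothetical:

```c
#include <stdint.h>

/* Hypothetical defaults in bits per second; not the values used by the patch. */
#define DEFAULT_BIT_RATE_LOW_BW   (2ULL * 1000 * 1000)
#define DEFAULT_BIT_RATE_HIGH_BW  (10ULL * 1000 * 1000)

/* Compares the estimate from previous streams with the main channel's
 * bit rate each time a stream starts, and falls back to a default when
 * neither is available (both are 0). */
static uint64_t initial_stream_bit_rate(uint64_t streams_max_bit_rate,
                                        uint64_t main_channel_bit_rate,
                                        int is_low_bandwidth)
{
    uint64_t bit_rate = streams_max_bit_rate > main_channel_bit_rate
                            ? streams_max_bit_rate
                            : main_channel_bit_rate;

    if (bit_rate == 0) {
        bit_rate = is_low_bandwidth ? DEFAULT_BIT_RATE_LOW_BW
                                    : DEFAULT_BIT_RATE_HIGH_BW;
    }
    return bit_rate;
}
```

Because the comparison happens on every call, there is no need to keep
streams_max_bit_rate in sync with the main channel value elsewhere.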
rhbz#956345
After a spice session has been migrated, we don't retest the network
(user experience considerations). Instead, we obtain the is_low_bandwidth flag
from the src-server, via the migration data.
Before this patch, if we migrated from server s1 to s2 and then to s3,
and if the connection to s1 was a low bandwidth one, we erroneously
passed is_low_bandwidth=FALSE from s2 to s3.
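A small sketch of the intended behaviour, with hypothetical types and
function names (MigrateData, Client, client_restore and
client_fill_migrate_data are not the real spice-server structures). The
flag restored from the migration data is the one handed to the next
destination, rather than a value derived from a network test that never
ran on the intermediate server:

```c
/* Hypothetical, reduced stand-ins for the migration data and client state. */
typedef struct {
    int is_low_bandwidth;
} MigrateData;

typedef struct {
    int is_low_bandwidth;   /* set once, at channel creation or restore */
} Client;

/* Destination side: take the flag from the source server's data. */
static void client_restore(Client *client, const MigrateData *data)
{
    client->is_low_bandwidth = data->is_low_bandwidth;
}

/* Source side: pass on the stored flag unchanged. */
static void client_fill_migrate_data(const Client *client, MigrateData *data)
{
    data->is_low_bandwidth = client->is_low_bandwidth;
}
```

Chaining the two calls for s1 to s2 and then s2 to s3 keeps the value
measured on s1.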
Cc: Marc-André Lureau <marcandre.lureau@redhat.com>
Replace the mixed calls to display_channel_client_is_low_bandwidth
and to main_channel_client_is_low_bandwidth, with one flag in
CommonChannelClient that is set upon channel creation.
red_create_stream is called even without any client but there is no
encoding since the mjpeg encoder is now associated with StreamAgent
which is only created when we have a client.
With a SPICE_DISPLAY_CAP_MONITORS_CONFIG capable client, the client needs to
know what part of the primary to use for each monitor. If the guest driver
does not support this, the server sends messages to the client for a
single monitor spanning the entire primary.
As soon as the guest calls spice_qxl_monitors_config_async once, we set
the red_worker driver_has_monitors_config flag and stop doing this.
This is a problem when the driver gets unloaded, for example after a reboot
or when switching to a text vc with usermode mode-setting under Linux.
To reproduce this, start a multi-mon capable Linux guest which uses
usermode mode-setting and then, once X has started, switch to a text vc. Note
how the client window not only fails to resize, but if you try to resize it
manually you always keep black borders since the aspect ratio is wrong.
This patch is the spice-server side of the fix: it adds a new
spice_qxl_driver_unload method which clears the driver_has_monitors_config
flag.
The other patch needed to fix this is in qemu, and will call this new method
from qxl_enter_vga_mode.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
mjpeg_encoder modifies the initial bit rate we supply it, according to the
client feedback. If it reaches a bit rate which is higher than the
initial one, we use the higher bit rate as the new bit rate estimation.
A frame can be dropped if a new frame was added during the same
call to red_process_command (we didn't attempt to send the older
frame). Such drops are ignored.
This patch only employs setting the stream parameters based on
the initial given bit-rate, the latency, and the encoding size.
Later patches will also employ mjpeg_encoder's response to client reports,
and its control over frame drops.
The patch also removes old stream bit rate calculations that weren't
used.
mjpeg_encoder can receive periodic reports about the playback status on
the client side. Then, mjpeg_encoder analyses the report and can
increase or decrease the stream bit rate, depending on the report.
When the bit rate is changed, the quality and frame rate of the stream
are re-evaluated.
Previously, the mjpeg quality was always 70. The frame rate was
tuned according to the frames' congestion in the pipe.
This patch sets the quality and frame rate according to
a given bit rate and the size of the first encoded frames.
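As an illustration of the arithmetic involved, here is one possible way
to derive a frame rate from a bit rate and an encoded frame size. The
formula, the clamping bounds and the function name are assumptions made
for this sketch; the message does not describe the encoder's actual
heuristics:

```c
#include <stdint.h>

/* Hypothetical bounds on the stream frame rate. */
#define FPS_MIN 1
#define FPS_MAX 25

/* bit_rate is in bits per second, frame_bytes is the size of one
 * encoded frame at the current quality. The result is the number of
 * such frames that fit into one second of bandwidth, clamped. */
static unsigned fps_for_bit_rate(uint64_t bit_rate, uint64_t frame_bytes)
{
    uint64_t fps;

    if (frame_bytes == 0) {
        return FPS_MAX;     /* nothing measured yet */
    }
    fps = bit_rate / (frame_bytes * 8);
    if (fps < FPS_MIN) {
        return FPS_MIN;
    }
    if (fps > FPS_MAX) {
        return FPS_MAX;
    }
    return (unsigned)fps;
}
```

For example, 8,000,000 bits/s with 50,000-byte frames allows
8,000,000 / 400,000 = 20 frames per second. An encoder could lower the
quality, and with it the frame size, when the resulting rate is too low.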
The following patches will introduce adaptive video streaming, in which
the bit rate, the quality, and the frame rate, change in response to
different parameters.
Patches that make red_worker adopt this feature will also follow.
The mjpeg_encoder should be client specific, and not shared between
different clients**, for the following reasons:
(1) Since we use abbreviated jpeg datastream for mjpeg, employing the same
mjpeg_encoder for different clients might cause errors when the
clients decode the jpeg data.
(2) The next patch introduces bit rate control to the mjpeg_encoder.
This feature depends on the bandwidth available, which is client
specific.
** at least till we change multi-clients not to re-encode the same
streams.
When qemu migration completes, we need to stop the streams, and to send
the corresponding upgrade_items to the client.
Otherwise, (1) the client might display lossy regions that we don't track
(streams are not part of the migration data).
(2) streams_timeout may occur after MSG_MIGRATE has been sent, leading
to messages being sent to the client after MSG_MIGRATE and before
MSG_MIGRATE_DATA (e.g., STREAM_CLIP, STREAM_DESTROY, DRAW_COPY).
No message besides MSG_MIGRATE_DATA should be sent after
MSG_MIGRATE.
When a msg other than MIGRATE_DATA reached spice-gtk after MSG_MIGRATE,
spice-gtk sent it to dest server as the migration data, and the dest
server crashed with a "bad message size" assert.
1) This does not buy us much, as red_marshall_monitors_config() also
removes 0x0 sized monitors and does a much better job at it
(also removing intermediate ones, not only trailing ones)
2) The code is wrong, as it allocs space for real_count heads, where
real_count is always <= monitors_config->count, and then stores
monitors_config->count in worker->monitors_config->count, causing
red_marshall_monitors_config to potentially walk
worker->monitors_config->heads past its boundaries.
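The invariant that was violated can be sketched like this: the stored
count must equal the number of heads actually allocated. The types and
the helper below are simplified, hypothetical stand-ins and are not the
removed code itself (which only trimmed trailing heads):

```c
#include <stdlib.h>

/* Hypothetical, reduced versions of the head and config structures. */
typedef struct {
    unsigned width;
    unsigned height;
} Head;

typedef struct {
    unsigned count;
    Head heads[];
} MonitorsConfig;

/* Copies only the non-empty heads and records how many were stored.
 * Returns NULL on allocation failure; the caller frees the result. */
static MonitorsConfig *copy_non_empty_heads(const Head *heads, unsigned count)
{
    MonitorsConfig *mc;
    unsigned real_count = 0;
    unsigned i, j = 0;

    for (i = 0; i < count; i++) {
        if (heads[i].width != 0 && heads[i].height != 0) {
            real_count++;
        }
    }
    mc = malloc(sizeof(*mc) + real_count * sizeof(Head));
    if (mc == NULL) {
        return NULL;
    }
    /* Storing the original count here instead would let a later walk
     * over mc->heads run past the allocation. */
    mc->count = real_count;
    for (i = 0; i < count; i++) {
        if (heads[i].width != 0 && heads[i].height != 0) {
            mc->heads[j++] = heads[i];
        }
    }
    return mc;
}
```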
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
During my dynamic monitor support testing today, I hit the following assert
in red_worker.c:
"red_push_monitors_config: condition `monitors_config != NULL' failed"
This is caused by the following scenario:
1) Guest causes handle_dev_monitors_config_async() to be called
2) handle_dev_monitors_config_async() calls worker_update_monitors_config()
3) handle_dev_monitors_config_async() pushes worker->monitors_config, this
takes a ref on the current monitors_config
4) Guest causes handle_dev_monitors_config_async() to be called *again*
5) handle_dev_monitors_config_async() calls worker_update_monitors_config()
6) worker_update_monitors_config() does a decref on worker->monitors_config,
releasing the worker's reference; this monitor_config from step 2 is
not yet freed though, as the pipe-item still holds a ref
7) worker_update_monitors_config() creates a new monitors_config with an
initial ref-count of 1 and stores that in worker->monitors_config
8) The pipe-item of the *first* monitors_config is sent; upon completion
a decref is done on the monitors_config, and monitors_config_decref not
only frees the monitor_config, but *also* sets worker->monitors_config
to NULL, even though worker->monitors_config no longer refers to the
monitor_config being freed, it refers to the 2nd monitor_config!
9) The client which was connected when this all happened disconnects
10) A new client connects, leading to the assert:
at red_worker.c:9519
num_common_caps=1, common_caps=0x5555569b6f60, migrate=0,
stream=<optimized out>, client=<optimized out>, worker=<optimized out>)
at red_worker.c:10423
at red_worker.c:11301
Note that red_worker.c:9519 is:
red_push_monitors_config(dcc);
gdb does not point to the actual line of the assert because the function gets
inlined.
The fix is easy and obvious: don't set worker->monitors_config to NULL in
monitors_config_decref. I'm a bit baffled as to why that code is there in
the first place, the whole point of ref-counting is to not have one single
unique place to store the reference...
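The scenario and the fix can be sketched with a reduced model. The
struct layouts and the mc_* / worker_update names are hypothetical, not
the red_worker.c code. The point is that dropping a reference only
touches the object's own count and never the worker's pointer:

```c
#include <stdlib.h>

/* Hypothetical, reduced stand-ins for the real structures. */
typedef struct MonitorsConfig {
    int refs;
    int id;
} MonitorsConfig;

typedef struct Worker {
    MonitorsConfig *monitors_config;
} Worker;

static MonitorsConfig *mc_new(int id)
{
    MonitorsConfig *mc = malloc(sizeof(*mc));

    if (mc == NULL) {
        abort();
    }
    mc->refs = 1;
    mc->id = id;
    return mc;
}

static MonitorsConfig *mc_ref(MonitorsConfig *mc)
{
    mc->refs++;
    return mc;
}

/* Frees the object when the last reference goes away. It deliberately
 * does not reset worker->monitors_config, which may already point at a
 * newer config. */
static void mc_unref(MonitorsConfig *mc)
{
    if (--mc->refs == 0) {
        free(mc);
    }
}

/* Replaces the worker's config: drop the old reference, store a new one. */
static void worker_update(Worker *worker, int id)
{
    if (worker->monitors_config != NULL) {
        mc_unref(worker->monitors_config);
    }
    worker->monitors_config = mc_new(id);
}
```

Replaying steps 1 to 8 against this model (update, take a pipe-item
ref, update again, release the pipe-item ref) leaves the worker pointing
at the second config.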
This fix should not have any adverse side-effects as the 4 callers of
monitors_config_decref fall into 2 categories:
1) Code which immediately after the decref replaces worker->monitors_config
with a new monitors_config:
worker_update_monitors_config()
set_monitors_config_to_primary()
2) pipe-item freeing code, which should not touch the worker state at all
to begin with
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
The stream vis_region should be cleared after the stream region was sent
to the client losslessly. Otherwise, we might send redundant stream upgrades
if we process more drawables that are dependent on the stream region.
resolves: rhbz#891326
Starting from commit 81fe00b08a, red_detach_streams_behind can
trigger modifications in the current tree (by update_area calls). Thus,
after calling red_detach_streams_behind it is not safe to access tree
entries that were calculated before the call.
This patch inserts the drawable to the tree before the call to
red_detach_streams_behind. This change also requires making sure
that rendering operations that can be triggered by
red_detach_streams_behind will not include this drawable (which is now part of the tree).
Reported-by: Michal Luscon <mluscon@redhat.com>
Found by a Coverity scan:
in handle_dev_start -
Checking "worker->display_channel" implies that "worker->display_channel"
might be NULL.
Passing "worker" to function "guest_set_client_capabilities"
in guest_set_client_capabilities -
Directly dereferencing parameter "worker->display_channel"
red_process_commands calls were added after calling
guest_set_client_capabilities in order to cleanup the command ring from
old commands that the client might not be able to handle.
However, calling red_process_commands at this stage does send messages
to the client.
In addition, since setting the client capabilities at the guest is not
synchronized, emptying the command ring is not enough in order to make
sure the following commands will be supported by the client.
The call to red_process_commands before initializing the display
streams (the call to red_display_start_streams), caused inconsistencies
related to video streaming upon reconnecting (rhbz#883564).
I'm reverting this patch till another solution for the capabilities
mismatch is introduced.
Resolves: rhbz#883564
Internal images are just read from the surface, compressed, and sent to the client.
Then, they are destroyed. I can't find any reason for aligning their memory.
rhbz#876685
The current lz implementation does not support such bitmaps.
The following patch will actually prevent allocating stride > bpp*width
for internal images.
Previously, there was no check for the size of the message received from
the client, and all messages were read into a buffer of size 1024.
However, migration data can be bigger than 1024. In such cases, memory
corruption occurred.
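One possible shape for such a size check is sketched below. RecvBuf,
recv_buf_for and the max_size parameter are hypothetical names for this
sketch, and falling back to a heap allocation is an assumption about how
oversized messages could be handled, not a description of the patch:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define RECV_BUF_SIZE 1024   /* the fixed buffer size mentioned above */

typedef struct {
    uint8_t fixed[RECV_BUF_SIZE];
} RecvBuf;

/* Returns a buffer that can hold msg_size bytes: the fixed buffer when
 * the message fits, a heap allocation otherwise. Returns NULL when the
 * message exceeds max_size or the allocation fails. *allocated tells
 * the caller whether the result must be released with free(). */
static uint8_t *recv_buf_for(RecvBuf *rb, size_t msg_size,
                             size_t max_size, int *allocated)
{
    uint8_t *buf;

    *allocated = 0;
    if (msg_size > max_size) {
        return NULL;
    }
    if (msg_size <= sizeof(rb->fixed)) {
        return rb->fixed;
    }
    buf = malloc(msg_size);
    if (buf != NULL) {
        *allocated = 1;
    }
    return buf;
}
```

Without the first two checks, a message larger than 1024 bytes would be
written past the end of the fixed buffer.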
red_wait_outgoing_item only waits till the currently outgoing msg is
completely sent.
red_wait_outgoing_items does the same for multi-clients. handle_dev_stop erroneously called
red_wait_outgoing_items, instead of waiting till all the items in the
pipes are sent.
This waiting is necessary because after drawables are sent to the client, we release them from the
device. The device might have been stopped due to moving to the non-live
phase of migration. Accessing the device memory during this phase can lead
to inconsistencies.
Also, MSG_MIGRATE should be the last message sent to the client, before
MSG_MIGRATE_DATA. Due to this bug, msgs were marshalled and sent after
handle_dev_stop and after handle_dev_display_migrate which sometimes led
to the release of surfaces, and inserting MSG_DISPLAY_DESTROY_SURFACE
after MSG_MIGRATE.
This patch also removes the calls to red_wait_outgoing_items, from
dev_flush_surfaces. They were unnecessary.
fix: rhbz#866929
At migration destination side, we need to restore the client's surfaces
state, before sending surfaces related messages.
Before this patch, we stopped the processing of only the cmd ring, till migration data
arrived.
However, some QXL_IOs require reading and rendering the cmd ring (e.g.,
update_area). Moreover, when the device is reset, after destroying all
surfaces, we assert (in qemu) if the cmd ring is not empty (see
rhbz#866929).
This fix makes the red_worker thread wait till the migration data arrives
(or till a timeout), and not process any input from the device after the
vm is started.
We try to inject an interrupt to the vm in this case, which we cannot do
if it is stopped. Instead log this and update when vm restarts.
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=870972
(that bz is on qemu, it will be cloned or just changed, not
sure yet)