This commit fixes enough endianness issues that it's possible to
connect to a spice-server/qemu running on a big-endian box with a client
running on a little-endian machine.
I haven't tested more than getting to the bios/bootloader and typing a
bit on the keyboard as I did not manage to boot a distro afterwards :(
This is based on patches send by Erlon R. Cruz
<erlon.cruz@br.flextronics.com>
The multimedia time is defined by the server side monotonic time [1],
but the drawing time-stamp is done in guest side, so it requires
synchronization between host and guest. This is expensive, when no audio
is playing, there is a ~30x/sec wakeup to update the qxl device mmtime,
and it requires marking dirty the rom region.
Instead, the video timestamping can be done more efficiently on server
side, without visible drawbacks.
[1] a better timestamp could be the audio time, since audio players are
usually sync with audio time)
Related to:
https://bugzilla.redhat.com/show_bug.cgi?id=912763
If the client advertises the SASL cap, it means it guarantees it will be
able to use SASL if the server supports, and that it does not need a valid
SpiceLinkReply::pub_key field when using SASL.
When the client cap is set, we thus don't need to create a RSA public key
if SASL is enabled server side.
The reason for needing client guarantees about not looking at the pub_key
field is that its presence and size is hardcoded in the protocol, but in
some hardened setups (using fips mode), generating a RSA 1024 bit key as
expected is forbidden and fails. With this new capability, the server
knows the client will be able to handle SASL if needed, and can skip
the generation of the key altogether. This means that on the setups
described above, SASL authentication has to be used.
During seamless migration, after switching host, if a client was connected
during the migration, it will have data to send back to the new
qemu/spice-server instance. This is handled through MIGRATE_DATA messages.
SPICE char devices use such MIGRATE_DATA messages to restore their state.
However, the MIGRATE_DATA message can arrive any time after the new qemu
instance has started, this can happen before or after the SPICE char
devices have been created. In order to handle this, if the migrate data
arrives early, it's stored in reds->agent_state.mig_data, and
attach_to_red_agent() will restore the agent state as appropriate.
Unfortunately this does not work as expected, for main
channel (agent messages).
If attach_to_red_agent() is called before the MIGRATE_DATA
message reaches the server, all goes well,
but if MIGRATE_DATA reaches the server before
attach_to_red_agent() gets called, then some assert() gets
triggered in spice_char_device_state_restore():
((null):32507): Spice-ERROR **: char_device.c:937:spice_char_device_state_restore: assertion `dev->num_clients == 1 && dev->wait_for_migrate_data' failed
Thread 3 (Thread 0x7f406b543700 (LWP 32543)):
Thread 2 (Thread 0x7f40697ff700 (LWP 32586)):
Thread 1 (Thread 0x7f4079b45a40 (LWP 32507)):
When restoring state, a client must already be added to the
spice-char-device.
What happens is that a client is not being added to the char-device
when when MIGRATE_DATA arrives first, which leaves both
dev->num_clients and dev->wait_for_migrate_data value at 0.
This commit changes the logic in spice_server_char_device_add_interface(),
such that if there is migrate data pending in reds->agent_state.mig_data
but no client was added to the spice-char-device yet,
then first the client is added to the device by calling
spice_char_device_client_add(), and only then the state is restored.
=== How to Reproduce
To reproduce, add delays to the migration connection between
qmeu-kvm on the source host (SRC) and on the destination (DST).
Specifically I added a man in the middle DLY host between
migration ports from SRC to DST.
+-----+ +-----+ +-----+
| SRC |--> | DLY | --> | DST |
+-----+ +-----+ +-----+
DLY listens on port P1 (e.g. 4444) and DST listens on port
PINCOMING (e.g. 4444, from qemu-kvm '-incoming' command line option)
Precondition: make sure port P1 on DLY is accessible in iptables.
Option 1: use ssh tcp port forwarding
On DLY host run ssh:
ssh DLY:P1:DST:PINCOMING DST
Then use the following migration command (on qemu-kvm monitor):
client_migrate_info spice DST PSPICE
migrate -d tcp:DLY:P1
Option 2: Use a simple proxy program that forwards
packets from SRC to DST while adding some delays.
The program runs on DLY, listens to port D1, upon
accept connects to DST:PINCOMING and forward all
packets from DLY:D1 to DST:PINCOMING.
Then use the same migrate command as in option 1:
client_migrate_info spice DST PSPICE
migrate -d tcp:DLY:P1
=== How to Reproduce Ends
This fixes https://bugzilla.redhat.com/show_bug.cgi?id=1035184
Based-on-a-patch-by: Christophe Fergeau <cfergeau@redhat.com>
In reds_send_link_ack(), lookup the channel with the same id as the link
message.
The bug was found during code review a while ago.
A reproducer bug was later reported:
https://bugzilla.redhat.com/show_bug.cgi?id=1058625
9feed69 moved the async reader code to RedsStream so that it can be used
for the SASL authentication code. In particular, it introduced a
RedsStream::async_read member which is used by the SASL authentication code
for its async operations.
However, what was not done is to remove the now redundant
RedLinkInfo::async_read field. This causes failures when using SASL
authentication as the async read error callback is getting set
on the RedLinkInfo::async_read structure, but then the SASL code is trying
to use the RedeStream::async_read structure for its async IOs, which do not
have the needed error callback set.
This commit makes use of the newly introduced reds_stream_async_read()
helper in order to make use of RedsStream::async_read.
This can fail in fips mode for example. If we ignore the failure, we'll get
a crash:
#0 0x00007f38d63728a0 in BN_num_bits () from /lib64/libcrypto.so.10
#1 0x00007f38d639661d in RSA_size () from /lib64/libcrypto.so.10
#2 0x00007f38d7991762 in reds_handle_read_link_done () from /lib64/libspice-server.so.1
#3 0x00007f38d7990c06 in spice_server_add_client () from /lib64/libspice-server.so.1
#4 0x00007f38d7990c6a in reds_accept () from /lib64/libspice-server.so.1
#5 0x00007f38dc0d2946 in qemu_iohandler_poll (pollfds=0x7f38dedce200, ret=755449965, ret@entry=1) at iohandler.c:143
#6 0x00007f38dc0d6ea8 in main_loop_wait (nonblocking=<optimized out>) at main-loop.c:465
#7 0x00007f38dbffd7c0 in main_loop () at vl.c:1988
#8 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at vl.c:4357
This commit will cause the client connection to fail but qemu won't
segfault.
For example, with qemu, a webdav channel can be created this way:
-chardev spiceport,name=org.spice-space.webdav.0,...
And redirected to a virtio port:
-device virtserialport,...,name=org.spice-space.webdav.0
SASL authentication mostly use members from RedsStream to do its work, so
it makes sense to have its code in reds_stream.c. This should allow to make
RedsStream::sasl private in the future.
The AsyncRead structure in reds.h wraps an async read + callback to
be done on a stream. Moving it to reds_stream.h is needed in order
to move SASL authentication there.
Now that stream creation and SSL enabling are done by helpers
in reds_stream.c, we can move the initialization of the vfunc
read/write pointers there too.
Initializing a new stream means initializing quite a few fields.
This commit factors this initialization in a dedicated reds_stream_new
helper. This also helps moving more code from reds.c to reds_stream.c
When creating a TLS socket, both spice-server and spice-gtk currently
call SSL_CTX_new(TLSv1_method()). The TLSv1_method() function set the
protocol version to TLS 1.0 exclusively. The correct way to support
multiple protocol versions is to call SSLv23_method() in spite of its
scary name. This method will enable all SSL/TLS protocol versions. The
protocol suite may be further narrowed down by setting respective
SSL_OP_NO_<version_code> options of SSL context. This possibility is
used in this patch in order to block use of SSLv3 that is enabled by
default in openssl for client sockets as of now but spice has never used
it.
reds_handle_ticket uses a fixed size 'password' buffer for the decrypted
password whose size is SPICE_MAX_PASSWORD_LENGTH. However,
RSA_private_decrypt which we call for the decryption expects the
destination buffer to be at least RSA_size(link->tiTicketing.rsa)
bytes long. On my spice-server build, SPICE_MAX_PASSWORD_LENGTH
is 60 while RSA_size() is 128, so we end up overflowing 'password'
when using long passwords (this was reproduced using the string:
'fullscreen=1proxy=#enter proxy here; e.g spice_proxy = http://[proxy]:[port]'
as a password).
When the overflow occurs, QEMU dies with:
*** stack smashing detected ***: qemu-system-x86_64 terminated
This commit ensures we use a corectly sized 'password' buffer,
and that it's correctly nul-terminated so that we can use strcmp
instead of strncmp. To keep using strncmp, we'd need to figure out
which one of 'password' and 'taTicket.password' is the smaller buffer,
and use that size.
This fixes rhbz#999839
It's depending on an unmaintained package (slirp), and I don't
think anyone uses that code. It's not tested upstream nor in fedora,
so let's remove it.
When we want to disconnect the main channel from a callback, it is
safer to use red_channel_client_shutdown, instead of directly
destroying the client. It is also more consistent with how other
channels treat errors.
red_channel_client_shutdown will trigger socket error in the main channel.
Then, main_channel_client_on_disconnect will be called,
and eventually, main_dispatcher_client_disconnect.
I didn't replace calls to reds_disconnect/reds_client_disconnect in
places where those calls were safe && that might need immediate client
disconnection.
Fixes rhbz#918169
Some channels make direct calls to reds/main_channel routines. If
these routines try to read/write to the socket, and they get socket
error, main_channel_client_on_disconnect is called, and triggers
red_client_destroy. In order to prevent accessing expired references
to RedClient, RedChannelClient, or other objects (inside the original call, after
red_client_destroy has been called) I made the call to
red_client_destroy asynchronous with respect to main_channel_client_on_disconnect.
I added MAIN_DISPATCHER_CLIENT_DISCONNECT to main_dispatcher.
main_channel_client_on_disconnect pushes this msg to the dispatcher,
instead of calling directly to reds_client_disconnect.
The patch uses RedClient ref-count in order to handle a case where
reds_client_disconnect is called directly (e.g., when a new client connects while
another one is connected), while there is already CLIENT_DISCONNECT msg
pending in the main_dispatcher.
Examples:
(1) snd_worker.c
snd_disconnect_channel()
channel->cleanup() //snd_playback_cleanup
reds_enable_mm_timer()
.
.
main_channel_push_multi_media_time()...socket_error
.
.
red_client_destory()
.
.
snd_disconnect_channel()
channel->cleanup()
celt051_encoder_destroy()
celt051_encoder_destory() // double release
Note that this bug could have been solved by changing the order of
calls: e.g., channel->stream = NULL before calling cleanup, and
some other changes + reference counting. However, I found other
places in the code with similar problems, and I looked for a general
solution, at least till we redesign red_channel to handle reference
counting more consistently.
(2) inputs_channel.c
inputs_connect()
main_channel_client_push_notify()...socket_error
.
.
red_client_destory()
.
.
red_channel_client_create() // refers to client which is already destroyed
(3) reds.c
reds_handle_main_link()
main_channel_push_init() ...socket error
.
.
red_client_destory()
.
.
main_channel_client_start_net_test(mcc) // refers to mcc which is already destroyed
This can explain the assert in rhbz#964136, comment #1 (but not the hang that occurred before).
It's not always obvious what address spice-server will bind to,
in particular when the 'addr' parameter is omitted on QEMU
commandline. The decision of what address to bind to is made
in reds_init_socket with a call to getaddrinfo. Surprisingly,
that function had a call to getnameinfo() already, but it does
not seem to be using the result of that call in any way.
This commit moves this call after the socket is successfully bound
and add a log message to indicate which address it's bound to.
This bug results in the client dropping all the video frames after
migration in case that (1) the hosts involved in migration have different
mm-time; and that (2) there is no audio playback.
This is relvant only for the client that was connected during the
migration.
rhbz#958276
When there is no audio playback, we set the mm_time in the client to be older
than the one in the server by at least the requested latency (the delta is
actually bigger, due to the network latency).
When there is an audio playback, we adjust the mm_time in the client by
adjusting the playback buffer using SPICE_MSG_PLAYBACK_LATENCY.
This is clearly something which should be handled in the inputs_channel code,
rather then having a special case for it in the generic channel handling
code in reds.c. Moving it here also fixes the TODO we had on only sending
this message to new clients.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Currently main_channel_push_notify only gets passed a static string, but
chances are in the future it may get passed dynamically allocated strings,
prepare it for this.
While at it also make clear that its argument is a string, and simplify
things a bit by making use of this knowledge (pushing the strlen call down).
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Client -> agent messages can spawn multiple VDIChunks. When this happens
the agent re-assembles the chunks into a complete VDAgentMessage before
processing it. The server only guarentees coherency at the chunk level,
so it is not possible for a partial chunk to get delivered to the agent.
But it is possible for some chunks of a VDAgentMessage to be delivered to
the agent followed by a client to disconnect without the rest of the
VDAgentMessage being delivered!
This will leave the agent in a wrong state, and the first messages send to it
by the next client to connect will get seen as the rest of the VDAgentMessage
from the previous client.
This patch sends the agent a new VD_AGENT_CLIENT_DISCONNECTED message from the
VDP_SERVER_PORT, on which the agent can then reset its VDP_CLIENT_PORT state.
Note that no capability check is done for this, since the capabilities are
something negotiated between client and agent. The server will simply always
send this message on client disconnect, relying on older agents discarding the
message since it has an unknown type (which both the windows and linux agents
already do).
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
reds.c is using strncpy with a length one byte less than the
destination buffer size, and is relying on the fact that the
destination buffers are static global variables.
Now that we depend on glib, we can use g_strlcpy instead, which
avoids relying on such a subtle trick to get a nul-terminated
string.
We currently output a warning when getaddrinfo fails, but then
we go on trying to use the information it couldn't read. Make
sure we bail out of reds_init_socket if getaddrinfo fails.