From 0b50d3d29f9b7889a496ef8ffa22dcdcb1875038 Mon Sep 17 00:00:00 2001
From: Fiona Ebner
Date: Thu, 25 Jul 2024 14:32:26 +0200
Subject: [PATCH] resume: bump timeout for query-status

As reported in the community forum [0], after migration, the VM might
not immediately be able to respond to QMP commands, which means the VM
could fail to resume and stay in paused state on the target.

The reason is that activating the block drives in QEMU can take a bit
of time. For example, it might be necessary to invalidate the caches
(where for raw devices a flush might be needed) and the request
alignment and size of the block device need to be queried.

In [0], an external Ceph cluster with krbd is used, and the initial
read to the block device after migration, for probing the request
alignment, takes a bit over 10 seconds. Use 60 seconds as the new
timeout to be on the safe side for the future.

All callers are inside workers or via the 'qm' CLI command, so bumping
beyond 30 seconds is fine.

[0]: https://forum.proxmox.com/threads/149610/

Signed-off-by: Fiona Ebner
---
 PVE/QemuServer.pm | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index f9ec0a84..7d34325c 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -6459,7 +6459,9 @@ sub vm_resume {
     my ($vmid, $skiplock, $nocheck) = @_;
 
     PVE::QemuConfig->lock_config($vmid, sub {
-	my $res = mon_cmd($vmid, 'query-status');
+	# After migration, the VM might not immediately be able to respond to QMP commands, because
+	# activating the block devices might take a bit of time.
+	my $res = mon_cmd($vmid, 'query-status', timeout => 60);
 	my $resume_cmd = 'cont';
 	my $reset = 0;
 	my $conf;