VirtIO-fs using writeback cache seems very broken at the moment. If a
guest accesses a file (even just using 'touch'), that the host is
currently writing, the guest can permanently end up with a truncated
version of that file. Even subsequent operations like moving the file,
will not result in the correct file being visible, but just rename the
truncated one.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Add support for sharing directories with a guest VM.
virtio-fs needs virtiofsd to be started. In order to start virtiofsd
as a process (despite being a daemon it is does not run in the
background), a double-fork is used.
virtiofsd should close itself together with QEMU.
There are the parameters dirid and the optional parameters direct-io,
cache and writeback. Additionally the expose-xattr & expose-acl
parameter can be set to expose xattr & acl settings from the shared
filesystem to the guest system.
The dirid gets mapped to the path on the current node and is also used
as a mount tag (name used to mount the device on the guest).
example config:
```
virtiofs0: foo,direct-io=1,cache=always,expose-acl=1
virtiofs1: dirid=bar,cache=never,expose-xattr=1,writeback=1
```
For information on the optional parameters see the coherent doc patch
and the official gitlab README:
https://gitlab.com/virtio-fs/virtiofsd/-/blob/main/README.md
Also add a permission check for virtiofs directory access.
Add virtiofsd to the Recommends list for the qemu-server Debian
package, this allows users to opt-out of installing this package, e.g.
for certification reasons.
Signed-off-by: Markus Frank <m.frank@proxmox.com>
Link: https://lore.proxmox.com/20250407134950.265270-3-m.frank@proxmox.com
Tested-by: Lukas Wagner <l.wagner@proxmox.com>
[TL: squash d/control change and re-add Lukas' T-b, as nothing
essentially changed from the v16 where his tag applied]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This makes it a bit more obvious what happens and having an actual
error for bogus $PVE_MACHINE_VERSION entries.
Note that there was no auto-vivification before, as we never directly
accessed $PVE_MACHINE_VERSION->{$verstr}->{highest} but used
get_machine_pve_revisions to query a specific QEMU machine version's
PVE revisions and then operated on the return value, and that method
returns undef if there is no entry at all.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Fiona Ebner <f.ebner@proxmox.com> says:
Record the created fleecing images in the VM configuration to be able
to remove left-overs after hard failures.
Adds a new special configuration section 'fleecing', making special
section handling more generic as preparation, as well as fixing some
corner cases in configuration parsing and adding tests.
Fiona Ebner (16):
migration: remove unused variable
test: avoid duplicate mock module in restore config test
test: add parse config tests
parse config: be precise about section type checks
test: add test case exposing issue with unknown sections
parse config: skip unknown sections and warn about their presence
vzdump: anchor matches for pending and special sections
vzdump: skip all special sections
config: make special section handling generic
test: parse config: test config with duplicate sections
parse config: warn about duplicate sections
check type: require schema as an argument
config: add fleecing section
fix#5440: vzdump: better cleanup fleecing images after hard errors
migration: attempt to clean up potential left-over fleecing images
destroy vm: clean up potential left-over fleecing images
Link: https://lore.proxmox.com/20250127112923.31703-1-f.ebner@proxmox.com
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Collect special sections below a common 'special-sections' key in
preparation to introduce a new special section.
The special 'cloudinit' section was added in the top-level of the
configuration structure, but it's cleaner to group special sections
more similar to snapshots.
The 'cloudinit' key was already initialized, so having the new
'special-sections' key be always initialized should not cause issues
after checking and adapting all usages of 'cloudinit' which this patch
attempts to do.
Add compat handling for remote migration which might receive the
configuration hash from a node that does not yet have the changes.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250127112923.31703-10-f.ebner@proxmox.com
In preparation to add another option and to improve style for the
callers.
One of the test cases that specified $is_zero_initialized is for a
non-existent storage, so the option was not added there.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Reviewed-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Link: https://lore.proxmox.com/20250404133204.239783-16-f.ebner@proxmox.com
It's a bit nicer to avoid multiple hashes, this keeps the information
also closer together, albeit that alone wasn't the reason for doing
this because this (hopefully) will never grow that big, rather it
should encourage adding changes entries until we test for that and it
also makes handling these entries a tiny bit more natural as no
wrapper counter for-loops doing integer string interpolation are
required.
While at it expand the comment, might be even a tad to verbose now,
and add a new helper to get these per-machine revisions.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
So new guests, or guests with the 'latest' machine type, have that
setting automatically disabled.
The previous default (enabling S3/S4), does not make too much sense in
a virtual environment, and sometimes makes problems, e.g. Windows
defaults to using 'hybrid shutdown' and 'fast startup' when S4 is
enabled, which leads to NVIDIA vGPU being broken on the boot after
that.
Since the tests don't pin the pve version themselves, we have to
update all the ones where the machine versions are derived from the
installed QEMU version.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Reviewed-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Tested-By: Stoiko Ivanov <s.ivanov@proxmox.com>
Link: https://lore.proxmox.com/20250404125345.3244659-7-d.csapak@proxmox.com
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
When creating or updating guests with OS type windows, we want to pin
the machine version to a specific one. Since introduction of that
feature, we never bumped the pve machine version, so this was missing.
Append the pve machine version only if it's not 0 so we don't add that
unnecessarily.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Tested-By: Stoiko Ivanov <s.ivanov@proxmox.com>
Link: https://lore.proxmox.com/20250404125345.3244659-5-d.csapak@proxmox.com
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
When we don't have a specific machine version on a windows guest, we
use the creation meta info to pin the machine version. Currently we
always append the pve machine version from the current installed KVM
version, which is not necessarily the version we pinned the guest to.
Instead, use the same mechanism as for normal version pinned machines,
which use 'pve0'.
For non-windows machines, we use the current QEMU machine version so
we should use the pve machine version from that too, so that stays the
same.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Tested-By: Stoiko Ivanov <s.ivanov@proxmox.com>
Link: https://lore.proxmox.com/20250404125345.3244659-4-d.csapak@proxmox.com
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This patch is for enabling AMD SEV-SNP support.
Where applicable, it extends support for existing SEV(-ES) variables
to SEV-SNP. This means that it retains no-debug and kernel-hashes
options, but the no-key-sharing option is removed.
The default policy value is identical to QEMU’s, and the therefore
required option has been added to configure SMT support.
The code was tested by running a VM without SEV, with SEV, SEV-ES,
SEV-SNP. Each configuration was tested with and without an EFI disk
attached. For SEV-enabled configurations it was also verified that the
kernel actually used the respective feature.
Signed-off-by: Philipp Giersfeld <philipp.giersfeld@canarybit.eu>
Tested-by: Markus Frank <m.frank@proxmox.com>
Reviewed-by: Markus Frank <m.frank@proxmox.com>
Convert policy calculation to use shift operators and OR operation
instead of binary numbers and addition.
Signed-off-by: Philipp Giersfeld <philipp.giersfeld@canarybit.eu>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Markus Frank <m.frank@proxmox.com>
Reviewed-by: Markus Frank <m.frank@proxmox.com>
The format was dropped in QEMU binary version 2.2 with commit
550830f935 ("block: delete cow block driver").
Very old backups might still include this format as a hint (the data
in the backup is present in raw/chunk format in any case), but that is
not an issue. Restore already checks that the target storage supports
a given format and defaults to the default format of the storage if
the hint does not apply.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Acked-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Set the 'enable-migration' QEMU command-line flag to on for
live-migration marked mapped devices, as the default is 'auto', but
for those which are marked as capable for live-migration, we want to
explicitly enable that, so that we get an early error on start if the
driver does not support live-migration.
With that we can drop such devices from being a migration blocker.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
[TL: squash and re-order similar changes together]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
By passing the mapping config to assert_valid, not only the specific
mapping.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Reviewed-by: Christoph Heiss <c.heiss@proxmox.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Christoph Heiss <c.heiss@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Fixes device hotplug in combination with QEMU 9.2.
QEMU commit be93fd5372 ("qdev-monitor: avoid QemuOpts in QMP device_add")
notes:
> This patch changes the behavior of QMP device_add but not HMP
> device_add. QMP clients that sent incorrectly typed device_add QMP
> commands no longer work. This is a breaking change but clients should be
> using the correct types already.
The qemu_deviceadd() helper does not have the required type
information right now, so switch to using HMP, which still behaves the
same when passing a device commandline string. QEMU commit be93fd5372
fixes passing in complex properties via JSON, but the qemu_deviceadd()
helper never uses any such, as it already only received a string (and
naively split it up).
Use HMP for 'device_del' too, simply to keep the qemu_deviceadd() and
qemu_devicedel() helpers consistent.
Switching back to QMP using the correct types in the JSON can still be
done later. Unfortunately, 'qmp-query-schema' does not provide
device-specific types, so another way is needed.
A timeout of 25 seconds is used rather then relying on the low default
like before, since device plug operations require actions by the guest
kernel and might require IO. Device plug is often an interactive
operation, so a too high timeout could lead to bad UX. For now stay a
few seconds under the default timeout of 30 seconds of our web UI's
API request handler. Should specific devices need a higher timeout, it
can still be increased further for them in the future.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
[TL: reduce timeout from 30s to 25s to ensure sync API requests
(without a task worker) do not run into the frontend timeout]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
To allow re-using it for CD-ROM hotplug.
No functional change intended.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Tested-by: Friedrich Weber <f.weber@proxmox.com>
This was the only user of List::Util, so move that too.
No functional change intended.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Tested-by: Friedrich Weber <f.weber@proxmox.com>
Extract the logic for guest OS-type dependent machine version pinning
into a dedicated helper, so it can be re-used.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Starting from QEMU creation version 9.1, pin to the creation version
instead. Support for machine version 5.1 is expected to drop with QEMU
11.1 and it would still be good to handle Windows VMs that do not have
explicit machine version for whatever reason. For example, explicitly
setting the machine without a version on the CLI/API after creation is
one way to end up with such a machine.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Like this, it can be used by modules that cannot depend on
QemuServer.pm without creating a cyclic dependency.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
There are already other places where 'aarch64' and 'x86_64' are
checked to be the only valid architectures, for example, the
get_command_for_arch() helper, so the new error scenario for an
unknown arch should not cause any regressions.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Add an export, since the function is rather commonly used (in
particular inlined in function calls, where prefixing with the module
name would hurt readability) and there won't be much potential for
confusion name-wise.
This was the only user of stat(), so remove the File::stat include.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Cannot use the is_native_arch() helper inside the function anymore,
to avoid a cyclic dependency between the 'CPUConfig' and 'Helpers'
modules, inline it.
While at it, improve the variable name for the mapping.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
The file_size_info() will return the size of the image based on
guessing the format. When importing via API, the correct size is
already known, so it's better to pass it in. The root-only 'qm'
commands for disk import and OVF import will still use auto-detection
for backwards compatibility. It might make sense to be able to
explicitly specify the format there too to get the correct size in all
cases.
New callers should detect the size according to the appropriate format
first.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
First step towards using the storage layer format instead of the
extension based format from qemu_img_format() as a source of truth
everywhere. Currently, some callers use qemu_img_format() and some
use parse_volname().
For import, special handling is needed, because the format can be a
combined ova+$extracted_format.
There is a fallback for 'raw' format when no format is returned by the
storage layer for backwards compatibility, e.g. ISOs. Formats that are
not part of the $QEMU_FORMAT_RE are not allowed.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
There have been some reports about `qm start` timeouts on VMs that have a
lot of NICs assigned.
This patch considers the number of NICs when calculating the config-specific
timeout. Since the increase in start time is linearly related to the number
of NICs, a constant timeout increment per NIC was chosen.
Signed-off-by: Hannes Laimer <h.laimer@proxmox.com>
This patch is for enabling AMD SEV (Secure Encrypted Virtualization)
support in QEMU.
VM-Config-Examples:
amd_sev: type=std,no-debug=1,no-key-sharing=1
amd_sev: es,no-debug=1,kernel-hashes=1
kernel-hashes, reduced-phys-bits & cbitpos correspond to the variables
with the same name in QEMU.
kernel-hashes=1 adds kernel hashes to enable measured linux kernel
launch since it is per default off for backward compatibility.
reduced-phys-bios and cbitpos are system specific and are read out by
the query-machine-capabilities c program and saved to the
/run/qemu-server/host-hw-capabilities.json file. This file is parsed
and than used by qemu-server to correctly start a AMD SEV VM.
type=std stands for standard sev to differentiate it from sev-es (es)
or sev-snp (snp) when support is upstream.
QEMU's sev-guest policy gets calculated with the parameters no-debug
& no-key-sharing. These parameters correspond to policy-bits 0 & 1.
If type is 'es' than policy-bit 2 gets set to 1 to activate SEV-ES.
Policy bit 3 (nosend) is always set to 1, because migration features
for sev are not upstream yet and are attackable.
SEV-ES is highly experimental since it could not be tested.
see coherent doc patch
Signed-off-by: Markus Frank <m.frank@proxmox.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
when 'import-from' contains a disk image that needs extraction
(currently only from an 'ova' archive), do that in 'create_disks'
and overwrite the '$source' volid.
Collect the names into a 'delete_sources' list, that we use later
to clean it up again (either when we're finished with importing or in an
error case).
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Since pve-common commit:
eff5957 (sysfstools: file_write: properly catch errors)
this check here fails now when the reset does not work. It turns out
that resetting the device is not always necessary, and we previously
ignored most errors when trying to do so.
To restore that functionality, downgrade this `die` to a warning.
If the device really needs a reset to work, it will either fail later
during startup, or not work correctly in the guest, but that behavior
existed before and is AFAIK not really detectable from our side.
Also improve the warning message a bit to not scare users and explain
that we're continuing.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
[ TL: fine-tune error message a bit and avoid parenthesis ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>