netdev_add is now a proper qmp command, which means that it verifies
the parameter types properly
instead of sending strings, we now have to choose the correct
types for the parameters
bool for vhost
and uint64 for queues
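
For illustration, a minimal Python sketch of the conversion described above. The helper name and the example option values are hypothetical and are not the code touched by this patch:

```python
import json


def netdev_add_args(opts):
    """Convert string-valued options to typed QMP arguments (hypothetical helper)."""
    args = dict(opts)
    if "vhost" in args:
        # QMP expects a real boolean here, not the string "on"/"off"
        args["vhost"] = args["vhost"] in ("on", "1", "true", True)
    if "queues" in args:
        # QMP expects an unsigned integer here, not the string "4"
        args["queues"] = int(args["queues"])
    return args


cmd = {
    "execute": "netdev_add",
    "arguments": netdev_add_args(
        {"type": "tap", "id": "net0", "vhost": "on", "queues": "4"}
    ),
}
print(json.dumps(cmd))
```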
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
the special case was dropped when moving this to pve-storage.
fixes commit c6d517835a
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
More API calls will follow for this path, for now add the 'index' call to
list all custom and default CPU models.
Any user can list the default CPU models, as these are public anyway, but
custom models are restricted to users with Sys.Audit on /nodes.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Explicitly allows changing other properties than the cputype, even if
the currently set cputype is not accessible by the user. This way, an
administrator can assign a custom CPU type to a VM for a less privileged
user without breaking edit functionality for them.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
fixing the following two issues:
- the legacy code path was never converted to the new fork_tunnel
signature (which probably means that nothing triggers it in practice
anymore?)
- the NBD Unix socket got forwarded multiple times if more than one disk
was migrated via NBD (this is harmless, but wrong)
for the second issue I opted to keep the code compatible with the
possibility that Qemu starts supporting multiple NBD servers in the
future (and the target node could thus return multiple UNIX socket
paths). currently we can only start one NBD server on one socket, and
each drive-mirror simply starts a new connection over that single
socket.
I took the liberty of renaming the variables/keys since I found
'tunnel_addr' and 'sock_addr' rather confusing.
Reviewed-By: Mira Limbeck <m.limbeck@proxmox.com>
Tested-By: Mira Limbeck <m.limbeck@proxmox.com>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
fixes commit 940e2a3a06
QEMU 4.1 will fail to start a guest with an audio device set with:
> Property '.audiodev' not found
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
If /dev/hwrng exists, but no actual generator is connected (or it is
disabled on the host), QEMU will happily start the VM but crash as soon
as the guest accesses the VirtIO RNG device.
To prevent this unfortunate behaviour, check if a useable hwrng is
connected to the host before allowing the VM to be started.
While at it, clean up config_to_command by moving new and existing rng
source checks to a separate sub.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
redirecting to the saved STDOUT in case of a template backup or a VM
without any disks failed because of the erroneous '=':
Backup of VM 123123 failed - command '/usr/bin/vma create -v -c [...]' failed:
Bad filehandle: =5 at /usr/share/perl/5.28/IPC/Open3.pm line 58.
https://forum.proxmox.com/threads/vzdump-to-stdout.69364
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
It's possible to have a VM with OVMF but without an efidisk, so don't
call parse_drive on a potential undef value.
Partial revert of 818c3b8d91 ("cfg2cmd: ovmf: code cleanup")
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
and move the lock call and decision logic closer together
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Tested-by: Fabian Ebner <f.ebner@proxmox.com>
This reverts commit b5490d8a98.
When resizing a volume of a running VM, a qmp block_resize command
is issued. This is non-blocking, so the size on the storage immediately
after issuing the command might still be the old one.
This is part of the issue reported in bug #2621.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
we really only want to rescan the disk size of the disks we actually
need, and those are only the local disks (for which we have to allocate
the correct size on the target)
also we want to always skip the efidisk, since we get the wanted
size after the loop, and this produced a confusing log line
(for details why we do not want the 'real' size,
see commit 818ce80ec1)
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
by avoiding auto-vivification of $self->{online_local_volumes} via
iteration. most code paths don't care whether it's undef or a reference
to an empty list, but this caused the (already) fixed bug of calling
nbd_stop without having started an NBD server in the first place.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
lock_file is used by PVE::QemuServer::Memory, but it does properly 'use
PVE::Tools ...' itself so we can drop them in the main module.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
regex to reduce the code duplication, as archive_info and
decompressor_info provide the same information as well.
Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
VM was can be true for stop mode backup, we cannot check the "is VM
currently running" as that doesn't tell us anything (could be the
backup process), so check the mode as well.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
as the nbd server could have been stopped by something else.
Further, it makes no sense to die and mark the migration thus as
failed, just because of a NBD server stop issue.
At this point the migration hand off to the target was done already,
so normally we're good, if it fails we have other (followup) problems
anyway.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
and refactor the test_volid closure. Like this get_replicatable_volumes doesn't
need a separate loop for unused volumes anymore. get_vm_volumes, which is used
for activation/deactivation of volumes at migration and deactivation in vm_stop_cleanup,
includes those volumes now. For migration it's an improvement, because those volumes
might need to be migrated and for vm_stop_cleanup it shouldn't hurt. The last user
of foreach_volid is check_vm_disks_local used by migrate_vm_precondition,
where information about the additional volumes doesn't hurt either.
Note that replicate is (still) set by default, so the behavior for
get_replicatable_volumes for unused volumes should not change.
Hibernation vmstate files are now also included and recognized as 'is_vmstate'.
The 'size' attribute will not be overwritten by subsequent iterations for the
same volid anymore (a volid may appear both in the config and in snapshots),
so the size from the current config is now preferred.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
when a backup includes a cloudinit disk on a non-existent storage,
the restore fails with 'storage' does not exist
this happens because we want to get the format of the disk, by
checking the source storage
we fix this by using the target storage first and only the source as
fallback
this will still fail if neither storage exists
(which is ok, since we cannot restore then anyway)
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Some OVF files do not declare 'rasd' as a default namespace (in the
top level Envelope element), but inline in each element (e.g.
<rasd:HostResource xmlns:rasd="foo">...</rasd:HostResource>)
This trips up our relative findvalue with
> XPath error : Undefined namespace prefix
To avoid this, search in the global XPathContext (where we register
those namespaces ourselves) and pass the item_node as context
parameter.
This works then for both cases
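
The idea can be sketched in Python with xml.etree.ElementTree, which is only an analogue of the Perl XML library used here. The helper name and the sample documents are made up for illustration; the point is that the prefix is resolved from a mapping we supply ourselves, so it does not matter where the document declares it:

```python
import xml.etree.ElementTree as ET

# Namespace URI used for 'rasd' in the sample documents below
RASD = "http://schemas.dmtf.org/wbem/wscim/1/cim-schema/2/CIM_ResourceAllocationSettingData"
# Prefix mapping registered by us, independent of the document's declarations
NAMESPACES = {"rasd": RASD}

# Case 1: prefix declared on the top-level Envelope element
TOP_LEVEL = (
    '<Envelope xmlns:rasd="%s"><Item>'
    '<rasd:HostResource>ovf:/disk/vmdisk1</rasd:HostResource>'
    '</Item></Envelope>' % RASD
)
# Case 2: prefix declared inline on the element itself
INLINE = (
    '<Envelope><Item>'
    '<rasd:HostResource xmlns:rasd="%s">ovf:/disk/vmdisk1</rasd:HostResource>'
    '</Item></Envelope>' % RASD
)


def host_resource(ovf_xml):
    """Look up rasd:HostResource relative to the Item node (hypothetical helper)."""
    root = ET.fromstring(ovf_xml)
    item = root.find("Item")
    return item.findtext("rasd:HostResource", namespaces=NAMESPACES)


print(host_resource(TOP_LEVEL))
print(host_resource(INLINE))
```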
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
this is only used for migration via 'qm mtunnel', regular users should
never need to resume a VM that does not logically belong to the node it
is running on
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
by counting only local volumes that will be live-migrated via qemu_drive_mirror,
i.e. those listed in $self->{online_local_volumes}.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
With Qemu 4.2 a new `audiodev` property was introduced [0] to explicitly
specify the backend to be used for the audio device. This is accompanied
with a warning that the fallback to the default audio backend is
deprecated.
[0] https://wiki.qemu.org/ChangeLog/4.2#Audio
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
If storage_migrate dies, the error message might not include the
volume ID or the target storage ID, but those might be good to know.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
This makes it possible to migrate a VM with volumes store1:vm-123-disk-0
store2:vm-123-disk-0 to some targetstorage. Also prevents migration failure
when there is an orphaned disk with the same volid on the target.
To avoid confusion, the name should not change for 'vmstate'-volumes.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
It was necessary to move foreach_volid back to QemuServer.pm
In VZDump/QemuServer.pm and QemuMigrate.pm the dependency on
QemuConfig.pm was already there, just the explicit "use" was missing.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Upstream marks these as having a micro-version of >=90, unfortunately the
machine versions are bumped earlier so testing them is made unnecessarily
difficult, since the version checking code would abort on migrations etc...
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
[ Thomas: do so refactor ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
so that pve-container and qemu-server use the same one, in preparation
for moving it to JSONSchema and having a bridgepair format.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Can be specified for a particular VM or via a custom CPU model (VM takes
precedence).
QEMU's default limit only allows up to 1TB of RAM per VM. Increasing the
physical address bits available to a VM can fix this.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
If a cputype is custom (check via prefix), try to load options from the
custom CPU model config, and set values accordingly.
While at it, extract the currently hardcoded values into a separate sub and
document the reasoning behind them.
Since the new flag resolving outputs flags in sorted order for
consistency, adapt the test cases to not break. Only the order is
changed, not which flags are present.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Reviewed-By: Fabian Ebner <f.ebner@proxmox.com>
Tested-By: Fabian Ebner <f.ebner@proxmox.com>
To avoid hardcoding even more CPU-flag related things for custom CPU
models, introduce a dynamic approach to resolving flags.
resolve_cpu_flags takes a list of hashes (as documented in the
comment) and resolves them to a valid "-cpu" argument without
duplicates. This also helps by providing a reason why specific CPU flags
have been added, and thus allows for useful warning messages should a
flag be overwritten by another.
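
A simplified Python sketch of the resolving approach. The input shape (flag name mapped to an 'op' and a 'reason') is an assumption made for illustration, and the real sub is written in Perl and may differ in detail:

```python
def resolve_cpu_flags(*flag_sets):
    """Merge flag dicts in order; later sets override earlier ones.

    Each set maps a flag name to {"op": "+" or "-", "reason": str}.
    This shape is assumed for the sketch.
    """
    resolved = {}
    warnings = []
    for flags in flag_sets:
        for name, flag in flags.items():
            old = resolved.get(name)
            if old is not None and old["op"] != flag["op"]:
                # The recorded reasons make the warning actionable
                warnings.append(
                    "CPU flag %s%s (%s) overwrites %s%s (%s)"
                    % (flag["op"], name, flag["reason"],
                       old["op"], name, old["reason"])
                )
            resolved[name] = flag
    # Sorted output keeps the resulting argument stable and duplicate-free
    arg = ",".join(resolved[n]["op"] + n for n in sorted(resolved))
    return arg, warnings


defaults = {
    "pcid": {"op": "+", "reason": "default"},
    "aes": {"op": "+", "reason": "default"},
}
vm_config = {"pcid": {"op": "-", "reason": "VM config"}}
print(resolve_cpu_flags(defaults, vm_config))
```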
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Reviewed-By: Fabian Ebner <f.ebner@proxmox.com>
Tested-By: Fabian Ebner <f.ebner@proxmox.com>
Just like with live-migration, custom CPU models might change after a
snapshot has been taken (or a VM suspended), which would lead to a
different QEMU invocation on rollback/resume.
Save the "-cpu" argument as a new "runningcpu" option into the VM conf
akin to "runningmachine" and use as override during rollback/resume.
No functional change with non-custom CPU types intended.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
This is required to support custom CPU models, since the
"cpu-models.conf" file is not versioned, and can be changed while a VM
using a custom model is running. Changing the file in such a state can
lead to a different "-cpu" argument on the receiving side.
This patch fixes this by passing the entire "-cpu" option (extracted
from /proc/.../cmdline) as a "qm start" parameter. Note that this is
only done if the VM to migrate is using a custom model (which we can
check just fine, since the <vmid>.conf *is* versioned with pending
changes), thus not breaking any live-migration directionality.
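
A minimal Python sketch of extracting the option from a NUL-separated cmdline buffer, as found in /proc/<pid>/cmdline. The function name and the sample buffer are hypothetical:

```python
def extract_cpu_arg(cmdline):
    """Return the value following '-cpu' in a NUL-separated cmdline, or None."""
    args = cmdline.rstrip(b"\0").split(b"\0")
    for i, arg in enumerate(args[:-1]):
        if arg == b"-cpu":
            return args[i + 1].decode()
    return None


# Sample buffer in the layout of /proc/<pid>/cmdline (values made up)
sample = b"/usr/bin/kvm\0-id\0100\0-cpu\0kvm64,+aes\0-m\0512\0"
print(extract_cpu_arg(sample))
```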
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
in addition to printing it. preparation for remote cluster migration,
where we want to return this in a structured fashion over the migration
tunnel instead of parsing stdout via SSH.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
into one sub that retrieves the local disks, and the actual NBD
allocation. that way, remote incoming migration can just call the NBD
allocation with a custom list of volume names/storages/..
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
both were previously missing. the existing 'check_storage_access'
helper is not applicable here since it operates on a full set of VM
config options, not just storage IDs.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
the syntax is backwards compatible, providing a single storage ID or '1'
works like before. the new helper ensures consistent behaviour at all
call sites.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
485449e37 ("qmp: use migrate-set-parameters in favor of deprecated options")
changed the initial "migrate_set_downtime" QMP call to the more recent
"migrate-set-parameters", but forgot to do so for the auto-increase code
further below.
Since the units of the two calls don't match, this would have caused the
auto-increase to increase the limit to absurd levels as soon as it kicked
in (ms treated as s).
Update the second call to the new version as well, and while at it remove
the unnecessary "defined()" check for $migrate_downtime, which is always
initialized from the defaults anyway.
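
To make the unit mismatch concrete, a small Python sketch that builds both QMP payloads. The helper names are hypothetical; the deprecated command takes seconds, while the "downtime-limit" parameter takes milliseconds:

```python
def legacy_downtime_cmd(seconds):
    """Deprecated call: the value is interpreted as seconds."""
    return {"execute": "migrate_set_downtime", "arguments": {"value": seconds}}


def downtime_cmd(milliseconds):
    """Current call: 'downtime-limit' is interpreted as milliseconds."""
    return {
        "execute": "migrate-set-parameters",
        "arguments": {"downtime-limit": milliseconds},
    }


limit_ms = 100
# Correct: 100 ms stays 100 ms
print(downtime_cmd(limit_ms))
# The bug: the same number sent to the old call would mean 100 seconds
print(legacy_downtime_cmd(limit_ms))
```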
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
as preparation of targetstorage mapping and remote migration. this also
removes re-using of the $local_volumes hash in the original code.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
to start breaking up vm_start before extending parts for new migration
features like storage and network mapping.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
as preparation for refactoring it further. remote migration will add
another 1-2 parameters, and it is already unwieldy enough as it is.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
to also handle cases where disk allocation failed in the remote
vm_start, and we only have a bitmap but no target drive information.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
on storages where the minimum size of images is bigger than the real
OVMF_VARS.fd file, they get padded to their minimum size
when using such an image, qemu maps it fully to the vm, but the efi
does not find the vars region and creates a file on the first efi
partition it finds
this breaks some settings in the ovmf, such as resolution
to fix this, we have to specify the size for the pflash, so that
qemu only maps the first n bytes in the vm (this only works for
raw files, not for qcow2)
we also have to use the correct size when converting between storages
in 'clone_disk' (used for move disk and cloning vms) and when
live migrating to different storages
when we now expect that the source image is always correctly used/created
(e.g. raw with size=x in pflash argument) then we always create the
target correctly
when encountering users which have an invalid image (e.g. an efidisk
moved from zfs to qcow2 before this patch), we have to tell them to
recreate the efidisk and the settings on it
we have to version_guard it to 4.1+pve2 (since we haven't bumped yet
since the change to pve2)
also add 2 tests, one for the old version and one for the new
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Stefan Reiter <s.reiter@proxmox.com>
Reviewed-by: Stefan Reiter <s.reiter@proxmox.com>
[ Thomas: rebased to master ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
since bitmaps are set early on, and 'qm start' potentially has allocated
the disks but still failed. we can only clean up what we know about
anyway, so the disk part is still only best effort.
also use replicated_volumes instead of bitmap existence to check for
replicated volumes, since 'qm start' on an old node that does not
understand replicated volumes might have allocated a new volume that we
DO want to clean up, and not skip.
also cleanup disks after stopping target VM, otherwise we might end up
in a situation where the target VM is still running and using the disks,
thus blocking the disk cleanup.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
by only checking for replicatable volumes when a replication job is
defined, and passing only actually replicated volumes to the target node
via STDIN, and back via STDOUT.
otherwise this can pick up theoretically replicatable, but not actually
replicated volumes and treat them wrong.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
fixes commit 0b2f574b4c
enforce_vm_running_for_backup now has no return value. For PBS, I
forgot to remove a now outdated call to handle_vm_powerstate; drop
that.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
$cpu_fmt is being reused for custom CPUs as well as VM-specific CPU
settings. The "pve-vm-cpu-conf" format is introduced to verify a config
specifically for use as VM-specific settings.
"pve-cpu-conf" is registered for use in custom CPU API calls (where no
additional checks are required).
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Turn CPUConfig into a SectionConfig with parsing/writing support for
custom CPU models. IO is handled using cfs.
Namespacing will be provided using "custom-" prefix for custom model
names (in VM config only, cpu-models.conf will contain unprefixed
names).
Includes two overrides to avoid writing redundant information to the
config file, additionally get_custom_model is used to retrieve a custom
model configuration by name.
Resolve custom names in print_cpu_device when a custom cpu is passed.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
There is a need to set $noerr, because otherwise migration for a
VM with a non-replicatable volume fails with:
missing replicate feature on volume 'myfs:107/vm-107-disk-2.raw'
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
as we need at least pve-qemu in 4.2 for this to work, the target side
is implicitly checked with the "too old version" check for migrate or the
mirror will fail anyway.
Just use the simple "qemu binary version check", as we could still
live migrate an older snapshot with older machine versions if both
sides have a recent enough qemu.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
with incremental drive-mirror and dirty-bitmap tracking.
1.) get replicated disks that are currently referenced by running VM
2.) add a block-dirty-bitmap to each of them
3.) replicate ALL replicated disks
4.) pass bitmaps from 2) to drive-mirror for disks from 1)
5.) skip replicated disks when cleaning up volumes on either source or
target
added error handling is just removing the bitmaps if an error occurs at
any point after 2, except when the handover to the target node has
already happened, since the bitmaps are cleaned up together with the
source VM in that case.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Tested-by: Stefan Reiter <s.reiter@proxmox.com>
to make migration logs a bit easier to grasp with a quick glance.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Tested-by: Stefan Reiter <s.reiter@proxmox.com>
by re-using a dirty bitmap that represents changes since the divergence
of source and target volume. requires a qemu that supports incremental
drive-mirroring, and will die otherwise.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Tested-by: Stefan Reiter <s.reiter@proxmox.com>
Moved code so that initialization of drivedesc_hash stays a single block.
Avoid auto-vivification in parse_drive.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
E.g.: If a feature requires 4.1+pveN and we're using machine version 4.2
we don't need to increase the pve version to N (4.2+pve0 is enough).
We check this by doing a min_version call against a non-existent higher
pve-version for the major/minor tuple we want to test for, which can
only work if the major/minor alone is high enough.
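
A rough Python sketch of the trick. The function names and the max_pve parameter (the highest pve-version known for the major/minor tuple) are assumptions for illustration, not the actual Perl helpers:

```python
import re


def parse_version(verstr):
    """Parse 'X.Y[.Z][+pveN]' into (major, minor, pve); a missing pve counts as 0."""
    m = re.match(r"^(\d+)\.(\d+)(?:\.\d+)?(?:\+pve(\d+))?", verstr)
    if not m:
        raise ValueError("cannot parse version '%s'" % verstr)
    return int(m.group(1)), int(m.group(2)), int(m.group(3) or 0)


def min_version(verstr, major, minor, pve=0):
    """True if verstr is at least major.minor+pve<pve>."""
    return parse_version(verstr) >= (major, minor, pve)


def major_minor_alone_sufficient(verstr, major, minor, max_pve):
    """Compare against a pve-version one above the highest existing one.

    That version does not exist for major.minor, so the comparison can
    only succeed if the major/minor part of verstr is already higher.
    """
    return min_version(verstr, major, minor, max_pve + 1)


# Feature requires 4.1+pve2, and pve2 is assumed to be the highest for 4.1
print(major_minor_alone_sufficient("4.2", 4, 1, 2))
print(major_minor_alone_sufficient("4.1+pve2", 4, 1, 2))
```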
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Clarify why a cancel is actually not really canceling here, because
we're already finished with storage migration and the block jobs are
all in ready state and we (source) are going to stop soon to hand
over to target.
> Note that if you issue 'block-job-cancel' after 'drive-mirror' has
> indicated (via the event BLOCK_JOB_READY) that the source and
> destination are synchronized, then the event triggered by this
> command changes to BLOCK_JOB_COMPLETED, to indicate that the
> mirroring has ended and the destination now has a point-in-time
> copy tied to the time of the cancellation
-- qapi/block-core.json (QEMU 4.2)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
The change to the prefixed version broke migration from new to old
qemu-server version. This reverts the change and adds a TODO comment for
7.0 to change it to the prefixed version then.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
...instead of booting with an invalid config once and then silently
changing the memory size for subsequent VM starts.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Tested-by: Alwin Antreich <a.antreich@proxmox.com>
This cannot work, since we adjust the 'memory' property of the VM config
on hotplugging, but then the user-defined NUMA topology won't match for
the next start attempt.
Check needs to happen here, since it otherwise fails early with "total
memory for NUMA nodes must be equal to vm static memory".
With this change the error message reflects what is actually happening
and doesn't allow VMs with exactly 1GB of RAM either.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Tested-by: Alwin Antreich <a.antreich@proxmox.com>
The reuse of the tunnel, which we're opening to communicate with the target
node and to forward the unix socket for the state migration, for the NBD unix
socket requires adding support for an array of sockets to forward, not just a
single one. We also have to change the $sock_addr variable to an array
for the cleanup of the socket file as SSH does not remove the file.
To communicate to the target node the support of unix sockets for NBD
storage migration, we're specifying an nbd_protocol_version which is set
to 1. This version is then passed to the target node via STDIN. Because
we don't want to be dependent on the order of arguments being passed
via STDIN, we also prefix the spice ticket with 'spice_ticket: '. The
target side handles both the spice ticket and the nbd protocol version
with a fallback for old source nodes passing the spice ticket without a
prefix.
All arguments are line based and require a newline in between.
When the NBD server on the target node is started with a unix socket, we
get a different line containing all the information required to start
the drive-mirror. This contains the unix socket path used on the target node
which we require for forwarding and cleanup.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
For secure live migration with local disks via NBD over a unix socket,
we have to somehow communicate from the source node to the target node
if it supports it. This is because there can only be one NBD server with
exactly one socket bound.
The source node passes that information via STDIN. Support for
'spice_ticket: (...)' is added in addition to 'nbd_protocol_version:
<version>'. As old source nodes send the spice ticket without a prefix,
we still have to have a fallback for this case. New information should
always be passed via a prefix that is matched, otherwise it will be
recognized as spice ticket.
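
A minimal Python sketch of such line-based parsing with the fallback for old source nodes. The function name is hypothetical, and treating a missing nbd_protocol_version as 0 is an assumption of this sketch:

```python
import re


def parse_start_params(lines):
    """Parse the lines received via STDIN into a dict of parameters."""
    params = {"spice_ticket": None, "nbd_protocol_version": 0}
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        m = re.fullmatch(r"nbd_protocol_version: (\d+)", line)
        if m:
            params["nbd_protocol_version"] = int(m.group(1))
            continue
        m = re.fullmatch(r"spice_ticket: (.+)", line)
        if m:
            params["spice_ticket"] = m.group(1)
            continue
        # Fallback: old source nodes send the spice ticket without a prefix,
        # so any unmatched line is treated as the ticket
        params["spice_ticket"] = line
    return params


print(parse_start_params(["nbd_protocol_version: 1\n", "spice_ticket: secret\n"]))
print(parse_start_params(["secret\n"]))
```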
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
As the NBD server spawned by qemu can only listen on a single socket,
we're dependent on a version being passed to vm_start that indicates
which protocol can be used, TCP or Unix, by the source node.
The change in socket type (TCP to Unix) comes with a different URI. For
unix sockets it has the form: 'nbd:unix:<path/to/socket>:exportname=<device>'.
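
As a sketch in Python, the URI selection could look as follows. The function name, the socket path and the device name in the example are hypothetical, and the TCP form shown is an assumption based on QEMU's colon-separated NBD syntax:

```python
def nbd_uri(device, nbd_protocol_version, socket_path=None, host=None, port=None):
    """Build the NBD URI for drive-mirror depending on the protocol version."""
    if nbd_protocol_version >= 1:
        # Unix socket form as given in this commit message
        return "nbd:unix:%s:exportname=%s" % (socket_path, device)
    # TCP form (assumed for this sketch)
    return "nbd:%s:%s:exportname=%s" % (host, port, device)


print(nbd_uri("drive-scsi0", 1, socket_path="/run/example/100_nbd.migrate"))
```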
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
With Qemu 4.2 we encountered a problem with unix sockets and SSH socket
forwarding for drive-mirror. It seems the socket gets reopened again and
again after it closes for some reason. This can be worked around by
specifying 'block-job-cancel' instead of 'block-job-complete' when we're
not interested in swapping the disks again from NBD to their original
protocol. This is always the case when we use drive-mirror for live
migrating a VM.
qemu_drive_mirror is used for migration and for clone_disk. All in all
we have 3 cases to handle: the 'skip' case, which skips the completion
of the job; the 'wait' case, which was the default before and still is
when $completion is undefined; and the new 'wait_noswap' case, which is
used for the live migration.
If 'wait_noswap' is specified, we issue a 'block-job-cancel' once the block
job is in 'ready' state. This completes the block job without swapping the
disks.
clone_disk always uses 'block-job-cancel' via the qemu_blockjobs_cancel
sub.
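
The three completion cases can be summarized with a small Python sketch. The function name is hypothetical; it only maps the mode to the QMP command issued once the job is ready:

```python
def completion_command(completion=None):
    """Return the QMP command used to finish a ready mirror job, or None."""
    mode = completion if completion is not None else "wait"
    if mode == "skip":
        # Leave the job alone; the caller completes it later
        return None
    if mode == "wait":
        # Default: complete the job and swap to the mirror target
        return "block-job-complete"
    if mode == "wait_noswap":
        # Live migration: finish the job without swapping the disks
        return "block-job-cancel"
    raise ValueError("unknown completion mode '%s'" % mode)


for mode in (None, "skip", "wait", "wait_noswap"):
    print(mode, completion_command(mode))
```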
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
and make it match with what parse_drive does. Even though the 'real' format
was pve-volume-id, callers already expected that parse_drive returns a hash
with a valid 'file' key (e.g. PVE/API2/Qemu.pm:1147ff).
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Reviewed-By: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Since macOS Mojave, Apple ships AppleQEMUGuestAgent by default.
However, it does not fully adhere to the QGA specs, as it expects each
command to be newline delimited.
Make each command newline delimited, which is harmless for all other
systems (Windows, Linux), but enables the guest agent by default
without any changes on OSX.
Signed-off-by: Kamil Trzcinski <ayufan@ayufan.eu>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
avoids a genisoimage output like:
> Total translation table size: 0
> Total rockridge attributes bytes: 417
> Total directory bytes: 0
> Path table size(bytes): 10
> Max brk space used 0
> 178 extents written (0 MB)
on every VM start.
Rather than that useless output, tell genisoimage to be quiet, which
still prints errors but nothing else. Additionally print a short
single line stating that we're about to create the cloud-init ISO.
Reformat while at it.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
removes safe_string_ne and safe_num_ne code which is now shared in
GuestHelpers. also change all the calls to use the shared definitions.
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
This fixes an issue when migrating a VM with an unused volume with format
qcow2 or vmdk. Since 'snapshots' wasn't set, storage_migrate wanted to
export/import with format raw+size instead. Therefore it used (instead of
just 'dd') 'qemu-img convert', which fails when its output leaves through
a pipe. Upon importing, a second error is present, because the format from
the volume ID doesn't match the format of the stream and there is no
conversion yet.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
LGTM-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
If for whatever reason there is no size in the property string
of a drive, 'qm rescan' would do nothing for that drive and
live migration would also fail.
Also adds a check to avoid potential auto-vivification of volid_hash->{$volid}
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
The initialization for the drive keys in $confdesc is changed
to be a single for-loop iterating over the keys of $drivedesc_hash and
the initialization of the unusedN keys is moved to directly below it.
To avoid the need to change all the call sites, functions with more than
a few callers are exported from the submodule and imported into QemuServer.pm.
For callers of the now imported functions within QemuServer.pm, the prefix
PVE::QemuServer is dropped, because it is unnecessary and now even confusing.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
which contains the full descriptions of the drives, and
make parse_drive not depend on $confdesc anymore.
In preparation to moving drive-related code to its own module.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Allow a user to add a virtio-rng-pci (an emulated hardware random
number generator) to a VM with the rng0 setting. The setting is
version_guard()-ed.
Limit the selection of entropy source to one of three:
/dev/urandom (preferred): Non-blocking kernel entropy source
/dev/random: Blocking kernel source
/dev/hwrng: Hardware RNG on the host for passthrough
QEMU itself defaults to /dev/urandom (or the equivalent getrandom()
call) if no source file is given, but I don't fully trust that
behaviour to stay constant, considering the documentation [0] already
disagrees with the code [1], so let's always specify the file ourselves.
/dev/urandom is preferred, since it prevents host entropy starvation.
The quality of randomness is still good enough to emulate a hwrng, since
a) it's still seeded from the kernel's true entropy pool periodically
and b) it's mixed with true entropy in the guest as well.
Additionally, all sources about entropy prediction attacks I could find
mention that to predict /dev/urandom results, /dev/random has to be
accessed or manipulated in one way or the other - this is not possible
from a VM however, as the entropy we're talking about comes from the
*host's* blocking pool.
More about the entropy and security implications of the non-blocking
interface in [2] and [3].
Note further that only one /dev/hwrng exists at any given time, if
multiple RNGs are available, only the one selected in
'/sys/devices/virtual/misc/hw_random/rng_current' will feed the file.
Selecting this is left as an exercise to the user, if at all required.
We limit the available entropy to 1 KiB/s by default, but allow the user
to override this. Interesting to note is that the limiter does not work
linearly, i.e. max_bytes=1024/period=1000 means that up to 1 KiB of data
becomes available on a 1000 millisecond timer, not that 1 KiB is
streamed to the guest over the course of one second - hence the
configurable period.
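
The non-linear behaviour can be modelled with a short Python sketch. This is a simplified model of the description above, not QEMU's implementation; the class name is made up and starting with a full budget is an assumption:

```python
class PeriodicByteLimiter:
    """Budget of max_bytes that is refilled in one step every period_ms."""

    def __init__(self, max_bytes=1024, period_ms=1000):
        self.max_bytes = max_bytes
        self.period_ms = period_ms
        self.available = max_bytes  # assumed: the budget starts full
        self.next_refill = period_ms

    def read(self, now_ms, want):
        """Return how many of the 'want' bytes are granted at time now_ms."""
        while now_ms >= self.next_refill:
            # The whole budget becomes available at once when the timer fires
            self.available = self.max_bytes
            self.next_refill += self.period_ms
        granted = min(want, self.available)
        self.available -= granted
        return granted


limiter = PeriodicByteLimiter(max_bytes=1024, period_ms=1000)
print(limiter.read(0, 4096))     # the full budget is handed out immediately
print(limiter.read(500, 4096))   # nothing left until the timer fires
print(limiter.read(1000, 4096))  # budget refilled in one step
```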
The default used here is the same as given in the QEMU documentation [0]
and has been verified to affect entropy availability in a guest by
measuring /dev/random throughput. 1 KiB/s is enough to avoid any
early-boot entropy shortages, and already has a significant impact on
/dev/random availability in the guest.
[0] https://wiki.qemu.org/Features/VirtIORNG
[1] https://git.qemu.org/?p=qemu.git;a=blob;f=crypto/random-platform.c;h=f92f96987d7d262047c7604b169a7fdf11236107;hb=HEAD
[2] https://lwn.net/Articles/261804/
[3] https://lwn.net/Articles/808575/
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
The HTTP server has a 64KB payload limit for POST requests, so note
that explicitly, even if it's only a theoretical maximum, as the remaining
params also need some space in the request.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
'input-data' can be used to pass arbitrary data to a guest when running
an agent command with 'guest-exec'. Most guest-agent implementations
treat this as STDIN to the command given by "path"/"arg", but some go as
far as relying solely on this parameter, and even fail if "path" or
"arg" are set (e.g. Mikrotik Cloud Hosted Router) - thus "command" needs
to be made optional.
Via the API, an arbitrary string can be passed. On the command line ('qm
guest exec'), an additional '--pass-stdin' flag allows forwarding STDIN
of the qm process to 'input-data', with a size limitation of 1 MiB to
not overwhelm QMP.
Without 'input-data' (API) or '--pass-stdin' (CLI) behaviour is unchanged.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
1. Avoids the error
"VM 111 qmp command 'block_resize' failed - The new size must be a multiple of 512"
for qcow2 disks.
2. Because volume_import expects disk sizes to be a multiple of 1 KiB.
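
For illustration, aligning a size to 1 KiB could look like the sketch below.
Rounding up (rather than down) is an assumption here, and the function name
is made up. Any multiple of 1 KiB is also a multiple of 512, which covers
both points above.

```python
def round_up_to_kib(size_bytes: int) -> int:
    """Round a size in bytes up to the next multiple of 1 KiB."""
    return -(-size_bytes // 1024) * 1024
```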
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Machines running with SeaBIOS don't have the efidisk attached, so QEMU
cannot back it up and fails with "unknown drive".
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Some of the recent QMP changes require at least 2.8.0, but since the
oldest version we officially package for 6.x is 4.0.0 anyway, checking
for at least 3.0 should not break anyone's setup.
Note that this does not affect machine version checks, only the
installed QEMU binary version.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Live-migrating a VM with more than 14 SCSI disks to a node that doesn't
support it yet is broken. Use a bumped pve-version to represent that and
give the user a nice error message instead.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
The previously introduced approach can fail for pinned versions when a
new QEMU release is introduced. The saner approach is to use a mapping
that gives one pve-version for each QEMU release.
Fortunately, the old system has not been bumped yet, so we can still
change it without too much effort.
QEMU versions without a mapping are assumed to be pve0, 4.1 is mapped to
pve1 since that's what we had as our default previously.
Pinned machine versions (i.e. pc-i440fx-4.1) are always assumed to be
pve0, for specific pve-versions they'd have to be pinned as well (i.e.
pc-i440fx-4.1+pve1).
The new logic also makes the pve-version dynamic, and starts VMs with
the lowest possible 'feature-level', i.e. if a feature is only available
with 4.1+pve2, but the VM isn't using it, we still start it with
4.1+pve0.
We die if we don't support a version that is requested from us. This
allows us to use the pve-version as live-migration blocks (i.e. bumping
the version and then live-migrating a VM which uses the new feature (so
is running with the bumped version) to an outdated node will present the
user with a helpful error message and fail instead of silently modifying
the config and only failing *after* the migration).
$version_guard is introduced in config_to_command to use for features
that need to check pve-version, it automatically handles selecting the
newest necessary pve-version for the VM.
Tests have to be adjusted, since all of them now resolve to pve0 instead
of pve1. EXPECT_ERROR matching is changed to use 'eq' instead of regex
to allow special characters in error messages.
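
The scheme described above can be sketched roughly like this. It is loosely
modelled on the idea behind $version_guard, not on its actual code: the
mapping values, the regex and all names are assumptions for illustration.

```python
import re

# Hypothetical table: highest pve-version this node knows per QEMU release.
# Releases without an entry are treated as pve0.
MAX_PVE_VERSION = {"4.1": 2}

MACHINE_RE = re.compile(
    r"^(?P<base>.+?)-(?P<qemu>\d+\.\d+)(?:\+pve(?P<pve>\d+))?$"
)


def parse_machine(machine: str):
    """Split e.g. 'pc-i440fx-4.1+pve1' into (base, qemu version, pve-version)."""
    match = MACHINE_RE.match(machine)
    if match is None:
        raise ValueError(f"unparsable machine type '{machine}'")
    # a pinned version without '+pveN' is assumed to be pve0
    return match["base"], match["qemu"], int(match["pve"] or 0)


class VersionGuard:
    """Tracks the lowest pve-version that satisfies all used features."""

    def __init__(self, qemu: str):
        self.qemu = qemu
        self.required = 0

    def require(self, pve: int) -> None:
        supported = MAX_PVE_VERSION.get(self.qemu, 0)
        if pve > supported:
            # die early instead of silently dropping the feature
            raise RuntimeError(
                f"QEMU {self.qemu}+pve{pve} is not supported on this node"
            )
        self.required = max(self.required, pve)

    def machine(self, base: str) -> str:
        return f"{base}-{self.qemu}+pve{self.required}"
```

A VM that uses no versioned feature never calls require() and therefore
starts with +pve0.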
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Because of alignment and rounding in the storage backend, the effective
size might not match the 'newsize' parameter we passed along.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
query-cpus has been deprecated since 2.12.0 [0] in favor of
query-cpus-fast, which does not incur a performance penalty on the
guest. The returned information is the same as far as our use case
is concerned.
[0] https://qemu.weilnetz.de/doc/qemu-doc.html#Deprecated-features
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
migrate_set_downtime, migrate_set_speed and migrate-set-cachesize have
all been deprecated since 2.8 or 2.11 [0]. They still work, but no
reason not to use the correct version.
Note that the downtime-limit parameter switched from seconds to
milliseconds, so convert to that. Slightly improve log output with units
while at it.
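
As an illustration of the unit change, a request using the
'migrate-set-parameters' QMP command might be built as below. The helper name
is made up, and the choice of 'downtime-limit', 'max-bandwidth' and
'xbzrle-cache-size' as the replacement parameters is my reading of the
upstream QAPI schema, not something taken from this patch.

```python
def migration_parameters(downtime_s: float, speed_bytes: int,
                         cache_bytes: int) -> dict:
    """Build a migrate-set-parameters command (hypothetical helper)."""
    return {
        "execute": "migrate-set-parameters",
        "arguments": {
            # the old command took seconds, downtime-limit takes milliseconds
            "downtime-limit": round(downtime_s * 1000),
            "max-bandwidth": speed_bytes,      # bytes per second
            "xbzrle-cache-size": cache_bytes,  # bytes
        },
    }
```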
[0] https://qemu.weilnetz.de/doc/qemu-doc.html#Deprecated-features
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
'device' is deprecated since 2.8 in favor of 'id' [0], but since we
always consistently set the id on our drives anyway we can substitute it
easily.
[0] see files qapi/block.json and qapi/block-core.json in QEMU source
code, the online documentation doesn't mention it AFAICT
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
...and cleanup surrounding code a bit.
'change' is deprecated, and according to the qapi definition in QEMU it
is 'strongly recommended' to avoid using it.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
The description for vm_config was out of date and from the description
for vm_pending it was hard to tell what the difference to vm_config was.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
regression introduced with commit a85ff91b
previously we set $target to undef if it's localnode or localhost, then
we check if node exists.
with regression commit, behaviour changes as we do the node check in
else, but $target may be undef. this causes an error:
no such cluster node ''
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
improved readability
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
to achieve this we have to add 3 new scsihw addresses since lsi
controllers can only hold 7 scsi drives
we go up to 31, since this is the limit for virtio-scsi-single devices
we have reserved (we can increase this in the future)
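
The controller arithmetic behind this, as a small sketch (names are made up;
only the limit of 7 drives per lsi controller comes from the text above):

```python
LSI_DRIVES_PER_CONTROLLER = 7  # limit stated above


def lsi_slot(drive_index: int):
    """Return (controller number, unit on that controller) for scsiN."""
    return divmod(drive_index, LSI_DRIVES_PER_CONTROLLER)


def controllers_needed(drive_count: int) -> int:
    """Number of lsi controllers required for drive_count drives."""
    return -(-drive_count // LSI_DRIVES_PER_CONTROLLER)
```

Going from 14 to 31 drives raises the number of controllers from 2 to 5,
hence the 3 new addresses.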
to make it more future proof, we add a new pci bridge under pci
bridge 1, so we have to adapt the bridge adding code (we did not
need this for q35 previously)
impact on live migration:
since on older versions of qemu-server we do not have those config
settings, there is no problem from old -> new
new->old is not supported anyway and this breaks so that
the vm crashes and loses the configs for scsi15-30
(same behaviour as e.g. with audio0 and migration from new->old)
tested with 31 scsi disks on
i440fx + virtio-scsi
i440fx + lsi
q35 + virtio-scsi
q35 + lsi
with ovmf + seabios
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
and adapt the tests
this does not impact live migration, since the order here does not
change the device layout
we want this to consistently have the readconfig first
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
VM.Audit can see the current config and the list of snapshots
already, so there is no real reason to disallow
the config of snapshots
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
from hotplug_pending we go into 'vmconfig_update_disk', where we check the
hotpluggability of options.
add 'ssd' there as a non-hotpluggable option (since we'd have to unplug/plug to
change the drive type)
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
The package will be used for custom CPU models as a SectionConfig, hence
the name. For now we simply move some CPU related helper functions and
declarations over from QemuServer to reduce clutter there.
Exports are to avoid changing all call sites, functions have useful
names on their own.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
As 'qemu_img_format' just matches a regex, this doesn't make much of
a difference, but AFAICT all other calls of 'qemu_img_format' use 'volname'.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
since we handle errors gracefully now, we don't need to write & save
config every time we change a setting.
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
get_basic_machine_info was removed by commit 045749f2fc.
Use get_host_arch to get the default machine type instead, and
optionally allow to specify architecture as parameter.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
This is the guarantee that this call operates on its created config.
A VMID cannot be reused after all. So only remove the guarantee at the
last step, just before throwing up the error message about the clone
failure.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
We clone the source VM firewall config before forking the "realcmd"
worker, but did not mind cleaning it up again if the clone failed
somewhere in the worker.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
* query_understood_cpu_flags returns all flags that QEMU/KVM knows about
* query_supported_cpu_flags returns all flags that QEMU/KVM can use on
this particular host.
To get supported flags, a temporary VM is started with QEMU, so we can
issue the "query-cpu-model-expansion" QMP command. This is how libvirt
queries supported flags for its "host-passthrough" CPU type.
query_supported_cpu_flags is thus rather slow and shouldn't be called
unnecessarily.
Note that KVM and TCG accelerators provide different expansions for the
"host" CPU type, so we need to query both.
Currently only supports x86_64, because QEMU-aarch64 doesn't provide the
necessary querying functions.
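
For illustration, the QMP request and a way to pick enabled flags out of the
reply could look like this. The helper names are hypothetical, and the reply
layout ("return" -> "model" -> "props") is my understanding of the upstream
schema rather than something stated in this patch.

```python
def expansion_command(model: str = "host") -> dict:
    """Build a query-cpu-model-expansion request for the given CPU model."""
    return {
        "execute": "query-cpu-model-expansion",
        "arguments": {"type": "full", "model": {"name": model}},
    }


def enabled_flags(response: dict) -> list:
    """Return the names of all boolean props that are enabled in the reply."""
    props = response["return"]["model"]["props"]
    # props also contains non-boolean entries, keep only flags set to true
    return sorted(name for name, value in props.items() if value is True)


def supported_flags(kvm_response: dict, tcg_response: dict) -> dict:
    """Keep results per accelerator, since KVM and TCG expand 'host' differently."""
    return {
        "kvm": enabled_flags(kvm_response),
        "tcg": enabled_flags(tcg_response),
    }
```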
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
wrap around code which can possibly fail in evals to handle them
gracefully, and log errors.
note: this results in a change of behavior in the API. since errors
are handled gracefully instead of "die"ing, when there is a pending
change which cannot be applied for some reason, it will get logged in
the tasklog but the vm will continue booting regardless. the
non-applied change will stay in the pending section of the
configuration.
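
The behaviour change can be sketched like this (Python stand-in for the
eval-based handling; all names are hypothetical): a failing change is logged
and stays pending, everything else is applied.

```python
def apply_pending(config: dict, pending: dict, apply_one, log) -> None:
    """Apply pending changes one by one, keeping failed ones in pending."""
    for key in list(pending):
        try:
            apply_one(key, pending[key])
        except Exception as err:
            # log and continue instead of aborting the whole start
            log(f"unable to apply pending change {key}: {err}")
            continue
        config[key] = pending.pop(key)
```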
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
instead of writing the config after every change, we can do it once for
all the changes in the end to avoid redundant i/o.
we also don't need to load_config after writing fastplug changes.
Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
Do the same as for the "create" case, only trigger the "start after
create/restore" task after the locked "realcmd" was done. Otherwise the
start can never succeed: it also acquires a lock, but restore only
releases it once outside of realcmd.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
bump versioned build-dependency, as qemu-server has tests checking
for errors, and we fixed a grammar error in pve-storage, so we need
the newer version to ensure our tests go through
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
run_command only passes defined and chomped strings to the callback,
so no need to do that twice.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
QEMU usually only prints warnings and errors and stays silent otherwise,
so it makes sense to just log all of its output.
Prefix it with '[<target_hostname>]' to indicate that the output is
coming from the remote node, so users know where to search for the
error.
Side effect is that the 'VM start' task created by the migration will
now show the "QEMU:" prefix, but it's still very readable IMHO.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
By default run_command prints the entire commandline executed when an
error occurs, but QEMU and our migrate command are not only
uninteresting to the user[*] but also annoyingly long. Hide them and only
print the exit code.
[*] Especially our migrate command, since it can't be manually executed
anyway. QEMU's commandline *might* contain something interesting, but is
so long that it's tricky to parse anyway, and a user can always call 'qm
showcmd --pretty'.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Split out 'update_disksize' from the renamed 'update_disk_config' to
allow code reuse in QemuMigrate.
Remove dots after messages to keep style consistent for migration log.
After updating in sync_disks (phase1) of migration, write out updated
config. This means that even if migration fails or is aborted in later
stages, we keep the fixed config - this is not an issue, as it would
have been fixed on the next attempt anyway, and it can't hurt to have
the correct size instead of a wrong one either way.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
only VM.PowerMgmt is not enough, since we allocate space on a storage,
so we need VM.Config.Disk on the vm and Datastore.AllocateSpace on the storage
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
if the user set a device as hostpci with the 'shorthand' syntax:
hostpciX: 00:12
we ignored it on starting and showcmd and continued.
Since the user explicitly wanted to passthrough a device, we now check
if there is actually a device with that id
for explicitly configured devices (00:12.1), we did not check if it exists,
but the kvm call failed with a non-obvious error message
now we always call 'lspci' from SysFSTools to check if it actually exists,
and fail if not. With this, we can drop the workaround for adding
'0000' if no domain was given, since lspci does it already for us
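
A simplified sketch of the lookup semantics, not the SysFSTools code: the
regex, the function name and the list of known devices passed in are all
assumptions for illustration.

```python
import re

PCI_RE = re.compile(
    r"^(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):(?P<slot>[0-9a-fA-F]{2})"
    r"(?:\.(?P<func>[0-7]))?$"
)


def find_pci_devices(pci_id: str, known_devices: list) -> list:
    """Resolve a (possibly shorthand) PCI id against the known devices."""
    match = PCI_RE.match(pci_id)
    if match is None:
        raise ValueError(f"invalid PCI id '{pci_id}'")
    domain = match["domain"] or "0000"  # default domain if none was given
    prefix = f"{domain}:{match['bus']}:{match['slot']}"
    if match["func"] is not None:
        wanted = [d for d in known_devices if d == f"{prefix}.{match['func']}"]
    else:
        # shorthand syntax: all functions of that slot
        wanted = [d for d in known_devices if d.startswith(prefix + ".")]
    if not wanted:
        raise LookupError(f"no PCI device found for '{pci_id}'")
    return wanted
```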
this fixes #2510, an issue with using mediated devices where the users did not have
the domain in the config, since we forgot to add the default domain there
the only issue with this patch is that it changes the behaviour of
'showcmd' slightly: we now die if the device was explicitly
given but does not exist (we showed the commandline, now we fail)
this also slightly changes the commandline for qemu (adding always
the domain), which is not a problem since we cannot live migrate
or snapshot such vms, but we have to adapt the tests
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
some storage backends have bigger granularity than the default 128k
size from the EFIVARS template file, so we actually need to poll the
real created disk size, as it will be used to create the target
volume for local storage migration on running VMs. If it's too small,
the target will be too small and migration will fail.
Just a fix for newly created EFIDISKS, for others we need to rescan
the size after we've got the migrate lock and write the updated info
out, so that the target node has the correct one (protected from
migrate lock).
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Sometimes, a user wants to remove the 'suspended' state without
resuming the vm from that state. Since the vm is locked with
'suspended', this was not possible without help from root@pam
This patch allows deleting the vmstate and the suspended lock and
related config entries with it. The user still has to have the right
privileges and the vm cannot be 'protected' for this to work
Inspired-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
we did not actually delete the state if we deleted the 'vmstate' config,
leaving stray vmstates on the disks
actually implement the removal, requiring 'VM.Config.Disk' and
'VM.PowerMgmt' privs
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
if a user removed the vmstate from the config for whatever reason,
a vmstart did not remove the 'suspended' lock
so always delete it and delete the vmstate only if it really was there
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>