Commit Graph

108 Commits

Author SHA1 Message Date
Fabian Grünbichler
1dbe979c7c CPUConfig: fix module load when pmxcfs is unavailable
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-03-26 09:03:07 +01:00
Stefan Reiter
5d008ad383 Verify VM-specific CPU configs seperately
$cpu_fmt is being reused for custom CPUs as well as VM-specific CPU
settings. The "pve-vm-cpu-conf" format is introduced to verify a config
specifically for use as VM-specific settings.

"pve-cpu-conf" is registered for use in custom CPU API calls (where no
additional checks are required).

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-03-25 18:42:24 +01:00
Stefan Reiter
b3e894883a Adapt CPUConfig to handle custom models
Turn CPUConfig into a SectionConfig with parsing/writing support for
custom CPU models. IO is handled using cfs.

Namespacing will be provided using "custom-" prefix for custom model
names (in VM config only, cpu-models.conf will contain unprefixed
names).

Includes two overrides to avoid writing redundant information to the
config file, additionally get_custom_model is used to retrieve a custom
model configuration by name.

Resolve custom names in print_cpu_device when a custom cpu is passed.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-03-25 18:42:21 +01:00
Fabian Ebner
43c4c7b693 Add unused description to drivedesc_hash
Moved code so that initialization of drivedesc_hash stays a single block.
Avoid auto-vivication in parse_drive.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-03-23 09:58:30 +01:00
Stefan Reiter
746232eeb1 Die on misaligned memory for hotplugging
...instead of booting with an invalid config once and then silently
changing the memory size for consequent VM starts.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Tested-by: Alwin Antreich <a.antreich@proxmox.com>
2020-03-19 18:55:27 +01:00
Stefan Reiter
456bab5445 Disable memory hotplugging for custom NUMA topologies
This cannot work, since we adjust the 'memory' property of the VM config
on hotplugging, but then the user-defined NUMA topology won't match for
the next start attempt.

Check needs to happen here, since it otherwise fails early with "total
memory for NUMA nodes must be equal to vm static memory".

With this change the error message reflects what is actually happening
and doesn't allow VMs with exactly 1GB of RAM either.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Tested-by: Alwin Antreich <a.antreich@proxmox.com>
2020-03-19 18:54:53 +01:00
Fabian Ebner
758a08eb39 Change format for unused drives
and make it match with what parse_drive does. Even though the 'real' format
was pve-volume-id, callers already expected that parse_drive returns a hash
with a valid 'file' key (e.g. PVE/API2/Qemu.pm:1147ff).

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Reviewed-By: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-03-16 13:30:50 +01:00
Thomas Lamprecht
86a2e85a26 cloudinit: make genisoimage only output errors
avoids a genisoimage output like:
> Total translation table size: 0
> Total rockridge attributes bytes: 417
> Total directory bytes: 0
> Path table size(bytes): 10
> Max brk space used 0
> 178 extents written (0 MB)

on every VM start.

Rather than that useless output, tell genisoimage to be quiet, which
still prints errors but nothing else. Additionally print a short
single line about that we're to create the cloud-init iso.

Reformat while at it.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-03-10 14:50:39 +01:00
Thomas Lamprecht
b2d27b3242 update_disksize: small code cleanup
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-03-07 18:34:21 +01:00
Fabian Ebner
63e313f386 Also update disk size if there was no old size
If for whatever reason there is no size in the property string
of a drive, 'qm rescan' would do nothing for that drive and
live migration would also fail.

Also adds a check to avoid potential auto-vivification of volid_hash->{$volid}

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-03-07 18:23:57 +01:00
Fabian Ebner
776c5f5067 Rename disksize to bootdisk_size and print_drive_full to print_drive_commandline_full
to avoid confusion with print_drive

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-03-07 18:23:57 +01:00
Fabian Ebner
e0fd2b2f84 Create Drive.pm and move drive-related code there
The initialization for the drive keys in $confdesc is changed
to be a single for-loop iterating over the keys of $drivedesc_hash and
the initialization of the unusedN keys is move to directly below it.

To avoid the need to change all the call sites, functions with more than
a few callers are exported from the submodule and imported into QemuServer.pm.

For callers of the now imported functions within QemuServer.pm, the prefix
PVE::QemuServer is dropped, because it is unnecessary and now even confusing.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-03-07 18:23:57 +01:00
Stefan Reiter
2cf61f33d9 fix #2264: add virtio-rng device
Allow a user to add a virtio-rng-pci (an emulated hardware random
number generator) to a VM with the rng0 setting. The setting is
version_guard()-ed.

Limit the selection of entropy source to one of three:
/dev/urandom (preferred): Non-blocking kernel entropy source
/dev/random: Blocking kernel source
/dev/hwrng: Hardware RNG on the host for passthrough

QEMU itself defaults to /dev/urandom (or the equivalent getrandom()
call) if no source file is given, but I don't fully trust that
behaviour to stay constant, considering the documentation [0] already
disagrees with the code [1], so let's always specify the file ourselves.

/dev/urandom is preferred, since it prevents host entropy starvation.
The quality of randomness is still good enough to emulate a hwrng, since
a) it's still seeded from the kernel's true entropy pool periodically
and b) it's mixed with true entropy in the guest as well.

Additionally, all sources about entropy predicition attacks I could find
mention that to predict /dev/urandom results, /dev/random has to be
accessed or manipulated in one way or the other - this is not possible
from a VM however, as the entropy we're talking about comes from the
*hosts* blocking pool.

More about the entropy and security implications of the non-blocking
interface in [2] and [3].

Note further that only one /dev/hwrng exists at any given time, if
multiple RNGs are available, only the one selected in
'/sys/devices/virtual/misc/hw_random/rng_current' will feed the file.
Selecting this is left as an exercise to the user, if at all required.

We limit the available entropy to 1 KiB/s by default, but allow the user
to override this. Interesting to note is that the limiter does not work
linearly, i.e. max_bytes=1024/period=1000 means that up to 1 KiB of data
becomes available on a 1000 millisecond timer, not that 1 KiB is
streamed to the guest over the course of one second - hence the
configurable period.

The default used here is the same as given in the QEMU documentation [0]
and has been verified to affect entropy availability in a guest by
measuring /dev/random throughput. 1 KiB/s is enough to avoid any
early-boot entropy shortages, and already has a significant impact on
/dev/random availability in the guest.

[0] https://wiki.qemu.org/Features/VirtIORNG
[1] https://git.qemu.org/?p=qemu.git;a=blob;f=crypto/random-platform.c;h=f92f96987d7d262047c7604b169a7fdf11236107;hb=HEAD
[2] https://lwn.net/Articles/261804/
[3] https://lwn.net/Articles/808575/

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-03-06 18:09:04 +01:00
Thomas Lamprecht
d0cdb1de07 cpu models: add missing comma
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-03-06 17:57:46 +01:00
Alexandre Derumier
bb84db9d3e cpu models: qemu 4.2 : add skylake, icelake, cascadelake notsx
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2020-03-06 17:57:46 +01:00
Alexandre Derumier
257ae68768 cpu models : add icelake-{server|client}
exist since 2018
https://git.qemu.org/?p=qemu.git;a=commit;h=8a11c62da9146dd89aee98947e6bd831e65a970d

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2020-03-06 17:57:46 +01:00
Stefan Reiter
d8f61794f6 fix #2612: allow input-data in guest exec and make command optional
'input-data' can be used to pass arbitrary data to a guest when running
an agent command with 'guest-exec'. Most guest-agent implementations
treat this as STDIN to the command given by "path"/"arg", but some go as
far as relying solely on this parameter, and even fail if "path" or
"arg" are set (e.g. Mikrotik Cloud Hosted Router) - thus "command" needs
to be made optional.

Via the API, an arbitrary string can be passed, on the command line ('qm
guest exec'), an additional '--pass-stdin' flag allows to forward STDIN
of the qm process to 'input-data', with a size limitation of 1 MiB to
not overwhelm QMP.

Without 'input-data' (API) or '--pass-stdin' (CLI) behaviour is unchanged.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-03-03 14:34:02 +01:00
Stefan Reiter
b8fb1c03c3 version_guard scsi drive count
Live-migrating a VM with more than 14 SCSI disks to a node that doesn't
support it yet is broken. Use a bumped pve-version to represent that and
give the user a nice error message instead.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-02-12 10:32:57 +01:00
Stefan Reiter
ac0077cc33 Use 'QEMU version' -> '+pve-version' mapping for machine types
The previously introduced approach can fail for pinned versions when a
new QEMU release is introduced. The saner approach is to use a mapping
that gives one pve-version for each QEMU release.

Fortunately, the old system has not been bumped yet, so we can still
change it without too much effort.

QEMU versions without a mapping are assumed to be pve0, 4.1 is mapped to
pve1 since thats what we had as our default previously.

Pinned machine versions (i.e. pc-i440fx-4.1) are always assumed to be
pve0, for specific pve-versions they'd have to be pinned as well (i.e.
pc-i440fx-4.1+pve1).

The new logic also makes the pve-version dynamic, and starts VMs with
the lowest possible 'feature-level', i.e. if a feature is only available
with 4.1+pve2, but the VM isn't using it, we still start it with
4.1+pve0.

We die if we don't support a version that is requested from us. This
allows us to use the pve-version as live-migration blocks (i.e. bumping
the version and then live-migrating a VM which uses the new feature (so
is running with the bumped version) to an outdated node will present the
user with a helpful error message and fail instead of silently modifying
the config and only failing *after* the migration).

$version_guard is introduced in config_to_command to use for features
that need to check pve-version, it automatically handles selecting the
newest necessary pve-version for the VM.

Tests have to be adjusted, since all of them now resolve to pve0 instead
of pve1. EXPECT_ERROR matching is changed to use 'eq' instead of regex
to allow special characters in error messages.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-02-12 10:32:57 +01:00
Dominik Csapak
2513b862e6 fix #2566: increase scsi limit to 31
to achieve this we have to add 3 new scsihw addresses since lsi
controllers can only hold 7 scsi drives

we go up to 31, since this is the limit for virtio-scsi-single devices
we have reserved (we can increase this in the future)

to make it more future proof, we add a new pci bridge under pci
bridge 1, so we have to adapt the bridge adding code (we did not
need this for q35 previously)

impact on live migration:
since on older versions of qemu-server we do not have those config
settings, there is no problem from old -> new

new->old is not supported anyway and this breaks so that
the vm crashes and loses the configs for scsi15-30
(same behaviour as e.g. with audio0 and migration from new->old)

tested with 31 scsi disk on
i440fx + virtio-scsi
i440fx + lsi
q35 + virtio-scsi
q35 + lsi
with ovmf + seabios

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-01-31 20:26:26 +01:00
Stefan Reiter
d786a27435 Add CPUConfig file and migrate some helpers
The package will be used for custom CPU models as a SectionConfig, hence
the name. For now we simply move some CPU related helper functions and
declarations over from QemuServer to reduce clutter there.

Exports are to avoid changing all call sites, functions have useful
names on their own.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-01-22 15:47:32 +01:00
Tim Marx
2f18c84dc7 add new helper to calculate timeout based on vm config
Signed-off-by: Tim Marx <t.marx@proxmox.com>
2020-01-15 17:36:16 +01:00
Thomas Lamprecht
ae1f94e158 mon_cmd: add explicit return
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-30 17:24:55 +01:00
Thomas Lamprecht
9471e48bf9 implement PVE Version addition for QEMU machine
With our QEMU 4.1.1 package we can pass a additional internal version
to QEMU's machine, it will be split out there and ignored, but
returned on a QMP 'query-machines' call.

This allows us to use it for increasing the granularity with which we
can roll-out HW layout changes/additions for VMs. Until now we
required a machine version bump, happening normally every major
release of QEMU, with seldom, for us irrelevant, exceptions.
This often delays rolling out a feature, which would break
live-migration, by several months. That can now be avoided, the new
"pve-version" component of the machine can be bumped at will, and
thus we are much more flexible.

That versions orders after the ($major, $minor) version components
from an stable release - it can thus also be reset on the next
release.

The implementation extends the qemu-machine REGEX, remembers
"pve-version" when doing a "query-machines" and integrates support
into the min_version and extract_version helpers.

We start out with a version of 1.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reviewed-by: Stefan Reiter <s.reiter@proxmox.com>
2019-11-25 16:43:38 +01:00
Thomas Lamprecht
cbfff937ae version_cmp: give info about caller on error
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-25 11:16:38 +01:00
Thomas Lamprecht
825ae5bc3f fixup: use correct version_cmp
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-22 14:18:02 +01:00
Stefan Reiter
2ea5fb7ecf refactor: split qemu_machine_feature_enabled
...into:

* PVE::QemuServer::Helpers::min_version: check a major.minor version
  string with a given major/minor version (this is equivalent to calling
  the old qemu_machine_feature_enabled with only $kvmver)
* PVE::QemuServer::Machine::extract_version: get major.minor version
  string from arbitrary machine type (e.g. pc-q35-4.0, ...)
* PVE::QemuServer::Machine::machine_version: helper to call
  extract_version automatically before min_version

Includes a cfg2cmd test case with pinned machine version.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-20 18:32:35 +01:00
Stefan Reiter
3392d6cacf refactor: extract QEMU machine related helpers to package
...PVE::QemuServer::Machine.

qemu_machine_feature_enabled is exported since it has a *lot* of users
in PVE::QemuServer and a long enough name as it is.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-11-20 16:29:23 +01:00
Stefan Reiter
0a13e08ec2 refactor: create QemuServer::Monitor for high-level QMP access
QMP and monitor helpers are moved from QemuServer.pm.

By using only vm_running_locally instead of check_running, a cyclic
dependency to QemuConfig is avoided. This also means that the $nocheck
parameter serves no more purpose, and has thus been removed along with
vm_mon_cmd_nocheck.

Care has been taken to avoid errors resulting from this, and
occasionally a manual check for a VM's existance inserted on the
callsite.

Methods have been renamed to avoid redundant naming:
* vm_qmp_command -> qmp_cmd
* vm_mon_cmd -> mon_cmd
* vm_human_monitor_command -> hmp_cmd

mon_cmd is exported since it has many users. This patch also changes all
non-package users of vm_qmp_command to use the mon_cmd helper. Includes
mocking for tests.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-11-20 16:29:23 +01:00
Stefan Reiter
babf613a08 refactor: split check_running into _exists_ and _running_
vm_exists_on_node in PVE::QemuConfig checks if a config file for a vmid
exists

vm_running_locally in PVE::QemuServer::Helpers checks if a VM is running
on the local machine by probing its pidfile and checking /proc/.../cmdline

check_running is left in QemuServer for compatibility, but changed to
simply call the two new helper functions.

Both methods are also correctly mocked for testing snapshots.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-11-20 16:29:23 +01:00
Stefan Reiter
d036e418a8 refactor: create QemuServer::Helpers and move file/dir code
Also remove unused $confdir variable in QemuConfig, but leave it and
$lock_dir there, since those paths should only be used with
cfs_config_path anyway.

nodename() is still called in multiple places, but since it's cached by
INotify it doesn't really matter.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-11-20 16:29:23 +01:00
Thomas Lamprecht
c75bf16117 qm importdisk: tell user to what VM disk we actually imported
as else one has no idea what the imported disk is, especially if
multiple unused disks are already present..

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-10-29 19:11:21 +01:00
Thomas Lamprecht
5600c5b22f cleanup do_import, s/optional/params/ and move skiplock into params
mixed with indentation changes a whole lot of other changes which
should normally not mixed to much together, but this is all a bit
tangled and I'm not sure if splitting it into two or three parts
would help anybody.. just use "-w" (ignore whitespace changes) when
looking at the diff..

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-10-29 19:11:21 +01:00
Dominic Jäger
7f384190de Add skiplock to do_import
Functions like qm importovf can now set the "lock" property in a config file
before calling do_import.

Signed-off-by: Dominic Jäger <d.jaeger@proxmox.com>
2019-10-29 19:11:21 +01:00
Thomas Lamprecht
93981fa799 refactor hugepages_size conf
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-10-29 17:58:53 +01:00
Thomas Lamprecht
71aba4eac3 refactor hugepages_size
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-10-29 17:49:37 +01:00
Stefan Reiter
062a7ea714 hugepages: fix memory size checking
The codepath for "any" hugepages did not check if memory size was even,
leading to the code below trying to allocate half a hugepage (e.g. VM
with 2049MiB RAM would lead to 1024.5 2kB hugepages).

Also improve error message for systems with only 1GB hugepages enabled.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-10-29 16:52:54 +01:00
Mira Limbeck
9a13f0fed3 cloudinit: fix vm start hanging with disk on ZFS
With the changes to pve-storage in commit 56362cf the startup hangs for
5 minutes on ZFS if the cloudinit disk does not exist. Instead of
calling activate_volume followed by file_size_info we now call
volume_size_info. This should work reliably on all storages that support
cloudinit disks.

Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
2019-10-18 21:40:34 +02:00
Dominik Csapak
af1f1ec038 fix #2395: refactor qemu_img_convert to accept files as source
and use it also for efidisk creation and importdisk
this way we correctly handle zfs-over-iscsi options for those cases

also write tests for it

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-10-17 13:57:21 +02:00
Stefan Reiter
3d8d2e8dad fix #2402: allow 1GB hugepages if 2MB is unavailable
As reported in bug #2402, a system started with "default_hugepagesz=1G
hugepagesz=1G" does not have a /sys/kernel/mm/hugepages/hugepages-2048kB
directory.

To fix, ignore the missing directory in hugepages_mount (since it might
not be needed anyway), and correctly check if the requested hugepage
size is available in hugepages_size instead.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2019-10-10 15:32:46 +02:00
Aaron Lauterer
a022e3fdab tree-wide trailing whitespace cleanup
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2019-09-25 16:55:53 +02:00
Aaron Lauterer
ae36393d5a usb: Add USB3 capabilities to Spice USB devices
To not change current behaviour and thus breaking live migration USB3
for a Spice USB device requires Qemu v4.1.

The old behavior was that even though technically it was possible to
the set `usb3=1` setting, it was ignored. The bus was hardcoded to
ehci. If another USB2 device was added or the machine type was set to
Q35 an ehci controller was present and the VM was able to boot.

With this patch the behaviour is changing and the bus is set to xhci if
USB3 is set for the Spice USB device and the VM is running under Qemu
v4.1.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2019-09-21 13:22:17 +02:00
Aaron Lauterer
47717a90cf usb: Cleanup redundant if condition
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2019-09-21 13:17:40 +02:00
Thomas Lamprecht
e2b0d85dda PCIe passthrough: fixup: avoid addr conflict and cleanup a bit
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-09-06 19:27:30 +02:00
Thomas Lamprecht
d7d698f60c pci: add conflict tests
best viewed with: git show -w

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-09-06 19:27:30 +02:00
Aaron Lauterer
c4e1638148 Add support for up to 16 PCI(e) devices
For non pci express passthrough additional addresses are reserved.
For pcie passthrough pcie root ports are needed (unless guest is like
windows 7).

The first 4 pcie root ports are defined by default in the pve-q35*.cfg
files. If more than 4 pcie devices are passed through the needed root
ports are created on demand. This helps to keep live migration possible
without adding a new pve-q35*.cfg file.

For the windows 7 like guests additional addresses are reserved as well.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2019-09-06 19:27:30 +02:00
Aaron Lauterer
d438e06028 Add PCI address for audio device
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2019-07-18 08:24:39 +02:00
Dominik Csapak
7583d156fd use new pcie port hardware
with qemu 4.0 we can make use of the new pcie-root-ports with settings
for the width/speed which can resolve issues with some hardware combinations
when negioating link speed

so we add a new q35 cfg that we include with machine types >= 4.0
to preserve live migration of machines without passthrough but q35

for details about the link speeds see:

pcie: Enhanced link speed and width support
https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02827.html

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-07-08 13:31:55 +02:00
Mira Limbeck
8ebf1734e9 cloudinit: set iso-level in genisoimage call
This is required for Windows to recognize the ISO and as a result the cloudinit
config. This is the minimum to get any config working at all for windows.

Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
2019-06-28 14:27:08 +02:00
Mira Limbeck
e73ca4d0ce add function to dump cloudinit config
This adds a function to dump the generated cloudinit config. Only one
can be dumped at a time, either 'user', 'network' or 'meta'.

The logic to get user, network and metadata is copied from the other
path that also creates the ISO image to keep it simple and not
complicate the other code path further.

The hash generation for the metadata config is unified between nocloud
and configdrive2 formats. We need it a 3rd time with the new dump
functions so it makes sense to combine it and the metadata config
generation in a single function. The <format>_gen_metadata functions are
each used twice now.

Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
2019-06-06 14:34:11 +02:00