Live-migrating a VM with more than 14 SCSI disks to a node that doesn't
support it yet is broken. Use a bumped pve-version to represent that and
give the user a nice error message instead.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
The previously introduced approach can fail for pinned versions when a
new QEMU release is introduced. The saner approach is to use a mapping
that gives one pve-version for each QEMU release.
Fortunately, the old system has not been bumped yet, so we can still
change it without too much effort.
QEMU versions without a mapping are assumed to be pve0, 4.1 is mapped to
pve1 since thats what we had as our default previously.
Pinned machine versions (i.e. pc-i440fx-4.1) are always assumed to be
pve0, for specific pve-versions they'd have to be pinned as well (i.e.
pc-i440fx-4.1+pve1).
The new logic also makes the pve-version dynamic, and starts VMs with
the lowest possible 'feature-level', i.e. if a feature is only available
with 4.1+pve2, but the VM isn't using it, we still start it with
4.1+pve0.
We die if we don't support a version that is requested from us. This
allows us to use the pve-version as live-migration blocks (i.e. bumping
the version and then live-migrating a VM which uses the new feature (so
is running with the bumped version) to an outdated node will present the
user with a helpful error message and fail instead of silently modifying
the config and only failing *after* the migration).
$version_guard is introduced in config_to_command to use for features
that need to check pve-version, it automatically handles selecting the
newest necessary pve-version for the VM.
Tests have to be adjusted, since all of them now resolve to pve0 instead
of pve1. EXPECT_ERROR matching is changed to use 'eq' instead of regex
to allow special characters in error messages.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
to achieve this we have to add 3 new scsihw addresses since lsi
controllers can only hold 7 scsi drives
we go up to 31, since this is the limit for virtio-scsi-single devices
we have reserved (we can increase this in the future)
to make it more future proof, we add a new pci bridge under pci
bridge 1, so we have to adapt the bridge adding code (we did not
need this for q35 previously)
impact on live migration:
since on older versions of qemu-server we do not have those config
settings, there is no problem from old -> new
new->old is not supported anyway and this breaks so that
the vm crashes and loses the configs for scsi15-30
(same behaviour as e.g. with audio0 and migration from new->old)
tested with 31 scsi disk on
i440fx + virtio-scsi
i440fx + lsi
q35 + virtio-scsi
q35 + lsi
with ovmf + seabios
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
The package will be used for custom CPU models as a SectionConfig, hence
the name. For now we simply move some CPU related helper functions and
declarations over from QemuServer to reduce clutter there.
Exports are to avoid changing all call sites, functions have useful
names on their own.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
With our QEMU 4.1.1 package we can pass a additional internal version
to QEMU's machine, it will be split out there and ignored, but
returned on a QMP 'query-machines' call.
This allows us to use it for increasing the granularity with which we
can roll-out HW layout changes/additions for VMs. Until now we
required a machine version bump, happening normally every major
release of QEMU, with seldom, for us irrelevant, exceptions.
This often delays rolling out a feature, which would break
live-migration, by several months. That can now be avoided, the new
"pve-version" component of the machine can be bumped at will, and
thus we are much more flexible.
That versions orders after the ($major, $minor) version components
from an stable release - it can thus also be reset on the next
release.
The implementation extends the qemu-machine REGEX, remembers
"pve-version" when doing a "query-machines" and integrates support
into the min_version and extract_version helpers.
We start out with a version of 1.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reviewed-by: Stefan Reiter <s.reiter@proxmox.com>
...into:
* PVE::QemuServer::Helpers::min_version: check a major.minor version
string with a given major/minor version (this is equivalent to calling
the old qemu_machine_feature_enabled with only $kvmver)
* PVE::QemuServer::Machine::extract_version: get major.minor version
string from arbitrary machine type (e.g. pc-q35-4.0, ...)
* PVE::QemuServer::Machine::machine_version: helper to call
extract_version automatically before min_version
Includes a cfg2cmd test case with pinned machine version.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
...PVE::QemuServer::Machine.
qemu_machine_feature_enabled is exported since it has a *lot* of users
in PVE::QemuServer and a long enough name as it is.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
QMP and monitor helpers are moved from QemuServer.pm.
By using only vm_running_locally instead of check_running, a cyclic
dependency to QemuConfig is avoided. This also means that the $nocheck
parameter serves no more purpose, and has thus been removed along with
vm_mon_cmd_nocheck.
Care has been taken to avoid errors resulting from this, and
occasionally a manual check for a VM's existance inserted on the
callsite.
Methods have been renamed to avoid redundant naming:
* vm_qmp_command -> qmp_cmd
* vm_mon_cmd -> mon_cmd
* vm_human_monitor_command -> hmp_cmd
mon_cmd is exported since it has many users. This patch also changes all
non-package users of vm_qmp_command to use the mon_cmd helper. Includes
mocking for tests.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
vm_exists_on_node in PVE::QemuConfig checks if a config file for a vmid
exists
vm_running_locally in PVE::QemuServer::Helpers checks if a VM is running
on the local machine by probing its pidfile and checking /proc/.../cmdline
check_running is left in QemuServer for compatibility, but changed to
simply call the two new helper functions.
Both methods are also correctly mocked for testing snapshots.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Also remove unused $confdir variable in QemuConfig, but leave it and
$lock_dir there, since those paths should only be used with
cfs_config_path anyway.
nodename() is still called in multiple places, but since it's cached by
INotify it doesn't really matter.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
as else one has no idea what the imported disk is, especially if
multiple unused disks are already present..
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
mixed with indentation changes a whole lot of other changes which
should normally not mixed to much together, but this is all a bit
tangled and I'm not sure if splitting it into two or three parts
would help anybody.. just use "-w" (ignore whitespace changes) when
looking at the diff..
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Functions like qm importovf can now set the "lock" property in a config file
before calling do_import.
Signed-off-by: Dominic Jäger <d.jaeger@proxmox.com>
The codepath for "any" hugepages did not check if memory size was even,
leading to the code below trying to allocate half a hugepage (e.g. VM
with 2049MiB RAM would lead to 1024.5 2kB hugepages).
Also improve error message for systems with only 1GB hugepages enabled.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
With the changes to pve-storage in commit 56362cf the startup hangs for
5 minutes on ZFS if the cloudinit disk does not exist. Instead of
calling activate_volume followed by file_size_info we now call
volume_size_info. This should work reliably on all storages that support
cloudinit disks.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
and use it also for efidisk creation and importdisk
this way we correctly handle zfs-over-iscsi options for those cases
also write tests for it
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
As reported in bug #2402, a system started with "default_hugepagesz=1G
hugepagesz=1G" does not have a /sys/kernel/mm/hugepages/hugepages-2048kB
directory.
To fix, ignore the missing directory in hugepages_mount (since it might
not be needed anyway), and correctly check if the requested hugepage
size is available in hugepages_size instead.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
To not change current behaviour and thus breaking live migration USB3
for a Spice USB device requires Qemu v4.1.
The old behavior was that even though technically it was possible to
the set `usb3=1` setting, it was ignored. The bus was hardcoded to
ehci. If another USB2 device was added or the machine type was set to
Q35 an ehci controller was present and the VM was able to boot.
With this patch the behaviour is changing and the bus is set to xhci if
USB3 is set for the Spice USB device and the VM is running under Qemu
v4.1.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
For non pci express passthrough additional addresses are reserved.
For pcie passthrough pcie root ports are needed (unless guest is like
windows 7).
The first 4 pcie root ports are defined by default in the pve-q35*.cfg
files. If more than 4 pcie devices are passed through the needed root
ports are created on demand. This helps to keep live migration possible
without adding a new pve-q35*.cfg file.
For the windows 7 like guests additional addresses are reserved as well.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
with qemu 4.0 we can make use of the new pcie-root-ports with settings
for the width/speed which can resolve issues with some hardware combinations
when negioating link speed
so we add a new q35 cfg that we include with machine types >= 4.0
to preserve live migration of machines without passthrough but q35
for details about the link speeds see:
pcie: Enhanced link speed and width support
https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02827.html
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
This is required for Windows to recognize the ISO and as a result the cloudinit
config. This is the minimum to get any config working at all for windows.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
This adds a function to dump the generated cloudinit config. Only one
can be dumped at a time, either 'user', 'network' or 'meta'.
The logic to get user, network and metadata is copied from the other
path that also creates the ISO image to keep it simple and not
complicate the other code path further.
The hash generation for the metadata config is unified between nocloud
and configdrive2 formats. We need it a 3rd time with the new dump
functions so it makes sense to combine it and the metadata config
generation in a single function. The <format>_gen_metadata functions are
each used twice now.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
The variable is used instead of the literal value so we have one single
place to change the actual value of every use.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
file_size_info can't find the file if it is not available, e.g.,
RBD storage with KRBD or LVM where the volume was not yet activated,
returns then 0, which we interpret as the disk not existing, thus
call vdisk_alloc which errors as the disk, in fact, really already exists.
With this patch we call activate_volume before trying
file_size_info, so if the volume exists we get it available and else
we can really create it.
If the disk does not exist and is created with vdisk_alloc we still
require an additional call to activate_volume for the new disk.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
we know the size, and even if a storage plugin pads this up (it
mustn't alloc something smaller, but something bigger can be OK) we
know that our 4MB is OK, and can only be used anyway to make this
compatible between storage plugins.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
use file_size_info to check for existence of cloudinit disk instead of
'-e'. It uses `qemu-img info` to get some file info, which can handle
rbd, and various other paths for volumes not exposed as normal file
or not mapped, yet.
this addresses a problem with rbd where the path returned available
is not checkable with '-e'.
Any size > 0 is interpreted as the image existing.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
create a fixed size cloudinit disk if it is referenced in config and
does not exist. the size of the disk created when first added to the
config is reduced to 4MiB to match the one created in
commit_cloudinit_disk.
maximum file size per snippet file (network, user, meta) is increased to 1MiB.
preparation for offline migration without the cloudinit disk (that is
always regenerated on start).
also fixes#1807, although a further patch is required to change the
vmid on restore
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
configdrive2 uses /etc/network/interfaces style config instead of the
official yaml one. this does not allow quoting of the ip addresses.
Tested with Windows Server 2016.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
Adds the 'cicustom' option to specify either or both network and user
options as property strings. Their parameters are files in a snippets
storage (e.g. local:snippets/network.yaml). If one or both are specified
they are used instead of their respective generated configuration.
This allows the use of completely custom configurations and is also a
possible solution for bug #2068 by specifying a custom user file that
contains package_upgrade: false.
Tested with Ubuntu 18.10 and cloud-init 18.4.7
Signed-off-by: David Limbeck <d.limbeck@proxmox.com>
with such a shared memory device, a vm can share data with other
vms or with the host via memory
one of the use cases is looking-glass[1] with pci-passthrough, which copies
the guest fb to the host and you get a high-speed, low-latency
display client for the vm
on vm stop we delete the file again
1: https://looking-glass.hostfission.com/
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Space or newline after ':' is recognized as a mapping and as a result an
ipv6 ending in ':' is not parsed as a string. The solution is to quote
the address. For consistency all other addresses (including mac) are
quoted.
Signed-off-by: David Limbeck <d.limbeck@proxmox.com>
when no numaX config options were present we returned the
hash as a list instead of a hash reference...
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Currently Proxmox VE always deallocates HugePagesTLB
when starting a new machine and it makes it impossible
to preconfigure kernel /proc/cmdline with persistent allocation.
This change makes deallocation to prefer defaults set by /proc/cmdline,
by parsing the cmdline and respecting hugepages= and hugepagesz=.
Signed-off-by: Kamil Trzciński <ayufan@ayufan.eu>
Win7 is very picky about pcie assignments and fails with
'error 12' the way we add hospci devices.
To combat that, we simply give the hostpci device a normal port
instead.
Start with address 0x10, so that we have space before those devices,
and between them and the ones configured in pve-q35.cfg should we
need it in the future.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
When live migrating, with a q35 machine will get the qemu version
encoded in the machine type, for example,'pc-q35-2.12', so we need to
allow this too and cannot expect that all q35 machine have
q35' in verbatim as their type.
So, when migrating such a machine live, we missed to include the q35
cfg because we didn't allowed versioned q35 machine types, which then
failed the migration.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
On arm we start off with a pcie bridge pcie.0. We need a
keyboard in addition to the tablet device, and we need to
connect both to an 'ehci' controller.
To do all this, we also pass the $arch variable through a
whole lot of function calls to ultimately also adapt the
hotplug code to take care of the new keyboard device.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
SLAAC previously set 'auto' which is not supported by nocloud network
config. On an up-to-date Ubuntu this should work as it uses 'dhcp' for
both dhcp and SLAAC. For others it was invalid anyway.
Signed-off-by: David Limbeck <d.limbeck@proxmox.com>
we change 'vga' to a property string and add a 'memory' property
with this, the user can better control the memory given to the virtual
gpu, this is especially useful for spice/qxl since high resolutions need
more memory
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
this imitates the qemu-guest-agent interface
with an 'exec' api call which returns a pid
and an 'exec-status' api call which takes a pid
the command for the exec call is given as an 'alist'
which means that when using we have to give the 'command'
parameter multiple times e.g.
pvesh create <...>/exec --command ls --command '-lha' --command '/home/user'
so that we avoid having to deal with shell escaping etc.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
we have '$conf, $vmid' elsewhere for cloudinit, this was the only
function which had them in reverse
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Leaving files in /tmp was mostly useful for debugging
purposes initially. Also /tmp is a rather insecure option
for this for a final version, so use
/run/pve/cloudinit/$vmid, and move the file writing into
commit_cloudinit_disk() which now takes a hash mapping file
paths to contents, to not duplicate the temp-file logic for
the different citypes.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
With configdrives we end up with the /etc/network/interfaces
file containing the interface names we use on the disk, ie.
eth0/eth1/..., which doesn't work on systems which do not
use this name.
With the 'nocloud' image type we can provide a
network-config in yaml which matches mac addresses. Ideally
we'd use version 2, but debian stretch ships with a too old
cloud-init for this, so for now we're writing version 1.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
We now have a patch on top of qemu to allow 'qemu-img dd'
to read from stdin when specifying input and output sizes,
as well as a way to tell it that the size of the source is
not known.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
perls 'local' must be either used in front of each $SIG{...}
assignments or they must be put in a list, else it affects only the
first variable and the rest are *not* in local context.
In all cases the global signal handlers we overwrote were in cli programs or
forked workers, not in daemons.
Each disk passed as paramater is add as 'unused[n]' in the vm.conf
(the default) or ide[n]|scsi[n]|sata[n]
We rely on qemu-img(1) convert heuristics to detect the file type,
as this works in most case.
foreach_dimm() provides a guest numa node index, when used
in conjunction with the guest-to-host numa node topology
mapping one has to make sure that the correct host-side
indices are used.
This covers situations where the user defines a numaX with
hostnodes=Y with Y != X.
For instance:
cores: 2
hotplug: disk,network,cpu,memory
hugepages: 2
memory: 2048
numa: 1
numa1: memory=512,hostnodes=0,cpus=0-1,policy=bind
numa2: memory=512,hostnodes=0,cpus=2-3,policy=bind
Both numa IDs 1 and 2 passed by foreach_dimm() have to be
mapped to host node 0.
Note that this also reverses the foreach_reverse_dimm() numa
node numbering as the current code, while walking sizes
backwards, walked the numa IDs inside each size forward,
which makes more sense. (Memory hot-unplug is still working
with this.)
vm configuration
----------------
hugepages: (any|2|1024)
any: we'll try to allocate 1GB hugepage if possible, if not we use 2MB hugepage
2: we want to use 2MB hugepage
1024: we want to use 1GB hugepage. (memory need to be multiple of 1GB in this case)
optionnal host configuration for 1GB hugepages
----------------------------------------------
1GB hugepages can be allocated at boot if user want it.
hugepages need to be contiguous, so sometime it's not possible to reserve them on the fly
/etc/default/grub : GRUB_CMDLINE_LINUX_DEFAULT="quiet hugepagesz=1G hugepages=x"
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>