try to increase variable declaration to use locality, reduce
duplication of same variable names used, and try to cleanup a bit in
general..
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
For non pci express passthrough additional addresses are reserved.
For pcie passthrough pcie root ports are needed (unless guest is like
windows 7).
The first 4 pcie root ports are defined by default in the pve-q35*.cfg
files. If more than 4 pcie devices are passed through the needed root
ports are created on demand. This helps to keep live migration possible
without adding a new pve-q35*.cfg file.
For the windows 7 like guests additional addresses are reserved as well.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
This adds a new config option called `spice_enhancements` with two
optional settings:
* videostreaming
* foldersharing
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
we now also save the mtime of the binary and cache per binary (for each
arch; this is done so we have it already when we sometime decide
that we want to split the qemu package for each arch)
so that we get the real version if only pve-qemu-kvm was updated
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
The latest changes to our audio device implemenation [0] changed the
naming of the device id to "audio<id>" which in practice resulted in
"audio0".
This conflicts with the predefined audio device in the Q35 configs
that is also using "audio0". The result is that a VM with a
configured audio device and Q35 type will not start.
While we just would had removed the audio0 device if we had detected
this earlier in the new 4.0 q35 config, we cannot do so anymore due
to migration compatibility.
So rename the device from "audio$id" to audiodev$id".
Co-authored-by: Aaron Lauterer <a.lauterer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Both flags were causing issues and have been made into optional flags to
be manually selected by the user if needed.
The supported flags have been merged into a single list for convenience.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
QEMU 4 adds the `-audiodev` parameter to explicitly specify the audio
backend. Setting it avoids occasional error messages when starting a
virtual machine with an audio device and qemu wants to connect it to the
physical audio device.
For now only SPICE is supported as it's also the biggest use case.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
die if we get some unknown value, which is either a bug (new dev type
added to enum but not here) or a user manual edit error, good to
ccatch both
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Until now audio devices had to be added manually with the `args` option
in the VMs config file. This adds a new config option to define an audio
device. It is called `audio0` with the extra `0` to be prepared should
we ever implement support for more than one audio device.
Supported devices are:
* ich9-intel-hda: Intel HD Audio Controller, emulates ICH9
* intel-hda: Intel HD Audio Controller, emulates ICH6
* AC97: useful for older OSs like Win XP
If any of the `intel-hda`s is selected two additional devices will be
created: hda-micro and hda-duplex. The `*intel-hda` controllers need at
least one of them as they emulate microphones and speakers (hda-micro)
or line in and out devices (hda-duplex).
Having both is deemed best practice to avoid problems if some software
needs one of the other kind of output/input ports.
I also cleaned up some old, commented out, code regarding audio devices.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
with qemu 4.0 we can make use of the new pcie-root-ports with settings
for the width/speed which can resolve issues with some hardware combinations
when negioating link speed
so we add a new q35 cfg that we include with machine types >= 4.0
to preserve live migration of machines without passthrough but q35
for details about the link speeds see:
pcie: Enhanced link speed and width support
https://lists.gnu.org/archive/html/qemu-devel/2018-12/msg02827.html
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
If 'now' is passed to the startdate option, the kvm start fails with
below failure.
kvm: invalid datetime format
valid formats: '2006-06-17T16:01:21' or '2006-06-17'
With this patch, 'now' is ignored and not passed to the rtcflags (-rtc).
Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
e.g. local storage was considered not allowed for offline migration
even if it is available on all nodes, this should now be fixed as it
is now considered available on all nodes if a local storage isn't
restricted to a specific subset of the available nodes. The user is
responseable to make sure that the datacenter storage config reflects
the actual setup, so there is no additional check for local storages
which aren't available on all nodes if they are not explicitly marked
at datacenter level.
Signed-off-by: Tim Marx <t.marx@proxmox.com>
Regression from change allowing timeouts to be set, now shutting down
also works without timeouts again (previously qmp failed because of
the unknown "timeout" parameter passed to it).
We always delete the timeout value from the arguments, regardless of
truthiness. "delete" returns the deleted element, deleting a
non-existant hash entry returns undef, which is fine after this point:
"deleting non-existent elements returns the undefined value in their
corresponding positions."
- https://perldoc.perl.org/functions/delete.html
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
while it really should not pose an issue for live migrations lets be
on the safe side and also add this only with 3.1 or later machines
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Kernels 4.18+ (4.17+ for evmcs) support new Hyper-V enlightenments for
Windows KVM guests. QEMU supports these since 3.0 and 3.1 respectively.
tlbflush and ipi improve performance on overcommitted systems, evmcs
improves nested virtualization.
It's not entirely clear to me if Win7 already supports these, but since
it doesn't cause any performance penalties (and it works fine without
crashing, which makes sense either way, because Hyper-V enlightenments
are opt-in by the guest OS), enabling it regardless should be fine.
(As opposed to adding a new if branch for win8+)
Feature explanations to the best of my understanding:
hv_tlbflush allows the guest OS to trigger tlb shootdowns via a
hypercall. This allows CPUs to be identified via their vpindex (which
makes hv_vpindex a prerequisite to hv_tlbflush, but that is already
handled in our code). In overcommited configurations, where multiple
vCPUs reside on one pCPU, this increases performance of guest tlb
flushes, by only flushing each pCPU once. It also allows multiple tlb
flushes with only one vmexit.
hv_ipi allows sending inter-processor interrupts via vpindex, once again
making it a prerequisite. Benefits are pretty much as with tlbflush.
hv_evmcs is a VM control structure in L1 guest memory, allowing an L1 guest
to modify L2 VMCS and entering L2 without having the L0 host perform an
expensive VMCS update on trapping the nested vmenter.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
The "guest-shutdown" guest agent call is blocking for some reason, so if
it fails (e.g. agent not installed on guest) only the default timeout of
10 minutes (see QMPClient.pm, sub cmd) would apply.
With this change, if (and only if) a timeout is specified via CLI/API,
it is used instead. In case it is not specified, behaviour stays the
same (default 10 min timeout).
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
This should help with the rare case where stop mode backups
fail to restart due to the $vmid.scope not being completely
gone when we want to restart. This queries systemd via dbus,
and if the scope is still there, awaits a UnitRemoved signal
for the scope from dbus.
For now with a 5 second timeout... (given that the processes
are already gone and it's really just waiting for systemd to
wake up, this should be plenty...)
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
to acutally allow to set the values requested by #2190 we need to
alter the limit a bit, as else the requested values cannot be save.
Just double it to 512, which should be really enough for this, else
go complain to your software vendor for needing such _hacks_ in the
first place..
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
On some occasions e.g. license checking, the manufacturer string in the
SMBIOS settings edit has to allow characters such as whitespaces.
https://forum.proxmox.com/threads/proxmox-and-windows-rok-license-for-dell.53236/
In principle SMBIOS allows to pass any zero terminated string to the
corresponding fields in the structure type 1 (System Information).
By base64 encoding the values clashing of the config is avoided.
Relies on the corresponding patch to pve-manager to obtain base64 encoded values.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
for both vm_mon_cmd calls. under certain circumstances, the following
sequence of events can otherwise fail when live-migrating under load:
S...source node
T...target node
0: migration is complete, handover from S to T starts
1: S: logically move VM config file from S to T via rename()
2: S: rename returns, config file is (visibly) moved on S
3: S: trigger resume on T via mtunnel
4a: T: call vm_resume while config file move is not yet visible on T
4b: T: call vm_resume while config file move is already visible on T
4a instead of 4b means vm_mon_cmd will die in check_running unless
vm_mon_cmd_nocheck is used.
under heavy pmxcfs load and a slow cluster/corosync network, there can
be a few seconds of delay between 1 and 2, with a subsequent race ending in 4a
instead of 4b.
this issue was reported to occur on bulk migrations.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
cloudinit is just completely broken...
Until we rewrite this to a sane designe (i.e., never backup, mirror,
... any CI disk anywhere - the state is in the config and gets
created on start and deleted on stop anyway) do this..
Co-developed-by: Mira Limbek <m.limbek@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Resolves the issue of restoring a VM that has a cloudinit drive
configured to a new VMID. The VMID of the disk name gets now remapped
correctly and in the process the cloudinit disk is created with the same size
as in PVE/API2/Qemu.pm create_disks and in PVE/QemuServer/Cloudinit
commit_cloudinit_disk as the same constant is used.
This is done by matching any line starting with either 'ide', 'sata' or
'scsi' and then check if it is a cloudinit drive. As cloudinit drives
are attached as cdrom, only those 3 are possible. For the cloudinit
check we use 'parse_drive' followed by 'drive_is_cloudinit' so no custom
regex is necessary.
'--targetstorage' is also taken into account on restore now for
cloudinit disks. If a target storage is specified the format is either
kept if possible, or changed to the default of that storage.
This should fix#1807 completely. The restore error was already resolved
with commit 7e8ab2a, but the vmid of the disk might not have matched the new
one.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
with commit 64d1a6a it's now possible to specify a format other than raw
or qcow2 when creating VMs. This can lead to an error when cloning the
VMs and a cloudinit disk with a different format is attached (e.g.
vmdk).
We use QEMU_FORMAT_RE in drive_is_cloudinit and according to the
QEMU_FORMAT_RE we support 7 different formats.
With this change we add any format other than 'raw' as '.<format>' to the
name and no longer die on any other format. Cloudinit disks with invalid
format are not cloned as the drive is recognized as cdrom, not cloudinit.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
The restore of a backup from a VM template will first restore the VM and then
convert the restored VM back into a template.
This automatically performes the steps of the current behaviour, where the user
has to manually convert the restored VM back to a template.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
used for online local disks via qemu_drive_mirror
Add TODO comment for offline disks, as clone_disk calls `qemu-img
convert`, which does not have a bandwidth limit parameter.
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
this makes it possible to give a storage for state saving, if one
wants to use a different storage than for snapshots or does not
want to save this info into the config
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
if a vm has the 'suspended' lock, we resume with the saved state
and remove the lock, the saved vmstate and the saved runningmachine
after the vm started
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
the idea is to have the same logic as with snapshots, but without
the snapshotting of the disks, and after saving the vm state (incl memory),
we hard shut off the guest.
this way the disks will not be touched anymore by the guest
to prevent any alteration of the vm (incl migration, hw changes, etc) we
add a config lock 'suspend'
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
we map scsiX to virtioscsiX/scsihwX when we use virtio-scsi-single to add
and iothread so we have to map it back when we delete an iothread, else the
parsing fails with
'invalid drive key: virtioscsi0'
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
qemu-img uses the qemu default initiator name 'iqn.2008-11.org.linux-kvm'
since we use the one of the host (/etc/iscsi/initiatorname.iscsi) when
using it with a running vm, we want to using it also when moving a disk
with qemu-img
to do that we have give qemu-img the image in as a full option string
this fixes the issue that we could not move an zfs-over-iscsi disk
without allowing the default qemu initiator
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Adds the 'cicustom' option to specify either or both network and user
options as property strings. Their parameters are files in a snippets
storage (e.g. local:snippets/network.yaml). If one or both are specified
they are used instead of their respective generated configuration.
This allows the use of completely custom configurations and is also a
possible solution for bug #2068 by specifying a custom user file that
contains package_upgrade: false.
Tested with Ubuntu 18.10 and cloud-init 18.4.7
Signed-off-by: David Limbeck <d.limbeck@proxmox.com>
we also need to set the link status if the whole device changed,
otherwise a change of macaddress allows a network connection even
if link_down is set to 1
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
with such a shared memory device, a vm can share data with other
vms or with the host via memory
one of the use cases is looking-glass[1] with pci-passthrough, which copies
the guest fb to the host and you get a high-speed, low-latency
display client for the vm
on vm stop we delete the file again
1: https://looking-glass.hostfission.com/
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
This allows to set the wwn parameter for ide, sata and scsi disks in the VM
config and passes it to the qemu command on execution.
VirtIO Block does not supports this property, so exclude it from
there.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
commit 3c23aa808c tried to fix a issue
where after a stop mode backup a scope could still linger around, but
it actually removed the wrong check. If we want to remove a
lingering, not yet cleaned up, scope we need to check if said scope
exists not if a VM process is still running. While they are corelated
the scope will always get cleaned up _after_ it's processes are gone.
Should fix#2043, but as this is seemingly not that easy to fix one
for all I'll put the should as disclaimer here.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Cc: Wolfgang Bumiller <w.bumiller@proxmox.com>
this adds a new config option for it, and executes it on four
points in time:
'pre-start'
'post-start'
'pre-stop'
'post-stop'
on pre-start we abort if the script fails
and pre-stop will not be called if the vm crashes or if
the vm gets powered off from inside the guest
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
The qm CLI command offer the config and showcmd functions. Both of those
outputs may vary with respect to a given snapshot. This adds a switch
that shows the corresponding snapshot's config and command line.
The code needs a newer libpve-guest-common-perl, thus bumping the
dependency.
Signed-off-by: Rhonda D'Vine <rhonda@proxmox.com>
this patch allows the user to explicitely set a virtual vga,
even when using the 'x-vga' flag, this is sometimes necessary,
as some users need the 'x-vga' flag on the pci device,
but still want to use a virtual vga
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
with this, a user can set the hv_vendor_id independently of
any 'x-vga=on' setting he may or may not have configured.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Win7 is very picky about pcie assignments and fails with
'error 12' the way we add hospci devices.
To combat that, we simply give the hostpci device a normal port
instead.
Start with address 0x10, so that we have space before those devices,
and between them and the ones configured in pve-q35.cfg should we
need it in the future.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
When not setting 'vga' we would get a warning:
Use of uninitialized value $type in string eq at
/usr/share/perl5/PVE/QemuServer.pm line 2026.
This patch changes the order of the conditions and checks if $type is set
before using it, so that we do not get the warning anymore.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
so that one can explicitly disable the vga without having to specify
a serial port as display, this is mostly useful for very special
and custom gpu passthrough setups which have to be specified with
'args' and for setups which do not care about any display (not even serial)
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
there is nothing that should be really affected by this, but
even then, this option is only for experts and people using this
should know what they are doing
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
since lspci does not split between id and function anymore,
there is no need to plug id + function together
also we can remove the capture groups from PCIRE
since parse_property_string does this check for us
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
with this, we are able to create and use mediated devices,
which include Intel GVT-g (aka KVMGT) and Nvidia vGPUs, and probably more
types of devices in the future
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
we reverse the direction of the event socket (this does not
prevent live migration) and point it to wher qmeventd listens
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
On arm we start off with a pcie bridge pcie.0. We need a
keyboard in addition to the tablet device, and we need to
connect both to an 'ehci' controller.
To do all this, we also pass the $arch variable through a
whole lot of function calls to ultimately also adapt the
hotplug code to take care of the new keyboard device.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
This was never actually used, but we want to use it as
alternative to checking /proc/cpuinfo for 'hvm' on ARM.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
with commit 55655ebc32
we changed $vga to a parsed hash instead of a string
and forgot to check the property type in one place
this fixes an issue where a vm with a gpu passed through
with x-vga=on could not start
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
we change 'vga' to a property string and add a 'memory' property
with this, the user can better control the memory given to the virtual
gpu, this is especially useful for spice/qxl since high resolutions need
more memory
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
When enabled, the `ssd` property exposes drives as SSDs (rather than
rotational hard disks) by setting QEMU's `rotation_rate` property [1,
2] on `ide-hd`, `scsi-block`, and `scsi-hd` devices. This is required
to enable support for TRIM and SSD-specific optimizations in certain
guest operating systems that are limited to emulated controller types
(IDE, AHCI, and non-VirtIO SCSI).
This change also unifies the diverging IDE and SATA code paths in
QemuServer::print_drivedevice_full(), which suffered from:
* Code duplication: The only differences between IDE and SATA were in
bus-unit specification and maximum device counts.
* Inconsistent implementation: The IDE code used the new `ide-hd`
and `ide-cd` device types, whereas SATA still relied on the deprecated
`ide-drive` [3, 4] (which doesn't support `rotation_rate`).
* Different feature sets: The IDE code exposed a `model` property that
the SATA code didn't, even though QEMU supports it for both.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1498042
[2] https://lists.gnu.org/archive/html/qemu-devel/2017-10/msg00698.html
[3] https://www.redhat.com/archives/libvir-list/2012-March/msg00684.html
[4] https://lists.gnu.org/archive/html/qemu-devel/2017-05/msg02024.html
Signed-off-by: Nick Chevsky <nchevsky@gmail.com>
we will use this for the qmeventd, but we have to limit this
to qemu 2.12, because we cannot add this during a live migration
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Instead of our own. The code is almost the same, but the
upstream implementation uses qemu's transactional system and
performs a drain() on the block device first. This seems to
help avoid some issues we run into with qcow2 files when
creating snapshots.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
do not use $1 do write out config, if code gets added this may easily
get overwritten, as vmgenid is a fixed key just hardcode it.
also move the comment to where it actually belongs
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This caused a few hiccups with qemu 3.0...
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Acked-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
instead of overwriting the 'machine' config in the snapshot,
use its own 'runningmachine' config only for the snapshot
this way, we do not lose the machine type if it was
explicitely set during the snapshot, but deleted afterwards
we also have to adapt the tests for this
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
> The following are important CPU features that should be used on
> Intel x86 hosts, when available in the host CPU. Some of them
> require explicit configuration to enable, as they are not included
> by default in some, or all, of the named CPU models listed above.
> In general all of these features are included if using “Host
> passthrough” or “Host model”.
>
> pcid: Recommended to mitigate the cost of the Meltdown
> (CVE-2017-5754) fix. Included by default in Haswell, Broadwell &
> Skylake Intel CPU models. Should be explicitly turned on for
> Westmere, SandyBridge, and IvyBridge Intel CPU models. Note that
> some desktop/mobile Westmere CPUs cannot support this feature.
>
> spec-ctrl: Required to enable the Spectre (CVE-2017-5753 and
> CVE-2017-5715) fix, in cases where retpolines are not sufficient.
> Included by default in Intel CPU models with -IBRS suffix. Must be
> explicitly turned on for Intel CPU models without -IBRS suffix.
> Requires the host CPU microcode to support this feature before it
> can be used for guest CPUs.
>
> ssbd: Required to enable the CVE-2018-3639 fix. Not included by
> default in any Intel CPU model. Must be explicitly turned on for
> all Intel CPU models. Requires the host CPU microcode to support
> this feature before it can be used for guest CPUs.
>
> pdpe1gbr: Recommended to allow guest OS to use 1GB size pages.Not
> included by default in any Intel CPU model. Should be explicitly
> turned on for all Intel CPU models. Note that not all CPU hardware
> will support this feature.
-- https://www.berrange.com/posts/2018/06/29/cpu-model-configuration-for-qemu-kvm-on-x86-hosts/
changelog v2:
- remove hash
- remove check if cdrom
if we try to delete a snapshot, and that is disk from the snapshot
is not attached anymore (unused), we can't delete the snapshot
with qemu snapshot delete command (for storage which use it (qcow2,rbd,...))
example:
...
unused0: rbd:vm-107-disk-3
[snap1]
...
scsi2: rbd:vm-107-disk-3,size=1G
-> die
qmp command 'delete-drive-snapshot' failed - Device 'drive-scsi2' not found
If drive is not attached, we need to use the storage snapshot delete command
tells an user what would get touched, so he has a chance to fix
unwanted things before changes are actually made.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Else an user has no idea what, or if something happened.
Gets printed to tty when using qm rescan or to tasklog for the case
where we do a rescan after restoring a backup.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Unused disk(s) appeared after a rescan of storages. Especially shown
with ceph pools, where two storage entries are made, <storage>_ct and
<storage>_vm. The rescan method did include images from both storages.
This patch filters any storage not containing the content type 'images'.
Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
when a vm is suspended (e.g. autosuspend on windows)
we detect that it is not running, display the resume button,
but 'cont' does not wakeup the system from suspend
with this we can wake up suspended vms
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
the not definedness check is unecessary here, since it does not
do anything then, and to check balloon twice is also not necessary
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Deleting the balloon config entry means resetting it to its
default. This means having a balloon device but not actually
doing any ballooning with it (iow. resetting the VM's
'balloon' value to its specified memory.).
Hotplugging a balloon device (coming from explicit '0' to
any other value (including deleting it)) is not possible.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
To avoid potential cleanup & post-start actions to cause
unwanted processes (such as gpg-agent) to be started as part
of the scope, as the enter_systemd_scope() function causes
the current process to enter the scope.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
when using q35 as machine type, there are nested pci-bridges,
but we only checked the first layer
this resulted in not being able to hotplug scsi devices,
because scsihw0 was deeper in the pci-bridge construct, we did not see
it and tried to add it (which fails of course)
this patch checks all bridges, regardless how deeply nested they are
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
it is not necessary to check the romfile of the running vm
for .pxe machine types, since the machine type itself is not
hot-pluggable
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
we accidentally moved nbd_stop to CloudInit.pm in
commit 0c9a7596f6
and removed it in
commit 3db6e4ab70
without realizing that live local storage migration still depends on it
readd it
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
With QEMU 2.10 the serial parameter of the -drive command line option
was deprecated [1], so move the logic which adds this parameter now
to the -drive analogue -device CLI option.
Features marked deprecated will continue to work for two releases[2],
so we need to switch over before 2.12, AFAICT.
[1]: https://wiki.qemu.org/ChangeLog/2.10#Deprecated_options
[2]: https://qemu.weilnetz.de/doc/qemu-doc.html#Deprecated-features
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
since this requires O_DIRECT support by the underlying storage, which
might not be available.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
this fixes an issue with zvols, which require cache=none and eat up all
free memory as buffered pages otherwise
https://github.com/zfsonlinux/zfs/issues/7235
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Modern noVNC does not needs this anymore, actually things may get
worse if it's used. E.g., when one sets 'de' and the VM locale is
'de' you may get a 'ĸ' (unicode kra) if you want to send an ampersand
character through pressing SHIFT + 6.
Qemus manual pages confirms that this is most times not needed
anymore:
> -k language
> Use keyboard layout language (for example "fr" for
> French). This option is only needed where it is not
> easy to get raw PC keycodes (e.g. on Macs, with some
> X11 servers or with a VNC or curses display). You don't
> normally need to use it on PC/Linux or PC/Windows
> hosts.
-- man kvm
An user can always set it per VM, wew simply remove the implict
default derived from the cluster wide datacenter.cfg
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
The git history of this is not immediately obvious due to
the date of the cloud init patches, but the removal of this
line was basically reverted by them later at merge-time.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
We introduced our QMP socket with commit
c971c4f221 (29.05.2012)
Already tried to remove this with commit
7b7c6d1b5d (13.07.2012)
But reverted that to allow migration of VMs still using the old
montior to ones which already switched over to the new QMP one,
in commit dab36e1ee9 (17.08.2012)
see bug #242 for reference
This was all done and released in PVE 2.2, as no migration through
nodes differing more than one major version is possible we can
finally remove this code for good.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Checking for the cgroup directory is a kind of time-of-check
time-of-use race condition stop-mode backups seem to
occasionally run into on some systems.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
With configdrives we end up with the /etc/network/interfaces
file containing the interface names we use on the disk, ie.
eth0/eth1/..., which doesn't work on systems which do not
use this name.
With the 'nocloud' image type we can provide a
network-config in yaml which matches mac addresses. Ideally
we'd use version 2, but debian stretch ships with a too old
cloud-init for this, so for now we're writing version 1.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
move: don't error out with "you can't move a cdrom"
clone: always full-clone cloud-init images
They get completely replaced anyway at the next start, so
there's no point in keeping them.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
*) always replace old cloudinit images
*) apply pending cloudinit changes when generating a new
image
For cloudinit we now always use vdisk_free before
vdisk_alloc in order to always replace old images, this
allows us to hotplug a new drive by setting it to
`none,media=cdrom` first (to eject the disk), then setting
it back to 'storage:cloudinit' to have a new image generated
after applying the currently pending changes.
socat tunnel for nbd mirror was introduce here
https://pve.proxmox.com/pipermail/pve-devel/2017-January/024777.html
to workaround when nbd client was hanging on non responding nbd server.
We have added a 30s timeout on socat tunnel, but when we migrate
multiple disks, it can break migration if for example first disk
is already finished and don't send any new datas in the tunnel.
The connect timeout bug has been fixed in qemu 2.9,
so we can remove the socat tunnel now.
With shared=1, (live) migration ignores the disk and assumes it is
present on all target nodes. This works similar to shared=1 on LXC
mountpoints.
Signed-off-by: Chris Hofstaedtler <chris.hofstaedtler@deduktiva.com>
Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
if the value was '0', we did not append the option to the drive,
resulting in wrong command line if the qemu default of an option is not
'0'
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Skylake-Server is the Xeon variant of Skylake
max is "all features supported by the accelerator in the current host"
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
this have the 'spec-ctrl' flag by default to allow IBRS based Spectre
mitigation by the guest kernel.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Hugepages can take some time to be allocated by qemu at start (60s for 120G of 1G hugepages).
This patch increase start timeout to 5min when hugepages are enabled.
Currently this only allows specifying '+pcid' or '-pcid'
but might be extended in the future.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Fixes: 2bfbee039b ("include format for efidisk")
Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This also fixes a bug where VMs with no memory defined in the config
where reported as using 0MB instead of 512.
Signed-off-by: Emmanuel Kasper <e.kasper@proxmox.com>
when having an unused disk on a storage for which there are multiple
definitions, we added it again on another storage when that storage
was alphabetically before the already existing one
this happens for example when using our automatically generated
ceph storages: 'pool_ct' and 'pool_vm' and having a vm with
an unused disk
with this patch, we also leave the unused disks in the hash
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
this means that we do not include the '-k' parameter anymore by default
(which is deprecated by qemu)
with this, noVNC and spice always respect the guest keyboard
configuration and altgr keys work without problems
tested:
ubuntu with english intl and german with novnc and spice
windows 10 with english intl and german with novnc and spice
live migration
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
since the guest-fsfreeze-freeze command has a timeout of 1 hour,
we want to check if the guest-agent even runs before executing that,
or else we wait 1 hour and then continue
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
if the efidisk is in 'raw' format, qemu will prevent writes
on block zero if the format is not explicitely given
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
'These options take an integer value and control the "cpu.shares"
control group attribute. The allowed range is 2 to 262144. Defaults to
1024.' – man 5 systemd.resource-control
we only checked if a vm had in use base disks when deleting them,
at which point we do not stop to delete the vm even when a
disk deletion fails, which means we could successfully delete the config
and all not used (base) disks of a template, resulting in left over vm disks
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
perls 'local' must be either used in front of each $SIG{...}
assignments or they must be put in a list, else it affects only the
first variable and the rest are *not* in local context.
In all cases the global signal handlers we overwrote were in cli programs or
forked workers, not in daemons.
this was only kept for PVE 4.X where the switch to the newer OVMF
image with actual working persisten EFIVARS was made.
We do not ship this old image in PVE 5.0 anymore so remove this
legacy code as it can never trigger anyhow.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
factor out code in a new create_efidisk submethod, as else this code
is hardly readable as the efidisk0 case is a special case. Refer from
putting all this specialised handling directly to the much shorter
code for all other cases.
Also the disk was created with a specific format and then a format
detection on the newly created disk was done, which is pretty
useless, clear that up.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
perls 'local' must be either used in front of each $SIG{...}
assignments or they must be put in a list, else it affects only the
first variable and the rest are *not* in local context.
This may cause weird behaviour where daemons seemingly do not get
terminating signals delivered correctly and thus may not shutdown
gracefully anymore.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Moved the check to the beginning of the function.
VMs configured to use KVM won't start if KVM is not available.
VMs not configured to use KVM will start regardless.
Fixes a typo in the function name and removes the $nokvm parameter, as it's only
used to immideately exit the function. Instead calling the function
conditionally.
If we get an VM machine older than 2.9 we use the old selection
expression for the VGA type. This allows to live migrate VMs to PVE
5.0 from beta 1 and PVE 4.4 again.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
the syntax was wrong, it was (e.g. for iops-write):
throttling.iops-write=-max100
instead of
throttling.iops-write-max=100
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
QemuServer::lspci() iterates over /sys/bus/pci/devices which
doesn't guarantee any order which means functions sometimes
ended up in the wrong order and it was never clear which
one would get the additional options such as x-vga passed
to them.
bps_max_length & friends were wrongly named and were only
passed to qemu when hot-applying changes. They can only
be passed via the command line with their new names. For
consistency let's rename them all, that way they're all in
one place.
Fixes#1195 (for real this time).
We cannot look for ports on "any" wildcard address while
letting qemu bind to "localhost", this may lead to a qemu
process occupying ::1 while the next search successfully
finds the same port available for IPv4's '*' address.
Instead, we now lookup the IP of the desired family for
'localhost'. Note that while we could simply be hardcoding
::1 or 127.0.0.1, with this code we are protocol agnostic.
For calculation of "current_core" the input variable id is decremented.
For calculation of "current_socket" this decrement was missing resulting
in a wrong value when "cores" is set to 1.
Signed-off-by: Tobias Böhm <tb@robhost.de>
This patch will include all necessary properties for the replication.
Also will it enable and disable a replication job
when appointed flags are set or deleted.
this (correctly!) errored out with Qemu 2.9 when live-migrating
local disks, because the NBD server blocks the VM from being
resumed. was probably missed when migrating via unix domains
was originally introduced..
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
This was checking for scsihw being set in both branches
whereas lsi is also the default. Added the missing 'not'.
Fixes a bug where a VM with a disk with a scsi index >= 7
refused to start due to an invalid scsi id.
Reported-by: Friedrich Ramberger <f.ramberger@proxmox.com>
since it can cause I/O errors and data corruption in low
memory or highly fragmented memory situations since Qemu 2.7
use scsi-hd by default instead
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
if qga is enabled, we try to freeze the fs before cancelling block job.
if not , we pause the vm.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
This will create a new drive for each local drive found,
and start the vm with this new drives.
if targetstorage == 1, we use same sid than original vm disk
a nbd server is started in qemu and expose local volumes to network port
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
we can use multiple drive_mirror in parralel.
block-job-complete can be skipped, if we want to add more mirror job later.
also add support for nbd uri to qemu_drive_mirror
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
otherwise we end up with undeletable VM configs in case
vdisk_free fails (which could happen because of cluster-wide
lock contention, storage problems, ..).
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
When trying to migrate a VM from a node with qemu server <= 4.0-92 to
a node with qemu server >= 4.0-93 we failed as the remote qemu-server
got no explicit migration_type' from the older qemu server on the
source.
Check if migration_type is defined on a incoming migration start, if
not set it.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
All special flags for Windows 8 and Windows 2012 (win8 type)
are kept the same , since we set flags based on checking if
/^win(\d+)$/ is greater than 6 or 7
like for snapshot, we need to check if krbd is enabled, to known
if we need to use qmp delete-drive-snapshot or storage command directly
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
This cleanup windows guest os version handling,
with normalizing ostype with numbers in a new windows_version sub.
if($ostype eq 'wxp' || $ostype eq 'w2k3' || $ostype eq 'w2k') {
$winversion = 5;
} elsif($ostype eq 'w2k8' || $ostype eq 'wvista') {
$winversion = 6;
} elsif ($ostype =~ m/^win(\d+)$/) {
$winversion = $1;
}
so we can simply do test on windows version with lower or upper version
Hyperv enlightments configuration is centralized
in a new add_hyperv_enlighments sub.
Also disable hyperv with win < 8 + ovmf.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Without this patch we use the network were the cluster traffic runs
for sending migration traffic. This is not ideal as it may hinder
cluster traffic. Further some users have a powerful network which
would be perfect for migrations, with this patch they can run the
migration traffic over such a network without having the corosync
traffic on the same network.
The network is configurable through /etc/pve/datacenter.cfg which
got a new property, namely migration. migration has two
subproperties: type (replaces the old migration_unsecure property)
and network.
For the case of a network failure or that a VM has to be moved over
another network for arbitrary other reasons I added the
migration_type and migration_network parameters to qm migrate (and
respectively vm_start as this gets used on migration).
They allow overwriting the datacenter.cfg settings.
Fixes bug #1177
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Let 'cdrom' use the pve-qm-ide format, as it's supposed to
be an alias to ide2.
We're not using the 'alias' schema property since the qemu
configs still use a custom parser (due to the
pending-changes system and the filename-to-volume-id
conversion for legacy support) which does not deal with
schema aliases.
when restoring into an existing VM, we don't want to die
half-way through because we can't delete one of the existing
volumes. instead, warn about the deletion failure, but
continue anyway. the not deleted disk is then added as
unused automatically.
this adds a bootsplash image in /usr/share/qemu-server
and if this file exists, use it for seabios
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
if efidisk0 is defined, use it as a efivars disk,
to permanently store efivars (such as boot options)
we check if the files exist, and act accordingly
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
just a simple disk (only size, format and volid) for
efivars disk
also do not add it to command line in foreach_drive
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
drive-mirror is not working with qemu 2.6 when iothread is enabled.
with virtio-blk : mirror is working, but block-job-completed crash the vm
with virtio-scsi : mirror hang at start.
This should be fixed in qemu 2.7
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
we have a few problems with hotplug at the moment:
qemu may add usb hubs when adding usb devices but fails to remove them
when removing the usb device (this is a qemu bug)
also when starting a guest with a usb device we add ehci and uchi
controllers, which we cannot hot unplug
with those devices, it is impossible to live migrate the guest
to another host, meaning even if you remove all usb devices,
the migrate fails
so we deactivate usb hotplugging for now
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
this patch introduces working usb hotplugging
you can now add a usb device while a vm is running
this does not work with spice at the moment, only
with usb passthrough
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
since usb devices do not have their own
"query" command in qmp, we have to use
qom-list /machines/peripheral
which essentially gets a list of peripheral devices of
the vm
there we only get the usb devices
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
vm configuration
----------------
hugepages: (any|2|1024)
any: we'll try to allocate 1GB hugepage if possible, if not we use 2MB hugepage
2: we want to use 2MB hugepage
1024: we want to use 1GB hugepage. (memory need to be multiple of 1GB in this case)
optionnal host configuration for 1GB hugepages
----------------------------------------------
1GB hugepages can be allocated at boot if user want it.
hugepages need to be contiguous, so sometime it's not possible to reserve them on the fly
/etc/default/grub : GRUB_CMDLINE_LINUX_DEFAULT="quiet hugepagesz=1G hugepages=x"
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
add them by default for qemu 2.6
(support is already present in qemu 2.5, but we don't want to break live migration for current running vm)
vpindex && runtime need host kernel 4.4
Theses 3 enlightements are needed by windows to use vmbus
http://searchwindowsserver.techtarget.com/definition/Microsoft-Virtual-Machine-Bus-VMBus
details :
- When Hyper-V "vpindex" is on, guest can use MSR HV_X64_MSR_VP_INDEX
to get virtual processor ID.
- Hyper-V "runtime" enlightement feature allows to use MSR
HV_X64_MSR_VP_RUNTIME to get the time the virtual processor consumes
running guest code, as well as the time the hypervisor spends running
code on behalf of that guest.
- Hyper-V "reset" allows guest to reset VM.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
We cannot guarantee when the SSH forward Tunnel really becomes
ready. The check with the mtunnel API call did not help for this
prolem as it only checked that the SSH connection itself works and
that the destination node has quorum but the forwarded tunnel itself
was not checked.
The Forward tunnel is a different channel in the SSH connection,
independent of the SSH `qm mtunnel` channel, so only if that works
it does not guarantees that our migration tunnel is up and ready.
When the node(s) where under load, or when we did parallel
migrations (migrateall), the migrate command was often started
before a tunnel was open and ready to receive data. This led to
a direct abortion of the migration and is the main cause in why
parallel migrations often leave two thirds or more VMs on the
source node.
The issue was tracked down to SSH after debugging the QEMU
process and enabling debug logging showed that the tunnel became
often to late available and ready, or not at all.
Fixing the TCP forward tunnel is quirky and not straight ahead, the
only way SSH gives as a possibility is to use -N (no command)
-f (background) and -o "ExitOnForwardFailure=yes", then it would
wait in the foreground until the tunnel is ready and only then
background itself. This is not quite the nicest way for our special
use case and our code base.
Waiting for the local port to become open and ready (through
/proc/net/tcp[6]] as a proof of concept is not enough, even if the
port is in the listening state and should theoretically accept
connections this still failed often as the tunnel was not yet fully
ready.
Further another problem would still be open if we tried to patch the
SSH Forward method we currently use - which we solve for free with
the approach of this patch - namely the problem that the method
to get an available port (next_migration_port) has a serious race
condition which could lead to multiple use of the same port on a
parallel migration (I observed this on my many test, seldom but if
it happens its really bad).
So lets now use UNIX sockets, which ssh supports since version 5.7.
The end points are UNIX socket bound to the VMID - thus no port so
no race and also no limitation of available ports (we reserved 50 for
migration).
The endpoints get created in /run/qemu-server/VMID.migrate and as
KVM/QEMU in current versions is able to use UNIX socket just as well
as TCP we have not to change much on the interaction with QEMU.
QEMU is started with the migrate_incoming url at the local
destination endpoint and creates the socket file, we then create
a listening socket on the source side and connect over SSH to the
destination.
Now the migration can be started by issuing the migrate qmp command
with an updated uri.
This breaks live migration from new to old, but *not* from old to
new, so there is a upgrade path.
If a live migration from new to old must be made (for whatever
reason), use the unsecure_migration setting (man datacenter.conf)
to allow this, although that should only be done in trusted network.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
With systemd-run qemu's --daemonize forks often happen
before systemd finishes setting up the scopes, which means
the limits we apply often don't work.
We now use enter_systemd_scope() to create the scope before
running qemu directly without systemd-run.
Note that vm_start() runs in a forked-worker or qm cli
command, so entering the scope in such a process should not
affect the rest of the pve daemon.
if we got an option which was not valid, we still
wrote it to the config, and subsequently returned
it on every api call
instead, now we die instead of warn and do not accept
invalid options
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
otherwise, long kvm commands lead to systemd unit files with
very long lines, with confuses the systemd unit file parser.
apparently systemd has a length limit for unit file lines and
(line-)breaks the description string at that point. since
the rest of the description is probably not a valid key/value
pair, this leads to warnings. the default semantics of systemd-run
is to use the executed command as description unless a description
is specified explicitly.
note that this behaviour of systemd could allow an attacker
with access to the VM configuration to craft a kvm commandline
that starts or stops arbitrary systemd units.
previously, we did not check the file parameter of a disk,
allowing passthrough of a block device (by design)
with the change to the json parser for the disks, the format
became 'pve-volume-id' which is only valid for our volume ids
(and later we also allowed the value 'none')
this patch alternatively checks if the parameter is a path
or 'cdrom'
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Otherwise some move operations will fail to delete the old
disk (eg. when moving from ceph to local storage).
Note that in order for the deactivation to succeed we need
to make sure qemu has closed its file descriptors, so we
need to wait for the job to disappear the same way we do in
$cancel_job().
Factored the waiting out into $finish_job().
Additionally since the cpu and host node list isn't
restricted to a single range one can now provide multipel
ranges separated by semicolons. (eg. cpus=0-3;5;7)
The urlencoded format currently cannot check the real
decoded length, so we limit to an upper bound and document
the real limits. Ideally we'd introduce a decodedLength
schema parameter at some point...
/cirrur/cirrus/
/devive/device/
/Numa/NUMA/
and a few grammar fixes, rewrites of sentences
Also if already touching those lines lets break them up from one
liners to a column limit of ~80.
When using OVS tap_plug() resets rate limiting so we need
to pass it along to reapply it.
The rate on its own can still be hot-plugged with the
regular tap_rate_limit() call.
Drop snapshot_create, snapshot_delete and snapshot_rollback
in favour of PVE::AbstractConfig. Qemu-specific parts are
implemented in __snapshot_XX methods in PVE::QemuConfig.
has_feature is made an implementation of the abstract
has_feature, and thus moves to PVE::QemuConfig.
Note: a new hook method needed to be introduced to be called
before creating a volume snapshot, after creating a volume
snapshot, and after unfreezing the guestfs after creating a
volume snapshot. The base method in PVE::AbstractConfig is a
noop, the implemention in PVE::QemuConfig runs the necessary
Qemu monitor commands.
Drop load_config, write_config, lock_config[_xx],
check_lock, check_protection, is_template and config_file
in favour of implementions in PVE::AbstractConfig.
Implement guest_type, __config_max_unused_disks,
config_file_lock and cfs_config_path from
PVE::AbstractConfig in PVE::QemuConfig.
Previously, foreach_drive iterated over all configuration
keys (in a random order) and checked whether the current key
is a valid drive name. Instead, we now iterate over a list
of valid drive names (with deterministic order) and check
whether a drive with such a name exists in the
configuration.
Also rename the two involved methods from valid_drive_name
to is_valid_drive_name (for the check) and from disknames
to valid_drive_names (for the list of valid keys), for
consistency. These two were only used in the qemu-server
code base.
We hold a lock from snapshot_prepare until snapshot_commit,
so there is no need to copy back the snapshot config to the
actual config. This allows to drop a workaround for not
copying the 'machine' type config option.
We don't have any storage types other than LVM which react
to scsi inquiry, and we don't want to treat LVM as a scsi
device, so now we only query devices added as actual /dev
path. This was originally intended to be a pass-through
feature anyway, so this makes sense.
As there the signleton function "kvm_user_version" may not have been
called and with the machine alias q35 the regex from the
qemu_machine_feature_enabled method does not match and thus we
need a valid kvm version here
instead of hardcoding the storagetypes for writing zeros on a
backup restore, we use volume_has_feature with 'sparseinit'
for determining if we can omit writing zeros
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Currently migration is broken, because qemu_machine_pxe return nothing if no pxe rom exist.
That mean that we don't pass -machine flag to migration, and migration is broken between qemu 2.4->2.5
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
qemu 2.5 support a new hyper-v feature: hv_vendor_id
This allow nvidia drivers to install on windows with hyper-v feature on.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
currently we leave orphaned vmstate files when we restore a
backup over a vm, which has snapshots with saved ram state.
this patch deletes those files on a restore.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Since write_config was always called with skiplock=1 except
once, it makes sense to drop this parameter like in
PVE::LXC::write_config . If needed in the future, the
caller can use check_lock before write_config anyway.
The method update_config wrapped update_config_nolock
using lock_config, but to prevent update races the whole
"read config", "do something", "write config" flow was
always protected by lock_config anyway, and update_config
was never called.
Thus, we can safely drop update_config and rename
update_config_nolock to write_config like in PVE::LXC .
since we want the usb3 option to be really boolean and not only
'usb3=yes', we have to change the usb json format a little
to not break existing configs for 'usbX: spice', we set the 'host'
option as non-optional and default_key and allow 'spice' as its
content (this also makes the option less ambiguous)
another side effect is that previously accepted multiple 'host='
entries are now forbidden
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
adding a flag for usb devices (usb3), if this is set to yes,
add a xhci controller and attach the specified devices to it
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
The API passes $skiplock to vm_destroy() which performed a
check conditionally depending on the $skiplock parameter and
then simply calls destroy_vm() inside lock_config() which
did yet another check_lock() without any way to avoid that.
Added the $skiplock parameter to destroy_vm() and removed
the conditional check in vm_destroy() as both happened after
locking the config.
This add support for net trunks vlan filtering
for ovs and linux vlan-aware bridge
Can be mixed with current "tag" option
examples:
----------
allow only 802.1Q packets with vlanid 2,3,4 :
netx: .....,trunks=2,3,4
allow only 802.1Q packets with vlanid 2,3,4 and tag non-802.1Q packets to vlanid 5 :
netx: tag=5,trunks=2,3,4
tag non-802.1Q packets to vlanid 5
netx: tag=5
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
x-vga vfio-pci flag is to enable seabios quirks only.
This patch keep using x-vga=on from proxmox config, to disable hyperv,kvm=off,vga=none by default
but don't pass x-vga to vfio-pci when ovmf is enabled.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
On some storages BLKZEROOUT commands do not work properly
and return without error while having no effect whatsoever.
This can produce various filesystem errors and thus needs
to be made optional.
A drive can now have 'detect_zeroes=off' to disable this
behavior. By default the behavior is the same as before:
always-on (and set to 'unmap' if discard is enabled).
get_used_paths returned a hash of used paths for all the
volumes in a VM's config, which is not enough to figure out
whether there are snapshots, as snapshots often have
different paths. Eg. on ZFS it is not enough to check for
/dev/zvol/tank/vm-123-disk-1 because the snapshot's path is
/dev/zvol/tank/vm-123-disk-1@snap1 and thus we allowed
deleting the drive. Then when trying to delete the snapshot
later you get:
zfs error: cannot open 'tank/vm-751-disk-1': dataset does not exist
and it refuses to delete the snapshot.
Since its only use was to check whether or not a drive is
still in use it is now renamed to is_volume_in_use and
beside checking paths now also checks volume-ids as those
should stay the same.
use it for nic hotplug, because pve-bridge script will
not work after a live migration, because of the PVE_MIGRATED_FROM env var.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Users have reported resume bug when HA is used.
They seem to have a little race (bench show >0s < 1s) between the vm conf file move on source node and replication to,
and resume on target node.
I don't known why this is only with HA, maybe this occur will standard migration too.
Anyway, we don't need to read the vm config file to resume the vm on target host,
as we are sure that the vm is migrated, and config file move action is correct in the cluster.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
used to prevent an unintended virtual machine remove operation
v3 changes:
- changed man page message
- removed protection parameter (where not needed)
The -force flag didn't have any effect since the pending
changes didn't carry over the the flag.
Now forced deletes have an exclamation mark prepended to the
option name.
qemu 2.4 feature
changelog: rebase on last git
Note that currently linux guest don't support unplug of dimm when it'sused by kernel memory.
They are some tunning to do with memory zone movable.
http://events.linuxfoundation.org/sites/events/files/lcjp13_chen.pdf
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Changed from old, now missing, subroutine parse_startup() to new
pve_parse_startup_order() in qemu-server and pve-manager
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Currently enforce with cpumodel=host on amd cpus don't work,
because amd cpus have unsupported flags in qemu.
This is a protection, and this is good.
but cpumodel host should be never use by users for production (only for testing).
For production and stability, users need to choose a true cpu model which filter
the supported cpuflags by qemu.
So I think we can remove the enforce for host model as for testing it's ok.
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 0]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 1]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 2]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 3]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 4]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 5]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 6]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 7]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 8]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 9]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 12]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 13]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 14]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 15]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 16]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 17]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 23]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 24]
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
rdtscp is not supported by qemu and with enforce it's not starting
warning: host doesn't support requested feature: CPUID.80000001H:EDX.rdtscp [bit 27]
from to qemu wiki
http://wiki.qemu.org/Features/CPUModels#Disabling_features_that_were_always_disabled_on_KVM
"Fact: currently libvirt runs CPU models having rdtscp without the "enforce" flag, and rdtscp is silently disabled
Consequence: libvirt SHOULD use something like "-cpu Opteron_G5,-rdtscp",
especially when it starts using (or emulating) enforce mode
This will require a solution on libvirt side. QEMU will just provide the mechanisms to report CPU model information
and check what the host and QEMU supports, but the decision to disable rdtscp to be able
to run Opteron_G[2345] needs to be taken by libvirt."
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Only non-cdrom drives default to cache=none, so the check
for whether to default to aio=native needs to take the same
condition into account.
I combined them close together to make their relation more
visible.
drive-mirror is doing lseek on source image before starting, and this can take a lot of time for big nfs volume
during this time, qmp socket is hanging
http://lists.nongnu.org/archive/html/qemu-devel/2015-05/msg01838.html
so we need to setup a big timeout
qemu devs are currently working to fix this
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Since qemu 2.2, a new "ready" flag has been added to blockjob
http://git.qemu.org/?p=qemu.git;a=commit;h=ef6dbf1e46ebd1d41ab669df5bba0bbdec6bd374
to known if we can complete it.
we can't use len==offset to known if all block are mirrored, because behaviour will change soon in qemu 2.3
http://git.qemu.org/?p=qemu.git;a=commit;h=b21c76529d55bf7bb02ac736b312f5f8bf033ea2
"block/mirror: Improve progress report
Instead of taking the total length of the block device as the block
job's length, use the number of dirty sectors. The progress is now the
number of sectors mirrored to the target block device. Note that this
may result in the job's length increasing during operation, which is
however in fact desirable.
"
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
It is better to check if a VM is running in QemuServer then in Storage.
for the Storage there is no difference if it is running or not.
Signed-off-by: Wolfgang Link <w.link@proxmox.com>
Currently qemu auto fallback to aio=threads if cache=none|directsync
It's better to handle that correctly
see:
https://bugzilla.redhat.com/show_bug.cgi?id=1086704http://wiki.qemu.org/ChangeLog/2.3
Future incompatible changes:
Block device parameter aio=native has no effect without cache.direct=on. It will be made an error.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
we need to remove scsi controller, because live migration will crash,
as on migration target node, we'll start the vm without controller if no disk exist
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
pci bridge are not hot-unplugglable,
which can give us live migration problem,
if we hot-unplug a device on pcibridge 1 or 2, we don't create the pci bridge on target guest
and pci bridge hotplug is not working on all os (windows for example).
So it's better to always add them at startup.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
It wasn't working with 2.6.32,
now that 3.10 kernel is the default, we can enable it.
It's help to be sure that all cpu flags are supported by host && qemu,
to be sure that nothing break
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Paravirtualized End-of-Interrupt Indication (PV-EOI)
Hosts and guests require two VM exits (context switches from a VM to a Hypervisor) for each interrupt:
one to inject the interrupt, and another to signal the end of the interrupt.
With pv_eoi , they can negotiate a paravirtualized end-of-interrupt feature and only require one switch per interrupt.
Number of exits is reduced by half for interrupt-intensive workloads,
such as incoming network traffic with a virtio network device.
This leads to significant reduction in host CPU utilization for such workloads.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
This sub compare current machine type to a specific version,
and return 1 if machinetype is bigger or equal to version
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
we always need to enable pooling interval, because it doesn't seem to be setup with -machine option
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
this will check, if it is possibel to rollback a snapshot befor VM will shutdown and get locked.
Signed-off-by: Wolfgang Link <w.link@proxmox.com>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
This patch allow to hotplug memory dimm modules
though a new option : dimm_memory
The dimm modules are generated from a map
dimmid size dimm_memory
dimm0 512 512 100.00 0
dimm1 512 1024 50.00 1
dimm2 512 1536 33.33 2
dimm3 512 2048 25.00 3
dimm4 512 2560 20.00 0
dimm5 512 3072 16.67 1
dimm6 512 3584 14.29 2
dimm7 512 4096 12.50 3
dimm8 512 4608 11.11 0
dimm9 512 5120 10.00 1
dimm10 512 5632 9.09 2
dimm11 512 6144 8.33 3
dimm12 512 6656 7.69 0
dimm13 512 7168 7.14 1
dimm14 512 7680 6.67 2
dimm15 512 8192 6.25 3
dimm16 512 8704 5.88 0
dimm17 512 9216 5.56 1
dimm18 512 9728 5.26 2
dimm19 512 10240 5.00 3
dimm20 512 10752 4.76 0
...
dimm241 65536 3260416 2.01 1
dimm242 65536 3325952 1.97 2
dimm243 65536 3391488 1.93 3
dimm244 65536 3457024 1.90 0
dimm245 65536 3522560 1.86 1
dimm246 65536 3588096 1.83 2
dimm247 65536 3653632 1.79 3
dimm248 65536 3719168 1.76 0
dimm249 65536 3784704 1.73 1
dimm250 65536 3850240 1.70 2
dimm251 65536 3915776 1.67 3
dimm252 65536 3981312 1.65 0
dimm253 65536 4046848 1.62 1
dimm254 65536 4112384 1.59 2
dimm255 65536 4177920 1.57 3
max dimm_memory size is 4TB, which is the current qemu limit
If the dimm_memory value is not aligned on memory module, we align the dimm_memory on the next module.
vmid.conf
---------
memory: 1024
numa:1
hotplug: memmory
when hotplug memory option is enabled, the minimum memory value must be 1GB, and also numa need to be enabled.
we assign the first 1GB as static memory, splitted on each numa nodes.
The remaining memory is assigned on hotpluggable dimm devices.
The static memory need to be also 128MB aligned, to have other dimm devices aligned too.
This 128MB alignment is a linux limitation, windows can align on 2MB size.
Numa need to be aligned, as linux guest don't boot on some setup with multi sockets,
and windows need numa to be able to hotplug memory
hotplug
----
qm set <vmid> -memory X (where X is bigger than current value)
unplug (not yet implemented in qemu)
------
qm set <vmid> -memory X (where X is lower than current value)
linux guest
-----------
-acpi hotplug module should be loaded in guest
-need a recent kernel. (tested with 3.10)
can be enable automaticaly, adding:
/lib/udev/rules.d/80-hotplug-cpu-mem.rules
SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", \
ATTR{online}="1"
SUBSYSTEM=="memory", ACTION=="add", TEST=="state", ATTR{state}=="offline", \
ATTR{state}="online"
windows guest
-------------
tested with:
- windows 2012 standard
- windows 2008 enterprise/datacenter
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
vcpus = current allocate vpus to virtual machine
maxcpus is now compute from $sockets*cores
vcpus = maxcpus if not defined
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Original patch by Wolfgang, adopted for new hotplug implementation.
I do not verify link status, because that patch was rejected upstream.
Signed-off-by: Wolfgang Link <wolfgang@linksystems.org>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>