This patch includes all necessary properties for replication.
It also enables or disables a replication job
when the corresponding flags are set or deleted.
this (correctly!) errored out with Qemu 2.9 when live-migrating
local disks, because the NBD server blocks the VM from being
resumed. This was probably missed when migrating via unix domain
sockets was originally introduced.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
This was checking for scsihw being set in both branches
whereas lsi is also the default. Added the missing 'not'.
Fixes a bug where a VM with a disk with a scsi index >= 7
refused to start due to an invalid scsi id.
Reported-by: Friedrich Ramberger <f.ramberger@proxmox.com>
since it can cause I/O errors and data corruption in low-memory
or highly fragmented memory situations since Qemu 2.7,
use scsi-hd by default instead
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
if qga is enabled, we try to freeze the filesystem before cancelling the block job.
if not, we pause the vm.
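A minimal sketch of that flow, assuming the usual qemu-server monitor helper (illustrative only, not the exact patch):
# freeze the guest filesystem via the guest agent if available,
# otherwise pause the VM before cancelling the block job
if ($conf->{agent}) {
    eval { vm_mon_cmd($vmid, "guest-fsfreeze-freeze"); };
    warn $@ if $@;
} else {
    eval { vm_mon_cmd($vmid, "stop"); };  # pause the VM
    warn $@ if $@;
}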
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
This will create a new drive for each local drive found,
and start the vm with these new drives.
if targetstorage == 1, we use the same storage id as the original vm disk.
an nbd server is started in qemu and exposes the local volumes on a network port
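The mirror target is then addressed through qemu's NBD syntax; purely as an illustration (host, port and drive name are placeholders), the target uri looks roughly like:
nbd:10.0.0.2:60001:exportname=drive-scsi0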
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
we can use multiple drive_mirror jobs in parallel.
block-job-complete can be skipped if we want to add more mirror jobs later.
also add support for nbd uris in qemu_drive_mirror
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
otherwise we end up with undeletable VM configs in case
vdisk_free fails (which could happen because of cluster-wide
lock contention, storage problems, ..).
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
When trying to migrate a VM from a node with qemu server <= 4.0-92 to
a node with qemu server >= 4.0-93 we failed as the remote qemu-server
got no explicit 'migration_type' from the older qemu server on the
source.
Check if migration_type is defined on an incoming migration start and,
if not, set it.
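Roughly along these lines (illustrative sketch only; variable names are placeholders, the real patch derives the default from the datacenter configuration):
if (!defined($migration_type)) {
    $migration_type = $datacenter_cfg->{migration_unsecure} ? 'insecure' : 'secure';
}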
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
All special flags for Windows 8 and Windows 2012 (win8 type)
are kept the same, since we set flags based on checking whether
the number captured by /^win(\d+)$/ is greater than 6 or 7
like for snapshot, we need to check if krbd is enabled, to know
whether we need to use qmp delete-drive-snapshot or the storage command directly
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
This cleans up windows guest os version handling
by normalizing ostype to a number in a new windows_version sub.
if($ostype eq 'wxp' || $ostype eq 'w2k3' || $ostype eq 'w2k') {
$winversion = 5;
} elsif($ostype eq 'w2k8' || $ostype eq 'wvista') {
$winversion = 6;
} elsif ($ostype =~ m/^win(\d+)$/) {
$winversion = $1;
}
so we can simply test the windows version with lower/greater comparisons
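For example, a check can now be written as a simple numeric comparison (illustrative only, not part of the patch):
if ($winversion >= 6) { ... }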
Hyper-V enlightenment configuration is centralized
in a new add_hyperv_enlighments sub.
Also disable hyperv with win < 8 + ovmf.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Without this patch we use the network where the cluster traffic runs
for sending migration traffic. This is not ideal as it may hinder
cluster traffic. Furthermore, some users have a powerful network which
would be perfect for migrations; with this patch they can run the
migration traffic over such a network without having the corosync
traffic on the same network.
The network is configurable through /etc/pve/datacenter.cfg which
got a new property, namely migration. migration has two
subproperties: type (replaces the old migration_unsecure property)
and network.
For the case of a network failure, or when a VM has to be moved over
another network for arbitrary other reasons, I added the
migration_type and migration_network parameters to qm migrate (and
respectively vm_start, as this gets used on migration).
They allow overriding the datacenter.cfg settings.
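Purely as an illustration (network, node name and VMID are placeholders), the new setting and a per-migration override could look like:
/etc/pve/datacenter.cfg:
migration: type=secure,network=10.1.2.0/24
command line override:
qm migrate 100 targetnode --online --migration_type insecure --migration_network 10.2.3.0/24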
Fixes bug #1177
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Let 'cdrom' use the pve-qm-ide format, as it's supposed to
be an alias to ide2.
We're not using the 'alias' schema property since the qemu
configs still use a custom parser (due to the
pending-changes system and the filename-to-volume-id
conversion for legacy support) which does not deal with
schema aliases.
when restoring into an existing VM, we don't want to die
half-way through because we can't delete one of the existing
volumes. instead, warn about the deletion failure, but
continue anyway. the disk that could not be deleted is then
added as unused automatically.
this adds a bootsplash image in /usr/share/qemu-server
and, if this file exists, uses it for seabios
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
if efidisk0 is defined, use it as an efivars disk
to permanently store efivars (such as boot options).
we check if the files exist, and act accordingly
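For illustration only (storage, volume name and size are placeholders, not prescribed values):
efidisk0: local:100/vm-100-disk-1.qcow2,size=128K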
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
just a simple disk format (only size, format and volid) for the
efivars disk.
also do not add it to the command line in foreach_drive
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
drive-mirror is not working with qemu 2.6 when iothread is enabled.
with virtio-blk: the mirror works, but block-job-complete crashes the vm.
with virtio-scsi: the mirror hangs at start.
This should be fixed in qemu 2.7
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
we have a few problems with hotplug at the moment:
qemu may add usb hubs when adding usb devices but fails to remove them
when removing the usb device (this is a qemu bug).
also, when starting a guest with a usb device we add ehci and uhci
controllers, which we cannot hot-unplug.
with those devices it is impossible to live-migrate the guest
to another host, meaning even if you remove all usb devices,
the migration fails.
so we deactivate usb hotplugging for now
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
this patch introduces working usb hotplugging
you can now add a usb device while a vm is running
this does not work with spice at the moment, only
with usb passthrough
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
since usb devices do not have their own
"query" command in qmp, we have to use
qom-list /machines/peripheral
which essentially gets a list of the vm's peripheral
devices, from which we pick only the usb devices
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
vm configuration
----------------
hugepages: (any|2|1024)
any: we'll try to allocate 1GB hugepages if possible; if not, we use 2MB hugepages
2: we want to use 2MB hugepages
1024: we want to use 1GB hugepages (memory needs to be a multiple of 1GB in this case)
optional host configuration for 1GB hugepages
----------------------------------------------
1GB hugepages can be allocated at boot if the user wants it.
hugepages need to be contiguous, so sometimes it's not possible to reserve them on the fly
/etc/default/grub : GRUB_CMDLINE_LINUX_DEFAULT="quiet hugepagesz=1G hugepages=x"
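Example guest configuration (illustrative values only):
memory: 4096
hugepages: 1024
(with hugepages: 1024 the memory value has to be a multiple of 1024)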
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
add them by default for qemu 2.6
(support is already present in qemu 2.5, but we don't want to break live migration for currently running vms)
vpindex && runtime need host kernel 4.4
These 3 enlightenments are needed by windows to use vmbus
http://searchwindowsserver.techtarget.com/definition/Microsoft-Virtual-Machine-Bus-VMBus
details :
- When Hyper-V "vpindex" is on, the guest can use MSR HV_X64_MSR_VP_INDEX
to get the virtual processor ID.
- The Hyper-V "runtime" enlightenment feature allows using the MSR
HV_X64_MSR_VP_RUNTIME to get the time the virtual processor consumes
running guest code, as well as the time the hypervisor spends running
code on behalf of that guest.
- Hyper-V "reset" allows the guest to reset the VM.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
We cannot guarantee when the SSH forward tunnel really becomes
ready. The check with the mtunnel API call did not help with this
problem, as it only checked that the SSH connection itself works and
that the destination node has quorum, but the forwarded tunnel itself
was not checked.
The forward tunnel is a different channel in the SSH connection,
independent of the SSH `qm mtunnel` channel, so even if that works
it does not guarantee that our migration tunnel is up and ready.
When the node(s) were under load, or when we did parallel
migrations (migrateall), the migrate command was often started
before a tunnel was open and ready to receive data. This led to
a direct abort of the migration and is the main reason why
parallel migrations often leave two thirds or more of the VMs on the
source node.
The issue was tracked down to SSH after debugging the QEMU
process; enabling debug logging showed that the tunnel often became
available and ready too late, or not at all.
Fixing the TCP forward tunnel is quirky and not straightforward; the
only possibility SSH offers is to use -N (no command),
-f (background) and -o "ExitOnForwardFailure=yes", in which case it
waits in the foreground until the tunnel is ready and only then
backgrounds itself. This is not quite the nicest way for our special
use case and our code base.
Waiting for the local port to become open and ready (through
/proc/net/tcp[6]) as a proof of concept is not enough either; even if
the port is in the listening state and should theoretically accept
connections, this still failed often as the tunnel was not yet fully
ready.
Furthermore, another problem would still be open if we tried to patch
the SSH forward method we currently use - which we solve for free with
the approach of this patch - namely that the method
to get an available port (next_migration_port) has a serious race
condition which could lead to multiple uses of the same port on a
parallel migration (I observed this in my many tests; seldom, but when
it happens it's really bad).
So let's now use UNIX sockets, which ssh supports since version 5.7.
The endpoints are UNIX sockets bound to the VMID - thus no port, so
no race, and also no limitation of available ports (we reserved 50 for
migration).
The endpoints get created in /run/qemu-server/VMID.migrate and, as
KVM/QEMU in current versions is able to use UNIX sockets just as well
as TCP, we do not have to change much in the interaction with QEMU.
QEMU is started with the migrate_incoming url at the local
destination endpoint and creates the socket file; we then create
a listening socket on the source side and connect over SSH to the
destination.
Now the migration can be started by issuing the migrate qmp command
with an updated uri.
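Sketched for VMID 100 (simplified illustration; paths and target name are placeholders):
destination: kvm ... -incoming unix:/run/qemu-server/100.migrate
source:      ssh -o ExitOnForwardFailure=yes -L /run/qemu-server/100.migrate:/run/qemu-server/100.migrate root@target
source qmp:  migrate unix:/run/qemu-server/100.migrate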
This breaks live migration from new to old, but *not* from old to
new, so there is an upgrade path.
If a live migration from new to old must be made (for whatever
reason), use the unsecure_migration setting (man datacenter.conf)
to allow this, although that should only be done in a trusted network.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
With systemd-run qemu's --daemonize forks often happen
before systemd finishes setting up the scopes, which means
the limits we apply often don't work.
We now use enter_systemd_scope() to create the scope before
running qemu directly without systemd-run.
Note that vm_start() runs in a forked-worker or qm cli
command, so entering the scope in such a process should not
affect the rest of the pve daemon.
if we got an option which was not valid, we still
wrote it to the config, and subsequently returned
it on every api call.
instead, we now die instead of warn and do not accept
invalid options
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
otherwise, long kvm commands lead to systemd unit files with
very long lines, which confuses the systemd unit file parser.
apparently systemd has a length limit for unit file lines and
(line-)breaks the description string at that point. since
the rest of the description is probably not a valid key/value
pair, this leads to warnings. the default semantics of systemd-run
is to use the executed command as description unless a description
is specified explicitly.
note that this behaviour of systemd could allow an attacker
with access to the VM configuration to craft a kvm commandline
that starts or stops arbitrary systemd units.
previously, we did not check the file parameter of a disk,
allowing passthrough of a block device (by design).
with the change to the json parser for the disks, the format
became 'pve-volume-id', which is only valid for our volume ids
(and later we also allowed the value 'none').
this patch alternatively checks if the parameter is a path
or 'cdrom'
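A rough sketch of the accepted values (illustrative only, not the exact schema code; is_valid_volume_id is a hypothetical helper standing in for the pve-volume-id format check):
my $ok = $file eq 'none' || $file eq 'cdrom'
    || $file =~ m|^/|              # block device or absolute path passthrough
    || is_valid_volume_id($file);  # hypothetical helper for the pve-volume-id format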
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Otherwise some move operations will fail to delete the old
disk (eg. when moving from ceph to local storage).
Note that in order for the deactivation to succeed we need
to make sure qemu has closed its file descriptors, so we
need to wait for the job to disappear the same way we do in
$cancel_job().
Factored the waiting out into $finish_job().
Additionally, since the cpu and host node list isn't
restricted to a single range, one can now provide multiple
ranges separated by semicolons. (eg. cpus=0-3;5;7)
The urlencoded format currently cannot check the real
decoded length, so we limit to an upper bound and document
the real limits. Ideally we'd introduce a decodedLength
schema parameter at some point...
/cirrur/cirrus/
/devive/device/
/Numa/NUMA/
and a few grammar fixes and rewrites of sentences.
Also, since we are already touching those lines, let's break them up
from one-liners to a column limit of ~80.
When using OVS, tap_plug() resets rate limiting, so we need
to pass it along to reapply it.
The rate on its own can still be hot-plugged with the
regular tap_rate_limit() call.
Drop snapshot_create, snapshot_delete and snapshot_rollback
in favour of PVE::AbstractConfig. Qemu-specific parts are
implemented in __snapshot_XX methods in PVE::QemuConfig.
has_feature is made an implementation of the abstract
has_feature, and thus moves to PVE::QemuConfig.
Note: a new hook method needed to be introduced to be called
before creating a volume snapshot, after creating a volume
snapshot, and after unfreezing the guestfs after creating a
volume snapshot. The base method in PVE::AbstractConfig is a
noop, the implementation in PVE::QemuConfig runs the necessary
Qemu monitor commands.
Drop load_config, write_config, lock_config[_xx],
check_lock, check_protection, is_template and config_file
in favour of implementations in PVE::AbstractConfig.
Implement guest_type, __config_max_unused_disks,
config_file_lock and cfs_config_path from
PVE::AbstractConfig in PVE::QemuConfig.
Previously, foreach_drive iterated over all configuration
keys (in a random order) and checked whether the current key
is a valid drive name. Instead, we now iterate over a list
of valid drive names (with deterministic order) and check
whether a drive with such a name exists in the
configuration.
Also rename the two involved methods from valid_drive_name
to is_valid_drive_name (for the check) and from disknames
to valid_drive_names (for the list of valid keys), for
consistency. These two were only used in the qemu-server
code base.
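A sketch of the new deterministic iteration, using the renamed helpers (illustrative only, not the exact patch):
foreach my $ds (valid_drive_names()) {
    next if !defined($conf->{$ds});
    my $drive = parse_drive($ds, $conf->{$ds});
    # ... invoke the callback for this drive ...
}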
We hold a lock from snapshot_prepare until snapshot_commit,
so there is no need to copy back the snapshot config to the
actual config. This allows to drop a workaround for not
copying the 'machine' type config option.
We don't have any storage types other than LVM which react
to scsi inquiry, and we don't want to treat LVM as a scsi
device, so now we only query devices added as actual /dev
path. This was originally intended to be a pass-through
feature anyway, so this makes sense.
As the singleton function "kvm_user_version" may not have been
called there, and with the machine alias q35 the regex from the
qemu_machine_feature_enabled method does not match, we
need a valid kvm version here
instead of hardcoding the storage types for writing zeros on a
backup restore, we use volume_has_feature with 'sparseinit'
to determine if we can omit writing zeros
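Roughly like this (illustrative sketch, not the exact patch):
my $write_zeros = 1;
if (PVE::Storage::volume_has_feature($storecfg, 'sparseinit', $volid)) {
    $write_zeros = 0;  # volume is known to be sparse/zero-initialized
}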
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Currently migration is broken, because qemu_machine_pxe returns nothing if no pxe rom exists.
That means we don't pass the -machine flag on migration, and migration is broken between qemu 2.4->2.5.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
qemu 2.5 supports a new hyper-v feature: hv_vendor_id.
This allows nvidia drivers to install on windows with hyper-v features on.
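For illustration, the flag ends up on the -cpu line roughly like this (the vendor id value shown is a placeholder):
-cpu host,hv_vendor_id=proxmox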
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
currently we leave orphaned vmstate files when we restore a
backup over a vm which has snapshots with saved ram state.
this patch deletes those files on restore.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Since write_config was always called with skiplock=1 except
once, it makes sense to drop this parameter like in
PVE::LXC::write_config . If needed in the future, the
caller can use check_lock before write_config anyway.
The method update_config wrapped update_config_nolock
using lock_config, but to prevent update races the whole
"read config", "do something", "write config" flow was
always protected by lock_config anyway, and update_config
was never called.
Thus, we can safely drop update_config and rename
update_config_nolock to write_config like in PVE::LXC .
since we want the usb3 option to be really boolean and not only
'usb3=yes', we have to change the usb json format a little.
to not break existing configs for 'usbX: spice', we set the 'host'
option as non-optional and default_key and allow 'spice' as its
content (this also makes the option less ambiguous).
another side effect is that the previously accepted multiple 'host='
entries are now forbidden
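For illustration (the device id is a placeholder), the format now looks like:
usb0: host=046d:c52b,usb3=1
usb1: spice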
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
adding a flag for usb devices (usb3); if this is set to yes,
add an xhci controller and attach the specified devices to it
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
The API passes $skiplock to vm_destroy(), which performed a
check conditionally depending on the $skiplock parameter and
then simply called destroy_vm() inside lock_config(), which
did yet another check_lock() without any way to avoid that.
Added the $skiplock parameter to destroy_vm() and removed
the conditional check in vm_destroy() as both happened after
locking the config.
This adds support for vlan trunk filtering on net interfaces,
for ovs and the linux vlan-aware bridge.
It can be mixed with the current "tag" option.
examples:
----------
allow only 802.1Q packets with vlanid 2,3,4 :
netx: .....,trunks=2,3,4
allow only 802.1Q packets with vlanid 2,3,4 and tag non-802.1Q packets to vlanid 5 :
netx: tag=5,trunks=2,3,4
tag non-802.1Q packets to vlanid 5
netx: tag=5
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
The x-vga vfio-pci flag is only there to enable seabios quirks.
This patch keeps using x-vga=on from the proxmox config to disable hyperv and set kvm=off and vga=none by default,
but does not pass x-vga to vfio-pci when ovmf is enabled.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
On some storages BLKZEROOUT commands do not work properly
and return without error while having no effect whatsoever.
This can produce various filesystem errors and thus needs
to be made optional.
A drive can now have 'detect_zeroes=off' to disable this
behavior. By default the behavior is the same as before:
always-on (and set to 'unmap' if discard is enabled).
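For illustration (storage and volume name are placeholders):
scsi0: local:vm-100-disk-1,detect_zeroes=off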
get_used_paths returned a hash of used paths for all the
volumes in a VM's config, which is not enough to figure out
whether there are snapshots, as snapshots often have
different paths. Eg. on ZFS it is not enough to check for
/dev/zvol/tank/vm-123-disk-1 because the snapshot's path is
/dev/zvol/tank/vm-123-disk-1@snap1 and thus we allowed
deleting the drive. Then when trying to delete the snapshot
later you get:
zfs error: cannot open 'tank/vm-751-disk-1': dataset does not exist
and it refuses to delete the snapshot.
Since its only use was to check whether or not a drive is
still in use, it is now renamed to is_volume_in_use and,
besides checking paths, now also checks volume ids as those
should stay the same.
use it for nic hotplug, because the pve-bridge script will
not work after a live migration due to the PVE_MIGRATED_FROM env var.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Users have reported a resume bug when HA is used.
There seems to be a small race (benchmarks show >0s, <1s) between the vm conf file move on the source node (and its replication)
and the resume on the target node.
I don't know why this happens only with HA; maybe it occurs with standard migration too.
Anyway, we don't need to read the vm config file to resume the vm on the target host,
as we are sure that the vm is migrated and the config file move is handled correctly in the cluster.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
used to prevent an unintended virtual machine remove operation
v3 changes:
- changed man page message
- removed protection parameter (where not needed)
The -force flag didn't have any effect since the pending
changes didn't carry over the flag.
Now forced deletes have an exclamation mark prepended to the
option name.
qemu 2.4 feature
changelog: rebase on last git
Note that currently linux guests don't support unplugging a dimm when it's used by kernel memory.
There is some tuning to do with the movable memory zone.
http://events.linuxfoundation.org/sites/events/files/lcjp13_chen.pdf
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Changed from old, now missing, subroutine parse_startup() to new
pve_parse_startup_order() in qemu-server and pve-manager
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Currently enforce with cpumodel=host on amd cpus doesn't work,
because amd cpus have flags that are unsupported by qemu.
This is a protection, and this is good,
but the host cpu model should never be used by users in production (only for testing).
For production and stability, users need to choose a real cpu model which filters
the cpu flags to those supported by qemu.
So I think we can remove enforce for the host model, as for testing it's ok.
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 0]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 1]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 2]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 3]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 4]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 5]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 6]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 7]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 8]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 9]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 12]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 13]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 14]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 15]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 16]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 17]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 23]
warning: host doesn't support requested feature: CPUID.80000001H:EDX [bit 24]
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
rdtscp is not supported by qemu, and with enforce the vm does not start:
warning: host doesn't support requested feature: CPUID.80000001H:EDX.rdtscp [bit 27]
from the qemu wiki
http://wiki.qemu.org/Features/CPUModels#Disabling_features_that_were_always_disabled_on_KVM
"Fact: currently libvirt runs CPU models having rdtscp without the "enforce" flag, and rdtscp is silently disabled
Consequence: libvirt SHOULD use something like "-cpu Opteron_G5,-rdtscp",
especially when it starts using (or emulating) enforce mode
This will require a solution on libvirt side. QEMU will just provide the mechanisms to report CPU model information
and check what the host and QEMU supports, but the decision to disable rdtscp to be able
to run Opteron_G[2345] needs to be taken by libvirt."
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Only non-cdrom drives default to cache=none, so the check
for whether to default to aio=native needs to take the same
condition into account.
I combined them close together to make their relation more
visible.
drive-mirror does an lseek on the source image before starting, and this can take a lot of time for big nfs volumes.
during this time, the qmp socket is hanging.
http://lists.nongnu.org/archive/html/qemu-devel/2015-05/msg01838.html
so we need to set up a big timeout.
qemu devs are currently working to fix this
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Since qemu 2.2, a new "ready" flag has been added to blockjob
http://git.qemu.org/?p=qemu.git;a=commit;h=ef6dbf1e46ebd1d41ab669df5bba0bbdec6bd374
to know if we can complete it.
we can't use len==offset to know if all blocks are mirrored, because the behaviour will change soon in qemu 2.3
http://git.qemu.org/?p=qemu.git;a=commit;h=b21c76529d55bf7bb02ac736b312f5f8bf033ea2
"block/mirror: Improve progress report
Instead of taking the total length of the block device as the block
job's length, use the number of dirty sectors. The progress is now the
number of sectors mirrored to the target block device. Note that this
may result in the job's length increasing during operation, which is
however in fact desirable.
"
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
It is better to check whether a VM is running in QemuServer than in Storage.
For Storage there is no difference whether it is running or not.
Signed-off-by: Wolfgang Link <w.link@proxmox.com>
Currently qemu automatically falls back to aio=threads if cache=none|directsync.
It's better to handle that correctly.
see:
https://bugzilla.redhat.com/show_bug.cgi?id=1086704
http://wiki.qemu.org/ChangeLog/2.3
Future incompatible changes:
Block device parameter aio=native has no effect without cache.direct=on. It will be made an error.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
we need to remove the scsi controller, because live migration will crash,
as on the migration target node we'll start the vm without the controller if no disk exists
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
pci bridges are not hot-unpluggable,
which can give us live migration problems:
if we hot-unplug a device on pci bridge 1 or 2, we don't create that pci bridge on the target guest,
and pci bridge hotplug does not work on all os (windows for example).
So it's better to always add them at startup.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
It wasn't working with 2.6.32; now that the 3.10 kernel is the default, we can enable it.
It helps to make sure that all cpu flags are supported by the host && qemu,
so that nothing breaks.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Paravirtualized End-of-Interrupt Indication (PV-EOI)
Hosts and guests require two VM exits (context switches from a VM to a hypervisor) for each interrupt:
one to inject the interrupt, and another to signal the end of the interrupt.
With pv_eoi, they can negotiate a paravirtualized end-of-interrupt feature and only require one switch per interrupt.
The number of exits is reduced by half for interrupt-intensive workloads,
such as incoming network traffic with a virtio network device.
This leads to a significant reduction in host CPU utilization for such workloads.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
This sub compares the current machine type to a specific version,
and returns 1 if the machine type is greater than or equal to that version.
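Illustrative use, assuming the qemu_machine_feature_enabled helper mentioned elsewhere in this series (exact name and signature may differ):
if (qemu_machine_feature_enabled($machine_type, $kvmver, 2, 3)) {
    # machine type (or running qemu) is >= 2.3
}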
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
we always need to enable the polling interval, because it doesn't seem to be set up with the -machine option
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
this will check if it is possible to roll back a snapshot before the VM is shut down and locked.
Signed-off-by: Wolfgang Link <w.link@proxmox.com>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
This patch allows hotplugging memory dimm modules
through a new option: dimm_memory
The dimm modules are generated from a map:
dimmid size dimm_memory
dimm0 512 512 100.00 0
dimm1 512 1024 50.00 1
dimm2 512 1536 33.33 2
dimm3 512 2048 25.00 3
dimm4 512 2560 20.00 0
dimm5 512 3072 16.67 1
dimm6 512 3584 14.29 2
dimm7 512 4096 12.50 3
dimm8 512 4608 11.11 0
dimm9 512 5120 10.00 1
dimm10 512 5632 9.09 2
dimm11 512 6144 8.33 3
dimm12 512 6656 7.69 0
dimm13 512 7168 7.14 1
dimm14 512 7680 6.67 2
dimm15 512 8192 6.25 3
dimm16 512 8704 5.88 0
dimm17 512 9216 5.56 1
dimm18 512 9728 5.26 2
dimm19 512 10240 5.00 3
dimm20 512 10752 4.76 0
...
dimm241 65536 3260416 2.01 1
dimm242 65536 3325952 1.97 2
dimm243 65536 3391488 1.93 3
dimm244 65536 3457024 1.90 0
dimm245 65536 3522560 1.86 1
dimm246 65536 3588096 1.83 2
dimm247 65536 3653632 1.79 3
dimm248 65536 3719168 1.76 0
dimm249 65536 3784704 1.73 1
dimm250 65536 3850240 1.70 2
dimm251 65536 3915776 1.67 3
dimm252 65536 3981312 1.65 0
dimm253 65536 4046848 1.62 1
dimm254 65536 4112384 1.59 2
dimm255 65536 4177920 1.57 3
the max dimm_memory size is 4TB, which is the current qemu limit.
If the dimm_memory value is not aligned on a memory module, we align dimm_memory to the next module.
vmid.conf
---------
memory: 1024
numa: 1
hotplug: memory
when the memory hotplug option is enabled, the minimum memory value must be 1GB, and numa also needs to be enabled.
we assign the first 1GB as static memory, split across the numa nodes.
The remaining memory is assigned to hotpluggable dimm devices.
The static memory also needs to be 128MB-aligned, to have the other dimm devices aligned too.
This 128MB alignment is a linux limitation; windows can align on a 2MB size.
Numa needs to be aligned, as linux guests don't boot on some setups with multiple sockets,
and windows needs numa to be able to hotplug memory
hotplug
----
qm set <vmid> -memory X (where X is bigger than current value)
unplug (not yet implemented in qemu)
------
qm set <vmid> -memory X (where X is lower than current value)
linux guest
-----------
-acpi hotplug module should be loaded in the guest
-need a recent kernel (tested with 3.10)
it can be enabled automatically by adding:
/lib/udev/rules.d/80-hotplug-cpu-mem.rules
SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", \
ATTR{online}="1"
SUBSYSTEM=="memory", ACTION=="add", TEST=="state", ATTR{state}=="offline", \
ATTR{state}="online"
windows guest
-------------
tested with:
- windows 2012 standard
- windows 2008 enterprise/datacenter
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
vcpus = currently allocated vcpus for the virtual machine
maxcpus is now computed from $sockets*$cores
vcpus = maxcpus if not defined
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Original patch by Wolfgang, adapted for the new hotplug implementation.
I do not verify link status, because that patch was rejected upstream.
Signed-off-by: Wolfgang Link <wolfgang@linksystems.org>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
commit 1c0c1c17b0
Author: Wolfgang Link <wolfgang@linksystems.org>
Date: Wed Nov 26 11:11:40 2014 +0100
shutdown by Qemu Guest Agent if the agent flag in the config is set
Important: "guest-shutdown" returns only by error a message.
Signed-off-by: Wolfgang Link <wolfgang@linksystems.org>
breaks live migration as it always tries to load the vm config - even in case of $nocheck. Also it loads the config twice ($conf && $config).
Signed-off-by: Stefan Priebe <s.priebe@profihost.ag>
This enables numa support inside the guest, and shares the memory and cores across the sockets' numa nodes.
numa: 0|1
example:
-------
sockets:2
cores:2
memory:4096
numa: 1
qemu command line
-----------------
-object memory-backend-ram,size=2048,id=ram-node0
-numa node,nodeid=0,cpus=0-1,memdev=ram-node0
-object memory-backend-ram,size=2048,id=ram-node1
-numa node,nodeid=1,cpus=2-3,memdev=ram-node1
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
remove the freezefs flag.
If the Qemu Guest Agent flag is set in the config, the vm filesystem will always be frozen,
unless we save RAM.
also remove the freezefs param in the PVE::API2 snapshot call,
because there is no use for it.
Signed-off-by: Wolfgang Link <wolfgang@linksystems.org>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Even if we check the busy flag, we can sometimes have a race condition if new writes
come in between query-block-job and block-job-complete.
block-job-complete then throws the error "The active block job for device '%(name)' cannot be completed".
we just need to retry in this case.
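A rough retry loop (illustrative sketch only, not the exact patch):
for (my $try = 0; $try < 10; $try++) {
    eval { vm_mon_cmd($vmid, "block-job-complete", device => $device); };
    last if !$@;                               # completed successfully
    die $@ if $@ !~ m/cannot be completed/;    # unrelated error, give up
    sleep 1;                                   # give the job time to settle, then retry
}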
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
block-job-cancel is async; we need to check that the job is really finished
before trying to free the volume
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
new config option:
iothread: 1|0
This enables iothread/dataplane support, to improve io performance on fast storages.
Currently block jobs don't work yet; support is planned for qemu 2.2.
So it's better not to expose this option in the gui yet.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
the machine option is written to the snapshot (ok), but also to the running config (bad)
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
we should push to the $devices array instead of the $cmd array,
because pci bridges need to be created before spice devices
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
a multifunction device should be defined without the .function suffix:
hostpci0: 00:00
example
-------
if 00:00.0
00:00.1
00:00.2
exists,
then we generate the multifunction devices
-device (pci-assign|vfio-pci),host=00:00.0,id=hostpci0.0,bus=...,addr=0x0.0,multifunction=on
-device (pci-assign|vfio-pci),host=00:00.1,id=hostpci0.1,bus=...,addr=0x0.1
-device (pci-assign|vfio-pci),host=00:00.2,id=hostpci0.2,bus=...,addr=0x0.2
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
hostpci0: .....,x-vga=on,pcie=1
x-vga requires kernel 3.10 with vfio-vga support enabled.
if x-vga=on, we force the vfio-pci device.
pcie=1 chooses the pci express bus (needs the q35 machine model)
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
q35 uses the pcie.0 root by default, so currently we can't start the q35 machine model.
we need to add 3 pci bridges, pci.0, pci.1 and pci.2, to handle our devices.
pcie.0 does not support hotplug, so the pci bridges are defined at startup.
I use a pve-q35.cfg (mostly the same as the q35-chipset.cfg from the qemu docs).
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
this adds a new option queue=(\d+) to the net interface.
It allows using more than 1 cpu for the network stream, which can improve network bandwidth
when the vhost-net cpu is the bottleneck
http://www.linux-kvm.org/page/Multiqueue#Enable_MQ_feature
-netdev tap,vhost=on,queues=N -device virtio-net-pci,mq=on,vectors=2N+2
host requirement
----------------
this requires host kernel >= 3.8 (or qemu dies at start)
linux guest requirement
-----------------------
kernel >= 3.8
multiqueue must be enabled manually
windows guest requirement
--------------------------
recent virtio-net driver
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
We simply add option iscsi if we have an initiator name. So we
never add this option multiple times, and it works with hotplug
in case someone plugs an 'iscsi:' drive later.
enable a check whether the host supports all cpu flags configured for the guests.
this avoids some bad setups, like an Opteron vcpu on an intel host for example,
and avoids some bad live migrations
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
This reduces the guest cpu speed if the dirtied bytes are 50% more than the approx. amount of bytes that just got transferred since the last time we were in this routine.
qemu commit :
http://git.qemu.org/?p=qemu.git;a=commit;h=bde1e2ec2176c363c1783bf8887b6b1beb08dfee
tested with "stress -m 2 -c 2" under debian
without autoconvergence : downtime 12s - duration 12min
with autoconvergence : downtime 2s - duration 4min
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
add qxl2 (2 monitors), qxl3 (3 monitors) and qxl4 (4 monitors) vga types.
For linux, we only need 1 qxl card with more memory.
For windows, we need 1 qxl card per monitor.
Original Information from spice-mailing
"
You need to specify multiple devices for Windows VMs. This is what
libvirt gives me (via 'virsh domxml-to-native qemu argv DOMAIN_XML'):
<...> -vga qxl -global qxl-vga.ram_size=67108864 -global qxl-vga.vram_size=33554432 -device qxl,id=video1,ram_size=67108864,vram_size=33554432 -device qxl,id=video2,ram_size=67108864,vram_size=33554432 -device qxl,id=video3,ram_size=67108864,vram_size=33554432
For Linux VM, just one qxl device is OK but then it's advisable to
increase the available RAM:
<...> -vga qxl -global qxl-vga.ram_size=134217728 -global qxl-vga.vram_size=33554432
If you don't turn off surfaces, then you should increase vram size to
say 64 MB from current default of 32 MB.
"
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
This patch adds support for unsecure migration using a direct tcp connection
KVM <=> KVM instead of an extra SSH tunnel. Without ssh, the limit is just the
bandwidth and no longer the CPU / one single core.
You can enable this by adding:
migration_unsecure: 1
to datacenter.cfg
Examples using qemu 1.4 as migration with qemu 1.3 still does not work for me:
current default with SSH Tunnel VM uses 2GB mem:
Dec 27 21:10:32 starting migration of VM 105 to node 'cloud1-1202' (10.255.0.20)
Dec 27 21:10:32 copying disk images
Dec 27 21:10:32 starting VM 105 on remote node 'cloud1-1202'
Dec 27 21:10:35 starting ssh migration tunnel
Dec 27 21:10:36 starting online/live migration on localhost:60000
Dec 27 21:10:36 migrate_set_speed: 8589934592
Dec 27 21:10:36 migrate_set_downtime: 1
Dec 27 21:10:38 migration status: active (transferred 152481002, remaining 1938546688), total 2156396544) , expected downtime 0
Dec 27 21:10:40 migration status: active (transferred 279836995, remaining 1811140608), total 2156396544) , expected downtime 0
Dec 27 21:10:42 migration status: active (transferred 421265271, remaining 1669840896), total 2156396544) , expected downtime 0
Dec 27 21:10:44 migration status: active (transferred 570987974, remaining 1520152576), total 2156396544) , expected downtime 0
Dec 27 21:10:46 migration status: active (transferred 721469404, remaining 1369939968), total 2156396544) , expected downtime 0
Dec 27 21:10:48 migration status: active (transferred 875595258, remaining 1216057344), total 2156396544) , expected downtime 0
Dec 27 21:10:50 migration status: active (transferred 1034654822, remaining 1056931840), total 2156396544) , expected downtime 0
Dec 27 21:10:54 migration status: active (transferred 1176288424, remaining 915369984), total 2156396544) , expected downtime 0
Dec 27 21:10:56 migration status: active (transferred 1339734759, remaining 752050176), total 2156396544) , expected downtime 0
Dec 27 21:10:58 migration status: active (transferred 1503743261, remaining 588206080), total 2156396544) , expected downtime 0
Dec 27 21:11:02 migration status: active (transferred 1645097827, remaining 446906368), total 2156396544) , expected downtime 0
Dec 27 21:11:04 migration status: active (transferred 1810562934, remaining 281751552), total 2156396544) , expected downtime 0
Dec 27 21:11:06 migration status: active (transferred 1964377505, remaining 126033920), total 2156396544) , expected downtime 0
Dec 27 21:11:08 migration status: active (transferred 2077930417, remaining 0), total 2156396544) , expected downtime 0
Dec 27 21:11:09 migration speed: 62.06 MB/s - downtime 37 ms
Dec 27 21:11:09 migration status: completed
Dec 27 21:11:13 migration finished successfuly (duration 00:00:41)
TASK OK
with unsecure migration without SSH Tunnel:
Dec 27 22:43:14 starting migration of VM 105 to node 'cloud1-1203' (10.255.0.22)
Dec 27 22:43:14 copying disk images
Dec 27 22:43:14 starting VM 105 on remote node 'cloud1-1203'
Dec 27 22:43:17 starting online/live migration on 10.255.0.22:60000
Dec 27 22:43:17 migrate_set_speed: 8589934592
Dec 27 22:43:17 migrate_set_downtime: 1
Dec 27 22:43:19 migration speed: 1024.00 MB/s - downtime 1100 ms
Dec 27 22:43:19 migration status: completed
Dec 27 22:43:22 migration finished successfuly (duration 00:00:09)
TASK OK
That way we do not need to run a qmp command to get the port.
Set spice ticket expire time to 30 (5 seconds seems a bit too short).
Coding style cleanups.
This adds special hyper-v cpu flags for windows guests.
This improves performance and avoids some bsods related to the timer.
(I currently disable the hv_vapic flag because I can't get it working.)
I have tested all these flags with: win2003, win2008R2, winxp, and linux debian 64bit, on intel and amd physical processors.
It doesn't break live migration, because new cpu flags are not seen by guests until a vm reset.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Needed for win8 boot.
This flag was missing from the rhel < 6.4 host kernel. It's ok now.
But it's also missing from the kvm64 model. (It exists in other cpu models, amd or intel.)
So it's pretty safe to enable it.
If the host kernel is older, qemu filters the flag.
This also improves performance of winxp && win7 32-bit guests.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
This reduces context switches with multicore guests.
Even if the host cpu doesn't have x2apic, it works because qemu has a virtual x2apic implementation for the guest.
We need in-kernel irqchip support for this, which is enabled for kvm guests since qemu 1.3.
(I don't enable it if the nokvm param is set.)
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
This is experimental code; spice connections are not encrypted and thus insecure.
We use ticket passwords for spice auth, and do direct spice connections to
the nodes instead of using a tunnel.
fix : Use of uninitialized value $bridgeid in numeric lt (<) at /usr/share/perl5/PVE/QemuServer.pm line 2774.
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>