Commit Graph

615 Commits

Author SHA1 Message Date
Dominik Csapak
de9768f002 refactor PCI into own file
to reduce QemuServer.pm size
also move the $device hash out of any function

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2016-06-22 09:13:16 +02:00
Alexandre Derumier
7023f3ea16 add hugepages option
vm configuration
----------------
hugepages: (any|2|1024)

any: try to allocate 1GB hugepages if possible, otherwise fall back to 2MB hugepages
2: use 2MB hugepages
1024: use 1GB hugepages (memory needs to be a multiple of 1GB in this case)

optional host configuration for 1GB hugepages
---------------------------------------------
1GB hugepages can be allocated at boot if the user wants it.
hugepages need to be contiguous, so sometimes it's not possible to reserve them on the fly

/etc/default/grub : GRUB_CMDLINE_LINUX_DEFAULT="quiet hugepagesz=1G hugepages=x"

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2016-06-22 09:11:11 +02:00
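The selection rule described in the hugepages commit above can be sketched as follows. This is a minimal Python illustration, not the qemu-server code (which is Perl). The function name `pick_hugepage_size_mb` is hypothetical, and treating "if possible" as "memory is a multiple of 1GB" is a simplifying assumption; a real implementation would also have to check what the host can actually allocate.

```python
def pick_hugepage_size_mb(option: str, memory_mb: int) -> int:
    """Return the hugepage size in MB for a 'hugepages' option value.

    Hypothetical helper: only the divisibility rule from the commit
    message is modelled, not host-side availability.
    """
    if option == "1024":
        # 1GB pages require the VM memory to be a multiple of 1GB.
        if memory_mb % 1024 != 0:
            raise ValueError("memory must be a multiple of 1GB for 1GB hugepages")
        return 1024
    if option == "2":
        return 2
    if option == "any":
        # Prefer 1GB pages, fall back to 2MB pages otherwise.
        return 1024 if memory_mb % 1024 == 0 else 2
    raise ValueError(f"invalid hugepages option: {option!r}")
```

With this sketch, `pick_hugepage_size_mb("any", 1536)` falls back to 2MB pages because 1536 MB is not a multiple of 1GB.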
Fabian Grünbichler
b74ff0476e add @param to foreach_drive 2016-06-17 16:20:57 +02:00
Wolfgang Link
b6adff3385 fix perl scope issues
Add parameter array to foreach_volid to use it in the functions.
Correct typos.
2016-06-16 11:26:37 +02:00
Wolfgang Bumiller
387ba25792 split old style pipe open call 2016-06-09 18:12:26 +02:00
Alexandre Derumier
7a131888d7 add hyperv enlightenments : hv_reset, hv_vpindex, hv_runtime
add them by default for qemu 2.6
(support is already present in qemu 2.5, but we don't want to break live migration for currently running VMs)

vpindex && runtime need host kernel 4.4

These 3 enlightenments are needed by Windows to use VMBus
http://searchwindowsserver.techtarget.com/definition/Microsoft-Virtual-Machine-Bus-VMBus

details :

- When Hyper-V "vpindex" is on, guest can use MSR HV_X64_MSR_VP_INDEX
to get virtual processor ID.

- Hyper-V "runtime" enlightenment feature allows the guest to use MSR
HV_X64_MSR_VP_RUNTIME to get the time the virtual processor consumes
running guest code, as well as the time the hypervisor spends running
code on behalf of that guest.

- Hyper-V "reset" allows guest to reset VM.

Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2016-06-07 10:08:25 +02:00
Thomas Lamprecht
54323eed5f migrate: unlink unix socket before starting migration
Just to be sure nobody else has (wrongfully) left that file here.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-06-03 16:02:25 +02:00
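The idea of the commit above, removing a possibly stale socket file before the path is used again, can be sketched in Python as follows. The helper name `bind_fresh_unix_socket` is hypothetical and the sketch binds the socket itself for demonstration; the actual qemu-server code is Perl and may structure this differently.

```python
import os
import socket


def bind_fresh_unix_socket(path: str) -> socket.socket:
    """Remove any leftover file at `path`, then bind and listen on it.

    Without the unlink, bind() fails with "Address already in use"
    if a previous run left the file behind.
    """
    try:
        os.unlink(path)
    except FileNotFoundError:
        pass  # nothing was left behind, which is the normal case
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.bind(path)
    sock.listen(1)
    return sock
```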
Thomas Lamprecht
1c9d54bfd0 migrate: use ssh forwarded UNIX socket tunnel
We cannot guarantee when the SSH forward tunnel really becomes
ready. The check with the mtunnel API call did not help with this
problem, as it only checked that the SSH connection itself works and
that the destination node has quorum; the forwarded tunnel itself
was not checked.

The forward tunnel is a different channel in the SSH connection,
independent of the SSH `qm mtunnel` channel, so the fact that the
latter works does not guarantee that our migration tunnel is up and
ready.

When the node(s) were under load, or when we did parallel
migrations (migrateall), the migrate command was often started
before a tunnel was open and ready to receive data. This led to
an immediate abort of the migration and is the main reason why
parallel migrations often leave two thirds or more of the VMs on the
source node.
The issue was tracked down to SSH after debugging the QEMU
process; enabling debug logging showed that the tunnel often
became available and ready too late, or not at all.

Fixing the TCP forward tunnel is quirky and not straightforward. The
only option SSH offers is to use -N (no command),
-f (background) and -o "ExitOnForwardFailure=yes"; it would then
wait in the foreground until the tunnel is ready and only then
background itself. This is not a good fit for our special
use case and our code base.
Waiting for the local port to become open and ready (through
/proc/net/tcp[6]) as a proof of concept is not enough: even if the
port is in the listening state and should theoretically accept
connections, this still failed often as the tunnel was not yet fully
ready.

Furthermore, another problem would remain open if we tried to patch the
SSH forward method we currently use, one which we solve for free with
the approach of this patch: the method used to get an available port
(next_migration_port) has a serious race condition which could lead
to the same port being used multiple times on a parallel migration
(I observed this in my many tests; it is rare, but when it happens
it is really bad).

So let's now use UNIX sockets, which OpenSSH supports since version 6.7.
The endpoints are UNIX sockets bound to the VMID, so there is no port,
no race, and no limit on the number of available ports (we reserved 50
for migration).

The endpoints get created in /run/qemu-server/VMID.migrate and, as
KVM/QEMU in current versions is able to use UNIX sockets just as well
as TCP, we do not have to change much in the interaction with QEMU.
QEMU is started with the migrate_incoming URL at the local
destination endpoint and creates the socket file; we then create
a listening socket on the source side and connect over SSH to the
destination.
Now the migration can be started by issuing the migrate QMP command
with an updated URI.

This breaks live migration from new to old, but *not* from old to
new, so there is an upgrade path.
If a live migration from new to old must be made (for whatever
reason), use the unsecure_migration setting (man datacenter.conf)
to allow this, although that should only be done in a trusted network.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2016-06-03 11:51:46 +02:00
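The commit above explains why per-VMID socket paths avoid the port allocation race. Below is a small Python sketch of that naming scheme. The function names are hypothetical, and the `unix:` prefix for the migration URI is an assumption based on QEMU's usual URI syntax; the commit message itself only names the path /run/qemu-server/VMID.migrate.

```python
def migration_socket_path(vmid: int, run_dir: str = "/run/qemu-server") -> str:
    """Socket path for a VM's migration endpoint, derived only from the VMID.

    No shared resource (such as a free TCP port) has to be allocated,
    so parallel migrations of different VMs cannot pick the same endpoint.
    """
    return f"{run_dir}/{vmid}.migrate"


def migration_uri(vmid: int) -> str:
    """Migration URI for the socket (assumed 'unix:<path>' form)."""
    return "unix:" + migration_socket_path(vmid)
```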
Wolfgang Bumiller
8e59d952be use enter_systemd_scope instead of systemd-run
With systemd-run qemu's --daemonize forks often happen
before systemd finishes setting up the scopes, which means
the limits we apply often don't work.
We now use enter_systemd_scope() to create the scope before
running qemu directly without systemd-run.

Note that vm_start() runs in a forked-worker or qm cli
command, so entering the scope in such a process should not
affect the rest of the pve daemon.
2016-06-03 11:41:31 +02:00
Dominik Csapak
596a0a2056 do not ignore hotplug parse errors
if we got an option which was not valid, we still
wrote it to the config, and subsequently returned
it on every api call

now we die instead of warning and do not accept
invalid options

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2016-05-31 12:15:32 +02:00
Alexandre Derumier
0567a4d572 move memory config generation to QemuServer::Memory::config
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2016-05-23 10:04:31 +02:00
Alexandre Derumier
6779f1ac3c move qemu_memory_hotplug && qemu_dimm_list to QemuServer::Memory
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2016-05-23 10:03:40 +02:00
Alexandre Derumier
3f669af25d move foreach_dimm && foreach_reverse_dimm to QemuServer::Memory
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2016-05-23 10:02:29 +02:00
Dietmar Maurer
faab53066c hostpci docs: move notes into verbose_description 2016-05-20 11:59:30 +02:00
Dietmar Maurer
fad17f04fc add full path reference to datacenter.conf file 2016-05-19 16:27:30 +02:00
Dietmar Maurer
522619458c improve documentation 2016-05-19 13:13:25 +02:00
Dominik Csapak
9f41a659a1 allow VLAN 1 tag in qemu-kvm vms
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2016-05-18 11:25:29 +02:00
Fabian Grünbichler
19333c9b82 add --description to systemd scope unit
otherwise, long kvm commands lead to systemd unit files with
very long lines, which confuses the systemd unit file parser.

apparently systemd has a length limit for unit file lines and
(line-)breaks the description string at that point. since
the rest of the description is probably not a valid key/value
pair, this leads to warnings. the default semantics of systemd-run
is to use the executed command as description unless a description
is specified explicitly.

note that this behaviour of systemd could allow an attacker
with access to the VM configuration to craft a kvm commandline
that starts or stops arbitrary systemd units.
2016-05-14 09:02:58 +02:00
Dietmar Maurer
30983c3bac remove unneeded keyAlias option 2016-05-11 13:04:59 +02:00
Dietmar Maurer
7f694a7113 fix #975, use new keyAlias feature.
Also remove unnecessary format_descriptions for boolean and enums.
2016-05-11 10:11:49 +02:00
Dominik Csapak
e7a5104daa add warning for iothread with invalid scsi controller
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2016-05-04 11:11:32 +02:00
Dietmar Maurer
8930da746f correctly set cpu vendor 2016-05-01 09:24:25 +02:00
Wolfgang Bumiller
3c525055dd restore: pass format to vma extract
This silences the "probing guessed raw" warnings of
'qmrestore'.
2016-04-29 09:02:34 +02:00
Alexandre Derumier
2b401189e3 vm_start : force systemctl stop if an orphan scope exists
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
2016-04-22 11:09:09 +02:00
Dominik Csapak
ffa42b860d fix #947: reenable disk/cdrom passthrough
previously, we did not check the file parameter of a disk,
allowing passthrough of a block device (by design)

with the change to the json parser for the disks, the format
became 'pve-volume-id' which is only valid for our volume ids
(and later we also allowed the value 'none')

this patch alternatively checks if the parameter is a path
or 'cdrom'

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2016-04-21 11:54:55 +02:00
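The check described in the commit above accepts a drive's file parameter if it is a volume ID, a path, or one of the special values. A rough Python sketch follows. The regular expression for volume IDs is a simplified assumption (`storage:volume`), and the function name is hypothetical; the real 'pve-volume-id' format is defined in the Proxmox Perl code and is not reproduced here.

```python
import re

# Simplified stand-in for the 'pve-volume-id' format: "<storage>:<volume>".
VOLUME_ID_RE = re.compile(r"^[A-Za-z][\w.-]*:.+$")


def is_valid_drive_file(value: str) -> bool:
    """Accept volume IDs, absolute paths (block device passthrough),
    and the special values 'cdrom' and 'none'."""
    if value in ("cdrom", "none"):
        return True
    if value.startswith("/"):
        return True
    return bool(VOLUME_ID_RE.match(value))
```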
Fabian Grünbichler
c7a8aad601 docs: cleanup 2016-04-15 16:37:41 +02:00
Wolfgang Bumiller
2e953867ad Fix #848: deactivate old volume after clone before deletion
Otherwise some move operations will fail to delete the old
disk (eg. when moving from ceph to local storage).

Note that in order for the deactivation to succeed we need
to make sure qemu has closed its file descriptors, so we
need to wait for the job to disappear the same way we do in
$cancel_job().
Factored the waiting out into $finish_job().
2016-04-13 08:24:13 +02:00
Wolfgang Bumiller
ec3582b52a property string update: watchdog 2016-04-01 09:31:40 +02:00
Wolfgang Bumiller
1f4f447b58 property string update: hostpci*
This commit changes the listing of virtual functions from
multiple host= entries to one semicolon-separated host list.
2016-04-01 09:31:25 +02:00
Wolfgang Bumiller
cd9c34d186 property string update: net*
This requires the new 'group' schema mechanism.
2016-04-01 09:31:06 +02:00
Wolfgang Bumiller
ffc0d8c793 property string update: numa*
Additionally since the cpu and host node list isn't
restricted to a single range one can now provide multiple
ranges separated by semicolons. (eg. cpus=0-3;5;7)
2016-04-01 09:30:45 +02:00
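The semicolon-separated range syntax from the commit above (e.g. cpus=0-3;5;7) can be expanded as in the following Python sketch. The function name `parse_id_list` is hypothetical and the validation is an assumption about what a strict parser would reject; the actual schema handling lives in the Perl code base.

```python
from typing import List


def parse_id_list(value: str) -> List[int]:
    """Expand a list such as '0-3;5;7' into individual IDs."""
    ids: List[int] = []
    for part in value.split(";"):
        start, sep, end = part.partition("-")
        first = int(start)
        # A part without '-' is a single ID, i.e. a range of length one.
        last = int(end) if sep else first
        if last < first:
            raise ValueError(f"invalid range: {part!r}")
        ids.extend(range(first, last + 1))
    return ids
```

For the example from the commit message, `parse_id_list("0-3;5;7")` yields `[0, 1, 2, 3, 5, 7]`.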
Wolfgang Bumiller
822c8a0776 drive schema: allow 'none' again 2016-04-01 09:30:01 +02:00
Wolfgang Bumiller
bb9207e0e1 cputype: format_description to avoid huge enum in manpage 2016-04-01 09:27:35 +02:00
Wolfgang Bumiller
ff6ffe20c9 cleanup: naming consistency 2016-04-01 09:27:12 +02:00
Wolfgang Bumiller
93c0971cec fix a few property string descriptions 2016-04-01 09:26:51 +02:00
Fabian Grünbichler
20519efc76 use PVE::Storage::config(), not cfs_read_file() 2016-03-30 10:37:22 +02:00
Dietmar Maurer
8a61e0fd38 use asciidoc compatible markup
s/Note:/NOTE:/
2016-03-23 10:22:17 +01:00
Wolfgang Bumiller
ba8fc5d13e limit serial and model and document their real limits
The urlencoded format currently cannot check the real
decoded length, so we limit to an upper bound and document
the real limits. Ideally we'd introduce a decodedLength
schema parameter at some point...
2016-03-21 11:19:55 +01:00
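The gap described in the commit above, where a schema can only limit the encoded string while the real limit applies to the decoded value, can be illustrated with this Python sketch. The function name and the `max_decoded` parameter are hypothetical, and the upper bound of three encoded characters per decoded byte (`%XX`) is an assumption about how such a bound could be chosen; the commit does not state the actual numbers.

```python
from urllib.parse import unquote_to_bytes


def check_urlencoded(value: str, max_decoded: int) -> bytes:
    """Validate a urlencoded value against a limit on its decoded length."""
    # Cheap schema-level check: every decoded byte takes at most 3
    # encoded characters ("%XX"), so this is a safe upper bound.
    if len(value) > 3 * max_decoded:
        raise ValueError("encoded value too long")
    decoded = unquote_to_bytes(value)
    # The real limit applies to the decoded bytes.
    if len(decoded) > max_decoded:
        raise ValueError("decoded value too long")
    return decoded
```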
Wolfgang Bumiller
988e2714ad clone: use the zeroinit filter for sparseinit storages 2016-03-21 09:59:49 +01:00
Wolfgang Bumiller
46630a5fd4 cfg: use the 'urlencoded' format for drive model and serial 2016-03-21 09:01:15 +01:00
Wolfgang Bumiller
918d09150e cleanup: qemu_drive_options is only used inside the one function
and it doesn't contain 'bootindex'
2016-03-21 09:00:47 +01:00
Thomas Lamprecht
1917695c93 Fix some typos in JSON schema descriptions
/cirrur/cirrus/
/devive/device/
/Numa/NUMA/
and a few grammar fixes, rewrites of sentences

Also, while already touching those lines, break them up from
one-liners to a column limit of ~80.
2016-03-16 16:46:08 +01:00
Fabian Grünbichler
e79706d47a Use has_lock to check for specific lock 2016-03-14 09:03:28 +01:00
Wolfgang Bumiller
4f4fbeb048 fix #909: pass rate to tap_plug()
When using OVS tap_plug() resets rate limiting so we need
to pass it along to reapply it.

The rate on its own can still be hot-plugged with the
regular tap_rate_limit() call.
2016-03-08 15:52:31 +01:00
Fabian Grünbichler
8793d4950e Refactor add_unused_volume
Drop add_unused_volume from PVE::QemuServer in favor of
(identical) implementation in PVE::AbstractConfig
2016-03-08 11:42:51 +01:00
Fabian Grünbichler
b2c9558da8 Rework snapshot code, has_feature
Drop snapshot_create, snapshot_delete and snapshot_rollback
in favour of PVE::AbstractConfig. Qemu-specific parts are
implemented in __snapshot_XX methods in PVE::QemuConfig.

has_feature is made an implementation of the abstract
has_feature, and thus moves to PVE::QemuConfig.

Note: a new hook method needed to be introduced to be called
before creating a volume snapshot, after creating a volume
snapshot, and after unfreezing the guestfs after creating a
volume snapshot. The base method in PVE::AbstractConfig is a
noop, the implementation in PVE::QemuConfig runs the necessary
Qemu monitor commands.
2016-03-08 11:42:37 +01:00
Fabian Grünbichler
ffda963f46 Refactor basic config-related methods
Drop load_config, write_config, lock_config[_xx],
check_lock, check_protection, is_template and config_file
in favour of implementations in PVE::AbstractConfig.

Implement guest_type, __config_max_unused_disks,
config_file_lock and cfs_config_path from
PVE::AbstractConfig in PVE::QemuConfig.
2016-03-08 11:41:59 +01:00
Fabian Grünbichler
74479ee9bb Make foreach_drive order deterministic
Previously, foreach_drive iterated over all configuration
keys (in a random order) and checked whether the current key
is a valid drive name. Instead, we now iterate over a list
of valid drive names (with deterministic order) and check
whether a drive with such a name exists in the
configuration.

Also rename the two involved methods from valid_drive_name
to is_valid_drive_name (for the check) and from disknames
to valid_drive_names (for the list of valid keys), for
consistency. These two were only used in the qemu-server
code base.
2016-03-04 06:25:48 +01:00
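The change in iteration strategy described in the commit above can be sketched in Python as follows. The function signature and the drive names in the usage example are illustrative assumptions; the real foreach_drive is a Perl sub in qemu-server.

```python
from typing import Any, Callable, Dict, Iterable


def foreach_drive(conf: Dict[str, Any],
                  func: Callable[[str, Any], None],
                  valid_drive_names: Iterable[str]) -> None:
    """Call func for every configured drive, in the order of the name list.

    Iterating over the fixed list of valid names (instead of over the
    configuration keys) makes the order independent of how the config
    hash happens to be ordered, and non-drive keys are skipped for free.
    """
    for name in valid_drive_names:
        if name in conf:
            func(name, conf[name])
```

A call such as `foreach_drive(conf, handler, ["ide0", "ide2", "scsi0"])` visits `ide2` before `scsi0` regardless of the order in which the keys were added to `conf`.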
Fabian Grünbichler
521c52e09c Remove dead code
This sub is not used anywhere.
2016-03-01 09:31:09 +01:00
Fabian Grünbichler
ff9922861a Don't apply snapshot config in snapshot_commit
We hold a lock from snapshot_prepare until snapshot_commit,
so there is no need to copy back the snapshot config to the
actual config. This allows us to drop a workaround for not
copying the 'machine' type config option.
2016-03-01 08:37:05 +01:00