Commit Graph

207 Commits

Author SHA1 Message Date
Dominik Csapak
3fa4ce4516 add snapshot rollback hook and remove qemu machine code
instead, move the QEMU machine logic into the qemu-server package with
the help of the new snapshot rollback hook

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2018-09-17 15:21:22 +02:00
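
To illustrate the hook idea, a minimal Perl sketch; the registry and function names here are hypothetical, not the actual pve-guest-common API:

    use strict;
    use warnings;

    # hypothetical hook registry: guest-specific packages (e.g. qemu-server)
    # register a callback instead of pve-guest-common hard-coding QEMU logic
    my @rollback_hooks;

    sub register_snapshot_rollback_hook {
        my ($hook) = @_;
        push @rollback_hooks, $hook;
    }

    sub snapshot_rollback {
        my ($vmid, $snapname) = @_;
        # generic, guest-type-independent rollback steps would run here ...
        $_->($vmid, $snapname) for @rollback_hooks;
    }

    # qemu-server could then register its machine handling like this:
    register_snapshot_rollback_hook(sub {
        my ($vmid, $snapname) = @_;
        print "restoring QEMU machine config for VM $vmid from '$snapname'\n";
    });

    snapshot_rollback(100, 'before-upgrade');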
Thomas Lamprecht
bc6185bb38 bump version to 2.0-17 2018-06-19 14:07:19 +02:00
Thomas Lamprecht
cbbb06a5ae add create_and_lock_config
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2018-06-12 09:45:40 +02:00
Wolfgang Bumiller
24691c2141 cleanup: ReplicationConfig: be specific about write_config
Since it doesn't write but returns the text to be written,
let's be specific about the fact that we're returning a
value.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2018-05-14 13:51:26 +02:00
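
A minimal sketch of the naming distinction (hypothetical names, not the real method):

    use strict;
    use warnings;

    # the method only serializes the config and returns the text; the name
    # should say so instead of implying file I/O
    sub render_config_text {
        my ($cfg) = @_;
        my $raw = '';
        $raw .= "$_: $cfg->{$_}\n" for sort keys %$cfg;
        return $raw;    # returned, not written -- the caller persists it
    }

    print render_config_text({ 'job-100-0' => 'enabled' });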
Fabian Grünbichler
48b638587f bump version to 2.0-16 2018-05-14 11:15:20 +02:00
Wolfgang Link
7319c02cfd Add lock to prevent lost updates. 2018-05-09 15:10:30 +02:00
Wolfgang Link
7bbb0cd6a9 Swap source and target in replication config, if VM was stolen. 2018-05-09 15:10:30 +02:00
Wolfgang Link
286a9ab991 Add function: swap source and target in replication config 2018-05-09 15:10:30 +02:00
Wolfgang Link
d5b277dc99 Get snapshots when no state is available.
With this patch we can restore the state of a stateless job.
Multiple replication snapshots may exist because, without a known
job state, none of them can be deleted.
This happens when a node fails in the middle of a replication
and the VM is then moved to another node.
That's why we have to test whether both nodes share a common base snapshot.
If they do, we take it as the replication state.
Once a state is available again, the remaining snapshots can be deleted on the next run.
2018-05-09 15:10:30 +02:00
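
A sketch of the common-base test described above (hypothetical names and snapshot naming; the real code lives in PVE::Replication):

    use strict;
    use warnings;

    # when a job has no state, look for the newest replication snapshot
    # present on BOTH nodes and adopt it as the new base, so later runs
    # can sync incrementally and prune the leftover snapshots
    sub find_common_base {
        my ($local_snaps, $remote_snaps) = @_;
        my %remote = map { $_ => 1 } @$remote_snaps;
        # assumes snapshot names sort by creation time
        for my $snap (sort { $b cmp $a } @$local_snaps) {
            return $snap if $remote{$snap};
        }
        return undef;    # no common base -> a full sync is required
    }

    my $base = find_common_base(
        ['replicate_job-100-0_1525860000', 'replicate_job-100-0_1525860900'],
        ['replicate_job-100-0_1525860000'],
    );
    print defined $base ? "common base: $base\n" : "no common base\n";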
Wolfgang Link
a1dfeff3a8 Delete replication snapshots only if last_sync is not 0.
If last_sync is 0, the VM configuration has been stolen
(either manually or by HA restoration).
Under this condition, the replication snapshot should not be deleted.
This snapshot is used to restore replication state.
If last_sync is greater than 0 and does not match the snapshot name,
it must be a remnant of an earlier sync and should be deleted.
2018-05-09 15:10:30 +02:00
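
The deletion rule as a small sketch (hypothetical helper):

    use strict;
    use warnings;

    # last_sync == 0 marks a stolen configuration, so the replication
    # snapshot must be kept to restore state; a non-zero last_sync that
    # does not match the snapshot's timestamp marks a stale leftover
    sub snapshot_is_stale {
        my ($last_sync, $snap_timestamp) = @_;
        return 0 if $last_sync == 0;              # stolen config: keep it
        return $last_sync != $snap_timestamp;     # leftover of an earlier sync
    }

    print snapshot_is_stale(0, 1525860000)          ? "stale\n" : "keep\n";  # keep
    print snapshot_is_stale(1525860900, 1525860000) ? "stale\n" : "keep\n";  # stale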
Wolfgang Link
4ea5167ef0 Add config parameter 'source'.
This parameter is useful for restoring the replication status.
It is also corrected if it is missing or wrong.
2018-05-09 15:10:30 +02:00
Wolfgang Link
d869a19c9e Cleanup for stateless jobs.
If a VM configuration has been manually moved or recovered by HA,
there is no job state on the new node.
In this case, the replication snapshots still exist on the remote side.
It must be possible to remove a job without state,
otherwise a new replication job on the same remote node will fail
and the disks will have to be removed manually.
When searching through the sorted_volumes generated from the VMID.conf,
we can be sure that every disk will be removed in the event
of a complete job removal on the remote side.

In the end, remote_prepare_local_job calls prepare on the remote side.
2018-05-09 15:10:30 +02:00
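
A sketch of that removal path (hypothetical names): deriving the volume list from the guest config itself guarantees no disk is missed even without a job state:

    use strict;
    use warnings;

    sub cleanup_stateless_job {
        my ($vmid, $sorted_volumes, $remove_snapshots) = @_;
        # without local state we cannot know which snapshot is current,
        # so prune the job's replication snapshots per volume
        $remove_snapshots->($vmid, $_) for @$sorted_volumes;
    }

    cleanup_stateless_job(100,
        ['local-zfs:vm-100-disk-0', 'local-zfs:vm-100-disk-1'],
        sub { my ($vmid, $volid) = @_; print "pruning snapshots of $volid\n" },
    );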
Dietmar Maurer
235b2c059e bump version to 2.0-15 2018-04-16 11:48:13 +02:00
Dietmar Maurer
edd61f2b3a Replication.pm: code cleanup 2018-04-16 10:52:24 +02:00
Dietmar Maurer
c1797f7a4d PVE/Replication.pm: fix error message 2018-04-16 10:48:49 +02:00
Wolfgang Link
ce22af0895 fix #1694: make failure of snapshot removal non-fatal
In certain high-load scenarios ANY ZFS operation can block,
including registering an (async) destroy.
Since ZFS operations are implemented via ioctl's,
killing the user space process
does not affect the waiting kernel thread processing the ioctl.

Once "zfs destroy" has been called, killing it does not say anything
about whether the destroy operation will be aborted or not.
Since running into a timeout effectively means killing it,
we don't know whether the snapshot exists afterwards or not.
We also don't know how long it takes for ZFS to catch up on pending ioctls.

Given the above problem, we must not die when deleting a no longer
needed snapshot fails (under a timeout) after an otherwise
successful replication. Since we retry on the next run anyway, this is
not problematic.

The snapshot deletion error will be logged in the replication log
and the syslog/journal.
2018-04-16 10:40:48 +02:00
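
The non-fatal pattern boils down to wrapping the removal in eval and logging instead of dying; a sketch with illustrative names:

    use strict;
    use warnings;

    sub remove_old_snapshot_nonfatal {
        my ($destroy, $volid, $snapname, $logfunc) = @_;
        # $destroy is where the (possibly blocking) ZFS call would run,
        # typically under a timeout that kills the user-space process
        eval { $destroy->($volid, $snapname) };
        if (my $err = $@) {
            # the next replication run retries the removal anyway
            $logfunc->("WARN: removing snapshot '$snapname' on $volid failed: $err");
            return 0;
        }
        return 1;
    }

    remove_old_snapshot_nonfatal(
        sub { die "timeout: zfs destroy did not return\n" },  # simulated failure
        'local-zfs:vm-100-disk-0', 'replicate_old',
        sub { print $_[0] },
    );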
Thomas Lamprecht
c8a71da5ce vzdump: add common log sub-method
Add a general log method here which supports passing on the "log to
syslog too" functionality and makes it clearer what each
parameter of logerr and loginfo means.

Further, we can now also log with a 'warn' level, which can be
useful to notify a backup user of a possible problem which isn't an
error per se, but may need the user's attention.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2017-12-15 12:11:22 +01:00
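
A sketch of such a level-aware log method (hypothetical shape, not the actual vzdump code):

    use strict;
    use warnings;
    use POSIX qw(strftime);

    sub logmsg {
        my ($level, $msg, $to_syslog) = @_;
        die "unknown log level '$level'\n" if $level !~ /^(info|warn|err)$/;
        print strftime("%F %T", localtime()), " $level: $msg\n";
        # forward to syslog here if $to_syslog is set
    }

    logmsg('info', 'starting backup of VM 100');
    logmsg('warn', 'storage is almost full');  # needs attention, not fatal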
Thomas Lamprecht
8044127d7d vzdump: allow all defined log levels
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2017-12-15 12:11:22 +01:00
Wolfgang Bumiller
a2dd551b0a bump version to 2.0-14
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2017-12-13 14:51:50 +01:00
Wolfgang Link
ac02a68e07 Remove noerr from replication.
We will handle these errors in the API and decide what to do.

Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2017-12-13 14:51:34 +01:00
Fabian Grünbichler
01132427b5 bump version to 2.0-13
and add versioned dependency on libpve-storage-perl for storage_migrate
signature change (added logfunc parameter).
2017-10-17 15:04:41 +02:00
Wolfgang Bumiller
81228d280f replication: purge states: verify the vmlist
Instead of clearing out the local state if the last
cfs_update failed.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2017-10-17 14:00:39 +02:00
Wolfgang Link
aa0d516fc5 Add logfunc in storage_migration.
This will redirect export and import output to the correct log, instead of writing it into the syslog.
2017-10-16 15:00:57 +02:00
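
The callback idea in miniature (hypothetical signature): the caller supplies a logfunc, so output lands in the job's own log rather than the syslog:

    use strict;
    use warnings;

    sub storage_migrate_sketch {
        my ($volid, $target, $logfunc) = @_;
        $logfunc->("exporting $volid");
        $logfunc->("importing $volid on $target");
    }

    my @joblog;  # collected by the replication job instead of syslog
    storage_migrate_sketch('local-zfs:vm-100-disk-0', 'node2',
        sub { push @joblog, $_[0] });
    print "$_\n" for @joblog;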
Fabian Grünbichler
a9038010f0 build: reformat debian/control
using wrap-and-sort -abt
2017-10-04 11:05:33 +02:00
Wolfgang Bumiller
b200ca58e5 bump version to 2.0-12 2017-09-21 09:48:08 +02:00
Thomas Lamprecht
047ee481a6 VZDump/Plugin: avoid cyclic dependency
pve-guest-common is above qemu-server, pve-container, and thus also
pve-manager in the package hierarchy.
The latter hosts PVE::VZDump, so using it here adds a cyclic
dependency between pve-manager and pve-guest-common.

Move the log method to the base plugin class and inline the
run_command function directly into the plugin's cmd method.

pve-manager's PVE::VZDump may then use this plugin's static log
function instead of its own copy.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2017-09-21 09:48:08 +02:00
Thomas Lamprecht
71dd5d907b AbstractMigrate: remove unused obsolete variables 2017-09-20 12:39:11 +02:00
Thomas Lamprecht
ee966a3f7a AbstractMigrate: do not overwrite global signal handlers
Perl's 'local' must either be used in front of each $SIG{...}
assignment, or the assignments must be put in a list; otherwise it
affects only the first variable and the rest are *not* in local
context.

This may cause weird behaviour where daemons seemingly do not get
terminating signals delivered correctly and thus may not shut down
gracefully anymore.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2017-09-07 10:32:08 +02:00
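
The pitfall and the fix in a runnable sketch:

    use strict;
    use warnings;

    # WRONG: 'local' here localizes only $SIG{INT}; the TERM handler is
    # assigned globally and survives the scope:
    #   local $SIG{INT} = $SIG{TERM} = sub { ... };
    #
    # RIGHT: localize the assignments as a list, so all handlers are
    # restored when the scope exits:
    sub with_temporary_handlers {
        my $handler = sub { die "interrupted\n" };
        local ($SIG{INT}, $SIG{TERM}) = ($handler, $handler);
        # ... migration work runs with the temporary handlers ...
    }

    with_temporary_handlers();
    print defined $SIG{TERM} ? "TERM handler leaked\n"
                             : "global handlers restored\n";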
Alwin Antreich
4c3144eaa6 Fix #1480: die if snapshot name is not found before set_lock is used
Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
2017-09-01 09:06:24 +02:00
Wolfgang Bumiller
11ae6ca525 bump version to 2.0-11 2017-07-03 14:51:10 +02:00
Wolfgang Bumiller
d91bac5053 replication: we must call storage_migrate with with_snapshots true 2017-07-03 11:58:41 +02:00
Dietmar Maurer
9146f8ced6 bump version to 2.0-10 2017-06-29 10:57:07 +02:00
Thomas Lamprecht
23ca78cd25 replication job_status: add get_disabled parameter
allows the API/frontend to get the disabled jobs more easily

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2017-06-29 10:42:19 +02:00
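
A sketch of the opt-in filter (hypothetical job layout):

    use strict;
    use warnings;

    sub job_status_sketch {
        my ($jobs, $get_disabled) = @_;
        # disabled jobs are skipped by default; the API/frontend can ask
        # for them explicitly via $get_disabled
        return [ grep { $get_disabled || !$_->{disable} } @$jobs ];
    }

    my $jobs = [
        { id => 'job-100-0' },
        { id => 'job-101-0', disable => 1 },
    ];
    print scalar @{ job_status_sketch($jobs, 0) }, " enabled job(s)\n";   # 1
    print scalar @{ job_status_sketch($jobs, 1) }, " job(s) in total\n";  # 2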
Dietmar Maurer
03f417a990 bump version to 2.0-9 2017-06-29 07:29:29 +02:00
Dietmar Maurer
b3ed460ed0 Revert "Add guest type at find_local_replication_job"
This reverts commit 914b6647a4.

No longer required.
2017-06-29 07:27:16 +02:00
Dietmar Maurer
6358ffe1cb PVE::Replication - do not use $jobcfg->{vmtype} 2017-06-29 07:26:51 +02:00
Wolfgang Bumiller
c34b13dbed bump version to 2.0-8 2017-06-28 14:32:56 +02:00
Wolfgang Link
914b6647a4 Add guest type at find_local_replication_job
We need this at migration time.
2017-06-28 14:29:45 +02:00
Dietmar Maurer
9622dffbb4 bump version to 2.0-7 2017-06-28 12:47:05 +02:00
Dietmar Maurer
40bcf6526b fix previous commit 2017-06-28 12:05:18 +02:00
Dietmar Maurer
22ce136731 replication: improve schedule_job_now
do not modify anything if there is no state
2017-06-28 12:01:50 +02:00
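
The guard, combined with the schedule_job_now helper added in the commit below, amounts to this sketch (hypothetical state layout):

    use strict;
    use warnings;

    sub schedule_job_now_sketch {
        my ($states, $jobid) = @_;
        my $state = $states->{$jobid};
        return if !defined $state;   # no state -> do not modify anything
        $state->{next_sync} = 1;     # epoch 1 == "as soon as possible"
    }

    my $states = { 'job-100-0' => { next_sync => 1525860900 } };
    schedule_job_now_sketch($states, 'job-100-0');
    schedule_job_now_sketch($states, 'job-999-0');  # silently does nothing
    print "next_sync: $states->{'job-100-0'}{next_sync}\n";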
Wolfgang Bumiller
b90dc712c5 replication: add schedule_job_now helper 2017-06-28 11:54:11 +02:00
Wolfgang Bumiller
621b955fb8 replication: sort time stamps numerically 2017-06-28 09:52:17 +02:00
Dietmar Maurer
1b82f17117 replication: pass $noerr to run_replication_nolock 2017-06-28 07:54:11 +02:00
Wolfgang Link
14849765e5 Add new function delete_guest_states. 2017-06-27 12:51:44 +02:00
Wolfgang Bumiller
fd844180a7 replication: don't sync to offline targets on error states
There's no point in trying to replicate to a target node
which is offline. Note that if we're not already in an
error state we do still give it a try in order for this to
get logged as an error at least once.
2017-06-27 12:13:24 +02:00
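
The skip rule as a tiny sketch (illustrative names):

    use strict;
    use warnings;

    sub should_try_replication {
        my ($target_online, $in_error_state) = @_;
        return 1 if $target_online;
        # offline target: try once so the failure gets logged, then skip
        # while the job stays in the error state
        return !$in_error_state;
    }

    print should_try_replication(0, 0) ? "try\n" : "skip\n";  # try (logs the error)
    print should_try_replication(0, 1) ? "try\n" : "skip\n";  # skip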
Wolfgang Bumiller
3385399339 replication: keep retrying every 30 minutes in error state
Otherwise we never get out of it.
2017-06-27 12:13:24 +02:00
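
The retry schedule in miniature (the 30-minute interval is from the commit message; function names are illustrative):

    use strict;
    use warnings;

    use constant ERROR_RETRY_INTERVAL => 30 * 60;  # seconds

    # while a job is in an error state, re-queue it 30 minutes after the
    # last attempt so it eventually recovers instead of staying stuck
    sub next_attempt_after_error {
        my ($last_try) = @_;
        return $last_try + ERROR_RETRY_INTERVAL;
    }

    printf "next retry at %s\n",
        scalar localtime(next_attempt_after_error(time()));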
Dietmar Maurer
92a243e986 PVE::ReplicationState - cleanup job state on job removal
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
2017-06-27 11:53:28 +02:00
Dietmar Maurer
44972014b2 PVE::ReplicationState::purge_old_states - new helper 2017-06-27 10:15:01 +02:00
Dietmar Maurer
2c508173ea PVE::ReplicationState::write_job_state - allow to remove state completely 2017-06-27 08:13:36 +02:00