replication: update last_sync before removing old replication snapshots

If pvesr was terminated after finishing with the new sync and after
removing old replication snapshots, but before it could write the new
state, the next replication would fail. It would wrongly interpret the
actual last replication snapshot as stale, remove it, and (if no other
snapshots are present) attempt a full sync, which would fail.
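
The misclassification described above can be illustrated with a small toy model. This is only a sketch: stale_snapshots() is a hypothetical helper, not a PVE function, and it assumes that a replication snapshot is identified solely by the timestamp stored as last_sync in the job state.

```perl
use strict;
use warnings;

# Toy model (hypothetical helper, not PVE code): a snapshot counts as stale
# unless its timestamp matches the last_sync value stored in the job state.
sub stale_snapshots {
    my ($state, $snapshots) = @_;
    return [ grep { $_ != $state->{last_sync} } @$snapshots ];
}

# The state on disk still records the previous sync (100), but the
# interrupted run already removed snapshot 100 and left only snapshot 200.
my $state = { last_sync => 100 };
my $snapshots_on_disk = [200];

my $stale = stale_snapshots($state, $snapshots_on_disk);
print "stale: @$stale\n";    # the only valid snapshot is classified as stale
```

In this model the snapshot from the completed sync is the only one left, yet it does not match the outdated last_sync, so it is treated as stale.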

Reported in the community forum [0], this was brought to light by the
new pvescheduler before it learned graceful reload.

It's not possible to simply preserve the last remaining snapshot in
prepare(), because prepare() is also used for valid removals. Instead,
write last_sync to the job state before the old snapshots are removed.
Any stale snapshots left behind by an interruption will still be
removed on the next run.
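
Why the ordering matters can be sketched by simulating an interruption after each step. This is a simplified model under stated assumptions: run_sync() and consistent() are hypothetical stand-ins, and the "disk" is a plain hash holding last_sync and a list of snapshot timestamps.

```perl
use strict;
use warnings;

# Toy replication run (hypothetical, not PVE code). $crash_after names the
# step after which the process is killed.
sub run_sync {
    my ($disk, $new_time, $record_first, $crash_after) = @_;
    my @steps = $record_first ? qw(sync record cleanup) : qw(sync cleanup record);
    for my $step (@steps) {
        if ($step eq 'sync') {
            push @{ $disk->{snapshots} }, $new_time;
        } elsif ($step eq 'record') {
            $disk->{last_sync} = $new_time;
        } else {
            # cleanup: drop everything except the new snapshot
            $disk->{snapshots} = [ grep { $_ == $new_time } @{ $disk->{snapshots} } ];
        }
        return $disk if defined($crash_after) && $step eq $crash_after;
    }
    return $disk;
}

# The state is consistent if the snapshot named by last_sync still exists.
sub consistent {
    my ($disk) = @_;
    my $matches = grep { $_ == $disk->{last_sync} } @{ $disk->{snapshots} };
    return $matches > 0 ? 1 : 0;
}

for my $record_first (0, 1) {
    for my $crash_after (qw(sync record cleanup)) {
        my $disk = run_sync({ last_sync => 100, snapshots => [100] },
            200, $record_first, $crash_after);
        printf "%-14s crash after %-7s -> %s\n",
            $record_first ? 'record first,' : 'cleanup first,',
            $crash_after, consistent($disk) ? 'consistent' : 'INCONSISTENT';
    }
}
```

In this model, only the old order (cleanup before record) interrupted right after cleanup leaves a last_sync that points at a snapshot that no longer exists. With the state recorded first, an interruption at any point leaves at most an extra stale snapshot behind.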

[0]: https://forum.proxmox.com/threads/100154

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Fabian Ebner 2021-11-26 11:52:30 +01:00 committed by Fabian Grünbichler
parent 7d604b5bbd
commit ff574bf8d2
2 changed files with 10 additions and 0 deletions

@@ -372,6 +372,9 @@ sub replicate {
        die $err;
    }

    # Ensure that new sync is recorded before removing old replication snapshots.
    PVE::ReplicationState::record_sync_end($jobcfg, $state, $start_time);

    # remove old snapshots because they are no longer needed
    $cleanup_local_snapshots->($last_snapshots, $last_sync_snapname);

@@ -159,6 +159,13 @@ sub delete_guest_states {
    PVE::Tools::lock_file($state_lock, 10, $code);
}

sub record_sync_end {
    my ($jobcfg, $state, $start_time) = @_;

    $state->{last_sync} = $start_time;
    write_job_state($jobcfg, $state);
}

sub record_job_end {
    my ($jobcfg, $state, $start_time, $duration, $err) = @_;