Commit Graph

13 Commits

Author SHA1 Message Date
Dominik Csapak
c61c192e17 fix #4026: add 'repeat-missed' option for jobs
like systemd timers' 'persistent' option, so that the user can configure a job
to not be run after powering up when it was previously missed

this changes the default behaviour: missed jobs are no longer run after
pvescheduler was started, since most of the time that's not the desired behaviour
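
A minimal sketch of the idea in Python, not the Perl code from this commit. It assumes that, at scheduler start, jobs without 'repeat-missed' get their last run time set to the start time, so runs that fell due while the host was off are skipped. Only the name `update_last_runtime` comes from the message above; the `Job` fields, `on_scheduler_start`, `is_due`, and the fixed-interval schedule are hypothetical simplifications.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    last_runtime: int            # epoch seconds of the last recorded run
    repeat_missed: bool = False  # new default: do not catch up on missed runs


def update_last_runtime(job, now):
    """Record 'now' as the last run time without actually running the job."""
    job.last_runtime = now


def on_scheduler_start(jobs, now):
    """Forget missed runs for every job that did not opt in to repeat-missed."""
    for job in jobs:
        if not job.repeat_missed:
            update_last_runtime(job, now)


def is_due(job, interval, now):
    """Simplified schedule check: due once 'interval' seconds have passed."""
    return now - job.last_runtime >= interval
```

With this model, a job that last ran long before the scheduler came up is only due right after start if it has `repeat_missed` set.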

since we don't use it for updated schedules anymore, rename
'updated_job_schedule' to 'update_last_runtime'

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Reviewed-by: Fabian Ebner <f.ebner@proxmox.com>
2022-06-17 17:21:56 +02:00
Thomas Lamprecht
7de8b7301c pvescheduler: use private sub instead of code-ref
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-11-23 08:59:49 +01:00
Thomas Lamprecht
9c1943935c pvescheduler: fix potential stall on full shutdown
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-11-23 08:59:18 +01:00
Thomas Lamprecht
d4eb0c1993 pvescheduler: record some todos and small cleanup
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-11-22 20:25:34 +01:00
Thomas Lamprecht
427a5cb429 pvescheduler: make jobs tracking more flexible, rework stop
Avoid hard-coding the current implication of the replication stack to
not get started again until the old worker is done.

We still apply the same check, but changing that to let the jobs have
control is rather easy now.

Also rework the stop logic: send terminate to _all_ workers, make
the timeout an actual shared one (not first gets all, remaining get
kill), and send a kill to the stuck, leftover ones in one go at the
end, including some logging so that the admin can actually know about
this non-ideal situation.
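
The stop logic described above can be sketched in Python as follows. This is an illustration under assumptions, not the pvescheduler implementation: workers are modelled as `subprocess.Popen` objects, and `stop_workers`, its parameters, and the log text are hypothetical.

```python
import subprocess
import sys
import time


def stop_workers(workers, timeout=5.0, poll_interval=0.05):
    """Terminate all workers, wait on one shared deadline, kill leftovers.

    Returns the PIDs that had to be killed.
    """
    # Ask every worker to terminate before waiting on any of them.
    for proc in workers:
        if proc.poll() is None:
            proc.terminate()

    # One deadline for all workers, instead of a full timeout per worker.
    deadline = time.monotonic() + timeout
    remaining = [p for p in workers if p.poll() is None]
    while remaining and time.monotonic() < deadline:
        time.sleep(poll_interval)
        remaining = [p for p in remaining if p.poll() is None]

    # Kill whatever is still stuck in one go, and log it for the admin.
    killed = []
    for proc in remaining:
        print(f"worker {proc.pid} did not stop within {timeout}s, sending kill",
              file=sys.stderr)
        proc.kill()
        killed.append(proc.pid)
    for proc in remaining:
        proc.wait()
    return killed
```

Because the deadline is computed once, a single slow worker cannot use up the waiting time of all the others.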

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-11-22 20:15:30 +01:00
Thomas Lamprecht
7d546fb5fd pvescheduler: do not delay restart artificially
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-11-22 20:14:40 +01:00
Dominik Csapak
983ad9b91b pvescheduler: implement graceful reloading
utilize PVE::Daemon's 'hup' functionality to reload gracefully.

Leaves the children running (if any) and gives them to the new instance
via ENV variables. After loading, check if they are still around.
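
A rough Python sketch of handing child PIDs to a new instance through the environment. The variable name `SCHEDULER_CHILD_PIDS`, the `kind:pid,pid;...` encoding, and all function names are hypothetical; the commit message does not state which names or format pvescheduler uses. The liveness check relies on POSIX `kill` with signal 0.

```python
import os

ENV_VAR = "SCHEDULER_CHILD_PIDS"  # hypothetical variable name


def encode_children(children):
    """Serialize {'kind': [pid, ...]} as 'kind:pid,pid;kind:pid'."""
    return ";".join(
        f"{kind}:{','.join(str(pid) for pid in pids)}"
        for kind, pids in children.items()
        if pids
    )


def decode_children(value):
    """Parse the string produced by encode_children back into a dict."""
    children = {}
    for entry in filter(None, value.split(";")):
        kind, _, pid_list = entry.partition(":")
        children[kind] = [int(pid) for pid in pid_list.split(",") if pid]
    return children


def is_alive(pid):
    """Signal 0 only checks for existence, it does not disturb the process."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def adopt_children(environ):
    """After a reload, keep only the inherited PIDs that are still around."""
    inherited = decode_children(environ.get(ENV_VAR, ""))
    return {
        kind: [pid for pid in pids if is_alive(pid)]
        for kind, pids in inherited.items()
    }
```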

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-11-22 17:19:12 +01:00
Dominik Csapak
4af87395e9 pvescheduler: reworking child pid tracking
previously, systemd timers were responsible for running replication jobs.
those timers would not restart if the previous one is still running.

though trying again while it is running does no harm really, it spams
the log with errors about not being able to acquire the correct lock

to fix this, we rework the handling of child processes such that we only
start one per loop if there is currently none running. for that,
introduce the types of forks we do and allow one child process per type
(for now, we have 'jobs' and 'replication' as types)
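
The per-type tracking can be illustrated with a small Python sketch. The class and its methods are hypothetical; only the two type names come from the message above. `spawn` and `is_alive` are passed in so the example does not need to fork real processes.

```python
class ChildTracker:
    """Allow at most one running child per fork type."""

    def __init__(self, kinds=("jobs", "replication")):
        # None means: no child of this type has been started yet.
        self.running = {kind: None for kind in kinds}

    def maybe_start(self, kind, spawn, is_alive):
        """Start a child of 'kind' unless the previous one is still running.

        Returns True if a new child was started in this loop iteration.
        """
        pid = self.running[kind]
        if pid is not None and is_alive(pid):
            return False  # previous child still busy, skip this round
        self.running[kind] = spawn()
        return True
```

Calling `maybe_start` once per type in every loop iteration gives the behaviour described above: a still-running replication child is simply skipped rather than started a second time.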

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-11-22 17:19:12 +01:00
Dominik Csapak
b8981dbd60 pvescheduler: catch errors in forked children
if '$sub' dies, the error handler of PVE::Daemon triggers, which
initiates a shutdown of the child, resulting in confusing error logs
(e.g. 'got shutdown request, signal running jobs to stop')

instead, run it under 'eval' and print the error to the syslog

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-11-22 17:19:12 +01:00
Thomas Lamprecht
727673eb4f jobs: code/style cleanups
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-11-15 16:12:42 +01:00
Fabian Ebner
db101be037 pvescheduler: simplify code for sleep time calculation
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-11-11 21:04:34 +01:00
Dominik Csapak
fa7d54564a pvescheduler: run jobs from jobs.cfg
PVE/Jobs is responsible for deciding if the job must run (e.g. with a
schedule)

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-11-10 16:11:00 +01:00
Thomas Lamprecht
6385fb8183 replace systemd timer with pvescheduler daemon
The whole thing is already prepared for this, the systemd timer was
just a fixed periodic timer with a frequency of one minute. And we
just introduced it as the assumption was made that less memory usage
would be generated with this approach, AFAIK.

But logging 4+ lines just to say that the timer was started, even if
it does nothing, and that 24/7, is not too cheap and a bit annoying.

So in a first step add a simple daemon, which forks off a child for
running jobs once a minute.
This could still be made a bit more intelligent, i.e., look if we
have jobs to run before forking - as forking is not the cheapest
syscall. Further, we could adapt the sleep interval to the next time
we actually need to run a job (and send a SIGUSR to the daemon if
a job interval changes such that this interval gets narrower).

We try to sync running on minute-change boundaries at start; this
emulates the systemd.timer behaviour we had until now. Also, users can
configure jobs with minute precision, so they probably expect that
those also start really close to a minute change event.
This could be adapted to resync while running, to factor in time drift.
But as long as enough CPU cycles are available we run in correct
monotonic intervals, so this isn't a must, IMO.
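
Syncing to the minute boundary comes down to a small calculation, sketched here in Python. The function names are hypothetical and this is not the Perl code of the daemon; the clock and sleep functions are parameters only so the example can be tried without waiting.

```python
import time


def seconds_until_next_minute(now):
    """Seconds from 'now' (epoch seconds) to the next full minute."""
    return 60.0 - (now % 60.0)


def sleep_until_next_minute(clock=time.time, sleep=time.sleep):
    """Sleep so that the caller wakes up close to a minute change."""
    sleep(seconds_until_next_minute(clock()))
```

Doing this once at start aligns the loop; resyncing on every iteration instead would also compensate for drift, as noted above.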

Another improvement could be locking a bit more fine-grained, i.e.
not on a per-all-local-job-runs basis, but on a per-job (per-guest?)
basis, which would reduce temporary starvation of small
high-periodic jobs through big, less periodic jobs.
We argued that it's the user's fault if such situations arise, but they
can evolve over time without noticing, especially in more complex
setups.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-11-10 16:11:00 +01:00