From 2957ef8041443af97feb6ff29dffbdee2b333fd2 Mon Sep 17 00:00:00 2001
From: Thomas Lamprecht
Date: Tue, 14 Jun 2016 16:57:45 +0200
Subject: [PATCH] ha-manager: add section for recovery after fencing

Describe how and why nodes get selected on a recovery of a fenced service

Signed-off-by: Thomas Lamprecht
---
 ha-manager.adoc | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/ha-manager.adoc b/ha-manager.adoc
index 53ee319..5db5b05 100644
--- a/ha-manager.adoc
+++ b/ha-manager.adoc
@@ -350,6 +350,24 @@ If you have a hardware watchdog available remove its kernel module from the
 blacklist, load it with insmod and restart the 'watchdog-mux' service or reboot
 the node.
 
+Recover Fenced Services
+~~~~~~~~~~~~~~~~~~~~~~~
+
+After a node has failed and was fenced successfully, we recover its services
+to other available nodes and restart them there so that they can provide
+service again.
+
+The selection of the node on which a service gets recovered is influenced
+by the user's group settings, the currently active nodes and their respective
+active service counts.
+First we build a set from the intersection of the user-selected nodes and the
+available nodes. Then the subset of those nodes with the highest priority
+gets chosen as possible nodes for recovery. From this subset we select the
+node with the lowest current active service count as the new node for the
+service. That minimizes the possibility of an overload, which could otherwise
+cause an unresponsive node and, as a result, a chain reaction of node failures
+in the cluster.
+
 Groups
 ------
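
The selection steps described in the new "Recover Fenced Services" section can be sketched roughly as follows. This is only an illustration in Python, not the ha-manager implementation: the function name `select_recovery_node`, its parameters, the convention that a larger number means a higher priority, and the tie-break by node name are all assumptions made for this example.

```python
def select_recovery_node(group_nodes, online_nodes, active_services):
    """Pick a recovery target for a fenced service (illustrative sketch).

    group_nodes:     dict of node name -> group priority (assumed: higher wins)
    online_nodes:    set of node names that are currently available
    active_services: dict of node name -> number of active services
    """
    # Step 1: intersect the user-selected nodes with the available nodes.
    candidates = {n: p for n, p in group_nodes.items() if n in online_nodes}
    if not candidates:
        return None  # no eligible node is online

    # Step 2: keep only the nodes that share the highest priority.
    top_priority = max(candidates.values())
    best = [n for n, p in candidates.items() if p == top_priority]

    # Step 3: choose the node with the lowest active service count.
    # Ties are broken by node name here, which is an assumption.
    return min(best, key=lambda n: (active_services.get(n, 0), n))


group = {"node1": 2, "node2": 2, "node3": 1, "node4": 2}
online = {"node2", "node3", "node4"}  # node1 was fenced
counts = {"node2": 5, "node3": 0, "node4": 3}
print(select_recovery_node(group, online, counts))
```

In this example node3 is the least loaded node, but it is not chosen because it has a lower priority than node2 and node4; of those two, node4 runs fewer services and is selected.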