votequorum: fix expected_votes propagation

It is not correct to blindly accept expected_votes from any node in
the cluster. We can only accept expected_votes from quorate nodes.

A quorate cluster is "always" right and has the correct expected_votes.

One of the ways to trigger the bug:

quorum {
  expected_votes: 8
  auto_tie_breaker: 1
  last_man_standing: 1
}

Start all 8 nodes.
Cleanly shut down 2 nodes.
Wait for last_man_standing (lms) to kick in.
Kill the 3 nodes with the highest nodeid
(we want to retain a quorate partition of 3 nodes).
Start one node again -> the cluster becomes unquorate.

This happens because the node rebooting/rejoining with stale cluster
status propagates an expected_votes of 8, while in reality the
cluster is down to expected_votes: 3.

With the rejoined node the cluster has 4 votes, which is still < 5
(quorum for 8 nodes/votes).
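
The arithmetic above can be checked with a minimal sketch. It assumes a
simple-majority rule (more than half of expected_votes); the function
names are hypothetical and are not taken from the votequorum source.

```c
#include <stdbool.h>

/*
 * Simple-majority threshold: strictly more than half of expected_votes.
 * Assumption for illustration only; not copied from votequorum.
 */
static unsigned int majority_quorum(unsigned int expected_votes)
{
	return expected_votes / 2 + 1;
}

/* A partition is quorate when its votes reach the threshold. */
static bool is_quorate(unsigned int total_votes, unsigned int expected_votes)
{
	return total_votes >= majority_quorum(expected_votes);
}
```

Under this rule, 4 votes against a stale expected_votes of 8 fall short
of the threshold of 5, while the surviving 3 nodes with expected_votes
of 3 meet their threshold of 2.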

In order to avoid this condition, we need to exchange expected_votes
information among nodes, but we cannot blindly trust every node.

1) Allow expected_votes to be changed cluster-wide only if the
   information is coming from a quorate node.
2) Fix node->expected_votes based on quorate status.
3) Allow a joining node to decrease quorum and expected_votes
   if the node is not yet quorate, but it's joining a quorate
   cluster.
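
The three rules can be sketched in isolation as follows. This is a
simplified illustration: struct node_view, struct nodeinfo_msg and
apply_nodeinfo() are hypothetical stand-ins, not the real votequorum
types or handler.

```c
#include <stdbool.h>

/* Hypothetical, reduced view of a node: only the field discussed here. */
struct node_view {
	unsigned int expected_votes;
};

/* Hypothetical, reduced nodeinfo message. */
struct nodeinfo_msg {
	unsigned int expected_votes;
	bool quorate;		/* sender's own view of quorum */
};

/*
 * Apply one incoming nodeinfo message.
 * Returns true when the caller may lower quorum (rule 3).
 */
static bool apply_nodeinfo(struct node_view *us, struct node_view *sender,
			   bool local_quorate, const struct nodeinfo_msg *msg)
{
	bool allow_downgrade = false;

	/* Rule 3: a non-quorate node joining a quorate cluster adopts its view. */
	if (!local_quorate && msg->quorate) {
		allow_downgrade = true;
		us->expected_votes = msg->expected_votes;
	}

	/* Rules 1 and 2: only a quorate sender's expected_votes is trusted. */
	if (msg->quorate) {
		sender->expected_votes = msg->expected_votes;
	} else {
		sender->expected_votes = us->expected_votes;
	}

	return allow_downgrade;
}
```

In the scenario above, a quorate member receiving expected_votes: 8
from the non-quorate rejoining node keeps its own value of 3, while the
rejoining node lowers its value to 3 as soon as it hears from a quorate
member.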

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Fabio M. Di Nitto 2012-01-26 13:14:26 +01:00
parent 88e6830df1
commit b05477859f

@@ -1016,6 +1016,7 @@ static void message_handler_req_exec_votequorum_nodeinfo (
 	int old_expected;
 	nodestate_t old_state;
 	int new_node = 0;
+	int allow_downgrade = 0;
 	ENTER();
@@ -1038,9 +1039,20 @@ static void message_handler_req_exec_votequorum_nodeinfo (
 	/* Update node state */
 	node->votes = req_exec_quorum_nodeinfo->votes;
-	node->expected_votes = req_exec_quorum_nodeinfo->expected_votes;
 	node->state = NODESTATE_MEMBER;
+	if ((!cluster_is_quorate) &&
+	    (req_exec_quorum_nodeinfo->quorate)) {
+		allow_downgrade = 1;
+		us->expected_votes = req_exec_quorum_nodeinfo->expected_votes;
+	}
+	if (req_exec_quorum_nodeinfo->quorate) {
+		node->expected_votes = req_exec_quorum_nodeinfo->expected_votes;
+	} else {
+		node->expected_votes = us->expected_votes;
+	}
 	log_printf(LOGSYS_LEVEL_DEBUG, "nodeinfo message: votes: %d, expected: %d wfa: %d quorate: %d",
 		   req_exec_quorum_nodeinfo->votes,
 		   req_exec_quorum_nodeinfo->expected_votes,
@@ -1064,7 +1076,7 @@ static void message_handler_req_exec_votequorum_nodeinfo (
 	    old_votes != node->votes ||
 	    old_expected != node->expected_votes ||
 	    old_state != node->state) {
-		recalculate_quorum(0, 0);
+		recalculate_quorum(allow_downgrade, 0);
 	}
 	if (!nodeid) {