.\"/*
.\" * Copyright (c) 2012-2014 Red Hat, Inc.
.\" *
.\" * All rights reserved.
.\" *
.\" * Authors: Christine Caulfield <ccaulfie@redhat.com>
.\" *          Fabio M. Di Nitto <fdinitto@redhat.com>
.\" *
.\" * This software licensed under BSD license, the text of which follows:
.\" *
.\" * Redistribution and use in source and binary forms, with or without
.\" * modification, are permitted provided that the following conditions are met:
.\" *
.\" * - Redistributions of source code must retain the above copyright notice,
.\" *   this list of conditions and the following disclaimer.
.\" * - Redistributions in binary form must reproduce the above copyright notice,
.\" *   this list of conditions and the following disclaimer in the documentation
.\" *   and/or other materials provided with the distribution.
.\" * - Neither the name of the MontaVista Software, Inc. nor the names of its
.\" *   contributors may be used to endorse or promote products derived from this
.\" *   software without specific prior written permission.
.\" *
.\" * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
.\" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
.\" * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
.\" * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
.\" * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
.\" * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
.\" * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
.\" * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
.\" * THE POSSIBILITY OF SUCH DAMAGE.
.\" */
|
|
.TH VOTEQUORUM 5 2018-12-14 "corosync Man Page" "Corosync Cluster Engine Programmer's Manual"
|
|
.SH NAME
|
|
votequorum \- Votequorum Configuration Overview
|
|
.SH OVERVIEW
The votequorum service is part of the corosync project. It can optionally be loaded
into the nodes of a corosync cluster to avoid split-brain situations.
It does this by assigning a number of votes to each system in the cluster and ensuring
that cluster operations are only allowed to proceed when a majority of the votes are present.
The service must be loaded into either all nodes or none. If it is loaded into only a subset
of the cluster nodes, the results will be unpredictable.
.PP
The following corosync.conf extract will enable the votequorum service within corosync:
.PP
.nf
quorum {
    provider: corosync_votequorum
}
.fi
.PP
votequorum reads its configuration from corosync.conf. Some values can be changed at runtime,
others are only read at corosync startup. It is very important that those values are consistent
across all the nodes participating in the cluster, or votequorum behaviour will be unpredictable.
.PP
votequorum requires an expected_votes value to function; this can be provided in two ways.
The number of expected votes will be automatically calculated when the nodelist { } section is
present in corosync.conf, or expected_votes can be specified in the quorum { } section. Lack of
both will disable votequorum. If both are present at the same time,
the quorum.expected_votes value will override the one calculated from the nodelist.
.PP
Example (no nodelist) of an 8 node cluster (each node has 1 vote):
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
}
.fi
.PP
Example (with nodelist) of a 3 node cluster (each node has 1 vote):
.nf

quorum {
    provider: corosync_votequorum
}

nodelist {
    node {
        ring0_addr: 192.168.1.1
    }
    node {
        ring0_addr: 192.168.1.2
    }
    node {
        ring0_addr: 192.168.1.3
    }
}
.fi
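.PP
As noted above, if both a nodelist and quorum.expected_votes are present, the
quorum.expected_votes value takes precedence. A purely illustrative sketch of that
override (here votequorum would use 5 expected votes rather than the 3 calculated
from the nodelist; the values are examples, not a recommendation):
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 5
}

nodelist {
    node {
        ring0_addr: 192.168.1.1
    }
    node {
        ring0_addr: 192.168.1.2
    }
    node {
        ring0_addr: 192.168.1.3
    }
}
.fi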
.SH SPECIAL FEATURES
.PP
.B two_node: 1
.PP
Enables two node cluster operations (default: 0).
.PP
The "two node cluster" is a use case that requires special consideration.
With a standard two node cluster, each node with a single vote, there
are 2 votes in the cluster. Using the simple majority calculation
(50% of the votes + 1) to calculate quorum, the quorum would be 2.
This means that both nodes would always have
to be alive for the cluster to be quorate and operate.
.PP
When two_node: 1 is enabled, quorum is set artificially to 1.
.PP
Example configuration 1:

.nf
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}
.fi

.PP
Example configuration 2:

.nf
quorum {
    provider: corosync_votequorum
    two_node: 1
}

nodelist {
    node {
        ring0_addr: 192.168.1.1
    }
    node {
        ring0_addr: 192.168.1.2
    }
}
.fi
.PP
NOTES: enabling two_node: 1 automatically enables wait_for_all. It is
still possible to override wait_for_all by explicitly setting it to 0.
If more than 2 nodes join the cluster, the two_node option is
automatically disabled.
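.PP
Since wait_for_all can be overridden as described above, a purely illustrative
sketch of a two node cluster with WFA explicitly disabled would be:
.nf

quorum {
    provider: corosync_votequorum
    two_node: 1
    wait_for_all: 0
}
.fi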
.PP
.B wait_for_all: 1
.PP
Enables Wait For All (WFA) feature (default: 0).
.PP
The general behaviour of votequorum is to switch a cluster from inquorate to quorate
as soon as possible. For example, in an 8 node cluster, where every node has 1 vote,
expected_votes is set to 8 and quorum is (50% + 1) 5. As soon as 5 (or more) nodes
are visible to each other, the partition of 5 (or more) becomes quorate and can
start operating.
.PP
When WFA is enabled, the cluster will be quorate for the first time
only after all nodes have been visible at least once at the same time.
.PP
This feature has the advantage of avoiding some startup race conditions, at the cost
that all nodes need to be up at the same time at least once before the cluster
can operate.
.PP
A common startup race condition, based on the above example, is that as soon as 5
nodes become quorate, the remaining 3 nodes, which are still offline, will
be fenced.
.PP
It is very useful when combined with last_man_standing (see below).
.PP
Example configuration:
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    wait_for_all: 1
}
.fi
.PP
.B last_man_standing: 1
/
.B last_man_standing_window: 10000
.PP
Enables Last Man Standing (LMS) feature (default: 0).
Tunable last_man_standing_window (default: 10 seconds, expressed in ms).
.PP
The general behaviour of votequorum is to set expected_votes and quorum
at startup (unless modified by the user at runtime, see below) and use
those values during the whole lifetime of the cluster.
.PP
Take, for example, an 8 node cluster where each node has 1 vote: expected_votes
is set to 8 and quorum to 5. This condition allows a total failure of 3
nodes. If a 4th node fails, the cluster becomes inquorate and will
stop providing services.
.PP
Enabling LMS allows the cluster to dynamically recalculate expected_votes
and quorum under specific circumstances. It is essential to enable
WFA when using LMS in High Availability clusters.
.PP
Using the above 8 node cluster example, with LMS enabled the cluster can retain
quorum and continue operating by losing, in a cascade fashion, up to 6 nodes with
only 2 remaining active.
.PP
Example chain of events:
.nf
1) cluster is fully operational with 8 nodes.
   (expected_votes: 8 quorum: 5)

2) 3 nodes die, cluster is quorate with 5 nodes.

3) after last_man_standing_window timer expires,
   expected_votes and quorum are recalculated.
   (expected_votes: 5 quorum: 3)

4) at this point, 2 more nodes can die and
   the cluster will still be quorate with 3.

5) once again, after the last_man_standing_window
   timer expires, expected_votes and quorum are
   recalculated.
   (expected_votes: 3 quorum: 2)

6) at this point, 1 more node can die and
   the cluster will still be quorate with 2.

7) after one more last_man_standing_window timer expires:
   (expected_votes: 2 quorum: 2)
.fi
.PP
NOTES: In order for the cluster to downgrade automatically from 2 nodes
to a 1 node cluster, the auto_tie_breaker feature must also be enabled (see below).
If auto_tie_breaker is not enabled, and one more failure occurs, the
remaining node will not be quorate. LMS does not work with asymmetric voting
schemes; each node must vote 1. LMS is also incompatible with quorum devices:
if last_man_standing is specified in corosync.conf then the quorum device
will be disabled.

.PP
Example configuration 1:
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    last_man_standing: 1
}
.fi
.PP
Example configuration 2 (increase timeout to 20 seconds):
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    last_man_standing: 1
    last_man_standing_window: 20000
}
.fi
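.PP
As the NOTES above explain, WFA should be enabled when using LMS in High
Availability clusters, and auto_tie_breaker (see below) is needed for the final
step down to a single node. A minimal illustrative sketch combining these options
(the values are examples only, not a recommendation for any particular deployment):
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    wait_for_all: 1
    last_man_standing: 1
    auto_tie_breaker: 1
}
.fi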
.PP
.B auto_tie_breaker: 1
.PP
Enables Auto Tie Breaker (ATB) feature (default: 0).
.PP
The general behaviour of votequorum allows the simultaneous failure of up
to 50% - 1 nodes, assuming each node has 1 vote.
.PP
When ATB is enabled, the cluster can suffer up to 50% of the nodes failing
at the same time, in a deterministic fashion. By default the cluster
partition, that is the set of nodes still in contact with the
node that has the lowest nodeid, will remain quorate. The other nodes will
be inquorate. This behaviour can be changed by also specifying
.PP
.B auto_tie_breaker_node: lowest|highest|<list of node IDs>
.PP
\(oqlowest\(cq is the default; \(oqhighest\(cq is similar in that if the current set of
nodes contains the highest nodeid then it will remain quorate. Alternatively,
it is possible to specify a particular node ID or a list of node IDs that will
be required to maintain quorum. If a (space-separated) list is given, the
nodes are evaluated in order: if the first node is present then it will
be used to determine the quorate partition; if that node is not in either
half (i.e. it was not in the cluster before the split) then the second node ID
will be checked, and so on. ATB is incompatible with quorum devices:
if auto_tie_breaker is specified in corosync.conf then the quorum device
will be disabled.
.PP
Example configuration 1:
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    auto_tie_breaker: 1
    auto_tie_breaker_node: lowest
}
.fi
.PP
Example configuration 2:
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    auto_tie_breaker: 1
    auto_tie_breaker_node: 1 3 5
}
.fi
.PP
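As mentioned above, a single node ID can also be given. A minimal illustrative
sketch (the node ID 2 here is just an example value):
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    auto_tie_breaker: 1
    auto_tie_breaker_node: 2
}
.fi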
.PP
.B allow_downscale: 1
.PP
Enables allow downscale (AD) feature (default: 0).
.PP
THIS FEATURE IS INCOMPLETE AND CURRENTLY UNSUPPORTED.
.PP
The general behaviour of votequorum is to never decrease expected votes or quorum.
.PP
When AD is enabled, both expected votes and quorum are recalculated when
a node leaves the cluster in a clean state (normal corosync shutdown process), down
to the configured expected_votes.
.PP
Example use case:
.PP
.nf
1) N node cluster (where N is any value higher than 3)

2) expected_votes set to 3 in corosync.conf

3) only 3 nodes are running

4) admin needs to increase processing power and adds 10 nodes

5) internal expected_votes is automatically set to 13

6) minimum expected_votes is 3 (from configuration)

- up to this point this is standard votequorum behaviour -

7) once the work is done, the admin wants to remove nodes from the cluster

8) using an ordered shutdown the admin can reduce the cluster size
   automatically back to 3, but not below 3, where normal quorum
   operation will work as usual.

.fi
.PP
Example configuration:
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 3
    allow_downscale: 1
}
.fi
allow_downscale implicitly enables EVT (see below).
.PP
.B expected_votes_tracking: 1
.PP
Enables Expected Votes Tracking (EVT) feature (default: 0).
.PP
Expected Votes Tracking stores the highest-seen value of expected votes on disk and uses
that as the minimum value for expected votes in the absence of any higher authority (e.g.
a current quorate cluster). This guards against a group of nodes that becomes detached from
the main cluster and, after a restart, could otherwise have enough votes to provide quorum;
this can happen after using allow_downscale.
.PP
Note that even if the in-memory version of expected_votes is reduced, e.g. by removing nodes
or using corosync-quorumtool, the stored value will still be the highest value seen; it
never gets reduced.
.PP
The value is held in the file ev_tracking (stored in the directory configured in system.state_dir,
or /var/lib/corosync/ when unset), which can be deleted if you
really do need to reduce the expected votes for any reason, such as when the node has been moved
to a different cluster.
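.PP
A minimal illustrative configuration sketch enabling EVT (the expected_votes value
here is just an example):
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    expected_votes_tracking: 1
}
.fi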
.PP
.SH VARIOUS NOTES
.PP
* WFA / LMS / ATB / AD can be used in combination with each other.
.PP
* In order to change the default votes for a node there are two options:
.nf

1) nodelist:

nodelist {
    node {
        ring0_addr: 192.168.1.1
        quorum_votes: 3
    }
    ....
}

2) quorum section (deprecated):

quorum {
    provider: corosync_votequorum
    expected_votes: 2
    votes: 2
}

.fi
In the event that both nodelist and quorum { votes: } are defined, the value
from the nodelist will be used.
.PP
* Only votes, quorum_votes, expected_votes and two_node can be changed at runtime. Everything else
requires a cluster restart.
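.PP
For example, votes and expected_votes are typically adjusted at runtime with
.BR corosync-quorumtool (8)
rather than by editing corosync.conf. An illustrative sketch (see that man page
for the authoritative option list):
.nf

# change expected votes for the whole cluster
corosync-quorumtool -e 8

# change the number of votes for a node (here nodeid 2)
corosync-quorumtool -v 3 -n 2
.fi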
.SH BUGS
No known bugs at the time of writing. The authors are from outer space. Deal with it.
.SH "SEE ALSO"
.BR corosync (8),
.BR corosync.conf (5),
.BR corosync-quorumtool (8),
.BR corosync-qdevice (8),
.BR votequorum_overview (3)
.PP