| .\"/*
 | |
| .\" * Copyright (c) 2012-2014 Red Hat, Inc.
 | |
| .\" *
 | |
| .\" * All rights reserved.
 | |
| .\" *
 | |
| .\" * Authors: Christine Caulfield <ccaulfie@redhat.com>
 | |
| .\" *          Fabio M. Di Nitto   <fdinitto@redhat.com>
 | |
| .\" *
 | |
| .\" * This software licensed under BSD license, the text of which follows:
 | |
| .\" *
 | |
| .\" * Redistribution and use in source and binary forms, with or without
 | |
| .\" * modification, are permitted provided that the following conditions are met:
 | |
| .\" *
 | |
| .\" * - Redistributions of source code must retain the above copyright notice,
 | |
| .\" *   this list of conditions and the following disclaimer.
 | |
| .\" * - Redistributions in binary form must reproduce the above copyright notice,
 | |
| .\" *   this list of conditions and the following disclaimer in the documentation
 | |
| .\" *   and/or other materials provided with the distribution.
 | |
| .\" * - Neither the name of the MontaVista Software, Inc. nor the names of its
 | |
| .\" *   contributors may be used to endorse or promote products derived from this
 | |
| .\" *   software without specific prior written permission.
 | |
| .\" *
 | |
| .\" * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 | |
| .\" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 | |
| .\" * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 | |
| .\" * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 | |
| .\" * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 | |
| .\" * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 | |
| .\" * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 | |
| .\" * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 | |
| .\" * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 | |
| .\" * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
 | |
| .\" * THE POSSIBILITY OF SUCH DAMAGE.
 | |
| .\" */
 | |
| .TH VOTEQUORUM 5 2012-01-24 "corosync Man Page" "Corosync Cluster Engine Programmer's Manual"
 | |
.SH NAME
votequorum \- Votequorum Configuration Overview
.SH OVERVIEW
The votequorum service is part of the corosync project. This service can optionally be loaded
into the nodes of a corosync cluster to avoid split-brain situations.
It does this by assigning a number of votes to each system in the cluster and ensuring
that cluster operations are only allowed to proceed when a majority of the votes are present.
The service must be loaded into either all nodes or none. If it is loaded into only a subset
of the cluster nodes, the results will be unpredictable.
.PP
The following corosync.conf extract will enable the votequorum service within corosync:
.PP
.nf
quorum {
    provider: corosync_votequorum
}
.fi
.PP
votequorum reads its configuration from corosync.conf. Some values can be changed at runtime;
others are only read at corosync startup. It is very important that those values are consistent
across all the nodes participating in the cluster, or votequorum behavior will be unpredictable.
.PP
votequorum requires an expected_votes value to function; this can be provided in two ways.
The number of expected votes is calculated automatically when the nodelist { } section is
present in corosync.conf, or expected_votes can be specified in the quorum { } section. If
neither is present, votequorum is disabled. If both are present at the same time,
the quorum.expected_votes value overrides the one calculated from the nodelist.
.PP
Example (no nodelist) of an 8 node cluster (each node has 1 vote):
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
}
.fi
.PP
Example (with nodelist) of a 3 node cluster (each node has 1 vote):
.nf

quorum {
    provider: corosync_votequorum
}

nodelist {
    node {
        ring0_addr: 192.168.1.1
    }
    node {
        ring0_addr: 192.168.1.2
    }
    node {
        ring0_addr: 192.168.1.3
    }
}
.fi
.SH SPECIAL FEATURES
.PP
.B two_node: 1
.PP
Enables two node cluster operations (default: 0).
.PP
The "two node cluster" is a use case that requires special consideration.
With a standard two node cluster, where each node has a single vote, there
are 2 votes in the cluster. Using the simple majority calculation
(50% of the votes + 1) to calculate quorum, the quorum would be 2.
This means that both nodes would always have
to be alive for the cluster to be quorate and operate.
.PP
When two_node: 1 is enabled, quorum is artificially set to 1.
.PP
Example configuration 1:

.nf
quorum {
    provider: corosync_votequorum
    expected_votes: 2
    two_node: 1
}
.fi

.PP
Example configuration 2:

.nf
quorum {
    provider: corosync_votequorum
    two_node: 1
}

nodelist {
    node {
        ring0_addr: 192.168.1.1
    }
    node {
        ring0_addr: 192.168.1.2
    }
}
.fi
.PP
NOTES: enabling two_node: 1 automatically enables wait_for_all. It is
still possible to override wait_for_all by explicitly setting it to 0.
If more than 2 nodes join the cluster, the two_node option is
automatically disabled.
.PP
.B wait_for_all: 1
.PP
Enables Wait For All (WFA) feature (default: 0).
.PP
The general behaviour of votequorum is to switch a cluster from inquorate to quorate
as soon as possible. For example, in an 8 node cluster where every node has 1 vote,
expected_votes is set to 8 and quorum is (50% + 1) 5. As soon as 5 (or more) nodes
are visible to each other, the partition of 5 (or more) becomes quorate and can
start operating.
.PP
When WFA is enabled, the cluster will be quorate for the first time
only after all nodes have been visible at least once at the same time.
.PP
This feature has the advantage of avoiding some startup race conditions, at the cost
that all nodes need to be up at the same time at least once before the cluster
can operate.
.PP
A common startup race condition, based on the above example, is that as soon as 5
nodes become quorate, with the other 3 still offline, those 3 nodes will
be fenced.
.PP
WFA is very useful when combined with last_man_standing (see below).
.PP
Example configuration:
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    wait_for_all: 1
}
.fi
.PP
.B last_man_standing: 1
/
.B last_man_standing_window: 10000
.PP
Enables Last Man Standing (LMS) feature (default: 0).
Tunable last_man_standing_window (default: 10 seconds, expressed in ms).
.PP
The general behaviour of votequorum is to set expected_votes and quorum
at startup (unless modified by the user at runtime, see below) and use
those values during the whole lifetime of the cluster.
.PP
Taking, for example, an 8 node cluster where each node has 1 vote, expected_votes
is set to 8 and quorum to 5. This allows a total failure of 3
nodes. If a 4th node fails, the cluster becomes inquorate and will
stop providing services.
.PP
Enabling LMS allows the cluster to dynamically recalculate expected_votes
and quorum under specific circumstances. It is essential to enable
WFA when using LMS in High Availability clusters.
.PP
Using the above 8 node cluster example, with LMS enabled the cluster can retain
quorum and continue operating by losing, in a cascading fashion, up to 6 nodes, with
only 2 remaining active.
.PP
Example chain of events:
.nf
1) cluster is fully operational with 8 nodes.
   (expected_votes: 8 quorum: 5)

2) 3 nodes die, cluster is quorate with 5 nodes.

3) after last_man_standing_window timer expires,
   expected_votes and quorum are recalculated.
   (expected_votes: 5 quorum: 3)

4) at this point, 2 more nodes can die and
   cluster will still be quorate with 3.

5) once again, after last_man_standing_window
   timer expires expected_votes and quorum are
   recalculated.
   (expected_votes: 3 quorum: 2)

6) at this point, 1 more node can die and
   cluster will still be quorate with 2.

7) after one more last_man_standing_window timer
   expires, expected_votes and quorum are recalculated.
   (expected_votes: 2 quorum: 2)
.fi
.PP
NOTES: In order for the cluster to downgrade automatically from a 2 node
to a 1 node cluster, the auto_tie_breaker feature must also be enabled (see below).
If auto_tie_breaker is not enabled and one more failure occurs, the
remaining node will not be quorate. LMS does not work with asymmetric voting
schemes; each node must have 1 vote. LMS is also incompatible with quorum devices:
if last_man_standing is specified in corosync.conf then the quorum device
will be disabled.

.PP
Example configuration 1:
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    last_man_standing: 1
}
.fi
.PP
Example configuration 2 (increase timeout to 20 seconds):
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    last_man_standing: 1
    last_man_standing_window: 20000
}
.fi
.PP
.B auto_tie_breaker: 1
.PP
Enables Auto Tie Breaker (ATB) feature (default: 0).
.PP
The general behaviour of votequorum allows a simultaneous failure of up
to 50% - 1 of the nodes, assuming each node has 1 vote.
.PP
When ATB is enabled, the cluster can suffer up to 50% of the nodes failing
at the same time, in a deterministic fashion. By default, the cluster
partition, i.e. the set of nodes that is still in contact with the
node that has the lowest nodeid, will remain quorate. The other nodes will
be inquorate. This behaviour can be changed by also specifying
.PP
.B auto_tie_breaker_node: lowest|highest|<list of node IDs>
.PP
\&'lowest' is the default; 'highest' is similar in that if the current set of
nodes contains the highest nodeid then it will remain quorate. Alternatively,
it is possible to specify a particular node ID or a list of node IDs that will
be required to maintain quorum. If a (space-separated) list is given, the
nodes are evaluated in order: if the first node is present then it will
be used to determine the quorate partition; if that node is not in either
half (i.e. it was not in the cluster before the split) then the second node ID
will be checked, and so on. ATB is incompatible with quorum devices:
if auto_tie_breaker is specified in corosync.conf then the quorum device
will be disabled.
.PP
Example configuration 1:
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    auto_tie_breaker: 1
    auto_tie_breaker_node: lowest
}
.fi
.PP
Example configuration 2:
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 8
    auto_tie_breaker: 1
    auto_tie_breaker_node: 1 3 5
}
.fi
.PP
.B allow_downscale: 1
.PP
Enables Allow Downscale (AD) feature (default: 0).
.PP
THIS FEATURE IS INCOMPLETE AND CURRENTLY UNSUPPORTED.
.PP
The general behaviour of votequorum is to never decrease expected votes or quorum.
.PP
When AD is enabled, both expected votes and quorum are recalculated when
a node leaves the cluster in a clean state (normal corosync shutdown process), down
to the configured expected_votes.
.PP
Example use case:
.PP
.nf
1) N node cluster (where N is any value higher than 3)

2) expected_votes set to 3 in corosync.conf

3) only 3 nodes are running

4) the admin needs to increase processing power and adds 10 nodes

5) internal expected_votes is automatically set to 13

6) minimum expected_votes is 3 (from configuration)

- up to this point this is standard votequorum behavior -

7) once the work is done, the admin wants to remove nodes from the cluster

8) using an ordered shutdown the admin can reduce the cluster size
   automatically back to 3, but not below 3, where normal quorum
   operation will work as usual.

.fi
.PP
Example configuration:
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 3
    allow_downscale: 1
}
.fi
allow_downscale implicitly enables EVT (see below).
.PP
.B expected_votes_tracking: 1
.PP
Enables Expected Votes Tracking (EVT) feature (default: 0).
.PP
Expected Votes Tracking stores the highest-seen value of expected votes on disk and uses
that as the minimum value for expected votes in the absence of any higher authority (e.g.
a currently quorate cluster). This is useful when a group of nodes becomes detached from
the main cluster and, after a restart, could otherwise have enough votes to claim quorum,
which can happen after using allow_downscale.
.PP
Note that even if the in-memory version of expected_votes is reduced, e.g. by removing nodes
or by using corosync-quorumtool, the stored value will still be the highest value seen; it
never gets reduced.
.PP
The value is held in the file /var/lib/corosync/ev_tracking, which can be deleted if you
really do need to reduce the expected votes for any reason, for example because the node
has been moved to a different cluster.
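.PP
A minimal configuration sketch in the style of the examples above (the expected_votes
value of 3 is an arbitrary illustration):
.nf

quorum {
    provider: corosync_votequorum
    expected_votes: 3
    expected_votes_tracking: 1
}
.fi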
.SH VARIOUS NOTES
.PP
* WFA / LMS / ATB / AD can be used in combination with each other.
.PP
* There are two ways to change the default number of votes for a node:
.nf

1) nodelist:

nodelist {
    node {
        ring0_addr: 192.168.1.1
        quorum_votes: 3
    }
    ....
}

2) quorum section (deprecated):

quorum {
    provider: corosync_votequorum
    expected_votes: 2
    votes: 2
}

.fi
In the event that both the nodelist and quorum { votes: } are defined, the value
from the nodelist will be used.
.PP
* Only votes, quorum_votes, expected_votes and two_node can be changed at runtime. Everything else
requires a cluster restart.
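.PP
As an illustration, such runtime changes are typically made with corosync-quorumtool;
the values and node id below are arbitrary examples, see corosync-quorumtool(8) for the
authoritative option list:
.nf

# show the current quorum status
corosync-quorumtool -s

# change the cluster-wide expected_votes to 8
corosync-quorumtool -e 8

# change the number of votes for the node with nodeid 2
corosync-quorumtool -v 2 -n 2
.fi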
.SH BUGS
No known bugs at the time of writing. The authors are from outer space. Deal with it.
.SH "SEE ALSO"
.BR corosync (8),
.BR corosync.conf (5),
.BR corosync-quorumtool (8),
.BR corosync-qdevice (8),
.BR votequorum_overview (8)
.PP