mirror of
https://git.proxmox.com/git/mirror_corosync-qdevice
synced 2025-08-06 05:48:26 +00:00

Used the code from corosync master (31ddba64a2726bcedf81eb84df2e2da4846832f7) Signed-off-by: Jan Friesse <jfriesse@redhat.com>
462 lines
14 KiB
Groff
462 lines
14 KiB
Groff
.\"/*
|
|
.\" * Copyright (C) 2016-2017 Red Hat, Inc.
|
|
.\" *
|
|
.\" * All rights reserved.
|
|
.\" *
|
|
.\" * Author: Jan Friesse <jfriesse@redhat.com>
|
|
.\" *
|
|
.\" * This software licensed under BSD license, the text of which follows:
|
|
.\" *
|
|
.\" * Redistribution and use in source and binary forms, with or without
|
|
.\" * modification, are permitted provided that the following conditions are met:
|
|
.\" *
|
|
.\" * - Redistributions of source code must retain the above copyright notice,
|
|
.\" * this list of conditions and the following disclaimer.
|
|
.\" * - Redistributions in binary form must reproduce the above copyright notice,
|
|
.\" * this list of conditions and the following disclaimer in the documentation
|
|
.\" * and/or other materials provided with the distribution.
|
|
.\" * - Neither the name of Red Hat, Inc. nor the names of its
|
|
.\" * contributors may be used to endorse or promote products derived from this
|
|
.\" * software without specific prior written permission.
|
|
.\" *
|
|
.\" * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
|
|
.\" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
|
.\" * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
|
.\" * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
|
|
.\" * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
|
|
.\" * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
|
|
.\" * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
|
.\" * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
|
|
.\" * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
|
|
.\" * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
|
|
.\" * THE POSSIBILITY OF SUCH DAMAGE.
|
|
.\" */
|
|
.TH COROSYNC-QDEVICE 8 2017-10-17
|
|
.SH NAME
|
|
corosync-qdevice \- QDevice daemon
|
|
.SH SYNOPSIS
|
|
.B "corosync-qdevice [-dfh] [-S option=value[,option2=value2,...]]"
|
|
|
|
.SH DESCRIPTION
|
|
.B corosync-qdevice
|
|
is a daemon running on each node of a cluster. It provides a configured
|
|
number of votes to the
|
|
quorum subsystem based on a third-party arbitrator's decision. Its primary use
|
|
is to allow a cluster to sustain more node failures than standard quorum rules allow.
|
|
It is recommended for clusters with an even number of nodes and highly recommended
|
|
for 2 node clusters.
|
|
.SH OPTIONS
|
|
.TP
|
|
.B -d
|
|
Forcefully turn on debug information without the need to change corosync.conf.
|
|
.TP
|
|
.B -f
|
|
Do not daemonize, run in the foreground.
|
|
.TP
|
|
.B -h
|
|
Show short help text
|
|
.TP
|
|
.B -S
|
|
Set advanced settings described in its own section below. This option
|
|
shouldn't be generally used because most of the options are
|
|
not safe to change.
|
|
.SH CONFIGURATION
|
|
.B corosync-qdevice
|
|
reads its configuration from corosync.conf file.
|
|
|
|
The main configuration is within
|
|
.B quorum.device
|
|
sub-key. Each model also has its own configuration within a
|
|
similarly named sub-key.
|
|
.TP
|
|
.B model
|
|
Specifies the model to be used. This parameter is required.
|
|
.B corosync-qdevice
|
|
is modular and is able to support multiple different models. The model basically
|
|
defines what type of arbitrator is used. Currently only
|
|
.I net
|
|
is supported.
|
|
.TP
|
|
.B timeout
|
|
Specifies how often
|
|
.B corosync-qdevice
|
|
should call the votequorum_poll function. It is also used by the
|
|
.I net
|
|
model to adjust
|
|
its hearbeat timeout. It is recommended that you don't change this value.
|
|
Default is
|
|
.IR 10000 .
|
|
.TP
|
|
.B sync_timeout
|
|
Specifies how often
|
|
.B corosync-qdevice
|
|
should call the votequorum_poll function during a sync phase. It is recommended that you don't change this value.
|
|
Default is
|
|
.IR 30000 .
|
|
.TP
|
|
.B votes
|
|
The number of votes provided to the cluster by qdevice. Default is (number_of_nodes - 1) or generally
|
|
sum(votes_per_node) - 1.
|
|
|
|
.PP
|
|
.B quorum.device.heuristics
|
|
subkey holds the configuration of the heuristics. Heuristics are set of commands executed locally on
|
|
startup, cluster membership change, successful connect to
|
|
.B corosync-qnetd
|
|
and optionally also at regular times. Commands are executed in parallel.
|
|
When all commands finish successfully
|
|
(their return error code is zero) on time,
|
|
heuristics have passed, otherwise they have failed. The heuristics result is sent to
|
|
.B corosync-qnetd
|
|
and there it's used in calculations to determine which partition should be quorate.
|
|
.TP
|
|
.B timeout
|
|
Specifies maximum time in milliseconds how long
|
|
.B corosync-qdevice
|
|
waits till the heuristics commands finish. If some command doesn't finish before the timeout, it's
|
|
killed and heuristics fail. This timeout is used for heuristics executed at regular times.
|
|
Default value is half of the
|
|
.BR quorum.device.timeout ", so"
|
|
.IR 5000 .
|
|
.TP
|
|
.B sync_timeout
|
|
Similar to quorum.device.heuristics.timeout but used during membership changes. Default
|
|
value is half of the
|
|
.BR quorum.device.sync_timeout ", so"
|
|
.IR 15000 .
|
|
.TP
|
|
.B interval
|
|
Specifies interval between two regular heuristics execution. Default value is
|
|
3 *
|
|
.BR quorum.device.timeout ", so"
|
|
.IR 30000 .
|
|
.TP
|
|
.B mode
|
|
Can be one of
|
|
.IR on ", " sync " or " off
|
|
and specifies mode of operation of heuristics. Default is
|
|
.IR off ,
|
|
which means heuristics are disabled. When
|
|
.I sync
|
|
is set, heuristics are executed only during startup, membership change and when connection
|
|
to
|
|
.B corosync-qnetd
|
|
is established. When heuristics should be running also on regular basis, this option
|
|
should be set to
|
|
.I on
|
|
value.
|
|
.TP
|
|
.B exec_NAME
|
|
defines executables.
|
|
.I NAME
|
|
can be arbitrary valid cmap key name string and it has no special meaning.
|
|
The value of this variable must contain a command to execute. The value is parsed (split)
|
|
into arguments similarly as Bourne shell would do. Quoting is possible by
|
|
using backslash and double quotes.
|
|
|
|
.PP
|
|
.B quorum.device.net
|
|
subkey holds the configuration for
|
|
.B model
|
|
.IR net .
|
|
.TP
|
|
.B tls
|
|
Can be one of
|
|
.IR on ", " off " or " required
|
|
and specifies if tls should be used.
|
|
.I on
|
|
means a connection with TLS is attempted first, but if the server doesn't advertise TLS support
|
|
then non-TLS will be used.
|
|
.I off
|
|
is used then TLS is not required and it's then not even tried. This mode is the
|
|
only one which doesn't need a properly initialized NSS database.
|
|
.I required
|
|
means TLS is required and if the server doesn't support TLS, qdevice will
|
|
exit with error message. Default is
|
|
.IR on .
|
|
.TP
|
|
.B host
|
|
Specifies the IP address or host name of the qnetd server to be used. This parameter
|
|
is required.
|
|
.TP
|
|
.B port
|
|
Specifies TCP port of qnetd server. Default is
|
|
.IR 5403 .
|
|
.TP
|
|
.B algorithm
|
|
Decision algorithm. Can be one of the
|
|
.I ffsplit
|
|
or
|
|
.IR lms .
|
|
(actually there are also
|
|
.I test
|
|
and
|
|
.IR 2nodelms ,
|
|
both of which are mainly for developers and shouldn't be used for production clusters).
|
|
For a description of what each algorithm means and how the algorithms differ see their
|
|
individual sections.
|
|
Default value is
|
|
.IR ffsplit .
|
|
.TP
|
|
.B tie_breaker
|
|
can be one of
|
|
.IR lowest ", " highest
|
|
or valid_node_id (number) values. It's used as a fallback if qdevice has to decide between two or more
|
|
equal partitions.
|
|
.I lowest
|
|
means the partition with the lowest node id is chosen.
|
|
.I highest
|
|
means the partition with highest node id is chosen. And valid_node_id means that the partition
|
|
containing the node with the given node id is chosen.
|
|
Default is
|
|
.IR lowest .
|
|
.TP
|
|
.B connect_timeout
|
|
Timeout when
|
|
.B corosync-qdevice
|
|
is trying to connect to
|
|
.B corosync-qnetd
|
|
host. Default is 0.8 *
|
|
.BR quorum.sync_timeout .
|
|
.TP
|
|
.B force_ip_version
|
|
can be one of
|
|
.I 0|4|6
|
|
and forces the software to use the given IP version.
|
|
.I 0
|
|
(default value) means IPv6 is preferred and IPv4 should be used as a fallback.
|
|
|
|
.PP
|
|
Logging configuration is within the
|
|
.B logging
|
|
directive.
|
|
.B corosync-qdevice
|
|
parses and supports most of the options with exception of
|
|
.BR to_logfile ", " logfile
|
|
and
|
|
.BR logfile_priority .
|
|
The
|
|
.B logger_subsys
|
|
sub-directive can be also used if
|
|
.B subsys
|
|
is set to
|
|
.IR QDEVICE .
|
|
|
|
.PP
|
|
For
|
|
.B corosync-qdevice
|
|
to work correctly, the
|
|
.B nodelist
|
|
directive has to be used and properly configured. Also the
|
|
.I net
|
|
model requires that
|
|
.B totem.cluster_name
|
|
option is set.
|
|
|
|
.SH MODEL NET TLS CONFIGURATION
|
|
For
|
|
.B model
|
|
.I net
|
|
to work using TLS, it's necessary to create the NSS database, import Qnetd
|
|
CA certificate, and get/distribute a valid client certificate.
|
|
|
|
If pcs is used (recommended) the following steps are not needed because pcs does them automatically.
|
|
|
|
.B corosync-qdevice-net-certutil
|
|
is the tool to perform required actions semi-automatically. Please consult the help output of
|
|
it and its man page. For a first time configuration it may make sense to start with the
|
|
.B -Q
|
|
option.
|
|
|
|
If TLS is not required just edit corosync.conf file and set
|
|
.B quorum.device.net.tls
|
|
to
|
|
.IR off .
|
|
|
|
.SH MODEL NET ALGORITHMS
|
|
Algorithms are used to change behavior of how
|
|
.B corosync-qnetd
|
|
provides votes to a given node/partition. Currently there are two algorithms supported.
|
|
.TP
|
|
.B ffsplit
|
|
This one makes sense only for clusters with an even number of nodes. It provides exactly one
|
|
vote to the partition with the highest number of active nodes. If there are two exactly
|
|
similar partitions,
|
|
it provides its vote to the partition with higher score. The score is computed
|
|
as (number_of_connected_nodes +
|
|
number_of_connected_nodes_with_passed_heuristics - number_of_connected_nodes_with_failed_heuristics)
|
|
If the scores are equal, the vote is provided to partition with the most clients connected to the qnetd
|
|
server. If this number is also equal, then the tie_breaker is used. It is able to transition
|
|
its vote if the currently active partition becomes partitioned and a non-active partition
|
|
still has at least 50% of the active nodes. Because of this, a vote is not provided
|
|
if the qnetd connection is not active.
|
|
|
|
To use this algorithm it's required to set the number of votes per node to 1 (default)
|
|
and the qdevice number of votes has to be also 1. This is achieved by setting
|
|
.B quorum.device.votes
|
|
key in corosync.conf file to 1.
|
|
.TP
|
|
.B lms
|
|
Last-man-standing. If the node is the only one left in the cluster that can see the
|
|
qnetd server then we return a vote.
|
|
|
|
If more than one node can see the qnetd server but some nodes can't
|
|
see each other then the cluster is divided up into 'partitions' based on
|
|
their ring_id and this algorithm returns a vote to the partition with highest
|
|
heuristics score (computed the same way as for the
|
|
.B ffsplit
|
|
algorithm), or if there is more than 1 partition with equal scores,
|
|
the largest active partition or,
|
|
if there is more than 1 equal partition, the partition that contains the tie_breaker
|
|
node (lowest, highest, etc). For LMS to work, the number
|
|
of qdevice votes has to be set to default (so just delete
|
|
.B quorum.device.votes
|
|
key from corosync.conf).
|
|
|
|
.SH ADVANCED SETTINGS
|
|
Set by using
|
|
.B -S
|
|
option. The default value is shown in parentheses) Options
|
|
beginning with
|
|
.B net_
|
|
prefix are specific to
|
|
.B model
|
|
.IR net .
|
|
.TP
|
|
.B lock_file
|
|
Lock file location. (/var/run/corosync-qdevice/corosync-qdevice.pid)
|
|
.TP
|
|
.B local_socket_file
|
|
Internal IPC socket file location. (/var/run/corosync-qdevice/corosync-qdevice.sock)
|
|
.TP
|
|
.B local_socket_backlog
|
|
Parameter passed to listen syscall. (10)
|
|
.TP
|
|
.B max_cs_try_again
|
|
How many times to retry the call to a corosync function which has returned CS_ERR_TRY_AGAIN. (10)
|
|
.TP
|
|
.B votequorum_device_name
|
|
Name used for qdevice registration. (Qdevice)
|
|
.TP
|
|
.B ipc_max_clients
|
|
Maximum allowed simultaneous IPC clients. (10)
|
|
.TP
|
|
.B ipc_max_receive_size
|
|
Maximum size of a message received by IPC client. (4096)
|
|
.TP
|
|
.B ipc_max_send_size
|
|
Maximum size of a message allowed to be sent to an IPC client. (65536)
|
|
.TP
|
|
.B master_wins
|
|
Force enable/disable master wins. (default is model)
|
|
.TP
|
|
.B heuristics_ipc_max_send_buffers
|
|
Maximum number of heuristics worker send buffers. (128)
|
|
.TP
|
|
.B heuristics_ipc_max_send_receive_size
|
|
Maximum size of a message allowed to be send to, or received from heuristics worker. (4096)
|
|
.TP
|
|
.B heuristics_min_timeout
|
|
Minimum heuristics timeout accepted by client in ms. (1000)
|
|
.TP
|
|
.B heuristics_max_timeout
|
|
Maximum heuristics timeout accepted by client in ms. (120000)
|
|
.TP
|
|
.B heuristics_min_interval
|
|
Minimum heuristics interval accepted by client in ms. (1000)
|
|
.TP
|
|
.B heuristics_max_interval
|
|
Maximum heuristics interval accepted by client in ms. (3600000)
|
|
.TP
|
|
.B heuristics_max_execs
|
|
Maximum number of exec_ commands. (32)
|
|
.TP
|
|
.B heuristics_use_execvp
|
|
Use execvp instead of execv for executing commands. (off)
|
|
.TP
|
|
.B heuristics_max_processes
|
|
Maximum number of processes running at one time. (160)
|
|
.TP
|
|
.B heuristics_kill_list_interval
|
|
Interval between status is gathered and eventually signal is sent
|
|
to processes which didn't finished on time in ms. (5000)
|
|
.TP
|
|
.B net_nss_db_dir
|
|
NSS database directory. (/etc/corosync/qdevice/net/nssdb)
|
|
.TP
|
|
.B net_initial_msg_receive_size
|
|
Initial (used during connection parameters negotiation)
|
|
maximum size of the receive buffer for message (maximum
|
|
allowed message size received from qnetd). (32768)
|
|
.TP
|
|
.B net_initial_msg_send_size
|
|
Initial (used during connection parameter negotiation)
|
|
maximum size of one send buffer (message) to be sent to server. (32768)
|
|
.TP
|
|
.B net_min_msg_send_size
|
|
Minimum required size of one send buffer (message) to be sent to server. (32768)
|
|
.TP
|
|
.B net_max_msg_receive_size
|
|
Maximum allowed size of receive buffer for a message sent by server. (16777216)
|
|
.TP
|
|
.B net_max_send_buffers
|
|
Maximum number of send buffers. (10)
|
|
.TP
|
|
.B net_nss_qnetd_cn
|
|
Canonical name of qnetd server certificate. (Qnetd Server)
|
|
.TP
|
|
.B net_nss_client_cert_nickname
|
|
NSS nickname of qdevice client certificate. (Cluster Cert)
|
|
.TP
|
|
.B net_heartbeat_interval_min
|
|
Minimum heartbeat timeout accepted by client in ms. (1000)
|
|
.TP
|
|
.B net_heartbeat_interval_max
|
|
Maximum heartbeat timeout accepted by client in ms. (120000)
|
|
.TP
|
|
.B net_min_connect_timeout
|
|
Minimum connection timeout accepted by client in ms. (1000)
|
|
.TP
|
|
.B net_max_connect_timeout
|
|
Maximum connection timeout accepted by client in ms. (120000)
|
|
.TP
|
|
.B net_test_algorithm_enabled
|
|
Enable test algorithm. (if built with --enable-debug on, otherwise off)
|
|
|
|
.SH EXAMPLE
|
|
Define qdevice with
|
|
.I net
|
|
model connecting to qnetd running on qnetd.example.org host, using
|
|
.I ffsplit
|
|
algorithm.
|
|
Heuristics is set to
|
|
.I sync
|
|
mode and executes two commands.
|
|
|
|
.nf
|
|
quorum {
|
|
provider: corosync_votequorum
|
|
device {
|
|
votes: 1
|
|
model: net
|
|
net {
|
|
tls: on
|
|
host: qnetd.example.org
|
|
algorithm: ffsplit
|
|
}
|
|
heuristics {
|
|
mode: sync
|
|
exec_ping: /bin/ping -q -c 1 "www.example.org"
|
|
exec_test_txt_exists: /usr/bin/test -f /tmp/test.txt
|
|
}
|
|
}
|
|
.fi
|
|
.SH SEE ALSO
|
|
.BR corosync-qdevice-tool (8)
|
|
.BR corosync-qdevice-net-certutil (8)
|
|
.BR corosync-qnetd (8)
|
|
.BR corosync.conf (5)
|
|
.SH AUTHOR
|
|
Jan Friesse
|
|
.PP
|