The configuration option quorum.device.sync_timeout sets the qdevice
poll timeout for the synchronization phase. The default value is 30
seconds.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Fix several places where 'then' is used instead of 'than' in error
messages and a comment.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
The previous safe_atoi didn't check the range of input values, so if,
for example, the user entered -1 as the token timeout, it was converted
to UINT32_MAX without letting the user know.
Another safe_atoi problem was the use of strtol. This works well on
64-bit systems, where a long integer is usually 64 bits long, but on
32-bit systems it is usually only 32 bits long. Because strtol returns
a signed integer, it was not possible to enter a 32-bit value with the
highest bit set.
The solution is to use strtoll, whose return type (long long) is
guaranteed to be at least 64 bits long, and to check the value range.
The error message now also contains information about the expected
value range.
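The range check described above can be illustrated with a minimal Python
sketch. This is not the C code of safe_atoi; the function name
parse_uint32 and its parameters are hypothetical, and Python's unbounded
integers stand in for the 64-bit result of strtoll.

```python
import re

UINT32_MAX = 2**32 - 1


def parse_uint32(text, min_val=0, max_val=UINT32_MAX):
    """Parse a decimal string, rejecting empty, malformed or out-of-range input.

    Hypothetical helper for illustration only.
    """
    stripped = text.strip()
    # Accept only an optional sign followed by ASCII digits.
    if not re.fullmatch(r"[+-]?[0-9]+", stripped):
        raise ValueError(f"'{text}' is not a valid number")
    # Python integers are unbounded, so sign and magnitude survive intact
    # (the role the wider strtoll result plays in C).
    value = int(stripped, 10)
    if not min_val <= value <= max_val:
        # The message names the expected range, as the commit describes.
        raise ValueError(
            f"{value} is out of range (expected {min_val}..{max_val})")
    return value
```

With this sketch, "-1" is rejected instead of silently wrapping around,
while 4294967295 (highest bit set) is accepted.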
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
The token coefficient is used only when a nodelist is specified and
contains at least 3 nodes. If so, the real token timeout is computed as
token + (number_of_nodes - 2) * token_coefficient. This allows the
cluster to scale without manually changing the token timeout every time
a new node is added. The value can be set to 0, which effectively
disables this feature.
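As a worked example of the formula above, here is a small Python sketch.
The function name and the sample values (token 1000, coefficient 650)
are chosen for illustration only and are not taken from this commit.

```python
def effective_token_timeout(token, token_coefficient, number_of_nodes=None):
    """Return the real token timeout.

    number_of_nodes is None when no nodelist is specified (assumption of
    this sketch); the coefficient then has no effect.
    """
    if number_of_nodes is not None and number_of_nodes >= 3:
        return token + (number_of_nodes - 2) * token_coefficient
    return token


# Five nodes: 1000 + (5 - 2) * 650
print(effective_token_timeout(1000, 650, 5))
# Two nodes, or coefficient 0: the configured token is used unchanged
print(effective_token_timeout(1000, 650, 2))
print(effective_token_timeout(1000, 0, 10))
```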
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
This patch adds the option to store expected_votes to
persistent storage. This is needed for allow_downscale
to operate properly.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Pass an icmap hashtable into coroparse so we can load the configuration
into a temporary one during reload.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
If the configuration file contains a closing brace before an opening
brace at top level, configuration parsing stops and the file is not
completely parsed. The solution is to detect the extra closing brace and
display an error.
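The detection can be sketched as a simple depth counter. The Python
below is an illustration under assumptions, not the coroparse code: the
function name, the line-oriented input and the error text are all
hypothetical.

```python
def check_extra_closing_brace(lines):
    """Raise ValueError when a closing brace appears with no open section."""
    depth = 0
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if stripped.endswith("{"):
            depth += 1
        elif stripped == "}":
            if depth == 0:
                # Closing brace at top level: report it instead of
                # silently ending the parse.
                raise ValueError(f"line {lineno}: unexpected closing brace")
            depth -= 1
```

A well-formed input such as ["totem {", "version: 2", "}"] passes, while
an input starting with "}" is reported with its line number.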
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
If a colon was entered as the last character of a value, it was deleted.
This made it impossible to enter a (legal) IPv6 address ending with ::
(like fed0::).
Also, when a line contained both a brace and a colon, it was parsed twice
(first as key = value and second as start of section). This is handled
by a continue in the if branch.
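The trailing-colon case can be illustrated by splitting only at the
first colon. This Python sketch is an assumption about the intended
behavior, not the coroparse implementation; split_key_value is a
hypothetical name.

```python
def split_key_value(line):
    """Split 'key: value' at the first colon, keeping later colons intact."""
    key, sep, value = line.partition(":")
    if not sep:
        return None  # no colon, so this is not a key/value line
    return key.strip(), value.strip()


# The trailing "::" of the IPv6 address is part of the value and is kept.
print(split_key_value("bindnetaddr: fed0::"))
```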
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
The OpenIndiana toolchain is rather messy. This is only a first cut.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
In the previous incarnation of qdisk + cman, master_wins was restricted
to 2-node clusters only.
In this new version it is possible to use master_wins for any cluster
size.
Let's assume a 4-node cluster. Each node votes 1, qdevice votes 3.
node 1 becomes qdevice master
nodes 2/3/4 do not
In case of a split (let's assume 2/2), the votes per partition are:
partition 1: {4, 1}
partition 2: {1, 1}
Node 2 in partition 1 would normally be unquorate, leaving effectively
only node 1 active.
master_wins allows node 2 to recognize that it is part of a quorate
partition (since node 1 is broadcasting that qdevice is voting) and
retain quorum.
Node 1 never loses quorate status since qdevice is voting there.
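The arithmetic of this example can be sketched in Python. Two things
here are assumptions of the sketch, not statements from this commit:
quorum is taken as expected_votes // 2 + 1, and a node that is not the
qdevice master is modeled as not counting the qdevice votes unless
master_wins is enabled.

```python
NODE_VOTES = 1
QDEVICE_VOTES = 3
NODES = 4


def quorum_for(expected_votes):
    # Assumed majority rule: more than half of the expected votes.
    return expected_votes // 2 + 1


def node_is_quorate(nodes_in_partition, counts_qdevice):
    """Check quorum as seen by one node of a partition."""
    expected = NODES * NODE_VOTES + QDEVICE_VOTES  # 7
    total = nodes_in_partition * NODE_VOTES
    if counts_qdevice:
        total += QDEVICE_VOTES
    return total >= quorum_for(expected)


# Node 1 (qdevice master) in partition 1: 1 + 1 + 3 votes.
print(node_is_quorate(2, counts_qdevice=True))
# Node 2 without master_wins: sees only the two node votes.
print(node_is_quorate(2, counts_qdevice=False))
```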
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
The full path to the key is now tested rather than the key name only.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
SHA224 is not supported on RHEL6 and also it's kind of weird. Instead of
that, md5 can now be configured.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
add support for sha224/256/384/512
change config defaults to match coroparse and totemconfig
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Tomcrypt in corosync has not been updated for a long time. Because we
have support for libnss, libtomcrypt can be removed.
Also, a few leftovers (AES is 256 bits, not 128, ...) are removed.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
pload is a performance benchmark that measures the onwire
speed of corosync.
problem is that once pload has been executed, the cluster
is basically dead.
turn pload into a test tool, by removing corosync-pload tool
and user library.
cleanup pload code to make it more readable and drop lots
of unnecessary stuff.
add test/ploadstart tool that can configure and start pload
via cmap calls.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
It was pointed out that leave_remove can be easily confused with the old
cman leave_remove behavior. The two are substantially different
and we need to avoid confusion both for users and our support team.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
qdevice is a very special node in the cluster and it adds a certain
amount of complexity and special cases across the code.
most of the qdevice data are shared across the cluster (name/votes)
but effectively each node has a different view of the qdevice
(registered/unregistered/voting/etc.)
with this change, we align the qdevice view across the nodes,
exchanging more data between nodes, and we fix how qdevice behaves
and how it is configured.
The only side effect is that the amount of data transmitted on wire
is slightly higher.
The qdevice API is still disabled by default. This means that
the amount of real changes in current code is a lot smaller
than this patch makes it appear.
TODO: documentation/man pages need to be updated once
this change is in (and behavior finalized).
User visible changes:
- configuration (coroparse, exec/votequorum):
the quorum device section is now standalone within the quorum.
quorum {
    provider: corosync_votequorum
    device {
        model: (name)
        timeout: (millisec)
        votes:
    }
}
the keyword "model:" is mandatory to enable qdevice in configuration
and should express the name of the script/daemon that will provide
the qdevice. Looking into the future, an init script or systemd
service will look for that name in /path/to/be/decided/name
and start/stop qdevice.
timeout: defines the maximum interval the qdevice implementation
has available between polls (see votequorum_qdevice_poll.3) before
the device is considered dead and its votes are discarded.
votes: is now a configuration parameter and not an API call.
quorum devices don't care what they need to vote.
votes is autocalculated when a nodelist is available and all
nodes in the list vote 1. Otherwise this parameter is mandatory.
- configuration (exec/votequorum):
startup and runtime configuration changes have been improved.
errors at startup are considered fatal. errors at runtime
have different exit paths.
startup:
* quorum.two_node and qdevice are incompatible.
* quorum.expected_votes requires quorum.device.votes.
* quorum.expected_votes - quorum.device.votes cannot be lower
than 2.
* qdevice and last_man_standing are mutually exclusive.
* qdevice and auto_tie_breaker are mutually exclusive.
runtime config changes:
* quorum.two_node and qdevice are incompatible:
if quorum device is alive, two_node is disabled.
if quorum device is not alive and node count is 2, two_node is
enabled, and quorum device cannot be registered
* if either last_man_standing or auto_tie_breaker were enabled
at startup, and at runtime quorum device is configured,
quorum device registration will be blocked.
* if quorum.expected_votes is configured but not quorum.device.votes,
quorum device registration will be blocked.
* if quorum.device.votes is not configured and we cannot
automatically calculate it, quorum device registration will be blocked.
* An error in configuring quorum.expected_votes and quorum.device.votes
will block quorum device registration.
blocking quorum device registration also means dropping the votes.
quorum.device.votes (either set or automatically calculated) is now
used to determine current expected_votes in the cluster.
- logging (exec/votequorum):
all errors from configuration are treated as WARNING/CRITICAL.
lots of extra DEBUG output is added (see internal changes too).
- corosync-quorumtool (tools/corosync-quorumtool):
* added option to forcefully kick out a quorum device from the local
node. This is for emergency recovery only and it is only
available when qdevice API is built-in.
* Improved status output, specifically add node state and qdevice
information
[root@fedora-master-node2 coro]# corosync-quorumtool -s
Version: 1.99.4.12-9c7d-dirty
Quorum type: corosync_votequorum
Nodes: 2
Ring ID: 132
Quorate: Yes
Node votes: 1
Node state: Member
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate Qdevice
Nodeid Votes Name
1 1 fedora-master-node1.int.fabbione.net
2 1 fedora-master-node2.int.fabbione.net
0 1 QDEVICE (Voting)
* allow printing status for any node in the cluster known to the
local node.
[root@fedora-master-node1 coro]# corosync-quorumtool -s
Version: 1.99.4.12-9c7d-dirty
Quorum type: corosync_votequorum
Nodes: 2
Ring ID: 144
Quorate: Yes
Node votes: 1
Node state: Member
Expected votes: 3
Highest expected: 3
Total votes: 2
Quorum: 2
Flags: Quorate
Nodeid Votes Name
1 1 fedora-master-node1.int.fabbione.net
2 1 fedora-master-node2.int.fabbione.net
[root@fedora-master-node1 coro]# corosync-quorumtool -s -n 2
Version: 1.99.4.12-9c7d-dirty
Quorum type: corosync_votequorum
Nodes: 2
Ring ID: 144
Quorate: Yes
Node votes: 1
Node state: Member
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate Qdevice
Nodeid Votes Name
1 1 fedora-master-node1.int.fabbione.net
2 1 fedora-master-node2.int.fabbione.net
0 1 QDEVICE (Voting)
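The startup rules listed above under "configuration (exec/votequorum)"
can be sketched as a validation function. This Python is an illustration
only: the dictionary keys and the function name are hypothetical, and
the sketch assumes the rules apply only when a quorum device model is
configured.

```python
def startup_config_errors(cfg):
    """Return a list of fatal startup errors for a quorum configuration."""
    errors = []
    if cfg.get("device_model") is None:
        return errors  # no qdevice configured, nothing to check here
    if cfg.get("two_node"):
        errors.append("quorum.two_node and qdevice are incompatible")
    expected = cfg.get("expected_votes")
    votes = cfg.get("device_votes")
    if expected is not None:
        if votes is None:
            errors.append("quorum.expected_votes requires quorum.device.votes")
        elif expected - votes < 2:
            errors.append("quorum.expected_votes - quorum.device.votes "
                          "cannot be lower than 2")
    if cfg.get("last_man_standing"):
        errors.append("qdevice and last_man_standing are mutually exclusive")
    if cfg.get("auto_tie_breaker"):
        errors.append("qdevice and auto_tie_breaker are mutually exclusive")
    return errors
```

At startup any returned error would be treated as fatal; at runtime the
same conditions block quorum device registration instead.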
Internal changes:
- change qdevice timer to not run all the time, but only when necessary.
- change votequorum_nodeinfo on wire data to use flags instead of uint8_t
and add QDEVICE status.
- allocate nodeid 0 to qdevice since it's the only real
nodeid that can be reserved.
- change send_nodeinfo to allow to send nodeinfo for any node
so that we can share qdevice info across the cluster
(and this might be useful in future if we need to sync
internal cluster view).
- add votequorum api call to update qdevice name
- add runtime data if quorum device has been forcefully disabled
by config error
- add qdevice votes to expected_votes calculation (this
is probably the biggest difference vs cman)
- change votequorum_read_nodelist_configuration so that
we can autocalculate votes for qdevice (we need the nodecount
vs votes).
- add all checks for startup/runtime config (see above).
- do not make qdevice part of the membership_list received from
totem. None of our users care about it and it is not a real node.
- change onwire message handlers to deal with "data for this node from any node"
case and understand nodeid 0 for qdevice info
- always allocate qdevice at startup. this simplifies code a lot.
- dispatch qdevice nodeinfo on membership changes.
- inform libvotequorum users when a qdevice is registered
- improve substantially qdevice api and add a simple
barrier based on qdevice name.
- add qdevice API barrier at cluster level. This feature allows
only one qdevice name to be active in the cluster at any time.
- qdevice getinfo can now report status for qdevice on any node.
- change slightly the way the qdevice API is built-in/out:
only the libvotequorum calls are #ifdef'ed out now. Doing so in
the core is too complex and would make the code unreadable,
with the risk of missing a bit or two and effectively introducing
an on-wire incompatibility if we ever turn the API on.
- probably added some bugs on the way...
TODO: update qdevice_* API once the above is settled and test
qdevice integration with other features.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com> (only second part)
Thanks to the totemip_getifaddrs infrastructure, it's now possible to use
nodelist information to autoconfigure interface bindnetaddr. Together
with cluster_name, the interface section can be completely omitted.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Instead of atoi, strtol is used. This allows detection of typical
problems like empty value of key and incorrectly entered numbers.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
This also cleans up NODESTATE for good. JOINING was never used.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
These look ugly, are inconsistently done and just have
to be removed later in libqb before calling syslog.
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Quorum is broken in this patch.
service.h needs to be cleaned up significantly
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Fabio Di Nitto <fdinitto@redhat.com>
this flag (0|1) can be configured via quorum.last_man_standing and when
enabled, it allows expected_votes to be dynamically recalculated.
Assume an 8-node cluster in which every node votes 1 (mandatory
requirement for this feature).
In the first event, 3 nodes are lost.
The remaining partition of 5 is barely quorate.
After a configurable timeout (quorum.last_man_standing_window, default 10sec)
the quorate partition is allowed to recalculate expected_votes based on
the remaining nodes.
This operation will bring expected_votes to 5 and quorum to 3.
Repeating the above loop, in the next event, 2 more nodes are allowed to
die. etc. etc.
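The loop above can be traced with a short Python sketch. It assumes
quorum is expected_votes // 2 + 1, which matches the numbers in this
message (expected 8 gives quorum 5, expected 5 gives quorum 3); the
function names are hypothetical and the timeout window is not modeled.

```python
def quorum_for(expected_votes):
    # Assumed majority rule, consistent with the figures quoted above.
    return expected_votes // 2 + 1


def last_man_standing(total_nodes, losses):
    """Return (expected_votes, quorum) after each successful recalculation.

    Every node votes 1. The trace stops as soon as a loss leaves the
    partition without quorum.
    """
    expected = total_nodes
    alive = total_nodes
    history = []
    for lost in losses:
        alive -= lost
        if alive < quorum_for(expected):
            break  # partition is no longer quorate, nothing to recalculate
        # After the window expires, expected_votes follows the survivors.
        expected = alive
        history.append((expected, quorum_for(expected)))
    return history


print(last_man_standing(8, [3, 2]))
```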
Reviewed-by: Steven Dake <sdake@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
this is a very old leftover from the RHEL5 timeframe, not used in RHEL6.
Also change votequorum soname since this change implies an ABI change.
Reviewed-by: Steven Dake <sdake@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
this flag (0|1) can be configured via quorum.auto_tie_breaker and when
enabled, support for perfect even split is on.
In case of a loss of 50% of the votes in one single transition, the
partition with the node that has the lowest node id will remain quorate.
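The tie-breaking rule can be written as a small Python sketch. It
assumes every node votes 1 and that both partitions are given as lists
of node ids; the function name is hypothetical.

```python
def quorate_partition_on_even_split(partition_a, partition_b):
    """Return the partition that stays quorate after a perfect even split."""
    if len(partition_a) != len(partition_b):
        raise ValueError("not an even split")
    # The partition holding the lowest node id wins the tie.
    if min(partition_a) < min(partition_b):
        return partition_a
    return partition_b


print(quorate_partition_on_even_split([2, 3], [1, 4]))
```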
Reviewed-by: Steven Dake <sdake@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
this flag (0|1) can be configured via quorum.wait_for_all and changes
behavior when granting quorum for the first time.
Normal behavior (default / 0) grants quorum as soon as enough nodes
are available in a cluster.
Setting this value to 1 will grant quorum only after all cluster
members are part of the cluster at the same time.
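A minimal Python sketch of the two behaviors follows. The class and
method names are hypothetical, quorum is assumed to be
expected_nodes // 2 + 1 with every node voting 1, and the sketch assumes
that once all members have been seen together the normal rule applies
again.

```python
class QuorumGate:
    """Model of first-time quorum granting with and without wait_for_all."""

    def __init__(self, expected_nodes, wait_for_all=False):
        self.expected_nodes = expected_nodes
        self.waiting = wait_for_all

    def is_quorate(self, members):
        if self.waiting:
            if members < self.expected_nodes:
                return False  # not all members seen at the same time yet
            self.waiting = False  # first-time condition satisfied
        return members >= self.expected_nodes // 2 + 1


gate = QuorumGate(4, wait_for_all=True)
print(gate.is_quorate(3))  # still waiting for the fourth member
print(gate.is_quorate(4))  # all members present
print(gate.is_quorate(3))  # normal majority rule from now on
```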
Reviewed-by: Steven Dake <sdake@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Our preferred shared logging system is exported via the libqb library. As
a result, the corosync project no longer needs to export logsys.so and the
code can be directly included in the binary. The header file can also be
removed.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>