vqsim is a small program that allows node up/down/split/join
operations to be simulated without the use of an actual cluster.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
This is a big update that removes RRP & MRP from the codebase
and makes knet the default transport for corosync. UDP & UDPU
are still (currently) supported but are deprecated. Also crypto
and multiple interfaces are only supported over knet.
To compile this codebase you will need to install libknet from
https://github.com/fabbione/kronosnet
The corosync.conf(5) man page has been updated with info on the new
options. Older config files should still work but many options
have changed because of the knet implementation so configs should
be checked carefully. In particular any cluster using RRP
over UDP or UDPU will not start as RRP is no longer present. If you
need multiple interface support then you should be using the knet transport.
Knet brings many benefits to the corosync codebase: it provides support
for more interfaces than RRP (up to 8), will be more reliable in the event
of network outages and allows dynamic reconfiguration of interfaces.
It also fixes the ifup/ifdown and 127.0.0.1 binding problems that have
plagued corosync/openais from day 1.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
As we now have update_node_expected_votes(), we can use that
when receiving a new EXPECTED_VOTES value from another node
rather than having our own loop.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
If expected_votes was set via the library but the calculation
decides it's too high, then an error is correctly returned but
the value is still set in the nodes' expected_votes field and
turns up in the corosync-quorumtool display.
This patch separates out the quorum calculation from the updating
of expected_votes per node to prevent this from happening.
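The separation described above can be sketched roughly as follows. This
is only an illustration in Python, not the votequorum code: the Node
class, calculate_quorum() and set_expected_votes() are hypothetical
names, and the "quorum must not exceed the votes present" check is an
assumed stand-in for whatever validation the real calculation performs.

```python
from dataclasses import dataclass


@dataclass
class Node:
    votes: int
    expected_votes: int


def calculate_quorum(expected_votes):
    # Pure calculation (simple majority); it never touches node state.
    return expected_votes // 2 + 1


def set_expected_votes(nodes, new_expected):
    # Step 1: validate using the calculation only.
    total_votes = sum(n.votes for n in nodes)
    if calculate_quorum(new_expected) > total_votes:
        # Assumed rejection rule: nothing has been written yet, so the
        # per-node fields keep their old value.
        return False
    # Step 2: commit to every node only after validation succeeded.
    for n in nodes:
        n.expected_votes = new_expected
    return True
```

With three one-vote nodes, set_expected_votes(nodes, 10) returns False
and leaves every expected_votes field untouched, which is the behaviour
the patch is after.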
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Fixes the situation where the tie-breaker node dies in a 2-node cluster.
Because the code contained two bugs, the other node got a NACK instead of
an ACK.
- The algo timer is not a stack, so calling abort and schedule in the
timer callback without setting reschedule is a no-op.
- It is necessary to check not only what the current node thinks about
membership, but also what the other nodes think. If the views diverge -> wait.
Thanks Christine Caulfield <ccaulfie@redhat.com> for fixing the English
in the comments somewhat.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Uidgid entries parsed from configuration files now have a prefix
(uidgid.config.) so they are distinguishable from dynamically added
entries. Entries added from the config file are pruned on reload if they
no longer exist in the config file (dynamic ones stay unaffected). Also
the whole uidgid.config. prefix is made read only.
This makes PCMK work again after a configuration reload is called.
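A rough model of the reload behaviour, written in Python purely for
illustration. The dict stands in for the key/value store, and
reload_uidgid() and the key suffixes such as "uid.1000" are
hypothetical; only the uidgid.config. prefix comes from this patch. The
read-only protection of the prefix is not modelled.

```python
CONFIG_PREFIX = "uidgid.config."


def reload_uidgid(store, config_entries):
    """Return the store contents after a config reload.

    store          -- dict of existing keys (config-derived and dynamic)
    config_entries -- key suffixes currently present in the config file
    """
    wanted = {CONFIG_PREFIX + entry for entry in config_entries}
    # Keep every dynamic key; keep config-derived keys only if the
    # config file still lists them.
    result = {
        key: value
        for key, value in store.items()
        if not key.startswith(CONFIG_PREFIX) or key in wanted
    }
    # Add entries that are new in the config file.
    for key in wanted:
        result.setdefault(key, 1)
    return result
```

A dynamically added key such as "uidgid.uid.189" survives the reload
even when the config file no longer mentions any uidgid entries.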
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
There are changes in pacemaker-cts which corosync-testagents depends
on. With these changes, corosync-testagents cannot run. This patch
fixes the issues and makes corosync-testagents run again.
Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
including mentioning corosync-qdevice(5) on the
votequorum(5) and corosync.conf(5) pages.
Thanks to Jan Pokorný for reporting these.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Revert patch 9f54f0a1fad7dad42c55562a50dfb9d773e6a660 as it causes
more trouble than it solves. Code that uses the quorum nodelist
to get a list of actual nodes in the cluster for communication
breaks with it, as does the display from corosync-quorumtool.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
On a 50:50 split, the ffsplit algorithm now prefers the partition with
the higher number of active clients. The tie-breaker is used only if both
partitions have the same number of active clients.
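The preference order can be sketched like this. It is an illustrative
Python model, not the qnetd code: Partition and prefer_on_even_split()
are hypothetical names, and the tie-breaker is assumed here to be a
single configured node id.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    nodes: frozenset
    active_clients: int


def prefer_on_even_split(a, b, tie_breaker_node):
    # First criterion: the partition with more active clients wins.
    if a.active_clients != b.active_clients:
        return a if a.active_clients > b.active_clients else b
    # Only on equal client counts does the tie-breaker decide.
    # (Assumption: if the tie-breaker node is in neither partition,
    # this sketch simply falls back to b.)
    return a if tie_breaker_node in a.nodes else b
```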
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
The 50:50 split algorithm now works in the following way:
- On a client configuration change, membership change or disconnect, wait
until membership is stable (= all client configuration node lists are
equal, and all partitions have equal information).
- Choose the best partition >= 50%
- If no such partition exists, send NACK to all clients
- Send NACK to all clients who should receive NACK
- After all clients who should receive NACK confirm vote reception, send
ACK to all clients who should get ACK
This ensures that there are never two partitions with ACK, and it has
much better behavior than the previous version, because if the tie-breaker
partition is not connected, the other partition gets ACK.
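The steps above can be sketched as follows, in Python and purely as an
illustration. decide_votes() and VoteRound are hypothetical names, the
stability check is reduced to a boolean supplied by the caller, and
ties between two 50% partitions are broken in favour of the partition
holding the lowest node id, which is an assumed stand-in for the real
selection rules.

```python
def decide_votes(partitions, total_nodes, views_stable):
    """Return (ack_nodes, nack_nodes), or None if we must keep waiting."""
    if not views_stable:
        return None  # membership not stable yet -> wait
    everyone = set().union(*partitions)
    # Only partitions holding at least half of the cluster qualify.
    candidates = [p for p in partitions if 2 * len(p) >= total_nodes]
    if not candidates:
        return set(), everyone  # nobody qualifies: NACK to all clients
    best = max(candidates, key=lambda p: (len(p), -min(p)))
    return set(best), everyone - best


class VoteRound:
    """Sends every NACK first; ACKs go out only after all are confirmed."""

    def __init__(self, ack, nack, send):
        self.ack = set(ack)
        self.pending_nack = set(nack)
        self.send = send  # callable(node_id, vote)
        self.acks_sent = False

    def start(self):
        for node in sorted(self.pending_nack):
            self.send(node, "NACK")
        self._maybe_send_acks()

    def confirm(self, node):
        # A NACKed client confirmed reception of its vote.
        self.pending_nack.discard(node)
        self._maybe_send_acks()

    def _maybe_send_acks(self):
        if not self.pending_nack and not self.acks_sent:
            self.acks_sent = True
            for node in sorted(self.ack):
                self.send(node, "ACK")
```

Holding back the ACKs until every NACK is confirmed is what keeps two
partitions from both holding an ACK at the same moment.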
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
To prevent receiving a vote from an old membership, the ring id is sent
to the server during init and echoed back to the client in every node
list, ask for vote reply and vote info message.
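On the client side the check could look roughly like this. It is an
illustrative Python sketch, not the qdevice code: RingId, Client and
the method names are hypothetical, and the ring id is assumed to be a
(node id, sequence number) pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RingId:
    node_id: int
    seq: int


class Client:
    def __init__(self, ring_id):
        self.ring_id = ring_id
        self.vote = None

    def on_membership_change(self, ring_id):
        # A new membership invalidates any vote received so far.
        self.ring_id = ring_id
        self.vote = None

    def on_vote_message(self, echoed_ring_id, vote):
        # Drop votes that the server computed for an older membership.
        if echoed_ring_id != self.ring_id:
            return False
        self.vote = vote
        return True
```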
Signed-off-by: Jan Friesse <jfriesse@redhat.com>