This is a big update that removes RRP & MRP from the codebase
and makes knet the default transport for corosync. UDP & UDPU
are still (currently) supported but are deprecated. Also crypto
and mutiple interfaces are only supported over knet.
To compile this codebase you will need to install libknet from
https://github.com/fabbione/kronosnet
The corosync.conf(5) man page has been updated with info on the new
options. Older config files should still work but many options
have changed because of the knet implementation so configs should
be checked carefully. In particular any cluster using using RRP
over UDP or UDPU will not start as RRP is no longer present. If you
need multiple interface support then you should be using the knet transport.
Knet brings many benefits to the corosync codebase, it provides support
for more interfaces than RRP (up to 8), will be more reliable in the event
of network outages and allows dynamic reconfiguration of interfaces.
It also fixes the ifup/ifdown and 127.0.0.1 binding problems that have
plagued corosync/openais from day 1
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
As we now have update_node_expected_votes(), we can use that
when receiving a new EXPECTED_VOTES value from another node
rather than having our own loop.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
If expected_votes was set via the library but the calculation
decides it's too high, then an error is correctly returned but
the value is still set in the nodes' expected_votes field and
turns up in the corosync-quorumtool display.
This patch separates out the quorum calculation from the updating
of expected_votes per node to prevent this from happening.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Uidgid entries parsed from configuration files now has prefix
(uidgid.config.) so they are distinguishable from dynamically added
entries. Entries added from config file are pruned on reload if no
longer exists in config file (dynamic one stays unaffected). Also whole
uidgid.config. prefix is made read only.
This make PCMK work again after configuration reload is called.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Revert patch 9f54f0a1fad7dad42c55562a50dfb9d773e6a660 as it causes
more troubles than it solves. Code that uses the quorum nodelist
to get a list of actual nodes in the cluster for communication
break using this as well as the display from corosync-quorumtool
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
We were looking for us in other node lists, rather than
others in our nodelist.
Also, remove debug print in votequorum.c
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
This patch tidies the two state change callbacks and explains them
in the man page:
The difference between votequorum_nodelist_notification_t and
votequorum_quorum_notification_t is subtle but important.
The 'nodelist' callback is sent at the start of a cluster state
transition and contains the new ring_id and only the list of
nodes that are included in the sync state - ie only active nodes. No
quorum information is included this callback because it is not
available at that time.
The 'quorum' callback is sent after the cluster state transition has
completed and does contain quorum information.
In addition, the nodelist contains a list of all nodes known to
votequorum (whether up or down) and their state as well
as information about the quorum device attached (if any). quorum
callbacks will not be sent for qdevice up and down
events unless they affect quorum.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
This split is needed for qdevice, so that it gets the ring_id and
nodelist as part of the sync process and not afterwards - when quorum
has been calculated.
As this is and unsupported API I'm not too worried about breaking
existing code - all the clients I know of are using the quorum API
anyway as they should be.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
In my previous logconfig patch, adding a subsys so the
logging stanzas could disable logging to a file, because
the subsys closed the file used by the main logging.
This patch only applies defaults to higher-level logging and
non-deprecated keys.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
There were several places where defaults were not restored
if the keys were removed from corosync.conf and the file reloaded.
This patch adds those back so that reloading corosync.conf
has the expected effect when keys are deleted.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Schedwrk is passing hdb handle (64-bit) to
totempg_callback_token_create as a context. Context is defined to be
pointer, so there is conversion function which stores 64-bit hdb_handle
into pointer. Potentially, pointer can be 32-bit. This means, check
part of hdb is discarded (and have to get special no_check value in
schedwrk_do) later. This works quite well on 32-bit Little-Endian
system. Sadly on Big-Endian system, check partition of hdb is stored
instead of value. Result is error of hdb_handle_get call.
Proposed solution is to pass handle pointer to
totempg_callback_token_create as context. This means full hdb (check +
value) can be used in schedwrk_do (easier detection of memory
corruption).
Main reason for this patch is to remove usage of pointer as integer
value.
Small drawback of given solution is that handle pointer must be memory
allocated on heap or static memory, making API more bug-prone. Current
usage of schedwrk API across corosync always use memory in .text
section (safe), so it's not a problem.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Add configuration option resources.watchdog_device allowing runtime
selection of watchdog device. Useful for newer servers having more
than one watchdog available (IPMI and iTCO).
Special value "off" disables watchdog in configuration rather than
just using build options. Useful when watchdog device is needed
elsewhere (SBD cluster stonith service).
Signed-off-by: Valentin Vidic <Valentin.Vidic@CARNet.hr>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
basename() function has some potentially odd issues on
other platforms.
So, to be safe, here's an internal version.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
If corosync is built out-of-tree (passing --srcdir to configure) then
TOTEM logging doesn't print anything.
This is caused by the source filenames (from __FILE__ at compilation
time) having the configured path in them - in this example
../corosync/exec/totemudp.c etc. The list of totem source filenames
passed to libqb logging facility only has the basenames so the filenames
never match up as libqb does an exact string match.
I looked into fixing this in libqb but it causes a regression. We can't
simply basename() __FILE__ at the point of calling log_printf as it's i
common also to use __FILE__ to generate the logging source, and
using basename() on both removes the distinction between similarly named
files from different directories which could be a requirement.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
pass 'state' down the stack so that the state of the
hierarchy doesn't get lost when there are unexpected items
in the config hierarchy.
Don't bother setting 'state' on SECTION_END as there's no point
now we're going back up the stack.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
If resolver was set to prefer IPv6 (almost always) and interface section
was not defined (almost all config files created by pcs), IP version was
set to mcast_addr.family. Because mcast_addr.family was unset (reset to
zero), IPv6 address was returned causing failure in totemsrp.
Solution is to pass correct IP version stored in
totem_config->ip_version.
Patch also simplifies get_cluster_mcast_addr. It was using mix of
explicitly passed IP version and bindnet IP version.
Also return value of get_cluster_mcast_addr is now properly checked.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Previously there were two free lists. One for operational and one for
transitional state. Because every node starts in transitional state and
always ends in the operational state, assembly was always put to normal
state free list and never in transitional free list, so new assembly
structure was always allocated after new node connected.
Solution is to have only one free list.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Steven Dake <stdake@cisco.com>
- Changed paramater to parameter in exec/logcconfig.c
Change-Id: I8a24b0ef5c6621dc6c19d7decbdfe7a255afd10d
Signed-off-by: Richard B Winters <rik@mmogp.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
If we don't have it, fall back to fsync
Fixes the build on FreeBSD
Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
It seems that the IPv6 multicast parameters only take effect when bind()
is called, so I've moved the mcast recv socket bind() to the bottom of
totemudp_build_sockets_ip().
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
This patch aligns the votequorum callbacks so that they are
the same as the quorum ones. Previously it was quite common
for votequorum to send one callback for every node in the cluster
when a single new node joined (because it sent one for every
nodeinfo message it received).
This new system makes much more sense in itself and being
consistent with the internal quorum is also an advantage!
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Fix setting of initial watchdog timeout, and also changing of timeout.
Remove redundant starting of timer in exec_init_fn
Signed-off-by: Kazunori INOUE <kazunori.inoue3@gmail.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
During SYNC, corosync-cfgtool -R/-H commands can pass through IPC then
send totem messages. This may corrupts
assembly_list_inuse/assembly_list_free if those messages are recedived
after SYNC is done.
The solution is marking related CFG APIs as
CS_LIB_FLOW_CONTROL_REQUIRED.
Signed-off-by: Jason HU <huzhijiang@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
auto_tie_breaker can behave incorrectly in the case of a cluster
with an odd number of nodes. It's possible for a partition to
have quorum while the other side has the ATB node, and both will
continue working. (Of course in a properly configured cluster one side
will be fenced but that becomes an indeterminate race .. just what ATB
is supposed to avoid).
This patch prevents ATB from running in a partition if the 'other'
partition might have quorum, and also mandates the use of wait_for_all
in clusters with an odd number of nodes so that a quorate partition
cannot start services or fence an existing partition with the tie
breaker node.
Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
This patch from Hideo Yamauchi improves the logging of
whether nodes leave the cluster cleanly or uncleanly,
making it easier to determine if a node ws shut down
by the operator. There is also the possibility that a
LEAVE message could get missed (due to the node being
in flush state) so this can also make that clearer.
The modifications are as follows.
Change 1) I added the list which maintained LEAVE node to totemsrp.
Change 2) I added registration, a search, the handling of to clear LEAVE
node.
Change 3) I added the output to log.
Change 4) I changed an output level of the log.
Signed-off-by: Hideo Yamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
As per recent email thread, this patch adds a log message if a JOIN or
LEAVE message is discarded while corosync is flushing the receive queue.
While ignoring a JOIN message is harmless (it will be resent), ignoring
a LEAVE message can cause a longer state transition as it is treated as
a node crashing rather than leaving gracefully, so the system admin
might be confused as to the cause.
Unfortunately, we can't (at the totemudp level) distinguish between JOIN
or LEAVE messages without a lot more protocol-specific code creeping in
the lower layer so the message is left ambiguous.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Having duplicate nodeids in corosync.conf can play havoc with a cluster,
so (as suggested by someone on this list) here is some code to check
that all nodeids are unique. Even if a nodeid is not specified it will
check to be sure that the ID generated from the IP address (ipv4 only)
does not clash with one that is provided.
It logs all non-unique nodeids to syslog, but only the last is reported
on the command-line to the user which should be enough to get them to
check further. At startup this will cause corosync to fail to start.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
If quorum_trackstart() or votequorum_trackstart() are called twice with
CS_TRACK_CHANGES then the client gets added twice to the notifications
list effectively corrupting it. Users have reported segfaults in
corosync when they did this (by mistake!).
As there's already a tracking_enabled flag in the private-data, we check
that before adding to the list again and return an error if
the process is already registered.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
If a cpg client sends a message larger than 1Mb (actually slightly
less to allow for internal buffers) cpg will now fragment that into
several corosync messages before sending it around the ring.
cpg_mcast_joined() can now return CS_ERR_INTERRUPT which means that the
cpg membership was disrupted during the send operation and the message
needs to be resent.
The new API call cpg_max_atomic_msgsize_get() returns the maximum size
of a message that will not be fragmented internally.
New test program cpghum was written to stress test this functionality,
it checks message integrity and order of receipt.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
The two_node and auto_tie_breaker options are incompatible as they
specify conflicting methods of determining the quorate half of a cluster
partition.
This patch detects this error in corosync.conf, issues a message and
disables two_node if auto_tie_breaker is present.
Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
The default for auto_tie_breaker should be 'lowest' - which is what it
was before the extended ATB functionality of auto_tie_breaker_node was
added, and what the documentation states.
However this was broken so that if auto_tie_breaker_node was not
specified then auto_tie_breaker itself was ignored. This patch fixes
that.
It also fixes a typo in a comment.
Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
When config file is reloaded with removed UDPU member, internal icmap
index of nodelist.node can change. This can result in removal and then
adding back node. This, with UDPU alive filtering (where member is by
default considered as not a member) makes corosync not sending messages
to such members resulting in new membership creation.
Solution is to properly test which members were really deleted and added
(instead of relying on internal and dynamic naming of icmap hash table
key name).
Also trully dynamic add and remove node (via cmap) is now handled by
same function so totem_config->interfaces is now updated properly.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
corosync_ring_id_store should use same (safer) permissions as
corosync_ring_id_create_or_load for (eventually) newly created ringid
file.
Credit to Sjerek for finding this problem.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
In active rrp mode, commit tokens are treated as mcast data messages,
thus, rrp directly delivers them to srp layer by active_mcast_recv().
This will result in duplicated commit tokens being received by srp from
different heartbeat links. If node is in recovery state and has already
sent out the initial orf token, those duplicated commit tokens will
cause message_handler_memb_commit_token() to send initial orf token
again! This is wrong because it resets the orf token content in
instance->orf_token_retransmit, which breaks the token retransmission
state.
Furthermore, by sending those initial orf tokens again and again,
it may lead active_token_recv() to drop some subsequent orf tokens.
It is OK for rrp because srp will do token retransmission,
but as said above, srp retransmission state has already been broken,
so finally we meet a "token lost in recovery state" condition caused
by software. If token timeout value is large, then it will takes long
time to create a new ring.
This can be reproduced by having two noded set to active rrp mode, with
two heartbeat links. Then with one node always on, let the other one do
stop/start again and again. It has a low probability to reproduce.
In theory, I think, the more heartbeat links used, the more easily it
can be reproduced.
This problem can be resolved by letting
message_handler_memb_commit_token() to ignore duplicated commit tokens
in recovery state if node (the ring representation) has already sent
out the initial orf token.
Different from prev take, this version do not depends on stored token
data but uses originated_orf_token in totemsrp_instance to remember
if initial orf token has been already originated for current membership.
Signed-off-by: Jason <huzhijiang@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Make sure to log auto-recovery of ring only once. Every
MESSAGE_TYPE_RING_TEST_ACTIVATE receive is logged, but with lower
priority and more detailed information.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Experience with larger production clusters showed that setting RR
priority for corosync is viable for prevent random fencing, ...
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
After a heartbeat link's FAULTY and its auto re-enable,
active_instance->timer_problem_decrementer did not reset to zero. So in
the next timer_function_active_token_expired() round,
active_timer_problem_decrementer_start() will not be called. This will
result in that the active_instance->counter_problems of this link can
not be decreased any more. Cause rrp lose the ability to tolerate
network fluctuation.
This problem can be reproduced by the following sequence:
1) Set RRP in active mode, configure at least 2 heartbeat links.
2) Unplug one link till corosync-cfgtool -s shows it is FAULTY.
3) Re-plug this link then corosync-cfgtool -s shows it is active with
no faults.
4) Unplug this link again but quicky re-plug it before it becomes
FAULTY.
5) Finally, you can see corosync-cfgtool -s shows it is in
"Incrementing problem counter" state despite it currently is physically
healthy.
It can be solved by not forget to reset timer_problem_decrementer to
zero in active_timer_problem_decrementer_cancel().
Signed-off-by: Jason <huzhijiang@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
When using multiple interfaces, it's necessary to use different
multicast address/port pair for each interface to make
rrp work correctly. This is now checked in parser.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Broadcast option is global but in config set in interface section. When
more interfaces are defined, only broadcast from last section was used.
Solution is to use broadcast whenever at least one interface use
broadcast.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Checking code was there, sadly not correct, so it was possible to enter
one bindnet addr as IPv4 and second as IPv6. Fix is trivial.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Some totem configuration values (like token, consensus, ...) are ether
computed or default value is used. It's hard to find out, what
value is really used.
Solution is to store values in cmap.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
MTU for IPv6 is 20 bytes larger then IPv4. This fact was not taken into
account so IPv6 packets were larger then MTU resulting in fragmentation.
Solution is to substract correct IP header size.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
libnss is "weird" in this respect as some block sizes are hardcoded,
others need to be determined dynamically.
For AES we need to use the values we know since GetBlockSize would
return errors, for 3des (that hopefully nobody is using) the value
returned by GetBlockSize is 8, but let's use the call into libnss
to avoid possible conflicts with distro patching or older versions.
Now, given the correct block size, the old calculation simply added
block size to the hdr_size. This is not sufficient.
We use _PAD encryption methods and we need to take that into account.
_PAD is calculated given the current input buf len and rounded up
to block size boundary, then block_size is added.
Ideally we would do that on a per packet base but current transport
infrastructure doesn't allow it yet.
So round up the hdr_size to double the block_size reported by the
cipher.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
To follow spec it's needed to send messages to all nodes (not only
active members) from time to time to detect merge.
This is needed in situations when totemsrp merge timer isn't running
(because there is enough messages sent by processors) to detect merge.
Example scenario:
- 3 nodes, all of them running cpgverify
- One node is isolated (iptables for example)
- Node is un-isolated
Without this commit, node will not merge as long as the cpgverify is
running.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Member active is used for sending "multicast" messages only to members
of ring. This reduces network load if some nodes are intentionally down.
Only regular multicast message load is reduced (messages sent by
totemudpu_mcast_noflush_send), because special messages (like hold
cancel, join message, ...) still have to be send to all members to
ensure correct behavior.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
All *_membership_changed calls totemnet_member_set_active passing 1 as
active parameter for joined nodes and 0 for left nodes.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
totemnet_member_set_active together with transport specific
member_set_active makes possible for totemnet (and more interestingly
transport) to be informed about membership changes.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Services are informed about membership changes, but if same information
is needed inside totemrrp or totemnet, it's impossible to gather this
information.
Patch makes this possible for now only for RRP with empty callbacks.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Although YKD is currently unsupported, untested and decprecated it's
handy for testing things in the quorum module.
This patch allows YKD to actually load without an error. It does not fix
anything else in the service!
Also remove vsftype and its reference to YKD being the preferred and
default provider from the corosync.conf man page,
as that hasn't been true for a considerable time.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
It's possible in a two_node cluster (and others but it's more likely
with just two) that a node could be booted up after downtime or failure
and the other node is not available for some reason. In this case it
would not be allowed to proceed because wait_for_all is enforced.
This patch provides a cmap key to clear this flag in the desperate
situation where that becomes necessary. It should only be used with
extreme caution and will be wrapped up in pcs which should also check
that fencing has been run.
Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
When there is no other activty on ring but only retransmition, and
token is in hold mode, the retransmition will become slow. More over,
if the retransmition is always fail but token rotation works well, then
it takes quite a lone time
(fail_to_recv_const * token_hold = 2500 * 180ms = 450sec) for the
retransmit requester to meet the "FAILED TO RECEIVE" condition to
re-construct a new ring.
This problem can be solved by checking if retransmits are present
before going into hold. If a node is the retransmit requester or
the resender, it set my_token_held to 0 to speed up retransmition
and omit further unnecessary sending of token_hold_cancel signal.
Signed-off-by: Jason HU <huzhijiang@gmail.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Configuration option quorum.device.sync_timeout is available for setting
qdevice poll timeout for synchronization phase. Default value is 30
sec.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
If qdevice is registered a alive, corosync waits in sync phase until
timeout expires or qdevice votes with correct nodeid parameter.
This gives qdevice time to decide to vote or not undisturbed and without
time hazard.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
This is needed for qdevice to be able to process messages during
synchronization phase.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
If votequorum service receives incorrect (not current) ringid, call is
ignored and CS_ERR_MESSAGE_ERROR is returned.
This and previous commits makes incompatible changes in votequorum
API/ABI, so library version is increased.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Returning ring id will be used in poll function.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
The thesis contains this paragraph:
" The Join timeout is shorter than the Consensus timeout and is used to
increase the probability that Join messages from all currently
working processors are received during a single round of consensus."
Empirically I can confirm that making join less than consensus can cause
havoc with a cluster so I think we should enforce this.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Fix several places where 'then' is used instead of 'than' in error
messages and a comment.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Move finding of bindaddr in nodelist to generally usable function
totem_config_find_local_addr_in_nodelist and refactor
config_convert_nodelist_to_interface function to use it.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Add totem_config_get_ip_version to get user configured ip version.
Make totem_config_read use this newly introduced function.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
QB loop signal handler prototype differs from signal(2) prototype.
Solution is to create wrapper functions.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
SIGSEGV and SIGABRT signals are now correctly handled (blackbox is
dumped and logsys is finalized).
Signed-off-by: zouyu <hopkings2005@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
During reload, local_node_pos is deleted and reinstation is handled in
totemconfig after reload is finished. votequorum handles this events and
tries to reload it's configuration. This led to logging a little scary
messages (even nothing bad is happening, because after local_node_pos
reinstation everything back to normal).
Solution is to stop processing events during reload. Sadly, simple
tracking of config.reload_in_progress doesn't work because LibQB events
triggering order is undefined so votequorum reload handler can be called
before totemconfig (and before local_node_pos is reinstatied).
So new config.totemconfig_reload_in_progress key is defined with very
similar semanthic as config.reload_in_progress but set inside
totem_reload_notify function. Votequorum then use this new key.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
It's not very good idea to allow user apps changing internal key
reload_in_progress.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Previous safe_atoi didn't check range of input values so if for example
user used -1 s token timeout, it was converted to UINT32_MAX without
letting user know.
Another safe_atoi problem was using strtol. This works pretty well on
64-bit systems, where long integer is usually 64-bits long, sadly on
32-bit systems, it is usually 32-bit long. And because strtol returns
signed integer, it was not possible to enter 32-bit value with highest
bit set.
Solution is to use strtoll which is guaranteed to be at least 64-bits
long and check value range.
Also error message now contains also information about expected value
range.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Functions for storing and loading ring id was in the totem library. This
causes problem, what to do when it's impossible to load or store ring
id. Easy solution seemed to be assert, but sadly this makes hard for
user to find out what happened (because corosync was just aborted and
logsys didn't flush)
Solution is to move these functions to main.c, where is much easier to
handle error. This also makes libtotem free of any file system
operations.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Run dir (LOCALSTATEDIR/lib/corosync) was hardcoded thru whole codebase.
Totemsrp was trying to create and chdir into it, but also
takes into account environment variable COROSYNC_RUN_DIR creating
inconsistency.
get_run_dir correctly returns COROSYNC_RUN_DIR (when set) or
LOCALSTATEDIR/lib/corosync. This is now used by all functions instead of
hardcoded string.
All occurrences of mkdir/chdir are removed from totemsrp and chdir is
now called in main function. Mkdir call is completely removed, because
it was not used anyway (check in main.c was called before totemsrp init,
so mkdir was never called) and also make install and/or package system
should take care of creating this directory with correct
permissions/context.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
rdma_join_multicast failed ... message parameters was swapped.
Also information about multicast join is now logged as notice.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Totemiba wasn't able to survive SubnetManager handover or
restart. If SM was migrated to another node, corosync logged
"multicast error" and losses connectivity.
Commit should solve this situation.
Signed-off-by: Yevheniy Demchenko <zheka@uvt.cz>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
token_coefficient change in cmap didn't triggered change. So only way
how to change token_coefficient was editing config file and reload.
Patch let's key totem.token_coefficient to be processed so
token_coefficient can be dynamically changed.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>