Remove another environment variable (for reasons similar to the removal of
COROSYNC_MAIN_CONFIG_FILE).
Also properly document both totem.keyfile and totem.key.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
The COROSYNC_MAIN_CONFIG_FILE environment variable was quite well hidden
and was never used by the init script. It also makes possible problems
quite hard to debug.
Replace it with the -c option.
The patch also uses the configuration file path as the base for the
uidgid.d directory, so uidgid.d no longer needs to be kept in sysconfdir.
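A minimal sketch of deriving uidgid.d from the file passed with -c. It assumes uidgid.d simply sits next to that file. The helper name uidgid_dir_from_config is hypothetical and is not corosync's code.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: derive the uidgid.d directory from the config path,
 * e.g. "/etc/corosync/corosync.conf" -> "/etc/corosync/uidgid.d".
 * Returns 0 on success, -1 if the result does not fit into buf.
 */
static int uidgid_dir_from_config(const char *config_path, char *buf, size_t len)
{
	const char *slash = strrchr(config_path, '/');
	int res;

	if (slash == NULL) {
		/* Bare file name: the config lives in the current directory. */
		res = snprintf(buf, len, "uidgid.d");
	} else {
		/* Keep the directory part (everything before the last '/'). */
		res = snprintf(buf, len, "%.*s/uidgid.d",
		    (int)(slash - config_path), config_path);
	}
	if (res < 0 || (size_t)res >= len) {
		return -1;
	}
	return 0;
}
```

With -c /etc/corosync/corosync.conf this yields /etc/corosync/uidgid.d, so moving the config file moves uidgid.d with it.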
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
The reason for this change is that the number of corosync CLI options has
kind of exploded, and the scheduler-based ones are really better kept in
the config file.
A nice side effect of this move is better "integration" with systemd,
because the EnvironmentFile currently used should really be used for the
environment and not so much for passing extra options to the CLI.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
The conversion to the new srp_addr format broke the feature where
UDP/UDPU nodes could get their nodeids generated from the IP address.
A big part of this was the removal of the mandatory ring0_addr: it was used
as a placeholder when reading down the nodelist. I replaced this with
nodeid, thinking that nodeid was now mandatory, and forgot this use case.
So the compare on "ring0_addr" or "nodeid" is now replaced with a more
robust check that we only read keys from the same node_pos once.
This was needed in votequorum.c as well as totemconfig.c.
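A minimal sketch of the "read each node_pos once" idea. It assumes nodelist keys arrive grouped by node in the nodelist.node.<pos>.<attr> form. count_nodes_once is a hypothetical name, and the real loops in totemconfig.c and votequorum.c differ. Because the check keys on node_pos rather than on one specific attribute, a node that has only ring0_addr (and no nodeid) is still picked up.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical sketch: each node position is handled once, no matter which
 * of its keys (ring0_addr, nodeid, name, ...) comes first.
 * Returns the number of distinct node positions seen.
 */
static size_t count_nodes_once(const char *const *keys, size_t nkeys)
{
	unsigned int node_pos;
	unsigned int last_node_pos = 0;
	int have_last = 0;
	size_t nodes = 0;
	char attr[64];

	for (size_t i = 0; i < nkeys; i++) {
		if (sscanf(keys[i], "nodelist.node.%u.%63s", &node_pos, attr) != 2) {
			continue; /* not a nodelist.node.<pos>.<attr> key */
		}
		if (have_last && node_pos == last_node_pos) {
			continue; /* this node was already processed */
		}
		have_last = 1;
		last_node_pos = node_pos;
		nodes++; /* a real parser would read all keys of node_pos here */
	}
	return nodes;
}
```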
Another tidying side-effect of this patch is that the nodeid generation
is now all in a single routine in totemconfig.c and not shared between
it and totemip.c.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
This feature depends on the existence of the libqb function
qb_log_file_reopen.
A new function call is added to the CFG service API. This function is
used by corosync-cfgtool, which now accepts the -L parameter.
Finally, the logrotate "postrotate" script calls
corosync-cfgtool -L to notify corosync, instead of using the
copytruncate option.
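For illustration only, the effect that makes copytruncate unnecessary can be sketched with plain stdio. After logrotate renames the file, reopening the same path makes later writes go to a fresh file. This sketch uses freopen(), not libqb's qb_log_file_reopen, and reopen_log is a hypothetical name.

```c
#include <stdio.h>

/*
 * Hypothetical helper: reopen the log at its original path after rotation.
 * freopen() flushes and closes the old stream (still pointing at the
 * renamed file) and opens path again in append mode, creating it if needed.
 * Returns 0 on success, -1 on failure (*log is then NULL).
 */
static int reopen_log(FILE **log, const char *path)
{
	FILE *f = freopen(path, "a", *log);

	*log = f;
	return (f == NULL) ? -1 : 0;
}
```

Compared with copytruncate, no log lines are lost between the copy and the truncation, because the writer itself switches files.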
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Formally not needed, because totemip_print should not return a string
longer than INET6_ADDRSTRLEN, but static analysis tools are not capable
of drawing such a conclusion.
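A sketch of the bounded-copy pattern that static analyzers accept. It assumes the destination is a fixed INET6_ADDRSTRLEN field, and copy_addr_string is hypothetical. The explicit terminator makes the bound visible to the analyzer even though a formatted address never needs it.

```c
#include <netinet/in.h>
#include <string.h>

/*
 * Hypothetical helper: copy an address string into a fixed-size field.
 * At most INET6_ADDRSTRLEN - 1 characters are copied and the last byte is
 * always set to '\0', so dst is terminated even if src were too long.
 */
static void copy_addr_string(char dst[INET6_ADDRSTRLEN], const char *src)
{
	strncpy(dst, src, INET6_ADDRSTRLEN - 1);
	dst[INET6_ADDRSTRLEN - 1] = '\0';
}
```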
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
This is a bug I seem to have introduced in
429209f4aa, where we compare links
for changes. If a new node was added on an existing link, then it
was compared against a non-existent one in the previous configuration.
We now only compare nodes that are in both interfaces.
As I needed min() for this function, I moved it from individual
.c files into util.h so we only have one copy.
The error message was also fixed.
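A minimal sketch of "only compare nodes that are in both configurations". It also shows a util.h-style min() macro. struct node_sketch, find_node and count_changed_addrs are hypothetical and do not reflect corosync's real data structures. A new node is simply skipped instead of being compared against an entry that does not exist.

```c
#include <stddef.h>
#include <string.h>

/* util.h-style helper; as a macro it evaluates its arguments twice. */
#ifndef min
#define min(a, b) (((a) < (b)) ? (a) : (b))
#endif

#define SKETCH_MAX_LINKS 8 /* hypothetical limit */

struct node_sketch {
	unsigned int nodeid;
	size_t num_links;
	char addr[SKETCH_MAX_LINKS][64];
};

/* Look up nodeid in a configuration; NULL if the node is not there. */
static const struct node_sketch *find_node(const struct node_sketch *nodes,
    size_t n, unsigned int nodeid)
{
	for (size_t i = 0; i < n; i++) {
		if (nodes[i].nodeid == nodeid) {
			return &nodes[i];
		}
	}
	return NULL;
}

/* Count link addresses that changed for nodes present in both configs. */
static size_t count_changed_addrs(const struct node_sketch *old_nodes, size_t old_n,
    const struct node_sketch *new_nodes, size_t new_n)
{
	size_t changed = 0;

	for (size_t i = 0; i < new_n; i++) {
		const struct node_sketch *old = find_node(old_nodes, old_n, new_nodes[i].nodeid);

		if (old == NULL) {
			continue; /* new node: nothing to compare against */
		}
		for (size_t l = 0; l < min(old->num_links, new_nodes[i].num_links); l++) {
			if (strcmp(old->addr[l], new_nodes[i].addr[l]) != 0) {
				changed++;
			}
		}
	}
	return changed;
}
```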
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Instead of compiling totempg as a shared library, compile all totem code
directly into the corosync binary.
The main idea of having a totempg that could be
used in other projects was nice, but it was never really finished (and as
far as I know no project ever really used it). So at the end of the
day, we've ended up with a huge amount of problems (the need to pass new
arguments through X layers, hard debugging, ...) without any real benefit.
For a future version, we may consider revisiting the idea of splitting
totemsrp into a well-tested library without unrelated bits like
transports/ip/...
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Complete removal of NSS from the corosync tree. Most of the changes are
in the build system, and cpgverify had to be rewritten to use crc32
instead of sha1.
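The text does not say which CRC-32 variant cpgverify uses. The sketch below assumes the common reflected CRC-32 (polynomial 0xEDB88320, the one zlib's crc32() implements), computed bit by bit. crc32_sketch is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise reflected CRC-32: init 0xFFFFFFFF, poly 0xEDB88320, final XOR. */
static uint32_t crc32_sketch(const unsigned char *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int bit = 0; bit < 8; bit++) {
			/* XOR in the polynomial only when the low bit is set. */
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
		}
	}
	return crc ^ 0xFFFFFFFFu;
}
```

The standard check value for this variant is crc32_sketch("123456789", 9) == 0xCBF43926. Unlike sha1, a CRC only detects accidental corruption and gives no protection against deliberate tampering, which is enough for a verification test tool.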
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
It's just much easier to find out what is happening when a message like
parser error: /etc/corosync/corosync.conf:39: Unexpected closing brace
is logged instead of
parser error: Unexpected closing brace
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
The corosync parser is not very clever, but it is able to detect more
errors without too much code.
1. Check that the section name is not empty (just a '{' character)
2. Check that there are no extra characters after the opening bracket '{'
3. Check that there are no extra characters after or before the closing
bracket '}'
4. Check that the line is an opening section, a closing section or a
key/value pair
So the following example is reported as an error:
totem {
version: 2
}}}}}}}}}}
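The four checks above can be sketched as a single line classifier. This is a minimal illustration, not corosync's parser. The enum, the function name and the treatment of '#' comments as empty lines are assumptions.

```c
#include <ctype.h>
#include <string.h>

enum line_kind {
	LINE_EMPTY,          /* blank line or '#' comment (assumed) */
	LINE_SECTION_OPEN,   /* "name {" */
	LINE_SECTION_CLOSE,  /* "}" */
	LINE_KEY_VALUE,      /* "key: value" */
	LINE_ERROR
};

static enum line_kind classify_line(const char *line)
{
	const char *start = line;
	const char *end;
	const char *brace_open;
	const char *brace_close;
	size_t len;

	/* Trim leading and trailing whitespace: [start, end) is the content. */
	while (isspace((unsigned char)*start)) {
		start++;
	}
	end = start + strlen(start);
	while (end > start && isspace((unsigned char)end[-1])) {
		end--;
	}
	if (start == end || *start == '#') {
		return LINE_EMPTY;
	}
	len = (size_t)(end - start);
	brace_open = memchr(start, '{', len);
	brace_close = memchr(start, '}', len);

	if (brace_open != NULL) {
		const char *name_end = brace_open;

		/* Check 2: '{' must be the last character, with no '}' on the line. */
		if (brace_close != NULL || brace_open != end - 1) {
			return LINE_ERROR;
		}
		/* Check 1: the section name before '{' must not be empty. */
		while (name_end > start && isspace((unsigned char)name_end[-1])) {
			name_end--;
		}
		return (name_end == start) ? LINE_ERROR : LINE_SECTION_OPEN;
	}
	if (brace_close != NULL) {
		/* Check 3: nothing before or after a lone '}'. */
		return (len == 1) ? LINE_SECTION_CLOSE : LINE_ERROR;
	}
	/* Check 4: anything else must be a key/value pair with a non-empty key. */
	if (memchr(start, ':', len) != NULL && *start != ':') {
		return LINE_KEY_VALUE;
	}
	return LINE_ERROR;
}
```

With this classifier, the "}}}}}}}}}}" line from the example fails check 3.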
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
When the remove_whitespace function parameter is a single-character
string followed by whitespace or a colon (like "a:"), the colon is not
removed. The reason is the end condition end != start, which is valid for
an empty string but invalid in the case described above. The solution is
to check whether *end is '\0'.
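The following is a minimal sketch of the trimming logic, not corosync's remove_whitespace(). It tracks a length instead of an end pointer, so both an empty result and a one-character result like "a" from "a:" come out right without an end != start test. The set of stripped characters (space, tab, ':' and '{') is an assumption for illustration.

```c
#include <string.h>

/* Characters stripped from the end of a key (assumed set). */
static int is_trailing_junk(char c)
{
	return c == ' ' || c == '\t' || c == ':' || c == '{';
}

/* Trim leading blanks and trailing junk in place; returns the new start. */
static char *trim_key(char *string)
{
	char *start = string;
	size_t len;

	while (*start == ' ' || *start == '\t') {
		start++;
	}
	len = strlen(start);
	while (len > 0 && is_trailing_junk(start[len - 1])) {
		len--;
	}
	start[len] = '\0'; /* safe for len == 0 and len == 1 alike */
	return start;
}
```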
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Libcgroup is deprecated and no longer shipped by new distributions
(openSUSE is one example). The solution is to have a partial
implementation of the required libcgroup functionality in the corosync
code.
The patch uses a hardcoded cgroup mount point, because most systems now
use systemd, and systemd also uses a hardcoded mount point (see
https://github.com/systemd/systemd/blob/master/src/core/mount-setup.c)
The configuration option --enable-cgroup is gone, because it's no longer
needed.
Big thanks to Christine Caulfield <ccaulfie@redhat.com> for an example of
a simplified implementation of the cgroup management code primitives.
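A minimal sketch of the kind of primitive that replaces libcgroup here. It assumes the cgroup v1 layout under systemd's /sys/fs/cgroup mount point. The cpu/tasks path and the name cgroup_attach_pid are illustrative assumptions, not corosync's code. In cgroup v1, writing a PID to a cgroup's tasks file moves that task into the cgroup.

```c
#include <stdio.h>
#include <sys/types.h>

/* Assumed cgroup v1 layout under the hardcoded systemd mount point. */
#define CGROUP_MOUNT_POINT "/sys/fs/cgroup"
#define CGROUP_CPU_TASKS CGROUP_MOUNT_POINT "/cpu/tasks"

/* Write pid to a cgroup tasks file. Returns 0 on success, -1 on failure. */
static int cgroup_attach_pid(const char *tasks_path, pid_t pid)
{
	FILE *f;
	int ok;

	f = fopen(tasks_path, "w");
	if (f == NULL) {
		return -1;
	}
	ok = (fprintf(f, "%ld\n", (long)pid) > 0);
	/* stdio buffers the write, so a rejected PID shows up at fclose(). */
	if (fclose(f) != 0) {
		ok = 0;
	}
	return ok ? 0 : -1;
}
```

For example, cgroup_attach_pid(CGROUP_CPU_TASKS, getpid()) (with <unistd.h> for getpid()) would try to move the calling task into the root cpu cgroup, provided the process has permission to do so.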
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
The token_warning option is used to report information about
when the token was last received.
Signed-off-by: Chris Walker <cwalker@cray.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Add an assert for when there are no members in the token_memb structure,
so that a non-existent member is not accessed (the token should always
have at least one member).
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
... so error_reason can be fully included in the parse error message.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
A trailing zero is always added, so there is no need for a warning
about an unterminated destination string.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
This patch intends to solve the long-standing ifdown corosync problem.
The idea is to use a local socket for sending both unicast and multicast
messages if the interface is down.
Together with testing the current bind state, it's possible to keep
pretending that the old IP address still exists instead of rebinding to
localhost, which breaks a lot of things badly.
Heavily based on the work of Yu, Zou <zouyu@shiqichuban.com>; it's
basically a port of the UDP patch created by
Jan Friesse <jfriesse@redhat.com>.
(ported from needle 96354fba72)
Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
KNET requires that all links be full-mesh (this may change in the future,
but almost certainly not before knet 2.0), so enforce this in the
config.
Also avoid a potential division-by-zero error if the local node is not
fully configured.
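A minimal sketch of both checks under stated assumptions. Each node's configured links are represented as a bitmask (bit N set means link N has an address for that node), and "full mesh" means every node has exactly the same set of links. Both function names are hypothetical, and the real division in corosync is not shown in the text.

```c
#include <stddef.h>

/* 1 if every node has the same set of links as the first one, else 0. */
static int links_are_full_mesh(const unsigned int *link_masks, size_t num_nodes)
{
	for (size_t i = 1; i < num_nodes; i++) {
		if (link_masks[i] != link_masks[0]) {
			return 0;
		}
	}
	return 1;
}

/* Split a value across links without dividing by zero when the local
 * node has no links configured yet. */
static unsigned int per_link_share(unsigned int total, unsigned int num_links)
{
	return (num_links == 0) ? 0 : total / num_links;
}
```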
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
If the local host does not have a 'name' attribute and the cluster
has more than one link, then fail the validation test.
I'm open to the idea of checking all of the nodes in the nodelist
if necessary. It seems overkill, though, as each node will check its own
entry.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
There are a few things in the interface that cannot be changed on the
fly. Warn about them and tell the user that these things need to be done
in two steps and why.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
The compiler warns that the buffer may not be large enough, so check the
snprintf return value properly.
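A sketch of the "proper" snprintf check. snprintf() returns the length the full string would have had, so a result >= the buffer size means truncation, and a negative result means an output error. build_node_key is a hypothetical helper that builds a cmap-style nodelist key.

```c
#include <stdio.h>

/*
 * Build e.g. "nodelist.node.3.ring0_addr" into buf.
 * Returns 0 on success, -1 on error or truncation.
 */
static int build_node_key(char *buf, size_t len, unsigned int node_pos, const char *attr)
{
	int res;

	res = snprintf(buf, len, "nodelist.node.%u.%s", node_pos, attr);
	if (res < 0 || (size_t)res >= len) {
		return -1;
	}
	return 0;
}
```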
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
It wasn't harmful, but it generated an annoying message.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
knet sends log messages as struct knet_log_msg, not as a string
of KNET_MAX_LOG_MSG_SIZE bytes (which is only part of that structure).
So we were both losing and corrupting messages.
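A sketch of reading whole records instead of fixed-size strings. The real layout of struct knet_log_msg is not given in the text, so log_msg_sketch is a hypothetical stand-in: a message buffer followed by extra fields. Each read consumes exactly sizeof(struct) bytes and loops over short reads. That way the trailing fields are not lost and the next message does not start misaligned.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define SKETCH_LOG_MSG_SIZE 256 /* hypothetical stand-in for KNET_MAX_LOG_MSG_SIZE */

struct log_msg_sketch {
	char msg[SKETCH_LOG_MSG_SIZE];
	uint8_t subsystem;
	uint8_t msglevel;
};

/* Read exactly one record from fd. Returns 0 on success, -1 on error/EOF. */
static int read_log_msg(int fd, struct log_msg_sketch *out)
{
	char *p = (char *)out;
	size_t done = 0;

	while (done < sizeof(*out)) {
		ssize_t r = read(fd, p + done, sizeof(*out) - done);

		if (r < 0) {
			if (errno == EINTR) {
				continue;
			}
			return -1;
		}
		if (r == 0) {
			return -1; /* writer closed the pipe mid-record or before it */
		}
		done += (size_t)r;
	}
	out->msg[SKETCH_LOG_MSG_SIZE - 1] = '\0'; /* never trust the terminator */
	return 0;
}
```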
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
This patch tries to fix incorrect behaviour in the following test case:
- 3 nodes
- Node 1 is paused
- Nodes 2 and 3 detect node 1 as failed and inform CPG clients
- Node 1 is unpaused
- Node 1 clients are informed about the new membership, but not about
Node 1 being paused, so from Node 1's point of view, Nodes 2 and 3 failed
The solution is to:
- Remove the downlist master choice and always choose the local node's
downlist. For Node 1 in the example above, the downlist contains Nodes 2
and 3.
- Keep the code which informs clients about left nodes
- Use the joinlist as the authoritative source of nodes/clients which
exist in the membership
This patch doesn't break backwards compatibility.
I've walked through all the patches which changed the behavior of cpg to
ensure this patch does not break CPG behavior. The most important were:
- 058f50314c - Base. The code was significantly
changed to handle double free by splitting group_info into two structures,
cpg_pd (local node clients) and process_info (all clients). Joinlist
was
- 97c28ea756 - This patch removed
confchg_fn and made CPG sync correct
- feff0e8542 - I've tested the described
behavior without any issues
- 6bbbfcb6b4 - Added the idea of using
heuristics to choose the same downlist on all nodes. Sadly this idea
was the beginning of the problems described in
040fda8872,
ac1d79ea7c,
559d4083ed,
02c5dffa5b,
64d0e5ace0 and
b55f32fe2e
- 02c5dffa5b - Made the joinlist the
authoritative source of nodes/clients but left downlist_master_choose
as a source of information about left nodes
Long story short, this patch basically reverts the
idea of using heuristics to choose the same downlist on all nodes.
(ported from needle 9c2a97f4f9)
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
A leave message in totem is just a join message where the leaving member
is excluded from the member list and included in the fail list. It also
contains a special nodeid in the header.nodeid and system_from.nodeid
fields.
Before the "totem: Use nodeid ONLY in srp_addr" fix, most of the functions
used system_from addresses and not the nodeid, which was used only in
one specific case, for the memb_consensus_set function.
After the patch, addresses are gone and only the nodeid is used. The
result is that the leaving node's nodeid is not added to the local fail
list (my_faillist), so the node is unable to reach consensus until the
token timeout, which starts a new gather process.
The solution is to send a valid leaving node nodeid in system_from.nodeid
and handle the specific case for memb_consensus_set in memb_join_process.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
This information is useful, and with the trace log level it should not be
too irritating.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
There is a regression caused by the "totem: Use nodeid ONLY in srp_addr"
patch in the srp_addr_compare function. This function should be usable
with qsort, so it should return values less than, equal to or greater
than zero. It was, however, returning only zero or the negation of zero.
As a result, nodes were unable to reach consensus in the following test
case:
- 3 node cluster
- start nodes 1, 2, 3
- shutdown node 3
- start node 3
- shutdown node 2
- start node 2
- shutdown node 1
After these steps, nodes 2 and 3 were unable to reach consensus.
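A sketch of a qsort()-compatible comparator on nodeid. struct srp_addr_sketch is a hypothetical stand-in for srp_addr after the nodeid-only change. The comparator returns a negative, zero or positive value. It uses explicit comparisons instead of returning a - b, which could wrap for unsigned nodeids.

```c
#include <stddef.h>
#include <stdlib.h>

struct srp_addr_sketch {
	unsigned int nodeid;
};

/* Three-way comparison as required by qsort(). */
static int srp_addr_compare_sketch(const void *a, const void *b)
{
	const struct srp_addr_sketch *x = a;
	const struct srp_addr_sketch *y = b;

	if (x->nodeid < y->nodeid) {
		return -1;
	}
	if (x->nodeid > y->nodeid) {
		return 1;
	}
	return 0;
}

static void sort_srp_addrs(struct srp_addr_sketch *addrs, size_t n)
{
	qsort(addrs, n, sizeof(addrs[0]), srp_addr_compare_sketch);
}
```

A comparator that only ever returns 0 or 1 never reports "less than". qsort() may then leave lists in different orders on different nodes, which fits the failure to reach consensus described above.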
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
If the number of proc_list, failed_list or active members is too high, it
may be impossible to fit them into the message, which is allocated on the
stack, and this results in stack corruption.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Sanity checkers are used to prevent crashes caused by
accessing unallocated memory.
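A minimal sketch of such a sanity check on a received message. It assumes a hypothetical wire layout of a fixed header carrying an entry count, followed by that many fixed-size entries. The check verifies that the buffer really holds every entry the header claims before anything is read. It is written as a division so that a huge count cannot overflow a multiplication.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire layout: header, then `count` entries. */
struct msg_header_sketch {
	uint32_t count;
};

struct msg_entry_sketch {
	uint32_t nodeid;
};

/* 1 if msg_len bytes can hold the header plus all claimed entries. */
static int msg_is_sane(const void *msg, size_t msg_len)
{
	struct msg_header_sketch hdr;

	if (msg_len < sizeof(hdr)) {
		return 0;
	}
	memcpy(&hdr, msg, sizeof(hdr)); /* avoids unaligned access */
	return hdr.count <= (msg_len - sizeof(hdr)) / sizeof(struct msg_entry_sketch);
}
```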
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
To make finding the sender of incompatible messages easier, the sender's
IP is logged. Propagating the IP through the layers makes the patch
slightly larger.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
A magic number (0xC070), together with a version in every packet,
is used to detect that the other node really is
Corosync 3.x.
The endian_detector field is removed and the magic number is now
used instead.
If the magic number of a received packet differs, guessing is used to
show more about the source (guesses for Corosync 2.3+ and 2.2 are quite
reliable, for Knet and unencrypted Corosync 2.1/2.0/1.x/OpenAIS
semi-reliable, and for encrypted Corosync 2.1/2.0/1.x/OpenAIS quite
unreliable).
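A sketch of how a 16-bit magic number can double as an endianness detector. This is an assumption about the approach, not corosync's code. Only the value 0xC070 comes from the text, and the macro names and check_magic are hypothetical. A peer with the opposite byte order produces the byte-swapped value 0xC070 becomes (0x70C0) when read in host order, and anything else falls through to the guessing step.

```c
#include <stdint.h>
#include <string.h>

#define SKETCH_MAGIC         0xC070 /* value from the text */
#define SKETCH_MAGIC_SWAPPED 0x70C0 /* 0xC070 with its two bytes swapped */

enum magic_check {
	MAGIC_OK,       /* Corosync 3.x, same byte order */
	MAGIC_SWAPPED,  /* Corosync 3.x, opposite byte order */
	MAGIC_UNKNOWN   /* something else: fall back to guessing */
};

/* Classify the first two bytes of a packet, read in host byte order. */
static enum magic_check check_magic(const unsigned char *pkt, size_t len)
{
	uint16_t magic;

	if (len < sizeof(magic)) {
		return MAGIC_UNKNOWN;
	}
	memcpy(&magic, pkt, sizeof(magic));
	if (magic == SKETCH_MAGIC) {
		return MAGIC_OK;
	}
	if (magic == SKETCH_MAGIC_SWAPPED) {
		return MAGIC_SWAPPED;
	}
	return MAGIC_UNKNOWN;
}
```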
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Because totemknet always configures link0 as loopback, even
if it's not known to corosync, we need to filter it
out when returning the link status; otherwise things get misaligned
in cfg.
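A minimal sketch of the filtering step. The struct and function names are hypothetical. Link 0 is dropped when corosync has not configured it, and each entry carries its own link_id, so consumers never assume that the array index equals the link number.

```c
#include <stddef.h>

/* Hypothetical per-link status as reported by the transport. */
struct link_status_sketch {
	unsigned int link_id;
	int connected;
};

/* Copy statuses to out[], skipping an unconfigured link 0.
 * Returns the number of entries written. */
static size_t filter_link_status(const struct link_status_sketch *in, size_t n,
    int link0_configured, struct link_status_sketch *out)
{
	size_t written = 0;

	for (size_t i = 0; i < n; i++) {
		if (in[i].link_id == 0 && !link0_configured) {
			continue; /* transport-internal loopback link */
		}
		out[written++] = in[i];
	}
	return written;
}
```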
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Even if it's not used for anything else.
Also, make cfgtool show the correct link ID when links are not
contiguous.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Fix a crash in iface_get introduced a couple of commits ago.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Allow ring0_addr to be used in place of 'name' for
backwards compatibility.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Make the message more representative of what's going on.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>