Commit Graph

4160 Commits

Author SHA1 Message Date
Jan Friesse
1d2c6e4696 totemconfig: Enlarge error_string_response
... so error_reason can be fully included into parse error message.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-08-13 09:00:44 +02:00
Jan Friesse
0095b9a3cb ipc_glue: Fix strncpy in pid_to_name function
Trailing zero is always added so there is no need to have a warning
about unterminated destination string.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-08-13 09:00:43 +02:00
Jan Friesse
f286388275 cmap: Fix strncpy warning in cmap_iter_next
cmap_iter_next, in contrast to its icmap counterpart, copies the key name
into user preallocated space. In the worst case, the key name may be
CMAP_KEYNAME_MAXLEN long, so cmap_iter_next then needs CMAP_KEYNAME_MAXLEN
plus one additional byte to store the zero. strncpy was copying only
CMAP_KEYNAME_MAXLEN characters, so there was a possibility of an
unterminated string.

The patch solves this by using memcpy and always adding a trailing zero.
The documentation was improved to suggest a minimum keyname buffer size
of CMAP_KEYNAME_MAXLEN + 1.

Also, sam and quorumtool were using too short a buffer, so they are fixed too.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-08-13 09:00:41 +02:00
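The copy described in this commit message can be sketched as follows. This is a minimal illustration, not the actual cmap_iter_next code: `EXAMPLE_KEYNAME_MAXLEN` and `copy_key_name` are hypothetical names, and the limit value is chosen only for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CMAP_KEYNAME_MAXLEN; the value is illustrative. */
#define EXAMPLE_KEYNAME_MAXLEN 255

/*
 * Copy a key name of at most EXAMPLE_KEYNAME_MAXLEN characters into dst.
 * The caller must provide at least EXAMPLE_KEYNAME_MAXLEN + 1 bytes.
 * Returns 0 on success, -1 if the key name is too long.
 */
static int copy_key_name(char *dst, const char *src)
{
    size_t len = strlen(src);

    if (len > EXAMPLE_KEYNAME_MAXLEN) {
        return -1;
    }
    memcpy(dst, src, len);
    dst[len] = '\0';    /* the terminator is always written */
    return 0;
}
```

A caller would declare `char key[EXAMPLE_KEYNAME_MAXLEN + 1];` so that a maximum-length name still leaves room for the terminating zero.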
Jan Friesse
f576ad6388 util: Fix strncpy in setcs_name_t function
Trailing zero is always added so there is no need to have a warning
about unterminated destination string.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-08-13 09:00:39 +02:00
Jan Friesse
844a76e775 totemknet: Free instance on failure exit
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-08-13 09:00:35 +02:00
Jan Friesse
afdc405512 spec: Add explicit gcc build requirement
Also remove the %clean macro, which has not been needed for ages.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2018-08-09 16:57:43 +02:00
Chris Walker
bde247677b Add option for quiet operation to corosync-cmapctl
Signed-off-by: Chris Walker <cwalker@cray.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-08-09 10:05:48 +02:00
Jan Friesse
31268cc744 totemudpu: Pass correct param to totemip_nosigpipe
Fixes compilation on (at least) FreeBSD.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2018-07-12 16:29:15 +02:00
Bin Liu
96b4bd1660 totemudpu: Add local loop support
This patch intends to solve the long-standing corosync ifdown problem.
The idea is to use a local socket for sending both unicast and multicast
messages if the interface is down.

Together with testing the current bind state, this makes it possible to
keep pretending the old IP address exists instead of rebinding to
localhost, which breaks a lot of things badly.

Heavily based on the work of Yu, Zou <zouyu@shiqichuban.com>; it is
basically a port of the UDP patch created by
Jan Friesse <jfriesse@redhat.com>.

(ported from needle 96354fba72)

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-07-12 15:43:03 +02:00
Christine Caulfield
a471bab798 config: Fail config validation if not all nodes have all links
KNET requires that all links be full-mesh (this may change in the future
but almost certainly not before knet 2.0), so enforce this in the
config.

Also avoid a potential div-by-0 error if the local node is not fully
configured either.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-07-03 12:38:02 +02:00
Christine Caulfield
d1db8c2851 config: Enforce use of 'name' node attribute in multi-link clusters
If the local host does not have a 'name' attribute and the cluster
has more than one link then fail the validation test.

I'm open to the idea of checking all of the nodes in the nodelist
if necessary. It seems overkill as each node will check its own
entry though.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-07-03 12:37:45 +02:00
Christine Caulfield
429209f4aa totemconfig: Check for things that cannot be changed on the fly
There are a few things in the interface that cannot be changed on the
fly. Warn about them and tell the user that these things need to be done
in two steps and why.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-07-02 09:54:31 +02:00
Jan Friesse
cc81696ff5 Fix snprintf warnings
The compiler warns that the buffer may not be large enough, so check
the snprintf return value properly.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-07-02 08:08:33 +02:00
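Checking the snprintf return value, as this commit message describes, usually means treating both a negative result and a result at or beyond the buffer size as failure. A minimal sketch, assuming a hypothetical helper `build_path` that is not part of corosync:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format "<dir>/<file>" into dst.
 * Returns 0 on success, -1 if snprintf failed or the output was truncated.
 */
static int build_path(char *dst, size_t dst_size, const char *dir, const char *file)
{
    int res = snprintf(dst, dst_size, "%s/%s", dir, file);

    /* snprintf returns the length it wanted to write, excluding the zero. */
    if (res < 0 || (size_t)res >= dst_size) {
        return -1;
    }
    return 0;
}
```

The `>=` comparison matters: a return value equal to the buffer size already means the last character was dropped to make room for the terminator.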
Jan Friesse
fc45968223 init: Use existing env variable from sysconf
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-07-02 08:08:23 +02:00
Jan Friesse
0031822c68 upstart: Remove notifyd upstart unit
Hopefully this is last upstart bit.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-07-02 08:07:15 +02:00
Christine Caulfield
137b31397c knet: Don't try to create loopback interface twice
It wasn't harmful, but it generated an annoying message

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-07-02 08:00:36 +02:00
Christine Caulfield
5dda71ae29 knet: Fix knet log buffer size
knet sends log messages as struct knet_log_msg, not a string
of KNET_MAX_LOG_MSG_SIZE (which is only part of that structure).
So we were both losing and corrupting messages.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-07-02 08:00:15 +02:00
Jan Friesse
23e17953fe cpg: Inform clients about left nodes during pause
Patch tries to fix incorrect behaviour during following test-case:
- 3 nodes
- Node 1 is paused
- Nodes 2 and 3 detect node 1 as failed and inform CPG clients
- Node 1 is unpaused
- Node 1 clients are informed about new membership, but not about Node 1
  being paused, so from Node 1 point-of-view, Node 2 and 3 failure

Solution is to:
- Remove downlist master choose and always choose local node downlist.
  For Node 1 in example above, downlist contains Node 2 and 3.
- Keep code which informs clients about left nodes
- Use joinlist as an authoritative source of nodes/clients which exist
  in membership

This patch doesn't break backwards compatibility.

I've walked through all the patches which changed behavior of cpg to ensure
patch does not break CPG behavior. Most important were:
- 058f50314c - Base. Code was significantly
  changed to handle double free by split group_info into two structures
  cpg_pd (local node clients) and process_info (all clients). Joinlist
  was
- 97c28ea756 - This patch removed
  confchg_fn and made CPG sync correct
- feff0e8542 - I've tested described
  behavior without any issues
- 6bbbfcb6b4 - Added idea of using
  heuristics to choose same downlist on all nodes. Sadly this idea
  was beginning of the problems described in
  040fda8872,
  ac1d79ea7c,
  559d4083ed,
  02c5dffa5b,
  64d0e5ace0 and
  b55f32fe2e
- 02c5dffa5b - Made joinlist as
  authoritative source of nodes/clients but left downlist_master_choose
  as a source of information about left nodes

Long story made short. This patch basically reverts
idea of using heuristics to choose same downlist on all nodes.

(ported from needle 9c2a97f4f9)

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-04-30 14:37:20 +02:00
Chris Lamb
82b81990aa man: Make the manpages reproducible
Whilst working on the Reproducible Builds effort [0], we noticed
that corosync could not be built reproducibly.

This is because, whilst it uses SOURCE_DATE_EPOCH[1], the output
varies depending on the current timezone.

(The LC_ALL is not needed as we only use %Y-%m-%d)

This was originally filed in Debian as #896441 [2].

 [0] https://reproducible-builds.org/
 [1] https://reproducible-builds.org/specs/source-date-epoch/
 [2] https://bugs.debian.org/896441

Signed-off-by: Chris Lamb <lamby@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-04-25 15:24:19 +02:00
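The timezone dependence described in this commit message can be illustrated in C: formatting the same epoch value with local time can yield different dates on different machines, while UTC always gives the same one. This is only a sketch of the idea, not the change made by the patch; `format_build_date` is a hypothetical name.

```c
#include <stddef.h>
#include <time.h>

/*
 * Format epoch seconds (for example the value of SOURCE_DATE_EPOCH)
 * as YYYY-MM-DD in UTC, so the result does not depend on the timezone.
 * Returns 0 on success, -1 on failure.
 * Note: gmtime() uses static storage and is not thread-safe.
 */
static int format_build_date(time_t epoch, char *dst, size_t dst_size)
{
    const struct tm *tm_utc = gmtime(&epoch);

    if (tm_utc == NULL) {
        return -1;
    }
    if (strftime(dst, dst_size, "%Y-%m-%d", tm_utc) == 0) {
        return -1;
    }
    return 0;
}
```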
Jan Friesse
e45bbcc92a totemsrp: Fix leave message regression
Leave message in totem is just join message where leaving member is
excluded from member list and included in fail list. It also contains
special nodeid in header.nodeid and system_from.nodeid fields.

Before "totem: Use nodeid ONLY in srp_addr" fix, most of the functions
were using system_from addresses and not nodeid, which was used only in
one specific case for memb_consensus_set function.

After the patch, addresses are gone and only nodeid is used. Result is,
that leaving node nodeid is not added into local fail list
(my_faillist) so node is unable to reach consensus till token timeout,
which starts new gather process.

Solution is to send valid leaving node nodeid in system_from.nodeid and
handle specific case for memb_consensus_set in memb_join_process.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-04-23 17:46:05 +02:00
Jan Friesse
dc590159f5 totemsrp: Log proc/fail lists in memb_join_process
This information is useful, and at trace log level it should not be
too irritating.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-04-23 17:45:51 +02:00
Jan Friesse
9b3782e48e totemsrp: Fix srp_addr_compare
There is a regression caused by the "totem: Use nodeid ONLY in srp_addr"
patch in the srp_addr_compare function. This function should be usable
with qsort, so it should return values less than, equal to or greater
than zero. It was however returning only zero or negation of a zero. As
a result, nodes were unable to reach consensus in the following test case:
- 3 node cluster
- start nodes 1, 2, 3
- shutdown node 3
- start node 3
- shutdown node 2
- start node 2
- shutdown node 1

After these steps, nodes 2 and 3 were unable to reach consensus.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-04-23 17:45:29 +02:00
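A comparator usable with qsort has to return a negative, zero, or positive value depending on the ordering of its arguments. A minimal sketch, assuming an address that has been reduced to a single node ID; `struct example_addr` and `example_addr_compare` are hypothetical names, and this is not the actual srp_addr_compare code.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an address that carries only a node ID. */
struct example_addr {
    unsigned int nodeid;
};

/* qsort comparator: negative if a < b, zero if equal, positive if a > b. */
static int example_addr_compare(const void *a, const void *b)
{
    const struct example_addr *x = a;
    const struct example_addr *y = b;

    if (x->nodeid < y->nodeid) {
        return -1;
    }
    if (x->nodeid > y->nodeid) {
        return 1;
    }
    return 0;
}
```

Explicit comparisons are used instead of subtracting the two IDs, because a subtraction of unsigned values can wrap around and report the wrong order.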
Ferenc Wágner
21b80818c5 tools: don't distribute what we can easily make
Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-04-23 11:30:14 +02:00
Fabio M. Di Nitto
94dff3b765 Drop all references to SECURITY file
File was removed by 6bdf0962ad.
Patch fixes master branch build again.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-04-23 08:05:24 +02:00
Jan Friesse
6bdf0962ad SECURITY: Remove SECURITY file
Basically no information from SECURITY file is valid.

Library interface and related uidgid are better described in manpages.

LibNSS is not directly used any longer.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-04-20 15:30:55 +02:00
Ferenc Wágner
67c644e69b NSS_NoDB_Init: the parameter is reserved, must be NULL
Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-04-20 12:19:22 +02:00
Ferenc Wágner
83c3f620bb Fix typo: defualt -> default
Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-04-20 12:05:04 +02:00
Ferenc Wágner
baece74c39 Fix typo: sucesfully -> successfully
Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-04-20 12:04:49 +02:00
Jan Friesse
ccb2290f84 totemsrp: Check join and leave msg length
If the number of proc_list, failed_list or active members is too high,
it may be impossible to fit them into the message, which is allocated
on the stack, and this results in stack corruption.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-04-12 15:25:38 +02:00
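The kind of length check this commit message describes can be sketched as a test performed before any list entry is copied into a fixed-size buffer. Everything here is an assumption made for illustration: `EXAMPLE_MSG_MAX`, `lists_fit`, and the message layout (a header followed by two lists of equally sized entries) are hypothetical and do not reflect the real totemsrp structures.

```c
#include <stddef.h>

/* Hypothetical size of a message buffer allocated on the stack. */
#define EXAMPLE_MSG_MAX 1024

/*
 * Return 1 if a header plus proc_entries and failed_entries list items
 * of entry_len bytes each fit into EXAMPLE_MSG_MAX bytes, 0 otherwise.
 * Divisions are used instead of multiplications to avoid overflow.
 */
static int lists_fit(size_t header_len, size_t entry_len,
                     size_t proc_entries, size_t failed_entries)
{
    size_t avail;

    if (header_len > EXAMPLE_MSG_MAX) {
        return 0;
    }
    if (entry_len == 0) {
        return 1;
    }
    avail = (EXAMPLE_MSG_MAX - header_len) / entry_len;
    if (proc_entries > avail) {
        return 0;
    }
    return failed_entries <= avail - proc_entries;
}
```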
Jan Friesse
c139255669 totemsrp: Implement sanity checks of received msgs
Sanity checkers are used to prevent crashing because of
accessing unallocated memory.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-04-12 15:25:33 +02:00
Rytis Karpuška
aa62c2c028 cpg: Handle fragmented message sending interrupt
It turns out that there are some legitimate cases where fragmented
messages might be interrupted during sending (e.g. CS_ERR_TRY_AGAIN or
as in my case: CS_ERR_INTERRUPT). This creates a situation where
LIBCPG_PARTIAL_FIRST is sent multiple times before receiving
LIBCPG_PARTIAL_LAST.

Solution is to drop incomplete message and start assembly of new message
as libcpg should have reported error during sending of that
incomplete message.

Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-04-11 18:40:07 +02:00
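The reassembly rule from this commit message (a repeated first fragment discards the incomplete message and starts a new one) can be sketched as a small state machine. The names `example_frag`, `example_assembly` and `example_feed`, the buffer size, and the return codes are all hypothetical; this is not the cpg implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fragment types, modelled on first/continuation/last. */
enum example_frag { EXAMPLE_FIRST, EXAMPLE_CONT, EXAMPLE_LAST };

struct example_assembly {
    char buf[256];
    size_t len;
    int active;     /* non-zero while a message is being assembled */
};

/*
 * Feed one fragment. Returns 1 when a complete message is in a->buf,
 * 0 when more fragments are needed, -1 when the fragment was dropped.
 */
static int example_feed(struct example_assembly *a, enum example_frag type,
                        const char *data, size_t data_len)
{
    if (type == EXAMPLE_FIRST) {
        /* A new first fragment discards any incomplete message. */
        a->len = 0;
        a->active = 1;
    } else if (!a->active) {
        return -1;  /* continuation without a first fragment */
    }
    if (data_len > sizeof(a->buf) - a->len) {
        a->len = 0;
        a->active = 0;
        return -1;  /* would overflow the assembly buffer */
    }
    memcpy(a->buf + a->len, data, data_len);
    a->len += data_len;
    if (type == EXAMPLE_LAST) {
        a->active = 0;
        return 1;
    }
    return 0;
}
```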
Jan Friesse
69857efb5b totem: Display IP of sender
To make finding victim of incompatible messages easier, IP of sender is
logged. Propagating IP in layers makes patch slightly larger.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-03-16 13:58:15 +01:00
Jan Friesse
0c509a25a7 totemsrp: Add magic and version into header
Magic number (0xC070) together with version in every packet
is used for detecting that other node is really
Corosync 3.x.

Endian_detector field is removed and magic number is now
used instead.

If received packet magic number differs, guessing is used to show more
about the source (Corosync 2.3+, 2.2 are quite reliable, Knet and
unencrypted Corosync 2.1/2.0/1.x/OpenAIS are semi-reliable and encrypted
Corosync 2.1/2.0/1.x/OpenAIS are quite unreliable).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-03-16 13:57:55 +01:00
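One plausible way a magic number can replace a separate endianness field is to compare the received value against both the native and the byte-swapped form. The sketch below only illustrates that idea; the magic value 0xC070 is taken from the commit message, while `EXAMPLE_MAGIC`, `swap16` and `classify_magic` are hypothetical and the real totemsrp checks may work differently.

```c
#include <stdint.h>

/* Magic value quoted in the commit message. */
#define EXAMPLE_MAGIC 0xC070

/* Swap the two bytes of a 16-bit value. */
static uint16_t swap16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * Classify a received magic field.
 * Returns 0 for a native-endian match, 1 for a byte-swapped match
 * (sender has the opposite endianness), -1 for an unknown sender.
 */
static int classify_magic(uint16_t magic)
{
    if (magic == EXAMPLE_MAGIC) {
        return 0;
    }
    if (swap16(magic) == EXAMPLE_MAGIC) {
        return 1;
    }
    return -1;
}
```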
Christine Caulfield
066525efd3 knet: Fix display of links with unconfigured link0
Because totemknet always configures link0 as loopback even
if it's not known to corosync, we need to filter it
out when returning the link status, as things get misaligned
in cfg.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-03-16 13:11:13 +01:00
Jan Friesse
b3f3a1df26 main: Set errno before calling of strtol
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-03-02 17:29:22 +01:00
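strtol reports range errors through errno but never clears it, so errno has to be set to zero before the call for the check afterwards to be meaningful. A minimal sketch of that pattern; `parse_long` is a hypothetical helper, not the code changed by this commit.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a base-10 long from str.
 * Returns 0 and stores the value in *out on success, -1 on any error
 * (out of range, no digits, or trailing characters).
 */
static int parse_long(const char *str, long *out)
{
    char *end;
    long val;

    errno = 0;  /* a stale errno must not be mistaken for a strtol failure */
    val = strtol(str, &end, 10);
    if (errno != 0 || end == str || *end != '\0') {
        return -1;
    }
    *out = val;
    return 0;
}
```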
Jan Friesse
4ec3d590fa quorumtool: Don't set our_flags without v_handle
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-03-02 17:29:20 +01:00
Jan Friesse
db069ebd72 sam_test_agent: Remove unused assignment
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-03-02 17:29:18 +01:00
Jan Friesse
883dbeb953 blackbox: Quote subshell result properly
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-03-02 17:29:16 +01:00
Jan Friesse
e72b4fee62 init: Quote subshell result properly
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-03-02 17:29:11 +01:00
Christine Caulfield
d2876abed3 cfgtool: Don't assume link ID is a single char
For the moment link-ids are a single digit, but that could change and
the tools shouldn't be quite so fragile. So parse the interface_name
properly by looking for the space between the linkID and the IP.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-03-01 16:09:46 +01:00
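Parsing by locating the separating space, rather than assuming a one-character link ID, can be sketched as below. The input shape "<linkid> <address>" is an assumption based on the commit message, and `split_link_name` is a hypothetical helper, not the cfgtool code.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Split a string of the assumed form "<linkid> <address>" at the first space.
 * Stores the numeric link ID in *link_id and returns a pointer to the
 * address part, or NULL if the string does not have the expected shape.
 */
static const char *split_link_name(const char *name, long *link_id)
{
    const char *space = strchr(name, ' ');
    char *end;
    long id;

    if (space == NULL || space == name) {
        return NULL;
    }
    id = strtol(name, &end, 10);
    if (end != space) {
        return NULL;    /* something other than digits before the space */
    }
    *link_id = id;
    return space + 1;
}
```

With this approach a two-digit link ID such as "12 192.168.1.5" is handled the same way as a single-digit one.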
Christine Caulfield
2c20590d16 knet: Always use link0 for loopback
Even if it's not used for anything else.

Also, make cfgtool show the correct link ID when links are not
contiguous

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-03-01 14:23:20 +01:00
Christine Caulfield
111bfbc11d totem: Fix debug warnings printed by knet
Fix crash introduced a couple of commits ago in iface_get

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-03-01 14:22:22 +01:00
Christine Caulfield
f5871c6b4c config: Allow use of ring0_addr
Allow ring0_addr to be used in place of 'name' for
backwards compatibility

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-03-01 14:21:37 +01:00
Christine Caulfield
7a639d1b62 config: Update message when local host isn't found
Make the message more representative of what's going on.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-03-01 14:20:00 +01:00
Christine Caulfield
386d710ed1 cfg: Fix cfg_get_node_addrs so that DLM works
Also update copyright dates

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-03-01 14:19:45 +01:00
Christine Caulfield
f5b690bd96 totem: Return interface count correctly
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-03-01 14:19:12 +01:00
Christine Caulfield
fc8580bdbf totem: Use nodeid ONLY in srp_addr
This shrinks the srp_addr (and consequently every packet sent by
corosync) so that instead of containing loads of IP addresses to
identify a node, it just sends the nodeid.

This then allows us to make ring0 optional and replaceable when running
knet.

It also means that we need some other way of identifying the local
node in corosync.conf, so the nodelist.node.name entry is now mandatory
and is mapped to the local host using the same algorithm as used in
cman.

This code needs LOTS of testing as it touches a huge amount of totemsrp
and totemconfig.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-03-01 14:18:51 +01:00
Fabio M. Di Nitto
6f784804fe [rpm] use rpm macros to identify build distro
thanks Honza for spotting it

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-02-14 09:48:01 +01:00
Fabio M. Di Nitto
30c7f3319f [rpm] fixup corosync.spec.in to build on opensuse
- move dbus-devel and nss-devel BuildRequires to file based dependency.
  Those 2 BR have different names in OpenSUSE vs Fedora/RHEL/Centos.
  This is kind of controversial as most distributions prefer a package
  based build dependency, but the rpm version that supports
  BuildRequires: foo || bar
  is only available in rawhide and tumbleweed (aka no stable releases
  are shipping it yet).
  In order to build rpms in CI and have some level of flexibility
  with upstream spec file, we need to compromise a bit.

- add explicit --docdir
  OpenSUSE does not ship docs in the normal dir and their rpm macro
  does not appear to set it for us.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-02-14 09:22:27 +01:00
Rytis Karpuška
105f3ae98c totempg: Fix corrupted messages
Commit 899cb29983 changed copy_len
to iovec[i].iov_len, assuming that
copy_len is always the same as iovec[i].iov_len under those
circumstances, but it missed the possibility of a small message being
partly put at the end of a packet, which cuts this message in two parts
and therefore makes copy_len not equal to iovec[i].iov_len.

This is a revert of 899cb29983.

Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-02-09 17:38:05 +01:00