2100 Commits

Author SHA1 Message Date
liangxin1300
f0e1eaff2d totemconfig: validate totem.transport value
Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2020-09-03 16:00:31 +02:00
Christine Caulfield
5f71445be0 config: Allow reconfiguration of crypto options
Needs new knet crypto API.

If it's not available, then fall back to the old
API and forbid changing crypto while running.

To avoid us being dependent on the leader node, each
node sends its own crypto_reconfig_phase messages so
we can guarantee that the reconfiguration always completes
on each node.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2020-07-09 16:54:16 +02:00
Christine Caulfield
f8b63083e1 config: Fix crash when a reload fails twice
Have string values stored in char arrays in totem_config
so we don't get into a mess with the pointers.

Also remove vsftype (which hasn't been used since corosync 1)

Use strncpy even though we know the string is fine. Keep covscan happy

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2020-04-24 16:27:18 +02:00
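The approach described in f8b63083e1 (string values held in fixed char arrays and copied with strncpy) might look roughly like the sketch below. The structure, field name and buffer size are hypothetical and are not taken from totem_config.

```c
#include <string.h>

#define EXAMPLE_VALUE_LEN 16 /* hypothetical buffer size */

/* Hypothetical stand-in for a config structure that owns its strings. */
struct example_totem_config {
    char link_mode[EXAMPLE_VALUE_LEN];
};

/*
 * Copy the value into the structure instead of keeping a pointer to it,
 * so a failed reload cannot leave a dangling or shared pointer behind.
 * strncpy is bounded and the last byte is always forced to NUL.
 */
void example_set_link_mode(struct example_totem_config *cfg, const char *value)
{
    strncpy(cfg->link_mode, value, sizeof(cfg->link_mode) - 1);
    cfg->link_mode[sizeof(cfg->link_mode) - 1] = '\0';
}
```

Because the array lives inside the structure, copying the whole structure with a plain assignment also copies the string, which is what makes a later "commit by assignment" step safe.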
Christine Caulfield
4ddc96cd4e config: Don't free pointers used by transports
Reload failed for UDP[U] because they had saved pointers
to the interfaces[] array, so memcpy into that rather than
re-allocate it.

Also, move the check for different IP address families so
it also gets run at reload time.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2020-04-24 16:27:09 +02:00
Christine Caulfield
7cb539e2e3 config: don't reload vquorum if reload fails
Fix an 'error: success' style message by propagating error_string
back down the stack.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2020-04-24 16:27:01 +02:00
Christine Caulfield
600072ef38 cfg: Improve error return to cfgtool -R
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2020-04-24 16:26:54 +02:00
Christine Caulfield
f078fff6eb config: Reorganise the config system
To be more reliable & maintainable

The basic plan here is to fix reloads to be more stable
using read/parse/verify/build/commit stages, so that any errors
will not leave corosync in an unstable state. This should
also make the code more maintainable as currently the verify/commit
stages are horribly intertwined.

Also:
- Fix local_node_pos not being updated in the new map during validation
  (broke adding and removing new nodes in the middle of the list).
- Fix reconfiguration so that nodes are indexed by nodeid and not their
  position in the list. This is an old bug that's just been carried
  over.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2020-04-24 16:26:44 +02:00
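The read/parse/verify/build/commit idea from f078fff6eb can be illustrated with a much-reduced sketch. Everything here (the structure, the one-line "config format", the 100 ms lower bound) is an assumption made for illustration and does not mirror corosync's actual code; the point is only that all fallible work happens on a scratch copy and the live configuration changes in one final step.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified configuration; not corosync's real structure. */
struct example_config {
    unsigned int token_timeout_ms;
    char cluster_name[32];
};

/* Read + parse: fill a scratch copy only, the live config is untouched. */
static bool example_parse(const char *text, struct example_config *scratch)
{
    return sscanf(text, "%31s %u", scratch->cluster_name,
        &scratch->token_timeout_ms) == 2;
}

/* Verify: check the scratch copy on its own (the limit is made up). */
static bool example_verify(const struct example_config *scratch)
{
    return scratch->token_timeout_ms >= 100;
}

/* Build happens in scratch; commit is a single structure assignment. */
bool example_reload(struct example_config *live, const char *text)
{
    struct example_config scratch;

    memset(&scratch, 0, sizeof(scratch));
    if (!example_parse(text, &scratch) || !example_verify(&scratch)) {
        return false; /* live config is left exactly as it was */
    }
    *live = scratch;
    return true;
}
```

A caller would invoke example_reload(&live, "mycluster 3000") and, on a false return, keep running with the previous values.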
Jan Friesse
1777d9992c Revert "totemip: compare sin6_scope_id and interface_num"
This reverts commit efd34df531 to make
master compile after revert of 934c47ed43.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2020-04-22 13:30:36 +02:00
Jan Friesse
cd6cc90a6f Revert "totemip: Add support for sin6_scope_id"
This reverts commit 934c47ed43 which is
causing protocol incompatibility in needle. Master seems to be not
affected, but it needs more checking.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2020-04-22 13:30:19 +02:00
Christine Caulfield
c631951ef5 icmap: icmap_init_r() leaks if trie_create() fails
Thanks to Coverity for finding this

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2020-03-26 14:42:41 +01:00
Jan Friesse
ca320beac2 votequorum: set wfa status only on startup
Previously, reloading the configuration with wait_for_all enabled resulted
in wait_for_all_status being set, which set cluster_is_quorate to 0 but
didn't inform the quorum service, so votequorum and quorum information
could get out of sync.

An example is a 1 node cluster which is extended to 3 nodes. The quorum
service reports the cluster as quorate (incorrect) and votequorum as
not-quorate (correct). Similar behavior happens when extending a cluster in
general, but some configurations are less incorrect (3->4).

Discussed solution was to inform quorum service but that would mean
every reload would cause loss of quorum until all nodes would be seen
again.

Such behaviour is consistent but seems to be a bit too strict.

Proposed solution sets wait_for_all_status only on startup and
doesn't touch it during reload.

This solution fulfills requirement of "cluster will be quorate for
the first time only after all nodes have been visible at least
once at the same time." because node clears wait_for_all_status only
after it sees all other nodes or joins cluster which is quorate. It also
solves problem with extending cluster, because when cluster becomes
unquorate (1->3) wait_for_all_status is set.

The added assert only ensures that I haven't missed any case where a
quorate cluster may become unquorate.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2020-03-24 14:13:32 +01:00
Jan Friesse
0c16442f2d votequorum: Change check of expected_votes
Previously the value of new expected_votes was checked so that the newly
computed quorum value was in the interval <total_votes / 2, total_votes>.
The upper bound prevented the cluster from becoming unquorate, but the
lower check was almost useless because it allowed expected_votes to be
changed so that it is smaller than total_votes.

Solution is to check if expected_votes is greater than or equal to
total_votes and, for a quorate cluster only, check that the cluster doesn't
become unquorate (for an unquorate cluster one can set the upper range
freely, as it is perfectly possible when using the config file).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2020-03-13 09:06:55 +01:00
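The check described in 0c16442f2d can be sketched as follows. The function names are hypothetical, and the quorum formula (simple majority, expected_votes / 2 + 1) is an assumption; votequorum's real computation has additional modes that are ignored here.

```c
#include <stdbool.h>

/* Assumed simple-majority quorum; real votequorum has more modes. */
static unsigned int example_quorum(unsigned int expected_votes)
{
    return expected_votes / 2 + 1;
}

/*
 * Accept a new expected_votes value only if
 *  - it is not smaller than the votes currently present, and
 *  - a cluster that is quorate now stays quorate afterwards.
 * An unquorate cluster may raise expected_votes freely.
 */
bool example_expected_votes_ok(unsigned int new_expected_votes,
    unsigned int total_votes, bool cluster_is_quorate)
{
    if (new_expected_votes < total_votes) {
        return false;
    }
    if (cluster_is_quorate &&
        total_votes < example_quorum(new_expected_votes)) {
        return false;
    }
    return true;
}
```

With 3 votes present in a quorate cluster, this sketch accepts 3, 4 or 5 and rejects 2 (below total_votes) and 6 (quorum would rise to 4).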
Jan Friesse
35662dd0ec main: Add schedmiss timestamp into message
This is useful for matching schedmiss event in stats map with logged
event.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2020-02-27 08:37:35 +01:00
liangxin1300
efd34df531 totemip: compare sin6_scope_id and interface_num
When a user configures a specific interface such as a VLAN
with the same IPv6 link-local address, Corosync should
compare sin6_scope_id with interface_num to make sure it gets
the right interface to bind.

Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2020-02-21 15:46:22 +01:00
Jan Friesse
38d1d10d39 totemip: Remove unused totemip_copy_endian_convert
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2020-02-17 17:31:55 +01:00
Jan Friesse
934c47ed43 totemip: Add support for sin6_scope_id
sin6_scope_id was not present in totemip structure, making it impossible
to use link-local IPv6 addresses.

Patch adds sin6_scope_id and changes convert/copy functions to use it
(formally also comparator functions should be changed, but it seems to
cause more harm and it is not really needed).

This makes corosync work with link-local addresses fine for both UDPU
and Knet transport as long as interface specification is used (so
fe80::xxxx:xxxx:xxxx:xxxx%eth0).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2020-02-17 17:31:42 +01:00
Jan Friesse
720a892751 cfgtool: Improve link status display
Totemknet is enhanced to use the 'n' character for localhost without
adding status, because it is safe to expect that the localhost link is
always connected. corosync-cfgtool is enhanced to properly decode the 'n',
'?' and 'd' characters and display their meaning for extended status.
Special characters are also documented in the man page.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2020-02-12 13:08:25 +01:00
Hideo Yamauchi
0143ee9a2f totemknet: Change the initial value of the status
Signed-off-by: Hideo Yamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2020-02-10 16:41:22 +01:00
Jan Friesse
ebd05fa008 stats: Use nanoseconds from epoch for schedmiss
Using monotonic time is not working because it doesn't have to match
time from epoch.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2020-01-23 17:58:41 +01:00
Christine Caulfield
48b6894ef4 stats: Add stats for scheduler misses
This patch adds a stats.schedmiss.* set of entries that
are a record of the last 10 times corosync was not scheduled
in time.

These entries are kept in reverse order (so stats.schedmiss.0.* is
always the latest one kept) and the values, including the timestamp,
are in milliseconds.

It's also possible to use a cmap tracker to follow these events, which
might be useful.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2020-01-22 17:06:10 +01:00
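The "last 10 events, newest at index 0" layout described in 48b6894ef4 could be kept with a small fixed array like the one below. This is only a sketch: the type and function names are hypothetical, and the units of the fields are deliberately left open.

```c
#include <stdint.h>
#include <string.h>

#define EXAMPLE_MAX_MISS 10 /* number of events kept, as in the message */

/* Hypothetical record of one scheduling miss. */
struct example_miss {
    uint64_t timestamp;
    uint64_t delay;
};

struct example_miss_log {
    struct example_miss entries[EXAMPLE_MAX_MISS];
    unsigned int count; /* number of valid entries, at most 10 */
};

/* Insert a new event at index 0, pushing older ones towards the end. */
void example_miss_record(struct example_miss_log *log,
    uint64_t timestamp, uint64_t delay)
{
    /* Shift entries 0..8 to 1..9; the oldest entry falls off the end. */
    memmove(&log->entries[1], &log->entries[0],
        sizeof(log->entries[0]) * (EXAMPLE_MAX_MISS - 1));
    log->entries[0].timestamp = timestamp;
    log->entries[0].delay = delay;
    if (log->count < EXAMPLE_MAX_MISS) {
        log->count++;
    }
}
```

A zero-initialised struct example_miss_log is an empty log; after any number of calls, entries[0] is the most recent event.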
Jan Friesse
8ce65bf951 votequorum: Reflect runtime change of 2Node to WFA
When 2Node mode is set, WFA is also set unless WFA is configured
explicitly. This behavior was not reflected on runtime change, so
restarted corosync behavior was different (WFA not set). Also, when
a cluster is reduced from 3 nodes to 2 nodes during runtime, WFA was not
set, which may result in two quorate partitions.

Solution is to set WFA depending on 2Node when WFA
is not explicitly configured.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2020-01-21 16:19:49 +01:00
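The rule stated in 8ce65bf951 (WFA follows 2Node unless WFA was configured explicitly) reduces to a small decision function. The structure and names below are hypothetical and only illustrate the rule; they are not votequorum's internal flags.

```c
#include <stdbool.h>

/* Hypothetical view of the relevant settings after a (re)load. */
struct example_vq_settings {
    bool two_node;       /* 2Node mode is set */
    bool wfa_explicit;   /* wait_for_all appears in the configuration */
    bool wfa_configured; /* its configured value, valid if wfa_explicit */
};

/* Effective WFA: the explicit value wins, otherwise it mirrors 2Node. */
bool example_effective_wfa(const struct example_vq_settings *s)
{
    if (s->wfa_explicit) {
        return s->wfa_configured;
    }
    return s->two_node;
}
```

Evaluating the same function at startup and on every runtime change is what keeps a reloaded daemon and a restarted daemon in agreement.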
Hideo Yamauchi
9fda4dc6ac cpg: Change downlist log level
Signed-off-by: Hideo Yamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2020-01-09 12:40:32 +01:00
Jan Friesse
89b0d62f8b stats: Check return code of stats_map_get
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2019-11-28 09:44:45 +01:00
Jan Friesse
35c312f810 votequorum: Assert copied strings length
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2019-11-28 09:44:44 +01:00
Jan Friesse
29109683cf totemknet: Assert strcpy length
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2019-11-28 09:44:44 +01:00
Jan Friesse
0c118d8ff4 totemknet: Check result of fcntl O_NONBLOCK call
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2019-11-28 09:44:44 +01:00
Jan Friesse
a24cbad590 totemconfig: Initialize warnings variable
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2019-11-28 09:44:44 +01:00
Jan Friesse
74eed54a7f sync: Assert sync_callbacks.name length
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2019-11-28 09:44:44 +01:00
Jan Friesse
380b744ec8 totemknet: Don't mix corosync and knet error codes
And use correct return code in stats.c.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2019-11-28 09:44:44 +01:00
Jan Friesse
624b6a4707 stats: Assert value_len when value is needed
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2019-11-28 09:44:44 +01:00
Jan Friesse
f31a31f91a cmap: Assert copied string length
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2019-11-28 09:44:44 +01:00
Jan Friesse
e925e389a2 totemconfig: Reuse already fetched pointer
Make code a bit more readable and easier to process for coverity.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2019-11-28 09:44:44 +01:00
Jan Friesse
09f6d34aaa logconfig: Remove double free of value
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2019-11-28 09:44:44 +01:00
Jan Friesse
cddd62f972 votequorum: Ignore the icmap_get_* return value
Express intention to ignore icmap_get_* return
value and rely on default behavior of not changing the output
parameter on error.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2019-11-28 09:44:44 +01:00
Jan Friesse
efe48120e2 totemconfig: Free leaks found by coverity
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2019-11-28 09:44:43 +01:00
Christine Caulfield
1ba03a3816 icmap: fix the icmap_get_*_r functions
Make the icmap*_r functions read from the specified map rather
than the global map.

Also include icmap_get_string_r() which seems to have been missed out.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2019-11-18 16:29:57 +01:00
Jan Friesse
6ba9870f69 Initialize stack allocated memory
Some functions allocated memory on stack without clearing memory and
then send them on wire. This is not an issue, but valgrind reports this
as a problem so it is easy to miss real problem then.

Solution is to clear stack memory.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2019-11-08 11:20:18 +01:00
Thomas Lamprecht
721c5d4b5b man: Fix corosync.conf knet pong count default
commit 029b8ebad6 changed the default
of the KNET_PONG_COUNT from the kronosnet default of 5 to 2, as
corosync bring up was deemed too slow.

The documentation, and the comment stating that the totem config
default values match the knet ones, were not updated and are thus now
out of date.

Fix this by noting the correct default of 2 for KNET_PONG_COUNT and
note that all but that one are in sync with the kronosnet defaults in
the comment.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2019-10-17 08:27:07 +02:00
Jan Friesse
ee8b8993d9 totemsrp: Reduce MTU to leave room for second mcast
Messages sent during the recovery phase are encapsulated, so such a
message carries the extra size of an mcast structure. This is not such a
big problem for UDPU, because most switches are able to fragment and
defragment packets, but it is a problem for knet, because totempg is
using the maximum packet size (65536 bytes) and when another header is
added during retransmission, the packet is too large.

Solution is to reduce mtu by 2 * sizeof (struct mcast).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2019-10-09 11:48:43 +02:00
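The arithmetic behind ee8b8993d9 is a single subtraction, sketched below. The layout of struct example_mcast is invented purely so that sizeof has something to measure; the real struct mcast in totemsrp has different fields and a different size.

```c
#include <stddef.h>
#include <stdint.h>

/* Invented header layout; only its size matters for this sketch. */
struct example_mcast {
    uint8_t  type;
    uint32_t seq;
    uint32_t nodeid;
    uint64_t ring_seq;
};

/*
 * Reserve room for two mcast headers: the original one plus the one
 * added when the message is encapsulated during recovery.
 */
size_t example_usable_mtu(size_t net_mtu)
{
    const size_t overhead = 2 * sizeof(struct example_mcast);

    if (net_mtu < overhead) {
        return 0; /* nothing left for payload */
    }
    return net_mtu - overhead;
}
```

For the 65536-byte maximum mentioned in the message, a payload sized to example_usable_mtu(65536) still fits after a second header is prepended.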
Jan Friesse
bd11a3380c totempg: Check sanity (length) of received message
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2019-10-09 11:48:17 +02:00
Jan Friesse
1cf1558fe7 totemknet: Add locking for log call
Knet callbacks may be called from different thread than main thread. If
this happens, log messages may be lost. Most prominent example is when
link goes up (logged by main thread) and host_change_callback_fn is
called.

Implemented solution is adding mutex for every log call in totemknet.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2019-09-10 11:29:54 +02:00
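Serialising log calls with a mutex, as 1cf1558fe7 describes, could look like the sketch below. It uses a plain POSIX mutex and writes into a static buffer instead of a real logging backend; all names are hypothetical and this is not the totemknet implementation.

```c
#include <pthread.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static pthread_mutex_t example_log_mutex = PTHREAD_MUTEX_INITIALIZER;
static char example_last_line[256];       /* stand-in for the log sink */
static unsigned int example_lines_logged;

/*
 * Every log call takes the same mutex, so a call made from a knet
 * callback thread cannot interleave with one made by the main thread.
 */
void example_knet_log(const char *fmt, ...)
{
    va_list ap;

    pthread_mutex_lock(&example_log_mutex);
    va_start(ap, fmt);
    vsnprintf(example_last_line, sizeof(example_last_line), fmt, ap);
    va_end(ap);
    example_lines_logged++;
    pthread_mutex_unlock(&example_log_mutex);
}
```

The lock is held only for the duration of one formatted write, which keeps contention between the main thread and callback threads short.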
Jan Friesse
3675daceee totem: Increase ring_id seq after load
This patch handles the situation where the leader
node (the node with lowest node_id) crashes and is started again
before token timeout of the rest of the cluster.
The newly restarted node restores the ringid of the old ring from
stable storage, so it has the same ringid as rest of the nodes,
but ARU is zero. If the node is able to create a singleton membership
before receiving the joinlist from rest of the cluster,
everything works as expected, because the ring id gets increased
correctly.

But if the node receives a joinlist from another cluster node before
its own joinlist, then it continues as it would had it never left
the cluster. This is not correct, because the new node should always
create a singleton configuration first.

During the recovery phase, ARUs are compared and because they differ
(the ARU of the old leader node is 0), the other nodes
try to send all of their previous messages. This is impossible
(even if it was correct), because other nodes have already freed most
of those messages. The implementation uses an assert to limit maximum
number of messages sent during recovery (we could fix this,
but it's not really the point).

The solution here is to increase the ring_id sequence number by 1 after
loading it from storage. During creation of the commit token it is
always increased by 4, so it will not collide with an existing
sequence.

Thanks to Christine Caulfield <ccaulfie@redhat.com> for clarifying the
commit message.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2019-07-15 16:39:32 +02:00
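The sequence-number reasoning in 3675daceee can be shown with two tiny helpers. The names are hypothetical; the increments of 1 (after load) and 4 (when a commit token is created) are the ones stated in the message, and the sketch assumes that later rings are derived from the survivors' stored sequence in steps of 4.

```c
#include <stdint.h>

#define EXAMPLE_LOAD_BUMP   1 /* added after loading from stable storage */
#define EXAMPLE_COMMIT_STEP 4 /* added when a commit token is created */

/* Sequence a restarted node starts with. */
uint64_t example_seq_after_load(uint64_t stored_seq)
{
    return stored_seq + EXAMPLE_LOAD_BUMP;
}

/* Sequence of the k-th ring formed after the one with base_seq. */
uint64_t example_seq_after_commits(uint64_t base_seq, unsigned int k)
{
    return base_seq + (uint64_t)k * EXAMPLE_COMMIT_STEP;
}
```

Under that assumption the restarted node's value stored_seq + 1 differs from stored_seq (the ring the surviving nodes are still in) and from stored_seq + 4, stored_seq + 8 and so on, so the node no longer appears to be a member of the old ring.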
Jan Friesse
5731af2782 logging: Add CS_PRI_NODE_ID and CS_PRI_RING_ID
Previously node id was logged either as a %d (most often), %u, %x or
PRI.32 and ring id either as %lld, %llx with various separators (., :, /)
between rep nodeid and seq. This seems to cause confusion.

This patch adds macros CS_PRI_NODE_ID, CS_PRI_RING_ID and
CS_PRI_RING_ID_SEQ (CS prefix = corosync, PRI modeled in spirit of
inttypes.h PRIx32) and makes code use them.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2019-07-03 10:53:52 +02:00
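Format macros in the spirit of inttypes.h, as introduced by 5731af2782, can be sketched like this. The EX_ macros below are hypothetical stand-ins; the actual definitions of CS_PRI_NODE_ID, CS_PRI_RING_ID and CS_PRI_RING_ID_SEQ (conversion characters, separator, whether the leading % is included) are not shown in the commit message and may differ.

```c
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical macros; one place decides how ids are printed. */
#define EX_PRI_NODE_ID     PRIu32
#define EX_PRI_RING_ID_SEQ PRIx64
#define EX_PRI_RING_ID     "%" EX_PRI_NODE_ID ".%" EX_PRI_RING_ID_SEQ

/* Format "rep nodeid" and "seq" of a ring id with one fixed style. */
int example_format_ring_id(char *buf, size_t len,
    uint32_t rep_nodeid, uint64_t seq)
{
    return snprintf(buf, len, EX_PRI_RING_ID, rep_nodeid, seq);
}

/* Format a node id on its own. */
int example_format_node_id(char *buf, size_t len, uint32_t nodeid)
{
    return snprintf(buf, len, "%" EX_PRI_NODE_ID, nodeid);
}
```

Changing the representation later then means editing the macros rather than hunting through every log call.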
Jan Friesse
d59a18d4a1 totemknet: Disable forwarding on shutdown
Disabling forwarding will make knet flush the messages (especially
LEAVE one).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2019-06-28 08:27:18 +02:00
Jan Friesse
51fbd7bafe totemconfig: Fix compiler warning
Compiler is unable to understand relation between members and
num_configured and warns about uninitialized members. Instead of
initializing members to 0 and (potentially after some code
refactor) let code fall to display error message, more explicit method
of assert is used.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2019-06-17 17:44:10 +02:00
Thomas Lamprecht
816324c94c totem: fix check if all nodes have same number of links
configured links may not come in order in the interfaces array, which
holds an entry for _all_ possible links, not just configured ones.

So iterate through all interfaces, but skip those which are not
configured. This allows corosync to start with a configuration where
link 0 is currently not mentioned; previously it was checked anyway but
had member_count = 0 from its default initialization, which then made
this code report a false positive for the "Not all nodes have the
same number of links" check even on a correct config.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2019-06-17 12:29:30 +02:00
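The loop described in 816324c94c (walk every possible link, skip unconfigured ones, compare member_count among the rest) might be sketched as follows. The structure is a hypothetical reduction of an interface entry and not the real totem_interface.

```c
#include <stdbool.h>

/* Hypothetical reduction of one entry of the interfaces array. */
struct example_interface {
    bool configured;
    int member_count;
};

/*
 * Return true when all configured links agree on member_count.
 * Unconfigured slots (including slot 0) are ignored instead of being
 * read with their default member_count of 0.
 */
bool example_links_consistent(const struct example_interface *ifaces,
    int num_slots)
{
    int members = -1; /* -1: no configured link seen yet */
    int i;

    for (i = 0; i < num_slots; i++) {
        if (!ifaces[i].configured) {
            continue;
        }
        if (members == -1) {
            members = ifaces[i].member_count;
        } else if (ifaces[i].member_count != members) {
            return false;
        }
    }
    return true;
}
```

Taking the reference count from the first configured link, rather than from index 0, is the same idea the neighbouring commit 7ada508a82 applies to the name-attribute check.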
Thomas Lamprecht
7ada508a82 totem: fix check if all nodes have name attrs in multi-link setups
As totem_config->interfaces entries are _all_ possible links and not
only the configured ones we cannot trust that interface[0] is
configured at the time of checking, and thus has a valid
member_count. So set the members variable to the member_count entry
from an actually configured interface and loop over that one.

This fixes a case where the check for the name property on all nodes
for multi links was skipped if link 0 was not configured, as then its
member_count was 0.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2019-06-17 12:29:09 +02:00
Jan Friesse
b0c24ec665 totemsrp: Fix warnings produced by gcc 9.1
New gcc warns about passing a possibly unaligned pointer from a packed
structure. This shouldn't be a problem for x86.

Implemented solution is to let the compiler do its job (the compiler knows
if a pointer is aligned so accessing a structure field is safe) and
use it together with support for assigning and returning of a structure
(not a pointer to the structure).

- srp_addr_copy is removed and replaced by simple assignment
- srp_addr_copy_endian_convert is removed and replaced by
  srp_addr_endian_convert function which takes srp_addr structure and
  returns endian converted srp_addr structure
- functions which accept an srp_addr array are not changed because
  (luckily) the non-aligned pointer is always just a one item array and
  such an item is always used as a source pointer, so it's possible to
  use a temporary variable

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2019-06-14 10:03:31 +02:00
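The pass-and-return-by-value pattern from b0c24ec665 can be sketched as below. The structures and helper names are hypothetical (the real srp_addr has different members), and __attribute__((packed)) is a GCC/Clang extension used here only to create a misaligned member.

```c
#include <stdint.h>

/* Hypothetical address structure. */
struct example_srp_addr {
    uint32_t nodeid;
    uint16_t family;
};

/* Packed message: addr sits at offset 1 and is therefore misaligned. */
struct example_packed_msg {
    uint8_t type;
    struct example_srp_addr addr;
} __attribute__((packed));

static uint32_t example_swab32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
        ((v << 8) & 0x00ff0000u) | (v << 24);
}

static uint16_t example_swab16(uint16_t v)
{
    return (uint16_t)((v >> 8) | (v << 8));
}

/* Takes and returns the structure by value, so no pointer is formed. */
struct example_srp_addr example_addr_endian_convert(struct example_srp_addr in)
{
    struct example_srp_addr out;

    out.nodeid = example_swab32(in.nodeid);
    out.family = example_swab16(in.family);
    return out;
}

/* The compiler emits unaligned-safe copies for the member access. */
void example_msg_endian_convert(struct example_packed_msg *msg)
{
    msg->addr = example_addr_endian_convert(msg->addr);
}
```

Because the address of msg->addr is never taken, the compiler has no reason to emit the unaligned-pointer warning, and plain assignment replaces a dedicated copy helper.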
Jan Friesse
3c7f19a02f cpg: Move filling of member_list to subfunction
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2019-06-13 15:16:31 +02:00
Jan Friesse
1e2df0ba0c cpg: Add more comments to notify_lib_joinlist
And make handling of left_list more generic. Also free skiplist
allocated by joinlist_inform_clients function. Last (but not least)
remove czechlish founded (should have been pp of "find").

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2019-06-13 15:16:13 +02:00