Because crypto changing happens in the 'commit' phase
of the reload and we can't get sure that knet will
allow the new parameters, the result gets ignored.
This can happen in FIPS mode if a non-FIPS cipher
is requested.
This patch reports the errors back in a cmap key
so that the command-line can spot those errors
and report them back to the user.
It also restores the internal values for crypto
so that subsequent attempts to change things have
predictable results. Otherwise further attempts can
do nothing but not report any errors back.
I've also added some error reporting back for the
knet ping counters using this mechanism.
The alternative to all of this would be to check for FIPS
in totemconfig.c and then exclude certain options, but this
would be duplicating code that could easily get out of sync.
This system could also be a useful mechanism for reporting
back other 'impossible' errors.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
This required adding a lot of return values to two previously
'void' functions. I did two rather than just the one that was
needed because it seemed to make sense to do them both together.
Although these functions now return errors, they are probably
still ignored higher up. this really needs a comprehensive audit.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
totem.knet_mtu is new configuration option which allows setting
of automatic or manual knet MTU.
Also reload of totem.knet_pmtud_interval is fixed now, so it works when
key is deleted (and set back default value).
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
uname in Solaris/Illumos returns non-negative value when succesful.
Signed-off-by: Andreas Grueninger <andreas.grueninger@noemail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Previously, existence of retransmit messages canceled holding
of token (and never allowed representative to enter token hold
state).
This makes token rotating maximum speed and keeps processor
resending messages over and over again - overloading network
and reducing chance to successfully deliver the messages.
Also there were reports of various Antivirus / IPS / IDS which slows
down delivery of packets with certain sizes (packets bigger than token)
what make Corosync retransmit messages over and over again.
Proposed solution is to allow representative to enter token hold
state when there are only retransmit messages. This allows network to
handle overload and/or gives Antivirus/IPS/IDS enough time scan and
deliver packets without corosync entering "FAILED TO RECEIVE" state and
adding more load to network.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Knet limits maximum node id to 16-bit type. This was not ensured in
corosync and it was possible to set nodeid to value >= 65536 and
(surprisingly) most of the things were working quite well because of
overflow. corosync-cmapctl -m stats contained knet nodeid in
stats.knet. subtree, so for nodeid 65536 result was:
Can't get value of stats.knet.node0.link0.connected. Error
CS_ERR_NOT_EXIST
Commit implements checking of nodeid and limits it to KNET_MAX_HOST
value when knet is used.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Nodeid is required for knet for every node. Right now, existence of
nodeid is checked only for local for local node, so broaden the test.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
totem.nodeid is relict from times when nodelist was not required and
totemsrp was sending whole membership with ip addresses.
With Corosync 3 ip addresses are no longer sent so
it is not possible to find "next" node ip address where to send token
(because only nodeid is sent) without having information about all of
the nodes stored locally.
When totem.nodeid was configured it was partly used and other parts
(most notably totemudpu_token_target_set) were using autogenerated
nodeid. Together it was not possible to create even single node
membership.
Solution is to ignore totem.nodeid completely (and display warning when
it is set).
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Use knet_get_crypto_list to find knet supported crypto models and use
them instead of hardcoded list.
Also fix compression handling. Previously knet_compression_model
value was not checked at all and was directly passed to knet.
Use knet_get_compress_list to find knet supported compress models and
use them to check validity of config file and for more informative
error message.
Lastly enhance corosync version display with information
about available crypto/compression models.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Fix integer underflow when computing `namelen` in `nodelist_byname`,
always use computed `namelen`.
Fixes#626.
Signed-off-by: Johannes Krupp <johannes.krupp@cispa.saarland>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Default token timeout of 1000 ms was often changed by users because of
other workloads on machine which may make corosync responding a bit
later than needed and resulting in token loss.
3000 ms was chosen as a compromise between token timeout increase
and allow live cluster upgrade (other nodes should receive token
by node with new default on time).
It doesn't affect token token_coefficient so final token timeout still
depends on number of configured nodes (just base is higher).
This change slows down failover a bit so for clusters where failover
times are important, please change the token timeout in configuration
file corosync.conf as a:
totem {
version: 2
token: 1000
...
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Check whether linknumber larger than INTERFACE_MAX and display error if
so.
Signed-off-by: liangxin1300 <XLiang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Needs new knet crypto API.
If it's not available, then fall back to the old
API and forbid changing crypto while running.
To avoid us being dependant on the leader node, each
node sends its own crypto_reconfig_phase messages so
we can guarantee that the reconfiguration always completes
on each node.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Have string values stored in char arrays in totem_config
so we don't get into a mess with the pointers.
Also remove vsftype (which hasn't been used since corosync 1)
Use strncpy even though we know the string is fine. Keep covscan happy
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
reload failed for UDP[U] because they had saved pointers
to the interfaces[] array. so memcpy into that rather then
re-allocate it.
Also, move the check for different IP address families so
it also gets run at reload time.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Fix an 'error: success' stype message by propogating error_string
back down the stack.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
To be more reliable & maintainable
The basic plan here is to fix reloads to be more stable
using read/parse/verify/build/commit stages, so that any errors
will not leave corosync in an unstable state. This should
also make the code more maintainable as currently the verify/commit
stages are horribly intertwined.
Also:
- Fix local_node_pos not being updated in the new map during validation
(broke adding and removing new nodes in the middle of the list).
- Fix reconfiguration so that nodes are indexed by nodeid and not their
position in the list. This is an old bug that's just been carried
over
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
commit 029b8ebad6 changed the default
of the KNET_PONG_COUNT from the kronosnet default of 5 to 2, as
corosync bring up was deemed to slow.
The documentation, and the comment stating that the totem config
default values match the knet ones were not updated, and thus now out
of date.
Fixhis by noting the correct default of 2 for KNET_PONG_COUNT and
note that all but that one are in sync with the korosync defaults in
the comment.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Previously node id was logged ether as a %d (most often), %u, %x or
PRI.32 and ring id ether as %lld, %llx with various separators (., :, /)
between rep nodeid and seq. This seems to cause confusion.
This patch adds macros CS_PRI_NODE_ID, CS_PRI_RING_ID and
CS_PRI_RING_ID_SEQ (CS prefix = corosync, PRI modeled in spirit of
inttypes.h PRIx32) and makes code use them.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Compiler is unable to understand relation between members and
num_configured and warns about uninitialized members. Instead of
initializing members to 0 and (potentially after some code
refactor) let code fall to display error message, more explicit method
of assert is used.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
configured links may not come in order in the interfaces array, which
holds an entry for _all_ possible links, not just configured ones.
So iterate through all interfaces, but skip those which are not
configured. This allows to start corosync with a configuration where
link 0 is currently not mentioned, as else it was checked but had
member_count = 0 from it's default initialization, which then made
this code report a false positive for the "Not all nodes have the
same number of links" check even on a correct config.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
As totem_config->interfaces entries are _all_ possible links and not
only the configured ones we cannot trust that interface[0] is
configured at the time of checking, and thus has a valid
member_count. So set the members variable to the member_count entry
from an actually configured interface and loop over that one.
This fixes a case where the check for the name property on all nodes
for multi links was skipped if link 0 was not configured, as then its
member_count was 0.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Compiler may have problem understanding relation between addr1p and
addrlen. Small change makes code a little more readable and compiler
happy.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
This feature allows corosync to block packets received from unknown
nodes (nodes with IP address which is not in the nodelist). This is
mainly for situations when "forgotten" node is booted and tries to join
cluster which already removed such node from configuration. Another use
case is to allow atomic reconfiguration and rejoin of two separate
clusters.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Make sure the retransmit timeout have the lowest limit
`MINIMUM_TIMEOUT`. So, the lowest limit of hold should be
recalculated.
Also token timeout and retransmits count should
keep a relational expression.
Signed-off-by: yuan ren <yren@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
When adding a new link for the first time you will often see:
1) knet_link_set_ping_timers for nodeid 1, link 1 failed: Invalid
argument (22)
2) New config has different knet transport for link 1. Internal value
was NOT changed. To reconfigure an interface it must be deleted and
recreated. A working interface needs to be available to corosync at all
times
1) is caused by setting the ping timers twice, once in
totemknet_member_add() and once in totemknet_refresh_config().
The first time we don't know the value
so it's zero and thus display an error. For this we simply check
for the zero and skip the knet API call. It's not ideal, but
totemconfig needs a lot of reconfiguring itself before we can
make this more sane.
2) was caused by simply comparing an unconfigured link with
a configured one, so OF COURSE, they are going to be different!
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
When UDP is used as a transport, the error would occur
"Multicast address family does not match bind address family"
because there is no ipv6 in /etc/hosts specified but using the
totem.ip_version: ipv6-4. because
the mcastaddr generated (if not specified) only according to
the totem.ip_version.
Solution is to use bindnetaddr (configured or generated from
nodelist) addr family.
Signed-off-by: yuan ren <yren@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Checking whole structure is fine for IPv4, but IPv6 contains also scope
id, what may be problem for local address. It's possible to use a zone
index, but because it's not required when host name is used, it
shouldn't be needed when IPv6 address is used.
Example configuration snip which fails without patch:
...
nodelist {
node {
nodeid: 1
ring0_addr: fe80:🔢5678:9abc:def1
}
}
...
(example succeed when %eth0 is used).
With patch, zone index is not needed.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
with the following semantics:
- default off
- implies crypto_hash SHA256 and crypto_cipher AES256
- crypto_* have higher precedence
- only applicable for knet, like crypto_*
this should make upgrading from Corosync 2.x less painful for users that
have an explicit secauth=on in their configuration.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Triple DES is considered as a "weak cipher" since 2016 so there is
really no need to support it in the corosync. Thanks to bug in
Corosync/Knet/NSS which caused 3des to not work at all,
no matter what library was used, we can just remove support for 3des
without braking the compatibility.
Also fix coroparse so:
- totem.crypto_type is removed (this is 1.x construct which was not used
even in 2.x)
- Add checking of totem.crypto_model.
- Enumarate possible values for crypto_model, crypto_cipher and
crypto_hash error messages
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Originally totem.ip_version was used to force ip version used by totem.
With Knet this variable didn't make too much sense so it was not used.
Sadly rely only on DNS resolver order doesn't always work (RFC is quite
complicated, but if IPv6 is not configured then IPv4 is preferred), what
we tried to solve by forcing IPv6 and only if that fails, use IPv4.
Sadly this collides with nss_myhostname which is able to return every
local address and today system usually have at least one autogenerated
link-local IPv6 address so it is able to "overwrite" /etc/hosts.
Solution is to enhance totem.ip_version and use it also for Knet.
totem.ip_version is now just a flag for resolver and can have four
states: ipv4 (only IPv4 is used), ipv6 (only IPv6 is used), ipv4-6 (ask
IPv4 first and if it fails ask for IPv6) and ipv6-4 (ask IPv6 first and
if it fails ask for IPv4). Default for Knet and UDPU transports is
ipv6-4, for UDP it's ipv4, because autogenerated mcast addr doesn't play
too well with ipv6-4.
So everywhere where nss_myhostname becomes problem, it's just possible
to set totem.ip_version to ipv4-6.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
It didn't work anyway (the config system requires whole links
to be configured at once) and caused crashes.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Corosync used to just ignore parse errors so that un-resolved names
could cause silent failures. We now always check the result from
totemip_parse() and at least print something in syslog.
There's also a little get-out here that allows you to correct
a bad node address without having to destroy and recreate the
whole link. I'm being nice to you.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Remove another environment variable (reasons similar to removal of
COROSYNC_MAIN_CONFIG_FILE).
Also properly document both totem.keyfile and totem.key.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
The conversion to the new srp_addr format broke the feature where
UDP/UDPU nodes could get their nodeids generated from the IP address.
A big part of this was the removal of mandatory ring0_addr - it was used
as a placeholder when reading down the nodelist. I replaced this with
nodeid thinking that nodeid was now mandatory, forgetting this use case.
So the compare on "ring0_addr" or "nodeid" is now replaced with a more
robust check that we're only reading keys from the same node_pos once,
this was needed in votequorum.c as well as totemconfig.c
Another tidying side-effect of this patch is that the nodeid generation
is now all in a single routine in totemconfig.c and not shared between
it and totemip.c.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Formally not needed, because totemip_print should not return string
longer than INET6_ADDRSTRLEN, but static analysis tools are not capable
of such conclusion.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>