Commit Graph

188 Commits

Author SHA1 Message Date
Jan Friesse
bd338449ac totemconfig: Fix logging of freed string
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-10-29 17:45:19 +01:00
Christine Caulfield
707a9afa30 config: Allow generated nodeis for UDP & UDPU
The conversion to the new srp_addr format broke the feature where
UDP/UDPU nodes could get their nodeids generated from the IP address.

A big part of this was the removal of mandatory ring0_addr - it was used
as a placeholder when reading down the nodelist. I replaced this with
nodeid thinking that nodeid was now mandatory, forgetting this use case.
So the compare on "ring0_addr" or "nodeid" is now replaced with a more
robust check that we're only reading keys from the same node_pos once,
this was needed in votequorum.c as well as totemconfig.c

Another tidying side-effect of this patch is that the nodeid generation
is now all in a single routine in totemconfig.c and not shared between
it and totemip.c.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-10-25 18:21:02 +02:00
Jan Friesse
b13ab76e10 totemconfig: Replace strcpy by strncpy
Formally not needed, because totemip_print should not return string
longer than INET6_ADDRSTRLEN, but static analysis tools are not capable
of such conclusion.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-10-16 12:30:58 +02:00
Christine Caulfield
9f2d5a3a3f config: Fix crash in reload if new interfaces are added
This is a bug I seem to have introduced in
429209f4aa where we compare links
for changes. if a new node was added on an existing link then it
was compared against a non-existant one in the previous configuration.
We now only compare nodes that are in both interfaces.

As I needed min() for this function, I moved it from individual
.c files into util.h so we only have one copy.

And the error message was fixed.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-10-15 15:54:57 +02:00
Chris Walker
3f7d2cf6aa Add token_warning configuration option
Token_warning is used to present information about
when the token was last received.

Signed-off-by: Chris Walker <cwalker@cray.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-08-14 10:34:49 +02:00
Jan Friesse
1d2c6e4696 totemconfig: Enlarge error_string_response
... so error_reason can be fully included into parse error message.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-08-13 09:00:44 +02:00
Christine Caulfield
a471bab798 config: Fail config validation if not all nodes have all links
KNET requires that all links be full-mesh (this may change in the future
but almost certainly not before knet 2.0), so enforce this in the
config.

Also avoid a potential div-by-0 error if the local node is not fully
configured either.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-07-03 12:38:02 +02:00
Christine Caulfield
d1db8c2851 config: Enforce use of 'name' node attribute in multi-link clusters
If the local host does not have a 'name' attribute and the cluster
has more than one link then fail the validation test.

I'm open to the idea of checking all of the nodes in the nodelist
if necessary. It seems overkill as each node will check its own
entry though.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-07-03 12:37:45 +02:00
Christine Caulfield
429209f4aa totemconfig: Check for things that cannot be changed on the fly
There are a few things in the interface that cannot be changed on the
fly. Warn about them and tell the user that these things need to be done
in two steps and why.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-07-02 09:54:31 +02:00
Christine Caulfield
f5871c6b4c config: Allow use of ring0_addr
Allow ring0_addr to be used in place of 'name' for
backwards compatibility

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-03-01 14:21:37 +01:00
Christine Caulfield
7a639d1b62 config: Update message when local host isn't found
Make the message more representative of what's going on.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-03-01 14:20:00 +01:00
Christine Caulfield
386d710ed1 cfg: Fix cfg_get_node_addrs so that DLM works
Also update copyright dates

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-03-01 14:19:45 +01:00
Christine Caulfield
fc8580bdbf totem: Use nodeid ONLY in srp_addr
This shrinks the srp_addr (and consequently every packet sent by
corosync) so that instead of containing loads of IP addresses to
identify a node, it just sends the nodeid.

This then allows us to make ring0 optional and replaceable when running
knet.

It also means that we need some other way of identifying the local
node in corosync.conf, so the nodelist.node.name entry is now mandatory
and is mapped to the local host using the same algorithm as used in
cman.

This code needs LOTS of testing as it touches a huge amount of totemsrp
and totemconfig.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-03-01 14:18:51 +01:00
Christine Caulfield
31ddba64a2 config: Don't fudge port numbers
When I was adding knet I wanted the port numbers to default to the
base port number + the linknumber.

However I seem to have messed this up such that any port number
specified in the config file has the link number added to it. Which
is almost certainly not what people would expect.

This patch sets it right. If a port number is not specified
then 5405+linknumber is used. If a port number IS specified
then that actual number is used.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-01-18 16:31:24 +01:00
Christine Caulfield
22ae4cacda knet: Allow ping_timers to be auto-configured
knet ping_timers are auto-configured according to token value.

This patch also fixes some knet config bugs that resulted in defaults
not being applied when values were removed from corosync.conf.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-01-15 15:08:19 +01:00
Christine Caulfield
236032f7b5 config: if local node addr is wrong, fail with a sensible message
If no valid local address is found in corosync.conf then corosync
exits with: "parse error in config: No multicast port specified"

This is because of the config change for knet that always populates
the interfaces. The old error of "no interfaces found" was only
slightly better anyway IMHO.

This patch adds an explicit check that local_node_pos has been
set in icmap and uses that to determine if a valid local address
has been found.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-01-09 17:50:12 +01:00
Jan Friesse
3982f795d5 totemconfig: Fix UDP autogeneration of mcast addr
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-01-09 17:46:21 +01:00
Christine Caulfield
98bb0c78c8 config: Allow selection of crypto_model
KNET has options for nss or openssl crpyto libraries, make this
available to corosync.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-01-05 15:25:17 +01:00
Christine Caulfield
2a6a571c06 config: Allow links to have different ip_versions
knet allows links to have different IP versions - proivided they
all match per link. So don't force them all to be the same.

I've added a check here to make sure that all nodes on the same
link are using the same IP version.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-12-22 17:15:19 +01:00
Bin Liu
af21baf0ff totemconfig: remove duplicate aes256 test
Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-11-29 18:18:52 +01:00
Bin Liu
cf339c20c3 totemconfig: generate mcast icmap items for UDP
Generating mcastaddr and mcastport in icmap make
sense only for UDP transport.

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-10-30 14:14:48 +01:00
Bin Liu
99567f0e65 totemconfig: add nodeid check for knet
Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-10-30 13:02:03 +01:00
Christine Caulfield
396bca4739 config: Fix memory leak
totem_volatile_config_set_string_value was not properly freeing memory.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-10-23 17:31:14 +02:00
Christine Caulfield
16f616b65d knet: Add support for knet compression
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-10-23 17:30:25 +02:00
Christine Caulfield
294a629fb5 config: Allow dynamic link configuration
Now we are using knet, it's possible to dynamically add, remove and
reconfigure links on the fly.

Also print 'n' for non-existant knet links. This will show up
only on loopback links >0. But it looks better than 'status ='

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-09-21 17:16:21 +02:00
Jan Friesse
cf18736d52 totemconfig: Make crypto work again
Knet needs longer key and supports various key lengths. Split
TOTEM_PRIVATE_KEY_LEN into TOTEM_PRIVATE_KEY_LEN_MIN and
TOTEM_PRIVATE_KEY_LEN_MAX (both using KNET_*_KEY_LEN).

Fix incorrect "Could only read..." message.

Make sure key is properly initialized/zeroed.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2017-07-03 13:19:02 +02:00
Bin Liu
0462b5e609 totemconfig: Prefer nodelist over bindnetaddr
In a two-node cluster, I 've one node configured with open-vswtich:
5: br-fixed: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
state UNKNOWN group default
inet 192.168.124.88/24 scope global br-fixed
inet 192.168.124.87/24 scope global secondary br-fixed
inet 192.168.124.83/24 brd 192.168.124.255 scope global secondary
tentative br-fixed
inet 192.168.124.89/24 scope global secondary br-fixed

while I use 192.168.124.83 in node list of corosync.conf with udpu, and
the bind_addr is 192.168.124.0. After upgrading corosync on this node,
the it uses 192.168.124.88 instead of 192.168.124.83. As we can see:

corosync-cfgtool -s
Printing ring status.
Local node ID 1084783704

corosync-quorumtool -s
Membership information:
Nodeid Votes Name
1084783697 1 d52-54-77-77-01-02
1084783699 1 d52-54-77-77-01-01 (local)

while the other node can only see itself:
corosync-cfgtool -s
Printing ring status.
Local node ID 1084783697
RING ID 0
id = 192.168.124.81
status = ring 0 active with no faults

corosync-quorumtool -s
Membership information:
Nodeid Votes Name
1084783697 1 d52-54-77-77-01-02.virtual.cloud.suse.de (local)

this patch will check if there are both nodelist and bindnetaddr and if
so, display warning and use nodelist information.

Signed-off-by: Bin Liu <bliu@suse.com>
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-04-11 11:19:31 +02:00
Christine Caulfield
c0f1d576d6 knet: Fix MTU sizes & allow transport config in corosync.conf
Corosync layers don't need to know the knet MTU size - this way
corosync fragments buffers only when they get larger than the
KNET buffer size (64K) and knet fragments below that based on
the actual MTU and transport considerations.

It is also now possible to configure knet to use UDP or SCTP
transports in corosync.conf. This is currently done per-link
so if you have more than 1 link you need several interface{}
stanzas inside totem{} to make it use other than the default
of UDP. if it's useful I might add the option of a global
default.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-02-13 16:54:30 +00:00
Christine Caulfield
029b8ebad6 knet: Reduce default pong count to 2 for faster startup
The default PONG_COUNT of 5 made corosync slow to connect to other
nodes.
This helps.
2017-01-03 13:30:26 +00:00
Christine Caulfield
7cec6a131d knet: Allow configuration of more params
knet_pmtud_interval &
knet_pong_count

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-11-15 09:32:09 +00:00
Jan Friesse
1f90c31ba7 list: Replace for_each by safe version where need
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2016-10-27 14:56:52 +02:00
Michael Jones
b4c06e52f3 list: Replace uses of list.h with qblist.h
Signed-off-by: Michael Jones <jonesmz@jonesmz.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-10-27 14:56:52 +02:00
Christine Caulfield
268cde6ee4 totem: Add Kronosnet transport.
This is a big update that removes RRP & MRP from the codebase
and makes knet the default transport for corosync. UDP & UDPU
are still (currently) supported but are deprecated. Also crypto
and mutiple interfaces are only supported over knet.

To compile this codebase you will need to install libknet from
https://github.com/fabbione/kronosnet

The corosync.conf(5) man page has been updated with info on the new
options. Older config files should still work but many options
have changed because of the knet implementation so configs should
be checked carefully. In particular any cluster using using RRP
over UDP or UDPU will not start as RRP is no longer present. If you
need multiple interface support then you should be using the knet transport.

Knet brings many benefits to the corosync codebase, it provides support
for more interfaces than RRP (up to 8), will be more reliable in the event
of network outages and allows dynamic reconfiguration of interfaces.
It also fixes the ifup/ifdown and 127.0.0.1 binding problems that have
plagued corosync/openais from day 1

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-10-11 10:09:42 +01:00
Jan Friesse
1925074909 Fix few bugs found by coverity
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2016-06-28 13:58:43 +02:00
Jan Friesse
44df76a7ee config: get_cluster_mcast_addr error is not fatal
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2016-06-28 13:57:14 +02:00
Jan Friesse
60565b7da7 totemconfig: Explicitly pass IP version
If resolver was set to prefer IPv6 (almost always) and interface section
was not defined (almost all config files created by pcs), IP version was
set to mcast_addr.family. Because mcast_addr.family was unset (reset to
zero), IPv6 address was returned causing failure in totemsrp.
Solution is to pass correct IP version stored in
totem_config->ip_version.

Patch also simplifies get_cluster_mcast_addr. It was using mix of
explicitly passed IP version and bindnet IP version.

Also return value of get_cluster_mcast_addr is now properly checked.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2016-04-07 14:45:05 +02:00
Christine Caulfield
997074cc3e totemconfig: Check for duplicate nodeids
Having duplicate nodeids in corosync.conf can play havoc with a cluster,
so (as suggested by someone on this list) here is some code to check
that all nodeids are unique. Even if a nodeid is not specified it will
check to be sure that the ID generated from the IP address (ipv4 only)
does not clash with one that is provided.

It logs all non-unique nodeids to syslog, but only the last is reported
on the command-line to the user which should be enough to get them to
check further. At startup this will cause corosync to fail to start.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2015-04-10 14:22:07 +01:00
Jan Friesse
d77cec24d0 Handle adding and removing UDPU members atomically
When config file is reloaded with removed UDPU member, internal icmap
index of nodelist.node can change. This can result in removal and then
adding back node. This, with UDPU alive filtering (where member is by
default considered as not a member) makes corosync not sending messages
to such members resulting in new membership creation.

Solution is to properly test which members were really deleted and added
(instead of relying on internal and dynamic naming of icmap hash table
key name).

Also trully dynamic add and remove node (via cmap) is now handled by
same function so totem_config->interfaces is now updated properly.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2015-01-21 16:37:26 +01:00
Jan Friesse
6449bea835 config: Ensure mcast address/port differs for rrp
When using multiple interfaces, it's necessary to use different
multicast address/port pair for each interface to make
rrp work correctly. This is now checked in parser.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-11-24 11:55:37 +01:00
Jan Friesse
70bd35fc06 config: Process broadcast option consistently
Broadcast option is global but in config set in interface section. When
more interfaces are defined, only broadcast from last section was used.

Solution is to use broadcast whenever at least one interface use
broadcast.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-11-24 11:55:37 +01:00
Jan Friesse
6c028d4d9c config: Make sure user doesn't mix IPv6 and IPv4
Checking code was there, sadly not correct, so it was possible to enter
one bindnet addr as IPv4 and second as IPv6. Fix is trivial.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-11-24 11:55:37 +01:00
Jan Friesse
bb52fc2774 Store configuration values used by totem to cmap
Some totem configuration values (like token, consensus, ...) are ether
computed or default value is used. It's hard to find out, what
value is really used.

Solution is to store values in cmap.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-10-13 11:59:06 +02:00
Christine Caulfield
88dbb9f722 totemconfig: Make sure join timeout is less than consensus
The thesis contains this paragraph:

" The Join timeout is shorter than the Consensus timeout and is used to
  increase the probability that Join messages from all currently
  working processors are received during a single round of consensus."

Empirically I can confirm that making join less than consensus can cause
havoc with a cluster so I think we should enforce this.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-07-25 08:24:02 +01:00
Christine Caulfield
3b8365e806 config: Fix typos
Fix several places where 'then' is used instead of 'than' in error
messages and a comment.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-07-24 10:27:45 +01:00
Jan Friesse
63bf09776f totemconfig: refactor nodelist_to_interface func
Move finding of bindaddr in nodelist to generally usable function
totem_config_find_local_addr_in_nodelist and refactor
config_convert_nodelist_to_interface function to use it.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2014-07-22 14:59:31 +02:00
Jan Friesse
10c80f454e totemconfig: totem_config_get_ip_version
Add totem_config_get_ip_version to get user configured ip version.
Make totem_config_read use this newly introduced function.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2014-07-22 14:59:27 +02:00
Jan Friesse
dc35bfae62 totemconfig: Free ifaddrs list
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2014-07-22 14:59:20 +02:00
Jan Friesse
72cf15af27 votequorum: Do not process events during reload
During reload, local_node_pos is deleted and reinstation is handled in
totemconfig after reload is finished. votequorum handles this events and
tries to reload it's configuration. This led to logging a little scary
messages (even nothing bad is happening, because after local_node_pos
reinstation everything back to normal).

Solution is to stop processing events during reload. Sadly, simple
tracking of config.reload_in_progress doesn't work because LibQB events
triggering order is undefined so votequorum reload handler can be called
before totemconfig (and before local_node_pos is reinstatied).

So new config.totemconfig_reload_in_progress key is defined with very
similar semanthic as config.reload_in_progress but set inside
totem_reload_notify function. Votequorum then use this new key.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-06-27 11:40:21 +02:00
Jan Friesse
7557fdec48 config: Allow dynamic change of token_coefficient
token_coefficient change in cmap didn't triggered change. So only way
how to change token_coefficient was editing config file and reload.

Patch let's key totem.token_coefficient to be processed so
token_coefficient can be dynamically changed.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-05-07 15:55:26 +02:00
Jan Friesse
58176d6779 Add token_coefficient option
Token coefficient is used only when nodelist is specified and contains
at least 3 nodes. If so, real token timeout is then computed as
token + (number_of_nodes - 2) * token_coefficient. This allows cluster
to scale without manually changing token timeout every time new
node is added. This value can be set to 0 resulting in effective
removal of this feature.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-03-25 15:29:17 +01:00