Commit Graph

1865 Commits

Author SHA1 Message Date
Christine Caulfield
86de6ce1e6 totem: add totemknet.[ch]
it seems git is better at deleting files than adding them

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-10-13 08:46:34 +01:00
Michael Jones
a24d26c46a cfg: Prevents use of uninitialized buffer
Signed-off-by: Michael Jones <jonesmz@jonesmz.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-10-12 16:19:05 +02:00
Christine Caulfield
268cde6ee4 totem: Add Kronosnet transport.
This is a big update that removes RRP & MRP from the codebase
and makes knet the default transport for corosync. UDP & UDPU
are still (currently) supported but are deprecated. Also crypto
and mutiple interfaces are only supported over knet.

To compile this codebase you will need to install libknet from
https://github.com/fabbione/kronosnet

The corosync.conf(5) man page has been updated with info on the new
options. Older config files should still work but many options
have changed because of the knet implementation so configs should
be checked carefully. In particular any cluster using using RRP
over UDP or UDPU will not start as RRP is no longer present. If you
need multiple interface support then you should be using the knet transport.

Knet brings many benefits to the corosync codebase, it provides support
for more interfaces than RRP (up to 8), will be more reliable in the event
of network outages and allows dynamic reconfiguration of interfaces.
It also fixes the ifup/ifdown and 127.0.0.1 binding problems that have
plagued corosync/openais from day 1

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-10-11 10:09:42 +01:00
HideoYamauchi
f1ffe31ce5 coropase: Set a poll_period value for wd monitor
Signed-off-by: HideoYamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-10-06 15:48:38 +02:00
Christine Caulfield
c4683be9b0 votequorum: simplify reconfigure message handling
As we now have update_node_expected_votes(), we can use that
when receiving a new EXPECTED_VOTES value from another node
rather than having our own loop.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-09-13 15:55:58 +01:00
Christine Caulfield
bd2e6b5d9d votequorum: Don't update expected_votes display if value is too high
If expected_votes was set via the library but the calculation
decides it's too high, then an error is correctly returned but
the value is still set in the nodes' expected_votes field and
turns up in the corosync-quorumtool display.

This patch separates out the quorum calculation from the updating
of expected_votes per node to prevent this from happening.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-09-13 14:28:56 +01:00
Ferenc Wágner
cf10a754e9 Fix various typos
occured -> occurred
parantheses -> parentheses
configuraton -> configuration
aquire -> acquire
retrive -> retrieve
prefered -> preferred

Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-09-12 09:50:11 +02:00
Jan Friesse
f837f95dfe Config: Flag config uidgid entries
Uidgid entries parsed from configuration files now has prefix
(uidgid.config.) so they are distinguishable from dynamically added
entries. Entries added from config file are pruned on reload if no
longer exists in config file (dynamic one stays unaffected). Also whole
uidgid.config. prefix is made read only.

This make PCMK work again after configuration reload is called.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2016-08-04 16:13:48 +02:00
HideoYamauchi
71c9035c27 Low: totemsrp: Addition of the log.
Signed-off-by: HideoYamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-08-01 10:11:45 +02:00
Jan Friesse
1925074909 Fix few bugs found by coverity
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2016-06-28 13:58:43 +02:00
Christine Caulfield
0665aca9e1 quorum: revert patch that adds qdevice (node 0) to quorum callback
Revert patch 9f54f0a1fad7dad42c55562a50dfb9d773e6a660 as it causes
more troubles than it solves. Code that uses the quorum nodelist
to get a list of actual nodes in the cluster for communication
break using this as well as the display from corosync-quorumtool

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-06-28 13:58:43 +02:00
Christine Caulfield
c9c6d9e30f quorum: Return qdevice nodeid in the quorum callbacks (if active).
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-06-28 13:58:41 +02:00
Christine Caulfield
e41b256c67 votequorum: Allow wait_for_all with qdevice
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-06-28 13:58:39 +02:00
Christine Caulfield
98548e1880 qnetd: lms: Fix search for node/ring_id check
We were looking for us in other node lists, rather than
others in our nodelist.

Also, remove debug print in votequorum.c

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-06-28 13:58:39 +02:00
Christine Caulfield
3a5d51fca7 votequorum: Fix up quorum/nodelist callbacks
This patch tidies the two state change callbacks and explains them
in the man page:

The difference between votequorum_nodelist_notification_t and
votequorum_quorum_notification_t is subtle but important.
The 'nodelist' callback is sent at the start of a cluster state
transition and contains the new ring_id and only the list of
nodes that are included in the sync state - ie only active nodes. No
quorum information is included this callback because it is not
available at that time.

The 'quorum' callback is sent after the cluster state transition has
completed and does contain quorum information.
In addition, the nodelist contains a list of all nodes known to
votequorum (whether up or down) and their state as well
as information about the quorum device attached (if any). quorum
callbacks will not be sent for qdevice up and down
events unless they affect quorum.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-06-28 13:58:39 +02:00
Christine Caulfield
cf0028c86e votequorum: split callbacks into nodelist and quorum
This split is needed for qdevice, so that it gets the ring_id and
nodelist as part of the sync process and not afterwards - when quorum
has been calculated.

As this is and unsupported API I'm not too worried about breaking
existing code - all the clients I know of are using the quorum API
anyway as they should be.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-06-28 13:58:38 +02:00
Jan Friesse
44df76a7ee config: get_cluster_mcast_addr error is not fatal
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2016-06-28 13:57:14 +02:00
Ferenc Wágner
c76ee39f61 Fix typo: Diabled -> disabled
Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-06-22 14:26:48 +02:00
Ferenc Wágner
b1de8efd15 Fix typo: aquire -> acquire
Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-06-22 14:26:28 +02:00
Ferenc Wágner
841f48e253 Fix typo: Uknown -> Unknown
Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-06-22 14:26:22 +02:00
Christine Caulfield
f2a1fcc5bf logconfig: Fix logging reload disabling logfiles
In my previous logconfig patch, adding a subsys so the
logging stanzas could disable logging to a file, because
the subsys closed the file used by the main logging.

This patch only applies defaults to higher-level logging and
non-deprecated keys.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-05-27 17:36:30 +02:00
yuusuke
2ef086bd9b wd: Warn if values are out of range
Signed-off-by: yuusuke <yusk.iida@gmail.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2016-05-27 10:38:30 +02:00
yuusuke
39cd6b3d1d parser: WD Read type correctly from corosync.conf
Signed-off-by: yuusuke <yusk.iida@gmail.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2016-05-27 10:36:24 +02:00
Christine Caulfield
571b1621e9 Add some more RO keys
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-05-24 12:33:55 +02:00
Christine Caulfield
125848d80a Reapply config defaults corosync.conf reload
There were several places where defaults were not restored
if the keys were removed from corosync.conf and the file reloaded.

This patch adds those back so that reloading corosync.conf
has the expected effect when keys are deleted.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-05-24 12:33:35 +02:00
Jan Friesse
b93d75abc4 schedwrk: Cleanup and make it work on PPC BE
Schedwrk is passing hdb handle (64-bit) to
totempg_callback_token_create as a context. Context is defined to be
pointer, so there is conversion function which stores 64-bit hdb_handle
into pointer. Potentially, pointer can be 32-bit. This means, check
part of hdb is discarded (and have to get special no_check value in
schedwrk_do) later. This works quite well on 32-bit Little-Endian
system. Sadly on Big-Endian system, check partition of hdb is stored
instead of value. Result is error of hdb_handle_get call.

Proposed solution is to pass handle pointer to
totempg_callback_token_create as context. This means full hdb (check +
value) can be used in schedwrk_do (easier detection of memory
corruption).

Main reason for this patch is to remove usage of pointer as integer
value.

Small drawback of given solution is that handle pointer must be memory
allocated on heap or static memory, making API more bug-prone. Current
usage of schedwrk API across corosync always use memory in .text
section (safe), so it's not a problem.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2016-05-17 16:29:25 +02:00
Valentin Vidic
8d8d4a936a wd: make watchdog device configurable
Add configuration option resources.watchdog_device allowing runtime
selection of watchdog device.  Useful for newer servers having more
than one watchdog available (IPMI and iTCO).

Special value "off" disables watchdog in configuration rather than
just using build options.  Useful when watchdog device is needed
elsewhere (SBD cluster stonith service).

Signed-off-by: Valentin Vidic <Valentin.Vidic@CARNet.hr>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-05-03 15:47:15 +02:00
Christine Caulfield
1e2de52ef1 logging: Use our own version of basename
basename() function has some potentially odd issues on
other platforms.

So, to be safe, here's an internal version.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-05-03 15:31:29 +02:00
Christine Caulfield
d245831d65 logsys: fix TOTEM logging when corosync built out of tree
If corosync is built out-of-tree (passing --srcdir to configure) then
TOTEM logging doesn't print anything.

This is caused by the source filenames (from __FILE__ at compilation
time) having the configured path in them - in this example
../corosync/exec/totemudp.c etc. The list of totem source filenames
passed to libqb logging facility only has the basenames so the filenames
never match up as libqb does an exact string match.

I looked into fixing this in libqb but it causes a regression. We can't
simply basename() __FILE__ at the point of calling log_printf as it's i
common also to use __FILE__ to generate the logging source, and
using basename() on both removes the distinction between similarly named
files from different directories which could be a requirement.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-04-26 09:49:53 +01:00
Christine Caulfield
aab55a004b parser: Make config file parser more hierarchy
pass 'state' down the stack so that the state of the
hierarchy doesn't get lost when there are unexpected items
in the config hierarchy.

Don't bother setting 'state' on SECTION_END as there's no point
now we're going back up the stack.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-04-22 13:01:04 +02:00
Jan Friesse
60565b7da7 totemconfig: Explicitly pass IP version
If resolver was set to prefer IPv6 (almost always) and interface section
was not defined (almost all config files created by pcs), IP version was
set to mcast_addr.family. Because mcast_addr.family was unset (reset to
zero), IPv6 address was returned causing failure in totemsrp.
Solution is to pass correct IP version stored in
totem_config->ip_version.

Patch also simplifies get_cluster_mcast_addr. It was using mix of
explicitly passed IP version and bindnet IP version.

Also return value of get_cluster_mcast_addr is now properly checked.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2016-04-07 14:45:05 +02:00
Jan Friesse
600fb4084a totempg: Fix memory leak
Previously there were two free lists. One for operational and one for
transitional state. Because every node starts in transitional state and
always ends in the operational state, assembly was always put to normal
state free list and never in transitional free list, so new assembly
structure was always allocated after new node connected.

Solution is to have only one free list.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Steven Dake <stdake@cisco.com>
2016-02-10 15:57:20 +01:00
Richard B Winters
028c473886 Fix spelling error in binary corosync
- Changed paramater to parameter in exec/logcconfig.c

Change-Id: I8a24b0ef5c6621dc6c19d7decbdfe7a255afd10d
Signed-off-by: Richard B Winters <rik@mmogp.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-01-27 18:29:25 +01:00
Ruben Kerkhof
37f092bbed totemsrp: Fix clang warning (tautological compare)
gsfrom is always >= 0

Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-01-04 17:28:14 +01:00
Ruben Kerkhof
da3288217c Remove a few unused variables and functions
Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-01-04 17:11:06 +01:00
Ruben Kerkhof
479ec4dbf0 Check for fdatasync
If we don't have it, fall back to fsync

Fixes the build on FreeBSD

Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-12-16 16:43:27 +01:00
Hideo Yamauchi
5ab922701a quorum: Display node id as unsigned int.
Signed-off-by: Hideo Yamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-11-27 15:56:54 +01:00
Christine Caulfield
165561df9b totemudp: Move udp bind() so that multicast works with IPv6
It seems that the IPv6 multicast parameters only take effect when bind()
is called, so I've moved the mcast recv socket bind() to the bottom of
totemudp_build_sockets_ip().

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-11-16 16:00:36 +00:00
Christine Caulfield
a71ec5d95d votequorum: Don't send multiple callbacks when nodes join
This patch aligns the votequorum callbacks so that they are
the same as the quorum ones. Previously it was quite common
for votequorum to send one callback for every node in the cluster
when a single new node joined (because it sent one for every
nodeinfo message it received).

This new system makes much more sense in itself and being
consistent with the internal quorum is also an advantage!

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-10-22 11:45:26 +01:00
Ferenc Wágner
73910bd66e totmesrp: Fix typo in log message
Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-08-26 09:26:26 +02:00
Christine Caulfield
d64ee7b531 wd: fix setting of watchdog timeouts
Fix setting of initial watchdog timeout, and also changing of timeout.

Remove redundant starting of timer in exec_init_fn

Signed-off-by: Kazunori INOUE <kazunori.inoue3@gmail.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2015-07-14 10:04:06 +01:00
Jason HU
15b2e94cca CFG: Prevent CFG orignating messages during SYNC
During SYNC, corosync-cfgtool -R/-H commands can pass through IPC then
send totem messages. This may corrupts
assembly_list_inuse/assembly_list_free if those messages are recedived
after SYNC is done.

The solution is marking related CFG APIs as
CS_LIB_FLOW_CONTROL_REQUIRED.

Signed-off-by: Jason HU <huzhijiang@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-07-02 16:49:38 +02:00
Christine Caulfield
b9f5c290b7 votequorum: Fix auto_tie_breaker behaviour in odd-sized clusters
auto_tie_breaker can behave incorrectly in the case of a cluster
with an odd number of nodes. It's possible for a partition to
have quorum while the other side has the ATB node, and both will
continue working. (Of course in a properly configured cluster one side
will be fenced but that becomes an indeterminate race .. just what ATB
is supposed to avoid).

This patch prevents ATB from running in a partition if the 'other'
partition might have quorum, and also mandates the use of wait_for_all
in clusters with an odd number of nodes so that a quorate partition
cannot start services or fence an existing partition with the tie
breaker node.

Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-06-18 09:57:59 +01:00
Christine Caulfield
ab8942f626 totemsrp: Improve logging of left/down nodes
This patch from Hideo Yamauchi improves the logging of
whether nodes leave the cluster cleanly or uncleanly,
making it easier to determine if a node ws shut down
by the operator. There is also the possibility that a
LEAVE message could get missed (due to the node being
in flush state) so this can also make that clearer.

The modifications are as follows.

Change 1) I added the list which maintained LEAVE node to totemsrp.
Change 2) I added registration, a search, the handling of to clear LEAVE
node.
Change 3) I added the output to log.
Change 4) I changed an output level of the log.

Signed-off-by: Hideo Yamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-06-12 16:16:45 +01:00
Christine Caulfield
53f67a2a79 totem: Log a message if JOIN or LEAVE message is ignored
As per recent email thread, this patch adds a log message if a JOIN or
LEAVE message is discarded while corosync is flushing the receive queue.

While ignoring a JOIN message is harmless (it will be resent), ignoring
a LEAVE message can cause a longer state transition as it is treated as
a node crashing rather than leaving gracefully, so the system admin
might be confused as to the cause.

Unfortunately, we can't (at the totemudp level) distinguish between JOIN
or LEAVE messages without a lot more protocol-specific code creeping in
the lower layer so the message is left ambiguous.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2015-04-17 15:49:53 +01:00
Christine Caulfield
997074cc3e totemconfig: Check for duplicate nodeids
Having duplicate nodeids in corosync.conf can play havoc with a cluster,
so (as suggested by someone on this list) here is some code to check
that all nodeids are unique. Even if a nodeid is not specified it will
check to be sure that the ID generated from the IP address (ipv4 only)
does not clash with one that is provided.

It logs all non-unique nodeids to syslog, but only the last is reported
on the command-line to the user which should be enough to get them to
check further. At startup this will cause corosync to fail to start.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2015-04-10 14:22:07 +01:00
Christine Caulfield
82526d2fe9 quorum: don't allow quorum_trackstart to be called twice
If quorum_trackstart() or votequorum_trackstart() are called twice with
CS_TRACK_CHANGES then the client gets added twice to the notifications
list effectively corrupting it. Users have reported segfaults in
corosync when they did this (by mistake!).

As there's already a tracking_enabled flag in the private-data, we check
that before adding to the list again and return an error if
the process is already registered.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-03-16 11:37:52 +00:00
Christine Caulfield
8cc8e51363 cpg: Add support for messages larger than 1Mb
If a cpg client sends a message larger than 1Mb (actually slightly
less to allow for internal buffers) cpg will now fragment that into
several corosync messages before sending it around the ring.

cpg_mcast_joined() can now return CS_ERR_INTERRUPT which means that the
cpg membership was disrupted during the send operation and the message
needs to be resent.

The new API call cpg_max_atomic_msgsize_get() returns the maximum size
of a message that will not be fragmented internally.

New test program cpghum was written to stress test this functionality,
it checks message integrity and order of receipt.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-03-05 16:45:15 +00:00
Andrey N. Groshev
5d9acc5604 totemsrp: Format member list log as unsigned int
Signed-off-by: Andrey N. Groshev <greenx@yandex.ru>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-03-05 16:34:07 +01:00
Christine Caulfield
c832ade034 Don't allow both two_node and auto_tie_breaker in corosync.conf
The two_node and auto_tie_breaker options are incompatible as they
specify conflicting methods of determining the quorate half of a cluster
partition.

This patch detects this error in corosync.conf, issues a message and
disables two_node if auto_tie_breaker is present.

Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-03-02 15:50:21 +00:00