Commit Graph

2100 Commits

Author SHA1 Message Date
Christine Caulfield
31ddba64a2 config: Don't fudge port numbers
When I was adding knet I wanted the port numbers to default to the
base port number + the linknumber.

However I seem to have messed this up such that any port number
specified in the config file has the link number added to it. Which
is almost certainly not what people would expect.

This patch sets it right. If a port number is not specified
then 5405+linknumber is used. If a port number IS specified
then that actual number is used.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-01-18 16:31:24 +01:00
Christine Caulfield
22ae4cacda knet: Allow ping_timers to be auto-configured
knet ping_timers are auto-configured according to token value.

This patch also fixes some knet config bugs that resulted in defaults
not being applied when values were removed from corosync.conf.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-01-15 15:08:19 +01:00
yuskiida
e7734fab70 build: Add the headers necessary for RPM build
Signed-off-by: yuskiida <yusk.iida@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-01-11 14:47:46 +01:00
Christine Caulfield
236032f7b5 config: if local node addr is wrong, fail with a sensible message
If no valid local address is found in corosync.conf then corosync
exits with: "parse error in config: No multicast port specified"

This is because of the config change for knet that always populates
the interfaces. The old error of "no interfaces found" was only
slightly better anyway IMHO.

This patch adds an explicit check that local_node_pos has been
set in icmap and uses that to determine if a valid local address
has been found.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-01-09 17:50:12 +01:00
Jan Friesse
96cb977880 totemknet: Drop truncated packets on receive
This is backport of part of "totemudpu: Scale receive buffer" patch.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-01-09 17:46:31 +01:00
Jan Friesse
0f1813adff totemudp: Make use of UDP_RECEIVE_FRAME_SIZE_MAX
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-01-09 17:46:28 +01:00
Jan Friesse
32535b842c totemudpu: Export and rename UDPU_FRAME_SIZE_MAX
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-01-09 17:46:25 +01:00
Jan Friesse
3982f795d5 totemconfig: Fix UDP autogeneration of mcast addr
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-01-09 17:46:21 +01:00
Jan Friesse
155c0d4052 totemudpu: Scale receive buffer
Receive buffer should be based on PROCESSOR_COUNT_MAX and not static
buffer.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-01-09 17:46:04 +01:00
Christine Caulfield
98bb0c78c8 config: Allow selection of crypto_model
KNET has options for nss or openssl crpyto libraries, make this
available to corosync.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-01-05 15:25:17 +01:00
Christine Caulfield
2a6a571c06 config: Allow links to have different ip_versions
knet allows links to have different IP versions - proivided they
all match per link. So don't force them all to be the same.

I've added a check here to make sure that all nodes on the same
link are using the same IP version.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-12-22 17:15:19 +01:00
Bin Liu
b1d3eca448 wd: fix snprintf warnings
When running ./configure --enable-watchdog, gcc 7.2.1 will report
warnings for snprintf. This patch fixes the warnings.

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-12-01 17:23:54 +01:00
Christine Caulfield
1ca72a1154 totemsrp: Revert totemsrp_get_ifaces() changes
In my enthusiasm for removing code while integrating knet I
also deleted the correct code for returning IP address for a node,
so that only the IP addres of the local node was ever returned.

This commit restores the the previous code.

Also, because we always return INTERFACE_MAX interfaces now (they don't
have to be contiguous) set ss_family to zero if that interface is not
in use so that downstream apps know and don't display a lot of 0.0.0.0

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-11-30 16:59:05 +01:00
Bin Liu
af21baf0ff totemconfig: remove duplicate aes256 test
Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-11-29 18:18:52 +01:00
Jan Friesse
154895dfbe sync: Call sync_init of all services at once
This patch solves situation which can happen very rearly:
- Node B is running
- Node A is started and tries to create singleton membership. It also
  initialize service S which tries to send message during initialization
- Just before node A finished move to operational state, it gets
  Node B multicast message so moves to gather state
- Node A and B creates membership and moves to operational state and
  sync is started
- Node A and B receives message sent by node A during initialization of
  service S
- Node A exits before sync of service is finished

In this situation, node B may never execute sync_init for
service S. So node B service S is not aware of existence of node A but
it received message from it.

Similar situation can theoretically also happen during merge.

Solution is to change flow of sync, so now it looks like:

- Build service_list
- Call sync_init for all local services
- Send service_list
- Receive service_list from all members and send barier
- For all services:
  - Receive barier
  - Call sync_activate if this is not first service
  - Call sync_process for next service or finish sync if previous
    this service is the last one
  - Send barier

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2017-11-16 15:22:19 +01:00
Jan Friesse
499eaac80f sync: Remove unneeded determine sync code
Code was used for compatibility with old sync v1 (in needle this was
deleted and previous version 2 became v1), and it's no longer needed.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2017-11-16 15:22:14 +01:00
Christine Caulfield
1df7eca5ad stats: Add some missing knet stats
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-11-16 08:35:50 +01:00
Ferenc Wágner
09b0123d58 Send corosync startup notification to systemd
This enables starting the daemon directly in the service file, because
dependent units won't be started until initialization is complete.

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-11-09 09:49:18 +01:00
Jan Friesse
f05d1c9293 coroparse: Do not convert empty uid, gid to 0
When uid (or gid) value was empty string it was incorrectly converted to
0. Solution is to check input string emptines.

Thanks Bin Liu <bliu@suse.com> for reporting the bug.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Bin Liu <bliu@suse.com>
2017-11-06 09:37:54 +01:00
Christine Caulfield
45fe19ed86 stats: Don't display errors when reading knet stat
Only add the knet handle stat keys if we are actually running knet. This
prevents errors occurring when iterating through all of the stats keys

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-11-03 13:40:41 +01:00
Christine Caulfield
d9dfd41e4e stats: Add cmap key to clear the various stats.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-10-31 17:39:14 +01:00
Bin Liu
cf339c20c3 totemconfig: generate mcast icmap items for UDP
Generating mcastaddr and mcastport in icmap make
sense only for UDP transport.

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-10-30 14:14:48 +01:00
Bin Liu
99567f0e65 totemconfig: add nodeid check for knet
Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-10-30 13:02:03 +01:00
Christine Caulfield
396bca4739 config: Fix memory leak
totem_volatile_config_set_string_value was not properly freeing memory.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-10-23 17:31:14 +02:00
Christine Caulfield
16f616b65d knet: Add support for knet compression
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-10-23 17:30:25 +02:00
Jan Friesse
165d748c04 cmap: Remove noop highest config version check
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2017-10-11 17:11:33 +02:00
Jonathan Davies
2d0e8114ba cmap: don't shutdown highest config_version node
Scenario:
 1. node A starts corosync with config_version = 2, nodelist = {A, B}
 2. node B starts corosync with config_version = 1, nodelist = {A, B}

corosync.conf(5) says the config_version option is "used to prevent
joining old nodes with not up-to-date configuration."

So expected outcome is:
 * corosync on node A remains alive
 * corosync on node B exits

Actual outcome is:
 * corosync on node A exits
 * corosync on node B exits

Explanation of actual behaviour:
 * Host A will have cmap_my_config_version = 2 but
   cmap_highest_config_version_received = 1, so will shutdown in
   cmap_sync_activate because these are not equal.
 * Host B will have cmap_my_config_version = 1 but
   cmap_highest_config_version_received = 2, so will shutdown in
   cmap_sync_activate because these are not equal.

Instead, node A should consider its own config_version in the
calculation of the highest config_version, i.e.
cmap_highest_config_version_received = 2, and so not shutdown
in cmap_sync_activate.

Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-10-11 17:07:35 +02:00
Kazunori INOUE
576a493d1e totemudp: Remove memb_join discarding
This is already implemented in totemsrp in much cleaner way (added
by commit ab8942f626).

Signed-off-by: Kazunori INOUE <inouekazu@intellilink.co.jp>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-10-02 11:33:58 +02:00
Edwin Torok
15383b3eb3 votequorum: make atb consistent on nodelist reload
When the cluster changes from even sized to odd sized corosync
disables auto-tie-breaker if wait_for_all is not enabled.
However when changing from odd sized to even sized it doesn't reenable
it, causing auto_tie_breaker to be inconsistent across the cluster:
the newly added node and any nodes that restart corosync
will have it, but all the previously running nodes won't.

Signed-off-by: Edwin Torok <edvin.torok@citrix.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-09-26 18:05:17 +02:00
Fabio M. Di Nitto
76591baa4a totem: Remove unnecessary NSS headers
Also fix corosync.spec.in to depend on libknet.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-09-22 10:27:01 +02:00
Christine Caulfield
294a629fb5 config: Allow dynamic link configuration
Now we are using knet, it's possible to dynamically add, remove and
reconfigure links on the fly.

Also print 'n' for non-existant knet links. This will show up
only on loopback links >0. But it looks better than 'status ='

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-09-21 17:16:21 +02:00
Masse Nicolas
5b38aa721a totemudp: Retry if bind fails
If bind call fails it's retried for BIND_MAX_RETRIES.
If it's still unsuccessful, corosync exists instead
of working incorrectly.

Slightly modified by reviewer.

Signed-off-by: Masse Nicolas <nicolas.masse@stormshield.eu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-09-19 12:44:26 +02:00
Ferenc Wágner
b7b318b86f wd: default to not using a watchdog
Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-09-14 17:40:48 +02:00
Ferenc Wágner
151ed9dfe5 wd: remove extra capitalization typo
Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-09-12 14:23:04 +02:00
Jonathan Davies
3296a0d41a totemknet: fix debug message typo
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-09-11 11:51:16 +02:00
Ferenc Wágner
0f33464531 wd: fix typo
Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-09-11 11:40:12 +02:00
Christine Caulfield
ed235edfe3 stats: add knet 'handle' stats
knet handle stats show compression and crypto statistics. With these
you can see the effectiveness of compression and the overheads of both
crypto and compression.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-08-23 14:18:59 +02:00
Christine Caulfield
01495f650c main: use syslog & printf directly for early log messages
libqb seems funny about logging things before its fully configured.
This corosync commit didn't help either:
8b6bd86a55

So to make sure that messages about the config file not being opened
get delivered to the user/syslog we send them directly.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-08-22 09:51:09 +01:00
Christine Caulfield
9898fc8760 totempg: Allow space for incoming overflow
totempg needs to store the current message + any
overflow for the next message which can be up to (nearly) the MTU size.
in knet that's large, but for UDP it's just 1500.

The reason we've never seen it before is because the actual max message
size is 1024 less than 1MB and after all the headers are stripped out the overflow is
usually 1024 bytes or less.
The 1024*1024 size of the assembly buffer is large enough to hold a max message (1047552) +
1024 bytes of a new UDP message. So we never saw any problems.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-08-14 14:04:31 +01:00
Chrissie Caulfield
f4a7e54d45 totemknet: Use knet's LOOPBACK transport (#236)
knet now has a built-in LOOPBACK transport so use that
rather than special-casing it for ourself.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-08-04 12:59:16 +01:00
Christine Caulfield
9da89f32c2 CFG: Remove ring-reenable code
RRP doesn't exist any more so all the ring re-enable code is redundant.

I've removed it from the library and all the code that does anything,
but I've left the hole in the IPC just in case old libraries are
hanging around.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-08-03 14:32:02 +02:00
Jan Friesse
9a50628fd1 main: Add support for libcgroup
When corosync is started in environment where it ends in cgroup without
properly set rt_runtime_us it's impossible to get RT priority.

Already implemented workaround is to use higher non-RT priority.

This patch implements another solution. It moves corosync into root cpu
cgroup. Root cpu cgroup hopefully has enough RT budget.

Another solution was mentioned on ML
https://lists.freedesktop.org/archives/systemd-devel/2017-July/039353.html
but this means to generate some "random" values.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
(cherry picked from commit c56086c701)
2017-08-01 14:32:53 +02:00
Christine Caulfield
55c3dcb76d stats: Add map with on-demand statistics
Icmap is factored out so it's possible to add other
maps for cmap. API call to switch maps from application
end is added.

Corosync-cmapctl is enhanced with -m option.

Stats contains all statistics previously found in runtime.connections,
runtime.services and runtime.totem prefixes together with new knet
related. All stats are read only.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-07-27 15:53:04 +02:00
Christine Caulfield
876910d8ff ipc: Check for the libraries sending invalid message IDs
If the library sent an invalid (ie too high) message ID to
corosync, then it could cause the daemon to crash.

Now we check the message ID before indexing the function array

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-07-14 14:06:49 +01:00
Jan Friesse
9627d7350b main: Add option to set priority
Option -P takes numeric value with same meaning
as nice or values min / max, meaning maximal / minimal priority (so
minimal / maximal nice value).

Scheduler / priority setting is moved in code so it is now executed
after logsys is configured so errors are logged.

Setting maximal priority is also used as fallback when realtime
scheduling is requested and sched_setscheduler fails.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
(cherry picked from commit a008448efb)
2017-07-10 16:40:39 +02:00
Jan Friesse
2c17832fa6 totemknet: Prevent dead-loop in log_flush_messages
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2017-07-03 15:41:08 +02:00
Jan Friesse
abc1fa5626 totemknet: Flush knet log messages
When initialization fails knet logs messages into pipe. Previously they
were never processed. Solution is to add log_flush_messages which takes
care to call log_deliver_fn.

Call of log_flush_messages is also added to totemknet_finalize because
this removes log pipe fd from qb_loop so similar problem can happen.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2017-07-03 13:19:11 +02:00
Jan Friesse
cf18736d52 totemconfig: Make crypto work again
Knet needs longer key and supports various key lengths. Split
TOTEM_PRIVATE_KEY_LEN into TOTEM_PRIVATE_KEY_LEN_MIN and
TOTEM_PRIVATE_KEY_LEN_MAX (both using KNET_*_KEY_LEN).

Fix incorrect "Could only read..." message.

Make sure key is properly initialized/zeroed.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2017-07-03 13:19:02 +02:00
Christine Caulfield
b7df8fa46f knet: Compile with latest knet API
extra parameter added to knet_link_get_status()

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-06-29 10:02:21 +01:00
Jan Friesse
564b4bf7d4 totem: Propagate totem initialization failure
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2017-06-15 11:07:33 +02:00
Christine Caulfield
fa37b6073c totemknet: Use new knet_link_set_config() API
TC_PRIO_INTERACTIVE is now a link option in knet, so we have
to provide it at link config time.

This needs the latest knet git to compile as this is an updated API.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-06-09 13:28:46 +01:00
Michael Jones
afd97d7884 coroapi: Use size_t for private_data_size
Unsigned int and size_t represent two different concepts.

Same problem was present in ipc_glue.

Signed-off-by: Michael Jones <jonesmz@jonesmz.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-05-29 17:23:37 +02:00
Christine Caulfield
5b1df51aa6 votequorum: Report errors from votequorum_exec_send_reconfigure
If votequorum_exec_send_reconfigure() returns an error (ie the
packet could not be sent) then we should either return it to the
sender (for a library call) or, for an internal call, log it.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-05-26 16:18:33 +02:00
Jan Friesse
95b91e4ae7 main: Display reason why cluster cannot be formed
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2017-05-18 17:15:55 +02:00
Christine Caulfield
571f499e0a knet: Allow space for encapsulated messages
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-05-09 09:05:12 +01:00
Andrew Price
86012ebb45 Main: Call mlockall after fork
Man page of mlockall is clear:
Memory locks are not inherited by a child created via fork(2) and are
automatically removed (unlocked) during an execve(2) or when the
process terminates.

So calling mlockall before corosync_tty_detach is noop when corosync is
executed as a daemon (corosync -f was not affected).

This regression is caused by ed7d054e55
(setprio for logsys/qblog was correct, mlockall was not).

Solution is to move corosync_mlockall call on correct place.

Signed-off-by: Andrew Price <anprice@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-04-25 14:50:04 +02:00
Bin Liu
c83e6c7ed9 coroparse: Use readdir instead of readdir_r
readdir_r is deprecated in glibc 2.24 in favor of readdir (which became
thread safe). Also because corosync never calls read_uidgid_files_into_icmap
in muliple threads, no problem should appears even with libc where
readdir is thread-safe.

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-04-20 08:53:54 +02:00
Bin Liu
725f9039e9 totemknet: Handle logpipe creation failure
Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-04-20 08:53:49 +02:00
Bin Liu
be3e166249 wd: Report error when close of wd fails
Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-04-20 08:53:45 +02:00
Christine Caulfield
44afff227d totemknet: Got back to recvmsg() from recvmmsg()
The kernel team have recommended us not to use recvmmsg and as it
confers no particular speed advantage (especially given the extra
memory consumption) I'm going back to single message recvmsg() again.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-04-11 13:44:08 +01:00
Bin Liu
0462b5e609 totemconfig: Prefer nodelist over bindnetaddr
In a two-node cluster, I 've one node configured with open-vswtich:
5: br-fixed: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
state UNKNOWN group default
inet 192.168.124.88/24 scope global br-fixed
inet 192.168.124.87/24 scope global secondary br-fixed
inet 192.168.124.83/24 brd 192.168.124.255 scope global secondary
tentative br-fixed
inet 192.168.124.89/24 scope global secondary br-fixed

while I use 192.168.124.83 in node list of corosync.conf with udpu, and
the bind_addr is 192.168.124.0. After upgrading corosync on this node,
the it uses 192.168.124.88 instead of 192.168.124.83. As we can see:

corosync-cfgtool -s
Printing ring status.
Local node ID 1084783704

corosync-quorumtool -s
Membership information:
Nodeid Votes Name
1084783697 1 d52-54-77-77-01-02
1084783699 1 d52-54-77-77-01-01 (local)

while the other node can only see itself:
corosync-cfgtool -s
Printing ring status.
Local node ID 1084783697
RING ID 0
id = 192.168.124.81
status = ring 0 active with no faults

corosync-quorumtool -s
Membership information:
Nodeid Votes Name
1084783697 1 d52-54-77-77-01-02.virtual.cloud.suse.de (local)

this patch will check if there are both nodelist and bindnetaddr and if
so, display warning and use nodelist information.

Signed-off-by: Bin Liu <bliu@suse.com>
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-04-11 11:19:31 +02:00
Christine Caulfield
6076e840f5 knet: Close libknet down cleanly at shutdown
By tidily shutting down knet in totekmknet_finalize we
make sure all the links are cleanly taken down and,
more importantly for us, the corosync LEAVE message gets
sent so we don't get fenced on a clean exit.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-04-11 09:03:26 +01:00
Bin Liu
d2a5e1442e logconfig: Do not overwrite logger_subsys priority
logfile_priority and syslog_priority could be modified by
logging.logger_subsys.{logfile_priority|syslog_priority}. which could
lead to the following output(which are at notice level):

corosync[21419]:   [QUORUM] Using quorum provider corosync_votequorum
corosync[21419]:   [QUORUM] Members[1]: 1084777643
corosync[21419]:   [QUORUM] This node is within the primary component
                   and will provide service.
corosync[21419]:   [QUORUM] Members[3]: 1084777563 1084777584 1084777643

even the syslog_priority is warning. This patch could avoid the
overwrite.

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-03-10 09:09:42 +01:00
Christine Caulfield
16770a4153 totem: Fix buffer sizes
knet needs buffers to be KNET_MAX_PACKET_SIZE or messages will
get lost or corrupted.

UDPU packets shouldn't be that big so I introduced UDP_FRAME_SIZE_MAX
for that transport.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-03-02 14:57:39 +00:00
Christine Caulfield
30771a39a8 main: Don't ask libqb to handle segv, it doesn't work
segv should be handled by corosync, libqb is not the
place to be handling emergency signals.

This currently requires the head of libqb git tree to
generate a blackbox & coredump in the event of a segfault,
but it's better than the write() spin that currently happens.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-02-27 15:14:41 +00:00
Jan Friesse
8b6bd86a55 Logsys: Change logsys syslog_priority priority
LibQB adds default "*" syslog filter so we have to set syslog_priority
as low as possible so filters applied later in
_logsys_config_apply_per_file takes effect.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2017-02-24 16:23:50 +01:00
Fabio M. Di Nitto
36ef2af5a7 knet: improve logging messages by adding knet subsystem
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2017-02-24 09:41:35 +01:00
Fabio M. Di Nitto
19232f6052 knet: Change nodeids to knet_node_id_t for new knet compatibility
after some feedback on github, people prefers to have the option
to support up to 64K node_id's.

libknet added knet_node_id_t to mask the size and type, currently
set to uint16_t.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2017-02-14 06:08:45 +01:00
Christine Caulfield
c0f1d576d6 knet: Fix MTU sizes & allow transport config in corosync.conf
Corosync layers don't need to know the knet MTU size - this way
corosync fragments buffers only when they get larger than the
KNET buffer size (64K) and knet fragments below that based on
the actual MTU and transport considerations.

It is also now possible to configure knet to use UDP or SCTP
transports in corosync.conf. This is currently done per-link
so if you have more than 1 link you need several interface{}
stanzas inside totem{} to make it use other than the default
of UDP. if it's useful I might add the option of a global
default.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-02-13 16:54:30 +00:00
Fabio M. Di Nitto
970549ddfc knet: PMTUd data_mtu already accounts for IP and knet header overheads
provide some more space for data and small (+1% perf boost)

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2017-02-11 06:41:38 +01:00
Fabio M. Di Nitto
18fef0ae7f knet: switch from write to sendto()
this provides another 9.6% performance boost on 2 node clusters

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2017-02-11 06:24:12 +01:00
Christine Caulfield
d9df98ceba knet: Change nodeids to 8 bit for new knet compatibility
I've also put an assert in totemknet_member_add() to check
for invalid nodeids. Later on we need to fix the rest of the
corosync code to only use 8bit nodeids (or force people to use
UDPU if they want large nodeids).

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-02-03 09:38:32 +00:00
Christine Caulfield
2d478505e5 knet: Fix member_remove to shut down existing links first
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-01-16 13:16:15 +00:00
Christine Caulfield
029b8ebad6 knet: Reduce default pong count to 2 for faster startup
The default PONG_COUNT of 5 made corosync slow to connect to other
nodes.
This helps.
2017-01-03 13:30:26 +00:00
Christine Caulfield
950cca886e totemknet: Make it compile with kronosnet git master
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-12-22 10:25:11 +00:00
Takeshi MIZUTA
4939c75629 Remove redundant header file inclusion
Signed-off-by: Takeshi MIZUTA <miz.take4@gmail.com>
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-12-05 09:59:08 +01:00
Bin Liu
819d66ca1c Totempg: remove duplicate memcpy in mcast_msg func
In function mcast_msg of totempg.c, line 923, there is a memcpy call in
"else" branch, and also another memcpy out of the "else" branch, while
the two calls have the same parameters. It is possibleto remove the memcpy
in "else" branch.

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-12-05 09:40:55 +01:00
Takeshi MIZUTA
034553c080 man: Modify man-page according to command usage
Signed-off-by: Takeshi MIZUTA <miz.take4@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-12-01 16:32:42 +01:00
Takeshi MIZUTA
9c5b39d438 totempg: totempg_groups_join return valid error
totempg_groups_join() is called by sync_init().
sync_init() judge that totempg_groups_join() failed if return code of
totempg_groups_join() is -1.
Therefore, the return code should return in -1 when
totempg_groups_join() fails.

Signed-off-by: Takeshi MIZUTA <miz.take4@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-11-23 09:22:21 +01:00
Christine Caulfield
401f483cce knet: Support reload of link parameters
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-11-17 11:41:54 +00:00
Takeshi MIZUTA
f5dcc4a5f2 list: Unify the list processing with qb_list func
Signed-off-by: Takeshi MIZUTA <miz.take4@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-11-15 12:19:13 +01:00
Christine Caulfield
7cec6a131d knet: Allow configuration of more params
knet_pmtud_interval &
knet_pong_count

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-11-15 09:32:09 +00:00
Chrissie Caulfield
65219a6300 knet: Don't lose log messages when knet gets busy (#165)
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-11-14 15:01:34 +00:00
Jan Friesse
1f90c31ba7 list: Replace for_each by safe version where need
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2016-10-27 14:56:52 +02:00
Michael Jones
b4c06e52f3 list: Replace uses of list.h with qblist.h
Signed-off-by: Michael Jones <jonesmz@jonesmz.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-10-27 14:56:52 +02:00
Christine Caulfield
86de6ce1e6 totem: add totemknet.[ch]
it seems git is better at deleting files than adding them

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-10-13 08:46:34 +01:00
Michael Jones
a24d26c46a cfg: Prevents use of uninitialized buffer
Signed-off-by: Michael Jones <jonesmz@jonesmz.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-10-12 16:19:05 +02:00
Christine Caulfield
268cde6ee4 totem: Add Kronosnet transport.
This is a big update that removes RRP & MRP from the codebase
and makes knet the default transport for corosync. UDP & UDPU
are still (currently) supported but are deprecated. Also crypto
and mutiple interfaces are only supported over knet.

To compile this codebase you will need to install libknet from
https://github.com/fabbione/kronosnet

The corosync.conf(5) man page has been updated with info on the new
options. Older config files should still work but many options
have changed because of the knet implementation so configs should
be checked carefully. In particular any cluster using using RRP
over UDP or UDPU will not start as RRP is no longer present. If you
need multiple interface support then you should be using the knet transport.

Knet brings many benefits to the corosync codebase, it provides support
for more interfaces than RRP (up to 8), will be more reliable in the event
of network outages and allows dynamic reconfiguration of interfaces.
It also fixes the ifup/ifdown and 127.0.0.1 binding problems that have
plagued corosync/openais from day 1

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-10-11 10:09:42 +01:00
HideoYamauchi
f1ffe31ce5 coropase: Set a poll_period value for wd monitor
Signed-off-by: HideoYamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-10-06 15:48:38 +02:00
Christine Caulfield
c4683be9b0 votequorum: simplify reconfigure message handling
As we now have update_node_expected_votes(), we can use that
when receiving a new EXPECTED_VOTES value from another node
rather than having our own loop.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-09-13 15:55:58 +01:00
Christine Caulfield
bd2e6b5d9d votequorum: Don't update expected_votes display if value is too high
If expected_votes was set via the library but the calculation
decides it's too high, then an error is correctly returned but
the value is still set in the nodes' expected_votes field and
turns up in the corosync-quorumtool display.

This patch separates out the quorum calculation from the updating
of expected_votes per node to prevent this from happening.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-09-13 14:28:56 +01:00
Ferenc Wágner
cf10a754e9 Fix various typos
occured -> occurred
parantheses -> parentheses
configuraton -> configuration
aquire -> acquire
retrive -> retrieve
prefered -> preferred

Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-09-12 09:50:11 +02:00
Jan Friesse
f837f95dfe Config: Flag config uidgid entries
Uidgid entries parsed from configuration files now has prefix
(uidgid.config.) so they are distinguishable from dynamically added
entries. Entries added from config file are pruned on reload if no
longer exists in config file (dynamic one stays unaffected). Also whole
uidgid.config. prefix is made read only.

This make PCMK work again after configuration reload is called.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2016-08-04 16:13:48 +02:00
HideoYamauchi
71c9035c27 Low: totemsrp: Addition of the log.
Signed-off-by: HideoYamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-08-01 10:11:45 +02:00
Jan Friesse
1925074909 Fix few bugs found by coverity
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2016-06-28 13:58:43 +02:00
Christine Caulfield
0665aca9e1 quorum: revert patch that adds qdevice (node 0) to quorum callback
Revert patch 9f54f0a1fad7dad42c55562a50dfb9d773e6a660 as it causes
more troubles than it solves. Code that uses the quorum nodelist
to get a list of actual nodes in the cluster for communication
break using this as well as the display from corosync-quorumtool

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-06-28 13:58:43 +02:00
Christine Caulfield
c9c6d9e30f quorum: Return qdevice nodeid in the quorum callbacks (if active).
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-06-28 13:58:41 +02:00
Christine Caulfield
e41b256c67 votequorum: Allow wait_for_all with qdevice
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-06-28 13:58:39 +02:00
Christine Caulfield
98548e1880 qnetd: lms: Fix search for node/ring_id check
We were looking for us in other node lists, rather than
others in our nodelist.

Also, remove debug print in votequorum.c

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-06-28 13:58:39 +02:00
Christine Caulfield
3a5d51fca7 votequorum: Fix up quorum/nodelist callbacks
This patch tidies the two state change callbacks and explains them
in the man page:

The difference between votequorum_nodelist_notification_t and
votequorum_quorum_notification_t is subtle but important.
The 'nodelist' callback is sent at the start of a cluster state
transition and contains the new ring_id and only the list of
nodes that are included in the sync state - ie only active nodes. No
quorum information is included this callback because it is not
available at that time.

The 'quorum' callback is sent after the cluster state transition has
completed and does contain quorum information.
In addition, the nodelist contains a list of all nodes known to
votequorum (whether up or down) and their state as well
as information about the quorum device attached (if any). quorum
callbacks will not be sent for qdevice up and down
events unless they affect quorum.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-06-28 13:58:39 +02:00
Christine Caulfield
cf0028c86e votequorum: split callbacks into nodelist and quorum
This split is needed for qdevice, so that it gets the ring_id and
nodelist as part of the sync process and not afterwards - when quorum
has been calculated.

As this is and unsupported API I'm not too worried about breaking
existing code - all the clients I know of are using the quorum API
anyway as they should be.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-06-28 13:58:38 +02:00
Jan Friesse
44df76a7ee config: get_cluster_mcast_addr error is not fatal
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2016-06-28 13:57:14 +02:00
Ferenc Wágner
c76ee39f61 Fix typo: Diabled -> disabled
Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-06-22 14:26:48 +02:00
Ferenc Wágner
b1de8efd15 Fix typo: aquire -> acquire
Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-06-22 14:26:28 +02:00
Ferenc Wágner
841f48e253 Fix typo: Uknown -> Unknown
Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-06-22 14:26:22 +02:00
Christine Caulfield
f2a1fcc5bf logconfig: Fix logging reload disabling logfiles
In my previous logconfig patch, adding a subsys so the
logging stanzas could disable logging to a file, because
the subsys closed the file used by the main logging.

This patch only applies defaults to higher-level logging and
non-deprecated keys.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-05-27 17:36:30 +02:00
yuusuke
2ef086bd9b wd: Warn if values are out of range
Signed-off-by: yuusuke <yusk.iida@gmail.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2016-05-27 10:38:30 +02:00
yuusuke
39cd6b3d1d parser: WD Read type correctly from corosync.conf
Signed-off-by: yuusuke <yusk.iida@gmail.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2016-05-27 10:36:24 +02:00
Christine Caulfield
571b1621e9 Add some more RO keys
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-05-24 12:33:55 +02:00
Christine Caulfield
125848d80a Reapply config defaults corosync.conf reload
There were several places where defaults were not restored
if the keys were removed from corosync.conf and the file reloaded.

This patch adds those back so that reloading corosync.conf
has the expected effect when keys are deleted.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-05-24 12:33:35 +02:00
Jan Friesse
b93d75abc4 schedwrk: Cleanup and make it work on PPC BE
Schedwrk is passing hdb handle (64-bit) to
totempg_callback_token_create as a context. Context is defined to be
pointer, so there is conversion function which stores 64-bit hdb_handle
into pointer. Potentially, pointer can be 32-bit. This means, check
part of hdb is discarded (and have to get special no_check value in
schedwrk_do) later. This works quite well on 32-bit Little-Endian
system. Sadly on Big-Endian system, check partition of hdb is stored
instead of value. Result is error of hdb_handle_get call.

Proposed solution is to pass handle pointer to
totempg_callback_token_create as context. This means full hdb (check +
value) can be used in schedwrk_do (easier detection of memory
corruption).

Main reason for this patch is to remove usage of pointer as integer
value.

Small drawback of given solution is that handle pointer must be memory
allocated on heap or static memory, making API more bug-prone. Current
usage of schedwrk API across corosync always use memory in .text
section (safe), so it's not a problem.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2016-05-17 16:29:25 +02:00
Valentin Vidic
8d8d4a936a wd: make watchdog device configurable
Add configuration option resources.watchdog_device allowing runtime
selection of watchdog device.  Useful for newer servers having more
than one watchdog available (IPMI and iTCO).

Special value "off" disables watchdog in configuration rather than
just using build options.  Useful when watchdog device is needed
elsewhere (SBD cluster stonith service).

Signed-off-by: Valentin Vidic <Valentin.Vidic@CARNet.hr>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-05-03 15:47:15 +02:00
Christine Caulfield
1e2de52ef1 logging: Use our own version of basename
basename() function has some potentially odd issues on
other platforms.

So, to be safe, here's an internal version.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-05-03 15:31:29 +02:00
Christine Caulfield
d245831d65 logsys: fix TOTEM logging when corosync built out of tree
If corosync is built out-of-tree (passing --srcdir to configure) then
TOTEM logging doesn't print anything.

This is caused by the source filenames (from __FILE__ at compilation
time) having the configured path in them - in this example
../corosync/exec/totemudp.c etc. The list of totem source filenames
passed to libqb logging facility only has the basenames so the filenames
never match up as libqb does an exact string match.

I looked into fixing this in libqb but it causes a regression. We can't
simply basename() __FILE__ at the point of calling log_printf as it's i
common also to use __FILE__ to generate the logging source, and
using basename() on both removes the distinction between similarly named
files from different directories which could be a requirement.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-04-26 09:49:53 +01:00
Christine Caulfield
aab55a004b parser: Make config file parser more hierarchy
pass 'state' down the stack so that the state of the
hierarchy doesn't get lost when there are unexpected items
in the config hierarchy.

Don't bother setting 'state' on SECTION_END as there's no point
now we're going back up the stack.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-04-22 13:01:04 +02:00
Jan Friesse
60565b7da7 totemconfig: Explicitly pass IP version
If resolver was set to prefer IPv6 (almost always) and interface section
was not defined (almost all config files created by pcs), IP version was
set to mcast_addr.family. Because mcast_addr.family was unset (reset to
zero), IPv6 address was returned causing failure in totemsrp.
Solution is to pass correct IP version stored in
totem_config->ip_version.

Patch also simplifies get_cluster_mcast_addr. It was using mix of
explicitly passed IP version and bindnet IP version.

Also return value of get_cluster_mcast_addr is now properly checked.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2016-04-07 14:45:05 +02:00
Jan Friesse
600fb4084a totempg: Fix memory leak
Previously there were two free lists. One for operational and one for
transitional state. Because every node starts in transitional state and
always ends in the operational state, assembly was always put to normal
state free list and never in transitional free list, so new assembly
structure was always allocated after new node connected.

Solution is to have only one free list.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Steven Dake <stdake@cisco.com>
2016-02-10 15:57:20 +01:00
Richard B Winters
028c473886 Fix spelling error in binary corosync
- Changed paramater to parameter in exec/logcconfig.c

Change-Id: I8a24b0ef5c6621dc6c19d7decbdfe7a255afd10d
Signed-off-by: Richard B Winters <rik@mmogp.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-01-27 18:29:25 +01:00
Ruben Kerkhof
37f092bbed totemsrp: Fix clang warning (tautological compare)
gsfrom is always >= 0

Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-01-04 17:28:14 +01:00
Ruben Kerkhof
da3288217c Remove a few unused variables and functions
Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-01-04 17:11:06 +01:00
Ruben Kerkhof
479ec4dbf0 Check for fdatasync
If we don't have it, fall back to fsync

Fixes the build on FreeBSD

Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-12-16 16:43:27 +01:00
Hideo Yamauchi
5ab922701a quorum: Display node id as unsigned int.
Signed-off-by: Hideo Yamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-11-27 15:56:54 +01:00
Christine Caulfield
165561df9b totemudp: Move udp bind() so that multicast works with IPv6
It seems that the IPv6 multicast parameters only take effect when bind()
is called, so I've moved the mcast recv socket bind() to the bottom of
totemudp_build_sockets_ip().

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-11-16 16:00:36 +00:00
Christine Caulfield
a71ec5d95d votequorum: Don't send multiple callbacks when nodes join
This patch aligns the votequorum callbacks so that they are
the same as the quorum ones. Previously it was quite common
for votequorum to send one callback for every node in the cluster
when a single new node joined (because it sent one for every
nodeinfo message it received).

This new system makes much more sense in itself and being
consistent with the internal quorum is also an advantage!

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-10-22 11:45:26 +01:00
Ferenc Wágner
73910bd66e totmesrp: Fix typo in log message
Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-08-26 09:26:26 +02:00
Christine Caulfield
d64ee7b531 wd: fix setting of watchdog timeouts
Fix setting of initial watchdog timeout, and also changing of timeout.

Remove redundant starting of timer in exec_init_fn

Signed-off-by: Kazunori INOUE <kazunori.inoue3@gmail.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2015-07-14 10:04:06 +01:00
Jason HU
15b2e94cca CFG: Prevent CFG orignating messages during SYNC
During SYNC, corosync-cfgtool -R/-H commands can pass through IPC then
send totem messages. This may corrupts
assembly_list_inuse/assembly_list_free if those messages are recedived
after SYNC is done.

The solution is marking related CFG APIs as
CS_LIB_FLOW_CONTROL_REQUIRED.

Signed-off-by: Jason HU <huzhijiang@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-07-02 16:49:38 +02:00
Christine Caulfield
b9f5c290b7 votequorum: Fix auto_tie_breaker behaviour in odd-sized clusters
auto_tie_breaker can behave incorrectly in the case of a cluster
with an odd number of nodes. It's possible for a partition to
have quorum while the other side has the ATB node, and both will
continue working. (Of course in a properly configured cluster one side
will be fenced but that becomes an indeterminate race .. just what ATB
is supposed to avoid).

This patch prevents ATB from running in a partition if the 'other'
partition might have quorum, and also mandates the use of wait_for_all
in clusters with an odd number of nodes so that a quorate partition
cannot start services or fence an existing partition with the tie
breaker node.

Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-06-18 09:57:59 +01:00
Christine Caulfield
ab8942f626 totemsrp: Improve logging of left/down nodes
This patch from Hideo Yamauchi improves the logging of
whether nodes leave the cluster cleanly or uncleanly,
making it easier to determine if a node ws shut down
by the operator. There is also the possibility that a
LEAVE message could get missed (due to the node being
in flush state) so this can also make that clearer.

The modifications are as follows.

Change 1) I added the list which maintained LEAVE node to totemsrp.
Change 2) I added registration, a search, the handling of to clear LEAVE
node.
Change 3) I added the output to log.
Change 4) I changed an output level of the log.

Signed-off-by: Hideo Yamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-06-12 16:16:45 +01:00
Christine Caulfield
53f67a2a79 totem: Log a message if JOIN or LEAVE message is ignored
As per recent email thread, this patch adds a log message if a JOIN or
LEAVE message is discarded while corosync is flushing the receive queue.

While ignoring a JOIN message is harmless (it will be resent), ignoring
a LEAVE message can cause a longer state transition as it is treated as
a node crashing rather than leaving gracefully, so the system admin
might be confused as to the cause.

Unfortunately, we can't (at the totemudp level) distinguish between JOIN
or LEAVE messages without a lot more protocol-specific code creeping in
the lower layer so the message is left ambiguous.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2015-04-17 15:49:53 +01:00
Christine Caulfield
997074cc3e totemconfig: Check for duplicate nodeids
Having duplicate nodeids in corosync.conf can play havoc with a cluster,
so (as suggested by someone on this list) here is some code to check
that all nodeids are unique. Even if a nodeid is not specified it will
check to be sure that the ID generated from the IP address (ipv4 only)
does not clash with one that is provided.

It logs all non-unique nodeids to syslog, but only the last is reported
on the command-line to the user which should be enough to get them to
check further. At startup this will cause corosync to fail to start.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2015-04-10 14:22:07 +01:00
Christine Caulfield
82526d2fe9 quorum: don't allow quorum_trackstart to be called twice
If quorum_trackstart() or votequorum_trackstart() are called twice with
CS_TRACK_CHANGES then the client gets added twice to the notifications
list effectively corrupting it. Users have reported segfaults in
corosync when they did this (by mistake!).

As there's already a tracking_enabled flag in the private-data, we check
that before adding to the list again and return an error if
the process is already registered.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-03-16 11:37:52 +00:00
Christine Caulfield
8cc8e51363 cpg: Add support for messages larger than 1Mb
If a cpg client sends a message larger than 1Mb (actually slightly
less to allow for internal buffers) cpg will now fragment that into
several corosync messages before sending it around the ring.

cpg_mcast_joined() can now return CS_ERR_INTERRUPT which means that the
cpg membership was disrupted during the send operation and the message
needs to be resent.

The new API call cpg_max_atomic_msgsize_get() returns the maximum size
of a message that will not be fragmented internally.

New test program cpghum was written to stress test this functionality,
it checks message integrity and order of receipt.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-03-05 16:45:15 +00:00
Andrey N. Groshev
5d9acc5604 totemsrp: Format member list log as unsigned int
Signed-off-by: Andrey N. Groshev <greenx@yandex.ru>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-03-05 16:34:07 +01:00
Christine Caulfield
c832ade034 Don't allow both two_node and auto_tie_breaker in corosync.conf
The two_node and auto_tie_breaker options are incompatible as they
specify conflicting methods of determining the quorate half of a cluster
partition.

This patch detects this error in corosync.conf, issues a message and
disables two_node if auto_tie_breaker is present.

Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-03-02 15:50:21 +00:00
Christine Caulfield
314a01c98e Votequorum: Fix auto_tie_breaker default
The default for auto_tie_breaker should be 'lowest' - which is what it
was before the extended ATB functionality of auto_tie_breaker_node was
added, and what the documentation states.

However this was broken so that if auto_tie_breaker_node was not
specified then auto_tie_breaker itself was ignored. This patch fixes
that.

It also fixes a typo in a comment.

Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-03-02 15:48:01 +00:00
Jan Friesse
d77cec24d0 Handle adding and removing UDPU members atomically
When config file is reloaded with removed UDPU member, internal icmap
index of nodelist.node can change. This can result in removal and then
adding back node. This, with UDPU alive filtering (where member is by
default considered as not a member) makes corosync not sending messages
to such members resulting in new membership creation.

Solution is to properly test which members were really deleted and added
(instead of relying on internal and dynamic naming of icmap hash table
key name).

Also trully dynamic add and remove node (via cmap) is now handled by
same function so totem_config->interfaces is now updated properly.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2015-01-21 16:37:26 +01:00
Jan Friesse
252b38ab8a corosync_ring_id_store: Use safer permissions
corosync_ring_id_store should use same (safer) permissions as
corosync_ring_id_create_or_load for (eventually) newly created ringid
file.

Credit to Sjerek for finding this problem.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2015-01-20 11:21:05 +01:00
Jason
4ee84c51fa totem: Ignore duplicated commit tokens in recovery
In active rrp mode, commit tokens are treated as mcast data messages,
thus, rrp directly delivers them to srp layer by active_mcast_recv().
This will result in duplicated commit tokens being received by srp from
different heartbeat links. If node is in recovery state and has already
sent out the initial orf token, those duplicated commit tokens will
cause message_handler_memb_commit_token() to send initial orf token
again! This is wrong because it resets the orf token content in
instance->orf_token_retransmit, which breaks the token retransmission
state.

Furthermore, by sending those initial orf tokens again and again,
it may lead active_token_recv() to drop some subsequent orf tokens.
It is OK for rrp because srp will do token retransmission,
but as said above, srp retransmission state has already been broken,
so finally we meet a "token lost in recovery state" condition caused
by software. If token timeout value is large, then it will takes long
time to create a new ring.

This can be reproduced by having two noded set to active rrp mode, with
two heartbeat links. Then with one node always on, let the other one do
stop/start again and again. It has a low probability to reproduce.
In theory, I think, the more heartbeat links used, the more easily it
can be reproduced.

This problem can be resolved by letting
message_handler_memb_commit_token() to ignore duplicated commit tokens
in recovery state if node (the ring representation) has already sent
out the initial orf token.

Different from prev take, this version do not depends on stored token
data but uses originated_orf_token in totemsrp_instance to remember
if initial orf token has been already originated for current membership.

Signed-off-by: Jason <huzhijiang@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2015-01-15 17:33:04 +01:00
Jan Friesse
e0ac861efd Log auto-recovery of ring only once
Make sure to log auto-recovery of ring only once. Every
MESSAGE_TYPE_RING_TEST_ACTIVATE receive is logged, but with lower
priority and more detailed information.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2015-01-14 18:13:29 +01:00
Jan Friesse
177ef0e524 Set RR priority by default
Experience with larger production clusters showed that setting RR
priority for corosync is viable for prevent random fencing, ...

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2015-01-05 15:01:49 +01:00
Jason
8f284b26b3 Reset timer_problem_decrementer on fault
After a heartbeat link's FAULTY and its auto re-enable,
active_instance->timer_problem_decrementer did not reset to zero. So in
the next timer_function_active_token_expired() round,
active_timer_problem_decrementer_start() will not be called. This will
result in that the active_instance->counter_problems of this link can
not be decreased any more. Cause rrp lose the ability to tolerate
network fluctuation.

This problem can be reproduced by the following sequence:
1) Set RRP in active mode, configure at least 2 heartbeat links.
2) Unplug one link till corosync-cfgtool -s shows it is FAULTY.
3) Re-plug this link then corosync-cfgtool -s shows it is active with
no faults.
4) Unplug this link again but quicky re-plug it before it becomes
FAULTY.
5) Finally, you can see corosync-cfgtool -s shows it is in
"Incrementing problem counter" state despite it currently is physically
healthy.

It can be solved by not forget to reset timer_problem_decrementer to
zero in active_timer_problem_decrementer_cancel().

Signed-off-by: Jason <huzhijiang@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-12-08 16:26:28 +01:00
Jan Friesse
6449bea835 config: Ensure mcast address/port differs for rrp
When using multiple interfaces, it's necessary to use different
multicast address/port pair for each interface to make
rrp work correctly. This is now checked in parser.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-11-24 11:55:37 +01:00
Jan Friesse
70bd35fc06 config: Process broadcast option consistently
Broadcast option is global but in config set in interface section. When
more interfaces are defined, only broadcast from last section was used.

Solution is to use broadcast whenever at least one interface use
broadcast.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-11-24 11:55:37 +01:00
Jan Friesse
6c028d4d9c config: Make sure user doesn't mix IPv6 and IPv4
Checking code was there, sadly not correct, so it was possible to enter
one bindnet addr as IPv4 and second as IPv6. Fix is trivial.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-11-24 11:55:37 +01:00
Jan Friesse
bb52fc2774 Store configuration values used by totem to cmap
Some totem configuration values (like token, consensus, ...) are ether
computed or default value is used. It's hard to find out, what
value is really used.

Solution is to store values in cmap.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-10-13 11:59:06 +02:00
Jan Friesse
03f95ddaa1 Adjust MTU for IPv6 correctly
MTU for IPv6 is 20 bytes larger then IPv4. This fact was not taken into
account so IPv6 packets were larger then MTU resulting in fragmentation.

Solution is to substract correct IP header size.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-10-01 14:20:21 +02:00
Fabio M. Di Nitto
239e239782 [crypto] fix crypto block rounding/padding calculation
libnss is "weird" in this respect as some block sizes are hardcoded,
others need to be determined dynamically.

For AES we need to use the values we know since GetBlockSize would
return errors, for 3des (that hopefully nobody is using) the value
returned by GetBlockSize is 8, but let's use the call into libnss
to avoid possible conflicts with distro patching or older versions.

Now, given the correct block size, the old calculation simply added
block size to the hdr_size. This is not sufficient.

We use _PAD encryption methods and we need to take that into account.

_PAD is calculated given the current input buf len and rounded up
to block size boundary, then block_size is added.

Ideally we would do that on a per packet base but current transport
infrastructure doesn't allow it yet.

So round up the hdr_size to double the block_size reported by the
cipher.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-09-06 07:11:56 +02:00
Jan Friesse
2429481b96 totemudpu: Send msgs to all members occasionally
To follow spec it's needed to send messages to all nodes (not only
active members) from time to time to detect merge.

This is needed in situations when totemsrp merge timer isn't running
(because there is enough messages sent by processors) to detect merge.

Example scenario:
- 3 nodes, all of them running cpgverify
- One node is isolated (iptables for example)
- Node is un-isolated

Without this commit, node will not merge as long as the cpgverify is
running.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-08-26 15:36:07 +02:00
Jan Friesse
71f1b99649 totemudpu: Implement member_set_active
Member active is used for sending "multicast" messages only to members
of ring. This reduces network load if some nodes are intentionally down.
Only regular multicast message load is reduced (messages sent by
totemudpu_mcast_noflush_send), because special messages (like hold
cancel, join message, ...) still have to be send to all members to
ensure correct behavior.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-08-26 15:36:05 +02:00
Jan Friesse
371a99e961 totemrrp: Implement *_membership_changed
All *_membership_changed calls totemnet_member_set_active passing 1 as
active parameter for joined nodes and 0 for left nodes.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-08-26 15:36:02 +02:00
Jan Friesse
4c717942cf totemnet: Add totemnet_member_set_active
totemnet_member_set_active together with transport specific
member_set_active makes possible for totemnet (and more interestingly
transport) to be informed about membership changes.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-08-26 15:35:59 +02:00
Jan Friesse
acb55cdb03 totem: Inform RRP about membership changes
Services are informed about membership changes, but if same information
is needed inside totemrrp or totemnet, it's impossible to gather this
information.

Patch makes this possible for now only for RRP with empty callbacks.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-08-26 15:35:56 +02:00
Christine Caulfield
02f58aec9c YKD: Fix loading of YKD quorum module
Although YKD is currently unsupported, untested and decprecated it's
handy for testing things in the quorum module.

This patch allows YKD to actually load without an error. It does not fix
anything else in the service!

Also remove vsftype and its reference to YKD being the preferred and
default provider from the corosync.conf man page,
as that hasn't been true for a considerable time.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-08-18 09:33:59 +01:00
Christine Caulfield
cbf753405b votequorum: Add cmap key to reset wait_for_all
It's possible in a two_node cluster (and others but it's more likely
with just two) that a node could be booted up after downtime or failure
and the other node is not available for some reason. In this case it
would not be allowed to proceed because wait_for_all is enforced.

This patch provides a cmap key to clear this flag in the desperate
situation where that becomes necessary. It should only be used with
extreme caution and will be wrapped up in pcs which should also check
that fencing has been run.

Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by:  Jan Friesse <jfriesse@redhat.com>
2014-08-12 16:02:46 +01:00
Jason HU
f135b68096 Cancel token holding while in retransmition
When there is no other activty on ring but only retransmition, and
token is in hold mode, the retransmition will become slow. More over,
if the retransmition is always fail but token rotation works well, then
it takes quite a lone time
(fail_to_recv_const * token_hold = 2500 * 180ms = 450sec) for the
retransmit requester to meet the "FAILED TO RECEIVE" condition to
re-construct a new ring.

This problem can be solved by checking if retransmits are present
before going into hold. If a node is the retransmit requester or
the resender, it set my_token_held to 0 to speed up retransmition
and omit further unnecessary sending of token_hold_cancel signal.

Signed-off-by: Jason HU <huzhijiang@gmail.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-08-12 09:28:04 +02:00
Jan Friesse
17488909d4 votequorum: Make qdev timeout in sync configurable
Configuration option quorum.device.sync_timeout is available for setting
qdevice poll timeout for synchronization phase. Default value is 30
sec.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-08-05 17:22:52 +02:00
Jan Friesse
b4c9934635 votequorum: Block sync until qdevice poll
If qdevice is registered a alive, corosync waits in sync phase until
timeout expires or qdevice votes with correct nodeid parameter.

This gives qdevice time to decide to vote or not undisturbed and without
time hazard.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-08-05 17:22:47 +02:00
Jan Friesse
7cad804629 ipc: Process votequorum messages during sync
This is needed for qdevice to be able to process messages during
synchronization phase.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-08-05 17:22:44 +02:00
Jan Friesse
b8902464d1 votequorum: Add ring id to poll call
If votequorum service receives incorrect (not current) ringid, call is
ignored and CS_ERR_MESSAGE_ERROR is returned.

This and previous commits makes incompatible changes in votequorum
API/ABI, so library version is increased.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-08-05 17:22:41 +02:00
Jan Friesse
5f6f68805c votequorum: Return current ring id in callback
Returning ring id will be used in poll function.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-08-05 17:22:37 +02:00
Christine Caulfield
88dbb9f722 totemconfig: Make sure join timeout is less than consensus
The thesis contains this paragraph:

" The Join timeout is shorter than the Consensus timeout and is used to
  increase the probability that Join messages from all currently
  working processors are received during a single round of consensus."

Empirically I can confirm that making join less than consensus can cause
havoc with a cluster so I think we should enforce this.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-07-25 08:24:02 +01:00
Christine Caulfield
3b8365e806 config: Fix typos
Fix several places where 'then' is used instead of 'than' in error
messages and a comment.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-07-24 10:27:45 +01:00
Jan Friesse
63bf09776f totemconfig: refactor nodelist_to_interface func
Move finding of bindaddr in nodelist to generally usable function
totem_config_find_local_addr_in_nodelist and refactor
config_convert_nodelist_to_interface function to use it.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2014-07-22 14:59:31 +02:00
Jan Friesse
10c80f454e totemconfig: totem_config_get_ip_version
Add totem_config_get_ip_version to get user configured ip version.
Make totem_config_read use this newly introduced function.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2014-07-22 14:59:27 +02:00
Jan Friesse
dc35bfae62 totemconfig: Free ifaddrs list
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2014-07-22 14:59:20 +02:00
Fabio M. Di Nitto
84b9e5989a be consistent in using CPPFLAGS vs CFLAGS
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-07-21 08:47:21 +02:00
Vladislav Bogdanov
e3ffd4fedc Implement config file testing mode
Signed-off-by: Vladislav Bogdanov <bubble@hoster-ok.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-07-16 16:10:32 +02:00
Jan Friesse
dfaca4b10a Fix compiler warning introduced by previous patch
QB loop signal handler prototype differs from signal(2) prototype.
Solution is to create wrapper functions.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2014-07-09 15:57:35 +02:00
zouyu
384760cb67 Handle SIGSEGV and SIGABRT signals
SIGSEGV and SIGABRT signals are now correctly handled (blackbox is
dumped and logsys is finalized).

Signed-off-by: zouyu <hopkings2005@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-07-03 15:13:48 +02:00
zouyu
cc80c8567d fix memory leak produced by 'corosync -v'
Signed-off-by: zouyu <hopkings2005@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-07-03 14:54:05 +02:00
Jan Friesse
72cf15af27 votequorum: Do not process events during reload
During reload, local_node_pos is deleted and reinstation is handled in
totemconfig after reload is finished. votequorum handles this events and
tries to reload it's configuration. This led to logging a little scary
messages (even nothing bad is happening, because after local_node_pos
reinstation everything back to normal).

Solution is to stop processing events during reload. Sadly, simple
tracking of config.reload_in_progress doesn't work because LibQB events
triggering order is undefined so votequorum reload handler can be called
before totemconfig (and before local_node_pos is reinstatied).

So new config.totemconfig_reload_in_progress key is defined with very
similar semanthic as config.reload_in_progress but set inside
totem_reload_notify function. Votequorum then use this new key.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-06-27 11:40:21 +02:00
Jan Friesse
c8e3f14fdb Make config.reload_in_progress key read only
It's not very good idea to allow user apps changing internal key
reload_in_progress.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-06-27 11:40:18 +02:00
Jan Friesse
4e9716ed30 coroparse: More strict numbers parsing
Previous safe_atoi didn't check range of input values so if for example
user used -1 s token timeout, it was converted to UINT32_MAX without
letting user know.

Another safe_atoi problem was using strtol. This works pretty well on
64-bit systems, where long integer is usually 64-bits long, sadly on
32-bit systems, it is usually 32-bit long. And because strtol returns
signed integer, it was not possible to enter 32-bit value with highest
bit set.

Solution is to use strtoll which is guaranteed to be at least 64-bits
long and check value range.

Also error message now contains also information about expected value
range.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-06-12 14:49:00 +02:00
Jan Friesse
da46ecfc30 Move ringid store and load from totem library
Functions for storing and loading ring id was in the totem library. This
causes problem, what to do when it's impossible to load or store ring
id. Easy solution seemed to be assert, but sadly this makes hard for
user to find out what happened (because corosync was just aborted and
logsys didn't flush)

Solution is to move these functions to main.c, where is much easier to
handle error. This also makes libtotem free of any file system
operations.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-06-02 14:54:57 +02:00
Jan Friesse
d310b251c3 Introduce get_run_dir function
Run dir (LOCALSTATEDIR/lib/corosync) was hardcoded thru whole codebase.
Totemsrp was trying to create and chdir into it, but also
takes into account environment variable COROSYNC_RUN_DIR creating
inconsistency.

get_run_dir correctly returns COROSYNC_RUN_DIR (when set) or
LOCALSTATEDIR/lib/corosync. This is now used by all functions instead of
hardcoded string.

All occurrences of mkdir/chdir are removed from totemsrp and chdir is
now called in main function. Mkdir call is completely removed, because
it was not used anyway (check in main.c was called before totemsrp init,
so mkdir was never called) and also make install and/or package system
should take care of creating this directory with correct
permissions/context.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-06-02 14:53:18 +02:00
Jan Friesse
8f13a98320 logsys: Log warning if flightrecorder init fails
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-06-02 14:36:10 +02:00
Jan Friesse
19c5b63ff5 logsys: Log error if blackbox cannot be created
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-06-02 14:36:08 +02:00
Jan Friesse
e905f92bf5 totemiba: Fix incorrect failed log message
rdma_join_multicast failed ... message parameters was swapped.

Also information about multicast join is now logged as notice.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2014-05-15 15:28:51 +02:00
Yevheniy Demchenko
4d6a18d8a5 totemiba: Add multicast recovery
Totemiba wasn't able to survive SubnetManager handover or
restart. If SM was migrated to another node, corosync logged
"multicast error" and losses connectivity.

Commit should solve this situation.

Signed-off-by: Yevheniy Demchenko <zheka@uvt.cz>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-05-14 14:51:07 +02:00
hfu
d0dc9ae93c Indent: Remove newline before else branch start
Signed-off-by: hfu <askfuhu@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-05-09 11:38:02 +02:00
hfu
b6e2c8024d Indent: Remove space in negation of expression
Signed-off-by: hfu <askfuhu@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-05-09 11:37:47 +02:00
Jan Friesse
7557fdec48 config: Allow dynamic change of token_coefficient
token_coefficient change in cmap didn't triggered change. So only way
how to change token_coefficient was editing config file and reload.

Patch let's key totem.token_coefficient to be processed so
token_coefficient can be dynamically changed.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-05-07 15:55:26 +02:00
Jan Friesse
58176d6779 Add token_coefficient option
Token coefficient is used only when nodelist is specified and contains
at least 3 nodes. If so, real token timeout is then computed as
token + (number_of_nodes - 2) * token_coefficient. This allows cluster
to scale without manually changing token timeout every time new
node is added. This value can be set to 0 resulting in effective
removal of this feature.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-03-25 15:29:17 +01:00
Jan Friesse
9a8de87c34 totemconfig: Log errors on key change and reload
When volatile key was changed (cmap set or reload) and checks fails,
nothing was logged.

Values are now checked and error string is logged on problems.

Also totem_config is dumped to log (DEBUG level) after every
volatile key change and every reload.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-03-25 15:29:14 +01:00
Jan Friesse
b95ebd640e totemconfig: Key change process dependencies
When key with dependency was changed, dependant keys were not recomputed.
Nice example is consensus timeout. If token timout was changed,
consensus timeout was not recomputed correctly (nether via cmap change
of key nor via cfg reload).

Solution is almost complete refactor of handling volatile defaults.

totem_volatile_config_read now handles not only storing cmap key to
totem_config structure, but also checking of existence, comparing with
zero value and properly storing defaults.

totem_set_volatile_defaults is gone. It's function was splitted into
totem_volatile_config_read and totem_volatile_config_validate functions.

Reload callback and change of key callback are now mostly same functions
and both calls totem_volatile_config_read.

Patch also fixes small memory leak. totem.vsftype key is not used for
long time and original totem_volatile_config_read wasn't freeing
allocated memory returned by icmap_get_string. Whole reading of
totem.vsftype is removed.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-03-25 15:29:12 +01:00
Jan Friesse
eeb2384157 Really clear totemconfig nodes on reload
When reload was called nodes were constantly added to totemconfig
nodelist.

So simple corosync-cfgtool -R resulted very quickly in filling whole
array and segfault.

Solution is to clear member_count.

Clearing is also moved directly to put_nodelist_members_to_config to
make sure it's always processed.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-03-25 15:29:09 +01:00
Jan Friesse
1b6abcc7d5 Log: Make reload of logging work
When reload was called multiple times (~20), logging to file stopped
working.

Main problem was hidden in the fact, that log file was opened multiple
times, because even target_id was shared via subsystem loggers, file
name was not.

Solution is to ALWAYS set proper log file name into subsystem logger
(copy is stored). This will not only fix problem but also removes small
leak.

Also if filename didn't changed, function can return sooner.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-03-25 15:13:33 +01:00
Jan Friesse
2f0cad20a9 config: Handle totem_set_volatile_defaults errors
When totem_set_volatile_defaults is called from totem_config_validate
return code is unchecked.

It's then perfectly possible to set (for example) join timeout to very
small value (1) and consensus value is then set to 0 making corosync
unable to create membership.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-03-17 10:04:00 +01:00
Jan Friesse
e1801ba497 votequorum: Properly initialize atb and atb_string
icmap_get_* behavior is to NOT modify passed variable when it doesn't
success. So we must initialize variable before icmap_get_* call.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2014-02-26 16:59:02 +01:00
Jan Friesse
ff67daa55f mon: Make monitoring work
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-02-25 14:57:20 +01:00
Jan Friesse
099f704cdd mon: Pass correct pointer to inst
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-02-25 14:57:16 +01:00
Jan Friesse
57ff693b70 mon: Fix comparsion typo
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-02-25 14:57:13 +01:00
Jan Friesse
e1e2390b61 mon: Make mon compilable with libstatgrab ver 0.9
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-02-25 14:57:10 +01:00
Jan Friesse
fbe8768f1b cpg: Make sure left nodes are really removed
When node is paused and other nodes has in meantime exited cpg process,
paused node after resume doesn't update it's membership correctly so on
previously paused node exited cpg process is still visible.

Solution is to compare join list with cpd and remove all pids which are
not included in join list.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-02-19 10:59:14 +01:00
Jan Friesse
83c63b247f cpg: Make sure nodid is always logged as hex num
Also number is prefixed by 0x so it's easier to spot that number is
hexadecimal.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-02-19 10:59:10 +01:00
Jan Friesse
fcf26e0303 cpg: Refactor mh_req_exec_cpg_procleave
Most of functionality is moved to do_proc_leave function to make it
reusable.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-02-19 10:59:05 +01:00
Jan Friesse
38c04d9a66 totemsrp: Fix typo with cont gather
Patch f3ffd3da5c introduced named states
of state-machine, but sadly contains logical problem causing
stats.continuous_gather increasing even when it shouldn't. Problem is
not critical, because continuous_gather is set to 0 on successful
membership creation.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-02-18 16:12:57 +01:00
Christine Caulfield
90d448af3b votequorum: Add extended options to auto_tie_breaker
This patch adds more flexibility to the auto_tie_breaker feature of
votequorum. With this, not only can the lowest nodeid be used as
a tie breaker, but also the highest, or a node from a nominated list.

If there is a list of nodes, the first node in the list that was not
part of the previous partition is used. This allows the user to
specify a preferred set of nodes but prevents a split-brain if the
cluster divides evenly with a node in each half.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-02-17 16:29:45 +00:00
Masatake YAMATO
fa71067a93 Free object allocated at quorum_register_callback
Memory object allocated with malloc at quorum_register_callback
is not freed. The object is linked to internal_trackers_list.

The object is unlinked at quorum_unregister_callback. However,
it is not freed at the function.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-01-23 17:18:44 +01:00