3821 Commits

Author SHA1 Message Date
Ferenc Wágner
151ed9dfe5 wd: remove extra capitalization typo
Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-09-12 14:23:04 +02:00
Ferenc Wágner
b510a0f45c corosync.conf.5: add warning about slow watchdogs
Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-09-12 14:22:48 +02:00
Jonathan Davies
3296a0d41a totemknet: fix debug message typo
Signed-off-by: Jonathan Davies <jonathan.davies@citrix.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-09-11 11:51:16 +02:00
Ferenc Wágner
4b21de562c corosync.conf.5: Fix watchdog documentation
Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-09-11 11:40:49 +02:00
Ferenc Wágner
0f33464531 wd: fix typo
Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-09-11 11:40:12 +02:00
Khem Raj
1a2c72a80e Include fcntl.h for F_* and O_* defines
Fixes errors like
utils.c:95:22: error: use of undeclared identifier 'O_WRONLY'

Signed-off-by: Khem Raj <raj.khem@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-08-31 07:46:34 +02:00
Christine Caulfield
ed235edfe3 stats: add knet 'handle' stats
knet handle stats show compression and crypto statistics. With these
you can see the effectiveness of compression and the overheads of both
crypto and compression.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-08-23 14:18:59 +02:00
Christine Caulfield
01495f650c main: use syslog & printf directly for early log messages
libqb seems funny about logging things before it's fully configured.
This corosync commit didn't help either:
8b6bd86a55

So to make sure that messages about the config file not being opened
get delivered to the user/syslog we send them directly.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-08-22 09:51:09 +01:00
Christine Caulfield
9898fc8760 totempg: Allow space for incoming overflow
totempg needs to store the current message + any
overflow for the next message which can be up to (nearly) the MTU size.
In knet that's large, but for UDP it's just 1500.

The reason we've never seen it before is because the actual max message
size is 1024 less than 1MB and after all the headers are stripped out the overflow is
usually 1024 bytes or less.
The 1024*1024 size of the assembly buffer is large enough to hold a max message (1047552) +
1024 bytes of a new UDP message. So we never saw any problems.
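
The arithmetic above can be checked with a short sketch. The constant and function names are illustrative assumptions, not identifiers from the corosync source; the numbers are the ones quoted in the message.

```python
# Illustrative names only; the values come from the commit message above.
ASSEMBLY_BUFFER_SIZE = 1024 * 1024               # 1 MB assembly buffer
MAX_MESSAGE_SIZE = ASSEMBLY_BUFFER_SIZE - 1024   # "1024 less than 1MB"
UDP_MTU = 1500                                   # overflow can be nearly this large


def fits(overflow_bytes: int) -> bool:
    """Return True if a max-size message plus the overflow fits the buffer."""
    return MAX_MESSAGE_SIZE + overflow_bytes <= ASSEMBLY_BUFFER_SIZE


print(MAX_MESSAGE_SIZE)  # 1047552
print(fits(1024))        # True: the usual case, so no problem was seen
print(fits(UDP_MTU))     # False: a near-MTU overflow needs extra space
```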

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-08-14 14:04:31 +01:00
Chrissie Caulfield
93d48c8cc6 cpghum: Add options to change flood start/mult/end sizes (#237)
I ran out of sensible short options for cpghum so added some long
ones to cope with them.

Also added is the ability to specify most size values in a sensible format
eg 64M for 64 Megabytes or 48K for 48 Kilobytes.

Strictly those are MiB and KiB of course, but I'm old-fashioned.
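
A minimal sketch of this kind of size parsing, assuming only the K and M suffixes named above with binary multiples. The function name parse_size and the acceptance of lowercase suffixes are assumptions for illustration; cpghum's actual parser may differ.

```python
def parse_size(text: str) -> int:
    """Parse a size such as '64M' or '48K' into bytes (binary multiples)."""
    multipliers = {"K": 1024, "M": 1024 * 1024}
    text = text.strip()
    if not text:
        raise ValueError("empty size")
    suffix = text[-1].upper()
    if suffix in multipliers:
        # int() raises ValueError if the numeric part is missing or malformed
        return int(text[:-1]) * multipliers[suffix]
    return int(text)  # plain byte count, no suffix


print(parse_size("64M"))  # 67108864
print(parse_size("48K"))  # 49152
```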

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-08-11 15:28:02 +01:00
Chrissie Caulfield
f4a7e54d45 totemknet: Use knet's LOOPBACK transport (#236)
knet now has a built-in LOOPBACK transport so use that
rather than special-casing it for ourself.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-08-04 12:59:16 +01:00
Christine Caulfield
9da89f32c2 CFG: Remove ring-reenable code
RRP doesn't exist any more so all the ring re-enable code is redundant.

I've removed it from the library and all the code that does anything,
but I've left the hole in the IPC just in case old libraries are
hanging around.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-08-03 14:32:02 +02:00
Jan Friesse
9a50628fd1 main: Add support for libcgroup
When corosync is started in an environment where it ends up in a cgroup
without a properly set rt_runtime_us, it's impossible to get RT priority.

The already implemented workaround is to use a higher non-RT priority.

This patch implements another solution. It moves corosync into the root cpu
cgroup, which hopefully has enough RT budget.

Another solution was mentioned on the ML
https://lists.freedesktop.org/archives/systemd-devel/2017-July/039353.html
but it requires generating some "random" values.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
(cherry picked from commit c56086c701)
2017-08-01 14:32:53 +02:00
Christine Caulfield
55c3dcb76d stats: Add map with on-demand statistics
Icmap is factored out so it's possible to add other
maps for cmap. An API call to switch maps from the application
end is added.

corosync-cmapctl is enhanced with a -m option.

Stats contains all statistics previously found under the runtime.connections,
runtime.services and runtime.totem prefixes, together with new knet-related
ones. All stats are read-only.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-07-27 15:53:04 +02:00
Christine Caulfield
876910d8ff ipc: Check for the libraries sending invalid message IDs
If the library sent an invalid (ie too high) message ID to
corosync, then it could cause the daemon to crash.

Now we check the message ID before indexing the function array.
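
The idea can be sketched as a bounds check in front of a handler table. This is an illustration in Python rather than the daemon's C code; dispatch and the handler list are hypothetical names.

```python
from typing import Callable, Optional, Sequence

Handler = Callable[[bytes], str]


def dispatch(handlers: Sequence[Handler], msg_id: int, payload: bytes) -> Optional[str]:
    """Call handlers[msg_id] only when msg_id is a valid index, else return None."""
    # Reject IDs that are too high (or negative) instead of indexing blindly.
    if msg_id < 0 or msg_id >= len(handlers):
        return None
    return handlers[msg_id](payload)


handlers = [lambda p: "join", lambda p: "leave"]
print(dispatch(handlers, 1, b""))  # leave
print(dispatch(handlers, 7, b""))  # None
```

In C the equivalent unchecked access reads past the end of the array, which is how an invalid ID could crash the daemon.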

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-07-14 14:06:49 +01:00
Jan Friesse
9627d7350b main: Add option to set priority
Option -P takes numeric value with same meaning
as nice or values min / max, meaning maximal / minimal priority (so
minimal / maximal nice value).

Scheduler / priority setting is moved in code so it is now executed
after logsys is configured so errors are logged.

Setting maximal priority is also used as fallback when realtime
scheduling is requested and sched_setscheduler fails.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
(cherry picked from commit a008448efb)
2017-07-10 16:40:39 +02:00
Jan Friesse
2c17832fa6 totemknet: Prevent dead-loop in log_flush_messages
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2017-07-03 15:41:08 +02:00
Jan Friesse
84b37ef1ef corosync-keygen: Display number of needed bits
Instead of currently read bits, number of already read bits is
displayed to let the user know how long it's needed to "press keys"

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2017-07-03 13:19:13 +02:00
Jan Friesse
abc1fa5626 totemknet: Flush knet log messages
When initialization fails, knet logs messages into a pipe. Previously they
were never processed. The solution is to add log_flush_messages, which takes
care of calling log_deliver_fn.

A call to log_flush_messages is also added to totemknet_finalize, because
it removes the log pipe fd from qb_loop, so a similar problem can happen there.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2017-07-03 13:19:11 +02:00
Jan Friesse
0924b06811 corosync-keygen: Make less-secure default
/dev/urandom is good enough for crypto keys and it's not blocking. If
superb randomness is really needed, it's possible to use newly added
option -r.
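
As a rough illustration of the non-blocking source, the sketch below reads key material from /dev/urandom. The function name and the 128-byte length are arbitrary assumptions; the real key size used by corosync-keygen is not stated in this message.

```python
def read_urandom_key(length: int) -> bytes:
    """Read `length` random bytes from /dev/urandom (does not block for entropy)."""
    if length <= 0:
        raise ValueError("length must be positive")
    with open("/dev/urandom", "rb") as source:
        return source.read(length)


key = read_urandom_key(128)  # 128 is an arbitrary example length
print(len(key))
```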

The manpage is also reworked a bit to use .nf instead of many .br.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2017-07-03 13:19:09 +02:00
Jan Friesse
a67df8c553 corosync-keygen: Adapt to knet key sizes
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2017-07-03 13:19:07 +02:00
Jan Friesse
cf18736d52 totemconfig: Make crypto work again
Knet needs a longer key and supports various key lengths. Split
TOTEM_PRIVATE_KEY_LEN into TOTEM_PRIVATE_KEY_LEN_MIN and
TOTEM_PRIVATE_KEY_LEN_MAX (both using KNET_*_KEY_LEN).

Fix incorrect "Could only read..." message.

Make sure key is properly initialized/zeroed.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2017-07-03 13:19:02 +02:00
Christine Caulfield
b7df8fa46f knet: Compile with latest knet API
extra parameter added to knet_link_get_status()

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-06-29 10:02:21 +01:00
Jan Friesse
564b4bf7d4 totem: Propagate totem initialization failure
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2017-06-15 11:07:33 +02:00
Christine Caulfield
fa37b6073c totemknet: Use new knet_link_set_config() API
TC_PRIO_INTERACTIVE is now a link option in knet, so we have
to provide it at link config time.

This needs the latest knet git to compile as this is an updated API.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-06-09 13:28:46 +01:00
Michael Jones
afd97d7884 coroapi: Use size_t for private_data_size
Unsigned int and size_t represent two different concepts.

Same problem was present in ipc_glue.

Signed-off-by: Michael Jones <jonesmz@jonesmz.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-05-29 17:23:37 +02:00
Christine Caulfield
5b1df51aa6 votequorum: Report errors from votequorum_exec_send_reconfigure
If votequorum_exec_send_reconfigure() returns an error (ie the
packet could not be sent) then we should either return it to the
sender (for a library call) or, for an internal call, log it.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-05-26 16:18:33 +02:00
Christine Caulfield
efef3a90e3 cpghum: remove space after delimiter
machine-readable stats do not need extra spaces!

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-05-25 15:55:17 +02:00
Christine Caulfield
57c4086fff cpghum: Add interim RTT to cpghum
when -f is selected the interim stats show the RTTs for that
size of packet.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-05-25 15:54:48 +02:00
Michael Jones
f8053a3a4b configure: Enable C99 language standard
Also disable some obsolete warnings.

Signed-off-by: Michael Jones <jonesmz@jonesmz.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-05-23 15:35:39 +02:00
Jan Friesse
95b91e4ae7 main: Display reason why cluster cannot be formed
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2017-05-18 17:15:55 +02:00
Hideo Yamauchi
97696bb4f5 notifyd: Add the community name to an SNMP trap
Signed-off-by: Hideo Yamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-05-18 17:03:50 +02:00
Christine Caulfield
ce188ff3f4 cpghum: Add machine-readable output
and fix a few small counter bugs.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-05-15 14:19:38 +01:00
Chrissie Caulfield
0be72f03f3 test: Fold cpgbench into cpghum (#205)
* test: Fold cpgbench into cpghum

cpgbench and cpghum share a lot of code & concepts so it makes
sense to merge them into a single test program that can both
benchmark and sanity check CPG.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-05-11 08:51:34 +01:00
Christine Caulfield
571f499e0a knet: Allow space for encapsulated messages
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-05-09 09:05:12 +01:00
Andrew Price
86012ebb45 Main: Call mlockall after fork
Man page of mlockall is clear:
Memory locks are not inherited by a child created via fork(2) and are
automatically removed (unlocked) during an execve(2) or when the
process terminates.

So calling mlockall before corosync_tty_detach is a no-op when corosync is
executed as a daemon (corosync -f was not affected).

This regression is caused by ed7d054e55
(setprio for logsys/qblog was correct, mlockall was not).

The solution is to move the corosync_mlockall call to the correct place.

Signed-off-by: Andrew Price <anprice@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-04-25 14:50:04 +02:00
Michael Schwarz
4ebb7ad07d Fix typos in README.recovery
Signed-off-by: Michael Schwarz <michi.schwarz@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-04-21 17:42:06 +02:00
Bin Liu
c83e6c7ed9 coroparse: Use readdir instead of readdir_r
readdir_r is deprecated in glibc 2.24 in favor of readdir (which became
thread safe). Also, because corosync never calls read_uidgid_files_into_icmap
in multiple threads, no problem should appear even with a libc where
readdir is not thread-safe.

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-04-20 08:53:54 +02:00
Bin Liu
725f9039e9 totemknet: Handle logpipe creation failure
Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-04-20 08:53:49 +02:00
Bin Liu
be3e166249 wd: Report error when close of wd fails
Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-04-20 08:53:45 +02:00
Bin Liu
28f40c7fe0 Qnetd lms: Use UTILS_PRI_RING_ID printf format str
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-04-20 08:53:41 +02:00
Bin Liu
c144dba622 cpghum: Fix printf of size_t variable
Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-04-20 08:53:33 +02:00
Christine Caulfield
44afff227d totemknet: Go back to recvmsg() from recvmmsg()
The kernel team have recommended that we not use recvmmsg and as it
confers no particular speed advantage (especially given the extra
memory consumption) I'm going back to single message recvmsg() again.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-04-11 13:44:08 +01:00
Bin Liu
0462b5e609 totemconfig: Prefer nodelist over bindnetaddr
In a two-node cluster, I have one node configured with Open vSwitch:
5: br-fixed: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
state UNKNOWN group default
inet 192.168.124.88/24 scope global br-fixed
inet 192.168.124.87/24 scope global secondary br-fixed
inet 192.168.124.83/24 brd 192.168.124.255 scope global secondary
tentative br-fixed
inet 192.168.124.89/24 scope global secondary br-fixed

I use 192.168.124.83 in the node list of corosync.conf with udpu, and
the bind_addr is 192.168.124.0. After upgrading corosync on this node,
it uses 192.168.124.88 instead of 192.168.124.83, as we can see:

corosync-cfgtool -s
Printing ring status.
Local node ID 1084783704

corosync-quorumtool -s
Membership information:
Nodeid Votes Name
1084783697 1 d52-54-77-77-01-02
1084783699 1 d52-54-77-77-01-01 (local)

while the other node can only see itself:
corosync-cfgtool -s
Printing ring status.
Local node ID 1084783697
RING ID 0
id = 192.168.124.81
status = ring 0 active with no faults

corosync-quorumtool -s
Membership information:
Nodeid Votes Name
1084783697 1 d52-54-77-77-01-02.virtual.cloud.suse.de (local)

This patch checks whether both nodelist and bindnetaddr are present and,
if so, displays a warning and uses the nodelist information.
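
The precedence rule can be sketched as follows. The dict layout and the function name choose_address_source are hypothetical and only illustrate the decision; they are not how totemconfig stores its configuration.

```python
import warnings


def choose_address_source(config: dict) -> str:
    """Return which setting decides the local address: 'nodelist' or 'bindnetaddr'."""
    has_nodelist = bool(config.get("nodelist"))
    has_bindnetaddr = "bindnetaddr" in config
    if has_nodelist and has_bindnetaddr:
        # Both are present: warn and prefer the explicit node list.
        warnings.warn("both nodelist and bindnetaddr are set; using nodelist")
        return "nodelist"
    if has_nodelist:
        return "nodelist"
    if has_bindnetaddr:
        return "bindnetaddr"
    raise ValueError("neither nodelist nor bindnetaddr is configured")


# Addresses taken from the report above.
config = {"bindnetaddr": "192.168.124.0", "nodelist": ["192.168.124.83"]}
print(choose_address_source(config))  # nodelist
```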

Signed-off-by: Bin Liu <bliu@suse.com>
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-04-11 11:19:31 +02:00
Christine Caulfield
6076e840f5 knet: Close libknet down cleanly at shutdown
By tidily shutting down knet in totemknet_finalize we
make sure all the links are cleanly taken down and,
more importantly for us, the corosync LEAVE message gets
sent so we don't get fenced on a clean exit.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-04-11 09:03:26 +01:00
Christine Caulfield
84b7be6d46 man: Document -a option to corosync-quorumtool
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-04-07 17:10:55 +02:00
Jan Friesse
eca52f679f cpghum test: Improve error codes
Return an error when an unknown option is found. Also return error code 2 if
a send/crc/length/sequence error happened. Finally, make sure abort
returns the same error code and not 999 (which is a nonsense code anyway).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2017-04-07 09:36:39 +02:00
Christine Caulfield
3f3d8ca553 quorumtool: Add option to show all node addresses
New -a option shows all of the names/IP addresses of nodes
in a multi-homed environment.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-04-04 10:08:33 +01:00
Christine Caulfield
e6d0f87f73 cpghum: Stop cpghum from reporting fake CRC errors
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2017-03-14 16:38:19 +00:00
Bin Liu
d2a5e1442e logconfig: Do not overwrite logger_subsys priority
logfile_priority and syslog_priority could be overwritten by
logging.logger_subsys.{logfile_priority|syslog_priority}, which could
lead to the following output (all at notice level):

corosync[21419]:   [QUORUM] Using quorum provider corosync_votequorum
corosync[21419]:   [QUORUM] Members[1]: 1084777643
corosync[21419]:   [QUORUM] This node is within the primary component
                   and will provide service.
corosync[21419]:   [QUORUM] Members[3]: 1084777563 1084777584 1084777643

even though syslog_priority is set to warning. This patch avoids the
overwrite.

Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-03-10 09:09:42 +01:00