mirror_corosync

mirror of https://git.proxmox.com/git/mirror_corosync synced 2025-10-09 23:26:16 +00:00

Author	SHA1	Message	Date
Jan Friesse	5731af2782	logging: Add CS_PRI_NODE_ID and CS_PRI_RING_ID Previously node id was logged either as a %d (most often), %u, %x or PRI.32 and ring id either as %lld, %llx with various separators (., :, /) between rep nodeid and seq. This seems to cause confusion. This patch adds macros CS_PRI_NODE_ID, CS_PRI_RING_ID and CS_PRI_RING_ID_SEQ (CS prefix = corosync, PRI modeled in spirit of inttypes.h PRIx32) and makes code use them. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2019-07-03 10:53:52 +02:00
Fabian Grünbichler	03fba21503	set totem.keyfile and totem.key to RO so that we get the nice log message when attempting to modify them at runtime, just like for totem.crypto_* and co. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2019-04-05 17:07:03 +02:00
Ferenc Wágner	13b070f0c8	Don't declare success early Here we're very far from entering the main loop, even farther from sending the READY notification to systemd. This sounded awkward: systemd[1]: Starting Corosync Cluster Engine... corosync[827]: [MAIN ] Corosync Cluster Engine ('2.99.5'): started and ready to provide service. corosync[827]: [MAIN ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf snmp pie relro bindnow corosync[827]: [MAIN ] parse error in config: No interfaces defined corosync[827]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1378. systemd[1]: corosync.service: Main process exited, code=exited, status=8/n/a systemd[1]: corosync.service: Failed with result 'exit-code'. systemd[1]: Failed to start Corosync Cluster Engine. Signed-off-by: Ferenc Wágner <wferi@debian.org> Reviewed-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2019-01-03 09:50:36 +01:00
Ferenc Wágner	ba24bef8bd	More natural error messages Signed-off-by: Ferenc Wágner <wferi@debian.org> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2019-01-03 09:48:51 +01:00
Jan Friesse	0ee7fd0c2f	main: Rename run_dir to state_dir system.run_dir was a little bit unfortunate and confusing name. Rename to state_dir makes more evident what is content of this directory. To keep setting consistent with code, get_run_dir is changed to get_state_dir. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-12-14 13:48:33 +01:00
Jan Friesse	a84ade701c	totemconfig: Enhance totem.ip_version Originally totem.ip_version was used to force ip version used by totem. With Knet this variable didn't make too much sense so it was not used. Sadly rely only on DNS resolver order doesn't always work (RFC is quite complicated, but if IPv6 is not configured then IPv4 is preferred), what we tried to solve by forcing IPv6 and only if that fails, use IPv4. Sadly this collides with nss_myhostname which is able to return every local address and today system usually have at least one autogenerated link-local IPv6 address so it is able to "overwrite" /etc/hosts. Solution is to enhance totem.ip_version and use it also for Knet. totem.ip_version is now just a flag for resolver and can have four states: ipv4 (only IPv4 is used), ipv6 (only IPv6 is used), ipv4-6 (ask IPv4 first and if it fails ask for IPv6) and ipv6-4 (ask IPv6 first and if it fails ask for IPv4). Default for Knet and UDPU transports is ipv6-4, for UDP it's ipv4, because autogenerated mcast addr doesn't play too well with ipv6-4. So everywhere where nss_myhostname becomes problem, it's just possible to set totem.ip_version to ipv4-6. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-12-14 10:56:06 +01:00
Jan Friesse	aa7daf8c77	totemip: Add debug information to totemip_parse It's required to create TOTEM logsys subsys before totemip_parse is used (so before totem_config_read). Logsys is not yet fully initialized, but it's good enough. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-12-13 15:25:04 +01:00
Christine Caulfield	e6be234565	config: Disallow corosync-cmapctl updates of nodelist It didn't work anyway (the config system requires whole links to be configured at once) and caused crashes. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-12-06 14:30:34 +01:00
Jan Friesse	2f9aaefbb7	main: Replace COROSYNC_MAIN_CONFIG_FILE COROSYNC_MAIN_CONFIG_FILE environment variable was quite well hidden and it was never used by init script. It also makes quite hard to debug possible problems. Replace it by -c option. Also patch makes use of configuration file path as a base for uidgid.d directory, so it's no longer needed to keep uidgid.d in sysconfdir. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-11-15 17:30:14 +01:00
Jan Friesse	cd4d5fd38e	main: Move sched paramaters to config file The reason for this change is that the number of corosync CLI options kind of exploded and the scheduler-based ones are really better kept in the config file. Nice side-effect of this move is better "integration" with systemd, because currently used EnvironmentFile should be really used for environment and not that much for passing extra options to CLI. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-11-15 17:30:03 +01:00
Jan Friesse	82f35f1720	log: Implement support for reopening log files Feature depends on existence of libqb function qb_log_file_reopen. New function call is added into CFG service API. This function is used by corosync-cfgtool which now accepts -L parameter. Finally, logrotate "postrotate" script is calling corosync-cfgtool -L to notify corosync, instead of using copytruncate option. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-10-16 14:46:52 +02:00
Chris Walker	51989b4a0a	Add option to force cluster into GATHER state Signed-off-by: Chris Walker <cwalker@cray.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-09-07 13:27:36 +02:00
Jan Friesse	c9e5d6db13	Remove libcgroup Libcgroup is deprecated and not shipping with new distributions (OpenSuSE is one example). Solution is to have a partial implementation of required functionality of libcgroup in the corosync code. Patch uses hardcoded cgroup mount point, because most of the systems are now systemd and systemd is also using hardcoded mountpoint (see https://github.com/systemd/systemd/blob/master/src/core/mount-setup.c) Configuration option --enable-cgroup is gone, because it's not needed any longer. Big thanks to Christine Caulfield <ccaulfie@redhat.com> for example of simplified implementation of cgroup management code primitives. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-08-14 14:54:28 +02:00
Chris Walker	3f7d2cf6aa	Add token_warning configuration option Token_warning is used to present information about when the token was last received. Signed-off-by: Chris Walker <cwalker@cray.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-08-14 10:34:49 +02:00
Jan Friesse	cc81696ff5	Fix snprintf warnings Compiler shows warnings about possible not large enough buffer, so check snprintf return value properly. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-07-02 08:08:33 +02:00
Ferenc Wágner	baece74c39	Fix typo: sucesfully -> successfully Signed-off-by: Ferenc Wágner <wferi@debian.org> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-04-20 12:04:49 +02:00
Jan Friesse	b3f3a1df26	main: Set errno before calling of strtol Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-03-02 17:29:22 +01:00
Christine Caulfield	fc8580bdbf	totem: Use nodeid ONLY in srp_addr This shrinks the srp_addr (and consequently every packet sent by corosync) so that instead of containing loads of IP addresses to identify a node, it just sends the nodeid. This then allows us to make ring0 optional and replaceable when running knet. It also means that we need some other way of identifying the local node in corosync.conf, so the nodelist.node.name entry is now mandatory and is mapped to the local host using the same algorithm as used in cman. This code needs LOTS of testing as it touches a huge amount of totemsrp and totemconfig. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-03-01 14:18:51 +01:00
Jan Friesse	11fa527ed4	logging: Close before and open blackbox after fork Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-01-30 13:21:52 +01:00
Jan Friesse	79dba9c51f	logging: Make blackbox configurable Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-01-30 13:21:48 +01:00
Ferenc Wágner	09b0123d58	Send corosync startup notification to systemd This enables starting the daemon directly in the service file, because dependent units won't be started until initialization is complete. Signed-off-by: Ferenc Wágner <wferi@debian.org> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2017-11-09 09:49:18 +01:00
Christine Caulfield	01495f650c	main: use syslog & printf directly for early log messages libqb seems funny about logging things before its fully configured. This corosync commit didn't help either: `8b6bd86a55` So to make sure that messages about the config file not being opened get delivered to the user/syslog we send them directly. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2017-08-22 09:51:09 +01:00
Jan Friesse	9a50628fd1	main: Add support for libcgroup When corosync is started in environment where it ends in cgroup without properly set rt_runtime_us it's impossible to get RT priority. Already implemented workaround is to use higher non-RT priority. This patch implements another solution. It moves corosync into root cpu cgroup. Root cpu cgroup hopefully has enough RT budget. Another solution was mentioned on ML https://lists.freedesktop.org/archives/systemd-devel/2017-July/039353.html but this means to generate some "random" values. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com> (cherry picked from commit `c56086c701`)	2017-08-01 14:32:53 +02:00
Christine Caulfield	55c3dcb76d	stats: Add map with on-demand statistics Icmap is factored out so it's possible to add other maps for cmap. API call to switch maps from application end is added. Corosync-cmapctl is enhanced with -m option. Stats contains all statistics previously found in runtime.connections, runtime.services and runtime.totem prefixes together with new knet related. All stats are read only. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2017-07-27 15:53:04 +02:00
Christine Caulfield	876910d8ff	ipc: Check for the libraries sending invalid message IDs If the library sent an invalid (ie too high) message ID to corosync, then it could cause the daemon to crash. Now we check the message ID before indexing the function array Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2017-07-14 14:06:49 +01:00
Jan Friesse	9627d7350b	main: Add option to set priority Option -P takes numeric value with same meaning as nice or values min / max, meaning maximal / minimal priority (so minimal / maximal nice value). Scheduler / priority setting is moved in code so it is now executed after logsys is configured so errors are logged. Setting maximal priority is also used as fallback when realtime scheduling is requested and sched_setscheduler fails. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com> (cherry picked from commit `a008448efb`)	2017-07-10 16:40:39 +02:00
Jan Friesse	564b4bf7d4	totem: Propagate totem initialization failure Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2017-06-15 11:07:33 +02:00
Jan Friesse	95b91e4ae7	main: Display reason why cluster cannot be formed Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2017-05-18 17:15:55 +02:00
Andrew Price	86012ebb45	Main: Call mlockall after fork Man page of mlockall is clear: Memory locks are not inherited by a child created via fork(2) and are automatically removed (unlocked) during an execve(2) or when the process terminates. So calling mlockall before corosync_tty_detach is noop when corosync is executed as a daemon (corosync -f was not affected). This regression is caused by `ed7d054e55` (setprio for logsys/qblog was correct, mlockall was not). Solution is to move corosync_mlockall call on correct place. Signed-off-by: Andrew Price <anprice@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2017-04-25 14:50:04 +02:00
Bin Liu	0462b5e609	totemconfig: Prefer nodelist over bindnetaddr In a two-node cluster, I 've one node configured with open-vswtich: 5: br-fixed: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default inet 192.168.124.88/24 scope global br-fixed inet 192.168.124.87/24 scope global secondary br-fixed inet 192.168.124.83/24 brd 192.168.124.255 scope global secondary tentative br-fixed inet 192.168.124.89/24 scope global secondary br-fixed while I use 192.168.124.83 in node list of corosync.conf with udpu, and the bind_addr is 192.168.124.0. After upgrading corosync on this node, the it uses 192.168.124.88 instead of 192.168.124.83. As we can see: corosync-cfgtool -s Printing ring status. Local node ID 1084783704 corosync-quorumtool -s Membership information: Nodeid Votes Name 1084783697 1 d52-54-77-77-01-02 1084783699 1 d52-54-77-77-01-01 (local) while the other node can only see itself: corosync-cfgtool -s Printing ring status. Local node ID 1084783697 RING ID 0 id = 192.168.124.81 status = ring 0 active with no faults corosync-quorumtool -s Membership information: Nodeid Votes Name 1084783697 1 d52-54-77-77-01-02.virtual.cloud.suse.de (local) this patch will check if there are both nodelist and bindnetaddr and if so, display warning and use nodelist information. Signed-off-by: Bin Liu <bliu@suse.com> Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2017-04-11 11:19:31 +02:00
Christine Caulfield	30771a39a8	main: Don't ask libqb to handle segv, it doesn't work segv should be handled by corosync, libqb is not the place to be handling emergency signals. This currently requires the head of libqb git tree to generate a blackbox & coredump in the event of a segfault, but it's better than the write() spin that currently happens. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2017-02-27 15:14:41 +00:00
Jan Friesse	8b6bd86a55	Logsys: Change logsys syslog_priority priority LibQB adds default "*" syslog filter so we have to set syslog_priority as low as possible so filters applied later in _logsys_config_apply_per_file takes effect. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2017-02-24 16:23:50 +01:00
Takeshi MIZUTA	034553c080	man: Modify man-page according to command usage Signed-off-by: Takeshi MIZUTA <miz.take4@gmail.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2016-12-01 16:32:42 +01:00
Michael Jones	b4c06e52f3	list: Replace uses of list.h with qblist.h Signed-off-by: Michael Jones <jonesmz@jonesmz.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2016-10-27 14:56:52 +02:00
Christine Caulfield	268cde6ee4	totem: Add Kronosnet transport. This is a big update that removes RRP & MRP from the codebase and makes knet the default transport for corosync. UDP & UDPU are still (currently) supported but are deprecated. Also crypto and multiple interfaces are only supported over knet. To compile this codebase you will need to install libknet from https://github.com/fabbione/kronosnet The corosync.conf(5) man page has been updated with info on the new options. Older config files should still work but many options have changed because of the knet implementation so configs should be checked carefully. In particular any cluster using RRP over UDP or UDPU will not start as RRP is no longer present. If you need multiple interface support then you should be using the knet transport. Knet brings many benefits to the corosync codebase, it provides support for more interfaces than RRP (up to 8), will be more reliable in the event of network outages and allows dynamic reconfiguration of interfaces. It also fixes the ifup/ifdown and 127.0.0.1 binding problems that have plagued corosync/openais from day 1 Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>	2016-10-11 10:09:42 +01:00
Ferenc Wágner	cf10a754e9	Fix various typos occured -> occurred parantheses -> parentheses configuraton -> configuration aquire -> acquire retrive -> retrieve prefered -> preferred Signed-off-by: Ferenc Wágner <wferi@niif.hu> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2016-09-12 09:50:11 +02:00
Jan Friesse	f837f95dfe	Config: Flag config uidgid entries Uidgid entries parsed from configuration files now has prefix (uidgid.config.) so they are distinguishable from dynamically added entries. Entries added from config file are pruned on reload if no longer exists in config file (dynamic one stays unaffected). Also whole uidgid.config. prefix is made read only. This make PCMK work again after configuration reload is called. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2016-08-04 16:13:48 +02:00
Ferenc Wágner	b1de8efd15	Fix typo: aquire -> acquire Signed-off-by: Ferenc Wágner <wferi@niif.hu> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2016-06-22 14:26:28 +02:00
Christine Caulfield	571b1621e9	Add some more RO keys Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2016-05-24 12:33:55 +02:00
Christine Caulfield	1e2de52ef1	logging: Use our own version of basename basename() function has some potentially odd issues on other platforms. So, to be safe, here's an internal version. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2016-05-03 15:31:29 +02:00
Christine Caulfield	d245831d65	logsys: fix TOTEM logging when corosync built out of tree If corosync is built out-of-tree (passing --srcdir to configure) then TOTEM logging doesn't print anything. This is caused by the source filenames (from __FILE__ at compilation time) having the configured path in them - in this example ../corosync/exec/totemudp.c etc. The list of totem source filenames passed to libqb logging facility only has the basenames so the filenames never match up as libqb does an exact string match. I looked into fixing this in libqb but it causes a regression. We can't simply basename() __FILE__ at the point of calling log_printf as it's common also to use __FILE__ to generate the logging source, and using basename() on both removes the distinction between similarly named files from different directories which could be a requirement. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>	2016-04-26 09:49:53 +01:00
Jan Friesse	d77cec24d0	Handle adding and removing UDPU members atomically When config file is reloaded with removed UDPU member, internal icmap index of nodelist.node can change. This can result in removal and then adding back node. This, with UDPU alive filtering (where member is by default considered as not a member) makes corosync not sending messages to such members resulting in new membership creation. Solution is to properly test which members were really deleted and added (instead of relying on internal and dynamic naming of icmap hash table key name). Also truly dynamic add and remove node (via cmap) is now handled by same function so totem_config->interfaces is now updated properly. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2015-01-21 16:37:26 +01:00
Jan Friesse	252b38ab8a	corosync_ring_id_store: Use safer permissions corosync_ring_id_store should use same (safer) permissions as corosync_ring_id_create_or_load for (eventually) newly created ringid file. Credit to Sjerek for finding this problem. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2015-01-20 11:21:05 +01:00
Jan Friesse	177ef0e524	Set RR priority by default Experience with larger production clusters showed that setting RR priority for corosync is viable for prevent random fencing, ... Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2015-01-05 15:01:49 +01:00
Jan Friesse	bb52fc2774	Store configuration values used by totem to cmap Some totem configuration values (like token, consensus, ...) are either computed or default value is used. It's hard to find out, what value is really used. Solution is to store values in cmap. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-10-13 11:59:06 +02:00
Vladislav Bogdanov	e3ffd4fedc	Implement config file testing mode Signed-off-by: Vladislav Bogdanov <bubble@hoster-ok.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2014-07-16 16:10:32 +02:00
Jan Friesse	dfaca4b10a	Fix compiler warning introduced by previous patch QB loop signal handler prototype differs from signal(2) prototype. Solution is to create wrapper functions. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2014-07-09 15:57:35 +02:00
zouyu	384760cb67	Handle SIGSEGV and SIGABRT signals SIGSEGV and SIGABRT signals are now correctly handled (blackbox is dumped and logsys is finalized). Signed-off-by: zouyu <hopkings2005@gmail.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2014-07-03 15:13:48 +02:00
zouyu	cc80c8567d	fix memory leak produced by 'corosync -v' Signed-off-by: zouyu <hopkings2005@gmail.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2014-07-03 14:54:05 +02:00
Jan Friesse	72cf15af27	votequorum: Do not process events during reload During reload, local_node_pos is deleted and reinstation is handled in totemconfig after reload is finished. votequorum handles these events and tries to reload its configuration. This led to logging a little scary messages (even nothing bad is happening, because after local_node_pos reinstation everything back to normal). Solution is to stop processing events during reload. Sadly, simple tracking of config.reload_in_progress doesn't work because LibQB events triggering order is undefined so votequorum reload handler can be called before totemconfig (and before local_node_pos is reinstated). So new config.totemconfig_reload_in_progress key is defined with very similar semantics as config.reload_in_progress but set inside totem_reload_notify function. Votequorum then use this new key. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-06-27 11:40:21 +02:00


390 Commits