mirror_corosync

mirror of https://git.proxmox.com/git/mirror_corosync synced 2025-10-22 21:42:07 +00:00

Author	SHA1	Message	Date
Fabian Grünbichler	7fb2470966	cpg: send single confchg event per group on joinlist using a similar approach to `43bead3645` "Send one confchg event per CPG group to CPG client" which did the same for leave events on a network partition. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2019-06-13 15:15:32 +02:00
Fabian Grünbichler	c16abe515f	cpg: notify_lib_joinlist: drop conn parameter since it is always set to NULL. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2019-06-13 15:14:53 +02:00
Jan Friesse	0839d3af82	totemknet: Initialize return value in setup_nozzle Also add comment why return value is currently not used. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2019-06-12 15:40:51 +02:00
Jan Friesse	0d82e23517	totemknet: macaddr_str is always set Check for NULL was invalid, because macaddr_str is either defined in cmap or set to "54:54:01:00:00:00". Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2019-06-12 15:40:51 +02:00
Jan Friesse	9b809383e6	totemknet: Ignore icmap_get_string result ... and add comment why it is not a bug. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2019-06-12 15:40:51 +02:00
Jan Friesse	9a0e7b584e	totemknet: create_nozzle_device simplify check ipaddr existence is checked for being not NULL by caller setup_nozzle. Also ipaddr was passed to reparse_nozzle_ip_address function unchecked so code would crash before reaching the actual check. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2019-06-12 15:40:50 +02:00
Jan Friesse	d4d48d9268	totemip: Use res in totemip_sa_equal Setting res to -1 was not entirely following semantics of "equal" operation. Set it to 0 and return it when families differs makes compiler happy. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2019-06-12 15:40:50 +02:00
Jan Friesse	299c9c5b70	totemconfig: ipaddr_equal use switch Compiler may have problem understanding relation between addr1p and addrlen. Small change makes code a little more readable and compiler happy. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2019-06-12 15:40:50 +02:00
Jan Friesse	9bba026bcd	knet: Use block_unlisted_ips Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2019-05-29 16:30:18 +02:00
Jan Friesse	72737d3929	udpu: Drop packets from unlisted IPs This feature allows corosync to block packets received from unknown nodes (nodes with IP address which is not in the nodelist). This is mainly for situations when "forgotten" node is booted and tries to join cluster which already removed such node from configuration. Another use case is to allow atomic reconfiguration and rejoin of two separate clusters. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2019-05-29 16:30:10 +02:00
Christine Caulfield	482df5d67b	knet: Fix initialising of knet access lists. It needs to be done at both reload and initialize time. Also disable access lists if the config key is removed. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2019-05-29 16:29:56 +02:00
Fabio M. Di Nitto	5c9a2b1c06	knet: allow corosync to use knet access lists currently knet acl are only available on master but they might be backported to stable1 as they don't break onwire protocol. Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2019-05-29 16:29:35 +02:00
yuan ren	2a4cd3c4af	totemconfig: Fix minimum limit for hold timeout Make sure the retransmit timeout have the lowest limit `MINIMUM_TIMEOUT`. So, the lowest limit of hold should be recalculated. Also token timeout and retransmits count should keep a relational expression. Signed-off-by: yuan ren <yren@suse.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2019-05-15 16:28:43 +02:00
Christine Caulfield	01ce5a96ef	knet: Fix a couple of errors when adding a new link When adding a new link for the first time you will often see: 1) knet_link_set_ping_timers for nodeid 1, link 1 failed: Invalid argument (22) 2) New config has different knet transport for link 1. Internal value was NOT changed. To reconfigure an interface it must be deleted and recreated. A working interface needs to be available to corosync at all times 1) is caused by setting the ping timers twice, once in totemknet_member_add() and once in totemknet_refresh_config(). The first time we don't know the value so it's zero and thus display an error. For this we simply check for the zero and skip the knet API call. It's not ideal, but totemconfig needs a lot of reconfiguring itself before we can make this more sane. 2) was caused by simply comparing an unconfigured link with a configured one, so OF COURSE, they are going to be different! Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2019-05-02 16:42:03 +02:00
yuan ren	70cda5d55f	totemconfig: fix autogen mcastaddr for ipv6-4 When UDP is used as a transport, the error would occur "Multicast address family does not match bind address family" because there is no ipv6 in /etc/hosts specified but using the totem.ip_version: ipv6-4. because the mcastaddr generated (if not specified) only according to the totem.ip_version. Solution is to use bindnetaddr (configured or generated from nodelist) addr family. Signed-off-by: yuan ren <yren@suse.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2019-05-02 10:53:54 +02:00
Jan Friesse	3172a76d12	totemconfig: Ensure nodeid is specified for IPv6 Thanks Yuan Ren <yren@suse.com> for finding this problem. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2019-04-25 17:11:12 +02:00
Jan Friesse	d05c1593a1	totemconfig: ipaddr_equal check just addr part Checking whole structure is fine for IPv4, but IPv6 contains also scope id, what may be problem for local address. It's possible to use a zone index, but because it's not required when host name is used, it shouldn't be needed when IPv6 address is used. Example configuration snip which fails without patch: ... nodelist { node { nodeid: 1 ring0_addr: fe80::1234:5678:9abc:def1 } } ... (example succeed when %eth0 is used). With patch, zone index is not needed. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2019-04-23 16:17:42 +02:00
Jan Friesse	41f9e966bb	cpg: Add CPG_REASON_UNDEFINED Previously the reason field for the member_list items in cpg_totem_confchg_fn was unset what may be little confusing. Solution is to add a special value CPG_REASON_UNDEFINED and use it for the member_list items. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2019-04-16 14:49:10 +02:00
Fabian Grünbichler	b97ca8e9f0	crypto: re-introduce secauth parameter with the following semantics: - default off - implies crypto_hash SHA256 and crypto_cipher AES256 - crypto_* have higher precedence - only applicable for knet, like crypto_* this should make upgrading from Corosync 2.x less painful for users that have an explicit secauth=on in their configuration. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2019-04-15 13:29:41 +02:00
Jan Friesse	d05636b738	totemconfig: Remove support for 3des Triple DES is considered as a "weak cipher" since 2016 so there is really no need to support it in the corosync. Thanks to bug in Corosync/Knet/NSS which caused 3des to not work at all, no matter what library was used, we can just remove support for 3des without breaking the compatibility. Also fix coroparse so: - totem.crypto_type is removed (this is 1.x construct which was not used even in 2.x) - Add checking of totem.crypto_model. - Enumerate possible values for crypto_model, crypto_cipher and crypto_hash error messages Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2019-04-11 15:15:38 +02:00
Fabian Grünbichler	03fba21503	set totem.keyfile and totem.key to RO so that we get the nice log message when attempting to modify them at runtime, just like for totem.crypto_* and co. Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2019-04-05 17:07:03 +02:00
yuan ren	24a72e9780	totemsrp: Word spelling mistake Signed-off-by: yuan ren <reyren179@gmail.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2019-04-01 08:20:46 +02:00
Jan Friesse	7c825173de	coroparse: Fix compiler warning Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2019-02-26 13:28:40 +01:00
Christine Caulfield	eab55e7384	nozzle: Add support for libnozzle devices A nozzle device is a pseudo ethernet device that routes network traffic through a channel on the corosync knet network (NOT cpg or any corosync internal service) to other nodes in the cluster. It allows applications to take advantage of knet features such as multipathing, automatic failover, link switching etc. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2019-02-26 13:11:35 +01:00
Jan Friesse	2ab4d41886	totemip: Use AF_UNSPEC for ipv4-6 and ipv6-4 AF_UNSPEC returns different results than AF_INET/AF_INET6, because of nsswitch.conf search is in order and it stops asking other modules once current module succeeds. Example of difference between previous and new code when ipv6-4 is used: - /etc/hosts contains test_name with an ipv4 - previous code called AF_INET6 where /etc/hosts failed so other methods were used which may return IPv6 addr -> result was either fail or IPv6 address. - new code calls AF_UNSPEC returning IPv4 defined in /etc/hosts -> result is IPv4 address New code behavior should solve problems caused by nss-myhostname. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2019-01-11 09:37:30 +01:00
Fabio M. Di Nitto	ff7ace7655	[totemknet] update for libknet.so.2.0.0 init API more changes are to be expected on this front as the API evolves in knet master. Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2019-01-03 10:10:38 +01:00
Ferenc Wágner	13b070f0c8	Don't declare success early Here we're very far from entering the main loop, even farther from sending the READY notification to systemd. This sounded awkward: systemd[1]: Starting Corosync Cluster Engine... corosync[827]: [MAIN ] Corosync Cluster Engine ('2.99.5'): started and ready to provide service. corosync[827]: [MAIN ] Corosync built-in features: dbus monitoring watchdog augeas systemd xmlconf snmp pie relro bindnow corosync[827]: [MAIN ] parse error in config: No interfaces defined corosync[827]: [MAIN ] Corosync Cluster Engine exiting with status 8 at main.c:1378. systemd[1]: corosync.service: Main process exited, code=exited, status=8/n/a systemd[1]: corosync.service: Failed with result 'exit-code'. systemd[1]: Failed to start Corosync Cluster Engine. Signed-off-by: Ferenc Wágner <wferi@debian.org> Reviewed-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2019-01-03 09:50:36 +01:00
Ferenc Wágner	ba24bef8bd	More natural error messages Signed-off-by: Ferenc Wágner <wferi@debian.org> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2019-01-03 09:48:51 +01:00
Jan Friesse	0ee7fd0c2f	main: Rename run_dir to state_dir system.run_dir was a little bit unfortunate and confusing name. Rename to state_dir makes more evident what is content of this directory. To keep setting consistent with code, get_run_dir is changed to get_state_dir. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-12-14 13:48:33 +01:00
Jan Friesse	a84ade701c	totemconfig: Enhance totem.ip_version Originally totem.ip_version was used to force ip version used by totem. With Knet this variable didn't make too much sense so it was not used. Sadly rely only on DNS resolver order doesn't always work (RFC is quite complicated, but if IPv6 is not configured then IPv4 is preferred), what we tried to solve by forcing IPv6 and only if that fails, use IPv4. Sadly this collides with nss_myhostname which is able to return every local address and today system usually have at least one autogenerated link-local IPv6 address so it is able to "overwrite" /etc/hosts. Solution is to enhance totem.ip_version and use it also for Knet. totem.ip_version is now just a flag for resolver and can have four states: ipv4 (only IPv4 is used), ipv6 (only IPv6 is used), ipv4-6 (ask IPv4 first and if it fails ask for IPv6) and ipv6-4 (ask IPv6 first and if it fails ask for IPv4). Default for Knet and UDPU transports is ipv6-4, for UDP it's ipv4, because autogenerated mcast addr doesn't play too well with ipv6-4. So everywhere where nss_myhostname becomes problem, it's just possible to set totem.ip_version to ipv4-6. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-12-14 10:56:06 +01:00
Jan Friesse	aa7daf8c77	totemip: Add debug information to totemip_parse It's required to create TOTEM logsys subsys before totemip_parse is used (so before totem_config_read). Logsys is not yet fully initialized, but it's good enough. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-12-13 15:25:04 +01:00
Jan Friesse	e17e3f4b81	totemconfig: Add IPs to family mismatch error Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-12-13 15:24:37 +01:00
Christine Caulfield	3d7f136f86	config: Look up hostnames in a defined order Current practice is to let getaddrinfo() decide which address we get but this is not necessarily deterministic as DNS servers won't always return addresses in the same order if a node has several. While this doesn't deal with node names that have multiple IP addresses of the same family (that's an installation issue IMHO) we can, at least, force a definite order for IPv6/IPv4 name resolution. I've chosen IPv6 then IPv4 as that's what happens on my test system ( using /etc/hosts) and it also seems more 'future proof'. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-12-11 11:20:50 +01:00
Jan Friesse	41ce8fc640	totemconfig: Really use totemip_parse results Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-12-10 11:40:13 +01:00
Christine Caulfield	e6be234565	config: Disallow corosync-cmapctl updates of nodelist It didn't work anyway (the config system requires whole links to be configured at once) and caused crashes. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-12-06 14:30:34 +01:00
Christine Caulfield	ab941843d5	config: Report IP addr/nodename parse errors back Corosync used to just ignore parse errors so that un-resolved names could cause silent failures. We now always check the result from totemip_parse() and at least print something in syslog. There's also a little get-out here that allows you to correct a bad node address without having to destroy and recreate the whole link. I'm being nice to you. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-12-03 17:25:42 +01:00
Jan Friesse	42aded40cb	coroparse: Remove unused cs_err initialization Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-11-29 17:50:01 +01:00
Jan Friesse	db7eebf817	stats: Fix delete of track When cmap_track_delete was called to stats map (cmap created with CMAP_MAP_STATS parameter) result was always ERR_BAD_HANDLE. It turned out that corosync part of cmap is always calling icmap function to get user data (where required hdb handle is stored) instead of generalized map_fns. After fixing this issue, valgrind showed error about jump depending on uninitialized data in stats_map_track_delete. Solution seems to be always initialize tracker->events (so not only when track_type is add or delete). Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-11-16 11:47:22 +01:00
Jan Friesse	fec6bfd341	main: Remove COROSYNC_RUN_DIR Remove last used environment variable (reasons similar to removal of COROSYNC_MAIN_CONFIG_FILE). This environment variable was never documented, so document it properly. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-11-15 17:30:37 +01:00
Jan Friesse	29f46c56d0	main: Remove COROSYNC_TOTEM_AUTHKEY_FILE Remove another environment variable (reasons similar to removal of COROSYNC_MAIN_CONFIG_FILE). Also properly document both totem.keyfile and totem.key. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-11-15 17:30:22 +01:00
Jan Friesse	2f9aaefbb7	main: Replace COROSYNC_MAIN_CONFIG_FILE COROSYNC_MAIN_CONFIG_FILE environment variable was quite well hidden and it was never used by init script. It also makes quite hard to debug possible problems. Replace it by -c option. Also patch makes use of configuration file path as a base for uidgid.d directory, so it's no longer needed to keep uidgid.d in sysconfdir. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-11-15 17:30:14 +01:00
Jan Friesse	cd4d5fd38e	main: Move sched parameters to config file The reason for this change is, that number of corosync CLI options kind of exploded and scheduler based one are really better to be kept in config file. Nice side-effect of this move is better "integration" with systemd, because currently used EnvironmentFile should be really used for environment and not that much for passing extra options to CLI. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-11-15 17:30:03 +01:00
Jan Friesse	48cb28b0a4	logsys: Make hires timestamp default Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-10-29 17:45:35 +01:00
Jan Friesse	bd2fff5bb3	logsys: Support hires timestamp Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-10-29 17:45:29 +01:00
Jan Friesse	bd338449ac	totemconfig: Fix logging of freed string Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-10-29 17:45:19 +01:00
Christine Caulfield	707a9afa30	config: Allow generated nodeids for UDP & UDPU The conversion to the new srp_addr format broke the feature where UDP/UDPU nodes could get their nodeids generated from the IP address. A big part of this was the removal of mandatory ring0_addr - it was used as a placeholder when reading down the nodelist. I replaced this with nodeid thinking that nodeid was now mandatory, forgetting this use case. So the compare on "ring0_addr" or "nodeid" is now replaced with a more robust check that we're only reading keys from the same node_pos once, this was needed in votequorum.c as well as totemconfig.c Another tidying side-effect of this patch is that the nodeid generation is now all in a single routine in totemconfig.c and not shared between it and totemip.c. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-10-25 18:21:02 +02:00
Jan Friesse	82f35f1720	log: Implement support for reopening log files Feature depends on existence of libqb function qb_log_file_reopen. New function call is added into CFG service API. This function is used by corosync-cfgtool which now accepts -L parameter. Finally, logrotate "postrotate" script is calling corosync-cfgtool -L to notify corosync, instead of using copytruncate option. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-10-16 14:46:52 +02:00
Jan Friesse	b13ab76e10	totemconfig: Replace strcpy by strncpy Formally not needed, because totemip_print should not return string longer than INET6_ADDRSTRLEN, but static analysis tools are not capable of such conclusion. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-10-16 12:30:58 +02:00
Christine Caulfield	9f2d5a3a3f	config: Fix crash in reload if new interfaces are added This is a bug I seem to have introduced in `429209f4aa` where we compare links for changes. If a new node was added on an existing link then it was compared against a non-existent one in the previous configuration. We now only compare nodes that are in both interfaces. As I needed min() for this function, I moved it from individual .c files into util.h so we only have one copy. And the error message was fixed. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-10-15 15:54:57 +02:00
Jan Friesse	853e5b96fb	build: Do not compile totempg as a shared library Instead of compiling totempg as a shared library, compile all totem code directly into corosync binary. Main idea of having totempg which may be used in other projects was nice, but never really finished (and as far as I know no project were ever really using it). So at the end of the day, we've end with huge amount of problems (need to pass new arguments thru X layers, hard debugging, ...) without any real benefit. For a future version, we may consider to revisit idea of split totemsrp into well tested library without unrelated bits like transports/ip/... Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-09-25 08:23:30 +02:00
Jan Friesse	06504c0f6f	build: Remove NSS dependencies Complete removal of NSS from corosync tree. Most of the changes are in build system and cpgverify had to be rewritten to use crc32 instead of sha1. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-09-17 10:26:05 +02:00
Jan Friesse	e3989c2b56	coroparse: Fix newly introduced warning Small fix for a problem introduced by "coroparse: Use key_name for error message" patch. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2018-09-07 16:53:08 +02:00
Chris Walker	51989b4a0a	Add option to force cluster into GATHER state Signed-off-by: Chris Walker <cwalker@cray.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-09-07 13:27:36 +02:00
Jan Friesse	0ac659608d	coroparse: Use key_name for error message Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-09-06 13:02:03 +02:00
Jan Friesse	f6262e5755	coroparse: Add file name and line to error message It's just much easier to find out what is happening when message like parser error: /etc/corosync/corosync.conf:39: Unexpected closing brace is logged instead of parser error: Unexpected closing brace Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-09-06 13:01:56 +02:00
Jan Friesse	80701845ab	coroparse: Be more strict in what is parsed Corosync parser is not very clever, but it is able to detect more errors without too much code. 1. Check if section name is not empty (just '{' character) 2. Check if there is no extra characters after opening bracket '{' 3. Check if there is no extra characters after or before closing bracket '}' 4. Check if line is opening section, closing section or key/value So following examples are reported as error: totem { version: 2 }}}}}}}}}} Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-09-06 13:01:35 +02:00
Jan Friesse	7a4725f9da	coroparse: Fix remove_whitespace end condition When remove_whitespace function parameter is single character string with whitespaces (like a:) then colon is not removed. Reason is end condition end != start, which is valid for empty string, but invalid in case described above. Solution is to check if *end is '\0'. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-09-06 13:01:20 +02:00
Jan Friesse	ffb759cd7d	coroparse: Check icmap_set results Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-09-06 13:01:12 +02:00
Jan Friesse	20bd68b3fb	coroparse: Return error if config line is too long Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-09-06 13:01:01 +02:00
Jan Friesse	c9e5d6db13	Remove libcgroup Libcgroup is deprecated and not shipping with new distributions (OpenSuSE is one example). Solution is to have a partial implementation of required functionality of libcgroup in the corosync code. Patch uses hardcoded cgroup mount point, because most of the systems are now systemd and systemd is also using hardcoded mountpoint (see https://github.com/systemd/systemd/blob/master/src/core/mount-setup.c) Configuration option --enable-cgroup is gone, because it's not needed any longer. Big thanks to Christine Caulfield <ccaulfie@redhat.com> for example of simplified implementation of cgroup management code primitives. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-08-14 14:54:28 +02:00
Chris Walker	3f7d2cf6aa	Add token_warning configuration option Token_warning is used to present information about when the token was last received. Signed-off-by: Chris Walker <cwalker@cray.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-08-14 10:34:49 +02:00
Jan Friesse	f60541513e	totemsrp: Add assert into memb_lowest_in_config Add assert when there are no members in token_memb structure so non-existing member is not accessed (token should always have at least one member). Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-08-13 09:00:47 +02:00
Jan Friesse	1d2c6e4696	totemconfig: Enlarge error_string_response ... so error_reason can be fully included into parse error message. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-08-13 09:00:44 +02:00
Jan Friesse	0095b9a3cb	ipc_glue: Fix strncpy in pid_to_name function Trailing zero is always added so there is no need to have a warning about unterminated destination string. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-08-13 09:00:43 +02:00
Jan Friesse	f576ad6388	util: Fix strncpy in setcs_name_t function Trailing zero is always added so there is no need to have a warning about unterminated destination string. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-08-13 09:00:39 +02:00
Jan Friesse	844a76e775	totemknet: Free instance on failure exit Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-08-13 09:00:35 +02:00
Jan Friesse	31268cc744	totemudpu: Pass correct param to totemip_nosigpipe Fixes compilation on (at least) FreeBSD. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2018-07-12 16:29:15 +02:00
Bin Liu	96b4bd1660	totemudpu: Add local loop support This patch intends to solve long time ifdown corosync problem. Idea is to use local socket for sending both unicast and multicast messages if interface is down. Together with testing what is current bind state it's possible to keep pretending existence of old IP address instead of rebinding to localhost what breaks a lot things badly. Heavily based on Yu, Zou <zouyu@shiqichuban.com> work and it's basically port of UDP patch created by Jan Friesse <jfriesse@redhat.com>. (ported from needle `96354fba72`) Signed-off-by: Bin Liu <bliu@suse.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-07-12 15:43:03 +02:00
Christine Caulfield	a471bab798	config: Fail config validation if not all nodes have all links KNET requires that all links be full-mesh (this may change in the future but almost certainly not before knet 2.0), so enforce this in the config. Also avoid a potential div-by-0 error if the local node is not fully configured either. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-07-03 12:38:02 +02:00
Christine Caulfield	d1db8c2851	config: Enforce use of 'name' node attribute in multi-link clusters If the local host does not have a 'name' attribute and the cluster has more than one link then fail the validation test. I'm open to the idea of checking all of the nodes in the nodelist if necessary. It seems overkill as each node will check its own entry though. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-07-03 12:37:45 +02:00
Christine Caulfield	429209f4aa	totemconfig: Check for things that cannot be changed on the fly There are a few things in the interface that cannot be changed on the fly. Warn about them and tell the user that these things need to be done in two steps and why. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-07-02 09:54:31 +02:00
Jan Friesse	cc81696ff5	Fix snprintf warnings Compiler shows warnings about possible not large enough buffer, so check snprintf return value properly. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-07-02 08:08:33 +02:00
Christine Caulfield	137b31397c	knet: Don't try to create loopback interface twice It wasn't harmful, but it generated an annoying message Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-07-02 08:00:36 +02:00
Christine Caulfield	5dda71ae29	knet: Fix knet log buffer size knet sends log messages as struct knet_log_msg, not a string of KNET_MAX_LOG_MSG_SIZE (which is only part of that structure). So we were both losing and corrupting messages. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-07-02 08:00:15 +02:00
Jan Friesse	23e17953fe	cpg: Inform clients about left nodes during pause Patch tries to fix incorrect behaviour during following test-case: - 3 nodes - Node 1 is paused - Node 2 and 3 detects node 1 as failed and informs CPG clients - Node 1 is unpaused - Node 1 clients are informed about new membership, but not about Node 1 being paused, so from Node 1 point-of-view, Node 2 and 3 failure Solution is to: - Remove downlist master choose and always choose local node downlist. For Node 1 in example above, downlist contains Node 2 and 3. - Keep code which informs clients about left nodes - Use joinlist as a authoritative source of nodes/clients which exists in membership This patch doesn't break backwards compatibility. I've walked thru all the patches which changed behavior of cpg to ensure patch does not break CPG behavior. Most important were: - `058f50314c` - Base. Code was significantly changed to handle double free by split group_info into two structures cpg_pd (local node clients) and process_info (all clients). Joinlist was - `97c28ea756` - This patch removed confchg_fn and made CPG sync correct - `feff0e8542` - I've tested described behavior without any issues - `6bbbfcb6b4` - Added idea of using heuristics to choose same downlist on all nodes. Sadly this idea was beginning of the problems described in `040fda8872`, `ac1d79ea7c`, `559d4083ed`, `02c5dffa5b`, `64d0e5ace0` and `b55f32fe2e` - `02c5dffa5b` - Made joinlist as authoritative source of nodes/clients but left downlist_master_choose as a source of information about left nodes Long story made short. This patch basically reverts idea of using heuristics to choose same downlist on all nodes. (ported from needle `9c2a97f4f9`) Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-04-30 14:37:20 +02:00
Jan Friesse	e45bbcc92a	totemsrp: Fix leave message regression Leave message in totem is just join message where leaving member is excluded from member list and included in fail list. It also contains special nodeid in header.nodeid and system_from.nodeid fields. Before "totem: Use nodeid ONLY in srp_addr" fix, most of the functions were using system_from addresses and not nodeid, which was used only in one specific case for memb_consensus_set function. After the patch, addresses are gone and only nodeid is used. Result is, that leaving node nodeid is not added into local fail list (my_faillist) so node is unable to reach consensus till token timeout, which starts new gather process. Solution is to send valid leaving node nodeid in system_from.nodeid and handle specific case for memb_consensus_set in memb_join_process. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-04-23 17:46:05 +02:00
Jan Friesse	dc590159f5	totemsrp: Log proc/fail lists in memb_join_process These information are useful and with trace log level they should not be too much irritating. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-04-23 17:45:51 +02:00
Jan Friesse	9b3782e48e	totemsrp: Fix srp_addr_compare There is regression caused by "totem: Use nodeid ONLY in srp_addr" patch in srp_addr_compare function. This function should be usable with qsort, so it should return values less than, equal to or greater than zero. It was however returning only zero or negation of a zero. Final results were unable to reach consensus in following test case: - 3 node cluster - start nodes 1, 2, 3 - shutdown node 3 - start node 3 - shutdown node 2 - start node 2 - shutdown node 1 After this steps, node 2 and 3 were unable to reach consensus. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-04-23 17:45:29 +02:00
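The comparator contract mentioned in the `srp_addr_compare` fix above can be sketched as follows. This is a minimal illustration, not the corosync code: `struct example_srp_addr` and both function names are hypothetical, and it assumes an address reduced to a single nodeid, as described in "totem: Use nodeid ONLY in srp_addr".

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for an srp_addr that carries only a nodeid. */
struct example_srp_addr {
	unsigned int nodeid;
};

/*
 * qsort-compatible comparator: returns a value less than, equal to,
 * or greater than zero. A comparator that only ever returns zero
 * (or its negation) leaves the list in an unspecified order.
 */
int example_srp_addr_compare(const void *a, const void *b)
{
	const struct example_srp_addr *x = a;
	const struct example_srp_addr *y = b;

	/* Yields -1, 0 or 1 without the wrap-around of unsigned subtraction. */
	return (x->nodeid > y->nodeid) - (x->nodeid < y->nodeid);
}

/* Sort a member list in ascending nodeid order. */
void example_sort_members(struct example_srp_addr *list, size_t count)
{
	qsort(list, count, sizeof(*list), example_srp_addr_compare);
}
```

With a three-way result every node sorts its member list into the same order, which matters when the nodes then compare those lists.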
Ferenc Wágner	baece74c39	Fix typo: sucesfully -> successfully Signed-off-by: Ferenc Wágner <wferi@debian.org> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-04-20 12:04:49 +02:00
Jan Friesse	ccb2290f84	totemsrp: Check join and leave msg length If number of proc_list, failed_list or active members is too high it may be impossible to put them into message, which is allocated on the stack what results in stack corruption. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-04-12 15:25:38 +02:00
Jan Friesse	c139255669	totemsrp: Implement sanity checks of received msgs Sanity checkers are used to prevent crashing because of accessing unallocated memory. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-04-12 15:25:33 +02:00
Jan Friesse	69857efb5b	totem: Display IP of sender To make finding victim of incompatible messages easier, IP of sender is logged. Propagating IP in layers makes patch slightly larger. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-03-16 13:58:15 +01:00
Jan Friesse	0c509a25a7	totemsrp: Add magic and version into header Magic number (0xC070) together with version in every packet is used for detecting that other node is really Corosync 3.x. Endian_detector field is removed and magic number is now used instead. If received packet magic number differs, guessing is used to show more about the source (Corosync 2.3+, 2.2 are quite reliable, Knet and unencrypted Corosync 2.1/2.0/1.x/OpenAIS are semi-reliable and encrypted Corosync 2.1/2.0/1.x/OpenAIS are quite unreliable). Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-03-16 13:57:55 +01:00
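A rough sketch of the kind of check the magic/version commit above describes. Only the magic value 0xC070 comes from the commit message; the header layout, the version constant and the function name are assumptions made for illustration and do not reflect the real totemsrp header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_MAGIC   0xC070u /* value quoted in the commit message */
#define EXAMPLE_VERSION 3u      /* assumed value, for illustration only */

/* Hypothetical packet header; the real layout may differ. */
struct example_header {
	uint16_t magic;
	uint8_t version;
	uint8_t type;
};

/*
 * Returns 1 when the buffer starts with a header carrying the expected
 * magic and version, 0 otherwise (including buffers that are too short).
 */
int example_header_is_compatible(const void *buf, size_t len)
{
	struct example_header hdr;

	if (len < sizeof(hdr)) {
		return 0;
	}
	/* Copy out to avoid unaligned access into the receive buffer. */
	memcpy(&hdr, buf, sizeof(hdr));

	return hdr.magic == EXAMPLE_MAGIC && hdr.version == EXAMPLE_VERSION;
}
```

A receiver would run such a check before interpreting any other field, and fall back to guessing the sender type only when it fails.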
Christine Caulfield	066525efd3	knet: Fix display of links with unconfigured link0 because totemknet always configures link0 as loopback even if it's not known to corosync, we need to filter it out when returning the link status, as things get misaligned in cfg. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-03-16 13:11:13 +01:00
Jan Friesse	b3f3a1df26	main: Set errno before calling of strtol Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-03-02 17:29:22 +01:00
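The reason for setting `errno` before `strtol`, as in the commit above, is that `strtol` reports overflow only through `errno` and never clears it on success. A minimal sketch follows; `example_parse_long` is a hypothetical helper, not a function from corosync.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a base-10 long.
 * Returns 0 and stores the value in *out on success, -1 on failure.
 */
int example_parse_long(const char *str, long *out)
{
	char *end;
	long val;

	/* Clear any stale error; strtol leaves errno untouched on success. */
	errno = 0;
	val = strtol(str, &end, 10);

	if (errno != 0) {
		return -1; /* out of range, or another conversion error */
	}
	if (end == str || *end != '\0') {
		return -1; /* no digits, or trailing garbage */
	}

	*out = val;
	return 0;
}
```

Without the reset, an `ERANGE` left over from an earlier call would make a perfectly valid input look like an overflow.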
Christine Caulfield	2c20590d16	knet: Always use link0 for loopback Even if it's not used for anything else. Also, make cfgtool show the correct link ID when links are not contiguous Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-03-01 14:23:20 +01:00
Christine Caulfield	111bfbc11d	totem: Fix debug warnings printed by knet Fix crash introduced a couple of commits ago in iface_get Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-03-01 14:22:22 +01:00
Christine Caulfield	f5871c6b4c	config: Allow use of ring0_addr Allow ring0_addr to be used in place of 'name' for backwards compatibility Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-03-01 14:21:37 +01:00
Christine Caulfield	7a639d1b62	config: Update message when local host isn't found Make the message more representative of what's going on. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-03-01 14:20:00 +01:00
Christine Caulfield	386d710ed1	cfg: Fix cfg_get_node_addrs so that DLM works Also update copyright dates Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-03-01 14:19:45 +01:00
Christine Caulfield	f5b690bd96	totem: Return interface count correctly Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-03-01 14:19:12 +01:00
Christine Caulfield	fc8580bdbf	totem: Use nodeid ONLY in srp_addr This shrinks the srp_addr (and consequently every packet sent by corosync) so that instead of containing loads of IP addresses to identify a node, it just sends the nodeid. This then allows us to make ring0 optional and replaceable when running knet. It also means that we need some other way of identifying the local node in corosync.conf, so the nodelist.node.name entry is now mandatory and is mapped to the local host using the same algorithm as used in cman. This code needs LOTS of testing as it touches a huge amount of totemsrp and totemconfig. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-03-01 14:18:51 +01:00
Rytis Karpuška	105f3ae98c	totempg: Fix corrupted messages Commit `899cb29983` changed copy_len to iovec[i].iov_len, assuming, copy_len is always the same as iovec[i].iov_len under those circumstances, but it missed the possibility of small message being partly put at the end of packet, which cuts this message in two parts and therefore making copy_len not equal to iovec[i].iov_len. This is revert of `899cb29983` Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-02-09 17:38:05 +01:00
Rytis Karpuška	899cb29983	totempg: use iovec[i].iov_len instead of copy_len To be more explicit that we are copying whole message. Related to `0ebae6b47d`. Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-02-08 09:30:07 +01:00
Rytis Karpuška	0ebae6b47d	totempg: Fix fragmentation segfault The problem was that two or more messages were concatenated together during fragmentation in mcast_msg() function. In specific case, message of just short of 1MB was provided for mcast_msg() and it happened so, that the remainder (212 bytes to be exact) left some free space in packet, therefore branch if ((copy_len + fragment_size) < (max_packet_size - sizeof (unsigned short))) { ... was selected and this was the last message in provided iovec. Then, on the second call, came another big message (about 300KB ) and during fragmentation mcast.fragmented was set to 1. On the other end, while receiving messages, due to missing mcast.fragmentation==0 those two messages were concatenated and therefore assembly->data array overflowed overwriting linked list pointers and offset (which happened to be set to 0 and that 300KB message was being copied from the beginning again). After whole 300KB message has been sent, mcast.fragmentation==0 arrived and totempg_deliver_fn() tried to move assembly structure to assembly_list_free list, but as linked list pointers has been overridden, segfault occurred. Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-02-08 09:29:22 +01:00
Fabio M. Di Nitto	1411608a81	[build] fix build with non-standard knet location Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-02-05 15:57:12 +01:00
Jan Friesse	11fa527ed4	logging: Close before and open blackbox after fork Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-01-30 13:21:52 +01:00
Jan Friesse	79dba9c51f	logging: Make blackbox configurable Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-01-30 13:21:48 +01:00
Jan Friesse	1fba1b83aa	build: Replace -lknet with autoconf generated vars Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-01-25 16:08:09 +01:00
Jan Friesse	589ed92505	build: Remove rdma/ibverbs Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-01-25 16:08:07 +01:00

2100 Commits