mirror_corosync

mirror of https://git.proxmox.com/git/mirror_corosync synced 2025-11-03 10:28:45 +00:00

Author	SHA1	Message	Date
Jan Friesse	1d2c6e4696	totemconfig: Enlarge error_string_response ... so error_reason can be fully included into parse error message. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-08-13 09:00:44 +02:00
Jan Friesse	0095b9a3cb	ipc_glue: Fix strncpy in pid_to_name function Trailing zero is always added so there is no need to have a warning about unterminated destination string. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-08-13 09:00:43 +02:00
Jan Friesse	f286388275	cmap: Fix strncpy warning in cmap_iter_next cmap_iter_next, in contrast to its icmap counterpart, copies key name into user preallocated space. In the worst case, key name may be CMAP_KEYNAME_MAXLEN, so cmap_iter_next then needs CMAP_KEYNAME_MAXLEN + additional byte to store zero. strncpy was copying only CMAP_KEYNAME_MAXLEN characters so there was possibility of unterminated string. Patch solves this by using memcpy and always adding trailing zero. Documentation was improved suggesting minimum size of keyname buffer to be CMAP_KEYNAME_MAXLEN + 1. Also sam and quorumtool were using too short buffer so they are fixed too. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-08-13 09:00:41 +02:00
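The copy pattern described in the cmap_iter_next commit above (memcpy plus an unconditional trailing zero, into a buffer of maximum length + 1) might look roughly like the sketch below. It is not the corosync code: EXAMPLE_KEYNAME_MAXLEN and example_copy_key_name are hypothetical names, and the value 255 is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for CMAP_KEYNAME_MAXLEN; the value is an assumption. */
#define EXAMPLE_KEYNAME_MAXLEN 255

/*
 * Copy at most EXAMPLE_KEYNAME_MAXLEN characters of src into dst and always
 * add the trailing zero. dst must hold EXAMPLE_KEYNAME_MAXLEN + 1 bytes.
 */
void example_copy_key_name(char *dst, const char *src)
{
	const char *end;
	size_t len;

	assert(dst != NULL && src != NULL);

	/* Length of src, capped at the maximum key name length. */
	end = memchr(src, '\0', EXAMPLE_KEYNAME_MAXLEN);
	len = (end != NULL) ? (size_t)(end - src) : EXAMPLE_KEYNAME_MAXLEN;

	/* memcpy copies exactly len bytes; the terminator is written explicitly. */
	memcpy(dst, src, len);
	dst[len] = '\0';
}
```

A caller would declare the destination as char key_name[EXAMPLE_KEYNAME_MAXLEN + 1], which mirrors the minimum buffer size the commit message recommends.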
Jan Friesse	f576ad6388	util: Fix strncpy in setcs_name_t function Trailing zero is always added so there is no need to have a warning about unterminated destination string. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-08-13 09:00:39 +02:00
Jan Friesse	844a76e775	totemknet: Free instance on failure exit Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-08-13 09:00:35 +02:00
Jan Friesse	afdc405512	spec: Add explicit gcc build requirement Also remove %clean macro which is not needed for ages. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2018-08-09 16:57:43 +02:00
Chris Walker	bde247677b	Add option for quiet operation to corosync-cmapctl Signed-off-by: Chris Walker <cwalker@cray.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-08-09 10:05:48 +02:00
Jan Friesse	31268cc744	totemudpu: Pass correct param to totemip_nosigpipe Fixes compilation on (at least) FreeBSD. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2018-07-12 16:29:15 +02:00
Bin Liu	96b4bd1660	totemudpu: Add local loop support This patch intends to solve long time ifdown corosync problem. Idea is to use local socket for sending both unicast and multicast messages if interface is down. Together with testing what is current bind state it's possible to keep pretending existence of old IP address instead of rebinding to localhost, which breaks a lot of things badly. Heavily based on Yu, Zou <zouyu@shiqichuban.com> work and it's basically port of UDP patch created by Jan Friesse <jfriesse@redhat.com>. (ported from needle `96354fba72`) Signed-off-by: Bin Liu <bliu@suse.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-07-12 15:43:03 +02:00
Christine Caulfield	a471bab798	config: Fail config validation if not all nodes have all links KNET requires that all links be full-mesh (this may change in the future but almost certainly not before knet 2.0), so enforce this in the config. Also avoid a potential div-by-0 error if the local node is not fully configured either. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-07-03 12:38:02 +02:00
Christine Caulfield	d1db8c2851	config: Enforce use of 'name' node attribute in multi-link clusters If the local host does not have a 'name' attribute and the cluster has more than one link then fail the validation test. I'm open to the idea of checking all of the nodes in the nodelist if necessary. It seems overkill as each node will check its own entry though. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-07-03 12:37:45 +02:00
Christine Caulfield	429209f4aa	totemconfig: Check for things that cannot be changed on the fly There are a few things in the interface that cannot be changed on the fly. Warn about them and tell the user that these things need to be done in two steps and why. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-07-02 09:54:31 +02:00
Jan Friesse	cc81696ff5	Fix snprintf warnings Compiler shows warnings about possible not large enough buffer, so check snprintf return value properly. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-07-02 08:08:33 +02:00
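Checking the snprintf return value, as the commit above describes, can be sketched as follows. This is an illustration only: example_format_path and its parameters are hypothetical and do not come from the corosync tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format "dir/name" into buf. Returns 0 on success and -1 when snprintf
 * reports an error or the result did not fit into buf_size bytes.
 */
int example_format_path(char *buf, size_t buf_size, const char *dir,
    const char *name)
{
	int res;

	assert(buf != NULL && buf_size > 0);

	res = snprintf(buf, buf_size, "%s/%s", dir, name);

	/* Negative means an output error; >= buf_size means truncation. */
	if (res < 0 || (size_t)res >= buf_size) {
		return (-1);
	}

	return (0);
}
```

Treating truncation as a failure, instead of silently using the shortened string, is the design choice sketched here; whether the actual patch fails or falls back differently is not stated in the log.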
Jan Friesse	fc45968223	init: Use existing env variable from sysconf Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-07-02 08:08:23 +02:00
Jan Friesse	0031822c68	upstart: Remove notifyd upstart unit Hopefully this is last upstart bit. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-07-02 08:07:15 +02:00
Christine Caulfield	137b31397c	knet: Don't try to create loopback interface twice It wasn't harmful, but it generated an annoying message. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-07-02 08:00:36 +02:00
Christine Caulfield	5dda71ae29	knet: Fix knet log buffer size knet sends log messages as struct knet_log_msg, not a string of KNET_MAX_LOG_MSG_SIZE (which is only part of that structure). So we were both losing and corrupting messages. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-07-02 08:00:15 +02:00
Jan Friesse	23e17953fe	cpg: Inform clients about left nodes during pause Patch tries to fix incorrect behaviour during following test-case: - 3 nodes - Node 1 is paused - Node 2 and 3 detects node 1 as failed and informs CPG clients - Node 1 is unpaused - Node 1 clients are informed about new membership, but not about Node 1 being paused, so from Node 1 point-of-view, Node 2 and 3 failure Solution is to: - Remove downlist master choose and always choose local node downlist. For Node 1 in example above, downlist contains Node 2 and 3. - Keep code which informs clients about left nodes - Use joinlist as a authoritative source of nodes/clients which exists in membership This patch doesn't break backwards compatibility. I've walked thru all the patches which changed behavior of cpg to ensure patch does not break CPG behavior. Most important were: - `058f50314c` - Base. Code was significantly changed to handle double free by split group_info into two structures cpg_pd (local node clients) and process_info (all clients). Joinlist was - `97c28ea756` - This patch removed confchg_fn and made CPG sync correct - `feff0e8542` - I've tested described behavior without any issues - `6bbbfcb6b4` - Added idea of using heuristics to choose same downlist on all nodes. Sadly this idea was beginning of the problems described in `040fda8872`, `ac1d79ea7c`, `559d4083ed`, `02c5dffa5b`, `64d0e5ace0` and `b55f32fe2e` - `02c5dffa5b` - Made joinlist as authoritative source of nodes/clients but left downlist_master_choose as a source of information about left nodes Long story made short. This patch basically reverts idea of using heuristics to choose same downlist on all nodes. (ported from needle `9c2a97f4f9`) Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-04-30 14:37:20 +02:00
Chris Lamb	82b81990aa	man: Make the manpages reproducible Whilst working on the Reproducible Builds effort [0], we noticed that corosync could not be built reproducibly. This is because, whilst it uses SOURCE_DATE_EPOCH[1], the output varies depending on the current timezone. (The LC_ALL is not needed as we only use %Y-%m-%d) This was originally filed in Debian as #896441. [0] https://reproducible-builds.org/ [1] https://reproducible-builds.org/specs/source-date-epoch/ [2] https://bugs.debian.org/896441 Signed-off-by: Chris Lamb <lamby@debian.org> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-04-25 15:24:19 +02:00
Jan Friesse	e45bbcc92a	totemsrp: Fix leave message regression Leave message in totem is just join message where leaving member is excluded from member list and included in fail list. It also contains special nodeid in header.nodeid and system_from.nodeid fields. Before "totem: Use nodeid ONLY in srp_addr" fix, most of the functions were using system_from addresses and not nodeid, which was used only in one specific case for memb_consensus_set function. After the patch, addresses are gone and only nodeid is used. Result is, that leaving node nodeid is not added into local fail list (my_faillist) so node is unable to reach consensus till token timeout, which starts new gather process. Solution is to send valid leaving node nodeid in system_from.nodeid and handle specific case for memb_consensus_set in memb_join_process. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-04-23 17:46:05 +02:00
Jan Friesse	dc590159f5	totemsrp: Log proc/fail lists in memb_join_process These information are useful and with trace log level they should not be too much irritating. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-04-23 17:45:51 +02:00
Jan Friesse	9b3782e48e	totemsrp: Fix srp_addr_compare There is regression caused by "totem: Use nodeid ONLY in srp_addr" patch in srp_addr_compare function. This function should be usable with qsort, so it should return values less than, equal to or greater than zero. It was however returning only zero or negation of a zero. Final results were unable to reach consensus in following test case: - 3 node cluster - start nodes 1, 2, 3 - shutdown node 3 - start node 3 - shutdown node 2 - start node 2 - shutdown node 1 After this steps, node 2 and 3 were unable to reach consensus. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-04-23 17:45:29 +02:00
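The qsort requirement mentioned in the srp_addr_compare commit above (return a value less than, equal to or greater than zero) can be sketched like this. The struct and function names are hypothetical; the only assumption taken from the log is that an address is identified by its nodeid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical address type carrying only a node ID. */
struct example_srp_addr {
	unsigned int nodeid;
};

/* Three-way comparison usable with qsort: negative, zero or positive. */
int example_srp_addr_compare(const void *a, const void *b)
{
	const struct example_srp_addr *addr_a = a;
	const struct example_srp_addr *addr_b = b;

	if (addr_a->nodeid < addr_b->nodeid) {
		return (-1);
	}
	if (addr_a->nodeid > addr_b->nodeid) {
		return (1);
	}

	return (0);
}

/* Sort a list of addresses by nodeid in ascending order. */
void example_sort_addrs(struct example_srp_addr *addrs, size_t count)
{
	assert(addrs != NULL);

	qsort(addrs, count, sizeof(*addrs), example_srp_addr_compare);
}
```

A comparator that only ever reports "equal" or "not equal" gives qsort no ordering to work with, so two nodes can end up with differently ordered lists, which fits the failure to reach consensus described in the commit.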
Ferenc Wágner	21b80818c5	tools: don't distribute what we can easily make Signed-off-by: Ferenc Wágner <wferi@debian.org> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-04-23 11:30:14 +02:00
Fabio M. Di Nitto	94dff3b765	Drop all references to SECURITY file File was removed by `6bdf0962ad`. Patch fixes master branch build again. Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-04-23 08:05:24 +02:00
Jan Friesse	6bdf0962ad	SECURITY: Remove SECURITY file Basically no information from SECURITY file is valid. Library interface and related uidgid are better described in manpages. LibNSS is not directly used any longer. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-04-20 15:30:55 +02:00
Ferenc Wágner	67c644e69b	NSS_NoDB_Init: the parameter is reserved, must be NULL Signed-off-by: Ferenc Wágner <wferi@debian.org> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-04-20 12:19:22 +02:00
Ferenc Wágner	83c3f620bb	Fix typo: defualt -> default Signed-off-by: Ferenc Wágner <wferi@debian.org> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-04-20 12:05:04 +02:00
Ferenc Wágner	baece74c39	Fix typo: sucesfully -> successfully Signed-off-by: Ferenc Wágner <wferi@debian.org> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-04-20 12:04:49 +02:00
Jan Friesse	ccb2290f84	totemsrp: Check join and leave msg length If number of proc_list, failed_list or active members is too high it may be impossible to put them into message, which is allocated on the stack what results in stack corruption. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-04-12 15:25:38 +02:00
Jan Friesse	c139255669	totemsrp: Implement sanity checks of received msgs Sanity checkers are used to prevent crashing because of accessing unallocated memory. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-04-12 15:25:33 +02:00
Rytis Karpuška	aa62c2c028	cpg: Handle fragmented message sending interrupt It turns out that there are some legitimate cases where fragmented messages might be interrupted during sending (e.g. CS_ERR_TRY_AGAIN or as in my case: CS_ERR_INTERRUPT). This creates a situation where LIBCPG_PARTIAL_FIRST is sent multiple times before receiving LIBCPG_PARTIAL_LAST. Solution is to drop incomplete message and start assembly of new message as libcpg should have reported error during sending of that incomplete message. Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-04-11 18:40:07 +02:00
Jan Friesse	69857efb5b	totem: Display IP of sender To make finding victim of incompatible messages easier, IP of sender is logged. Propagating IP in layers makes patch slightly larger. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-03-16 13:58:15 +01:00
Jan Friesse	0c509a25a7	totemsrp: Add magic and version into header Magic number (0xC070) together with version in every packet is used for detecting that other node is really Corosync 3.x. Endian_detector field is removed and magic number is now used instead. If received packet magic number differs, guessing is used to show more about the source (Corosync 2.3+, 2.2 are quite reliable, Knet and unencrypted Corosync 2.1/2.0/1.x/OpenAIS are semi-reliable and encrypted Corosync 2.1/2.0/1.x/OpenAIS are quite unreliable). Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-03-16 13:57:55 +01:00
Christine Caulfield	066525efd3	knet: Fix display of links with unconfigured link0 because totemknet always configures link0 as loopback even if it's not known to corosync, we need to filter it out when returning the link status, as things get misaligned in cfg. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-03-16 13:11:13 +01:00
Jan Friesse	b3f3a1df26	main: Set errno before calling of strtol Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-03-02 17:29:22 +01:00
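Why errno has to be set before calling strtol: the function reports overflow only through errno, and it never resets errno on success, so a stale value from an earlier call would look like a failure. A minimal sketch follows; example_parse_long is a hypothetical helper, not the code changed by the commit above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Parse a base-10 number from str. Returns 0 and stores the result in
 * *value on success, -1 on overflow, empty input or trailing garbage.
 */
int example_parse_long(const char *str, long *value)
{
	char *endptr;
	long res;

	assert(str != NULL && value != NULL);

	/* Clear errno first so ERANGE can only come from this call. */
	errno = 0;
	res = strtol(str, &endptr, 10);

	if (errno != 0 || endptr == str || *endptr != '\0') {
		return (-1);
	}

	*value = res;

	return (0);
}
```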
Jan Friesse	4ec3d590fa	quorumtool: Don't set our_flags without v_handle Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-03-02 17:29:20 +01:00
Jan Friesse	db069ebd72	sam_test_agent: Remove unused assignment Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-03-02 17:29:18 +01:00
Jan Friesse	883dbeb953	blackbox: Quote subshell result properly Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-03-02 17:29:16 +01:00
Jan Friesse	e72b4fee62	init: Quote subshell result properly Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2018-03-02 17:29:11 +01:00
Christine Caulfield	d2876abed3	cfgtool: Don't assume link ID is a single char For the moment link-ids are a single digit, but that could change and the tools shouldn't be quite so fragile. So parse the interface_name properly by looking for the space between the linkID and the IP. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-03-01 16:09:46 +01:00
Christine Caulfield	2c20590d16	knet: Always use link0 for loopback Even if it's not used for anything else. Also, make cfgtool show the correct link ID when links are not contiguous Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-03-01 14:23:20 +01:00
Christine Caulfield	111bfbc11d	totem: Fix debug warnings printed by knet Fix crash introduced a couple of commits ago in iface_get Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-03-01 14:22:22 +01:00
Christine Caulfield	f5871c6b4c	config: Allow use of ring0_addr Allow ring0_addr to be used in place of 'name' for backwards compatibility Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-03-01 14:21:37 +01:00
Christine Caulfield	7a639d1b62	config: Update message when local host isn't found Make the message more representative of what's going on. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-03-01 14:20:00 +01:00
Christine Caulfield	386d710ed1	cfg: Fix cfg_get_node_addrs so that DLM works Also update copyright dates Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-03-01 14:19:45 +01:00
Christine Caulfield	f5b690bd96	totem: Return interface count correctly Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-03-01 14:19:12 +01:00
Christine Caulfield	fc8580bdbf	totem: Use nodeid ONLY in srp_addr This shrinks the srp_addr (and consequently every packet sent by corosync) so that instead of containing loads of IP addresses to identify a node, it just sends the nodeid. This then allows us to make ring0 optional and replaceable when running knet. It also means that we need some other way of identifying the local node in corosync.conf, so the nodelist.node.name entry is now mandatory and is mapped to the local host using the same algorithm as used in cman. This code needs LOTS of testing as it touches a huge amount of totemsrp and totemconfig. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-03-01 14:18:51 +01:00
Fabio M. Di Nitto	6f784804fe	[rpm] use rpm macros to identify build distro thanks Honza for spotting it Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-02-14 09:48:01 +01:00
Fabio M. Di Nitto	30c7f3319f	[rpm] fixup corosync.spec.in to build on opensuse - move dbus-devel and nss-devel BuildRequires to file based dependency. Those 2 BR have different names in OpenSUSE vs Fedora/RHEL/Centos. This is kind of controversial as most distribution prefers a package based build dependency, but the rpm version that supports BuildRequires: foo || bar is only available in rawhide and tumbleweed (aka no stable releases are shipping it yet). In order to build rpms in CI and have some level of flexibility with upstream spec file, we need to compromise a bit. - add explicit --docdir OpenSUSE does not ship docs in the normal dir and their rpm macro does not appear to set it for us. Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-02-14 09:22:27 +01:00
Rytis Karpuška	105f3ae98c	totempg: Fix corrupted messages Commit `899cb29983` changed copy_len to iovec[i].iov_len, assuming copy_len is always the same as iovec[i].iov_len under those circumstances, but it missed the possibility of a small message being partly put at the end of a packet, which cuts this message in two parts and therefore makes copy_len not equal to iovec[i].iov_len. This is a revert of `899cb29983`. Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2018-02-09 17:38:05 +01:00

4160 Commits