mirror_corosync

mirror of https://git.proxmox.com/git/mirror_corosync synced 2025-10-11 18:06:59 +00:00

Author	SHA1	Message	Date
Jerome FLESCH	99faa3b864	When flushing, discard only memb_join messages Patch solves problem when 1 ring out of 2 went up/down quite often. The simplest setup to reproduce bug is following: - 2 VMs, connected by 2 network interfaces - OS: Linux - On one of the VMs, a test program sending some CPG messages (see the script "test_corosync.sh" joined to this mail for example) Here are the Corosync logs we get when we do this setup: Jun 06 16:23:40 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed. Jun 06 16:23:40 corosync [CPG ] chosen downlist: sender r(0) ip(192.168.56.104) r(1) ip(192.168.57.104) ; members(old:1 left:0) Jun 06 16:23:40 corosync [MAIN ] Completed service synchronization, ready to provide service. Jun 06 16:24:37 corosync [TOTEM ] Marking ringid 1 interface 192.168.57.105 FAULTY Jun 06 16:24:38 corosync [TOTEM ] Automatically recovered ring 1 Jun 06 16:25:33 corosync [TOTEM ] Marking ringid 1 interface 192.168.57.105 FAULTY Jun 06 16:25:34 corosync [TOTEM ] Automatically recovered ring 1 Jun 06 16:26:35 corosync [TOTEM ] Marking ringid 1 interface 192.168.57.105 FAULTY Jun 06 16:26:36 corosync [TOTEM ] Automatically recovered ring 1 (...) The second ring goes down about every 2 minutes and automatically back up right after. We spent some times looking for the commit that introduced this bug, and it appears it's due the following one: Corosync 1.3.3 -> 1.3.4: `e27a58d93d` Corosync 1.4.1 -> 1.4.2: `be608c0502` Commit message: Ignore memb_join messages during flush operations I had a look at this commit, and it seems to me it's dropping too many packets: Because of this commit, while totemrrp_recv_flush() is called, Corosync drops memb_join packets, but also ORF tokens. In the end, it seems that sometimes, we drop so many of them that Corosync marks the ring as faulty. To fix that, only memb_join messages are dropped now. Signed-off-by: Jerome FLESCH <jerome.flesch@netasq.com> Reviewed-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2012-06-11 10:59:30 +02:00
Jan Friesse	3b7c2f0588	Update crypto_set API Also few leftovers from cfg is removed and version of totempg is increased to 5 to reflect all changes we made Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2012-03-15 17:33:53 +01:00
Fabio M. Di Nitto	c3f7d0ef3e	totem: don't send garbage onwire if we fail to crypt Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2012-03-14 15:30:40 +01:00
Fabio M. Di Nitto	0a6a6bbcfa	crypto: drop secauth and make crypto none work again keep totem.secauth config key for compatibility if the key is NOT set, crypto will default to aes256/sha1 if the key is set to "off", crypto is disabled. this reflects pretty much old behavior keywords totem.crypto_cipher and totem.crypto_hash can override secauth individually. Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2012-03-14 11:28:36 +01:00
Jan Friesse	ab1675f0fe	Parse and use hash and crypto from config file Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2012-03-13 17:38:59 +01:00
Jan Friesse	cb97ed186a	Rename totemcrypto Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2012-03-13 17:38:46 +01:00
Fabio M. Di Nitto	55e8476697	crypto: mask the crypto operations from totem packet size management totem doesn't need to understand what crypto does. totem needs to be able to tell crypto: "those are data, play with them" and crypto needs to return: "here are your scrambled data and the new size" similar to decrypt/verify. this way we add enough dynamic within crypto to change header size and all at any given time (for different hash algorithm for example) without affecting on wire compat. Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2012-03-13 15:50:58 +01:00
Jan Friesse	42a2f69e6f	onecrypt: move encryption code to crypto.c This will remove duplicity of code. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2012-03-13 12:23:13 +01:00
Jan Friesse	8cdd2fc493	Remove libtomcrypt Tomcrypt in corosync is for long time not updated. Because we have support for libnss, libtomcrypt can be removed. Also few leftovers (AES is 256 bits, not 128, ...) are removed. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2012-03-13 09:19:47 +01:00
Fabio M. Di Nitto	142ce8c3a1	totem: drop crypt_accept: concept/option this was another old onwire compat mode that is not useful anylonger. we can safely move the new model by default. According to Honza (real hardware 1 node testing) there are no performance impact. My tests (8 nodes VM cluster), there is up to 10/12% performance improvements up to 1M packet size where old and new models are equal. As a side note, nss still shows to be a performance loss on both real and virtual hw (without any kind of nss hw acceleration). Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2012-03-10 07:08:30 +01:00
Steven Dake	2ad0cdc832	Update copyright header dates in exec directory Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2012-02-13 17:05:04 -07:00
Angus Salkeld	3131601ce2	Remove all unneccessary "\n" from log messages These look ugly, are inconsistently done and just have to be removed later in libqb before calling syslog. Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2012-01-23 13:08:23 +11:00
Steven Dake	7c8e83ac34	Change all ais references to corosync Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Fabio Di Nitto <fdinitto@redhat.com>	2012-01-12 07:29:15 -07:00
Steven Dake	8ad583a54c	Move logsys.c into corosync binary instead of a shared object Our preferred shared logging system is exported via the libqb library. As a result, the corosync project no longer needs to export logsys.so and the code can be directly included in the binary. The header file can also be removed. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2012-01-06 18:19:59 -07:00
Steven Dake	48ffa8892d	Ignore memb_join messages during flush operations a memb_join operation that occurs during flushing can result in an entry into the GATHER state from the RECOVERY state. This results in the regular sort queue being used instead of the recovery sort queue, resulting in segfault. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2011-09-02 09:58:44 -07:00
Steven Dake	32f11337b1	Remove hdb.h header includes from unnecessary files The files in this patch do not use the hdb.h header. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-23 22:28:40 -07:00
Angus Salkeld	37e17e7a94	libqb: logging & trace Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:16 +10:00
Angus Salkeld	78e06739b7	libqb: remove worker thread - keep to one thread. Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:15 +10:00
Angus Salkeld	f717bc60e1	libqb: make timer api a wrapper around qb_loop timers. - change timeout value to nano seconds - fix timer handles (don't alloc on stack) Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:14 +10:00
Angus Salkeld	c6895faa05	libqb: change ipc -> qb_ipc IPC: return 0/-ENOBUFS from message handler IPC: use the new rate_limit API to improve perf. CPG: add send_async API & hook up flow control IPC: Fix flow control getting stuck. IPC: Port the remaining libs to use libqb IPC IPC: remove libqb flowcontrol API TEST: put cpg_dispatch() in it's own thread IPC: cleanup ipc_glue.c name everything cs_ipcs_() IPC: add back statistics IPC: remove coroipcc_ symbols from lib.versions IPC: init each se's IPC as it is loaded. IPC: use the new connection_closed() event to free the context. IPC: re-add zero copy functionality back IPC: remove cpg_mcast_joined_async() and make it the default -> now cpg_mcast_joined() == cpg_mcast_joined_async() libqb: expose a libqb error converter libqb: add missing error conversions libqb: remove repeat try loop in lib/cpg.c CPG: fix zero copy mcast CPG: use newer return codes Add ENOTCONN to qb_to_cs_error() libqb: fix error conversion from errno to cs_error_t in confdb libqb: change errno_to_cs to qb_to_cs_error libqb: add a cs_strerror() to get a more meaningful message libqb: fix some confusing error conversions. libqb: set the timeout on recv's to -1 (wait forever) Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:14 +10:00
Angus Salkeld	fce8a3c3b6	libqb: convert coropoll calls to qb_loop calls. Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:14 +10:00
Jan Friesse	ddb5214c2c	Revert "totemsrp: Remove recv_flush code" This reverts commit `1a7b7a39f4`. Reversion is needed to remove overflow of receive buffers and dropping messages. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2011-07-26 10:05:55 +02:00
Jerome Flesch	00434a4f10	Fix usage of strerror_r()/perror() Signed-off-by: Jerome Flesch <jerome.flesch@netasq.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-06-28 09:56:58 +02:00
Jan Friesse	531e81602f	totemudp: memset of proper size In totemudp_mcast_thread_state_constructor memset to sizeof(struct totemudp_mcast_thread_state) instead of size of pointer. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-06-03 11:09:27 +02:00
Steven Dake	1a7b7a39f4	totemsrp: Remove recv_flush code The recv_flush code is no longer necessary because of the miss_count_count addition. It can in some cases lead to register corruption because of interactions with -fstack-protector, the recursive nature of how this code works, and interactions with the optimizer in some versions of gcc. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2011-03-24 09:21:27 -07:00
Angus Salkeld	0ad2494ae7	Fix some "set but not used" warnings [-Wunused-but-set-variable] Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-03-16 07:13:42 +11:00
Zane Bitter	dddaeef21c	Allocate packet buffers in the transport drivers This change paves the way for eliminating a copy within the Infiniband driver in the future by transferring responsibility for allocating and freeing message buffers to the transport driver layer. Tested under valgrind on a single-node cluster. Signed-off-by: Zane Bitter <zane.bitter@gmail.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-03-11 20:38:28 -07:00
Angus Salkeld	2c46de5ac1	Add totem/interface/ttl config option. This adds a per-interface config option to adjust the TTL. Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2010-11-24 14:35:56 +11:00
Steven Dake	d209f7bb27	Fix leak in error path in nss encryption. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2857 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-05-19 05:30:18 +00:00
Angus Salkeld	7999995273	cov 10412: fix mem leak in encrypt_and_sign_nss() git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2854 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-05-19 04:35:25 +00:00
Angus Salkeld	202de1b8e2	cov 10411: fix leak in totemudp.c git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2833 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-05-16 21:26:15 +00:00
J�r�me Flesch	a5f2733211	Totemudp: Add debug logs when a call to sendmsg() fails git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2740 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-03-26 13:54:42 +00:00
Angus Salkeld	20f3331d0e	convert strerror() into strerror_r() git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2665 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-02-25 19:28:36 +00:00
Steven Dake	1978c9d471	Use nodeid instead of localhost ip for the case when binding to a loalhost interface. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2656 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-02-15 21:39:33 +00:00
Steven Dake	f9f663f459	Add a target token set completed callback in totemrrp and below layers. Handle management of callback in totemsrp. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2371 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-07-27 00:29:32 +00:00
Steven Dake	5eae4c135c	Optimization of totemsrp and below by removing hdb usage. cpgbench shows results of 4% to 20% increase in tps and mbs depending on hardware. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2369 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-07-22 22:10:35 +00:00
Steven Dake	5225b3809c	Initial infrastructure changes to support iba. Split totemnet into totemiba and totemudp and have totemnet call the appropriate transport calls. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2367 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-07-21 18:17:15 +00:00

37 Commits