mirror_corosync

mirror of https://git.proxmox.com/git/mirror_corosync synced 2025-07-27 09:22:03 +00:00

Author	SHA1	Message	Date
Steven Dake	7d5e588931	totemsrp: free messages originated in recovery rather than rely on messages_free Relying on messages_free may seem like it should work, but it leads to a situation where every node has released the messages, yet some nodes think messages are missing. The output then looks like "Retransmit: #" in repetition. This patch frees those messages immediately during the transition to the OPERATIONAL state and sets the internal variables totemsrp depends upon to the proper values. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2011-03-24 09:25:15 -07:00
Steven Dake	ef05817ce5	totemsrp: Only restore old ring id information one time The current code stores the current ring information every time a commit token is generated. This causes the old ring id used for comparison purposes to increase if a token is lost in commit or recovery, resulting in failure of totem. This patch changes the behavior to only store the old ring id one time when the commit token is received, and then further commit token ring id saves are not done until OPERATIONAL is reached. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2011-03-24 09:22:34 -07:00
Steven Dake	1a7b7a39f4	totemsrp: Remove recv_flush code The recv_flush code is no longer necessary because of the miss_count_count addition. It can in some cases lead to register corruption because of interactions with -fstack-protector, the recursive nature of how this code works, and interactions with the optimizer in some versions of gcc. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2011-03-24 09:21:27 -07:00
Steven Dake	d99fba72e6	Resolve abort during simultaneous stopping of at least 4 nodes Consider 5 nodes. Nodes 3 and 4 are stopped (by random stopping). Nodes 1, 2, and 5 form a new configuration, and during recovery nodes 1 and 2 are stopped (via service corosync stop). This causes node 5 never to finish recovery within the timeout period, triggering a token loss in recovery. Bug #623176 resolved an assert which happens because the full ring id was being restored. The resolution to Bug #623176 was to not restore the full ring id, and instead operate (according to specifications) on the new ring id. Unfortunately this exposes a problem whereby the restarting of nodes 1-4 generates the same ring id. This ring id gets to the recovery-failed node 5, which is now in gather, and triggers a condition not accounted for in the original totem specification. It appears later work from Dr. Agarwal's PhD dissertation considers this scenario. That solution entails rejecting the regular token in the above condition. Since the ring id is also used to make decisions for commit token acceptance, we must also take care to reject the regular token in all cases after transitioning from OPERATIONAL. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-03-21 09:26:35 -07:00
Angus Salkeld	0ad2494ae7	Fix some "set but not used" warnings [-Wunused-but-set-variable] Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-03-16 07:13:42 +11:00
Zane Bitter	dddaeef21c	Allocate packet buffers in the transport drivers This change paves the way for eliminating a copy within the Infiniband driver in the future by transferring responsibility for allocating and freeing message buffers to the transport driver layer. Tested under valgrind on a single-node cluster. Signed-off-by: Zane Bitter <zane.bitter@gmail.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-03-11 20:38:28 -07:00
Steven Dake	6aa47fde95	Fix abort when token is lost in RECOVERY state A commit token should be rejected when a token is lost in the recovery state. This occurs naturally because the ring id increases by 4 for every new ring. Prior to this patch, if the token was lost, the old ring id information was restored, causing a commit token to be accepted when it should be rejected. This erroneously accepted commit token would lead to an assertion which is fixed by this patch. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-03-07 17:15:05 -07:00
Steven Dake	7471c88346	Don't assert when ring id file is less than 8 bytes If the ring id file for the processor is less than 8 bytes, totemsrp would assert. Our speculation is that this condition happens during a fencing operation or local filesystem corruption. With this patch, Corosync will create fresh ring id file data when the incorrect number of bytes are read from the ring id. Amend to use sizeof the strerror string length and PATH_MAX for the path length. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-02-24 15:34:39 -07:00
Steven Dake	6646a864b4	Handle delayed multicast packets that occur with switches Some switches delay multicast packets vs the unicast token. This patch works around that problem by providing a new tuneable called miss_count_const. This tuneable works by counting the number of times a message is found missing and once reaching the const value, marks it as missing in the retransmit list. This improves performance and doesn't display warning messages about missed multicast messages when operating in these switching environments. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-01-11 10:34:46 -07:00
Jan Friesse	b9df4424b1	Display warning when not possible to form cluster This may typically happen if a local firewall is enabled. The patch adds a new statistics item called continuous_gather, which holds the number of times the gather state has been entered continuously. If this number is bigger than MAX_NO_CONT_GATHER, a warning message is displayed. This is also used on exit, so stopping corosync is now possible even with the firewall enabled. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2010-12-03 10:11:11 +01:00
Steven Dake	bb05aed93f	Add the UDPU transport The UDPU transport is useful for those deployments which can't use multicast. UDPU works by using UDP unicast, which is fully supported by every switch manufacturer by default and doesn't rely on a functional IGMP implementation. An example of the UDPU transport is contained in the corosync.conf.example.udpu file which shows a 16 node cluster. This file should be copied to each node in the cluster and IP addresses changed as appropriate. Amended to remove dead udpu REUSEADDR socket option. Signed-off-by: Steven Dake <sdake@redhat.com>	2010-11-18 14:21:30 -07:00
Steven Dake	fef259970a	Remove cancel token retransmit timeout. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@3012 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-08-03 17:31:33 +00:00
Steven Dake	b8878d2e76	Remove reset of token timeout on retransmitted token reception. The timer should only be reset when a real token is received or membership protocol could run into problems with certain timing parameters. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2988 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-07-14 18:35:36 +00:00
Steven Dake	95615b2fec	Fix fail list fault that occurs in very rare circumstances. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2984 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-07-03 21:54:22 +00:00
Steven Dake	22471e113d	Fix fail to receive logic which occurs very rarely on high loss networks with software based multicast. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2919 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-06-03 21:36:21 +00:00
Steven Dake	79c60fd0ad	Totem spec is clear: reject retransmitted tokens if token.aru = aru in token on last rotation ... do some logic Here is how the current code works: last_aru = instance->my_last_aru; instance->my_last_aru = token->aru; reject retransmitted tokens if token.aru = aru in token on last rotation ... do some logic The issue is last_aru will be set to token->aru when a token retransmission occurs before a new token arrives. This results in the "do some logic" part happening more often than it should. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2917 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-06-01 20:35:53 +00:00
Angus Salkeld	2a23dfc585	cov 10391: allow assert to check for a negative number git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2848 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-05-18 00:43:41 +00:00
Angus Salkeld	92985c31d9	cov 10405: remove unused pointer from totemsrp. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2845 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-05-18 00:12:52 +00:00
Angus Salkeld	18326ad242	cov 10392: remove pointless assert backlog is unsigned git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2842 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-05-16 21:40:19 +00:00
Angus Salkeld	3eb76c2154	cov 10382: improve error handling around open() git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2825 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-05-14 02:05:32 +00:00
Steven Dake	5672c3efa8	The retransmit token storage area is an improper type of an array of pointers rather than a pointer to a buffer. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2795 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-04-30 05:18:02 +00:00
Steven Dake	005b9af59d	When a message is retransmitted, a memmove operation is done to remove the newly retransmitted entry from the list. It is possible this memmove operation can buffer overflow because it has an invalid length calculation fixed by this revision. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2794 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-04-30 05:15:41 +00:00
Steven Dake	80d621e25f	Allow maximum entries in the retransmit queue when recovery takes place. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2793 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-04-30 05:14:08 +00:00
Steven Dake	107ef19913	Save the ring id and restore it properly when the recovery operation fails as a result of a new gather or token loss. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2792 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-04-30 05:12:26 +00:00
Steven Dake	084d19cddc	Fix problem where retransmissions don't occur resulting in failure to receive condition. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2685 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-03-20 20:08:38 +00:00
Angus Salkeld	20f3331d0e	convert strerror() into strerror_r() git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2665 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-02-25 19:28:36 +00:00
Steven Dake	bc64fbcb58	Patch to set unset value in token hold cancel structure as to not crash wireshark. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2660 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-02-18 20:08:39 +00:00
Angus Salkeld	8671945cef	totemsrp: fix transitional configuration changes with long token timeouts git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2651 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-02-02 06:24:01 +00:00
Steven Dake	dc2a0e68d6	Remove invalid assertion in totemsrp. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2640 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-12-15 19:22:36 +00:00
Steven Dake	208d907f63	Remove string overwrite if many recovery messages are originated. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2580 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-12-08 00:01:39 +00:00
Steven Dake	f434e74faa	Remove compiler warning in totemsrp. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2578 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-12-07 23:21:01 +00:00
Steven Dake	b4367e6075	Set boolean indicating the retrans flag was set to 1 to 0 when setting retrans flag in token to zero. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2573 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-12-07 18:41:49 +00:00
Steven Dake	ff5cd7d57c	Make assertions for range checking of message delivery check with the define instead of magic numbers that are not valid if the define changes. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2572 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-12-07 18:22:48 +00:00
Steven Dake	adf8d6db24	Prevent lockup in recovery state in totem after 206 messages are originated. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2569 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-12-07 05:03:25 +00:00
Steven Dake	3020fd5742	Fix recovery messages to be proper length to remove segfault that occurs during recovery. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2568 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-12-07 05:02:28 +00:00
Angus Salkeld	c052bd3f3f	stats: don't calloc the totemsrp stats struct. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2564 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-12-03 18:18:29 +00:00
Angus Salkeld	acf9a8d85f	Correct some ugly indentation. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2563 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-12-03 18:13:52 +00:00
Steven Dake	72fe262478	Start pause timer at initialization so first gather doesn't result in pause timeout operations. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2555 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-11-30 19:11:20 +00:00
Angus Salkeld	29eb20a389	Rename totem_new_msg_signal() to something more generic. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2553 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-11-29 18:42:00 +00:00
Angus Salkeld	948ca19aa7	Add some missing calls to increment the relevant stats. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2519 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-10-12 21:56:23 +00:00
Angus Salkeld	73a24c0352	Add totem stats to objdb. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2517 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-10-12 17:30:20 +00:00
Steven Dake	21825d46ea	Fix incorrect assertion with frame sizes of 9000. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2395 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-08-28 01:19:30 +00:00
Jan Friesse	c50a6bd065	Support for monotime timer This patch should solve problems with corosync and ntp, by using clock_gettime where it makes sense. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2373 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-07-27 10:12:55 +00:00
Steven Dake	69928e301a	Add notification when totem has completed initialization. This triggers the initialization of the service engines which may need totem for initialization. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2372 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-07-27 02:00:05 +00:00
Steven Dake	f9f663f459	Add a target token set completed callback in totemrrp and below layers. Handle management of callback in totemsrp. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2371 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-07-27 00:29:32 +00:00
Steven Dake	5eae4c135c	Optimization of totemsrp and below by removing hdb usage. cpgbench shows results of 4% to 20% increase in tps and mbs depending on hardware. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2369 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-07-22 22:10:35 +00:00
Steven Dake	fc2de3db2a	Simplify notifications from totem at the notice level. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2315 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-06-28 05:26:25 +00:00
Steven Dake	23aea08ae4	Slay the debug messages coming out at notice level in totem. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2313 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-06-28 05:12:27 +00:00
Steven Dake	47c6bc3aaf	Remove totemsrp warning. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2312 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-06-28 04:57:31 +00:00
Steven Dake	e448603f2f	Add ability to detect process pause and not implode the membership algorithm when this occurs. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2304 fd59a12c-fef9-0310-b244-a6a79926bd2f	2009-06-26 21:39:44 +00:00

180 Commits