Today I observed one of the reasons that corosync runs into the
FAILED TO RECEIVE state.
There were five nodes (A, B, C, D, E) in my test, and I limited the rate
of incoming UDP traffic on node C with the following iptables commands:
iptables -A INPUT -i eth0 -p udp -m limit --limit 10000/s
--limit-burst 1 -j ACCEPT
iptables -A INPUT -i eth0 -p udp -j DROP
About one hour later, node C had missed some MCAST messages; its state
was as follows:
==state of C node==
my_aru:0x805
my_high_seq_received:0xC2C
my_aru_count:7
=>received MCAST message with seq:806 from node B
=>enter *message_handler_mcast*
=>add this message to regular_sort_queue
...
=>enter *update_aru* function
=> range = (my_high_seq_received - my_aru)
= (0xC2C - 0x805)
= 1063
=> if range > 1024, do nothing and return directly.
==END==
According to this logic, once (my_high_seq_received - my_aru) > 1024, my_aru
will not be updated even though corosync can still receive MCAST messages
retransmitted by other nodes.
But at that time, my_aru_count was only 7. So corosync on node C
would stay in this state until my_aru_count increased to
fail_to_recv_const (default value 2500). That is a long time
for corosync, and it was simply wasted.
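A minimal sketch of the range check described above (names and types are
simplified and this is not the exact corosync source); with node C's values
the check skips the update, and only my_aru_count, compared against
fail_to_recv_const, can eventually end the condition:

        #include <stdio.h>

        #define RANGE_LIMIT 1024

        struct node_state {
                unsigned int my_aru;               /* highest seq delivered in order */
                unsigned int my_high_seq_received; /* highest seq seen on the wire */
                unsigned int my_aru_count;         /* checks made without progress */
        };

        /* Returns 1 if my_aru may be advanced, 0 if the update is skipped. */
        static int update_aru_sketch(const struct node_state *s)
        {
                unsigned int range = s->my_high_seq_received - s->my_aru;

                if (range > RANGE_LIMIT) {
                        /* Gap too large: my_aru is never advanced here, so the
                         * node stays stuck until my_aru_count reaches
                         * fail_to_recv_const. */
                        return 0;
                }
                /* ... walk the sort queue and advance my_aru over contiguous seqs ... */
                return 1;
        }

        int main(void)
        {
                struct node_state c = { 0x805, 0xC2C, 7 };

                printf("range = %u -> update %s\n",
                       c.my_high_seq_received - c.my_aru,
                       update_aru_sketch(&c) ? "performed" : "skipped");
                return 0;
        }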
To solve this issue, maybe we can enlarge the range condition in the
update_aru function? Or we could simply ignore the range check; that
seems harmless, because we already use fail_to_recv_const to
control this situation.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Although an incorrect nodeid does not affect the program's logic, it is
confusing when we add logs to record the transmission path of the token
in debug mode.
Signed-off-by: Yunkai Zhang <qiushu.zyk@taobao.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
According to the totem protocol, a node should enter the GATHER state when it
receives a JoinMSG in the OPERATIONAL state. If we discard the JoinMSG in the
OPERATIONAL state, the node that sent it cannot receive a response
until the other nodes reach the token loss timeout.
This bug causes nodes that have entered the GATHER state to spend more time
rejoining the ring, which in turn makes nodes reach the token expired timeout
more easily.
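A hedged sketch of the intended dispatch (simplified names, not the actual
corosync handler): a JoinMSG arriving while OPERATIONAL should trigger the
transition to GATHER instead of being dropped:

        #include <stdio.h>

        enum memb_state { OPERATIONAL, GATHER, COMMIT, RECOVERY };

        /* Stand-in for the real state-transition helper. */
        static void memb_state_gather_enter(void)
        {
                printf("entering GATHER state\n");
        }

        static void handle_memb_join_sketch(enum memb_state state)
        {
                switch (state) {
                case OPERATIONAL:
                        /* Previously the JoinMSG was discarded here, so the
                         * sender had to wait for the other nodes' token loss
                         * timeout before getting any response. */
                        memb_state_gather_enter();
                        break;
                case GATHER:
                case COMMIT:
                case RECOVERY:
                        /* state-specific handling, unchanged by this patch ... */
                        break;
                }
        }

        int main(void)
        {
                handle_memb_join_sketch(OPERATIONAL);
                return 0;
        }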
Signed-off-by: Yunkai Zhang <qiushu.zyk@taobao.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
We found that sheepdog receives more than one confchg message when a
network partition occurs. For example, suppose the cluster has 4
nodes: N1, N2, N3, N4, and they initially form a single ring. After a
while a network partition occurs and the single ring splits into two
sub-rings: ring(N1, N2, N3) and ring(N4). The sheepdog instance in ring(N4)
will receive the following confchg messages in turn:
memb: N2,N3,N4 Left:N1 Joined:null
memb: N3,N4 Left:N2 Joined:null
memb: N4 Left:N3 Joined:null
This patch fixes this bug, and the client will only receive one
confchg event in this case:
memb: N4 Left:N1,N2,N3 Joined:null
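For illustration, a confchg callback written against the public CPG API (this
is only an example consumer, not part of the patch): with the fix it fires
once for the partition, with N1, N2 and N3 all in left_list, instead of three
times with a single departure each:

        #include <stdio.h>
        #include <corosync/cpg.h>

        static void confchg_cb(
                cpg_handle_t handle,
                const struct cpg_name *group_name,
                const struct cpg_address *member_list, size_t member_list_entries,
                const struct cpg_address *left_list, size_t left_list_entries,
                const struct cpg_address *joined_list, size_t joined_list_entries)
        {
                size_t i;

                printf("confchg: %zu member(s), %zu left, %zu joined\n",
                       member_list_entries, left_list_entries, joined_list_entries);
                for (i = 0; i < left_list_entries; i++)
                        printf("  left: nodeid %u\n", left_list[i].nodeid);
        }

        /* Passed to cpg_initialize() when setting up the CPG connection. */
        static cpg_callbacks_t callbacks = {
                .cpg_deliver_fn = NULL,
                .cpg_confchg_fn = confchg_cb,
        };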
Signed-off-by: Yunkai Zhang <qiushu.zyk@taobao.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
Add a new object called totem.interface.dynamic to allow creation/deletion
of new child objects using the corosync-objctl utility:
To add a new member:
linux# corosync-objctl -c totem.interface.dynamic.10-211-55-12
To delete an existing member:
linux# corosync-objctl -d totem.interface.dynamic.10-211-55-12
Corosync will dynamically add these members to the configuration and start
communicating with those nodes.
Signed-off-by: Anton Jouline <anton.jouline@cbsinteractive.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Recent changes in the patch "Get rid of hdb usage in totempg.h interface"
caused an incompatibility between the corosync API and totempg.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
This patch passes two test cases:
-------
Test #1
-------
Two node cluster - run cpgbench on each node, with totemsrp modified
using the following defines:
-------
Test #2
-------
5 node cluster
start the 5 nodes randomly at about the same time, wait 10 seconds and attempt
to send a message. If the message blocks on "TRY_AGAIN", a message loss has
likely occurred. Wait a few minutes without cycling the nodes and see if the
TRY_AGAIN state becomes unblocked.
If it doesn't, the test case has failed.
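A hedged sketch of the send path the test exercises (illustrative only;
cpgbench's actual loop may differ): the sender keeps retrying while
cpg_mcast_joined() returns CS_ERR_TRY_AGAIN, which is the blocked state
described above:

        #include <sys/uio.h>
        #include <corosync/cpg.h>

        /* Retry a CPG multicast while flow control reports TRY_AGAIN. */
        static cs_error_t send_with_retry(cpg_handle_t handle,
                                          const void *buf, size_t len,
                                          unsigned int max_tries)
        {
                struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
                cs_error_t err;
                unsigned int tries = 0;

                do {
                        err = cpg_mcast_joined(handle, CPG_TYPE_AGREED, &iov, 1);
                } while (err == CS_ERR_TRY_AGAIN && ++tries < max_tries);

                /* CS_OK means the message was accepted by totem; still seeing
                 * CS_ERR_TRY_AGAIN after max_tries matches the "blocked"
                 * condition the test checks for. */
                return err;
        }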
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
NSS is currently non-conditional. Allow NSS to be built conditionally.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
A memb_join operation that occurs during flushing can result in an
entry into the GATHER state from the RECOVERY state. This results in the
regular sort queue being used instead of the recovery sort queue, resulting
in a segfault.
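A hedged sketch (simplified, not the actual totemsrp source) of why the state
matters: delivery picks its sort queue from the current membership state, so
leaving RECOVERY for GATHER in the middle of flushing makes the code operate
on the regular sort queue while recovery traffic is still in flight:

        enum memb_state {
                MEMB_STATE_OPERATIONAL,
                MEMB_STATE_GATHER,
                MEMB_STATE_COMMIT,
                MEMB_STATE_RECOVERY
        };

        struct sort_queue { int placeholder; /* ... */ };

        struct srp_instance {
                enum memb_state memb_state;
                struct sort_queue regular_sort_queue;
                struct sort_queue recovery_sort_queue;
        };

        /* Messages received during RECOVERY belong to the recovery sort queue;
         * a premature switch to GATHER routes them to the regular queue instead. */
        static struct sort_queue *select_sort_queue(struct srp_instance *inst)
        {
                if (inst->memb_state == MEMB_STATE_RECOVERY)
                        return &inst->recovery_sort_queue;
                return &inst->regular_sort_queue;
        }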
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
There were too many false positives with passive-mode RRP when a high
number of messages was received.
This patch adds a new configurable variable, rrp_problem_count_mcast_threshold,
which is by default 10 times rrp_problem_count_threshold and is used as the
threshold for multicast packets in passive mode. The variable is unused in
active mode.
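For illustration, a corosync.conf totem section using the new option (the
values are only examples; the option may also be left unset to keep the
default of 10 times rrp_problem_count_threshold):

        totem {
                rrp_mode: passive
                rrp_problem_count_threshold: 10
                rrp_problem_count_mcast_threshold: 100
        }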
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
If all interfaces were faulty, passive_mcast_flush_send and related
functions ended up in an endless loop. This is now handled: if there is no
live interface, the message is dropped.
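A hedged sketch of the fix (simplified, not the actual totemrrp code): rotate
through the configured interfaces, skipping faulty ones, but stop after one
full pass so the message is dropped when every interface is faulty instead of
spinning forever:

        #define MAX_INTERFACES 2

        struct rrp_state {
                unsigned int interface_count;
                unsigned int xmit_iface;    /* interface used for last transmit */
                int faulty[MAX_INTERFACES]; /* 1 if the interface is marked faulty */
        };

        /* Returns the interface index to transmit on, or -1 to drop the message. */
        static int next_live_interface(struct rrp_state *s)
        {
                unsigned int tries;

                for (tries = 0; tries < s->interface_count; tries++) {
                        s->xmit_iface = (s->xmit_iface + 1) % s->interface_count;
                        if (!s->faulty[s->xmit_iface])
                                return (int)s->xmit_iface;
                }
                return -1; /* every interface faulty: caller drops the message */
        }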
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>