mirror of https://git.proxmox.com/git/mirror_corosync, synced 2025-10-31 17:54:14 +00:00

|  e48ddf99a6 Today I observed one reason why corosync runs into the
FAILED TO RECEIVE state.
There were five nodes (A, B, C, D, E) in my test, and I limited the UDP
receive rate of node C with the following iptables commands:
iptables -A INPUT -i eth0 -p udp -m limit --limit 10000/s
--limit-burst 1 -j ACCEPT
iptables -A INPUT -i eth0 -p udp -j DROP
About one hour later, node C had missed some MCAST messages.
Its state was as follows:
==state of C node==
my_aru:0x805
my_high_seq_received:0xC2C
my_aru_count:7
=>received MCAST message with seq:806 from node B
=>enter *message_handler_mcast*
  =>add this message to regular_sort_queue
  ...
  =>enter *update_aru* function
    => range = (my_high_seq_received - my_aru)
             = (0xC2C - 0x805)
             = 1063
    => if range>1024, do nothing and return directly.
==END==
According to this logic, once (my_high_seq_received - my_aru) > 1024, my_aru
is no longer updated, even though corosync can still receive MCAST messages
retransmitted by other nodes.
But at that time, my_aru_count was only 7, so corosync on node C would
stay in this state until my_aru_count reached fail_to_recv_const (the
default value is 2500). That is a long time for corosync, and it is
wasted.
To solve this issue, maybe we can enlarge the range condition in the
update_aru function? Or we could just skip the range check entirely;
that seems harmless, because fail_to_recv_const is already used to
control this case.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com> | ||
|---|---|---|
| .. | ||
| .gitignore | ||
| apidef.c | ||
| apidef.h | ||
| coroparse.c | ||
| crypto.c | ||
| crypto.h | ||
| cs_queue.h | ||
| evil.c | ||
| evil.h | ||
| fsm.h | ||
| ipc_glue.c | ||
| logsys.c | ||
| main.c | ||
| main.h | ||
| mainconfig.c | ||
| mainconfig.h | ||
| Makefile.am | ||
| objdb.c | ||
| quorum.c | ||
| quorum.h | ||
| schedwrk.c | ||
| schedwrk.h | ||
| service.c | ||
| service.h | ||
| sync.c | ||
| sync.h | ||
| syncv2.c | ||
| syncv2.h | ||
| timer.c | ||
| timer.h | ||
| totemconfig.c | ||
| totemconfig.h | ||
| totemiba.c | ||
| totemiba.h | ||
| totemip.c | ||
| totemmrp.c | ||
| totemmrp.h | ||
| totemnet.c | ||
| totemnet.h | ||
| totempg.c | ||
| totemrrp.c | ||
| totemrrp.h | ||
| totemsrp.c | ||
| totemsrp.h | ||
| totemudp.c | ||
| totemudp.h | ||
| totemudpu.c | ||
| totemudpu.h | ||
| util.c | ||
| util.h | ||
| vsf_quorum.c | ||
| vsf_ykd.c | ||
| vsf.h | ||
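
The range check described in the commit message above can be illustrated with a small, self-contained C sketch. This is not the code from totemsrp.c: the struct, the `sketch_` names, the window size, and the `enforce_limit` flag are all hypothetical, and it assumes that "seq:806" in the message means 0x806. It only models the behaviour the message describes: with the 1024 limit enforced, `my_aru` stays at 0x805 even after the retransmitted message arrives; without the limit, it advances.

```c
#include <stdbool.h>

#define SKETCH_WINDOW   4096u /* hypothetical sort-queue size, not from the source */
#define ARU_RANGE_LIMIT 1024u /* the limit quoted in the commit message */

/* Hypothetical, simplified stand-in for the per-node state in the message. */
struct sketch_state {
    unsigned int my_aru;               /* highest seq received with no gaps */
    unsigned int my_high_seq_received; /* highest seq seen so far */
    bool received[SKETCH_WINDOW];      /* received[seq % SKETCH_WINDOW] */
};

/* Distance between the highest seq seen and the contiguous prefix. */
static unsigned int sketch_range(const struct sketch_state *s)
{
    return s->my_high_seq_received - s->my_aru;
}

/*
 * Advance my_aru over every contiguously received seq.
 * With enforce_limit set, return early when range > 1024, as the
 * commit message describes. Returns true if my_aru moved.
 */
static bool sketch_update_aru(struct sketch_state *s, bool enforce_limit)
{
    unsigned int range = sketch_range(s);
    unsigned int saved = s->my_aru;
    unsigned int i;

    if (enforce_limit && range > ARU_RANGE_LIMIT) {
        return false; /* my_aru is left untouched */
    }
    for (i = 1; i <= range; i++) {
        if (!s->received[(saved + i) % SKETCH_WINDOW]) {
            break; /* first gap: stop advancing */
        }
        s->my_aru = saved + i;
    }
    return s->my_aru != saved;
}
```

With `my_aru = 0x805` and `my_high_seq_received = 0xC2C`, `sketch_range` gives 0xC2C - 0x805 = 3116 - 2053 = 1063, which exceeds 1024, so the limited variant returns without advancing `my_aru`.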