mirror of
https://git.proxmox.com/git/mirror_corosync
synced 2025-07-27 12:34:31 +00:00
Today I observed one of the reasons corosync runs into the FAILED TO RECEIVE state. There were five nodes (A, B, C, D, E) in my test, and I limited the UDP transmission rate of node C with iptables: iptables -A INPUT -i eth0 -p udp -m limit --limit 10000/s --limit-burst 1 -j ACCEPT iptables -A INPUT -i eth0 -p udp -j DROP After one hour, node C had missed some MCAST messages. Its state was as follows: ==state of C node== my_aru:0x805 my_high_seq_received:0xC2C my_aru_count:7 =>received MCAST message with seq:806 from node B =>enter *message_handler_mcast* =>add this message to regular_sort_queue ... =>enter *update_aru* function => range = (my_high_seq_received - my_aru) = (0xC2C - 0x805) = 1063 => if range>1024, do nothing and return directly. ==END== According to this logic, once (my_high_seq_received - my_aru) > 1024, my_aru is no longer updated even though corosync can still receive MCAST messages retransmitted by other nodes. At that time my_aru_count was only 7, so corosync on node C would stay in this state until my_aru_count increased to fail_to_recv_const (the default value is 2500). That is a long time for corosync, and it is wasted. To solve this issue, maybe we can enlarge the range condition in the update_aru function? Or we could simply skip the range check; it seems harmless, because fail_to_recv_const already bounds this situation. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com> |
||
---|---|---|
build-aux | ||
conf | ||
cts | ||
exec | ||
include | ||
init | ||
lcr | ||
lib | ||
man | ||
pkgconfig | ||
services | ||
test | ||
tools | ||
.gitignore | ||
AUTHORS | ||
autobuild.sh | ||
autogen.sh | ||
configure.ac | ||
corosync.spec.in | ||
Doxyfile.in | ||
INSTALL | ||
LICENSE | ||
loc | ||
Makefile.am | ||
README.recovery | ||
SECURITY | ||
TODO |
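
The arithmetic in the commit message at the top of this page can be checked with a small C sketch. It is only an illustration of the reported condition, not the code from corosync's totem implementation: the names `RANGE_LIMIT`, `aru_range` and `aru_update_skipped` are hypothetical, and only the limit of 1024 and the two sequence numbers come from the message.

```c
/* Hypothetical stand-in for the limit of 1024 quoted in the commit message. */
#define RANGE_LIMIT 1024u

/* Distance between the highest sequence number seen and my_aru. */
static unsigned int aru_range(unsigned int my_aru,
	unsigned int my_high_seq_received)
{
	return my_high_seq_received - my_aru;
}

/*
 * Returns 1 when the described check would return early without
 * advancing my_aru, 0 when the update would go ahead.
 */
static int aru_update_skipped(unsigned int my_aru,
	unsigned int my_high_seq_received)
{
	return aru_range(my_aru, my_high_seq_received) > RANGE_LIMIT;
}
```

With the reported values, `aru_range(0x805, 0xC2C)` is 1063 and `aru_update_skipped(0x805, 0xC2C)` is 1, which matches the stall described in the message: retransmitted messages arrive, but my_aru stays where it is.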
SYNCHRONIZATION ALGORITHM:
-------------------------
The synchronization algorithm is used by every service in corosync to
synchronize the state of the system.

The synchronization algorithm has 4 events. These events are in fact
functions that are registered in the service handler data structure. They
are called by the synchronization system whenever the network partitions or
merges.

init:
Within the init event a service handler should record temporary state variables
used by the process event.

process:
The process event is responsible for executing synchronization. This event
returns a state indicating whether it has completed or not. This allows
synchronization to be interrupted when the message queue buffer is full and
resumed later. The process event will be called again by the synchronization
service if requested to do so by the value returned from process.

abort:
The abort event occurs when a processor failure occurs during synchronization.

activate:
The activate event occurs when process has returned that no more processing is
necessary for any node in the cluster and all messages originated by process
have completed.

CHECKPOINT SYNCHRONIZATION ALGORITHM:
------------------------------------
The purpose of the checkpoint synchronization algorithm is to synchronize
checkpoints after a partition or a merge of two or more partitions. The
secondary purpose of the algorithm is to determine the cluster-wide reference
count for every checkpoint.

Every cluster contains a group of checkpoints. Each checkpoint has a
checkpoint name and checkpoint number. The number is used to uniquely reference
an unlinked but still open checkpoint in the cluster.

Every checkpoint contains a reference count which is used to determine when
that checkpoint may be released. The algorithm rebuilds the reference count
information each time a partition or merge occurs.

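The four events of the SYNCHRONIZATION ALGORITHM section above can be pictured as a table of function pointers driven by a loop. The sketch below is a simplified, single-node illustration only. All names in it (`sync_handlers`, `sync_result`, `example_ctx`, `run_sync`, `fail_after`) are hypothetical and are not corosync's actual structures, and the real synchronization service also waits for every node and for message delivery before it calls activate.

```c
/* Hypothetical return values of the process event. */
enum sync_result { SYNC_DONE = 0, SYNC_CONTINUE = 1 };

/* Hypothetical handler table holding the four events. */
struct sync_handlers {
	void (*init)(void *ctx);
	enum sync_result (*process)(void *ctx);
	void (*abort)(void *ctx);
	void (*activate)(void *ctx);
};

/* Example service: sends `total` messages, at most `queue_room` per call. */
struct example_ctx {
	int total;
	int sent;
	int queue_room;
	int activated;
	int aborted;
};

static void example_init(void *p)
{
	struct example_ctx *c = p;

	/* Record the temporary state used by the process event. */
	c->sent = 0;
	c->activated = 0;
	c->aborted = 0;
}

static enum sync_result example_process(void *p)
{
	struct example_ctx *c = p;
	int room = c->queue_room;

	/* Queue messages until the (simulated) queue is full. */
	while (room > 0 && c->sent < c->total) {
		c->sent++;
		room--;
	}
	return (c->sent < c->total) ? SYNC_CONTINUE : SYNC_DONE;
}

static void example_abort(void *p)
{
	struct example_ctx *c = p;

	c->sent = 0;
	c->aborted = 1;
}

static void example_activate(void *p)
{
	struct example_ctx *c = p;

	c->activated = 1;
}

static const struct sync_handlers example_handlers = {
	example_init, example_process, example_abort, example_activate
};

/*
 * Drives one synchronization round. A processor failure is simulated
 * after `fail_after` process calls (0 means no failure). Returns the
 * number of process calls made, or -1 if the round was aborted.
 */
static int run_sync(const struct sync_handlers *h, void *ctx, int fail_after)
{
	int calls = 0;

	h->init(ctx);
	for (;;) {
		if (fail_after > 0 && calls == fail_after) {
			h->abort(ctx);
			return -1;
		}
		calls++;
		if (h->process(ctx) == SYNC_DONE)
			break;
	}
	h->activate(ctx);
	return calls;
}
```

With `total = 5` and `queue_room = 2`, `run_sync(&example_handlers, &ctx, 0)` calls process three times before activate runs. Passing `fail_after = 1` aborts the round after the first process call instead. The sketch assumes `queue_room` is greater than zero; otherwise the loop would never finish.
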
local variables
my_sync_state may have the values SYNC_CHECKPOINT, SYNC_REFCOUNT
my_current_iteration_state contains any data used to iterate the checkpoints
	and sections.
checkpoint data
	refcount_set contains a reference count for every node, consisting of
	the number of opened connections to the checkpoint and the node identifier
	refcount contains a summation of every reference count in the refcount_set

pseudocode executed by a processor when the synchronization service calls
the init event
	call sync_checkpoints_enter

pseudocode executed by a processor when the synchronization service calls
the process event in the SYNC_CHECKPOINT state
	if lowest processor identifier of old ring in new ring
		transmit checkpoints or sections starting from my_current_iteration_state
	if all checkpoints and sections could be queued
		call sync_refcounts_enter
	else
		record my_current_iteration_state

	require process to continue

pseudocode executed by a processor when the synchronization service calls
the process event in the SYNC_REFCOUNT state
	if lowest processor identifier of old ring in new ring
		transmit checkpoint reference counts
	if all checkpoint reference counts could be queued
		require process to not continue
	else
		record my_current_iteration_state for checkpoint reference counts

sync_checkpoints_enter:
	my_sync_state = SYNC_CHECKPOINT
	my_current_iteration_state set to start of checkpoint list

sync_refcounts_enter:
	my_sync_state = SYNC_REFCOUNT

on event receipt of foreign ring id message
	ignore message

pseudocode executed on event receipt of checkpoint update
	if checkpoint exists in temporary storage
		ignore message
	else
		create checkpoint
		reset checkpoint refcount array

pseudocode executed on event receipt of checkpoint section update
	if checkpoint section exists in temporary storage
		ignore message
	else
		create checkpoint section

pseudocode executed on event receipt of reference count update
	update temporary checkpoint data storage reference count set by adding
	any reference counts in the temporary message set to those from the
	event
	update that checkpoint's reference count
	set the global checkpoint id to the current checkpoint id + 1 if it
	would increase the global checkpoint id

pseudocode called when the synchronization service calls the activate event:
for all checkpoints
	free all previously committed checkpoints and sections
	convert temporary checkpoints and sections to regular sections
copy my_saved_ring_id to my_old_ring_id

pseudocode called when the synchronization service calls the abort event:
	free all temporary checkpoints and temporary sections
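
The reference count update step above can be sketched in C as follows. This is one possible reading of the pseudocode, not the checkpoint service's actual code. The fixed-size set, the use of node identifier 0 as an empty slot, and the names `MAX_NODES`, `refcount_entry`, `checkpoint`, `refcount_set_add`, `checkpoint_refcount_update` and `next_global_ckpt_id` are all assumptions made for illustration.

```c
#include <stddef.h>

#define MAX_NODES 8	/* hypothetical upper bound on cluster size */

/* One entry of refcount_set: a node and its open connection count. */
struct refcount_entry {
	unsigned int nodeid;	/* assumption: 0 marks an unused slot */
	unsigned int count;
};

struct checkpoint {
	struct refcount_entry refcount_set[MAX_NODES];
	unsigned int refcount;	/* sum of all counts in refcount_set */
};

/* "reset checkpoint refcount array" from the checkpoint update event. */
static void checkpoint_refcount_reset(struct checkpoint *cp)
{
	size_t i;

	for (i = 0; i < MAX_NODES; i++) {
		cp->refcount_set[i].nodeid = 0;
		cp->refcount_set[i].count = 0;
	}
	cp->refcount = 0;
}

/* Adds `count` to the entry for `nodeid`, creating it if needed. */
static int refcount_set_add(struct checkpoint *cp, unsigned int nodeid,
	unsigned int count)
{
	size_t i;

	for (i = 0; i < MAX_NODES; i++) {
		struct refcount_entry *e = &cp->refcount_set[i];

		if (e->nodeid == nodeid || e->nodeid == 0) {
			e->nodeid = nodeid;
			e->count += count;
			return 0;
		}
	}
	return -1;	/* set is full */
}

/*
 * Merges the reference counts carried by an update message into the
 * temporary checkpoint, then recomputes the summed refcount.
 * Returns 0 on success, -1 if the set has no room left.
 */
static int checkpoint_refcount_update(struct checkpoint *cp,
	const struct refcount_entry *msg, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (refcount_set_add(cp, msg[i].nodeid, msg[i].count) != 0)
			return -1;
	}
	cp->refcount = 0;
	for (i = 0; i < MAX_NODES; i++)
		cp->refcount += cp->refcount_set[i].count;
	return 0;
}

/* Raises the global checkpoint id to ckpt_id + 1 if that is larger. */
static unsigned int next_global_ckpt_id(unsigned int global_id,
	unsigned int ckpt_id)
{
	return (ckpt_id + 1 > global_id) ? ckpt_id + 1 : global_id;
}
```

For example, merging the counts {node 1: 2, node 2: 1} and then {node 2: 3, node 3: 4} into a freshly reset checkpoint leaves a summed refcount of 10. Node identifiers are assumed to be nonzero here, since 0 is reserved for empty slots in this sketch.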