mirror_corosync

mirror of https://git.proxmox.com/git/mirror_corosync synced 2025-07-27 12:34:31 +00:00

Steven Dake e48ddf99a6		2011-11-29 10:59:11 -07:00

From: Yunkai Zhang:

Today, I observed one of the reasons that corosync runs into the FAILED TO
RECEIVE state. There were five nodes (A,B,C,D,E) in my testing, and I limited
the UDP transmission rate of the C node with these iptables commands:

	iptables -A INPUT -i eth0 -p udp -m limit --limit 10000/s --limit-burst 1 -j ACCEPT
	iptables -A INPUT -i eth0 -p udp -j DROP

After one hour, the C node had missed some MCAST messages. Its state was as
follows:

==state of C node==
my_aru:0x805
my_high_seq_received:0xC2C
my_aru_count:7
=>received MCAST message with seq:806 from B node
=>enter message_handler_mcast
=>add this message to regular_sort_queue
...
=>enter update_aru function
=> range = (my_high_seq_received - my_aru) = (0xC2C - 0x805) = 1063
=> if range>1024, do nothing and return directly.
==END==

According to this logic, once (my_high_seq_received - my_aru) > 1024, my_aru
is not updated even though corosync can receive MCAST messages retransmitted
by other nodes. But at that time, my_aru_count was only 7, so corosync on the
C node would stay in this state until my_aru_count increased to
fail_to_recv_const (the default value is 2500). That is a long time for
corosync, and it is wasted.

To solve this issue, maybe we can enlarge the range condition in the
update_aru function? Or we can just ignore the check of the range value. That
seems harmless, because fail_to_recv_const already controls this.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>

build-aux	add release script and git based versioning	2010-11-10 07:46:53 -07:00
conf	AUGEAS: fix "tags" log field	2011-08-09 10:37:15 +10:00
cts	Remove unchecked return error in test code	2011-11-26 08:50:25 -07:00
exec	From: Yunkai Zhang:	2011-11-29 10:59:11 -07:00
include	OBJDB: free up resources on exit	2011-11-11 09:06:50 +11:00
init	Add systemd unit files for corosync and corosync-notifyd	2011-08-09 08:21:02 +10:00
lcr	Use qb_hdb instead of mutex based hdb code	2011-08-23 12:48:21 -07:00
lib	Correct typing in memory_map function in lib/cpg.c	2011-11-26 08:50:25 -07:00
man	MAN: remove unused man pages	2011-10-21 19:36:36 +11:00
pkgconfig	libqb: Add libqb dependency in the rpm & pc file	2011-08-09 10:37:16 +10:00
services	Remove unused variable from latest cpg work that merged all config changes	2011-11-26 08:50:25 -07:00
test	Remove unchecked return problem in test code	2011-11-26 08:50:25 -07:00
tools	Fix last warnings so we can build with --enable-fatal-warnings	2011-11-15 09:42:26 +11:00
.gitignore	Add Doxyfile to .gitignore	2011-03-15 11:08:45 +11:00
AUTHORS	Update to AUTHORS file.	2009-12-07 23:18:44 +00:00
autobuild.sh	autobuild: improve messages	2011-05-05 21:39:30 +10:00
autogen.sh	Sanitize output of autogen.sh.	2009-06-18 23:08:16 +00:00
configure.ac	Make realtime scheduling optional not the default.	2011-08-09 10:37:16 +10:00
corosync.spec.in	Remove references to README.devmap	2011-10-21 20:20:17 +11:00
Doxyfile.in	docs: auto-generate the version	2011-03-12 19:39:04 +11:00
INSTALL	Add dbus and snmp notifier	2011-02-04 09:47:35 -07:00
LICENSE	Add license information to LICENSE file about build process files	2010-11-10 07:05:45 -07:00
loc	Modify property of loc script to be executable.	2009-04-23 17:37:32 +00:00
Makefile.am	add wait-for-license to cov-analyze	2011-10-21 03:01:21 -07:00
README.recovery	remove all trailing blanks	2009-04-22 08:03:55 +00:00
SECURITY	remove trailing blanks	2009-05-18 16:41:04 +00:00
TODO	Adding support for dynamic membership with UDPU transport	2011-10-27 23:52:16 -07:00

README.recovery

SYNCHRONIZATION ALGORITHM:
-------------------------
The synchronization algorithm is used by every service in corosync to
synchronize the state of the system.

There are 4 events of the synchronization algorithm.  These events are in fact
functions that are registered in the service handler data structure.  They
are called by the synchronization system whenever a network partitions or
merges.

init:
Within the init event a service handler should record temporary state variables
used by the process event.

process:
The process event is responsible for executing synchronization.  This event
returns a state indicating whether it has completed or not.  This allows
synchronization to be interrupted when the message queue buffer is full and
resumed later.  The synchronization service calls the process event again
if the value returned by process requests it.

abort:
The abort event occurs when a processor failure occurs during synchronization.

activate:
The activate event occurs when process has reported that no more processing
is necessary on any node in the cluster and all messages originated by process
have completed.
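
The four events above can be sketched as a small driver loop.  This is only
an illustration in Python, not corosync's C implementation: the names
SyncHandler, run_sync, processor_failed and make_demo_handler are
hypothetical, and the sketch runs on a single node, so it does not model the
cluster-wide condition that precedes activate.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SyncHandler:
    """Hypothetical stand-in for the four callbacks a service registers."""
    sync_init: Callable[[], None]
    sync_process: Callable[[], bool]   # True means "call process again"
    sync_activate: Callable[[], None]
    sync_abort: Callable[[], None]


def run_sync(handler, processor_failed=lambda: False):
    """Drive one synchronization round on one node.

    Returns "activated" if process finished, or "aborted" if a processor
    failure was reported while synchronization was still running.
    """
    handler.sync_init()
    while True:
        if processor_failed():
            handler.sync_abort()
            return "aborted"
        if not handler.sync_process():
            break
    handler.sync_activate()
    return "activated"


def make_demo_handler(chunks, log):
    """Build a handler whose process event needs `chunks` calls to finish."""
    remaining = [chunks]

    def process():
        log.append("process")
        remaining[0] -= 1
        return remaining[0] > 0

    return SyncHandler(
        sync_init=lambda: log.append("init"),
        sync_process=process,
        sync_activate=lambda: log.append("activate"),
        sync_abort=lambda: log.append("abort"),
    )


log = []
result = run_sync(make_demo_handler(3, log))
print(result, log)
```

The boolean returned by the process callback plays the role of the return
variable described above: the driver keeps calling process until it reports
that it is done.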

CHECKPOINT SYNCHRONIZATION ALGORITHM:
------------------------------------
The purpose of the checkpoint synchronization algorithm is to synchronize
checkpoints after a partition or merge of two or more partitions.  The
secondary purpose of the algorithm is to determine the cluster-wide reference
count for every checkpoint.

Every cluster contains a group of checkpoints.  Each checkpoint has a
checkpoint name and checkpoint number.  The number is used to uniquely reference
an unlinked but still open checkpoint in the cluster.

Every checkpoint contains a reference count which is used to determine when
that checkpoint may be released.  The algorithm rebuilds the reference count
information each time a partition or merge occurs.

local variables
	my_sync_state may have the values SYNC_CHECKPOINT, SYNC_REFCOUNT
	my_current_iteration_state contains any data used to iterate the checkpoints
	and sections.
checkpoint data
	refcount_set contains the reference count for every node, consisting of
	the number of opened connections to the checkpoint and the node identifier
	refcount contains a summation of every reference count in the refcount_set
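
As an illustration of the checkpoint data described above, here is a minimal
Python sketch.  The class name Checkpoint and the choice of a dictionary keyed
by node identifier are assumptions made for the example; the actual C
structures may be laid out differently.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Checkpoint:
    """Hypothetical model of one checkpoint's reference count data."""
    name: str
    number: int
    # refcount_set: node identifier -> number of opened connections
    refcount_set: Dict[int, int] = field(default_factory=dict)

    @property
    def refcount(self) -> int:
        """Summation of every reference count in refcount_set."""
        return sum(self.refcount_set.values())


ckpt = Checkpoint(name="ckpt-a", number=1)
ckpt.refcount_set[1] = 2   # node 1 holds two open connections
ckpt.refcount_set[2] = 1   # node 2 holds one open connection
print(ckpt.refcount)
```

Deriving refcount from refcount_set keeps the two values consistent when the
set is rebuilt after a partition or merge.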

pseudocode executed by a processor when the synchronization service calls
the init event
	call sync_checkpoints_enter

pseudocode executed by a processor when the synchronization service calls
the process event in the SYNC_CHECKPOINT state
	if lowest processor identifier of old ring in new ring
		transmit checkpoints or sections starting from my_current_iteration_state
	if all checkpoints and sections could be queued
		call sync_refcounts_enter
	else
		record my_current_iteration_state

	require process to continue

pseudocode executed by a processor when the synchronization service calls
the process event in the SYNC_REFCOUNT state
	if lowest processor identifier of old ring in new ring
		transmit checkpoint reference counts
	if all checkpoint reference counts could be queued
		require process to not continue
	else
		record my_current_iteration_state for checkpoint reference counts
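
The two process states can be sketched as a small state machine.  This is a
hedged Python illustration, not the corosync code: the class CheckpointSync,
the queue_capacity parameter (how many messages fit in the queue per call)
and the list-index form of my_current_iteration_state are assumptions, as is
resetting the iteration state when entering SYNC_REFCOUNT.

```python
SYNC_CHECKPOINT = "SYNC_CHECKPOINT"
SYNC_REFCOUNT = "SYNC_REFCOUNT"


class CheckpointSync:
    """Hypothetical model of the process event for one processor."""

    def __init__(self, items, refcounts, is_lowest, queue_capacity):
        self.items = items              # checkpoints and sections to send
        self.refcounts = refcounts      # checkpoint reference counts to send
        self.is_lowest = is_lowest      # lowest id of old ring in new ring?
        self.queue_capacity = queue_capacity  # assumed per-call queue limit
        self.sent = []                  # stands in for the message queue
        self.sync_checkpoints_enter()   # what the init event does

    def sync_checkpoints_enter(self):
        self.my_sync_state = SYNC_CHECKPOINT
        self.my_current_iteration_state = 0

    def sync_refcounts_enter(self):
        self.my_sync_state = SYNC_REFCOUNT
        self.my_current_iteration_state = 0   # assumption: restart iteration

    def _transmit(self, source):
        """Queue what fits; return True if everything could be queued."""
        start = self.my_current_iteration_state
        end = len(source)               # non-senders have nothing to queue
        if self.is_lowest:
            end = min(len(source), start + self.queue_capacity)
            self.sent.extend(source[start:end])
        self.my_current_iteration_state = end   # record where to resume
        return end == len(source)

    def process(self):
        """Return True if the process event must be called again."""
        if self.my_sync_state == SYNC_CHECKPOINT:
            if self._transmit(self.items):
                self.sync_refcounts_enter()
            return True                 # "require process to continue"
        return not self._transmit(self.refcounts)


def drive(sync):
    """Call process until it no longer asks to continue; count the calls."""
    calls = 1
    while sync.process():
        calls += 1
    return calls


sync = CheckpointSync(["ckpt-a", "sec-a1", "sec-a2"], ["rc-a"],
                      is_lowest=True, queue_capacity=2)
calls = drive(sync)
print(calls, sync.sent)
```

With a capacity of two messages per call, the three checkpoint items take two
calls and the single reference count takes a third, after which process
reports that it does not need to continue.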

sync_checkpoints_enter:
	my_sync_state = SYNC_CHECKPOINT
	my_current_iteration_state set to start of checkpoint list

sync_refcounts_enter:
	my_sync_state = SYNC_REFCOUNT

on event receipt of foreign ring id message
	ignore message

pseudocode executed on event receipt of checkpoint update
	if checkpoint exists in temporary storage
		ignore message
	else
		create checkpoint
		reset checkpoint refcount array

pseudocode executed on event receipt of checkpoint section update
	if checkpoint section exists in temporary storage
		ignore message
	else
		create checkpoint section

pseudocode executed on event receipt of reference count update
	update temporary checkpoint data storage reference count set by adding
	any reference counts in the temporary message set to those from the
	event
	update that checkpoint's reference count
	set the global checkpoint id to the current checkpoint id + 1 if it
	would increase the global checkpoint id
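
The receipt handlers above can be sketched as follows.  This Python example
is one reading of the pseudocode under stated assumptions: the class
TemporaryCheckpointStore and its method names are hypothetical, reference
counts are merged by adding per node identifier, and updates that refer to a
checkpoint not present in temporary storage are ignored.

```python
class TemporaryCheckpointStore:
    """Hypothetical temporary storage filled while synchronization runs."""

    def __init__(self, my_ring_id, global_checkpoint_id=0):
        self.my_ring_id = my_ring_id
        self.global_checkpoint_id = global_checkpoint_id
        self.checkpoints = {}   # (name, number) -> checkpoint data

    def _foreign(self, ring_id):
        # Messages from a foreign ring id are ignored.
        return ring_id != self.my_ring_id

    def on_checkpoint_update(self, ring_id, name, number):
        key = (name, number)
        if self._foreign(ring_id) or key in self.checkpoints:
            return
        # Create the checkpoint with an empty (reset) refcount set.
        self.checkpoints[key] = {"sections": {}, "refcount_set": {},
                                 "refcount": 0}

    def on_section_update(self, ring_id, name, number, section_id, data):
        if self._foreign(ring_id):
            return
        ckpt = self.checkpoints.get((name, number))
        # Assumption: ignore sections of unknown checkpoints.
        if ckpt is None or section_id in ckpt["sections"]:
            return
        ckpt["sections"][section_id] = data

    def on_refcount_update(self, ring_id, name, number, counts):
        if self._foreign(ring_id):
            return
        ckpt = self.checkpoints.get((name, number))
        if ckpt is None:        # assumption: ignore unknown checkpoints
            return
        refset = ckpt["refcount_set"]
        for node_id, count in counts.items():
            refset[node_id] = refset.get(node_id, 0) + count
        ckpt["refcount"] = sum(refset.values())
        # Only ever raise the global checkpoint id.
        self.global_checkpoint_id = max(self.global_checkpoint_id, number + 1)


store = TemporaryCheckpointStore(my_ring_id=7)
store.on_checkpoint_update(7, "ckpt-a", 4)
store.on_checkpoint_update(7, "ckpt-a", 4)       # duplicate, ignored
store.on_checkpoint_update(9, "ckpt-b", 5)       # foreign ring id, ignored
store.on_section_update(7, "ckpt-a", 4, "s1", b"x")
store.on_section_update(7, "ckpt-a", 4, "s1", b"y")   # duplicate, ignored
store.on_refcount_update(7, "ckpt-a", 4, {1: 2})
store.on_refcount_update(7, "ckpt-a", 4, {1: 1, 2: 3})
print(store.checkpoints, store.global_checkpoint_id)
```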

pseudocode called when the synchronization service calls the activate event:
	for all checkpoints
		free all previously committed checkpoints and sections
		convert temporary checkpoints and sections to regular checkpoints
		and sections
	copy my_saved_ring_id to my_old_ring_id

pseudocode called when the synchronization service calls the abort event:
	free all temporary checkpoints and temporary sections
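
The activate and abort events amount to committing or discarding the
temporary storage.  A minimal Python sketch, assuming a hypothetical
CheckpointStore class that keeps committed and temporary checkpoints in two
dictionaries:

```python
class CheckpointStore:
    """Hypothetical holder of committed and temporary checkpoint data."""

    def __init__(self, committed=None, my_old_ring_id=None):
        self.committed = dict(committed or {})
        self.temporary = {}
        self.my_saved_ring_id = None
        self.my_old_ring_id = my_old_ring_id

    def activate(self):
        # Free the previously committed data and promote the temporary
        # checkpoints and sections to regular ones.
        self.committed = self.temporary
        self.temporary = {}
        self.my_old_ring_id = self.my_saved_ring_id

    def abort(self):
        # Free all temporary checkpoints and sections; committed data and
        # my_old_ring_id stay untouched.
        self.temporary = {}


store = CheckpointStore(committed={"ckpt-old": {}}, my_old_ring_id=1)
store.my_saved_ring_id = 2
store.temporary = {"ckpt-new": {}}
store.abort()                      # a processor failed during this round
after_abort = (dict(store.committed), store.my_old_ring_id)

store.temporary = {"ckpt-new": {}}
store.activate()                   # the next round completed
after_activate = (dict(store.committed), store.my_old_ring_id)
print(after_abort, after_activate)
```

Keeping the synchronized data separate until activate means an aborted round
leaves the previously committed checkpoints as they were.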