mirror_corosync

mirror of https://git.proxmox.com/git/mirror_corosync synced 2026-01-19 12:19:12 +00:00

Author	SHA1	Message	Date
Fabio M. Di Nitto	10098dba27	votequorum: add last_man_standing support (default: off) this flag (0|1) can be configured via quorum.last_man_standing and when enabled, it allows expected_votes to be dynamically recalculated. Assuming an 8-node cluster, every node votes 1 (mandatory requirement for this feature). In the first event, 3 nodes are lost. The remaining partition of 5 is barely quorate. After a configurable timeout (quorum.last_man_standing_window, default 10sec) the quorate partition is allowed to recalculate expected_votes based on the remaining nodes. This operation will bring expected_votes to 5 and quorum to 3. Repeating the above loop, in the next event, 2 more nodes are allowed to die. etc. etc. Reviewed-by: Steven Dake <sdake@redhat.com> Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2012-01-10 15:48:17 +01:00
Fabio M. Di Nitto	9589611dc4	votequorum: drop concept of DISALLOWED this is a very old leftover from the RHEL5 timeframe, not used in RHEL6. Also change votequorum soname since this change implies an ABI change. Reviewed-by: Steven Dake <sdake@redhat.com> Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2012-01-10 15:48:10 +01:00
Fabio M. Di Nitto	b41372c6b2	votequorum: add auto_tie_breaker support (default: off) this flag (0|1) can be configured via quorum.auto_tie_breaker and when enabled, support for perfect even split is on. In case of a 50% of votes loss in one single transition, the partition with the node that has the lowest node id will remain quorate. Reviewed-by: Steven Dake <sdake@redhat.com> Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2012-01-10 15:47:57 +01:00
Fabio M. Di Nitto	e8d0af0bc8	votequorum: add wait_for_all support (default: off) this flag (0|1) can be configured via quorum.wait_for_all and changes behavior when granting quorum for the first time. Normal behavior (default / 0) grants quorum as soon as enough nodes are available in a cluster. Setting this value to 1 will grant quorum only after all cluster members are part of the cluster at the same time. Reviewed-by: Steven Dake <sdake@redhat.com> Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2012-01-10 15:47:52 +01:00
Fabio M. Di Nitto	e34d509df7	quorum: change API to return quorum type at initialization time corosync internal theory of operation is that without a quorum provider the cluster is always quorate. This is fine for membership free clusters but it does pose a problem for applications that need membership and "real" quorum. this change add quorum_type to quorum_initialize call to return QUORUM_FREE or QUORUM_SET. Applications can then make their own decisions to error out or continue operating. The only other way to know if a quorum provider is enabled/configured is to poke at confdb/objdb, but adds an unnecessary burden to applications that really don't need to use an entire library for a boolean value. Reviewed-by: Steven Dake <sdake@redhat.com> Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2012-01-10 15:47:24 +01:00
Steven Dake	e5aba30a49	Move coroapi out of external headers Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkled <asalkeld@redhat.com>	2012-01-07 17:47:45 -07:00
Steven Dake	8ad583a54c	Move logsys.c into corosync binary instead of a shared object Our preferred shared logging system is exported via the libqb library. As a result, the corosync project no longer needs to export logsys.so and the code can be directly included in the binary. The header file can also be removed. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2012-01-06 18:19:59 -07:00
Angus Salkeld	7ef81f1235	Fix some iterator based mem leaks Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2012-01-06 12:25:08 +11:00
Fabio M. Di Nitto	de194160e1	ipc: make less noise switch from LOG_INFO to LOG_DEBUG for some basic operations Reviewed-by: Steven Dake <sdake@redhat.com> Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2012-01-03 11:12:52 +01:00
Jan Friesse	7c250a5147	Remove objdb and confdb Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-12-15 09:19:18 +01:00
Jan Friesse	8a45e2b152	Move corosync core to use icmap Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-12-15 09:19:17 +01:00
Jan Friesse	525e6a6ebe	Add icmap Icmap is replacement for objdb, based on libqb map (trie). Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-12-15 09:19:17 +01:00
Angus Salkeld	7b02f176df	Check for the correct message size in totempg_groups_joined_reserve() Currently: - send_reserve() adds to the reserve - msg_count_send_ok() tests ((avail - totempg_reserved) > msg_count) So essentially we are checking to see if 2 * msg_count can fit in the q. So instead I am using byte_count_send_ok (size) to see if the message will fit then calling send_reserve() Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-12-15 10:43:11 +11:00
Angus Salkeld	2ba4ebe09e	Fix cpgbench (large message sizes) To allow async cpg messages of 1M we need to: 1) increase the totem queue size 4 times 2) align the critical level to one large message free There are a number of reasons for doing this: We can't let cpg_mcast_joined() fail because the user will not see it and will assume it has succeeded. The reason we are getting good performance is by providing a negative feedback loop from the totem q to the IPC/poll system. This relies on 4 q states low/med/high/crit. With messages of size 1M you now have a q of size one and now go from level low to crit instantly then back to low as messages are put on and taken off. I don't think this is the best behaviour. Having a q size of 4 allows the system to utilize the q better and gives us time to respond to changes in the q level. To effectively achieve flow control with a q of size 1 would require all the clients to request the space on the q like is done in totempg_groups_joined_reserve() but probably in shared memory. This would take quite a bit of re-work. Signed-off-by: Angus Salkeld <asalkeld@redhat.com>	2011-12-15 10:43:11 +11:00
Angus Salkeld	94b11502cb	LOG: get the logging to work from loaded quorum modules Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-12-15 10:10:54 +11:00
Angus Salkeld	a748700cde	Be more flexible (correct) with flowcontrol. Many functions do not require flowcontrol and are two-way so they can get failures from corosync. Only cpg_mcast_joined() _really_ needs the current level of flowcontrol. Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-12-14 12:03:42 +11:00
Angus Salkeld	c317ee433f	LOG: Fix a crash in the shutdown. Reviewed-by: Steven Dake <sdake@redhat.com> Signed-off-by: Angus Salkeld <asalkeld@redhat.com>	2011-12-13 15:00:42 +11:00
Yunkai Zhang	232ac5a7fe	Correct nodeid in memb_state_commit_token_send function Signed-off-by: Yunkai Zhang <qiushu.zyk@taobao.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-11-30 11:21:22 -07:00
Steven Dake	e48ddf99a6	From: Yunkai Zhang: Today, I have observed one of the reasons that corosync runs into the FAILED TO RECEIVE state. There were five nodes (A,B,C,D,E) in my testing, and I limited the UDP transmission rate of the C node by iptables command: iptables -A INPUT -i eth0 -p udp -m limit --limit 10000/s --limit-burst 1 -j ACCEPT iptables -A INPUT -i eth0 -p udp -j DROP One hour later, the C node had been missing some MCAST messages, its state described as following: ==state of C node== my_aru:0x805 my_high_seq_received:0xC2C my_aru_count:7 =>received MCAST message with seq:806 from B nodes =>enter message_handler_mcast =>add this message to regular_sort_queue ... =>enter update_aru function => range = (my_high_seq_received - my_aru) = (0xC2C - 0x805) = 1063 => if range>1024, do nothing and return directly. ==END== According to this logic, after (my_high_seq_received-my_aru)>1024, my_aru will not be updated though corosync can receive MCAST messages retransmitted by other nodes. But at that time, my_aru_count was only 7. So the corosync at the C node would stay in this status until my_aru_count increased to fail_to_recv_const (the default value is 2500). This was a long time for corosync, but we wasted it. To solve this issue, maybe we can enlarge the range condition in update_aru function? Or we just ignore the checking of range value, it seems harmless, because we have been using fail_to_recv_const to control the things. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2011-11-29 10:59:11 -07:00
Yunkai Zhang	19652c3d7c	Correct nodeid of token when we retransmit it Although incorrect nodeid will not affect program's logic, but it will make us confused when we add some logs to record the transmission path of token in debug mode. Signed-off-by: Yunkai Zhang <qiushu.zyk@taobao.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-11-28 05:56:28 -07:00
Yunkai Zhang	d991400372	Fixed bug when corosync receives JoinMSG in OPERATIONAL state According to the totem protocol, nodes should enter GATHER state when they receive JoinMSG in OPERATIONAL state. If we discard it in OPERATIONAL state, the nodes sending this JoinMSG could not receive the response until other nodes reach token lost timeout. This bug will cause nodes having entered GATHER state to spend more time rejoining the ring, and then it will make nodes reach token expired timeout more easily. Signed-off-by: Yunkai Zhang <qiushu.zyk@taobao.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-11-26 08:52:26 -07:00
Angus Salkeld	92ca91fa66	TOTEM: better clean up on exit Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-11-11 09:08:04 +11:00
Angus Salkeld	a6729003a6	OBJDB: free up resources on exit Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-11-11 09:06:50 +11:00
Angus Salkeld	0fc51c40fd	LOG: cleanup logging resources at exit Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-11-11 09:05:08 +11:00
Angus Salkeld	21f1008be8	Clean up the poll loop resources on exit Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-11-11 08:13:08 +11:00
Angus Salkeld	f5a31e55a2	Add calls to missing object_find_destroy() to fix mem leaks Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-11-11 08:12:13 +11:00
Angus Salkeld	390391acba	Free mem allocated by getaddrinfo Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-11-11 08:11:17 +11:00
Anton Jouline	a358791d5b	Adding support for dynamic membership with UDPU transport Add a new object called totem.interface.dynamic to allow creation/deletion of new child objects using the corosync-objctl utility: to add new member: linux# corosync-objctl -c totem.interface.dynamic.10-211-55-12 to delete an existing member: linux# corosync-objctl -d totem.interface.dynamic.10-211-55-12 Corosync will dynamically add these members to the configuration and start communicating with those nodes. Signed-off-by: Anton Jouline <anton.jouline@cbsinteractive.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-10-27 23:52:16 -07:00
Jan Friesse	783dd4e553	Remove unused buf and len variables in log_printf Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-10-25 16:29:10 +02:00
Jan Friesse	26db8b21b2	api: Change some of totempg definitions Recent changes in patch "Get rid of hdb usage in totempg.h interface" caused incompatibility between corosync API and totempg. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-10-24 17:43:36 +02:00
Jan Friesse	87821f52a6	totemmrp: Allow compilation without warnings Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-10-24 17:43:32 +02:00
Jan Friesse	1711aea72f	Allow compilation of totempg without warnings Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-10-24 17:43:28 +02:00
Angus Salkeld	2cf37d4063	Set the size of the blackbox to the size on flatiron Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-10-22 17:42:53 +11:00
Angus Salkeld	9fbd5c08c4	don't log an error if exiting with 0 Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-10-22 10:51:47 +11:00
Steven Dake	8671c967e1	res could return an undefined value if there was no error in totempg_groups_initialize Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-10-21 03:01:14 -07:00
Angus Salkeld	78a5260c06	LOG: use libqb facility conversion functions Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-10-21 19:34:43 +11:00
Angus Salkeld	0e58141a2f	LOG: get logging to file working correctly Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-10-21 19:34:43 +11:00
Angus Salkeld	26a6e26f57	LOG: Fix debugging Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-10-21 19:34:43 +11:00
Masatake YAMATO	721e2d2a2a	Remove cloned lines in main of main.c Signed-off-by: Masatake YAMATO <yamato@redhat.com>	2011-10-09 20:32:39 -07:00
Steven Dake	2ec4ddb039	Deliver all messages from my_high_seq_received to the last gap This patch passes two test cases: ------- Test #1 ------- Two node cluster - run cpgbench on each node modify totemsrp with following defines: Two test cases: ------- Test #2 ------- 5 node cluster start 5 nodes randomly at about same time, wait 10 seconds and attempt to send a message. If message blocks on "TRY_AGAIN" likely a message loss has occurred. Wait a few minutes without cycling the nodes and see if the TRY_AGAIN state becomes unblocked. If it doesn't the test case has failed Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2011-09-22 10:21:37 +02:00
Jan Friesse	f6c2a8dab7	totemconfig: change minimum RRP threshold RRP threshold can be a lower value than 5. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2011-09-08 09:52:16 +02:00
Steven Dake	48ffa8892d	Ignore memb_join messages during flush operations a memb_join operation that occurs during flushing can result in an entry into the GATHER state from the RECOVERY state. This results in the regular sort queue being used instead of the recovery sort queue, resulting in segfault. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2011-09-02 09:58:44 -07:00
Jan Friesse	752239eaa1	rrp: Higher threshold in passive mode for mcast There were too many false positives with passive mode rrp when a high number of messages were received. Patch adds new configurable variable rrp_problem_count_mcast_threshold which is by default 10 times rrp_problem_count_threshold and this is used as threshold for multicast packets in passive mode. Variable is unused in active mode. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed by: Steven Dake <sdake@redhat.com>	2011-09-01 11:21:09 +02:00
Jan Friesse	0eade8de79	rrp: Handle endless loop if all ifaces are faulty If all interfaces were faulty, passive_mcast_flush_send and related functions ended in endless loop. This is now handled and if there is no live interface, message is dropped. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed by: Steven Dake <sdake@redhat.com>	2011-09-01 11:20:18 +02:00
Steven Dake	e920fef7e9	Get rid of hdb usage in totempg.h interface hdb has some expense and is not necessary in the totempg.so runtime. This patch removes the dependence on hdb and instead uses a direct pointer. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-23 22:29:01 -07:00
Steven Dake	32f11337b1	Remove hdb.h header includes from unnecessary files The files in this patch do not use the hdb.h header. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-23 22:28:40 -07:00
Steven Dake	71f044bfe7	Add totempg_threaded_mode_enable() api This API allows totem to operate as a multithreaded library. Performance is better without threads but some library users may only have multithreaded systems. In the corosync case where we have removed threads, this reduces cpu utilization by ~10% by removing about 50% of the mutex lock and unlock calls that occur during typical operation. Since the latest corosync is nearly thread free, there is no need for mutex operations. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-22 19:31:52 -07:00
Steven Dake	9f36a892a8	Move cs_queue.h from include directory to exec directory This file is only used by totemsrp.c. Move out of general include directory. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-22 19:31:33 -07:00
Steven Dake	67972efa7d	use va version of external log function This removes a sprintf operation in the totem and ipc logging operations Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-22 19:31:15 -07:00
Tim Beale	370d9bcecf	Display ring-ID consistently in debug Ring ID was being displayed both as hex and decimal in places. Update so it's displayed consistently (I chose hex) to make debugging easier. Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-17 12:15:16 +10:00

1420 Commits
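
The arithmetic in the last_man_standing commit (10098dba27) can be followed with a small sketch. It assumes quorum is a simple majority of expected_votes (expected_votes // 2 + 1), which matches the figures in that commit message (8 expected votes give a quorum of 5, 5 expected votes give a quorum of 3) but is not taken from the votequorum source. The function names quorum_for and simulate are hypothetical and exist only for this illustration; the timeout window is modelled as an immediate recalculation.

```python
def quorum_for(expected_votes: int) -> int:
    """Assumed simple-majority rule: more than half of expected_votes."""
    return expected_votes // 2 + 1


def simulate(initial_nodes: int, losses: list[int]) -> list[tuple[int, int, int, bool]]:
    """Replay a series of node-loss events with last_man_standing enabled.

    Every node is assumed to carry exactly one vote. Each history entry is
    (alive, expected_votes, quorum, quorate) as seen right after the loss,
    before expected_votes is recalculated.
    """
    expected = initial_nodes
    alive = initial_nodes
    history = []
    for lost in losses:
        alive -= lost
        quorum = quorum_for(expected)
        quorate = alive >= quorum
        history.append((alive, expected, quorum, quorate))
        if not quorate:
            break  # an inquorate partition does not recalculate
        # Stands in for quorum.last_man_standing_window expiring:
        # the quorate partition lowers expected_votes to what is left.
        expected = alive
    return history


if __name__ == "__main__":
    for alive, expected, quorum, quorate in simulate(8, [3, 2]):
        print(f"alive={alive} expected_votes={expected} quorum={quorum} quorate={quorate}")
```

Under these assumptions, losing 3 of 8 nodes leaves a partition of 5 that is exactly quorate, and after the recalculation a further loss of 2 still leaves the remaining 3 nodes quorate. Losing 4 of 8 in a single event leaves the partition below quorum, so no recalculation happens.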