mirror_corosync

mirror of https://git.proxmox.com/git/mirror_corosync synced 2025-10-28 05:13:57 +00:00

Author	SHA1	Message	Date
Christine Caulfield	9da89f32c2	CFG: Remove ring-reenable code RRP doesn't exist any more so all the ring re-enable code is redundant. I've removed it from the library and all the code that does anything, but I've left the hole in the IPC just in case old libraries are hanging around. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2017-08-03 14:32:02 +02:00
Jan Friesse	564b4bf7d4	totem: Propagate totem initialization failure Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2017-06-15 11:07:33 +02:00
Jan Friesse	1f90c31ba7	list: Replace for_each by safe version where need Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2016-10-27 14:56:52 +02:00
Michael Jones	b4c06e52f3	list: Replace uses of list.h with qblist.h Signed-off-by: Michael Jones <jonesmz@jonesmz.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2016-10-27 14:56:52 +02:00
Christine Caulfield	268cde6ee4	totem: Add Kronosnet transport. This is a big update that removes RRP & MRP from the codebase and makes knet the default transport for corosync. UDP & UDPU are still (currently) supported but are deprecated. Also crypto and multiple interfaces are only supported over knet. To compile this codebase you will need to install libknet from https://github.com/fabbione/kronosnet The corosync.conf(5) man page has been updated with info on the new options. Older config files should still work but many options have changed because of the knet implementation so configs should be checked carefully. In particular any cluster using RRP over UDP or UDPU will not start as RRP is no longer present. If you need multiple interface support then you should be using the knet transport. Knet brings many benefits to the corosync codebase: it provides support for more interfaces than RRP (up to 8), will be more reliable in the event of network outages and allows dynamic reconfiguration of interfaces. It also fixes the ifup/ifdown and 127.0.0.1 binding problems that have plagued corosync/openais from day 1 Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>	2016-10-11 10:09:42 +01:00
HideoYamauchi	71c9035c27	Low: totemsrp: Addition of the log. Signed-off-by: HideoYamauchi <renayama19661014@ybb.ne.jp> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2016-08-01 10:11:45 +02:00
Ferenc Wágner	c76ee39f61	Fix typo: Diabled -> disabled Signed-off-by: Ferenc Wágner <wferi@niif.hu> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2016-06-22 14:26:48 +02:00
Ruben Kerkhof	37f092bbed	totemsrp: Fix clang warning (tautological compare) gsfrom is always >= 0 Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2016-01-04 17:28:14 +01:00
Ferenc Wágner	73910bd66e	totmesrp: Fix typo in log message Signed-off-by: Ferenc Wágner <wferi@niif.hu> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2015-08-26 09:26:26 +02:00
Christine Caulfield	ab8942f626	totemsrp: Improve logging of left/down nodes This patch from Hideo Yamauchi improves the logging of whether nodes leave the cluster cleanly or uncleanly, making it easier to determine if a node was shut down by the operator. There is also the possibility that a LEAVE message could get missed (due to the node being in flush state) so this can also make that clearer. The modifications are as follows. Change 1) I added a list of LEAVE nodes to totemsrp. Change 2) I added registration, lookup and clearing of LEAVE nodes. Change 3) I added the output to the log. Change 4) I changed the output level of the log. Signed-off-by: Hideo Yamauchi <renayama19661014@ybb.ne.jp> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2015-06-12 16:16:45 +01:00
Andrey N. Groshev	5d9acc5604	totemsrp: Format member list log as unsigned int Signed-off-by: Andrey N. Groshev <greenx@yandex.ru> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2015-03-05 16:34:07 +01:00
Jason	4ee84c51fa	totem: Ignore duplicated commit tokens in recovery In active rrp mode, commit tokens are treated as mcast data messages, thus, rrp directly delivers them to srp layer by active_mcast_recv(). This will result in duplicated commit tokens being received by srp from different heartbeat links. If a node is in recovery state and has already sent out the initial orf token, those duplicated commit tokens will cause message_handler_memb_commit_token() to send the initial orf token again! This is wrong because it resets the orf token content in instance->orf_token_retransmit, which breaks the token retransmission state. Furthermore, by sending those initial orf tokens again and again, it may lead active_token_recv() to drop some subsequent orf tokens. It is OK for rrp because srp will do token retransmission, but as said above, srp retransmission state has already been broken, so finally we meet a "token lost in recovery state" condition caused by software. If the token timeout value is large, then it will take a long time to create a new ring. This can be reproduced by having two nodes set to active rrp mode, with two heartbeat links. Then with one node always on, let the other one do stop/start again and again. It reproduces with low probability. In theory, I think, the more heartbeat links used, the more easily it can be reproduced. This problem can be resolved by letting message_handler_memb_commit_token() ignore duplicated commit tokens in recovery state if the node (the ring representative) has already sent out the initial orf token. Unlike the previous take, this version does not depend on stored token data but uses originated_orf_token in totemsrp_instance to remember if the initial orf token has already been originated for the current membership. Signed-off-by: Jason <huzhijiang@gmail.com> Reviewed-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2015-01-15 17:33:04 +01:00
Jan Friesse	acb55cdb03	totem: Inform RRP about membership changes Services are informed about membership changes, but if same information is needed inside totemrrp or totemnet, it's impossible to gather this information. Patch makes this possible for now only for RRP with empty callbacks. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-08-26 15:35:56 +02:00
Jason HU	f135b68096	Cancel token holding while in retransmition When there is no other activity on the ring but only retransmission, and the token is in hold mode, retransmission becomes slow. Moreover, if retransmission always fails but token rotation works well, then it takes quite a long time (fail_to_recv_const * token_hold = 2500 * 180ms = 450sec) for the retransmit requester to meet the "FAILED TO RECEIVE" condition and re-construct a new ring. This problem can be solved by checking if retransmits are present before going into hold. If a node is the retransmit requester or the resender, it sets my_token_held to 0 to speed up retransmission and omit further unnecessary sending of the token_hold_cancel signal. Signed-off-by: Jason HU <huzhijiang@gmail.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-08-12 09:28:04 +02:00
Jan Friesse	da46ecfc30	Move ringid store and load from totem library Functions for storing and loading the ring id were in the totem library. This causes a problem: what to do when it's impossible to load or store the ring id. The easy solution seemed to be assert, but sadly this makes it hard for the user to find out what happened (because corosync was just aborted and logsys didn't flush). The solution is to move these functions to main.c, where it is much easier to handle errors. This also makes libtotem free of any file system operations. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-06-02 14:54:57 +02:00
Jan Friesse	d310b251c3	Introduce get_run_dir function Run dir (LOCALSTATEDIR/lib/corosync) was hardcoded throughout the codebase. Totemsrp was trying to create and chdir into it, but also took into account the environment variable COROSYNC_RUN_DIR, creating an inconsistency. get_run_dir correctly returns COROSYNC_RUN_DIR (when set) or LOCALSTATEDIR/lib/corosync. This is now used by all functions instead of a hardcoded string. All occurrences of mkdir/chdir are removed from totemsrp and chdir is now called in the main function. The mkdir call is completely removed, because it was not used anyway (the check in main.c was called before totemsrp init, so mkdir was never called) and also make install and/or the package system should take care of creating this directory with correct permissions/context. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-06-02 14:53:18 +02:00
Jan Friesse	38c04d9a66	totemsrp: Fix typo with cont gather Patch `f3ffd3da5c` introduced named states of the state machine, but sadly contains a logical problem causing stats.continuous_gather to increase even when it shouldn't. The problem is not critical, because continuous_gather is set to 0 on successful membership creation. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-02-18 16:12:57 +01:00
Jason	cfbb021e13	totem: Drop invalid join msg in operational state According to the totem paper, if a processor receives a join message in the operational state and if the receiver's identifier is in the join message's fail list, then the join message should be ignored. By applying this validation of join messages, we can avoid unnecessary switching from operational state to gather state (which can even leave rings unable to merge), as in the following scenario. 1. Initially, there is only one ring containing three nodes, say ring(A,B,C). 2. A and B become network partitioned and, "at the same time", C goes down. 3. Node A sends join message with proclist:A,B,C. faillist:NULL. Node B sends join message with proclist:A,B,C. faillist:NULL. 4. Both A and B hit consensus timeout due to the network partition. 5. The A and B network remerges. 6. Node A sends join message with proclist:A,B,C. faillist:B,C. and creates ring(A). Node B sends join message with proclist:A,B,C. faillist:A,C. and creates ring(B). 7. Say the join message with proclist:A,B,C. faillist:A,C sent by node B is received by node A because the network remerged. 8. Node A shifts to gather state and sends out a modified join message with proclist:A,B,C. faillist:B. Such a join message will prevent both A and B from merging. 9. Node A hits consensus timeout (caused by waiting for node C) and sends join message with proclist:A,B,C. faillist:B,C again. The same thing happens on node B, so A and B will loop forever in steps 7, 8 and 9. As the paper also says: "If a processor receives a join message in the operational state and if the sender's identifier is in the receiver's my_proclist and the join message's ring_seq is less than the receiver's ring sequence number, then it ignores the join message too." So this patch applies these validations of join messages together. Signed-off-by: Jason <huzhijiang@gmail.com> Reviewed-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2014-01-13 14:46:13 +01:00
Masatake YAMATO	f3ffd3da5c	totemsrp: Show English message when memb_state_gather_enter is called The reason why memb_state_gather_enter is invoked was printed in integer code. This patch introduces human readable English messages for the code. Signed-off-by: Masatake YAMATO <yamato@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2013-10-24 16:46:17 +02:00
Christine Caulfield	074e57910e	The corosync message "A processor joined or left the membership" is vague and unhelpful. People have to look for the following quorum message and try to deduce which nodes have joined or left from that and past membership messages, even though the routine printing the message already has this information to hand. This patch fixes that message so that it prints the nodeids of the nodes that have joined/left the cluster. Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com> Reviewed-By: Jan Friesse <jfriesse@redhat.com>	2013-06-27 14:44:46 +01:00
Jan Friesse	92e0f9c7bb	Add waiting_trans_ack also to fragmentation layer The patch adding support for waiting_trans_ack may fail if synchronization happens between deliveries of a fragmented message. In such a situation, the fragmentation layer waits for a message with the correct number, but it will never arrive. The solution is to handle (via callback) changes of waiting_trans_ack and use a different queue. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2012-11-22 11:48:12 +01:00
Jan Friesse	2d4e7bebb5	Handle segfault in backlog_get If instance->memb_state is not OPERATION or RECOVERY, we were passing NULL to the cs_queue_used call. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2012-11-22 11:48:07 +01:00
Steven Dake	402638929e	Fix problem with sync operations under very rare circumstances This patch creates a special message queue for synchronization messages. This prevents messages that are queued in the new_message_queue but have not yet been originated from corrupting the synchronization process. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2012-11-22 11:47:57 +01:00
Jan Friesse	d4db2ea535	If failed_to_recv is set, consensus can be empty If failed_to_recv is set (the node detects that it is not able to receive messages), we can end up hitting an assert, because my_failed_list and my_member_list are the same list. This happens because we do not follow the specification and allow a node to mark itself as failed. Because a single-node membership is created (ignoring both the fail list and member_list) when failed_to_recv is set and consensus is reached across nodes, we can skip the assert. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2012-11-05 15:16:25 +01:00
Jan Friesse	d042671369	Move "Totem is unable to form..." message to main Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2012-10-08 16:53:33 +02:00
Jan Friesse	5ce59f49ba	Move some totem and cpg messages to trace level Messages which are flow messages, rather than lifecycle messages, are now logged at trace level. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2012-09-19 11:03:16 +02:00
Jan Friesse	932829bfca	Add header files when needed Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2012-09-03 09:34:31 +02:00
Tim Beale	77ea036c72	Remove unused structure Nowhere in the corosync codebase references this structure. Signed-off-by: Tim Beale <tlbeale@gmail.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2012-08-21 14:11:48 +02:00
Jan Friesse	0791f44c41	Include ringid in processor joined log message This should help correlate syslog entries with their blackbox counterparts. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Andrew Beekhof <andrew@beekhof.net>	2012-05-17 14:58:04 +02:00
Jan Friesse	e925f42165	Make ifaces_get work with dynamic no_rings The commit which added the number of addresses to the srp_address structure didn't account for totemsrp_ifaces_get, where the whole structure was copied instead of only the addresses. This is now fixed. Also, to make the totempg API forward compatible, the size of the interfaces array must be passed to ifaces_get-like functions to prevent memory overwrite. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2012-03-26 11:54:26 +02:00
Jan Friesse	124ff4339c	Add no_addrs field in srp_addr structure This should allow a future change to a dynamic number of rings without breaking wire compatibility. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2012-03-22 14:03:38 +01:00
Jan Friesse	3b7c2f0588	Update crypto_set API Also a few leftovers from cfg are removed and the version of totempg is increased to 5 to reflect all the changes we made Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2012-03-15 17:33:53 +01:00
Jan Friesse	8cdd2fc493	Remove libtomcrypt Tomcrypt in corosync has not been updated for a long time. Because we have support for libnss, libtomcrypt can be removed. Also a few leftovers (AES is 256 bits, not 128, ...) are removed. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2012-03-13 09:19:47 +01:00
Angus Salkeld	3131601ce2	Remove all unneccessary "\n" from log messages These look ugly, are inconsistently done and just have to be removed later in libqb before calling syslog. Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2012-01-23 13:08:23 +11:00
Jan Friesse	bb6bbd01e6	Store rrp faulty status of ring in cmap A new key holding the faulty status of each ring is created in cmap under the name runtime.totem.pg.mrp.rrp.$ring_number.faulty Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2012-01-11 14:12:06 +01:00
Steven Dake	8ad583a54c	Move logsys.c into corosync binary instead of a shared object Our preferred shared logging system is exported via the libqb library. As a result, the corosync project no longer needs to export logsys.so and the code can be directly included in the binary. The header file can also be removed. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2012-01-06 18:19:59 -07:00
Yunkai Zhang	232ac5a7fe	Correct nodeid in memb_state_commit_token_send function Signed-off-by: Yunkai Zhang <qiushu.zyk@taobao.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-11-30 11:21:22 -07:00
Steven Dake	e48ddf99a6	From: Yunkai Zhang: Today, I observed one of the reasons that corosync runs into the FAILED TO RECEIVE state. There were five nodes (A,B,C,D,E) in my testing, and I limited the incoming UDP rate of node C with iptables: iptables -A INPUT -i eth0 -p udp -m limit --limit 10000/s --limit-burst 1 -j ACCEPT iptables -A INPUT -i eth0 -p udp -j DROP One hour later, node C had missed some MCAST messages, and its state was as follows: ==state of C node== my_aru:0x805 my_high_seq_received:0xC2C my_aru_count:7 =>received MCAST message with seq:806 from node B =>enter message_handler_mcast =>add this message to regular_sort_queue ... =>enter update_aru function => range = (my_high_seq_received - my_aru) = (0xC2C - 0x805) = 1063 => if range>1024, do nothing and return directly. ==END== According to this logic, once (my_high_seq_received-my_aru)>1024, my_aru will not be updated even though corosync can receive MCAST messages retransmitted by other nodes. But at that time, my_aru_count was only 7. So corosync on node C would stay in this state until my_aru_count increased to fail_to_recv_const (the default value is 2500). This was a long time for corosync, and we wasted it. To solve this issue, maybe we can enlarge the range condition in the update_aru function? Or we could just ignore the range check; it seems harmless, because we already use fail_to_recv_const to control this. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2011-11-29 10:59:11 -07:00
Yunkai Zhang	19652c3d7c	Correct nodeid of token when we retransmit it Although an incorrect nodeid does not affect the program's logic, it is confusing when we add logs to record the transmission path of the token in debug mode. Signed-off-by: Yunkai Zhang <qiushu.zyk@taobao.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-11-28 05:56:28 -07:00
Yunkai Zhang	d991400372	Fixed bug when corosync receive JoinMSG in OPERATIONAL state According to the totem protocol, nodes should enter GATHER state when they receive a JoinMSG in OPERATIONAL state. If we discard it in OPERATIONAL state, the nodes sending this JoinMSG cannot receive a response until other nodes reach the token lost timeout. This bug causes nodes that have entered GATHER state to spend more time rejoining the ring, which makes nodes reach the token expired timeout more easily. Signed-off-by: Yunkai Zhang <qiushu.zyk@taobao.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-11-26 08:52:26 -07:00
Angus Salkeld	92ca91fa66	TOTEM: better clean up on exit Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-11-11 09:08:04 +11:00
Steven Dake	2ec4ddb039	Deliver all messages from my_high_seq_recieved to the last gap This patch passes two test cases: ------- Test #1 ------- Two node cluster - run cpgbench on each node modify totemsrp with following defines: Two test cases: ------- Test #2 ------- 5 node cluster start 5 nodes randomly at about the same time, wait 10 seconds and attempt to send a message. If the message blocks on "TRY_AGAIN", a message loss has likely occurred. Wait a few minutes without cycling the nodes and see if the TRY_AGAIN state becomes unblocked. If it doesn't, the test case has failed Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2011-09-22 10:21:37 +02:00
Jan Friesse	752239eaa1	rrp: Higher threshold in passive mode for mcast There were too many false positives with passive mode rrp when a high number of messages were received. The patch adds a new configurable variable rrp_problem_count_mcast_threshold, which by default is 10 times rrp_problem_count_threshold and is used as the threshold for multicast packets in passive mode. The variable is unused in active mode. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed by: Steven Dake <sdake@redhat.com>	2011-09-01 11:21:09 +02:00
Steven Dake	32f11337b1	Remove hdb.h header includes from unnecessary files The files in this patch do not use the hdb.h header. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-23 22:28:40 -07:00
Steven Dake	71f044bfe7	Add totempg_threaded_mode_enable() api This API allows totem to operate as a multithreaded library. Performance is better without threads but some library users may only have multithreaded systems. In the corosync case where we have removed threads, this reduces cpu utilization by ~10% by removing about 50% of the mutex lock and unlock calls that occur during typical operation. Since the latest corosync is nearly thread free, there is no need for mutex operations. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-22 19:31:52 -07:00
Steven Dake	9f36a892a8	Move cs_queue.h from include directory to exec directory This file is only used by totemsrp.c. Move out of general include directory. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-22 19:31:33 -07:00
Tim Beale	370d9bcecf	Display ring-ID consistently in debug Ring ID was being displayed both as hex and decimal in places. Update so it's displayed consistently (I chose hex) to make debugging easier. Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-17 12:15:16 +10:00
Tim Beale	5a724a9c39	Add code comment mapping for message handler defines As a corosync-newbie it can be hard to bridge the gap between where a particular message is sent and where the receive handler processes it, and vice versa. Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-17 11:52:25 +10:00
Angus Salkeld	37e17e7a94	libqb: logging & trace Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:16 +10:00
Angus Salkeld	b5afc9283d	libqb: change pause_timestamp to uint64_t Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:15 +10:00
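Commit e48ddf99a6 describes how the range check in update_aru stalled my_aru. The sketch below is a hypothetical, simplified Python model of that check, not corosync's totemsrp.c code. The function name, the `received` set and the `enforce_range` flag are assumptions made for illustration. It reproduces the numbers from the commit message: with my_aru at 0x805 and my_high_seq_received at 0xC2C, the range is 1063. The capped version therefore never advances, even after every missing message has been retransmitted.

```python
# Hypothetical, simplified model of the ARU ("all received up to") update
# discussed in commit e48ddf99a6. This is NOT corosync's totemsrp.c code.

RANGE_LIMIT = 1024  # cap value taken from the commit message


def update_aru(my_aru, my_high_seq_received, received, enforce_range=True):
    """Advance my_aru across every contiguous sequence number in `received`.

    With enforce_range=True this mimics the old behaviour: if the distance
    between the highest sequence seen and my_aru exceeds RANGE_LIMIT, the
    function returns immediately without advancing my_aru.
    """
    span = my_high_seq_received - my_aru
    if enforce_range and span > RANGE_LIMIT:
        return my_aru
    # Move forward while the next sequence number has been received.
    while my_aru < my_high_seq_received and (my_aru + 1) in received:
        my_aru += 1
    return my_aru


if __name__ == "__main__":
    # State of node C from the commit message; assume every retransmission
    # between 0x806 and 0xC2C has now arrived.
    aru, high = 0x805, 0xC2C
    got = set(range(0x806, high + 1))
    print(hex(update_aru(aru, high, got)))                       # capped
    print(hex(update_aru(aru, high, got, enforce_range=False)))  # uncapped
```

Dropping the cap lets my_aru catch up to 0xC2C once retransmissions fill the gap. That is one of the two options the commit proposes, and it leaves fail_to_recv_const as the only bound on how long a node may lag.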


242 Commits
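Commit cfbb021e13 applies two rules from the totem paper to join messages received in the operational state. The following is a minimal Python sketch of those two checks only, not corosync's implementation. It assumes a hypothetical JoinMsg record whose field names (sender, proc_list, fail_list, ring_seq) are illustrative and do not reflect corosync's wire format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinMsg:
    """Hypothetical join message; field names are illustrative only."""
    sender: str
    proc_list: frozenset
    fail_list: frozenset
    ring_seq: int


def ignore_join_in_operational(receiver_id, my_proc_list, my_ring_seq, msg):
    """Return True if a receiver in the OPERATIONAL state should drop msg."""
    # Rule 1: the sender lists the receiver as failed.
    if receiver_id in msg.fail_list:
        return True
    # Rule 2: the sender is in the receiver's proc list and the message
    # carries an older ring sequence than the receiver's own.
    if msg.sender in my_proc_list and msg.ring_seq < my_ring_seq:
        return True
    return False


if __name__ == "__main__":
    # Step 7 of the scenario: A (alone in ring(A)) receives B's join with
    # faillist A,C. The ring_seq value here is an arbitrary placeholder.
    from_b = JoinMsg("B", frozenset("ABC"), frozenset("AC"), ring_seq=8)
    print(ignore_join_in_operational("A", frozenset("A"), 8, from_b))
```

Under rule 1, node A drops B's join in step 7 and stays operational instead of entering gather. This breaks the 7-8-9 loop described in the commit message.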