mirror_corosync

mirror of https://git.proxmox.com/git/mirror_corosync synced 2025-07-26 23:43:19 +00:00

Author	SHA1	Message	Date
Angus Salkeld	9fbd5c08c4	don't log an error if exiting with 0 Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-10-22 10:51:47 +11:00
Steven Dake	8671c967e1	res could return an undefined value if there was no error in totempg_groups_initialize Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-10-21 03:01:14 -07:00
Angus Salkeld	78a5260c06	LOG: use libqb facility conversion functions Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-10-21 19:34:43 +11:00
Angus Salkeld	0e58141a2f	LOG: get logging to file working correctly Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-10-21 19:34:43 +11:00
Angus Salkeld	26a6e26f57	LOG: Fix debugging Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-10-21 19:34:43 +11:00
Masatake YAMATO	721e2d2a2a	Remove cloned lines in main of main.c Signed-off-by: Masatake YAMATO <yamato@redhat.com>	2011-10-09 20:32:39 -07:00
Steven Dake	2ec4ddb039	Deliver all messages from my_high_seq_recieved to the last gap This patch passes two test cases: ------- Test #1 ------- Two node cluster - run cpgbench on each node modify totemsrp with following defines: Two test cases: ------- Test #2 ------- 5 node cluster start 5 nodes randomly at about same time, start 5 nodes randomly at about same time, wait 10 seconds and attempt to send a message. If message blocks on "TRY_AGAIN" likely a message loss has occurred. Wait a few minutes without cycling the nodes and see if the TRY_AGAIN state becomes unblocked. If it doesn't the test case has failed Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2011-09-22 10:21:37 +02:00
Jan Friesse	f6c2a8dab7	totemconfig: change minimum RRP threshold RRP threshold can be a lower value than 5. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2011-09-08 09:52:16 +02:00
Steven Dake	48ffa8892d	Ignore memb_join messages during flush operations a memb_join operation that occurs during flushing can result in an entry into the GATHER state from the RECOVERY state. This results in the regular sort queue being used instead of the recovery sort queue, resulting in segfault. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2011-09-02 09:58:44 -07:00
Jan Friesse	752239eaa1	rrp: Higher threshold in passive mode for mcast There were too many false positives with passive mode rrp when a high number of messages were received. The patch adds a new configurable variable rrp_problem_count_mcast_threshold, which by default is 10 times rrp_problem_count_threshold and is used as the threshold for multicast packets in passive mode. The variable is unused in active mode. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed by: Steven Dake <sdake@redhat.com>	2011-09-01 11:21:09 +02:00
Jan Friesse	0eade8de79	rrp: Handle endless loop if all ifaces are faulty If all interfaces were faulty, passive_mcast_flush_send and related functions ended in endless loop. This is now handled and if there is no live interface, message is dropped. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed by: Steven Dake <sdake@redhat.com>	2011-09-01 11:20:18 +02:00
Steven Dake	e920fef7e9	Get rid of hdb usage in totempg.h interface hdb has some expense and is not necessary in the totempg.so runtime. This patch removes the dependence on hdb and instead uses a direct pointer. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-23 22:29:01 -07:00
Steven Dake	32f11337b1	Remove hdb.h header includes from unnecessary files The files in this patch do not use the hdb.h header. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-23 22:28:40 -07:00
Steven Dake	71f044bfe7	Add totempg_threaded_mode_enable() api This API allows totem to operate as a multithreaded library. Performance is better without threads but some library users may only have multithreaded systems. In the corosync case where we have removed threads, this reduces cpu utilization by ~10% by removing about 50% of the mutex lock and unlock calls that occur during typical operation. Since the latest corosync is nearly thread free, there is no need for mutex operations. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-22 19:31:52 -07:00
Steven Dake	9f36a892a8	Move cs_queue.h from include directory to exec directory This file is only used by totemsrp.c. Move out of general include directory. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-22 19:31:33 -07:00
Steven Dake	67972efa7d	use va version of external log function This removes a sprintf operation in the totem and ipc logging operations Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-22 19:31:15 -07:00
Tim Beale	370d9bcecf	Display ring-ID consistently in debug Ring ID was being displayed both as hex and decimal in places. Update so it's displayed consistently (I chose hex) to make debugging easier. Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-17 12:15:16 +10:00
Tim Beale	5a724a9c39	Add code comment mapping for message handler defines As a corosync-newbie it can be hard to bridge the gap between where a particular message is sent and where the receive handler processes it, and vice versa. Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-17 11:52:25 +10:00
Steven Dake	2df7b7b8e1	properly define rec_token_cq_send_event_fn Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-15 11:15:00 -07:00
Steven Dake	e416a04b02	Define totemiba_log_printf properly Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-15 11:14:43 -07:00
Steven Dake	2565dfa03d	Fix problem in totemiba where incorrect define is used (and also not defined) Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-15 11:14:21 -07:00
Jan Friesse	99852ab203	Allow compile master on RHEL 6 corosync_timer_handle_t is now conditionally defined to prevent double definition causing compile fault on RHEL 6 systems. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-09 11:29:48 +02:00
Angus Salkeld	cdf5e95ab4	Make realtime scheduling optional not the default. Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:16 +10:00
Angus Salkeld	37e17e7a94	libqb: logging & trace Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:16 +10:00
Angus Salkeld	a716f13bf9	Fix some compiler warnings Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:16 +10:00
Angus Salkeld	bd150728bf	libqb: Improve IPC dispatch and async handling Reviewed-by: Steven Dake <sdake@redhat.com> Signed-off-by: Angus Salkeld <asalkeld@redhat.com>	2011-08-09 10:37:16 +10:00
Angus Salkeld	b5afc9283d	libqb: change pause_timestamp to uint64_t Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:15 +10:00
Angus Salkeld	b8eae0e769	libqb: rip out objdb & serialize locks Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:15 +10:00
Angus Salkeld	17a4e6d9e5	libqb: only init IPC on service engines that need it. Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:15 +10:00
Angus Salkeld	b785c5ed08	libqb: use the main loop to shutdown Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:15 +10:00
Angus Salkeld	63e16ab583	libqb: remove tsafe.c Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:15 +10:00
Angus Salkeld	78e06739b7	libqb: remove worker thread - keep to one thread. Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:15 +10:00
Angus Salkeld	f717bc60e1	libqb: make timer api a wrapper around qb_loop timers. - change timeout value to nano seconds - fix timer handles (don't alloc on stack) Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:14 +10:00
Angus Salkeld	c6895faa05	libqb: change ipc -> qb_ipc IPC: return 0/-ENOBUFS from message handler IPC: use the new rate_limit API to improve perf. CPG: add send_async API & hook up flow control IPC: Fix flow control getting stuck. IPC: Port the remaining libs to use libqb IPC IPC: remove libqb flowcontrol API TEST: put cpg_dispatch() in it's own thread IPC: cleanup ipc_glue.c name everything cs_ipcs_() IPC: add back statistics IPC: remove coroipcc_ symbols from lib.versions IPC: init each se's IPC as it is loaded. IPC: use the new connection_closed() event to free the context. IPC: re-add zero copy functionality back IPC: remove cpg_mcast_joined_async() and make it the default -> now cpg_mcast_joined() == cpg_mcast_joined_async() libqb: expose a libqb error converter libqb: add missing error conversions libqb: remove repeat try loop in lib/cpg.c CPG: fix zero copy mcast CPG: use newer return codes Add ENOTCONN to qb_to_cs_error() libqb: fix error conversion from errno to cs_error_t in confdb libqb: change errno_to_cs to qb_to_cs_error libqb: add a cs_strerror() to get a more meaningful message libqb: fix some confusing error conversions. libqb: set the timeout on recv's to -1 (wait forever) Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:14 +10:00
Angus Salkeld	fce8a3c3b6	libqb: convert coropoll calls to qb_loop calls. Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-08-09 10:37:14 +10:00
Jan Friesse	d4fb83e971	main: let poll really stop before totempg_finalize Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-07-26 10:07:08 +02:00
Jan Friesse	ddb5214c2c	Revert "totemsrp: Remove recv_flush code" This reverts commit `1a7b7a39f4`. Reversion is needed to remove overflow of receive buffers and dropping messages. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2011-07-26 10:05:55 +02:00
MORITA Kazutaka	1d9f444fec	totemsrp: fix buffer overflows for large clusters (> 100 nodes) Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-07-24 13:33:26 -07:00
Tim Beale	04f37df2f7	Add some more stats for debugging + overload - number of times client is told to try again + invalid_request - message contained invalid parameter, e.g. invalid size + msg_queue_avail - messages currently available at the Totem layer + msg-queue_reserved - messages currently reserved at the Totem layer Signed-off-by: Tim Beale <tim.beale@alliedtelesis.co.nz> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-07-19 08:58:41 -07:00
Jan Friesse	ad5cda223c	rrp: Handle rollover in passive rrp properly Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-07-18 11:46:56 +02:00
Jan Friesse	d02d288747	rrp: handle rollover in active rrp properly Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-07-18 11:46:50 +02:00
Jan Friesse	a48c8e517d	totemconfig: Change default FAIL_TO_RECV_CONST Previous default (50) was too low for most modern switch hardware. This may trigger abort because the aru doesn't increase for 50 token rotations combined with a defect in how failed to recv conditions are handled. By increasing this tunable, the condition should no longer trigger the errant code. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-07-18 11:46:21 +02:00
Steven Dake	c544e87bb0	Correct missing poll functions from service handler struct needed for confdb APIs Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2011-07-15 13:30:41 -07:00
Steven Dake	a3d98f1652	Fix problem where corosync will segfault if there are gaps in recovery queue Fixes a problem where there are gaps in the recovery queue. Example my_aru = 5, but there are messages at 7,8. 8 = my_high_seq_received which results in data slots taken up in new message queue. What should really happen is these last messages should be delivered after a transitional configuration to maintain SAFE agreement. We don't have support for SAFE atm, so it is probably safe just to throw these messages away. Without this change, the new message queue on a new configuration change is out of sync. Signed-off-by: Steven Dake <sdake@redhat.com> Tested-by: Tim Beale <tlbeale@gmail.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2011-07-15 10:39:57 -07:00
Jan Friesse	57749ec02a	totemiba: free send_buf on ibv_reg_mr failure Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-07-08 08:15:14 +02:00
Tim Beale	77f7e5b0fe	Fix compile/runtime issues for _POSIX_THREAD_PROCESS_SHARED < 1 For the case where _POSIX_THREAD_PROCESS_SHARED < 1, the code doesn't compile for corosync v1.3.1. And when it does compile, it crashes on our system - our version of uClibc seems to always expect a 4th arg. The man pages suggests the 4th arg is optional, but does say: 'For greater portability it is best to always call semctl() with four arguments', which is what this patch does. Also removed semop as it's an unused variable. Signed-off-by: Tim Beale <tim.beale@alliedtelesis.co.nz> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-07-06 06:44:22 -07:00
Tim Beale	ba107f0a33	getpwnam_r()/getgrnam_r() returns ERANGE for some systems On our system the expected buffer length is 256. This means calls to getpwnam_r()/getgrnam_r() return ERANGE error and corosync fails to startup. These 2 functions return ERANGE when insufficient buffer space is supplied. Judging by the man page for getpwnam_r, the correct way to determine the buffersize on any given system is to use sysconf(). Signed-off-by: Tim Beale <tim.beale@alliedtelesis.co.nz> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-07-06 06:31:50 -07:00
Jiaju Zhang	5dc33c2824	RRP: redundant ring automatic recovery This patch automatically recovers redundant ring failures. Please note that this patch introduced rrp_autorecovery_check_timeout in totem config hence breaks internal ABI. The internal ABI users of totem.h need to rebuild their binaries. Signed-off-by: Jiaju Zhang <jjzhang@suse.de> Signed-off-by: Steven Dake <sdake@redhat.com> Tested-by: Jan Friesse <jfriesse@redhat.com> Tested-by: Florian Haas <florian.haas@linbit.com> Tested-by: Jiaju Zhang <jjzhang@suse.de>	2011-07-05 09:13:48 -07:00
Jan Friesse	8c717c22b2	Remove spinlocks Spinlocks are now removed, because even though a spinlock can improve speed in some special cases, in most cases it makes corosync CPU usage much more intensive and less responsive than if only mutexes are used. What we were doing is: pthread_mutex_lock pthread_spin_lock pthread_spin_unlock pthread_mutex_unlock which is not safe. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-06-29 12:01:54 +02:00
Jerome Flesch	00434a4f10	Fix usage of strerror_r()/perror() Signed-off-by: Jerome Flesch <jerome.flesch@netasq.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-06-28 09:56:58 +02:00
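
Commit ba107f0a33 above notes that getpwnam_r()/getgrnam_r() return ERANGE when the supplied buffer is too small, and that sysconf() is the way to determine the buffer size. A minimal sketch of that pattern follows. It is not the code from the commit: the helper name lookup_uid, the 1024-byte fallback, the retry-on-ERANGE loop, and the 1 MiB cap are all assumptions made for illustration.

```c
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not taken from corosync): look up a user's uid.
 * Returns 0 on success, ENOENT if the user is not found, or another
 * errno-style value on failure.
 */
int lookup_uid(const char *name, uid_t *uid_out)
{
    struct passwd pwd;
    struct passwd *result = NULL;
    /* Ask the system for a suggested buffer size; it may return -1. */
    long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
    size_t buflen = (hint > 0) ? (size_t)hint : 1024; /* assumed fallback */
    char *buf = NULL;
    int rc;

    for (;;) {
        char *tmp = realloc(buf, buflen);
        if (tmp == NULL) {
            free(buf);
            return ENOMEM;
        }
        buf = tmp;

        rc = getpwnam_r(name, &pwd, buf, buflen, &result);
        if (rc != ERANGE) {
            break;              /* success, not found, or a hard error */
        }
        if (buflen > (1u << 20)) {
            break;              /* assumed cap: give up, report ERANGE */
        }
        buflen *= 2;            /* buffer too small: grow and retry */
    }

    if (rc == 0 && result != NULL) {
        *uid_out = pwd.pw_uid;  /* copy out before freeing the buffer */
    } else if (rc == 0) {
        rc = ENOENT;            /* call succeeded but no such user */
    }

    free(buf);
    return rc;
}
```

A caller would pass a user name and a uid_t to fill in, for example lookup_uid("root", &uid), and treat any nonzero return as a failed lookup. The same shape should apply to getgrnam_r() with _SC_GETGR_R_SIZE_MAX.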


1387 Commits