In a 10-node cluster where all nodes boot and start corosync at the
same time, corosync sometimes detects a node as leaving and rejoining
the cluster during this process.
Occasionally the downlist that gets picked contains the local node. When the
local node sends leave events for the downlist (including itself), it sets
its cpd state to CPD_STATE_UNJOINED and clears cpd->group_name. This
means it no longer sends CPG events to the CPG client.
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
As a corosync newbie, it can be hard to bridge the gap between where a
particular message is sent and where the receive handler processes it,
and vice versa.
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
IPC: return 0/-ENOBUFS from message handler
IPC: use the new rate_limit API to improve perf.
CPG: add send_async API & hook up flow control
IPC: Fix flow control getting stuck.
IPC: Port the remaining libs to use libqb IPC
IPC: remove libqb flowcontrol API
TEST: put cpg_dispatch() in its own thread
IPC: clean up ipc_glue.c; name everything cs_ipcs_*()
IPC: add back statistics
IPC: remove coroipcc_ symbols from lib*.versions
IPC: init each se's IPC as it is loaded.
IPC: use the new connection_closed() event to free the context.
IPC: re-add zero copy functionality back
IPC: remove cpg_mcast_joined_async() and make it the default
-> now cpg_mcast_joined() == cpg_mcast_joined_async()
libqb: expose a libqb error converter
libqb: add missing error conversions
libqb: remove repeat try loop in lib/cpg.c
CPG: fix zero copy mcast
CPG: use newer return codes
Add ENOTCONN to qb_to_cs_error()
libqb: fix error conversion from errno to cs_error_t in confdb
libqb: change errno_to_cs to qb_to_cs_error
libqb: add a cs_strerror() to get a more meaningful message
libqb: fix some confusing error conversions.
libqb: set the timeout on recv's to -1 (wait forever)
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
The following situation could happen:
- one thread is waiting for a write operation to finish (line 853) while
  objdb is locked
- flush (done in objdb_notify_dispatch) should be called in the main thread,
  but this call never happens because the main thread is waiting for the
  objdb lock.
In this situation a deadlock occurs.
This commit solves the problem by (sketched below):
- setting the pipe to non-blocking mode
- using the pipe only as a trigger for coropoll
- storing dispatch messages in a list
- having the main thread process messages from the list
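For reference, a minimal sketch of the trigger-pipe pattern (all names below
are hypothetical, not the actual corosync code):

    /* Sketch only: all names here are hypothetical. */
    #include <pthread.h>
    #include <stdlib.h>
    #include <unistd.h>

    struct dispatch_msg {
        struct dispatch_msg *next;
        /* payload for ipc_dispatch_send() would go here */
    };

    static int notify_pipe[2];          /* O_NONBLOCK; [0] is polled by coropoll */
    static pthread_mutex_t msg_list_lock = PTHREAD_MUTEX_INITIALIZER;
    static struct dispatch_msg *msg_list_head;  /* LIFO for brevity; real list keeps FIFO order */

    /* Called from the notification path (possibly with objdb locked):
     * queue the message and poke the trigger pipe. */
    static void queue_dispatch(struct dispatch_msg *m)
    {
        char c = 1;

        pthread_mutex_lock(&msg_list_lock);
        m->next = msg_list_head;
        msg_list_head = m;
        pthread_mutex_unlock(&msg_list_lock);

        /* Non-blocking: if the pipe is full, the main thread is already
         * scheduled to run, so a failed write is harmless. */
        (void)write(notify_pipe[1], &c, 1);
    }

    /* Poll callback run in the main thread: drain the trigger, then the list. */
    static void drain_dispatch(void)
    {
        char buf[64];
        struct dispatch_msg *m;

        while (read(notify_pipe[0], buf, sizeof(buf)) > 0)
            ;

        for (;;) {
            pthread_mutex_lock(&msg_list_lock);
            m = msg_list_head;
            if (m != NULL)
                msg_list_head = m->next;
            pthread_mutex_unlock(&msg_list_lock);
            if (m == NULL)
                break;
            /* here the real ipc_dispatch_send() would be called */
            free(m);
        }
    }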
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
In confdb_object_iter the result of object_find_create is now properly
checked. object_find_create can return -1 if the object doesn't exist.
Without this check, an invalid handle (memory garbage) was passed
directly to object_find_next.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
In this concrete case the result is equivalent, but the change makes Coverity happy.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
If one node is paused it can miss a config change and
thus report a larger old_members than expected.
The solution is to use the left_nodes field.
Master selection used to be "choose node with":
1) largest previous membership
2) (then as a tie-breaker) node with smallest nodeid
New selection:
1) largest (previous #nodes - #nodes known to have left)
2) (then as a tie-breaker) node with smallest nodeid
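For illustration, a sketch of the new comparison (struct and field names here
are hypothetical, not the actual sync code):

    /* Sketch only: node_info and its fields are hypothetical. */
    struct node_info {
        unsigned int nodeid;
        unsigned int old_members;   /* size of the previous membership */
        unsigned int left_nodes;    /* members known to have left it   */
    };

    /* Returns nonzero if 'a' should be preferred over 'b' as master. */
    static int prefer_as_master(const struct node_info *a, const struct node_info *b)
    {
        unsigned int a_score = a->old_members - a->left_nodes;
        unsigned int b_score = b->old_members - b->left_nodes;

        if (a_score != b_score)
            return a_score > b_score;   /* 1) largest surviving old membership */
        return a->nodeid < b->nodeid;   /* 2) tie-breaker: smallest nodeid     */
    }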
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Without refcounting the conn pointer here, corosync will segfault
if one kills a running instance of "corosync-cfgtool -r" (rhbz#695191)
Signed-off-by: Tim Serong <tserong@novell.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
If you are connected to corosync and registered for object
notifications, and corosync is then asked to shut down, the IPC server
will get stuck. This is because the pipe is closed and the refcount is
increased, leaving ipcs with a connection that it can't destroy.
Solution:
1) if a write to the pipe fails (pipe closed), decrement the refcount
   (sketched below).
2) fix object_track_stop() - it was not working because the functions
   did not match up (this caused the late callbacks).
3) in ipcs, call exit_fn() and then stats_destroy_connection(), so that
   the service engine has time to call object_track_stop()
   before the object gets destroyed.
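A sketch of fix 1, with hypothetical names for the connection object and its
refcount helper:

    /* Sketch only: struct conn_info and its fields are hypothetical. */
    #include <unistd.h>

    struct conn_info {
        int notify_pipe_fd;
        int refcount;
    };

    static void conn_refcount_dec(struct conn_info *conn)
    {
        conn->refcount--;   /* real code does this under a lock and destroys
                             * the connection when the count reaches zero */
    }

    static void send_notification(struct conn_info *conn, const void *msg, size_t len)
    {
        /* A reference was taken when this notification was queued. */
        if (write(conn->notify_pipe_fd, msg, len) != (ssize_t)len) {
            /* Pipe already closed (shutdown in progress): drop the
             * reference, otherwise ipcs is stuck with a connection it
             * can never destroy. */
            conn_refcount_dec(conn);
        }
    }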
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Zero-element array behavior is very different from that of a normal array or
pointer. This behavior is the root of the problem of not returning a
correctly filled array of addresses. It appeared only in rrp mode, where more
than one address is returned.
All memcpy()s are now correctly converted to copy via a pointer to char.
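For illustration, a minimal example of the pitfall and the char-pointer fix
(the struct and constants are hypothetical, not the real message layout):

    /* Sketch only: layout and names are made up for illustration. */
    #include <string.h>

    #define ADDR_LEN 16   /* hypothetical size of one serialized address */

    struct res_addrs {
        unsigned int num_addrs;
        char addrs[ADDR_LEN][0];   /* zero-element inner array (GNU extension) */
    };

    static void copy_addrs(const struct res_addrs *res, char out[][ADDR_LEN])
    {
        const char *p = (const char *)res->addrs;
        unsigned int i;

        /* Wrong: res->addrs[i] steps by sizeof(char[0]) == 0 bytes, so with
         * more than one address (rrp) every copy reads the first record:
         *     memcpy(out[i], res->addrs[i], ADDR_LEN);
         * Right: step a plain char pointer by the real record size. */
        for (i = 0; i < res->num_addrs; i++)
            memcpy(out[i], p + (size_t)i * ADDR_LEN, ADDR_LEN);
    }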
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
corosync-notifyd has exposed an issue with confdb notifications.
The normal state of affairs is:
IPC thread > lock > objdb > lock
objdb notifications, whilst really useful, turn things around:
<middle of big call chain>
objdb > lock > confdb > ipc > lock
This reverse ordering of locks causes a horrible deadlock.
I see this patch as a workaround until corosync-2.0,
when most of the threads and locking disappear.
This patch adds a pipe to the confdb service. When we get an
objdb notification, a struct gets written to the pipe.
The poll loop then runs the dispatch in the main thread.
In the dispatch we call the real ipc_dispatch_send().
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
- timestamps -> uint64_t and in nanosecs
- use clock_gettime
- common object naming
- common state names
- timeouts in milliseconds
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@3054 fd59a12c-fef9-0310-b244-a6a79926bd2f
Send CPG_REASON_PROCDOWN when a process goes down
Our manual pages are clear:
CPG_REASON_PROCDOWN - the process left a group without calling
cpg_leave().
Currently, we are sending CPG_REASON_LEAVE in such a situation.
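For illustration, a client-side confchg callback can tell the two cases apart
from the reason field in the left_list entries (a sketch against the public
cpg API):

    #include <stdio.h>
    #include <corosync/cpg.h>

    /* Sketch of a client confchg callback distinguishing the two reasons. */
    static void my_confchg_fn(cpg_handle_t handle,
        const struct cpg_name *group_name,
        const struct cpg_address *member_list, size_t member_list_entries,
        const struct cpg_address *left_list, size_t left_list_entries,
        const struct cpg_address *joined_list, size_t joined_list_entries)
    {
        size_t i;

        for (i = 0; i < left_list_entries; i++) {
            if (left_list[i].reason == CPG_REASON_PROCDOWN)
                printf("nodeid %u pid %u exited without cpg_leave()\n",
                       left_list[i].nodeid, left_list[i].pid);
            else if (left_list[i].reason == CPG_REASON_LEAVE)
                printf("nodeid %u pid %u left the group cleanly\n",
                       left_list[i].nodeid, left_list[i].pid);
        }
    }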
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2946 fd59a12c-fef9-0310-b244-a6a79926bd2f
Problem:
Under certain circumstances cpg does not send group leave messages,
in particular with a big token timeout (tested with token == 5min):
1 start all nodes
2 start ./test/testcpg on all nodes
3 go to the node with the lowest nodeid
4 ifconfig <int> down && killall -9 corosync && /etc/init.d/corosync restart && ./testcpg
5 the other nodes will not get the cpg leave event
6 testcpg reports an extra cpg group (basically one was not removed)
Solution:
If a member gets removed using the new trans_list and
that member is the node used for syncing (lowest nodeid)
then the next lowest node needs to be chosen for syncing.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2785 fd59a12c-fef9-0310-b244-a6a79926bd2f
This patch adds a new function to initialize cpg, cpg_model_initialize. A
model is a set of callbacks. With this function, future additions of models
should be possible without changing the ABI.
The patch also contains a callback in CPG_MODEL_V1 for notification about
Totem membership changes.
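A minimal sketch of client-side usage of the new API (callback bodies
omitted; the CPG_MODEL_V1 layout is as described above):

    #include <stdint.h>
    #include <stddef.h>
    #include <corosync/cpg.h>

    static void deliver_fn(cpg_handle_t handle, const struct cpg_name *group,
        uint32_t nodeid, uint32_t pid, void *msg, size_t msg_len)
    {
    }

    static void confchg_fn(cpg_handle_t handle, const struct cpg_name *group,
        const struct cpg_address *members, size_t n_members,
        const struct cpg_address *left, size_t n_left,
        const struct cpg_address *joined, size_t n_joined)
    {
    }

    /* New in CPG_MODEL_V1: called on Totem membership changes. */
    static void totem_confchg_fn(cpg_handle_t handle, struct cpg_ring_id ring_id,
        uint32_t n_members, const uint32_t *members)
    {
    }

    int main(void)
    {
        cpg_handle_t handle;
        cpg_model_v1_data_t model_data = {
            .model = CPG_MODEL_V1,
            .cpg_deliver_fn = deliver_fn,
            .cpg_confchg_fn = confchg_fn,
            .cpg_totem_confchg_fn = totem_confchg_fn,
        };

        if (cpg_model_initialize(&handle, CPG_MODEL_V1,
                                 (cpg_model_data_t *)&model_data, NULL) != CS_OK)
            return 1;
        /* ... cpg_join(), cpg_dispatch(), etc. ... */
        cpg_finalize(handle);
        return 0;
    }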
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2770 fd59a12c-fef9-0310-b244-a6a79926bd2f
Add support for the MESSAGE_REQ_CPG_FINALIZE message. This allows us to
remove the cpg_pd from the list of active connections and fixes the problem
where cpg_finalize + cpg_initialize + cpg_join could result in a
CPG_ERR_EXIST error.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2676 fd59a12c-fef9-0310-b244-a6a79926bd2f
The patch handles the situation where, on one node, one process:
- joins cpg
- does some actions
- leaves cpg
- joins cpg again
This sequence can (racily) end with a broken process_info list.
To solve this problem, one more check is done in
message_handler_req_lib_cpg_join: if a process_info with the same pid and
group as the new join request already exists, CPG_ERR_TRY_AGAIN is returned.
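On the client side this simply means retrying the join; a sketch using the
library's cs_error_t naming:

    #include <string.h>
    #include <unistd.h>
    #include <corosync/cpg.h>

    /* Retry cpg_join() while the server still holds the old process_info
     * entry for this pid/group and answers "try again".
     * (name is assumed to be shorter than CPG_MAX_NAME_LENGTH) */
    static cs_error_t join_with_retry(cpg_handle_t handle, const char *name)
    {
        struct cpg_name group;
        cs_error_t err;

        group.length = (uint32_t)strlen(name);
        memcpy(group.value, name, group.length);

        do {
            err = cpg_join(handle, &group);
            if (err == CS_ERR_TRY_AGAIN)
                usleep(100000);   /* back off briefly before retrying */
        } while (err == CS_ERR_TRY_AGAIN);

        return err;
    }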
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2675 fd59a12c-fef9-0310-b244-a6a79926bd2f
calling cfg_track_stop. This caused corosync to crash.
The extra list_empty() check is also redundant because the same check is done in remove_ci_from_shutdown().
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2655 fd59a12c-fef9-0310-b244-a6a79926bd2f
- if a single node is booted with votequorum loaded then
corosync-quorumtool shows zero nodes and no votes.
- votequorum doesn't always tell the main quorum module when a new node
  has joined the cluster (principally itself; this bug is actually tied
  into the above)
I've also added quorum to the default list of services. As quorum has
been decoupled from sync, it will not interfere with normal operations as
it used to do, and it makes more sense to have it there than not.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2510 fd59a12c-fef9-0310-b244-a6a79926bd2f