When node is paused and other nodes has in meantime exited cpg process,
paused node after resume doesn't update it's membership correctly so on
previously paused node exited cpg process is still visible.
Solution is to compare join list with cpd and remove all pids which are
not included in join list.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Also number is prefixed by 0x so it's easier to spot that number is
hexadecimal.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Most of functionality is moved to do_proc_leave function to make it
reusable.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Code for zero-copy in cpg does following mmaps:
- Mmap anonymous, private memory to some address (-> malloc)
- Mmap shared memory of fd to address returned by first mmap
(effectively shadows first mapping)
This is not necessary and only one mapping is needed.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Messages which are flow messages, rather then lifecycle are now logged
in trace level.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
drop all SOLARIS specific ifdefs and replace them with feature checks
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
IPC is using buffer of CS_MAX_NAME_LENGTH for name. If user calls
function with longer string, such string can be passed to service
incomplete.
Solution is to not allow string larger then CS_MAX_NAME_LENGTH
and return error.
Same applies to cpg service.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Let's say we have 2 nodes:
- node 2 is paused
- node 1 create membership (one node)
- node 2 is unpaused
Result is that node 1 downlist is selected, so it means that from node 2
point of view, node 1 was never down.
Patch solves situation by adding additional check for largest previous
membership.
So current tests are:
1) largest (previous #nodes - #nodes know to have left)
2) (then) largest previous membership
3) (and last as a tie-breaker) node with smallest nodeid
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
In downlist and joinlist debug output group was printed in nonsense
format of integer to pointer to array.
Now it's printed by full name.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
let's say following situation will happen:
- we have 3 nodes
- on wire messages looks like D1,J1,D2,J2,D3,J3 (D is downlist, J is
joinlist)
- let's say, D1 and D3 contains node 2
- it means that J2 is applied, but right after that, D1 (or D3) is
applied what means, node 2 is again considered down
It's solved by collecting joinlists and apply them after downlist, so
order is:
- apply best matching downlist
- apply all joinlists
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Test scenario is follows:
- node 1, node 2
- node 1 is paused
- node 2 sees node 1 dead
- node 1 unpaused
- node 1 and 2 both choose same dowlist message which includes node 2 ->
node 2 is efectivelly disconnected
Patch includes additional test if left_node is localnode. If so, such
downlist is ignored.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
this change breaks onwire compatibility.
cpg is the only user of sync_* interface and it's the only
service that will require extra testing.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
We would use libqb for hashing now if we needed hashing.
cpg no longer uses jhash.h.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Fabio Di Nitto <fdinitto@redhat.com>
This patch fixes the issue that in some cases where cpg_finalize()
was called just after cpg_leave() was called, CPG_REASON_PROCDOWN
might also be sent while CPG_REASON_LEAVE had already been sent.
This behavior is not aligned with what the man page has described:
"CPG_REASON_PROCDOWN - the process left a group without calling
cpg_leave()."
And it will confuse CPG's clients in that one process left results
in two different reasons being sent.
The root cause of this issue is cpg_leave() will return after
adding the LEAVE message to the sending queue, but the cpg's group
name has not been cleared yet. Just at that time, cpg_finalize()
is being called, then it determines if there is the calling of
cpg_leave() happened only by the checking of cpg's group name, so
this method is not sufficient.
Signed-off-by: Jiaju Zhang <jjzhang@suse.de>
Reviewed-by: Steven Dake <sdake@redhat.com>
exec_init_fn now either returns NULL (success) or a string which indicates
the error that occured during service engine initialization. If an error
occurs, corosync will exit. This patch adds ykd and makes other suggestions
from Fabio Di Nitto.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Fabio Di Nitto <fdinitto@redhat.com>
These look ugly, are inconsistently done and just have
to be removed later in libqb before calling syslog.
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Quorum is broken in this patch.
service.h needs to be cleaned up significantly
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Fabio Di Nitto <fdinitto@redhat.com>
This call causes a complete list of active groups and their
membership lists to be sent to a callback function.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1571 fd59a12c-fef9-0310-b244-a6a79926bd2f
- rests of old logging removed from all code (#define LOG_SERVICE...).
- line feed added if not in message.
- new trace() function added so that trace macros adds minimum of code and runtime penalties to user code.
- ENTER_ARGS macro changed to ENTER. ENTER macro now requires arguments.
- openais.conf.5 man page updated with logger directives.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1021 fd59a12c-fef9-0310-b244-a6a79926bd2f
list went several went down.
Also, replaced totemip_equal() calls with nodeid comparisons as CPG works
entirely on nodeIDs anyway.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@1016 fd59a12c-fef9-0310-b244-a6a79926bd2f