Send CPG_REASON_PROCDOWN when a process leaves without cpg_leave()
Our manual pages are clear:
CPG_REASON_PROCDOWN - the process left a group without calling
cpg_leave().
Currently, we send CPG_REASON_LEAVE in that situation.
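The distinction can be sketched in C as follows. This is a minimal illustration, not the corosync implementation: the enum, its numeric values, the `called_cpg_leave` flag and the `leave_reason()` helper are all hypothetical names made up for this sketch. Only the two reason names come from the manual page text quoted above.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two reasons named in the manual page.
 * The numeric values are illustrative only. */
enum sketch_reason {
    SKETCH_REASON_LEAVE,    /* the process called cpg_leave() */
    SKETCH_REASON_PROCDOWN  /* the process left without calling cpg_leave() */
};

/* Pick the reason reported to the remaining group members.
 * called_cpg_leave is an assumed flag that records whether the
 * departing process asked to leave explicitly. */
enum sketch_reason leave_reason(bool called_cpg_leave)
{
    return called_cpg_leave ? SKETCH_REASON_LEAVE : SKETCH_REASON_PROCDOWN;
}
```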
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2946 fd59a12c-fef9-0310-b244-a6a79926bd2f
Problem:
Under certain circumstances cpg does not send group leave messages.
To reproduce, use a big token timeout (tested with token == 5min):
1 start all nodes
2 start ./test/testcpg on all nodes
3 go to the node with the lowest nodeid
4 ifconfig <int> down && killall -9 corosync && /etc/init.d/corosync restart && ./testcpg
5 the other nodes will not get the cpg leave event
6 testcpg reports an extra cpg group (basically one was not removed)
Solution:
If a member gets removed using the new trans_list and
that member is the node used for syncing (lowest nodeid)
then the next lowest node needs to be chosen for syncing.
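The selection rule can be sketched like this. It is a simplified illustration under stated assumptions, not the code from the patch: `sketch_lowest_nodeid()` and `sketch_pick_sync_node()` are hypothetical helpers, and the trans_list is modelled as a plain non-empty array of node ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return the lowest node id in a non-empty list. */
uint32_t sketch_lowest_nodeid(const uint32_t *list, size_t entries)
{
    assert(entries > 0);
    uint32_t lowest = list[0];
    for (size_t i = 1; i < entries; i++) {
        if (list[i] < lowest) {
            lowest = list[i];
        }
    }
    return lowest;
}

/* Keep the old sync node if it is still in trans_list; otherwise it
 * was removed, so fall back to the lowest node id that remains. */
uint32_t sketch_pick_sync_node(uint32_t old_sync_node,
                               const uint32_t *trans_list, size_t entries)
{
    for (size_t i = 0; i < entries; i++) {
        if (trans_list[i] == old_sync_node) {
            return old_sync_node;
        }
    }
    return sketch_lowest_nodeid(trans_list, entries);
}
```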
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2785 fd59a12c-fef9-0310-b244-a6a79926bd2f
Patch adds a new function to initialize cpg, cpg_model_initialize. A model
is a set of callbacks. With this function, future additions of models
should be possible without changing the ABI.
Patch also contains a callback in CPG_MODEL_V1 for notification about
Totem membership changes.
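The general idea of a versioned callback model can be sketched as below. This is not the real cpg header: every `sketch_` type and function is a hypothetical stand-in, and the real cpg_model_initialize signature is not reproduced here. The sketch only shows how a model tag at the start of the data lets one entry point accept future models.

```c
#include <stddef.h>

/* Hypothetical model tag; new models would add new values. */
typedef enum { SKETCH_MODEL_V1 = 1 } sketch_model_t;

/* Common header shared by every model's data structure. */
typedef struct {
    sketch_model_t model;
} sketch_model_data_t;

typedef void (*sketch_deliver_fn_t)(const void *msg, size_t msg_len);
typedef void (*sketch_totem_confchg_fn_t)(const unsigned int *members,
                                          size_t entries);

/* Version 1 of the model: the tag comes first, then the callbacks. */
typedef struct {
    sketch_model_t model;
    sketch_deliver_fn_t deliver_fn;
    sketch_totem_confchg_fn_t totem_confchg_fn;
} sketch_model_v1_data_t;

typedef struct {
    sketch_model_v1_data_t v1;
} sketch_handle_t;

/* One entry point for all models. Unknown models are rejected, so a
 * newer caller fails cleanly against an older library.
 * Returns 0 on success, -1 on error. */
int sketch_model_initialize(sketch_handle_t *handle, sketch_model_t model,
                            const sketch_model_data_t *data)
{
    if (handle == NULL || data == NULL) {
        return -1;
    }
    switch (model) {
    case SKETCH_MODEL_V1:
        /* The caller must really have passed a v1 structure. */
        handle->v1 = *(const sketch_model_v1_data_t *)data;
        handle->v1.model = model;
        return 0;
    default:
        return -1;
    }
}
```

A caller fills in a `sketch_model_v1_data_t` and passes its address cast to `const sketch_model_data_t *`.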
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2770 fd59a12c-fef9-0310-b244-a6a79926bd2f
Add support for the MESSAGE_REQ_CPG_FINALIZE message. This allows us to
remove cpg_pd from the list of active connections, and fixes the problem where
cpg_finalize + cpg_initialize + cpg_join can result in a CPG_ERR_EXIST
error.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2676 fd59a12c-fef9-0310-b244-a6a79926bd2f
Patch handles the situation where, on one node, one process does the following:
- joins cpg
- does some actions
- leaves cpg
- joins cpg again
This sequence is racy and can end with a broken process_info list.
To solve this problem, one more check is done in
message_handler_req_lib_cpg_join: if a process_info with the same pid and
group as the new join request exists, CPG_ERR_TRY_AGAIN is returned.
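The extra check can be sketched as follows. This is a minimal illustration assuming the process_info list is a plain array; the `sketch_` struct, enum and function are hypothetical and do not mirror the real message_handler_req_lib_cpg_join code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes for this sketch only. */
enum sketch_join_result { SKETCH_JOIN_OK, SKETCH_JOIN_TRY_AGAIN };

/* Hypothetical, simplified process_info entry. */
struct sketch_process_info {
    unsigned int pid;
    const char *group;
};

/* Refuse a join while an entry with the same pid and group is still
 * on the list, i.e. the earlier leave has not been processed yet. */
enum sketch_join_result sketch_check_join(
    const struct sketch_process_info *list, size_t entries,
    unsigned int pid, const char *group)
{
    for (size_t i = 0; i < entries; i++) {
        if (list[i].pid == pid && strcmp(list[i].group, group) == 0) {
            return SKETCH_JOIN_TRY_AGAIN;
        }
    }
    return SKETCH_JOIN_OK;
}
```

The caller is expected to retry the join after a TRY_AGAIN result, once the stale entry has been removed.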
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2675 fd59a12c-fef9-0310-b244-a6a79926bd2f
calling cfg_track_stop. This caused corosync to crash.
The extra list_empty() check is redundant too because it also happens in remove_ci_from_shutdown()
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2655 fd59a12c-fef9-0310-b244-a6a79926bd2f
- if a single node is booted with votequorum loaded then
corosync-quorumtool shows zero nodes and no votes.
- votequorum doesn't always tell the main quorum module when a new node
has joined the cluster (principally itself; this bug is actually tied
to the one above)
I've also added quorum to the default list of services. As quorum has
been decoupled from sync it will not interfere with normal operations as
it used to do and it makes more sense to have it there than not.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2510 fd59a12c-fef9-0310-b244-a6a79926bd2f
These functions allow iterating over the available cpg groups
and their members. The API is modelled on the ckpt iteration
functions.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2399 fd59a12c-fef9-0310-b244-a6a79926bd2f
The root of the theoretical problem is that cpg_join or cpg_leave
messages can be sent via the C APIs between the two synchronization steps.
With the current cpg, synchronization happens in confchg_fn, and then later
in cpg_sync_process. cpg_sync_process is called much later than
confchg_fn, which opens a small window of time in which cpg_join and
cpg_leave operations queued in totem (but not yet ordered by totem) can
interact with the synchronization process. Synchronization should happen
as one atomic operation, but currently it is two distinct operations.
This patch deletes confchg_fn and sends the joinlist/downlist
in cpg_sync_process.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2365 fd59a12c-fef9-0310-b244-a6a79926bd2f
This patch allows only one connection per (node, pid, grp_name) tuple.
This means you cannot make more than one connection from one process to the
same group_name. This is (I hope) how cpg should behave. If you
try to do that, a CPG_ERR_EXIST error is returned.
Of course, there is no problem with creating:
- more connections with the same (pid, grp) if nodeid is different
- more connections with the same (node, grp) if pid is different (for example
after fork, or two distinct processes)
- more connections with the same (node, pid) if grp is different (connect
one process to more cpgs).
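The uniqueness rule and the three allowed cases can be sketched as below. This is an illustration only: the `sketch_` types and functions are hypothetical, and existing connections are modelled as a plain array rather than the lists used in cpg.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes for this sketch only. */
enum sketch_conn_result { SKETCH_CONN_OK, SKETCH_CONN_EXIST };

/* Hypothetical (node, pid, grp_name) tuple. */
struct sketch_conn {
    uint32_t nodeid;
    uint32_t pid;
    const char *grp_name;
};

/* Two connections clash only if all three fields match. */
static bool sketch_same_tuple(const struct sketch_conn *a,
                              const struct sketch_conn *b)
{
    return a->nodeid == b->nodeid &&
           a->pid == b->pid &&
           strcmp(a->grp_name, b->grp_name) == 0;
}

/* Reject a candidate whose full tuple is already connected. */
enum sketch_conn_result sketch_check_connection(
    const struct sketch_conn *conns, size_t entries,
    const struct sketch_conn *candidate)
{
    for (size_t i = 0; i < entries; i++) {
        if (sketch_same_tuple(&conns[i], candidate)) {
            return SKETCH_CONN_EXIST;
        }
    }
    return SKETCH_CONN_OK;
}
```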
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2364 fd59a12c-fef9-0310-b244-a6a79926bd2f
This patch fixes the situation where, in the middle of
sync, some node sends a regular message before
another node has received the confchg message, and the regular
message is delivered to the application. From the application's
point of view, the sending node is unknown, so it doesn't expect
any messages from it.
Now, no such messages are delivered to the application.
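The filtering rule can be sketched like this. It is a simplified illustration, not the patch itself: `sketch_should_deliver()` is a hypothetical helper, and the set of nodes already announced to the application is modelled as a plain array of node ids.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Deliver a regular message only if the application has already been
 * told about the sender through a configuration change. */
bool sketch_should_deliver(uint32_t sender_nodeid,
                           const uint32_t *known_members, size_t entries)
{
    for (size_t i = 0; i < entries; i++) {
        if (known_members[i] == sender_nodeid) {
            return true;
        }
    }
    return false; /* sender not yet known to the application: drop */
}
```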
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2332 fd59a12c-fef9-0310-b244-a6a79926bd2f
This is needed as the objdb order will change as modules are loaded/unloaded and is
also set up to unload non-default services last (which is the opposite of what
something like Pacemaker needs).
In the worst case, the current behavior leads to cluster services (dlm, ocfs2, etc)
failing during shutdown. This patch also ensures that if, for example, cpg is unloaded
then anything that depends on it is unloaded first.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2224 fd59a12c-fef9-0310-b244-a6a79926bd2f
It would probably be tidier to detect the OS platforms which don't do this instead of hardcoding
a specific platform we intend to port to.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2221 fd59a12c-fef9-0310-b244-a6a79926bd2f
which is already implicitly zero.
It also adds VOTEQUORUM_NODEID_QDEVICE and makes the code that checks
for them more generic. This now allows you to change the number of votes
assigned to a quorum disk (for example).
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2199 fd59a12c-fef9-0310-b244-a6a79926bd2f
send out a JOIN message with our node removed. This should
speed up the case where a lot of nodes leave at the same time as
they don't need to wait for the token timeout for each node.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2187 fd59a12c-fef9-0310-b244-a6a79926bd2f