Commit Graph

3016 Commits

Author SHA1 Message Date
Steven Dake
c6c76cdf82 Remove dead code in sam test agent
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-02-07 08:42:58 -07:00
Fabio M. Di Nitto
cff57430d6 votequorum: fix quorum_ringid setting before any delivery occurs
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2012-02-07 14:07:09 +01:00
Angus Salkeld
db70e14fcd Make sure ipc functions return CS_ERR_TRY_AGAIN and not CS_ERR_TIMEOUT
This is because most applications that use corosync do not test
for TIMEOUT but only for TRY_AGAIN.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-and-Tested-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-02-07 20:21:08 +11:00
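A minimal sketch of the caller-side pattern behind db70e14fcd: applications loop only on TRY_AGAIN, so a library that reported a transient condition as TIMEOUT would make them fail instead of retrying. The status enum, `call_with_retry`, and `flaky_call` are hypothetical stand-ins, not corosync's real cs_error_t values or API.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical stand-in status codes; NOT the values from corosync headers. */
typedef enum {
    DEMO_OK,
    DEMO_ERR_TIMEOUT,
    DEMO_ERR_TRY_AGAIN
} demo_error_t;

/* Hypothetical IPC call signature used only for illustration. */
typedef demo_error_t (*demo_ipc_fn)(void *ctx);

/* Retry while the call reports TRY_AGAIN, sleeping 10 ms between attempts.
 * Any other status (success or a hard error such as TIMEOUT) is returned
 * immediately, which is how typical callers treat TIMEOUT. */
demo_error_t call_with_retry(demo_ipc_fn fn, void *ctx, int max_attempts)
{
    struct timespec delay = { 0, 10L * 1000 * 1000 };
    demo_error_t err = DEMO_ERR_TRY_AGAIN;

    assert(fn != NULL && max_attempts > 0);
    for (int i = 0; i < max_attempts; i++) {
        err = fn(ctx);
        if (err != DEMO_ERR_TRY_AGAIN)
            break;
        if (i + 1 < max_attempts)
            nanosleep(&delay, NULL);   /* back off before the next try */
    }
    return err;
}

/* Demo stand-in: returns TRY_AGAIN until *remaining reaches zero. */
static demo_error_t flaky_call(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        (*remaining)--;
        return DEMO_ERR_TRY_AGAIN;
    }
    return DEMO_OK;
}
```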
Angus Salkeld
ac498ca97a Remove deprecated function qb_util_set_log_function()
Use the standard qb_log api.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-02-07 10:53:56 +11:00
Angus Salkeld
8992acb815 LOG: add libqb as a "subsys"
So we can see libqb internal logs

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-02-07 10:53:56 +11:00
Jan Friesse
546aea23cf cmap: Check RO flag in adjust int function
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-02-06 16:37:00 +01:00
Jan Friesse
c21f9573ce CMAP man pages
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-02-06 16:36:57 +01:00
Jiaju Zhang
dd9e177af7 CPG: Send CPG_REASON_PROCDOWN when really needed
This patch fixes an issue where, if cpg_finalize() was called just
after cpg_leave(), CPG_REASON_PROCDOWN could be sent even though
CPG_REASON_LEAVE had already been sent.
This behavior does not match what the man page describes:
"CPG_REASON_PROCDOWN - the process left a group without calling
cpg_leave()."
It also confuses CPG's clients, because a single process leaving
results in two different reasons being sent.

The root cause is that cpg_leave() returns after adding the LEAVE
message to the sending queue, but before the cpg's group name has
been cleared. If cpg_finalize() is called in that window, it decides
whether cpg_leave() was called solely by checking the cpg's group
name, which is not sufficient.

Signed-off-by: Jiaju Zhang <jjzhang@suse.de>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-02-06 08:07:54 -07:00
Fabio M. Di Nitto
9fa86486e9 quorumtool: fix return codes for show_status and monitor
the correct return codes are:
 1 if node is quorate
 0 if node is not quorate
-1 if there is any error gathering info on the node

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2012-02-03 10:33:33 +01:00
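A sketch of the return-code convention listed in 9fa86486e9. The status enum and `status_to_return_code` are hypothetical; only the 1/0/-1 mapping comes from the commit.

```c
#include <assert.h>

/* Hypothetical result of querying the node's quorum state. */
typedef enum { QSTAT_QUORATE, QSTAT_NOT_QUORATE, QSTAT_ERROR } qstatus_t;

/* 1 = quorate, 0 = not quorate, -1 = error gathering info. */
int status_to_return_code(qstatus_t s)
{
    assert(s == QSTAT_QUORATE || s == QSTAT_NOT_QUORATE || s == QSTAT_ERROR);
    switch (s) {
    case QSTAT_QUORATE:
        return 1;
    case QSTAT_NOT_QUORATE:
        return 0;
    default:
        return -1;
    }
}
```

If a tool returns -1 from main(), the shell sees exit status 255, because only the low 8 bits of the status are reported.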
Fabio M. Di Nitto
261dc6219f quorumtools: fix nodes display on status
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2012-02-03 10:33:33 +01:00
Fabio M. Di Nitto
3b77dd9d83 votequorum: fix expected votes manual override from quorumtools
votequorum internal quorum/expected_vote check was slightly too
conservative and was not done correctly when leave_remove feature
is enabled.

this fix allows admins to effectively override expected_votes
and drive ev_barrier as expected.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2012-02-03 10:33:33 +01:00
Jan Friesse
0929dcb68c Better checks of integer values in coroparse
Instead of atoi, strtol is used. This allows detection of typical
problems like empty value of key and incorrectly entered numbers.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-02-03 09:16:43 +01:00
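A minimal sketch of a strtol-based check like the one 0929dcb68c describes. It rejects the cases the commit mentions (an empty value, incorrectly entered numbers), which atoi() silently turns into 0 or a partial number. `parse_int_value` and its range arguments are hypothetical, not coroparse's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Strictly parse a decimal integer config value. Returns 0 and stores the
 * value in *out on success; returns -1 for an empty value, no digits,
 * trailing garbage (e.g. "12abc"), overflow, or a value outside [min, max]. */
int parse_int_value(const char *str, long min, long max, long *out)
{
    char *end;
    long val;

    assert(out != NULL);
    if (str == NULL || *str == '\0')
        return -1;              /* empty value */

    errno = 0;
    val = strtol(str, &end, 10);
    if (errno == ERANGE)
        return -1;              /* does not fit in a long */
    if (end == str || *end != '\0')
        return -1;              /* no digits, or trailing characters */
    if (val < min || val > max)
        return -1;

    *out = val;
    return 0;
}
```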
Fabio M. Di Nitto
230231fedb votequorum: add runtime internal data to icmap runtime.votequorum.*
specifically ev_barrier, two_node, lowest_node_id and wait_for_all_status
are values that change internally at runtime, and keeping track
of them makes debugging much easier, especially when LOG_DEBUG is not
set.

Also track our node id.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-By: Christine Caulfield <ccaulfie@redhat.com>
2012-02-02 16:36:57 +01:00
Jan Friesse
8e58459176 Wait for corosync-notifyd exit in init script
Without waiting for corosync-notifyd to really exit, it can happen that the
newly started corosync-notifyd is killed. To prevent this, stop now waits for
the process to die before the stop function returns.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-02-02 09:30:49 +01:00
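The fix in 8e58459176 lives in the shell init script; for consistency with the other examples, the same poll-until-gone idea is sketched in C. `wait_for_exit` and its polling parameters are hypothetical and not part of the script. Note that kill(pid, 0) still succeeds for a zombie whose parent has not reaped it yet.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <time.h>
#include <unistd.h>   /* pid_t */

/* Poll until no process with this pid exists, checking at most max_polls
 * times, poll_ms milliseconds apart. Returns true once the pid is gone.
 * kill(pid, 0) sends no signal; it only checks whether the pid exists. */
bool wait_for_exit(pid_t pid, int max_polls, long poll_ms)
{
    struct timespec delay = { poll_ms / 1000, (poll_ms % 1000) * 1000000L };

    assert(pid > 0 && max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        if (kill(pid, 0) == -1 && errno == ESRCH)
            return true;        /* no such process: it has exited */
        nanosleep(&delay, NULL);
    }
    return false;               /* still alive after all polls */
}
```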
Jan Friesse
33e5ce8d56 Show correct error when open of logfile failed
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-02-02 09:30:49 +01:00
Jan Friesse
a80febda7e Store error str if can't open logfile
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-02-02 09:30:49 +01:00
Angus Salkeld
af9cfc7b55 IPC: reference count the connection whilst flushing the outq
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-02-02 11:34:26 +11:00
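Commit af9cfc7b55 has no body; as a hedged illustration of the general technique its subject names, this sketch takes an extra reference on a connection for the duration of a flush, so a callback that drops the owner's reference mid-flush cannot free the object under the loop. `struct demo_conn` and its functions are hypothetical and single-threaded; a multithreaded version would need atomic counts or a lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical connection object; not libqb's or corosync's real type. */
struct demo_conn {
    int refcount;
    int queued;        /* messages waiting in the outq */
};

struct demo_conn *demo_conn_new(int queued)
{
    struct demo_conn *c = malloc(sizeof(*c));

    if (c != NULL) {
        c->refcount = 1;   /* the owner's reference */
        c->queued = queued;
    }
    return c;
}

void demo_conn_ref(struct demo_conn *c)
{
    c->refcount++;
}

/* Drop a reference; frees the object when the last one goes.
 * Returns true if the object was freed. */
bool demo_conn_unref(struct demo_conn *c)
{
    assert(c->refcount > 0);
    if (--c->refcount == 0) {
        free(c);
        return true;
    }
    return false;
}

/* Hold an extra reference for the whole flush. */
int demo_flush_outq(struct demo_conn *c)
{
    int sent = 0;

    demo_conn_ref(c);
    while (c->queued > 0) {
        c->queued--;       /* "send" one message */
        sent++;
    }
    demo_conn_unref(c);
    return sent;
}
```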
Fabio M. Di Nitto
ab9986cb96 build: fix make dist and make rpm
do some cleanup to include all files that need to be shipped
and honor conditional builds properly

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
2012-02-01 11:24:56 +01:00
Angus Salkeld
45cb05f1ad IPC: allow for failures in the connection_created callback
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-02-01 08:51:13 +11:00
Fabio M. Di Nitto
46b7b155a4 votequorum: add leave_remove option
this also cleans up NODESTATE for good; JOINING was never used

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2012-01-31 16:58:08 +01:00
Fabio M. Di Nitto
c16086bead votequorum: honor onwire node flags change
internal flags were not propagated correctly in the node status

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-01-31 10:20:32 +01:00
Fabio M. Di Nitto
9fa83dabbe quorum: fix load/unload priority for quorum services
all main services are loaded at priority 1.
vfs_quorum and votequorum did not specify a priority and
automatically defaulted to 0, which has the special meaning
of being loaded last and unloaded last.

this is not correct behavior and limits what votequorum
can do at shutdown, for example notifying other nodes that
it is leaving (something that cannot be gathered from the
totem membership change callback).

fix vfs_quorum to load at priority 1 like the other
default services and bump votequorum to 2 (it needs to
unload before everything else currently known).

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-01-31 10:16:52 +01:00
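A sketch of the ordering described in 9fa83dabbe, assuming nonzero priorities load in ascending order and unload in the reverse order. The commit itself only states that priority 0 loads last and unloads last, that the main services use 1, and that votequorum at 2 must unload first. The struct and sort helpers are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical service descriptor for this sketch. */
struct demo_service {
    const char *name;
    unsigned int priority;   /* 0 = special: loaded last, unloaded last */
};

/* Load order: ascending priority, except that 0 always goes last. */
static int cmp_load(const void *a, const void *b)
{
    unsigned int pa = ((const struct demo_service *)a)->priority;
    unsigned int pb = ((const struct demo_service *)b)->priority;

    if (pa == pb)
        return 0;
    if (pa == 0)
        return 1;
    if (pb == 0)
        return -1;
    return pa < pb ? -1 : 1;
}

/* Unload order: descending priority; 0 is the lowest value, so it
 * naturally ends up last. */
static int cmp_unload(const void *a, const void *b)
{
    unsigned int pa = ((const struct demo_service *)a)->priority;
    unsigned int pb = ((const struct demo_service *)b)->priority;

    return (pa < pb) - (pa > pb);
}

void sort_for_load(struct demo_service *svcs, size_t n)
{
    assert(svcs != NULL);
    qsort(svcs, n, sizeof(*svcs), cmp_load);
}

void sort_for_unload(struct demo_service *svcs, size_t n)
{
    assert(svcs != NULL);
    qsort(svcs, n, sizeof(*svcs), cmp_unload);
}
```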
Fabio M. Di Nitto
a2b960d109 service: fix service unload regression introduced by lcrso dropping
service exec_exit_fn was not honored because the loop was looking
into the wrong icmap key

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-01-31 10:16:16 +01:00
Fabio M. Di Nitto
811c536653 votequorum: fix possible string overflow (-1) in qdevice_register
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-01-31 10:15:47 +01:00
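Commit 811c536653 does not show its diff, so this is only the generic pattern behind this class of bug (a missing `-1` on a buffer length), not the actual patch; `copy_name` is hypothetical. strncpy(dst, src, sizeof(dst)) leaves dst unterminated when src is too long; reserving one byte for the NUL avoids that.

```c
#include <assert.h>
#include <string.h>

/* Copy src into a fixed-size buffer, always leaving room for the
 * terminating NUL. Returns 0 if src fit, -1 if it was truncated. */
int copy_name(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    assert(dst != NULL && src != NULL && dst_size > 0);
    len = strlen(src);
    if (len >= dst_size) {
        memcpy(dst, src, dst_size - 1);   /* copy at most size - 1 bytes */
        dst[dst_size - 1] = '\0';         /* terminate explicitly */
        return -1;
    }
    memcpy(dst, src, len + 1);            /* includes the NUL */
    return 0;
}
```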
Fabio M. Di Nitto
fc61b20a8a votequorum: drop unnecessary flags
code inspection shows that those internal flags are never used

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-01-31 10:14:19 +01:00
Angus Salkeld
4c4a5241bc CTS: cleanup the cpg test agent
improve the diagnostic log messages

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-01-31 11:58:23 +11:00
Angus Salkeld
1a0ce3b4c2 CTS: make the systemd logic more reliable
rely on positive logic as there can be multiple
failure reasons.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
2012-01-31 11:58:23 +11:00
Steven Dake
007e5c9458 Honor exec_init_fn call
exec_init_fn now returns either NULL (success) or a string which describes
the error that occurred during service engine initialization.  If an error
occurs, corosync will exit.  This patch also adds ykd and incorporates other
suggestions from Fabio Di Nitto.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Fabio Di Nitto <fdinitto@redhat.com>
2012-01-30 14:05:09 -07:00
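A sketch of the exec_init_fn contract stated in 007e5c9458: NULL on success, an error string otherwise, and corosync exits on error. `struct demo_engine`, `init_all`, and the two demo init functions are hypothetical; only the return-value convention comes from the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical engine table entry. */
struct demo_engine {
    const char *name;
    const char *(*exec_init_fn)(void);   /* NULL on success, else error */
};

/* Run every init function in order. Returns NULL if all succeeded,
 * otherwise the first error string, with *failed set to the engine name.
 * The caller is expected to log the error and exit. */
const char *init_all(const struct demo_engine *engines, size_t n,
                     const char **failed)
{
    assert(engines != NULL);
    for (size_t i = 0; i < n; i++) {
        const char *err = engines[i].exec_init_fn != NULL ?
                          engines[i].exec_init_fn() : NULL;
        if (err != NULL) {
            if (failed != NULL)
                *failed = engines[i].name;
            return err;
        }
    }
    return NULL;
}

/* Demo init functions. */
static const char *ok_init(void) { return NULL; }
static const char *bad_init(void) { return "could not allocate state"; }
```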
Steven Dake
ba480ce908 Return an exit code of 1 if an interface is faulty in corosync-cfgtool
Signed-off-by: Oren Nechustan <theoren28@hotmail.com>
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Fabio Di Nitto <fdinitto@redhat.com>
2012-01-30 06:30:41 -07:00
Angus Salkeld
2b66a7aa51 cmap: add iterator finalize
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-01-30 23:07:44 +11:00
Angus Salkeld
4c98780f89 cmap: add -D option to getopt
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-01-30 23:07:44 +11:00
Fabio M. Di Nitto
ccd36af00e votequorum: rename qdisk to qdevice
a quorum device is not necessarily a disk, and this also aligns
various names to be generic

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-By: Christine Caulfield <ccaulfie@redhat.com>
2012-01-27 11:17:02 +01:00
Fabio M. Di Nitto
769fc913f3 quorum: drop quorum.quorate config option
it's unused / unnecessary

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-By: Christine Caulfield <ccaulfie@redhat.com>
2012-01-27 11:16:36 +01:00
Angus Salkeld
b5f643b507 CTS: add test VoteQuorumWaitForAll
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-01-27 20:59:22 +11:00
Angus Salkeld
1dea860e14 CTS: remove test service config
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-01-27 20:59:22 +11:00
Angus Salkeld
3177e1b421 augeas: update the lense (rm amf & add update quorum options)
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-01-27 20:59:22 +11:00
Angus Salkeld
3f20f546f8 CTS: be consistent with the cpg group name
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-01-27 20:59:22 +11:00
Angus Salkeld
2c242d92b6 CTS: make the status command more accurate
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-01-27 20:59:22 +11:00
Angus Salkeld
038b77a175 CTS: remove SamTestQuorum as there is not test_quorum anymore
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-01-27 20:59:22 +11:00
Angus Salkeld
bc9ed0b4be CTS: ignore blackbox shm
(only whilst running, as it is still visible)

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-01-27 20:59:22 +11:00
Angus Salkeld
a8496b1ac5 CTS: delete resources recursively
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-01-27 20:59:22 +11:00
Angus Salkeld
a39055648c CTS: init votequorum by default
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-01-27 20:59:22 +11:00
Angus Salkeld
d226614084 CTS: account for change in sam resource path.
This was:
process_name:pid
now:
pid

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-01-27 17:41:21 +11:00
Angus Salkeld
df06e98298 CTS: handle socket exceptions better
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-01-27 17:41:21 +11:00
Angus Salkeld
1331c43075 CTS: fix shell script variable name
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-01-27 17:41:21 +11:00
Fabio M. Di Nitto
5e4c02bd36 update TODO list
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2012-01-26 14:32:54 +01:00
Fabio M. Di Nitto
b05477859f votequorum: fix expected_votes propagation
it is not correct to randomly accept expected_votes from any node in
the cluster. We can only allow expected_votes from quorate nodes.

A quorate cluster is "always" right and has the correct expected_votes.

One of several bug triggers:

quorum {
  expected_votes: 8
  auto_tie_breaker: 1
  last_man_standing: 1
}

start all 8 nodes.
clean shut down 2 nodes.
wait for lms to kick in.
kill 3 nodes with highest nodeid
(we want to retain a quorate partition of 3 nodes)
start one node again -> cluster will be unquorate

This happens because the node rebooting/rejoining with a
non-current view of the cluster status will propagate an expected_votes
of 8, while in reality the cluster is down to expected_votes: 3.

4 nodes are still < 5 (quorum for 8 nodes/votes).

In order to avoid this condition, we need to exchange expected_votes
information among nodes but we cannot randomly trust everybody.

1) Allow expected_votes to be changed cluster-wide only if the
   information is coming from a quorate node.
2) Fix node->expected_votes based on quorate status
3) allow a joining node to decrease quorum and expected_votes
   if the node is not yet quorate, but it's joining a quorate
   cluster

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2012-01-26 14:32:54 +01:00
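A worked version of the arithmetic in b05477859f, as a simplified sketch: quorum is taken as a strict majority of expected votes (expected_votes / 2 + 1, which gives the commit's 5 for 8), and rule 1 is reduced to accepting a proposed expected_votes only from a quorate sender. The function names are hypothetical, and options from the example such as last_man_standing are not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Votes needed for quorum: a strict majority of expected votes. */
unsigned int quorum_for(unsigned int expected_votes)
{
    assert(expected_votes > 0);
    return expected_votes / 2 + 1;
}

bool is_quorate(unsigned int total_votes, unsigned int expected_votes)
{
    return total_votes >= quorum_for(expected_votes);
}

/* Only take a new expected_votes value from a node that is itself
 * quorate; otherwise keep the current one. On a joining, not-yet-quorate
 * node the same rule lets a quorate sender lower the value (rule 3). */
unsigned int merge_expected_votes(unsigned int current,
                                  unsigned int proposed,
                                  bool sender_quorate)
{
    return sender_quorate ? proposed : current;
}
```

With the stale value of 8, four nodes need 5 votes and stay unquorate; once the stale proposal is ignored, the partition keeps expected_votes at 3 (quorum 2).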
Fabio M. Di Nitto
88e6830df1 votequorum: fix auto_tie_breaker design and simplify code a lot
auto_tie_breaker needs to know the lowest node id in the currently
quorate partition, not in the whole cluster.

this allows us to determine the lowest node id as soon as we are quorate
and removes the complexity of reading it from WFA or nodelist. At
the same time it adds the flexibility for dynamic nodeids in a cluster.

drop requirement on WFA if nodelist is not specified

update man page

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2012-01-26 14:32:54 +01:00
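A sketch of the two pieces 88e6830df1 touches: computing the lowest node id from the members of the current quorate partition and, as an assumption about how that value is later used, letting the half of an even split that still contains that id keep quorum. Both functions are hypothetical and simplified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Lowest node id among the members of the current partition; works with
 * dynamically assigned node ids since no nodelist is consulted. */
uint32_t lowest_node_id(const uint32_t *members, size_t n)
{
    uint32_t low;

    assert(members != NULL && n > 0);
    low = members[0];
    for (size_t i = 1; i < n; i++)
        if (members[i] < low)
            low = members[i];
    return low;
}

/* Hypothetical even-split decision: the half that still contains the
 * recorded lowest node id keeps quorum. */
bool wins_tie(const uint32_t *members, size_t n, uint32_t recorded_low)
{
    for (size_t i = 0; i < n; i++)
        if (members[i] == recorded_low)
            return true;
    return false;
}
```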
Fabio M. Di Nitto
40aa40ed84 votequorum: drop NODESTATE_LEAVING
this is another leftover from the cman compatibility layer

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2012-01-26 14:32:54 +01:00
Fabio M. Di Nitto
f25d5829f2 update TODO list
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2012-01-25 14:06:27 +01:00