mirror_corosync/test
Fabio M. Di Nitto cb5fd77501 votequorum: major rework to fix qdevice API and integration with core
qdevice is a very special node in the cluster and it adds a certain
amount of complexity and special cases across the code.

most of the qdevice data are shared across the cluster (name/votes)
but effectively each node has a different view of the qdevice
(registered/unregistered/voting/etc.)

with this change, we align the qdevice view across the node,
exchanging more data between nodes and we fix how qdevice behaves
and it is configured.

The only side effect is that the amount of data transmitted on wire
is slightly higher.

The qdevice API is still disabled by default. This means that
the amount of real changes in current code are a lot smaller
than it appears by this patch.

TODO: documentation/man pages needs to be updated once
      this change is in (and behavior finalized).

User visible changes:

- configuration (coroparse, exec/votequorum):
  the quorum device section is now standalone within the quorum.

  quorum {
    provider: corosync_votequorum
    device {
      model: (name)
      timeout: (millisec)
      votes:
    }
  }

  the keyword "model:" is mandatory to enable qdevice in configuration
  and should express the name of the script/daemon that will provide
  the qdevice. Looking into the future, an init script or systemd
  service will look for that name in /path/to/be/decided/name
  and start/stop qdevice.

  timeout: defines the maximum interval the qdevice implementation
  has available between poll (see votequorum_qdevice_poll.3) before
  the device is considered dead and votes discarded

  votes: is now a configuration parameter and not an API call.
  quorum devices don't care what they need to vote.
  votes is autocalculated when a nodelist is available and all
  nodes in the list vote 1. Otherwise this parameter is mandatory.

- configuration (exec/votequorum):
  startup and runtime configuration changes have been improved.
  errors at startup are considered fatal. errors at runtime
  have different exit paths.

  startup:

  * quorum.two_node and qdevice are incompatible.
  * quorum.expected_votes requires quorum.device.votes.
  * quorum.expected_votes - quorum.device.votes cannot be lower
    than 2.
  * qdevice and last_man_standing are mutually exclusive.
  * qdevice and auto_tie_breaker are mutually exclusive.

  runtime config changes:

  * quorum.two_node and qdevice are incompatible:
    if quorum device is alive, two_node is disabled.
    if quorum device is not alive and node count is 2, two_node is
       enabled, and quorum device cannot be registered

  * if either last_man_standing or auto_tie_breaker were enabled
    at startup, and at runtime quorum device is configured,
    quorum device registration will be blocked.

  * if quorum.expected_votes is configured but not quorum.device.votes,
    quorum device registration will be blocked.

  * if quorum.device.votes is not configured and we cannot
    automatically calculate it, quorum device registration will be blocked.

  * An error in configuring quorum.expected_votes and quorum.device.votes
    will block quorum device registration.

blocking quorum device registation, also means dropping the votes.

quorum.device.votes (either set or automatically calculated) is now
used to determine current expected_votes in the cluster.

- logging (exec/votequorum):

  all errors from configuration are treated as WARNING/CRITICAL.

  lots of extra DEBUG output is added (see internal changes too).

- corosync-quorumtool (tools/corosync-quorumtool):

  * added option to forcefully kick out a quorum device from the local
    node. This is for emergency recovery only and it is only
    available when qdevice API is built-in.

  * Improved status output, specifically add node state and qdevice
    information

[root@fedora-master-node2 coro]# corosync-quorumtool -s
Version:          1.99.4.12-9c7d-dirty
Quorum type:      corosync_votequorum
Nodes:            2
Ring ID:          132
Quorate:          Yes
Node votes:       1
Node state:       Member
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice
Nodeid     Votes  Name
   1     1  fedora-master-node1.int.fabbione.net
   2     1  fedora-master-node2.int.fabbione.net
   0     1  QDEVICE (Voting)

  * allow to print status for any node in the cluster known to
    local node.

[root@fedora-master-node1 coro]# corosync-quorumtool -s
Version:          1.99.4.12-9c7d-dirty
Quorum type:      corosync_votequorum
Nodes:            2
Ring ID:          144
Quorate:          Yes
Node votes:       1
Node state:       Member
Expected votes:   3
Highest expected: 3
Total votes:      2
Quorum:           2
Flags:            Quorate
Nodeid     Votes  Name
   1     1  fedora-master-node1.int.fabbione.net
   2     1  fedora-master-node2.int.fabbione.net

[root@fedora-master-node1 coro]# corosync-quorumtool -s -n 2
Version:          1.99.4.12-9c7d-dirty
Quorum type:      corosync_votequorum
Nodes:            2
Ring ID:          144
Quorate:          Yes
Node votes:       1
Node state:       Member
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice
Nodeid     Votes  Name
   1     1  fedora-master-node1.int.fabbione.net
   2     1  fedora-master-node2.int.fabbione.net
         0     1  QDEVICE (Voting)

Internal changes:

- change qdevice timer to not run all time, but only when necessary.
- change votequorum_nodeinfo on wire data to use flags instead of uint8_t
  and add QDEVICE status.
- allocate nodeid 0 to qdevice since it's the only real
  nodeid that be reserved.
- change send_nodeinfo to allow to send nodeinfo for any node
  so that we can share qdevice info across the cluster
  (and this might be useful in future if we need to sync
   internal cluster view).
- add votequorum api call to update qdevice name
- add runtime data if quorum device has been forcefully disabled
  by config error
- add qdevice votes to expected_votes calculation (this
  is probably the biggest difference vs cman)
- change votequorum_read_nodelist_configuration so that
  we can autocalculate votes for qdevice (we need the nodecount
  vs votes).
- add all checks for startup/runtime config (see above).
- do not make qdevice part of the membership_list received from
  totem. None of our users care about it and it is not a real node.
- change onwire message handlers to deal with "data for this node from any node"
  case and undersand nodeid 0 for qdevice info
- always allocate qdevice at startup. this simplifies code a lot.
- dispatch qdevice nodeinfo on membership changes.
- inform libvotequorum users when a qdevice is registered
- improve substantially qdevice api and add a simple
  barrier based on qdevice name.
- add qdevice API barrier at cluster level. This feature allow
  only one qdevice name to be active in the cluster at any time.
- qdevice getinfo can now report status for qdevice on any node.
- change slightly the way the qdevice API is built-in/out:
  only the libvotequorum calls are #ifdef'out now. Doing so in
  the core is too complex and would make the code unreadable
  with the risk of missing a bit or two effectively introducing
  an on-wire incompatibility if we will ever turn the API on.
- probably added some bugs on the way...

TODO: update qdevice_* API once the above is settled and test
      qdevice integration with other features.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com> (only second part)
2012-02-27 09:30:26 +01:00
..
scripts Split openais and corosync tree into two seperate repositories. 2008-08-05 13:23:46 +00:00
.gitignore Remove objdb and confdb 2011-12-15 09:19:18 +01:00
cpgbench.c corotype: drop deprecated CPG_ defines 2012-02-08 13:37:46 +01:00
cpgbenchzc.c TEST: fix the print out when cpg_finalize() fails 2011-08-09 10:37:15 +10:00
cpgbound.c Add test program that finds limits of cpg message size. 2009-08-17 18:22:51 +00:00
cpgverify.c Add -i <num-iterations> to cpgverify 2010-10-21 17:27:40 -07:00
evsbench.c Add ring id field to evs. 2009-07-01 20:57:37 +00:00
evsverify.c Add ring id field to evs. 2009-07-01 20:57:37 +00:00
Makefile.am build: fix fallout from swithing to common shared lib 2012-02-22 09:43:47 +01:00
stress_cpgcontext.c add stress_cpgcontext. 2009-07-10 12:54:56 +00:00
stress_cpgfdget.c Remove unused variable in stress_cpgfdget.c. 2009-09-25 06:03:50 +00:00
stress_cpgzc.c Add stress test case for cpg and coroipcc zero copy buffer alloc and free 2009-07-10 12:05:48 +00:00
testcpg2.c Remove unchecked return warning 2011-11-26 08:50:25 -07:00
testcpg.c TEST: add logging to testcpg and testevs 2012-02-14 11:10:14 +11:00
testcpgzc.c testcpgzc: fgets buffer to really allocated size 2011-06-03 11:11:28 +02:00
testevs.c TEST: add logging to testcpg and testevs 2012-02-14 11:10:14 +11:00
testevsth.c corotypes: drop deprecated EVS_ defines 2012-02-08 13:37:46 +01:00
testmake.sh remove all trailing blanks 2009-04-22 08:03:55 +00:00
testparse.c remove all trailing blanks 2009-04-22 08:03:55 +00:00
testquorum.c testquorum: check for quorum_dispatch return code 2012-02-09 16:49:25 +01:00
testsam.c Move SAM to use CMAP service 2011-12-15 09:19:18 +01:00
testtimer.c remove all trailing blanks 2009-04-22 08:03:55 +00:00
testvotequorum1.c testvotequorum: fix test loop to break if votequorum goes away 2012-02-09 16:49:25 +01:00
testvotequorum2.c votequorum: major rework to fix qdevice API and integration with core 2012-02-27 09:30:26 +01:00
testzcgc.c add __attribute__((noreturn)) to functions that always exit. 2010-05-19 04:34:53 +00:00