openindiana toolchain is rather messy. This is the first cut only
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
votequorum has no business to device if master_wins setting is correct or not.
only the qdevice can decide and should set the value for votequorum.
Logic is:
- user requests master_wins from config
- corosync starts
- qdevice starts
- qdevice reads cmap values / register with votequorum
- qdevice decides if the node can support master_wins or not and tells votequorum
- at this point votequorum can check if an unquorate node is part of the master_wins
partition
it is the qdevice responsibility to keep that value up to date in votequorum and the
value can be changed at runtime.
this commit also exchange per node master_wins information to lay down the infrastructure
to verify discrepancies in node config for master_wins (coming next on this channel).
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
it's really pointless to have basically a duplicated API call
to transfer one value and one name.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
those are all info flags.. it's redudant and inconsistent
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
also align naming of vote to cast_vote for info calls
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
this is a preparation commit for the next changes. right now it is
no more than an alias to ALIVE.
CAST_VOTE is required to support master/slave feature from qdevice.
Effectively a quorum device can be:
Not registered / registered (connected to API but nothing else is happening)
if registered:
Not alive / alive (quorum device is petting the API via poll and timer is running)
if alive:
Not voting (slave) / voting (master)
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
STATE is confusing and overloaded term in votequorum as it's used for nodes
and other bits.
make the name unique and ALIVE means that the qdevice is heartbeating
to votequorum.
improve display of the status in tools and tests.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
make the flag name explicit
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Sync/service was using maximal number of services in ehter numberic form
(magic constant) or inconsistently, this means using
SERVICE_HANDLER_MAXIMUM_COUNT which means maximal number of handlers.
New macro solves this.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Currently 14 are used, 64 seems like a waste of memory.
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
clean up a lot of allocated blocks at exit.
those changes has no runtime effects, but it makes valgrind
output a bit more useful by dropping over 700 errors/warnings to skip
over every single run.
there are still a few icmap related valgrind errors but those need
some more complex and timeconsuming investigation.
pre patch:
==21844== HEAP SUMMARY:
==21844== in use at exit: 1,229,321 bytes in 1,516 blocks
==21844== total heap usage: 7,191 allocs, 5,675 frees, 3,819,853 bytes allocated
==21844== LEAK SUMMARY:
==21844== definitely lost: 3,617 bytes in 11 blocks
==21844== indirectly lost: 21,960 bytes in 11 blocks
==21844== possibly lost: 1,080,101 bytes in 131 blocks
==21844== still reachable: 123,643 bytes in 1,363 blocks
==21844== suppressed: 0 bytes in 0 blocks
==21844== ERROR SUMMARY: 136 errors from 136 contexts (suppressed: 0 from 0)
post patch:
==25793== HEAP SUMMARY:
==25793== in use at exit: 1,185,870 bytes in 808 blocks
==25793== total heap usage: 9,427 allocs, 8,619 frees, 4,156,841 bytes allocated
==25793== LEAK SUMMARY:
==25793== definitely lost: 3,697 bytes in 12 blocks
==25793== indirectly lost: 22,248 bytes in 13 blocks
==25793== possibly lost: 1,079,655 bytes in 113 blocks
==25793== still reachable: 80,270 bytes in 670 blocks
==25793== suppressed: 0 bytes in 0 blocks
==25793== ERROR SUMMARY: 119 errors from 119 contexts (suppressed: 0 from 0)
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Commit which added number of addresses to srp_address structure didn't
count with totemsrp_ifaces_get where whole structure was copied instead
of addresses only. This is now fixed.
Also to make API totempg forward compatible, size of interfaces array
must be passed to ifaces_get like functions to prevent memory overwrite.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Also few leftovers from cfg is removed and version of totempg is
increased to 5 to reflect all changes we made
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
keep totem.secauth config key for compatibility
if the key is NOT set, crypto will default to aes256/sha1
if the key is set to "off", crypto is disabled.
this reflects pretty much old behavior
keywords totem.crypto_cipher and totem.crypto_hash can
override secauth individually.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
totem doesn't need to understand what crypto does.
totem needs to be able to tell crypto: "those are data, play with them"
and crypto needs to return: "here are your scrambled data and the new size"
similar to decrypt/verify.
this way we add enough dynamic within crypto to change header size and all
at any given time (for different hash algorithm for example) without
affecting on wire compat.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Tomcrypt in corosync is for long time not updated. Because we have
support for libnss, libtomcrypt can be removed.
Also few leftovers (AES is 256 bits, not 128, ...) are removed.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
there are several reasons for this:
1) evs is only partially implemented with no plans to complete it
typedef enum {
EVS_TYPE_UNORDERED, /* not implemented */
EVS_TYPE_FIFO, /* same as agreed */
EVS_TYPE_AGREED,
EVS_TYPE_SAFE /* not implemented */
} evs_guarantee_t;
2) evs has no users in any upstream distribution and no search
engine can find any other upstream using it.
3) the only reason (I was told) to carry around evs was that evs
receives the full ring_id struct from totem. This is only
partially correct because while the structures are prepared
to carry around those data, they are never transmitted from
corosync engine down the IPC line to the user.
CPG ring_id contains the exact same information and it's
actually less buggy (due to prototying of the info).
worst case scenario where a user really absolutely need libevs,
it can be easily reimplemented as libcpg wrapper and avoid
lots of code duplication.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
pload is a performance benchmark that measures the onwire
speed of corosync.
problem is that once pload has been executed, the cluster
is basically dead.
turn pload into a test tool, by removing corosync-pload tool
and user library.
cleanup pload code to make it more readable and drop lots
of unnecessary stuff.
add test/ploadstart tool that can configure and start pload
via cmap calls.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
this was another old onwire compat mode that is not useful anylonger.
we can safely move the new model by default.
According to Honza (real hardware 1 node testing) there are no
performance impact.
My tests (8 nodes VM cluster), there is up to 10/12% performance
improvements up to 1M packet size where old and new models are equal.
As a side note, nss still shows to be a performance loss on both
real and virtual hw (without any kind of nss hw acceleration).
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
this change breaks onwire compatibility.
cpg is the only user of sync_* interface and it's the only
service that will require extra testing.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
qdevice is a very special node in the cluster and it adds a certain
amount of complexity and special cases across the code.
most of the qdevice data are shared across the cluster (name/votes)
but effectively each node has a different view of the qdevice
(registered/unregistered/voting/etc.)
with this change, we align the qdevice view across the node,
exchanging more data between nodes and we fix how qdevice behaves
and it is configured.
The only side effect is that the amount of data transmitted on wire
is slightly higher.
The qdevice API is still disabled by default. This means that
the amount of real changes in current code are a lot smaller
than it appears by this patch.
TODO: documentation/man pages needs to be updated once
this change is in (and behavior finalized).
User visible changes:
- configuration (coroparse, exec/votequorum):
the quorum device section is now standalone within the quorum.
quorum {
provider: corosync_votequorum
device {
model: (name)
timeout: (millisec)
votes:
}
}
the keyword "model:" is mandatory to enable qdevice in configuration
and should express the name of the script/daemon that will provide
the qdevice. Looking into the future, an init script or systemd
service will look for that name in /path/to/be/decided/name
and start/stop qdevice.
timeout: defines the maximum interval the qdevice implementation
has available between poll (see votequorum_qdevice_poll.3) before
the device is considered dead and votes discarded
votes: is now a configuration parameter and not an API call.
quorum devices don't care what they need to vote.
votes is autocalculated when a nodelist is available and all
nodes in the list vote 1. Otherwise this parameter is mandatory.
- configuration (exec/votequorum):
startup and runtime configuration changes have been improved.
errors at startup are considered fatal. errors at runtime
have different exit paths.
startup:
* quorum.two_node and qdevice are incompatible.
* quorum.expected_votes requires quorum.device.votes.
* quorum.expected_votes - quorum.device.votes cannot be lower
than 2.
* qdevice and last_man_standing are mutually exclusive.
* qdevice and auto_tie_breaker are mutually exclusive.
runtime config changes:
* quorum.two_node and qdevice are incompatible:
if quorum device is alive, two_node is disabled.
if quorum device is not alive and node count is 2, two_node is
enabled, and quorum device cannot be registered
* if either last_man_standing or auto_tie_breaker were enabled
at startup, and at runtime quorum device is configured,
quorum device registration will be blocked.
* if quorum.expected_votes is configured but not quorum.device.votes,
quorum device registration will be blocked.
* if quorum.device.votes is not configured and we cannot
automatically calculate it, quorum device registration will be blocked.
* An error in configuring quorum.expected_votes and quorum.device.votes
will block quorum device registration.
blocking quorum device registation, also means dropping the votes.
quorum.device.votes (either set or automatically calculated) is now
used to determine current expected_votes in the cluster.
- logging (exec/votequorum):
all errors from configuration are treated as WARNING/CRITICAL.
lots of extra DEBUG output is added (see internal changes too).
- corosync-quorumtool (tools/corosync-quorumtool):
* added option to forcefully kick out a quorum device from the local
node. This is for emergency recovery only and it is only
available when qdevice API is built-in.
* Improved status output, specifically add node state and qdevice
information
[root@fedora-master-node2 coro]# corosync-quorumtool -s
Version: 1.99.4.12-9c7d-dirty
Quorum type: corosync_votequorum
Nodes: 2
Ring ID: 132
Quorate: Yes
Node votes: 1
Node state: Member
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate Qdevice
Nodeid Votes Name
1 1 fedora-master-node1.int.fabbione.net
2 1 fedora-master-node2.int.fabbione.net
0 1 QDEVICE (Voting)
* allow to print status for any node in the cluster known to
local node.
[root@fedora-master-node1 coro]# corosync-quorumtool -s
Version: 1.99.4.12-9c7d-dirty
Quorum type: corosync_votequorum
Nodes: 2
Ring ID: 144
Quorate: Yes
Node votes: 1
Node state: Member
Expected votes: 3
Highest expected: 3
Total votes: 2
Quorum: 2
Flags: Quorate
Nodeid Votes Name
1 1 fedora-master-node1.int.fabbione.net
2 1 fedora-master-node2.int.fabbione.net
[root@fedora-master-node1 coro]# corosync-quorumtool -s -n 2
Version: 1.99.4.12-9c7d-dirty
Quorum type: corosync_votequorum
Nodes: 2
Ring ID: 144
Quorate: Yes
Node votes: 1
Node state: Member
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate Qdevice
Nodeid Votes Name
1 1 fedora-master-node1.int.fabbione.net
2 1 fedora-master-node2.int.fabbione.net
0 1 QDEVICE (Voting)
Internal changes:
- change qdevice timer to not run all time, but only when necessary.
- change votequorum_nodeinfo on wire data to use flags instead of uint8_t
and add QDEVICE status.
- allocate nodeid 0 to qdevice since it's the only real
nodeid that be reserved.
- change send_nodeinfo to allow to send nodeinfo for any node
so that we can share qdevice info across the cluster
(and this might be useful in future if we need to sync
internal cluster view).
- add votequorum api call to update qdevice name
- add runtime data if quorum device has been forcefully disabled
by config error
- add qdevice votes to expected_votes calculation (this
is probably the biggest difference vs cman)
- change votequorum_read_nodelist_configuration so that
we can autocalculate votes for qdevice (we need the nodecount
vs votes).
- add all checks for startup/runtime config (see above).
- do not make qdevice part of the membership_list received from
totem. None of our users care about it and it is not a real node.
- change onwire message handlers to deal with "data for this node from any node"
case and undersand nodeid 0 for qdevice info
- always allocate qdevice at startup. this simplifies code a lot.
- dispatch qdevice nodeinfo on membership changes.
- inform libvotequorum users when a qdevice is registered
- improve substantially qdevice api and add a simple
barrier based on qdevice name.
- add qdevice API barrier at cluster level. This feature allow
only one qdevice name to be active in the cluster at any time.
- qdevice getinfo can now report status for qdevice on any node.
- change slightly the way the qdevice API is built-in/out:
only the libvotequorum calls are #ifdef'out now. Doing so in
the core is too complex and would make the code unreadable
with the risk of missing a bit or two effectively introducing
an on-wire incompatibility if we will ever turn the API on.
- probably added some bugs on the way...
TODO: update qdevice_* API once the above is settled and test
qdevice integration with other features.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com> (only second part)
This is to make sure that we properly wait for responses
from corosync. I have made a fix to libqb to properly
handle the case when corosync exits/crashes between
a send and receive.
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
We would use libqb for hashing now if we needed hashing.
cpg no longer uses jhash.h.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Fabio Di Nitto <fdinitto@redhat.com>
spotted while writing man pages. There are no users for this struct
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
We have always had this problem and worked around it by coping code
or using inline functions. Both not good IMO.
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Add missing option for dispatch, which fills gap in combination of
block/nonblock and one/all dispatch types. New type doesn't mask
CS_ERR_TRY_AGAIN, and it means "no message was processed".
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
the only user of those obsoleted defines is dlm master (already ported)
to use CS_ and cmirror (that needs full porting to new corosync either way).
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>