getifaddrs is always available if freeifaddrs is present.
All BSDs and OpenIndiana define it in ifaddrs.h.
Drop a bunch of obsolete headers.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
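For reference, a minimal sketch of the standard getifaddrs()/freeifaddrs() pattern this change relies on (plain POSIX/BSD API, not the actual corosync code):

    #include <stdio.h>
    #include <ifaddrs.h>

    int main(void)
    {
        struct ifaddrs *ifap, *ifa;

        if (getifaddrs(&ifap) != 0) {
            perror("getifaddrs");
            return 1;
        }
        for (ifa = ifap; ifa != NULL; ifa = ifa->ifa_next) {
            /* ifa->ifa_addr may be NULL for some interfaces */
            printf("%s\n", ifa->ifa_name);
        }
        freeifaddrs(ifap);
        return 0;
    }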
drop duplicate code and remove the last remaining COROSYNC_LINUX ifdefs
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Syslog and file logging can block, so it is a good idea to use libqb threaded
mode to prevent that.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
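A minimal sketch of enabling the threaded mode (assuming libqb's qb_log_ctl()/qb_log_thread_start() interface; not the exact corosync diff):

    #include <qb/qblog.h>

    qb_log_init("corosync", LOG_DAEMON, LOG_INFO);
    /* route the potentially blocking targets through the logging thread */
    qb_log_ctl(QB_LOG_SYSLOG, QB_LOG_CONF_THREADED, QB_TRUE);
    /* ... and likewise for any file target opened with qb_log_file_open() */
    qb_log_thread_start();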
Instead of hardcoding SHM we should use NATIVE, so that libqb is able to pick
the best available mechanism.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
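A rough sketch of the idea, assuming the libqb qb_ipcs_create() interface (service_id and handlers stand for whatever the caller already has; not the actual corosync call site):

    #include <qb/qbipcs.h>

    qb_ipcs_service_t *svc;

    /* QB_IPC_NATIVE lets libqb choose the best available transport,
     * instead of forcing QB_IPC_SHM */
    svc = qb_ipcs_create("cfg", service_id, QB_IPC_NATIVE, &handlers);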
The OpenIndiana toolchain is rather messy. This is only a first cut.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
We were checking 'hold_timeout == 0' in 3 different places when setting up
the default totem config.
Signed-off-by: Tim Beale <tlbeale@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Nothing in the corosync codebase references this structure.
Signed-off-by: Tim Beale <tlbeale@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
The MON and WD services use fsm.h, which calls the log function. Such
messages were incorrectly logged as SERV (or a random service), which made
debugging hard.
The solution is to add a callback parameter to the fsm functions and do the
actual logging there.
Handling of failure states is also done in the callback now.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
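An illustrative sketch of the approach; every name below (fsm_log_cb_t, fsm_state_set, struct fsm) is hypothetical and only shows the shape of passing the caller's logger into the fsm code:

    #include <syslog.h>

    typedef void (*fsm_log_cb_t)(int priority, const char *fmt, ...);

    struct fsm {
        int state;
    };

    static void fsm_state_set(struct fsm *fsm, int next_state,
                              fsm_log_cb_t log_cb)
    {
        if (next_state < fsm->state) {  /* stand-in for a real validity check */
            /* failure handling and logging happen in the service's own
             * callback, so the message carries the right tag (MON, WD, ...) */
            log_cb(LOG_ERR, "invalid transition %d -> %d",
                   fsm->state, next_state);
            return;
        }
        fsm->state = next_state;
    }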
send_ok was incorrectly tested as a boolean, even though it is an errno-type
variable.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
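A generic illustration of the bug class, with hypothetical names (do_send(), handle_success()):

    int send_ok = do_send(msg);   /* errno-style: 0 on success, non-zero on error */

    if (send_ok) {                /* WRONG: reads as "if the send succeeded",
                                   * but is true only when it failed */
        handle_success();
    }

    if (send_ok == 0) {           /* correct: explicit test for the success value */
        handle_success();
    }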
This removes a (non-critical) debug message from QB about polling on a
closed FD.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
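One way such messages are typically avoided, assuming libqb's qb_loop_poll_del(); poll_handle stands for whatever qb_loop the descriptor was added to (a sketch, not the exact fix):

    /* take the descriptor out of the poll loop before closing it,
     * otherwise libqb keeps polling a closed FD and logs about it */
    qb_loop_poll_del(poll_handle, fd);
    close(fd);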
IPC uses a buffer of CS_MAX_NAME_LENGTH for the name. If the user calls the
function with a longer string, the string can be passed to the service
incomplete.
The solution is to reject strings longer than CS_MAX_NAME_LENGTH
and return an error.
The same applies to the cpg service.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
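A sketch of the kind of length check this adds on the library side (CS_MAX_NAME_LENGTH and CS_ERR_NAME_TOO_LONG are taken from the corosync public headers as I understand them; the surrounding code is illustrative):

    #include <string.h>
    #include <corosync/corotypes.h>

    /* reject names that would not fit the fixed-size IPC buffer,
     * instead of silently passing a truncated name to the service */
    if (strlen(name) >= CS_MAX_NAME_LENGTH) {
            return CS_ERR_NAME_TOO_LONG;
    }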
When sync has started and a service is unloaded in the meantime, it can happen
that sync will call sync_* functions on the unloaded service.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Multithreaded corosync used to rely on many ugly workarounds. One of them was
the shutdown process, where we had to solve a problem with two locks. This was
solved by scheduling jobs between the service exit_fn call and the actual
service unload. Sadly, a message received from another node in that window
could cause corosync to segfault on exit.
Because corosync is now single threaded, we don't need such hacks any
longer.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
There is really no point in having a per-node view of (vote)quorum
since all the information is always available.
Drop the -n option for status/display nodes and improve
the output to provide a full cluster view at any given time.
Old format:

[root@fedora-master-node2 ~]# corosync-quorumtool -s
Quorum information
------------------
Date:             Mon Aug 6 10:22:27 2012
Quorum provider:  corosync_votequorum
Nodes:            2
Ring ID:          8
Quorate:          Yes

Votequorum information
----------------------
Node ID:          3254954176
Node state:       Member
Node votes:       1
Qdevice votes:    1
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes Name
3238176960          1 fedora-master-node1.int.fabbione.net
3254954176          1 fedora-master-node2.int.fabbione.net
         0          1 QDEVICE (Alive/Voting/NoMasterWins)

New format:

[root@fedora-master-node1 tools]# ./corosync-quorumtool -s
Quorum information
------------------
Date:             Mon Aug 6 15:50:03 2012
Quorum provider:  corosync_votequorum
Nodes:            2
Ring ID:          48
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes Qdevice Name
3238176960          1 A,V,MW  fedora-master-node1.int.fabbione.net
3254954176          1 NR      fedora-master-node2.int.fabbione.net
         0          1         QDEVICE
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
votequorum has no business deciding whether the master_wins setting is correct or not;
only the qdevice can decide and should set the value for votequorum.
The logic is:
- the user requests master_wins in the config
- corosync starts
- the qdevice starts
- the qdevice reads the cmap values / registers with votequorum
- the qdevice decides if the node can support master_wins or not and tells votequorum
- at this point votequorum can check if an unquorate node is part of the master_wins
  partition
It is the qdevice's responsibility to keep that value up to date in votequorum, and the
value can be changed at runtime.
This commit also exchanges per-node master_wins information to lay down the infrastructure
to verify discrepancies in the node config for master_wins (coming next on this channel).
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
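A sketch of the flow from the qdevice side, assuming the votequorum_qdevice_register()/votequorum_qdevice_master_wins() library calls (the exact names and signatures should be checked against votequorum.h; "QDEVICE" is just a placeholder name):

    cs_error_t result;

    result = votequorum_qdevice_register(handle, "QDEVICE");
    if (result == CS_OK) {
            /* the qdevice, not votequorum, decides whether master_wins
             * applies on this node, and it can update the value again
             * at runtime */
            result = votequorum_qdevice_master_wins(handle, "QDEVICE", 1);
    }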
It is really pointless to have what is basically a duplicated API call
to transfer one value and one name.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Those are all info flags; it is redundant and inconsistent.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
In the previous incarnation of qdisk + cman, master_wins was restricted
to 2-node clusters only.
In this new version it is possible to use master_wins for any cluster
size.
Let's assume a 4-node cluster. Each node votes 1, the qdevice votes 3
(expected votes 7, quorum 4).
node 1 becomes qdevice master
nodes 2/3/4 do not
In case of a split (let's assume 2/2):
partition 1: {4, 1}
partition 2: {1, 1}
node 2 in partition 1 would normally be unquorate, leaving effectively
only node 1 active.
master_wins allows node 2 to recognize that it is part of a quorate partition
(since node 1 is broadcasting that the qdevice is voting, its 3 votes count
towards the partition's 5 votes out of the quorum of 4) and retain quorum.
node 1 never lost quorate status, since the qdevice is voting there.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
and clean up similar code to make it more readable
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
also align the naming of vote to cast_vote for the info calls
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
- add a protection check to avoid spurious messages on membership
change
- greatly simplify processing of nodeinfo, since the only
data that we send for the qdevice over nodeinfo is the number of votes
- fix a flag check to trigger quorum calculation that would
leave a cluster unquorate under certain conditions
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
simplify the different code paths as the checks are simpler; separate
ALIVE and CAST_VOTE
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
as the data moves around we can drop lots of special cases
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
This is a preparation commit for the next changes. Right now it is
no more than an alias for ALIVE.
CAST_VOTE is required to support the master/slave feature of the qdevice.
Effectively a quorum device can be:
Not registered / registered (connected to the API but nothing else is happening)
if registered:
  Not alive / alive (the quorum device is petting the API via poll and the timer is running)
if alive:
  Not voting (slave) / voting (master)
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
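A sketch of how a client can read that hierarchy back from the info flags (flag names as exposed in votequorum.h to the best of my knowledge; info is a struct votequorum_info filled by votequorum_getinfo()):

    if (info.flags & VOTEQUORUM_INFO_QDEVICE_REGISTERED) {
            if (info.flags & VOTEQUORUM_INFO_QDEVICE_ALIVE) {
                    if (info.flags & VOTEQUORUM_INFO_QDEVICE_CAST_VOTE) {
                            printf("qdevice: alive and voting (master)\n");
                    } else {
                            printf("qdevice: alive, not voting (slave)\n");
                    }
            } else {
                    printf("qdevice: registered but not alive\n");
            }
    } else {
            printf("qdevice: not registered\n");
    }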
STATE is a confusing and overloaded term in votequorum, as it is used for nodes
and other bits.
Make the name unique; ALIVE means that the qdevice is heartbeating
to votequorum.
Improve the display of the status in tools and tests.
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
make the flag name explicit
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
When a service is unloaded, sync shouldn't call its sync_init|process|activate
and abort functions. This happens very rarely, but while all services are being
unloaded, totem can recreate the membership and bad things can happen
(the service is unloaded, so already freed memory may be accessed,
...).
The solution is to fetch the services' sync handlers every time we are
building the service list, instead of using a precreated one.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
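An illustrative sketch of the change in approach; loaded_service, sync_list and the SERVICES_COUNT_MAX value below are hypothetical stand-ins for the real structures in service.c/sync.c:

    #define SERVICES_COUNT_MAX 64                  /* illustrative value */

    struct service *loaded_service[SERVICES_COUNT_MAX];    /* NULL = not loaded */
    struct service *sync_list[SERVICES_COUNT_MAX];
    static int sync_list_entries;

    /* rebuild the list from the currently loaded services every time a new
     * membership forms, instead of using a list precreated at startup */
    static void build_sync_list(void)
    {
            int i;

            sync_list_entries = 0;
            for (i = 0; i < SERVICES_COUNT_MAX; i++) {
                    if (loaded_service[i] == NULL) {
                            continue;       /* unloaded: skip its sync_* handlers */
                    }
                    sync_list[sync_list_entries++] = loaded_service[i];
            }
    }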
Sync/service expressed the maximum number of services either in numeric form
(a magic constant) or inconsistently, i.e. by using
SERVICE_HANDLER_MAXIMUM_COUNT, which actually means the maximum number of handlers.
The new macro solves this.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Let's say we have 2 nodes:
- node 2 is paused
- node 1 creates a membership (one node)
- node 2 is unpaused
The result is that node 1's downlist is selected, which means that from node 2's
point of view, node 1 was never down.
The patch solves the situation by adding an additional check for the largest
previous membership.
So the current tests are:
1) largest (previous #nodes - #nodes known to have left)
2) (then) largest previous membership
3) (and last, as a tie-breaker) node with the smallest nodeid
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
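A sketch of the resulting comparison; the struct and field names are hypothetical, only the three criteria match the description above:

    struct downlist {
            int prev_members;            /* # nodes in the previous membership */
            int left_members;            /* # nodes known to have left */
            unsigned int sender_nodeid;
    };

    /* returns 1 if candidate 'a' should be preferred over the current best 'b' */
    static int downlist_preferred(const struct downlist *a,
                                  const struct downlist *b)
    {
            int wa = a->prev_members - a->left_members;
            int wb = b->prev_members - b->left_members;

            if (wa != wb)                                   /* 1) largest (prev - left) */
                    return (wa > wb);
            if (a->prev_members != b->prev_members)         /* 2) largest previous membership */
                    return (a->prev_members > b->prev_members);
            return (a->sender_nodeid < b->sender_nodeid);   /* 3) smallest nodeid wins */
    }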
In the downlist and joinlist debug output, the group was printed in a
nonsensical format (the pointer to the array cast to an integer).
Now it is printed by its full name.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
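A generic illustration of the fix; since the cpg group name is a length-counted buffer, %.*s-style printing is the natural way (the log_printf usage here is illustrative):

    /* the group name is length-counted, not NUL-terminated, so print it
     * with an explicit precision instead of casting the pointer */
    log_printf(LOGSYS_LEVEL_DEBUG, "group '%.*s'",
               (int)group->length, group->value);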
Let's say the following situation happens:
- we have 3 nodes
- on the wire the messages look like D1,J1,D2,J2,D3,J3 (D is a downlist, J is
  a joinlist)
- let's say D1 and D3 contain node 2
- it means that J2 is applied, but right after that D1 (or D3) is
  applied, which means node 2 is again considered down
It is solved by collecting the joinlists and applying them after the downlist, so
the order is:
- apply the best matching downlist
- apply all joinlists
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
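A sketch of the new ordering with hypothetical helper names (best_downlist(), apply_downlist(), apply_joinlist()):

    /* collect everything first, then apply in a fixed order so that a
     * later downlist can no longer override an already applied joinlist */
    apply_downlist(best_downlist());            /* 1) best matching downlist */
    for (i = 0; i < joinlist_entries; i++) {    /* 2) then all collected joinlists */
            apply_joinlist(&joinlist[i]);
    }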