Commit Graph

1640 Commits

Author SHA1 Message Date
Fabio M. Di Nitto
a0a14c68e3 totemip: clean up headers a lot more
getifaddrs is always available if there is freeifaddr.

all BSD and openindiana have it defined in ifaddr.h.

drop a bunch of obsoleted headers.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-30 15:00:27 +02:00
Fabio M. Di Nitto
18929089d1 build: drop MAP_ANONYMOUS check from configure
define it only in case it's not there

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-30 15:00:27 +02:00
Fabio M. Di Nitto
5c5db34e56 build: make libstatgrab the facto default for monitoring service
drop duplicate code and remove the last COROSYNC_LINUX ifdefs
around

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-30 15:00:27 +02:00
Fabio M. Di Nitto
a1c154e6fa build: use MADV_NOSYNC only when it's defined
so far only FreeBSD defines it.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-30 15:00:27 +02:00
Fabio M. Di Nitto
6098ef2c14 build: make exec/totemip os detection free
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-30 15:00:27 +02:00
Jan Friesse
dbe0e9e382 Log: Use threaded mode for syslog and file log
Syslog and file log can block, so it's good idea to use libqb threaded
mode to prevent it.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-08-30 09:46:48 +02:00
Jan Friesse
9f6e6a990b Use native IPC mechanism
Instead of hardcoded SHM, we should use NATIVE, so libqb is able to find
out what is best/availiable mechanism.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-08-30 09:45:46 +02:00
Fabio M. Di Nitto
427fdd4558 build: fix build on openindiana 151a
openindiana toolchain is rather messy. This is the first cut only

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-28 15:14:49 +02:00
Fabio M. Di Nitto
9f7181b533 build: drop more dlopen leftovers from dinosaur era...
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-28 15:14:49 +02:00
Fabio M. Di Nitto
dd4d7f86e6 build: make monitoring optional in corosync exec
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-28 15:14:49 +02:00
Fabio M. Di Nitto
8f96347100 build: respect watchdog conditional when building corosync exec
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-28 15:14:49 +02:00
Fabio M. Di Nitto
76d18f964d build: use libtool for linking
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-28 15:14:48 +02:00
Tim Beale
6129ce5b59 Remove redundant default-config code
We were checking 'hold_timeout == 0' in 3 different places when setting up
the default totem config.

Signed-off-by: Tim Beale <tlbeale@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-21 14:26:50 +02:00
Tim Beale
77ea036c72 Remove unused structure
Nowhere in the corosync codebase references this structure.

Signed-off-by: Tim Beale <tlbeale@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-21 14:11:48 +02:00
Jan Friesse
397cc89f01 Make logging of WD and MON service correct
MON and WD services are using fsm.h, which calls log function. Such
messages were incorrectly logged as SERV (or random service) which made
debugging hard.

Solution is to add callback parameter to fsm functions and do actual
logging there.

Handling of failure states is also done in calback now.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-08-16 14:45:15 +02:00
Jan Friesse
e3cef955bf IPC: Call lib function only when it's possible
send_ok was incorrectly tested as boolean, even it's errno type
variable.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-08-09 15:10:52 +02:00
Jan Friesse
8014b2facf Close sockets after deleting from poll
This will remove (non critical) debug message from QB about polling on
closed FD.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-08-09 15:10:44 +02:00
Jan Friesse
2d10e2bbea cpg: Check input param name_t length
IPC is using buffer of CS_MAX_NAME_LENGTH for name. If user calls
function with longer string, such string can be passed to service
incomplete.

Solution is to not allow string larger then CS_MAX_NAME_LENGTH
and return error.

Same applies to cpg service.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-08-09 15:10:35 +02:00
Jan Friesse
6f6988afff Handle sync and service unload correctly
When sync started and service is unloaded in meantime, it can happen that
sync will call sync_* functions on unloaded service.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-08-09 15:10:26 +02:00
Jan Friesse
dfe34d330c service: remove leftovers from mt corosync
Multithreaded corosync used to use many ugly workarounds. One of them is
shutdown process, where we had to solve problem with two locks. This was
solved by scheduling jobs between service exit_fn call and actual
service unload. Sadly this can cause to receive message from other node
in that meantime causing corosync to segfault on exit.

Because corosync is now single threaded, we don't need such hacks any
longer.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-08-09 15:10:16 +02:00
Fabio M. Di Nitto
423e37b4ca votequorum: change init/clean up to deal with exit races
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-08 09:03:57 +02:00
Fabio M. Di Nitto
50308cb08d quorumtool: make output more meaningful
there is really no point to have a per node view of (vote)quorum
since all the info are always there.

drop the -n option for status/display nodes and improve
the output to provide a full cluster view at any given time.

Old format:

[root@fedora-master-node2 ~]# corosync-quorumtool -s
Quorum information
------------------
Date: Mon Aug 6 10:22:27 2012
Quorum provider: corosync_votequorum
Nodes: 2
Ring ID: 8
Quorate: Yes

Votequorum information
----------------------
Node ID: 3254954176
Node state: Member
Node votes: 1
Qdevice votes: 1
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate Qdevice

Membership information
----------------------
Nodeid Votes Name
3238176960 1 fedora-master-node1.int.fabbione.net
3254954176 1 fedora-master-node2.int.fabbione.net
         0 1 QDEVICE (Alive/Voting/NoMasterWins)

New format:

[root@fedora-master-node1 tools]# ./corosync-quorumtool -s
Quorum information
------------------
Date:             Mon Aug  6 15:50:03 2012
Quorum provider:  corosync_votequorum
Nodes:            2
Ring ID:          48
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   3
Highest expected: 3
Total votes:      3
Quorum:           2
Flags:            Quorate Qdevice

Membership information
----------------------
    Nodeid      Votes    Qdevice Name
3238176960          1     A,V,MW fedora-master-node1.int.fabbione.net
3254954176          1         NR fedora-master-node2.int.fabbione.net
         0          1            QDEVICE

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:17 +02:00
Fabio M. Di Nitto
6b270c6cd1 votequorum: make the last QDEVICE define name consistent with everything else
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:17 +02:00
Fabio M. Di Nitto
302545e112 votequorum: add missing return call
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:17 +02:00
Fabio M. Di Nitto
379b203677 votequorum: make master_wins check stricter
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:17 +02:00
Fabio M. Di Nitto
9c50f33509 votequorum: add ENTER/LEAVE for consistency
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:17 +02:00
Fabio M. Di Nitto
2f369e7039 votequorum: delegate qdevice_master_wins setting to qdevice
votequorum has no business to device if master_wins setting is correct or not.
only the qdevice can decide and should set the value for votequorum.

Logic is:

- user requests master_wins from config
- corosync starts
- qdevice starts
- qdevice reads cmap values / register with votequorum
- qdevice decides if the node can support master_wins or not and tells votequorum
- at this point votequorum can check if an unquorate node is part of the master_wins
  partition

it is the qdevice responsibility to keep that value up to date in votequorum and the
value can be changed at runtime.

this commit also exchange per node master_wins information to lay down the infrastructure
to verify discrepancies in node config for master_wins (coming next on this channel).

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:17 +02:00
Fabio M. Di Nitto
cc7bfeb462 votequorum: drop votequorum_qdevice_getinfo and collapse data into getinfo
it's really pointless to have basically a duplicated API call
to transfer one value and one name.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:17 +02:00
Fabio M. Di Nitto
65a6c29a31 votequorum: external defines should all be prefixed with VOTEQUORUM_
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:17 +02:00
Fabio M. Di Nitto
2a37b56c49 votequorum: drop _FLAG_ from defines
those are all info flags.. it's redudant and inconsistent

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:17 +02:00
Fabio M. Di Nitto
3416eacbec votequorum: fix define name to match reality
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:17 +02:00
Fabio M. Di Nitto
86dd11b28e qdevice: implement master_wins partition
in previous incarnation of qdisk + cman, master_wins was restricted
to 2 node only.

In this new version it is possible to use master_wins for any cluster
size.

Let's assume a 4 node cluster. Each node votes 1, qdevice votes 3.

node 1 becomes qdevice master
node 2/3/4 no

In case of a split (let's assume 2/2):

partition 1: {4, 1}
partition 2: {1, 1}

node 2 in partition 1 would normally be unquorate, leaving effectively
only node 1 active.

master_wins allows node 2 to recognize to be part of a quorate partition
(since node1 is broadcasting that qdevice is voting) and retain
quorum.

node1 has never lost quorate status since qdevice is voting there.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:17 +02:00
Fabio M. Di Nitto
aa295be834 votequorum: fix flag check for qdevice votes propagation
and cleanup similar code to make it more readable

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:17 +02:00
Fabio M. Di Nitto
2dae49e54a votequorum: remove last instance of state and rename it to cast_vote
also align naming of vote to cast_vote for info calls

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:17 +02:00
Fabio M. Di Nitto
3fed1af077 votequorum: several major bug fixes and code cleanup
- add a protection check to avoid spurious messages on membership
  change
- greately simplify processing of nodeinfo, since the only
  data that we send for qdevice over nodeinfo is the number of votes
- fix a flag check to trigger quorum calculation that would
  leave a cluster unquorate under certain conditions

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:17 +02:00
Fabio M. Di Nitto
62659dbb21 votequorum: move to the new flag structure
simplify different code path as checks are simpler, separate
ALIVE and CAST_VOTE

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:17 +02:00
Fabio M. Di Nitto
c9e207ec92 votequorum: simplify getinfo data and protect against call against quorum node
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:17 +02:00
Fabio M. Di Nitto
f2b25936e5 votequorum: use REGISTERED flag consistently
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:17 +02:00
Fabio M. Di Nitto
0bcb4cddcc votequorum: simply internal qdevice_getinfo function
as data are moving around we can drop lots of special cases

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:17 +02:00
Fabio M. Di Nitto
43d1439600 votequorum: add qdevice CAST_VOTE status/flag
this is a preparation commit for the next changes. right now it is
no more than an alias to ALIVE.

CAST_VOTE is required to support master/slave feature from qdevice.

Effectively a quorum device can be:

Not registered / registered (connected to API but nothing else is happening)

if registered:

Not alive / alive (quorum device is petting the API via poll and timer is running)

if alive:

Not voting (slave) / voting (master)

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:16 +02:00
Fabio M. Di Nitto
987e26f8d1 votequorum: rename NODE_FLAGS_QDEVICE_STATE to NODE_FLAGS_QDEVICE_ALIVE
STATE is confusing and overloaded term in votequorum as it's used for nodes
and other bits.

make the name unique and ALIVE means that the qdevice is heartbeating
to votequorum.

improve display of the status in tools and tests.

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:16 +02:00
Fabio M. Di Nitto
4621a6cd02 votequorum: rename NODE_FLAGS_QDEVICE to NODE_FLAGS_QDEVICE_REGISTERED
make the flag name explicit

Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2012-08-07 11:07:16 +02:00
Jan Friesse
fed7fc23e1 Don't call sync_* funcs for unloaded services
When service is unloaded, sync shouldn't call sync_init|process|activate
and abort functions. It happens very rare, but in process of unloading
all services, totem can recreate membership and bad things can happen
(service is unloaded, so there may be access to already freed memory,
 ...)

Solution is to fetch services sync handlers in every time when we are
building service list instead of using precreated one.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-08-02 09:34:58 +02:00
Jan Friesse
9fb7979370 Introduce SERVICES_COUNT_MAX macro
Sync/service was using maximal number of services in ehter numberic form
(magic constant) or inconsistently, this means using
SERVICE_HANDLER_MAXIMUM_COUNT which means maximal number of handlers.

New macro solves this.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-08-02 09:32:05 +02:00
Jan Friesse
537bf56fcc cpg: Be more verbose for procjoin message
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-07-30 10:22:16 +02:00
Jan Friesse
04dac3ff5d Correctly free state string in wd
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2012-07-12 15:53:04 +02:00
Jan Friesse
e4d75d1ab3 Revert "Free state variable allocated in wd_resource_state_is_ok"
This reverts commit 01c63ca17c.
2012-07-11 17:04:41 +02:00
Jan Friesse
a966506c1e cpg: Enhance downlist selection algorithm
Let's say we have 2 nodes:
- node 2 is paused
- node 1 create membership (one node)
- node 2 is unpaused

Result is that node 1 downlist is selected, so it means that from node 2
point of view, node 1 was never down.

Patch solves situation by adding additional check for largest previous
membership.

So current tests are:
1) largest (previous #nodes - #nodes know to have left)
2) (then) largest previous membership
3) (and last as a tie-breaker) node with smallest nodeid

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-06-14 15:15:42 +02:00
Jan Friesse
f3457c5d49 cpg: Print cpg name to debug informations
In downlist and joinlist debug output group was printed in nonsense
format of integer to pointer to array.

Now it's printed by full name.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-06-14 15:15:39 +02:00
Jan Friesse
35446d6bcc cpg: Process join list after downlists
let's say following situation will happen:
- we have 3 nodes
- on wire messages looks like D1,J1,D2,J2,D3,J3 (D is downlist, J is
  joinlist)
- let's say, D1 and D3 contains node 2
- it means that J2 is applied, but right after that, D1 (or D3) is
  applied what means, node 2 is again considered down

It's solved by collecting joinlists and apply them after downlist, so
order is:
- apply best matching downlist
- apply all joinlists

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2012-06-14 15:15:35 +02:00