Commit Graph

1779 Commits

Author SHA1 Message Date
Jan Friesse
72cf15af27 votequorum: Do not process events during reload
During reload, local_node_pos is deleted and its reinstatement is handled
in totemconfig after the reload is finished. votequorum handles these
events and tries to reload its configuration. This led to logging
slightly scary messages (even though nothing bad is happening, because
after local_node_pos is reinstated everything is back to normal).

The solution is to stop processing events during reload. Sadly, simply
tracking config.reload_in_progress doesn't work, because the order in
which LibQB triggers events is undefined, so the votequorum reload
handler can be called before totemconfig (and before local_node_pos is
reinstated).

So a new config.totemconfig_reload_in_progress key is defined, with
semantics very similar to config.reload_in_progress but set inside the
totem_reload_notify function. Votequorum then uses this new key.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-06-27 11:40:21 +02:00
Jan Friesse
c8e3f14fdb Make config.reload_in_progress key read only
It's not a very good idea to allow user apps to change the internal key
reload_in_progress.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-06-27 11:40:18 +02:00
Jan Friesse
4e9716ed30 coroparse: More strict numbers parsing
The previous safe_atoi didn't check the range of input values, so if,
for example, the user used -1 as the token timeout, it was converted to
UINT32_MAX without letting the user know.

Another safe_atoi problem was its use of strtol. This works pretty well
on 64-bit systems, where a long integer is usually 64 bits long; sadly,
on 32-bit systems it is usually 32 bits long. And because strtol returns
a signed integer, it was not possible to enter a 32-bit value with the
highest bit set.

The solution is to use strtoll, whose return type is guaranteed to be at
least 64 bits long, and to check the value range.

The error message now also contains information about the expected
value range.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-06-12 14:49:00 +02:00
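Note on 4e9716ed30: the range-checked parsing described in this commit can be sketched roughly as follows. This is only an illustration and not the actual safe_atoi from coroparse; the helper name parse_ranged_int and its signature are assumptions made for this example.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not the corosync safe_atoi): parse a base-10
 * integer and accept it only when it lies in [min_val, max_val].
 * Returns 0 on success and -1 on any parse or range error.
 */
static int parse_ranged_int(const char *str, long long min_val,
    long long max_val, long long *res)
{
	char *endptr;
	long long val;

	errno = 0;
	/* strtoll returns long long, which is at least 64 bits wide */
	val = strtoll(str, &endptr, 10);

	/* Reject empty input, trailing garbage and overflow seen by strtoll */
	if (endptr == str || *endptr != '\0' || errno == ERANGE) {
		return (-1);
	}

	/* Reject values outside the range the caller expects */
	if (val < min_val || val > max_val) {
		return (-1);
	}

	*res = val;
	return (0);
}
```

With a range of 0 to UINT32_MAX, an input of "-1" is rejected instead of being silently converted, while "4294967295" (highest bit set) is accepted even where long is 32 bits wide.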
Jan Friesse
da46ecfc30 Move ringid store and load from totem library
Functions for storing and loading the ring id were in the totem library.
This raises the problem of what to do when it's impossible to load or
store the ring id. The easy solution seemed to be an assert, but sadly
this makes it hard for the user to find out what happened (because
corosync was just aborted and logsys didn't flush).

The solution is to move these functions to main.c, where it is much
easier to handle errors. This also makes libtotem free of any file
system operations.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-06-02 14:54:57 +02:00
Jan Friesse
d310b251c3 Introduce get_run_dir function
Run dir (LOCALSTATEDIR/lib/corosync) was hardcoded throughout the whole
codebase. Totemsrp was trying to create and chdir into it, but also
took into account the environment variable COROSYNC_RUN_DIR, creating
an inconsistency.

get_run_dir correctly returns COROSYNC_RUN_DIR (when set) or
LOCALSTATEDIR/lib/corosync. This is now used by all functions instead of
the hardcoded string.

All occurrences of mkdir/chdir are removed from totemsrp, and chdir is
now called in the main function. The mkdir call is completely removed,
because it was not used anyway (the check in main.c was called before
totemsrp init, so mkdir was never called), and also make install and/or
the package system should take care of creating this directory with
correct permissions/context.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-06-02 14:53:18 +02:00
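Note on d310b251c3: a minimal sketch of the lookup this commit describes, assuming LOCALSTATEDIR is a compile-time string supplied by the build system (the "/var" fallback below is only a placeholder). The function body is a guess at the behavior stated in the message, not a copy of the corosync source.

```c
#include <stdlib.h>

/* Assumption: LOCALSTATEDIR normally comes from the build system */
#ifndef LOCALSTATEDIR
#define LOCALSTATEDIR "/var"
#endif

/*
 * Sketch of the described lookup: prefer COROSYNC_RUN_DIR when it is
 * set, otherwise fall back to LOCALSTATEDIR/lib/corosync.
 */
static const char *get_run_dir(void)
{
	static const char default_run_dir[] = LOCALSTATEDIR "/lib/corosync";
	const char *env_run_dir = getenv("COROSYNC_RUN_DIR");

	return (env_run_dir != NULL ? env_run_dir : default_run_dir);
}
```

Keeping the decision in one function means every caller sees the same directory, which is the inconsistency the commit set out to remove.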
Jan Friesse
8f13a98320 logsys: Log warning if flightrecorder init fails
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-06-02 14:36:10 +02:00
Jan Friesse
19c5b63ff5 logsys: Log error if blackbox cannot be created
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-06-02 14:36:08 +02:00
Jan Friesse
e905f92bf5 totemiba: Fix incorrect failed log message
The parameters of the "rdma_join_multicast failed ..." message were
swapped.

Also, information about the multicast join is now logged as notice.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2014-05-15 15:28:51 +02:00
Yevheniy Demchenko
4d6a18d8a5 totemiba: Add multicast recovery
Totemiba wasn't able to survive a SubnetManager handover or
restart. If the SM was migrated to another node, corosync logged
"multicast error" and lost connectivity.

This commit should solve this situation.

Signed-off-by: Yevheniy Demchenko <zheka@uvt.cz>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-05-14 14:51:07 +02:00
hfu
d0dc9ae93c Indent: Remove newline before else branch start
Signed-off-by: hfu <askfuhu@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-05-09 11:38:02 +02:00
hfu
b6e2c8024d Indent: Remove space in negation of expression
Signed-off-by: hfu <askfuhu@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-05-09 11:37:47 +02:00
Jan Friesse
7557fdec48 config: Allow dynamic change of token_coefficient
A token_coefficient change in cmap didn't trigger a change, so the only
way to change token_coefficient was to edit the config file and reload.

The patch lets the key totem.token_coefficient be processed, so
token_coefficient can be changed dynamically.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-05-07 15:55:26 +02:00
Jan Friesse
58176d6779 Add token_coefficient option
Token coefficient is used only when a nodelist is specified and contains
at least 3 nodes. If so, the real token timeout is computed as
token + (number_of_nodes - 2) * token_coefficient. This allows the
cluster to scale without manually changing the token timeout every time
a new node is added. This value can be set to 0, which effectively
disables this feature.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-03-25 15:29:17 +01:00
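Note on 58176d6779: the formula from this commit, written out as a small function. The function and parameter names are illustrative, and the sketch assumes all values share one unit (milliseconds).

```c
#include <stdint.h>

/*
 * Effective token timeout as described in the commit message:
 * token + (number_of_nodes - 2) * token_coefficient, applied only when
 * the nodelist contains at least 3 nodes. Names are illustrative.
 */
static uint32_t effective_token_timeout(uint32_t token,
    uint32_t token_coefficient, uint32_t nodelist_nodes)
{
	if (nodelist_nodes < 3) {
		return (token);
	}

	return (token + (nodelist_nodes - 2) * token_coefficient);
}
```

For example, with token = 1000, token_coefficient = 650 and 5 nodes in the nodelist, the result is 1000 + 3 * 650 = 2950. With token_coefficient = 0 the result is always the plain token value.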
Jan Friesse
9a8de87c34 totemconfig: Log errors on key change and reload
When a volatile key was changed (cmap set or reload) and the checks
failed, nothing was logged.

Values are now checked and an error string is logged on problems.

Also, totem_config is dumped to the log (DEBUG level) after every
volatile key change and every reload.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-03-25 15:29:14 +01:00
Jan Friesse
b95ebd640e totemconfig: Key change process dependencies
When a key with dependencies was changed, the dependent keys were not
recomputed. A nice example is the consensus timeout. If the token
timeout was changed, the consensus timeout was not recomputed correctly
(neither via a cmap change of the key nor via a cfg reload).

The solution is an almost complete refactor of the handling of volatile
defaults.

totem_volatile_config_read now handles not only storing the cmap key in
the totem_config structure, but also checking for existence, comparing
with the zero value and properly storing defaults.

totem_set_volatile_defaults is gone. Its function was split into the
totem_volatile_config_read and totem_volatile_config_validate functions.

The reload callback and the key change callback are now mostly the same
function, and both call totem_volatile_config_read.

The patch also fixes a small memory leak. The totem.vsftype key has not
been used for a long time, and the original totem_volatile_config_read
wasn't freeing the allocated memory returned by icmap_get_string. The
whole reading of totem.vsftype is removed.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-03-25 15:29:12 +01:00
Jan Friesse
eeb2384157 Really clear totemconfig nodes on reload
When reload was called, nodes were constantly added to the totemconfig
nodelist.

So a simple corosync-cfgtool -R very quickly resulted in filling the
whole array and a segfault.

The solution is to clear member_count.

Clearing is also moved directly to put_nodelist_members_to_config to
make sure it's always processed.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-03-25 15:29:09 +01:00
Jan Friesse
1b6abcc7d5 Log: Make reload of logging work
When reload was called multiple times (~20), logging to file stopped
working.

The main problem was that the log file was opened multiple times,
because even though target_id was shared among subsystem loggers, the
file name was not.

The solution is to ALWAYS set the proper log file name in the subsystem
logger (a copy is stored). This not only fixes the problem but also
removes a small leak.

Also, if the filename didn't change, the function can return sooner.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-03-25 15:13:33 +01:00
Jan Friesse
2f0cad20a9 config: Handle totem_set_volatile_defaults errors
When totem_set_volatile_defaults is called from totem_config_validate,
the return code is unchecked.

It's then perfectly possible to set (for example) the join timeout to a
very small value (1), and the consensus value is then set to 0, making
corosync unable to create membership.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-03-17 10:04:00 +01:00
Jan Friesse
e1801ba497 votequorum: Properly initialize atb and atb_string
icmap_get_* does NOT modify the passed variable when it doesn't
succeed, so we must initialize the variable before the icmap_get_* call.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2014-02-26 16:59:02 +01:00
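Note on e1801ba497: the "initialize before the getter" pattern can be sketched as below. lookup_uint8 is a hypothetical stand-in that only mimics the behavior stated in the message (leave the output untouched on failure); it is not the real icmap API, and the key names are invented for the example.

```c
#include <stdint.h>
#include <string.h>

/*
 * Stand-in for an icmap_get_*-style getter (hypothetical, not the real
 * icmap API): on failure it returns -1 and leaves *value untouched.
 */
static int lookup_uint8(const char *key, uint8_t *value)
{
	if (strcmp(key, "example.present") == 0) {
		*value = 1;
		return (0);
	}

	return (-1);
}

/* Read a key, falling back to a default assigned before the lookup */
static uint8_t read_with_default(const char *key, uint8_t default_value)
{
	uint8_t value = default_value;	/* initialize first */

	/* On failure value keeps the default instead of stack garbage */
	(void)lookup_uint8(key, &value);

	return (value);
}
```

Without the initialization, a missing key would leave the variable holding whatever happened to be on the stack.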
Jan Friesse
ff67daa55f mon: Make monitoring work
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-02-25 14:57:20 +01:00
Jan Friesse
099f704cdd mon: Pass correct pointer to inst
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-02-25 14:57:16 +01:00
Jan Friesse
57ff693b70 mon: Fix comparison typo
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-02-25 14:57:13 +01:00
Jan Friesse
e1e2390b61 mon: Make mon compilable with libstatgrab ver 0.9
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-02-25 14:57:10 +01:00
Jan Friesse
fbe8768f1b cpg: Make sure left nodes are really removed
When a node is paused and other nodes have in the meantime exited a cpg
process, the paused node doesn't update its membership correctly after
resume, so the exited cpg process is still visible on the previously
paused node.

The solution is to compare the join list with cpd and remove all pids
which are not included in the join list.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-02-19 10:59:14 +01:00
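Note on fbe8768f1b: the comparison step can be illustrated with plain arrays. This is a simplified sketch under the assumption that both the locally known processes and the join list can be reduced to arrays of pids; the real cpg code works on its own structures, and both function names below are made up for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Return true when pid appears in the join list */
static bool pid_in_join_list(uint32_t pid, const uint32_t *join_list,
    size_t join_entries)
{
	size_t i;

	for (i = 0; i < join_entries; i++) {
		if (join_list[i] == pid) {
			return (true);
		}
	}

	return (false);
}

/*
 * Drop every locally known pid that is missing from the join list.
 * The array is compacted in place; the new entry count is returned.
 */
static size_t prune_left_pids(uint32_t *known_pids, size_t known_entries,
    const uint32_t *join_list, size_t join_entries)
{
	size_t kept = 0;
	size_t i;

	for (i = 0; i < known_entries; i++) {
		if (pid_in_join_list(known_pids[i], join_list, join_entries)) {
			known_pids[kept++] = known_pids[i];
		}
	}

	return (kept);
}
```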
Jan Friesse
83c63b247f cpg: Make sure nodeid is always logged as hex num
Also, the number is prefixed by 0x so it's easier to spot that the
number is hexadecimal.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-02-19 10:59:10 +01:00
Jan Friesse
fcf26e0303 cpg: Refactor mh_req_exec_cpg_procleave
Most of the functionality is moved to the do_proc_leave function to make
it reusable.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-02-19 10:59:05 +01:00
Jan Friesse
38c04d9a66 totemsrp: Fix typo with cont gather
Patch f3ffd3da5c introduced named states
of the state machine, but sadly it contains a logical problem causing
stats.continuous_gather to increase even when it shouldn't. The problem
is not critical, because continuous_gather is set to 0 on successful
membership creation.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-02-18 16:12:57 +01:00
Christine Caulfield
90d448af3b votequorum: Add extended options to auto_tie_breaker
This patch adds more flexibility to the auto_tie_breaker feature of
votequorum. With this, not only can the lowest nodeid be used as
a tie breaker, but also the highest, or a node from a nominated list.

If there is a list of nodes, the first node in the list that was not
part of the previous partition is used. This allows the user to
specify a preferred set of nodes but prevents a split-brain if the
cluster divides evenly with a node in each half.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-02-17 16:29:45 +00:00
Masatake YAMATO
fa71067a93 Free object allocated at quorum_register_callback
A memory object allocated with malloc in quorum_register_callback
is never freed. The object is linked to internal_trackers_list.

The object is unlinked in quorum_unregister_callback. However,
it is not freed in that function.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-01-23 17:18:44 +01:00
Jan Friesse
45dd9861ff Properly check result of symlink
An error message is displayed when it's impossible to create the symlink
to the fdata file.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-01-14 11:24:31 +01:00
Jan Friesse
5c54f941ac Fix cppchecks warning
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-01-14 11:24:29 +01:00
Jan Friesse
178c0d82d9 Close devnull file handler
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2014-01-14 11:24:26 +01:00
Jason
cfbb021e13 totem: Drop invalid join msg in operational state
According to the totem paper, if a processor
receives a join message in the operational state and the
receiver's identifier is in the join message's fail list,
then the join message should be ignored.

By applying this validation of join messages, we can avoid unnecessary
switching from the operational state to the gather state (which can
even leave rings unable to merge), as in the following scenario.

1. Initially, there is only one ring containing three nodes, say
   ring(A,B,C).
2. The network between A and B partitions and, "at the same time",
   C goes down.
3. Node A sends a join message with proclist:A,B,C. faillist:NULL.
   Node B sends a join message with proclist:A,B,C. faillist:NULL.
4. Both A and B hit the consensus timeout due to the network partition.
5. The network between A and B remerges.
6. Node A sends a join message with proclist:A,B,C. faillist:B,C. and
   creates ring(A).
   Node B sends a join message with proclist:A,B,C. faillist:A,C. and
   creates ring(B).
7. Say the join message with proclist:A,B,C. faillist:A,C, sent
   by node B, is received by node A because the network remerged.
8. Node A shifts to the gather state and sends out a modified join
   message with proclist:A,B,C. faillist:B. Such a join message will
   prevent A and B from merging.
9. Node A hits the consensus timeout (caused by waiting for node C) and
   sends a join message with proclist:A,B,C. faillist:B,C again.

The same thing happens on node B, so A and B will loop forever
in steps 7, 8 and 9.

As the paper also says: "If a processor receives a join message in the
operational state and if the sender's identifier is in the receiver's
my_proclist and the join message's ring_seq is less than the receiver's
ring sequence number, then it ignores the join message too." So this
patch applies these validations of join messages altogether.

Signed-off-by: Jason <huzhijiang@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2014-01-13 14:46:13 +01:00
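Note on cfbb021e13: the two checks quoted from the totem paper can be sketched as a single predicate. The struct and function names below are simplified assumptions made for illustration and do not match the totemsrp data layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical view of a join message (not the totemsrp one) */
struct join_msg {
	uint32_t sender;
	uint64_t ring_seq;
	const uint32_t *fail_list;
	size_t fail_list_entries;
};

static bool id_in_list(uint32_t id, const uint32_t *list, size_t entries)
{
	size_t i;

	for (i = 0; i < entries; i++) {
		if (list[i] == id) {
			return (true);
		}
	}

	return (false);
}

/*
 * Return true when a processor in the operational state should ignore
 * the given join message.
 */
static bool ignore_join_in_operational(const struct join_msg *msg,
    uint32_t my_id, uint64_t my_ring_seq,
    const uint32_t *my_proc_list, size_t my_proc_list_entries)
{
	/* Rule 1: the receiver is listed in the message's fail list */
	if (id_in_list(my_id, msg->fail_list, msg->fail_list_entries)) {
		return (true);
	}

	/* Rule 2: known sender, but the message carries an older ring_seq */
	if (id_in_list(msg->sender, my_proc_list, my_proc_list_entries) &&
	    msg->ring_seq < my_ring_seq) {
		return (true);
	}

	return (false);
}
```

In step 7 of the scenario, node A receives a join message whose fail list contains A, so rule 1 applies and A stays in the operational state instead of shifting to gather.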
Christine Caulfield
ff6a43edb3 votequorum: Add persistent expected_votes tracking.
This patch adds the option to store expected_votes to
persistent storage. This is needed for allow_downscale
to operate properly.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2014-01-07 15:30:11 +00:00
Jan Friesse
b88c0766fe logsys: Make logging of totem work again
Because of a change in libqb (9abb686), logging of the TOTEM subsystem
stopped working.

Instead of relying on the previous behavior (implicit substring match),
all totem files are now given explicitly.

Also, the QB subsystem now uses a comma-separated file list instead of
the previous function calls.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2013-11-04 12:32:35 +01:00
Masatake YAMATO
f3ffd3da5c totemsrp: Show English message when memb_state_gather_enter is called
The reason why memb_state_gather_enter is invoked was printed
as an integer code. This patch introduces human-readable English
messages for the codes.

Signed-off-by: Masatake YAMATO <yamato@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2013-10-24 16:46:17 +02:00
Yevheniy Demchenko
805b3423ee totemiba: Check if configured MTU is allowed by HW
The solution uses an approximation of totem structures. This needs to be
rewritten in a proper way. Also, MTU checking should be implemented for
IP transports.

Signed-off-by: Yevheniy Demchenko <zheka@uvt.cz>
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2013-09-20 11:27:08 +02:00
Yevheniy Demchenko
8f14a5788f totemiba: Fix parameters position for poll_add
Parameters in functions like mcast_cq_send_event_fn, ... were defined in
incorrect order. Also their names were weird.

Signed-off-by: Yevheniy Demchenko <zheka@uvt.cz>
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2013-09-20 11:26:50 +02:00
Yevheniy Demchenko
c5d4a0762f totemiba: Del channel fd from poll before destroy
Corosync freezes after several peer node connects/disconnects. The
freeze happens in recv_token_cq_recv_event_fn, in the ibv_get_cq_event
call. The problem is that after each peer node connect,
recv_token_accept_destroy is called, which tries to call
poll_dispatch_delete _after_ freeing the completion_channel. As the
completion_channel contains the fd, handlers are not disconnected from
the poller properly. This leads to complete inconsistency in subsequent
calls to handlers.

Signed-off-by: Yevheniy Demchenko <zheka@uvt.cz>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2013-09-20 11:26:04 +02:00
Yevheniy Demchenko
5046de387b totemiba: Properly allocate RDMA buffers
1. In UD mode, the receiving side of an RDMA application should have
enough space in the buffer to hold the data and the GRH. Also,
sge.length on the receiving side should be set to
max_msg_size + sizeof (struct ibv_grh). Current corosync doesn't take
the GRH into account and does not work if the mtu is set to the real mtu
of the IB port (it works if netmtu is set to < 2048-40).
2. ibv_wc.byte_len is the actual length of the received packet, i.e.
msg_len + GRH. The GRH length should be subtracted in further
processing. If not, it might cause problems when messages get
retransmitted, as their apparent size will constantly grow.
3. Current corosync will not work with rdma and mtus > 2048. Most modern
IB HW supports 4096 mtu.

Signed-off-by: Yevheniy Demchenko <zheka@uvt.cz>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2013-09-20 11:26:00 +02:00
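Note on 5046de387b: the buffer arithmetic from points 1 and 2 can be sketched without libibverbs. GRH_SIZE stands in for sizeof (struct ibv_grh), which is 40 bytes (matching the "2048-40" figure in the message); the function names are illustrative only.

```c
#include <stddef.h>

/* Stand-in for sizeof (struct ibv_grh): the GRH is 40 bytes long */
#define GRH_SIZE 40

/* Receive buffer (and sge.length) must hold the message plus the GRH */
static size_t ud_recv_buffer_size(size_t max_msg_size)
{
	return (max_msg_size + GRH_SIZE);
}

/*
 * ibv_wc.byte_len counts the GRH too, so subtract it to get the real
 * message length. Anything shorter than a GRH yields 0.
 */
static size_t ud_payload_len(size_t wc_byte_len)
{
	if (wc_byte_len < GRH_SIZE) {
		return (0);
	}

	return (wc_byte_len - GRH_SIZE);
}
```

For a 2048-byte message the receive buffer needs 2088 bytes, and a completion reporting 2088 bytes carries a 2048-byte payload. Skipping the subtraction is what makes a retransmitted message appear to grow by 40 bytes each time.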
Christine Caulfield
1a046793cb Reload: Add atomic reload to log config
When a reload is in progress, wait until it has all finished
before re-reading all of the logging parameters

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2013-09-12 16:10:07 +01:00
Christine Caulfield
c0bfd48928 Reload: Add atomic reload to totemconfig
When a reload is in progress, wait until the whole thing has
finished before setting parameters

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2013-09-12 16:09:55 +01:00
Christine Caulfield
82fbffc34b Reload: Add reload code to cfg
Add the code to do the actual corosync.conf reload to cfg, along with
a corosync-cfgtool -R command to trigger it

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2013-09-12 16:09:41 +01:00
Christine Caulfield
bc47c583bd Reload: Make coroparse use a designated icmap hash table
Pass an icmap hashtable into coroparse so we can load it into
a temporary one during reload

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2013-09-12 16:09:06 +01:00
Jan Friesse
95133a5d77 icmap: Add func to test equality of two key values
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2013-09-10 17:02:12 +02:00
Christine Caulfield
8567887abb [PATCH] Replace freopen with open/dup2 when daemonizing
This patch replaces the existing freopen method of
forcing stdin/out/err to /dev/null with the more
usual system of open/dup2.

While I don't like posting patches I don't fully understand,
this patch seems to fix a problem where stdout/err get
assigned to a socket causing double logging output
on systemd.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2013-09-10 15:33:31 +01:00
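Note on 8567887abb: the open/dup2 approach mentioned in this commit usually looks something like the sketch below. The helper name redirect_to_devnull is an assumption for this example and the code is not taken from the corosync source.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Point target_fd at /dev/null using open/dup2.
 * Returns 0 on success and -1 on failure.
 */
static int redirect_to_devnull(int target_fd)
{
	int devnull;

	devnull = open("/dev/null", O_RDWR);
	if (devnull == -1) {
		return (-1);
	}

	/* dup2 atomically replaces whatever target_fd referred to */
	if (dup2(devnull, target_fd) == -1) {
		close(devnull);
		return (-1);
	}

	/* Close the temporary descriptor unless it already is the target */
	if (devnull != target_fd) {
		close(devnull);
	}

	return (0);
}
```

A daemonizing process would typically call this for STDIN_FILENO, STDOUT_FILENO and STDERR_FILENO. Unlike freopen, this works on the file descriptors directly and leaves the stdio FILE objects alone.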
Christine Caulfield
3663622576 Add log message to exit signal handler
I've seen a few instances where corosync has shut down for
apparently 'no reason'. In fact most of the time the shutdown
has been caused by an external source (often an init script)
but it's not been obvious what has happened and people
implicate the daemon.

This patch simply adds a log message to the signal handler
when it is called so that the cause of the shutdown is obvious.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2013-09-03 14:04:50 +01:00
Jan Friesse
26ef8e15db icmap: Add map copy function
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2013-08-29 17:08:46 +02:00
Jan Friesse
e363f8b06d icmap: Add function to return item data pointer
icmap_get_r is now implemented using this function. The function is not
very safe, though it is defined as static.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2013-08-29 17:08:41 +02:00
Jan Friesse
624cd439aa icmap: Fix value len checking for strings
The implementation should allow passing only part of a string (a
shortened string) and must prohibit reading uninitialized memory.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2013-08-29 17:08:37 +02:00