mirror_corosync/exec
Jan Friesse 3675daceee totem: Increase ring_id seq after load
This patch handles the situation where the leader
node (the node with the lowest node_id) crashes and is started again
before the token timeout expires on the rest of the cluster.
The newly restarted node restores the ring id of the old ring from
stable storage, so it has the same ring id as the rest of the nodes,
but its ARU is zero. If the node is able to create a singleton membership
before receiving a joinlist from the rest of the cluster,
everything works as expected, because the ring id gets increased
correctly.

But if the node receives a joinlist from another cluster node before
its own joinlist, then it continues as if it had never left
the cluster. This is not correct, because the new node should always
create a singleton configuration first.

During the recovery phase, ARUs are compared and because they differ
(the ARU of the old leader node is 0), the other nodes
try to send all of their previous messages. This is impossible
(even if it were correct), because the other nodes have already freed most
of those messages. The implementation uses an assert to limit the maximum
number of messages sent during recovery (we could fix this,
but it's not really the point).

The solution here is to increase the ring_id sequence number by 1 after
loading it from storage. During creation of the commit token it is
always increased by 4, so it will not collide with an existing
sequence.
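
A minimal sketch of the idea, not the actual totemsrp.c code. The
structure and the function names ring_id_load and commit_token_next_seq
are hypothetical; only the "+1 after load" and "+4 for the commit token"
steps come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified ring id; the real corosync structure differs. */
struct ring_id {
	uint32_t rep;	/* node id of the ring representative */
	uint64_t seq;	/* ring sequence number */
};

/*
 * Restore the ring id after a restart. saved_seq stands for the value
 * read back from stable storage. It is bumped by 1 so the restarted
 * node no longer shares a ring id with the nodes that stayed up.
 */
static void ring_id_load(struct ring_id *rid, uint32_t my_node_id,
	uint64_t saved_seq)
{
	assert(rid != NULL);
	rid->rep = my_node_id;
	rid->seq = saved_seq + 1;
}

/*
 * Sequence number chosen when a commit token is created: always 4 above
 * the highest sequence seen so far.
 */
static uint64_t commit_token_next_seq(uint64_t highest_seen_seq)
{
	return highest_seen_seq + 4;
}
```

With a saved sequence of 40, the restarted node comes up with 41 while
the next ring formed by the surviving nodes would use 44, so the two
values cannot be equal.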

Thanks to Christine Caulfield <ccaulfie@redhat.com> for clarifying the
commit message.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2019-07-15 16:39:32 +02:00
..
.gitignore Add .gitignore files. 2010-10-21 07:43:46 -07:00
apidef.c CFG: Remove ring-reenable code 2017-08-03 14:32:02 +02:00
apidef.h Update copyright header dates in exec directory 2012-02-13 17:05:04 -07:00
cfg.c logging: Add CS_PRI_NODE_ID and CS_PRI_RING_ID 2019-07-03 10:53:52 +02:00
cmap.c logging: Add CS_PRI_NODE_ID and CS_PRI_RING_ID 2019-07-03 10:53:52 +02:00
coroparse.c knet: Use block_unlisted_ips 2019-05-29 16:30:18 +02:00
cpg.c logging: Add CS_PRI_NODE_ID and CS_PRI_RING_ID 2019-07-03 10:53:52 +02:00
cs_queue.h Update copyright header dates in exec directory 2012-02-13 17:05:04 -07:00
fsm.h Make logging of WD and MON service correct 2012-08-16 14:45:15 +02:00
icmap.c stats: Add map with on-demand statistics 2017-07-27 15:53:04 +02:00
ipc_glue.c main: Move sched paramaters to config file 2018-11-15 17:30:03 +01:00
ipcs_stats.h stats: Add cmap key to clear the various stats. 2017-10-31 17:39:14 +01:00
logconfig.c logsys: Make hires timestamp default 2018-10-29 17:45:35 +01:00
logconfig.h list: Replace uses of list.h with qblist.h 2016-10-27 14:56:52 +02:00
logsys.c logsys: Support hires timestamp 2018-10-29 17:45:29 +01:00
main.c logging: Add CS_PRI_NODE_ID and CS_PRI_RING_ID 2019-07-03 10:53:52 +02:00
main.h main: Replace COROSYNC_MAIN_CONFIG_FILE 2018-11-15 17:30:14 +01:00
Makefile.am nozzle: Add support for libnozzle devices 2019-02-26 13:11:35 +01:00
mon.c list: Replace uses of list.h with qblist.h 2016-10-27 14:56:52 +02:00
pload.c build: bring SOLARIS up to the same standard as other OSes 2012-08-30 15:00:27 +02:00
quorum.c Remove redundant header file inclusion 2016-12-05 09:59:08 +01:00
quorum.h Update copyright header dates in exec directory 2012-02-13 17:05:04 -07:00
schedwrk.c schedwrk: Cleanup and make it work on PPC BE 2016-05-17 16:29:25 +02:00
schedwrk.h Update copyright header dates in exec directory 2012-02-13 17:05:04 -07:00
service.c service: Fix memleak in service_unlink_and_exit 2013-06-21 11:21:29 +02:00
service.h service: remove leftovers from mt corosync 2012-08-09 15:10:16 +02:00
stats.c stats: Fix delete of track 2018-11-16 11:47:22 +01:00
stats.h stats: Add map with on-demand statistics 2017-07-27 15:53:04 +02:00
sync.c sync: Call sync_init of all services at once 2017-11-16 15:22:19 +01:00
sync.h sync: kill evil and syncv1 in one shot 2012-03-09 11:15:08 +01:00
timer.c Update copyright header dates in exec directory 2012-02-13 17:05:04 -07:00
timer.h Update copyright header dates in exec directory 2012-02-13 17:05:04 -07:00
totemconfig.c logging: Add CS_PRI_NODE_ID and CS_PRI_RING_ID 2019-07-03 10:53:52 +02:00
totemconfig.h config: Allow links to have different ip_versions 2017-12-22 17:15:19 +01:00
totemip.c totemip: Use res in totemip_sa_equal 2019-06-12 15:40:50 +02:00
totemknet.c logging: Add CS_PRI_NODE_ID and CS_PRI_RING_ID 2019-07-03 10:53:52 +02:00
totemknet.h totem: Display IP of sender 2018-03-16 13:58:15 +01:00
totemnet.c totem: Display IP of sender 2018-03-16 13:58:15 +01:00
totemnet.h totem: Display IP of sender 2018-03-16 13:58:15 +01:00
totempg.c config: Fix crash in reload if new interfaces are added 2018-10-15 15:54:57 +02:00
totemsrp.c totem: Increase ring_id seq after load 2019-07-15 16:39:32 +02:00
totemsrp.h Add option to force cluster into GATHER state 2018-09-07 13:27:36 +02:00
totemudp.c totem: Display IP of sender 2018-03-16 13:58:15 +01:00
totemudp.h totem: Display IP of sender 2018-03-16 13:58:15 +01:00
totemudpu.c udpu: Drop packets from unlisted IPs 2019-05-29 16:30:10 +02:00
totemudpu.h totem: Display IP of sender 2018-03-16 13:58:15 +01:00
util.c main: Rename run_dir to state_dir 2018-12-14 13:48:33 +01:00
util.h main: Rename run_dir to state_dir 2018-12-14 13:48:33 +01:00
votequorum.c logging: Add CS_PRI_NODE_ID and CS_PRI_RING_ID 2019-07-03 10:53:52 +02:00
votequorum.h list: Replace uses of list.h with qblist.h 2016-10-27 14:56:52 +02:00
vsf_quorum.c logging: Add CS_PRI_NODE_ID and CS_PRI_RING_ID 2019-07-03 10:53:52 +02:00
vsf_ykd.c YKD: Fix loading of YKD quorum module 2014-08-18 09:33:59 +01:00
vsf_ykd.h list: Replace uses of list.h with qblist.h 2016-10-27 14:56:52 +02:00
vsf.h Update copyright header dates in exec directory 2012-02-13 17:05:04 -07:00
wd.c wd: fix snprintf warnings 2017-12-01 17:23:54 +01:00