Signed-off-by: Steven Dake <sdake@redhat.com>
Signed-off-by: Fabio Di Nitto <fdinitto@redhat.com>
Reviewed-by: Fabio Di Nitto <fdinitto@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Quorum is broken in this patch.
service.h needs to be cleaned up significantly
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Fabio Di Nitto <fdinitto@redhat.com>
Biggest difference between fast and standard inc/dec operation is in
fast that fast doesn't do malloc/memcpy, but also it means that tracking
events doesn't have old value set.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
New key with faulty status of ring is created in cmap as name
runtime.totem.pg.mrp.rrp.$ring_number.faulty
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
this flag (0|1) can be configured via quorum.last_man_standing and when
enabled, it allows expected_votes to be dynamically recalculated.
Assuming an 8 nodes cluster, every node votes 1 (mandatory requirement for
this feature).
In the first event, 3 nodes are lost.
The remaining partition of 5 is barely quorate.
After a configurable timeout (quorum.last_man_standing_window, default 10sec)
the quorate partition is allow to recalculate expected_votes based on
the remaining nodes.
This operation will bring expected_votes to 5 and quorum to 3.
Repeating the above loop, in the next event, 2 more nodes are allowed to
die. etc. etc.
Reviewed-by: Steven Dake <sdake@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
this is a very old leftover from the RHEL5 timeframe, not used in RHEL6.
Also change votequorum soname since this change implies an ABI change.
Reviewed-by: Steven Dake <sdake@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
this flag (0|1) can be configured via quorum.auto_tie_breaker and when
enabled, support for perfect even split is on.
In case of a 50% of votes loss in one single transition, the partition
with the node that has the lowest node id will remain quorate.
Reviewed-by: Steven Dake <sdake@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
this flag (0|1) can be configured via quorum.wait_for_all and changes
behavior when granting quorum for the first time.
Normal behavior (default / 0) grants quorum as soon as enough nodes
are available in a cluster.
Setting this value to 1 will grant quorum only after all cluster
memembers are part of the cluster at the same time.
Reviewed-by: Steven Dake <sdake@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
corosync internal theory of operation is that without a quorum provider
the cluster is always quorate. This is fine for membership free clusters
but it does pose a problem for applications that need membership and
"real" quorum.
this change add quorum_type to quorum_initialize call to return QUORUM_FREE
or QUORUM_SET. Applications can then make their own decisions to error out
or continue operating.
The only other way to know if a quorum provider is enabled/configured is
to poke at confdb/objdb, but adds an unnecessary burden to applications
that really don't need to use an entire library for a boolean value.
Reviewed-by: Steven Dake <sdake@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Our preferred shared logging system is exported via the libqb library. As
a result, the corosync project no longer needs to export logsys.so and the
code can be directly included in the binary. The header file can also be
removed.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
switch from LOG_INFO to LOG_DEBUG for some basic operations
Reviewed-by: Steven Dake <sdake@redhat.com>
Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Icmap is replacement for objdb, based on libqb map (trie).
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Currently:
- send_reserve() adds to the reserve
- msg_count_send_ok() tests ((avail - totempg_reserved) > msg_count)
So essentially we are checking to see if 2 * msg_count can fit in
the q.
So instead I am using byte_count_send_ok (size) to see if the
message will fit then calling send_reserve()
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
To allow async cpg messages of 1M we need to:
1) increase the totem queue size 4 times
2) align the critical level to one large message free
There are a number of reasons for doing this:
We can't let cpg_mcast_joined() fail because the user will not see it
and will assume is has succeded.
The reason we are getting good performance is by providing a negative
feedback loop from the totem q to the IPC/poll system. This relies
on 4 q states low/med/high/crit. With messages of size 1M you
now have a q of size one and now go from level low to crit instantly
then back to low as messages are put on and taken off. I don't think
this is the best behaviour. By having a q size of 4 allows the system
to utilize the q better and give us time to respond to changes in
the q level.
To effectively achieve flow control with a q of size 1 would require
all the clients to request the space on the q like is done in
totempg_groups_joined_reserve() but probably in shared memory
This would take quite a bit of re-work.
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Many functions do not require flowcontrol and are two-way
so they can get failures from corosync.
Only cpg_mcast_joined() _really_ needs the current level
of flowcontrol.
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Today, I have observed one of the reason that corosync running into
FAILED TO RECEIVE state.
There was five nodes(A,B,C,D,E) in my testing, and I limited the UDP
transmission rate of C nodes by iptables command:
iptables -A INPUT -i eth0 -p udp -m limit --limit 10000/s
--limit-burst 1 -j ACCEPT
iptables -A INPUT -i eth0 -p udp -j DROP
After one hour later, C node had been missing some MCAST messages,
it's state described as following:
==state of C node==
my_aru:0x805
my_high_seq_received:0xC2C
my_aru_count:7
=>receved MCAST message with seq:806 from B nodes
=>enter *message_handler_mcast*
=>add this message to regular_sort_queue
...
=>enter *update_aru* function
=> range = (my_high_seq_received - my_aru)
= (0xC2C - 0x805)
= 1063
=> if range>1024, do nothing and and return directly.
==END==
According this logic, after (my_high_req_received-my_aru)>1024, my_aru
will not be updated though corosync can receive MCAST messages
retransmitted by other nodes.
But at that timte, my_aru_count was only 7. So the corosync at C node
would keep in this status until my_aru_count increased to
fail_to_recv_const(the default value is 2500). This was a long time
for corosync, but we wasted it.
To solve this issue, maybe we can enlarge the range condition in
update_aru function? Or we just ingnore the checking of range value,
it seems no harmfull, because we have been using fail_to_recv_const to
control the things.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Although incorrect nodeid will not affect program's logic, but it will
make us confused when we add some logs to record the transmission path of
token in debug mode.
Signed-off-by: Yunkai Zhang <qiushu.zyk@taobao.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Accordig the totem protocal, nodes should enter GATHER state when it
receive JoinMSG in OPERATIONAL state. If we discard it in OPERATIONAL
state, the nodes sending this JoinMSG could not receive the response
untill other nodes reach token lost timeout.
This bug will cause nodes having entered GATHER state spend more time to
rejoin the ring, and then it will make nodes reach token expired timeout
more easily.
Signed-off-by: Yunkai Zhang <qiushu.zyk@taobao.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Add a new object called totem.interface.dynamic to allow creation/deletion
of new child objects using the corosync-objctl utility:
to add new member:
linux# corosync-objctl -c totem.interface.dynamic.10-211-55-12
to delete an existing member:
linux# corosync-objctl -d totem.interface.dynamic.10-211-55-12
Corosync will dynamically add these members to the configuration and start
communicating with those nodes.
Signed-off-by: Anton Jouline <anton.jouline@cbsinteractive.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Recent changes in patch "Get rid of hdb usage in totempg.h interface"
caused incompatibility between corosync API and totempg.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>