Today, I have observed one of the reason that corosync running into
FAILED TO RECEIVE state.
There was five nodes(A,B,C,D,E) in my testing, and I limited the UDP
transmission rate of C nodes by iptables command:
iptables -A INPUT -i eth0 -p udp -m limit --limit 10000/s
--limit-burst 1 -j ACCEPT
iptables -A INPUT -i eth0 -p udp -j DROP
After one hour later, C node had been missing some MCAST messages,
it's state described as following:
==state of C node==
my_aru:0x805
my_high_seq_received:0xC2C
my_aru_count:7
=>receved MCAST message with seq:806 from B nodes
=>enter *message_handler_mcast*
=>add this message to regular_sort_queue
...
=>enter *update_aru* function
=> range = (my_high_seq_received - my_aru)
= (0xC2C - 0x805)
= 1063
=> if range>1024, do nothing and and return directly.
==END==
According this logic, after (my_high_req_received-my_aru)>1024, my_aru
will not be updated though corosync can receive MCAST messages
retransmitted by other nodes.
But at that timte, my_aru_count was only 7. So the corosync at C node
would keep in this status until my_aru_count increased to
fail_to_recv_const(the default value is 2500). This was a long time
for corosync, but we wasted it.
To solve this issue, maybe we can enlarge the range condition in
update_aru function? Or we just ingnore the checking of range value,
it seems no harmfull, because we have been using fail_to_recv_const to
control the things.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Although incorrect nodeid will not affect program's logic, but it will
make us confused when we add some logs to record the transmission path of
token in debug mode.
Signed-off-by: Yunkai Zhang <qiushu.zyk@taobao.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Accordig the totem protocal, nodes should enter GATHER state when it
receive JoinMSG in OPERATIONAL state. If we discard it in OPERATIONAL
state, the nodes sending this JoinMSG could not receive the response
untill other nodes reach token lost timeout.
This bug will cause nodes having entered GATHER state spend more time to
rejoin the ring, and then it will make nodes reach token expired timeout
more easily.
Signed-off-by: Yunkai Zhang <qiushu.zyk@taobao.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Add a new object called totem.interface.dynamic to allow creation/deletion
of new child objects using the corosync-objctl utility:
to add new member:
linux# corosync-objctl -c totem.interface.dynamic.10-211-55-12
to delete an existing member:
linux# corosync-objctl -d totem.interface.dynamic.10-211-55-12
Corosync will dynamically add these members to the configuration and start
communicating with those nodes.
Signed-off-by: Anton Jouline <anton.jouline@cbsinteractive.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Recent changes in patch "Get rid of hdb usage in totempg.h interface"
caused incompatibility between corosync API and totempg.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
This patch passes two test cases:
-------
Test #1
-------
Two node cluster - run cpgbench on each node
modify totemsrp with following defines:
Two test cases:
-------
Test #2
-------
5 node cluster
start 5 nodes randomly at about same time, start 5 nodes randomly at about
same time, wait 10 seconds and attempt to send a message. If message blocks
on "TRY_AGAIN" likely a message loss has occured. Wait a few minutes without
cyclng the nodes and see if the TRY_AGAIN state becomes unblocked.
If it doesn't the test case has failed
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Reviewed-by: Jan Friesse <jfriesse@redhat.com>
a memb_join operation that occurs during flushing can result in an
entry into the GATHER state from the RECOVERY state. This results in the
regular sort queue being used instead of the recovery sort queue, resulting
in segfault.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
There were too much false positives with passive mode rrp when high
number of messages were received.
Patch adds new configurable variable rrp_problem_count_mcast_threshold
which is by default 10 times rrp_problem_count_threshold and this is
used as threshold for multicast packets in passive mode. Variable is
unused in active mode.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed by: Steven Dake <sdake@redhat.com>
If all interfaces were faulty, passive_mcast_flush_send and related
functions ended in endless loop. This is now handled and if there is no
live interface, message is dropped.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed by: Steven Dake <sdake@redhat.com>
hdb has some expense and is not necessary in the totempg.so runtime. This
patch removes the dependence on hdb and instead uses a direct pointer.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
This API allows totem to operate as a multithreaded library. Performance is
better without threads but some library users may only have multithreaded
systems. In the corosync case where we have removed threads, this reduces
cpu utilization by ~10% by removing about 50% of the mutex lock and unlock calls
that occur during typical operation. Since the latest corosync is nearly
thread free, there is no need for mutex operations.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
This file is only used by totemsrp.c. Move out of general include
directory.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
This removes a sprintf operation in the totem and ipc logging operations
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
Ring ID was being displayed both as hex and decimal in places. Update so
it's displayed consistently (I chose hex) to make debugging easier.
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
As a corosync-newbie it can be hard to bridge the gap between where a
particular message is sent and where the receive handler processes it,
and vice versa.
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
corosync_timer_handle_t is know conditionally defined to prevent double
definition causing compile fault on RHEL 6 systems.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
IPC: return 0/-ENOBUFS from message handler
IPC: use the new rate_limit API to improve perf.
CPG: add send_async API & hook up flow control
IPC: Fix flow control getting stuck.
IPC: Port the remaining libs to use libqb IPC
IPC: remove libqb flowcontrol API
TEST: put cpg_dispatch() in it's own thread
IPC: cleanup ipc_glue.c name everything cs_ipcs_*()
IPC: add back statistics
IPC: remove coroipcc_ symbols from lib*.versions
IPC: init each se's IPC as it is loaded.
IPC: use the new connection_closed() event to free the context.
IPC: re-add zero copy functionality back
IPC: remove cpg_mcast_joined_async() and make it the default
-> now cpg_mcast_joined() == cpg_mcast_joined_async()
libqb: expose a libqb error converter
libqb: add missing error conversions
libqb: remove repeat try loop in lib/cpg.c
CPG: fix zero copy mcast
CPG: use newer return codes
Add ENOTCONN to qb_to_cs_error()
libqb: fix error conversion from errno to cs_error_t in confdb
libqb: change errno_to_cs to qb_to_cs_error
libqb: add a cs_strerror() to get a more meaningful message
libqb: fix some confusing error conversions.
libqb: set the timeout on recv's to -1 (wait forever)
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>