This patch passes two test cases:
-------
Test #1
-------
Two node cluster - run cpgbench on each node
modify totemsrp with following defines:
Two test cases:
-------
Test #2
-------
5 node cluster
start 5 nodes randomly at about same time, start 5 nodes randomly at about
same time, wait 10 seconds and attempt to send a message. If message blocks
on "TRY_AGAIN" likely a message loss has occured. Wait a few minutes without
cyclng the nodes and see if the TRY_AGAIN state becomes unblocked.
If it doesn't the test case has failed
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Reviewed-by: Jan Friesse <jfriesse@redhat.com>
NSS is currently non-conditional. Allow nss to be build conditonally.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalked@redhat.com>
a memb_join operation that occurs during flushing can result in an
entry into the GATHER state from the RECOVERY state. This results in the
regular sort queue being used instead of the recovery sort queue, resulting
in segfault.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
There were too much false positives with passive mode rrp when high
number of messages were received.
Patch adds new configurable variable rrp_problem_count_mcast_threshold
which is by default 10 times rrp_problem_count_threshold and this is
used as threshold for multicast packets in passive mode. Variable is
unused in active mode.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed by: Steven Dake <sdake@redhat.com>
If all interfaces were faulty, passive_mcast_flush_send and related
functions ended in endless loop. This is now handled and if there is no
live interface, message is dropped.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed by: Steven Dake <sdake@redhat.com>
hdb has some expense and is not necessary in the totempg.so runtime. This
patch removes the dependence on hdb and instead uses a direct pointer.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
This API allows totem to operate as a multithreaded library. Performance is
better without threads but some library users may only have multithreaded
systems. In the corosync case where we have removed threads, this reduces
cpu utilization by ~10% by removing about 50% of the mutex lock and unlock calls
that occur during typical operation. Since the latest corosync is nearly
thread free, there is no need for mutex operations.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
This file is only used by totemsrp.c. Move out of general include
directory.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
This removes a sprintf operation in the totem and ipc logging operations
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
In a 10-node cluster where all nodes are booting up and starting corosync
at the same time, sometimes during this process corosync detects a node as
leaving and rejoining the cluster.
Occasionally the downlist that gets picked contains the local node. When the
local node sends leave events for the downlist (including itself), it sets
its cpd state to CPD_STATE_UNJOINED and clears the cpd->group_name. This
means it no longer sends CPG events to the CPG client.
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Ring ID was being displayed both as hex and decimal in places. Update so
it's displayed consistently (I chose hex) to make debugging easier.
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
As a corosync-newbie it can be hard to bridge the gap between where a
particular message is sent and where the receive handler processes it,
and vice versa.
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
corosync_timer_handle_t is know conditionally defined to prevent double
definition causing compile fault on RHEL 6 systems.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>