Commit Graph

281 Commits

Author SHA1 Message Date
Christine Caulfield
14a5e6f361 totemsrp: Fix orf_token stats
Previously, orf_token_tx was only incremented on initial send,
this is obviously wrong and resulted in the TX count being
significantly lower than any RX count. Now we increment it every
time the ORF token is sent or resent.

As a quick test, on a single node system the RX and TX stats
will now match.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2024-10-31 10:58:02 +01:00
Jan Friesse
749f1cb9a5 totem: Use uint64_t type and QB_TIME_NS_IN_MSEC
Function message_handler_orf_token contains extra debug info enabled by
defining GIVEINFO. Insted of using long long unsigned int use better
suited uint64_t and make use of QB_TIME_NS_IN_MSEC constant instead
of hardcoded number. Also compile tv_old conditionally so it is not used
by accident.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2024-10-23 16:03:14 +02:00
Jan Friesse
55a6f657f4 totem: Use proper timestamp type for token warning
Timestamp diff is very unlikely to be larger than 32-bit integer but it
is still worth to use 64-bit.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2024-10-23 16:03:01 +02:00
Jan Friesse
3785829935 stats: Store token rx and tx timestamps as 64-bit
Token rx and tx timestamps were computed and stored as 32-bit unsigned
integer but substracted in other parts of code from 64-bit integer.
Result was, that node with uptime larger than 49.71 days
(2^32/(1000*60*60*24)) reported wrong numbers for
stats.srp.time_since_token_last_received and in log message during long
pause (function timer_function_orf_token_warning).

Solution is to store rx and tx data as 64-bit integer.

Fixes #761

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2024-10-23 16:02:50 +02:00
Jan Friesse
c01fd757a0 totem: Fix reference links
Link Corosync project archived copy of Yair Amir's PhD thesis
and paper about totem protocol.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2024-03-12 17:22:42 +01:00
Christine Caulfield
33fa5dcb85 config: Fail to start if ping timers are invalid
This required adding a lot of return values to two previously
'void' functions. I did two rather than just the one that was
needed because it seemed to make sense to do them both together.

Although these functions now return errors, they are probably
still ignored higher up. this really needs a comprehensive audit.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2023-10-05 15:53:55 +02:00
Jan Friesse
e7a82370a7 totemsrp: Switch totempg buffers at the right time
Commit 92e0f9c7bb added switching of
totempg buffers in sync phase. But because buffers got switch too early
there was a problem when delivering recovered messages (messages got
corrupted and/or lost). Solution is to switch buffers after recovered
messages got delivered.

I think it is worth to describe complete history with reproducers so it
doesn't get lost.

It all started with 402638929e (more info
about original problem is described in
https://bugzilla.redhat.com/show_bug.cgi?id=820821). This patch
solves problem which is way to be reproduced with following reproducer:
- 2 nodes
- Both nodes running corosync and testcpg
- Pause node 1 (SIGSTOP of corosync)
- On node 1, send some messages by testcpg
  (it's not answering but this doesn't matter). Simply hit ENTER key
  few times is enough)
- Wait till node 2 detects that node 1 left
- Unpause node 1 (SIGCONT of corosync)

and on node 1 newly mcasted cpg messages got sent before sync barrier,
so node 2 logs "Unknown node -> we will not deliver message".

Solution was to add switch of totemsrp new messages buffer.

This patch was not enough so new one
(92e0f9c7bb) was created. Reproducer of
problem was similar, just cpgverify was used instead of testcpg.
Occasionally when node 1 was unpaused it hang in sync phase because
there was a partial message in totempg buffers. New sync message had
different frag cont so it was thrown away and never delivered.

After many years problem was found which is solved by this patch
(original issue describe in
https://github.com/corosync/corosync/issues/660).
Reproducer is more complex:
- 2 nodes
- Node 1 is rate-limited (used script on the hypervisor side):
  ```
  iface=tapXXXX
  # ~0.1MB/s in bit/s
  rate=838856
  # 1mb/s
  burst=1048576
  tc qdisc add dev $iface root handle 1: htb default 1
  tc class add dev $iface parent 1: classid 1:1 htb rate ${rate}bps \
    burst ${burst}b
  tc qdisc add dev $iface handle ffff: ingress
  tc filter add dev $iface parent ffff: prio 50 basic police rate \
    ${rate}bps burst ${burst}b mtu 64kb "drop"
  ```
- Node 2 is running corosync and cpgverify
- Node 1 keeps restarting of corosync and running cpgverify in cycle
  - Console 1: while true; do corosync; sleep 20; \
      kill $(pidof corosync); sleep 20; done
  - Console 2: while true; do ./cpgverify;done

And from time to time (reproduced usually in less than 5 minutes)
cpgverify reports corrupted message.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2021-11-03 10:19:44 +01:00
Jan Friesse
cdf72925db totem: Add cancel_hold_on_retransmit config option
Previously, existence of retransmit messages canceled holding
of token (and never allowed representative to enter token hold
state).

This makes token rotating maximum speed and keeps processor
resending messages over and over again - overloading network
and reducing chance to successfully deliver the messages.

Also there were reports of various Antivirus / IPS / IDS which slows
down delivery of packets with certain sizes (packets bigger than token)
what make Corosync retransmit messages over and over again.

Proposed solution is to allow representative to enter token hold
state when there are only retransmit messages. This allows network to
handle overload and/or gives Antivirus/IPS/IDS enough time scan and
deliver packets without corosync entering "FAILED TO RECEIVE" state and
adding more load to network.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2021-08-20 16:55:48 +02:00
Christine Caulfield
9e7f62d27d cfg: New API to get extended node/link infomation
Current we horribly over-use totempg_ifaces_get() to
retrieve information about knet interfaces. This is an attempt to
improve on that.

All transports are supported (so not only Knet but also UDP(U)).

This patch builds best against the "onwire-upgrade" branch of knet
as that's what sparked my interest in getting more information out.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2020-11-26 16:15:50 +01:00
Aleksei Burlakov
98bfd9988b totemsrp: More informative messages
... when token and consensus timeouts pop.

Signed-off-by: Aleksei Burlakov <aburlakov@suse.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2020-10-15 16:46:51 +02:00
Jan Friesse
40d636e9ef totemsrp: Move token received callback
Trigger token received callback only for valid token.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2020-09-29 15:51:49 +02:00
Christine Caulfield
5f71445be0 config: Allow reconfiguration of crypto options
Needs new knet crypto API.

If it's not available, then fall back to the old
API and forbid changing crypto while running.

To avoid us being dependant on the leader node, each
node sends its own crypto_reconfig_phase messages so
we can guarantee that the reconfiguration always completes
on each node.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2020-07-09 16:54:16 +02:00
Christine Caulfield
f078fff6eb config: Reorganise the config system
To be more reliable & maintainable

The basic plan here is to fix reloads to be more stable
using read/parse/verify/build/commit stages, so that any errors
will not leave corosync in an unstable state. This should
also make the code more maintainable as currently the verify/commit
stages are horribly intertwined.

Also:
- Fix local_node_pos not being updated in the new map during validation
 (broke adding and removing new nodes in the middle of the list).
- Fix reconfiguration so that nodes are indexed by nodeid and not their
  position in the list. This is an old bug that's just been carried
  over

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2020-04-24 16:26:44 +02:00
Jan Friesse
6ba9870f69 Initialize stack allocated memory
Some functions allocated memory on stack without clearing memory and
then send them on wire. This is not an issue, but valgrind reports this
as a problem so it is easy to miss real problem then.

Solution is to clear stack memory.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2019-11-08 11:20:18 +01:00
Jan Friesse
ee8b8993d9 totemsrp: Reduce MTU to left room second mcast
Messages sent during recovery phase are encapsulated so such message has
extra size of mcast structure. This is not so big problem for UDPU,
because most of the switches are able to fragment and defragment packet
but it is problem for knet, because totempg is using maximum packet size
(65536 bytes) and when another header is added during retransmition,
then packet is too large.

Solution is to reduce mtu by 2 * sizeof (struct mcast).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2019-10-09 11:48:43 +02:00
Jan Friesse
3675daceee totem: Increase ring_id seq after load
This patch handles the situation where the leader
node (the node with lowest node_id) crashes and is started again
before token timeout of the rest of the cluster.
The newly restarted node restores the ringid of the old ring from
stable storage, so it has the same ringid as rest of the nodes,
but ARU is zero. If the node is able to create a singleton membership
before receiving the joinlist from rest of the cluster,
everything works as expected, because the ring id gets increased
correctly.

But if the node receives a joinlist from another cluster node before
its own joinlist, then it continues as it would had it never left
the cluster. This is not correct, because the new node should always
create a singleton configuration first.

During the recovery phase, ARUs are compared and because they differ
(the ARU of the old leader node is 0), the other nodes
try to sent all of their previous messages. This is impossible
(even if it was correct), because other nodes have already freed most
of those messages. The implementation uses an assert to limit maximum
number of messages sent during recovery (we could fix this,
but it's not really the point).

The solution here is to increase the ring_id sequence number by 1 after
loading it from storage. During creation of the commit token it is
always increased by 4, so it will not collide with an existing
sequence.

Thanks Christine Caulfield <ccaulfie@redhat.com> for clarify commit
message.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2019-07-15 16:39:32 +02:00
Jan Friesse
5731af2782 logging: Add CS_PRI_NODE_ID and CS_PRI_RING_ID
Previously node id was logged ether as a %d (most often), %u, %x or
PRI.32 and ring id ether as %lld, %llx with various separators (., :, /)
between rep nodeid and seq. This seems to cause confusion.

This patch adds macros CS_PRI_NODE_ID, CS_PRI_RING_ID and
CS_PRI_RING_ID_SEQ (CS prefix = corosync, PRI modeled in spirit of
inttypes.h PRIx32) and makes code use them.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2019-07-03 10:53:52 +02:00
Jan Friesse
b0c24ec665 totemsrp: Fix warnings produced by gcc 9.1
New gcc warn about passing posibly unaligned pointer from packed
structure. This shouldn't be problem for x86.

Implemented solution is to let compiler do its job (compiler knows if
pointer is aligned so accessing structure field is safe) and
use it together with support for asigning and returning of structure
(not a pointer to the structure).

- srp_addr_copy is removed and replaced by simple assignment
- srp_addr_copy_endian_convert is removed and replaced by
  srp_addr_endian_convert function which takes srp_addr structure and
  returns endian converted srp_addr structure
- functions which accepts srp_addr array are not changed because
  (luckily) non-aligned pointer is always just one item array and
  such item is always used as a source pointer so it's possible to use
  temporary variable

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2019-06-14 10:03:31 +02:00
yuan ren
24a72e9780 totemsrp: Word spelling mistake
Signed-off-by: yuan ren <reyren179@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2019-04-01 08:20:46 +02:00
Jan Friesse
06504c0f6f build: Remove NSS dependencies
Complete removal of NSS from corosync tree. Most of the changes are
in build system and cpgverify had to be rewritten to use crc32 instead
of sha1.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-09-17 10:26:05 +02:00
Chris Walker
51989b4a0a Add option to force cluster into GATHER state
Signed-off-by: Chris Walker <cwalker@cray.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-09-07 13:27:36 +02:00
Chris Walker
3f7d2cf6aa Add token_warning configuration option
Token_warning is used to present information about
when the token was last received.

Signed-off-by: Chris Walker <cwalker@cray.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-08-14 10:34:49 +02:00
Jan Friesse
f60541513e totemsrp: Add assert into memb_lowest_in_config
Add assert when there are no members in token_memb structure so
non-existing member is not accessed (token should always have
at least one member).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-08-13 09:00:47 +02:00
Jan Friesse
e45bbcc92a totemsrp: Fix leave message regression
Leave message in totem is just join message where leaving member is
excluded from member list and included in fail list. It also contains
special nodeid in header.nodeid and system_from.nodeid fields.

Before "totem: Use nodeid ONLY in srp_addr" fix, most of the functions
were using system_from addresses and not nodeid, which was used only in
one specific case for memb_consensus_set function.

After the patch, addresses are gone and only nodeid is used. Result is,
that leaving node nodeid is not added into local fail list
(my_faillist) so node is unable to reach consensus till token timeout,
which starts new gather process.

Solution is to send valid leaving node nodeid in system_from.nodeid and
handle specific case for memb_consensus_set in memb_join_process.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-04-23 17:46:05 +02:00
Jan Friesse
dc590159f5 totemsrp: Log proc/fail lists in memb_join_process
These information are useful and with trace log level they should not be
too much irritating.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-04-23 17:45:51 +02:00
Jan Friesse
9b3782e48e totemsrp: Fix srp_addr_compare
There is regression caused by "totem: Use nodeid ONLY in srp_addr" patch
in srp_addr_compare function. This function should be usable with qsort,
so it should return values less than, equal to or greater than zero. It
was however returning only zero or negation of a zero. Final results
were unable to reach consensus in following test case:
- 3 node cluster
- start nodes 1, 2, 3
- shutdown node 3
- start node 3
- shutdown node 2
- start node 2
- shutdown node 1

After this steps, node 2 and 3 were unable to reach consensus.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-04-23 17:45:29 +02:00
Jan Friesse
ccb2290f84 totemsrp: Check join and leave msg length
If number of proc_list, failed_list or active members is too high it
may be impossible to put them into message, which is allocated on the
stack what results in stack corruption.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-04-12 15:25:38 +02:00
Jan Friesse
c139255669 totemsrp: Implement sanity checks of received msgs
Sanity checkers are used to prevent crashing because of
accessing unallocated memory.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-04-12 15:25:33 +02:00
Jan Friesse
69857efb5b totem: Display IP of sender
To make finding victim of incompatible messages easier, IP of sender is
logged. Propagating IP in layers makes patch slightly larger.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-03-16 13:58:15 +01:00
Jan Friesse
0c509a25a7 totemsrp: Add magic and version into header
Magic number (0xC070) together with version in every packet
is used for detecting that other node is really
Corosync 3.x.

Endian_detector field is removed and magic number is now
used instead.

If received packet magic number differs, guessing is used to show more
about the source (Corosync 2.3+, 2.2 are quite reliable, Knet and
unencrypted Corosync 2.1/2.0/1.x/OpenAIS are semi-reliable and encrypted
Corosync 2.1/2.0/1.x/OpenAIS are quite unreliable).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2018-03-16 13:57:55 +01:00
Christine Caulfield
2c20590d16 knet: Always use link0 for loopback
Even if it's not used for anything else.

Also, make cfgtool show the correct link ID when links are not
contiguous

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-03-01 14:23:20 +01:00
Christine Caulfield
111bfbc11d totem: Fix debug warnings printed by knet
Fix crash introduced a couple of commits ago in iface_get

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-03-01 14:22:22 +01:00
Christine Caulfield
386d710ed1 cfg: Fix cfg_get_node_addrs so that DLM works
Also update copyright dates

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-03-01 14:19:45 +01:00
Christine Caulfield
f5b690bd96 totem: Return interface count correctly
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-03-01 14:19:12 +01:00
Christine Caulfield
fc8580bdbf totem: Use nodeid ONLY in srp_addr
This shrinks the srp_addr (and consequently every packet sent by
corosync) so that instead of containing loads of IP addresses to
identify a node, it just sends the nodeid.

This then allows us to make ring0 optional and replaceable when running
knet.

It also means that we need some other way of identifying the local
node in corosync.conf, so the nodelist.node.name entry is now mandatory
and is mapped to the local host using the same algorithm as used in
cman.

This code needs LOTS of testing as it touches a huge amount of totemsrp
and totemconfig.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2018-03-01 14:18:51 +01:00
Christine Caulfield
1ca72a1154 totemsrp: Revert totemsrp_get_ifaces() changes
In my enthusiasm for removing code while integrating knet I
also deleted the correct code for returning IP address for a node,
so that only the IP addres of the local node was ever returned.

This commit restores the the previous code.

Also, because we always return INTERFACE_MAX interfaces now (they don't
have to be contiguous) set ss_family to zero if that interface is not
in use so that downstream apps know and don't display a lot of 0.0.0.0

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-11-30 16:59:05 +01:00
Christine Caulfield
d9dfd41e4e stats: Add cmap key to clear the various stats.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-10-31 17:39:14 +01:00
Christine Caulfield
16f616b65d knet: Add support for knet compression
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-10-23 17:30:25 +02:00
Christine Caulfield
294a629fb5 config: Allow dynamic link configuration
Now we are using knet, it's possible to dynamically add, remove and
reconfigure links on the fly.

Also print 'n' for non-existant knet links. This will show up
only on loopback links >0. But it looks better than 'status ='

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-09-21 17:16:21 +02:00
Christine Caulfield
9da89f32c2 CFG: Remove ring-reenable code
RRP doesn't exist any more so all the ring re-enable code is redundant.

I've removed it from the library and all the code that does anything,
but I've left the hole in the IPC just in case old libraries are
hanging around.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2017-08-03 14:32:02 +02:00
Jan Friesse
564b4bf7d4 totem: Propagate totem initialization failure
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2017-06-15 11:07:33 +02:00
Jan Friesse
1f90c31ba7 list: Replace for_each by safe version where need
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2016-10-27 14:56:52 +02:00
Michael Jones
b4c06e52f3 list: Replace uses of list.h with qblist.h
Signed-off-by: Michael Jones <jonesmz@jonesmz.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-10-27 14:56:52 +02:00
Christine Caulfield
268cde6ee4 totem: Add Kronosnet transport.
This is a big update that removes RRP & MRP from the codebase
and makes knet the default transport for corosync. UDP & UDPU
are still (currently) supported but are deprecated. Also crypto
and mutiple interfaces are only supported over knet.

To compile this codebase you will need to install libknet from
https://github.com/fabbione/kronosnet

The corosync.conf(5) man page has been updated with info on the new
options. Older config files should still work but many options
have changed because of the knet implementation so configs should
be checked carefully. In particular any cluster using using RRP
over UDP or UDPU will not start as RRP is no longer present. If you
need multiple interface support then you should be using the knet transport.

Knet brings many benefits to the corosync codebase, it provides support
for more interfaces than RRP (up to 8), will be more reliable in the event
of network outages and allows dynamic reconfiguration of interfaces.
It also fixes the ifup/ifdown and 127.0.0.1 binding problems that have
plagued corosync/openais from day 1

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2016-10-11 10:09:42 +01:00
HideoYamauchi
71c9035c27 Low: totemsrp: Addition of the log.
Signed-off-by: HideoYamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-08-01 10:11:45 +02:00
Ferenc Wágner
c76ee39f61 Fix typo: Diabled -> disabled
Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-06-22 14:26:48 +02:00
Ruben Kerkhof
37f092bbed totemsrp: Fix clang warning (tautological compare)
gsfrom is always >= 0

Signed-off-by: Ruben Kerkhof <ruben@rubenkerkhof.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2016-01-04 17:28:14 +01:00
Ferenc Wágner
73910bd66e totmesrp: Fix typo in log message
Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-08-26 09:26:26 +02:00
Christine Caulfield
ab8942f626 totemsrp: Improve logging of left/down nodes
This patch from Hideo Yamauchi improves the logging of
whether nodes leave the cluster cleanly or uncleanly,
making it easier to determine if a node ws shut down
by the operator. There is also the possibility that a
LEAVE message could get missed (due to the node being
in flush state) so this can also make that clearer.

The modifications are as follows.

Change 1) I added the list which maintained LEAVE node to totemsrp.
Change 2) I added registration, a search, the handling of to clear LEAVE
node.
Change 3) I added the output to log.
Change 4) I changed an output level of the log.

Signed-off-by: Hideo Yamauchi <renayama19661014@ybb.ne.jp>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-06-12 16:16:45 +01:00
Andrey N. Groshev
5d9acc5604 totemsrp: Format member list log as unsigned int
Signed-off-by: Andrey N. Groshev <greenx@yandex.ru>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2015-03-05 16:34:07 +01:00