Commit Graph

176 Commits

Author SHA1 Message Date
Angus Salkeld
0ad2494ae7 Fix some "set but not used" warnings [-Wunused-but-set-variable]
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-16 07:13:42 +11:00
Zane Bitter
dddaeef21c Allocate packet buffers in the transport drivers
This change paves the way for eliminating a copy within the Infiniband
driver in the future by transferring responsibility for allocating and
freeing message buffers to the transport driver layer.

Tested under valgrind on a single-node cluster.

Signed-off-by: Zane Bitter <zane.bitter@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-11 20:38:28 -07:00
Steven Dake
6aa47fde95 Fix abort when token is lost in RECOVERY state
A commit token should be rejected when a token is lost in the recovery
state.  This occurs naturally because the ring id increases by 4 for
every new ring.  Prior to this patch, if the token was lost, the old
ring id information was restored, causing a commit token to be accepted
when it should be rejected.  This erroneously accepted commit token would
lead to an assertion which is fixed by this patch.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
2011-03-07 17:15:05 -07:00
Steven Dake
7471c88346 Don't assert when ring id file is less than 8 bytes
If the ring id file for the processor is less than 8 bytes, totemsrp would
assert.  Our speculation is that this condition happens during a fencing
operation or local filesystem corruption.

With this patch, Corosync will create fresh ring id file data when an
incorrect number of bytes is read from the ring id file.

Amended to use sizeof for the strerror string length and PATH_MAX for the path length.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
2011-02-24 15:34:39 -07:00
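The recovery behaviour described in commit 7471c88346 can be illustrated with a minimal sketch. It is not the actual totemsrp code: the helper name `ring_seq_load`, the fresh value of 0, and the file mode are assumptions made for the example. The only point it shows is that a short or missing file leads to fresh data being written rather than to an assertion.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not the actual totemsrp code: load an 8-byte ring
 * sequence number from `path`. If the file is missing or shorter than
 * 8 bytes, write fresh data (assumed here to be 0) instead of asserting.
 * Returns 0 on success, -1 if the file could not be (re)created. */
static int ring_seq_load(const char *path, uint64_t *seq)
{
    ssize_t res = -1;
    int fd = open(path, O_RDONLY);

    if (fd >= 0) {
        res = read(fd, seq, sizeof(*seq));
        close(fd);
    }
    if (res == (ssize_t)sizeof(*seq)) {
        return 0;   /* the file held a complete value */
    }

    /* Short, unreadable or missing file: recreate it with fresh data. */
    *seq = 0;
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0) {
        return -1;
    }
    res = write(fd, seq, sizeof(*seq));
    close(fd);
    return (res == (ssize_t)sizeof(*seq)) ? 0 : -1;
}
```

A caller would treat a return of -1 as a fatal configuration problem, while a return of 0 always leaves a usable value in `*seq`.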
Steven Dake
6646a864b4 Handle delayed multicast packets that occur with switches
Some switches delay multicast packets vs the unicast token.  This patch works
around that problem by providing a new tuneable called miss_count_const.  This
tuneable works by counting the number of times a message is found missing
and once reaching the const value, marks it as missing in the retransmit list.

This improves performance and doesn't display warning messages about missed
multicast messages when operating in these switching environments.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
2011-01-11 10:34:46 -07:00
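The counting scheme behind miss_count_const in commit 6646a864b4 might look roughly like the sketch below. The structure, the slot-based tracking window, the function name `miss_count_hit`, and the threshold value of 5 are all assumptions for illustration. The commit message only states that a message is marked as missing once it has been found missing the configured number of times.

```c
#include <stdbool.h>
#include <stdint.h>

#define MISS_COUNT_CONST 5  /* illustrative threshold, not necessarily the real default */
#define WINDOW 64           /* illustrative size of the tracking window */

/* Hypothetical per-sequence miss counters; names are not from totemsrp. */
struct miss_tracker {
    uint32_t seq[WINDOW];
    unsigned int count[WINDOW];
};

/* Record that `seq` was found missing once more. Returns true once it has
 * been seen missing MISS_COUNT_CONST times, i.e. when it should be added
 * to the retransmit list. */
static bool miss_count_hit(struct miss_tracker *t, uint32_t seq)
{
    unsigned int slot = seq % WINDOW;

    if (t->seq[slot] != seq) {
        /* A different sequence number now owns this slot: restart the count. */
        t->seq[slot] = seq;
        t->count[slot] = 0;
    }
    t->count[slot]++;
    return t->count[slot] >= MISS_COUNT_CONST;
}
```

Delaying the retransmit request in this way gives a multicast packet that is merely late, rather than lost, time to arrive before anyone asks for it again.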
Jan Friesse
b9df4424b1 Display warning when not possible to form cluster
This may typically happen if a local firewall is enabled. The patch adds a new
item to the statistics called continuous_gather, which counts how many times the
gather state has been entered consecutively. If this number is bigger than
MAX_NO_CONT_GATHER, a warning message is displayed. The counter is also used on
exit, so corosync can now be stopped even with a firewall enabled.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2010-12-03 10:11:11 +01:00
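The continuous_gather check from commit b9df4424b1 can be sketched as a simple counter. The value chosen for MAX_NO_CONT_GATHER, the function names, and the reset when a ring forms are assumptions made for this example. The commit message only says that a warning is shown once the counter exceeds the limit.

```c
#include <stdbool.h>

#define MAX_NO_CONT_GATHER 3  /* illustrative value; the real limit is not stated above */

/* Hypothetical, trimmed-down statistics structure. */
struct gather_stats {
    unsigned int continuous_gather;
};

/* Called each time the gather state is entered. Returns true when the
 * caller should warn that no cluster could be formed. */
static bool gather_entered(struct gather_stats *s)
{
    s->continuous_gather++;
    return s->continuous_gather > MAX_NO_CONT_GATHER;
}

/* Assumed reset point: a ring was formed, so the gathers are no longer
 * continuous. */
static void ring_formed(struct gather_stats *s)
{
    s->continuous_gather = 0;
}
```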
Steven Dake
bb05aed93f Add the UDPU transport
The UDPU transport is useful for those deployments which can't use multicast.
UDPU works by using UDP unicast, which is fully supported by every switch
manufacturer by default and doesn't rely on a functional IGMP implementation.

An example of the UDPU transport is contained in the corosync.conf.example.udpu
file which shows a 16 node cluster.  This file should be copied to each node
in the cluster and IP addresses changed as appropriate.

Amended to remove dead udpu REUSEADDR socket option.

Signed-off-by: Steven Dake <sdake@redhat.com>
2010-11-18 14:21:30 -07:00
Steven Dake
fef259970a Remove cancel token retransmit timeout.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@3012 fd59a12c-fef9-0310-b244-a6a79926bd2f
2010-08-03 17:31:33 +00:00
Steven Dake
b8878d2e76 Remove reset of token timeout on retransmitted token reception. The timer
should only be reset when a real token is received or membership protocol
could run into problems with certain timing parameters.


git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2988 fd59a12c-fef9-0310-b244-a6a79926bd2f
2010-07-14 18:35:36 +00:00
Steven Dake
95615b2fec Fix fail list fault that occurs in very rare circumstances.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2984 fd59a12c-fef9-0310-b244-a6a79926bd2f
2010-07-03 21:54:22 +00:00
Steven Dake
22471e113d Fix fail to receive logic which occurs very rarely on high loss networks with
software based multicast.


git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2919 fd59a12c-fef9-0310-b244-a6a79926bd2f
2010-06-03 21:36:21 +00:00
Steven Dake
79c60fd0ad Totem spec is clear:
reject retransmitted tokens
if token.aru = aru in token on last rotation ... do some logic

Here is how the current code works:

last_aru = instance->my_last_aru;
instance->my_last_aru = token->aru;
reject retransmitted tokens
if token.aru = aru in token on last rotation ... do some logic

The issue is last_aru will be set to token->aru when a token retransmission
occurs before a new token arrives.

This results in the "do some logic" part happening more often than it should.


git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2917 fd59a12c-fef9-0310-b244-a6a79926bd2f
2010-06-01 20:35:53 +00:00
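One ordering that avoids the problem described in commit 79c60fd0ad is to reject retransmitted tokens before the remembered aru is updated. The sketch below is an illustration under assumptions, not the actual patch: the trimmed-down structures, the `token_seq` comparison used to detect a retransmission, and the `aru_count` field standing in for "do some logic" are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down types; the real structures hold far more. */
struct token {
    uint32_t token_seq;
    uint32_t aru;
};

struct instance {
    uint32_t my_token_seq;
    uint32_t my_last_aru;
    unsigned int aru_count;  /* stands in for "do some logic" */
};

/* Returns false if the token was a retransmission and was ignored. */
static bool token_handle(struct instance *inst, const struct token *tok)
{
    uint32_t last_aru;

    /* Reject retransmitted tokens before touching per-rotation state
     * (assumed detection rule: same token_seq as the last accepted token). */
    if (tok->token_seq == inst->my_token_seq) {
        return false;
    }
    inst->my_token_seq = tok->token_seq;

    /* Only a new token may update the aru remembered from the last rotation. */
    last_aru = inst->my_last_aru;
    inst->my_last_aru = tok->aru;

    if (tok->aru == last_aru) {
        inst->aru_count++;
    }
    return true;
}
```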
Angus Salkeld
2a23dfc585 cov 10391: allow assert to check for a negative number
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2848 fd59a12c-fef9-0310-b244-a6a79926bd2f
2010-05-18 00:43:41 +00:00
Angus Salkeld
92985c31d9 cov 10405: remove unused pointer from totemsrp.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2845 fd59a12c-fef9-0310-b244-a6a79926bd2f
2010-05-18 00:12:52 +00:00
Angus Salkeld
18326ad242 cov 10392: remove pointless assert
backlog is unsigned



git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2842 fd59a12c-fef9-0310-b244-a6a79926bd2f
2010-05-16 21:40:19 +00:00
Angus Salkeld
3eb76c2154 cov 10382: improve error handling around open()
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2825 fd59a12c-fef9-0310-b244-a6a79926bd2f
2010-05-14 02:05:32 +00:00
Steven Dake
5672c3efa8 The retransmit token storage area is an improper type of an array of pointers
rather than a pointer to a buffer.


git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2795 fd59a12c-fef9-0310-b244-a6a79926bd2f
2010-04-30 05:18:02 +00:00
Steven Dake
005b9af59d When a message is retransmitted, a memmove operation is done to remove the
newly retransmitted entry from the list.  This memmove operation could overflow
the buffer because of an invalid length calculation, which this revision
fixes.


git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2794 fd59a12c-fef9-0310-b244-a6a79926bd2f
2010-04-30 05:15:41 +00:00
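The class of bug fixed in commit 005b9af59d is easy to reproduce when removing an entry from a packed array. The sketch below shows a correct length calculation for that pattern. The type `rtr_item` and the function `rtr_list_remove` are hypothetical and do not reproduce the actual totemsrp code.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical entry type standing in for a retransmit list item. */
struct rtr_item {
    unsigned int seq;
};

/* Remove entry `idx` from a list holding `*count` entries by shifting the
 * tail down one slot. The length is the number of entries AFTER idx times
 * the entry size; anything larger would read and write past the end of
 * the used part of the list. */
static void rtr_list_remove(struct rtr_item *list, size_t *count, size_t idx)
{
    if (idx >= *count) {
        return;  /* nothing to remove */
    }
    memmove(&list[idx], &list[idx + 1],
            (*count - idx - 1) * sizeof(list[0]));
    (*count)--;
}
```

Removing the last entry results in a zero-length memmove, so no special case is needed for it.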
Steven Dake
80d621e25f Allow maximum entries in the retransmit queue when recovery takes place.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2793 fd59a12c-fef9-0310-b244-a6a79926bd2f
2010-04-30 05:14:08 +00:00
Steven Dake
107ef19913 Save the ring id and restore it properly when the recovery operation fails as
a result of a new gather or token loss.


git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2792 fd59a12c-fef9-0310-b244-a6a79926bd2f
2010-04-30 05:12:26 +00:00
Steven Dake
084d19cddc Fix problem where retransmissions don't occur resulting in failure to receive
condition.


git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2685 fd59a12c-fef9-0310-b244-a6a79926bd2f
2010-03-20 20:08:38 +00:00
Angus Salkeld
20f3331d0e convert strerror() into strerror_r()
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2665 fd59a12c-fef9-0310-b244-a6a79926bd2f
2010-02-25 19:28:36 +00:00
Steven Dake
bc64fbcb58 Patch to set unset value in token hold cancel structure so as not to crash
Wireshark.


git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2660 fd59a12c-fef9-0310-b244-a6a79926bd2f
2010-02-18 20:08:39 +00:00
Angus Salkeld
8671945cef totemsrp: fix transitional configuration changes with long token timeouts
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2651 fd59a12c-fef9-0310-b244-a6a79926bd2f
2010-02-02 06:24:01 +00:00
Steven Dake
dc2a0e68d6 Remove invalid assertion in totemsrp.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2640 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-12-15 19:22:36 +00:00
Steven Dake
208d907f63 Remove string overwrite if many recovery messages are originated.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2580 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-12-08 00:01:39 +00:00
Steven Dake
f434e74faa Remove compiler warning in totemsrp.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2578 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-12-07 23:21:01 +00:00
Steven Dake
b4367e6075 Set boolean indicating the retrans flag was set to 1 to 0 when setting retrans
flag in token to zero.


git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2573 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-12-07 18:41:49 +00:00
Steven Dake
ff5cd7d57c Make assertions for range checking of message delivery check with the define
instead of magic numbers that are not valid if the define changes.


git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2572 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-12-07 18:22:48 +00:00
Steven Dake
adf8d6db24 Prevent lockup in recovery state in totem after 206 messages are originated.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2569 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-12-07 05:03:25 +00:00
Steven Dake
3020fd5742 Fix recovery messages to be proper length to remove segfault that occurs during
recovery.


git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2568 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-12-07 05:02:28 +00:00
Angus Salkeld
c052bd3f3f stats: don't calloc the totemsrp stats struct.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2564 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-12-03 18:18:29 +00:00
Angus Salkeld
acf9a8d85f Correct some ugly indentation.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2563 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-12-03 18:13:52 +00:00
Steven Dake
72fe262478 Start pause timer at initialization so first gather doesn't result in pause
timeout operations.


git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2555 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-11-30 19:11:20 +00:00
Angus Salkeld
29eb20a389 Rename totem_new_msg_signal() to something more generic.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2553 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-11-29 18:42:00 +00:00
Angus Salkeld
948ca19aa7 Add some missing calls to increment the relevant stats.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2519 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-10-12 21:56:23 +00:00
Angus Salkeld
73a24c0352 Add totem stats to objdb.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2517 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-10-12 17:30:20 +00:00
Steven Dake
21825d46ea Fix incorrect assertion with frame sizes of 9000.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2395 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-08-28 01:19:30 +00:00
Jan Friesse
c50a6bd065 Support for monotime timer
This patch should solve problems with corosync and ntp, by using
clock_gettime where it makes sense.


git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2373 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-07-27 10:12:55 +00:00
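The idea behind commit c50a6bd065 is that timeouts measured against a monotonic clock are unaffected when ntp steps the wall clock. A minimal sketch of such a time source follows. The helper name `mono_now_ns` and the convention of returning 0 on failure are assumptions, not the interface corosync uses.

```c
#define _POSIX_C_SOURCE 200809L

#include <stdint.h>
#include <time.h>

/* Hypothetical helper: current time in nanoseconds from a clock that is
 * not affected by NTP steps or manual changes to the wall clock. */
static uint64_t mono_now_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return 0;  /* caller should treat 0 as "clock unavailable" */
    }
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
```

Elapsed time is then the difference between two such readings, which cannot go negative the way a difference of gettimeofday() values can after a clock adjustment.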
Steven Dake
69928e301a Add notification when totem has completed initialization.
This triggers the initialization of the service engines which may need totem
for initialization.



git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2372 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-07-27 02:00:05 +00:00
Steven Dake
f9f663f459 Add a target token set completed callback in totemrrp and below layers.
Handle management of callback in totemsrp.


git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2371 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-07-27 00:29:32 +00:00
Steven Dake
5eae4c135c Optimization of totemsrp and below by removing hdb usage. cpgbench shows
results of 4% to 20% increase in tps and mbs depending on hardware.


git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2369 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-07-22 22:10:35 +00:00
Steven Dake
fc2de3db2a Simplify notifications from totem at the notice level.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2315 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-06-28 05:26:25 +00:00
Steven Dake
23aea08ae4 Slay the debug messages coming out at notice level in totem.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2313 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-06-28 05:12:27 +00:00
Steven Dake
47c6bc3aaf Remove totemsrp warning.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2312 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-06-28 04:57:31 +00:00
Steven Dake
e448603f2f Add ability to detect process pause and not implode the membership algorithm
when this occurs.


git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2304 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-06-26 21:39:44 +00:00
Jim Meyering
c424b53308 totemsrp: remove unnecessary cast to avoid "make syntax-check" failure
* exec/totemsrp.c (message_handler_memb_join): Remove unnecessary
cast of alloca return value.

git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2279 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-06-21 18:27:02 +00:00
Steven Dake
04cf210d9d Use HAVE_ALLOCA_H define before including alloca.h
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2278 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-06-21 16:46:24 +00:00
Steven Dake
b8e3951ca1 Add (void *) casts for iovector assignments to remove compile warnings.
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2270 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-06-19 20:43:12 +00:00
Fabio M. Di Nitto
6d5ce092a1 logsys: port to new packed rec_ident version
git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2250 fd59a12c-fef9-0310-b244-a6a79926bd2f
2009-06-18 05:32:56 +00:00