Commit Graph

2561 Commits

Author SHA1 Message Date
Russell Bryant
a53e402912 logsys.c: Use snprintf() instead of sprintf().
Change a couple of string functions to use their output-length-limiting
counterparts.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
2011-05-08 02:42:47 -05:00
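A minimal sketch of the kind of change this commit describes, assuming a fixed-size destination buffer. The helper name format_tag and its arguments are hypothetical and are not taken from logsys.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats "subsys[pid]" into dst and never writes
 * more than dst_len bytes. Returns 0 on success, -1 if the output was
 * truncated or snprintf() reported an error. */
int format_tag(char *dst, size_t dst_len, const char *subsys, int pid)
{
    int n;

    assert(dst != NULL && subsys != NULL);
    /* snprintf() returns the length the full string would have had. */
    n = snprintf(dst, dst_len, "%s[%d]", subsys, pid);
    if (n < 0 || (size_t)n >= dst_len) {
        return -1;
    }
    return 0;
}
```

With sprintf() the short-buffer case would overrun dst; here it is reported to the caller instead.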
Jan Friesse
801717e463 corosync-objctl: Option to display binary data
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-05-13 12:01:59 +02:00
Angus Salkeld
956a1dcb42 cpg: fix sync master selection when one node paused.
If one node is paused it can miss a config change and
thus report a larger old_members than expected.

The solution is to use the left_nodes field.

Master selection used to be "choose node with":
1) largest previous membership
2) (then as a tie-breaker) node with smallest nodeid

New selection:
1) largest (previous #nodes - #nodes known to have left)
2) (then as a tie-breaker) node with smallest nodeid

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
2011-05-05 21:39:30 +10:00
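The new selection rule in 956a1dcb42 can be sketched roughly as follows. The struct and its field names are illustrative assumptions, not the actual cpg data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node summary used only for this sketch. */
struct node_info {
    uint32_t nodeid;
    uint32_t old_members; /* size of the previous membership it reported */
    uint32_t left_nodes;  /* how many of those are known to have left */
};

/* Previous #nodes minus #nodes known to have left, clamped at zero. */
uint32_t remaining_members(const struct node_info *n)
{
    return n->old_members > n->left_nodes
        ? n->old_members - n->left_nodes : 0;
}

/* Returns the index of the chosen sync master, or -1 for an empty list.
 * Rule 1: largest remaining_members(); rule 2: smallest nodeid on a tie. */
int choose_sync_master(const struct node_info *nodes, size_t count)
{
    size_t i;
    size_t best = 0;

    assert(nodes != NULL || count == 0);
    if (count == 0) {
        return -1;
    }
    for (i = 1; i < count; i++) {
        uint32_t cand = remaining_members(&nodes[i]);
        uint32_t cur = remaining_members(&nodes[best]);

        if (cand > cur ||
            (cand == cur && nodes[i].nodeid < nodes[best].nodeid)) {
            best = i;
        }
    }
    return (int)best;
}
```

A paused node that reports an inflated old_members no longer wins automatically, because the nodes it knows to have left are subtracted first.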
Angus Salkeld
f3387a8287 CTS: fix some tests that didn't handle being called more than once
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
2011-05-05 21:39:30 +10:00
Angus Salkeld
b1d65a7e8c CTS: sort the configuration - prevent duplicates in the config file
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
2011-05-05 21:39:30 +10:00
Angus Salkeld
6ed7c36c95 CTS: fix syntax error in log message
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
2011-05-05 21:39:30 +10:00
Angus Salkeld
fbb53397c3 CTS: bump up log messages of failed RPC
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
2011-05-05 21:39:30 +10:00
Angus Salkeld
f33ea3be2b CTS: don't force all-once (breaks random tests)
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
2011-05-05 21:39:30 +10:00
Angus Salkeld
1db961d6ad autobuild: improve messages
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
2011-05-05 21:39:30 +10:00
Angus Salkeld
dc402cbb98 CTS: add -l to keygen (normal keygen struggles to run on VMs)
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
2011-05-05 21:39:30 +10:00
Angus Salkeld
a22c9afde0 CTS: send with correct number of iovecs
Otherwise the payload won't be sent.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
2011-05-05 21:39:30 +10:00
Angus Salkeld
e4cf620e6f CTS: timer should not be on the stack
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
2011-05-05 21:39:29 +10:00
Jan Friesse
61d83cd719 totemsrp: Enhance mcast failure detection
memb_state_gather_enter now increases stats.continuous_gather only if the
previous state was also gather. This should happen only if multicast is
not working properly (a local firewall in most cases), not when many
nodes join at one time.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
2011-05-05 11:00:26 +02:00
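A rough sketch of the counting rule in 61d83cd719, assuming a much simplified state machine. The type and function names below are hypothetical stand-ins, not the totemsrp definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical state set for illustration. */
enum sketch_state {
    SKETCH_OPERATIONAL,
    SKETCH_GATHER,
    SKETCH_COMMIT,
    SKETCH_RECOVERY
};

struct sketch_instance {
    enum sketch_state state;
    unsigned int continuous_gather;
};

/* Count a gather entry only when the previous state was already gather,
 * so a burst of ordinary joins does not look like a multicast failure. */
void sketch_gather_enter(struct sketch_instance *inst)
{
    assert(inst != NULL);
    if (inst->state == SKETCH_GATHER) {
        inst->continuous_gather++;
    }
    inst->state = SKETCH_GATHER;
}
```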
Jan Friesse
719fddd8e1 coroipcs: Deny connect to service without initfn
If a library connects to a service with no init function, coroipcs will try
to dereference a NULL pointer. Now we correctly return the error code
CS_ERR_NOT_EXIST.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-04-15 08:56:35 +02:00
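The guard described in 719fddd8e1 might look roughly like this. The error enum, struct, and function names are local stand-ins invented for the sketch; only the idea of returning a "does not exist" error instead of calling through a NULL pointer comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the real error codes; values are illustrative. */
enum sketch_error {
    SKETCH_OK = 0,
    SKETCH_ERR_NOT_EXIST = 1
};

typedef enum sketch_error (*sketch_init_fn)(void *conn);

/* Hypothetical service descriptor. */
struct sketch_service {
    const char *name;
    sketch_init_fn init_fn; /* may be NULL for a service without init */
};

/* Example init function used to exercise the happy path. */
enum sketch_error sketch_example_init(void *conn)
{
    (void)conn;
    return SKETCH_OK;
}

/* Refuse the connection instead of dereferencing a NULL init pointer. */
enum sketch_error sketch_connect(const struct sketch_service *svc, void *conn)
{
    assert(svc != NULL);
    if (svc->init_fn == NULL) {
        return SKETCH_ERR_NOT_EXIST;
    }
    return svc->init_fn(conn);
}
```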
Tim Serong
5b92829d6c Add ipc_refcnt to message_handler_req_{exec, lib}_cfg_ringreenable()
Without refcounting the conn pointer here, corosync will segfault
if one kills a running instance of "corosync-cfgtool -r" (rhbz#695191)

Signed-off-by: Tim Serong <tserong@novell.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-04-14 20:14:12 -07:00
Steven Dake
6a752ba1b1 Align ipc on 8 byte boundaries
Align all ipc messages on 8 byte boundaries.  This alignment will remove bus
errors on systems that can't access unaligned data and should improve
performance.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
2011-04-14 17:25:08 -07:00
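One common way to keep message sizes on 8-byte boundaries is to round every length up to the next multiple of 8. The helper below is a generic sketch of that arithmetic; its name is an assumption and it is not claimed to be how the commit implements alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounds len up to the next multiple of 8 (0 stays 0). */
size_t sketch_align_up_8(size_t len)
{
    assert(len <= SIZE_MAX - 7); /* guard against wrap-around */
    return (len + 7) & ~(size_t)7;
}
```

If every message in a buffer starts at an offset produced this way, and the buffer itself is 8-byte aligned, each message header is 8-byte aligned as well.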
Steven Dake
83f528b473 Fix problem where unaligned totemip address access would result in bus error on non-unaligned-safe architectures.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
2011-04-14 17:22:02 -07:00
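For context, a generic way to read a multi-byte value from a possibly unaligned address without risking a bus error is to copy it out with memcpy(). This is a standard C technique shown as a sketch; the commit message does not say which approach 83f528b473 actually uses, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Loads a 32-bit value from p, which may have any alignment.
 * memcpy() is allowed to copy from unaligned addresses, whereas
 * casting p to uint32_t * and dereferencing it is not portable. */
uint32_t sketch_load_u32(const void *p)
{
    uint32_t value;

    assert(p != NULL);
    memcpy(&value, p, sizeof(value));
    return value;
}
```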
Greg Walton
1db74fe1b9 Clean up ENDIAN ifdef tests
Signed-off-by: Greg Walton <corosync@gwalton.net>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-04-14 17:10:17 -07:00
Tim Serong
61d7ec1716 Fix typo in RRP faulty error messages
Signed-off-by: Tim Serong <tserong@novell.com>
Reviewed-by: Russell Bryant <russell@russellbryant.net>
2011-04-11 02:00:18 -05:00
Angus Salkeld
4ed97d991b IPC: place calls to stats functions outside of mutexes
This is to prevent nasty deadlocks between IPC and objdb.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-04-13 08:15:59 +10:00
Zane Bitter
6365150ae2 Provide better checking of the message type
A negative value for the message type (on systems where char is signed)
would cause a crash. This is highly probable if the cluster is, for example,
misconfigured to have encryption enabled on some nodes but not others.

Signed-off-by: Zane Bitter <zane.bitter@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-04-12 13:09:39 -07:00
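The failure mode in 6365150ae2 can be illustrated with a small sketch: a message type stored in a signed char becomes negative for byte values above 127, and using it directly as a table index reads out of bounds. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical header; signed char models platforms where char is signed. */
struct sketch_header {
    signed char type;
};

/* Returns a valid handler-table index, or -1 if the type is negative
 * or beyond the end of the table. */
int sketch_handler_index(const struct sketch_header *hdr, size_t handler_count)
{
    assert(hdr != NULL);
    if (hdr->type < 0 || (size_t)hdr->type >= handler_count) {
        return -1;
    }
    return hdr->type;
}
```

Garbage bytes, such as an encrypted payload read by a node that expects plaintext, are then rejected instead of selecting a bogus handler.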
Zane Bitter
6e990d202f Fix uninitialised memory errors found by valgrind
Signed-off-by: Zane Bitter <zane.bitter@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-04-08 09:13:12 -07:00
Angus Salkeld
265661745d Fix shutdown when a confdb client is still connected
If a client is connected to corosync and registered for
object notifications when corosync is asked to shut down,
the IPC server will get stuck. This is because the pipe
is closed and the refcount is increased, which leaves ipcs
with a connection that it can't destroy.

Solution:
1) if a write to the pipe fails (pipe closed) decrement the refcounter.
2) fix the object_track_stop() - it was not working as the functions
   did not match up. (this caused the late callbacks).
3) in ipcs call exit_fn() then stats_destroy_connection() so that
   the service engine can have time to call object_track_stop()
   before the object gets destroyed.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-29 13:48:20 +11:00
Angus Salkeld
076e8b74f7 STATS: add the service name to the connection name.
This helps to quickly identify what service the application
is connected to.

The object will now look like:
runtime.connections.corosync-objctl:CONFDB:19654:13.service_id=11
runtime.connections.corosync-objctl:CONFDB:19654:13.client_pid=19654
etc...

This also makes it clearer to receivers of the dbus/snmp events
what is going on.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-29 13:48:13 +11:00
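Judging only from the example keys in 076e8b74f7, the connection name appears to join four fields with colons. The sketch below assumes those fields are the process name, the service name, the client pid, and a per-connection number; the meaning of the last field and the function name are assumptions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "proc:SERVICE:pid:conn_id" into dst.
 * Returns 0 on success, -1 if dst is too small. */
int sketch_connection_name(char *dst, size_t dst_len,
                           const char *proc_name, const char *service_name,
                           int pid, int conn_id)
{
    int n;

    assert(dst != NULL && proc_name != NULL && service_name != NULL);
    n = snprintf(dst, dst_len, "%s:%s:%d:%d",
                 proc_name, service_name, pid, conn_id);
    return (n >= 0 && (size_t)n < dst_len) ? 0 : -1;
}
```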
Angus Salkeld
4991ccd3d8 NOTIFYD: prevent duplicate quorate events.
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-29 13:48:09 +11:00
Angus Salkeld
a97e1f0813 NOTIFYD: fix retrieving the application's parent name.
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-29 13:47:42 +11:00
Jan Friesse
b4bef1cbf5 cfgtool: print list of IP with space between items
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-24 17:42:09 +01:00
Jan Friesse
f6df7823fa cpgtool: print list of IP with space between items
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-24 17:42:09 +01:00
Jan Friesse
033f7ced10 cfg_get_node_addrs: Return correct addresses
Zero-element array behavior is very different from that of a normal array or
pointer. This behavior is the root of the problem of not returning a correctly
filled array of addresses. It appeared only in rrp mode, where more
than one address is returned.

All memcpy's are now correctly converted to copy pointer to char.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-24 17:42:08 +01:00
Steven Dake
7d5e588931 totemsrp: free messages originated in recovery rather than rely on messages_free
Relying on messages_free may seem like it should work, but it leads to a
situation where every node has released the messages, yet some nodes think
messages are missing.  The output then shows "Retransmit: #"
repeatedly.  This patch frees those messages immediately during the transition
to the OPERATIONAL state and sets the internal variables totemsrp depends
upon to the proper values.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2011-03-24 09:25:15 -07:00
Steven Dake
ef05817ce5 totemsrp: Only restore old ring id information one time
The current code stores the current ring information every time a commit
token is generated.  This causes the old ring id used for comparison purposes
to increase if a token is lost in commit or recovery, resulting in failure of
totem.  This patch changes the behavior to only store the old ring id one
time when the commit token is received, and then further commit token ring
id saves are not done until OPERATIONAL is reached.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2011-03-24 09:22:34 -07:00
Steven Dake
1a7b7a39f4 totemsrp: Remove recv_flush code
The recv_flush code is no longer necessary because of the miss_count_count
addition.  It can in some cases lead to register corruption because of
interactions with -fstack-protector, the recursive nature of how this code
works, and interactions with the optimizer in some versions of gcc.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2011-03-24 09:21:27 -07:00
Angus Salkeld
75087f7c1b confdb: send notifications from the main thread not IPC thread
corosync-notifyd has exposed an issue with confdb notifications.

The normal state of affairs is:
IPC thread > lock > objdb > lock

objdb notifications, whilst really useful, turn things around:
<middle of big call chain>
objdb > lock > confdb > ipc > lock

This reverse ordering of locks causes a horrible deadlock.

I see this patch as a workaround until corosync-2.0
when most of the threads and locking disappear.

This patch adds a pipe to the confdb service. When we get an
objdb notification a struct gets written to the pipe.
The poll loop then runs the dispatch in the main thread.
In the dispatch we call the real ipc_dispatch_send().

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-24 07:54:42 +11:00
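The pipe-based hand-off described in 75087f7c1b follows a well-known pattern: the notifying context only writes a small struct to a pipe, and the main poll loop reads it and does the real work. The sketch below shows that pattern with POSIX calls; the struct layout and function names are hypothetical, and the actual send step is left as a comment.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical notification payload. */
struct sketch_event {
    int object_id;
    int change_type;
};

/* Called where the notification is raised: queue only, no IPC calls,
 * so no IPC lock is taken while objdb locks are held. */
int sketch_notify_queue(int write_fd, const struct sketch_event *ev)
{
    assert(ev != NULL);
    return write(write_fd, ev, sizeof(*ev)) == (ssize_t)sizeof(*ev) ? 0 : -1;
}

/* Called from the main poll loop. Returns 1 if one event was read,
 * 0 on timeout, -1 on error. */
int sketch_notify_dispatch_one(int read_fd, struct sketch_event *ev,
                               int timeout_ms)
{
    struct pollfd pfd;
    int rc;

    assert(ev != NULL);
    pfd.fd = read_fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    rc = poll(&pfd, 1, timeout_ms);
    if (rc <= 0) {
        return rc;
    }
    if (read(read_fd, ev, sizeof(*ev)) != (ssize_t)sizeof(*ev)) {
        return -1;
    }
    /* The real dispatch would send the notification to the client here. */
    return 1;
}
```

Because the struct is far smaller than PIPE_BUF, each write reaches the pipe as one unit, so the reader never sees a partial event.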
Steven Dake
d99fba72e6 Resolve abort during simultaneous stopping of at least 4 nodes
Consider 5 nodes.

Nodes 3 and 4 are stopped (by random stopping). Nodes 1, 2 and 5 form a new
configuration, and during recovery node 1 and node 2 are stopped (via service
corosync stop).  This causes node 5 never to finish recovery within the timeout
period, triggering a token loss in recovery.  Bug #623176 resolved an assert
which happens because the full ring id was being restored.  The resolution
to Bug #623176 was to not restore the full ring id, and instead operate
(according to specifications) on the new ring id.  Unfortunately this exposes
a problem whereby the restarting of nodes 1-4 generates the same ring id.
This ring id gets to the recovery-failed node 5, which is now in gather,
and triggers a condition not accounted for in the original totem specification.

It appears later work from Dr. Agarwal's PhD dissertation considers this
scenario.  That solution entails rejecting the regular token in the above
condition.  Since the ring id is also used to make decisions for commit token
acceptance, we must also take care to reject the regular token in all cases
after transitioning from OPERATIONAL.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-21 09:26:35 -07:00
Angus Salkeld
7004457014 notifyd: dispatch only one message at a time.
This is to avoid getting stuck in the dispatch, processing
messages, when the user is trying to shut down the service.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-21 09:24:01 -07:00
Angus Salkeld
0ad2494ae7 Fix some "set but not used" warnings [-Wunused-but-set-variable]
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-16 07:13:42 +11:00
Angus Salkeld
c9dee9eaa7 Remove the ttl option from udpu and rely on the kernel ttl setting.
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2011-03-15 19:35:23 +11:00
Angus Salkeld
86ada30aa4 Fix the ttl defaults and range
1) both IPv4 and IPv6 mcast should default to ttl=1
2) the range should be 0..255
   0 is valid meaning localhost only (cluster of one)

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2011-03-15 19:34:46 +11:00
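The defaults and range in 86ada30aa4 can be sketched as a small validation helper. The function and macro names are hypothetical, and treating a missing value as "use the default" is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_TTL_DEFAULT 1   /* default for both IPv4 and IPv6 mcast */
#define SKETCH_TTL_MIN     0   /* 0 means localhost only */
#define SKETCH_TTL_MAX     255

/* Parses a ttl value. A NULL text selects the default.
 * Returns 0 and stores the ttl, or -1 if the text is not a number
 * in the range 0..255. */
int sketch_ttl_parse(const char *text, int *ttl_out)
{
    char *end;
    long value;

    assert(ttl_out != NULL);
    if (text == NULL) {
        *ttl_out = SKETCH_TTL_DEFAULT;
        return 0;
    }
    errno = 0;
    value = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0') {
        return -1;
    }
    if (value < SKETCH_TTL_MIN || value > SKETCH_TTL_MAX) {
        return -1;
    }
    *ttl_out = (int)value;
    return 0;
}
```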
Russell Bryant
9909a20859 Add Doxyfile to .gitignore
Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
2011-03-15 11:08:45 +11:00
Angus Salkeld
b6ba64c1eb docs: auto-generate the version
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-12 19:39:04 +11:00
Russell Bryant
5da4d5479a Convert existing documentation to doxygen format.
This patch modifies most of the existing comments in header files to be
in a format that doxygen can interpret.  This provides another
significant improvement to the web/pdf/etc generated documentation
without having to add new content.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
2011-03-12 15:03:16 +11:00
Zane Bitter
dddaeef21c Allocate packet buffers in the transport drivers
This change paves the way for eliminating a copy within the Infiniband
driver in the future by transferring responsibility for allocating and
freeing message buffers to the transport driver layer.

Tested under valgrind on a single-node cluster.

Signed-off-by: Zane Bitter <zane.bitter@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-11 20:38:28 -07:00
Zane Bitter
2303525125 Fix minor errors in man page documentation for corosync.conf
* Correct 'See Also' reference to corosync.conf(5) in corosync(8) man page
* Update path to default config (now /etc/corosync/corosync.conf)

Signed-off-by: Zane Bitter <zane.bitter@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-10 01:25:08 -07:00
Steven Dake
6aa47fde95 Fix abort when token is lost in RECOVERY state
A commit token should be rejected when a token is lost in the recovery
state.  This occurs naturally because the ring id increases by 4 for
every new ring.  Prior to this patch, if the token was lost, the old
ring id information was restored, causing a commit token to be accepted
when it should be rejected.  This erroneously accepted commit token would
lead to an assertion which is fixed by this patch.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
2011-03-07 17:15:05 -07:00
Russell Bryant
c112ee8c89 Add content for the doxygen main page.
This creates some content on the main page of the documentation
generated by doxygen.  The main page includes the license and a link
to the project web site.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-07 08:42:01 -06:00
Russell Bryant
e5456008d0 Resolve a couple of doxygen warnings.
This resolves a couple of doxygen warnings.  First, the group needed a
name.  Second, all of the functions in the file were added to the group
but doxygen complained about the lack of an end to the grouping.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-07 08:39:58 -06:00
Russell Bryant
7478a3e136 Update doxygen configuration file.
The included doxygen configuration file was a bit stale.  It included
some options that were obsolete and caused doxygen to generate some
warnings when running it.  Most of the changes here were simply done by
running "doxygen -u" to automatically update the file.  It added its
documentation for the options and removed the obsolete options.

This also includes one configuration change, which is to set EXTRACT_ALL
to yes.  This instructs doxygen to generate documentation pages for all
files, public functions, and public data structures even if they are not
currently documented using doxygen syntax.  Doxygen is capable of
generating some useful documentation on its own, such as dependency
graphs.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-07 08:38:53 -06:00
Russell Bryant
8ed864ddc5 Minor build system updates for doxygen.
The configure script has been updated to check for the doxygen and dot
applications (from doxygen and graphviz).  The results from these checks
are now used in the Makefile to ensure that the tools are installed when
you run "make doxygen".  If they are not, it will generate a helpful
error message.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-07 08:36:53 -06:00
Russell Bryant
a609f79f1f Ensure that strings are null terminated after strncpy().
From the strcpy(3) man page, the following warning is given:
  The strncpy() function is similar, except that at most n bytes of src
  are  copied.  Warning: If there is no null byte among the first n bytes
  of src, the string placed in dest will not be null-terminated.

The current corosync code base does not take this warning into account
when using strncpy, potentially resulting in non-null terminated strings.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-07 08:30:03 -06:00
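A minimal sketch of the pattern a609f79f1f is about: copy at most one byte less than the buffer size and terminate explicitly. The helper name is hypothetical; the commit may apply the fix inline at each call site instead.

```c
#include <assert.h>
#include <string.h>

/* Copies src into dst, truncating if needed, and always leaves dst
 * null-terminated when dst_len is at least 1. */
void sketch_copy_string(char *dst, size_t dst_len, const char *src)
{
    assert(dst != NULL && src != NULL);
    if (dst_len == 0) {
        return;
    }
    strncpy(dst, src, dst_len - 1);
    dst[dst_len - 1] = '\0'; /* strncpy() does not do this on truncation */
}
```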
Russell Bryant
1be0c3bdc6 Add -l option to corosync-keygen.
This option (-l or --less-secure) causes corosync-keygen to read from
/dev/urandom instead of /dev/random to ensure that no input is required
from the user.  It may be useful when this command is used from a
script.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-05 10:02:57 -06:00
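The option handling described in 1be0c3bdc6 could be sketched as below. Only the option names and the two device paths come from the commit message; the function name and the plain argv scan are assumptions, and the real tool may parse its options differently.

```c
#include <assert.h>
#include <string.h>

/* Returns the device to read key material from: /dev/urandom when
 * -l or --less-secure is given, /dev/random otherwise. */
const char *sketch_keygen_source(int argc, char *const argv[])
{
    int i;

    assert(argv != NULL);
    for (i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-l") == 0 ||
            strcmp(argv[i], "--less-secure") == 0) {
            return "/dev/urandom";
        }
    }
    return "/dev/random";
}
```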