1357 Commits

Author SHA1 Message Date
Angus Salkeld
63e16ab583 libqb: remove tsafe.c
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-08-09 10:37:15 +10:00
Angus Salkeld
78e06739b7 libqb: remove worker thread - keep to one thread.
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-08-09 10:37:15 +10:00
Angus Salkeld
f717bc60e1 libqb: make timer api a wrapper around qb_loop timers.
- change timeout value to nano seconds
- fix timer handles (don't alloc on stack)

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-08-09 10:37:14 +10:00
Angus Salkeld
c6895faa05 libqb: change ipc -> qb_ipc
IPC: return 0/-ENOBUFS from message handler
IPC: use the new rate_limit API to improve perf.
CPG: add send_async API & hook up flow control
IPC: Fix flow control getting stuck.
IPC: Port the remaining libs to use libqb IPC
IPC: remove libqb flowcontrol API
TEST: put cpg_dispatch() in its own thread
IPC: cleanup ipc_glue.c name everything cs_ipcs_*()
IPC: add back statistics
IPC: remove coroipcc_ symbols from lib*.versions
IPC: init each se's IPC as it is loaded.
IPC: use the new connection_closed() event to free the context.
IPC: re-add zero copy functionality back
IPC: remove cpg_mcast_joined_async() and make it the default
 -> now cpg_mcast_joined() == cpg_mcast_joined_async()
libqb: expose a libqb error converter
libqb: add missing error conversions
libqb: remove repeat try loop in lib/cpg.c
CPG: fix zero copy mcast
CPG: use newer return codes
Add ENOTCONN to qb_to_cs_error()
libqb: fix error conversion from errno to cs_error_t in confdb
libqb: change errno_to_cs to qb_to_cs_error
libqb: add a cs_strerror() to get a more meaningful message
libqb: fix some confusing error conversions.
libqb: set the timeout on recv's to -1 (wait forever)

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-08-09 10:37:14 +10:00
Angus Salkeld
fce8a3c3b6 libqb: convert coropoll calls to qb_loop calls.
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-08-09 10:37:14 +10:00
Jan Friesse
d4fb83e971 main: let poll really stop before totempg_finalize
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-07-26 10:07:08 +02:00
Jan Friesse
ddb5214c2c Revert "totemsrp: Remove recv_flush code"
This reverts commit 1a7b7a39f4.

Reversion is needed to remove overflow of receive buffers and dropping
messages.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2011-07-26 10:05:55 +02:00
MORITA Kazutaka
1d9f444fec totemsrp: fix buffer overflows for large clusters (> 100 nodes)
Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-07-24 13:33:26 -07:00
Tim Beale
04f37df2f7 Add some more stats for debugging
+ overload - number of times client is told to try again
+ invalid_request - message contained invalid parameter, e.g. invalid size
+ msg_queue_avail - messages currently available at the Totem layer
+ msg-queue_reserved - messages currently reserved at the Totem layer

Signed-off-by: Tim Beale <tim.beale@alliedtelesis.co.nz>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-07-19 08:58:41 -07:00
Jan Friesse
ad5cda223c rrp: Handle rollover in passive rrp properly
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-07-18 11:46:56 +02:00
Jan Friesse
d02d288747 rrp: handle rollover in active rrp properly
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-07-18 11:46:50 +02:00
Jan Friesse
a48c8e517d totemconfig: Change default FAIL_TO_RECV_CONST
Previous default (50) was too low for most modern switch hardware. This
may trigger abort because the aru doesn't increase for 50 token
rotations combined with a defect in how failed to recv conditions are
handled.  By increasing this tunable, the condition should no longer
trigger the errant code.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-07-18 11:46:21 +02:00
Steven Dake
c544e87bb0 Correct missing poll functions from service handler struct needed for confdb APIs
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2011-07-15 13:30:41 -07:00
Steven Dake
a3d98f1652 Fix problem where corosync will segfault if there are gaps in recovery queue
Fixes a problem where there are gaps in the recovery queue.  Example: my_aru = 5,
but there are messages at 7,8.  8 = my_high_seq_received which results
in data slots taken up in the new message queue.  What should really happen
is these last messages should be delivered after a transitional
configuration to maintain SAFE agreement.  We don't have support for
SAFE atm, so it is probably safe just to throw these messages away.  Without
this change, the new message queue on a new configuration change is out of sync.

Signed-off-by: Steven Dake <sdake@redhat.com>
Tested-by: Tim Beale <tlbeale@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2011-07-15 10:39:57 -07:00
Jan Friesse
57749ec02a totemiba: free send_buf on ibv_reg_mr failure
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-07-08 08:15:14 +02:00
Tim Beale
77f7e5b0fe Fix compile/runtime issues for _POSIX_THREAD_PROCESS_SHARED < 1
For the case where _POSIX_THREAD_PROCESS_SHARED < 1, the code doesn't compile
for corosync v1.3.1. And when it does compile, it crashes on our system - our
version of uClibc seems to always expect a 4th arg. The man page suggests
the 4th arg is optional, but does say: 'For greater portability it is best to
always call semctl() with four arguments', which is what this patch does.
Also removed semop as it's an unused variable.

Signed-off-by: Tim Beale <tim.beale@alliedtelesis.co.nz>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-07-06 06:44:22 -07:00
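
Illustration for 77f7e5b0fe (not taken from the patch): a minimal sketch of calling semctl() with four arguments in every case, as the commit message recommends. It assumes Linux with glibc or uClibc, where the caller has to define union semun itself. The helper names sem_create_with_value(), sem_read_value() and sem_destroy() are hypothetical.

```c
#define _XOPEN_SOURCE 700 /* expose the XSI semaphore API; must precede any #include */
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* On Linux the application must define union semun itself. */
union semun {
    int val;                /* value for SETVAL */
    struct semid_ds *buf;   /* buffer for IPC_STAT, IPC_SET */
    unsigned short *array;  /* array for GETALL, SETALL */
};

/* Create a private set holding one semaphore initialised to `initial`.
 * Returns the semaphore id, or -1 on failure. */
int sem_create_with_value(int initial)
{
    union semun arg;
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (semid == -1) {
        return -1;
    }
    arg.val = initial;
    if (semctl(semid, 0, SETVAL, arg) == -1) {
        arg.val = 0;
        (void)semctl(semid, 0, IPC_RMID, arg);
        return -1;
    }
    return semid;
}

/* GETVAL ignores the 4th argument, but it is passed anyway for portability. */
int sem_read_value(int semid)
{
    union semun arg;

    arg.val = 0;
    return semctl(semid, 0, GETVAL, arg);
}

/* IPC_RMID also ignores the 4th argument; it is still supplied. */
int sem_destroy(int semid)
{
    union semun arg;

    arg.val = 0;
    return semctl(semid, 0, IPC_RMID, arg);
}
```

Passing a dummy union for commands that ignore it costs nothing on systems where the argument is optional and avoids the crash described above on systems that always read it.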
Tim Beale
ba107f0a33 getpwnam_r()/getgrnam_r() returns ERANGE for some systems
On our system the expected buffer length is 256. This means calls to
getpwnam_r()/getgrnam_r() return ERANGE error and corosync fails to startup.
These 2 functions return ERANGE when insufficient buffer space is supplied.
Judging by the man page for getpwnam_r, the correct way to determine the
buffersize on any given system is to use sysconf().

Signed-off-by: Tim Beale <tim.beale@alliedtelesis.co.nz>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-07-06 06:31:50 -07:00
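
Illustration for ba107f0a33 (not taken from the patch): a minimal sketch of sizing the getpwnam_r() buffer with sysconf() instead of a fixed length. The function name lookup_uid(), the 1024-byte fallback and the 1 MiB cap are assumptions made for this example; the retry on ERANGE goes slightly beyond what the commit message describes.

```c
#define _POSIX_C_SOURCE 200809L /* expose getpwnam_r(); must precede any #include */
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Look up the uid of `name`.
 * Returns 0 and stores the uid on success, -1 if the user is unknown
 * or an error occurs. */
int lookup_uid(const char *name, uid_t *uid_out)
{
    struct passwd pwd;
    struct passwd *result = NULL;
    long hint = sysconf(_SC_GETPW_R_SIZE_MAX);
    /* sysconf() may return -1 when there is no fixed limit. */
    size_t len = (hint > 0) ? (size_t)hint : 1024;
    char *buf = NULL;
    int rc;

    for (;;) {
        char *tmp = realloc(buf, len);

        if (tmp == NULL) {
            free(buf);
            return -1;
        }
        buf = tmp;
        rc = getpwnam_r(name, &pwd, buf, len, &result);
        if (rc != ERANGE) {
            break;              /* success, not found, or a different error */
        }
        if (len > ((size_t)1 << 20)) {
            free(buf);          /* refuse to grow without bound */
            return -1;
        }
        len *= 2;               /* buffer too small: grow and retry */
    }

    if (rc != 0 || result == NULL) {
        free(buf);
        return -1;
    }
    *uid_out = pwd.pw_uid;
    free(buf);
    return 0;
}
```

The same pattern applies to getgrnam_r() with _SC_GETGR_R_SIZE_MAX.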
Jiaju Zhang
5dc33c2824 RRP: redundant ring automatic recovery
This patch automatically recovers redundant ring failures.

Please note that this patch introduced rrp_autorecovery_check_timeout
in totem config hence breaks internal ABI. The internal ABI users
of totem.h need to rebuild their binaries.

Signed-off-by: Jiaju Zhang <jjzhang@suse.de>
Signed-off-by: Steven Dake <sdake@redhat.com>
Tested-by: Jan Friesse <jfriesse@redhat.com>
Tested-by: Florian Haas <florian.haas@linbit.com>
Tested-by: Jiaju Zhang <jjzhang@suse.de>
2011-07-05 09:13:48 -07:00
Jan Friesse
8c717c22b2 Remove spinlocks
Spinlocks are now removed because, even though a spinlock can improve
speed in some special cases, in most cases it makes corosync CPU usage
much more intensive and corosync less responsive than if only mutexes are used.

What we were doing is:
pthread_mutex_lock
pthread_spin_lock
pthread_spin_unlock
pthread_mutex_unlock

which is not safe.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-06-29 12:01:54 +02:00
Jerome Flesch
00434a4f10 Fix usage of strerror_r()/perror()
Signed-off-by: Jerome Flesch <jerome.flesch@netasq.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
2011-06-28 09:56:58 +02:00
Steven Dake
ae4a3af340 sched_params log message incorrect
The sched_params parameter was set before being printed.

Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Reviewed-by:  <sdake@redhat.com>
2011-06-22 22:46:56 -07:00
Jan Friesse
e8000c7b9b objdb: save copy of handles in object_find_create
Following situation could happen:
- process 1 thru confdb creates find handle
- calls find iteration once
- different process 2 deletes object pointed by process 1 iterator
- process 1 calls iteration again ->
  object_find_instance->find_child_list is invalid pointer

-> segfault

Now object_find_create creates array of matching object handlers and
object_find_next uses that array together with check for name. This
prevents situation where between steps 2 and 3 new object is created
with different name but sadly with same handle.

Also good to note that this patch is more or less a quick hack rather
than a proper solution. The real proper solution is to not use pointers
and rather use handles everywhere. This is a big TODO.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-06-22 11:13:12 +02:00
Jiaju Zhang
c6bfc6b5d6 RRP: Fix ring initialization issue for UDPU mode
Redundant ring has some problem in the UDP unicast mode. The problem
is the second ring has not been successfully initialized, that is, the
second time iface_changes happens, the member list for that interface
has not been added, which results in that ring cannot transmit normal
message. So the second ring cannot take over the work if the first
ring is down. This patch fixes this issue.

comments from review:
More work is probably needed in totemnet, where totemnet maintains the
list of nodes and an iterator for them, and totemudpu_member_add adds
state information to a context for the iteration.

In any regard, that is somewhat difficult to test, so I'll merge this
patch for now - keep in mind interface changes on the bindnetaddr will
cause problems with udpu after this patch has been committed.

Signed-off-by: Jiaju Zhang <jjzhang@suse.de>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-06-16 17:23:36 -07:00
Jan Friesse
50f05bfa15 crypto: rng_make_prng prevent buf overflow
With bits set to 1023, buf of 256 bytes was filled by rng_get_bytes
up to 257 bytes. Buf is now 258 bytes so it's no longer a problem.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-06-10 12:12:05 +02:00
Jan Friesse
afa0398ca4 mainconfig: Check retval of logsys_format_set
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-06-06 10:02:34 +02:00
Jan Friesse
531e81602f totemudp: memset of proper size
In totemudp_mcast_thread_state_constructor memset to
sizeof(struct totemudp_mcast_thread_state) instead of size of
pointer.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-06-03 11:09:27 +02:00
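
Illustration for 531e81602f (not taken from the patch): a minimal sketch of the bug class being fixed, where sizeof is applied to a pointer instead of the structure it points to. struct thread_state is a hypothetical stand-in for struct totemudp_mcast_thread_state, and thread_state_constructor() is a hypothetical name.

```c
#include <string.h>

/* Hypothetical stand-in for struct totemudp_mcast_thread_state. */
struct thread_state {
    unsigned char iobuf[64];
    int counter;
};

void thread_state_constructor(void *state_in)
{
    struct thread_state *state = state_in;

    /* Bug pattern: sizeof(state) is the size of the pointer (4 or 8 bytes),
     * so most of the structure would stay uninitialised:
     *
     *     memset(state, 0, sizeof(state));
     */

    /* Fix: clear the whole structure. */
    memset(state, 0, sizeof(struct thread_state));
}
```

Writing sizeof(*state) instead of naming the type gives the same size and keeps working if the pointer's type is later changed.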
Jan Friesse
ea0a24866c coroipcs: init buf in coroipcs_handler_dispatch
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-06-03 11:09:01 +02:00
Jan Friesse
c2a39cb8e2 coroparse: don't leak dirent
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-06-03 11:00:56 +02:00
Jan Friesse
d76bb76d1f logsys: _logsys_wthread_create never returns != 0
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-06-03 10:59:17 +02:00
Jan Friesse
6b9297131c totemconfig: discard check of objdb_get_string ret
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-06-03 10:58:15 +02:00
Jan Friesse
77d9808125 iazc: Reduce number of mem alloc and memcpy
X86 processors are able to handle unaligned memory access. Improve
performance by using that feature on i386 and x86_64 compatible
processors, and use old aligning code on different processors.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-05-30 09:06:35 +02:00
Jerome Flesch
6bec0aa227 logsys: When corosync is compiled with --enable-small-memory-footprint, also reduce the size of the logsys SHM
Signed-off-by: Jerome Flesch <jerome.flesch@netasq.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-05-27 13:45:27 +02:00
Jerome Flesch
b112672115 coroipcs_handler_dispatch(): Fix conn_info->service security value: -1 is not a good security value since it's equal to SOCKET_SERVICE_INIT
Signed-off-by: Jerome Flesch <jerome.flesch@netasq.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2011-05-27 13:40:36 +02:00
Jerome Flesch
fe51e70367 Corosync: Fix build when done with --enable-fatal-warnings
Signed-off-by: Jerome Flesch <jerome.flesch@netasq.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2011-05-27 13:29:12 +02:00
Russell Bryant
a53e402912 logsys.c: Use snprintf() instead of sprintf().
Change a couple of string functions to use the output length
limiting counterpart.

Signed-off-by: Russell Bryant <russell@russellbryant.net>
2011-05-08 02:42:47 -05:00
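
Illustration for a53e402912 (not taken from the patch): a minimal sketch of replacing sprintf() with snprintf() and checking for truncation. The function name format_tag() and the "subsys: msg" layout are hypothetical.

```c
#include <stdio.h>

/* Format "<subsys>: <msg>" into dst, never writing more than dst_len bytes.
 * Returns 0 if the text fit, 1 if it was truncated, -1 on a formatting error. */
int format_tag(char *dst, size_t dst_len, const char *subsys, const char *msg)
{
    /* sprintf(dst, "%s: %s", subsys, msg) would overflow dst for long input. */
    int n = snprintf(dst, dst_len, "%s: %s", subsys, msg);

    if (n < 0) {
        return -1;
    }
    /* snprintf() returns the length the full string would have had. */
    return ((size_t)n >= dst_len) ? 1 : 0;
}
```

With dst_len greater than zero the result is always NUL-terminated, so a truncated message is still safe to print.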
Jan Friesse
61d83cd719 totemsrp: Enhance mcast failure detection
memb_state_gather_enter increases stats.continuous_gather only if the
previous state was also gather. This should happen only if multicast is
not working properly (local firewall in most cases) and not if many
nodes join at one time.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
2011-05-05 11:00:26 +02:00
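
Illustration for 61d83cd719 (not taken from the patch): a reduced sketch of the counting rule described above, where the counter only grows on a gather-to-gather transition. The types and function names are hypothetical, and resetting the counter on reaching the operational state is an assumption of this example.

```c
/* Hypothetical, reduced model of the totemsrp membership states. */
enum memb_state {
    MEMB_STATE_OPERATIONAL,
    MEMB_STATE_GATHER,
    MEMB_STATE_COMMIT,
    MEMB_STATE_RECOVERY
};

struct srp_instance {
    enum memb_state memb_state;
    unsigned int continuous_gather;
};

void gather_enter(struct srp_instance *instance)
{
    /* Count only re-entries from gather; a first entry from another
     * state (for example many nodes joining at once) is not counted. */
    if (instance->memb_state == MEMB_STATE_GATHER) {
        instance->continuous_gather++;
    }
    instance->memb_state = MEMB_STATE_GATHER;
}

void operational_enter(struct srp_instance *instance)
{
    /* Assumption: a successfully formed ring clears the counter. */
    instance->continuous_gather = 0;
    instance->memb_state = MEMB_STATE_OPERATIONAL;
}
```

A counter that keeps climbing then points at gather being re-entered without ever completing, which is the multicast failure signature the commit describes.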
Jan Friesse
719fddd8e1 coroipcs: Deny connect to service without initfn
If a library connects to a service with no init function, coroipcs will try
to dereference a NULL pointer. Now we correctly return error code
CS_ERR_NOT_EXIST.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-04-15 08:56:35 +02:00
Steven Dake
6a752ba1b1 Align ipc on 8 byte boundaries
Align all ipc messages on 8 byte boundaries.  This alignment will remove bus
errors on systems that can't access unaligned data and should improve
performance.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
2011-04-14 17:25:08 -07:00
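
Illustration for 6a752ba1b1 (not taken from the patch): a minimal sketch of rounding a message length up to the next 8 byte boundary. The names IPC_ALIGNMENT and ipc_align_up() are hypothetical, and the actual commit may achieve the alignment differently, for example with compiler alignment attributes on the message structures.

```c
#include <stddef.h>

#define IPC_ALIGNMENT 8u /* must be a power of two for the mask trick below */

/* Round len up to the next multiple of IPC_ALIGNMENT. */
size_t ipc_align_up(size_t len)
{
    return (len + (IPC_ALIGNMENT - 1)) & ~(size_t)(IPC_ALIGNMENT - 1);
}
```

If every message starts at an aligned address and has an aligned length, the message that follows it in a shared buffer is aligned as well.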
Steven Dake
83f528b473 Fix problem where unaligned totemip address access would result in bus error on non-unaligned-safe architectures.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
2011-04-14 17:22:02 -07:00
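
Illustration for 83f528b473 (not taken from the patch): a minimal sketch of reading a multi-byte value from a possibly unaligned address without risking a bus error. The function name load_u32_unaligned() is hypothetical; the patch itself may restructure or pad the totemip address instead.

```c
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from src, which may sit at any byte offset.
 * Casting src to uint32_t * and dereferencing it can trap on
 * architectures that do not support unaligned loads. */
uint32_t load_u32_unaligned(const void *src)
{
    uint32_t value;

    memcpy(&value, src, sizeof(value));
    return value;
}
```

Compilers typically turn the fixed-size memcpy() into a single load on architectures where unaligned access is safe, so the portable form costs little.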
Greg Walton
1db74fe1b9 Clean up ENDIAN ifdef tests
Signed-off-by: Greg Walton <corosync@gwalton.net>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-04-14 17:10:17 -07:00
Tim Serong
61d7ec1716 Fix tyop in RRP faulty error messages
Signed-off-by: Tim Serong <tserong@novell.com>
Reviewed-by: Russell Bryant <russell@russellbryant.net>
2011-04-11 02:00:18 -05:00
Angus Salkeld
4ed97d991b IPC: place calls to stats functions outside of mutexes
This is to prevent nasty deadlocks between IPC and objdb.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-04-13 08:15:59 +10:00
Zane Bitter
6365150ae2 Provide better checking of the message type
A negative value for the message type (on systems where char is signed)
would cause a crash. This is highly probable if the cluster is, for example,
misconfigured to have encryption enabled on some nodes but not others.

Signed-off-by: Zane Bitter <zane.bitter@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-04-12 13:09:39 -07:00
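
Illustration for 6365150ae2 (not taken from the patch): a minimal sketch of validating a message type byte before it is used as an index into a handler table. HANDLER_COUNT and message_type_valid() are hypothetical names, and the table size of 6 is an arbitrary value chosen for the example.

```c
#include <stddef.h>

#define HANDLER_COUNT 6 /* hypothetical number of message handlers */

/* Return 1 if the first byte of the packet is a usable handler index,
 * 0 otherwise. */
int message_type_valid(const char *packet, size_t len)
{
    unsigned char type;

    if (len < 1) {
        return 0;
    }
    /* Where char is signed, a byte such as 0xF0 reads as a negative
     * number and would pass a bare "type < HANDLER_COUNT" test.
     * Converting to unsigned char first makes it compare as 240. */
    type = (unsigned char)packet[0];
    return type < HANDLER_COUNT;
}
```

Garbage in the type byte is what a node without encryption sees when it receives an encrypted packet, which is the misconfiguration mentioned in the commit message.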
Zane Bitter
6e990d202f Fix uninitialised memory errors found by valgrind
Signed-off-by: Zane Bitter <zane.bitter@gmail.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-04-08 09:13:12 -07:00
Angus Salkeld
265661745d Fix shutdown when a confdb client is still connected
If you are connected to corosync and registered for
object notifications, and corosync is then asked to shut down,
the IPC server will get stuck. This is because the pipe
is closed and the refcount is increased. This leaves ipcs
with a connection that it can't destroy.

Solution:
1) if a write to the pipe fails (pipe closed) decrement the refcounter.
2) fix the object_track_stop() - it was not working as the functions
   did not match up. (this caused the late callbacks).
3) in ipcs call exit_fn() then stats_destroy_connection() so that
   the service engine can have time to call object_track_stop()
   before the object gets destroyed.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-29 13:48:20 +11:00
Angus Salkeld
076e8b74f7 STATS: add the service name to the connection name.
This helps to quickly identify what service the application
is connected to.

The object will now look like:
runtime.connections.corosync-objctl:CONFDB:19654:13.service_id=11
runtime.connections.corosync-objctl:CONFDB:19654:13.client_pid=19654
etc...

This also makes it clearer to receivers of the dbus/snmp events
what is going on.

Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-29 13:48:13 +11:00
Steven Dake
7d5e588931 totemsrp: free messages originated in recovery rather than rely on messages_free
Relying on messages_free may seem like it should work, but it leads to a
situation where every node has released the messages, yet some nodes think
messages are missing.  The output then looks like "Retransmit: #" in
repetition.  This patch frees those messages immediately during the transition
to the OPERATIONAL state and sets the internal variables totemsrp depends
upon to the proper values.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2011-03-24 09:25:15 -07:00
Steven Dake
ef05817ce5 totemsrp: Only restore old ring id information one time
The current code stores the current ring information every time a commit
token is generated.  This causes the old ring id used for comparison purposes
to increase if a token is lost in commit or recovery, resulting in failure of
totem.  This patch changes the behavior to only store the old ring id one
time when the commit token is received, and then further commit token ring
id saves are not done until OPERATIONAL is reached.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2011-03-24 09:22:34 -07:00
Steven Dake
1a7b7a39f4 totemsrp: Remove recv_flush code
The recv_flush code is no longer necessary because of the miss_count_count
addition.  It can in some cases lead to register corruption because of
interactions with -fstack-protector, the recursive nature of how this code
works, and interactions with the optimizer in some versions of gcc.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2011-03-24 09:21:27 -07:00
Steven Dake
d99fba72e6 Resolve abort during simultaneous stopping of at least 4 nodes
Consider 5 nodes.

Nodes 3 and 4 are stopped (by random stopping), nodes 1, 2 and 5 form a new
configuration, and during recovery nodes 1 and 2 are stopped (via service
corosync stop).  This causes node 5 never to finish recovery within the timeout
period, triggering a token loss in recovery.  Bug #623176 resolved an assert
which happens because the full ring id was being restored.  The resolution
to Bug #623176 was to not restore the full ring id, and instead operate
(according to specifications) the new ring id.  Unfortunately this exposes
a problem whereby the restarting of nodes 1-4 generate the same ring id.
This ring id gets to the recovery failed node 5 which is now in gather,
and triggers a condition not accounted for in the original totem specification.

It appears later work from Dr. Agarwal's PhD dissertation considers this
scenario.  That solution entails rejecting the regular token in the above
condition.  Since the ring id is also used to make decisions for commit token
acceptance, we must also take care to reject the regular token in all cases
after transitioning from OPERATIONAL.

Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
2011-03-21 09:26:35 -07:00