Some functions allocated memory on stack without clearing memory and
then send them on wire. This is not an issue, but valgrind reports this
as a problem so it is easy to miss real problem then.
Solution is to clear stack memory.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
This is a bug I seem to have introduced in
429209f4aa where we compare links
for changes. if a new node was added on an existing link then it
was compared against a non-existant one in the previous configuration.
We now only compare nodes that are in both interfaces.
As I needed min() for this function, I moved it from individual
.c files into util.h so we only have one copy.
And the error message was fixed.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Even if it's not used for anything else.
Also, make cfgtool show the correct link ID when links are not
contiguous
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Commit 899cb29983 changed copy_len
to iovec[i].iov_len, assuming,
copy_len is always the same as iovec[i].iov_len under those
circumstances, but it missed the possability of small message being
partly put at the end of packet, which cuts this message in two parts
and therefore making copy_len not equal to iovec[i].iov_len.
This is revert of 899cb29983
Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
To be more explicit that we are copying whole message.
Related to 0ebae6b47d.
Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
The problem was that two or more messages were concatenated
together during fragmentation in mcast_msg() function. In specific case,
message of just short of 1MB was provided for mcast_msg() and it
happened so, that the remainder (212 bytes to be exact) left some free
space in packet, therefore branch
if ((copy_len + fragment_size) <
(max_packet_size - sizeof (unsigned short))) {
...
was selected and this was the last mesage in provided iovec.
Then, on the second call, came another big message (about 300KB ) and
during fragmentation mcast.fragmented was set to 1.
On the other end, while receiving messages, due to missing
mcast.fragmentation==0 those two messages were concatenated and
therefore assembly->data array overflowed overwriting linked list
pointers and offset (which happened to be set to 0 and that 300KB
message was being copied from the beginning again).
After whole 300KB message has been sent, mcast.fragmentation==0 arrived
and totempg_deliver_fn() tried to move assembly structure to
assembly_list_free list, but as linked list pointers has been overriden,
segfault occured.
Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
In my enthusiasm for removing code while integrating knet I
also deleted the correct code for returning IP address for a node,
so that only the IP addres of the local node was ever returned.
This commit restores the the previous code.
Also, because we always return INTERFACE_MAX interfaces now (they don't
have to be contiguous) set ss_family to zero if that interface is not
in use so that downstream apps know and don't display a lot of 0.0.0.0
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Now we are using knet, it's possible to dynamically add, remove and
reconfigure links on the fly.
Also print 'n' for non-existant knet links. This will show up
only on loopback links >0. But it looks better than 'status ='
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
totempg needs to store the current message + any
overflow for the next message which can be up to (nearly) the MTU size.
in knet that's large, but for UDP it's just 1500.
The reason we've never seen it before is because the actual max message
size is 1024 less than 1MB and after all the headers are stripped out the overflow is
usually 1024 bytes or less.
The 1024*1024 size of the assembly buffer is large enough to hold a max message (1047552) +
1024 bytes of a new UDP message. So we never saw any problems.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
RRP doesn't exist any more so all the ring re-enable code is redundant.
I've removed it from the library and all the code that does anything,
but I've left the hole in the IPC just in case old libraries are
hanging around.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
In function mcast_msg of totempg.c, line 923, there is a memcpy call in
"else" branch, and also another memcpy out of the "else" branch, while
the two calls have the same parameters. It is possibleto remove the memcpy
in "else" branch.
Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
totempg_groups_join() is called by sync_init().
sync_init() judge that totempg_groups_join() failed if return code of
totempg_groups_join() is -1.
Therefore, the return code should return in -1 when
totempg_groups_join() fails.
Signed-off-by: Takeshi MIZUTA <miz.take4@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
This is a big update that removes RRP & MRP from the codebase
and makes knet the default transport for corosync. UDP & UDPU
are still (currently) supported but are deprecated. Also crypto
and mutiple interfaces are only supported over knet.
To compile this codebase you will need to install libknet from
https://github.com/fabbione/kronosnet
The corosync.conf(5) man page has been updated with info on the new
options. Older config files should still work but many options
have changed because of the knet implementation so configs should
be checked carefully. In particular any cluster using using RRP
over UDP or UDPU will not start as RRP is no longer present. If you
need multiple interface support then you should be using the knet transport.
Knet brings many benefits to the corosync codebase, it provides support
for more interfaces than RRP (up to 8), will be more reliable in the event
of network outages and allows dynamic reconfiguration of interfaces.
It also fixes the ifup/ifdown and 127.0.0.1 binding problems that have
plagued corosync/openais from day 1
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Previously there were two free lists. One for operational and one for
transitional state. Because every node starts in transitional state and
always ends in the operational state, assembly was always put to normal
state free list and never in transitional free list, so new assembly
structure was always allocated after new node connected.
Solution is to have only one free list.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Steven Dake <stdake@cisco.com>
Patch for support waiting_trans_ack may fail if there is synchronization
happening between delivery of fragmented message. In such situation,
fragmentation layer is waiting for message with correct number, but it
will never arrive.
Solution is to handle (callback) change of waiting_trans_ack and use
different queue.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
This patch creates a special message queue for synchronization messages.
This prevents a situation in which messages are queued in the
new_message_queue but have not yet been originated from corrupting the
synchronization process.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Commit which added number of addresses to srp_address structure didn't
count with totemsrp_ifaces_get where whole structure was copied instead
of addresses only. This is now fixed.
Also to make API totempg forward compatible, size of interfaces array
must be passed to ifaces_get like functions to prevent memory overwrite.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Also few leftovers from cfg is removed and version of totempg is
increased to 5 to reflect all changes we made
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Our preferred shared logging system is exported via the libqb library. As
a result, the corosync project no longer needs to export logsys.so and the
code can be directly included in the binary. The header file can also be
removed.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Currently:
- send_reserve() adds to the reserve
- msg_count_send_ok() tests ((avail - totempg_reserved) > msg_count)
So essentially we are checking to see if 2 * msg_count can fit in
the q.
So instead I am using byte_count_send_ok (size) to see if the
message will fit then calling send_reserve()
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
To allow async cpg messages of 1M we need to:
1) increase the totem queue size 4 times
2) align the critical level to one large message free
There are a number of reasons for doing this:
We can't let cpg_mcast_joined() fail because the user will not see it
and will assume is has succeded.
The reason we are getting good performance is by providing a negative
feedback loop from the totem q to the IPC/poll system. This relies
on 4 q states low/med/high/crit. With messages of size 1M you
now have a q of size one and now go from level low to crit instantly
then back to low as messages are put on and taken off. I don't think
this is the best behaviour. By having a q size of 4 allows the system
to utilize the q better and give us time to respond to changes in
the q level.
To effectively achieve flow control with a q of size 1 would require
all the clients to request the space on the q like is done in
totempg_groups_joined_reserve() but probably in shared memory
This would take quite a bit of re-work.
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Add a new object called totem.interface.dynamic to allow creation/deletion
of new child objects using the corosync-objctl utility:
to add new member:
linux# corosync-objctl -c totem.interface.dynamic.10-211-55-12
to delete an existing member:
linux# corosync-objctl -d totem.interface.dynamic.10-211-55-12
Corosync will dynamically add these members to the configuration and start
communicating with those nodes.
Signed-off-by: Anton Jouline <anton.jouline@cbsinteractive.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
hdb has some expense and is not necessary in the totempg.so runtime. This
patch removes the dependence on hdb and instead uses a direct pointer.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
This API allows totem to operate as a multithreaded library. Performance is
better without threads but some library users may only have multithreaded
systems. In the corosync case where we have removed threads, this reduces
cpu utilization by ~10% by removing about 50% of the mutex lock and unlock calls
that occur during typical operation. Since the latest corosync is nearly
thread free, there is no need for mutex operations.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
IPC: return 0/-ENOBUFS from message handler
IPC: use the new rate_limit API to improve perf.
CPG: add send_async API & hook up flow control
IPC: Fix flow control getting stuck.
IPC: Port the remaining libs to use libqb IPC
IPC: remove libqb flowcontrol API
TEST: put cpg_dispatch() in it's own thread
IPC: cleanup ipc_glue.c name everything cs_ipcs_*()
IPC: add back statistics
IPC: remove coroipcc_ symbols from lib*.versions
IPC: init each se's IPC as it is loaded.
IPC: use the new connection_closed() event to free the context.
IPC: re-add zero copy functionality back
IPC: remove cpg_mcast_joined_async() and make it the default
-> now cpg_mcast_joined() == cpg_mcast_joined_async()
libqb: expose a libqb error converter
libqb: add missing error conversions
libqb: remove repeat try loop in lib/cpg.c
CPG: fix zero copy mcast
CPG: use newer return codes
Add ENOTCONN to qb_to_cs_error()
libqb: fix error conversion from errno to cs_error_t in confdb
libqb: change errno_to_cs to qb_to_cs_error
libqb: add a cs_strerror() to get a more meaningful message
libqb: fix some confusing error conversions.
libqb: set the timeout on recv's to -1 (wait forever)
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
+ overload - number of times client is told to try again
+ invalid_request - message contained invalid paramter, e.g. invalid size
+ msg_queue_avail - messages currently available at the Totem layer
+ msg-queue_reserved - messages currently reserved at the Totem layer
Signed-off-by: Tim Beale <tim.beale@alliedtelesis.co.nz>
Reviewed-by: Steven Dake <sdake@redhat.com>
X86 processors are able to handle unaligned memory access. Improve
performance by using that feature on i386 and x86_64 compatible
processors, and use old aligning code on different processors.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
The UDPU transport is useful for those deployments which can't use multicast.
UDPU works by using UDP unicast, which is fully supported by every switch
manufacturer by default and doesn't rely on a functional IGMP implementation.
An example of the UDPU transport is contained in the corosync.conf.example.udpu
file which shows a 16 node cluster. This file should be copied to each node
in the cluster and IP addresses changed as appropriate.
Amended to remove dead udpu REUSEADDR socket option.
Signed-off-by: Steven Dake <sdake@redhat.com>