Some platforms requires aligned memory access. For such platforms,
special code was added using address modulo 4 to check if aligning is
needed or not. This may be problem for 64 bits platforms. Also check in
app_deliver_fn was incorrect and always true.
Solution is to use modulo sizeof pointer and add parentheses to fix the
check in app_deliver_fn function.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Current we horribly over-use totempg_ifaces_get() to
retrieve information about knet interfaces. This is an attempt to
improve on that.
All transports are supported (so not only Knet but also UDP(U)).
This patch builds best against the "onwire-upgrade" branch of knet
as that's what sparked my interest in getting more information out.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Needs new knet crypto API.
If it's not available, then fall back to the old
API and forbid changing crypto while running.
To avoid us being dependant on the leader node, each
node sends its own crypto_reconfig_phase messages so
we can guarantee that the reconfiguration always completes
on each node.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
reload failed for UDP[U] because they had saved pointers
to the interfaces[] array. so memcpy into that rather then
re-allocate it.
Also, move the check for different IP address families so
it also gets run at reload time.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
To be more reliable & maintainable
The basic plan here is to fix reloads to be more stable
using read/parse/verify/build/commit stages, so that any errors
will not leave corosync in an unstable state. This should
also make the code more maintainable as currently the verify/commit
stages are horribly intertwined.
Also:
- Fix local_node_pos not being updated in the new map during validation
(broke adding and removing new nodes in the middle of the list).
- Fix reconfiguration so that nodes are indexed by nodeid and not their
position in the list. This is an old bug that's just been carried
over
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Some functions allocated memory on stack without clearing memory and
then send them on wire. This is not an issue, but valgrind reports this
as a problem so it is easy to miss real problem then.
Solution is to clear stack memory.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
This is a bug I seem to have introduced in
429209f4aa where we compare links
for changes. if a new node was added on an existing link then it
was compared against a non-existant one in the previous configuration.
We now only compare nodes that are in both interfaces.
As I needed min() for this function, I moved it from individual
.c files into util.h so we only have one copy.
And the error message was fixed.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Even if it's not used for anything else.
Also, make cfgtool show the correct link ID when links are not
contiguous
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Commit 899cb29983 changed copy_len
to iovec[i].iov_len, assuming,
copy_len is always the same as iovec[i].iov_len under those
circumstances, but it missed the possability of small message being
partly put at the end of packet, which cuts this message in two parts
and therefore making copy_len not equal to iovec[i].iov_len.
This is revert of 899cb29983
Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
To be more explicit that we are copying whole message.
Related to 0ebae6b47d.
Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
The problem was that two or more messages were concatenated
together during fragmentation in mcast_msg() function. In specific case,
message of just short of 1MB was provided for mcast_msg() and it
happened so, that the remainder (212 bytes to be exact) left some free
space in packet, therefore branch
if ((copy_len + fragment_size) <
(max_packet_size - sizeof (unsigned short))) {
...
was selected and this was the last mesage in provided iovec.
Then, on the second call, came another big message (about 300KB ) and
during fragmentation mcast.fragmented was set to 1.
On the other end, while receiving messages, due to missing
mcast.fragmentation==0 those two messages were concatenated and
therefore assembly->data array overflowed overwriting linked list
pointers and offset (which happened to be set to 0 and that 300KB
message was being copied from the beginning again).
After whole 300KB message has been sent, mcast.fragmentation==0 arrived
and totempg_deliver_fn() tried to move assembly structure to
assembly_list_free list, but as linked list pointers has been overriden,
segfault occured.
Signed-off-by: Rytis Karpuška <rytisk@neurotechnology.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
In my enthusiasm for removing code while integrating knet I
also deleted the correct code for returning IP address for a node,
so that only the IP addres of the local node was ever returned.
This commit restores the the previous code.
Also, because we always return INTERFACE_MAX interfaces now (they don't
have to be contiguous) set ss_family to zero if that interface is not
in use so that downstream apps know and don't display a lot of 0.0.0.0
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Now we are using knet, it's possible to dynamically add, remove and
reconfigure links on the fly.
Also print 'n' for non-existant knet links. This will show up
only on loopback links >0. But it looks better than 'status ='
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
totempg needs to store the current message + any
overflow for the next message which can be up to (nearly) the MTU size.
in knet that's large, but for UDP it's just 1500.
The reason we've never seen it before is because the actual max message
size is 1024 less than 1MB and after all the headers are stripped out the overflow is
usually 1024 bytes or less.
The 1024*1024 size of the assembly buffer is large enough to hold a max message (1047552) +
1024 bytes of a new UDP message. So we never saw any problems.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
RRP doesn't exist any more so all the ring re-enable code is redundant.
I've removed it from the library and all the code that does anything,
but I've left the hole in the IPC just in case old libraries are
hanging around.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
In function mcast_msg of totempg.c, line 923, there is a memcpy call in
"else" branch, and also another memcpy out of the "else" branch, while
the two calls have the same parameters. It is possibleto remove the memcpy
in "else" branch.
Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
totempg_groups_join() is called by sync_init().
sync_init() judge that totempg_groups_join() failed if return code of
totempg_groups_join() is -1.
Therefore, the return code should return in -1 when
totempg_groups_join() fails.
Signed-off-by: Takeshi MIZUTA <miz.take4@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
This is a big update that removes RRP & MRP from the codebase
and makes knet the default transport for corosync. UDP & UDPU
are still (currently) supported but are deprecated. Also crypto
and mutiple interfaces are only supported over knet.
To compile this codebase you will need to install libknet from
https://github.com/fabbione/kronosnet
The corosync.conf(5) man page has been updated with info on the new
options. Older config files should still work but many options
have changed because of the knet implementation so configs should
be checked carefully. In particular any cluster using using RRP
over UDP or UDPU will not start as RRP is no longer present. If you
need multiple interface support then you should be using the knet transport.
Knet brings many benefits to the corosync codebase, it provides support
for more interfaces than RRP (up to 8), will be more reliable in the event
of network outages and allows dynamic reconfiguration of interfaces.
It also fixes the ifup/ifdown and 127.0.0.1 binding problems that have
plagued corosync/openais from day 1
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Previously there were two free lists. One for operational and one for
transitional state. Because every node starts in transitional state and
always ends in the operational state, assembly was always put to normal
state free list and never in transitional free list, so new assembly
structure was always allocated after new node connected.
Solution is to have only one free list.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Steven Dake <stdake@cisco.com>
Patch for support waiting_trans_ack may fail if there is synchronization
happening between delivery of fragmented message. In such situation,
fragmentation layer is waiting for message with correct number, but it
will never arrive.
Solution is to handle (callback) change of waiting_trans_ack and use
different queue.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
This patch creates a special message queue for synchronization messages.
This prevents a situation in which messages are queued in the
new_message_queue but have not yet been originated from corrupting the
synchronization process.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Commit which added number of addresses to srp_address structure didn't
count with totemsrp_ifaces_get where whole structure was copied instead
of addresses only. This is now fixed.
Also to make API totempg forward compatible, size of interfaces array
must be passed to ifaces_get like functions to prevent memory overwrite.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Also few leftovers from cfg is removed and version of totempg is
increased to 5 to reflect all changes we made
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Our preferred shared logging system is exported via the libqb library. As
a result, the corosync project no longer needs to export logsys.so and the
code can be directly included in the binary. The header file can also be
removed.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Currently:
- send_reserve() adds to the reserve
- msg_count_send_ok() tests ((avail - totempg_reserved) > msg_count)
So essentially we are checking to see if 2 * msg_count can fit in
the q.
So instead I am using byte_count_send_ok (size) to see if the
message will fit then calling send_reserve()
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
To allow async cpg messages of 1M we need to:
1) increase the totem queue size 4 times
2) align the critical level to one large message free
There are a number of reasons for doing this:
We can't let cpg_mcast_joined() fail because the user will not see it
and will assume is has succeded.
The reason we are getting good performance is by providing a negative
feedback loop from the totem q to the IPC/poll system. This relies
on 4 q states low/med/high/crit. With messages of size 1M you
now have a q of size one and now go from level low to crit instantly
then back to low as messages are put on and taken off. I don't think
this is the best behaviour. By having a q size of 4 allows the system
to utilize the q better and give us time to respond to changes in
the q level.
To effectively achieve flow control with a q of size 1 would require
all the clients to request the space on the q like is done in
totempg_groups_joined_reserve() but probably in shared memory
This would take quite a bit of re-work.
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Add a new object called totem.interface.dynamic to allow creation/deletion
of new child objects using the corosync-objctl utility:
to add new member:
linux# corosync-objctl -c totem.interface.dynamic.10-211-55-12
to delete an existing member:
linux# corosync-objctl -d totem.interface.dynamic.10-211-55-12
Corosync will dynamically add these members to the configuration and start
communicating with those nodes.
Signed-off-by: Anton Jouline <anton.jouline@cbsinteractive.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
hdb has some expense and is not necessary in the totempg.so runtime. This
patch removes the dependence on hdb and instead uses a direct pointer.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
This API allows totem to operate as a multithreaded library. Performance is
better without threads but some library users may only have multithreaded
systems. In the corosync case where we have removed threads, this reduces
cpu utilization by ~10% by removing about 50% of the mutex lock and unlock calls
that occur during typical operation. Since the latest corosync is nearly
thread free, there is no need for mutex operations.
Signed-off-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Angus Salkeld <asalkeld@redhat.com>
IPC: return 0/-ENOBUFS from message handler
IPC: use the new rate_limit API to improve perf.
CPG: add send_async API & hook up flow control
IPC: Fix flow control getting stuck.
IPC: Port the remaining libs to use libqb IPC
IPC: remove libqb flowcontrol API
TEST: put cpg_dispatch() in it's own thread
IPC: cleanup ipc_glue.c name everything cs_ipcs_*()
IPC: add back statistics
IPC: remove coroipcc_ symbols from lib*.versions
IPC: init each se's IPC as it is loaded.
IPC: use the new connection_closed() event to free the context.
IPC: re-add zero copy functionality back
IPC: remove cpg_mcast_joined_async() and make it the default
-> now cpg_mcast_joined() == cpg_mcast_joined_async()
libqb: expose a libqb error converter
libqb: add missing error conversions
libqb: remove repeat try loop in lib/cpg.c
CPG: fix zero copy mcast
CPG: use newer return codes
Add ENOTCONN to qb_to_cs_error()
libqb: fix error conversion from errno to cs_error_t in confdb
libqb: change errno_to_cs to qb_to_cs_error
libqb: add a cs_strerror() to get a more meaningful message
libqb: fix some confusing error conversions.
libqb: set the timeout on recv's to -1 (wait forever)
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
+ overload - number of times client is told to try again
+ invalid_request - message contained invalid paramter, e.g. invalid size
+ msg_queue_avail - messages currently available at the Totem layer
+ msg-queue_reserved - messages currently reserved at the Totem layer
Signed-off-by: Tim Beale <tim.beale@alliedtelesis.co.nz>
Reviewed-by: Steven Dake <sdake@redhat.com>
X86 processors are able to handle unaligned memory access. Improve
performance by using that feature on i386 and x86_64 compatible
processors, and use old aligning code on different processors.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>