Commit Graph

4263 Commits

Author SHA1 Message Date
Christine Caulfield
4e683699b9 rust: Update to latest standards
Updating to Rust 2021 is a no-op (but worth doing for the future).
I've also taken this opportunity to use the latest bitflags crate.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2024-10-31 11:00:35 +01:00
Christine Caulfield
14a5e6f361 totemsrp: Fix orf_token stats
Previously, orf_token_tx was only incremented on the initial send.
This is obviously wrong and resulted in the TX count being
significantly lower than any RX count. Now we increment it every
time the ORF token is sent or resent.

As a quick test, on a single node system the RX and TX stats
will now match.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2024-10-31 10:58:02 +01:00
Jan Friesse
749f1cb9a5 totem: Use uint64_t type and QB_TIME_NS_IN_MSEC
Function message_handler_orf_token contains extra debug info enabled by
defining GIVEINFO. Instead of long long unsigned int, use the better
suited uint64_t, and use the QB_TIME_NS_IN_MSEC constant instead of a
hardcoded number. Also compile tv_old conditionally so it is not used
by accident.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2024-10-23 16:03:14 +02:00
Jan Friesse
55a6f657f4 totem: Use proper timestamp type for token warning
Timestamp diff is very unlikely to be larger than a 32-bit integer but it
is still worth using 64-bit.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2024-10-23 16:03:01 +02:00
Jan Friesse
3785829935 stats: Store token rx and tx timestamps as 64-bit
Token rx and tx timestamps were computed and stored as 32-bit unsigned
integers but subtracted in other parts of the code from a 64-bit integer.
As a result, a node with uptime larger than 49.71 days
(2^32/(1000*60*60*24)) reported wrong numbers for
stats.srp.time_since_token_last_received and in the log message during a
long pause (function timer_function_orf_token_warning).

Solution is to store rx and tx data as 64-bit integer.
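
The wraparound can be illustrated with a minimal C sketch. The helper
names are hypothetical and not taken from the corosync source, and the
timestamps are assumed to be in milliseconds, which is what the 49.71-day
estimate implies.

```c
#include <assert.h>
#include <stdint.h>

/* Milliseconds per day, as in the 2^32/(1000*60*60*24) estimate. */
#define MS_PER_DAY (1000ULL * 60 * 60 * 24)

/* Hypothetical helper: elapsed time when the stored timestamp was
 * truncated to 32 bits before being subtracted from a 64-bit clock. */
static uint64_t elapsed_ms_truncated(uint64_t now_ms, uint64_t last_rx_ms)
{
	uint32_t stored = (uint32_t)last_rx_ms; /* high bits are lost */

	assert(now_ms >= last_rx_ms);
	return now_ms - stored;
}

/* Hypothetical helper: same computation with a 64-bit stored value. */
static uint64_t elapsed_ms_full(uint64_t now_ms, uint64_t last_rx_ms)
{
	uint64_t stored = last_rx_ms;

	assert(now_ms >= last_rx_ms);
	return now_ms - stored;
}
```

For a token received at day 50 of uptime and read back 10 ms later, the
64-bit version yields 10, while the truncated version is off by 2^32 ms.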

Fixes #761

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2024-10-23 16:02:50 +02:00
Christine Caulfield
8b9d5e7051 rust: fix clippy warning in rust 1.81
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2024-09-19 11:51:03 +02:00
Jan Friesse
02d64060c1 coroparse: Free kv_item key and value on failure
If strdup of kv_item key or value failed only kv_item itself was freed.
Free also key and value (kv_item is zeroed so free of NULL variable is
safe).
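
A minimal sketch of this cleanup pattern follows. The struct layout and
function names are assumptions made for illustration, not the actual
coroparse code. It relies on free(NULL) being a no-op, so both members can
be freed unconditionally once the item has been zero-initialized.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical layout of a key/value item. */
struct kv_item {
	char *key;
	char *value;
};

/* Release an item and both of its strings; NULL is accepted. */
static void kv_item_free(struct kv_item *item)
{
	if (item == NULL) {
		return;
	}
	free(item->key);   /* free(NULL) is safe */
	free(item->value);
	free(item);
}

/* Allocate an item holding copies of key and value, or return NULL. */
static struct kv_item *kv_item_create(const char *key, const char *value)
{
	struct kv_item *item;

	assert(key != NULL && value != NULL);
	item = calloc(1, sizeof(*item)); /* zeroed: key and value start NULL */
	if (item == NULL) {
		return NULL;
	}
	item->key = strdup(key);
	item->value = strdup(value);
	if (item->key == NULL || item->value == NULL) {
		/* Free whichever strdup succeeded, not only the item. */
		kv_item_free(item);
		return NULL;
	}
	return item;
}
```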

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2024-09-02 17:01:38 +02:00
Jan Friesse
2f19853bf4 icmap: Free memory if qb_map_notify_add fails
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2024-09-02 17:01:29 +02:00
Jan Friesse
b71b8f9dbf cfg: Free new_config interfaces on failure
new_config interfaces was freed on success, but not if some previous
configuration step failed.

Solution is to move free of interfaces to same point as where
orig_interfaces are freed.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2024-09-02 17:00:42 +02:00
Alexander Aring
53730fa7bd main: support lock pid file arg
This patch adds support for changing the default corosync pid lock file
path. This is useful for running corosync in a net namespace only
environment. The pid lock file path cannot be specified in the conf
because the pid lock file exists before config parsing, so we allow the
user to specify it on the command line.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2024-08-07 16:36:00 +02:00
Jan Friesse
9bcde28dbe man: fix a typo in cpg_model_initialize
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2024-08-05 10:29:01 +02:00
Jan Friesse
64c83682d3 man: Improve quorum provider formatting
As suggested by Christine Caulfield, split the long sentence so the
paragraph now follows the same formatting style as other options,
hopefully making it less confusing.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2024-08-05 09:52:16 +02:00
Christine Caulfield
b98248d9a5 rust: tests return errors and don't hang
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2024-07-04 10:57:04 +02:00
Christine Caulfield
58d654261a rust: Improve Rust bindings
The big change here is that all API calls now take a &Handle
rather than making a copy. This, apart from being more sensible
and efficient, allows us to implement Drop on the handle so that
it will call _free() when it goes out of scope.

There's some jiggery-pokery with a clone flag in there now
because of callbacks that can return a valid handle, and we
want those to be Drop'ed sensibly.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2024-07-04 10:56:58 +02:00
Ferenc Wágner
6009018151 Move corosync-notifyd policy file into $(datadir)/dbus-1/system.d
As per dbus-daemon(1):

> Third-party packages would historically install XML files into
> /etc/dbus-1/system.d, but this practice is now considered to be
> deprecated: that directory should be treated as reserved for the
> system administrator.

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2024-05-28 17:03:09 +02:00
xin liang
73ffbc385f man: corosync.conf: Multi improvements
- Add default value 5405 for mcastport
- Add brief introduction for UDP/UDPU/KNET transport
- Keep format consistent (use uppercase) for above 3 transport types

Signed-off-by: xin liang <xliang@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2024-05-02 15:40:33 +02:00
Jan Friesse
c01fd757a0 totem: Fix reference links
Link to the Corosync project's archived copies of Yair Amir's PhD thesis
and the paper about the totem protocol.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2024-03-12 17:22:42 +01:00
Christine Caulfield
ce03c68394 Report crypto errors back to cfg reload
Because crypto changing happens in the 'commit' phase
of the reload and we can't be sure that knet will
allow the new parameters, the result gets ignored.
This can happen in FIPS mode if a non-FIPS cipher
is requested.

This patch reports the errors back in a cmap key
so that the command-line can spot those errors
and report them back to the user.

It also restores the internal values for crypto
so that subsequent attempts to change things have
predictable results. Otherwise further attempts can
do nothing but not report any errors back.

I've also added some error reporting back for the
knet ping counters using this mechanism.

The alternative to all of this would be to check for FIPS
in totemconfig.c and then exclude certain options, but this
would be duplicating code that could easily get out of sync.

This system could also be a useful mechanism for reporting
back other 'impossible' errors.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2024-02-05 15:20:08 +01:00
Christine Caulfield
8d46eb0127 Fix up the library .versions files
I've only added missing symbols and removed old ones. The actual
library version numbers might need assessing too.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2024-01-02 15:32:29 +01:00
Jan Friesse
2fcda76b96 configure: Fix building of rust for release
Set rustver correctly for both release version string
(for example 3.1.7) and git one (3.1.7.1-982f).

corosyncrustver must be escaped by '[]' because sed is using these two
characters and m4 would remove them.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2023-11-28 16:03:14 +01:00
Jan Friesse
982ff8d818 License: Fix year (mainly to fix rust building)
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2023-11-20 14:56:58 +01:00
Machiry Aravind Kumar
40e08b219d Handling integer overflow issues
Avoiding signed integer overflows by converting size
related types to size_t.

Signed-off-by: Machiry Aravind Kumar <makrvcs@gmail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2023-11-06 16:48:08 +01:00
Christine Caulfield
6a6c7ab02f rust: Improve vector initialisation
(also silence clippy in rust 1.73)

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2023-10-09 15:04:52 +02:00
Christine Caulfield
0ee1d6cddb man: Update the corosync_overview manpage
The bits about IPv6 were out of date (for knet).

Added reference to the corosync-*tool utilities so that
people know they are there

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2023-10-06 11:32:40 +02:00
Christine Caulfield
33fa5dcb85 config: Fail to start if ping timers are invalid
This required adding a lot of return values to two previously
'void' functions. I did two rather than just the one that was
needed because it seemed to make sense to do them both together.

Although these functions now return errors, they are probably
still ignored higher up. This really needs a comprehensive audit.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2023-10-05 15:53:55 +02:00
Christine Caulfield
9aaac85b8d rust: Remove some pointless casts
As pointed out by clippy in Rust 1.72

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2023-08-30 17:31:25 +02:00
Christine Caulfield
77d9ea3ca1 parser: Allow a non-breaking space as 'whitespace'
Non-breaking spaces are depressingly easy to enter in some
editors and can make a mess of a corosync.conf file, as the
character can break keyword names and generate some very strange
error messages.

So here we include it (0xA0) as a valid whitespace character.
The (unsigned char) cast is for portability - Intel systems use
signed chars so we'd need something there, but this should
protect us against unsigned char systems too.
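
A minimal sketch of such a check, with hypothetical helper names that are
not taken from the corosync parser. It only handles the single byte 0xA0;
a UTF-8 encoded non-breaking space is a two-byte sequence and is not
covered.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: isspace() plus the single byte 0xA0. */
static int is_conf_space(char c)
{
	/* Convert first: where char is signed, 0xA0 is negative, and
	 * passing a negative value to isspace() is undefined. */
	unsigned char uc = (unsigned char)c;

	return uc == 0xA0 || isspace(uc);
}

/* Skip leading whitespace of a config line, including 0xA0 bytes. */
static const char *skip_conf_space(const char *s)
{
	assert(s != NULL);
	while (*s != '\0' && is_conf_space(*s)) {
		s++;
	}
	return s;
}
```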

No attempt is made to protect against UTF-8 characters, that's very
much out of scope for this project I suspect.

ref: https://github.com/corosync/corosync/issues/723

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2023-08-09 09:48:21 +02:00
Jan Friesse
149c64725f spec: Migrate to SPDX license
Both Fedora and openSUSE now recommend using the SPDX shortname format
for License.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2023-06-06 16:28:16 +02:00
Jan Friesse
a93e2aa363 build: Fix rust make -j build dep for distcheck
"Inspired" by similar patch from kronosnet
(531ebe195a955d9a1c8b762443ecab3edca95ad4)

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2023-05-29 11:15:16 +02:00
Jan Friesse
b47c5197ea rust: Remove tests from check scripts
Rust tests are the equivalent of the C tests (so interactive ones) and
not automated tests, so they shouldn't be executed by make check.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2023-05-23 08:47:18 +02:00
Christine Caulfield
f4ff07eba3 Rust: Remove obsolete bindgen flag
--size_t-is-usize has been deprecated for a while and is
removed in bindgen 0.64

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2023-04-24 09:27:32 +02:00
Christine Caulfield
3e4eba6548 knet: use knet TRACE logging level if available
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2023-04-03 16:13:23 +02:00
Christine Caulfield
f68a2d1c85 Rust: 'fix' clippys for Rust 1.67
This is clippy getting a bit above itself IMHO

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2023-01-27 17:16:48 +01:00
Christine Caulfield
846f3d13c6 rust: Make it work on FreeBSD
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2023-01-26 17:03:43 +01:00
Christine Caulfield
f34052d78e bindings: Add Rust bindings
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2023-01-26 17:03:36 +01:00
Jan Friesse
91348f8659 totemconfig: Add support for knet_mtu
totem.knet_mtu is new configuration option which allows setting
of automatic or manual knet MTU.

Also reload of totem.knet_pmtud_interval is fixed now, so it works when
the key is deleted (and set back to the default value).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2022-10-24 16:57:27 +02:00
Jan Friesse
c78e159267 configure: Modernize configure.ac a bit
... to make 2.71 happy. Also increase minimum version to 2.69 (10 years
old version so should be compatible enough).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2022-07-27 09:06:22 +02:00
Christine Caulfield
7b96a937df log: Configure knet logging to the same as corosync
Before this, all knet messages, including debug, were sent
over the pipe from knet to corosync and filtered in corosync.
This was obviously a waste, so now we tell knet the logging
level we need from it and so only get the messages that the
user has requested.

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2022-03-31 17:29:20 +02:00
Jan Friesse
04362046c4 logrotate: Use copytruncate method by default
The reopen logrotate method has two main problems:
1. It does fail when corosync is not running (solvable by
   adding "|| true")
2. If (for some reason, like SELinux) cfgtool -L fails, logrotate
   fails and corosync keeps logging into old file. Added "|| true"
   makes situation even worse because logrotate removes file but
   corosync keeps logging into it.

Solution is to install copytruncate logrotate snip by default (and
keep reopen config file only for reference).

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2022-03-30 16:18:36 +02:00
Andreas Grueninger
1038e4a18f totemconfig: Check uname return value correctly
uname in Solaris/Illumos returns a non-negative value when successful.
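
A minimal sketch of a portable check, using a hypothetical helper rather
than the totemconfig code. POSIX specifies that uname() returns a
non-negative value on success and -1 on failure, so the test is for a
negative result rather than for a non-zero one.

```c
#include <assert.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: copy the node name into buf, return 0 or -1. */
static int get_nodename(char *buf, size_t len)
{
	struct utsname info;

	assert(buf != NULL && len > 0);
	/* Success may be any non-negative value, so compare with < 0
	 * instead of != 0. */
	if (uname(&info) < 0) {
		return -1;
	}
	strncpy(buf, info.nodename, len - 1);
	buf[len - 1] = '\0'; /* strncpy does not always terminate */
	return 0;
}
```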

Signed-off-by: Andreas Grueninger <andreas.grueninger@noemail.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2022-03-08 15:32:07 +01:00
Jan Friesse
59d3303517 totempg: Fix alignment handling
Some platforms require aligned memory access. For such platforms,
special code was added using address modulo 4 to check if aligning is
needed or not. This may be a problem for 64-bit platforms. Also, the
check in app_deliver_fn was incorrect and always true.

Solution is to use modulo sizeof pointer and add parentheses to fix the
check in app_deliver_fn function.
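
Both points can be sketched in C. The exact expression from app_deliver_fn
is not shown in this message, so the unparenthesized form below is only a
hypothetical illustration of how such a check can end up always true; all
function names are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Non-zero when p is not aligned to the size of a pointer. */
static int is_misaligned(const void *p)
{
	assert(p != NULL);
	return ((uintptr_t)p % sizeof(void *)) != 0;
}

/* Hypothetical buggy form: % binds tighter than +, so this evaluates
 * base + (offset % sizeof(void *)) and is true for any non-zero base. */
static int misaligned_no_parens(uintptr_t base, size_t offset)
{
	return base + offset % sizeof(void *) != 0;
}

/* Parenthesized form: the sum is reduced modulo the pointer size. */
static int misaligned_with_parens(uintptr_t base, size_t offset)
{
	return (base + offset) % sizeof(void *) != 0;
}
```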

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2022-02-01 10:34:28 +01:00
Jan Friesse
ada1cfa021 pkgconfig: Export corosysconfdir
Useful for external code to easily tell where corosync.conf
is (in case someone configured it for /usr/local/etc, ...)

E.g. pacemaker's crm_report collects corosync.conf, and some
of its testing tools generate a corosync.conf for a test cluster.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2022-01-05 14:06:44 +01:00
Ferenc Wágner
6210a89314 Remove bashism from configure script
This was the real problem behind 384d168: Debian experimental now
sports a dash with LINENO support, so configure does not fall back to
using bash instead, choking on such bash-only constructs.  Unfortunately
this didn't bail out cleanly, just unexpectedly set link_all_deplibs to
no, and the error message

./configure: 13158: test: yes: unexpected operator

stayed unnoticed in the logs.  Actually, link_all_deplibs=no is the
default in Debian, reducing overlinking and causing confusion overall,
see https://debbugs.gnu.org/db/13/13920.html for example.

I think being explicit about used interfaces has its merit, so now that
Corosync has it, it might be advantageous to disable link_all_deplibs
by default across the board (after this patch re-enables it as a side
effect).

Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2022-01-03 11:01:48 +01:00
Jan Friesse
8b638e989c totemudpu: Don't block local socketpair
The commit to drop packets from unlisted IPs broke the ifdown case
because msg_name is unset for socketpair.

Solution is to drop packets from unlisted IPs only when bind state is
BIND_STATE_REGULAR.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2021-11-24 09:22:49 +01:00
Jan Friesse
384d168b0e build: Add explicit dependency for used libraries
Don't rely on implicit symbol finding (cs_strerror being most prominent
example) but rather use explicit one.

This makes current debian experimental happy (compile source)

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2021-11-10 08:24:20 +01:00
Jan Friesse
e7a82370a7 totemsrp: Switch totempg buffers at the right time
Commit 92e0f9c7bb added switching of
totempg buffers in sync phase. But because buffers got switched too early
there was a problem when delivering recovered messages (messages got
corrupted and/or lost). Solution is to switch buffers after recovered
messages got delivered.

I think it is worth describing the complete history with reproducers so
it doesn't get lost.

It all started with 402638929e (more info
about the original problem is described in
https://bugzilla.redhat.com/show_bug.cgi?id=820821). That patch
solves a problem which can be reproduced with the following reproducer:
- 2 nodes
- Both nodes running corosync and testcpg
- Pause node 1 (SIGSTOP of corosync)
- On node 1, send some messages by testcpg
  (it's not answering but this doesn't matter; simply hitting the ENTER
  key a few times is enough)
- Wait till node 2 detects that node 1 left
- Unpause node 1 (SIGCONT of corosync)

and on node 1 newly mcasted cpg messages got sent before sync barrier,
so node 2 logs "Unknown node -> we will not deliver message".

Solution was to add switch of totemsrp new messages buffer.

This patch was not enough so new one
(92e0f9c7bb) was created. Reproducer of
problem was similar, just cpgverify was used instead of testcpg.
Occasionally when node 1 was unpaused it hung in the sync phase because
there was a partial message in totempg buffers. New sync message had
different frag cont so it was thrown away and never delivered.

After many years, a problem was found which is solved by this patch
(original issue described in
https://github.com/corosync/corosync/issues/660).
Reproducer is more complex:
- 2 nodes
- Node 1 is rate-limited (used script on the hypervisor side):
  ```
  iface=tapXXXX
  # ~0.1MB/s in bit/s
  rate=838856
  # 1mb/s
  burst=1048576
  tc qdisc add dev $iface root handle 1: htb default 1
  tc class add dev $iface parent 1: classid 1:1 htb rate ${rate}bps \
    burst ${burst}b
  tc qdisc add dev $iface handle ffff: ingress
  tc filter add dev $iface parent ffff: prio 50 basic police rate \
    ${rate}bps burst ${burst}b mtu 64kb "drop"
  ```
- Node 2 is running corosync and cpgverify
- Node 1 keeps restarting corosync and running cpgverify in a cycle
  - Console 1: while true; do corosync; sleep 20; \
      kill $(pidof corosync); sleep 20; done
  - Console 2: while true; do ./cpgverify;done

And from time to time (reproduced usually in less than 5 minutes)
cpgverify reports corrupted message.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
2021-11-03 10:19:44 +01:00
Christine Caulfield
f6f6f41a87 cpghum: Allow to continue if corosync is restarted
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2021-10-25 11:41:34 +02:00
miharahiro
d5b53fd227 man: Fix consensus timeout
The consensus timeout is 1.2 * token_timeout. The token timeout
has been changed from 1000 to 3000, so also change the consensus
timeout.
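
The arithmetic can be sketched as follows. The function name is
hypothetical, and integer math (12/10) is used here instead of a
floating-point 1.2 to avoid rounding surprises.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: consensus timeout as 1.2 * token timeout (ms). */
static uint32_t consensus_from_token(uint32_t token_ms)
{
	assert(token_ms > 0);
	/* Widen before multiplying so the product cannot overflow. */
	return (uint32_t)(((uint64_t)token_ms * 12) / 10);
}
```

With the old token timeout of 1000 this gives 1200, and with the new
value of 3000 it gives 3600.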

Signed-off-by: miharahiro <hmihara@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
2021-10-18 14:21:45 +02:00
Jan Friesse
60dbacaeb4 logsys: Unlock config mutex on error
Thanks Ryan Cai <ycaibb@gmail.com> for reporting the problem.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2021-09-13 09:13:54 +02:00
Jan Friesse
cdf72925db totem: Add cancel_hold_on_retransmit config option
Previously, existence of retransmit messages canceled holding
of token (and never allowed representative to enter token hold
state).

This makes the token rotate at maximum speed and keeps the processor
resending messages over and over again, overloading the network
and reducing the chance of successfully delivering the messages.

Also there were reports of various Antivirus / IPS / IDS which slow
down delivery of packets with certain sizes (packets bigger than token),
which makes Corosync retransmit messages over and over again.

Proposed solution is to allow representative to enter token hold
state when there are only retransmit messages. This allows network to
handle overload and/or gives Antivirus/IPS/IDS enough time to scan and
deliver packets without corosync entering "FAILED TO RECEIVE" state and
adding more load to network.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2021-08-20 16:55:48 +02:00