mirror_corosync

mirror of https://git.proxmox.com/git/mirror_corosync synced 2025-10-04 00:31:15 +00:00

Author	SHA1	Message	Date
Christine Caulfield	4e683699b9	rust: Update to latest standards Updating to Rust 2021 is a no-op (but worth doing for future), I've also taken this opportunity to use the latest bitflags crate. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2024-10-31 11:00:35 +01:00
Christine Caulfield	14a5e6f361	totemsrp: Fix orf_token stats Previously, orf_token_tx was only incremented on initial send, this is obviously wrong and resulted in the TX count being significantly lower than any RX count. Now we increment it every time the ORF token is sent or resent. As a quick test, on a single node system the RX and TX stats will now match. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2024-10-31 10:58:02 +01:00
Jan Friesse	749f1cb9a5	totem: Use uint64_t type and QB_TIME_NS_IN_MSEC Function message_handler_orf_token contains extra debug info enabled by defining GIVEINFO. Instead of using long long unsigned int use better suited uint64_t and make use of QB_TIME_NS_IN_MSEC constant instead of hardcoded number. Also compile tv_old conditionally so it is not used by accident. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2024-10-23 16:03:14 +02:00
Jan Friesse	55a6f657f4	totem: Use proper timestamp type for token warning Timestamp diff is very unlikely to be larger than 32-bit integer but it is still worth to use 64-bit. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2024-10-23 16:03:01 +02:00
Jan Friesse	3785829935	stats: Store token rx and tx timestamps as 64-bit Token rx and tx timestamps were computed and stored as 32-bit unsigned integer but subtracted in other parts of code from 64-bit integer. Result was, that node with uptime larger than 49.71 days (2^32/(1000*60*60*24)) reported wrong numbers for stats.srp.time_since_token_last_received and in log message during long pause (function timer_function_orf_token_warning). Solution is to store rx and tx data as 64-bit integer. Fixes #761 Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2024-10-23 16:02:50 +02:00
Christine Caulfield	8b9d5e7051	rust: fix clippy warning in rust 1.81 Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2024-09-19 11:51:03 +02:00
Jan Friesse	02d64060c1	coroparse: Free kv_item key and value on failure If strdup of kv_item key or value failed only kv_item itself was freed. Free also key and value (kv_item is zeroed so free of NULL variable is safe). Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2024-09-02 17:01:38 +02:00
Jan Friesse	2f19853bf4	icmap: Free memory if qb_map_notify_add fails Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2024-09-02 17:01:29 +02:00
Jan Friesse	b71b8f9dbf	cfg: Free new_config interfaces on failure new_config interfaces was freed on success, but not if some previous configuration step failed. Solution is to move free of interfaces to same point as where orig_interfaces are freed. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2024-09-02 17:00:42 +02:00
Alexander Aring	53730fa7bd	main: support lock pid file arg This patch adds support to change the default corosync pid file lock path. This is useful to run corosync net namespace environment only and since the pid lock file cannot be clarified over the conf because the pid lock file exists before config parsing we allow the user to specify it over the command line. Signed-off-by: Alexander Aring <aahringo@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2024-08-07 16:36:00 +02:00
Jan Friesse	9bcde28dbe	man: fix a typo in cpg_model_initialize Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2024-08-05 10:29:01 +02:00
Jan Friesse	64c83682d3	man: Improve quorum provider formatting As suggested by Christine Caulfield split long sentence so now paragraph follows same formatting style as other options hopefully making it less confusing. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2024-08-05 09:52:16 +02:00
Christine Caulfield	b98248d9a5	rust: tests return errors and don't hang Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2024-07-04 10:57:04 +02:00
Christine Caulfield	58d654261a	rust: Improve Rust bindings The big change here is that all API calls now take a &Handle rather than making a copy. This, apart from being more sensible and efficient, allows us to implement Drop on the handle so that it will call _free() when it goes out of scope. There's some jiggery-pokery with a clone flag in there now because of callbacks that can return a valid handle, and we want those to be Drop'ed sensibly. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2024-07-04 10:56:58 +02:00
Ferenc Wágner	6009018151	Move corosync-notifyd policy file into $(datadir)/dbus-1/system.d As per dbus-daemon(1): > Third-party packages would historically install XML files into > /etc/dbus-1/system.d, but this practice is now considered to be > deprecated: that directory should be treated as reserved for the > system administrator. Signed-off-by: Ferenc Wágner <wferi@debian.org> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2024-05-28 17:03:09 +02:00
xin liang	73ffbc385f	man: corosync.conf: Multi improvements - Add default value 5405 for mcastport - Add brief introduction for UDP/UDPU/KNET transport - Keep format consistent (use uppercase) for above 3 transport types Signed-off-by: xin liang <xliang@suse.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2024-05-02 15:40:33 +02:00
Jan Friesse	c01fd757a0	totem: Fix reference links Link Corosync project archived copy of Yair Amir's PhD thesis and paper about totem protocol. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2024-03-12 17:22:42 +01:00
Christine Caulfield	ce03c68394	Report crypto errors back to cfg reload Because crypto changing happens in the 'commit' phase of the reload and we can't be sure that knet will allow the new parameters, the result gets ignored. This can happen in FIPS mode if a non-FIPS cipher is requested. This patch reports the errors back in a cmap key so that the command-line can spot those errors and report them back to the user. It also restores the internal values for crypto so that subsequent attempts to change things have predictable results. Otherwise further attempts can do nothing but not report any errors back. I've also added some error reporting back for the knet ping counters using this mechanism. The alternative to all of this would be to check for FIPS in totemconfig.c and then exclude certain options, but this would be duplicating code that could easily get out of sync. This system could also be a useful mechanism for reporting back other 'impossible' errors. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2024-02-05 15:20:08 +01:00
Christine Caulfield	8d46eb0127	Fix up the library .versions files I've only added missing symbols and removed old ones. The actual library version numbers might need assessing too. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2024-01-02 15:32:29 +01:00
Jan Friesse	2fcda76b96	configure: Fix building of rust for release Set rustver correctly for both release version string (for example 3.1.7) and git one (3.1.7.1-982f). corosyncrustver must be escaped by '[]' because sed is using these two characters and m4 would remove them. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2023-11-28 16:03:14 +01:00
Jan Friesse	982ff8d818	License: Fix year (mainly to fix rust building) Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2023-11-20 14:56:58 +01:00
Machiry Aravind Kumar	40e08b219d	Handling integer overflow issues Avoiding signed integer overflows by converting size related types to size_t. Signed-off-by: Machiry Aravind Kumar <makrvcs@gmail.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2023-11-06 16:48:08 +01:00
Christine Caulfield	6a6c7ab02f	rust: Improve vector initialisation (also silence clippy in rust 1.73) Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2023-10-09 15:04:52 +02:00
Christine Caulfield	0ee1d6cddb	man: Update the corosync_overview manpage The bits about IPv6 were out of date (for knet). Added reference to the corosync-*tool utilities so that people know they are there Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2023-10-06 11:32:40 +02:00
Christine Caulfield	33fa5dcb85	config: Fail to start if ping timers are invalid This required adding a lot of return values to two previously 'void' functions. I did two rather than just the one that was needed because it seemed to make sense to do them both together. Although these functions now return errors, they are probably still ignored higher up. This really needs a comprehensive audit. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2023-10-05 15:53:55 +02:00
Christine Caulfield	9aaac85b8d	rust: Remove some pointless casts As pointed out by clippy in Rust 1.72 Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2023-08-30 17:31:25 +02:00
Christine Caulfield	77d9ea3ca1	parser: Allow a non-breaking space as 'whitespace' non-breaking spaces are depressingly easy to enter in some editors and can make a mess of a corosync.conf file, as the character can break keyword names and generate some very strange error messages. So here we include it (0xA0) as a valid whitespace character. The (unsigned char) cast is for portability - Intel systems use signed chars so we'd need something there, but this should protect us against unsigned char systems too. No attempt is made to protect against UTF-8 characters, that's very much out of scope for this project I suspect. ref: https://github.com/corosync/corosync/issues/723 Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2023-08-09 09:48:21 +02:00
Jan Friesse	149c64725f	spec: Migrate to SPDX license Both Fedora and openSUSE now recommends to use SPDX shortname format for License. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2023-06-06 16:28:16 +02:00
Jan Friesse	a93e2aa363	build: Fix rust make -j build dep for distcheck "Inspired" by similar patch from kronosnet (531ebe195a955d9a1c8b762443ecab3edca95ad4) Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2023-05-29 11:15:16 +02:00
Jan Friesse	b47c5197ea	rust: Remove tests from check scripts Rust test are equivalent of C tests (so interactive one) and not automated tests, so it shouldn't be executed by make check. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2023-05-23 08:47:18 +02:00
Christine Caulfield	f4ff07eba3	Rust: Remove obsolete bindgen flag --size_t-is-usize has been deprecated for a while and is removed in bindgen 0.64 Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2023-04-24 09:27:32 +02:00
Christine Caulfield	3e4eba6548	knet: use knet TRACE logging level if available Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2023-04-03 16:13:23 +02:00
Christine Caulfield	f68a2d1c85	Rust: 'fix' clippys for Rust 1.67 This is clippy getting a bit above itself IMHO Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2023-01-27 17:16:48 +01:00
Christine Caulfield	846f3d13c6	rust: Make it work on FreeBSD Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2023-01-26 17:03:43 +01:00
Christine Caulfield	f34052d78e	bindings: Add Rust bindings Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2023-01-26 17:03:36 +01:00
Jan Friesse	91348f8659	totemconfig: Add support for knet_mtu totem.knet_mtu is new configuration option which allows setting of automatic or manual knet MTU. Also reload of totem.knet_pmtud_interval is fixed now, so it works when key is deleted (and set back default value). Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2022-10-24 16:57:27 +02:00
Jan Friesse	c78e159267	configure: Modernize configure.ac a bit ... to make 2.71 happy. Also increase minimum version to 2.69 (10 years old version so should be compatible enough). Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2022-07-27 09:06:22 +02:00
Christine Caulfield	7b96a937df	log: Configure knet logging to the same as corosync Before this, all knet messages, including debug, were sent over the pipe from knet to corosync and filtered in corosync. This was obviously a waste, so now we tell knet the logging level we need from it and so only get the messages that the user has requested. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2022-03-31 17:29:20 +02:00
Jan Friesse	04362046c4	logrotate: Use copytruncate method by default The reopen logrotate method has two main problems: 1. It does fail when corosync is not running (solvable by adding "|| true") 2. If (for some reason, like SELinux) cfgtool -L fails, logrotate fails and corosync keeps logging into old file. Added "|| true" makes situation even worse because logrotate removes file but corosync keeps logging into it. Solution is to install copytruncate logrotate snip by default (and keep reopen config file only for reference). Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2022-03-30 16:18:36 +02:00
Andreas Grueninger	1038e4a18f	totemconfig: Check uname return value correctly uname in Solaris/Illumos returns non-negative value when successful. Signed-off-by: Andreas Grueninger <andreas.grueninger@noemail.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2022-03-08 15:32:07 +01:00
Jan Friesse	59d3303517	totempg: Fix alignment handling Some platforms require aligned memory access. For such platforms, special code was added using address modulo 4 to check if aligning is needed or not. This may be a problem for 64-bit platforms. Also check in app_deliver_fn was incorrect and always true. Solution is to use modulo sizeof pointer and add parentheses to fix the check in app_deliver_fn function. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2022-02-01 10:34:28 +01:00
Jan Friesse	ada1cfa021	pkgconfig: Export corosysconfdir Useful for external code to easily tell where corosync.conf is (in case someone configured it for /usr/local/etc, ...) E.g. pacemaker's crm_report collects corosync.conf, and some of its testing tools generate a corosync.conf for a test cluster. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2022-01-05 14:06:44 +01:00
Ferenc Wágner	6210a89314	Remove bashism from configure script This was the real problem behind `384d168`: Debian experimental now sports a dash with LINENO support, so configure does not fall back to using bash instead, choking on such bash-only constructs. Unfortunately this didn't bail out cleanly, just unexpectedly set link_all_deplibs to no, and the error message ./configure: 13158: test: yes: unexpected operator stayed unnoticed in the logs. Actually, link_all_deplibs=no is the default in Debian, reducing overlinking and causing confusion overall, see https://debbugs.gnu.org/db/13/13920.html for example. I think being explicit about used interfaces has its merit, so now that Corosync has it, it might be advantageous to disable link_all_deplibs by default across the board (after this patch re-enables it as a side effect). Signed-off-by: Ferenc Wágner <wferi@debian.org> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2022-01-03 11:01:48 +01:00
Jan Friesse	8b638e989c	totemudpu: Don't block local socketpair Commit to drop packets from unlisted IPs made ifdown case not working because msg_name is unset for socketpair. Solution is to drop packets from unlisted IPs only when bind state is BIND_STATE_REGULAR. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2021-11-24 09:22:49 +01:00
Jan Friesse	384d168b0e	build: Add explicit dependency for used libraries Don't rely on implicit symbol finding (cs_strerror being most prominent example) but rather use explicit one. This makes current debian experimental happy (compile source) Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2021-11-10 08:24:20 +01:00
Jan Friesse	e7a82370a7	totemsrp: Switch totempg buffers at the right time

Commit `92e0f9c7bb` added switching of totempg buffers in sync phase. But because buffers got switch too early there was a problem when delivering recovered messages (messages got corrupted and/or lost). Solution is to switch buffers after recovered messages got delivered.

I think it is worth to describe complete history with reproducers so it doesn't get lost.

It all started with `402638929e` (more info about original problem is described in https://bugzilla.redhat.com/show_bug.cgi?id=820821). This patch solves problem which is way to be reproduced with following reproducer:
- 2 nodes
- Both nodes running corosync and testcpg
- Pause node 1 (SIGSTOP of corosync)
- On node 1, send some messages by testcpg (it's not answering but this doesn't matter). Simply hit ENTER key few times is enough)
- Wait till node 2 detects that node 1 left
- Unpause node 1 (SIGCONT of corosync)

and on node 1 newly mcasted cpg messages got sent before sync barrier, so node 2 logs "Unknown node -> we will not deliver message".

Solution was to add switch of totemsrp new messages buffer.

This patch was not enough so new one (`92e0f9c7bb`) was created. Reproducer of problem was similar, just cpgverify was used instead of testcpg. Occasionally when node 1 was unpaused it hang in sync phase because there was a partial message in totempg buffers. New sync message had different frag cont so it was thrown away and never delivered.

After many years problem was found which is solved by this patch (original issue describe in https://github.com/corosync/corosync/issues/660). Reproducer is more complex:
- 2 nodes
- Node 1 is rate-limited (used script on the hypervisor side):
```
iface=tapXXXX
# ~0.1MB/s in bit/s
rate=838856
# 1mb/s
burst=1048576
tc qdisc add dev $iface root handle 1: htb default 1
tc class add dev $iface parent 1: classid 1:1 htb rate ${rate}bps \
  burst ${burst}b
tc qdisc add dev $iface handle ffff: ingress
tc filter add dev $iface parent ffff: prio 50 basic police rate \
  ${rate}bps burst ${burst}b mtu 64kb "drop"
```
- Node 2 is running corosync and cpgverify
- Node 1 keeps restarting of corosync and running cpgverify in cycle
  - Console 1: while true; do corosync; sleep 20; \
    kill $(pidof corosync); sleep 20; done
  - Console 2: while true; do ./cpgverify;done

And from time to time (reproduced usually in less than 5 minutes) cpgverify reports corrupted message.

Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2021-11-03 10:19:44 +01:00
Christine Caulfield	f6f6f41a87	cpghum: Allow to continue if corosync is restarted Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2021-10-25 11:41:34 +02:00
miharahiro	d5b53fd227	man: Fix consensus timeout The consensus timeout is 1.2 * token_timeout, which has been changed from 1000 to 3000, so change also consensus timeout. Signed-off-by: miharahiro <hmihara@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2021-10-18 14:21:45 +02:00
Jan Friesse	60dbacaeb4	logsys: Unlock config mutex on error Thanks Ryan Cai <ycaibb@gmail.com> for reporting the problem. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2021-09-13 09:13:54 +02:00
Jan Friesse	cdf72925db	totem: Add cancel_hold_on_retransmit config option Previously, existence of retransmit messages canceled holding of token (and never allowed representative to enter token hold state). This makes token rotating maximum speed and keeps processor resending messages over and over again - overloading network and reducing chance to successfully deliver the messages. Also there were reports of various Antivirus / IPS / IDS which slows down delivery of packets with certain sizes (packets bigger than token) what make Corosync retransmit messages over and over again. Proposed solution is to allow representative to enter token hold state when there are only retransmit messages. This allows network to handle overload and/or gives Antivirus/IPS/IDS enough time scan and deliver packets without corosync entering "FAILED TO RECEIVE" state and adding more load to network. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2021-08-20 16:55:48 +02:00
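
The 49.71-day figure in commit 3785829935 follows from a 32-bit millisecond counter wrapping after 2^32 ms. The following is a minimal sketch of that arithmetic and of the truncation pattern the commit message describes. It is not the corosync code: the function names and the `MS_PER_DAY` macro are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Milliseconds per day: 1000 * 60 * 60 * 24. */
#define MS_PER_DAY UINT64_C(86400000)

/* Days until a 32-bit millisecond counter wraps: 2^32 / MS_PER_DAY. */
static double days_until_wrap32(void)
{
	return 4294967296.0 / (double)MS_PER_DAY;
}

/*
 * Hypothetical illustration of the buggy pattern: the timestamp is
 * stored in 32 bits and later subtracted from a 64-bit clock value.
 */
static uint64_t since_token_truncated(uint64_t now_ms, uint64_t token_ms)
{
	uint32_t stored = (uint32_t)token_ms;	/* high bits are lost here */

	return now_ms - stored;
}

/* Hypothetical illustration of the fix: 64 bits end to end. */
static uint64_t since_token_full(uint64_t now_ms, uint64_t token_ms)
{
	assert(now_ms >= token_ms);
	return now_ms - token_ms;
}
```

With `token_ms` set to 2^32 + 5000 and `now_ms` one second later, `since_token_full` returns 1000, while `since_token_truncated` returns 4294968296, which is the kind of wrong value a node past the wrap point would report.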

4263 Commits
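
Commit 77d9ea3ca1 treats the non-breaking space (0xA0) as whitespace and mentions an `(unsigned char)` cast for portability. A rough sketch of why the cast matters is shown here, assuming a helper named `is_conf_space` and a macro `NBSP`, both hypothetical and not taken from the corosync parser. Where `char` is signed, the byte 0xA0 reads back as a negative value, so comparing it with 0xA0 directly would never match, and passing a negative value to `isspace` is undefined.

```c
#include <ctype.h>

/* Non-breaking space in Latin-1 style single-byte encodings. */
#define NBSP 0xA0

/*
 * Hypothetical helper: returns 1 if c is ordinary whitespace or a
 * non-breaking space, 0 otherwise. The cast makes the comparison and
 * the isspace() call behave the same on signed-char and
 * unsigned-char platforms.
 */
static int is_conf_space(char c)
{
	unsigned char uc = (unsigned char)c;

	return uc == NBSP || isspace(uc) != 0;
}
```

As in the commit message, this makes no attempt to handle multi-byte UTF-8 sequences.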