mirror_corosync-qdevice

mirror of https://git.proxmox.com/git/mirror_corosync-qdevice synced 2025-04-28 14:44:08 +00:00

Author	SHA1	Message	Date
Jan Friesse	df3c6722b3	qnetd: Don't alloc host_addr getopt will return pointer to argv so there is no need to dup optarg. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-23 14:18:41 +01:00
Jan Friesse	28d49141f8	qnetd: Move client schedule disconnect handling Client disconnect used to be per client fd in the qnetd_client_net_socket_poll_loop_set_events_cb. Problem is, that disconnect calls algorithm which may send message to other client with fd which was already processed in the pr-poll-loop so POLLOUT is not set till new loop exec is called (and that usually happens because old one timeouts). To reproduce this problem use ffsplit and make qnetd disconnect one of the clients - ffsplit needs to send ack/nack votes, but it doesn't send them during first iteration and waits for dpd timeout. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-18 17:58:34 +01:00
Jan Friesse	72f9388083	qnetd-algo-ffsplit: Simplify KAP Tie-breaker logic Also make it more reliable. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-18 15:05:21 +01:00
Jan Friesse	a8b7513df9	qnetd: Improve dead peer detection Previously dead peer detection timer was scheduled every dpd_interval, added dpd_interval to all of the clients timestamp and if timestamp was larger than client heartbeat interval * 1.2 then check if client sent some message. If so, flag was reset. This method was source of number of problems so instead different method is now used. Every single client has its own timer with timeout based on (configurable) dpd_interval_coefficient and multiplied with client heartbeat timeout. When message is received from client timer is rescheduled. When timer callback is called (= client didn't send message during timeout) then client is disconnected. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-18 15:05:21 +01:00
Jan Friesse	8211cf2394	qnet-config: Add space to string concat Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-13 16:43:37 +01:00
Jan Friesse	0360d14b49	timer-list: Add functions for get and set interval Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-13 16:43:37 +01:00
Jan Friesse	af4f8826dc	timer-list: Rename delete and reschedule ops Add entry to the name so it is more evident change is happening to the entry and not to the list. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-13 16:43:37 +01:00
Jan Friesse	b9685e4860	utils: Add utils_strtod Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-13 16:34:03 +01:00
Jan Friesse	d99c195fc5	qdevice: Handle configurations without ring0_addr Configuration without ring0_addr is valid for new Corosync. Big thanks to Fabian-Gruenbichler who reported the problem and Oyvind Albrigtsen for englishify the error message. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-13 16:01:24 +01:00
Jan Friesse	897a725ad9	qdevice: Configuration without nodelist is invalid Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-13 16:01:24 +01:00
Jan Friesse	154ded5d46	qdevice-cmap: Load clear node high bit only once Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-13 16:01:24 +01:00
Jan Friesse	e4bdad3cbb	qdevice-cmap: Fix clear high node bit typo It's really bit, not byte. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-13 16:01:23 +01:00
Jan Friesse	5dd8096414	qdevice-net-ipc-cmd: Fix compiler warning Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-11 17:28:04 +01:00
Fabio M. Di Nitto	cc7f23cf11	devel: add corosync-qdevice.pc file for pcs to use add corosync-qdevice-devel package for future Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2020-11-11 17:22:49 +01:00
Jan Friesse	2b18b0bc14	tests: Fix assert problems Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-05 18:19:36 +01:00
Jan Friesse	f6bc0ceb1c	timer-list: Improve efficiency of delete operation Position in entries array, heap_pos, is added to the entry. This has to be kept in sync for every move so new internal set/get functions are added too. This removes need for searching for entry. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-05 18:19:35 +01:00
Jan Friesse	03a257fa01	test-timer-list: Ignore poll errors Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-04 17:52:04 +01:00
Jan Friesse	6c7d38ad99	tlv: Check dynar_cat result Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-04 17:52:04 +01:00
Jan Friesse	3469f01f1d	test-process-list: Fix few bugs Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-04 17:52:04 +01:00
Jan Friesse	63544ecc6d	msg: Check cat result on adding msg type and size Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-04 17:52:04 +01:00
Jan Friesse	ba5a711d69	timer-list: Implement heap based timer-list Previous timer-list was naive implementation of priority queue and very slow when number of timers increased. This was not a problem because only few timers were used. But with removal of dpd timer and replacement with per-connection timer this may become problematic. Solution is to use binary heap based priority queue which is much faster. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-04 17:52:04 +01:00
Jan Friesse	77b6a19678	pr-poll-loop: Add queue header include Also add same includes to qnetd-algo-utils header file. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-11-04 17:52:04 +01:00
Jan Friesse	3f76ace659	qdevice-ipc: Fix dereference bug Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-10-27 17:22:46 +01:00
Jan Friesse	fbc34f3b05	qnet: Add support for keep active partition vote This patch adds qdevice-net part of keep active partition tie breaker functionality. It's enabled by default. When tie happens prefer partition with members of previously active (quorate) partition. This is hard-coded behavior of LMS algorithm so this setting affects only FFSplit algorithm. By default it is disabled for backwards compatibility. This solves problem with FFSplit when node A (with lowest id) is killed, node B gets vote and then node A starts up and creates single node membership and gets vote. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-10-27 17:22:46 +01:00
Jan Friesse	09c6f78864	qnetd: Fix NULL dereference of client Shouldn't happen but be rather safe. Also add more comments. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-10-27 17:22:46 +01:00
Jan Friesse	0013607e4b	qdevice-net-heuristics: Fix log message Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-10-27 17:22:46 +01:00
Jan Friesse	71329dbceb	qdevice: Fix set option and set option reply To match the specification add heartbeat timeout only when requested. Also add qdevice client method to send option message. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-10-27 17:22:46 +01:00
Jan Friesse	a371519328	LICENSE: Update copyright date Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-09-22 13:37:10 +02:00
Jan Friesse	7a0201a5c6	qnetd: Add support for keep active partition vote When tie happens prefer partition with members of previously active (quorate) partition. This is hard-coded behavior of LMS algorithm so this setting affects only FFSplit algorithm. By default it is disabled for backwards compatibility. This solves problem with FFSplit when node A (with lowest id) is killed, node B gets vote and then node A starts up and creates single node membership and gets vote. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-09-22 13:35:55 +02:00
Jan Friesse	c2007cf2ea	README: Fix typos Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-09-21 16:35:32 +02:00
Jan Friesse	fa36d6791f	timer-list: Add test Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-09-21 16:35:32 +02:00
Jan Friesse	dc244ea404	timer-list: Return error on adding NULL callback Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-09-21 16:35:32 +02:00
Jan Friesse	11a861c93f	qnetd: Fix dpd timer With default config of running dpd timer every 10 second and waiting for 2 * client_timeout to clear message received flag and then waiting another 2 * client_timeout without message received it was possible that client was marked as a dead after more than 40 seconds making qdevice to stop sending votequorum heartbeat for too long so corosync lost votes from qdevice. This patch is simpler solution which just changes default dpd timer to 1 second and timeout to 1.2 * client_timeout. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-09-21 16:35:32 +02:00
Jan Friesse	3db05bedf6	qdevice-votequorum: Fix typo in log message Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-09-21 16:35:32 +02:00
Jan Friesse	8217e33e86	qdevice: Port qdevice to use pr-poll-loop Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-08-31 17:04:49 +02:00
Jan Friesse	d53a5b2961	qdevice-net: Log adds newline automatically Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-08-31 17:04:49 +02:00
Jan Friesse	3bbf28b368	qnetd: Return error code based on ipc closed So restore pre pr-poll-loop behavior. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-08-31 17:04:49 +02:00
Jan Friesse	07a7e8a7ab	pr-poll-loop: Fix set_events_cb return code When events is set to 0 and set_events return -2 it was changed to -1. Solution is to check, if return code was 0 and only if so, change return code to -1 if events is 0. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-08-31 17:04:49 +02:00
Jan Friesse	6bf5f6c011	qdevice: Fix connect heuristics result callback Previous patch `8dbf1bc8b0` was wrong because it fixed the crash but made qdevice not work at all. Correct solution is to test, if state is QDEVICE_NET_INSTANCE_STATE_WAITING_INIT_REPLY. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-08-28 08:57:15 +02:00
Jan Friesse	ea6d7a909d	pr-poll-loop: Add pre poll callbacks Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-08-28 08:57:15 +02:00
Jan Friesse	01a63aae28	pr-poll-loop: Pass PRPollDesc for prfd events Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-08-28 08:57:15 +02:00
Jan Friesse	d0cdeff06a	pr-poll-loop: Add support for PR_POLL_EXCEPT Map PR_POLL_EXCEPT to POLLPRI (as NSPR does). Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-08-28 08:57:15 +02:00
Jan Friesse	1ad070d8a9	qnetd: Move pr_poll_loop_exec call to function Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-08-28 08:57:15 +02:00
Jan Friesse	292f7dd2f5	qnetd: Log pr_poll_loop_add,del errors properly Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-08-25 18:01:18 +02:00
Jan Friesse	b42bc20d3c	qnetd: Remove unneeded pprio include Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-08-25 18:01:18 +02:00
Jan Friesse	8144b162f0	qnetd: Remove write callback on listening sockets IPC and TLS sockets are read only, so write callbacks should never happen (specifically tested in pr-poll-loop test) so remove them. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-08-25 18:01:18 +02:00
Jan Friesse	7a4e9c59ee	qdevice: Initial port to use pr-poll-loop Only qdevice_instance_wait_for_initial_heuristics_exec_result is ported for now. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-08-25 18:01:17 +02:00
Jan Friesse	c1910888ed	pr-poll-loop: Return error code if PR_Poll fails Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-08-25 18:01:17 +02:00
Jan Friesse	ae7d60290f	heuristics: Remove qdevice instance pointer Heuristics is designed to be component of its own, which doesn't depend on qdevice_instance. Removing qdevice_instance pointer was easy as soon as exec notifier got two user data pointers. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-08-25 18:01:17 +02:00
Jan Friesse	e7ef364191	tests: Enhance pr-poll-loop test Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2020-08-19 14:33:01 +02:00
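
Commits ba5a711d69 and f6bc0ceb1c above describe replacing the naive timer-list with a binary heap and storing each entry's array position (heap_pos) in the entry so that delete needs no search. The following is a minimal sketch of that idea only, not the project's code: apart from heap_pos, every name here (timer_entry, timer_heap, heap_set, heap_insert, heap_delete, heap_peek, HEAP_CAPACITY) is hypothetical, and the fixed-size array is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_CAPACITY 64 /* assumed fixed capacity, for illustration only */

/* Hypothetical entry type; only the heap_pos idea comes from the commit log. */
struct timer_entry {
    unsigned long expire_time; /* absolute expiration time, arbitrary units */
    size_t heap_pos;           /* index of this entry in timer_heap.entries */
};

struct timer_heap {
    struct timer_entry *entries[HEAP_CAPACITY];
    size_t size;
};

/* Store entry at index pos and keep its heap_pos in sync on every move. */
void heap_set(struct timer_heap *h, size_t pos, struct timer_entry *e)
{
    h->entries[pos] = e;
    e->heap_pos = pos;
}

/* Move the entry at pos towards the root while it expires before its parent. */
void heap_sift_up(struct timer_heap *h, size_t pos)
{
    struct timer_entry *e = h->entries[pos];

    while (pos > 0) {
        size_t parent = (pos - 1) / 2;

        if (h->entries[parent]->expire_time <= e->expire_time)
            break;
        heap_set(h, pos, h->entries[parent]);
        pos = parent;
    }
    heap_set(h, pos, e);
}

/* Move the entry at pos towards the leaves while a child expires earlier. */
void heap_sift_down(struct timer_heap *h, size_t pos)
{
    struct timer_entry *e = h->entries[pos];

    for (;;) {
        size_t child = 2 * pos + 1;

        if (child >= h->size)
            break;
        if (child + 1 < h->size &&
            h->entries[child + 1]->expire_time < h->entries[child]->expire_time)
            child++;
        if (e->expire_time <= h->entries[child]->expire_time)
            break;
        heap_set(h, pos, h->entries[child]);
        pos = child;
    }
    heap_set(h, pos, e);
}

/* Insert an entry; returns 0 on success, -1 when the heap is full. */
int heap_insert(struct timer_heap *h, struct timer_entry *e)
{
    if (h->size >= HEAP_CAPACITY)
        return -1;
    heap_set(h, h->size, e);
    h->size++;
    heap_sift_up(h, e->heap_pos);
    return 0;
}

/* Delete an entry in O(log n): heap_pos tells us where it is, no search. */
void heap_delete(struct timer_heap *h, struct timer_entry *e)
{
    size_t pos = e->heap_pos;
    struct timer_entry *last;

    assert(pos < h->size && h->entries[pos] == e);
    h->size--;
    if (pos == h->size)
        return; /* removed the last slot, nothing to repair */
    last = h->entries[h->size];
    heap_set(h, pos, last);
    /* The moved entry may violate the heap property in either direction. */
    heap_sift_up(h, pos);
    heap_sift_down(h, last->heap_pos);
}

/* Entry with the earliest expiration, or NULL when the heap is empty. */
struct timer_entry *heap_peek(const struct timer_heap *h)
{
    return h->size > 0 ? h->entries[0] : NULL;
}
```

With one timer per client connection, as introduced by a8b7513df9, rescheduling on every received message becomes a delete followed by an insert, so keeping both operations logarithmic is plausibly what the stored position is for.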


123 Commits