mirror_corosync

mirror of https://git.proxmox.com/git/mirror_corosync synced 2026-01-14 08:56:16 +00:00

Author	SHA1	Message	Date
Jan Friesse	70bd35fc06	config: Process broadcast option consistently Broadcast option is global but in config set in interface section. When more interfaces are defined, only broadcast from last section was used. Solution is to use broadcast whenever at least one interface use broadcast. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-11-24 11:55:37 +01:00
Jan Friesse	6c028d4d9c	config: Make sure user doesn't mix IPv6 and IPv4 Checking code was there, sadly not correct, so it was possible to enter one bindnet addr as IPv4 and second as IPv6. Fix is trivial. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-11-24 11:55:37 +01:00
Jan Friesse	bb52fc2774	Store configuration values used by totem to cmap Some totem configuration values (like token, consensus, ...) are either computed or a default value is used. It's hard to find out which value is really used. Solution is to store values in cmap. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-10-13 11:59:06 +02:00
Jan Friesse	03f95ddaa1	Adjust MTU for IPv6 correctly The IPv6 header is 20 bytes larger than the IPv4 header. This fact was not taken into account, so IPv6 packets were larger than the MTU, resulting in fragmentation. Solution is to subtract the correct IP header size. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-10-01 14:20:21 +02:00
Fabio M. Di Nitto	239e239782	[crypto] fix crypto block rounding/padding calculation libnss is "weird" in this respect as some block sizes are hardcoded, others need to be determined dynamically. For AES we need to use the values we know since GetBlockSize would return errors, for 3des (that hopefully nobody is using) the value returned by GetBlockSize is 8, but let's use the call into libnss to avoid possible conflicts with distro patching or older versions. Now, given the correct block size, the old calculation simply added block size to the hdr_size. This is not sufficient. We use _PAD encryption methods and we need to take that into account. _PAD is calculated given the current input buf len and rounded up to block size boundary, then block_size is added. Ideally we would do that on a per packet base but current transport infrastructure doesn't allow it yet. So round up the hdr_size to double the block_size reported by the cipher. Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-09-06 07:11:56 +02:00
Jan Friesse	2429481b96	totemudpu: Send msgs to all members occasionally To follow spec it's needed to send messages to all nodes (not only active members) from time to time to detect merge. This is needed in situations when totemsrp merge timer isn't running (because there is enough messages sent by processors) to detect merge. Example scenario: - 3 nodes, all of them running cpgverify - One node is isolated (iptables for example) - Node is un-isolated Without this commit, node will not merge as long as the cpgverify is running. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-08-26 15:36:07 +02:00
Jan Friesse	71f1b99649	totemudpu: Implement member_set_active Member active is used for sending "multicast" messages only to members of ring. This reduces network load if some nodes are intentionally down. Only regular multicast message load is reduced (messages sent by totemudpu_mcast_noflush_send), because special messages (like hold cancel, join message, ...) still have to be send to all members to ensure correct behavior. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-08-26 15:36:05 +02:00
Jan Friesse	371a99e961	totemrrp: Implement _membership_changed All _membership_changed calls totemnet_member_set_active passing 1 as active parameter for joined nodes and 0 for left nodes. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-08-26 15:36:02 +02:00
Jan Friesse	4c717942cf	totemnet: Add totemnet_member_set_active totemnet_member_set_active together with transport specific member_set_active makes possible for totemnet (and more interestingly transport) to be informed about membership changes. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-08-26 15:35:59 +02:00
Jan Friesse	acb55cdb03	totem: Inform RRP about membership changes Services are informed about membership changes, but if same information is needed inside totemrrp or totemnet, it's impossible to gather this information. Patch makes this possible for now only for RRP with empty callbacks. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-08-26 15:35:56 +02:00
Christine Caulfield	02f58aec9c	YKD: Fix loading of YKD quorum module Although YKD is currently unsupported, untested and deprecated, it's handy for testing things in the quorum module. This patch allows YKD to actually load without an error. It does not fix anything else in the service! Also remove vsftype and its reference to YKD being the preferred and default provider from the corosync.conf man page, as that hasn't been true for a considerable time. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2014-08-18 09:33:59 +01:00
Christine Caulfield	cbf753405b	votequorum: Add cmap key to reset wait_for_all It's possible in a two_node cluster (and others but it's more likely with just two) that a node could be booted up after downtime or failure and the other node is not available for some reason. In this case it would not be allowed to proceed because wait_for_all is enforced. This patch provides a cmap key to clear this flag in the desperate situation where that becomes necessary. It should only be used with extreme caution and will be wrapped up in pcs which should also check that fencing has been run. Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2014-08-12 16:02:46 +01:00
Jason HU	f135b68096	Cancel token holding while in retransmission When there is no other activity on the ring but only retransmission, and the token is in hold mode, the retransmission becomes slow. Moreover, if the retransmission always fails but token rotation works well, then it takes quite a long time (fail_to_recv_const * token_hold = 2500 * 180ms = 450sec) for the retransmit requester to meet the "FAILED TO RECEIVE" condition to re-construct a new ring. This problem can be solved by checking if retransmits are present before going into hold. If a node is the retransmit requester or the resender, it sets my_token_held to 0 to speed up retransmission and omit further unnecessary sending of the token_hold_cancel signal. Signed-off-by: Jason HU <huzhijiang@gmail.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-08-12 09:28:04 +02:00
Jan Friesse	17488909d4	votequorum: Make qdev timeout in sync configurable Configuration option quorum.device.sync_timeout is available for setting qdevice poll timeout for synchronization phase. Default value is 30 sec. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-08-05 17:22:52 +02:00
Jan Friesse	b4c9934635	votequorum: Block sync until qdevice poll If qdevice is registered and alive, corosync waits in the sync phase until the timeout expires or qdevice votes with the correct nodeid parameter. This gives qdevice time to decide whether to vote or not, undisturbed and without time hazard. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-08-05 17:22:47 +02:00
Jan Friesse	7cad804629	ipc: Process votequorum messages during sync This is needed for qdevice to be able to process messages during synchronization phase. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-08-05 17:22:44 +02:00
Jan Friesse	b8902464d1	votequorum: Add ring id to poll call If votequorum service receives incorrect (not current) ringid, call is ignored and CS_ERR_MESSAGE_ERROR is returned. This and previous commits makes incompatible changes in votequorum API/ABI, so library version is increased. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-08-05 17:22:41 +02:00
Jan Friesse	5f6f68805c	votequorum: Return current ring id in callback Returning ring id will be used in poll function. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-08-05 17:22:37 +02:00
Christine Caulfield	88dbb9f722	totemconfig: Make sure join timeout is less than consensus The thesis contains this paragraph: " The Join timeout is shorter than the Consensus timeout and is used to increase the probability that Join messages from all currently working processors are received during a single round of consensus." Empirically I can confirm that making join not less than consensus can cause havoc with a cluster so I think we should enforce this. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2014-07-25 08:24:02 +01:00
Christine Caulfield	3b8365e806	config: Fix typos Fix several places where 'then' is used instead of 'than' in error messages and a comment. Signed-off-by: Christine Caulfield <ccaulfie@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2014-07-24 10:27:45 +01:00
Jan Friesse	63bf09776f	totemconfig: refactor nodelist_to_interface func Move finding of bindaddr in nodelist to generally usable function totem_config_find_local_addr_in_nodelist and refactor config_convert_nodelist_to_interface function to use it. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2014-07-22 14:59:31 +02:00
Jan Friesse	10c80f454e	totemconfig: totem_config_get_ip_version Add totem_config_get_ip_version to get user configured ip version. Make totem_config_read use this newly introduced function. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2014-07-22 14:59:27 +02:00
Jan Friesse	dc35bfae62	totemconfig: Free ifaddrs list Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2014-07-22 14:59:20 +02:00
Fabio M. Di Nitto	84b9e5989a	be consistent in using CPPFLAGS vs CFLAGS Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2014-07-21 08:47:21 +02:00
Vladislav Bogdanov	e3ffd4fedc	Implement config file testing mode Signed-off-by: Vladislav Bogdanov <bubble@hoster-ok.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2014-07-16 16:10:32 +02:00
Jan Friesse	dfaca4b10a	Fix compiler warning introduced by previous patch QB loop signal handler prototype differs from signal(2) prototype. Solution is to create wrapper functions. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2014-07-09 15:57:35 +02:00
zouyu	384760cb67	Handle SIGSEGV and SIGABRT signals SIGSEGV and SIGABRT signals are now correctly handled (blackbox is dumped and logsys is finalized). Signed-off-by: zouyu <hopkings2005@gmail.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2014-07-03 15:13:48 +02:00
zouyu	cc80c8567d	fix memory leak produced by 'corosync -v' Signed-off-by: zouyu <hopkings2005@gmail.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2014-07-03 14:54:05 +02:00
Jan Friesse	72cf15af27	votequorum: Do not process events during reload During reload, local_node_pos is deleted and its reinstatement is handled in totemconfig after the reload is finished. votequorum handles these events and tries to reload its configuration. This led to logging slightly scary messages (even though nothing bad is happening, because after local_node_pos is reinstated everything is back to normal). Solution is to stop processing events during reload. Sadly, simple tracking of config.reload_in_progress doesn't work because the LibQB event triggering order is undefined, so the votequorum reload handler can be called before totemconfig (and before local_node_pos is reinstated). So a new config.totemconfig_reload_in_progress key is defined with semantics very similar to config.reload_in_progress but set inside the totem_reload_notify function. Votequorum then uses this new key. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-06-27 11:40:21 +02:00
Jan Friesse	c8e3f14fdb	Make config.reload_in_progress key read only It's not very good idea to allow user apps changing internal key reload_in_progress. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-06-27 11:40:18 +02:00
Jan Friesse	4e9716ed30	coroparse: More strict numbers parsing Previous safe_atoi didn't check the range of input values, so if for example the user used -1 as token timeout, it was converted to UINT32_MAX without letting the user know. Another safe_atoi problem was using strtol. This works pretty well on 64-bit systems, where long integer is usually 64 bits long; sadly, on 32-bit systems it is usually 32 bits long. And because strtol returns a signed integer, it was not possible to enter a 32-bit value with the highest bit set. Solution is to use strtoll, which is guaranteed to be at least 64 bits long, and check the value range. The error message now also contains information about the expected value range. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-06-12 14:49:00 +02:00
Jan Friesse	da46ecfc30	Move ringid store and load from totem library Functions for storing and loading ring id was in the totem library. This causes problem, what to do when it's impossible to load or store ring id. Easy solution seemed to be assert, but sadly this makes hard for user to find out what happened (because corosync was just aborted and logsys didn't flush) Solution is to move these functions to main.c, where is much easier to handle error. This also makes libtotem free of any file system operations. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-06-02 14:54:57 +02:00
Jan Friesse	d310b251c3	Introduce get_run_dir function Run dir (LOCALSTATEDIR/lib/corosync) was hardcoded thru whole codebase. Totemsrp was trying to create and chdir into it, but also takes into account environment variable COROSYNC_RUN_DIR creating inconsistency. get_run_dir correctly returns COROSYNC_RUN_DIR (when set) or LOCALSTATEDIR/lib/corosync. This is now used by all functions instead of hardcoded string. All occurrences of mkdir/chdir are removed from totemsrp and chdir is now called in main function. Mkdir call is completely removed, because it was not used anyway (check in main.c was called before totemsrp init, so mkdir was never called) and also make install and/or package system should take care of creating this directory with correct permissions/context. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-06-02 14:53:18 +02:00
Jan Friesse	8f13a98320	logsys: Log warning if flightrecorder init fails Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-06-02 14:36:10 +02:00
Jan Friesse	19c5b63ff5	logsys: Log error if blackbox cannot be created Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-06-02 14:36:08 +02:00
Jan Friesse	e905f92bf5	totemiba: Fix incorrect failed log message rdma_join_multicast failed ... message parameters was swapped. Also information about multicast join is now logged as notice. Signed-off-by: Jan Friesse <jfriesse@redhat.com>	2014-05-15 15:28:51 +02:00
Yevheniy Demchenko	4d6a18d8a5	totemiba: Add multicast recovery Totemiba wasn't able to survive SubnetManager handover or restart. If SM was migrated to another node, corosync logged "multicast error" and losses connectivity. Commit should solve this situation. Signed-off-by: Yevheniy Demchenko <zheka@uvt.cz> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2014-05-14 14:51:07 +02:00
hfu	d0dc9ae93c	Indent: Remove newline before else branch start Signed-off-by: hfu <askfuhu@gmail.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2014-05-09 11:38:02 +02:00
hfu	b6e2c8024d	Indent: Remove space in negation of expression Signed-off-by: hfu <askfuhu@gmail.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2014-05-09 11:37:47 +02:00
Jan Friesse	7557fdec48	config: Allow dynamic change of token_coefficient token_coefficient change in cmap didn't triggered change. So only way how to change token_coefficient was editing config file and reload. Patch let's key totem.token_coefficient to be processed so token_coefficient can be dynamically changed. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-05-07 15:55:26 +02:00
Jan Friesse	58176d6779	Add token_coefficient option Token coefficient is used only when nodelist is specified and contains at least 3 nodes. If so, real token timeout is then computed as token + (number_of_nodes - 2) * token_coefficient. This allows cluster to scale without manually changing token timeout every time new node is added. This value can be set to 0 resulting in effective removal of this feature. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-03-25 15:29:17 +01:00
Jan Friesse	9a8de87c34	totemconfig: Log errors on key change and reload When volatile key was changed (cmap set or reload) and checks fails, nothing was logged. Values are now checked and error string is logged on problems. Also totem_config is dumped to log (DEBUG level) after every volatile key change and every reload. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-03-25 15:29:14 +01:00
Jan Friesse	b95ebd640e	totemconfig: Key change process dependencies When a key with a dependency was changed, dependent keys were not recomputed. A nice example is the consensus timeout. If the token timeout was changed, the consensus timeout was not recomputed correctly (neither via cmap change of key nor via cfg reload). Solution is an almost complete refactor of handling volatile defaults. totem_volatile_config_read now handles not only storing the cmap key to the totem_config structure, but also checking of existence, comparing with zero value and properly storing defaults. totem_set_volatile_defaults is gone. Its function was split into the totem_volatile_config_read and totem_volatile_config_validate functions. Reload callback and change of key callback are now mostly the same functions and both call totem_volatile_config_read. Patch also fixes a small memory leak. The totem.vsftype key has not been used for a long time and the original totem_volatile_config_read wasn't freeing allocated memory returned by icmap_get_string. Whole reading of totem.vsftype is removed. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-03-25 15:29:12 +01:00
Jan Friesse	eeb2384157	Really clear totemconfig nodes on reload When reload was called nodes were constantly added to totemconfig nodelist. So simple corosync-cfgtool -R resulted very quickly in filling whole array and segfault. Solution is to clear member_count. Clearing is also moved directly to put_nodelist_members_to_config to make sure it's always processed. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-03-25 15:29:09 +01:00
Jan Friesse	1b6abcc7d5	Log: Make reload of logging work When reload was called multiple times (~20), logging to file stopped working. Main problem was hidden in the fact, that log file was opened multiple times, because even target_id was shared via subsystem loggers, file name was not. Solution is to ALWAYS set proper log file name into subsystem logger (copy is stored). This will not only fix problem but also removes small leak. Also if filename didn't changed, function can return sooner. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-03-25 15:13:33 +01:00
Jan Friesse	2f0cad20a9	config: Handle totem_set_volatile_defaults errors When totem_set_volatile_defaults is called from totem_config_validate return code is unchecked. It's then perfectly possible to set (for example) join timeout to very small value (1) and consensus value is then set to 0 making corosync unable to create membership. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-03-17 10:04:00 +01:00
Jan Friesse	e1801ba497	votequorum: Properly initialize atb and atb_string icmap_get_* behavior is to NOT modify passed variable when it doesn't success. So we must initialize variable before icmap_get_* call. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2014-02-26 16:59:02 +01:00
Jan Friesse	ff67daa55f	mon: Make monitoring work Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-02-25 14:57:20 +01:00
Jan Friesse	099f704cdd	mon: Pass correct pointer to inst Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-02-25 14:57:16 +01:00
Jan Friesse	57ff693b70	mon: Fix comparison typo Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>	2014-02-25 14:57:13 +01:00
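
Commit 58176d6779 above states the token timeout rule in words: with a nodelist of at least 3 nodes, the real token timeout is token + (number_of_nodes - 2) * token_coefficient, and a coefficient of 0 disables the feature. The following is a minimal illustrative sketch of that arithmetic only, not corosync's implementation. The function name, parameter names, and the millisecond values in the usage note are assumptions made for the example.

```python
def effective_token_timeout(token_ms, node_count, token_coefficient_ms,
                            nodelist_specified=True):
    """Return the token timeout as described in commit 58176d6779.

    Hypothetical helper for illustration; the names are not taken
    from the corosync source.
    """
    # The coefficient applies only when a nodelist is specified
    # and it contains at least 3 nodes.
    if nodelist_specified and node_count >= 3:
        return token_ms + (node_count - 2) * token_coefficient_ms
    # Otherwise the configured token timeout is used unchanged.
    return token_ms
```

For example, with an assumed token of 1000 ms and an assumed coefficient of 650 ms, a 5-node nodelist gives 1000 + 3 * 650 = 2950 ms, while a 2-node cluster stays at 1000 ms.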


1807 Commits