mirror_corosync

mirror of https://git.proxmox.com/git/mirror_corosync synced 2025-07-24 08:41:33 +00:00

Author	SHA1	Message	Date
Russell Bryant	a53e402912	logsys.c: Use snprintf() instead of sprintf(). Change a couple of string functions to use the output length limiting counterpart. Signed-off-by: Russell Bryant <russell@russellbryant.net>	2011-05-08 02:42:47 -05:00
Jan Friesse	61d83cd719	totemsrp: Enhance mcast failure detection memb_state_gather_enter increase stats.continuous_gather only if previous state was gather also. This should happen only if multicast is not working properly (local firewall in most cases) and not if many nodes joins at one time. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-05-05 11:00:26 +02:00
Jan Friesse	719fddd8e1	coroipcs: Deny connect to service without initfn If library connect to service with no init function, coroipcs will try to dereference NULL pointer. Now we correctly return error code CS_ERR_NOT_EXIST. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-04-15 08:56:35 +02:00
Steven Dake	6a752ba1b1	Align ipc on 8 byte boundaries Align all ipc messages on 8 byte boundaries. This alignment will remove bus errors on systems that can't access non-byte aligned data and should improve performance. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-04-14 17:25:08 -07:00
Steven Dake	83f528b473	Fix problem where unaligned totemip address access would result in bus error on non-unaligned-safe architectures. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-04-14 17:22:02 -07:00
Greg Walton	1db74fe1b9	Clean up ENDIAN ifdef tests Signed-off-by: Greg Walton <corosync@gwalton.net> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-04-14 17:10:17 -07:00
Tim Serong	61d7ec1716	Fix typo in RRP faulty error messages Signed-off-by: Tim Serong <tserong@novell.com> Reviewed-by: Russell Bryant <russell@russellbryant.net>	2011-04-11 02:00:18 -05:00
Angus Salkeld	4ed97d991b	IPC: place calls to stats functions outside of mutexes This is to prevent nasty deadlocks between IPC and objdb. Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-04-13 08:15:59 +10:00
Zane Bitter	6365150ae2	Provide better checking of the message type A negative value for the message type (on systems where char is signed) would cause a crash. This is highly probable if the cluster is, for example, misconfigured to have encryption enabled on some nodes but not others. Signed-off-by: Zane Bitter <zane.bitter@gmail.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-04-12 13:09:39 -07:00
Zane Bitter	6e990d202f	Fix uninitialised memory errors found by valgrind Signed-off-by: Zane Bitter <zane.bitter@gmail.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-04-08 09:13:12 -07:00
Angus Salkeld	265661745d	Fix shutdown when a confdb client is still connected If you are connected to corosync and registered for object notifications then corosync is asked to shutdown the IPC server will get stuck. This is because the pipe is closed and the refcount is increased. This leaves ipcs with a connection that it can't destroy. Solution: 1) if a write to the pipe fails (pipe closed) decrement the refcounter. 2) fix the object_track_stop() - it was not working as the functions did not match up. (this caused the late callbacks). 3) in ipcs call exit_fn() then stats_destroy_connection() so that the service engine can have time to call object_track_stop() before the object gets destroyed. Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-03-29 13:48:20 +11:00
Angus Salkeld	076e8b74f7	STATS: add the service name to the connection name. This helps to quickly identify what service the application is connected to. The object will now look like: runtime.connections.corosync-objctl:CONFDB:19654:13.service_id=11 runtime.connections.corosync-objctl:CONFDB:19654:13.client_pid=19654 etc... This also makes it clearer to receivers of the dbus/snmp events what is going on. Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-03-29 13:48:13 +11:00
Steven Dake	7d5e588931	totemsrp: free messages originated in recovery rather than rely on messages_free Relying on messages_free may seem like it should work, but it leads to a situation where every node has released the messages, yet some nodes think messages are missing. The output then looks like "Retransmit: #" in repetition. This patch frees those messages immediately during the transition to the OPERATIONAL state and sets the internal variables totemsrp depends upon to the proper values. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2011-03-24 09:25:15 -07:00
Steven Dake	ef05817ce5	totemsrp: Only restore old ring id information one time The current code stores the current ring information every time a commit token is generated. This causes the old ring id used for comparison purposes to increase if a token is lost in commit or recovery, resulting in failure of totem. This patch changes the behavior to only store the old ring id one time when the commit token is received, and then further commit token ring id saves are not done until OPERATIONAL is reached. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2011-03-24 09:22:34 -07:00
Steven Dake	1a7b7a39f4	totemsrp: Remove recv_flush code The recv_flush code is no longer necessary because of the miss_count_count addition. It can in some cases lead to register corruption because of interactions with -fstack-protector, the recursive nature of how this code works, and interactions with the optimizer in some versions of gcc. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Jan Friesse <jfriesse@redhat.com>	2011-03-24 09:21:27 -07:00
Steven Dake	d99fba72e6	Resolve abort during simultaneous stopping of at least 4 nodes Consider 5 nodes. Nodes 3 and 4 are stopped (by random stopping). Nodes 1, 2 and 5 form a new configuration, and during recovery node 1 and node 2 are stopped (via service corosync stop). This causes node 5 never to finish recovery within the timeout period, triggering a token loss in recovery. Bug #623176 resolved an assert which happens because the full ring id was being restored. The resolution to Bug #623176 was to not restore the full ring id, and instead operate (according to specifications) the new ring id. Unfortunately this exposes a problem whereby the restarting of nodes 1-4 generates the same ring id. This ring id gets to the recovery failed node 5 which is now in gather, and triggers a condition not accounted for in the original totem specification. It appears later work from Dr. Agarwal's PhD dissertation considers this scenario. That solution entails rejecting the regular token in the above condition. Since the ring id is also used to make decisions for commit token acceptance, we must also take care to reject the regular token in all cases after transitioning from OPERATIONAL. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-03-21 09:26:35 -07:00
Angus Salkeld	0ad2494ae7	Fix some "set but not used" warnings [-Wunused-but-set-variable] Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-03-16 07:13:42 +11:00
Angus Salkeld	c9dee9eaa7	Remove the ttl option from udpu and rely on the kernel ttl setting. Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2011-03-15 19:35:23 +11:00
Angus Salkeld	86ada30aa4	Fix the ttl defaults and range 1) both IPv4 and IPv6 mcast should default to ttl=1 2) the range should be 0..255 0 is valid meaning localhost only (cluster of one) Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2011-03-15 19:34:46 +11:00
Russell Bryant	5da4d5479a	Convert existing documentation to doxygen format. This patch modifies most of the existing comments in header files to be in a format that doxygen can interpret. This provides another significant improvement to the web/pdf/etc generated documentation without having to add new content. Signed-off-by: Russell Bryant <russell@russellbryant.net> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-03-12 15:03:16 +11:00
Zane Bitter	dddaeef21c	Allocate packet buffers in the transport drivers This change paves the way for eliminating a copy within the Infiniband driver in the future by transferring responsibility for allocating and freeing message buffers to the transport driver layer. Tested under valgrind on a single-node cluster. Signed-off-by: Zane Bitter <zane.bitter@gmail.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-03-11 20:38:28 -07:00
Steven Dake	6aa47fde95	Fix abort when token is lost in RECOVERY state A commit token should be rejected when a token is lost in the recovery state. This occurs naturally because the ring id increases by 4 for every new ring. Prior to this patch, if the token was lost, the old ring id information was restored, causing a commit token to be accepted when it should be rejected. This erroneously accepted commit token would lead to an assertion which is fixed by this patch. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-03-07 17:15:05 -07:00
Russell Bryant	c112ee8c89	Add content for the doxygen main page. This creates some content on the main page of the documentation generated by doxygen. The main page includes the license and a link to the project web site. Signed-off-by: Russell Bryant <russell@russellbryant.net> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-03-07 08:42:01 -06:00
Russell Bryant	a609f79f1f	Ensure that strings are null terminated after strncpy(). From the strcpy(3) man page, the following warning is given: The strncpy() function is similar, except that at most n bytes of src are copied. Warning: If there is no null byte among the first n bytes of src, the string placed in dest will not be null-terminated. The current corosync code base does not take this warning into account when using strncpy, potentially resulting in non-null terminated strings. Signed-off-by: Russell Bryant <russell@russellbryant.net> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-03-07 08:30:03 -06:00
Steven Dake	7471c88346	Don't assert when ring id file is less than 8 bytes If the ring id file for the processor is less than 8 bytes, totemsrp would assert. Our speculation is that this condition happens during a fencing operation or local filesystem corruption. With this patch, Corosync will create fresh ring id file data when the incorrect number of bytes are read from the ring id. Amend to use sizeof the strerror string length and PATH_MAX for the path length. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-02-24 15:34:39 -07:00
Jan Friesse	894ece6a14	objdb: destroy all handles in _clear_object Patch replaces free for object_instance with handle_destroy to remove leaks in handles (and also memory leak). Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-02-24 12:15:01 +01:00
Jan Friesse	41aeecc4ef	Iterate all items in object_reload_notification Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-02-23 13:36:28 +01:00
Jan Friesse	c5e8237325	logsys: Properly lock flt data before dump Data needs to be locked, otherwise resulting fdata file may be incorrect. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-02-22 10:11:11 +01:00
Jan Friesse	88515e3d20	logsys: Don't leak fd on successful fdata dump Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-02-22 10:09:10 +01:00
Russell Bryant	907d974352	Add calls to pthread_attr_destroy(). This patch adds a couple of missing calls to pthread_attr_destroy(). There were a couple of instances where pthread_attr_init() was being used without a corresponding call to pthread_attr_destroy(). This also localizes the pthread_attr_t to the function where it is needed instead of having it persist (the man page specifically states that destroying the attributes structure has no effect on threads created using the attributes). Signed-off-by: Russell Bryant <russell@russellbryant.net> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-02-21 12:14:07 -07:00
Angus Salkeld	34cb488999	STATS: fix key name length on "join_count" Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-02-04 09:46:52 -07:00
Angus Salkeld	4da371f4f7	STATS: increase the space for application names Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-02-04 09:44:12 -07:00
Jan Friesse	5c951ac641	Add objdb firewall_enabled_or_nic_failure New objdb var runtime.totem.pg.mrp.srp.firewall_enabled_or_nic_failure is set to 1 if continuous_gather is larger than MAX_NO_CONT_GATHER. Under normal conditions, value of variable is 0. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-01-12 11:26:25 +01:00
Steven Dake	6646a864b4	Handle delayed multicast packets that occur with switches Some switches delay multicast packets vs the unicast token. This patch works around that problem by providing a new tuneable called miss_count_const. This tuneable works by counting the number of times a message is found missing and once reaching the const value, marks it as missing in the retransmit list. This improves performance and doesn't display warning messages about missed multicast messages when operating in these switching environments. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2011-01-11 10:34:46 -07:00
Angus Salkeld	a9b436c7a1	IPC: send failure message to client if memory maps fail Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2011-01-03 21:28:44 +11:00
Jan Friesse	b9df4424b1	Display warning when not possible to form cluster This may typically happen if local firewall is enabled. Patch adds new item to statistics called continuous_gather, which counts how many times in a row the gather state was entered. If this number is bigger than MAX_NO_CONT_GATHER, warning message is displayed. This is also used on exiting, so stop of corosync is now possible even with enabled firewall. Signed-off-by: Jan Friesse <jfriesse@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2010-12-03 10:11:11 +01:00
Steven Dake	9096c4d96b	Set the max buffer size for sockets Set the recv buffer to a large size and the send buffer to a large size to allow the kernel to store more messages before dropping messages. Amended to change optlen type to socklen_t Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>	2010-12-01 09:45:30 -07:00
Steven Dake	00e340d095	The flushing code was introducing data corruption because of recursion errors that occur as a result of the design of udpu. Totem no longer requires the flushing technique because we don't mark a packet as missing until it has not been seen by a certain number of token rotations per a previous patch. This mechanism was introduced to work around a problem in switches where multicast messages may be delayed by long periods compared to the unicast token. This patch removes the flushing logic from udpu since it is no longer necessary. Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2010-11-28 01:45:08 -07:00
Angus Salkeld	2c46de5ac1	Add totem/interface/ttl config option. This adds a per-interface config option to adjust the TTL. Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2010-11-24 14:35:56 +11:00
Steven Dake	aa03dca478	Merge branch 'topic-udpu' Conflicts: Makefile.am Signed-off-by: Steven Dake <sdake@redhat.com>	2010-11-18 15:03:19 -07:00
Steven Dake	b403fcbea9	Remove dead soresueaddr code Signed-off-by: Steven Dake <sdake@redhat.com> Reviewed-by: Angus Salkeld <asalkeld@redhat.com>	2010-11-18 14:51:17 -07:00
Steven Dake	bb05aed93f	Add the UDPU transport The UDPU transport is useful for those deployments which can't use multicast. UDPU works by using UDP unicast, which is fully supported by every switch manufacturer by default and doesn't rely on a functional IGMP implementation. An example of the UDPU transport is contained in the corosync.conf.example.udpu file which shows a 16 node cluster. This file should be copied to each node in the cluster and IP addresses changed as appropriate. Amended to remove dead udpu REUSEADDR socket option. Signed-off-by: Steven Dake <sdake@redhat.com>	2010-11-18 14:21:30 -07:00
Fabio M. Di Nitto	b2400314b2	add release script and git based versioning Signed-off-by: Fabio M. Di Nitto <fdinitto@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2010-11-10 07:46:53 -07:00
Angus Salkeld	f0104b6d31	Add .gitignore files. Otherwise "git status" is a pain. Signed-off-by: Angus Salkeld <asalkeld@redhat.com> Reviewed-by: Steven Dake <sdake@redhat.com>	2010-10-21 07:43:46 -07:00
Jan Friesse	7c8cdfb197	Remove delay in library on corosync shutdown Patch removes 2 seconds delay in library on normal corosync shutdown. Delay is still present on abnormal shutdown. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@3059 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-10-12 13:03:37 +00:00
Angus Salkeld	10be299e7b	Check for a properly configured multicast address. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@3057 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-09-27 22:41:26 +00:00
Angus Salkeld	07d06c0c0f	Add monitoring and watchdog services. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@3053 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-09-27 21:12:03 +00:00
Angus Salkeld	72addbc4cd	Add a Finite State Machine.(fsm.h) git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@3052 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-09-27 21:11:04 +00:00
Angus Salkeld	61b7d85978	Add a Finite State Machine. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@3051 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-09-27 21:08:01 +00:00
Angus Salkeld	53b0aa47e6	objdb: fix some ugly indentation. git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@3048 fd59a12c-fef9-0310-b244-a6a79926bd2f	2010-09-25 06:51:36 +00:00


1323 Commits