mirror of
				https://git.proxmox.com/git/mirror_corosync
				synced 2025-11-04 14:27:05 +00:00 
			
		
		
		
	git-svn-id: http://svn.fedorahosted.org/svn/corosync/trunk@2360 fd59a12c-fef9-0310-b244-a6a79926bd2f
		
			
				
	
	
		
			186 lines
		
	
	
		
			9.3 KiB
		
	
	
	
		
			Groff
		
	
	
	
	
	
			
		
		
	
	
			186 lines
		
	
	
		
			9.3 KiB
		
	
	
	
		
			Groff
		
	
	
	
	
	
.\"/*
 | 
						|
.\" * Copyright (c) 2004 MontaVista Software, Inc.
 | 
						|
.\" *
 | 
						|
.\" * All rights reserved.
 | 
						|
.\" *
 | 
						|
.\" * Author: Steven Dake (sdake@redhat.com)
 | 
						|
.\" *
 | 
						|
.\" * This software licensed under BSD license, the text of which follows:
 | 
						|
.\" *
 | 
						|
.\" * Redistribution and use in source and binary forms, with or without
 | 
						|
.\" * modification, are permitted provided that the following conditions are met:
 | 
						|
.\" *
 | 
						|
.\" * - Redistributions of source code must retain the above copyright notice,
 | 
						|
.\" *   this list of conditions and the following disclaimer.
 | 
						|
.\" * - Redistributions in binary form must reproduce the above copyright notice,
 | 
						|
.\" *   this list of conditions and the following disclaimer in the documentation
 | 
						|
.\" *   and/or other materials provided with the distribution.
 | 
						|
.\" * - Neither the name of the MontaVista Software, Inc. nor the names of its
 | 
						|
.\" *   contributors may be used to endorse or promote products derived from this
 | 
						|
.\" *   software without specific prior written permission.
 | 
						|
.\" *
 | 
						|
.\" * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
 | 
						|
.\" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 | 
						|
.\" * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 | 
						|
.\" * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
 | 
						|
.\" * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
 | 
						|
.\" * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
 | 
						|
.\" * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
 | 
						|
.\" * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 | 
						|
.\" * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 | 
						|
.\" * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
 | 
						|
.\" * THE POSSIBILITY OF SUCH DAMAGE.
 | 
						|
.\" */
 | 
						|
.TH EVS_OVERVIEW 8 2004-08-31 "corosync Man Page" "Corosync Cluster Engine Programmer's Manual"
 | 
						|
.SH NAME
 | 
						|
evs_overview \- EvS Library Overview
 | 
						|
.SH OVERVIEW
 | 
						|
The EVS library is delivered with the corosync project.  This library is used
 | 
						|
to create distributed applications that operate properly during partitions, merges,
 | 
						|
and faults.
 | 
						|
.PP
 | 
						|
The library provides a mechanism to:
 | 
						|
* handle abstraction for multiple instances of an EVS library in one application
 | 
						|
* Deliver messages
 | 
						|
* Deliver configuration changes
 | 
						|
* join one or more groups
 | 
						|
* leave one or more groups
 | 
						|
* send messages to one or more groups
 | 
						|
* send messages to currently joined groups
 | 
						|
.PP
 | 
						|
The EVS library implements a messaging model known as Extended Virtual Synchrony.
 | 
						|
This model allows one sender to transmit to many receivers using standard UDP/IP.
 | 
						|
UDP/IP is unreliable and unordered, so the EVS library applies ordering and reliability
 | 
						|
to messages.  Hardware multicast is used to avoid duplicated packets with two or more
 | 
						|
receivers.  Erroneous messages are corrected automatically by the library.
 | 
						|
.PP
 | 
						|
Certain guarantees are provided by the EVS library.  These guarantees are related to
 | 
						|
message delivery and configuration change delivery.
 | 
						|
.SH DEFINITIONS
 | 
						|
.TP
 | 
						|
.B multicast
 | 
						|
A multicast occurs when a network interface card sends a UDP packet to multiple
 | 
						|
receivers simulatenously.
 | 
						|
.TP
 | 
						|
.B processor
 | 
						|
A processor is the entity that executes the extended virtual synchrony algorithms.
 | 
						|
.TP
 | 
						|
.B configuration
 | 
						|
A configuration is the current description of the processors executing the extended
 | 
						|
virtual syncrhony algorithm.
 | 
						|
.TP
 | 
						|
.B configuration change
 | 
						|
A configuration change occurs when a new configuration is delivered.
 | 
						|
.TP
 | 
						|
.B partition
 | 
						|
A partition occurs when a configuration splits into two or more configurations, or
 | 
						|
a processor fails or is stopped and leaves the configuration.
 | 
						|
.TP
 | 
						|
.B merge
 | 
						|
A merge occurs when two or more configurations join into a larger new configuration.  When
 | 
						|
a new processor starts up, it is treated as a configuration with only one processor
 | 
						|
and a merge occurs.
 | 
						|
.TP
 | 
						|
.B fifo ordering
 | 
						|
A message is FIFO ordered when one sender and one receiver agree on the order of the
 | 
						|
messages sent.
 | 
						|
.TP
 | 
						|
.B agreed ordering
 | 
						|
A message is AGREED ordered when all processors agree on the order of the messages sent.
 | 
						|
.TP
 | 
						|
.B safe ordering
 | 
						|
A message is SAFE ordered when all processors agree on the order of messages sent and
 | 
						|
those messages are not delivered until all processors have a copy of the message to
 | 
						|
deliver.
 | 
						|
.TP
 | 
						|
.B virtual syncrhony
 | 
						|
Virtual syncrhony is obtained when all processors agree on the order of messages
 | 
						|
sent and configuration changes sent for each new configuration.
 | 
						|
.SH USING VIRTUAL SYNCHRONY
 | 
						|
The virtual synchrony messaging model has many benefits for developing distributed
 | 
						|
applications.  Applications designed using replication have the most benefits.  Applications
 | 
						|
that must be able to partition and merge also benefit from the virtual synchrony messaging
 | 
						|
model.
 | 
						|
.PP
 | 
						|
All applications receive a copy of transmitted messages even if there are errors on the
 | 
						|
transmission media.  This allows optimiziations when every processor must receive a copy
 | 
						|
of the message for replication.
 | 
						|
.PP
 | 
						|
All messages are ordered according to agreed ordering.  This mechanism allows the avoidance
 | 
						|
of race conditions.  Consider a lock service implemented over several processors.  Two
 | 
						|
requests occur at the same time on two seperate processors.  The requests are ordered for
 | 
						|
every processor in the same order and delivered to the processors.  Then all processors
 | 
						|
will get request A before request B and can reject request B.  Any type of creation or
 | 
						|
deletion of a shared data structure can benefit from this mechanism.
 | 
						|
.PP
 | 
						|
Self delivery ensures that messages that are sent by a processor are also delivered back
 | 
						|
to that processor.  This allows the processor sending the message to execute logic when
 | 
						|
the message is self delivered according to agreed ordering and the virtual synchrony rules.
 | 
						|
It also permits all logic to be placed in one message handler instead of two seperate places.
 | 
						|
.PP
 | 
						|
Virtual Synchrony allows the current configuration to be used to make decisions in partitions
 | 
						|
and merges.  Since the configuration is sent in the stream of messages to the application,
 | 
						|
the application can alter its behavior based upon the configuration changes.
 | 
						|
.SH ARCHITECTURE AND ALGORITHM
 | 
						|
The EVS library is a thin IPC interface to the corosync executive.  The corosync executive
 | 
						|
provides services for the SA Forum AIS libraries as well as the EVS library.
 | 
						|
.PP
 | 
						|
The corosync executive uses a ring protocol and membership protocol to send messages
 | 
						|
according to the semantics required by extended virtual synchrony.  The ring protocol
 | 
						|
creates a virtual ring of processors.  A token is rotated around the ring of processors.
 | 
						|
When the token is possessed by a processor, that processor may multicast messages to
 | 
						|
other processors in the system.
 | 
						|
.PP
 | 
						|
The token is called the ORF token (for ordering, reliability, flow control).  The ORF
 | 
						|
token orders all messages by increasing a sequence number every time a message is
 | 
						|
multicasted.  In this way, an ordering is placed on all messages that all processors
 | 
						|
agree to.  The token also contains a retransmission list.  If a token is received by
 | 
						|
a processor that has not yet received a message it should have, a message sequence
 | 
						|
number is added to the retransmission list.  A processor that has a copy of the message
 | 
						|
then retransmits the message.  The ORF token provides configuration-wide flow control
 | 
						|
by tracking the number of messages sent and limiting the number of messages that may
 | 
						|
be sent by one processor on each posession of the token.
 | 
						|
.PP
 | 
						|
The membership protocol is responsible for ring formation and detecting when a processor
 | 
						|
within a ring has failed.  If the token fails to make a rotation within a timeout period
 | 
						|
known as the token rotation timeout, the membership protocol will form a new ring.
 | 
						|
If a new processor starts, it will also form a new ring.  Two or more configurations
 | 
						|
may be used to form a new ring, allowing many partitions to merge together into one
 | 
						|
new configuration.
 | 
						|
.SH PERFORMANCE
 | 
						|
The EVS library obtains 8.5MB/sec throughput on 100 mbit network links with
 | 
						|
many processors.  Larger messages obtain better throughput results because the
 | 
						|
time to access Ethernet is about the same for a small message as it is for a
 | 
						|
larger message.  Smaller messages obtain better messages per second, because the
 | 
						|
time to send a message is not exactly the same.
 | 
						|
.PP
 | 
						|
80% of CPU utilization occurs because of encryption and authentication.  The corosync
 | 
						|
can be built without encryption and authentication for those with no security
 | 
						|
requirements and low CPU utilization requirements.  Even without encryption or
 | 
						|
authentication, under heavy load, processor utilization can reach 25% on 1.5 GHZ
 | 
						|
CPU processors.
 | 
						|
.PP
 | 
						|
The current corosync executive supports 16 processors, however, support for more processors is possible by changing defines in the corosync executive.  This is untested, however.
 | 
						|
.SH SECURITY
 | 
						|
The EVS library encrypts all messages sent over the network using the SOBER-128
 | 
						|
stream cipher.  The EVS library uses HMAC and SHA1 to authenticate all messages.
 | 
						|
The EVS library uses SOBER-128 as a pseudo random number generator.  The EVS
 | 
						|
library feeds the PRNG using the /dev/random Linux device.
 | 
						|
.SH BUGS
 | 
						|
This software is not yet production, so there may still be some bugs.  But it appears
 | 
						|
there are very few since nobody reports any unknown bugs at this point.
 | 
						|
.SH "SEE ALSO"
 | 
						|
.BR evs_initialize (3),
 | 
						|
.BR evs_finalize (3),
 | 
						|
.BR evs_fd_get (3),
 | 
						|
.BR evs_dispatch (3),
 | 
						|
.BR evs_join (3),
 | 
						|
.BR evs_leave (3),
 | 
						|
.BR evs_mcast_joined (3),
 | 
						|
.BR evs_mcast_groups (3),
 | 
						|
.BR evs_mmembership_get (3)
 | 
						|
.BR evs_context_get (3)
 | 
						|
.BR evs_context_set (3)
 | 
						|
 | 
						|
.PP
 |