Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson (synced 2025-10-31 08:26:29 +00:00)
			
		
		
		
		
	
	
	
	
		
			
			This adds full support for local/remote Sequence Window feature, from which the
  * sequence-number-validity (W) and
  * acknowledgment-number-validity (W') windows
derive as specified in RFC 4340, 7.5.3.
Specifically, the following is contained in this patch:
  * integrated new socket fields into dccp_sk;
  * updated the update_gsr/gss routines with regard to these fields;
  * updated handler code: the Sequence Window feature is located at the TX side,
    so the local feature is meant if the handler-rx flag is false;
  * the initialisation of `rcv_wnd' in reqsk is removed, since
    - rcv_wnd is not used by the code anywhere;
    - sequence number checks are not done in the LISTEN state (cf. 7.5.3);
    - dccp_check_req checks the Ack number validity more rigorously;
  * the `struct dccp_minisock' became empty and is now removed.
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Acked-by: Ian McDonald <ian.mcdonald@jandi.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
		
	
			
		
			
				
	
	
		
	
DCCP protocol
============


Contents
========

- Introduction
- Missing features
- Socket options
- Sysctl variables
- IOCTLS
- Notes

Introduction
============

Datagram Congestion Control Protocol (DCCP) is an unreliable,
connection-oriented protocol designed to solve issues present in UDP and TCP,
particularly for real-time and multimedia (streaming) traffic.
It divides into a base protocol (RFC 4340) and pluggable congestion control
modules called CCIDs. Like pluggable TCP congestion control, at least one CCID
needs to be enabled in order for the protocol to function properly. In the Linux
implementation, this is the TCP-like CCID2 (RFC 4341). Additional CCIDs, such as
the TCP-friendly CCID3 (RFC 4342), are optional.
For a brief introduction to CCIDs and suggestions for choosing a CCID to match
given applications, see section 10 of RFC 4340.

DCCP is a Proposed Standard (RFC 2026), and the homepage for DCCP as a protocol
is at http://www.ietf.org/html.charters/dccp-charter.html

Missing features
================

The Linux DCCP implementation does not currently support all the features that
are specified in RFCs 4340 to 4342.

The known bugs are at:
	http://linux-net.osdl.org/index.php/TODO#DCCP

For more up-to-date versions of the DCCP implementation, please consider using
the experimental DCCP test tree; instructions for checking this out are on:
http://linux-net.osdl.org/index.php/DCCP_Testing#Experimental_DCCP_source_tree


Socket options
==============

DCCP_SOCKOPT_SERVICE sets the service code of the socket. The specification
mandates use of service codes (RFC 4340, sec. 8.1.2); if this socket option is
not set, the socket will fall back to 0 (which means that no meaningful service
code is present). On active sockets this is set before connect(); specifying
more than one code has no effect (all subsequent service codes are ignored). The
case is different for passive sockets, where multiple service codes (up to 32)
can be set before calling bind().

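As an illustration of the active-socket case, the following is a minimal
sketch, not a definitive implementation. The helper names dccp_set_service()
and dccp_client_socket() are hypothetical. The #ifndef fallbacks repeat the
numeric values used by the kernel headers for libc versions that lack them, and
passing the code in network byte order is an assumption based on the kernel
treating the value as big-endian.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fallbacks for libc headers that lack the DCCP constants; the values
 * are the ones used by the kernel headers. */
#ifndef SOCK_DCCP
#define SOCK_DCCP 6
#endif
#ifndef IPPROTO_DCCP
#define IPPROTO_DCCP 33
#endif
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif
#ifndef DCCP_SOCKOPT_SERVICE
#define DCCP_SOCKOPT_SERVICE 2
#endif

/* Set a single service code (assumption: network byte order on the wire
 * and in the socket option). Returns 0, or -1 with errno set. */
int dccp_set_service(int fd, uint32_t service_code)
{
	uint32_t code = htonl(service_code);

	return setsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_SERVICE,
			  &code, sizeof(code));
}

/* Create an active (client) socket with the service code already set,
 * ready for connect(). Returns the descriptor, or -1 on error. */
int dccp_client_socket(uint32_t service_code)
{
	int fd = socket(AF_INET, SOCK_DCCP, IPPROTO_DCCP);

	if (fd < 0)
		return -1;
	if (dccp_set_service(fd, service_code) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

A caller would use the returned descriptor with connect() as usual; on kernels
built without DCCP support the socket() call fails and the helper returns -1.
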
DCCP_SOCKOPT_GET_CUR_MPS is read-only and retrieves the current maximum packet
size (application payload size) in bytes, see RFC 4340, section 14.

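A short sketch of reading this value, for example to size the payload before
sending. The helper name dccp_current_mps() is hypothetical, and the fallback
constants repeat the values used by the kernel headers.

```c
#include <sys/socket.h>

/* Fallbacks for libc headers that lack the DCCP constants. */
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif
#ifndef DCCP_SOCKOPT_GET_CUR_MPS
#define DCCP_SOCKOPT_GET_CUR_MPS 5
#endif

/* Return the current maximum packet size in bytes, or -1 on error. */
int dccp_current_mps(int fd)
{
	int mps = 0;
	socklen_t len = sizeof(mps);

	if (getsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_GET_CUR_MPS,
		       &mps, &len) < 0)
		return -1;
	return mps;
}
```
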
DCCP_SOCKOPT_AVAILABLE_CCIDS is also read-only and returns the list of CCIDs
supported by the endpoint (see include/linux/dccp.h for symbolic constants).
The caller needs to provide a sufficiently large (> 2) array of type uint8_t.

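The query could look roughly like the sketch below. The helper name
dccp_available_ccids() is hypothetical; capping the reported length at the
buffer size is a defensive assumption, in case the kernel reports more CCIDs
than the caller's array can hold.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Fallbacks for libc headers that lack the DCCP constants. */
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif
#ifndef DCCP_SOCKOPT_AVAILABLE_CCIDS
#define DCCP_SOCKOPT_AVAILABLE_CCIDS 12
#endif

/* Fill `ccids' (capacity `max' entries) with the supported CCIDs.
 * Returns the number of valid entries in the array, or -1 on error. */
int dccp_available_ccids(int fd, uint8_t *ccids, size_t max)
{
	socklen_t len = (socklen_t)max;

	if (getsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_AVAILABLE_CCIDS,
		       ccids, &len) < 0)
		return -1;
	/* Never report more entries than the caller's array holds. */
	return (int)(len < max ? len : max);
}
```

A typical call passes a small array, e.g. `uint8_t ccids[8]', and iterates over
the returned count.
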
DCCP_SOCKOPT_CCID is write-only and sets both the TX and RX CCIDs at the same
time, combining the operation of the next two socket options. This option is
preferable to the latter two, since applications will often use the same
type of CCID for both directions, and mixed use of CCIDs is not currently well
understood. This socket option takes as argument at least one uint8_t value, or
an array of uint8_t values, which must match available CCIDs (see above). CCIDs
must be registered on the socket before calling connect() or listen().

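A minimal sketch of registering a CCID preference list, assuming the array is
given in order of preference. The helper name dccp_set_ccids() is hypothetical;
the local check for an empty list mirrors the "at least one value" requirement
stated above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/* Fallbacks for libc headers that lack the DCCP constants. */
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif
#ifndef DCCP_SOCKOPT_CCID
#define DCCP_SOCKOPT_CCID 13
#endif

/* Set the TX and RX CCID preference list in one call. Must be done
 * before connect() or listen(). Returns 0, or -1 with errno set. */
int dccp_set_ccids(int fd, const uint8_t *prefs, size_t n)
{
	if (prefs == NULL || n == 0) {
		errno = EINVAL;		/* at least one CCID is required */
		return -1;
	}
	return setsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_CCID,
			  prefs, (socklen_t)n);
}
```

For example, a list of { 2, 3 } would ask for CCID2 first and CCID3 as an
alternative, provided both appear in the list of available CCIDs.
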
DCCP_SOCKOPT_TX_CCID is read/write. It returns the current CCID (if set) or sets
the preference list for the TX CCID, using the same format as DCCP_SOCKOPT_CCID.
Please note that the getsockopt argument type here is `int', not uint8_t.

DCCP_SOCKOPT_RX_CCID is analogous to DCCP_SOCKOPT_TX_CCID, but for the RX CCID.

DCCP_SOCKOPT_SERVER_TIMEWAIT enables the server (listening socket) to hold
timewait state when closing the connection (RFC 4340, 8.3). The usual case is
that the closing server sends a CloseReq, whereupon the client holds timewait
state. When this boolean socket option is on, the server sends a Close instead
and will enter TIMEWAIT. This option must be set after accept() returns.

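Since the option applies to the accepted socket, one possible pattern is to set
it right after accept(). The sketch below assumes the option takes an `int'
boolean; the helper name dccp_accept_timewait() is hypothetical.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fallbacks for libc headers that lack the DCCP constants. */
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif
#ifndef DCCP_SOCKOPT_SERVER_TIMEWAIT
#define DCCP_SOCKOPT_SERVER_TIMEWAIT 6
#endif

/* Accept one connection and let the server side hold timewait state.
 * Returns the connected descriptor, or -1 on error. */
int dccp_accept_timewait(int listen_fd)
{
	int on = 1;
	int fd = accept(listen_fd, NULL, NULL);

	if (fd < 0)
		return -1;
	if (setsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_SERVER_TIMEWAIT,
		       &on, sizeof(on)) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```
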
DCCP_SOCKOPT_SEND_CSCOV and DCCP_SOCKOPT_RECV_CSCOV are used for setting the
partial checksum coverage (RFC 4340, sec. 9.2). The default is that checksums
always cover the entire packet and that only fully covered application data is
accepted by the receiver. Hence, when using this feature on the sender, it must
be enabled at the receiver, too, with a suitable choice of CsCov.

DCCP_SOCKOPT_SEND_CSCOV sets the sender checksum coverage. Values in the
	range 0..15 are acceptable. The default setting is 0 (full coverage),
	values in the range 1..15 indicate partial coverage.
DCCP_SOCKOPT_RECV_CSCOV is for the receiver and has a different meaning: it
	sets a threshold, where again values 0..15 are acceptable. The default
	of 0 means that all packets with a partial coverage will be discarded.
	Values in the range 1..15 indicate that packets with at least this
	coverage value are also acceptable. The higher the number, the more
	restrictive this setting (see RFC 4340, sec. 9.2.1). Partial coverage
	settings are inherited by the child socket after accept().

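A sketch of setting both coverage values with the 0..15 range check described
above, assuming both options take an `int' argument. The helper name
dccp_set_cscov() is hypothetical; in practice the sender value is set on the
sending endpoint and the receiver threshold on the receiving endpoint.

```c
#include <errno.h>
#include <sys/socket.h>

/* Fallbacks for libc headers that lack the DCCP constants. */
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif
#ifndef DCCP_SOCKOPT_SEND_CSCOV
#define DCCP_SOCKOPT_SEND_CSCOV 10
#endif
#ifndef DCCP_SOCKOPT_RECV_CSCOV
#define DCCP_SOCKOPT_RECV_CSCOV 11
#endif

/* Set the sender coverage and the receiver threshold on one socket.
 * Both values must lie in 0..15. Returns 0, or -1 with errno set. */
int dccp_set_cscov(int fd, int send_cscov, int recv_cscov)
{
	if (send_cscov < 0 || send_cscov > 15 ||
	    recv_cscov < 0 || recv_cscov > 15) {
		errno = EINVAL;
		return -1;
	}
	if (setsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_SEND_CSCOV,
		       &send_cscov, sizeof(send_cscov)) < 0)
		return -1;
	return setsockopt(fd, SOL_DCCP, DCCP_SOCKOPT_RECV_CSCOV,
			  &recv_cscov, sizeof(recv_cscov));
}
```
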
The following two options apply to CCID 3 exclusively and are getsockopt()-only.
In either case, a TFRC info struct (defined in <linux/tfrc.h>) is returned.
DCCP_SOCKOPT_CCID_RX_INFO
	Returns a `struct tfrc_rx_info' in optval; the optval buffer must be at
	least sizeof(struct tfrc_rx_info) bytes, with optlen set accordingly.
DCCP_SOCKOPT_CCID_TX_INFO
	Returns a `struct tfrc_tx_info' in optval; the optval buffer must be at
	least sizeof(struct tfrc_tx_info) bytes, with optlen set accordingly.

On unidirectional connections it is useful to close the unused half-connection
via shutdown (SHUT_WR or SHUT_RD): this will reduce per-packet processing costs.

Sysctl variables
================
Several DCCP default parameters can be managed by the following sysctls
(sysctl net.dccp.default or /proc/sys/net/dccp/default):

request_retries
	The number of active connection initiation retries (the number of
	Requests minus one) before timing out. In addition, it also governs
	the behaviour of the other, passive side: this variable also sets
	the number of times DCCP repeats sending a Response when the initial
	handshake does not progress from RESPOND to OPEN (i.e. when no Ack
	is received after the initial Request).  This value should be greater
	than 0; a value of less than 10 is suggested. Analogue of
	tcp_syn_retries.

retries1
	How often a DCCP Response is retransmitted until the listening DCCP
	side considers its connecting peer dead. Analogue of tcp_retries1.

retries2
	The number of times a general DCCP packet is retransmitted. This
	matters for retransmitted acknowledgments and feature negotiation;
	data packets are never retransmitted. Analogue of tcp_retries2.

tx_ccid = 2
	Default CCID for the sender-receiver half-connection. Depending on the
	choice of CCID, the Send Ack Vector feature is enabled automatically.

rx_ccid = 2
	Default CCID for the receiver-sender half-connection; see tx_ccid.

seq_window = 100
	The initial sequence window (sec. 7.5.2) of the sender. This influences
	the local ackno validity and the remote seqno validity windows (7.5.1).

tx_qlen = 5
	The size of the transmit buffer in packets. A value of 0 corresponds
	to an unbounded transmit buffer.

sync_ratelimit = 125 ms
	The timeout between subsequent DCCP-Sync packets sent in response to
	sequence-invalid packets on the same socket (RFC 4340, 7.5.4). The unit
	of this parameter is milliseconds; a value of 0 disables rate-limiting.

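These defaults can also be read programmatically through the /proc path given
above. The sketch below assumes each variable holds a single integer; the
helper name dccp_default_sysctl() is hypothetical.

```c
#include <stdio.h>

/* Read one integer-valued DCCP default, e.g. "seq_window".
 * Returns the value, or -1 if the file is missing or unreadable
 * (for instance when DCCP support is not loaded). */
long dccp_default_sysctl(const char *name)
{
	char path[128];
	long val = -1;
	FILE *f;
	int n;

	n = snprintf(path, sizeof(path),
		     "/proc/sys/net/dccp/default/%s", name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;		/* name too long for the buffer */

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &val) != 1)
		val = -1;
	fclose(f);
	return val;
}
```
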
IOCTLS
======
FIONREAD
	Works as in udp(7): returns in the `int' argument pointer the size of
	the next pending datagram in bytes, or 0 when no datagram is pending.

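A small sketch of the FIONREAD call, which can be used to size a receive buffer
before reading. The helper name dccp_next_datagram_size() is hypothetical.

```c
#include <sys/ioctl.h>

/* Return the size in bytes of the next pending datagram, 0 if none is
 * pending, or -1 on error. */
int dccp_next_datagram_size(int fd)
{
	int pending = 0;

	if (ioctl(fd, FIONREAD, &pending) < 0)
		return -1;
	return pending;
}
```
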
Notes
=====

DCCP does not travel through NAT successfully at present on many boxes. This is
because the checksum covers the pseudo-header as per TCP and UDP. Linux NAT
support for DCCP has been added.