mirror of
https://git.proxmox.com/git/mirror_frr
synced 2025-08-13 10:14:50 +00:00
bgpd: Fix BGP session stuck in OpenConfirm state
Issue:
1. Initially BGP starts listening on the socket.
2. The start timer expires and BGP tries to connect to the peer, moving Idle->Connect (call this peer data structure X).
3. The connect for X succeeds, so X is in Connect with FD-x.
4. An incoming connection is accepted and a new peer data structure Y is created with FD-y; Y moves Idle->Active.
5. Y (FD-y) sends out an OPEN and moves Active->OpenSent.
6. Y (FD-y) receives an OPEN and moves OpenSent->OpenConfirm.
7. Meanwhile X (FD-x) sends out an OPEN message and moves Connect->OpenSent.
8. Y (FD-y) receives a KEEPALIVE and moves OpenConfirm->Established.
9. Because Y (FD-y) is an accepted connection, we try to copy all of its parameters to X and delete Y.
10. During this process the TCP connection for the accepted connection (FD-y) goes down, so fetching the remote address and port fails.
11. On this failure the bgp_stop function is called for both X and Y.
12. By this time all parameters, including the state, have been exchanged between X and Y. Y (FD-y) still had state OpenConfirm when it entered this function, and that state has been moved to X.
13. bgp_stop stops all timers and takes further action only if the peer is in Established state. Since neither X nor Y is in Established state (in this function), it simply stops all timers, closes the socket, and sets the socket of both peer data structures to -1.
14. Y is deleted, because it was created by accepting a connection, whereas X is kept, because it was created from configuration.
15. X now holds state OpenConfirm without any timers running.
16. With this, no new incoming connection can ever be established, because the configured connection X is stuck in OpenConfirm.

Fix:
While transferring the peer data structure Y (FD-y, the accepted connection) to the peer data structure X, if the TCP connection for FD-y goes down, then
1. Raise the FSM event BGP_Stop for X (clean up with bgp_stop and move the state to Idle), and
2. Raise the FSM event BGP_Stop for Y (clean up with bgp_stop; Y is then deleted since it is an accepted connection).

Signed-off-by: Sarita Patra <saritap@vmware.com>
parent 4533dc6a4e
commit 6c4d8732e9
@@ -304,8 +304,8 @@ static struct peer *peer_xfer_conn(struct peer *from_peer)
 				 ? "accept"
 				 : ""),
 			peer->host, peer->fd, from_peer->fd);
-		bgp_stop(peer);
-		bgp_stop(from_peer);
+		BGP_EVENT_ADD(peer, BGP_Stop);
+		BGP_EVENT_ADD(from_peer, BGP_Stop);
 		return NULL;
 	}
 	if (from_peer->status > Active) {