From 4533dc6a4e3d04115fb8de1967261938237898ab Mon Sep 17 00:00:00 2001 From: Sarita Patra Date: Thu, 20 Aug 2020 23:29:08 -0700 Subject: [PATCH 1/2] bgpd: Don't stop hold timer in OpenConfirm State MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Issue: 1. Initially BGP start listening to socket. 2. Start timer expires and BGP tries to connect to peer and moved to Idle->connect (lets say peer datastructre X) 3. Peer datastrcture Y FD-X receives OPEN and moved from Opensent-> Openconfirm state and start the hold timer. 4. In the OpenConfirm state, the hold timer is stopped. So peer X waits for Keepalive message from peer. If the Keepalive message is not received, then it will be in OpenConfirm state for indefinite time. 5. Due to this it neither close the existing connection nor it will accept any connection from peer. Fix: In the OpenConfirm state, don't stop the hold timer. 1. Upon receipt of a neighbor’s Keepalive, the state is moved to Established. 2. But If the hold timer expires, a stop event occurs, the state is moved to Idle. This is as per RFC. Signed-off-by: Sarita Patra --- bgpd/bgp_fsm.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/bgpd/bgp_fsm.c b/bgpd/bgp_fsm.c index ab3b88da7a..357061b444 100644 --- a/bgpd/bgp_fsm.c +++ b/bgpd/bgp_fsm.c @@ -1673,9 +1673,6 @@ static int bgp_fsm_open(struct peer *peer) /* Send keepalive and make keepalive timer */ bgp_keepalive_send(peer); - /* Reset holdtimer value. */ - BGP_TIMER_OFF(peer->t_holdtime); - return 0; } From 6c4d8732e975cf67a815fc97430166bb4d7008aa Mon Sep 17 00:00:00 2001 From: Sarita Patra Date: Thu, 20 Aug 2020 23:33:09 -0700 Subject: [PATCH 2/2] bgpd: Fix BGP session stuck in OpenConfirm state Issue: 1. Initially BGP start listening to socket. 2. Start timer expires and BGP tries to connect to peer and moved to Idle->connect (lets say peer datastructre X) 3. Connect for X succeeds and hence moved from idle ->connect with FD-x. 4. A incoming connection is accepted and a new peer datastructure Y is created with FD-y moves from idle->Active state. 5. Peer datastercture Y FD-y sends out OPEN and moves to Active->Opensent state. 6. Peer datastrcture Y FD-y receives OPEN and moved from Opensent-> Openconfirm state. 7. Meanwhile on peer datastrcture X FD-x sends out a OPEN message and moved from connect->Opensent. 8. For peer datastrcture Y FD-y keep alive is received and it is moved from OpenConfirm->Established. 9. In this case peer datastructure Y FD-y is a accepted connection so we try to copy all its parameter to peer datastructure X and delete Y. 10. During this process TCP connection for the accepted connection (FD-y) goes down and hence get remote address and port fails. 11. With this failure bgp_stop function for both peer datastrure X and peer datastructure Y is called. 12. By this time all the parameters include state for datastrcture for X and Y are exchanged. Peer Y FD-y when it entered this function had state OpenConfirm still which has been moved to peer datastrcture X. 13. In bgp_stop it will stop all the timers and take action only if peer is in established state. Now that peer datastrcture X and Y are not in established state (in this function) it will simply close all timers and close the socket and assigns socket for both the peer datastrcture to -1. 14. Peer datastrcture Y will be deleted as it is a datastrcture created due to accept of connection where as peer datastrcture X will be held as it is created with configuration. 15. Now peer datastrcture X now holds a state of OpenConfirm without any timers running. 16. With this any new incoming connection will never be able to establish as there is config connection X which is stuck in OpenConfirm. Fix: While transferring the peer datastructure Y FD-y (accepted connection) to the peer datastructure X, if TCP connection for FD-y goes down, then 1. Call fsm event bgp_stop for X (do cleanup with bgp_stop and move the state to Idle) and 2. Call fsm event bgp_stop for Y (do cleanup with bgp_stop and gets deleted since it is an accept connection). Signed-off-by: Sarita Patra --- bgpd/bgp_fsm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/bgpd/bgp_fsm.c b/bgpd/bgp_fsm.c index 357061b444..c8e5a308ef 100644 --- a/bgpd/bgp_fsm.c +++ b/bgpd/bgp_fsm.c @@ -304,8 +304,8 @@ static struct peer *peer_xfer_conn(struct peer *from_peer) ? "accept" : ""), peer->host, peer->fd, from_peer->fd); - bgp_stop(peer); - bgp_stop(from_peer); + BGP_EVENT_ADD(peer, BGP_Stop); + BGP_EVENT_ADD(from_peer, BGP_Stop); return NULL; } if (from_peer->status > Active) {