Commit Graph

174 Commits

Author SHA1 Message Date
bisdhdh
9e3b51a7f3 bgpd: Restarting node does not send EOR after the convergence.
*After a restarting router comes up and the bgp session is
successfully established with the peer. If the restarting
router doesn’t have any route to send, it send EOR to
the peer immediately before receiving updates from its peers.
*Instead the restarting router should send EOR, if the
selection deferral timer is not running OR count of eor received
and eor required are matches then send EOR.

Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
2020-01-23 09:34:25 +05:30
bisdhdh
d7b3cda6f7 bgpd: BGP tcp session failed to apply GR configuration on the transferred
bgp tcp connection.

When the BGP peer is configured between two bgp routes  both routers would create
peer structure , when they receive each other’s open message. In this event both
speakers, open duplicate TCP sessions and send OPEN messages on each socket
simultaneously, the BGP Identifier is used to resolve which socket should be closed.
If BGP GR is enabled the old tcp session is dumped and the new session is retained.
So while this transfer of connection is happening, if all the bgp gr config
is not migrated to the new connection, the new bgp gr mode will never get applied.
Fix Summary:
1.  Replicate GR configuration from the old session to the new session in bgp_accept().
2.  Replicate GR configuration from stub to full-fledged peer in bgp_establish().
3.  Disable all NSF flags, clear stale routes (if present), stop  restart & stale timers
    (if they are running) when the bgp GR mode is changed to “Disabled”.
4.  Disable R-bit in cap, if it is not set the received open message.

Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
2020-01-23 09:34:25 +05:30
bisdhdh
5cce3f0544 bgpd: Adding BGP GR change mode config apply on notification sent & received.
* Changing GR mode on a router needs a session reset from the
SAME router to negotiate new GR capability.
* The present GR implementation needs a session reset after every
new BGP GR mode change.
* When BGP session reset happens due to sending or receiving BGP
notification after changing BGP GR mode, there is no need of
explicit session reset.

Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
2020-01-23 09:34:25 +05:30
bisdhdh
f009ff2697 bgpd: Adding Selection Deferral Timer handler changes.
* Selection Deferral Timer for Graceful Restart.
* Added selection deferral timer handling function.
* Route marking as selection defer when update message is received.
* Staggered processing of routes which are pending best selection.
* Fix for multi-path test case.

Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
2020-01-23 09:34:25 +05:30
bisdhdh
2986cac299 bgpd: Adding BGP GR Per Neighbor show commands.
* Added new show command to show the graceful restart
information for each neighbor.
Cmd: show bgp [<ipv4|ipv6>] neighbors [<A.B.C.D|X:X::X:X|WORD>] graceful-restart
* Changes to show neighbors commands for displaying
graceful restart information.
Cmd :show [ip] bgp [<view|vrf> VIEWVRFNAME] [<ipv4|ipv6>] neighbors [<A.B.C.D|X:X::X:X|

Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
2020-01-23 09:34:25 +05:30
bisdhdh
794b37d521 bgpd: Adding BGP GR Global & Per Neighbour FSM changes
* Added FSM for peer and global configuration for graceful restart
 * Added debug option BGP_GRACEFUL_RESTART for logs specific to
 graceful restart processing

Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
2020-01-23 09:34:25 +05:30
Donatas Abraitis
53b4aaeca0 bgpd: Send notification to the peer on FSM error
We should send a NOTIFICATION message with the Error Code Finite State
Machine Error if we receive NOTIFICATION in OpenSent state
as defined in https://tools.ietf.org/html/rfc4271#section-8.2.2

Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2019-12-30 17:11:04 +02:00
Donatas Abraitis
e9613d32cc
Merge pull request #5429 from Spantik/bug_fix
BGP: BGP assert when it tries to access peer which is closed.
2019-12-10 09:43:28 +02:00
Santosh P K
74e00a55c1 bgpd: BGP assert when it tries to access peer which is closed.
Problem: BGP peer pointer is present in keepalive hash table
even when socket has been closed in some race condition.
When keepalive tries to access this peer it asserts.

RCA: Below sequence of events causing assert.
1. Config node peer has went down due to TCP reset
   it's FD has been set to -1.
2. Doppelganger peer goes to established state and it has
   been added to peer hash table for keepalive when it was
   in openconfirm state.
3. Config node parameters including FD are exchanged with
   doppelganger. Doppelganger will not have FD -1.
4. Doppelganger will be deleted as part of this it will
   remove it from the keepalive peer hash table.
5. While removing from hash table it tries to acquire lock.
6. During this time keepalive thread has the lock and in
   a loop trying to send keepalive for peers in hash table.
7. It tries to send keepalive for doppelganger peer with fd
   set to -1 and asserts.

Signed-off-by: Santosh P K <sapk@vmware.com>
2019-12-09 09:10:57 -08:00
David Lamparter
2b64873d24 *: generously apply const
const const const your boat, merrily down the stream...

Signed-off-by: David Lamparter <equinox@diac24.net>
2019-12-02 15:01:29 +01:00
Donatas Abraitis
c8d6f0d6c4 bgpd: Replace magic number 1 for TTL to BGP_DEFAULT_TTL
For readability and maintainability purposes.

Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2019-11-27 10:48:17 +02:00
Donatas Abraitis
0e35025eb4 bgpd: Use BGP_NOTIFY_SUBCODE_UNSPECIFIC value for bgp_notify_send() where 0
Just a code cleanup to keep the code consistent.

Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2019-11-10 17:54:37 +02:00
Lou Berger
ef5307f23f
Merge pull request #4861 from NaveenThanikachalam/logs
BGP: Rectifying the log messages.
2019-09-17 11:33:43 -04:00
Naveen Thanikachalam
4cb5e18ba5 BGP: Rectifying the log messages.
This change addresses the following:
1) Ensures logs under DEBUG macro checks are categorized
   as zlog_debug instead of zlog_info.
2) Error logs are categorized as zlog_err instead of zlog_info.
3) Rephrasing certain logs to make them appear more intuitive.

Signed-off-by: NaveenThanikachalam <nthanikachal@vmware.com>
2019-09-09 22:59:22 -07:00
Quentin Young
1ce14168b3
Merge pull request #4809 from martonksz/master
bgpd: hook for bgp peer status change events
2019-09-09 10:55:00 -04:00
Donald Sharp
11d443f591
Merge pull request #4925 from ddutt/master
bgpd: Fixes to error message printed for failed peerings
2019-09-03 20:36:53 -04:00
Dinesh G Dutt
05912a17e6 bgpd: Fixes to error message printed for failed peerings
There was a silly bug introduced when the command to show failed sessions
was added. A missing "," caused the wrong error message to be printed.
Debugging this led down a path that:
   - Led to discovering one more error message that needed to be added
   - Providing the error code along with the string in the JSON output
     to allow programs to key off numbers rather than strings.
   - Fixing the missing ","
   - Changing the error message to "Waiting for Peer IPv6 LLA" to
     make it clear that we're waiting for the link local addr.

Signed-off-by: Dinesh G Dutt <5016467+ddutt@users.noreply.github.com>
2019-09-03 19:55:49 +00:00
David Lamparter
00dffa8cde lib: add frr_with_mutex() block-wrapper
frr_with_mutex(...) { ... } locks and automatically unlocks the listed
mutex(es) when the block is exited.  This adds a bit of safety against
forgetting the unlock in error paths & co. and makes the code a slight
bit more readable.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2019-09-03 17:15:17 +02:00
Dinesh G Dutt
3577f1c54f bgpd: Add a new command to only show failed peerings
In a data center, having 32-128 peers is not uncommon. In such a situation, to find a
peer that has failed and why is several commands. This hinders both the automatability of
failure detection and the ease/speed with which the reason can be found. To simplify this
process of catching a failure and its cause quicker, this patch does the following:

1. Created a new function, bgp_show_failed_summary to display the
   failed summary output for JSON and vty
2. Created a new function to display the reset code/subcode. This is now used in the
   failed summary code and in the show neighbors code
3. Added a new variable failedPeers in all the JSON outputs, including the vanilla
   "show bgp summary" family. This lists the failed session count.
4. Display peer, dropped count, estd count, uptime and the reason for failure as the
   output of "show bgp summary failed" family of commands
5. Added three resset codes for the case where we're waiting for NHT, waiting for peer
   IPv6 addr, waiting for VRF to init.

This also counts the case where only one peer has advertised an AFI/SAFI.

The new command has the optional keyword "failed" added to the classical summary command.

The changes affect only one existing output, that of "show [ip] bgp neighbors <nbr>". As
we track the lack of NHT resolution for a peer or the lack of knowing a peer IPv6 addr,
the output of that command will show a "waiting for NHT" etc. as the last reset reason.

This patch includes update to the documentation too.

Signed-off-by: Dinesh G Dutt <5016467+ddutt@users.noreply.github.com>
2019-09-02 14:21:44 +00:00
vivek
e2d3a90954 bgpd: Fix nexthop reg for IPv4 route exchange using GUA IPv6 peering
In the case of IPv4 route exchange using GUA IPv6 peering, the route install
into the FIB involves mapping the immediate next hop to an IPv4 link-local
address and installing neighbor entries for this next hop address. To
accomplish the latter, IPv6 Router Advertisements are exchanged (the next hop
or peer must also have this enabled) and the RAs are dynamically initiated
based on next hop resolution.

However, in the case of a passive connection where the local system has not
initiated anything, no NHT entry is created for the peer, hence RAs were not
getting triggered. Address this by ensuring that a NHT entry is created even
in this situation. This is done at the time the connection becomes established
because the code has other assumptions that a NHT entry will be present only
for the "configured" peer. The API to create the entry ensures there are
no duplicates.

Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by:   Donald Sharp <sharpd@cumulusnetworks.com>
2019-08-18 22:12:06 -07:00
Marton Kun-Szabo
7d8d0eabb4 bgpd: hook for bgp peer status change events
Generally available hook for plugging application-specific
code in for bgp peer change events.

This hook (peer_status_changed) replaces the previous, more
specific 'peer_established' hook with a more general-purpose one.
Also, 'bgp_dump_state' is now registered under this hook.

Signed-off-by: Marton Kun-Szabo <martonk@amazon.com>
2019-08-13 11:59:27 -07:00
David Lamparter
584470fb5f bgpd: add & use bgp packet dump hook
The MRT dump code is already hooked in at the right places to write out
packets;  the BMP code needs exactly the same access so let's make this
a hook.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2019-07-03 16:58:26 +02:00
Donald Sharp
1cfe005d0c bgpd: Update an fsm debug message
When debugging I was having a hard time correlating some data and noticed that
a particular debug was not being very useful.

Signed-off-by: Donald Sharp <sharpd@cumulusnstworks.com>
2019-05-28 18:10:26 -04:00
Philippe Guibert
b83a6e054c bgpd: do not unregister bfd session when bgp session goes down
This commit fixes a previous commit:
"bfdd: remove operational bfd sessions from remote daemons"
where the handling of unregister call triggers the deletion of bfd
session.
Actually, the BFD session should not be deleted, while bgp session is
configured with BGP. this permits to receive BFD events up, and permit
quicker reconnecion.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2019-05-14 16:50:01 +02:00
Philippe Guibert
fc04a6778e bgpd: improve reconnection mechanism by cancelling connect timers
if bfd comes back up, and a bgp reconnection is in progress, theorically
it should be necessary to wait for the end of the reconnection process.
however, since that reconnection process may take some time, update the
fsm by cancelling the connect timer. This done, one just have to call
the start timer.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2019-04-18 16:11:51 +02:00
Donald Sharp
dded74d578 bgpd: Don't prevent views from being able to connect
Views are perfectly valid and should be allowed to connect.
In a bgp instance scenario the vrf_id will always be UNKNOWN,
so allow it.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
2019-03-06 11:35:58 -05:00
root
36dc75886d bgpd: Creating Loopback Interface Flaps BGPd (#2865)
* The function bgp_router_id_zebra_bump() will check for active bgp
  peers before chenging the router ID.
  If there are established peers, router ID is not modified
  which prevents the flapping of established peer connection

* Added field in bgp structure to store the count of established peers

Signed-off-by: kssoman <somanks@vmware.com>
2018-11-19 04:35:32 -08:00
Don Slice
5742e42b98 bgpd: make name of default vrf/bgp instance consistent
Problems were reported with the name of the default vrf and the
default bgp instance being different, creating confusion.  This
fix changes both to "default" for consistency.

Ticket: CM-21791
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: CCR-7658
Testing: manual testing and automated tests before pushing
2018-10-31 06:20:37 -04:00
David Lamparter
0437e10517 *: spelchek
Signed-off-by: David Lamparter <equinox@diac24.net>
2018-10-25 20:10:57 +02:00
Donald Sharp
19bd3dffc1 bgpd: Do a bit better job of tracking the bgp->peerhash
When we add/remove peers we need to do a bit better job
of tracking them in the bgp->peerhash.

1) When we have the doppelganger take over, make sure the
winner is the one represented in the peerhash.

2) When creating the doppelganger, leave the current one
in place instead of blindly replacing it.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
2018-10-07 20:55:52 -04:00
Donald Sharp
9bf904cc8b bgpd: Try to notice when configuration changes during startup
During peer startup there exists the possibility that both
locally and remote peers try to start communication at the
same time.  In addition it is possible for local configuration
to change at the same time this is going on.  When this happens
try to notice that the remote peer may be in opensent or openconfirm
and if so we need to restart the connection from both sides.

Additionally try to write a bit of extra code in peer_xfer_conn
to notice when this happens and to emit a error message to
the end user about this happening so that it can be cleaned up.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
2018-10-01 10:58:06 -04:00
Quentin Young
1c50c1c0d6 *: style for EC replacements
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2018-09-13 19:38:57 +00:00
Quentin Young
450971aa99 *: LIB_[ERR|WARN] -> EC_LIB
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2018-09-13 19:34:28 +00:00
Quentin Young
e50f7cfdbd bgpd: BGP_[WARN|ERR] -> EC_BGP
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2018-09-13 18:51:04 +00:00
Quentin Young
09c866e34d *: rename ferr_zlog -> flog_err_sys
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2018-08-14 20:02:05 +00:00
Quentin Young
af4c27286d *: rename zlog_fer -> flog_err
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2018-08-14 20:02:05 +00:00
Donald Sharp
02705213b1 bgpd: Convert to using LIB_ERR_XXX where possible
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
2018-08-14 20:02:05 +00:00
Don Slice
14454c9fdd bgpd: implement zlog_ferr facility for enhance error messages in bgp
Signed-off-by: Don Slice <dslice@cumulusnetworks.com<
2018-08-14 20:02:05 +00:00
Pascal Mathis
b90a8e13ee
bgpd: Implement group-overrides for peer timers
This commit implements BGP peer-group overrides for the timer flags,
which control the value of the hold, keepalive, advertisement-interval
and connect connect timers. It was kept separated on purpose as the
whole timer implementation is quite complex and merging this commit
together with with the other flag implementations did not seem right.

Basically three new peer flags were introduced, namely
*PEER_FLAG_ROUTEADV*, *PEER_FLAG_TIMER* and *PEER_FLAG_TIMER_CONNECT*.
The overrides work exactly the same way as they did before, but
introducing these flags made a few conditionals simpler as they no
longer had to compare internal data structures against eachother.

Last but not least, the test suite has been adjusted accordingly to test
the newly implemented flag overrides.

Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
2018-06-14 18:55:30 +02:00
Donald Sharp
c42eab4bf5 bgpd: Respect ability to reach nexthop if available
When bgp is thinking about opening a connection to a peer,
if we are connected to zebra, allow that to influence our
decision to start the connection.

Found Scenario:

Both bgp and zebra are started up at the same time.  Zebra is
being used to create the connected route through which bgp
will establish a peering relationship.  The machine is a
bit loaded due to other startup conditions and as such bgp
gets to the connection stage here before zebra has installed
the route.  If bgp does not respect zebra data when it does
have a connection then we will attempt to connect.  The
connect will fail because there is no route.  At that time
we will go into the connect timeout(2 minutes) and delay
connection.

What this does.  If we have established a zebra connection and
we do not have a clear path to the destination at this point
do not allow the connection to proceed.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
2018-05-11 07:46:43 -04:00
Donald Sharp
54ff5e9b02 bgpd: Cleanup messages from getsockopt
The handling of the return codes for getsockopt was slightly wrong.

getsockopt returns -1 on error and errno is set.
What to do with the return code at that point is dependent
on what sockopt you are asking about.  In this case
status holds the error returned for SO_ERROR.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
2018-05-11 07:34:24 -04:00
G. Paul Ziemba
960035b2d9 bgpd: nexthop tracking with labels for vrf-vpn leaking
Routes that have labels must be sent via a nexthop that also has labels.
This change notes whether any path in a nexthop update from zebra contains
labels. If so, then the nexthop is valid for routes that have labels.

If a nexthop update has no labeled paths, then any labeled routes
referencing the nexthop are marked not valid.

Add a route flag BGP_INFO_ANNC_NH_SELF that means "advertise myself
as nexthop when announcing" so that we can track our notion of the
nexthop without revealing it to peers.

Signed-off-by: G. Paul Ziemba <paulz@labn.net>
2018-04-04 10:00:23 -07:00
Quentin Young
d7c0a89a3a
*: use C99 standard fixed-width integer types
The following types are nonstandard:
- u_char
- u_short
- u_int
- u_long
- u_int8_t
- u_int16_t
- u_int32_t

Replace them with the C99 standard types:
- uint8_t
- unsigned short
- unsigned int
- unsigned long
- uint8_t
- uint16_t
- uint32_t

Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2018-03-27 15:13:34 -04:00
Donald Sharp
5410015a79 bgpd: peer->bgp must be non NULL
We lock and set peer->bgp at peer creation and only
remove it at deletion.  Therefore these tests are
not needed.

Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
2018-03-20 19:09:06 -04:00
Lou Berger
996c93142d *: conform with COMMUNITY.md formatting rules, via 'make indent'
Signed-off-by: Lou Berger <lberger@labn.net>
2018-03-06 14:04:32 -05:00
Philippe Guibert
f62abc7d65 bgpd: do not start BGP VRF peer connection, if VRF not unknown
Upon starting a BGP VRF instance, the server socket is not created,
because the VRF ID is not known, and then underlying VRF backend is not
ready yet. Because of that, the peer connection attempt will not be
started before.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2018-02-27 11:11:24 +01:00
Philippe Guibert
61cf4b3715 bgpd: bgp support for netns
The change contained in this commit does the following:
- discovery of vrf id from zebra daemon, and adaptation of bgp contexts
  with BGP.
  The list of network addresses contain a reference to the bgp context
  supporting the vrf.
  The bgp context contains a vrf pointer that gives information about
  the netns path in case the vrf is a netns path.

Only some contexts are impacted, namely socket creation, and retrieval
of local IP settings. ( this requires vrf identifier).

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2018-02-27 11:11:24 +01:00
Russ White
2ed7e4c3c3
Merge pull request #1591 from qlyoung/bgpd-ringbuf
bgpd: use ring buffer for network input
2018-01-10 19:59:24 -05:00
Quentin Young
0112e9e0b9
bgpd: use atomic_* ops on _Atomic variables
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2018-01-09 15:40:48 -05:00
Quentin Young
74ffbfe6fe
bgpd: use ring buffer for network input
The multithreading code has a comment that reads:
"XXX: Heavy abuse of stream API. This needs a ring buffer."

This patch makes the relevant code use a ring buffer.

Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
2018-01-03 14:35:11 -05:00