Problem Statement:
=================
In scale setup BGP sessions start flapping.
RCA:
====
In virtualized environment there are multiple places where
MTU need to be set. If there are some places were MTU is not set
properly then there is chances that BGP packets get fragmented,
in scale setup this will lead to BGP session flap.
Fix:
====
A new tcp option is provided as part of this implementation,
which can be configured per neighbor and helps to set the TCP
max segment size. User need to derive the path MTU between the BGP
neighbors and set that value as part of tcp-mss setting.
1. CLI Configuration:
[no] neighbor <A.B.C.D|X:X::X:X|WORD> tcp-mss (1-65535)
2. Running config
frr# show running-config
router bgp 100
neighbor 198.51.100.2 tcp-mss 150 => new entry
neighbor 2001:DB8::2 tcp-mss 400 => new entry
3. Show command
frr# show bgp neighbors 198.51.100.2
BGP neighbor is 198.51.100.2, remote AS 100, local AS 100, internal link
Hostname: frr
Configured tcp-mss is 150, synced tcp-mss is 138 => new display
4. Show command json output
frr# show bgp neighbors 2001:DB8::2 json
{
"2001:DB8::2":{
"remoteAs":100,
"bgpTimerKeepAliveIntervalMsecs":60000,
"bgpTcpMssConfigured":400, => new entry
"bgpTcpMssSynced":388, => new entry
Risk:
=====
Low - This is a config driven feature and it sets the max segment
size for the TCP session between BGP peers.
Tests Executed:
===============
Have done manual testing with three router topology.
1. Executed basic config and un config scenarios
2. Verified if the config is updated in running config
during config and no config operation
3. Verified the show command output in both CLI format and
JSON format.
4. Verified if TCP SYN messages carry the max segment size
in their initial packets.
5. Verified the behaviour during clear bgp session.
6. done packet capture to see if the new segment size
takes effect.
Signed-off-by: Abhinay Ramesh <rabhinay@vmware.com>
FRR in thread.c clears the passed in double pointer when
we pull it off the ready queue and pass it back to
the calling function via thread_fetch().
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The new LL code in:
8761cd6ddb
Introduced the idea of the bgp unnumbered peers using interface up/down
events to track the bgp peers nexthop. This code was not properly
working when a connection was received from a peer in some circumstances.
Effectively the connection from a peer was immediately skipping state transitions
and FRR was never properly tracking the peers nexthop. When we receive the
connection attempt, let's track the nexthop now.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Description:
BGP session not established for ipv6 link local address with vrf config
Problem Description/Summary :
BGP session not established for ipv6 link local address with vrf configyy
1.Configure ipv6 link-local address fe80::1234/64 on dut1 and fe80::4567/64 on dut2
2.Configure BGP neighbors for ipv6 link-local on both dut1 and dut2
3.Verify BGP session is UP over link-local ipv6 address
4.Observed that bgp session not established for ipv6 link local address
Expected Behavior :
BGP session should be established for ipv6 link local address with vrf config
Signed-off-by: sudhanshukumar22 <sudhanshu.kumar@broadcom.com>
When you use a single BGP session for both IPv4 and IPv6 it's a bit
annoying going into ipv6 address-family and explicitly activating it.
Let's get this automatically if enabled with `bgp default ipv6-unicast`.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
The return from sockunion2hostprefix tells us if the conversion
succeeded or not. There are places in the code where we
always assume that it just `works`, since it can fail
notice and try to do the right thing.
Please note that failure of this function for most cases
of sockunion2hostprefix is highly highly unlikely as that
the sockunion was already created and tested elsewhere
it's just that this function can fail.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When setting authentication on a BGP peer in a VRF the listener is
looked up from a global list. However there is no check that the
listener is the one associated with the VRF being configured. This
can result in the wrong listener beiong configured with a password,
leaving the intended listener in an open authentication state.
To simplify this lookup stash a pointer to the bgp instance in
the listener on creating (in the same way as is done for NS-based
VRFS).
Signed-off-by: Pat Ruddy <pat@voltanet.io>
* Fixed integration in FSM and packet handling.
* Added CLI "show" output, incl. JSON.
* For review and testing only.
Signed-off-by: David Schweizer <dschweizer@opensourcerouting.org>
Remove mid-string line breaks, cf. workflow doc:
.. [#tool_style_conflicts] For example, lines over 80 characters are allowed
for text strings to make it possible to search the code for them: please
see `Linux kernel style (breaking long lines and strings)
<https://www.kernel.org/doc/html/v4.10/process/coding-style.html#breaking-long-lines-and-strings>`_
and `Issue #1794 <https://github.com/FRRouting/frr/issues/1794>`_.
Scripted commit, idempotent to running:
```
python3 tools/stringmangle.py --unwrap `git ls-files | egrep '\.[ch]$'`
```
Signed-off-by: David Lamparter <equinox@diac24.net>
bgp_accept() gets called over and over again when a VRF device is
deleted out from under a bgp listener socket that is bound to it.
Prevent this by noting the error and cancelling ourselves, allowing the
vrf status code to clean up the mess when it receives word about the
change from Zebra.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Try to give a bit more useful data about where we
think the connection is trying to come in from.
Hopefully this will let us debug connection issues
a bit faster in cases where there are config issues.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We are crashing in thread_cancel on shutdown because
the thread pointer is NULL. Use the more appropriate
THREAD_CANCEL macro
Ticket: CM-29873
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The act of peer_sort() being called always set this value
even when we are just looking it up. We need to seperate
out the idea of lookup from set.
For those places that this is immediately obvious that
this is a lookup switch over to using this function.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
*Adding helper caller hooks function for signalling from BGPD
to ZEBRA to enable or disable GR feature in ZEBRA depending
on bgp per peer gr configuration.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
bgp tcp connection.
When the BGP peer is configured between two bgp routes both routers would create
peer structure , when they receive each other’s open message. In this event both
speakers, open duplicate TCP sessions and send OPEN messages on each socket
simultaneously, the BGP Identifier is used to resolve which socket should be closed.
If BGP GR is enabled the old tcp session is dumped and the new session is retained.
So while this transfer of connection is happening, if all the bgp gr config
is not migrated to the new connection, the new bgp gr mode will never get applied.
Fix Summary:
1. Replicate GR configuration from the old session to the new session in bgp_accept().
2. Replicate GR configuration from stub to full-fledged peer in bgp_establish().
3. Disable all NSF flags, clear stale routes (if present), stop restart & stale timers
(if they are running) when the bgp GR mode is changed to “Disabled”.
4. Disable R-bit in cap, if it is not set the received open message.
Signed-off-by: Biswajit Sadhu <sadhub@vmware.com>
Add -s X or --socket_size X to the bgp cli to allow
the end user to specify the outgoing bgp tcp kernel
socket buffer size.
It is recommended that this option is only used on
large scale operations.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Under high load instances with hundreds of thousands of prefixes this
could result in very unstable systems.
When maximum-prefix is set, but restart timer is not set then the session
flaps between Idle(Pfx) -> Established -> Idle(Pfx) states.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
When using the maximum-prefix restart option with a BGP peer,
if the peer exceeds the limit of prefixes, bgpd causes the
connection to be closed and sets a timer. It will not attempt
to connect to that peer until the timer expires. But if the
peer attempts to connect to it before the timer expires, it
accepts the connection and starts exchanging routes again.
When accepting a connection from a peer, reject the connection
if the max prefix restart timer is set.
Signed-off-by: Matthew Smith <mgsmith@netgate.com>
the vrf_id parameter is replaced by struct vrf * parameter.
this impacts most of the daemons that look for an interface based on the
name and the vrf identifier.
Also, it fixes 2 lookup calls in zebra and sharpd, where the vrf_id was
ignored until now.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The SO_MARK socket option was being used pre vrf to allow for the
separation of the front panel -vs- the management port. This
was facilitated by a ip rule. Since this is undocumented anywhere
in our system( other than old commits see
ed40466af8 ). We should remove this
because this will cause interference with people using rules
and are not aware of this offshoot of functionality.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Co-authored-by: Donald Sharp <sharpd@cumulusnetworks.com>
Co-authored-by: Quentin Young <qlyoung@cumulusnetworks.com>
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Now that all daemons receive the VRF backend from zebra, we can get
rid of vrf_is_mapped_on_netns() in favor of using the more convenient
vrf_is_backend_netns() function, which doesn't require any argument.
This commit also fixes the following problem:
debian(config)# ip route 50.0.0.0/8 blackhole vrf FAKE table 2
% table param only available when running on netns-based vrfs
Even when zebra was started with the --vrfwnetns, the error
above would be displayed since the VRF FAKE didn't exist, which
would make vrf_is_mapped_on_netns() return 0 incorrectly. Using
vrf_is_backend_netns() this problem doesn't happen anymore.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
if zebra is not started, then vrf identifiers are not available. This
prevents import/exportation to be available. This commit permits having
import/export available, even when zebra is not started.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Cleanup calls where we were passing in the su for
peer creation a tiny bit.
Creating a peer from the cli will always have a conf_if *or*
a su but not both. While a doppelganger will have both.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The peer->nexthop.ifp pointer must be set when parsing the
attributes in bgp_mp_reach_parse, notice this
and fail gracefully.
Rework bgp_nexthop_set to remove the HAVE_CUMULUS and to
fail the nexthop_set when we have a zebra connection and
no ifp pointer, as that not havinga zebra connection and
no ifp pointer is legal.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>