The idea is to drop unwanted attributes from the BGP UPDATE messages and
continue by just ignoring them. This improves the security, flexiblity, etc.
This is the command that Cisco has also.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
For now, if the order was mixed, most of the commands are just silently
ignored. Let the operator notice that.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Some keys are only present in the JSON data of BGP neighbors are only present if the peer is, or has previously been established.
While they are not present if the peer has never come up.
To keep the data structure aligned, the below keys are added also to the neighbors that BGP adjacency has never been established.
Values of the keys are all set to Unknown
hostname:Unknown,
nexthop:Unknown,
nexthopGlobal:Unknown,
nexthopLocal:Unknown,
bgpConnection:Unknown,
Signed-off-by: Karl Quan <kquan@nvidia.com>
When actually creating a peer in BGP, tell the creation if
it is a config node or not. There were cases where the
CONFIG_NODE was being set *after* being placed into
the bgp->peerhash, thus causing collisions between the
doppelganger and the peer and eventually use after free's.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Use %pI4/%pI6 where possible, otherwise at least atjust stack buffer sizes
for inet_ntop() calls.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
currently the following configuration
dut:
!
interface ntfp2
ip router isis 1
!
router bgp 200
no bgp ebgp-requires-policy
bgp confederation identifier 300
bgp confederation peers 300
neighbor 192.168.1.1 remote-as 100
neighbor 192.168.2.2 remote-as 300
!
address-family ipv4 unicast
neighbor 192.168.2.2 default-originate
exit-address-family
!
router isis 1
is-type level-2-only
net 49.0001.0002.0002.0002.00
redistribute ipv4 connected level-2
!
end
router:
!
interface ntfp2
ip router isis 1
isis circuit-type level-2-only
!
router bgp 300
no bgp ebgp-requires-policy
bgp confederation identifier 300
bgp confederation peers 200
neighbor 192.168.2.1 remote-as 200
neighbor 192.168.3.2 remote-as 400
!
address-family ipv4 unicast
network 3.3.3.0/24
exit-address-family
!
router isis 1
is-type level-2-only
net 49.0001.0003.0003.0003.00
redistribute ipv4 connected level-2
!
end
on dut result of show bgp ipv4 unicast command is:
show bgp ipv4 unicast
BGP table version is 1, local router ID is 192.168.2.1, vrf id 0
Default local pref 100, local AS 200
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> 1.1.1.0/24 192.168.1.1 0 0 100 i
instead of
sho bgp ipv4 unicast
BGP table version is 3, local router ID is 192.168.2.1, vrf id 0
Default local pref 100, local AS 200
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> 1.1.1.0/24 192.168.1.1 0 0 100 i
*> 3.3.3.0/24 192.168.2.2 0 100 0 (300) i
*> 4.4.4.0/24 192.168.3.2 0 100 0 (300) 400 i
Displayed 3 routes and 3 total paths
According to RFC 5065:the usage of one of the member AS number as the
confederation identifier is not forbidden.
fixes are the following
in bgp_route.c:
in bgp_update remove the test for presence of confederation id in
as_path since, this case is allowed;
in bgp_vty.c
bgp_confederation_peers, remove the test on peer as value
in bgpd.c
bgp_confederation_peers_add
remove the test on peer as value
invert the order of setting peer->sort value and peer->local_as,
since peer->sort is depending from current peer->local_as value
bgp_confederation_peers_remove
invert the order of setting peer->sort value and peer->local_as,
since peer->sort is depending from current peer->local_as value
Signed-off-by: Francois Dumontet <francois.dumontet@6wind.com>
Previously BGP supported up to 255 SIDs.
The PR https://github.com/FRRouting/frr/pull/11981 extended the
transposition computation algorithm in BGP to support more SIDs (up to
1048575 SIDs).
However the BGP VTY command for allocating an SRv6 per-VRF SID
(`sid vpn per-vrf export`) is still limited to 255 SIDs.
This commit extends the SID index in `sid vpn per-vrf export` VTY
command to support up to 1048575 SIDs.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
We already have a global knob for graceful-shutdown, but it's handy having
per neighbor knob as well.
Especially when a single neighbor needs to be restarted/shutdown gracefuly.
We can do this route-maps, but this is a faster/cleaner way doing the same
for an operator.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Simulated latency with:
```
tc qdisc add dev eth3 root netem delay 100ms
```
```
donatas-laptop# sh ip bgp summary failed
IPv4 Unicast Summary (VRF default):
BGP router identifier 192.0.2.252, local AS number 65000 vrf-id 0
BGP table version 28
RIB entries 0, using 0 bytes of memory
Peers 1, using 724 KiB of memory
Neighbor EstdCnt DropCnt ResetTime Reason
192.168.10.65 2 2 00:00:17 Admin. shutdown (RTT)
Displayed neighbors 1
Total number of neighbors 1
donatas-laptop#
```
Another end received:
```
%NOTIFICATION: received from neighbor 192.168.10.17 6/2 (Cease/Administrative Shutdown) "shutdown due to high round-trip-time (104ms > 5ms, hit 21 times)"
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
`srv6_locator_chunk_free()` is a wrapper around the `XFREE()` macro.
Passing a NULL pointer to `XFREE()` is safe. Therefore, checking that
the pointer passed to the `srv6_locator_chunk_free()` is not null is
unnecessary.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
`srv6_locator_chunk_free()` takes care of freeing the memory allocated
for a `struct srv6_locator_chunk` and setting the
`struct srv6_locator_chunk` pointer to NULL.
It is not necessary to explicitly set the pointer to NULL after invoking
`srv6_locator_chunk_free()`.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
A programmer can use the `srv6_locator_chunk_free()` function to free
the memory allocated for a `struct srv6_locator_chunk`.
The programmer invokes `srv6_locator_chunk_free()` by passing a single
pointer to the `struct srv6_locator_chunk` to be freed.
`srv6_locator_chunk_free()` uses `XFREE()` to free the memory.
It is the responsibility of the programmer to set the
`struct srv6_locator_chunk` pointer to NULL after freeing memory with
`srv6_locator_chunk_free()`.
This commit modifies the `srv6_locator_chunk_free()` function to take a
double pointer instead of a single pointer. In this way, setting the
`struct srv6_locator_chunk` pointer to NULL is no longer the
programmer's responsibility but is the responsibility of
`srv6_locator_chunk_free()`. This prevents programmers from making
mistakes such as forgetting to set the pointer to NULL after invoking
`srv6_locator_chunk_free()`.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
Rather than running selected source files through the preprocessor and a
bunch of perl regex'ing to get the list of all DEFUNs, use the data
collected in frr.xref.
This not only eliminates issues we've been having with preprocessor
failures due to nonexistent header files, but is also much faster.
Where extract.pl would take 5s, this now finishes in 0.2s. And since
this is a non-parallelizable build step towards the end of the build
(dependent on a lot of other things being done already), the speedup is
actually noticeable.
Also files containing CLI no longer need to be listed in `vtysh_scan`
since the .xref data covers everything. `#ifndef VTYSH_EXTRACT_PL`
checks are equally obsolete.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Add a default limit to the InQ for messages off the bgp peer
socket. Make the limit configurable via cli.
Adding in this limit causes the messages to be retained in the tcp
socket and allow for tcp back pressure and congestion control to kick
in.
Before this change, we allow the InQ to grow indefinitely just taking
messages off the socket and adding them to the fifo queue, never letting
the kernel know we need to slow down. We were seeing under high loads of
messages and large perf-heavy routemaps (regex matching) this queue
would cause a memory spike and BGP would get OOM killed. Modifying this
leaves the messages in the socket and distributes that load where it
should be in the socket buffers on both send/recv while we handle the
mesages.
Also, changes were made to allow the ringbuffer to hold messages and
continue to be filled by the IO pthread while we wait for the Main
pthread to handle the work on the InQ.
Memory spike seen with large numbers of routes flapping and route-maps
with dozens of regex matching:
```
Memory statistics for bgpd:
System allocator statistics:
Total heap allocated: > 2GB
Holding block headers: 516 KiB
Used small blocks: 0 bytes
Used ordinary blocks: 160 MiB
Free small blocks: 3680 bytes
Free ordinary blocks: > 2GB
Ordinary blocks: 121244
Small blocks: 83
Holding blocks: 1
```
With most of it being held by the inQ (seen from the stream datastructure info here):
```
Type : Current# Size Total Max# MaxBytes
...
...
Stream : 115543 variable 26963208 15970740 3571708768
```
With this change that memory is capped and load is left in the sockets:
RECV Side:
```
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 265350 0 [fe80::4080:30ff:feb0:cee3]%veth1:36950 [fe80::4c14:9cff:fe1d:5bfd]:179 users:(("bgpd",pid=1393334,fd=26))
skmem:(r403688,rb425984,t0,tb425984,f1816,w0,o0,bl0,d61)
```
SEND Side:
```
State Recv-Q Send-Q Local Address:Port Peer Address:Port Process
ESTAB 0 1275012 [fe80::4c14:9cff:fe1d:5bfd]%veth1:179 [fe80::4080:30ff:feb0:cee3]:36950 users:(("bgpd",pid=1393443,fd=27))
skmem:(r0,rb131072,t0,tb1453568,f1916,w1300612,o0,bl0,d0)
```
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Ensure that un-configuring allowas-in for a peer or group
clears the related flags and integer value. Tighten the use
of the integer counter so that it's only used when the config
flag is set. Add show output if allowas-in is enabled.
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
The command `sid vpn per-vrf export (1-255)|auto` can be used to export
IPv4 and IPv6 routes from a VRF to the VPN RIB using a single SRv6 SID
(End.DT46 behavior).
This commit implements the no form of the above command, which can be
used to disable the export of the IPv4/IPv6 routes:
`no sid vpn per-vrf export`.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
This commit adds the per-VRF SID chosen to advertise L3VPN for IPv4 and IPv6 address families using a single SID to the bgpd configuration.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
In the current implementation of bgpd, SRv6 SIDs can be configured only
under the address-family. This enables bgpd to leak IPv6 routes using
an SRv6 End.DT6 behavior and IPv4 routes using an SRv6 End.DT4
behavior. It is not possible to leak both IPv6 and IPv4 routes using a
single SRv6 SID.
This commit adds a new CLI command
"sid vpn per-vrf export <sid_idx|auto>" that enables bgpd to leak both
IPv6 and IPv4 routes using a single SRv6 SID (End.DT46 behavior).
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
When an SRv6 locator is unset, all the SRv6 SIDs allocated from the
locator are removed. Before freeing the memory allocated for an SRv6
SID, we check if the pointer to the SID is `NULL`.
However, checking for `NULL` before freeing memory is useless.
This PR aims to improve the code's readability by removing the
useless `NULL` checks.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
In order to send correct SRv6 L3VPN advertisement, we need to save
srv6_locator_chunk in vpn_policy. With this information, we can
construct correct SRv6 L3VPN advertisement packets.
Signed-off-by: Ryoga Saito <ryoga.saito@linecorp.com>
As an example, Arista EOS allows this behavior.
Configuration something like:
```
neighbor PG peer-group
neighbor PG remote-as 65001
neighbor PG local-as 65001
neighbor 192.168.10.124 peer-group PG
```
Or without peer-group.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
RFC4364 describes peerings between multiple AS domains, to ease
the continuity of VPN services across multiple SPs. This commit
implements a sub-set of IETF option b) described in chapter 10 b.
The ASBR to ASBR approach is taken, with an EBGP peering between
the two routers. The EBGP peering must be directly connected to
the outgoing interface used. In those conditions, the next hop
is directly connected, and there is no need to have a transport
label to convey the VPN label. A new vty command is added on a
per interface basis:
This command if enabled, will permit to convey BGP VPN labels
without any transport labels (i.e. with implicit-null label).
restriction:
this command is used only for EBGP directly connected peerings.
Other use cases are not covered.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
TCP keepalive is enabled once BGP connection is established.
New vty commands:
bgp tcp-keepalive <1-65535> <1-65535> <1-30>
no bgp tcp-keepalive
Signed-off-by: Xiaofeng Liu <xiaofeng.liu@6wind.com>
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Let's convert to our actual library call instead
of using yet another abstraction that makes it fun
for people to switch daemons.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Running `bgp_srv6l3vpn_to_bgp_vrf` and `bgp_srv6l3vpn_to_bgp_vrf2`
topotests with `--valgrind-memleaks` gives several memory leak errors.
This is due to the way SRv6 locators are removed/unset in bgpd: when
an SRv6 locator is deleted or unset, the memory allocated for the
locator prefix (`tovpn_sid_locator`) is not freed.
This patch adds a `for` loop that iterates over the list of BGP
instances. For each BGP instance using the SRv6 locator to be
removed/unset, we use `XFREE()` to properly free the memory allocated
for `tovpn_sid_locator` after the SRv6 locator is removed or unset.
The memory allocated for `tovpn_sid_locator` cannot be freed before
calling `vpn_leak_postchange_all()`. This is because
after deleting an SRv6 locator, we call `vpn_leak_postchange_all()`
to handle the SRv6 locator deletion and send a BGP Prefix SID withdraw
message. `tovpn_sid_locator` is required to properly build the BGP
Prefix SID withdraw message. After calling `vpn_leak_postchange_all()`
we can safely remove the `tovpn_sid_locator` and free the allocated
memory.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
Running `bgp_srv6l3vpn_to_bgp_vrf` and `bgp_srv6l3vpn_to_bgp_vrf2`
topotests with `--valgrind-memleaks` gives several memory leak errors.
This is due to the way SRv6 SIDs are removed in bgpd: when
an SRv6 locator is deleted/unset, all the SIDs allocated from that
locator are removed from the SRv6 functions list
(`bgp->srv6_functions`),but the memory allocated for the SIDs is not
freed.
This patch adds a call to `XFREE()` to properly free the allocated
memory when an SRv6 SID is removed.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
Running `bgp_srv6l3vpn_to_bgp_vrf` and `bgp_srv6l3vpn_to_bgp_vrf2`
topotests with `--valgrind-memleaks` gives several memory leak errors.
This is due to the way SRv6 locators are deleted/unset in bgpd: when
an SRv6 locator is deleted/unset, all the chunks of the locator are
removed from the SRv6 locator chunks list (`bgp->srv6_locator_chunks`).
However, the memory allocated for the chunks is not freed.
This patch adds a call to the `srv6_locator_chunk_free()` function to
properly free the allocated memory when an SRv6 locator is removed or
unset.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
BGP SoO is a tag that is appended on BGP updates to allow a peer to mark
a particular peer as belonging to a particular site. In certain MPLS L3 VPN
configurations, the BGP AS-Path may not provide the granularity needed
prevent a loop in the control-plane. With this in mind, BGP SoO is designed
to fill this gap and prevent a routing loop that may occur.
If we configure for example, `neighbor soo 65000:1` at PEs, routes won't be
announced between CPEs if soo matches. This is especially needed when using
as-override or allowas-in.
Also, this is the automated way of the same behavior as configuring route-maps
for each peer like:
```
bgp extcommunity-list cpe permit soo 65000:1
!
route-map cpe permit 10
set extcommunity soo 65000:1
...
route-map cpe deny 10
match extcommunity cpe
route-map cpe permit 20
...
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
When using bgp_vty_afi_from_str it can
return AFI_MAX( but in practice never will with
our cli ). In bgp_default_afi_safi_cmd the code
directly references:
bgp->default_afi[afi][safi] = TRUE;
and if afi is AFI_MAX FRRR would be accessing
memory where it should not be.
Let's just provide some assurances for coverity
that this never happens.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Just convert all uses of thread_cancel to THREAD_OFF. Additionally
use THREAD_ARG instead of t->arg to get the arguement. Individual
files should never be accessing thread private data like this.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Let's just use THREAD_OFF consistently in the code base
instead of each daemon having a special macro that needs to
be looked at and remembered what it does.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
currently "show bgp summary" and "sho bgp summary wide" commands
provide a description string until a whitespace is occuring this
respectively with size limits of 20 and 60 chars
now theses two commands are providing strings with all
characters until the last witespace before size limit
Signed-off-by: Francois Dumontet <francois.dumontet@6wind.com>
A new command is available under SAFI_MPLS_VPN:
With this command, the BGP vpnvx prefixes received are
not kept, if there are no VRF interested in importing
those vpn entries.
A soft refresh is performed if there is a change of
configuration: retain cmd, vrf import settings, or
route-map change.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
These values were named WITHDRAW and UPDATE. Yeah, you guessed it, those
are already #define's elsewhere (bgp_debug.h). Hilarity ensues.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
Just adding a support for peer-groups, because now it's not possible to
configure BGP role for peer-groups.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
The default keepalive/hold timers are always exposed via this commit:
```
commit 9b1b96233d (origin/bgp_timer_always_on)
Author: Trey Aspelund <taspelund@nvidia.com>
Date: Mon Jun 27 23:20:33 2022 +0000
bgpd: always display keepalive/hold intervals
`show bgp neighbors <peer> [json]` was only displaying the configured
keepalive and holdtime intervals when they differed from the default
values. Since default config is still config, let's make sure these
values are always displayed.
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
```
However it mistakenly changed the logic to only display the peer's
timers if the configured value was non-zero. This updates the logic to
check PEER_FLAG_TIMER to determine if the values were configured,
given 0 is a valid value (to disable keepalives).
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
`show bgp neighbors <peer> [json]` was only displaying the configured
keepalive and holdtime intervals when they differed from the default
values. Since default config is still config, let's make sure these
values are always displayed.
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
The command `debug bgp allow-martian` is not actually
a debug command it's a command that when entered allows
bgp to not reset a peering when a martian nexthop is
passed in the nlri.
Add the `bgp allow-martian-nexthop` command and allow it to be
used.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When a peer has not established connection yet, these values:
`hostLocal`, `portLocal`, `hostForeign`, `portForeign` might
not have any values and json output will not display anything
for them. Modify the code to display some nominal values in
this situation so that parsers are not surprised.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>