In some cases (large scale) it's desired to avoid changing configurations, but
let the BGP to automatically handle ASN changes.
`auto` means the peering can be iBGP or eBGP. It will be automatically detected
and adjusted from the OPEN message.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Instead of using 3 uint8_t variables under struct attr, let's use a single
uint8_t as the flags. Saving 2-bytes. Not a big deal, but it's even easier to
track EVPN-related flags/variables.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Add a new start option "-K" to libfrr to denote a graceful start,
and use it in zebra and bgpd.
zebra will use this option to denote a planned FRR graceful restart
(supporting only bgpd currently) to wait for a route sync completion
from bgpd before cleaning up old stale routes from the FIB. An optional
timer provides an upper-bounds for this cleanup.
bgpd will use this option to denote either a planned FRR graceful
restart or a bgpd-only graceful restart, and this will drive the BGP
GR restarting router procedures.
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
Introduce BGP-wide flags to denote if BGP has started gracefully
and GR is in progress or not. Use this for setting of the R-bit in
the GR capability, and not a timer which is set for any new
instance creation. Mark graceful restart is complete when the
deferred path selection has been done and route sync with zebra as
well as deferred EOR advertisement has been initiated.
Introduce a function to check on F-bit setting rather than just
base it on configuration.
Subsequent commits will extend these functionalities.
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
Streamline the BGP graceful-restart configuration at the global and
peer level some more. Similar to many other neighbor capability
parameters like MP and ENHE, reset the session immediately upon a
change to the configuration. This will be more aligned with the
transactional UI model also and will not require a separate 'clear'
command to be executed.
Note: Peer-group graceful-restart configuration is not yet supported.
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
Add support for a BGP-wide setting for graceful restart modes and
parameters. This setting will apply to all BGP peers across all BGP
instances, but per-neighbor configuration can override it.
Per-instance configuration is disallowed if the BGP-wide setting
is in effect.
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
Under heavy system load with many peers in passive mode and a large
number of routes, bgpd can enter an infinite loop. This occurs while
processing timeout BGP_OPEN messages, which prevents it from accepting
new connections. The following log entries illustrate the issue:
>bgpd[6151]: [VX6SM-8YE5W][EC 33554460] 3.3.2.224: nexthop_set failed, resetting connection - intf 0x0
>bgpd[6151]: [P790V-THJKS][EC 100663299] bgp_open_receive: bgp_getsockname() failed for peer: 3.3.2.224
>bgpd[6151]: [HTQD2-0R1WR][EC 33554451] bgp_process_packet: BGP OPEN receipt failed for peer: 3.3.2.224
... repeating
The issue occurs when bgpd handles a massive number of routes in the RIB
while receiving numerous BGP_OPEN packets. If bgpd is overloaded, it
fails to process these packets promptly, leading the remote peer to
close the connection and resend BGP_OPEN packets.
When bgpd eventually starts processing these timeout BGP_OPEN packets,
it finds the TCP connection closed by the remote peer, resulting in
"bgp_stop()" being called. For each timeout peer, bgpd must iterate
through the routing table, which is time-consuming and causes new
incoming BGP_OPEN packets to timeout, perpetuating the infinite loop.
To address this issue, the code is modified to check if the peer has
been established at least once before calling "bgp_clear_route_all()".
This ensures that routes are only cleared for peers that had a
successful session, preventing unnecessary iterations over the routing
table for peers that never established a connection.
With this change, BGP_OPEN timeout messages may still occur, but in the
worst case, bgpd will stabilize. Before this patch, bgpd could enter a
loop where it was unable to accpet any new connections.
Signed-off-by: Loïc Sang <loic.sang@6wind.com>
Fix for a bug, where FRR fails to install route received for an unknown but later-created VRF - detailed description can be found here https://github.com/FRRouting/frr/issues/13708
Signed-off-by: Piotr Suchy <psuchy@akamai.com>
RFC 8212 defines leak prevention for eBGP peers, but BGP-OAD defines a new
peering type One Administrative Domain (OAD), where multiple ASNs could be used
inside a single administrative domain. OAD allows sending non-transitive attributes,
so this prevention should be relaxed too.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
The backup_nexthop entry list has been populated by mistake,
and should not. Fix this by reverting the introduced behavior.
Fixes: 237ebf8d45 ("bgpd: rework bgp_zebra_announce() function, separate nexthop handling")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Before this patch, we always printed the last reason "Waiting for OPEN", but
if it's a manual shutdown, then we technically are not waiting for OPEN.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
In scaled EVPN + ipv4/ipv6 uni route sync to zebra,
some of the ipv4/ipv6 routes skipped reinstallation
due to incorrect local variable's stale value.
Once the local variable value reset in each loop
iteration all skipped routes synced to zebra properly.
Ticket: #3948828
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Use the name for when putting out debugs in bgp_zebra.c.
Additionally add an evpn flag for announce_route_actual.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When the packet is malformed it can use whatever values it wants. Let's check
what the real data we have in a stream instead of relying on malformed values.
Reported-by: Iggy Frankovic <iggyfran@amazon.com>
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
We advance data pointer (data++), but we do memcpy() with the length that is 1-byte
over, which is technically heap overflow.
```
==411461==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x50600011da1a at pc 0xc4f45a9786f0 bp 0xffffed1e2740 sp 0xffffed1e1f30
READ of size 4 at 0x50600011da1a thread T0
0 0xc4f45a9786ec in __asan_memcpy (/home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/.libs/bgpd+0x3586ec) (BuildId: e794c5f796eee20c8973d7efb9bf5735e54d44cd)
1 0xc4f45abf15f8 in bgp_dynamic_capability_fqdn /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3457:4
2 0xc4f45abdd408 in bgp_capability_msg_parse /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3911:4
3 0xc4f45abdbeb4 in bgp_capability_receive /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3980:9
4 0xc4f45abde2cc in bgp_process_packet /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:4109:11
5 0xc4f45a9b6110 in LLVMFuzzerTestOneInput /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_main.c:582:3
```
Found by fuzzing.
Reported-by: Iggy Frankovic <iggyfran@amazon.com>
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
If we receive CAPABILITY message (software-version), we SHOULD check if we really
have enough data before doing memcpy(), that could also lead to buffer overflow.
(data + len > end) is not enough, because after this check we do data++ and later
memcpy(..., data, len). That means we have one more byte.
Hit this through fuzzing by
```
0 0xaaaaaadf872c in __asan_memcpy (/home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/.libs/bgpd+0x35872c) (BuildId: 9c6e455d0d9a20f5a4d2f035b443f50add9564d7)
1 0xaaaaab06bfbc in bgp_dynamic_capability_software_version /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3713:3
2 0xaaaaab05ccb4 in bgp_capability_msg_parse /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3839:4
3 0xaaaaab05c074 in bgp_capability_receive /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:3980:9
4 0xaaaaab05e48c in bgp_process_packet /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:4109:11
5 0xaaaaaae36150 in LLVMFuzzerTestOneInput /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_main.c:582:3
```
Hit this again by Iggy \m/
Reported-by: Iggy Frankovic <iggyfran@amazon.com>
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
If we do:
```
bfd
profile foo
shutdown
```
The session is dropped, but immediately established again because we don't
have a proper check on BFD.
If BFD is administratively shutdown, ignore starting the session.
Fixes: https://github.com/FRRouting/frr/issues/16186
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
When we receive a hard-reset notification, we always show it if it was a hard,
or not.
For sending side, we missed that. Let's display it too.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Under a setup where two BGP prefixes are available from multiple sources,
if one of the two prefixes is recursive over the other BGP prefix, then
it will not be considered as multipath. The below output shows the two
prefixes 192.0.2.24/32 and 192.0.2.21/32. The 192.0.2.[5,6,8] are the
known IP addresses visible from the IGP.
> # show bgp ipv4 192.0.2.24/32
> *>i 192.0.2.24/32 192.0.2.21 0 100 0 i
> * i 192.0.2.21 0 100 0 i
> * i 192.0.2.21 0 100 0 i
> # show bgp ipv4 192.0.2.21/32
> *>i 192.0.2.21/32 192.0.2.5 0 100 0 i
> *=i 192.0.2.6 0 100 0 i
> *=i 192.0.2.8 0 100 0 i
The bgp best selection algorithm refuses to consider the paths to
'192.0.2.24/32' as multipath, whereas the BGP paths which use the
BGP peer as nexthop are considered multipath.
> ... has the same nexthop as the bestpath, skip it ...
Previously, this condition has been added to prevent ZEBRA from
installing routes with same nexthop:
> Here you can see the two paths with nexthop 210.2.2.2
> superm-redxp-05# show ip route 2.23.24.192/28
> Routing entry for 2.23.24.192/28
> Known via "bgp", distance 20, metric 0, best
> Last update 00:32:12 ago
> * 210.2.2.2, via swp3
> * 210.2.0.2, via swp1
> * 210.2.1.2, via swp2
> * 210.2.2.2, via swp3
> [..]
But today, ZEBRA knows how to handle it. When receiving incoming routes,
nexthop groups are used. At creation, duplicated nexthops are
identified, and will not be installed. The below output illustrate the
duplicate paths to 172.16.0.200 received by an other peer.
> r1# show ip route 172.18.1.100 nexthop-group
> Routing entry for 172.18.1.100/32
> Known via "bgp", distance 200, metric 0, best
> Last update 00:03:03 ago
> Nexthop Group ID: 75757580
> 172.16.0.200 (recursive), weight 1
> * 172.31.0.3, via r1-eth1, label 16055, weight 1
> * 172.31.2.4, via r1-eth2, label 16055, weight 1
> * 172.31.0.3, via r1-eth1, label 16006, weight 1
> * 172.31.2.4, via r1-eth2, label 16006, weight 1
> * 172.31.8.7, via r1-eth4, label 16008, weight 1
> 172.16.0.200 (duplicate nexthop removed) (recursive), weight 1
> 172.31.0.3, via r1-eth1 (duplicate nexthop removed), label 16055, weight 1
> 172.31.2.4, via r1-eth2 (duplicate nexthop removed), label 16055, weight 1
> 172.31.0.3, via r1-eth1 (duplicate nexthop removed), label 16006, weight 1
> 172.31.2.4, via r1-eth2 (duplicate nexthop removed), label 16006, weight 1
> 172.31.8.7, via r1-eth4 (duplicate nexthop removed), label 16008, weight 1
Fix this by proposing to let ZEBRA handle this duplicate decision.
Fixes: 7dc9d4e4e3 ("bgp may add multiple path entries with the same nexthop")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
If we receive a malformed packets, this could lead ptr_get_be64() reading
the packets more than needed (heap overflow).
```
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
0 0xaaaaaadf86ec in __asan_memcpy (/home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/.libs/bgpd+0x3586ec) (BuildId: 78123cd26ada92b8b59fc0d74d292ba70c9d2e01)
1 0xaaaaaaeb60fc in ptr_get_be64 /home/ubuntu/frr-public/frr_public_private-libfuzzer/./lib/stream.h:377:2
2 0xaaaaaaeb5b90 in ecommunity_linkbw_present /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_ecommunity.c:1895:10
3 0xaaaaaae50f30 in bgp_attr_ext_communities /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_attr.c:2639:8
4 0xaaaaaae49d58 in bgp_attr_parse /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_attr.c:3776:10
5 0xaaaaab063260 in bgp_update_receive /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:2371:20
6 0xaaaaab05df00 in bgp_process_packet /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_packet.c:4063:11
7 0xaaaaaae36110 in LLVMFuzzerTestOneInput /home/ubuntu/frr-public/frr_public_private-libfuzzer/bgpd/bgp_main.c:582:3
```
This is triggered when receiving such a packet (malformed):
```
(gdb) bt
0 ecommunity_linkbw_present (ecom=0x555556287990, bw=bw@entry=0x7fffffffda68)
at bgpd/bgp_ecommunity.c:1802
1 0x000055555564fcac in bgp_attr_ext_communities (args=0x7fffffffd840) at bgpd/bgp_attr.c:2619
2 bgp_attr_parse (peer=peer@entry=0x55555628cdf0, attr=attr@entry=0x7fffffffd960, size=size@entry=20,
mp_update=mp_update@entry=0x7fffffffd940, mp_withdraw=mp_withdraw@entry=0x7fffffffd950)
at bgpd/bgp_attr.c:3755
3 0x00005555556aa655 in bgp_update_receive (connection=connection@entry=0x5555562aa030,
peer=peer@entry=0x55555628cdf0, size=size@entry=41) at bgpd/bgp_packet.c:2324
4 0x00005555556afab7 in bgp_process_packet (thread=<optimized out>) at bgpd/bgp_packet.c:3897
5 0x00007ffff7ac2f73 in event_call (thread=thread@entry=0x7fffffffdc70) at lib/event.c:2011
6 0x00007ffff7a6fb90 in frr_run (master=0x555555bc7c90) at lib/libfrr.c:1212
7 0x00005555556457e1 in main (argc=<optimized out>, argv=<optimized out>) at bgpd/bgp_main.c:543
(gdb) p *ecom
$1 = {refcnt = 1, unit_size = 8 '\b', disable_ieee_floating = false, size = 2, val = 0x555556282150 "",
str = 0x5555562a9c30 "UNK:0, 255 UNK:2, 6"}
```
Reported-by: Iggy Frankovic <iggyfran@amazon.com>
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
After modifying the "label vpn export value", the vpn label information
of the VRF is not updated to the peers.
For example, the 192.168.0.0/24 prefix is announced to the peer with a
label value of 222.
> router bgp 65500
> [..]
> neighbor 192.0.2.2 remote-as 65501
> address-family ipv4-vpn
> neighbor 192.0.2.2 activate
> exit-address-family
> exit
> router bgp 65500 vrf vrf2
> address-family ipv4 unicast
> network 192.168.0.0/24
> label vpn export 222
> rd vpn export 444:444
> rt vpn both 53:100
> export vpn
> import vpn
> exit-address-family
Changing the label with "label vpn export" does not update the label
value to the peer unless the BGP sessions is re-established.
No labels are stored are stored struct bgp_adj_out so that it is
impossible to compare the current value with the previous value
in adj-RIB-out.
Reference the bgp_labels pointer in struct bgp_adj_out and compare the
values when updating adj-RIB-out.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
In a BGP L3VPN context using ADJ-RIB-IN (ie. enabled with
'soft-reconfiguration inbound'), after applying a deny route-map and
removing it, the remote MPLS label information is lost. As a result, BGP
is unable to re-install the related routes in the RIB.
For example,
> router bgp 65500
> [..]
> neighbor 192.0.2.2 remote-as 65501
> address-family ipv4 vpn
> neighbor 192.0.2.2 activate
> neighbor 192.0.2.2 soft-reconfiguration inbound
The 192.168.0.0/24 prefix has a remote label value of 102 in the BGP
RIB.
> # show bgp ipv4 vpn 192.168.0.0/24
> BGP routing table entry for 444:1:192.168.0.0/24, version 2
> [..]
> 192.168.0.0 from 192.0.2.2
> Origin incomplete, metric 0, valid, external, best (First path received)
> Extended Community: RT:52:100
> Remote label: 102
A route-map now filter all incoming BGP updates:
> route-map rmap deny 1
> router bgp 65500
> address-family ipv4 vpn
> neighbor 192.0.2.2 route-map rmap in
The prefix is now filtered:
> # show bgp ipv4 vpn 192.168.0.0/24
> #
The route-map is detached:
> router bgp 65500
> address-family ipv4 vpn
> no neighbor 192.168.0.1 route-map rmap in
The BGP RIB entry is present but the remote label is lost:
> # show bgp ipv4 vpn 192.168.0.0/24
> BGP routing table entry for 444:1:192.168.0.0/24, version 2
> [..]
> 192.168.0.0 from 192.0.2.2
> Origin incomplete, metric 0, valid, external, best (First path received)
> Extended Community: RT:52:100
The reason for the loose is that labels are stored within struct attr ->
struct extra -> struct bgp_labels but not in the struct bgp_adj_in.
Reference the bgp_labels pointer in struct bgp_adj_in and use its values
when doing a soft reconfiguration of the BGP table.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>