Even if we have unnumbered peering, let's respect `no neighbor X capability link-local`
and disable it per-neighbor on demand.
Fixes: db853cc97e ("bgpd: Implement Link-Local Next Hop capability")
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Currently when a directly connected peer is going down
BGP gets a call back for nexthop tracking in addition
the interface down events. On the interface down
event BGP goes through and sets up a per peer Q that
holds all the bgp path info's associated with that peer
and then it goes and processes this in the future. In
the meantime zebra is also at work and sends a nexthop
removal event to BGP as well. This triggers a complete
walk of all path info's associated with the bnc( which
happens to be all the path info's already scheduled
for removal here shortly). This evaluate paths
is not an inexpensive operation in addition the work
for handling this is already being done via the
peer down queue. Let's optimize the bnc handling
of evaluate paths and check to see if the peer is
still up to actually do the work here.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Related: https://datatracker.ietf.org/doc/html/draft-white-linklocal-capability
TL;DR; use 16 bytes long next-hops for point-to-point (unnumbered) links instead
of sending 32 bytes (::/LL, GUA/LL, LL/LL combinations).
For backward compatiblity we should handle even 32 bytes existing next hops.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
When a BGP listener configured with BMP receives the first BGP
IPv6 update from a connected BGP IPv6 peer, the BMP collector
receives a withdraw post-policy message.
> {"peer_type": "route distinguisher instance", "policy": "post-policy",
> "ipv6": true, "peer_ip": "192:167::3", "peer_distinguisher": "444:1",
> "peer_asn": 65501, "peer_bgp_id": "192.168.1.3", "timestamp":
> "2024-10-29 11:44:47.111962", "bmp_log_type": "withdraw", "afi": 2,
> "safi": 1, "ip_prefix": "2001::1125/128", "seq": 22}
> {"peer_type": "route distinguisher instance", "policy": "pre-policy",
> "ipv6": true, "peer_ip": "192:167::3", "peer_distinguisher": "444:1",
> "peer_asn": 65501, "peer_bgp_id": "192.168.1.3", "timestamp":
> "2024-10-29 11:44:47.111963", "bmp_log_type": "update", "origin":
> "IGP", "as_path": "", "afi": 2, "safi": 1, "nxhp_ip": "192:167::3",
> "nxhp_link-local": "fe80::7063:d8ff:fedb:9e11", "ip_prefix": "2001::1125/128", "seq": 23}
Actually, the BGP update is not valid, and BMP considers it as a
withdraw message. The BGP upate is not valid, because the nexthop
reachability is unknown at the time of reception, and no other
BMP message is sent.
Fix this by re-sending a BMP post update message when nexthop
tracking becomes successfull. Generalise the re-sending of
messages when nexthop tracking changes.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When receiving an SRv6 BGP update, the nexthop tracking is used
to find out the reachability of the BGP update.
> # show bgp ipv6 vpn fd00:200::/64
> Paths: (1 available, best #1)
> [..]
> 4:4::4:4 from 4:4::4:4 (4.4.4.4)
> Origin incomplete, metric 0, localpref 100, valid, internal, best (First path received)
> Extended Community: RT:52:100
> Remote label: 16
> Remote SID: 2001:db8:f4::
> Last update: Mon Mar 11 11:50:04 2024
The IPv6 address used is the "Remote SID". Actually, this value is
incomplete. Remote SID stands for the attribute value received in BGP,
while the label value determines a complement of SRv6 SID value. The
transposition technique authorises that in BGP, and in the above case,
the incoming BGP update has used the transposition length.
When there is a transposition in the SID attribute available, use the
real SID address. The nexthop tracking will use that forged address.
> # show bgp nexthop
> Current BGP nexthop cache:
> 4:4::4:4 valid [IGP metric 30], #paths 0, peer 4:4::4:4
> gate fe80::dced:1ff:fed6:878c, if ntfp3
> Last update: Mon Mar 11 11:50:02 2024
> 2001:db8:f4:1:: valid [IGP metric 0], #paths 2
> gate fe80::dced:1ff:fed6:878c, if ntfp3
Fixes: 26c747ed6c ("bgpd: extend make_prefix to form srv6-based prefix")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Currently FRR is limiting the nexthop count to a uint8_t not a
uint16_t. This leads to issues when the nexthop count is 256
which results in the count to overflow to 0 causing problems
in the code.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Introduce a command to stop bgpd from enabling IPv6 router advertisement
messages sending on interfaces.
Signed-off-by: Mikhail Sokolovskiy <sokolmish@gmail.com>
Without this patch:
```
r1# sh ip bgp vrf CUSTOMER-A
BGP table version is 1, local router ID is 20.20.20.0, vrf id 4
Default local pref 100, local AS 65000
Status codes: s suppressed, d damped, h history, u unsorted, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
192.168.2.0/24 192.168.179.5@0<
0 0 479 ?
Displayed 1 routes and 1 total paths
r1#
```
This is because the route is imported, next-hop is in a default VRF, and we should
evaluate an ultimate path info, not the current path info.
After:
```
r1# sh ip bgp vrf CUSTOMER-A
BGP table version is 1, local router ID is 20.20.20.0, vrf id 4
Default local pref 100, local AS 65000
Status codes: s suppressed, d damped, h history, u unsorted, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> 192.168.2.0/24 192.168.179.5@0<
0 0 479 ?
Displayed 1 routes and 1 total paths
r1#
```
In both cases next-hop in cache table is valid:
```
r1# sh ip bgp nexthop
Current BGP nexthop cache:
192.168.179.5 valid [IGP metric 0], #paths 2, peer 192.168.179.5
Resolved prefix 192.168.179.0/24
if r1-eth0
Last update: Thu Sep 5 11:24:37 2024
r1#
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Fix static-analyser warnings with BGP labels:
> $ scan-build make -j12
> bgpd/bgp_updgrp_packet.c:819:10: warning: Access to field 'extra' results in a dereference of a null pointer (loaded from variable 'path') [core.NullDereference]
> ? &path->extra->labels->label[0]
> ^~~~~~~~~
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
On a multihomed setup with colored bgp updates, when the primary
PE goes offline, only a small subset of colored bgp routes are
not switching to the secondary pe.
When a switchover happens, due to a remote IP becoming unreachable,
some nexthop tracking down notifications are sent, but those messages
are completely ignored for colored bgp updates.
The original code has been thought for mounting up the SR-TE service,
when IP reachability is ok, but not when services goes offline.
Fix this by extending the down notification mechanism for colored routes
too.
Fixes: 545aeef1d1 ("bgpd: extend the NHT code to understand SR-TE colors")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When the SR-TE service is off, colored BGP routes are not
selected if it is recursively resolved over routes that are
colored only.
Actually, a BGP nexthop context includes the color attribute;
when an update from ZEBRA is received, there is no color, and
the colored BGP nexthop contexts are parsed, only if there
is a non colored BGP nexthop context. The actual setup shows
this may not be the case every time.
Fix this by parsing all the colored BGP nexthop contexts.
Fixes: b8210849b8 ("bgpd: Make bgp ready to remove distinction between 2 nh tracking types")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When a peer sends an IPv4-mapped IPv6 global and a IPv6 link-local
nexthop, prefer the link-local unless a route-map tells to use the
global.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
SRTE_COLOR is not defined at all as an attribute, it was a mistake from the
beginning.
SRTE_COLOR is extended community, can't see the reason having it as a community,
and a separate attribute.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
This will allow a consistency of approach to adding/removing
pi's to from the workqueue for processing as well as properly
handling the dest->info pi list more appropriately.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The nexthop tracking never displays the prefix that
has been used in ZEBRA to resolve its nexthop. This
information will be useful if some decision has to be
taken regarding any loops, that is to say if for instance
a BGP prefix is resolved over a prefix in ZEBRA that is
exactly the same.
Store the value in bgp nexthop context, and display it.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
In make_prefix, the code checks to see if the pi->attr
is non-NULL. Since (A) we cannot have a path_info without
an attribute and (B) all paths leading up to the test
in make_prefix already have pi->attr deref'ed and the
code is not crashing we know this is safe to remove.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Move mp_nexthop_prefer_global boolean attribute to nh_flags. It does
not currently save memory because of the packing.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
In the case of SRv6-VPN we track the reachability
to the SID. We check that the SID is available
in the BGP update and then we check the nexthop
reachability.
Fixes 7f8c7d9 ("bgpd: ignore nexthop validation for srv6-vpn")
Signed-off-by: Dmytro Shytyi <dmytro.shytyi@6wind.com>
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Enable the SRv6 SID prefix generation in make_prefix()
function of bgp_nht.c.
Signed-off-by: Dmytro Shytyi <dmytro.shytyi@6wind.com>
fixup: bgpd: extend make_prefix to form srv6-based prefix
The following configuration creates an infinite routing leaking loop
because 'rt vpn both' parameters are the same in both VRFs.
> router bgp 5227 vrf r1-cust4
> no bgp network import-check
> bgp router-id 192.168.1.1
> address-family ipv4 unicast
> network 28.0.0.0/24
> rd vpn export 10:12
> rt vpn both 52:100
> import vpn
> export vpn
> exit-address-family
> !
> router bgp 5227 vrf r1-cust5
> no bgp network import-check
> bgp router id 192.168.1.1
> address-family ipv4 unicast
> network 29.0.0.0/24
> rd vpn export 10:13
> rt vpn both 52:100
> import vpn
> export vpn
> exit-address-family
The previous commit has added a routing leak update when a nexthop
update is received from zebra. It indirectly calls
bgp_find_or_add_nexthop() in which a static route triggers a nexthop
cache entry registration that triggers a nexthop update from zebra.
Do not register again the nexthop cache entry if the BGP_STATIC_ROUTE is
already set.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
If 'bgp network import-check' is defined on the source BGP session,
prefixes that are defined with the network command cannot be leaked to
the other VRFs BGP table even if they are present in the origin VRF RIB
if the 'rt import' statement is defined after the 'network <prefix>'
ones.
When a prefix nexthop is updated, update the prefix route leaking. The
current state of nexthop validation is now stored in the attributes of
the bgp path info. Attributes are compared with the previous ones at
route leaking update so that a nexthop validation change now triggers
the update of destination VRF BGP table.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
"if not XX else" statements are confusing.
Replace two "if not XX else" statements by "if XX else" to prepare next
commits. The patch is only cosmetic.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
This rework separates l3nhg functionality from the nexthop
tracking code, by introducing two bgp_nhg.[ch] files. The
calling functions are renamed from bgp_l3nhg* to bgp_nhg*.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
In some cases BGP can be monitoring the same prefix
in both the nexthop and import check tables. If this
is the case, when unregistering one bnc from one table
make sure we are not still registered in the other
Example of the problem:
r1(config-router)# address-family ipv4 uni
r1(config-router-af)# no network 192.168.100.41/32
r1(config-router-af)# exit
r1# show bgp import-check-table
Current BGP import check cache:
r1# show bgp nexthop
Current BGP nexthop cache:
192.168.100.41 valid [IGP metric 0], #paths 1, peer 192.168.100.41
if r1-eth0
Last update: Wed Dec 6 11:01:40 2023
BGP now believes it is only watching 192.168.100.41 in the nexthop
cache, but zebra doesn't have anything:
r1# show ip import-check
VRF default:
Resolve via default: on
r1# show ip nht
VRF default:
Resolve via default: on
So if anything happens to the route that is being matched for
192.168.100.41 bgp is no longer going to be notified about this.
The source of this problem is that zebra has dropped the two different
tables into 1 table, while bgp has 2 tables to track this. The solution
to this problem (other than the rewrite that is being done ) is to have
BGP have a bit of smarts about looking in both tables for the bnc and
if found in both don't send the delete of the prefix tracking to zebra.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
In zebra, the import check table and the nexthop check tables
were combined. This leaves an issue where when bgp happens
to have a tracked address in both the import check table
and the nexthop track table that are the same address.
When the the item is removed from one table the call
to remove it from zebra removes tracking for the other
table.
Combine the two tables together and keep track where
they came from for processing in bgpd.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
For v4 nexthops, ifindex was being set. Modified the check to set
ifindex only for v6 nexthops. Also modified the check to set ifindex
only if the v6 nexthop matches peer's LL address.
Signed-off-by: Pooja Jagadeesh Doijode <pdoijode@nvidia.com>
Problem:
On GR helper, paths learnt from an interface based peer were linked
to bnc with ifindex=0. During restart of GR peer, BGP (unnumbered)
session (with GR restarter peer) goes down on GR helper but the routes
are retained. Later, when BGP receives an interface up event, it
will process all the paths associated with BNC whose ifindex matches the
ifindex of the interface for which UP event is received. However, paths
associated with bnc that has ifindex=0 were not being reinstalled since
ifindex=0 doesn't match ifindex of any interfaces. This results in
BGP routes not being reinstalled in zebra and kernel.
Fix:
For paths learnt from an interface based peer, set the
ifindex to peer's interface ifindex so that correct
peer based nexthop can be found and linked to the path.
Signed-off-by: Donald Sharp sharpd@nvidia.com
Signed-off-by: Pooja Jagadeesh Doijode <pdoijode@nvidia.com>
Even if some of the attributes in bgp_path_info_extra are
not used, their memory is still allocated every time. It
cause a waste of memory.
This commit code deletes all unnecessary attributes and
changes the optional attributes to pointer storage. Memory
will only be allocated when they are actually used. After
optimization, extra info related memory is reduced by about
half(~400B -> ~200B).
Signed-off-by: Valerian_He <1826906282@qq.com>
Assuming field 'ifindex_ipv6_ll' is not equal to field 'ifindex', then
nhop_found is just a garbage, let's avoid that.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>