Fix several issues in sourcing AIGP attribute:
1) AIGP should not be set as default for a redistributed route or a
static network. It should be set by config instead.
2) AIGP sourced by "set aigp-metric igp-metric" in a route-map does
not set the correct value for a redistributed route.
3) When redistribute a connected route like loopback, the AGIP (with
value 0) is sourced by "set aigp-metric igp-metric", but the
attribute is not propagated as the attribute flag is not set.
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
If the underlay IGP metric changes, we SHOULD re-announce the routes with the
correct bpi->extra->igpmetric set.
Without this patch if the IGP link cost (metric) changes, we never notice this
and the peers do not have the updated metrics, which in turn causes incorrect
best path selections on remote peers.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
The nexthop metric should be added to AIGP when calculating the
bestpath in bgp_path_info_cmp().
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
AS 65000 | AS 65001
|
RR |
| |
R1 --- | --- R2
|
When r1 peer is an iBGP route reflector client of rr and r2 peer is a
eBGP neighbor of rr, and all three routers shares the same network, r2
receives announcements coming from r1 with a IPv6 link-local nexthop
from rr. This is incorrect as r2 should send traffic to r1 without
involving rr.
Do not send an IPv6 link-local nexthop if the originating peer is a
route-reflector client.
Link: https://github.com/FRRouting/frr/pull/16219#issuecomment-2397425505
Link: https://github.com/FRRouting/frr/pull/17037#discussion_r1792529683
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
In subgroup_announce_check(), the variable reflect is misleading, as it
suggests a relation to route reflection. However, it actually refers to
the scenario where an iBGP peer announces a route to another iBGP peer.
Rename reflect to ibgp_to_ibgp.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
If the "nexthop-local unchanged" setting is enabled, it preserves the
IPv6 link-local nexthop from the originating peer. However, if the
originating and destination peers are not on the same network segment,
the originating peer's IPv6 link-local address will be unreachable from
the destination peer.
In such cases, reset the IPv6 link-local nexthop, even if "nexthop-local
unchanged" is set on the destination peer.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Do not add an IPv6 link-local nexthop if the originating peer does not
provide one and the nexthop-local unchanged setting is enabled.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Currently bgp multipath has these properties:
a) mp_info may or may not be on a single path, based
upon path perturbations in the past.
b) mp_info->count started counting at 0( meaning 1 ). As that the
bestpath path_info was never included in the count
c) The first mp_info in the list held the multipath data associated
with the multipath. As such if you were at any other node that data
was not filled in.
d) As such the mp_info's that are not first on the list basically
were just pointers to the corresponding bgp_path_info that was in
the multipath.
e) On bestpath calculation, a linklist(struct linklist *) of bgp_path_info's was
created.
f) This linklist was passed in to a comparison function that took the
old mpinfo list and compared it item by item to the linklist and
doing magic to figure out how to create a new mp_info list.
g) the old mp_info and the link list had to be memory managed and
freed up.
h) BGP_PATH_MULTIPATH is only set on non bestpath nodes in the
multipath.
This is really complicated. Let's change the algorithm to this:
a) When running bestpath, mark a bgp_path_info node that could be in the ecmp path as
BGP_PATH_MULTIPATH_NEW.
b) When running multipath, just walk the list of bgp_path_info's and if
it has BGP_PATH_MULTIPATH_NEW on it, decide if it is in BGP_MULTIPATH.
If we run out of space to put in the ecmp, clear the flag on the rest.
c) Clean up the counting of sometimes adding 1 to the mpath count.
d) Only allocate a mpath_info node for the bestpath. Clean it up
when done with it.
e) remove the unneeded list management associated with the linklist and
the mp_list.
This greatly simplifies multipath computation for bgp and reduces memory
load for large scale deployments.
2 full feeds in work_queue_run prior:
0 56367.471 1123 50193 493695 50362 493791 0 0 0 TE work_queue_run
BGP multipath info : 1941844 48 110780992 1941844 110780992
2 full feeds in work_queue_run after change:
1 52924.931 1296 40837 465968 41025 487390 0 0 1 TE work_queue_run
BGP multipath info : 970860 32 38836880 970866 38837120
Aproximately 4 seconds of saved cpu time for convergence and ~75 mb
smaller run time.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is handy when you need to do source matching e.g. `match src-peer ...`
on outgoing direction with a route-map.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
If we have soft inbound enabled, we should see how the route looks like
before it was modified by a route-map/prefix-list.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
1. bgp coredump is observed when we delete default bgp instance
when we have multi-vrf; and route-leaking is enabled between
default, non-default vrfs.
Removing default router bgp when routes leaked between non-default vrfs.
- Routes are leaked from VRF-A to VRF-B
- VPN table is created with auto RD/RT in default instance.
- Default instance is deleted, we try to unimport the routes from all VRFs
- non-default VRF schedules a work-queue to process deleted routes.
- Meanwhile default bgp instance clears VPN tables and free the route
entries as well, which are still referenced by non-default VRFs which
have imported routes.
- When work queue process starts to delete imported route in VRF-A it cores
as it accesses freed memory.
- Whenever we delete bgp in default vrf, we skip deleting routes in the vpn
table, import and export lists.
- The default hidden bgp instance will not be listed in any of the show
commands.
- Whenever we create new default instance, handle it with AS number change
i.e. old hidden default bgp's AS number is updated and also changing
local_as for all peers.
2. A default instance is created with ASN of the vrf with the import
statement.
This may not be the ASN desired for the default table
- First problem with current behavior.
Define two vrfs with different ASNs and then add import between.
starting without any bgp config (no default instance)
A default instance is created with ASN of the vrf with the import
statement.
This may not be the ASN desired for the default table
- Second related problem. Start with a default instance and a vrf in a
different ASN. Do an import statement in the vrf for a bgp vrf instance
not yet defined and it auto-creates that bgp/vrf instance and it inherits
the ASN of the importing vrf
- Handle bgp instances with different ASNs and handle ASN for auto created
BGP instance
Signed-off-by: Kantesh Mundaragi <kmundaragi@vmware.com>
Add counters for redistributed routes, and local aggregates to the
output of "show ip bgp statistics".
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
The check for an equivalent bgp pointer makes no sense
in the context of the workqueue as that we have a
work queue per bgp process, as such the bgp pointer
will always be the same as the pqnode.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
There is a need to be able to process certain bgp
routes earlier than others. Especially when there
is major trauma going on in the network. Start
the ability for this to happen.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Ticket: #4060069
show bgp vrf afi unicast statistics json output is not return in json
format for non exists vrf.
Fix:
Json output is formatted for non exists vrf cases.
Command supported:
```
show bgp vrf <VRFNAME> ipv4/ipv6 unicast statistics json
show bgp vrf <VRFNAME> l2vpn evpn statistics json
```
Before Fix:
```
leaf11#
leaf11# show bgp vrf test ipv4 unicast statistics json
View/Vrf test is unknown
leaf11#
leaf11#
leaf11# show bgp vrf test ipv6 unicast statistics json
View/Vrf test is unknown
leaf11#
leaf11#
leaf11# show bgp vrf default1 l2vpn evpn statistics json
View/Vrf default1 is unknown
leaf11#
```
After Fix:
```
leaf11#
leaf11# show bgp vrf test ipv4 unicast statistics json
{
"warning":"View/Vrf is unknown"
}
leaf11#
leaf11#
leaf11# show bgp vrf test ipv6 unicast statistics json
{
"warning":"View/Vrf is unknown"
}
leaf11#
leaf11# show bgp vrf default1 l2vpn evpn statistics json
{
"warning":"View/Vrf is unknown"
}
leaf11#
```
Ticket: #4060069
Signed-off-by: Sindhu Parvathi Gopinathan's <sgopinathan@nvidia.com>
With lots of update-groups, subgroups, this could be very tricky and the timer
is spawned even if it's totally unnecessary (default-originate is not enabled).
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
evpn has a concept of `local` tables where the evpn routes
are actually converted into underlying routes/neighbor
table entries( or vice versa ). Then this local route
is propagated to the global evpn l2vpn table and sent
to the peers. Certain show commands in evpn look
operate on the local table but make the output look
like the data has not been sent to the peer. This
is confusing for the operator. Modify the code
such that local tables get a `Local BGP table not advertised`
in the place where the code talks about whom has received
the data or not.
Example:
torm11# show bgp l2vpn evpn route vni 1000 mac 8a:a1:cc:73:a3:ac ip 45.0.0.5
BGP routing table entry for [2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5]
Paths: (2 available, best #2)
Local BGP table not advertised
Route [2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5] VNI 1000
Imported from 192.168.100.18:2:[2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5], VNI 1000
65101 65005
192.168.100.18(leaf2) from leaf2(192.168.5.1) (192.168.100.14)
Origin IGP, valid, external
Extended Community: RT:65005:1000 ET:8
Last update: Thu Mar 21 14:29:04 2024
Route [2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5] VNI 1000
Imported from 192.168.100.18:2:[2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5], VNI 1000
65101 65005
192.168.100.18(leaf1) from leaf1(192.168.1.1) (192.168.100.13)
Origin IGP, valid, external, bestpath-from-AS 65101, best (Router ID)
Extended Community: RT:65005:1000 ET:8
Last update: Thu Mar 21 14:29:04 2024
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
If we expand the truth (A || B) to "(A && B) || (A && !B) || (!A && B)"
so that we can isolate the case (!A && B), we then add the additional
check (C) to ensure that original route actually has a link-local hext-hop
Signed-off-by: Richard Cunningham <29760295+cunningr@users.noreply.github.com>
Fix static-analyser warnings with BGP labels:
> $ scan-build make -j12
> bgpd/bgp_updgrp_packet.c:819:10: warning: Access to field 'extra' results in a dereference of a null pointer (loaded from variable 'path') [core.NullDereference]
> ? &path->extra->labels->label[0]
> ^~~~~~~~~
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
When parsing EVPN NLRIs, and an error occurred, do no forget to free the memory.
Fixes: 4ace11d010 ("bgpd: Move evpn_overlay to a pointer")
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
A crash happens when executing the following command:
> ubuntu2204hwe# conf
> ubuntu2204hwe(config)# router bgp 65500
> ubuntu2204hwe(config-router)# !
> ubuntu2204hwe(config-router)# address-family ipv4 unicast
> ubuntu2204hwe(config-router-af)# sid vpn export auto
> ubuntu2204hwe(config-router-af)# exit-address-family
> ubuntu2204hwe(config-router)# !
> ubuntu2204hwe(config-router)# address-family ipv4 vpn
> ubuntu2204hwe(config-router-af)# network 4.4.4.4/32 rd 55:55 label 556
> ubuntu2204hwe(config-router-af)# network 5.5.5.5/32 rd 662:33 label 232
> ubuntu2204hwe(config-router-af)# exit-address-family
> ubuntu2204hwe(config-router)# exit
> ubuntu2204hwe(config)# !
> ubuntu2204hwe(config)# no router bgp
The crash analysis indicates a memory item has been freed.
> #6 0x000076066a629c15 in mt_count_free (mt=0x56b57be85e00 <MTYPE_BGP_NAME>, ptr=0x60200038b4f0)
> at lib/memory.c:73
> #7 mt_count_free (ptr=0x60200038b4f0, mt=0x56b57be85e00 <MTYPE_BGP_NAME>) at lib/memory.c:69
> #8 qfree (mt=mt@entry=0x56b57be85e00 <MTYPE_BGP_NAME>, ptr=0x60200038b4f0) at lib/memory.c:129
> #9 0x000056b57bb09ce9 in bgp_free (bgp=<optimized out>) at bgpd/bgpd.c:4120
> #10 0x000056b57bb0aa73 in bgp_unlock (bgp=<optimized out>) at ./bgpd/bgpd.h:2513
> #11 peer_free (peer=0x62a000000200) at bgpd/bgpd.c:1313
> #12 0x000056b57bb0aca8 in peer_unlock_with_caller (name=<optimized out>, peer=<optimized out>)
> at bgpd/bgpd.c:1344
> #13 0x000076066a6dbb2c in event_call (thread=thread@entry=0x7ffc8cae1d60) at lib/event.c:2011
> #14 0x000076066a60aa88 in frr_run (master=0x613000000040) at lib/libfrr.c:1214
> #15 0x000056b57b8b2c44 in main (argc=<optimized out>, argv=<optimized out>) at bgpd/bgp_main.c:543
Actually, the BGP_NAME item has not been used at allocation for
static->prd_pretty, and this results in reaching 0 quicker at bgp
deletion.
Fix this by reassigning MTYPE_BGP_NAME to prd_pretty.
Fixes: 16600df2c4 ("bgpd: fix show run of network route-distinguisher")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Instead of using 3 uint8_t variables under struct attr, let's use a single
uint8_t as the flags. Saving 2-bytes. Not a big deal, but it's even easier to
track EVPN-related flags/variables.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Introduce BGP-wide flags to denote if BGP has started gracefully
and GR is in progress or not. Use this for setting of the R-bit in
the GR capability, and not a timer which is set for any new
instance creation. Mark graceful restart is complete when the
deferred path selection has been done and route sync with zebra as
well as deferred EOR advertisement has been initiated.
Introduce a function to check on F-bit setting rather than just
base it on configuration.
Subsequent commits will extend these functionalities.
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
RFC 8212 defines leak prevention for eBGP peers, but BGP-OAD defines a new
peering type One Administrative Domain (OAD), where multiple ASNs could be used
inside a single administrative domain. OAD allows sending non-transitive attributes,
so this prevention should be relaxed too.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Under a setup where two BGP prefixes are available from multiple sources,
if one of the two prefixes is recursive over the other BGP prefix, then
it will not be considered as multipath. The below output shows the two
prefixes 192.0.2.24/32 and 192.0.2.21/32. The 192.0.2.[5,6,8] are the
known IP addresses visible from the IGP.
> # show bgp ipv4 192.0.2.24/32
> *>i 192.0.2.24/32 192.0.2.21 0 100 0 i
> * i 192.0.2.21 0 100 0 i
> * i 192.0.2.21 0 100 0 i
> # show bgp ipv4 192.0.2.21/32
> *>i 192.0.2.21/32 192.0.2.5 0 100 0 i
> *=i 192.0.2.6 0 100 0 i
> *=i 192.0.2.8 0 100 0 i
The bgp best selection algorithm refuses to consider the paths to
'192.0.2.24/32' as multipath, whereas the BGP paths which use the
BGP peer as nexthop are considered multipath.
> ... has the same nexthop as the bestpath, skip it ...
Previously, this condition has been added to prevent ZEBRA from
installing routes with same nexthop:
> Here you can see the two paths with nexthop 210.2.2.2
> superm-redxp-05# show ip route 2.23.24.192/28
> Routing entry for 2.23.24.192/28
> Known via "bgp", distance 20, metric 0, best
> Last update 00:32:12 ago
> * 210.2.2.2, via swp3
> * 210.2.0.2, via swp1
> * 210.2.1.2, via swp2
> * 210.2.2.2, via swp3
> [..]
But today, ZEBRA knows how to handle it. When receiving incoming routes,
nexthop groups are used. At creation, duplicated nexthops are
identified, and will not be installed. The below output illustrate the
duplicate paths to 172.16.0.200 received by an other peer.
> r1# show ip route 172.18.1.100 nexthop-group
> Routing entry for 172.18.1.100/32
> Known via "bgp", distance 200, metric 0, best
> Last update 00:03:03 ago
> Nexthop Group ID: 75757580
> 172.16.0.200 (recursive), weight 1
> * 172.31.0.3, via r1-eth1, label 16055, weight 1
> * 172.31.2.4, via r1-eth2, label 16055, weight 1
> * 172.31.0.3, via r1-eth1, label 16006, weight 1
> * 172.31.2.4, via r1-eth2, label 16006, weight 1
> * 172.31.8.7, via r1-eth4, label 16008, weight 1
> 172.16.0.200 (duplicate nexthop removed) (recursive), weight 1
> 172.31.0.3, via r1-eth1 (duplicate nexthop removed), label 16055, weight 1
> 172.31.2.4, via r1-eth2 (duplicate nexthop removed), label 16055, weight 1
> 172.31.0.3, via r1-eth1 (duplicate nexthop removed), label 16006, weight 1
> 172.31.2.4, via r1-eth2 (duplicate nexthop removed), label 16006, weight 1
> 172.31.8.7, via r1-eth4 (duplicate nexthop removed), label 16008, weight 1
Fix this by proposing to let ZEBRA handle this duplicate decision.
Fixes: 7dc9d4e4e3 ("bgp may add multiple path entries with the same nexthop")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This commit addresses an issue that happens when using bgp
labeled unicast peering with a rr client, with a received prefix
which is the local ip address of the bgp session.
When using bgp ipv4 labeled session, the local prefix is
received by a peer, and finds out that the proposed prefix
and its next-hop are the same. To avoid a route loop locally,
no nexthop entry is referenced for that prefix, and the route
will not be selected.
As it has been done for ipv4-unicast, apply the following fix
for labeled address families: when the received peer is
a route reflector, the prefix has to be selected, even if the
route can not be installed locally.
Fixes: f874552557 ("bgpd: authorise to select bgp self peer prefix on rr case")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: Dmytro Shytyi <dmytro.shytyi@6wind.com>