For JSON output we don't need newline to be printed.
Before:
```
"lastUpdate":{"epoch":1734490463,"string":"Wed Dec 18 04:54:23 2024\n"
```
After:
```
"lastUpdate":{"epoch":1734678560,"string":"Fri Dec 20 09:09:20 2024"
```
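A minimal sketch of the idea, assuming a hypothetical helper named `time_to_json_string()` (not an FRR function): `ctime()` always terminates its result with a newline, so the helper copies the string and cuts it at the newline before it is used as a JSON value.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper, not part of FRR: format `t` like ctime() but
 * without the trailing newline. Returns buf, or NULL on failure. */
static char *time_to_json_string(time_t t, char *buf, size_t len)
{
	const char *s;

	assert(buf != NULL);

	s = ctime(&t); /* static storage, not thread-safe: sketch only */
	if (s == NULL || len == 0)
		return NULL;

	snprintf(buf, len, "%s", s);

	/* strcspn() yields the index of the first '\n', or the string
	 * length when there is none, so this write is always in bounds. */
	buf[strcspn(buf, "\n")] = '\0';
	return buf;
}
```

A 32-byte buffer is enough for the usual 24-character result.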
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Add missing json attribute to BGP path.
Fixes: 82c298be73 ("bgpd: Show RPKI short state in `show bgp <afi> <safi>`")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
If a route-map that sets attributes such as community or large-community is
applied in the outgoing direction, the routes are advertised correctly, but
`advertised-routes detail` did not display the applied attributes. Instead it
showed what was received from the peer (the original attributes).
Let's fix this and display the attributes that are actually advertised.
For example, with this configuration:
```
route-map r3 permit 10
match ip address prefix-list p1
set community 65001:65002
set extcommunity bandwidth 100
set large-community 65001:65002:65003
exit
!
...
address-family ipv4 unicast
neighbor 192.168.2.3 route-map r3 out
exit-address-family
...
```
The output:
```
r2# show bgp ipv4 neighbors 192.168.2.3 advertised-routes detail
BGP table version is 1, local router ID is 192.168.2.2, vrf id 0
Default local pref 100, local AS 65002
BGP routing table entry for 10.10.10.1/32, version 1
Paths: (1 available, best #1, table default)
Advertised to non peer-group peers:
192.168.1.1 192.168.2.3
65001
0.0.0.0 from 192.168.1.1 (192.168.1.1)
Origin IGP, valid, external, best (First path received)
Community: 65001:65002
Extended Community: LB:65002:12500000 (100.000 Mbps)
Large Community: 65001:65002:65003
Last update: Thu Dec 19 17:00:40 2024
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
This commit introduces meta queue to the BGP process_queue which is
helpful in having a priority of lists where some routes can be processed
earlier than 'other' routes. This is similar to how meta queue is
present in zebra.
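A rough sketch of the meta queue idea, not the bgpd implementation: the type and function names below are hypothetical, and only two sub-queues are modelled. Each sub-queue is a FIFO, and dequeue always drains the highest-priority non-empty sub-queue first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sub-queue indexes; a lower value means higher priority. */
enum subq { SUBQ_EARLY, SUBQ_OTHER, SUBQ_COUNT };

#define SUBQ_CAP 64

/* A fixed-capacity FIFO ring of item ids. */
struct fifo {
	int items[SUBQ_CAP];
	size_t head; /* index of the oldest item */
	size_t len;  /* number of queued items */
};

struct meta_queue {
	struct fifo subq[SUBQ_COUNT];
};

/* Append `item` to the chosen sub-queue. Returns 0, or -1 if it is full. */
static int mq_enqueue(struct meta_queue *mq, enum subq q, int item)
{
	struct fifo *f;

	assert(mq != NULL && q < SUBQ_COUNT);
	f = &mq->subq[q];
	if (f->len == SUBQ_CAP)
		return -1;
	f->items[(f->head + f->len) % SUBQ_CAP] = item;
	f->len++;
	return 0;
}

/* Pop the oldest item of the highest-priority non-empty sub-queue.
 * Returns 0 and stores the item in *out, or -1 if all are empty. */
static int mq_dequeue(struct meta_queue *mq, int *out)
{
	for (int q = 0; q < SUBQ_COUNT; q++) {
		struct fifo *f = &mq->subq[q];

		if (f->len == 0)
			continue;
		*out = f->items[f->head];
		f->head = (f->head + 1) % SUBQ_CAP;
		f->len--;
		return 0;
	}
	return -1;
}
```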
After Fix:
---------
For testing, note that all 100.x routes are marked as Early routes which
got enqueued and dequeued first before Other routes in every batch of
updates. Also, the items are dequeued in FIFO order.
switch# cat /var/log/frr/bgpd.log | grep sub-queue
2024/12/06 19:19:42.788014 BGP: [V64FH-G6883] 88.0.0.9/32 queued into sub-queue Other Route
2024/12/06 19:19:42.856127 BGP: [V64FH-G6883] 100.90.9.186/32 queued into sub-queue Early Route
2024/12/06 19:19:42.856138 BGP: [V64FH-G6883] 100.90.9.187/32 queued into sub-queue Early Route
2024/12/06 19:19:42.886715 BGP: [V64FH-G6883] 66.0.0.9/32 queued into sub-queue Other Route
2024/12/06 19:19:43.022835 BGP: [V64FH-G6883] 33.0.0.9/32 queued into sub-queue Other Route
2024/12/06 19:19:43.058842 BGP: [V64FH-G6883] 44.0.0.9/32 queued into sub-queue Other Route
2024/12/06 19:19:43.092365 BGP: [V64FH-G6883] 55.0.0.9/32 queued into sub-queue Other Route
2024/12/06 19:19:43.540770 BGP: [ZAPXS-9754G] 100.90.9.186/32 dequeued from sub-queue Early Route
2024/12/06 19:19:43.541233 BGP: [ZAPXS-9754G] 100.90.9.187/32 dequeued from sub-queue Early Route
2024/12/06 19:19:43.541523 BGP: [ZAPXS-9754G] 88.0.0.9/32 dequeued from sub-queue Other Route
2024/12/06 19:19:43.602094 BGP: [V64FH-G6883] 88.0.0.9/32 queued into sub-queue Other Route
2024/12/06 19:19:43.649083 BGP: [V64FH-G6883] 100.90.9.186/32 queued into sub-queue Early Route
2024/12/06 19:19:43.649092 BGP: [V64FH-G6883] 100.90.9.187/32 queued into sub-queue Early Route
2024/12/06 19:19:43.649148 BGP: [V64FH-G6883] 77.0.0.9/32 queued into sub-queue Other Route
2024/12/06 19:19:43.712282 BGP: [V64FH-G6883] 100.90.9.138/32 queued into sub-queue Early Route
2024/12/06 19:19:43.712314 BGP: [V64FH-G6883] 100.90.9.139/32 queued into sub-queue Early Route
2024/12/06 19:19:43.817194 BGP: [V64FH-G6883] 100.90.8.58/32 queued into sub-queue Early Route
2024/12/06 19:19:43.817205 BGP: [V64FH-G6883] 100.90.8.59/32 queued into sub-queue Early Route
2024/12/06 19:19:43.942464 BGP: [ZAPXS-9754G] 100.90.9.186/32 dequeued from sub-queue Early Route
2024/12/06 19:19:43.942530 BGP: [ZAPXS-9754G] 100.90.9.187/32 dequeued from sub-queue Early Route
2024/12/06 19:19:43.942550 BGP: [ZAPXS-9754G] 100.90.9.138/32 dequeued from sub-queue Early Route
2024/12/06 19:19:43.942738 BGP: [ZAPXS-9754G] 100.90.9.139/32 dequeued from sub-queue Early Route
2024/12/06 19:19:43.942763 BGP: [ZAPXS-9754G] 100.90.8.58/32 dequeued from sub-queue Early Route
2024/12/06 19:19:43.942788 BGP: [ZAPXS-9754G] 100.90.8.59/32 dequeued from sub-queue Early Route
2024/12/06 19:19:44.558611 BGP: [ZAPXS-9754G] 66.0.0.9/32 dequeued from sub-queue Other Route
2024/12/06 19:19:44.893541 BGP: [ZAPXS-9754G] 33.0.0.9/32 dequeued from sub-queue Other Route
2024/12/06 19:19:45.171794 BGP: [ZAPXS-9754G] 44.0.0.9/32 dequeued from sub-queue Other Route
2024/12/06 19:19:45.453137 BGP: [ZAPXS-9754G] 55.0.0.9/32 dequeued from sub-queue Other Route
2024/12/06 19:19:45.685269 BGP: [ZAPXS-9754G] 88.0.0.9/32 dequeued from sub-queue Other Route
2024/12/06 19:19:45.764752 BGP: [ZAPXS-9754G] 77.0.0.9/32 dequeued from sub-queue Other Route
With 'update-delay' feature (EOIU marker):
------------------------------------------
switch# vtysh -c "show run bgp" | grep update-delay
update-delay 40
switch# cat /var/log/frr/bgpd.log | grep sub-queue
2024/12/06 23:27:46.124461 BGP: [V64FH-G6883] 22.0.0.9/32 queued into sub-queue Other Route
2024/12/06 23:27:46.160224 BGP: [V64FH-G6883] 100.90.8.11/32 queued into sub-queue Early Route
2024/12/06 23:27:46.219663 BGP: [W9QTR-P4REP] EOIU Marker queued into sub-queue EOIU Marker
2024/12/06 23:27:46.269711 BGP: [ZAPXS-9754G] 100.90.8.11/32 dequeued from sub-queue Early Route
2024/12/06 23:27:46.270980 BGP: [ZAPXS-9754G] 22.0.0.9/32 dequeued from sub-queue Other Route
2024/12/06 23:27:46.404868 BGP: [RBX2V-K33CZ] EOIU Marker dequeued from sub-queue EOIU Marker
Ticket: #4200787
Signed-off-by: Karthikeya Venkat Muppalla <kmuppalla@nvidia.com>
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
If we have this construct:
for (pi = bgp_dest_get_bgp_path_info(dest); pi; pi = pi->next) {
...
bgp_process();
}
This can induce an infinite loop, because bgp_process() moves the
unsorted items to the top of the list for handling. It is therefore
necessary to save the next pointer before the call in order to
actually look at each possible bgp_path_info.
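The pattern can be sketched as follows. This is an illustration only: `struct path` and `move_to_front()` are hypothetical stand-ins for bgp_path_info and for the reordering side effect of bgp_process().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for bgp_path_info: a singly linked list node. */
struct path {
	int id;
	struct path *next;
};

/* Hypothetical stand-in for the side effect of bgp_process(): moves
 * `p` to the front of the list, which changes p->next. */
static void move_to_front(struct path **head, struct path *p)
{
	struct path **pp = head;

	while (*pp && *pp != p)
		pp = &(*pp)->next;
	if (*pp == NULL || *head == p)
		return;
	*pp = p->next;   /* unlink p */
	p->next = *head; /* relink it at the front */
	*head = p;
}

/* Visit every node once by saving the successor before the list is
 * reordered. Returns the number of nodes visited. */
static int process_all(struct path **head)
{
	struct path *p, *next;
	int visited = 0;

	assert(head != NULL);
	for (p = *head; p; p = next) {
		next = p->next; /* hold the next pointer to the side */
		move_to_front(head, p);
		visited++;
	}
	return visited;
}
```

With `p = p->next` as the loop step instead, a three-node list would bounce between its first two nodes forever, since every move to the front rewrites the successor of the node just processed.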
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
If you have a bestpath list that looks something like this:
<local evpn mac route>
<learned from peer out swp60>
<learned from peer out swp57>
And a network event happens that causes the peer out swp60
to not be in an established state, yet we still have the
path_info for the destination for swp60, bestpath
will currently end up with this order:
<learned from peer out swp60>
<local evpn mac route>
<learned from peer out swp57>
This causes the local evpn mac route to be deleted in zebra (wrong!).
This happens because swp60 is skipped in the bestpath calculation and
not considered to be a path, yet it stays at the front of the list.
Modify the bestpath calculation so that, when the unsorted_list is
pulled together, path infos whose peer is not in an established state
are also pulled into that list.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The JSON display of the version attribute was originally an
integer. It was changed to a string, most probably by mistake.
> {
> "vrfId": 7,
> "vrfName": "vrf1",
> "tableVersion": 3,
> "routerId": "192.0.2.1",
> "defaultLocPrf": 100,
> "localAS": 65500,
> "routes": {
> "172.31.0.1/32": {
> "prefix": "172.31.0.1/32",
> "version": "1", <--- int or string ??
Let us fix it, by using the integer display instead.
Fixes: f9f2d188e3 ("bgpd: fix 'json detail' output structure")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
If `bgp ebgp-require-policy` is enabled (it is by default), check it first,
before applying the route-maps.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
If we receive an IPv6 prefix, e.g. 2001:db8:100::/64 with nexthop: 0.0.0.0 and
mp_nexthop: fc00::2, we should not treat it as having an invalid nexthop because
of 0.0.0.0. We MUST also check the MP_REACH attribute and decide later whether we
have at least one valid nexthop.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Without the patch only the best path is displayed.
With the patch, display all paths including addpaths, but only for non-JSON
output to avoid breaking existing output.
E.g.:
```
munet> r2 shi vtysh -c 'sh ip bgp nei 192.168.2.3 advertised-routes'
Network Next Hop Metric LocPrf Weight Path
*> 172.16.16.254/32 192.168.2.3 0 0 65003 ?
* 172.16.16.254/32 192.168.2.4 0 0 65004 ?
*> 192.168.2.0/24 192.168.2.3 0 0 65003 ?
* 192.168.2.0/24 192.168.2.4 0 0 65004 ?
```
Before it was:
```
munet> r2 shi vtysh -c 'sh ip bgp nei 192.168.2.3 advertised-routes'
Network Next Hop Metric LocPrf Weight Path
*> 172.16.16.254/32 192.168.2.3 0 0 65003 ?
*> 192.168.2.0/24 192.168.2.3 0 0 65003 ?
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
A redistribute cmd can have a route-map attached to it, and adding
match source-protocol to that route-map lets BGP filter which
protocol's routes to accept among the bunch of routes zebra is sending.
Fixing this since this wasn't implemented earlier.
Ticket :#4119692
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
We have this:
if ( (safi == SAFI_UNICAST) && ...)
do stuff
if ( (safi == SAFI_MPLS_VPN) && ... )
do stuff
This leads to testing safi multiple times when safi is
SAFI_UNICAST. Let's make it an else if, since we know that
the safi is not going to change.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
In an attempt to make the code faster let's just pass
in the prefix instead of having to do a lookup a majillion
times again after we already have it.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
We iterated over all bgp_path_info's, but once we remove the path, we didn't
check for other paths under the same bgp_dest.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Currently the code to check to see if any input filters are
applicable is *before* the RFC 8212 check to see if we have
any filters at all. As such we have already tested for this,
so let's move this check for RFC 8212 to immediately before
the input filter test.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The nexthop reachability code was cut-n-pasted 2 times
with just a tiny bit of difference. If we ever change
that it becomes `fun` to keep them in sync. Since this
is more important than full on speed of code let's abstract
and get bgp_update() to be a bit easier to maintain.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The code is just arbitrarily checking to see if there are any
mac addresses associated with a prefix. This makes no
sense from the perspective that it can only happen as
an evpn route. Let's not make non-evpn people pay
the price to check this data.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The variable is_loop_check is set once and then tested
multiple times later. Move the decision of whether or not
to check for AS loops to where it is tested, and stop
testing it multiple times.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
In the interest of speeding up code, there is no point in
attempting to see if a label is usable if the number of labels
passed in is 0. Since that is a much much quicker test than
the bgp_is_valid_label() call, let's test that first.
Additionally, there is no point in walking the label[] array
passed in unless we are in the if statement, so move it inside.
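A small sketch of the ordering, under stated assumptions: `label_is_valid()` is a hypothetical stand-in for bgp_is_valid_label() with a made-up validity rule, and the call counter exists only to make the short-circuit visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Counts calls to the (comparatively) expensive check. */
static unsigned expensive_calls;

/* Hypothetical stand-in for bgp_is_valid_label(). The rule used here
 * (value fits in the 20-bit MPLS label field) is an assumption. */
static bool label_is_valid(uint32_t label)
{
	expensive_calls++;
	return label <= 0xFFFFF;
}

/* Test the cheap condition first. && short-circuits, so label[] is
 * never read and label_is_valid() never runs when num_labels is 0. */
static bool has_usable_label(const uint32_t *label, size_t num_labels)
{
	assert(label != NULL || num_labels == 0);
	return num_labels > 0 && label_is_valid(label[0]);
}
```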
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
In bgp_update(), the two variables allowas_in and aspath_loop_count
are only used when peer->change_local_as is true. Move the retrieval
of the allowas_in data to inside the if statement to save some
(very) small amount of time in bgp_update not gathering this
data unless the particular peer has this set.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When you have multiple paths to a particular route and a single
path changes, and the other paths are either in hold down, not
established, or simply not selected, you could end up with a
situation where the bestpath chosen was a path that was in
hold down.
Modify the code such that when there is nothing worse
in bestpath selection for the chosen path, but we were
unable to do any sorting, the path is put at the top
of the list and declared the winner. Otherwise just
do the original and put it at the end.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Fix the display of the local label in show bgp.
> r1# show bgp ipv4 labeled-unicast 172.16.2.2/32
> BGP routing table entry for 172.16.2.2/32, version 2
> Local label: 16 <---- MISSING
> Paths: (1 available, best #1, table default, vrf (null))
> Advertised to non peer-group peers:
> 192.168.1.2
> 65501
> 192.168.1.2 from 192.168.1.2 (172.16.2.2)
> Origin IGP, metric 0, valid, external, best (First path received)
> Remote label: 3
> Last update: Fri Oct 25 17:55:45 2024
Fixes: 67f67ba481 ("bgpd: Drop label_ntop/label_pton functions")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
This is just a small optimization but when calling path_info_cmp
hundreds of millions of times this adds up.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
bgp_update is a very expensive call. Calling evpn_overlay_free
even when we have no evpn data to free is not trivial. Let's
only call this function when we actually have data to
free.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
For consistency between RIB and BGP, the aigp comparison should
be made after the local route check in bgp bestpath selection.
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
Extended communities can be transitive or non-transitive.
Like other attributes (e.g., MED), non-transitive extended communities SHOULD
be sent to the direct peer, but not forwarded on to eBGP peers after that.
Before this patch, we never sent non-transitive extended communities to the
direct peers at all.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Currently the AIGP is always incremented when a route with the
attribute is advertised. That is incorrect when the nexthop is
unchanged, as is commonly the case in route reflection.
Adjust the AIGP for propagation only when the nexthop is set
to ourselves.
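As a sketch of the rule, with a hypothetical helper that is not an FRR function. It assumes that the amount added on propagation is the IGP metric towards the current nexthop, and that saturating on overflow is acceptable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compute the AIGP value to advertise.
 * `igp_metric` is assumed to be the IGP cost to the current nexthop. */
static uint64_t aigp_to_advertise(uint64_t aigp, uint32_t igp_metric,
				  bool nexthop_self)
{
	/* Nexthop unchanged (e.g. plain route reflection): pass through. */
	if (!nexthop_self)
		return aigp;

	/* Saturate rather than wrap around on overflow. */
	if (aigp > UINT64_MAX - igp_metric)
		return UINT64_MAX;
	return aigp + igp_metric;
}
```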
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
OAD is treated as an _internal_ BGP peer, and some of the rules (including BGP
attributes) can be relaxed.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Fix several issues in sourcing AIGP attribute:
1) AIGP should not be set as default for a redistributed route or a
static network. It should be set by config instead.
2) AIGP sourced by "set aigp-metric igp-metric" in a route-map does
not set the correct value for a redistributed route.
3) When redistributing a connected route like loopback, the AIGP (with
value 0) is sourced by "set aigp-metric igp-metric", but the
attribute is not propagated as the attribute flag is not set.
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
If the underlay IGP metric changes, we SHOULD re-announce the routes with the
correct bpi->extra->igpmetric set.
Without this patch if the IGP link cost (metric) changes, we never notice this
and the peers do not have the updated metrics, which in turn causes incorrect
best path selections on remote peers.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
The nexthop metric should be added to AIGP when calculating the
bestpath in bgp_path_info_cmp().
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
AS 65000 | AS 65001
         |
RR       |
|        |
R1 ---   | --- R2
         |
When the r1 peer is an iBGP route reflector client of rr, the r2 peer is an
eBGP neighbor of rr, and all three routers share the same network, r2
receives announcements coming from r1 with an IPv6 link-local nexthop
from rr. This is incorrect, as r2 should send traffic to r1 without
involving rr.
Do not send an IPv6 link-local nexthop if the originating peer is a
route-reflector client.
Link: https://github.com/FRRouting/frr/pull/16219#issuecomment-2397425505
Link: https://github.com/FRRouting/frr/pull/17037#discussion_r1792529683
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
In subgroup_announce_check(), the variable reflect is misleading, as it
suggests a relation to route reflection. However, it actually refers to
the scenario where an iBGP peer announces a route to another iBGP peer.
Rename reflect to ibgp_to_ibgp.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
If the "nexthop-local unchanged" setting is enabled, it preserves the
IPv6 link-local nexthop from the originating peer. However, if the
originating and destination peers are not on the same network segment,
the originating peer's IPv6 link-local address will be unreachable from
the destination peer.
In such cases, reset the IPv6 link-local nexthop, even if "nexthop-local
unchanged" is set on the destination peer.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Do not add an IPv6 link-local nexthop if the originating peer does not
provide one and the nexthop-local unchanged setting is enabled.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Currently bgp multipath has these properties:
a) mp_info may or may not be on a single path, based
upon path perturbations in the past.
b) mp_info->count started counting at 0 (meaning 1), as the
bestpath path_info was never included in the count.
c) The first mp_info in the list held the multipath data associated
with the multipath. As such if you were at any other node that data
was not filled in.
d) As such the mp_info's that are not first on the list basically
were just pointers to the corresponding bgp_path_info that was in
the multipath.
e) On bestpath calculation, a linklist(struct linklist *) of bgp_path_info's was
created.
f) This linklist was passed in to a comparison function that took the
old mpinfo list and compared it item by item to the linklist and
doing magic to figure out how to create a new mp_info list.
g) the old mp_info and the link list had to be memory managed and
freed up.
h) BGP_PATH_MULTIPATH is only set on non bestpath nodes in the
multipath.
This is really complicated. Let's change the algorithm to this:
a) When running bestpath, mark a bgp_path_info node that could be in the ecmp path as
BGP_PATH_MULTIPATH_NEW.
b) When running multipath, just walk the list of bgp_path_info's and if
it has BGP_PATH_MULTIPATH_NEW on it, decide if it is in BGP_MULTIPATH.
If we run out of space to put in the ecmp, clear the flag on the rest.
c) Clean up the counting of sometimes adding 1 to the mpath count.
d) Only allocate a mpath_info node for the bestpath. Clean it up
when done with it.
e) remove the unneeded list management associated with the linklist and
the mp_list.
This greatly simplifies multipath computation for bgp and reduces memory
load for large scale deployments.
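The new walk can be sketched roughly like this. The struct, the flag names and the function are hypothetical, and clearing the candidate flag on every node at the end of the walk is an assumption of the sketch rather than a statement about the actual code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags modelled on BGP_PATH_MULTIPATH_NEW / BGP_PATH_MULTIPATH. */
#define MPATH_FLAG_NEW      (1u << 0) /* candidate marked by bestpath */
#define MPATH_FLAG_SELECTED (1u << 1) /* actually part of the ecmp set */

/* Hypothetical stand-in for bgp_path_info. */
struct mpath_node {
	uint32_t flags;
	struct mpath_node *next;
};

/* Walk the path list once: promote candidates until `maxpaths` is
 * reached and clear the flags on everything else. No side list is
 * allocated. Returns the number of paths placed in the ecmp set. */
static unsigned multipath_update(struct mpath_node *head, unsigned maxpaths)
{
	unsigned count = 0;

	assert(maxpaths > 0);
	for (struct mpath_node *p = head; p; p = p->next) {
		if ((p->flags & MPATH_FLAG_NEW) && count < maxpaths) {
			p->flags |= MPATH_FLAG_SELECTED;
			count++;
		} else {
			/* not a candidate, or the ecmp set is full */
			p->flags &= ~MPATH_FLAG_SELECTED;
		}
		p->flags &= ~MPATH_FLAG_NEW; /* candidate flag is consumed */
	}
	return count;
}
```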
2 full feeds in work_queue_run prior:
0 56367.471 1123 50193 493695 50362 493791 0 0 0 TE work_queue_run
BGP multipath info : 1941844 48 110780992 1941844 110780992
2 full feeds in work_queue_run after change:
1 52924.931 1296 40837 465968 41025 487390 0 0 1 TE work_queue_run
BGP multipath info : 970860 32 38836880 970866 38837120
Approximately 4 seconds of saved CPU time for convergence and ~75 MB
less memory at run time.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is handy when you need to do source matching e.g. `match src-peer ...`
on outgoing direction with a route-map.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
If we have soft inbound enabled, we should see what the route looked like
before it was modified by a route-map/prefix-list.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
1. A bgp coredump is observed when we delete the default bgp instance
in a multi-vrf setup where route-leaking is enabled between the
default and non-default vrfs.
Removing the default router bgp when routes are leaked between non-default vrfs:
- Routes are leaked from VRF-A to VRF-B
- VPN table is created with auto RD/RT in default instance.
- Default instance is deleted, we try to unimport the routes from all VRFs
- non-default VRF schedules a work-queue to process deleted routes.
- Meanwhile default bgp instance clears VPN tables and free the route
entries as well, which are still referenced by non-default VRFs which
have imported routes.
- When work queue process starts to delete imported route in VRF-A it cores
as it accesses freed memory.
- Whenever we delete bgp in default vrf, we skip deleting routes in the vpn
table, import and export lists.
- The default hidden bgp instance will not be listed in any of the show
commands.
- Whenever we create new default instance, handle it with AS number change
i.e. old hidden default bgp's AS number is updated and also changing
local_as for all peers.
2. A default instance is created with ASN of the vrf with the import
statement.
This may not be the ASN desired for the default table
- First problem with current behavior.
Define two vrfs with different ASNs and then add import between.
starting without any bgp config (no default instance)
A default instance is created with ASN of the vrf with the import
statement.
This may not be the ASN desired for the default table
- Second related problem. Start with a default instance and a vrf in a
different ASN. Do an import statement in the vrf for a bgp vrf instance
not yet defined and it auto-creates that bgp/vrf instance and it inherits
the ASN of the importing vrf
- Handle bgp instances with different ASNs and handle ASN for auto created
BGP instance
Signed-off-by: Kantesh Mundaragi <kmundaragi@vmware.com>
Add counters for redistributed routes, and local aggregates to the
output of "show ip bgp statistics".
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
The check for an equivalent bgp pointer makes no sense
in the context of the workqueue as that we have a
work queue per bgp process, as such the bgp pointer
will always be the same as the pqnode.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
There is a need to be able to process certain bgp
routes earlier than others. Especially when there
is major trauma going on in the network. Start
the ability for this to happen.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Ticket: #4060069
The `show bgp vrf afi unicast statistics json` output is not returned in JSON
format for a non-existent vrf.
Fix:
JSON output is now produced for the non-existent vrf case.
Command supported:
```
show bgp vrf <VRFNAME> ipv4/ipv6 unicast statistics json
show bgp vrf <VRFNAME> l2vpn evpn statistics json
```
Before Fix:
```
leaf11#
leaf11# show bgp vrf test ipv4 unicast statistics json
View/Vrf test is unknown
leaf11#
leaf11#
leaf11# show bgp vrf test ipv6 unicast statistics json
View/Vrf test is unknown
leaf11#
leaf11#
leaf11# show bgp vrf default1 l2vpn evpn statistics json
View/Vrf default1 is unknown
leaf11#
```
After Fix:
```
leaf11#
leaf11# show bgp vrf test ipv4 unicast statistics json
{
"warning":"View/Vrf is unknown"
}
leaf11#
leaf11#
leaf11# show bgp vrf test ipv6 unicast statistics json
{
"warning":"View/Vrf is unknown"
}
leaf11#
leaf11# show bgp vrf default1 l2vpn evpn statistics json
{
"warning":"View/Vrf is unknown"
}
leaf11#
```
Signed-off-by: Sindhu Parvathi Gopinathan's <sgopinathan@nvidia.com>
With lots of update-groups, subgroups, this could be very tricky and the timer
is spawned even if it's totally unnecessary (default-originate is not enabled).
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
evpn has a concept of `local` tables where the evpn routes
are actually converted into underlying routes/neighbor
table entries( or vice versa ). Then this local route
is propagated to the global evpn l2vpn table and sent
to the peers. Certain show commands in evpn
operate on the local table but make the output look
like the data has not been sent to the peer. This
is confusing for the operator. Modify the code
such that local tables get a `Local BGP table not advertised`
line in the place where the output talks about who has received
the data or not.
Example:
torm11# show bgp l2vpn evpn route vni 1000 mac 8a:a1:cc:73:a3:ac ip 45.0.0.5
BGP routing table entry for [2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5]
Paths: (2 available, best #2)
Local BGP table not advertised
Route [2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5] VNI 1000
Imported from 192.168.100.18:2:[2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5], VNI 1000
65101 65005
192.168.100.18(leaf2) from leaf2(192.168.5.1) (192.168.100.14)
Origin IGP, valid, external
Extended Community: RT:65005:1000 ET:8
Last update: Thu Mar 21 14:29:04 2024
Route [2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5] VNI 1000
Imported from 192.168.100.18:2:[2]:[0]:[48]:[8a:a1:cc:73:a3:ac]:[32]:[45.0.0.5], VNI 1000
65101 65005
192.168.100.18(leaf1) from leaf1(192.168.1.1) (192.168.100.13)
Origin IGP, valid, external, bestpath-from-AS 65101, best (Router ID)
Extended Community: RT:65005:1000 ET:8
Last update: Thu Mar 21 14:29:04 2024
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
If we expand the truth table of (A || B) to "(A && B) || (A && !B) || (!A && B)"
so that we can isolate the case (!A && B), we can then add the additional
check (C) to ensure that the original route actually has a link-local next-hop.
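The expansion can be checked mechanically. In this sketch A, B and C are left as plain booleans, since the commit message does not spell out what each condition stands for, and the function names are made up for the illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* The expanded form of (A || B): three mutually exclusive cases. */
static bool expanded_or(bool a, bool b)
{
	return (a && b) || (a && !b) || (!a && b);
}

/* Same expansion, with the extra check C applied only to the
 * isolated (!A && B) case. This is equivalent to A || (B && C). */
static bool guarded_or(bool a, bool b, bool c)
{
	return (a && b) || (a && !b) || (!a && b && c);
}
```

Looping over all eight input combinations confirms both equivalences.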
Signed-off-by: Richard Cunningham <29760295+cunningr@users.noreply.github.com>
Fix static-analyser warnings with BGP labels:
> $ scan-build make -j12
> bgpd/bgp_updgrp_packet.c:819:10: warning: Access to field 'extra' results in a dereference of a null pointer (loaded from variable 'path') [core.NullDereference]
> ? &path->extra->labels->label[0]
> ^~~~~~~~~
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
When parsing EVPN NLRIs, if an error occurs, do not forget to free the memory.
Fixes: 4ace11d010 ("bgpd: Move evpn_overlay to a pointer")
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
A crash happens when executing the following command:
> ubuntu2204hwe# conf
> ubuntu2204hwe(config)# router bgp 65500
> ubuntu2204hwe(config-router)# !
> ubuntu2204hwe(config-router)# address-family ipv4 unicast
> ubuntu2204hwe(config-router-af)# sid vpn export auto
> ubuntu2204hwe(config-router-af)# exit-address-family
> ubuntu2204hwe(config-router)# !
> ubuntu2204hwe(config-router)# address-family ipv4 vpn
> ubuntu2204hwe(config-router-af)# network 4.4.4.4/32 rd 55:55 label 556
> ubuntu2204hwe(config-router-af)# network 5.5.5.5/32 rd 662:33 label 232
> ubuntu2204hwe(config-router-af)# exit-address-family
> ubuntu2204hwe(config-router)# exit
> ubuntu2204hwe(config)# !
> ubuntu2204hwe(config)# no router bgp
The crash analysis indicates a memory item has been freed.
> #6 0x000076066a629c15 in mt_count_free (mt=0x56b57be85e00 <MTYPE_BGP_NAME>, ptr=0x60200038b4f0)
> at lib/memory.c:73
> #7 mt_count_free (ptr=0x60200038b4f0, mt=0x56b57be85e00 <MTYPE_BGP_NAME>) at lib/memory.c:69
> #8 qfree (mt=mt@entry=0x56b57be85e00 <MTYPE_BGP_NAME>, ptr=0x60200038b4f0) at lib/memory.c:129
> #9 0x000056b57bb09ce9 in bgp_free (bgp=<optimized out>) at bgpd/bgpd.c:4120
> #10 0x000056b57bb0aa73 in bgp_unlock (bgp=<optimized out>) at ./bgpd/bgpd.h:2513
> #11 peer_free (peer=0x62a000000200) at bgpd/bgpd.c:1313
> #12 0x000056b57bb0aca8 in peer_unlock_with_caller (name=<optimized out>, peer=<optimized out>)
> at bgpd/bgpd.c:1344
> #13 0x000076066a6dbb2c in event_call (thread=thread@entry=0x7ffc8cae1d60) at lib/event.c:2011
> #14 0x000076066a60aa88 in frr_run (master=0x613000000040) at lib/libfrr.c:1214
> #15 0x000056b57b8b2c44 in main (argc=<optimized out>, argv=<optimized out>) at bgpd/bgp_main.c:543
Actually, the BGP_NAME item has not been used at allocation for
static->prd_pretty, and this results in reaching 0 quicker at bgp
deletion.
Fix this by reassigning MTYPE_BGP_NAME to prd_pretty.
Fixes: 16600df2c4 ("bgpd: fix show run of network route-distinguisher")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Instead of using 3 uint8_t variables under struct attr, let's use a single
uint8_t as the flags. Saving 2-bytes. Not a big deal, but it's even easier to
track EVPN-related flags/variables.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Introduce BGP-wide flags to denote if BGP has started gracefully
and GR is in progress or not. Use this for setting of the R-bit in
the GR capability, and not a timer which is set for any new
instance creation. Mark graceful restart as complete when the
deferred path selection has been done and route sync with zebra as
well as deferred EOR advertisement has been initiated.
Introduce a function to check on F-bit setting rather than just
base it on configuration.
Subsequent commits will extend these functionalities.
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
RFC 8212 defines leak prevention for eBGP peers, but BGP-OAD defines a new
peering type One Administrative Domain (OAD), where multiple ASNs could be used
inside a single administrative domain. OAD allows sending non-transitive attributes,
so this prevention should be relaxed too.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Under a setup where two BGP prefixes are available from multiple sources,
if one of the two prefixes is recursive over the other BGP prefix, then
it will not be considered as multipath. The below output shows the two
prefixes 192.0.2.24/32 and 192.0.2.21/32. The 192.0.2.[5,6,8] are the
known IP addresses visible from the IGP.
> # show bgp ipv4 192.0.2.24/32
> *>i 192.0.2.24/32 192.0.2.21 0 100 0 i
> * i 192.0.2.21 0 100 0 i
> * i 192.0.2.21 0 100 0 i
> # show bgp ipv4 192.0.2.21/32
> *>i 192.0.2.21/32 192.0.2.5 0 100 0 i
> *=i 192.0.2.6 0 100 0 i
> *=i 192.0.2.8 0 100 0 i
The bgp best selection algorithm refuses to consider the paths to
'192.0.2.24/32' as multipath, whereas the BGP paths which use the
BGP peer as nexthop are considered multipath.
> ... has the same nexthop as the bestpath, skip it ...
Previously, this condition was added to prevent ZEBRA from
installing routes with the same nexthop:
> Here you can see the two paths with nexthop 210.2.2.2
> superm-redxp-05# show ip route 2.23.24.192/28
> Routing entry for 2.23.24.192/28
> Known via "bgp", distance 20, metric 0, best
> Last update 00:32:12 ago
> * 210.2.2.2, via swp3
> * 210.2.0.2, via swp1
> * 210.2.1.2, via swp2
> * 210.2.2.2, via swp3
> [..]
But today, ZEBRA knows how to handle it. When receiving incoming routes,
nexthop groups are used. At creation, duplicated nexthops are
identified, and will not be installed. The output below illustrates the
duplicate paths to 172.16.0.200 received by another peer.
> r1# show ip route 172.18.1.100 nexthop-group
> Routing entry for 172.18.1.100/32
> Known via "bgp", distance 200, metric 0, best
> Last update 00:03:03 ago
> Nexthop Group ID: 75757580
> 172.16.0.200 (recursive), weight 1
> * 172.31.0.3, via r1-eth1, label 16055, weight 1
> * 172.31.2.4, via r1-eth2, label 16055, weight 1
> * 172.31.0.3, via r1-eth1, label 16006, weight 1
> * 172.31.2.4, via r1-eth2, label 16006, weight 1
> * 172.31.8.7, via r1-eth4, label 16008, weight 1
> 172.16.0.200 (duplicate nexthop removed) (recursive), weight 1
> 172.31.0.3, via r1-eth1 (duplicate nexthop removed), label 16055, weight 1
> 172.31.2.4, via r1-eth2 (duplicate nexthop removed), label 16055, weight 1
> 172.31.0.3, via r1-eth1 (duplicate nexthop removed), label 16006, weight 1
> 172.31.2.4, via r1-eth2 (duplicate nexthop removed), label 16006, weight 1
> 172.31.8.7, via r1-eth4 (duplicate nexthop removed), label 16008, weight 1
Fix this by proposing to let ZEBRA handle this duplicate decision.
Fixes: 7dc9d4e4e3 ("bgp may add multiple path entries with the same nexthop")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This commit addresses an issue that happens when using bgp
labeled unicast peering with a rr client, with a received prefix
which is the local ip address of the bgp session.
When using bgp ipv4 labeled session, the local prefix is
received by a peer, and finds out that the proposed prefix
and its next-hop are the same. To avoid a route loop locally,
no nexthop entry is referenced for that prefix, and the route
will not be selected.
As it has been done for ipv4-unicast, apply the following fix
for labeled address families: when the received peer is
a route reflector, the prefix has to be selected, even if the
route can not be installed locally.
Fixes: f874552557 ("bgpd: authorise to select bgp self peer prefix on rr case")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: Dmytro Shytyi <dmytro.shytyi@6wind.com>
In a BGP L3VPN context using ADJ-RIB-IN (ie. enabled with
'soft-reconfiguration inbound'), after applying a deny route-map and
removing it, the remote MPLS label information is lost. As a result, BGP
is unable to re-install the related routes in the RIB.
For example,
> router bgp 65500
> [..]
> neighbor 192.0.2.2 remote-as 65501
> address-family ipv4 vpn
> neighbor 192.0.2.2 activate
> neighbor 192.0.2.2 soft-reconfiguration inbound
The 192.168.0.0/24 prefix has a remote label value of 102 in the BGP
RIB.
> # show bgp ipv4 vpn 192.168.0.0/24
> BGP routing table entry for 444:1:192.168.0.0/24, version 2
> [..]
> 192.168.0.0 from 192.0.2.2
> Origin incomplete, metric 0, valid, external, best (First path received)
> Extended Community: RT:52:100
> Remote label: 102
A route-map now filters all incoming BGP updates:
> route-map rmap deny 1
> router bgp 65500
> address-family ipv4 vpn
> neighbor 192.0.2.2 route-map rmap in
The prefix is now filtered:
> # show bgp ipv4 vpn 192.168.0.0/24
> #
The route-map is detached:
> router bgp 65500
> address-family ipv4 vpn
> no neighbor 192.0.2.2 route-map rmap in
The BGP RIB entry is present but the remote label is lost:
> # show bgp ipv4 vpn 192.168.0.0/24
> BGP routing table entry for 444:1:192.168.0.0/24, version 2
> [..]
> 192.168.0.0 from 192.0.2.2
> Origin incomplete, metric 0, valid, external, best (First path received)
> Extended Community: RT:52:100
The reason for the loss is that labels are stored within struct attr ->
struct extra -> struct bgp_labels but not in the struct bgp_adj_in.
Reference the bgp_labels pointer in struct bgp_adj_in and use its values
when doing a soft reconfiguration of the BGP table.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
In route_vty_out_detail(), tag_buf stores a string representation of
the VNI label.
Rename tag_buf to vni_buf for clarity and rework the code a little bit
to prepare the following commits.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>