When BGP has been asked to wait for FIB installation, the return call
on route removal is likely to not have the dest, since BGP will have
already cleaned up the node entirely. Let's just note that the prefix
cannot be found when debugs are turned on and move on.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
BGP now keeps a list of dests, with each dest having a pointer
to the bgp_path_info that it will be working on.
1) When bgp receives a prefix, process it, add the bgp_dest of the
prefix into the new FIFO list if not present, update the flags (e.g.
if the prefix was previously advertised and is now withdrawn),
increment the ref_count and DO NOT advertise the install/withdraw
to zebra yet.
2) Schedule an event to wake up and invoke the new function, which
walks the list one by one and installs/withdraws the routes into
zebra (see the sketch below).
   a) if BUFFER_EMPTY, process the next item on the list
   b) if BUFFER_PENDING, bail out; the callback in
zclient_flush_data() will invoke the same function when BUFFER_EMPTY
Changes
- rename old bgp_zebra_announce to bgp_zebra_announce_actual
- rename old bgp_zebra_withdraw to bgp_zebra_withdraw_actual
- Handle new fifo list cleanup in bgp_exit()
- New funcs: bgp_handle_route_announcements_to_zebra() and
bgp_zebra_route_install()
- Define a callback function to invoke
bgp_handle_route_announcements_to_zebra() when BUFFER_EMPTY in
zclient_flush_data()
The current change deals with bgp installing routes via
bgp_process_main_one()
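As a rough illustration of the walker described above, here is a minimal,
self-contained sketch with mock types; the names (zebra_announce_fifo,
announce_to_zebra, the buffer-status handling) are assumptions for the
sketch, not the actual FRR implementation.
```c
/* Hedged sketch with mock types: models the FIFO walk described above. */
#include <stdbool.h>
#include <stddef.h>

enum buffer_status { BUFFER_EMPTY, BUFFER_PENDING };

struct mock_dest {
	struct mock_dest *next;
	bool withdraw;          /* latest intent: install or withdraw */
	unsigned int ref_count; /* bumped each time the prefix is (re)queued */
};

static struct mock_dest *zebra_announce_fifo; /* head of the dest FIFO */

/* Stand-ins for the real install/withdraw writes toward zebra. */
static enum buffer_status announce_to_zebra(struct mock_dest *dest)
{
	(void)dest;
	return BUFFER_EMPTY;
}

static enum buffer_status withdraw_from_zebra(struct mock_dest *dest)
{
	(void)dest;
	return BUFFER_EMPTY;
}

/* Walk the FIFO; stop as soon as the zebra output buffer backs up. */
static void handle_route_announcements_to_zebra(void)
{
	while (zebra_announce_fifo) {
		struct mock_dest *dest = zebra_announce_fifo;
		enum buffer_status status;

		zebra_announce_fifo = dest->next; /* pop the head */
		status = dest->withdraw ? withdraw_from_zebra(dest)
					: announce_to_zebra(dest);
		dest->ref_count--; /* assumption: drop the reference taken at queue time */

		if (status == BUFFER_PENDING)
			return; /* the BUFFER_EMPTY callback re-invokes us later */
	}
}
```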
Ticket: #3390099
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Since installing/withdrawing routes into zebra is going to be changed
around to be dest based in a list,
- Retrieve the afi/safi to use based upon the dest's afi/safi
instead of passing it in.
- The prefix is known by the dest. Remove this argument as well (see the sketch below).
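A minimal illustration, with mock types, of what "dest based" means here;
the field and helper names are assumptions for the sketch, not the real
bgpd structures.
```c
/* Hedged sketch: derive afi/safi and the prefix from the dest itself. */
struct mock_prefix { int family; };
struct mock_table { int afi, safi; };
struct mock_dest {
	const struct mock_table *table; /* the table the dest lives in */
	struct mock_prefix p;           /* the prefix the dest represents */
};

static void route_install(const struct mock_dest *dest)
{
	int afi = dest->table->afi;             /* no afi argument needed */
	int safi = dest->table->safi;           /* no safi argument needed */
	const struct mock_prefix *p = &dest->p; /* no prefix argument needed */

	(void)afi; (void)safi; (void)p;
	/* ...build and send the zebra message from these... */
}
```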
Ticket: #3390099
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Move mp_nexthop_prefer_global boolean attribute to nh_flags. It does
not currently save memory because of the packing.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
If the VRF is not yet created and a BGP instance is created for the
VRF, dependent leaked routes are inactive, which is normal. However,
when the VRF interface appears, they remain inactive.
Update route leak when a VRF interface appears. Note that routes to a
deleted VRF are already removed by zebra.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Locally leaked routes remain active after the nexthop VRF interface goes
down.
Update route leaking when the loopback or a VRF interface state change is
received from zebra.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Leaked recursive routes are not resolved.
> VRF r1-cust1:
> B> 5.1.0.0/24 [200/98] via 99.0.0.1 (recursive), weight 1, 00:00:08
> * via 192.168.1.2, r1-eth4, weight 1, 00:00:08
> B>* 99.0.0.1/32 [200/0] via 192.168.1.2, r1-eth4, weight 1, 00:00:08
> VRF r1-cust4:
> B 5.1.0.0/24 [20/98] via 99.0.0.1 (vrf r1-cust1) inactive, weight 1, 00:00:08
> B>* 99.0.0.1/32 [20/0] via 192.168.1.2, r1-eth4 (vrf r1-cust1), weight 1, 00:00:08
When announcing the routes to zebra, use the peer of the ultimate bgp
path info instead of the one of the first parent path info to determine
whether the route is recursive.
The result is:
> VRF r1-cust4:
> B> 5.1.0.0/24 [20/98] via 99.0.0.1 (vrf r1-cust1) (recursive), weight 1, 00:00:02
> * via 192.168.1.2, r1-eth4 (vrf r1-cust1), weight 1, 00:00:02
> B>* 99.0.0.1/32 [20/0] via 192.168.1.2, r1-eth4 (vrf r1-cust1), weight 1, 00:00:02
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
When a BGP flowspec peering stops, the BGP RIB entries for IPv6
flowspec entries are removed, but not the ZEBRA RIB IPv6 entries.
Actually, when bgp_zebra_withdraw() is called, only the AFI_IP
parameter is passed to the bgp_pbr_update_entry() function in charge
of the Flowspec add/delete in zebra. Fix this by passing the AFI
parameter to the bgp_zebra_withdraw() function.
Note that topotests do not show the problem, as the flowspec driver
code is not present and was refused. Without that, routes are not
installed, and cannot be uninstalled.
Fixes: 529efa2346 ("bgpd: allow flowspec entries to be announced to zebra")
Link: https://github.com/FRRouting/frr/pull/2025
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
There is no function that both sets the nhg id, and sets
the ZAPI_MESSAGE_NHG flag if the nhg id is valid.
Create a ZAPI API to do this, and apply the changes wherever
needed.
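A minimal sketch of the kind of helper described, using mock types; the
real ZAPI structure, flag name, and function signature may differ.
```c
/* Hedged sketch: couple the nhg id assignment with the presence flag. */
#include <stdint.h>

#define MOCK_ZAPI_MESSAGE_NHG 0x01u

struct mock_zapi_route {
	uint32_t message; /* bitmask of optional fields carried in the message */
	uint32_t nhgid;   /* nexthop-group id; 0 means "not set" */
};

static void mock_zapi_route_set_nhg_id(struct mock_zapi_route *api,
					const uint32_t *nhg_id)
{
	if (!nhg_id || *nhg_id == 0)
		return; /* invalid id: leave both the field and the flag alone */

	api->nhgid = *nhg_id;
	api->message |= MOCK_ZAPI_MESSAGE_NHG;
}
```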
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Separate the processing in bgp_zebra_announce(), by separating the
nexthop code in a separate function called
bgp_zebra_announce_parse_nexthop(). This commit does not bring any
functional change.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When adding/removing a route, the next-hop can be dumped
with debugging turned on. Move this code into a separate
function. There is no other change in this commit.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Implement proper memory cleanup for SRv6 functions and locator chunks
to prevent potential memory leaks.
The list callback deletion functions have been set.
The ASan leak log for reference:
```
***********************************************************************************
Address Sanitizer Error detected in bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.asan.bgpd.4180
=================================================================
==4180==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 544 byte(s) in 2 object(s) allocated from:
#0 0x7f8d176a0d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x7f8d1709f238 in qcalloc lib/memory.c:105
#2 0x55d5dba6ee75 in sid_register bgpd/bgp_mplsvpn.c:591
#3 0x55d5dba6ee75 in alloc_new_sid bgpd/bgp_mplsvpn.c:712
#4 0x55d5dba6f3ce in ensure_vrf_tovpn_sid_per_af bgpd/bgp_mplsvpn.c:758
#5 0x55d5dba6fb94 in ensure_vrf_tovpn_sid bgpd/bgp_mplsvpn.c:849
#6 0x55d5dba7f975 in vpn_leak_postchange bgpd/bgp_mplsvpn.h:299
#7 0x55d5dba7f975 in vpn_leak_postchange_all bgpd/bgp_mplsvpn.c:3704
#8 0x55d5dbbb6c66 in bgp_zebra_process_srv6_locator_chunk bgpd/bgp_zebra.c:3164
#9 0x7f8d1716f08a in zclient_read lib/zclient.c:4459
#10 0x7f8d1713f034 in event_call lib/event.c:1974
#11 0x7f8d1708242b in frr_run lib/libfrr.c:1214
#12 0x55d5db99d19d in main bgpd/bgp_main.c:510
#13 0x7f8d160c5c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
Direct leak of 296 byte(s) in 1 object(s) allocated from:
#0 0x7f8d176a0d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x7f8d1709f238 in qcalloc lib/memory.c:105
#2 0x7f8d170b1d5f in srv6_locator_chunk_alloc lib/srv6.c:135
#3 0x55d5dbbb6a19 in bgp_zebra_process_srv6_locator_chunk bgpd/bgp_zebra.c:3144
#4 0x7f8d1716f08a in zclient_read lib/zclient.c:4459
#5 0x7f8d1713f034 in event_call lib/event.c:1974
#6 0x7f8d1708242b in frr_run lib/libfrr.c:1214
#7 0x55d5db99d19d in main bgpd/bgp_main.c:510
#8 0x7f8d160c5c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
***********************************************************************************
```
Signed-off-by: Keelan Cannoo <keelan.cannoo@icloud.com>
... and use it instead of fiddling with the `.synchronous` field.
(Make it const while at it.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Replace `struct list *` with `DLIST(if_connected, ...)`.
NB: while converting this, I found multiple places using connected
prefixes assuming they were IPv4 without checking:
- vrrpd/vrrp.c: vrrp_socket()
- zebra/irdp_interface.c: irdp_get_prefix(), irdp_if_start(),
irdp_advert_off()
(these fixes are really hard to split off into separate commits as that
would require going back and reapplying the change but with the old list
handling)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
INTERFACE_NAMSIZ is just a redefine of IFNAMSIZ and IFNAMSIZ
is the standard for interface name length on all platforms
that FRR currently compiles on.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Let's use the natural data structure in bgp for the prefix display
instead of a bunch of places where we call a translator function.
The %pBD does this and actually ensures data is correct.
Also fix a few spots in bgp_zebra.c where the cast to a NULL
pointer causes the catcher functionality to not work, and fix
the crash that resulted.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add guard for `zlog_debug` when bgpd is not connected to zebra
or zebra does not know the bgp instance.
Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
...so that multiple functions can be subscribed.
The create/destroy hooks are renamed to real/unreal because that's what
they *actually* signal.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Add the 'redistribute table-direct' command under the bgp address-family
node. Handle the table-direct support wherever needed in the BGP code.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Today, when configuring BGP L3VPN MPLS, the operator may
use the following command to hardset a label value:
> router bgp 65500 vrf vrf1
> address-family ipv4 unicast
> label vpn export <hardset_label_value>
Today, BGP uses this value without checks, leading to potential
conflicts with other control planes like LDP. For instance, if
LDP initiates with a label chunk of [16;72] and BGP also uses
label value 50, a conflict arises.
The 'label manager' service in zebra oversees label allocations.
While all the control plane daemons use it, BGP doesn't when a
hardset label is in place.
This update fixes this problem. Now, when a hardset label is set for
l3vpn export, a request is made to the label manager for approval,
ensuring no conflicts with other daemons. But, this means some existing
BGP configurations might become non-operational if they conflict with
labels already allocated to another daemon but not used.
note: Labels below 16 are reserved and won't be checked for consistency
by the label manager.
Fixes: ddb5b4880b ("bgpd: vpn-vrf route leaking")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Instead of scaling the bandwidth to something between 1 and 100, just
send down the available bandwidth for the link.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Currently when one interface changes its VRF, zebra will send these messages to
all daemons in *order*:
1) `ZEBRA_INTERFACE_DELETE` ( notify them delete from old VRF )
2) `ZEBRA_INTERFACE_VRF_UPDATE` ( notify them move from old to new VRF )
3) `ZEBRA_INTERFACE_ADD` ( notify them added into new VRF )
When daemons deal with `VRF_UPDATE`, they use
`zebra_interface_vrf_update_read()->if_lookup_by_name()`
to check whether the interface exists in the old VRF. This check will always
return *NULL* because `DELETE` ( deleted from old VRF ) has already been done,
so the interface can't be found in the old VRF.
Sending `VRF_UPDATE` is redundant and useless. `DELETE` and `ADD` are enough;
they deal with the RB tree, so don't send this `VRF_UPDATE` message when the
vrf changes.
Since all daemons have a good mechanism to deal with a changing vrf and don't
use this `VRF_UPDATE` mechanism, it is safe to completely remove
all the code with `VRF_UPDATE`.
Signed-off-by: anlan_cs <anlan_cs@tom.com>
Extend zebra and lib to use multiple SRv6 segment SIDs, and keep one
segment SID for bgpd and sharpd.
Note: bgpd and sharpd compilation relies on the lib and zebra files,
i.e. if lib, zebra, bgpd or sharpd were separated into different
commits, this would not compile.
Signed-off-by: Dmytro Shytyi <dmytro.shytyi@6wind.com>
Link-state prefixes are only intended to be read by a link-state
consumer (i.e. a controller). They cannot be installed in the
Forwarding Information Base (FIB).
Do not announce them to zebra.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Modify bgp_fsm_change_status to be connection oriented and
also make the BGP_TIMER_ON and BGP_EVENT_ADD macros connection
oriented as well. Attempt to make peer_xfer_conn a bit more
understandable because, frankly, it was/is confusing.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When the BGP 'redistribute table' command is used for a given route
table, and BGP configuration is flushed and rebuilt, the redistribution
does not work.
Actually, when flushing the BGP configuration with the 'no router bgp'
command, the BGP redistribute entries related to the 'redistribute table'
entries are not flushed: at BGP deletion, the table number is not
given as a parameter to the bgp_redistribute_unset() function, and the
redistribution entry is not removed in zebra.
Fix this by adding some code to flush all the redistribute table
instances.
Fixes: 7c8ff89e93 ("Multi-Instance OSPF Summary")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This is based on @donaldsharp's work.
The current code base uses the struct bgp_node data structure.
The problem with this is that it creates a bunch of
extra data per route_node.
The table structure generates ‘holder’ nodes
that are never going to receive bgp routes,
and currently the memory of those nodes is allocated
as if they were a full bgp_node.
After splitting up the bgp_node into bgp_dest and route_node,
the memory of a ‘holder’ node which does not have any bgp data
will be allocated as a route_node, not a bgp_node,
and the memory usage is reduced.
The memory usage of a BGP node is reduced from 200B to 96B.
The total memory usage optimization of this part is ~16.00%.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Yuqing Zhao <xiaopanghu99@163.com>
In the netlink-mediated kernel dataplane, each rule is stored
in either an IPv4-specific database or an IPv6-specific database.
PBRD opportunistically gleans each rule's address family value
from its source or destination IP address match value (if either
exists), or from its nexthop or nexthop-group (if it exists).
The 'family' value is particularly needed for netlink during
incremental rule deletion when none of the above fields remain set.
Before now, this address family has been encoded by occult means
in the (possibly otherwise unset) source/destination IP match
fields in ZAPI and zebra.
This commit documents the reasons for maintaining the 'family'
field in the PBRD rule structure, adds a 'family' field in the
common lib/pbr.h rule structure, and carries it explicitly in ZAPI.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
Even if some of the attributes in bgp_path_info_extra are
not used, their memory is still allocated every time. This
wastes memory.
This commit deletes all unnecessary attributes and
changes the optional attributes to pointer storage. Memory
will only be allocated when they are actually used. After
optimization, extra-info-related memory is reduced by about
half (~400B -> ~200B).
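A tiny, self-contained sketch (mock types, standard calloc instead of
FRR's allocators) of the "pointer storage, allocate on first use" pattern
described above.
```c
/* Hedged sketch: optional data behind a pointer, allocated lazily. */
#include <stdlib.h>

struct mock_extra {
	int some_rarely_used_attribute; /* optional attributes live here */
};

struct mock_path_info {
	struct mock_extra *extra; /* NULL until something actually needs it */
};

static struct mock_extra *mock_path_info_extra_get(struct mock_path_info *pi)
{
	if (!pi->extra)
		pi->extra = calloc(1, sizeof(*pi->extra)); /* pay only on first use */
	return pi->extra;
}
```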
Signed-off-by: Valerian_He <1826906282@qq.com>
Over time the bgp_zebra_announce_default function has gotten
slightly convoluted; clean it up so it's easier to read.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
bgpd, pbrd: use common pbr encoder
zebra: use common pbr decoder
tests: pbr_topo1: check more filter fields
Purpose:
1. Reduce likelihood of zapi format mismatches when adding
PBR fields due to multiple parallel encoder implementations
2. Encourage common PBR structure usage among various daemons
3. Reduce coding errors via explicit per-field enable flags
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
Subset: ZAPI changes to send the new data
Also adds filter_bm field; currently for PBR_FILTER_PCP, but in the
future to be used for all of the filter fields.
Changes by:
Josh Werner <joshuawerner@mitre.org>
Eli Baum <ebaum@mitre.org>
G. Paul Ziemba <paulz@labn.net>
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
bgp_zebra_tm_connect calls bgp_zebra_get_table_range, which
just used the global zclient. That of course still had
us exposing the global zclient to read and drop important
data from zebra. This fixes commit 787c61e03c.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
If this is an error, we should use zlog_err, not zlog_info, as this is
literally not informational, but an error.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
When running all daemons with config for most of them, FRR has
sharpd@janelle:~/frr$ vtysh -c "show debug hashtable" | grep "VRF BIT HASH" | wc -l
3570
3570 hashes for bitmaps associated with the vrf. This is a very
large number of hashes. Let's do two things:
a) Reduce the initial size of the created hashes to 2
instead of 32.
b) Delay generation of the hash *until* a set operation happens,
since having no hash directly implies an unset value if/when checked.
This reduces the number of hashes to 61 in my setup for normal
operation.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Both the label manager and table manager zapi code send data requests via zapi
to zebra and then immediately listen for a response from zebra. The problem here
is of course that the listen part is throwing away any zapi command that is not
the one it is looking for.
ISIS/OSPF and PIM all have synchronous abilities via zapi, which they all
do through a special zapi connection to zebra. BGP needs to follow this model
as well. Additionally, for the new zclient_sync connection that should be
created, a once-a-second timer should wake up and read any data on the socket
to prevent problems from too much data accumulating in the socket.
```
r3# sh bgp labelpool summary
Labelpool Summary
-----------------
Ledger: 3
InUse: 3
Requests: 0
LabelChunks: 1
Pending: 128
Reconnects: 1
r3# sh bgp labelpool inuse
Prefix Label
---------------------------
10.0.0.1/32 16
192.168.31.0/24 17
192.168.32.0/24 18
r3#
```
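A rough, self-contained sketch of the periodic drain described above, using
a plain non-blocking socket read; the real code would use the zclient_sync
connection and FRR's event timer, which are not shown here.
```c
/* Hedged sketch: body of a once-a-second job draining a sync socket. */
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

static void drain_sync_socket(int fd)
{
	char buf[4096];
	ssize_t n;

	/* Read whatever has accumulated without blocking, and discard it:
	 * nobody is waiting on these replies outside the sync request path. */
	while ((n = recv(fd, buf, sizeof(buf), MSG_DONTWAIT)) > 0)
		;

	if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
		/* A real implementation would tear down and reconnect here. */
	}
}
```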
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
When advertising an mpls vpn entry with a new label,
the return traffic is redirected to the local machine,
but the MPLS traffic is dropped.
Add an MPLS entry to handle MPLS packets which have
the new label value. Traffic is swapped to the original
label value from the mpls vpn next-hop entry; then it is
sent to the resolved next-hop of the original next-hop
from the mpls vpn next-hop entry.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
For whatever reason, we were only updating tip_hash when we processed an
L2VNI add/del. This adds tip_hash updates to the L3VNI add/del codepaths
so that their VTEP-IPs are also used when considering martian
addresses, e.g. bgp_nexthop_self().
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
BGP MPLSVPN next hop label allocation was using only the next-hop
IP address. As MPLSVPN contexts rely on bnc contexts, the real
nexthop interface is known, and the LSP entry to enter can apply
to the specific interface. To illustrate, the BGP service is able
to handle the following two iproute2 commands:
> ip -f mpls route add 105 via inet 192.0.2.45 dev r1-eth1
> ip -f mpls route add 105 via inet 192.0.2.46 dev r1-eth2
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This commit introduces a new method to associate a label to
prefixes to export to a VPNv4 backbone. All the methods to
associate a label to a BGP update are documented in RFC 4364,
chapter 4.3.2. Initially, the "single label for an entire
VRF" method was available. This commit adds the "single label
for each attachment circuit" method.
The change impacts the control-plane, because each BGP update
is checked to know if the nexthop has reachability in the VRF
or not. If this is the case, then a unique label for a given
destination IP in the VRF will be picked up. This label will
be reused for another BGP update that will have the same
nexthop IP address.
The change impacts the data-plane, because the MPLS pop
mechanism applied to incoming labelled packets changes: the
MPLS label is popped, and the packet is directly sent to the
connected nexthop described in the previous outgoing BGP VPN
update.
By default per-vrf mode is done, but the user may choose
the per-nexthop mode, by using the vty command from the
previous commit. In the latter case, a per-vrf label
will however be allocated to handle networks that are not directly
connected. This is the case for local traffic for instance.
The change also includes the following:
- ECMP case
When a route is learnt in a given VRF and is resolved via an
ECMP nexthop, this implies that when exporting the route as a BGP
update, if label allocation per nexthop is used, then two possible
MPLS values could be picked up, which is not possible with the
current implementation. Actually, the NLRI for VPNv4 stores one
prefix, and one single label value, not two. Today, RFC8277 with
multiple label capability is not yet available.
To avoid this corner case, when a route is resolved via more than one
nexthop, the label allocation per nexthop will not apply, and the
default per-vrf label will be chosen.
Let us imagine BGP redistributes a static route using the `172.31.0.20`
nexthop. The nexthop resolution will find two different nexthops for a
unique BGP update.
> r1# show running-config
> [..]
> vrf vrf1
> ip route 172.31.0.30/32 172.31.0.20
> r1# show bgp vrf vrf1 nexthop
> [..]
> 172.31.0.20 valid [IGP metric 0], #paths 1
> gate 192.0.2.11
> gate 192.0.2.12
> Last update: Mon Jan 16 09:27:09 2023
> Paths:
> 1/1 172.31.0.30/32 VRF vrf1 flags 0x20018
To avoid this situation, BGP updates that resolve over multiple
nexthops are using the unique per-vrf label.
- recursive route case
Prefixes that need a recursive route to be resolved can
also be eligible for mpls allocation per nexthop. In that
case, the nexthop will be the recursive nexthop calculated.
To achieve this, all nexthop types in bnc contexts are valid,
except for the blackhole nexthops.
- network declared prefixes
Nexthop tracking is used to look for the reachability of the
prefixes. When the 'no bgp network import-check' command
is used, network declared prefixes are maintained active,
even if there is no active nexthop.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Effectively a massive search and replace of
`struct thread` to `struct event`. Using the
term `thread` gives people the impression that
this event system is a pthread, when it is not.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is the first in a series of commits whose goal is to rename
the thread system in FRR to an event system. There is a continual
problem where people are confusing `struct thread` with a true
pthread. In reality, our entire thread.c is an event system.
In this commit rename the thread.[ch] files to event.[ch].
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The flag for telling BGP that a route is expected to be installed
first before notifying a peer was always being set upon receipt
of a path that could be accepted as bestpath. This is not correct:
imagine that you have a peer sending you a route and you have a
network statement that covers the same route. Irrespective of whether
the network statement would win, the flag on the dest was being set
in bgp_update. Thus you could get into a situation where
the network statement path wins but since the flag is set on
the node, it will never be announced to a peer.
Let's just move the setting of the flag into bgp_zebra_announce
and _withdraw. In _announce set the flag to TRUE when suppress-fib
is enabled. In _withdraw just always unset the flag, as a withdrawal
does not need to wait for rib removal before announcing. This will
cover the case when a network statement is added after the route has
been learned from a peer.
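A minimal sketch of the flag handling described above, with mock names and
types; BGP_NODE_FIB_INSTALL_PENDING and the suppress-fib knob are modeled,
not the real symbols.
```c
/* Hedged sketch: set the wait-for-FIB flag only on announce, clear on withdraw. */
#include <stdbool.h>

struct mock_dest {
	bool fib_install_pending; /* models the "wait for FIB before advertising" flag */
};

static bool suppress_fib_pending; /* models the suppress-fib configuration knob */

static void mock_zebra_announce(struct mock_dest *dest)
{
	if (suppress_fib_pending)
		dest->fib_install_pending = true; /* advertise only after zebra confirms */
	/* ...encode and send the route install to zebra... */
}

static void mock_zebra_withdraw(struct mock_dest *dest)
{
	/* A withdrawal never needs to wait for RIB removal before being announced. */
	dest->fib_install_pending = false;
	/* ...encode and send the route delete to zebra... */
}
```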
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
There are some specific edge-cases where there is a need to run FRR and another
FRR and/or another BGP implementation on the same box. Relaxing 127.0.0.0/8 for
this case might be reasonable.
An example below shows peering via 127.0.0.0/8 between FRR and GoBGP:
```
% ss -ntlp | grep 179
LISTEN 0 4096 127.0.0.1:179 0.0.0.0:*
LISTEN 0 128 127.0.0.2:179 0.0.0.0:*
% grep 127.0.0.2 /etc/frr/daemons
bgpd_options=" -A 127.0.0.1 -l 127.0.0.2"
% grep local /etc/gobgp/config.toml
local-address-list = ["127.0.0.1"]
donatas-pc# sh ip bgp summary
IPv4 Unicast Summary (VRF default):
BGP router identifier 192.168.10.17, local AS number 65001 vrf-id 0
BGP table version 0
RIB entries 0, using 0 bytes of memory
Peers 1, using 725 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
127.0.0.1 4 65002 7 7 0 0 0 00:02:02 0 0 N/A
Total number of neighbors 1
donatas-pc#
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
a) Make it legible what type of message is being passed
back and forth instead of having to guess it from
the insufficient debugs
b) Make it explicit which bgp instance is sending this
data
c) Cleanup bgp_zebra_update to have a cleaner api
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Static Analysis caught a bug where we could be reading
garbage values for labels/num_labels. Fix that by
ensuring it's set to NULL/0 per loop of the mpath.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Add some bgp_path_info helper functions for getting the correct l3vni
label, getting the vni from the label stack, and determining if
the mpath is D-VNI based.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Add functionality to always send the L3VNI to zebra as a label
on the route. It will be zebra's job to determine how to use it (i.e.
via Single Vxlan Device or not).
The L3VNI, according to the RFC, should always be the second label for a
type-2 route and the only one available for a type-5 route. Hence, we can
just grab the last label in the stack here and add it onto the route.
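A small, self-contained sketch of "grab the last label in the stack"; the
typedef and function name are illustrative, not the actual bgpd helpers.
```c
/* Hedged sketch: extract the L3VNI label as the last label of the stack. */
#include <stddef.h>
#include <stdint.h>

typedef uint32_t mock_mpls_label_t;

static int l3vni_label_from_stack(const mock_mpls_label_t *labels,
				  size_t num_labels,
				  mock_mpls_label_t *l3_label)
{
	if (num_labels == 0)
		return -1; /* nothing to extract */

	/* type-2: second (and last) label; type-5: the only label */
	*l3_label = labels[num_labels - 1];
	return 0;
}
```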
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Take one debug log into consideration: EVPN MAC-IP routes with an
L3 NHG id have no nexthops. It should be "add", not "delete".
Before:
```
Tx route delete VRF 21 192.168.30.253/32 metric 0 tag 0 count 0 nhg 72580649
```
After:
```
Tx route add VRF 21 192.168.30.253/32 metric 0 tag 0 count 0 nhg 72580649
```
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Currently bgpd can't announce SRv6 locally-generated routes to Zebra
correctly, because the MPLS label of locally-generated routes is not valid
but sid_info->transposition_len is set to a non-zero value. This commit
fixes this kind of issue.
Signed-off-by: Ryoga Saito <ryoga.saito@linecorp.com>
When the last IPv4 address of an interface is deleted, Linux removes all
routes, including BGP ones, using this interface without any Netlink
advertisement. bgpd keeps them in the RIB as valid (e.g. installed in the FIB).
The previous patch invalidates the associated nexthop groups in zebra
but bgpd is not notified of the event.
> 2022/05/09 17:37:52.925 ZEBRA: [TQKA8-0276P] Not Notifying Owner: connected about prefix 29.0.0.0/24(40) 3 vrf: 7
Look for the bgp_path_info that are unsynchronized with the kernel and
flag them for refresh in their attributes. A VPN route leaking update is
called, and the refresh flag triggers a route refresh to zebra and then a
kernel FIB installation.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Previous commits have introduced a new 8 bits nh_flag in the attr
struct that has increased the memory footprint.
Move the mp_nexthop_prefer_global boolean in the attr structure that
takes 8 bits to the new nh_flag in order to go back to the previous
memory utilization.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
At bgpd startup, VRF instances are sent from zebra before the
interfaces. When importing a l3vpn prefix from another local VRF
instance, the interfaces are not known yet. The prefix nexthop interface
cannot be set to the loopback or the VRF interface, which causes invalid
routes to be set in zebra.
Update route leaking when the loopback or a VRF interface is received
from zebra.
At a VRF interface deletion, zebra voluntarily sends a
ZEBRA_INTERFACE_ADD message to move it to VRF_DEFAULT. Do not update if
such a message is received. VRF destruction will destroy all the related
routes without additional code.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Use %pI4/%pI6 where possible, otherwise at least adjust stack buffer sizes
for inet_ntop() calls.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>