`bmnc->nh` was not properly freed, leading to a memory leak.
The commit adds a check to ensure that the `bmnc->nh` member variable is freed if it exists.
The ASan leak log for reference:
```
***********************************************************************************
Address Sanitizer Error detected in bgp_vpnv4_asbr.test_bgp_vpnv4_asbr/r2.asan.bgpd.6382
=================================================================
==6382==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 720 byte(s) in 5 object(s) allocated from:
#0 0x7f6a80d02d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x55c9afd7c81c in qcalloc lib/memory.c:105
#2 0x55c9afd9166b in nexthop_new lib/nexthop.c:358
#3 0x55c9afd93aaa in nexthop_dup lib/nexthop.c:843
#4 0x55c9afad39bb in bgp_mplsvpn_nh_label_bind_register_local_label bgpd/bgp_mplsvpn.c:4259
#5 0x55c9afb1c5e9 in bgp_mplsvpn_handle_label_allocation bgpd/bgp_route.c:3239
#6 0x55c9afb1c5e9 in bgp_process_main_one bgpd/bgp_route.c:3339
#7 0x55c9afb1d2c1 in bgp_process_wq bgpd/bgp_route.c:3591
#8 0x55c9afe33df9 in work_queue_run lib/workqueue.c:266
#9 0x55c9afe198e2 in event_call lib/event.c:1995
#10 0x55c9afd5fc6f in frr_run lib/libfrr.c:1213
#11 0x55c9af9f6f00 in main bgpd/bgp_main.c:505
#12 0x7f6a7f55ec86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
Indirect leak of 16 byte(s) in 2 object(s) allocated from:
#0 0x7f6a80d02d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x55c9afd7c81c in qcalloc lib/memory.c:105
#2 0x55c9afd91ce8 in nexthop_add_labels lib/nexthop.c:536
#3 0x55c9afd93754 in nexthop_copy_no_recurse lib/nexthop.c:802
#4 0x55c9afd939fb in nexthop_copy lib/nexthop.c:821
#5 0x55c9afd93abb in nexthop_dup lib/nexthop.c:845
#6 0x55c9afad39bb in bgp_mplsvpn_nh_label_bind_register_local_label bgpd/bgp_mplsvpn.c:4259
#7 0x55c9afb1c5e9 in bgp_mplsvpn_handle_label_allocation bgpd/bgp_route.c:3239
#8 0x55c9afb1c5e9 in bgp_process_main_one bgpd/bgp_route.c:3339
#9 0x55c9afb1d2c1 in bgp_process_wq bgpd/bgp_route.c:3591
#10 0x55c9afe33df9 in work_queue_run lib/workqueue.c:266
#11 0x55c9afe198e2 in event_call lib/event.c:1995
#12 0x55c9afd5fc6f in frr_run lib/libfrr.c:1213
#13 0x55c9af9f6f00 in main bgpd/bgp_main.c:505
#14 0x7f6a7f55ec86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
SUMMARY: AddressSanitizer: 736 byte(s) leaked in 7 allocation(s).
***********************************************************************************
```
Signed-off-by: Keelan Cannoo <keelan.cannoo@icloud.com>
The bgp vpn policy had some attribute not free when the function bgp_free was called leading to memory leak as shown below.
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251:Direct leak of 592 byte(s) in 2 object(s) allocated from:
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #0 0x7f4b7ae92037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #1 0x7f4b7aa96e38 in qcalloc lib/memory.c:105
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #2 0x7f4b7aa9bec9 in srv6_locator_chunk_alloc lib/srv6.c:135
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #3 0x56396f8e56f8 in ensure_vrf_tovpn_sid_per_af bgpd/bgp_mplsvpn.c:752
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #4 0x56396f8e608a in ensure_vrf_tovpn_sid bgpd/bgp_mplsvpn.c:846
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #5 0x56396f8e075d in vpn_leak_postchange bgpd/bgp_mplsvpn.h:259
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #6 0x56396f8f3e5b in vpn_leak_postchange_all bgpd/bgp_mplsvpn.c:3397
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #7 0x56396fa920ef in bgp_zebra_process_srv6_locator_chunk bgpd/bgp_zebra.c:3238
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #8 0x7f4b7abb2913 in zclient_read lib/zclient.c:4134
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #9 0x7f4b7ab62010 in thread_call lib/thread.c:1991
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #10 0x7f4b7aa5a418 in frr_run lib/libfrr.c:1185
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #11 0x56396f7d756d in main bgpd/bgp_main.c:505
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #12 0x7f4b7a479d09 in __libc_start_main ../csu/libc-start.c:308
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251-
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251:Direct leak of 32 byte(s) in 2 object(s) allocated from:
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #0 0x7f4b7ae92037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #1 0x7f4b7aa96e38 in qcalloc lib/memory.c:105
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #2 0x56396f8e31b8 in vpn_leak_zebra_vrf_sid_update_per_af bgpd/bgp_mplsvpn.c:386
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #3 0x56396f8e3ae8 in vpn_leak_zebra_vrf_sid_update bgpd/bgp_mplsvpn.c:448
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #4 0x56396f8e09b0 in vpn_leak_postchange bgpd/bgp_mplsvpn.h:271
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #5 0x56396f8f3e5b in vpn_leak_postchange_all bgpd/bgp_mplsvpn.c:3397
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #6 0x56396fa920ef in bgp_zebra_process_srv6_locator_chunk bgpd/bgp_zebra.c:3238
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #7 0x7f4b7abb2913 in zclient_read lib/zclient.c:4134
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #8 0x7f4b7ab62010 in thread_call lib/thread.c:1991
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #9 0x7f4b7aa5a418 in frr_run lib/libfrr.c:1185
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #10 0x56396f7d756d in main bgpd/bgp_main.c:505
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #11 0x7f4b7a479d09 in __libc_start_main ../csu/libc-start.c:308
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251-
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251:Direct leak of 32 byte(s) in 2 object(s) allocated from:
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #0 0x7f4b7ae92037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #1 0x7f4b7aa96e38 in qcalloc lib/memory.c:105
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #2 0x56396f8e5730 in ensure_vrf_tovpn_sid_per_af bgpd/bgp_mplsvpn.c:753
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #3 0x56396f8e608a in ensure_vrf_tovpn_sid bgpd/bgp_mplsvpn.c:846
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #4 0x56396f8e075d in vpn_leak_postchange bgpd/bgp_mplsvpn.h:259
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #5 0x56396f8f3e5b in vpn_leak_postchange_all bgpd/bgp_mplsvpn.c:3397
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #6 0x56396fa920ef in bgp_zebra_process_srv6_locator_chunk bgpd/bgp_zebra.c:3238
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #7 0x7f4b7abb2913 in zclient_read lib/zclient.c:4134
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #8 0x7f4b7ab62010 in thread_call lib/thread.c:1991
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #9 0x7f4b7aa5a418 in frr_run lib/libfrr.c:1185
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #10 0x56396f7d756d in main bgpd/bgp_main.c:505
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251- #11 0x7f4b7a479d09 in __libc_start_main ../csu/libc-start.c:308
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251-
./bgp_srv6l3vpn_to_bgp_vrf.test_bgp_srv6l3vpn_to_bgp_vrf/r2.bgpd.asan.603251-SUMMARY: AddressSanitizer: 656 byte(s) leaked in 6 allocation(s).
Signed-off-by: ryndia <dindyalsarvesh@gmail.com>
In case the full label stack is used, there may be
a table overrun happening. Avoid it by increasing the
size of the table.
Fixes: 27f4deed0a ("bgpd: update the mpls entry to handle return traffic")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The bmnc pointer is never null. Do not keep the test
on the pointer.
Fixes: 1069425868 ("bgpd: allocate label bound to received mpls vpn routes")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
After some VRF imports are removed and "no bgp retain route-target all"
is set, prefixes that are not imported anymore remain in the BGP table.
Parse the BGP table and remove un-imported prefixes in such a case.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
By default, bgpd stores all MPLS VPN SAFI prefixes unless the "no bgp
retain route-target all" option is used to store only prefixes that are
imported into local VRFs. The "no retain" option temporarily uses too
much memory, as all prefixes are stored in memory before the deletion of
non-imported prefixes is done.
Filter out non-imported prefixes before they are set into the BGP adj
RIB out.
Fixes: a486300b26 ("bgpd: implement retain route-target all behaviour")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Partially revert a486300b26 ("bgpd: implement retain route-target all
behaviour") in order to fix a memory consumption issue in the next
commit.
Fixes: a486300b26 ("bgpd: implement retain route-target all behaviour")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
When advertising an mpls vpn entry with a new label,
the return traffic is redirected to the local machine,
but the MPLS traffic is dropped.
Add an MPLS entry to handle MPLS packets which have
the new label value. Traffic is swapped to the original
label value from the mpls vpn next-hop entry; then it is
sent to the resolved next-hop of the original next-hop
from the mpls vpn next-hop entry.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The advertised label value from mpls vpn routes is not modified
when the advertised next-hop is modified to next-hop-self.
Actually, the original label value received is redistributed as
is, whereas the new_label value bound in the nexthop label
bind entry should be used.
Only the VPN entries that contain MPLS information, and that
are redistributed between distinct peers, will have a label
value to advertise.
- no SRv6 attribute
- no local prefix
- no exported VPN prefixes from a VRF
If the advertisement to a given peer has the next-hop modified,
then the new label value will be picked up. The considered cases
are peers configured with 'next-hop-self' option, or ebgp peerings
without the 'next-hop-unchanged' option.
Note that the the NLRI format will follow the rfc3107 format, as
multiple label values for MPLS VPN NLRIs are not supported (the
rfc8277 is not supported).
Note also that the case where an outgoing route-map is applied to
the outgoing neighbor is not considered in this commit.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Current implementation does not offer a new label to bind
to a received VPN route entry to redistribute with that new
label.
This commit allocates a label for VPN entries that have
a valid label, and a reachable next-hop interface that is
configured as follows:
> interface eth0
> mpls bgp l3vpn-multi-domain-switching
> exit
An mplsvpn next-hop label binding entry is created in an mpls
vpn nexthop label bind hash table of the current BGP instance.
That mpls vpn next-hop label entry is indexed by the (next-hop,
orig_label) values provided by the incoming updates, and shared
with other updates having the same (next-hop, orig_label) values.
A new 'LP_TYPE_BGP_L3VPN_BIND' label value is picked up from the
zebra mpls label pool, and assigned to the new_label attribute.
The 'bgp_path_info' appends a 'bgp_mplsvpn_nh_label_bind' structure
to the 'mplsvpn' union structure. Both structures in the union are not
used at the same, as the paths are either VRF updates to export, or MPLS
VPN updates. Using an union gives a 24 bytes memory gain compared to if
the structures had not been in an union (24 bytes compared to 48 bytes).
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The label per nexthop attributes take 24 bytes per bgp path
entry on AMD64 platform, and are only used for unicast paths.
The current patch-set introduces a similar attributes, but that
will be used only for l3vpn paths. To gain some memory on the
bgp_path_info structure in the next commit, do some changes.
Create an 'mplsvpn' union structure that will either include the
label per nexthop structs for ipv4 paths, or the l3vpn paths
structures. The 'label_nexthop_cache' and the 'label_nh_thread'
attributes of the 'bgp_path_info' structure are moved into an
union under a new structure called 'bgp_mplsvpn_label_nh_blnc'.
The flags attribute of 'bgp_path_info' is increased from 16 bits
to 32 bits, and the BGP_PATH_MPLSVPN_LABEL_NH flag is added to
know the 'mplsvpn' usage.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
In the context of the ASBR facing an EBGP neighbor, or
facing an IBGP neighbor where the BGP updates received
are re-advertised with a modified next-hop, a new local
label will be re-advertised too, to replace the original
one.
Create a binding table, in the form of a hash list, from the
original labels to the new labels. Since labels can be the
same on several routers, set the next-hop and the label as
the keys. Add the needed API functions to manage the hash
list.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The label allocation per next-hop functionality is calling
the 'bgp_find_or_add_nexthop()' method using the SAFI_MPLS_VPN
safi parameter, whereas the call is supposed to apply to
unicast paths.
Fix this by using the SAFI_UNICAST safi parameter in the call.
Simplify the vpn_leak_from_vrf_get_per_nexthop_label() API by
removing the safi parameter from the function.
Fixes: 577be36a41 ("bgpd: add support for l3vpn per-nexthop label")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Create a bgp_labels_same() function that does the
same operations as the static function labels_same from
bgp_mplsvpn.c.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
It is impossible for the blnc statement to ever be NULL at
line 1470 as that the if statement at 1453 guarantees it
to be set to something.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
A timer attribute is added for each label nexthop entry, in order
to know when the last change occured.
The timer value will be used for troubleshooting by a show
command in the next commit.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The label allocation per nexthop mode requires to use a nexthop
tracking context. For redistributed routes, a nexthop tracking
context is created, and the resolution helps to know the real
nexthop ip address used. The below configuration example has
been used:
> vrf vrf1
> ip route 172.31.0.14/32 192.0.2.14
> ip route 172.31.0.15/32 192.0.2.12
> ip route 172.31.0.30/32 192.0.2.30
> exit
> router bgp 65500 vrf vrf1
> address-family ipv4 unicast
> redistribute static
> label vpn export per-nexthop
> [..]
The static routes are correctly imported in the BGP IPv4 RIB.
Contrary to label allocation per vrf mode, some nexthop tracking
are created/or reused:
> # show bgp vrf vrf1 nexthop
> 192.0.2.12 valid [IGP metric 0], #paths 3, peer 192.0.2.12
> if r1-eth1
> Last update: Fri Jan 13 15:49:42 2023
> 192.0.2.14 valid [IGP metric 0], #paths 1
> if r1-eth1
> Last update: Fri Jan 13 15:49:42 2023
> 192.0.2.30 valid [IGP metric 0], #paths 1
> if r1-eth1
> Last update: Fri Jan 13 15:49:51 2023
> [..]
This results in having a BGP VPN route for each of the static
routes:
> # show bgp ipv4 vpn
> [..]
> Route Distinguisher: 444:1
> *> 172.31.0.14/32 192.0.2.14@9< 0 32768 ?
> *> 172.31.0.15/32 192.0.2.12@9< 0 32768 ?
> *> 172.31.0.30/32 192.0.2.30@9< 0 32768 ?
> [..]
Without that patch, only the redistributed routes that rely on a
pre-existing nexthop tracking context could be exported.
Also, a command in the code about redistributed routes is modified
accordingly, to explain that redistribute routes may be submitted
to nexthop tracking in the case label allocation per next-hop is
used.
note:
VNC routes have been removed from the redistribution,
because of a test failure in the bgp_l3vpn_to_bgp_direct test.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
BGP MPLSVPN next hop label allocation was using only the next-hop
IP address. As MPLSVPN contexts rely on bnc contexts, the real
nexthop interface is known, and the LSP entry to enter can apply
to the specific interface. To illustrate, the BGP service is able
to handle the following two iproute2 commands:
> ip -f mpls route add 105 via inet 192.0.2.45 dev r1-eth1
> ip -f mpls route add 105 via inet 192.0.2.46 dev r1-eth2
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This commit introduces a new method to associate a label to
prefixes to export to a VPNv4 backbone. All the methods to
associate a label to a BGP update is documented in rfc4364,
chapter 4.3.2. Initially, the "single label for an entire
VRF" method was available. This commit adds "single label
for each attachment circuit" method.
The change impacts the control-plane, because each BGP update
is checked to know if the nexthop has reachability in the VRF
or not. If this is the case, then a unique label for a given
destination IP in the VRF will be picked up. This label will
be reused for an other BGP update that will have the same
nexthop IP address.
The change impacts the data-plane, because the MPLs pop
mechanism applied to incoming labelled packets changes: the
MPLS label is popped, and the packet is directly sent to the
connected nexthop described in the previous outgoing BGP VPN
update.
By default per-vrf mode is done, but the user may choose
the per-nexthop mode, by using the vty command from the
previous commit. In the latter case, a per-vrf label
will however be allocated to handle networks that are not directly
connected. This is the case for local traffic for instance.
The change also include the following:
- ECMP case
In case a route is learnt in a given VRF, and is resolved via an
ECMP nexthop. This implies that when exporting the route as a BGP
update, if label allocation per nexthop is used, then two possible
MPLS values could be picked up, which is not possible with the
current implementation. Actually, the NLRI for VPNv4 stores one
prefix, and one single label value, not two. Today, RFC8277 with
multiple label capability is not yet available.
To avoid this corner case, when a route is resolved via more than one
nexthop, the label allocation per nexthop will not apply, and the
default per-vrf label will be chosen.
Let us imagine BGP redistributes a static route using the `172.31.0.20`
nexthop. The nexthop resolution will find two different nexthops fo a
unique BGP update.
> r1# show running-config
> [..]
> vrf vrf1
> ip route 172.31.0.30/32 172.31.0.20
> r1# show bgp vrf vrf1 nexthop
> [..]
> 172.31.0.20 valid [IGP metric 0], #paths 1
> gate 192.0.2.11
> gate 192.0.2.12
> Last update: Mon Jan 16 09:27:09 2023
> Paths:
> 1/1 172.31.0.30/32 VRF vrf1 flags 0x20018
To avoid this situation, BGP updates that resolve over multiple
nexthops are using the unique per-vrf label.
- recursive route case
Prefixes that need a recursive route to be resolved can
also be eligible for mpls allocation per nexthop. In that
case, the nexthop will be the recursive nexthop calculated.
To achieve this, all nexthop types in bnc contexts are valid,
except for the blackhole nexthops.
- network declared prefixes
Nexthop tracking is used to look for the reachability of the
prefixes. When the the 'no bgp network import-check' command
is used, network declared prefixes are maintained active,
even if there is no active nexthop.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
A timer attribute is added for each label nexthop entry, in order
to know when the last change occured.
The timer value will be used for troubleshooting by a show
command in the next commit.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The label allocation per nexthop mode requires to use a nexthop
tracking context. For redistributed routes, a nexthop tracking
context is created, and the resolution helps to know the real
nexthop ip address used. The below configuration example has
been used:
> vrf vrf1
> ip route 172.31.0.14/32 192.0.2.14
> ip route 172.31.0.15/32 192.0.2.12
> ip route 172.31.0.30/32 192.0.2.30
> exit
> router bgp 65500 vrf vrf1
> address-family ipv4 unicast
> redistribute static
> label vpn export per-nexthop
> [..]
The static routes are correctly imported in the BGP IPv4 RIB.
Contrary to label allocation per vrf mode, some nexthop tracking
are created/or reused:
> # show bgp vrf vrf1 nexthop
> 192.0.2.12 valid [IGP metric 0], #paths 3, peer 192.0.2.12
> if r1-eth1
> Last update: Fri Jan 13 15:49:42 2023
> 192.0.2.14 valid [IGP metric 0], #paths 1
> if r1-eth1
> Last update: Fri Jan 13 15:49:42 2023
> 192.0.2.30 valid [IGP metric 0], #paths 1
> if r1-eth1
> Last update: Fri Jan 13 15:49:51 2023
> [..]
This results in having a BGP VPN route for each of the static
routes:
> # show bgp ipv4 vpn
> [..]
> Route Distinguisher: 444:1
> *> 172.31.0.14/32 192.0.2.14@9< 0 32768 ?
> *> 172.31.0.15/32 192.0.2.12@9< 0 32768 ?
> *> 172.31.0.30/32 192.0.2.30@9< 0 32768 ?
> [..]
Without that patch, only the redistributed routes that rely on a
pre-existing nexthop tracking context could be exported.
Also, a command in the code about redistributed routes is modified
accordingly, to explain that redistribute routes may be submitted
to nexthop tracking in the case label allocation per next-hop is
used.
note:
VNC routes have been removed from the redistribution,
because of a test failure in the bgp_l3vpn_to_bgp_direct test.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
BGP MPLSVPN next hop label allocation was using only the next-hop
IP address. As MPLSVPN contexts rely on bnc contexts, the real
nexthop interface is known, and the LSP entry to enter can apply
to the specific interface. To illustrate, the BGP service is able
to handle the following two iproute2 commands:
> ip -f mpls route add 105 via inet 192.0.2.45 dev r1-eth1
> ip -f mpls route add 105 via inet 192.0.2.46 dev r1-eth2
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This commit introduces a new method to associate a label to
prefixes to export to a VPNv4 backbone. All the methods to
associate a label to a BGP update is documented in rfc4364,
chapter 4.3.2. Initially, the "single label for an entire
VRF" method was available. This commit adds "single label
for each attachment circuit" method.
The change impacts the control-plane, because each BGP update
is checked to know if the nexthop has reachability in the VRF
or not. If this is the case, then a unique label for a given
destination IP in the VRF will be picked up. This label will
be reused for an other BGP update that will have the same
nexthop IP address.
The change impacts the data-plane, because the MPLs pop
mechanism applied to incoming labelled packets changes: the
MPLS label is popped, and the packet is directly sent to the
connected nexthop described in the previous outgoing BGP VPN
update.
By default per-vrf mode is done, but the user may choose
the per-nexthop mode, by using the vty command from the
previous commit. In the latter case, a per-vrf label
will however be allocated to handle networks that are not directly
connected. This is the case for local traffic for instance.
The change also include the following:
- ECMP case
In case a route is learnt in a given VRF, and is resolved via an
ECMP nexthop. This implies that when exporting the route as a BGP
update, if label allocation per nexthop is used, then two possible
MPLS values could be picked up, which is not possible with the
current implementation. Actually, the NLRI for VPNv4 stores one
prefix, and one single label value, not two. Today, RFC8277 with
multiple label capability is not yet available.
To avoid this corner case, when a route is resolved via more than one
nexthop, the label allocation per nexthop will not apply, and the
default per-vrf label will be chosen.
Let us imagine BGP redistributes a static route using the `172.31.0.20`
nexthop. The nexthop resolution will find two different nexthops fo a
unique BGP update.
> r1# show running-config
> [..]
> vrf vrf1
> ip route 172.31.0.30/32 172.31.0.20
> r1# show bgp vrf vrf1 nexthop
> [..]
> 172.31.0.20 valid [IGP metric 0], #paths 1
> gate 192.0.2.11
> gate 192.0.2.12
> Last update: Mon Jan 16 09:27:09 2023
> Paths:
> 1/1 172.31.0.30/32 VRF vrf1 flags 0x20018
To avoid this situation, BGP updates that resolve over multiple
nexthops are using the unique per-vrf label.
- recursive route case
Prefixes that need a recursive route to be resolved can
also be eligible for mpls allocation per nexthop. In that
case, the nexthop will be the recursive nexthop calculated.
To achieve this, all nexthop types in bnc contexts are valid,
except for the blackhole nexthops.
- network declared prefixes
Nexthop tracking is used to look for the reachability of the
prefixes. When the the 'no bgp network import-check' command
is used, network declared prefixes are maintained active,
even if there is no active nexthop.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The function vpn_leak_to_vrf_update_onevrf() has the RD parameter
set to NULL. Test the RD value before displaying it in the called
function.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
RD may be built based on an AS number. Like for the AS, the RD
may use the AS notation. The two below examples can illustrate:
RD 1.1:20 stands for an AS4B:NN RD with AS4B=65536 in dot format.
RD 0.1:20 stands for an AS2B:NNNN RD with AS2B=0.1 in dot+ format.
This commit adds the asnotation mode to prefix_rd2str() API so as
to pick up the relevant display.
Two new printfrr extensions are available to display the RD with
the two above display methods.
- The pRDD extension stands for dot asnotation format
- The pRDE extension stands for dot+ asnotation format.
- The pRD extension has been renamed to pRDP extension
The code is changed each time '%pRD' printf extension is called.
Possibly, the asnotation may change the output, then a macro defines
the asnotation mode to use. A side effect of forging the mode to
use is that the string could not be concatenated with other strings
in vty_out and snprintfrr. Those functions have been called multiple
times. When zlog_debug needs to display the RD with some other string,
the prefix_rd2str() old API is used instead of the printf extension.
Some code has been kept untouched:
- code related to running-config. Actually, wherever an RD is displayed,
its configured name should be dumped.
- bgp rfapi code
- bgp evpn multihoming code (partially done), since the logic is
missing to get the asnotation of 'struct bgp_evpn_es'.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
RFC7611 introduces new extended community ACCEPT_OWN and is already
implemented for FRR in the previous PR. However, this PR broke
compatibility about importing VPN routes.
Let's consider the following situation. There are 2 routers and these
routers connects with iBGP session. These routers have two VRF, vrf10
and vrf20, and RD 0:10, 0:20 is configured as the route distinguisher
of vrf10 and vrf20 respectively.
+- R1 --------+ +- R2 --------+
| +---------+ | | +---------+ |
| | VRF10 | | | | VRF10 | |
| | RD 0:10 +--------+ RD 0:10 | |
| +---------+ | | +---------+ |
| +---------+ | | +---------+ |
| | VRF20 +--------+ VRF20 | |
| | RD 0:20 | | | | RD 0:20 | |
| +---------+ | | +---------+ |
+-------------+ +-------------+
In this situation, the VPN routes from R1's VRF10 should be imported to
R2's VRF10 and the VPN routes from R2's VRF10 should be imported to R2's
VRF20. However, the current implementation of ACCEPT_OWN will always
reject routes if the RD of VPN routes are matched with the RD of VRF.
Similar issues will happen in local VRF2VRF route leaks. In such cases,
the route reaked from VRF10 should be imported to VRF20. However, the
current implementation of ACCEPT_OWN will not permit them.
+- R1 ---------------------+
| +------------+ |
| +----v----+ +----v----+ |
| | VRF10 | | VRF20 | |
| | RD 0:10 | | RD 0:10 | |
| +---------+ +---------+ |
+--------------------------+
So, this commit add additional condition in RD match. If the route
doesn't have ACCEPT_OWN extended community, source VRF check will be
skipped.
[RFC7611]: https://datatracker.ietf.org/doc/html/rfc7611
Signed-off-by: Ryoga Saito <ryoga.saito@linecorp.com>
Before this patch we allowed importing routes between VRFs in the same node,
only for external routes, but not for local (e.g.: redistribute).
Relax here a bit, and allow importing local routes between VRFs when the RT
list is modified using route reflectors.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
```
unet> sh pe2 vtysh -c 'sh ip bgp ipv4 vpn detail-routes'
BGP table version is 4, local router ID is 10.10.10.20, vrf id 0
Default local pref 100, local AS 65001
Route Distinguisher: 192.168.2.2:2
BGP routing table entry for 192.168.2.2:2:10.0.0.0/24, version 1
not allocated
Paths: (1 available, best #1)
Not advertised to any peer
65000
192.168.2.1 from 0.0.0.0 (10.10.10.20) vrf RED(4) announce-nh-self
Origin incomplete, metric 0, localpref 50, valid, sourced, local, best (First path received)
Extended Community: RT:192.168.2.2:2
Originator: 10.10.10.20
Remote label: 2222
Last update: Tue Dec 20 13:01:20 2022
BGP routing table entry for 192.168.2.2:2:172.16.255.1/32, version 2
not allocated
Paths: (1 available, best #1)
Not advertised to any peer
65000
192.168.2.1 from 0.0.0.0 (10.10.10.20) vrf RED(4) announce-nh-self
Origin incomplete, localpref 50, valid, sourced, local, best (First path received)
Extended Community: RT:192.168.2.2:2
Originator: 10.10.10.20
Remote label: 2222
Last update: Tue Dec 20 13:01:20 2022
BGP routing table entry for 192.168.2.2:2:192.168.1.0/24, version 3
not allocated
Paths: (1 available, best #1)
Not advertised to any peer
65000
192.168.2.1 from 0.0.0.0 (10.10.10.20) vrf RED(4) announce-nh-self
Origin incomplete, localpref 50, valid, sourced, local, best (First path received)
Extended Community: RT:192.168.2.2:2
Originator: 10.10.10.20
Remote label: 2222
Last update: Tue Dec 20 13:01:20 2022
BGP routing table entry for 192.168.2.2:2:192.168.2.0/24, version 4
not allocated
Paths: (1 available, best #1)
Not advertised to any peer
65000
192.168.2.1 from 0.0.0.0 (10.10.10.20) vrf RED(4) announce-nh-self
Origin incomplete, metric 0, localpref 50, valid, sourced, local, best (First path received)
Extended Community: RT:192.168.2.2:2
Originator: 10.10.10.20
Remote label: 2222
Last update: Tue Dec 20 13:01:20 2022
Displayed 4 routes and 4 total paths
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
When the last IPv4 address of an interface is deleted, Linux removes all
routes includes BGP ones using this interface without any Netlink
advertisement. bgpd keeps them in RIB as valid (e.g. installed in FIB).
The previous patch invalidates the associated nexthop groups in zebra
but bgpd is not notified of the event.
> 2022/05/09 17:37:52.925 ZEBRA: [TQKA8-0276P] Not Notifying Owner: connected about prefix 29.0.0.0/24(40) 3 vrf: 7
Look for the bgp_path_info that are unsynchronized with the kernel and
flag them for refresh in their attributes. A VPN route leaking update is
calles and the refresh flag triggers a route refresh to zebra and then a
kernel FIB installation.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
At bgpd startup, VRF instances are sent from zebra before the
interfaces. When importing a l3vpn prefix from another local VRF
instance, the interfaces are not known yet. The prefix nexthop interface
cannot be set to the loopback or the VRF interface, which causes setting
invalid routes in zebra.
Update route leaking when the loopback or a VRF interface is received
from zebra.
At a VRF interface deletion, zebra voluntarily sends a
ZEBRA_INTERFACE_ADD message to move it to VRF_DEFAULT. Do not update if
such a message is received. VRF destruction will destroy all the related
routes without adding codes.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>