The advertised label value from mpls vpn routes is not modified
when the advertised next-hop is modified to next-hop-self.
Actually, the original label value received is redistributed as
is, whereas the new_label value bound in the nexthop label
bind entry should be used.
Only the VPN entries that contain MPLS information, and that
are redistributed between distinct peers, will have a label
value to advertise.
- no SRv6 attribute
- no local prefix
- no exported VPN prefixes from a VRF
If the advertisement to a given peer has the next-hop modified,
then the new label value will be picked up. The considered cases
are peers configured with 'next-hop-self' option, or ebgp peerings
without the 'next-hop-unchanged' option.
Note that the the NLRI format will follow the rfc3107 format, as
multiple label values for MPLS VPN NLRIs are not supported (the
rfc8277 is not supported).
Note also that the case where an outgoing route-map is applied to
the outgoing neighbor is not considered in this commit.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Current implementation does not offer a new label to bind
to a received VPN route entry to redistribute with that new
label.
This commit allocates a label for VPN entries that have
a valid label, and a reachable next-hop interface that is
configured as follows:
> interface eth0
> mpls bgp l3vpn-multi-domain-switching
> exit
An mplsvpn next-hop label binding entry is created in an mpls
vpn nexthop label bind hash table of the current BGP instance.
That mpls vpn next-hop label entry is indexed by the (next-hop,
orig_label) values provided by the incoming updates, and shared
with other updates having the same (next-hop, orig_label) values.
A new 'LP_TYPE_BGP_L3VPN_BIND' label value is picked up from the
zebra mpls label pool, and assigned to the new_label attribute.
The 'bgp_path_info' appends a 'bgp_mplsvpn_nh_label_bind' structure
to the 'mplsvpn' union structure. Both structures in the union are not
used at the same, as the paths are either VRF updates to export, or MPLS
VPN updates. Using an union gives a 24 bytes memory gain compared to if
the structures had not been in an union (24 bytes compared to 48 bytes).
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The label per nexthop attributes take 24 bytes per bgp path
entry on AMD64 platform, and are only used for unicast paths.
The current patch-set introduces a similar attributes, but that
will be used only for l3vpn paths. To gain some memory on the
bgp_path_info structure in the next commit, do some changes.
Create an 'mplsvpn' union structure that will either include the
label per nexthop structs for ipv4 paths, or the l3vpn paths
structures. The 'label_nexthop_cache' and the 'label_nh_thread'
attributes of the 'bgp_path_info' structure are moved into an
union under a new structure called 'bgp_mplsvpn_label_nh_blnc'.
The flags attribute of 'bgp_path_info' is increased from 16 bits
to 32 bits, and the BGP_PATH_MPLSVPN_LABEL_NH flag is added to
know the 'mplsvpn' usage.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
BGP cannot decide to disseminate the safi based upon the
bgp suppress-fib command. Modify the code to look at
the safi for the decision to communicate to a peer the
particular node.
Ticket: #3402926
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When using `addpath-tx-all` BGP announces all known paths instead of announcing
only an arbitrary number of best paths.
With this new command we can send N best paths to the neighbor. That means, we
send the best path, then send the second best path excluding the previous one,
and so on. In other words, we run best path selection algorithm N times before
we finish.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Reuse subgroup_process_announce_selected(). It does the same as we do here
duplicating the logic.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
This commit introduces a new method to associate a label to
prefixes to export to a VPNv4 backbone. All the methods to
associate a label to a BGP update is documented in rfc4364,
chapter 4.3.2. Initially, the "single label for an entire
VRF" method was available. This commit adds "single label
for each attachment circuit" method.
The change impacts the control-plane, because each BGP update
is checked to know if the nexthop has reachability in the VRF
or not. If this is the case, then a unique label for a given
destination IP in the VRF will be picked up. This label will
be reused for an other BGP update that will have the same
nexthop IP address.
The change impacts the data-plane, because the MPLs pop
mechanism applied to incoming labelled packets changes: the
MPLS label is popped, and the packet is directly sent to the
connected nexthop described in the previous outgoing BGP VPN
update.
By default per-vrf mode is done, but the user may choose
the per-nexthop mode, by using the vty command from the
previous commit. In the latter case, a per-vrf label
will however be allocated to handle networks that are not directly
connected. This is the case for local traffic for instance.
The change also include the following:
- ECMP case
In case a route is learnt in a given VRF, and is resolved via an
ECMP nexthop. This implies that when exporting the route as a BGP
update, if label allocation per nexthop is used, then two possible
MPLS values could be picked up, which is not possible with the
current implementation. Actually, the NLRI for VPNv4 stores one
prefix, and one single label value, not two. Today, RFC8277 with
multiple label capability is not yet available.
To avoid this corner case, when a route is resolved via more than one
nexthop, the label allocation per nexthop will not apply, and the
default per-vrf label will be chosen.
Let us imagine BGP redistributes a static route using the `172.31.0.20`
nexthop. The nexthop resolution will find two different nexthops fo a
unique BGP update.
> r1# show running-config
> [..]
> vrf vrf1
> ip route 172.31.0.30/32 172.31.0.20
> r1# show bgp vrf vrf1 nexthop
> [..]
> 172.31.0.20 valid [IGP metric 0], #paths 1
> gate 192.0.2.11
> gate 192.0.2.12
> Last update: Mon Jan 16 09:27:09 2023
> Paths:
> 1/1 172.31.0.30/32 VRF vrf1 flags 0x20018
To avoid this situation, BGP updates that resolve over multiple
nexthops are using the unique per-vrf label.
- recursive route case
Prefixes that need a recursive route to be resolved can
also be eligible for mpls allocation per nexthop. In that
case, the nexthop will be the recursive nexthop calculated.
To achieve this, all nexthop types in bnc contexts are valid,
except for the blackhole nexthops.
- network declared prefixes
Nexthop tracking is used to look for the reachability of the
prefixes. When the the 'no bgp network import-check' command
is used, network declared prefixes are maintained active,
even if there is no active nexthop.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Memory leaks are observed in the cleanup code. When “no router bgp" is executed,
cleanup in that flow for aggregate-address command is not taken care.
fixes the below leak:
--
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444:Direct leak of 152 byte(s) in 1 object(s) allocated from:
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #0 0x7f163e911037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #1 0x7f163e4b9259 in qcalloc lib/memory.c:105
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #2 0x562bf42ebbd5 in bgp_aggregate_new bgpd/bgp_route.c:7239
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #3 0x562bf42f14e8 in bgp_aggregate_set bgpd/bgp_route.c:8421
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #4 0x562bf42f1e55 in aggregate_addressv6_magic bgpd/bgp_route.c:8592
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #5 0x562bf42be3f5 in aggregate_addressv6 bgpd/bgp_route_clippy.c:341
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #6 0x7f163e3f1e1b in cmd_execute_command_real lib/command.c:988
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #7 0x7f163e3f219c in cmd_execute_command lib/command.c:1048
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #8 0x7f163e3f2df4 in cmd_execute lib/command.c:1215
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #9 0x7f163e5a2d73 in vty_command lib/vty.c:544
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #10 0x7f163e5a79c8 in vty_execute lib/vty.c:1307
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #11 0x7f163e5ad299 in vtysh_read lib/vty.c:2216
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #12 0x7f163e593f16 in event_call lib/event.c:1995
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #13 0x7f163e47c839 in frr_run lib/libfrr.c:1185
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #14 0x562bf414e58d in main bgpd/bgp_main.c:505
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #15 0x7f163de66d09 in __libc_start_main ../csu/libc-start.c:308
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444-
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444:Direct leak of 152 byte(s) in 1 object(s) allocated from:
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #0 0x7f163e911037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #1 0x7f163e4b9259 in qcalloc lib/memory.c:105
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #2 0x562bf42ebbd5 in bgp_aggregate_new bgpd/bgp_route.c:7239
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #3 0x562bf42f14e8 in bgp_aggregate_set bgpd/bgp_route.c:8421
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #4 0x562bf42f1cde in aggregate_addressv4_magic bgpd/bgp_route.c:8543
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #5 0x562bf42bd258 in aggregate_addressv4 bgpd/bgp_route_clippy.c:255
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #6 0x7f163e3f1e1b in cmd_execute_command_real lib/command.c:988
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #7 0x7f163e3f219c in cmd_execute_command lib/command.c:1048
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #8 0x7f163e3f2df4 in cmd_execute lib/command.c:1215
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #9 0x7f163e5a2d73 in vty_command lib/vty.c:544
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #10 0x7f163e5a79c8 in vty_execute lib/vty.c:1307
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #11 0x7f163e5ad299 in vtysh_read lib/vty.c:2216
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #12 0x7f163e593f16 in event_call lib/event.c:1995
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #13 0x7f163e47c839 in frr_run lib/libfrr.c:1185
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #14 0x562bf414e58d in main bgpd/bgp_main.c:505
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444- #15 0x7f163de66d09 in __libc_start_main ../csu/libc-start.c:308
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444-
./bgp_local_asn_dot.test_bgp_local_asn_dot_agg/r3.bgpd.asan.3410444-SUMMARY: AddressSanitizer: 304 byte(s) leaked in 2 allocation(s).
Signed-off-by: Samanvitha B Bhargav <bsamanvitha@vmware.com>
Effectively a massive search and replace of
`struct thread` to `struct event`. Using the
term `thread` gives people the thought that
this event system is a pthread when it is not
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This commit introduces a new method to associate a label to
prefixes to export to a VPNv4 backbone. All the methods to
associate a label to a BGP update is documented in rfc4364,
chapter 4.3.2. Initially, the "single label for an entire
VRF" method was available. This commit adds "single label
for each attachment circuit" method.
The change impacts the control-plane, because each BGP update
is checked to know if the nexthop has reachability in the VRF
or not. If this is the case, then a unique label for a given
destination IP in the VRF will be picked up. This label will
be reused for an other BGP update that will have the same
nexthop IP address.
The change impacts the data-plane, because the MPLs pop
mechanism applied to incoming labelled packets changes: the
MPLS label is popped, and the packet is directly sent to the
connected nexthop described in the previous outgoing BGP VPN
update.
By default per-vrf mode is done, but the user may choose
the per-nexthop mode, by using the vty command from the
previous commit. In the latter case, a per-vrf label
will however be allocated to handle networks that are not directly
connected. This is the case for local traffic for instance.
The change also include the following:
- ECMP case
In case a route is learnt in a given VRF, and is resolved via an
ECMP nexthop. This implies that when exporting the route as a BGP
update, if label allocation per nexthop is used, then two possible
MPLS values could be picked up, which is not possible with the
current implementation. Actually, the NLRI for VPNv4 stores one
prefix, and one single label value, not two. Today, RFC8277 with
multiple label capability is not yet available.
To avoid this corner case, when a route is resolved via more than one
nexthop, the label allocation per nexthop will not apply, and the
default per-vrf label will be chosen.
Let us imagine BGP redistributes a static route using the `172.31.0.20`
nexthop. The nexthop resolution will find two different nexthops fo a
unique BGP update.
> r1# show running-config
> [..]
> vrf vrf1
> ip route 172.31.0.30/32 172.31.0.20
> r1# show bgp vrf vrf1 nexthop
> [..]
> 172.31.0.20 valid [IGP metric 0], #paths 1
> gate 192.0.2.11
> gate 192.0.2.12
> Last update: Mon Jan 16 09:27:09 2023
> Paths:
> 1/1 172.31.0.30/32 VRF vrf1 flags 0x20018
To avoid this situation, BGP updates that resolve over multiple
nexthops are using the unique per-vrf label.
- recursive route case
Prefixes that need a recursive route to be resolved can
also be eligible for mpls allocation per nexthop. In that
case, the nexthop will be the recursive nexthop calculated.
To achieve this, all nexthop types in bnc contexts are valid,
except for the blackhole nexthops.
- network declared prefixes
Nexthop tracking is used to look for the reachability of the
prefixes. When the the 'no bgp network import-check' command
is used, network declared prefixes are maintained active,
even if there is no active nexthop.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Before:
```
r1# sh ip bgp wide
BGP table version is 1, local router ID is 192.168.2.1, vrf id 0
Default local pref 100, local AS 65001
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
* 172.16.255.254/32 192.168.2.2 0 0 (65003) i
*> 192.168.1.2 0 0 (65002) i
Displayed 1 routes and 2 total paths
r1#
```
After:
```
r1# sh ip bgp wide
BGP table version is 1, local router ID is 192.168.2.1, vrf id 0
Default local pref 100, local AS 65001
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
* 172.16.255.254/32 192.168.2.2 0 0 (65003) i
*> 192.168.1.2 0 0 (65002) i
Displayed 1 routes and 2 total paths
r1#
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
"show bgp <afi> <safi> json detail" was incorrectly displaying header
information from route_vty_out_detail_header() as an element of the
"paths" array. This corrects the behavior for 'json detail' so that a
route holds a dictionary with keys for "paths" and header info, which
aligns with how we structure the output for a specific prefix, e.g.
"show bgp <afi> <safi> <prefix> json".
Before:
```
ub20# show ip bgp json detail
{
"vrfId": 0,
"vrfName": "default",
"tableVersion": 3,
"routerId": "100.64.0.222",
"defaultLocPrf": 100,
"localAS": 1,
"routes": { "2.2.2.2/32": [
{ <<<<<<<<< should be outside the array
"prefix":"2.2.2.2/32",
"version":1,
"advertisedTo":{
"192.168.122.12":{
"hostname":"ub20-2"
}
}
},
{
"aspath":{
"string":"Local",
"segments":[
],
"length":0
},
<snip>
```
After:
```
ub20# show ip bgp json detail
{
"vrfId": 0,
"vrfName": "default",
"tableVersion": 3,
"routerId": "100.64.0.222",
"defaultLocPrf": 100,
"localAS": 1,
"routes": { "2.2.2.2/32": {
"prefix": "2.2.2.2/32",
"version": "1",
"advertisedTo": {
"192.168.122.12":{
"hostname":"ub20-2"
}
}
,"paths": [
{
"aspath":{
"string":"Local",
"segments":[
],
"length":0
},
```
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
Add a keyword self-originate" to extend current CLI commands to filter out self-originated routes only
a\) CLI to show ipv4/ipv6 self-originated routes
"show [ip] bgp [afi] [safi] [all] self-originate [wide|json]"
b\) CLI to show evpn self-originated routes
"show bgp l2vpn evpn route [detail] [type <ead|macip|multicast|es|prefix|1|2|3|4|5>] self-originate [json]"
Signed-off-by: Karl Quan <kquan@nvidia.com>
The bgp network command creates static routes with an optional
route-distinguisher parameter for VPN and EVPN address families.
Store the rd parameter in those static routes. This will be used
by the 'show running-config' later.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
These two functions always return 0. As such any and all
tests against this make no sense. Remove the return 0
to a void and follow the chain, logically, to remove all
the dead code.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
```
unet> sh pe2 vtysh -c 'sh ip bgp ipv4 vpn detail-routes'
BGP table version is 4, local router ID is 10.10.10.20, vrf id 0
Default local pref 100, local AS 65001
Route Distinguisher: 192.168.2.2:2
BGP routing table entry for 192.168.2.2:2:10.0.0.0/24, version 1
not allocated
Paths: (1 available, best #1)
Not advertised to any peer
65000
192.168.2.1 from 0.0.0.0 (10.10.10.20) vrf RED(4) announce-nh-self
Origin incomplete, metric 0, localpref 50, valid, sourced, local, best (First path received)
Extended Community: RT:192.168.2.2:2
Originator: 10.10.10.20
Remote label: 2222
Last update: Tue Dec 20 13:01:20 2022
BGP routing table entry for 192.168.2.2:2:172.16.255.1/32, version 2
not allocated
Paths: (1 available, best #1)
Not advertised to any peer
65000
192.168.2.1 from 0.0.0.0 (10.10.10.20) vrf RED(4) announce-nh-self
Origin incomplete, localpref 50, valid, sourced, local, best (First path received)
Extended Community: RT:192.168.2.2:2
Originator: 10.10.10.20
Remote label: 2222
Last update: Tue Dec 20 13:01:20 2022
BGP routing table entry for 192.168.2.2:2:192.168.1.0/24, version 3
not allocated
Paths: (1 available, best #1)
Not advertised to any peer
65000
192.168.2.1 from 0.0.0.0 (10.10.10.20) vrf RED(4) announce-nh-self
Origin incomplete, localpref 50, valid, sourced, local, best (First path received)
Extended Community: RT:192.168.2.2:2
Originator: 10.10.10.20
Remote label: 2222
Last update: Tue Dec 20 13:01:20 2022
BGP routing table entry for 192.168.2.2:2:192.168.2.0/24, version 4
not allocated
Paths: (1 available, best #1)
Not advertised to any peer
65000
192.168.2.1 from 0.0.0.0 (10.10.10.20) vrf RED(4) announce-nh-self
Origin incomplete, metric 0, localpref 50, valid, sourced, local, best (First path received)
Extended Community: RT:192.168.2.2:2
Originator: 10.10.10.20
Remote label: 2222
Last update: Tue Dec 20 13:01:20 2022
Displayed 4 routes and 4 total paths
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Not all places were checking to see if soft reconfiguration
was turned on before calling into it to do all that work.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Column headers in BGP routes table are not aligned with data when
RPKI status is available. This was fixed to insert a space at the
beginning of the header and at the beginning of lines that do not
have RPKI status.
This fix requires that several testing templates be adjusted to
match the new output.
Signed-off-by: Wayne Morrison <wmorrison@netgate.com>
When bgp is using `bgp suppress-fib-pending` and the end
operator is using network statements, bgp was not sending
the network'ed prefix'es to it's peers. Fix this.
Also update the test cases for bgp_suppress_fib to test
this new corner case( I am sure that there are going to
be others that will need to be added ).
Fixes: #12112
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Re-work the bgp vni table to use separately keyed tables for type2
routes.
So, with type2 routes, we have the main table keyed off of the IP and a
new MAC table keyed off of MACs.
By separating out the two, we are able to run path selection separately
for the neigh and mac. Keeping the two separate is also more in-line
with what happens in zebra (they are managed comptletely seperate).
With this change type2 routes go into each table like so:
```
Remote MAC-IP -> IP Table & MAC Table
Remote MAC -> MAC Table
Local MAC-IP -> IP Table
Local MAC -> MAC Table
```
The difference for local is necessary because we should not ever allow
multiple paths for a local MAC.
Also cleaned up the commands for querying the vni tables:
```
show bgp vni all type ...
show bgp vni VNI type ...
```
Old commands will be deprecated in a separate commit.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Use the IP addr of type2/macip routes only for the hash/key
of the VNI table and carry the MAC in a path_info_extra attribute.
There is exists situations that can be hit during extended MAC mobility events
where two MACs could be pointing to the same IP in our global table. It
is requires very specific timings.
When that happens, BPG would (because we key'd on both MAC and IP)
install both into it's VNI table as separate entries, but zebra only
knows/needs to know about a single IP -> MAC relationship for it's VNI
table's type2 routes. So it was compleletly undeterministic which one
zebra would end up with in these timing situations.
With these changes, we move BGP's VNI table to key'd the same as Zebra's
and now a single IP will have multiple path_info's with a path_info_extra
that is carrying the MAC info for each path.
BGP will then run best path to deterministically decide which one to send to
zebra during the occasions where there exist's two possible MACs.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
RFC4364 describes peerings between multiple AS domains, to ease
the continuity of VPN services across multiple SPs. This commit
implements a sub-set of IETF option b) described in chapter 10 b.
The ASBR to ASBR approach is taken, with an EBGP peering between
the two routers. The EBGP peering must be directly connected to
the outgoing interface used. In those conditions, the next hop
is directly connected, and there is no need to have a transport
label to convey the VPN label. A new vty command is added on a
per interface basis:
This command if enabled, will permit to convey BGP VPN labels
without any transport labels (i.e. with implicit-null label).
restriction:
this command is used only for EBGP directly connected peerings.
Other use cases are not covered.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Move the logic to check the mp_nexthop_len against v6 lengths into its
own macro so we can apply that logic elsewhere on its own without always
checking for presence of BGP_ATTR_NEXT_HOP.
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
The same as with prefix-list/route-maps/etc.
```
donatas-pc# show ip access-list spine
ZEBRA:
Zebra IP access list spine
seq 5 permit 200.200.200.200/32
BGP:
Zebra IP access list spine
seq 5 permit 200.200.200.200/32
PIM:
Zebra IP access list spine
seq 5 permit 200.200.200.200/32
BABELD:
Zebra IP access list spine
seq 5 permit 200.200.200.200/32
donatas-pc# show bgp ipv4 unicast access-list
ACCESSLIST_NAME Access-list name
spine
donatas-pc# show bgp ipv4 unicast access-list spine
BGP table version is 9, local router ID is 172.17.0.3, vrf id 0
Default local pref 100, local AS 1
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
*> 200.200.200.200/32
enp3s0 0 0 65000 3456 ?
Displayed 1 routes and 10 total paths
donatas-pc#
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
This patch adds transpostion_offset and transposition_len to bgp_sid_info,
and transposes SID only at bgp_zebra_announce.
Signed-off-by: Ryoga Saito <ryoga.saito@linecorp.com>
Currently the Wait for Install code ( bgp_suppress_fib ) does
not properly handle two states from zebra: ROUTE_INSTALL_FAILED
and BETTER_ADMIN_DISTANCE_WON. Pre this change the WFI code
would just never notify our peers about a route install failure
but more is needed. In the ROUTE_INSTALL_FAILED and the
BETTER_ADMIN_DISTANCE_WON we need to notify our peers with
a withdrawal about the route, else we will continue to
draw traffic to us when we cannot legally do so.
Why is this needed? In either case imagine that we've already
received a bgp route, installed it and sent to our peers.
In the Better admin distance won case, say a static route is installed
at this point in time we must stop advertising the route through
us since we are not installed. As such a withdrawal must be sent.
In the ROUTE_INSTALL_FAILED case, the code was not properly handling
the situation where we have Route A, it was successfully installed
and then we received a update to Route A that was attempted to be
installed but failed. In this case we also need to send a withdrawal
Finally update the bgp_suppress_fib topotest to test both of these
situations.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Description:
Incorrect behavior during best path selection for the imported routes.
Imported routes are always treated as eBGP routes.
Change is intended for fixing the issues related to
bgp best path selection for leaked routes:
- FRR does ecmp for the imported routes,
even without any ecmp related config.
If the same prefix is imported from two different VRFs,
then we configure the route with ecmp even without
any ecmp related config.
- Locally imported routes are preferred over imported
eBGP routes.
If there is a local route and eBGP learned route
for the same prefix, if we import both the routes,
imported local route is selected as best path.
- Same route is imported from multiple tenant VRFs,
both imported routes point to the same VRF in nexthop.
- When the same route with same nexthop in two different VRFs
is imported from those two VRFs, route is not installed as ecmp,
even though we had ecmp config.
- During best path selection, while comparing the paths for imported routes,
we should correctly refer to the original route i.e. the ultimate path.
- When the same route is imported from multiple VRF,
use the correct VRF while installing in the FIB.
- When same route is imported from two different tenant VRFs,
while comparing bgp path info as part of bgp best path selection,
we should ideally also compare corresponding VRFs.
See-also: https://github.com/FRRouting/frr/files/7169555/FRR.and.Cisco.VRF-Lite.Behaviour.pdf
Co-authored-by: Santosh P K <sapk@vmware.com>
Co-authored-by: Kantesh Mundaragi <kmundaragi@vmware.com>
Signed-off-by: Iqra Siddiqui <imujeebsiddi@vmware.com>
```
exit1-debian-9# show ip route 172.16.16.1/32
Routing entry for 172.16.16.1/32
Known via "bgp", distance 20, metric 0, best
Last update 00:00:28 ago
* 192.168.0.2, via eth1, weight 1
AS-Path : 65003
Communities : first 65001:2 65001:3
Large-Communities: 65001:1:1 65001:1:2 65001:1:3
Selection reason : First path received
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Description:
Change is intended for fixing the following issues related to vrf route leaking:
Routes with special nexthops i.e. blackhole/sink routes when imported,
are not programmed into the FIB and corresponding nexthop is set as 'inactive',
nexthop interface as 'unknown'.
While importing/leaking routes between VRFs, in case of special nexthop(ipv4/ipv6)
once bgp announces route(s) to zebra, nexthop type is incorrectly set as
NEXTHOP_TYPE_IPV6_IFINDEX/NEXTHOP_TYPE_IFINDEX
i.e. directly connected even though we are not able to resolve through an interface.
This leads to nexthop_active_check marking nexthop !NEXTHOP_FLAG_ACTIVE.
Unable to find the active nexthop(s), route is not programmed into the FIB.
Whenever BGP leaks routes, set the correct nexthop type, so that route gets resolved
and correctly programmed into the FIB, in the imported vrf.
Co-authored-by: Kantesh Mundaragi <kmundaragi@vmware.com>
Signed-off-by: Iqra Siddiqui <imujeebsiddi@vmware.com>
When bgp receives the admin distance from a redistribution statement
let's store that distance for later usage.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add a terse option to show bgp summary to shorten output.
Do not show the following information about the BGP
instances: the number of RIB entries, the table version and the used memory.
The "terse" option can be used in combination with the "remote-as", "neighbor",
"failed" and "established" filters, and with the "wide" option as well.
Before patch:
ubuntu# show bgp summary remote-as 123456
IPv4 Unicast Summary (VRF default):
BGP router identifier X.X.X.X, local AS number XXX vrf-id 0
BGP table version 0
RIB entries 3, using 552 bytes of memory
Peers 5, using 3635 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
10.200.200.2 4 123456 81432 4 0 56092 0 00:00:13 572106 0 N/A
Displayed neighbors 1
Total number of neighbors 4
IPv6 Unicast Summary (VRF default):
BGP router identifier X.X.X.X, local AS number XXX vrf-id 0
BGP table version 0
RIB entries 3, using 552 bytes of memory
Peers 5, using 3635 KiB of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
% No matching neighbor
Total number of neighbors 5
After patch:
ubuntu# show bgp summary remote-as 123456 terse
IPv4 Unicast Summary (VRF default):
BGP router identifier X.X.X.X, local AS number XXX vrf-id 0
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
10.200.200.2 4 123456 81432 4 0 56092 0 00:00:13 572106 0 N/A
Displayed neighbors 1
Total number of neighbors 4
IPv6 Unicast Summary (VRF default):
BGP router identifier X.X.X.X, local AS number XXX vrf-id 1
% No matching neighbor
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
BGP configuration changes that imply recomputing the BGP route table
(e.g. modifying route-maps, setting bgp graceful-shutdown) might be a
long time process depending on the size of the BGP table and the
route-map numbers and complexity. For example, setups with full
Internet routes take something like one minute to reprocess all the
prefixes when graceful-shutdown is configured. During this time, a
"show bgp commands" request on vtysh results in blocking the shell until
the soft reconfigure table task is over.
This patch splits bgp_soft_reconfig_table task into thread jobs of 25K
prefixes.
Some tests on a full Internet route setup show that after reconfiguring
route-maps or graceful-shutdown, vtysh is not stucked anymore. We are
now able to request commands like "show bgp summary" after 1 or 2
seconds instead of 30 to 60s.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Just to be more informant, copying from Cisco.
```
exit1-debian-9# sh ip bgp
BGP table version is 4, local router ID is 192.168.100.1, vrf id 0
Default local pref 100, local AS 65534
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
N*> 10.0.2.0/24 0.0.0.0 0 32768 ?
N*> 192.168.0.0/24 0.0.0.0 0 32768 ?
N*> 192.168.10.0/24 0.0.0.0 0 32768 ?
N*> 192.168.100.1/32 0.0.0.0 0 32768 ?
Displayed 4 routes and 4 total paths
```
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
1. When a local ES is deleted or the ES-bond goes into bypass we treat
imported MAC-IP routes with that ES destination as remote routes instead
of sync routes. This requires a re-evaluation of the routes as
"non-local-dest" and an update to zebra.
2. When a ES is attached to an access port or the ES-bond transitions from
bypass to LACP-up we treat imported MAC-IP routes with that ES destination as
sync routes. This requires a re-evaluation of the routes as
"local-dest" and an update to zebra.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In the case of EVPN type-2 routes that use ES as destination, BGP
consolidates the nh (and nh->rmac mapping) and sends it to zebra as
a nexthop add.
This nexthop is the EVPN remote PE and is created by reference of
VRF IPvx unicast paths imported from EVPN Type-2 routes.
zebra uses this nexthop for setting up a remote neigh enty for the PE
and a remote fdb entry for the PE's RMAC.
Ticket: CM-31398
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Setup a mh_info indirection in the path extra. This has been done to
avoid increasing evpn route's path size to add new (type based) pointers
in path_info_extra.
Ticket: CM-31398
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. MAC-IP routes in the VPN routing table are linked to the
destination ES for efficient handling for remote ES link flaps.
2. Only MAC-IP paths whose nexthops are active (added via EAD-ES)
are imported into the VRF routing table.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
* Process FIB update in bgp_zebra_route_notify_owner() and call
group_announce_route() if route is installed
* When bgp update is received for a route which is not installed earlier
(flag BGP_NODE_FIB_INSTALLED is not set) and suppress fib is enabled
set the flag BGP_NODE_FIB_INSTALL_PENDING to indicate fib install is
pending for the route. The route will be advertised when zebra send
ZAPI_ROUTE_INSTALLED status.
* The advertisement delay (BGP_DEFAULT_UPDATE_ADVERTISEMENT_TIME)
is added to allow more routes to be sent in single update message.
This is required since zebra sends route notify message for each route.
The delay will be applied to update group timer which advertises
routes to peers.
Signed-off-by: kssoman <somanks@gmail.com>
The problem is that only prefixes were handled and any other `match`
commands were ignored. Let's do not forget them as well.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Implemented as per the feature description given in the source link.
Descriprion:
The BGP conditional advertisement feature uses the non-exist-map or exist-map
and the advertise-map keywords of the neighbor advertise-map command in order
to track routes by the route prefix.
non-exist-map :
If a route prefix is not present in output of the non-exist-map command, then
the route specified by the advertise-map command is announced.
exist-map :
If a route prefix is present in output of the exist-map command, then the route
specified by the advertise-map command is announced.
The conditional BGP announcements are sent in addition to the normal
announcements that a BGP router sends to its peers.
The conditional advertisement process is triggered by the BGP scanner process,
which runs every 60 seconds. This means that the maximum time for the conditional
advertisement to take effect is 60 seconds. The conditional advertisement can take
effect sooner, depending on when the tracked route is removed from the BGP table
and when the next instance of the BGP scanner occurs.
Sample Configuration on DUT
---------------------------
Router2# show running-config
Building configuration...
Current configuration:
!
frr version 7.6-dev-MyOwnFRRVersion
frr defaults traditional
hostname router
log file /var/log/frr/bgpd.log
log syslog informational
hostname Router2
service integrated-vtysh-config
!
debug bgp updates in
debug bgp updates out
!
debug route-map
!
ip route 200.200.0.0/16 blackhole
ipv6 route 2001:db8::200/128 blackhole
!
interface enp0s9
ip address 10.10.10.2/24
!
interface enp0s10
ip address 10.10.20.2/24
!
interface lo
ip address 2.2.2.2/24
ipv6 address 2001:db8::2/128
!
router bgp 2
bgp log-neighbor-changes
no bgp ebgp-requires-policy
neighbor 10.10.10.1 remote-as 1
neighbor 10.10.20.3 remote-as 3
!
address-family ipv4 unicast
network 2.2.2.0/24
network 200.200.0.0/16
neighbor 10.10.10.1 soft-reconfiguration inbound
neighbor 10.10.10.1 advertise-map ADVERTISE non-exist-map CONDITION
neighbor 10.10.20.3 soft-reconfiguration inbound
exit-address-family
!
address-family ipv6 unicast
network 2001:db8::2/128
network 2001:db8::200/128
neighbor 10.10.10.1 activate
neighbor 10.10.10.1 soft-reconfiguration inbound
neighbor 10.10.10.1 advertise-map ADVERTISE_6 non-exist-map CONDITION_6
neighbor 10.10.20.3 activate
neighbor 10.10.20.3 soft-reconfiguration inbound
exit-address-family
!
access-list CONDITION seq 5 permit 3.3.3.0/24
access-list ADVERTISE seq 5 permit 2.2.2.0/24
access-list ADVERTISE seq 6 permit 200.200.0.0/16
access-list ADVERTISE seq 7 permit 20.20.0.0/16
!
ipv6 access-list ADVERTISE_6 seq 5 permit 2001:db8::2/128
ipv6 access-list CONDITION_6 seq 5 permit 2001:db8::3/128
!
route-map ADVERTISE permit 10
match ip address ADVERTISE
!
route-map CONDITION permit 10
match ip address CONDITION
!
route-map ADVERTISE_6 permit 10
match ipv6 address ADVERTISE_6
!
route-map CONDITION_6 permit 10
match ipv6 address CONDITION_6
!
line vty
!
end
Router2#
Withdraw when non-exist-map prefixes present in BGP table:
----------------------------------------------------------
Router2# show ip bgp all wide
For address family: IPv4 Unicast
BGP table version is 8, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 2
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 1.1.1.0/24 10.10.10.1 0 0 1 i
*> 2.2.2.0/24 0.0.0.0 0 32768 i
*> 3.3.3.0/24 10.10.20.3 0 0 3 i
*> 200.200.0.0/16 0.0.0.0 0 32768 i
Displayed 4 routes and 4 total paths
For address family: IPv6 Unicast
BGP table version is 8, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 2
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 2001:db8::1/128 fe80::a00:27ff:fecb:ad57 0 0 1 i
*> 2001:db8::2/128 :: 0 32768 i
*> 2001:db8::3/128 fe80::a00:27ff:fe76:6738 0 0 3 i
*> 2001:db8::200/128 :: 0 32768 i
Displayed 4 routes and 4 total paths
Router2#
Router2# show ip bgp neighbors 10.10.10.1
BGP neighbor is 10.10.10.1, remote AS 1, local AS 2, external link
!--- Output suppressed.
For address family: IPv4 Unicast
Update group 9, subgroup 5
Packet Queue length 0
Inbound soft reconfiguration allowed
Community attribute sent to this neighbor(all)
Condition NON_EXIST, Condition-map *CONDITION, Advertise-map *ADVERTISE, status: Withdraw
1 accepted prefixes
For address family: IPv6 Unicast
Update group 10, subgroup 6
Packet Queue length 0
Inbound soft reconfiguration allowed
Community attribute sent to this neighbor(all)
Condition NON_EXIST, Condition-map *CONDITION_6, Advertise-map *ADVERTISE_6, status: Withdraw
1 accepted prefixes
!--- Output suppressed.
Router2#
Here 2.2.2.0/24 & 200.200.0.0/16 (prefixes in advertise-map) are withdrawn
by conditional advertisement scanner as the prefix(3.3.3.0/24) specified
by non-exist-map is present in BGP table.
Router2# show ip bgp all neighbors 10.10.10.1 advertised-routes wide
For address family: IPv4 Unicast
BGP table version is 8, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 2
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 1.1.1.0/24 0.0.0.0 0 1 i
*> 3.3.3.0/24 0.0.0.0 0 3 i
Total number of prefixes 2
For address family: IPv6 Unicast
BGP table version is 8, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 2
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 2001:db8::1/128 :: 0 1 i
*> 2001:db8::3/128 :: 0 3 i
*> 2001:db8::200/128 :: 0 32768 i
Total number of prefixes 3
Router2#
Advertise when non-exist-map prefixes not present in BGP table:
---------------------------------------------------------------
After Removing 3.3.3.0/24 (prefix present in non-exist-map),
2.2.2.0/24 & 200.200.0.0/16 (prefixes present in advertise-map) are advertised
Router2# show ip bgp all wide
For address family: IPv4 Unicast
BGP table version is 9, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 2
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 1.1.1.0/24 10.10.10.1 0 0 1 i
*> 2.2.2.0/24 0.0.0.0 0 32768 i
*> 200.200.0.0/16 0.0.0.0 0 32768 i
Displayed 3 routes and 3 total paths
For address family: IPv6 Unicast
BGP table version is 9, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 2
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 2001:db8::1/128 fe80::a00:27ff:fecb:ad57 0 0 1 i
*> 2001:db8::2/128 :: 0 32768 i
*> 2001:db8::200/128 :: 0 32768 i
Displayed 3 routes and 3 total paths
Router2#
Router2# show ip bgp neighbors 10.10.10.1
!--- Output suppressed.
For address family: IPv4 Unicast
Update group 9, subgroup 5
Packet Queue length 0
Inbound soft reconfiguration allowed
Community attribute sent to this neighbor(all)
Condition NON_EXIST, Condition-map *CONDITION, Advertise-map *ADVERTISE, status: Advertise
1 accepted prefixes
For address family: IPv6 Unicast
Update group 10, subgroup 6
Packet Queue length 0
Inbound soft reconfiguration allowed
Community attribute sent to this neighbor(all)
Condition NON_EXIST, Condition-map *CONDITION_6, Advertise-map *ADVERTISE_6, status: Advertise
1 accepted prefixes
!--- Output suppressed.
Router2#
Router2# show ip bgp all neighbors 10.10.10.1 advertised-routes wide
For address family: IPv4 Unicast
BGP table version is 9, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 2
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 1.1.1.0/24 0.0.0.0 0 1 i
*> 2.2.2.0/24 0.0.0.0 0 32768 i
*> 200.200.0.0/16 0.0.0.0 0 32768 i
Total number of prefixes 3
For address family: IPv6 Unicast
BGP table version is 9, local router ID is 2.2.2.2, vrf id 0
Default local pref 100, local AS 2
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 2001:db8::1/128 :: 0 1 i
*> 2001:db8::2/128 :: 0 32768 i
*> 2001:db8::200/128 :: 0 32768 i
Total number of prefixes 3
Router2#
Signed-off-by: Madhuri Kuruganti <k.madhuri@samsung.com>
Convert IPv4 and IPv6 unicast address family clis
to transactional clis and implementation of
northbound callbacks.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Instead of just counting the route suppressions, keep a reference for
all aggregations that are doing it. It should help the with the
following problems:
- Which aggregation suppressed the route.
- Double suppression
- Double unsuppression
- Avoids calling `bgp_process` if already suppressed/unsuppressed.
- Easier code maintenance and understanding
This also fixes a crash when modifying a route map that is
associated with a working aggregate-address.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Add new aggregate-address option to selectively suppress routes based
on route map results.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
We currently have a global process queue for handling route
updates in bgp. This is fine, in general, except there are
places and times where we plug the queue for no new work
during certain peer states of bgp update delay. If we
happen to be processing multiple bgp instances on startup
why do we want to stop processing in vrf A when vrf B
is in a bit of a pickle?
Also this separation will allow us to start forward thinking
about how to fully integrate pthreads into route processing
in bgp.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add code to handle MED matching:
- When MED matches act as normal.
- When MED doesn't match do the following:
* Uninstall the aggregate route
* Unsuppress routes (if using summary-only)
- When MED didn't match, but now matches:
* Install the aggregate route
* Suppress all routes (if using summary-only)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
When a SYNC route i.e. a route with a local ES as destination is
rxed on a switch (say L11) from an ES peer (say L12) a local
MAC/neigh entry is created on L11 with the local access port
as dest port.
Creation of the local entry triggers a local path advertisement from
L11. This could be a "locally-active" path or a "locally-inactive"
path. Inactive paths are advertised with the proxy bit.
To ensure that the local entry is not deleted by a SYNC route it is
given absolute precedence over peer-paths.
If there are two non-local paths with the same dest ES and same MM
seq number the non-proxy path is preferred. This is done to ensure
that we don't lose track of the peer-activity.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This is the base patch that brings in support for Type-1 routes.
It includes support for -
- Ethernet Segment (ES) management
- EAD route handling
- MAC-IP (Type-2) routes with a non-zero ESI i.e. Aliasing for
active-active multihoming
- Initial infra for consistency checking. Consistency checking
is a fundamental feature for active-active solutions like MLAG.
We will try to levarage the info in the EAD-ES/EAD-EVI routes to
detect inconsitencies in access config across VTEPs attached to
the same Ethernet Segment.
Functionality Overview -
========================
1. Ethernet segments are created in zebra and associated with
access VLANs. zebra sends that info as ES and ES-EVI objects to BGP.
2. BGP advertises EAD-ES and EAD-EVI routes for the locally attached
ethernet segments.
3. Similarly BGP processes EAD-ES and EAD-EVI routes from peers
and translates them into ES-VTEP objects which are then sent to zebra
as remote ESs.
4. Each ES in zebra is associated with a list of active VTEPs which
is then translated into a L2-NHG (nexthop group). This is the ES
"Alias" entry
5. MAC-IP routes with a non-zero ESI use the alias entry created in
(4.) to forward traffic i.e. a MAC-ECMP is done to these remote-ES
destinations.
EAD route management (route table and key) -
============================================
1. Local EAD-ES routes
a. route-table: per-ES route-table
key: {RD=ES-RD, ESI, ET=0xffffffff, VTEP-IP)
b. route-table: per-VNI route-table
Not added
c. route-table: global route-table
key: {RD=ES-RD, ESI, ET=0xffffffff)
2. Remote EAD-ES routes
a. route-table: per-ES route-table
Not added
b. route-table: per-VNI route-table
key: {RD=ES-RD, ESI, ET=0xffffffff, VTEP-IP)
c. route-table: global route-table
key: {RD=ES-RD, ESI, ET=0xffffffff)
3. Local EAD-EVI routes
a. route-table: per-ES route-table
Not added
b. route-table: per-VNI route-table
key: {RD=0, ESI, ET=0, VTEP-IP)
c. route-table: global route-table
key: {RD=L2-VNI-RD, ESI, ET=0)
4. Remote EAD-EVI routes
a. route-table: per-ES route-table
Not added
b. route-table: per-VNI route-table
key: {RD=0, ESI, ET=0, VTEP-IP)
c. route-table: global route-table
key: {RD=L2-VNI-RD, ESI, ET=0)
Please refer to bgp_evpn_mh.h for info on how the data-structures are
organized.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Add ESI as an inline attribute field along with the other EVPN
attributes. This may be re-worked when the rest of the EVPN
attributes find a new home.
Some cleanup has been done to get rid of stale/unused references
to ESI. And also to consolidate duplicate definitions of ES ID
types.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This is the bulk part extracted from "bgpd: Convert from `struct
bgp_node` to `struct bgp_dest`". It should not result in any functional
change.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Problem Description:
=====================
+--+ +--+
|R1|-(192.201.202.1)----iBGP----(192.201.202.2)-|R2|
+--+ +--+
Routes on R2:
=============
S>* 202.202.202.202/32 [1/0] via 192.201.78.1, ens256, 00:40:48
Where, the next-hop network, 192.201.78.0/24, is a directly connected network address.
C>* 192.201.78.0/24 is directly connected, ens256, 00:40:48
Configurations on R1:
=====================
!
router bgp 201
bgp router-id 192.168.0.1
neighbor 192.201.202.2 remote-as 201
!
Configurations on R2:
=====================
!
ip route 202.202.202.202/32 192.201.78.1
!
router bgp 201
bgp router-id 192.168.0.2
neighbor 192.201.202.1 remote-as 201
!
address-family ipv4 unicast
redistribute static
exit-address-family
!
Step-1:
=======
R1 receives the route 202.202.202.202/32 from R2.
R1 installs the route in its BGP RIB.
Step-2:
=======
On R1, a connected interface address is added.
The address is the same as the next-hop of the BGP route received from R2 (192.201.78.1).
Point of Failure:
=================
R1 resolves the BGP route even though the route's next-hop is its own connected address.
Even though this appears to be a misconfiguration it would still be better to safeguard the code against it.
Fix:
====
When BGP receives a connected route from Zebra, it processes the
routes for the next-hop update.
While doing so, BGP must ignore routes whose next-hop address matches
the address of the connected route for which Zebra sent the next-hop update
message.
Signed-off-by: NaveenThanikachalam <nthanikachal@vmware.com>