The BGP Router MAC extended community should be unique and not occur
multiple times. In a VRF-to-VRF route-leak scenario where EVPN routes
from a source VRF are leaked into the target VRF and then injected
back into EVPN from the target VRF, the resulting route had more than
one RMAC. With this fix, the resulting route will have only the
target VRF's RMAC.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
The EVPN advertise route-map may generate extended communities for an IPv4
or IPv6 route injected into EVPN as type-5. If so, allow for it and add
to it.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Implement the code to handle the other route-map options to generate
the link bandwidth, namely, to use the cumulative bandwidth or to
base this on the number of multipaths. In the latter case, a reference
bandwidth is internally chosen - the implementation uses a value of
1 Gbps.
These additional options mean that the prefix may need to be advertised
if there is a link bandwidth change, which is a new criteria. Define a
new path (change) flag to support this and implement the advertisement.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Certain extended communities cannot be repeated. An example is the
BGP link bandwidth extended community. Enhance the extended community
add function to ensure uniqueness, if requested.
Note: This commit does not change the lack of uniqueness for any of
the already-supported extended communities. Many of them such as the
BGP route target can obviously be present multiple times. Others like
the Router's MAC should most probably be present only once. The portions
of the code which add these may already be structured such that duplicates
do not arise.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Add new function `bgp_node_get_prefix()` and modify
the bgp code base to use it.
This is prep work for the struct bgp_dest rework.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Future work needs the ability to specify a
const struct prefix value. Iterate into
bgp a bit to get this started.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Modify more code to use `const struct prefix` throughout
bgp. This is all prep work for adding an accessor function
for bgp_node to get the prefix and reduce all the places that
code needs to be touched when we get that work done.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ensure that the EVPN advertise route-map is applied on a copy of the
original path_info and associated attribute, so that if the route-map
has SET clauses, they can operate properly. This closely follows
the model already in use in other route-map application code.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
A route where ESI, GW IP, MAC and Label are all zero at the same time SHOULD
be treat-as-withdraw.
Invalid MAC addresses are broadcast or multicast MAC addresses. The route
MUST be treat-as-withdraw in case of an invalid MAC address.
As FRR support Ethernet NVO Tunnels only.
Route will be withdrawn when ESI, GW IP and MAC are zero or Invalid MAC
Test cases:
1) ET-5 route with valid RMAC extended community
2) ET-5 route no RMAC extended community
3) ET-5 route with Multicast MAC in RMAC extended community
4) ET-5 route with Broadcast MAC in RMAC extended community
Signed-off-by: Kishore Aramalla <karamalla@vmware.com>
RCA:
When we install IPv6 prefix imported from EVPN RT-5 in vrf, nexthop of the IPv6
route should be IPv4 mapped IPv6 address. In function
install_evpn_route_entry_in_vrf, we generate a new attribute with IPv4 mapped
IPv6 nexthop, but we use parent->attr while creating the actual route.
Thus, Ipv4 nexthop is assigned to this route.
Because of this incorrect nexthop, we observed a crash in function
update_ipv6nh_for_route_install.
Fix:
Pass the new attribute with Ipv4 mapped Ipv6 nexthop to
bgp_create_evpn_bgp_path_info
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
uint8_t * cannot be cast to uint32_t * unless the
pointed-to address is aligned according to uint32_t's
alignment rules. And it usually is not.
Signed-off-by: Santosh P K <sapk@vmware.com>
This moves all the DFLT_BGP_* stuff over to the new defaults mechanism.
bgp_timers_nondefault() added to get better file-scoping.
v2: moved everything into bgp_vty.c so that the core BGP code is
independent of the CLI-specific defaults. This should make the future
northbound conversion easier.
Signed-off-by: David Lamparter <equinox@diac24.net>
In rare situations, the local route in a VNI may not get selected as the
best route. One situation is during a race between bgp and zebra which
was addressed in a prior commit. This change addresses another situation
where due to a change of tunnel IP, it is possible that a received route
may be selected as the best route if the path selection needs to take
next hop IPs into consideration. This is a pretty convoluted scenario,
but the code should handle it and delete and withdraw the local route
as well as (re)install the received route.
Ticket: CM-24114
Reviewed By: CCR-9487
Testing Done:
1. Manual tests - note, problem is not readily reproducible
2. evpn-smoke - results documented in the ticket
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
when a pip is disabled or mac-vlan is not present
use anycast MAC as RMAC value.
Ticket:CM-26923
Reviewed By:CCR-9417
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
For self type-2 routes, do not assign system-rmac
as attribute RMAC value if advertise-pip is disable
or macvlan is not present.
Ticket:CM-26923
Reviewed By:CCR-9397
Testing Done:
pip is disabled under bgp vrf2 instance.
Trigger frr-restart.
Before fix:
*> [2]:[0]:[48]:[00:02:00:00:00:2e]:[32]:[45.0.4.4]
36.0.0.11 32768 i
ET:8 RT:5546:1004 RT:5546:4002 Rmac:00:02:00:00:00:2e
After fix:
*> [2]:[0]:[48]:[00:02:00:00:00:2e]:[32]:[45.0.4.4]
36.0.0.11 32768 i
ET:8 RT:5546:1004 RT:5546:4002 Rmac:44:38:39:ff:ff:01
TOR# ifquery vlan1004
auto vlan1004
iface vlan1004
address 45.0.4.4/24
vlan-id 1004
vrf vrf2
VNI: 4002 (known to the kernel)
Type: L3
Tenant VRF: vrf2
RD: 45.0.6.4:3
Originator IP: 36.0.0.11
Advertise-pip: Yes
System-IP: 27.0.0.11
System-MAC: 00:02:00:00:00:2e
Router-MAC: 44:38:39:ff:ff:01
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
By default announct Self Type-2 routes with
system IP as nexthop and system MAC as
nexthop.
An API to check type-2 is self route via
checking ipv4/ipv6 address from connected interfaces list.
An API to extract RMAC and nexthop for type-2
routes based on advertise-svi-ip knob is enabled.
When advertise-pip is enabled/disabled, trigger type-2
route update. For self type-2 routes to use
anycast or individual (rmac, nexthop) addresses.
Ticket:CM-26190
Reviewed By:
Testing Done:
Enable 'advertise-svi-ip' knob in bgp default instance.
the vrf instance svi ip is advertised with nexthop
as default instance router-id and RMAC as system MAC.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
In L3VNI add callback parse, vrr rmac value.
For non-zero vrr mac value, use it as anycast RMAC
and svi mac as individual rmac value.
If advertise-pip is disable or vrr rmac is not present
use svi mac as anycast rmac value for all routes.
Ticket:CM-26190
Reviewed By:
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Evpn Primary IP advertisement feature uses
individual system IP and system MAC for prefix (type-5)
and self type-2 routes.
The PIP knob is enabled by default for bgp vrf instance.
Configuration CLI for enable/disable PIP feature knob.
User can configure PIP system IP and MAC to retain as
permanent values.
For the PIP IP, the default behavior is to accept bgp default
instance's router-id. When the default instance router-id change,
reflect PIP IP assignment.
Reflect type-5 to use system-IP and system MAC as nexthop and RMAC
values.
Ticket:CM-26190
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
* IPv6 routes received via a ibgp session with one of its own interface as
nexthop are getting installed in the BGP table.
*A common table to be implemented should take cares of both
ipv4 and ipv6 connected addresses.
Signed-off-by: Biswajit Sadhu sadhub@vmware.com
Problem statement:
When IPv4/IPv6 prefixes are received in BGP, bgp_update function registers the
nexthop of the route with nexthop tracking module. The BGP route is marked as
valid only if the nexthop is resolved.
Even for EVPN RT-5, route should be marked as valid only if the the nexthop is
resolvable.
Code changes:
1. Add nexthop of EVPN RT-5 for nexthop tracking. Route will be marked as valid
only if the nexthop is resolved.
2. Only the valid EVPN routes are imported to the vrf.
3. When nht update is received in BGP, make sure that the EVPN routes are
imported/unimported based on the route becomes valid/invalid.
Testcases:
1. At rtr-1, advertise EVPN RT-5 with a nexthop 10.100.0.2.
10.100.0.2 is resolved at rtr-2 in default vrf.
At rtr-2, remote EVPN RT-5 should be marked as valid and should be imported into
vrfs.
2. Make the nexthop 10.100.0.2 unreachable at rtr-2
Remote EVPN RT-5 should be marked as invalid and should be unimported from the
vrfs. As this code change deals with EVPN type-5 routes only, other EVPN routes
should be valid.
3. At rtr-2, add a static route to make nexthop 10.100.0.2 reachable.
EVPN RT-5 should again become valid and should be imported into the vrfs.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
There is a memory leak of the bgp node (route node)
in vni table while processing evpn remote route(s).
During the remote evpn route processing, a new route
is created in per vni route table, the refcount for
the route node is incremented twice. First refcount
is incremented during the node creation and the second
one when the bgp_info_add is added.
Post evpn route creation, the bgp node refcount needs
to be decremented.
Ticket:CM-26898,CM-26838,CM-27169
Reviewed By:CCR-9474
Testing Done:
In EVPN topology send 1K MAC routes then check the memory footprint
at the remote VTEP before sending 1K type-2 routes
and after flushing/withdrawal of the routes.
Before fix:
-----------
Initial memory footprint:
root@TOR1:~# vtysh -c "show memory" | grep "Hash Bucket \|BGP node \|BGP route"
Hash Bucket : 2008 32
BGP node : 182 152
BGP route : 96 112
With 1K MAC (type-2 routes)
root@TOR1:~# vtysh -c "show memory" | grep "Hash Bucket \|BGP node \|BGP route"
Hash Bucket : 6008 32
BGP node : 4182 152
BGP route : 2096 112
After cleaning up 1K MAC entries from source VTEP which triggers BGP withdraw
at the remote VTEP.
root@TOR1:~# vtysh -c "show memory" | grep "Hash Bucket \|BGP node \|BGP route"
Hash Bucket : 4008 32
BGP node : 2182 152 <-- Here 2K delta from initial count.
BGP route : 96 112
With fix:
---------
After 1K MAC entries cleaned up at the remote VTEP, the memory footprint
(BGP Node and Hash Bucket count) is equilibrium to start of the test.
root@TOR1:~# vtysh -c "show memory" | grep "Hash Bucket \|BGP node \|BGP route"
Hash Bucket : 2008 32
BGP node : 182 152
BGP route : 96 112
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
We make the assumption that ->attr is not NULL throughout
the code base. We are totally inconsistent about application
of this though.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In bgp_create_evpn_bgp_path_info we create a bgp_path_info
that should be returned since we need it later.
Found by Coverity Scan.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When creating a bgp_path_info for a type 4 route the pi->extra->parent
and the route node for the originating table were not being locked
properly. This will prevent BGP from not properly cleaning up
the data structures on cleanup.
Possibly every one of the functions that we use to create the
new bgp_path_info's should use an abstracted version of this code,
but I am unsure at this point in time if a type 4 should use the same
or not.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When L3vni is created with prefix-only flag,
the flag is set at bgp vrf instance level.
In the case of bgp instance is non auto created,
means user configured instance (i.e 'router bgp x vrf <name>')
Upon deletion of l3vni, clear the prefix-only flag from
bgp vrf instance.
Ticket:CM-21894
Reviewed By:CCR-9176
Testing Done:
vrf vrf1
vni 104001
exit-vrf
!
router bgp 650030 vrf vrf1
!
tor-21(config)# vrf vrf1
tor-21(config-vrf)# vni 104001 prefix-routes-only
tor-21(config-vrf)# no vni 104001 prefix-routes-only
tor-21(config-vrf)# end
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Problem:
With advertise_svi_ip knob enabled per vni.
Post vni flap, svi MAC-IP route are not originated.
Fix:
When a vni is flapped, upon re-add
send advetise_svi_ip knob to zebra.
Workaround:
re-configure advertise-svi-ip under l2vpn/evpn.
Ticket:CM-26001
Reviewed By:CCR-9118
Testing Done:
With advertise-svi-ip enabled under l2vpn/evpn
in bgp default instance.
Validated vni del/create post ifdown vxlan device
followed by ifup vxlan device.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Evpn extended communities like auto rts (import/export) should
check if its present in list before adding it, to avoid duplicate
addition.
L3vni_add callback from zebra to bgp may see updates to vnis.
The auto import/export rt derivation may call multiple times.
Testing Done:
Before:
TORC11# show bgp l2vpn evpn vni 4001
VNI: 4001 (known to the kernel)
Type: L3
Tenant VRF: vrf1
RD: 45.0.2.2:3
...
Import Route Target:
5546:4001
5546:4001
Export Route Target:
5546:4001
5546:4001
After:
VNI: 4001 (known to the kernel)
Type: L3
Tenant VRF: vrf1
RD: 45.0.2.2:3
...
Import Route Target:
5546:4001
Export Route Target:
5546:4001
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
In a situation where a VRF has configured route targets for importing
EVPN routes, this configuration may exist prior to the VRF being
ready to have EVPN routes installed into it - e.g., still missing
the L3VNI configuration or associated interface information. Ensure
that this is taken into account during EVPN route import and unimport.
Without this fix, EVPN routes would end up being prematurely imported
into the VRF routing table and consequently installed as inactive
(because the nexthop information would be incorrect when BGP informs
zebra).
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Introducing a 3rd state for route_map_apply library function: RMAP_NOOP
Traditionally route map MATCH rule apis were designed to return
a binary response, consisting of either RMAP_MATCH or RMAP_NOMATCH.
(Route-map SET rule apis return RMAP_OKAY or RMAP_ERROR).
Depending on this response, the following statemachine decided the
course of action:
State1:
If match cmd returns RMAP_MATCH then, keep existing behaviour.
If routemap type is PERMIT, execute set cmds or call cmds if applicable,
otherwise PERMIT!
Else If routemap type is DENY, we DENYMATCH right away
State2:
If match cmd returns RMAP_NOMATCH, continue on to next route-map. If there
are no other rules or if all the rules return RMAP_NOMATCH, return DENYMATCH
We require a 3rd state because of the following situation:
The issue - what if, the rule api needs to abort or ignore a rule?:
"match evpn vni xx" route-map filter can be applied to incoming routes
regardless of whether the tunnel type is vxlan or mpls.
This rule should be N/A for mpls based evpn route, but applicable to only
vxlan based evpn route.
Also, this rule should be applicable for routes with VNI label only, and
not for routes without labels. For example, type 3 and type 4 EVPN routes
do not have labels, so, this match cmd should let them through.
Today, the filter produces either a match or nomatch response regardless of
whether it is mpls/vxlan, resulting in either permitting or denying the
route.. So an mpls evpn route may get filtered out incorrectly.
Eg: "route-map RM1 permit 10 ; match evpn vni 20" or
"route-map RM2 deny 20 ; match vni 20"
With the introduction of the 3rd state, we can abort this rule check safely.
How? The rules api can now return RMAP_NOOP to indicate
that it encountered an invalid check, and needs to abort just that rule,
but continue with other rules.
As a result we have a 3rd state:
State3:
If match cmd returned RMAP_NOOP
Then, proceed to other route-map, otherwise if there are no more
rules or if all the rules return RMAP_NOOP, then, return RMAP_PERMITMATCH.
Signed-off-by: Lakshman Krishnamoorthy <lkrishnamoor@vmware.com>
This code is not returned anywhere in the system as that bgp
is by default multiple-instance 'only' now. So remove
the last remaining bits of it from the code base.
Remove BGP_ERR_MULTIPLE_INSTANCE_USED too.
Make bgp_get explicitly return BGP_SUCCESS
instead of 0.
Remove the multi-instance error code too.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Introducing a 3rd state for route_map_apply library function: RMAP_NOOP
Traditionally route map MATCH rule apis were designed to return
a binary response, consisting of either RMAP_MATCH or RMAP_NOMATCH.
(Route-map SET rule apis return RMAP_OKAY or RMAP_ERROR).
Depending on this response, the following statemachine decided the
course of action:
Action: Apply route-map match and return the result (RMAP_MATCH/RMAP_NOMATCH)
State1: Receveived RMAP_MATCH
THEN: If Routemap type is PERMIT, execute other rules if applicable,
otherwise we PERMIT!
Else: If Routemap type is DENY, we DENYMATCH right away
State2: Received RMAP_NOMATCH, continue on to next route-map, otherwise,
return DENYMATCH by default if nothing matched.
With reference to PR 4078 (https://github.com/FRRouting/frr/pull/4078),
we require a 3rd state because of the following situation:
The issue - what if, the rule api needs to abort or ignore a rule?:
"match evpn vni xx" route-map filter can be applied to incoming routes
regardless of whether the tunnel type is vxlan or mpls.
This rule should be N/A for mpls based evpn route, but applicable to only
vxlan based evpn route.
Today, the filter produces either a match or nomatch response regardless of
whether it is mpls/vxlan, resulting in either permitting or denying the
route.. So an mpls evpn route may get filtered out incorrectly.
Eg: "route-map RM1 permit 10 ; match evpn vni 20" or
"route-map RM2 deny 20 ; match vni 20"
With the introduction of the 3rd state, we can abort this rule check safely.
How? The rules api can now return RMAP_NOOP (or another enum) to indicate
that it encountered an invalid check, and needs to abort just that rule,
but continue with other rules.
Question: Do we repurpose an existing enum RMAP_OKAY or RMAP_ERROR
as the 3rd state (or create a new enum like RMAP_NOOP)?
RMAP_OKAY and RMAP_ERROR are used to return the result of set cmd.
We chose to go with RMAP_NOOP (but open to ideas),
as a way to bypass the rmap filter
As a result we have a 3rd state:
State3: Received RMAP_NOOP
Then, proceed to other route-map, otherwise return RMAP_PERMITMATCH by default.
Signed-off-by:Lakshman Krishnamoorthy <lkrishnamoor@vmware.com>
It doesn't make much sense for a hash function to modify its argument,
so const the hash input.
BGP does it in a couple places, those cast away the const. Not great but
not any worse than it was.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>