For self type-2 routes, do not assign system-rmac
as attribute RMAC value if advertise-pip is disable
or macvlan is not present.
Ticket:CM-26923
Reviewed By:CCR-9397
Testing Done:
pip is disabled under bgp vrf2 instance.
Trigger frr-restart.
Before fix:
*> [2]:[0]:[48]:[00:02:00:00:00:2e]:[32]:[45.0.4.4]
36.0.0.11 32768 i
ET:8 RT:5546:1004 RT:5546:4002 Rmac:00:02:00:00:00:2e
After fix:
*> [2]:[0]:[48]:[00:02:00:00:00:2e]:[32]:[45.0.4.4]
36.0.0.11 32768 i
ET:8 RT:5546:1004 RT:5546:4002 Rmac:44:38:39:ff:ff:01
TOR# ifquery vlan1004
auto vlan1004
iface vlan1004
address 45.0.4.4/24
vlan-id 1004
vrf vrf2
VNI: 4002 (known to the kernel)
Type: L3
Tenant VRF: vrf2
RD: 45.0.6.4:3
Originator IP: 36.0.0.11
Advertise-pip: Yes
System-IP: 27.0.0.11
System-MAC: 00:02:00:00:00:2e
Router-MAC: 44:38:39:ff:ff:01
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
By default announct Self Type-2 routes with
system IP as nexthop and system MAC as
nexthop.
An API to check type-2 is self route via
checking ipv4/ipv6 address from connected interfaces list.
An API to extract RMAC and nexthop for type-2
routes based on advertise-svi-ip knob is enabled.
When advertise-pip is enabled/disabled, trigger type-2
route update. For self type-2 routes to use
anycast or individual (rmac, nexthop) addresses.
Ticket:CM-26190
Reviewed By:
Testing Done:
Enable 'advertise-svi-ip' knob in bgp default instance.
the vrf instance svi ip is advertised with nexthop
as default instance router-id and RMAC as system MAC.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
In L3VNI add callback parse, vrr rmac value.
For non-zero vrr mac value, use it as anycast RMAC
and svi mac as individual rmac value.
If advertise-pip is disable or vrr rmac is not present
use svi mac as anycast rmac value for all routes.
Ticket:CM-26190
Reviewed By:
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Evpn Primary IP advertisement feature uses
individual system IP and system MAC for prefix (type-5)
and self type-2 routes.
The PIP knob is enabled by default for bgp vrf instance.
Configuration CLI for enable/disable PIP feature knob.
User can configure PIP system IP and MAC to retain as
permanent values.
For the PIP IP, the default behavior is to accept bgp default
instance's router-id. When the default instance router-id change,
reflect PIP IP assignment.
Reflect type-5 to use system-IP and system MAC as nexthop and RMAC
values.
Ticket:CM-26190
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
* IPv6 routes received via a ibgp session with one of its own interface as
nexthop are getting installed in the BGP table.
*A common table to be implemented should take cares of both
ipv4 and ipv6 connected addresses.
Signed-off-by: Biswajit Sadhu sadhub@vmware.com
Problem statement:
When IPv4/IPv6 prefixes are received in BGP, bgp_update function registers the
nexthop of the route with nexthop tracking module. The BGP route is marked as
valid only if the nexthop is resolved.
Even for EVPN RT-5, route should be marked as valid only if the the nexthop is
resolvable.
Code changes:
1. Add nexthop of EVPN RT-5 for nexthop tracking. Route will be marked as valid
only if the nexthop is resolved.
2. Only the valid EVPN routes are imported to the vrf.
3. When nht update is received in BGP, make sure that the EVPN routes are
imported/unimported based on the route becomes valid/invalid.
Testcases:
1. At rtr-1, advertise EVPN RT-5 with a nexthop 10.100.0.2.
10.100.0.2 is resolved at rtr-2 in default vrf.
At rtr-2, remote EVPN RT-5 should be marked as valid and should be imported into
vrfs.
2. Make the nexthop 10.100.0.2 unreachable at rtr-2
Remote EVPN RT-5 should be marked as invalid and should be unimported from the
vrfs. As this code change deals with EVPN type-5 routes only, other EVPN routes
should be valid.
3. At rtr-2, add a static route to make nexthop 10.100.0.2 reachable.
EVPN RT-5 should again become valid and should be imported into the vrfs.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
There is a memory leak of the bgp node (route node)
in vni table while processing evpn remote route(s).
During the remote evpn route processing, a new route
is created in per vni route table, the refcount for
the route node is incremented twice. First refcount
is incremented during the node creation and the second
one when the bgp_info_add is added.
Post evpn route creation, the bgp node refcount needs
to be decremented.
Ticket:CM-26898,CM-26838,CM-27169
Reviewed By:CCR-9474
Testing Done:
In EVPN topology send 1K MAC routes then check the memory footprint
at the remote VTEP before sending 1K type-2 routes
and after flushing/withdrawal of the routes.
Before fix:
-----------
Initial memory footprint:
root@TOR1:~# vtysh -c "show memory" | grep "Hash Bucket \|BGP node \|BGP route"
Hash Bucket : 2008 32
BGP node : 182 152
BGP route : 96 112
With 1K MAC (type-2 routes)
root@TOR1:~# vtysh -c "show memory" | grep "Hash Bucket \|BGP node \|BGP route"
Hash Bucket : 6008 32
BGP node : 4182 152
BGP route : 2096 112
After cleaning up 1K MAC entries from source VTEP which triggers BGP withdraw
at the remote VTEP.
root@TOR1:~# vtysh -c "show memory" | grep "Hash Bucket \|BGP node \|BGP route"
Hash Bucket : 4008 32
BGP node : 2182 152 <-- Here 2K delta from initial count.
BGP route : 96 112
With fix:
---------
After 1K MAC entries cleaned up at the remote VTEP, the memory footprint
(BGP Node and Hash Bucket count) is equilibrium to start of the test.
root@TOR1:~# vtysh -c "show memory" | grep "Hash Bucket \|BGP node \|BGP route"
Hash Bucket : 2008 32
BGP node : 182 152
BGP route : 96 112
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
We make the assumption that ->attr is not NULL throughout
the code base. We are totally inconsistent about application
of this though.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In bgp_create_evpn_bgp_path_info we create a bgp_path_info
that should be returned since we need it later.
Found by Coverity Scan.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When creating a bgp_path_info for a type 4 route the pi->extra->parent
and the route node for the originating table were not being locked
properly. This will prevent BGP from not properly cleaning up
the data structures on cleanup.
Possibly every one of the functions that we use to create the
new bgp_path_info's should use an abstracted version of this code,
but I am unsure at this point in time if a type 4 should use the same
or not.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When L3vni is created with prefix-only flag,
the flag is set at bgp vrf instance level.
In the case of bgp instance is non auto created,
means user configured instance (i.e 'router bgp x vrf <name>')
Upon deletion of l3vni, clear the prefix-only flag from
bgp vrf instance.
Ticket:CM-21894
Reviewed By:CCR-9176
Testing Done:
vrf vrf1
vni 104001
exit-vrf
!
router bgp 650030 vrf vrf1
!
tor-21(config)# vrf vrf1
tor-21(config-vrf)# vni 104001 prefix-routes-only
tor-21(config-vrf)# no vni 104001 prefix-routes-only
tor-21(config-vrf)# end
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Problem:
With advertise_svi_ip knob enabled per vni.
Post vni flap, svi MAC-IP route are not originated.
Fix:
When a vni is flapped, upon re-add
send advetise_svi_ip knob to zebra.
Workaround:
re-configure advertise-svi-ip under l2vpn/evpn.
Ticket:CM-26001
Reviewed By:CCR-9118
Testing Done:
With advertise-svi-ip enabled under l2vpn/evpn
in bgp default instance.
Validated vni del/create post ifdown vxlan device
followed by ifup vxlan device.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Evpn extended communities like auto rts (import/export) should
check if its present in list before adding it, to avoid duplicate
addition.
L3vni_add callback from zebra to bgp may see updates to vnis.
The auto import/export rt derivation may call multiple times.
Testing Done:
Before:
TORC11# show bgp l2vpn evpn vni 4001
VNI: 4001 (known to the kernel)
Type: L3
Tenant VRF: vrf1
RD: 45.0.2.2:3
...
Import Route Target:
5546:4001
5546:4001
Export Route Target:
5546:4001
5546:4001
After:
VNI: 4001 (known to the kernel)
Type: L3
Tenant VRF: vrf1
RD: 45.0.2.2:3
...
Import Route Target:
5546:4001
Export Route Target:
5546:4001
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
In a situation where a VRF has configured route targets for importing
EVPN routes, this configuration may exist prior to the VRF being
ready to have EVPN routes installed into it - e.g., still missing
the L3VNI configuration or associated interface information. Ensure
that this is taken into account during EVPN route import and unimport.
Without this fix, EVPN routes would end up being prematurely imported
into the VRF routing table and consequently installed as inactive
(because the nexthop information would be incorrect when BGP informs
zebra).
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Introducing a 3rd state for route_map_apply library function: RMAP_NOOP
Traditionally route map MATCH rule apis were designed to return
a binary response, consisting of either RMAP_MATCH or RMAP_NOMATCH.
(Route-map SET rule apis return RMAP_OKAY or RMAP_ERROR).
Depending on this response, the following statemachine decided the
course of action:
State1:
If match cmd returns RMAP_MATCH then, keep existing behaviour.
If routemap type is PERMIT, execute set cmds or call cmds if applicable,
otherwise PERMIT!
Else If routemap type is DENY, we DENYMATCH right away
State2:
If match cmd returns RMAP_NOMATCH, continue on to next route-map. If there
are no other rules or if all the rules return RMAP_NOMATCH, return DENYMATCH
We require a 3rd state because of the following situation:
The issue - what if, the rule api needs to abort or ignore a rule?:
"match evpn vni xx" route-map filter can be applied to incoming routes
regardless of whether the tunnel type is vxlan or mpls.
This rule should be N/A for mpls based evpn route, but applicable to only
vxlan based evpn route.
Also, this rule should be applicable for routes with VNI label only, and
not for routes without labels. For example, type 3 and type 4 EVPN routes
do not have labels, so, this match cmd should let them through.
Today, the filter produces either a match or nomatch response regardless of
whether it is mpls/vxlan, resulting in either permitting or denying the
route.. So an mpls evpn route may get filtered out incorrectly.
Eg: "route-map RM1 permit 10 ; match evpn vni 20" or
"route-map RM2 deny 20 ; match vni 20"
With the introduction of the 3rd state, we can abort this rule check safely.
How? The rules api can now return RMAP_NOOP to indicate
that it encountered an invalid check, and needs to abort just that rule,
but continue with other rules.
As a result we have a 3rd state:
State3:
If match cmd returned RMAP_NOOP
Then, proceed to other route-map, otherwise if there are no more
rules or if all the rules return RMAP_NOOP, then, return RMAP_PERMITMATCH.
Signed-off-by: Lakshman Krishnamoorthy <lkrishnamoor@vmware.com>
This code is not returned anywhere in the system as that bgp
is by default multiple-instance 'only' now. So remove
the last remaining bits of it from the code base.
Remove BGP_ERR_MULTIPLE_INSTANCE_USED too.
Make bgp_get explicitly return BGP_SUCCESS
instead of 0.
Remove the multi-instance error code too.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Introducing a 3rd state for route_map_apply library function: RMAP_NOOP
Traditionally route map MATCH rule apis were designed to return
a binary response, consisting of either RMAP_MATCH or RMAP_NOMATCH.
(Route-map SET rule apis return RMAP_OKAY or RMAP_ERROR).
Depending on this response, the following statemachine decided the
course of action:
Action: Apply route-map match and return the result (RMAP_MATCH/RMAP_NOMATCH)
State1: Receveived RMAP_MATCH
THEN: If Routemap type is PERMIT, execute other rules if applicable,
otherwise we PERMIT!
Else: If Routemap type is DENY, we DENYMATCH right away
State2: Received RMAP_NOMATCH, continue on to next route-map, otherwise,
return DENYMATCH by default if nothing matched.
With reference to PR 4078 (https://github.com/FRRouting/frr/pull/4078),
we require a 3rd state because of the following situation:
The issue - what if, the rule api needs to abort or ignore a rule?:
"match evpn vni xx" route-map filter can be applied to incoming routes
regardless of whether the tunnel type is vxlan or mpls.
This rule should be N/A for mpls based evpn route, but applicable to only
vxlan based evpn route.
Today, the filter produces either a match or nomatch response regardless of
whether it is mpls/vxlan, resulting in either permitting or denying the
route.. So an mpls evpn route may get filtered out incorrectly.
Eg: "route-map RM1 permit 10 ; match evpn vni 20" or
"route-map RM2 deny 20 ; match vni 20"
With the introduction of the 3rd state, we can abort this rule check safely.
How? The rules api can now return RMAP_NOOP (or another enum) to indicate
that it encountered an invalid check, and needs to abort just that rule,
but continue with other rules.
Question: Do we repurpose an existing enum RMAP_OKAY or RMAP_ERROR
as the 3rd state (or create a new enum like RMAP_NOOP)?
RMAP_OKAY and RMAP_ERROR are used to return the result of set cmd.
We chose to go with RMAP_NOOP (but open to ideas),
as a way to bypass the rmap filter
As a result we have a 3rd state:
State3: Received RMAP_NOOP
Then, proceed to other route-map, otherwise return RMAP_PERMITMATCH by default.
Signed-off-by:Lakshman Krishnamoorthy <lkrishnamoor@vmware.com>
It doesn't make much sense for a hash function to modify its argument,
so const the hash input.
BGP does it in a couple places, those cast away the const. Not great but
not any worse than it was.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
IMET/type-3 routes are used by VTEPs to advertise the flood mode for BUM
traffic via the PMSI tunnel attribute. If a type-3 route is not rxed from
a remote-VTEP we default to "no-head-end-rep" for that remote-VTEP. In such
cases static-config such as PIM is likely used for BUM flooding.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
IMET route is optional if the flood mode is PIM-SM and serves
no functional purpose. So this change limits type-3 route generation
to flood-mode=head-end-replication.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
If PIM-SM if used for BUM flooding the multicast group address can be
configured per-vxlan-device. BGP receives this config from zebra via
the L2 VNI add/update.
Sample output -
root@TORS1:~# vtysh -c "show bgp l2vpn evpn vni 1000" |grep Mcast
Mcast group: 239.1.1.100
root@TORS1:~#
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
During L3VNI add, non-default RD value is not replayed
correctly. Instead of picking non-default value it picks
up auto RD value which is derived based on router-id.
Indentation issue: Remove additional space from
L3VNI running config output.
Ticket:CM-24320
Reviewed By:CCR-8437
Testing Done:
Bring up evpn configuration with L3vni up with non-default
RD value, perform peerlink flap, l3vni flap which removes
all VNIS and readds with RD and RT values.
The configured RD and RTs are replayed.
Post L3VNI flap
router bgp 5546 vrf vrf2
!
address-family l2vpn evpn
rd 45.0.66.2:6
route-target import 20001:1
route-target export 20001:1
exit-address-family
TORC11# show bgp l2vpn evpn vni 4002
VNI: 4002 (known to the kernel)
Type: L3
Tenant VRF: vrf2
RD: 45.0.66.2:6
Originator IP: 36.0.0.11
Advertise-gw-macip : n/a
Import Route Target:
20001:1
Export Route Target:
20001:1
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
When a bgp-peer comes up prior to l3vnis are up in bgpd.
The EVPN routes (type-2/type-5) are learnt via peer.
The routes can have one of interface's MAC in rmac attribute.
The self rmac check would bypass as l3vni is not present.
Once l3vni has come up in bgpd, while installing evpn
routes in vrf table, perform rmac attribute check against self mac.
The routes with rmac of ours will be removed via re-scan
of routes during bgp_mac_rescan_all_evpn_tables when
interface mac is added to bgp.
Ticket:CM-24224
Reviewed By:CCR-8423
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulunetworks.com>
Rename {bgp,zvrf}_def{ault} to {bgp,zvrf}_evpn where it makes sense,
i.e. when they contain the EVPN instance.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
For default RT, this uses the correct ASN to derive the RT (ASN of the
EVPN VRF).
It also stores them in the EVPN VRF's hash tables rather than in the
default's one.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
This change stores the mapping in the hash table of the EVPN VRF rather
than the one of the default VRF.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
This sends local routes in overlay VRFs to the EPVN VRF when
redistribute configurations are present, rather than to the default VRF.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
Refine check on whether a route can be injected into EVPN to allow
EVPN-sourced routes to be injected back into another instance.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
IPv4 or IPv6 unicast routes which are imported from EVPN routes
(type-2 or type-5) and installed in a BGP instance can be leaked
to another instance.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the case of EVPN symmetric routing, the tenant VRF is associated with
a VNI that is used for routing and commonly referred to as the L3 VNI or
VRF VNI. Corresponding to this VNI is a VLAN and its associated L3 (IP)
interface (SVI). Overlay next hops (i.e., next hops for routes in the
tenant VRF) are reachable over this interface.
https://tools.ietf.org/html/draft-ietf-bess-evpn-prefix-advertisement
section 4.4 provides additional description of the above constructs.
The implementation currently derives this L3 interface for EVPN tenant
routes using special code that looks at route flags. This patch
exchanges the L3 interface between zebra and bgpd as part of the L3-VNI
exchange in order to eliminate some this special code.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Withdraw flag is not sufficient to call bgp_update vs. bgp_withdraw()
processing for a given BGP evpn update message.
When a bgp update needs to be treated as an implicit withdraw
(e.g., due to malformed attribute), the code wasn't handling
things properly.
Rearranging attribute pass field to type-5 route processing and aligning
similar to done for other routes (type2/type-3).
Ticket:CM-24003
Reviewed By:CCR-8330
Testing Done:
Singed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Currently we are hardcoding it at the time of attr building to
ingress-replication. This is just a code clean-up and has no
functional impact.
Ticket: CM-23790
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Executed some evpn related tests with valgrind and saw some errors
related to uninitialized memory and overlapping memcpy. This commit
fixes those.
Ticket: CM-21218
Signed-off-by: Nitin Soni <nsoni@cumulusnetworks.com>
Reviewed-by: CCR-8249
When an inactive-neigh delete is rxed bgp will not have a local path to
remove (and re-run path selection). Instead it simply re-installs the
current best remote path if any.
Ticket: CM-23018
Testing Done: evpn-min
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Move the info filling for zebra mac-ip install (sent by bgpd) to a
common place.
The commit also fixes missing ROUTER flag for one of the cases
added in a code branch that doesn't have the ROUTER changes -
[
6d8c603a
bgpd: use IP address as tie breaker if the MM seq number is the same
]
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
During L3VNI add delete, configured non-default
route-target is not replayed correctly.
Non-default route-target should only be deleted
during unconfiguring under bgp vrf instance,
during delete of l3vni only unmap from the VRF.
during addition of l3vni map back to the VRF
Ticket:CM-21482
Testing Done:
Bring up evpn configuration with L3vni up with
non-default route-target.
Perform delete/add of L3vni and validated non-default
route-target is mapped back to vrf.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
EVPN route's attribute changes,
mark attribute change flag to imported unicast route.
A scenario where AS_PATH attribute have changed for an EVPN type-5
route, set attribute change
to imported route.
Ticket:CM-23008
Reviewed By:
Testing Done:
Validated via marking EVPN route with AS_PATH prepand.
At the receiving VTEP, ensure attribute change flag is set to
imported unicast route and bgp update sent to VTEPs subsequent
bgp peers with AS_PATH prepend update.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
When "default-originate ipv4" is configured, a type-5 route is installed in
the local node and advertised to the peer with auto-rd.
When the above was followed by configuring an RD in IP VRF, Type-5 are
generated for only the non-default routes.
Fixed this issue by withdrawing the default route with auto-rd and advertising
the route with confiured RD.
Signed-off-by: Kishore Aramalla karamalla@vmware.com
Duplicate address detection configuration clis
under bgp l2vpn evpn config mode.
- Enabled/Disable (global knob) for feature.
- Configure cli for duplicate detection action
freeze and freze until time (auto-recovery).
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Enable/disable duplicate address detection
there are 3 actions
warning-only: Default action which generates
only frr warning (syslog) to user for any
duplicate detecton
freeze: Permanently freezes address, manual
intervene required.
freeze with time: An address will recover once
the time has expired (auto-recovery).
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The bgp_info data is stored as a void pointer in `struct bgp_node`.
Abstract retrieval of this data and setting of this data
into functions so that in the future we can move around
what is stored in bgp_node.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The motivation for this patch is to address a concerning behavior of
tx-addpath-bestpath-per-AS. Prior to this patch, all paths' TX ID was
pre-determined as the path was received from a peer. However, this meant
that any time the path selected as best from an AS changed, bgpd had no
choice but to withdraw the previous best path, and advertise the new
best-path under a new TX ID. This could cause significant network
disruption, especially for the subset of prefixes coming from only one
AS that were also communicated over a bestpath-per-AS session.
The patch's general approach is best illustrated by
txaddpath_update_ids. After a bestpath run (required for best-per-AS to
know what will and will not be sent as addpaths) ID numbers will be
stripped from paths that no longer need to be sent, and held in a pool.
Then, paths that will be sent as addpaths and do not already have ID
numbers will allocate new ID numbers, pulling first from that pool.
Finally, anything left in the pool will be returned to the allocator.
In order for this to work, ID numbers had to be split by strategy. The
tx-addpath-All strategy would keep every ID number "in use" constantly,
preventing IDs from being transferred to different paths. Rather than
create two variables for ID, this patch create a more generic array that
will easily enable more addpath strategies to be implemented. The
previously described ID manipulations will happen per addpath strategy,
and will only be run for strategies that are enabled on at least one
peer.
Finally, the ID numbers are allocated from an allocator that tracks per
AFI/SAFI/Addpath Strategy which IDs are in use. Though it would be very
improbable, there was the possibility with the free-running counter
approach for rollover to cause two paths on the same prefix to get
assigned the same TX ID. As remote as the possibility is, we prefer to
not leave it to chance.
This ID re-use method is not perfect. In some cases you could still get
withdraw-then-add behaviors where not strictly necessary. In the case of
bestpath-per-AS this requires one AS to advertise a prefix for the first
time, then a second AS withdraws that prefix, all within the space of an
already pending MRAI timer. In those situations a withdraw-then-add is
more forgivable, and fixing it would probably require a much more
significant effort, as IDs would need to be moved to ADVs instead of
paths.
Signed-off-by Mitchell Skiba <mskiba@amazon.com>
Allow some debug notification when we are unable to talk
to zebra due to the connection not being there yet.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This change is a fixup to -
7b5e18 - bgpd: use IP address as tie breaker if the MM seq number is the
same
And is being done in response to review comments. This commit brings no
functional change; simply moves around code for easier maintanence.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Same sequence number handling is specified by RFC 7432 -
[
If two (or more) PEs advertise the same MAC address with the same
sequence number but different Ethernet segment identifiers, a PE that
receives these routes selects the route advertised by the PE with the
lowest IP address as the best route.
If the PE is the originator of the MAC route and it receives the same
MAC address with the same sequence number that it generated, it will
compare its own IP address with the IP address of the remote PE and
will select the lowest IP. If its own route is not the best one, it
will withdraw the route.
]
To implement that specification this commit uses nexthop IP as a tie
breaker between two paths of equal seq number with lower IP winning.
Now if a local path already exists with the same sequence number but higher
(local-VTEP) IP it is evicted (deleted and withdrawn from the peers) and
the winning new remote path is installed in zebra. This is existing code
and handled implicitly via evpn_route_select_install.
If a local path is rxed from zebra with the same sequence as the
current remote winner it is rejected (not installed in the bgp
routing tables) and zebra is asked to re-install the older/remote winner.
This is a race condition that can only happen if bgp's add and zebra's add
cross paths. Additional handling has been added in this commit via
evpn_cleanup_local_non_best_route to take care of the race condition.
Ticket: CM-22674
Reviewed By: CCR-7937
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This is needed to install the remote dst when a more preferred local
path is removed.
Ticket: CM-22685
Reviewed By: CCR-7936
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The ->hash_cmp and linked list ->cmp functions were sometimes
being used interchangeably and this really is not a good
thing. So let's modify the hash_cmp function pointer to return
a boolean and convert everything to use the new syntax.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The evpn_vtep_ip_cmp function must return positive and negative
numbers for when we are doing sorted linked list inserts.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The purpose of adding a l2vni as an sorted list is
shot in the foot when the l2vni compare function only
returns 0 or 1. This will cause subtle crashes when
we add sorted and we end up with multiple list node pointing
to the same thing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add the '[no] flood <disable|head-end-replication>' command
to the l2vpn evpn afi/safi sub commands for bgp. This command
when entered as 'flood disable' will turn off type 3 route
generation for the transmittal of the type 3 route necessary
for BUM replication on the remote VTEP. Additionally it will
turn off the BUM handling via the new zebra command,
ZEBRA_VXLAN_FLOOD_CONTROL.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Do a straight conversion of `struct bgp_info` to `struct bgp_path_info`.
This commit will setup the rename of variables as well.
This is being done because `struct bgp_info` is not descriptive
of what this data actually is. It is path information for routes
that we keep to build the actual routes nexthops plus some extra
information.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem encountered where using the aggregate-address command in an
evpn environment did not work properly. Depending on the order of
actions, the aggregate may not be created or removed when either the
commands were issued or routes come and go.
Ticket: CM-20585
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Scan all bgp vrf instances and respective L3VNI against the VNI which is being configured.
Ticket:CM-21859
Testing Done:
Configure l3vni,
try to configure same vni as l2vni under router bgp, address-family
l2vpn evpn.
The configuration is rejected.
show evpn vni
VNI Type VxLAN IF # MACs # ARPs # Remote VTEPs Tenant VRF
4001 L3 vx-4001 0 0 n/a vrf1
TOR(config)# router bgp 5546
TOR(config-router)# address-family l2vpn evpn
TOR(config-router-af)# vni 4001
% Failed to create VNI
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Implement procedures similar to what is specified in
https://tools.ietf.org/html/draft-malhotra-bess-evpn-irb-extended-mobility
in order to support extended mobility scenarios in EVPN. These are scenarios
where a host/VM move results in a different (MAC,IP) binding from earlier.
For example, a host with an address assignment (IP1, MAC1) moves behind a
different PE (VTEP) and has an address assignment of (IP1, MAC2) or a host
with an address assignment (IP5, MAC5) has a different assignment of (IP6,
MAC5) after the move. Note that while these are described as "move" scenarios,
they also cover the situation when a VM is shut down and a new VM is spun up
at a different location that reuses the IP address or MAC address of the
earlier instance, but not both. Yet another scenario is a MAC change for an
attached host/VM i.e., when the MAC of an attached host changes from MAC1 to
MAC2. This is necessary because there may already be a non-zero sequence
number associated with MAC2. Also, even though (IP, MAC1) is withdrawn before
(IP, MAC2) is advertised, they may propagate through the network differently.
The procedures continue to rely on the MAC mobility extended community
specified in RFC 7432 and already supported by the implementation, but
augment it with a inheritance mechanism that understands the relationship
of the host MACIP (ARP/neighbor table entry) to the underlying MAC (MAC
forwarding database entry). In FRR, this relationship is understood by the
zebra component which doubles as the "host mobility manager", so the MAC
mobility sequence numbers are determined through interaction between bgpd
and zebra.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
backet->data must be non-NULL( look at hash_get ) as such
we do not need to check for NULL values for this when
we retrieve data from the backet.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ensure that the presence of L3VNI is checked before we generate
Router MAC and L3 Route Target extended communities. Without this
check, the router would send an all-zeros RMAC in some situations,
which may cause problems for receivers.
Ticket: CM-21014
Testing Done:
a) Verification of failed scenario
b) Interop verification by Scott Laffer
c) evpn-smoke
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
EVPN ND ext community support NA flag R-bit, to have proxy ND.
Set R-bit in EVPN NA if a given router is default gateway or there is a
local
router attached, which can be determine based on local neighbor entry.
Implement BGP ext community attribute to generate and parse R-bit and
pass along zebra to program neigh entry in kernel.
Upon receiving MAC/IP update with community type 0x06 and sub_type 0x08,
pass the R-bit to zebra to program neigh entry.
Set NTF_ROUTER in neigh entry and inform kernel to do proxy NA for EVPN.
Ref:
https://tools.ietf.org/html/draft-ietf-bess-evpn-na-flags-01
Ticket:CM-21712, CM-21711
Reviewed By:
Testing Done:
Configure Local vni enabled L3 Gateway, which would act as router,
checked
show evpn arp-cache vni x ip <ip of svi> on originated and remote VTEPs.
"Router" flag is set.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
This commit removes various parts of the bgpd implementation code which
are unused/useless, e.g. unused functions, unused variable
initializations, unused structs, ...
Signed-off-by: Pascal Mathis <mail@pascalmathis.com>
In Symmetric routing case, L3VNI stores evpn MAC_IP routes
as IP_PREFIX routes in associated bgp_vrf and fib.
When vxlan device for l3vni goes down, triggers l3vni delete
in bgp.
As part l3vni delete, evpn ip prefix routes associated
with the vni need to be withdrawn from zebra as well
bgpinfo needs to be freed.
bgp_delete does not free bgp_info associated
to evpn ip prefix routes (link to bgp_vrf).
Call to uninstall_evpn_route_entry_in_vrf() properly
cleanup bgp_info as well triggers appropriate updates.
Ticket:CM-21443
Testing Done:
On DUT, bringup symmetric routing configuration, learn
EVPN Type-2 and Type-3 Routes.
Type-2 MAC_IP routes will be stored as ip_prefix in vrf table
during l3vni bring up.
Remove L3vni, deletes all ip_prefix routes from the zebra, kernel
vrf route table and bgp_info is freed.
Check the show bgp memory stats for bgp_info post l3vni flap.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>