Force flush all ES-EVI PE entries when a L2-VNI is deleted. This will
implicitly free up the remote ES-EVI and deref the ES entry.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
There are two changes in this commit -
1. Maintain a list of global MAC-IP routes per-ES. This list is maintained
for quick processing on the following events -
a. When the first VTEP/PE becomes active in the ES-VRF, the L3 NHG is
activated and the route can be sent to zebra.
b. When there are no active PEs in the ES-VRF the L3 NHG is
de-activated and -
- If the ES is present in the VRF -
The route is not installed in zebra as there are no active PEs for
the ES-VRF
- If the ES is not present in the VRF -
The route is installed with a flat multi-path list i.e. without L3NHG.
This is to handle the case where there are no locally attached L2VNIs
on the ES (for that tenant VRF).
2. Reinstall VRF route when an ES is installed or uninstalled in a
tenant VRF (the global MAC-IP list in #1 is used for this purpose also).
If an ES is present in the VRF we use L3NHG to enable fast-failover of
routed traffic.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This is done to clearly indicate what routes are being linked to
the list i.e. MAC-IP routes in the VNI table.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
In a sym-IRB setup the remote ES may not be installed if the tenant
VRF is not present locally. To allow that case while retaining the
fast-failover benefits for the case where the tenant VRF is locally
present we use the following approach -
1. If ES is present in the tenant VRF we use the L3NHG for installing
the MAC-IP based tenant route. This allows for efficient failover via
L3NHG updates.
2. If the ES is not present locally in the corresponding tenant VRF we
fall back to a non-NHG multi-path based routing approach. In this
case individual routes are updated when the ES links flap.
PS: #1 can be turned off entirely by disabling use-l3-nhg in BGP.
Ticket: CM-30935
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When an ES goes down the MAC-IP route must be updated to remove it from
the tenant VRF routing table. This is because the fast-failover
(via EAD-per-ES withdraw) procedures described in RFC 7432 are only
applicable to L2 forwarding/MAC-ECMP. For L3/routed traffic (in a
sym-IRB setup) failover, individual paths need to be withdrawn.
To handle this difference in L2/L3 requirements BGP updates the MAC-IP
route to include the L3 ECOM if local destination ES is oper-up and
to exclude the L3 ECOM if local ES is oper-down.
Ticket: CM-30935
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. When VNI export RT changes, for each local es_evi, update local
EAD/ES and EAD/EVI routes and advertise.
2. When VNI import RT changes, uninstall all type-1 routes imported in
the VNI and import routes carrying the updated RT.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
When a local ES is in LACP bypass state BGP doesn't advertise
reachability to it i.e. the Type-1/EAD-per-ES routes and Type-4
route for the ES is not advertised. This is the equivalent of
oper-down handling.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Consistency checks are processed in the background using a periodic timer.
Start this timer only if Ethernet Segments are present and consistency
checking is needed.
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Some vendors only advertise EAD-per-ES routes i.e. they do not
advertise EAD-per-EVI routes. To interop with these vendors we need
to relax the dependancy on EAD-per-EVI routes to activate a remote ES-PE.
Sample config -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
router bgp 5553
address-family l2vpn evpn
disable-ead-evi-rx
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Ticket: CM-31177
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
in the rare situation where we allocate the ES during the path link
we fail to check/store the allocated ES pointer thus leading to a
NULL dereference later in the function.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
Two L3 next groups are installed per-VRF per-ES for v4 and v6. These
NHGs are used as an indirect destination for symmetric IRB host routes.
Using L3NHGs allows for efficient failover of an ES (similar to the
use of L2NHGs) i.e. when an ES goes down the number of dataplane
updates are limited to 2xN (where N is the number of tenant VRFs
associated with the ES) instead of updating all host-routes behind the
ES.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Host routes imported into the VRF can have a destination ES (per-VRF)
which is set up as a L3NHG for efficient failover.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. MAC-IP routes in the VPN routing table are linked to the
destination ES for efficient handling for remote ES link flaps.
2. Only MAC-IP paths whose nexthops are active (added via EAD-ES)
are imported into the VRF routing table.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
ES-VRF entries are maintained for the purpose of L3-NHG creation -
1. Each ES-EVI entry is associated with a tenant VRF. This associaton
triggers the creation of an ES-VRF entry.
2. Type-2/MAC-IP routes are imported into a tenant VRF and programmed as
a /32 or host route entry in the dataplane. If the destination of
the host route is a remote-ES the route is programmed with the
corresponding (keyed in by {vrf,ES-id}) L3-NHG.
3. The reason for this indirection (route->L3-NHG, L3-NHG->list-of-VTEPs)
is to avoid route updates to the dplane when a remote-ES link flaps i.e.
instead of updating all the dependent routes the NHG's contents are
updated. This reduces the amount of dataplane updates (fewer nhg updates vs.
route updates) allowing for a faster failover.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When iterating over the bgp_dest table, using this pattern:
for (dest = bgp_table_top(table); dest;
dest = bgp_route_next(dest)) {
If the code breaks or returns in the middle we will not have
properly unlocked the node as that bgp_table_top locks the top
dest and bgp_route_next locks the next dest and unlocks the old
dest.
From code inspection I have found a bunch of places that
we either return in the middle of or a break is issued.
Add appropriate unlocks.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
json_array_string_add is used to add a string entry into a JSON
list. This API is needed by zebra so moving it from bgpd to lib.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
DF (Designated forwarder) election is used for picking a single
BUM-traffic forwarded per-ES. RFC7432 specifies a mechanism called
service carving for DF election. However that mechanism has many
disadvantages -
1. LBs poorly.
2. Doesn't allow for a controlled failover needed in upgrade
scenarios.
3. Not easy to hw accelerate.
To fix the poor performance of service carving alternate DF mechanisms
have been proposed via the following drafts -
draft-ietf-bess-evpn-df-election-framework
draft-ietf-bess-evpn-pref-df
This commit adds support for the pref-df election mechanism which
is used as the default. Other mechanisms including service-carving
may be added later.
In this mechanism one switch on an ES is elected as DF based on the
preference value; higher preference wins with IP address acting
as the tie-breaker (lower-IP wins if pref value is the same).
Sample output
=============
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh bgp l2vpn evpn es 03:00:00:00:00:01:11:00:00:01
ESI: 03:00:00:00:00:01:11:00:00:01
Type: LR
RD: 27.0.0.15:6
Originator-IP: 27.0.0.15
Local ES DF preference: 100
VNI Count: 10
Remote VNI Count: 10
Inconsistent VNI VTEP Count: 0
Inconsistencies: -
VTEPs:
27.0.0.16 flags: EA df_alg: preference df_pref: 32767
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh bgp l2vpn evpn route esi 03:00:00:00:00:01:11:00:00:01
*> [4]:[03:00:00:00:00:01:11:00:00:01]:[32]:[27.0.0.15]
27.0.0.15 32768 i
ET:8 ES-Import-Rt:00:00:00:00:01:11 DF: (alg: 2, pref: 100)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Change thread_cancel to take a ** to an event, NULL-check
before dereferencing, and NULL the caller's pointer. Update
many callers to use the new signature.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The MH datastructures were being released before the paths that were
referencing them. Fix is to do the MH cleanup last.
The MH finish function has also been stripped down to only do a
datastructure cleanup i.e. avoid sending route updates etc.
Ticket: 31376
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Some more of the bgp_node usage snuck in from big commits in
the past month or so from feature work. Do some work
to put it back to bgp_dest for incoming future work.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
because ecommunity structure can host both ext community and ipv6 ext
community, do not forget to set the unit_size field.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
because the same extended community can be used for storing ipv6 and
ipv4 et communities, the unit length must be stored. do not forget to
set the standard value in bgp evpn.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
SYNC routes are paths rxed from a local-ES peer. These routes result in
the installation of local dataplane entries i.e. with access port as
destination (vs. the remote-VTEP destination that results in the packet
being sent via the VxLAN overlay).
If a SYNC path is selected as the best path it is always turned around
into a local path which immediately lowers the status of the SYNC path
to non-best. However we need to keep track of the highest MM seq-number
and peer activity to continue advertising the local path. In order to
do that we need information from the "second-best" SYNC path to be
bubbled up to the local best path. This "SYNC" info is then consolidated
and sent to zebra which is responsible for the MM handling and local
path management.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
A new proxy flag has been added to the already existing NA extended
community to allow proxy advertisment of a local host by a VTEP that is
yet to indpendently establish local reachability.
Reference: draft-rbickhart-evpn-ip-mac-proxy-adv
The extendend mac-mobility sequence number needs to be synced across
the ES peers. However we cannot let a ES-peer path win over a local
path on the same ES. To accomplish that some parameters such as the
MM seq number are bubbled up from the non-best path to the local path.
This mechanism is explained further in the path-selection patch.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This is the base patch that brings in support for Type-1 routes.
It includes support for -
- Ethernet Segment (ES) management
- EAD route handling
- MAC-IP (Type-2) routes with a non-zero ESI i.e. Aliasing for
active-active multihoming
- Initial infra for consistency checking. Consistency checking
is a fundamental feature for active-active solutions like MLAG.
We will try to levarage the info in the EAD-ES/EAD-EVI routes to
detect inconsitencies in access config across VTEPs attached to
the same Ethernet Segment.
Functionality Overview -
========================
1. Ethernet segments are created in zebra and associated with
access VLANs. zebra sends that info as ES and ES-EVI objects to BGP.
2. BGP advertises EAD-ES and EAD-EVI routes for the locally attached
ethernet segments.
3. Similarly BGP processes EAD-ES and EAD-EVI routes from peers
and translates them into ES-VTEP objects which are then sent to zebra
as remote ESs.
4. Each ES in zebra is associated with a list of active VTEPs which
is then translated into a L2-NHG (nexthop group). This is the ES
"Alias" entry
5. MAC-IP routes with a non-zero ESI use the alias entry created in
(4.) to forward traffic i.e. a MAC-ECMP is done to these remote-ES
destinations.
EAD route management (route table and key) -
============================================
1. Local EAD-ES routes
a. route-table: per-ES route-table
key: {RD=ES-RD, ESI, ET=0xffffffff, VTEP-IP)
b. route-table: per-VNI route-table
Not added
c. route-table: global route-table
key: {RD=ES-RD, ESI, ET=0xffffffff)
2. Remote EAD-ES routes
a. route-table: per-ES route-table
Not added
b. route-table: per-VNI route-table
key: {RD=ES-RD, ESI, ET=0xffffffff, VTEP-IP)
c. route-table: global route-table
key: {RD=ES-RD, ESI, ET=0xffffffff)
3. Local EAD-EVI routes
a. route-table: per-ES route-table
Not added
b. route-table: per-VNI route-table
key: {RD=0, ESI, ET=0, VTEP-IP)
c. route-table: global route-table
key: {RD=L2-VNI-RD, ESI, ET=0)
4. Remote EAD-EVI routes
a. route-table: per-ES route-table
Not added
b. route-table: per-VNI route-table
key: {RD=0, ESI, ET=0, VTEP-IP)
c. route-table: global route-table
key: {RD=L2-VNI-RD, ESI, ET=0)
Please refer to bgp_evpn_mh.h for info on how the data-structures are
organized.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Re-org only; no other code changes. This is being done to make maintanence
of MH functionality (which will have more code added to it) easy.
The code moved here was originally committed via -
'commit 50f74cf131 ("*: support for evpn type-4 route")'
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>