Firstly, *keep no change* for `hash_get()` with NULL
`alloc_func`.
Only focus on cases with non-NULL `alloc_func` of
`hash_get()`.
Since `hash_get()` with non-NULL `alloc_func` parameter
shall not fail, just ignore the returned value of it.
The returned value must not be NULL.
So in this case, remove the unnecessary checking NULL
or not for the returned value and add `void` in front
of it.
Importantly, also *keep no change* for the two cases with
non-NULL `alloc_func` -
1) Use `assert(<returned_data> == <searching_data>)` to
ensure it is a created node, not a found node.
Refer to `isis_vertex_queue_insert()` of isisd, there
are many examples of this case in isid.
2) Use `<returned_data> != <searching_data>` to judge it
is a found node, then free <searching_data>.
Refer to `aspath_intern()` of bgpd, there are many
examples of this case in bgpd.
Here, <returned_data> is the returned value from `hash_get()`,
and <searching_data> is the data, which is to be put into
hash table.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
The bounded vrf of `l2vni/zevpn` have wrong relation with the order
in which vxlan interface and svi interface are set.
If set vxlan interface with vlanid first, then set svi interface with
vrf, it is ok that vxlan interface will get correct `vrf` inherited
from svi. But reverse the set sequence (i.e. set svi first, then vxlan),
vxlan interface can't get correct `vrf`, becasue the handling of
`ZEBRA_VXLIF_VLAN_CHANGE` missed inheritting `vrf` by mistake.
```
host# do show evpn vni 101
VNI: 101
Type: L2
Tenant VRF: vrf1
```
So update `vrf` ("Tenant VRF") of l2vni in `zebra_vxlan_if_update()`.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Like `zvni_map_to_svi_ns()` for `ns_walk_func()`, just use "assert"
instead of unnecessary check.
Since these parameters for `ns_walk_func()`, e.g. `in_param` and others,
must not be NULL. So use `assert` to ensure the these parameters, and
remove those unnecessary checks.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Since `zvni_map_to_svi_ns()` is used to find and return one specific interface
based on passed attributes of SVI, so the two parameters `in_param` and `p_ifp`
must not be NULL.
Passing NULL `p_ifp` makes no sense, so the check `if (p_ifp)` is
unnecessary.
So use `assert` to ensure the two parameters, and remove that unnecessary check.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Upon 'no advertise-all-vni', cleanup l2vni from
its tenant-vrf's l3vni list, instead of passed
zvrf->l3vni which will not be present in case
of default instance.
Reviewed By:
Testing Done:
Before Fix:
----------
TORC12(config-router-af)# advertise-all-vni
TORC12(config-router-af)# end
TORC12# show evpn vni 4001
VNI: 4001
Type: L3
Tenant VRF: vrf1
Vxlan-Intf: vni4001
State: Up
Router MAC: 44:38:39:ff:ff:01
L2 VNIs: 134217728 0 1000 1002 <-----
After Fix:
----------
TORC12# show evpn vni 4001
VNI: 4001
Type: L3
Tenant VRF: vrf1
Vxlan-Intf: vni4001
State: Up
Router MAC: 44:38:39:ff:ff:01
L2 VNIs: 1000 1002
Signed-off-by: Chirag Shah <chirag@nvidia.com>
RMAC keeping list of nexthops to keep track
of its existiance, remove the (old way) host prefix
mapping.
Ticket: #2798406
Reviewed By:
Testing Done:
TORS1# show evpn rmac vni 4001 mac 44:38:39:ff:ff:01
MAC: 44:38:39:ff:ff:01
Remote VTEP: 36.0.0.11
Refcount: 0
Prefixes:
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Keep the list of remote-vteps/nexthops in
rmac db.
Problem:
In CLAG deployment there might be a situation
where CLAG secondary sends individual ip as nexthop
along with anycast mac as RMAC. This combination
is updated in zebra's rmac cache.
Upon recovery at clag secondary sends withdrawal
of the incorrect rmac and nexthop mapping.
The RMAC entry mapping to nh is not cleaned up properly
in the zebra rmac cache.
Fix:
Zebra rmac db needs to maintain a list of nexthops.
When a bgp withdrawal for rmac to nexthop mapping
is received, remove the old nexthop from the rmac's nh
list and if the host reference still remains for
the RMAC,fall back to the nexthop one remaining in
the list.
At most you expect two nexthops mapped to RMAC
(in clag deployment).
Ticket: 2798406
Reviewed By:
Testing Done:
CLAG primary and secondary have advertise-pip enabled
advertise type-5 route (default route) with
individual IP as nh and individual svi mac as rmac.
- disable advertise pip on both clag devices, this
results in advertisement of routes with anycast ip as nh
and anycast mac as rmac.
- disable peerlink on clag primary, this triggers
clag secondary to (transitory) send bgp update with
individual ip as nh and anycast mac as rmac.
- At the remote vtep:
Check the zebra's rmac cache/nh mapping correctly
and points to anycast rmac and anycast ip as nh of the
clag system.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Since f60a1188 we store a pointer to the VRF in the interface structure.
There's no need anymore to store a separate vrf_id field.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Problem:
L2-VNI SVI down followed by L2-VNI's vxlan device
deletion leads to stale entry into L3VNI's
L2-VNI list.
Solution:
When L2-VNI associated SVI is down, default vrf
is the new tenant vrf.
Remove L2-VNI from L3VNI's l2vni list as
L3VNI/VRF is no longer valid in absence of associated
SVI.
When SVI is up re-add L2-VNI into associated VRF's
L3VNI.
The above remove/add from the L3VNI's L2VNI list is
already done when vxlan or L2-VNI is flaped, just need
to handle when SVI is flapped.
Ticket:#2817127
Reviewed By:
Testing Done:
After deleting SVI following by L2-VNI deletion,
L3VNI's L2-VNI list delets the L2-VNI. (no stale entry).
After adding back SVI/L2-VNI, L3VNI list adds back the
L2-VNI and it is associated right tenant VRF.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
if_lookup_by_index_all_vrf doesn't work correctly with netns VRF backend
as the same index may be used in multiple netns simultaneously.
In both case where it's used, we know the VRF in which we need to lookup
for the interface.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
when running bgp evpn rt5 setup, the Rmac sent in BGP updates
stands for the MAC address of the bridge interface. After
having loaded frr configuration, the Rmac address is not refreshed.
This issue can be easily reproduced by executing some commands:
ip netns exec cust1 ip link set dev br1000 address 2e🆎45:aa:bb:cc
Actually, the BGP EVPN contexts are kept unchanged.
That commit proposes to fix this by intercepting the mac address
change, and refreshing the vxlan interfaces attached to te bridge
interface that changed its MAC address.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When running bgp evpn rt5 setup with vrf namespace backend, once the
BGP configuration loaded, some refresh like the config change of a
vxlan interface is not taken into account. As consequence, the BGP
l2vpn evpn entries are empty. This can happen by recreating vxlan
interface like follows:
ip netns exec cust1 ip li del vxlan1000
ip link add vxlan1000 type vxlan id 1000 dev loopback0 local 10.209.36.1 learning
ip link set dev vxlan1000 mtu 9000
ip link set dev vxlan1000 netns cust1
ip netns exec cust1 bash
ip link set dev vxlan1000 up
ip link set dev vxlan1000 master br1000
Actually, changing learning attribute requires recreation, and this
change needs to manually reload the frr configuration.
The update mechanism in zebra about vxlan interface updates is
already put in place, but it does not work well with namespace
based vrf backend. The function zl3vni_from_svi() is then
modified to parse all the interfaces of each namespace.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
v4 and v6 host/refernce prefixes need to be setup separately for
[RMAC, VTEP] entries as the VTEP is always normalized to a v4 addr.
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Currently 'show evpn rmac vni .. mac .. json' includes fields for
localSequence and remoteSequence, which are misleading since they
aren't applicable to a macs in the IP-VRF mac table (RMAC).
This removes the localSequence + remoteSequence fields from the output.
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
Move remote VTEP updates from immediate, inline processing
in their ZAPI message handlers to the main workqueue.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Enqueue incoming vxlan remote macip updates on the main
workqueue, instead of performing the updates immediately,
in-line.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
FPM sends VNI to the data plane with the EVPN prefix. For pure type-5 EVPN
route, nexthop interface of EVPN prefix is L3VNI SVI. Thus, we encode L3VNI
corresponding to the nexthop vrf with rtmsg for this prefix.
For EVPN type-5 route with gateway IP overlay index, we supporting
asymmetric IRB. Thus, nexthop interface is L2VNI SVI. So, instead of fetching
vrf VNI, fetch VNI corresponding to the nexthop SVI and encode it in the rtmsg
for EVPN prefix.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
SVI ifindex for L2VNI is required in BGP to perform EVPN type-5 to type-2
recusrsive resolution using gateway IP overlay index.
Program this svi_ifindex in struct zebra_vni_t as well as in struct bgpevpn
Changes include:
1. Add svi_if field to struct zebra_evpn_t
2. Add svi_ifindex field to struct bgpevpn
3. When SVI (bridge or VLAN) is bound to a VxLAN interface, store it in the
zebra_evpn_t structure.
4. Add this SVI ifindex to ZEBRA_VNI_ADD
5. Store svi_ifindex in struct bgpevpn
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
When clagd is stopped on secondary device,
all vxlan interfaces (vnis) are kept in protodown state.
FRR treats protodown vxlan interfaces (vnis) as interface down
and sends vni delete to bgpd.
In the event of clagd down, SVIs are flapping as underlying
bridge is going through churn.
When FRR receives SVI up notification do not trigger event to bgpd
if vnis are operationaly down.
Ticket:#2600210 CM-22929
Reviewed By:CCR-11544
Testing Done:
Performed CLAG stop/start on secondary device, all vxlan devices
remained in protodown along with this validated the vnis are cleaned up
and added back in bgpd.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
When creating a large number of vrf's we are creating a fairly
large number of hash tables per vrf. Reduce memory usage on
startup as well as let us identify the table these things come
from.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is always a 16 bit unsigned value.
- signed int is the wrong type to use
- encoding a signed int as a uint32 is bad practice
- decoding a signed int encoded as a uint32 into a uint16 is bad
practice
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
EVPN nexthops are installed as remote neighs by zebra. This was earlier
done only via VRF IPvX uni routes imported from EVPN routes.
With EVPN-MH these VRF routes now reference a L3NHG which is setup based
on the EAD and doesn't include the RMAC. To workaround that BGP now
consolidates and maintains EVPN nexthops which are then sent to zebra.
zebra sets up these nexthops as L3-VNI nh entries using a dummy type-1
route as reference.
Ticket: CM-31398
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This one also needed a bit of shuffling around, but MTYPE_RE is the only
one left used across file boundaries now.
Signed-off-by: David Lamparter <equinox@diac24.net>
When an ES-bond comes out of bypass FRR needs to flush the local MACs learnt
while the bond was in bypass. To do that efficiently local MACs are linked
to the dest-access port. This only happens if the access-port is in
LACP-bypass or if it is non-ES.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Feature overview:
=================
A 802.3ad bond can be setup to allow lacp-bypass. This is done to enable
servers to pxe boot without a LACP license i.e. allows the bond to go oper
up (with a single link) without LACP converging.
If an ES-bond is oper-up in an "LACP-bypass" state MH treats it as a non-ES
bond. This involves the following special handling -
1. If the bond is in a bypass-state the associated ES is placed in a
bypass state.
2. If an ES is in a bypass state -
a. DF election is disabled (i.e. assumed DF)
b. SPH filter is not installed.
3. MACs learnt via the host bond are advertised with a zero ESI.
When the ES moves out of "bypass" the MACs are moved from a zero-ESI to
the correct non-zero id. This is treated as a local station move.
Implementation:
===============
When (a) an ES is detached from a hostbond or (b) an ES-bond goes into
LACP bypass zebra deletes all the local macs (with that ES as destination)
in the kernel and its local db. BGP re-sends any imported MAC-IP routes
that may exist with this ES destination as remote routes i.e. zebra can
end up programming a MAC that was perviously local as remote pointing
to a VTEP-ECMP group.
When an ES is attached to a hostbond or an ES-bond goes
LACP-up (out of bypss) zebra again deletes all the local macs in the
kernel and its local db. At this point BGP resends any imported MAC-IP
routes that may exist with this ES destination as sync routes i.e.
zebra can end up programming a MAC that was perviously remote
as local pointing to an access port.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Added support for advertising SVI MAC if EVPN-MH is enabled.
In the case of EVPN MH arp replies from an attached server can be sent to
the ES-peer. To prevent flooding of the reply the SVI MAC needs to be
advertised by default.
Note:
advertise-svi-ip could have been used as an alternate way to advertise
SVI MAC. However that config cannot be turned on if SVI IPs are
re-used (which is done to avoid wasting IP addresses in a subnet).
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
SVI IP is being advertised unconditionally i.e. even if disabled (and
that is the default config). This can be problematic when the SVI address
is re-used across racks.
Added the user config condition in all the relevant places where the
SVI advertisement is triggered.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Neither tabs nor newlines are acceptable in syslog messages. They also
break line-based parsing of file logs.
Signed-off-by: David Lamparter <equinox@diac24.net>
the old VXLAN function for local MAC deletion was still in
existence and being called from the VXLAN code whilst the new
generic function was not being called at all. Resolve this so
the generic function matches the old function and is called
exclusively.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
If a netlink/dp notification is rxed for a neigh without the peer-sync
flag FRR re-installs the entry with the right flags. This change is
needed to handle cases where the dataplane and FRR may fall out of
sync because of neigh learning on the network ports (i.e. via
the VxLAN).
Ticket: CM-30693
The problem was found during VM mobility "torture" tests where 100s
of extended VM moves were done.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
After removal of L3VNI config, the VNI should become an L2VNI if a VxLAN
interface is present for the VNI. This case is not handled in the code.
Changes:
1. After unconfiguring L3VNI, create an L2VNI if VxLAN interface is present
for the VNI.
2. Trigger an update to BGP.
3. Read MAC and ARP entries from kernel.
This PR fixes the issue only for route type-2, 3 and 5. This PR does not address
states regarding route type-1, 4 and multicast group for VxLAN interface.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
When a local ES flaps there are two modes in which the local
MACs are failed over -
1. Fast failover - A backup NHG (ES-peer group) is programmed in the
dataplane per-access port. When a local ES flaps the MAC entries
are left unaltered i.e. pointing to the down access port. And the
dataplane redirects traffic destined to the oper-down access port
via the backup NHG.
2. Slow failover - This mode needs to be turned on to allow dataplanes
not capable of re-directing traffic. In this mode local MAC entries
on a down local ES are re-programmed to point to the ES-peers'
NHG. And vice-versa i.e. when the ES comes up the MAC entries
are re-programmed with the access port as dest.
Fast failover is on by default. Slow failover can be enabled via the
following config -
evpn mh redirect-off
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
There are two fixes in this commit -
1. Prevent implicit deletion of (*,G) entries during (S,G) cleanup.
This is done by creating a dummy reference on all (*,G) entries.
This is needed for a hash-walk based table cleanup.
2. Free up the SG hash table when the VRF is deleted.
Ticket: CM-30151
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
DAD is not supported currently with EVPN-MH so we turn it off internally
when the first ES config is detected.
PS: Note that when all local ESs are deleted DAD will stay off and
will need to be cleared via a daemon restart.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Local ethernet segments are held in a protodown or error-disabled state
if access to the VxLAN overlay is not ready -
1. When FRR comes up the local-ESs/access-port are kept protodown
for the startup-delay duration. During this time the underlay and
EVPN routes via it are expected to converge.
2. When all the uplinks/core-links attached to the underlay go down
the access-ports are similarly protodowned.
The ES-bond protodown state is propagated to each ES-bond member
and programmed in the dataplane/kernel (per-bond-member).
Configuring uplinks -
vtysh -c "conf t" vtysh -c "interface swp4" vtysh -c "evpn mh uplink"
Configuring startup delay -
vtysh -c "conf t" vtysh -c "evpn mh startup-delay 100"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
EVPN protodown display -
========================
root@torm-11:mgmt:~# vtysh -c "show evpn"
L2 VNIs: 10
L3 VNIs: 3
Advertise gateway mac-ip: No
Advertise svi mac-ip: No
Duplicate address detection: Disable
Detection max-moves 5, time 180
EVPN MH:
mac-holdtime: 60s, neigh-holdtime: 60s
startup-delay: 180s, start-delay-timer: 00:01:14 <<<<<<<<<<<<
uplink-cfg-cnt: 4, uplink-active-cnt: 4
protodown: startup-delay <<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
ES-bond protodown display -
===========================
root@torm-11:mgmt:~# vtysh -c "show interface hostbond1"
Interface hostbond1 is up, line protocol is down
Link ups: 0 last: (never)
Link downs: 1 last: 2020/04/26 20:38:03.53
PTM status: disabled
vrf: default
OS Description: Local Node/s torm-11 and Ports swp5 <==> Remote Node/s hostd-11 and Ports swp1
index 58 metric 0 mtu 9152 speed 4294967295
flags: <UP,BROADCAST,MULTICAST>
Type: Ethernet
HWaddr: 00:02:00:00:00:35
Interface Type bond
Master interface: bridge
EVPN-MH: ES id 1 ES sysmac 00:00:00:00:01:11
protodown: off rc: startup-delay <<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
ES-bond member protodown display -
==================================
root@torm-11:mgmt:~# vtysh -c "show interface swp5"
Interface swp5 is up, line protocol is down
Link ups: 0 last: (never)
Link downs: 3 last: 2020/04/26 20:38:03.52
PTM status: disabled
vrf: default
index 7 metric 0 mtu 9152 speed 10000
flags: <UP,BROADCAST,MULTICAST>
Type: Ethernet
HWaddr: 00:02:00:00:00:35
Interface Type Other
Master interface: hostbond1
protodown: on rc: startup-delay <<<<<<<<<<<<<<<<
root@torm-11:mgmt:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Zebra's clear duplicate detect command is rpc converted.
There is condition where cli fails with human readable message.
Using northboun's errmsg buffer to display error message to
user.
Testing:
bharat# clear evpn dup-addr vni 1002 ip 2011:11::11
Error type: generic error
Error description: Requested IP's associated MAC aa:aa:aa:aa:aa:aa is still in duplicate state
bharat# clear evpn dup-addr vni 1002 ip 11.11.11.11
Error type: generic error
Error description: Requested IP's associated MAC aa:aa:aa:aa:aa:aa is still in duplicate state
Signed-off-by: Chirag Shah <chirag@nvidia.com>
With l2vni flap leading to duplicate entry creation
in l3vni's l2vni-list.
Use list sorted add with no duplicates.
root@TORC11:mgmt:~# show evpn vni 4001
VNI: 4001
Type: L3
Tenant VRF: vrf1
State: Up
...
L2 VNIs: 1000 1000 1000 0 0 1002
root@TORC11:mgmt:~# ip link set down vx-1002
root@TORC11:mgmt:~# ip link set up vx-1002
root@TORC11:mgmt:~# show evpn vni 4001
VNI: 4001
Type: L3
Tenant VRF: vrf1
State: Up
...
L2 VNIs: 1000 1000 1000 0 0 1002 1002
Ticket:CM-31545
Reviewed By:
Testing Done:
With Fix:
Multiple time flaps vni counts remained the same.
root@TORC11:mgmt:~# ip link set down vx-1002
root@TORC11:mgmt:~# ip link set up vx-1002
root@TORC11:mgmt:~# ip link set down vx-1002
root@TORC11:mgmt:~# ip link set up vx-1002
root@TORC11:mgmt:~# net show evpn vni 4001
VNI: 4001
Type: L3
Tenant VRF: vrf1
State: Up
...
L2 VNIs: 1000 1002
Signed-off-by: Chirag Shah <chirag@nvidia.com>
the walk routine is used by vxlan service to identify some contexts in
each specific network namespace, when vrf netns backend is used. that
walk mechanism is extended with some additional paramters to the walk
routine.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when duplicate address detection is observed, some incrementation,
some timing mechanisms need to be done. For that the main evpn
configuration is retrieved. Until now, the VRF that was storing the dad
config parameters was the same VRF that hosted the VXLAN interface. With
netns backend, this is not true, as the VXLAN interface is in the
same VRF as the bridge interface. The modification takes same definition
as in BGP, that is to say that there is a single bgp evpn instance, and
this is that instance that will give the correct config settings.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
all network namespaces are read so as to collect interesting fdb and
neighbor tables for EVPN.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
With vrf-lite mechanisms, it is possible to create layer 3 vnis by
creating a bridge interface in default vr, by creating a vxlan interface
that is attached to that bridge interface, then by moving the vxlan
interface to the wished vrf.
With vrf-netns mechanism, it is slightly different since bridged
interfaces can not be separated in different network namespaces. To make
it work, the setup consists in :
- creating a vxlan interface on default vrf.
- move the vxlan interface to the wished vrf ( with an other netns)
- create a bridge interface in the wished vrf
- attach the vxlan interface to that bridged interface
from that point, if BGP is enabled to advertise vnis in default vrf,
then vxlan interfaces are discovered appropriately in other vrfs,
provided that the link interface still resides in the vrf where l2vpn is
advertised.
to import ipv4 entries from a separate vrf, into the l2vpn, the
configuration of vni in the dedicated vrf + the advertisement of ipv4
entries in bgp vrf will import the entries in the bgp l2vpn.
the modification consists in parsing the vxlan interfaces in all network
namespaces, where the link resides in the same network namespace as the
bgp core instance where bgp l2vpn is enabled.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
clone zebra_vxlan.c to create a file zebra_evpn.c for core
EVPN functions whilst retaining the history of zebra_vxlan.c
Signed-off-by: Pat Ruddy <pat@voltanet.io>
extract the neighbor uninstall part of
zebra_vxlan_handle_kernel_neigh_del into a new function
zebra_evpn_neigh_del_ip in zebra_evpn_neigh.c.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
extract the neighbor uninstall part of process_remote_macip_add
into a new function zebra_evpn_neigh_remote_uninstall in
zebra_evpn_neigh.c.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
extract the neighbor part of process_remote_macip_add into a new
function zebra_evpn_neigh_gw_macip_add in zebra_evpn_neigh.c.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
extract the neighbor part of process_remote_macip_add into a new
function process_neigh_remote_macip_add in zebra_evpn_neigh.c.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
clone zebra_vxlan.c to create a file zebra_evpn_neigh.c for neighbor
dB functions whilst retaining the history of zebra_vxlan.c
Signed-off-by: Pat Ruddy <pat@voltanet.io>
extract mac_gateway add code from zevi_gw_macip_add and move it to
a new generic function zebra_evpn_mac_gw_macip_add in zebra_evpn_mac.c
Signed-off-by: Pat Ruddy <pat@voltanet.io>
extract generic local mac add code from zebra_vxlan_local_mac_del
into a new function zebra_evpn_del_local_mac in zebra_evpn_mac.c
Signed-off-by: Pat Ruddy <pat@voltanet.io>
extract the local mac add code from zebra_vxlan_local_mac_add_update
and create a new generic local mac add function
zebra_evpn_add_update_local_mac
Signed-off-by: Pat Ruddy <pat@voltanet.io>
Move MAC add code from process_remote_macip_add in zebra_vxlan.c
to a generic function process_mac_remote_macip_add in
zebra_evpn_mac.c
Signed-off-by: Pat Ruddy <pat@voltanet.io>
clone zebra_vxlan.c to create a file zebra_evpn_mac.c for MAC dB
functions whilst retaining the history of zebra_vxlan.c
Signed-off-by: Pat Ruddy <pat@voltanet.io>
The main zebra_vni_t hash structure has been renamed to zebra_evpn_t
to allow for other transport underlays. Rename functions and variables
to reflect this change.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
MAC-IP routes are used for syncing local entries across redundant
switches in an EVPN-MH setup. A path from a peer that has a local
ES as destination is tagged as a SYNC path. The SYNC path results in the
addition of local MAC and/or local neigh entry in zebra and in the
dataplane.
Implementation overview
=======================
1. Three new flags "local-inactive", "peer-active" and "peer-proxy"
are maintained per-local-MAC and per-local-Neigh entry.
2. The "peer-XXX" flags are set and cleared via SYNC path updates
from BGP. Proxy sync paths result in the setting of "peer-proxy" flag
(and non-proxies result in the "peer-active").
3. A neigh entry that has a "peer-XXX" flag set is programmed as
"static" in the dataplane.
4. A MAC entry that has a "peer-XXX" flag set or is referenced by
a sync-neigh entry (that has a "peer-XXX" flags set) is programmed
as "static" in the dataplane.
5. The sync-seq number is used to normalize the MM seq number across
all the redundant switches i.e. the max MM seq number across all
switches is used by each of the switches. This commit also includes
the changes needed for extended MM seq syncing.
6. A MAC/neigh entry has to be local-active or peer-active to sent to
BGP. An entry that is NOT local-active is sent with the proxy flag (so
BGP can "proxy" advertise it).
7. The "peer-active" flag is aged out by zebra by using a hold_timer
(this is instead of being abruptly dropped on SYNC path delete). This
age-out is needed to handle peer-switch restart (procedures are specified
in draft-rbickhart-evpn-ip-mac-proxy-adv). The holdtime needs to be
sufficiently long to allow an external neighmgr daemon or the dataplane
component to independently probe and establish local reachability of a
host. The MAC and neigh hold time values are configurable.
PS: In the future this probing may happen in FRR itself.
CLI changes to display sync info
================================
MAC
===
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@torm-11:mgmt:~# net show evpn mac vni 1000
Number of MACs (local and remote) known for this VNI: 6
Flags: N=sync-neighs, I=local-inactive, P=peer-active, X=peer-proxy
MAC Type Flags Intf/Remote ES/VTEP VLAN Seq #'s
00:02:00:00:00:25 local vlan1000 1000 0/0
02:02:00:00:00:02 local PI hostbond1 1000 0/0
02:02:00:00:00:06 remote 03:00:00:00:00:02:11:00:00:01 0/0
02:02:00:00:00:01 local X hostbond1 1000 0/0
00:00:00:00:00:11 local PI hostbond1 1000 0/0
02:02:00:00:00:05 remote 03:00:00:00:00:02:11:00:00:01 0/0
root@torm-11:mgmt:~#
root@torm-11:mgmt:~# net show evpn mac vni 1000 mac 00:00:00:00:00:11
MAC: 00:00:00:00:00:11
ESI: 03:00:00:00:00:01:11:00:00:01
Intf: hostbond1(58) VLAN: 1000
Sync-info: neigh#: 0 local-inactive peer-active >>>>>>>>>>>>
Local Seq: 0 Remote Seq: 0
Neighbors:
No Neighbors
root@torm-11:mgmt:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
neigh
=====
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
root@torm-11:mgmt:~# net show evpn arp vni 1003
Number of ARPs (local and remote) known for this VNI: 4
Flags: I=local-inactive, P=peer-active, X=peer-proxy
Neighbor Type Flags State MAC Remote ES/VTEP Seq #'s
2001:fee1:0:3::6 local active 00:02:00:00:00:25 0/0
45.0.3.66 local P active 00:02:00:00:00:66 0/0
45.0.3.6 local active 00:02:00:00:00:25 0/0
fe80::202:ff:fe00:25 local active 00:02:00:00:00:25 0/0
root@torm-11:mgmt:~#
root@torm-11:mgmt:~# net show evpn arp vni 1003 ip 45.0.3.66
IP: 45.0.3.66
Type: local
State: active
MAC: 00:02:00:00:00:66
Sync-info: peer-active >>>>>>>>>>>>>>>>
Local Seq: 0 Remote Seq: 0
root@torm-11:mgmt:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. Local ethernet segments are configured in zebra by attaching a
local-es-id and sys-mac to a access interface -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
!
interface hostbond1
evpn mh es-id 1
evpn mh es-sys-mac 00:00:00:00:01:11
!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
This info is then sent to BGP and used for the generation of EAD-per-ES
routes.
2. Access VLANs associated with an (ES) access port are translated into
ES-EVI objects and sent to BGP. This is used by BGP for the
generation of EAD-EVI routes.
3. Remote ESs are imported by BGP and sent to zebra. A list of VTEPs
is maintained per-remote ES in zebra. This list is used for the creation
of the L2-NHG that is used for forwarding traffic.
4. MAC entries with a non-zero ESI destination use the L2-NHG associated
with the ESI for forwarding traffic over the VxLAN overlay.
Please see zebra_evpn_mh.h for the datastruct organization details.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Revert "zebra: support for macvlan interfaces"
This reverts commit bf69e212fd.
Revert "doc: add some documentation about bgp evpn netns support"
This reverts commit 89b97c33d7.
Revert "zebra: dynamically detect vxlan link interfaces in other netns"
This reverts commit de0ebb2540.
Revert "bgpd: sanity check when updating nexthop from bgp to zebra"
This reverts commit ee9633ed87.
Revert "lib, zebra: reuse and adapt ns_list walk functionality"
This reverts commit c4d466c830.
Revert "zebra: local mac entries populated in correct netnamespace"
This reverts commit 4042454891.
Revert "zebra: when parsing local entry against dad, retrieve config"
This reverts commit 3acc394bc5.
Revert "bgpd: evpn nexthop can be changed by default"
This reverts commit a2342a2412.
Revert "zebra: zvni_map_to_vlan() adaptation for all namespaces"
This reverts commit db81d18647.
Revert "zebra: add ns_id attribute to mac structure"
This reverts commit 388d5b438e.
Revert "zebra: bridge layer2 information records ns_id where bridge is"
This reverts commit b5b453a2d6.
Revert "zebra, lib: new API to get absolute netns val from relative netns val"
This reverts commit b6ebab34f6.
Revert "zebra, lib: store relative default ns id in each namespace"
This reverts commit 9d3555e06c.
Revert "zebra, lib: add an internal API to get relative default nsid in other ns"
This reverts commit 97c9e7533b.
Revert "zebra: map vxlan interface to bridge interface with correct ns id"
This reverts commit 7c990878f2.
Revert "zebra: fdb and neighbor table are read for all zns"
This reverts commit f8ed2c5420.
Revert "zebra: zvni_map_to_svi() adaptation for other network namespaces"
This reverts commit 2a9dccb647.
Revert "zebra: display interface slave type"
This reverts commit fc3141393a.
Revert "zebra: zvni_from_svi() adaptation for other network namespaces"
This reverts commit 6fe516bd4b.
Revert "zebra: importation of bgp evpn rt5 from vni with other netns"
This reverts commit 28254125d0.
Revert "lib, zebra: update interface name at netlink creation"
This reverts commit 1f7a68a2ff.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
In networking restart event, l3vni (vxlan) interface followed by
associated vrf interfaces go down/deleted.
L3vni (oper) down event (from zebra to bgp) triggers to
clean up/un-import evpn routes (one-by-one) from the vrf table,
zebra internally removes the route entry from nexthop and RMAC hash.
When all the routes references in nexthop and RMAC db removed,
both (nexthop/rmac) are suppose to be uninstalled from the
bridge fdb and neigh table.
While evpn routes removal in progress, a vrf disable event removes
l3vni to its vrf association.
Subsequent bgp to evpn routes removal does not clean up thus evpn routes
reference to nexthop and RMAC remains in zebra hash.
bridge fdb and neigh tables are flushed out since networking restart brings down
all interfaces which results in flush of fdb and neigh tables.
By product is the zebra does not install nexthop and rmac when routes are re-imported
into vrf in VNI/VRF up event.
The fix is in vrf disable event to flush all l3vni's nexthop and rmac db.
Ticket:CM-30338
Reviewed By:CCR-10489
Testing Done:
Performed multiple networking restart and checked neigh and
bridge fdb tables for respective nexthop and router mac entry
programmed.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Remove mid-string line breaks, cf. workflow doc:
.. [#tool_style_conflicts] For example, lines over 80 characters are allowed
for text strings to make it possible to search the code for them: please
see `Linux kernel style (breaking long lines and strings)
<https://www.kernel.org/doc/html/v4.10/process/coding-style.html#breaking-long-lines-and-strings>`_
and `Issue #1794 <https://github.com/FRRouting/frr/issues/1794>`_.
Scripted commit, idempotent to running:
```
python3 tools/stringmangle.py --unwrap `git ls-files | egrep '\.[ch]$'`
```
Signed-off-by: David Lamparter <equinox@diac24.net>
the walk routine is used by vxlan service to identify some contexts in
each specific network namespace, when vrf netns backend is used. that
walk mechanism is extended with some additional paramters to the walk
routine.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>