When an ES-bond comes out of bypass, FRR needs to flush the local MACs learnt
while the bond was in bypass. To do that efficiently, local MACs are linked
to the dest-access port. This linking only happens if the access-port is in
LACP-bypass or if it is non-ES.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Feature overview:
=================
An 802.3ad bond can be set up to allow lacp-bypass. This is done to enable
servers to PXE boot without a LACP license, i.e. it allows the bond to go oper
up (with a single link) without LACP converging.
If an ES-bond is oper-up in an "LACP-bypass" state MH treats it as a non-ES
bond. This involves the following special handling -
1. If the bond is in a bypass-state the associated ES is placed in a
bypass state.
2. If an ES is in a bypass state -
a. DF election is disabled (i.e. assumed DF)
b. SPH filter is not installed.
3. MACs learnt via the host bond are advertised with a zero ESI.
When the ES moves out of "bypass" the MACs are moved from a zero-ESI to
the correct non-zero id. This is treated as a local station move.
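The bypass rules above can be modelled roughly as follows. This is only an
illustrative sketch in Python, not the zebra implementation; the class and
method names are made up for the example.

```python
from dataclasses import dataclass

ZERO_ESI = "00:00:00:00:00:00:00:00:00:00"


@dataclass
class EthernetSegment:
    """Hypothetical model of a local ES attached to a host bond."""
    esi: str
    bypass: bool = False  # mirrors the LACP-bypass state of the bond

    def df_election_enabled(self) -> bool:
        # In bypass the ES is assumed DF, so no election is run.
        return not self.bypass

    def sph_filter_installed(self) -> bool:
        # The split-horizon filter is skipped while in bypass.
        return not self.bypass

    def advertised_esi(self) -> str:
        # MACs learnt in bypass are advertised with a zero ESI.
        return ZERO_ESI if self.bypass else self.esi
```

Flipping `bypass` back to False changes `advertised_esi()` from the zero ESI
to the real one, which is the transition that is treated as a local station
move.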
Implementation:
===============
When (a) an ES is detached from a hostbond or (b) an ES-bond goes into
LACP bypass, zebra deletes all the local macs (with that ES as destination)
in the kernel and its local db. BGP re-sends any imported MAC-IP routes
that may exist with this ES destination as remote routes, i.e. zebra can
end up programming a MAC that was previously local as remote pointing
to a VTEP-ECMP group.
When an ES is attached to a hostbond or an ES-bond goes
LACP-up (out of bypass), zebra again deletes all the local macs in the
kernel and its local db. At this point BGP resends any imported MAC-IP
routes that may exist with this ES destination as sync routes, i.e.
zebra can end up programming a MAC that was previously remote
as local pointing to an access port.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Previously, type-3 ESI was the only format supported for evpn-mh. This updates
the CLI to also allow a 10-byte type-0 ESI.
Both type-0 and type-3 ESIs are statically configured; just in two different
ways -
1. type-0 is configured as a complete 10-byte string
2. type-3 is configured as a 6-byte es-sys-mac and a 3-byte
local-discriminator.
Sample config -
!
interface hostbond1
evpn mh es-id 00:44:38:39:ff:ff:01:00:00:01
!
This is a CLI-only change and has no functional impact.
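For illustration, the two configuration styles map to a 10-byte ESI roughly
like this. The sketch is in Python with hypothetical helper names; it assumes
the type-3 layout of a 0x03 type byte, the 6-byte es-sys-mac and a 3-byte
local discriminator, which matches the ESIs shown in "show evpn es" output.

```python
def _parse_hex_bytes(text: str, expected_len: int) -> bytes:
    """Parse a colon-separated hex string into exactly expected_len bytes."""
    parts = text.split(":")
    if len(parts) != expected_len:
        raise ValueError(f"expected {expected_len} bytes, got {len(parts)}")
    return bytes(int(p, 16) for p in parts)


def _format(raw: bytes) -> str:
    return ":".join(f"{b:02x}" for b in raw)


def type3_esi(sys_mac: str, local_disc: int) -> str:
    """Build a type-3 ESI from an es-sys-mac and a local discriminator."""
    if not 0 <= local_disc <= 0xFFFFFF:
        raise ValueError("local discriminator must fit in 3 bytes")
    raw = bytes([0x03]) + _parse_hex_bytes(sys_mac, 6) + local_disc.to_bytes(3, "big")
    return _format(raw)


def type0_esi(esi: str) -> str:
    """Validate a complete 10-byte type-0 ESI string."""
    raw = _parse_hex_bytes(esi, 10)
    if raw[0] != 0x00:
        raise ValueError("type-0 ESI must start with 00")
    return _format(raw)
```

With es-id 1 and es-sys-mac 00:00:00:00:01:11 the type-3 helper yields
03:00:00:00:00:01:11:00:00:01.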
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Local ethernet segments are held in a protodown or error-disabled state
if access to the VxLAN overlay is not ready -
1. When FRR comes up the local-ESs/access-port are kept protodown
for the startup-delay duration. During this time the underlay and
EVPN routes via it are expected to converge.
2. When all the uplinks/core-links attached to the underlay go down
the access-ports are similarly protodowned.
The ES-bond protodown state is propagated to each ES-bond member
and programmed in the dataplane/kernel (per-bond-member).
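A rough model of the two protodown conditions, written as a Python sketch.
The function names and the "uplinks-down" label are hypothetical, and
treating "no uplinks configured" as not-down is an assumption of this
sketch, not something stated above.

```python
def protodown_reasons(startup_delay_running: bool,
                      uplink_cfg_cnt: int,
                      uplink_active_cnt: int) -> set:
    """Return the set of reasons for holding the local ESs protodown."""
    reasons = set()
    if startup_delay_running:
        # Underlay and EVPN routes are still expected to converge.
        reasons.add("startup-delay")
    if uplink_cfg_cnt > 0 and uplink_active_cnt == 0:
        # Assumption: only applies when uplinks have been configured.
        reasons.add("uplinks-down")
    return reasons


def member_protodown(bond_reasons: set, members: list) -> dict:
    """Propagate the bond's protodown state to every bond member."""
    return {member: bool(bond_reasons) for member in members}
```

The ES-bond stays protodown for as long as the reason set is non-empty, and
each member simply inherits that state.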
Configuring uplinks -
vtysh -c "conf t" -c "interface swp4" -c "evpn mh uplink"
Configuring startup delay -
vtysh -c "conf t" -c "evpn mh startup-delay 100"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
EVPN protodown display -
========================
root@torm-11:mgmt:~# vtysh -c "show evpn"
L2 VNIs: 10
L3 VNIs: 3
Advertise gateway mac-ip: No
Advertise svi mac-ip: No
Duplicate address detection: Disable
Detection max-moves 5, time 180
EVPN MH:
mac-holdtime: 60s, neigh-holdtime: 60s
startup-delay: 180s, start-delay-timer: 00:01:14 <<<<<<<<<<<<
uplink-cfg-cnt: 4, uplink-active-cnt: 4
protodown: startup-delay <<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
ES-bond protodown display -
===========================
root@torm-11:mgmt:~# vtysh -c "show interface hostbond1"
Interface hostbond1 is up, line protocol is down
Link ups: 0 last: (never)
Link downs: 1 last: 2020/04/26 20:38:03.53
PTM status: disabled
vrf: default
OS Description: Local Node/s torm-11 and Ports swp5 <==> Remote Node/s hostd-11 and Ports swp1
index 58 metric 0 mtu 9152 speed 4294967295
flags: <UP,BROADCAST,MULTICAST>
Type: Ethernet
HWaddr: 00:02:00:00:00:35
Interface Type bond
Master interface: bridge
EVPN-MH: ES id 1 ES sysmac 00:00:00:00:01:11
protodown: off rc: startup-delay <<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
ES-bond member protodown display -
==================================
root@torm-11:mgmt:~# vtysh -c "show interface swp5"
Interface swp5 is up, line protocol is down
Link ups: 0 last: (never)
Link downs: 3 last: 2020/04/26 20:38:03.52
PTM status: disabled
vrf: default
index 7 metric 0 mtu 9152 speed 10000
flags: <UP,BROADCAST,MULTICAST>
Type: Ethernet
HWaddr: 00:02:00:00:00:35
Interface Type Other
Master interface: hostbond1
protodown: on rc: startup-delay <<<<<<<<<<<<<<<<
root@torm-11:mgmt:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. DF preference is configurable per-ES
!
interface hostbond1
evpn mh es-df-pref 100 >>>>>>>>>>>
evpn mh es-id 1
evpn mh es-sys-mac 00:00:00:00:01:11
!
2. This parameter is sent to BGP and advertised via the ESR.
3. The peer-ESs' DF params are sent to zebra (by BGP) and used
for running the DF election.
4. If the local VTEP becomes non-DF on an ES a block filter is
programmed in the dataplane to drop de-capsulated BUM packets
destined to that ES.
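A minimal sketch of a preference-based DF election, in Python. The higher
preference winning follows from the sample output below (local preference
100 loses to a peer with 32767); the tie-break on the lower VTEP IP and the
local IP used in the usage note are assumptions of this sketch.

```python
import ipaddress


def is_non_df(local_ip: str, local_pref: int, peers: list) -> bool:
    """Return True if the local VTEP loses the DF election on an ES.

    peers is a list of (vtep_ip, df_pref) tuples learnt from BGP.
    """
    local = ipaddress.IPv4Address(local_ip)
    for peer_ip, peer_pref in peers:
        if peer_pref > local_pref:
            return True  # a peer with a higher preference wins
        if peer_pref == local_pref and ipaddress.IPv4Address(peer_ip) < local:
            return True  # assumed tie-break: the lower IP wins
    return False
```

For example, is_non_df("27.0.0.15", 100, [("27.0.0.16", 32767)]) is True, in
which case the block filter for de-capsulated BUM traffic would be
programmed for that ES.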
Sample output
=============
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh evpn es
Type: L local, R remote, N non-DF
ESI Type ES-IF VTEPs
03:00:00:00:00:01:11:00:00:01 LRN hostbond1 27.0.0.16
03:00:00:00:00:01:22:00:00:02 LR hostbond2 27.0.0.16
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh evpn es 03:00:00:00:00:01:11:00:00:01
ESI: 03:00:00:00:00:01:11:00:00:01
Type: Local,Remote
Interface: hostbond1
State: up
Ready for BGP: yes
VNI Count: 10
MAC Count: 2
DF: status: non-df preference: 100 >>>>>>>>
Nexthop group: 0x2000001
VTEPs:
27.0.0.16 df_alg: preference df_pref: 32767 nh: 0x100000d >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The Solaris code has gone through a deprecation cycle. No-one
has said anything to us and, worst of all, we don't have any test
systems running Solaris to know if we are making changes that
are breaking on Solaris. Remove it from the system so
we can clean up a bit.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
1. Local ethernet segments are configured in zebra by attaching a
local-es-id and sys-mac to an access interface -
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
!
interface hostbond1
evpn mh es-id 1
evpn mh es-sys-mac 00:00:00:00:01:11
!
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
This info is then sent to BGP and used for the generation of EAD-per-ES
routes.
2. Access VLANs associated with an (ES) access port are translated into
ES-EVI objects and sent to BGP. This is used by BGP for the
generation of EAD-EVI routes.
3. Remote ESs are imported by BGP and sent to zebra. A list of VTEPs
is maintained per-remote ES in zebra. This list is used for the creation
of the L2-NHG that is used for forwarding traffic.
4. MAC entries with a non-zero ESI destination use the L2-NHG associated
with the ESI for forwarding traffic over the VxLAN overlay.
Please see zebra_evpn_mh.h for the datastruct organization details.
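As a rough picture of points 3 and 4, the per-remote-ES VTEP list and its use
for MAC forwarding could be modelled as below. This is a Python sketch with
made-up names; the real organization is in zebra_evpn_mh.h and may differ.

```python
ZERO_ESI = "00:00:00:00:00:00:00:00:00:00"


class RemoteEsTable:
    """Hypothetical per-ES VTEP list from which an L2-NHG is derived."""

    def __init__(self):
        self._vteps = {}  # ESI -> ordered list of VTEP IPs

    def add_vtep(self, esi, vtep_ip):
        vteps = self._vteps.setdefault(esi, [])
        if vtep_ip not in vteps:
            vteps.append(vtep_ip)

    def del_vtep(self, esi, vtep_ip):
        vteps = self._vteps.get(esi, [])
        if vtep_ip in vteps:
            vteps.remove(vtep_ip)
        if not vteps:
            self._vteps.pop(esi, None)

    def l2_nhg(self, esi):
        # The members of the nexthop group are the VTEPs attached to the ES.
        return tuple(self._vteps.get(esi, ()))


def mac_nexthops(table, mac_esi, remote_vtep=None):
    """A MAC with a non-zero ESI forwards via the ES's L2-NHG."""
    if mac_esi != ZERO_ESI:
        return table.l2_nhg(mac_esi)
    return (remote_vtep,) if remote_vtep else ()
```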
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
A testing agency reported that RFC 4861 section 6.2.1 requires all
implementations to have a configuration knob to change the advertised
hop limit. This fix adds that capability.
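For reference, the hop limit in question is the Cur Hop Limit byte of the
Router Advertisement header (RFC 4861 section 4.2). The Python sketch below
packs that fixed 16-byte header with a configurable value; the function name
and defaults are hypothetical and the checksum is left at zero for the
sender to fill in.

```python
import struct

ND_ROUTER_ADVERT = 134  # ICMPv6 type for Router Advertisement


def build_ra_header(hop_limit: int = 64, flags: int = 0,
                    router_lifetime: int = 1800,
                    reachable_time: int = 0, retrans_timer: int = 0) -> bytes:
    """Pack type, code, checksum, cur hop limit, flags, lifetime, timers."""
    if not 0 <= hop_limit <= 255:
        raise ValueError("hop limit must fit in one byte")
    return struct.pack("!BBHBBHII", ND_ROUTER_ADVERT, 0, 0,
                       hop_limit, flags, router_lifetime,
                       reachable_time, retrans_timer)
```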
Ticket: CM-29200
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
A testing facility reported that our sending of Router Advertisements
more frequently than once every three seconds is not compliant with
rfc4861. Added a knob to turn off fast retransmits in order to meet the
requirement of the RFC.
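The effect of the knob can be pictured with the small Python sketch below.
The 3 second value is MIN_DELAY_BETWEEN_RAS from RFC 4861; the function and
its arguments are hypothetical and do not mirror the zebra rtadv code.

```python
MIN_DELAY_BETWEEN_RAS = 3.0  # seconds, RFC 4861 protocol constant


def may_send_ra(now: float, last_sent, fast_retransmit: bool) -> bool:
    """Decide whether a multicast RA may be sent at time now.

    last_sent is the time of the previous RA, or None if none was sent.
    """
    if last_sent is None or fast_retransmit:
        return True
    # With fast retransmits turned off, enforce the RFC minimum spacing.
    return now - last_sent >= MIN_DELAY_BETWEEN_RAS
```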
Ticket: CM-27063
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Add a private header file for functions that are internal/special
case like how we do it for `lib/nexthop_group_private.h`.
Remove a bunch of functions from the header file only being used
statically and add some comments for those remaining to indicate
better what their use is.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Switch the nhg_connected tree structures to use the new
RB tree API in `lib/typerb.h`. We were using the openbsd-tree
implementation before.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We will use a nhe context for dataplane interaction with
nexthop group hash entries.
New nhe's from the kernel will be put into a group array
if they are a group and queued on the rib metaq to be processed
later.
New nhe's sent to the kernel will be set on the dataplane context
with appropriate ID's in the group array if needed.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Re-organize and expose the nhg_connected functions so that
they can be used outside zebra_nhg.c. And then abstract those
into zebra_nhg_depends_* and zebra_nhg_dependents_* functions.
Switch the ifp struct to use an RB tree for its dependents,
making use of the nhg_connected functions.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add an interface pointer for a nexthop group hash entry
when we are getting a rib_add for a new route.
Also, add the interface index to the `show nexthop-group` command.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a nexthop hash entry list to the local zebra
interface info for each interface. This will allow
us to modify nexthops on link events.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The alias/description of an interface in linux was being
used to override the internal description. As such let's
fix the display to keep track of both if we have it.
Config in FRR:
!
interface docker0
description another combination
!
interface enp3s0
description BAMBOOZLE ME WILL YOU
!
Config in linux:
sharpd@robot ~/f/zebra> ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
alias This is the loopback you cabbage
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 74:d0:2b:9c:16:eb brd ff:ff:ff:ff:ff:ff
alias HI HI HI
Now the 'show int descr' command:
robot# show int description
Interface Status Protocol Description
docker0 up down another combination
enp3s0 up up BAMBOOZLE ME WILL YOU
HI HI HI
lo up up This is the loopback you cabbage
Fixes: #4191
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The interface type can be a bond or a bond slave, add some
code to note this and to display it as part of a show interface
command.
Signed-off-by: Dinesh Dutt <didutt@gmail.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Abstract the mac neigh installation for 169.254.0.1 into
its own function that we can pass the mac address into.
This will allow a future commit to use this functionality
when we have the appropriate mac address from reading
optional attributes of a RA packet.
Signed-off-by: Donald Sharp <sharpd@cumuusnetworks.com>
Netdevices are not sorted in any fashion by the kernel during the initial
interface nldump. So you can get an upper device (such as an SVI) before
its corresponding lower device (bridge).
To fix this problem we skip resolving link dependencies while handling
nldump notifications and instead resolve them at the end (when all the
devices are present).
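The idea is a plain two-pass load, sketched here in Python. The dict layout
and function name are invented for the example and do not reflect the zebra
data structures.

```python
def load_interfaces(dump: list) -> dict:
    """Build an ifindex-keyed table from an unsorted interface dump.

    Each dump entry is a dict with "ifindex", "name" and "link_ifindex"
    (None when the device has no lower device).
    """
    by_index = {}
    # Pass 1: record every device, without touching link dependencies.
    for msg in dump:
        by_index[msg["ifindex"]] = {
            "name": msg["name"],
            "link_ifindex": msg["link_ifindex"],
            "link": None,
        }
    # Pass 2: all devices are present, so lower devices can be resolved.
    for ifp in by_index.values():
        if ifp["link_ifindex"] is not None:
            ifp["link"] = by_index.get(ifp["link_ifindex"])
    return by_index
```

An SVI listed before its bridge still ends up with its link pointing at the
bridge, because resolution only happens once the whole dump has been read.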
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Ticket: CM-22388, CM-21796
Reviewed By: CCR-7845
Testing Done:
1. verified on a setup with missing linkages
2. automation - evpn-min
This crash occurs only with the netns implementation.
The meaning of a vrf differs depending on its implementation (netns or
vrf-lite):
- With vrf-lite implementation vrf is a property of the interface that
can be changed as the speed or the state (iproute2 command: "ip link
set dev IF_NAME master VRF_NAME"). All interfaces of the system are in
the same netns and so interface name is unique.
- With netns implementation vrf is a characteristic of the interface
that CANNOT be changed: it is the id of the netns where the interface
is located. To change the vrf of an interface (iproute2 command to
move an interface "ip netns exec VRF_NAME1 ip link set dev IF_NAME
netns VRF_NAME2") the interface is deleted from the old vrf and
created in the new vrf.
Interface name is not unique, the same name can be present in the
different netns (typically the lo interface) and search of interface
must be done by the tuple (interface name, netns id).
Current tests on the vrf implementation (vrf-lite or netns) are not
sufficient. In some cases (for example when an interface is moved from
a vrf X to the default vrf and then moved back to vrf X) we can have a
corruption message and then a crash of zebra.
To avoid this corruption, the test on the vrf implementation, needed when
an interface changes, has been rewritten:
- For all interface changes except deletion the if_get_by_name function,
that checks if an interface exists and creates or updates it if
needed, is changed:
* The vrf-lite implementation is unchanged: search of the interface
is based only on the name and update the vrf-id if needed.
* The netns implementation search of the interface is based on the
(name, vrf-id) tuple and interface is created if not found, the
vrf-id is never updated.
- Deletion of an interface (reception of a RTM_DELLINK netlink message):
* The vrf-lite implementation is unchanged: the interface
information is cleared and the interface is moved to the default
vrf if it does not already belong to it (to allow vrf deletion).
* The netns implementation is changed: only the interface
information is cleared and the interface stays in its vrf to
avoid conflict with an interface with the same name in the default
vrf.
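The lookup rule for the non-deletion case can be sketched as follows. This
is an illustrative Python model with hypothetical names, not the actual
if_get_by_name code.

```python
class InterfaceTable:
    """Toy interface table whose lookup depends on the vrf backend."""

    def __init__(self, netns_backend: bool):
        self.netns_backend = netns_backend
        self.interfaces = []  # list of {"name": ..., "vrf_id": ...}

    def get_by_name(self, name: str, vrf_id: int) -> dict:
        for ifp in self.interfaces:
            if ifp["name"] != name:
                continue
            if self.netns_backend:
                # netns: match on the (name, vrf-id) tuple only and
                # never update the vrf-id of an existing interface.
                if ifp["vrf_id"] == vrf_id:
                    return ifp
            else:
                # vrf-lite: names are unique, update the vrf-id if needed.
                ifp["vrf_id"] = vrf_id
                return ifp
        ifp = {"name": name, "vrf_id": vrf_id}
        self.interfaces.append(ifp)
        return ifp
```

With the netns backend, looking up "lo" in two different vrfs yields two
distinct entries; with vrf-lite the same lookups return one entry whose
vrf-id has been updated.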
This implementation reverts (partially or totally):
commit 393ec5424e ("zebra: fix missing node attribute set in ifp")
commit e9e9b1150f ("lib: create interface even if name is the same")
commit 9373219c67 ("zebra: improve logs when replacing interface to an
other netns")
Fixes: b53686c52a ("zebra: delete interface that disappeared")
Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When an interface is a virtual ethernet interface, there is no need to
update the link pointer of the interface.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This function is changed so that the interface index is searched across
the correct namespace.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Notice when a neighbor entry we've put in for rfc-5549 gets deleted
by some evil evil person. When this happens, push it back in
immediately.
Ticket: CM-18612
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The following types are nonstandard:
- u_char
- u_short
- u_int
- u_long
- u_int8_t
- u_int16_t
- u_int32_t
Replace them with the C99 standard types:
- uint8_t
- unsigned short
- unsigned int
- unsigned long
- uint8_t
- uint16_t
- uint32_t
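A replacement of this kind can be scripted. The Python sketch below is only
an illustration of the mapping listed above and is not the tool that was
used for this change; it relies on word boundaries so that u_int does not
match inside u_int8_t.

```python
import re

TYPE_MAP = {
    "u_char": "uint8_t",
    "u_short": "unsigned short",
    "u_int": "unsigned int",
    "u_long": "unsigned long",
    "u_int8_t": "uint8_t",
    "u_int16_t": "uint16_t",
    "u_int32_t": "uint32_t",
}

# Longest names first, each anchored on word boundaries.
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, TYPE_MAP), key=len, reverse=True)) + r")\b"
)


def modernize(source: str) -> str:
    """Replace the nonstandard type names in a piece of C source text."""
    return _PATTERN.sub(lambda m: TYPE_MAP[m.group(1)], source)
```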
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When moving interfaces to another place, like another netns, the
remaining interface is still present, with inactive status.
Now, that interface is deleted from the list if the interface appears
on another netns. If not, the interface is kept.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
For certain interfaces, when they are brought up and we receive
the netlink notification about it, the speed of the interface is
not set correctly. This creates a one-shot thread that will
wait 15 seconds and then requery the speed; if it is different,
it will renotify the running daemons.
The kernel should notify us on speed changes, unfortunately this
is not done currently via a netlink message as you would think.
As I understand it there is some in-fighting about the proper
way to approach this issue and due to the way the kernel release
cycle works we are a ways off from getting this fixed. This
is a `hack` to make us work correctly while we wait for the
true answer.
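The shape of the workaround is a delayed, one-shot recheck. The Python
sketch below uses threading.Timer purely for illustration (zebra uses its
own event loop); read_speed and notify are hypothetical callbacks supplied
by the caller.

```python
import threading

SPEED_UPDATE_DELAY = 15.0  # seconds to wait before requerying the speed


def schedule_speed_recheck(ifp: dict, read_speed, notify,
                           delay: float = SPEED_UPDATE_DELAY) -> threading.Timer:
    """Requery the interface speed once, after delay seconds."""

    def recheck():
        new_speed = read_speed(ifp["name"])
        if new_speed != ifp["speed"]:
            ifp["speed"] = new_speed
            notify(ifp)  # tell the interested daemons about the change

    timer = threading.Timer(delay, recheck)
    timer.daemon = True
    timer.start()
    return timer
```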
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
zserv.c had a grab bag of function declarations that
did not belong in it. Move those to where they better
belong.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If the frr.conf file contains bgp unnumbered peering but the associated
interfaces do not have the commands "no ipv6 nd suppress-ra" and
"ipv6 nd ra-interval 10" configured, when frr-reload.py is issued the
interface commands are removed from the running config, causing peers to
go down and stay down after a link flap. This situation can occur if
the frr.conf file is created manually or via automation (like ansible)
but a subsequent "wr mem" has not been performed.
This fix changes the behavior so that the interface ipv6 nd ra commands
created by bgp are not displayed. Therefore, when the above condition
occurs, there is no difference between the running and stored configs
and peers work fine.
Ticket: CM-18702
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: CCR-7004
Testing-done: Manual testing successful. L3-smoke has no new failures
This reverts commit c14777c6bf.
clang 5 is not widely available enough for people to indent with. This
is particularly problematic when rebasing/adjusting branches.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Define interface types of interest and recognize the types. Store layer-2
information (VLAN Id, VNI etc.) for interfaces, process bridge interfaces
and map bridge members to bridge. Display all the additional information
to user (through "show interface").
Note: Only implemented for the netlink interface.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
The FSF's address changed, and we had a mixture of comment styles for
the GPL file header. (The style with * at the beginning won out with
580 to 141 in existing files.)
Note: I've intentionally left intact other "variations" of the copyright
header, e.g. whether it says "Zebra", "Quagga", "FRR", or nothing.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Restore the original logic in netlink_link_change() which works like this:
* once an interface event is detected, lookup the associated interface
by its name;
* call the set_ifindex() function;
* set_ifindex() will lookup the interface again but now by its ifindex. If
the lookups by name and ifindex yield different results, then the
interface was renamed and set_ifindex() will take care of that.
In the future, zns->if_table will be split into two different data
structures to allow faster lookups by both name and ifindex.
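A simplified model of that lookup order, as a Python sketch. The table
layout and the way the stale entry is retired are assumptions made for the
example; only the name-then-ifindex ordering comes from the description
above.

```python
class LinkTable:
    """Toy table with separate name and ifindex lookups."""

    def __init__(self):
        self.by_name = {}
        self.by_index = {}


def set_ifindex(table: LinkTable, ifp: dict, ifindex: int) -> None:
    old = table.by_index.get(ifindex)
    if old is not None and old is not ifp:
        # Name and ifindex lookups disagree: the interface was renamed,
        # so retire the entry that still carries the old name.
        table.by_name.pop(old["name"], None)
    ifp["ifindex"] = ifindex
    table.by_index[ifindex] = ifp


def link_change(table: LinkTable, name: str, ifindex: int) -> dict:
    # Step 1: look the interface up by name, creating it if needed.
    ifp = table.by_name.get(name)
    if ifp is None:
        ifp = {"name": name, "ifindex": None}
        table.by_name[name] = ifp
    # Step 2: set_ifindex() repeats the lookup by ifindex.
    set_ifindex(table, ifp, ifindex)
    return ifp
```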
Fixes Issue #397.
Regression introduced by commit 12f6fb9.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>