In the case of EVPN symmetric routing, the tenant VRF is associated with
a VNI that is used for routing and commonly referred to as the L3 VNI or
VRF VNI. Corresponding to this VNI is a VLAN and its associated L3 (IP)
interface (SVI). Overlay next hops (i.e., next hops for routes in the
tenant VRF) are reachable over this interface.
https://tools.ietf.org/html/draft-ietf-bess-evpn-prefix-advertisement
section 4.4 provides additional description of the above constructs.
The implementation currently derives this L3 interface for EVPN tenant
routes using special code that looks at route flags. This patch
exchanges the L3 interface between zebra and bgpd as part of the L3-VNI
exchange in order to eliminate some this special code.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Commit: 6005fe55bc
Introduced a crash with zebra looking up either the
nbr structure or the mac structure. This is because
the zvni used is NULL and we eventually call a hash_lookup
call that would cause a NULL dereference. Partially
revert this commit to original behavior.
Problems found via clang Static Analyzer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In Asymmetric and symetric routing scenario in EVPN
where each VTEP pair having different set of addresses
for the SVIs.
This knob allows reachability (ping connectivity) of
SVI IPs and resolve ARP resoultion VTEPs across racks.
This knob should not be used when same SVI IPs configured
on VTEPs across racks or when advertise default gateway
is configured.
Ticket:CM-23782
Testing Done:
Bring up EVPN symmetric routing topology with different
SVI IPs on different VTEPs. Enable advertise svi ip
at each VTEP, remote VTEPs installs arp entry for
SVI IPs via EVPN type-2 route exchange.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The master thread handler is really part of the zrouter structure.
So let's move it over to that. Eventually zserv.h will only be
used for zapi messages.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In extended-mobility case ({IP1, MAC} binding),
when a MAC moves from local to remote, binding
changes to {IP2, MAC}, local neigh (IP1) marked
as inactive in frr.
The evpn draft recommends to probe the entry once
local binding changes from local to remote.
Once the probe is set for the local neigh entry,
kernel will attempt refresh the entry via sending
unicast address resolution message, if host does not
reply, it will mark FAILED state.
For FAILED entry, kernel triggers delete neigh
request, which result in frr to remove inactive entry.
In absence of probing and aging out entry,
if MAC moves back to local with {IP3, MAC},
frr will mark both IP1 and IP3 as active and sends
type-2 update for both.
The IP1 may not be active host and still frr advertises
the route.
Ticket:CM-22864
Testing Done:
Validate the MAC mobilty in extended mobility scenario,
where local inactive entry gets removed once MAC moves
to remote state.
Once probe is set to the local entry, kernel triggers
reachability of the neigh/arp entry, since MAC moved remote,
ARP request goes to remote VTEP where host is not residing,
thus local neigh entry goes to failed state.
Frr receives neighbor delete faster and removes the entry.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Neigh detected duplicate detected during local update,
upon receiving kernel neigh delete, set neigh inactive
flag so BGPd can install remote route entry if present.
Only if freeze action enabled, local duplicate detected
entry will not be present in BGPd thus marking neigh
inactive is safe. BGPd will simply attempt install
remote entry if present.
Ticket:CM-23438
Testing Done:
Validated MAC-IP pair, trigger mobility of between two
VTEPs, upon local freeze perform neigh delete which
triggers BGP to install remote type-2 route into kernel.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
A MACIP is detected as duplicate and after that
the host continue to move behind different VTEPs results
in local VTEP receiving remote mobility events.
In remote_macip_add, ensure to trigger dad if
MAC is marked as duplicate. In case of freeze
action enabled, is_dup_detect will be set to
avoids installing frozen MAC into kernel.
Ticket:CM-23649
Testing Done:
Configured detection action freeze with detection count
as 7 at DUT and >7 at remote VTEP,
trigger MAC-IP mobility between VTEPs.
once tdetection count reached, MAC detected as duplicate,
post detection move the host to remote. The local VTEP
receives remote macip add and entry is not installed into
kernel with fix.
root@VTEP1:~# net show evpn mac vni 1002 mac aa:aa:aa:aa:aa:aa
MAC: aa:aa:aa:aa:aa:aa
Remote VTEP: 27.0.0.16
Local Seq: 7 Remote Seq: 8
Duplicate, detected at Fri Jan 25 05:03:29 2019
Neighbors:
11.11.11.11 Inactive
Kernel entry still points to LOCAL
root@VTEP1:~# bridge fdb show | grep aa:aa:aa
aa:aa:aa:aa:aa:aa dev hostbond3 vlan 1002 master VxLanA-1
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
When a local neigh is added with a MAC that is remote or absent the
neigh is kept in zebra as local/in-active. But not propagated to bgpd.
Similarly when an inactive neigh is deleted the del-msg is not propagated
to bgpd.
Without this change bgp and zebra would fall out of sync as that
bgp would not know to rerun bestpath and for it to reinstall a
known remote path for the mac-ip in question. To fix this we
now propagate inactive neigh deletes to bgpd.
Ticket: CM-23018
Testing Done:
1. evpn-min
2. manually triggered the out-of-sync state and verified the fix
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
For neigh check duplicate flag as it can be inherited from
duplicate detected MAC (count could be 0).
Ticket:CM-23316
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Below are cases where EVPN duplicate detection
Freeze and Unfreeze required fixes:
Auto recovery needs to check neighbor's duplicate flag
to take action, as neigh could be marked duplicate
via inherited from MAC where IP detection count could be 0.
MAC duplicate detection needs to set flag to true
if freeze action is configured.
Local MAC add update should not send update to bgp
if MAC is in frozen state.
Remote MAC-IP update should not process neigh update if MAC
is detected as duplicate during remote update.
Ticket:CM-23344
Testing Done:
Trigger duplicate detection via both local and remote update trigger,
Validate clear command and other changes expected behavior.
Auto-recovery takes appropriate action on inherited IPs.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Duplicate address detection should operate
at default vrf instance.
For mac and neigh show command, auto recovery and few places
where tanent vrf_id used for zvrf instead use default
vrf instance. Use vxlan_if's or VRF_DEFAULT vrf_id to
fetch zebra's default vrf instance.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
An EVPN type-2 entry is in freeze state during remote update,
remote VTEP can send typ-2 withdraw update,
upon receiving an entry delete (withdraw), first check
kernel has in local reachable state. Upon
unfreeze use the local entry to advertise to peers.
Fetch is for both MAC and IP, delete can come for
only MAC or MAC-IP combined route.
The specific entry fetch only required request flag to be set,
dump flag is not required.
Testing Done:
Simulate two VTEPs to do M1, IP1 mobility sequence,
freeze MAC during remote MAC update, subsequently send
withdraw type-2 route from origintating VTEP.
This results in read apis to invoke for local reachable entry.
Zebra updates its cache and upon unfreeze originates type-2.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
the default vrf name was hardset to "Default", whereas the default vrf
name could have been configured in an other manner. Fix this
inconsistency.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
the l3vni structure is allocated only once, since that structure is only
used for default netns. For that, move the initialisation part is moved
to a proper place, where there is no risk of attempting to initialise it
more than once, even when vrf backend is netns.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Clear dup address vni needs to return non-zero value
in case of command is not successful.
Ticket:CM-23122
Testing Done:
run clear command and check upon failure return code is non-zero.
root@TORS1:~# vtysh -c "clear evpn dup-addr vni 1000 ip 45.0.1.26"
% Requested IP's associated MAC 00:01:02:03:04:05 is still in duplicate
% state
root@TORS1:~# echo $?
1
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Change helps display detailed output for all possible VNI neighbors
without specifying VNI and ip. It helps in troubleshooting as a single
command can be fired to capture detailed info on all VNIs.
Ticket: CM-22832
Signed-off-by: Nitin Soni <nsoni@cumulusnetworks.com>
Reviewed-by: CCR-8034
Change helps display detailed output for all possible VNI MACs without
specifying VNI or mac. It helps in troubleshooting - a single
command can be fired to capture detailed info on all VNIs.
Also fixed and existing json related bug where json object is created by
a parent function and freed in child function.
Ticket: CM-22832
Signed-off-by: Nitin Soni <nsoni@cumulusnetworks.com>
Reviewed-by: CCR-8028
Change helps display detailed output for all possible VNIs without
specifying VNI. It helps in troubleshooting - a single command can
be fired to capture detailed info on all VNIs.
Ticket: CM-22831
Signed-off-by: Nitin Soni <nsoni@cumulusnetworks.com>
Reviewed-by: CCR-8013
Display following Per MAC and Neigh's output:
If duplicate address detection is under process,
display detection start time and detection count.
If duplicate address detection detected an address
as duplicate, display detection time and duplicate
status.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
When the remote mac is deleted by bgpd we can end up with an auto mac
entry in zebra if there are neighs referring to the mac. The remote sequence
number in the auto mac entry needs to be reset to 0 as the mac entry may
have been removed on all VTEPs (including the originating one).
Now if the MAC comes back on a remote VTEP it may be added with MM=0 which
will NOT be accepted if the remote seq was not reset in the previous step.
Ticket: CM-22707
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
This is a fixup to commit -
f32ea5c07 - zebra: act on kernel notifications for remote neighbors
The original commit handled a race condition between kernel and zebra
that would result in an inconsistent state i.e.
kernel has an offload/remote neigh
zebra has a local neigh
The original commit missed setting the neigh to active when zebra
tried to resolve the inconsistency by modifying the local neigh to
remote neigh on hearing back its own kernel update. Fixed here.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Ticket: CM-22700
When events cross paths between bgp and zebra bgpd could end up with a
dangling local MAC entry. Consider the following sequence of events on
rack-1 -
1. MAC1 has MM sequence number 1 and points to rack-3
2. Now a packet is rxed locally on rack-1 and rack-2 (simultaneously) with
source-mac=MAC1.
3. This would cause rack-1 and rack-2 to set the MM seq to 2 and
simultaneously report the MAC as local.
4. Now let's say on rack-1 zebra's MACIP_ADD is in bgpd's queue. bgpd
accepts rack-3's update and sends a remote MACIP add to zebra with MM=2.
5. zebra updates the MAC entry from local=>remote.
6. bgpd now processes zebra's "stale local" making it the best path.
However zebra no longer has a local MAC entry.
At this point bgpd and zebra are effectively out of sync i.e. bgpd has a
local-MAC which is not present in the kernel or in zebra.
To handle this window zebra should send a local MAC delete to bgpd on
modifying its cache to remote.
Ticket: CM-22687
Reviewed By: CCR-7935
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The `struct zebra_ns` data structure is being used
for both router information as well as support for
the vrf backend( as appropriate ). This is a confusing
state. Start the movement of `struct zebra_ns` into
2 things `struct zebra_router` and `struct zebra_ns`.
In this new regime `struct zebra_router` is purely
for handling data about the router. It has no knowledge
of the underlying representation of the Data Plane.
`struct zebra_ns` becomes a linux specific bit of code
that allows us to handle the vrf backend and is allowed
to have knowledge about underlying data plane constructs.
When someone implements a *bsd backend the zebra_vrf data
structure will need to be abstracted to take advantage of this
instead of relying on zebra_ns.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The ->hash_cmp and linked list ->cmp functions were sometimes
being used interchangeably and this really is not a good
thing. So let's modify the hash_cmp function pointer to return
a boolean and convert everything to use the new syntax.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We had a variety of issues with sorted list compare functions.
This commit identifies and fixes these issues.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Allow the modification of whether or not we will allow
BUM flooding on the vxlan bridge. To do this allow
the upper level protocol to specify via the ZEBRA_VXLAN_FLOOD_CONTROL
zapi message.
If flooding is disabled then BUM traffic will not be forwarded
to other VTEP's.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The block comments from a couple commits were not following
proper style. Fix.
Fix SA warning that had snuck in.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ensure that when the is_router condition changes for a locally learnt
neighbor, it is informed to BGP only if it is active i.e., the MAC is
also locally learnt.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
Ticket: CM-22288
Reviewed By: CCR-7832
Testing Done:
1. Failed test
2. vxlan_routing_test.py
Use boolean variables instead of unsigned int for certain VxLAN-EVPN
flags which are really used as boolean.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
Ticket: CM-22288
Reviewed By: CCR-7832
Testing Done:
Along with a subsequent, related commit
When a remote MAC goes away, but there are neighbors referring to it,
ensure that when the last remote neighbor goes away, the MAC is
uninstalled from the kernel and no longer considered as remote.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
Ticket: CM-22130
Reviewed By: CCR-7777
Testing Done:
1. Replicated failed scenario and verified with fix.
2. evpn-min
When a MAC moves from local to remote, a replace is allowed, EVPN
no longer has to delete the local MAC before installing the remote
MAC.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
The RB-Tree used to store rmac information was not properly
handling the v6 address family. Modify the code to allow
this handling.
Cleans up this error message:
zebra[2231]: host_rb_entry_compare: Unexpected family type: 10
That is being seen, This fixes some connectivity issues being seen.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem reported that some bgp and ospf json commands did not return
any json output at all if the bgp/ospf instance did not exist.
Additionally, some bgp and ospf json commands did not return any json
output if the instance existed but no neighbors were defined. This
fix makes these commands more consistent in returning empty braces for
json output and issue a message if not using json output. Additionally,
made the flag "use_json" a bool to make it consistent since previously,
it had been defined as an int, char, u_char, and bool at various places.
Ticket: CM-21040
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Add a header to cleanup no declaration and properly
wrapper some variables to appropriate #ifdef.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Handle Remote Neigh entry state change from Router to Host.
Remote MAC-IP update may not continue EVPN NA Extended community,
Zebra need to accomodate if router_flag change for existing neigh
and install with or without Router Flag (R-bit).
Testing:
Have locally run MAC/IP (neigh entry) with R-bit set,
Checke on remote VTEP 'show bgp evpn route ...mac ip' and
'show evpn arp-cache ...' contians router flag.
Change host to remove R-bit, which locally learnt entry removes
Router flag. This results in remote vtep to remove R-bit from
neigh entry.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Neigh update can have router_flag change, from unset to set and
viceversa. This is the case where MAC, IP and VLAN are same but
entry's flag moved from R to not R bit and reverse case.
Router flag change needs to trigger bgpd to inform all evpn peers
to remove from the evpn route.
Testing Done:
Send GARP with and without R bit from host and validate neigh entry
and evpn neigh and mac-ip route entry in zebra and bgpd.
Check Peer VTEP evpn route entry where router flag is (un)set.
With R-bit
Route [2]:[0]:[0]:[48]:[00:1f:2f:db:45:a6]:[128]:[2006:33:33:2::10]
VNI 1001
Imported from
27.0.0.16:5:[2]:[0]:[0]:[48]:[00:1f:2f:db:45:a6]:[128]:[2006:33:33:2::10]
4435 5551
27.0.0.16 from MSP1(uplink-1) (27.0.0.9)
Origin IGP, valid, external, bestpath-from-AS 4435, best
Extended Community: RT:5551:1001 ET:8 ND:Router
Flag
AddPath ID: RX 0, TX 1261
Last update: Wed Aug 15 20:52:14 2018
Without R-bit
Route [2]:[0]:[0]:[48]:[00:1f:2f:db:45:a6]:[128]:[2006:33:33:2::10]
VNI 1001
Imported from
27.0.0.16:5:[2]:[0]:[0]:[48]:[00:1f:2f:db:45:a6]:[128]:[2006:33:33:2::10]
4435 5551
27.0.0.16 from MSP2(uplink-2) (27.0.0.10)
Origin IGP, valid, external, bestpath-from-AS 4435, best
Extended Community: RT:5551:1001 ET:8
AddPath ID: RX 0, TX 1263
Last update: Wed Aug 15 20:53:10 2018
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The neigh update can come prior to mac add update.
In this case, the mac will be auto created for the vni.
set router flag to local neigh update for mac with auto flag.
The neigh update will be informed to bgpd once local mac is learnt.
Unset router flag if the neigh update comes without the router flag
for an existing neigh entry.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Enhance the EVPN MAC and Neighbor cache display to show additional
information such as the mobility sequence numbers and the state.
Ensure that the neighbor state is set in a couple of places so
that the display is correct.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Implement procedures similar to what is specified in
https://tools.ietf.org/html/draft-malhotra-bess-evpn-irb-extended-mobility
in order to support extended mobility scenarios in EVPN. These are scenarios
where a host/VM move results in a different (MAC,IP) binding from earlier.
For example, a host with an address assignment (IP1, MAC1) moves behind a
different PE (VTEP) and has an address assignment of (IP1, MAC2) or a host
with an address assignment (IP5, MAC5) has a different assignment of (IP6,
MAC5) after the move. Note that while these are described as "move" scenarios,
they also cover the situation when a VM is shut down and a new VM is spun up
at a different location that reuses the IP address or MAC address of the
earlier instance, but not both. Yet another scenario is a MAC change for an
attached host/VM i.e., when the MAC of an attached host changes from MAC1 to
MAC2. This is necessary because there may already be a non-zero sequence
number associated with MAC2. Also, even though (IP, MAC1) is withdrawn before
(IP, MAC2) is advertised, they may propagate through the network differently.
The procedures continue to rely on the MAC mobility extended community
specified in RFC 7432 and already supported by the implementation, but
augment it with a inheritance mechanism that understands the relationship
of the host MACIP (ARP/neighbor table entry) to the underlying MAC (MAC
forwarding database entry). In FRR, this relationship is understood by the
zebra component which doubles as the "host mobility manager", so the MAC
mobility sequence numbers are determined through interaction between bgpd
and zebra.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When a host moves and is locally reachable, if the local neighbor event
is received before the local MAC event, flag the neighbor as inactive
just as would happen in the case of a new host. This ensures that the
MACIP route will get originated as soon as the local MAC event is got.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
There is no need to check for failure of a ALLOC call
as that any failure to do so will result in a assert
happening. So we can safely remove all of this code.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
show evpn mac vni all
show evpn mac vni x
does not display local svi and anycast mac into count.
Ticket:CM-20456
Testing Done:
Before:
TOR1# show evpn mac vni 1008
Number of MACs (local and remote) known for this VNI: 4
MAC Type Intf/Remote VTEP VLAN
44:38:39:00:6b:4c local vlan1008 1008
00:02:00:00:00:04 local hostbond5 1008
00:02:00:00:00:02 local hostbond4 1008
00:00:5e:00:01:01 local vlan1008-v0 1008
00:02:00:00:00:0c remote 27.0.0.15
00:02:00:00:00:0a remote 27.0.0.15
dell-s6000-07#
After:
TOR1# show evpn mac vni 1008
Number of MACs (local and remote) known for this VNI: 6
MAC Type Intf/Remote VTEP VLAN
44:38:39:00:6b:4c local vlan1008 1008
00:02:00:00:00:04 local hostbond5 1008
00:02:00:00:00:02 local hostbond4 1008
00:00:5e:00:01:01 local vlan1008-v0 1008
00:02:00:00:00:0c remote 27.0.0.15
00:02:00:00:00:0a remote 27.0.0.15
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
EVPN ND ext community support NA flag R-bit, to have proxy ND.
Set R-bit in EVPN NA if a given router is default gateway or there is a
local
router attached, which can be determine based on local neighbor entry.
Implement BGP ext community attribute to generate and parse R-bit and
pass along zebra to program neigh entry in kernel.
Upon receiving MAC/IP update with community type 0x06 and sub_type 0x08,
pass the R-bit to zebra to program neigh entry.
Set NTF_ROUTER in neigh entry and inform kernel to do proxy NA for EVPN.
Ref:
https://tools.ietf.org/html/draft-ietf-bess-evpn-na-flags-01
Ticket:CM-21712, CM-21711
Reviewed By:
Testing Done:
Configure Local vni enabled L3 Gateway, which would act as router,
checked
show evpn arp-cache vni x ip <ip of svi> on originated and remote VTEPs.
"Router" flag is set.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
SVI interface ip/hw address is advertised by the GW VTEP (say TORC11) with
the default-GW community. And the rxing VTEP (say TORC21) installs the GW
MAC as a dynamic FDB entry. The problem with this is a rogue packet from a
server with the GW MAC as source can cause a station move resulting in
TORC21 hijacking the GW MAC address and blackholing all inter rack traffic.
Fix is to make the GW MAC "sticky" pinning it to the GW VTEP (TORC11). This
commit does it by installing the FDB entry as static if the MACIP route is
received with the default-GW community (mimics handling of
mac-mobility-with-sticky community)
Sample output with from TORC12 with TORC11 setup as gateway -
root@TORC21:~# net show evpn mac vni 1004 mac 00:00:5e:00:01:01
MAC: 00:00:5e:00:01:01
Remote VTEP: 36.0.0.11 Remote-gateway Mac
Neighbors:
45.0.4.1
fe80::200:5eff:fe00:101
2001:fee1:0:4::1
root@TORC21:~# bridge fdb show |grep 00:00:5e:00:01:01|grep 1004
00:00:5e:00:01:01 dev vx-1004 vlan 1004 master bridge static
00:00:5e:00:01:01 dev vx-1004 dst 36.0.0.11 self static
root@TORC21:~#
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Ticket: CM-21508
New version of clang are detecting function parameters that we should
not be casting as such. Fix these issues.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we have a host prefix, actually free the alloced memory
associated with it when we free it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
* Add centralized thread scheduling dispatchers for client threads and
the main thread
* Rename everything in zserv.c to stop using a combination of:
- zebra_server_*
- zebra_*
- zserv_*
Everything in zserv.c now begins with zserv_*.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The neighbor host_list is expensive as well. Modify
the code to take advantage of a rb_tree as well.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We are going to modify more host_list's to host_rb's
so let's rename some functions to take advantage of
what is there.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The host_list when we attempt to use it at scale, ends
up spending a non-trivial amount of time finding and
sorting entries for the host list. Convert to a rb tree.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We have a command to enable symmetric routing only for type-5 routes.
This command is provided under vrf <> option in zebra as follows:
vrf <VRF>
vni <VNI> [prefix-routes-only]
We need the corresponding no version of the command as well as follows:
vrf <VRF>
no vni <VNI> [prefix-routes-only]
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
For ipv6 host, the next hop is conevrted to ipv6 mapped address.
However, the remote rmac should still be programmed with the ipv4 address.
This is how the entries will look in the kernel for ipv6 hosts routing.
vrf routing table:
ipv6 -> ipv6_mapped remote vtep on l3vni SVI
neigh table:
ipv6_mapped remote vtep -> remote RMAC
bridge fdb:
remote rmac -> ipv4 vtep tunnel
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Ensure that when EVPN routes are installed into zebra, the router MAC
is passed per next hop and appropriately handled. This is required for
proper multipath operation.
Ticket: CM-18999
Reviewed By:
Testing Done: Verified failed scenario, other manual tests
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
EVPN owns the remote neigh entries which are programed in the kernel.
This entries should not age out and the only way to delete should be
from EVPN. We should program these entries with NUD_NOARP instead of
NUD_REACHABLE to avoid aging of this macs.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
There can be a race condition between kernel and frr as follows.
Frr sends remote neigh notification.
At the (almost) same time kernel might send a notification saying
neigh is local.
After processing this notifications, the state in frr is local while
state in kernel is remote. This causes kernel and frr to be out of sync.
This problem will be avoided if FRR acts on the kernel notifications for
remote neighbors. When FRR sees a remote neighbor notification for a
neighbor which it thinks is local, FRR will change the neigh state to remote.
Ticket: CM-19923/CM-18830
Review: CCR-7222
Testing: Manual
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
[zebra/zebra_vxlan.c:5779] -> [zebra/zebra_vxlan.c:5778]:
(warning) Either the condition 'if(svi_if_zif&&svi_if_link)'
is redundant or there is possible null pointer dereference: svi_if_zif.
Signed-off-by: Ilya Shipitsin <chipitsine@gmail.com>
The following types are nonstandard:
- u_char
- u_short
- u_int
- u_long
- u_int8_t
- u_int16_t
- u_int32_t
Replace them with the C99 standard types:
- uint8_t
- unsigned short
- unsigned int
- unsigned long
- uint8_t
- uint16_t
- uint32_t
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Nobody uses it, but it's got the same definition. Move the parser
function into zclient.c and use it.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Group send and receive functions together, change handlers to take a
message instead of looking at ->ibuf and ->obuf, allow zebra to read
multiple packets off the wire at a time.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
A lot of the handler functions that are called directly from the ZAPI
input processing code take different argument sets where they don't need
to. These functions are called from only one place and all have the same
fundamental information available to them to do their work. There is no
need to specialize what information is passed to them; it is cleaner and
easier to understand when they all accept the same base set of
information and extract what they need inline.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
All of the ZAPI message handlers return an integer that means different
things to each of them, but nobody ever reads these integers, so this is
technical debt that we can just eliminate outright.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Asymmetric routing is an ideal choice when all VLANs are cfged on all leafs.
It simplifies the routing configuration and
eliminates potential need for advertising subnet routes.
However, we need to reach the Internet or global destinations
or to do subnet-based routing between PODs or DCs.
This requires EVPN type-5 routes but those routes require L3 VNI configuration.
This task is to support EVPN type-5 routes for prefix-based routing in
conjunction with asymmetric routing within the POD/DC.
It is done by providing an option to use the L3 VNI only for prefix routes,
so that type-2 routes (host routes) will only use the L2 VNI.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
is_vni_l3 was removed as a part of PR1700. However, it seems to be used in master.
Causing the breakage. Made the changes to not use the API anymore.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
removed an additional field 'local-tunnel-ip' from l2vnis o/p
Ticket: CM-19670
Review: CCR-7167
Testing: Verified that the output is proper
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Currently, while processing kernel messages related to VNIs
we first check if VNI is L3 - this is a hash lookup
later, we do the lookup again to find the L3-VNI.
This is non-optimal.
Made changed to make sure we only do the lookup once.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Refine the notion of what FRR considers as "configured" VRF. It is no longer
based on user just typing "vrf FOO" but when something is actually configured
against that VRF. Right now, in zebra, the only configuration against a VRF
are static IP routes and EVPN L3 VNI. Whenever a configuration is removed,
check and clear the "configured" flag if there is no other configuration for
this VRF. When user attempts to configure a static route and the VRF doesn't
exist, a VRF is created; the VRF is only active when also defined in the
kernel.
Updates: 8b73ea7bd479030418ca06eef59d0648d913b620
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Ticket: CM-10139, CM-18553
Reviewed By: CCR-7019
Testing Done:
1. Manual testing for L3 VNI and static routes - FRR restart, networking
restart etc.
2. 'vrf' smoke
<DETAILED DESCRIPTION (REPLACE)>
Only check on L3-VNI SVI status when uninstalling remote next hops.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Ticket: CM-19036
Reviewed By: None
Testing Done:
1. Networking restart
2. VxLAN interface disable/enable
3. VRF delete and readd
A VRF is active only when the corresponding VRF device is present in the
kernel. However, when the kernel VRF device is removed, the VRF container in
FRR should go away only if there is no user configuration for it. Otherwise,
when the VRF device is created again so that the VRF becomes active, FRR
cannot take the correct actions. Example configuration for the VRF includes
static routes and EVPN L3 VNI.
Note that a VRF is currently considered to be "configured" as soon as the
operator has issued the "vrf <name>" command in FRR. Such a configured VRF
is not deleted upon VRF device removal, it is only made inactive. A VRF that
is "configured" can be deleted only upon operator action and only if the VRF
has been deactivated i.e., the VRF device removed from the kernel. This is
an existing restriction.
To implement this change, the VRF disable and delete actions have been modified.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Mitesh Kanjariya <mkanjariya@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Ticket: CM-18553, CM-18918, CM-10139
Reviewed By: CCR-7022
Testing Done:
1. vrf and pim-vrf automation tests
2. Multiple VRF delete and readd (ifdown, ifup-with-depends)
3. FRR stop, start, restart
4. Networking restart
5. Configuration delete and readd
Some of the above tests run in different sequences (manually).
Kernel can delete a frr installed remote RMAC on a L3-VNI.
We should re-add if such a siatuation occurs
as we are the owner of the RMAC.
This behavor is same for remote MACs as well and was missing for RMACs.
Ticket: CM-18762
Review: CCR-6992
Testing: Manual
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
0. move all global EVPN details to 'show evpn [json]' command
1. change "VRF" to "Tenant VRF" in 'show evpn vni'
2. change 'show vrf vni' command to tabular form
and add l3-vni related params to the output
3. show evpn rmac should show refcount only in detailed output
4. show evpn next-hop should show refcount only in detailed output
5. move VRF in 'show evpn l3vni' to the end
6. add num rmacs and num nexthops to show evpn l3vni
7. remove "info" from 'show bgp vrf <> l3vni info'
8. show evpn vni <vni> should show l2vni details or l3 vni details
9. show evpn vni should show both L2 and L3 VNIs
10. show bgp l2vpn evpn - shows all global bgp l2vpn evpn details
11. show bgp l2vpn evpn vni - will show both l2 and l3 vnis
12. show bgp l2vpn evpn vni - should show both l2 and l3 vnis
13. follow camel notation for all json keys
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
In EVPN symmetric routing, not all subnets are presents everywhere.
We have multiple scenarios where a host might not get learned locally.
1. GARP miss
2. SVI down/up
3. Silent host
We need a mechanism to resolve such hosts. In order to achieve this,
we will be advertising a subnet route from a box and that box will help
in resolving the ARP to such hosts.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
1. Added default gw extended community
2. code modification to handle sticky-mac/default-gw-mac as they go together
3. show command support for newly added extended community
4. State in zebra to reflect if a mac/neigh is default gateway
5. show command enhancement to refelect the same in zebra commands
Ticket: CM-17428
Review: CCR-6580
Testing: Manual
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
The function zserv_create_header was exactly the same
as zclient_create_header. Let's just have one in the
system.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
EVPN is only enabled when user configures advertise-all-vni.
All VNIs (L2 and L3) should be cleared upon removal of this config.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
For EVPN type-5 route the NH in the NLRI is set to the local tunnel ip.
This information has to be obtained from kernel notification.
We need to pass this info from zebra to bgp in l3vni call flow.
This patch doesn't handle the tunnel-ip change.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
When a remote VTEP next hop entry (for symmetric routing) becomes
stale, reinstall it. This makes the behavior the same as what is
done for remote host next hops (for asymmetric routing and ARP
suppression).
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Upon a l3vni delete (no vni under a vrf) is executed,
we should uninstall all the RMACs and NHs associated with the l3vni.
This is because by the time we get a route delete in zebra
l3vni is already deleted and we dont have refernce to RMACs and NHs
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
zebra_find_client needs to match on instance as well so
protocols like ospfd will work correctly for notification.
Modify the zebra_find_client code to accept the instance
number and to pass it in appropriately.
Signed-off-by: Doanld Sharp <sharpd@cumulusnetworks.com>
This code modifies zebra to use the STREAM_GET functionality.
This will allow zebra to continue functioning in the case of
bad input data from higher level protocols instead of crashing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This improves code readability and also future-proofs our codebase
against new changes in the data structure used to store interfaces.
The FOR_ALL_INTERFACES_ADDRESSES macro was also moved to lib/ but
for now only babeld is using it.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
This is an important optimization for users running FRR on systems with
a large number of interfaces (e.g. thousands of tunnels). Red-black
trees scale much better than sorted linked-lists and also store the
elements in an ordered way (contrary to hash tables).
This is a big patch but the interesting bits are all in lib/if.[ch].
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
MAC entries are internally created for purposes such as when a local
neighbor is learnt but the MAC itself is not yet learnt. Such MACs are
not "real", so ensure they are not counted for UI output.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Ticket: CM-17991
Reviewed By: None
Testing Done: Manual, evpn-smoke
Fix following flaws that resulted in EVPN with L3 multi-tenancy (i.e.,
EVPN dealing with VxLAN routing in the presence of tenant VRFs) not
working properly:
1. EVPN enable ("advertise-all-vni") is a global command, ensure it is
accordingly processed. The config is maintained against the default VRF.
2. There was an incorrect attempt to derive the L3 VRF for L2 interfaces
- the VRF only applies for L3 interfaces, though the code may initialize
to the default value in other cases.
3. Functions to map (port, VLAN) to SVI or vice versa were incorrect -
particularly, zvni_map_svi() since it was looking in the L3 VRF for
"matching" L2 interface which it would never find. Fix.
In addition, since the 'zebra_vrf *' parameter is not relevant in most
places, it has been removed.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ticket: CM-17840
Reviewed By: CCR-6685
Testing Done: evpn-smoke, various manual tests
Convert the list_delete(struct list *) function to use
struct list **. This is to allow the list pointer to be nulled.
I keep running into uses of this list_delete function where we
forget to set the returned pointer to NULL and attempt to use
it and then experience a crash, usually after the developer
has long since left the building.
Let's make the api explicit in it setting the list pointer
to null.
Cynical Prediction: This code will expose a attempt
to use the NULL'ed list pointer in some obscure bit
of code.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
1) Various socket close issues
2) Ensure afi passed is usable
3) Fix some reads beyond buffer and reads after free
4) Ensure some failure modes are handled properly
5) Memory Leak(s) fix
6) There is no 6.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Frr has an assumption that when interface A links to B,
we already know about B. But that might be true always.
It is probably purely depends on the configuration
and how the interfaces are hashed in Kernel.
FRR seems to sometimes get "A is linked to B" before it knows about B,
in that case, the linkage between the data structure for A & B won't be proper.
Ticket: CM-17679
Review: ccr-6628
Testing: Manual
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
NUD_STALE flag is causing a build breakage,
we might have to define it somewhere in frr.
Reverting the fix for now untill we decide how to handle it correctly.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
When the MAC changes for a local neighbor, ensure that the neighbor data
structure as well as the link between the neighbor and MAC data structures
is updated correctly.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ticket: CM-17565
Reviewed By: CCR-6605
Testing Done: Manual, evpn-smoke
If we get an ageout notification from the kernel for EVPN-installed
neighbors, ensure that they are readded. Otherwise, while entries in
STALE state are usable, based on other kernel parameters they can
get deleted and adding them back only at delete can have other
undesirable performance consequences.
Note: This is the current Linux kernel behavior (to ageout EVPN
installed neighbors).
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Ticket: CM-15623, CM-17490
Reviewed By: CCR-6586
Testing Done: Manual, evpn-min
When multiple events are happening, it is possible that remote
MACIP or other requests may be received when an interface is down
or removed from a bridge. Handle this correctly.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Until now, we had to delete the local mac entries when a mac moved from local to remote,
with the new kernel patch that is no longer necessary.
Ticket:CM-16094
Reviewed By:CCR-6470
Testing Done: Manual
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Currently, FRR does not do any linking between local MACs and neighbors.
We found this necessary when dealing with centralized GW. A neigh is considered local only when the mac is learnt locally as well.
Ticket: CM-16544
Review: CCR-6388
Unit-test: Manual/Evpn-Smoke
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>