Add the local and remote sequence number to the `show evpn arp-cache vni XX` command.
VNI 1000111 #ARP (IPv4 and IPv6, local and remote) 15
IP Type State MAC Remote VTEP Seq #'s
fe80::202:ff:fe00:15 remote active 00:02:00:00:00:15 6.0.0.31 0/0
fe80::202:ff:fe00:8 local active 00:02:00:00:00:08 0/0
60.1.1.111 local active 00:02:00:00:00:08 0/0
2060:1:1:1::11 local active 00:e0:ec:38:49:a1 0/0
fe80::202:ff:fe00:11 remote active 00:02:00:00:00:11 6.0.0.30 0/0
2060:1:1:1::211 remote active 00:02:00:00:00:11 6.0.0.30 0/0
2060:1:1:1::121 remote active 00:02:00:00:00:0c 6.0.0.29 0/0
60.1.1.211 remote active 00:02:00:00:00:11 6.0.0.30 0/0
fe80::202:ff:fe00:c remote active 00:02:00:00:00:0c 6.0.0.29 0/0
60.1.1.11 local active 00:e0:ec:38:49:a1 0/0
fe80::2e0:ecff:fe38:49a1 local active 00:e0:ec:38:49:a1 0/0
60.1.1.221 remote active 00:02:00:00:00:15 6.0.0.31 0/0
2060:1:1:1::111 local active 00:02:00:00:00:08 0/0
2060:1:1:1::221 remote active 00:02:00:00:00:15 6.0.0.31 0/0
60.1.1.121 remote active 00:02:00:00:00:0c 6.0.0.29 0/0
The seq numbers are at 0/0 because we have had no mobility events.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
the interface search done was not looking in the appropriate zns. The
display was then wrong. Update the show command with the correct zns.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Say, more than one sequence of a route-map uses the same named entity
in its match clause. After that entity is removed from any one of the
route-map sequences, any further changes made to that entity doesn't
dynamically take effect.
A reference counter, that allows the named entity to keep a count of
the route-maps dependent on it, has been introduced to address this issue.
Signed-off-by: NaveenThanikachalam <nthanikachal@vmware.com>
When you have compiled FRR with a large multipath number
then encoding large ecmp routes between zebra and the
routing daemons. There exists a theoritical size
of multipath that will cause the encoding to be larger
than the ZEBRA_MAX_PACKET_SIZ. In the cases where
we have allocated streams that will encode routes
then let's ensure that whatever size we have will
auto-fit what we say we can send.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Introducing a 3rd state for route_map_apply library function: RMAP_NOOP
Traditionally route map MATCH rule apis were designed to return
a binary response, consisting of either RMAP_MATCH or RMAP_NOMATCH.
(Route-map SET rule apis return RMAP_OKAY or RMAP_ERROR).
Depending on this response, the following statemachine decided the
course of action:
Action: Apply route-map match and return the result (RMAP_MATCH/RMAP_NOMATCH)
State1: Receveived RMAP_MATCH
THEN: If Routemap type is PERMIT, execute other rules if applicable,
otherwise we PERMIT!
Else: If Routemap type is DENY, we DENYMATCH right away
State2: Received RMAP_NOMATCH, continue on to next route-map, otherwise,
return DENYMATCH by default if nothing matched.
With reference to PR 4078 (https://github.com/FRRouting/frr/pull/4078),
we require a 3rd state because of the following situation:
The issue - what if, the rule api needs to abort or ignore a rule?:
"match evpn vni xx" route-map filter can be applied to incoming routes
regardless of whether the tunnel type is vxlan or mpls.
This rule should be N/A for mpls based evpn route, but applicable to only
vxlan based evpn route.
Today, the filter produces either a match or nomatch response regardless of
whether it is mpls/vxlan, resulting in either permitting or denying the
route.. So an mpls evpn route may get filtered out incorrectly.
Eg: "route-map RM1 permit 10 ; match evpn vni 20" or
"route-map RM2 deny 20 ; match vni 20"
With the introduction of the 3rd state, we can abort this rule check safely.
How? The rules api can now return RMAP_NOOP (or another enum) to indicate
that it encountered an invalid check, and needs to abort just that rule,
but continue with other rules.
Question: Do we repurpose an existing enum RMAP_OKAY or RMAP_ERROR
as the 3rd state (or create a new enum like RMAP_NOOP)?
RMAP_OKAY and RMAP_ERROR are used to return the result of set cmd.
We chose to go with RMAP_NOOP (but open to ideas),
as a way to bypass the rmap filter
As a result we have a 3rd state:
State3: Received RMAP_NOOP
Then, proceed to other route-map, otherwise return RMAP_PERMITMATCH by default.
Signed-off-by:Lakshman Krishnamoorthy <lkrishnamoor@vmware.com>
The multicast mode enum was a global static in zebra_rib.c
it does not belong there, it belongs in zebra_router, moving.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
BGP always sends down the correct distance to use. We do
not need rib_add_multipath to double check the code.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Since these functions are not really rib processing problems
let's move them to zebra_nhg.c which is meant for processing of
nexthop groups.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Allow route notifications to trigger route state changes,
such as installed -> not installed.
Clean up the fib-specific nexthop-group in a couple of
un-install paths.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Use some common handling for both route update results
processing and dataplane notification processing. Use the
fib-specific nexthop-group if the update to a route results
in different nexthop status than the default rib-provided
nexthop-group.
Use the fib-specific nexthop-group, if present, to provide
the output of 'show ip fib'.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Some updates may be the result of a plugin's actions - such
as an async notification. Add accessor so that we can
identify that an update was generated by a plugin.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
When setting route nexthops' installation state based on a
dataplane context struct, unset the installed state if a
nexthop was not present in the dataplane context.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Create a helper api that locates a zebra route-node from info
in a dplane context struct. Moved code from the results handler
to make a more-general api that could be used in other paths.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add an api to update the status of a route based on info
from a dplane context object. Use the api when processing
route update results from the dataplane.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add a callback called at start time, once the dplane pthread
and thread_master are available. The callback is optional.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
This code doees this:
a) Imagine ospf installs a route into zebra. Zebra crashes and
we restart FRR. If we are using the -k option on zebra than
all routes are re-read in, including this OSPF route.
b) Now imagine at the same time that zebra is starting backup
ospf on a different router looses a link to the this route.
c) Since zebra was run with -k this OSPF route is read back
in but never replaced and we now have a route pointing out
an interface to other routers that cannot handle it.
We should never allow users to implement bad options from zebra's
perspective that allow them to put themselves into a clear problem
state and additionally we have *absolutely* no mechanism to ever
fix that broken route without special human interaction.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
<Initial Code from Praveen Chaudhary>
Add the a `--graceful_restart X` flag to zebra start that
now creates a timer that pops in X seconds and will go
through and remove all routes that are older than startup.
If graceful_restart is not specified then we will just pop
a timer that cleans everything up immediately.
Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Allow label ignoring when comparing nexthops. Specifically,
add another functon nexthop_same_no_labels() that shares
a path with nexthop_same() but doesn't check labels.
rib_delete() needs to ignore labels in this case.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The functions nexthop_same() does not check the resolved
nexthops so I don't think this function is even needed
anymore.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
This is necessary to avoid a name collision with std::for_each
from C++.
Fixes the compilation of the gRPC northbound module.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
- Today, rtm_table field takes a vrf_id. It should take table_id
- rtm_table field is a uchar field which can only accomodate table_id less than
256. To support table id greater than 255, if the table_id is greater than 255,
set rtm_table to 0 and add RTA_TABLE attribute with 32 bit value as the
table_id.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
- For data plane processing of VxLAN routes, add encap type and L3VNI info to
rtmsg message for FPM.
- Add "RTA_ENCAP_TYPE" attribute for VxLAN encap with value 100.
This value is not currently used for RTA_ENCAP_TYPE for any encap.
- If "RTA_ENCAP_TYPE" is 100, add "RTA_ENCAP" attribute with "RTA_VNI" as a
nested attribute of RTA_ENCAP
Format of RTA_VNI attribute:
Len(2 bytes) type (2 bytes) Value(4 bytes)(VNI)
00 08 : 00 00 : 1000
RTA_VNI attribute is a custom attribute.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
VRRP doesn't install any routes, but should still have an array entry.
Also add a help string for VRRP to route_types.txt
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
We were running into some problems where VRRP is trying to protodown
interfaces that no longer exist. While this is a minor bug in its own
right, this was crashing Zebra because Zebra was not doing a null check
after its ifindex lookup.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Add a check for after table lookup during route map update.
If the table ID does not exist, continue.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The multipath_num global variable was moved into
zrouter.multipath_num but this particular usage
of it was not updated.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
It doesn't make much sense for a hash function to modify its argument,
so const the hash input.
BGP does it in a couple places, those cast away the const. Not great but
not any worse than it was.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
bfd cbit is a value carried out in bfd messages, that permit to keep or
not, the independence between control plane and dataplane. In other
words, while most of the cases plan to flush entries, when bfd goes
down, there are some cases where that bfd event should be ignored. this
is the case with non stop forwarding mechanisms where entries may be
kept. this is the case for BGP, when graceful restart capability is
used. If BFD event down happens, and bgp is in graceful restart mode, it
is wished to ignore the BFD event while waiting for the remote router to
restart.
The changes take into account the following:
- add a config flag across zebra layer so that daemon can set or not the
cbit capability.
- ability for daemons to read the remote bfd capability associated to a bfd
notification.
- in bfdd, according to the value, the cbit value is set
- in bfdd, the received value is retrived and stored in the bfd session
context.
- by default, the local cbit announced to remote is set to 1 while
preservation of the local path is not set.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Make the RIB_*_ROUTE() macro which is passed a route in rib.h just use
the R*_ROUTE() macros that directly check the type in rt.h.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The CLI code ensures that the clippy code produces
valid input for the zebra_routemap.c functions, but
coverity SA does not understand this fact. So add
some asserts to make the coverity SA happy.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
No need to check for non-null ctx at this point in the
function as that it has already been derefed.
Signed-off-by: donald Sharp ,sahrpd@cumulusnetworks.com>
The re->uptime usage of time(NULL) leaves it open to
timing changes from outside influence. Switching
to monotime allows us to ensure that we have a timestamp
that is always increasing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The route_map_event_hook callback was passing the `route_map_event_t`
to each individual interested party. No-one is ever using this data
so let's cut to the chase a bit and remove the pass through of data.
This is considered ok in that the routemap.c code came this way
originally and after 15+ years no-one is using this functionality.
Nor do I see any `easy` way to do anything useful with this data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
1. If prefix not found, print "{}" for json
2. Print "Network not in table" for route option
3. Print "Network not in FIB" for fib option
4. Take care of "show ip route/fib vrf all prefix" command.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
if the local sticky mac delete request is received,
if there are associated neighbor entries present, mac's
only local flag is removed and marked as auto mac.
this results in next local mac learning automatically assumes
mac is sticky.
There is a case when bridge learning off is configured, user
configures sticky mac via bridge fdb add.
This MAC learns associated neighbor entry.
Later user deletes stick mac via bridge fdb del, this triggers
frr to delete mac but if there are neighbors present, frr marks
MAC as AUTO but does not remove sticky flag.
User enables bridge learning on which triggers
The mac to learn as dynamic entry and in absence of this
fix, the mac is marked as sticky.
Ticket:CM-24968
Reviewed By:CCR-8683
Testing Done:
Validated broken condition with internally reproduction
with fix and without.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
According to the review comments, added "Network not in FIB" message when we do
not have a FIB route present for given prefix.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
With the previous commit, the zebra_router_score_proto function
became unnecessary, so let us remove it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
For each table created by a vrf, keep track of it and
allow for proper cleanup on shutdown of that particular
table. Cleanup client shutdown to only cleanup data
that the particular vrf owns. Before we were cleaning
the same table 2 times.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Combine the zebra_vrf_other_route_table and zebra_vrf_table_with_table_id
functions into 1 function. Since they are basically the same thing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
"show ip/ipv6 route <prefix> [json]" uses a different parser chain from
"show ip/ipv6 route [json]".
"show ip/ipv6 route <prefix> [json]" CLI does not support "fib" option.
Fix:
Add "fib" option to the above command.
The new command is: "show ip/ipv6 <route/fib> <prefix> [json]"
If "fib" option is specified, we will show only the selected routes
(Similar to "show ip/ipv6 fib")
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
messages from daemons to bfd daemons go through zebra. zebra reuses the
vrf identifier to send messages to bfd.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Try to remove any LSPs associated with a vrf when the vrf is
deleted. The vrf code was calling a helpful zebra_mpls api,
but that api was basically a no-op for vrfs other than
the default.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
This command is broken and has been broken since the introduction
of vrf's. Since no-one has complained it is safe to assume that
there is no call for this specialized linux command. Remove
from the system with extreme prejudice.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The rib_add( and rib_delete( functions are there to allow
kernel interactions with the creation of routes. Fixup the
code to be consistent in the passup of the tableid.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we switched to a pthread per client, we lost the ability to
correlate zapi message debugs with their handlers in zlog, because the
message was logged when it was read off the zapi socket and not right
before it was processed. Move the zapi msg hexdump to happen right
before we call the message handler.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The route_info[X].meta_q_map *must* be less than MQ_SIZE
or we will do some strange stuff, so assert on it at startup.
The distance in route_info is a uint8_t so let's keep the data
structure the same.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The ifp pointer must be pointing at a real location
in memory since right above us in this loop we
return if it is.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The route entry code was using a custom linked list to handle
route entries. Remove and replace with the new lib link list
code. This reduces the size of the route entry by a further
8 bytes.
Observant people will notice that the current linked list
implementation is singly linked, while the Route Entry
is doubly linked. I am not terribly concerned about this
change as that 1) we do not see a large number of route
entries per prefix( say 2 maybe 3 items ) and route entries
do not come and go that often.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The `struct rib_dest_t` was being used to store the linked
list of rnh's associated with the node. This was taking up
a bunch of memory. Replace with new data structure supplied
by David and see the memory reductions associated with 1 million
routes in the zebra rib:
Old:
Memory statistics for zebra:
System allocator statistics:
Total heap allocated: 675 MiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 567 MiB
Free small blocks: 39 MiB
Free ordinary blocks: 69 MiB
Ordinary blocks: 0
Small blocks: 0
Holding blocks: 0
New:
Memory statistics for zebra:
System allocator statistics:
Total heap allocated: 574 MiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 536 MiB
Free small blocks: 33 MiB
Free ordinary blocks: 4600 KiB
Ordinary blocks: 0
Small blocks: 0
Holding blocks: 0
`struct rnh` was moved to rib.h because of the tangled web
of structure dependancies. This data structure is used
in numerous places so it should be ok for the moment.
Future work might be needed to do a better job of splitting
up data structures and function definitions.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The `struct rib_dest_t` was being used to store the linked
list of rnh's associated with the node. This was taking up
a bunch of memory. Replace with new data structure supplied
by David and see the memory reductions associated with 1 million
routes in the zebra rib:
Old:
Memory statistics for zebra:
System allocator statistics:
Total heap allocated: 675 MiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 567 MiB
Free small blocks: 39 MiB
Free ordinary blocks: 69 MiB
Ordinary blocks: 0
Small blocks: 0
Holding blocks: 0
New:
Memory statistics for zebra:
System allocator statistics:
Total heap allocated: 574 MiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 536 MiB
Free small blocks: 33 MiB
Free ordinary blocks: 4600 KiB
Ordinary blocks: 0
Small blocks: 0
Holding blocks: 0
`struct rnh` was moved to rib.h because of the tangled web
of structure dependancies. This data structure is used
in numerous places so it should be ok for the moment.
Future work might be needed to do a better job of splitting
up data structures and function definitions.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a function to check if the route_info array
has all types specified with data in it. Specifically,
test the 'key' attribute for non-zero data. Ignore
ZEBRA_ROUTE_SYSTEM as it should be zero key anyway.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
With flooding control added recently we were not properly handling
the new flood control parameter in zebra_vxlan.c handler functions.
The error message that was being repeatedly seen:
2019/05/01 00:47:32 ZEBRA: [EC 100663311] stream_get2: Attempt to get out of bounds
2019/05/01 00:47:32 ZEBRA: [EC 100663311] &(struct stream): 0x7f0f04001740, size: 22, getp: 22, endp: 22
The fix was to ensure that both the _add and _del functions kept proper
sizing of amount of data read *and* the _del function was not
reading the flood_control data from the stream.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a comment to indicate that route types added to
Zebra, should also be present in the route_info array.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add OpenFabric to the route_info array for handling processing
of the OpenFabric route type.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Problem reported that route-maps applied to "ip protocol table bgp"
would not be invoked if the ip protocol table command was issued
after the bgp prefixes were installed. Found that a recent change
improving how often nexthop_active_update runs missed causing this
filtering to be applied. This fix resolves that issue as well as
a couple of other places that were problematic with the recent
change.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
in the case the vrf backend is vrf-lite, there is no need to have
separate sockets. use a socket located in zrouter, so that when needing
the socket, a common API is used. that API will return the appropriate
socket value.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when network namespace is used as vrf backend, there is need to have
separate contexts for rtadv contexts.
route advertisements have to look for appropriate interface based on
zvrf context.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The alias/description of an interface in linux was being
used to override the internal description. As such let's
fix the display to keep track of both if we have it.
Config in FRR:
!
interface docker0
description another combination
!
interface enp3s0
description BAMBOOZLE ME WILL YOU
!
Config in linux:
sharpd@robot ~/f/zebra> ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
alias This is the loopback you cabbage
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 74:d0:2b:9c:16:eb brd ff:ff:ff:ff:ff:ff
alias HI HI HI
Now the 'show int descr' command:
robot# show int description
Interface Status Protocol Description
docker0 up down another combination
enp3s0 up up BAMBOOZLE ME WILL YOU
HI HI HI
lo up up This is the loopback you cabbage
Fixes: #4191
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Start using the dataplane for interface-address programming,
on netlink platforms. Other platforms just stubbed at this
point.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
These updates act as triggers to pimd to -
1. join the MDT for rxing VxLAN encapsulated BUM traffic
2. register the local-vtep-ip as a source for the MDT
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
An SG entry is added (if one doesn't already exist) when a l2-VNI is
associated with a mcast-grp and local-vtep-ip.
And viceversa; when the last l2-vni using a MDT is removed the SG
entry is deleted.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Based code for adding (S, G) entries. These entries are created when
a mcast-group and local-VTEP-IP is associated with and L2 VNI.
The parent (*, G) entries are created implicitly on the (S, G) addition
and play the role of termination entries.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Each multicast tunnel is associated with a -
1. Tunnel origination mroute that is used for forwarding the
VxLAN encapsulated flow -
S - local VTEP-IP
G - BUM mcast-group
2. And a tunnel termination entry -
S - * (any remote VTEP)
G - BUM mcast-group
Multiple L2 VNIs can share the same BUM mcast group (and local-VTEP-IP).
Zebra maintains an mcast (SG) hash table to pass this info to pimd for
subsequent MDT setup.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Remote VTEPs advertise the flood mode via IMET and the ingress VTEP
needs to perform head-end-replication of BUM packets to it only if the
PMSI tunnel type is set to ingress-replication. If a type-3 route is not
rxed or rxed with a mode other than ingress-replication we can skip
installation of the flood fdb entry for that L2-VNI. In that case the
remote VTEP is either not interested in BUM traffic or is using a
"static-config" based replication mode like PIM.
Sample output with HER -
=======================
root@TORS1:~# vtysh -c "show evpn vni 1000" |grep "Remote\|flood"
Remote VTEPs for this VNI:
27.0.0.8 flood: HER
root@TORS1:~#
Sample output with PIM-SM -
=========================
root@TORS2:~# vtysh -c "show evpn vni 1000" |grep "Remote\|flood"
Remote VTEPs for this VNI:
27.0.0.7 flood: -
root@TORS2:~#
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The multicast group ip address for BUM traffic is configurable per-l2-vni.
One way to configure that is to setup a vxlan device that per-l2-vni and
specify the address against that vxlan device -
root@TORS1:~# vtysh -c "show interface vx-1000" |grep -i vxlan
Interface Type Vxlan
VxLAN Id 1000 VTEP IP: 27.0.0.15 Access VLAN Id 1000 Mcast 239.1.1.100
root@TORS1:~# vtysh -c "show evpn vni 1000" |grep Mcast
Mcast group: 239.1.1.100
root@TORS1:~#
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Found that zebra_rnh_apply_nht_rmap would set the
NEXTHOP_FLAG_ACTIVE if not blocked by the route-map, even
if the flag was not active prior to the check. This fix
changes the flag used to denote the nexthop is filtered so
that proper active state can be retained. Additionally,
found two cases where we would send invalid nexthops via
send_client, which would also cause this crash. All three
fixed in this commit.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Update the nexthop flag output for the route entry dump to
include all possible flag states be output.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We currently run nexthop_active_check multiple times. Make the
code run once and figure out state from that.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The nexthop_active_update command looks at each individual
nexthop and decides if it has changed. If any nexthop
has changed we will set the re->status to ROUTE_ENTRY_CHANGED
and ROUTE_ENTRY_NEXTHOPS_CHANGED.
Additionally the test for old_nh_num != curr_active
makes no sense because suppose we have several events
we are processing at the same time and a total ecmp
of 16 but 14 are active at the start and 14 are active
at the end but different interfaces are up or down.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The NEXTHOP_FLAG_FILTERED went away when we started treating
static routes like every other route in the system. This was
a special case for handling static route code that just didn't
get finished cleaning up.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We are effectively calling nexthop_active_update() on every
route entry being processed for installation at least 2 times.
This is a bit ridiculous. We need to resolve the nexthops
when we know a route has changed in some manner, so do so.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
zlog() should be part of the public logging API as it's useful in
the cases where the logging priority isn't known at compile time
(i.e. it depends on a variable).
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
L3VNI configured in a specific VRF is allowed to unconfigure from any
VRF, including default (global) VRF. This results L3VNI delete notification
to BGP and subsequent type-5 route uninstall from the VRF the L3VNI belong to.
This also resulted in the inconsistent running configuration.
The deleted L3VNI still shows up in its original VRF. The VRF in which the
"no vni <x>" was executed doesn't display its own L3VNI.
Added a VRF check in zebra to prevent this.
Signed-off-by: Kishore Aramalla <karamalla@vmware.com>
When having a route recovery, because of the route installation
cycling and the next hop label check, it could happen that the PW
never gets recovered. The original code shows the intention of retrying,
but the code was missing. The fix includes the call to the timer programming
the recovery attempt.
Example for reproducing the issue:
|P1| <-> |P2| <-> |P3|
- Being P1, P2, P3 nodes, using IS-IS as IGP, and having a pseudowire
betwen P1 and P3 (P1, P2, P3 having configured LDP daemons).
- After 60 seconds, kill the IS-IS daemon in P2.
- Wait 30 seconds
- Launch again the IS-IS daemon in P2
- The bug/issue is that after P1 <-> P3 recovering connectivity sometimes
the PW is not recovered because the reason explained in the first paragraph.
Signed-off-by: F. Aragon <paco@voltanet.io>
In zebra terminate path, the node was attempted to remove
twice from the RB_TREE table. This lead to a crash during
zebra shutdown zebra_router_free_table already calls RB_REMOVE
to remove a node from rb tree table.
siginfo=0x7fffd9134a30, context=<optimized out>) at lib/sigevent.c:249
rbt=<optimized out>, t=<optimized out>) at lib/openbsd-tree.c:226
t=0x56296965ff50 <zebra_router_table_head_RB_INFO>) at lib/openbsd-tree.c:383
rbt=rbt@entry=0x562969669bd0 <zrouter+16>, elm=elm@entry=0x56296afcf810)
at lib/openbsd-tree.c:393
(elm=0x56296afcf810, head=0x562969669bd0 <zrouter+16>) at zebra/zebra_router.h:46
Singned-off-by: Chirag Shah <chirag@cumulusnetworks.com>
We were memsetting zebra_pbr_rule struct after
we had already put some information in it. Also updated
the init of the struct to use braces instead of a
memset.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The `show ipv[4|6] <nht|import-check> ...` commands are starting
to produce a bunch of output due to multiple daemons now
using the code. Allow the specification of a v4 or v6 address
to allow the show command to only display the interesting nht.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This fix covers the case where two or more events are processed but only one
becoming effective. E.g. when mixing a synchronous label request from a LDP
deamon and an asynchronous request from a BGP daemon it could happen to the
BGP having the label chunk, but the LDP stuck waiting for the response.
Given e.g.
ldpd <-------->
(sync label request)
Zebra (label proxy) <--> Zebra (shared label manager)
bgpd <-------->
(async label request)
Sequence:
LDP label request ----->
Zebra (label proxy FW) ----> Zebra (LM)
BGP label request ----->
Zebra (label proxy FW) ----> Zebra (LM)
<---- Zebra (LM) RP LDP
<---- Zebra (LM) RP BGP
Signed-off-by: F. Aragon <paco@voltanet.io>
We don't use th vrf-level VRF_RIB_SCHEDULED flag any longer;
remove it and collapse the zebra_vrf flags' values.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The current code path of registration does this:
a) Lookup or create the rnh
b) register the client with the rnh for callback
If this is a new rnh send a response to the client that
only includes the rnh data that it has (nothing so no path)
If this is a existing rnh send the actual path to the client,
if it exists.
c) If a new client or a flag has changed refigure and send result
to all clients.
This is problematic in that suppose the rnh is new. Clients
will receive two answers:
1) A call back with no nexthops
2) A call back with the resolved # of nexthops
Imagine pim who depends on nht to handle this, pim will create
a mroute( because it does a hard lookup of the rpf as it is registering
the nexthop ), then it will receive the first callback causing
it to tear down the mroute and then receive the second callback
causing it to put it right back.. This is obviously not very
good for mroutes.
This code moves the send to the new client till after the new
client has connected, thus only allowing one callback to the new
client with the actual answer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Routing protocols are allowed ( and even encouraged ) to modify
the flags that influence the nexthop tracking. As such when
we modify the tracking of a nexthop to go from, say, connected force
or not we must re-evaluate the nexthop and send the results
up to the interested parties.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
After we have evaluated the rnh for an import-check type
and we copy the re then we know that the state has changed
and we should be notifying the end user about it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
LSP processing was a zvrf flag based upon a connected route
coming or going. But this did not allow us to know
that we should do lsp processing other than after the meta-queue
processing was finished.
Eventually we moved meta-queue processing of do_nht_processing
to after the dataplane sent the main pthread some results.
This of course left us with a timing hole where if a connected
route came in and we received a data plane response *before*
the meta queue was processed we would not do the work as necessary.
Move the lsp processing to a flag off of the rib_dest_t. If it
is marked then we need to process lsps.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a detailed debugging command for NHT tracking and add
the detailed output to the log about why we make some decisions
that we are. I tried to model this like the rib processing
detailed debugs that we added a few months back.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Currently nexthop tracking is performed for all nexthops that
are being tracked after a group of contexts are passed back
from the data plane for post install processing.
This is inefficient and leaves us sending nexthop tracking
changes at an accelerated pace, when we think we've changed
a route. Additionally every route change will cause us
to relook at all nexthops we are tracking irrelevant if
they are possibly related to the route change or not.
Let's modify the code base to track the rnh's off of the rib
table's rn, `rib_dest_t`. So after we process a node, install
it into the data plane, in rib_process_result we can
look at the `rib_dest_t` associated with the rn and see that
a nexthop depended on this route node. If so, refigure it.
Additionally we will store rnh's that are not resolved on the
0.0.0.0/0 nexthop tracking list. As such when a route node
changes we can quickly walk up the rib tree and notice that
it needs to be reprocessed as well.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a default route_node for our routing tables. This will allow us
to know that we can hang data off the default route for processing.
We will be hanging the nexthop tracking data structures off the rib_dest_t
so that we can know which nexthops we need to handle. Effectively
nexthops that we are tracking that are unresolved will be stored on the
default route. When something changes in the rib tree we can
work up the rn->parent pointer checking for nexthops we need to re-evaluate.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The resolved_route is the prefix we are using in the routing table
to resolve this particular nexthop we are tracking. Add code
to better track it's change.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The prn value as passed in may be NULL as such do not
allow it to be derefed (even though it works now).
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We have several route types KERNEL and CONNECT that are handled via special
case in the code. This was causing a lot of work keeping the two different
classes of route types as special(SYSTEM OR NOT). Put the dplane
in charge of the code that sets the bits for signalling route install/failure.
This greatly simplifies the code calling path and makes all route types
be handled exactly the same. Additionaly code that we want to run
post data plane install can just work as per normal then, instead
of having to know we need to run it when we have a special type
of route.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com.
When we get a route install failure from the kernel, actually
indicate in the rib the status of the routes.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When switching routes from one route type to another actually
unset the old route as enqueued.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When shutting down, the individual vrf's own the shutdown of the table
and subsuquent removal from the routes from the kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When shutting down and we have a very large table to shutdown
and after we've intentionally closed all the client connections
close the zebra zserv client socket.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
It had no logical reason to be in the default VRF. This moves it to the
zebra_router, which is better suited to store global references.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
A lot of checks relied on the VRF ID and the EVPN VRF ID to be the same.
This patch changes those checks to the EVPN_ENABLED macro, which checks
if the VRF is the EVPN one.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
Fix the macros for reading NLA attribute info
from an extended error ack. We were processing the data
using route attributes (rtattr) which is identical in size
to nlattr but probably should not be used.
Further, we were incorrectly calculating the length of the
inner netlink message that cause the error. We have to read
passed that in order to access all the nlattr's.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
For a MAC-IP pair generally local/netlink msg for
MAC is received followed by Neigh. The MAC can be detected as duplicate
during this event.
When a neigh update is received, the neigh inherits DUP flag from its
MAC and along with that mark the neigh as INACTIVE.
Also, In the case of DUP detected neigh, do not update its state
to ACTIVE before determining to send notification to bgpd.
There is a time when Neigh update received prior to MAC update.
In that case neigh is marked as inactive since its MAC is
still in REMOTE state. Once the MAC update is received and
it is detected as DUPLICATE, the neigh would inherit DUP flag
but remained in inactive state.
By fixing the first case, the neigh remains in inactive once
detected as DUPLICATE in both scenarios.
The unfreeze action would mark all inherited neighs to ACTIVE,
and clears DUP flag then sends notification to bgpd (to send type-2).
Ticket:CM-24339
Reviewed By:CCR-8451
Testing Done:
Validated dup detection on both environment where neigh and mac
notification can come as either one first.
With the fix, the neigh was remained in "inactive" state
once detected as duplicate.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The 'sho ip route summary' and 'sho ip route summary <prefix>'
paths used different definitions of a 'fib' route. Use
the route-entry 'INSTALLED' flag in both places.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
This replaces manual checks of the flag with a wrapper macro to convey
the meaning "is evpn enabled on this vrf?"
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
Rename {bgp,zvrf}_def{ault} to {bgp,zvrf}_evpn where it makes sense,
i.e. when they contain the EVPN instance.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
Since the EVPN VRF may not be the default one, compare received
messages' VRF agains the EVPN VRF and not the Default.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
This uses the EPVN VRF to store L3VNIs hashes, and looks up L2VNIs in
this VRF as they are stored there.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
This sends local VNIs and local MAC addresses to the BGP instance
responsible for EVPN rather than the default one.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
Since the EVPN session and underlay can be in a non-default VRF, the
default VRF can be an overlay VRF.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
If the EVPN VRF is not the default one (i.e. with advertise-all-vni),
this allows showing its information with `show bgp l2evpn evpn ...`
commands. They do not require adding `vrf VRFNAME` since we only
support a single EVPN VRF. The same is true for zebra-specific commands
(e.g. `show evpn ...`).
Configuration commands are not restricted to the default VRF but to
the EVPN one, that is to the one bearing `advertise-all-vni`.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
The EVPN VRF is defined by bgpd, and is the one vrf where
`advertise-all-vni` is present.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
Duplicate address detection and recovery was relying on the l2-vni backptr
in the neighbor entry which was simply not initialized resulting in
a NULL pointer access in a setup with dup-addressed VMs -
VM1:{IP1,M1} and VM2:{IP1,M2}
Call stack:
(gdb) bt 6
at lib/sigevent.c:249
nbr=nbr@entry=0x559347f901d0, vtep_ip=..., vtep_ip@entry=..., do_dad=do_dad@entry=true,
is_dup_detect=is_dup_detect@entry=0x7ffc7f6be59f, is_local=is_local@entry=true)
at ./lib/ipaddr.h:86
ip=0x7ffc7f6be6f0, ifp=0x559347f901d0, zvni=0x559347f86800) at zebra/zebra_vxlan.c:3152
(More stack frames follow...)
(gdb) p nbr->zvni
$8 = (zebra_vni_t *) 0x0 <<<<<<<<<<<<<<<<<<<<
(gdb)
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
System Routes if received over the netlink bus in a
specific pattern that causes an update operation for that
route in zebra can leave the dest->selected_fib pointer NULL,
while having the ZEBRA_FLAG_SELECTED flag set. Specifically
one way to achieve this is to do this:
`ip addr del 4.5.6.7/32 dev swp1 ; ip addr add 4.5.6.7/32 dev swp1 metric 9`
Why is this a big deal?
Because nexthop tracking is looking at ZEBRA_FLAG_SELECTED to
know if we can use a route, while nexthop active checking uses
dest->selected_fib.
So imagine we have bgp registering a nexthop. nexthop tracking in
the above case will be able to choose the 4.5.6.7/32 route
if that is what the nexthop is, due to the ZEBRA_FLAG_SELECTED being
properly set. BGP then allows the peers connection to come up and we
install routes with a 4.5.6.7 nexthop. The rib processing for route
installation will then look at the 4.5.6.7 route see no
dest->selected_fib and then start walking up the tree to resolve
the route. In our case we could easily hit the default route and be
unable to resolve the route. Which then becomes inactive in the
rib so we never attempt to install it.
This commit fixes this problem because when the rib_process decides
that we need to update the fib( ie replace old w/ new ), the
replacement with new was not setting the `dest->selected_fib` pointer
to the new route_entry, when the route was a system route.
Ticket: CM-24203
Signed-off-by: Donald Sharp <sharpd@cumulusnetworkscom>
The dest->selected_fib should be reported in json output
so that we can debug subtle conditions a bit better in the
future.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Cleaup the rnh tables on shutdown before we cleanup tables. As that
this will remove any need to do rnh processing as part of shutdown.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we get a neighbor entry in zebra we start processing it.
Let's add some additional debugs to the processing so that when
it bails out and we don't use the data, we know the reason.
This should help in debugging the problems from why bgp does
not appear to have data associated with a neighbor entry
in the kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The check for an entry being NUD_PERMANENT has already been done
there is no need to do it twice.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Use const in the accessors for pseudowire nhlfe data; pull
that through the kernel-facing apis that use that data.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
In prep for adding nexthop info for pws, rename the accessor
for the pw destination. Add a nexthop-group to the pw
data in the dataplane module.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The current definition of an unnumberd interface as an interface with a
/32 IPv4 is too restrictive, especially for EVPN symmetric routing since
commit 2b83602b2 "*: Explicitly mark nexthop of EVPN-sourced routes as
onlink".
It removes the bypass check wether the nexthop is an EVPN VTEP, and
relies on the SVI to be unnumberd to bypass the gateway lookup. While
this works great if the SVI has an IP, it might not, and the test falls
flat and EVPN type 5 routes are not installed into the RIB.
Sample interface setup, where vxlan-blue is the L3VNI and br-blue the
SVI:
+----------+
| |
| vrf-blue |
| |
+---+--+---+
| |
+-------+ +-----------+
| |
+----+----+ +---------+---------+
| | | br1 |
| br-blue | | 10.0.0.1/24 |
| | +-+-------+-------+-+
+----+----+ | | |
| | | |
+-----+------+ +-----+--+ +--+---+ +-+----+
| | | | | | | |
| vxlan-blue | | vxlan1 | | eth1 | | eth2 |
| | | | | | | |
+------------+ +--------+ +------+ +------+
For inter-VNI routing, the SVI has no reason to have an IP, but it still
needs type-5 routes from remote VTEPs.
This commit expands the definition of an unnumberd interface to an
interface having a /32 IPv4 or no IPv4 at all.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
When a vrf is deleted we need to tell the zebra_router that we have
finished using the tables we are keeping track of. This will allow
us to properly cleanup the data structures associated with them.
This fixes this valgrind error found:
==8579== Invalid read of size 8
==8579== at 0x430034: zvrf_id (zebra_vrf.h:167)
==8579== by 0x432366: rib_process (zebra_rib.c:1580)
==8579== by 0x432366: process_subq (zebra_rib.c:2092)
==8579== by 0x432366: meta_queue_process (zebra_rib.c:2188)
==8579== by 0x48C99FE: work_queue_run (workqueue.c:291)
==8579== by 0x48C3788: thread_call (thread.c:1607)
==8579== by 0x48A2E9E: frr_run (libfrr.c:1011)
==8579== by 0x41316A: main (main.c:473)
==8579== Address 0x5aeb750 is 0 bytes inside a block of size 4,424 free'd
==8579== at 0x4839A0C: free (vg_replace_malloc.c:540)
==8579== by 0x438914: zebra_vrf_delete (zebra_vrf.c:279)
==8579== by 0x48C4225: vrf_delete (vrf.c:243)
==8579== by 0x48C4225: vrf_delete (vrf.c:217)
==8579== by 0x4151CE: netlink_vrf_change (if_netlink.c:364)
==8579== by 0x416810: netlink_link_change (if_netlink.c:1189)
==8579== by 0x41C1FC: netlink_parse_info (kernel_netlink.c:904)
==8579== by 0x41C2D3: kernel_read (kernel_netlink.c:389)
==8579== by 0x48C3788: thread_call (thread.c:1607)
==8579== by 0x48A2E9E: frr_run (libfrr.c:1011)
==8579== by 0x41316A: main (main.c:473)
==8579== Block was alloc'd at
==8579== at 0x483AB1A: calloc (vg_replace_malloc.c:762)
==8579== by 0x48A6030: qcalloc (memory.c:110)
==8579== by 0x4389EF: zebra_vrf_alloc (zebra_vrf.c:382)
==8579== by 0x438A42: zebra_vrf_new (zebra_vrf.c:93)
==8579== by 0x48C40AD: vrf_get (vrf.c:209)
==8579== by 0x415144: netlink_vrf_change (if_netlink.c:319)
==8579== by 0x415E90: netlink_interface (if_netlink.c:653)
==8579== by 0x41C1FC: netlink_parse_info (kernel_netlink.c:904)
==8579== by 0x4163E8: interface_lookup_netlink (if_netlink.c:760)
==8579== by 0x42BB37: zebra_ns_enable (zebra_ns.c:130)
==8579== by 0x42BC5E: zebra_ns_init (zebra_ns.c:208)
==8579== by 0x4130F4: main (main.c:401)
This can be found by: `ip link del <VRF DEVICE NAME>` then `ip link add <NAME> type vrf table X` again and
then attempting to use the vrf.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we install a new route into the kernel always use
REPLACE. Else if the route is already there it can
be translated into an append with the flags we are
using.
This is especially true for the way we handle pbr
routes as that we are re-installing the same route
entry from pbr at the moment.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ensure that the next hop's VRF is used for IPv4 and IPv6 unicast routes
sourced from EVPN routes, for next hop and Router MAC tracking and
install. This way, leaked routes from other instances are handled properly.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the case of EVPN symmetric routing, the tenant VRF is associated with
a VNI that is used for routing and commonly referred to as the L3 VNI or
VRF VNI. Corresponding to this VNI is a VLAN and its associated L3 (IP)
interface (SVI). Overlay next hops (i.e., next hops for routes in the
tenant VRF) are reachable over this interface. Howver, in the model that
is supported in the implementation and commonly deployed, there is no
explicit Overlay IP address associated with the next hop in the tenant
VRF; the underlay IP is used if (since) the forwarding plane requires
a next hop IP. Therefore, the next hop has to be explicit flagged as
onlink to cause any next hop reachability checks in the forwarding plane
to be skipped.
https://tools.ietf.org/html/draft-ietf-bess-evpn-prefix-advertisement
section 4.4 provides additional description of the above constructs.
Use existing mechanism to specify the nexthops as onlink when installing
these routes from bgpd to zebra and get rid of a special flag that was
introduced for EVPN-sourced routes. Also, use the onlink flag during next
hop validation in zebra and eliminate other special checks.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the case of EVPN symmetric routing, the tenant VRF is associated with
a VNI that is used for routing and commonly referred to as the L3 VNI or
VRF VNI. Corresponding to this VNI is a VLAN and its associated L3 (IP)
interface (SVI). Overlay next hops (i.e., next hops for routes in the
tenant VRF) are reachable over this interface.
https://tools.ietf.org/html/draft-ietf-bess-evpn-prefix-advertisement
section 4.4 provides additional description of the above constructs.
Use the L3 interface exchanged between zebra and bgp in route install.
This patch in conjunction with the earlier one helps to eliminate some
special code in zebra to derive the next hop's interface.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the case of EVPN symmetric routing, the tenant VRF is associated with
a VNI that is used for routing and commonly referred to as the L3 VNI or
VRF VNI. Corresponding to this VNI is a VLAN and its associated L3 (IP)
interface (SVI). Overlay next hops (i.e., next hops for routes in the
tenant VRF) are reachable over this interface.
https://tools.ietf.org/html/draft-ietf-bess-evpn-prefix-advertisement
section 4.4 provides additional description of the above constructs.
The implementation currently derives this L3 interface for EVPN tenant
routes using special code that looks at route flags. This patch
exchanges the L3 interface between zebra and bgpd as part of the L3-VNI
exchange in order to eliminate some this special code.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Running zebra after commit 888756b208
in valgrind produces this item:
==17102== Invalid read of size 8
==17102== at 0x44D84C: rib_dest_from_rnode (rib.h:375)
==17102== by 0x4546ED: rib_process_result (zebra_rib.c:1904)
==17102== by 0x45436D: rib_process_dplane_results (zebra_rib.c:3295)
==17102== by 0x4D0902B: thread_call (thread.c:1607)
==17102== by 0x4CC3983: frr_run (libfrr.c:1011)
==17102== by 0x4266F6: main (main.c:473)
==17102== Address 0x83bd468 is 88 bytes inside a block of size 96 free'd
==17102== at 0x4A35F54: free (vg_replace_malloc.c:530)
==17102== by 0x4CCAC00: qfree (memory.c:129)
==17102== by 0x4D03DC6: route_node_destroy (table.c:501)
==17102== by 0x4D039EE: route_node_free (table.c:90)
==17102== by 0x4D03971: route_node_delete (table.c:382)
==17102== by 0x44D82A: route_unlock_node (table.h:256)
==17102== by 0x454617: rib_process_result (zebra_rib.c:1882)
==17102== by 0x45436D: rib_process_dplane_results (zebra_rib.c:3295)
==17102== by 0x4D0902B: thread_call (thread.c:1607)
==17102== by 0x4CC3983: frr_run (libfrr.c:1011)
==17102== by 0x4266F6: main (main.c:473)
==17102== Block was alloc'd at
==17102== at 0x4A36FF6: calloc (vg_replace_malloc.c:752)
==17102== by 0x4CCAA2D: qcalloc (memory.c:110)
==17102== by 0x4D03D88: route_node_create (table.c:489)
==17102== by 0x4D0360F: route_node_new (table.c:65)
==17102== by 0x4D034F8: route_node_set (table.c:74)
==17102== by 0x4D03486: route_node_get (table.c:327)
==17102== by 0x4CFB700: srcdest_rnode_get (srcdest_table.c:243)
==17102== by 0x4545C1: rib_process_result (zebra_rib.c:1872)
==17102== by 0x45436D: rib_process_dplane_results (zebra_rib.c:3295)
==17102== by 0x4D0902B: thread_call (thread.c:1607)
==17102== by 0x4CC3983: frr_run (libfrr.c:1011)
==17102== by 0x4266F6: main (main.c:473)
==17102==
This is happening because of this order of events:
1) Route is deleted in the main thread and scheduled for rib processing.
2) Rib garbage collection is run and we remove the route node since it
is no longer needed.
3) Data plane returns from the deletion in the kernel and we call
the srcdest_rnode_get function to get the prefix that was deleted.
This recreates a new route node. This creates a route_node with
a lock count of 1, which we freed via the route_unlock_node call.
Then we continued to use the rn pointer. Which leaves us with use
after frees.
The solution is, of course, to just move the unlock the node at the
end of the function if we have a route_node.
Fixes: #3854
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Commit: 6005fe55bc
Introduced a crash with zebra looking up either the
nbr structure or the mac structure. This is because
the zvni used is NULL and we eventually call a hash_lookup
call that would cause a NULL dereference. Partially
revert this commit to original behavior.
Problems found via clang Static Analyzer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
L3VNI keeps reference to svi interface (ifp).
When a netlink change received there is no flag
that mac has changed. Currently simply overwrite
interface's (ifp) hw_addr (MAC) field.
For originating EVPN type-2 and type-5 routes due to VNI
MAC change, comparison is required to check existing MAC
vs. netlink change MAC field.
Ticket:CM-23850
Reviewed By:CCR-8283
Testing Done:
Validate EVPN type-5 routes originated upon changing MAC address
of L3VNI's SVI inteface via ip link set cmd.
checked show bgp l2vpn evpn route and Rmac field contains new
MAC address.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
FRR is reporting that a lsp deletion event as a failure
in the log messsages. This will lead to confusion and
lots of fun debugging.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
CLANG was throwing an error because struct rtadv_rdnss(dnssl) was being
initialized with = {0} instead of {}. Change to be the latter.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When we schedule a packet for future handling, list the packet
type so that we can see what we are getting with debugs.
Also note which client and how many packets we received from that
client.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In Asymmetric and symetric routing scenario in EVPN
where each VTEP pair having different set of addresses
for the SVIs.
This knob allows reachability (ping connectivity) of
SVI IPs and resolve ARP resoultion VTEPs across racks.
This knob should not be used when same SVI IPs configured
on VTEPs across racks or when advertise default gateway
is configured.
Ticket:CM-23782
Testing Done:
Bring up EVPN symmetric routing topology with different
SVI IPs on different VTEPs. Enable advertise svi ip
at each VTEP, remote VTEPs installs arp entry for
SVI IPs via EVPN type-2 route exchange.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
With the data plane changes that were made, we are now running
nexthop tracking 2 times. Once at the end of meta-queue insertion
and once at the end of receiving a bunch of data from the dataplane.
The Addition of the data plane code caused flags to not be set
fully for the resolved routes( since we do not know the answer yet ),
This in turn caused the nexthop tracking run after the meta-queue
to think that the route was not `good`. This would cause it to
tell all interested parties that there was no nexthop.
After the dataplane insertion we are also no running nht code.
This was re-figuring out the nexthop correctly and also
correctly reporting to interested parties that there was a path again.
Example:
donna.cumulusnetworks.com(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, f - failed route
K>* 0.0.0.0/0 [0/103] via 10.50.11.1, enp0s3, 00:06:47
S>* 4.5.6.7/32 [1/0] via 192.168.209.1, enp0s8, 00:04:47
C>* 10.50.11.0/24 is directly connected, enp0s3, 00:06:47
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:06:47
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:06:47
donna.cumulusnetworks.com(config)# ip route 4.5.6.7/32 192.168.210.1
donna.cumulusnetworks.com(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, f - failed route
K>* 0.0.0.0/0 [0/103] via 10.50.11.1, enp0s3, 00:07:06
S>* 4.5.6.7/32 [1/0] via 192.168.209.1, enp0s8, 00:00:04
* via 192.168.210.1, enp0s9, 00:00:04
C>* 10.50.11.0/24 is directly connected, enp0s3, 00:07:06
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:07:06
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:07:06
donna.cumulusnetworks.com(config)#
Log files for sharp, which is watching 4.5.6.7:
2019/02/04 15:20:54.844288 SHARP: Received update for 4.5.6.7/32
2019/02/04 15:20:54.844820 SHARP: Received update for 4.5.6.7/32
2019/02/04 15:20:54.844836 SHARP: Nexthop 192.168.209.1, type: 2, ifindex: 3, vrf: 0, label_num: 0
2019/02/04 15:20:54.844853 SHARP: Nexthop 192.168.210.1, type: 2, ifindex: 4, vrf: 0, label_num: 0
As you can see we have received an update with no nexthops( invalid route )
and a second update immediately after it with 2 nexthops.
What's the big deal you say? Well we have code in other daemons that reacts
to not having a path for a nexthop. In BGP this will cause us to tear
down the peer. In staticd we'll remove the recursively resolved route.
In pim we'll remove all paths to the mroute. This is not desirable.
The fix is to remove the meta-queue run of nexthop tracking.
While running after data plane notice of routes to handle is not ideal
we will be fixing this in the future with the nexthop group code, which
should know what nexthops are affected by a nexthop group change.
Fixed code debug code:
donna.cumulusnetworks.com(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, f - failed route
K>* 0.0.0.0/0 [0/103] via 10.50.11.1, enp0s3, 00:00:46
S>* 4.5.6.7/32 [1/0] via 192.168.209.1, enp0s8, 00:00:02
C>* 10.50.11.0/24 is directly connected, enp0s3, 00:00:46
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:00:46
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:00:46
donna.cumulusnetworks.com(config)# ip route 4.5.6.7/32 192.168.210.1
donna.cumulusnetworks.com(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, f - failed route
K>* 0.0.0.0/0 [0/103] via 10.50.11.1, enp0s3, 00:00:59
S>* 4.5.6.7/32 [1/0] via 192.168.209.1, enp0s8, 00:00:02
* via 192.168.210.1, enp0s9, 00:00:02
C>* 10.50.11.0/24 is directly connected, enp0s3, 00:00:59
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:00:59
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:00:59
2019/02/04 15:26:20.656395 SHARP: Received update for 4.5.6.7/32
2019/02/04 15:26:20.656440 SHARP: Nexthop 192.168.209.1, type: 2, ifindex: 3, vrf: 0, label_num: 0
2019/02/04 15:26:33.688251 SHARP: Received update for 4.5.6.7/32
2019/02/04 15:26:33.688322 SHARP: Nexthop 192.168.209.1, type: 2, ifindex: 3, vrf: 0, label_num: 0
2019/02/04 15:26:33.688329 SHARP: Nexthop 192.168.210.1, type: 2, ifindex: 4, vrf: 0, label_num: 0
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The restricting of data about interfaces was both inconsistent
in application and allowed protocol developers to get into states where
they did not have the expected data about an interface that they
thought that they would. These restrictions and inconsistencies
keep causing bugs that have to be sorted through.
The latest iteration of this bug was that commit:
f20b478ef3
Has caused pim to not receive interface up notifications( but
it knows the interface is back in the vrf and it knows the
relevant ip addresses on the interface as they were changed
as part of an ifdown/ifup cycle ).
Remove this restriction and allow the interface events to
be propagated to all clients.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Made changes and updated the routemap applied counter in the following flows.
1.Increment when route map attached to a list.
2.Decrement when route map removed / modified from a list.
3.Increment/decrement when route map create/delete callback triggered.
4.Besides ,This counter need not be updated when a route map is got updated.
i.e changing/adding a match value to the existing routemap.
In Zebra , same update api called for all three add/delete/update operation.
But this counter have to be updated only for routemap addition.
Addressed this specific change by identifying the routemap operation based
on routemap pointer.
Signed-off-by: RajeshGirada <rgirada@vmware.com>
The zebra.sock data is the listener socket for the zapi protocol.
The rest of the zebra router does not need to see this data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The client_list should be owned by the zebra_router data structure
as that it is part of global state information.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The master thread handler is really part of the zrouter structure.
So let's move it over to that. Eventually zserv.h will only be
used for zapi messages.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we get into rib_process_result and the operation we are handling
is DPLANE_OP_ROUTE_UPDATE *and* the route entry being looked at
is a route replace, we currently have no way to decode to the old_re
and the re due to how we have stored context. As such they are the
same pointer.
As such the route replace for the same route type is causing the re
to set the installed flag and then immediately unset the installed
flag, leaving us in a state where the kernel has the route but
the rib thinks we are not installed.
Since the true old_re( the one being replaced by the update operation )
is going away( as that it zebra deletes the old one for us already )
this fix is not optimal but will get us moving forward.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Read the onlink flag from the kernel for routes and pass them
up and through to zebra so that we are consistent with what
the kernel is telling us.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If we receive a valid message from the kernel that
is either a kernel or system route, we should trust
that the route is legit and just use it.
Old behavior:
K * 172.22.0.0/15 [0/0] via 172.22.2.254, eva_dummy1 inactive, 00:00:16
New Behavior:
K>* 172.22.0.0/15 [0/0] via 172.22.2.254, eva_dummy1, 00:02:35
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The route entry being displayed in debugs was displaying
the originating route type as a number. While numbers
are cool, I for one am not terribly interested in
memorizing them. Modify the (type %d) to a (%s) to
just list the string type of the route.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Apparently 'f' means both OpenFabric and a Failed kernel
route installation.
Let's switch the 'f' for the failed kernel route installation
to 'r - rejected route'.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When the nexthop->type is NEXTHOP_TYPE_IPV4_IFINDEX we
were writing the RTA_PREFSRC 2 times for the build_singlepath
and build_multipath functions.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Some v6 attributes for the netlink_route_build_singlepath
code were being written two times for the NEXTHOP_TYPE_IPV6_IFINDEX
nexthop type.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In extended-mobility case ({IP1, MAC} binding),
when a MAC moves from local to remote, binding
changes to {IP2, MAC}, local neigh (IP1) marked
as inactive in frr.
The evpn draft recommends to probe the entry once
local binding changes from local to remote.
Once the probe is set for the local neigh entry,
kernel will attempt refresh the entry via sending
unicast address resolution message, if host does not
reply, it will mark FAILED state.
For FAILED entry, kernel triggers delete neigh
request, which result in frr to remove inactive entry.
In absence of probing and aging out entry,
if MAC moves back to local with {IP3, MAC},
frr will mark both IP1 and IP3 as active and sends
type-2 update for both.
The IP1 may not be active host and still frr advertises
the route.
Ticket:CM-22864
Testing Done:
Validate the MAC mobilty in extended mobility scenario,
where local inactive entry gets removed once MAC moves
to remote state.
Once probe is set to the local entry, kernel triggers
reachability of the neigh/arp entry, since MAC moved remote,
ARP request goes to remote VTEP where host is not residing,
thus local neigh entry goes to failed state.
Frr receives neighbor delete faster and removes the entry.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The kernel neigh update api helps update neighbor entry,
using changing state and flags parameters.
Ticket:CM-22864
Reviewed By:
Testing Done:
Signed-off-by:Chirag Shah <chirag@cumulusnetworks.com>
ip rule configuration is being equipped with extra log information for
fwmark information.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The ifa_flags value in the netlink message was originally a uint8_t
value. The linux kernel quickly ran out of 8 bits of data to
pass and the IFA_FLAGS value was added to the netlink message to allow
more than 8 bits of data to be passed. So replace the ifa_flags
with the IFA_FLAGS value if it exists in the interface netlink
message.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The onlink attribute was being passed from upper level protocols
as an attribute of the route *not* the individual nexthop. When
we pass this data to the kernel, we treat the onlink as a attribute
of the nexthop. This commit modifies the code base to allow
us to pass the ONLINK attribute as an attribute of the nexthop.
This commit also fixes static routes that have multiple nexthops
some onlink and some not.
ip route 4.5.6.7/32 192.168.41.1 eveth1 onlink
ip route 4.5.6.7/32 192.168.42.2
S>* 4.5.6.7/32 [1/0] via 192.168.41.1, eveth1 onlink, 00:03:04
* via 192.168.42.2, eveth2, 00:03:04
sharpd@robot ~/frr2> sudo ip netns exec EVA ip route show
4.5.6.7 proto 196 metric 20
nexthop via 192.168.41.1 dev eveth1 weight 1 onlink
nexthop via 192.168.42.2 dev eveth2 weight 1
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we process the dataplane data, keep track of whether or not a route
is in transit or not.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
zebra is using NEXTHOP_FLAG_FIB as the basis of whether or not
a route_entry is installed. This is problematic in that we plan
to separate out nexthop handling from route installation. So modify
the code to keep track of whether or not a route_entry is installed/failed.
This basically means that every place we set/unset NEXTHOP_FLAG_FIB, we
actually also set/unset ROUTE_ENTRY_INSTALLED on the route_entry.
Additionally where we check for route installed via NEXTHOP_FLAG_FIB
switch over to checking if the route think's it is installed.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are selecting nexthops for disply, abstract the notion
of what character we display to the end user about the status
of the nexthop.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we discover that a command given to the route add/delete
function in rt_socket.c is bogus, print out a debug message
but don't attempt to actually use a nexthop that we have not
figured out yet as part of the data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Just return right there, goto's are useful if you have common
code that needs to be cleaned up before exiting this function,
of which this function has none and there is only one goto.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Neigh detected duplicate detected during local update,
upon receiving kernel neigh delete, set neigh inactive
flag so BGPd can install remote route entry if present.
Only if freeze action enabled, local duplicate detected
entry will not be present in BGPd thus marking neigh
inactive is safe. BGPd will simply attempt install
remote entry if present.
Ticket:CM-23438
Testing Done:
Validated MAC-IP pair, trigger mobility of between two
VTEPs, upon local freeze perform neigh delete which
triggers BGP to install remote type-2 route into kernel.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
A MACIP is detected as duplicate and after that
the host continue to move behind different VTEPs results
in local VTEP receiving remote mobility events.
In remote_macip_add, ensure to trigger dad if
MAC is marked as duplicate. In case of freeze
action enabled, is_dup_detect will be set to
avoids installing frozen MAC into kernel.
Ticket:CM-23649
Testing Done:
Configured detection action freeze with detection count
as 7 at DUT and >7 at remote VTEP,
trigger MAC-IP mobility between VTEPs.
once tdetection count reached, MAC detected as duplicate,
post detection move the host to remote. The local VTEP
receives remote macip add and entry is not installed into
kernel with fix.
root@VTEP1:~# net show evpn mac vni 1002 mac aa:aa:aa:aa:aa:aa
MAC: aa:aa:aa:aa:aa:aa
Remote VTEP: 27.0.0.16
Local Seq: 7 Remote Seq: 8
Duplicate, detected at Fri Jan 25 05:03:29 2019
Neighbors:
11.11.11.11 Inactive
Kernel entry still points to LOCAL
root@VTEP1:~# bridge fdb show | grep aa:aa:aa
aa:aa:aa:aa:aa:aa dev hostbond3 vlan 1002 master VxLanA-1
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
In a VRR/VRRP setup we can have connected routes with different costs.
So this change eliminates suppressing metric display for connected routes.
Sample output -
root@TORC11:~# vtysh -c "show ipv6 route vrf vrf1"
Codes: K - kernel route, C - connected, S - static, R - RIPng,
O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
v - VNC, V - VNC-Direct, A - Babel, D - SHARP, F - PBR,
> - selected route, * - FIB route
VRF vrf1:
K * ::/0 [255/8192] unreachable (ICMP unreachable), 00:00:36
C * 2001:aa:1::/64 [0/100] is directly connected, vlan1002-v0, 00:00:36
C>* 2001:aa:1::/64 [0/90] is directly connected, vlan1002, 00:00:36
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
MACVLAN devices are typically used for applications such as VRR/VRRP that
require a second MAC address (virtual). These devices have a corresponding
SVI/VLAN device -
root@TORC11:~# ip addr show vlan1002
39: vlan1002@bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9152 qdisc noqueue master vrf1 state UP group default
link/ether 00:02:00:00:00:2e brd ff:ff:ff:ff:ff:ff
inet6 2001:aa:1::2/64 scope global
valid_lft forever preferred_lft forever
root@TORC11:~# ip addr show vlan1002-v0
40: vlan1002-v0@vlan1002: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9152 qdisc noqueue master vrf1 state UP group default
link/ether 00:00:5e:00:01:01 brd ff:ff:ff:ff:ff:ff
inet6 2001:aa:1::a/64 metric 1024 scope global
valid_lft forever preferred_lft forever
root@TORC11:~#
The macvlan device is used primarily for RX (VR-IP/VR-MAC). And TX is via
the SVI. To acheive that functionality the macvlan network's metric
is set to a higher value.
Zebra currently ignores the devaddr metric sent by the kernel and hardcodes
it to 0. This commit eliminates that hardcoding. If the devaddr metric
is available (METRIC_MAX) it is used for setting up the connected route
otherwise we fallback to the dev/interface metric.
Setting the macvlan metric to a higher value ensures that zebra will always
select the connected route on the SVI (and subsequently use it for next hop
resolution etc.) -
root@TORC11:~# vtysh -c "show ip route vrf vrf1 2001:aa:1::/64"
Routing entry for 2001:aa:1::/64
Known via "connected", distance 0, metric 1024, vrf vrf1
Last update 11:30:56 ago
* directly connected, vlan1002-v0
Routing entry for 2001:aa:1::/64
Known via "connected", distance 0, metric 0, vrf vrf1, best
Last update 11:30:56 ago
* directly connected, vlan1002
root@TORC11:~#
Ticket: CM-23511
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When a local neigh is added with a MAC that is remote or absent the
neigh is kept in zebra as local/in-active. But not propagated to bgpd.
Similarly when an inactive neigh is deleted the del-msg is not propagated
to bgpd.
Without this change bgp and zebra would fall out of sync as that
bgp would not know to rerun bestpath and for it to reinstall a
known remote path for the mac-ip in question. To fix this we
now propagate inactive neigh deletes to bgpd.
Ticket: CM-23018
Testing Done:
1. evpn-min
2. manually triggered the out-of-sync state and verified the fix
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The sequence number used should be unique and increase by 1
for netlink commands. This will allow the code to match
up batched commands to actual requests, so that we can signal
the failure correctly back.
So start the movement and tracking of sequence numbers as
an atomic uint32_t in zebra_router. Modify the dataplane
code to start tracking contexts from this value.
In future commits we will move more of the sequencing
data into using this value.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We were using dplane_ctx_get_status(ctx) and assigning that
value to zebra_dplane_status, not zebra_dplane_result( yeah what? )
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Finish the LSP update code for the async dataplane for
the openbsd platform. Remove synch apis now that we've
converted to the async code path.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Remove the last use of the pre-dataplane LSP update apis;
remove the stubs from the 'null' implementation file.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Start performing LSP updates through the async dataplane
subsystem. This is plumbed through for linux/netlink.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Adding infra to zebra dplane to support LSP updates. Add
kernel api for LSP updates that uses a dataplane context; add
stub apis for netlink, bsd, and 'null' kernel paths. Add
version of netlink mpls update code that takes a dplane
context struct instead of a zebra lsp struct.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add public versions of zebra apis that add NHLFEs to an LSP,
and that free NHLFEs. The dataplane code needs to capture/copy
NHLFEs in order to do async LSP programming.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Move route info to a separate struct and use a union in the
dplane context to hold either route or lsp info. Add
accessors for LSP info.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
IPv6 uses AF_LINK to represent netmasks, this commit unbreaks
`rtm_read_mesg` that was broke on the `rta_get*` refactory.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
(cherry picked from commit 7a163a7c59)
IPv6 netmasks use AF_LINK family type and puts the correct amount of
set bits in the data structure. If we only copy the SDL header we
won't get all IPv6 address length, we must copy the whole extension of
the `sockaddr_in6` struct (which is provided in `destlen` parameter).
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
(cherry picked from commit 691e903879)
Remove two unused functions in `zebra/rt_socket.c`.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
(cherry picked from commit 914fea09d9)
`sockaddr` `len` field is the address type size and not the mask length.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
(cherry picked from commit a6c0003182)
This is mostly to be consistent with the "show ip import-check"
command, which is very similar.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Favor usage of the afi_t enumeration to identify address-families
over using the classic AF_INET[6] constants for that. The choice to
use either of the two seems to be mostly arbitrary throughout our
code base, which leads to confusion and bugs like the one fixed by
commit 6f95d11a1. To address this problem, favor usage of the afi_t
enumeration whenever possible, since 1) it's an enumeration (helps
the compilers to catch some bugs), 2) has a safi_t sibling and 3)
can be used to index static arrays. AF_INET[6] should then be used
only when interfacing with the kernel or external libraries like
libc. The family2afi() and afi2family() functions can be used to
convert between the two different representations back and forth.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
`gate_buf` should be big enough to hold IPv6 addresses and `inet_ntop`
should be run in the correct `sockaddr` struct member.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Debug messages should use `prefix_buf` and `prefix2str` should only be
called once in `kernel_rtm`.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Add a new field in the ZEBRA_CAPABILITIES zapi message specifying
the VRF backend in use.
For simplicity, make the zclient code call vrf_configure_backend()
to apply the received value automatically instead of requiring
the daemons to do that themselves in their zebra_capabilities()
callbacks.
Additionally, call zebra_vrf_update_all() only after sending the
capabilities message to the client, so that it will know which VRF
backend is in use when processing the VRF messages.
This commit fixes a couple of bugs in the "interface" CLI command and
associated northbound callbacks, which behave differently depending
on the VRF backend in use. Before this commit, the vrf_backend
variable would always be set to VRF_BACKEND_NETNS in the client
daemons, even when zebra was started without the --vrfwnetns option.
This could lead to inconsistent behavior and subtle bugs under
specific circumstances.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
We were sending ZEBRA_INTERFACE_LINK_PARAMS messages under the
following circumstances:
* New interface was created (via kernel or config);
* Interface went from down to up;
* Update in the link-params configuration.
Now also send ZEBRA_INTERFACE_LINK_PARAMS messages whenever a zclient
connects and sends a ZEBRA_INTERFACE_ADD request. Without this fix,
the client daemons don't receive interface link parameters if they
are configured in the zebra startup configuration.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
client->ifinfo is a VRF bitmap, hence we need to use
vrf_bitmap_check() to check if a client is subscribed to receive
interface information for a particular VRF. Just checking if
the client->ifinfo value is set will always succeed since it's
a pointer initialized by zserv_client_create(). With this fix,
we'll stop sending interface messages from all VRFs to all clients,
even those that didn't subscribe to it.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Routes without nexthops don't make any sense, so we need to reject
them otherwise weird things can happen.
NOTE: blackhole routes aren't nexthop-less, they do have a single
nexthop of type NEXTHOP_TYPE_BLACKHOLE.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Some daemons like ospfd and isisd have the ability to advertise a
default route to their peers only if one exists in the RIB. This
is what the "default-information originate" commands do when used
without the "always" parameter.
For that to work, these daemons use the ZEBRA_REDISTRIBUTE_DEFAULT_ADD
message to request default route information to zebra. The problem
is that this message didn't have an AFI parameter, so a default route
from any address-family would satisfy the requests from both daemons
(e.g. ::/0 would trigger ospfd to advertise a default route to its
peers, and 0.0.0.0/0 would trigger isisd to advertise a default route
to its IPv6 peers).
Fix this by adding an AFI parameter to the
ZEBRA_REDISTRIBUTE_DEFAULT_{ADD,DELETE} messages and making the
corresponding code changes.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Future commits are going to introduce more rigor in
state setting in the case of received results from
the data plane. So let us move the DPLANE_OP_ROUTE_DELETE
state check to the same spot as the rest of the code that
is handling a particular operation.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Modify the status flag from 8 bits to 32 bits and to add
a few new flags that will be used in future commits.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Modify the meta_queue insertion such that we only enqueue
the route_node into one meta_queue instead of several.
Suppose we have multiple route_entries associated with
a particular node from rip, bgp, staticd. If we receive a
route update from rip, we would enqueue the route_node into
the 1, 2, 3 meta-nodes. Which means that we would run
the entire process of figuring out a route 3 times, while
nothing would change the second two times.
Modify the code to choose the lowest meta-queue and
install it into that one for processing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When a dataplane provider/plugin registers, return the new
handle/object - that's needed to use some provider apis
later on.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Pass lists of results back to zebra from the dataplane subsystem
(and pthread). This helps reduce the lock/unlock cycles when
zebra is busy. Also remove a couple of typedefs that made their
way into the dataplane header file - those violate the FRR style
guidelines.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
if the default vrf name is manually set, by passing -o parameter to
zebra, then this should be detected when walking the list of netns
available in the system. If a netns called vrf0 is present, then it
should be ignored.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when zebra is run, by using vrf netns backend mode, then the parser
detector of netns is run before forcing the default vrf to a possible
value. In that case, there is a possibility that the forced '-o' option
will create a second vrf with same name, whereas this option should be
there to uniquely have a default vrf with a value.
To make things consistent, the forced value will be priorised. Then, the
notifier will attempt to create vrf contexts. The expectation is that
the creation will fail, due to an already present vrf with same name.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When an empty netmask a wrong end size is calculated, lets handle this
corner case to avoid spurious warning messages.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Handle corner case where a warning log message is issued on interface
address netmask handling with sockaddr type AF_LINK: it may come empty
or with match all (all 0xFF).
In the first case all lengths are zero and we only need to copy the
first bytes, second case it comes with a zero index and all 0xFF bytes.
In any case we only need to figure out a few of the first bytes instead
of all data.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
When porting routing socket macro data handling to functions, the
attribute function was forgotten. The only difference between the
attribute and address handler is the family type check.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
"brief" output for "show interface" helps when we have to quickly check
important information like ip address, vrf etc. This prints
information in the easy to read tabular format. Currently it prints oper
status, ifname, vrf, ipv4 and ipv6 addresses.
Ticket: CM-9109
Signed-off-by: Nitin Soni <nsoni@cumulusnetworks.com>
For neigh check duplicate flag as it can be inherited from
duplicate detected MAC (count could be 0).
Ticket:CM-23316
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Below are cases where EVPN duplicate detection
Freeze and Unfreeze required fixes:
Auto recovery needs to check neighbor's duplicate flag
to take action, as neigh could be marked duplicate
via inherited from MAC where IP detection count could be 0.
MAC duplicate detection needs to set flag to true
if freeze action is configured.
Local MAC add update should not send update to bgp
if MAC is in frozen state.
Remote MAC-IP update should not process neigh update if MAC
is detected as duplicate during remote update.
Ticket:CM-23344
Testing Done:
Trigger duplicate detection via both local and remote update trigger,
Validate clear command and other changes expected behavior.
Auto-recovery takes appropriate action on inherited IPs.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Add the ability to retrieve the current role of mlag for this machine.
If mlag is not setup we will always return MLAG_ROLE_NONE.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The zebra_delete_rnh function is not needed to be exposed
to the entire world. Limit it's scope.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The deletion of a rnh is always proceeded by the same checks
to see if it is done. Just let zebra_delete_rnh do this test.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we call zebra_vrf_table_create, we've already created the info
pointer in zebra_router_get_table, so properly set the info->safi
and just store the zvrf->table[afi][safi] value.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When handling events from /var/run/netns folder, if several netns are
removed at the same time, only the first one is deleted in the frr. Fix
this behaviour by applying continue in the loop.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Duplicate address detection should operate
at default vrf instance.
For mac and neigh show command, auto recovery and few places
where tanent vrf_id used for zvrf instead use default
vrf instance. Use vxlan_if's or VRF_DEFAULT vrf_id to
fetch zebra's default vrf instance.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
zebra uses the SIOCETHTOOL ioctl with the ETHTOOL_GSET command to
fetch the speed of interfaces from the kernel. The only problem is
that ETHTOOL_GSET returns EOPNOTSUPP when the given interface is a
virtual interface. This leads to zebra emitting warnings like this
at startup:
ZEBRA: IOCTL failure to read interface lo speed: 95 Operation not supported
ZEBRA: IOCTL failure to read interface dummy0 speed: 95 Operation not supported
ZEBRA: IOCTL failure to read interface ovs-system speed: 95 Operation not supported
Silence these warnings by ignoring EOPNOTSUPP errors, since we know
they are harmless. This is similar to how we handle EINVAL errors
from the BSD SIOCGIFMEDIA ioctl (commit c69f2c1ff).
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Unlike the other interface zapi messages, ZEBRA_INTERFACE_VRF_UPDATE
identifies interfaces using ifindexes and not interface names. This
is a problem because zebra always sends ZEBRA_INTERFACE_DOWN
and ZEBRA_INTERFACE_DELETE messages before sending
ZEBRA_INTERFACE_VRF_UPDATE, and the ZEBRA_INTERFACE_DELETE callback
from all daemons set the interface index to IFINDEX_INTERNAL. Hence,
when decoding a ZEBRA_INTERFACE_VRF_UPDATE message, the interface
lookup would always fail since the corresponding interface lost
its ifindex. Example (ospfd):
OSPF: Zebra: Interface[rt1-eth2] state change to down.
OSPF: Zebra: interface delete rt1-eth2 vrf default[0] index 8 flags 11143 metric 0 mtu 1500
OSPF: [EC 100663301] INTERFACE_VRF_UPDATE: Cannot find IF 8 in VRF 0
To fix this problem, use interface names instead of ifindexes to
indentify interfaces like the other interface zapi messages do.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The route_info data structure already had a mapping of route type
to admin distance. Consolidate the meta_queue_map information
into this route_info data structure. This is to reduce the number
of places we need to remember to touch when adding a new routing
protocol.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
An EVPN type-2 entry is in freeze state during remote update,
remote VTEP can send typ-2 withdraw update,
upon receiving an entry delete (withdraw), first check
kernel has in local reachable state. Upon
unfreeze use the local entry to advertise to peers.
Fetch is for both MAC and IP, delete can come for
only MAC or MAC-IP combined route.
The specific entry fetch only required request flag to be set,
dump flag is not required.
Testing Done:
Simulate two VTEPs to do M1, IP1 mobility sequence,
freeze MAC during remote MAC update, subsequently send
withdraw type-2 route from origintating VTEP.
This results in read apis to invoke for local reachable entry.
Zebra updates its cache and upon unfreeze originates type-2.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Make netlink_request api generic where it can be used
for dump or querying specific information request.
nelink request nlm flags (NLM_F_ROOT | NLM_F_MATCH) are
used to dump purpose, if client wants to query spcific
MAC or IP using netlink_request does not require to set
them.
nlm struct is passed by the caller of netlink_request,
it can also set the nlm request flags.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
This commit is the last missing piece to complete BGP LU support in bgpd. To this moment, bgpd (and zebra) supported auto label assignment only for prefixes leaked from VRFs to vpn and for MPLS SR prefixes. This adds auto label assignment to other routes types in bgpd. The following enhancements have been made:
* bgp_route.c:bgp_process_main_one() now sets implicit-null local_label to all local, aggregate and redistributed routes.
* bgp_route.c:bgp_process_main_one() now will request a label from the label pool for any prefix that loses the label for some reason (for example, when the static label assignment config is removed)
* bgp_label.c:bgp_reg_dereg_for_label() now requests labels from label pool for routes which have no associated label index
* zebra_mpls.c:zebra_mpls_fec_register() now expects both label and label_index from the calling function, one of which must be set to MPLS_INVALID_LABEL or MPLS_INVALID_LABEL_INDEX, based on this it will decide how to register the provided FEC.
Signed-off-by: Anton Degtyarev <anton@cumulusnetworks.com>
Reduce the zebra rib workqueue retry timeout, used when the queue
towards the zebra dataplane has reached its limit. Lowering the
value was reported to improve update throughput on some platforms.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The label processing for socket installs was not ensuring
that each nexthop would not accidently use the last
nexthops value.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The test we were using to ensure that a mask was sent in
is a bit redundant, let's just always send it in.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The ADD/DELETE messages are the only ones we support, so leave
early from the function, in other words don't check it every
nexthop loop.
Additionally nexthops only care about non recursive active flags.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
I'm going to rearrage the kernel_rtm_ipv4 and v6 functions
so the sin6_masklen needs to be moved a bit earlier.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The write function converted to v4 and v6 functions to a union sockunion
via casting. Just use `union sockunion` instead.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Allow the ns deletion event to happen *after* the data validity
checks.
Please note this probably still leaves a weird hole if we receive
multiple namespace events ( as the for loop implies ). We will
stop handling anything after a namespace deletion notification.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
the default vrf name was hardset to "Default", whereas the default vrf
name could have been configured in an other manner. Fix this
inconsistency.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
the l3vni structure is allocated only once, since that structure is only
used for default netns. For that, move the initialisation part is moved
to a proper place, where there is no risk of attempting to initialise it
more than once, even when vrf backend is netns.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When a route removal failure happens return to the installing
protocol that the route deletion failed.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the zebra rib processing workqueue, set a small timeout
so that we will wait a short time if the queue into the
async dataplane is full. This helps avoid a situation where
the zebra main pthread constantly retries rib work without
giving the dataplane pthread a chance to make progress.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
NEXTHOP_FLAG_ACTIVE currently means that the nexthop is considered
good enough to be installed. With current ecmp restrictions this
translation from multipath_num is enforced in the data plane.
The problem with this is of course that every data plane now
becomes concerned about the multipath num and must enforce it
independently. Currently *bsd does not honor multipath_num at
all and linux marks all nexthops as being installed even when
it honors a multipath_num that is less than the total.
This code change moves the multipath_num enforcement from a dataplane
decision to a zebra nexthop decision. Thus dataplanes now can
just install those nexthops marked as NEXTHOP_FLAG_ACTIVE
without having to worry about multipath_num.
*BSD will now respect multipath_num and Linux now properly notes
which routes are actually installed or not:
sharpd@donna ~/f/t/topotests> ps -ef | grep frr
frr 6261 1556 0 09:12 ? 00:00:00 /usr/lib/frr/zebra -e 2 --daemon -A 127.0.0.1
frr 6279 1556 0 09:12 ? 00:00:00 /usr/lib/frr/staticd --daemon -A 127.0.0.1
donna.cumulusnetworks.com(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route
K>* 0.0.0.0/0 [0/106] via 10.0.2.2, enp0s3, 00:00:45
S>* 4.4.4.4/32 [1/0] via 10.0.2.1, enp0s3, 00:00:02
* via 192.168.209.1, enp0s8, 00:00:02
via 192.168.210.1, enp0s9 inactive, 00:00:02
C>* 10.0.2.0/24 is directly connected, enp0s3, 00:00:45
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:00:45
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:00:45
donna.cumulusnetworks.com(config)#
sharpd@donna ~/f/t/topotests> ip route show
default via 10.0.2.2 dev enp0s3 proto dhcp metric 106
4.4.4.4 proto 196 metric 20
nexthop via 10.0.2.1 dev enp0s3 weight 1
nexthop via 192.168.209.1 dev enp0s8 weight 1
10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.15 metric 106
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
192.168.209.0/24 dev enp0s8 proto kernel scope link src 192.168.209.2 metric 105
192.168.210.0/24 dev enp0s9 proto kernel scope link src 192.168.210.2 metric 103
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Stop creating individual, one-time events as each batch of
incoming zserv/zapi messages is processed - use a singleton
event so that the incoming message activity is more fair if
the zebra main pthread has other events to run.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
We never used this information and it was merely stored.
Additionally this is not something that is a flag, it's
a status.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Make the v4 and v6 code paths for rib_XXX calls in kernel_socket
as similiar as we can possibly make them. There is no need
for code duplication at this point in time.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The rib_lookup_ipv4_route function is only used in a debug path.
Is only used for v4 and only checks to make sure that the rib
and fib are in sync( which is not needed/used/supported on other
platforms ). So let's just remove it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
For nexthop handling use the actual resolved nexthop.
Nexthops are stored as a `special` list:
Suppose we have 3 way ecmp A, B, C:
nhop A -> resolves to nhop D
|
nhop B
|
nhop C -> resolves to nhop E
A and C are typically NEXTHOP_TYPE_IPV4( or 6 ) if they recursively resolve
We do not necessarily store the ifindex that this resolves to.
Current nexthop code only loops over A,B and C and uses those for
the zebra_rnh.c handling. So interested parties might receive non-fully
resolved nexthops( and they assume they are! ).
Let's convert the looping to go over all nexthops and only deal with
the resolved ones, so we will look at and use D,B and E.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The `show ip route A.B.C.D json` command was only displaying
the last route entry looked at and we would drop the data
associated with other route entries. This fixes the issue:
robot# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route
K>* 0.0.0.0/0 [0/100] via 192.168.201.1, enp3s0, 00:13:31
C>* 4.50.50.50/32 is directly connected, lo, 00:13:31
D 10.0.0.1/32 [150/0] via 192.168.201.1, enp3s0, 00:09:46
S>* 10.0.0.1/32 [1/0] via 192.168.201.1, enp3s0, 00:10:04
C>* 192.168.201.0/24 is directly connected, enp3s0, 00:13:31
robot# show ip route 10.0.0.1 json
{
"10.0.0.1\/32":[
{
"prefix":"10.0.0.1\/32",
"protocol":"sharp",
"distance":150,
"metric":0,
"internalStatus":0,
"internalFlags":1,
"uptime":"00:09:50",
"nexthops":[
{
"flags":1,
"ip":"192.168.201.1",
"afi":"ipv4",
"interfaceIndex":2,
"interfaceName":"enp3s0",
"active":true
}
]
},
{
"prefix":"10.0.0.1\/32",
"protocol":"static",
"selected":true,
"distance":1,
"metric":0,
"internalStatus":0,
"internalFlags":2064,
"uptime":"00:10:08",
"nexthops":[
{
"flags":3,
"fib":true,
"ip":"192.168.201.1",
"afi":"ipv4",
"interfaceIndex":2,
"interfaceName":"enp3s0",
"active":true
}
]
}
]
}
robot#
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When using `SIOCGIFMEDIA` check for `EINVAL`, otherwise we might print
an error message on an unsupported interface.
FreeBSD source code reference:
https://github.com/freebsd/freebsd/blob/master/sys/net/if_media.c#L300
And:
8cb4b0c018/usr.sbin/rtsold/if.c (L211)
/*
* EINVAL simply means that the interface does not support
* the SIOCGIFMEDIA ioctl. We regard it alive.
*/
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Some address types were not being skipped triggering a warning log
message, so lets refactor this code to properly handle known and unknown
types.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Move the declaration of ROUNDUP and ROUND_TYPE to outside of
`ifdef SA_SIZE`. We'll use these definitions in the next commit.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Clear dup address vni needs to return non-zero value
in case of command is not successful.
Ticket:CM-23122
Testing Done:
run clear command and check upon failure return code is non-zero.
root@TORS1:~# vtysh -c "clear evpn dup-addr vni 1000 ip 45.0.1.26"
% Requested IP's associated MAC 00:01:02:03:04:05 is still in duplicate
% state
root@TORS1:~# echo $?
1
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Problem reported that kernel neighbor entries could end up in "FAILED"
state when the neighbor entry was deleted. This fix handles the
notification of the event from netlink messages and re-inserts the
deleted entry.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Always resend the nexthop information when we get a registration
event. Multiple daemons expect this information.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com.
Change helps display detailed output for all possible VNI neighbors
without specifying VNI and ip. It helps in troubleshooting as a single
command can be fired to capture detailed info on all VNIs.
Ticket: CM-22832
Signed-off-by: Nitin Soni <nsoni@cumulusnetworks.com>
Reviewed-by: CCR-8034
Change helps display detailed output for all possible VNI MACs without
specifying VNI or mac. It helps in troubleshooting - a single
command can be fired to capture detailed info on all VNIs.
Also fixed and existing json related bug where json object is created by
a parent function and freed in child function.
Ticket: CM-22832
Signed-off-by: Nitin Soni <nsoni@cumulusnetworks.com>
Reviewed-by: CCR-8028
A while ago all FRR configuration commands were converted to use the
QOBJ infrastructure to keep track of configuration objects. This
means the configuration lock isn't necessary anymore because the
QOBJ code detects when someones tries to edit a configuration object
that was deleted and react accordingly (log an error and abort the
command). The possibility of accessing dangling pointers doesn't
exist anymore since vty->index was removed.
Summary of the changes:
* remove the configuration lock and the vty_config_lockless() function.
* rename vty_config_unlock() to vty_config_exit() since we need to
clean up a few things when exiting from the configuration mode.
* rename vty_config_lock() to vty_config_enter() to remove code
duplication that existed between the three different "configuration"
commands (terminal, private and exclusive).
Configuration commands converted to the new northbound model don't
need the configuration lock either since the northbound API also
detects when someone tries to edit a configuration object that
doesn't exist anymore.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
the vrf context was not created at previous location of the call.
The call is done after vrf initialisation.
PR=61513
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Acked-by: Nicolas dichtel <nicolas.dichtel@6wind.com>
the netns discovery process executed when vrf backend is netns, allows
the zebra daemon to dynamically change the default vrf name value. This
option is disabled, when the zebra is forced to a default vrf value with
option -o.
PR=61513
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Change helps display detailed output for all possible VNIs without
specifying VNI. It helps in troubleshooting - a single command can
be fired to capture detailed info on all VNIs.
Ticket: CM-22831
Signed-off-by: Nitin Soni <nsoni@cumulusnetworks.com>
Reviewed-by: CCR-8013
To avoid conflicts between the zebra main pthread and the
dataplane pthread, use a separate routing socket (on non-netlink
platforms) for dataplane route updates to the OS.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Use a separate netlink socket for the dataplane's updates, to
avoid races between the dataplane pthread and the zebra main
pthread. Revise zebra shutdown so that the dataplane netlink
socket is cleaned-up later, after all shutdown-time dataplane
work has been done.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Improve, simplify dataplane provider locking apis. Add accessor
for dataplane pthread's thread_master, for use by providers who
need to use the thread/event apis.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Update the dataplane shutdown checks to include the providers.
Also revise the typedef for provider structs to make const
work.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Limit the number of updates processed from the incoming queue;
add more stats. Fill out apis for dataplane providers; convert
route update processing to provider model; move dataplane
status enum
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Display following Per MAC and Neigh's output:
If duplicate address detection is under process,
display detection start time and detection count.
If duplicate address detection detected an address
as duplicate, display detection time and duplicate
status.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The if_is_loopback() function is the right abstraction for identifying
loopback interfaces. There should be no reason for not using it in the
router-id code.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
It's been a year since we added the new optional parameters
to instantiation. Let's switch over to the new name.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When the remote mac is deleted by bgpd we can end up with an auto mac
entry in zebra if there are neighs referring to the mac. The remote sequence
number in the auto mac entry needs to be reset to 0 as the mac entry may
have been removed on all VTEPs (including the originating one).
Now if the MAC comes back on a remote VTEP it may be added with MM=0 which
will NOT be accepted if the remote seq was not reset in the previous step.
Ticket: CM-22707
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
This is a fixup to commit -
f32ea5c07 - zebra: act on kernel notifications for remote neighbors
The original commit handled a race condition between kernel and zebra
that would result in an inconsistent state i.e.
kernel has an offload/remote neigh
zebra has a local neigh
The original commit missed setting the neigh to active when zebra
tried to resolve the inconsistency by modifying the local neigh to
remote neigh on hearing back its own kernel update. Fixed here.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Ticket: CM-22700
When events cross paths between bgp and zebra bgpd could end up with a
dangling local MAC entry. Consider the following sequence of events on
rack-1 -
1. MAC1 has MM sequence number 1 and points to rack-3
2. Now a packet is rxed locally on rack-1 and rack-2 (simultaneously) with
source-mac=MAC1.
3. This would cause rack-1 and rack-2 to set the MM seq to 2 and
simultaneously report the MAC as local.
4. Now let's say on rack-1 zebra's MACIP_ADD is in bgpd's queue. bgpd
accepts rack-3's update and sends a remote MACIP add to zebra with MM=2.
5. zebra updates the MAC entry from local=>remote.
6. bgpd now processes zebra's "stale local" making it the best path.
However zebra no longer has a local MAC entry.
At this point bgpd and zebra are effectively out of sync i.e. bgpd has a
local-MAC which is not present in the kernel or in zebra.
To handle this window zebra should send a local MAC delete to bgpd on
modifying its cache to remote.
Ticket: CM-22687
Reviewed By: CCR-7935
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Current clang has an issue with the pointer/target argument
to at least one atomic/intrinsic. A variable with '_Atomic'
generates a compile-time error. Use a cast as a workaround
here to allow use of clang for now.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
When the rib code is informed that a table is closing/
going away, only try once to uninstall associated routes from
the fib/dataplane. The close path can be called multiple times
in some cases - zebra shutdown, e.g.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Even if the neighbor entry we want already exists, force its
reinstallation to ensure that it's valid. This will now take place when
we request an update of the neighbor entry.
Ticket: CM-22604
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Recursive multipath nexthops were broken by the initial async
dataplane - we were trying to install an extra, invalid
nexthop.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The interface type can be a bond or a bond slave, add some
code to note this and to display it as part of a show interface
command.
Signed-off-by: Dinesh Dutt <didutt@gmail.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
* Correctly set safi to prevent duplicate allocations
* Free previously allocated table->info before overwriting it
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The frr-interface YANG module models interfaces using a YANG list keyed
by the interface name and the interface VRF. Interfaces can't be keyed
only by their name since interface names might not be globally unique
when the netns VRF backend is in use. When using the VRF-Lite backend,
however, interface names *must* be globally unique. In this case, we need
to validate the uniqueness of interface names inside the appropriate
northbound callback since this constraint can't be expressed in the
YANG language. We must also ensure that only inactive interfaces can be
removed, among other things we need to validate in the northbound layer.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Introduce frr-interface.yang, which defines a model for managing FRR
interfaces.
Update the 'frr_yang_module_info' array of all daemons that will
implement this module.
Add automatically generated stub callbacks in if.c. These callbacks will
be implemented in the following commit.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
FRR_DAEMON_INFO should now contain an array of 'frr_yang_module_info'
structures describing the YANG modules implemented by the daemon.
This array will be used by frr_init() function to load all YANG modules
and initialize the northbound callbacks during the daemon initialization.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Avoid running the shutdown/sigint handler code more than once. With
the async dataplane, once shutdown has been initiated, the completion
of all async updates triggers final shutdown of the zebra main
pthread. During that time, avoid taking and processing a second
signal, such as SIGINT or SIGTERM.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Impose a configurable limit on the number of route updates
that can be queued towards the dataplane subsystem.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Dplane support for zebra's route cleanup during shutdown (clean
shutdown via SIGINT, anyway.) The dplane has the opportunity to
process incoming updates, and then triggers final cleanup
in zebra's main thread.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add first pass at show commands for the zebra dplane. Add some stats
counters to show. Start prep for correct shutdown processing, and for
multiple providers.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Correct use of netlink_parse_info() in the netlink fuzzing path.
Also clarify a couple of comments about pthreads.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
We need a bit of special handling for system routes, which need
to be offered for redistribution even though they won't be
passing through the dplane system.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Initial WIP api to add providers into the zebra dataplane system,
with some simple ordering/prioritization.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Set SELECTED re immediately in rib_process, without expecting
that fib install has completed. Remove premature redistribute
call also.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Reduce or eliminate use of global zebra_ns structs in
a couple of netlink/kernel code paths, so that those paths
can potentially be made asynch eventually.
Slide netlink_talk_info into place to remove dependency on core
zebra structs; add accessors for dplane context block
Start init of route context from zebra core re and rn structs;
start queueing and event handling for incoming route updates.
Expose netlink apis that don't rely on zebra core structs;
add parallel route-update code path using the dplane ctx;
simplest possible event loop to process queued route'
updates.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
When we fail to install a route into bsd, note the case
where we have no viable nexthops installed for it, so
that we can know in zebra if the route is good or not.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The _wrap_script inclusion implies a certain end functionality
of which we don't care. We just care that the hooks are called.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
These three data structures belong in the `zebra_router` structure
as that they do not belong in `struct zebra_ns`.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Move the rules_hash to the zrouter data structure and provide
the additional bit of work needed to lookup the rule based upon
the namespace id as well. Make the callers of functions not
care about what namespace id we are in.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The `struct zebra_ns` data structure is being used
for both router information as well as support for
the vrf backend( as appropriate ). This is a confusing
state. Start the movement of `struct zebra_ns` into
2 things `struct zebra_router` and `struct zebra_ns`.
In this new regime `struct zebra_router` is purely
for handling data about the router. It has no knowledge
of the underlying representation of the Data Plane.
`struct zebra_ns` becomes a linux specific bit of code
that allows us to handle the vrf backend and is allowed
to have knowledge about underlying data plane constructs.
When someone implements a *bsd backend the zebra_vrf data
structure will need to be abstracted to take advantage of this
instead of relying on zebra_ns.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>