If you haven't migrated your FPM server to use next hop groups, it is
possible that you want to disable this feature. This commit implements
a toggle to enable/disable next hop groups usage (even if your Linux
kernel is not using it).
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Force off kernel NHG install with netns-based VRFs for
now. There is not really a good solution for allowing
kernel nexthop groups in namespaced based vrfs.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When installing a nexthop group, dump out the ifindex of the
nexthop being installed as a bit more data for the developer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The function rt_netlink.c is using to lookup the vrf by
passed in table id.
I'm also going to pretend that this function is not
so awful to run when we have a large number of routes
incoming.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Replace sprintf with snprintf where straightforward to do so.
- sprintf's into local scope buffers of known size are replaced with the
equivalent snprintf call
- snprintf's into local scope buffers of known size that use the buffer
size expression now use sizeof(buffer)
- sprintf(buf + strlen(buf), ...) replaced with snprintf() into temp
buffer followed by strlcat
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The netlink_request function takes a `struct nlmsghdr *`
pointer from a common pattern that we use:
struct {
struct nlmsghdr n;
struct fib_rule_hdr frh;
char buf[NL_PKT_BUF_SIZE];
} req;
We were calling it `netlink_request(Socket, &req.n)`
The problem here is that coverity, rightly so, sees that
we access the data after the nlmsghdr in netlink_request and
tells us we have an read beyond end of the structure. While
we know we haven't mangled anything up here because of manual
inspection coverity doesn't have this knowledge implicitly.
So let's modify the code call to netlink_request to pass in the
void pointer of the req structure itself, cast to the appropriate
data structure in the function and do the right thing. Hopefully
the coverity SA will be happy and we can move on with our life.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
* Break lines longer than 80 columns.
* Remove space after '('.
* Use '%pIX' instead of 'inet_ntop'.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Store VNI information in the data plane context so we can use it to
build the FPM netlink update with that information later.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Instead of retuning always `0`, lets return the amount of used bytes for
the message. This will be used by the new FPM interface to know how many
bytes we must reserve for the output buffer.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
* Use `inet_ntop` instead of `inet_ntoa`
* Replace function name with `__func__`
* Inline functions
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Generalize the netlink route message building function so it can be used
in the future by the netlink Forwarding Plane Manager (FPM) interface.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
In some places we log the interface but not the vfr the
interface is in. In others we only output the vrf id, which
can be difficult for human to read. This commit makes zebra
debugs more vrf aware.
Signed-off-by: Jakub Urbańczyk <xthaid@gmail.com>
In the netlink code for determining whether to set
a src on the route, we check if the cmd=NEW_ROUTE
but its not possible for this to ever be anything
but a new route since we do a goto skip further up
if its a DEL_ROUTE cmd.
So remove this unnecessary check.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Determine src based on nexthop data even when we are using
kernel nexthop objects.
Before, we were entirely skipping this step and just sending the
nexthop ID, ignoring src determination.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Abstraction the route src determination from a nexthop in the
netlink code into a function for both singlepath and mutlipath
to call.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Some logging systems are, er, "allergic" to tabs in log messages.
(RFC5424: "The syslog application SHOULD avoid octet values below 32")
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
We currently have netlink_neigh_update_ctx,
netlink_vxlan_flood_update_ctx and netlink_macfdb_update_ctx
all of which do slightly different RTM_NEWNEIGH calls into
the kernel. After this change, there will be one common
function.
Signed-off-by: Jakub Urbańczyk <xthaid@gmail.com>
1) When programming a nhg id to the kernel we had no debug of that
is what we are doing.
2) Add debugs to all nexthop information to allow us to follow
which prefix we are talking about. This is especially
useful when dealing with a large number of routes and
you want to grep out one or two too see what is going on.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ensure that any weight associated with the next hop is installed for
IPv4 routes with IPv6 next hops too.
Updates: lib, zebra: Allow for installation of a weighted nexthop
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
The return type of is_selfroute function is changed from int to bool.
Also remove the redundant invoking of the is_selfroute function in the
calling function netlink_route_change_read_unicast
Fixes: https://github.com/FRRouting/frr/issues/5984
Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
Readd the special MAC that represents the flood (head-end replication) entry
for EVPN-VxLAN upon getting a delete notification for it.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
Ticket: CM-25797
Ticket: CM-26238
Testing Done:
1. evpn-min, evpn-smoke - results summarized in CM-25798
add debug trace in specific neigh request send api
to help debug an issue where synchronous response parse
returns with NLMSG_DONE where there is no ipv6 neigh received.
the count value is set to 1 because the request contained
a spcific neigh.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Nexthop groups as a whole do not make sense to have a vrf'ness
As that you can have a arbitrary number of nexthops that point
to separate vrf's.
Modify the code to make this distinction, by clearly delineating
the line between the nhg and the nexthop a bit better.
Nexthop groups having a vrf_id only make sense if you are using
network namespaces to represent them.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a config that disables use of kernel-level nexthop ids.
Currently, zebra always uses nexthop ids if the kernel supports
them.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
zebra can catch the kernel's route deletion by netlink.
but current FRR can't delete kernel-route on vrf(l3mdev)
when kernel operator delete the route on out-side of FRR.
It looks problem about kernel-route deletion.
This problem is caused around _nexthop_cmp_no_labels(nh1,nh2)
that checks the each nexthop's member 'vrf_id'.
And _nexthop_cmp_no_labels's caller doesn't set the vrf_id
of nexthop structure. This commit fix that case.
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
We were not setting the RTNH_F_ONLINK flag where appropriate
when creating nexthop objects in the kernel.
Set it on the nhmsg.nh_flags netlink message.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Linux has the idea of allowing a weight to be sent
down as part of a nexthop group to allow the kernel
to weight particular nexthop paths a bit more or less
than others.
See:
http://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.rpdb.multiple-links.html
Allow for installation into the kernel using the weight attribute
associated with the nexthop.
This code is foundational in that it just sets up the ability
to do this, we do not use it yet. Further commits will
allow for the pass through of this data from upper level protocols.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Replace the existing list of nexthops (via a nexthop_group
struct) in the route_entry with a direct pointer to zebra's
new shared group (from zebra_nhg.h). This allows more
direct access to that shared group and the info it carries.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
This reverts commit 7d5bb02b1a.
Allow zebra to actually maintain the nexthop group in the
linux kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Fix 2 Coverity issues:
1) zebra_nhg.c -> all paths in nhg_ctx_process_finish have
already deref'ed the ctx pointer no need for a test of it
2) the **ifp pointer passed in may be NULL. Prevent an accidental
deref if calling function does not pass in a ifp pointer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Checkpatch was complaining because this code was extending
beyond 80 characters on a couple lines. Adjusted a conditional
tree to fix that.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Guard against an overflow read when processing
nexthop groups from netlink. Add a check to ensure
we don't try to write passed the array size.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add code for handling nexthop group hash entry encaps
and sending them to the kernel. Add some more debugging
information for the encaps and groups in general.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
There was some code copypasta for mpls stack building in the
netlink install path. Reduced that to a common function.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When we receive a route delete from the kernel and it
contains a nexthop object id, use that to match against
route gateways with instead of explicit nexthops.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Move the supports_nh bool indicating whether the kernel we are
using supports nexthop objects into the netlink kernel interface
itself. Since only linux and netlink support nexthop object APIs
for now this is fine.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add handling for delete/update nexthop object messages from the
kernel.
If someone deletes a nexthop object we are still using, send it back
down. If the someone updates a nexthop we are using, replace that nexthop
with ours. Routes are referencing this nexthop object ID and we resolved
it ourselves, so we should force the other `someone` to submit to our
will.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
On restart, if we failed to remove any nexthop objects due
to a kill -9 or such event, sweep them if we aren't using them.
Add a proto field to handle this and remove the is_kernel bool.
Add a dupicate flag that indicates this nexthop group is only
present in our ID hashtable. It is a dupicate nexthop we received
from the kernel, therefore we cannot hash on it.
Make the idcounter globally accessible so that kernel updates
increment it as soon as we receive them, not when we handle them.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Give all nhg_hash_entrys we install into the kernel
as nexthop objects a defined proto matching the zebra
rib table one. This makes sense since nhe's are proto-independent
and determined exclusively in zebra.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add the ability to recursively resolve nexthop group hash entries
and resolve them when sending to the kernel.
When copying over nexthops into an NHE, copy resolved info as well.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were only setting and checking the ifindex if
the nexthop had an *_IFINDEX type. However, when nexthop
active checking is done, the non-*_IFINDEX types can also
obtain a nexthop with an ifindex and are thus valid too.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We will use a nhe context for dataplane interaction with
nextho group hash entries.
New nhe's from the kernel will be put into a group array
if they are a group and queued on the rib metaq to be processed
later.
New nhe's sent to the kernel will be set on the dataplane context
with approprate ID's in the group array if needed.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Removing this function since the new paradigm
of everything just being nhg_connected structs
makes it not make a lot of sense.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Put the setting of the ifp on a nexthop group hash
entry into the zebra_nhg_alloc() function. It should
only be added if its not a group/recursive (it doesn't
have any depends) and its nexthop type has an ifindex.
This also provides functionality for proto-side ifp
setting.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
A nexthop group should not have a VRF ID. Only individual
nexthops need to be using a VRF. Fixed this both kernel and
proto side.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Re-organize and expose the nhg_connected functions so that
it can be used outside zebra_nhg.c. And then abstract those
into zebra_nhg_depends_* and zebra_nhg_depenents_* functons.
Switch the ifp struct to use an RB tree for its dependents,
making use of the nhg_connected functions.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Create a nhg_depenents tree that will function as a way
to get back pointers for NHE's depending on it.
Abstract the RB nodes into nhg_connected for both depends and
dependents. This same struct is used for both.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a helper function to allow us to check if two
nhg_hash_entry's dependency lists are equal.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Make the the kernel debug zlog for nexthop messages from the
kernel more aligned with the route message kernel debug zlog.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Changed our alloc function to just copy the nhg and
nhg_depends. This makes the zebra_nhg_find code a
little bit cleaner, hopefully preventing bugs.
The only issue with this is that it makes us have to loop
over the nexthops in a group an extra time for the copies.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add functionality to allow us to send nexthop groups
to the kernel. It creates a nexthop_grp array based on
the dependency list in the nhg_hash_entry and then shoves
that into the netlink message.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Nexthop groups can have nexthops in different vrf's. So,
let's make the group vrf_id just be VRF_DEFAULT for hash
lookup purposes.
Set vrf_id to be VRF_DEFAULT for every message. If its a new
nextop, set the vrf to be the appropriate thing, otherwise
its a group and can just be left as default.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Simplify the code for nexthop hash entry creation. I made nexthop
hash entry creation expect the nexthop group and depends to always
be allocated before lookup. Before, it was only allocated if it had
dependencies. I think it makes the code a bit more readable to go
ahead an allocate even for single nexthops as well.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add functionality to read in a group from the kernel,
create a hash entry for it, and add its nexthops to
its dependency list.
Further, we create its nhg struct separtely from this,
copying over any nexthops it should reference directly
into it.
Thus, we have two types for representation of the nexthop group:
nhe->nhg_depends->[nhe, nhe, nhe]
nhe->nhg->nexthop->nexthop->nexthop
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Since nexthops are always going to need to be address family
specific unless they are only a group, we have to address
this when we receive and send them.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The message for an invalid address family on a nexthop gateway did
not specify that is what for the gateway specifically.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add an interface pointer for an nexthop group hash entry
when we are getting a rib_add for a new route.
Also, add the interface index to the `show nexthop-group` command.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When we get a new nexthop and find the interface associated
with it, add this nexthop to the interface's zebra interface
info nexthop hash entry list.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Make our route entry struct's re->ng nexthop group pointer
just point to the nhe->nhg nexthop hash entry nexthop group.
This will allow updates to the nexthop itself to propogate
to our routes immediately.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a parameter to the rib_add function so that it takes
a nexthop ID from the kernel if one is passed along
with the route.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Move the nexthop unicast parsing into its own function
to improve code readability. It was getting a bit too
indented.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add parsing code for nexthop object ID's when we get a
route. When we get a new route with the new kernel, it
will come with a nexthop ID and the nexthop full info.
We should just reference by ID if it exists and point
to the nexthop hash entry that matches it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Added functionality so that when we receive a RTM_DELNEXTHOP
for a nhg_hash_entry that is still being referenced by
a route, we immediately push it back to the kernel.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were ignoring the status result interger from
the netlink request and message parsing and just
returning 0. Fixed this to return the result of the last one.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Device only nexthops still need an address family associated
with them. Decided to get this from the destination prefix on it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We needed a kernel debugging function for netlink nexthop
messages when people are debugging kernel zebra messages.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add all the neccessary code to allow nexthops to be processed
in separate dataplane contexts with the netlink dataplane kernel
provider.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Added the appropriate flags that need to be set when
we receive a nexthop from the kernel. They should be
marked as ACTIVE and that they are in the FIB.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add the functionality to parse new nexthop group messages
from the kernel and insert them into the appropriate hash
tables. Parsing is done at startup between interface and
interface address lookup. Add functionality to parse
changes to nexthops we already have. Add functionality
to parse delete nexthop messages from the kernel and
remove them from our table.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
In the route_entry we are keeping a non pointer based
nexthop group, switch the code to use a pointer for all
operations here and ensure we create and delete the memory.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Update neighbor entries and rule entries to have the RTPROT_ZEBRA
protocol value. So we can tell where things come from.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Some netlink-facing code used for evpn/vxlan programming was
being run in the dataplane pthread, but accessing zebra core
datastructs. Move some additional data into the dataplane
context, and use it in the netlink path instead.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Move neighbor programming to the dataplane; remove
old apis; remove some ifdef'd use of direct netlink
code points, using neutral values outside of the netlink-
specific files.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
When resolving a nexthop, append its labels to the one its
resolving to along with the labels that may already be present there.
Before we were ignoring labels if the resolving level was greater than
two.
Before:
```
S> 2.2.2.2/32 [1/0] via 7.7.7.7 (recursive), label 2222, 00:00:07
* via 7.7.7.7, dummy1 onlink, label 1111, 00:00:07
S> 3.3.3.3/32 [1/0] via 2.2.2.2 (recursive), label 3333, 00:00:04
* via 7.7.7.7, dummy1 onlink, label 1111, 00:00:04
K>* 7.7.7.7/32 [0/0] is directly connected, dummy1, label 1111, 00:00:17
C>* 192.168.122.0/24 is directly connected, ens3, 00:00:17
K>* 192.168.122.1/32 [0/100] is directly connected, ens3, 00:00:17
ubuntu_nh#
```
This patch:
```
S> 2.2.2.2/32 [1/0] via 7.7.7.7 (recursive), label 2222, 00:00:04
* via 7.7.7.7, dummy1 onlink, label 1111/2222, 00:00:04
S> 3.3.3.3/32 [1/0] via 2.2.2.2 (recursive), label 3333, 00:00:02
* via 7.7.7.7, dummy1 onlink, label 1111/2222/3333, 00:00:02
K>* 7.7.7.7/32 [0/0] is directly connected, dummy1, label 1111, 00:00:11
C>* 192.168.122.0/24 is directly connected, ens3, 00:00:11
K>* 192.168.122.1/32 [0/100] is directly connected, ens3, 00:00:11
ubuntu_nh#
```
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When a new system route comes in and we have a pre-existing
non-system route we are not deleting the current system
route from the linux kernel.
Modify the code such that when a route replace is sent
to the kernel with a new route as a system route and
the old route as a non-system route do a delete of
the old route so it is no longer in the kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a conditional to guard against segfaulting on the debug
statement when zvrf lookup fails.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When installing a table route into the kernel choose
RTPROT_ZEBRA as the installing/controlling protocol.
This way we can know we installed it as well as stop
the warnings about this special case of `ip import-table XX`
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The failed neighbor event logging that was recently added in
commit: 3acae086ba
cast a bit too broad of a stroke. We should only inform
the user that we were ignoring the RTM_NEWNEIGH FAIL callback
when we believe it was one of our own 5549 entries.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a bit of extra code to indicate to the operator why
we intentionally rejected a kernel route from being used.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If we get a neighbor entry for 5549 failure notice
from the kernel that means that something has probably
gone terribly wrong. Let's notice and not reinstall.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The re->uptime usage of time(NULL) leaves it open to
timing changes from outside influence. Switching
to monotime allows us to ensure that we have a timestamp
that is always increasing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We have several route types KERNEL and CONNECT that are handled via special
case in the code. This was causing a lot of work keeping the two different
classes of route types as special(SYSTEM OR NOT). Put the dplane
in charge of the code that sets the bits for signalling route install/failure.
This greatly simplifies the code calling path and makes all route types
be handled exactly the same. Additionaly code that we want to run
post data plane install can just work as per normal then, instead
of having to know we need to run it when we have a special type
of route.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com.
When we get a neighbor entry in zebra we start processing it.
Let's add some additional debugs to the processing so that when
it bails out and we don't use the data, we know the reason.
This should help in debugging the problems from why bgp does
not appear to have data associated with a neighbor entry
in the kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The check for an entry being NUD_PERMANENT has already been done
there is no need to do it twice.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Use const in the accessors for pseudowire nhlfe data; pull
that through the kernel-facing apis that use that data.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
When we install a new route into the kernel always use
REPLACE. Else if the route is already there it can
be translated into an append with the flags we are
using.
This is especially true for the way we handle pbr
routes as that we are re-installing the same route
entry from pbr at the moment.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Read the onlink flag from the kernel for routes and pass them
up and through to zebra so that we are consistent with what
the kernel is telling us.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When the nexthop->type is NEXTHOP_TYPE_IPV4_IFINDEX we
were writing the RTA_PREFSRC 2 times for the build_singlepath
and build_multipath functions.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Some v6 attributes for the netlink_route_build_singlepath
code were being written two times for the NEXTHOP_TYPE_IPV6_IFINDEX
nexthop type.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The kernel neigh update api helps update neighbor entry,
using changing state and flags parameters.
Ticket:CM-22864
Reviewed By:
Testing Done:
Signed-off-by:Chirag Shah <chirag@cumulusnetworks.com>
Finish the LSP update code for the async dataplane for
the openbsd platform. Remove synch apis now that we've
converted to the async code path.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Adding infra to zebra dplane to support LSP updates. Add
kernel api for LSP updates that uses a dataplane context; add
stub apis for netlink, bsd, and 'null' kernel paths. Add
version of netlink mpls update code that takes a dplane
context struct instead of a zebra lsp struct.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
An EVPN type-2 entry is in freeze state during remote update,
remote VTEP can send typ-2 withdraw update,
upon receiving an entry delete (withdraw), first check
kernel has in local reachable state. Upon
unfreeze use the local entry to advertise to peers.
Fetch is for both MAC and IP, delete can come for
only MAC or MAC-IP combined route.
The specific entry fetch only required request flag to be set,
dump flag is not required.
Testing Done:
Simulate two VTEPs to do M1, IP1 mobility sequence,
freeze MAC during remote MAC update, subsequently send
withdraw type-2 route from origintating VTEP.
This results in read apis to invoke for local reachable entry.
Zebra updates its cache and upon unfreeze originates type-2.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Make netlink_request api generic where it can be used
for dump or querying specific information request.
nelink request nlm flags (NLM_F_ROOT | NLM_F_MATCH) are
used to dump purpose, if client wants to query spcific
MAC or IP using netlink_request does not require to set
them.
nlm struct is passed by the caller of netlink_request,
it can also set the nlm request flags.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
NEXTHOP_FLAG_ACTIVE currently means that the nexthop is considered
good enough to be installed. With current ecmp restrictions this
translation from multipath_num is enforced in the data plane.
The problem with this is of course that every data plane now
becomes concerned about the multipath num and must enforce it
independently. Currently *bsd does not honor multipath_num at
all and linux marks all nexthops as being installed even when
it honors a multipath_num that is less than the total.
This code change moves the multipath_num enforcement from a dataplane
decision to a zebra nexthop decision. Thus dataplanes now can
just install those nexthops marked as NEXTHOP_FLAG_ACTIVE
without having to worry about multipath_num.
*BSD will now respect multipath_num and Linux now properly notes
which routes are actually installed or not:
sharpd@donna ~/f/t/topotests> ps -ef | grep frr
frr 6261 1556 0 09:12 ? 00:00:00 /usr/lib/frr/zebra -e 2 --daemon -A 127.0.0.1
frr 6279 1556 0 09:12 ? 00:00:00 /usr/lib/frr/staticd --daemon -A 127.0.0.1
donna.cumulusnetworks.com(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route
K>* 0.0.0.0/0 [0/106] via 10.0.2.2, enp0s3, 00:00:45
S>* 4.4.4.4/32 [1/0] via 10.0.2.1, enp0s3, 00:00:02
* via 192.168.209.1, enp0s8, 00:00:02
via 192.168.210.1, enp0s9 inactive, 00:00:02
C>* 10.0.2.0/24 is directly connected, enp0s3, 00:00:45
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:00:45
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:00:45
donna.cumulusnetworks.com(config)#
sharpd@donna ~/f/t/topotests> ip route show
default via 10.0.2.2 dev enp0s3 proto dhcp metric 106
4.4.4.4 proto 196 metric 20
nexthop via 10.0.2.1 dev enp0s3 weight 1
nexthop via 192.168.209.1 dev enp0s8 weight 1
10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.15 metric 106
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
192.168.209.0/24 dev enp0s8 proto kernel scope link src 192.168.209.2 metric 105
192.168.210.0/24 dev enp0s9 proto kernel scope link src 192.168.210.2 metric 103
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem reported that kernel neighbor entries could end up in "FAILED"
state when the neighbor entry was deleted. This fix handles the
notification of the event from netlink messages and re-inserts the
deleted entry.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Recursive multipath nexthops were broken by the initial async
dataplane - we were trying to install an extra, invalid
nexthop.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Reduce or eliminate use of global zebra_ns structs in
a couple of netlink/kernel code paths, so that those paths
can potentially be made asynch eventually.
Slide netlink_talk_info into place to remove dependency on core
zebra structs; add accessors for dplane context block
Start init of route context from zebra core re and rn structs;
start queueing and event handling for incoming route updates.
Expose netlink apis that don't rely on zebra core structs;
add parallel route-update code path using the dplane ctx;
simplest possible event loop to process queued route'
updates.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Conditional code in netlink_macfdb_update() introduced in 2232a77c used
the 'dst_present' variable because not all cases were covered. Now it is
not necessary.
Signed-off-by: F. Aragon <paco@voltanet.io>
Reduce or eliminate use of global zebra_ns structs in
a couple of netlink/kernel code paths, so that those paths
can potentially be made asynch eventually.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Use boolean variables instead of unsigned int for certain VxLAN-EVPN
flags which are really used as boolean.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
Ticket: CM-22288
Reviewed By: CCR-7832
Testing Done:
Along with a subsequent, related commit
When a MAC moves from local to remote, a replace is allowed, EVPN
no longer has to delete the local MAC before installing the remote
MAC.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
So the linux kernel uses the RT_TABLE_MAIN for the table
id used for ip routing. The multicast routing tables use
RT_TABLE_DEFAULT. We changed the internal code of zebra_vrf
a few months back to use RT_TABLE_MAIN as the tableid to
use. This caused the pim sg stats to stop working because
of the kernel bug where it uses a different table
for ip routing and ip multicast.
Put a bit of a special case in to do the right thing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Newer linux kernels apparently send data down the netlink
bus for the creation of mroutes. Add a bit of code
to notice this and to handle it appropriately( ie do
nothing at this point in time ) as that the correct
place to do this is in the pim socket in pimd.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are displaying data about a netlink message
in debugs or errors, print out the message type
as a string instead of a number.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There exists a possibility that the ifindex we are passed
does not exist and as such we should check for it not
resolving as part of the debug.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We were ignoring mpls labels encapped with static routes.
Added support for single and multipath labels.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Allow protocols to specify to zebra that they would like zebra
to use the distance passed down as part of determine sameness for
Route Replace semantics.
This will be used by the static daemon to allow it to have
backup static routes with greater distances.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Prefix length validation checks should be returning an error
rather than 0. Switch to that and make them error messages.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Commit a2ca67d1d2 consolidated IPv4 and IPv6 handling. It also applied
our ignorance for IPv4 srcdest routes onto IPv6.
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
Bad nexthop messages from netlink were causing zebra
to hang here. Added a check to verify the length
of the nexthop so it doesn't keep trying to read.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Some more address family filters we can safely ignore
as well as typos in logger. Added AF_MPLS as filterable.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Zebra needed a check that varifies the prefix length
of an address is a valid length when receiving route
changes and interface address changes.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The zebra netlink socket was attempting to read netlink
messages with invalid address families in a couple areas.
Added filters and warn messages.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
EVPN ND ext community support NA flag R-bit, to have proxy ND.
Set R-bit in EVPN NA if a given router is default gateway or there is a
local
router attached, which can be determine based on local neighbor entry.
Implement BGP ext community attribute to generate and parse R-bit and
pass along zebra to program neigh entry in kernel.
Upon receiving MAC/IP update with community type 0x06 and sub_type 0x08,
pass the R-bit to zebra to program neigh entry.
Set NTF_ROUTER in neigh entry and inform kernel to do proxy NA for EVPN.
Ref:
https://tools.ietf.org/html/draft-ietf-bess-evpn-na-flags-01
Ticket:CM-21712, CM-21711
Reviewed By:
Testing Done:
Configure Local vni enabled L3 Gateway, which would act as router,
checked
show evpn arp-cache vni x ip <ip of svi> on originated and remote VTEPs.
"Router" flag is set.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Add 'const' to prefix args to several zebra route update,
redistribution, and route owner notification apis.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The interface lookup algorithm is different according to if we are on
netns vrf or not. If we are on the former case, then we only have to
parse the interfaces of the netns, while if we are on the other case, we
have to parse all the interfaces of all the vrfs ( since index is not
overlapping in the latter case).
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When we receive a netlink message from the kernel we have
handler functions for when we send a netlink command, if these
return a failure ( < 0 ) then we output that we had a parse
issue. But if all we get is:
2018-06-21T23:47:45.298156+00:00 qct-ix1-08 zebra[1484]: netlink-cmd (NS 0) filter function error
Then it is not very useful to figure out *where* the error happened.
Add more error code when in a decode path to hopefully allow us
to figure out where this message is coming from.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a bit of code to allow return of data plane
request messages.
Add the ability to pass the result back to callers
of kernel_route_rib.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The SOUTHBOUND_XXX enum was named a bit poorly.
Let's use a bit better name for what we are trying to do.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we receive a route that we think we own and we
are not in startup conditions, then add a small debug
to help debug the issue when this happens, instead
of silently just ignoring the route.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The re-use of RTPROT_STATIC has caused too many collisions
where other legitimate route sources are causing us to
believe we are the originator of the route. Modify
the code so that if another protocol inserts RTPROT_STATIC
we will assume it's a Kernel Route.
Fixes: #2293
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The linux kernel is getting the same Route Replace semantics
for v6 that v4 uses. Allow the end-user to know if their
kernel has this ability and if so to specify it so zebra
can take advantage of this.
Why not do auto-detection? Because you would have to write
code in zebra to add a route then add the same route again
with different nexthops to see if which semantics it is using.
It sure is easier to just add a cli that allows the user to
do it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Setup the buf used for extra data passed into kernel such
that we are cleaning it out before writing data to it,
so we can avoid writing uninited data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ensure that the next hop of the leaked VRF is not overwritten when the
route is being imported into the target VRF from the VPN table. Also, in
the case of multipath routes, ensure that the nexthop's ifindex is not
inadvertently reset.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Ensure that when EVPN routes are installed into zebra, the router MAC
is passed per next hop and appropriately handled. This is required for
proper multipath operation.
Ticket: CM-18999
Reviewed By:
Testing Done: Verified failed scenario, other manual tests
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
We are missing some handling of PBR and SHARP protocols
for netlink operations w/ the linux kernel.
Additionally add a bread crumb for new developers( or existing )
to know to fixup the rt_netlink.c when we start handling new
route types to hand to the kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
zserv.c has become something of a dumping ground for everything vaguely
related to ZAPI and really needs some love. This change splits out the
code fo building and consuming ZAPI messages into a separate source
file, leaving the actual session and client lifecycle code in zserv.c.
Unfortunately since the #include situation in Zebra has not been paid
much attention I was forced to fix the headers in a lot of other source
files. This is a net improvement overall though.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
With the recent change to just pass the prefix in
for the RTM_DELROUTE, for blackhole routes we
had stopped modifying the req.rtm_type to
be the appropriate type for blackhole routes.
Since we are just deleting on the route, and
zebra is never going to really install the same
route multiple times then we do not need
to specify the req.r.rtm_type for the deletion
command.
Ticket: CM-20616
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
EVPN owns the remote neigh entries which are programed in the kernel.
This entries should not age out and the only way to delete should be
from EVPN. We should program these entries with NUD_NOARP instead of
NUD_REACHABLE to avoid aging of this macs.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
There can be a race condition between kernel and frr as follows.
Frr sends remote neigh notification.
At the (almost) same time kernel might send a notification saying
neigh is local.
After processing this notifications, the state in frr is local while
state in kernel is remote. This causes kernel and frr to be out of sync.
This problem will be avoided if FRR acts on the kernel notifications for
remote neighbors. When FRR sees a remote neighbor notification for a
neighbor which it thinks is local, FRR will change the neigh state to remote.
Ticket: CM-19923/CM-18830
Review: CCR-7222
Testing: Manual
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>