Device only nexthops still need an address family associated
with them. Decided to get this from the destination prefix on it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We needed a kernel debugging function for netlink nexthop
messages when people are debugging kernel zebra messages.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add all the neccessary code to allow nexthops to be processed
in separate dataplane contexts with the netlink dataplane kernel
provider.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Added the appropriate flags that need to be set when
we receive a nexthop from the kernel. They should be
marked as ACTIVE and that they are in the FIB.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add the functionality to parse new nexthop group messages
from the kernel and insert them into the appropriate hash
tables. Parsing is done at startup between interface and
interface address lookup. Add functionality to parse
changes to nexthops we already have. Add functionality
to parse delete nexthop messages from the kernel and
remove them from our table.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
In the route_entry we are keeping a non pointer based
nexthop group, switch the code to use a pointer for all
operations here and ensure we create and delete the memory.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Update neighbor entries and rule entries to have the RTPROT_ZEBRA
protocol value. So we can tell where things come from.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Some netlink-facing code used for evpn/vxlan programming was
being run in the dataplane pthread, but accessing zebra core
datastructs. Move some additional data into the dataplane
context, and use it in the netlink path instead.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Move neighbor programming to the dataplane; remove
old apis; remove some ifdef'd use of direct netlink
code points, using neutral values outside of the netlink-
specific files.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
When resolving a nexthop, append its labels to the one its
resolving to along with the labels that may already be present there.
Before we were ignoring labels if the resolving level was greater than
two.
Before:
```
S> 2.2.2.2/32 [1/0] via 7.7.7.7 (recursive), label 2222, 00:00:07
* via 7.7.7.7, dummy1 onlink, label 1111, 00:00:07
S> 3.3.3.3/32 [1/0] via 2.2.2.2 (recursive), label 3333, 00:00:04
* via 7.7.7.7, dummy1 onlink, label 1111, 00:00:04
K>* 7.7.7.7/32 [0/0] is directly connected, dummy1, label 1111, 00:00:17
C>* 192.168.122.0/24 is directly connected, ens3, 00:00:17
K>* 192.168.122.1/32 [0/100] is directly connected, ens3, 00:00:17
ubuntu_nh#
```
This patch:
```
S> 2.2.2.2/32 [1/0] via 7.7.7.7 (recursive), label 2222, 00:00:04
* via 7.7.7.7, dummy1 onlink, label 1111/2222, 00:00:04
S> 3.3.3.3/32 [1/0] via 2.2.2.2 (recursive), label 3333, 00:00:02
* via 7.7.7.7, dummy1 onlink, label 1111/2222/3333, 00:00:02
K>* 7.7.7.7/32 [0/0] is directly connected, dummy1, label 1111, 00:00:11
C>* 192.168.122.0/24 is directly connected, ens3, 00:00:11
K>* 192.168.122.1/32 [0/100] is directly connected, ens3, 00:00:11
ubuntu_nh#
```
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When a new system route comes in and we have a pre-existing
non-system route we are not deleting the current system
route from the linux kernel.
Modify the code such that when a route replace is sent
to the kernel with a new route as a system route and
the old route as a non-system route do a delete of
the old route so it is no longer in the kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a conditional to guard against segfaulting on the debug
statement when zvrf lookup fails.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When installing a table route into the kernel choose
RTPROT_ZEBRA as the installing/controlling protocol.
This way we can know we installed it as well as stop
the warnings about this special case of `ip import-table XX`
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The failed neighbor event logging that was recently added in
commit: 3acae086ba
cast a bit too broad of a stroke. We should only inform
the user that we were ignoring the RTM_NEWNEIGH FAIL callback
when we believe it was one of our own 5549 entries.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a bit of extra code to indicate to the operator why
we intentionally rejected a kernel route from being used.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If we get a neighbor entry for 5549 failure notice
from the kernel that means that something has probably
gone terribly wrong. Let's notice and not reinstall.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The re->uptime usage of time(NULL) leaves it open to
timing changes from outside influence. Switching
to monotime allows us to ensure that we have a timestamp
that is always increasing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We have several route types KERNEL and CONNECT that are handled via special
case in the code. This was causing a lot of work keeping the two different
classes of route types as special(SYSTEM OR NOT). Put the dplane
in charge of the code that sets the bits for signalling route install/failure.
This greatly simplifies the code calling path and makes all route types
be handled exactly the same. Additionaly code that we want to run
post data plane install can just work as per normal then, instead
of having to know we need to run it when we have a special type
of route.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com.
When we get a neighbor entry in zebra we start processing it.
Let's add some additional debugs to the processing so that when
it bails out and we don't use the data, we know the reason.
This should help in debugging the problems from why bgp does
not appear to have data associated with a neighbor entry
in the kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The check for an entry being NUD_PERMANENT has already been done
there is no need to do it twice.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Use const in the accessors for pseudowire nhlfe data; pull
that through the kernel-facing apis that use that data.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
When we install a new route into the kernel always use
REPLACE. Else if the route is already there it can
be translated into an append with the flags we are
using.
This is especially true for the way we handle pbr
routes as that we are re-installing the same route
entry from pbr at the moment.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Read the onlink flag from the kernel for routes and pass them
up and through to zebra so that we are consistent with what
the kernel is telling us.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When the nexthop->type is NEXTHOP_TYPE_IPV4_IFINDEX we
were writing the RTA_PREFSRC 2 times for the build_singlepath
and build_multipath functions.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Some v6 attributes for the netlink_route_build_singlepath
code were being written two times for the NEXTHOP_TYPE_IPV6_IFINDEX
nexthop type.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The kernel neigh update api helps update neighbor entry,
using changing state and flags parameters.
Ticket:CM-22864
Reviewed By:
Testing Done:
Signed-off-by:Chirag Shah <chirag@cumulusnetworks.com>
Finish the LSP update code for the async dataplane for
the openbsd platform. Remove synch apis now that we've
converted to the async code path.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Adding infra to zebra dplane to support LSP updates. Add
kernel api for LSP updates that uses a dataplane context; add
stub apis for netlink, bsd, and 'null' kernel paths. Add
version of netlink mpls update code that takes a dplane
context struct instead of a zebra lsp struct.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
An EVPN type-2 entry is in freeze state during remote update,
remote VTEP can send typ-2 withdraw update,
upon receiving an entry delete (withdraw), first check
kernel has in local reachable state. Upon
unfreeze use the local entry to advertise to peers.
Fetch is for both MAC and IP, delete can come for
only MAC or MAC-IP combined route.
The specific entry fetch only required request flag to be set,
dump flag is not required.
Testing Done:
Simulate two VTEPs to do M1, IP1 mobility sequence,
freeze MAC during remote MAC update, subsequently send
withdraw type-2 route from origintating VTEP.
This results in read apis to invoke for local reachable entry.
Zebra updates its cache and upon unfreeze originates type-2.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Make netlink_request api generic where it can be used
for dump or querying specific information request.
nelink request nlm flags (NLM_F_ROOT | NLM_F_MATCH) are
used to dump purpose, if client wants to query spcific
MAC or IP using netlink_request does not require to set
them.
nlm struct is passed by the caller of netlink_request,
it can also set the nlm request flags.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
NEXTHOP_FLAG_ACTIVE currently means that the nexthop is considered
good enough to be installed. With current ecmp restrictions this
translation from multipath_num is enforced in the data plane.
The problem with this is of course that every data plane now
becomes concerned about the multipath num and must enforce it
independently. Currently *bsd does not honor multipath_num at
all and linux marks all nexthops as being installed even when
it honors a multipath_num that is less than the total.
This code change moves the multipath_num enforcement from a dataplane
decision to a zebra nexthop decision. Thus dataplanes now can
just install those nexthops marked as NEXTHOP_FLAG_ACTIVE
without having to worry about multipath_num.
*BSD will now respect multipath_num and Linux now properly notes
which routes are actually installed or not:
sharpd@donna ~/f/t/topotests> ps -ef | grep frr
frr 6261 1556 0 09:12 ? 00:00:00 /usr/lib/frr/zebra -e 2 --daemon -A 127.0.0.1
frr 6279 1556 0 09:12 ? 00:00:00 /usr/lib/frr/staticd --daemon -A 127.0.0.1
donna.cumulusnetworks.com(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route
K>* 0.0.0.0/0 [0/106] via 10.0.2.2, enp0s3, 00:00:45
S>* 4.4.4.4/32 [1/0] via 10.0.2.1, enp0s3, 00:00:02
* via 192.168.209.1, enp0s8, 00:00:02
via 192.168.210.1, enp0s9 inactive, 00:00:02
C>* 10.0.2.0/24 is directly connected, enp0s3, 00:00:45
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:00:45
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:00:45
donna.cumulusnetworks.com(config)#
sharpd@donna ~/f/t/topotests> ip route show
default via 10.0.2.2 dev enp0s3 proto dhcp metric 106
4.4.4.4 proto 196 metric 20
nexthop via 10.0.2.1 dev enp0s3 weight 1
nexthop via 192.168.209.1 dev enp0s8 weight 1
10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.15 metric 106
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
192.168.209.0/24 dev enp0s8 proto kernel scope link src 192.168.209.2 metric 105
192.168.210.0/24 dev enp0s9 proto kernel scope link src 192.168.210.2 metric 103
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem reported that kernel neighbor entries could end up in "FAILED"
state when the neighbor entry was deleted. This fix handles the
notification of the event from netlink messages and re-inserts the
deleted entry.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Recursive multipath nexthops were broken by the initial async
dataplane - we were trying to install an extra, invalid
nexthop.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Reduce or eliminate use of global zebra_ns structs in
a couple of netlink/kernel code paths, so that those paths
can potentially be made asynch eventually.
Slide netlink_talk_info into place to remove dependency on core
zebra structs; add accessors for dplane context block
Start init of route context from zebra core re and rn structs;
start queueing and event handling for incoming route updates.
Expose netlink apis that don't rely on zebra core structs;
add parallel route-update code path using the dplane ctx;
simplest possible event loop to process queued route'
updates.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Conditional code in netlink_macfdb_update() introduced in 2232a77c used
the 'dst_present' variable because not all cases were covered. Now it is
not necessary.
Signed-off-by: F. Aragon <paco@voltanet.io>
Reduce or eliminate use of global zebra_ns structs in
a couple of netlink/kernel code paths, so that those paths
can potentially be made asynch eventually.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Use boolean variables instead of unsigned int for certain VxLAN-EVPN
flags which are really used as boolean.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
Ticket: CM-22288
Reviewed By: CCR-7832
Testing Done:
Along with a subsequent, related commit
When a MAC moves from local to remote, a replace is allowed, EVPN
no longer has to delete the local MAC before installing the remote
MAC.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
So the linux kernel uses the RT_TABLE_MAIN for the table
id used for ip routing. The multicast routing tables use
RT_TABLE_DEFAULT. We changed the internal code of zebra_vrf
a few months back to use RT_TABLE_MAIN as the tableid to
use. This caused the pim sg stats to stop working because
of the kernel bug where it uses a different table
for ip routing and ip multicast.
Put a bit of a special case in to do the right thing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Newer linux kernels apparently send data down the netlink
bus for the creation of mroutes. Add a bit of code
to notice this and to handle it appropriately( ie do
nothing at this point in time ) as that the correct
place to do this is in the pim socket in pimd.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are displaying data about a netlink message
in debugs or errors, print out the message type
as a string instead of a number.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There exists a possibility that the ifindex we are passed
does not exist and as such we should check for it not
resolving as part of the debug.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We were ignoring mpls labels encapped with static routes.
Added support for single and multipath labels.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Allow protocols to specify to zebra that they would like zebra
to use the distance passed down as part of determine sameness for
Route Replace semantics.
This will be used by the static daemon to allow it to have
backup static routes with greater distances.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Prefix length validation checks should be returning an error
rather than 0. Switch to that and make them error messages.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Commit a2ca67d1d2 consolidated IPv4 and IPv6 handling. It also applied
our ignorance for IPv4 srcdest routes onto IPv6.
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
Bad nexthop messages from netlink were causing zebra
to hang here. Added a check to verify the length
of the nexthop so it doesn't keep trying to read.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Some more address family filters we can safely ignore
as well as typos in logger. Added AF_MPLS as filterable.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Zebra needed a check that varifies the prefix length
of an address is a valid length when receiving route
changes and interface address changes.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The zebra netlink socket was attempting to read netlink
messages with invalid address families in a couple areas.
Added filters and warn messages.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
EVPN ND ext community support NA flag R-bit, to have proxy ND.
Set R-bit in EVPN NA if a given router is default gateway or there is a
local
router attached, which can be determine based on local neighbor entry.
Implement BGP ext community attribute to generate and parse R-bit and
pass along zebra to program neigh entry in kernel.
Upon receiving MAC/IP update with community type 0x06 and sub_type 0x08,
pass the R-bit to zebra to program neigh entry.
Set NTF_ROUTER in neigh entry and inform kernel to do proxy NA for EVPN.
Ref:
https://tools.ietf.org/html/draft-ietf-bess-evpn-na-flags-01
Ticket:CM-21712, CM-21711
Reviewed By:
Testing Done:
Configure Local vni enabled L3 Gateway, which would act as router,
checked
show evpn arp-cache vni x ip <ip of svi> on originated and remote VTEPs.
"Router" flag is set.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Add 'const' to prefix args to several zebra route update,
redistribution, and route owner notification apis.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The interface lookup algorithm is different according to if we are on
netns vrf or not. If we are on the former case, then we only have to
parse the interfaces of the netns, while if we are on the other case, we
have to parse all the interfaces of all the vrfs ( since index is not
overlapping in the latter case).
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When we receive a netlink message from the kernel we have
handler functions for when we send a netlink command, if these
return a failure ( < 0 ) then we output that we had a parse
issue. But if all we get is:
2018-06-21T23:47:45.298156+00:00 qct-ix1-08 zebra[1484]: netlink-cmd (NS 0) filter function error
Then it is not very useful to figure out *where* the error happened.
Add more error code when in a decode path to hopefully allow us
to figure out where this message is coming from.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a bit of code to allow return of data plane
request messages.
Add the ability to pass the result back to callers
of kernel_route_rib.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The SOUTHBOUND_XXX enum was named a bit poorly.
Let's use a bit better name for what we are trying to do.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we receive a route that we think we own and we
are not in startup conditions, then add a small debug
to help debug the issue when this happens, instead
of silently just ignoring the route.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The re-use of RTPROT_STATIC has caused too many collisions
where other legitimate route sources are causing us to
believe we are the originator of the route. Modify
the code so that if another protocol inserts RTPROT_STATIC
we will assume it's a Kernel Route.
Fixes: #2293
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>