In zebra_nhg_find(), if we created a nhg_hash_entry, return
true so we know rib-side.
Kernel-side, we don't care since it will always just enqueue
a context to process later.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add the ability to recursively resolve nexthop group hash entries
and resolve them when sending to the kernel.
When copying over nexthops into an NHE, copy resolved info as well.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were only setting and checking the ifindex if
the nexthop had an *_IFINDEX type. However, when nexthop
active checking is done, the non-*_IFINDEX types can also
obtain a nexthop with an ifindex and are thus valid too.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We will use a nhe context for dataplane interaction with
nextho group hash entries.
New nhe's from the kernel will be put into a group array
if they are a group and queued on the rib metaq to be processed
later.
New nhe's sent to the kernel will be set on the dataplane context
with approprate ID's in the group array if needed.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Upon release, call the approprate functions to remove itself
from depends/dependents trees it is in.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some functions to iterate over the depends/dependents
RB tree and remove themselves from the other's RB tree.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Can't RM_REMOVE directly with a key, you need to actually pass the
data to be removed. So, lookup with a key first to find the node,
then remove it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Removed a static function that did not need to be
there. The nhg_connected_cmp() function provides
all the needed functionality for comparing ID's
in the RB tree.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Update the zebra_nhg_hash_equal() function to use
the nexthop_group_equal() function in lib/nexthop_group
instead of comparing their depends RB tree.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Removing this function since the new paradigm
of everything just being nhg_connected structs
makes it not make a lot of sense.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Put the setting of the ifp on a nexthop group hash
entry into the zebra_nhg_alloc() function. It should
only be added if its not a group/recursive (it doesn't
have any depends) and its nexthop type has an ifindex.
This also provides functionality for proto-side ifp
setting.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Re-organize and expose the nhg_connected functions so that
it can be used outside zebra_nhg.c. And then abstract those
into zebra_nhg_depends_* and zebra_nhg_depenents_* functons.
Switch the ifp struct to use an RB tree for its dependents,
making use of the nhg_connected functions.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Create a nhg_depenents tree that will function as a way
to get back pointers for NHE's depending on it.
Abstract the RB nodes into nhg_connected for both depends and
dependents. This same struct is used for both.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add function to increment the route reference count for nhg_hash_entry's
and to do so recursively if its a group.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a helper function to allow us to check if two
nhg_hash_entry's dependency lists are equal.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add helper function to allow us to lookup an ID inside
of a nhg_hash_entry's dependency list.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a function that allows us to take a single
nexthop struct and look that up or create a group and
nexthop hash entry with it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Pass a boolean to zebra_nhg_find(), indicating whether the
nhg is being lookedup from the kernel side or not.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Move the id counter further up into zebra_nhg_find() so that
it is still incremented if we receive a duplicate that never
would get allocated. The kernel will still use the dup, so we
have to account for that in our id counter.
Also, if we don't create a new entry, reset the id back to where
it was when zebra_nhg_find() was called.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Changed our alloc function to just copy the nhg and
nhg_depends. This makes the zebra_nhg_find code a
little bit cleaner, hopefully preventing bugs.
The only issue with this is that it makes us have to loop
over the nexthops in a group an extra time for the copies.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Fix a couple functions that were using depends (plural)
rather than depend(singular) in their wording.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a function to duplicate a nhg dependency linked
list. We will use this for duplicating the dependency
list rather than the linked list dup function in lib/linkedlist.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Simplify the code for nexthop hash entry creation. I made nexthop
hash entry creation expect the nexthop group and depends to always
be allocated before lookup. Before, it was only allocated if it had
dependencies. I think it makes the code a bit more readable to go
ahead an allocate even for single nexthops as well.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some functions that can be called to free everything that should
have been allocated in a nexthop group hash entry.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add functionality to read in a group from the kernel,
create a hash entry for it, and add its nexthops to
its dependency list.
Further, we create its nhg struct separtely from this,
copying over any nexthops it should reference directly
into it.
Thus, we have two types for representation of the nexthop group:
nhe->nhg_depends->[nhe, nhe, nhe]
nhe->nhg->nexthop->nexthop->nexthop
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We treat "groups" from the kernel here as a dependency list.
Each hash entry, if its a group from the kernel, has
a list of any other nexthop hash entries that are in its
group. A non-group nexthop from the kernel will have its
dependency list set to NULL.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The nexthop group hash entries were using the "TMP" memory
type. Declared one for them and updated to use it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Changed to the wording in the duplicate error message
since its techincally possible we get could try to
create a dupe from somewhere else besides the kernel
in the future.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add an interface pointer for an nexthop group hash entry
when we are getting a rib_add for a new route.
Also, add the interface index to the `show nexthop-group` command.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add functionality to uninstall nexthops we created on shutdown.
To account for this, I added in a function for zebra_router
cleanup in a shutdown event.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When nexthop entry reference counts hit zero and
we created them, uninstall them from the kernel.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a function that can handle the results of a dataplane
ctx status, dpending on the operation performed.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add functions for sending a nexthop to be queued on the dataplane
for install/uninstall into the kernel.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Added functionality so that when we receive a RTM_DELNEXTHOP
for a nhg_hash_entry that is still being referenced by
a route, we immediately push it back to the kernel.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a function for installing Nexthop Group hash entires into
the kernel. It sends the entry to the dataplane and does any
post-processing immediately after that.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add the functionality to parse new nexthop group messages
from the kernel and insert them into the appropriate hash
tables. Parsing is done at startup between interface and
interface address lookup. Add functionality to parse
changes to nexthops we already have. Add functionality
to parse delete nexthop messages from the kernel and
remove them from our table.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
I do not believe we should be hashing based on AFI
in for our upper level nexthop group entries. These
should be ambiguous with regards to address families since
an ipv4 or ipv6 address can have the same interface
nexthop. This can be seen in NEXTHOP_TYPE_IFINDEX.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a nexthop hash entry to the route_entry so that we can
track the nhe with the route entry.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The messages we get from the kernel come with ids only
for groups, so lets index with those as well. Also adding
a helper function for lookup and get with the two different
tables.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The nexthop_active_num data structure is a property of the
nexthop group. Move the keeping of this data to that.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the route_entry we are keeping a non pointer based
nexthop group, switch the code to use a pointer for all
operations here and ensure we create and delete the memory.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add some code to allow us to do lookups and releases of
nexthop groups from zebra. At this point we do not do anything
with it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit does nothing more than just create a hash structure
that we will use to track nexthop groups.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We can assume that system/kernel routes are valid indeed
if this is our first time procesing them. But since we don't
get explicit deletion events for kernel routes anymore, we
have to be prepared to process them if the nexthop becomes
unreachable for instance. Therefore, if the route is not NEW,
then don't assume its valid.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Do not allow an upper level protocol to send a route to
zebra that is a /32 or a /128 that recurses through itself.
Current behavior:
donna.cumulusnetworks.com# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
K>* 0.0.0.0/0 [0/104] via 10.0.2.2, enp0s3, 01:05:28
C>* 10.0.2.0/24 is directly connected, enp0s3, 00:01:50
C>* 192.168.209.0/24 is directly connected, enp0s8, 01:05:28
C>* 192.168.210.0/24 is directly connected, enp0s9, 01:05:28
D>* 192.168.210.43/32 [150/0] via 192.168.210.44, enp0s9, 01:01:57
D 192.168.210.44/32 [150/0] via 192.168.210.44 inactive, 01:05:15
C>* 192.168.212.0/24 is directly connected, enp0s10, 01:05:28
donna.cumulusnetworks.com# sharp install routes 40.0.0.1 nexthop 192.168.210.44
% Command incomplete: sharp install routes 40.0.0.1 nexthop 192.168.210.44
donna.cumulusnetworks.com# sharp install routes 40.0.0.1 nexthop 192.168.210.44 1
donna.cumulusnetworks.com# end
donna.cumulusnetworks.com# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
K>* 0.0.0.0/0 [0/104] via 10.0.2.2, enp0s3, 01:05:51
C>* 10.0.2.0/24 is directly connected, enp0s3, 00:00:12
D>* 40.0.0.1/32 [150/0] via 192.168.210.44, enp0s9, 00:00:03
C>* 192.168.209.0/24 is directly connected, enp0s8, 01:05:51
C>* 192.168.210.0/24 is directly connected, enp0s9, 01:05:51
D>* 192.168.210.43/32 [150/0] via 192.168.210.44, enp0s9, 01:02:20
D 192.168.210.44/32 [150/0] via 192.168.210.44 inactive, 01:05:38
C>* 192.168.212.0/24 is directly connected, enp0s10, 01:05:51
donna.cumulusnetworks.com#
Fixed behavior:
donna.cumulusnetworks.com# sharp install routes 192.168.210.44 nexthop 192.168.210.44 1
donna.cumulusnetworks.com# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
K>* 0.0.0.0/0 [0/104] via 10.0.2.2, enp0s3, 00:00:15
C>* 10.0.2.0/24 is directly connected, enp0s3, 00:00:15
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:00:15
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:00:15
D 192.168.210.44/32 [150/0] via 192.168.210.44 inactive, 00:00:03
C>* 192.168.212.0/24 is directly connected, enp0s10, 00:00:15
donna.cumulusnetworks.com# sharp install routes 40.0.0.1 nexthop 192.168.210.44 1
donna.cumulusnetworks.com# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
K>* 0.0.0.0/0 [0/104] via 10.0.2.2, enp0s3, 00:00:24
C>* 10.0.2.0/24 is directly connected, enp0s3, 00:00:24
D>* 40.0.0.1/32 [150/0] via 192.168.210.44, enp0s9, 00:00:02
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:00:24
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:00:24
D 192.168.210.44/32 [150/0] via 192.168.210.44 inactive, 00:00:12
C>* 192.168.212.0/24 is directly connected, enp0s10, 00:00:24
donna.cumulusnetworks.com#
This behavior came up from discussion around issue #5159. Where
OSPF was receiving a route through itself as part of the router link
lsa. I currently think that ospf should probably dissallow this in ospf
but we should also do the right thing in zebra. If we do not allow this
change we can have situations where ordering of routes into zebra suddenly
matters.
Fixes: #5159
Signed-off-by: Donald Sharp <sharpd@cumulsunetworks.com>
If the nexthop is of type `GATEWAY_IFINDEX` then the nexthop
should not resolve to a nexthop that has a different ifindex
from the one given.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
even if vty commands were available, the default resolution command was
working only for the first vrf configured. others were ignored. Also,
for nexthop, resolution was working for all vrfs, and not the specific
one.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When resolving a nexthop, append its labels to the one its
resolving to along with the labels that may already be present there.
Before we were ignoring labels if the resolving level was greater than
two.
Before:
```
S> 2.2.2.2/32 [1/0] via 7.7.7.7 (recursive), label 2222, 00:00:07
* via 7.7.7.7, dummy1 onlink, label 1111, 00:00:07
S> 3.3.3.3/32 [1/0] via 2.2.2.2 (recursive), label 3333, 00:00:04
* via 7.7.7.7, dummy1 onlink, label 1111, 00:00:04
K>* 7.7.7.7/32 [0/0] is directly connected, dummy1, label 1111, 00:00:17
C>* 192.168.122.0/24 is directly connected, ens3, 00:00:17
K>* 192.168.122.1/32 [0/100] is directly connected, ens3, 00:00:17
ubuntu_nh#
```
This patch:
```
S> 2.2.2.2/32 [1/0] via 7.7.7.7 (recursive), label 2222, 00:00:04
* via 7.7.7.7, dummy1 onlink, label 1111/2222, 00:00:04
S> 3.3.3.3/32 [1/0] via 2.2.2.2 (recursive), label 3333, 00:00:02
* via 7.7.7.7, dummy1 onlink, label 1111/2222/3333, 00:00:02
K>* 7.7.7.7/32 [0/0] is directly connected, dummy1, label 1111, 00:00:11
C>* 192.168.122.0/24 is directly connected, ens3, 00:00:11
K>* 192.168.122.1/32 [0/100] is directly connected, ens3, 00:00:11
ubuntu_nh#
```
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
If there is a recursive route resolved over blackhole route, then
the resolved blackhole_type is not getting set correctly.
This fix updates the bh_type correctly for resursive routes.
Signed-off-by: vishaldhingra <vdhingra@vmware.com>
The flag ROUTE_ENTRY_NEXTHOPS_CHANGED is only ever set or unset.
Since this flag is not used for anything useful, remove from system.
By changing this flag we have re-ordered `internalStatus' of json
output of zebra rib routes. Go through and fix up tetsts to
use the new values.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Introducing a 3rd state for route_map_apply library function: RMAP_NOOP
Traditionally route map MATCH rule apis were designed to return
a binary response, consisting of either RMAP_MATCH or RMAP_NOMATCH.
(Route-map SET rule apis return RMAP_OKAY or RMAP_ERROR).
Depending on this response, the following statemachine decided the
course of action:
State1:
If match cmd returns RMAP_MATCH then, keep existing behaviour.
If routemap type is PERMIT, execute set cmds or call cmds if applicable,
otherwise PERMIT!
Else If routemap type is DENY, we DENYMATCH right away
State2:
If match cmd returns RMAP_NOMATCH, continue on to next route-map. If there
are no other rules or if all the rules return RMAP_NOMATCH, return DENYMATCH
We require a 3rd state because of the following situation:
The issue - what if, the rule api needs to abort or ignore a rule?:
"match evpn vni xx" route-map filter can be applied to incoming routes
regardless of whether the tunnel type is vxlan or mpls.
This rule should be N/A for mpls based evpn route, but applicable to only
vxlan based evpn route.
Also, this rule should be applicable for routes with VNI label only, and
not for routes without labels. For example, type 3 and type 4 EVPN routes
do not have labels, so, this match cmd should let them through.
Today, the filter produces either a match or nomatch response regardless of
whether it is mpls/vxlan, resulting in either permitting or denying the
route.. So an mpls evpn route may get filtered out incorrectly.
Eg: "route-map RM1 permit 10 ; match evpn vni 20" or
"route-map RM2 deny 20 ; match vni 20"
With the introduction of the 3rd state, we can abort this rule check safely.
How? The rules api can now return RMAP_NOOP to indicate
that it encountered an invalid check, and needs to abort just that rule,
but continue with other rules.
As a result we have a 3rd state:
State3:
If match cmd returned RMAP_NOOP
Then, proceed to other route-map, otherwise if there are no more
rules or if all the rules return RMAP_NOOP, then, return RMAP_PERMITMATCH.
Signed-off-by: Lakshman Krishnamoorthy <lkrishnamoor@vmware.com>
Add a file that exposes functions which modify nexthop groups.
Nexthop groups are techincally immutable but there are a
few special cases where we need direct access to add/remove
nexthops after the group has been made. This file provides a
way to expose those functions in a way that makes it clear
this is a private/hidden api.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Introducing a 3rd state for route_map_apply library function: RMAP_NOOP
Traditionally route map MATCH rule apis were designed to return
a binary response, consisting of either RMAP_MATCH or RMAP_NOMATCH.
(Route-map SET rule apis return RMAP_OKAY or RMAP_ERROR).
Depending on this response, the following statemachine decided the
course of action:
Action: Apply route-map match and return the result (RMAP_MATCH/RMAP_NOMATCH)
State1: Receveived RMAP_MATCH
THEN: If Routemap type is PERMIT, execute other rules if applicable,
otherwise we PERMIT!
Else: If Routemap type is DENY, we DENYMATCH right away
State2: Received RMAP_NOMATCH, continue on to next route-map, otherwise,
return DENYMATCH by default if nothing matched.
With reference to PR 4078 (https://github.com/FRRouting/frr/pull/4078),
we require a 3rd state because of the following situation:
The issue - what if, the rule api needs to abort or ignore a rule?:
"match evpn vni xx" route-map filter can be applied to incoming routes
regardless of whether the tunnel type is vxlan or mpls.
This rule should be N/A for mpls based evpn route, but applicable to only
vxlan based evpn route.
Today, the filter produces either a match or nomatch response regardless of
whether it is mpls/vxlan, resulting in either permitting or denying the
route.. So an mpls evpn route may get filtered out incorrectly.
Eg: "route-map RM1 permit 10 ; match evpn vni 20" or
"route-map RM2 deny 20 ; match vni 20"
With the introduction of the 3rd state, we can abort this rule check safely.
How? The rules api can now return RMAP_NOOP (or another enum) to indicate
that it encountered an invalid check, and needs to abort just that rule,
but continue with other rules.
Question: Do we repurpose an existing enum RMAP_OKAY or RMAP_ERROR
as the 3rd state (or create a new enum like RMAP_NOOP)?
RMAP_OKAY and RMAP_ERROR are used to return the result of set cmd.
We chose to go with RMAP_NOOP (but open to ideas),
as a way to bypass the rmap filter
As a result we have a 3rd state:
State3: Received RMAP_NOOP
Then, proceed to other route-map, otherwise return RMAP_PERMITMATCH by default.
Signed-off-by:Lakshman Krishnamoorthy <lkrishnamoor@vmware.com>
Since these functions are not really rib processing problems
let's move them to zebra_nhg.c which is meant for processing of
nexthop groups.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>