Before deleting nexthop groups, that are installed,
from the system, start a timer and hold the nexthop
group for that time.
Suppose you have this scenario
a) create a static route with 1 x ecmp
creates a nhg with 1 x ecmp
b) create a static route with 2 x ecmp
creates a nhg with 2 x ecmp
deletes a's nhg
c) create a static route with 3 x ecmp
creates a nhg with 3 x ecmp
deletes b's nhg
d) create a different route with 1 x ecmp
creates another 1 x ecmp ( since a's ecmp was deleted )
e) create a different route with 2 x ecmp
creates another 2 x ecmp ( since b's ecmp was deleted )
If you don't delete the nhg, start a timer, the nhg's used
in steps a and b can be reused for steps d and e. This reduces
overhead work with zebra <-> kernel interactions and improves
the speed of the system.
So modify the code to note that an installed nexthop group should
be kept around a bit and hopefully reused.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
I keep getting confused about nhg_depends and nhg_dependents.
So take a second and write them down for the next person.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add `%pNG` so that a nexthop group can be displayed in debugs/logs
such that it can provide useful information.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add uptime for use with NHEs to keep track of how
long we have had this NHE in our rib without an update.
This is treated exactly the same as the re->uptime for
routes. When we get an update for a route, we reset the
uptime.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Add a PROTO_OWNED macro for code readability when checking
ID bounds for whether a NHG is proto owned.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Use the main zebra workqueue for daemon-owned NHGs, in addition
to processing kernel-owned NHGs. The zapi message processing
creates a temporary object that's enqueued to the workqueue,
then processed/installed as part of the workqueue processing.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add a control and api for the use of backup nexthops in
recursive resolution. With 'no', we won't try to use installed
backup nexthops when resolving a recursive route.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Start reorg of zebra nexthop-resolution so that we can use the
resolution logic for nexthop-groups as well as routes. Change
the signature of the core nexthop_active() api so that it does
not require a route-entry or route-node. Move some of the logic
around so that nexthop-specific logic is in nexthop_active(),
while route-oriented logic is in nexthop_active_check().
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Send the results of daemons' nhg updates asynchronously,
after the update has actually completed. Capture additional
info about the source daemon in order to locate the correct
zapi session. Simplify the result types considered by the
zebra_nhg module.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
It is now 4bits of type and 28bits of value -
1. type=0 is for L3 NHG
2. type=1 is for L2 NH
3. type=2 is for L2 NHG
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When `-r` is specified to zebra, on shutdown we should
not remove any routes from the fib. This was a problem
with nhg's on shutdown due to their ref-count behavior.
Introduce a methodology where on shutdown we don't mess
with the nexthop groups in the kernel. That way on
next startup things will be ok.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add type to the nhg_proto_del API params for sanity checking
that the types of the route sent by the proto matches the type
found with the ID.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a flag to track the released state of a proto-based NHG.
This flag is used to know whether the upper level proto has called
the *_del API. Typically, the NHG would just get removed and uninstalled
at this point but there is a chance we are being sent it while routes
are still being owned or we were sent it multiple times. This flag
and associated code handles that.
Ticket: CM-30369
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Fix some reference counting issues seen when replacing
a NHG and deleting one.
For replacement, we should end with the same refcnt on the new
one.
For delete, its the caller's job to decrement its ref after
its done with it.
Further, update routes in the rib with the new pointer after replace.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add code to handle proto-based NHG uninstalling after
the owning client disconnects.
This is handled the same way as rib_score_proto() but for now
we are ignoring instance.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Remove some leftover boilerplate from the old replace
code path. That code ended up in the add API so its no
longer needed.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a command/functionality to only install proto-based nexthops.
That is nexthops owned/created by upper level protocols, not ones
implicitly created by zebra.
There are some scenarios where you would not want zebra to be
arbitrarily installing nexthop groups and but you still want
to use ones you have control over via lib/nexthop_group config
and an upper level protocol.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Implement the underlying zebra functionality to Add/Del an
internal zebra and kernel NHG.
These NHGs are managed by the upperlevel protocols that send them
down via zapi messaging.
They are not put into the overall zebra NHG hash table and only
put into to the ID table. Therefore, different protos cannot
and will not share NHGs.
The proto is also set appropriately when sent to the kernel.
Expand the separation of Zebra hashed/shared/created NHGs and
proto created and mangaged NHGs.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Implement the next hop group send on startup if you are using
them. Normally you will only have them if you are already using this
Linux kernel feature.
NOTE: to make sure all next hop groups exist, we send/enqueue all next
hop groups first and then we send routes. The RIB route walk start is
at the end of the function `fpm_nhg_send()`.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Include backup nexthops in nhe processing; connect incoming
zapi route data with updated rib/nhg apis; add more debugs in
nhg processing.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Use a backup index in a nexthop directly (if it has a backup
nexthop); revise the zebra nhe/nhg code; revise zapi route
decoding to match; revise the dataplane route datastructs.
Refactor some of the rib_add_multipath code to be prepared to
be called with an nhe, carrying nexthop and (possibly) backup
info together.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Embed nexthop-group, which is just a pointer, in the zebra
nexthop-hash-entry object, rather than mallocing one.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Nexthop groups as a whole do not make sense to have a vrf'ness
As that you can have a arbitrary number of nexthops that point
to separate vrf's.
Modify the code to make this distinction, by clearly delineating
the line between the nhg and the nexthop a bit better.
Nexthop groups having a vrf_id only make sense if you are using
network namespaces to represent them.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a config that disables use of kernel-level nexthop ids.
Currently, zebra always uses nexthop ids if the kernel supports
them.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Replace the existing list of nexthops (via a nexthop_group
struct) in the route_entry with a direct pointer to zebra's
new shared group (from zebra_nhg.h). This allows more
direct access to that shared group and the info it carries.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Clean up the relationships between zebra's rib and nexthop-group
headers as prep for adding a nexthop-group pointer to the
route_entry.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add a private header file for functions that are internal/special
case like how we do it for `lib/nexthop_group_private.h`.
Remove a bunch of functions from the header file only being used
statically and add some comments for those remaining to indicate
better what their use is.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Re-work the validity setting and checking APIs
for nhg_hash_entry's to make them clearer.
Further, they were originally only beings set
on ifdown and install. Extended their use into
releasing entries and to account for setting
the validity of a recursive dependent.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Change the wording of the flag indicating we have received
a nexthop group from the kernel with a different ID but
is fundamentally identical to one we already have.
It was colliding with a flag of similar name in the nexthop struct.
Change it from NEXTHOP_GROUP_DUPLICATE -> NEXTHOP_GROUP_UNHASHABLE
since it is in fact unhashable.
Also change the wording of functions and comments referencing the same
problem.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a comment to the header of `zebra_nhg.c` to point the reader
to where the hashtables containing the nhg entries are held.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a mechanism to requeue groups we receive from the
kernel if the IDs are in a weird order (Group ID is lower
than individual nexthop IDs for example).
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add handling for delete/update nexthop object messages from the
kernel.
If someone deletes a nexthop object we are still using, send it back
down. If the someone updates a nexthop we are using, replace that nexthop
with ours. Routes are referencing this nexthop object ID and we resolved
it ourselves, so we should force the other `someone` to submit to our
will.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
On restart, if we failed to remove any nexthop objects due
to a kill -9 or such event, sweep them if we aren't using them.
Add a proto field to handle this and remove the is_kernel bool.
Add a dupicate flag that indicates this nexthop group is only
present in our ID hashtable. It is a dupicate nexthop we received
from the kernel, therefore we cannot hash on it.
Make the idcounter globally accessible so that kernel updates
increment it as soon as we receive them, not when we handle them.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The kernel does not allow duplicate IDs in the same group, but
we are perfectly find with it internally if two different
nexthops resolve the the same nexthop (default route for instance).
So, we have to handle this when we get ready to install.
Further, pass the max group size in the arguments to ensure we
don't overflow. Don't actually think this is possible due to
multipath checking in nexthop_active_update() but better to be
safe.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Switch the nhg_connected tree structures to use the new
RB tree API in `lib/typerb.h`. We were using the openbsd-tree
implementation before.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When hashing/creating the NHE, use the nexthops vrf as its
source of data. This is gotten directly from an interface
and should not come from a route.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add the ability to recursively resolve nexthop group hash entries
and resolve them when sending to the kernel.
When copying over nexthops into an NHE, copy resolved info as well.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We will use a nhe context for dataplane interaction with
nextho group hash entries.
New nhe's from the kernel will be put into a group array
if they are a group and queued on the rib metaq to be processed
later.
New nhe's sent to the kernel will be set on the dataplane context
with approprate ID's in the group array if needed.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Removing this function since the new paradigm
of everything just being nhg_connected structs
makes it not make a lot of sense.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Re-organize and expose the nhg_connected functions so that
it can be used outside zebra_nhg.c. And then abstract those
into zebra_nhg_depends_* and zebra_nhg_depenents_* functons.
Switch the ifp struct to use an RB tree for its dependents,
making use of the nhg_connected functions.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Create a nhg_depenents tree that will function as a way
to get back pointers for NHE's depending on it.
Abstract the RB nodes into nhg_connected for both depends and
dependents. This same struct is used for both.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add function to increment the route reference count for nhg_hash_entry's
and to do so recursively if its a group.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>