Commit Graph

90 Commits

Author SHA1 Message Date
Mark Stapp
72b31b96fc *: create a single registry of daemons' default port values
Create a single registry of default port values that daemons
are using. Most of these are vty ports, but there are some
others for features like ospfapi and zebra FPM.

Signed-off-by: Mark Stapp <mjs@labn.net>
2024-02-01 11:40:02 -05:00
Russ White
54c2d327d3
Merge pull request #12261 from cscarpitta/srv6-encap-src-addr
zebra: Add the support of the Source Addr param of the SRv6 Encapsulation
2024-01-02 10:37:34 -05:00
Donald Sharp
61af06c813 zebra: Use event_add_event instead of _timer
The t_dequeue was being enqueued with a timer of 0
this is really an event instead of a timer.  Let's
use that instead.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-12-14 09:14:00 -05:00
Donald Sharp
3209ca4b08 zebra: Prevent possible wedged fpm write
An operator is reporting that the dplane_fpm_nl connection has
started to accumulate contexts.  One such path that could cause
this is that the obuf used is full and stays full.  This would
imply that what ever is on the receiving end has gotten wedged
and is not reading from the stream of data being sent it's way.
If after 15 seconds of no response, let's declare the connection
dead and reset it.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-12-14 09:12:46 -05:00
Carmine Scarpitta
8d0b4745a1 zebra: Add code to set SRv6 encap source addr in dplane
Add a bunch of set functions and associated data structure in
zebra_dplane to allow the configuration of the source address for SRv6
encap in the data plane.

Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
2023-12-14 14:56:44 +01:00
Donald Sharp
315aa6cde4 *: Remove netlink headers from lib/zebra.h
The headers associated with netlink code
really only belong in those that need it.
Move these headers out of lib/zebra.h and
into more appropriate places.  bgp's usage
of the RT_TABLE_XXX defines are probably not
appropriate and will be cleaned up in future
commits.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-11-07 06:46:19 -05:00
Igor Ryzhov
7d67b9ff28 build: add -Wimplicit-fallthrough
Also:
- replace all /* fallthrough */ comments with portable fallthrough;
pseudo keyword to accomodate both gcc and clang
- add missing break; statements as required by older versions of gcc
- cleanup some code to remove unnecessary fallthrough

Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
2023-10-12 21:23:18 +03:00
Rafael Zalamena
6374aeab80 zebra: support route replace semantics in FPM
Set `NLM_F_REPLACE` instead of adding and removing the same route for an
update.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2023-09-04 15:38:03 -03:00
Carmine Scarpitta
7f2dec4f09 zebra: Fix crash when dplane_fpm_nl fails to process received routes
When `dplane_fpm_nl` receives a route, it allocates memory for a dplane
context and calls `netlink_route_change_read_unicast_internal` without
initializing the `intf_extra_list` contained in the dplane context. If
`netlink_route_change_read_unicast_internal` is not able to process the
route, we call `dplane_ctx_fini` to free the dplane context. This causes
a crash because `dplane_ctx_fini` attempts to access the intf_extra_list
which is not initialized.

To solve this issue, we can call `dplane_ctx_route_init`to initialize
the dplane route context properly, just after the dplane context
allocation.

(gdb) bt
#0 0x0000555dd5ceae80 in dplane_intf_extra_list_pop (h=0x7fae1c007e68) at ../zebra/zebra_dplane.c:427
#1 dplane_ctx_free_internal (ctx=0x7fae1c0074b0) at ../zebra/zebra_dplane.c:724
#2 0x0000555dd5cebc99 in dplane_ctx_free (pctx=0x7fae2aa88c98) at ../zebra/zebra_dplane.c:869
#3 dplane_ctx_free (pctx=0x7fae2aa88c98, pctx@entry=0x7fae2aa78c28) at ../zebra/zebra_dplane.c:855
#4 dplane_ctx_fini (pctx=pctx@entry=0x7fae2aa88c98) at ../zebra/zebra_dplane.c:890
#5 0x00007fae31e93f29 in fpm_read (t=) at ../zebra/dplane_fpm_nl.c:605
#6 0x00007fae325191dd in thread_call (thread=thread@entry=0x7fae2aa98da0) at ../lib/thread.c:2006
#7 0x00007fae324c42b8 in fpt_run (arg=0x555dd74777c0) at ../lib/frr_pthread.c:309
#8 0x00007fae32405ea7 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9 0x00007fae32325a2f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Fixes: #13754
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
2023-07-07 10:59:28 +02:00
Donald Sharp
a014450441 zebra: Add code to get/set interface to pass up from dplane
1) Add a bunch of get/set functions and associated data
structure in zebra_dplane to allow the setting and retrieval
of interface netlink data up into the master pthread.

2) Add a bit of code to breakup startup into stages.  This is
because FRR currently has a mix of dplane and non dplane interactions
and the code needs to be paused before continuing on.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-05 13:03:14 -04:00
Donald Sharp
9a7d1e7427 zebra: Use zebra_vrf_lookup_by_id when we can
Let's make this as consistent as is possible.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-28 15:49:50 -04:00
Donald Sharp
cd9d053741 *: Convert struct event_master to struct event_loop
Let's find a better name for it.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
e16d030c65 *: Convert THREAD_XXX macros to EVENT_XXX macros
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
2453d15dbf *: Convert struct thread_master to struct event_master and it's ilk
Convert the `struct thread_master` to `struct event_master`
across the code base.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
332beb64b8 *: Convert thread_cancelXXX to event_cancelXXX
Modify the code base so that thread_cancel becomes event_cancel

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
907a2395f4 *: Convert thread_add_XXX functions to event_add_XXX
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
e6685141aa *: Rename struct thread to struct event
Effectively a massive search and replace of
`struct thread` to `struct event`.  Using the
term `thread` gives people the thought that
this event system is a pthread when it is not

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
8383d53e43
Merge pull request #12780 from opensourcerouting/spdx-license-id
*: convert to SPDX License identifiers
2023-02-17 09:43:05 -05:00
Sharath Ramamurthy
b95ce8fadb zebra: single vxlan device dataplace vni update changes
dplane_mac_info and dplane_neigh_info is modified to be vni aware.
dplane_rem_mac_add/del dplane_mac_init is modified to be vni aware.

During dplane context update (mac and neigh), we use the vni information
and if set, corresponding netlink attribute NDA_SRC_VNI is set and passed to the
dplane.

Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
2023-02-13 18:12:04 -05:00
Sharath Ramamurthy
8d30ff3b5e zebra: data structure changes for single vxlan device
This changeset introduces the data structure changes needed for
single vxlan device functionality. A new struct zebra_vxlan_vni_info
encodes the iftype and vni information for vxlan device.

The change addresses related access changes of the new data structure
fields from different files

zebra_vty is modified to take care of the vni dump information according
to the new vni data structure for vxlan devices.

Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
2023-02-13 18:12:04 -05:00
David Lamparter
acddc0ed3c *: auto-convert to SPDX License IDs
Done with a combination of regex'ing and banging my head against a wall.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2023-02-09 14:09:11 +01:00
Mark Stapp
ac96497ccc zebra: use typesafe lib lists in zebra dplane
Replace some of the old queue/DLIST macros with typesafe
dlists.

Signed-off-by: Mark Stapp <mjs@labn.net>
2023-01-23 08:55:44 -05:00
Donald Sharp
c0275ab189 zebra: Continue fpm_read when we decide a netlink message is not needed
When FRR receives a netlink message that it decides to stop parsing
it returns a 0 ( instead of a -1 ).  Just make the dplane continue
reading other data instead of aborting the read.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-01-10 08:36:50 -05:00
Yutaro Hayakawa
45c129948c fpm: Send NH message to FPM even if the local kernel doesn't support it
netlink_route_multipath_msg_encode checks whether the local kernel
supports NextHop Netlink message and doesn't send the message if the
local kernel doesn't have support. This is also applied to the FPM since
kernel dataplane and FPM shares the same code. However, for the FPM,
it's not necessary to have this limit.

This commit adds extra check if netlink_route_multipath_msg_encode is
called from the FPM and bypass kernel support check if it is from the
FPM.

Signed-off-by: Yutaro Hayakawa <yutaro.hayakawa@isovalent.com>
2022-12-25 14:52:57 +09:00
Donald Sharp
a0e1173678 zebra: Read from the dplane_fpm_nl a route update
Read from the fpm dplane a route update that will
include status about whether or not the asic was
successfull in offloading the route.

Have this data passed up to zebra for processing and disseminate
this data as appropriate.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-12-13 15:34:05 -05:00
Donald Sharp
7d83e13937 zebra: Re-arrange fpm_read to reduce code duplication
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-12-12 10:44:57 -05:00
Siger Yang
c317d3f246
zebra: traffic control state management
This allows Zebra to manage QDISC, TCLASS, TFILTER in kernel and do cleaning
jobs when it starts up.

Signed-off-by: Siger Yang <siger.yang@outlook.com>
2022-11-22 22:35:35 +08:00
Donald Sharp
551fa8c354 zebra: Fix dplane_fpm_nl to allow for fast configuration
If you have this order in your configuration file:

no fpm use-next-hop-groups
fpm address 127.0.0.1

the dplane code was using the same event thread t_event and the second
add event in the code was going, you already have an event scheduled
and as such the second event does not overwrite it.  Leaving
no code to actually start the whole processing.  There are probably
other cli iterations that will cause this fun as well, but I'm
not going to spend the time sussing them out at the moment.

Fixes: #12314
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-11-14 08:31:18 -05:00
Donald Sharp
dc31de93e1 zebra: Use the enum, luke
Use the enum and let the compiler help us figure out
what cases are being missed.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-11-14 08:06:16 -05:00
Siger Yang
449a30edf6
zebra: add tc netlink and dplane ops
This commit implements necessary netlink encoders for traffic control
including QDISC, TCLASS and TFILTER, and adds basic dplane operations.

Co-authored-by: Stephen Worley <sworley@nvidia.com>
Signed-off-by: Siger Yang <siger.yang@outlook.com>
2022-08-11 02:32:43 +08:00
Russ White
5d97021ba3
Merge pull request #10427 from sworleys/Protodown-Reason-Upstream
Add Support for Setting Protodown Reason Code
2022-03-15 19:58:16 -04:00
Rafael Zalamena
3b1caddd34 zebra: don't enqueue data with FPM socket closed
It will trigger an assert while trying to schedule the next write.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2022-03-14 07:14:36 -03:00
Stephen Worley
5d41413833 zebra: add support for protodown reason code
Add support for setting the protodown reason code.

829eb208e8

These patches handle all our netlink code for setting the reason.

For protodown reason we only set `frr` as the reason externally
but internally we have more descriptive reasoning available via
`show interface IFNAME`. The kernel only provides a bitwidth of 32
that all userspace programs have to share so this makes the most sense.

Since this is new functionality, it needs to be added to the dplane
pthread instead. So these patches, also move the protodown setting we
were doing before into the dplane pthread. For this, we abstract it a
bit more to make it a general interface LINK update dplane API. This
API can be expanded to support gernal link creation/updating when/if
someone ever adds that code.

We also move a more common entrypoint for evpn-mh and from zapi clients
like vrrpd. They both call common code now to set our internal flags
for protodown and protodown reason.

Also add debugging code for dumping netlink packets with
protodown/protodown_reason.

Signed-off-by: Stephen Worley <sworley@nvidia.com>
2022-03-09 17:52:44 -05:00
Mark Stapp
728f2017ae zebra: add dplane type for NETCONF data
Add a new dplane op for interface NETCONF data; add the new
enum value to several switch statements.

Signed-off-by: Mark Stapp <mstapp@nvidia.com>
2022-02-25 09:53:02 -05:00
Mark Stapp
d4bcd88d8a zebra: avoid default clause in FPM switch
Avoid default clause in a switch in the FPM module that handles
dplane op codes - include all the codes.

Signed-off-by: Mark Stapp <mstapp@nvidia.com>
2022-02-25 09:53:02 -05:00
Donald Sharp
cc9f21da22 *: Change thread->func to return void instead of int
The int return value is never used.  Modify the code
base to just return a void instead.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-02-23 19:56:04 -05:00
Donatas Abraitis
962af8a8cd zebra: Convert vty_out to vty_json for JSON
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2021-11-25 17:49:46 +02:00
Donald Sharp
8f74a383b3 zebra: Convert to struct zebra_lsp as per our internal standard
We do not use typedef's to talk about structures as per our standard.
Fixing.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2021-09-02 10:33:23 -04:00
Donald Sharp
05843a27f5 zebra: Convert to struct zebra_l3nvi as per our internal standard
We do not use typedef's to talk about structures as per our standard.
Fixing.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2021-09-02 10:33:22 -04:00
Donald Sharp
3198b2b347 zebra: Convert to struct zebra_mac as per our internal standard
We do not use typedef's to talk about structures as per our standard.
Fixing.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2021-09-02 10:33:22 -04:00
David Lamparter
80413c2073 *: require semicolon after FRR_DAEMON_INFO & co.
... again ...

Signed-off-by: David Lamparter <equinox@diac24.net>
2021-03-17 06:18:39 +01:00
Igor Ryzhov
1ac88792c0 *: fix all backets
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
2021-02-02 19:11:25 +03:00
Donald Sharp
3a15018892 zebra: Tell SA that we are intentionally ignoring the return
Calling fpm_nl_enqueue we should expect a it fit or not
return value on the outgoing stream.  This is not necessary
to check here because the while loop where we are checking this
already has ensured that the data being written will fit.

CID -> 1499854
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2021-01-18 09:06:49 -05:00
Duncan Eastoe
b677907c99 zebra: fpm_nl_process() reschedule dp thread
fpm_nl_process() now ensures that the dataplane thread is rescheduled
if it hits the work limit while processing its incoming work queue.

This would probably already occur due to some other event, such as
fpm_process_queue() enqueuing completed work to the output queue,
however it does no harm to add this explicit reschedule.

Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
2020-12-22 21:14:03 +00:00
Mark Stapp
700ff41ed3
Merge pull request #7472 from opensourcerouting/fpm-fixes
fpm: frr-reload, IPv6 and an improvement
2020-12-22 11:37:58 -05:00
Duncan Eastoe
438dd3e7df zebra: reduce atomic ops in fpm_process_queue()
Maintain the count of contexts which have been processed in a local
variable, and perform a single atomic update after we have consumed
all queued contexts.

Generally this results in at least one less atomic operation per
context.

Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
2020-12-18 15:37:13 +00:00
Duncan Eastoe
3f2b998f61 zebra: local var in fpm_process_queue() sched cond
Don't use an atomic operation to determine whether fpm_process_queue()
needs to be re-scheduled. Instead we can simply use a local variable
to determine if we stopped processing because we ran out of buffers.

In the case where we would have re-scheduled due to new context objects
in the queue (enqueued after we stopped processing), fpm_nl_process()
will schedule us (or will have done already).

Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
2020-12-18 15:36:39 +00:00
Duncan Eastoe
bf2f783945 zebra: reduce atomic ops in fpm_nl_process()
Maintain the peak ctxqueue length in a local variable, and perform
a single atomic update after processing all contexts.

Generally this results in at least one less atomic operation per
context.

Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
2020-12-18 15:36:38 +00:00
Duncan Eastoe
dc693fe057 zebra: reduce dplane_fpm_nl ctxqueue_mutex contention
Reduce code in the critical sections of fpm_nl_process() and
fpm_process_queue() to the bare minimum - basically only enqueue
and dequeue operations on the shared ctxqueue.

Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
2020-12-18 15:33:46 +00:00
Duncan Eastoe
164d8e8608 zebra: routes stuck with 'q' when using dplane FPM
New work enqueued to the dplane_fpm_nl provider is initially de-queued
and re-enqueued, in fpm_nl_process(), to be processed by the provider's
own thread.

After performing this initial de-queue/enqueue we return to
dplane_thread_loop() and check the dplane_fpm_nl output queue for any
work which has been completed.

Since this work is being processed in another thread it is very likely
that there will be some (or all) work still outstanding at this point.
The dataplane thread finishes up any other tasks and then waits until
it is next scheduled. In the meantime the dplane_fpm_nl thread is
processing its work queue until completion.

The issue arises here as the dataplane thread is not explicitly
re-scheduled once dplane_fpm_nl has drained its work queue and
populated its output queue with completed work.

This completed work can sit in the output queue for an indeterminate
period of time, depending upon when the dataplane thread is next
scheduled for other work. If the RIB has reached a stable state then
this could be a significant period of time. During this period zebra
marks these routes as queued, even though they have actually been
processed by all dataplane providers.

An un-related RIB change which triggers a FIB update will result in
the dataplane thread being scheduled and this completed work then
being processed. At this point the routes will then no longer be
marked as queued by zebra. However this new FIB update might itself
then fall victim to the same scenario!

We can observe the above behaviour in these detailed dplane logs.

    11:24:47 zebra[7282]: dplane: incoming new work counter: 2
    11:24:47 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
    11:24:47 zebra[7282]: dplane provider 'Kernel': processing
    11:24:47 zebra[7282]: Dplane NEIGH_DISCOVER, ip 192.168.2.2, ifindex 9
    11:24:47 zebra[7282]: Dplane NEIGH_DISCOVER, ip 192.168.2.2, ifindex 9
    11:24:47 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
    11:24:47 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
    11:24:47 zebra[7282]: dplane dequeues 1 completed work from provider dplane_fpm_nl
    11:24:47 zebra[7282]: dplane has 1 completed, 0 errors, for zebra main

2 contexts (all incoming work) have been queued to dplane_fpm_nl - all good.
1 completed context was de-queued, so there is outstanding work.

    11:24:58 zebra[7282]: dplane: incoming new work counter: 2
    11:24:58 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
    11:24:58 zebra[7282]: dplane provider 'Kernel': processing
    11:24:58 zebra[7282]: ID (193) Dplane nexthop update ctx 0x55c429b6fed0 op NH_INSTALL
    11:24:58 zebra[7282]: 0:5.5.5.5/32 Dplane route update ctx 0x55c429b79690 op ROUTE_INSTALL
    11:24:58 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
    11:24:58 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
    11:24:58 zebra[7282]: dplane dequeues 2 completed work from provider dplane_fpm_nl
    11:24:58 zebra[7282]: dplane has 2 completed, 0 errors, for zebra main

A further 2 contexts (all incoming work) have been queued to dplane_fpm_nl - all good.
2 completed contexts were de-queued, which sounds good as that is what we en-queued.
However, there is an outstanding context from earlier, so there is still outstanding
work.

Indeed the new 5.5.5.5/32 route is marked as queued:

    O>q 5.5.5.5/32 [110/10] via 192.168.2.2, dp0p1s3, weight 1, 00:01:19

This remains the case until we trigger a FIB update by installation of the
(eg.) 10.10.10.10/32 route:

    11:26:41 zebra[7282]: dplane: incoming new work counter: 2
    11:26:41 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
    11:26:41 zebra[7282]: dplane provider 'Kernel': processing
    11:26:41 zebra[7282]: ID (195) Dplane nexthop update ctx 0x55c429b78ce0 op NH_INSTALL
    11:26:41 zebra[7282]: 0:10.10.10.10/32 Dplane route update ctx 0x55c429b7a040 op ROUTE_INSTALL
    11:26:41 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
    11:26:41 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
    11:26:41 zebra[7282]: dplane dequeues 2 completed work from provider dplane_fpm_nl
    11:26:41 zebra[7282]: dplane has 2 completed, 0 errors, for zebra main
    11:26:41 zebra[7282]: zebra2proto: Please add this protocol(2) to proper rt_netlink.c handling
    11:26:41 zebra[7282]: Nexthop dplane ctx 0x55c429b6fed0, op NH_INSTALL, nexthop ID (193), result SUCCESS
    11:26:41 zebra[7282]: default(0:254):5.5.5.5/32 Processing dplane result ctx 0x55c429b79690, op ROUTE_INSTALL result SUCCESS

We observe the same 2 enqueues and 2 dequeues as before, which again suggests
that there is outstanding work.

As expected, the 5.5.5.5/32 route is no longer marked as queued:

    O>* 5.5.5.5/32 [110/10] via 192.168.2.2, dp0p1s3, weight 1, 00:02:06

But the 10.10.10.10/32 route is, as we have not yet processed the completed
context:

    C>q 10.10.10.10/32 is directly connected, lo, 00:26:05

Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
2020-12-11 15:04:15 +00:00