Commit Graph

48 Commits

Author SHA1 Message Date
Donald Sharp
3a15018892 zebra: Tell SA that we are intentionally ignoring the return
Calling fpm_nl_enqueue we should expect a it fit or not
return value on the outgoing stream.  This is not necessary
to check here because the while loop where we are checking this
already has ensured that the data being written will fit.

CID -> 1499854
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2021-01-18 09:06:49 -05:00
Duncan Eastoe
b677907c99 zebra: fpm_nl_process() reschedule dp thread
fpm_nl_process() now ensures that the dataplane thread is rescheduled
if it hits the work limit while processing its incoming work queue.

This would probably already occur due to some other event, such as
fpm_process_queue() enqueuing completed work to the output queue,
however it does no harm to add this explicit reschedule.

Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
2020-12-22 21:14:03 +00:00
Mark Stapp
700ff41ed3
Merge pull request #7472 from opensourcerouting/fpm-fixes
fpm: frr-reload, IPv6 and an improvement
2020-12-22 11:37:58 -05:00
Duncan Eastoe
438dd3e7df zebra: reduce atomic ops in fpm_process_queue()
Maintain the count of contexts which have been processed in a local
variable, and perform a single atomic update after we have consumed
all queued contexts.

Generally this results in at least one less atomic operation per
context.

Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
2020-12-18 15:37:13 +00:00
Duncan Eastoe
3f2b998f61 zebra: local var in fpm_process_queue() sched cond
Don't use an atomic operation to determine whether fpm_process_queue()
needs to be re-scheduled. Instead we can simply use a local variable
to determine if we stopped processing because we ran out of buffers.

In the case where we would have re-scheduled due to new context objects
in the queue (enqueued after we stopped processing), fpm_nl_process()
will schedule us (or will have done already).

Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
2020-12-18 15:36:39 +00:00
Duncan Eastoe
bf2f783945 zebra: reduce atomic ops in fpm_nl_process()
Maintain the peak ctxqueue length in a local variable, and perform
a single atomic update after processing all contexts.

Generally this results in at least one less atomic operation per
context.

Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
2020-12-18 15:36:38 +00:00
Duncan Eastoe
dc693fe057 zebra: reduce dplane_fpm_nl ctxqueue_mutex contention
Reduce code in the critical sections of fpm_nl_process() and
fpm_process_queue() to the bare minimum - basically only enqueue
and dequeue operations on the shared ctxqueue.

Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
2020-12-18 15:33:46 +00:00
Duncan Eastoe
164d8e8608 zebra: routes stuck with 'q' when using dplane FPM
New work enqueued to the dplane_fpm_nl provider is initially de-queued
and re-enqueued, in fpm_nl_process(), to be processed by the provider's
own thread.

After performing this initial de-queue/enqueue we return to
dplane_thread_loop() and check the dplane_fpm_nl output queue for any
work which has been completed.

Since this work is being processed in another thread it is very likely
that there will be some (or all) work still outstanding at this point.
The dataplane thread finishes up any other tasks and then waits until
it is next scheduled. In the meantime the dplane_fpm_nl thread is
processing its work queue until completion.

The issue arises here as the dataplane thread is not explicitly
re-scheduled once dplane_fpm_nl has drained its work queue and
populated its output queue with completed work.

This completed work can sit in the output queue for an indeterminate
period of time, depending upon when the dataplane thread is next
scheduled for other work. If the RIB has reached a stable state then
this could be a significant period of time. During this period zebra
marks these routes as queued, even though they have actually been
processed by all dataplane providers.

An un-related RIB change which triggers a FIB update will result in
the dataplane thread being scheduled and this completed work then
being processed. At this point the routes will then no longer be
marked as queued by zebra. However this new FIB update might itself
then fall victim to the same scenario!

We can observe the above behaviour in these detailed dplane logs.

    11:24:47 zebra[7282]: dplane: incoming new work counter: 2
    11:24:47 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
    11:24:47 zebra[7282]: dplane provider 'Kernel': processing
    11:24:47 zebra[7282]: Dplane NEIGH_DISCOVER, ip 192.168.2.2, ifindex 9
    11:24:47 zebra[7282]: Dplane NEIGH_DISCOVER, ip 192.168.2.2, ifindex 9
    11:24:47 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
    11:24:47 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
    11:24:47 zebra[7282]: dplane dequeues 1 completed work from provider dplane_fpm_nl
    11:24:47 zebra[7282]: dplane has 1 completed, 0 errors, for zebra main

2 contexts (all incoming work) have been queued to dplane_fpm_nl - all good.
1 completed context was de-queued, so there is outstanding work.

    11:24:58 zebra[7282]: dplane: incoming new work counter: 2
    11:24:58 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
    11:24:58 zebra[7282]: dplane provider 'Kernel': processing
    11:24:58 zebra[7282]: ID (193) Dplane nexthop update ctx 0x55c429b6fed0 op NH_INSTALL
    11:24:58 zebra[7282]: 0:5.5.5.5/32 Dplane route update ctx 0x55c429b79690 op ROUTE_INSTALL
    11:24:58 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
    11:24:58 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
    11:24:58 zebra[7282]: dplane dequeues 2 completed work from provider dplane_fpm_nl
    11:24:58 zebra[7282]: dplane has 2 completed, 0 errors, for zebra main

A further 2 contexts (all incoming work) have been queued to dplane_fpm_nl - all good.
2 completed contexts were de-queued, which sounds good as that is what we en-queued.
However, there is an outstanding context from earlier, so there is still outstanding
work.

Indeed the new 5.5.5.5/32 route is marked as queued:

    O>q 5.5.5.5/32 [110/10] via 192.168.2.2, dp0p1s3, weight 1, 00:01:19

This remains the case until we trigger a FIB update by installation of the
(eg.) 10.10.10.10/32 route:

    11:26:41 zebra[7282]: dplane: incoming new work counter: 2
    11:26:41 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
    11:26:41 zebra[7282]: dplane provider 'Kernel': processing
    11:26:41 zebra[7282]: ID (195) Dplane nexthop update ctx 0x55c429b78ce0 op NH_INSTALL
    11:26:41 zebra[7282]: 0:10.10.10.10/32 Dplane route update ctx 0x55c429b7a040 op ROUTE_INSTALL
    11:26:41 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
    11:26:41 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
    11:26:41 zebra[7282]: dplane dequeues 2 completed work from provider dplane_fpm_nl
    11:26:41 zebra[7282]: dplane has 2 completed, 0 errors, for zebra main
    11:26:41 zebra[7282]: zebra2proto: Please add this protocol(2) to proper rt_netlink.c handling
    11:26:41 zebra[7282]: Nexthop dplane ctx 0x55c429b6fed0, op NH_INSTALL, nexthop ID (193), result SUCCESS
    11:26:41 zebra[7282]: default(0:254):5.5.5.5/32 Processing dplane result ctx 0x55c429b79690, op ROUTE_INSTALL result SUCCESS

We observe the same 2 enqueues and 2 dequeues as before, which again suggests
that there is outstanding work.

As expected, the 5.5.5.5/32 route is no longer marked as queued:

    O>* 5.5.5.5/32 [110/10] via 192.168.2.2, dp0p1s3, weight 1, 00:02:06

But the 10.10.10.10/32 route is, as we have not yet processed the completed
context:

    C>q 10.10.10.10/32 is directly connected, lo, 00:26:05

Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
2020-12-11 15:04:15 +00:00
Duncan Eastoe
7545bda0a4 dplane_fpm_nl: queue peak counter never increments
The context queue length peak counter is always set to its current
value, hence never increments.

Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
2020-12-11 12:09:56 +00:00
Rafael Zalamena
f584de526d fpm: reset/walk data structures on connection
Don't attempt to walk data structures while not connected so we can
save some CPU usage when FPM server is offline.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-12-03 07:30:23 -03:00
Rafael Zalamena
1f9193c1f0 fpm: simplify reset logic
Instead of checking for next group reset, always do it and skip sending
if next hop group support is disabled.

Also remove unused `*_complete` variables.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-12-03 07:30:23 -03:00
Rafael Zalamena
a3adec468e zebra,fpm: fix configuration display
Use `pI4` and `pI6` to format addresses and fix a bug when displaying
IPv6 addresses.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-12-03 07:30:23 -03:00
Donald Sharp
0fb4ab0388
Merge pull request #6950 from opensourcerouting/bfd-distributed-v3
bfdd: distributed BFD
2020-12-02 20:50:47 -05:00
Duncan Eastoe
f9bf1ecc38 zebra: dplane FPM LSP table walk
Add routines to walk the LSP table and generate FPM updates for all
entries. A walk of the LSP table is triggered when (re-)connecting
to an FPM.

Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
2020-11-30 12:13:43 +00:00
Duncan Eastoe
b300c8bbcf zebra: dplane FPM handle LSP install/update/delete
Export netlink_lsp_msg_encoder() and use it to encode and send netlink
messages concerning LSP updates to connected FPMs.

Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
2020-11-27 16:32:01 +00:00
Rafael Zalamena
91804f630c lib: add new stream function to reorganize buffer
The function was originally implemented for zebra data plane FPM plugin,
but another code places could use it.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-11-24 07:54:07 -03:00
Pat Ruddy
b299808662 zebra: extract evpn mac functions from zebra_vxlan.c
Move MAC dB specific functions to zebra_evpn_mac.c

Signed-off-by: Pat Ruddy <pat@voltanet.io>
2020-08-12 12:39:33 +01:00
Anuradha Karuppiah
f188e68e5c zebra: debug flags for MAC-IP sync
Filters for zebra debug logs.

Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
2020-08-05 06:46:13 -07:00
Rafael Zalamena
e41e0f8135 zebra,fpm: serialize zebra table walks
We were not getting any benefits from attempting to walk all tables at the
same time and it made debugging harder, so lets execute one table walk
per time.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-07-28 12:34:12 -03:00
Rafael Zalamena
55eb9d4d7d zebra,fpm: fix race on completion detection
Zebra runs on a different thread than FPM, so we need to synchronize
them by using events. While here, implement completion detection for all
kinds of walk.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-07-28 12:34:12 -03:00
Rafael Zalamena
e1afb97fdd zebra,fpm: fix input handling
Two important fixes:

* `stream_read_try` does a dirty trick and converts the `-1` return to
  `-2` when errno is `EAGAIN`, `EWOULDBLOCK` or `EINTR`.
* Don't enable reads until the connection is complete.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-07-28 12:34:12 -03:00
Rafael Zalamena
a203232464 zebra,fpm: fix dead lock on close during startup
Serialize the `fpm_reconnect` function by only allowing one part of our
code to call it, then make sure all zebra threads executions are done
before attempting to close and reset the output stream.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-07-20 09:58:14 -03:00
Jakub Urbańczyk
0be6e7d75d zebra: check for buffer boundary
* Move code encoding Netlink messages to separate functions
 * Add buffer bounds checking while creating Nelink messages

Signed-off-by: Jakub Urbańczyk <xthaid@gmail.com>
2020-06-13 22:56:25 +02:00
Rafael Zalamena
b55ab92abd fpm: add toggle to enable/disable next hop groups
If you haven't migrated your FPM server to use next hop groups, it is
possible that you want to disable this feature. This commit implements
a toggle to enable/disable next hop groups usage (even if your Linux
kernel is not using it).

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-05-05 16:22:07 -03:00
Rafael Zalamena
981ca5976f fpm: send all next hop groups on startup
Implement the next hop group send on startup if you are using
them. Normally you will only have them if you are already using this
Linux kernel feature.

NOTE: to make sure all next hop groups exist, we send/enqueue all next
hop groups first and then we send routes. The RIB route walk start is
at the end of the function `fpm_nhg_send()`.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-05-05 16:21:44 -03:00
Rafael Zalamena
e9a1cd931b fpm: add next hop group support
Add support for the new kernel messages: `RTM_NEWNEXTHOP` and
`RTM_DELNEXTHOP`.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-05-05 10:54:06 -03:00
Rafael Zalamena
c69e7ab7d9 fpm: don't check for NULL on async events
`thread_cancel_async` already handles the case of NULL events.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-05-05 08:48:59 -03:00
David Lamparter
7309092bf4 *: fix first header
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2020-04-27 09:52:41 +02:00
Quentin Young
e15361b322
Merge pull request #6253 from opensourcerouting/fpm-extra
zebra/fpm: fix shutdown and add more documentation
2020-04-21 11:28:05 -04:00
Rafael Zalamena
98a8750481 zebra: gracefully shutdown fpm module
Lets stop and free all resources before shutting down.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-04-17 20:18:58 -03:00
David Lamparter
893d8beb4d zebra: fix FPM node reusing VTY_NODE
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2020-04-16 12:54:03 +02:00
David Lamparter
612c2c15d8 *: remove second parameter on install_node()
There is really no reason to not put this in the cmd_node.

And while we're add it, rename from pointless ".func" to ".config_write".

[v2: fix forgotten ldpd config_write]

Signed-off-by: David Lamparter <equinox@diac24.net>
2020-04-16 12:53:00 +02:00
David Lamparter
249a771b63 *: remove cmd_node->vtysh
The only nodes that have this as 0 don't have a "->func" anyway, so the
entire thing is really just pointless.

Signed-off-by: David Lamparter <equinox@diac24.net>
2020-04-16 12:53:00 +02:00
Rafael Zalamena
9d5c32682f zebra: fix hash_backet typo in data plane FPM
Implement the fix made in `master` to the remain pieces of code in the
data plane FPM module.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-04-14 14:05:52 -03:00
Rafael Zalamena
e5e444d84a zebra: hide verbose data plane FPM log messages
To enable them just configure `debug zebra fpm`.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-04-14 14:05:52 -03:00
Rafael Zalamena
a50404aaae zebra: fix some formatting/style issues
* Break lines longer than 80 columns.
* Remove space after '('.
* Use '%pIX' instead of 'inet_ntop'.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-04-14 14:05:52 -03:00
Rafael Zalamena
f2a0ba3a50 zebra: data plane FPM add support RMAC VNI
Store VNI information in the data plane context so we can use it to
build the FPM netlink update with that information later.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-04-14 13:45:39 -03:00
Rafael Zalamena
770a8d284c zebra: fix style on data plane FPM module
*   Use 32bit atomic instead of 64bit.
*   Don't use semicolon at the end of macros.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-04-14 13:45:39 -03:00
Rafael Zalamena
c871e6c9d1 build: fix data plane FPM netlink module
Changes:

*   Let the package builder scripts know that we have a new module that
    needs to be taken care of.
*   Include the frr atomic header to avoid undeclared atomic operations.
*   Disable build on *BSDs because the code is using some zebra netlink
    functions only available for Linux.
*   Move data plane FPM module outside old FPM automake definition.
*   Fix atomic usage for Ubuntu 14.04 (always use explicit).

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-04-14 13:45:39 -03:00
Rafael Zalamena
edfeff4251 zebra: use atomic operations in FPM
FPM has a thread to encode and enqueue output buffer that might compete
with zebra RIB/RMAC walk on startup, so lets use atomic operations to
make sure we are not getting statistic/counters wrong.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-04-14 13:45:39 -03:00
Rafael Zalamena
ba803a2fbe zebra: queue data plane context for FPM
Enqueue all contexts inside FPM to avoid losing updates and to move all
processing to the FPM thread.

This helps in situations with huge amount of routes (e.g. BGP peer
flapping with a million routes).

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-04-14 13:45:39 -03:00
Rafael Zalamena
ad4d102259 zebra: improve FPM output buffer handling
Add counters to debug the output buffer usage and pull down its data
when the remote receiver is slow (so we get more space for writes).

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-04-14 13:45:39 -03:00
Rafael Zalamena
a179ba35a5 zebra: simplify FPM buffer full detection
Remove code duplication and document hardcoded values.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-04-14 13:45:39 -03:00
Rafael Zalamena
6cc059cdd6 zebra: implement FPM counters
Add commands to show and reset FPM counters.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-04-14 13:45:39 -03:00
Rafael Zalamena
3bdd7fcab9 zebra: CLI commands for new FPM interface
Add commands to enable/disable and configure FPM.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-04-14 13:45:39 -03:00
Rafael Zalamena
bda10adfa3 zebra: data plane FPM RMAC walk code
Implement the code that walks the RMAC to send routes that are already
inside installed in the OS.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-04-14 13:45:39 -03:00
Rafael Zalamena
018e77bcb5 zebra: data plane FPM RIB walk code
Implement the code that walks the RIB to send routes that are already
inside the RIB.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-04-14 11:44:39 -03:00
Rafael Zalamena
d35f447d67 zebra: data plane plugin for FPM netlink
Initial import of the new zebra data plane plugin for FPM netlink.

Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2020-04-14 11:44:39 -03:00