mirror_frr

mirror of https://git.proxmox.com/git/mirror_frr synced 2025-10-24 00:43:12 +00:00

Author	SHA1	Message	Date
Donald Sharp	3a15018892	zebra: Tell SA that we are intentionally ignoring the return When calling fpm_nl_enqueue we would normally check its return value, which indicates whether the data fit in the outgoing stream. That check is unnecessary here because the enclosing while loop has already ensured that the data being written will fit. CID -> 1499854 Signed-off-by: Donald Sharp <sharpd@nvidia.com>	2021-01-18 09:06:49 -05:00
Duncan Eastoe	b677907c99	zebra: fpm_nl_process() reschedule dp thread fpm_nl_process() now ensures that the dataplane thread is rescheduled if it hits the work limit while processing its incoming work queue. This would probably already occur due to some other event, such as fpm_process_queue() enqueuing completed work to the output queue; however, it does no harm to add this explicit reschedule. Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>	2020-12-22 21:14:03 +00:00
Mark Stapp	700ff41ed3	Merge pull request #7472 from opensourcerouting/fpm-fixes fpm: frr-reload, IPv6 and an improvement	2020-12-22 11:37:58 -05:00
Duncan Eastoe	438dd3e7df	zebra: reduce atomic ops in fpm_process_queue() Maintain the count of contexts which have been processed in a local variable, and perform a single atomic update after we have consumed all queued contexts. Generally this results in at least one fewer atomic operation per context. Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>	2020-12-18 15:37:13 +00:00
Duncan Eastoe	3f2b998f61	zebra: local var in fpm_process_queue() sched cond Don't use an atomic operation to determine whether fpm_process_queue() needs to be re-scheduled. Instead we can simply use a local variable to determine if we stopped processing because we ran out of buffers. In the case where we would have re-scheduled due to new context objects in the queue (enqueued after we stopped processing), fpm_nl_process() will schedule us (or will have done already). Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>	2020-12-18 15:36:39 +00:00
Duncan Eastoe	bf2f783945	zebra: reduce atomic ops in fpm_nl_process() Maintain the peak ctxqueue length in a local variable, and perform a single atomic update after processing all contexts. Generally this results in at least one fewer atomic operation per context. Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>	2020-12-18 15:36:38 +00:00
Duncan Eastoe	dc693fe057	zebra: reduce dplane_fpm_nl ctxqueue_mutex contention Reduce code in the critical sections of fpm_nl_process() and fpm_process_queue() to the bare minimum - basically only enqueue and dequeue operations on the shared ctxqueue. Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>	2020-12-18 15:33:46 +00:00
Duncan Eastoe	164d8e8608	zebra: routes stuck with 'q' when using dplane FPM New work enqueued to the dplane_fpm_nl provider is initially de-queued and re-enqueued, in fpm_nl_process(), to be processed by the provider's own thread. After performing this initial de-queue/enqueue we return to dplane_thread_loop() and check the dplane_fpm_nl output queue for any work which has been completed. Since this work is being processed in another thread it is very likely that there will be some (or all) work still outstanding at this point. The dataplane thread finishes up any other tasks and then waits until it is next scheduled. In the meantime the dplane_fpm_nl thread is processing its work queue until completion. The issue arises here as the dataplane thread is not explicitly re-scheduled once dplane_fpm_nl has drained its work queue and populated its output queue with completed work. This completed work can sit in the output queue for an indeterminate period of time, depending upon when the dataplane thread is next scheduled for other work. If the RIB has reached a stable state then this could be a significant period of time. During this period zebra marks these routes as queued, even though they have actually been processed by all dataplane providers. An unrelated RIB change which triggers a FIB update will result in the dataplane thread being scheduled and this completed work then being processed. At this point the routes will no longer be marked as queued by zebra. However this new FIB update might itself then fall victim to the same scenario! We can observe the above behaviour in these detailed dplane logs:
    11:24:47 zebra[7282]: dplane: incoming new work counter: 2
    11:24:47 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
    11:24:47 zebra[7282]: dplane provider 'Kernel': processing
    11:24:47 zebra[7282]: Dplane NEIGH_DISCOVER, ip 192.168.2.2, ifindex 9
    11:24:47 zebra[7282]: Dplane NEIGH_DISCOVER, ip 192.168.2.2, ifindex 9
    11:24:47 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
    11:24:47 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
    11:24:47 zebra[7282]: dplane dequeues 1 completed work from provider dplane_fpm_nl
    11:24:47 zebra[7282]: dplane has 1 completed, 0 errors, for zebra main
2 contexts (all incoming work) have been queued to dplane_fpm_nl - all good. 1 completed context was de-queued, so there is outstanding work.
    11:24:58 zebra[7282]: dplane: incoming new work counter: 2
    11:24:58 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
    11:24:58 zebra[7282]: dplane provider 'Kernel': processing
    11:24:58 zebra[7282]: ID (193) Dplane nexthop update ctx 0x55c429b6fed0 op NH_INSTALL
    11:24:58 zebra[7282]: 0:5.5.5.5/32 Dplane route update ctx 0x55c429b79690 op ROUTE_INSTALL
    11:24:58 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
    11:24:58 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
    11:24:58 zebra[7282]: dplane dequeues 2 completed work from provider dplane_fpm_nl
    11:24:58 zebra[7282]: dplane has 2 completed, 0 errors, for zebra main
A further 2 contexts (all incoming work) have been queued to dplane_fpm_nl - all good. 2 completed contexts were de-queued, which sounds good as that is what we en-queued. However, there is an outstanding context from earlier, so there is still outstanding work. Indeed the new 5.5.5.5/32 route is marked as queued:
    O>q 5.5.5.5/32 [110/10] via 192.168.2.2, dp0p1s3, weight 1, 00:01:19
This remains the case until we trigger a FIB update by installing, e.g., the 10.10.10.10/32 route:
    11:26:41 zebra[7282]: dplane: incoming new work counter: 2
    11:26:41 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
    11:26:41 zebra[7282]: dplane provider 'Kernel': processing
    11:26:41 zebra[7282]: ID (195) Dplane nexthop update ctx 0x55c429b78ce0 op NH_INSTALL
    11:26:41 zebra[7282]: 0:10.10.10.10/32 Dplane route update ctx 0x55c429b7a040 op ROUTE_INSTALL
    11:26:41 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
    11:26:41 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
    11:26:41 zebra[7282]: dplane dequeues 2 completed work from provider dplane_fpm_nl
    11:26:41 zebra[7282]: dplane has 2 completed, 0 errors, for zebra main
    11:26:41 zebra[7282]: zebra2proto: Please add this protocol(2) to proper rt_netlink.c handling
    11:26:41 zebra[7282]: Nexthop dplane ctx 0x55c429b6fed0, op NH_INSTALL, nexthop ID (193), result SUCCESS
    11:26:41 zebra[7282]: default(0:254):5.5.5.5/32 Processing dplane result ctx 0x55c429b79690, op ROUTE_INSTALL result SUCCESS
We observe the same 2 enqueues and 2 dequeues as before, which again suggests that there is outstanding work. As expected, the 5.5.5.5/32 route is no longer marked as queued:
    O>* 5.5.5.5/32 [110/10] via 192.168.2.2, dp0p1s3, weight 1, 00:02:06
But the 10.10.10.10/32 route still is, because we have not yet processed its completed context:
    C>q 10.10.10.10/32 is directly connected, lo, 00:26:05
Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>	2020-12-11 15:04:15 +00:00
Duncan Eastoe	7545bda0a4	dplane_fpm_nl: queue peak counter never increments The context queue length peak counter is always set to its current value, hence never increments. Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>	2020-12-11 12:09:56 +00:00
Rafael Zalamena	f584de526d	fpm: reset/walk data structures on connection Don't attempt to walk data structures while not connected, to save some CPU usage when the FPM server is offline. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-12-03 07:30:23 -03:00
Rafael Zalamena	1f9193c1f0	fpm: simplify reset logic Instead of checking for a next hop group reset, always do it and skip sending if next hop group support is disabled. Also remove unused `*_complete` variables. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-12-03 07:30:23 -03:00
Rafael Zalamena	a3adec468e	zebra,fpm: fix configuration display Use `pI4` and `pI6` to format addresses and fix a bug when displaying IPv6 addresses. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-12-03 07:30:23 -03:00
Donald Sharp	0fb4ab0388	Merge pull request #6950 from opensourcerouting/bfd-distributed-v3 bfdd: distributed BFD	2020-12-02 20:50:47 -05:00
Duncan Eastoe	f9bf1ecc38	zebra: dplane FPM LSP table walk Add routines to walk the LSP table and generate FPM updates for all entries. A walk of the LSP table is triggered when (re-)connecting to an FPM. Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>	2020-11-30 12:13:43 +00:00
Duncan Eastoe	b300c8bbcf	zebra: dplane FPM handle LSP install/update/delete Export netlink_lsp_msg_encoder() and use it to encode and send netlink messages concerning LSP updates to connected FPMs. Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>	2020-11-27 16:32:01 +00:00
Rafael Zalamena	91804f630c	lib: add new stream function to reorganize buffer The function was originally implemented for the zebra data plane FPM plugin, but other parts of the code could use it. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-11-24 07:54:07 -03:00
Pat Ruddy	b299808662	zebra: extract evpn mac functions from zebra_vxlan.c Move MAC DB-specific functions to zebra_evpn_mac.c Signed-off-by: Pat Ruddy <pat@voltanet.io>	2020-08-12 12:39:33 +01:00
Anuradha Karuppiah	f188e68e5c	zebra: debug flags for MAC-IP sync Filters for zebra debug logs. Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>	2020-08-05 06:46:13 -07:00
Rafael Zalamena	e41e0f8135	zebra,fpm: serialize zebra table walks We were not getting any benefits from attempting to walk all tables at the same time and it made debugging harder, so let's execute one table walk at a time. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-07-28 12:34:12 -03:00
Rafael Zalamena	55eb9d4d7d	zebra,fpm: fix race on completion detection Zebra runs on a different thread than FPM, so we need to synchronize them by using events. While here, implement completion detection for all kinds of walk. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-07-28 12:34:12 -03:00
Rafael Zalamena	e1afb97fdd	zebra,fpm: fix input handling Two important fixes: * `stream_read_try` does a dirty trick and converts the `-1` return to `-2` when errno is `EAGAIN`, `EWOULDBLOCK` or `EINTR`. * Don't enable reads until the connection is complete. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-07-28 12:34:12 -03:00
Rafael Zalamena	a203232464	zebra,fpm: fix dead lock on close during startup Serialize the `fpm_reconnect` function by only allowing one part of our code to call it, then make sure all zebra thread executions are done before attempting to close and reset the output stream. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-07-20 09:58:14 -03:00
Jakub Urbańczyk	0be6e7d75d	zebra: check for buffer boundary * Move code encoding Netlink messages to separate functions * Add buffer bounds checking while creating Netlink messages Signed-off-by: Jakub Urbańczyk <xthaid@gmail.com>	2020-06-13 22:56:25 +02:00
Rafael Zalamena	b55ab92abd	fpm: add toggle to enable/disable next hop groups If you haven't migrated your FPM server to use next hop groups, it is possible that you want to disable this feature. This commit implements a toggle to enable/disable next hop groups usage (even if your Linux kernel is not using it). Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-05-05 16:22:07 -03:00
Rafael Zalamena	981ca5976f	fpm: send all next hop groups on startup Implement the next hop group send on startup if you are using them. Normally you will only have them if you are already using this Linux kernel feature. NOTE: to make sure all next hop groups exist, we send/enqueue all next hop groups first and then we send routes. The RIB route walk start is at the end of the function `fpm_nhg_send()`. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-05-05 16:21:44 -03:00
Rafael Zalamena	e9a1cd931b	fpm: add next hop group support Add support for the new kernel messages: `RTM_NEWNEXTHOP` and `RTM_DELNEXTHOP`. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-05-05 10:54:06 -03:00
Rafael Zalamena	c69e7ab7d9	fpm: don't check for NULL on async events `thread_cancel_async` already handles the case of NULL events. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-05-05 08:48:59 -03:00
David Lamparter	7309092bf4	*: fix first header Signed-off-by: David Lamparter <equinox@opensourcerouting.org>	2020-04-27 09:52:41 +02:00
Quentin Young	e15361b322	Merge pull request #6253 from opensourcerouting/fpm-extra zebra/fpm: fix shutdown and add more documentation	2020-04-21 11:28:05 -04:00
Rafael Zalamena	98a8750481	zebra: gracefully shutdown fpm module Let's stop and free all resources before shutting down. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-04-17 20:18:58 -03:00
David Lamparter	893d8beb4d	zebra: fix FPM node reusing VTY_NODE Signed-off-by: David Lamparter <equinox@opensourcerouting.org>	2020-04-16 12:54:03 +02:00
David Lamparter	612c2c15d8	*: remove second parameter on install_node() There is really no reason to not put this in the cmd_node. And while we're at it, rename from pointless ".func" to ".config_write". [v2: fix forgotten ldpd config_write] Signed-off-by: David Lamparter <equinox@diac24.net>	2020-04-16 12:53:00 +02:00
David Lamparter	249a771b63	*: remove cmd_node->vtysh The only nodes that have this as 0 don't have a "->func" anyway, so the entire thing is really just pointless. Signed-off-by: David Lamparter <equinox@diac24.net>	2020-04-16 12:53:00 +02:00
Rafael Zalamena	9d5c32682f	zebra: fix hash_backet typo in data plane FPM Apply the fix made in `master` to the remaining pieces of code in the data plane FPM module. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-04-14 14:05:52 -03:00
Rafael Zalamena	e5e444d84a	zebra: hide verbose data plane FPM log messages To enable them just configure `debug zebra fpm`. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-04-14 14:05:52 -03:00
Rafael Zalamena	a50404aaae	zebra: fix some formatting/style issues * Break lines longer than 80 columns. * Remove space after '('. * Use '%pIX' instead of 'inet_ntop'. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-04-14 14:05:52 -03:00
Rafael Zalamena	f2a0ba3a50	zebra: data plane FPM add support RMAC VNI Store VNI information in the data plane context so we can use it to build the FPM netlink update with that information later. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-04-14 13:45:39 -03:00
Rafael Zalamena	770a8d284c	zebra: fix style on data plane FPM module * Use 32bit atomic instead of 64bit. * Don't use semicolon at the end of macros. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-04-14 13:45:39 -03:00
Rafael Zalamena	c871e6c9d1	build: fix data plane FPM netlink module Changes: * Let the package builder scripts know that we have a new module that needs to be taken care of. * Include the frr atomic header to avoid undeclared atomic operations. * Disable build on BSDs because the code is using some zebra netlink functions only available for Linux. Move data plane FPM module outside old FPM automake definition. * Fix atomic usage for Ubuntu 14.04 (always use explicit). Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-04-14 13:45:39 -03:00
Rafael Zalamena	edfeff4251	zebra: use atomic operations in FPM FPM has a thread to encode and enqueue output buffer that might compete with zebra RIB/RMAC walk on startup, so let's use atomic operations to make sure we don't get statistics/counters wrong. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-04-14 13:45:39 -03:00
Rafael Zalamena	ba803a2fbe	zebra: queue data plane context for FPM Enqueue all contexts inside FPM to avoid losing updates and to move all processing to the FPM thread. This helps in situations with huge numbers of routes (e.g. BGP peer flapping with a million routes). Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-04-14 13:45:39 -03:00
Rafael Zalamena	ad4d102259	zebra: improve FPM output buffer handling Add counters to debug the output buffer usage and pull down its data when the remote receiver is slow (so we get more space for writes). Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-04-14 13:45:39 -03:00
Rafael Zalamena	a179ba35a5	zebra: simplify FPM buffer full detection Remove code duplication and document hardcoded values. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-04-14 13:45:39 -03:00
Rafael Zalamena	6cc059cdd6	zebra: implement FPM counters Add commands to show and reset FPM counters. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-04-14 13:45:39 -03:00
Rafael Zalamena	3bdd7fcab9	zebra: CLI commands for new FPM interface Add commands to enable/disable and configure FPM. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-04-14 13:45:39 -03:00
Rafael Zalamena	bda10adfa3	zebra: data plane FPM RMAC walk code Implement the code that walks the RMAC to send routes that are already installed in the OS. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-04-14 13:45:39 -03:00
Rafael Zalamena	018e77bcb5	zebra: data plane FPM RIB walk code Implement the code that walks the RIB to send routes that are already inside the RIB. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-04-14 11:44:39 -03:00
Rafael Zalamena	d35f447d67	zebra: data plane plugin for FPM netlink Initial import of the new zebra data plane plugin for FPM netlink. Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>	2020-04-14 11:44:39 -03:00

48 Commits
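Commits 438dd3e7df and bf2f783945 above describe keeping counts in local variables while draining a batch and then publishing each shared counter with a single atomic update; 7545bda0a4 fixes a peak counter that was always overwritten with the current value. The C11 sketch below illustrates both ideas under stated assumptions: `struct fpm_counters`, `process_batch()` and `store_max_u32()` are hypothetical names, not the dplane_fpm_nl code itself.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical shared statistics, updated from more than one thread. */
struct fpm_counters {
	_Atomic uint32_t dplane_contexts;   /* contexts processed so far */
	_Atomic uint32_t ctxqueue_len_peak; /* largest queue length seen */
};

/* Atomically raise *peak to candidate, but only if candidate is larger. */
static void store_max_u32(_Atomic uint32_t *peak, uint32_t candidate)
{
	uint32_t cur = atomic_load_explicit(peak, memory_order_relaxed);

	/* A failed CAS reloads cur, so the loop re-checks before retrying. */
	while (candidate > cur
	       && !atomic_compare_exchange_weak_explicit(peak, &cur, candidate,
							 memory_order_relaxed,
							 memory_order_relaxed))
		;
}

/*
 * Drain one batch (hypothetical): queue_lens[i] is the queue length
 * observed when context i was handled. Counting happens in locals and
 * the shared counters are touched once per batch, not once per context.
 */
static void process_batch(struct fpm_counters *c, const uint32_t *queue_lens,
			  size_t n)
{
	uint32_t processed = 0;
	uint32_t peak = 0;

	for (size_t i = 0; i < n; i++) {
		/* ... encode and enqueue context i here ... */
		processed++;
		if (queue_lens[i] > peak)
			peak = queue_lens[i];
	}

	atomic_fetch_add_explicit(&c->dplane_contexts, processed,
				  memory_order_relaxed);
	store_max_u32(&c->ctxqueue_len_peak, peak);
}
```

For example, a batch with queue lengths 3, 7 and 5 adds 3 to `dplane_contexts` and raises `ctxqueue_len_peak` to 7 unless it was already higher. Storing the batch peak with a plain store instead of the compare-and-swap loop would reproduce the kind of bug 7545bda0a4 describes: the counter would track the latest value rather than the maximum.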
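Commits 91804f630c and ad4d102259 mention reorganizing ("pulling down") the output buffer when the remote receiver is slow, so that more space becomes available for writes. Assuming a buffer with a read cursor for the next byte to send and a write cursor for the next free byte, that amounts to moving the unsent bytes to the front. The struct and function names below are hypothetical; this is a sketch of the idea, not the lib stream function those commits add.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size output buffer with read and write cursors. */
struct obuf {
	unsigned char data[64];
	size_t getp; /* next byte to send */
	size_t endp; /* next free byte */
};

/* Bytes that can still be appended after endp. */
static size_t obuf_writable(const struct obuf *b)
{
	return sizeof(b->data) - b->endp;
}

/* Move unsent bytes [getp, endp) to offset 0, reclaiming the sent prefix. */
static void obuf_pulldown(struct obuf *b)
{
	size_t pending = b->endp - b->getp;

	if (b->getp == 0)
		return; /* nothing has been sent yet, nothing to reclaim */
	memmove(b->data, b->data + b->getp, pending); /* ranges may overlap */
	b->getp = 0;
	b->endp = pending;
}
```

With a 64-byte buffer holding 40 bytes of which 30 have been sent, compaction leaves the 10 pending bytes at offset 0 and grows the writable space from 24 to 54 bytes. `memmove` is used rather than `memcpy` because the source and destination ranges can overlap.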