mirror_frr/zebra
Duncan Eastoe 164d8e8608 zebra: routes stuck with 'q' when using dplane FPM
New work enqueued to the dplane_fpm_nl provider is initially de-queued
and re-enqueued, in fpm_nl_process(), to be processed by the provider's
own thread.

After performing this initial de-queue/enqueue we return to
dplane_thread_loop() and check the dplane_fpm_nl output queue for any
work which has been completed.

Since this work is being processed in another thread it is very likely
that there will be some (or all) work still outstanding at this point.
The dataplane thread finishes up any other tasks and then waits until
it is next scheduled. In the meantime the dplane_fpm_nl thread is
processing its work queue until completion.

The issue arises here as the dataplane thread is not explicitly
re-scheduled once dplane_fpm_nl has drained its work queue and
populated its output queue with completed work.

This completed work can sit in the output queue for an indeterminate
period of time, depending upon when the dataplane thread is next
scheduled for other work. If the RIB has reached a stable state then
this could be a significant period of time. During this period zebra
marks these routes as queued, even though they have actually been
processed by all dataplane providers.

An unrelated RIB change which triggers a FIB update will result in
the dataplane thread being scheduled and this completed work then
being processed. At this point the routes will then no longer be
marked as queued by zebra. However this new FIB update might itself
then fall victim to the same scenario!
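
As a rough illustration of the sequence described above, the following is a
toy, single-threaded C model of the two queues. It is not FRR code: every
name in it (toy_provider, toy_dplane_pass, toy_provider_run, toy_queued) is
hypothetical, and the `notify` flag merely stands in for some way of waking
the dataplane thread, which this message does not describe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only; all names are hypothetical and not part of FRR. */
struct toy_provider {
	size_t in_q;  /* contexts waiting for the provider's own thread */
	size_t out_q; /* completed contexts waiting for the dataplane thread */
};

struct toy_dplane {
	struct toy_provider prov;
	bool scheduled; /* dataplane thread has a pending wake-up */
	size_t results; /* contexts handed back to zebra main so far */
};

/* One pass of the dataplane loop: hand over new work, then collect
 * whatever the provider has already completed. */
void toy_dplane_pass(struct toy_dplane *dp, size_t new_work)
{
	dp->scheduled = false;
	dp->prov.in_q += new_work;
	dp->results += dp->prov.out_q;
	dp->prov.out_q = 0;
}

/* The provider's own thread drains its work queue into its output queue.
 * With notify == false nothing wakes the dataplane thread afterwards. */
void toy_provider_run(struct toy_dplane *dp, bool notify)
{
	dp->prov.out_q += dp->prov.in_q;
	dp->prov.in_q = 0;
	if (notify && dp->prov.out_q > 0)
		dp->scheduled = true;
}

/* Contexts that would still be shown as queued ('q'). */
size_t toy_queued(const struct toy_dplane *dp, size_t submitted)
{
	assert(submitted >= dp->results);
	return submitted - dp->results;
}
```

Calling toy_dplane_pass(&dp, 2) followed by toy_provider_run(&dp, false)
leaves both contexts in out_q with scheduled == false, so toy_queued(&dp, 2)
stays at 2 until some later, unrelated call to toy_dplane_pass() collects
them.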

We can observe the above behaviour in these detailed dplane logs.

    11:24:47 zebra[7282]: dplane: incoming new work counter: 2
    11:24:47 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
    11:24:47 zebra[7282]: dplane provider 'Kernel': processing
    11:24:47 zebra[7282]: Dplane NEIGH_DISCOVER, ip 192.168.2.2, ifindex 9
    11:24:47 zebra[7282]: Dplane NEIGH_DISCOVER, ip 192.168.2.2, ifindex 9
    11:24:47 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
    11:24:47 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
    11:24:47 zebra[7282]: dplane dequeues 1 completed work from provider dplane_fpm_nl
    11:24:47 zebra[7282]: dplane has 1 completed, 0 errors, for zebra main

2 contexts (all incoming work) have been queued to dplane_fpm_nl - all good.
1 completed context was de-queued, so there is outstanding work.

    11:24:58 zebra[7282]: dplane: incoming new work counter: 2
    11:24:58 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
    11:24:58 zebra[7282]: dplane provider 'Kernel': processing
    11:24:58 zebra[7282]: ID (193) Dplane nexthop update ctx 0x55c429b6fed0 op NH_INSTALL
    11:24:58 zebra[7282]: 0:5.5.5.5/32 Dplane route update ctx 0x55c429b79690 op ROUTE_INSTALL
    11:24:58 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
    11:24:58 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
    11:24:58 zebra[7282]: dplane dequeues 2 completed work from provider dplane_fpm_nl
    11:24:58 zebra[7282]: dplane has 2 completed, 0 errors, for zebra main

A further 2 contexts (all incoming work) have been queued to dplane_fpm_nl - all good.
2 completed contexts were de-queued, which looks right as that matches what we enqueued.
However, one context was already outstanding from earlier, so there is still
outstanding work.

Indeed the new 5.5.5.5/32 route is marked as queued:

    O>q 5.5.5.5/32 [110/10] via 192.168.2.2, dp0p1s3, weight 1, 00:01:19

This remains the case until we trigger a FIB update, e.g. by installing the
10.10.10.10/32 route:

    11:26:41 zebra[7282]: dplane: incoming new work counter: 2
    11:26:41 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
    11:26:41 zebra[7282]: dplane provider 'Kernel': processing
    11:26:41 zebra[7282]: ID (195) Dplane nexthop update ctx 0x55c429b78ce0 op NH_INSTALL
    11:26:41 zebra[7282]: 0:10.10.10.10/32 Dplane route update ctx 0x55c429b7a040 op ROUTE_INSTALL
    11:26:41 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
    11:26:41 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
    11:26:41 zebra[7282]: dplane dequeues 2 completed work from provider dplane_fpm_nl
    11:26:41 zebra[7282]: dplane has 2 completed, 0 errors, for zebra main
    11:26:41 zebra[7282]: zebra2proto: Please add this protocol(2) to proper rt_netlink.c handling
    11:26:41 zebra[7282]: Nexthop dplane ctx 0x55c429b6fed0, op NH_INSTALL, nexthop ID (193), result SUCCESS
    11:26:41 zebra[7282]: default(0:254):5.5.5.5/32 Processing dplane result ctx 0x55c429b79690, op ROUTE_INSTALL result SUCCESS

We observe the same 2 enqueues and 2 dequeues as before, which again suggests
that there is outstanding work.

As expected, the 5.5.5.5/32 route is no longer marked as queued:

    O>* 5.5.5.5/32 [110/10] via 192.168.2.2, dp0p1s3, weight 1, 00:02:06

But the 10.10.10.10/32 route is, as we have not yet processed the completed
context:

    C>q 10.10.10.10/32 is directly connected, lo, 00:26:05
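
The enqueue/dequeue counts quoted in the three log excerpts can be tallied
into a running balance. The sketch below is only an illustration: the struct
and function names are hypothetical, and it assumes the three excerpts are
the only dataplane passes involving dplane_fpm_nl in this period, which the
logs shown here do not confirm.

```c
#include <assert.h>
#include <stddef.h>

/* Counts taken from one dataplane pass in the logs (hypothetical type). */
struct pass_counts {
	size_t enqueued; /* "enqueues N new work to provider 'dplane_fpm_nl'" */
	size_t dequeued; /* "dequeues N completed work from provider dplane_fpm_nl" */
};

/* Contexts still held by the provider after the first n passes. */
size_t outstanding_after(const struct pass_counts *passes, size_t n)
{
	size_t outstanding = 0;

	for (size_t i = 0; i < n; i++) {
		outstanding += passes[i].enqueued;
		/* More cannot be dequeued than is currently held. */
		assert(outstanding >= passes[i].dequeued);
		outstanding -= passes[i].dequeued;
	}
	return outstanding;
}
```

With the values from the logs, {2, 1} at 11:24:47, {2, 2} at 11:24:58 and
{2, 2} at 11:26:41, the balance is 1 after every pass, which matches the
observation that one context is always left behind.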

Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
2020-12-11 15:04:15 +00:00
.gitignore *: cleanup .gitignore files 2018-09-08 21:30:42 +02:00
connected.c zebra: Add --asic-offload command 2020-11-15 10:19:25 -05:00
connected.h zebra: Do not use connection dest for bcast 2019-08-18 18:54:46 +02:00
debug.c *: move "show debugging ..." commands to enable node 2020-10-02 15:06:05 +03:00
debug.h zebra: debug flags for MAC-IP sync 2020-08-05 06:46:13 -07:00
dplane_fpm_nl.c zebra: routes stuck with 'q' when using dplane FPM 2020-12-11 15:04:15 +00:00
if_ioctl.c Revert "Ospf missing interface handling 2" 2019-06-23 19:46:39 -04:00
if_netlink.c zebra: uplink tracking and startup delay for EVPN-MH 2020-10-27 09:34:09 -07:00
if_netlink.h zebra: netlink message batching 2020-08-10 21:42:43 +02:00
if_sysctl.c *: Remove solaris from FRR 2020-09-21 10:02:20 -04:00
interface.c zebra: clean up zebra_protodown_rc_str() 2020-10-29 12:03:25 -04:00
interface.h zebra: clean up zebra_protodown_rc_str() 2020-10-29 12:03:25 -04:00
ioctl.c zebra: ifi_link_state is the link state 2020-10-13 11:32:36 +01:00
ioctl.h *: Remove solaris from FRR 2020-09-21 10:02:20 -04:00
ipforward_proc.c zebra: Cleanup set but unused variables 2020-02-27 09:41:58 -05:00
ipforward_sysctl.c *: Remove solaris from FRR 2020-09-21 10:02:20 -04:00
ipforward.h add cplusplus guards to all zebra headers 2019-03-25 16:05:27 +01:00
irdp_interface.c *: un-split strings across lines 2020-07-14 10:37:25 +02:00
irdp_main.c * : update signature of thread_cancel api 2020-10-23 08:59:34 -04:00
irdp_packet.c zebra: replace inet_ntoa 2020-10-22 13:37:25 -04:00
irdp.h *: strip trailing whitespace 2019-09-30 16:44:43 +00:00
kernel_netlink.c zebra: dplane APIs for programming evpn-mh access port attributes 2020-10-26 10:32:51 -07:00
kernel_netlink.h zebra: remove fuzzing stuff 2020-08-25 17:31:07 +02:00
kernel_socket.c zebra: Consolidate on 32 bits as the flag size for route flags 2020-10-29 09:13:59 -04:00
kernel_socket.h add cplusplus guards to all zebra headers 2019-03-25 16:05:27 +01:00
label_manager.c zebra: unexpose label-manager util-funcs as static 2020-12-10 09:56:55 +09:00
label_manager.h zebra: unexpose label-manager util-funcs as static 2020-12-10 09:56:55 +09:00
main.c zebra: Add --asic-offload command 2020-11-15 10:19:25 -05:00
Makefile
redistribute.c :* Convert prefix2str to %pFX 2020-10-22 09:07:41 +03:00
redistribute.h zebra: revise redistribution delete to improve update case 2019-09-12 08:51:05 -04:00
rib.h zebra: Gather opaque data into the route entry for storage 2020-12-08 09:06:08 -05:00
router-id.c *: Correct spelling stuff 2020-10-29 16:16:00 -04:00
router-id.h zebra: add IPv6 router-id 2020-07-17 17:39:05 +02:00
rt_netlink.c zebra: support for slow-failover of local MACs on an ES 2020-12-01 09:46:26 -08:00
rt_netlink.h zebra: dplane FPM handle LSP install/update/delete 2020-11-27 16:32:01 +00:00
rt_socket.c :* Convert prefix2str to %pFX 2020-10-22 09:07:41 +03:00
rt.h zebra: remove old kernel one-update-at-a-time api 2020-08-10 21:57:04 +02:00
rtadv.c :* Convert prefix2str to %pFX 2020-10-22 09:07:41 +03:00
rtadv.h zebra: Remove enum around ipv6_nd_suppress_ra_status 2020-05-08 08:08:04 -04:00
rtread_netlink.c zebra: evpn remote delete fetch local entry 2018-12-31 14:40:31 -08:00
rtread_sysctl.c *: Remove solaris from FRR 2020-09-21 10:02:20 -04:00
rule_netlink.c :* Convert prefix2str to %pFX 2020-10-22 09:07:41 +03:00
rule_netlink.h zebra: netlink message batching 2020-08-10 21:42:43 +02:00
rule_socket.c zebra: convert ip rule installation to use dplane thread 2020-06-10 16:18:45 +02:00
sample_plugin.c zebra: Add a sample dataplane plugin module 2019-10-31 16:24:16 -04:00
subdir.am vtysh: dynamically generate the list of daemons for commands 2020-10-02 15:06:27 +03:00
table_manager.c *: Remove solaris from FRR 2020-09-21 10:02:20 -04:00
table_manager.h add cplusplus guards to all zebra headers 2019-03-25 16:05:27 +01:00
testrib.conf
zapi_msg.c Merge pull request #7678 from donaldsharp/aspath_to_zebra 2020-12-10 10:38:14 -05:00
zapi_msg.h zebra: Adding zapi client close notification 2020-12-07 18:22:36 -05:00
zebra_dplane.c zebra: dplane API to get provider output q length 2020-12-11 15:04:11 +00:00
zebra_dplane.h zebra: dplane API to get provider output q length 2020-12-11 15:04:11 +00:00
zebra_errors.c zebra: remove unused EC_ZEBRA_IF_LOOKUP_FAILED 2020-12-01 13:05:36 -05:00
zebra_errors.h zebra: remove unused EC_ZEBRA_IF_LOOKUP_FAILED 2020-12-01 13:05:36 -05:00
zebra_evpn_mac.c zebra: debug logs to detect incorrect mac deletions 2020-12-01 09:46:28 -08:00
zebra_evpn_mac.h zebra: support for slow-failover of local MACs on an ES 2020-12-01 09:46:26 -08:00
zebra_evpn_mh.c zebra: allocate one nexthop id per-VTEP instead of one per-ES-VTEP 2020-12-01 09:46:28 -08:00
zebra_evpn_mh.h zebra: change the L2 NHG id format to co-exist with the L3NHG ids 2020-12-01 09:46:28 -08:00
zebra_evpn_neigh.c zebra: Keep DAD disabled if EVPN MH is turned on 2020-11-24 10:20:32 -08:00
zebra_evpn_neigh.h zebra: Add uptime to show evpn arp-cache vni .. detail 2020-10-26 16:47:07 -04:00
zebra_evpn_vxlan.h zebra: extract core EVPN functions from zebra_vxlan.c 2020-08-12 12:39:34 +01:00
zebra_evpn.c zebra: Reduce warn -> debug 2020-11-30 19:37:53 -05:00
zebra_evpn.h zebra: support for macvlan interfaces 2020-09-11 18:26:23 +02:00
zebra_fpm_dt.c zebra: replace inet_ntoa 2020-10-22 13:37:25 -04:00
zebra_fpm_netlink.c zebra: replace inet_ntoa 2020-10-22 13:37:25 -04:00
zebra_fpm_private.h Zebra: Build nelink message for RMAC updates 2019-06-17 12:05:38 -07:00
zebra_fpm_protobuf.c *: Do not cast to the same type 2020-04-08 17:15:06 +03:00
zebra_fpm.c *: unify thread/event cancel macros 2020-10-23 12:16:52 -04:00
zebra_gr.c :* Convert prefix2str to %pFX 2020-10-22 09:07:41 +03:00
zebra_l2.c zebra: uplink tracking and startup delay for EVPN-MH 2020-10-27 09:34:09 -07:00
zebra_l2.h zebra: uplink tracking and startup delay for EVPN-MH 2020-10-27 09:34:09 -07:00
zebra_memory.c zebra: Gather opaque data into the route entry for storage 2020-12-08 09:06:08 -05:00
zebra_memory.h zebra: Gather opaque data into the route entry for storage 2020-12-08 09:06:08 -05:00
zebra_mlag_private.c zebra: Isolate mlag_rd_buf_offset to the actual using function 2020-10-13 16:02:05 -04:00
zebra_mlag_vty.c zebra: Do not build mlag protobuf support if version 3 is not avail 2019-12-15 09:37:51 -05:00
zebra_mlag_vty.h lib, zebra: add missing extern "C" {} blocks to new header files 2020-04-22 23:49:22 -03:00
zebra_mlag.c zebra: Isolate mlag_rd_buf_offset to the actual using function 2020-10-13 16:02:05 -04:00
zebra_mlag.h zebra: Isolate mlag_rd_buf_offset to the actual using function 2020-10-13 16:02:05 -04:00
zebra_mpls_netlink.c zebra: dplane FPM handle LSP install/update/delete 2020-11-27 16:32:01 +00:00
zebra_mpls_null.c zebra: convert PW updates to async dataplane 2019-01-25 10:45:57 -05:00
zebra_mpls_openbsd.c *: un-split strings across lines 2020-07-14 10:37:25 +02:00
zebra_mpls_vty.c *: move CLI node names to cmd_node->name 2020-04-16 12:53:59 +02:00
zebra_mpls.c zebra: Fix prefix2str buf and some invalid data output in zebra_mpls.c 2020-10-26 09:38:33 -04:00
zebra_mpls.h zebra: dplane FPM LSP table walk 2020-11-30 12:13:43 +00:00
zebra_mroute.c zebra: replace inet_ntoa 2020-10-22 13:37:25 -04:00
zebra_mroute.h add cplusplus guards to all zebra headers 2019-03-25 16:05:27 +01:00
zebra_nb_config.c zebra: Allow set src X to work on startup 2020-11-13 16:12:26 -05:00
zebra_nb_rpcs.c zebra: display rpc error msg to vtysh 2020-10-05 13:57:54 -07:00
zebra_nb_state.c staticd: add support for SR Policies 2020-08-12 13:28:48 +02:00
zebra_nb.c staticd: add support for SR Policies 2020-08-12 13:28:48 +02:00
zebra_nb.h staticd: add support for SR Policies 2020-08-12 13:28:48 +02:00
zebra_netns_id.c Merge pull request #7148 from pguibert6WIND/fix_fd_not_closed 2020-09-23 07:40:14 -04:00
zebra_netns_id.h zebra: dynamically detect vxlan link interfaces in other netns 2020-09-11 18:26:23 +02:00
zebra_netns_notify.c * : update signature of thread_cancel api 2020-10-23 08:59:34 -04:00
zebra_netns_notify.h add cplusplus guards to all zebra headers 2019-03-25 16:05:27 +01:00
zebra_nhg_private.h lib, zebra: add missing extern "C" {} blocks to new header files 2020-04-22 23:49:22 -03:00
zebra_nhg.c zebra: make a couple NHG errors debugs 2020-12-01 12:04:30 -05:00
zebra_nhg.h zebra: change the L2 NHG id format to co-exist with the L3NHG ids 2020-12-01 09:46:28 -08:00
zebra_ns.c vrf: VRF_DEFAULT must be 0, remove useless code 2020-09-21 10:17:35 +02:00
zebra_ns.h lib, zebra: reuse and adapt ns_list walk functionality 2020-09-11 18:26:23 +02:00
zebra_opaque.c zebra: quiet the zebra opaque message debugs 2020-10-13 14:07:17 -04:00
zebra_opaque.h zebra: add zebra opaque module 2020-06-02 08:20:54 -04:00
zebra_pbr.c bgpd, lib, pbrd, zebra: Pass by ifname 2020-09-11 20:04:45 -04:00
zebra_pbr.h zebra: add icmpv6 table of type / code 2020-08-21 13:37:08 +02:00
zebra_ptm_redistribute.c zebra: Add missing c-bit uint8_t 2020-03-17 16:01:59 -04:00
zebra_ptm_redistribute.h add cplusplus guards to all zebra headers 2019-03-25 16:05:27 +01:00
zebra_ptm.c *: unify thread/event cancel macros 2020-10-23 12:16:52 -04:00
zebra_ptm.h add cplusplus guards to all zebra headers 2019-03-25 16:05:27 +01:00
zebra_pw.c *: unify thread/event cancel macros 2020-10-23 12:16:52 -04:00
zebra_pw.h ldpd: Relay data plane pseudowire status in LDP notification 2020-06-01 13:21:37 -04:00
zebra_rib.c zebra: Gather opaque data into the route entry for storage 2020-12-08 09:06:08 -05:00
zebra_rnh.c zebra: fix writing to pointer instead of value 2020-11-18 19:05:30 +03:00
zebra_rnh.h zebra: cleanup zebra_rnh.c debugs 2020-10-02 12:15:03 -04:00
zebra_routemap.c Merge pull request #7524 from donaldsharp/zebra_route_map_tighten 2020-12-10 11:01:25 +02:00
zebra_routemap.h zebra: Disable rmap update thread before routemap_finish while shutting down zebra 2020-03-16 23:57:45 -07:00
zebra_router.c zebra: Add --asic-offload command 2020-11-15 10:19:25 -05:00
zebra_router.h zebra: Add --asic-offload command 2020-11-15 10:19:25 -05:00
zebra_snmp.c zebra: in_addr_cmp and struct prefix are not happy 2020-04-16 20:14:55 -04:00
zebra_srte.c lib, zebra: Add SR-TE policy infrastructure to zebra 2020-08-07 11:08:49 +02:00
zebra_srte.h lib, zebra: Add SR-TE policy infrastructure to zebra 2020-08-07 11:08:49 +02:00
zebra_vrf.c zebra: anticipate zns creation at vrf creation when backend is vrf-lite 2020-12-09 13:26:20 +00:00
zebra_vrf.h zebra: rename vni to evpn where appropriate 2020-08-12 12:39:33 +01:00
zebra_vty.c Merge pull request #7678 from donaldsharp/aspath_to_zebra 2020-12-10 10:38:14 -05:00
zebra_vxlan_private.h zebra: Keep DAD disabled if EVPN MH is turned on 2020-11-24 10:20:32 -08:00
zebra_vxlan.c zebra: support for slow-failover of local MACs on an ES 2020-12-01 09:46:26 -08:00
zebra_vxlan.h zebra: support for slow-failover of local MACs on an ES 2020-12-01 09:46:26 -08:00
zebra.conf.sample
zserv.c zebra: Adding zapi client close notification 2020-12-07 18:22:36 -05:00
zserv.h zebra: remove fuzzing stuff 2020-08-25 17:31:07 +02:00