Duncan Eastoe 164d8e8608 zebra: routes stuck with 'q' when using dplane FPM
New work enqueued to the dplane_fpm_nl provider is initially de-queued
and re-enqueued, in fpm_nl_process(), to be processed by the provider's
own thread.

After performing this initial de-queue/enqueue we return to
dplane_thread_loop() and check the dplane_fpm_nl output queue for any
work which has been completed.

Since this work is being processed in another thread it is very likely
that there will be some (or all) work still outstanding at this point.
The dataplane thread finishes up any other tasks and then waits until
it is next scheduled. In the meantime the dplane_fpm_nl thread is
processing its work queue until completion.

The issue arises here as the dataplane thread is not explicitly
re-scheduled once dplane_fpm_nl has drained its work queue and
populated its output queue with completed work.

This completed work can sit in the output queue for an indeterminate
period of time, depending upon when the dataplane thread is next
scheduled for other work. If the RIB has reached a stable state then
this could be a significant period of time. During this period zebra
marks these routes as queued, even though they have actually been
processed by all dataplane providers.

An unrelated RIB change which triggers a FIB update will result in
the dataplane thread being scheduled and this completed work then
being processed. At this point the routes will no longer be marked
as queued by zebra. However, this new FIB update might itself then
fall victim to the same scenario!

We can observe the above behaviour in these detailed dplane logs.

    11:24:47 zebra[7282]: dplane: incoming new work counter: 2
    11:24:47 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
    11:24:47 zebra[7282]: dplane provider 'Kernel': processing
    11:24:47 zebra[7282]: Dplane NEIGH_DISCOVER, ip 192.168.2.2, ifindex 9
    11:24:47 zebra[7282]: Dplane NEIGH_DISCOVER, ip 192.168.2.2, ifindex 9
    11:24:47 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
    11:24:47 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
    11:24:47 zebra[7282]: dplane dequeues 1 completed work from provider dplane_fpm_nl
    11:24:47 zebra[7282]: dplane has 1 completed, 0 errors, for zebra main

2 contexts (all incoming work) have been queued to dplane_fpm_nl - all good.
1 completed context was de-queued, so there is outstanding work.

    11:24:58 zebra[7282]: dplane: incoming new work counter: 2
    11:24:58 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
    11:24:58 zebra[7282]: dplane provider 'Kernel': processing
    11:24:58 zebra[7282]: ID (193) Dplane nexthop update ctx 0x55c429b6fed0 op NH_INSTALL
    11:24:58 zebra[7282]: 0:5.5.5.5/32 Dplane route update ctx 0x55c429b79690 op ROUTE_INSTALL
    11:24:58 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
    11:24:58 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
    11:24:58 zebra[7282]: dplane dequeues 2 completed work from provider dplane_fpm_nl
    11:24:58 zebra[7282]: dplane has 2 completed, 0 errors, for zebra main

A further 2 contexts (all incoming work) have been queued to dplane_fpm_nl - all good.
2 completed contexts were de-queued, which sounds good as that is what we enqueued.
However, 1 context was already outstanding from the earlier batch, so 1 context is
still outstanding now.
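
The bookkeeping applied to the logs above can be sketched as a small C helper that keeps a running count of enqueued minus dequeued contexts for one provider. It is an illustration only: struct provider_tally and tally_log_line() are hypothetical names, and the two sscanf() patterns assume the exact line wording shown in the log excerpts in this message.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Contexts handed to a provider but not yet dequeued from it. */
struct provider_tally {
	const char *name; /* provider name as printed in the log */
	int outstanding;
};

/*
 * Update the tally from one log line. Returns 1 if the line was an
 * enqueue or dequeue for this provider, 0 for any other line.
 */
int tally_log_line(struct provider_tally *t, const char *line)
{
	char prov[64];
	const char *p;
	int n;

	assert(t != NULL && line != NULL);

	/* Enqueue lines print the provider name inside single quotes. */
	p = strstr(line, "dplane enqueues ");
	if (p && sscanf(p, "dplane enqueues %d new work to provider '%63[^']'",
			&n, prov) == 2) {
		if (strcmp(prov, t->name) != 0)
			return 0;
		t->outstanding += n;
		return 1;
	}

	/* Dequeue lines print the provider name without quotes. */
	p = strstr(line, "dplane dequeues ");
	if (p && sscanf(p, "dplane dequeues %d completed work from provider %63s",
			&n, prov) == 2) {
		if (strcmp(prov, t->name) != 0)
			return 0;
		t->outstanding -= n;
		return 1;
	}

	return 0;
}
```

Feeding the 11:24:47 and 11:24:58 lines through a tally for dplane_fpm_nl gives 2 - 1 = 1 and then 1 + 2 - 2 = 1, so one context is outstanding after each batch.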

Indeed the new 5.5.5.5/32 route is marked as queued:

    O>q 5.5.5.5/32 [110/10] via 192.168.2.2, dp0p1s3, weight 1, 00:01:19

This remains the case until we trigger a FIB update by installing
another route, e.g. 10.10.10.10/32:

    11:26:41 zebra[7282]: dplane: incoming new work counter: 2
    11:26:41 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
    11:26:41 zebra[7282]: dplane provider 'Kernel': processing
    11:26:41 zebra[7282]: ID (195) Dplane nexthop update ctx 0x55c429b78ce0 op NH_INSTALL
    11:26:41 zebra[7282]: 0:10.10.10.10/32 Dplane route update ctx 0x55c429b7a040 op ROUTE_INSTALL
    11:26:41 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
    11:26:41 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
    11:26:41 zebra[7282]: dplane dequeues 2 completed work from provider dplane_fpm_nl
    11:26:41 zebra[7282]: dplane has 2 completed, 0 errors, for zebra main
    11:26:41 zebra[7282]: zebra2proto: Please add this protocol(2) to proper rt_netlink.c handling
    11:26:41 zebra[7282]: Nexthop dplane ctx 0x55c429b6fed0, op NH_INSTALL, nexthop ID (193), result SUCCESS
    11:26:41 zebra[7282]: default(0:254):5.5.5.5/32 Processing dplane result ctx 0x55c429b79690, op ROUTE_INSTALL result SUCCESS

We observe the same 2 enqueues and 2 dequeues as before, which again suggests
that there is outstanding work.

As expected, the 5.5.5.5/32 route is no longer marked as queued:

    O>* 5.5.5.5/32 [110/10] via 192.168.2.2, dp0p1s3, weight 1, 00:02:06

But the 10.10.10.10/32 route is, as we have not yet processed the completed
context:

    C>q 10.10.10.10/32 is directly connected, lo, 00:26:05

Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
2020-12-11 15:04:15 +00:00
.github .github: improve bug report template 2020-10-20 16:12:03 -04:00
alpine alpine: Remove old docker deps for alpine 2020-10-22 03:03:53 -04:00
babeld babeld: Free ifc leak 2020-11-14 21:19:42 -05:00
bfdd bfd: fix session lookup 2020-12-04 14:38:30 +03:00
bgpd Merge pull request #7678 from donaldsharp/aspath_to_zebra 2020-12-10 10:38:14 -05:00
debian Revert "debian: Merge various debian changelogs in debian/changelog" 2020-11-19 17:12:42 -05:00
doc Merge pull request #7678 from donaldsharp/aspath_to_zebra 2020-12-10 10:38:14 -05:00
docker docker: don't fail on chown /etc/frr 2020-06-10 00:20:04 -04:00
eigrpd eigrpd: Remove unneeeded if state types 2020-11-28 07:45:08 -05:00
fpm *: Replace sizeof something to sizeof(something) 2020-03-08 21:44:53 +02:00
gdb bgpd: Convert binfo to path 2018-10-09 14:26:30 -04:00
grpc lib: don't ignore error messages generated during the commit apply phase 2020-08-14 21:37:14 -03:00
include include: Update rtnetlink.h 2020-11-15 10:12:50 -05:00
isisd Merge pull request #7703 from volta-networks/fix_ldpsync_remove_hello 2020-12-09 20:21:11 -05:00
ldpd Merge pull request #7703 from volta-networks/fix_ldpsync_remove_hello 2020-12-09 20:21:11 -05:00
lib Merge pull request #7678 from donaldsharp/aspath_to_zebra 2020-12-10 10:38:14 -05:00
m4 build: find all future minor versions of python3 2020-07-09 06:47:31 +02:00
mlag zebra: Do not build mlag protobuf support if version 3 is not avail 2019-12-15 09:37:51 -05:00
nhrpd nhrpd: fix SA warning in nhrp_interface 2020-12-08 09:10:10 -05:00
ospf6d Merge pull request #7492 from Niral-Networks/niral_ospfv3_fix_redist 2020-12-10 09:01:12 -03:00
ospfclient ospfclient: replace inet_ntoa 2020-10-22 13:41:51 -04:00
ospfd ldpd, isisd, ospfd: Remove periodic ldp-sync hello message 2020-12-09 14:11:38 -05:00
pbrd *: Convert all usage of zclient_send_message to new enum 2020-11-15 15:04:52 -05:00
pimd Merge pull request #7601 from patrasar/pim_fix 2020-12-01 15:53:53 -05:00
pkgsrc *: cleanup .gitignore files 2018-09-08 21:30:42 +02:00
python *: reformat python files 2020-10-07 17:22:26 -04:00
qpb build: add LLVM bitcode targets 2020-05-05 14:39:12 +02:00
redhat redhat: include new BFD development header 2020-11-24 07:55:07 -03:00
ripd *: Remove route_map_object_t from the system 2020-11-13 19:35:20 -05:00
ripngd *: Remove route_map_object_t from the system 2020-11-13 19:35:20 -05:00
sharpd sharpd, zebra: Pass and display opaque data as PoC 2020-12-08 09:06:09 -05:00
snapcraft snapcraft: Update libyang version 2020-09-10 09:13:36 -04:00
staticd Merge pull request #7478 from donaldsharp/buffer 2020-11-18 08:30:47 -05:00
tests Merge pull request #7690 from donaldsharp/nht_show_is_not_not_not 2020-12-09 07:58:37 -05:00
tools Merge pull request #7582 from AnuradhaKaruppiah/frr-reload-cleanup 2020-12-07 16:19:04 -05:00
vrrpd *: Convert all usage of zclient_send_message to new enum 2020-11-15 15:04:52 -05:00
vtysh Merge pull request #7667 from donaldsharp/vtysh_more_useful_data 2020-12-04 08:14:23 -05:00
watchfrr *: unify thread/event cancel macros 2020-10-23 12:16:52 -04:00
yang Merge pull request #7590 from opensourcerouting/isisd-lfa 2020-12-02 20:43:51 -05:00
zebra zebra: routes stuck with 'q' when using dplane FPM 2020-12-11 15:04:15 +00:00
.clang-format clang-format: add FOREACH_SAFI to the ForEachMacros list 2020-08-03 12:18:24 -03:00
.dir-locals.el tools: fix emacs configuration file 2019-11-04 11:45:52 -03:00
.dockerignore docker: Make docker image on CentOS 7 2019-11-26 19:29:30 +00:00
.git-blame-ignore-revs *: Consolidate on first git blame ignore revs 2020-10-13 16:07:18 -04:00
.gitignore Revert "debian: Adjust tarsource.sh to use native debian/changelog" 2020-11-19 17:12:41 -05:00
bootstrap.sh autoreconf -i 2007-02-06 19:28:28 +00:00
buildtest.sh config: switch a few references to say FRR 2017-07-12 11:25:33 -05:00
changelog-auto.in Revert "debian: Adjust tarsource.sh to use native debian/changelog" 2020-11-19 17:12:41 -05:00
config.version.in build: carry --with-pkg-extra-version into tarballs 2018-10-24 15:11:50 +02:00
configure.ac Merge pull request #7475 from eololab/add-more-parameters-for-crosscompilation 2020-11-24 11:44:29 -05:00
COPYING *: make consistent & update GPLv2 file headers 2017-05-15 16:37:41 +02:00
COPYING-LGPLv2.1 build: remove LGPL v2.0, add LGPL v2.1 2016-11-15 17:19:38 +09:00
Makefile.am Revert "debian: Adjust tarsource.sh to use native debian/changelog" 2020-11-19 17:12:41 -05:00
README.md doc: Update Documentation to note Solaris Unsupported status 2020-09-21 10:02:20 -04:00
stamp-h.in Initial revision 2002-12-13 20:15:29 +00:00

FRRouting

FRR is free software that implements and manages various IPv4 and IPv6 routing protocols. It runs on nearly all distributions of Linux and BSD and supports all modern CPU architectures.

FRR currently supports the following protocols:

  • BGP
  • OSPFv2
  • OSPFv3
  • RIPv1
  • RIPv2
  • RIPng
  • IS-IS
  • PIM-SM/MSDP
  • LDP
  • BFD
  • Babel
  • PBR
  • OpenFabric
  • VRRP
  • EIGRP (alpha)
  • NHRP (alpha)

Installation & Use

For source tarballs, see the releases page.

For Debian and its derivatives, use the APT repository at https://deb.frrouting.org/.

Instructions on building and installing from source for supported platforms may be found in the developer docs.

Once installed, please refer to the user guide for instructions on use.

Community

The FRRouting email list server is located here and offers the following public lists:

Topic               List
Development         dev@lists.frrouting.org
Users & Operators   frog@lists.frrouting.org
Announcements       announce@lists.frrouting.org

For chat, we currently use Slack. You can join by clicking the "Slack" link under the Participate section of our website.

Contributing

FRR maintains developer's documentation which contains the project workflow and expectations for contributors. Some technical documentation on project internals is also available.

We welcome and appreciate all contributions, no matter how small!

Security

To report security issues, please use our security mailing list:

security [at] lists.frrouting.org