New work enqueued to the dplane_fpm_nl provider is initially de-queued
and re-enqueued, in fpm_nl_process(), to be processed by the provider's
own thread.
After performing this initial de-queue/enqueue we return to
dplane_thread_loop() and check the dplane_fpm_nl output queue for any
work which has been completed.
Since this work is being processed in another thread it is very likely
that there will be some (or all) work still outstanding at this point.
The dataplane thread finishes up any other tasks and then waits until
it is next scheduled. In the meantime the dplane_fpm_nl thread is
processing its work queue until completion.
The issue arises here as the dataplane thread is not explicitly
re-scheduled once dplane_fpm_nl has drained its work queue and
populated its output queue with completed work.
This completed work can sit in the output queue for an indeterminate
period of time, depending upon when the dataplane thread is next
scheduled for other work. If the RIB has reached a stable state then
this could be a significant period of time. During this period zebra
marks these routes as queued, even though they have actually been
processed by all dataplane providers.
An un-related RIB change which triggers a FIB update will result in
the dataplane thread being scheduled and this completed work then
being processed. At this point the routes will then no longer be
marked as queued by zebra. However this new FIB update might itself
then fall victim to the same scenario!
We can observe the above behaviour in these detailed dplane logs.
11:24:47 zebra[7282]: dplane: incoming new work counter: 2
11:24:47 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
11:24:47 zebra[7282]: dplane provider 'Kernel': processing
11:24:47 zebra[7282]: Dplane NEIGH_DISCOVER, ip 192.168.2.2, ifindex 9
11:24:47 zebra[7282]: Dplane NEIGH_DISCOVER, ip 192.168.2.2, ifindex 9
11:24:47 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
11:24:47 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
11:24:47 zebra[7282]: dplane dequeues 1 completed work from provider dplane_fpm_nl
11:24:47 zebra[7282]: dplane has 1 completed, 0 errors, for zebra main
2 contexts (all incoming work) have been queued to dplane_fpm_nl - all good.
1 completed context was de-queued, so there is outstanding work.
11:24:58 zebra[7282]: dplane: incoming new work counter: 2
11:24:58 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
11:24:58 zebra[7282]: dplane provider 'Kernel': processing
11:24:58 zebra[7282]: ID (193) Dplane nexthop update ctx 0x55c429b6fed0 op NH_INSTALL
11:24:58 zebra[7282]: 0:5.5.5.5/32 Dplane route update ctx 0x55c429b79690 op ROUTE_INSTALL
11:24:58 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
11:24:58 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
11:24:58 zebra[7282]: dplane dequeues 2 completed work from provider dplane_fpm_nl
11:24:58 zebra[7282]: dplane has 2 completed, 0 errors, for zebra main
A further 2 contexts (all incoming work) have been queued to dplane_fpm_nl - all good.
2 completed contexts were de-queued, which sounds good as that is what we en-queued.
However, there is an outstanding context from earlier, so there is still outstanding
work.
Indeed the new 5.5.5.5/32 route is marked as queued:
O>q 5.5.5.5/32 [110/10] via 192.168.2.2, dp0p1s3, weight 1, 00:01:19
This remains the case until we trigger a FIB update by installation of the
(eg.) 10.10.10.10/32 route:
11:26:41 zebra[7282]: dplane: incoming new work counter: 2
11:26:41 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
11:26:41 zebra[7282]: dplane provider 'Kernel': processing
11:26:41 zebra[7282]: ID (195) Dplane nexthop update ctx 0x55c429b78ce0 op NH_INSTALL
11:26:41 zebra[7282]: 0:10.10.10.10/32 Dplane route update ctx 0x55c429b7a040 op ROUTE_INSTALL
11:26:41 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
11:26:41 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
11:26:41 zebra[7282]: dplane dequeues 2 completed work from provider dplane_fpm_nl
11:26:41 zebra[7282]: dplane has 2 completed, 0 errors, for zebra main
11:26:41 zebra[7282]: zebra2proto: Please add this protocol(2) to proper rt_netlink.c handling
11:26:41 zebra[7282]: Nexthop dplane ctx 0x55c429b6fed0, op NH_INSTALL, nexthop ID (193), result SUCCESS
11:26:41 zebra[7282]: default(0:254):5.5.5.5/32 Processing dplane result ctx 0x55c429b79690, op ROUTE_INSTALL result SUCCESS
We observe the same 2 enqueues and 2 dequeues as before, which again suggests
that there is outstanding work.
As expected, the 5.5.5.5/32 route is no longer marked as queued:
O>* 5.5.5.5/32 [110/10] via 192.168.2.2, dp0p1s3, weight 1, 00:02:06
But the 10.10.10.10/32 route is, as we have not yet processed the completed
context:
C>q 10.10.10.10/32 is directly connected, lo, 00:26:05
Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
Returns the current number of (completed) contexts in the provider's
output queue (dp_ctx_out_q), allowing access to this data from the
provider itself.
Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
Following functions which is a piece of label-maanager implementation
isn't called from out side of its file. And all lines of label-manager
are coded on zebra/label_manager.c at this time. So these functions
should be unexposed.
Functions:
- create_label_chunk
- assign_label_chunk
- delete_label_chunk
- release_label_chunk
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
in the case the namespace pointer is already available, feed it at vrf
creation. this prevents from crashing if the netlink parsing already
began, and the vrf-lite is not enabled yet.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Following functions is using writen to dispatch message
into socket, but another function uses zserv_send_message.
This commit does tiny unification for zapi's socket messaging.
Funcs:
- zsend_assign_label_chunk_response()
- zsend_label_manager_connect_response()
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
The `show ip nht` and `show ipv6 nht` commands were broken.
This is because recent code commit: 0154d8ce45
assumed that p must not be NULL and this is not the case.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add a bit of code to allow bgp to send the AS-Path associated with
the route being installed to zebra so it can be displayed and
used as part of the `show ip route A` command in zebra.
eva# show ip route 20.0.0.0/11
Routing entry for 20.0.0.0/11
Known via "bgp", distance 20, metric 0, best
Last update 00:00:00 ago
* 192.168.161.1, via enp39s0, weight 1
AS-Path: 60000 64539 15096 6939 8075
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Just gather the opaque data into the route entry. Later
commits will display this data for end users as well as
to send it down.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
A MAC entry cannot be deleted while a neigh is referencing it. It seems
there is some race condition where this may be happening. The log is
to help identify those cases.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
It is now 4bits of type and 28bits of value -
1. type=0 is for L3 NHG
2. type=1 is for L2 NH
3. type=2 is for L2 NHG
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This is an optimization to reduce the number of L2 nexthops. A
l2 or fdb nexthop simply provides the dataplane with a nexthop ip-
torm-12:mgmt:~# ip nexthop
id 268435461 via 27.0.0.20 scope link fdb
id 268435463 via 27.0.0.20 scope link fdb
id 268435465 via 27.0.0.20 scope link fdb
So there is no need to allocate a nexthop per-ES/per-VTEP. There
can be 100+ ESs per-VTEP so this change cuts the scale down by a
factor of 100.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When a local ES flaps there are two modes in which the local
MACs are failed over -
1. Fast failover - A backup NHG (ES-peer group) is programmed in the
dataplane per-access port. When a local ES flaps the MAC entries
are left unaltered i.e. pointing to the down access port. And the
dataplane redirects traffic destined to the oper-down access port
via the backup NHG.
2. Slow failover - This mode needs to be turned on to allow dataplanes
not capable of re-directing traffic. In this mode local MAC entries
on a down local ES are re-programmed to point to the ES-peers'
NHG. And vice-versa i.e. when the ES comes up the MAC entries
are re-programmed with the access port as dest.
Fast failover is on by default. Slow failover can be enabled via the
following config -
evpn mh redirect-off
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
As a part of extended MM handing a MAC can be updated from local
to remote while being referenced by SYNC neighs (this is really a
temporary/small window). During this window if the MAC transitions
back to local again we need to re-inforce the previous SYNC flags
(based on the sync-neigh count) as subsequent SYNC updates to the
MAC will be de-duped and ignored.
Ticket: CM-29636
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When a local mac is deleted by the dataplane zebra can re-install it
if the MAC is a SYNC MAC (learned from ES peers). The "local_inactive"
bit must be set as a part of the re-install to prevent zebra turning
around and advertising the MAC as locally active.
Also fixed up some debug logs in the slow-fail path to include the VNI.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
NHG and DST (VTEP-IP) are mutually exclusive attributes. If DST is
present the kernel ignores NHG.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
NHG is activated i.e. programmed in the dataplane only if there
are active-VTEPs associated with it. When a NHG is de-activated
all the remote-mac entries associated with it need to be removed
before the NHG is removed.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The lookup for non default VRFs was always using a tableId; if not
provided, we were defaulting to RT_TABLE_MAIN. This is fine for the
default VRF but not for others. As a result, the command was silently
failing for non-default VRFs unless we also specified the correct tableId.
Fix this by only performing the lookup using the tableId if it is
provided; else use zebra_vrf_table.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
A couple NHG messages we were logging as errors are a bit spammy
in usecases where you routinely add/remove interfaces (VM heavy
deployments). Its not really an error a user cares about and
more for a developer to know what went wrong after the fact so
it makes more sense for these to be under a debug rather than
an error since seeing them does not implicitly mean error during
those usecases.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
During times of network trauma and when we are at large network scale
the process_remote_macip_add function can issue a zlog_warn for
a common occurrence. Modify the code to be a debug statement.
This behavior is the same now as the process_remote_macip_del function
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add an api that allows a caller in the zebra main pthread to
process the queue of pending dplane updates. The caller supplies
a function to call to test each pending context. Selected
contexts are dequeued, and freed without being processed.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add routines to walk the LSP table and generate FPM updates for all
entries. A walk of the LSP table is triggered when (re-)connecting
to an FPM.
Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
Export netlink_lsp_msg_encoder() and use it to encode and send netlink
messages concerning LSP updates to connected FPMs.
Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
Dataplane/kernel prints the NHG and NH ids as decimal. Zebra
was printing it as hex (to display type vs. val). This became a
debugging hassle hence normalizing the format.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
DAD is not supported currently with EVPN-MH so we turn it off internally
when the first ES config is detected.
PS: Note that when all local ESs are deleted DAD will stay off and
will need to be cleared via a daemon restart.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The function was originally implemented for zebra data plane FPM plugin,
but another code places could use it.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>