When all the uplinks go down the VTEP is disconnected from the
VxLAN overlay and this was handled by proto-downing the ES bonds. When
the uplinks come up again we need to re-enable the ES bonds but that
needs to be done after a delay to allow the EVPN network to converge.
And that is done by firing off the startup-delay timer on first
uplink-up.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. When a bond is associated with an ES we may need to re-sync
the dplane protodown state (which maybe stale/set by some other
app).
2. Also change the uplink state display to avoid confusion with
protodown reason code (both used to show uplink-up).
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
protodown state is a combination of the dplane and zebra states.
protodown reason is maintained exclusively by zebra. Display this
information on two separate lines to make that ownership clearer.
Also display n/a for bonds as the dplane doesn't support protodowning
the bond device.
Sample output -
==============
root@torm-11:mgmt:~# vtysh -c "show interface hostbond1"|grep -i protodown
protodown: off (n/a)
protodown reasons: (uplinks-down)
root@torm-11:mgmt:~# vtysh -c "show interface swp5"|grep -i protodown
protodown: on
protodown reasons: (uplinks-down)
root@torm-11:mgmt:~#
PS: Cosmetic changes only, no functional change.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The code for this was already there but was not kicking in because of a
zebra local reason-code dup check. Even if the reason-code is the same,
if the dplane and zebra disagree about the protodown state zebra will
need to re-program the dplane.
Fixed a couple of spelling errors in the protodown logs to make greps
easy.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When the ospfv3 interface is disabled by the command "no interface <eth> area <area-id>
the linked interface prefixes does not get flushed
Signed-off-by: Yash Ranjan <ranjany@vmware.com>
evpn route-map match (filter) on vni is not working
at the origin of the routes.
evpn match vni route checks for encap type as vxlan.
the source route attribute is not set with vxlan encap
thus the match filter wouldn't work.
Ticket:CM-32554
Reviewed By:CCR-11056
Testing Done:
At source have match vni plus set statement in route-map.
Validate the origin of the route's outbound correctly sets
the 'set' statment based on match vni filter.
At origin:
route-map RM-EVPN-TE-Matches permit 10
match evpn vni 4001
set large-community 10:10:119
Receiving end:
Route [5]:[0]:[24]:[78.41.1.0] VNI 4001
5550
27.0.0.15 from TORS1(downlink-5) (27.0.0.15)
Origin incomplete, metric 0, valid, external, bestpath-from-AS 5550, best (First path received)
Extended Community: RT:5550:4001 ET:8 Rmac:00:02:00:00:00:4d
Large Community: 10:10:119 <--- Large community stamped
Last update: Thu Dec 10 22:19:26 2020
Signed-off-by: Chirag Shah <chirag@nvidia.com>
New work enqueued to the dplane_fpm_nl provider is initially de-queued
and re-enqueued, in fpm_nl_process(), to be processed by the provider's
own thread.
After performing this initial de-queue/enqueue we return to
dplane_thread_loop() and check the dplane_fpm_nl output queue for any
work which has been completed.
Since this work is being processed in another thread it is very likely
that there will be some (or all) work still outstanding at this point.
The dataplane thread finishes up any other tasks and then waits until
it is next scheduled. In the meantime the dplane_fpm_nl thread is
processing its work queue until completion.
The issue arises here as the dataplane thread is not explicitly
re-scheduled once dplane_fpm_nl has drained its work queue and
populated its output queue with completed work.
This completed work can sit in the output queue for an indeterminate
period of time, depending upon when the dataplane thread is next
scheduled for other work. If the RIB has reached a stable state then
this could be a significant period of time. During this period zebra
marks these routes as queued, even though they have actually been
processed by all dataplane providers.
An un-related RIB change which triggers a FIB update will result in
the dataplane thread being scheduled and this completed work then
being processed. At this point the routes will then no longer be
marked as queued by zebra. However this new FIB update might itself
then fall victim to the same scenario!
We can observe the above behaviour in these detailed dplane logs.
11:24:47 zebra[7282]: dplane: incoming new work counter: 2
11:24:47 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
11:24:47 zebra[7282]: dplane provider 'Kernel': processing
11:24:47 zebra[7282]: Dplane NEIGH_DISCOVER, ip 192.168.2.2, ifindex 9
11:24:47 zebra[7282]: Dplane NEIGH_DISCOVER, ip 192.168.2.2, ifindex 9
11:24:47 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
11:24:47 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
11:24:47 zebra[7282]: dplane dequeues 1 completed work from provider dplane_fpm_nl
11:24:47 zebra[7282]: dplane has 1 completed, 0 errors, for zebra main
2 contexts (all incoming work) have been queued to dplane_fpm_nl - all good.
1 completed context was de-queued, so there is outstanding work.
11:24:58 zebra[7282]: dplane: incoming new work counter: 2
11:24:58 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
11:24:58 zebra[7282]: dplane provider 'Kernel': processing
11:24:58 zebra[7282]: ID (193) Dplane nexthop update ctx 0x55c429b6fed0 op NH_INSTALL
11:24:58 zebra[7282]: 0:5.5.5.5/32 Dplane route update ctx 0x55c429b79690 op ROUTE_INSTALL
11:24:58 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
11:24:58 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
11:24:58 zebra[7282]: dplane dequeues 2 completed work from provider dplane_fpm_nl
11:24:58 zebra[7282]: dplane has 2 completed, 0 errors, for zebra main
A further 2 contexts (all incoming work) have been queued to dplane_fpm_nl - all good.
2 completed contexts were de-queued, which sounds good as that is what we en-queued.
However, there is an outstanding context from earlier, so there is still outstanding
work.
Indeed the new 5.5.5.5/32 route is marked as queued:
O>q 5.5.5.5/32 [110/10] via 192.168.2.2, dp0p1s3, weight 1, 00:01:19
This remains the case until we trigger a FIB update by installation of the
(eg.) 10.10.10.10/32 route:
11:26:41 zebra[7282]: dplane: incoming new work counter: 2
11:26:41 zebra[7282]: dplane enqueues 2 new work to provider 'Kernel'
11:26:41 zebra[7282]: dplane provider 'Kernel': processing
11:26:41 zebra[7282]: ID (195) Dplane nexthop update ctx 0x55c429b78ce0 op NH_INSTALL
11:26:41 zebra[7282]: 0:10.10.10.10/32 Dplane route update ctx 0x55c429b7a040 op ROUTE_INSTALL
11:26:41 zebra[7282]: dplane dequeues 2 completed work from provider Kernel
11:26:41 zebra[7282]: dplane enqueues 2 new work to provider 'dplane_fpm_nl'
11:26:41 zebra[7282]: dplane dequeues 2 completed work from provider dplane_fpm_nl
11:26:41 zebra[7282]: dplane has 2 completed, 0 errors, for zebra main
11:26:41 zebra[7282]: zebra2proto: Please add this protocol(2) to proper rt_netlink.c handling
11:26:41 zebra[7282]: Nexthop dplane ctx 0x55c429b6fed0, op NH_INSTALL, nexthop ID (193), result SUCCESS
11:26:41 zebra[7282]: default(0:254):5.5.5.5/32 Processing dplane result ctx 0x55c429b79690, op ROUTE_INSTALL result SUCCESS
We observe the same 2 enqueues and 2 dequeues as before, which again suggests
that there is outstanding work.
As expected, the 5.5.5.5/32 route is no longer marked as queued:
O>* 5.5.5.5/32 [110/10] via 192.168.2.2, dp0p1s3, weight 1, 00:02:06
But the 10.10.10.10/32 route is, as we have not yet processed the completed
context:
C>q 10.10.10.10/32 is directly connected, lo, 00:26:05
Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
Returns the current number of (completed) contexts in the provider's
output queue (dp_ctx_out_q), allowing access to this data from the
provider itself.
Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
Some prefixes were not shown in the link database
show command, due to issues with pointer calculation.
Signed-off-by: Yash Ranjan <ranjany@vmware.com>
Some prefixes were not shown in the intra-prefix database
show command, due to issues with pointer calculation.
Signed-off-by: Yash Ranjan <ranjany@vmware.com>
On top of the recent `bgp suppress-fib-pending which
was at a BGP_NODE level, add this command at the CONFIG_NODE
level as well and allow the command to apply to all instances
of bgp running.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Following functions which is a piece of label-maanager implementation
isn't called from out side of its file. And all lines of label-manager
are coded on zebra/label_manager.c at this time. So these functions
should be unexposed.
Functions:
- create_label_chunk
- assign_label_chunk
- delete_label_chunk
- release_label_chunk
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
Use user provided AD for local routes (aggregate).
address-family ipv4 unicast
distance bgp 20 200 210
network 47.2.2.8/30
aggregate-address 51.1.0.0/16
Testing Done:
Before aggr route uses default 200 AD even user provided local AD.
B>* 51.1.0.0/16 [200/0] unreachable (blackhole), weight 1, 00:01:14
After:
B>* 51.1.0.0/16 [210/0] unreachable (blackhole), weight 1, 00:00:01
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Removing the obsolete ldp-sync periodic 'hello' message.
When ldp-sync is configured, IGPs take action if the LDP process goes down.
The IGPs have been updated to use the zapi client close callback to detect
the LDP process going down.
Signed-off-by: Karen Schoener <karen@voltanet.io>
In some extraordinary circumstances an LSP might not have any
TLV. Add a null check to prevent a crash when that happens.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
In ldpd, the child processes send IPC messages to the main process to
perform logging in their behalf (access to the file descriptor used
for logging needs to be serialized). This commit fixes a problem
that was preventing the printfrr format specifiers from working in
the child processes, since vsnprintf() was being used instead of
vsnprintfrr() before sending the log messages to the parent process.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
When ldp-sync is configured, IGPs take action if the LDP process goes down.
Currently, IGPs detect the LDP process is down if they do not receive a
periodic 'hello' message from LDP within 1 second.
Intermittently, this heartbeat mechanism causes false topotest failures.
When the failure occurs, LDP is busy receiving messages from zebra for a
few seconds. During this time, LDP does not send the expected periodic
message.
With this change, IGPs detect LDP down via zapi client close message.
Signed-off-by: Karen Schoener <karen@voltanet.io>
in the case the namespace pointer is already available, feed it at vrf
creation. this prevents from crashing if the netlink parsing already
began, and the vrf-lite is not enabled yet.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>