The openfabric daemon has a longer name than anticipated for
`show zebra client summary` adjust to allow it to fit without
making columns all blomped.
Before:
robot# show zebra client summ
Name Connect Time Last Read Last Write IPv4 Routes IPv6 Routes
--------------------------------------------------------------------------------
static 00:00:06 00:00:06 00:00:06 4/0 0/0
openfabric 00:00:06 00:00:06 00:00:06 0/0 0/0
After:
[sharpd@robot frr4]$ vtysh -c "show zebra client summ"
Name Connect Time Last Read Last Write IPv4 Routes IPv6 Routes
--------------------------------------------------------------------------------
static 00:02:16 00:02:16 00:02:16 4/0 0/0
openfabric 00:02:16 00:02:16 00:02:16 0/0 0/0
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem reported by testing facility that our sending of Router
Advertisements more frequently than once very three seconds is not
compliant with rfc4861. Added a knob to turn off fast retransmits
in order to meet the requirement of the RFC.
Ticket: CM-27063
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Macvlan down event have sentinel check of its parent
link presence.
Ticket:CM-26622
Reviewed By:CCR-9326
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
"show vrf vni" and "show evpn vni <l3vni>" commands
need to display correct router mac value.
"show evpn vni <l3vni>" detail l3vni needs to display
system mac as in PIP scenario value can be different.
Syste MAC would be derived from SVI interface MAC wherelse
Router MAC would be derived from macvlan interface MAC value.
Ticket:CM-26710
Reviewed By:CCR-9334
Testing Done:
TORC11# show evpn vni 4001
VNI: 4001
Type: L3
Tenant VRF: vrf1
Local Vtep Ip: 36.0.0.11
Vxlan-Intf: vx-4001
SVI-If: vlan4001
State: Up
VNI Filter: none
System MAC: 00:02:00:00:00:2e
Router MAC: 44:38:39:ff:ff:01
L2 VNIs: 1000
TORC11# show vrf vni
VRF VNI VxLAN IF L3-SVI State Rmac
vrf1 4001 vx-4001 vlan4001 Up 44:38:39:ff:ff:01
TORC11# show evpn vni 4001 json
{
"vni":4001,
"type":"L3",
"localVtepIp":"36.0.0.11",
"vxlanIntf":"vx-4001",
"sviIntf":"vlan4001",
"state":"Up",
"vrf":"vrf1",
"sysMac":"00:02:00:00:00:2e",
"routerMac":"44:38:39:ff:ff:01",
"vniFilter":"none",
"l2Vnis":[
1000,
]
}
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
macvlan interface up/down event triggers
bgp to send updates for evpn routes
with changed RMAC and nexthop IP values.
Ticket:CM-26190
Reviewed By:
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
By default announct Self Type-2 routes with
system IP as nexthop and system MAC as
nexthop.
An API to check type-2 is self route via
checking ipv4/ipv6 address from connected interfaces list.
An API to extract RMAC and nexthop for type-2
routes based on advertise-svi-ip knob is enabled.
When advertise-pip is enabled/disabled, trigger type-2
route update. For self type-2 routes to use
anycast or individual (rmac, nexthop) addresses.
Ticket:CM-26190
Reviewed By:
Testing Done:
Enable 'advertise-svi-ip' knob in bgp default instance.
the vrf instance svi ip is advertised with nexthop
as default instance router-id and RMAC as system MAC.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Extract mac-vlan interface mac when a l3vni add is sent to bgp
Per L3VNI maintain vrr interface.
An api to extract vrr mac address from a vlan id, associated
master svi device.
When a l3vni operational up event is sent to bgpd,
extract vrr rmac along with svi rmac.
Ticket:CM-26190
Reviewed By:
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Zebra MLAG is using "t_read" for multiple tasks, such as
1. For opening Communication channel with MLAG
2. In case conncetion fails, same event is used for retries
3. after the connection establishment, same event is used to
read the data from MLAG
since all these taks will never schedule together, this will not
cause any issues.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
edge-2> show evpn vni detail json
{
"vni":79031,
"type":"L3",
...,
...
} <<<<<< no comma
{
"vni":79021,
"type":"L3",
...,
...
} <<<<<< no comma
{
} <<<<<< blank
edge-2>
The fix is to pack json info into json_array before printing it.
Signed-off-by: Lakshman Krishnamoorthy <lkrishnamoor@vmware.com>
Apparently the multipath_num functionatlity has been broken
for a while because we were ignoring the recusive nexthops
when marking them inactive based on it.
This sets them as inactive as well if the parent breaks it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were re-counting the entire group's active number on
every iteration of this nexthop_active_update() loop.
This is not great from a performance perspective but also
it was failing to properly mark things according to the
specified multipath_num.
Since a nexthop is set as active before this check, if its == to
the set ecmp, it gets marked inactive even though if its
under the max ecmp wanted!
ex)
set ecmp to 1.
`/usr/lib/frr/zebra -e 1`
All kernel routes will be marked inactive even with just one nexthop!
K 1.1.1.1/32 [0/0] is directly connected, dummy1 inactive, 00:00:10
K 1.1.1.2/32 [0/0] is directly connected, dummy2 inactive, 00:00:10
K 1.1.1.3/32 [0/0] is directly connected, dummy3 inactive, 00:00:10
K 1.1.1.4/32 [0/0] is directly connected, dummy4 inactive, 00:00:10
K 1.1.1.5/32 [0/0] is directly connected, dummy5 inactive, 00:00:10
K 1.1.1.6/32 [0/0] is directly connected, dummy6 inactive, 00:00:10
K 1.1.1.7/32 [0/0] is directly connected, dummy7 inactive, 00:00:10
K 1.1.1.8/32 [0/0] is directly connected, dummy8 inactive, 00:00:10
K 1.1.1.9/32 [0/0] is directly connected, dummy9 inactive, 00:00:10
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Clean up the relationships between zebra's rib and nexthop-group
headers as prep for adding a nexthop-group pointer to the
route_entry.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
On BSD systems null routes were not being installed into the
kernel. This is because commit 08ea27d112
introduced a bug where we were attempting to use the wrong
prefix afi types and as such we were going down the v6 code path.
test27.lab.netdef.org# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
K>* 0.0.0.0/0 [0/0] via 192.168.122.1, 00:00:23
S>* 4.5.6.8/32 [1/0] unreachable (blackhole), 00:00:11
C>* 192.168.122.0/24 [0/1] is directly connected, vtnet0, 00:00:23
test27.lab.netdef.org# exit
[ci@test27 ~/frr]$ netstat -rn
Routing tables
Internet:
Destination Gateway Flags Netif Expire
default 192.168.122.1 UGS vtnet0
4.5.6.8/32 127.0.0.1 UG1B lo0
127.0.0.1 link#2 UH lo0
192.168.122.0/24 link#1 U vtnet0
192.168.122.108 link#1 UHS lo0
Fixes: #4843
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The code for when a new vrf is created to properly handle
router advertisement for it is messed up in several ways:
1) Generation of the zrouter data structure should set the rtadv
socket to -1 so that we don't accidently close someone elses
open file descriptor
2) When you created a new zvrf instance *after* bootup we are XCALLOC'ing
the data structure so the zvrf->fd was 0. The shutdown code was looking
for the >= 0 to know if the fd existed (since fd 0 is valid!)
This sequence of events would cause zebra to consume 100% of the
cpu:
Run zebra by itself ( no other programs )
ip link add vrf1 type vrf table 1003
ip link del vrf vrf1
vtysh -c "configure" -c "no interface vrf1"
This commit fixes this issue.
Fixes: #5376
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we shut down zebra, we were not doing anything to shut
down the FPM. Perform the necessary occult rituals and
stop the threads from running during early shutdown.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We should be setting the ns->info pointer to NULL when we free
what it points to. Just use XFREE directly on the void * pointer
to do this.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were not connecting the default zebra_ns to the default
ns->info at namespace initialization in zebra. Thus, when
we tried to use the `ns_walk_func()` it would ignore the
default zebra_ns since there is no pointer to it from the
ns struct.
Fix this by connecting them in `zebra_ns_init()` and,
if the default ns is not found, exit with failure
since this is not recoverable.
This was found during a crash where we fail to cancel the kernel_read
thread at termination (via the `ns_walk_func()`) and then we
get a netlink notification trying to use the zns struct that has
already been freed.
```
(gdb) bt
\#0 0x00007fc1134dc7bb in raise () from /lib/x86_64-linux-gnu/libc.so.6
\#1 0x00007fc1134c7535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
\#2 0x00007fc113996f8f in core_handler (signo=11, siginfo=0x7ffe5429d070, context=<optimized out>) at lib/sigevent.c:254
\#3 <signal handler called>
\#4 0x0000561880e15449 in if_lookup_by_index_per_ns (ns=0x0, ifindex=174) at zebra/interface.c:269
\#5 0x0000561880e1642c in if_up (ifp=ifp@entry=0x561883076c50) at zebra/interface.c:1043
\#6 0x0000561880e10723 in netlink_link_change (h=0x7ffe5429d8f0, ns_id=<optimized out>, startup=<optimized out>) at zebra/if_netlink.c:1384
\#7 0x0000561880e17e68 in netlink_parse_info (filter=filter@entry=0x561880e17680 <netlink_information_fetch>, nl=nl@entry=0x561882497238, zns=zns@entry=0x7ffe542a5940,
count=count@entry=5, startup=startup@entry=0) at zebra/kernel_netlink.c:932
\#8 0x0000561880e186a5 in kernel_read (thread=<optimized out>) at zebra/kernel_netlink.c:406
\#9 0x00007fc1139a4416 in thread_call (thread=thread@entry=0x7ffe542a5b70) at lib/thread.c:1599
\#10 0x00007fc113974ef8 in frr_run (master=0x5618823c9510) at lib/libfrr.c:1024
\#11 0x0000561880e0b916 in main (argc=8, argv=0x7ffe542a5f78) at zebra/main.c:483
```
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
1. add the Mlag ProtoBuf Lib to Zebra Compilation
2. Encode the messages with protobuf before writing to MLAG
3. Decode the MLAG Messages using protobuf and write to clients
based on their subscrption.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
This includes:
1. Processing client Registrations for MLAG
2. storing client Interests for MLAG updates
3. Opening communication channel to MLAG with First client reg
4. Closing Communication channel with last client De-reg
5. Spawning a new thread for handling MLAG updates peocessing
6. adding Test code
7. advertising MLAG Updates to clients based on their interests
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
This code is called from the zebra main pthread during shutdown
but the thread event is scheduled via the zebra dplane pthread.
Hence, we should be using the `thread_cancel_async()` API to
cancel the thread event on a different pthread.
This is only ever hit in the rare case that we still have work left
to do on the update queue during shutdown.
Found via zebra crash:
```
(gdb) bt
\#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
\#1 0x00007f4e4d3f7535 in __GI_abort () at abort.c:79
\#2 0x00007f4e4d3f740f in __assert_fail_base (fmt=0x7f4e4d559ee0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7f4e4d9071d0 "master->owner == pthread_self()",
file=0x7f4e4d906cf8 "lib/thread.c", line=1185, function=<optimized out>) at assert.c:92
\#3 0x00007f4e4d405102 in __GI___assert_fail (assertion=assertion@entry=0x7f4e4d9071d0 "master->owner == pthread_self()", file=file@entry=0x7f4e4d906cf8 "lib/thread.c",
line=line@entry=1185, function=function@entry=0x7f4e4d906b68 <__PRETTY_FUNCTION__.15817> "thread_cancel") at assert.c:101
\#4 0x00007f4e4d8d095a in thread_cancel (thread=0x55b40d01a640) at lib/thread.c:1185
\#5 0x000055b40c291845 in zebra_dplane_shutdown () at zebra/zebra_dplane.c:3274
\#6 0x000055b40c27ee13 in zebra_finalize (dummy=<optimized out>) at zebra/main.c:202
\#7 0x00007f4e4d8d1416 in thread_call (thread=thread@entry=0x7ffcbbc08870) at lib/thread.c:1599
\#8 0x00007f4e4d8a1ef8 in frr_run (master=0x55b40ce35510) at lib/libfrr.c:1024
\#9 0x000055b40c270916 in main (argc=8, argv=0x7ffcbbc08c78) at zebra/main.c:483
(gdb) down
\#4 0x00007f4e4d8d095a in thread_cancel (thread=0x55b40d01a640) at lib/thread.c:1185
1185 assert(master->owner == pthread_self());
(gdb)
```
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Put the code to free the data held by a nhg_ctx
in nhg_ctx_free() as well. We do it similiarly for
the dplane_ctx.
Let nhg_ctx_fini() be any other routines that need to
be handled before freeing.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
SA warned us lookup could be NULL dereferenced in some
paths. Handle the case where we are passed a NULL
nexthop before we try to copy it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were only checking that two nhg_hash_entry's were equal
based on the active nexthop NUMBER. This is not sufficient in
special cases where whats active with one route using it,
might not be active with the other. We can see this with
routes trying to resolve to themselves.
Ex)
1.1.1.0/24
-> 1.1.1.1 dummy1 (inactive)
-> 1.1.1.2 dummy2
1.1.2.0/24
-> 1.1.1.1 dummy1
-> 1.1.1.2 dummy1 (inactive)
Without checking each nexthop individually, they will
hash to the same group since they have the same number of
active nexthops.
Fix this by looping over every nexthop for each nhe (they should
be sorted) and checking if the NEXTHOP_FLAG_ACTIVE flag's match.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We cannot clear the NEXTHOP_FLAG_FIB nexthop flag
when sending routes to the dataplane anymore since
nexthops are now shared.
We were seeing a situation where if we delete a route
using a nexthop group that is still active with another
route, the fib flag was being unset by this code
path despite them still being valid fib nexthops with the
other route.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were crashing due to a missed label change code path
in mpls_ftn_uninstall() with the zebra_nhg hashing code.
Add a static handler function for label changing everywhere
in that code and use it in mpls_ftn_uninstall().
The crash was found in the ISIS-SR tests:
==23== Thread 1:
==23== Invalid read of size 4
==23== at 0x15B20E: zebra_nhg_hash_equal (zebra_nhg.c:365)
==23== by 0x489A2FD: hash_get (hash.c:143)
==23== by 0x489A4BC: hash_lookup (hash.c:183)
==23== by 0x15B5A3: zebra_nhg_find (zebra_nhg.c:494)
==23== by 0x15C536: zebra_nhg_rib_find (zebra_nhg.c:1070)
==23== by 0x1573E8: mpls_ftn_update (zebra_mpls.c:2661)
==23== by 0x1A2554: zread_mpls_labels_replace (zapi_msg.c:1890)
==23== by 0x1A41CD: zserv_handle_commands (zapi_msg.c:2613)
==23== by 0x199B17: zserv_process_messages (zserv.c:517)
==23== by 0x48EE6B7: thread_call (thread.c:1549)
==23== by 0x48A8AD5: frr_run (libfrr.c:1064)
==23== by 0x1391B7: main (main.c:468)
==23== Address 0x5839330 is 0 bytes inside a block of size 80 free'd
==23== at 0x48369AB: free (vg_replace_malloc.c:530)
==23== by 0x48AEE6C: qfree (memory.c:129)
==23== by 0x15C5F8: zebra_nhg_free (zebra_nhg.c:1095)
==23== by 0x15BC8C: zebra_nhg_handle_uninstall (zebra_nhg.c:734)
==23== by 0x15DCFA: zebra_nhg_uninstall_kernel (zebra_nhg.c:1826)
==23== by 0x15C666: zebra_nhg_decrement_ref (zebra_nhg.c:1106)
==23== by 0x15D9D7: zebra_nhg_re_update_ref (zebra_nhg.c:1711)
==23== by 0x15D8B1: nexthop_active_update (zebra_nhg.c:1660)
==23== by 0x167072: rib_process (zebra_rib.c:1154)
==23== by 0x168D72: process_subq_route (zebra_rib.c:2039)
==23== by 0x168E92: process_subq (zebra_rib.c:2078)
==23== by 0x168F5B: meta_queue_process (zebra_rib.c:2112)
==23== Block was alloc'd at
==23== at 0x4837B65: calloc (vg_replace_malloc.c:752)
==23== by 0x48AED56: qcalloc (memory.c:110)
==23== by 0x15B07B: zebra_nhg_copy (zebra_nhg.c:307)
==23== by 0x15B13E: zebra_nhg_hash_alloc (zebra_nhg.c:329)
==23== by 0x489A339: hash_get (hash.c:148)
==23== by 0x15B6CA: zebra_nhg_find (zebra_nhg.c:532)
==23== by 0x15C536: zebra_nhg_rib_find (zebra_nhg.c:1070)
==23== by 0x15D89A: nexthop_active_update (zebra_nhg.c:1658)
==23== by 0x167072: rib_process (zebra_rib.c:1154)
==23== by 0x168D72: process_subq_route (zebra_rib.c:2039)
==23== by 0x168E92: process_subq (zebra_rib.c:2078)
==23== by 0x168F5B: meta_queue_process (zebra_rib.c:2112)
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
This reverts commit 7d5bb02b1a.
Allow zebra to actually maintain the nexthop group in the
linux kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In symmetric routing case for evpn (type-2/type-5)
routes, nexthop and Router mac fields can change from
the originating VTEP.
At the receiving VTEP, bgp path info may points to different
nexthop IP (nh1->nh2) and Remote MAC remain the same.
When the bgp sync the route with nexthop and RMAC fields.
For the exisitng rmac entry update/replace with the new
nexthop (VTEP) IP in remote rmac db in Zebra.
Similarly, bgp path info may points different Router-mac
(RMAC1->RMAC2) and the nexthop value remains the same.
In this case, update to the new RMAC value for the
existing remote nexthop in the Zebra' nexthop cache db.
Ticket:CM-26917
Reviewed By:CCR-9435
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
We were creating `other` tables in rib_del(), vty commands, and
dataplane return callback via the zebra_vrf_table_with_table_id()
API.
Seperate the API into only a lookup, never create
and added another with `get` in the name (following the standard
we use in other table APIs).
Then changed the rib_del(), rib_find_rn_from_ctx(), and show route
summary vty command to use the lookup API instead.
This was found via a crash where two different vrfs though they owned
the table. On delete, one free'd all the nodes, and then the other tried
to use them. It required specific timing of a VRF existing, going away,
and coming back again to cause the crash.
=23464== Invalid read of size 8
==23464== at 0x179EA4: rib_dest_from_rnode (rib.h:433)
==23464== by 0x17ACB1: zebra_vrf_delete (zebra_vrf.c:253)
==23464== by 0x48F3D45: vrf_delete (vrf.c:243)
==23464== by 0x48F4468: vrf_terminate (vrf.c:532)
==23464== by 0x13D8C5: sigint (main.c:172)
==23464== by 0x48DD25C: quagga_sigevent_process (sigevent.c:105)
==23464== by 0x48F0502: thread_fetch (thread.c:1417)
==23464== by 0x48AC82B: frr_run (libfrr.c:1023)
==23464== by 0x13DD02: main (main.c:483)
==23464== Address 0x5152788 is 104 bytes inside a block of size 112 free'd
==23464== at 0x48369AB: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23464== by 0x48B25B8: qfree (memory.c:129)
==23464== by 0x48EA335: route_node_destroy (table.c:500)
==23464== by 0x48E967F: route_node_free (table.c:90)
==23464== by 0x48E9742: route_table_free (table.c:124)
==23464== by 0x48E9599: route_table_finish (table.c:60)
==23464== by 0x170CEA: zebra_router_free_table (zebra_router.c:165)
==23464== by 0x170DB4: zebra_router_release_table (zebra_router.c:188)
==23464== by 0x17AAD2: zebra_vrf_disable (zebra_vrf.c:222)
==23464== by 0x48F3F0C: vrf_disable (vrf.c:313)
==23464== by 0x48F3CCF: vrf_delete (vrf.c:223)
==23464== by 0x48F4468: vrf_terminate (vrf.c:532)
==23464== Block was alloc'd at
==23464== at 0x4837B65: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23464== by 0x48B24A2: qcalloc (memory.c:110)
==23464== by 0x48EA2FE: route_node_create (table.c:488)
==23464== by 0x48E95C7: route_node_new (table.c:66)
==23464== by 0x48E95E5: route_node_set (table.c:75)
==23464== by 0x48E9EA9: route_node_get (table.c:326)
==23464== by 0x48E1EDB: srcdest_rnode_get (srcdest_table.c:244)
==23464== by 0x16EA4B: rib_add_multipath (zebra_rib.c:2730)
==23464== by 0x1A5310: zread_route_add (zapi_msg.c:1592)
==23464== by 0x1A7B8E: zserv_handle_commands (zapi_msg.c:2579)
==23464== by 0x19D689: zserv_process_messages (zserv.c:523)
==23464== by 0x48F09F8: thread_call (thread.c:1599)
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a dataplane plugin module as a sample or reference for
folks who might like to integrate with the zebra dataplane
subsystem. This isn't part of the FRR build or product; there
are some simple build and load-at-runtime instructions in
comments in the file.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The zvni_map_to_svi function may return NULL as such prevent
a deref and crash. Found via coverity
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Fix 2 Coverity issues:
1) zebra_nhg.c -> all paths in nhg_ctx_process_finish have
already deref'ed the ctx pointer no need for a test of it
2) the **ifp pointer passed in may be NULL. Prevent an accidental
deref if calling function does not pass in a ifp pointer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Checkpatch was complaining because this code was extending
beyond 80 characters on a couple lines. Adjusted a conditional
tree to fix that.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a private header file for functions that are internal/special
case like how we do it for `lib/nexthop_group_private.h`.
Remove a bunch of functions from the header file only being used
statically and add some comments for those remaining to indicate
better what their use is.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Re-work the validity setting and checking APIs
for nhg_hash_entry's to make them clearer.
Further, they were originally only beings set
on ifdown and install. Extended their use into
releasing entries and to account for setting
the validity of a recursive dependent.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The commenting for why we would need to requeue a
group from the kernel to be later processed was not
sufficient. Add a better explanation for the flow
and state of the system.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Change the wording of the flag indicating we have received
a nexthop group from the kernel with a different ID but
is fundamentally identical to one we already have.
It was colliding with a flag of similar name in the nexthop struct.
Change it from NEXTHOP_GROUP_DUPLICATE -> NEXTHOP_GROUP_UNHASHABLE
since it is in fact unhashable.
Also change the wording of functions and comments referencing the same
problem.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When determining whether to set the nhg_hash_entry as
invalid, we should have been checking the depends, not
the dependents. If its a group and at least one of its
depends is valid, the group is still valid.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Guard against an overflow read when processing
nexthop groups from netlink. Add a check to ensure
we don't try to write passed the array size.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Now with this patch we can't use shutdown for cleanup:
```
commit 2fc69f03d2 (pr_5079)
Author: Mark Stapp <mjs@voltanet.io>
Date: Fri Sep 27 12:15:34 2019 -0400
zebra: during shutdown processing, drop dplane results
Don't process dataplane results in zebra during shutdown (after
sigint has been seen). The dplane continues to run in order to
clean up, but zebra main just drops results.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
```
Adjusted nhg uninstall handling to clear data and other
cleanup before sending to the dataplane.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a comment to the header of `zebra_nhg.c` to point the reader
to where the hashtables containing the nhg entries are held.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Reduce the api for deleting nexthops and the containing
group to just one call rather than having a special case
and handling it separately.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
With the new nexthop group shared memory framework, pointers
are being used in route_entry for the nexthop_group. Update
the use of this in `mpls_ftn_uninstall()` to reflect the change.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
If the vrf lookup fails, use the default namespace
to find/delete the nexthop group from the kernel because it
should be there anyway.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Check both the nhg and nexthop are not NULL before passing
them to be hashed. Clang SA caught this.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some boilerplate for nexthop installation for bsd kernels.
They do not support nexthop objects for now so its just boilerplate.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When moving the nexthop group in a route entry to be a pointer,
we missed one wrapped in a `ifndef` for when the kernel doesn't
have netlink.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We do not need to check that the nexthop is installed or queued
when sending a route deletion since we only need to the prefix for it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
In lieu of the fact that we probably shouldn't change show
command output too much, changing this to only give nhe_id
output when the user explicitly asks for it. Probably only
going to be used for debugging for now anyway.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Only used the afi passed into `zebra_nhg_find()` for nexthops
that are blackhole/ifindex. Others should use the type actually declared
in the nexthop struct itself.
Basically, nexthop objects of type blackhole/ifindex in the kernel must
have an address family, they cannot be ambigious and be shared.
This is some requirement in the linux ip core code.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a mechanism to requeue groups we receive from the
kernel if the IDs are in a weird order (Group ID is lower
than individual nexthop IDs for example).
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
If we get a nexthop group from the kernel with labels
and queue it as a context to process later, we have to
free the label stack we allocated.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some getters for the nhg_ctx struct. Probably unnecessary
at this point since they are all static but if they ever become
public it will be nice to have them.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add code for handling nexthop group hash entry encaps
and sending them to the kernel. Add some more debugging
information for the encaps and groups in general.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
There was some code copypasta for mpls stack building in the
netlink install path. Reduced that to a common function.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When querying for detailed route information, show the nexthop
group id for its nh_hash_entry in the output before listing the
nexthops.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some more detailed output to `show nexthop-group`.
It closely resembles the output of `show ip routes`.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Optimize the fib and notified nexthop group comparison algorithm
to assume ordering. There were some pretty serious performance hits with
this on high ecmp routes.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Update nhg_hash_entry to use the non-recursive version of
nexthop_group_equal() since it doesn't really need to compare all
of those.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were waiting until install time to mark nexthops as duplicate.
Since they are immutable now and re-used, move this marking into
when they are actually created to save a bunch of cycles.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Before checking the equivalence of the whole group itself,
check to see if they contain the same number of non-recursive
active nexthops. This should shorten lookup time for the case of
non-resolved nexthop group creation.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Create any depends only after the initial hash lookup
fails. Should reduce hashing cpu cycles significantly.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When we receive a route delete from the kernel and it
contains a nexthop object id, use that to match against
route gateways with instead of explicit nexthops.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Move the supports_nh bool indicating whether the kernel we are
using supports nexthop objects into the netlink kernel interface
itself. Since only linux and netlink support nexthop object APIs
for now this is fine.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add handling for delete/update nexthop object messages from the
kernel.
If someone deletes a nexthop object we are still using, send it back
down. If the someone updates a nexthop we are using, replace that nexthop
with ours. Routes are referencing this nexthop object ID and we resolved
it ourselves, so we should force the other `someone` to submit to our
will.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
On restart, if we failed to remove any nexthop objects due
to a kill -9 or such event, sweep them if we aren't using them.
Add a proto field to handle this and remove the is_kernel bool.
Add a dupicate flag that indicates this nexthop group is only
present in our ID hashtable. It is a dupicate nexthop we received
from the kernel, therefore we cannot hash on it.
Make the idcounter globally accessible so that kernel updates
increment it as soon as we receive them, not when we handle them.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Give all nhg_hash_entrys we install into the kernel
as nexthop objects a defined proto matching the zebra
rib table one. This makes sense since nhe's are proto-independent
and determined exclusively in zebra.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The kernel does not allow duplicate IDs in the same group, but
we are perfectly find with it internally if two different
nexthops resolve the the same nexthop (default route for instance).
So, we have to handle this when we get ready to install.
Further, pass the max group size in the arguments to ensure we
don't overflow. Don't actually think this is possible due to
multipath checking in nexthop_active_update() but better to be
safe.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We need to handle refcnt differently if we ever start making
upper level protocols aware of nhg_hash_entry IDs.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
If the nhe was successfully installed, make sure its marked
as valid. Not fully sure how/where the valid flag is going to
be used yet.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were setting a group to be recursive if its first depend
was. This is not the case; individual depends of the group
might be recursive but the group itself is not.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Move the resolving and installing of a single nhg_hash_entry
into the install function itself, rather than letting zebra_rib
handle it.
Further, ensure depends are installed/queued before installing
a group. The ordering should be find here since only one thread
will call this API.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Move the installation of an nhe out of nexthop_active_update()
and into the rib install path. So, only install the nhe when
a route using it is being installed.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Before we install a route, we verify that the nhg_hash_entry is installed.
Allow the nhe to be queued as well and still pass the route
install along.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Switch the nhg_connected tree structures to use the new
RB tree API in `lib/typerb.h`. We were using the openbsd-tree
implementation before.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were not setting the NEXTHOP_GROUP_RECURSIVE flag via
the rib find path. Adding a check and set after successful
creation of a new nhg_hash_entry.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When going through the zebra_nhg_rib_find(), we now handle the
case of if that nexthop has been recursively resolved. A depend
is created and passed along to zebra_nhg_find().
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a refcnt as soon as depend is connected to mark
that this is being referenced as part of a group or
resolving another one. If the one referencing it
is never used, decrement it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some helper functions for finding/creating nexthop
group hash entries and assigning them as a depends for
another one using them in a group or resolving to them.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some helper functions for ref incrementing and
decrementing the depends of a nexthop group hash entry.
This just abstracts the RB tree manipulation a bit more.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We are using the rib workqueue to handle nexthop groups
from the kernel and no longer need this.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Set the resolved nhg during the find path, rather
than after it has been created. This make more sense
now that we are hashing on the resolved nexthop as well.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Refactor/move around the code for nexthop resolution so
that it occurs only when the nexthop actually changes. Further,
provide a helper function to make the code more readable.
Also, remove the check for NEXTHOPS_CHANGED as this flag is used
specifcially for nexthop tracking and not an appropriate check
here.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When hashing/creating the NHE, use the nexthops vrf as its
source of data. This is gotten directly from an interface
and should not come from a route.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When updating a route's referenced NHE, accept a NULL value
as valid and clear out the pointer in the struct.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When the resolved nexthop changes, we should increment the new
resolved NHE by the refcnt for the unresolved NHE being used
by the routes and decrement the old one by the same amount.
Before, we were simple incrementing by one, causing incorrect refcnts
to occur.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add cli to show nhg_hash_entry's by ID.
Add cli to show nhg_hash_entry info for interfaces and remove
just listing ID's in `show interface *`
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Ignore the cleanup for now until we get the timing
figured out without using the kernel nexthop object API.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
If the nhg_hash_entry is a group, check if its members
are valid before setting it invalid. If even one is valid,
then this group should still be considered valid.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Only remove a route if the nexthop it is using is still installed.
If a nexthop object is removed from the kernel, all routes referencing
it will be removed from the kernel.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We should create a new NHE if the mpls labels change
since we hash on them. This adds the functonality to do that
and decrement the refcnt on the old one.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
On startup when we are requesting all nexthop objects
from the kernel and it doesn't support that, we should not
produce an error message.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
In zebra_nhg_find(), if we created a nhg_hash_entry, return
true so we know rib-side.
Kernel-side, we don't care since it will always just enqueue
a context to process later.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add the ability to recursively resolve nexthop group hash entries
and resolve them when sending to the kernel.
When copying over nexthops into an NHE, copy resolved info as well.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Only queue a nexthop object update if the dataplane
supports nexthop objects. Otherwise, mark it as a success
since we should only me sending them to the kernel
if we think they are valid anywyay.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were only setting and checking the ifindex if
the nexthop had an *_IFINDEX type. However, when nexthop
active checking is done, the non-*_IFINDEX types can also
obtain a nexthop with an ifindex and are thus valid too.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We will use a nhe context for dataplane interaction with
nextho group hash entries.
New nhe's from the kernel will be put into a group array
if they are a group and queued on the rib metaq to be processed
later.
New nhe's sent to the kernel will be set on the dataplane context
with approprate ID's in the group array if needed.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Upon release, call the approprate functions to remove itself
from depends/dependents trees it is in.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some functions to iterate over the depends/dependents
RB tree and remove themselves from the other's RB tree.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Can't RM_REMOVE directly with a key, you need to actually pass the
data to be removed. So, lookup with a key first to find the node,
then remove it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Check to make sure the route entry has a nexthop
group before we try to free after a table lookup
failure.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Removed a static function that did not need to be
there. The nhg_connected_cmp() function provides
all the needed functionality for comparing ID's
in the RB tree.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Update the zebra_nhg_hash_equal() function to use
the nexthop_group_equal() function in lib/nexthop_group
instead of comparing their depends RB tree.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Removing this function since the new paradigm
of everything just being nhg_connected structs
makes it not make a lot of sense.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Put the setting of the ifp on a nexthop group hash
entry into the zebra_nhg_alloc() function. It should
only be added if its not a group/recursive (it doesn't
have any depends) and its nexthop type has an ifindex.
This also provides functionality for proto-side ifp
setting.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
A nexthop group should not have a VRF ID. Only individual
nexthops need to be using a VRF. Fixed this both kernel and
proto side.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Re-organize and expose the nhg_connected functions so that
it can be used outside zebra_nhg.c. And then abstract those
into zebra_nhg_depends_* and zebra_nhg_depenents_* functons.
Switch the ifp struct to use an RB tree for its dependents,
making use of the nhg_connected functions.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Create a nhg_depenents tree that will function as a way
to get back pointers for NHE's depending on it.
Abstract the RB nodes into nhg_connected for both depends and
dependents. This same struct is used for both.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Default the afi of the nexthop to the route entry using it.
If it turns out to be a group, update the afi to AFI_UNSPEC.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Update rib_add_multipath to use the reference count
increment function for nexthop group hash entries.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>