Currently if a end user has something like this:
Routing entry for 192.168.212.1/32
Known via "kernel", distance 0, metric 100, best
Last update 00:07:50 ago
* directly connected, ens5
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
K>* 0.0.0.0/0 [0/100] via 192.168.212.1, ens5, src 192.168.212.19, 00:00:15
C>* 192.168.212.0/27 is directly connected, ens5, 00:07:50
K>* 192.168.212.1/32 [0/100] is directly connected, ens5, 00:07:50
And FRR does a link flap, it refigures the route and rejects the default
route:
2022/04/09 16:38:20 ZEBRA: [NZNZ4-7P54Y] default(0:254):0.0.0.0/0: Processing rn 0x56224dbb5b00
2022/04/09 16:38:20 ZEBRA: [ZJVZ4-XEGPF] default(0:254):0.0.0.0/0: Examine re 0x56224dbddc20 (kernel) status: Changed Installed flags: Selected dist 0 metric 100
2022/04/09 16:38:20 ZEBRA: [GG8QH-195KE] nexthop_active_update: re 0x56224dbddc20 nhe 0x56224dbdd950 (7), curr_nhe 0x56224dedb550
2022/04/09 16:38:20 ZEBRA: [T9JWA-N8HM5] nexthop_active_check: re 0x56224dbddc20, nexthop 192.168.212.1, via ens5
2022/04/09 16:38:20 ZEBRA: [M7EN1-55BTH] nexthop_active: Route Type kernel has not turned on recursion
2022/04/09 16:38:20 ZEBRA: [HJ48M-MB610] nexthop_active_check: Unable to find active nexthop
2022/04/09 16:38:20 ZEBRA: [JPJF4-TGCY5] default(0:254):0.0.0.0/0: After processing: old_selected 0x56224dbddc20 new_selected 0x0 old_fib 0x56224dbddc20 new_fib 0x0
So the 192.168.212.1 route is matched for the nexthop but it is not connected and
zebra treats it as a problem. Modify the code such that if a system route
matches through another system route, then it should work imo.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This bug should only really affect kernel routes. To reproduce:
a) Have multiple connected routes that point to the same prefix
swp8 up default 169.254.0.250/30
swp9 up default 169.254.0.250/30
b) Have a kernel route that uses one of those connected routes
7.6.2.8 via 169.254.0.249 dev swp8 proto static
(But have it choose a non-selected connected nexthop)
c) Introduce an event that causes the rib table to be reprocessed,
say a unrelated interface going up / down
This causes the route to be lost with this message:
2022/03/28 21:21:53 ZEBRA: [YXCJP-0WZWV] netlink_nexthop_msg_encode: ID (3454): 169.254.0.249, via swp8(1383) vrf default(0)
2022/03/28 21:21:53 ZEBRA: [YF2E6-J60JH] nexthop_active: 169.254.0.249, via swp8 given ifindex does not match nexthops ifindex found found: directly connected, swp9
Effectively the nexthop that zebra is choosing would not be the one
that the kernel route has choosen and FRR removes the route:
022/03/28 21:21:53 ZEBRA: [NM15X-X83N9] rib_process: (0:254):7.6.2.8/32: rn 0x56042e632e90, removing re 0x56042e6316e0
2022/03/28 21:21:53 ZEBRA: [Y53JX-CBC5H] rib_unlink: (0:254):7.6.2.8/32: rn 0x56042e632e90, re 0x56042e6316e0
2022/03/28 21:21:53 ZEBRA: [KT8QQ-45WQ0] rib_gc_dest: (0:?):7.6.2.8/32: removing dest from table
What is happening?
Zebra is not looking at all connected routes and if any of them
would have the appropriate ifindex and just blindly rejecting
the route.
So when nexthop resolution happens and it matches a connected
route and the dest->selected nexthop ifindex does not match, let's sort
through the rest of them and see if any of them match and if so
let's keep the route.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Since there are two kinds of ESI (Type-0 and Type-3), the warnings
should distinguish between the two cases.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
It's confusing for a user to see 'Tx RA failed' in the logs when
they've enabled RAs (either through interface config or BGP unnumbered)
on an interface that can't send them. Let's avoid sending RAs on
interfaces that are bridge_slaves or don't have a link-local address,
since they are the two of the most common reasons for RA Tx failures.
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
The bounded vrf of `l2vni/zevpn` have wrong relation with the order
in which vxlan interface and svi interface are set.
If set vxlan interface with vlanid first, then set svi interface with
vrf, it is ok that vxlan interface will get correct `vrf` inherited
from svi. But reverse the set sequence (i.e. set svi first, then vxlan),
vxlan interface can't get correct `vrf`, becasue the handling of
`ZEBRA_VXLIF_VLAN_CHANGE` missed inheritting `vrf` by mistake.
```
host# do show evpn vni 101
VNI: 101
Type: L2
Tenant VRF: vrf1
```
So update `vrf` ("Tenant VRF") of l2vni in `zebra_vxlan_if_update()`.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Since `NDA_VLAN` is no longer mannually defined in header file,
the check for `NDA_VLAN` should be removed.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Like `zvni_map_to_svi_ns()` for `ns_walk_func()`, just use "assert"
instead of unnecessary check.
Since these parameters for `ns_walk_func()`, e.g. `in_param` and others,
must not be NULL. So use `assert` to ensure the these parameters, and
remove those unnecessary checks.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
When a client disconnects, we need to check & remove NHT entries for
other SAFIs too. Otherwise we crash later trying to access stale data.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
There exists code paths in the linux kernel where a dump command
will be interrupted( I am not sure I understand what this really
means ) and the data sent back from the kernel is wrong or incomplete.
At this point in time I am not 100% certain what should be done, but
let's start noticing that this has happened so we can formulate a plan
or allow the end operator to know bad stuff is a foot at the circle K.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
In the FreeBSD code if you delete the interface
and it has no configuration, the ifp pointer will
be deleted from the system *but* zebra continues
to dereference the just freed pointer.
==58624== Invalid read of size 1
==58624== at 0x48539F3: strlcpy (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==58624== by 0x2B0565: ifreq_set_name (ioctl.c:48)
==58624== by 0x2B0565: if_get_flags (ioctl.c:416)
==58624== by 0x2B2D9E: ifan_read (kernel_socket.c:455)
==58624== by 0x2B2D9E: kernel_read (kernel_socket.c:1403)
==58624== by 0x499F46E: thread_call (thread.c:2002)
==58624== by 0x495D2B7: frr_run (libfrr.c:1196)
==58624== by 0x2B40B8: main (main.c:471)
==58624== Address 0x6baa7f0 is 64 bytes inside a block of size 432 free'd
==58624== at 0x484ECDC: free (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==58624== by 0x4953A64: if_delete (if.c:283)
==58624== by 0x2A93C1: if_delete_update (interface.c:874)
==58624== by 0x2B2DF3: ifan_read (kernel_socket.c:453)
==58624== by 0x2B2DF3: kernel_read (kernel_socket.c:1403)
==58624== by 0x499F46E: thread_call (thread.c:2002)
==58624== by 0x495D2B7: frr_run (libfrr.c:1196)
==58624== by 0x2B40B8: main (main.c:471)
==58624== Block was alloc'd at
==58624== at 0x4851381: calloc (in /usr/local/libexec/valgrind/vgpreload_memcheck-amd64-freebsd.so)
==58624== by 0x496A022: qcalloc (memory.c:116)
==58624== by 0x49546BC: if_new (if.c:164)
==58624== by 0x49546BC: if_create_name (if.c:218)
==58624== by 0x49546BC: if_get_by_name (if.c:603)
==58624== by 0x2B1295: ifm_read (kernel_socket.c:628)
==58624== by 0x2A7FB6: interface_list (if_sysctl.c:129)
==58624== by 0x2E99C8: zebra_ns_enable (zebra_ns.c:127)
==58624== by 0x2E99C8: zebra_ns_init (zebra_ns.c:214)
==58624== by 0x2B3FF2: main (main.c:401)
==58624==
Zebra needs to pass back whether or not the ifp pointer
was freed when if_delete_update is called and it should
then check in ifan_read as well as ifm_read that the
ifp pointer is still valid for use.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add new debug output to show the string of the message type that
is currently unhandled:
2022-03-24 18:30:15.284 [DEBG] zebra: [V3NSB-BPKBD] Kernel:
2022-03-24 18:30:15.284 [DEBG] zebra: [HDTM1-ENZNM] Kernel: message seq 792
2022-03-24 18:30:15.284 [DEBG] zebra: [MJD4M-0AAAR] Kernel: pid 594488, rtm_addrs {DST,GENMASK}
2022-03-24 18:30:15.285 [DEBG] zebra: [GRDRZ-0N92S] Unprocessed RTM_type: RTM_NEWMADDR(d)
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When running zebra w/ valgrind, it was noticed that there
was a bunch of passing uninitialized data to the kernel:
==38194== Syscall param ioctl(generic) points to uninitialised byte(s)
==38194== at 0x4CDF88A: ioctl (in /lib/libc.so.7)
==38194== by 0x49A4031: vrf_ioctl (vrf.c:860)
==38194== by 0x2AFE29: vrf_if_ioctl (ioctl.c:91)
==38194== by 0x2AFF39: if_get_mtu (ioctl.c:161)
==38194== by 0x2B12C3: ifm_read (kernel_socket.c:653)
==38194== by 0x2A7F76: interface_list (if_sysctl.c:129)
==38194== by 0x2E9958: zebra_ns_enable (zebra_ns.c:127)
==38194== by 0x2E9958: zebra_ns_init (zebra_ns.c:214)
==38194== by 0x2B3F82: main (main.c:401)
==38194== Address 0x7fc000967 is on thread 1's stack
==38194== in frame #3, created by if_get_mtu (ioctl.c:155)
==38194==
==38194== Syscall param ioctl(generic) points to uninitialised byte(s)
==38194== at 0x4CDF88A: ioctl (in /lib/libc.so.7)
==38194== by 0x49A4031: vrf_ioctl (vrf.c:860)
==38194== by 0x2AFE29: vrf_if_ioctl (ioctl.c:91)
==38194== by 0x2AFED9: if_get_metric (ioctl.c:143)
==38194== by 0x2B12CB: ifm_read (kernel_socket.c:655)
==38194== by 0x2A7F76: interface_list (if_sysctl.c:129)
==38194== by 0x2E9958: zebra_ns_enable (zebra_ns.c:127)
==38194== by 0x2E9958: zebra_ns_init (zebra_ns.c:214)
==38194== by 0x2B3F82: main (main.c:401)
==38194== Address 0x7fc000967 is on thread 1's stack
==38194== in frame #3, created by if_get_metric (ioctl.c:137)
==38194==
==38194== Syscall param ioctl(generic) points to uninitialised byte(s)
==38194== at 0x4CDF88A: ioctl (in /lib/libc.so.7)
==38194== by 0x49A4031: vrf_ioctl (vrf.c:860)
==38194== by 0x2AFE29: vrf_if_ioctl (ioctl.c:91)
==38194== by 0x2B052D: if_get_flags (ioctl.c:419)
==38194== by 0x2B1CF1: ifam_read (kernel_socket.c:930)
==38194== by 0x2A7F57: interface_list (if_sysctl.c:132)
==38194== by 0x2E9958: zebra_ns_enable (zebra_ns.c:127)
==38194== by 0x2E9958: zebra_ns_init (zebra_ns.c:214)
==38194== by 0x2B3F82: main (main.c:401)
==38194== Address 0x7fc000707 is on thread 1's stack
==38194== in frame #3, created by if_get_flags (ioctl.c:411)
Valgrind is no longer reporting these issues.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When `zebra_evpn_mac_svi_add()` adds one found mac by
`zebra_evpn_mac_lookup()` and the found mac is without
svi flag, then call `zebra_evpn_mac_svi_add()` to create
one appropriate mac, but it will call `zebra_evpn_mac_lookup()`
the second time. So lookup twice, the procedure is redundant.
Just an optimization for it, make sure only lookup once.
Modify `zebra_evpn_mac_gw_macip_add()` to check the `macp`
parameter passed by caller, so it can distinguish whether
really need lookup or not.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
When issuing a RTM_DELETE operation and the kernel tells
us that the route is already deleted, let's not complain
about the situation:
2022/03/19 02:40:34 ZEBRA: [EC 100663303] kernel_rtm: 2a10:cc42:1d51::/48: rtm_write() unexpectedly returned -4 for command RTM_DELETE
I can recreate this issue on freebsd by doing this:
a) create a route using sharpd
b) shutdown the nexthop's interface
c) remove the route using sharpd
This would also be true of pretty much any routing protocol's behavior.
Let's just not complain about the situation if a RTM_DELETE
operation is issued and FRR is told that the route does not
exist to delete.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Since the `RB_INSERT()` is called after not found in RB tree, it MUST be ok and
and return zero. The check of returning value of `RB_INSERT()` is redundant,
just remove them.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Commit: 5d41413833 added 3 new dplane ops:
DPLANE_OP_INTF_INSTALL
DPLANE_OP_INTF_UPDATE
DPLANE_OP_INTF_DELETE
The build system does not build lua so zebra_script.c
was not updated. Update of course!
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When encoding a response to the upper level protocol the
prefixlen is not something that needs to be part of the
switch statement for handling of a prefix.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Currently the nexthop tracking code is only sending to the requestor
what it was requested to match against. When the nexthop tracking
code was simplified to not need an import check and a nexthop check
in b8210849b8 for bgpd. It was not
noticed that a longer prefix could match but it would be seen
as a match because FRR was not sending up both the resolved
route prefix and the route FRR was asked to match against.
This code change causes the nexthop tracking code to pass
back up the matched requested route (so that the calling
protocol can figure out which one it is being told about )
as well as the actual prefix that was matched to.
Fixes: #10766
Signed-off-by: Donald Sharp <sharpd@nvidia.com>