On zebra shutdown, hash_clean_and_free() is called with a
user-supplied free function. That free function must not call
hash_release(), because hash_clean() frees each hash bucket itself
and a second release leads to a double free of the bucket.
Fix:
Avoid calling hash_release() from the free function when it is
invoked from the hash_clean_and_free() path.
10 0x00007f0422b7df1f in free () from /lib/x86_64-linux-gnu/libc.so.6
11 0x00007f0422edd779 in qfree (mt=0x7f0423047ca0 <MTYPE_HASH_BUCKET>,
ptr=0x55fc8bc81980) at ../lib/memory.c:130
12 0x00007f0422eb97e2 in hash_clean (hash=0x55fc8b979a60,
free_func=0x55fc8a529478 <svd_nh_del_terminate>) at
../lib/hash.c:290
13 0x00007f0422eb98a1 in hash_clean_and_free (hash=0x55fc8a675920
<svd_nh_table>, free_func=0x55fc8a529478 <svd_nh_del_terminate>) at
../lib/hash.c:305
14 0x000055fc8a5323a5 in zebra_vxlan_terminate () at
../zebra/zebra_vxlan.c:6099
15 0x000055fc8a4c9227 in zebra_router_terminate () at
../zebra/zebra_router.c:276
16 0x000055fc8a4413b3 in zebra_finalize (dummy=0x7fffb881c1d0) at
../zebra/main.c:269
17 0x00007f0422f44387 in event_call (thread=0x7fffb881c1d0) at
../lib/event.c:2011
18 0x00007f0422ecb6fa in frr_run (master=0x55fc8b733cb0) at
../lib/libfrr.c:1243
19 0x000055fc8a441987 in main (argc=14, argv=0x7fffb881c4a8) at
../zebra/main.c:584
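The ownership rule behind this fix can be sketched with a toy table.
This is not FRR's lib/hash.c: toy_table, toy_release, toy_clean and the
from_clean flag are hypothetical names used only for illustration. The
sketch assumes the cleanup walker frees each bucket itself, so the item
destructor releases the bucket only on the normal delete path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy singly linked table; NOT FRR's lib/hash.c. All names are hypothetical. */
struct toy_bucket {
	struct toy_bucket *next;
	void *data;
};

struct toy_table {
	struct toy_bucket *head;
	int bucket_frees; /* counts bucket frees so a double free would show */
};

struct toy_table *toy_create(void)
{
	struct toy_table *t = calloc(1, sizeof(*t));

	assert(t != NULL);
	return t;
}

void toy_insert(struct toy_table *t, void *data)
{
	struct toy_bucket *b = calloc(1, sizeof(*b));

	assert(b != NULL);
	b->data = data;
	b->next = t->head;
	t->head = b;
}

/* Unlink and free the bucket holding data (the hash_release analogue). */
void toy_release(struct toy_table *t, void *data)
{
	for (struct toy_bucket **pp = &t->head; *pp; pp = &(*pp)->next) {
		if ((*pp)->data == data) {
			struct toy_bucket *b = *pp;

			*pp = b->next;
			free(b);
			t->bucket_frees++;
			return;
		}
	}
}

/* Item destructor: releases the bucket only on the normal delete path. */
void item_del(struct toy_table *t, void *data, bool from_clean)
{
	if (!from_clean)
		toy_release(t, data);
	free(data);
}

/* Cleanup walker: it owns and frees every bucket itself. */
void toy_clean(struct toy_table *t)
{
	struct toy_bucket *b = t->head;

	while (b) {
		struct toy_bucket *next = b->next;

		item_del(t, b->data, true); /* must not touch the bucket */
		free(b);
		t->bucket_frees++;
		b = next;
	}
	t->head = NULL;
}
```

If item_del() called toy_release() while running under toy_clean(), the
same bucket would be freed twice, which is the pattern the backtrace
above shows.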
Signed-off-by: Chirag Shah <chirag@nvidia.com>
When looping through the dplane providers, the worklist was
being populated with items from the last provider and then
the event system was checked to see if we should stop processing.
If the event system says `yes` then the dplane code would stop
and send the worklist to the master zebra pthread for collection.
This obviously skipped the next dplane provider on the list
which is double plus not good.
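One way to avoid skipping a provider is sketched below. It is a toy
model, not the dplane code: providers only count items, and
toy_provider, toy_dplane_pass and the yield_after parameter are
hypothetical. It assumes that when the loop has to yield, the pending
worklist is parked on the next provider's input queue instead of being
sent to the results queue.

```c
#include <assert.h>

#define NPROV 3

/* Toy model: each provider only counts items; names are hypothetical. */
struct toy_provider {
	int in_q;      /* items waiting to be processed by this provider */
	int processed; /* total items this provider has seen */
};

/*
 * Run one pass over the provider chain. `yield_after` simulates the
 * event system asking us to stop after that provider index (-1: never).
 * Returns the number of items delivered to the results queue.
 */
int toy_dplane_pass(struct toy_provider prov[NPROV], int new_work,
		    int yield_after)
{
	int worklist = new_work;

	for (int i = 0; i < NPROV; i++) {
		/* Hand the current worklist to this provider. */
		prov[i].in_q += worklist;

		/* Provider processes its queue; output becomes the worklist. */
		prov[i].processed += prov[i].in_q;
		worklist = prov[i].in_q;
		prov[i].in_q = 0;

		if (i == yield_after && i + 1 < NPROV) {
			/* Yield: park the work on the NEXT provider. */
			prov[i + 1].in_q += worklist;
			worklist = 0;
			break;
		}
	}

	/* Only work that left the last provider reaches the results. */
	return worklist;
}
```

With this shape, a pass that yields after the first provider returns no
results, and the following pass picks the parked items up at the second
provider.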
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The mutex that wraps access to the output buffer
is being held for the entire time the data is
being generated to send down the pipe. Since
the generation has absolutely nothing to do
with the obuf, let's limit the mutex holding some.
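The narrowing can be sketched as follows. This is a minimal
illustration, not the fpm code: toy_obuf, toy_build_msg and toy_send
are hypothetical names, and the message format is made up. The message
is generated into a local buffer with no lock held, and the mutex is
taken only for the copy into the shared buffer.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the shared output buffer and its mutex. */
struct toy_obuf {
	pthread_mutex_t mtx;
	char data[256];
	size_t len;
};

/* Expensive part: format the message; touches no shared state. */
size_t toy_build_msg(char *out, size_t outsz, int route_id)
{
	int n = snprintf(out, outsz, "route %d\n", route_id);

	assert(n >= 0);
	return (size_t)n < outsz ? (size_t)n : outsz - 1;
}

/* Returns 0 on success, -1 if the shared buffer has no room. */
int toy_send(struct toy_obuf *ob, int route_id)
{
	char local[64];
	size_t n = toy_build_msg(local, sizeof(local), route_id); /* unlocked */
	int rc = -1;

	pthread_mutex_lock(&ob->mtx); /* lock only around the shared buffer */
	if (ob->len + n < sizeof(ob->data)) {
		memcpy(ob->data + ob->len, local, n);
		ob->len += n;
		ob->data[ob->len] = '\0';
		rc = 0;
	}
	pthread_mutex_unlock(&ob->mtx);
	return rc;
}
```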
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
In fpm_listener, when an error was detected the code would
stop listening and not recover. Modify the code
to close the socket and allow the connection to
recover.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
A recent code change, 29122bc9b8, changed how data is passed up
the fpm: instead of passing the tableid and the vrf, it follows
the SONiC expectation that the tableid contains the vrfid. This
violates the assumption in the code that the netlink message
passes up the tableid as the tableid. Additionally, that change
did not modify rib_find_rn_from_ctx to properly decode what
could be passed up. Let's just fix this and let SONiC carry the
patch as appropriate for themselves, since they are not the only
users of dplane_fpm_nl.c.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The SRv6 SID Manager does not allow allocating an SRv6 End/uN function
even though it is already supported by staticd.
Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
When a daemon wants to know about its routes, make it possible to have
that work for dst-src routes.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
The Linux kernel doesn't support dst-src routes with NHGs as nexthop,
for some (rather dubious) caching reasons.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Implement the necessary data structures and code changes to support sending
table-direct routes to protocols running in different VRFs.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Return an error if an IPv6 address or prefix is passed as an argument
to the "show ip route" command.
UT:
r1# show ip route 2::3/128
% Cannot specify IPv6 address/prefix for IPv4 table
r1#
r1# show ip route 2::3
% Cannot specify IPv6 address/prefix for IPv4 table
r1#
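A check of this kind can be sketched with inet_pton(). The helper
names toy_arg_is_ipv6 and toy_show_ip_route are hypothetical and this
is not the parser the CLI actually uses; it only illustrates rejecting
an IPv6 address or prefix before an IPv4 table lookup.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Return true when `arg` (address or address/len) parses as IPv6.
 * Hypothetical helper, not the parser FRR actually uses.
 */
bool toy_arg_is_ipv6(const char *arg)
{
	char buf[INET6_ADDRSTRLEN];
	struct in6_addr a6;
	const char *slash = strchr(arg, '/');
	size_t n = slash ? (size_t)(slash - arg) : strlen(arg);

	if (n == 0 || n >= sizeof(buf))
		return false;
	memcpy(buf, arg, n); /* copy only the address part */
	buf[n] = '\0';
	return inet_pton(AF_INET6, buf, &a6) == 1;
}

/* Returns 0 if the argument is acceptable for the IPv4 table, -1 if not. */
int toy_show_ip_route(const char *arg)
{
	if (toy_arg_is_ipv6(arg)) {
		printf("%% Cannot specify IPv6 address/prefix for IPv4 table\n");
		return -1;
	}
	return 0;
}
```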
Signed-off-by: Pooja Jagadeesh Doijode <pdoijode@nvidia.com>
Currently the -n option exists only for zebra and mgmtd. All other
daemons receive the VRF backend configuration from zebra upon
connecting to it. This leads to a potential race condition: daemons
need to know the backend before they start reading their config, but
they may not be connected to zebra yet at that point. As the VRF
backend cannot change during runtime, let's introduce a new global -w
option for setting the netns backend, to make sure that all daemons
know their VRF backend immediately after start.
The reason for introducing a new option instead of making -n global is
that ospfd already uses -n for another purpose.
Signed-off-by: Igor Ryzhov <idryzhov@gmail.com>
vrf->ns_ctxt is only ever used in zebra, so move its initialization to
zebra's callback. Ideally this pointer shouldn't even be part of the
library's vrf struct and should be moved to a zebra-specific struct,
but this is the first step.
Signed-off-by: Igor Ryzhov <idryzhov@gmail.com>
The backend type cannot be unknown. It is configured to VRF_LITE by
default in zebra anyway, so just init to VRF_LITE in the lib and remove
the UNKNOWN type.
Signed-off-by: Igor Ryzhov <idryzhov@gmail.com>
Currently, when FRR installs a nexthop group, the installation can fail.
The code assumed that the nexthop group was not already installed.
This leaves a problem state: if the users of the nexthop group are
removed, the nexthop group is removed from zebra, possibly leaving
an orphaned nexthop group in the data plane.
On a nexthop group installation, FRR does not actually know the status
of the nexthop group in the kernel. It's possible that an earlier
version of the nexthop group is left in play. It's possible that
there is no nexthop group in the kernel at all. Leaving the
Installed flag alone allows Zebra to remove the nexthop group
from the data plane when it is removed from zebra.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Currently, if you have an interface down event, Zebra
sets the nexthop(s) that use it as !ACTIVE. On
interface up events the singleton nexthops are not being
set as ACTIVE. Due to timing events it is sometimes
possible to end up with a route that is using a singleton
nexthop that is still not marked ACTIVE.
Change singleton nexthops to set the nexthop to ACTIVE.
This will allow the nexthop to be reinstalled appropriately
as well.
I was able to easily reproduce this using sharpd since
it does not attempt to reinstall the routes when an interface
goes up/down.
Before:
D>* 10.0.0.0/32 [150/0] via 192.168.102.34, dummy2, weight 1, 00:00:01
sharpd@eva ~/frr5 (master)> sudo ip link set dummy2 down ; sudo ip link set dummy2 up
D> 10.0.0.0/32 [150/0] (350) via 192.168.102.34, dummy2 inactive, weight 1, 00:00:10
After code change:
D>* 10.0.0.0/32 [150/0] (73) via 192.168.102.34, dummy2, weight 1, 00:00:14
sharpd@eva ~/frr5 (master)> sudo ip link set dummy2 down ; sudo ip link set dummy2 up
D>* 10.0.0.0/32 [150/0] (73) via 192.168.102.34, dummy2, weight 1, 00:00:21
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
In some cases the old_re nhe and the new nhe are the same object, so
there is no point in comparing them. Skip the comparison in such
cases.
Ex:
2025/01/09 23:49:27.489020 ZEBRA: [W4Z4R-NTSMD] zebra_nhg_rib_find_nhe: => nhe 0x555f611d30c0 (44[38/39/45])
2025/01/09 23:49:27.489021 ZEBRA: [ZH3FQ-TE9NV] zebra_nhg_rib_compare_old_nhe: 0.0.0.0/0 new id: 44 old id: 44
2025/01/09 23:49:27.489021 ZEBRA: [YB8HE-Z86GN] zebra_nhg_rib_compare_old_nhe: 0.0.0.0/0 NEW 0x555f611d30c0 (44[38/39/45])
2025/01/09 23:49:27.489023 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.9[0] vrf default(0) wgt 1, with flags
2025/01/09 23:49:27.489024 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 30.1.2.9[0] vrf default(0) wgt 1, with flags
2025/01/09 23:49:27.489025 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.2[4] vrf default(0) wgt 1, with flags ACTIVE
2025/01/09 23:49:27.489026 ZEBRA: [ZM3BX-HPETZ] zebra_nhg_rib_compare_old_nhe: 0.0.0.0/0 OLD 0x555f611d30c0 (44[38/39/45])
2025/01/09 23:49:27.489027 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.9[0] vrf default(0) wgt 1, with flags
2025/01/09 23:49:27.489028 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 30.1.2.9[0] vrf default(0) wgt 1, with flags
2025/01/09 23:49:27.489028 ZEBRA: [ZSB1Z-XM2V3] 0.0.0.0/0: NH 20.1.1.2[4] vrf default(0) wgt 1, with flags ACTIVE
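The short-circuit can be sketched as a pointer-equality check ahead of
the expensive work. The struct and function names below are
hypothetical stand-ins, not zebra's own; the `compares` counter exists
only to make the skipped work visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for a nexthop group hash entry. */
struct toy_nhe {
	unsigned int id;
	int compares; /* counts expensive comparisons, illustration only */
};

/* Placeholder for the per-nexthop comparison work. */
void toy_expensive_compare(struct toy_nhe *old_nhe, struct toy_nhe *new_nhe)
{
	old_nhe->compares++;
	new_nhe->compares++;
}

/* Returns true when a comparison was actually performed. */
bool toy_compare_old_nhe(struct toy_nhe *old_nhe, struct toy_nhe *new_nhe)
{
	assert(new_nhe != NULL);

	/* Nothing to compare against, or both names refer to one object. */
	if (old_nhe == NULL || old_nhe == new_nhe)
		return false;

	toy_expensive_compare(old_nhe, new_nhe);
	return true;
}
```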
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
If you have this series of events:
a) Decision to install a NHG is made in zebra, enqueue to DPLANE
b) Changes to NHG are made and we remove it in the master pthread
   Since this NHG is not marked as installed it is not removed
   but the NHG data structure is deleted
c) DPLANE installs the NHG
In the end the NHG stays installed but ZEBRA has lost track of it.
Modify the removal code to check to see if the NHG is queued.
There are 2 cases:
a) NHG is kept around for a bit before being deleted. In this case
   just see that the NHG is Queued and keep it around too.
b) NHG is not kept around and we are just removing it. In this case
   check to see if it is queued and send another deletion event.
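The two cases can be sketched as a small decision function. The flag
names, the keep_around field and the action enum are hypothetical, and
the real flags and control flow in zebra may differ; the sketch only
assumes that a queued NHG is treated like an installed one when
deciding what to do on removal.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the real flag names and values may differ. */
#define TOY_NHG_INSTALLED (1u << 0)
#define TOY_NHG_QUEUED    (1u << 1)

struct toy_nhg {
	unsigned int flags;
	bool keep_around; /* case a): entry lingers before deletion */
};

enum toy_action {
	TOY_FREE_ONLY,   /* never reached the dataplane */
	TOY_KEEP,        /* leave the entry alone for now */
	TOY_SEND_DELETE, /* ask the dataplane to remove it */
};

enum toy_action toy_nhg_remove(const struct toy_nhg *nhg)
{
	/* Queued counts as "the dataplane has, or will have, this NHG". */
	bool in_dplane =
		(nhg->flags & (TOY_NHG_INSTALLED | TOY_NHG_QUEUED)) != 0;

	if (!in_dplane)
		return TOY_FREE_ONLY;
	if (nhg->keep_around)
		return TOY_KEEP;
	return TOY_SEND_DELETE;
}
```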
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
During route processing, Zebra creates a nexthop group that
matches the nexthops passed down from the routing protocol.
Zebra then looks to see whether it can re-use an nhe from a
previous version of the route entry (say, an interface goes
down). If Zebra decided to re-use an nhe, it was simply dropping
the entry it had just created. This led to nexthop groups that had
a refcount of 0, and in some cases these nexthop groups were
installed into the kernel.
Add a bit of code to check whether the returned entry is unused
and has no reference count and, if so, properly dispose of it.
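The disposal step can be sketched like this. All names are
hypothetical and the struct is reduced to a reference count; the
sketch assumes the freshly created entry is freed when the old entry
is re-used and nothing references the new one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down nexthop group entry; not zebra's struct. */
struct toy_nhg_entry {
	int refcnt;
};

int toy_live_entries; /* allocated entries, to make leaks visible */

struct toy_nhg_entry *toy_entry_new(void)
{
	struct toy_nhg_entry *e = calloc(1, sizeof(*e));

	assert(e != NULL);
	toy_live_entries++;
	return e;
}

void toy_entry_free(struct toy_nhg_entry *e)
{
	free(e);
	toy_live_entries--;
}

/*
 * Choose the entry a route will use. When the old entry is re-used,
 * the freshly created one is disposed of if nothing references it.
 */
struct toy_nhg_entry *toy_pick_entry(struct toy_nhg_entry *old_e,
				     struct toy_nhg_entry *new_e,
				     bool reuse_old)
{
	struct toy_nhg_entry *chosen = reuse_old ? old_e : new_e;

	if (chosen != new_e && new_e->refcnt == 0)
		toy_entry_free(new_e); /* unused and unreferenced */

	chosen->refcnt++;
	return chosen;
}
```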
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The "ip/ipv6 protocol any route-map <route map>" CLI sets the
wrong route type (ZEBRA_ROUTE_MAX). It should set route type
ZEBRA_ROUTE_ALL.
Ticket: #4101560
Signed-off-by: Sougata Barik <sougatab@nvidia.com>
Fix a compilation error in a switch statement case.
Fixes: aa4786642c9a65c282d0fd5247a35b0f14fa1c3c
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
If the Duplicate Address Detection action is freeze (permanent or
for a definite time, i.e. not warn-only mode), then a delete
notification for the locally detected duplicate MAC does not
need to be sent. Instead, ask BGP to sync the previous remote
MAC entry.
In the freeze case the local MAC event is not known to BGP;
BGP is instead pointing to the remote VTEP for the MAC.
Ticket: #3652383
Issue: 3652383
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Upon if_down, we don't reset the valid flag for dependents
and unset the INSTALLED flag.
So when it's time for the NHG to be deleted (routes dereferenced),
zebra deletes it since the refcnt goes to 0, but a stale NHG remains
in the kernel.
Ticket: #4200788
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Just like on `link down`, check all kernel routes when an interface
comes up. They may then be selected as the best route by zebra.
Signed-off-by: anlan_cs <anlan_cs@126.com>
After the nexthop check is fixed, zebra would wrongly uninstall kernel
routes that have an inactive nexthop.
This commit skips the uninstallation for kernel routes.
Signed-off-by: anlan_cs <anlan_cs@126.com>
The kernel routes are wrongly selected even when the nexthop interface
is linkdown. Use `ip link set dev <interface> down` on the other box to
set the box's nexthop interface linkdown. The kernel routes will be kept
as `linkdown`, but still have an active nexthop in `zebra`.
Add three changes/commits for kernel routes in this PR:
1) The active nexthop should be on an operative interface.
2) Don't uninstall the kernel routes from `zebra` even when they have no
active nexthops.
(It doesn't affect the kernel routes' deletion from kernel netlink messages.)
3) Update the kernel routes when the nexthop interface comes up.
Before: (while the nexthop interface is linkdown)
```
K>* 3.3.3.3/32 [0/0] via 88.88.88.1, enp2s0, weight 1, 00:00:14
```
After: (while the nexthop interface is linkdown, with all three changes)
```
K 3.3.3.3/32 [0/0] via 88.88.88.1, enp2s0 inactive, weight 1, 00:00:07
```
This commit is the 1st change:
Make the check for an "active" nexthop more accurate: the nexthop is
active only when its interface is operative.
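The rule in the 1st change can be sketched as a predicate. The struct
and function names are hypothetical, and "operative" is assumed here
to mean administratively up with carrier present, which may not match
zebra's exact definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical interface state; "operative" = admin up AND carrier. */
struct toy_ifp {
	bool admin_up;
	bool carrier;
};

bool toy_if_is_operative(const struct toy_ifp *ifp)
{
	return ifp != NULL && ifp->admin_up && ifp->carrier;
}

/* A kernel route's nexthop counts as active only on an operative interface. */
bool toy_kernel_nexthop_active(const struct toy_ifp *ifp)
{
	return toy_if_is_operative(ifp);
}
```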
Signed-off-by: anlan_cs <anlan_cs@126.com>
When debugging a crash I noticed that sometimes we talked about
a zclient connection in relation to the fd associated with it
and sometimes we did not. Let's just always give the data
associated with the fd. It will make it a bit easier for me
to follow the transitions.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add `mrib` flag to existing "show ip route" commands which then use
the multicast safi rather than the unicast safi. Updated the vty output
to include the AFI and SAFI string when printing the table.
Deprecate `show ip rpf` command, aliased to `show ip route mrib`.
Removed `show ip rpf A.B.C.D`.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>