Trigger: Zebra core seen when we convert l2vni to l3vni and back
BackTrace:
/usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(_zlog_assert_failed+0xe9) [0x7f4af96989d9]
/usr/lib/frr/zebra(zebra_vxlan_if_vni_up+0x250) [0x5561022ae030]
/usr/lib/frr/zebra(netlink_vlan_change+0x2f4) [0x5561021fd354]
/usr/lib/frr/zebra(netlink_parse_info+0xff) [0x55610220d37f]
/usr/lib/frr/zebra(+0xc264a) [0x55610220d64a]
/usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x7d) [0x7f4af967e96d]
/usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xe8) [0x7f4af9637588]
/usr/lib/frr/zebra(main+0x402) [0x5561021f4d32]
/lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7f4af932624a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7f4af9326305]
/usr/lib/frr/zebra(_start+0x21) [0x5561021f72f1]
Root Cause:
In working case,
- We get a RTM_NEWLINK whose ctx is enqueued by zebra dplane and
dequeued by zebra main and processed i.e.
(102000 is deleted from vxlan99) before we handle RTM_NEWVLAN.
- So in handling of NEWVLAN (vxlan99) we bail out since find with
vlan id 703 does not exist.
root@leaf2:mgmt:/var/log/frr# cat ~/raja_logs/working/nocras.log | grep "RTM_NEWLINK\|QUEUED\|vxlan99\|in thread"
2024/07/18 23:09:33.741105 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=616, seq=0, pid=0
2024/07/18 23:09:33.744061 ZEBRA: [K8FXY-V65ZJ] Intf dplane ctx 0x7f2244000cf0, op INTF_INSTALL, ifindex (65), result QUEUED
2024/07/18 23:09:33.767240 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=508, seq=0, pid=0
2024/07/18 23:09:33.767380 ZEBRA: [K8FXY-V65ZJ] Intf dplane ctx 0x7f2244000cf0, op INTF_INSTALL, ifindex (73), result QUEUED
2024/07/18 23:09:33.767389 ZEBRA: [NVFT0-HS1EX] INTF_INSTALL for vxlan99(73)
2024/07/18 23:09:33.767404 ZEBRA: [TQR2A-H2RFY] Vlan-Vni(1186:1186-6000002:6000002) update for VxLAN IF vxlan99(73)
2024/07/18 23:09:33.767422 ZEBRA: [TP4VP-XZ627] Del L2-VNI 102000 intf vxlan99(73)
2024/07/18 23:09:33.767858 ZEBRA: [QYXB9-6RNNK] RTM_NEWVLAN bridge IF vxlan99 NS 0
2024/07/18 23:09:33.767866 ZEBRA: [KKZGZ-8PCDW] Cannot find VNI for VID (703) IF vxlan99 for vlan state update >>>>BAIL OUT
In failure case,
- The NEWVLAN is received first even before processing RTM_NEWLINK.
- Since the vxlan id 102000 is not removed from the vxlan99,
the find with vlan id 703 returns the 102000 one and we invoke
zebra_vxlan_if_vni_up where the interfaces don't match and assert.
root@leaf2:mgmt:/var/log/frr# cat ~/raja_logs/noworking/crash.log | grep "RTM_NEWLINK\|QUEUED\|vxlan99\|in thread"
2024/07/18 22:26:43.829370 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=616, seq=0, pid=0
2024/07/18 22:26:43.829646 ZEBRA: [K8FXY-V65ZJ] Intf dplane ctx 0x7fe07c026d30, op INTF_INSTALL, ifindex (65), result QUEUED
2024/07/18 22:26:43.853930 ZEBRA: [QYXB9-6RNNK] RTM_NEWVLAN bridge IF vxlan99 NS 0
2024/07/18 22:26:43.853949 ZEBRA: [K61WJ-XQQ3X] Intf vxlan99(73) L2-VNI 102000 is UP >>> VLAN PROCESSED BEFORE INTF EVENT
2024/07/18 22:26:43.853951 ZEBRA: [SPV50-BX2RP] RAJA zevpn_vxlanif vxlan48 and ifp vxlan99
2024/07/18 22:26:43.854005 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=508, seq=0, pid=0
2024/07/18 22:26:43.854241 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=516, seq=0, pid=0
2024/07/18 22:26:43.854251 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=544, seq=0, pid=0
ZEBRA: in thread kernel_read scheduled from zebra/kernel_netlink.c:505 kernel_read()
Fix:
Similar to #13396, where link change
handling was offloaded to dplane, same is being done for vlan events.
Note: Prior to this change, zebra main thread was interested in the
RTNLGRP_BRVLAN. So all the kernel events pertaining to vlan was
handled by zebra main.
With this change change as well the handling of vlan events is still
with Zebra main. However we offload it via Dplane thread.
Ticket :#3878175
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Relocating functions used by vlan in if_netlink into zebra vxlan
Note: Static variable to the functions will be added back in the next
commit.
Ticket :#3878175
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Report the routes metric in IPFORWARDMETRIC1 and return
-1 for the other metrics as required by the IP-FORWARD-MIB.
inetCidrRouteMetric2 OBJECT-TYPE
SYNTAX Integer32
MAX-ACCESS read-create
STATUS current
DESCRIPTION
"An alternate routing metric for this route. The
semantics of this metric are determined by the routing-
protocol specified in the route's inetCidrRouteProto
value. If this metric is not used, its value should be
set to -1."
DEFVAL { -1 }
::= { inetCidrRouteEntry 13 }
I've included metric2 but it's the same for all of them.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The snmp walk of the zebra rib was skipping entries
because in_addr_cmp was replaced with a prefix_cmp
which worked slightly differently causing parts
of the zebra rib tree to be skipped.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Recent commit 4d0e7a49cf
brought in changes that moved a check for ret up
in the code, caused some code to be left around
and be effectively dead since it would never be called.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
With AutoRP discovery running by default, that adds a new
IGMP group that needs to be accounted for in IGMP output.
For pim.py
The clear IGMP interfaces function is in a broken state. It was
already ignoring any errors and returned true always, but with
the addition of the AutoRP discovery group, you could end up
with a different group order in the json which would cause a key
error making the test fail. For now I just added a check to
avoid the key error.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
With AutoRP discovery running by default, that adds a new
IGMP group that needs to be accounted for in IGMP output.
For multicast_pim_sm_topo3:
Ignore the total group number as it is unnecessary for the test.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
Uses hardcoded sample AutoRP packets injected in to test
message parsing and proper application of AutoRP learned
RP info. Tests mix of AutoRP and static RP's.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
New CLI commands added:
router pim [vrf NAME]
autorp discovery
autorp announce RP-ADDR [GROUP | group-list PREFIX-LIST]
autorp announce {scope (1-255) | interval (1-65535) | holdtime (0-65535)}
autorp discovery
Enables Auto RP discovery for learning dynamic RP information using the
AutoRP protocol.
autorp announce RP-ADDR [GROUP | group-list PREFIX-LIST]
Enable announcements of a candidate RP with the given group range, or
prefix list of group ranges, to an AutoRP mapping agent.
autorp announce {scope (1-255) | interval (1-65535) | holdtime (0-65535)}
Configure the parameters of the AutoRP announcement messages.
The scope sets the packet TTL.
The interval sets the time between TX of announcements.
The holdtime sets the hold time in the message, the time the mapping
agent should wait before invalidating the candidate RP information.
debug pim autorp
Enable debug logging of the AutoRP protocol
show ip pim [vrf NAME] autorp [json]
Show details of the AutoRP protocol.
To view learned RP info, use the existing command 'show ip pim rp-info'
Extend pim yang for new configuration:
augment /frr-rt:routing/frr-rt:control-plane-protocols/frr-rt:control-plane-protocol/frr-pim:pim/frr-pim:address-family:
+--rw rp
+--rw auto-rp
+--rw discovery-enabled? boolean
+--rw announce-scope? uint8
+--rw announce-interval? uint16
+--rw announce-holdtime? uint16
+--rw candidate-rp-list* [rp-address]
+--rw rp-address inet:ip-address
+--rw (group-or-prefix-list)?
+--:(group)
| +--rw group? frr-route-types:ip-multicast-group-prefix
+--:(prefix-list)
+--rw prefix-list? plist-ref
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
Perform AutoRP discovery and candidate RP announcements using the
AutoRP protocol.
Mapping agent is not yet implemented, but this feature is not
necessary for FRR to support AutoRP as we only need one AutoRP
mapping agent in the network.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
We can't use even `string()` function because built-in functions are not
loaded.
Testing with:
```
$ cat /etc/frr/scripts/zebra.lua
function on_rib_process_dplane_results(ctx)
log.warn(string.upper("testas"))
return {}
end
```
This results to "TESTAS" in the logs.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Range is wrong. We want values 1 and 2 but we only test 1.
> >>> for i in range(1, 2):
> ... print(i)
> ...
> 1
Fixes: abd2a1ff3f ("tests: Test some basic kernel <-> zebra interactions")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
There are no tests that ensured that turning off then on
v4 and v6 forwarding actually worked. This does so.
This was found via looking at the code coverage.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>