Currently FRR is limiting the nexthop count to a uint8_t not a
uint16_t. This leads to issues when the nexthop count is 256
which results in the count to overflow to 0 causing problems
in the code.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Trigger: Zebra core seen when we convert l2vni to l3vni and back
BackTrace:
/usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(_zlog_assert_failed+0xe9) [0x7f4af96989d9]
/usr/lib/frr/zebra(zebra_vxlan_if_vni_up+0x250) [0x5561022ae030]
/usr/lib/frr/zebra(netlink_vlan_change+0x2f4) [0x5561021fd354]
/usr/lib/frr/zebra(netlink_parse_info+0xff) [0x55610220d37f]
/usr/lib/frr/zebra(+0xc264a) [0x55610220d64a]
/usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x7d) [0x7f4af967e96d]
/usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xe8) [0x7f4af9637588]
/usr/lib/frr/zebra(main+0x402) [0x5561021f4d32]
/lib/x86_64-linux-gnu/libc.so.6(+0x2724a) [0x7f4af932624a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85) [0x7f4af9326305]
/usr/lib/frr/zebra(_start+0x21) [0x5561021f72f1]
Root Cause:
In working case,
- We get a RTM_NEWLINK whose ctx is enqueued by zebra dplane and
dequeued by zebra main and processed i.e.
(102000 is deleted from vxlan99) before we handle RTM_NEWVLAN.
- So in handling of NEWVLAN (vxlan99) we bail out since find with
vlan id 703 does not exist.
root@leaf2:mgmt:/var/log/frr# cat ~/raja_logs/working/nocras.log | grep "RTM_NEWLINK\|QUEUED\|vxlan99\|in thread"
2024/07/18 23:09:33.741105 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=616, seq=0, pid=0
2024/07/18 23:09:33.744061 ZEBRA: [K8FXY-V65ZJ] Intf dplane ctx 0x7f2244000cf0, op INTF_INSTALL, ifindex (65), result QUEUED
2024/07/18 23:09:33.767240 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=508, seq=0, pid=0
2024/07/18 23:09:33.767380 ZEBRA: [K8FXY-V65ZJ] Intf dplane ctx 0x7f2244000cf0, op INTF_INSTALL, ifindex (73), result QUEUED
2024/07/18 23:09:33.767389 ZEBRA: [NVFT0-HS1EX] INTF_INSTALL for vxlan99(73)
2024/07/18 23:09:33.767404 ZEBRA: [TQR2A-H2RFY] Vlan-Vni(1186:1186-6000002:6000002) update for VxLAN IF vxlan99(73)
2024/07/18 23:09:33.767422 ZEBRA: [TP4VP-XZ627] Del L2-VNI 102000 intf vxlan99(73)
2024/07/18 23:09:33.767858 ZEBRA: [QYXB9-6RNNK] RTM_NEWVLAN bridge IF vxlan99 NS 0
2024/07/18 23:09:33.767866 ZEBRA: [KKZGZ-8PCDW] Cannot find VNI for VID (703) IF vxlan99 for vlan state update >>>>BAIL OUT
In failure case,
- The NEWVLAN is received first even before processing RTM_NEWLINK.
- Since the vxlan id 102000 is not removed from the vxlan99,
the find with vlan id 703 returns the 102000 one and we invoke
zebra_vxlan_if_vni_up where the interfaces don't match and assert.
root@leaf2:mgmt:/var/log/frr# cat ~/raja_logs/noworking/crash.log | grep "RTM_NEWLINK\|QUEUED\|vxlan99\|in thread"
2024/07/18 22:26:43.829370 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=616, seq=0, pid=0
2024/07/18 22:26:43.829646 ZEBRA: [K8FXY-V65ZJ] Intf dplane ctx 0x7fe07c026d30, op INTF_INSTALL, ifindex (65), result QUEUED
2024/07/18 22:26:43.853930 ZEBRA: [QYXB9-6RNNK] RTM_NEWVLAN bridge IF vxlan99 NS 0
2024/07/18 22:26:43.853949 ZEBRA: [K61WJ-XQQ3X] Intf vxlan99(73) L2-VNI 102000 is UP >>> VLAN PROCESSED BEFORE INTF EVENT
2024/07/18 22:26:43.853951 ZEBRA: [SPV50-BX2RP] RAJA zevpn_vxlanif vxlan48 and ifp vxlan99
2024/07/18 22:26:43.854005 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=508, seq=0, pid=0
2024/07/18 22:26:43.854241 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=516, seq=0, pid=0
2024/07/18 22:26:43.854251 ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_NEWLINK(16), len=544, seq=0, pid=0
ZEBRA: in thread kernel_read scheduled from zebra/kernel_netlink.c:505 kernel_read()
Fix:
Similar to #13396, where link change
handling was offloaded to dplane, same is being done for vlan events.
Note: Prior to this change, zebra main thread was interested in the
RTNLGRP_BRVLAN. So all the kernel events pertaining to vlan was
handled by zebra main.
With this change change as well the handling of vlan events is still
with Zebra main. However we offload it via Dplane thread.
Ticket :#3878175
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
Instead of keeping a counter that is independent
of the queue's data structure. Just use the queue's
built-in counter. Ensure that it's pthread safe by
keeping it wrapped inside the mutex for adding/deleting
to the queue.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add a bunch of set functions and associated data structure in
zebra_dplane to allow the configuration of the source address for SRv6
encap in the data plane.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
The linux kernel can send up a flag that tells us that the
connected address is not a PREFIXROUTE. Add the ability
to note this and pass it up from the data plane.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The iprule/pbr rule object has a vrf id, and zebra uses
that internally, but the vrf id isn't returned to clients
who install rules and are waiting for results. Include the
vrf_id sent by the client in the zapi result notification
message; update the existing clients so they decode the id.
Signed-off-by: Mark Stapp <mjs@labn.net>
Add the txqlen attribute to the common interface struct. Capture
the value in zebra, and distribute it through the interface lib
module's zapi messaging.
Signed-off-by: Mark Stapp <mjs@labn.net>
Upon some internal testing some crashes were found. This fixes
the several crashes and normalizes the code to be closer in
it's execution pre and post changes to use the data plane.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
a) Move the reads of link and address information
into the dplane
b) Move the startup read of data into the dplane
as well.
c) Break up startup reading of the linux kernel data
into multiple phases. As that we have implied ordering
of data that must be read first and if the dplane has
taken over some data reading then we must delay initial
read-in of other data.
Fixes: #13288
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
1) Add a bunch of get/set functions and associated data
structure in zebra_dplane to allow the setting and retrieval
of interface netlink data up into the master pthread.
2) Add a bit of code to breakup startup into stages. This is
because FRR currently has a mix of dplane and non dplane interactions
and the code needs to be paused before continuing on.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
dplane_mac_info and dplane_neigh_info is modified to be vni aware.
dplane_rem_mac_add/del dplane_mac_init is modified to be vni aware.
During dplane context update (mac and neigh), we use the vni information
and if set, corresponding netlink attribute NDA_SRC_VNI is set and passed to the
dplane.
Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
Read from the fpm dplane a route update that will
include status about whether or not the asic was
successfull in offloading the route.
Have this data passed up to zebra for processing and disseminate
this data as appropriate.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add the initial step of passing in a dplane context
to reading route netlink messages. This code
will be run in two contexts:
a) The normal pthread for reading netlink messages from
the kernel
b) The dplane_fpm_nl pthread.
The goal of this commit is too just allow a) to work
b) will be filled in in the future. Effectively
everything should still be working as it should
pre this change. We will just possibly allow
the passing of the context around( but not used )
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
In order for a future commit to abstract the dplane_ctx_route_init
so that the kernel can use it, let's move some stuff around
and add a dplane_ctx_route_init_basic that can be used by multiple
different paths
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
create a dplane_ctx_route_init_basic so it can be used
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Zebra needs the ability to pass this data around.
Add it to the dplanes ability to pass.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
zebra: Add a dplane_ctx_set_flags
The dplane_ctx_set_flags call is missing, we will need it. Add it.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This allows Zebra to manage QDISC, TCLASS, TFILTER in kernel and do cleaning
jobs when it starts up.
Signed-off-by: Siger Yang <siger.yang@outlook.com>
This commit implements necessary netlink encoders for traffic control
including QDISC, TCLASS and TFILTER, and adds basic dplane operations.
Co-authored-by: Stephen Worley <sworley@nvidia.com>
Signed-off-by: Siger Yang <siger.yang@outlook.com>
When reading a on the fly change of an interested netconf netlink
message. The ifindex and ns_id for the context was being set for the sub structure
but not for the main context data structure and zebra_if_dplane_result
was dropping the result on the floor because it was expecting the ns_id and
the interface id to be in a different spot.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Only clear protodown reason on shutdown/sweep, retain protodown
state.
This is to retain traditional and expected behavior with daemons
like vrrpd setting protodown. They expet it to be set on shutdown
and retained on bring up to prevent traffic from being dropped.
We must cleanup our reason code though to prevent us from blocking
others.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Add support for setting the protodown reason code.
829eb208e8
These patches handle all our netlink code for setting the reason.
For protodown reason we only set `frr` as the reason externally
but internally we have more descriptive reasoning available via
`show interface IFNAME`. The kernel only provides a bitwidth of 32
that all userspace programs have to share so this makes the most sense.
Since this is new functionality, it needs to be added to the dplane
pthread instead. So these patches, also move the protodown setting we
were doing before into the dplane pthread. For this, we abstract it a
bit more to make it a general interface LINK update dplane API. This
API can be expanded to support gernal link creation/updating when/if
someone ever adds that code.
We also move a more common entrypoint for evpn-mh and from zapi clients
like vrrpd. They both call common code now to set our internal flags
for protodown and protodown reason.
Also add debugging code for dumping netlink packets with
protodown/protodown_reason.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Store the fd that corresponds to the appropriate `struct nlsock` and pass
that around in the dplane context instead of the pointer to the nlsock.
Modify the kernel_netlink.c code to store in a hash the `struct nlsock`
with the socket fd as the key.
Why do this? The dataplane context is used to pass around the `struct nlsock`
but the zebra code has a bug where the received buffer for kernel netlink
messages from the kernel is not big enough. So we need to dynamically
grow the receive buffer per socket, instead of having a non-dynamic buffer
that we read into. By passing around the fd we can look up the `struct nlsock`
that will soon have the associated buffer and not have to worry about `const`
issues that will arise.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Store and use the sequence number instead of using what is in
the `struct nlsock`. Future commits are going away from storing
the `struct nlsock` and the copy of the nlsock was guaranteeing
unique sequence numbers per message. So let's store the
sequence number to use instead.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The dplane_ctx_get_pbr_ipset_entry function only
failed when the caller did not pass in a valid
usable pointer. Change the code to assert on
a pointer not being passed in and remove the
bool return
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The only time this function ever failed is when
the developer does not pass in a usable pointer
to place the data in. Change it to an assert
to signify to the end developer that is what
we want and then remove all the if checks
for failure
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The function call dplane_ctx_get_pbr_ipset only
returns false when the calling function fails to
pass in a valid ipset pointer. This should
be an assertion issue since it's a programming
issue as opposed to an actual run time issue.
Change the function call parameter to not return
a bool on success/fail for a compile time decision.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Include the complete set of primary and backup nexthops from
the resolving route for a pseudowire. Add accessors for that
info. Modify the logic that creates the fib set of pw nexthops
so that only installed, labelled nexthops are included.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
- gre keys are collected and stored locally.
- when gre source set is requested, and the link interface
configured is different, the gre information collected is
pushed in the query, namely source ip or gre keys if present.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
preserve mtu upon interface flapping and tunnel source change.
Signed-off-by:Reuben Dowle <reuben.dowle@4rf.com>
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This action is initiated by nhrp and has been stubbed when
moving to zebra. Now, a netlink request is forged to set
the link interface of a gre interface if that gre interface
does not have already a link interface.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>