Two things:
On shutdown, clean up any events associated with the update walker.
Also do not allow new events to be created.
Fixes this memory leak:
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790:Direct leak of 8 byte(s) in 1 object(s) allocated from:
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #0 0x7f0dd0b08037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #1 0x7f0dd06c19f9 in qcalloc lib/memory.c:105
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #2 0x55b42fb605bc in rib_update_ctx_init zebra/zebra_rib.c:4383
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #3 0x55b42fb6088f in rib_update zebra/zebra_rib.c:4421
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #4 0x55b42fa00344 in netlink_link_change zebra/if_netlink.c:2221
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #5 0x55b42fa24622 in netlink_information_fetch zebra/kernel_netlink.c:399
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #6 0x55b42fa28c02 in netlink_parse_info zebra/kernel_netlink.c:1183
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #7 0x55b42fa24951 in kernel_read zebra/kernel_netlink.c:493
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #8 0x7f0dd0797f0c in event_call lib/event.c:1995
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #9 0x7f0dd0684fd9 in frr_run lib/libfrr.c:1185
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #10 0x55b42fa30caa in main zebra/main.c:465
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #11 0x7f0dd01b5d09 in __libc_start_main ../csu/libc-start.c:308
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-SUMMARY: AddressSanitizer: 8 byte(s) leaked in 1 allocation(s).
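A minimal sketch of the approach, using made-up names (update_walker, pending, shutting_down); the real zebra structures and event API differ:
```c
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative only: names here are assumptions, not zebra's real symbols. */
struct walker_event {
	void (*handler)(void *arg);
	void *arg;
};

struct update_walker {
	struct walker_event *pending; /* event currently scheduled, if any */
	bool shutting_down;           /* set once shutdown has started */
};

/* On shutdown: cancel/free the pending event and refuse any new ones. */
void update_walker_shutdown(struct update_walker *w)
{
	free(w->pending); /* stands in for cancelling the scheduled event */
	w->pending = NULL;
	w->shutting_down = true;
}

/* Scheduling path: do not create events once shutdown has begun, which is
 * what keeps the context allocation from leaking at exit. */
bool update_walker_schedule(struct update_walker *w,
			    void (*handler)(void *), void *arg)
{
	if (w->shutting_down)
		return false;

	w->pending = calloc(1, sizeof(*w->pending));
	if (!w->pending)
		return false;
	w->pending->handler = handler;
	w->pending->arg = arg;
	return true;
}
```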
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
BGP signals to zebra that an afi has converged immediately
after it has finished processing all routes for a given
afi/safi. This generates events in zebra in this order:
a) Routes received from BGP, placed on the early-rib Meta-Q
b) Signal GR for the afi.
Now imagine that zebra runs the GR code and immediately
processes routes that are in the actual rib and
removes some routes. This generates:
c) A route deletion to the kernel for some number of
routes that may still be on the early-rib Meta-Q
d) Processing of the Meta-Q, which re-installs those routes
This is undesirable behavior in zebra: while we may
end up in a correct state, there will be a blip for
some number of routes that happen to be on the
early-rib Meta-Q.
Modify the GR code to have its own processing
entry at the end of the Meta-Q. This will
allow all routes to be processed and ready
for handling by the Graceful Restart code.
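A rough sketch of the idea, with made-up subqueue names (the real zebra Meta-Q indexes differ): GR completion gets its own, lowest-priority subqueue, so anything already enqueued drains first.
```c
/* Illustrative only: the subqueue names/order below are assumptions. */
enum metaq_subqueue {
	MQ_EARLY_ROUTE, /* routes decoded from zapi land here first */
	MQ_RIB_PROCESS, /* normal rib processing */
	MQ_GR_RUN,      /* new: graceful-restart handling runs last */
	MQ_SIZE,
};

struct gr_run_ctx {
	int afi;
	int vrf_id;
};

/* Hypothetical enqueue helper (stub body). */
void meta_queue_add(enum metaq_subqueue sq, void *data)
{
	(void)sq;
	(void)data; /* real code appends to the per-subqueue list */
}

/* The GR signal from BGP is no longer acted on inline; it is queued behind
 * everything already sitting on the Meta-Q. */
void zebra_gr_signal_received(struct gr_run_ctx *ctx)
{
	/* Routes queued on MQ_EARLY_ROUTE/MQ_RIB_PROCESS are guaranteed to
	 * be processed before this entry, so GR sees the final rib state. */
	meta_queue_add(MQ_GR_RUN, ctx);
}
```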
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Effectively a massive search and replace of
`struct thread` to `struct event`. Using the
term `thread` leads people to think that this
event system is a pthread when it is not.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
a) Consolidate v4 and v6 versions of rib_match_multicast
b) Improve debug to show what we matched against as well.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The define of ZEBRA_ON_RIB_PROCESS_HOOK_CALL was in zebra.h,
which exposes it to everyone, yet zebra is the only daemon
that uses this define. It does not belong in zebra.h.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When zebra receives routes from upper level protocols it decodes the
zapi message and places the routes on the metaQ for processing. Suppose
we have a route A that is already installed by some routing protocol.
And there is a route B that has a nexthop that will be recursively
resolved through A. Imagine if a route replace operation for A is
going to happen from an upper level protocol at about the same time
the route B is going to be installed into zebra. If these routes
are received, and decoded, at about the same time there exists a
chance that the metaQ will contain both of them at the same time.
If the order of installation is [ B, A ], B will be resolved
correctly through A and installed, then A will be processed and
re-installed into the FIB. If the nexthops have changed for
A, then the owner of B should be notified about the change (and B
can take the correct action here and decide to withdraw or re-install).
Now imagine if the order of routes received for processing on the
metaQ is [ A, B ]. A will be received, processed and sent to the
dataplane for reinstall. B will then be pulled off the metaQ and
fail the install since A is in a `not Installed` state.
Let's loosen the restriction in nexthop resolution for B such
that if the route we are dependent on is undergoing a route
replace operation, the resolution is allowed to succeed. This
requires zebra to track a new route state
( ROUTE_ENTRY_ROUTE_REPLACING ) that can be looked at during
nexthop resolution (a sketch of the check follows the cases
below). I believe this is ok because A is a route replace
operation, which could result in this:
-route install failed, in which case B should be nht'ing and
will receive the nht failure, and the upper level protocol should
remove B.
-route install succeeded, no nexthop changes. In this case
allowing the resolution for B is ok; NHT will not notify the upper
level protocol, so no action is needed.
-route install succeeded, nexthop changes. In this case
allowing the resolution for B is ok; NHT will notify the upper
level protocol and it can decide to reinstall B or not based
upon its own algorithm.
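Here is a minimal sketch of the loosened check. Apart from ROUTE_ENTRY_ROUTE_REPLACING (the new state described above), the flag and function names are illustrative assumptions, not zebra's exact symbols:
```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative flag values; only ROUTE_ENTRY_ROUTE_REPLACING comes from the
 * change described above. */
#define ROUTE_ENTRY_INSTALLED       0x01
#define ROUTE_ENTRY_QUEUED          0x02
#define ROUTE_ENTRY_ROUTE_REPLACING 0x04 /* a replace for this entry is in flight */

struct route_entry {
	uint32_t flags;
};

/* May route B resolve its nexthop through matched entry A? */
bool nexthop_may_resolve_through(const struct route_entry *match)
{
	if (match->flags & ROUTE_ENTRY_INSTALLED)
		return true;

	/* Previously this case failed the resolution.  With the new state we
	 * accept it and rely on NHT to notify B's owner if the replace ends
	 * up changing or removing A's nexthops. */
	if ((match->flags & ROUTE_ENTRY_QUEUED) &&
	    (match->flags & ROUTE_ENTRY_ROUTE_REPLACING))
		return true;

	return false;
}
```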
This set of events was found by the bgp_distance_change topotest(s).
Effectively the tests were looking for the bug (A, B order in the metaQ)
as the `correct` state. Under very heavy load, the A, B ordering
caused A to simply be installed and fully resolved in the dataplane before
B was reached (which is entirely possible).
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Fix issue #11996.
When removing a VRF (all routes of this VRF), zebra mistakenly forgot to check
whether its routes were still in the FPM's update queue. So the FPM module would
crash while dealing with these routes, which had already been freed.
Add a new hook, `rib_shutdown()`; `zebra_rtable_node_cleanup()` will use it
to remove these routes from the FPM module's update queue before freeing them.
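A simplified sketch of where the hook fires; the real code uses FRR's DEFINE_HOOK/hook_call machinery, and the registration helper shown here is an assumption:
```c
#include <stddef.h>

/* Illustrative only: a plain function pointer stands in for the hook. */
struct route_node;

/* Callback the FPM module registers to drop a node from its update queue. */
static void (*rib_shutdown_hook)(struct route_node *rn);

void hook_register_rib_shutdown(void (*fn)(struct route_node *rn))
{
	rib_shutdown_hook = fn;
}

/* Per-node cleanup when a table (e.g. a whole VRF) is being removed: give
 * FPM a chance to pull the node off its update queue *before* the route
 * memory is freed, so the queue never references freed routes. */
void zebra_rtable_node_cleanup(struct route_node *rn)
{
	if (rib_shutdown_hook)
		rib_shutdown_hook(rn);

	/* ... existing per-node teardown and freeing happens after this ... */
}
```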
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Currently if an operator does this operation:
sharpd@eva ~/frr8> sudo ip nexthop add id 5000 via 192.168.119.44 dev enp39s0 ; sudo ip route add 10.0.0.1 nhid 5000
2022/06/30 08:52:40 ZEBRA: [ZHQK5-J9M1R] proto2zebra: Please add this protocol(0) to proper rt_netlink.c handling
2022/06/30 08:52:40 ZEBRA: [PS16P-365FK][EC 4043309076] Zebra failed to find the nexthop hash entry for id=5000 in a route entry
sharpd@eva ~/frr8> vtysh -c "show ip route 10.0.0.1"
Routing entry for 0.0.0.0/0
Known via "kernel", distance 0, metric 100, best
Last update 00:01:58 ago
* 192.168.119.1, via enp39s0
The route is dropped by zebra with no warnings. This is not good,
but unlikely to happen at this point in time. In order to fix
this issue, route processing from inputs needs to happen after nexthop
group processing from inputs. This was not possible because
nexthop groups are placed on the metaQ. As such, the above
nexthop group creation is placed on the metaQ for processing
in META_QUEUE_NHG, while the route is read in and processed
immediately. The nexthop group is not found (not processed yet!)
and the route is dropped in zebra.
Modify the code so that early route processing and validation
also happen on the MetaQ. This preserves the order of operations.
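A toy, self-contained sketch of the ordering this gives us (not zebra's actual data structures): the NHG subqueue always drains before the early-route subqueue, so a route referencing nhid 5000 is only validated after that group has been processed.
```c
#include <stdio.h>

/* Illustrative only: two subqueues, lower-numbered drained first. */
enum subqueue { SQ_NHG = 0, SQ_EARLY_ROUTE = 1, SQ_MAX = 2 };

struct work_item {
	const char *what;
	struct work_item *next;
};

static struct work_item *queues[SQ_MAX];

static void enqueue(enum subqueue sq, struct work_item *item)
{
	item->next = queues[sq];
	queues[sq] = item;
}

static void drain(void)
{
	/* Lower-numbered subqueues drain completely first, which is what
	 * preserves the NHG-before-route ordering. */
	for (int sq = 0; sq < SQ_MAX; sq++)
		for (struct work_item *w = queues[sq]; w; w = w->next)
			printf("process %s\n", w->what);
}

int main(void)
{
	struct work_item nhg = { .what = "nexthop group 5000" };
	struct work_item rt = { .what = "route 10.0.0.1 via nhid 5000" };

	/* Even if the route arrives first, the NHG is processed first. */
	enqueue(SQ_EARLY_ROUTE, &rt);
	enqueue(SQ_NHG, &nhg);
	drain();
	return 0;
}
```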
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Convert label processing that comes from zapi messages
into being handled by the meta-Q. This is because early
route processing is going to be moved to the meta-Q as
well and we will have a chicken and egg problem without
moving this code to be processed by the meta-Q.
Ordering of messages from ospf as an example:
2022/08/09 08:55:52.740 ZEBRA: [YXG8K-BCYMV] zebra message[ZEBRA_ROUTE_ADD:0:48] comes from socket [36]
2022/08/09 08:55:52.740 ZEBRA: [YXG8K-BCYMV] zebra message[ZEBRA_ROUTE_ADD:0:48] comes from socket [36]
2022/08/09 08:55:52.740 ZEBRA: [YXG8K-BCYMV] zebra message[ZEBRA_ROUTE_ADD:0:48] comes from socket [36]
2022/08/09 08:55:52.740 ZEBRA: [YXG8K-BCYMV] zebra message[ZEBRA_ROUTE_ADD:0:48] comes from socket [36]
2022/08/09 08:55:52.740 ZEBRA: [YXG8K-BCYMV] zebra message[ZEBRA_ROUTE_ADD:0:62] comes from socket [36]
2022/08/09 08:55:52.740 ZEBRA: [YXG8K-BCYMV] zebra message[ZEBRA_ROUTE_ADD:0:43] comes from socket [36]
2022/08/09 08:55:52.740 ZEBRA: [YXG8K-BCYMV] zebra message[ZEBRA_ROUTE_ADD:0:47] comes from socket [36]
2022/08/09 08:55:52.740 ZEBRA: [YXG8K-BCYMV] zebra message[ZEBRA_ROUTE_ADD:0:47] comes from socket [36]
2022/08/09 08:55:52.740 ZEBRA: [YXG8K-BCYMV] zebra message[ZEBRA_ROUTE_ADD:0:47] comes from socket [36]
2022/08/09 08:55:52.740 ZEBRA: [YXG8K-BCYMV] zebra message[ZEBRA_ROUTE_ADD:0:47] comes from socket [36]
2022/08/09 08:55:52.740 ZEBRA: [YXG8K-BCYMV] zebra message[ZEBRA_ROUTE_ADD:0:61] comes from socket [36]
2022/08/09 08:55:52.740 ZEBRA: [YXG8K-BCYMV] zebra message[ZEBRA_ROUTE_ADD:0:47] comes from socket [36]
2022/08/09 08:55:52.740 ZEBRA: [YXG8K-BCYMV] zebra message[ZEBRA_ROUTE_ADD:0:47] comes from socket [36]
2022/08/09 08:55:52.740 ZEBRA: [YXG8K-BCYMV] zebra message[ZEBRA_MPLS_LABELS_REPLACE:0:47] comes from socket [36]
2022/08/09 08:55:52.740 ZEBRA: [YXG8K-BCYMV] zebra message[ZEBRA_MPLS_LABELS_REPLACE:0:66] comes from socket [36]
2022/08/09 08:55:52.740 ZEBRA: [YXG8K-BCYMV] zebra message[ZEBRA_MPLS_LABELS_REPLACE:0:47] comes from socket [36]
2022/08/09 08:55:52.740 ZEBRA: [YXG8K-BCYMV] zebra message[ZEBRA_MPLS_LABELS_REPLACE:0:47] comes from socket [36]
2022/08/09 08:55:52.740 ZEBRA: [YXG8K-BCYMV] zebra message[ZEBRA_MPLS_LABELS_REPLACE:0:47] comes from socket [36]
The ZEBRA_MPLS_LABELS_REPLACE messages immediately turn around and attempt to replace nexthop labels on routes
that were just added. If the route add is still sitting on the metaQ, the route will not exist yet and as such
the label replace will fail.
Modify the zebra code to take the label operations and place them on the metaQ as well.
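A sketch of the change at the handler level, with assumed names and stubbed helpers (the real zapi decode and Meta-Q plumbing are omitted):
```c
#include <stdbool.h>

/* Illustrative only: stand-ins for the zapi label-replace path. */
struct mpls_label_op {
	bool is_replace;
	/* decoded zapi payload would live here */
};

/* Hypothetical helpers with stub bodies. */
void mpls_apply_label_op(struct mpls_label_op *op) { (void)op; }
void meta_queue_add_label_op(struct mpls_label_op *op) { (void)op; }

void zread_mpls_labels_replace(struct mpls_label_op *op)
{
	/* Before: applied immediately, racing with route adds still queued:
	 *     mpls_apply_label_op(op);
	 * After: queued on the Meta-Q, behind the early-route subqueue, so
	 * the routes the labels refer to exist by the time this runs. */
	meta_queue_add_label_op(op);
}
```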
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Move a few things into places they actually belong, and reduce the
number of places we have `#ifdef HAVE_RTADV`. Just overall code
prettification.
... I had actually done this quite a while ago while doing some other
random hacking and thought it more useful to not be sitting on it on my
disk...
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
When using wait-for-install there exist situations where
zebra will issue several route change operations to the kernel
but end up in a state it shouldn't be in at the end,
due to extra data being received. Example:
a) zebra receives from bgp a route change, installs it, and sends the
route to the kernel.
b) zebra receives a route deletion from bgp, removes the
struct route entry and then sends a deletion to the kernel.
c) zebra receives an asynchronous notification that (a) succeeded
but we treat this as a new route.
This is the ships-in-the-night problem. In this case, if we receive a
notification from the kernel about a route that we know nothing
about, and we are not in startup, and we are doing asic offload,
then we can ignore this update.
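A sketch of the resulting check; the flag and struct names here are assumptions:
```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative only: names are assumptions, not zebra's real symbols. */
struct zebra_state {
	bool in_startup;     /* still reading initial kernel state */
	bool asic_offloaded; /* wait-for-install / hardware offload in use */
};

struct route_entry; /* opaque here */

/* The kernel told us about a route we have no entry for.  Outside of
 * startup, with asic offload, this is the stale success notification for
 * (a) arriving after (b) already deleted the entry, so it is safe to drop. */
bool ignore_kernel_route_notification(const struct zebra_state *zs,
				      const struct route_entry *re)
{
	return re == NULL && !zs->in_startup && zs->asic_offloaded;
}
```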
Ticket: #2563300
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Current code treats all metaqueues as lists of route_node structures.
However, some queues contain other structures that need to be cleaned up
differently. Casting the elements of those queues to struct route_node
and dereferencing them leads to a crash. The crash may be seen when
executing bgp_multi_vrf_topo2.
Fix the code by using the proper list element types.
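A sketch of the shape of the fix, with placeholder element types (the real subqueues and their context structs differ): cleanup must know which element type each subqueue holds instead of casting everything to struct route_node.
```c
#include <stdlib.h>

/* Illustrative placeholder types for the different subqueue elements. */
struct route_node { int dummy; };
struct wq_nhg_ctx { int dummy; };
struct wq_evpn_ctx { int dummy; };

enum subqueue_kind { SQ_ROUTE_NODE, SQ_NHG, SQ_EVPN };

void meta_queue_free_element(enum subqueue_kind kind, void *data)
{
	switch (kind) {
	case SQ_ROUTE_NODE:
		/* route nodes are reference counted in zebra; free() here
		 * just stands in for the unlock/cleanup call */
		free((struct route_node *)data);
		break;
	case SQ_NHG:
		free((struct wq_nhg_ctx *)data); /* not a route_node! */
		break;
	case SQ_EVPN:
		free((struct wq_evpn_ctx *)data);
		break;
	}
}
```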
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
The name 'opaque' is a little general - call the route_entry
struct 're_opaque' to make it more specific.
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
Pass in the route_node that is under consideration
into route_notify_internal to allow calling functions
to reduce stack size as well as looking up data.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Topology:
IXIA-----(ens192)FRR(ens224)------iXIA
Configuration:
1. Create 8 sub-interfaces on ens192 under Default VRF and configure 8
EBGP session between FRR and IXIA.
2. Create 1000 sub-interfaces on ens224 under Default VRF and configure
1000 EBGP session between FRR and IXIA.
3. 2M prefixes distributed from Left side Ixia each with 8 ECMP path.
4. So in total, there are 2M prefixes * 8 ECMP = 16M prefix entries
in RIB and FIB.
Issue:
Shut ens192 and ens224, this is taking 1hr 15 mins to clean up the routes.
Root Cause:
In the case of route deletion, if the particular route node has an
nht count = 0, we go to the parent and do an nht evaluation,
which is not needed.
Fix:
If the deleted route node has an nht count > 0, then do an nht
evaluation on the parent node.
With the fix, shutting ens192 and ens224 takes 1 minute to clean up
the routes.
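A sketch of the guard; field and function names here are assumptions:
```c
/* Illustrative only: names are assumptions, not zebra's real symbols. */
struct rib_node {
	unsigned int nht_count;   /* nexthops tracked via this node */
	struct rib_node *parent;
};

void zebra_evaluate_rnh(struct rib_node *rn) /* stand-in for NHT evaluation */
{
	(void)rn;
}

void rib_delete_notify_nht(struct rib_node *deleted)
{
	/* Previously the parent was evaluated unconditionally on every
	 * deletion, which is what made flushing 16M entries take over an
	 * hour; with zero trackers there is nothing to re-evaluate. */
	if (deleted->nht_count > 0 && deleted->parent)
		zebra_evaluate_rnh(deleted->parent);
}
```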
Signed-off-by: Sarita Patra <saritap@vmware.com>
In some cases, zebra may install a nexthop-group id that is
different from the id of the nhe struct attached to a
route-entry. This happens for a singleton recursive nexthop,
for example, where a route is installed with the resolving
nexthop's id.
The installed value is the most useful value - that corresponds
to information in the kernel on linux/netlink platforms that
support nhgs. Display both values if they differ in ascii
output, and include both values in the json form.
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
PIM is going to need to be able to send down the address it is
trying to resolve in the multicast rib. We need a way to signal
this to the end developer. Start the conversion by adding the
ability to have a safi. But only allow SAFI_UNICAST at the moment.
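A sketch of what "adding the ability to have a safi, but only allowing SAFI_UNICAST" might look like; the signature and redefined types are assumptions, not zebra's real API:
```c
#include <assert.h>
#include <stddef.h>

/* Illustrative only: FRR defines safi_t itself; redefined here just to keep
 * the sketch self-contained. */
typedef enum { SAFI_UNICAST = 1, SAFI_MULTICAST = 2 } safi_t;

struct prefix;      /* opaque in this sketch */
struct route_entry; /* opaque in this sketch */

/* The safi is threaded through the lookup now, but anything other than
 * SAFI_UNICAST is rejected until the multicast-rib support is added. */
struct route_entry *rib_match_safi(safi_t safi, const struct prefix *p)
{
	assert(safi == SAFI_UNICAST);
	(void)p;
	return NULL; /* actual table lookup elided */
}
```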
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The entirety of the import checking no longer needs to be
in zebra as no one is calling it. Remove the code.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When calling rib_add_multipath_nhe, ensure that we have
well-aligned return codes that mean something, so that
interested parties can properly handle the situation.
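A sketch of the kind of distinct outcomes meant here; the actual names and values used in zebra are not claimed:
```c
/* Illustrative only: well-defined outcomes a caller of
 * rib_add_multipath_nhe could act on, rather than a single pass/fail. */
enum rib_add_result {
	RIB_ADD_QUEUED = 0,      /* route accepted and queued for processing */
	RIB_ADD_DUPLICATE,       /* identical entry already present, no-op */
	RIB_ADD_TABLE_NOT_FOUND, /* no table for this vrf / table id */
	RIB_ADD_INVALID,         /* bad input or allocation failure */
};
```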
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Move remote VTEP updates from immediate, inline processing
in their ZAPI message handlers to the main workqueue.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Enqueue incoming vxlan remote macip updates on the main
workqueue, instead of performing the updates immediately,
in-line.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add workqueue subqueue for EVPN/VxLAN updates; migrate the
evpn route and remote ES processing from their ZAPI handlers
to the workqueue.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Rework RA handling for vrf-lite scenarios.
Before, we were using a single fd for polling
across multiple zvrfs. This would cause us to hit this
assert() in some bgp unnumbered and vrrp configs:
```
/*
* What happens if we have a thread already
* created for this event?
*/
if (thread_array[fd])
assert(!"Thread already scheduled for file descriptor");
```
We were scheduling a thread_read on the same FD for every zvrf.
With vrf-lite, RAs and ARPs are not vrf-bound, so we can just use one
rtadv instance to manage them for all VRFs. We will choose the default
VRF for this.
This patch removes the rtadv_sock altogether for zrouter and moves the
functionality this represented to the default VRF. All RAs will be
handled in the default VRF under vrf-lite configs with only one poll
thread started for it.
This patch also extends how we track subscribed interfaces (s or msec)
to use an actual sorted list by interface names rather than just a
counter. With multiple daemons turning interfaces on/off, these counters
can get very wrong during ifup/down events. Making them a sorted list
prevents this from happening by preventing duplicates.
With netns-vrf's nothing should change other than the interface list.
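A self-contained sketch of a sorted, duplicate-free subscription list keyed by interface name; FRR's real code uses its own list types, and the names here are assumptions:
```c
#include <stdio.h>
#include <string.h>

#define MAX_IFS 32

struct ra_subscriptions {
	char names[MAX_IFS][16];
	int count;
};

void ra_subscribe(struct ra_subscriptions *subs, const char *ifname)
{
	int i = 0;

	/* find the insertion point, keeping the list sorted by name */
	while (i < subs->count && strcmp(subs->names[i], ifname) < 0)
		i++;
	/* a second subscription for the same interface is a no-op, which is
	 * what keeps repeated ifup/ifdown events from skewing a counter */
	if (i < subs->count && strcmp(subs->names[i], ifname) == 0)
		return;
	if (subs->count == MAX_IFS)
		return;

	memmove(&subs->names[i + 1], &subs->names[i],
		(size_t)(subs->count - i) * sizeof(subs->names[0]));
	snprintf(subs->names[i], sizeof(subs->names[i]), "%s", ifname);
	subs->count++;
}
```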
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Use the main zebra workqueue for daemon-owned NHGs, in addition
to processing kernel-owned NHGs. The zapi message processing
creates a temporary object that's enqueued to the workqueue,
then processed/installed as part of the workqueue processing.
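A rough sketch of the pattern described (decode into a temporary context, enqueue it, process later); all names here are placeholders:
```c
#include <stdlib.h>
#include <string.h>

/* Illustrative only: placeholder for the decoded zapi NHG data. */
struct nhg_ctx {
	unsigned int nhg_id;
	int proto; /* owning daemon's protocol */
	/* decoded nexthops would follow */
};

void workqueue_enqueue_nhg(struct nhg_ctx *ctx) { (void)ctx; }

/* zapi handler: copy the message contents into a temporary object and hand
 * it to the workqueue instead of installing the NHG inline. */
void zread_nhg_add(const struct nhg_ctx *decoded)
{
	struct nhg_ctx *ctx = calloc(1, sizeof(*ctx));

	if (!ctx)
		return;
	memcpy(ctx, decoded, sizeof(*ctx));
	workqueue_enqueue_nhg(ctx); /* freed by the workqueue after install */
}
```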
Signed-off-by: Mark Stapp <mjs@voltanet.io>
This one also needed a bit of shuffling around, but MTYPE_RE is the only
one left used across file boundaries now.
Signed-off-by: David Lamparter <equinox@diac24.net>
When we need to cause a reprocessing of data, the code currently
marks all routes as needing to be looked at. Modify the
rib_update_table code to allow us to specify a specific route
type that we want to reprocess. At this point none
of the code behaves differently; this is just setup
for a future code change.
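A sketch of the kind of filter this adds; the sentinel, helper names, and signature are assumptions, not the real zebra API:
```c
/* Illustrative only: a hypothetical "all types" sentinel. */
#define RTYPE_ALL (-1)

struct route_table; /* opaque */
struct route_entry; /* opaque */

/* stand-ins with stub bodies */
int re_type(const struct route_entry *re) { (void)re; return 0; }
void rib_queue_for_processing(struct route_table *t, struct route_entry *re)
{
	(void)t;
	(void)re;
}

/* Re-queue only the entries owned by the requested route type; passing
 * RTYPE_ALL keeps today's behaviour of reprocessing everything. */
void rib_update_table_rtype(struct route_table *table,
			    struct route_entry **entries, int n, int rtype)
{
	for (int i = 0; i < n; i++) {
		if (rtype != RTYPE_ALL && re_type(entries[i]) != rtype)
			continue;
		rib_queue_for_processing(table, entries[i]);
	}
}
```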
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
In rib_handle_nhg_replace, do not use `new` as a parameter name, to
allow compilation of C++ code that includes zebra headers.
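A minimal illustration of the issue (prototype only; the replacement parameter names shown are illustrative, not necessarily the ones used):
```c
struct nhg_hash_entry;

/* Before: `new` is a C++ keyword, so this prototype breaks any C++
 * translation unit that includes the header:
 *
 *   void rib_handle_nhg_replace(struct nhg_hash_entry *old,
 *                               struct nhg_hash_entry *new);
 */

/* After: parameter renamed; C callers are unaffected since parameter names
 * in prototypes are documentation only. */
void rib_handle_nhg_replace(struct nhg_hash_entry *old_entry,
			    struct nhg_hash_entry *new_entry);
```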
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>