When an upper-level protocol installs a route X that needs to be
route-replaced and, at the same time, the same or another protocol installs a
different route that depends on route X for nexthop resolution, zebra can
be left in a state where the dependent route is not accepted. Zebra is still
very early in the route replace semantics (route X is still on the work
queue waiting to be processed), so the dependent route is not installed.
This came up frequently in the bgp_default_originate test cases.
Further extend the ROUTE_ENTRY_ROUTE_REPLACING flag to cover this case
as well. This has come up because of the early route processing queueing
that was implemented late last year.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Currently the vrf change procedure for a deleted interface runs after
its deletion, which causes problems for upper daemons.
Here is the problem of `bgp`:
After deletion of one **irrelevant** interface in the same vrf, its
`ifindex` is set to 0. And then, the vrf change procedure will send
"ZEBRA_INTERFACE_DOWN" to `bgpd`.
Normally, `bgp_nht_ifp_table_handle()` should ignore this message because
the interface is unrelated. However, it wrongly matched on the `ifindex`
of 0 and removed the related routes for the down `bnc`.
Adjust the location of the vrf change procedure to fix this issue.
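The mismatch can be illustrated with a minimal sketch. All names below
(`struct nht_sketch_ifp`, `struct nht_sketch_entry`, `naive_match()`,
`guarded_match()`) are hypothetical and are not bgpd's actual code. The
sketch assumes a tracked entry that is not bound to an interface carries
an `ifindex` of 0. The commit itself fixes the ordering in zebra; the
guard is shown only to make the failure mode concrete.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an interface as seen by the upper daemon. */
struct nht_sketch_ifp {
	int ifindex; /* reset to 0 once the interface has been deleted */
};

/* Hypothetical stand-in for a tracked nexthop entry. */
struct nht_sketch_entry {
	int ifindex; /* assumed to be 0 when not bound to an interface */
};

/* Naive match: plain equality also "matches" 0 against 0. */
static bool naive_match(const struct nht_sketch_entry *e,
			const struct nht_sketch_ifp *ifp)
{
	return e->ifindex == ifp->ifindex;
}

/* Guarded match: an ifindex of 0 never identifies a live interface. */
static bool guarded_match(const struct nht_sketch_entry *e,
			  const struct nht_sketch_ifp *ifp)
{
	if (ifp->ifindex == 0)
		return false;
	return e->ifindex == ifp->ifindex;
}
```

With a deleted interface and an unbound entry, `naive_match()` reports a
match while `guarded_match()` does not.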
Signed-off-by: anlan_cs <vic.lan@pica8.com>
The default vrf is generally non-NULL, except during shutdown. So, most
of the time it is not necessary to check whether it is NULL, and we should
remove the useless checks for it.
Searched for them with an exact match:
```
grep -rI "zebra_vrf_lookup_by_id(VRF_DEFAULT)" | wc -l
31
```
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Adjust one debug message to separate the IP address from the fields that
precede it, just as is done in `redistribute_update()`.
Before:
```
34:1375.75.75.75/32: Redist del: re 0x55c1112067e0 (0:static), new re 0x55c1112de7c0 (0:static)
```
After:
```
(34:13):75.75.75.75/32: Redist del: re 0x55c1112067e0 (0:static), new re 0x55c1112de7c0 (0:static)
```
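A minimal sketch of the formatting change, assuming that 34 and 13 in the
output above are the VRF id and the table id. The helper name
`format_redist_prefix()` is hypothetical and not the function touched by
this commit.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the "(vrf:table):prefix" lead of the log
 * line. The parentheses and the extra ':' keep the ids from running
 * into the prefix. Returns what snprintf() returns.
 */
static int format_redist_prefix(char *buf, size_t len, uint32_t vrf_id,
				uint32_t table_id, const char *prefix)
{
	return snprintf(buf, len, "(%" PRIu32 ":%" PRIu32 "):%s", vrf_id,
			table_id, prefix);
}
```

Calling it with `34`, `13` and `"75.75.75.75/32"` yields
`(34:13):75.75.75.75/32`.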
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Treat NHRP-installed routes as valid, as if they were
CONNECTED routes, when checking candidate routes'
nexthops for validity. This allows use of NHRP by an
IGP, for example, that doesn't normally want recursive
nexthop resolution.
Signed-off-by: Mark Stapp <mjs@labn.net>
Code is looking up the nlsock to generate the batch messages
and then looking it up again to get the response. Let's
just look it up one time.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The mpls configuration does not work when an interface is
created after having applied the frr configuration. The
below scenario illustrates:
> root@dut:~# modprobe mpls
> root@dut:~# zebra &
> [..]
> dut(config)# interface ifacenotcreated
> dut(config-if)# mpls enable
> dut(config-if)# Ctrl-D
> root@dut:~# ip li show ifacenotcreated
> Device "ifacenotcreated" does not exist.
> root@dut:~# ip li add ifacenotcreated type dummy
> 0
Fix this by forcing the mpls flag when the interface is detected.
> root@dut:~# cat /proc/sys/net/mpls/conf/ifacenotcreat/input
> 1
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When `dplane_fpm_nl` receives a route, it allocates memory for a dplane
context and calls `netlink_route_change_read_unicast_internal` without
initializing the `intf_extra_list` contained in the dplane context. If
`netlink_route_change_read_unicast_internal` is not able to process the
route, we call `dplane_ctx_fini` to free the dplane context. This causes
a crash because `dplane_ctx_fini` attempts to access the intf_extra_list
which is not initialized.
To solve this issue, we can call `dplane_ctx_route_init` to initialize
the dplane route context properly, just after the dplane context
allocation.
(gdb) bt
#0 0x0000555dd5ceae80 in dplane_intf_extra_list_pop (h=0x7fae1c007e68) at ../zebra/zebra_dplane.c:427
#1 dplane_ctx_free_internal (ctx=0x7fae1c0074b0) at ../zebra/zebra_dplane.c:724
#2 0x0000555dd5cebc99 in dplane_ctx_free (pctx=0x7fae2aa88c98) at ../zebra/zebra_dplane.c:869
#3 dplane_ctx_free (pctx=0x7fae2aa88c98, pctx@entry=0x7fae2aa78c28) at ../zebra/zebra_dplane.c:855
#4 dplane_ctx_fini (pctx=pctx@entry=0x7fae2aa88c98) at ../zebra/zebra_dplane.c:890
#5 0x00007fae31e93f29 in fpm_read (t=) at ../zebra/dplane_fpm_nl.c:605
#6 0x00007fae325191dd in thread_call (thread=thread@entry=0x7fae2aa98da0) at ../lib/thread.c:2006
#7 0x00007fae324c42b8 in fpt_run (arg=0x555dd74777c0) at ../lib/frr_pthread.c:309
#8 0x00007fae32405ea7 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9 0x00007fae32325a2f in clone () from /lib/x86_64-linux-gnu/libc.so.6
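The init-before-fini requirement can be sketched with a simplified
analogue. The types and functions below (`struct sketch_extra`,
`struct sketch_ctx`, `sketch_ctx_init()`, `sketch_ctx_fini()`) are
hypothetical, and a plain singly linked list stands in for the real
`intf_extra_list`.

```c
#include <stdlib.h>

/* Hypothetical list node owned by the context. */
struct sketch_extra {
	struct sketch_extra *next;
};

/* Hypothetical miniature of a context that owns a list. */
struct sketch_ctx {
	struct sketch_extra *extra_head; /* must be NULL or a valid node */
};

/* Put the list head into a state the teardown code can safely walk. */
static void sketch_ctx_init(struct sketch_ctx *ctx)
{
	ctx->extra_head = NULL;
}

/*
 * Pop and free every node, returning how many were freed. This reads
 * extra_head, so calling it on a context that was never initialized
 * would dereference garbage.
 */
static int sketch_ctx_fini(struct sketch_ctx *ctx)
{
	int freed = 0;

	while (ctx->extra_head) {
		struct sketch_extra *e = ctx->extra_head;

		ctx->extra_head = e->next;
		free(e);
		freed++;
	}
	return freed;
}
```

Calling `sketch_ctx_init()` right after allocation makes the early error
path safe: `sketch_ctx_fini()` then finds an empty list and frees nothing.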
Fixes: #13754
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
The function `dplane_ctx_route_init` initializes a dplane route context
from the route object passed as an argument. Let's abstract this
function to allow initializing the dplane route context without actually
copying a route object.
This allows us to use this function for initializing a dplane route
context when we don't have any route to copy in it.
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
a) Move the reads of link and address information
into the dplane
b) Move the startup read of data into the dplane
as well.
c) Break up the startup reading of the Linux kernel data
into multiple phases. There is an implied ordering
of data that must be read first, and if the dplane has
taken over some of the data reading then we must delay the
initial read-in of other data.
Fixes: #13288
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
1) Add a bunch of get/set functions and associated data
structure in zebra_dplane to allow the setting and retrieval
of interface netlink data up into the master pthread.
2) Add a bit of code to break up startup into stages. This is
because FRR currently has a mix of dplane and non-dplane interactions
and the code needs to be paused before continuing on.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
It turns out FRR has 2 functions, one specifically for startup
and one for normal day-to-day operations. There were only
a couple of minor differences from what I could tell, and
where they differed the after-startup functionality should
have been updated too. I cannot figure out why we have 2.
Non-startup handling of bonds appears to be incorrect,
so let's fix that. Additionally, the speed was not
being set properly in non-startup situations.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Since we are moving some code handling out of the dataplane
and into zebra proper, let's move the protodown r bit as well.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Rename the vrf_lookup_by_id function to zebra_vrf_lookup_by_id
and move it to zebra_vrf.c where it nominally belongs. We need
zebra-specific data to find this vrf_id, and as such
it does not belong in vrf.c.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When changing an interface's vrf, the kernel routes are wrongly kept
in the old vrf. As a result, the forwarding table in that old vrf can't
forward traffic correctly because of those residual entries.
Follow these steps to reproduce the problem:
(First, interface "x1" in the default vrf has the address "6.6.6.6/24".)
```
anlan# ip route add 4.4.4.0/24 via 6.6.6.8 dev x1
anlan# ip link add vrf1 type vrf table 1
anlan# ip link set vrf1 up
anlan# ip link set x1 master vrf1
```
Then check `show ip route`: the route for "4.4.4.0/24" is still selected
in the default vrf.
When the interface goes down, the kernel routes are reevaluated. Kernel
routes whose nexthop interface is active are kept unchanged; this is the
fast path. Otherwise, the slow path carefully examines the nexthop.
After the interface has been moved into the new vrf, the down message for
this interface arrives. The interface is no longer in the old vrf, although
it still exists during that check, so the kernel routes should be dropped
after the nexthop is matched against a default route in the slow path. But
in the current code they are wrongly kept in the fast path because the vrf
is not checked.
So, add a vrf comparison for the interface to the active nexthop check
during reevaluation.
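A minimal sketch of the fast-path check with the added vrf comparison.
The struct and function names (`struct vrf_sketch_ifp`,
`kernel_nexthop_fast_path_ok()`) are hypothetical and only model the
decision; this is not zebra's actual nexthop code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of an interface. */
struct vrf_sketch_ifp {
	uint32_t vrf_id;  /* vrf the interface currently belongs to */
	bool operative;   /* whether the interface is usable */
};

/*
 * Fast path: keep the kernel route only if the nexthop interface is
 * operative AND still belongs to the vrf the nexthop lives in.
 * Anything else has to fall through to the slow path.
 */
static bool kernel_nexthop_fast_path_ok(const struct vrf_sketch_ifp *ifp,
					uint32_t nexthop_vrf_id)
{
	if (!ifp)
		return false;
	return ifp->operative && ifp->vrf_id == nexthop_vrf_id;
}
```

For an interface that has moved to vrf 1, a nexthop still recorded in the
default vrf (id 0) no longer qualifies for the fast path.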
Signed-off-by: anlan_cs <vic.lan@pica8.com>
There are relaxed nexthop requirements for kernel routes because we
trust kernel routes.
Two minor changes for kernel routes:
1. `if_is_up()` is one of the necessary conditions for `if_is_operative()`,
so the separate check is redundant. Remove it for clarity.
2. Since `nexthop_active()` doesn't distinguish whether the route is a
kernel route, update the corresponding comment in it.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
When using asic_offload with an asynchronous notification the
rib_route_match_ctx function is testing for distance and tag
being correct against the re.
Normal route notification for static routes (well, really all routes) works like this:
a) zebra dplane generates a ctx to send to the dplane for route install
b) dplane installs it in the kernel
c) if the dplane_fpm_nl.c module is being used it installs it.
d) The context's success code is set to indicate that it worked, and the
context is passed back up to zebra for processing.
e) Zebra master receives this and checks the distance and tag are correct
for static routes and accepts the route and marks it installed.
If the operator is using a wait-for-install mechanism where the dplane
asynchronously sends the result back up at a future time *and*
it is using the dplane_fpm_nl.c code, which uses the rt_netlink.c
route parsing code, then there is no way to set distance because we
do not pass distance to the kernel.
As such, static routes were never being handled properly, since the re and
context would not match and the route would still be marked as queued.
Modify the code such that the asynchronous notification path for static
routes ignores the distance and tag, since there is no way to test
for this data from that path at this point in time.
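The relaxed match can be sketched as follows. The struct and function
(`struct match_sketch_route`, `static_route_matches()`) are hypothetical
and reduce the comparison to the two fields discussed here; this is not
the real `rib_route_match_ctx` signature.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced view of the fields compared for a static route. */
struct match_sketch_route {
	uint8_t distance;
	uint32_t tag;
};

/*
 * Compare a route entry against the values carried in a context.
 * On the asynchronous notification path the context cannot carry a
 * meaningful distance or tag, so those fields are skipped there.
 */
static bool static_route_matches(const struct match_sketch_route *re,
				 const struct match_sketch_route *ctx,
				 bool is_async)
{
	if (is_async)
		return true;
	return re->distance == ctx->distance && re->tag == ctx->tag;
}
```

A route entry with distance 1 and tag 100 does not match a context with
both fields at 0 on the synchronous path, but is accepted on the
asynchronous one.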
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
EVPN RMAC (Router MAC) nexthop list compare
function needs to return all values so
the list element can be compared and added/deleted
properly.
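One plausible reading of "return all values" is a three-way compare
(negative, zero, positive) instead of an equal/not-equal result. The
sketch below contrasts the two. All names are hypothetical, and an IPv4
address held in a `uint32_t` stands in for the real nexthop type.

```c
#include <stdint.h>

/* Hypothetical list element: an IPv4 nexthop in host byte order. */
struct rmac_sketch_nh {
	uint32_t addr;
};

/*
 * Equal-only compare: returns 0 or 1, never a negative value, so a
 * list that orders or looks up elements with it cannot tell "less
 * than" from "greater than".
 */
static int nh_cmp_equal_only(const void *a, const void *b)
{
	const struct rmac_sketch_nh *x = a;
	const struct rmac_sketch_nh *y = b;

	return x->addr != y->addr;
}

/* Three-way compare: returns <0, 0 or >0, like memcmp(). */
static int nh_cmp_three_way(const void *a, const void *b)
{
	const struct rmac_sketch_nh *x = a;
	const struct rmac_sketch_nh *y = b;

	if (x->addr < y->addr)
		return -1;
	if (x->addr > y->addr)
		return 1;
	return 0;
}
```

With the equal-only version, comparing A to B and B to A gives the same
answer, which is what makes ordered add/delete operations unreliable.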
Ticket:#3486989
Testing Done:
Originate EVPN Type-5 route with PIP IP and MAC as remote
nexthops.
Change the PIP IP address which triggers nexthop change.
Before fix:
When PIP IP changes RMAC is deleted from remote VTEPs.
TORS1# show evpn next-hops vni 4001 | include 00:02:00:00:00:2d
27.0.0.11 00:02:00:00:00:2d
TORS1# show evpn rmac vni 4001 | include 00:02:00:00:00:2d
00:02:00:00:00:2d 27.0.0.11
----- Remote VTEP change nexthop IP to 172.16.16.16 -----
TORS1# show evpn next-hops vni 4001 | include 00:02:00:00:00:2d
172.16.16.16 00:02:00:00:00:2d
TORS1# show evpn rmac vni 4001 | include 00:02:00:00:00:2d
TORS1#
After fix:
RMAC is retained as its nexthop list is not empty,
thus it is not deleted from remote VTEPs.
TORS1# show evpn rmac vni 4001 | include 00:02:00:00:00:2d
00:02:00:00:00:2d 172.16.16.16
Log:
2023/06/27 00:50:36.833474 ZEBRA: [XREH0-ZYMH6] L3VNI 4001 Remote VTEP
change(27.0.0.11 -> 172.16.16.16) for RMAC 00:02:00:00:00:2d
Signed-off-by: Chirag Shah <chirag@nvidia.com>
When running all daemons with config for most of them, FRR has
sharpd@janelle:~/frr$ vtysh -c "show debug hashtable" | grep "VRF BIT HASH" | wc -l
3570
3570 hashes for bitmaps associated with the vrf. This is a very
large number of hashes. Let's do two things:
a) Reduce the initial size of the created hashes to 2
instead of 32.
b) Delay generation of the hash *until* a set operation happens,
since the absence of a hash directly implies an unset value if/when checked.
This reduces the number of hashes to 61 in my setup for normal
operation.
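The lazy-creation idea in (b) can be sketched as below. A small
fixed-size word array stands in for the hash, and every name
(`struct lazy_bitmap`, `lazy_bitmap_set()`, `lazy_bitmap_check()`,
`lazy_bitmap_free()`, `LAZY_SKETCH_WORDS`) is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define LAZY_SKETCH_WORDS 4 /* hypothetical capacity: 4 * 32 = 128 bits */

struct lazy_bitmap {
	uint32_t *words; /* NULL until the first set operation */
};

/* Allocate the backing storage on first use, then set the bit. */
static bool lazy_bitmap_set(struct lazy_bitmap *bm, unsigned int bit)
{
	if (bit >= LAZY_SKETCH_WORDS * 32)
		return false;
	if (!bm->words) {
		bm->words = calloc(LAZY_SKETCH_WORDS, sizeof(*bm->words));
		if (!bm->words)
			return false;
	}
	bm->words[bit / 32] |= UINT32_C(1) << (bit % 32);
	return true;
}

/* No backing storage means nothing was ever set: report "unset". */
static bool lazy_bitmap_check(const struct lazy_bitmap *bm, unsigned int bit)
{
	if (!bm->words || bit >= LAZY_SKETCH_WORDS * 32)
		return false;
	return (bm->words[bit / 32] >> (bit % 32)) & 1u;
}

/* Release the storage, returning to the "never set" state. */
static void lazy_bitmap_free(struct lazy_bitmap *bm)
{
	free(bm->words);
	bm->words = NULL;
}
```

A bitmap that is only ever checked never allocates anything, which is
where the reduction in the number of hashes comes from.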
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Allow zapi clients to register to be notified when a server
for an opaque message type is present. Zebra maintains these
notification registrations in the same data structures that it
uses for opaque message handling.
Signed-off-by: Mark Stapp <mjs@labn.net>
Include the sending zapi client info (proto, instance, and
session id) in each opaque zapi message. Add opaque 'init'
apis for clients who want to encode their opaque data inline,
into the zclient's internal stream buffer. Use these init apis
in the TE/link-state lib code, instead of hand-coding the
zapi opaque header info.
Signed-off-by: Mark Stapp <mjs@labn.net>
In pbrd, don't encode a rule without a table. There are cases
where the zapi encoding was incorrect because the 4-octet
table id was missing. In zebra, mask off the ECN bits in the
TOS byte when encoding an iprule to match netlink's
expectation.
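A minimal sketch of the masking, assuming only that ECN occupies the two
low-order bits of the TOS byte. The macro and function names
(`SKETCH_ECN_MASK`, `tos_without_ecn()`) are hypothetical and not the
identifiers used in zebra.

```c
#include <stdint.h>

/* Hypothetical name: the two low-order bits of the TOS byte carry ECN. */
#define SKETCH_ECN_MASK 0x03u

/* Clear the ECN bits and keep the upper six bits of the TOS byte. */
static uint8_t tos_without_ecn(uint8_t tos)
{
	return (uint8_t)(tos & ~SKETCH_ECN_MASK);
}
```

For example, a TOS byte of `0xBB` becomes `0xB8` once the ECN bits are
cleared.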
Signed-off-by: Mark Stapp <mjs@labn.net>