Commit Graph

6046 Commits

Author SHA1 Message Date
Mark Stapp
960462aade
Merge pull request #16960 from donaldsharp/zebra_nhg_startup_issue
zebra: On startup actually allow for nhe's to be early
2024-11-04 11:49:30 -05:00
Carmine Scarpitta
afd9d3f924 zebra: Fix wrong debug macro in release_srv6_sid_func_dynamic
`ZEBRA_DEBUG_SRV6` is not the correct macro to evaluate if SRv6 debug is enabled or not.

The correct macro is `IS_ZEBRA_DEBUG_SRV6`.

Fix this by replacing `ZEBRA_DEBUG_SRV6` with `IS_ZEBRA_DEBUG_SRV6`.

Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
2024-11-03 08:45:03 +01:00
Carmine Scarpitta
69b1acf4e3 zebra: Fix wrong debug macro in release_srv6_sid_func_explicit
`ZEBRA_DEBUG_SRV6` is not the correct macro to evaluate if SRv6 debug is enabled or not.

The correct macro is `IS_ZEBRA_DEBUG_SRV6`.

Fix this by replacing `ZEBRA_DEBUG_SRV6` with `IS_ZEBRA_DEBUG_SRV6`.

Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
2024-11-03 08:44:41 +01:00
Carmine Scarpitta
d1810d5c7f zebra: Fix wrong debug macro in alloc_srv6_sid_func_dynamic
`ZEBRA_DEBUG_SRV6` is not the correct macro to evaluate if SRv6 debug is enabled or not.

The correct macro is `IS_ZEBRA_DEBUG_SRV6`.

Fix this by replacing `ZEBRA_DEBUG_SRV6` with `IS_ZEBRA_DEBUG_SRV6`.

Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
2024-11-03 08:44:23 +01:00
Carmine Scarpitta
58fd136f44 zebra: Fix wrong debug macro in alloc_srv6_sid_func_explicit
`ZEBRA_DEBUG_SRV6` is not the correct macro to evaluate if SRv6 debug is enabled or not.

The correct macro is `IS_ZEBRA_DEBUG_SRV6`.

Fix this by replacing `ZEBRA_DEBUG_SRV6` with `IS_ZEBRA_DEBUG_SRV6`.

Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
2024-11-03 08:43:55 +01:00
Carmine Scarpitta
a56e790b07 zebra: Fix wrong debug macro in release_srv6_sid_func_dynamic
`ZEBRA_DEBUG_SRV6` is not the correct macro to evaluate if SRv6 debug is enabled or not.

The correct macro is `IS_ZEBRA_DEBUG_SRV6`.

Fix this by replacing `ZEBRA_DEBUG_SRV6` with `IS_ZEBRA_DEBUG_SRV6`.

Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
2024-11-03 08:43:38 +01:00
Carmine Scarpitta
8e96fcece2 zebra: Fix wrong debug macro in release_srv6_sid_func_explicit
`ZEBRA_DEBUG_SRV6` is not the correct macro to evaluate if SRv6 debug is enabled or not.

The correct macro is `IS_ZEBRA_DEBUG_SRV6`.

Fix this by replacing `ZEBRA_DEBUG_SRV6` with `IS_ZEBRA_DEBUG_SRV6`.

Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
2024-11-03 08:43:17 +01:00
Carmine Scarpitta
4710baa7bb zebra: Fix wrong debug macro in alloc_srv6_sid_func_dynamic
`ZEBRA_DEBUG_SRV6` is not the correct macro to evaluate if SRv6 debug is enabled or not.

The correct macro is `IS_ZEBRA_DEBUG_SRV6`.

Fix this by replacing `ZEBRA_DEBUG_SRV6` with `IS_ZEBRA_DEBUG_SRV6`.

Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
2024-11-03 08:42:58 +01:00
Carmine Scarpitta
973c4750e1 zebra: Fix wrong debug macro in alloc_srv6_sid_func_explicit
`ZEBRA_DEBUG_SRV6` is not the correct macro to evaluate if SRv6 debug is enabled or not.

The correct macro is `IS_ZEBRA_DEBUG_SRV6`.

Fix this by replacing `ZEBRA_DEBUG_SRV6` with `IS_ZEBRA_DEBUG_SRV6`.

Signed-off-by: Carmine Scarpitta <cscarpit@cisco.com>
2024-11-03 08:42:36 +01:00
Donald Sharp
9e74dda819 zebra: Delay some processing until after startup is finished
Currently zebra starts the graceful restart timer as well as
allows connections from clients before all data is read in
from the kernel as well as the possiblity of allowing client
connections before this happens as well.

Let's move the graceful restart timer start till after this is
done as well as not allowing client connections till then as well.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-11-01 14:43:50 -04:00
Jafar Al-Gharaibeh
248ee22b9d
Merge pull request #17230 from donaldsharp/clang_19_some_more
Clang 19 some more
2024-11-01 09:02:31 -05:00
Donald Sharp
66feece071
Merge pull request #17281 from nabahr/mrib-import
Add support to import alternate URIB tables into the main MRIB
2024-10-31 13:28:57 -04:00
Donatas Abraitis
25ae643996 zebra: Add missing new line for help string
```
  -A, --asic-offload        FRR is interacting with an asic underneath the linux kernel
      --v6-with-v4-nexthops Underlying dataplane supports v6 routes with v4 nexthops  -s, --nl-bufsize          Set netlink receive buffer size
```

Fixes: 1f5611c06d ("zebra: Allow zebra cli to accept v6 routes with v4 nexthops")

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-10-31 10:47:48 +02:00
Nathan Bahr
be2a7ed6af zebra: Add ability to import alternate tables into the MRIB
Expanded the cli command to include an mrib flag for importing to
the main table MRIB instead of the main table URIB.
Piped through specifying the safi through the import table functions
rather than hardcoding to SAFI_UNICAST.
Import still only import routes from the URIB subtable, only added the
ability to import into the main table MRIB.

Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
2024-10-29 20:17:59 +00:00
Donald Sharp
5f6200d334 zebra: Deconfuse clang-sa about possible NULL pointer
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-10-29 16:05:15 -04:00
Donatas Abraitis
60f77de79d
Merge pull request #17257 from pguibert6WIND/srv6_debug
zebra: add 'debug zebra srv6' command
2024-10-29 10:32:20 +02:00
Donald Sharp
3bff65abc7 zebra: When installing a mroute, allow it to flow
Currently the mroute code was not allowing the mroute
to be sent to the dataplane.  This leaves us with a
situation where the routes being installed where never
being set as installed and additionally nht against
the mrib would not work if the route came into existence
after the nexthop tracking was asked for.

Turns out all the pieces where there to let this work.
Modify the code to pass it to the dplane and to send
it back up as having worked.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-10-28 15:02:39 -04:00
Donald Sharp
811168ecc3 zebra: Add safi to some debugs
Trying to figure out what safi we are talking about is fun when
it is not put into the debugs.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-10-28 14:29:31 -04:00
Philippe Guibert
a7fec9c387 zebra: add 'debug zebra srv6' command
Add a specific debug command to handle srv6 troubleshooting.
Move the srv6 traces that initially were under 'debug zebra packet'
debug.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2024-10-28 16:40:33 +01:00
Jafar Al-Gharaibeh
f11421d4ec
Merge pull request #17160 from opensourcerouting/fix/keep_zebra_on-rib-process_in_frr.conf
lib, zebra: Keep `zebra on-rib-process script` in frr.conf
2024-10-27 18:23:36 -05:00
Donald Sharp
138935a5fd bgpd: Fix wrong pthread event cancelling
0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:44
1  __pthread_kill_internal (signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:78
2  __GI___pthread_kill (threadid=130719886083648, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
3  0x000076e399e42476 in __GI_raise (sig=6) at ../sysdeps/posix/raise.c:26
4  0x000076e39a34f950 in core_handler (signo=6, siginfo=0x76e3985fca30, context=0x76e3985fc900) at lib/sigevent.c:258
5  <signal handler called>
6  __pthread_kill_implementation (no_tid=0, signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:44
7  __pthread_kill_internal (signo=6, threadid=130719886083648) at ./nptl/pthread_kill.c:78
8  __GI___pthread_kill (threadid=130719886083648, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
9  0x000076e399e42476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
10 0x000076e399e287f3 in __GI_abort () at ./stdlib/abort.c:79
11 0x000076e39a39874b in _zlog_assert_failed (xref=0x76e39a46cca0 <_xref.27>, extra=0x0) at lib/zlog.c:789
12 0x000076e39a369dde in cancel_event_helper (m=0x5eda32df5e40, arg=0x5eda33afeed0, flags=1) at lib/event.c:1428
13 0x000076e39a369ef6 in event_cancel_event_ready (m=0x5eda32df5e40, arg=0x5eda33afeed0) at lib/event.c:1470
14 0x00005eda0a94a5b3 in bgp_stop (connection=0x5eda33afeed0) at bgpd/bgp_fsm.c:1355
15 0x00005eda0a94b4ae in bgp_stop_with_notify (connection=0x5eda33afeed0, code=8 '\b', sub_code=0 '\000') at bgpd/bgp_fsm.c:1610
16 0x00005eda0a979498 in bgp_packet_add (connection=0x5eda33afeed0, peer=0x5eda33b11800, s=0x76e3880daf90) at bgpd/bgp_packet.c:152
17 0x00005eda0a97a80f in bgp_keepalive_send (peer=0x5eda33b11800) at bgpd/bgp_packet.c:639
18 0x00005eda0a9511fd in peer_process (hb=0x5eda33c9ab80, arg=0x76e3985ffaf0) at bgpd/bgp_keepalives.c:111
19 0x000076e39a2cd8e6 in hash_iterate (hash=0x76e388000be0, func=0x5eda0a95105e <peer_process>, arg=0x76e3985ffaf0) at lib/hash.c:252
20 0x00005eda0a951679 in bgp_keepalives_start (arg=0x5eda3306af80) at bgpd/bgp_keepalives.c:214
21 0x000076e39a2c9932 in frr_pthread_inner (arg=0x5eda3306af80) at lib/frr_pthread.c:180
22 0x000076e399e94ac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
23 0x000076e399f26850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) f 12
12 0x000076e39a369dde in cancel_event_helper (m=0x5eda32df5e40, arg=0x5eda33afeed0, flags=1) at lib/event.c:1428
1428		assert(m->owner == pthread_self());

In this decode the attempt to cancel the connection's events from
the wrong thread is causing the crash.  Modify the code to create an
event on the bm->master to cancel the events for the connection.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-10-24 21:01:26 -04:00
Donatas Abraitis
91e157f3ae
Merge pull request #17162 from louis-6wind/fix-bh-nh-vrf
zebra: fix showing nexthop vrf for ipv6 blackhole
2024-10-23 17:34:44 +03:00
Jafar Al-Gharaibeh
0078472e19
Merge pull request #17180 from anlancs/zebra/review-move-dplane
zebra: drop NEWLINK event handling in the main thread
2024-10-22 10:29:49 -05:00
anlan_cs
96192f6aee zebra: drop NEWLINK event handling in the main thread
NEWLINK is only registered by the dplane thread, the main thread
doesn't care about it. So remove the real process of `netlink_link_change()`
for NEWLINK event in main thread.

And move NEWLINK/DELLINK event to the block where the dplane messages
are kept together.

Signed-off-by: anlan_cs <anlan_cs@126.com>
2024-10-22 09:05:00 +08:00
anlan_cs
5829fea1b5 zebra: remove useless code
Signed-off-by: anlan_cs <anlan_cs@126.com>
2024-10-19 13:32:53 +08:00
Louis Scalbert
6cdc82b21b zebra: fix showing nexthop vrf for ipv6 blackhole
For some reasons the Linux kernel associates the ipv6 blackhole of non
default table the lo interface.

> root@r1# ip -6 route show table 100
> root@r1# ip -6 route add unreachable default metric 4278198272 table 100
> root@r1# ip -6 route show table 100
> unreachable default dev lo metric 4278198272 pref medium

As a consequence, the VRF default that owns the lo interface is shown as
the nexthop VRF:

> r1# show ipv6 route table 20
> Table 20:
> K>* ::/0 [255/8192] unreachable (ICMP unreachable) (vrf default), 00:18:12

Do not display the nexthop VRF of a blackhole. It does not make sense
for a blackhole and it was not displayed in the past.

Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2024-10-18 14:45:50 +02:00
Donatas Abraitis
1fe1f8d87c lib, zebra: Keep zebra on-rib-process script in frr.conf
After the change:

```
$ grep on-rib-process /etc/frr/frr.conf
zebra on-rib-process script script4

$ systemctl restart frr

$ vtysh -c 'show run' | grep on-rib-process
zebra on-rib-process script script4
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-10-18 15:36:52 +03:00
Donald Sharp
5a2a9e3b89 zebra: Fix possible null deref discovered by coverity
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-10-17 07:42:47 -04:00
Donald Sharp
466efab870
Merge pull request #17136 from opensourcerouting/clang-sa-19
*: fix clang-19 SA
2024-10-17 07:38:28 -04:00
Donatas Abraitis
1ce225d7e4
Merge pull request #17076 from donaldsharp/rnh_and_redistribution_nexthop_num_fix
*: Fix up improper handling of nexthops for nexthop tracking
2024-10-16 16:34:08 +03:00
Donald Sharp
cc63dbb68f
Merge pull request #17020 from pguibert6WIND/asan_shutdown
zebra: fix heap-use-after free on ns shutdown
2024-10-16 09:15:06 -04:00
David Lamparter
e6cb1a90f2 zebra: check dirfd() result
`dirfd()` can theoretically return an error.  Call it once and check the
result.

clang-SA: technically correct™.  Ain't that the best kind of correct?

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2024-10-16 13:30:25 +02:00
David Lamparter
67b0a457ed zebra: don't misappropriate errno
`errno` has its own semantics.  Sometimes it is correct to write to it.
This is not one of those cases - just use a separate `nl_errno`.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2024-10-16 13:30:25 +02:00
David Lamparter
1350f8d1c1 zebra: don't try to read past EOF
`FILE *` objects are theoretically in an invalid state if you try to use
them past their reporting EOF.  Adjust the code to make it correct.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2024-10-16 13:30:25 +02:00
David Lamparter
c071b4370d *: clang-SA switch-enum initializer workarounds
In these cases the value assigned by the switch block is used directly
rather than returned.  Mark the initial/default value as used so
clang-SA doesn't complain about it.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2024-10-16 13:30:25 +02:00
David Lamparter
49cf311d46 *: clang-SA friendly switch-enum-return-string
clang-19's SA complains about unused initializers for this kind of
"switch (enum) { return string }" kind of code.  Use direct string
return values to avoid the issue.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2024-10-16 13:00:11 +02:00
Donatas Abraitis
c32bdc2469
Merge pull request #17116 from enkechen-panw/zfix-2
zebra: unlock node only after operation in zebra_free_rnh()
2024-10-16 08:12:28 +03:00
Jafar Al-Gharaibeh
b2eaf86fb5
Merge pull request #15586 from donaldsharp/nht_explain_doc
zebra: Attempt to explain the rnh tracking code better
2024-10-15 14:25:35 -05:00
Jafar Al-Gharaibeh
b23bbb885a
Merge pull request #17088 from donaldsharp/connected_kernel_fun
zebra: Prevent a kernel route from being there when a connected should
2024-10-15 14:04:51 -05:00
Enke Chen
5b6ff51b8a zebra: unlock node only after operation in zebra_free_rnh()
Move route_unlock_node() after rnh_list_del().

Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
2024-10-15 10:25:46 -07:00
Donald Sharp
28237d73ad zebra: Attempt to explain the rnh tracking code better
I got asked today what was going on in the rnh code.  I
had to take time off of what I was doing and rewrap my
head around this code, since it's been a long time.
As that this question may come up again in the future
I am trying to document this better so that someone
coming behind us will be able to read this and get
a better idea of what the algorithm is attempting
to do.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-10-15 12:42:17 -04:00
Donald Sharp
645a9e4f83 *: Fix up improper handling of nexthops for nexthop tracking
Currently FRR needs to send a uint16_t value for the number
of nexthops as well it needs the ability to properly decode
all of this.  Find and handle all the places that this happens.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-10-15 11:57:23 -04:00
Mark Stapp
6c1bc51bbb
Merge pull request #16737 from raja-rajasekar/rajasekarr/vlan_to_dplane
zebra: vlan to dplane
2024-10-15 08:06:34 -04:00
Donald Sharp
74e25198e7 zebra: Prevent a kernel route from being there when a connected should
There exists a series of events where a kernel route is learned
first( that happens to be exactly what a connected route should be )
and FRR ends up with both a kernel route and a connected route,
leaving us in a very strange spot.  This code change just mirrors
the existing code of if there is a connected route drop the kernel
route.  Here we just do the reverse, if we have a kernel route
already and a connected should be created, remove the kernel and
keep the connected.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-10-14 11:27:53 -04:00
Donatas Abraitis
d1433ee9a8
Merge pull request #17062 from donaldsharp/dplane_fpm_nl_problems
zebra: Only notify dplane work pthread when needed
2024-10-14 08:14:34 +03:00
anlan_cs
05e2472de7 zebra: add back one field for debug
The `flags` field is removed recently, so add back it for debug.

Signed-off-by: anlan_cs <anlan_cs@126.com>
2024-10-13 21:30:46 +08:00
Donald Sharp
8aa97a439f zebra: Slow down fpm_process_queue
When the fpm_process_queue has run out of space
but has written to the fpm output buffer, schedule
it to wake up immediately, as that the write will go out
pretty much immediately, since it was scheduled first.
If the fpm_process_queue has not written to the output
buffer then delay the processing by 10 milliseconds to
allow a possibly backed up write processing to have a
chance to complete it's work.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-10-11 09:37:37 -04:00
Donald Sharp
963792e8c5 zebra: Only notify dplane work pthread when needed
The fpm_nl_process function was getting the count
of the total number of ctx's processed.  This leads
to after having processed 1 context to always signal
the dataplane that there is work to do.  Change the
code to only notify the dplane worker when a context
was actually added to the outgoing context queue.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-10-11 09:37:37 -04:00
Donald Sharp
154a89bc31 zebra: Fix crash in pw code
Recent PR #17009 introduced a crash in pw handing
for deletion.  Let's fix that problem.

Fixes: #17041
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-10-09 07:17:29 -04:00
Philippe Guibert
7ae70eb5ef zebra: fix heap-use-after free on ns shutdown
The following ASAN issue has been observed:

> ERROR: AddressSanitizer: heap-use-after-free on address 0x6160000acba4 at pc 0x55910c5694d0 bp 0x7ffe3a8ac850 sp 0x7ffe3a8ac840
> READ of size 4 at 0x6160000acba4 thread T0
>         #0 0x55910c5694cf in ctx_info_from_zns zebra/zebra_dplane.c:3315
>     #1 0x55910c569696 in dplane_ctx_ns_init zebra/zebra_dplane.c:3331
>     #2 0x55910c56bf61 in dplane_ctx_nexthop_init zebra/zebra_dplane.c:3680
>     #3 0x55910c5711ca in dplane_nexthop_update_internal zebra/zebra_dplane.c:4490
>     #4 0x55910c571c5c in dplane_nexthop_delete zebra/zebra_dplane.c:4717
>     #5 0x55910c61e90e in zebra_nhg_uninstall_kernel zebra/zebra_nhg.c:3413
>     #6 0x55910c615d8a in zebra_nhg_decrement_ref zebra/zebra_nhg.c:1919
>     #7 0x55910c6404db in route_entry_update_nhe zebra/zebra_rib.c:454
>     #8 0x55910c64c904 in rib_re_nhg_free zebra/zebra_rib.c:2822
>     #9 0x55910c655be2 in rib_unlink zebra/zebra_rib.c:4212
>     #10 0x55910c6430f9 in zebra_rtable_node_cleanup zebra/zebra_rib.c:968
>     #11 0x7f26f275b8a9 in route_node_free lib/table.c:75
>     #12 0x7f26f275bae4 in route_table_free lib/table.c:111
>     #13 0x7f26f275b749 in route_table_finish lib/table.c:46
>     #14 0x55910c65db17 in zebra_router_free_table zebra/zebra_router.c:191
>     #15 0x55910c65dfb5 in zebra_router_terminate zebra/zebra_router.c:244
>     #16 0x55910c4f40db in zebra_finalize zebra/main.c:249
>     #17 0x7f26f2777108 in event_call lib/event.c:2011
>     #18 0x7f26f264180e in frr_run lib/libfrr.c:1212
>     #19 0x55910c4f49cb in main zebra/main.c:531
>     #20 0x7f26f2029d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
>     #21 0x7f26f2029e3f in __libc_start_main_impl ../csu/libc-start.c:392
>     #22 0x55910c4b0114 in _start (/usr/lib/frr/zebra+0x1ae114)

It happens with FRR using the kernel. During shutdown, the
namespace identifier is attempted to be obtained by zebra, in an
attempt to prepare zebra dataplane nexthop messages.

Fix this by accessing the ns structure.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2024-10-08 22:25:55 +02:00