Commit Graph

36718 Commits

Author SHA1 Message Date
Donald Sharp
8ad5643abe zebra: Convince SA that the ng will always be valid
There is a code path that could theoretically get you
to a point where the ng->nexthop is a NULL value.
Let's just make sure the SA system believes that
cannot happen anymore.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-29 18:10:30 -04:00
Richard Cunningham
5f6a61f91f bgpd: Exclude case for remote prefix w/o link-local
If we expand the truth (A || B) to "(A && B) || (A && !B) || (!A && B)"
so that we can isolate the case (!A && B), we then add the additional
check (C) to ensure that original route actually has a link-local hext-hop

Signed-off-by: Richard Cunningham <29760295+cunningr@users.noreply.github.com>
2024-08-29 20:59:14 +01:00
Donald Sharp
c0f4972432
Merge pull request #16687 from Jafaral/isis-test-fix
tests: increase the timeout for packet padding check
2024-08-29 15:10:54 -04:00
Donald Sharp
f90989d52a zebra: Allow blackhole singleton nexthops to be v6
A blackhole nexthop, according to the linux kernel,
can be v4 or v6.  A v4 blackhole nexthop cannot be
used on a v6 route, but a v6 blackhole nexthop can
be used with a v4 route.  Convert all blackhole
singleton nexthops to v6 and just use that.
Possibly reducing the number of active nexthops by 1.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-29 15:06:31 -04:00
Donald Sharp
c10cdcd79a zebra: Display afi of the nexthop hash entry
Let's display the afi of the nexthop hash entry.  Right
now it is impossible to tell the difference between v4 or
v6 nexthops, especially since it is important for the kernel.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-29 14:49:36 -04:00
Jafar Al-Gharaibeh
5752fc86ee tests: increase the timeout for packet padding check
Signed-off-by: Jafar Al-Gharaibeh <jafar@atcorp.com>
2024-08-29 11:40:50 -05:00
Jafar Al-Gharaibeh
77e1a26faa
Merge pull request #16664 from mjstapp/igor_debug_simplify
*: simplify frrlib debug
2024-08-29 11:51:53 -04:00
Jafar Al-Gharaibeh
f1e4ef461d
Merge pull request #16685 from opensourcerouting/fix/document_reverts
doc: Document the git revert flow
2024-08-29 11:39:27 -04:00
Mark Stapp
ac2d9bae5c
Merge pull request #16680 from donaldsharp/route_scale_minor_changes
tests: Fix route-scale at higher ecmp
2024-08-29 08:17:34 -04:00
Donatas Abraitis
28b8c85d68 doc: Document the git revert flow
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-08-29 09:04:28 +03:00
Jafar Al-Gharaibeh
12a3d5a748
Merge pull request #16683 from donaldsharp/test_ospf_netns_vrf_failure
tests: ospf_netns_vrf should give more time for coming up
2024-08-29 01:12:40 -04:00
Jafar Al-Gharaibeh
648566c6fb
Merge pull request #16682 from donaldsharp/bgp_suppress_test
tests: Ensure bgp suppress fib has a chance to transmit data to peer
2024-08-29 01:12:17 -04:00
Jafar Al-Gharaibeh
ffaa365cc4
Merge pull request #16681 from donaldsharp/zebra_re_after_rn
zebra: Move prefix lookup to outside re loop
2024-08-28 23:43:40 -04:00
Jafar Al-Gharaibeh
216ed8c796
Merge pull request #16673 from donaldsharp/default_original_sin
tests: Fix bgp_default_originate_topo1_3
2024-08-28 15:30:12 -04:00
Mark Stapp
79e0c6a2e0
Merge pull request #16672 from raja-rajasekar/vty_out_mem_spike_srujana
lib: Memory spike reduction for sh cmds at scale
2024-08-28 15:29:23 -04:00
Donald Sharp
ce74a6b0a8 tests: Fix route-scale at higher ecmp
Recent commits moved the default retries to 60, but
the higher ecmp counts were over-riding to 40.  Let's
make it 80.

Noticed this when I went looking at failures on 386 platforms
in our ci.  Route scale is timing out when deleting routes.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-28 15:18:24 -04:00
Donald Sharp
d58c44cebe tests: ospf_netns_vrf should give more time for coming up
Test fails:

            test_func = partial(
                topotest.router_json_cmp,
                router,
                "show ip ospf vrf {0}-ospf-cust1 json".format(rname),
                expected,
            )
            _, diff = topotest.run_and_expect(test_func, None, count=10, wait=0.5)
            assertmsg = '"{}" JSON output mismatches'.format(rname)
>           assert diff is None, assertmsg
E           AssertionError: "r1" JSON output mismatches
E           assert Generated JSON diff error report:
E
E             > $->r1-ospf-cust1->areas->0.0.0.0->nbrFullAdjacentCounter: output has element with value '1' but in expected it has value '2'

/home/sharpd/frr2/tests/topotests/ospf_netns_vrf/test_ospf_netns_vrf.py:239: AssertionError

Support bundle has this data:
r1# show ip ospf vrf all neighbor
% 2024/08/28 14:55:54.763

VRF Name: r1-ospf-cust1

Neighbor ID     Pri State           Up Time         Dead Time Address         Interface                        RXmtL RqstL DBsmL
10.0.255.3        1 Full/DR         10.547s           39.456s 10.0.3.1        r1-eth1:10.0.3.2                     0     0     0
10.0.255.2        1 Full/Backup     0.543s            38.378s 10.0.3.3        r1-eth1:10.0.3.2                     1     0     0

So immediately after the test fails this test, the neighbor comes up.
Let's give the test a bit more time for failure to not happen

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-28 15:10:04 -04:00
Donald Sharp
3797454a2a tests: Ensure bgp suppress fib has a chance to transmit data to peer
Giving only 5 seconds to pass bgp data to peers on a heavily
loaded system is a recipe for not having fun.  Add more time.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-28 15:05:40 -04:00
Mark Stapp
be161ba4a2
Merge pull request #16679 from donaldsharp/nhrp_test_documentation
doc: Update topotest doc to include iptables is needed
2024-08-28 14:11:18 -04:00
Mark Stapp
8b23abf36e
Merge pull request #16300 from donaldsharp/local_connected
Local connected
2024-08-28 14:10:14 -04:00
Donald Sharp
184dccca60 zebra: Move prefix lookup to outside re loop
Move the prefix lookup/comparison to outside the re loop
and into the rn loop, since that is where the code should
actually be.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-28 13:18:00 -04:00
Donald Sharp
8e4389da56
Merge pull request #16676 from opensourcerouting/fix/lua_nexthop_handling_in_lua
Lua stack dumping
2024-08-28 12:45:39 -04:00
Jafar Al-Gharaibeh
04b763bcec
Merge pull request #16677 from donaldsharp/mgmt_map
mgmtd: Ensure map is NULL
2024-08-28 12:38:11 -04:00
Donald Sharp
4ec3f1ef0f doc: Update topotest doc to include iptables is needed
The nhrp tests skip tests that do not have iptables installed.
As such we have ended up with a situation where the nrhp test
is now failing locally for me because I have iptables installed
and if the CI system had iptables installed it would have detected
the problem as well.

Let's document that iptables is needed to do testing.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-28 12:05:41 -04:00
Donald Sharp
598d9a1f17 tests: Fix bgp_default_originate_topo1_3
This test was killing bgp on r1 and r2
and then immediately testing that the
default route transitioned.  Unfortunately
the test was written that under load the
system might be in a bad state.  Let's
modify the code to check for a bgp version
change and then that the bgp state has
come back up

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-28 11:09:32 -04:00
Donald Sharp
43508d3c67 mgmtd: Ensure map is NULL
Build is complaining:
build	27-Aug-2024 05:46:38	mgmtd/mgmt_be_adapter.c: In function ‘mgmt_register_client_xpath’:
build	27-Aug-2024 05:46:38	mgmtd/mgmt_be_adapter.c:274:27: warning: ‘maps’ may be used uninitialized [-Wmaybe-uninitialized]
build	27-Aug-2024 05:46:38	  274 |         map = darr_append(*maps);
build	27-Aug-2024 05:46:38	      |                           ^
build	27-Aug-2024 05:46:38	mgmtd/mgmt_be_adapter.c:250:36: note: ‘maps’ was declared here
build	27-Aug-2024 05:46:38	  250 |         struct mgmt_be_xpath_map **maps, *map;
build	27-Aug-2024 05:46:38	      |                                    ^~~~

Let's make the compiler happy, even though there is no problem.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-28 10:09:52 -04:00
Donatas Abraitis
a0a2a35ed3 lib: Add a helper function to dump Lua stack
Very handy for debugging.

In Lua script just use "log.trace(table)":

```
function on_rib_process_dplane_results(ctx)
	log.trace(ctx.rinfo.zd_ng)
end
```

You will get something like:

```
Aug 28 17:04:36 donatas-laptop zebra[3782199]: [GCZ7N-MM9D9] {
                                                 1: {
                                                   type: 2
                                                   weight: 1
                                                   flags: 5
                                                   backup_idx: 0
                                                   vrf_id: 0
                                                   nh_encap_type: 0
                                                   gate: {
                                                     value: 5.87967e+08
                                                     string: "192.168.11.35"
                                                   }
                                                   nh_label_type: 0
                                                   srte_color: 0
                                                   ifindex: 0
                                                   backup_num: 0
                                                 }
                                                 2: {
                                                   type: 3
                                                   weight: 1
                                                   flags: 3
                                                   backup_idx: 0
                                                   vrf_id: 0
                                                   nh_encap_type: 0
                                                   nh_label_type: 0
                                                   srte_color: 0
                                                   ifindex: 4
                                                   backup_num: 0
                                                 }
                                               }
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-08-28 17:08:45 +03:00
Donatas Abraitis
b1012b693f lib: Start from 1, not 0 when creating Lua tables for nexthops
Lua technically enumerates arrays from 1, not 0.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-08-28 15:31:47 +03:00
Srujana
9112fb367b lib: Memory spike reduction for sh cmds at scale
The output buffer vty->obuf is a linked list where
each element is of 4KB.
Currently, when a huge sh command  like <show ip route json>
is executed on a large scale, all the vty_outs are
processed and the entire data is accumulated.
After the entire vty execution, vtysh_flush proceeses
and puts this data in the socket (131KB at a time).

Problem here is the memory spike for such heavy duty
show commands.

The fix here is to chunkify the output on VTY shell by
flushing it intermediately for every 128 KB of output
accumulated and free the memory allocated for the buffer data.

This way, we achieve ~25-30% reduction in the memory spike.

Fixes: #16498
Note: This is a continuation of MR #16498

Signed-off-by: Srujana <skanchisamud@nvidia.com>

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
2024-08-27 12:47:00 -07:00
Donatas Abraitis
3d2c589766
Merge pull request #16655 from louis-6wind/fix-bmp-bpi-extra
bgpd: fix labels static-analyser
2024-08-27 22:14:13 +03:00
Donald Sharp
ae49b992ae
Merge pull request #16651 from opensourcerouting/fix/blackhole_community_bgpd
bgpd: Respect BLACKHOLE community for internal BGP peering also
2024-08-27 15:11:00 -04:00
Jafar Al-Gharaibeh
fa7c77f293
Merge pull request #16665 from louis-6wind/fix-flexalgo-crash-no-te
isisd: fix crash at flex-algo without mpls-te
2024-08-27 13:52:50 -04:00
Louis Scalbert
6ce6b7a856 isisd: fix update link params after circuit is up
If the link-params are set when the circuit not yet up, the link-params
are never updated.

isis_link_params_update() is called from isis_circuit_up() but returns
immediately because circuit->state != C_STATE_UP. circuit->state is
updated in isis_csm_state_change after isis_circuit_up().

> struct isis_circuit *isis_csm_state_change(enum isis_circuit_event event,
> 					   struct isis_circuit *circuit,
> 					   void *arg)
> {
> [...]
> 			if (isis_circuit_up(circuit) != ISIS_OK) {
> 				isis_circuit_deconfigure(circuit, area);
> 				break;
> 			}
> 			circuit->state = C_STATE_UP;
> 			isis_event_circuit_state_change(circuit, circuit->area,
> 							1);

Do not return isis_link_params_update() if circuit->state != C_STATE_UP.

Fixes: 0fdd8b2b11 ("isisd: update link params after circuit is up")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2024-08-27 18:27:06 +02:00
Russ White
3150d963e6
Merge pull request #16652 from opensourcerouting/fix/prefix_sid_handling
bgpd: Filter Prefix-SID, Encap, PMSI Tunnel
2024-08-27 10:57:44 -04:00
Russ White
bd0fdc443e
Merge pull request #16610 from Jafaral/no-py
tools, ospfclient: add a config option to skip installing python scripts
2024-08-27 10:38:09 -04:00
Louis Scalbert
cd81d28ae2 isisd: fix crash at flex-algo without mpls-te
Fix crash when flex-algo is configured and mpls-te is disabled.

> interface eth0
>  ip router isis 1
> !
> router isis 1
>  flex-algo 129
>   dataplane sr-mpls
>   advertise-definition

> #0  __pthread_kill_implementation (no_tid=0, signo=11, threadid=140486233631168) at ./nptl/pthread_kill.c:44
> #1  __pthread_kill_internal (signo=11, threadid=140486233631168) at ./nptl/pthread_kill.c:78
> #2  __GI___pthread_kill (threadid=140486233631168, signo=signo@entry=11) at ./nptl/pthread_kill.c:89
> #3  0x00007fc5802e9476 in __GI_raise (sig=11) at ../sysdeps/posix/raise.c:26
> #4  0x00007fc58076021f in core_handler (signo=11, siginfo=0x7ffd38d42470, context=0x7ffd38d42340) at lib/sigevent.c:248
> #5  <signal handler called>
> #6  0x000055c527f798c9 in isis_link_params_update_asla (circuit=0x55c52aaed3c0, ifp=0x55c52a1044e0) at isisd/isis_te.c:176
> #7  0x000055c527fb29da in isis_instance_flex_algo_create (args=0x7ffd38d43120) at isisd/isis_nb_config.c:2875
> #8  0x00007fc58072655b in nb_callback_create (context=0x55c52ab1d2f0, nb_node=0x55c529f72950, event=NB_EV_APPLY, dnode=0x55c52ab06230, resource=0x55c52ab189f8, errmsg=0x7ffd38d43750 "",
>     errmsg_len=8192) at lib/northbound.c:1262
> #9  0x00007fc580727625 in nb_callback_configuration (context=0x55c52ab1d2f0, event=NB_EV_APPLY, change=0x55c52ab189c0, errmsg=0x7ffd38d43750 "", errmsg_len=8192) at lib/northbound.c:1662
> #10 0x00007fc580727c39 in nb_transaction_process (event=NB_EV_APPLY, transaction=0x55c52ab1d2f0, errmsg=0x7ffd38d43750 "", errmsg_len=8192) at lib/northbound.c:1794
> #11 0x00007fc580725f77 in nb_candidate_commit_apply (transaction=0x55c52ab1d2f0, save_transaction=true, transaction_id=0x0, errmsg=0x7ffd38d43750 "", errmsg_len=8192)
>     at lib/northbound.c:1131
> #12 0x00007fc5807260d1 in nb_candidate_commit (context=..., candidate=0x55c529f0a730, save_transaction=true, comment=0x0, transaction_id=0x0, errmsg=0x7ffd38d43750 "", errmsg_len=8192)
>     at lib/northbound.c:1164
> #13 0x00007fc58072d220 in nb_cli_classic_commit (vty=0x55c52a0fc6b0) at lib/northbound_cli.c:51
> #14 0x00007fc58072d839 in nb_cli_apply_changes_internal (vty=0x55c52a0fc6b0,
>     xpath_base=0x7ffd38d477f0 "/frr-isisd:isis/instance[area-tag='1'][vrf='default']/flex-algos/flex-algo[flex-algo='129']", clear_pending=false) at lib/northbound_cli.c:178
> #15 0x00007fc58072dbcf in nb_cli_apply_changes (vty=0x55c52a0fc6b0, xpath_base_fmt=0x55c528014de0 "./flex-algos/flex-algo[flex-algo='%ld']") at lib/northbound_cli.c:234
> #16 0x000055c527fd3403 in flex_algo_magic (self=0x55c52804f1a0 <flex_algo_cmd>, vty=0x55c52a0fc6b0, argc=2, argv=0x55c52ab00ec0, algorithm=129, algorithm_str=0x55c52ab120d0 "129")
>     at isisd/isis_cli.c:3752
> #17 0x000055c527fc97cb in flex_algo (self=0x55c52804f1a0 <flex_algo_cmd>, vty=0x55c52a0fc6b0, argc=2, argv=0x55c52ab00ec0) at ./isisd/isis_cli_clippy.c:6445
> #18 0x00007fc5806b9abc in cmd_execute_command_real (vline=0x55c52aaf78f0, vty=0x55c52a0fc6b0, cmd=0x0, up_level=0) at lib/command.c:984
> #19 0x00007fc5806b9c35 in cmd_execute_command (vline=0x55c52aaf78f0, vty=0x55c52a0fc6b0, cmd=0x0, vtysh=0) at lib/command.c:1043
> #20 0x00007fc5806ba1e5 in cmd_execute (vty=0x55c52a0fc6b0, cmd=0x55c52aae6bd0 "flex-algo 129\n", matched=0x0, vtysh=0) at lib/command.c:1209
> #21 0x00007fc580782ae1 in vty_command (vty=0x55c52a0fc6b0, buf=0x55c52aae6bd0 "flex-algo 129\n") at lib/vty.c:615
> #22 0x00007fc580784a05 in vty_execute (vty=0x55c52a0fc6b0) at lib/vty.c:1378
> #23 0x00007fc580787131 in vtysh_read (thread=0x7ffd38d4ab10) at lib/vty.c:2373
> #24 0x00007fc58077b605 in event_call (thread=0x7ffd38d4ab10) at lib/event.c:2011
> #25 0x00007fc5806f8976 in frr_run (master=0x55c529df9b30) at lib/libfrr.c:1212
> #26 0x000055c527f301bc in main (argc=5, argv=0x7ffd38d4ad58, envp=0x7ffd38d4ad88) at isisd/isis_main.c:350
> (gdb) f 6
> #6  0x000055c527f798c9 in isis_link_params_update_asla (circuit=0x55c52aaed3c0, ifp=0x55c52a1044e0) at isisd/isis_te.c:176
> 176                     list_delete_all_node(ext->aslas);
> (gdb) p ext
> $1 = (struct isis_ext_subtlvs *) 0x0

Fixes: ae27101e6f ("isisd: fix building asla at first flex-algo config")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
2024-08-27 16:06:08 +02:00
Igor Ryzhov
830972cab2 lib: common debug status output
Implement common code for debug status output and remove daemon-specific
code that is duplicated everywhere.

Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
2024-08-27 09:53:02 -04:00
Igor Ryzhov
82e52e0f21 lib: common debug config output
Implement common code for debug config output and remove daemon-specific
code that is duplicated everywhere.

Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
2024-08-27 09:53:02 -04:00
Igor Ryzhov
5dac696154 lib: rework debug init
The debug library allows to register a `debug_set_all` callback which
should enable all debugs in a daemon. This callback is implemented
exactly the same in each daemon. Instead of duplicating the code, rework
the lib to allow registration of each debug type, and implement the
common code only once in the lib.

Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
2024-08-27 09:53:02 -04:00
Russ White
5a6cb0bf75
Merge pull request #16103 from mjstapp/fix_5549_nhg_type
zebra: be consistent about v6 nexthops for v4 routes
2024-08-27 09:46:53 -04:00
Mark Stapp
17fffbad1b
Merge pull request #16656 from donaldsharp/minor_fix_for_pim_dr_nondr
tests: Allow convergence before adding multicast routes
2024-08-27 08:17:46 -04:00
Donald Sharp
37dd51867f tests: Add some tests to show new behavior works as expected
a) A noprefix address by itself should not create a connected route.
   This was pre-existing.
b) A noprefix address with a corresponding route should result in a
   connected route.  This is how NetworkManager appears to work.
   This is new behavior, so a new test.
c) A route is added to the system from someone else.
   This is new behavior, so a new test.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-27 06:25:34 -04:00
Donald Sharp
9bc0cd8241 zebra: Prevent accidental re memory leak in odd case
There exists a path in rib_add_multipath where if a decision
is made to not use the passed in re, we just drop the memory
instead of freeing it.  Let's free it.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-27 06:25:34 -04:00
Donald Sharp
d528c02a20 zebra: Handle kernel routes appropriately
Current code intentionally ignores kernel routes.  Modify
zebra to allow these routes to be read in on linux.  Also
modify zebra to look to see if a route should be treated
as a connected and mark it as such.

Additionally this should properly handle some of the issues
being seen with NOPREFIXROUTE.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-27 06:25:34 -04:00
Donald Sharp
bdfccf69fa zebra: Expose rib_update_handle_vrf_all
This function will be used on interface down
events to allow for kernel routes to be cleaned
up.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-27 06:25:34 -04:00
Donald Sharp
f450e1cda4 zebra: Make p and src_p const for rib_delete
The prefix'es p and src_p are not const.  Let's make
them so.  Useful to signal that we will not change this
data.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-27 06:25:34 -04:00
Donatas Abraitis
7a461479a0 bgpd: Respect BLACKHOLE community for internal BGP peering also
rfc7999 does not define to use this technique ONLY for EBGP sessions.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2024-08-27 10:08:54 +03:00
Donald Sharp
52f292d188
Merge pull request #16657 from Jafaral/ospfv3_test_fix
tests: Fix frequent ospfv3 basic functionality test failure
2024-08-26 20:29:38 -04:00
Jafar Al-Gharaibeh
0d745741c9 tests: Fix frequent ospfv3 basic functionality test failure
The dead timer is set to 4 seconds, while the hello interval is set to 6535.
This test will only pass of the platform is fast enough for ospfv3 to
converge in 4 seconds. These timers were already tested multiple time earlier.
This test should just make sure that the boundary value 65535 is configurable,

Other changes in this commit:
  - add sequence numbers to the dead intervals tests to make it easier to
    track test faliures.
  - swap the config order in one test to match order with all other tests.

Signed-off-by: Jafar Al-Gharaibeh <jafar@atcorp.com>
2024-08-26 16:35:37 -05:00
Donald Sharp
3c4ffcacfe tests: Allow convergence before adding multicast routes
Current code adds a new vlan interface, sets up ospf and
pim on it and immediately starts shoving data down the pipes.
This of course has the fun thing where the IGP and pim do not
always come up in a nice neat manner and the test is looking
for state from a nice neat come up, even though pim is `working`
correctly it is not correct for what the test wants.

Modify the code to ensure that ospf is up and has propagated
the route where it is needed as well as that pim neighbors have
properly come up, then initiate the multicast streams and igmp
reports.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2024-08-26 16:02:46 -04:00