Commit Graph

35921 Commits

Author SHA1 Message Date
Y Bharath
d4c307d075 yang: Added missed prefix to the yang file
Corrected warning by including the module

Signed-off-by: y-bharath14 <y.bharath@samsung.com>
(cherry picked from commit ed832b70ec)
2024-07-24 10:39:46 +00:00
Jafar Al-Gharaibeh
bf7eb3ff99
Merge pull request #16442 from FRRouting/mergify/bp/dev/10.1/pr-16330
zebra: Properly note that a nhg's nexthop has gone down (backport #16330)
2024-07-24 00:47:58 -04:00
Russ White
d9ca4b850e
Merge pull request #16443 from FRRouting/mergify/bp/dev/10.1/pr-16376
ospfd: fix internal ldp-sync state flags when feature is disabled (backport #16376)
2024-07-23 18:11:46 -04:00
Donatas Abraitis
5bc6621c4b
Merge pull request #16441 from FRRouting/mergify/bp/dev/10.1/pr-16437
bgpd: backpressure - Avoid use after free (backport #16437)
2024-07-23 23:38:23 +03:00
Christian Breunig
f34dfab805 ospfd: fix internal ldp-sync state flags when feature is disabled
When enabling "mpls ldp-sync" under "router ospf" ospfd configures
SET_FLAG(ldp_sync_info->flags, LDP_SYNC_FLAG_IF_CONFIG) so internally knowing
that the ldp-sync feature is enabled. However the flag is not cleared when
turning of the feature using "nompls ldp-sync"!

https://github.com/FRRouting/frr/issues/16375

Signed-off-by: Christian Breunig <christian@breunig.cc>
(cherry picked from commit 5a70378a47)
2024-07-23 14:52:53 +00:00
Donald Sharp
ef873b1b33 zebra: Properly note that a nhg's nexthop has gone down
Current code when a link is set down is to just mark the
nexthop group as not properly setup.  Leaving situations
where when an interface goes down and show output is
entered we see incorrect state.  This is true for anything
that would be checking those flags at that point in time.

Modify the interface down nexthop group code to notice the
nexthops appropriately ( and I mean set the appropriate flags )
and to allow a `show ip route` command to actually display
what is going on with the nexthops.

eva# show ip route 1.0.0.0
Routing entry for 1.0.0.0/32
  Known via "sharp", distance 150, metric 0, best
  Last update 00:00:06 ago
  * 192.168.44.33, via dummy1, weight 1
  * 192.168.45.33, via dummy2, weight 1

sharpd@eva:~/frr1$ sudo ip link set dummy2 down

eva# show ip route 1.0.0.0
Routing entry for 1.0.0.0/32
  Known via "sharp", distance 150, metric 0, best
  Last update 00:00:12 ago
  * 192.168.44.33, via dummy1, weight 1
    192.168.45.33, via dummy2 inactive, weight 1

Notice now that the 1.0.0.0/32 route now correctly
displays the route for the nexthop group entry.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 266b061994)
2024-07-23 14:50:46 +00:00
Rajasekar Raja
4c68489d2f bgpd: backpressure - Avoid use after free
Coverity complains there is a use after free (1598495 and 1598496)
At this point, most likely dest->refcount cannot go 1 and free up
the dest, but there might be some code path where this can happen.

Fixing this with a simple order change (no harm fix).

Ticket :#4001204

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
(cherry picked from commit 40965e5999)
2024-07-23 14:47:38 +00:00
Donald Sharp
743eecd288
Merge pull request #16433 from FRRouting/mergify/bp/dev/10.1/pr-16309
pimd: fix crash on non-existent interface (backport #16309)
2024-07-23 10:45:11 -04:00
Jafar Al-Gharaibeh
edbd0af9d8
Merge pull request #16430 from FRRouting/mergify/bp/dev/10.1/pr-16429
lib: move non-error from __log_err to __dbg (backport #16429)
2024-07-22 14:02:32 -04:00
Louis Scalbert
4e652de703 pimd: fix crash on non-existent interface
Fix the following crash when pim options are (un)configured on an
non-existent interface.

> r1(config)# int fgljdsf
> r1(config-if)# no ip pim unicast-bsm
> vtysh: error reading from pimd: Connection reset by peer (104)Warning: closing connection to pimd because of an I/O error!

> #0  raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
> #1  0x00007f70c8f32249 in core_handler (signo=11, siginfo=0x7fffff88e4f0, context=0x7fffff88e3c0) at lib/sigevent.c:258
> #2  <signal handler called>
> #3  0x0000556cfdd9b16d in lib_interface_pim_address_family_unicast_bsm_modify (args=0x7fffff88f130) at pimd/pim_nb_config.c:1910
> #4  0x00007f70c8efdcb5 in nb_callback_modify (context=0x556d00032b60, nb_node=0x556cffeeb9b0, event=NB_EV_APPLY, dnode=0x556d00031670, resource=0x556d00032b48, errmsg=0x7fffff88f710 "", errmsg_len=8192)
>     at lib/northbound.c:1538
> #5  0x00007f70c8efe949 in nb_callback_configuration (context=0x556d00032b60, event=NB_EV_APPLY, change=0x556d00032b10, errmsg=0x7fffff88f710 "", errmsg_len=8192) at lib/northbound.c:1888
> #6  0x00007f70c8efee82 in nb_transaction_process (event=NB_EV_APPLY, transaction=0x556d00032b60, errmsg=0x7fffff88f710 "", errmsg_len=8192) at lib/northbound.c:2016
> #7  0x00007f70c8efd658 in nb_candidate_commit_apply (transaction=0x556d00032b60, save_transaction=true, transaction_id=0x0, errmsg=0x7fffff88f710 "", errmsg_len=8192) at lib/northbound.c:1356
> #8  0x00007f70c8efd78e in nb_candidate_commit (context=..., candidate=0x556cffeb0e80, save_transaction=true, comment=0x0, transaction_id=0x0, errmsg=0x7fffff88f710 "", errmsg_len=8192) at lib/northbound.c:1389
> #9  0x00007f70c8f03e58 in nb_cli_classic_commit (vty=0x556d00025a80) at lib/northbound_cli.c:51
> #10 0x00007f70c8f043f8 in nb_cli_apply_changes_internal (vty=0x556d00025a80,
>     xpath_base=0x7fffff893bb0 "/frr-interface:lib/interface[name='fgljdsf']/frr-pim:pim/address-family[address-family='frr-routing:ipv4']", clear_pending=false) at lib/northbound_cli.c:178
> #11 0x00007f70c8f0475d in nb_cli_apply_changes (vty=0x556d00025a80, xpath_base_fmt=0x556cfdde9fe0 "./frr-pim:pim/address-family[address-family='%s']") at lib/northbound_cli.c:234
> #12 0x0000556cfdd8298f in pim_process_no_unicast_bsm_cmd (vty=0x556d00025a80) at pimd/pim_cmd_common.c:3493
> #13 0x0000556cfddcf782 in no_ip_pim_ucast_bsm (self=0x556cfde40b20 <no_ip_pim_ucast_bsm_cmd>, vty=0x556d00025a80, argc=4, argv=0x556d00031500) at pimd/pim_cmd.c:4950
> #14 0x00007f70c8e942f0 in cmd_execute_command_real (vline=0x556d00032070, vty=0x556d00025a80, cmd=0x0, up_level=0) at lib/command.c:1002
> #15 0x00007f70c8e94451 in cmd_execute_command (vline=0x556d00032070, vty=0x556d00025a80, cmd=0x0, vtysh=0) at lib/command.c:1061
> #16 0x00007f70c8e9499f in cmd_execute (vty=0x556d00025a80, cmd=0x556d00030320 "no ip pim unicast-bsm", matched=0x0, vtysh=0) at lib/command.c:1227
> #17 0x00007f70c8f51e44 in vty_command (vty=0x556d00025a80, buf=0x556d00030320 "no ip pim unicast-bsm") at lib/vty.c:616
> #18 0x00007f70c8f53bdd in vty_execute (vty=0x556d00025a80) at lib/vty.c:1379
> #19 0x00007f70c8f55d59 in vtysh_read (thread=0x7fffff896600) at lib/vty.c:2374
> #20 0x00007f70c8f4b209 in event_call (thread=0x7fffff896600) at lib/event.c:2011
> #21 0x00007f70c8ed109e in frr_run (master=0x556cffdb4ea0) at lib/libfrr.c:1217
> #22 0x0000556cfdddec12 in main (argc=2, argv=0x7fffff896828, envp=0x7fffff896840) at pimd/pim_main.c:165
> (gdb) f 3
> #3  0x0000556cfdd9b16d in lib_interface_pim_address_family_unicast_bsm_modify (args=0x7fffff88f130) at pimd/pim_nb_config.c:1910
> 1910			pim_ifp->ucast_bsm_accept =
> (gdb) list
> 1905		case NB_EV_ABORT:
> 1906			break;
> 1907		case NB_EV_APPLY:
> 1908			ifp = nb_running_get_entry(args->dnode, NULL, true);
> 1909			pim_ifp = ifp->info;
> 1910			pim_ifp->ucast_bsm_accept =
> 1911				yang_dnode_get_bool(args->dnode, NULL);
> 1912
> 1913			break;
> 1914		}
> (gdb) p pim_ifp
> $1 = (struct pim_interface *) 0x0

Fixes: 3bb513c399 ("lib: adapt to version 2 of libyang")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
(cherry picked from commit 6952bea5cd)
2024-07-22 16:39:09 +00:00
Christian Hopps
f1e664fc78 lib: move non-error from __log_err to __dbg
Additionally, print `errmsg_if_any` in successful debug messages
if non-NULL.

fixes #16386 #16043

Signed-off-by: Christian Hopps <chopps@labn.net>
(cherry picked from commit 7afd7d99f2)
2024-07-22 14:37:04 +00:00
Donatas Abraitis
2a06d1c5e0
Merge pull request #16404 from FRRouting/mergify/bp/dev/10.1/pr-16368
bgpd: backpressure - fix to properly remove dest for bgp under deletion (backport #16368)
2024-07-16 22:08:00 -07:00
Jafar Al-Gharaibeh
b13911378d
Merge pull request #16399 from FRRouting/mergify/bp/dev/10.1/pr-16374
bgpd: Mark VRF instance as auto created if import vrf is configured for this instance (backport #16374)
2024-07-16 20:07:15 -04:00
Donatas Abraitis
e014a465df
Merge pull request #16379 from FRRouting/mergify/bp/dev/10.1/pr-16350
zebra: Fix to avoid two Vrfs with same table ids (backport #16350)
2024-07-16 16:00:05 -07:00
Donatas Abraitis
2295328b22
Merge pull request #16395 from FRRouting/mergify/bp/dev/10.1/pr-16365
isisd: fix crash when calculating the neighbor spanning tree based on the fragmented LSP (backport #16365)
2024-07-16 15:51:01 -07:00
Rajasekar Raja
25d515cb25 bgpd: backpressure - Improve debuggability
Improve debuggability in backpressure code.

Ticket :#3980988

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
(cherry picked from commit 186db96c06)
2024-07-16 16:15:32 +00:00
Rajasekar Raja
9f9c0bc244 bgpd: backpressure - fix to properly remove dest for bgp under deletion
In case of imported routes (L3vni/vrf leaks), when a bgp instance is
being deleted, the peer->bgp comparision with the incoming bgp to remove
the dest from the pending fifo is wrong. This can lead to the fifo
having stale entries resulting in crash.

Two changes are done here.
 - Instead of pop/push items in list if the struct bgp doesnt match,
   simply iterate the list and remove the expected ones.

 - Corrected the way bgp is fetched from dest rather than relying on
   path_info->peer so that it works for all kinds of routes.

Ticket :#3980988

Signed-off-by: Chirag Shah <chirag @nvidia.com>
Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
(cherry picked from commit 4395fcd8e1)
2024-07-16 16:15:31 +00:00
Donatas Abraitis
2ab249e580 bgpd: Skip empty (auto created) VRF instances when deleting a default BGP instance
Auto created VRF instances does not have any config, so it's not relevant
depending on them.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit bfedb38110)
2024-07-16 14:10:07 +00:00
Donatas Abraitis
19a0e3d544 tests: Check if VRF instance has a different ASN than a default VRF
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit c6c0403c61)
2024-07-16 14:10:07 +00:00
Donatas Abraitis
b5db1e7352 bgpd: Skip automatically created BGP instances for show CMDs
When using e.g. `adverise-all-vni`, and/or `import vrf ...`, the VRF instance
is created with a default's VRF ASN and tagged as AUTO_VRF. We MUST skip them
here also.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 03c086866b)
2024-07-16 14:10:07 +00:00
Donatas Abraitis
906711c8f6 tests: Check if multiple VRF instances can have different ASNs
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 7540364e58)
2024-07-16 14:10:06 +00:00
Donatas Abraitis
6c573ce305 bgpd: Mark VRF instance as auto created if import vrf is configured for this instance
If we create a new BGP instance (in this case VRF instance), it MUST be marked
as auto created, to avoid bgpd changing VRF instance's ASN to the default VRF's.

That's because of the ordering when FRR reload is happening.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 80a4f87c9a)
2024-07-16 14:10:06 +00:00
zhou-run
f727c63790 isisd: fix crash when calculating the neighbor spanning tree based on the fragmented LSP
1. When the root IS regenerates an LSP, it calls lsp_build() -> lsp_clear_data() to free the TLV memory of the first fragment and all other fragments. If the number of fragments in the regenerated LSP decreases or if no fragmentation is needed, the extra LSP fragments are not immediately deleted. Instead, lsp_seqno_update() -> lsp_purge() is called to set the remaining time to zero and start aging, while also notifying other IS nodes to age these fragments. lsp_purge() usually does not reset lsp->hdr.seqno to zero because the LSP might recover during the aging process.
2. When other IS nodes receive an LSP, they always call process_lsp() -> isis_unpack_tlvs() to allocate TLV memory for the LSP. This does not differentiate whether the received LSP has a remaining lifetime of zero. Therefore, it is rare for an LSP of a non-root IS to have empty TLVs. Of course, if an LSP with a remaining time of zero and already corrupted is received, lsp_update() -> lsp_purge() will be called to free the TLV memory of the LSP, but this scenario is rare.
3. In LFA calculations, neighbors of the root IS are traversed, and each neighbor is taken as a new root to compute the neighbor SPT. During this process, the old root IS will serve as a neighbor of the new root IS, triggering a call to isis_spf_process_lsp() to parse the LSP of the old root IS and obtain its IP vertices and neighboring IS vertices. However, isis_spf_process_lsp() only checks whether the TLVs in the first fragment of the LSP exist, and does not check the TLVs in the fragmented LSP. If the TLV memory of the fragmented LSP of the old root IS has been freed, it can lead to a null pointer access, causing the current crash.

Additionally, for the base SPT, there are only two places where the LSP of the root IS is parsed:
1. When obtaining the UP neighbors of the root IS via spf_adj_list_parse_lsp().
2. When preloading the IP vertices of the root IS via isis_lsp_iterate_ip_reach().
Both of these checks ensure that frag->tlvs is not null, and they do not subsequently call isis_spf_process_lsp() to parse the root IS's LSP. It is very rare for non-root IS LSPs to have empty TLVs unless they are corrupted LSPs awaiting deletion. If it happens, a crash will occur.

The backtrace is as follows:
(gdb) bt
#0  0x00007f3097281fe1 in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f30973a2972 in core_handler (signo=11, siginfo=0x7ffce66c2870, context=0x7ffce66c2740) at ../lib/sigevent.c:261
#2  <signal handler called>
#3  0x000055dfa805512b in isis_spf_process_lsp (spftree=0x55dfa950eee0, lsp=0x55dfa94cb590, cost=10, depth=1, root_sysid=0x55dfa950ef6c "", parent=0x55dfa952fca0)
    at ../isisd/isis_spf.c:898
#4  0x000055dfa805743b in isis_spf_loop (spftree=0x55dfa950eee0, root_sysid=0x55dfa950ef6c "") at ../isisd/isis_spf.c:1688
#5  0x000055dfa805784f in isis_run_spf (spftree=0x55dfa950eee0) at ../isisd/isis_spf.c:1808
#6  0x000055dfa8037ff5 in isis_spf_run_neighbors (spftree=0x55dfa9474440) at ../isisd/isis_lfa.c:1259
#7  0x000055dfa803ac17 in isis_spf_run_lfa (area=0x55dfa9477510, spftree=0x55dfa9474440) at ../isisd/isis_lfa.c:2300
#8  0x000055dfa8057964 in isis_run_spf_with_protection (area=0x55dfa9477510, spftree=0x55dfa9474440) at ../isisd/isis_spf.c:1827
#9  0x000055dfa8057c15 in isis_run_spf_cb (thread=0x7ffce66c38e0) at ../isisd/isis_spf.c:1889
#10 0x00007f30973bbf04 in thread_call (thread=0x7ffce66c38e0) at ../lib/thread.c:1990
#11 0x00007f309735497b in frr_run (master=0x55dfa91733c0) at ../lib/libfrr.c:1198
#12 0x000055dfa8029d5d in main (argc=5, argv=0x7ffce66c3b08, envp=0x7ffce66c3b38) at ../isisd/isis_main.c:273
(gdb) f 3
#3  0x000055dfa805512b in isis_spf_process_lsp (spftree=0x55dfa950eee0, lsp=0x55dfa94cb590, cost=10, depth=1, root_sysid=0x55dfa950ef6c "", parent=0x55dfa952fca0)
    at ../isisd/isis_spf.c:898
898     ../isisd/isis_spf.c: No such file or directory.
(gdb) p te_neighs
$1 = (struct isis_item_list *) 0x120
(gdb) p lsp->tlvs
$2 = (struct isis_tlvs *) 0x0
(gdb) p lsp->hdr
$3 = {pdu_len = 27, rem_lifetime = 0, lsp_id = "\000\000\000\000\000\001\000\001", seqno = 4, checksum = 59918, lsp_bits = 1 '\001'}

The backtrace provided above pertains to version 8.5.4, but it seems that the same issue exists in the code of the master branch as well.

I have reviewed the process for calculating the SPT based on the LSP, and isis_spf_process_lsp() is the only function that does not check whether the TLVs in the fragments are empty. Therefore, I believe that modifying this function alone should be sufficient. If the TLVs of the current fragment are already empty, we do not need to continue processing subsequent fragments. This is consistent with the behavior where we do not process fragments if the TLVs of the first fragment are empty.
Of course, one could argue that lsp_purge() should still retain the TLV memory, freeing it and then reallocating it if needed. However, this is a debatable point because in some scenarios, it is permissible for the LSP to have empty TLVs. For example, after receiving an SNP (Sequence Number PDU) message, an empty LSP (with lsp->hdr.seqno = 0) might be created by calling lsp_new. If the corresponding LSP message is discarded due to domain or area authentication failure, the TLV memory wouldn't be allocated.

Test scenario:
In an LFA network, importing a sufficient number of static routes to cause LSP fragmentation, and then rolling back the imported static routes so that the LSP is no longer fragmented, can easily result in this issue.

Signed-off-by: zhou-run <zhou.run@h3c.com>
(cherry picked from commit e905177a8c)
2024-07-16 14:07:50 +00:00
Mark Stapp
ed272429fc
Merge pull request #16387 from FRRouting/mergify/bp/dev/10.1/pr-16373
staticd: fix missing static routes (backport #16373)
2024-07-16 07:55:28 -04:00
anlan_cs
11812ec93b zebra: fix missing static routes
Use `vtysh` with this input file:
```
ip route A nh1
ip route A nh2
ip route B nh1
ip route B nh2
```

When running "ip route B" with "nh1" and "nh2", the procedure maybe is:
1) Create the two nexthops: "nh1" and "nh2".
2) Register "nh1" with `static_zebra_nht_register()`, then the states of both
   "nh1" and "nht2" are set to "STATIC_SENT_TO_ZEBRA".
3) Register "nh2" with `static_zebra_nht_register()`, then only the routes with
   nexthop of "STATIC_START" will be sent to zebra.

So, send the routes with the nexthop of "STATIC_SENT_TO_ZEBRA" to zebra.

Signed-off-by: anlan_cs <vic.lan@pica8.com>
(cherry picked from commit 4518d386f7)
2024-07-15 18:46:22 +00:00
Jafar Al-Gharaibeh
c09f1aeb85
Merge pull request #16378 from FRRouting/mergify/bp/dev/10.1/pr-16363
tests: tweak timers to avoid frequent failures on slow CI hardware (backport #16363)
2024-07-15 14:43:21 -04:00
Rajasekar Raja
09bfdc3f81 zebra: Fix to avoid two Vrfs with same table ids
During internal testing, when the following sequence is followed, two
non default vrfs end up pointing to the same table-id

 - Initially vrf201 has table id 1002
 - ip link add dev vrf202 type vrf table 1002
 - ip link set dev vrf202 up
 - ip link set dev <intrerface> master vrf202

This will ideally lead to zebra exit since this is a misconfiguration as
expected.

However if we perform a restart frr.service at this point, we end up
having two vrfs pointing to same table-id and bad things can happen.
This is because in the interface_vrf_change, we incorrectly check for
vrf_lookup_by_id() to evaluate if there is a misconfig. This works well
for a non restart case but not for the startup case.

root@mlx-3700-20:mgmt:/var/log/frr# sudo vtysh -c "sh vrf"
vrf mgmt id 37 table 1001
vrf vrf201 id 46 table 1002
vrf vrf202 id 59 table 1002 >>>>

Fix: in all cases of misconfiguration, exit zebra as expected.

Ticket :#3970414

Signed-off-by: Donald Sharp <sharpd@nvidia.com>

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
(cherry picked from commit c77e15710d)
2024-07-14 00:14:24 +00:00
Jafar Al-Gharaibeh
7ac9e7f96d tests: tweak timers to avoid frequent failures on slow CI hardware
Signed-off-by: Jafar Al-Gharaibeh <jafar@atcorp.com>
(cherry picked from commit ad7a1f9487)
2024-07-14 00:13:20 +00:00
Donald Sharp
641ebd910d
Merge pull request #16335 from FRRouting/mergify/bp/dev/10.1/pr-16226
ldpd: fix wrong gtsm count (backport #16226)
2024-07-03 08:41:01 -04:00
Jafar Al-Gharaibeh
8295bdba22
Merge pull request #16328 from FRRouting/mergify/bp/dev/10.1/pr-16303
isisd: fix crash when obtaining the next hop to calculate LFA on LAN links (backport #16303)
2024-07-02 16:55:12 -04:00
anlan_cs
6b7c9f0ba5 ldpd: fix wrong gtsm count
In linux networking stack, the received mpls packets will be processed
by the host *twice*, one as mpls packet, the other as ip packet, so
its ttl decreased 1.

So, we need release the `IP_MINTTL` value if gtsm is enabled, it is for the
mpls packets of neighbor session caused by the command:
`label local advertise explicit-null`.

This change makes the gtsm mechanism a bit deviation.

Fix PR #8313

Signed-off-by: anlan_cs <vic.lan@pica8.com>
(cherry picked from commit 1919df3a64)
2024-07-02 17:50:22 +00:00
zhou-run
12e24ca567 isisd: fix crash when obtaining the next hop to calculate LFA on LAN links
When a neighbor connection is disconnected, it may trigger LSP re-generation as a timer task, but this process may be delayed. As a result, the list of neighbors in area->adjacency_list may be inconsistent with the neighbors in lsp->tlvs->oldstyle_reach/extended_reach. For example, the area->adjacency_list may lack certain neighbors even though they are present in the LSP. When computing SPF, the call to isis_spf_build_adj_list() generates the spftree->sadj_list, which reflects the real neighbors in the area->adjacency_list. However, in the case of LAN links, spftree->sadj_list may include additional pseudo neighbors.
The pre-loading of tents through the call to isis_spf_preload_tent involves two steps:
1. isis_spf_process_lsp() is called to generate real neighbor vertices based on the root LSP and pseudo LSP.
2. isis_spf_add_local() is called to add corresponding next hops to the vertex->Adj_N list for the real neighbor vertices.
In the case of LAN links, the absence of corresponding real neighbors in the spftree->sadj_list prevents the execution of the second step. Consequently, the vertex->Adj_N list for the real neighbor vertices lacks corresponding next hops. This leads to a null pointer access when isis_lfa_compute() is called to calculate LFA.
As for P2P links, since there are no pseudo neighbors, only the second step is executed, which does not create real neighbor vertices and therefore does not encounter this issue.
The backtrace is as follows:
(gdb) bt
#0  0x00007fd065277fe1 in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007fd065398972 in core_handler (signo=11, siginfo=0x7ffc5c0636b0, context=0x7ffc5c063580) at ../lib/sigevent.c:261
#2  <signal handler called>
#3  0x00005564d82f8408 in isis_lfa_compute (area=0x5564d8b143f0, circuit=0x5564d8b21d10, spftree=0x5564d8b06bf0, resource=0x7ffc5c064410) at ../isisd/isis_lfa.c:2134
#4  0x00005564d82f8d78 in isis_spf_run_lfa (area=0x5564d8b143f0, spftree=0x5564d8b06bf0) at ../isisd/isis_lfa.c:2344
#5  0x00005564d8315964 in isis_run_spf_with_protection (area=0x5564d8b143f0, spftree=0x5564d8b06bf0) at ../isisd/isis_spf.c:1827
#6  0x00005564d8315c15 in isis_run_spf_cb (thread=0x7ffc5c064590) at ../isisd/isis_spf.c:1889
#7  0x00007fd0653b1f04 in thread_call (thread=0x7ffc5c064590) at ../lib/thread.c:1990
#8  0x00007fd06534a97b in frr_run (master=0x5564d88103c0) at ../lib/libfrr.c:1198
#9  0x00005564d82e7d5d in main (argc=5, argv=0x7ffc5c0647b8, envp=0x7ffc5c0647e8) at ../isisd/isis_main.c:273
(gdb) f 3
#3  0x00005564d82f8408 in isis_lfa_compute (area=0x5564d8b143f0, circuit=0x5564d8b21d10, spftree=0x5564d8b06bf0, resource=0x7ffc5c064410) at ../isisd/isis_lfa.c:2134
2134    ../isisd/isis_lfa.c: No such file or directory.
(gdb) p vadj_primary
$1 = (struct isis_vertex_adj *) 0x0
(gdb) p vertex->Adj_N->head
$2 = (struct listnode *) 0x0
(gdb) p (struct isis_vertex *)spftree->paths->l.list->head->next->next->next->next->data
$8 = (struct isis_vertex *) 0x5564d8b5b240
(gdb) p $8->type
$9 = VTYPE_NONPSEUDO_TE_IS
(gdb) p $8->N.id
$10 = "\000\000\000\000\000\002"
(gdb) p $8->Adj_N->count
$11 = 0
(gdb) p (struct isis_vertex *)spftree->paths->l.list->head->next->next->next->next->next->data
$12 = (struct isis_vertex *) 0x5564d8b73dd0
(gdb) p $12->type
$13 = VTYPE_NONPSEUDO_TE_IS
(gdb) p $12->N.id
$14 = "\000\000\000\000\000\003"
(gdb) p $12->Adj_N->count
$15 = 0
(gdb) p area->adjacency_list->count
$16 = 0
The backtrace provided above pertains to version 8.5.4, but it seems that the same issue exists in the code of the master branch as well.
The scenario where a vertex has no next hop is normal. For example, the "clear isis neighbor" command invokes isis_vertex_adj_del() to delete the next hop of a vertex. Upon reviewing all the instances where the vertex->Adj_N list is used, I found that only isis_lfa_compute() lacks a null check. Therefore, I believe that modifying this part will be sufficient. Additionally, the vertex->parents list for IP vertices is guaranteed not to be empty.
Test scenario:
Setting up LFA for LAN links and executing the "clear isis neighbor" command easily reproduces the issue.

Signed-off-by: zhou-run <zhou.run@h3c.com>
(cherry picked from commit a970bb51b5)
2024-07-02 12:02:13 +00:00
Russ White
7da20b3397
Merge pull request #16312 from FRRouting/mergify/bp/dev/10.1/pr-16305
bgpd: Ignore RFC8212 for BGP Confederations (backport #16305)
2024-07-02 08:00:43 -04:00
Russ White
9221e2c375
Merge pull request #16318 from FRRouting/mergify/bp/dev/10.1/pr-16233
ripd/ripd.c - rip_auth_md5 : Change the start value of sequence 1 to 0 (backport #16233)
2024-07-02 07:59:50 -04:00
T-Nicolas
46eb2c32fe ripd: Change the start value of sequence 1 to 0
Signed-off-by: T-Nicolas <github@toselli.email>
(cherry picked from commit 1a64fe4254)
2024-07-01 17:37:45 +00:00
Donatas Abraitis
57dec1cfe5 tests: Test if RFC 8212 is not involved for BGP confederations
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit dd6a679e3a)
2024-07-01 14:19:51 +00:00
Donatas Abraitis
8eb7d0ac1a bgpd: Ignore RFC8212 for BGP Confederations
RFC 8212 should be restricted for eBGP peers.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit fa2cc09d45)
2024-07-01 14:19:50 +00:00
Donatas Abraitis
dd53197ef2
Merge pull request #16306 from FRRouting/mergify/bp/dev/10.1/pr-16068
bgpd: Ignore routes from evpn if VRF is unknown (backport #16068)
2024-06-30 18:03:38 +02:00
Piotr Suchy
5bd66b695d bgpd: Ignore routes from evpn if VRF is unknown
Fix for a bug, where FRR fails to install route received for an unknown but later-created VRF - detailed description can be found here https://github.com/FRRouting/frr/issues/13708

Signed-off-by: Piotr Suchy <psuchy@akamai.com>
(cherry picked from commit 8044d73300)
2024-06-28 08:34:57 +00:00
Donald Sharp
6d24756b9b
Merge pull request #16302 from FRRouting/mergify/bp/dev/10.1/pr-16271
bgpd: avoid clearing routes for peers that were never established (backport #16271)
2024-06-27 12:16:28 -04:00
Loïc Sang
73b18210f2 bgpd: avoid clearing routes for peers that were never established
Under heavy system load with many peers in passive mode and a large
number of routes, bgpd can enter an infinite loop. This occurs while
processing timeout BGP_OPEN messages, which prevents it from accepting
new connections. The following log entries illustrate the issue:
>bgpd[6151]: [VX6SM-8YE5W][EC 33554460] 3.3.2.224: nexthop_set failed, resetting connection - intf 0x0
>bgpd[6151]: [P790V-THJKS][EC 100663299] bgp_open_receive: bgp_getsockname() failed for peer: 3.3.2.224
>bgpd[6151]: [HTQD2-0R1WR][EC 33554451] bgp_process_packet: BGP OPEN receipt failed for peer: 3.3.2.224
... repeating

The issue occurs when bgpd handles a massive number of routes in the RIB
while receiving numerous BGP_OPEN packets. If bgpd is overloaded, it
fails to process these packets promptly, leading the remote peer to
close the connection and resend BGP_OPEN packets.

When bgpd eventually starts processing these timeout BGP_OPEN packets,
it finds the TCP connection closed by the remote peer, resulting in
"bgp_stop()" being called. For each timeout peer, bgpd must iterate
through the routing table, which is time-consuming and causes new
incoming BGP_OPEN packets to timeout, perpetuating the infinite loop.

To address this issue, the code is modified to check if the peer has
been established at least once before calling "bgp_clear_route_all()".
This ensures that routes are only cleared for peers that had a
successful session, preventing unnecessary iterations over the routing
table for peers that never established a connection.

With this change, BGP_OPEN timeout messages may still occur, but in the
worst case, bgpd will stabilize. Before this patch, bgpd could enter a
loop where it was unable to accpet any new connections.

Signed-off-by: Loïc Sang <loic.sang@6wind.com>
(cherry picked from commit e0ae285eb8)
2024-06-26 20:46:36 +00:00
Russ White
6ea0a93ceb
Merge pull request #16285 from FRRouting/mergify/bp/dev/10.1/pr-15838
bgpd: fix "bgp as-pah access-list" with "set aspath exclude" set/unset issue (backport #15838)
2024-06-25 08:42:02 -04:00
Donatas Abraitis
67328b60c4
Merge pull request #16284 from FRRouting/mergify/bp/dev/10.1/pr-16261
zebra: clear evpn dup-addr return error-msg when there is no vni (backport #16261)
2024-06-25 14:49:39 +03:00
Russ White
37451922d7
Merge pull request #16291 from FRRouting/mergify/bp/dev/10.1/pr-16214
bgpd: A couple more fixes for Tunnel encapsulation handling (backport #16214)
2024-06-25 07:30:45 -04:00
Russ White
b43bf5aeec
Merge pull request #16289 from FRRouting/mergify/bp/dev/10.1/pr-16273
bgpd: Relax OAD (One-Administration-Domain) for RFC8212 (backport #16273)
2024-06-25 07:30:24 -04:00
Donatas Abraitis
c01d60b037 bgpd: Check if we have real stream data for tunnel encapsulation sub-tlvs
When the packet is malformed it can use whatever values it wants. Let's check
what the real data we have in a stream instead of relying on malformed values.

Reported-by: Iggy Frankovic <iggyfran@amazon.com>
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 9929486d6b)
2024-06-25 11:27:43 +00:00
Donatas Abraitis
c105f9acd9 bgpd: Adjust the length of tunnel encap sub-tlv by sub-tlv type
Fixes: 79563af564 ("bgpd: Get 1 or 2 octets for Sub-TLV length (Tunnel Encap attr)")

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 34b209f0ae)
2024-06-25 11:27:43 +00:00
Donatas Abraitis
26fd1f8167 bgpd: Relax OAD (One-Administration-Domain) for RFC8212
RFC 8212 defines leak prevention for eBGP peers, but BGP-OAD defines a new
peering type One Administrative Domain (OAD), where multiple ASNs could be used
inside a single administrative domain. OAD allows sending non-transitive attributes,
so this prevention should be relaxed too.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 3b98ddf501)
2024-06-25 11:25:33 +00:00
Donatas Abraitis
48b0d338e3
Merge pull request #16281 from FRRouting/mergify/bp/dev/10.1/pr-16213
bgpd: Check if we have really enough data before doing memcpy for FQDN capability (backport #16213)
2024-06-25 13:48:18 +03:00
Donatas Abraitis
a83e5971d5
Merge pull request #16278 from FRRouting/mergify/bp/dev/10.1/pr-16211
bgpd: Check if we have really enough data before doing memcpy for software version (backport #16211)
2024-06-25 13:47:50 +03:00