Upstream CI is frequently running into a situation where
the routes are not being installed. These routes
start at the beginning and suddenly in the middle
they start working properly.
D 1.0.15.183/32 [150/0] via 192.168.0.1, r1-eth0 inactive, weight 1, 00:10:17
via 192.168.1.1, r1-eth1 inactive, weight 1, 00:10:17
D 1.0.15.184/32 [150/0] via 192.168.0.1, r1-eth0 inactive, weight 1, 00:10:17
via 192.168.1.1, r1-eth1 inactive, weight 1, 00:10:17
D 1.0.15.185/32 [150/0] via 192.168.0.1, r1-eth0 inactive, weight 1, 00:10:17
via 192.168.1.1, r1-eth1 inactive, weight 1, 00:10:17
D>* 1.0.15.186/32 [150/0] via 192.168.0.1, r1-eth0, weight 1, 00:10:17
* via 192.168.1.1, r1-eth1, weight 1, 00:10:17
D>* 1.0.15.187/32 [150/0] via 192.168.0.1, r1-eth0, weight 1, 00:10:17
* via 192.168.1.1, r1-eth1, weight 1, 00:10:17
D>* 1.0.15.188/32 [150/0] via 192.168.0.1, r1-eth0, weight 1, 00:10:17
Turning on some debugs showed that the failed installed routes are
trying to be matched against the default route. Thus implying
all the connected routes for the test are not yet successfully
installed. Let's modify the test(s) on startup to just ensure
that the connected routes are installed correctly. I am no
longer seeing the problem after this change.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
A crash happens when executing the following command:
> ubuntu2204hwe# conf
> ubuntu2204hwe(config)# router bgp 65500
> ubuntu2204hwe(config-router)# !
> ubuntu2204hwe(config-router)# address-family ipv4 unicast
> ubuntu2204hwe(config-router-af)# sid vpn export auto
> ubuntu2204hwe(config-router-af)# exit-address-family
> ubuntu2204hwe(config-router)# !
> ubuntu2204hwe(config-router)# address-family ipv4 vpn
> ubuntu2204hwe(config-router-af)# network 4.4.4.4/32 rd 55:55 label 556
> ubuntu2204hwe(config-router-af)# network 5.5.5.5/32 rd 662:33 label 232
> ubuntu2204hwe(config-router-af)# exit-address-family
> ubuntu2204hwe(config-router)# exit
> ubuntu2204hwe(config)# !
> ubuntu2204hwe(config)# no router bgp
The crash analysis indicates a memory item has been freed.
> #6 0x000076066a629c15 in mt_count_free (mt=0x56b57be85e00 <MTYPE_BGP_NAME>, ptr=0x60200038b4f0)
> at lib/memory.c:73
> #7 mt_count_free (ptr=0x60200038b4f0, mt=0x56b57be85e00 <MTYPE_BGP_NAME>) at lib/memory.c:69
> #8 qfree (mt=mt@entry=0x56b57be85e00 <MTYPE_BGP_NAME>, ptr=0x60200038b4f0) at lib/memory.c:129
> #9 0x000056b57bb09ce9 in bgp_free (bgp=<optimized out>) at bgpd/bgpd.c:4120
> #10 0x000056b57bb0aa73 in bgp_unlock (bgp=<optimized out>) at ./bgpd/bgpd.h:2513
> #11 peer_free (peer=0x62a000000200) at bgpd/bgpd.c:1313
> #12 0x000056b57bb0aca8 in peer_unlock_with_caller (name=<optimized out>, peer=<optimized out>)
> at bgpd/bgpd.c:1344
> #13 0x000076066a6dbb2c in event_call (thread=thread@entry=0x7ffc8cae1d60) at lib/event.c:2011
> #14 0x000076066a60aa88 in frr_run (master=0x613000000040) at lib/libfrr.c:1214
> #15 0x000056b57b8b2c44 in main (argc=<optimized out>, argv=<optimized out>) at bgpd/bgp_main.c:543
Actually, the BGP_NAME item has not been used at allocation for
static->prd_pretty, and this results in reaching 0 quicker at bgp
deletion.
Fix this by reassigning MTYPE_BGP_NAME to prd_pretty.
Fixes: 16600df2c4 ("bgpd: fix show run of network route-distinguisher")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The all_protocol_startup topotest needs to allow for some delay
between configuring nexthop-groups and their installation. Add
some wait periods in a couple of nhg test cases.
Signed-off-by: Mark Stapp <mjs@cisco.com>
```
==1145965==ERROR: AddressSanitizer: heap-use-after-free on address 0x6030007159c0 at pc 0x55ade8d962d1 bp 0x7ffec4ce74c0 sp 0x7ffec4ce74b0
READ of size 8 at 0x6030007159c0 thread T0
0 0x55ade8d962d0 in no_router_bgp bgpd/bgp_vty.c:1701
1 0x7efe5aed19ed in cmd_execute_command_real lib/command.c:1002
2 0x7efe5aed1da3 in cmd_execute_command lib/command.c:1061
3 0x7efe5aed2303 in cmd_execute lib/command.c:1227
4 0x7efe5af6c023 in vty_command lib/vty.c:616
5 0x7efe5af6d2d2 in vty_execute lib/vty.c:1379
6 0x7efe5af77df2 in vtysh_read lib/vty.c:2374
7 0x7efe5af64c9b in event_call lib/event.c:1996
8 0x7efe5af03887 in frr_run lib/libfrr.c:1232
9 0x55ade8cd9850 in main bgpd/bgp_main.c:555
10 0x7efe5aa29d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
11 0x7efe5aa29e3f in __libc_start_main_impl ../csu/libc-start.c:392
12 0x55ade8cdc314 in _start (/usr/lib/frr/bgpd+0x16f314)
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
In order to run the XXXX Sanitizers over the code as a developer
modern linux distro's require a specific sysctl. Let's document
that so that people are aware of it.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
json_peers is allocated in the above if statement block
for json but is not freed in this code path. Noticed
by running Address Sanitizer.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The events list is storing a `struct event *` allocated
as a MTYPE_TMP pointer, on shutdown ensure that it is
properly free'd up.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This program, mgmtd_testc, is built with the
--enable-mgmtd-test-be-client configure option
but it is not .gitignore'd. Let's fix that
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The function zebra_nhg_hash_equal is only used
as a hash function for storage of NHG's and retrieval.
If you have say two nhg's:
31 (25/26)
32 (25/26)
This function would return them as being equal. Which
of course leads to the problem when you attempt to
hash_release 32 but release 31 from the hash. Then later
when you attempt to do hash comparisons 32 has actually
been freed leaving to use after free situations and shit
goes down hill fast.
This hash is only used as part of the hash comparison
function for nexthop group storage. Since this is so
let's always return the 31/32 nhg's are not equal at all.
We possibly have a different problem where we are creating
31 and 32 ( when 31 should have just been used instead of 32 )
but we need to prevent any type of hash release problem at all.
This supercedes any other issue( that should be tracked down
on it's own ). Since you can have use after free situation
that leads to a crash -vs- some possible nexthop group duplication
which is very minor in comparison.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Fix the LYD_NEW_PATH_OUTPUT undeclared error to support the latest libyang v3.x version,
and also compatible with old version.
Signed-off-by: Lu Mao <lu.mao@molex.com>
When a whole distribute-list is deleted (can be done only using API),
all its children must be cleaned up manually.
Fixes#16538
Signed-off-by: Igor Ryzhov <idryzhov@gmail.com>
The adj_process_threeway() api may call the adj_state_change()
api, which may delete the adj struct being examined. Change the
signature so that callers pass a ptr-to-ptr so that they will
see that deletion.
Signed-off-by: Mark Stapp <mjs@cisco.com>
vtysh will print out the `stupidly large FD limit` upon
every run of the program if the ulimit is set stupidly
large. Prevent this from being displayed for vtysh.
Fixes: #16516
Signed-off-by: Donald Sharp <sharpd@nvidia.com>