two things:
On shutdown cleanup any events associated with the update walker.
Also do not allow new events to be created.
Fixes this mem-leak:
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790:Direct leak of 8 byte(s) in 1 object(s) allocated from:
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #0 0x7f0dd0b08037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #1 0x7f0dd06c19f9 in qcalloc lib/memory.c:105
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #2 0x55b42fb605bc in rib_update_ctx_init zebra/zebra_rib.c:4383
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #3 0x55b42fb6088f in rib_update zebra/zebra_rib.c:4421
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #4 0x55b42fa00344 in netlink_link_change zebra/if_netlink.c:2221
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #5 0x55b42fa24622 in netlink_information_fetch zebra/kernel_netlink.c:399
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #6 0x55b42fa28c02 in netlink_parse_info zebra/kernel_netlink.c:1183
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #7 0x55b42fa24951 in kernel_read zebra/kernel_netlink.c:493
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #8 0x7f0dd0797f0c in event_call lib/event.c:1995
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #9 0x7f0dd0684fd9 in frr_run lib/libfrr.c:1185
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #10 0x55b42fa30caa in main zebra/main.c:465
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #11 0x7f0dd01b5d09 in __libc_start_main ../csu/libc-start.c:308
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-SUMMARY: AddressSanitizer: 8 byte(s) leaked in 1 allocation(s).
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 3cd0accb50)
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
The parser for extended communities was incorrectly disallowing an
operator from configuring "Route Origin" extended communities
(e.g. RD/RT/SoO) with a 4-byte value matching BGP_AS4_MAX (UINT32_MAX)
and allowed the user to overflow UINT32_MAX. This updates the parser to
read the value as a uint64_t so that we can do proper checks on the
upper bounds (> BGP_AS4_MAX || errno).
before:
```
TORC11(config-router-af)# neighbor uplink-1 soo 4294967296:65
TORC11(config-router-af)# do sh run | include soo
neighbor uplink-1 soo 0:65
TORC11(config-router-af)# neighbor uplink-1 soo 4294967295:65
% Malformed SoO extended community
TORC11(config-router-af)#
```
after:
```
TORC11(config-router-af)# neighbor uplink-1 soo 4294967296:65
% Malformed SoO extended community
TORC11(config-router-af)# neighbor uplink-1 soo 4294967295:65
TORC11(config-router-af)# do sh run | include soo
neighbor uplink-1 soo 4294967295:65
TORC11(config-router-af)#
```
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
(cherry picked from commit b571d79d64)
Currently the process of the `route-map` configuration for `per-vrf-rip`
is wrong.
There are two problems:
1. `ctx->name` for `if_rmap_ctx` is not initialized in `if_rmap_ctx_create()`.
2. The global `if_rmap_ctx_list` is wrongly used for `per-vrf-rip`.
So, two changes for it:
1. Correctly initializes `ctx->name`.
2. Use specific `if_rmap_ctx` for `per-vrf-rip`, not global one.
Note, this related implementation for `route-map` is only for `ripd`.
Before:
```
anlan(config)# route rip vrf vrf1
anlan(config-router)# route-map aa in lan
anlan(config-router)# do show run
!
router rip
route-map aa in lan
exit
!
```
After:
```
anlan(config)# route rip vrf vrf1
anlan(config-router)# route-map aa in lan
anlan(config-router)# do show run
!
router rip vrf vrf1
route-map aa in lan
exit
!
```
Signed-off-by: anlan_cs <vic.lan@pica8.com>
(cherry picked from commit ff0fa00c7d)
Fix for missing neighbor description in "show bgp summary [wide]"
when its length exceeds 20[64] chars and it doesn't contain
withespaces.
Existing behavior remains if description contains whitespaces
before size limit.
Signed-off-by: rbarroetavena <rbarroetavena@gmail.com>
(cherry picked from commit 420ac3d24c)
Test failed time to time, let's try this way:
```
$ for x in $(seq 1 20); do cp test_bgp_labeled_unicast_addpath.py test_$x.py; done
$ sudo pytest -s -n 20
```
Ran 10 times using this pattern, no failure 🤷
Before this change, we checked advertised routes, and at some point `=` was
missing from the output, but advertised correctly. Receiving router gets as
much routes as expected to receive.
I reversed checking received routes, not advertised.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 9db7ed2fc9)
If we set `bgp route-map delay-timer X`, we should ignore starting to announce
routes immediately, and wait for delay timer to expire (or ignore at all if set
to zero).
f1aa49293a broke this because we always sent
route refresh and on receiving BoRR before sending back EoRR.
Let's get fix this.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 4d8e44c753)
Whenever OSPF virtual-link is created, a virtual interface is
associated with it. Name of the virtual interface is derived by
combining "VLINK" string with the value of vlink_count, which is a global
variable.
Problem:
Consider a scenario where 2 virtual links A and B are created in OSPF with
virtual interfaces VLINK0 and VLINK1 respectively. When virtual-link A is unconfigured
and reconfigured, new interface name derived for it will be VLINK1, which is already
associated with virtual-link B. Due to this, both virtual-links A and B will
point to the same interface, VLINK1.
During FRR restart when signal handler is called, OSPF goes through all the virtual
links and deletes the interface(oi) associated with it. During the deletion of interface
for virtual-link B,it accesses the interface which was deleted already(which was deleted
during deletion of virual-link A) and whose fields were set to NULL. This
leads to OSPF crash.
Fixed it by not decrementing vlink_count during unconfig/deletion for virtual-link.
Signed-off-by: Pooja Jagadeesh Doijode <pdoijode@nvidia.com>
(cherry picked from commit 19f451913e)
We copy the password only if an existing peer structure didn't have it.
But it might be the case when it exists, and we skip here.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit b5b6f11fcb)
Before it was setting SDIR, which is /usr/lib/frr, but the vtysh binary is put
under bindir (which is /usr/local by default). And running `/usr/lib/frr/frr reload`
failed.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit c9bdc0c79e)
It's not 4 bytes, it was assuming the same as Graceful-Restart tuples.
LLGR has more 3 bytes (Long-lived Stale Time).
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit b1d33ec293)
We have this valgrind trace:
==1125== Invalid read of size 4
==1125== at 0x170A7D: pim_if_delete (pim_iface.c:203)
==1125== by 0x170C01: pim_if_terminate (pim_iface.c:80)
==1125== by 0x174F34: pim_instance_terminate (pim_instance.c:68)
==1125== by 0x17535B: pim_vrf_terminate (pim_instance.c:260)
==1125== by 0x1941CF: pim_terminate (pimd.c:161)
==1125== by 0x1B476D: pim_sigint (pim_signals.c:44)
==1125== by 0x4910C22: frr_sigevent_process (sigevent.c:133)
==1125== by 0x49220A4: thread_fetch (thread.c:1777)
==1125== by 0x48DC8E2: frr_run (libfrr.c:1222)
==1125== by 0x15E12A: main (pim_main.c:176)
==1125== Address 0x6274d28 is 1,192 bytes inside a block of size 1,752 free'd
==1125== at 0x48369AB: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1125== by 0x174FF1: pim_vrf_delete (pim_instance.c:181)
==1125== by 0x4925480: vrf_delete (vrf.c:264)
==1125== by 0x4925480: vrf_delete (vrf.c:238)
==1125== by 0x49332C7: zclient_vrf_delete (zclient.c:2187)
==1125== by 0x4934319: zclient_read (zclient.c:4003)
==1125== by 0x492249C: thread_call (thread.c:2008)
==1125== by 0x48DC8D7: frr_run (libfrr.c:1223)
==1125== by 0x15E12A: main (pim_main.c:176)
==1125== Block was alloc'd at
==1125== at 0x4837B65: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1125== by 0x48E80AF: qcalloc (memory.c:116)
==1125== by 0x1750DA: pim_instance_init (pim_instance.c:90)
==1125== by 0x1750DA: pim_vrf_new (pim_instance.c:161)
==1125== by 0x4924FDC: vrf_get (vrf.c:183)
==1125== by 0x493334C: zclient_vrf_add (zclient.c:2157)
==1125== by 0x4934319: zclient_read (zclient.c:4003)
==1125== by 0x492249C: thread_call (thread.c:2008)
==1125== by 0x48DC8D7: frr_run (libfrr.c:1223)
==1125== by 0x15E12A: main (pim_main.c:176)
and you do this series of events:
a) Create a vrf, put an interface in it
b) Turn on pim on that interface and turn on pim in that vrf
c) Delete the vrf
d) Do anything with the interface, in this case shutdown the system
The move of the interface to a new vrf is leaving the pim_ifp->pim pointer pointing
at the old pim instance, which was just deleted, so the instance pointer was freed.
Let's clean up the pim pointer in the interface pointer as well.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit e60308f498)
Description:
After area range config, summary lsas are aggerated to configured
route but later it was being flushed instead of the actual summary
lsa. This was seen when prefix-id of the aggregated route is same
as one of the actual summary route.
Here, aggregated summary lsa need to be returned to set the flag
SUMMARY_APPROVE after originating aggregated summary lsa but its not.
Which is being cleaned up as part of unapproved summary cleanup.
Corrected this now.
Issue: #13028
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
(cherry picked from commit c8c1a240ab)
Prevent a use after free and tell the bfd subsystem
we are shutting down in staticd.
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460:==2264460==ERROR: AddressSanitizer: heap-use-after-free on address 0x61f000004698 at pc 0x7f65d1eb11b2 bp 0x7ffdbface490 sp 0x7ffdbface488
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-READ of size 4 at 0x61f000004698 thread T0
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460- #0 0x7f65d1eb11b1 in zclient_bfd_command lib/bfd.c:307
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460- #1 0x7f65d1eb20f5 in _bfd_sess_send lib/bfd.c:507
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460- #2 0x7f65d20510aa in thread_call lib/thread.c:1989
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460- #3 0x7f65d2051f0a in _thread_execute lib/thread.c:2081
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460- #4 0x7f65d1eb271b in _bfd_sess_remove lib/bfd.c:544
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460- #5 0x7f65d1eb278d in bfd_sess_free lib/bfd.c:553
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460- #6 0x7f65d1eb5400 in bfd_protocol_integration_finish lib/bfd.c:1029
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460- #7 0x7f65d1f42f77 in hook_call_frr_fini lib/libfrr.c:41
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460- #8 0x7f65d1f494a1 in frr_fini lib/libfrr.c:1199
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460- #9 0x563b7abefd76 in sigint staticd/static_main.c:70
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460- #10 0x7f65d200ef91 in frr_sigevent_process lib/sigevent.c:115
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460- #11 0x7f65d204fac6 in thread_fetch lib/thread.c:1758
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460- #12 0x7f65d1f49377 in frr_run lib/libfrr.c:1184
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460- #13 0x563b7abefed1 in main staticd/static_main.c:160
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460- #14 0x7f65d1b92d09 in __libc_start_main ../csu/libc-start.c:308
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460- #15 0x563b7abefa99 in _start (/usr/lib/frr/staticd+0x15a99)
./bfd_topo3.test_bfd_topo3/r4.staticd.asan.2264460-
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit 7a185ac85e)
Issue:
When a netns is deleted, since zebra doesn’t receive interface down/delete
notifications from kernel, it manually deletes the interface without removing
the association between zebra_l3vni and the interface that is being deleted
(i.e it deletes the interface without setting “zl3vni->vxlan_if” to NULL).
Later, during the deletion of netns, when zl3vni_rmac_uninstall() is called to
uninstall the remote RMAC from the kernel, zebra ends up accessing stale
“zl3vni->vxlan_if” pointer, which now points to freed memory.
This was causing heap use-after-free.
Fix:
Before zebra starts deleting the interfaces when it receives netns delete notification,
appropriate functions() are being called to remove the association between evpn structs
and interface and set “zl3vni->vxlan_if” to NULL. This ensures that when
zl3vni_rmac_uninstall() is called during netns deletion, it will bail because
“zl3vni->vxlan_if” is NULL.
Signed-off-by: Pooja Jagadeesh Doijode <pdoijode@nvidia.com>
(cherry picked from commit 7eefea98ba)
When deleting a bfd peer during shutdown, let's ensure
that any scheduled events are actually stopped.
==7759== Invalid read of size 4
==7759== at 0x48BF700: _bfd_sess_valid (bfd.c:419)
==7759== by 0x48BF700: _bfd_sess_send (bfd.c:470)
==7759== by 0x492F79C: thread_call (thread.c:2008)
==7759== by 0x48E9BD7: frr_run (libfrr.c:1223)
==7759== by 0x1C739B: main (bgp_main.c:550)
==7759== Address 0xfb687a4 is 4 bytes inside a block of size 272 free'd
==7759== at 0x48369AB: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==7759== by 0x48BFA5A: bfd_sess_free (bfd.c:535)
==7759== by 0x2B7034: bgp_peer_remove_bfd (bgp_bfd.c:339)
==7759== by 0x29FF8A: peer_free (bgpd.c:1160)
==7759== by 0x29FF8A: peer_unlock_with_caller (bgpd.c:1192)
==7759== by 0x2A0506: peer_delete (bgpd.c:2633)
==7759== by 0x208190: bgp_stop (bgp_fsm.c:1639)
==7759== by 0x20C082: bgp_event_update (bgp_fsm.c:2751)
==7759== by 0x492F79C: thread_call (thread.c:2008)
==7759== by 0x48E9BD7: frr_run (libfrr.c:1223)
==7759== by 0x1C739B: main (bgp_main.c:550)
==7759== Block was alloc'd at
==7759== at 0x4837B65: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==7759== by 0x48F53AF: qcalloc (memory.c:116)
==7759== by 0x48BF98D: bfd_sess_new (bfd.c:397)
==7759== by 0x2B76DC: bgp_peer_configure_bfd (bgp_bfd.c:298)
==7759== by 0x2B76DC: bgp_peer_configure_bfd (bgp_bfd.c:279)
==7759== by 0x29BA06: peer_group2peer_config_copy (bgpd.c:2803)
==7759== by 0x2A3D96: peer_create_bind_dynamic_neighbor (bgpd.c:4107)
==7759== by 0x2A4195: peer_lookup_dynamic_neighbor (bgpd.c:4239)
==7759== by 0x21AB72: bgp_accept (bgp_network.c:422)
==7759== by 0x492F79C: thread_call (thread.c:2008)
==7759== by 0x48E9BD7: frr_run (libfrr.c:1223)
==7759== by 0x1C739B: main (bgp_main.c:550)
tl;dr -> Effectively, in this test setup we have 300 dynamic bgp
sessions all of which are using bfd. When a peer collision is detected
or we remove the peers, if an event has been scheduled but not actually
executed yet the event event was not actually being stopped, leaving
the bsp pointer on the thread->arg and causing a crash when it is
executed.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
(cherry picked from commit f83431c7e8)
Due to the wrong input argv id, "argv[idx_word]->arg"
fetched in-correctly and it clears all the route-maps instead of
specific one.
Now correct argv id is passed to clear the given route-map counters.
Also, use RMAP_NAME which allows to show list of configured
route-maps in the system.
After Fix:-
Ticket:#3407773
Issue:3407773
Testing: UT done
Before:
TORC11# clear route-map counters
<cr>
WORD route-map name
After:
TORC11# clear route-map counters
<cr>
RMAP_NAME route-map name
my-as
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Signed-off-by: Sindhu Parvathi Gopinathan's <sgopinathan@nvidia.com>
(cherry picked from commit 463110f733)
Crash:
(gdb) bt
0 0x00007fee27de15cb in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
1 0x00007fee280ecd9c in core_handler (signo=11, siginfo=0x7ffe56001bb0, context=<optimized out>) at lib/sigevent.c:264
2 <signal handler called>
3 0x0000555e321c41b2 in prefix_rd2str (prd=0x10, buf=buf@entry=0x7ffe56002080 "27.0.0.R\340\373\062\062^U", size=size@entry=28) at bgpd/bgp_rd.c:168
4 0x0000555e321c431a in printfrr_prd (buf=0x7ffe560021a0, ea=<optimized out>, ptr=<optimized out>) at bgpd/bgp_rd.c:224
5 0x00007fee2812069b in vbprintfrr (cb_in=cb_in@entry=0x7ffe56002330, fmt0=fmt0@entry=0x555e3229a3ad " RD: %pRD\n", ap=ap@entry=0x7ffe560023d8) at lib/printf/vfprintf.c:564
6 0x00007fee28122ef7 in vasnprintfrr (mt=mt@entry=0x7fee281cb5e0 <MTYPE_VTY_OUT_BUF>, out=out@entry=0x7ffe560023f0 " RD: : R\n", outsz=outsz@entry=1024, fmt=fmt@entry=0x555e3229a3ad " RD: %pRD\n", ap=ap@entry=0x7ffe560023d8) at lib/printf/glue.c:103
7 0x00007fee28103504 in vty_out (vty=vty@entry=0x555e33f82d10, format=format@entry=0x555e3229a3ad " RD: %pRD\n") at lib/vty.c:190
8 0x0000555e32185156 in bgp_evpn_es_show_entry_detail (vty=0x555e33f82d10, es=0x555e33c38420, json=<optimized out>) at bgpd/bgp_evpn_mh.c:2655
9 0x0000555e32188fe5 in bgp_evpn_es_show (vty=vty@entry=0x555e33f82d10, uj=false, detail=true) at bgpd/bgp_evpn_mh.c:2721
notice prd=0x10 in #3. This is because in bgp_evpn_mh.c we are sending &es->es_base_frag->prd.
There is one spot in the code where during output the es->es_base_frag is checked for non nullness
Let's just make sure it's right in all the places.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The same as 61c07b9d43, but forgot to put IPv6
in place.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
(cherry picked from commit 14c1e0a169)