The functions ospf6_router_lsa_contains_adj(), ospf6_gr_check_adjs() and ospf6_find_interf_prefix_lsa() iterate through the LSDB and lock each LSA. During testing, the lock count did not reach zero upon termination, as the LeakSanitizer traces below show. Unlocking the LSA before returning from these functions resolves the leak, which indicates that a missing unlock kept the lock count nonzero.
=================================================================
==22565==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 400 byte(s) in 2 object(s) allocated from:
#0 0x7fa744ccea37 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x7fa744867562 in qcalloc ../lib/memory.c:105
#2 0x555cdbb37506 in ospf6_lsa_alloc ../ospf6d/ospf6_lsa.c:710
#3 0x555cdbb375d6 in ospf6_lsa_create ../ospf6d/ospf6_lsa.c:725
#4 0x555cdbaf1008 in ospf6_receive_lsa ../ospf6d/ospf6_flood.c:912
#5 0x555cdbb48ceb in ospf6_lsupdate_recv ../ospf6d/ospf6_message.c:1621
#6 0x555cdbb4ac90 in ospf6_read_helper ../ospf6d/ospf6_message.c:1896
#7 0x555cdbb4aecc in ospf6_receive ../ospf6d/ospf6_message.c:1925
#8 0x7fa744950c33 in event_call ../lib/event.c:1995
#9 0x7fa74483b34a in frr_run ../lib/libfrr.c:1213
#10 0x555cdbacf1eb in main ../ospf6d/ospf6_main.c:250
#11 0x7fa7443f9d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
Objects leaked above:
0x6110000606c0 (200 bytes)
0x611000060940 (200 bytes)
Indirect leak of 80 byte(s) in 2 object(s) allocated from:
#0 0x7fa744cce867 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
#1 0x7fa744867525 in qmalloc ../lib/memory.c:100
#2 0x555cdbb37520 in ospf6_lsa_alloc ../ospf6d/ospf6_lsa.c:711
#3 0x555cdbb375d6 in ospf6_lsa_create ../ospf6d/ospf6_lsa.c:725
#4 0x555cdbaf1008 in ospf6_receive_lsa ../ospf6d/ospf6_flood.c:912
#5 0x555cdbb48ceb in ospf6_lsupdate_recv ../ospf6d/ospf6_message.c:1621
#6 0x555cdbb4ac90 in ospf6_read_helper ../ospf6d/ospf6_message.c:1896
#7 0x555cdbb4aecc in ospf6_receive ../ospf6d/ospf6_message.c:1925
#8 0x7fa744950c33 in event_call ../lib/event.c:1995
#9 0x7fa74483b34a in frr_run ../lib/libfrr.c:1213
#10 0x555cdbacf1eb in main ../ospf6d/ospf6_main.c:250
#11 0x7fa7443f9d8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
Objects leaked above:
0x6040000325d0 (40 bytes)
0x604000032650 (40 bytes)
SUMMARY: AddressSanitizer: 480 byte(s) leaked in 4 allocation(s).
=================================================================
==5483==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 2000 byte(s) in 10 object(s) allocated from:
#0 0x7f2c3faeea37 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
#1 0x7f2c3f68a6d9 in qcalloc ../lib/memory.c:105
#2 0x56431b83633d in ospf6_lsa_alloc ../ospf6d/ospf6_lsa.c:710
#3 0x56431b83640d in ospf6_lsa_create ../ospf6d/ospf6_lsa.c:725
#4 0x56431b7efe13 in ospf6_receive_lsa ../ospf6d/ospf6_flood.c:912
#5 0x56431b847b31 in ospf6_lsupdate_recv ../ospf6d/ospf6_message.c:1621
#6 0x56431b849ad6 in ospf6_read_helper ../ospf6d/ospf6_message.c:1896
#7 0x56431b849d12 in ospf6_receive ../ospf6d/ospf6_message.c:1925
#8 0x7f2c3f773c62 in event_call ../lib/event.c:1995
#9 0x7f2c3f65e2de in frr_run ../lib/libfrr.c:1213
#10 0x56431b7cdff6 in main ../ospf6d/ospf6_main.c:221
#11 0x7f2c3f21dd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
Objects leaked above:
0x611000060800 (200 bytes)
0x611000060a80 (200 bytes)
0x611000060d00 (200 bytes)
0x611000060f80 (200 bytes)
0x611000061200 (200 bytes)
0x611000061480 (200 bytes)
0x611000061840 (200 bytes)
0x611000061ac0 (200 bytes)
0x61100006c740 (200 bytes)
0x61100006d500 (200 bytes)
Indirect leak of 460 byte(s) in 10 object(s) allocated from:
#0 0x7f2c3faee867 in __interceptor_malloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:145
#1 0x7f2c3f68a69c in qmalloc ../lib/memory.c:100
#2 0x56431b836357 in ospf6_lsa_alloc ../ospf6d/ospf6_lsa.c:711
#3 0x56431b83640d in ospf6_lsa_create ../ospf6d/ospf6_lsa.c:725
#4 0x56431b7efe13 in ospf6_receive_lsa ../ospf6d/ospf6_flood.c:912
#5 0x56431b847b31 in ospf6_lsupdate_recv ../ospf6d/ospf6_message.c:1621
#6 0x56431b849ad6 in ospf6_read_helper ../ospf6d/ospf6_message.c:1896
#7 0x56431b849d12 in ospf6_receive ../ospf6d/ospf6_message.c:1925
#8 0x7f2c3f773c62 in event_call ../lib/event.c:1995
#9 0x7f2c3f65e2de in frr_run ../lib/libfrr.c:1213
#10 0x56431b7cdff6 in main ../ospf6d/ospf6_main.c:221
#11 0x7f2c3f21dd8f in __libc_start_call_main ../sysdeps/nptl/libc_start_call_main.h:58
Objects leaked above:
0x604000033110 (40 bytes)
0x604000033190 (40 bytes)
0x604000033210 (44 bytes)
0x604000033290 (44 bytes)
0x604000033310 (44 bytes)
0x604000033390 (44 bytes)
0x604000033410 (44 bytes)
0x604000033490 (44 bytes)
0x604000034c90 (44 bytes)
0x6070000d3830 (72 bytes)
SUMMARY: AddressSanitizer: 2460 byte(s) leaked in 20 allocation(s).
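The missing-unlock pattern described above can be illustrated with a minimal sketch. Everything here (`struct demo_lsa`, `demo_lsa_lock()`, `demo_lsa_unlock()`, `demo_db_contains()`) is a hypothetical stand-in and not the ospf6d code; it only assumes that each visited entry is reference-counted while the loop holds it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted LSA; not the ospf6d struct. */
struct demo_lsa {
	int lock;    /* reference count */
	int matches; /* nonzero if this is the entry being searched for */
};

void demo_lsa_lock(struct demo_lsa *lsa)
{
	lsa->lock++;
}

void demo_lsa_unlock(struct demo_lsa *lsa)
{
	assert(lsa->lock > 0);
	lsa->lock--;
}

/*
 * Walk the table while holding a reference on the current entry.
 * Returns 1 if a matching entry exists, 0 otherwise.
 */
int demo_db_contains(struct demo_lsa *db, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		struct demo_lsa *lsa = &db[i];

		demo_lsa_lock(lsa);
		if (lsa->matches) {
			/* Early exit: drop the reference before returning,
			 * otherwise the count never goes back to zero. */
			demo_lsa_unlock(lsa);
			return 1;
		}
		demo_lsa_unlock(lsa);
	}
	return 0;
}
```

Without the unlock on the early-return path, the matching entry would keep one extra reference and could never be freed.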
Signed-off-by: ryndia <dindyalsarvesh@gmail.com>
The loading_done event needs an event pointer to prevent
use-after-frees. Testing found this:
ERROR: AddressSanitizer: heap-use-after-free on address 0x613000035130 at pc 0x55ad42d54e5f bp 0x7ffff1e942a0 sp 0x7ffff1e94290
READ of size 1 at 0x613000035130 thread T0
#0 0x55ad42d54e5e in loading_done ospf6d/ospf6_neighbor.c:447
#1 0x55ad42ed7be4 in event_call lib/event.c:1995
#2 0x55ad42e1df75 in frr_run lib/libfrr.c:1213
#3 0x55ad42cf332e in main ospf6d/ospf6_main.c:250
#4 0x7f5798133c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
#5 0x55ad42cf2b19 in _start (/usr/lib/frr/ospf6d+0x248b19)
0x613000035130 is located 48 bytes inside of 384-byte region [0x613000035100,0x613000035280)
freed by thread T0 here:
#0 0x7f57998d77a8 in __interceptor_free (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xde7a8)
#1 0x55ad42e3b4b6 in qfree lib/memory.c:130
#2 0x55ad42d5d049 in ospf6_neighbor_delete ospf6d/ospf6_neighbor.c:180
#3 0x55ad42d1e1ea in interface_down ospf6d/ospf6_interface.c:930
#4 0x55ad42ed7be4 in event_call lib/event.c:1995
#5 0x55ad42ed84fe in _event_execute lib/event.c:2086
#6 0x55ad42d26d7b in ospf6_interface_clear ospf6d/ospf6_interface.c:2847
#7 0x55ad42d73f16 in ospf6_process_reset ospf6d/ospf6_top.c:755
#8 0x55ad42d7e98c in clear_router_ospf6_magic ospf6d/ospf6_top.c:778
#9 0x55ad42d7e98c in clear_router_ospf6 ospf6d/ospf6_top_clippy.c:42
#10 0x55ad42dc2665 in cmd_execute_command_real lib/command.c:994
#11 0x55ad42dc2b32 in cmd_execute_command lib/command.c:1053
#12 0x55ad42dc2fa9 in cmd_execute lib/command.c:1221
#13 0x55ad42ee3cd6 in vty_command lib/vty.c:591
#14 0x55ad42ee4170 in vty_execute lib/vty.c:1354
#15 0x55ad42eec94f in vtysh_read lib/vty.c:2362
#16 0x55ad42ed7be4 in event_call lib/event.c:1995
#17 0x55ad42e1df75 in frr_run lib/libfrr.c:1213
#18 0x55ad42cf332e in main ospf6d/ospf6_main.c:250
#19 0x7f5798133c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
previously allocated by thread T0 here:
#0 0x7f57998d7d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
#1 0x55ad42e3ab22 in qcalloc lib/memory.c:105
#2 0x55ad42d5c8ff in ospf6_neighbor_create ospf6d/ospf6_neighbor.c:119
#3 0x55ad42d4c86a in ospf6_hello_recv ospf6d/ospf6_message.c:464
#4 0x55ad42d4c86a in ospf6_read_helper ospf6d/ospf6_message.c:1884
#5 0x55ad42d4c86a in ospf6_receive ospf6d/ospf6_message.c:1925
#6 0x55ad42ed7be4 in event_call lib/event.c:1995
#7 0x55ad42e1df75 in frr_run lib/libfrr.c:1213
#8 0x55ad42cf332e in main ospf6d/ospf6_main.c:250
#9 0x7f5798133c86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
Add an actual event pointer and just track it appropriately.
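As a rough sketch of why a tracked event pointer helps: if the owner keeps a pointer to its pending event, it can cancel that event before freeing itself. The one-slot queue below is a hypothetical miniature, not the FRR event API, and the field name `t_loading_done` is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical one-slot event queue; not the FRR event library. */
struct demo_event {
	void (*func)(void *arg);
	void *arg;
	struct demo_event **ref; /* owner's tracking pointer */
	int scheduled;
};

struct demo_event demo_slot;
int demo_runs; /* how many callbacks actually ran */

void demo_event_add(void (*func)(void *), void *arg, struct demo_event **ref)
{
	demo_slot.func = func;
	demo_slot.arg = arg;
	demo_slot.ref = ref;
	demo_slot.scheduled = 1;
	*ref = &demo_slot; /* owner can now find and cancel the event */
}

void demo_event_cancel(struct demo_event **ref)
{
	if (*ref == NULL)
		return;
	(*ref)->scheduled = 0;
	*ref = NULL;
}

void demo_event_run(void)
{
	if (!demo_slot.scheduled)
		return;
	demo_slot.scheduled = 0;
	*demo_slot.ref = NULL; /* clear the tracking pointer before the call */
	demo_slot.func(demo_slot.arg);
}

struct demo_neighbor {
	int state;
	struct demo_event *t_loading_done; /* assumed field name */
};

void demo_loading_done(void *arg)
{
	struct demo_neighbor *nbr = arg;

	nbr->state = 1;
	demo_runs++;
}

struct demo_neighbor *demo_neighbor_create(void)
{
	return calloc(1, sizeof(struct demo_neighbor));
}

void demo_neighbor_delete(struct demo_neighbor *nbr)
{
	/* Cancel first, so the callback never sees freed memory. */
	demo_event_cancel(&nbr->t_loading_done);
	free(nbr);
}
```

With an untracked event there is nothing to cancel in the delete path, and the callback later runs against freed memory, which matches the shape of the report above.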
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
I'm seeing crashes in ospf6_write on the `assert(node)`. The only
sequence of events that I see that could possibly cause this to happen
is this:
a) Someone has scheduled an outgoing write to the ospf6->t_write and
placed item(s) on the ospf6->oi_write_q
b) A decision is made in ospf6_send_lsupdate() to send an immediate
packet via an event_execute(..., ospf6_write,....).
c) ospf6_write is called and the oi_write_q is cleaned out.
d) the t_write event is now popped and the oi_write_q is empty
and FRR asserts on the `assert(node)` <crash>
When event_execute is called for ospf6_write, just cancel the t_write
event. If ospf6_write has more data to send at the end of the function
it will reschedule itself. I've only seen this crash one time and am
unable to reliably reproduce this at all. But this is the only mechanism
that I can see that could make this happen, given how little the oi_write_q
is actually touched in code.
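The suspected sequence and the fix can be modelled in a few lines. This is a simplified sketch under stated assumptions: the queue is reduced to a counter, the pending event to a flag, and all `demo_` names are hypothetical rather than taken from ospf6d.

```c
#include <assert.h>

/* Hypothetical model: a queue length and a "deferred write pending" flag. */
struct demo_ospf6 {
	int write_q_len;       /* items waiting on the write queue */
	int t_write_scheduled; /* 1 while a deferred write is pending */
	int packets_sent;
};

/* Drain handler: like the original, it asserts that there is work to do. */
void demo_write(struct demo_ospf6 *o)
{
	assert(o->write_q_len > 0);
	o->packets_sent += o->write_q_len;
	o->write_q_len = 0;
}

/* Step a): queue an item and schedule the deferred write. */
void demo_enqueue(struct demo_ospf6 *o)
{
	o->write_q_len++;
	o->t_write_scheduled = 1;
}

/* Steps b) and c): immediate send, cancelling the deferred write first. */
void demo_send_immediate(struct demo_ospf6 *o)
{
	o->write_q_len++;
	o->t_write_scheduled = 0; /* the fix: cancel before executing */
	demo_write(o);
}

/* Step d): the event loop only runs the write if it is still scheduled. */
void demo_run_pending(struct demo_ospf6 *o)
{
	if (!o->t_write_scheduled)
		return;
	o->t_write_scheduled = 0;
	demo_write(o);
}
```

If `demo_send_immediate()` left the flag set, `demo_run_pending()` would call `demo_write()` on an empty queue and hit the assert.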
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When running all daemons with config for most of them, FRR has
sharpd@janelle:~/frr$ vtysh -c "show debug hashtable" | grep "VRF BIT HASH" | wc -l
3570
3570 hashes for bitmaps associated with the vrf. This is a very
large number of hashes. Let's do two things:
a) Reduce the initial size of the created hashes to 2
instead of 32.
b) Delay generation of the hash *until* a set operation happens,
since the absence of a hash directly implies an unset value
if/when checked.
This reduces the number of hashes to 61 in my setup for normal
operation.
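The lazy-creation idea in b) can be sketched as follows. Note that this swaps in a plain growable word array instead of a hash table, and `struct demo_bitmap` and its functions are hypothetical names; the point is only that no storage exists until the first set, and a missing store reads as "unset".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical lazily allocated bit set; storage appears on first set. */
struct demo_bitmap {
	uint32_t *words; /* NULL until the first set operation */
	size_t nwords;
};

bool demo_bitmap_check(const struct demo_bitmap *bm, unsigned int bit)
{
	size_t idx = bit / 32;

	/* No storage (or bit beyond storage) directly implies "unset". */
	if (bm->words == NULL || idx >= bm->nwords)
		return false;
	return (bm->words[idx] >> (bit % 32)) & 1u;
}

bool demo_bitmap_set(struct demo_bitmap *bm, unsigned int bit)
{
	size_t idx = bit / 32;

	if (idx >= bm->nwords) {
		size_t n = idx + 1;
		uint32_t *w = realloc(bm->words, n * sizeof(*w));

		if (w == NULL)
			return false;
		for (size_t i = bm->nwords; i < n; i++)
			w[i] = 0;
		bm->words = w;
		bm->nwords = n;
	}
	bm->words[idx] |= (uint32_t)1 << (bit % 32);
	return true;
}

void demo_bitmap_unset(struct demo_bitmap *bm, unsigned int bit)
{
	size_t idx = bit / 32;

	/* Unsetting never allocates: the bit is already unset. */
	if (bm->words == NULL || idx >= bm->nwords)
		return;
	bm->words[idx] &= ~((uint32_t)1 << (bit % 32));
}

void demo_bitmap_free(struct demo_bitmap *bm)
{
	free(bm->words);
	bm->words = NULL;
	bm->nwords = 0;
}
```

Check and unset operations on a never-set bitmap cost no memory at all, which is where the bulk of the reduction would come from under this assumption.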
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
According to RFC 7166, the sequence number should be treated as an
unsigned 64-bit value, although it is stored as two 32-bit values.
When incrementing it, the code caused the lower-order 32-bit value
to skip from 0xFFFFFFFE to 0. As a side effect, an error was never
produced if the full 64-bit sequence number wrapped.
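A minimal sketch of incrementing a 64-bit value that is stored as two 32-bit halves, with the carry and the wrap check handled explicitly. `struct demo_seqnum` and `demo_seqnum_increment()` are hypothetical names, not the ospf6d ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical holder for a 64-bit sequence number kept as two halves. */
struct demo_seqnum {
	uint32_t high;
	uint32_t low;
};

/*
 * Increment the combined 64-bit value by one.
 * Returns false and leaves the value unchanged if it would wrap.
 */
bool demo_seqnum_increment(struct demo_seqnum *seq)
{
	uint64_t value = ((uint64_t)seq->high << 32) | seq->low;

	if (value == UINT64_MAX)
		return false; /* full 64-bit wrap: report an error */

	value++;
	seq->high = (uint32_t)(value >> 32);
	seq->low = (uint32_t)(value & 0xFFFFFFFFu);
	return true;
}
```

Starting from a low half of 0xFFFFFFFE, two increments pass through 0xFFFFFFFF and then carry into the high half, so no value is skipped.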
Fixes: #13805
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Commit 76249532fa ("ospf6d: Handle Premature Aging of LSAs") added a
duplicate call to OSPF6_INTRA_PREFIX_LSA_EXECUTE_TRANSIT(), when the
interface state changes to "Down".
Fixes: #1738
Signed-off-by: David Ward <david.ward@ll.mit.edu>
Change timestamp parameter from int to time_t to avoid truncation.
Found by Coverity Scan (CID 1563226 and 1563222)
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
This command makes unplanned GR more reliable by manipulating the
sending of Grace-LSAs and Hello packets for a certain amount of time,
increasing the chance that the neighboring routers are aware of
the ongoing graceful restart before resuming normal OSPF operation.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
In practical terms, unplanned GR refers to the act of recovering
from a software crash without affecting the forwarding plane.
Unplanned GR and Planned GR work virtually the same, except for the
following difference: on planned GR, the router sends the Grace-LSAs
*before* restarting, whereas in unplanned GR the router sends the
Grace-LSAs immediately *after* restarting.
For unplanned GR to work, ospf6d was modified to send a
ZEBRA_CLIENT_GR_CAPABILITIES message to zebra as soon as GR is
enabled. This causes zebra to freeze the OSPF routes in the RIB as
soon as the ospf6d daemon dies, for as long as the configured grace
period (the default is 120 seconds). Similarly, ospf6d now stores in
non-volatile memory that GR is enabled as soon as GR is configured.
Those two things are no longer done during the GR preparation phase,
which only happens for planned GRs.
Unplanned GR will only take effect when the daemon is killed
abruptly (e.g. SIGSEGV, SIGKILL), otherwise all OSPF routes will be
uninstalled while ospf6d is exiting. Once ospf6d starts, it will
check whether GR is enabled and enter GR mode if necessary,
sending Grace-LSAs out all operational interfaces.
One disadvantage of unplanned GR is that the neighboring routers
might time out their corresponding adjacencies if ospf6d takes too
long to come back up. This is especially the case when short dead
intervals are used (or BFD). For this and other reasons, planned
GR should be preferred whenever possible.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
When ospf6 is started up and SPF is run, we could miss adding a NH
depending on which route is selected as the parent route. If one
possible parent route has two equal-cost paths and the second possible
parent route has only one, which one is selected first determines
whether we have one or two NHs.
In the network below, consider creating a route to 2001:db8:3:4::/64 in R2.
When SPF is run there are two possible parent routes, R3 and R4.
2001:db8:1:2 +-----+ 2001:db8:2:5
+--------------+ 2 +---------------+
| ::2 | | ::2 |
| +-----+ |
| |
::1| |
+-----+ |::5
| 1 |2001:db8:1:3+-----+2001:db8:3:5+-----+2001:db8:5:6+-----+
| +------------+ 3 +------------+ 5 +------------+ 6 |
+-----+ ::1 ::3 | |::3 ::5 | |::5 ::6| |
::1| +-----+ +-----+ +-----+
| |::3
| | 2001:db8:3:4
| |
| |::4
| 2001:db8:1:4 +-----+
+--------------+ 4 |
::4 | |
+-----+
The problem was this: if we first created the route to 2001:db8:3:4::/64 with
R3 as the parent route, all was fine. The code merged the NHs from the parent
route, and R3 has 2 NHs, one pointing to R1 and one to R5. But if route
2001:db8:3:4::/64 was first created with R4 as the parent, it has only one NH
pointing to R1, and when a new vertex pointing to R3 was created later, the
code would only copy the NHs from the vertex, not from the parent route. The
vertex always has just one NH, but the parent route could have more. So
when we brought up this setup, one time we would see one NH for
2001:db8:3:4::/64 and the next time we would see two NHs. The code has been
modified so that it behaves the same whether the route is first created or
a vertex is created: it selects the NHs from the parent route.
Signed-off-by: Lynne Morrison <lynne@voltanet.io>
Effectively a massive search and replace of
`struct thread` to `struct event`. Using the
term `thread` gives people the thought that
this event system is a pthread when it is not.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is the first in a series of commits, whose goal is to rename
the thread system in FRR to an event system. There is a continual
problem where people are confusing `struct thread` with a true
pthread. In reality, our entire thread.c is an event system.
In this commit rename the thread.[ch] files to event.[ch].
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add a hash_clean_and_free() function and convert
the code to use it. This function takes a double
pointer to the hash so that it can set it to NULL. It also
cleanly does nothing if the pointer is NULL (as a bunch of
code tested for).
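The double-pointer convention can be sketched with a hypothetical container. `struct demo_table` and `demo_table_clean_and_free()` below are illustrative stand-ins and do not reproduce the signature of the real hash_clean_and_free().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container standing in for a hash table. */
struct demo_table {
	void **items;
	size_t count;
};

int demo_freed; /* counts element destructor calls, for illustration */

void demo_item_free(void *item)
{
	demo_freed++;
	free(item);
}

struct demo_table *demo_table_create(size_t count)
{
	struct demo_table *t = calloc(1, sizeof(*t));

	if (t == NULL)
		return NULL;
	t->items = calloc(count, sizeof(*t->items));
	if (t->items == NULL) {
		free(t);
		return NULL;
	}
	t->count = count;
	for (size_t i = 0; i < count; i++)
		t->items[i] = malloc(1);
	return t;
}

/*
 * Free every element, free the table, and NULL the caller's pointer.
 * A NULL table is silently accepted, so callers need no check.
 */
void demo_table_clean_and_free(struct demo_table **tp,
			       void (*free_func)(void *))
{
	struct demo_table *t = *tp;

	if (t == NULL)
		return;
	for (size_t i = 0; i < t->count; i++) {
		if (t->items[i] != NULL && free_func != NULL)
			free_func(t->items[i]);
	}
	free(t->items);
	free(t);
	*tp = NULL;
}
```

Because the caller's pointer is cleared, calling the function a second time is a harmless no-op rather than a double free.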
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The GR code should check for topology changes only upon the receipt
of Router-LSAs and Network-LSAs. Other LSA types don't affect the
topology as far as a restarting router is concerned.
This optimization reduces unnecessary computations when the
restarting router receives thousands of inter-area LSAs or external
LSAs while coming back up.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Upon exiting the GR mode, force reorigination of intra-area-prefix-LSAs
on all attached areas. This is to ensure all configured areas will have
an associated intra-area-prefix-LSA at the end of the GR procedures,
even if the area doesn't have any full adjacency.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
This commit fixes a bug where self-originated NSSA/AS-External LSAs
would age out about one hour after exiting from the GR mode. The
reason is that received self-originated LSAs aren't registered
for periodic refreshing, so that needs to be done manually. Fix
this by explicitly reoriginating all NSSA/AS-External LSAs while
exiting from the GR mode.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
An ABR that is originating inter-area-prefix-LSAs should take into
account the fact that there might be self-originated LSAs for the
same prefixes that were originated prior to a graceful restart. When
that happens, the previous LSA-IDs should be reused to avoid having
duplicate LSAs.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
RFC 5340 says that Link-LSAs should not be originated for virtual
links. Add a check to prevent that from happening while exiting
from the GR mode.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Consider the case where a DR router is performing a graceful restart,
and all neighbors attached to the DR network have their priorities
set to zero.
According to RFC 3623, the router should reclaim its DR status while
coming back up once it receives a Hello packet from a neighbor
listing the router as the DR, and the associated interface is in
Waiting state.
The problem arises when the DR election starts. Since the router
is already elected the DR, and no BDR will be elected (since all
neighbors have their priorities set to zero), the AdjOk event won't
be triggered at the end of the DR election as it would normally
happen. That causes all neighbors reachable over the broadcast
interface to get stuck in the 2-Way state.
Fix this corner case by always triggering the AdjOk event at the
end of the DR election process when a GR is in progress. Triggering
the AdjOk event when not necessary should never be a problem as
the neighbor FSM is already prepared to deal with that.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
RFC 5340, Section 4.8.3 says:
"Prefixes having the NU-bit set in their PrefixOptions field should
be ignored by the inter-area route calculation".
Fix a bug where, in addition to the NU-bit, ospf6d was also ignoring
prefixes having the LA-bit set when computing inter-area routes. In
practice, this fixes interoperability issues with vendors that set
the LA-bit in loopback prefixes (among other cases).
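The intended check can be sketched as a single bit test. The bit values follow the PrefixOptions layout in RFC 5340 (NU is the lowest bit, LA the next one); the macro and function names are hypothetical and not the ones used in ospf6d.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* PrefixOptions bits per RFC 5340; macro names are illustrative only. */
#define DEMO_PREFIX_OPTION_NU 0x01 /* NU-bit: exclude from unicast calc */
#define DEMO_PREFIX_OPTION_LA 0x02 /* LA-bit: prefix is a local address */

/*
 * Decide whether a prefix takes part in the inter-area route calculation.
 * Only the NU-bit excludes it; the LA-bit alone must not.
 */
bool demo_inter_area_prefix_usable(uint8_t prefix_options)
{
	return (prefix_options & DEMO_PREFIX_OPTION_NU) == 0;
}
```

A prefix carrying only the LA-bit, as some vendors set for loopbacks, stays usable under this check.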
While here, fix a copy-and-paste error where a log message wasn't
showing accurate information about what happened.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Do not assume that all redistributed routes have a nexthop address,
otherwise blackhole nexthops can be misinterpreted as IPv6 addresses,
leading to inconsistencies.
Also, change the signature of a few functions to allow const nexthop
addresses, such that in6addr_any can be used without type casts.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Originate AS-External LSAs with forwarding addresses whenever the
corresponding redistributed routes have a global nexthop address.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Don't directly use `time()` for generating sequence numbers for two
reasons:
1. `time()` can go backwards (due to NTP or time adjustments)
2. Coverity Scan warns every time we truncate a `time_t` variable for
good reason (verify that we are Y2K38 ready).
Found by Coverity Scan (CID 1519812, 1519786, 1519783 and 1519772)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Problem Statement:
=================
Memory leak backtraces
2022-11-23 01:51:10,525 - ERROR: ==842== 1,100 (1,000 direct, 100 indirect) bytes in 5 blocks are definitely lost in loss record 29 of 31
2022-11-23 01:51:10,525 - ERROR: ==842== at 0x4C31FAC: calloc (vg_replace_malloc.c:762)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4E8A1BF: qcalloc (memory.c:111)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13555A: ospf6_lsa_alloc (ospf6_lsa.c:723)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x1355F3: ospf6_lsa_create_headeronly (ospf6_lsa.c:756)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x135702: ospf6_lsa_copy (ospf6_lsa.c:790)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13B64B: ospf6_dbdesc_recv_slave (ospf6_message.c:976)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13B64B: ospf6_dbdesc_recv (ospf6_message.c:1038)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13B64B: ospf6_read_helper (ospf6_message.c:1838)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13B64B: ospf6_receive (ospf6_message.c:1875)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4EB741B: thread_call (thread.c:1692)
2022-11-23 01:51:10,526 - ERROR: ==842== by 0x4E85B17: frr_run (libfrr.c:1068)
2022-11-23 01:51:10,526 - ERROR: ==842== by 0x119585: main (ospf6_main.c:228)
2022-11-23 01:51:10,526 - ERROR: ==842==
2022-11-23 01:51:10,524 - ERROR: Found memory leak in module ospf6d
2022-11-23 01:51:10,525 - ERROR: ==842== 220 (200 direct, 20 indirect) bytes in 1 blocks are definitely lost in loss record 21 of 31
2022-11-23 01:51:10,525 - ERROR: ==842== at 0x4C31FAC: calloc (vg_replace_malloc.c:762)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4E8A1BF: qcalloc (memory.c:111)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13555A: ospf6_lsa_alloc (ospf6_lsa.c:723)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x1355F3: ospf6_lsa_create_headeronly (ospf6_lsa.c:756)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x135702: ospf6_lsa_copy (ospf6_lsa.c:790)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13BBCE: ospf6_dbdesc_recv_master (ospf6_message.c:760)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13BBCE: ospf6_dbdesc_recv (ospf6_message.c:1036)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13BBCE: ospf6_read_helper (ospf6_message.c:1838)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13BBCE: ospf6_receive (ospf6_message.c:1875)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4EB741B: thread_call (thread.c:1692)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4E85B17: frr_run (libfrr.c:1068)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x119585: main (ospf6_main.c:228)
2022-11-23 01:51:10,525 - ERROR: ==842==
RCA:
====
These memory leaks occur because the last LSA in the neighbour's request_list
is not freed while it is still locked. The last request holds an additional
lock that is added as part of ospf6_make_lsreq; this lock needs to be removed
in order for the LSA to be freed.
Fix:
====
Check and remove the lock on the last request in all the functions.
Signed-off-by: Manoj Naragund <mnaragund@vmware.com>
When using auth keys in ospfv3, there are some memory
leaks when you change the key or remove the interface.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The ospf6_route_cmp_nexthops function was returning 0 for same
and 1 for not same. Let's reverse the polarity and actually make
the returns useful long term.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Commit 8f359e1593 removed a check that prevented the same route
from being added twice. In certain topologies, that change resulted in
the following infinite loop when adding an ASBR route:
ospf6_route_add
ospf6_top_brouter_hook_add
ospf6_abr_examin_brouter
ospf6_abr_examin_summary
ospf6_route_add
(repeat until stack overflow)
Revert the offending commit and update `ospf6_route_is_identical()` to
not do comparison using `memcmp()`.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Problem:
Delay in ospfv3 route installation when an area gets converted from NSSA
to regular.
RCA:
When an area gets converted from NSSA to normal, the type-7 LSAs (NSSA LSAs)
get flushed from the area; as a result, the external routes
learnt from these type-7 LSAs get removed. Once the area has moved
to normal, the type-5 LSAs need to be flooded through the area
so that the routes are re-learnt. However, these LSAs are not
flooded until they are refreshed, which delays
installation of the routes.
Fix:
Refresh the type-5 LSAs once the area
is changed from NSSA to a regular area.
Signed-off-by: Manoj Naragund <mnaragund@vmware.com>
donatas-pc# sh ipv6 ospf6 interface enp3s0
enp3s0 is up, type BROADCAST
Interface ID: 2
Internet Address:
inet : 192.168.10.17/24
inet6: fe80::ca5d:fd0d:cd8:1bb7/64
Instance ID 0, Interface MTU 1500 (autodetect: 1500)
MTU mismatch detection: enabled
Area ID 0.0.0.0, Cost 1000
State Waiting, Transmit Delay 1 sec, Priority 1
Timer intervals configured:
Hello 10(8.149), Dead 40, Retransmit 5
DR: 0.0.0.0 BDR: 0.0.0.0
Number of I/F scoped LSAs is 1
0 Pending LSAs for LSUpdate in Time 00:00:00 [thread off]
0 Pending LSAs for LSAck in Time 00:00:00 [thread off]
Authentication Trailer is disabled
donatas-pc# con
donatas-pc(config)# int enp3s0
donatas-pc(config-if)# ipv6 ospf6 passive
donatas-pc(config-if)# do sh ipv6 ospf6 interface enp3s0
enp3s0 is up, type BROADCAST
Interface ID: 2
Internet Address:
inet : 192.168.10.17/24
inet6: fe80::ca5d:fd0d:cd8:1bb7/64
Instance ID 0, Interface MTU 1500 (autodetect: 1500)
MTU mismatch detection: enabled
Area ID 0.0.0.0, Cost 1000
State Waiting, Transmit Delay 1 sec, Priority 1
Timer intervals configured:
No Hellos (Passive interface)
DR: 0.0.0.0 BDR: 0.0.0.0
Number of I/F scoped LSAs is 1
0 Pending LSAs for LSUpdate in Time 00:00:00 [thread off]
0 Pending LSAs for LSAck in Time 00:00:00 [thread off]
Authentication Trailer is disabled
donatas-pc(config-if)#
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
ospfd and ospf6d define the same metric-type route-map commands. Make
them have the same help string too.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Rather than running selected source files through the preprocessor and a
bunch of perl regex'ing to get the list of all DEFUNs, use the data
collected in frr.xref.
This not only eliminates issues we've been having with preprocessor
failures due to nonexistent header files, but is also much faster.
Where extract.pl would take 5s, this now finishes in 0.2s. And since
this is a non-parallelizable build step towards the end of the build
(dependent on a lot of other things being done already), the speedup is
actually noticeable.
Also files containing CLI no longer need to be listed in `vtysh_scan`
since the .xref data covers everything. `#ifndef VTYSH_EXTRACT_PL`
checks are equally obsolete.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Problem:
Multiple memory leaks in ospf6.
260 ==6637== 32 bytes in 1 blocks are definitely lost in loss record 5 of 24
261 ==6637== at 0x4C31FAC: calloc (vg_replace_malloc.c:762)
262 ==6637== by 0x4E8A1BF: qcalloc (memory.c:111)
263 ==6637== by 0x11EE27: ospf6_summary_add_aggr_route_and_blackhole (ospf6_asbr.c:2779)
264 ==6637== by 0x11EEBA: ospf6_originate_new_aggr_lsa (ospf6_asbr.c:2811)
265 ==6637== by 0x4E7C6A7: hash_clean (hash.c:325)
266 ==6637== by 0x11FA93: ospf6_handle_external_aggr_update (ospf6_asbr.c:3164)
267 ==6637== by 0x11FA93: ospf6_asbr_summary_process (ospf6_asbr.c:3386)
268 ==6637== by 0x4EB739B: thread_call (thread.c:1692)
269 ==6637== by 0x4E85B17: frr_run (libfrr.c:1068)
270 ==6637== by 0x119535: main (ospf6_main.c:228)
356 ==6637== 240 bytes in 12 blocks are indirectly lost in loss record 13 of 24
357 ==6637== at 0x4C2FE96: malloc (vg_replace_malloc.c:309)
358 ==6637== by 0x4E8A0DA: qmalloc (memory.c:106)
359 ==6637== by 0x13545C: ospf6_lsa_alloc (ospf6_lsa.c:724)
360 ==6637== by 0x1354E3: ospf6_lsa_create_headeronly (ospf6_lsa.c:756)
361 ==6637== by 0x1355F2: ospf6_lsa_copy (ospf6_lsa.c:790)
362 ==6637== by 0x13B58B: ospf6_dbdesc_recv_slave (ospf6_message.c:976)
363 ==6637== by 0x13B58B: ospf6_dbdesc_recv (ospf6_message.c:1038)
364 ==6637== by 0x13B58B: ospf6_read_helper (ospf6_message.c:1838)
365 ==6637== by 0x13B58B: ospf6_receive (ospf6_message.c:1875)
366 ==6637== by 0x4EB739B: thread_call (thread.c:1692)
367 ==6637== by 0x4E85B17: frr_run (libfrr.c:1068)
368 ==6637== by 0x119535: main (ospf6_main.c:228)
RCA:
1. When the ospf6 area is being deleted, the neighbor-related information
was not being cleaned up.
2. When an aggr route gets deleted from rt_aggr_tbl, the corresponding summary
route attached to the aggr route was not being deleted.
Fix:
Added the ospf6_neighbor_delete in ospf6_area_delete to free the
neighbor related information and added ospf6_route_delete while
freeing external aggr route to free the summary route.
Signed-off-by: Manoj Naragund <mnaragund@vmware.com>
Description:
The active GR count field is missing in the JSON output
of the 'show ipv6 ospf gr helper' command.
Issue: #12100
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
There are lib debugs being set but never show up in
`show debug` commands because there was no way to show
that they were being used. Add a bit of infrastructure
to allow this and then use it for `debug route-map`
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Fix issue #11839.
When the user defines a range in an area other than the backbone area, the
summary route will be announced to the backbone area as an inter-area LSA.
However, if the prefix defined in the range is the same prefix as a connected
route in that area, the LSA won't be announced to the backbone area.
This is because when ospf6d is originating the summary route for the
intra-area route, it finds the range configured by the user and tries to
suppress the route by deleting the existing summary route, which happens to be
the one created by the range.
Although the range definition is not necessary in this case, it should not
fail this use case. So let's just keep the summary route there if it is
created from the user defined range.
Signed-off-by: Xiaodong Xu <stid.smth@gmail.com>
After all needed interfaces are ready/created (for example: interface "a1" and
vrf "vrf1", with "a1" bound to "vrf1"), start or restart frr. At startup, zebra
calls `netlink_interface()` to process all interfaces and notify
all clients, but its call to `get_iflink_speed()` may fail because of the
unexpected order in which interfaces arrive: when processing "a1", "vrf1" may
be unknown at that time. The `if_zebra_speed_update()` timer was introduced to
deal with this ordering problem.
Currently only ospfd and ospf6d react to this speed change by recalculating
route cost. ospfd handles the change, but ospf6d wrongly misses it.
When neither `ipv6 ospf6 cost COST` nor `auto-cost reference-bandwidth COST` is
set, the cost of the ospf6 interface should be calculated from the interface
speed, but it is wrongly kept at `10`, which is based on the interface speed
being `0`, because the speed change was missed. Further, ECMP stops working
after restarting frr, because some ospf6 interfaces that are part of the same
ECMP wrongly have cost `10`.
To avoid missing the change, recalculate the cost for ospf6 interfaces based on
the potentially changed speed.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
1. A topotest failure is seen in the step below, with
routes not synced with the ABR:
ospf6_gr_topo1/test_ospf6_gr_topo1.py::test_gr_rt3 - AssertionError:
"rt1" JSON output mismatches the expected result.
2. As an experiment, increasing the sleep interval (21)
cleared the above step but failed in this step:
FAILED ospf6_gr_topo1/test_ospf6_gr_topo1.py::test_gr_rt5 -
AssertionError: "rt2" JSON output mismatches the expected result
fix:
Tuning the retry parameter in check_routers cleared the topotest,
so change the default value of the ospf6 ABR task delay to 5 seconds.
Signed-off-by: Punith Kumar S <punith.shivakumar@sophos.com>
topology: C1--R1---R2---R3--C2
client C1 connected to router node R1
client C2 connected to router node R3
router nodes R1,R2 and R3 are back to back connected
area 0 configured between R1 and R2
R1: all routes of area 0 are learnt successfully
R2: all routes of area 0 are learnt successfully
area 1 configured between R2 and R3
R2: all routes are learnt from R3
R3: routes learnt from C1 on ABR router R2 do not get forwarded to R3
root cause: on interface start, the ABR schedule task is missing.
fix: handle ABR scheduling during the interface start event
Signed-off-by: Punith Kumar S <punith.shivakumar@sophos.com>
It's possible for ospf6 to decide to delete a route after it's
removed all of the route's nexthops. It's ok to delete a prefix
alone - be a little more forgiving when preparing a route delete.
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
OSPFv3 packets can be fragmented and up to 64k long, regardless of
interface MTU. Trying to size these buffers to MTU is just plain wrong.
To not make this a super intrusive change during the 8.3 release freeze,
just code this into ospf6_iobuf_size().
Since the buffer is now always 64k, don't waste time zeroing the entire
thing in receive; instead just zero kind of a "sled" of 128 bytes after
the buffer as a security precaution.
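A rough sketch of the sizing and the "sled" idea, assuming the sled sits directly after the bytes that were received and that the maximum packet length is 65535 bytes. The constants and function names are hypothetical; the real logic lives in ospf6_iobuf_size() and the receive path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_MAX_PACKET 65535 /* assumed maximum OSPFv3 packet length */
#define DEMO_SLED 128         /* bytes zeroed after the received data */

/* Buffer size no longer depends on the interface MTU. */
size_t demo_iobuf_size(void)
{
	return DEMO_MAX_PACKET + DEMO_SLED;
}

/*
 * After receiving len bytes into buf, zero only the sled that follows
 * them instead of clearing the whole buffer on every receive.
 */
void demo_zero_sled(unsigned char *buf, size_t len)
{
	assert(len <= DEMO_MAX_PACKET);
	memset(buf + len, 0, DEMO_SLED);
}
```

The trade-off is that stale bytes may remain beyond the sled, but a parser that overruns the packet by a small amount reads zeros rather than leftover data.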
Fixes: #11298
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
a) Remove setting of thread pointer to NULL after
thread invocation, this is already done.
b) Use thread_is_scheduled()
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The ospf6_is_valid_summary_addr function is checking
to see if a prefix is the default and also then double
comparing it against the v6 prefix part. No need to do this.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
If an end user does something like this:
int enp39s0
ipv6 ospf6 hello-interval 65535
And then the timer pops, we send the hello, and immediately
afterwards the end user does this:
ipv6 ospf6 hello-interval 5
The timer is not being reset and FRR waits the full 65k seconds
before sending the hello again, which then immediately sets
the next hello to go out in 5 seconds.
When FRR receives the new timer value, look at how much time
is left on the timer in seconds. If this value is greater
than the new hello timer, stop the timer and set it to that
value.
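The decision described above can be sketched in C as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the real code works on FRR timer threads rather than plain integers, and the interval is assumed to be at least 1 second.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: true when the pending hello timer has more time
 * left than the newly configured interval and should be re-armed. */
static bool sketch_hello_needs_rearm(unsigned long remaining_sec,
				     unsigned long new_interval_sec)
{
	assert(new_interval_sec > 0);
	return remaining_sec > new_interval_sec;
}

/* Delay until the next hello after the interval was reconfigured:
 * either the new interval (timer restarted) or whatever was left. */
static unsigned long sketch_next_hello_delay(unsigned long remaining_sec,
					     unsigned long new_interval_sec)
{
	if (sketch_hello_needs_rearm(remaining_sec, new_interval_sec))
		return new_interval_sec;
	return remaining_sec;
}
```

With roughly 65480 seconds left and a new interval of 12, the sketch re-arms the timer for 12 seconds; with 3 seconds left it leaves the timer alone.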
This should fix a failure found in a CI system test, where the
test sets the timer from something like 12 seconds to 65k seconds
and then back down to 12, and checks that the ospf6 neighbor
relationship stays up.
The code was also changed from thread_add_event to thread_add_timer
in all cases. I am not sure what would happen if a show command
asked for the time remaining on a thread that is an event instead
of a timer, so just make it consistent.
This was chased down because the support bundle showed this:
r0# show ipv6 ospf6 vrf all interface
r0-r1-eth0 is up, type BROADCAST
Interface ID: 6
Internet Address:
inet6: fe80::a4ea:d3ff:fe35:cef1/64
inet6: fd00::1/64
Instance ID 0, Interface MTU 1500 (autodetect: 1500)
MTU mismatch detection: enabled
Area ID 0.0.0.0, Cost 10
State DR, Transmit Delay 1 sec, Priority 1
Timer intervals configured:
Hello 12(65480.960), Dead 48, Retransmit 5
And the test code is doing stuff like this:
2022/05/16 17:08:15 OSPF6: [M7Q4P-46WDR] vty[5]@(config)# interface r1-r0-eth0
2022/05/16 17:08:15 OSPF6: [M7Q4P-46WDR] vty[5]@(config-if)# ipv6 ospf6 hello-interval 65535
2022/05/16 17:08:15 OSPF6: [M7Q4P-46WDR] vty[5]@(config-if)# no ipv6 ospf6 hello-interval
2022/05/16 17:08:16 OSPF6: [M7Q4P-46WDR] vty[5]@(config-if)# ipv6 ospf6 hello-interval 1
2022/05/16 17:08:16 OSPF6: [M7Q4P-46WDR] vty[5]@(config-if)# ipv6 ospf6 hello-interval 12
If the old timer value pops, the hello interval is set to 65k and never reset again.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When running `show ipv6 ospf6 interface` the hello timer period
is shown, but there is no indication of how much time is left
on the timer. Add a clue:
sharpd@eva ~/frr5 (master)> vtysh -c "show ipv6 ospf6 int"
enp39s0 is up, type BROADCAST
Interface ID: 2
Internet Address:
inet : 192.168.119.224/24
inet6: 2603:6080:602:509e:9a14:998:b154:9e9/64
Instance ID 0, Interface MTU 1500 (autodetect: 1500)
MTU mismatch detection: enabled
Area ID 0.0.0.0, Cost 1000
State DR, Transmit Delay 1 sec, Priority 1
Timer intervals configured:
Hello 10(2.652), Dead 40, Retransmit 5
DR: 192.168.122.1 BDR: 0.0.0.0
Number of I/F scoped LSAs is 1
0 Pending LSAs for LSUpdate in Time 00:00:00 [thread off]
0 Pending LSAs for LSAck in Time 00:00:00 [thread off]
Authentication Trailer is disabled
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Firstly, calls to `hash_get()` with a NULL `alloc_func` are
*left unchanged*.
This change only concerns calls to `hash_get()` with a
non-NULL `alloc_func`.
Since `hash_get()` with a non-NULL `alloc_func` parameter
cannot fail, its return value is never NULL and can simply
be ignored.
So in this case, remove the unnecessary NULL check on the
return value and add `(void)` in front of the call.
Importantly, the following two cases with a non-NULL
`alloc_func` are also *left unchanged*:
1) Use `assert(<returned_data> == <searching_data>)` to
ensure it is a created node, not a found node.
Refer to `isis_vertex_queue_insert()` of isisd; there
are many examples of this case in isisd.
2) Use `<returned_data> != <searching_data>` to judge that
it is a found node, then free <searching_data>.
Refer to `aspath_intern()` of bgpd; there are many
examples of this case in bgpd.
Here, <returned_data> is the value returned from `hash_get()`,
and <searching_data> is the data to be put into the
hash table.
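To make the two unchanged patterns concrete, here is a small self-contained C sketch. It does not use the FRR hash library: `toy_get()` is a hypothetical stand-in that only mimics the "find or create" behaviour described above, and all `toy_*` names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_SLOTS 8

struct toy_table {
	char *slots[TOY_SLOTS];
	size_t count;
};

/* Toy stand-in for a find-or-create lookup: return the stored entry
 * equal to `data`; otherwise store alloc_func(data) and return it.
 * With a NULL alloc_func a miss returns NULL. */
static char *toy_get(struct toy_table *t, char *data,
		     char *(*alloc_func)(char *))
{
	for (size_t i = 0; i < t->count; i++)
		if (strcmp(t->slots[i], data) == 0)
			return t->slots[i];
	if (alloc_func == NULL)
		return NULL;
	assert(t->count < TOY_SLOTS);
	t->slots[t->count] = alloc_func(data);
	return t->slots[t->count++];
}

/* alloc_func that stores the caller's own pointer. */
static char *toy_alloc_intern(char *data)
{
	return data;
}

/* Pattern 2: a returned pointer different from the search data means
 * an existing node was found, so the search data is freed. */
static char *toy_intern(struct toy_table *t, char *candidate)
{
	char *found = toy_get(t, candidate, toy_alloc_intern);

	if (found != candidate)
		free(candidate);
	return found;
}
```

Pattern 1 corresponds to asserting `toy_get(&t, data, toy_alloc_intern) == data` at a call site where the entry is known to be new.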
Signed-off-by: anlan_cs <vic.lan@pica8.com>
A router has some static routes and redistribution turned on.
The "clear ipv6 ospf process" command is applied. Then the static
routes are deleted. In 1 in 5 runs, AS-External LSAs are not removed
from the neighbors even though they are removed from the router's
own LSDB.
Because of the clear process command, MAX_AGE LSAs are advertised and
fresh LSAs are installed in the LSDB. When the MAX_AGE LSAs are
advertised back to the same router as part of the flooding process,
the fresh LSAs get added to the LSUpdate list even though the MAX_AGE
LSAs arrive inside the MinLSArrival time.
When the static routes are deleted, the LSAs are removed from the
LSRetrans list but not from the LSUpdate list. The LSAs present in
the LSUpdate list get advertised when sending LS Updates.
When an old copy of an LSA is more recent than the new LSA, check if it
has come inside the MinLSArrival time before adding to the LSUpdate
list.
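A minimal C sketch of such a check follows. The names are hypothetical, timestamps are plain milliseconds rather than FRR time structures, and the 1 second MinLSArrival value is the architectural constant from RFC 2328, assumed here for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* MinLSArrival as defined in RFC 2328 (1 second), in milliseconds. */
#define SKETCH_MIN_LS_ARRIVAL_MSEC 1000

/* True when the database copy was installed less than MinLSArrival
 * ago, i.e. the received LSA arrived inside the MinLSArrival window. */
static bool sketch_within_min_ls_arrival(long long installed_msec,
					 long long now_msec)
{
	assert(now_msec >= installed_msec);
	return (now_msec - installed_msec) < SKETCH_MIN_LS_ARRIVAL_MSEC;
}

/* Queue the database copy on the LSUpdate list only when it is more
 * recent than the received LSA and the window has already passed. */
static bool sketch_should_queue_db_copy(bool db_copy_is_newer,
					long long installed_msec,
					long long now_msec)
{
	return db_copy_is_newer &&
	       !sketch_within_min_ls_arrival(installed_msec, now_msec);
}
```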
Signed-off-by: Yash Ranjan <ranjany@vmware.com>
ospf6_routemap_rule_match_interface uses the route->ospf6 field for
matching, so we must fill the field in our temporary variable.
Fixes: #10911
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>