Effectively a massive search and replace of
`struct thread` to `struct event`. Using the
term `thread` gives people the thought that
this event system is a pthread when it is not
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is a first in a series of commits, whose goal is to rename
the thread system in FRR to an event system. There is a continual
problem where people are confusing `struct thread` with a true
pthread. In reality, our entire thread.c is an event system.
In this commit rename the thread.[ch] files to event.[ch].
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Problem Statement:
=================
Memory leak backtraces
2022-11-23 01:51:10,525 - ERROR: ==842== 1,100 (1,000 direct, 100 indirect) bytes in 5 blocks are definitely lost in loss record 29 of 31
2022-11-23 01:51:10,525 - ERROR: ==842== at 0x4C31FAC: calloc (vg_replace_malloc.c:762)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4E8A1BF: qcalloc (memory.c:111)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13555A: ospf6_lsa_alloc (ospf6_lsa.c:723)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x1355F3: ospf6_lsa_create_headeronly (ospf6_lsa.c:756)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x135702: ospf6_lsa_copy (ospf6_lsa.c:790)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13B64B: ospf6_dbdesc_recv_slave (ospf6_message.c:976)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13B64B: ospf6_dbdesc_recv (ospf6_message.c:1038)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13B64B: ospf6_read_helper (ospf6_message.c:1838)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13B64B: ospf6_receive (ospf6_message.c:1875)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4EB741B: thread_call (thread.c:1692)
2022-11-23 01:51:10,526 - ERROR: ==842== by 0x4E85B17: frr_run (libfrr.c:1068)
2022-11-23 01:51:10,526 - ERROR: ==842== by 0x119585: main (ospf6_main.c:228)
2022-11-23 01:51:10,526 - ERROR: ==842==
2022-11-23 01:51:10,524 - ERROR: Found memory leak in module ospf6d
2022-11-23 01:51:10,525 - ERROR: ==842== 220 (200 direct, 20 indirect) bytes in 1 blocks are definitely lost in loss record 21 of 31
2022-11-23 01:51:10,525 - ERROR: ==842== at 0x4C31FAC: calloc (vg_replace_malloc.c:762)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4E8A1BF: qcalloc (memory.c:111)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13555A: ospf6_lsa_alloc (ospf6_lsa.c:723)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x1355F3: ospf6_lsa_create_headeronly (ospf6_lsa.c:756)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x135702: ospf6_lsa_copy (ospf6_lsa.c:790)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13BBCE: ospf6_dbdesc_recv_master (ospf6_message.c:760)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13BBCE: ospf6_dbdesc_recv (ospf6_message.c:1036)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13BBCE: ospf6_read_helper (ospf6_message.c:1838)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x13BBCE: ospf6_receive (ospf6_message.c:1875)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4EB741B: thread_call (thread.c:1692)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x4E85B17: frr_run (libfrr.c:1068)
2022-11-23 01:51:10,525 - ERROR: ==842== by 0x119585: main (ospf6_main.c:228)
2022-11-23 01:51:10,525 - ERROR: ==842==
RCA:
====
These memory leaks are beacuse of last lsa in neighbour's request_list is not
getting freed beacuse of lsa lock. The last request has an addtional lock which
is added as a part of ospf6_make_lsreq, this lock needs to be removed
in order for the lsa to get freed.
Fix:
====
Check and remove the lock on the last request in all the functions.
Signed-off-by: Manoj Naragund <mnaragund@vmware.com>
a) Remove setting of thread pointer to NULL after
thread invocation, this is already done.
b) Use thread_is_scheduled()
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Problem Statement:
==================
RFC 7166 support for OSPF6 in FRR code.
RCA:
====
This feature is newly supported in FRR
Fix:
====
Core functionality implemented in previous commit is
stitched with rest of ospf6 code as part of this commit.
Risk:
=====
Low risk
Tests Executed:
===============
Have executed the combination of commands.
Signed-off-by: Abhinay Ramesh <rabhinay@vmware.com>
Currently the ospf6d's commands with non-exist vrfs can't give the error
informations to users.
This commit adds a macro "OSPF6_CMD_CHECK_VRF" to give error information
if with non-exist vrfs. As usual, skip the checking process in the case
of json.
So one command can call this macro to do the checking process in its
end. At that time it need know json style or not, so add "json" parameter for
several related functions.
BTW, suppress the build warning of the macro `OSPF6_FIND_VRF_ARGS`:
"Macros starting with if should be enclosed by a do - while loop to avoid
possible if/else logic defects."
Signed-off-by: anlan_cs <anlan_cs@tom.com>
The adj_ok thread event is being added but not killed
when the underlying interface is deleted. I am seeing
this crash:
OSPF6: Received signal 11 at 1636142186 (si_addr 0x0, PC 0x561d7fc42285); aborting...
OSPF6: zlog_signal+0x18c 7f227e93519a 7ffdae024590 /lib/libfrr.so.0 (mapped at 0x7f227e884000)
OSPF6: core_handler+0xe3 7f227e97305e 7ffdae0246b0 /lib/libfrr.so.0 (mapped at 0x7f227e884000)
OSPF6: funlockfile+0x50 7f227e863140 7ffdae024800 /lib/x86_64-linux-gnu/libpthread.so.0 (mapped at 0x7f227e84f000)
OSPF6: ---- signal ----
OSPF6: need_adjacency+0x10 561d7fc42285 7ffdae024db0 /usr/lib/frr/ospf6d (mapped at 0x561d7fbc6000)
OSPF6: adj_ok+0x180 561d7fc42f0b 7ffdae024dc0 /usr/lib/frr/ospf6d (mapped at 0x561d7fbc6000)
OSPF6: thread_call+0xc2 7f227e989e32 7ffdae024e00 /lib/libfrr.so.0 (mapped at 0x7f227e884000)
OSPF6: frr_run+0x217 7f227e92a7f3 7ffdae024ec0 /lib/libfrr.so.0 (mapped at 0x7f227e884000)
OSPF6: main+0xf3 561d7fc0f573 7ffdae024fd0 /usr/lib/frr/ospf6d (mapped at 0x561d7fbc6000)
OSPF6: __libc_start_main+0xea 7f227e6b0d0a 7ffdae025010 /lib/x86_64-linux-gnu/libc.so.6 (mapped at 0x7f227e68a000)
OSPF6: _start+0x2a 561d7fc0f06a 7ffdae0250e0 /usr/lib/frr/ospf6d (mapped at 0x561d7fbc6000)
OSPF6: in thread adj_ok scheduled from ospf6d/ospf6_interface.c:678 dr_election()
The crash is in the on->ospf6_if pointer is NULL. The only way this could
happen from what I can tell is that the event is added to the system
and then we immediately delete the interface, removing the memory
but not freeing up the adj_ok thread event.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
I am seeing a crash of ospf6d with this stack trace:
OSPF6: Received signal 11 at 1636042827 (si_addr 0x0, PC 0x55efc2d09ec2); aborting...
OSPF6: zlog_signal+0x18c 7fe20c8ca19a 7ffd08035590 /lib/libfrr.so.0 (mapped at 0x7fe20c819000)
OSPF6: core_handler+0xe3 7fe20c90805e 7ffd080356b0 /lib/libfrr.so.0 (mapped at 0x7fe20c819000)
OSPF6: funlockfile+0x50 7fe20c7f8140 7ffd08035800 /lib/x86_64-linux-gnu/libpthread.so.0 (mapped at 0x7fe20c7e4000)
OSPF6: ---- signal ----
OSPF6: ospf6_neighbor_state_change+0xdc 55efc2d09ec2 7ffd08035d90 /usr/lib/frr/ospf6d (mapped at 0x55efc2c8e000)
OSPF6: exchange_done+0x15c 55efc2d0ab4a 7ffd08035dc0 /usr/lib/frr/ospf6d (mapped at 0x55efc2c8e000)
OSPF6: thread_call+0xc2 7fe20c91ee32 7ffd08035df0 /lib/libfrr.so.0 (mapped at 0x7fe20c819000)
OSPF6: frr_run+0x217 7fe20c8bf7f3 7ffd08035eb0 /lib/libfrr.so.0 (mapped at 0x7fe20c819000)
OSPF6: main+0xf3 55efc2cd7573 7ffd08035fc0 /usr/lib/frr/ospf6d (mapped at 0x55efc2c8e000)
OSPF6: __libc_start_main+0xea 7fe20c645d0a 7ffd08036000 /lib/x86_64-linux-gnu/libc.so.6 (mapped at 0x7fe20c61f000)
OSPF6: _start+0x2a 55efc2cd706a 7ffd080360d0 /usr/lib/frr/ospf6d (mapped at 0x55efc2c8e000)
OSPF6: in thread exchange_done scheduled from ospf6d/ospf6_message.c:2264 ospf6_dbdesc_send_newone()
The stack trace when decoded is:
(gdb) l *(ospf6_neighbor_state_change+0xdc)
0x7bec2 is in ospf6_neighbor_state_change (ospf6d/ospf6_neighbor.c:200).
warning: Source file is more recent than executable.
195 on->name, ospf6_neighbor_state_str[prev_state],
196 ospf6_neighbor_state_str[next_state],
197 ospf6_neighbor_event_string(event));
198 }
199
200 /* Optionally notify about adjacency changes */
201 if (CHECK_FLAG(on->ospf6_if->area->ospf6->config_flags,
202 OSPF6_LOG_ADJACENCY_CHANGES)
203 && (CHECK_FLAG(on->ospf6_if->area->ospf6->config_flags,
204 OSPF6_LOG_ADJACENCY_DETAIL)
OSPFv3 is creating the event without a managing thread and as such
if the event is not run before a deletion event comes in memory
will be freed up and we'll start trying to access memory we should
not. Modify ospfv3 to track the thread and appropriately stop
it when the memory is deleted or it is no longer need to run
that bit of code.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
FRR should only ever use the appropriate THREAD_ON/THREAD_OFF
semantics. This is espacially true for the functions we
end up calling the thread for.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
RFC 5187 specifies the Graceful Restart enhancement to the OSPFv3
routing protocol. This commit implements support for the GR
restarting mode.
Here's a quick summary of how the GR restarting mode works:
* GR can be enabled on a per-instance basis using the `graceful-restart
[grace-period (1-1800)]` command;
* To perform a graceful shutdown, the `graceful-restart prepare ipv6
ospf` EXEC-level command needs to be issued before restarting the
ospf6d daemon (there's no specific requirement on how the daemon
should be restarted);
* `graceful-restart prepare ospf` will initiate the graceful restart
for all GR-enabled instances by taking the following actions:
o Flooding Grace-LSAs over all interfaces
o Freezing the OSPF routes in the RIB
o Saving the end of the grace period in non-volatile memory (a JSON
file stored in `$frr_statedir`)
* Once ospf6d is started again, it will follow the procedures
described in RFC 3623 until it detects it's time to exit the graceful
restart (either successfully or unsuccessfully).
Testing done:
* New topotest featuring a multi-area OSPF topology (including stub
and NSSA areas);
* Successful interop tests against IOS-XR routers acting as helpers.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Description:
1. changes to process GRACE LSA packet.
2. Validation changes to enter Helper role.
3. Helper functionality during graceful restart.
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
During the database description exchange process, the slave
releases the last dbdesc packet after router_dead_interval.
This was not implemented in the code.
I have written the function ospf6_neighbor_last_dbdesc_release,
which releases the last dbdesc packet after router_dead_interval.
This change was required as per the conformance test 13.11:
In state Full reception of a Database Description packet from
the master after this interval (RouterDeadInterval) will
generate a SeqNumberMismatch neighbor event.
Associated Parameters
ICMPv6 Packet Listen Time
ICMPv6 Packet Tolerance Factor
ICMPv6 Packet Tolerance Time
OSPFV3 DUT Interface Transmit Delay
OSPF Reset Adjacencies Timeout
Test Actions
1.
2. 3.
ANVL: Establish full adjacency with DUT for neighbor Rtr-0-A on DIface-0, with DUT as slave.
ANVL: Wait (for <RouterDeadInterval> seconds).
ANVL: Send <OSPF-DD> packet from neighbor Rtr-0-A to DIface-0 con- taining:
• •
I-bit field not set M-bit field not set
MS-bit field set
DD sequence number same as the one last sent by ANVL.
. ANVL: Listen (for upto 2 * <RxmtInterval> seconds) on DIface-0.
5. DUT: Trigger the event SeqNumberMismatch and set the neighbor state for neighbor Rtr-0-A to ExStart.
6. DUT: Send <OSPF-DD> packet.
7. ANVL: Verify that the received <OSPF-DD> packet contains:
• I-bit field set
• M-bit field set
• MS-bit field set.
Test Reference
• RFC 5340, s4.2.1.2 p19 Sending Database Description Packets
RFC 2328, s10.8 p104 Sending Database Description Packets.
Signed-off-by: Yash Ranjan <ranjany@vmware.com>
Same as other commits -- convert most DEFINE_MTYPE into the _STATIC
variant, and move the remaining non-static ones to appropriate places.
Signed-off-by: David Lamparter <equinox@diac24.net>
Compare the neighbour id string from the arguments to the router_id of
the neighbor. If equal then call the show function.
Signed-off-by: Yash Ranjan <ranjany@vmware.com>
1. All the changes are related to handle ospf6 with different vrf.
2. The dependancy of global ospf6 is removed.
Co-authored-by: Kaushik <kaushik@niralnetworks.com>
Signed-off-by: harios_niral <hari@niralnetworks.com>
The code pattern:
for (ALL_LSDB(lsdb, lsa)) {
remove_lsa(lsa)
}
has a use after free in ALL_LSDB, since we ask for the next pointer,
after it has been freed.
Modify the code such that we grab the next pointer before we can
possibly free it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This reverts commit 0f9f74baeb.
This commit was causing crashes and the goal of this commit
was to make coverity sanity happy. I'd rather have coverity
sad and not have ospfv3 crash
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
the for (ALL_LSDB...) macro was iterating over lsa,
when lsa had just been freed in these functions.
Remove the macro and make the adjustments saving lsa_next
before the free.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If the user configured an interface to be in a particular mode, we need
to be consistent about that. No looking at if_is_pointopoint() or
if_is_broadcast().
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
We had a variety of issues with sorted list compare functions.
This commit identifies and fixes these issues.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This serves no other purpose than to generate stupid warnings for
overwritten initializers on old gcc versions.
Signed-off-by: David Lamparter <equinox@diac24.net>
There is no need to check for failure of a ALLOC call
as that any failure to do so will result in a assert
happening. So we can safely remove all of this code.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When neighbor state transition from LOADING to
FULL state, active full neighbors count incremented.
The full neighbors count is used for router-id change
if any full neighbor exist, displays message to restart
ospf6/frr to activate new router-id.
In the case of P-t-P neighbor type neighbor transition
from EXCHANGE to FULL which missed full neighbors count.
Ticket:CM-20574
Testing Done:
Initially, Bring up zebra assigned router-id in ospf6
with point-to-point link based neighbor.
Configure static router-id where restart of ospf6 message
is displayed.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The following types are nonstandard:
- u_char
- u_short
- u_int
- u_long
- u_int8_t
- u_int16_t
- u_int32_t
Replace them with the C99 standard types:
- uint8_t
- unsigned short
- unsigned int
- unsigned long
- uint8_t
- uint16_t
- uint32_t
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Initially INP LSA is originated, when connected
interface comes up. As neighbor is not up, LSA is
not transmitted but stored in DB.
As NSM transition to FULL, INP is scheduled but
ospf6_flood() would not originate the LSA as
current DB and new INP LSA same so it discards
the new LSA.
When Neighor becomes FULL, originate INP via
flushing current DB copy and generate new.
This is introduced as PR 1738 introduce,
premature aging of LSAs in nbr table as R1
going down. upon neigbor coming up, INP was
not updated to new age.
Ticket:CM-19926,CM-19945
Testing Done:
Topology R3 --- R1 -- R2, R1 have INP LSA.
After frr restart R2 and R3 re learnt R1's
INP LSA as new neighbor(s) come up.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Notify user to store config and restart ospf6d
as part of router-id change cli if any of
the area active.
Store zebra router-id under ospf6, when static
router-id removed restore zebra router-id, ask
to restart ospf6d.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
RFC 2328 (14.1) Premature aging of LSAs from
routing domain :
When ospf6d is going away (router going down),
send MAXAGEd self originated LSAs to all
neighbors in routing domain to trigger
Premature aging to remove from resepective LSDBs.
Neighbor Router Reboot:
Upon receiving Self-originate MAXAGEd LSA, simply
discard, Current copy could be non maxaged latest.
For neighbor advertised LSA's (current copy in LSDB)
is set to MAXAGE but received new LSA with Non-MAXAGE
(with current age), discard the current MAXAGE LSA,
Send latest copy of LSA to neighbors and update the
LSDB with new LSA.
When a neighbor transition to FULL, trigger AS-External
LSAs update from external LSDB to new neighbor.
Testing:
R1 ---- DUT --- R5
| \
R2 R3
|
R4
Area 1: R5 and DUT
Area 0: DUT, R1, R2, R3
Area 2: R2 R4
Add IPv6 static routes at R5
Redistribute kernel routes at R5,
Validate routes at R4, redistributed via backbone
to area 2.
Stop n start frr.service at R5 and validated
MAXAGE LSAs then recent age LSAs in Database at DUT-R4.
Validated external routes installed DUT to R4.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>