Problem: Sometimes the configured Local GR state is not reflected in
show command and peer node. This is causing failures in few of the
BGP-GR topotests.
RCA: This problem is seen when the configuration of local GR state
happens when the BGP session is in OpenSent state and moves to
Established after the configuration is complete.
When the session gets established, we move the GR state value from stub peer
to the config peer. This will result in overriding the GR state to
previous value.
Fix: The local GR state is modified only through CLI configuration and
does not change during BGP FSM transition. In this case it is not necessary
to transfer the GR state value from stub peer to config peer. This way we
can ensure that always the most recent config value is present in peer
datastructure.
Signed-off-by: Prerana-GB <prerana@vmware.com>
We are sending up to ZAPI_MESSAGE_OPAQUE_LENGTH but checking
for one less. We know the data will fit in it to that size.
Also we have asserts on the write to ensure we don't go over
it
Fixes: #8995
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
There's nothing that can be done here with an error. Try to make
Coverity understand that this is intentional.
(I don't know if the `(void)` will actually fix the coverity warning,
but I don't really have a better way to figure it out beyond just
getting this merged and waiting for a result...)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
This is pretty much common sense ("runtime knobs are easier to adjust
than a compile-time setting"), but maybe it should be said just for
reference.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Issue: Crash observed when LSAs are removed from LSDB after max age
when there is no area configured.
(gdb) bt
0 raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
1 0x00007fdb190548bc in core_handler (signo=6, siginfo=0x7ffdd2f5a470, context=<optimized out>) at lib/sigevent.c:262
2 <signal handler called>
3 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
4 0x00007fdb185ad921 in __GI_abort () at abort.c:79
5 0x00007fdb1907f199 in _zlog_assert_failed (xref=xref@entry=0x55f30902aa20 <_xref.21999>, extra=extra@entry=0x0) at lib/zlog.c:581
6 0x000055f308dc4f78 in ospf6_asbr_lsa_remove (lsa=0x55f30a7546d0, asbr_entry=0x0) at ospf6d/ospf6_asbr.c:696
7 0x000055f308dd8f0d in ospf6_lsdb_remove (lsa=0x55f30a7546d0, lsdb=lsdb@entry=0x55f30a73d300) at ospf6d/ospf6_lsdb.c:166
8 0x000055f308dd9701 in ospf6_lsdb_maxage_remover (lsdb=0x55f30a73d300) at ospf6d/ospf6_lsdb.c:376
9 0x000055f308dee724 in ospf6_maxage_remover (thread=<optimized out>) at ospf6d/ospf6_top.c:603
10 0x00007fdb1906520d in thread_call (thread=thread@entry=0x7ffdd2f5ae90) at lib/thread.c:1919
11 0x00007fdb19023e48 in frr_run (master=0x55f30a569b70) at lib/libfrr.c:1155
12 0x000055f308dc09b6 in main (argc=6, argv=0x7ffdd2f5b198, envp=<optimized out>) at ospf6d/ospf6_main.c:235
(gdb)
Steps to reproduce the issue:
1. router ospf6
2. redistribute static
3. ipv6 route 1::1/128 Null0
4. no redistribute static
5. wait for Max aged LSA to flush
6. Check DB, crash occurs.
RCA:
Crash occurred while accessing listgetdata(listhead(ospf6->area_list))
When there is no area attached to any of the interface listhead(ospf6->area_list)
is NULL. Therefore it crashed due to NULL access.
Fix:
Check before accessing null pointer.
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
There is no peer_af allocated in `peer_activate`. Trying to delete
the structure just results in an no-op and a error return value.
The error message "couldn't delete af structure for peer" is
unexpected.
Signed-off-by: zyxwvu Shi <shiyuchen.syc@bytedance.com>
Add a new topotest that features a topology with seven routers spread
across four OSPF areas:
* 1 backbone area;
* 1 regular non-backbone area (0.0.0.1);
* 1 stub area (0.0.0.2);
* 1 NSSA area (0.0.0.3).
All routers have both GR and GR helper functionality enabled in
the configuration. The test consists of restarting each router,
one at time, and checking that all forwarding planes (and LSDBs)
are kept intact during those restarts.
A successful run takes about three minutes to finish.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Using "write memory" to save the daemons' configurations before
restarting them can cause log files to stop working correctly. Add
a new "save_config" to the kill_router_daemons() function to prevent
that from happening when saving the configurations isn't necessary.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
RFC 3623 specifies the Graceful Restart enhancement to the OSPF
routing protocol. This PR implements support for the restarting mode,
whereas the helper mode was implemented by #6811.
This work is based on #6782, which implemented the pre-restart part
and settled the foundations for the post-restart part (behavioral
changes, GR exit conditions, and on-exit actions).
Here's a quick summary of how the GR restarting mode works:
* GR can be enabled on a per-instance basis using the `graceful-restart
[grace-period (1-1800)]` command;
* To perform a graceful shutdown, the `graceful-restart prepare ospf`
EXEC-level command needs to be issued before restarting the ospfd
daemon (there's no specific requirement on how the daemon should
be restarted);
* `graceful-restart prepare ospf` will initiate the graceful restart
for all GR-enabled instances by taking the following actions:
o Flooding Grace-LSAs over all interfaces
o Freezing the OSPF routes in the RIB
o Saving the end of the grace period in non-volatile memory (a JSON
file stored in `$frr_statedir`)
* Once ospfd is started again, it will follow the procedures
described in RFC 3623 until it detects it's time to exit the graceful
restart (either successfully or unsuccessfully).
Testing done:
* New topotest featuring a multi-area OSPF topology (including stub
and NSSA areas);
* Successful interop tests against IOS-XR routers acting as helpers.
Co-authored-by: GalaxyGorilla <sascha@netdef.org>
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Both the GR helper code and the upcoming GR restarting code are going
to share a lot of definitions. As such, rename ospf_gr_helper.h to
ospf_gr.h, which will be the central point of all GR definitions
and prototypes.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Remove previous log config
debug ospf graceful-restart helper
and just use
debug ospf graceful-restart
for everything related to OSPF GR.
Signed-off-by: GalaxyGorilla <sascha@netdef.org>
Log the LSA advertising router in addition to the LSA type and
ID in the places where that information is necessary to uniquely
identify the LSA in the LSDB.
This is useful, for example, to know exactly which LSA has changed
when the router is exiting from the GR helper mode when a topology
change was detected.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The debuild command fails when we are doing source package only build
because it expects the arch-dependent .changes file to be present. Thus
in the instructions we switch to using dpkg-buildpackage directly and
add a note about using debuild in more complicated scenarios.
Signed-off-by: Ondřej Surý <ondrej@sury.org>
In the CI, it's better to build the source package only once and then
instead of checking out the whole repository, only distribute the source
packages to the individual jobs.
Signed-off-by: Ondřej Surý <ondrej@sury.org>
The Debian autopkgtest would fail with new PAM introduced in Debian bullseye.
Add a little loop to wait a little longer for the changes to propagate.
Signed-off-by: Ondřej Surý <ondrej@sury.org>