RFC 3623 specifies the Graceful Restart enhancement to the OSPF
routing protocol. This PR implements support for the restarting mode,
whereas the helper mode was implemented by #6811.
This work is based on #6782, which implemented the pre-restart part
and settled the foundations for the post-restart part (behavioral
changes, GR exit conditions, and on-exit actions).
Here's a quick summary of how the GR restarting mode works:
* GR can be enabled on a per-instance basis using the `graceful-restart
[grace-period (1-1800)]` command;
* To perform a graceful shutdown, the `graceful-restart prepare ospf`
EXEC-level command needs to be issued before restarting the ospfd
daemon (there's no specific requirement on how the daemon should
be restarted);
* `graceful-restart prepare ospf` will initiate the graceful restart
for all GR-enabled instances by taking the following actions:
o Flooding Grace-LSAs over all interfaces
o Freezing the OSPF routes in the RIB
o Saving the end of the grace period in non-volatile memory (a JSON
file stored in `$frr_statedir`)
* Once ospfd is started again, it will follow the procedures
described in RFC 3623 until it detects it's time to exit the graceful
restart (either successfully or unsuccessfully).
Testing done:
* New topotest featuring a multi-area OSPF topology (including stub
and NSSA areas);
* Successful interop tests against IOS-XR routers acting as helpers.
Co-authored-by: GalaxyGorilla <sascha@netdef.org>
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Both the GR helper code and the upcoming GR restarting code are going
to share a lot of definitions. As such, rename ospf_gr_helper.h to
ospf_gr.h, which will be the central point of all GR definitions
and prototypes.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Remove previous log config
debug ospf graceful-restart helper
and just use
debug ospf graceful-restart
for everything related to OSPF GR.
Signed-off-by: GalaxyGorilla <sascha@netdef.org>
Log the LSA advertising router in addition to the LSA type and
ID in the places where that information is necessary to uniquely
identify the LSA in the LSDB.
This is useful, for example, to know exactly which LSA has changed
when the router is exiting from the GR helper mode when a topology
change was detected.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
When ip nhrp map multicast is being used, this is usually accompanied by an
iptables rule to block the original multicast packet. This causes sendmsg to
return EPERM.
Signed-off-by: Reuben Dowle <reuben.dowle@4rf.com>
Currently if the sysctl net.ipv4.raw_l3mdev_accept is 1, packets
destined to a specific vrf also end up being delivered to the default
vrf. We will see logs like this in ospf:
2021/02/10 21:17:05.245727 OSPF: ospf_recv_packet: fd 20(default) on interface 1265(swp1s1.26)
2021/02/10 21:17:05.245740 OSPF: Hello received from [9.9.36.12] via [swp1s1.26:200.254.26.13]
2021/02/10 21:17:05.245741 OSPF: src [200.254.26.14],
2021/02/10 21:17:05.245743 OSPF: dst [224.0.0.5]
2021/02/10 21:17:05.245769 OSPF: ospf_recv_packet: fd 45(vrf1036) on interface 1265(swp1s1.26)
2021/02/10 21:17:05.245774 OSPF: Hello received from [9.9.36.12] via [swp1s1.26:200.254.26.13]
2021/02/10 21:17:05.245775 OSPF: src [200.254.26.14],
2021/02/10 21:17:05.245777 OSPF: dst [224.0.0.5]
This really really makes ospf unhappy in the vrf we are running in.
I am approaching the problem by just dropping the packet if read in the
default vrf because of:
commit 0556fc33c7
Author: Donald Sharp <sharpd@cumulusnetworks.com>
Date: Fri Feb 1 11:54:59 2019 -0500
lib: Allow bgp to always create a listen socket for the vrf
Effectively if we have `router ospf vrf BLUE` but no ospf running
in the default vrf, we will not have a listener and that would
require a fundamental change in our approach to handle the ospf->fd
at a global level. I think this is less than ideal at the moment
but it will get us moving again and allow FRR to work with
a bunch of vrf's and ospf neighbors.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Change thread_cancel to take a ** to an event, NULL-check
before dereferencing, and NULL the caller's pointer. Update
many callers to use the new signature.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Description:
1. Skipping inactivity timer during graceful restart to make
the RESTARTER active even after dead timer expiry.
2. Handling HELPER on unplanned outages.
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
OSPFD sends ARP proactively to speed up convergence for /32 networks
on a p2p connection. It is only an optimization, so it can be disabled.
It is enabled by default.
Signed-off-by: Jakub Urbańczyk <xthaid@gmail.com>
Remove mid-string line breaks, cf. workflow doc:
.. [#tool_style_conflicts] For example, lines over 80 characters are allowed
for text strings to make it possible to search the code for them: please
see `Linux kernel style (breaking long lines and strings)
<https://www.kernel.org/doc/html/v4.10/process/coding-style.html#breaking-long-lines-and-strings>`_
and `Issue #1794 <https://github.com/FRRouting/frr/issues/1794>`_.
Scripted commit, idempotent to running:
```
python3 tools/stringmangle.py --unwrap `git ls-files | egrep '\.[ch]$'`
```
Signed-off-by: David Lamparter <equinox@diac24.net>
Fix
- Modulo check on data length not inclusive enough
- Garbage heap read when bounds checking
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
ospf_opaque_self_originated_lsa_received decrements refcount which can
result in a free, this is followed by a call to ospf_ls_ack_send which
accesses the freed LSA
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Line break at the end of the message is implicit for zlog_* and flog_*,
don't put it in the string. Mid-message line breaks are currently
unsupported. (LF is "end of message" in syslog.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
We actually don't validate the IHL field, although it certainly looks
like we do at a casual glance.
This patch saves us from an assert in case we actually do get an IP
packet with an incorrect header length field.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
We test nbr->oi in a couple of places for null, but
in the majority of places of the nbr->oi data is being
used we just access it. Touch up code to trust this
assertion and make the code more consistent in others.
Found in Coverity.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The indentation level for ospf_read was starting to be pretty
extremene. Rework into 2 functions for improved readability.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Read in up to 20(ospf write-multipler X) packets, for handling of data.
This improves performance because we allow ospf to have a bit more data
to work on in one go for spf calculations instead of 1 packet at a time.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Turning on packet debugs and seeing a header dump that is 11
lines long is useless
2019/11/07 01:07:05.941798 OSPF: ip_v 4
2019/11/07 01:07:05.941806 OSPF: ip_hl 5
2019/11/07 01:07:05.941813 OSPF: ip_tos 192
2019/11/07 01:07:05.941821 OSPF: ip_len 68
2019/11/07 01:07:05.941831 OSPF: ip_id 48576
2019/11/07 01:07:05.941838 OSPF: ip_off 0
2019/11/07 01:07:05.941845 OSPF: ip_ttl 1
2019/11/07 01:07:05.941857 OSPF: ip_p 89
2019/11/07 01:07:05.941865 OSPF: ip_sum 0xcf33
2019/11/07 01:07:05.941873 OSPF: ip_src 200.254.30.14
2019/11/07 01:07:05.941882 OSPF: ip_dst 224.0.0.5
We already have this debugged, it's not going to change and the
end developer can stick this back in if needed by hand to debug
something that is not working properly.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit has:
The received packet path in ospf, had absolutely no debugs associated with
it. This makes it extremely hard to know when we receive packets for
consumption. Add some breadcrumbs to this end.
Large chunks of commands have no ability to debug what is happening
in what vrf. With ip overlap X vrf this becomes a bit of a problem
Add some breadcrumbs here.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We have a bunch of places that look for ORIGINAL_CODING. There is
nothing in our configure system to define this value and a quick
git blame shows this code as being original to the import a very
very long time ago. This is dead code, removing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Recently Lot of issues are seen in OSPF adjacnecy establishements,
sessions was tear down because of DD Sequence Number mismatch.
adding Debugs to capture Master & slave generated sequence numbers.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
When OSPF receives a Database description packet and is in
`Down`, `Attempt` or `2-Way` state we are creating a warning
for the end user.
rfc2328 states(10.6):
Down - The packet should be rejected
Attempt - The packet should be rejected
2-Way - The packet should be ignored
I cannot find any instructions in the rfc to state what the operational
difference is between rejected and ignored. Neither can I figure
out what FRR expects the end user to do with this information.
I can see this information being useful if we encounter a bug
down the line and we have gathered a bunch of data. As such
let's modify the code to remove the flog_warn and convert
the message to a debug level message that can be controlled by
appropriate debug statements.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This looks like a finish up of the partial cleanup that
ocurred at some point in time in the past. When we
alloc oi also always alloc the oi->obuf. When we delete
oi always delete the oi->obuf right before.
This cleans up a bunch of code to be simpler and hopefully
easier to follow.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
I am rarely seeing this crash:
r2: ospfd crashed. Core file found - Backtrace follows:
[New LWP 32748]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/ospfd'.
Program terminated with signal SIGABRT, Aborted.
2019-08-29 15:59:36,149 ERROR: assert failed at "test_ospf_sr_topo1/test_memory_leak":
Which translates to this code:
node = listhead(ospf->oi_write_q);
assert(node);
oi = listgetdata(node);
assert(oi);
So if we get into ospf_write without anything on the oi_write_q
we are stopping the program.
This is happening because in ospf_ls_upd_queue_send we are calling
ospf_write. Imagine that we have a interface already on the on_write_q
and then ospf_write handles the packet send for all functions. We
are not clearing the t_write thread and we are popping and causing
a crash.
Additionally modify OSPF_ISM_WRITE_ON(O) to not just blindly
turn on the t_write thread. Only do so if we have data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
ospfd: Remove redundant asserts
assert(oi) is impossible all listgetdata(node) directly proceeding
it already asserts here, besides a node cannot be created
with a null pointer!
If list_isempty is called directly before the listhead call
it is impossilbe that we do not have a valid pointer here.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
no router ospf triggers to cancel all threads
including read/write (receive/send packets) threads,
cleans up resources fd, message queue and data.
Last job of write (packet) thread invoked where the
ospf instance is referenced is not running nor
the socket fd valid.
Write thread callback should check if fd is valid and
ospf instance is running before proceeding to send a
message over socket.
Ticket:CM-20095
Testing Done:
Performed the multiple 'no router ospf' with the fix
in topology where the crash was seen.
Post fix the crash is not observed.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
While fragmenting ospf ls packets, before appending the link state info,
wrong value is checked to see if current packet can fit in another ls info.
Because of this, when a lower mtu is configured, it couldn't fit in even 1
ls ack, which tries to send all the available ls ack in the list in loop.
This keeps allocating memory to send the packet and ends up putting the
packet buffer without ls-ack into deferred send que(ospf_ls_ack_send_delayed).
This infinite loop causes infinite memory being allocated in a loop causing
system to be unstable. This commit takes care of calculating the right value
to compare for checking oif this buffer can fit in more.
Signed-off-by: Saravanan K <saravanank@vmware.com>
Based on the vulnerability mentioned in 793496 an attacker can craft an
LSA with MaxSequence number wtih invalid links and not set age to MAX_AGE
so the lsa would not be flush from the database.
To address the issue, check incoming LSA is MaxSeq but Age is not set
to MAX_AGE 3600, discard the LSA from processing it.
Based on RFC-2328 , When a LSA update sequence reaches MaxSequence
number, it should be prematurely aged out from the database with age set
to MAX_AGE (3600).
Ticket:CM-18989
Reviewed By:
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Ospfd cored because of an assert when we try to write more than the MTU
size to the ospf packet buffer stream. The problem is - we allocate only MTU
sized buffer. The expectation is that Hello packets are never large
enough to approach MTU. Instead of crashing, this fix discards hello and
logs an error. One should not have so many neighbors behind an
interface.
Ticket: CM-22380
Signed-off-by: Nitin Soni <nsoni@cumulusnetworks.com>
Reviewed-by: CCR-8204
This reverts commit 48944eb65e.
We're using GNU C, not ISO C - and this commit triggers new (real)
warnings about {0} instead of bogus ones about {}.
Signed-off-by: David Lamparter <equinox@diac24.net>