Opaque data takes up a lot of memory when there are a lot of routes on
the box. Given that this is just a cosmetic info, I propose to disable
it by default to not shock people who start using FRR for the first time
or upgrades from an old version.
Fixes#10101.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
VRF name should not be printed in the config since 574445ec. The update
was done for NB config output but I missed it for regular vty output.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Description:
This LSID alogithm added as per rcf2328 Appendex-E recommendation.
This applies only for AS-external lsas and summary lsas.
As an example of the algorithm, consider its operation when the
following sequence of events occurs in a single router (Router A).
(1) Router A wants to originate an AS-external-LSA for
[10.0.0.0,255.255.255.0]:
(a) A Link State ID of 10.0.0.0 is used.
(2) Router A then wants to originate an AS-external-LSA for
[10.0.0.0,255.255.0.0]:
(a) The LSA for [10.0.0,0,255.255.255.0] is reoriginated using a
new Link State ID of 10.0.0.255.
(b) A Link State ID of 10.0.0.0 is used for
[10.0.0.0,255.255.0.0].
(3) Router A then wants to originate an AS-external-LSA for
[10.0.0.0,255.0.0.0]:
(a) The LSA for [10.0.0.0,255.255.0.0] is reoriginated using a
new Link State ID of 10.0.255.255.
(b) A Link State ID of 10.0.0.0 is used for
[10.0.0.0,255.0.0.0].
(c) The network [10.0.0.0,255.255.255.0] keeps its Link State ID
of 10.0.0.255.
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
The ospf instance code is not properly handling the default route
when using default-information originate. This is because
the code is looking for the default route to be saved with an
instance of <ospf instance id> but we always save it as a instance
id of 0. In fact OSPF asks zebra for the default route as a special
case in instance 0, always.
Here is the correct behavior:
eva# show ip ospf data
OSPF Instance: 3
OSPF Router with ID (192.168.122.1)
AS External Link States
Link ID ADV Router Age Seq# CkSum Route
0.0.0.0 192.168.122.1 8 0x80000001 0xdb08 E2 0.0.0.0/0 [0x0]
eva# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
K>* 0.0.0.0/0 [0/100] via 192.168.119.1, enp39s0, 00:02:03
Fixes: #10251
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Update ospfd and ospf6d to send opaque route attributes to
zebra. Those attributes are stored in the RIB and can be viewed
using the "show ip[v6] route" commands (other than that, they are
completely ignored by zebra).
Example:
```
debian# show ip route 192.168.1.0/24
Routing entry for 192.168.1.0/24
Known via "ospf", distance 110, metric 20, best
Last update 01:57:08 ago
* 10.0.1.2, via eth-rt2, weight 1
OSPF path type : External-2
OSPF tag : 0
debian#
debian# show ip route 192.168.1.0/24 json
{
"192.168.1.0\/24":[
{
"prefix":"192.168.1.0\/24",
"prefixLen":24,
"protocol":"ospf",
"vrfId":0,
"vrfName":"default",
"selected":true,
[snip]
"ospfPathType":"External-2",
"ospfTag":"0"
}
]
}
```
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Comparison of the two pointer is confusing, they have no relevance
except the time both of them are empty.
Additionly modify one variable name and correct some comment words, they
are same in both ospfd and ospf6d.
Signed-off-by: anlan_cs <anlan_cs@tom.com>
Currently, it is possible to rename the default VRF either by passing
`-o` option to zebra or by creating a file in `/var/run/netns` and
binding it to `/proc/self/ns/net`.
In both cases, only zebra knows about the rename and other daemons learn
about it only after they connect to zebra. This is a problem, because
daemons may read their config before they connect to zebra. To handle
this rename after the config is read, we have some special code in every
single daemon, which is not very bad but not desirable in my opinion.
But things are getting worse when we need to handle this in northbound
layer as we have to manually rewrite the config nodes. This approach is
already hacky, but still works as every daemon handles its own NB
structures. But it is completely incompatible with the central
management daemon architecture we are aiming for, as mgmtd doesn't even
have a connection with zebra to learn from it. And it shouldn't have it,
because operational state changes should never affect configuration.
To solve the problem and simplify the code, I propose to expand the `-o`
option to all daemons. By using the startup option, we let daemons know
about the rename before they read their configs so we don't need any
special code to deal with it. There's an easy way to pass the option to
all daemons by using `frr_global_options` variable.
Unfortunately, the second way of renaming by creating a file in
`/var/run/netns` is incompatible with the new mgmtd architecture.
Theoretically, we could force daemons to read their configs only after
they connect to zebra, but it means adding even more code to handle a
very specific use-case. And anyway this won't work for mgmtd as it
doesn't have a connection with zebra. So I had to remove this option.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
When the summary-address is deleted, `ospf_aggr_handle_external_info` is
called for each aggregated route for the cleanup. It needs to find the
corresponding OSPF instance and it does it using the `ei->instance`
which is totally wrong, because it's the instance from which the route
is redistributed, not the local OSPF instance. A pointer to the correct
OSPF instance is already stored in the external_info structure.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
An OSPF ABR, while in the process of announcing summary LSAs,
checks whether it's connected to the backbone area. If not, then
all summary LSAs are invalidated and not announced (or flushed)
while the missing backbone connectivity persists.
The backbone connectivity check consists of assessing whether
there's at least one fully formed adjacency in the backbone area. The
problem is that this check can fail unexpectedly if the router is
acting as a helper for a neighbor that is performing a graceful
restart. This is because there's a short interim of time in which
that neighbor's state will oscillate between ExStart and Full during
the LSDB synchronization process.
To address that issue, update ospf_act_bb_connection() to consider
neighbors performing a graceful restart as if they were fully
adjacent (which is what a GR helper should do).
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
This commit fixes a rather obscure bug that was causing the GR
topotest to fail on a frequent basis.
RFC 3623 specifies that a router acting as a helper to a restarting
neighbor should monitor topology changes and abort the GR procedures
when one is detected, falling back to normal OSPF operation.
ospfd uses the ospf_lsa_different() function to detect when the
content of an LSA has changed, which is considered as a topology
change. The problem is that ospf_lsa_different() can return true
even when the two LSAs passed as parameters are identical, provided
one LSA has the OSPF_LSA_RECEIVED flag set and the other not.
In the context of the ospf_gr_topo1 test, router rt6 performs
a graceful restart and a few seconds later acts as a helper for
router rt7. When it's acting as a helper for rt7, it still didn't
translate its NSSA Type-7 LSAs, something that happens only after 7
seconds (OSPF_ABR_TASK_DELAY) of the first SPF run. The translated
Type-5 LSAs on its LSDB were learned from the helping neighbors
(rt3 and rt7). It's then possible that the NSSA Type-7 LSAs might
be translated while rt6 is acting as helper for rt7, which causes
the daemon to detect a non-existent topology change only because
the OSPF_LSA_RECEIVED flag is unset in the recently originated
Type-5 LSA.
Fix this problem by ignoring the OSPF_LSA_RECEIVED flag when
comparing LSAs for the purpose of topology change detection.
In short, the bug would only show up when the restarting router
would start acting as a helper immediately after coming back up
(which would be hard to happen in the real world). The topotest
failures became more frequent after commit 6255aad0bc because of
the removal of the 'sleep' calls, which used to give ospfd more time
to converge before start acting as a helper for other routers. The
problem still occurred from time to time though.
Fixes#9983.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Since f60a1188 we store a pointer to the VRF in the interface structure.
There's no need anymore to store a separate vrf_id field.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Description:
When changing the area from normal to NSSA, previous area's
ASBR router's type-5 also calculated and added to routing table along
with Type-7 lsas.
Made a change in route calculation such that it will not consider Type-5
lsas in calculation if it is originated from NSSA ASBR router.
These lsas will be age out at MAX age.
log:
frr(config-router)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
K>* 0.0.0.0/0 [0/0] via 10.112.157.253, ens160, 00:32:47
C>* 10.112.156.0/23 is directly connected, ens160, 00:32:47
S>* 22.22.22.2/32 [1/0] is directly connected, ens192, weight 1, 00:20:03
O>* 33.33.33.0/24 [110/20] via 100.1.1.220, ens192, weight 1, 00:08:55
via 100.1.1.220, ens192, weight 1, 00:08:55
O 100.1.1.0/24 [110/10] is directly connected, ens192, weight 1, 00:21:32
C>* 100.1.1.0/24 is directly connected, ens192, 00:23:11
frr(config-router)# do show ip ospf route
============ OSPF network routing table ============
N 100.1.1.0/24 [10] area: 0.0.0.1
directly attached to ens192
============ OSPF router routing table =============
R 2.2.2.2 [10] area: 0.0.0.1, ASBR
via 100.1.1.220, ens192
============ OSPF external routing table ===========
N E2 33.33.33.0/24 [10/20] tag: 0
via 100.1.1.220, ens192
via 100.1.1.220, ens192
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
We should always treat the VRF interface as a loopback. Currently, this
is not the case, because in some old pre-VRF code we use if_is_loopback
instead of if_is_loopback_or_vrf. To avoid any future problems, the
proposal is to rename if_is_loopback_or_vrf to if_is_loopback and use it
everywhere. if_is_loopback is renamed to if_is_loopback_exact in case
it's ever needed, but currently it's not used anywhere.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Function ospf_te_parse_te() and ospf_te_delete_te() browse TE TLV but also
subTLV. The loop that parse the subTLV check that cummulative read data doesn't
exceed the total size of the TLV. However, the sum variable that counts the
number of read data was wrongly intialize to 0 instead to 4 (i.e. the initial
TLV Header size that is located at the TOP of subTLV).
This patch adjust accordingly the initial value of the counter.
Signed-off-by: Olivier Dugeon <olivier.dugeon@orange.com>
The no-form should use the same arguments as the regular command, hence
replace "period" with "grace-period".
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Running ospf_topo_vrf1 leads us to this valgrind issue:
==2386518== Invalid read of size 8
==2386518== at 0x4971520: route_top (table.c:401)
==2386518== by 0x181F08: ospf_interface_bfd_apply (ospf_bfd.c:126)
==2386518== by 0x182069: ospf_interface_disable_bfd (ospf_bfd.c:158)
==2386518== by 0x18BF51: ospf_del_if_params (ospf_interface.c:557)
==2386518== by 0x18C584: ospf_if_delete_hook (ospf_interface.c:712)
==2386518== by 0x490CA0B: hook_call_if_del (if.c:61)
==2386518== by 0x490D1F3: if_delete_retain (if.c:286)
==2386518== by 0x490D337: if_delete (if.c:309)
==2386518== by 0x490CDED: if_destroy_via_zapi (if.c:200)
==2386518== by 0x49940A9: zclient_interface_delete (zclient.c:2237)
==2386518== by 0x4998062: zclient_read (zclient.c:3969)
==2386518== by 0x4979529: thread_call (thread.c:1908)
==2386518== by 0x4919918: frr_run (libfrr.c:1164)
==2386518== by 0x181AC7: main (ospf_main.c:235)
==2386518== Address 0x5df39a0 is 0 bytes inside a block of size 56 free'd
==2386518== at 0x48399AB: free (vg_replace_malloc.c:538)
==2386518== by 0x492A03E: qfree (memory.c:141)
==2386518== by 0x4970C6F: route_table_free (table.c:141)
==2386518== by 0x4970A36: route_table_finish (table.c:61)
==2386518== by 0x18C543: ospf_if_delete_hook (ospf_interface.c:708)
==2386518== by 0x490CA0B: hook_call_if_del (if.c:61)
==2386518== by 0x490D1F3: if_delete_retain (if.c:286)
==2386518== by 0x490D337: if_delete (if.c:309)
==2386518== by 0x490CDED: if_destroy_via_zapi (if.c:200)
==2386518== by 0x49940A9: zclient_interface_delete (zclient.c:2237)
==2386518== by 0x4998062: zclient_read (zclient.c:3969)
==2386518== by 0x4979529: thread_call (thread.c:1908)
==2386518== by 0x4919918: frr_run (libfrr.c:1164)
==2386518== by 0x181AC7: main (ospf_main.c:235)
==2386518== Block was alloc'd at
==2386518== at 0x483AB65: calloc (vg_replace_malloc.c:760)
==2386518== by 0x4929EFC: qcalloc (memory.c:116)
==2386518== by 0x49709F8: route_table_init_with_delegate (table.c:53)
==2386518== by 0x49717F4: route_table_init (table.c:528)
==2386518== by 0x18C328: ospf_if_new_hook (ospf_interface.c:659)
==2386518== by 0x490C97D: hook_call_if_add (if.c:60)
==2386518== by 0x490CE85: if_create_name (if.c:223)
==2386518== by 0x490DF32: if_get_by_name (if.c:622)
==2386518== by 0x4993F73: zclient_interface_add (zclient.c:2186)
==2386518== by 0x4998062: zclient_read (zclient.c:3969)
==2386518== by 0x4979529: thread_call (thread.c:1908)
==2386518== by 0x4919918: frr_run (libfrr.c:1164)
==2386518== by 0x181AC7: main (ospf_main.c:235)
==2386518==
Fix the ordering to do the individual node tree cleanup after we delete
the data we care about.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Currently, we automatically delete an inactive VRF when its last
interface is deleted. This code introduces a couple of crashes because
of the following problems:
- vrf_delete is called before calling if_del hook, so daemons may try to
dereference an ifp->vrf pointer which is freed
- in if_terminate, we continue to use the VRF in the loop condition
after the last interface is deleted
This check is needed only when the interface is deleted by the user,
because if the interface is deleted by the system, VRF must still exist
in the system. Move the check to appropriate places to fix crashes.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Issue #9983 explains what is wrong with the GR helper mode.
To unblock the CI that fails almost all the time on the ospf_gr_topo1
test, remove the commands and disable the test. Also add a reminder to
completely remove the helper mode if no one fixes the code in a month.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Description:
timerval datastructure is being used without initialization.
Using these uninitialized parameters can lead unexpected results
so initializing before using it.
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
Description:
In PointToPoint networks, There wont be DR and BDR.
But by default, All neighbours ism state is shown as
DR_OTHER.
Changed the nbr state format to <nbrsate>/- (ex : FULL/-)
to P2pnetworks.
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
Description:
1. Adding uptime to the 'show ip ospf neighbor' o/p.
2. Adding uptime and deadtime in string format for json consumption.
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
Do not return pointer to the newly created thread from various thread_add
functions. This should prevent developers from storing a thread pointer
into some variable without letting the lib know that the pointer is
stored. When the lib doesn't know that the pointer is stored, it doesn't
prevent rescheduling and it can lead to hard to find bugs. If someone
wants to store the pointer, they should pass a double pointer as the last
argument.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
When doing a normal exit from ospf we should close
the log file as that we are leaving a bunch of
unterminated logging processes by not doing so.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This removes a giant `switch { }` block from lib/zclient.c and
harmonizes all zclient callback function types to be the same (some had
a subset of the args, some had a void return, now they all have
ZAPI_CALLBACK_ARGS and int return.)
Apart from getting rid of the giant switch, this is a minor security
benefit since the function pointers are now in a `const` array, so they
can't be overwritten by e.g. heap overflows for code execution anymore.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
It allows FRR to read the interface config even when the necessary VRFs
are not yet created and interfaces are in "wrong" VRFs. Currently, such
config is rejected.
For VRF-lite backend, we don't care at all about the VRF of the inactive
interface. When the interface is created in the OS and becomes active,
we always use its actual VRF instead of the configured one. So there's
no need to reject the config.
For netns backend, we may have multiple interfaces with the same name in
different VRFs. So we care about the VRF of inactive interfaces. And we
must allow to preconfigure the interface in a VRF even before it is
moved to the corresponding netns. From now on, we allow to create
multiple configs for the same interface name in different VRFs and
the necessary config is applied once the OS interface is moved to the
corresponding netns.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
if_lookup_by_name_all_vrf doesn't work correctly with netns VRF backend
as the same index may be used in multiple netns simultaneously.
Use the appropriate VRF when looking for the interface.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
The `show ip ospf neighbor json` command was displaying
state:`Full\/DR`
Where state was both the role and whether or not the neigbhor
was converged. While from a OSPF perspective this is the state.
This state is a combination of two things.
This creates a problem in testing because we have no guarantee
that a particular ospf router will actually have a particular role
given how loaded our topotest systems are. So add a bit of json
output to display both the converged status as well as the
role this router is playing on this neighbor/interface.
The above becomes:
state:`Full\/DR`
converged:`Full`
role:`DR`
Tests can now be modified to look for `Full` and allow it to
continue. Most of the tests do not actually care if this
router is the DR or Backup.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Commit 3551ee9e90304 introduced a regression that causes GR to fail
under certain circumstances. In short, while ISM events should
be ignored while acting as a helper for a restarting router, the
DR/BDR fields of the neighbor structure should still be updated
while processing a Hello packet. If that isn't done, it can cause
the helper to elect the wrong DR while exiting from the helper mode,
leading to a situation where there are two DRs for the same network
segment (and a failed GR by consequence). Fix this.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Before starting the graceful restart procedures, ospf_gr_prepare()
verifies for each configured OSPF instance whether it has the opaque
capability enabled (a pre-requisite for GR). If not, a warning is
emitted and GR isn't performed on that instance.
This PR introduces an additional opaque capability check that will
return a CLI error when the opaque capability isn't enabled. The
idea is to make it easier for the user to identify when the GR
activation has failed, instead of requiring him or her to check
the logs for errors.
The original opaque capability check from ospf_gr_prepare() was
retaining as it's possible that that function might be called from
other contexts in the future.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The ospfd opaque LSA infrastruture has an issue where it can't store
different versions of the same Type-9 LSA for different interfaces.
When flushing the self-originated Grace-LSAs upon exiting from the GR
mode, the code was looking up the single self-originated Grace-LSA
from the LSDB, setting its age to MaxAge and sending it out on all
interfaces.
The problem is that Grace-LSAs sent on broadcast interfaces have
their own unique "IP interface address" TLV that is used to identify
the restarting router. That way, just reusing the same Grace-LSA for
all interfaces doesn't work.
Fix this by generating a new Grace-LSA with its age manually set
to MaxAge whenever one needs to be flushed. This will allow the "IP
interface address" TLV to be set correctly and make GR work even in
the presence of multiple broadcast interfaces.
In the long term, the opaque LSA infrastructure should be updated
to support Type-9 link-local LSAs correctly so that we don't need to
resort to hacks like this.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Add a 'json' parameter to the 'show_opaque_info' callback definition,
and update all instances of that callback to not display plain-text
data when the user requested JSON data.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
RFC 3623 says:
"If the restarting router determines that it was the Designated
Router on a given segment prior to the restart, it elects
itself as the Designated Router again. The restarting router
knows that it was the Designated Router if, while the
associated interface is in Waiting state, a Hello packet is
received from a neighbor listing the router as the Designated
Router".
Implement that logic when processing Hello messages to ensure DR
interfaces will preserve their DR status across a graceful restart.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
FRR should only ever use the appropriate THREAD_ON/THREAD_OFF
semantics. This is espacially true for the functions we
end up calling the thread for.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Issue:
===================
OSPF neighbors are not going down even after 10 mins when
having a mismatch in hello and dead interval.
First neighbors are formed and then a mismatch in the interval
is created, it is observed that the neighbor is not going down.
Root Cause Analysis:
====================
The event HelloReceived defined in RFC 2328 was named as PacketReceived
and this event was scheduled whenever LS Update, LS Ack, LS Request,
DD description packet or Hello packet is received.
Although there is a mismatch in the Hello packet contents, the
event PacketReceived gets triggered due to LS Update received and the
dead timer gets reset and hence the neighbor was never going Down and
remains FULL.
Fix:
==================
As per RFC 2328, the HelloReceived needs to be triggered only when
valid OSPF Hello packet is received and not when other OSPF packets
are received. Modified the function name as well.
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
Description:
As per the RFC 3623 section 3.2,
OSPF nbr shouldn't be deleted even in unsuccessful helper exit.
1. Made the changes to keep neighbour even after exit.
2. Restart the dead timer after expiry in helper. Otherwise, Restarter
will be in FULL state in helper forever until it receives the 'hello'.
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
There's a helper function to check whether the interface is loopback or
VRF - if_is_loopback_or_vrf. Let's use it whenever we need to check that.
There's no functional change in this commit.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
The capability opaque command can trigger asserts through a rather
round-about mechanism. The command eventually calls
ospf_renegotiate_optional_capabilities, which will call
ospf_flush_self_originated_lsas_now, which has the side effect of
marking the OSPF instance as shutting down. This was causing the
flooding logic to call ospf_write immediately insted of waiting for the
select IO loop every time it was sending a maxage LSA. This could cause
the list of OSPF interfaces needing to send packets to be drained while
there was a call to ospf_write pending from the IO loop. When the
pending call ran, it would see the empty list of interfaces and assert.
Signed-off-by: Yuan Yuan <yyuanam@amazon.com>
The existing logic was not comparing the prefix of the extended
prefix TLV. As a result, the code was removing all of the prefix
SIDs except the one received on every LSA update.
Signed-off-by: Fredi Raspall <fredi@voltanet.io>
In update_ext_prefix_sid(), the sr_prefix is associated to the
SR node and inherits the adv router ID regardless.
Signed-off-by: Fredi Raspall <fredi@voltanet.io>
The code was not checking if in these label ranges [a,b], a is
smaller than b, which is assumed in several places, including when
determining the size of the block as b-a+1. As a consequence, the
results of a bad configuration can be unpredictable. Some effects
observed were: 1) segfault 2) de-activation of SR due to label
reservation failure.
The added validation function also checks if the SR blocks are
larger than some minimal size. RFC 8665 mandates that the blocks
be srictly larger than zero. In this patch, the minimum sized is
arbitrarily defined to be 16.
Checking if ranges would fall outside [16,1048575] is omitted
since the vty filtering takes care of that.
Signed-off-by: Fredi Raspall <fredi@voltanet.io>
The prior condition was wrong since it ended up allowing for
labels past the end of the SRLB. Variable 'current' should be in
range [0, size-1] for labels not to exceed the SRLB upper boundary.
In addition, emit a warning log when all labels in the SRLB have
been used.
Signed-off-by: Fredi Raspall <fredi@voltanet.io>
Replaces several complex if conditions by a lookup to a utility
to determine if two ranges of numbers overlap.
Signed-off-by: Fredi Raspall <fredi@voltanet.io>
Homogenize the code dealing with SRGBs and SRLBs by defining the
same set of utility functions for their reservation.
Unify also the logs and don't display function names since the
operations are only performed from the same functions.
Signed-off-by: Fredi Raspall <fredi@voltanet.io>
Homogenize the code dealing with SRGBs and SRLBs by defining the
same set of utility functions for the deletion of SR blocks.
Signed-off-by: Fredi Raspall <fredi@voltanet.io>
* Some of the debug flags were not shown in show debugging.
* The check for TI-LFA debug was made against the wrong variable.
* Some of the debugs were not cleared with 'no debug ospf'
Signed-off-by: Fredi Raspall <fredi@voltanet.io>
Considering that both the GR helper mode and restarting mode can be
enabled at the same time, the "graceful-restart helper-only" command
can be a bit misleading since it implies that only the helper mode
is enabled. Rename the command to "graceful-restart helper enable"
to clarify what the command does.
Start a deprecation cycle of one year before removing the original
command
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
min lsa packet sizes are not always directly corresponding
to the actual LSA. Add a bit of comments so it's easier
for future people to figure out.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
In some cases FRR is receiving a lsa data packet
with a length set to the length of the header only.
If we are expecting data from a peer in the form
of lsa data. Let's enforce it.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Several functions in ospf_vty.c were allocating json memory
irrelevant if it was needed or not and then at the end of the loop
free'ing it if it was not used. Clean up the access pattern.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The usage of json_object_to_json_string_ext is meant for
generation of output string and returns a `char *` pointer
to the `formatted` output. Just calling it does nothing
and it's expensive to boot.
Modify the code in ospfd to just output with the NOSLASHESCAPE
when outputting.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Problem Statement :
===================
LSA with InitialSequenceNumber is not originated
after MaxSequenceNumber.
ANVL Test case 25.33 states:
============================
As soon as this flooding of a LSA with LS sequence number
MaxSequenceNumber has been acknowledged by all adjacent neighbors,
a new instance can be originated with sequence number of InitialSequenceNumber.
RCA :
=====
DUT did not originated LSA with INITIAL_SEQUENCE number even
after receiving ACK for max sequence LSA.
Code is not present to handle this situation in the lsa ack flow.
Fix :
=====
Add code to originate LSA with initial sequence number in the
LSA ack flow in case of wrap around sequence number.
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
ANVL Test case 28.11
If the database copy has LS age equal to MaxAge and LS sequence number
equal to MaxSequenceNumber, simply discard the received LSA
without acknowledging it.
ANVL Test Case 25.22
When an attempt is made to increment the sequence number past the maximum
value of N - 1 (0x7fffffff; also referred to as MaxSequenceNumber),
the current instance of the LSA must first be flushed from the routing domain.
ANVL Test Case 25.23
As soon as this flooding of a LSA with LS sequence number MaxSequenceNumber
has been acknowledged by all adjacent neighbors, a new instance can be
originated with sequence number of InitialSequenceNumber.
RCA:
When IXIA sent LS Seq num as MAX and LS Age as (MAX - 3),
DUT dropped the packet instead of sending ACK.
In function ospf_ls_upd, at Line 2106 the code is there to drop the LSA.
Hence its failing.
Fix:
LSAs ACK must be sent when received LSA is having max sequence number
but not max-aged.
Considering /* CVE-2017-3224 */ issue, have corrected the existing
code to prevent attacker from sending LSAs with max sequence number
and higher checksum and blocking the flooding of the Max-sequence numbered LSAs.
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
Problem Statement:
===================
DUT selecting itself as DR when RR goes for reload.
Test Case 7.2
DUT (GR Helper) receives the Hello packet from the OSPF GR RESTARTER
(ANVL here) with DR and BDR set to 0.0.0.0 and DUT in its hello
neighbor list. DUT triggers the DR and BDR election although it is
in the Helper mode for that neighbor.
Root Cause Analysis:
====================
When hello packet is received with self router ID in the neighbor list,
there is no check in the code to handle this scenario. Hence the DR/BDR
election happens and it changes the DR although it is helper.
Fix:
===================
As per RFC 3623 Section 3. Operation of Helper Neighbor, below point,
we need to maintain the DR relationship.
Also, if X was the Designated Router on network segment S when the
helping relationship began, Y maintains X as the Designated Router
until the helping relationship is terminated.
Adding the check when DUT is under neighbor helper mode, we need to avoid
ISM state change when hello packet is received with DR/BDR set to 0.0.0.0.
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
Problem Statement:
==================
Memory Leak seen at show_ip_ospf_neighbor_all_common (ospf_vty.c:4635)
RCA:
=================
In function show_ip_ospf_neighbor_all_common, one child json object is not
added to the parent child object when there is no nbma neighbor. Hence
the memory leak.
Fix:
=================
Add the child object to the parent json object.
Fixes: #9548
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
Problem Statement:
==================
Summary LSA is not originated when router-id is modified or process is reset
Root Cause Analysis:
====================
When router-id is modified or process is cleared, all the external LSAs are
flushed then LSA is re-originated using ospf_external_lsa_rid_change
When the LSAs are flushed, the aggregate flags are not reset.
Fix:
===============
Reset the aggregation flag when the LSAs
are flushed.
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
Problem Statement:
==================
When hello-interval is configured as 5, automatically dead interval becomes
4 times of hello i.e 20 seconds. But user wants the dead interval as
40 seconds and hello as 5 seconds. Therefore user configures it.
Now "ip ospf dead-interval 40" is not shown in "show running-config"
Therefore when user restarts the daemon, the dead interval goes back to
20 seconds and the neighbors are down.
Fix:
==================
If user configures dead-interval as 40, show it in show running config.
Fixes: #9401
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
There is a possibility that the same line can be matched as a command in
some node and its parent node. In this case, when reading the config,
this line is always executed as a command of the child node.
For example, with the following config:
```
router ospf
network 193.168.0.0/16 area 0
!
mpls ldp
discovery hello interval 111
!
```
Line `mpls ldp` is processed as command `mpls ldp-sync` inside the
`router ospf` node. This leads to a complete loss of `mpls ldp` node
configuration.
To eliminate this issue and all possible similar issues, let's print an
explicit "exit" at the end of every node config.
This commit also changes indentation for a couple of existing exit
commands so that all existing commands are on the same level as their
corresponding node-entering commands.
Fixes#9206.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Fix CI Failure test_ospf_type5_summary_tc45_p0
Problem Statement:
==================
Summarised LSA is not flushed in OSPFv2 in below scenario:
1. Configure summary-address in ospfv2
2. redistribute static and connected.
3. Check the LSAs are received on neighbor.
4. Now remove all OSPFv2 configs, so neighbor will still have the summarised LSA.
5. Configure router ospf with redistribute static and connected.
6. Check the DB, summarised LSA is present although the configuration is not present.
7. Now configure the summary-address and remove the configuration after sometime.
8. The summarised LSA will be still present.
RCA:
==================
When self originated LSA is received from the neighbor and that
LSA is summarised one, the LSA is refreshed but a flag is not set
due to which it was not able to remove it later.
Fix:
==================
Set the originated flag when refreshing summarised LSA.
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
There are a couple of things that are not initialized if the OSPF router
is created in a non-existent VRF:
- ospf_lsa_maxage_walker
- ospf_lsa_refresh_walker
- ospf_opaque_type11_lsa_init
Rearrange some code to always initialize them and make it easier to find
similar problems in the future.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
When OSPF is disabled on interface and enabled again, the IP which is
not matching the prefix-list is getting originated as External LSA.
Fixes: #9362
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
Problem Statement:
==================
OSPF Peer gets stuck in EXSTART with ARISTA Device.
Root Cause:
=================
First peer is form with Arista device in normal area and then
the area type is changed to NSSA. Due to this Type-4 and Type-5
LSAs advertised by Arista router is still present in
the OSPF DB. While DD exchange the Type-5 LSAs are omitted but
the Type-4 LSAs are not omitted due to which Arista device gets
stuck in EXSTART and it keeps moving between EXCHANGE And EXSTART.
Fix:
=================
When the area is NSSA, we should not send Type-4 LSAs in DD
exchange packet.
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
This command is currently always treated as an "unset" command, assuming
that active is the default type of the interface. In reality, the default
type of the interface can be changed using "passive-interface default"
command. Both "no" and regular commands can be "set" commands, depending
on the default value. They are treated as an "unset" when there's already
a config of the opposite type.
All this logic is in ospf_passive_interface_update.
Fixes#9240.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
The only difference in daemons' interface node definition is the config
write function. No need to define the node in every daemon, just pass
the callback as an argument to a library function and define the node
there.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Let's be less radical. There's no reason to stop the whole daemon when
there's a socket creation error in a single VRF. The user can always
restart this single VRF to retry to create a socket.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
like the other automake variables, setting `xyz_LDFLAGS` causes
`AM_LDFLAGS` to be ignored for `xyz`. For some reason I had in my mind
that automake doesn't do this for LDFLAGS, but... it does. (Which is
consistent with `_CFLAGS` and co.)
So, all the libraries and modules have been ignoring `AM_LDFLAGS` (which
includes `SAN_FLAGS` too). Set up new `LIB_LDFLAGS` and
`MODULE_LDFLAGS` to handle all of this correctly (and move these bits to
a central location.)
Fixes: #9034
Fixes: 0c4285d77e ("build: properly split CFLAGS from AC_CFLAGS")
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
ospf_distribute_list_update currently passes two arguments to
ospf_distribute_list_update_timer - pointer to the ospf structure and
protocol type. The protocol type is only used for logging and is not
even correct because if multiple changes happen during one
ospf->min_ls_interval, then only the type of the first change is logged.
It is better to completely remove the protocol type argument to have a
correct log and eliminate the need for memory allocation.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Description:
Ospf process crashes upon giving 'clear ip ospf neighbor' with
self routerId. It is asserting if it is a self neighbor in ospf
neighbour kill event processing.
Added a check to validate the provided router-id is self
router-id.
Signed-off-by: Rajesh Girada <rgirada@vmware.com>
Move `is_default_prefix` variations to `lib/prefix.h` and make the code
use the library version instead of implementing it again.
NOTE
----
The function was split into per family versions to cover all types.
Using `union prefixconstptr` is not possible due to static analyzer
warnings which cause CI to fail.
The specific cases that would cause this failure were:
- Caller used `struct prefix_ipv4` and called the generic function.
- `is_default_prefix` with signature using `const struct prefix *` or
`union prefixconstptr`.
The compiler would complain about reading bytes outside of the memory
bounds even though it did not take into account the `prefix->family`
part.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
RFC 3623 specifies the Graceful Restart enhancement to the OSPF
routing protocol. This PR implements support for the restarting mode,
whereas the helper mode was implemented by #6811.
This work is based on #6782, which implemented the pre-restart part
and settled the foundations for the post-restart part (behavioral
changes, GR exit conditions, and on-exit actions).
Here's a quick summary of how the GR restarting mode works:
* GR can be enabled on a per-instance basis using the `graceful-restart
[grace-period (1-1800)]` command;
* To perform a graceful shutdown, the `graceful-restart prepare ospf`
EXEC-level command needs to be issued before restarting the ospfd
daemon (there's no specific requirement on how the daemon should
be restarted);
* `graceful-restart prepare ospf` will initiate the graceful restart
for all GR-enabled instances by taking the following actions:
o Flooding Grace-LSAs over all interfaces
o Freezing the OSPF routes in the RIB
o Saving the end of the grace period in non-volatile memory (a JSON
file stored in `$frr_statedir`)
* Once ospfd is started again, it will follow the procedures
described in RFC 3623 until it detects it's time to exit the graceful
restart (either successfully or unsuccessfully).
Testing done:
* New topotest featuring a multi-area OSPF topology (including stub
and NSSA areas);
* Successful interop tests against IOS-XR routers acting as helpers.
Co-authored-by: GalaxyGorilla <sascha@netdef.org>
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Both the GR helper code and the upcoming GR restarting code are going
to share a lot of definitions. As such, rename ospf_gr_helper.h to
ospf_gr.h, which will be the central point of all GR definitions
and prototypes.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Remove previous log config
debug ospf graceful-restart helper
and just use
debug ospf graceful-restart
for everything related to OSPF GR.
Signed-off-by: GalaxyGorilla <sascha@netdef.org>