Don't rely on operational data to check for system ID consistency. This
is purely configurational data thing.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
We are passing around the system id using the variable name
of `argv`. Let's name the variable correctly and pass it around
correctly named.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
in lsp_for_arg we have already checked for NULL and returned
if argv is null. We do not need to check for it again.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
If we have the following configuration:
```
vrf red
smth
exit-vrf
!
interface red vrf red
smth
```
And we delete the VRF using "no vrf red" command, we end up with:
```
interface red
smth
```
Interface config is preserved but moved to the default VRF.
This is not an expected behavior. We should remove the interface config
when the VRF is deleted.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
isis_circuit_enable can be called for an already enabled circuit. In this
case we would add the circuit to the area multiple times.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
When creating a new area, we're adding all circuits in the same VRF to
this area. We should only add circuits configured with the same tag.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Currently, the dynamic hostname cache is global. It is incorrect because
neighbors in different VRFs may have the same system ID and different
hostnames.
This also fixes a memory leak - when the instance is deleted, the cache
must be cleaned up and the cleanup thread must be cancelled.
Fixes#8832.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
The backoff code assumed that yang operations always completed quickly.
It checked for > 100 YANG modeled commands happening in under 1 second
to enable batching. If 100 yang modeled commands always take longer than
1 second batching is never enabled. This is the exact opposite of what
we want to happen since batching speeds the operations up.
Here are the results for libyang2 code without and with batching.
| action | 1K rts | 2K rts | 1K rts | 2K rts | 20k rts |
| | nobatch | nobatch | batch | batch | batch |
| Add IPv4 | .881 | 1.28 | .703 | 1.04 | 8.16 |
| Add Same IPv4 | 28.7 | 113 | .590 | .860 | 6.09 |
| Rem 1/2 IPv4 | .376 | .442 | .379 | .435 | 1.44 |
| Add Same IPv4 | 28.7 | 113 | .576 | .841 | 6.02 |
| Rem All IPv4 | 17.4 | 71.8 | .559 | .813 | 5.57 |
(IPv6 numbers are basically the same as iPv4, a couple percent slower)
Clearly we need this. Please note the growth (1K to 2K) w/o batching is
non-linear and 100 times slower than batched.
Notes on code: The use of the new `nb_cli_apply_changes_clear_pending`
is to commit any pending changes (including the current one). This is
done when the code would not correctly handle a single diff that
included the current changes with possible following changes. For
example, a "no" command followed by a new value to replace it would be
merged into a change, and the code would not deal well with that. A good
example of this is BGP neighbor peer-group changing. The other use is
after entering a router level (e.g., "router bgp") where the follow-on
command handlers expect that router object to now exists. The code
eventually needs to be cleaned up to not fail in these cases, but that
is for future NB cleanup.
Signed-off-by: Christian Hopps <chopps@labn.net>
If the n-flag-clear option is set in the configuration of a prefix
segment, clear the flag in the extended ip reachability TLVs.
RFCs 7794 and 8667 are not too strict on the setting / clearing the
N-flag in prefix SIDs. However, if there exists a cmd line option
to clear it, it should be cleared in the TLVs announced, as other
vendors do.
Signed-off-by: Fredi Raspall <fredi@voltanet.io>
We only need an instance when we have at least one area configured in a
VRF. Currently we have the following issues:
- instance for the default VRF is always created
- instance is not removed after the last area config is removed
This commit fixes both issues.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
When the redistribution is configured in non-default VRF, isisd should
redistribute routes from this VRF instead of default.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Compile with v2.0.0 tag of `libyang2` branch of:
https://github.com/CESNET/libyang
staticd init load time of 10k routes now 6s vs ly1 time of 150s
Signed-off-by: Christian Hopps <chopps@labn.net>
The current implementation of TI-LFA computes link-protecting
repair paths (even when node protection is enabled) to have repair
paths to all destinations when no node-protecting repair has been
found. This may be desired or not. E.g. the link-protecting paths
may use the protected node and be, therefore, useless if the node
fails. Also, computing link-protecting repairs incurs extra
calculations.
With this patch, when node protection is enabled, link protecting
repair paths are only computed if "link-fallback" is specified in
the configuration, on a per interface and IS-IS level.
Signed-off-by: Fredi Raspall <fredi@voltanet.io>
When enabling 'debug isis lfa', the option was correctly enabled
but not displayed by 'show debugging' command.
Signed-off-by: Fredi Raspall <fredi@voltanet.io>
When enabling TI-LFA the forward SPF for neighbors adjacent to the
PLR is computed. Later, when computing the PQ spaces, the reverse
SPF trees for those adjacent neighbors affected by the protected
interface are computed.
When node protection is enabled, TI-LFA link protection is run
immediately afterwards to compute repairs in case no
node-protecting backup path exists. In this second run, the
existing code tries to compute the reverse SPF tree for the same
node, without freeing the SPF tree of the prior run.
This patch fixes this by not computing the reverse SPF again, thus
avoiding a memory leak and an unnecessary SPF run.
Signed-off-by: Fredi Raspall <fredi@voltanet.io>
Currently the operational data is used for two things:
- to inherit the is-type from the isis instance
- to set passive flag for loopback interfaces
This commit implements the first one using only the config data.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
We need to delete isis config from interfaces when we delete the isis
router instance. This should be done using only config data.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
YANG model and CLI commands allow user to configure LDP-sync per area.
But the actual implementation is incorrect - all commands are changing
the config for the whole VRF instead of a single area. This commit fixes
this issue by actually implementing per area configuration.
Fixes#8578.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Currently we don't allow to configure the interface before the area is
configured. This approach has the following issues:
1. The area config can be deleted even when we have an interface config
relying on it. The code is not ready for that - we'll have a whole
bunch of stale pointers if user does that.
2. The code doesn't correctly process the event of changing the VRF for
an interface. There is no mechanism to ensure that the area exists
in the new VRF so currently the circuit still stays in the old VRF.
This commit allows an arbitrary order of area/interface configuration.
There is no more need to configure the area before configuring the
interface.
This change fixes both the issues.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Call from isis_circuit_create works only if we enable isis on an already
existing interface. If we configure isis on a pseudo interface and then
actually create it - this call doesn't work.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Necessary structures for snmp-id generation are currently stored in
`struct isis`. When we generate the new circuit ID, we always use the
instance from the default VRF. When we free the circuit ID, we use the
instance from the circuit VRF. This causes the following problems:
1. If there is no instance in the default VRF, this code doesn't work.
2. When circuit in non-default VRF is deleted, the ID is not actually
freed.
This is fixed by using global structures instead. The code itself is
moved to isis_snmp.c and linked to the main code using hooks. We should
not call SNMP-related code when the SNMP module is not loaded at all.
More than that, we don't allow to activate the circuit if we failed to
generate the SNMP ID. Even if SNMP support is completely disabled! This
check is removed.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
When running isis and not running isis on all interfaces results
in a bunch of warn messages to the log about circuit state
changes. These warn messages also didn't bother to inform
the end user what interface was causing the fun. Since
the end operator cannot do anything with these warn messages
and nor should they in the vast array of normal operations
modify the code to use event debugging and turn the warns
to debugs.
Additionally add some information to clue the operator
in on to what actual interface we are talking about.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
`CFLAGS` is a "user variable", not intended to be controlled by
configure itself. Let's put all the "important" stuff in AC_CFLAGS and
only leave debug/optimization controls in CFLAGS.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
... by referencing all autogenerated headers relative to the root
directory. (90% of the changes here is `version.h`.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Most of these are many, many years out of date. All of them vary
randomly in quality. They show up by default in packages where they
aren't really useful now that we use integrated config. Remove them.
The useful ones have been moved to the docs.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
When you set the isis mtu to 200, isis ends up in a infinite loop
trying to fragment the tlv's.
Specifically ( for me ) the extended reachability function
for packing pack_item_extended_reach requires 11 + ISIS_SUBTLV_MAX_SIZE
room in the packet. Which is 180 bytes. At this point we have
174 bytes that we can write into a packet.
I created this by modifying the isis-topo1 topology to all
the isis routers to have a lsp-mtu of 200 and immediately
saw the crash.
Effectively the pack_items_ function had no detection for
when a part of the next bit it was writing into the stream
could not even fit and it would go into an infinite loop
allocating ~800 bytes at a time. This would cause the
router to run out of memory very very fast and the OOM
detector would kill the process.
Modify the code to notice that we have insufficient space to
even write any data into the stream.
I suspect that pack_item_extended_reach could also be optimized
to figure out exactly how much space is needed. But I also
think we need this protection in the function if this ever
happens again.
I also do not understand the use case of saying the min mtu is
200.
Fixes: #8289
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Convert most DEFINE_MTYPE into the _STATIC variant, and move the
remaining non-static ones to appropriate places.
Signed-off-by: David Lamparter <equinox@diac24.net>
Fix places where we are outputing an extra space. This was
because it was prepping for vrf but we may not have a vrf.
Fixes: #8300
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
no point in scheduling an LSP refresh immediately if we know it is
going to be postponed again due to the network still being in its
instability grace period
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
when we receive an event from BFDD and we end up throwing it away,
make sure that we log (with debug guards) the reason for this, so
we can troubleshoot issues like the one addressed by the previous
commit.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
A wrong check was silently skipping the initialization of the bfd_session
struct in the adjacency if the router was not configured for IPv6. This
would cause BFD events to be ignored regardless of the configuration.
Also add a function to return the "name" of an adjacency and use it in a
couple of places, including the new log, instead of repeating the same
code in a bunch of places.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Back when I put this together in 2015, ISO C11 was still reasonably new
and we couldn't require it just yet. Without ISO C11, there is no
"good" way (only bad hacks) to require a semicolon after a macro that
ends with a function definition. And if you added one anyway, you'd get
"spurious semicolon" warnings on some compilers...
With C11, `_Static_assert()` at the end of a macro will make it so that
the semicolon is properly required, consumed, and not warned about.
Consistently requiring semicolons after "file-level" macros matches
Linux kernel coding style and helps some editors against mis-syntax'ing
these macros.
Signed-off-by: David Lamparter <equinox@diac24.net>
The point of the `-std=gnu99` was to override a `-std=c99` that may be
coming in from net-snmp. However, we want C11, not C99.
Signed-off-by: David Lamparter <equinox@diac24.net>
There are places in the code where function nb_running_get_entry is used
with abort_if_not_found set to true during the config validation stage.
This is incorrect because when used in transactional CLI, the running
entry won't be set until the apply stage, and such usage leads to crash.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
when changing both ranges at the same time the order of the commands
matters, as we need to make sure that the intermediate state is valid.
This represents a problem when pushing configuration via frr-reload.
To fix this, the global-block command was extended to optionally
allow setting the local-block range as well. The local-block command
is deprecated with a 1-year notice.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Now it's possible to filter routes redistributed by another protocol using tag
which comes from zebra daemon.
Example of a possible configuration:
```
!
ipv6 route fd00::/48 blackhole tag 20
ipv6 route fd00::/60 blackhole tag 10
!
interface one
ipv6 router isis COMMON
isis circuit-type level-1
!
interface two
ipv6 router isis COMMON
isis circuit-type level-2-only
!
router isis COMMON
net fd.0000.0000.0000.0001.00
redistribute ipv6 static level-1 route-map static-l1
redistribute ipv6 static level-2 route-map static-l2
topology ipv6-unicast
!
route-map static-l1 permit 10
match tag 10
!
route-map static-l2 permit 10
match tag 20
!
```
Signed-off-by: Emanuele Altomare <emanuele@common-net.org>
Add support for read only mib objects from RFC4444.
Signed-off-by: Lynne Morrison <lynne@voltanet.io>
Signed-off-by: Karen Schoener <karen@voltanet.io>
When the last SID in the TI-LFA repair list is an Adj-SID from the
penultimate hop router towards the final hop, the No-PHP flag of the
original Prefix-SID must be honored in the repair list itself since
the penultimate hop router won't have a chance to process that SID
and pop it if necessary.
Reported-by: Fredi Raspall <fredi@voltanet.io>
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
In some cases it's possible that the TI-LFA algorithms will try to
compute a SID repair list more than once for the same backup nexthop
[1]. This of course shouldn't be allowed, as a backup nexthop can't
have multiple label stacks. When that happens, we should just ignore
the new repair list if one is already applied, instead of asserting
and crashing the daemon.
[1] One scenario this can happen is when there's ECMP involving
different P-nodes in the PQ-space intersection.
Reported-by: Fredi Raspall <fredi@voltanet.io>
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Neither tabs nor newlines are acceptable in syslog messages. They also
break line-based parsing of file logs.
Signed-off-by: David Lamparter <equinox@diac24.net>
Valgrind reports:
469901-==469901==
469901-==469901== Conditional jump or move depends on uninitialised value(s)
469901:==469901== at 0x3A090D: bgp_bfd_dest_update (bgp_bfd.c:416)
469901-==469901== by 0x497469E: zclient_read (zclient.c:3701)
469901-==469901== by 0x4955AEC: thread_call (thread.c:1684)
469901-==469901== by 0x48FF64E: frr_run (libfrr.c:1126)
469901-==469901== by 0x213AB3: main (bgp_main.c:540)
469901-==469901== Uninitialised value was created by a stack allocation
469901:==469901== at 0x3A0725: bgp_bfd_dest_update (bgp_bfd.c:376)
469901-==469901==
469901-==469901== Conditional jump or move depends on uninitialised value(s)
469901:==469901== at 0x3A093C: bgp_bfd_dest_update (bgp_bfd.c:421)
469901-==469901== by 0x497469E: zclient_read (zclient.c:3701)
469901-==469901== by 0x4955AEC: thread_call (thread.c:1684)
469901-==469901== by 0x48FF64E: frr_run (libfrr.c:1126)
469901-==469901== by 0x213AB3: main (bgp_main.c:540)
469901-==469901== Uninitialised value was created by a stack allocation
469901:==469901== at 0x3A0725: bgp_bfd_dest_update (bgp_bfd.c:376)
On looking at bgp_bfd_dest_update the function call into bfd_get_peer_info
when it fails to lookup the ifindex ifp pointer just returns leaving
the dest and src prefix pointers pointing to whatever was passed in.
Let's do two things:
a) The src pointer was sometimes assumed to be passed in and sometimes not.
Forget that. Make it always be passed in
b) memset the src and dst pointers to be all zeros. Then when we look
at either of the pointers we are not making decisions based upon random
data in the pointers.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When adjacencies change state the attached-bits in LSPs in other areas
on the router may need to be modified.
1. If a router no longer has a L2 adjacency to another area the
attached-bit must no longer be sent in the LSP
2. If a new L2 adjacency comes up in a different area then the
attached-bit should be sent in the LSP
Signed-off-by: Lynne Morrison <lynne@voltanet.io>
Valgrind reports:
2172861-==2172861==
2172861-==2172861== Syscall param write(buf) points to uninitialised byte(s)
2172861:==2172861== at 0x49B4FB3: write (write.c:26)
2172861-==2172861== by 0x48A4EA0: buffer_write (buffer.c:475)
2172861-==2172861== by 0x4915AD9: zclient_send_message (zclient.c:298)
2172861-==2172861== by 0x12AE08: isis_ldp_sync_state_req_msg (isis_ldp_sync.c:152)
2172861-==2172861== by 0x12B74B: isis_ldp_sync_adj_state_change (isis_ldp_sync.c:305)
2172861-==2172861== by 0x16DE04: hook_call_isis_adj_state_change_hook.isra.0 (isis_adjacency.c:141)
2172861-==2172861== by 0x16EE27: isis_adj_state_change (isis_adjacency.c:371)
2172861-==2172861== by 0x16F1F3: isis_adj_process_threeway (isis_adjacency.c:242)
2172861-==2172861== by 0x13BCCA: process_p2p_hello (isis_pdu.c:283)
2172861-==2172861== by 0x13BCCA: process_hello (isis_pdu.c:781)
2172861-==2172861== by 0x13BCCA: isis_handle_pdu (isis_pdu.c:1700)
Sending of request includes uninited memory at the end of the interface
name string. Fix
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Looks like the #if 0 code in this place was for ESI support
on solaris. We do not support solaris anymore. So let's
remove with prejudice.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The purpose of the Attach-bit is to accomplish inter-area routing. In other
venders, the Attached-bit is automatically set when a router is configured
as a L1|L2 router and has two adjacencies. When a L1 router receives a LSP
with the Attached-bit set it is supposed to create a default route pointing
toward the neighbor to provide a default path out of the L1 area.
ISIS implementation has been fixed to support the above definition:
Setting the Attach-bit is now the default behavior and we allow the user to
turn it off.
We will only set the Default Attach-bit when creating a L1 LSP, if we are
a L1|L2 router and have a L2 adjacency up.
When a L1 router receives a LSP with the Attach-bit set, we will create a
default route pointing to the L1|L2 router as the nexthop.
The default route will be removed if the LSP is received with the Attach-bit
cleared.
Signed-off-by: Lynne Morrison <lynne@voltanet.io>
Currently the transition metric style is redundant because isis will
always read both reachability TLVs regardless of the configured
metric style. Correct this by only considering TLVs matching our
configuration.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
These two debug messages are so verbose to a point they impact
performance when testing RLFA/TI-LFA on large-scale networks. Remove
them since they aren't really useful.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Always call vid2string() whenever necessary instead of trying to be
too clever and call it only once. The original assumption was that
"buf" only needed to be initialized when LFA debugging was enabled,
but we also need that buffer when logging one error message.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Remote LFA (RFC 7490) is an extension to the base LFA mechanism
that uses dynamically determined tunnels to extend the IP-FRR
protection coverage.
RLFA is similar to TI-LFA in that it computes a post-convergence
SPT (with the protected interface pruned from the network topology)
and the P/Q spaces based on that SPT. There are a few differences
however:
* RLFAs can push at most one label, so the P/Q spaces need to
intersect otherwise the destination can't be protected (the
protection coverage is topology dependent).
* isisd needs to interface with ldpd to obtain the labels it needs to
create a tunnel to the PQ node. That interaction needs to be done
asynchronously to prevent blocking the daemon for too long. With
TI-LFA all required labels are already available in the LSPDB.
RLFA and TI-LFA have more similarities than differences though,
and thanks to that both features share a lot of code.
Limitations:
* Only RLFA link protection is implemented. The algorithm used
to find node-protecting RLFAs (RFC 8102) is too CPU intensive and
doesn't always work. Most vendors implement RLFA link protection
only.
* RFC 7490 says it should be a local matter whether the repair path
selection policy favors LFA repairs over RLFA repairs. It might be
desirable, for instance, to prefer RLFAs that satisfy the downstream
condition over LFAs that don't. In this implementation, however,
RLFAs are only computed for destinations that can't be protected
by local LFAs.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The "load-sharing" node is a boolean leaf that has a default
value. As such, it doesn't make sense to either create or delete
it. That node always exists in the configuration tree. Its value
should only be modified. Change the corresponding CLI wrapper
command to reflect that fact.
This commit doesn't introduce any change of behavior as the NB API
maps create/destroy edit operations to modify operations whenever
that makes sense. However it's better to not rely on that behavior
and always use the correct operations in the CLI commands.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
When last area address is removed, resign if we were DR.
This fixes an issue where: when the ISIS area address is changed, ISIS fails
to elect a new DR.
Signed-off-by: Karen Schoener <karen@voltanet.io>
Removing the obsolete ldp-sync periodic 'hello' message.
When ldp-sync is configured, IGPs take action if the LDP process goes down.
The IGPs have been updated to use the zapi client close callback to detect
the LDP process going down.
Signed-off-by: Karen Schoener <karen@voltanet.io>
In some extraordinary circumstances an LSP might not have any
TLV. Add a null check to prevent a crash when that happens.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
When ldp-sync is configured, IGPs take action if the LDP process goes down.
Currently, IGPs detect the LDP process is down if they do not receive a
periodic 'hello' message from LDP within 1 second.
Intermittently, this heartbeat mechanism causes false topotest failures.
When the failure occurs, LDP is busy receiving messages from zebra for a
few seconds. During this time, LDP does not send the expected periodic
message.
With this change, IGPs detect LDP down via zapi client close message.
Signed-off-by: Karen Schoener <karen@voltanet.io>
Instead of storing the LSP associated to pseudonodes only, store the
LSP associated to all SPF adjacencies instead.
The upcoming LFA work will need to have that piece of information
for all SPF adjacencies in order to know which ones have the overload
bit set or not. Other use cases might arise in the future.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Rename "debug isis ti-lfa" to "debug isis lfa". Having different
debug guards for different kinds of LFA (classic, remote and TI-LFA)
doesn't make sense since all LFA solutions share code to certain
extent.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Those constants are also useful in contexts other than LDP-IGP
Synchronization (e.g. the upcoming LFA work will need them). Move
them to a more general header to reflect that.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Do not attempt to install a TI-LFA backup nexthop if its number of
labels exceeds the locally configured MSD (Maximum Stack Depth). The
idea is to prevent forward-plane installation failures before they
happen. The MSD check should also allow the "show isis fast-reroute
summary" command (not implemented yet) to display the actual
protection coverage provided by TI-LFA, which might not be 100%
if the MSD isn't big enough.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Commit 4c75f7c773 fixed a bug in which the TI-LFA repair paths
weren't preserving the original Prefix-SID of the routes. That
commit, however, didn't update the zebra interface code to account
for backup nexthops that don't have a repair list but do have a
SR label. As a consequence, backup nexthops that didn't have any
repair label were not preserving the original Prefix-SID of the
corresponding routes. Fix this and update the TI-LFA topotest
accordingly.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
vertex->N is an union whose "id" and "ip" fields are only valid
depending on the vertex type (IS adjacency or IP reachability
information). As such, add a vertex type check before consulting
vertex->N.id in order to prevent unexpected behavior from happening.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The "ifp" variable returned by nb_running_get_entry() might be
NULL when using the transactional CLI mode. Make the required
modifications to avoid null pointer dereferences.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Once the remote end of a connected link is shut down (or lose
its address), isisd will remove the corresponding route from its
RIB after SPF runs. A new route for the same destination should
be computed based on the local LSP, and that route by definition
doesn't have any nexthop. The problem is that, when isisd tries
to replace the old route by the new one, it fails because routes
without nexthops can't be installed. That causes the old invalid
route to remain in the RIB when it shouldn't. To fix this problem,
change the zebra interface code to uninstall a route whenever it
can't be installed (because it lacks nexthops) instead of doing
nothing in that case.
This change should fix occasional failures of the test_isis_sr_topo1
topotest.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The `enum zclient_send_status` enum needs to be extended
throughout the code base to use the new states and
to fix up places where we tested against the return
value being non zero.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
On redistribution into isis we were creating a table for
handling the redistributed routes, but never cleaning them
up on shutdown properly. Do so.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When isis is being shutdown the area->spf_timer thread has
special data assigned to that was never being freed.
Free this data.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The route_map_object_t was being used to track what protocol we were
being called against. But each protocol was only ever calling itself.
So we had a variable that was only ever being passed in from route_map_apply
that had to be carried against and everyone was testing if that variable
was for their own stack.
Clean up this route_map_object_t from the entire system. We should
speed some stuff up. Yes I know not a bunch but this will add up.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is a second iteration of commit 10bdc68f0c. Some recent
commits introduced zlog calls in the northbound callbacks
inadvertently.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
isisd relies on its YANG module to prevent the same SID index
from being configured multiple times for different prefixes. It's
possible, however, to have different routers assigning the same SID
index for different prefixes. When that happens, we say we have a
Prefix-SID collision, which is ultimately a misconfiguration issue.
The problem with Prefix-SID collisions is that the Prefix-SID that
is processed later overwrites the previous ones. Then, once the
Prefix-SID collision is fixed in the configuration, the overwritten
Prefix-SID isn't reinstalled since it's already marked as installed
and it didn't change. To prevent such inconsistency from happening,
add a safeguard in the SPF code to detect Prefix-SID collisions and
handle them appropriately (i.e. log a warning + ignore the Prefix-SID
Sub-TLV since it's already in use by another prefix). That way,
once the configuration is fixed, no Prefix-SID label entry will be
missing in the LFIB.
Reported-by: Emanuele Di Pascale <emanuele@voltanet.io>
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The fields in the broadcast/p2p union struct in an isis circuit are
initialized when the circuit goes up, but currently this step is
skipped if the interface is passive. This can create problems if the
circuit type (referred to as network type in the config) changes from
broadcast to point-to-point. We can end up with the p2p neighbor
pointer pointing at some garbage left by the broadcast struct in the
union, which would then cause a segfault the first time we would
dereference it - for example when building the lsp, or computing the
SPF tree.
compressed backtrace of a possible crash:
#0 0x0000555555579a9c in lsp_build at frr/isisd/isis_lsp.c:1114
#1 0x000055555557a516 in lsp_regenerate at frr/isisd/isis_lsp.c:1301
#2 0x000055555557aa25 in lsp_refresh at frr/isisd/isis_lsp.c:1381
#3 0x00007ffff7b2622c in thread_call at frr/lib/thread.c:1549
#4 0x00007ffff7ad6df4 in frr_run at frr/lib/libfrr.c:1098
#5 0x000055555556b67f in main at frr/isisd/isis_main.c:272
isis_lsp.c:
1112 case CIRCUIT_T_P2P: {
1113 struct isis_adjacency *nei = circuit->u.p2p.neighbor;
1114 if (nei && nei->adj_state == ISIS_ADJ_UP
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
There exists a code path where we would allocate memory
then test a variable and then immediately return NULL.
Prevent memory from leaking in this situation.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
valgrind is showing a usage of uninited memory:
==935465== Conditional jump or move depends on uninitialised value(s)
==935465== at 0x159E17: tlvs_area_addresses_to_adj (isis_tlvs.c:4430)
==935465== by 0x15A4BD: isis_tlvs_to_adj (isis_tlvs.c:4568)
==935465== by 0x1377F0: process_p2p_hello (isis_pdu.c:203)
==935465== by 0x1391FD: process_hello (isis_pdu.c:781)
==935465== by 0x13BDBE: isis_handle_pdu (isis_pdu.c:1700)
==935465== by 0x13BECD: isis_receive (isis_pdu.c:1744)
==935465== by 0x49210FF: thread_call (thread.c:1585)
==935465== by 0x48CFACB: frr_run (libfrr.c:1099)
==935465== by 0x1218C9: main (isis_main.c:272)
==935465==
==935465== Conditional jump or move depends on uninitialised value(s)
==935465== at 0x483EEC5: bcmp (vg_replace_strmem.c:1111)
==935465== by 0x15A290: tlvs_ipv4_addresses_to_adj (isis_tlvs.c:4512)
==935465== by 0x15A4EB: isis_tlvs_to_adj (isis_tlvs.c:4570)
==935465== by 0x1377F0: process_p2p_hello (isis_pdu.c:203)
==935465== by 0x1391FD: process_hello (isis_pdu.c:781)
==935465== by 0x13BDBE: isis_handle_pdu (isis_pdu.c:1700)
==935465== by 0x13BECD: isis_receive (isis_pdu.c:1744)
==935465== by 0x49210FF: thread_call (thread.c:1585)
==935465== by 0x48CFACB: frr_run (libfrr.c:1099)
==935465== by 0x1218C9: main (isis_main.c:272)
Effectively we are reallocing memory to hold data. realloc does not
set the new memory to anything. So whatever happens to be in the memory
is what is there. after the realloc happens we are iterating over the
memory just realloced and doing memcmp's to values in it causing these
use of uninitialized memory.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Replace all lib/thread cancel macros, use thread_cancel()
everywhere. Only the THREAD_OFF macro and thread_cancel() api are
supported. Also adjust thread_cancel_async() to NULL caller's pointer (if
present).
Signed-off-by: Mark Stapp <mjs@voltanet.io>
==935465== 40 bytes in 1 blocks are definitely lost in loss record 71 of 546
==935465== at 0x483AB65: calloc (vg_replace_malloc.c:760)
==935465== by 0x48D6611: qcalloc (memory.c:110)
==935465== by 0x48CFE02: list_new (linklist.c:32)
==935465== by 0x15DBF0: isis_new (isisd.c:213)
==935465== by 0x15DAC4: isis_global_instance_create (isisd.c:179)
==935465== by 0x121892: main (isis_main.c:264)
==935465== 64 (40 direct, 24 indirect) bytes in 1 blocks are definitely lost in loss record 101 of 546
==935465== at 0x483AB65: calloc (vg_replace_malloc.c:760)
==935465== by 0x48D6611: qcalloc (memory.c:110)
==935465== by 0x48CFE02: list_new (linklist.c:32)
==935465== by 0x15DBE3: isis_new (isisd.c:212)
==935465== by 0x15DAC4: isis_global_instance_create (isisd.c:179)
==935465== by 0x121892: main (isis_main.c:264)
On isis shutdown we are seeing the above memory leaks. Modify
the code to start cleaning this up.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add the "n-flag-clear" option to the "segment-routing prefix"
command. The only thing that option does is to clear the node
flag of the Prefix-SID, even if it corresponds to a local loopback
address. No changes are necessary other than that in order to fully
support Anycast-SIDs. isisd already supports multiple routers
advertising the same route with the same Prefix-SID after the recent
refactoring. Clearing the node flag for such anycast routes isn't
strictly required, but failure to do so can lead to problems like
TI-LFA picking the wrong Prefix-SID when calculating repair paths.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
When computing backup nexthops for routes that contain a Prefix-SID,
the original Prefix-SID label should be present at the end of
backup label stacks (after the repair labels). This commit fixes
that oversight in the original TI-LFA code. The SPF unit tests and
TI-LFA topotes were also updated accordingly.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>