Since LSP fragments are also on our lspdb dict, lsp_tick() needs to skip
over them after calling lsp_destroy(). Otherwise it ends up accessing
freed memory.
Fixes: #3533
Signed-off-by: David Lamparter <equinox@diac24.net>
When receiving an LSP with the same sequence number as the copy in
the local database but a different checksum, we would always treat
it as newer than the local LSP.
That behavior is incorrect if the local LSP is a purged LSP waiting
for age-out and the received one is not.
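
A minimal sketch of the comparison described above, not isisd's actual
code. The struct, the enum and `lsp_compare_sketch` are hypothetical
names, and a remaining lifetime of 0 is assumed to mark a purged LSP.
With equal sequence numbers, a checksum mismatch only makes the
received LSP newer when the local copy is not a purge:

```c
#include <stdint.h>

/* Hypothetical, simplified LSP header; not isisd's actual structure. */
struct lsp_hdr_sketch {
	uint32_t seqno;
	uint16_t checksum;
	uint16_t rem_lifetime; /* assumption: 0 means purged, awaiting age-out */
};

enum lsp_cmp_sketch { LSP_SKETCH_EQUAL, LSP_SKETCH_NEWER, LSP_SKETCH_OLDER };

/* Classify a received LSP relative to the local database copy. */
static enum lsp_cmp_sketch
lsp_compare_sketch(const struct lsp_hdr_sketch *local,
		   const struct lsp_hdr_sketch *rcvd)
{
	int local_purged = (local->rem_lifetime == 0);
	int rcvd_purged = (rcvd->rem_lifetime == 0);

	if (rcvd->seqno == local->seqno && rcvd->checksum == local->checksum
	    && local_purged == rcvd_purged)
		return LSP_SKETCH_EQUAL;

	if (rcvd->seqno > local->seqno)
		return LSP_SKETCH_NEWER;

	if (rcvd->seqno == local->seqno && !local_purged) {
		/* A purge of a still-live local LSP wins. */
		if (rcvd_purged)
			return LSP_SKETCH_NEWER;
		/* Checksum mismatch only counts against a live local LSP. */
		if (rcvd->checksum != local->checksum)
			return LSP_SKETCH_NEWER;
	}

	/* Lower seqno, or same seqno while the local copy is a purge. */
	return LSP_SKETCH_OLDER;
}
```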
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
Purged fragments would always be reoriginated by isisd. They
should only be purged once and never be reoriginated.
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
When `first` would be initialized to the same value as `last`, the
function would return incorrect results.
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
It turns out 50ms is actually too short to aggregate all changes
in some cases, so allow for 100ms.
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
When there is a stream of events coming in, where IS-IS learns
about a lot of updates, IS-IS would regenerate its LSPs before
the updates have been processed completely.
This causes suboptimal convergence because the intermediate state
will be flooded. A new update with the correct and final state is
generated only after the configured `lsp_gen_interval` has elapsed.
Resolve this by holding off LSP generation while there are still
events coming in.
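
The hold-off can be pictured as a quiet-period check, sketched here
under assumptions: `regen_holdoff`, `holdoff_event` and
`holdoff_should_generate` are hypothetical names, time is passed in
explicitly as milliseconds, and the length of the quiet window is a
parameter rather than a value taken from isisd:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hold-off state; isisd uses its own timer infrastructure. */
struct regen_holdoff {
	bool pending;          /* a regeneration has been requested */
	int64_t last_event_ms; /* time of the most recent event */
	int64_t quiet_ms;      /* assumed quiet window before generating */
};

/* Record an incoming event; every event restarts the quiet window. */
static void holdoff_event(struct regen_holdoff *h, int64_t now_ms)
{
	h->pending = true;
	h->last_event_ms = now_ms;
}

/* Return true once no event has arrived for quiet_ms. */
static bool holdoff_should_generate(struct regen_holdoff *h, int64_t now_ms)
{
	if (!h->pending)
		return false;
	if (now_ms - h->last_event_ms < h->quiet_ms)
		return false;
	h->pending = false; /* generate once per burst of events */
	return true;
}
```

An event arriving inside the window pushes generation back, so only
the state after the last event of a burst is flooded.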
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
lsp_l1_refresh and lsp_l2_refresh are identical apart from the
hardcoded IS-IS level they are referring to. So merge them and
pass the level as part of the argument.
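
A rough sketch of the merged shape, with hypothetical names
(`area_sketch`, `lsp_refresh_arg_sketch`, `lsp_refresh_sketch`) and a
counter standing in for the real refresh work. The level travels in
the callback argument instead of being hardcoded in two functions:

```c
/* Hypothetical stand-in for an IS-IS area. */
struct area_sketch {
	const char *tag;
	unsigned refreshes[2]; /* index 0: level 1, index 1: level 2 */
};

/* Argument bundle handed to the single refresh callback. */
struct lsp_refresh_arg_sketch {
	struct area_sketch *area;
	int level; /* 1 or 2 */
};

/* One callback serves both levels; returns -1 on an invalid level. */
static int lsp_refresh_sketch(void *data)
{
	struct lsp_refresh_arg_sketch *arg = data;

	if (arg->level < 1 || arg->level > 2)
		return -1;

	/* Placeholder for the actual per-level refresh work. */
	arg->area->refreshes[arg->level - 1]++;
	return 0;
}
```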
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
For debugging the timing of LSP generation, it is useful to know
which event caused a regeneration to be scheduled. Therefore, add
this information to the debug log.
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
IS-IS would ignore any area lsp-mtu setting configured after initial
creation of the LSP since the move to the new TLV serializer/deserializer.
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
isisd would crash when LSP fragments aged out: they were freed
correctly, but were not removed from LSP0's linked list of fragments.
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
Implement RFC 6232, optionally allowing isisd to flood its NET and
hostname in the purges it originates.
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
Regular IS-IS will flood any LSP updates out to all circuits except the
one on which they were received. This is done in `lsp_flood`.
Change `lsp_flood` for fabricd to use the optimized flooding algorithm
instead.
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
Before this commit, isisd/fabricd maintained a bitfield for each LSP
to track the SRM bit for each circuit, which specifies whether an LSP
needs to be sent on that circuit. Every second, it would scan over all
LSPs in `lsp_tick` and queue them up for transmission accordingly.
This design has two drawbacks: a) it scales poorly, and b) it adds
unacceptable latency to the update process: each router takes a random
amount of time between 0 and 1 second to forward an update. In a
network with a diameter of 10, it might already take 10 seconds for an
update to traverse the network.
To mitigate this, a new design was chosen. Instead of tracking SRM in a
bitfield, have one tx_queue per circuit and declare that an LSP is in
that queue if and only if it would have SRM set for that circuit.
This way, we can track SRM much as we did before. However, on
insertion into the LSP queue, we can add a timer for (re)transmission,
which removes the need for a periodic scan in `lsp_tick` and reduces
the latency of forwarding updates.
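
A minimal sketch of the idea, not the real tx_queue. The fixed-size
array, the integer `lsp_id` and all `txq_*` names are assumptions made
for brevity. Membership in the queue stands in for the SRM bit, and
each entry carries its own send time so nothing has to be scanned once
per second:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TXQ_SKETCH_CAP 8 /* arbitrary capacity for this sketch */

struct txq_entry_sketch {
	bool used;
	uint32_t lsp_id;    /* stand-in for a reference to the LSP */
	int64_t send_at_ms; /* when the (re)transmission timer fires */
};

/* One queue per circuit. */
struct txq_sketch {
	struct txq_entry_sketch entries[TXQ_SKETCH_CAP];
};

static struct txq_entry_sketch *txq_find(struct txq_sketch *q, uint32_t lsp_id)
{
	for (size_t i = 0; i < TXQ_SKETCH_CAP; i++)
		if (q->entries[i].used && q->entries[i].lsp_id == lsp_id)
			return &q->entries[i];
	return NULL;
}

/* "SRM is set for this circuit" means "the LSP is in this queue". */
static bool txq_has(struct txq_sketch *q, uint32_t lsp_id)
{
	return txq_find(q, lsp_id) != NULL;
}

/* Set SRM: enqueue with a send time. Already queued LSPs keep their timer. */
static bool txq_add(struct txq_sketch *q, uint32_t lsp_id, int64_t send_at_ms)
{
	if (txq_find(q, lsp_id))
		return true;
	for (size_t i = 0; i < TXQ_SKETCH_CAP; i++) {
		if (!q->entries[i].used) {
			q->entries[i].used = true;
			q->entries[i].lsp_id = lsp_id;
			q->entries[i].send_at_ms = send_at_ms;
			return true;
		}
	}
	return false; /* queue full */
}

/* Clear SRM: remove the LSP from the queue if present. */
static void txq_del(struct txq_sketch *q, uint32_t lsp_id)
{
	struct txq_entry_sketch *e = txq_find(q, lsp_id);

	if (e)
		e->used = false;
}

/* Collect up to out_cap LSP ids whose timer has expired; return the count. */
static size_t txq_due(struct txq_sketch *q, int64_t now_ms, uint32_t *out,
		      size_t out_cap)
{
	size_t n = 0;

	for (size_t i = 0; i < TXQ_SKETCH_CAP && n < out_cap; i++)
		if (q->entries[i].used && q->entries[i].send_at_ms <= now_ms)
			out[n++] = q->entries[i].lsp_id;
	return n;
}
```

A real implementation would arm a timer per entry instead of polling
`txq_due`; the polling helper only keeps the sketch free of a timer
library.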
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
To avoid passing traffic through leaf nodes in the fabric, OpenFabric
specifies that all links towards tier 0 nodes should be advertised with
a very high metric.
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
While OpenFabric calculates most tier numbers automatically by the
fabric locality calculation algorithm, that algorithm requires two
systems to be manually configured as tier 0, so it has reference points.
Also, completely manual configuration is possible.
To support this, introduce appropriate CLI commands and flood the
configured information.
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
OpenFabric changes IS-IS's initial database synchronization. While
regular IS-IS will simultaneously exchange LSPs with all neighboring
routers during startup, this is considered too much churn for a densely
connected fabric.
To mitigate this, OpenFabric prescribes that a router should only
bring up an adjacency with a single neighbor and perform a full
synchronization with that neighbor, before bringing up further
adjacencies.
This is implemented by having a field `initial_sync_state` in the
fabricd data structure which tracks whether an initial sync is still
pending, currently in progress, or complete.
When an initial sync is pending, the state will transition to the
in-progress state when the first IIH is received.
During this state, all IIHs from other routers are ignored. Any
IIHs transmitted on any link other than the one to the router with
which we are performing the initial sync will always report the far
end as DOWN in their threeway handshake state, avoiding the formation of
additional adjacencies.
The in-progress state is left once all the SRM and SSN flags on the
initial-sync circuit are cleared (meaning that initial sync has
completed). This is checked in `lsp_tick`. When this condition occurs,
we progress to the initial-sync-complete state, allowing other
adjacencies to form.
The state can also be left if the initial synchronization is taking too
long to succeed, for whatever reason. In that case, we fall back to the
initial-sync-pending state and will reattempt initial synchronization
with a different neighbor.
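
The transitions above can be sketched as a small state machine. This
is an illustration under assumptions, not the fabricd code: the enum
values, `fabricd_sketch` and both functions are hypothetical names,
neighbors are identified by a circuit number, and the timeout is a
parameter because no value is given here:

```c
#include <stdbool.h>
#include <stdint.h>

enum initial_sync_state_sketch {
	SYNC_SKETCH_PENDING,
	SYNC_SKETCH_STARTED,
	SYNC_SKETCH_COMPLETE,
};

struct fabricd_sketch {
	enum initial_sync_state_sketch initial_sync_state;
	int sync_circuit;        /* circuit used for the initial sync */
	int64_t sync_start_ms;   /* when the initial sync began */
	int64_t sync_timeout_ms; /* assumed give-up threshold */
};

/* Handle a received IIH; return true if it should be processed. */
static bool fabricd_iih_sketch(struct fabricd_sketch *f, int circuit,
			       int64_t now_ms)
{
	switch (f->initial_sync_state) {
	case SYNC_SKETCH_PENDING:
		/* The first IIH selects the initial-sync neighbor. */
		f->initial_sync_state = SYNC_SKETCH_STARTED;
		f->sync_circuit = circuit;
		f->sync_start_ms = now_ms;
		return true;
	case SYNC_SKETCH_STARTED:
		/* IIHs from other routers are ignored while syncing. */
		return circuit == f->sync_circuit;
	case SYNC_SKETCH_COMPLETE:
		return true;
	}
	return true;
}

/* Periodic check: flags_clear means no SRM/SSN left on the sync circuit. */
static void fabricd_tick_sketch(struct fabricd_sketch *f, bool flags_clear,
				int64_t now_ms)
{
	if (f->initial_sync_state != SYNC_SKETCH_STARTED)
		return;
	if (flags_clear)
		f->initial_sync_state = SYNC_SKETCH_COMPLETE;
	else if (now_ms - f->sync_start_ms >= f->sync_timeout_ms)
		f->initial_sync_state = SYNC_SKETCH_PENDING; /* retry later */
}
```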
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
This correction fixes two bugs detected by Clang scan:
Bug Group: Dead store
Bug Type: Dead assignment
File: zebra/kernel_netlink.c
Function: netlink_parse_extended_ack
Line: 548
Bug Type: Dead increment
File: isisd/isis_lsp.c
Function: lsp_bits2string
Line: 625
Signed-off-by: F. Aragon <paco@voltanet.io>
The following types are nonstandard:
- u_char
- u_short
- u_int
- u_long
- u_int8_t
- u_int16_t
- u_int32_t
Replace them with the C99 standard types:
- uint8_t
- unsigned short
- unsigned int
- unsigned long
- uint8_t
- uint16_t
- uint32_t
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
None of these variables can actually be used before being initialized,
but unfortunately some old compilers are not smart enough to detect that.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Add timestamp information for level 2 circuits; otherwise, if the
circuit is marked as already processed on level 1, we will not process
level 2 areas.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Convert the list_delete(struct list *) function to use
struct list **. This is to allow the list pointer to be nulled.
I keep running into uses of this list_delete function where we
forget to set the returned pointer to NULL and attempt to use
it and then experience a crash, usually after the developer
has long since left the building.
Let's make the API explicit about setting the list pointer
to NULL.
Cynical Prediction: This code will expose an attempt
to use the NULL'ed list pointer in some obscure bit
of code.
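
The shape of the API change, as a self-contained sketch. The
`*_sketch` types and functions are hypothetical stand-ins for the real
list implementation; only the pointer-to-pointer signature reflects
the change described above:

```c
#include <stdlib.h>

/* Hypothetical minimal singly linked list. */
struct listnode_sketch {
	struct listnode_sketch *next;
	void *data;
};

struct list_sketch {
	struct listnode_sketch *head;
};

static struct list_sketch *list_new_sketch(void)
{
	return calloc(1, sizeof(struct list_sketch));
}

/* Prepend data to the list; returns -1 if allocation fails. */
static int list_push_sketch(struct list_sketch *l, void *data)
{
	struct listnode_sketch *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->data = data;
	n->next = l->head;
	l->head = n;
	return 0;
}

/* Free the list and NULL the caller's pointer so stale use is obvious. */
static void list_delete_sketch(struct list_sketch **lp)
{
	struct listnode_sketch *n, *next;

	if (!lp || !*lp)
		return;
	for (n = (*lp)->head; n; n = next) {
		next = n->next;
		free(n); /* node data is owned by the caller in this sketch */
	}
	free(*lp);
	*lp = NULL;
}
```

Because the caller's pointer is NULL after the call, a second
`list_delete_sketch(&l)` is a harmless no-op instead of a double free.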
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>