a) Make timers more aggressive for this test
b) Double run_and_expect time for one sub test.
These two changes cause this test to pass regularly for
me when this test used to fail regularly for me.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Under some circumstances it might happen that the session is quickly UP in the
middle of `clear bgp ...` and `shutdown`. That leads to session be UP, and
the stale routes being cleared quickly.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
These failing tests are hard to track down. Added numbering to each assert
to easily tell where the test fails.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
When enabling "mpls ldp-sync" under "router ospf" ospfd configures
SET_FLAG(ldp_sync_info->flags, LDP_SYNC_FLAG_IF_CONFIG) so internally knowing
that the ldp-sync feature is enabled. However the flag is not cleared when
turning of the feature using "nompls ldp-sync"!
https://github.com/FRRouting/frr/issues/16375
Signed-off-by: Christian Breunig <christian@breunig.cc>
The bgp_duplicate_nexthop test installs routes with nexthop's
flags set to both DUPLICATE and FIB: this should not happen.
The DUPLICATE flag of a nexthop indicates this nexthop is already
used in the same nexthop-group, and there is no need to install it
twice in the system; having the FIB flag set indicates that the
nexthop is installed in the system. This is why both flags should
not be set on the same nexthop.
This case happens at installation time, but can also happen
at update time.
- Fix this by not setting the FIB flag value when the DUPLICATE
flag is present.
- Modify the bgp_duplicate_test to check that the FIB flag is not
present on duplicated nexthops.
- Modify the bgp_peer_type_multipath_relax test.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
During the bgp_peer_type_multipath_relax_test, the test does not
check the 'duplicate' flag value of the duplicate nexthop.
Fix this by adding the duplicate value in the expected json files.
Fixes: ee88563ac2 ("bgpd: Add 'bgp bestpath peer-type multipath-relax'")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
1. The lsp_update_data() function will check for the presence of the ISIS_TLV_DYNAMIC_HOSTNAME in the LSP, and then call isis_dynhn_insert() to add a hostname entry corresponding to the LSP ID. However, when the ISIS_TLV_DYNAMIC_HOSTNAME is not present in the LSP, the hostname entry corresponding to the LSP ID should also be deleted.
2. The command “show isis neighbor” invokes isis_adj_name() to display the System ID or hostname, but it does not check the area->dynhostname flag.
3. When the LSP expires and is removed, the corresponding hostname entry should also be deleted.
4. The TLV for LSP fragmentation will not contain the hostname and should be skipped.
Signed-off-by: zhou-run <zhou.run@h3c.com>
Since the displayed header of "show ip rip" and "show ipv6 ripng" are changed,
we should update tests of ripd and ripngd.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
In some cases (large scale) it's desired to avoid changing configurations, but
let the BGP to automatically handle ASN changes.
`auto` means the peering can be iBGP or eBGP. It will be automatically detected
and adjusted from the OPEN message.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Current code when a link is set down is to just mark the
nexthop group as not properly setup. Leaving situations
where when an interface goes down and show output is
entered we see incorrect state. This is true for anything
that would be checking those flags at that point in time.
Modify the interface down nexthop group code to notice the
nexthops appropriately ( and I mean set the appropriate flags )
and to allow a `show ip route` command to actually display
what is going on with the nexthops.
eva# show ip route 1.0.0.0
Routing entry for 1.0.0.0/32
Known via "sharp", distance 150, metric 0, best
Last update 00:00:06 ago
* 192.168.44.33, via dummy1, weight 1
* 192.168.45.33, via dummy2, weight 1
sharpd@eva:~/frr1$ sudo ip link set dummy2 down
eva# show ip route 1.0.0.0
Routing entry for 1.0.0.0/32
Known via "sharp", distance 150, metric 0, best
Last update 00:00:12 ago
* 192.168.44.33, via dummy1, weight 1
192.168.45.33, via dummy2 inactive, weight 1
Notice now that the 1.0.0.0/32 route now correctly
displays the route for the nexthop group entry.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add a new start option "-K" to libfrr to denote a graceful start,
and use it in zebra and bgpd.
zebra will use this option to denote a planned FRR graceful restart
(supporting only bgpd currently) to wait for a route sync completion
from bgpd before cleaning up old stale routes from the FIB. An optional
timer provides an upper-bounds for this cleanup.
bgpd will use this option to denote either a planned FRR graceful
restart or a bgpd-only graceful restart, and this will drive the BGP
GR restarting router procedures.
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
The current OSPF neighbor retransmission operates on a single per-neighbor
periodic timer that sends all LSAs on the list when it expires.
Additionally, since it skips the first retransmission of received LSAs so
that at least the retransmission interval (resulting in a delay of between
the retransmission interval and twice the interval. In environments where
the links are lossy on P2MP networks with "delay-reflood" configured (which
relies on neighbor retransmission in partial meshs), the implementation
is sub-optimal (to say the least).
This commit reimplements OSPF neighbor retransmission as follows:
1. A new data structure making use the application managed
typesafe.h doubly linked list implements an OSPF LSA
list where each node includes a timestamp.
2. The existing neighbor LS retransmission LSDB data structure
is augmented with a pointer to the list node on the LSA
list to faciliate O(1) removal when the LSA is acknowledged.
3. The neighbor LS retransmission timer is set to the expiration
timer of the LSA at the top of the list.
4. When the timer expires, LSAs are retransmitted that within
the window of the current time and a small delta (50 milli-secs
default). The LSAs that are retransmited are given an updated
retransmission time and moved to the end of the LSA list.
5. Configuration is added to set the "retransmission-window" to a
value other than 50 milliseconds.
6. Neighbor and interface LSA retransmission counters are added
to provide insight into the lossiness of the links. However,
these will increment quickly on non-fully meshed P2MP networks
with "delay-reflood" configured.
7. Added a topotest to exercise the implementation on a non-fully
meshed P2MP network with "delay-reflood" configured. The
alternative was to use existing mechanisms to instroduce loss
but these seem less determistic in a topotest.
Signed-off-by: Acee Lindem <acee@lindem.com>
The atomlist test consists of a sequence of (MT) sub-tests, from which
counters are collected and verified. TSAN doesn't know that these
counters are synchronized by way of the sub-test starting and finishing,
so it complains. Just use atomics to get rid of the warning.
(This is solely an issue with the test, not the atomlist code. There
are no warnings from that.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
TSAN warns about leaving the second thread dangling. Doesn't really
matter, but just add a pthread_join to get rid of the warning.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
seqlock_bump() used to return the value before bumping, but that's
unhelpful if you were to actually need it. I had changed it to return
the value after, but the update to the test got lost at some point.
The return value is not in fact used anywhere in FRR, so while it is
a bug, it has zero impact.
NB: yes, test_seqlock is not run, which sounds wrong. The problem here
is that (a) the test itself uses sleeps and is timing sensitive, which
would raise false positives. And (b), the test is meaningless if
executed once. It needs to be run millions of times under various
conditions (e.g. load) to catch rare races, and it needs to be run on
machines with "odd" memory models (in this case I used BE ppc32 and
ppc64 systems as test platforms.)
Fixes: 6046b690b5 ("lib/seqlock: avoid syscalls in no-waiter cases")
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
The command "show isis topology" calls print_sys_hostname() to display the system ID or hostname, but it does not check the area->dynhostname flag.
Signed-off-by: zhou-run <zhou.run@h3c.com>
Add a topotest that ensures that when addpath is enabled and two
paths with same nexthop are received, they are sent to ZEBRA which
detects 'duplicate nexthop'.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Some tests may want to use the json facility of iproute2 to
dump some results.
Add an internal API in lib/topotest.py that tells whether iproute2
is json capable or not.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Taking over this development from https://github.com/FRRouting/frr/pull/14788
This commit addresses 4 issues found in the previous PR
1) FRR would accept messages from a spoke without authentication when FRR NHRP had auth configured.
2) The error indication was not being sent in network byte order
3) The debug print in nhrp_connection_authorized was not correctly printing the received password
4) The addresses portion of the mandatory part of the error indication was invalid on the wire (confirmed in wireshark)
Signed-off-by: Dave LeRoy <dleroy@labn.net>
Co-authored-by: Volodymyr Huti <volodymyr.huti@gmail.com>
The isis_tilfa_topo1 topotest is comprehensive and contains a large
amount of reference data. One problem is that, when changes occur,
updating this reference data can be difficult.
To address this problem, this commit introduces a method to
automatically regenerate the reference data by setting the `REGEN_DATA`
environment variable.
Usage:
$ REGEN_DATA=true python3 ./test_isis_tilfa_topo1.py
When `REGEN_DATA` is set, the topotest regenerates reference data
from the current run instead of comparing against existing reference
data. Note that regenerated data must be manually verified for
correctness.
This commit also simplifies the reference data by replacing all diff
files with complete JSON snapshots.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
In this topotest, steps 10-15 were added to test the IS-IS switchover
functionality. In short, two cases were tested: switchover after a
link down event and switchover after a BFD down event. Both cases
were tested in sequence on the same router, rt6. This involved the
following steps:
- Setting the SPF delay timer to 15 seconds
- Shutting down the eth-rt5 interface from the switch side
- Testing the post-switchover RIB and LIB (triggered by the link down
event)
- Testing the post-SPF RIB and LIB
- Bringing the eth-rt5 interface back up
- Configuring a BFD session between rt6 and rt5
- Shutting down the eth-rt5 interface from the switch side once again
- Testing the post-switchover RIB and LIB (triggered by the BFD down
event)
- Testing the post-SPF RIB and LIB
Since the time window to test the post-switchover RIB and LIB was too
narrow (10 seconds), these tests were having sporadic failures.
To resolve this problem, we can simplify the switchover test as follows:
- Setting the SPF delay timer to 60 seconds (not 15)
- Disabling "link-detect" on rt6's eth-rt5 interface
- Shutting down the eth-rt5 interface from the switch side
- On rt6, testing the post-switchover RIB and LIB (triggered by the
BFD down event)
- On rt5, testing the post-switchover RIB and LIB (triggered by the
link down event)
Notice how we can test both post-link-down and post-BFD-down switchover
cases simultaneously by having different "link-detect" configurations
on rt5 and rt6. Additionally, by using a larger SPF delay timer, the
time window to test the post-switchover RIB and LIB is much larger
and less prone to sporadic failures.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Check that "show ip route vrf XXX json" and the JSON at key "XXX" of
"show ip route vrf all json" gives the same output.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
- `darr_free_free` to `darr_free` each element prior to `darr_free`
the array.
- `darr_free_func` to call `func` on each element prior to `darr_free`
the array.
Signed-off-by: Christian Hopps <chopps@labn.net>
Topotest relevant changes:
- add support for `timeout` arg to `cmd_*()`
- handle invalid regexp in CLI commands
- fix long interface name support
Full munet changelog:
munet: 0.14.9: add support for `timeout` arg to `cmd_*()`
munet: 0.14.8: cleanup the cleanup (kill) on launch options
munet: 0.14.7: allow multiple extra commands for shell console init
munet: 0.14.6:
- qemu: gather gcda files where munet can find them
- handle invalid regexp in CLI commands
munet: 0.14.5:
- (podman) pull missing images for containers
- fix long interface name support
- add another router example
munet: 0.14.4: mutest: add color to PASS/FAIL indicators on tty consoles
munet: 0.14.3: Add hostnet node that runs it's commands in the host network namespace.
munet: 0.14.2:
- always fail mutest tests on bad json inputs
- improve ssh-remote for common use-case of connecting to host connected devices
- fix ready-cmd for python v3.11+
munet: 0.14.1: Improved host interface support.
Signed-off-by: Christian Hopps <chopps@labn.net>
The test is done on r2. A BGP update is received on r2, and is
filtered on r2. The RIB of r2 does not have the BGP update stored,
but the ADJ-RIB-IN is yet present. To demonstrate this, if the
inbound route-map is removed, then the BGP update should be copied
from the the ADJ-RIB-IN and added to the RIB with the label
value.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
This test ensures that when r1 changes the label value, then
the new value is automatically propagated to remote peer.
This demonstrates that the ADJ-RIB-OUT to r2 has been correctly
updated.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Only output requested information to stdout so it can be
filtered and captured in shell variables etc...
Signed-off-by: Christian Hopps <chopps@labn.net>
Contains 2 testcases. The first does a basic configuration/connectivity.
The second testcase initiates a shortcut through the primary NHS,
verifies shortcut routes are installed. Primary NHS interface brought
down and verify that the shortcut is not impacted. Finally verify that
after the shortcut expires, it is able to be re-established via a backup
NHS.
Signed-off-by: dleroy <dleroy@labn.net>
Locally this test would occassionally fail for me
because the connected route the sharp route being
installed has not fully come up yet due to heavy
load and start up slowness. Add a bit of code
to look for the problem and make sure it doesn't
happen.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>