Recent commits moved the default retries to 60, but
the higher ecmp counts were over-riding to 40. Let's
make it 80.
Noticed this when I went looking at failures on 386 platforms
in our ci. Route scale is timing out when deleting routes.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Test fails:
test_func = partial(
topotest.router_json_cmp,
router,
"show ip ospf vrf {0}-ospf-cust1 json".format(rname),
expected,
)
_, diff = topotest.run_and_expect(test_func, None, count=10, wait=0.5)
assertmsg = '"{}" JSON output mismatches'.format(rname)
> assert diff is None, assertmsg
E AssertionError: "r1" JSON output mismatches
E assert Generated JSON diff error report:
E
E > $->r1-ospf-cust1->areas->0.0.0.0->nbrFullAdjacentCounter: output has element with value '1' but in expected it has value '2'
/home/sharpd/frr2/tests/topotests/ospf_netns_vrf/test_ospf_netns_vrf.py:239: AssertionError
Support bundle has this data:
r1# show ip ospf vrf all neighbor
% 2024/08/28 14:55:54.763
VRF Name: r1-ospf-cust1
Neighbor ID Pri State Up Time Dead Time Address Interface RXmtL RqstL DBsmL
10.0.255.3 1 Full/DR 10.547s 39.456s 10.0.3.1 r1-eth1:10.0.3.2 0 0 0
10.0.255.2 1 Full/Backup 0.543s 38.378s 10.0.3.3 r1-eth1:10.0.3.2 1 0 0
So immediately after the test fails this test, the neighbor comes up.
Let's give the test a bit more time for failure to not happen
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Giving only 5 seconds to pass bgp data to peers on a heavily
loaded system is a recipe for not having fun. Add more time.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This test was killing bgp on r1 and r2
and then immediately testing that the
default route transitioned. Unfortunately
the test was written that under load the
system might be in a bad state. Let's
modify the code to check for a bgp version
change and then that the bgp state has
come back up
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
a) A noprefix address by itself should not create a connected route.
This was pre-existing.
b) A noprefix address with a corresponding route should result in a
connected route. This is how NetworkManager appears to work.
This is new behavior, so a new test.
c) A route is added to the system from someone else.
This is new behavior, so a new test.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The dead timer is set to 4 seconds, while the hello interval is set to 6535.
This test will only pass of the platform is fast enough for ospfv3 to
converge in 4 seconds. These timers were already tested multiple time earlier.
This test should just make sure that the boundary value 65535 is configurable,
Other changes in this commit:
- add sequence numbers to the dead intervals tests to make it easier to
track test faliures.
- swap the config order in one test to match order with all other tests.
Signed-off-by: Jafar Al-Gharaibeh <jafar@atcorp.com>
Current code adds a new vlan interface, sets up ospf and
pim on it and immediately starts shoving data down the pipes.
This of course has the fun thing where the IGP and pim do not
always come up in a nice neat manner and the test is looking
for state from a nice neat come up, even though pim is `working`
correctly it is not correct for what the test wants.
Modify the code to ensure that ospf is up and has propagated
the route where it is needed as well as that pim neighbors have
properly come up, then initiate the multicast streams and igmp
reports.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Changes:
- mutini: handle possible missed zombie cleanup leading to test hangs
- mutini: also we avoid logging in the signal handler which was causing
an exception.
Signed-off-by: Christian Hopps <chopps@labn.net>
All routes received by zebra from upper level protocols have a weight
of 1. Let's just make everything extremely consistent in our code.
Lot's of tests needed to be fixed up to make this work.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This test checks the bgp crash on rt2 when 2 commands
launched consequently:
T0: rr, config -> router bgp 65004 -> neighbor 192.168.12.2 password 8888
T1: rt2, snmpwalk -v 2c -c public 127.0.0.1 .1.3.6.1.4.1.7336.4.2.1
T2: test if rt2 bgp is crashed.
Signed-off-by: Dmytro Shytyi <dmytro.shytyi@6wind.com>
Copied the existing "join-group" test and modified to test
static groups instead. Functionally the same but without IGMP
reports.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
Upstream CI is frequently running into a situation where
the routes are not being installed. These routes
start at the beginning and suddenly in the middle
they start working properly.
D 1.0.15.183/32 [150/0] via 192.168.0.1, r1-eth0 inactive, weight 1, 00:10:17
via 192.168.1.1, r1-eth1 inactive, weight 1, 00:10:17
D 1.0.15.184/32 [150/0] via 192.168.0.1, r1-eth0 inactive, weight 1, 00:10:17
via 192.168.1.1, r1-eth1 inactive, weight 1, 00:10:17
D 1.0.15.185/32 [150/0] via 192.168.0.1, r1-eth0 inactive, weight 1, 00:10:17
via 192.168.1.1, r1-eth1 inactive, weight 1, 00:10:17
D>* 1.0.15.186/32 [150/0] via 192.168.0.1, r1-eth0, weight 1, 00:10:17
* via 192.168.1.1, r1-eth1, weight 1, 00:10:17
D>* 1.0.15.187/32 [150/0] via 192.168.0.1, r1-eth0, weight 1, 00:10:17
* via 192.168.1.1, r1-eth1, weight 1, 00:10:17
D>* 1.0.15.188/32 [150/0] via 192.168.0.1, r1-eth0, weight 1, 00:10:17
Turning on some debugs showed that the failed installed routes are
trying to be matched against the default route. Thus implying
all the connected routes for the test are not yet successfully
installed. Let's modify the test(s) on startup to just ensure
that the connected routes are installed correctly. I am no
longer seeing the problem after this change.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The all_protocol_startup topotest needs to allow for some delay
between configuring nexthop-groups and their installation. Add
some wait periods in a couple of nhg test cases.
Signed-off-by: Mark Stapp <mjs@cisco.com>
Vtysh has been improved to startup very quickly this exposed a race in this
test, where the `clear ip rip...` command ran before the test client that
handles it had finished connecting to mgmtd. Add a retried check for the test
client being connected before issuing the `clear ip rip ...` test command.
Signed-off-by: Christian Hopps <chopps@labn.net>
Add some timers to make convergence happan as fast as possible
when a connection fails on the intial attempt.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add some timers to make the convergence happen as fast as possible
when a connection fails on the initial attempt.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When running bfd_bgp_cbit_topo3 and an intial connection
goes wrong, try to connect again as fast as possible as
that the timer is 2 minutes otherwise and the test will
never come back from it.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This test is frequently failing in the upstream CI. Most
log failures are stating that we expected something like
1 million routes but we have 900k+. Looks like the system
is just loaded a bit more than expected. Let's give these
tests a bit more time to complete.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Store a parsed and built graph of the CLI nodes in vtysh, rather than
parsing and building that graph every time vtysh starts up.
This provides a 3x to 5x reduction in vtysh startup overhead:
`vtysh -c 'configure' -c 'interface lo' -c 'do show version'`
- before: 92.9M cycles, 1114 samples
- after: 16.5M cycles, 330 samples
This improvement is particularly visible for users scripting `vtysh -c`
calls, which notably includes topotests.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Modify existing MSDP topology to use test SA filtering:
- Add new multicast host (so we get two sources for same group)
- Test group only filtering
- Test source / group filtering
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
So the test script is making changes to a vpn configuration by
changing something fundamental about the vpn. This is causing
a window where routes we are interested in are:
present ( from pre-change ) then
withdrawn ( the test change causes this ) then
present ( with the new data )
The test code was trying to test for this by checking
to see if the prefix was there, but due to timing issues
it's not always there when we look for it.
Modify the test to get the vpn table version prior to
the change( as that it should not be moving around ) and
then change the test for the prefix to look for a version
that is later than the vpn's table version. Then we know
that it is *after* everything has stabilized again.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
So the test script is making changes to a vpn configuration by
changing something fundamental about the vpn. This is causing
a window where routes we are interested in are:
present ( from pre-change ) then
withdrawn ( the test change causes this ) then
present ( with the new data )
The test code was trying to test for this by checking
to see if the prefix was there, but due to timing issues
it's not always there when we look for it.
Modify the test to get the vpn table version prior to
the change( as that it should not be moving around ) and
then change the test for the prefix to look for a version
that is later than the vpn's table version. Then we know
that it is *after* everything has stabilized again.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>