RB-tree and doubly-linked list easily support backwards iteration, and
a use case seems to have popped up. Let's make it accessible.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Do not allow the test system to turn off the logging of commands
Some tests use the reload command, which is accidentally turning off
the logging. Just force the tests to ignore it.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Test ospf running with 3 vrfs: default, neno, ray
Route leaking is set up via bgp between the default and neno vrfs
Leaked routes include connected and ospf
Included tests:
1- OSPF convergence
2- zebra/kernel routes
Signed-off-by: Jafar Al-Gharaibeh <jafar@atcorp.com>
This PR adds support for configuring topotest routers using a single file.
instead of:
```
router.load_config(
TopoRouter.RD_ZEBRA, os.path.join(CWD, "{}/zebra.conf".format(rname))
)
router.load_config(
TopoRouter.RD_OSPF, os.path.join(CWD, "{}/ospfd.conf".format(rname))
)
router.load_config(
TopoRouter.RD_BGP, os.path.join(CWD, "{}/bgpd.conf".format(rname))
)
```
you can now do:
```
router.load_frr_config(
os.path.join(CWD, "{}/frr.conf".format(rname)),
[TopoRouter.RD_ZEBRA, TopoRouter.RD_OSPF, TopoRouter.RD_BGP]
)
```
or just:
```
router.load_frr_config(os.path.join(CWD, "{}/frr.conf".format(rname)))
```
In this latter case, the daemons list will be inferred from the frr.conf file.
Signed-off-by: Jafar Al-Gharaibeh <jafar@atcorp.com>
When running a topotest with the --shell or --vtysh argument, the
window titles of the routers are generic.
Set the router name as the title to correctly identify each window.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Opening a new tab in screen is not possible when using the --vtysh or
--shell option; it fails with 'No such file or directory'.
Fix the issue.
Fixes: 6a5433ef0b ("tests: NEW micronet replacement for mininet")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Noticed that pylint was complaining about some easily
fixable stuff in test_route_map_topo1.py, so let's clean
it up some.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Have added a topotest to verify the combinations below:
- Auth support for md5
- Auth support for hmac-sha-256
- Auth support with keychain for md5
- Auth support with keychain for hmac-sha-256
Have successfully run all 4 test cases in my local setup.
Signed-off-by: Abhinay Ramesh <rabhinay@vmware.com>
isis_tlvs.c would fail at multiple places if incorrect TLVs were
received causing stream assertion violations.
This patch fixes the issues by adding missing length checks, missing
consumed length updates and handling malformed Segment Routing subTLVs.
Signed-off-by: Juraj Vijtiuk <juraj.vijtiuk@sartura.hr>
Small adjustments by Igor Ryzhov:
- fix incorrect replacement of srgb by srlb on lines 3052 and 3054
- add length check for ISIS_SUBTLV_ALGORITHM
- fix conflict in fuzzing data during rebase
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Add a new topotest for the Constrained Shortest Path First (CSPF) algorithm.
This topotest uses IS-IS-TE as base network to populate a Traffic Engineering
Database (TED) and sharpd to call cspf algorithms on this IS-IS-TE topology.
Signed-off-by: Olivier Dugeon <olivier.dugeon@orange.com>
When link-param is enabled for a given interface, the TE metric is automatically
set to the metric of the interface. However, the metric of the interface
could be unassigned and keep the default value of 0. Thus, if the TE
metric is not explicitly modified within the `link-param metric` statement,
the TE metric remains set to 0, which is not a valid value, especially when
computing a constrained path.
This patch changes the assignment of the default value of the TE metric:
it is set to the metric of the interface only if the latter is not equal to 0.
TE topotests for OSPF and IS-IS have been adjusted accordingly.
Signed-off-by: Olivier Dugeon <olivier.dugeon@orange.com>
Test the ability to use the following configure command with a Y value:
no neighbor X.X.X.X maximum-prefix-out Y
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Introduces a topotest to validate proper AS-Path manipulation when using
"neighbor ... remove-private-AS".
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
Opaque data takes up a lot of memory when there are a lot of routes on
the box. Given that this is just cosmetic info, I propose to disable
it by default so as not to shock people who start using FRR for the first
time or upgrade from an old version.
Fixes #10101.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
VRF name should not be printed in the config since 574445ec. The update
was done for NB config output but I missed it for regular vty output.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
The current maximum-prefix-out topotest starts with a
maximum-prefix-out already configured.
Test the application of a new maximum-prefix-out value without clearing
the neighbor.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Redistribution between ospf instances using instance id's
was incorrect. Add some small tests to make sure we catch the
issues and don't regress.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This just tries logging messages in random ways to allow the fuzzer to
do its thing and try to find weird edge cases.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
The test case test_PIM_hello_tx_rx_p1 is failing randomly because
sometimes the hello packet has been received and sometimes not by the
time the stats data is fetched.
When the hello packet is received, HelloRx gets incremented to 1; then
shutdown of the interface is executed, which resets the stats to 0,
and when "no shutdown" of the interface is done, the stats get incremented to 1 again.
The test case checks, after "no shutdown" of the interface, whether the stats have incremented,
but in this case, although the stats did get incremented, the before and after values are the same.
Hence the test case failed.
Add correct expectations to the test case.
Signed-off-by: Mobashshera Rasool <mrasool@vmware.com>
Adding an `s` after these printfrr specifiers replaces 0.0.0.0 / :: in
the output with a star (`*`). This is primarily intended for use with
multicast, e.g. to print `(*,G)`.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Since this is only used in very few places, moving it out of the way is
reasonable. (`%pSG` will be pim_sgaddr)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
The test is doing this:
a) gather interface data about packets sent
b) shut interface
c) no shut interface
d) gather interface data about packets sent
e) compare a to d and fail if packets sent/received has not incremented
The problem is, of course, that under heavy system load sufficient time
might not have passed for packets to be sent between c and d. Allow up to
35 seconds of looking for the packet counters to increment, else heavily
loaded systems may never show that data is being sent.
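A minimal sketch of that polling pattern, assuming a `tgen` fixture, a `before_stats` dict gathered at step a), and a hypothetical `get_pim_interface_traffic` helper:
```
from functools import partial
from lib import topotest

def _traffic_incremented(before):
    # Re-fetch the counters; return an error string until every
    # interface shows more hellos than before, None on success so
    # run_and_expect() knows when to stop polling.
    after = get_pim_interface_traffic(tgen, topo)  # hypothetical helper
    for intf, stats in before.items():
        if after[intf]["helloTx"] <= stats["helloTx"]:
            return "helloTx not incremented on {}".format(intf)
    return None

# Poll for up to ~35 seconds (70 tries x 0.5s) instead of checking once.
test_func = partial(_traffic_incremented, before_stats)
_, result = topotest.run_and_expect(test_func, None, count=70, wait=0.5)
assert result is None, result
```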
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
verify_pim_interface_traffic *fetches* the pim
traffic data. Rename the function to reflect what it
actually does.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The nhrp_topo test sets up some infrastructure and
was incorrectly displaying the commands it was running.
Fix this.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When used with LLGR, setting the GR restart-time timer to 0 should be
allowed, to immediately start the LLGR timers.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
The following subcodes are defined for the Cease NOTIFICATION
message:
Subcode Symbolic Name
1 Maximum Number of Prefixes Reached
2 Administrative Shutdown
3 Peer De-configured
4 Administrative Reset
5 Connection Rejected
6 Other Configuration Change
7 Connection Collision Resolution
8 Out of Resources
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Currently, it is possible to rename the default VRF either by passing
the `-o` option to zebra or by creating a file in `/var/run/netns` and
binding it to `/proc/self/ns/net`.
In both cases, only zebra knows about the rename and other daemons learn
about it only after they connect to zebra. This is a problem, because
daemons may read their config before they connect to zebra. To handle
this rename after the config is read, we have some special code in every
single daemon, which is not very bad but not desirable in my opinion.
But things get worse when we need to handle this in the northbound
layer, as we have to manually rewrite the config nodes. This approach is
already hacky, but still works as every daemon handles its own NB
structures. But it is completely incompatible with the central
management daemon architecture we are aiming for, as mgmtd doesn't even
have a connection with zebra to learn from it. And it shouldn't have it,
because operational state changes should never affect configuration.
To solve the problem and simplify the code, I propose to expand the `-o`
option to all daemons. By using the startup option, we let daemons know
about the rename before they read their configs so we don't need any
special code to deal with it. There's an easy way to pass the option to
all daemons by using the `frr_global_options` variable.
Unfortunately, the second way of renaming by creating a file in
`/var/run/netns` is incompatible with the new mgmtd architecture.
Theoretically, we could force daemons to read their configs only after
they connect to zebra, but it means adding even more code to handle a
very specific use-case. And anyway this won't work for mgmtd as it
doesn't have a connection with zebra. So I had to remove this option.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Currently the Wait for Install code ( bgp_suppress_fib ) does
not properly handle two states from zebra: ROUTE_INSTALL_FAILED
and BETTER_ADMIN_DISTANCE_WON. Prior to this change the WFI code
would just never notify our peers about a route install failure,
but more is needed. In both the ROUTE_INSTALL_FAILED and
BETTER_ADMIN_DISTANCE_WON cases we need to notify our peers with
a withdrawal of the route, else we will continue to
draw traffic to us when we cannot legally do so.
Why is this needed? In either case imagine that we've already
received a bgp route, installed it and sent it to our peers.
In the better admin distance won case, say a static route is installed;
at this point in time we must stop advertising the route through
us since we are not installed. As such a withdrawal must be sent.
In the ROUTE_INSTALL_FAILED case, the code was not properly handling
the situation where we have Route A, it was successfully installed,
and then we received an update to Route A that was attempted to be
installed but failed. In this case we also need to send a withdrawal.
Finally update the bgp_suppress_fib topotest to test both of these
situations.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
There still existed chances that best path consideration
had not taken place for both bgp_l3vpn_to_bgp_vrf and
bgp_instance_del_test ( since they both use the same
check_routes.py scripting ). Add some more checks
to ensure that we have all the data. Prior to this
change I could see one of these two tests failing
every 2-3 runs on my test system. I am not seeing
this anymore after ~5 complete test runs.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
OSPF, when converging, will choose a DR / Backup DR based upon
who has already come up, irrespective of priority. As such, if
under system load OSPF comes up first and elects a DR that under
normal circumstances would not be the elected one due to priority,
OSPF does not go back through and re-elect, in order to keep the
system stable. Tests are experiencing this:
unet> r0 show ip ospf neigh
Neighbor ID Pri State Up Time Dead Time Address Interface RXmtL RqstL DBsmL
100.1.1.1 99 Full/Backup 4m14s 3.780s 10.0.1.2 r0-s1-eth0:10.0.1.1 0 0 0
100.1.1.2 0 Full/DROther 4m14s 3.848s 10.0.1.3 r0-s1-eth0:10.0.1.1 0 0 0
100.1.1.3 0 Full/DROther 4m14s 3.912s 10.0.1.4 r0-s1-eth0:10.0.1.1 0 0 0
unet> r1 show ip ospf neigh
Neighbor ID Pri State Up Time Dead Time Address Interface RXmtL RqstL DBsmL
100.1.1.0 98 Full/DR 4m15s 3.011s 10.0.1.1 r1-s1-eth1:10.0.1.2 0 0 0
100.1.1.2 0 Full/DROther 4m19s 3.124s 10.0.1.3 r1-s1-eth1:10.0.1.2 0 0 0
100.1.1.3 0 Full/DROther 4m19s 3.188s 10.0.1.4 r1-s1-eth1:10.0.1.2 0 0 0
unet> r2 show ip ospf neigh
Neighbor ID Pri State Up Time Dead Time Address Interface RXmtL RqstL DBsmL
100.1.1.0 98 Full/DR 4m27s 3.483s 10.0.1.1 r2-s1-eth0:10.0.1.3 0 0 0
100.1.1.1 99 Full/Backup 4m32s 3.527s 10.0.1.2 r2-s1-eth0:10.0.1.3 0 0 0
100.1.1.3 0 2-Way/DROther 4m32s 3.660s 10.0.1.4 r2-s1-eth0:10.0.1.3 0 0 0
unet> r3 show ip ospf neigh
Neighbor ID Pri State Up Time Dead Time Address Interface RXmtL RqstL DBsmL
100.1.1.0 98 Full/DR 4m55s 3.786s 10.0.1.1 r3-s1-eth1:10.0.1.4 0 0 0
100.1.1.1 99 Full/Backup 4m55s 3.829s 10.0.1.2 r3-s1-eth1:10.0.1.4 0 0 0
100.1.1.2 0 2-Way/DROther 4m54s 3.897s 10.0.1.3 r3-s1-eth1:10.0.1.4 0 0 0
Modify the test to do a clear to enforce the order we are specifically looking for.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
- Add advertisement of Global IPv6 address in IIH pdu
- Add new CLI to set IPv6 Router ID
- Add advertisement of IPv6 Router ID
- Correctly advertise IPv6 local and neighbor addresses in Extended IS and MT
Reachability TLVs
- Correct output of Neighbor IPv6 address in 'show isis database detail'
- Manage IPv6 address advertisement and the corresponding Adjacency SID when
IS-IS is not using Multi-Topology by introducing a new ISIS_MT_DISABLE
value for mtid (== 4096, i.e. first reserved flag set to 1)
Signed-off-by: Olivier Dugeon <olivier.dugeon@orange.com>
Lots of the GR topotests kill daemons in order to test code
that deals with crashing daemons. Under heavy system load
it was noticed that a kill command was sent and, if told to
wait, we would sleep 2 seconds, send another kill command and
call it good. This was causing issues when subsequent
json commands would get errors like `lost connection to daemon`
as the daemon finally shut down after some time due to load.
Modify the kill-the-daemon function to notice that the daemon
was not actually killed and, if we need to wait, wait some
more time for it to happen.
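A minimal sketch of the adjusted wait logic, using plain os/signal calls rather than the actual topotest helper:
```
import os
import signal
import time

def kill_daemon(pid, wait=True, timeout=30):
    # Send the kill, then poll for the pid to actually disappear
    # instead of assuming a 2 second sleep is always enough.
    os.kill(pid, signal.SIGTERM)
    if not wait:
        return
    for _ in range(timeout):
        try:
            os.kill(pid, 0)  # signal 0 only probes; raises once pid is gone
        except OSError:
            return  # daemon has fully exited
        time.sleep(1)
    os.kill(pid, signal.SIGKILL)  # still alive under extreme load; force it
```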
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Currently, under system load, tests that use verify_pim_interface_traffic
immediately after an interface down/up event are not giving any time
for pim to receive and process the data from that event. Give
the test some time to gather this data.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Under heavy system load, we are sometimes seeing this
output for addKernelRoute:
2021-11-28 16:17:27,604 INFO: topolog: [DUT: b1]: Running command: [ip route add 224.0.0.13 dev b1-f1-eth0]
2021-11-28 16:17:27,604 DEBUG: topolog.b1: LinuxNamespace(b1): cmd_status("['/bin/bash', '-c', 'ip route add 224.0.0.13 dev b1-f1-eth0']", kwargs: {'encoding': 'utf-8', 'stdout': -1, 'stderr': -2, 'shell': False, 'stdin': None})
2021-11-28 16:17:27,967 DEBUG: topolog.b1: LinuxNamespace(b1): cmd_status("['/bin/bash', '-c', 'ip route']", kwargs: {'encoding': 'utf-8', 'stdout': -1, 'stderr': -2, 'shell': False, 'stdin': None})
2021-11-28 16:17:28,243 DEBUG: topolog: ip route
70.0.0.0/24 dev b1-f1-eth0 proto kernel scope link src 70.0.0.1
This tells us that the ip route add succeeded but, when looking for it,
the system failed to immediately find it. Why is this happening?
Probably we are under heavy system load and the two different
commands, 'ip route add..' and 'ip route show', are being executed
on different cpus and the data has not been copied over to the other
cpu yet in the kernel. This is not necessarily something normally
seen, but entirely possible. Giving the system a few extra seconds
for the kernel to execute/work the memory barrier system seems
prudent for the long term success of our programming.
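A minimal sketch of the retry, assuming a topotest `router` gear for b1 (not the actual addKernelRoute code):
```
from lib import topotest

def _kernel_route_present():
    # Re-read the kernel table until the just-added route shows up.
    return "224.0.0.13" in router.run("ip route")

# Allow a few extra seconds rather than failing on the first
# 'ip route' read after the 'ip route add'.
_, found = topotest.run_and_expect(_kernel_route_present, True, count=10, wait=1)
assert found, "kernel route never appeared"
```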
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Modify the timers used to send updates/hellos to every
1 second instead of 5, allowing this test to converge
faster under heavy system load.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
During repeated runs I am seeing this test fail to run successfully.
Upon inspecting the output:
{
"prefix":"10.0.10.0/24",
"prefixLen":24,
"protocol":"isis",
"vrfId":6,
"vrfName":"r1-cust1",
"selected":true,
"destSelected":true,
"distance":115,
"metric":10,
"queued":true,
We can see that the route is still queued. Under heavy system
load, without ensuring that isis has had time to send the route to
zebra and for zebra to install it, this test can fail.
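A minimal sketch of waiting for the route to leave the queued state, with the router, vrf, and prefix names taken from the output above:
```
import json
from functools import partial
from lib import topotest

def _route_installed(router, prefix):
    # Return None only once the route exists and is no longer queued.
    output = json.loads(
        router.vtysh_cmd("show ip route vrf r1-cust1 {} json".format(prefix))
    )
    entries = output.get(prefix)
    if not entries:
        return "route {} not present".format(prefix)
    if entries[0].get("queued"):
        return "route {} still queued".format(prefix)
    return None

test_func = partial(_route_installed, tgen.gears["r1"], "10.0.10.0/24")
_, result = topotest.run_and_expect(test_func, None, count=20, wait=0.5)
assert result is None, result
```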
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Update verify_ospf6_neighbor() so we can verify that there are no
neighbors on a given router:
```
input_dict = {
    "r0": {
        "ospf6": {
            "neighbors": []
        }
    }
}
result = verify_ospf6_neighbor(tgen, topo, dut, input_dict)
```
Signed-off-by: ckishimo <carles.kishimoto@gmail.com>
The interface area command is deprecated under
router ospf6 and should be configured on the individual interface.
Let's modify the tests to not actually put the
`interface foo area 0.0.0.0` command under the
router node.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When using build_config_from_json there exists a timing
window where neighbors can come up before the router-id
is applied. As a precaution, quickly clear the neighbors
to ensure that we get neighbors with the expected router-id.
This can especially happen under high system load.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The test_ospf_dual_stack test had area configuration
under the `router ospf6` nodes. This generates
lots of warning messages from the cli. Let's remove
this.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When testers use the build_config_from_json function,
the create_router_ospf function is double-creating
the ospfv3 cli to be passed in. This is because
create_router_ospf loops over both v2 and v3
and then create_router_ospf6 re-adds v3.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The ospf_basic_functionality/test_ospf_lan.py test creates
an ethernet segment, attaches 4 routers to it, and
assigns ip addresses in a /24. As one of the tests,
it picks a new address for r0 that coincides with
an ip address on r3. Then the test immediately
checks for other data. The problem is, of course,
that if a test is `slow` enough, hellos will
start to be ignored from r3 to r0 and the
neighbor relationships will come down. Choose
an ip address that doesn't cause this issue.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The new bgp_route_server_client test is not setting the
timers for peers to be fast enough to allow it
to converge in under 60 seconds if a packet is dropped/missed
at startup. Make the test able to converge
under load.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add a parameter to the resolver api that is the vrf identifier. This permits
making resolution specific to each vrf. In case the vrf netns backend is used,
this is very practical, since resolution can happen in one netns while
it does not in another one.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
There exist systems that do not explicitly have a python soft-link
on their system. Let's explicitly call out which python we want
to be using with exabgp.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The bgp gr topotests had run times that were greater than 10 minutes each.
Just brute-force break the tests up into 4 different sub-parts.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Description:
- Changing the expected output for the selected route in the script.
- With our VRF-Lite fix for best path selection,
while comparing the paths for imported routes,
we now correctly refer to the original route, i.e. the ultimate path.
In this case, when we have an ibgp route and an imported ibgp route
for the same prefix, we compare the IGP metric, which is the same for both,
so we proceed to comparing router-ids and selecting the best path.
- Before our changes, the ibgp route was preferred because of the IGP metric.
With our fix, the expected output for the selected route is changed to
the imported ibgp route because of the lower router-id.
- Corresponding changes for the expected advertised route and
the large community are made.
Co-authored-by: Kantesh Mundaragi <kmundaragi@vmware.com>
Signed-off-by: Iqra Siddiqui <imujeebsiddi@vmware.com>
Somewhere along the line core-files stopped being generated
with the running of the topotests. With this change we now
see this:
sharpd@eva /t/topotests> find . -name '*.dmp' -print
./ospfv3_basic_functionality.test_ospfv3_asbr_summary_topo1/r0/ospf6d_core-sig_6-pid_430478.dmp
sharpd@eva /t/topotests> sudo gdb /usr/lib/frr/ospf6d ./ospfv3_basic_functionality.test_ospfv3_asbr_summary_topo1/r0/ospf6d_core-sig_6-pid_430478.dmp
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/lib/frr/ospf6d...
[New LWP 430478]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/ospf6d --log file:ospf6d.log --log-level debug -d'.
Program terminated with signal SIGABRT, Aborted.
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
(gdb)
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is implicitly checked by the "verify mroute" below, but it's much
more helpful to explicitly check in advance.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Currently I get bgp_instance_del-test as well as bgp_l3vpn_to_bgp_vrf
failures every ~3-4 runs under a 40-way parallel run with micronet.
Examination of the failing and passing cases always shows that, in the
failures, bgp bestpath convergence happens immediately after
the show commands that ensure the routes are there.
Modify the code to look for the fact that the vrf has
converged, from routes being passed around across vrf's,
and ensure that bestpath has run on them.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When debugging issues for routes in multiple vrf's, it would
be extremely useful if the debug output included which vrf we
are acting on.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Because this test can be run in either netns vrf mode or vrflite
vrf mode, the default vrf has a different name. When netns mode
is chosen, vrf0 is chosen as the default name, while when vrflite
mode is chosen, default is chosen. Remove the vrf keyword from
the expected dump.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Issue #9983 explains what is wrong with the GR helper mode.
To unblock the CI that fails almost all the time on the ospf_gr_topo1
test, remove the commands and disable the test. Also add a reminder to
completely remove the helper mode if no one fixes the code in a month.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
This can't really be run as part of CI, it's intended as a helper
instead, to use manually after poking around in the c-ares binding code.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
common_cli.c disables logging by default so stdio is usable as a vty
without log messages getting strewn in between. This is the right thing
for most tests, but not all; sometimes we do want log messages.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
949aaea5 removed debugs from all topotests, but this test relies on the
debug logs so it constantly fails now.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
When our CI test system is under high load, expecting bfd to
converge in under 2 seconds is not going to happen. Modify the test
suites to just ensure that things reconverge.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Debugs take up a significant amount of cpu time as well as
increased disk space for storage of results. Reduce test
overhead by removing the debugs. Hopefully this helps
alleviate some of the overloading that we are seeing in
our CI systems.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The test system under load looks for upstream state only
1 time, immediately after sending 2 streams of S,G data.
Give the system some time to process this
and ensure that it actually shows up in a small
amount of time.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The test_ldp_pseudowires_after_link_down test
shuts a link down and was blindly waiting 5 seconds
before just assuming the test system was in a sane
state. Remove the sleep(5) and actually look for
the changed state of the route 2.2.2.2 that the
pseudowire actually depends on.
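A minimal sketch of replacing the sleep(5), with the router name and expected-JSON shape assumed:
```
import json
from functools import partial
from lib import topotest

def _check_route(router, prefix, expected):
    # Compare the live RIB against the expected state instead of
    # sleeping a fixed 5 seconds and hoping.
    output = json.loads(router.vtysh_cmd("show ip route {} json".format(prefix)))
    return topotest.json_cmp(output, expected)

expected = {"2.2.2.2/32": [{"installed": True}]}  # assumed shape
test_func = partial(_check_route, tgen.gears["r1"], "2.2.2.2/32", expected)
_, result = topotest.run_and_expect(test_func, None, count=40, wait=0.5)
assert result is None, "route 2.2.2.2/32 did not reach the expected state"
```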
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The test does this:
a) shut link down
b) test for ospf convergence
c) ensure the route is installed
When under a heavily loaded system, c) is not guaranteed
to happen quickly. Give the system 10 extra seconds
to ensure it happens.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The route replace test was doing this sequence of events:
a) Create nhg
b) Install route w/ sharpd
c) Ensure it worked
d) Modify nhg
e) Ensure the update group replace worked
The problem is that the sharp code is doing this:
```
/* Only send via ID if nhgroup has been successfully installed */
if (nhgid && sharp_nhgroup_id_is_installed(nhgid)) {
	SET_FLAG(api.message, ZAPI_MESSAGE_NHG);
	api.nhgid = nhgid;
} else {
	for (ALL_NEXTHOPS_PTR(nhg, nh)) {
		api_nh = &api.nexthops[i];
		zapi_nexthop_from_nexthop(api_nh, nh);
		i++;
	}
	api.nexthop_num = i;
}
```
The created nhg has not been successfully installed ( or at least
sharpd has not read the results yet ) when it gets the command
to install the routes. As such it passes down the individual
nexthops instead, and the route replace is never going to work.
Modify the code to add a bit of sleep to allow sharpd to
get notified when the system is under load. At this point
there is no way to query sharpd for whether or not it
thinks its nhg is installed properly. This
test is failing all over the place for a bunch of people,
so let's get this fixed so people can get running.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The test_nexthop_groups function is failing occasionally
because the test executes 4 sharp install
routes commands in succession. When I dumped the rib on a failed test
run, there were only 2 of the 4 routes in the rib, and
the two that were in were the last 2 installed.
The sharp daemon sets up an event process where it
installs routes `automatically`. If the previous
run is not finished, entering a new command to install
routes will prevent the last one from ever happening.
It is assumed that the user doesn't do stupid stuff here.
In this case I am just adding a small sleep between each
installation to just let the test proceed.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The isis_topo1 test has two functions that, immediately
after ensuring that the routes are in isis,
test to see if they are in the rib. Under system
load I am seeing this test fail because the
routes are still queued. Modify the zebra check
for the isis routes to look for the proper results
for up to 10 seconds.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Currently, we have a lot of checks in CLI and NB layer to prevent
incompatible IS-types of circuits and areas. All these checks become
completely meaningless when the interface is moved between VRFs. If the
area IS-type is different in the new VRF, previously done checks mean
nothing and we still end up with incorrect circuit IS type. To actually
prevent incorrect IS type, all checks must be done in the processing
code.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
This code has two issues:
a) The loop to test for successful installation re-installs
the route every time it loops. A system under load will
have issues ensuring the route is installed, and repeated
attempts do not help.
b) The nexthop group installation was always failing
but never noticed (because of the previous commit),
and the test was always passing when it should
never have passed.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The test checks the installation of seg6 routes with this
loop:
for up to 5 times:
    sharp install seg6 route
    show ip route and check it is installed
The problem is that if the system is under heavy
load, the installation may not have happened yet,
and by immediately reinstalling the same route
the same thing could happen again.
Modify the code to pull the route installation
out of the loop and to increase to 10 attempts
in case there is very heavy system load.
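A minimal sketch of the reworked order, with the sharp command and prefix left as placeholders:
```
from lib import topotest

# Install exactly once, outside the retry loop...
r1.vtysh_cmd("sharp install seg6-routes ...")  # placeholder command/args

def _seg6_route_present():
    # ...then only poll for the result, never re-install.
    return "1::1" in r1.vtysh_cmd("show ipv6 route")  # prefix assumed

# 10 attempts instead of 5, for very heavy system load.
_, found = topotest.run_and_expect(_seg6_route_present, True, count=10, wait=1)
assert found, "seg6 route never installed"
```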
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The check_ping `_check` function was asserting while being
passed to the topotests.run_and_expect() functionality, causing
it to not run the full range of pings if one failed.
So effectively it was properly detecting pass / failure but
only allowing 1 iteration if it was going to fail.
Modify the code to not assert and to act like all the other
run_and_expect functionality.
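A minimal sketch of the non-asserting style, with router and address names assumed:
```
from functools import partial
from lib import topotest

def _check_ping(name, dest, expect_success):
    # Return a string describing the mismatch, or None on success,
    # so run_and_expect() keeps retrying instead of dying on the
    # first failed assert.
    output = tgen.gears[name].run("ping {} -c 1 -w 1".format(dest))
    success = " 0% packet loss" in output
    if success != expect_success:
        return "ping from {} to {}: unexpected result".format(name, dest)
    return None

test_func = partial(_check_ping, "r1", "192.0.2.1", True)
_, result = topotest.run_and_expect(test_func, None, count=40, wait=2)
assert result is None, result
```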
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
isis_tlvs.c would fail at multiple places if incorrect
TLVs were received in unpack_item_ext_subtlvs(),
causing stream assertion violations.
Signed-off-by: Juraj Vijtiuk <juraj.vijtiuk@sartura.hr>
The bgp_l3vpn_to_direct test is failing sometimes because
the 2.2.2.2 route is disappearing. What is happening?
The log file for the failed test run shows us this:
build 15-Oct-2021 07:26:12 scripts/adjacencies.py:8 WAIT:r4:ping 2.2.2.2 -c 1: 0. packet loss:wait:PE->P2 (loopback) ping:60:0.5:
build 15-Oct-2021 07:26:12 Fri Oct 15 14:26:12 2021 (#9) scripts/adjacencies.py:8 COMMAND:r4:ping 2.2.2.2 -c 1: 0. packet loss:wait:PE->P2 (loopback) ping:
build 15-Oct-2021 07:26:12 COMMAND OUTPUT:PING 2.2.2.2 (2.2.2.2) 56(84) bytes of data.
build 15-Oct-2021 07:26:12 64 bytes from 2.2.2.2: icmp_seq=1 ttl=64 time=0.143 ms
build 15-Oct-2021 07:26:12
build 15-Oct-2021 07:26:12 --- 2.2.2.2 ping statistics ---
build 15-Oct-2021 07:26:12 1 packets transmitted, 1 received, 0% packet loss, time 0ms
build 15-Oct-2021 07:26:12 rtt min/avg/max/mdev = 0.143/0.143/0.143/0.000 ms:
build 15-Oct-2021 07:26:12 Done after 1 loops, time=0.024507761001586914, Found= 0% packet loss
build 15-Oct-2021 07:26:12 Fri Oct 15 14:26:12 2021 (#9) scripts/adjacencies.py:9 COMMAND:r4:ping 2.2.2.2 -c 1: 0. packet loss:pass:PE->P2 (loopback) ping +0.02 secs:
build 15-Oct-2021 07:26:12 2021-10-15 14:26:12,446 WARNING: topolog.r4: LinuxNamespace(r4): proc failed: rc 2 pid 28826
build 15-Oct-2021 07:26:12 args: /usr/bin/nsenter -a -t 27444 -F --wd=/tmp/topotests/bgp_l3vpn_to_bgp_direct.test_bgp_l3vpn_to_bgp_direct/r4 /bin/bash -c ping 2.2.2.2 -c 1
build 15-Oct-2021 07:26:12 stdout: connect: Network is unreachable:
build 15-Oct-2021 07:26:17 COMMAND OUTPUT:connect: Network is unreachable:
build 15-Oct-2021 07:26:17 R:9 r4 PE->P2 (loopback) ping +0.02 secs 0 1
So the 2.2.2.2 route is coming/going and is failing on these test lines:
luCommand(
"r1", "ping 2.2.2.2 -c 1", " 0. packet loss", "wait", "PE->P2 (loopback) ping", 60
)
luCommand(
"r3", "ping 2.2.2.2 -c 1", " 0. packet loss", "wait", "PE->P2 (loopback) ping", 60
)
luCommand(
"r4", "ping 2.2.2.2 -c 1", " 0. packet loss", "wait", "PE->P2 (loopback) ping", 60
)
So the 2.2.2.2 routes on r1, r3 and r4 are received via ospf, but are
modified by some other process to add labels ( probably ldp, since
it is running too ). The 2nd ping to 2.2.2.2 is failing because
the 2.2.2.2 route on r4 is being replaced. As an example, here
is `ip monitor all` on r4 during boot up. Please note the timestamps
are not necessarily representative of what we will see on the
loaded ci system.
[2021-10-15T15:46:52.261456] [NEXTHOP]id 27 via 10.0.2.2 dev r4-eth0 scope link proto zebra
[2021-10-15T15:46:52.261490] [ROUTE]2.2.2.2 nhid 27 via 10.0.2.2 dev r4-eth0 proto ospf metric 20
<snip>
[2021-10-15T15:46:53.556405] [NEXTHOP]Deleted id 27 via 10.0.2.2 dev r4-eth0 scope link proto zebra
<snip>
[2021-10-15T15:46:53.566575] [NEXTHOP]id 32 via 10.0.2.2 dev r4-eth0 scope link proto zebra
[2021-10-15T15:46:53.566585] [ROUTE]2.2.2.2 nhid 32 via 10.0.2.2 dev r4-eth0 proto ospf metric 20
For a small amount of time the route was *gone*. I believe the upstream
CI system sometimes hits that window, causing the test to fail.
This patch attempts to ensure that the 2.2.2.2 route has been learned
appropriately ( thus slowing the test down ) before the test moves on to
the ping. I suspect the long term answer might be to add a test to
the scripts/adjacencies.py script to ensure that the test does not
continue until the appropriate label is in place, but I want to
make the test run a bit more prescriptive in what it is looking
for here.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Recent commit 83f325901a accidentally turned
a 1 second wait into a 10 second wait
between retries. 10 seconds is too long.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The test doesn't wait long enough when it checks the routers after
restart. On slower systems, it frequently failed as it ran out
of time.
Signed-off-by: Martin Winter <mwinter@opensourcerouting.org>
When our ci test system is under high load, expecting bfd to converge
in under 2 seconds is not going to happen. Modify the test suites
to just ensure that things converge. If we need actual functional
testing of bfd response times the topotests are not an appropriate place
to do this or we need to modify the test system to gather the data for
how long it takes after the tests are run.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
During a local CI run, bgp_ecmp_topo3 was failing
to properly notice the fast-convergence command
issued before the interface is shut down. As
such there exists a race condition where under
high load the zebra process can actually shut
an interface down before we have properly ensured
that fast convergence is on for ibgp.
Modify the test in two ways:
1) Ensure that the previous section makes sure
that we have properly converged for when we
bring the interfaces back up, instead of
assuming that we have done so.
2) After issuing the fast-convergence command,
ensure that bgp has fully processed it and is
ready to receive the interface down events
as triggers for shutting down the ibgp session.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>