Reduces run time of the bgp_l3vpn_to_bgp_vrf topotests
from ~118 seconds to ~87 seconds by reducing hello timers
in bgp and ospf
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reduce run time of bfd-topo2 from ~62 seconds to ~33 seconds
by modifying the hello/dead intervals for both ospf and ospfv3
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reduce the runtime from ~82 seconds to ~51 seconds by
reducing hello/hold timers for both bgp and ospf.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reduce run time from ~114 seconds to ~55 seconds by
configuring hello/dead interval timers for ospf and ospfv3
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reduce run time from ~76 seconds to ~47 seconds by modifying
both bgp and ospf timers to be more aggressive
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reduce this tests run time from ~76 seconds to ~49 seconds
by decreasing the hello/dead interval timers in ospf
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Decrease run time from ~70 seconds to ~41 seconds by
reducing hello/dead interval timers in ospf
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Decrease run time from ~70 seconds to ~60 seconds
by modifying the hello/dead interval interface timers
in ospf
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reduce ospf-sr-topo1 run time from ~60 seconds to ~30 seconds
by shortening the hello and dead timers.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Convert run times by lowering hello/dead interval timers to
smaller values from ~66 seconds to ~36 seconds.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Initial run of topotests on my machine takes ~210 seconds
With these changes we are at ~40 seconds
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
1. Added test to verify bgp vrf dynamic route leak functionality
2. Total execution time is ~8 mins
3. Added kernel version check, these script would be run for kernel version >= 4.19
Signed-off-by: Kuldeep Kashyap <kashyapk@vmware.com>
1. Topotest for isis-vrf is added for ipv4 and ipv6.
2. Test case for checking isis topology.
3. Test case for checking zebra isis routes.
4. Test case for checking linux vrf routes.
5. 2 new API's written in topotest/lib for checking vrf routes.
Co-authored-by: Kaushik <kaushik@niralnetworks.com>"
Signed-off-by: harios_niral <hari@niralnetworks.com>
1. Added isis with different vrf and it's dependecies.
2. Added new vrf leaf in yang.
3. A minor change for IF_DOWN_FROM_Z passing argrument is
replaced with ifp pointer in api "isis_if_delete_hook()".
4. Minor fix in the isisd spf unit test.
Co-authored-by: Kaushik <kaushik@niralnetworks.com>"
Signed-off-by: harios_niral <hari@niralnetworks.com>
Avoid unnecessary use of StringIO in one place, use version-
dependent method in another. Remove a couple of other py2->py3
problems.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
RFC 7490 says:
"The reverse SPF computes the cost from each remote node to root. This
is achieved by running the normal SPF algorithm but using the link
cost in the direction from the next hop back towards root in place of
the link cost in the direction away from root towards the next hop".
Support for reverse SPF will be necessary later as it's one of the
algorithms used to compute R-LFA/TI-LFA repair paths.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Now that the IS-IS SPF code is more modular, write some unit tests
for it.
This commit includes a new test program called "test_isis_spf" which
can load any test topology (there are 13 different ones available)
and run SPF on any desired node. In the future this same test program
and topologies will also be used to test reverse SPF and TI-LFA.
The "test_common.c" file contains helper functions used to parse the
topology descriptions from "test_topologies.c" into LSP databases
that can be used as an input to the SPF code.
This commit also introduces the F_ISIS_UNIT_TEST flag which is used
to prevent the IS-IS code from scheduling any event when running
under the context of an unit test.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Change the public router stop method to always do a two-phase
shutdown - once without waiting and a second time with a wait.
Ordinary callers need to use this approach when stopping routers.
Move the detailed internal details to a private method that tests
should not call directly.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
this test ensures that an incoming bgp ipv4 and ipv6 flowspec
entry is received with a nexthop IP associated.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
In case of config rollback is enabled,
record northbound transaction based on a control flag.
The actual frr daemons would set the flag to true via
nb_init from frr_init.
This will allow test daemon to bypass recording
transacation to db.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Add new option to `segment-routing prefix` command to set the
Explcit Null flag in addition to the No-PHP flag. MPLS LFIB configuration
has been also updated to take into account the Explicit Null flag.
Signed-off-by: Olivier Dugeon <olivier.dugeon@orange.com>
RFC 8665 defines a Segment Routing Local Block for Adjacency SID.
This patch provides the possibility to modify the SRLB as well as
reserved the block range from the Label Manager.
- Introduce new CLI 'segment-routing local-block'
- Add local block to SRDB structure
- Parse / Serialize SRLB in Router Information LSA
- Update OSPF-SR topotest
- Update documentation
Signed-off-by: Olivier Dugeon <olivier.dugeon@orange.com>
1. Created a structure "isis master".
2. All the changes are related to handle ISIS with different vrf.
3. A new variable added in structure "isis" to store the vrf name.
4. The display commands for isis is changed to support different VRFs.
Signed-off-by: Kaushik <kaushik@niralnetworks.com>
Add a new test to cover the new features for multi hop BFD peers:
- Test that we correctly receive TTL from protocol integration.
- Check minimum TTL usage and 'show' command.
- Check for passive mode.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
The commit `bfdd: simplify and remove duplicated code` fixed a problem
that was causing the protocol configuration to override the user
configuration.
In this test case: the peer was configured to be disabled (default is
`shutdown`) and the test was expecting it to get activated (`no shutdown`)
when the protocol converged. I changed the peer default state to
`no shutdown`, however another way to get the same effect is to
configure the protocol to use a profile or don't configure a peer at all
(and use the defaults).
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Since the values of ifindices cannot be relied upon across
distributions, simpy remove them from the VNI JSON being compared.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
add tests to check IP address/MAC address associations are learned
from netlink NEWNEIGH messages and are propagated to the remote PE
Signed-off-by: Pat Ruddy <pat@voltanet.io>
With these changes the IS-IS SR topotest should run to completion
about twice as fast compared to before (4 -> 2 minutes on my
machine).
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
For the sake of Segment Routing (SR) and Traffic Engineering (TE)
Policies there's a need for additional infrastructure within zebra.
The infrastructure in this PR is supposed to manage such policies
in terms of installing binding SIDs and LSPs. Also it is capable of
managing MPLS labels using the label manager, keeping track of
nexthops (for resolving labels) and notifying interested parties about
changes of a policy/LSP state. Further it enables a route map mechanism
for BGP and SR-TE colors such that learned BGP routes can be mapped
onto SR-TE Policies.
This PR does not introduce any usable features by now, it is just
infrastructure for other upcoming PRs which will introduce 'pathd',
a new SR-TE daemon.
Co-authored-by: Renato Westphal <renato@opensourcerouting.org>
Co-authored-by: GalaxyGorilla <sascha@netdef.org>
Signed-off-by: Sebastien Merle <sebastien@netdef.org>
The vxlan `ip... ` command is failing because we are passing in
`no learning` and that is failing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a bit of a clue to the test_evpn_type5_topo1.py script
to what dut is failing, when things go south.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When the topotest mem-leak reporting is enabled, use the same
two-step daemon stop procedure that's used in
the topogen.stop_topology path.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The base topology is a two level CLOS with two racks. There are
two PEs/TORs in each rack that provide active-active redundancy to
two dual-attached servers in the rack. And EVPN-PIM is used for
flooded traffic.
Reference: evpn-mh-topo-tests.pdf
Tests have been added for the following functionality -
1. ES management
2. EAD/Type-1 route handling
3. Type-2 route with non-zero ESI
4. MAC sync and remote MAC (with remote-ES destination) handling
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When you make a change to a route-map or a prefix-list it depends on, note
that the route-map needs to be reprocessed for the change.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
`sharpd` doesn't handle any route map commands and neither should show
up in route map commands. This makes the CI pass again after not sending
route map commands to it again.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Address-sanitizer runs in the CI appear to require more
memory than is available (at present), so skip the top
x32 route_scale testcase when running with <4G of ram.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Make some changes to the route-scale topotest, in view of
issue #6734. Table-drive the test to eliminate some
repeated code. Assert and fail if a step in the progression
of scale fails. Wait a little longer between checking the show
output - it's costly to generate that output at scale. Add a
memleak testcase.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
1. evpn_type5_test_topo1 tests started failing in CI for all Ubuntu 18.04 machine,
which are having kernel version: 5.4.0-42-generic
2. We will enable these tests once issue is found and fixed.
Signed-off-by: Kuldeep Kashyap <kashyapk@vmware.com>
The `log monitor' command is a no-op and actually
outputs a `this doesn't do anything` warning. Let's remove
this cli line from our tests as that don't do anything and
people will look at these configs for guidance.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
add thread info, use "bt full" to get variables and add a bit of
disassembly for good measure.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
1. Increasing BGP convergence wait time to overcome Ubuntu 16.04 arm8 box, as
bgp neighorship is taking more time in this particular testbed.
2. Debugged bgp-ecmp-topo2 failures and here also it seems to be bgp convergence
issue, doing some enhancement in scripts to handle it
Signed-off-by: Kuldeep Kashyap <kashyapk@vmware.com>
Remove mid-string line breaks, cf. workflow doc:
.. [#tool_style_conflicts] For example, lines over 80 characters are allowed
for text strings to make it possible to search the code for them: please
see `Linux kernel style (breaking long lines and strings)
<https://www.kernel.org/doc/html/v4.10/process/coding-style.html#breaking-long-lines-and-strings>`_
and `Issue #1794 <https://github.com/FRRouting/frr/issues/1794>`_.
Scripted commit, idempotent to running:
```
python3 tools/stringmangle.py --unwrap `git ls-files | egrep '\.[ch]$'`
```
Signed-off-by: David Lamparter <equinox@diac24.net>
Stop printing hard-coded 30 seconds in a couple of places in
bgp.py in the topojson infra - print the actual time
spent waiting.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Remove a special-case clause for static routes - it was the same
as the clause for other recursive routes. Have staticd just tell
zebra that recursion is allowed. Update topotest that was aware
of this 'internal' flag.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
1. It will generate support bundle/sump data on test failures
2. It used /usr/lib/frr/generate_support_bundle.py utility to dump the data
Signed-off-by: Kuldeep Kashyap <kashyapk@vmware.com>
Another short timeout for bgp - make the
verify_bgp_convergence_from_running_config() api use the same
generous timeout as verify_bgp_convergence()
Signed-off-by: Mark Stapp <mjs@voltanet.io>
There exists the possiblity that the hello timer printed would
show a time to expiration in this format:
Hello due in 350 usecs
The tests are looking for:
Hello due in 5.430s
Just notice that we may have gotten usecs and act accordingly
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
1. Added 7 test cases to verify bgp recursive nexthop and ebgp multi-hop functionality
2. Added framework support to automate these test cases
3. Total execution time is ~5 mins
Signed-off-by: Kuldeep Kashyap <kashyapk@vmware.com>
Use the right list of daemons to avoid trying to start zebra twice.
Change a zebra log message to INFO level to avoid stderr check
failure.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add a few retries during router shutdown before killing a daemon. Also
work harder to start only a single instance of daemons, esp. zebra.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The tests work with the default settings of BFD meaning that bfdd is
able to recognize a 'down' link after ~900ms so a route recovery should
be visible in the RIB after 1 second.
In the current state only IPv4 is used (when using IPv6
autoconfiguration) within BFD, even though the recovery also affects
IPv6 routes. This is different to the current state of ospfd/ospf6d in
combination with BFD since both IPv4 and IPv6 sessions are used there.
The following topology is used:
+---------+
| |
eth-rt2 (.1) | RT1 | eth-rt3 (.1)
+----------+ 1.1.1.1 +----------+
| | | |
| +---------+ |
| |
| 10.0.2.0/24 |
| |
| eth-rt1 | (.2)
| 10.0.1.0/24 +----+----+
| | |
| | RT3 |
| | 3.3.3.3 |
| | |
(.2) | eth-rt1 +----+----+
+----+----+ eth-rt4 | (.1)
| | |
| RT2 | |
| 2.2.2.2 | 10.0.4.0/24 |
| | |
+----+----+ |
(.1) | eth-rt5 eth-rt3 | (.2)
| +----+----+
| | |
| | RT4 |
| | 4.4.4.4 |
| | |
| +----+----+
| 10.0.3.0/24 eth-rt5 | (.1)
| |
| |
| 10.0.5.0/24 |
| |
| +---------+ |
| | | |
+----------+ RT5 +----------+
eth-rt2 (.2) | 5.5.5.5 | eth-rt4 (.2)
| |
+---------+
Route recovery is tested on RT1. The focus here lies on the two
different routes to RT5. Link failures are generated by taking
down interfaces via the mininet Python interface on RT2 and RT3.
Hence routes are supposed to be adjusted to use RT3 when a link
failure happens on RT2 or vice versa.
Note that only failure recognition and recovery is "fast". BFD
does not monitor a link becoming available again.
Signed-off-by: GalaxyGorilla <sascha@netdef.org>
Instead of waiting for daemons start with `sleep`, start them with the
`-d` parameter so they can release the terminal themselves when ready.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Start logging early everything (including debug) to
`/tmp/topotest/<test>/<node>/<daemon>.{out,err}`.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Handle the duplicated code with a simple conditional: if called from
specialized API use provided daemons configuration, otherwise fallback
to old `Router` own daemon settings.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Just disable pbr tests on anything less than 4.10.
This has to do with the fact that the arm platform
is not allowing us to install a route into a
non default table using a interface associated
with a vrf.
ip route add default 4.5.6.7 via swp39 table 10000
When swp39 is in a vrf other than default
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
From last addition we can tell that the nexthop-group C is
installed but pbr does not think it is. This failure
has been consistent the last 4-5 runs in master. Lets
add a bit more data gathering to figure out what is going on.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Include vrf name with interface name when topojson framework
generates interface configuration. This matches the output of
'show runn', and makes config reset less disruptive. Also
stop removing configured debugs and log output when re-generating
config.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
This may be expanded in the future as we figure out more things
to gather when the test has gone south.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This is the bulk part extracted from "bgpd: Convert from `struct
bgp_node` to `struct bgp_dest`". It should not result in any functional
change.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Under heavy load some systems may still be processing. Let's give
the system some time to figure out what is going wrong.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Announcements that are marked as invalid were previously not revalidated.
This was fixed by replacing the range lookup with a subtree lookup.
Signed-off-by: Marcel Röthke <marcel.roethke@haw-hamburg.de>
Import a topology with some protocols that integrate with BFD. As other
daemons get the new BFD profile support we can update the test to cover
them.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Error Message seen:
2020-06-11 14:00:35,288 ERROR: assert failed at "test_ebgp_ecmp_topo2/test_ecmp_after_clear_bgp[redist_static]": Testcase test_ecmp_after_clear_bgp[redist_static] : Failed
Error: TIMEOUT!! BGP is not converged in 30 seconds for router r3
assert 'TIMEOUT!! BGP is not converged in 30 seconds for router r3' is True
if a retry for a failed connection is 120 seconds we should wait slightly
longer than a retry session, which this clear test was not doing.
Especially since we know our topotests are lossy on data under load.
Apparently I changed this earlier to 90 seconds, but a retry window
is 120. Not sure wtf I was thinking
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add some basic route scale tests to ensure that we can
install a large number of routes. Also grab some timings
so that we can keep track and see if anything substantially
changes over time.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The convergence timeout in lib/bgp.py was something like 40 secs;
that's not really long enough for the CI. Raise the timeout to
40 x 3 secs.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Use `json_cmp` instead of raw text comparison. It should fix some of the
ordering problems we are seeing in CI runs.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Update OSPF Segment Routing topotest in conformity to ECMP
- Add one more interface between r1 and r2 for ECMP
- Anonymize Adjacency SID
- Update expected json output
Signed-off-by: Olivier Dugeon <olivier.dugeon@orange.com>
The bgp_as_wide_bgp_identifier test can time out after
30 seconds waiting for convergence. This is of course
a problem in our test setup where we know that convergence
can fail on first startup due to load issues in the
topology. Additionally we also know that our default time
for bgp sessions is 120 seconds to retry. Give things
a bit longer than 120 seconds to actually fail
instead of 30 seconds
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Just add a basic test for pbr. This code
does not actually test installation in the kernel at this
point in time.
What we do do is make sure pbr is in a sane state after
some very basic configuration.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Just a simple setup for pbr to prove it starts. Once the json
code for pbr gets in we can add more.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Issue number #6291 describes how OSPFd crashes after being deleted and then
added again with configuration when segment routing is used.
The problem occurs in ospf_ri.c because the OspfRI structures retains
the reference to the old area pointer which is mofified when ospfd is
reactivated by configuration. When segment routing is activated, the LSA Router
Information is sent with reference to the old area pointer, instead the new one,
which causes the crash. The same problem is also present in ospf_ext.c with
OspfEXT structure and Extended Link/Prefix structure.
This commit introduces Extended Link/Prefix and Router Information LSAs flusing
when OSPFd is stopped when configuration is removed and adds the correct
initialization to the area pointer in OspfRI and Extended Link/Prefix structure
when OSPFd is re-enabled with the configuration. Area pointer has been removed
from the OspfEXT structure as it is never used with this commit.
Signed-off-by: Olivier Dugeon <olivier.dugeon@orange.com>
Changes:
- Renamed file so we don't get confused when it fails.
- Use `json_cmp` instead of direct key access.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
The involved piece of code is supposed to find a 'closest' match for two
JSON structures using another JSON diff. However, it can happen that
during that new diff the JSON structures are altered (elements from a
list are deleted when 'found'). This is in general ok when the deleted
element is part of the JSON structure which 'matches', but when it later
turns out that some other element of the structure doesn't fit, then the
whole structure should be recovered. This is now realized by using a
deepcopy for the besaid new JSON diff such that the original is only
altered (e.g. deleted) when the diff is clean.
Signed-off-by: GalaxyGorilla <sascha@netdef.org>