When debugging issues for routes in multiple vrf's. It would
be extremely useful if the debug output had which vrf we
are acting on.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Because this test can be run in either netns vrf mode or vrflite
vrf mode, the default vrf name has different name. When netns mode
is chosen, vrf0 name is chosen as default name, while when vrflite
mode is chosen, default name is chosen. Remove the vrf keyword from
the expected dump.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Issue #9983 explains what is wrong with the GR helper mode.
To unblock the CI that fails almost all the time on the ospf_gr_topo1
test, remove the commands and disable the test. Also add a reminder to
completely remove the helper mode if no one fixes the code in a month.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
This can't really be run as part of CI, it's intended as a helper
instead, to use manually after poking around in the c-ares binding code.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
common_cli.c disables logging by default so stdio is usable as vty
without log messages getting strewn inbetween. This the right thing for
most tests, but not all; sometimes we do want log messages.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
949aaea5 removed debugs from all topotests, but this test relies on the
debug logs so it constantly fails now.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
When our CI test system is under high load, expecting bfd to
converge in under 2 seconds is not going to happen. Modify the test
suites to just ensure that things reconvderge.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Debugs take up a significant amount of cpu time as well as
increased disk space for storage of results. Reduce test
over head by removing the debugs, Hopefully this helps
alleviate some of the overloading that we are seeing in
our CI systems.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The test system under load looks for upstream state only
1 time immediately after sending 2 streams of S,G data
flowing. Give the system some time to process this
and ensure that it actually shows up in a small
amount of time.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The test_ldp_pseudowires_after_link_down test
shuts a link down and was blindly waiting 5 seconds
before just assuming the test system was in a sane
state. Remove the sleep(5) and actually look for
the changed state for the route 2.2.2.2 that the
psueudowire actually depends on.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The test does this:
a) shut link down
b) test for ospf convergence
c) ensure the route is installed
When under a heavily loaded system c) is not guaranteed
to happen quickly. Give the system 10 extra seconds
to ensure it happens.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The route replace test was doing this seq of events:
a) Create nhg
b) Install route w/ sharpd
c) Ensure it worked
d) Modify nhg
d) Ensure the update group replace worked
The problem is that the sharp code is doing this:
/* Only send via ID if nhgroup has been successfully installed */
if (nhgid && sharp_nhgroup_id_is_installed(nhgid)) {
SET_FLAG(api.message, ZAPI_MESSAGE_NHG);
api.nhgid = nhgid;
} else {
for (ALL_NEXTHOPS_PTR(nhg, nh)) {
api_nh = &api.nexthops[i];
zapi_nexthop_from_nexthop(api_nh, nh);
i++;
}
api.nexthop_num = i;
}
The created nhg has not been successfully installed( or at least
sharpd has not read the results yet) when it gets the command
to install the routes. As such it passes down the individual
nexthops instead. The route replace is never going to work.
Modify the code to add a bit of sleep to allow sharpd to
get notified when the system is under load. At this point
there is no way to query sharpd for whether or not it
thinks it's nhg is installed properly or not. This
test is failing all over the place for a bunch of people
let's get this fixed so people can get running
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
the test_nexthop_groups function is failing occassionally
because the test executes 4 in succession sharp install
routes commands. When I dumped the rib on a failed test
run there were only 2 of the 4 routes in the rib and
the two that were in were the last 2 installed.
The sharp daemon setups a event process where it
installs routes `automatically`. If the previous
run is not finished entering a new command to install
the routes will mess up the last one from ever happening.
It is assumed that the user doesn't do stupid stuff here.
In this case I am just adding a small sleep between each
installation to just let the test proceed.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
the isis_topo1 test has two functions where immediately
after the test ensures that the routes are in isis
tests to see if they are in the rib. Under system
load I am seeing this test failing because the
routes are still queued. Modify the zebra check
for the isis routes to look for the proper results
for 10 seconds.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Currently, we have a lot of checks in CLI and NB layer to prevent
incompatible IS-types of circuits and areas. All these checks become
completely meaningless when the interface is moved between VRFs. If the
area IS-type is different in the new VRF, previously done checks mean
nothing and we still end up with incorrect circuit IS type. To actually
prevent incorrect IS type, all checks must be done in the processing
code.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
This code has two issues:
a) The loop to test for successful installation re-installs
the route every time it loops. A system under load will
have issues ensuring the route is installed and repeated
attempts does not help
b) The nexthop group installation was always failing
but never noticed (because of the previous commit)
and the test was always passing, when it should
have never passed.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The test is checking installing of seg6 routes by this
loop:
for up to 5 times:
sharp install seg6 route
show ip route and is it installed
The problem is that if the system is under heavy
load the installation may not have happened yet
and by immediately reinstalling the same route
the same thing could happen again.
Modify the code to pull the route installation
outside of the loop and to increase to 10 attempts
in case there is very heavy system load.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The check_ping function `_check` function was asserting and being
passed to the topotests.run_and_expect() functionality causing
it to not run the full range of pings if one failed the test.
So effectively it was properly detecting pass / failure but
only allowing for 1 iteration if it was going to fail.
Modify the code to not assert and act like all the other
run_and_expect functionality.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
isis_tlvs.c would fail at multiple places if incorrect
TLVs were received in unpack_item_ext_subtlvs(),
causing stream assertion violations.
Signed-off-by: Juraj Vijtiuk <juraj.vijtiuk@sartura.hr>
The bgp_l3vpn_to_direct test is failing sometimes because
the 2.2.2.2 route is dissapearing. What is happening?
The log file for the failed test run shows us this:
build 15-Oct-2021 07:26:12 scripts/adjacencies.py:8 WAIT:r4:ping 2.2.2.2 -c 1: 0. packet loss:wait:PE->P2 (loopback) ping:60:0.5:
build 15-Oct-2021 07:26:12 Fri Oct 15 14:26:12 2021 (#9) scripts/adjacencies.py:8 COMMAND:r4:ping 2.2.2.2 -c 1: 0. packet loss:wait:PE->P2 (loopback) ping:
build 15-Oct-2021 07:26:12 COMMAND OUTPUT:PING 2.2.2.2 (2.2.2.2) 56(84) bytes of data.
build 15-Oct-2021 07:26:12 64 bytes from 2.2.2.2: icmp_seq=1 ttl=64 time=0.143 ms
build 15-Oct-2021 07:26:12
build 15-Oct-2021 07:26:12 --- 2.2.2.2 ping statistics ---
build 15-Oct-2021 07:26:12 1 packets transmitted, 1 received, 0% packet loss, time 0ms
build 15-Oct-2021 07:26:12 rtt min/avg/max/mdev = 0.143/0.143/0.143/0.000 ms:
build 15-Oct-2021 07:26:12 Done after 1 loops, time=0.024507761001586914, Found= 0% packet loss
build 15-Oct-2021 07:26:12 Fri Oct 15 14:26:12 2021 (#9) scripts/adjacencies.py:9 COMMAND:r4:ping 2.2.2.2 -c 1: 0. packet loss:pass:PE->P2 (loopback) ping +0.02 secs:
build 15-Oct-2021 07:26:12 2021-10-15 14:26:12,446 WARNING: topolog.r4: LinuxNamespace(r4): proc failed: rc 2 pid 28826
build 15-Oct-2021 07:26:12 args: /usr/bin/nsenter -a -t 27444 -F --wd=/tmp/topotests/bgp_l3vpn_to_bgp_direct.test_bgp_l3vpn_to_bgp_direct/r4 /bin/bash -c ping 2.2.2.2 -c 1
build 15-Oct-2021 07:26:12 stdout: connect: Network is unreachable:
build 15-Oct-2021 07:26:17 COMMAND OUTPUT:connect: Network is unreachable:
build 15-Oct-2021 07:26:17 R:9 r4 PE->P2 (loopback) ping +0.02 secs 0 1
So the 2.2.2.2 route is coming/going and is failing on these test lines:
luCommand(
"r1", "ping 2.2.2.2 -c 1", " 0. packet loss", "wait", "PE->P2 (loopback) ping", 60
)
luCommand(
"r3", "ping 2.2.2.2 -c 1", " 0. packet loss", "wait", "PE->P2 (loopback) ping", 60
)
luCommand(
"r4", "ping 2.2.2.2 -c 1", " 0. packet loss", "wait", "PE->P2 (loopback) ping", 60
)
So the 2.2.2.2 routes on r1,3 and 4 are received via ospf, but are
modified by some other process to add labels ( probably ldp, since
it is running too ). The 2nd ping to 2.2.2.2 is failing because
the 2.2.2.2 route on r4 is being replaced. As an example here
is `ip monitor all` on r4 during boot up. Please note timestamps
are not necessarily representative of what we will see on the
loaded ci system.
[2021-10-15T15:46:52.261456] [NEXTHOP]id 27 via 10.0.2.2 dev r4-eth0 scope link proto zebra
[2021-10-15T15:46:52.261490] [ROUTE]2.2.2.2 nhid 27 via 10.0.2.2 dev r4-eth0 proto ospf metric 20
<snip>
[2021-10-15T15:46:53.556405] [NEXTHOP]Deleted id 27 via 10.0.2.2 dev r4-eth0 scope link proto zebra
<snip>
[2021-10-15T15:46:53.566575] [NEXTHOP]id 32 via 10.0.2.2 dev r4-eth0 scope link proto zebra
[2021-10-15T15:46:53.566585] [ROUTE]2.2.2.2 nhid 32 via 10.0.2.2 dev r4-eth0 proto ospf metric 20
For a small amount of time the route was *gone*. I believe the upstream
CI system hits that window sometimes, causing the test to fail.
This patch attempts to ensure that the 2.2.2.2 route should be learned
appropriately ( thus slowing it down ) before the test moves onto
the ping. I suspect the long term answer might be to add a test to
the scripts/adjancies.py script to ensure that the test does not
continue until the appropriate label is in place, but I want to
make the test run a bit more perscriptive in what it is looking
for here.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Recent commit 83f325901a had a accidental
turn of a 1 second wait into a 10 second wait
between retries. 10 seconds is too long.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Test doesn't wait long enough when it checks the routers after
restart. On slower systems, it frequently failed as it ran out
of time
Signed-off-by: Martin Winter <mwinter@opensourcerouting.org>
When our ci test system is under high load, expecting bfd to converge
in under 2 seconds is not going to happen. Modify the test suites
to just ensure that things converge. If we need actual functional
testing of bfd response times the topotests are not an appropriate place
to do this or we need to modify the test system to gather the data for
how long it takes after the tests are run.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
During a local CI run, bgp_ecmp_topo3 was failing
to properly notice the fast-convergence command
issued before the interface is shut down. As
such there exists a race condition where under
high load the zebra process can actually shut
an interface down before we have properly ensured
that fast convergence is on for ibgp.
Modify the test for in two ways:
1) Ensure that previous section makes sure
that we have properly converged for when we
bring back up the interfaces instead of
assuming that we have done so.
2) After issuing the fast-convergence command.
Ensure that bgp has fully processed it and is
ready to receive the interface down events
as triggers for shutting down the ibgp session.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
On a local CI run. The test_ldp_topo1.py showed fail to converge
on r3. r3 has 2 neighbors but only 1 was up when we got to
further steps in the test suites.
Modify the neighbor checking to `know` how many neighbors
should be operational and continue looking for them until
they are up and running.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Previously, when a valgrind memleak was discovered, would cause a
catastrophic pytest failure. Now correctly fails the current pytest as
intended.
As a result of this fix --valgrind-memleaks now works in distributed
pytest mode as well.
Signed-off-by: Christian Hopps <chopps@labn.net>
Revert the accidental enabling of the optional memleak tests that came
with the large micronet changeset.
Signed-off-by: Christian Hopps <chopps@labn.net>
The nexthop group code is installing routes and nexthop groups
and immediately expecting zebra to have processed the results
as a result there is a situation when the CI system is under
intense load that the nexthop group might not have been processed.
Add a bit of code to allow the test to give FRR some time
to finish work before declaring it not working.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When the CI system is heavily loaded, we might see the following failures:
```
test failed at "test_config_timing/test_static_timing": assert 20.083204 <= 19.487716
```
Currently we allow each step to run 2 times slower than the initial
measurement. Let's allow them to run 3 times slower.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
On the first step, the test creates 10000 static routes. It passes 10000
to `get_ip_networks` and it generates 10000 /22 routes.
On the fourth step, the test tries to remove 5000 previously created
routes. It passes 5000 to `get_ip_networks` and here starts the problem.
Instead of generating 5000 /22 routes, it generates 5000 /21 routes. And
the whole step is a no-op, we constantly see the following logs:
```
% Refusing to remove a non-existent route
```
To consistently generate same routes, `get_ip_networks` must always use
the same prefix length.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Our topotests send SIGBUS 2 seconds after a SIGTERM is
initiated. This is bad because under a heavily loaded
topotest system we may have a case where the system has
not had a chance to properly shut down the daemon.
Extend the time greatly before topotests send SIGBUS.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This removes a giant `switch { }` block from lib/zclient.c and
harmonizes all zclient callback function types to be the same (some had
a subset of the args, some had a void return, now they all have
ZAPI_CALLBACK_ARGS and int return.)
Apart from getting rid of the giant switch, this is a minor security
benefit since the function pointers are now in a `const` array, so they
can't be overwritten by e.g. heap overflows for code execution anymore.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
*_anywhere(item) returns whether an item is on _any_ container. Only
available for unsorted containers for now.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
This provides a "is this item on this list" check, which may or may not
be faster than using *_find() for the same purpose. (If the container
has no faster way of doing it, it falls back to using *_find().)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Even if it doesn't matter for an unit test in general, it hides actual
leaks in the code being tested. Fix so any leaks will be actual bugs.
(Currently there aren't any, yay.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
This script is failing occassionally in our upstream topotests.
Where it was changing route-maps and attempting to see if
summarization was working correctly. The problem was that
the code appeared to be attempting to add route-maps to
redistribution in ospf then modifying the route-maps behavior
to affect summarization as well as the metric type of that
summarization.
The problem is of course that ospf does not appear to modify
the summary routes metric-type when the components
of that summary change it's metric-type. So the test
is testing nothing. In addition the test had messed
up the usage of the route-map generation code and all
the generated config was in different sequence numbers
but route-map processing would never get to those
new sequence numbers because of how route-maps are processed.
Let's just remove this part of the test instead of trying
to unwind it into anything meaningfull
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Several tests used the route_map_create functionality
with `metric-type` but never bothered to add the
backend code to ensure it works correctly.
Add it in so it can be used.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
We have this pattern in this test:
# Let's kill the interface on rt2 and see what happens with the RIB and BFD on rt1
tgen.gears["rt2"].link_enable("eth-rt1", enabled=False)
# By default BFD provides a recovery time of 900ms plus jitter, so let's wait
# initial 2 seconds to let the CI not suffer.
topotest.sleep(2, 'Wait for BFD down notification')
router_compare_json_output(
"rt1", "show ip route ospf json", "step3/show_ip_route_rt2_down.ref", 1, 0
)
Under a heavy CI load, interface down events and then reacting to them may not actually
happen within 2 seconds. Allow some more grace time in the test to ensure that we
react to it in an appropriate manner.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
OSPF when it is deciding on whom it should elect for DR and backup
has a process that prioritizes network stabilty over the exact
same results of who is the DR / Backups.
Essentially if we have r1 ----- r2
Let's say r1 has a higher priority, but r2 comes up first, starts
sending hello packets and then decides that it is the DR. At some
point in time in the future, r1 comes up and then connects to r2
at that point it sees that r2 has elected itself DR and it keeps
it that way.
This is by design of the system. With our tight ospf timers as
well as high load being experienced on our test systems. There
exists a bunch of ospf tests that we cannot guarantee that a
consistent DR will be elected for the test. As such let's not
even pretend that we care a bunch and just look for `Full`.
If we care about `ordering` we need to spend more time getting
the tests to actually start routers, ensure that htey are up and
running in the right order so that priority can take place.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Fix a loop in the setup phase of isis_topo1_vrf: only configure
interfaces that each router actually has.
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
Ensure GR helpers have received a Grace-LSA before killing the
ospfd/ospf6d process that is undergoing a graceful restart.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
There's no more difference between number-named and word-named access-lists.
This commit removes separate arguments for number-named ACLs from CLI.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
BGP LU will use implicit-null in more situations now; adjust
the original LU topotest to align with that. Node R2 uses
imp-null now, while R1 continues to allocate labels.
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
Add a second BGP labelled-unicast (BGP-LU) test suite, with
an additional router and some additional tests.
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
2 things:
a) Each test was setting up for graceful restart with calls to
`graceful-restart prepare ip[v6] ospf`, then sleeping for
3 or 5 seconds. Then killing the ospf process. Under heavy
load there is no guarantee that zebra has received/processed
this signal. Write some code to ensure that this happens
b) Tests are issuing commands in this order:
1) issue gr prepare command
2) kill router
3) <ensure routes were still installed in zebra>
4) start router
5) <ensure routes were stil installed in zebra>
Imagine that the system is under some load and there is
a small amount of time before step 5 happens. In this
case ospf could have come up and started neighbor relations
and also started installing routes. If zebra receives
a new route before step 5 is issued then the route could
be in a state where it is not installed, because it is
being sent to the kernel for installation. This would
fail the test because it would only look 1 time. This
is fixed by giving time on restart for the routes to
be in the installed state.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Any command that uses `peer_lookup_in_view` crashes when "vrf all" is
used, because bgp is NULL in this case.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
FRR should only ever use the appropriate THREAD_ON/THREAD_OFF
semantics. This is espacially true for the functions we
end up calling the thread for.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Remove references to the deprecated "CLI()" function; clean up
a couple of string escapes; make one test-case sensitive to
previous failures.
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
Some tests had commented-out references to the old "CLI()"
function. Remove those so they're not confusing in the future,
and replace at least one with a comment that uses the
'mininet_cli()' function.
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
* Add new debug directives for NSSA LSAs;
* Remove the "debug ospf6 gr helper" command since it doesn't make
sense for this test (not to mention it was renamed to "debug ospf6
graceful-restart");
* Migrate to the new interface-level command to enable OSPFv3 on
interfaces ("interface WORD area A.B.C.D" was deprecated).
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Document the `sleep` statement so people know that we are sleeping
because we are waiting for the BFD down notification. If we don't
sleep here it is possible that we get outdated `show` command results.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Call the `show` commands less often to reduce the CPU pressure.
Also increase the wait time from 60 to 80 seconds to have spare room
for failures (4 times more). This is the latest measure wait time:
> INFO: topolog: 'router_json_cmp' succeeded after 20.08 seconds
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Reduce timers so we send hello packets more often and reduce dead
interval to converge faster.
Previous test wait amount:
> INFO: topolog: 'router_json_cmp' succeeded after 47.20 seconds
New test wait amount:
> INFO: topolog: 'router_json_cmp' succeeded after 20.08 seconds
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
FRR should only ever use the appropriate THREAD_ON/THREAD_OFF
semantics. This is espacially true for the functions we
end up calling the thread for.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Add the "default-information-originate" option to the "area X nssa"
command. That option allows the origination of Type-7 default routes
on NSSA ABRs and ASBRs.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The route created by the "default-information-originate" command
isn't a regular external route. As such, an NSSA ABR shouldn't
originate a corresponding Type-7 LSA for it (there's a separate
configuration knob to generate Type-7 default routes).
While here, fix a small issue in ospf6_asbr_redistribute_add()
where routes created by "default-information-originate" were being
displayed with an incorrect "unknown" type.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Fix wrong comparison since route->path.metric_type is always set
to either 1 or 2. The OSPF6_PATH_TYPE_EXTERNAL2 constant, whose
value is 4, refers to a route type so its usage was incorrect here.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Considering that both the GR helper mode and restarting mode can be
enabled at the same time, the "graceful-restart helper-only" command
can be a bit misleading since it implies that only the helper mode
is enabled. Rename the command to "graceful-restart helper enable"
to clarify what the command does.
Start a deprecation cycle of one year before removing the original
command
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Because vrf backend may be based on namespaces, each vrf can
use in the [16-(2^32-1)] range table identifier for daemons that
request it. Extend the table manager to be hosted by vrf.
That possibility is disabled in the case the vrf backend is vrflite.
In that case, all vrf context use the same table manager instance.
Add a configuration command to be able to configure the wished
range of tables to use. This is a solution that permits to give
chunks to bgp daemon when it works with bgp flowspec entries and
wants to use specific iptables that do not override vrf tables.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Considering that both the GR helper mode and restarting mode can be
enabled at the same time, the "graceful-restart helper-only" command
can be a bit misleading since it implies that only the helper mode
is enabled. Rename the command to "graceful-restart helper enable"
to clarify what the command does.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Issue #9535 describes how the export-list/import-list commands work
differently on ospfd and ospf6d.
In short:
* On ospfd, "area A.B.C.D export-list" filters which internal
routes an ABR exports to other areas. On ospf6d, instead, that
command filters which inter-area routes an ABR exports to the
configured area (which is quite counter-intuitive). In other words,
both commands do the same but in opposite directions.
* On ospfd, "area A.B.C.D import-list" filters which inter-area
routes an ABR imports into the configured area. On ospf6d, that
command filters which inter-area routes an interior router accepts.
* On both daemons, "area A.B.C.D filter-list prefix NAME <in|out>"
works exactly the same as import/export lists, but using prefix-lists
instead of ACLs.
The inconsistency on how those commands work is undesirable. This
PR proposes to adapt the ospf6d commands to behave like they do
in ospfd.
These changes are obviously backward incompatible and this PR doesn't
propose any mitigation strategy other than warning users about the
changes in the next release notes. Since these ospf6d commands are
undocumented and work in such a peculiar way, it's unlikely many
users will be affected (if any at all).
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
A bunch of tests have this pattern:
a) Install a new prefix into bgp
b) Run this loop:
foreach (router in topology) {
verify_bgp_rib(router)
}
This is to ensure that the prefix is actually disseminated.
The problem with this, of course, is that a wait of 2 seconds
for every item in that loop makes no sense. As that the initial
router verification of it's bgp rib will wait 2 seconds and
all the remaining bgp routers in the topology will have gotten
the data. So we end up waiting a bunch of extra time.
Remove the initial_wait time for verify_bgp_rib. Also
increase the failure wait time to 30 seconds. This is
to give a bigger window for bgp to send it's data for
our test systems that could be under heavy load. In the
normal case tests will never hit this.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add a new topotest that features a topology with seven routers spread
across four OSPF areas:
* 1 backbone area;
* 1 regular non-backbone area (0.0.0.1);
* 1 stub area (0.0.0.2);
* 1 NSSA area (0.0.0.3).
All routers have both GR and GR helper functionality enabled in
the configuration. The test consists of restarting each router,
one at time, and checking that all forwarding planes (and LSDBs)
are kept intact during those restarts.
A successful run takes about three minutes to finish.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Compilation is warning that a memcpy is only copying
the first (sizeof pointer) into memory. This is not
what we really want. Although it does beg the question about
why this memcpy is needed( or what it is doing ). I'm going
to just fix the memcpy and call it a day.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
frrmod_load() attempts to dlopen() several possible paths
(constructed from its basename argument) until one succeeds.
Each dlopen() attempt may fail for a different reason, and
the important one might not be the last one. Example:
dlopen(a/foo): file not found
dlopen(b/foo): symbol "bar" missing
dlopen(c/foo): file not found
Previous code reported only the most recent error. Now frrmod_load()
describes each dlopen() failure.
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
1. Optimized test: test_clear_pim_neighbors_and_mroute_p0 run time by clearing
mroute and verifying mroutes separately. Execution time is reduced from almots 10 mins
to ~220 sec.
Signed-off-by: Kuldeep Kashyap <kashyapk@vmware.com>
route_scale run is 500+ seconds. Break it up into
2 separate tests. This should reduce run time a slight
bit.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Fix issue of topotest failures with BGP status Connect or Idle
instead of the expected Active
Signed-off-by: Martin Winter <mwinter@opensourcerouting.org>
Modernize the test a bit, generate expected results rather than load from
file, and add a general json_cmp with retry function and use it.
Signed-off-by: Christian Hopps <chopps@labn.net>
- Update the template and documentation to use newer pytest fixutres for
setup and teardown, as well as skipping tests when the suite fails.
Signed-off-by: Christian Hopps <chopps@labn.net>
When looking for a implied host route it is not necessary
to add the `/32` to an ip route add. As such masks
will not be set in this case. Set the value of masks
to a known good value so that when the route installation
fails the test for it actually being there will tell you
that the route is not there -vs- complaining about mask
being uninited.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
There is no need to add calls to addKernelRoutes for
groups. They do not need to be routed via the
normal kernel methodology.
Tests run successfully with this change.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
- Fix xterm support to work, previously it mostly didn't, not it should
in all cases (i.e., single or dist mode).
- Catch when the user tries to use various window requiring topotests
features (e.g., --cli-on-error) but isn't running under supported
system (e.g., byobu/tmux/xterm), and fail the run with an explanation.
Signed-off-by: Christian Hopps <chopps@labn.net>
If the SRv6 locator is deleted in zebra, zclient(bgpd)
which allocates SIDs from the locator will update the
RIBs which use those SIDs and make them invalid.
This will cause the VPNv6 route to be withdrawn and
the VPN to stop.
If the SRv6 locator is added again, zclient(bgpd) will
allocate the SIDs from the locator again, and VPNv6
will be re-established.
This commit add a test case to confirm this.
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
Before this PR, in case of get locator chunk zapi from
zclient, zebra precreated a down state locator and set
the chunk ownership. After this PR, this is no longer
done, and chunks are no longer automatically generated.
In this commit, we will make a test update to check the
corresponding detailed behavior.
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
Description:
Change is intended for fixing the following issues related to vrf route leaking:
Routes with special nexthops i.e. blackhole/sink routes when imported,
are not programmed into the FIB and corresponding nexthop is set as 'inactive',
nexthop interface as 'unknown'.
While importing/leaking routes between VRFs, in case of special nexthop(ipv4/ipv6)
once bgp announces route(s) to zebra, nexthop type is incorrectly set as
NEXTHOP_TYPE_IPV6_IFINDEX/NEXTHOP_TYPE_IFINDEX
i.e. directly connected even though we are not able to resolve through an interface.
This leads to nexthop_active_check marking nexthop !NEXTHOP_FLAG_ACTIVE.
Unable to find the active nexthop(s), route is not programmed into the FIB.
Whenever BGP leaks routes, set the correct nexthop type, so that route gets resolved
and correctly programmed into the FIB, in the imported vrf.
Co-authored-by: Kantesh Mundaragi <kmundaragi@vmware.com>
Signed-off-by: Iqra Siddiqui <imujeebsiddi@vmware.com>
Create a pid file for the router created by topotest.
By executing nsenter directly against this pid, developers
can execute commands directly from outside the unet shell.
This allows the developer to use script, tab completion, etc.,
and improves efficiency.
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
Refactor the bgp_auth test to create common_config code to allow
non-json based tests to reset routers and load configs in parallel.
Signed-off-by: Christian Hopps <chopps@labn.net>
- Reduce OSPF timers to 1 and 4
- Reduce BGP connect timer to 5
- Apply configs in parallel as single file
- Remove the switches as all links are p2p, perhaps this will help with
reliability?
Signed-off-by: Christian Hopps <chopps@labn.net>
Utilizes new pytest fixtures to completely factor out setup and teardown
functionality. Supply the JSON config and write your tests.
"The best topotest template yet!"
Signed-off-by: Christian Hopps <chopps@labn.net>
New generic script uses a new default node specific log dir to avoid
collisions when running in parallel.
Signed-off-by: Christian Hopps <chopps@labn.net>
- The PIM tests do not need kernel routes to help them bind joins and
sources to specific interfaces. They should do that themselves directly.
Also do not change system wide "rp_filter" sysctl away from the value
required by everyone else.
Signed-off-by: Christian Hopps <chopps@labn.net>
There were some tests where we were turning on mpls on
interface names that don't exist for certain `machines`
in the topology. Fix.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is to avoid breaking changes between existing deployments of
extended community for bandwidth encoding. By default FRR uses uint32
to encode bandwidth, which is not as the draft requires (IEEE floating-point).
This switch enables the required encoding per-peer.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
This allows defining a CLI command like this:
`[no] some setting ![VALUE]`
with VALUE being optional for the "no" form, but required for the
positive form. It's just a `[...]` where the empty branch can only be
taken for commands starting with `no`.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Add the "metric" and "metric-type" options to the "redistribute"
command.
This is a small commit since the logic of setting the metric
value and type of external routes was already present due to the
implementation of the "default-information originate" command months
ago. This commit merely extends the "redistribute" command to
leverage that functionality.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Also, update the ospf6_topo2 topotest since the expected output
was wrong. With this fix, NSSA routes will be created on r2
("redistribute connected"), and NSSA routes appear in the routing
table as regular external routes.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
OSPF mixes uses of "delete" and "del_action" depending on which library
function is called. It's a bug-prone mess that needs fixing; however, for
now we fix the one obvious incorrect use in this test.
Signed-off-by: Christian Hopps <chopps@labn.net>
There is a possibility that the same line can be matched as a command in
some node and its parent node. In this case, when reading the config,
this line is always executed as a command of the child node.
For example, with the following config:
```
router ospf
network 193.168.0.0/16 area 0
!
mpls ldp
discovery hello interval 111
!
```
Line `mpls ldp` is processed as command `mpls ldp-sync` inside the
`router ospf` node. This leads to a complete loss of `mpls ldp` node
configuration.
To eliminate this issue and all possible similar issues, let's print an
explicit "exit" at the end of every node config.
This commit also changes indentation for a couple of existing exit
commands so that all existing commands are on the same level as their
corresponding node-entering commands.
Fixes#9206.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
In particular, the fixed 2 second sleep here was not long enough.
Switch to standard run_and_expect polling to make test more robust.
Signed-off-by: Christian Hopps <chopps@labn.net>
- In order to run tests in parallel the netns-based vrfs need to
have unique names primarily bc they are all tracked/looked-up in
`/run/netns` which is not network namespace nesting friendly
- use ip(8) exclusively rather than a mix of `ip` and `ifconfig`
and `vconfig`, reducing required pkg count by a couple.
Signed-off-by: Christian Hopps <chopps@labn.net>
- bugs in the support library function `verify_gr_address_family`
allowed this test to pass depending on ordering of python dictinoary
keys. Fix the bugs, fix the test.
Signed-off-by: Christian Hopps <chopps@labn.net>
Related: http://docs.frrouting.org/projects/dev-guide/en/latest/topotests.html
Directory name for a new topotest must not contain hyphen (-) characters.
To separate words, use underscores (_). For example, tests/topotests/bgp_new_example.
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
Meanwhile we don't get all MSDP features (MSDP route validation via BGP
AS Path as described in RFC 4611 Section 2), kill one of the links of
the topology to avoid intermittent test failures due to different
traffic route.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Remove a 200 second sleep from bgp-evpn-overlay-index-gateway.
There does not seem to be any evidence that this is needed
and I cannot make the test fail without it.
Fixes: #9035
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
- A more general fix for the bgp listener test which requires interfaces be
configured in the kernel when the bgpd daemons are launched.
Signed-off-by: Christian Hopps <chopps@labn.net>
The test_simple_snmp.py test starts bgp, zebra and snmpd at the
same time. Then zebra configuration is read in and interface
addresses are applied. If snmp start slower than zebra
the snmp process can properly get it's ip address to bind to
if it is faster than zebra, it will fail. Ensure that the
test has addresses before we start daemons.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When running this test on a locally loaded system I am seeing the
static route as `queued` still after 1 second. Let's just blanket
increase the timeout to something longer to give a very loaded system
more time to install the route.
Output on my test system when it was loaded:
INFO topolog.r1:topogen.py:880 vtysh result:
{
"4.5.1.0/24":[
{
"prefix":"4.5.1.0/24",
"prefixLen":24,
"protocol":"static",
"vrfId":0,
"vrfName":"default",
"selected":true,
"destSelected":true,
"distance":1,
"metric":0,
"queued":true,
"table":254,
"internalStatus":8,
"internalFlags":73,
"internalNextHopNum":1,
"internalNextHopActiveNum":1,
"uptime":"00:00:00",
"nexthops":[
{
"flags":1,
"ip":"192.168.216.3",
"afi":"ipv4",
"interfaceIndex":11,
"interfaceName":"r1-eth6",
"active":true,
"weight":1
}
]
},
I suspect 10 seconds should be enough( I would hope ).
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Tests should have low enough overhead that sending
the join/prune every 5 seconds should be sufficient
also it should allow us to converge faster in case of
dropped packets.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
frrscript_load now loads a function instead of a file, so frrscript_unload
should be renamed since it does not unload a function.
Signed-off-by: Donald Lee <dlqs@gmx.com>
- Remove incorrect requirement for `service integrated-vtysh-config`
when producing a delta.
- Add `--test-reset` option which suppresses non-parseable lines from the
produced delta
- Use new features in common_config.py
Signed-off-by: Christian Hopps <chopps@labn.net>
TMUX and Screen support when running topotests inside docker. This
allows the gdb, shell and vtysh features to correctly work even when
running the tests inside docker.
Add options:
--asan-abort :: aborts the process on ASAN errors
--strace-daemons :: strace some or all daemons
Signed-off-by: Christian Hopps <chopps@labn.net>
Some BGP updates received by BGP invite local router to
install a route through itself. The system will not do it, and
the route should be considered as not valid at the earliest.
This case is detected on the zebra, and this detection prevents
from trying to install this route to the local system. However,
the nexthop tracking mechanism is called, and acts as if the route
was valid, which is not the case.
By detecting in BGP that use case, we avoid installing the invalid
routes.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Since the common CLI code calls nb_init, allow specifying some modules
to load by overriding test_yang_models.
Signed-off-by: David Lamparter <equinox@diac24.net>
Add a new topotest that features a topology with seven routers spread
across four OSPF areas:
* 1 backbone area;
* 1 regular non-backbone area (0.0.0.1);
* 1 stub area (0.0.0.2);
* 1 NSSA area (0.0.0.3).
All routers have both GR and GR helper functionality enabled in
the configuration. The test consists of restarting each router,
one at time, and checking that all forwarding planes (and LSDBs)
are kept intact during those restarts.
A successful run takes about three minutes to finish.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Using "write memory" to save the daemons' configurations before
restarting them can cause log files to stop working correctly. Add
a new "save_config" to the kill_router_daemons() function to prevent
that from happening when saving the configurations isn't necessary.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Speedup (large topo): OLD: ~6 minutes NEW: ~1 second
(when paired with generate_support_bundle.py changes)
- Collect from each node in parallel
Bug fixes:
- sub-directory test name was the same internal pytest function name
for any test, and not the actual test name.
Signed-off-by: Christian Hopps <chopps@labn.net>
On a loaded machine running FRR with ASAN I've got the following result:
INFO: waiting MSDP connection from peer 10.254.254.3 on router r1
INFO: 'router_json_cmp' polling started (interval 1 secs, maximum 30 tries)
INFO: 'router_json_cmp' succeeded after 22.53 seconds
Which is very close to the limit, so lets bump the value 4x to avoid a
test false positive.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
This python fixture was way too complex for what is needed.
Eliminate gratuitous options/over-engineering:
- Change from non-deterministic `wait` and `attempts` to a single
`retry_timeout` value. This is both more deterministic, as well as
what the user should actually be thinking about.
- Use a fixed 2 second pause between executing the wrapped function
rather than a bunch of arbitrary choices of 2, 3 and 4 seconds
spread all over the test code.
- Get rid of the multiple variables for determining what "Positive" and
"Negative" results are. Instead just implement what all the user code
already wants, i.e., boolean False or a str (errormsg) means
"Negative" result otherwise it's a "Positive" result.
- As part of the above the inversion logic is much more comprehensible
in the fixture code (and more correct to boot).
Signed-off-by: Christian Hopps <chopps@labn.net>
Pylint cleanup in commit 914faab594 removed a crucial function
parameter that inverted the logic of verify function calls.
Signed-off-by: Christian Hopps <chopps@labn.net>