Mgmtd makes use of libyang's internal ietf-yang-library module to add
support for said module to FRR management. Previously, mgmtd was loading
this module explicitly; however, that required that libyang's
`ietf-yang-library.yang` module definition file be co-located with FRR's
yang files so that it (and ietf-datastore.yang) would be found when
searched for by libyang using FRR's search path. This isn't always the
case, depending on how the user compiles and installs libyang, so mgmtd
was failing to run in some cases.
Instead, we simply tell libyang to load its internal version of
ietf-yang-library when we initialize the libyang context.
This required adding a boolean to a couple of the init functions which
is why so many files are touched (although all the changes are minimal).
Signed-off-by: Christian Hopps <chopps@labn.net>
It's replaced and simplified by c3fd1e9520c619babb3004cea6df622ca67b0dfa.
JSON topo is just horrible to debug.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Currently bgp multipath has these properties:
a) mp_info may or may not be on a single path, based
upon path perturbations in the past.
b) mp_info->count started counting at 0 (meaning 1), because the
bestpath path_info was never included in the count.
c) The first mp_info in the list held the multipath data associated
with the multipath. As such if you were at any other node that data
was not filled in.
d) As such the mp_info's that are not first on the list basically
were just pointers to the corresponding bgp_path_info that was in
the multipath.
e) On bestpath calculation, a linklist (struct linklist *) of bgp_path_info's was
created.
f) This linklist was passed in to a comparison function that took the
old mp_info list, compared it item by item to the linklist and
did some magic to figure out how to create a new mp_info list.
g) the old mp_info and the link list had to be memory managed and
freed up.
h) BGP_PATH_MULTIPATH is only set on non bestpath nodes in the
multipath.
This is really complicated. Let's change the algorithm to this:
a) When running bestpath, mark a bgp_path_info node that could be in the ecmp path as
BGP_PATH_MULTIPATH_NEW.
b) When running multipath, just walk the list of bgp_path_info's and if
it has BGP_PATH_MULTIPATH_NEW on it, decide if it is in BGP_MULTIPATH.
If we run out of space to put in the ecmp, clear the flag on the rest.
c) Clean up the counting so that 1 is no longer sometimes added to the mpath count.
d) Only allocate a mpath_info node for the bestpath. Clean it up
when done with it.
e) remove the unneeded list management associated with the linklist and
the mp_list.
This greatly simplifies multipath computation for bgp and reduces memory
load for large scale deployments.
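
The new algorithm can be sketched in a few lines of Python. This is only an illustration of the two steps, not the C code in bgpd: the `PathInfo` class, the flag values, the helper names and the assumption that the bestpath counts toward the limit without being flagged are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for BGP_PATH_MULTIPATH_NEW / BGP_PATH_MULTIPATH.
MULTIPATH_NEW = 0x1
MULTIPATH = 0x2


@dataclass
class PathInfo:
    name: str
    flags: int = 0
    is_best: bool = False


def mark_candidates(paths, is_ecmp_candidate):
    """Bestpath step: flag every path that could join the ECMP set."""
    for path in paths:
        if is_ecmp_candidate(path):
            path.flags |= MULTIPATH_NEW


def select_multipath(paths, maxpaths):
    """Multipath step: walk the path list directly, no temporary lists.

    Returns the size of the ECMP set, bestpath included (assumption).
    """
    # Forget the previous result and count the bestpath once.
    count = 0
    for path in paths:
        path.flags &= ~MULTIPATH
        if path.is_best:
            count += 1

    for path in paths:
        if not path.flags & MULTIPATH_NEW:
            continue
        path.flags &= ~MULTIPATH_NEW  # the candidate mark is consumed
        if path.is_best:
            continue
        if count < maxpaths:
            path.flags |= MULTIPATH
            count += 1
        # Otherwise we ran out of space: the path stays unflagged.
    return count
```

With four paths, three candidates and a limit of 2, only the bestpath and one other path end up in the set. Nothing besides the path list itself is allocated or compared.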
2 full feeds in work_queue_run prior:
0 56367.471 1123 50193 493695 50362 493791 0 0 0 TE work_queue_run
BGP multipath info : 1941844 48 110780992 1941844 110780992
2 full feeds in work_queue_run after change:
1 52924.931 1296 40837 465968 41025 487390 0 0 1 TE work_queue_run
BGP multipath info : 970860 32 38836880 970866 38837120
Approximately 4 seconds of CPU time saved for convergence and ~75 MB
less run-time memory.
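
For reference, the differences can be computed from the figures quoted above. This assumes that the second column of the work_queue_run lines is run time in milliseconds and that the third number on the multipath info lines is the total size in bytes. Both readings are assumptions, not stated in the output itself.

```python
# Figures copied from the two runs quoted above.
runtime_before_ms = 56367.471
runtime_after_ms = 52924.931
mem_before_bytes = 110780992
mem_after_bytes = 38836880

cpu_saved_s = (runtime_before_ms - runtime_after_ms) / 1000
mem_saved_mb = (mem_before_bytes - mem_after_bytes) / 1_000_000

print(f"CPU time saved: {cpu_saved_s:.2f} s")
print(f"Memory saved: {mem_saved_mb:.1f} MB")
```

Under those assumptions the savings work out to roughly 3.4 s and 72 MB, the same order as the rounded figures given here.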
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
AutoRP discovery now runs by default, which adds a new
IGMP group that needs to be accounted for in IGMP output.
For pim.py:
The clear IGMP interfaces function is in a broken state. It was
already ignoring any errors and returned true always, but with
the addition of the AutoRP discovery group, you could end up
with a different group order in the json which would cause a key
error making the test fail. For now I just added a check to
avoid the key error.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
AutoRP discovery now runs by default, which adds a new
IGMP group that needs to be accounted for in IGMP output.
For multicast_pim_sm_topo3:
Ignore the total group number as it is unnecessary for the test.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
Uses hardcoded sample AutoRP packets, injected to test
message parsing and proper application of AutoRP-learned
RP info. Tests a mix of AutoRP and static RPs.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
Range is wrong. We want values 1 and 2 but we only test 1.
> >>> for i in range(1, 2):
> ...     print(i)
> ...
> 1
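
Since `range()` excludes its stop value, covering both 1 and 2 needs a stop of 3. A minimal comparison (the variable names are only for illustration):

```python
# range() excludes its stop value, so the upper bound must be one past
# the last value the loop should visit.
values_before = list(range(1, 2))
values_after = list(range(1, 3))

print(values_before)  # [1]
print(values_after)   # [1, 2]
```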
Fixes: abd2a1ff3f ("tests: Test some basic kernel <-> zebra interactions")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
There were no tests ensuring that turning v4 and v6 forwarding
off and then on actually worked. This adds one.
This was found by looking at the code coverage.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
None of the bgp dump code was even tested. Add a basic
test that at least checks that a dump file is generated.
This can be extended at a future time.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
In ospf_multi_vrf_bgp_route_leak, the admin distance for the
redistributed ospf route should be 110, and should remain as 110 after
it's imported into another vrf, and then downloaded to zebra.
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
Apparently logger.warn is being deprecated. So let's
switch over to logger.warning. Clearly it's better.
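
A minimal sketch of the replacement call using the standard `logging` module. The logger name and the message are made up for the example.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("topotest-example")  # hypothetical logger name

# Logger.warn() is a deprecated alias; Logger.warning() is the supported call.
logger.warning("route %s not installed yet", "10.0.0.0/24")
```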
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
If the developer passes timers that are far too low, we most likely end up with
false positives in random tests when the system is under high load.
It is better to fall back to the default values in such cases.
E.g.:
```
WARNING: topo: Waiting time is too small (count=1, wait=0.5), using default values (count=20, wait=3)
```
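
The fallback could look roughly like the sketch below. The defaults come from the warning above; the `MIN_TOTAL_WAIT` threshold and the `sanitize_timers` helper are assumptions made for the example, not the actual topotest code.

```python
import logging

logger = logging.getLogger("topo")

DEFAULT_COUNT = 20  # defaults taken from the warning quoted above
DEFAULT_WAIT = 3
MIN_TOTAL_WAIT = 5  # hypothetical threshold in seconds, not from the source


def sanitize_timers(count, wait):
    """Return (count, wait), falling back to the defaults when too small."""
    if count * wait < MIN_TOTAL_WAIT:
        logger.warning(
            "Waiting time is too small (count=%s, wait=%s), "
            "using default values (count=%s, wait=%s)",
            count,
            wait,
            DEFAULT_COUNT,
            DEFAULT_WAIT,
        )
        return DEFAULT_COUNT, DEFAULT_WAIT
    return count, wait
```

With these assumed values, `sanitize_timers(1, 0.5)` falls back to `(20, 3)`, while `sanitize_timers(10, 1)` is left untouched.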
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
No need to do 'no set as-path exclude' to replace the current rule with
another. The code is supposed to support the replacement.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Check that the following does not cause a crash:
> route-map r2 permit 6
> set as-path exclude 65555
> set as-path exclude as-path-access-list NON-EXISTING
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
This allows retrying functions to possibly change their logging level
for diagnostics.
In order to maintain backward compatibility with this longstanding
function we catch the specific exception of it not being handled by the
retrying function and call again w/o the argument.
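
The backward-compatible call can be sketched as follows. The `retry` helper and the `attempt` keyword are hypothetical names chosen for the example; the real retry code and its argument may differ.

```python
import time


def retry(func, attempts=3, delay=0.0):
    """Call func until it returns a truthy value or attempts run out.

    The attempt number is offered as a keyword argument (hypothetical name)
    so the callee can, for example, log more verbosely on a later try.
    """
    result = None
    for attempt in range(1, attempts + 1):
        try:
            result = func(attempt=attempt)
        except TypeError as error:
            # Only handle the error caused by the unexpected keyword.
            if "attempt" not in str(error):
                raise
            # Older function that does not take the argument: call without it.
            result = func()
        if result:
            return result
        time.sleep(delay)
    return result
```

One caveat of this approach: a `TypeError` raised inside the function whose message happens to mention the keyword would also trigger the second call, so the match on the message is a heuristic.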
Signed-off-by: Christian Hopps <chopps@labn.net>
Enable/fix using a munet.yaml config file for topology configuration.
This makes test writing easier.
This also uses the standard `frrinit.sh` to launch and teardown
FRR, so we actually test what most users use.
Signed-off-by: Christian Hopps <chopps@labn.net>
When trying to track down a MTYPE_TMP memory leak
it's harder to search for it when you happen to
have some usage of ttable_dump. Let's just give
it its own memory type so that we can avoid
confusion in the future.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Fully check the NHRP convergence after setting nhs1 down. Otherwise the
ping may pass because the previous shortcut is still present.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
After setting down nhs1, the test is checking that nhc1 routing table
matches routes in nhc1/nhrp_route.json. It is incorrect because it
checks that the NHRP route to nhs1 is still present but it should have
disappeared.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Rename router variables in nhrp_redundancy to match the actual name.
Cosmetic change to help debugging.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
The expected prefix should be 5.5.5.0/24 otherwise the hosts behind NHRP
client 1 nhc1 (aka. r5) are not reachable via NHRP.
The issue was not seen in the official FRR CI because the tests were
skipped, as iptables was missing on the CI machines.
This resolves issue 16690.
Fixes: https://github.com/FRRouting/frr/issues/16690
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Recent commits moved the default retries to 60, but
the higher ecmp counts were overriding it with 40. Let's
make it 80.
Noticed this when I went looking at failures on 386 platforms
in our ci. Route scale is timing out when deleting routes.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Test fails:
    test_func = partial(
        topotest.router_json_cmp,
        router,
        "show ip ospf vrf {0}-ospf-cust1 json".format(rname),
        expected,
    )
    _, diff = topotest.run_and_expect(test_func, None, count=10, wait=0.5)
    assertmsg = '"{}" JSON output mismatches'.format(rname)
>   assert diff is None, assertmsg
E   AssertionError: "r1" JSON output mismatches
E   assert Generated JSON diff error report:
E
E     > $->r1-ospf-cust1->areas->0.0.0.0->nbrFullAdjacentCounter: output has element with value '1' but in expected it has value '2'
/home/sharpd/frr2/tests/topotests/ospf_netns_vrf/test_ospf_netns_vrf.py:239: AssertionError
Support bundle has this data:
r1# show ip ospf vrf all neighbor
% 2024/08/28 14:55:54.763
VRF Name: r1-ospf-cust1
Neighbor ID     Pri State           Up Time         Dead Time Address         Interface                        RXmtL RqstL DBsmL
10.0.255.3        1 Full/DR         10.547s         39.456s   10.0.3.1        r1-eth1:10.0.3.2                     0     0     0
10.0.255.2        1 Full/Backup     0.543s          38.378s   10.0.3.3        r1-eth1:10.0.3.2                     1     0     0
So the neighbor comes up immediately after the test gives up.
Let's give the test a bit more time so that the failure does not happen.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Giving only 5 seconds to pass bgp data to peers on a heavily
loaded system is a recipe for not having fun. Add more time.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This test was killing bgp on r1 and r2
and then immediately testing that the
default route transitioned. Unfortunately,
the way the test was written, the system
might be in a bad state under load. Let's
modify the code to check for a bgp version
change and then that the bgp state has
come back up.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Implement common code for debug config output and remove daemon-specific
code that is duplicated everywhere.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
The debug library allows registering a `debug_set_all` callback which
should enable all debugs in a daemon. This callback is implemented
exactly the same in each daemon. Instead of duplicating the code, rework
the lib to allow registration of each debug type, and implement the
common code only once in the lib.
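
The registration pattern can be illustrated with a short sketch. It is written in Python purely to show the idea; the library itself is C, and the `DebugFlag` class and the function names below are hypothetical, not the lib's API.

```python
class DebugFlag:
    """A single debug type with an on/off state (hypothetical structure)."""

    def __init__(self, name):
        self.name = name
        self.enabled = False


# Every debug type a daemon registers ends up in this one list.
_registered = []


def register_debug(flag):
    """Called once per debug type by each daemon."""
    _registered.append(flag)
    return flag


def set_all_debugs(enabled):
    """Common code, implemented once: toggle every registered debug type."""
    for flag in _registered:
        flag.enabled = enabled
```

Once every debug type is registered, a daemon no longer needs its own "enable everything" callback. The shared loop covers all of them.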
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
a) A noprefix address by itself should not create a connected route.
This was pre-existing.
b) A noprefix address with a corresponding route should result in a
connected route. This is how NetworkManager appears to work.
This is new behavior, so a new test.
c) A route is added to the system from someone else.
This is new behavior, so a new test.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The dead timer is set to 4 seconds, while the hello interval is set to 65535.
This test will only pass if the platform is fast enough for ospfv3 to
converge in 4 seconds. These timers were already tested multiple times earlier.
This test should just make sure that the boundary value 65535 is configurable.
Other changes in this commit:
- add sequence numbers to the dead interval tests to make it easier to
track test failures.
- swap the config order in one test to match order with all other tests.
Signed-off-by: Jafar Al-Gharaibeh <jafar@atcorp.com>
Current code adds a new vlan interface, sets up ospf and
pim on it and immediately starts shoving data down the pipes.
This of course has the fun thing where the IGP and pim do not
always come up in a nice neat manner and the test is looking
for state from a nice neat come up, even though pim is `working`
correctly it is not correct for what the test wants.
Modify the code to ensure that ospf is up and has propagated
the route where it is needed as well as that pim neighbors have
properly come up, then initiate the multicast streams and igmp
reports.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Changes:
- mutini: handle possible missed zombie cleanup leading to test hangs
- mutini: avoid logging in the signal handler, which was causing
an exception.
Signed-off-by: Christian Hopps <chopps@labn.net>
All routes received by zebra from upper level protocols have a weight
of 1. Let's just make everything extremely consistent in our code.
Lots of tests needed to be fixed up to make this work.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This test checks for a bgp crash on rt2 when 2 commands
are launched consecutively:
T0: rr, config -> router bgp 65004 -> neighbor 192.168.12.2 password 8888
T1: rt2, snmpwalk -v 2c -c public 127.0.0.1 .1.3.6.1.4.1.7336.4.2.1
T2: test whether bgp on rt2 has crashed.
Signed-off-by: Dmytro Shytyi <dmytro.shytyi@6wind.com>
Copied the existing "join-group" test and modified to test
static groups instead. Functionally the same but without IGMP
reports.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
Upstream CI is frequently running into a situation where
the routes are not being installed. These routes
start at the beginning and suddenly in the middle
they start working properly.
D   1.0.15.183/32 [150/0] via 192.168.0.1, r1-eth0 inactive, weight 1, 00:10:17
                          via 192.168.1.1, r1-eth1 inactive, weight 1, 00:10:17
D   1.0.15.184/32 [150/0] via 192.168.0.1, r1-eth0 inactive, weight 1, 00:10:17
                          via 192.168.1.1, r1-eth1 inactive, weight 1, 00:10:17
D   1.0.15.185/32 [150/0] via 192.168.0.1, r1-eth0 inactive, weight 1, 00:10:17
                          via 192.168.1.1, r1-eth1 inactive, weight 1, 00:10:17
D>* 1.0.15.186/32 [150/0] via 192.168.0.1, r1-eth0, weight 1, 00:10:17
  *                       via 192.168.1.1, r1-eth1, weight 1, 00:10:17
D>* 1.0.15.187/32 [150/0] via 192.168.0.1, r1-eth0, weight 1, 00:10:17
  *                       via 192.168.1.1, r1-eth1, weight 1, 00:10:17
D>* 1.0.15.188/32 [150/0] via 192.168.0.1, r1-eth0, weight 1, 00:10:17
Turning on some debugs showed that the routes that failed to
install were being matched against the default route, which implies
that not all of the connected routes for the test were installed
yet. Let's modify the test(s) on startup to ensure
that the connected routes are installed correctly. I am no
longer seeing the problem after this change.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The all_protocol_startup topotest needs to allow for some delay
between configuring nexthop-groups and their installation. Add
some wait periods in a couple of nhg test cases.
Signed-off-by: Mark Stapp <mjs@cisco.com>
Vtysh has been improved to start up very quickly. This exposed a race in this
test, where the `clear ip rip...` command ran before the test client that
handles it had finished connecting to mgmtd. Add a retried check for the test
client being connected before issuing the `clear ip rip ...` test command.
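
A retried check of this kind can be sketched as a small polling helper. The `wait_until` name, its defaults and the `check` callable are assumptions for the example; in the test the check would inspect whether the test client is connected.

```python
import time


def wait_until(check, count=20, wait=0.1):
    """Poll check() up to count times, sleeping wait seconds in between.

    Returns True as soon as check() succeeds, False if it never does.
    """
    for _ in range(count):
        if check():
            return True
        time.sleep(wait)
    return False
```

The test command is then issued only after `wait_until(...)` has returned True, which closes the window in which the command could run too early.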
Signed-off-by: Christian Hopps <chopps@labn.net>
Add some timers to make convergence happen as fast as possible
when a connection fails on the initial attempt.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add some timers to make the convergence happen as fast as possible
when a connection fails on the initial attempt.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When running bfd_bgp_cbit_topo3 and an initial connection
goes wrong, try to connect again as fast as possible, since
the timer is otherwise 2 minutes and the test will
never recover from it.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This test is frequently failing in the upstream CI. Most
log failures are stating that we expected something like
1 million routes but we have 900k+. Looks like the system
is just loaded a bit more than expected. Let's give these
tests a bit more time to complete.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Store a parsed and built graph of the CLI nodes in vtysh, rather than
parsing and building that graph every time vtysh starts up.
This provides a 3x to 5x reduction in vtysh startup overhead:
`vtysh -c 'configure' -c 'interface lo' -c 'do show version'`
- before: 92.9M cycles, 1114 samples
- after: 16.5M cycles, 330 samples
This improvement is particularly visible for users scripting `vtysh -c`
calls, which notably includes topotests.
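
The quoted range follows directly from the two measurements above:

```python
# Measurements copied from the before/after figures above.
cycles_before, cycles_after = 92.9e6, 16.5e6
samples_before, samples_after = 1114, 330

cycle_ratio = cycles_before / cycles_after
sample_ratio = samples_before / samples_after

print(f"cycles: {cycle_ratio:.1f}x, samples: {sample_ratio:.1f}x")
```

This gives about 5.6x by cycles and 3.4x by samples.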
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Modify the existing MSDP topology to test SA filtering:
- Add new multicast host (so we get two sources for same group)
- Test group only filtering
- Test source / group filtering
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
So the test script is making changes to a vpn configuration by
changing something fundamental about the vpn. This is causing
a window where routes we are interested in are:
present ( from pre-change ) then
withdrawn ( the test change causes this ) then
present ( with the new data )
The test code was trying to test for this by checking
to see if the prefix was there, but due to timing issues
it's not always there when we look for it.
Modify the test to get the vpn table version prior to
the change (as it should not be moving around) and
then change the test for the prefix to look for a version
that is later than the vpn's table version. Then we know
that it is *after* everything has stabilized again.
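
The version comparison can be sketched as below. The dictionary layout and the `version` key are assumptions made for the example; the real test reads these values from the bgp JSON output.

```python
def prefix_is_stable(prefix_entry, table_version_before):
    """True once the prefix carries a version newer than the saved one.

    prefix_entry is None while the prefix is withdrawn (hypothetical shape).
    """
    if prefix_entry is None:
        return False
    return prefix_entry.get("version", 0) > table_version_before


# Table version saved before the configuration change (made-up number).
saved_version = 12

print(prefix_is_stable({"version": 12}, saved_version))  # old entry: False
print(prefix_is_stable(None, saved_version))             # withdrawn: False
print(prefix_is_stable({"version": 15}, saved_version))  # re-learned: True
```

Checking for mere presence cannot distinguish the first state from the third, while the version comparison can.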
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
a) Make timers more aggressive for this test
b) Double run_and_expect time for one sub test.
These two changes cause this test to pass regularly for
me when this test used to fail regularly for me.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Under some circumstances the session may briefly come UP between
`clear bgp ...` and `shutdown`. That leads to the session being UP and
the stale routes being cleared quickly.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
These failing tests are hard to track down. Added numbering to each assert
to easily tell where the test fails.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
When enabling "mpls ldp-sync" under "router ospf", ospfd calls
SET_FLAG(ldp_sync_info->flags, LDP_SYNC_FLAG_IF_CONFIG) so that it knows internally
that the ldp-sync feature is enabled. However, the flag is not cleared when
turning off the feature using "no mpls ldp-sync"!
https://github.com/FRRouting/frr/issues/16375
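
The intended symmetry between setting and clearing the flag can be shown with plain bit operations. This is a Python illustration only; the bit value and the helper names are hypothetical and do not come from the ospfd sources.

```python
LDP_SYNC_FLAG_IF_CONFIG = 0x01  # hypothetical bit value for illustration


def set_flag(flags, flag):
    """Return flags with the given bit set."""
    return flags | flag


def unset_flag(flags, flag):
    """Return flags with the given bit cleared."""
    return flags & ~flag


def apply_ldp_sync(flags, enable):
    """Mirror the CLI: set the flag on enable, clear it on the 'no' form."""
    if enable:
        return set_flag(flags, LDP_SYNC_FLAG_IF_CONFIG)
    return unset_flag(flags, LDP_SYNC_FLAG_IF_CONFIG)
```

If the clearing branch is missing, the flag stays set after the "no" form and the feature still looks enabled internally, which is the bug described here.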
Signed-off-by: Christian Breunig <christian@breunig.cc>