The BGP BMP VRF topotest is difficult to debug. It does not say which
prefix is not received by BGP or BMP when it fails.
Rework the test to convert the actual BMP logs to JSON and compare the
BGP table and BMP server logs output to expected reference JSON files.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
The BGP BMP topotest is difficult to debug. It does not say which prefix
is not received by BGP or BMP when it fails.
Rework the test to convert the actual BMP logs to JSON and compare the
BGP table and BMP server logs output to expected reference JSON files.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
WARNING: topo: Waiting time is too small
(count=10, wait=0.5), using default values (count=20, wait=3)
Supress warning by inscreasing wait time.
Signed-off-by: Loïc Sang <loic.sang@6wind.com>
When a prefix is imported using the "network" command under a vrf, which
is a connected prefix, and in the context of label allocation per
nexthop:
..
>router bgp 1 vrf vrf1
> address-family ipv4 unicast
> redistribute static
> network 172.16.0.1/32 <--- connected network
> network 192.168.106.0/29
> label vpn export auto
> label vpn export allocation-mode per-nexthop
..
We encounter an MPLS entry where the nexthop is the prefix itself:
> 18 BGP 172.16.0.1 -
Actually, when using the "network" command, a bnc context is used, but
it is filled by using the prefix itself instead of the nexthop for other
BGP updates. Consequently, when picking up the original nexthop for
label allocation, the function behaves incorrectly. Instead ensure that
the nexthop type of bnc->nexthop is not a nexthop_ifindex; otherwise
fallback to the per vrf label.
Update topotests.
Signed-off-by: Loïc Sang <loic.sang@6wind.com>
bmpserver infinitely loops after the clients has closed the TCP session.
In this situation, recv() returns empty data.
Detect session close immediately.
Fixes: 875511c466 ("topotests: add basic bmp collector")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
In this topotest, the route reflector advertises a route with the
aigp attribute to its client, some with the nexthop unchanged and
some with the nexthp changed. Different aigp values are sent to
the clients depending on the nexthop setting.
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
a local logger masks the global logger and prevents errors from being
gracefully handled within topotest.py
Signed-off-by: Liam Brady <lbrady@labn.net>
Move the various destinations handling into lib/memory.c, include
"normal" logging as target, and make `ACTIVEATEXIT` properly non-error
as it was intended to be.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Add a new topotest for getting the aigp from the "igp-metric"
for a redistributed route (ospf route in the test).
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
Update IS-IS fuzz test to match corrected output after change in the
display of IPv4 mapped IPv6 address.
The update was performed using wuschl [1] like this:
$ wuschl rebuild tests/isisd/test_fuzz_isis_tlv
$ gzip -9 tests/isisd/test_fuzz_isis_tlv_tests.h
[1] https://pypi.org/project/wuschl/
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Fix and adjust the topotest post the fix for route selection with
AIGP.
When there are multiple IGP domains (OSPF in this case), the nexthop
for a BGP route with the AIGP attribute must be resolved in its own
IGP domain.
The changes in r2/bgpd.conf and r3/bgpd.conf are needed as incorrect
IGP metrics are received from NHT for the recursive nexthops. Once
the issue is resolved, the changes can be reverted.
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
AS 65000 | AS 65001
|
RR |
| |
R1 --- | --- R2
|
When r1 peer is an iBGP route reflector client of rr and r2 peer is a
eBGP neighbor of rr, and all three routers shares the same network, r2
receives announcements coming from r1 with a IPv6 link-local nexthop
from rr. This is incorrect as r2 should send traffic to r1 without
involving rr.
Do not send an IPv6 link-local nexthop if the originating peer is a
route-reflector client.
Link: https://github.com/FRRouting/frr/pull/16219#issuecomment-2397425505
Link: https://github.com/FRRouting/frr/pull/17037#discussion_r1792529683
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Add test to check BMP in VRF.
Note that the following configuration works with interface r1-eth0
towards 192.0.2.10 (BMP collector) in the default VRF but not in vrf1.
> router bgp 65501 vrf vrf1
> bmp targets bmp1
> bmp connect 192.0.2.10 port 1789 min-retry 100 max-retry 10000
Also, for some reasons, the test works even without "bgpd: bmp loc-rib
peer up/down for vrfs" commit.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Unset enforce-first-as on r3 of bgp_route_server_client to enable the
reception of routes on this router.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Rework bgp_route_server_client in a more standard form in order to
facilitate the next commut changes. Cosmetic change.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Mgmtd makes use of libyang's internal ietf-yang-library module to add
support for said module to FRR management. Previously, mgmtd was loading
this module explicitly; however, that required that libyang's
`ietf-yang-library.yang` module definition file be co-located with FRR's
yang files so that it (and ietf-datastore.yang) would be found when
searched for by libyang using FRRs search path. This isn't always the
case depending on how the user compiles and installs libyang so mgmtd
was failing to run in some cases.
Instead of doing it the above way we simply tell libyang to load it's
internal version of ietf-yang-library when we initialize the libyang
context.
This required adding a boolean to a couple of the init functions which
is why so many files are touched (although all the changes are minimal).
Signed-off-by: Christian Hopps <chopps@labn.net>
It's replaced and simplified by c3fd1e9520c619babb3004cea6df622ca67b0dfa.
JSON topo is just horrible to debug.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Currently bgp multipath has these properties:
a) mp_info may or may not be on a single path, based
upon path perturbations in the past.
b) mp_info->count started counting at 0( meaning 1 ). As that the
bestpath path_info was never included in the count
c) The first mp_info in the list held the multipath data associated
with the multipath. As such if you were at any other node that data
was not filled in.
d) As such the mp_info's that are not first on the list basically
were just pointers to the corresponding bgp_path_info that was in
the multipath.
e) On bestpath calculation, a linklist(struct linklist *) of bgp_path_info's was
created.
f) This linklist was passed in to a comparison function that took the
old mpinfo list and compared it item by item to the linklist and
doing magic to figure out how to create a new mp_info list.
g) the old mp_info and the link list had to be memory managed and
freed up.
h) BGP_PATH_MULTIPATH is only set on non bestpath nodes in the
multipath.
This is really complicated. Let's change the algorithm to this:
a) When running bestpath, mark a bgp_path_info node that could be in the ecmp path as
BGP_PATH_MULTIPATH_NEW.
b) When running multipath, just walk the list of bgp_path_info's and if
it has BGP_PATH_MULTIPATH_NEW on it, decide if it is in BGP_MULTIPATH.
If we run out of space to put in the ecmp, clear the flag on the rest.
c) Clean up the counting of sometimes adding 1 to the mpath count.
d) Only allocate a mpath_info node for the bestpath. Clean it up
when done with it.
e) remove the unneeded list management associated with the linklist and
the mp_list.
This greatly simplifies multipath computation for bgp and reduces memory
load for large scale deployments.
2 full feeds in work_queue_run prior:
0 56367.471 1123 50193 493695 50362 493791 0 0 0 TE work_queue_run
BGP multipath info : 1941844 48 110780992 1941844 110780992
2 full feeds in work_queue_run after change:
1 52924.931 1296 40837 465968 41025 487390 0 0 1 TE work_queue_run
BGP multipath info : 970860 32 38836880 970866 38837120
Aproximately 4 seconds of saved cpu time for convergence and ~75 mb
smaller run time.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
With AutoRP discovery running by default, that adds a new
IGMP group that needs to be accounted for in IGMP output.
For pim.py
The clear IGMP interfaces function is in a broken state. It was
already ignoring any errors and returned true always, but with
the addition of the AutoRP discovery group, you could end up
with a different group order in the json which would cause a key
error making the test fail. For now I just added a check to
avoid the key error.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
With AutoRP discovery running by default, that adds a new
IGMP group that needs to be accounted for in IGMP output.
For multicast_pim_sm_topo3:
Ignore the total group number as it is unnecessary for the test.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
Uses hardcoded sample AutoRP packets injected in to test
message parsing and proper application of AutoRP learned
RP info. Tests mix of AutoRP and static RP's.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
Range is wrong. We want values 1 and 2 but we only test 1.
> >>> for i in range(1, 2):
> ... print(i)
> ...
> 1
Fixes: abd2a1ff3f ("tests: Test some basic kernel <-> zebra interactions")
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
There are no tests that ensured that turning off then on
v4 and v6 forwarding actually worked. This does so.
This was found via looking at the code coverage.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
None of the bgp dump code was even tested. Add a bit
of basic stuff that it at least generates a dump file.
This can be extended at a future time.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
In ospf_multi_vrf_bgp_route_leak, the admin distance for the
redistributed ospf route should be 110, and should remain as 110 after
it's imported into another vrf, and then downloaded to zebra.
Signed-off-by: Enke Chen <enchen@paloaltonetworks.com>
Apparently logger.warn is being deprecated. So let's
switch over to logger.warning. Clearly it's better
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
If the developer pass way too low timers, we end up with most likely false-positive
situations for random tests under a high load of the system.
It would be better to fallback to the minimum default values for such a cases.
E.g.:
```
WARNING: topo: Waiting time is too small (count=1, wait=0.5), using default values (count=20, wait=3)
```
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
No need to do 'no set as-path exclude' to replace the current rule by
another. The code is supposed to support the replacement.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Check that the following does not cause a crash:
> route-map r2 permit 6
> set as-path exclude 65555
> set as-path exclude as-path-access-list NON-EXISTING
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
This allows retrying functions to possibly change their logging level
for diagnostics.
In order to maintain backward compatibility with this longstanding
function we catch the specific exception of it not being handled by the
retrying function and call again w/o the argument.
Signed-off-by: Christian Hopps <chopps@labn.net>
Enabe/fix using a munet.yaml config file for topology configuration.
Easier test writing.
This also uses the standard `frrinit.sh` to launch and teardown
FRR, so we actually test what most users use.
Signed-off-by: Christian Hopps <chopps@labn.net>
When trying to track down a MTYPE_TMP memory leak
it's harder to search for it when you happen to
have some usage of ttable_dump. Let's just give
it it's own memory type so that we can avoid
confusion in the future.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Fully check the NHRP convergence after setting nhs1 down. Otherwise the
ping may pass because the previous shortcut is still present.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
After setting down nhs1, the test is checking that nhc1 routing table
matches routes in nhc1/nhrp_route.json. It is incorrect because it
checks that the NHRP route to nhs1 is still present but it should have
disappeared.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Rename router variables in nhrp_redundancy to match the actual name.
Cosmetic change to help debugging.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
The expected prefix should be 5.5.5.0/24 otherwise the hosts behind NHRP
client 1 nhc1 (aka. r5) are not reachable via NHRP.
The issue was not seen in the FRR official CI because the tests were
skipped because iptables were missing in CI machines.
It solves the 16690 issue.
Fixes: https://github.com/FRRouting/frr/issues/16690
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Recent commits moved the default retries to 60, but
the higher ecmp counts were over-riding to 40. Let's
make it 80.
Noticed this when I went looking at failures on 386 platforms
in our ci. Route scale is timing out when deleting routes.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Test fails:
test_func = partial(
topotest.router_json_cmp,
router,
"show ip ospf vrf {0}-ospf-cust1 json".format(rname),
expected,
)
_, diff = topotest.run_and_expect(test_func, None, count=10, wait=0.5)
assertmsg = '"{}" JSON output mismatches'.format(rname)
> assert diff is None, assertmsg
E AssertionError: "r1" JSON output mismatches
E assert Generated JSON diff error report:
E
E > $->r1-ospf-cust1->areas->0.0.0.0->nbrFullAdjacentCounter: output has element with value '1' but in expected it has value '2'
/home/sharpd/frr2/tests/topotests/ospf_netns_vrf/test_ospf_netns_vrf.py:239: AssertionError
Support bundle has this data:
r1# show ip ospf vrf all neighbor
% 2024/08/28 14:55:54.763
VRF Name: r1-ospf-cust1
Neighbor ID Pri State Up Time Dead Time Address Interface RXmtL RqstL DBsmL
10.0.255.3 1 Full/DR 10.547s 39.456s 10.0.3.1 r1-eth1:10.0.3.2 0 0 0
10.0.255.2 1 Full/Backup 0.543s 38.378s 10.0.3.3 r1-eth1:10.0.3.2 1 0 0
So immediately after the test fails this test, the neighbor comes up.
Let's give the test a bit more time for failure to not happen
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Giving only 5 seconds to pass bgp data to peers on a heavily
loaded system is a recipe for not having fun. Add more time.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This test was killing bgp on r1 and r2
and then immediately testing that the
default route transitioned. Unfortunately
the test was written that under load the
system might be in a bad state. Let's
modify the code to check for a bgp version
change and then that the bgp state has
come back up
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Implement common code for debug config output and remove daemon-specific
code that is duplicated everywhere.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
The debug library allows to register a `debug_set_all` callback which
should enable all debugs in a daemon. This callback is implemented
exactly the same in each daemon. Instead of duplicating the code, rework
the lib to allow registration of each debug type, and implement the
common code only once in the lib.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
a) A noprefix address by itself should not create a connected route.
This was pre-existing.
b) A noprefix address with a corresponding route should result in a
connected route. This is how NetworkManager appears to work.
This is new behavior, so a new test.
c) A route is added to the system from someone else.
This is new behavior, so a new test.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The dead timer is set to 4 seconds, while the hello interval is set to 6535.
This test will only pass of the platform is fast enough for ospfv3 to
converge in 4 seconds. These timers were already tested multiple time earlier.
This test should just make sure that the boundary value 65535 is configurable,
Other changes in this commit:
- add sequence numbers to the dead intervals tests to make it easier to
track test faliures.
- swap the config order in one test to match order with all other tests.
Signed-off-by: Jafar Al-Gharaibeh <jafar@atcorp.com>
Current code adds a new vlan interface, sets up ospf and
pim on it and immediately starts shoving data down the pipes.
This of course has the fun thing where the IGP and pim do not
always come up in a nice neat manner and the test is looking
for state from a nice neat come up, even though pim is `working`
correctly it is not correct for what the test wants.
Modify the code to ensure that ospf is up and has propagated
the route where it is needed as well as that pim neighbors have
properly come up, then initiate the multicast streams and igmp
reports.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Changes:
- mutini: handle possible missed zombie cleanup leading to test hangs
- mutini: also we avoid logging in the signal handler which was causing
an exception.
Signed-off-by: Christian Hopps <chopps@labn.net>
All routes received by zebra from upper level protocols have a weight
of 1. Let's just make everything extremely consistent in our code.
Lot's of tests needed to be fixed up to make this work.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This test checks the bgp crash on rt2 when 2 commands
launched consequently:
T0: rr, config -> router bgp 65004 -> neighbor 192.168.12.2 password 8888
T1: rt2, snmpwalk -v 2c -c public 127.0.0.1 .1.3.6.1.4.1.7336.4.2.1
T2: test if rt2 bgp is crashed.
Signed-off-by: Dmytro Shytyi <dmytro.shytyi@6wind.com>
Copied the existing "join-group" test and modified to test
static groups instead. Functionally the same but without IGMP
reports.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
Upstream CI is frequently running into a situation where
the routes are not being installed. These routes
start at the beginning and suddenly in the middle
they start working properly.
D 1.0.15.183/32 [150/0] via 192.168.0.1, r1-eth0 inactive, weight 1, 00:10:17
via 192.168.1.1, r1-eth1 inactive, weight 1, 00:10:17
D 1.0.15.184/32 [150/0] via 192.168.0.1, r1-eth0 inactive, weight 1, 00:10:17
via 192.168.1.1, r1-eth1 inactive, weight 1, 00:10:17
D 1.0.15.185/32 [150/0] via 192.168.0.1, r1-eth0 inactive, weight 1, 00:10:17
via 192.168.1.1, r1-eth1 inactive, weight 1, 00:10:17
D>* 1.0.15.186/32 [150/0] via 192.168.0.1, r1-eth0, weight 1, 00:10:17
* via 192.168.1.1, r1-eth1, weight 1, 00:10:17
D>* 1.0.15.187/32 [150/0] via 192.168.0.1, r1-eth0, weight 1, 00:10:17
* via 192.168.1.1, r1-eth1, weight 1, 00:10:17
D>* 1.0.15.188/32 [150/0] via 192.168.0.1, r1-eth0, weight 1, 00:10:17
Turning on some debugs showed that the failed installed routes are
trying to be matched against the default route. Thus implying
all the connected routes for the test are not yet successfully
installed. Let's modify the test(s) on startup to just ensure
that the connected routes are installed correctly. I am no
longer seeing the problem after this change.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The all_protocol_startup topotest needs to allow for some delay
between configuring nexthop-groups and their installation. Add
some wait periods in a couple of nhg test cases.
Signed-off-by: Mark Stapp <mjs@cisco.com>
Vtysh has been improved to startup very quickly this exposed a race in this
test, where the `clear ip rip...` command ran before the test client that
handles it had finished connecting to mgmtd. Add a retried check for the test
client being connected before issuing the `clear ip rip ...` test command.
Signed-off-by: Christian Hopps <chopps@labn.net>
Add some timers to make convergence happan as fast as possible
when a connection fails on the intial attempt.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add some timers to make the convergence happen as fast as possible
when a connection fails on the initial attempt.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When running bfd_bgp_cbit_topo3 and an intial connection
goes wrong, try to connect again as fast as possible as
that the timer is 2 minutes otherwise and the test will
never come back from it.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This test is frequently failing in the upstream CI. Most
log failures are stating that we expected something like
1 million routes but we have 900k+. Looks like the system
is just loaded a bit more than expected. Let's give these
tests a bit more time to complete.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Store a parsed and built graph of the CLI nodes in vtysh, rather than
parsing and building that graph every time vtysh starts up.
This provides a 3x to 5x reduction in vtysh startup overhead:
`vtysh -c 'configure' -c 'interface lo' -c 'do show version'`
- before: 92.9M cycles, 1114 samples
- after: 16.5M cycles, 330 samples
This improvement is particularly visible for users scripting `vtysh -c`
calls, which notably includes topotests.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Modify existing MSDP topology to use test SA filtering:
- Add new multicast host (so we get two sources for same group)
- Test group only filtering
- Test source / group filtering
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
So the test script is making changes to a vpn configuration by
changing something fundamental about the vpn. This is causing
a window where routes we are interested in are:
present ( from pre-change ) then
withdrawn ( the test change causes this ) then
present ( with the new data )
The test code was trying to test for this by checking
to see if the prefix was there, but due to timing issues
it's not always there when we look for it.
Modify the test to get the vpn table version prior to
the change( as that it should not be moving around ) and
then change the test for the prefix to look for a version
that is later than the vpn's table version. Then we know
that it is *after* everything has stabilized again.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
So the test script is making changes to a vpn configuration by
changing something fundamental about the vpn. This is causing
a window where routes we are interested in are:
present ( from pre-change ) then
withdrawn ( the test change causes this ) then
present ( with the new data )
The test code was trying to test for this by checking
to see if the prefix was there, but due to timing issues
it's not always there when we look for it.
Modify the test to get the vpn table version prior to
the change( as that it should not be moving around ) and
then change the test for the prefix to look for a version
that is later than the vpn's table version. Then we know
that it is *after* everything has stabilized again.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
a) Make timers more aggressive for this test
b) Double run_and_expect time for one sub test.
These two changes cause this test to pass regularly for
me when this test used to fail regularly for me.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Under some circumstances it might happen that the session is quickly UP in the
middle of `clear bgp ...` and `shutdown`. That leads to session be UP, and
the stale routes being cleared quickly.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
These failing tests are hard to track down. Added numbering to each assert
to easily tell where the test fails.
Signed-off-by: Nathan Bahr <nbahr@atcorp.com>
When enabling "mpls ldp-sync" under "router ospf" ospfd configures
SET_FLAG(ldp_sync_info->flags, LDP_SYNC_FLAG_IF_CONFIG) so internally knowing
that the ldp-sync feature is enabled. However the flag is not cleared when
turning of the feature using "nompls ldp-sync"!
https://github.com/FRRouting/frr/issues/16375
Signed-off-by: Christian Breunig <christian@breunig.cc>
The bgp_duplicate_nexthop test installs routes with nexthop's
flags set to both DUPLICATE and FIB: this should not happen.
The DUPLICATE flag of a nexthop indicates this nexthop is already
used in the same nexthop-group, and there is no need to install it
twice in the system; having the FIB flag set indicates that the
nexthop is installed in the system. This is why both flags should
not be set on the same nexthop.
This case happens at installation time, but can also happen
at update time.
- Fix this by not setting the FIB flag value when the DUPLICATE
flag is present.
- Modify the bgp_duplicate_test to check that the FIB flag is not
present on duplicated nexthops.
- Modify the bgp_peer_type_multipath_relax test.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
During the bgp_peer_type_multipath_relax_test, the test does not
check the 'duplicate' flag value of the duplicate nexthop.
Fix this by adding the duplicate value in the expected json files.
Fixes: ee88563ac2 ("bgpd: Add 'bgp bestpath peer-type multipath-relax'")
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
1. The lsp_update_data() function will check for the presence of the ISIS_TLV_DYNAMIC_HOSTNAME in the LSP, and then call isis_dynhn_insert() to add a hostname entry corresponding to the LSP ID. However, when the ISIS_TLV_DYNAMIC_HOSTNAME is not present in the LSP, the hostname entry corresponding to the LSP ID should also be deleted.
2. The command “show isis neighbor” invokes isis_adj_name() to display the System ID or hostname, but it does not check the area->dynhostname flag.
3. When the LSP expires and is removed, the corresponding hostname entry should also be deleted.
4. The TLV for LSP fragmentation will not contain the hostname and should be skipped.
Signed-off-by: zhou-run <zhou.run@h3c.com>
Since the displayed header of "show ip rip" and "show ipv6 ripng" are changed,
we should update tests of ripd and ripngd.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
In some cases (large scale) it's desired to avoid changing configurations, but
let the BGP to automatically handle ASN changes.
`auto` means the peering can be iBGP or eBGP. It will be automatically detected
and adjusted from the OPEN message.
Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
Current code when a link is set down is to just mark the
nexthop group as not properly setup. Leaving situations
where when an interface goes down and show output is
entered we see incorrect state. This is true for anything
that would be checking those flags at that point in time.
Modify the interface down nexthop group code to notice the
nexthops appropriately ( and I mean set the appropriate flags )
and to allow a `show ip route` command to actually display
what is going on with the nexthops.
eva# show ip route 1.0.0.0
Routing entry for 1.0.0.0/32
Known via "sharp", distance 150, metric 0, best
Last update 00:00:06 ago
* 192.168.44.33, via dummy1, weight 1
* 192.168.45.33, via dummy2, weight 1
sharpd@eva:~/frr1$ sudo ip link set dummy2 down
eva# show ip route 1.0.0.0
Routing entry for 1.0.0.0/32
Known via "sharp", distance 150, metric 0, best
Last update 00:00:12 ago
* 192.168.44.33, via dummy1, weight 1
192.168.45.33, via dummy2 inactive, weight 1
Notice now that the 1.0.0.0/32 route now correctly
displays the route for the nexthop group entry.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add a new start option "-K" to libfrr to denote a graceful start,
and use it in zebra and bgpd.
zebra will use this option to denote a planned FRR graceful restart
(supporting only bgpd currently) to wait for a route sync completion
from bgpd before cleaning up old stale routes from the FIB. An optional
timer provides an upper-bounds for this cleanup.
bgpd will use this option to denote either a planned FRR graceful
restart or a bgpd-only graceful restart, and this will drive the BGP
GR restarting router procedures.
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
The current OSPF neighbor retransmission operates on a single per-neighbor
periodic timer that sends all LSAs on the list when it expires.
Additionally, since it skips the first retransmission of received LSAs so
that at least the retransmission interval (resulting in a delay of between
the retransmission interval and twice the interval. In environments where
the links are lossy on P2MP networks with "delay-reflood" configured (which
relies on neighbor retransmission in partial meshs), the implementation
is sub-optimal (to say the least).
This commit reimplements OSPF neighbor retransmission as follows:
1. A new data structure making use the application managed
typesafe.h doubly linked list implements an OSPF LSA
list where each node includes a timestamp.
2. The existing neighbor LS retransmission LSDB data structure
is augmented with a pointer to the list node on the LSA
list to faciliate O(1) removal when the LSA is acknowledged.
3. The neighbor LS retransmission timer is set to the expiration
timer of the LSA at the top of the list.
4. When the timer expires, LSAs are retransmitted that within
the window of the current time and a small delta (50 milli-secs
default). The LSAs that are retransmited are given an updated
retransmission time and moved to the end of the LSA list.
5. Configuration is added to set the "retransmission-window" to a
value other than 50 milliseconds.
6. Neighbor and interface LSA retransmission counters are added
to provide insight into the lossiness of the links. However,
these will increment quickly on non-fully meshed P2MP networks
with "delay-reflood" configured.
7. Added a topotest to exercise the implementation on a non-fully
meshed P2MP network with "delay-reflood" configured. The
alternative was to use existing mechanisms to instroduce loss
but these seem less determistic in a topotest.
Signed-off-by: Acee Lindem <acee@lindem.com>
The atomlist test consists of a sequence of (MT) sub-tests, from which
counters are collected and verified. TSAN doesn't know that these
counters are synchronized by way of the sub-test starting and finishing,
so it complains. Just use atomics to get rid of the warning.
(This is solely an issue with the test, not the atomlist code. There
are no warnings from that.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
TSAN warns about leaving the second thread dangling. Doesn't really
matter, but just add a pthread_join to get rid of the warning.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
seqlock_bump() used to return the value before bumping, but that's
unhelpful if you were to actually need it. I had changed it to return
the value after, but the update to the test got lost at some point.
The return value is not in fact used anywhere in FRR, so while it is
a bug, it has zero impact.
NB: yes, test_seqlock is not run, which sounds wrong. The problem here
is that (a) the test itself uses sleeps and is timing sensitive, which
would raise false positives. And (b), the test is meaningless if
executed once. It needs to be run millions of times under various
conditions (e.g. load) to catch rare races, and it needs to be run on
machines with "odd" memory models (in this case I used BE ppc32 and
ppc64 systems as test platforms.)
Fixes: 6046b690b5 ("lib/seqlock: avoid syscalls in no-waiter cases")
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
The command "show isis topology" calls print_sys_hostname() to display the system ID or hostname, but it does not check the area->dynhostname flag.
Signed-off-by: zhou-run <zhou.run@h3c.com>
Add a topotest that ensures that when addpath is enabled and two
paths with same nexthop are received, they are sent to ZEBRA which
detects 'duplicate nexthop'.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Some tests may want to use the json facility of iproute2 to
dump some results.
Add an internal API in lib/topotest.py that tells whether iproute2
is json capable or not.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>