mirror_frr/tests/topotests/all_protocol_startup
Donald Sharp a272a2b364 zebra: Allow longer prefix matches for nexthops
Zebra currently does a shortest prefix match for
resolving nexthops for a prefix.  This is typically
an ok thing to do but fails in several specific scenarios.
If a nexthop matches to a route that is not usable, nexthop
resolution just gives up and refuses to use that particular
route.  For example if zebra currently has a covering prefix
say a 10.0.0.0/8.  And about the same time it receives a
10.1.0.0/16 ( a more specific than the /8 ) and another
route A, who's nexthop is 10.1.1.1.  Imagine the 10.1.0.0/16
is processed enough to know we want to install it and the
prefix is sent to the dataplane for installation( it is queued )
and then route A is processed, nexthop resolution will fail
and the route A will be left in limbo as uninstallable.

Let's modify the nexthop resolution code in zebra such that
if a nexthop's most specific match is unusable, continue looking
up the table till we get to the 0.0.0.0/0 route( if it's even
installed ).  If we find a usable route for the nexthop accept
it and use it.

The bgp_default_originate topology test is frequently failing
with this exact problem:

B>* 0.0.0.0/0 [200/0] via 192.168.1.1, r2-r1-eth0, weight 1, 00:00:21
B   1.0.1.17/32 [200/0] via 192.168.0.1 inactive, weight 1, 00:00:21
B>* 1.0.2.17/32 [200/0] via 192.168.1.1, r2-r1-eth0, weight 1, 00:00:21
C>* 1.0.3.17/32 is directly connected, lo, 00:02:00
B>* 1.0.5.17/32 [20/0] via 192.168.2.2, r2-r3-eth1, weight 1, 00:00:32
B>* 192.168.0.0/24 [200/0] via 192.168.1.1, r2-r1-eth0, weight 1, 00:00:21
B   192.168.1.0/24 [200/0] via 192.168.1.1 inactive, weight 1, 00:00:21
C>* 192.168.1.0/24 is directly connected, r2-r1-eth0, 00:02:00
C>* 192.168.2.0/24 is directly connected, r2-r3-eth1, 00:02:00
B>* 192.168.3.0/24 [20/0] via 192.168.2.2, r2-r3-eth1, weight 1, 00:00:32
B   198.51.1.1/32 [200/0] via 192.168.0.1 inactive, weight 1, 00:00:21
B>* 198.51.1.2/32 [20/0] via 192.168.2.2, r2-r3-eth1, weight 1, 00:00:32

Notice that the 1.0.1.17/32 route is inactive but the nexthop
192.168.0.1 is covered by both the 192.168.0.0/24 prefix( shortest match )
*and* the 0.0.0.0/0 route ( longest match ).  When looking at the logs
the 1.0.1.17/32 route was not being installed because the matching
route was not in a usable state, which is because the 192.168.0.0/24
route was in the process of being installed.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-10-23 08:15:11 -04:00
..
r1 zebra: Show NHT resolve via default status on/off 2023-08-17 21:45:55 +03:00
test_all_protocol_startup.dot tests: Unify directory naming for topotests 2021-05-11 14:14:26 +03:00
test_all_protocol_startup.pdf tests: Unify directory naming for topotests 2021-05-11 14:14:26 +03:00
test_all_protocol_startup.py zebra: Allow longer prefix matches for nexthops 2023-10-23 08:15:11 -04:00