Commit Graph

289 Commits

Author SHA1 Message Date
Donald Sharp
e16d030c65 *: Convert THREAD_XXX macros to EVENT_XXX macros
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
5f6eaa9b96 *: Convert a bunch of thread_XX to event_XX
Convert these functions:

thread_getrusage
thread_cmd_init
thread_consumed_time
thread_timer_to_hhmmss
thread_is_scheduled
thread_ignore_late_timer

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
907a2395f4 *: Convert thread_add_XXX functions to event_add_XXX
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
e6685141aa *: Rename struct thread to struct event
Effectively a massive search and replace of
`struct thread` to `struct event`.  Using the
term `thread` gives people the thought that
this event system is a pthread when it is not

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
8383d53e43
Merge pull request #12780 from opensourcerouting/spdx-license-id
*: convert to SPDX License identifiers
2023-02-17 09:43:05 -05:00
Stephen Worley
0bbad9d19a zebra: clang-format style fixes
clang-format style fixes

Signed-off-by: Stephen Worley <sworley@nvidia.com>
2023-02-13 18:12:05 -05:00
Stephen Worley
371298399e zebra: account for non-evpn ecmp
Account for non-evpn nexthops in ecmp groups when
doing the DVNI check.

Signed-off-by: Stephen Worley <sworley@nvidia.com>
2023-02-13 18:12:05 -05:00
Stephen Worley
b991a37262 zebra: nhg resolution handler for d-vni
Add code in the nhg resolution path for determining if Downstream
VNI is in play. This is the only place in all of zebra where
we should be arbitrarily setting the ifindex/labels since
this is where new nhgs are created/destroyed. If something
changes, it must happen here.

We determine if D-VNI is being used by matching the carried
label (VNI) on the nexthop with the vrf VNI from the route.
If they do not match, we can assume this is a D-VNI labeled
nexthop.

We loop through all of the group to see if any are D-VNI. If even
one is, we must treat them all as such. Otherwise, fallback to
traditional EVPN route handling and remove all the labels.

If they are going to be treated as D-VNI we retain the labels and
verify the underlying VRF vxlan interface is a Single VXlan Device.
If it is not, we cannot use D-VNI. If it is, continue on. The VNI label
will be encapped via LWTUNNEL and sent to the kernel.
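
The group-wide check described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `struct sketch_nexthop` and `sketch_group_is_dvni()` are hypothetical names, not the zebra data structures, and the Single VXlan Device verification is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified nexthop: only the carried label (VNI) is kept. */
struct sketch_nexthop {
	uint32_t label_vni;
};

/*
 * Return true if any nexthop in the group carries a VNI that differs
 * from the VRF VNI of the route. One mismatch is enough: the whole
 * group is then treated as D-VNI and keeps its labels.
 */
static bool sketch_group_is_dvni(const struct sketch_nexthop *nhs,
				 size_t count, uint32_t vrf_vni)
{
	assert(nhs != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (nhs[i].label_vni != vrf_vni)
			return true;
	}
	return false;
}
```

When the function returns false, the text above says to fall back to traditional EVPN handling and remove the labels.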

Signed-off-by: Stephen Worley <sworley@nvidia.com>
2023-02-13 18:12:05 -05:00
David Lamparter
acddc0ed3c *: auto-convert to SPDX License IDs
Done with a combination of regex'ing and banging my head against a wall.

Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
2023-02-09 14:09:11 +01:00
Donald Sharp
a98701f053 zebra: Add missing enums to switch statements
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-01-31 15:15:42 -05:00
Donald Sharp
75c87b7279 zebra: i declaration shadows other i declared
Clear up some confusion

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-01-26 11:40:33 -05:00
Siger Yang
c317d3f246
zebra: traffic control state management
This allows Zebra to manage QDISC, TCLASS, TFILTER in kernel and do cleaning
jobs when it starts up.

Signed-off-by: Siger Yang <siger.yang@outlook.com>
2022-11-22 22:35:35 +08:00
Donald Sharp
ca2b346783 *: Add ability to encode / decode resilience down zapi
At this point add ability for the encode/decode of the
resilience down ZAPI to zebra.  Just hookup sharpd
at this point in time.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-11-04 13:34:27 -04:00
Donald Sharp
569e141113 lib, zebra: Add ability to encode/decode resilient nhg's
Add ability to read the nexthop group resilient linux
kernel data as well as write it.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-11-04 13:29:36 -04:00
Donald Sharp
8d4665aabf zebra: Fix handling of recursive routes when processing closely in time
When zebra receives routes from upper level protocols it decodes the
zapi message and places the routes on the metaQ for processing.  Suppose
we have a route A that is already installed by some routing protocol.
And there is a route B that has a nexthop that will be recursively
resolved through A.  Imagine if a route replace operation for A is
going to happen from an upper level protocol at about the same time
the route B is going to be installed into zebra.  If these routes
are received, and decoded, at about the same time there exists a
chance that the metaQ will contain both of them at the same time.
If the order of installation is [ B, A ].  B will be resolved
correctly through A and installed, A will be processed and
re-installed into the FIB.  If the nexthops have changed for
A then the owner of B should be notified about the change( and B
can do the correct action here and decide to withdraw or re-install ).
Now imagine if the order of routes received for processing on the
metaQ is [ A, B ].  A will be received, processed and sent to the
dataplane for reinstall.  B will then be pulled off the metaQ and
fail the install since A is in a `not Installed` state.

Let's loosen the restriction in nexthop resolution for B such
that if the route we are dependent on is a route replace operation
allow the resolution to succeed.  This requires zebra to track a new
route state( ROUTE_ENTRY_ROUTE_REPLACING ) that can be looked at
during nexthop resolution.  I believe this is ok because A is
a route replace operation, which could result in this:
-route install failed, in which case B should be nht'ing and
will receive the nht failure and the upper level protocol should
remove B.
-route install succeeded, no nexthop changes.  In this case
allowing the resolution for B is ok, NHT will not notify the upper
level protocol so no action is needed.
-route install succeeded, nexthops changes.  In this case
allowing the resolution for B is ok, NHT will notify the upper
level protocol and it can decide to reinstall B or not based
upon its own algorithm.
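
A minimal sketch of the loosened resolution check, assuming a simple status bitmask. Only the name ROUTE_ENTRY_ROUTE_REPLACING comes from this message; the `SKETCH_` macros, their bit values, and the helper function are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values, chosen only for this illustration. */
#define SKETCH_ROUTE_ENTRY_INSTALLED (1u << 0)
#define SKETCH_ROUTE_ENTRY_ROUTE_REPLACING (1u << 1)

/*
 * Decide whether route B may resolve its nexthop through route A,
 * given A's status bits. Previously only an installed A qualified;
 * the relaxed rule also accepts an A that is mid route replace.
 */
static bool sketch_can_resolve_through(uint32_t status)
{
	if (status & SKETCH_ROUTE_ENTRY_INSTALLED)
		return true;

	return (status & SKETCH_ROUTE_ENTRY_ROUTE_REPLACING) != 0;
}
```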

This set of events was found by the bgp_distance_change topotest(s).
Effectively the tests were looking for the bug ( A, B order in the metaQ )
as the `correct` state.  When under very heavy load, the A, B ordering
caused A to just be installed and fully resolved in the dataplane before
B is reached ( which is entirely possible ).

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-10-26 15:06:23 -04:00
Donald Sharp
040a0e6d26 zebra: Fix debug of filtering out prefix due to routemap
The debug for notification about a filtered prefix was
just printing the nexthop ifindex and vrf id.  Not all
nexthops have this data.  Just print out the actual nexthop

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-10-20 07:43:45 -04:00
Siger Yang
449a30edf6
zebra: add tc netlink and dplane ops
This commit implements necessary netlink encoders for traffic control
including QDISC, TCLASS and TFILTER, and adds basic dplane operations.

Co-authored-by: Stephen Worley <sworley@nvidia.com>
Signed-off-by: Siger Yang <siger.yang@outlook.com>
2022-08-11 02:32:43 +08:00
Donald Sharp
a69b10c1e6 zebra: Cleanup unguarded debug
Left over debug from earlier commits

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-08-08 09:15:22 -04:00
Donald Sharp
0a5f9773a8 zebra: zrouter.in_shutdown is an atomic variable
So let's treat the variable like it is atomic and
properly load it when we need to look at it.
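
As an illustration only, an explicit atomic load of a shutdown flag could look like this in C11. The variable and function names are hypothetical stand-ins for zrouter.in_shutdown, and the relaxed memory ordering is an assumption of the sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shutdown flag. */
static atomic_bool sketch_in_shutdown;

/* Read the flag with an explicit atomic load instead of a plain read. */
static bool sketch_is_shutting_down(void)
{
	return atomic_load_explicit(&sketch_in_shutdown,
				    memory_order_relaxed);
}

/* Set the flag with an explicit atomic store. */
static void sketch_begin_shutdown(void)
{
	atomic_store_explicit(&sketch_in_shutdown, true,
			      memory_order_relaxed);
}
```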

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-08-05 07:51:27 -04:00
Donald Sharp
d5795103bc zebra: Fix memory leaks and use after frees in nhg's on shutdown
Fixup both memory leaks as well as use after free's in nhg's
on shutdown.

This approach is effectively just iterating through all the
hash items and directly just freeing the memory instead
of handling ref counts or cross references.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-08-05 07:51:27 -04:00
Donald Sharp
34a67a7d1e zebra: When saving nhg for later stop processing
Commit 35729f38fa introduced the idea of
holding a nexthop group for a small amount of time
before removing it from the system.  When this code
was introduced the nexthop group entry was saved
and a timer started, except instead of stopping
processing at that point in time, zebra was
continuing on and deleting nexthop group entries
that that entry depended on as well.  This
should not be done until the timer pops.

Fixes: #11596
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-08-05 07:51:27 -04:00
Donald Sharp
9d1fec4c7e zebra: When deleting nexthop group entries ensure the thread is off
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-07-16 19:00:43 -04:00
Donald Sharp
f00b37e710 zebra: make rib_process_dplane_results own ctx freeing
The rib_process_dplane_results function was having each
sub function handler process the results and then
free the ctx.  That is a lot of functionality that needs to remember
to free the context.  Let's just free it in the main loop.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-06-29 15:24:20 -04:00
Donald Sharp
fc3de981be zebra: Allow kernel routes to stick around better on interface state changes
Currently kernel routes on system bring up would be `auto-accepted`,
then if an interface went down all kernel and system routes would
be re-evaluated.  There exist situations where a kernel route can
exist but the interface itself is not exactly in a state that is
ready to create a connected route yet.  As such when any interface
goes down in the system all kernel/system routes would be re-evaluated
and then since that interface's connected route is not in the table yet
the route is matching against a default route( or not at all ) and
is being dropped.

Modify the code such that kernel or system routes just look for interface
being in a good state (up or operative) and accept it.

Broken code:
eva# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

K>* 0.0.0.0/0 [0/100] via 192.168.119.1, enp39s0, 00:05:08
K>* 1.2.3.5/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:05:08
K>* 1.2.3.6/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:05:08
K>* 1.2.3.7/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:05:08
K>* 1.2.3.8/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:05:08
K>* 1.2.3.9/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:05:08
K>* 1.2.3.10/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:05:08
K>* 1.2.3.11/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:05:08
K>* 1.2.3.12/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:05:08
K>* 1.2.3.13/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:05:08
K>* 1.2.3.14/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:05:08
K>* 1.2.3.16/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:05:08
K>* 1.2.3.17/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:05:08
C>* 4.5.6.99/32 is directly connected, dummy9, 00:05:08
K>* 4.9.10.11/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:05:08
K>* 10.11.12.13/32 [0/0] via 192.168.119.1, enp39s0, 00:05:08
C>* 192.168.10.0/24 is directly connected, dummy99, 00:05:08
C>* 192.168.119.0/24 is directly connected, enp39s0, 00:05:08
<shutdown a non-related interface>
eva# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

K>* 0.0.0.0/0 [0/100] via 192.168.119.1, enp39s0, 00:05:28
C>* 4.5.6.99/32 is directly connected, dummy9, 00:05:28
K>* 10.11.12.13/32 [0/0] via 192.168.119.1, enp39s0, 00:05:28
C>* 192.168.10.0/24 is directly connected, dummy99, 00:05:28
C>* 192.168.119.0/24 is directly connected, enp39s0, 00:05:28

Working code:
eva# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

K>* 0.0.0.0/0 [0/100] via 192.168.119.1, enp39s0, 00:00:04
K>* 1.2.3.5/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:04
K>* 1.2.3.6/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:04
K>* 1.2.3.7/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:04
K>* 1.2.3.8/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:04
K>* 1.2.3.9/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:04
K>* 1.2.3.10/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:04
K>* 1.2.3.11/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:04
K>* 1.2.3.12/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:04
K>* 1.2.3.13/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:04
K>* 1.2.3.14/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:04
K>* 1.2.3.16/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:04
K>* 1.2.3.17/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:04
C>* 4.5.6.99/32 is directly connected, dummy9, 00:00:04
K>* 4.9.10.11/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:04
K>* 10.11.12.13/32 [0/0] via 192.168.119.1, enp39s0, 00:00:04
C>* 192.168.10.0/24 is directly connected, dummy99, 00:00:04
C>* 192.168.119.0/24 is directly connected, enp39s0, 00:00:04
<shutdown a non-related interface>
eva# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

K>* 0.0.0.0/0 [0/100] via 192.168.119.1, enp39s0, 00:00:15
K>* 1.2.3.5/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:15
K>* 1.2.3.6/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:15
K>* 1.2.3.7/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:15
K>* 1.2.3.8/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:15
K>* 1.2.3.9/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:15
K>* 1.2.3.10/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:15
K>* 1.2.3.11/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:15
K>* 1.2.3.12/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:15
K>* 1.2.3.13/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:15
K>* 1.2.3.14/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:15
K>* 1.2.3.16/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:15
K>* 1.2.3.17/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:15
C>* 4.5.6.99/32 is directly connected, dummy9, 00:00:15
K>* 4.9.10.11/32 [0/0] via 172.22.0.44, br-23e378ed7fd2 linkdown, 00:00:15
K>* 10.11.12.13/32 [0/0] via 192.168.119.1, enp39s0, 00:00:15
C>* 192.168.10.0/24 is directly connected, dummy99, 00:00:15
C>* 192.168.119.0/24 is directly connected, enp39s0, 00:00:15
eva#

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-06-23 12:22:30 -04:00
Donald Sharp
c9af62e314 zebra: Add a configurable knob zebra nexthop-group keep (1-3600)
Allow end operator to set how long a nexthop-group is kept around
in the system after it is no longer being used.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-06-16 14:47:19 -04:00
Donald Sharp
35729f38fa zebra: Add a timer to nexthop group deletion
Before deleting nexthop groups, that are installed,
from the system, start a timer and hold the nexthop
group for that time.

Suppose you have this scenario

a) create a static route with 1 x ecmp
      creates a nhg with 1 x ecmp
b) create a static route with 2 x ecmp
      creates a nhg with 2 x ecmp
      deletes a's nhg
c) create a static route with 3 x ecmp
      creates a nhg with 3 x ecmp
      deletes b's nhg
d) create a different route with 1 x ecmp
      creates another 1 x ecmp ( since a's ecmp was deleted )
e) create a different route with 2 x ecmp
      creates another 2 x ecmp ( since b's ecmp was deleted )

If you don't delete the nhg and instead start a timer, the nhg's used
in steps a and b can be reused for steps d and e.  This reduces
overhead work with zebra <-> kernel interactions and improves
the speed of the system.

So modify the code to note that an installed nexthop group should
be kept around a bit and hopefully reused.
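
The hold-and-reuse idea can be sketched as below. This is a simplified model under stated assumptions: the struct, the function names, and the use of plain integer seconds instead of a real timer event are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified nexthop group entry. */
struct sketch_nhg {
	unsigned int refcnt;
	bool installed;
	bool timer_running;
	long expires_at; /* seconds; meaningful only while timer_running */
};

/* Drop a reference. An installed, unreferenced group is held, not deleted. */
static void sketch_nhg_decrement(struct sketch_nhg *nhg, long now,
				 long keep_secs)
{
	assert(nhg->refcnt > 0);
	nhg->refcnt--;
	if (nhg->refcnt == 0 && nhg->installed) {
		nhg->timer_running = true;
		nhg->expires_at = now + keep_secs;
	}
}

/* Take a reference. Reusing a held group cancels its pending deletion. */
static void sketch_nhg_increment(struct sketch_nhg *nhg)
{
	nhg->timer_running = false;
	nhg->refcnt++;
}

/* Timer check: returns true once the hold time passed and the group goes away. */
static bool sketch_nhg_timer_pop(struct sketch_nhg *nhg, long now)
{
	if (!nhg->timer_running || now < nhg->expires_at)
		return false;
	nhg->timer_running = false;
	nhg->installed = false;
	return true;
}
```

In the scenario above, step d would call the increment path on the group left over from step a rather than creating a new one, provided the hold time has not yet expired.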

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-06-16 14:47:19 -04:00
Donald Sharp
382858d015 zebra: Move where zebra marks a nhg as uninstalled in fib
Currently the code is marking the nhg as uninstalled but not
causing that to flood up to the dependent nhgs:

nhg 3 is a group of 1/2
   1 -> interface A
   2 -> interface B

Suppose A goes down, old code would mark nhg 1 as !VALID and !INSTALLED.
Suppose B then goes down, old code would mark nhg 2 as !VALID and !INSTALLED
But would not mark nhg 3 as !VALID and !INSTALLED (sort of assuming that
it would just be cleaned up by NHG refcounts ).  I would prefer that
the code is pedantic about nhg 3 actually being removed from the system.

This code moves the setting of !INSTALLED into zebra_nhg.c where it
really belongs.
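
A rough sketch of the nhg 3 example, assuming a group that simply points at its member entries. The types and the helper are hypothetical and much simpler than the code in zebra_nhg.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_MAX_MEMBERS 4

/* Hypothetical, simplified nexthop group entry state. */
struct sketch_nhe {
	bool valid;
	bool installed;
};

/* Hypothetical group (like nhg 3) built from singleton entries (nhg 1, nhg 2). */
struct sketch_group {
	struct sketch_nhe self;
	struct sketch_nhe *members[SKETCH_MAX_MEMBERS];
	size_t member_count;
};

/*
 * Mark one member as !VALID and !INSTALLED, then propagate the same
 * state up to the group once no valid member is left.
 */
static void sketch_member_down(struct sketch_group *grp,
			       struct sketch_nhe *member)
{
	assert(grp->member_count <= SKETCH_MAX_MEMBERS);

	member->valid = false;
	member->installed = false;

	for (size_t i = 0; i < grp->member_count; i++) {
		if (grp->members[i]->valid)
			return; /* group still has a usable member */
	}
	grp->self.valid = false;
	grp->self.installed = false;
}
```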

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-06-16 14:47:19 -04:00
Donald Sharp
68d188be7a zebra: Convert debugs to use %pNG
The nexthop group debugs were using %u to just display the id.
I found this very hard to figure out what was going on.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-06-14 20:25:56 -04:00
Donald Sharp
cc75cbea1b zebra: Add %pNG to zebra print routines
Add `%pNG` so that a nexthop group can be displayed in debugs/logs
such that it can provide useful information.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-06-14 20:25:56 -04:00
anlan_cs
8e3aae66ce *: remove the checking returned value for hash_get()
Firstly, *keep no change* for `hash_get()` with NULL
`alloc_func`.

Only focus on cases with non-NULL `alloc_func` of
`hash_get()`.

Since `hash_get()` with non-NULL `alloc_func` parameter
shall not fail, just ignore the returned value of it.
The returned value must not be NULL.
So in this case, remove the unnecessary checking NULL
or not for the returned value and add `void` in front
of it.

Importantly, also *keep no change* for the two cases with
non-NULL `alloc_func` -
1) Use `assert(<returned_data> == <searching_data>)` to
   ensure it is a created node, not a found node.
   Refer to `isis_vertex_queue_insert()` of isisd, there
   are many examples of this case in isisd.
2) Use `<returned_data> != <searching_data>` to judge it
   is a found node, then free <searching_data>.
   Refer to `aspath_intern()` of bgpd, there are many
   examples of this case in bgpd.

Here, <returned_data> is the returned value from `hash_get()`,
and <searching_data> is the data, which is to be put into
hash table.

Signed-off-by: anlan_cs <vic.lan@pica8.com>
2022-05-03 00:41:48 +08:00
mobash-rasool
16b5065b47
Merge pull request #10908 from donaldsharp/proto_only_error
zebra: When `zebra nexthop proto only` limit errors
2022-04-19 21:27:29 +05:30
Donald Sharp
1cadfaf213 zebra: When zebra nexthop proto only limit errors
Operators are seeing:

Mar 28 07:19:37 kingpin zebra[418]: [TZANK-DEMSE] netlink_nexthop_msg_encode: nhg_id 68 (zebra): proto-based nexthops only, ignoring
Mar 28 07:19:37 kingpin zebra[418]: [TZANK-DEMSE] netlink_nexthop_msg_encode: nhg_id 68 (zebra): proto-based nexthops only, ignoring
Mar 28 07:19:37 kingpin zebra[418]: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: RTM_DELROUTE 2804:4d48:4000::/42 vrf 0(254)
Mar 28 07:19:37 kingpin zebra[418]: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: RTM_NEWROUTE 2804:4d48:4000::/42 vrf 0(254)
Mar 28 07:19:37 kingpin zebra[418]: [TVM3E-A8ZAG] _netlink_route_build_singlepath: (single-path): 2804:4d48:4000::/42 nexthop via fe80::b6fb:e4ff:fe26:c5d5  if 2 vrf default(0)
Mar 28 07:19:37 kingpin zebra[418]: [HYEHE-CQZ9G] nl_batch_send: netlink-dp (NS 0), batch size=140, msg cnt=2
Mar 28 07:19:37 kingpin zebra[418]: [P2XBZ-RAFQ5][EC 4043309074] Failed to install Nexthop ID (68) into the kernel

When `zebra nexthop proto only` is turned on.

Effectively zebra intentionally does not do the nexthop group installation
and the dplane notification in zebra_nhg.c just assumes it was a failure
and prints an error message.  Since this act was intentional, let's
just notice that it was intentional and not report the message
as a failure.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-04-18 09:41:38 -04:00
Donald Sharp
c9e4abf81f zebra: Allow system routes to recurse through themselves
Currently if an end user has something like this:

Routing entry for 192.168.212.1/32
  Known via "kernel", distance 0, metric 100, best
  Last update 00:07:50 ago
  * directly connected, ens5

Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

K>* 0.0.0.0/0 [0/100] via 192.168.212.1, ens5, src 192.168.212.19, 00:00:15
C>* 192.168.212.0/27 is directly connected, ens5, 00:07:50
K>* 192.168.212.1/32 [0/100] is directly connected, ens5, 00:07:50

And FRR does a link flap, it refigures the route and rejects the default
route:

2022/04/09 16:38:20 ZEBRA: [NZNZ4-7P54Y] default(0:254):0.0.0.0/0: Processing rn 0x56224dbb5b00
2022/04/09 16:38:20 ZEBRA: [ZJVZ4-XEGPF] default(0:254):0.0.0.0/0: Examine re 0x56224dbddc20 (kernel) status: Changed Installed flags: Selected dist 0 metric 100
2022/04/09 16:38:20 ZEBRA: [GG8QH-195KE] nexthop_active_update: re 0x56224dbddc20 nhe 0x56224dbdd950 (7), curr_nhe 0x56224dedb550
2022/04/09 16:38:20 ZEBRA: [T9JWA-N8HM5] nexthop_active_check: re 0x56224dbddc20, nexthop 192.168.212.1, via ens5
2022/04/09 16:38:20 ZEBRA: [M7EN1-55BTH]         nexthop_active: Route Type kernel has not turned on recursion
2022/04/09 16:38:20 ZEBRA: [HJ48M-MB610]         nexthop_active_check: Unable to find active nexthop
2022/04/09 16:38:20 ZEBRA: [JPJF4-TGCY5] default(0:254):0.0.0.0/0: After processing: old_selected 0x56224dbddc20 new_selected 0x0 old_fib 0x56224dbddc20 new_fib 0x0

So the 192.168.212.1 route is matched for the nexthop but it is not connected and
zebra treats it as a problem.  Modify the code such that if a system route
matches through another system route, then it should work imo.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-04-09 13:17:14 -04:00
Donald Sharp
48dc861028 zebra: Allow multiple connected routes to be chosen for kernel routes
This bug should only really affect kernel routes.  To reproduce:

a) Have multiple connected routes that point to the same prefix
swp8  up      default         169.254.0.250/30
swp9  up      default         169.254.0.250/30

b) Have a kernel route that uses one of those connected routes
7.6.2.8 via 169.254.0.249 dev swp8 proto static
(But have it choose a non-selected connected nexthop)

c) Introduce an event that causes the rib table to be reprocessed,
say an unrelated interface going up / down

  This causes the route to be lost with this message:
2022/03/28 21:21:53 ZEBRA: [YXCJP-0WZWV] netlink_nexthop_msg_encode: ID (3454): 169.254.0.249, via swp8(1383) vrf default(0)
2022/03/28 21:21:53 ZEBRA: [YF2E6-J60JH] nexthop_active: 169.254.0.249, via swp8 given ifindex does not match nexthops ifindex found found: directly connected, swp9

Effectively the nexthop that zebra is choosing would not be the one
that the kernel route has chosen and FRR removes the route:
2022/03/28 21:21:53 ZEBRA: [NM15X-X83N9] rib_process: (0:254):7.6.2.8/32: rn 0x56042e632e90, removing re 0x56042e6316e0
2022/03/28 21:21:53 ZEBRA: [Y53JX-CBC5H] rib_unlink: (0:254):7.6.2.8/32: rn 0x56042e632e90, re 0x56042e6316e0
2022/03/28 21:21:53 ZEBRA: [KT8QQ-45WQ0] rib_gc_dest: (0:?):7.6.2.8/32: removing dest from table

What is happening?

Zebra is not looking at all connected routes to see if any of them
would have the appropriate ifindex; it is just blindly rejecting
the route.

So when nexthop resolution happens and it matches a connected
route and the dest->selected nexthop ifindex does not match, let's sort
through the rest of them and see if any of them match and if so
let's keep the route.
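
The difference between the two behaviours can be sketched like this, assuming each connected route entry for the matched prefix is reduced to an ifindex plus a selected flag. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified connected route entry for one prefix. */
struct sketch_connected {
	int ifindex;
	bool selected;
};

/* Old behaviour: only the selected connected route is compared. */
static bool sketch_match_selected_only(const struct sketch_connected *res,
				       size_t count, int nh_ifindex)
{
	assert(res != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (res[i].selected)
			return res[i].ifindex == nh_ifindex;
	}
	return false;
}

/* New behaviour: any connected route with a matching ifindex is enough. */
static bool sketch_match_any(const struct sketch_connected *res,
			     size_t count, int nh_ifindex)
{
	assert(res != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (res[i].ifindex == nh_ifindex)
			return true;
	}
	return false;
}
```

With the swp8/swp9 example above, a kernel route via swp8 fails the first check when swp9 holds the selected connected route, but passes the second.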

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-04-08 08:15:20 -04:00
Stephen Worley
5d41413833 zebra: add support for protodown reason code
Add support for setting the protodown reason code.

829eb208e8

These patches handle all our netlink code for setting the reason.

For protodown reason we only set `frr` as the reason externally
but internally we have more descriptive reasoning available via
`show interface IFNAME`. The kernel only provides a bitwidth of 32
that all userspace programs have to share so this makes the most sense.

Since this is new functionality, it needs to be added to the dplane
pthread instead. So these patches, also move the protodown setting we
were doing before into the dplane pthread. For this, we abstract it a
bit more to make it a general interface LINK update dplane API. This
API can be expanded to support general link creation/updating when/if
someone ever adds that code.

We also move to a more common entrypoint for evpn-mh and from zapi clients
like vrrpd. They both call common code now to set our internal flags
for protodown and protodown reason.

Also add debugging code for dumping netlink packets with
protodown/protodown_reason.

Signed-off-by: Stephen Worley <sworley@nvidia.com>
2022-03-09 17:52:44 -05:00
anlan_cs
3f04f9cf24 zebra: let /32 host route with same IP cross VRF
Constraints on host routes are too strict in the current code:
Host routes with the same destination address and nexthop address are forbidden
even when they cross VRFs.

Currently host routes with different destination and nexthop addresses can cross
VRFs, which is fine. But host routes with the same addresses are forbidden from
crossing VRFs, which is wrong.

Since different VRFs can have the same addresses, leaking a specific host route
with the same nexthop address ( meaning the destination address is the same as
the nexthop address ) to other VRFs is a normal case.

This commit relaxes that constraint. Host routes with the same destination address
and nexthop address are forbidden only when they do not cross VRFs.

Signed-off-by: anlan_cs <vic.lan@pica8.com>
2022-03-09 07:22:11 +08:00
Donald Sharp
45dafca86c zebra: Use the routes vrf not the vrf of the nexthop for route-map application
When an end operator is doing cross vrf imports in bgp:

router bgp 3239 vrf FOO
  address-family ipv4 uni
    import vrf BAR
!

and zebra has this configuration:

vrf FOO
  ip protocol bgp route-map EVA
!

The current code in zebra_nhg.c was looking up the vrf of the
nexthop and attempting to apply the ip protocol route-map.

For most people the nexthop vrf and the re vrf are one and the
same so they never see a problem.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-02-28 13:08:01 -05:00
Mark Stapp
728f2017ae zebra: add dplane type for NETCONF data
Add a new dplane op for interface NETCONF data; add the new
enum value to several switch statements.

Signed-off-by: Mark Stapp <mstapp@nvidia.com>
2022-02-25 09:53:02 -05:00
Donald Sharp
81ef8a69ae zebra: Use AF_UNSPEC instead of setting to 0
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-02-07 13:22:41 -05:00
Donald Sharp
07b9ebca65 zebra: Ensure zebra_nhg_sweep_table accounts for double deletes
I'm seeing this crash in various forms:
Program terminated with signal SIGSEGV, Segmentation fault.
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f418efbc7c0 (LWP 3580253))]
(gdb) bt
(gdb) f 4
267 (*func)(hb, arg);
(gdb) p hb
$1 = (struct hash_bucket *) 0x558cdaafb250
(gdb) p *hb
$2 = {len = 0, next = 0x0, key = 0, data = 0x0}
(gdb)

I've also seen a crash where data is 0x03.

My suspicion is that hash_iterate is calling zebra_nhg_sweep_entry which
does delete the particular entry we are looking at as well as possibly other
entries when the ref count for those entries gets set to 0 as well.

Then we have this loop in hash_iterate.c:

   for (i = 0; i < hash->size; i++)
            for (hb = hash->index[i]; hb; hb = hbnext) {
                    /* get pointer to next hash bucket here, in case (*func)
                     * decides to delete hb by calling hash_release
                     */
                    hbnext = hb->next;
                    (*func)(hb, arg);
            }
Suppose in the previous loop hbnext is set to hb->next and we call
zebra_nhg_sweep_entry. This deletes the previous entry and also
happens to cause the hbnext entry to be deleted as well, because of nhg
refcounts. At this point in time the memory pointed to by hbnext is
not owned by the pthread anymore and we can end up in a state where
it's overwritten by another pthread in zebra with data for other incoming events.

What to do?  Let's change the sweep function to a hash_walk and have
it stop iterating and start over if there is a possible double
delete operation.
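
The stop-and-restart approach can be sketched as follows. This is a toy model under stated assumptions: it uses a flat array instead of hash buckets, so nothing actually dangles, and every `sketch_` name and the two return codes are hypothetical. It only shows the control flow of aborting a walk after a cascading delete and starting over.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_WALK_CONTINUE 0
#define SKETCH_WALK_ABORT (-1)

/* Hypothetical entry: refcounted, optionally depending on another entry. */
struct sketch_entry {
	bool alive;
	unsigned int refcnt;
	int depends_on; /* index of the entry this one references, -1 if none */
};

/* Delete an unreferenced entry and cascade to its dependency; returns deletions. */
static int sketch_release(struct sketch_entry *tbl, int idx)
{
	int deleted = 0;

	while (idx >= 0 && tbl[idx].alive && tbl[idx].refcnt == 0) {
		int dep = tbl[idx].depends_on;

		tbl[idx].alive = false;
		deleted++;
		if (dep >= 0 && tbl[dep].refcnt > 0)
			tbl[dep].refcnt--;
		idx = dep;
	}
	return deleted;
}

/* One pass over the table; abort as soon as a delete cascaded further. */
static int sketch_sweep_pass(struct sketch_entry *tbl, int count)
{
	for (int i = 0; i < count; i++) {
		if (!tbl[i].alive || tbl[i].refcnt != 0)
			continue;
		if (sketch_release(tbl, i) > 1)
			return SKETCH_WALK_ABORT;
	}
	return SKETCH_WALK_CONTINUE;
}

/* Restart the walk until a full pass completes; returns the number of passes. */
static int sketch_sweep(struct sketch_entry *tbl, int count)
{
	int passes = 1;

	assert(count >= 0);
	while (sketch_sweep_pass(tbl, count) == SKETCH_WALK_ABORT)
		passes++;
	return passes;
}
```

Each aborted pass deletes at least two entries, so the restart loop terminates.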

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2022-02-04 12:05:38 -05:00
Donald Sharp
5b311cf18d
Merge pull request #9052 from mjstapp/dplane_incoming_dev
zebra: Move incoming netlink interface address change events to the dplane pthread
2021-09-21 10:51:37 -04:00
Mark Stapp
9d59df634c zebra: add new dplane op codes for interface addr events
Add new dplane op values for incoming interface address add
and delete events.

Signed-off-by: Mark Stapp <mjs.ietf@gmail.com>
2021-09-14 11:07:30 -04:00
Ryoga Saito
24b3c59c2d zebra: copy nexthop_srv6 in nexthop_set_resolved
Current implementation doesn't copy nexthop_srv6. This causes unexpected
behavior when receiving SID information and nexthop isn't onlink.

Signed-off-by: Ryoga Saito <contact@proelbtn.com>
2021-09-10 22:30:00 +00:00
Donald Sharp
f2595bd505 zebra: Convert to struct zebra_nhlfe as per our internal standard
We do not use typedef's to talk about structures as per our standard.
Fixing.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2021-09-02 10:33:23 -04:00
Stephen Worley
bf157b9263 zebra: fix ifp pointer for groups/recursives
At some point we broke the ifp pointer for nhe->ifp such
that it was pointing to an interface even in groups/recursive
instances.

Add checks here to make it again so that we only set the ifp
pointer if it is a fully resolved singleton NHE.

Signed-off-by: Stephen Worley <sworley@nvidia.com>
2021-07-15 11:24:24 -04:00
Donatas Abraitis
8643c2e5f7 *: Replace 4/16 integers to IPV4_MAX_BYTELEN/IPV6_MAX_BYTELEN
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2021-07-01 23:54:39 +03:00
Donatas Abraitis
12256b84a5 *: Convert numeric 32 into IPV4_MAX_BITLEN for prefixlen
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2021-07-01 23:50:39 +03:00
Donatas Abraitis
13ccce6e7e *: Convert numeric 128 into IPV6_MAX_BITLEN for prefixlen
Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2021-07-01 17:53:21 +03:00
Donatas Abraitis
936fbaef47 *: Replace IPV4_MAX_PREFIXLEN to IPV4_MAX_BITLEN
Just drop IPV4_MAX_PREFIXLEN altogether, no need to keep both.

Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2021-07-01 17:44:09 +03:00
Donatas Abraitis
f4d81e5507 *: Replace IPV6_MAX_PREFIXLEN to IPV6_MAX_BITLEN
Just drop IPV6_MAX_PREFIXLEN altogether, no need to keep both.

Signed-off-by: Donatas Abraitis <donatas.abraitis@gmail.com>
2021-07-01 17:41:09 +03:00