Commit Graph

5580 Commits

Author SHA1 Message Date
Rajasekar Raja
cd0558e629 zebra: fix nhe refcnt when frr service goes down
When frr.service is going down(restart or stop),
zebra core can be seen.

Sequence of events leading to crash:
Increments of nhe refcnt:
  - Upper level creates a new nhe(say NHE1) —> nhe->refcnt=1
  - Two RE’s (Say RE1 & RE2) associate with NHE1 —> nhe->refcnt = 3

Decrements of nhe refcnt:
  - BGP sends a zapi msg to zebra to delete NHG. —> nhe->refcnt = 2
  - RE1 is queued for delete in META-Q
  - As zebra is dissociating with its clients, zebra_nhg_score_proto() is
     invoked -> nhe->refcnt=1
  - RE2 is no more associated with the NHE1 —>nhe->refcnt=0 &
     hence NHE IS FREED
  - Now RE1 is dequeued from META-Q for processing the re delete. —> At
     this point re->nhe is pointing to freed pointer. CRASH CRASH!!!!

Fix:
  - When we iterate zebra_nhg_score_proto_entry() to delete the upper
     proto specific nhe’s,  we need to skip the additional nhe->refcnt
     decrement in case nhe->flags has NEXTHOP_GROUP_PROTO_RELEASED set.

Backtrace-1
0x00007fa8449ce8eb in raise () from /lib/x86_64-linux-gnu/libc.so.6
0x00007fa8449b9535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
0x00007fa844d32f86 in _zlog_assert_failed (xref=xref@entry=0x55fa37871040 <_xref.28142>, extra=extra@entry=0x0) at lib/zlog.c:680
0x000055fa3778f770 in rib_re_nhg_free (re=0x55fa39e33770) at zebra/zebra_rib.c:2578
rib_unlink (rn=0x55fa39e27a60, re=0x55fa39e33770) at zebra/zebra_rib.c:3930
0x000055fa3778ff18 in rib_process (rn=0x55fa39e27a60) at zebra/zebra_rib.c:1439
0x000055fa37790b1c in process_subq_route (qindex=8 '\b', lnode=0x55fa39e1c1b0) at zebra/zebra_rib.c:2549
process_subq (qindex=META_QUEUE_BGP, subq=0x55fa3999c580) at zebra/zebra_rib.c:3107
meta_queue_process (dummy=<optimized out>, data=0x55fa3999c480) at zebra/zebra_rib.c:3146
0x00007fa844d232b8 in work_queue_run (thread=0x7ffffbdf6cb0) at lib/workqueue.c:285
0x00007fa844d195fd in thread_call (thread=thread@entry=0x7ffffbdf6cb0) at lib/thread.c:2008
0x00007fa844cd3888 in frr_run (master=0x55fa397b7630) at lib/libfrr.c:1223
0x000055fa3771e294 in main (argc=12, argv=0x7ffffbdf7098) at zebra/main.c:526

Backtrace-2
0x00007f125af3f535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
0x00007f125b2b8f96 in _zlog_assert_failed (xref=xref@entry=0x7f125b344260 <_xref.18768>, extra=extra@entry=0x0) at lib/zlog.c:680
0x00007f125b268190 in nexthop_copy_no_recurse (copy=copy@entry=0x5606dd726f10, nexthop=nexthop@entry=0x7f125b0d7f90, rparent=<optimized out>) at lib/nexthop.c:806
0x00007f125b2681b2 in nexthop_copy (copy=0x5606dd726f10, nexthop=0x7f125b0d7f90, rparent=<optimized out>) at lib/nexthop.c:836
0x00007f125b268249 in nexthop_dup (nexthop=nexthop@entry=0x7f125b0d7f90, rparent=rparent@entry=0x0) at lib/nexthop.c:860
0x00007f125b26b67b in copy_nexthops (tnh=tnh@entry=0x5606dd9ec748, nh=<optimized out>, rparent=rparent@entry=0x0) at lib/nexthop_group.c:457
0x00007f125b26b6ba in nexthop_group_copy (to=to@entry=0x5606dd9ec748, from=from@entry=0x5606dd9ee9f8) at lib/nexthop_group.c:291
0x00005606db6ec678 in zebra_nhe_copy (orig=0x5606dd9ee9d0, id=id@entry=0) at zebra/zebra_nhg.c:431
0x00005606db6ddc63 in mpls_ftn_uninstall_all (zvrf=zvrf@entry=0x5606dd6e7cd0, afi=afi@entry=2, lsp_type=ZEBRA_LSP_NONE) at zebra/zebra_mpls.c:3410
0x00005606db6de108 in zebra_mpls_cleanup_zclient_labels (client=0x5606dd8e03b0) at ./zebra/zebra_mpls.h:471
0x00005606db73e575 in hook_call_zserv_client_close (client=0x5606dd8e03b0) at zebra/zserv.c:566
zserv_client_free (client=0x5606dd8e03b0) at zebra/zserv.c:585
zserv_close_client (client=0x5606dd8e03b0) at zebra/zserv.c:706
0x00007f125b29f60d in thread_call (thread=thread@entry=0x7ffc2a740290) at lib/thread.c:2008
0x00007f125b259888 in frr_run (master=0x5606dd3b7630) at lib/libfrr.c:1223
0x00005606db68d298 in main (argc=12, argv=0x7ffc2a740678) at zebra/main.c:534

Issue: 3492031

Ticket# 3492031

Signed-off-by: Rajasekar Raja <rajasekarr@nvidia.com>
2023-07-26 21:17:16 +00:00
mobash-rasool
49f0484113
Merge pull request #14064 from donaldsharp/pim_cleanup
Cleanup from examining gcov runs
2023-07-26 21:33:29 +05:30
Russ White
3f043d027f
Merge pull request #14050 from LabNConsulting/ziemba-pbr-zapi-common
pbrd: 2/3 zapi PBR common encode/decode
2023-07-25 10:55:50 -04:00
Russ White
260b6123cb
Merge pull request #14080 from anlancs/fix/zebra-nhg-reinstall
zebra: fix nhg out of sync between zebra and kernel
2023-07-25 10:23:10 -04:00
Mark Stapp
adca5c22c5 * : include event ptr in event_execute api
Include an event ptr-to-ptr in the event_execute() api
call, like the various schedule api calls. This allows the
execute() api to cancel an existing scheduled task if that
task is being executed inline.

Signed-off-by: Mark Stapp <mjs@labn.net>
2023-07-25 10:17:48 -04:00
Russ White
82d8e7d5fa
Merge pull request #13945 from pguibert6WIND/redistribute_isis_table
Redistribute isis table
2023-07-25 10:16:46 -04:00
anlan_cs
90bc24408b zebra: add several fields for debug
Two changes for debug:
1. Add a field to indicate its vrf for nexthop.  When the interface changes
vrf, we can't easily know the vrf of this nexthop according to current log.
2. Add a field to indicate operation type.  We can't know whether to add or
remove route according to current log.

Before:
```
zebra_nhg_increment_ref: nhe 0x555623eb82c0 (76[if 6]) 0 => 1
zebra_interface_nhg_reinstall install nhe 75[77.75.1.75 if 6] nh type 3 flags 0x1
Route 77.75.1.0/24(8) queued for processing into sub-queue Early Route Processing
Route 77.75.1.0/24(8) queued for processing into sub-queue Early Route Processing
```

After:
```
zebra_nhg_increment_ref: nhe 0x555623eb82c0 (76[if 6 vrfid 9]) 0 => 1
zebra_interface_nhg_reinstall install nhe 75[77.75.1.75 if 6 vrfid 8] nh type 3 flags 0x1
Route 77.75.1.0/24(8) (add) queued for processing into sub-queue Early Route Processing
Route 77.75.1.0/24(8) (delete) queued for processing into sub-queue Early Route Processing
```

Signed-off-by: anlan_cs <anlan_cs@tom.com>
2023-07-25 14:23:35 +08:00
anlan_cs
045df14427 zebra: fix nhg out of sync between zebra and kernel
PR#13413 introduces reinstall mechanism, but there is problem with the route
leak scenario.

With route leak configuration: ( `x1` and `x2` are binded to `vrf1` )
```
vrf vrf2
 ip route 75.75.75.75/32 77.75.1.75 nexthop-vrf vrf1
 ip route 75.75.75.75/32 77.75.2.75 nexthop-vrf vrf1
exit-vrf
```

Firstly, all are ok.  But after `x1` is set down and up ( The interval
between the down and up operations should be less than 180 seconds. ) ,
`x1` is lost from the nexthop group:
```
anlan# ip nexthop
id 121 group 122/123 proto zebra
id 122 via 77.75.1.75 dev x1 scope link proto zebra
id 123 via 77.75.2.75 dev x2 scope link proto zebra
anlan# ip route show table 2
75.75.75.75 nhid 121 proto 196 metric 20
        nexthop via 77.75.1.75 dev x1 weight 1
        nexthop via 77.75.2.75 dev x2 weight 1
anlan# ip link set dev x1 down
anlan# ip link set dev x1 up
anlan# ip route show table 2 <- Wrong, one nexthop lost from group
75.75.75.75 nhid 121 via 77.75.2.75 dev x2 proto 196 metric 20
anlan# ip nexthop
id 121 group 123 proto zebra
id 122 via 77.75.1.75 dev x1 scope link proto zebra
id 123 via 77.75.2.75 dev x2 scope link proto zebra
anlan# show ip route vrf vrf2 <- Still ok
VRF vrf2:
S>* 75.75.75.75/32 [1/0] via 77.75.1.75, x1 (vrf vrf1), weight 1, 00:00:05
  *                      via 77.75.2.75, x2 (vrf vrf1), weight 1, 00:00:05
```

From the impact on kernel:
The `nh->type` of `id 122` is *always* `NEXTHOP_TYPE_IPV4` in the route leak
case.  Then, `nexthop_is_ifindex_type()` introduced by commit `5bb877` always
returns `false`, so its dependents can't be reinstalled.  After `x1` is down,
there is only `id 123` in the group of `id 121`.  So, Finally `id 121` remains
unchanged after `x1` is up, i.e., `id 122` is not added to the group even it is
reinstalled itself.

From the impact on zebra:
The `show ip route vrf vrf2` is still ok because the `id`s are reused/reinstalled
successfully within 180 seconds after `x1` is down and up.  The group of `id 121`
is with old `NEXTHOP_GROUP_INSTALLED` flag, and it is still the group of `id 122`
and `id 123` as before.

In this way, kernel and zebra have become out of sync.

The `nh->type` of `id 122` should be adjusted to `NEXTHOP_TYPE_IPV4_IFINDEX`
after nexthop resolved.  This commit is for doing this to make that reinstall
mechanism work.

Signed-off-by: anlan_cs <anlan_cs@tom.com>
2023-07-24 18:00:16 +08:00
Sindhu Parvathi Gopinathan
fadf87f358 zebra: non pretty json output for evpn route
Currently, json output of evpn route command are no pretty format.
This is an extremely expensive operation at high VNI scale

EVPN json non-pretty command support added:

```
show evpn mac vni <vni-id> detail json
show evpn vni detail json
```

Ticket:#3513256
Issue:3513256

Testing: UT done

Signed-off-by: Sindhu Parvathi Gopinathan's <sgopinathan@nvidia.com>
2023-07-21 10:15:25 -07:00
Sindhu Parvathi Gopinathan
1c67c0951b zebra: non pretty json output for show ip route
Currently, json output of show ip route command are no pretty format.
This is an extremely expensive operation at high scale
(with high number of routes with many paths).

Zebra json non-pretty command support added:

```
show ip route json
```

Ticket:#3513256

Issue:3513256

Testing: UT done

Signed-off-by: Sindhu Parvathi Gopinathan's <sgopinathan@nvidia.com>
2023-07-21 10:15:11 -07:00
Donald Sharp
ada7353089 zebra: Remove unused functionality
The nl_rta_putXXX functions are never used.  Let's just remove them.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-21 07:31:04 -04:00
G. Paul Ziemba
580a98b798 lib: zapi PBR common encode/decode
bgpd, pbrd: use common pbr encoder
    zebra: use common pbr decoder
    tests: pbr_topo1: check more filter fields

    Purpose:
	1. Reduce likelihood of zapi format mismatches when adding
	   PBR fields due to multiple parallel encoder implementations
	2. Encourage common PBR structure usage among various daemons
	3. Reduce coding errors via explicit per-field enable flags

Signed-off-by: G. Paul Ziemba <paulz@labn.net>
2023-07-20 08:10:45 -07:00
Donald Sharp
1b1d256f03
Merge pull request #14026 from LabNConsulting/pbr-add-vlan-filters
pbrd: 1/3: add vty support for vlan filtering and send to zebra
2023-07-20 08:01:24 -04:00
Donatas Abraitis
698d53bf58
Merge pull request #14055 from guoguojia2021/route_lock
zebra:unlock node after route_next
2023-07-20 10:06:47 +03:00
G. Paul Ziemba
657882c430 pbrd: add vlan filters pcp/vlan-id/vlan-flags; ip-protocol any (zebra dplane)
Subset: zebra dataplane

    Add new vlan filter fields. No kernel dataplane
    implementation yet (linux does not support).

    Changes by:
	Josh Werner <joshuawerner@mitre.org>
	Eli Baum <ebaum@mitre.org>
	G. Paul Ziemba <paulz@labn.net>

Signed-off-by: G. Paul Ziemba <paulz@labn.net>
2023-07-19 08:15:15 -07:00
G. Paul Ziemba
dbade07e0e pbrd: add vlan filters pcp/vlan-id/vlan-flags; ip-protocol any (zapi)
Subset: ZAPI changes to send the new data

    Also adds filter_bm field; currently for PBR_FILTER_PCP, but in the
    future to be used for all of the filter fields.

    Changes by:
	Josh Werner <joshuawerner@mitre.org>
	Eli Baum <ebaum@mitre.org>
	G. Paul Ziemba <paulz@labn.net>

Signed-off-by: G. Paul Ziemba <paulz@labn.net>
2023-07-19 08:14:49 -07:00
Jack.zhang
a53159c8db zebra:fix a zebra crash issue caused by mac change
When the MAC address of the neighbor changes, a possible crash issue may occur.

In the zebra_evpn_local_neigh_update function, the value of old_zmac (n->mac) will be updated to the new MAC address when the neighbor's MAC address changes.
The pointer to the memory that this pointer points to may be released in the zebra_evpn_local_neigh_deref_mac function. This will cause old_zmac to become a dangling pointer. Accessing this dangling pointer in the zebra_evpn_ip_inherit_dad_from_mac function below will cause the zebra process to crash.

Here is the backtrace:
(gdb) bt
0  0x00007fc12c5f1fbf in raise () from /lib/x86_64-linux-gnu/libpthread.so.0
1  0x00007fc12d52e19c in core_handler (signo=11, siginfo=0x7ffda1fd1570, context=<optimized out>) at lib/sigevent.c:262
2  <signal handler called>
3  zebra_evpn_ip_inherit_dad_from_mac (zvrf=<optimized out>, old_zmac=0x5579ac3ca520, new_zmac=0x5579aba82f80, nbr=0x5579abd65ec0) at zebra/ze
4  0x00005579aa8dbf6d in zebra_evpn_local_neigh_update (zevpn=0x5579abb81440, ifp=ifp@entry=0x5579ab8a1640, ip=ip@entry=0x7ffda1fd1b40, macadd
   local_inactive=local_inactive@entry=253, dp_static=false) at zebra/zebra_evpn_neigh.c:1729
5  0x00005579aa9190a9 in zebra_vxlan_handle_kernel_neigh_update (ifp=ifp@entry=0x5579ab8a1640, link_if=link_if@entry=0x5579abd14f90, ip=ip@ent
   is_ext=is_ext@entry=false, is_router=<optimized out>, local_inactive=false, dp_static=false) at zebra/zebra_vxlan.c:3791
6  0x00005579aa8b3048 in netlink_ipneigh_change (h=0x7ffda1fd1d50, len=<optimized out>, ns_id=<optimized out>) at zebra/rt_netlink.c:3649
7  0x00005579aa8ac667 in netlink_parse_info (filter=filter@entry=0x5579aa8ab630 <netlink_information_fetch>, nl=nl@entry=0x5579ab5861e8, zns=z
   startup=startup@entry=0) at zebra/kernel_netlink.c:965
8  0x00005579aa8ac8c8 in kernel_read (thread=<optimized out>) at zebra/kernel_netlink.c:402
9  0x00007fc12d53e60b in thread_call (thread=thread@entry=0x7ffda1fd9fd0) at lib/thread.c:1834
10 0x00007fc12d4fba78 in frr_run (master=0x5579ab3a1740) at lib/libfrr.c:1155
11 0x00005579aa89c6e3 in main (argc=11, argv=0x7ffda1fda3c8) at zebra/main.c:485
(gdb) f 3
3  zebra_evpn_ip_inherit_dad_from_mac (zvrf=<optimized out>, old_zmac=0x5579ac3ca520, new_zmac=0x5579aba82f80, nbr=0x5579abd65ec0) at zebra/ze
1230	zebra/zebra_evpn_neigh.c: No such file or directory.
(gdb) p *old_zmac
Cannot access memory at address 0x5579ac3ca520
(gdb)

To fix this issue, the ZEBRA_MAC_DUPLICATE flag should be retrieved before old_zmac is released and used in the zebra_evpn_ip_inherit_dad_from_mac function.

Signed-off-by: Jack.zhang <hanyu.zly@alibaba-inc.com>
2023-07-19 22:03:54 +08:00
guozhongfeng
88ff576f86 zebra:unlock node after route_next
When route_next return node, it has lock the node. if return or break loop, should unlock node.
Signed-off-by: guozhongfeng <guozhongfeng.gzf@alibaba-inc.com>
2023-07-19 19:39:22 +08:00
Quentin Young
712e40c409
Merge pull request #11831 from anlancs/fix/cleanup-default
zebra: remove unnecessary check for default vrf
2023-07-18 15:15:39 +00:00
Donatas Abraitis
ef87237121
Merge pull request #14033 from donaldsharp/zebra_same_route
Zebra same route
2023-07-18 10:37:15 +03:00
Donald Sharp
788cf6e892
Merge pull request #14025 from guoguojia2021/guozhongfeng_alibaba
zebra: The command ipv6 nht xxx not work
2023-07-17 14:27:56 -04:00
Donald Sharp
af80201876 zebra: Further handle route replace semantics
When an upper level protocol is installing a route X that needs to be
route replaced and at the same time the same or another protocol installs a
different route that depends on route X for nexthop resolution can leave
us with a state where the route is not accepted because zebra is still
really early in the route replace semantics ( route X is still on the work
Queue to be processed ) then the dependent route would not be installed.
This came up in the bgp_default_originate test cases frequently.

Further extendd the ROUTE_ENTR_ROUTE_REPLACING flag to cover this case
as well.  This has come up because the early route processing queueing
that was implemented late last year.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-17 10:00:32 -04:00
guozhongfeng
1193611f8e zebra: The command ipv6 nht xxx not work
If the command is ipv6 nht protocol route-map rmap, this parameter should use AFI_IP6

Signed-off-by: guozhongfeng <guozhongfeng.gzf@alibaba-inc.com>
2023-07-16 17:52:31 +08:00
anlan_cs
a99521a26f zebra: Fix wrong vrf change procedure
Currently the vrf change procedure for the deleted interface is after
its deletion, it causes problem for upper daemons.

Here is the problem of `bgp`:

After deletion of one **irrelevant** interface in the same vrf, its
`ifindex` is set to 0. And then, the vrf change procedure will send
"ZEBRA_INTERFACE_DOWN" to `bgpd`.

Normally, `bgp_nht_ifp_table_handle()` should igore this message for
no correlation. However, it wrongly matched `ifindex` of 0, and removed
the related routes for the down `bnc`.

Adjust the location of the vrf change procedure to fix this issue.

Signed-off-by: anlan_cs <vic.lan@pica8.com>
2023-07-13 15:25:31 +08:00
Philippe Guibert
492b93bea0 zebra: fix imported static routes deletion
When unconfiguring 'no import <table>', a static route imported
from a routing table number is never deleted.

When importing a route from a given table, a default distance of
15 is applied. At the time of deletion, when trying to compare
the original route with the new one, the distance does not match,
because the static route applies a default distance of 1.

If the imported route has the distance set, unset the distance
flag to avoid comparing it.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2023-07-12 14:06:00 +02:00
anlan_cs
f8d94e8a62 zebra: remove unnecessary check for default vrf
The default vrf is generally non-NULL, except when shutdown. So, most
of the time it is not necessary to check if it is NULL, we should
remove the useless checks for it.

Searched them with exact match:
```
grep -rI "zebra_vrf_lookup_by_id(VRF_DEFAULT)" | wc -l
31
```

Signed-off-by: anlan_cs <vic.lan@pica8.com>
2023-07-12 17:00:27 +08:00
Russ White
916feb7acc
Merge pull request #13885 from donaldsharp/tests_need_to_be_stricter
Tests need to be stricter
2023-07-11 11:49:38 -04:00
Russ White
89aba318f7
Merge pull request #13876 from LabNConsulting/mjs/nhrp_resolving
Allow NHRP routes to validate incoming nexthops
2023-07-11 11:48:16 -04:00
Donald Sharp
c8971388a9
Merge pull request #13958 from opensourcerouting/fix/coverity
Coverity fixes
2023-07-11 11:26:47 -04:00
Russ White
f0f2c7be41
Merge pull request #13964 from pguibert6WIND/mpls_again
zebra: fix mpls config on ifaces created post frr
2023-07-11 10:12:04 -04:00
anlan_cs
5581a7fc08 zebra: adjust one debug info
Adjust one debug info, separate the ip address from it. Just like it is processed
in `redistribute_update()`.

Before:
```
34:1375.75.75.75/32: Redist del: re 0x55c1112067e0 (0:static), new re 0x55c1112de7c0 (0:static)
```

After:
```
(34:13):75.75.75.75/32: Redist del: re 0x55c1112067e0 (0:static), new re 0x55c1112de7c0 (0:static)
```

Signed-off-by: anlan_cs <vic.lan@pica8.com>
2023-07-11 13:36:09 +08:00
Mark Stapp
bb58cad150 zebra: use NHRP routes as valid in nexthop check
Treat NHRP-installed routes as valid, as if they were
CONNECTED routes, when checking candidate routes'
nexthops for validity. This allows use of NHRP by an
IGP, for example, that doesn't normally want recursive
nexthop resolution.

Signed-off-by: Mark Stapp <mjs@labn.net>
2023-07-10 16:43:53 -04:00
Donatas Abraitis
4bd04364ad zebra: Guard printing an error by checking if VRF is not NULL
Check if vrf_lookup_by_id() didn't return a NULL before dereferencing in
flor_err().

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-07-10 22:37:35 +03:00
Donatas Abraitis
f5fee8dd54 zebra: Check if ifp is not NULL in zebra_if_update_ctx()
Use the same logic as zebra_if_netconf_update_ctx().

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-07-10 22:37:33 +03:00
Donatas Abraitis
803375ac69 zebra: Do not check ifp for NULL
It's already checked at the bottom of the function.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-07-10 22:36:59 +03:00
Donald Sharp
f4c29914b5 zebra: Lookup up nlsock * one time in call tree
Code is looking up the nlsock to generate the batch messages
and then looking it up again to get the response.  Let's
just look it up one time.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-10 09:06:40 -04:00
Philippe Guibert
71b0b0d3b3 zebra: fix mpls config on ifaces created post frr
The mpls configuration does not work when an interface is
created after having applied the frr configuration. The
below scenario illustrates:

> root@dut:~# modprobe mpls
> root@dut:~# zebra &
> [..]
> dut(config)# interface ifacenotcreated
> dut(config-if)# mpls enable
> dut(config-if)# Ctrl-D
> root@dut:~# ip li show ifacenotcreated
> Device "ifacenotcreated" does not exist.
> root@dut:~# ip li add ifacenotcreated type dummy
> 0

Fix this by forcing the mpls flag when the interface is detected.

> root@dut:~# cat /proc/sys/net/mpls/conf/ifacenotcreat/input
> 1

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2023-07-09 21:57:01 +02:00
Carmine Scarpitta
7f2dec4f09 zebra: Fix crash when dplane_fpm_nl fails to process received routes
When `dplane_fpm_nl` receives a route, it allocates memory for a dplane
context and calls `netlink_route_change_read_unicast_internal` without
initializing the `intf_extra_list` contained in the dplane context. If
`netlink_route_change_read_unicast_internal` is not able to process the
route, we call `dplane_ctx_fini` to free the dplane context. This causes
a crash because `dplane_ctx_fini` attempts to access the intf_extra_list
which is not initialized.

To solve this issue, we can call `dplane_ctx_route_init`to initialize
the dplane route context properly, just after the dplane context
allocation.

(gdb) bt
#0 0x0000555dd5ceae80 in dplane_intf_extra_list_pop (h=0x7fae1c007e68) at ../zebra/zebra_dplane.c:427
#1 dplane_ctx_free_internal (ctx=0x7fae1c0074b0) at ../zebra/zebra_dplane.c:724
#2 0x0000555dd5cebc99 in dplane_ctx_free (pctx=0x7fae2aa88c98) at ../zebra/zebra_dplane.c:869
#3 dplane_ctx_free (pctx=0x7fae2aa88c98, pctx@entry=0x7fae2aa78c28) at ../zebra/zebra_dplane.c:855
#4 dplane_ctx_fini (pctx=pctx@entry=0x7fae2aa88c98) at ../zebra/zebra_dplane.c:890
#5 0x00007fae31e93f29 in fpm_read (t=) at ../zebra/dplane_fpm_nl.c:605
#6 0x00007fae325191dd in thread_call (thread=thread@entry=0x7fae2aa98da0) at ../lib/thread.c:2006
#7 0x00007fae324c42b8 in fpt_run (arg=0x555dd74777c0) at ../lib/frr_pthread.c:309
#8 0x00007fae32405ea7 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#9 0x00007fae32325a2f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Fixes: #13754
Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
2023-07-07 10:59:28 +02:00
Carmine Scarpitta
745a0fcbb2 zebra: Abstract dplane_ctx_route_init to init route without copying
The function `dplane_ctx_route_init` initializes a dplane route context
from the route object passed as an argument. Let's abstract this
function to allow initializing the dplane route context without actually
copying a route object.

This allows us to use this function for initializing a dplane route
context when we don't have any route to copy in it.

Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
2023-07-07 10:59:28 +02:00
Russ White
1e9e82e803
Merge pull request #13396 from donaldsharp/interface_is_interface
move interface ( LINK and ADDR ) events to the dplane
2023-07-06 08:31:16 -04:00
Donatas Abraitis
2ec7477a26
Merge pull request #13808 from anlancs/fix/zebra-kernel-route-reserved
zebra: fix wrong nexthop check for kernel routes
2023-07-06 09:01:21 +03:00
Donald Sharp
605df8d44f zebra: Use zebra dplane for RTM link and addr
a) Move the reads of link and address information
into the dplane
b) Move the startup read of data into the dplane
as well.
c) Break up startup reading of the linux kernel data
into multiple phases.  As that we have implied ordering
of data that must be read first and if the dplane has
taken over some data reading then we must delay initial
read-in of other data.

Fixes: #13288
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-05 13:03:14 -04:00
Donald Sharp
a014450441 zebra: Add code to get/set interface to pass up from dplane
1) Add a bunch of get/set functions and associated data
structure in zebra_dplane to allow the setting and retrieval
of interface netlink data up into the master pthread.

2) Add a bit of code to breakup startup into stages.  This is
because FRR currently has a mix of dplane and non dplane interactions
and the code needs to be paused before continuing on.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-05 13:03:14 -04:00
Donald Sharp
487a96a35f zebra: Remove duplicate function for netlink interface changes
Turns out FRR has 2 functions one specifically for startup
and one for normal day to day operations.  There were only
a couple of minor differences from what I could tell, and
where they were different the after startup functionality should
have been updated too.  I cannot figure out why we have 2.

Non-startup handling of bonds appears to be incorrect
so let's fix that.  Additionally the speed was not
properly being set in non-startup situations.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-05 13:03:14 -04:00
Donald Sharp
bc0bac5524 zebra: Remove unused add variable
Function was not using the add variable.  Remove it.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-05 11:49:36 -04:00
Donald Sharp
cd7324dfa6 zebra: Remove unused dplane_intf_delete
There is no need for this functionality and it is
not used.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-05 11:49:36 -04:00
Donald Sharp
c3c9683f99 zebra: Move protodown_r_bit to a better spot
Since we are moving some code handling out of the dataplane
and into zebra proper, lets move the protodown r bit as well.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-05 11:49:36 -04:00
Donald Sharp
6a3ae11c9b zebra: Rename vrf_lookup_by_tableid to zebra_vrf_lookup..
Rename the vrf_lookup_by_id function to zebra_vrf_lookup_by_id
and move to zebra_vrf.c where it nominally belongs, as that
we need zebra specific data to find this vrf_id and as such
it does not belong in vrf.c

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-07-05 11:49:36 -04:00
Mark Stapp
d6caf0dbd7
Merge pull request #13875 from donaldsharp/static_dplane_issues
zebra: Static routes async notification do not need this test
2023-07-05 08:27:23 -04:00
Donatas Abraitis
9a0bb7bcd1
Merge pull request #13333 from donaldsharp/vrf_bitmap_cleanup
*: Rearrange vrf_bitmap_X api to reduce memory footprint
2023-07-04 22:11:11 +03:00
anlan_cs
098519caf8 zebra: fix wrong nexthop check for kernel routes
When changing one interface's vrf, the kernel routes are wrongly kept
in old vrf.  Finally, the forwarding table in that old vrf can't forward
traffic correctly for those residual entries.

Follow these steps to make this problem happen:
( Firstly, "x1" interface of default vrf is with address of "6.6.6.6/24". )

```
anlan# ip route add 4.4.4.0/24 via 6.6.6.8 dev x1
anlan# ip link add vrf1 type vrf table 1
anlan# ip link set vrf1 up
anlan# ip link set x1 master vrf1
```

Then check `show ip route`, the route of "4.4.4.0/24" is still selected
in default vrf.

If the interface goes down, the kernel routes will be reevaluated.  Those
kernel routes with active interface of nexthop can be kept no change, it
is a fast path.  Otherwise, it enters into slow path to do careful examination
on this nexthop.

After the interface's vrf had been changed into new vrf, the down message of
this interface came.  It means the interface is not in old vrf although it
still exists during that checking, so the kernel routes should be dropped
after this nexthop matching against a default route in slow path. But, in
current code they are wrongly kept in fast path for not checking vrf.

So, modified the checking active nexthop with vrf comparision for the interface
during reevaluation.

Signed-off-by: anlan_cs <vic.lan@pica8.com>
2023-07-02 10:30:09 +08:00
anlan_cs
caf896d6ef zebra: Remove unnecessary condition check for kernel routes
There are relaxed nexthop requirements for kernel routes because we
trust kernel routes.

Two minor changes for kernel routes:

1. `if_is_up()` is one of the necessary conditions for `if_is_operative()`.
Here, we can remove this unnecessary check for clarity.

2. Since `nexthop_active()` doesn't distinguish whether it is kernel route,
modified the corresponding comment in it.

Signed-off-by: anlan_cs <vic.lan@pica8.com>
2023-07-02 10:30:09 +08:00
Donatas Abraitis
64510b9467 zebra: Dump route details when deleting a route
Just more details what's going on when deleting a route.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-06-29 17:39:45 +03:00
Donald Sharp
d0123a9012 zebra: Static routes async notification do not need this test
When using asic_offload with an asynchronous notification the
rib_route_match_ctx function is testing for distance and tag
being correct against the re.

Normal route notification for static routes is this(well really all routes):
a) zebra dplane generates a ctx to send to the dplane for route install
b) dplane installs it in the kernel
c) if the dplane_fpm_nl.c module is being used it installs it.
d) The context's success code is set to it worked and passes the context
back up to zebra for processing.
e) Zebra master receives this and checks the distance and tag are correct
for static routes and accepts the route and marks it installed.

If the operator is using a wait for install mechansim where the dplane
is asynchronously sending the result back up at a future time *and*
it is using the dplane_fpm_nl.c code where it uses the rt_netlink.c
route parsing code, then there is no way to set distance as that we
do not pass distance to the kernel.

As such static routes were never being properly handled since the re and
context would not match and the route would still be marked as queued.

Modify the code such that the asynchronous path notification for static
routes ignores the distance and tag's as that there is no way to test
for this data from that path at this point in time.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-06-29 09:35:00 -04:00
Mark Stapp
59b8965aa6
Merge pull request #13861 from opensourcerouting/fix/memory_leak_zserv
zebra: Free Zebra client resources
2023-06-28 08:18:11 -04:00
Donatas Abraitis
97072d144e zebra: Free Zebra client resources
Memory leaks started flowing:

```
AddressSanitizer Topotests Part 0:  15 KB -> 283 KB
AddressSanitizer Topotests Part 1:  1 KB -> 495 KB
AddressSanitizer Topotests Part 2:  13 KB -> 478 KB
AddressSanitizer Topotests Part 3:  39 KB -> 213 KB
AddressSanitizer Topotests Part 4:  30 KB -> 836 KB
AddressSanitizer Topotests Part 5:  0 bytes -> 356 KB
AddressSanitizer Topotests Part 6:  86 KB -> 783 KB
AddressSanitizer Topotests Part 7:  0 bytes -> 354 KB
AddressSanitizer Topotests Part 8:  0 bytes -> 62 KB
AddressSanitizer Topotests Part 9:  408 KB -> 518 KB
```

```
Direct leak of 3584 byte(s) in 1 object(s) allocated from:
    #0 0x7f1957b02d28 in __interceptor_calloc (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xded28)
    #1 0x559895c55df0 in qcalloc lib/memory.c:105
    #2 0x559895bc1cdf in zserv_client_create zebra/zserv.c:743
    #3 0x559895bc1cdf in zserv_accept zebra/zserv.c:880
    #4 0x559895cf3438 in event_call lib/event.c:1995
    #5 0x559895c3901c in frr_run lib/libfrr.c:1213
    #6 0x559895a698f1 in main zebra/main.c:472
    #7 0x7f195635ec86 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21c86)
```

Fixes b20acd0 ("bgpd: Use synchronous way to get labels from Zebra")

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-06-27 22:48:39 +03:00
Russ White
1f08a055a8
Merge pull request #13852 from mjstapp/fix_opq_cov_msg
zebra: clean up coverity warning in opaque api
2023-06-27 11:28:31 -04:00
Chirag Shah
a7d77ee58b zebra: fix evpn rmac nh list cmp function
EVPN RMAC (Router MAC) nexthop list compare
function needs to return all values so
the list element can be compared and added/deleted
properly.

Ticket:#3486989
Testing Done:
Originate EVPN Type-5 route with PIP IP and MAC as remote
nexthops.
Change the PIP IP address which triggers nexthop change.

Before fix:
When PIP IP changes RMAC is deleted from remote VTEPs.

TORS1# show evpn next-hops vni 4001 | include 00:02:00:00:00:2d
27.0.0.11       00:02:00:00:00:2d
TORS1# show evpn rmac vni 4001 | include 00:02:00:00:00:2d
00:02:00:00:00:2d 27.0.0.11

----- Remote VTEP change nexthop IP to 172.16.16.16 -----

TORS1# show evpn next-hops vni 4001 | include 00:02:00:00:00:2d
172.16.16.16    00:02:00:00:00:2d
TORS1# show evpn rmac vni 4001 | include 00:02:00:00:00:2d
TORS1#

After fix:
RMAC is retained as its nexthop list is not empty,
thus it is not deleted from remote VTEPs.

TORS1# show evpn rmac vni 4001 | include 00:02:00:00:00:2d
00:02:00:00:00:2d 172.16.16.16

Log:
2023/06/27 00:50:36.833474 ZEBRA: [XREH0-ZYMH6] L3VNI 4001 Remote VTEP
change(27.0.0.11 -> 172.16.16.16) for RMAC 00:02:00:00:00:2d

Signed-off-by: Chirag Shah <chirag@nvidia.com>
2023-06-26 17:59:16 -07:00
Donald Sharp
161972c9fe *: Rearrange vrf_bitmap_X api to reduce memory footprint
When running all daemons with config for most of them, FRR has
sharpd@janelle:~/frr$ vtysh -c "show debug hashtable"  | grep "VRF BIT HASH" | wc -l
3570

3570 hashes for bitmaps associated with the vrf.  This is a very
large number of hashes.  Let's do two things:

a) Reduce the created size of the actually created hashes to 2
instead of 32.

b) Delay generation of the hash *until* a set operation happens.
As that no hash directly implies a unset value if/when checked.

This reduces the number of hashes to 61 in my setup for normal
operation.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-06-26 14:59:21 -04:00
Mark Stapp
0ee56dd332 zebra: clean up coverity warning in opaque api
Seems a bit fussy of coverity, but ... don't NULL a variable
unnecessarily.

Signed-off-by: Mark Stapp <mjs@labn.net>
2023-06-26 13:19:23 -04:00
Mark Stapp
de1a9ce0a7 zebra: support notifications for opaque ZAPI messages
Allow zapi clients to register to be notified when a server
for an  opaque message type is present. Zebra maintains these
notification registrations in the same data structures that it
uses for opaque message handling.

Signed-off-by: Mark Stapp <mjs@labn.net>
2023-06-23 08:57:37 -04:00
Mark Stapp
ef8e3ac02c lib, zebra: include source client zapi info in opaque messages
Include the sending zapi client info (proto, instance, and
session id) in each opaque zapi message. Add opaque 'init'
apis for clients who want to encode their opaque data inline,
into the zclient's internal stream buffer. Use these init apis
in the TE/link-state lib code, instead of hand-coding the
zapi opaque header info.

Signed-off-by: Mark Stapp <mjs@labn.net>
2023-06-23 08:27:42 -04:00
Donatas Abraitis
3cbc7150bb
Merge pull request #13545 from idryzhov/remove-bond-slave
zebra: remove ZEBRA_IF_BOND_SLAVE interface type
2023-06-23 11:01:19 +03:00
Donatas Abraitis
52dde8747b zebra: Ignore non GR-aware zclient handling for BGP
This is for synchronous client (label/table manager) - aka session_id == 1.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-06-20 20:50:40 +03:00
Donatas Abraitis
20c2c8787a zebra: Show session id when printing an error when the client disconnects
Before:

```
2023/06/18 22:00:42 ZEBRA: [VXKFG-8SJRV][EC 4043309121] Client 'bgp' encountered an error and is shutting down.
2023/06/18 22:00:42 ZEBRA: [VXKFG-8SJRV][EC 4043309121] Client 'bgp' encountered an error and is shutting down.
```

After:

```
2023/06/18 22:06:44 ZEBRA: [N5M5Y-J5BPG][EC 4043309121] Client 'bgp' (session id 0) encountered an error and is shutting down.
2023/06/18 22:06:44 ZEBRA: [N5M5Y-J5BPG][EC 4043309121] Client 'bgp' (session id 1) encountered an error and is shutting down.
```

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-06-20 20:50:40 +03:00
Russ White
40502902f4
Merge pull request #13394 from mjstapp/fix_zebra_mpls_config
zebra: clarify interface-level mpls config
2023-06-20 09:10:53 -04:00
Donald Sharp
f89d090230
Merge pull request #13755 from LabNConsulting/ziemba/zebra-dplane-priority
zebra: bugfix dplane priority sorting
2023-06-13 10:36:57 -04:00
Mark Stapp
a32d40a676 zebra: clarify interface-level mpls config
We have both interface-level configuration to enable mpls,
and runtime mpls status. They need to be distinct.

Signed-off-by: Mark Stapp <mjs@labn.net>
2023-06-12 16:41:27 -04:00
Mark Stapp
4112baec9f pbrd, zebra: fix zapi and netlink rule encoding
In pbrd, don't encode a rule without a table. There are cases
where the zapi encoding was incorrect because the 4-octet
table id was missing. In zebra, mask off the ECN bits in the
TOS byte when encoding an iprule to match netlink's
expectation.

Signed-off-by: Mark Stapp <mjs@labn.net>
2023-06-12 16:39:26 -04:00
G. Paul Ziemba
9e5c9e6d65 zebra: bugfix dplane priority sorting
Signed-off-by: G. Paul Ziemba <paulz@labn.net>
2023-06-09 06:58:20 -07:00
Donald Sharp
977d7e24ff zebra: Prevent crash because nl is NULL on shutdown
When shutting down the main pthread was first closing
the sockets associated with the dplane pthread and
then telling it to shutdown the pthread at a later point
in time.  This caused the dplane to crash because the nl
data has been freed already.  Change the shutdown order
to stop the dplane pthread *and* then close the sockets.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-06-08 12:03:49 -04:00
Donatas Abraitis
29f6fb04d8
Merge pull request #13649 from donaldsharp/unlock_the_node_or_else
zebra: Unlock the route node when sending route notifications
2023-06-06 08:52:40 +03:00
Donald Sharp
3ddf7680fd zebra: Consolidate the stream_failure section with normal return
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-06-01 08:58:16 -04:00
Donald Sharp
c2cf522347 zebra: No need to set msg to NULL
The msg value is always reset to something new before it is used inside
the mutex.  No need to set it to NULL.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-06-01 08:54:25 -04:00
Donald Sharp
82c6e4fea5 zebra: Unlock the route node when sending route notifications
When using a context to send route notifications to upper
level protocols, the code was using a locking function to
get the route node.  There is no need for this to be locked
as such FRR should free it up.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-06-01 07:35:12 -04:00
Donatas Abraitis
147c7a2de3
Merge pull request #13631 from donaldsharp/fix_some_ping_issues
various issues
2023-05-30 21:26:24 +03:00
Christian Hopps
ff6b14a658 zebra: use ifindex vs ifp to avoid use-after-free on shutdown
Signed-off-by: Christian Hopps <chopps@labn.net>
2023-05-30 04:09:29 -04:00
Christian Hopps
8cfe36bc7e zebra: avoid unneeded vxlan work on shutdown
Signed-off-by: Christian Hopps <chopps@labn.net>
2023-05-30 04:09:29 -04:00
Donald Sharp
46d725f76b lib, zebra: Ensure that the ifp->node exists
On removal, ensure that the ifp->node is set to a null
pointer so that FRR does not use data after freed.
In addition ensure that the ifp->node exists before
attempting to free it.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-05-28 10:13:16 -04:00
Russ White
7b7da41def
Merge pull request #13556 from donaldsharp/token_to_desc
memory desciprtion shortening
2023-05-23 08:21:51 -04:00
Donald Sharp
d7c9666e06 zebra: Fix paths that have already de-refed ctx
There is no path in some functions where the ctx
has not already been de-refed.  As such no need
to test for it's existence.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-05-22 10:52:54 -04:00
Igor Ryzhov
9ce24c31bf zebra: remove ZEBRA_IF_BOND_SLAVE interface type
It is never actually used in the code.

Closes #13532.

Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
2023-05-21 23:37:39 +03:00
Donald Sharp
a01f310709 zebra: Make memory description string smaller to fit in vty space
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-05-19 21:31:35 -04:00
Donald Sharp
5ec001aa53 zebra: On shutdown stop hook calls for fpm rmac updates
When shutting down zebra, the hook for the rmac update was
not being unregistered.  As such it would be possible
to get into a condition where more rmacs are being
added to the queue for handling in the future after we
are told to shutdown.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-05-19 10:02:19 -04:00
Donald Sharp
540334324c zebra: Properly handle zfpm_g->t_conn_down in zebra_fpm.c
The t_conn_down pointer was being set to NULL when it already
was.  The t_conn_down pointer was being dropped( and leaving
a thread possibly running in the background ) which could
cause problems on shutdown.  And finally when shutting down
the t_conn_down event was not being stopped at all.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-05-19 10:02:19 -04:00
Donald Sharp
0eaa6523f6 zebra: Do not allow old FPM to access freed memory after shutdown
On shutdown, the old FPM queues up dests to be sent to
the FPM listener.  This is done through the rib_shutdown
hook.  Which is called when the table that the routes are
stored in are being deleted.  This dest has pointers
to the rnode.  The rnode has pointers to the table it
is associated with as well as the table->info pointer for
the zebra data associated with this table.

The FPM after this attempts to tell this to it's listener
via events.  Unfortunately the zvrf, table_id and nl_pid
was being grabbed from memory that had been freed!  Since
all this can be grabbed from memory that has not been freed
on shutdown let's switch over to using that instead of freed
memory for gathering data.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-05-19 10:02:19 -04:00
Carmine Scarpitta
eb68d4a04c zebra: Fix build error when --disable-bfdd
When FRR is built with the option `--disable-bfdd`, the build process
fails with the following error:

```
zebra/zebra_ptm.c: In function ‘zebra_ptm_init’:
zebra/zebra_ptm.c:119:35: error: ‘FRR_PTM_NAME’ undeclared (first use in this function)
  119 |  snprintf(buf, sizeof(buf), "%s", FRR_PTM_NAME);
      |                                   ^~~~~~~~~~~~
zebra/zebra_ptm.c:119:35: note: each undeclared identifier is reported only once for each function it appears in
make[1]: *** [Makefile:10520: zebra/zebra_ptm.o] Error 1
```

The reason is that `FRR_PTM_NAME` is defined in `version.h` which is not
imported.

This commit adds the missing import.

Signed-off-by: Carmine Scarpitta <carmine.scarpitta@uniroma2.it>
2023-05-17 18:47:23 +02:00
Mark Stapp
e8224402cd
Merge pull request #13444 from donaldsharp/fix_dplane_provider_counter
zebra: Fix dp_out_queued counter to actually reflect real life
2023-05-12 14:54:13 -04:00
Donald Sharp
995d810d08 zebra: Fix dp_out_queued counter to actually reflect real life
The prov->dp_out_queued counter was never being decremented
when a ctx was pulled off of the list.  Let's change it to
accurately reflect real life.

Broken:
janelle.pinkbelly.org# show zebra dplane providers detailed
Zebra dataplane providers:
Kernel (1): in: 330872, q: 0, q_max: 100, out: 330872, q: 330872, q_max: 330872
janelle.pinkbelly.org#

Fixed:
sharpd@janelle:/tmp/topotests$ vtysh -c "show zebra dplane providers detailed"
Zebra dataplane providers:
Kernel (1): in: 221495, q: 0, q_max: 100, out: 221495, q: 0, q_max: 100
sharpd@janelle:/tmp/topotests$

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-05-12 11:34:56 -04:00
Philippe Guibert
fab64b600a zebra: mpls nexthop entry displays also interface when available
The 'show mpls table json' command displays the outgoing interface
name only when the nexthop type is either NEXTHOP_TYPE_IFINDEX or
NEXTHOP_TYPE_IPV6_IFINDEX. add the interface name for the nexthop
type NEXTHOP_TYPE_IPV4_IFINDEX.

Fixes: ("b78b820d46d6") MPLS: Display enhancements and JSON support
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2023-05-09 21:00:57 +02:00
Philippe Guibert
7bae48960e zebra: handle nexthop vrf_id in ZEBRA_MPLS_LABELS messages
This commit addresses the case where a service wants to install
an LSP entry to a next-hop located in a VRF instance. The incoming
MPLS packet is on the namespace and has to be directed to a nexthop
located behind an interface that sits in a specific VRF instance.
The below iproute command can illustrate:

  > ip link add vrf1 type vrf table 10
  > ip link set dev vrf1 up
  > ip link set dev eth0 master vrf1
  > ip a a 192.0.2.1/24 dev eth0
  > ip -f mpls route add 105 via inet 192.0.2.45 dev eth0

If a service uses the ZEBRA_MPLS_LABELS messages, then the LSP
message is ignored: from zebra perspective, the MPLS entries are
visible via the 'show mpls table' command, but no LSP entry is
installed in the kernel.

The issue is in the nhlfe_nexthop_active_ipv[4/6] function: the
outgoing interface mentioned in the nexthop is searched in the
main VRF, whereas the interface is in a separate VRF. The interface
is not found, and the nhlfe to install is considered not active.

To address this issue, reuse the incoming vrf_id parameter transmitted
in the nexthop structure from the ZEBRA_MPLS_LABELS message. When
creating an NHLFE entry, the vrf_id is used instead of the DEFAULT_VRF.
And the nhlfe entry can be considered as active.

One alternate solution to reuse the vrf_id parameter in the mpls network
context would be to modify the search function in nhlfe_nexthop_active..()
function: looking for an existing ifindex in the zns. However, this
solution may not fit later when netns backend would be used.

Note that some changes have not been done yet and are considered
sufficient for now:
- The 'nhlfe_find' API: the assumption is done that only the linux vrf
backend is used for now.

- The 'mpls_lsp_install()' API: It is currently used by the CLI command
which does not handle the interface parameter, and the SRTE service, whih
always sends LSPs towards a nexthop located in the VRF_DEFAULT.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2023-05-09 21:00:57 +02:00
Philippe Guibert
bd21ba79aa zebra: accept LSP entries with an mpls-less outgoing interface
The ZEBRA_MPLS_LABELS_[ADD/DELETE/REPLACE] messages may change an
LSP entry based on an incoming MPLS entry, followed by a given
next-hop.
Having a next hop with no label information inside is rejected
by the zebra layer. As illustration, the following ZAPI message
would be rejected, because the next hop does not contain any
label information.

  > ip -f mpls route add 105 via inet 192.0.2.45

At the same time, such configuration is desirable to be
supported:

An attempt has been done to configure the next-hop with an implicit-
null label. But the message is rejected by the kernel:

  > ip -f mpls route add 104 as 3 via inet 192.0.2.45
  > Error: Implicit NULL Label (3) can not be used in encapsulation.

The commit proposes to accept ZEBRA_MPLS_LABELS_[XX] messages with
a nexthop that does not contain any label information.

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2023-05-09 21:00:57 +02:00
Donatas Abraitis
bae305fc9b
Merge pull request #13445 from donaldsharp/lua_scripting_mem_leak
zebra: Reduce creation and fix memory leak of frrscripting pointers
2023-05-09 15:38:06 +03:00
Mark Stapp
eb4c026d13
Merge pull request #13413 from chiragshah6/fdev2
zebra: re-install NHG on interface up
2023-05-08 14:36:07 -04:00
Donald Sharp
3e7b3ed1dc zebra: dplane_gre_set could return while leaking ctx
Prevent this function from leaking the ctx memory.
Also properly record that something has gone wrong.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-05-05 19:11:02 -04:00
Donald Sharp
6636fc44c8 zebra: Dplane ctx allocation cannot fail
Having tests for memory allocation success makes no sense
given what happens when frr fails to allocate memory.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-05-05 19:10:59 -04:00
Chirag Shah
69cf016ee2 zebra:re-install dependent nhgs on interface up
Upon interface up associated singleton NHG's
dependent NHGs needs to be reinstalled as
kernel would have deleted if there is no route
referencing it.

Ticket:#3416477
Issue:3416477
Testing Done:
flap interfaces which are part of route NHG,
upon interfaces up event, NHGs are resynced
into dplane.

Signed-off-by: Chirag Shah <chirag@nvidia.com>
2023-05-05 14:37:52 -07:00
Ashwini Reddy
5bb87732f6 zebra: re-install nhg on interface up
Intermittently zebra and kernel are out of sync
when interface flaps and the add's/dels are in
same processing queue and zebra assumes no change in nexthop.
Hence we need to bring in a reinstall to kernel
of the nexthops and routes to sync their states.

Upon interface flap kernel would have deleted NHGs
associated to a interface (the one flapped),
zebra retains NHGs for 3 mins even though upper
layer protocol removes the nexthops (associated NHG).
As part of interface address add ,
re-add singleton NHGs associated to interface.

Ticket: #3173663
Issue: 3173663

Signed-off-by: Ashwini Reddy <ashred@nvidia.com>
Signed-off-by: Chirag Shah <chirag@nvidia.com>
2023-05-05 14:37:52 -07:00
Donald Sharp
d8be139972 zebra: Reduce creation and fix memory leak of frrscripting pointers
There are two issues being addressed:

a) The ZEBRA_ON_RIB_PROCESS_HOOK_CALL script point
was creating a fs pointer per dplane ctx in
rib_process_dplane_results().

b) The fs pointer was not being deleted and directly
leaked.

For (a) Move the creation of the fs to outside
the do while loop.

For (b) At function end ensure that the pointer
is actually deleted.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-05-05 12:24:02 -04:00
Donatas Abraitis
786e2b8bdb Revert "MPLS allocation mode per next hop"
Broken tests, let's revert now.

Signed-off-by: Donatas Abraitis <donatas@opensourcerouting.org>
2023-05-03 13:52:46 +03:00
Donatas Abraitis
99a1ab0b21
Merge pull request #12646 from pguibert6WIND/mpls_alloc_per_nh
MPLS allocation mode per next hop
2023-05-02 18:36:45 +03:00
Russ White
1998805bd5
Merge pull request #13403 from anlancs/fix/zebra-missing-vrf-flag
zebra: Fix missing VRF flag
2023-05-02 10:47:41 -04:00
Russ White
856e85e910
Merge pull request #13270 from pguibert6WIND/better_srv6_output_seg6local
zebra: display seg6local only when specified
2023-05-02 10:30:15 -04:00
anlan_cs
41414503e4 zebra: Fix missing VRF flag
1. No any configuration in FRR, and `ip link add vrf1 type vrf ...`.
Currently, everything is ok.

2.  `ip link del vrf1`.
`zebra` will wrongly/redundantly notify clients to add "vrf1" as a normal
interface after correct deletion of "vrf1".

```
ZEBRA: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_DELLINK(17), len=588, seq=0, pid=0
ZEBRA: [TDJW2-B9KJW] RTM_DELLINK for vrf1(93) <- Wrongly as normal interface, not vrf
ZEBRA: [WEEJX-M4HA0] interface vrf1 vrf vrf1(93) index 93 is now inactive.
ZEBRA: [NXAHW-290AC] MESSAGE: ZEBRA_INTERFACE_DELETE vrf1 vrf vrf1(93)
ZEBRA: [H97XA-ABB3A] MESSAGE: ZEBRA_INTERFACE_VRF_UPDATE/DEL vrf1 VRF Id 93 -> 0
ZEBRA: [HP8PZ-7D6D2] MESSAGE: ZEBRA_INTERFACE_VRF_UPDATE/ADD vrf1 VRF Id 93 -> 0 <-
ZEBRA: [Y6R2N-EF2N4] interface vrf1 is being deleted from the system
ZEBRA: [KNFMR-AFZ53] RTM_DELLINK for VRF vrf1(93)
ZEBRA: [P0CZ5-RF5FH] VRF vrf1 id 93 is now inactive
ZEBRA: [XC3P3-1DG4D] MESSAGE: ZEBRA_VRF_DELETE vrf1
ZEBRA: [ZMS2F-6K837] VRF vrf1 id 4294967295 deleted
OSPF: [JKWE3-97M3J] Zebra: interface add vrf1 vrf default[0] index 0 flags 480 metric 0 mtu 65575 speed 0 <- Wrongly add interface
```

`if_handle_vrf_change()` moved the interface from specific vrf to default
vrf. But it doesn't skip interface of vrf type. So, the wrong/redundant
add operation is done.

Note, the wrong add operation is regarded as an normal interface because
the `ifp->status` is cleared too early, so it is without VRF flag
( `ZEBRA_INTERFACE_VRF_LOOPBACK` ). Now, ospfd will initialize `ifp->type`
to `OSPF_IFTYPE_BROADCAST`.

3. `ip link add vrf1 type vrf ...`, add "vrf1" again. FRR will be with
wrong display:

```
interface vrf1
 ip ospf network broadcast
exit
```

Here, zebra will send `ZEBRA_INTERFACE_ADD` again for "vrf1" with
correct `ifp->status`, so it will be updated into vrf type. But
it can't update `ifp->type` from `OSPF_IFTYPE_BROADCAST` to
`OSPF_IFTYPE_LOOPBACK` because it had been already configured in above
step 2.

Two changes to fix it:

1. Skip the procedure of switching VRF for interfaces of vrf type.
It means, don't send `ZEBRA_INTERFACE_ADD` to clients when deleting vrf.

2. Put the deletion of this flag at the last.
It means, clients should get correct `ifp->status`.

Signed-off-by: anlan_cs <vic.lan@pica8.com>
2023-05-01 20:21:37 +08:00
Sindhu Parvathi Gopinathan
2223b4d543 zebra:add df flag into evpn esi json output
FRR "show evpn es 'esi-id' json" output dont have the 'df' flag.

Modified the code to add the 'df' flag into json output.

Before Fix:

```
torm-11# show evpn es 03:44:38:39:ff:ff:01:00:00:01 json
{
  "esi":"03:44:38:39:ff:ff:01:00:00:01",
  "accessPort":"hostbond1",
  "flags":[
    "local",
    "remote",
    "readyForBgp",
    "bridgePort",
    "operUp",
    "nexthopGroupActive"
	 ====================> df is missing
  ],
  "vniCount":10,
  "macCount":13,
  "dfPreference":50000,
  "nexthopGroup":536870913,
  "vteps":[
    {
      "vtep":"27.0.0.16",
      "dfAlgorithm":"preference",
      "dfPreference":32767,
      "nexthopId":268435460
    },
    {
      "vtep":"27.0.0.17",
      "dfAlgorithm":"preference",
      "dfPreference":32767,
      "nexthopId":268435461
    }
  ]
}
torm-11#
```

After Fix:-

```
torm-11# show evpn es 03:44:38:39:ff:ff:01:00:00:01 json
{
  "esi":"03:44:38:39:ff:ff:01:00:00:01",
  "accessPort":"hostbond1",
  "flags":[
    "local",
    "remote",
    "readyForBgp",
    "bridgePort",
    "operUp",
    "nexthopGroupActive",
    "df" ========================> designated-forward flag added
  ],
  "vniCount":10,
  "macCount":13,
  "dfPreference":50000,
  "nexthopGroup":536870913,
  "vteps":[
    {
      "vtep":"27.0.0.16",
      "dfAlgorithm":"preference",
      "dfPreference":32767,
      "nexthopId":268435460
    },
    {
      "vtep":"27.0.0.17",
      "dfAlgorithm":"preference",
      "dfPreference":32767,
      "nexthopId":268435461
    }
  ]
}
torm-11#

```

Ticket:# 3447935

Issue: 3447935

Testing: UT done

Signed-off-by: Sindhu Parvathi Gopinathan's <sgopinathan@nvidia.com>
Signed-off-by: Chirag Shah <chirag@nvidia.com>
2023-04-28 16:04:25 -07:00
Russ White
257fddaeb6
Merge pull request #13246 from opensourcerouting/rip-bfd
ripd: support BFD integration
2023-04-25 11:54:32 -04:00
Donatas Abraitis
76cd90fb4e
Merge pull request #13330 from chiragshah6/fdev1
zebra: EVPN handle duplicate detected local mac delete event
2023-04-24 16:51:10 +03:00
Donald Sharp
6f99cfcd89 zebra: ctx has to be non NULL at this point
Remove the pointer check for ctx.  At this point in the
function it has to be non null since we deref'ed it.
Additionally the alloc function that creates it cannot
fail.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-04-21 08:54:51 -04:00
Chirag Shah
89844a9678 zebra:fix evpn dup detected local mac del event
The current local mac delete event send to flag with force
always which breaks the duplicate detected MACs where
it requires to be resynced from bgpd to earlier state.

Ticket:#3233019
Issue:3233019

Signed-off-by: Chirag Shah <chirag@nvidia.com>
2023-04-20 15:45:39 -07:00
Chirag Shah
ad7685de28 zebra: evpn handle del event for dup detected mac
Upon receiving local mobility event for MAC + NEIGH,
both are detected as duplicate upon hitting DAD threshold.

Duplicated detected ( freezed) MAC + NEIGH are not known
to bgpd.

If locally learnt MAC + NEIGH are deleted in kernel,
the MAC is marked as AUTO after sending delete event
to bgpd.

Bgpd only reinstalls best route for MAC_IP route (NEIGH)
but not for MAC event.
This puts a situation where MAC is AUTO state and
associated neigh as remote.

Fix:
DUPLICATE + LOCAL MAC deletion, set MAC delete request
as reinstall from bgpd.

Ticket:#2873307
Reviewed By:
Testing Done:

Freeze MAC + two NEIGHs in local mobility event.
Delete MAC and NEIGH from kerenl.
bgp rsync remote mac route which puts MAC to remote state.

Signed-off-by: Chirag Shah <chirag@nvidia.com>
2023-04-20 15:45:26 -07:00
Renato Westphal
c262df828b ripd: support BFD integration
Implement RIP peer monitoring with BFD.

RFC 5882 Generic Application of Bidirectional Forwarding Detection
(BFD), Section 10.3 Interactions with RIP.

Co-authored-by: Renato Westphal <renato@opensourcerouting.org>
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
2023-04-19 09:15:01 -03:00
Chirag Shah
4a1f91a366 zebra: evpn mh sync mac install as inactive
EVPN MH ES reduendant VTEPs need to install
sync MAC as notify inactive and generate
ND:Proxy stamped extended community on Type-2
route.

Ticket:#3436621
Issue:3436621

Testing Done:

tor-11 originates type-2 MAC route:

tor-11# bridge -d fdb show | grep 00:65:00:00:00:01
00:65:00:00:00:01 dev hostbond1 vlan 1000 notify master bridge static

tor-12 receives sync MAC route:

Before fix:
----------
tor-12:/# bridge -d fdb show | grep 00:65:00:00:00:01
00:65:00:00:00:01 dev hostbond1 vlan 1000 notify master bridge static

After fix: inactive is set to MAC entry
----------
tor-12:/#bridge -d fdb show | grep 00:65:00:00:00:01
00:65:00:00:00:01 dev hostbond1 vlan 1000 notify inactive master bridge
static

Notice the difference in `inactive` post notify on tor-12
with the fix.

Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
Signed-off-by: Chirag Shah <chirag@nvidia.com>
2023-04-14 14:50:24 -07:00
Philippe Guibert
f38f5c9a78 zebra: keep seg6local information from 'show ipv6 route' consistent with iproute2
Srv6 nexthop segments may not be set when configuring seg6local
attributes. This is the case for the following seg6local route:

Dump in vtysh, extract from 'show ipv6 route'
> B>* 2001:db8:1:1:1::/128 [20/0] is directly connected, vrf1, seg6local End.DT46 table 10, seg6 ::, weight 1, 00:02:10

Dump in iproute2, extract from 'ip -6 route show'
> 2001:db8:1:1:1:: nhid 22  encap seg6local action End.DT46 vrftable 10 dev vrf1 proto bgp metric 20 pref medium

As can be seen, the 'seg6 ::' nexthop segment is not visible on iproute2,
because it is not set. Do not display seg6 ipv6 nexthop when not set.

After:
> B>* 2001:db8:1:1:1::/128 [20/0] is directly connected, vrf1, seg6local End.DT46 table 10, weight 1, 00:02:10

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2023-04-14 18:04:01 +02:00
Philippe Guibert
b4bb3b1735 zebra: display seg6local only when specified
Srv6 routes which configure encap method, may not have
seg6local instructions. Generally speaking, seg6local
attributes that are not specified should not be dumped.

Before:
> B>* 10.200.0.0/24 [20/0] via fd00:125::2, ntfp2 (vrf default), label 16, seg6local unspec unknown(seg6local_context2str), seg6 2001:db8:1:1:1::, weight 1, 0\
0:00:17

After:
> B>* 10.200.0.0/24 [20/0] via fd00:125::2, ntfp2 (vrf default), label 16, seg6 2001:db8:1:1:1::, weight 1, 00:00:17

Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
2023-04-14 18:04:01 +02:00
Mark Stapp
4b6b10cb81
Merge pull request #13273 from donaldsharp/metaq_not_making_me_meta_happy
zebra: Actually free up memory associated with the mq list
2023-04-12 14:02:14 -04:00
Mark Stapp
52ccf12c30
Merge pull request #13249 from Pdoijode/connected-route-install-fix
zebra: Mark connected route as installed after interface flap event
2023-04-12 11:03:47 -04:00
Donald Sharp
1b192d88e4 zebra: Actually free up memory associated with the mq list
Free up the link list data structures as well as properly
account for data sizes.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-04-12 10:41:42 -04:00
Jafar Al-Gharaibeh
bd2711d251
Merge pull request #12959 from leonshaw/fix/zif-link-nsid
zebra: Add link_nsid to zebra interface
2023-04-11 16:38:33 -05:00
Donatas Abraitis
b69fa56517
Merge pull request #13213 from mjstapp/fix_dplane_shutdown_event
zebra: fix race during shutdown
2023-04-11 22:24:35 +03:00
Pooja Jagadeesh Doijode
e25a0b138a zebra: Install directly connected route after interface flap
Issue:
After vlan flap, zebra was not marking the selected/best route as installed.

As a result, when a static route was configured with nexthop as directly
connected interface's(vlan) IP, the static route was not being installed
in the kernel since its nexthop was unresolved. The nexthop was marked
unresolved because zebra failed to mark the best route as installed after
interface flap.

This was happening because, in dplane_route_update_internal() if the old and
new context type, and nexthop group id are the same, then zebra doesn't send
down a route replace request to kernel. But, the installed (ROUTE_ENTRY_INSTALLED)
flag is set when zebra receives a response from kernel. Since the
request to kernel was being skipped for the route entry, installed flag
was not being set

Fix:
In dplane_route_update_internal() if the old and new context type, and
nexthop group id are the same, then before returning, installed flag will
be set on the route-entry if it's not set already.

Signed-off-by: Pooja Jagadeesh Doijode <pdoijode@nvidia.com>
2023-04-10 16:03:23 -07:00
Donatas Abraitis
cf35e49354
Merge pull request #13214 from chiragshah6/fdev2
zebra:return empty dict in json when evpn is disabled
2023-04-06 12:48:52 +03:00
Mark Stapp
27552b48ab zebra: null-check client pointer during GR processing
Add a null check.

Signed-off-by: Mark Stapp <mjs@labn.net>
2023-04-05 12:30:52 -04:00
Sindhu Parvathi Gopinathan
61f3a6c353 zebra:return empty dict when evpn is disabled
"show evpn json" returns nothing when evpn is disabled.

Code has been fixed to return {} when evpn is disabled or no entry
available.

Before Fix:-
```
cumulus@r2:mgmt:~$ sudo vtysh -c "show evpn json"
cumulus@r2:mgmt:~$
```

After Fix:-
```
cumulus@r1:mgmt:~$ sudo vtysh -c "show evpn json"
{
}
cumulus@r1:mgmt:~$
```

Ticket:#3417955

Issue:3417955

Testing: UT done

Signed-off-by: Chirag Shah <chirag@nvidia.com>
Signed-off-by: Sindhu Parvathi Gopinathan <sgopinathan@nvidia.com>
2023-04-04 19:41:25 -07:00
Jafar Al-Gharaibeh
92c4494ce5
Merge pull request #13145 from donaldsharp/do_delete
Improve and fix zebra GR
2023-04-04 21:10:54 -05:00
Mark Stapp
38a2e2cb26 zebra: fix race during shutdown
During shutdown, the main pthread stops the dplane pthread
before exiting. Don't try to clean up any events scheduled
to the dplane pthread at that point - just let the thread
exit and clean up.

Signed-off-by: Mark Stapp <mjs@labn.net>
2023-04-04 16:37:38 -04:00
Russ White
c0656e9040
Merge pull request #12837 from donaldsharp/unlikely_routemap
Unlikely routemap
2023-04-04 08:20:25 -04:00
Christian Hopps
9ecc5f3603
Merge pull request #13179 from donaldsharp/array_size
isisd, zebra: Use array_size instead of ARRAY_SIZE
2023-04-02 08:21:41 +09:00
Donald Sharp
6cd594ecfd isisd, zebra: Use array_size instead of ARRAY_SIZE
Use the FRR provided array_size.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-31 13:58:47 -04:00
Donald Sharp
3cd0accb50 zebra: Cleanup ctx leak on shutdown and turn off event
two things:

On shutdown cleanup any events associated with the update walker.
Also do not allow new events to be created.

Fixes this mem-leak:

./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790:Direct leak of 8 byte(s) in 1 object(s) allocated from:
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-    #0 0x7f0dd0b08037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-    #1 0x7f0dd06c19f9 in qcalloc lib/memory.c:105
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-    #2 0x55b42fb605bc in rib_update_ctx_init zebra/zebra_rib.c:4383
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-    #3 0x55b42fb6088f in rib_update zebra/zebra_rib.c:4421
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-    #4 0x55b42fa00344 in netlink_link_change zebra/if_netlink.c:2221
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-    #5 0x55b42fa24622 in netlink_information_fetch zebra/kernel_netlink.c:399
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-    #6 0x55b42fa28c02 in netlink_parse_info zebra/kernel_netlink.c:1183
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-    #7 0x55b42fa24951 in kernel_read zebra/kernel_netlink.c:493
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-    #8 0x7f0dd0797f0c in event_call lib/event.c:1995
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-    #9 0x7f0dd0684fd9 in frr_run lib/libfrr.c:1185
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-    #10 0x55b42fa30caa in main zebra/main.c:465
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-    #11 0x7f0dd01b5d09 in __libc_start_main ../csu/libc-start.c:308
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-SUMMARY: AddressSanitizer: 8 byte(s) leaked in 1 allocation(s).

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-31 09:09:21 -04:00
Jafar Al-Gharaibeh
3b0e17067e
Merge pull request #13082 from inspurSDN/bugfix_zebra_crash_rebooting
zebra: move vrf deleting handle to zebra final state handle
2023-03-31 00:17:19 -05:00
Donald Sharp
81322b96b0 zebra: Ensure gr events run after Meta Queue has run
BGP signals to zebra that a afi has converged immediately
after it has finished processing all routes for a given
afi/safi.  This generates events in zebra in this order

a) Routes received from BGP, placed on early-rib Meta-Q
b) Signal GR for the afi.

Now imagine that zebra reads GR code and immediately
processes routes that are in the actual rib and
removes some routes.  This generates a

c) route deletion to the kernel for some number of
routes that may be in the the early-rib Meta-Q
d) Process the Meta-Q, and re-install the routes

This is undesirable behavior in zebra.  In that
while we may end up in a correct state, there
will be a blip for some number of routes that
happen to be in the early rib Meta-Q.

Modify the GR code to have it's own processing
entry at the end of the Meta-Q.  This will
allow all routes to be processed and ready
for handling by the Graceful Restart code.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-29 20:25:51 -04:00
Donald Sharp
644a8d3560 zebra: remove current_afi as that it is no longer used
After the restructure of the gr code to allow zebra_gr
to have individual cleanups of afi, this is no longer necessary.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-29 15:40:56 -04:00
Donald Sharp
347ded1ec8 zebra: Allow GR to run per AFI as they are reported
The GR code in FRR used to wait till all AFI's were complete
before cleaning up the routes from the upper level protocol.
This of course can lead to some weird situations where say
ipv4 finishes and then v6 is stuck waiting for a peer to come
up and never finishes.  v4 when it finishes signals zebra that
it is done but no action is taken at that moment.

Modify the code to allow the zebra_gr.c code to handle a per
afi removal, instead of doing it all at the end.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-29 15:40:56 -04:00
Donald Sharp
9c1c21da8a zebra: Rearrange zebra_gr zapi functions
The zebra_gr code had 3 functions when effectively only
1 was needed.  Cleans up some code weirdness around
multiple switch statements for the same api->cap
as well as consolidating down to only caring about
SAFI_UNICAST, since that is all we care about at the
moment.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-29 15:40:56 -04:00
Donald Sharp
0f5ef7f9b1 zebra: zebra GR only works with AFI's limit it
We have code that tracks both afi and safi's,
but we only ever operate on the afi's.  So lets
limit our work being done to something more sensible.

I'm leaving the safi being broadcast through the zapi
message, as that I am not sure what else should be ripped
out at this point in time.

Finally re-arrange the zread_client_capabilites function
to stop the multiple levels of function calling that really
serve no purpose.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-29 15:40:13 -04:00
Donald Sharp
096abfb815 zebra: Remove redundant check for pointers being good
By the time this function is called we have already
ensured that the pointers are good several times.
I like consistency but this is a bit much

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-29 07:48:42 -04:00
Donald Sharp
0c1fd82df6 zebra: GR code could potentially stop running
When GR is running and attempting to clear up a node
if the node that is currently saved and we are coming
back to happens to be deleted during the time zebra
suspends the GR code due to hitting the node limit
then zebra GR code will just completely stop processing
and potentially leave stale nodes around forever.

Let's just remove this hole and process what we can.
Can you imagine trying to debug this after the fact?

If we remove a node then that counts toward the maximum
to process of ZEBRA_MAX_STALE_ROUTE_COUNT.  This should
prevent any non-processing with a slightly larger cost
of having to look at a few nodes repeatedly

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-29 07:48:42 -04:00
Donald Sharp
559dbc2ea1 zebra: Cleanup indentation in function
Indentation was deep and hard to understand in
zebra_gr_delete_stale_route

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-29 07:48:42 -04:00
Donald Sharp
310ee91718 zebra: Just set the variable for what is wanted in GR code
The info->do_delete variable was being set to true only when
u.val was 1.  The problem with this is that u.val is a union
and the various ways that we can call this event causes
different values to be written to the union value on the thread.

This makes no sense.  Just set the variable to what we want it to
be when we need it to be true.  Since it was only ever set during
a thread_execute section.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-29 07:48:42 -04:00
Donald Sharp
9a7d1e7427 zebra: Use zebra_vrf_lookup_by_id when we can
Let's make this as consistent as is possible.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-28 15:49:50 -04:00
Donald Sharp
24a58196dd *: Convert event.h to frrevent.h
We should probably prevent any type of namespace collision
with something else.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
cd9d053741 *: Convert struct event_master to struct event_loop
Let's find a better name for it.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
e16d030c65 *: Convert THREAD_XXX macros to EVENT_XXX macros
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
70d4d90c82 lib, zebra: Convert THREAD_TIMER_STRLEN to EVENT_TIMER_STRLEN
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
2453d15dbf *: Convert struct thread_master to struct event_master and it's ilk
Convert the `struct thread_master` to `struct event_master`
across the code base.

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
5f6eaa9b96 *: Convert a bunch of thread_XX to event_XX
Convert these functions:

thread_getrusage
thread_cmd_init
thread_consumed_time
thread_timer_to_hhmmss
thread_is_scheduled
thread_ignore_late_timer

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
70c35c11f2 *: Convert thread_should_yield and thread_set_yield_time
Convert thread_should_yield and thread_set_yield_time
to event_should_yield and event_set_yield_time

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
4f830a0799 *: Convert thread_timer_remain_XXX to event_timer_remain_XXX
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
8c1186d38e *: Convert thread_execute to event_execute
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00
Donald Sharp
332beb64b8 *: Convert thread_cancelXXX to event_cancelXXX
Modify the code base so that thread_cancel becomes event_cancel

Signed-off-by: Donald Sharp <sharpd@nvidia.com>
2023-03-24 08:32:17 -04:00