do not add a new route type, and consider 0 as a value meaning
that zebra should be the owner.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
zapi_nbr structure is renamed to zapi_neigh_ip.
Initially used to set a neighbor ip entry for gre interfaces, this
structure is used to get events from the zebra layer to nhrp layer.
The ndm state has been added, as it is needed on both sides.
The zebra dplane layer is slightly modified.
Also, to clarify what ZEBRA_NEIGH_ADD/DEL means, a rename is done:
it is called now ZEBRA_NEIGH_IP_ADD/DEL, and it signified that this
zapi interface permits to set link operations by associating ip
addresses to link addresses.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Instead of directly configuring the neighbor table after read from zapi
interface, a zebra dplane context is prepared to host the interface and
the family where the neighbor table is updated. Also, some other fields
are hosted: app_probes, ucast_probes, and mcast_probes. More information
on those fields can be found on ip-ntable configuration.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
EVPN neighbor operations were already done in the zebra dataplane
framework. Now that NHRP is able to use zebra to perform neighbor IP
operations (by programming link IP operations), handle this operation
under dataplane framework:
- assign two new operations NEIGH_IP_INSTALL and NEIGH_IP_DELETE; this
is reserved for GRE like interfaces:
example: ip neigh add A.B.C.D lladdr E.F.G.H
- use 'struct ipaddr' to store and encode the link ip address
- reuse dplane_neigh_info, and create an union with mac address
- reuse the protocol type and use it for neighbor operations; this
permits to store the daemon originating this neighbor operation.
a new route type is created: ZEBRA_ROUTE_NEIGH.
- the netlink level functions will handle a pointer, and a type; the
type indicates the family of the pointer: AF_INET or AF_INET6 if the
link type is an ip address, mac address otherwise.
- to keep backward compatibility with old queries, as no extension was
done, an option NEIGH_NO_EXTENSION has been put in place
- also, 2 new state flags are used: NUD_PERMANENT and NUD_FAILED.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This one also needed a bit of shuffling around, but MTYPE_RE is the only
one left used across file boundaries now.
Signed-off-by: David Lamparter <equinox@diac24.net>
Back when I put this together in 2015, ISO C11 was still reasonably new
and we couldn't require it just yet. Without ISO C11, there is no
"good" way (only bad hacks) to require a semicolon after a macro that
ends with a function definition. And if you added one anyway, you'd get
"spurious semicolon" warnings on some compilers...
With C11, `_Static_assert()` at the end of a macro will make it so that
the semicolon is properly required, consumed, and not warned about.
Consistently requiring semicolons after "file-level" macros matches
Linux kernel coding style and helps some editors against mis-syntax'ing
these macros.
Signed-off-by: David Lamparter <equinox@diac24.net>
like it has been done for iptable contexts, a zebra dplane context is
created for each ipset/ipset entry event. The zebra_dplane_ctx job is
then enqueued and processed by separate thread. Like it has been done
for zebra_pbr_iptable context, the ipset and ipset entry contexts are
encapsulated into an union of structures in zebra_dplane_ctx.
There is a specificity in that when storing ipset_entry structure, there
was a backpointer pointer to the ipset structure that is necessary
to get some complementary information before calling the hook. The
proposal is to use an ipset_entry_info structure next to the ipset_entry,
in the zebra_dplane context. That information is used for ipset_entry
processing. The ipset name and the ipset type are the only fields
necessary.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The iptable processing was not handled in remote dataplane, and was
directly processed by the thread in charge of zapi calls. Now that call
can be handled in the zebra_dplane separate thread. once a
zebra_dplane_ctx is allocated for iptable handling, the hook call is
performed later. Subsequently, a return code may be triggered to zclient
interface if any problem occurs when calling the hook call.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Feature overview:
=================
A 802.3ad bond can be setup to allow lacp-bypass. This is done to enable
servers to pxe boot without a LACP license i.e. allows the bond to go oper
up (with a single link) without LACP converging.
If an ES-bond is oper-up in an "LACP-bypass" state MH treats it as a non-ES
bond. This involves the following special handling -
1. If the bond is in a bypass-state the associated ES is placed in a
bypass state.
2. If an ES is in a bypass state -
a. DF election is disabled (i.e. assumed DF)
b. SPH filter is not installed.
3. MACs learnt via the host bond are advertised with a zero ESI.
When the ES moves out of "bypass" the MACs are moved from a zero-ESI to
the correct non-zero id. This is treated as a local station move.
Implementation:
===============
When (a) an ES is detached from a hostbond or (b) an ES-bond goes into
LACP bypass zebra deletes all the local macs (with that ES as destination)
in the kernel and its local db. BGP re-sends any imported MAC-IP routes
that may exist with this ES destination as remote routes i.e. zebra can
end up programming a MAC that was perviously local as remote pointing
to a VTEP-ECMP group.
When an ES is attached to a hostbond or an ES-bond goes
LACP-up (out of bypss) zebra again deletes all the local macs in the
kernel and its local db. At this point BGP resends any imported MAC-IP
routes that may exist with this ES destination as sync routes i.e.
zebra can end up programming a MAC that was perviously remote
as local pointing to an access port.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
If the dataplane thread hits the work limit while processing the
output queue for any given provider, we now explicitly reschedule
the thread.
Otherwise, if the number of items in the output queue is greater than
the work limit, draining of that output queue is dependent on new
dataplane work.
Routes which are not drained from the output queue are stuck with
the 'q' flag, so this is a similar issue to that observed in
164d8e8608.
Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
The way a couple of clauses were placed in a loop meant that
some info might not be collected - re-order things just a bit.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Returns the current number of (completed) contexts in the provider's
output queue (dp_ctx_out_q), allowing access to this data from the
provider itself.
Signed-off-by: Duncan Eastoe <duncan.eastoe@att.com>
Add the current queue depths for each plugin to the
'show dplane providers' output. Maintain the out-bound queue
max counter properly, that was being ignored.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add an api that allows a caller in the zebra main pthread to
process the queue of pending dplane updates. The caller supplies
a function to call to test each pending context. Selected
contexts are dequeued, and freed without being processed.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add extra data about the interfaces used in route updates'
nexthops - some consumers of route updates may want additional
data, but dataplane plugins running in the dplane pthread
cannot safely access the normal zebra data structures. Capturing
this info is optional - a plugin must request it (via an api).
Signed-off-by: Mark Stapp <mjs@voltanet.io>
split horizon filter, non-DF block filter and backup nexthop group
are passed as bridge port attributes to the dataplane.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
This includes -
1. non-DF block filter
2. List of es-peers that need to be blocked per-access port (for
split horizon filtering)
3. Backup nexthop group to failover local-es via the VxLAN overlay
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Limit the not re-installation of routes with the same NHG ID
to routes that are using the new NHG PROTO API. This would
only include sharpd and EVPN-MH for now.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When the dataplane detects that we have no need to
reinstall the same route, setup the NEXTHOP_FLAG_FIB
appropriately.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If we have received a route that the already existing
route is exactly the same, just note that it happened
and move on.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When installing rules pass by the interface name across
zapi.
This is being changed because we have a situation where
if you quickly create/destroy ephermeal interfaces under
linux the upper level protocol may be trying to add
a rule for a interface that does not quite exist
at the moment. Since ip rules actually want the
interface name ( to handle just this sort of situation )
convert over to passing the interface name and storing
it and using it in zebra.
Ticket: CM-31042
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
During testing it was noticed that routes were considered
installed by zebra, but the kernel did not have the route.
Upon close debugging of the rib it was noticed that FRR
was turning a dplane_ctx_route_init into a success and
FRR was now in a bad state.
2020/08/26 17:55:53.897436 PBR: route_notify_owner: [0.0.0.0/0] Route Removed succeeded for table: 10012
2020/08/26 17:55:53.897572 ZEBRA: 0.0.0.0/0: uptime == 432033, type == 24, instance == 0, table == 10012
2020/08/26 17:55:53.897622 ZEBRA: rib_meta_queue_add: (0:10012):0.0.0.0/0: queued rn 0x5566b0ea7680 into sub-queue 5
2020/08/26 17:55:53.907637 ZEBRA: default(0:10012):0.0.0.0/0: Processing rn 0x5566b0ea7680
2020/08/26 17:55:53.907665 ZEBRA: default(0:10012):0.0.0.0/0: Examine re 0x5566b0d01200 (pbr) status 2 flags 1 dist 200 metric 0
2020/08/26 17:55:53.907702 ZEBRA: default(0:10012):0.0.0.0/0: After processing: old_selected 0x0 new_selected 0x5566b0d01200 old_fib 0x0 new_fib 0x5566b0d01200
2020/08/26 17:55:53.907713 ZEBRA: default(0:10012):0.0.0.0/0: Adding route rn 0x5566b0ea7680, re 0x5566b0d01200 (pbr)
2020/08/26 17:55:53.907879 ZEBRA: default(0:10012):0.0.0.0/0: rn 0x5566b0ea7680 dequeued from sub-queue 5
2020/08/26 17:55:53.907943 ZEBRA: netlink_route_multipath: RTM_NEWROUTE 0.0.0.0/0 vrf 0(10012)
2020/08/26 17:55:53.910756 ZEBRA: default(0:10012):0.0.0.0/0 Processing dplane result ctx 0x5566b0ea82f0, op ROUTE_INSTALL result SUCCESS
2020/08/26 17:55:53.910769 ZEBRA: update_from_ctx: default(0:10012):0.0.0.0/0: SELECTED, re 0x5566b0d01200
2020/08/26 17:55:53.910785 ZEBRA: default(0:10012):0.0.0.0/0 update_from_ctx(): no fib nhg
2020/08/26 17:55:53.910793 ZEBRA: default(0:10012):0.0.0.0/0 update_from_ctx(): rib nhg matched, changed 'true'
2020/08/26 17:55:53.910802 ZEBRA: (0:10012):0.0.0.0/0: Redist update re 0x5566b0d01200 (pbr), old 0x0 (None)
2020/08/26 17:55:53.910812 ZEBRA: Notifying Owner: 24 about prefix 0.0.0.0/0(10012) 2 vrf: 0
2020/08/26 17:55:53.910912 PBR: route_notify_owner: [0.0.0.0/0] Route installed succeeded for table: 10012
2020/08/26 17:55:55.400516 ZEBRA: RTM_DELROUTE 0.0.0.0/0 vrf default(0) table_id: 10012 metric: 20 Admin Distance: 0
2020/08/26 17:55:55.400527 ZEBRA: rib_delete: (0:10012):0.0.0.0/0: rn 0x5566b0ea7680, re 0x5566b0d01200 (pbr) was deleted from kernel, adding
We were receiving a notification from the kernel that the route was deleted and deciding
that we needed to reinstall it. At that point in time when it got into the dplane
handlers to convert it to the dplane pthread, the dplane decided to drop the request
convert it too a success and not do anything.
This code change removes the conversion from this failure to success and
notifies the upper level about it. After this change the default route
to table 10012 is now properly marked as rejected:
root@mlx-2700-07:mgmt:/var/log/frr# vtysh -c "show ip route table 10012"
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
VRF default table 10012:
F>r 0.0.0.0/0 [200/0] via 172.168.1.164, isp2-uplink (vrf PUBLIC), weight 1, 00:24:48
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are not using nexthop groups, there is no need to
test for whether or not they are installed correctly or not
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We can make the Linux kernel send an ARP/NDP request by adding
a neighbour with the 'NUD_INCOMPLETE' state and the 'NTF_USE' flag.
This commit adds new dataplane operation as well as new zapi message
to allow other daemons send ARP/NDP requests.
Signed-off-by: Jakub Urbańczyk <xthaid@gmail.com>
Reverting probing of neigh entry. There is a timing where
probe and remote macip add request comes at the same time resulting
in neigh to remain in local state event though it should be remote.
In mobility case, the host moves to remote VTEP, first MAC only type-2
route is received which triggers a PROBE of neighs (associated to MAC).
PROBE request can go via network port to remote VTEP.
PROBE request picks up local neigh with MAC entry's outgoing port is
remote VTEP tunnel port.
The PROBE reply and MAC-IP (containing IP) almost comes same time at
DUT.
DUT first processes remote macip and installs neigh as remote.
Followed by receives neigh as REACHABLE which marks neigh as LOCAL.
FRR does have BPF filter which does not allow its own netlink request
to receive. Otherwise frr's request to program neigh as remote can move
neigh from local to remote.
Though ordering can not be guranteed that REACHABLE (PROBE's repsonse)
can come at anytime and move it to LOCAL.
This fix would not suffice the needs of converging LOCAL inactive neighs
to remove from DB. As mobility draft sugges to PROBE local neigh when
MAC moves to remote but it is not working with current framework.
Ticket:CM-22864
This reverts commit 44bc8ae550
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Multihoming support requires a new dataplane feature, MAC-ECMP, to
bridge traffic to remote ESs that are attached to more than one
active VTEP.
As a part of this support indirection has also been added via
L2-NHGs. Using a nexthop group allows for fast failover
of MAC entries when an access port attached to a remote-ES goes
down i.e. instead of updating many MAC entries this becomes a
single NHG update to the dataplane.
Note: Some of the code here needs to be reworked to the new
dataplane model.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Include any installed backup nexthops when installing
pseudowires; include installed backups in vty and json
pw show output.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Improve vty output for routes and lsps with backups, including
json. Simplify or correct some code that uses both primary and
backup nexthops in dplane, nht.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Include any installed backups when updating the local kernel
after processing an async notification. This includes routes'
nexthops and LSPs' nhlfes.
Add the 'b' character to the route show display and header to
indicate backup nexthops.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Extend PBR maps to discriminate by Differentiated Services Code Point and / or
Explicit Congestion Notification fields. These fields are used in the IP header
for classifying network traffic.
0 1 2 3 4 5 6 7
+-----+-----+-----+-----+-----+-----+-----+-----+
| DS FIELD, DSCP | ECN FIELD |
+-----+-----+-----+-----+-----+-----+-----+-----+
DSCP: differentiated services codepoint
ECN: Explicit Congestion Notification
Signed-off-by: Wesley Coakley <wcoakley@nvidia.com>
Signed-off-by: Saurav Kumar Paul <saurav@cumulusnetworks.com>
Remove mid-string line breaks, cf. workflow doc:
.. [#tool_style_conflicts] For example, lines over 80 characters are allowed
for text strings to make it possible to search the code for them: please
see `Linux kernel style (breaking long lines and strings)
<https://www.kernel.org/doc/html/v4.10/process/coding-style.html#breaking-long-lines-and-strings>`_
and `Issue #1794 <https://github.com/FRRouting/frr/issues/1794>`_.
Scripted commit, idempotent to running:
```
python3 tools/stringmangle.py --unwrap `git ls-files | egrep '\.[ch]$'`
```
Signed-off-by: David Lamparter <equinox@diac24.net>
Add an init api (based on what had been a private/static api)
to allow a caller to init a context and use it to generate LSP
updates. This might be useful for testing, or from a dplane
plugin.
Signed-off-by: Mark Stapp <mjs@voltanet.io>