We are effectively calling nexthop_active_update() on every
route entry being processed for installation at least 2 times.
This is a bit ridiculous. We need to resolve the nexthops
when we know a route has changed in some manner, so do so.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
zlog() should be part of the public logging API as it's useful in
the cases where the logging priority isn't known at compile time
(i.e. it depends on a variable).
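A minimal sketch of the use case, assuming a zlog()-like entry point that takes the priority as its first argument. demo_zlog() and demo_report() are hypothetical stand-ins that format into a buffer; they are not FRR's implementation.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <syslog.h>

/* Hypothetical stand-in for a public zlog(priority, fmt, ...): writes
 * "<priority> message" into buf instead of a real log target. */
static int demo_zlog(char *buf, size_t len, int priority, const char *fmt, ...)
{
	va_list ap;
	int used;

	assert(buf != NULL && fmt != NULL && len > 0);
	used = snprintf(buf, len, "<%d> ", priority);
	if (used < 0 || (size_t)used >= len)
		return -1;
	va_start(ap, fmt);
	vsnprintf(buf + used, len - (size_t)used, fmt, ap);
	va_end(ap);
	return priority;
}

/* The priority is chosen at run time, which a fixed-level wrapper
 * (one function per priority) cannot express in a single call. */
static int demo_report(char *buf, size_t len, int failed)
{
	int priority = failed ? LOG_ERR : LOG_DEBUG;

	return demo_zlog(buf, len, priority, "operation %s",
			 failed ? "failed" : "succeeded");
}
```

With only per-priority wrappers, demo_report() would need a branch around two separate logging calls.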
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
An L3VNI configured in a specific VRF could be unconfigured from any
VRF, including the default (global) VRF. This results in an L3VNI delete
notification to BGP and a subsequent type-5 route uninstall from the VRF
the L3VNI belongs to.
This also resulted in an inconsistent running configuration.
The deleted L3VNI still shows up in its original VRF. The VRF in which the
"no vni <x>" was executed doesn't display its own L3VNI.
Added a VRF check in zebra to prevent this.
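The kind of check described above could be sketched as follows. The struct, field names and return codes are hypothetical and only illustrate refusing the unconfigure when the CLI VRF does not own the L3VNI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniature of an L3VNI and the VRF it is configured in. */
struct demo_l3vni {
	uint32_t vni;
	uint32_t vrf_id;
};

/* Returns 0 when the unconfigure may proceed, -1 when it is refused
 * because "no vni <x>" was issued in a VRF that does not own the L3VNI. */
static int demo_l3vni_unconfigure(const struct demo_l3vni *l3vni,
				  uint32_t cli_vrf_id)
{
	assert(l3vni != NULL);
	if (l3vni->vrf_id != cli_vrf_id)
		return -1;
	return 0;
}
```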
Signed-off-by: Kishore Aramalla <karamalla@vmware.com>
During route recovery, the route installation cycling and the next hop
label check could leave the PW never recovered. The original code shows
the intention of retrying, but the retry itself was missing. The fix adds
the call that programs the timer for the recovery attempt.
Example for reproducing the issue:
|P1| <-> |P2| <-> |P3|
- P1, P2 and P3 are nodes using IS-IS as IGP, with a pseudowire
between P1 and P3 (P1, P2 and P3 all have LDP daemons configured).
- After 60 seconds, kill the IS-IS daemon in P2.
- Wait 30 seconds
- Launch again the IS-IS daemon in P2
- The bug/issue is that after P1 <-> P3 recover connectivity, sometimes
the PW is not recovered, for the reason explained in the first paragraph.
Signed-off-by: F. Aragon <paco@voltanet.io>
In the zebra terminate path, removal of the node from the RB_TREE
table was attempted twice. This led to a crash during
zebra shutdown. zebra_router_free_table already calls RB_REMOVE
to remove a node from the rb tree table.
siginfo=0x7fffd9134a30, context=<optimized out>) at lib/sigevent.c:249
rbt=<optimized out>, t=<optimized out>) at lib/openbsd-tree.c:226
t=0x56296965ff50 <zebra_router_table_head_RB_INFO>) at lib/openbsd-tree.c:383
rbt=rbt@entry=0x562969669bd0 <zrouter+16>, elm=elm@entry=0x56296afcf810)
at lib/openbsd-tree.c:393
(elm=0x56296afcf810, head=0x562969669bd0 <zrouter+16>) at zebra/zebra_router.h:46
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
We were memsetting zebra_pbr_rule struct after
we had already put some information in it. Also updated
the init of the struct to use braces instead of a
memset.
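To illustrate the ordering problem, here is a small sketch with a hypothetical, trimmed-down struct (the real zebra_pbr_rule has different fields). The first function wipes what it just stored; the second zero-initializes with braces at declaration and fills in afterwards.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the rule struct. */
struct demo_rule {
	uint32_t seq;
	uint32_t priority;
	uint32_t table;
};

/* Buggy ordering: fields are filled in, then memset() erases them. */
static struct demo_rule demo_rule_buggy(uint32_t seq, uint32_t table)
{
	struct demo_rule rule;

	rule.seq = seq;
	rule.table = table;
	memset(&rule, 0, sizeof(rule));
	return rule;
}

/* Fixed ordering: brace-initialize to zero first, then fill in. */
static struct demo_rule demo_rule_fixed(uint32_t seq, uint32_t table)
{
	struct demo_rule rule = {0};

	rule.seq = seq;
	rule.table = table;
	return rule;
}
```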
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The `show ipv[4|6] <nht|import-check> ...` commands are starting
to produce a bunch of output due to multiple daemons now
using the code. Allow the specification of a v4 or v6 address
to allow the show command to only display the interesting nht.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This fix covers the case where two or more events are processed but only one
becomes effective. E.g. when mixing a synchronous label request from an LDP
daemon and an asynchronous request from a BGP daemon, BGP could end up
with the label chunk while LDP stays stuck waiting for the response.
Given e.g.
ldpd <-------->
(sync label request)
Zebra (label proxy) <--> Zebra (shared label manager)
bgpd <-------->
(async label request)
Sequence:
LDP label request ----->
Zebra (label proxy FW) ----> Zebra (LM)
BGP label request ----->
Zebra (label proxy FW) ----> Zebra (LM)
<---- Zebra (LM) RP LDP
<---- Zebra (LM) RP BGP
Signed-off-by: F. Aragon <paco@voltanet.io>
We don't use the vrf-level VRF_RIB_SCHEDULED flag any longer;
remove it and collapse the zebra_vrf flags' values.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The current code path of registration does this:
a) Lookup or create the rnh
b) register the client with the rnh for callback
If this is a new rnh send a response to the client that
only includes the rnh data that it has (nothing so no path)
If this is an existing rnh send the actual path to the client,
if it exists.
c) If a new client or a flag has changed refigure and send result
to all clients.
This is problematic in that suppose the rnh is new. Clients
will receive two answers:
1) A call back with no nexthops
2) A call back with the resolved # of nexthops
Imagine pim, which depends on nht to handle this: pim will create
an mroute (because it does a hard lookup of the rpf as it is registering
the nexthop), then it will receive the first callback, causing
it to tear down the mroute, and then receive the second callback,
causing it to put it right back. This is obviously not very
good for mroutes.
This code moves the send to the new client till after the new
client has connected, thus only allowing one callback to the new
client with the actual answer.
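The difference between the two orderings can be modelled in a few lines. This is a toy model with hypothetical types and function names, not zebra's rnh code; it only counts the answers a newly registered client receives.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature types; zebra's real structs differ. */
struct demo_rnh {
	bool evaluated; /* has the nexthop been resolved against the RIB? */
	int nexthops;   /* number of resolved nexthops */
};

struct demo_client {
	int callbacks;       /* how many answers the client received */
	bool saw_unresolved; /* did any answer carry zero nexthops? */
};

static void demo_send(struct demo_client *client, const struct demo_rnh *rnh)
{
	int nexthops = rnh->evaluated ? rnh->nexthops : 0;

	client->callbacks++;
	if (nexthops == 0)
		client->saw_unresolved = true;
}

static void demo_evaluate(struct demo_rnh *rnh, int rib_nexthops)
{
	rnh->nexthops = rib_nexthops;
	rnh->evaluated = true;
}

/* Old ordering for a new rnh: answer at once, evaluate, answer again. */
static void demo_register_old(struct demo_client *client,
			      struct demo_rnh *rnh, int rib_nexthops)
{
	assert(client != NULL && rnh != NULL);
	demo_send(client, rnh);
	demo_evaluate(rnh, rib_nexthops);
	demo_send(client, rnh);
}

/* New ordering: evaluate first, then send the single real answer. */
static void demo_register_new(struct demo_client *client,
			      struct demo_rnh *rnh, int rib_nexthops)
{
	assert(client != NULL && rnh != NULL);
	demo_evaluate(rnh, rib_nexthops);
	demo_send(client, rnh);
}
```

In the old ordering the client sees an empty answer followed by the real one, which is what made pim tear down and re-create the mroute.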
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Routing protocols are allowed (and even encouraged) to modify
the flags that influence the nexthop tracking. As such, when
we modify the tracking of a nexthop, say by turning connected force
on or off, we must re-evaluate the nexthop and send the results
up to the interested parties.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
After we have evaluated the rnh for an import-check type
and we copy the re then we know that the state has changed
and we should be notifying the end user about it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
LSP processing was a zvrf flag based upon a connected route
coming or going. But this did not allow us to know
that we should do lsp processing other than after the meta-queue
processing was finished.
Eventually we moved meta-queue processing of do_nht_processing
to after the dataplane sent the main pthread some results.
This of course left us with a timing hole where if a connected
route came in and we received a data plane response *before*
the meta queue was processed we would not do the work as necessary.
Move the lsp processing to a flag off of the rib_dest_t. If it
is marked then we need to process lsps.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a detailed debugging command for NHT tracking and add
detailed output to the log about why we make the decisions
that we do. I tried to model this on the rib processing
detailed debugs that we added a few months back.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Currently nexthop tracking is performed for all nexthops that
are being tracked after a group of contexts are passed back
from the data plane for post install processing.
This is inefficient and leaves us sending nexthop tracking
changes at an accelerated pace, whenever we think we've changed
a route. Additionally, every route change will cause us
to relook at all nexthops we are tracking, regardless of whether
they are related to the route change or not.
Let's modify the code base to track the rnh's off of the rib
table's rn, `rib_dest_t`. So after we process a node, install
it into the data plane, in rib_process_result we can
look at the `rib_dest_t` associated with the rn and see that
a nexthop depended on this route node. If so, refigure it.
Additionally we will store rnh's that are not resolved on the
0.0.0.0/0 nexthop tracking list. As such when a route node
changes we can quickly walk up the rib tree and notice that
it needs to be reprocessed as well.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a default route_node for our routing tables. This will allow us
to know that we can hang data off the default route for processing.
We will be hanging the nexthop tracking data structures off the rib_dest_t
so that we can know which nexthops we need to handle. Effectively
nexthops that we are tracking that are unresolved will be stored on the
default route. When something changes in the rib tree we can
work up the rn->parent pointer checking for nexthops we need to re-evaluate.
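A rough sketch of that upward walk, using a hypothetical miniature node type rather than FRR's struct route_node and rib_dest_t. Unresolved tracked nexthops sit on the default route, so every walk from a changed node reaches them last.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of a route node with tracking data attached. */
struct demo_node {
	struct demo_node *parent; /* covering prefix; NULL above the default route */
	int tracked_nexthops;     /* tracked nexthops hung off this node */
	int reevaluated;          /* how many times this node was revisited */
};

/* Walk from the changed node up through rn->parent, re-evaluating any
 * tracked nexthops found on the way. Returns how many were re-evaluated. */
static int demo_reevaluate_upwards(struct demo_node *changed)
{
	int total = 0;

	assert(changed != NULL);
	for (struct demo_node *rn = changed; rn != NULL; rn = rn->parent) {
		if (rn->tracked_nexthops > 0) {
			rn->reevaluated++;
			total += rn->tracked_nexthops;
		}
	}
	return total;
}
```

Only nodes on the path from the changed prefix to the default route are visited, instead of every tracked nexthop in the table.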
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The resolved_route is the prefix we are using in the routing table
to resolve this particular nexthop we are tracking. Add code
to better track its change.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The prn value as passed in may be NULL; as such, do not
allow it to be dereferenced (even though it works now).
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We have several route types, KERNEL and CONNECT, that are handled via special
cases in the code. This was causing a lot of work keeping the two different
classes of route types apart as special (SYSTEM or not). Put the dplane
in charge of the code that sets the bits for signalling route install/failure.
This greatly simplifies the code calling path and makes all route types
be handled exactly the same. Additionally, code that we want to run
post data plane install can just work as per normal then, instead
of having to know we need to run it when we have a special type
of route.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we get a route install failure from the kernel, actually
indicate in the rib the status of the routes.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When switching routes from one route type to another actually
unset the old route as enqueued.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When shutting down, the individual vrfs own the shutdown of the table
and subsequent removal of the routes from the kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When shutting down with a very large table to tear down,
close the zebra zserv client socket after we've intentionally
closed all the client connections.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
It had no logical reason to be in the default VRF. This moves it to the
zebra_router, which is better suited to store global references.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
A lot of checks relied on the VRF ID and the EVPN VRF ID to be the same.
This patch changes those checks to the EVPN_ENABLED macro, which checks
if the VRF is the EVPN one.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
Fix the macros for reading NLA attribute info
from an extended error ack. We were processing the data
using route attributes (rtattr) which is identical in size
to nlattr but probably should not be used.
Further, we were incorrectly calculating the length of the
inner netlink message that caused the error. We have to read
past that in order to access all the nlattr's.
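A sketch of the offset calculation, under the assumption that the attributes follow the error header plus the payload of the echoed request unless the kernel capped the echo (my understanding of the NLM_F_CAPPED flag). The structs are local mirrors of the netlink layouts so the example builds without kernel headers; all demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the netlink message header layout. */
struct demo_nlmsghdr {
	uint32_t nlmsg_len; /* length of the message including this header */
	uint16_t nlmsg_type;
	uint16_t nlmsg_flags;
	uint32_t nlmsg_seq;
	uint32_t nlmsg_pid;
};

/* Local mirror of the error payload: errno plus the offending header. */
struct demo_nlmsgerr {
	int32_t error;
	struct demo_nlmsghdr msg;
};

/* Netlink pads lengths to a 4-byte boundary. */
static size_t demo_align4(size_t len)
{
	return (len + 3) & ~(size_t)3;
}

/* Offset, from the start of the error payload, of the first extended
 * ack attribute. When the echo is not capped, the payload of the inner
 * message sits between the error header and the attributes. */
static size_t demo_extack_offset(const struct demo_nlmsgerr *err, bool capped)
{
	size_t off = sizeof(*err);

	assert(err != NULL);
	if (!capped) {
		assert(err->msg.nlmsg_len >= sizeof(struct demo_nlmsghdr));
		off += demo_align4(err->msg.nlmsg_len
				   - sizeof(struct demo_nlmsghdr));
	}
	return off;
}
```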
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
For a MAC-IP pair, the local/netlink msg for the MAC is generally
received first, followed by the Neigh. The MAC can be detected as duplicate
during this event.
When a neigh update is received, the neigh inherits the DUP flag from its
MAC and, along with that, is marked as INACTIVE.
Also, in the case of a DUP detected neigh, do not update its state
to ACTIVE before determining whether to send a notification to bgpd.
Sometimes the Neigh update is received prior to the MAC update.
In that case the neigh is marked as inactive since its MAC is
still in REMOTE state. Once the MAC update is received and
it is detected as DUPLICATE, the neigh would inherit the DUP flag
but remain in inactive state.
By fixing the first case, the neigh remains inactive once
detected as DUPLICATE in both scenarios.
The unfreeze action marks all inherited neighs as ACTIVE,
clears the DUP flag, then sends a notification to bgpd (to send type-2).
Ticket:CM-24339
Reviewed By:CCR-8451
Testing Done:
Validated dup detection in both environments, where either the neigh or
the mac notification can come first.
With the fix, the neigh remained in "inactive" state
once detected as duplicate.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The 'sho ip route summary' and 'sho ip route summary <prefix>'
paths used different definitions of a 'fib' route. Use
the route-entry 'INSTALLED' flag in both places.
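As an illustration, both code paths could share one predicate like the one below. The flag value and struct are hypothetical; only the idea of keying the 'fib' count on the installed status comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit standing in for the route-entry INSTALLED flag. */
#define DEMO_ENTRY_INSTALLED (1u << 0)

struct demo_entry {
	uint32_t status;
};

/* Single definition of a 'fib' route, shared by both summary paths. */
static bool demo_is_fib_route(const struct demo_entry *re)
{
	return (re->status & DEMO_ENTRY_INSTALLED) != 0;
}

static size_t demo_count_fib(const struct demo_entry *entries, size_t count)
{
	size_t fib = 0;

	assert(count == 0 || entries != NULL);
	for (size_t i = 0; i < count; i++)
		if (demo_is_fib_route(&entries[i]))
			fib++;
	return fib;
}
```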
Signed-off-by: Mark Stapp <mjs@voltanet.io>
This replaces manual checks of the flag with a wrapper macro to convey
the meaning "is evpn enabled on this vrf?"
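Such a wrapper might look like the sketch below. The struct and the DEMO_ names are hypothetical; the assumption, taken from the description of the EVPN VRF elsewhere in this log, is that the flag records whether `advertise-all-vni` is present on the VRF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of the per-VRF zebra state. */
struct demo_zvrf {
	unsigned int vrf_id;
	bool advertise_all_vni; /* set on the one VRF that carries EVPN */
};

/* Reads as "is evpn enabled on this vrf?" at every call site. */
#define DEMO_EVPN_ENABLED(zvrf) ((zvrf)->advertise_all_vni)

/* Example call site: accept EVPN messages only on the EVPN VRF,
 * whichever VRF that happens to be. */
static bool demo_accept_evpn_msg(const struct demo_zvrf *zvrf)
{
	assert(zvrf != NULL);
	return DEMO_EVPN_ENABLED(zvrf);
}
```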
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
Rename {bgp,zvrf}_def{ault} to {bgp,zvrf}_evpn where it makes sense,
i.e. when they contain the EVPN instance.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
Since the EVPN VRF may not be the default one, compare received
messages' VRF against the EVPN VRF and not the Default.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
This uses the EVPN VRF to store L3VNIs hashes, and looks up L2VNIs in
this VRF as they are stored there.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
This sends local VNIs and local MAC addresses to the BGP instance
responsible for EVPN rather than the default one.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
Since the EVPN session and underlay can be in a non-default VRF, the
default VRF can be an overlay VRF.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
If the EVPN VRF is not the default one (i.e. with advertise-all-vni),
this allows showing its information with `show bgp l2evpn evpn ...`
commands. They do not require adding `vrf VRFNAME` since we only
support a single EVPN VRF. The same is true for zebra-specific commands
(e.g. `show evpn ...`).
Configuration commands are not restricted to the default VRF but to
the EVPN one, that is to the one bearing `advertise-all-vni`.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
The EVPN VRF is defined by bgpd, and is the one vrf where
`advertise-all-vni` is present.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
Duplicate address detection and recovery was relying on the l2-vni backptr
in the neighbor entry which was simply not initialized resulting in
a NULL pointer access in a setup with dup-addressed VMs -
VM1:{IP1,M1} and VM2:{IP1,M2}
Call stack:
(gdb) bt 6
at lib/sigevent.c:249
nbr=nbr@entry=0x559347f901d0, vtep_ip=..., vtep_ip@entry=..., do_dad=do_dad@entry=true,
is_dup_detect=is_dup_detect@entry=0x7ffc7f6be59f, is_local=is_local@entry=true)
at ./lib/ipaddr.h:86
ip=0x7ffc7f6be6f0, ifp=0x559347f901d0, zvni=0x559347f86800) at zebra/zebra_vxlan.c:3152
(More stack frames follow...)
(gdb) p nbr->zvni
$8 = (zebra_vni_t *) 0x0 <<<<<<<<<<<<<<<<<<<<
(gdb)
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
System Routes if received over the netlink bus in a
specific pattern that causes an update operation for that
route in zebra can leave the dest->selected_fib pointer NULL,
while having the ZEBRA_FLAG_SELECTED flag set. Specifically
one way to achieve this is to do this:
`ip addr del 4.5.6.7/32 dev swp1 ; ip addr add 4.5.6.7/32 dev swp1 metric 9`
Why is this a big deal?
Because nexthop tracking is looking at ZEBRA_FLAG_SELECTED to
know if we can use a route, while nexthop active checking uses
dest->selected_fib.
So imagine we have bgp registering a nexthop. nexthop tracking in
the above case will be able to choose the 4.5.6.7/32 route
if that is what the nexthop is, due to the ZEBRA_FLAG_SELECTED being
properly set. BGP then allows the peer's connection to come up and we
install routes with a 4.5.6.7 nexthop. The rib processing for route
installation will then look at the 4.5.6.7 route, see no
dest->selected_fib, and then start walking up the tree to resolve
the route. In our case we could easily hit the default route and be
unable to resolve the route, which then becomes inactive in the
rib, so we never attempt to install it.
The cause of the problem: when rib_process decides
that we need to update the fib (i.e. replace old w/ new), the
replacement with new was not setting the `dest->selected_fib` pointer
to the new route_entry when the route was a system route. This commit
makes that path set the pointer.
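The mismatch between the two views can be shown with a toy model. Everything here is hypothetical (types, names, and the apply_fix switch that mimics the old behaviour); it is not the rib_process code, only a sketch of the invariant that the flagged entry should also be the selected_fib.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_route_entry {
	bool selected; /* stands in for ZEBRA_FLAG_SELECTED */
	bool system;   /* kernel/connected style route */
};

struct demo_dest {
	struct demo_route_entry *selected_fib;
};

/* Replace old_re with new_re as the chosen route. With apply_fix false,
 * system routes skip the pointer update, mimicking the bug. */
static void demo_fib_replace(struct demo_dest *dest,
			     struct demo_route_entry *old_re,
			     struct demo_route_entry *new_re, bool apply_fix)
{
	assert(dest != NULL && old_re != NULL && new_re != NULL);
	old_re->selected = false;
	new_re->selected = true;
	if (dest->selected_fib == old_re)
		dest->selected_fib = NULL;
	if (!new_re->system || apply_fix)
		dest->selected_fib = new_re;
}

/* Nexthop tracking reads the flag, nexthop active checking reads the
 * pointer; both must give the same answer for a given entry. */
static bool demo_views_agree(const struct demo_dest *dest,
			     const struct demo_route_entry *re)
{
	return re->selected == (dest->selected_fib == re);
}
```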
Ticket: CM-24203
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The dest->selected_fib should be reported in json output
so that we can debug subtle conditions a bit better in the
future.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Clean up the rnh tables on shutdown before we clean up tables. This
removes any need to do rnh processing as part of shutdown.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we get a neighbor entry in zebra we start processing it.
Let's add some additional debugs to the processing so that when
it bails out and we don't use the data, we know the reason.
This should help in debugging why bgp does
not appear to have data associated with a neighbor entry
in the kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The check for an entry being NUD_PERMANENT has already been done;
there is no need to do it twice.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Use const in the accessors for pseudowire nhlfe data; pull
that through the kernel-facing apis that use that data.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
In prep for adding nexthop info for pws, rename the accessor
for the pw destination. Add a nexthop-group to the pw
data in the dataplane module.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The current definition of an unnumbered interface as an interface with a
/32 IPv4 is too restrictive, especially for EVPN symmetric routing since
commit 2b83602b2 "*: Explicitly mark nexthop of EVPN-sourced routes as
onlink".
That commit removes the bypass check on whether the nexthop is an EVPN
VTEP, and relies on the SVI being unnumbered to bypass the gateway lookup.
While this works great if the SVI has an IP, it might not have one, in
which case the test falls flat and EVPN type 5 routes are not installed
into the RIB.
Sample interface setup, where vxlan-blue is the L3VNI and br-blue the
SVI:
+----------+
| |
| vrf-blue |
| |
+---+--+---+
| |
+-------+ +-----------+
| |
+----+----+ +---------+---------+
| | | br1 |
| br-blue | | 10.0.0.1/24 |
| | +-+-------+-------+-+
+----+----+ | | |
| | | |
+-----+------+ +-----+--+ +--+---+ +-+----+
| | | | | | | |
| vxlan-blue | | vxlan1 | | eth1 | | eth2 |
| | | | | | | |
+------------+ +--------+ +------+ +------+
For inter-VNI routing, the SVI has no reason to have an IP, but it still
needs type-5 routes from remote VTEPs.
This commit expands the definition of an unnumbered interface to an
interface having a /32 IPv4 or no IPv4 at all.
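The expanded definition can be sketched as a predicate over the prefix lengths of an interface's IPv4 addresses. The function name and the "any /32 counts" rule are assumptions made for illustration; the actual check may look at the addresses differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* prefixlens holds the prefix length of each IPv4 address configured on
 * the interface; count == 0 means the interface has no IPv4 at all. */
static bool demo_is_unnumbered(const unsigned char *prefixlens, size_t count)
{
	assert(count == 0 || prefixlens != NULL);

	/* New case: an interface with no IPv4 address, e.g. a bare SVI. */
	if (count == 0)
		return true;

	/* Original case: an interface carrying a /32 IPv4 address. */
	for (size_t i = 0; i < count; i++)
		if (prefixlens[i] == 32)
			return true;
	return false;
}
```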
Signed-off-by: Tuetuopay <tuetuopay@me.com>
When a vrf is deleted we need to tell the zebra_router that we have
finished using the tables we are keeping track of. This will allow
us to properly cleanup the data structures associated with them.
This fixes this valgrind error found:
==8579== Invalid read of size 8
==8579== at 0x430034: zvrf_id (zebra_vrf.h:167)
==8579== by 0x432366: rib_process (zebra_rib.c:1580)
==8579== by 0x432366: process_subq (zebra_rib.c:2092)
==8579== by 0x432366: meta_queue_process (zebra_rib.c:2188)
==8579== by 0x48C99FE: work_queue_run (workqueue.c:291)
==8579== by 0x48C3788: thread_call (thread.c:1607)
==8579== by 0x48A2E9E: frr_run (libfrr.c:1011)
==8579== by 0x41316A: main (main.c:473)
==8579== Address 0x5aeb750 is 0 bytes inside a block of size 4,424 free'd
==8579== at 0x4839A0C: free (vg_replace_malloc.c:540)
==8579== by 0x438914: zebra_vrf_delete (zebra_vrf.c:279)
==8579== by 0x48C4225: vrf_delete (vrf.c:243)
==8579== by 0x48C4225: vrf_delete (vrf.c:217)
==8579== by 0x4151CE: netlink_vrf_change (if_netlink.c:364)
==8579== by 0x416810: netlink_link_change (if_netlink.c:1189)
==8579== by 0x41C1FC: netlink_parse_info (kernel_netlink.c:904)
==8579== by 0x41C2D3: kernel_read (kernel_netlink.c:389)
==8579== by 0x48C3788: thread_call (thread.c:1607)
==8579== by 0x48A2E9E: frr_run (libfrr.c:1011)
==8579== by 0x41316A: main (main.c:473)
==8579== Block was alloc'd at
==8579== at 0x483AB1A: calloc (vg_replace_malloc.c:762)
==8579== by 0x48A6030: qcalloc (memory.c:110)
==8579== by 0x4389EF: zebra_vrf_alloc (zebra_vrf.c:382)
==8579== by 0x438A42: zebra_vrf_new (zebra_vrf.c:93)
==8579== by 0x48C40AD: vrf_get (vrf.c:209)
==8579== by 0x415144: netlink_vrf_change (if_netlink.c:319)
==8579== by 0x415E90: netlink_interface (if_netlink.c:653)
==8579== by 0x41C1FC: netlink_parse_info (kernel_netlink.c:904)
==8579== by 0x4163E8: interface_lookup_netlink (if_netlink.c:760)
==8579== by 0x42BB37: zebra_ns_enable (zebra_ns.c:130)
==8579== by 0x42BC5E: zebra_ns_init (zebra_ns.c:208)
==8579== by 0x4130F4: main (main.c:401)
This can be found by: `ip link del <VRF DEVICE NAME>` then `ip link add <NAME> type vrf table X` again and
then attempting to use the vrf.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we install a new route into the kernel always use
REPLACE. Else if the route is already there it can
be translated into an append with the flags we are
using.
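A small sketch of the flag selection. The DEMO_ constants copy the values of NLM_F_REQUEST, NLM_F_REPLACE and NLM_F_CREATE from <linux/netlink.h> so the example builds without kernel headers; that the earlier request used only REQUEST and CREATE is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values mirrored from <linux/netlink.h>. */
#define DEMO_NLM_F_REQUEST 0x001
#define DEMO_NLM_F_REPLACE 0x100
#define DEMO_NLM_F_CREATE 0x400

/* Flags for a new-route request. With always_replace set, an existing
 * route is replaced rather than left to be treated as an append. */
static uint16_t demo_newroute_flags(bool always_replace)
{
	uint16_t flags = DEMO_NLM_F_REQUEST | DEMO_NLM_F_CREATE;

	if (always_replace)
		flags |= DEMO_NLM_F_REPLACE;
	return flags;
}
```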
This is especially true for the way we handle pbr
routes, as we are re-installing the same route
entry from pbr at the moment.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ensure that the next hop's VRF is used for IPv4 and IPv6 unicast routes
sourced from EVPN routes, for next hop and Router MAC tracking and
install. This way, leaked routes from other instances are handled properly.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the case of EVPN symmetric routing, the tenant VRF is associated with
a VNI that is used for routing and commonly referred to as the L3 VNI or
VRF VNI. Corresponding to this VNI is a VLAN and its associated L3 (IP)
interface (SVI). Overlay next hops (i.e., next hops for routes in the
tenant VRF) are reachable over this interface. However, in the model that
is supported in the implementation and commonly deployed, there is no
explicit Overlay IP address associated with the next hop in the tenant
VRF; the underlay IP is used if (since) the forwarding plane requires
a next hop IP. Therefore, the next hop has to be explicitly flagged as
onlink to cause any next hop reachability checks in the forwarding plane
to be skipped.
https://tools.ietf.org/html/draft-ietf-bess-evpn-prefix-advertisement
section 4.4 provides additional description of the above constructs.
Use existing mechanism to specify the nexthops as onlink when installing
these routes from bgpd to zebra and get rid of a special flag that was
introduced for EVPN-sourced routes. Also, use the onlink flag during next
hop validation in zebra and eliminate other special checks.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the case of EVPN symmetric routing, the tenant VRF is associated with
a VNI that is used for routing and commonly referred to as the L3 VNI or
VRF VNI. Corresponding to this VNI is a VLAN and its associated L3 (IP)
interface (SVI). Overlay next hops (i.e., next hops for routes in the
tenant VRF) are reachable over this interface.
https://tools.ietf.org/html/draft-ietf-bess-evpn-prefix-advertisement
section 4.4 provides additional description of the above constructs.
Use the L3 interface exchanged between zebra and bgp in route install.
This patch in conjunction with the earlier one helps to eliminate some
special code in zebra to derive the next hop's interface.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the case of EVPN symmetric routing, the tenant VRF is associated with
a VNI that is used for routing and commonly referred to as the L3 VNI or
VRF VNI. Corresponding to this VNI is a VLAN and its associated L3 (IP)
interface (SVI). Overlay next hops (i.e., next hops for routes in the
tenant VRF) are reachable over this interface.
https://tools.ietf.org/html/draft-ietf-bess-evpn-prefix-advertisement
section 4.4 provides additional description of the above constructs.
The implementation currently derives this L3 interface for EVPN tenant
routes using special code that looks at route flags. This patch
exchanges the L3 interface between zebra and bgpd as part of the L3-VNI
exchange in order to eliminate some of this special code.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>