commit: 0eb97b860d
Removed this chunk of code in zebra:
- if (ifp)
- if (connected_is_unnumbered(ifp))
- SET_FLAG(nexthop->flags, NEXTHOP_FLAG_ONLINK);
Effectively if we had a NEXTHOP_TYPE_IPV4_IFINDEX we would
auto set the onlink flag. This commit dropped it for some reason.
Add it back in an intelligent manner.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
My previous patch to fix a memory leak, caused by not properly freeing
the iptable iface list on stream parse failure, created/exposed a heap
use after free because we were not doing a deep copy
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
With recent changes to the lib nexthop_group
APIs (e1f3a8eb19), we are making
new assumptions that this should be adding a single nexthop
to a group, not a list of nexthops.
This broke the case of a recursive nexthop resolving to a group:
```
D> 2.2.2.1/32 [150/0] via 1.1.1.1 (recursive), 00:00:09
* via 1.1.1.1, dummy1 onlink, 00:00:09
via 1.1.1.2 (recursive), 00:00:09
* via 1.1.1.2, dummy2 onlink, 00:00:09
D> 3.3.3.1/32 [150/0] via 2.2.2.1 (recursive), 00:00:04
* via 1.1.1.1, dummy1 onlink, 00:00:04
K * 10.0.0.0/8 [0/1] via 172.27.227.148, tun0, 00:00:21
```
This group can instead just directly point to the nh that was passed.
Its only being used for a lookup (the memory gets copied and used
elsewhere if the nexthop is not found).
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Make the nexthop_copy/nexthop_dup APIs more consistent by
adding a secondary, non-recursive, version of them. Before,
it was inconsistent whether the APIs were expected to copy
recursive info or not. Make it clear now that the default is
recursive info is copied unless the _no_recurse() version is
called. These APIs are not heavily used so it is fine to
change them for now.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
cb86eba3ab was causing zebra to crash
when handling a nexthop group that had a nexthop which was recursively resolved.
Steps to recreate:
!
nexthop-group red
nexthop 1.1.1.1
nexthop 1.1.1.2
!
sharp install routes 8.8.8.1 nexthop-group red 1
=========================================
==11898== Invalid write of size 8
==11898== at 0x48E53B4: _nexthop_add_sorted (nexthop_group.c:254)
==11898== by 0x48E5336: nexthop_group_add_sorted (nexthop_group.c:296)
==11898== by 0x453593: handle_recursive_depend (zebra_nhg.c:481)
==11898== by 0x451CA8: zebra_nhg_find (zebra_nhg.c:572)
==11898== by 0x4530FB: zebra_nhg_find_nexthop (zebra_nhg.c:597)
==11898== by 0x4536B4: depends_find (zebra_nhg.c:1065)
==11898== by 0x453526: depends_find_add (zebra_nhg.c:1087)
==11898== by 0x451C4D: zebra_nhg_find (zebra_nhg.c:567)
==11898== by 0x4519DE: zebra_nhg_rib_find (zebra_nhg.c:1126)
==11898== by 0x452268: nexthop_active_update (zebra_nhg.c:1729)
==11898== by 0x461517: rib_process (zebra_rib.c:1049)
==11898== by 0x4610C8: process_subq_route (zebra_rib.c:1967)
==11898== Address 0x0 is not stack'd, malloc'd or (recently) free'd
Zebra crashes because we weren't handling the case of the depend nexthop
being recursive.
For this case, we cannot make the function more efficient. A nexthop
could resolve to a group of any size, thus we need allocs/frees.
To solve this and retain the goal of the original patch, we separate out the
two cases so it will still be more efficient if the nexthop is not recursive.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The only two safi's that are usable for zebra for installation
of routes into the rib are SAFI_UNICAST and SAFI_MULTICAST.
The acceptance of other safi's is causing a memory leak:
Direct leak of 56 byte(s) in 1 object(s) allocated from:
#0 0x5332f2 in calloc (/usr/lib/frr/zebra+0x5332f2)
#1 0x7f594adc29db in qcalloc /opt/build/frr/lib/memory.c:110:27
#2 0x686849 in zebra_vrf_get_table_with_table_id /opt/build/frr/zebra/zebra_vrf.c:390:11
#3 0x65a245 in rib_add_multipath /opt/build/frr/zebra/zebra_rib.c:2591:10
#4 0x7211bc in zread_route_add /opt/build/frr/zebra/zapi_msg.c:1616:8
#5 0x73063c in zserv_handle_commands /opt/build/frr/zebra/zapi_msg.c:2682:2
Collapse
Sequence of events:
Upon vrf creation there is a zvrf->table[afi][safi] data structure
that tables are auto created for. These tables only create SAFI_UNICAST
and SAFI_MULTICAST tables. Since these are the only safi types that
are zebra can actually work on. zvrf data structures also have a
zvrf->otable data structure that tracks in a RB tree other tables
that are created ( say you have routes stuck in any random table
in the 32bit route table space in linux ). This data structure is
only used if the lookup in zvrf->table[afi][safi] fails.
After creation if we pass a route down from an upper level protocol
that has non unicast or multicast safi *but* has the actual
tableid of the vrf we are in, the initial lookup will always
return NULL leaving us to look in the otable. This will create
a data structure to track this data.
If after this event you pass in a second route with the same
afi/safi/table_id, the otable will be created and attempted
to be stored, but the RB_TREE_UNIQ data structure when it sees
this will return the original otable returned and the lookup function
zebra_vrf_get_table_with_table_id will just drop the second otable.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
==25402==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x533302 in calloc (/usr/lib/frr/zebra+0x533302)
#1 0x7fee84cdc80b in qcalloc /home/qlyoung/frr/lib/memory.c:110:27
#2 0x5a3032 in create_label_chunk /home/qlyoung/frr/zebra/label_manager.c:188:3
#3 0x5a3c2b in assign_label_chunk /home/qlyoung/frr/zebra/label_manager.c:354:8
#4 0x5a2a38 in label_manager_get_chunk /home/qlyoung/frr/zebra/label_manager.c:424:9
#5 0x5a1412 in hook_call_lm_get_chunk /home/qlyoung/frr/zebra/label_manager.c:60:1
#6 0x5a1412 in lm_get_chunk_call /home/qlyoung/frr/zebra/label_manager.c:81:2
#7 0x72a234 in zread_get_label_chunk /home/qlyoung/frr/zebra/zapi_msg.c:2026:2
#8 0x72a234 in zread_label_manager_request /home/qlyoung/frr/zebra/zapi_msg.c:2073:4
#9 0x73150c in zserv_handle_commands /home/qlyoung/frr/zebra/zapi_msg.c:2688:2
When creating label chunk that has a specified base, we eventually are
calling assign_specific_label_chunk. This function finds the appropriate
list node and deletes it from the lbl_mgr.lc_list but since
the function uses list_delete_node() the deletion function that is
specified for lbl_mgr.lc_list is not called thus dropping the memory.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The vrrpd one conflicts with the standalone vrrpd package; also we're
installing daemons to /usr/lib/frr on some systems so they're not on
PATH.
Signed-off-by: David Lamparter <equinox@diac24.net>
Previous patches introduced various issues:
- Removal of stream_free() to fix double free caused memleak
- Patch for memleak was incomplete
This should fix it hopefully.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The existing usage of the rta_nest and addattr_nest
functions were not adding the NLA_F_NESTED flag
to the type. As such the new nexthop functionality was
actually looking for this flag, while apparently older
code did not.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add error handling for top level failures (not able to
execute command, unable to find vrf for command, etc.)
With this error handling we add a new zapi message type
of ZEBRA_ERROR used when we are unable to properly handle
a zapi command and pass it down into the lower level code.
In the event of this, we reply with a message of type
enum zebra_error_types containing the error type.
The sent packet will look like so:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Length | Marker | Version |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| VRF ID |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Command |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ERROR TYPE |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Also add appropriate hooks for clients to subscribe to for
handling these types of errors.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
There's confusion between the nexthop-group configuration and a
zebra-specific show command. For now, make the zebra show
command string RIB-specific until we're able to unify these
paths.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
=================================================================
==3058==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000010 (pc 0x7f5bf3ef7477 bp 0x7ffdfaa20d40 sp 0x7ffdfaa204c8 T0)
==3058==The signal is caused by a READ memory access.
==3058==Hint: address points to the zero page.
#0 0x7f5bf3ef7476 in memcpy /build/glibc-OTsEL5/glibc-2.27/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:134
#1 0x4d158a in __asan_memcpy (/usr/lib/frr/zebra+0x4d158a)
#2 0x7f5bf58da8ad in stream_put /home/qlyoung/frr/lib/stream.c:605:3
#3 0x67d428 in zsend_ipset_entry_notify_owner /home/qlyoung/frr/zebra/zapi_msg.c:851:2
#4 0x5c70b3 in zebra_pbr_add_ipset_entry /home/qlyoung/frr/zebra/zebra_pbr.c
#5 0x68e1bb in zread_ipset_entry /home/qlyoung/frr/zebra/zapi_msg.c:2465:4
#6 0x68f958 in zserv_handle_commands /home/qlyoung/frr/zebra/zapi_msg.c:2611:3
#7 0x55666d in main /home/qlyoung/frr/zebra/main.c:309:2
#8 0x7f5bf3e5db96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
#9 0x4311d9 in _start (/usr/lib/frr/zebra+0x4311d9)
the ipset->backpointer was NULL as that the hash lookup failed to find
anything. Prevent this crash from happening.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The decoding of _add and _del functions is practically identical
do a bit of work and make them so.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
=================================================================
==13611==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffe9e5c8694 at pc 0x0000004d18ac bp 0x7ffe9e5c8330 sp 0x7ffe9e5c7ae0
WRITE of size 17 at 0x7ffe9e5c8694 thread T0
#0 0x4d18ab in __asan_memcpy (/usr/lib/frr/zebra+0x4d18ab)
#1 0x7f16f04bd97f in stream_get2 /home/qlyoung/frr/lib/stream.c:277:2
#2 0x6410ec in zebra_vxlan_remote_macip_del /home/qlyoung/frr/zebra/zebra_vxlan.c:7718:4
#3 0x68fa98 in zserv_handle_commands /home/qlyoung/frr/zebra/zapi_msg.c:2611:3
#4 0x556add in main /home/qlyoung/frr/zebra/main.c:309:2
#5 0x7f16eea3bb96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
#6 0x431249 in _start (/usr/lib/frr/zebra+0x431249)
This decode is the result of a buffer overflow because we are
not checking ipa_len.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The linux kernel will occassionally send RTM_GETNEIGH when
it expects user space to help in resolution of an ARP entry.
See linux kernel commit:
commit 3e25c65ed085b361cc91a8f02e028f1158c9f255
Author: Tim Gardner <tim.gardner@canonical.com>
Date: Thu Aug 29 06:38:47 2013 -0600
net: neighbour: Remove CONFIG_ARPD
Since we don't care about this, let's just safely ignore this
message for the moment. I imagine in the future we might
care when we implement neighbor managment in the system.
Reported By: Stefan Priebe <s.priebe@profihost.ag>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There may be logic to prevent this ever happening earlier in the network
read path, but it doesn't hurt to double check it here, because clearly
deeper paths rely on this being the case.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Whatever this BFD re-transmission function is had a few problems.
1. Used memcpy instead of the (more concise) stream APIs, which include
bounds checking.
2. Did not sufficiently check packet sizes.
Actually, 2) is mitigated but is still a problem, because the BFD header
is 2 bytes larger than the "normal" ZAPI header, while the overall
message size remains the same. So if the source message being duplicated
is actually right up against the ZAPI_MAX_PACKET_SIZ, you still can't
fit the whole message into your duplicated message. I have no idea what
the intent was here but at least there's a warning if it happens now.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
- Fix iptable freeing code to free malloc'd list
- malloc iptable in zapi handler and use those functions to free it when
done to fix a linked list memleak
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
We copy a fixed length buffer from the wire but don't ensure it is null
terminated. Then print it as a c-string. Lul.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
further down we hash the src & dst ip, which asserts that the afi is one
of the well known ones, given the field names i assume the correct afis
here are af_inet[6]
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
zebra can catch the kernel's route deletion by netlink.
but current FRR can't delete kernel-route on vrf(l3mdev)
when kernel operator delete the route on out-side of FRR.
It looks problem about kernel-route deletion.
This problem is caused around _nexthop_cmp_no_labels(nh1,nh2)
that checks the each nexthop's member 'vrf_id'.
And _nexthop_cmp_no_labels's caller doesn't set the vrf_id
of nexthop structure. This commit fix that case.
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
router-id is buried deep in "show running-config", this new
command makes it easy to retrieve the user configured router-id.
Example:
# configure terminal
(config)# router-id 1.2.3.4
(config)# end
# show router-id
router-id 1.2.3.4
# configure terminal
(config)# no router-id 1.2.3.4
(config)# end
# show router-id
#
Signed-off-by: Jafar Al-Gharaibeh <jafar@atcorp.com>
We were not setting the RTNH_F_ONLINK flag where appropriate
when creating nexthop objects in the kernel.
Set it on the nhmsg.nh_flags netlink message.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When we are doing a lookup on an individual nexthop,
we should still be passing along the type that gets passed
via the arguments. Otherwise, we will always think we own that
NHE when in reality anyone could have put that into the
kernel.
Before this patch, nexthops in the kernel will get swepped
out even if we didn't create them.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We should be NULL checking the entire re->nhe struct, not
the group inside of it. When we get routes from the kernel
using a nexthop group (and future protocols) they will only
pass us an ID to use. Hence, this struct can (and will be)
NULL on first attach when only passed an ID.
There shouldn't be a situation where we have an re->nhe
and don't have an re->nhe->nhg anyway.
Before this patch you can easily make zebra crash by creating a
route in the kernel using a nexthop group and starting zebra.
`ip next add dev lo id 111`
`ip route add 1.1.1.1/32 nhid 111`
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Older versions of protobuf-c do not support version 3 of the
protocol. Add a check into the system to see if we have
version 3 available and if so, compile it in.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If you compile FRR with no j factor zebra_mlag.c fails to
build because the vtysh extraction methodology runs first
before the protobuf compiler runs and that compilation does
not have the proper dependancy chain built for the inclusions
that zebra_mlag.c had. Moving the DEF* code into a zebra_mlag_vty.c
which can be included in the vtysh extraction code and has
no mlag.proto dependancies makes the compilation work better.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Handle the special case where a route update contains
no installed nexthops - that means the route is not
installed.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
This is pretty much just to get rid of the HAVE_CUMULUS. The
hook/module API is as "wtf" as it was before...
Signed-off-by: David Lamparter <equinox@diac24.net>
Add an api that creates a copy of a list of nexthops and
enforces the canonical sort ordering; consolidate some nhg
code to avoid copy-and-paste. The zebra dplane uses
that api when a plugin sets up a list of nexthops, ensuring
that the plugin's list is ordered when it's processed in
zebra.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The processing of dataplane route notifications was a little
off-target after the nexthop-group re-work. This should allow
notifications to work better.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Linux has the idea of allowing a weight to be sent
down as part of a nexthop group to allow the kernel
to weight particular nexthop paths a bit more or less
than others.
See:
http://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.rpdb.multiple-links.html
Allow for installation into the kernel using the weight attribute
associated with the nexthop.
This code is foundational in that it just sets up the ability
to do this, we do not use it yet. Further commits will
allow for the pass through of this data from upper level protocols.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Use a per-nexthop flag to indicate the presence of labels; add
some utility zapi encode/decode apis for nexthops; use the zapi
apis more consistently.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Use correct state/flags when installing EVPN macs; when we
converted from raw netlink to the zebra dataplane, a state value
got lost.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The flags can be important - like "threaded" - so we need to
actually capture them when plugins are registered.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Replace the existing list of nexthops (via a nexthop_group
struct) in the route_entry with a direct pointer to zebra's
new shared group (from zebra_nhg.h). This allows more
direct access to that shared group and the info it carries.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Problem reported by testing agency that RFC4861 section 6.2.5
states that a router should send an RA with a lifetime of 0
before ceasing to send RAs on the interface, or when the interace
is shutdown, or the router is shutdown. This fix adds that capability.
Ticket: CM-27061
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
For SR-TE we'll need to create Binding-SIDs which are essentially
LSPs that can push multiple outgoing labels. This commit sets the
groundwork for that. Luckily the netlink code didn't need to be
changed since it already supports pushing label stacks.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The openfabric daemon has a longer name than anticipated for
`show zebra client summary` adjust to allow it to fit without
making columns all blomped.
Before:
robot# show zebra client summ
Name Connect Time Last Read Last Write IPv4 Routes IPv6 Routes
--------------------------------------------------------------------------------
static 00:00:06 00:00:06 00:00:06 4/0 0/0
openfabric 00:00:06 00:00:06 00:00:06 0/0 0/0
After:
[sharpd@robot frr4]$ vtysh -c "show zebra client summ"
Name Connect Time Last Read Last Write IPv4 Routes IPv6 Routes
--------------------------------------------------------------------------------
static 00:02:16 00:02:16 00:02:16 4/0 0/0
openfabric 00:02:16 00:02:16 00:02:16 0/0 0/0
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem reported by testing facility that our sending of Router
Advertisements more frequently than once very three seconds is not
compliant with rfc4861. Added a knob to turn off fast retransmits
in order to meet the requirement of the RFC.
Ticket: CM-27063
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Macvlan down event have sentinel check of its parent
link presence.
Ticket:CM-26622
Reviewed By:CCR-9326
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
"show vrf vni" and "show evpn vni <l3vni>" commands
need to display correct router mac value.
"show evpn vni <l3vni>" detail l3vni needs to display
system mac as in PIP scenario value can be different.
Syste MAC would be derived from SVI interface MAC wherelse
Router MAC would be derived from macvlan interface MAC value.
Ticket:CM-26710
Reviewed By:CCR-9334
Testing Done:
TORC11# show evpn vni 4001
VNI: 4001
Type: L3
Tenant VRF: vrf1
Local Vtep Ip: 36.0.0.11
Vxlan-Intf: vx-4001
SVI-If: vlan4001
State: Up
VNI Filter: none
System MAC: 00:02:00:00:00:2e
Router MAC: 44:38:39:ff:ff:01
L2 VNIs: 1000
TORC11# show vrf vni
VRF VNI VxLAN IF L3-SVI State Rmac
vrf1 4001 vx-4001 vlan4001 Up 44:38:39:ff:ff:01
TORC11# show evpn vni 4001 json
{
"vni":4001,
"type":"L3",
"localVtepIp":"36.0.0.11",
"vxlanIntf":"vx-4001",
"sviIntf":"vlan4001",
"state":"Up",
"vrf":"vrf1",
"sysMac":"00:02:00:00:00:2e",
"routerMac":"44:38:39:ff:ff:01",
"vniFilter":"none",
"l2Vnis":[
1000,
]
}
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
macvlan interface up/down event triggers
bgp to send updates for evpn routes
with changed RMAC and nexthop IP values.
Ticket:CM-26190
Reviewed By:
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
By default announct Self Type-2 routes with
system IP as nexthop and system MAC as
nexthop.
An API to check type-2 is self route via
checking ipv4/ipv6 address from connected interfaces list.
An API to extract RMAC and nexthop for type-2
routes based on advertise-svi-ip knob is enabled.
When advertise-pip is enabled/disabled, trigger type-2
route update. For self type-2 routes to use
anycast or individual (rmac, nexthop) addresses.
Ticket:CM-26190
Reviewed By:
Testing Done:
Enable 'advertise-svi-ip' knob in bgp default instance.
the vrf instance svi ip is advertised with nexthop
as default instance router-id and RMAC as system MAC.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Extract mac-vlan interface mac when a l3vni add is sent to bgp
Per L3VNI maintain vrr interface.
An api to extract vrr mac address from a vlan id, associated
master svi device.
When a l3vni operational up event is sent to bgpd,
extract vrr rmac along with svi rmac.
Ticket:CM-26190
Reviewed By:
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Zebra MLAG is using "t_read" for multiple tasks, such as
1. For opening Communication channel with MLAG
2. In case conncetion fails, same event is used for retries
3. after the connection establishment, same event is used to
read the data from MLAG
since all these taks will never schedule together, this will not
cause any issues.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
edge-2> show evpn vni detail json
{
"vni":79031,
"type":"L3",
...,
...
} <<<<<< no comma
{
"vni":79021,
"type":"L3",
...,
...
} <<<<<< no comma
{
} <<<<<< blank
edge-2>
The fix is to pack json info into json_array before printing it.
Signed-off-by: Lakshman Krishnamoorthy <lkrishnamoor@vmware.com>
Apparently the multipath_num functionatlity has been broken
for a while because we were ignoring the recusive nexthops
when marking them inactive based on it.
This sets them as inactive as well if the parent breaks it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were re-counting the entire group's active number on
every iteration of this nexthop_active_update() loop.
This is not great from a performance perspective but also
it was failing to properly mark things according to the
specified multipath_num.
Since a nexthop is set as active before this check, if its == to
the set ecmp, it gets marked inactive even though if its
under the max ecmp wanted!
ex)
set ecmp to 1.
`/usr/lib/frr/zebra -e 1`
All kernel routes will be marked inactive even with just one nexthop!
K 1.1.1.1/32 [0/0] is directly connected, dummy1 inactive, 00:00:10
K 1.1.1.2/32 [0/0] is directly connected, dummy2 inactive, 00:00:10
K 1.1.1.3/32 [0/0] is directly connected, dummy3 inactive, 00:00:10
K 1.1.1.4/32 [0/0] is directly connected, dummy4 inactive, 00:00:10
K 1.1.1.5/32 [0/0] is directly connected, dummy5 inactive, 00:00:10
K 1.1.1.6/32 [0/0] is directly connected, dummy6 inactive, 00:00:10
K 1.1.1.7/32 [0/0] is directly connected, dummy7 inactive, 00:00:10
K 1.1.1.8/32 [0/0] is directly connected, dummy8 inactive, 00:00:10
K 1.1.1.9/32 [0/0] is directly connected, dummy9 inactive, 00:00:10
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Clean up the relationships between zebra's rib and nexthop-group
headers as prep for adding a nexthop-group pointer to the
route_entry.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
On BSD systems null routes were not being installed into the
kernel. This is because commit 08ea27d112
introduced a bug where we were attempting to use the wrong
prefix afi types and as such we were going down the v6 code path.
test27.lab.netdef.org# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
K>* 0.0.0.0/0 [0/0] via 192.168.122.1, 00:00:23
S>* 4.5.6.8/32 [1/0] unreachable (blackhole), 00:00:11
C>* 192.168.122.0/24 [0/1] is directly connected, vtnet0, 00:00:23
test27.lab.netdef.org# exit
[ci@test27 ~/frr]$ netstat -rn
Routing tables
Internet:
Destination Gateway Flags Netif Expire
default 192.168.122.1 UGS vtnet0
4.5.6.8/32 127.0.0.1 UG1B lo0
127.0.0.1 link#2 UH lo0
192.168.122.0/24 link#1 U vtnet0
192.168.122.108 link#1 UHS lo0
Fixes: #4843
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The code for when a new vrf is created to properly handle
router advertisement for it is messed up in several ways:
1) Generation of the zrouter data structure should set the rtadv
socket to -1 so that we don't accidently close someone elses
open file descriptor
2) When you created a new zvrf instance *after* bootup we are XCALLOC'ing
the data structure so the zvrf->fd was 0. The shutdown code was looking
for the >= 0 to know if the fd existed (since fd 0 is valid!)
This sequence of events would cause zebra to consume 100% of the
cpu:
Run zebra by itself ( no other programs )
ip link add vrf1 type vrf table 1003
ip link del vrf vrf1
vtysh -c "configure" -c "no interface vrf1"
This commit fixes this issue.
Fixes: #5376
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we shut down zebra, we were not doing anything to shut
down the FPM. Perform the necessary occult rituals and
stop the threads from running during early shutdown.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We should be setting the ns->info pointer to NULL when we free
what it points to. Just use XFREE directly on the void * pointer
to do this.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were not connecting the default zebra_ns to the default
ns->info at namespace initialization in zebra. Thus, when
we tried to use the `ns_walk_func()` it would ignore the
default zebra_ns since there is no pointer to it from the
ns struct.
Fix this by connecting them in `zebra_ns_init()` and,
if the default ns is not found, exit with failure
since this is not recoverable.
This was found during a crash where we fail to cancel the kernel_read
thread at termination (via the `ns_walk_func()`) and then we
get a netlink notification trying to use the zns struct that has
already been freed.
```
(gdb) bt
\#0 0x00007fc1134dc7bb in raise () from /lib/x86_64-linux-gnu/libc.so.6
\#1 0x00007fc1134c7535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
\#2 0x00007fc113996f8f in core_handler (signo=11, siginfo=0x7ffe5429d070, context=<optimized out>) at lib/sigevent.c:254
\#3 <signal handler called>
\#4 0x0000561880e15449 in if_lookup_by_index_per_ns (ns=0x0, ifindex=174) at zebra/interface.c:269
\#5 0x0000561880e1642c in if_up (ifp=ifp@entry=0x561883076c50) at zebra/interface.c:1043
\#6 0x0000561880e10723 in netlink_link_change (h=0x7ffe5429d8f0, ns_id=<optimized out>, startup=<optimized out>) at zebra/if_netlink.c:1384
\#7 0x0000561880e17e68 in netlink_parse_info (filter=filter@entry=0x561880e17680 <netlink_information_fetch>, nl=nl@entry=0x561882497238, zns=zns@entry=0x7ffe542a5940,
count=count@entry=5, startup=startup@entry=0) at zebra/kernel_netlink.c:932
\#8 0x0000561880e186a5 in kernel_read (thread=<optimized out>) at zebra/kernel_netlink.c:406
\#9 0x00007fc1139a4416 in thread_call (thread=thread@entry=0x7ffe542a5b70) at lib/thread.c:1599
\#10 0x00007fc113974ef8 in frr_run (master=0x5618823c9510) at lib/libfrr.c:1024
\#11 0x0000561880e0b916 in main (argc=8, argv=0x7ffe542a5f78) at zebra/main.c:483
```
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
1. add the Mlag ProtoBuf Lib to Zebra Compilation
2. Encode the messages with protobuf before writing to MLAG
3. Decode the MLAG Messages using protobuf and write to clients
based on their subscrption.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
This includes:
1. Processing client Registrations for MLAG
2. storing client Interests for MLAG updates
3. Opening communication channel to MLAG with First client reg
4. Closing Communication channel with last client De-reg
5. Spawning a new thread for handling MLAG updates peocessing
6. adding Test code
7. advertising MLAG Updates to clients based on their interests
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
This code is called from the zebra main pthread during shutdown
but the thread event is scheduled via the zebra dplane pthread.
Hence, we should be using the `thread_cancel_async()` API to
cancel the thread event on a different pthread.
This is only ever hit in the rare case that we still have work left
to do on the update queue during shutdown.
Found via zebra crash:
```
(gdb) bt
\#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
\#1 0x00007f4e4d3f7535 in __GI_abort () at abort.c:79
\#2 0x00007f4e4d3f740f in __assert_fail_base (fmt=0x7f4e4d559ee0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7f4e4d9071d0 "master->owner == pthread_self()",
file=0x7f4e4d906cf8 "lib/thread.c", line=1185, function=<optimized out>) at assert.c:92
\#3 0x00007f4e4d405102 in __GI___assert_fail (assertion=assertion@entry=0x7f4e4d9071d0 "master->owner == pthread_self()", file=file@entry=0x7f4e4d906cf8 "lib/thread.c",
line=line@entry=1185, function=function@entry=0x7f4e4d906b68 <__PRETTY_FUNCTION__.15817> "thread_cancel") at assert.c:101
\#4 0x00007f4e4d8d095a in thread_cancel (thread=0x55b40d01a640) at lib/thread.c:1185
\#5 0x000055b40c291845 in zebra_dplane_shutdown () at zebra/zebra_dplane.c:3274
\#6 0x000055b40c27ee13 in zebra_finalize (dummy=<optimized out>) at zebra/main.c:202
\#7 0x00007f4e4d8d1416 in thread_call (thread=thread@entry=0x7ffcbbc08870) at lib/thread.c:1599
\#8 0x00007f4e4d8a1ef8 in frr_run (master=0x55b40ce35510) at lib/libfrr.c:1024
\#9 0x000055b40c270916 in main (argc=8, argv=0x7ffcbbc08c78) at zebra/main.c:483
(gdb) down
\#4 0x00007f4e4d8d095a in thread_cancel (thread=0x55b40d01a640) at lib/thread.c:1185
1185 assert(master->owner == pthread_self());
(gdb)
```
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Put the code to free the data held by a nhg_ctx
in nhg_ctx_free() as well. We do it similiarly for
the dplane_ctx.
Let nhg_ctx_fini() be any other routines that need to
be handled before freeing.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
SA warned us lookup could be NULL dereferenced in some
paths. Handle the case where we are passed a NULL
nexthop before we try to copy it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were only checking that two nhg_hash_entry's were equal
based on the active nexthop NUMBER. This is not sufficient in
special cases where whats active with one route using it,
might not be active with the other. We can see this with
routes trying to resolve to themselves.
Ex)
1.1.1.0/24
-> 1.1.1.1 dummy1 (inactive)
-> 1.1.1.2 dummy2
1.1.2.0/24
-> 1.1.1.1 dummy1
-> 1.1.1.2 dummy1 (inactive)
Without checking each nexthop individually, they will
hash to the same group since they have the same number of
active nexthops.
Fix this by looping over every nexthop for each nhe (they should
be sorted) and checking if the NEXTHOP_FLAG_ACTIVE flag's match.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We cannot clear the NEXTHOP_FLAG_FIB nexthop flag
when sending routes to the dataplane anymore since
nexthops are now shared.
We were seeing a situation where if we delete a route
using a nexthop group that is still active with another
route, the fib flag was being unset by this code
path despite them still being valid fib nexthops with the
other route.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were crashing due to a missed label change code path
in mpls_ftn_uninstall() with the zebra_nhg hashing code.
Add a static handler function for label changing everywhere
in that code and use it in mpls_ftn_uninstall().
The crash was found in the ISIS-SR tests:
==23== Thread 1:
==23== Invalid read of size 4
==23== at 0x15B20E: zebra_nhg_hash_equal (zebra_nhg.c:365)
==23== by 0x489A2FD: hash_get (hash.c:143)
==23== by 0x489A4BC: hash_lookup (hash.c:183)
==23== by 0x15B5A3: zebra_nhg_find (zebra_nhg.c:494)
==23== by 0x15C536: zebra_nhg_rib_find (zebra_nhg.c:1070)
==23== by 0x1573E8: mpls_ftn_update (zebra_mpls.c:2661)
==23== by 0x1A2554: zread_mpls_labels_replace (zapi_msg.c:1890)
==23== by 0x1A41CD: zserv_handle_commands (zapi_msg.c:2613)
==23== by 0x199B17: zserv_process_messages (zserv.c:517)
==23== by 0x48EE6B7: thread_call (thread.c:1549)
==23== by 0x48A8AD5: frr_run (libfrr.c:1064)
==23== by 0x1391B7: main (main.c:468)
==23== Address 0x5839330 is 0 bytes inside a block of size 80 free'd
==23== at 0x48369AB: free (vg_replace_malloc.c:530)
==23== by 0x48AEE6C: qfree (memory.c:129)
==23== by 0x15C5F8: zebra_nhg_free (zebra_nhg.c:1095)
==23== by 0x15BC8C: zebra_nhg_handle_uninstall (zebra_nhg.c:734)
==23== by 0x15DCFA: zebra_nhg_uninstall_kernel (zebra_nhg.c:1826)
==23== by 0x15C666: zebra_nhg_decrement_ref (zebra_nhg.c:1106)
==23== by 0x15D9D7: zebra_nhg_re_update_ref (zebra_nhg.c:1711)
==23== by 0x15D8B1: nexthop_active_update (zebra_nhg.c:1660)
==23== by 0x167072: rib_process (zebra_rib.c:1154)
==23== by 0x168D72: process_subq_route (zebra_rib.c:2039)
==23== by 0x168E92: process_subq (zebra_rib.c:2078)
==23== by 0x168F5B: meta_queue_process (zebra_rib.c:2112)
==23== Block was alloc'd at
==23== at 0x4837B65: calloc (vg_replace_malloc.c:752)
==23== by 0x48AED56: qcalloc (memory.c:110)
==23== by 0x15B07B: zebra_nhg_copy (zebra_nhg.c:307)
==23== by 0x15B13E: zebra_nhg_hash_alloc (zebra_nhg.c:329)
==23== by 0x489A339: hash_get (hash.c:148)
==23== by 0x15B6CA: zebra_nhg_find (zebra_nhg.c:532)
==23== by 0x15C536: zebra_nhg_rib_find (zebra_nhg.c:1070)
==23== by 0x15D89A: nexthop_active_update (zebra_nhg.c:1658)
==23== by 0x167072: rib_process (zebra_rib.c:1154)
==23== by 0x168D72: process_subq_route (zebra_rib.c:2039)
==23== by 0x168E92: process_subq (zebra_rib.c:2078)
==23== by 0x168F5B: meta_queue_process (zebra_rib.c:2112)
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
This reverts commit 7d5bb02b1a.
Allow zebra to actually maintain the nexthop group in the
linux kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In symmetric routing case for evpn (type-2/type-5)
routes, nexthop and Router mac fields can change from
the originating VTEP.
At the receiving VTEP, bgp path info may points to different
nexthop IP (nh1->nh2) and Remote MAC remain the same.
When the bgp sync the route with nexthop and RMAC fields.
For the exisitng rmac entry update/replace with the new
nexthop (VTEP) IP in remote rmac db in Zebra.
Similarly, bgp path info may points different Router-mac
(RMAC1->RMAC2) and the nexthop value remains the same.
In this case, update to the new RMAC value for the
existing remote nexthop in the Zebra' nexthop cache db.
Ticket:CM-26917
Reviewed By:CCR-9435
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
We were creating `other` tables in rib_del(), vty commands, and
dataplane return callback via the zebra_vrf_table_with_table_id()
API.
Seperate the API into only a lookup, never create
and added another with `get` in the name (following the standard
we use in other table APIs).
Then changed the rib_del(), rib_find_rn_from_ctx(), and show route
summary vty command to use the lookup API instead.
This was found via a crash where two different vrfs though they owned
the table. On delete, one free'd all the nodes, and then the other tried
to use them. It required specific timing of a VRF existing, going away,
and coming back again to cause the crash.
=23464== Invalid read of size 8
==23464== at 0x179EA4: rib_dest_from_rnode (rib.h:433)
==23464== by 0x17ACB1: zebra_vrf_delete (zebra_vrf.c:253)
==23464== by 0x48F3D45: vrf_delete (vrf.c:243)
==23464== by 0x48F4468: vrf_terminate (vrf.c:532)
==23464== by 0x13D8C5: sigint (main.c:172)
==23464== by 0x48DD25C: quagga_sigevent_process (sigevent.c:105)
==23464== by 0x48F0502: thread_fetch (thread.c:1417)
==23464== by 0x48AC82B: frr_run (libfrr.c:1023)
==23464== by 0x13DD02: main (main.c:483)
==23464== Address 0x5152788 is 104 bytes inside a block of size 112 free'd
==23464== at 0x48369AB: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23464== by 0x48B25B8: qfree (memory.c:129)
==23464== by 0x48EA335: route_node_destroy (table.c:500)
==23464== by 0x48E967F: route_node_free (table.c:90)
==23464== by 0x48E9742: route_table_free (table.c:124)
==23464== by 0x48E9599: route_table_finish (table.c:60)
==23464== by 0x170CEA: zebra_router_free_table (zebra_router.c:165)
==23464== by 0x170DB4: zebra_router_release_table (zebra_router.c:188)
==23464== by 0x17AAD2: zebra_vrf_disable (zebra_vrf.c:222)
==23464== by 0x48F3F0C: vrf_disable (vrf.c:313)
==23464== by 0x48F3CCF: vrf_delete (vrf.c:223)
==23464== by 0x48F4468: vrf_terminate (vrf.c:532)
==23464== Block was alloc'd at
==23464== at 0x4837B65: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23464== by 0x48B24A2: qcalloc (memory.c:110)
==23464== by 0x48EA2FE: route_node_create (table.c:488)
==23464== by 0x48E95C7: route_node_new (table.c:66)
==23464== by 0x48E95E5: route_node_set (table.c:75)
==23464== by 0x48E9EA9: route_node_get (table.c:326)
==23464== by 0x48E1EDB: srcdest_rnode_get (srcdest_table.c:244)
==23464== by 0x16EA4B: rib_add_multipath (zebra_rib.c:2730)
==23464== by 0x1A5310: zread_route_add (zapi_msg.c:1592)
==23464== by 0x1A7B8E: zserv_handle_commands (zapi_msg.c:2579)
==23464== by 0x19D689: zserv_process_messages (zserv.c:523)
==23464== by 0x48F09F8: thread_call (thread.c:1599)
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a dataplane plugin module as a sample or reference for
folks who might like to integrate with the zebra dataplane
subsystem. This isn't part of the FRR build or product; there
are some simple build and load-at-runtime instructions in
comments in the file.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The zvni_map_to_svi function may return NULL as such prevent
a deref and crash. Found via coverity
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Fix 2 Coverity issues:
1) zebra_nhg.c -> all paths in nhg_ctx_process_finish have
already deref'ed the ctx pointer no need for a test of it
2) the **ifp pointer passed in may be NULL. Prevent an accidental
deref if calling function does not pass in a ifp pointer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Checkpatch was complaining because this code was extending
beyond 80 characters on a couple lines. Adjusted a conditional
tree to fix that.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a private header file for functions that are internal/special
case like how we do it for `lib/nexthop_group_private.h`.
Remove a bunch of functions from the header file only being used
statically and add some comments for those remaining to indicate
better what their use is.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Re-work the validity setting and checking APIs
for nhg_hash_entry's to make them clearer.
Further, they were originally only beings set
on ifdown and install. Extended their use into
releasing entries and to account for setting
the validity of a recursive dependent.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The commenting for why we would need to requeue a
group from the kernel to be later processed was not
sufficient. Add a better explanation for the flow
and state of the system.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Change the wording of the flag indicating we have received
a nexthop group from the kernel with a different ID but
is fundamentally identical to one we already have.
It was colliding with a flag of similar name in the nexthop struct.
Change it from NEXTHOP_GROUP_DUPLICATE -> NEXTHOP_GROUP_UNHASHABLE
since it is in fact unhashable.
Also change the wording of functions and comments referencing the same
problem.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When determining whether to set the nhg_hash_entry as
invalid, we should have been checking the depends, not
the dependents. If its a group and at least one of its
depends is valid, the group is still valid.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Guard against an overflow read when processing
nexthop groups from netlink. Add a check to ensure
we don't try to write passed the array size.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Now with this patch we can't use shutdown for cleanup:
```
commit 2fc69f03d2 (pr_5079)
Author: Mark Stapp <mjs@voltanet.io>
Date: Fri Sep 27 12:15:34 2019 -0400
zebra: during shutdown processing, drop dplane results
Don't process dataplane results in zebra during shutdown (after
sigint has been seen). The dplane continues to run in order to
clean up, but zebra main just drops results.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
```
Adjusted nhg uninstall handling to clear data and other
cleanup before sending to the dataplane.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a comment to the header of `zebra_nhg.c` to point the reader
to where the hashtables containing the nhg entries are held.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Reduce the api for deleting nexthops and the containing
group to just one call rather than having a special case
and handling it separately.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
With the new nexthop group shared memory framework, pointers
are being used in route_entry for the nexthop_group. Update
the use of this in `mpls_ftn_uninstall()` to reflect the change.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
If the vrf lookup fails, use the default namespace
to find/delete the nexthop group from the kernel because it
should be there anyway.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Check both the nhg and nexthop are not NULL before passing
them to be hashed. Clang SA caught this.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some boilerplate for nexthop installation for bsd kernels.
They do not support nexthop objects for now so its just boilerplate.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When moving the nexthop group in a route entry to be a pointer,
we missed one wrapped in a `ifndef` for when the kernel doesn't
have netlink.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We do not need to check that the nexthop is installed or queued
when sending a route deletion since we only need to the prefix for it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
In lieu of the fact that we probably shouldn't change show
command output too much, changing this to only give nhe_id
output when the user explicitly asks for it. Probably only
going to be used for debugging for now anyway.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Only used the afi passed into `zebra_nhg_find()` for nexthops
that are blackhole/ifindex. Others should use the type actually declared
in the nexthop struct itself.
Basically, nexthop objects of type blackhole/ifindex in the kernel must
have an address family, they cannot be ambigious and be shared.
This is some requirement in the linux ip core code.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a mechanism to requeue groups we receive from the
kernel if the IDs are in a weird order (Group ID is lower
than individual nexthop IDs for example).
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
If we get a nexthop group from the kernel with labels
and queue it as a context to process later, we have to
free the label stack we allocated.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some getters for the nhg_ctx struct. Probably unnecessary
at this point since they are all static but if they ever become
public it will be nice to have them.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add code for handling nexthop group hash entry encaps
and sending them to the kernel. Add some more debugging
information for the encaps and groups in general.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
There was some code copypasta for mpls stack building in the
netlink install path. Reduced that to a common function.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When querying for detailed route information, show the nexthop
group id for its nh_hash_entry in the output before listing the
nexthops.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some more detailed output to `show nexthop-group`.
It closely resembles the output of `show ip routes`.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Optimize the fib and notified nexthop group comparison algorithm
to assume ordering. There were some pretty serious performance hits with
this on high ecmp routes.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Update nhg_hash_entry to use the non-recursive version of
nexthop_group_equal() since it doesn't really need to compare all
of those.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were waiting until install time to mark nexthops as duplicate.
Since they are immutable now and re-used, move this marking into
when they are actually created to save a bunch of cycles.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Before checking the equivalence of the whole group itself,
check to see if they contain the same number of non-recursive
active nexthops. This should shorten lookup time for the case of
non-resolved nexthop group creation.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Create any depends only after the initial hash lookup
fails. Should reduce hashing cpu cycles significantly.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When we receive a route delete from the kernel and it
contains a nexthop object id, use that to match against
route gateways with instead of explicit nexthops.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Move the supports_nh bool indicating whether the kernel we are
using supports nexthop objects into the netlink kernel interface
itself. Since only linux and netlink support nexthop object APIs
for now this is fine.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add handling for delete/update nexthop object messages from the
kernel.
If someone deletes a nexthop object we are still using, send it back
down. If the someone updates a nexthop we are using, replace that nexthop
with ours. Routes are referencing this nexthop object ID and we resolved
it ourselves, so we should force the other `someone` to submit to our
will.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
On restart, if we failed to remove any nexthop objects due
to a kill -9 or such event, sweep them if we aren't using them.
Add a proto field to handle this and remove the is_kernel bool.
Add a dupicate flag that indicates this nexthop group is only
present in our ID hashtable. It is a dupicate nexthop we received
from the kernel, therefore we cannot hash on it.
Make the idcounter globally accessible so that kernel updates
increment it as soon as we receive them, not when we handle them.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Give all nhg_hash_entrys we install into the kernel
as nexthop objects a defined proto matching the zebra
rib table one. This makes sense since nhe's are proto-independent
and determined exclusively in zebra.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The kernel does not allow duplicate IDs in the same group, but
we are perfectly find with it internally if two different
nexthops resolve the the same nexthop (default route for instance).
So, we have to handle this when we get ready to install.
Further, pass the max group size in the arguments to ensure we
don't overflow. Don't actually think this is possible due to
multipath checking in nexthop_active_update() but better to be
safe.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We need to handle refcnt differently if we ever start making
upper level protocols aware of nhg_hash_entry IDs.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
If the nhe was successfully installed, make sure its marked
as valid. Not fully sure how/where the valid flag is going to
be used yet.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were setting a group to be recursive if its first depend
was. This is not the case; individual depends of the group
might be recursive but the group itself is not.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Move the resolving and installing of a single nhg_hash_entry
into the install function itself, rather than letting zebra_rib
handle it.
Further, ensure depends are installed/queued before installing
a group. The ordering should be find here since only one thread
will call this API.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Move the installation of an nhe out of nexthop_active_update()
and into the rib install path. So, only install the nhe when
a route using it is being installed.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Before we install a route, we verify that the nhg_hash_entry is installed.
Allow the nhe to be queued as well and still pass the route
install along.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Switch the nhg_connected tree structures to use the new
RB tree API in `lib/typerb.h`. We were using the openbsd-tree
implementation before.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were not setting the NEXTHOP_GROUP_RECURSIVE flag via
the rib find path. Adding a check and set after successful
creation of a new nhg_hash_entry.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When going through the zebra_nhg_rib_find(), we now handle the
case of if that nexthop has been recursively resolved. A depend
is created and passed along to zebra_nhg_find().
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a refcnt as soon as depend is connected to mark
that this is being referenced as part of a group or
resolving another one. If the one referencing it
is never used, decrement it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some helper functions for finding/creating nexthop
group hash entries and assigning them as a depends for
another one using them in a group or resolving to them.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some helper functions for ref incrementing and
decrementing the depends of a nexthop group hash entry.
This just abstracts the RB tree manipulation a bit more.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We are using the rib workqueue to handle nexthop groups
from the kernel and no longer need this.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Set the resolved nhg during the find path, rather
than after it has been created. This make more sense
now that we are hashing on the resolved nexthop as well.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Refactor/move around the code for nexthop resolution so
that it occurs only when the nexthop actually changes. Further,
provide a helper function to make the code more readable.
Also, remove the check for NEXTHOPS_CHANGED as this flag is used
specifcially for nexthop tracking and not an appropriate check
here.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When hashing/creating the NHE, use the nexthops vrf as its
source of data. This is gotten directly from an interface
and should not come from a route.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When updating a route's referenced NHE, accept a NULL value
as valid and clear out the pointer in the struct.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When the resolved nexthop changes, we should increment the new
resolved NHE by the refcnt for the unresolved NHE being used
by the routes and decrement the old one by the same amount.
Before, we were simple incrementing by one, causing incorrect refcnts
to occur.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add cli to show nhg_hash_entry's by ID.
Add cli to show nhg_hash_entry info for interfaces and remove
just listing ID's in `show interface *`
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Ignore the cleanup for now until we get the timing
figured out without using the kernel nexthop object API.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
If the nhg_hash_entry is a group, check if its members
are valid before setting it invalid. If even one is valid,
then this group should still be considered valid.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Only remove a route if the nexthop it is using is still installed.
If a nexthop object is removed from the kernel, all routes referencing
it will be removed from the kernel.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We should create a new NHE if the mpls labels change
since we hash on them. This adds the functonality to do that
and decrement the refcnt on the old one.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
On startup when we are requesting all nexthop objects
from the kernel and it doesn't support that, we should not
produce an error message.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
In zebra_nhg_find(), if we created a nhg_hash_entry, return
true so we know rib-side.
Kernel-side, we don't care since it will always just enqueue
a context to process later.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add the ability to recursively resolve nexthop group hash entries
and resolve them when sending to the kernel.
When copying over nexthops into an NHE, copy resolved info as well.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Only queue a nexthop object update if the dataplane
supports nexthop objects. Otherwise, mark it as a success
since we should only me sending them to the kernel
if we think they are valid anywyay.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were only setting and checking the ifindex if
the nexthop had an *_IFINDEX type. However, when nexthop
active checking is done, the non-*_IFINDEX types can also
obtain a nexthop with an ifindex and are thus valid too.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We will use a nhe context for dataplane interaction with
nextho group hash entries.
New nhe's from the kernel will be put into a group array
if they are a group and queued on the rib metaq to be processed
later.
New nhe's sent to the kernel will be set on the dataplane context
with approprate ID's in the group array if needed.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Upon release, call the approprate functions to remove itself
from depends/dependents trees it is in.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some functions to iterate over the depends/dependents
RB tree and remove themselves from the other's RB tree.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Can't RM_REMOVE directly with a key, you need to actually pass the
data to be removed. So, lookup with a key first to find the node,
then remove it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Check to make sure the route entry has a nexthop
group before we try to free after a table lookup
failure.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Removed a static function that did not need to be
there. The nhg_connected_cmp() function provides
all the needed functionality for comparing ID's
in the RB tree.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Update the zebra_nhg_hash_equal() function to use
the nexthop_group_equal() function in lib/nexthop_group
instead of comparing their depends RB tree.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Removing this function since the new paradigm
of everything just being nhg_connected structs
makes it not make a lot of sense.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Put the setting of the ifp on a nexthop group hash
entry into the zebra_nhg_alloc() function. It should
only be added if its not a group/recursive (it doesn't
have any depends) and its nexthop type has an ifindex.
This also provides functionality for proto-side ifp
setting.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
A nexthop group should not have a VRF ID. Only individual
nexthops need to be using a VRF. Fixed this both kernel and
proto side.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Re-organize and expose the nhg_connected functions so that
it can be used outside zebra_nhg.c. And then abstract those
into zebra_nhg_depends_* and zebra_nhg_depenents_* functons.
Switch the ifp struct to use an RB tree for its dependents,
making use of the nhg_connected functions.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Create a nhg_depenents tree that will function as a way
to get back pointers for NHE's depending on it.
Abstract the RB nodes into nhg_connected for both depends and
dependents. This same struct is used for both.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Default the afi of the nexthop to the route entry using it.
If it turns out to be a group, update the afi to AFI_UNSPEC.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Update rib_add_multipath to use the reference count
increment function for nexthop group hash entries.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add function to increment the route reference count for nhg_hash_entry's
and to do so recursively if its a group.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a helper function to allow us to check if two
nhg_hash_entry's dependency lists are equal.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add helper function to allow us to lookup an ID inside
of a nhg_hash_entry's dependency list.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a function that allows us to take a single
nexthop struct and look that up or create a group and
nexthop hash entry with it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Pass a boolean to zebra_nhg_find(), indicating whether the
nhg is being lookedup from the kernel side or not.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Move the id counter further up into zebra_nhg_find() so that
it is still incremented if we receive a duplicate that never
would get allocated. The kernel will still use the dup, so we
have to account for that in our id counter.
Also, if we don't create a new entry, reset the id back to where
it was when zebra_nhg_find() was called.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Make the the kernel debug zlog for nexthop messages from the
kernel more aligned with the route message kernel debug zlog.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Changed our alloc function to just copy the nhg and
nhg_depends. This makes the zebra_nhg_find code a
little bit cleaner, hopefully preventing bugs.
The only issue with this is that it makes us have to loop
over the nexthops in a group an extra time for the copies.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Fix a couple functions that were using depends (plural)
rather than depend(singular) in their wording.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add functionality to allow us to send nexthop groups
to the kernel. It creates a nexthop_grp array based on
the dependency list in the nhg_hash_entry and then shoves
that into the netlink message.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Update the dataplane nexthop ctx to use the nhg_depend_dup_list()
function for copying over the dependencies into its context.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a function to duplicate a nhg dependency linked
list. We will use this for duplicating the dependency
list rather than the linked list dup function in lib/linkedlist.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Nexthop groups can have nexthops in different vrf's. So,
let's make the group vrf_id just be VRF_DEFAULT for hash
lookup purposes.
Set vrf_id to be VRF_DEFAULT for every message. If its a new
nextop, set the vrf to be the appropriate thing, otherwise
its a group and can just be left as default.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Simplify the code for nexthop hash entry creation. I made nexthop
hash entry creation expect the nexthop group and depends to always
be allocated before lookup. Before, it was only allocated if it had
dependencies. I think it makes the code a bit more readable to go
ahead an allocate even for single nexthops as well.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some functions that can be called to free everything that should
have been allocated in a nexthop group hash entry.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add an option to not specify the afi in the show nexthop-group
command so that it shows all nexthops, including groups. This is
how iproute2 does it. If the afi is given, it will only show single
nexthops since groups are AF_UNSPEC.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add functionality to read in a group from the kernel,
create a hash entry for it, and add its nexthops to
its dependency list.
Further, we create its nhg struct separtely from this,
copying over any nexthops it should reference directly
into it.
Thus, we have two types for representation of the nexthop group:
nhe->nhg_depends->[nhe, nhe, nhe]
nhe->nhg->nexthop->nexthop->nexthop
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We treat "groups" from the kernel here as a dependency list.
Each hash entry, if its a group from the kernel, has
a list of any other nexthop hash entries that are in its
group. A non-group nexthop from the kernel will have its
dependency list set to NULL.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Since nexthops are always going to need to be address family
specific unless they are only a group, we have to address
this when we receive and send them.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The message for an invalid address family on a nexthop gateway did
not specify that is what for the gateway specifically.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The nexthop group hash entries were using the "TMP" memory
type. Declared one for them and updated to use it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Changed to the wording in the duplicate error message
since its techincally possible we get could try to
create a dupe from somewhere else besides the kernel
in the future.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add an interface pointer for an nexthop group hash entry
when we are getting a rib_add for a new route.
Also, add the interface index to the `show nexthop-group` command.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When we get a new nexthop and find the interface associated
with it, add this nexthop to the interface's zebra interface
info nexthop hash entry list.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a nexthop hash entry list to the local zebra
interface info for each interface. This will allow
us to modify nexthops on link events.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Make our route entry struct's re->ng nexthop group pointer
just point to the nhe->nhg nexthop hash entry nexthop group.
This will allow updates to the nexthop itself to propogate
to our routes immediately.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a parameter to the rib_add function so that it takes
a nexthop ID from the kernel if one is passed along
with the route.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Move the nexthop unicast parsing into its own function
to improve code readability. It was getting a bit too
indented.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add parsing code for nexthop object ID's when we get a
route. When we get a new route with the new kernel, it
will come with a nexthop ID and the nexthop full info.
We should just reference by ID if it exists and point
to the nexthop hash entry that matches it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add functionality to uninstall nexthops we created on shutdown.
To account for this, I added in a function for zebra_router
cleanup in a shutdown event.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When nexthop entry reference counts hit zero and
we created them, uninstall them from the kernel.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Added in case statements to handle finished dataplane contexts
and then handle them with the nexthop process result function.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Switched the route entries to use ID's instead of pointers.
Perform lookups with the ID and then check if its null.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a function that can handle the results of a dataplane
ctx status, dpending on the operation performed.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add functions for sending a nexthop to be queued on the dataplane
for install/uninstall into the kernel.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Added functionality so that when we receive a RTM_DELNEXTHOP
for a nhg_hash_entry that is still being referenced by
a route, we immediately push it back to the kernel.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a function for installing Nexthop Group hash entires into
the kernel. It sends the entry to the dataplane and does any
post-processing immediately after that.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Added a NEXTHOP_GROUP_QUEUED flag to the nexthop
group hash entry struct. This indicates when we have
sent it to be installed to the kernel and are waiting
for the dataplane provider to process it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were ignoring the status result interger from
the netlink request and message parsing and just
returning 0. Fixed this to return the result of the last one.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Added a check on startup for determining if the kernel supports
nexthop objects. It sets an appropriate bool on the zebra namespace
struct.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Device only nexthops still need an address family associated
with them. Decided to get this from the destination prefix on it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The nexthop dataplane context was not getting populated with
namespace info for its netlink messages. Fixed this to do
lookups the same way we do it with route contexts.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We needed a kernel debugging function for netlink nexthop
messages when people are debugging kernel zebra messages.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add all the neccessary code to allow nexthops to be processed
in separate dataplane contexts with the netlink dataplane kernel
provider.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We needed an error code that can be used when we
fail to install a nexthop group into the kernel/fib.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Added the appropriate flags that need to be set when
we receive a nexthop from the kernel. They should be
marked as ACTIVE and that they are in the FIB.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add the functionality to parse new nexthop group messages
from the kernel and insert them into the appropriate hash
tables. Parsing is done at startup between interface and
interface address lookup. Add functionality to parse
changes to nexthops we already have. Add functionality
to parse delete nexthop messages from the kernel and
remove them from our table.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
I do not believe we should be hashing based on AFI
in for our upper level nexthop group entries. These
should be ambiguous with regards to address families since
an ipv4 or ipv6 address can have the same interface
nexthop. This can be seen in NEXTHOP_TYPE_IFINDEX.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add an error code that indicates we received a nexthop
from the kernel that is identical to one it/we already
have other than its ID.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add the basic infrastructure for a nexthop group work queue.
This queue will be used to validate and then install the
new nexthop group.
The result from the kernel when a new nexthop group is installed
will cause the route entries that depend on it to be installed.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a nexthop hash entry to the route_entry so that we can
track the nhe with the route entry.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Since we are using two different tables to hash the next groups with,
lets add an error message in case there is a failure to insert into
one of them. This will help to notify if the tables are not synced.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The messages we get from the kernel come with ids only
for groups, so lets index with those as well. Also adding
a helper function for lookup and get with the two different
tables.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Separate interface lookup into its own function.
We need to know interfaces for reading in nexthop
information, but we need to know nexthops for reading
in the interface addresses. We will read in nexthops
between the two.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The nexthop_active_num data structure is a property of the
nexthop group. Move the keeping of this data to that.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the route_entry we are keeping a non pointer based
nexthop group, switch the code to use a pointer for all
operations here and ensure we create and delete the memory.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add some base functionality so we can verify we are getting messages
about nexthops from the kernel.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some code to allow us to do lookups and releases of
nexthop groups from zebra. At this point we do not do anything
with it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We need to track if a nexthop group is valid and installed,
so create some basic flags to track this.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit does nothing more than just create a hash structure
that we will use to track nexthop groups.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Since we don't have a daemon who's job is to handle kernel
routes and we don't get an explicit route delete anymore if
nexthops become unreachable from the kernel, zebra must
re-process kernel routes itself to make sure they are still valid.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We can assume that system/kernel routes are valid indeed
if this is our first time procesing them. But since we don't
get explicit deletion events for kernel routes anymore, we
have to be prepared to process them if the nexthop becomes
unreachable for instance. Therefore, if the route is not NEW,
then don't assume its valid.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
If we need to batch process the rib (all tables or specific
vrf), do so as a scheduled thread event rather than immediately
handling it. Further, add context to the events so that you
narrow down to certain route types you want to reprocess.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Do not allow an upper level protocol to send a route to
zebra that is a /32 or a /128 that recurses through itself.
Current behavior:
donna.cumulusnetworks.com# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
K>* 0.0.0.0/0 [0/104] via 10.0.2.2, enp0s3, 01:05:28
C>* 10.0.2.0/24 is directly connected, enp0s3, 00:01:50
C>* 192.168.209.0/24 is directly connected, enp0s8, 01:05:28
C>* 192.168.210.0/24 is directly connected, enp0s9, 01:05:28
D>* 192.168.210.43/32 [150/0] via 192.168.210.44, enp0s9, 01:01:57
D 192.168.210.44/32 [150/0] via 192.168.210.44 inactive, 01:05:15
C>* 192.168.212.0/24 is directly connected, enp0s10, 01:05:28
donna.cumulusnetworks.com# sharp install routes 40.0.0.1 nexthop 192.168.210.44
% Command incomplete: sharp install routes 40.0.0.1 nexthop 192.168.210.44
donna.cumulusnetworks.com# sharp install routes 40.0.0.1 nexthop 192.168.210.44 1
donna.cumulusnetworks.com# end
donna.cumulusnetworks.com# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
K>* 0.0.0.0/0 [0/104] via 10.0.2.2, enp0s3, 01:05:51
C>* 10.0.2.0/24 is directly connected, enp0s3, 00:00:12
D>* 40.0.0.1/32 [150/0] via 192.168.210.44, enp0s9, 00:00:03
C>* 192.168.209.0/24 is directly connected, enp0s8, 01:05:51
C>* 192.168.210.0/24 is directly connected, enp0s9, 01:05:51
D>* 192.168.210.43/32 [150/0] via 192.168.210.44, enp0s9, 01:02:20
D 192.168.210.44/32 [150/0] via 192.168.210.44 inactive, 01:05:38
C>* 192.168.212.0/24 is directly connected, enp0s10, 01:05:51
donna.cumulusnetworks.com#
Fixed behavior:
donna.cumulusnetworks.com# sharp install routes 192.168.210.44 nexthop 192.168.210.44 1
donna.cumulusnetworks.com# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
K>* 0.0.0.0/0 [0/104] via 10.0.2.2, enp0s3, 00:00:15
C>* 10.0.2.0/24 is directly connected, enp0s3, 00:00:15
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:00:15
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:00:15
D 192.168.210.44/32 [150/0] via 192.168.210.44 inactive, 00:00:03
C>* 192.168.212.0/24 is directly connected, enp0s10, 00:00:15
donna.cumulusnetworks.com# sharp install routes 40.0.0.1 nexthop 192.168.210.44 1
donna.cumulusnetworks.com# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
K>* 0.0.0.0/0 [0/104] via 10.0.2.2, enp0s3, 00:00:24
C>* 10.0.2.0/24 is directly connected, enp0s3, 00:00:24
D>* 40.0.0.1/32 [150/0] via 192.168.210.44, enp0s9, 00:00:02
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:00:24
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:00:24
D 192.168.210.44/32 [150/0] via 192.168.210.44 inactive, 00:00:12
C>* 192.168.212.0/24 is directly connected, enp0s10, 00:00:24
donna.cumulusnetworks.com#
This behavior came up from discussion around issue #5159. Where
OSPF was receiving a route through itself as part of the router link
lsa. I currently think that ospf should probably dissallow this in ospf
but we should also do the right thing in zebra. If we do not allow this
change we can have situations where ordering of routes into zebra suddenly
matters.
Fixes: #5159
Signed-off-by: Donald Sharp <sharpd@cumulsunetworks.com>
If we only really use the ifp for the name, then
don't bother referencing the ifp. If that ifp is
freed, we don't expect zebra to handle the rules that
use it (that's pbrd's job), so it is going to be
pointing to unintialized memory when we decide to remove
that rule later. Thus, just keep the name in the data
and dont mess with pointer refs.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Use the ifindex value as a primary hash key/identifier, not
the ifp pointer. It is possible for that data to be freed
and then we would not be able to hash and find the rule entry
anymore. Using the ifindex, we can still find the rule even
if the interface is removed.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were seeing a double free on shutdown if the
hash release fails here due to the interface state
changing. We probably shouldn't free the data if its
still being handled in the table so adding a check there
and a debug message.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
With commit: a9ff90c41b
the vrf_id_t was changed from a uint16_t to a uint32_t
Zebra tracked the last command sent to it's peer via peeking
into the data it was sending to each client ( since we had
lost the idea of what the command was when it was time to track
the data ).
Add a define to track this and add a bit of verbiage
to the code to allow us to notice when we screw with
the header again so that this is just fixed correctly
when it happens again.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
initially, vrf backend if vrf-lite, and a specific table identifier is
associated to a vrf. here, with netns vrf backend, there is no specific
table assigned to except default routing table. use the passed table_id
parameter in zapi api, and apply it to the entry to be pushed in, or to
be removed.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Cleanup the interface creation apis to make it more
clear what they are doing.
Make it explicit that the creation via name/ifindex will
only add it to the appropriate list.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
commit ee8a72f315
broke the usage of ZEBRA_ROUTE_ALL as a valid redistribution
command. This commit puts it back in. LDP uses ZEBRA_ROUTE_ALL
as an option to say it is interested in all REDISTRIBUTION events.
Fixes: #5072
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Don't process dataplane results in zebra during shutdown (after
sigint has been seen). The dplane continues to run in order to
clean up, but zebra main just drops results.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
asymmetric routing default vrf vni configuration
is not displayed as part of running-config.
Ticket:CM-26470
Reviewed By:
Testing Done:
T11# config t
T11(config)# vni 4004 prefix-routes-only
T11(config)# end
Before:
T11# show running-config
...
vni 4004
...
After:
T11# show running-config
...
vni 4004 prefix-routes-only
...
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Add the (single) dataplane config value to the output of
config write, 'show run' - missed this during dplane development.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
the if_lookup_by_name_per_ns keeps a lock on the node where the
searched ifp is stored. Then this node can not be freed even if
the ifp is removed from the node. Just add the missing unlock
(as for the if_lookup_by_index_per_ns lookup function)
Fixes: b8af3fbbaf ("zebra: fix detection of interface renames")
Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Current autocompletion works only for simple "vrf NAME" case.
This commit expands it also for the following cases:
- "nexthop-vrf NAME" in staticd
- usage of $varname in many daemons
All daemons are updated to use single varname "$vrf_name".
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
1. add the Mlag ProtoBuf Lib to Zebra Compilation
2. Encode the messages with protobuf before writing to MLAG
3. Decode the MLAG Messages using protobuf and write to clients
based on their subscrption.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
This includes:
1. Processing client Registrations for MLAG
2. storing client Interests for MLAG updates
3. Opening communication channel to MLAG with First client reg
4. Closing Communication channel with last client De-reg
5. Spawning a new thread for handling MLAG updates peocessing
6. adding Test code
7. advertising MLAG Updates to clients based on their interests
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
When a VxLAN interface comes up new vni up event is sent
to bgpd, which triggers bgpd to sync advertise-svi-macip
to zebra. At this point, vni is present but the associated
SVI may not be present.
When SVI comes up, vni add event sent to bgpd (with associated
vrf update). Bgpd already has vni present so
advertise-svi-macip is not synced to Zebra.
To fix,
When advertise-svi-macip flag is synced first time, cache it in
zebra context even though vni associated SVI is not present.
when SVI comes up, interface address add event triggers
new MAC-IP route add to bgpd.
Ticket:CM-26038
Reviewed By:CCR-9254
Testing Done:
Validated via running a sequence of steps in symmetric
routing topology.
- Enable advertise-svi-macip at l2vni level under bgp default
instance (afi/safi, l2vpn/evpn)
- Flap l2vni associated SVI interface.
- Check the output of 'show bgp l2vpn evpn route' command for
MAC-IP route of the SVI's (MAC and IP address).
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Update neighbor entries and rule entries to have the RTPROT_ZEBRA
protocol value. So we can tell where things come from.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Start the conversion to allow zapi interface callbacks to be
controlled like vrf creation/destruction/change callbacks.
This will allow us to consolidate control into the interface.c
instead of having each daemon read the stream and react accordingly.
This will hopefully reduce a bunch of cut-n-paste stuff
Create 4 new callback functions that will be controlled by
lib/if.c
create -> A upper level protocol receives an interface creation event
The ifp is brand spanking newly created in the system.
up -> A upper level protocol receives a interface up event
This means the interface is up and ready to go.
down -> A upper level protocol receives a interface down
destroy -> A upper level protocol receives a destroy event
This means to delete the pointers associated with it.
At this point this is just boilerplate setup for future commits.
There is no new functionality.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When zebra gets a callback from the kernel that an interface has
actually been deleted *and* the end users has not configured
the interface, then allow for deletion of the interface from zebra.
This is especially important in a docker environment where containers
and their veth interfaces are treated as ephermeal. FRR can quickly
have an inordinate amount of interfaces sitting around that are
not in the kernel and we have no way to clean them up either.
My expectation is that this will cause a second order crashes
in upper level protocols, but I am not sure how to catch these
and fix them now ( suggestions welcome ). There are too many
use patterns and order based events that I cannot know for certain
that we are going to see any at all, until someone sees this problem
as a crash :( I do not recommend that this be put in the current
stabilization branch and allow this to soak in master for some time
first.
Testing:
sharpd@donna ~/frr4> sudo ip link add vethdj type veth peer name vethjd
sharpd@donna ~/frr4> sudo ip link add vethaa type veth peer name vethab
sharpd@donna ~/frr4> sudo vtysh -c "show int brief"
Interface Status VRF Addresses
--------- ------ --- ---------
dummy1 down default
enp0s3 up default 10.0.2.15/24
enp0s8 up default 192.168.209.2/24
enp0s9 up default 192.168.210.2/24
enp0s10 up default 192.168.212.4/24
lo up default 10.22.89.38/32
vethaa down default
vethab down default
vethdj down default
vethjd down default
virbr0 up default 192.168.122.1/24
virbr0-nic down default
sharpd@donna ~/frr4> sudo ip link set vethaa up
sharpd@donna ~/frr4> sudo ip link set vethab up
sharpd@donna ~/frr4> sudo ip link del vethdj
sharpd@donna ~/frr4> sudo vtysh -c "show int brief"
Interface Status VRF Addresses
--------- ------ --- ---------
dummy1 down default
enp0s3 up default 10.0.2.15/24
enp0s8 up default 192.168.209.2/24
enp0s9 up default 192.168.210.2/24
enp0s10 up default 192.168.212.4/24
lo up default 10.22.89.38/32
vethaa up default
vethab up default
virbr0 up default 192.168.122.1/24
virbr0-nic down default
sharpd@donna ~/frr4> sudo ip link del vethaa
sharpd@donna ~/frr4> sudo vtysh -c "show int brief"
Interface Status VRF Addresses
--------- ------ --- ---------
dummy1 down default
enp0s3 up default 10.0.2.15/24
enp0s8 up default 192.168.209.2/24
enp0s9 up default 192.168.210.2/24
enp0s10 up default 192.168.212.4/24
lo up default 10.22.89.38/32
virbr0 up default 192.168.122.1/24
virbr0-nic down default
sharpd@donna ~/frr4> sudo ip link add vethaa type veth peer name vethab
sharpd@donna ~/frr4> sudo vtysh -c "show int brief"
Interface Status VRF Addresses
--------- ------ --- ---------
dummy1 down default
enp0s3 up default 10.0.2.15/24
enp0s8 up default 192.168.209.2/24
enp0s9 up default 192.168.210.2/24
enp0s10 up default 192.168.212.4/24
lo up default 10.22.89.38/32
vethaa down default
vethab down default
virbr0 up default 192.168.122.1/24
virbr0-nic down default
sharpd@donna ~/frr4> sudo vtysh -c "show run"
Building configuration...
Current configuration:
!
frr version 7.2-dev
frr defaults datacenter
hostname donna.cumulusnetworks.com
log stdout
no ipv6 forwarding
!
ip route 192.168.3.0/24 192.168.209.1
ip route 192.168.4.0/24 blackhole
ip route 192.168.5.0/24 192.168.209.1
ip route 192.168.6.0/24 192.168.209.1
ip route 192.168.7.0/24 99.99.99.99 nexthop-vrf EVA
ip route 192.168.8.0/24 192.168.209.1
ip route 4.5.6.7/32 12.13.14.15
!
interface dummy1
ip address 12.13.14.15/32
!
interface vethaa
description FROO
!
line vty
!
end
sharpd@donna ~/frr4> sudo ip link del vethaa
sharpd@donna ~/frr4> sudo vtysh -c "show int brief"
Interface Status VRF Addresses
--------- ------ --- ---------
dummy1 down default
enp0s3 up default 10.0.2.15/24
enp0s8 up default 192.168.209.2/24
enp0s9 up default 192.168.210.2/24
enp0s10 up default 192.168.212.4/24
lo up default 10.22.89.38/32
vethaa down default
virbr0 up default 192.168.122.1/24
virbr0-nic down default
sharpd@donna ~/frr4> sudo vtysh -c "show run"
Building configuration...
Current configuration:
!
frr version 7.2-dev
frr defaults datacenter
hostname donna.cumulusnetworks.com
log stdout
no ipv6 forwarding
!
ip route 192.168.3.0/24 192.168.209.1
ip route 192.168.4.0/24 blackhole
ip route 192.168.5.0/24 192.168.209.1
ip route 192.168.6.0/24 192.168.209.1
ip route 192.168.7.0/24 99.99.99.99 nexthop-vrf EVA
ip route 192.168.8.0/24 192.168.209.1
ip route 4.5.6.7/32 12.13.14.15
!
interface dummy1
ip address 12.13.14.15/32
!
interface vethaa
description FROO
!
line vty
!
end
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We were storing the interface description irrelevant of whether
or not it was a newlink or dellink. This makes no sense.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This change addresses the following :
1. Ensures zlog_debug should be under DEBUG macro check
2. Ensures zlog_err and zlog_warn wherever applicable.
3. Removed few posivite logs from fpm handling, whose frequency is high.
Signed-off-by: vishaldhingra <vdhingra@vmware.com>
when a client disconnects, we iterate over the routing table to
remove any label that originated from that client. However we
were erroneously passing the route type to the function, while
it was expecting the lsp type. As a result, for example, killing
ldpd would not remove the ldp labels from the routes.
Kudos to @rwestphal for pointing me to the source of the issue.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
speed interface is done 15 seconds after interface creation. during that
time, the vrf or the interface may have disappeared. to protect this,
return an error in case it is not possible to create a vrf socket or it
is not possible to get speed of an interface because of a missing
device.
Signed-off-by: Julien Floret <julien.floret@6wind.com>
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When processing route updates from the dataplane, we were
terminating the checking of nexthops prematurely, and we could
miss meaningful changes.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
User pass the string match large-community 1 exact-match from CLI.
Now route map lib has got the string as "1 exact-match". It passes the string
to call back for compilation. BGP will parse this string and came to know
that for "1" it has to do exact match. Routemap lib has to save "1" in it’s
dependency table. Here routemap is saving this as a “1 exact-match”
which is wrong. The solution is used the compiled data.
Signed-off-by: vishaldhingra <vdhingra@vmware.com>
When selecting a new best route, zebra sends a redist update
when the route is installed. There are cases where redist
clients may not see that redist add - clients who are not
subscribed to the new route type, e.g. In that case, attempt
to send a redist delete for the old/previous route type.
Revised the redist delete api to accomodate both cases;
also tightened up the const-ness of a few internal redist apis.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add a bit of extra command `show ip route summary table XXX`
To allow end user to specify a specific table that they want
summary information on.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This new message makes it possible to install/reinstall LSPs with
multiple nexthops using a single ZAPI message.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
If the nexthop is of type `GATEWAY_IFINDEX` then the nexthop
should not resolve to a nexthop that has a different ifindex
from the one given.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Use the zserv_client_close hook to cleanup all MPLS labels advertised
by a zclient when it disconnects. We were doing this cleanup for
ldpd only, but now we have other daemons that are MPLS aware,
like ospfd (due to the new Segment Routing feature).
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
* Add ability to specify the nexthop type;
* Add ability to install or not a FTN (in addition to an LSP).
These two additions will be useful to install local SR Prefix-SIDs
configured with the no-PHP option.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
SR support for IS-IS is coming so we need to be able to distinguish
OSPF and IS-IS LSPs.
While here, add missing case statement for LDP on
lsp_type_from_re_type().
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Use the route type and instance instead of the route distance
to identify MPLS FTNs. This is a more robust approach since the
routing daemons can modify the distance of their announced routes
via configuration, which can cause inconsistencies.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Do this for the following reasons:
* Improve modularity of the code by separating the decoding of the
ZAPI messages from their processing;
* Create an API that is easier to use by the client daemons.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Some netlink-facing code used for evpn/vxlan programming was
being run in the dataplane pthread, but accessing zebra core
datastructs. Move some additional data into the dataplane
context, and use it in the netlink path instead.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Everywhere else in the code we use GNU_LINUX, that is the symbol we actualy define in the configuration. Don't rely on compiler's built-in symbols.
Signed-off-by: Jafar Al-Gharaibeh <jafar@atcorp.com>
frr_with_mutex(...) { ... } locks and automatically unlocks the listed
mutex(es) when the block is exited. This adds a bit of safety against
forgetting the unlock in error paths & co. and makes the code a slight
bit more readable.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Conver these functions:
route_map_add_match
route_map_delete_match
route_map_add_set
route_map_delete_set
To return the `enum rmap_compile_rets` and ensure all functions
that use this code handle all the enumerated possible returns.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We agreed on this several weeks ago on the weekly call, I just forgot to
actually put it in a PR...
A call for any Protobuf FPM users to raise their hand came up empty on
both the mailing list as well as Slack. Let's see if this gets any
response. If not, it'll be time to remove Protobuf FPM.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
even if vty commands were available, the default resolution command was
working only for the first vrf configured. others were ignored. Also,
for nexthop, resolution was working for all vrfs, and not the specific
one.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Move neighbor programming to the dataplane; remove
old apis; remove some ifdef'd use of direct netlink
code points, using neutral values outside of the netlink-
specific files.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
re->nexthop_num and re->nexthop_active_num are calculated while rib
processing. Also It helps in encoding the ZAPI message.
It's good to dump these parameters also, when the system is in
abnormal state.
Signed-off-by: vishaldhingra<vdhingra@vmware.com>
When resolving a nexthop, append its labels to the one its
resolving to along with the labels that may already be present there.
Before we were ignoring labels if the resolving level was greater than
two.
Before:
```
S> 2.2.2.2/32 [1/0] via 7.7.7.7 (recursive), label 2222, 00:00:07
* via 7.7.7.7, dummy1 onlink, label 1111, 00:00:07
S> 3.3.3.3/32 [1/0] via 2.2.2.2 (recursive), label 3333, 00:00:04
* via 7.7.7.7, dummy1 onlink, label 1111, 00:00:04
K>* 7.7.7.7/32 [0/0] is directly connected, dummy1, label 1111, 00:00:17
C>* 192.168.122.0/24 is directly connected, ens3, 00:00:17
K>* 192.168.122.1/32 [0/100] is directly connected, ens3, 00:00:17
ubuntu_nh#
```
This patch:
```
S> 2.2.2.2/32 [1/0] via 7.7.7.7 (recursive), label 2222, 00:00:04
* via 7.7.7.7, dummy1 onlink, label 1111/2222, 00:00:04
S> 3.3.3.3/32 [1/0] via 2.2.2.2 (recursive), label 3333, 00:00:02
* via 7.7.7.7, dummy1 onlink, label 1111/2222/3333, 00:00:02
K>* 7.7.7.7/32 [0/0] is directly connected, dummy1, label 1111, 00:00:11
C>* 192.168.122.0/24 is directly connected, ens3, 00:00:11
K>* 192.168.122.1/32 [0/100] is directly connected, ens3, 00:00:11
ubuntu_nh#
```
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Debian packaging when run finds a bunch of spelling errors:
I: frr: spelling-error-in-binary usr/bin/vtysh occurences occurrences
I: frr: spelling-error-in-binary usr/lib/frr/bfdd Amount of times Number of times
I: frr: spelling-error-in-binary usr/lib/frr/bgpd occurences occurrences
I: frr: spelling-error-in-binary usr/lib/frr/bgpd recieved received
I: frr: spelling-error-in-binary usr/lib/frr/isisd betweeen between
I: frr: spelling-error-in-binary usr/lib/frr/ospf6d Infomation Information
I: frr: spelling-error-in-binary usr/lib/frr/ospfd missmatch mismatch
I: frr: spelling-error-in-binary usr/lib/frr/pimd bootsrap bootstrap
I: frr: spelling-error-in-binary usr/lib/frr/pimd Unknwon Unknown
I: frr: spelling-error-in-binary usr/lib/frr/zebra Requsted Requested
I: frr: spelling-error-in-binary usr/lib/frr/zebra uknown unknown
I: frr: spelling-error-in-binary usr/lib/x86_64-linux-gnu/frr/libfrr.so.0.0.0 overriden overridden
This commit fixes all of them except the bgp `recieved` issue due to
it being part of json output. That one will need to go through
a deprecation cycle.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Since we are now away from the dual use of the destination field, there
is no need to single out /32 addresses as broadcast. This was bugged
anyway, since the same /32 criteria was used for IPv6 addresses as well,
when `connected_check_ptp` is called in `connected_delete_ipv6`.
Fixes: 3053
Signed-off-by: Juergen Werner <juergen@opensourcerouting.org>
The `destination` field of the connection structure was used to store
the broadcast address, if the connection was not p2p. This multipurpose
is not very evident and the benefits over calculating the bcast address
on the fly minimal.
Signed-off-by: Juergen Werner <juergen@opensourcerouting.org>
Create an interface with IP4 link local address 169.254.0.131/25.
In BGP enable the redistribute connected. Now Zebra will not send
the route corresponding to IPV4 link local address. Now made this
interface down and up. Zebra sends the route to BGP.
Zebra should not send this route to BGP.
This Fix would make the behaviour consistent and would not send the
routes corresponding to IPV4 Link local addresses.
Signed-off-by: vishaldhingra <vdhingra@vmware.com>
In if_netlink.c, when an interface structure, ifp, is first created,
its possible for the master to come up after the slave interface does.
This means, the slave interface has no way to display the master's ifname
in show outputs. To fix this, we need to allow creation by ifindex instead
of by ifname so that this issue is handled.
Signed-off-by: Dinesh G Dutt<5016467+ddutt@users.noreply.github.com>
When displaying the master interface's information in "show interface",
the display is currently the ifindex of the master interface. Make it
display the name as well as that is more useful than the name.
Signed-off-by: Dinesh G Dutt<5016467+ddutt@users.noreply.github.com>
If there is a recursive route resolved over blackhole route, then
the resolved blackhole_type is not getting set correctly.
This fix updates the bh_type correctly for resursive routes.
Signed-off-by: vishaldhingra <vdhingra@vmware.com>
We used the vrf_id in the rtm_table field of the netlink rtmsg to fetch L3VNI.
But, now we program table_id to rtm_table field instead of vrf_id.
Thus, L3VNI fetched using rtm_table is incorrect.
Instead, use nexthop->vrf_id to fetch the L3VNI.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
PR #3745 added EVPN feature to advertise individual
SVI-IPs as MAC-IP routes.
Fix a condition in zebra to send MAC and IP pair
to bgpd when the feature is enabled.
Testing Done:
Originator VTEP:
TORC11:~# ip -br addr show VxU-1002
VxU-1002 UP 45.0.2.2/24 2001:fee1:0:2::2/64
show bgp l2vpn evpn vni 1004
VNI: 1004 (known to the kernel)
Type: L2
Tenant-Vrf: default
RD: 27.0.0.11:3
Advertise-svi-macip : Yes
Import Route Target:
10:1004
Export Route Target:
10:1004
Remote vtep evpn route output for 45.0.4.2:
BGP routing table entry for 27.0.0.11:3:[2]:[0]:[48]:[00:02:00:00:00:2f]:[32]:[45.0.4.2]
Paths: (2 available, best #1)
Advertised to non peer-group peers:
MSP1(uplink-1) MSP2(uplink-2)
Route [2]:[0]:[48]:[00:02:00:00:00:2f]:[32]:[45.0.4.2] VNI 1004
64435 65546
36.0.0.11 from MSP1(uplink-1) (27.0.0.9)
Origin IGP, valid, external, bestpath-from-AS 64435, best (First path received)
Extended Community: RT:10:1004 ET:8
Last update: Thu Aug 8 18:09:13 2019
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The `show ip nht vrf EVA ...` command was not allowing you to only
specify the vrf anymore. Fix this:
robot# show ip nht vrf EVA
<cr>
A.B.C.D IPv4 Address
X:X::X:X IPv6 Address
robot# show ip nht vrf EVA 4.5.6.7
robot# show ip nht vrf EVA
robot#
Ticket: CM-25831
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ensure that the route-entry QUEUED flag is cleared in the async
notification path, as it is in the normal results processing
code path.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Update the stats displayed by 'show zebra dplane' - some
counters had been added but not displayed. Also include
the new counters for evpn macs.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
When interfaces change while they are up, Zebra sends if_up
notifications with the updated interface info. Change Zebra to send
if_down notifications with interface info when the interface changes
while it is down.
VRRP, at the least, needs these to know about MAC changes while an
interface is down.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When we are sending a redistribute_update, pass the old_re in
so that if we still have it around we can update the calling protocol.
Test:
router ospf
redistribute sharp
!
sharp install route 4.5.6.7 nexthop 192.168.201.1 1
Now add a `ip route 4.5.6.7/32 192.168.201.1`.
This causes zebra to replace the sharp route with the static route.
No update is sent to ospf and debug:
2019/08/01 19:02:38.271998 ZEBRA: 0:4.5.6.7/32: Redist update re 0x12fdbda0 (static), old 0x0 (None)
With fix:
2019/08/01 19:15:09.644499 ZEBRA: 0:4.5.6.7/32: Redist update re 0x1ba5bce0 (static), old 0x1beea4e0 (sharp)
2019/08/01 19:15:09.645462 OSPF: ospf_zebra_read_route: from client sharp: vrf_id 0, p 4.5.6.7/32
Ticket: CM-25847
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Delete an auto MAC with no neighbor associated,
when its VNI is down.
In Following sequence stale MAC entry retained in
FRR (zebra).
- Local MAC-IP pair
- MAC is deleted in bridge fdb table
- VNI is down, triggers IP (neigh) entries removed
from FRR DB.
- MAC retained as AUTO MAC with neigh list count 0.
- When VNI is UP again, stale MAC entry retained in FRR
DB.
When the MAC-IP pair moves behind remote VTEP, local VTEP
fails to add remote entry as its MAC is in auto state.
Ticket:CM-25504
Reviewed By:
Testing Done:
Validated the sequence with fix and auto MAC is deleted
when VNI is down.
When VNI comes up, the remote MAC-IP is added to FRR (Zebra)
and kernel.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Add info info in local mac del debug,
the local sequence and assoicated neigh count.
remote_mac_ip_add modify debug to display
flags value to cover local, remote and auto flags
for the MAC.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The changes came as part of PR #4730, checks
only variable mac, which is never null. Even
for ip version of cli hits "mac" case statement
and failing the clear cli.
Testing Done:
Before Fix:
VTEP-03# show evpn arp-cache vni 1002 duplicate
VNI 1002 #ARP (IPv4 and IPv6, local and remote) 1
IP Type State MAC Remote VTEP
Seq #'s
11.11.11.11 remote active aa:22:aa:aa:aa:aa 27.0.0.16
7/8
VTEP-03# clear evpn dup-addr vni 1002 ip 11.11.11.11
% Requested MAC does not exist in VNI 1002
Post fix:
VTEP-03# clear evpn dup-addr vni 1002 ip 11.11.11.11
VTEP-03#
VTEP-03# show evpn mac vni all duplicat
VNI 1002 #MACs (local and remote) 1
MAC Type Intf/Remote VTEP VLAN Seq #'s
aa:aa:aa:aa:aa:aa remote 27.0.0.16 7/8
Post fix:
VTEP-03# clear evpn dup-addr vni 1002 mac aa:aa:aa:aa:aa:aa
VTEP-03#
VTEP-03# clear evpn dup-addr vni 1002 ip 11.11.11.11
VTEP-03#
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The 'show ip import-check A.B.C.D` code was generating
a /32 prefix for comparison. Except import-check was
being used by bgp to track networks. So they could
have received a /24( or anything the `network A.B.C.D/M`
statement specifies ).
Consequently when we do a `show ip import-check A.B.C.D`
we would never find the network but a `show ip import-check |
grep A.B.C.D` would find it.
Fix the exact comparison to a match.
For the `show ip nht A.B.C.D` case we are comparing
a /32 to a /32 so prefix_match will work still.
While a `show ip import-check A.B.C.D` will now show
the expected behavior as well.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The flag ROUTE_ENTRY_NEXTHOPS_CHANGED is only ever set or unset.
Since this flag is not used for anything useful, remove from system.
By changing this flag we have re-ordered `internalStatus' of json
output of zebra rib routes. Go through and fix up tetsts to
use the new values.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The code as written before this code change point would enqueue
every system route type to be refigured when we have an
interface event. I believe this was to originally handle bugs
in the way nexthop tracking was handled, mainly that if you keep
asking the question you'll eventually get the right answer.
Modify the code to not do this, we have fixed nexthop tracking
to not be so brain dead and to know when it needs to refigure
a route that it is tracking.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When a new system route comes in and we have a pre-existing
non-system route we are not deleting the current system
route from the linux kernel.
Modify the code such that when a route replace is sent
to the kernel with a new route as a system route and
the old route as a non-system route do a delete of
the old route so it is no longer in the kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Initial data struct and api changes to support EVPN MAC
updates via the dataplane subsystem (no handlers yet).
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Problem reported where certain routes were not being passed on to
clients if they were operated on while still queued for kernel
installation. Changed it to defer working on entries that were
queued to dplane so we could operate on them after getting an
answer back from kernel installatino.
Ticket: CM-25480
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Introducing a 3rd state for route_map_apply library function: RMAP_NOOP
Traditionally route map MATCH rule apis were designed to return
a binary response, consisting of either RMAP_MATCH or RMAP_NOMATCH.
(Route-map SET rule apis return RMAP_OKAY or RMAP_ERROR).
Depending on this response, the following statemachine decided the
course of action:
State1:
If match cmd returns RMAP_MATCH then, keep existing behaviour.
If routemap type is PERMIT, execute set cmds or call cmds if applicable,
otherwise PERMIT!
Else If routemap type is DENY, we DENYMATCH right away
State2:
If match cmd returns RMAP_NOMATCH, continue on to next route-map. If there
are no other rules or if all the rules return RMAP_NOMATCH, return DENYMATCH
We require a 3rd state because of the following situation:
The issue - what if, the rule api needs to abort or ignore a rule?:
"match evpn vni xx" route-map filter can be applied to incoming routes
regardless of whether the tunnel type is vxlan or mpls.
This rule should be N/A for mpls based evpn route, but applicable to only
vxlan based evpn route.
Also, this rule should be applicable for routes with VNI label only, and
not for routes without labels. For example, type 3 and type 4 EVPN routes
do not have labels, so, this match cmd should let them through.
Today, the filter produces either a match or nomatch response regardless of
whether it is mpls/vxlan, resulting in either permitting or denying the
route.. So an mpls evpn route may get filtered out incorrectly.
Eg: "route-map RM1 permit 10 ; match evpn vni 20" or
"route-map RM2 deny 20 ; match vni 20"
With the introduction of the 3rd state, we can abort this rule check safely.
How? The rules api can now return RMAP_NOOP to indicate
that it encountered an invalid check, and needs to abort just that rule,
but continue with other rules.
As a result we have a 3rd state:
State3:
If match cmd returned RMAP_NOOP
Then, proceed to other route-map, otherwise if there are no more
rules or if all the rules return RMAP_NOOP, then, return RMAP_PERMITMATCH.
Signed-off-by: Lakshman Krishnamoorthy <lkrishnamoor@vmware.com>
A client was sending zebra a route with no nexthops! Update the
error message to tell us *Which* daemon is doing this.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Initial commit of understanding interface speed changes
on startup was this commit:
dc7b3caefb
Effectively we had encountered situations on system startup
where the interface speed for a device was not properly setup
when zebra learns about the interface ( Imagine a bond being
brought up and the controlling software creating the bond
is not fast given system load, the bond's speed changes
upwards for each interface added ).
The initial workup on this was to allow a 15 second window
and then just reread the interface speed. We've since noticed
that under heavy system load on startup this is not always sufficient.
So modify the code to wait the 15 seconds and then check the interfaces
speed. If the interfaces speed is still MAX_UINT32T or it has changed
let's wait a bit longer and try again.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
in order to both streamline the code and allow users to
define their own specialized versions of the LM api handlers,
define hooks for the 4 main primitives offered by the label
manager (i.e. connect, disconnect, get_chunk and release_chunk),
and have the existing code be run in response to a hook_call.
Additionally, have the responses to the requesting daemon be
callable from an external API.
Note that the proxy version of the label manager was a source of
issues and hardly used in practice. With the new hooks, users with
more complex requirements can simply plug in their own code to
handle label distribution remotely, so there is no longer a reason
to maintain this code.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
when requesting a specific label chunk (e.g. for the SRGB),
it might happen that we cannot get what we want. In this
event, we must be prepared to receive a response with no
label chunk. Without this fix, if the remote label manager
was not able to alloate the chunk we requested, we would
hang indefinitely trying to read data from the stream which
was not there.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
For SRGB, we need to support chunk requests starting at a
specific point in the label space, rather than just asking
for any sufficiently large chunk. To this purpose, we extend
the label manager api to request a chunk with a base value;
if the base is set to 0, the label manager will behave as it
currently does, i.e. fetching the first free chunk big enough
to satisfy the request.
update all the existing calls to get chunks from the label
manager so that they use MPLS_LABEL_BASE_ANY as the base
for the requested chunk
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Add a conditional to guard against segfaulting on the debug
statement when zvrf lookup fails.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
initially, that command was dumping only tables from default vrfs.
the change here consists in dumping all the tables from all the vrfs.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
the table identifier is made visible. this permits to easily know which
table identifier is dumped, or which table that entry belongs to, when
one calls 'show ip route all' command.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
this vty command explores the routing tables available, and dumps the
routing entries. there is no need to pass a table identifier, since all
configured tables are dumped.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
in addition to support for tcpflags, it is possible to filter on any
protocol. the filtering can then be based with iptables.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
zvni setup in zebra is controlled via bgpd i.e. advertise_all_vni
from bgpd triggers this setup. As a part of zvni creation we may need
to setup BUM mcast SG entries which are propagated to pimd for MDT setup.
Now pimd may not be present at the time of zvni creation or may restart
post zvni creation so we need a mechanism to replay (on pimd startup) and
to cleanup (on pimd stop). This is addressed via zebra_vxlan_sg_replay and
zebra_evpn_pim_cfg_clean_up.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
As part of PR 4458, when a client (bgpd) goes down,
zebra cleans up any evpn state including remotely learned
neighs, macs and vteps are suppose to be cleaned up,
uninstall from kernel tables.
Neighs (arps), macs and vteps (HREP entries) were not
removed from kernel tables as the uninstall flag as not set.
Clean up l3vni associated remote neighs, macs and vteps.
Ticket:CM-25468
Reviewed By:CCR-8889
Testing Done:
Validated in evpn symmetric routing topology where
remotely learned l2/l3 vnis neigh, macs and remote
vtep (hrep) entries are installed in kernel table,
perform systemctl stop frr.service and validated
all remotely learned entries cleaned up from kernel
tables.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Add a file that exposes functions which modify nexthop groups.
Nexthop groups are techincally immutable but there are a
few special cases where we need direct access to add/remove
nexthops after the group has been made. This file provides a
way to expose those functions in a way that makes it clear
this is a private/hidden api.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When installing a table route into the kernel choose
RTPROT_ZEBRA as the installing/controlling protocol.
This way we can know we installed it as well as stop
the warnings about this special case of `ip import-table XX`
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are importing/removing the table entry from table X into the
default routing table we are not properly setting the table_id
of the route entry. This is causing the route to be pushed
into the wrong internal table and to not be found for deletion.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The import table code assumes that they will only work
in the default vrf. This is ok, but we should push the
vrf_id and zvrf to be passed in instead of just using
VRF_DEFAULT.
This will allow us to fix a couple of things:
1) A bug in import where we are not creating the
route entry with the appropriate table so the imported
entry is showing up in the wrong spot.
2) In the future allow `ip import-table X` to become
vrf aware very easily.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The import-table code when looking up the table to use
for route-import was reversing the order of the table_id
and vrf_id causing us to never ever lookup a table
and we would cause the `ip|ipv6 import-table X` commands
to be just ignored.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Improve debugging when we cannot find a route to delete
that we have been told to delete.
New output:
2019/06/25 17:43:49 ZEBRA: default[0]:4.5.6.7/32 doesn't exist in rib
2019/06/25 17:43:49 ZEBRA: default[0]:4.5.6.8/32 doesn't exist in rib
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
1) If we are moving the nexthop we are tracking to
a new rn in the rib, then we know that the route
to get to that nexthop has changed. As such
we should notify the upper level.
This manifested itself because the code had a trigraph `?`
in the wrong order. Put the comparison in the right order.
2) If we are re-matching to the same rn and we call compare_state
then we need to see if our stored nexthops are the same or different.
If they are the same we should not notify. If they are different
we should notify. compare_state was only comparing the flags
on a route and since those are not necessarily the right flags
to look at( and we are well after the fact that the route has
already changed and been processed ) let's just compare
the nexthops to see if they are the same or different.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
when receiving an EAGAIN while trying to read the header
of a ZAPI message, we were erroneously continuing as if
everything was fine, which could crash zebra. Fix this
by returning and letting the re-armed read task deal with
this
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Also fixes some issues related to -
show evpn arp-cache vni xx vtep yy
Ticket: CM-25380
Signed-off-by: Nitin Soni<nsoni@cumulusnetworks.com>
Reviewed-by: CCR-8858
Testing-Done: Evpn scale test with 30K neighs
The failed neighbor event logging that was recently added in
commit: 3acae086ba
cast a bit too broad of a stroke. We should only inform
the user that we were ignoring the RTM_NEWNEIGH FAIL callback
when we believe it was one of our own 5549 entries.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When dumping rib data about a route for `debug rib detail`
modify the dump command to display the prefix as part
of every line so that we can use a grep on the log
file.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When BGP daemon is down, Clean up its configuration state from zebra.
When the BGP daemon is up again, it will push its configuration to zebra
Delete the MAC and neighbor information received on the BGP session,
while retaining the local MAC and local ARP entries.
Signed-off-by: Kishore Aramalla karamalla@vmware.com
Problem discovered in testing that occasionally when an interface
address was flushed, the corresponding route would be removed from
the kernel and zebra but remain in the bgp table and be advertised
to peers. Discovered that when zebra_rib_evaluate_nexthops spun
thru the tree list of rns, if the timing and circumstances were
right, it would move elements and miss evaluating some. Changed
from frr_each to frr_each_safe and the problem is now gone.
Ticket: CM-25301
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Add a expected count for the route node we will be processing
as part of nexthop resolution and modify the type to display
a useful string of what the type is instead of a number.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a bit of extra code to indicate to the operator why
we intentionally rejected a kernel route from being used.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
- When the connection with the FPM socket is established, iterate through all the
L3VNIs and send all the RMACs for FPM processing zfpm_conn_up_thread_cb"
- We have already handled connection down even in previous commits. When the FPM
connection goes down, empty mac_q and FPM mac info hash table
"zfpm_conn_down_thread_cb"
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
- FPM write thread calls "zfpm_build_updates()" to process mac_q and dest_q and
to write update buffer over the FPM socket.
- "zfpm_build_updates()" processes all the update queues one by one in a while
loop. It will break the while loop and return if Queue processing function
returns "FPM_WRITE_STOP" OR FPM write buffer is full OR all the queues are
empty (no more update to process).
- "zfpm_build_route_updates()" dequeues and processes route nodes from "dest_q".
- "zfpm_build_mac_updates()" dequeues and processes MAC nodes from "mac_q"
- These queue processing functions return with "FPM_WRITE_STOP" if the write
buffer is full. Return value is "FPM_GOTO_NEXT_Q" if enough updates are
processed from this queue and we want to move on to the next queue.
- In each call, a queue processing function will process max
"FPM_QUEUE_PROCESS_LIMIT (10000)" updates to avoid starvation of other queues.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
- Define a hook "zebra_mac_update" which can be registered by multiple
data plane components (e.g. FPM, dplane).
DEFINE_HOOK(zebra_rmac_update, (zebra_mac_t *rmac, zebra_l3vni_t *zl3vni, bool
delete, const char *reason), (rmac, zl3vni, delete, reason))
- While performing RMAC add/delete for an L3VNI, call "zebra_mac_update" hook.
- This hook call triggers "zfpm_trigger_rmac_update". In this function, we do a
lookup for the RMAC in fpm_mac_info_table. If already present, this node is
updated with the latest RMAC info. Else, a new fpm_mac_info_t node is created
and inserted in the queue and hash data structures.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
- FPM MAC structure: This data structure will contain all the information
required for FPM message generation for an RMAC.
struct fpm_mac_info_t {
struct ethaddr macaddr;
uint32_t zebra_flags; /* Could be used to build FPM messages */
vni_t vni;
ifindex_t vxlan_if;
ifindex_t svi_if; /* L2 or L3 Bridge interface */
struct in_addr r_vtep_ip; /* Remote VTEP IP */
/* Linkage to put MAC on the FPM processing queue. */
TAILQ_ENTRY(fpm_mac_info_t) fpm_mac_q_entries;
uint8_t fpm_flags;
};
- Queue structure for FPM processing:
For FPM processing, we build a queue of "fpm_mac_info_t". When RMAC is
added or deleted from zebra, fpm_mac_info_t node is enqueued in this queue
for the corresponding operation. FPM thread will dequeue these nodes one by
one to generate a netlink message.
TAILQ_HEAD(zfpm_mac_q, fpm_mac_info_t) mac_q;
- Hash table for "fpm_mac_info_t"
When zebra tries to enqueue fpm_mac_info_t for a new RMAC add/delete
operation, it is possible that this RMAC is already present in the queue. So,
to avoid multiple messages for duplicate RMAC nodes, insert fpm_mac_info_t
into a hash table.
struct hash *fpm_mac_info_table;
- Before enqueueing any MAC, try to fetch the fpm_mac_info_t from the hash
table first.
- Entry is deleted from the hash table when the node is dequeued.
- For hash table key generation, parameters used are "mac adress" and "vni"
This will provide a fairly unique key for a MAC(fpm_mac_info_hash_keymake).
- Compare function uses "mac address", "RVTEP address" and "VNI" as the key
which is sufficient to distinguish any two RMACs. This compare function is
used for fpm_mac_info_t lookup (zfpm_mac_info_cmp).
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
Found that the "show interface brief" command was missing the
ability to specify all vrfs. Added that capability via this
fix.
Ticket: CM-25139
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
If we get a neighbor entry for 5549 failure notice
from the kernel that means that something has probably
gone terribly wrong. Let's notice and not reinstall.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This is mostly relevant for Solaris, where config.h sets up some #define
that affect overall header behaviour, so it needs to be before anything
else.
Signed-off-by: David Lamparter <equinox@diac24.net>
Field vrf_id is replaced by the pointer of the struct vrf *.
For that all other code referencing to (interface)->vrf_id is replaced.
This work should not change the behaviour.
It is just a continuation work toward having an interface API handling
vrf pointer only.
some new generic functions are created in vrf:
vrf_to_id, vrf_to_name,
a zebra function is also created:
zvrf_info_lookup
an ospf function is also created:
ospf_lookup_by_vrf
it is to be noted that now that interface has a vrf pointer, some more
optimisations could be thought through all the rest of the code. as
example, many structure store the vrf_id. those structures could get
the exact vrf structure if inherited from an interface vrf context.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
vrf_id parameter is replaced with struct vrf * parameter. It is
needed to create vrf structure before entering in the fuction.
an error is generated in case the vrf parameter is missing.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
there may be cases where the vrf is yet allocated from the vty, and the
discovery process did not make the relationship between the vrf_id and
the name of the vrf. For instance, by parsing an interface belonging to
vrf-id X, it is not sure that vrf-id X and vrfname XX are talking about
the same vrf. For that, lets allocate the vrf, and lets try to detect
there is a duplicate case in vrf, so that the merge can be done without
any impact for the user.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
the interface search is based on vrfs. As at startup, some interfaces
may be configured, there is need to have vrfs contexts present. A macro
is being appended with an extra parameter that permits create a vrf and
return the context. This macro is also used by some show routines, but
will not create vrfs, because that extra parameter will be set to false,
on that case.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Upon accessing interface NB API, the interface is created, if the vrf
is available. the commit does not change the behaviour, since at this
commit, this is not yet possible to have vrf contexts, while zebra did
not connect to daemons. However, that commit adds some work, so that it
will be possible to work on a vrf context, without having the vrf_id
completely resolved. for instance, if we suppose a vrf is created by
command 'vrf TOTO' in the starting configuration of a daemon, then 'interface
TITI vrf TOTO' will permit to create interface TITI within vrf TOTO.
the macro VRF_GET_INSTANCE will return the vrf context, if available or
not.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
the vrf_id parameter is replaced by struct vrf * parameter.
this impacts most of the daemons that look for an interface based on the
name and the vrf identifier.
Also, it fixes 2 lookup calls in zebra and sharpd, where the vrf_id was
ignored until now.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
vrf pointer is used as reference when calling if_get_by_name() function.
this will permit to create interfaces with an unknown vrf_id, since it
is only necessary to get the vrf structure to store the interfaces.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Update pbrd to properly handle nexthop tracking.
When we get a notification that a change happened on a nexthop,
re-install it if its still valid.
Before, we were running over all routes and re-queueing them if they
were PBR routes. This commit removes that and puts all the processing
in PBR instead.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
in the case the vrf backend is vrf-lite, there is no need to have
separate sockets. use a socket located in zrouter, so that when needing
the socket, a common API is used. that API will return the appropriate
socket value.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when network namespace is used as vrf backend, there is need to have
separate contexts for rtadv contexts.
route advertisements have to look for appropriate interface based on
zvrf context.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
In a variety of places we are using DAEMON_VTY_DIR, convert
to use frr_vtydir. This will allow us in a future commit
to have the -N namespace option be automatically used.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add the local and remote sequence number to the `show evpn arp-cache vni XX` command.
VNI 1000111 #ARP (IPv4 and IPv6, local and remote) 15
IP Type State MAC Remote VTEP Seq #'s
fe80::202:ff:fe00:15 remote active 00:02:00:00:00:15 6.0.0.31 0/0
fe80::202:ff:fe00:8 local active 00:02:00:00:00:08 0/0
60.1.1.111 local active 00:02:00:00:00:08 0/0
2060:1:1:1::11 local active 00:e0:ec:38:49:a1 0/0
fe80::202:ff:fe00:11 remote active 00:02:00:00:00:11 6.0.0.30 0/0
2060:1:1:1::211 remote active 00:02:00:00:00:11 6.0.0.30 0/0
2060:1:1:1::121 remote active 00:02:00:00:00:0c 6.0.0.29 0/0
60.1.1.211 remote active 00:02:00:00:00:11 6.0.0.30 0/0
fe80::202:ff:fe00:c remote active 00:02:00:00:00:0c 6.0.0.29 0/0
60.1.1.11 local active 00:e0:ec:38:49:a1 0/0
fe80::2e0:ecff:fe38:49a1 local active 00:e0:ec:38:49:a1 0/0
60.1.1.221 remote active 00:02:00:00:00:15 6.0.0.31 0/0
2060:1:1:1::111 local active 00:02:00:00:00:08 0/0
2060:1:1:1::221 remote active 00:02:00:00:00:15 6.0.0.31 0/0
60.1.1.121 remote active 00:02:00:00:00:0c 6.0.0.29 0/0
The seq numbers are at 0/0 because we have had no mobility events.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
the interface search done was not looking in the appropriate zns. The
display was then wrong. Update the show command with the correct zns.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Say, more than one sequence of a route-map uses the same named entity
in its match clause. After that entity is removed from any one of the
route-map sequences, any further changes made to that entity doesn't
dynamically take effect.
A reference counter, that allows the named entity to keep a count of
the route-maps dependent on it, has been introduced to address this issue.
Signed-off-by: NaveenThanikachalam <nthanikachal@vmware.com>
When you have compiled FRR with a large multipath number
then encoding large ecmp routes between zebra and the
routing daemons. There exists a theoritical size
of multipath that will cause the encoding to be larger
than the ZEBRA_MAX_PACKET_SIZ. In the cases where
we have allocated streams that will encode routes
then let's ensure that whatever size we have will
auto-fit what we say we can send.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Introducing a 3rd state for route_map_apply library function: RMAP_NOOP
Traditionally route map MATCH rule apis were designed to return
a binary response, consisting of either RMAP_MATCH or RMAP_NOMATCH.
(Route-map SET rule apis return RMAP_OKAY or RMAP_ERROR).
Depending on this response, the following statemachine decided the
course of action:
Action: Apply route-map match and return the result (RMAP_MATCH/RMAP_NOMATCH)
State1: Receveived RMAP_MATCH
THEN: If Routemap type is PERMIT, execute other rules if applicable,
otherwise we PERMIT!
Else: If Routemap type is DENY, we DENYMATCH right away
State2: Received RMAP_NOMATCH, continue on to next route-map, otherwise,
return DENYMATCH by default if nothing matched.
With reference to PR 4078 (https://github.com/FRRouting/frr/pull/4078),
we require a 3rd state because of the following situation:
The issue - what if, the rule api needs to abort or ignore a rule?:
"match evpn vni xx" route-map filter can be applied to incoming routes
regardless of whether the tunnel type is vxlan or mpls.
This rule should be N/A for mpls based evpn route, but applicable to only
vxlan based evpn route.
Today, the filter produces either a match or nomatch response regardless of
whether it is mpls/vxlan, resulting in either permitting or denying the
route.. So an mpls evpn route may get filtered out incorrectly.
Eg: "route-map RM1 permit 10 ; match evpn vni 20" or
"route-map RM2 deny 20 ; match vni 20"
With the introduction of the 3rd state, we can abort this rule check safely.
How? The rules api can now return RMAP_NOOP (or another enum) to indicate
that it encountered an invalid check, and needs to abort just that rule,
but continue with other rules.
Question: Do we repurpose an existing enum RMAP_OKAY or RMAP_ERROR
as the 3rd state (or create a new enum like RMAP_NOOP)?
RMAP_OKAY and RMAP_ERROR are used to return the result of set cmd.
We chose to go with RMAP_NOOP (but open to ideas),
as a way to bypass the rmap filter
As a result we have a 3rd state:
State3: Received RMAP_NOOP
Then, proceed to other route-map, otherwise return RMAP_PERMITMATCH by default.
Signed-off-by:Lakshman Krishnamoorthy <lkrishnamoor@vmware.com>
The multicast mode enum was a global static in zebra_rib.c
it does not belong there, it belongs in zebra_router, moving.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
BGP always sends down the correct distance to use. We do
not need rib_add_multipath to double check the code.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Since these functions are not really rib processing problems
let's move them to zebra_nhg.c which is meant for processing of
nexthop groups.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Allow route notifications to trigger route state changes,
such as installed -> not installed.
Clean up the fib-specific nexthop-group in a couple of
un-install paths.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Use some common handling for both route update results
processing and dataplane notification processing. Use the
fib-specific nexthop-group if the update to a route results
in different nexthop status than the default rib-provided
nexthop-group.
Use the fib-specific nexthop-group, if present, to provide
the output of 'show ip fib'.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Some updates may be the result of a plugin's actions - such
as an async notification. Add accessor so that we can
identify that an update was generated by a plugin.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
When setting route nexthops' installation state based on a
dataplane context struct, unset the installed state if a
nexthop was not present in the dataplane context.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Create a helper api that locates a zebra route-node from info
in a dplane context struct. Moved code from the results handler
to make a more-general api that could be used in other paths.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add an api to update the status of a route based on info
from a dplane context object. Use the api when processing
route update results from the dataplane.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add a callback called at start time, once the dplane pthread
and thread_master are available. The callback is optional.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
This code doees this:
a) Imagine ospf installs a route into zebra. Zebra crashes and
we restart FRR. If we are using the -k option on zebra than
all routes are re-read in, including this OSPF route.
b) Now imagine at the same time that zebra is starting backup
ospf on a different router looses a link to the this route.
c) Since zebra was run with -k this OSPF route is read back
in but never replaced and we now have a route pointing out
an interface to other routers that cannot handle it.
We should never allow users to implement bad options from zebra's
perspective that allow them to put themselves into a clear problem
state and additionally we have *absolutely* no mechanism to ever
fix that broken route without special human interaction.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
<Initial Code from Praveen Chaudhary>
Add the a `--graceful_restart X` flag to zebra start that
now creates a timer that pops in X seconds and will go
through and remove all routes that are older than startup.
If graceful_restart is not specified then we will just pop
a timer that cleans everything up immediately.
Signed-off-by: Praveen Chaudhary <pchaudhary@linkedin.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Allow label ignoring when comparing nexthops. Specifically,
add another functon nexthop_same_no_labels() that shares
a path with nexthop_same() but doesn't check labels.
rib_delete() needs to ignore labels in this case.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The functions nexthop_same() does not check the resolved
nexthops so I don't think this function is even needed
anymore.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
This is necessary to avoid a name collision with std::for_each
from C++.
Fixes the compilation of the gRPC northbound module.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
- Today, rtm_table field takes a vrf_id. It should take table_id
- rtm_table field is a uchar field which can only accomodate table_id less than
256. To support table id greater than 255, if the table_id is greater than 255,
set rtm_table to 0 and add RTA_TABLE attribute with 32 bit value as the
table_id.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
- For data plane processing of VxLAN routes, add encap type and L3VNI info to
rtmsg message for FPM.
- Add "RTA_ENCAP_TYPE" attribute for VxLAN encap with value 100.
This value is not currently used for RTA_ENCAP_TYPE for any encap.
- If "RTA_ENCAP_TYPE" is 100, add "RTA_ENCAP" attribute with "RTA_VNI" as a
nested attribute of RTA_ENCAP
Format of RTA_VNI attribute:
Len(2 bytes) type (2 bytes) Value(4 bytes)(VNI)
00 08 : 00 00 : 1000
RTA_VNI attribute is a custom attribute.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
VRRP doesn't install any routes, but should still have an array entry.
Also add a help string for VRRP to route_types.txt
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
We were running into some problems where VRRP is trying to protodown
interfaces that no longer exist. While this is a minor bug in its own
right, this was crashing Zebra because Zebra was not doing a null check
after its ifindex lookup.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Add a check for after table lookup during route map update.
If the table ID does not exist, continue.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The multipath_num global variable was moved into
zrouter.multipath_num but this particular usage
of it was not updated.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
It doesn't make much sense for a hash function to modify its argument,
so const the hash input.
BGP does it in a couple places, those cast away the const. Not great but
not any worse than it was.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
bfd cbit is a value carried out in bfd messages, that permit to keep or
not, the independence between control plane and dataplane. In other
words, while most of the cases plan to flush entries, when bfd goes
down, there are some cases where that bfd event should be ignored. this
is the case with non stop forwarding mechanisms where entries may be
kept. this is the case for BGP, when graceful restart capability is
used. If BFD event down happens, and bgp is in graceful restart mode, it
is wished to ignore the BFD event while waiting for the remote router to
restart.
The changes take into account the following:
- add a config flag across zebra layer so that daemon can set or not the
cbit capability.
- ability for daemons to read the remote bfd capability associated to a bfd
notification.
- in bfdd, according to the value, the cbit value is set
- in bfdd, the received value is retrived and stored in the bfd session
context.
- by default, the local cbit announced to remote is set to 1 while
preservation of the local path is not set.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Make the RIB_*_ROUTE() macro which is passed a route in rib.h just use
the R*_ROUTE() macros that directly check the type in rt.h.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The CLI code ensures that the clippy code produces
valid input for the zebra_routemap.c functions, but
coverity SA does not understand this fact. So add
some asserts to make the coverity SA happy.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
No need to check for non-null ctx at this point in the
function as that it has already been derefed.
Signed-off-by: donald Sharp ,sahrpd@cumulusnetworks.com>
The re->uptime usage of time(NULL) leaves it open to
timing changes from outside influence. Switching
to monotime allows us to ensure that we have a timestamp
that is always increasing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The route_map_event_hook callback was passing the `route_map_event_t`
to each individual interested party. No-one is ever using this data
so let's cut to the chase a bit and remove the pass through of data.
This is considered ok in that the routemap.c code came this way
originally and after 15+ years no-one is using this functionality.
Nor do I see any `easy` way to do anything useful with this data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
1. If prefix not found, print "{}" for json
2. Print "Network not in table" for route option
3. Print "Network not in FIB" for fib option
4. Take care of "show ip route/fib vrf all prefix" command.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
if the local sticky mac delete request is received,
if there are associated neighbor entries present, mac's
only local flag is removed and marked as auto mac.
this results in next local mac learning automatically assumes
mac is sticky.
There is a case when bridge learning off is configured, user
configures sticky mac via bridge fdb add.
This MAC learns associated neighbor entry.
Later user deletes stick mac via bridge fdb del, this triggers
frr to delete mac but if there are neighbors present, frr marks
MAC as AUTO but does not remove sticky flag.
User enables bridge learning on which triggers
The mac to learn as dynamic entry and in absence of this
fix, the mac is marked as sticky.
Ticket:CM-24968
Reviewed By:CCR-8683
Testing Done:
Validated broken condition with internally reproduction
with fix and without.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
According to the review comments, added "Network not in FIB" message when we do
not have a FIB route present for given prefix.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
With the previous commit, the zebra_router_score_proto function
became unnecessary, so let us remove it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
For each table created by a vrf, keep track of it and
allow for proper cleanup on shutdown of that particular
table. Cleanup client shutdown to only cleanup data
that the particular vrf owns. Before we were cleaning
the same table 2 times.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Combine the zebra_vrf_other_route_table and zebra_vrf_table_with_table_id
functions into 1 function. Since they are basically the same thing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
"show ip/ipv6 route <prefix> [json]" uses a different parser chain from
"show ip/ipv6 route [json]".
"show ip/ipv6 route <prefix> [json]" CLI does not support "fib" option.
Fix:
Add "fib" option to the above command.
The new command is: "show ip/ipv6 <route/fib> <prefix> [json]"
If "fib" option is specified, we will show only the selected routes
(Similar to "show ip/ipv6 fib")
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
messages from daemons to bfd daemons go through zebra. zebra reuses the
vrf identifier to send messages to bfd.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Try to remove any LSPs associated with a vrf when the vrf is
deleted. The vrf code was calling a helpful zebra_mpls api,
but that api was basically a no-op for vrfs other than
the default.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
This command is broken and has been broken since the introduction
of vrf's. Since no-one has complained it is safe to assume that
there is no call for this specialized linux command. Remove
from the system with extreme prejudice.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The rib_add( and rib_delete( functions are there to allow
kernel interactions with the creation of routes. Fixup the
code to be consistent in the passup of the tableid.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we switched to a pthread per client, we lost the ability to
correlate zapi message debugs with their handlers in zlog, because the
message was logged when it was read off the zapi socket and not right
before it was processed. Move the zapi msg hexdump to happen right
before we call the message handler.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The route_info[X].meta_q_map *must* be less than MQ_SIZE
or we will do some strange stuff, so assert on it at startup.
The distance in route_info is a uint8_t so let's keep the data
structure the same.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The ifp pointer must be pointing at a real location
in memory since right above us in this loop we
return if it is.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The route entry code was using a custom linked list to handle
route entries. Remove and replace with the new lib link list
code. This reduces the size of the route entry by a further
8 bytes.
Observant people will notice that the current linked list
implementation is singly linked, while the Route Entry
is doubly linked. I am not terribly concerned about this
change as that 1) we do not see a large number of route
entries per prefix( say 2 maybe 3 items ) and route entries
do not come and go that often.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The `struct rib_dest_t` was being used to store the linked
list of rnh's associated with the node. This was taking up
a bunch of memory. Replace with new data structure supplied
by David and see the memory reductions associated with 1 million
routes in the zebra rib:
Old:
Memory statistics for zebra:
System allocator statistics:
Total heap allocated: 675 MiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 567 MiB
Free small blocks: 39 MiB
Free ordinary blocks: 69 MiB
Ordinary blocks: 0
Small blocks: 0
Holding blocks: 0
New:
Memory statistics for zebra:
System allocator statistics:
Total heap allocated: 574 MiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 536 MiB
Free small blocks: 33 MiB
Free ordinary blocks: 4600 KiB
Ordinary blocks: 0
Small blocks: 0
Holding blocks: 0
`struct rnh` was moved to rib.h because of the tangled web
of structure dependancies. This data structure is used
in numerous places so it should be ok for the moment.
Future work might be needed to do a better job of splitting
up data structures and function definitions.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The `struct rib_dest_t` was being used to store the linked
list of rnh's associated with the node. This was taking up
a bunch of memory. Replace with new data structure supplied
by David and see the memory reductions associated with 1 million
routes in the zebra rib:
Old:
Memory statistics for zebra:
System allocator statistics:
Total heap allocated: 675 MiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 567 MiB
Free small blocks: 39 MiB
Free ordinary blocks: 69 MiB
Ordinary blocks: 0
Small blocks: 0
Holding blocks: 0
New:
Memory statistics for zebra:
System allocator statistics:
Total heap allocated: 574 MiB
Holding block headers: 0 bytes
Used small blocks: 0 bytes
Used ordinary blocks: 536 MiB
Free small blocks: 33 MiB
Free ordinary blocks: 4600 KiB
Ordinary blocks: 0
Small blocks: 0
Holding blocks: 0
`struct rnh` was moved to rib.h because of the tangled web
of structure dependancies. This data structure is used
in numerous places so it should be ok for the moment.
Future work might be needed to do a better job of splitting
up data structures and function definitions.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a function to check if the route_info array
has all types specified with data in it. Specifically,
test the 'key' attribute for non-zero data. Ignore
ZEBRA_ROUTE_SYSTEM as it should be zero key anyway.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
With flooding control added recently we were not properly handling
the new flood control parameter in zebra_vxlan.c handler functions.
The error message that was being repeatedly seen:
2019/05/01 00:47:32 ZEBRA: [EC 100663311] stream_get2: Attempt to get out of bounds
2019/05/01 00:47:32 ZEBRA: [EC 100663311] &(struct stream): 0x7f0f04001740, size: 22, getp: 22, endp: 22
The fix was to ensure that both the _add and _del functions kept proper
sizing of amount of data read *and* the _del function was not
reading the flood_control data from the stream.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a comment to indicate that route types added to
Zebra, should also be present in the route_info array.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add OpenFabric to the route_info array for handling processing
of the OpenFabric route type.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Problem reported that route-maps applied to "ip protocol table bgp"
would not be invoked if the ip protocol table command was issued
after the bgp prefixes were installed. Found that a recent change
improving how often nexthop_active_update runs missed causing this
filtering to be applied. This fix resolves that issue as well as
a couple of other places that were problematic with the recent
change.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
in the case the vrf backend is vrf-lite, there is no need to have
separate sockets. use a socket located in zrouter, so that when needing
the socket, a common API is used. that API will return the appropriate
socket value.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when network namespace is used as vrf backend, there is need to have
separate contexts for rtadv contexts.
route advertisements have to look for appropriate interface based on
zvrf context.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The alias/description of an interface in linux was being
used to override the internal description. As such let's
fix the display to keep track of both if we have it.
Config in FRR:
!
interface docker0
description another combination
!
interface enp3s0
description BAMBOOZLE ME WILL YOU
!
Config in linux:
sharpd@robot ~/f/zebra> ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
alias This is the loopback you cabbage
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 74:d0:2b:9c:16:eb brd ff:ff:ff:ff:ff:ff
alias HI HI HI
Now the 'show int descr' command:
robot# show int description
Interface Status Protocol Description
docker0 up down another combination
enp3s0 up up BAMBOOZLE ME WILL YOU
HI HI HI
lo up up This is the loopback you cabbage
Fixes: #4191
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Start using the dataplane for interface-address programming,
on netlink platforms. Other platforms just stubbed at this
point.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
These updates act as triggers to pimd to -
1. join the MDT for rxing VxLAN encapsulated BUM traffic
2. register the local-vtep-ip as a source for the MDT
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
An SG entry is added (if one doesn't already exist) when a l2-VNI is
associated with a mcast-grp and local-vtep-ip.
And viceversa; when the last l2-vni using a MDT is removed the SG
entry is deleted.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Based code for adding (S, G) entries. These entries are created when
a mcast-group and local-VTEP-IP is associated with and L2 VNI.
The parent (*, G) entries are created implicitly on the (S, G) addition
and play the role of termination entries.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Each multicast tunnel is associated with a -
1. Tunnel origination mroute that is used for forwarding the
VxLAN encapsulated flow -
S - local VTEP-IP
G - BUM mcast-group
2. And a tunnel termination entry -
S - * (any remote VTEP)
G - BUM mcast-group
Multiple L2 VNIs can share the same BUM mcast group (and local-VTEP-IP).
Zebra maintains an mcast (SG) hash table to pass this info to pimd for
subsequent MDT setup.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Remote VTEPs advertise the flood mode via IMET and the ingress VTEP
needs to perform head-end-replication of BUM packets to it only if the
PMSI tunnel type is set to ingress-replication. If a type-3 route is not
rxed or rxed with a mode other than ingress-replication we can skip
installation of the flood fdb entry for that L2-VNI. In that case the
remote VTEP is either not interested in BUM traffic or is using a
"static-config" based replication mode like PIM.
Sample output with HER -
=======================
root@TORS1:~# vtysh -c "show evpn vni 1000" |grep "Remote\|flood"
Remote VTEPs for this VNI:
27.0.0.8 flood: HER
root@TORS1:~#
Sample output with PIM-SM -
=========================
root@TORS2:~# vtysh -c "show evpn vni 1000" |grep "Remote\|flood"
Remote VTEPs for this VNI:
27.0.0.7 flood: -
root@TORS2:~#
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The multicast group ip address for BUM traffic is configurable per-l2-vni.
One way to configure that is to setup a vxlan device that per-l2-vni and
specify the address against that vxlan device -
root@TORS1:~# vtysh -c "show interface vx-1000" |grep -i vxlan
Interface Type Vxlan
VxLAN Id 1000 VTEP IP: 27.0.0.15 Access VLAN Id 1000 Mcast 239.1.1.100
root@TORS1:~# vtysh -c "show evpn vni 1000" |grep Mcast
Mcast group: 239.1.1.100
root@TORS1:~#
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Found that zebra_rnh_apply_nht_rmap would set the
NEXTHOP_FLAG_ACTIVE if not blocked by the route-map, even
if the flag was not active prior to the check. This fix
changes the flag used to denote the nexthop is filtered so
that proper active state can be retained. Additionally,
found two cases where we would send invalid nexthops via
send_client, which would also cause this crash. All three
fixed in this commit.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Update the nexthop flag output for the route entry dump to
include all possible flag states be output.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We currently run nexthop_active_check multiple times. Make the
code run once and figure out state from that.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The nexthop_active_update command looks at each individual
nexthop and decides if it has changed. If any nexthop
has changed we will set the re->status to ROUTE_ENTRY_CHANGED
and ROUTE_ENTRY_NEXTHOPS_CHANGED.
Additionally the test for old_nh_num != curr_active
makes no sense because suppose we have several events
we are processing at the same time and a total ecmp
of 16 but 14 are active at the start and 14 are active
at the end but different interfaces are up or down.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The NEXTHOP_FLAG_FILTERED went away when we started treating
static routes like every other route in the system. This was
a special case for handling static route code that just didn't
get finished cleaning up.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We are effectively calling nexthop_active_update() on every
route entry being processed for installation at least 2 times.
This is a bit ridiculous. We need to resolve the nexthops
when we know a route has changed in some manner, so do so.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
zlog() should be part of the public logging API as it's useful in
the cases where the logging priority isn't known at compile time
(i.e. it depends on a variable).
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
L3VNI configured in a specific VRF is allowed to unconfigure from any
VRF, including default (global) VRF. This results L3VNI delete notification
to BGP and subsequent type-5 route uninstall from the VRF the L3VNI belong to.
This also resulted in the inconsistent running configuration.
The deleted L3VNI still shows up in its original VRF. The VRF in which the
"no vni <x>" was executed doesn't display its own L3VNI.
Added a VRF check in zebra to prevent this.
Signed-off-by: Kishore Aramalla <karamalla@vmware.com>
When having a route recovery, because of the route installation
cycling and the next hop label check, it could happen that the PW
never gets recovered. The original code shows the intention of retrying,
but the code was missing. The fix includes the call to the timer programming
the recovery attempt.
Example for reproducing the issue:
|P1| <-> |P2| <-> |P3|
- Being P1, P2, P3 nodes, using IS-IS as IGP, and having a pseudowire
betwen P1 and P3 (P1, P2, P3 having configured LDP daemons).
- After 60 seconds, kill the IS-IS daemon in P2.
- Wait 30 seconds
- Launch again the IS-IS daemon in P2
- The bug/issue is that after P1 <-> P3 recovering connectivity sometimes
the PW is not recovered because the reason explained in the first paragraph.
Signed-off-by: F. Aragon <paco@voltanet.io>
In zebra terminate path, the node was attempted to remove
twice from the RB_TREE table. This lead to a crash during
zebra shutdown zebra_router_free_table already calls RB_REMOVE
to remove a node from rb tree table.
siginfo=0x7fffd9134a30, context=<optimized out>) at lib/sigevent.c:249
rbt=<optimized out>, t=<optimized out>) at lib/openbsd-tree.c:226
t=0x56296965ff50 <zebra_router_table_head_RB_INFO>) at lib/openbsd-tree.c:383
rbt=rbt@entry=0x562969669bd0 <zrouter+16>, elm=elm@entry=0x56296afcf810)
at lib/openbsd-tree.c:393
(elm=0x56296afcf810, head=0x562969669bd0 <zrouter+16>) at zebra/zebra_router.h:46
Singned-off-by: Chirag Shah <chirag@cumulusnetworks.com>
We were memsetting zebra_pbr_rule struct after
we had already put some information in it. Also updated
the init of the struct to use braces instead of a
memset.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The `show ipv[4|6] <nht|import-check> ...` commands are starting
to produce a bunch of output due to multiple daemons now
using the code. Allow the specification of a v4 or v6 address
to allow the show command to only display the interesting nht.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This fix covers the case where two or more events are processed but only one
becoming effective. E.g. when mixing a synchronous label request from a LDP
deamon and an asynchronous request from a BGP daemon it could happen to the
BGP having the label chunk, but the LDP stuck waiting for the response.
Given e.g.
ldpd <-------->
(sync label request)
Zebra (label proxy) <--> Zebra (shared label manager)
bgpd <-------->
(async label request)
Sequence:
LDP label request ----->
Zebra (label proxy FW) ----> Zebra (LM)
BGP label request ----->
Zebra (label proxy FW) ----> Zebra (LM)
<---- Zebra (LM) RP LDP
<---- Zebra (LM) RP BGP
Signed-off-by: F. Aragon <paco@voltanet.io>
We don't use th vrf-level VRF_RIB_SCHEDULED flag any longer;
remove it and collapse the zebra_vrf flags' values.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The current code path of registration does this:
a) Lookup or create the rnh
b) register the client with the rnh for callback
If this is a new rnh send a response to the client that
only includes the rnh data that it has (nothing so no path)
If this is a existing rnh send the actual path to the client,
if it exists.
c) If a new client or a flag has changed refigure and send result
to all clients.
This is problematic in that suppose the rnh is new. Clients
will receive two answers:
1) A call back with no nexthops
2) A call back with the resolved # of nexthops
Imagine pim who depends on nht to handle this, pim will create
a mroute( because it does a hard lookup of the rpf as it is registering
the nexthop ), then it will receive the first callback causing
it to tear down the mroute and then receive the second callback
causing it to put it right back.. This is obviously not very
good for mroutes.
This code moves the send to the new client till after the new
client has connected, thus only allowing one callback to the new
client with the actual answer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Routing protocols are allowed ( and even encouraged ) to modify
the flags that influence the nexthop tracking. As such when
we modify the tracking of a nexthop to go from, say, connected force
or not we must re-evaluate the nexthop and send the results
up to the interested parties.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
After we have evaluated the rnh for an import-check type
and we copy the re then we know that the state has changed
and we should be notifying the end user about it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
LSP processing was a zvrf flag based upon a connected route
coming or going. But this did not allow us to know
that we should do lsp processing other than after the meta-queue
processing was finished.
Eventually we moved meta-queue processing of do_nht_processing
to after the dataplane sent the main pthread some results.
This of course left us with a timing hole where if a connected
route came in and we received a data plane response *before*
the meta queue was processed we would not do the work as necessary.
Move the lsp processing to a flag off of the rib_dest_t. If it
is marked then we need to process lsps.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a detailed debugging command for NHT tracking and add
the detailed output to the log about why we make some decisions
that we are. I tried to model this like the rib processing
detailed debugs that we added a few months back.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Currently nexthop tracking is performed for all nexthops that
are being tracked after a group of contexts are passed back
from the data plane for post install processing.
This is inefficient and leaves us sending nexthop tracking
changes at an accelerated pace, when we think we've changed
a route. Additionally every route change will cause us
to relook at all nexthops we are tracking irrelevant if
they are possibly related to the route change or not.
Let's modify the code base to track the rnh's off of the rib
table's rn, `rib_dest_t`. So after we process a node, install
it into the data plane, in rib_process_result we can
look at the `rib_dest_t` associated with the rn and see that
a nexthop depended on this route node. If so, refigure it.
Additionally we will store rnh's that are not resolved on the
0.0.0.0/0 nexthop tracking list. As such when a route node
changes we can quickly walk up the rib tree and notice that
it needs to be reprocessed as well.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a default route_node for our routing tables. This will allow us
to know that we can hang data off the default route for processing.
We will be hanging the nexthop tracking data structures off the rib_dest_t
so that we can know which nexthops we need to handle. Effectively
nexthops that we are tracking that are unresolved will be stored on the
default route. When something changes in the rib tree we can
work up the rn->parent pointer checking for nexthops we need to re-evaluate.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The resolved_route is the prefix we are using in the routing table
to resolve this particular nexthop we are tracking. Add code
to better track it's change.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The prn value as passed in may be NULL as such do not
allow it to be derefed (even though it works now).
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We have several route types KERNEL and CONNECT that are handled via special
case in the code. This was causing a lot of work keeping the two different
classes of route types as special(SYSTEM OR NOT). Put the dplane
in charge of the code that sets the bits for signalling route install/failure.
This greatly simplifies the code calling path and makes all route types
be handled exactly the same. Additionaly code that we want to run
post data plane install can just work as per normal then, instead
of having to know we need to run it when we have a special type
of route.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com.
When we get a route install failure from the kernel, actually
indicate in the rib the status of the routes.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When switching routes from one route type to another actually
unset the old route as enqueued.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When shutting down, the individual vrf's own the shutdown of the table
and subsuquent removal from the routes from the kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When shutting down and we have a very large table to shutdown
and after we've intentionally closed all the client connections
close the zebra zserv client socket.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
It had no logical reason to be in the default VRF. This moves it to the
zebra_router, which is better suited to store global references.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
A lot of checks relied on the VRF ID and the EVPN VRF ID to be the same.
This patch changes those checks to the EVPN_ENABLED macro, which checks
if the VRF is the EVPN one.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
Fix the macros for reading NLA attribute info
from an extended error ack. We were processing the data
using route attributes (rtattr) which is identical in size
to nlattr but probably should not be used.
Further, we were incorrectly calculating the length of the
inner netlink message that cause the error. We have to read
passed that in order to access all the nlattr's.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
For a MAC-IP pair generally local/netlink msg for
MAC is received followed by Neigh. The MAC can be detected as duplicate
during this event.
When a neigh update is received, the neigh inherits DUP flag from its
MAC and along with that mark the neigh as INACTIVE.
Also, In the case of DUP detected neigh, do not update its state
to ACTIVE before determining to send notification to bgpd.
There is a time when Neigh update received prior to MAC update.
In that case neigh is marked as inactive since its MAC is
still in REMOTE state. Once the MAC update is received and
it is detected as DUPLICATE, the neigh would inherit DUP flag
but remained in inactive state.
By fixing the first case, the neigh remains in inactive once
detected as DUPLICATE in both scenarios.
The unfreeze action would mark all inherited neighs to ACTIVE,
and clears DUP flag then sends notification to bgpd (to send type-2).
Ticket:CM-24339
Reviewed By:CCR-8451
Testing Done:
Validated dup detection on both environment where neigh and mac
notification can come as either one first.
With the fix, the neigh was remained in "inactive" state
once detected as duplicate.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The 'sho ip route summary' and 'sho ip route summary <prefix>'
paths used different definitions of a 'fib' route. Use
the route-entry 'INSTALLED' flag in both places.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
This replaces manual checks of the flag with a wrapper macro to convey
the meaning "is evpn enabled on this vrf?"
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
Rename {bgp,zvrf}_def{ault} to {bgp,zvrf}_evpn where it makes sense,
i.e. when they contain the EVPN instance.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
Since the EVPN VRF may not be the default one, compare received
messages' VRF agains the EVPN VRF and not the Default.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
This uses the EPVN VRF to store L3VNIs hashes, and looks up L2VNIs in
this VRF as they are stored there.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
This sends local VNIs and local MAC addresses to the BGP instance
responsible for EVPN rather than the default one.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
Since the EVPN session and underlay can be in a non-default VRF, the
default VRF can be an overlay VRF.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
If the EVPN VRF is not the default one (i.e. with advertise-all-vni),
this allows showing its information with `show bgp l2evpn evpn ...`
commands. They do not require adding `vrf VRFNAME` since we only
support a single EVPN VRF. The same is true for zebra-specific commands
(e.g. `show evpn ...`).
Configuration commands are not restricted to the default VRF but to
the EVPN one, that is to the one bearing `advertise-all-vni`.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
The EVPN VRF is defined by bgpd, and is the one vrf where
`advertise-all-vni` is present.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
Sponsored-by: Scaleway
Duplicate address detection and recovery was relying on the l2-vni backptr
in the neighbor entry which was simply not initialized resulting in
a NULL pointer access in a setup with dup-addressed VMs -
VM1:{IP1,M1} and VM2:{IP1,M2}
Call stack:
(gdb) bt 6
at lib/sigevent.c:249
nbr=nbr@entry=0x559347f901d0, vtep_ip=..., vtep_ip@entry=..., do_dad=do_dad@entry=true,
is_dup_detect=is_dup_detect@entry=0x7ffc7f6be59f, is_local=is_local@entry=true)
at ./lib/ipaddr.h:86
ip=0x7ffc7f6be6f0, ifp=0x559347f901d0, zvni=0x559347f86800) at zebra/zebra_vxlan.c:3152
(More stack frames follow...)
(gdb) p nbr->zvni
$8 = (zebra_vni_t *) 0x0 <<<<<<<<<<<<<<<<<<<<
(gdb)
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
System Routes if received over the netlink bus in a
specific pattern that causes an update operation for that
route in zebra can leave the dest->selected_fib pointer NULL,
while having the ZEBRA_FLAG_SELECTED flag set. Specifically
one way to achieve this is to do this:
`ip addr del 4.5.6.7/32 dev swp1 ; ip addr add 4.5.6.7/32 dev swp1 metric 9`
Why is this a big deal?
Because nexthop tracking is looking at ZEBRA_FLAG_SELECTED to
know if we can use a route, while nexthop active checking uses
dest->selected_fib.
So imagine we have bgp registering a nexthop. nexthop tracking in
the above case will be able to choose the 4.5.6.7/32 route
if that is what the nexthop is, due to the ZEBRA_FLAG_SELECTED being
properly set. BGP then allows the peers connection to come up and we
install routes with a 4.5.6.7 nexthop. The rib processing for route
installation will then look at the 4.5.6.7 route see no
dest->selected_fib and then start walking up the tree to resolve
the route. In our case we could easily hit the default route and be
unable to resolve the route. Which then becomes inactive in the
rib so we never attempt to install it.
This commit fixes this problem because when the rib_process decides
that we need to update the fib( ie replace old w/ new ), the
replacement with new was not setting the `dest->selected_fib` pointer
to the new route_entry, when the route was a system route.
Ticket: CM-24203
Signed-off-by: Donald Sharp <sharpd@cumulusnetworkscom>
The dest->selected_fib should be reported in json output
so that we can debug subtle conditions a bit better in the
future.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Cleaup the rnh tables on shutdown before we cleanup tables. As that
this will remove any need to do rnh processing as part of shutdown.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we get a neighbor entry in zebra we start processing it.
Let's add some additional debugs to the processing so that when
it bails out and we don't use the data, we know the reason.
This should help in debugging the problems from why bgp does
not appear to have data associated with a neighbor entry
in the kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The check for an entry being NUD_PERMANENT has already been done
there is no need to do it twice.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Use const in the accessors for pseudowire nhlfe data; pull
that through the kernel-facing apis that use that data.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
In prep for adding nexthop info for pws, rename the accessor
for the pw destination. Add a nexthop-group to the pw
data in the dataplane module.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The current definition of an unnumberd interface as an interface with a
/32 IPv4 is too restrictive, especially for EVPN symmetric routing since
commit 2b83602b2 "*: Explicitly mark nexthop of EVPN-sourced routes as
onlink".
It removes the bypass check wether the nexthop is an EVPN VTEP, and
relies on the SVI to be unnumberd to bypass the gateway lookup. While
this works great if the SVI has an IP, it might not, and the test falls
flat and EVPN type 5 routes are not installed into the RIB.
Sample interface setup, where vxlan-blue is the L3VNI and br-blue the
SVI:
+----------+
| |
| vrf-blue |
| |
+---+--+---+
| |
+-------+ +-----------+
| |
+----+----+ +---------+---------+
| | | br1 |
| br-blue | | 10.0.0.1/24 |
| | +-+-------+-------+-+
+----+----+ | | |
| | | |
+-----+------+ +-----+--+ +--+---+ +-+----+
| | | | | | | |
| vxlan-blue | | vxlan1 | | eth1 | | eth2 |
| | | | | | | |
+------------+ +--------+ +------+ +------+
For inter-VNI routing, the SVI has no reason to have an IP, but it still
needs type-5 routes from remote VTEPs.
This commit expands the definition of an unnumberd interface to an
interface having a /32 IPv4 or no IPv4 at all.
Signed-off-by: Tuetuopay <tuetuopay@me.com>
When a vrf is deleted we need to tell the zebra_router that we have
finished using the tables we are keeping track of. This will allow
us to properly cleanup the data structures associated with them.
This fixes this valgrind error found:
==8579== Invalid read of size 8
==8579== at 0x430034: zvrf_id (zebra_vrf.h:167)
==8579== by 0x432366: rib_process (zebra_rib.c:1580)
==8579== by 0x432366: process_subq (zebra_rib.c:2092)
==8579== by 0x432366: meta_queue_process (zebra_rib.c:2188)
==8579== by 0x48C99FE: work_queue_run (workqueue.c:291)
==8579== by 0x48C3788: thread_call (thread.c:1607)
==8579== by 0x48A2E9E: frr_run (libfrr.c:1011)
==8579== by 0x41316A: main (main.c:473)
==8579== Address 0x5aeb750 is 0 bytes inside a block of size 4,424 free'd
==8579== at 0x4839A0C: free (vg_replace_malloc.c:540)
==8579== by 0x438914: zebra_vrf_delete (zebra_vrf.c:279)
==8579== by 0x48C4225: vrf_delete (vrf.c:243)
==8579== by 0x48C4225: vrf_delete (vrf.c:217)
==8579== by 0x4151CE: netlink_vrf_change (if_netlink.c:364)
==8579== by 0x416810: netlink_link_change (if_netlink.c:1189)
==8579== by 0x41C1FC: netlink_parse_info (kernel_netlink.c:904)
==8579== by 0x41C2D3: kernel_read (kernel_netlink.c:389)
==8579== by 0x48C3788: thread_call (thread.c:1607)
==8579== by 0x48A2E9E: frr_run (libfrr.c:1011)
==8579== by 0x41316A: main (main.c:473)
==8579== Block was alloc'd at
==8579== at 0x483AB1A: calloc (vg_replace_malloc.c:762)
==8579== by 0x48A6030: qcalloc (memory.c:110)
==8579== by 0x4389EF: zebra_vrf_alloc (zebra_vrf.c:382)
==8579== by 0x438A42: zebra_vrf_new (zebra_vrf.c:93)
==8579== by 0x48C40AD: vrf_get (vrf.c:209)
==8579== by 0x415144: netlink_vrf_change (if_netlink.c:319)
==8579== by 0x415E90: netlink_interface (if_netlink.c:653)
==8579== by 0x41C1FC: netlink_parse_info (kernel_netlink.c:904)
==8579== by 0x4163E8: interface_lookup_netlink (if_netlink.c:760)
==8579== by 0x42BB37: zebra_ns_enable (zebra_ns.c:130)
==8579== by 0x42BC5E: zebra_ns_init (zebra_ns.c:208)
==8579== by 0x4130F4: main (main.c:401)
This can be found by: `ip link del <VRF DEVICE NAME>` then `ip link add <NAME> type vrf table X` again and
then attempting to use the vrf.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we install a new route into the kernel always use
REPLACE. Else if the route is already there it can
be translated into an append with the flags we are
using.
This is especially true for the way we handle pbr
routes as that we are re-installing the same route
entry from pbr at the moment.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ensure that the next hop's VRF is used for IPv4 and IPv6 unicast routes
sourced from EVPN routes, for next hop and Router MAC tracking and
install. This way, leaked routes from other instances are handled properly.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the case of EVPN symmetric routing, the tenant VRF is associated with
a VNI that is used for routing and commonly referred to as the L3 VNI or
VRF VNI. Corresponding to this VNI is a VLAN and its associated L3 (IP)
interface (SVI). Overlay next hops (i.e., next hops for routes in the
tenant VRF) are reachable over this interface. Howver, in the model that
is supported in the implementation and commonly deployed, there is no
explicit Overlay IP address associated with the next hop in the tenant
VRF; the underlay IP is used if (since) the forwarding plane requires
a next hop IP. Therefore, the next hop has to be explicit flagged as
onlink to cause any next hop reachability checks in the forwarding plane
to be skipped.
https://tools.ietf.org/html/draft-ietf-bess-evpn-prefix-advertisement
section 4.4 provides additional description of the above constructs.
Use existing mechanism to specify the nexthops as onlink when installing
these routes from bgpd to zebra and get rid of a special flag that was
introduced for EVPN-sourced routes. Also, use the onlink flag during next
hop validation in zebra and eliminate other special checks.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the case of EVPN symmetric routing, the tenant VRF is associated with
a VNI that is used for routing and commonly referred to as the L3 VNI or
VRF VNI. Corresponding to this VNI is a VLAN and its associated L3 (IP)
interface (SVI). Overlay next hops (i.e., next hops for routes in the
tenant VRF) are reachable over this interface.
https://tools.ietf.org/html/draft-ietf-bess-evpn-prefix-advertisement
section 4.4 provides additional description of the above constructs.
Use the L3 interface exchanged between zebra and bgp in route install.
This patch in conjunction with the earlier one helps to eliminate some
special code in zebra to derive the next hop's interface.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the case of EVPN symmetric routing, the tenant VRF is associated with
a VNI that is used for routing and commonly referred to as the L3 VNI or
VRF VNI. Corresponding to this VNI is a VLAN and its associated L3 (IP)
interface (SVI). Overlay next hops (i.e., next hops for routes in the
tenant VRF) are reachable over this interface.
https://tools.ietf.org/html/draft-ietf-bess-evpn-prefix-advertisement
section 4.4 provides additional description of the above constructs.
The implementation currently derives this L3 interface for EVPN tenant
routes using special code that looks at route flags. This patch
exchanges the L3 interface between zebra and bgpd as part of the L3-VNI
exchange in order to eliminate some this special code.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Running zebra after commit 888756b208
in valgrind produces this item:
==17102== Invalid read of size 8
==17102== at 0x44D84C: rib_dest_from_rnode (rib.h:375)
==17102== by 0x4546ED: rib_process_result (zebra_rib.c:1904)
==17102== by 0x45436D: rib_process_dplane_results (zebra_rib.c:3295)
==17102== by 0x4D0902B: thread_call (thread.c:1607)
==17102== by 0x4CC3983: frr_run (libfrr.c:1011)
==17102== by 0x4266F6: main (main.c:473)
==17102== Address 0x83bd468 is 88 bytes inside a block of size 96 free'd
==17102== at 0x4A35F54: free (vg_replace_malloc.c:530)
==17102== by 0x4CCAC00: qfree (memory.c:129)
==17102== by 0x4D03DC6: route_node_destroy (table.c:501)
==17102== by 0x4D039EE: route_node_free (table.c:90)
==17102== by 0x4D03971: route_node_delete (table.c:382)
==17102== by 0x44D82A: route_unlock_node (table.h:256)
==17102== by 0x454617: rib_process_result (zebra_rib.c:1882)
==17102== by 0x45436D: rib_process_dplane_results (zebra_rib.c:3295)
==17102== by 0x4D0902B: thread_call (thread.c:1607)
==17102== by 0x4CC3983: frr_run (libfrr.c:1011)
==17102== by 0x4266F6: main (main.c:473)
==17102== Block was alloc'd at
==17102== at 0x4A36FF6: calloc (vg_replace_malloc.c:752)
==17102== by 0x4CCAA2D: qcalloc (memory.c:110)
==17102== by 0x4D03D88: route_node_create (table.c:489)
==17102== by 0x4D0360F: route_node_new (table.c:65)
==17102== by 0x4D034F8: route_node_set (table.c:74)
==17102== by 0x4D03486: route_node_get (table.c:327)
==17102== by 0x4CFB700: srcdest_rnode_get (srcdest_table.c:243)
==17102== by 0x4545C1: rib_process_result (zebra_rib.c:1872)
==17102== by 0x45436D: rib_process_dplane_results (zebra_rib.c:3295)
==17102== by 0x4D0902B: thread_call (thread.c:1607)
==17102== by 0x4CC3983: frr_run (libfrr.c:1011)
==17102== by 0x4266F6: main (main.c:473)
==17102==
This is happening because of this order of events:
1) Route is deleted in the main thread and scheduled for rib processing.
2) Rib garbage collection is run and we remove the route node since it
is no longer needed.
3) Data plane returns from the deletion in the kernel and we call
the srcdest_rnode_get function to get the prefix that was deleted.
This recreates a new route node. This creates a route_node with
a lock count of 1, which we freed via the route_unlock_node call.
Then we continued to use the rn pointer. Which leaves us with use
after frees.
The solution is, of course, to just move the unlock the node at the
end of the function if we have a route_node.
Fixes: #3854
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Commit: 6005fe55bc
Introduced a crash with zebra looking up either the
nbr structure or the mac structure. This is because
the zvni used is NULL and we eventually call a hash_lookup
call that would cause a NULL dereference. Partially
revert this commit to original behavior.
Problems found via clang Static Analyzer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
L3VNI keeps reference to svi interface (ifp).
When a netlink change received there is no flag
that mac has changed. Currently simply overwrite
interface's (ifp) hw_addr (MAC) field.
For originating EVPN type-2 and type-5 routes due to VNI
MAC change, comparison is required to check existing MAC
vs. netlink change MAC field.
Ticket:CM-23850
Reviewed By:CCR-8283
Testing Done:
Validate EVPN type-5 routes originated upon changing MAC address
of L3VNI's SVI inteface via ip link set cmd.
checked show bgp l2vpn evpn route and Rmac field contains new
MAC address.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
FRR is reporting that a lsp deletion event as a failure
in the log messsages. This will lead to confusion and
lots of fun debugging.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
CLANG was throwing an error because struct rtadv_rdnss(dnssl) was being
initialized with = {0} instead of {}. Change to be the latter.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When we schedule a packet for future handling, list the packet
type so that we can see what we are getting with debugs.
Also note which client and how many packets we received from that
client.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In Asymmetric and symetric routing scenario in EVPN
where each VTEP pair having different set of addresses
for the SVIs.
This knob allows reachability (ping connectivity) of
SVI IPs and resolve ARP resoultion VTEPs across racks.
This knob should not be used when same SVI IPs configured
on VTEPs across racks or when advertise default gateway
is configured.
Ticket:CM-23782
Testing Done:
Bring up EVPN symmetric routing topology with different
SVI IPs on different VTEPs. Enable advertise svi ip
at each VTEP, remote VTEPs installs arp entry for
SVI IPs via EVPN type-2 route exchange.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
With the data plane changes that were made, we are now running
nexthop tracking 2 times. Once at the end of meta-queue insertion
and once at the end of receiving a bunch of data from the dataplane.
The Addition of the data plane code caused flags to not be set
fully for the resolved routes( since we do not know the answer yet ),
This in turn caused the nexthop tracking run after the meta-queue
to think that the route was not `good`. This would cause it to
tell all interested parties that there was no nexthop.
After the dataplane insertion we are also no running nht code.
This was re-figuring out the nexthop correctly and also
correctly reporting to interested parties that there was a path again.
Example:
donna.cumulusnetworks.com(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, f - failed route
K>* 0.0.0.0/0 [0/103] via 10.50.11.1, enp0s3, 00:06:47
S>* 4.5.6.7/32 [1/0] via 192.168.209.1, enp0s8, 00:04:47
C>* 10.50.11.0/24 is directly connected, enp0s3, 00:06:47
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:06:47
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:06:47
donna.cumulusnetworks.com(config)# ip route 4.5.6.7/32 192.168.210.1
donna.cumulusnetworks.com(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, f - failed route
K>* 0.0.0.0/0 [0/103] via 10.50.11.1, enp0s3, 00:07:06
S>* 4.5.6.7/32 [1/0] via 192.168.209.1, enp0s8, 00:00:04
* via 192.168.210.1, enp0s9, 00:00:04
C>* 10.50.11.0/24 is directly connected, enp0s3, 00:07:06
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:07:06
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:07:06
donna.cumulusnetworks.com(config)#
Log files for sharp, which is watching 4.5.6.7:
2019/02/04 15:20:54.844288 SHARP: Received update for 4.5.6.7/32
2019/02/04 15:20:54.844820 SHARP: Received update for 4.5.6.7/32
2019/02/04 15:20:54.844836 SHARP: Nexthop 192.168.209.1, type: 2, ifindex: 3, vrf: 0, label_num: 0
2019/02/04 15:20:54.844853 SHARP: Nexthop 192.168.210.1, type: 2, ifindex: 4, vrf: 0, label_num: 0
As you can see we have received an update with no nexthops( invalid route )
and a second update immediately after it with 2 nexthops.
What's the big deal you say? Well we have code in other daemons that reacts
to not having a path for a nexthop. In BGP this will cause us to tear
down the peer. In staticd we'll remove the recursively resolved route.
In pim we'll remove all paths to the mroute. This is not desirable.
The fix is to remove the meta-queue run of nexthop tracking.
While running after data plane notice of routes to handle is not ideal
we will be fixing this in the future with the nexthop group code, which
should know what nexthops are affected by a nexthop group change.
Fixed code debug code:
donna.cumulusnetworks.com(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, f - failed route
K>* 0.0.0.0/0 [0/103] via 10.50.11.1, enp0s3, 00:00:46
S>* 4.5.6.7/32 [1/0] via 192.168.209.1, enp0s8, 00:00:02
C>* 10.50.11.0/24 is directly connected, enp0s3, 00:00:46
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:00:46
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:00:46
donna.cumulusnetworks.com(config)# ip route 4.5.6.7/32 192.168.210.1
donna.cumulusnetworks.com(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, f - failed route
K>* 0.0.0.0/0 [0/103] via 10.50.11.1, enp0s3, 00:00:59
S>* 4.5.6.7/32 [1/0] via 192.168.209.1, enp0s8, 00:00:02
* via 192.168.210.1, enp0s9, 00:00:02
C>* 10.50.11.0/24 is directly connected, enp0s3, 00:00:59
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:00:59
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:00:59
2019/02/04 15:26:20.656395 SHARP: Received update for 4.5.6.7/32
2019/02/04 15:26:20.656440 SHARP: Nexthop 192.168.209.1, type: 2, ifindex: 3, vrf: 0, label_num: 0
2019/02/04 15:26:33.688251 SHARP: Received update for 4.5.6.7/32
2019/02/04 15:26:33.688322 SHARP: Nexthop 192.168.209.1, type: 2, ifindex: 3, vrf: 0, label_num: 0
2019/02/04 15:26:33.688329 SHARP: Nexthop 192.168.210.1, type: 2, ifindex: 4, vrf: 0, label_num: 0
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The restricting of data about interfaces was both inconsistent
in application and allowed protocol developers to get into states where
they did not have the expected data about an interface that they
thought that they would. These restrictions and inconsistencies
keep causing bugs that have to be sorted through.
The latest iteration of this bug was that commit:
f20b478ef3
Has caused pim to not receive interface up notifications( but
it knows the interface is back in the vrf and it knows the
relevant ip addresses on the interface as they were changed
as part of an ifdown/ifup cycle ).
Remove this restriction and allow the interface events to
be propagated to all clients.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Made changes and updated the routemap applied counter in the following flows.
1.Increment when route map attached to a list.
2.Decrement when route map removed / modified from a list.
3.Increment/decrement when route map create/delete callback triggered.
4.Besides ,This counter need not be updated when a route map is got updated.
i.e changing/adding a match value to the existing routemap.
In Zebra , same update api called for all three add/delete/update operation.
But this counter have to be updated only for routemap addition.
Addressed this specific change by identifying the routemap operation based
on routemap pointer.
Signed-off-by: RajeshGirada <rgirada@vmware.com>
The zebra.sock data is the listener socket for the zapi protocol.
The rest of the zebra router does not need to see this data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The client_list should be owned by the zebra_router data structure
as that it is part of global state information.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The master thread handler is really part of the zrouter structure.
So let's move it over to that. Eventually zserv.h will only be
used for zapi messages.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we get into rib_process_result and the operation we are handling
is DPLANE_OP_ROUTE_UPDATE *and* the route entry being looked at
is a route replace, we currently have no way to decode to the old_re
and the re due to how we have stored context. As such they are the
same pointer.
As such the route replace for the same route type is causing the re
to set the installed flag and then immediately unset the installed
flag, leaving us in a state where the kernel has the route but
the rib thinks we are not installed.
Since the true old_re( the one being replaced by the update operation )
is going away( as that it zebra deletes the old one for us already )
this fix is not optimal but will get us moving forward.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Read the onlink flag from the kernel for routes and pass them
up and through to zebra so that we are consistent with what
the kernel is telling us.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If we receive a valid message from the kernel that
is either a kernel or system route, we should trust
that the route is legit and just use it.
Old behavior:
K * 172.22.0.0/15 [0/0] via 172.22.2.254, eva_dummy1 inactive, 00:00:16
New Behavior:
K>* 172.22.0.0/15 [0/0] via 172.22.2.254, eva_dummy1, 00:02:35
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The route entry being displayed in debugs was displaying
the originating route type as a number. While numbers
are cool, I for one am not terribly interested in
memorizing them. Modify the (type %d) to a (%s) to
just list the string type of the route.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Apparently 'f' means both OpenFabric and a Failed kernel
route installation.
Let's switch the 'f' for the failed kernel route installation
to 'r - rejected route'.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When the nexthop->type is NEXTHOP_TYPE_IPV4_IFINDEX we
were writing the RTA_PREFSRC 2 times for the build_singlepath
and build_multipath functions.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Some v6 attributes for the netlink_route_build_singlepath
code were being written two times for the NEXTHOP_TYPE_IPV6_IFINDEX
nexthop type.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In extended-mobility case ({IP1, MAC} binding),
when a MAC moves from local to remote, binding
changes to {IP2, MAC}, local neigh (IP1) marked
as inactive in frr.
The evpn draft recommends to probe the entry once
local binding changes from local to remote.
Once the probe is set for the local neigh entry,
kernel will attempt refresh the entry via sending
unicast address resolution message, if host does not
reply, it will mark FAILED state.
For FAILED entry, kernel triggers delete neigh
request, which result in frr to remove inactive entry.
In absence of probing and aging out entry,
if MAC moves back to local with {IP3, MAC},
frr will mark both IP1 and IP3 as active and sends
type-2 update for both.
The IP1 may not be active host and still frr advertises
the route.
Ticket:CM-22864
Testing Done:
Validate the MAC mobilty in extended mobility scenario,
where local inactive entry gets removed once MAC moves
to remote state.
Once probe is set to the local entry, kernel triggers
reachability of the neigh/arp entry, since MAC moved remote,
ARP request goes to remote VTEP where host is not residing,
thus local neigh entry goes to failed state.
Frr receives neighbor delete faster and removes the entry.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
The kernel neigh update api helps update neighbor entry,
using changing state and flags parameters.
Ticket:CM-22864
Reviewed By:
Testing Done:
Signed-off-by:Chirag Shah <chirag@cumulusnetworks.com>
ip rule configuration is being equipped with extra log information for
fwmark information.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The ifa_flags value in the netlink message was originally a uint8_t
value. The linux kernel quickly ran out of 8 bits of data to
pass and the IFA_FLAGS value was added to the netlink message to allow
more than 8 bits of data to be passed. So replace the ifa_flags
with the IFA_FLAGS value if it exists in the interface netlink
message.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The onlink attribute was being passed from upper level protocols
as an attribute of the route *not* the individual nexthop. When
we pass this data to the kernel, we treat the onlink as a attribute
of the nexthop. This commit modifies the code base to allow
us to pass the ONLINK attribute as an attribute of the nexthop.
This commit also fixes static routes that have multiple nexthops
some onlink and some not.
ip route 4.5.6.7/32 192.168.41.1 eveth1 onlink
ip route 4.5.6.7/32 192.168.42.2
S>* 4.5.6.7/32 [1/0] via 192.168.41.1, eveth1 onlink, 00:03:04
* via 192.168.42.2, eveth2, 00:03:04
sharpd@robot ~/frr2> sudo ip netns exec EVA ip route show
4.5.6.7 proto 196 metric 20
nexthop via 192.168.41.1 dev eveth1 weight 1 onlink
nexthop via 192.168.42.2 dev eveth2 weight 1
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we process the dataplane data, keep track of whether or not a route
is in transit or not.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
zebra is using NEXTHOP_FLAG_FIB as the basis of whether or not
a route_entry is installed. This is problematic in that we plan
to separate out nexthop handling from route installation. So modify
the code to keep track of whether or not a route_entry is installed/failed.
This basically means that every place we set/unset NEXTHOP_FLAG_FIB, we
actually also set/unset ROUTE_ENTRY_INSTALLED on the route_entry.
Additionally where we check for route installed via NEXTHOP_FLAG_FIB
switch over to checking if the route think's it is installed.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are selecting nexthops for disply, abstract the notion
of what character we display to the end user about the status
of the nexthop.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we discover that a command given to the route add/delete
function in rt_socket.c is bogus, print out a debug message
but don't attempt to actually use a nexthop that we have not
figured out yet as part of the data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Just return right there, goto's are useful if you have common
code that needs to be cleaned up before exiting this function,
of which this function has none and there is only one goto.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Neigh detected duplicate detected during local update,
upon receiving kernel neigh delete, set neigh inactive
flag so BGPd can install remote route entry if present.
Only if freeze action enabled, local duplicate detected
entry will not be present in BGPd thus marking neigh
inactive is safe. BGPd will simply attempt install
remote entry if present.
Ticket:CM-23438
Testing Done:
Validated MAC-IP pair, trigger mobility of between two
VTEPs, upon local freeze perform neigh delete which
triggers BGP to install remote type-2 route into kernel.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
A MACIP is detected as duplicate and after that
the host continue to move behind different VTEPs results
in local VTEP receiving remote mobility events.
In remote_macip_add, ensure to trigger dad if
MAC is marked as duplicate. In case of freeze
action enabled, is_dup_detect will be set to
avoids installing frozen MAC into kernel.
Ticket:CM-23649
Testing Done:
Configured detection action freeze with detection count
as 7 at DUT and >7 at remote VTEP,
trigger MAC-IP mobility between VTEPs.
once tdetection count reached, MAC detected as duplicate,
post detection move the host to remote. The local VTEP
receives remote macip add and entry is not installed into
kernel with fix.
root@VTEP1:~# net show evpn mac vni 1002 mac aa:aa:aa:aa:aa:aa
MAC: aa:aa:aa:aa:aa:aa
Remote VTEP: 27.0.0.16
Local Seq: 7 Remote Seq: 8
Duplicate, detected at Fri Jan 25 05:03:29 2019
Neighbors:
11.11.11.11 Inactive
Kernel entry still points to LOCAL
root@VTEP1:~# bridge fdb show | grep aa:aa:aa
aa:aa:aa:aa:aa:aa dev hostbond3 vlan 1002 master VxLanA-1
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
In a VRR/VRRP setup we can have connected routes with different costs.
So this change eliminates suppressing metric display for connected routes.
Sample output -
root@TORC11:~# vtysh -c "show ipv6 route vrf vrf1"
Codes: K - kernel route, C - connected, S - static, R - RIPng,
O - OSPFv3, I - IS-IS, B - BGP, N - NHRP, T - Table,
v - VNC, V - VNC-Direct, A - Babel, D - SHARP, F - PBR,
> - selected route, * - FIB route
VRF vrf1:
K * ::/0 [255/8192] unreachable (ICMP unreachable), 00:00:36
C * 2001:aa:1::/64 [0/100] is directly connected, vlan1002-v0, 00:00:36
C>* 2001:aa:1::/64 [0/90] is directly connected, vlan1002, 00:00:36
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
MACVLAN devices are typically used for applications such as VRR/VRRP that
require a second MAC address (virtual). These devices have a corresponding
SVI/VLAN device -
root@TORC11:~# ip addr show vlan1002
39: vlan1002@bridge: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9152 qdisc noqueue master vrf1 state UP group default
link/ether 00:02:00:00:00:2e brd ff:ff:ff:ff:ff:ff
inet6 2001:aa:1::2/64 scope global
valid_lft forever preferred_lft forever
root@TORC11:~# ip addr show vlan1002-v0
40: vlan1002-v0@vlan1002: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9152 qdisc noqueue master vrf1 state UP group default
link/ether 00:00:5e:00:01:01 brd ff:ff:ff:ff:ff:ff
inet6 2001:aa:1::a/64 metric 1024 scope global
valid_lft forever preferred_lft forever
root@TORC11:~#
The macvlan device is used primarily for RX (VR-IP/VR-MAC). And TX is via
the SVI. To acheive that functionality the macvlan network's metric
is set to a higher value.
Zebra currently ignores the devaddr metric sent by the kernel and hardcodes
it to 0. This commit eliminates that hardcoding. If the devaddr metric
is available (METRIC_MAX) it is used for setting up the connected route
otherwise we fallback to the dev/interface metric.
Setting the macvlan metric to a higher value ensures that zebra will always
select the connected route on the SVI (and subsequently use it for next hop
resolution etc.) -
root@TORC11:~# vtysh -c "show ip route vrf vrf1 2001:aa:1::/64"
Routing entry for 2001:aa:1::/64
Known via "connected", distance 0, metric 1024, vrf vrf1
Last update 11:30:56 ago
* directly connected, vlan1002-v0
Routing entry for 2001:aa:1::/64
Known via "connected", distance 0, metric 0, vrf vrf1, best
Last update 11:30:56 ago
* directly connected, vlan1002
root@TORC11:~#
Ticket: CM-23511
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When a local neigh is added with a MAC that is remote or absent the
neigh is kept in zebra as local/in-active. But not propagated to bgpd.
Similarly when an inactive neigh is deleted the del-msg is not propagated
to bgpd.
Without this change bgp and zebra would fall out of sync as that
bgp would not know to rerun bestpath and for it to reinstall a
known remote path for the mac-ip in question. To fix this we
now propagate inactive neigh deletes to bgpd.
Ticket: CM-23018
Testing Done:
1. evpn-min
2. manually triggered the out-of-sync state and verified the fix
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The sequence number used should be unique and increase by 1
for netlink commands. This will allow the code to match
up batched commands to actual requests, so that we can signal
the failure correctly back.
So start the movement and tracking of sequence numbers as
an atomic uint32_t in zebra_router. Modify the dataplane
code to start tracking contexts from this value.
In future commits we will move more of the sequencing
data into using this value.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We were using dplane_ctx_get_status(ctx) and assigning that
value to zebra_dplane_status, not zebra_dplane_result( yeah what? )
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Finish the LSP update code for the async dataplane for
the openbsd platform. Remove synch apis now that we've
converted to the async code path.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Remove the last use of the pre-dataplane LSP update apis;
remove the stubs from the 'null' implementation file.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Start performing LSP updates through the async dataplane
subsystem. This is plumbed through for linux/netlink.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Adding infra to zebra dplane to support LSP updates. Add
kernel api for LSP updates that uses a dataplane context; add
stub apis for netlink, bsd, and 'null' kernel paths. Add
version of netlink mpls update code that takes a dplane
context struct instead of a zebra lsp struct.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add public versions of zebra apis that add NHLFEs to an LSP,
and that free NHLFEs. The dataplane code needs to capture/copy
NHLFEs in order to do async LSP programming.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Move route info to a separate struct and use a union in the
dplane context to hold either route or lsp info. Add
accessors for LSP info.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
IPv6 uses AF_LINK to represent netmasks, this commit unbreaks
`rtm_read_mesg` that was broke on the `rta_get*` refactory.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
(cherry picked from commit 7a163a7c59)
IPv6 netmasks use AF_LINK family type and puts the correct amount of
set bits in the data structure. If we only copy the SDL header we
won't get all IPv6 address length, we must copy the whole extension of
the `sockaddr_in6` struct (which is provided in `destlen` parameter).
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
(cherry picked from commit 691e903879)
Remove two unused functions in `zebra/rt_socket.c`.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
(cherry picked from commit 914fea09d9)
`sockaddr` `len` field is the address type size and not the mask length.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
(cherry picked from commit a6c0003182)
This is mostly to be consistent with the "show ip import-check"
command, which is very similar.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Favor usage of the afi_t enumeration to identify address-families
over using the classic AF_INET[6] constants for that. The choice to
use either of the two seems to be mostly arbitrary throughout our
code base, which leads to confusion and bugs like the one fixed by
commit 6f95d11a1. To address this problem, favor usage of the afi_t
enumeration whenever possible, since 1) it's an enumeration (helps
the compilers to catch some bugs), 2) has a safi_t sibling and 3)
can be used to index static arrays. AF_INET[6] should then be used
only when interfacing with the kernel or external libraries like
libc. The family2afi() and afi2family() functions can be used to
convert between the two different representations back and forth.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
`gate_buf` should be big enough to hold IPv6 addresses and `inet_ntop`
should be run in the correct `sockaddr` struct member.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Debug messages should use `prefix_buf` and `prefix2str` should only be
called once in `kernel_rtm`.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Add a new field in the ZEBRA_CAPABILITIES zapi message specifying
the VRF backend in use.
For simplicity, make the zclient code call vrf_configure_backend()
to apply the received value automatically instead of requiring
the daemons to do that themselves in their zebra_capabilities()
callbacks.
Additionally, call zebra_vrf_update_all() only after sending the
capabilities message to the client, so that it will know which VRF
backend is in use when processing the VRF messages.
This commit fixes a couple of bugs in the "interface" CLI command and
associated northbound callbacks, which behave differently depending
on the VRF backend in use. Before this commit, the vrf_backend
variable would always be set to VRF_BACKEND_NETNS in the client
daemons, even when zebra was started without the --vrfwnetns option.
This could lead to inconsistent behavior and subtle bugs under
specific circumstances.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
We were sending ZEBRA_INTERFACE_LINK_PARAMS messages under the
following circumstances:
* New interface was created (via kernel or config);
* Interface went from down to up;
* Update in the link-params configuration.
Now also send ZEBRA_INTERFACE_LINK_PARAMS messages whenever a zclient
connects and sends a ZEBRA_INTERFACE_ADD request. Without this fix,
the client daemons don't receive interface link parameters if they
are configured in the zebra startup configuration.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
client->ifinfo is a VRF bitmap, hence we need to use
vrf_bitmap_check() to check if a client is subscribed to receive
interface information for a particular VRF. Just checking if
the client->ifinfo value is set will always succeed since it's
a pointer initialized by zserv_client_create(). With this fix,
we'll stop sending interface messages from all VRFs to all clients,
even those that didn't subscribe to it.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Routes without nexthops don't make any sense, so we need to reject
them otherwise weird things can happen.
NOTE: blackhole routes aren't nexthop-less, they do have a single
nexthop of type NEXTHOP_TYPE_BLACKHOLE.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Some daemons like ospfd and isisd have the ability to advertise a
default route to their peers only if one exists in the RIB. This
is what the "default-information originate" commands do when used
without the "always" parameter.
For that to work, these daemons use the ZEBRA_REDISTRIBUTE_DEFAULT_ADD
message to request default route information to zebra. The problem
is that this message didn't have an AFI parameter, so a default route
from any address-family would satisfy the requests from both daemons
(e.g. ::/0 would trigger ospfd to advertise a default route to its
peers, and 0.0.0.0/0 would trigger isisd to advertise a default route
to its IPv6 peers).
Fix this by adding an AFI parameter to the
ZEBRA_REDISTRIBUTE_DEFAULT_{ADD,DELETE} messages and making the
corresponding code changes.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Future commits are going to introduce more rigor in
state setting in the case of received results from
the data plane. So let us move the DPLANE_OP_ROUTE_DELETE
state check to the same spot as the rest of the code that
is handling a particular operation.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Modify the status flag from 8 bits to 32 bits and to add
a few new flags that will be used in future commits.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Modify the meta_queue insertion such that we only enqueue
the route_node into one meta_queue instead of several.
Suppose we have multiple route_entries associated with
a particular node from rip, bgp, staticd. If we receive a
route update from rip, we would enqueue the route_node into
the 1, 2, 3 meta-nodes. Which means that we would run
the entire process of figuring out a route 3 times, while
nothing would change the second two times.
Modify the code to choose the lowest meta-queue and
install it into that one for processing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When a dataplane provider/plugin registers, return the new
handle/object - that's needed to use some provider apis
later on.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Pass lists of results back to zebra from the dataplane subsystem
(and pthread). This helps reduce the lock/unlock cycles when
zebra is busy. Also remove a couple of typedefs that made their
way into the dataplane header file - those violate the FRR style
guidelines.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
if the default vrf name is manually set, by passing -o parameter to
zebra, then this should be detected when walking the list of netns
available in the system. If a netns called vrf0 is present, then it
should be ignored.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when zebra is run, by using vrf netns backend mode, then the parser
detector of netns is run before forcing the default vrf to a possible
value. In that case, there is a possibility that the forced '-o' option
will create a second vrf with same name, whereas this option should be
there to uniquely have a default vrf with a value.
To make things consistent, the forced value will be priorised. Then, the
notifier will attempt to create vrf contexts. The expectation is that
the creation will fail, due to an already present vrf with same name.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When an empty netmask a wrong end size is calculated, lets handle this
corner case to avoid spurious warning messages.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Handle corner case where a warning log message is issued on interface
address netmask handling with sockaddr type AF_LINK: it may come empty
or with match all (all 0xFF).
In the first case all lengths are zero and we only need to copy the
first bytes, second case it comes with a zero index and all 0xFF bytes.
In any case we only need to figure out a few of the first bytes instead
of all data.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
When porting routing socket macro data handling to functions, the
attribute function was forgotten. The only difference between the
attribute and address handler is the family type check.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
"brief" output for "show interface" helps when we have to quickly check
important information like ip address, vrf etc. This prints
information in the easy to read tabular format. Currently it prints oper
status, ifname, vrf, ipv4 and ipv6 addresses.
Ticket: CM-9109
Signed-off-by: Nitin Soni <nsoni@cumulusnetworks.com>
For neigh check duplicate flag as it can be inherited from
duplicate detected MAC (count could be 0).
Ticket:CM-23316
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Below are cases where EVPN duplicate detection
Freeze and Unfreeze required fixes:
Auto recovery needs to check neighbor's duplicate flag
to take action, as neigh could be marked duplicate
via inherited from MAC where IP detection count could be 0.
MAC duplicate detection needs to set flag to true
if freeze action is configured.
Local MAC add update should not send update to bgp
if MAC is in frozen state.
Remote MAC-IP update should not process neigh update if MAC
is detected as duplicate during remote update.
Ticket:CM-23344
Testing Done:
Trigger duplicate detection via both local and remote update trigger,
Validate clear command and other changes expected behavior.
Auto-recovery takes appropriate action on inherited IPs.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Add the ability to retrieve the current role of mlag for this machine.
If mlag is not setup we will always return MLAG_ROLE_NONE.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The zebra_delete_rnh function is not needed to be exposed
to the entire world. Limit it's scope.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The deletion of a rnh is always proceeded by the same checks
to see if it is done. Just let zebra_delete_rnh do this test.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we call zebra_vrf_table_create, we've already created the info
pointer in zebra_router_get_table, so properly set the info->safi
and just store the zvrf->table[afi][safi] value.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When handling events from /var/run/netns folder, if several netns are
removed at the same time, only the first one is deleted in the frr. Fix
this behaviour by applying continue in the loop.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Duplicate address detection should operate
at default vrf instance.
For mac and neigh show command, auto recovery and few places
where tanent vrf_id used for zvrf instead use default
vrf instance. Use vxlan_if's or VRF_DEFAULT vrf_id to
fetch zebra's default vrf instance.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
zebra uses the SIOCETHTOOL ioctl with the ETHTOOL_GSET command to
fetch the speed of interfaces from the kernel. The only problem is
that ETHTOOL_GSET returns EOPNOTSUPP when the given interface is a
virtual interface. This leads to zebra emitting warnings like this
at startup:
ZEBRA: IOCTL failure to read interface lo speed: 95 Operation not supported
ZEBRA: IOCTL failure to read interface dummy0 speed: 95 Operation not supported
ZEBRA: IOCTL failure to read interface ovs-system speed: 95 Operation not supported
Silence these warnings by ignoring EOPNOTSUPP errors, since we know
they are harmless. This is similar to how we handle EINVAL errors
from the BSD SIOCGIFMEDIA ioctl (commit c69f2c1ff).
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Unlike the other interface zapi messages, ZEBRA_INTERFACE_VRF_UPDATE
identifies interfaces using ifindexes and not interface names. This
is a problem because zebra always sends ZEBRA_INTERFACE_DOWN
and ZEBRA_INTERFACE_DELETE messages before sending
ZEBRA_INTERFACE_VRF_UPDATE, and the ZEBRA_INTERFACE_DELETE callback
from all daemons set the interface index to IFINDEX_INTERNAL. Hence,
when decoding a ZEBRA_INTERFACE_VRF_UPDATE message, the interface
lookup would always fail since the corresponding interface lost
its ifindex. Example (ospfd):
OSPF: Zebra: Interface[rt1-eth2] state change to down.
OSPF: Zebra: interface delete rt1-eth2 vrf default[0] index 8 flags 11143 metric 0 mtu 1500
OSPF: [EC 100663301] INTERFACE_VRF_UPDATE: Cannot find IF 8 in VRF 0
To fix this problem, use interface names instead of ifindexes to
indentify interfaces like the other interface zapi messages do.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The route_info data structure already had a mapping of route type
to admin distance. Consolidate the meta_queue_map information
into this route_info data structure. This is to reduce the number
of places we need to remember to touch when adding a new routing
protocol.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
An EVPN type-2 entry is in freeze state during remote update,
remote VTEP can send typ-2 withdraw update,
upon receiving an entry delete (withdraw), first check
kernel has in local reachable state. Upon
unfreeze use the local entry to advertise to peers.
Fetch is for both MAC and IP, delete can come for
only MAC or MAC-IP combined route.
The specific entry fetch only required request flag to be set,
dump flag is not required.
Testing Done:
Simulate two VTEPs to do M1, IP1 mobility sequence,
freeze MAC during remote MAC update, subsequently send
withdraw type-2 route from origintating VTEP.
This results in read apis to invoke for local reachable entry.
Zebra updates its cache and upon unfreeze originates type-2.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Make netlink_request api generic where it can be used
for dump or querying specific information request.
nelink request nlm flags (NLM_F_ROOT | NLM_F_MATCH) are
used to dump purpose, if client wants to query spcific
MAC or IP using netlink_request does not require to set
them.
nlm struct is passed by the caller of netlink_request,
it can also set the nlm request flags.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
This commit is the last missing piece to complete BGP LU support in bgpd. To this moment, bgpd (and zebra) supported auto label assignment only for prefixes leaked from VRFs to vpn and for MPLS SR prefixes. This adds auto label assignment to other routes types in bgpd. The following enhancements have been made:
* bgp_route.c:bgp_process_main_one() now sets implicit-null local_label to all local, aggregate and redistributed routes.
* bgp_route.c:bgp_process_main_one() now will request a label from the label pool for any prefix that loses the label for some reason (for example, when the static label assignment config is removed)
* bgp_label.c:bgp_reg_dereg_for_label() now requests labels from label pool for routes which have no associated label index
* zebra_mpls.c:zebra_mpls_fec_register() now expects both label and label_index from the calling function, one of which must be set to MPLS_INVALID_LABEL or MPLS_INVALID_LABEL_INDEX, based on this it will decide how to register the provided FEC.
Signed-off-by: Anton Degtyarev <anton@cumulusnetworks.com>
Reduce the zebra rib workqueue retry timeout, used when the queue
towards the zebra dataplane has reached its limit. Lowering the
value was reported to improve update throughput on some platforms.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The label processing for socket installs was not ensuring
that each nexthop would not accidently use the last
nexthops value.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The test we were using to ensure that a mask was sent in
is a bit redundant, let's just always send it in.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The ADD/DELETE messages are the only ones we support, so leave
early from the function, in other words don't check it every
nexthop loop.
Additionally nexthops only care about non recursive active flags.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
I'm going to rearrage the kernel_rtm_ipv4 and v6 functions
so the sin6_masklen needs to be moved a bit earlier.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The write function converted to v4 and v6 functions to a union sockunion
via casting. Just use `union sockunion` instead.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Allow the ns deletion event to happen *after* the data validity
checks.
Please note this probably still leaves a weird hole if we receive
multiple namespace events ( as the for loop implies ). We will
stop handling anything after a namespace deletion notification.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
the default vrf name was hardset to "Default", whereas the default vrf
name could have been configured in an other manner. Fix this
inconsistency.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
the l3vni structure is allocated only once, since that structure is only
used for default netns. For that, move the initialisation part is moved
to a proper place, where there is no risk of attempting to initialise it
more than once, even when vrf backend is netns.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When a route removal failure happens return to the installing
protocol that the route deletion failed.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the zebra rib processing workqueue, set a small timeout
so that we will wait a short time if the queue into the
async dataplane is full. This helps avoid a situation where
the zebra main pthread constantly retries rib work without
giving the dataplane pthread a chance to make progress.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
NEXTHOP_FLAG_ACTIVE currently means that the nexthop is considered
good enough to be installed. With current ecmp restrictions this
translation from multipath_num is enforced in the data plane.
The problem with this is of course that every data plane now
becomes concerned about the multipath num and must enforce it
independently. Currently *bsd does not honor multipath_num at
all and linux marks all nexthops as being installed even when
it honors a multipath_num that is less than the total.
This code change moves the multipath_num enforcement from a dataplane
decision to a zebra nexthop decision. Thus dataplanes now can
just install those nexthops marked as NEXTHOP_FLAG_ACTIVE
without having to worry about multipath_num.
*BSD will now respect multipath_num and Linux now properly notes
which routes are actually installed or not:
sharpd@donna ~/f/t/topotests> ps -ef | grep frr
frr 6261 1556 0 09:12 ? 00:00:00 /usr/lib/frr/zebra -e 2 --daemon -A 127.0.0.1
frr 6279 1556 0 09:12 ? 00:00:00 /usr/lib/frr/staticd --daemon -A 127.0.0.1
donna.cumulusnetworks.com(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route
K>* 0.0.0.0/0 [0/106] via 10.0.2.2, enp0s3, 00:00:45
S>* 4.4.4.4/32 [1/0] via 10.0.2.1, enp0s3, 00:00:02
* via 192.168.209.1, enp0s8, 00:00:02
via 192.168.210.1, enp0s9 inactive, 00:00:02
C>* 10.0.2.0/24 is directly connected, enp0s3, 00:00:45
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:00:45
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:00:45
donna.cumulusnetworks.com(config)#
sharpd@donna ~/f/t/topotests> ip route show
default via 10.0.2.2 dev enp0s3 proto dhcp metric 106
4.4.4.4 proto 196 metric 20
nexthop via 10.0.2.1 dev enp0s3 weight 1
nexthop via 192.168.209.1 dev enp0s8 weight 1
10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.15 metric 106
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
192.168.209.0/24 dev enp0s8 proto kernel scope link src 192.168.209.2 metric 105
192.168.210.0/24 dev enp0s9 proto kernel scope link src 192.168.210.2 metric 103
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Stop creating individual, one-time events as each batch of
incoming zserv/zapi messages is processed - use a singleton
event so that the incoming message activity is more fair if
the zebra main pthread has other events to run.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
We never used this information and it was merely stored.
Additionally this is not something that is a flag, it's
a status.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Make the v4 and v6 code paths for rib_XXX calls in kernel_socket
as similiar as we can possibly make them. There is no need
for code duplication at this point in time.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The rib_lookup_ipv4_route function is only used in a debug path.
Is only used for v4 and only checks to make sure that the rib
and fib are in sync( which is not needed/used/supported on other
platforms ). So let's just remove it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
For nexthop handling use the actual resolved nexthop.
Nexthops are stored as a `special` list:
Suppose we have 3 way ecmp A, B, C:
nhop A -> resolves to nhop D
|
nhop B
|
nhop C -> resolves to nhop E
A and C are typically NEXTHOP_TYPE_IPV4( or 6 ) if they recursively resolve
We do not necessarily store the ifindex that this resolves to.
Current nexthop code only loops over A,B and C and uses those for
the zebra_rnh.c handling. So interested parties might receive non-fully
resolved nexthops( and they assume they are! ).
Let's convert the looping to go over all nexthops and only deal with
the resolved ones, so we will look at and use D,B and E.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The `show ip route A.B.C.D json` command was only displaying
the last route entry looked at and we would drop the data
associated with other route entries. This fixes the issue:
robot# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route
K>* 0.0.0.0/0 [0/100] via 192.168.201.1, enp3s0, 00:13:31
C>* 4.50.50.50/32 is directly connected, lo, 00:13:31
D 10.0.0.1/32 [150/0] via 192.168.201.1, enp3s0, 00:09:46
S>* 10.0.0.1/32 [1/0] via 192.168.201.1, enp3s0, 00:10:04
C>* 192.168.201.0/24 is directly connected, enp3s0, 00:13:31
robot# show ip route 10.0.0.1 json
{
"10.0.0.1\/32":[
{
"prefix":"10.0.0.1\/32",
"protocol":"sharp",
"distance":150,
"metric":0,
"internalStatus":0,
"internalFlags":1,
"uptime":"00:09:50",
"nexthops":[
{
"flags":1,
"ip":"192.168.201.1",
"afi":"ipv4",
"interfaceIndex":2,
"interfaceName":"enp3s0",
"active":true
}
]
},
{
"prefix":"10.0.0.1\/32",
"protocol":"static",
"selected":true,
"distance":1,
"metric":0,
"internalStatus":0,
"internalFlags":2064,
"uptime":"00:10:08",
"nexthops":[
{
"flags":3,
"fib":true,
"ip":"192.168.201.1",
"afi":"ipv4",
"interfaceIndex":2,
"interfaceName":"enp3s0",
"active":true
}
]
}
]
}
robot#
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When using `SIOCGIFMEDIA` check for `EINVAL`, otherwise we might print
an error message on an unsupported interface.
FreeBSD source code reference:
https://github.com/freebsd/freebsd/blob/master/sys/net/if_media.c#L300
And:
8cb4b0c018/usr.sbin/rtsold/if.c (L211)
/*
* EINVAL simply means that the interface does not support
* the SIOCGIFMEDIA ioctl. We regard it alive.
*/
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Some address types were not being skipped triggering a warning log
message, so lets refactor this code to properly handle known and unknown
types.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Move the declaration of ROUNDUP and ROUND_TYPE to outside of
`ifdef SA_SIZE`. We'll use these definitions in the next commit.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Clear dup address vni needs to return non-zero value
in case of command is not successful.
Ticket:CM-23122
Testing Done:
run clear command and check upon failure return code is non-zero.
root@TORS1:~# vtysh -c "clear evpn dup-addr vni 1000 ip 45.0.1.26"
% Requested IP's associated MAC 00:01:02:03:04:05 is still in duplicate
% state
root@TORS1:~# echo $?
1
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Problem reported that kernel neighbor entries could end up in "FAILED"
state when the neighbor entry was deleted. This fix handles the
notification of the event from netlink messages and re-inserts the
deleted entry.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Always resend the nexthop information when we get a registration
event. Multiple daemons expect this information.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com.