When attempting to limit the amount of data sent from the kernel
to FRR, some kernels we can run against may not have this ability
in which case the setsockopt will fail. Notice that in the log.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This problem was reported by the sanitizer -
=================================================================
==24764==ERROR: AddressSanitizer: heap-use-after-free on address 0x60d0000115c8 at pc 0x55cb9cfad312 bp 0x7fffa0552140 sp 0x7fffa0552138
READ of size 8 at 0x60d0000115c8 thread T0
#0 0x55cb9cfad311 in zebra_evpn_remote_es_flush zebra/zebra_evpn_mh.c:2041
#1 0x55cb9cfad311 in zebra_evpn_es_cleanup zebra/zebra_evpn_mh.c:2234
#2 0x55cb9cf6ae78 in zebra_vrf_disable zebra/zebra_vrf.c:205
#3 0x7fc8d478f114 in vrf_delete lib/vrf.c:229
#4 0x7fc8d478f99a in vrf_terminate lib/vrf.c:541
#5 0x55cb9ceba0af in sigint zebra/main.c:176
#6 0x55cb9ceba0af in sigint zebra/main.c:130
#7 0x7fc8d4765d20 in quagga_sigevent_process lib/sigevent.c:103
#8 0x7fc8d4787e8c in thread_fetch lib/thread.c:1396
#9 0x7fc8d4708782 in frr_run lib/libfrr.c:1092
#10 0x55cb9ce931d8 in main zebra/main.c:488
#11 0x7fc8d43ee09a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2409a)
#12 0x55cb9ce94c09 in _start (/usr/lib/frr/zebra+0x8ac09)
=================================================================
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
The read/write mlag buffer sizes of 2k were sufficient
for ~100 S,G notifications at one go. Increase to 32k
to give us 16 times the space.
Ticket: CM-31576
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
If we receive a message that is greater than our buffer
size we are in a situation where both the read and write
buffers are fubar'ed beyond the end. Assert when we notice
this fact.
Ticket: CM-31576
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The normal pattern of writing the type/length at the beginning
of the packet was not being quite followed. Modify the mlag
code to respect the proper way of doing things and get rid
of a stream_new and copy.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The neigh hold timer was firing after the neigh was deleted resulting
in the following crash -
[
at ./zebra/zebra_evpn_neigh.h:155
at zebra/zebra_evpn_neigh.c:447
at lib/thread.c:1578
at zebra/main.c:488
]
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Found that the command "evpn mh neigh-holdtime" can be set but
not deleted. This fix solves the delete process
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
When an ES peer withdraws a MAC-IP route we hold the entry for N seconds
to allow an external daemon (neighmgr) to establish host reachability
independent of the peer. Add config commands to allow the user to set
this holdtime (N).
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Let's not make the entire `depend_finds` function pay
for the data gathering needed for the debug. There
are numerous other places in the code that check
the NEXTHOP_FLAG_RECURSIVE and do the same output.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The linux kernel is getting RTM_F_TRAP and RTM_F_OFFLOAD for
kernel routes that have an underlying asic offload. Write the
code to receive these notifications from the linux kernel and
to store that data for display about the routes.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Some linux kernels are starting to support the idea of knowledge
about the underlying asic. Add a boolean that we can set/unset
to track whether or not we think the router has this functionality
available.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The Solaris code has gone through a deprecation cycle. No-one
has said anything to us and worse of all we don't have any test
systems running Solaris to know if we are making changes that
are breaking on Solaris. Remove it from the system so
we can clean up a bit.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Code was added in the past to support a value of VRF_DEFAULT different
from 0. This option was abandoned, the default vrf id is always 0.
Remove this code, this will simplify the code and improve performance
(use a constant value instead of a function that performs tests).
Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
In all outputs (text and json): simplify and optimize the vrf name
display, use the vrf_id_to_name() handler.
Note: vrf_id_to_name() has a safeguard system that prevents from
crashing when the vrf cannot be found because it changed in some
(unexpected) manner, it returns "n/a".
Note: "vrf n/a" will now be displayed instead of "vrf UNKNOWN" in this
case, like in most other frr components.
This safeguard was missing for show ip route json, so this
optimization also fixes a potential crash.
Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
Variable "show ip route" commands invoke the same helper
(do_show_ip_route), potentially several times.
When asking to dump a non-default vrf, all vrfs or all tables, the
output is messy, the header summarizing abbreviations is repeated
several times, excess line feeds appear, the default table of default
VRF is concatenated to the previous table output...
Normalize the output:
- whatever the case, display the common header at most once, if there
is at least an entry to dump.
- when using a "vrf all" or "table all" command, prepend a line with
the VRF and table (even for the default vrf or table).
- when dumping a specific vrf or table, prepend a line with the VRF
and table.
Example (vrf all)
=================
router# show ip route vrf all
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
VRF main:
C>* 10.0.2.0/24 is directly connected, mgmt0, 00:24:09
K>* 10.0.2.2/32 [0/100] is directly connected, mgmt0, 00:24:09
C>* 10.125.0.0/24 is directly connected, ntfp2, 00:00:26
VRF private:
S>* 1.1.1.0/24 [1/0] via 10.125.0.2, loop0, 00:00:29
C>* 10.125.0.0/24 is directly connected, loop0, 00:00:42
Example (main vrf)
==================
router# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
C>* 10.0.2.0/24 is directly connected, mgmt0, 00:24:41
K>* 10.0.2.2/32 [0/100] is directly connected, mgmt0, 00:24:41
C>* 10.125.0.0/24 is directly connected, ntfp2, 00:00:58
Example (specific vrf)
======================
router# show ip route vrf private
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
VRF private:
S>* 1.1.1.0/24 [1/0] via 10.125.0.2, loop0, 00:01:23
C>* 10.125.0.0/24 is directly connected, loop0, 00:01:36
Example (all tables)
====================
router# show ip route table all
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
VRF main table 200:
S>* 4.4.4.4/32 [1/0] via 10.125.0.3, ntfp2, 00:01:51
VRF main table 254:
C>* 10.0.2.0/24 is directly connected, mgmt0, 00:25:34
K>* 10.0.2.2/32 [0/100] is directly connected, mgmt0, 00:25:34
C>* 10.125.0.0/24 is directly connected, ntfp2, 00:01:51
Example (all vrf, all table)
============================
router# show ip route table all vrf all
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
VRF main table 200:
S>* 4.4.4.4/32 [1/0] via 10.125.0.3, ntfp2, 00:02:15
VRF main table 254:
C>* 10.0.2.0/24 is directly connected, mgmt0, 00:25:58
K>* 10.0.2.2/32 [0/100] is directly connected, mgmt0, 00:25:58
C>* 10.125.0.0/24 is directly connected, ntfp2, 00:02:15
VRF private table 200:
S>* 2.2.2.0/24 [1/0] via 10.125.0.2, loop0, 00:02:18
VRF private table 254:
S>* 1.1.1.0/24 [1/0] via 10.125.0.2, loop0, 00:02:18
C>* 10.125.0.0/24 is directly connected, loop0, 00:02:31
Example (specific table)
========================
router# show ip route table 200
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
VRF main table 200:
S>* 4.4.4.4/32 [1/0] via 10.125.0.3, ntfp2, 00:05:26
Signed-off-by: Christophe Gouault <christophe.gouault@6wind.com>
This series of events:
$ sudo ifconfig lo0 add 4.4.4.4/32
$ sudo ifconfig lo0 inet 4.4.4.4/32 delete
would end up leaving the 4.4.4.4/32 address on the interface under
freebsd.
This all boils down to the fact that the interface is not
considered connected yet we have a destination. If the
destination is the same and we are not connected ignore
it on freebsd.
I am sure there are other fun scenarios that someone
will have to squirrel out.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Problem commit -
[
b169fd6fd5 zebra: support for MAC-IP sync routes
]
That commit had accidentally replaced a mac-ip del to bgp with a mac
del (consequence of a bad cut-paste).
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Changes to setup peer-synced as static in the dataplane. This prevents
them from being flushed out when the local switch cannot establish
their reachability.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
As a part of the re-factoring some of the evpn_vni_es apis got re-named
as evpn_evpn_es. Changed them to evpn_es_evi to make it common to
vxlan and mpls.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When a MAC is detected duplicate on a local
learn event (with freeze action),
do not send update to bgp to advertise into
evpn control plane.
With evpn mh, inform_client flag is set and
sends notification to bgp albeit dup detect
is set.
Check mac are detected as duplicate before
setting inform_client to true.
Ticket:CM-29817
Reviewed By:CCR-10329
Testing Done:
Enable DAD with freeze action
Upon local learn MAC detected as duplica
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
When installing rules pass by the interface name across
zapi.
This is being changed because we have a situation where
if you quickly create/destroy ephermeal interfaces under
linux the upper level protocol may be trying to add
a rule for a interface that does not quite exist
at the moment. Since ip rules actually want the
interface name ( to handle just this sort of situation )
convert over to passing the interface name and storing
it and using it in zebra.
Ticket: CM-31042
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
this is used when parsing the newly network namespaces. actually, to
track the link of some interfaces like vxlan interfaces, both link index
and link nsid are necessary. if a vxlan interface is moved to a new
netns, the link information is in the default network namespace, then
LINK_NSID is the value of the netns by default in the new netns. That
value of the default netns in the new netns is not known, because the
system does not automatically assign an NSID of default network
namespace in the new netns. Now a new NSID of default netns, seen from
that new netns, is created. This permits to store at netns creation the
default netns relative value for further usage.
Because the default netns value is set from the new netns perspective,
it is not needed anymore to use the NETNSA_TARGET_NSID attribute only
available in recent kernels.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
the walk routine is used by vxlan service to identify some contexts in
each specific network namespace, when vrf netns backend is used. that
walk mechanism is extended with some additional paramters to the walk
routine.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when duplicate address detection is observed, some incrementation,
some timing mechanisms need to be done. For that the main evpn
configuration is retrieved. Until now, the VRF that was storing the dad
config parameters was the same VRF that hosted the VXLAN interface. With
netns backend, this is not true, as the VXLAN interface is in the
same VRF as the bridge interface. The modification takes same definition
as in BGP, that is to say that there is a single bgp evpn instance, and
this is that instance that will give the correct config settings.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
this change is needed when a MAC/IP entry is learned by zebra, and the
entry happens to be in a different namespace. So that the entry be
active, the correct vni match has to be found.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
1. MAC ref of a zero ESI was accidentally creating a new ES with zero
ES id.
2. When an ES was deleted and re-added the ES was not being sent to BGP
because of a stale flag that suppressed the update as a dup.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
When we get a rib deletion event and we already have
that particular route node in the queue to be reprocessed,
just note that someone from kernel land has done us dirty
and allow it to be cleaned up by normal processing
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Imagine a situation where a interface is bouncing up/down.
The interface comes up and daemons like pbr will get a nht
tracking callback for a connected interface up and will install
the routes down to zebra. At this same time the interface can
go down. But since zebra is busy handling route changes ( from pbr )
it has not read the netlink message and can get into a situation
where the route resolves properly and then we attempt to install
it into the kernel( which is rejected ). If the interface
bounces back up fast at this point, the down then up netlink
message will be read and create two route entries off the connected
route node. Zebra will then enqueue both route entries for future processing.
After this processing happens the down/up is collapsed into an up
and nexthop tracking sees no changes and does not inform any upper
level protocol( in this case pbr ) that nexthop tracking has changed.
So pbr still believes the nexthops are good but the routes are not
installed since pbr has taken no action.
Fix this by immediately running rnh when we signal a connected
route entry is scheduled for removal. This should cause
upper level protocols to get a rnh notification for the small
amount of time that the connected route was bouncing around like
a madman.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
It was wrongly assumed that the kernel is replying in batches when multiple
requests fail. The kernel sends one error message at a time, so we can
simply keep reading data from the socket as long as possible.
Signed-off-by: Jakub Urbańczyk <xthaid@gmail.com>