This includes -
1. Non-DF block filter
2. Per-access-port list of ES peers that need to be blocked (for
split-horizon filtering)
3. Backup nexthop group to fail over local-ES traffic via the VXLAN overlay
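The first two items together decide whether a BUM packet that arrives over the overlay may leave through a local ES access port. The sketch below shows that decision in C. It assumes a simplified per-ES record holding the DF result and the ES-peer VTEP list. The names `es_info` and `decap_bum_allowed` are hypothetical and are not FRR or kernel data structures. The backup nexthop group (item 3) is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-access-port ES state (not FRR's real structures). */
struct es_info {
	bool local_is_df;           /* result of the DF election for this ES */
	const uint32_t *peer_vteps; /* IPv4 addresses of ES peers, host order */
	size_t n_peers;
};

/* Returns true if a BUM packet decapsulated from src_vtep may be sent
 * out of the access port attached to es. */
static bool decap_bum_allowed(const struct es_info *es, uint32_t src_vtep)
{
	/* Non-DF block filter: only the DF delivers overlay BUM to the ES. */
	if (!es->local_is_df)
		return false;

	/* Split-horizon filter: an ES peer has already delivered this packet
	 * on its own link to the same ES, so do not send a second copy. */
	for (size_t i = 0; i < es->n_peers; i++)
		if (es->peer_vteps[i] == src_vtep)
			return false;

	return true;
}
```

In the sketch, packets from VTEPs outside the peer list reach the ES only when the local switch is DF. Packets from an ES peer are always dropped.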
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
1. DF preference is configurable per-ES
!
interface hostbond1
evpn mh es-df-pref 100 >>>>>>>>>>>
evpn mh es-id 1
evpn mh es-sys-mac 00:00:00:00:01:11
!
2. This parameter is sent to BGP and advertised via the ESR (the type-4
Ethernet Segment route).
3. The DF parameters of the ES peers are sent to zebra (by BGP) and used
for running the DF election.
4. If the local VTEP becomes non-DF on an ES, a block filter is
programmed in the dataplane to drop decapsulated BUM packets
destined to that ES.
Sample output
=============
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh evpn es
Type: L local, R remote, N non-DF
ESI Type ES-IF VTEPs
03:00:00:00:00:01:11:00:00:01 LRN hostbond1 27.0.0.16
03:00:00:00:00:01:22:00:00:02 LR hostbond2 27.0.0.16
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh evpn es 03:00:00:00:00:01:11:00:00:01
ESI: 03:00:00:00:00:01:11:00:00:01
Type: Local,Remote
Interface: hostbond1
State: up
Ready for BGP: yes
VNI Count: 10
MAC Count: 2
DF: status: non-df preference: 100 >>>>>>>>
Nexthop group: 0x2000001
VTEPs:
27.0.0.16 df_alg: preference df_pref: 32767 nh: 0x100000d >>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
DF (designated forwarder) election is used to pick a single
BUM-traffic forwarder per ES. RFC 7432 specifies a mechanism called
service carving for DF election. However, that mechanism has several
disadvantages -
1. It load-balances poorly.
2. It doesn't allow for the controlled failover needed in upgrade
scenarios.
3. It is not easy to hardware-accelerate.
To address these shortcomings of service carving, alternate DF mechanisms
have been proposed via the following drafts -
draft-ietf-bess-evpn-df-election-framework
draft-ietf-bess-evpn-pref-df
This commit adds support for the pref-df election mechanism, which
is used as the default. Other mechanisms, including service carving,
may be added later.
In this mechanism, one switch on an ES is elected as DF based on the
preference value. The higher preference wins, and the IP address acts
as the tie-breaker (the lower IP wins if the preference values are equal).
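The election rule above can be sketched in a few lines of C. `df_candidate` and `pref_df_elect` are hypothetical names. The sketch ignores every ES-route field except the VTEP IP and the preference. Using the sample values below (local 27.0.0.15 with preference 100, peer 27.0.0.16 with 32767), the peer wins, which matches the local "non-df" status shown for that ES.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One DF candidate: a VTEP attached to the ES (hypothetical struct). */
struct df_candidate {
	uint32_t vtep_ip; /* IPv4 address in host byte order */
	uint16_t df_pref; /* preference advertised in the ES route */
};

/* Returns the index of the elected DF: the highest preference wins and,
 * on a tie, the lower IP address wins. Returns -1 if there are no
 * candidates. */
static int pref_df_elect(const struct df_candidate *c, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (best < 0 || c[i].df_pref > c[best].df_pref ||
		    (c[i].df_pref == c[best].df_pref &&
		     c[i].vtep_ip < c[best].vtep_ip))
			best = (int)i;
	}
	return best;
}
```

A single linear pass is enough because the comparison is a strict total order on (preference, IP) pairs.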
Sample output
=============
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh bgp l2vpn evpn es 03:00:00:00:00:01:11:00:00:01
ESI: 03:00:00:00:00:01:11:00:00:01
Type: LR
RD: 27.0.0.15:6
Originator-IP: 27.0.0.15
Local ES DF preference: 100
VNI Count: 10
Remote VNI Count: 10
Inconsistent VNI VTEP Count: 0
Inconsistencies: -
VTEPs:
27.0.0.16 flags: EA df_alg: preference df_pref: 32767
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
torm-11# sh bgp l2vpn evpn route esi 03:00:00:00:00:01:11:00:00:01
*> [4]:[03:00:00:00:00:01:11:00:00:01]:[32]:[27.0.0.15]
27.0.0.15 32768 i
ET:8 ES-Import-Rt:00:00:00:00:01:11 DF: (alg: 2, pref: 100)
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Convert the IPv4 and IPv6 unicast address-family CLIs
to transactional CLIs and implement the
northbound callbacks.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
In transactional CLI mode, the bgp address-family <afi> <safi>
node builds its xpath on top of the `router bgp` node's xpath.
When `exit` is applied under the afi-safi commands, keep
xpath_index at 1 so that the bgp global xpath is still used.
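The following hedged C sketch models the xpath bookkeeping described above. It uses a simplified vty with an xpath stack. `vty_sketch`, the node enum, the helper functions, and the placeholder xpath "/bgp-global" are all hypothetical and are not FRR's vty code. The point it shows: the address-family node pushes nothing, so leaving it must not pop.

```c
#include <assert.h>
#include <string.h>

#define XPATH_MAX_LEN 256
#define XPATH_DEPTH 8

enum node_sketch { CONFIG_NODE_S, BGP_NODE_S, BGP_AF_NODE_S };

struct vty_sketch {
	enum node_sketch node;
	int xpath_index; /* number of xpaths currently pushed */
	char xpath[XPATH_DEPTH][XPATH_MAX_LEN];
};

/* `router bgp`: push the BGP instance xpath. */
static void enter_router_bgp(struct vty_sketch *v, const char *bgp_xpath)
{
	strncpy(v->xpath[v->xpath_index], bgp_xpath, XPATH_MAX_LEN - 1);
	v->xpath[v->xpath_index][XPATH_MAX_LEN - 1] = '\0';
	v->xpath_index++;
	v->node = BGP_NODE_S;
}

/* `address-family <afi> <safi>`: reuses the `router bgp` xpath, so
 * nothing is pushed. */
static void enter_address_family(struct vty_sketch *v)
{
	v->node = BGP_AF_NODE_S;
}

/* `exit`: leaving the AF node keeps xpath_index at 1; leaving
 * `router bgp` pops the BGP xpath. */
static void do_exit(struct vty_sketch *v)
{
	if (v->node == BGP_AF_NODE_S) {
		v->node = BGP_NODE_S; /* xpath_index unchanged */
	} else if (v->node == BGP_NODE_S) {
		v->xpath_index--;
		v->node = CONFIG_NODE_S;
	}
}
```

If `exit` from the AF node also decremented the index, the next command under `router bgp` would lose its base xpath.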
Signed-off-by: Chirag Shah <chirag@nvidia.com>
When compiling with --enable-bfdd=no, we get warnings
about functions not being used.
Add an #if check so they are only included when needed.
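Here is a minimal sketch of the guard pattern. A static helper whose only caller is conditionally compiled must sit under the same condition, or -Wunused-function fires when the feature is off. `HAVE_BFDD_SKETCH` is a hypothetical stand-in for whatever macro the build actually defines for bfdd, and both functions are illustrative.

```c
#include <assert.h>

/* Hypothetical configure result; a bfdd-less build leaves it at 0. */
#ifndef HAVE_BFDD_SKETCH
#define HAVE_BFDD_SKETCH 0
#endif

#if HAVE_BFDD_SKETCH > 0
/* Compiled only together with its caller, so no unused-function warning
 * is emitted when bfdd is disabled. */
static int bfd_only_helper(void)
{
	return 1;
}
#endif

static int bfd_feature_count(void)
{
#if HAVE_BFDD_SKETCH > 0
	return bfd_only_helper();
#else
	return 0;
#endif
}
```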
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
There are several places where prefix2str was used to convert
a prefix, but the conversion was debug-guarded while the buffer was
also used for flog_err/warn. This would lead to corrupt data
being output in the failure cases if debugs were not turned
on.
Modify the code in zebra_mpls.c to not use prefix2str.
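The bug pattern and the fix can be reproduced in plain C. In this sketch `prefix_sketch`, `prefix_sketch2str`, and the two report functions are hypothetical stand-ins, not FRR's `struct prefix` or prefix2str(). To keep the demo well defined, the buggy buffer starts as "". In the original code it was uninitialized stack memory, which is why garbage was printed. The fixed version formats the prefix at the point of use, which is the role %pFX plays in FRR's printfrr.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Minimal IPv4 prefix stand-in (hypothetical). */
struct prefix_sketch {
	struct in_addr addr;
	unsigned char len;
};

static char *prefix_sketch2str(const struct prefix_sketch *p, char *buf,
			       size_t size)
{
	char ip[INET_ADDRSTRLEN];

	inet_ntop(AF_INET, &p->addr, ip, sizeof(ip));
	snprintf(buf, size, "%s/%u", ip, (unsigned)p->len);
	return buf;
}

/* Buggy: buf is filled only when debugging, yet always printed. */
static void report_buggy(bool debug_on, const struct prefix_sketch *p,
			 char *out, size_t size)
{
	char buf[64] = "";

	if (debug_on)
		prefix_sketch2str(p, buf, sizeof(buf));
	/* ... later, on the error path, unconditionally: */
	snprintf(out, size, "failed to install %s", buf);
}

/* Fixed: format the prefix exactly where it is printed. */
static void report_fixed(const struct prefix_sketch *p, char *out,
			 size_t size)
{
	char buf[64];

	snprintf(out, size, "failed to install %s",
		 prefix_sketch2str(p, buf, sizeof(buf)));
}
```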
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
We are loading a buffer with the prefix2str results and then
using it in the debugs throughout the functions. Replace this
with %pFX and remove the buffer.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
We found that turning on update-delay would only delay prefixes
learned from peers (by delaying bestpath), but would allow local routes
(network statements or redistributed routes) to be advertised immediately,
followed by an End-of-RIB marker. This fix delays sending local
routes until the update-delay process has completed, which matches
what testing shows other vendors do.
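The intended ordering can be shown with a small state sketch. `bgp_sketch` and its helpers are hypothetical and use counters in place of real route objects. While update-delay is active, local routes are held. When the delay completes, they are released and only then is End-of-RIB sent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified advertisement state. */
struct bgp_sketch {
	bool update_delay_active;
	size_t pending_local; /* local routes held during update-delay */
	size_t sent_local;    /* local routes advertised to peers */
	bool eor_sent;        /* End-of-RIB marker sent */
};

/* A network statement or redistributed route becomes available. */
static void local_route_added(struct bgp_sketch *b)
{
	if (b->update_delay_active)
		b->pending_local++; /* hold it until update-delay ends */
	else
		b->sent_local++;
}

/* update-delay completes: release held local routes, then send EoR. */
static void update_delay_done(struct bgp_sketch *b)
{
	b->update_delay_active = false;
	b->sent_local += b->pending_local;
	b->pending_local = 0;
	b->eor_sent = true;
}
```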
Ticket: CM-31743
Signed-off-by: Don Slice <dslice@nvidia.com>
Issue:
When the OSPF area type is changed from default to NSSA or stub, the previously
advertised external LSAs are not removed from the neighbor.
The LSAs remain in the database until the maxage timeout.
Fix:
Advertise the external LSAs with their age set to maxage and flood them to the
NSSA or stub area.
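Setting an LSA's age to MaxAge (3600 seconds in OSPFv2) and flooding it makes neighbors purge it immediately instead of waiting for it to age out. The sketch below applies this to AS-external LSAs (type 5) in a flat array. `lsa_sketch` and `flush_external_lsas` are hypothetical, and the `flooded` counter stands in for actually queuing the LSA for flooding into the area.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OSPF_LSA_MAXAGE 3600 /* OSPFv2 MaxAge, in seconds */
#define LSA_TYPE_AS_EXTERNAL 5

/* Hypothetical, simplified LSA entry. */
struct lsa_sketch {
	uint8_t type;
	uint16_t age;
	int flooded; /* times queued for flooding (stand-in) */
};

/* When an area becomes stub or NSSA, prematurely age the AS-external
 * LSAs that were advertised into it so neighbors remove them now.
 * Returns the number of LSAs flushed. */
static size_t flush_external_lsas(struct lsa_sketch *lsas, size_t n)
{
	size_t flushed = 0;

	for (size_t i = 0; i < n; i++) {
		if (lsas[i].type != LSA_TYPE_AS_EXTERNAL)
			continue;
		lsas[i].age = OSPF_LSA_MAXAGE;
		lsas[i].flooded++;
		flushed++;
	}
	return flushed;
}
```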
Signed-off-by: kssoman <somanks@gmail.com>
Make it possible to load YANG modules outside the main northbound
initialization. The primary use case is to support YANG modules
that are specific to an FRR plugin. Example: only load the PCEP
YANG module when the corresponding FRR plugin is loaded. Other use
cases might arise in the future.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Combine yang_snodes_iterate_module() and yang_snodes_iterate_all()
into a unified yang_snodes_iterate() function, where the first
"module" parameter is optional. There's no point in having two
separate YANG schema iteration functions anymore now that they are
so similar.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The only safe way to iterate over all schema nodes of a given YANG
module is by iterating over all schema nodes of all YANG modules
and filtering out the nodes that belong to other modules.
The original yang_snodes_iterate_module() code did the following:
1 - Iterate over all top-level schema nodes of the given module;
2 - Iterate over all augmentations of the given module.
While that iteration strategy is more efficient, it doesn't handle
more complex YANG hierarchies well, such as those containing nested
augmentations or self-augmenting modules. Any iteration that isn't done
on the resolved YANG data hierarchy is fragile and prone to errors.
Fixes regression introduced by commit 8a923b4851 where the
gen_northbound_callbacks tool was generating duplicate callbacks
for certain modules.
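The iterate-everything-then-filter strategy, with an optional module argument, can be sketched as follows. The sketch assumes the resolved schema has already been flattened into an array in which every node records its owning module, including nodes contributed by augments. Real code would walk the resolved libyang tree instead. `snode_sketch`, `snodes_iterate`, `count_cb`, and the module names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flattened view of the resolved schema tree. */
struct snode_sketch {
	const char *name;
	const char *module; /* module that owns this node */
};

typedef void (*snode_cb)(const struct snode_sketch *sn, void *arg);

/* Visit every schema node exactly once; if module is non-NULL, skip
 * nodes that belong to other modules. */
static void snodes_iterate(const struct snode_sketch *all, size_t n,
			   const char *module, snode_cb cb, void *arg)
{
	for (size_t i = 0; i < n; i++) {
		if (module && strcmp(all[i].module, module) != 0)
			continue;
		cb(&all[i], arg);
	}
}

/* Example callback: count visited nodes. */
static void count_cb(const struct snode_sketch *sn, void *arg)
{
	(void)sn;
	(*(size_t *)arg)++;
}
```

Because each node is visited once in the resolved hierarchy, nested or self-augmentations cannot make a node appear twice. Duplicate visits of that kind are the failure mode that produced duplicate generated callbacks.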
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
- tracepoint() -> frrtrace()
- tracelog() -> frrtracelog()
- tracepoint_enabled() -> frrtrace_enabled()
Also removes the copy-pasted #ifdefs for those LTTng macros; those are
handled in lib/trace.h.
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
Previous commits added LTTng tracepoints. This was primarily for testing
/ trial purposes; in practice we'd like to support arbitrary tracing
methods, and especially USDT probes, which SystemTap and dtrace expect,
and which are supported on at least one flavor of BSD (FreeBSD).
To that end this patch adds an frr-specific tracing macro, frrtrace(),
which proxies into either DTRACE_PROBEn() or tracepoint() macros
depending on whether --enable-usdt or --enable-lttng is passed at
compile time.
At some point this could be tweaked to allow compiling in both types of
probes. Ideally there should be some logic there to use LTTng's optional
support for generating USDT probes when both are requested.
No additional libraries are required to use USDT, since these probes are
a kernel feature and only need the <sys/sdt.h> header.
- add --enable-usdt to toggle use of LTTng tracepoints or USDT probes
- add new trace.h library header for use with tracepoint definition
headers
- add frrtrace() wrapper macro; this should be used to define
tracepoints instead of using tracepoint() or DTRACE_PROBEn()
Compilation with USDT does nothing as of this commit; the existing LTTng
tracepoints need to be converted to use the frrtrace*() macros in a
subsequent commit.
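The compile-time dispatch that frrtrace() performs can be sketched with the preprocessor. `frrtrace_sketch`, `HAVE_USDT_SKETCH`, and `HAVE_LTTNG_SKETCH` are hypothetical names, not the macros actually defined in lib/trace.h or by configure. DTRACE_PROBEn() comes from <sys/sdt.h> and tracepoint() from LTTng-UST. A real LTTng build also needs a tracepoint provider header. With neither backend selected, the macro compiles to nothing.

```c
#include <assert.h>

#if defined(HAVE_USDT_SKETCH)
#include <sys/sdt.h>
/* Paste nargs onto the name: frrtrace_sketch(2, ...) -> DTRACE_PROBE2 */
#define frrtrace_sketch(nargs, provider, name, ...) \
	DTRACE_PROBE##nargs(provider, name, __VA_ARGS__)
#elif defined(HAVE_LTTNG_SKETCH)
#include <lttng/tracepoint.h>
#define frrtrace_sketch(nargs, provider, name, ...) \
	tracepoint(provider, name, __VA_ARGS__)
#else
/* No tracing backend configured: the call site costs nothing. */
#define frrtrace_sketch(nargs, provider, name, ...) ((void)0)
#endif
```

Call sites then use a single form, for example `frrtrace_sketch(2, frr_libfrr, hash_insert, hash, data);`, and the build configuration decides what it expands to. The provider and event names in that call are illustrative only.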
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
hash_get is used for both lookup and insert; add a tracepoint for when
we insert something into the hash
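Because a single call serves both lookup and insert, the trace hook has to sit on the insert path only. The sketch below shows this with a tiny chained hash table. `hash_sketch`, `hash_get_sketch`, and `trace_hash_insert` are hypothetical. A `create` flag replaces whatever allocation mechanism the real hash_get() uses, and a counter stands in for the tracepoint.

```c
#include <assert.h>
#include <stdlib.h>

#define NBUCKETS 16

struct hnode {
	unsigned key;
	struct hnode *next;
};

struct hash_sketch {
	struct hnode *buckets[NBUCKETS];
	unsigned inserts_traced; /* stand-in for emitted trace events */
};

/* Stand-in tracepoint: fires only when a new entry is inserted. */
static void trace_hash_insert(struct hash_sketch *h, unsigned key)
{
	(void)key;
	h->inserts_traced++;
}

/* Lookup-or-insert: returns the existing node for key, or creates one
 * when create is nonzero. Returns NULL if absent and not created. */
static struct hnode *hash_get_sketch(struct hash_sketch *h, unsigned key,
				     int create)
{
	struct hnode **slot = &h->buckets[key % NBUCKETS];

	for (struct hnode *n = *slot; n; n = n->next)
		if (n->key == key)
			return n; /* pure lookup: nothing traced */
	if (!create)
		return NULL;

	struct hnode *n = malloc(sizeof(*n));
	if (!n)
		return NULL;
	n->key = key;
	n->next = *slot;
	*slot = n;
	trace_hash_insert(h, key); /* only new entries are traced */
	return n;
}
```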
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
LTTng supports tracef() and tracelog() macros, which work like printf
and are used to ease the transition between logging and tracing. Messages
printed using these macros end up as trace events. For our purposes we are
not interested in dropping logging, but it is nice to get log messages
in the trace output, so I've added a call to tracelog() in zlog that dumps
our zlog messages as trace events.
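The idea is to format the message once and emit it to both sinks. Here is a minimal sketch. `zlog_sketch`, `log_sink`, and `trace_sink` are hypothetical. The second copy marks where a tracelog() call would sit in an LTTng build, but the sketch itself does not call LTTng.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sinks: normal log output and the trace event stream. */
static char log_sink[256];
static char trace_sink[256];

static void zlog_sketch(const char *fmt, ...)
{
	char msg[200];
	va_list ap;

	va_start(ap, fmt);
	vsnprintf(msg, sizeof(msg), fmt, ap);
	va_end(ap);

	/* Logging is unchanged... */
	snprintf(log_sink, sizeof(log_sink), "%s", msg);
	/* ...and the same text is also emitted as a trace event. */
	snprintf(trace_sink, sizeof(trace_sink), "%s", msg);
}
```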
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
This commit adds initial support for LTTng.
When --enable-lttng=no is given or the option is not specified, no tracing code is included.
When --enable-lttng=yes, LTTng tracing events are (will be) generated.
configure.ac:
- add --enable-lttng
- define HAVE_LTTNG when enabled
- minimum LTTng version: 2.12.0
lib:
- add trace.[ch]
- update subdir.am
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
Fixes the valgrind error we were seeing on startup by
initializing the msg header struct:
```
==2534283== Thread 3 zebra_dplane:
==2534283== Syscall param recvmsg(msg) points to uninitialised byte(s)
==2534283== at 0x4D616DD: recvmsg (in /usr/lib64/libpthread-2.31.so)
==2534283== by 0x43107C: netlink_recv_msg (kernel_netlink.c:744)
==2534283== by 0x4330E4: nl_batch_read_resp (kernel_netlink.c:1070)
==2534283== by 0x431D12: nl_batch_send (kernel_netlink.c:1201)
==2534283== by 0x431E8B: kernel_update_multi (kernel_netlink.c:1369)
==2534283== by 0x46019B: kernel_dplane_process_func (zebra_dplane.c:3979)
==2534283== by 0x45EB7F: dplane_thread_loop (zebra_dplane.c:4368)
==2534283== by 0x493F5CC: thread_call (thread.c:1585)
==2534283== by 0x48D3450: fpt_run (frr_pthread.c:303)
==2534283== by 0x48D3D41: frr_pthread_inner (frr_pthread.c:156)
==2534283== by 0x4D56431: start_thread (in /usr/lib64/libpthread-2.31.so)
==2534283== by 0x4E709D2: clone (in /usr/lib64/libc-2.31.so)
==2534283== Address 0x85cd850 is on thread 3's stack
==2534283== in frame #2, created by nl_batch_read_resp (kernel_netlink.c:1051)
==2534283==
==2534283== Syscall param recvmsg(msg.msg_control) points to unaddressable byte(s)
==2534283== at 0x4D616DD: recvmsg (in /usr/lib64/libpthread-2.31.so)
==2534283== by 0x43107C: netlink_recv_msg (kernel_netlink.c:744)
==2534283== by 0x4330E4: nl_batch_read_resp (kernel_netlink.c:1070)
==2534283== by 0x431D12: nl_batch_send (kernel_netlink.c:1201)
==2534283== by 0x431E8B: kernel_update_multi (kernel_netlink.c:1369)
==2534283== by 0x46019B: kernel_dplane_process_func (zebra_dplane.c:3979)
==2534283== by 0x45EB7F: dplane_thread_loop (zebra_dplane.c:4368)
==2534283== by 0x493F5CC: thread_call (thread.c:1585)
==2534283== by 0x48D3450: fpt_run (frr_pthread.c:303)
==2534283== by 0x48D3D41: frr_pthread_inner (frr_pthread.c:156)
==2534283== by 0x4D56431: start_thread (in /usr/lib64/libpthread-2.31.so)
==2534283== by 0x4E709D2: clone (in /usr/lib64/libc-2.31.so)
==2534283== Address 0xa0 is not stack'd, malloc'd or (recently) free'd
==2534283==
```
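The two reports above correspond to recvmsg() reading a struct msghdr whose unused fields, including msg_control, were left as stack garbage. A minimal sketch of the fix follows. `recv_one` is a hypothetical helper, not the netlink_recv_msg() code. The whole struct is zeroed with memset, which also clears padding, before the iovec is set.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Receive one datagram into buf. Zeroing the msghdr leaves msg_name,
 * msg_control, msg_controllen and msg_flags in a defined state. */
static ssize_t recv_one(int fd, char *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg)); /* the fix: no uninitialized fields */
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	return recvmsg(fd, &msg, 0);
}
```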
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When Segment Routing is not enabled, some related messages are still
printed on the console, especially when Segment Routing debug is enabled.
This patch adds additional checks for whether Segment Routing
is enabled.
Signed-off-by: Olivier Dugeon <olivier.dugeon@orange.com>