Remove the pointer check for ctx. At this point in the
function it has to be non null since we deref'ed it.
Additionally the alloc function that creates it cannot
fail.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The current local mac delete event send to flag with force
always which breaks the duplicate detected MACs where
it requires to be resynced from bgpd to earlier state.
Ticket:#3233019
Issue:3233019
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Upon receiving local mobility event for MAC + NEIGH,
both are detected as duplicate upon hitting DAD threshold.
Duplicated detected ( freezed) MAC + NEIGH are not known
to bgpd.
If locally learnt MAC + NEIGH are deleted in kernel,
the MAC is marked as AUTO after sending delete event
to bgpd.
Bgpd only reinstalls best route for MAC_IP route (NEIGH)
but not for MAC event.
This puts a situation where MAC is AUTO state and
associated neigh as remote.
Fix:
DUPLICATE + LOCAL MAC deletion, set MAC delete request
as reinstall from bgpd.
Ticket:#2873307
Reviewed By:
Testing Done:
Freeze MAC + two NEIGHs in local mobility event.
Delete MAC and NEIGH from kerenl.
bgp rsync remote mac route which puts MAC to remote state.
Signed-off-by: Chirag Shah <chirag@nvidia.com>
EVPN MH ES reduendant VTEPs need to install
sync MAC as notify inactive and generate
ND:Proxy stamped extended community on Type-2
route.
Ticket:#3436621
Issue:3436621
Testing Done:
tor-11 originates type-2 MAC route:
tor-11# bridge -d fdb show | grep 00:65:00:00:00:01
00:65:00:00:00:01 dev hostbond1 vlan 1000 notify master bridge static
tor-12 receives sync MAC route:
Before fix:
----------
tor-12:/# bridge -d fdb show | grep 00:65:00:00:00:01
00:65:00:00:00:01 dev hostbond1 vlan 1000 notify master bridge static
After fix: inactive is set to MAC entry
----------
tor-12:/#bridge -d fdb show | grep 00:65:00:00:00:01
00:65:00:00:00:01 dev hostbond1 vlan 1000 notify inactive master bridge
static
Notice the difference in `inactive` post notify on tor-12
with the fix.
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Srv6 nexthop segments may not be set when configuring seg6local
attributes. This is the case for the following seg6local route:
Dump in vtysh, extract from 'show ipv6 route'
> B>* 2001:db8:1:1:1::/128 [20/0] is directly connected, vrf1, seg6local End.DT46 table 10, seg6 ::, weight 1, 00:02:10
Dump in iproute2, extract from 'ip -6 route show'
> 2001:db8:1:1:1:: nhid 22 encap seg6local action End.DT46 vrftable 10 dev vrf1 proto bgp metric 20 pref medium
As can be seen, the 'seg6 ::' nexthop segment is not visible on iproute2,
because it is not set. Do not display seg6 ipv6 nexthop when not set.
After:
> B>* 2001:db8:1:1:1::/128 [20/0] is directly connected, vrf1, seg6local End.DT46 table 10, weight 1, 00:02:10
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Srv6 routes which configure encap method, may not have
seg6local instructions. Generally speaking, seg6local
attributes that are not specified should not be dumped.
Before:
> B>* 10.200.0.0/24 [20/0] via fd00:125::2, ntfp2 (vrf default), label 16, seg6local unspec unknown(seg6local_context2str), seg6 2001:db8:1:1:1::, weight 1, 0\
0:00:17
After:
> B>* 10.200.0.0/24 [20/0] via fd00:125::2, ntfp2 (vrf default), label 16, seg6 2001:db8:1:1:1::, weight 1, 00:00:17
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Issue:
After vlan flap, zebra was not marking the selected/best route as installed.
As a result, when a static route was configured with nexthop as directly
connected interface's(vlan) IP, the static route was not being installed
in the kernel since its nexthop was unresolved. The nexthop was marked
unresolved because zebra failed to mark the best route as installed after
interface flap.
This was happening because, in dplane_route_update_internal() if the old and
new context type, and nexthop group id are the same, then zebra doesn't send
down a route replace request to kernel. But, the installed (ROUTE_ENTRY_INSTALLED)
flag is set when zebra receives a response from kernel. Since the
request to kernel was being skipped for the route entry, installed flag
was not being set
Fix:
In dplane_route_update_internal() if the old and new context type, and
nexthop group id are the same, then before returning, installed flag will
be set on the route-entry if it's not set already.
Signed-off-by: Pooja Jagadeesh Doijode <pdoijode@nvidia.com>
"show evpn json" returns nothing when evpn is disabled.
Code has been fixed to return {} when evpn is disabled or no entry
available.
Before Fix:-
```
cumulus@r2:mgmt:~$ sudo vtysh -c "show evpn json"
cumulus@r2:mgmt:~$
```
After Fix:-
```
cumulus@r1:mgmt:~$ sudo vtysh -c "show evpn json"
{
}
cumulus@r1:mgmt:~$
```
Ticket:#3417955
Issue:3417955
Testing: UT done
Signed-off-by: Chirag Shah <chirag@nvidia.com>
Signed-off-by: Sindhu Parvathi Gopinathan <sgopinathan@nvidia.com>
During shutdown, the main pthread stops the dplane pthread
before exiting. Don't try to clean up any events scheduled
to the dplane pthread at that point - just let the thread
exit and clean up.
Signed-off-by: Mark Stapp <mjs@labn.net>
two things:
On shutdown cleanup any events associated with the update walker.
Also do not allow new events to be created.
Fixes this mem-leak:
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790:Direct leak of 8 byte(s) in 1 object(s) allocated from:
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #0 0x7f0dd0b08037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #1 0x7f0dd06c19f9 in qcalloc lib/memory.c:105
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #2 0x55b42fb605bc in rib_update_ctx_init zebra/zebra_rib.c:4383
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #3 0x55b42fb6088f in rib_update zebra/zebra_rib.c:4421
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #4 0x55b42fa00344 in netlink_link_change zebra/if_netlink.c:2221
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #5 0x55b42fa24622 in netlink_information_fetch zebra/kernel_netlink.c:399
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #6 0x55b42fa28c02 in netlink_parse_info zebra/kernel_netlink.c:1183
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #7 0x55b42fa24951 in kernel_read zebra/kernel_netlink.c:493
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #8 0x7f0dd0797f0c in event_call lib/event.c:1995
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #9 0x7f0dd0684fd9 in frr_run lib/libfrr.c:1185
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #10 0x55b42fa30caa in main zebra/main.c:465
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790- #11 0x7f0dd01b5d09 in __libc_start_main ../csu/libc-start.c:308
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-
./msdp_topo1.test_msdp_topo1/r2.zebra.asan.1117790-SUMMARY: AddressSanitizer: 8 byte(s) leaked in 1 allocation(s).
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
BGP signals to zebra that a afi has converged immediately
after it has finished processing all routes for a given
afi/safi. This generates events in zebra in this order
a) Routes received from BGP, placed on early-rib Meta-Q
b) Signal GR for the afi.
Now imagine that zebra reads GR code and immediately
processes routes that are in the actual rib and
removes some routes. This generates a
c) route deletion to the kernel for some number of
routes that may be in the the early-rib Meta-Q
d) Process the Meta-Q, and re-install the routes
This is undesirable behavior in zebra. In that
while we may end up in a correct state, there
will be a blip for some number of routes that
happen to be in the early rib Meta-Q.
Modify the GR code to have it's own processing
entry at the end of the Meta-Q. This will
allow all routes to be processed and ready
for handling by the Graceful Restart code.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
After the restructure of the gr code to allow zebra_gr
to have individual cleanups of afi, this is no longer necessary.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The GR code in FRR used to wait till all AFI's were complete
before cleaning up the routes from the upper level protocol.
This of course can lead to some weird situations where say
ipv4 finishes and then v6 is stuck waiting for a peer to come
up and never finishes. v4 when it finishes signals zebra that
it is done but no action is taken at that moment.
Modify the code to allow the zebra_gr.c code to handle a per
afi removal, instead of doing it all at the end.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The zebra_gr code had 3 functions when effectively only
1 was needed. Cleans up some code weirdness around
multiple switch statements for the same api->cap
as well as consolidating down to only caring about
SAFI_UNICAST, since that is all we care about at the
moment.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
We have code that tracks both afi and safi's,
but we only ever operate on the afi's. So lets
limit our work being done to something more sensible.
I'm leaving the safi being broadcast through the zapi
message, as that I am not sure what else should be ripped
out at this point in time.
Finally re-arrange the zread_client_capabilites function
to stop the multiple levels of function calling that really
serve no purpose.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
By the time this function is called we have already
ensured that the pointers are good several times.
I like consistency but this is a bit much
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When GR is running and attempting to clear up a node
if the node that is currently saved and we are coming
back to happens to be deleted during the time zebra
suspends the GR code due to hitting the node limit
then zebra GR code will just completely stop processing
and potentially leave stale nodes around forever.
Let's just remove this hole and process what we can.
Can you imagine trying to debug this after the fact?
If we remove a node then that counts toward the maximum
to process of ZEBRA_MAX_STALE_ROUTE_COUNT. This should
prevent any non-processing with a slightly larger cost
of having to look at a few nodes repeatedly
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The info->do_delete variable was being set to true only when
u.val was 1. The problem with this is that u.val is a union
and the various ways that we can call this event causes
different values to be written to the union value on the thread.
This makes no sense. Just set the variable to what we want it to
be when we need it to be true. Since it was only ever set during
a thread_execute section.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Effectively a massive search and replace of
`struct thread` to `struct event`. Using the
term `thread` gives people the thought that
this event system is a pthread when it is not
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This is a first in a series of commits, whose goal is to rename
the thread system in FRR to an event system. There is a continual
problem where people are confusing `struct thread` with a true
pthread. In reality, our entire thread.c is an event system.
In this commit rename the thread.[ch] files to event.[ch].
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The 'show mpls table json' command displays the outgoing interface
name only when the nexthop type is either NEXTHOP_TYPE_IFINDEX or
NEXTHOP_TYPE_IPV6_IFINDEX. add the interface name for the nexthop
type NEXTHOP_TYPE_IPV4_IFINDEX.
Fixes: ("b78b820d46d6") MPLS: Display enhancements and JSON support
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This commit addresses the case where a service wants to install
an LSP entry to a next-hop located in a VRF instance. The incoming
MPLS packet is on the namespace and has to be directed to a nexthop
located behind an interface that sits in a specific VRF instance.
The below iproute command can illustrate:
> ip link add vrf1 type vrf table 10
> ip link set dev vrf1 up
> ip link set dev eth0 master vrf1
> ip a a 192.0.2.1/24 dev eth0
> ip -f mpls route add 105 via inet 192.0.2.45 dev eth0
If a service uses the ZEBRA_MPLS_LABELS messages, then the LSP
message is ignored: from zebra perspective, the MPLS entries are
visible via the 'show mpls table' command, but no LSP entry is
installed in the kernel.
The issue is in the nhlfe_nexthop_active_ipv[4/6] function: the
outgoing interface mentioned in the nexthop is searched in the
main VRF, whereas the interface is in a separate VRF. The interface
is not found, and the nhlfe to install is considered not active.
To address this issue, reuse the incoming vrf_id parameter transmitted
in the nexthop structure from the ZEBRA_MPLS_LABELS message. When
creating an NHLFE entry, the vrf_id is used instead of the DEFAULT_VRF.
And the nhlfe entry can be considered as active.
One alternate solution to reuse the vrf_id parameter in the mpls network
context would be to modify the search function in nhlfe_nexthop_active..()
function: looking for an existing ifindex in the zns. However, this
solution may not fit later when netns backend would be used.
Note that some changes have not been done yet and are considered
sufficient for now:
- The 'nhlfe_find' API: the assumption is done that only the linux vrf
backend is used for now.
- The 'mpls_lsp_install()' API: It is currently used by the CLI command
which does not handle the interface parameter, and the SRTE service, whih
always sends LSPs towards a nexthop located in the VRF_DEFAULT.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The ZEBRA_MPLS_LABELS_[ADD/DELETE/REPLACE] messages may change an
LSP entry based on an incoming MPLS entry, followed by a given
next-hop.
Having a next hop with no label information inside is rejected
by the zebra layer. As illustration, the following ZAPI message
would be rejected, because the next hop does not contain any
label information.
> ip -f mpls route add 105 via inet 192.0.2.45
At the same time, such configuration is desirable to be
supported:
An attempt has been done to configure the next-hop with an implicit-
null label. But the message is rejected by the kernel:
> ip -f mpls route add 104 as 3 via inet 192.0.2.45
> Error: Implicit NULL Label (3) can not be used in encapsulation.
The commit proposes to accept ZEBRA_MPLS_LABELS_[XX] messages with
a nexthop that does not contain any label information.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
Add a hash_clean_and_free() function as well as convert
the code to use it. This function also takes a double
pointer to the hash to set it NULL. Also it cleanly
does nothing if the pointer is NULL( as a bunch of
code tested for ).
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Issue:
When a netns is deleted, since zebra doesn’t receive interface down/delete
notifications from kernel, it manually deletes the interface without removing
the association between zebra_l3vni and the interface that is being deleted
(i.e it deletes the interface without setting “zl3vni->vxlan_if” to NULL).
Later, during the deletion of netns, when zl3vni_rmac_uninstall() is called to
uninstall the remote RMAC from the kernel, zebra ends up accessing stale
“zl3vni->vxlan_if” pointer, which now points to freed memory.
This was causing heap use-after-free.
Fix:
Before zebra starts deleting the interfaces when it receives netns delete notification,
appropriate functions() are being called to remove the association between evpn structs
and interface and set “zl3vni->vxlan_if” to NULL. This ensures that when
zl3vni_rmac_uninstall() is called during netns deletion, it will bail because
“zl3vni->vxlan_if” is NULL.
Signed-off-by: Pooja Jagadeesh Doijode <pdoijode@nvidia.com>
The "show zebra mpls .. json" vty command may return empty information
in case the MPLS database is empty or a given label entry is not
available. When those errors occur, add the braces to return a
valid json format.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The GR debug logs are doing all sorts of wonderful stuff
but they were not actually displaying anything useful to the operator
about what vrf we are operating in.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Create VRF and interfaces:
ip netns add vrf1
ip link add veth1 index 100 type veth
ip link add link veth1 veth1.200 type vlan id 200
ip link set veth1.200 netns vrf1
ip -n vrf1 link add veth2 index 100 type veth
After reloading zebra, "show interface veth1.200" shows wrong parent
interface:
test# show interface veth1.200
Interface veth1.200 is down
...
Parent interface: veth2
This is because veth1.200 and veth1 are in different netns, and veth2
happens to have the same ifindex as veth1, in the same netns of
veth1.200.
When looking for parent, link-ifindex 100 should be looked up within
link-netns, rather than that of the child interface.
Add link_nsid to zebra interface, so that the <link_nsid, link_ifindex>
pair can uniquely identify the link interface.
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
Once RP/BSR address is learned in PIMD, PIMD does nexthop tracking
in Zebra.
For IPV6 address, the nexthop type is either NEXTHOP_TYPE_IPV6
or NEXTHOP_TYPE_IPV6_IFINDEX.
Zebra should send nexthop ifindex information along with nexthop address
to the client (PIMD).
Issue: #11526
Issue: #11957
Signed-off-by: Sarita Patra <saritap@vmware.com>
Coverity rightly points out that a call into zebra_l2_bridge_if_vlan_find
is NULL checked 4/5 times. Let's make it 5/5
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
a) Consolidate v4 and v6 versions of rib_match_multicast
b) Improve debug to show what we matched against as well.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
In `rib_link`, if is_zebra_import_table_enabled returns
true, `rib_queue_add` will not called, resulting in other
table route node never processed. This actually should not
be dependent on whether the route is imported.
In `rib_delnode`, if is_zebra_import_table_enabled returns
true, it will use `rib_unlink` instead of enqueuing the
route node for process. There is no reason that imported
route nodes should not be reprocessed. Long ago, the
behaviour was dependent on whether the route_entry comes
from a table other than main.
Signed-off-by: zyxwvu Shi <i@shiyc.cn>
When we are installing the flood entry for a vtep in SVD,
ensure VNI is set on the ctx object so that it gets
sent to the kernel and set appropriately with src_vni.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Ticket: 2698649
Testing Done: precommit and evpn-min
Problem:
When the mcast-group is updated, the changes were being read from the netlink
and populated by zebra, but when kernel sends the delete of fdb delete for the
group, we are deleting the mcast-group that we newly updated. This is because,
currently we blindly reset the mcast-group during fdb delete without checking
for mcast-group associated to the vni.
Fix is to separate add/update and delete mcast-group functions and to check
for mcast-group before resetting during delete.
Signed-off-by: sramamurthy <sramamurthy@nvidia.com>
Ticket: 2674793
Testing Done: precommit, evpn-min and evpn-smoke
The problem in this case is whenever we are triggering ifdown
followed by ifup of bridge, we see that remote mac entries
are programmed with vlan-1 in the fdb from zebra and never cleaned up.
bridge has vlan_default_pvid 1 which means any port that gets added
will initially have vlan 1 which then gets deleted by ifupdown2 and
the proper vlan gets added.
The problem lies in zebra where we are not cleaning up the remote
macs during vlan change.
Fix is to uninstall the remote macs and then install them
during vlan change.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
When the VLAN-VNI mapping is configured via a map and not using
individual VXLAN interfaces, upon removal of a VNI ensure that the
remote FDB entries are uninstalled correctly.
Signed-off-by: Vivek Venkatraman <vivek@nvidia.com>
Ticket: #2613048
Reviewed By:
Testing Done:
1. Manual verification - logs in the ticket
2. Precommit (user job #171) and evpn-min (user job #170)
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Ticket: 2730328, 2724075
Reviewed By: CCR-11741, CCR-11746
Testing Done: Unit Test
2730328: At high bridge-vids count, VNI devices are not added in FRR if
FRR restarts after loading e/n/i
The issue is the wrt buffer overflow for netlink_recv_msg.
We have defined the kernel recv message buffer in stack which is of size 32768 (32K).
When the configuration is applied without FRR restart things work fine
because the recv message from kernel is well within the limit of 32K.
However with this configuration, when the FRR was restarted I could see that
some recv messages were crossing the 32K limit and hence weren't processed.
Below error logs were seen when frr was restarted with the confuguration.
2021/08/09 05:59:55 ZEBRA: [EC 4043309092] netlink-cmd (NS 0) error: data remnant size 32768
Fix is to increase the buffer size by another 2K
2724075: evpn mh/SVD - some of the remote neighs/macs aren't installed
in kernel post ifdown/ifup bridge
The issue was specific to SVD. During ifdown/ifup of the bridge,
I could see that the access-bd was not associated with the vni and hence
the remote neighs were not getting programmed in the kernel.
Fix is to reference (or associate) vxlan vni to the access-bd when
the vni is reported up. With this fix, I was able to see the remote
neighs getting programmed to the kernel.
Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
ignore GETVLAN errors at startup like we are doing
for nexthop groups. Older platforms don't support the API.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Ignore zebra_mac updates if they do not contain a VNI for vxlan
interface. We don't have anything we can do with them.
'''
==443593== Process terminating with default action of signal 6 (SIGABRT): dumping core
==443593== at 0x4E1156C: __pthread_kill_implementation (in /usr/lib64/libc.so.6)
==443593== by 0x4DC4D15: raise (in /usr/lib64/libc.so.6)
==443593== by 0x49823C7: core_handler (sigevent.c:261)
==443593== by 0x4DC4DBF: ??? (in /usr/lib64/libc.so.6)
==443593== by 0x4E1156B: __pthread_kill_implementation (in /usr/lib64/libc.so.6)
==443593== by 0x4DC4D15: raise (in /usr/lib64/libc.so.6)
==443593== by 0x4D987F2: abort (in /usr/lib64/libc.so.6)
==443593== by 0x49C3064: _zlog_assert_failed (zlog.c:700)
==443593== by 0x4F5E6D: zebra_vxlan_if_vni_find (zebra_vxlan_if.c:661)
==443593== by 0x4EEAC3: zebra_vxlan_check_readd_vtep (zebra_vxlan.c:4244)
==443593== by 0x450967: netlink_macfdb_change (rt_netlink.c:3722)
==443593== by 0x450011: netlink_neigh_change (rt_netlink.c:4458)
'''
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Properly handle ipv6-mapped-ipv4 with DVNI by converting
the address to ipv4 and setting that as the DST field for
the encap.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
The `show evpn next-hop svd *` command doesn't provide much
for users right now. Make it hidden so we can still debug
the tables with it.
Also remove SVD output from `show evpn next-hop vni all`.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Don't install implict NULL labels with non-vni label'd
routes.
This returns behavior to how it was before adding the DVNI code.
Ticket: #2677036
Testing Done: precommit, manual
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Read in STP state changes for a Single Vxlan Device
via bridge vlan netlink messages. Map the vlanid to a
VNI in the SVD table and treat it similar to how
we handle proto down of the Vxlan device traditionally
in a non-SVD device scenario.
Forwarding == Interface UP
Blocking == Interface DOWN
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Add some show commands and expand some already existing
commands so we can get debug info from the SVD global
neigh table inside zebra.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Add code in the nhg resolution path for determining if Downstream
VNI is in play. This is the only place in all of zebra where
we should be arbitrarily setting the ifindex/labels since
this is where new nhgs are created/destroyed. If something
changes, it must happen here.
We determine if D-VNI is being used by matching the carried
label (VNI) on the nexthop with the vrf VNI from the route.
If they do not match, we can assume this is a D-VNI labeled
nexthop.
We loop through all of the group to see if any are D-VNI. If even
one is, we must treat them all as such. Otherwise, fallback to
traditional EVPN route handling and remove all the labels.
If they are going to be treated as D-VNI we retain the labels and
verify the underlying VRF vxlan interface is a Single VXlan Device.
If it is not, we cannot use D-VNI. If it is, continue on. The VNI label
will encapped via LWTUNNEL and sent to the kernel.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Install neigh entries always on SVD if it exists in
zebra. If zebra is using a Single Vxlan Device, we must
duplicate the install of our neigh entries to it so that
vxlan communication can also work across it in the downstream VNI
case.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Encode the vni label during route install on linux
systems via lwt encap 64bit LWTUNNEL_IP_ID. The kernel expects
this in network byte order, so we convert it.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Add the ability to specify the label type along with the labels
you are passing to zebra in zapi_nexthop. This is needed as we
abstract the label code to be re-used by evpn as well as mpls.
Protocols need to be able to set the type of label they have attached.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
Use the already existing mpls label code to store VNI
info for vxlan. VNI's are defined as labels just like mpls,
we should be using the same code for both.
This patch is the first part of that. Next we will need to
abstract the label code to not be so mpls specific. Currently
in this, we are just treating VXLAN as a label type and storing
it that way.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
This patch addresses fix for issues found during static analysis.
rt_netlink - initialise vtep if there is NDA_DST attribute
if_netlink - initialise vni_start and vni_end
Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
zebra_vxlan_if.h header file was missed in noinst_HEADERS resulting
in build failure for some platforms.
Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
zebra_l2_bridge_if.h header file was missed in noinst_HEADERS resulting
in build failure for some platforms.
Signed-off-by: Stephen Worley <sworley@nvidia.com>
This patch addresses following issues,
- When the VLAN-VNI mapping is configured via a map and not using
individual VXLAN interfaces, upon removal of a VNI ensure that the
remote FDB entries are uninstalled correctly.
- When VNI configuration is performed using VLAN-VNI mapping (i.e., without
individual VXLAN interfaces) and flooded traffic is handled via multicast,
the multicast group corresponding to the VNI needs to be explicitly read
from the bridge FDB. This is relevant in the case of netlink interface to
the kernel and for the scenario where a new VNI is provisioned or comes up.
Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
This patch addresses following
- Remove unused VLAN Id parameter when trying to determine the VNI associated
with a non-VLAN aware bridge. Also, add a check to ensure that in this case,
we have a per-VNI VXLAN interface. Due to sequence of events, it is possible
that we may have VLAN-VNI mappings, in which case the code should return
gracefully.
- With support for a container VXLAN interface that has VLAN-VNI mappings,
the VXLAN interface itself may be up but a particular VNI might have
been removed. Ensure that VNI mapping exists before proceeding with
further processing.
Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
This patch addresses following bug fixes
- Fix vtysh doc string in "show evpn access-vlan..." command
- Multicast group handling was little complex. This change avoids calling
multiple functions and directly calls the zebra_vxlan_if_update_vni for
mcast group updates.
- When a vlan-vni map is removed, the removed vni deletion was happening
in FRR with SVD config. This was resulting in stale vni and not
resulting propagation of the vni deletion.
During vni cleanup (zebra_vxlan_if_vni_clean) zebra_vxlan_if_vni_del
was called for vni delete which is not correct. We should be calling
zebra_vxlan_if_vni_entry_del for the given vni entry.
Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
Today to find the vni for a given (vlan, bridge) we walk over all interfaces
and filter the vxlan device associated with the bridge. With multiple vlan aware
bridge changes, we can derive the vni directly by looking up the hash table i.e.
the vlan_table of the associated (vlan, bridge) which would give the vni.
During vrf_terminate() call zebra_l2_bridge_if_cleanup if the interface
that we are removing is of type bridge. In this case, we walk over all
the vlan<->access_bd association and clean them up.
zebra_evpn_t is modified to record (vlan, bridge) details and the
corresponding vty is modified to print the same.
zevpn_bridge_if_set and zl3vni_bridge_if_set is used to set/unset the
association.
Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
Multiple vlan aware bridge data structure changes and its corresponding bridge
handling changes.
A new vlan-table is maintained for each bridge which records the zebra_l2_bridge_vlan
entry. zebra_l2_bridge_vlan maps vlan to access_bd associated to this bridge.
Existing zebra_evpn_access_bd structure is vlan aware which is now modified to be
(vlan, bridge) aware.
Whenever a new access_bd is instantiated, a corresponding entry is also recorded
in the zebra l2 bridge for the vlan.
When the access_bd is dereferenced or whenever a bridge is deleted, the
association is cleaned up.
Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
This change brings in following functionality
- netlink_bridge_vxlan_vlan_vni_map_update for single vxlan devices
This function is responsible for reading the vlan-vni map information
received from netlink and populating a new hash_table with the vlan-vni
data. Once all the vlan-vni data is collected, zebra_vxlan_if_vni_table_add_update
is called to update vni_table in vxlan interface and process each of the
vlan-vni data.
- refactoring changes for zevpn_build_hash_table
- existing zevpn_build_hash_table was walking over all the vxlan interfaces
and then processing the vni for each of them. In case of single vxlan device,
we will have more than one vni entries. This function is abstracted so that
it iterates over all the vni entries for single vxlan device. For traditional
vxlan device the zebra_vxlan_if_vni_iterate would only process single vni
associated with that device.
Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
This change modifies zebra_vxlan_if_up/down/add/update and del functionality
to be per vni based.
zebra_vxlan_if_add/update/del and zebra_vxlan_if_up/down now handles
the vni operations based on vxlan device type (single or traditional vxlan device).
zebra_vxlan_if_vni_table_add_update
- This function handles the vlan-vni map update received from the netlink
interface to single vxlan device vni_table hash table.
zebra_vxlan_if_vni_mcast_group_update
- This function handles the new multicast group update received from
the netlink interface to single vxlan device vni_table hash table.
For traditional vxlan interfaces, the vni and mcast group
handling follows the traditional approach.
Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
This change refactors the zebra_vxlan_if related functionality
to a new zebra_vxlan_if.c file. zebra_vxlan_if_up/down,
zebra_vxlan_if_add/update/del is moved zebra_vxlan_if.c
Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
dplane_mac_info and dplane_neigh_info is modified to be vni aware.
dplane_rem_mac_add/del dplane_mac_init is modified to be vni aware.
During dplane context update (mac and neigh), we use the vni information
and if set, corresponding netlink attribute NDA_SRC_VNI is set and passed to the
dplane.
Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
This change set introduces data structure changes required for multiple vlan aware bridge
functionality. A new structure zebra_l2_bridge_if encapsulates the vlan to access_bd
association of the bridge. A vlan_table hash_table is used to record each instance
of the vlan to access_bd of the bridge via zebra_l2_bridge_vlan structure.
vxlan iftype derivation: netlink attribute IFLA_VXLAN_COLLECT_METADATA is used
to derive the iftype of the vxlan device. If the attribute is present, then the
vxlan interface is treated as single vxlan device, otherwise it would default to
traditional vxlan device.
zebra_vxlan_check_readd_vtep, zebra_vxlan_dp_network_mac_add/del is modified to
be vni aware.
mac_fdb_read_for_bridge - is modified to be (vlan, bridge) aware
Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
This changeset introduces the data structure changes needed for
single vxlan device functionality. A new struct zebra_vxlan_vni_info
encodes the iftype and vni information for vxlan device.
The change addresses related access changes of the new data structure
fields from different files
zebra_vty is modified to take care of the vni dump information according
to the new vni data structure for vxlan devices.
Signed-off-by: Sharath Ramamurthy <sramamurthy@nvidia.com>
Currently `ip import-table 33` imports routes with
a distance of 15, as defined by zebra.h. zebra_rib.c
on the other hand believes the default value for the table
is 150. Let's make them agree with each other.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Use the defines for distance that are in zebra.h. We could
easily have a cluster where we don't agree with ourselves. So
let's convert zebra to use the defines in zebra.h
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The define of ZEBRA_ON_RIB_PROCESS_HOOK_CALL was in zebra.h
which exposes it to everyone, except zebra is the only daemon
to use this define. This does not beling in zebra.h
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add affinity-map hooks to check the utilization of affinity-map in
link-params before its deletion and to update link-params when the
affinity-map bit-position is updated.
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Add the support of Extended Admin-Group (RFC7308) to the zebra interface
link-params Traffic-Engineering context.
Extended admin-groups can be configured with the affinity-map:
> affinity-map blue bit-position 221
> int eth-rt1
> link-params
> affinity blue
> exit-link-params
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
Add the affinity-map global command to zebra. The syntax is:
> affinity-map NAME bit-position (0-1023)
Signed-off-by: Louis Scalbert <louis.scalbert@6wind.com>
The files converted in this commit either had some random misspelling or
formatting weirdness that made them escape automated replacement, or
have a particularly "weird" licensing setup (e.g. dual-licensed.)
This also marks a bunch of "public domain" files as SPDX License "NONE".
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
There existed the idea, from Volta, that a nexthop group would not have
the same nexthops installed -vs- what FRR actually sent down. The
dplane would notify you.
With the addition of 06525c4f99
the code was put behind a bit of a wall controlled the usage
of it.
The flag ROUTE_ENTRY_USE_FIB_NHG flag was being used
to control which set was being sent up to concerned parties
in nexthop tracking. Put this flag behind the wall and
do not necessarily set it when we receive a data plane
notification about a route being installed or not.
Fixes: #12706
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Locking around the list of providers/plugins is not
helpful - these only change at init time. Clear some SA
warnings by removing the locking.
Signed-off-by: Mark Stapp <mjs@labn.net>
1. Renamed "gates" to "nexthops"
2. Displaying afi of the nexthops being dispalyed in place of
"nexthops" JSON object in the old JSON output
3. Calling show_route_nexthop_helper() and show_nexthop_json_helper()
instead of print_nh() inorder to keeps the fields in "nexthops"
JSON object in sync with "nexthops" JSON object of
"show nexthop-group rib json".
Updated vtysh:
r1# show ip nht
192.168.0.2
resolved via connected
is directly connected, r1-eth0 (vrf default)
Client list: static(fd 28)
192.168.0.4
resolved via connected
is directly connected, r1-eth0 (vrf default)
Client list: static(fd 28)
Updated JSON:
r1# show ip nht json
{
"default":{
"ipv4":{
"192.168.0.2":{
"nhtConnected":false,
"clientList":[
{
"protocol":"static",
"socket":28,
"protocolFiltered":"none"
}
],
"nexthops":[
{
"flags":3,
"fib":true,
"directlyConnected":true,
"interfaceIndex":2,
"interfaceName":"r1-eth0",
"vrf":"default",
"active":true
}
],
"resolvedProtocol":"connected"
}
}
}
}
Signed-off-by: Pooja Jagadeesh Doijode <pdoijode@nvidia.com>
fpm:netlink format doesn't indicate the protocol information
in routes of BGP, OSPF and other protocols. Routes of those
protocols just indicate protocol as zebra.
The below route is actually BGP route but 'proto': 11
indicates that it is zebra.
{'attrs': [('RTA_DST', 'dummy'),
('RTA_PRIORITY', 0),
('RTA_GATEWAY', 'dummy'),
('RTA_OIF', 2)],
'dst_len': 32,
'family': 2,
'flags': 0,
'header': {'flags': 1025,
'length': 60,
'pid': 3160253895,
'sequence_number': 0,
'type': 24},
'proto': 11,
'scope': 0,
'src_len': 0,
'table': 254,
'tos': 0,
'type': 1}
with this change it is now seen with 'proto': 186
indicates that it is BGP.
{'attrs': [('RTA_DST', 'dummy'),
('RTA_PRIORITY', 0),
('RTA_GATEWAY', 'dummy'),
('RTA_OIF', 2)],
'dst_len': 32,
'family': 2,
'flags': 0,
'header': {'flags': 1025,
'length': 60,
'pid': 3160253895,
'sequence_number': 0,
'type': 24},
'proto': 186,
'scope': 0,
'src_len': 0,
'table': 254,
'tos': 0,
'type': 1}
Signed-off-by: Spoorthi K <spk@redhat.com>
Don't directly use `time()` for generating sequence numbers for two
reasons:
1. `time()` can go backwards (due to NTP or time adjustments)
2. Coverity Scan warns every time we truncate a `time_t` variable for
good reason (verify that we are Y2K38 ready).
Found by Coverity Scan (CID 1519812, 1519786, 1519783 and 1519772)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
The two commands ( `advertise-svi-ip` and `advertise-default-gw` ) can
be set in both `BGP_EVPN_NODE` and `BGP_EVPN_VNI_NODE`. So, when
configuring one of them, need to consider the configuration of the
other. Configuring it under `BGP_EVPN_NODE`, it does check the other.
However, the conversion is wrong when configured under `BGP_EVPN_VNI_NODE`.
One example:
With the following steps, the evpn routes with `SVI` will be mistakenly
withdrawn.
```
anlan(config-router-af)# advertise-svi-ip
anlan(config-router-af)# vni 100
anlan(config-router-af-vni)# advertise-svi-ip
anlan(config-router-af-vni)# no advertise-svi-ip
```
This commit fixed the conversion under `BGP_EVPN_VNI_NODE` for the
two commands.
Signed-off-by: anlan_cs <vic.lan@pica8.com>
Don't attempt to dereference `ifp` directly if it might be null: there
is a check right before this usage: `ifp ? ifp->info : NULL`.
In this context it should be safe to assume `ifp` is not NULL because
the only caller of this function checks that for this `ifindex`. For
consistency we'll check for null anyway in case this ever changes (and
with this the coverity scan warning gets silenced).
Found by Coverity Scan (CID 1519776)
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>