if_netlink.c created it's on nested parsing #define which
is identical to netlink_parse_rtattr_nested. Consolidate
on one instead of having this duality.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
In order to parse the netlink message into the
`struct rtattr *tb[size]` it is assumed that the buffer is
memset to 0 before the parsing. As such if you attempt
to read a value that was not returned in the message
you will not crash when you test for it.
The code has places were we memset it and places where we don't.
This *will* lead to crashes when the kernel changes. In
our parsing routines let's have them memset instead of having
to remember to do it pre pass in to the parser.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
- gre keys are collected and stored locally.
- when gre source set is requested, and the link interface
configured is different, the gre information collected is
pushed in the query, namely source ip or gre keys if present.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
preserve mtu upon interface flapping and tunnel source change.
Signed-off-by:Reuben Dowle <reuben.dowle@4rf.com>
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
This action is initiated by nhrp and has been stubbed when
moving to zebra. Now, a netlink request is forged to set
the link interface of a gre interface if that gre interface
does not have already a link interface.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
zebra is able to get information about gre tunnels.
zebra_gre file is created to handle hooks, but is not yet used.
also, debug zebra gre command is done to add gre traces.
A zebra_gre file is used for complementary actions that may be needed.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when zebra has vrf backend mapped to namespaces, the polling
of interfaces leads to fix all linkages of interfaces. This
was not done on non default namespace. do it for other namespaces.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
There are cases where either link information is not present at
interface creation or link information changed. handle this
situation.
Signed-off-by: Philippe.Guibert <philippe.guibert@6wind.com>
zebra dd link
The first change in this commit is the processing of the VRF termination.
When we terminate the VRF, we should not delete the underlying interfaces,
because there may be pointers to them in the northbound configuration. We
should move them to the default VRF instead.
Because of the first change, the VRF interface itself is also not deleted
when deleting the VRF. It should be handled in netlink_link_change. This
is done by the second change.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
This one also needed a bit of shuffling around, but MTYPE_RE is the only
one left used across file boundaries now.
Signed-off-by: David Lamparter <equinox@diac24.net>
This was caused because of uninitialized netlint attrs in the bond-member
netlink parse API.
PS: It was caught by the upstream topotests on ARM8 (passed everywhere
else).
Signed-off-by: Anuradha Karuppiah <anuradhak@nvidia.com>
Feature overview:
=================
A 802.3ad bond can be setup to allow lacp-bypass. This is done to enable
servers to pxe boot without a LACP license i.e. allows the bond to go oper
up (with a single link) without LACP converging.
If an ES-bond is oper-up in an "LACP-bypass" state MH treats it as a non-ES
bond. This involves the following special handling -
1. If the bond is in a bypass-state the associated ES is placed in a
bypass state.
2. If an ES is in a bypass state -
a. DF election is disabled (i.e. assumed DF)
b. SPH filter is not installed.
3. MACs learnt via the host bond are advertised with a zero ESI.
When the ES moves out of "bypass" the MACs are moved from a zero-ESI to
the correct non-zero id. This is treated as a local station move.
Implementation:
===============
When (a) an ES is detached from a hostbond or (b) an ES-bond goes into
LACP bypass zebra deletes all the local macs (with that ES as destination)
in the kernel and its local db. BGP re-sends any imported MAC-IP routes
that may exist with this ES destination as remote routes i.e. zebra can
end up programming a MAC that was perviously local as remote pointing
to a VTEP-ECMP group.
When an ES is attached to a hostbond or an ES-bond goes
LACP-up (out of bypss) zebra again deletes all the local macs in the
kernel and its local db. At this point BGP resends any imported MAC-IP
routes that may exist with this ES destination as sync routes i.e.
zebra can end up programming a MAC that was perviously remote
as local pointing to an access port.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Added support for advertising SVI MAC if EVPN-MH is enabled.
In the case of EVPN MH arp replies from an attached server can be sent to
the ES-peer. To prevent flooding of the reply the SVI MAC needs to be
advertised by default.
Note:
advertise-svi-ip could have been used as an alternate way to advertise
SVI MAC. However that config cannot be turned on if SVI IPs are
re-used (which is done to avoid wasting IP addresses in a subnet).
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Description: When we get a new vrf add and vrf with same name, but different vrf-id already
exists in the database, we should treat vrf add as update.
This happens mostly when there are lots of vrf and other configuration being replayed.
There may be a stale vrf delete followed by new vrf add. This
can cause timing race condition where vrf delete could be missed and
further same vrf add would get rejected instead of treating last arrived
vrf add as update.
Treat vrf add for existing vrf as update.
Implicitly disable this VRF to cleanup routes and other functions as part of vrf disable.
Update vrf_id for the vrf and update vrf_id tree.
Re-enable VRF so that all routes are freshly installed.
Above 3 steps are mandatory since it can happen that with config reload
stale routes which are installed in vrf-1 table might contain routes from
older vrf-0 table which might have got deleted due to missing vrf-0 in new configuration.
Signed-off-by: sudhanshukumar22 <sudhanshu.kumar@broadcom.com>
1. When a bond is associated with an ES we may need to re-sync
the dplane protodown state (which maybe stale/set by some other
app).
2. Also change the uplink state display to avoid confusion with
protodown reason code (both used to show uplink-up).
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Local ethernet segments are held in a protodown or error-disabled state
if access to the VxLAN overlay is not ready -
1. When FRR comes up the local-ESs/access-port are kept protodown
for the startup-delay duration. During this time the underlay and
EVPN routes via it are expected to converge.
2. When all the uplinks/core-links attached to the underlay go down
the access-ports are similarly protodowned.
The ES-bond protodown state is propagated to each ES-bond member
and programmed in the dataplane/kernel (per-bond-member).
Configuring uplinks -
vtysh -c "conf t" vtysh -c "interface swp4" vtysh -c "evpn mh uplink"
Configuring startup delay -
vtysh -c "conf t" vtysh -c "evpn mh startup-delay 100"
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
EVPN protodown display -
========================
root@torm-11:mgmt:~# vtysh -c "show evpn"
L2 VNIs: 10
L3 VNIs: 3
Advertise gateway mac-ip: No
Advertise svi mac-ip: No
Duplicate address detection: Disable
Detection max-moves 5, time 180
EVPN MH:
mac-holdtime: 60s, neigh-holdtime: 60s
startup-delay: 180s, start-delay-timer: 00:01:14 <<<<<<<<<<<<
uplink-cfg-cnt: 4, uplink-active-cnt: 4
protodown: startup-delay <<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
ES-bond protodown display -
===========================
root@torm-11:mgmt:~# vtysh -c "show interface hostbond1"
Interface hostbond1 is up, line protocol is down
Link ups: 0 last: (never)
Link downs: 1 last: 2020/04/26 20:38:03.53
PTM status: disabled
vrf: default
OS Description: Local Node/s torm-11 and Ports swp5 <==> Remote Node/s hostd-11 and Ports swp1
index 58 metric 0 mtu 9152 speed 4294967295
flags: <UP,BROADCAST,MULTICAST>
Type: Ethernet
HWaddr: 00:02:00:00:00:35
Interface Type bond
Master interface: bridge
EVPN-MH: ES id 1 ES sysmac 00:00:00:00:01:11
protodown: off rc: startup-delay <<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
ES-bond member protodown display -
==================================
root@torm-11:mgmt:~# vtysh -c "show interface swp5"
Interface swp5 is up, line protocol is down
Link ups: 0 last: (never)
Link downs: 3 last: 2020/04/26 20:38:03.52
PTM status: disabled
vrf: default
index 7 metric 0 mtu 9152 speed 10000
flags: <UP,BROADCAST,MULTICAST>
Type: Ethernet
HWaddr: 00:02:00:00:00:35
Interface Type Other
Master interface: hostbond1
protodown: on rc: startup-delay <<<<<<<<<<<<<<<<
root@torm-11:mgmt:~#
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
when working with vrf netns backend, two bridges interfaces may have the
same bridge interface index, but not the same namespace. because in vrf
netns backend mode, a bridge slave always belong to the same network
namespace, then a check with the namespace id and the ns id of the
bridge interface permits to resolve correctly the interface pointer.
The problem could occur if a same index of two bridge interfaces can be
found on two different namespaces.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when receiving a netlink API for an interface in a namespace, this
interface may come with LINK_NSID value, which means that the interface
has its link in an other namespace. Unfortunately, the link_nsid value
is self to that namespace, and there is a need to know what is its
associated nsid value from the default namespace point of view.
The information collected previously on each namespace, can then be
compared with that value to check if the link belongs to the default
namespace or not.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The linux kernel sends the VLAN list per-access port as bitmap. This
needs to be translated into a per-ES VNI list for generation of
EAD-EVI routes.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Revert "zebra: support for macvlan interfaces"
This reverts commit bf69e212fd.
Revert "doc: add some documentation about bgp evpn netns support"
This reverts commit 89b97c33d7.
Revert "zebra: dynamically detect vxlan link interfaces in other netns"
This reverts commit de0ebb2540.
Revert "bgpd: sanity check when updating nexthop from bgp to zebra"
This reverts commit ee9633ed87.
Revert "lib, zebra: reuse and adapt ns_list walk functionality"
This reverts commit c4d466c830.
Revert "zebra: local mac entries populated in correct netnamespace"
This reverts commit 4042454891.
Revert "zebra: when parsing local entry against dad, retrieve config"
This reverts commit 3acc394bc5.
Revert "bgpd: evpn nexthop can be changed by default"
This reverts commit a2342a2412.
Revert "zebra: zvni_map_to_vlan() adaptation for all namespaces"
This reverts commit db81d18647.
Revert "zebra: add ns_id attribute to mac structure"
This reverts commit 388d5b438e.
Revert "zebra: bridge layer2 information records ns_id where bridge is"
This reverts commit b5b453a2d6.
Revert "zebra, lib: new API to get absolute netns val from relative netns val"
This reverts commit b6ebab34f6.
Revert "zebra, lib: store relative default ns id in each namespace"
This reverts commit 9d3555e06c.
Revert "zebra, lib: add an internal API to get relative default nsid in other ns"
This reverts commit 97c9e7533b.
Revert "zebra: map vxlan interface to bridge interface with correct ns id"
This reverts commit 7c990878f2.
Revert "zebra: fdb and neighbor table are read for all zns"
This reverts commit f8ed2c5420.
Revert "zebra: zvni_map_to_svi() adaptation for other network namespaces"
This reverts commit 2a9dccb647.
Revert "zebra: display interface slave type"
This reverts commit fc3141393a.
Revert "zebra: zvni_from_svi() adaptation for other network namespaces"
This reverts commit 6fe516bd4b.
Revert "zebra: importation of bgp evpn rt5 from vni with other netns"
This reverts commit 28254125d0.
Revert "lib, zebra: update interface name at netlink creation"
This reverts commit 1f7a68a2ff.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
Remove mid-string line breaks, cf. workflow doc:
.. [#tool_style_conflicts] For example, lines over 80 characters are allowed
for text strings to make it possible to search the code for them: please
see `Linux kernel style (breaking long lines and strings)
<https://www.kernel.org/doc/html/v4.10/process/coding-style.html#breaking-long-lines-and-strings>`_
and `Issue #1794 <https://github.com/FRRouting/frr/issues/1794>`_.
Scripted commit, idempotent to running:
```
python3 tools/stringmangle.py --unwrap `git ls-files | egrep '\.[ch]$'`
```
Signed-off-by: David Lamparter <equinox@diac24.net>
* Use nl_attr_add32 instead of nl_attr_add where it is possible.
* Move common code from build_singlepath() and build_multipath()
to separate function.
Signed-off-by: Jakub Urbańczyk <xthaid@gmail.com>
* Rename netlink utility functions like addattr to be less ambiguous
* Replace rta_attr_* functions with nl_attr_* since they introduced
inconsistencies in the code
* Add helper functions for adding rtnexthop struct to the Netlink
message
Signed-off-by: Jakub Urbańczyk <xthaid@gmail.com>
When deleting a p2p address from an interface, include
the destination address. Without this, we don't find the
internal connected datastruct and process the delete
correctly on netlink OSes.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
the interface name was not present in the hook in charge of updating the
interface context to the registered hook service. For that, update the
name before informing it.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when working with vrf netns backend, two bridges interfaces may have the
same bridge interface index, but not the same namespace. because in vrf
netns backend mode, a bridge slave always belong to the same network
namespace, then a check with the namespace id and the ns id of the
bridge interface permits to resolve correctly the interface pointer.
The problem could occur if a same index of two bridge interfaces can be
found on two different namespaces.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when receiving a netlink API for an interface in a namespace, this
interface may come with LINK_NSID value, which means that the interface
has its link in an other namespace. Unfortunately, the link_nsid value
is self to that namespace, and there is a need to know what is its
associated nsid value from the default namespace point of view.
The information collected previously on each namespace, can then be
compared with that value to check if the link belongs to the default
namespace or not.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
the link information of vxlan interface is populated in layer 2
information, as well as in layer 2 vxlan information. This information
will be used later to collect vnis that are in other network namespaces,
but where bgp evpn is enabled on main network namespaces, and those vnis
have the link information in that namespace.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The netlink_vrf_change() function is called both when a VRF device
is created in the Linux kernel and when it is activated. This
commit changes this function to perform the VRF misconfiguration
detection only when the VRF device is created, as doing the check
twice would cause a false positive followed by a hard failure (not
to mention the double check is unnecessary since the VRF table ID
can't change once the device is created).
Fixes#6319.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Currently the linux kernel allows you to specify the same
table id -> multiple vrf's. While I am arguing with
the kernel people about proper behavior here let's
just remove this as a possiblity from happening and
mark it a zebra stopable misconfiguration.
(Effectively we are preventing a crash down the line
as that all over FRR we assume it's a unique
mapping not a many to one).
Why fail hard? Because we hope to get the person
who misconfigured it to actually notice immediately
not hours or days down the line when shit hits the fan.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The netlink_request function takes a `struct nlmsghdr *`
pointer from a common pattern that we use:
struct {
struct nlmsghdr n;
struct fib_rule_hdr frh;
char buf[NL_PKT_BUF_SIZE];
} req;
We were calling it `netlink_request(Socket, &req.n)`
The problem here is that coverity, rightly so, sees that
we access the data after the nlmsghdr in netlink_request and
tells us we have an read beyond end of the structure. While
we know we haven't mangled anything up here because of manual
inspection coverity doesn't have this knowledge implicitly.
So let's modify the code call to netlink_request to pass in the
void pointer of the req structure itself, cast to the appropriate
data structure in the function and do the right thing. Hopefully
the coverity SA will be happy and we can move on with our life.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Move the supports_nh bool indicating whether the kernel we are
using supports nexthop objects into the netlink kernel interface
itself. Since only linux and netlink support nexthop object APIs
for now this is fine.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
On restart, if we failed to remove any nexthop objects due
to a kill -9 or such event, sweep them if we aren't using them.
Add a proto field to handle this and remove the is_kernel bool.
Add a dupicate flag that indicates this nexthop group is only
present in our ID hashtable. It is a dupicate nexthop we received
from the kernel, therefore we cannot hash on it.
Make the idcounter globally accessible so that kernel updates
increment it as soon as we receive them, not when we handle them.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Added a check on startup for determining if the kernel supports
nexthop objects. It sets an appropriate bool on the zebra namespace
struct.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add all the neccessary code to allow nexthops to be processed
in separate dataplane contexts with the netlink dataplane kernel
provider.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add the functionality to parse new nexthop group messages
from the kernel and insert them into the appropriate hash
tables. Parsing is done at startup between interface and
interface address lookup. Add functionality to parse
changes to nexthops we already have. Add functionality
to parse delete nexthop messages from the kernel and
remove them from our table.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Separate interface lookup into its own function.
We need to know interfaces for reading in nexthop
information, but we need to know nexthops for reading
in the interface addresses. We will read in nexthops
between the two.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Since we don't have a daemon who's job is to handle kernel
routes and we don't get an explicit route delete anymore if
nexthops become unreachable from the kernel, zebra must
re-process kernel routes itself to make sure they are still valid.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Cleanup the interface creation apis to make it more
clear what they are doing.
Make it explicit that the creation via name/ifindex will
only add it to the appropriate list.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>