Move neighbor programming to the dataplane; remove
old apis; remove some ifdef'd use of direct netlink
code points, using neutral values outside of the netlink-
specific files.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
When resolving a nexthop, append its labels to the one its
resolving to along with the labels that may already be present there.
Before we were ignoring labels if the resolving level was greater than
two.
Before:
```
S> 2.2.2.2/32 [1/0] via 7.7.7.7 (recursive), label 2222, 00:00:07
* via 7.7.7.7, dummy1 onlink, label 1111, 00:00:07
S> 3.3.3.3/32 [1/0] via 2.2.2.2 (recursive), label 3333, 00:00:04
* via 7.7.7.7, dummy1 onlink, label 1111, 00:00:04
K>* 7.7.7.7/32 [0/0] is directly connected, dummy1, label 1111, 00:00:17
C>* 192.168.122.0/24 is directly connected, ens3, 00:00:17
K>* 192.168.122.1/32 [0/100] is directly connected, ens3, 00:00:17
ubuntu_nh#
```
This patch:
```
S> 2.2.2.2/32 [1/0] via 7.7.7.7 (recursive), label 2222, 00:00:04
* via 7.7.7.7, dummy1 onlink, label 1111/2222, 00:00:04
S> 3.3.3.3/32 [1/0] via 2.2.2.2 (recursive), label 3333, 00:00:02
* via 7.7.7.7, dummy1 onlink, label 1111/2222/3333, 00:00:02
K>* 7.7.7.7/32 [0/0] is directly connected, dummy1, label 1111, 00:00:11
C>* 192.168.122.0/24 is directly connected, ens3, 00:00:11
K>* 192.168.122.1/32 [0/100] is directly connected, ens3, 00:00:11
ubuntu_nh#
```
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When a new system route comes in and we have a pre-existing
non-system route we are not deleting the current system
route from the linux kernel.
Modify the code such that when a route replace is sent
to the kernel with a new route as a system route and
the old route as a non-system route do a delete of
the old route so it is no longer in the kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a conditional to guard against segfaulting on the debug
statement when zvrf lookup fails.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When installing a table route into the kernel choose
RTPROT_ZEBRA as the installing/controlling protocol.
This way we can know we installed it as well as stop
the warnings about this special case of `ip import-table XX`
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The failed neighbor event logging that was recently added in
commit: 3acae086ba
cast a bit too broad of a stroke. We should only inform
the user that we were ignoring the RTM_NEWNEIGH FAIL callback
when we believe it was one of our own 5549 entries.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a bit of extra code to indicate to the operator why
we intentionally rejected a kernel route from being used.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If we get a neighbor entry for 5549 failure notice
from the kernel that means that something has probably
gone terribly wrong. Let's notice and not reinstall.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The re->uptime usage of time(NULL) leaves it open to
timing changes from outside influence. Switching
to monotime allows us to ensure that we have a timestamp
that is always increasing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We have several route types KERNEL and CONNECT that are handled via special
case in the code. This was causing a lot of work keeping the two different
classes of route types as special(SYSTEM OR NOT). Put the dplane
in charge of the code that sets the bits for signalling route install/failure.
This greatly simplifies the code calling path and makes all route types
be handled exactly the same. Additionaly code that we want to run
post data plane install can just work as per normal then, instead
of having to know we need to run it when we have a special type
of route.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com.
When we get a neighbor entry in zebra we start processing it.
Let's add some additional debugs to the processing so that when
it bails out and we don't use the data, we know the reason.
This should help in debugging the problems from why bgp does
not appear to have data associated with a neighbor entry
in the kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The check for an entry being NUD_PERMANENT has already been done
there is no need to do it twice.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Use const in the accessors for pseudowire nhlfe data; pull
that through the kernel-facing apis that use that data.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
When we install a new route into the kernel always use
REPLACE. Else if the route is already there it can
be translated into an append with the flags we are
using.
This is especially true for the way we handle pbr
routes as that we are re-installing the same route
entry from pbr at the moment.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Read the onlink flag from the kernel for routes and pass them
up and through to zebra so that we are consistent with what
the kernel is telling us.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When the nexthop->type is NEXTHOP_TYPE_IPV4_IFINDEX we
were writing the RTA_PREFSRC 2 times for the build_singlepath
and build_multipath functions.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Some v6 attributes for the netlink_route_build_singlepath
code were being written two times for the NEXTHOP_TYPE_IPV6_IFINDEX
nexthop type.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The kernel neigh update api helps update neighbor entry,
using changing state and flags parameters.
Ticket:CM-22864
Reviewed By:
Testing Done:
Signed-off-by:Chirag Shah <chirag@cumulusnetworks.com>
Finish the LSP update code for the async dataplane for
the openbsd platform. Remove synch apis now that we've
converted to the async code path.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Adding infra to zebra dplane to support LSP updates. Add
kernel api for LSP updates that uses a dataplane context; add
stub apis for netlink, bsd, and 'null' kernel paths. Add
version of netlink mpls update code that takes a dplane
context struct instead of a zebra lsp struct.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
An EVPN type-2 entry is in freeze state during remote update,
remote VTEP can send typ-2 withdraw update,
upon receiving an entry delete (withdraw), first check
kernel has in local reachable state. Upon
unfreeze use the local entry to advertise to peers.
Fetch is for both MAC and IP, delete can come for
only MAC or MAC-IP combined route.
The specific entry fetch only required request flag to be set,
dump flag is not required.
Testing Done:
Simulate two VTEPs to do M1, IP1 mobility sequence,
freeze MAC during remote MAC update, subsequently send
withdraw type-2 route from origintating VTEP.
This results in read apis to invoke for local reachable entry.
Zebra updates its cache and upon unfreeze originates type-2.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Make netlink_request api generic where it can be used
for dump or querying specific information request.
nelink request nlm flags (NLM_F_ROOT | NLM_F_MATCH) are
used to dump purpose, if client wants to query spcific
MAC or IP using netlink_request does not require to set
them.
nlm struct is passed by the caller of netlink_request,
it can also set the nlm request flags.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
NEXTHOP_FLAG_ACTIVE currently means that the nexthop is considered
good enough to be installed. With current ecmp restrictions this
translation from multipath_num is enforced in the data plane.
The problem with this is of course that every data plane now
becomes concerned about the multipath num and must enforce it
independently. Currently *bsd does not honor multipath_num at
all and linux marks all nexthops as being installed even when
it honors a multipath_num that is less than the total.
This code change moves the multipath_num enforcement from a dataplane
decision to a zebra nexthop decision. Thus dataplanes now can
just install those nexthops marked as NEXTHOP_FLAG_ACTIVE
without having to worry about multipath_num.
*BSD will now respect multipath_num and Linux now properly notes
which routes are actually installed or not:
sharpd@donna ~/f/t/topotests> ps -ef | grep frr
frr 6261 1556 0 09:12 ? 00:00:00 /usr/lib/frr/zebra -e 2 --daemon -A 127.0.0.1
frr 6279 1556 0 09:12 ? 00:00:00 /usr/lib/frr/staticd --daemon -A 127.0.0.1
donna.cumulusnetworks.com(config)# do show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route
K>* 0.0.0.0/0 [0/106] via 10.0.2.2, enp0s3, 00:00:45
S>* 4.4.4.4/32 [1/0] via 10.0.2.1, enp0s3, 00:00:02
* via 192.168.209.1, enp0s8, 00:00:02
via 192.168.210.1, enp0s9 inactive, 00:00:02
C>* 10.0.2.0/24 is directly connected, enp0s3, 00:00:45
C>* 192.168.209.0/24 is directly connected, enp0s8, 00:00:45
C>* 192.168.210.0/24 is directly connected, enp0s9, 00:00:45
donna.cumulusnetworks.com(config)#
sharpd@donna ~/f/t/topotests> ip route show
default via 10.0.2.2 dev enp0s3 proto dhcp metric 106
4.4.4.4 proto 196 metric 20
nexthop via 10.0.2.1 dev enp0s3 weight 1
nexthop via 192.168.209.1 dev enp0s8 weight 1
10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.15 metric 106
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
192.168.209.0/24 dev enp0s8 proto kernel scope link src 192.168.209.2 metric 105
192.168.210.0/24 dev enp0s9 proto kernel scope link src 192.168.210.2 metric 103
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem reported that kernel neighbor entries could end up in "FAILED"
state when the neighbor entry was deleted. This fix handles the
notification of the event from netlink messages and re-inserts the
deleted entry.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Recursive multipath nexthops were broken by the initial async
dataplane - we were trying to install an extra, invalid
nexthop.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Reduce or eliminate use of global zebra_ns structs in
a couple of netlink/kernel code paths, so that those paths
can potentially be made asynch eventually.
Slide netlink_talk_info into place to remove dependency on core
zebra structs; add accessors for dplane context block
Start init of route context from zebra core re and rn structs;
start queueing and event handling for incoming route updates.
Expose netlink apis that don't rely on zebra core structs;
add parallel route-update code path using the dplane ctx;
simplest possible event loop to process queued route'
updates.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Conditional code in netlink_macfdb_update() introduced in 2232a77c used
the 'dst_present' variable because not all cases were covered. Now it is
not necessary.
Signed-off-by: F. Aragon <paco@voltanet.io>
Reduce or eliminate use of global zebra_ns structs in
a couple of netlink/kernel code paths, so that those paths
can potentially be made asynch eventually.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Use boolean variables instead of unsigned int for certain VxLAN-EVPN
flags which are really used as boolean.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
Ticket: CM-22288
Reviewed By: CCR-7832
Testing Done:
Along with a subsequent, related commit
When a MAC moves from local to remote, a replace is allowed, EVPN
no longer has to delete the local MAC before installing the remote
MAC.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
So the linux kernel uses the RT_TABLE_MAIN for the table
id used for ip routing. The multicast routing tables use
RT_TABLE_DEFAULT. We changed the internal code of zebra_vrf
a few months back to use RT_TABLE_MAIN as the tableid to
use. This caused the pim sg stats to stop working because
of the kernel bug where it uses a different table
for ip routing and ip multicast.
Put a bit of a special case in to do the right thing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Newer linux kernels apparently send data down the netlink
bus for the creation of mroutes. Add a bit of code
to notice this and to handle it appropriately( ie do
nothing at this point in time ) as that the correct
place to do this is in the pim socket in pimd.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we are displaying data about a netlink message
in debugs or errors, print out the message type
as a string instead of a number.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There exists a possibility that the ifindex we are passed
does not exist and as such we should check for it not
resolving as part of the debug.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We were ignoring mpls labels encapped with static routes.
Added support for single and multipath labels.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Allow protocols to specify to zebra that they would like zebra
to use the distance passed down as part of determine sameness for
Route Replace semantics.
This will be used by the static daemon to allow it to have
backup static routes with greater distances.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Prefix length validation checks should be returning an error
rather than 0. Switch to that and make them error messages.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Commit a2ca67d1d2 consolidated IPv4 and IPv6 handling. It also applied
our ignorance for IPv4 srcdest routes onto IPv6.
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
Bad nexthop messages from netlink were causing zebra
to hang here. Added a check to verify the length
of the nexthop so it doesn't keep trying to read.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Some more address family filters we can safely ignore
as well as typos in logger. Added AF_MPLS as filterable.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Zebra needed a check that varifies the prefix length
of an address is a valid length when receiving route
changes and interface address changes.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The zebra netlink socket was attempting to read netlink
messages with invalid address families in a couple areas.
Added filters and warn messages.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
EVPN ND ext community support NA flag R-bit, to have proxy ND.
Set R-bit in EVPN NA if a given router is default gateway or there is a
local
router attached, which can be determine based on local neighbor entry.
Implement BGP ext community attribute to generate and parse R-bit and
pass along zebra to program neigh entry in kernel.
Upon receiving MAC/IP update with community type 0x06 and sub_type 0x08,
pass the R-bit to zebra to program neigh entry.
Set NTF_ROUTER in neigh entry and inform kernel to do proxy NA for EVPN.
Ref:
https://tools.ietf.org/html/draft-ietf-bess-evpn-na-flags-01
Ticket:CM-21712, CM-21711
Reviewed By:
Testing Done:
Configure Local vni enabled L3 Gateway, which would act as router,
checked
show evpn arp-cache vni x ip <ip of svi> on originated and remote VTEPs.
"Router" flag is set.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Add 'const' to prefix args to several zebra route update,
redistribution, and route owner notification apis.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The interface lookup algorithm is different according to if we are on
netns vrf or not. If we are on the former case, then we only have to
parse the interfaces of the netns, while if we are on the other case, we
have to parse all the interfaces of all the vrfs ( since index is not
overlapping in the latter case).
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When we receive a netlink message from the kernel we have
handler functions for when we send a netlink command, if these
return a failure ( < 0 ) then we output that we had a parse
issue. But if all we get is:
2018-06-21T23:47:45.298156+00:00 qct-ix1-08 zebra[1484]: netlink-cmd (NS 0) filter function error
Then it is not very useful to figure out *where* the error happened.
Add more error code when in a decode path to hopefully allow us
to figure out where this message is coming from.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a bit of code to allow return of data plane
request messages.
Add the ability to pass the result back to callers
of kernel_route_rib.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The SOUTHBOUND_XXX enum was named a bit poorly.
Let's use a bit better name for what we are trying to do.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we receive a route that we think we own and we
are not in startup conditions, then add a small debug
to help debug the issue when this happens, instead
of silently just ignoring the route.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The re-use of RTPROT_STATIC has caused too many collisions
where other legitimate route sources are causing us to
believe we are the originator of the route. Modify
the code so that if another protocol inserts RTPROT_STATIC
we will assume it's a Kernel Route.
Fixes: #2293
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The linux kernel is getting the same Route Replace semantics
for v6 that v4 uses. Allow the end-user to know if their
kernel has this ability and if so to specify it so zebra
can take advantage of this.
Why not do auto-detection? Because you would have to write
code in zebra to add a route then add the same route again
with different nexthops to see if which semantics it is using.
It sure is easier to just add a cli that allows the user to
do it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Setup the buf used for extra data passed into kernel such
that we are cleaning it out before writing data to it,
so we can avoid writing uninited data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ensure that the next hop of the leaked VRF is not overwritten when the
route is being imported into the target VRF from the VPN table. Also, in
the case of multipath routes, ensure that the nexthop's ifindex is not
inadvertently reset.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Ensure that when EVPN routes are installed into zebra, the router MAC
is passed per next hop and appropriately handled. This is required for
proper multipath operation.
Ticket: CM-18999
Reviewed By:
Testing Done: Verified failed scenario, other manual tests
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
We are missing some handling of PBR and SHARP protocols
for netlink operations w/ the linux kernel.
Additionally add a bread crumb for new developers( or existing )
to know to fixup the rt_netlink.c when we start handling new
route types to hand to the kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
zserv.c has become something of a dumping ground for everything vaguely
related to ZAPI and really needs some love. This change splits out the
code fo building and consuming ZAPI messages into a separate source
file, leaving the actual session and client lifecycle code in zserv.c.
Unfortunately since the #include situation in Zebra has not been paid
much attention I was forced to fix the headers in a lot of other source
files. This is a net improvement overall though.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
With the recent change to just pass the prefix in
for the RTM_DELROUTE, for blackhole routes we
had stopped modifying the req.rtm_type to
be the appropriate type for blackhole routes.
Since we are just deleting on the route, and
zebra is never going to really install the same
route multiple times then we do not need
to specify the req.r.rtm_type for the deletion
command.
Ticket: CM-20616
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
EVPN owns the remote neigh entries which are programed in the kernel.
This entries should not age out and the only way to delete should be
from EVPN. We should program these entries with NUD_NOARP instead of
NUD_REACHABLE to avoid aging of this macs.
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
There can be a race condition between kernel and frr as follows.
Frr sends remote neigh notification.
At the (almost) same time kernel might send a notification saying
neigh is local.
After processing this notifications, the state in frr is local while
state in kernel is remote. This causes kernel and frr to be out of sync.
This problem will be avoided if FRR acts on the kernel notifications for
remote neighbors. When FRR sees a remote neighbor notification for a
neighbor which it thinks is local, FRR will change the neigh state to remote.
Ticket: CM-19923/CM-18830
Review: CCR-7222
Testing: Manual
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Background:
v6 does not have route replace semantics. If you want to add a nexthop
to an existing route, you just send RTM_NEWROUTE and the new nexthop.
If you want to delete a nexthop you should just send RTM_DELROUTE
with the removed nexthop.
This leads to situations where if zebra is processing a route
and has lost track of intermediate nexthops( yes this sucks )
then v6 routes will get out of sync when we try to implement
route replace semantics.
So notice when we are doing a route delete and the route is
not being updated, just send the prefix and tell it too delete.
Ticket: CM-20391
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This commit does 2 things:
1) When receiving a route from the kernel, display the incoming
table as part of the debug, to facilatate knowing what we are
talking about as part of the debug.
2) When displaying nexthop information for routes we were sending
to the kernel, no need to display the route information every time
Display the route then the individual nexthops for what we are doing.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Notice when someone deletes a neighbor entry we've put in for
rfc-5549 gets deleted by some evil evil person. When this happens
notice and push it back in, immediately.
Ticket: CM-18612
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add some additional debug information to the netlink debug
messages so we can see the table we are installing to as
well as the nexthop's vrf.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The following types are nonstandard:
- u_char
- u_short
- u_int
- u_long
- u_int8_t
- u_int16_t
- u_int32_t
Replace them with the C99 standard types:
- uint8_t
- unsigned short
- unsigned int
- unsigned long
- uint8_t
- uint16_t
- uint32_t
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Also modify `struct route_entry` to use nexthop_groups.
Move ALL_NEXTHOPS loop to nexthop_group.h
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
So as to get the correct NETNS where some discovery must be done and
populated, the zns pointer is directly retrieved from zvrf, instead of
checking that the VRF is a backend NETNS or not.
In the case where the interfaces are discovered before the VRF is enabled
( VRF-lite populate), then the default NS is retrieved.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
For each route to be added or deleted, instead of applying directly to
default namespaces, when a vrf is mapped to a namespace, then the
correct zns must be found out.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The netns backend is chosen by VRF if a runtime flag named vrfwnetns is
selected when running zebra.
In the case the NETNS backend is chosen, in some case the VRFID value is
being assigned the value of the NSID. Within the perimeter of that work,
this is why the vrf_lookup_by_table function is extended with a new
parameter.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When a BGP-labeled route is resolved into an LDP-labeled IGP route,
zebra would install it with no labels in the kernel. This patch implements
recursive MPLS labels, i.e. make zebra install all labels from the route's
nexthop chain (the labels from the top-level nexthop being installed in
the top of the MPLS label stack). Multiple recursion levels are supported.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Turns out we had 3 different ways to define labels
all of them overlapping with the same meanings.
Consolidate to 1. This one choosen is consistent
naming wise with what the *bsd and linux kernels
use.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The route_node that we are working on is going to be interesting
to the kernel_route_rib_pass_fail. So I am setting up the
code to allow me to pass it. This will be done in a subsuquent
commit.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In order for routes to be leaked the ifindex must be sent
down into the kernel over the netlink protocol. So
send it( we always figure it out ) when we add the
route.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add to the rib_add function the ability to pass in the nexthops
vrf.
Additionally when we decode the netlink message from the linux
kernel, properly figure out the nexthops vrf_id.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
With VRF route-leaking we need to know what vrf
the nexthops are in compared to this vrf. This
code adds the nh_vrf_id to the route entry and
sets it up correctly for the non-route-leaking
case.
The assumption here is that future commits
will make the nh_vrf_id *different* than
the vrf_id.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Linux has the ability to support a concept of 'realms'.
This concept allows you to mark routes with a realm id
value of 1-255. If you have marked the realm
of a route then you can use the tc program to
apply policy to the routes.
This commit adds the ability of FRR to interpret
a tag from (1-255) as a realm when installing into
the kernel. Please note that at this point in time
there is no way to set policy from within FRR. This
must be done outside of it.
The normal methodology for setting tags is valid here
via a route-map.
Finally this is only applied if the --enable-realms configure
option is applied.
Signed-off-by: Kaloyan Kovachev <kkovachev@varna.net>
Setup a interface such that the add/del of lsp's from
the kernel can have a callback for success/failure.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When a route is installed or deleted into the kernel allow a
callback mechanism to handle the success/failure of
the kernel call.
This separation is to allow us to do these things:
1) In the future create a true pthread to handle route
install/deletes. This way we can schedule these
events in a smarter fashion
2) Allow us to use a common southbound api for route
install and deletion.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add a daemon that will allow us to test the zapi
as well as test route install/removal times from
the kernel.
The current commands are:
install route <starting ip address> nexthop <nexthop> (1-1000000)
This command starts installing at <starting ip address>/32
(1-100000) routes that it auto-increments by 1
Installation start time is noted in the log and finish
time is noted as well.
remove routes <starting ip address> (1-1000000)
This command removes routes at <starting ip address>/32
and removes (1-100000) routes created by the install route
command.
This code can be considered experimental and *is not*
something that should be run in a production environment.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The v6 linux kernel netlink code doees not have
route replace semantics. So if we are in that
situation, do a delete/add to get the correct
results.
Fixes: #1461
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The function clear_nhlfe_installed is to be called
when we get a install failure of some sort for
a lsp change. Since an install failure can happen
in both linux and openBSD moving the function call
northbound is a good idea.
I've also added it to the kernel_del_lsp for completeness
on failure as well, even though neither linux or openBSD
currently can fail a uninstall.
This still leaves the hole where if we have multiple
nhlfes and have an install failure we are not quite
doing the right thing by just blanketly calling
clear_nhlfe_installed.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem reported with not deleting LSPs from the zebra kernal mpls table
when a delete occurred in bgp. Found that we were exiting the delete
process incorrectly due to not being able to derive the route_type from
the best_nhlre on the lsp while deleting. Since this info was only
needed for route installation, removed this early exit in the case of
deleting the lsp.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Ticket: CM-18309
Reviewed By: CCR-6781
Testing Done: Manual testing looks good. mpls tests successful
This is the definitive solution to avoid build issues on old Linux
systems, where the system kernel headers might not contain some constants
or macros used by FRR (e.g. MPLS_IPTUNNEL_DST, introduced on 2015).
This is the same strategy adopted by other projects, like iproute2,
libnl, lldpd, strongswan, etc. These header files don't need to be in
sync with upstream, they only need to be updated when necessary (e.g. if
we want to use a new feature introduced by a recent kernel).
Fixes#962 using the solution suggested by David Lamparter.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
* use %u instead of %d, we don't want to print negative labels;
* increase the size of label_buf to accommodate the worst case scenarios;
* use strlcat() instead of strcat() as a security best practice.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Fix following flaws that resulted in EVPN with L3 multi-tenancy (i.e.,
EVPN dealing with VxLAN routing in the presence of tenant VRFs) not
working properly:
1. EVPN enable ("advertise-all-vni") is a global command, ensure it is
accordingly processed. The config is maintained against the default VRF.
2. There was an incorrect attempt to derive the L3 VRF for L2 interfaces
- the VRF only applies for L3 interfaces, though the code may initialize
to the default value in other cases.
3. Functions to map (port, VLAN) to SVI or vice versa were incorrect -
particularly, zvni_map_svi() since it was looking in the L3 VRF for
"matching" L2 interface which it would never find. Fix.
In addition, since the 'zebra_vrf *' parameter is not relevant in most
places, it has been removed.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ticket: CM-17840
Reviewed By: CCR-6685
Testing Done: evpn-smoke, various manual tests
This is a continuation of 915902cb82. Basically the netlink
read of messages up from the kernel is now noticing the proper
owner of the route. As such when rib_delete was being called
as part of the upcall from the kernel we were not noticing that
we were the originator and not diss-allowing the rib_delete
from happening. This restores this behavior that we were getting
pre-915902cb82cfd
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
For ZEBRA_ROUTE_KERNEL types:
The metric/priority of the route received from the kernel
is a 32 bit number. We are going to interpret the high
order byte as the Admin Distance and the low order 3 bytes
as the metric.
This will allow us to do two things:
1) Allow the creation of kernel routes that can be
overridden by zebra.
2) Allow the old behavior for 'most' kernel route types
if a user enters 'ip route ...' v4 routes get a metric
of 0 and v6 routes get a metric of 1024. Both of these
values will end up with a admin distance of 0, which
will cause them to win for the purposes of zebra.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This fixes the broken indentation of several foreach loops throughout
the code.
From clang's documentation[1]:
ForEachMacros: A vector of macros that should be interpreted as foreach
loops instead of as function calls.
[1] http://clang.llvm.org/docs/ClangFormatStyleOptions.html
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
There exists situations where it is possible to have duplicate
nexthops passed from a higher level protocol into zebra.
This code notices this duplication of nexthops and marks
the duplicates as DUPLICATE so we don't attempt to install
it into the kernel.
This is important on *BSD as I understand it because passing
duplicate nexthops will cause the route to be rejected.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
So the current code for a blackhole route assumed that you
would never want a recursively resolved blackhole to work.
Suppose you have this setup:
1) ip route 192.0.2.1/32 Null0
2) BGP installed with a route-map that rewrites the
nexthop to 192.0.2.1.
Zebra will end up with a recursive nexthop that resolves
to the blackhole.
The original rib install function assumed that we would never
want the ability to recursively resolve a blackhole route.
Instead just handle the blackhole as part of the nexthop_num = 1
case.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
With the change to make zebra pass routes to the kernel
with the 'correct' proto name, it caused zebra to
not properly recognize them on startup again
the next time such that the route would not
be deleted.
Modify rt_netlink.c to notice that we have a
self originated route and to properly mark
the type of route it was.
Modify rib_table_sweep to mark the nexthops
as active so that when we go to delete the
self originated routes it would properly
delete from the kernel.
Fixes: #1061
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If we've set the bh_type to something besides BLACKHOLE_UNSPEC
due to the received route type being RTN_BLACKHOLE,
RTN_UNREACHABLE or RTN_PROHIBIT then just trust that
the nexthop is just what it is and set accordingly.
Fixes: #1082
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
support processing of RTN_BLACKHOLE et al. from kernel and dump them
into appropriate blackhole rib entries.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
blackhole support was horribly broken. cleanup by removing blackhole
stuff from ZEBRA_FLAG_*
introduces support for "prohibit" routes (Linux/netlink only)
also clean up blackhole options on "ip route" vty commands.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Frr has an assumption that when interface A links to B,
we already know about B. But that might be true always.
It is probably purely depends on the configuration
and how the interfaces are hashed in Kernel.
FRR seems to sometimes get "A is linked to B" before it knows about B,
in that case, the linkage between the data structure for A & B won't be proper.
Ticket: CM-17679
Review: ccr-6628
Testing: Manual
Signed-off-by: Mitesh Kanjariya <mitesh@cumulusnetworks.com>
When the linux kernel adds/deletes routes, the
metric is important, but our routing protocols
add/delete in a slightly different manner,
so allow kernel metrics to match so that our
rib matches the kernel's fib.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
RTA_PAYLOAD() return value depends on the platform bits.
make[5]: Nothing to be done for 'all-am'.
Making all in zebra
CC rt_netlink.o
../../zebra/rt_netlink.c: In function 'netlink_macfdb_change':
../../zebra/rt_netlink.c:1695:63: error: format '%ld' expects argument of type 'long int', but argument 7 has type 'unsigned int' [-Werror=format=]
"%s family %s IF %s(%u) brIF %u - LLADDR is not MAC, len %ld",
^
../../zebra/rt_netlink.c: In function 'netlink_ipneigh_change':
../../zebra/rt_netlink.c:2024:57: error: format '%ld' expects argument of type 'long int', but argument 6 has type 'unsigned int' [-Werror=format=]
"%s family %s IF %s(%u) - LLADDR is not MAC, len %ld",
^
Signed-off-by: Jorge Boncompte <jbonor@gmail.com>
The pimregX devices when created by the kernel are put into
the default vrf. When pim gets the callback that the device
exists, check to see if it is a pimregX device and if so
move it into the appropriate vrf.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This current implementation unfortunately must
ask the kernel for all mroutes because vrf's
do not have the ability to request a single
mroute at this time.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Implement support for sticky (static) MACs. This includes the following:
- Recognize MAC is static (using NUD_NOARP flag) and inform BGP
- Construct MAC mobility extended community for sticky MACs as per
RFC 7432 section 15.2
- Inform to zebra that remote MAC is sticky, where appropriate
- Install sticky MACs into the kernel with the right flag
- Appropriate handling in route selection
Signed-off-by: Daniel Walton <dwalton@cumulusnetworks.com>
Reviewed-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Implement handling of MACs and Neighbors (ARP/ND entries) in zebra:
- MAC and Neighbor database handlers
- Read MACs and Neighbors from the kernel, when needed and create
entries in zebra's MAC and Neighbor databases.
- Handle add/update/delete notifications from the kernel for MACs and
Neighbors and update zebra's database appropriately
- Inform locally learnt MACs and Neighbors to client
- Handle MACIP add/delete from client and install appriporiate entries
into the kernel
- Since Neighbor entries will be installed on an SVI, implement the
needed mappings
NOTE: kernel interface is only implemented for Linux/netlink
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Implement fundamental handling for VNIs and VTEPs:
- Handle EVPN enable/disable by client (advertise-all-vni)
- Create/update/delete VNIs based on VxLAN interface events and inform
client
- Handle VTEP add/delete from client and install into kernel
- New debug command for VxLAN/EVPN
- kernel interface (Linux/netlink only)
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
For NHRP, EIGRP and LDP( This is for consistency as opposed to correctness )
assign some new values to routes to be installed into the kernel
so we can know who owns them later.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The 'struct rib' data structure is missnamed. It really
is a 'struct route_entry' as part of the 'struct route_node'.
We have 1 'struct route_entry' per route src. As such
1 route node can have multiple route entries if multiple
protocols attempt to install the same route.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Rearrange the _netlink_route_build*() functions so the labels of the
nexthops are always installed, even for IPv4 routes with IPv6 nexthops.
Fixes Labeled Unicast with BGP Unnumbered.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
the ipv4_ll address used for 5549 routes does not need
to be figured out every single time that we attempt
to install/remove a route of that type.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When zebra issues read (GET) requests to the kernel using the netlink
interface, it is incorrect to format all of them in a generic manner
using 'struct ifinfomsg' or 'struct rtgenmsg'. Rather, messages for a
particular entity (e.g., routes) should use the corresponding structure
for encoding (e.g., 'struct rtmsg'). Of course, this has to correlate
with what the kernel expects.
In the absence of this, there is the possibility of sending extraneous
information in the request which the kernel wouldn't like.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
The FSF's address changed, and we had a mixture of comment styles for
the GPL file header. (The style with * at the beginning won out with
580 to 141 in existing files.)
Note: I've intentionally left intact other "variations" of the copyright
header, e.g. whether it says "Zebra", "Quagga", "FRR", or nothing.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Ticket: CM-14313
Reviewed By:
Testing Done: bgpmin, ospfmin, bgp_kitchen_sink_test
'ip route show' displays all routes as belonging to protocol zebra.
The user has to run an additional command (in vtysh) to get the actual
source of a route (bgp/ospf/static etc.). This patch addresses that by
pushing the appropriate protocol string into the protocol field of the
netlink route update message. Now you can see routes with the correct
origin as well as filter on them (ip route show proto ospf).
'ospf' is used for both IPv4 and IPv6 routes, even though the OSPF
version is different in both cases.
Sample output (old):
9.9.12.13 via 69.254.2.38 dev swp3.2 proto zebra metric 20
9.9.13.3 proto zebra metric 20
nexthop via 69.254.2.30 dev swp1.2 weight 1
nexthop via 69.254.2.34 dev swp2.2 weight 1
nexthop via 69.254.2.38 dev swp3.2 weight 1
Sample output (new):
9.9.12.13 via 69.254.2.38 dev swp3.2 proto bgp metric 20
9.9.13.3 proto bgp metric 20
nexthop via 69.254.2.30 dev swp1.2 weight 1
nexthop via 69.254.2.34 dev swp2.2 weight 1
nexthop via 69.254.2.38 dev swp3.2 weight 1
The kernel can send a DELROUTE with a individual
nexthop. Technically this is meant to delete that
individual nexthop from the route but zebra
has no way to do this currently. So we just delete
the route.
V4 -> Never sends a DELROUTE with multiple nexthops
as a way to modify the rib. It sends a a NEWROUTE
with RTM_REPLACE with the new appropriate route.
V6 -> Sends a DELROUTE with multiple nexthops
which is supposed to be interpreted as a
subtraction from the route.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In the near future it will be possible to recieve v6 multipath netlink
messages. This code change is in prep for it. In the meantime the
v6 code path will continue to work as per normal.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The reading if unicast routes from the kernel acts subtly differently
between reading in the routes from the kernel on startup and
reading a new route or getting a response for a route.
Add startup flag(currently ignored) so that we can start
consolidating the functionality.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When starting up bgp and zebra now, you can specify
-e <number> or --ecmp <number>
and that number will be used as the maximum ecmp
that can be used.
The <number specified must be >= 1 and <= MULTIPATH_NUM
that Quagga is compiled with.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This reverts commit 1a11782c40.
The change is not suitable for stable/2.0, it's not a bugfix and has
quite a visible user impact.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Acked-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem reported was stale routes left in the kernel in certain cases
when overlapping static routes were used and links were bounced. The
problem was determined to be an issue where the nexthop was changed
due to recursion as the link is going down, and the next-hop at the
time of deletion doesn't match what was previously installed by the
kernel. This caused the kernel to reject the deletion and the route
stuck around.
It was pointed out that the kernel doesn't actually require a next-hop
value on the netlink deletion call. In this fix, we are eliminating
the nexthop for RTM_DELROUTE messages to the kernel in the ipv4 singlepath
case. This approach could also be valid for other cases but the fix
as is resolved the reported failure case. More testing should be
performed before similar changes are made for other cases.
Testing included manual testing for the failure condition as well as
complete bgp-smoke and ospf-smoke tests with no new failures.
Ticket: CM-13328
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: CCR-5562
Ticket: CM-14313
Reviewed By:
Testing Done: bgpmin, ospfmin, bgp_kitchen_sink_test
'ip route show' displays all routes as belonging to protocol zebra.
The user has to run an additional command (in vtysh) to get the actual
source of a route (bgp/ospf/static etc.). This patch addresses that by
pushing the appropriate protocol string into the protocol field of the
netlink route update message. Now you can see routes with the correct
origin as well as filter on them (ip route show proto ospf).
'ospf' is used for both IPv4 and IPv6 routes, even though the OSPF
version is different in both cases.
Sample output (old):
9.9.12.13 via 69.254.2.38 dev swp3.2 proto zebra metric 20
9.9.13.3 proto zebra metric 20
nexthop via 69.254.2.30 dev swp1.2 weight 1
nexthop via 69.254.2.34 dev swp2.2 weight 1
nexthop via 69.254.2.38 dev swp3.2 weight 1
Sample output (new):
9.9.12.13 via 69.254.2.38 dev swp3.2 proto bgp metric 20
9.9.13.3 proto bgp metric 20
nexthop via 69.254.2.30 dev swp1.2 weight 1
nexthop via 69.254.2.34 dev swp2.2 weight 1
nexthop via 69.254.2.38 dev swp3.2 weight 1
Signed-off-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Check and read the IPv6 source prefix on ZAPI messages, and pass it down
to the RIB functions (which do nothing with it yet.) Since the RIB
functions now all have a new extra argument, this also updates the
kernel route read functions to supply NULL.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
The code to collect the sg stats was written for linux.
Abstract the call to allow it to work on all platforms.
I have not implemented the call for non-linux systems.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Fully decode mcast messages from the kernel. We are not
doing anything with this at the moment, but that will
change.
Additionally convert over to using lookup for
displaying the route type.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The netlink_talk call sends a message to the kernel, which
with netlink_talk_filter only waits for the ACK.
It would be nice to have the ability to specify what the handler
function would be for when we send queries about mcast S,G routes
so that we can gather the data returned from the kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There's no need to duplicate the 'vrf_id' and 'name' fields from the 'vrf'
structure into the 'zebra_vrf' structure. Instead of that, add a back
pointer in 'zebra_vrf' that should point to the associated 'vrf' structure.
Additionally, modify the vrf callbacks to pass the whole vrf structure
as a parameter. This allow us to make further simplifications in the code.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
[DL: picked out from: "atomic FIB updates"]
This simplifies the OS-specific route update API into a single entry
point, kernel_route_rib(), which dispatches the various operations
internally.
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
In linux, 'scope' is a hint of distance of the IP. And this is
evident from the fact that only lower scope can be used as recursive
via lookup result. This changes all interface routes scope to link
so kernel will allow regular routes to use it as via. Then we do
not need to use the 'onlink' attribute.
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
This patch installs labeled static routes in the FIB. The routes are installed
using the RTA_ENCAP (and RTA_ENCAP_TYPE) nested attributes.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ticket: CM-6040
Reviewed By: CCR-3091
Testing Done: Tested in SE-1, brief manual testing now
Install the statically configured LSPs into the FIB (kernel). This is done
using the new attributes and definitions for MPLS in the kernel -
RTA_VIA, RTA_NEWDST and AF_MPLS.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ticket: CM-4804
Reviewed By: CCR-3088
Testing Done: Manual in SE-1
The alignment of nlmsg_len is calculated wrong leading to wrong rta_len
calculations for nested TLVs when the data length of the last TLV added
to the nested TLV is not aligned to RTA_ALIGNTO already. Use same fix
that was implemented in iproute2 by Thomas Graf circa 2005. A reference
to the fix is at
http://oss.sgi.com/archives/netdev/2005-03/msg03103.html.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ticket: CM-6491
Reviewed By: CCR-3087
Testing Done: MPLS testing with other patches in SE-1
Note: Prior to MPLS, we didn't face this problem as we haven't really had
any nested TLVs; even if RTA_MULTIPATH were to be considered a nested TLV,
it didn't have any non-aligned fields.
This is a rather large mechanical commit that splits up the memory types
defined in lib/memtypes.c and distributes them into *_memory.[ch] files
in the individual daemons.
The zebra change is slightly annoying because there is no nice place to
put the #include "zebra_memory.h" statement.
bgpd, ospf6d, isisd and some tests were reusing MTYPEs defined in the
library for its own use. This is bad practice and would break when the
memtype are made static.
Acked-by: Vincent JARDIN <vincent.jardin@6wind.com>
Acked-by: Donald Sharp <sharpd@cumulusnetworks.com>
[CF: rebased for cmaster-next]
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
This removes the BSD specific usage of struct sockaddr_dl
hardware address. This unifies to use explict hw_addr member for
the address, and zebra specific enumeration for the link layer
type.
Additionally the zapi is updated to never send platform specific
structures over the wire, but the ll_type along with hw_addr_len
and hw_addr are now sent for all platforms.
Based on initial work by Paul Jakma.
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
# Please enter the commit message for your changes. Lines starting
# with '#' will be kept; you may remove them yourself if you want to.
# An empty message aborts the commit.
#
# Author: Timo Teräs <timo.teras@iki.fi>
#
# rebase in progress; onto 9c2f85d
# You are currently editing a commit while rebasing branch 'renato' on '9c2f85d'.
#
# Changes to be committed:
# modified: isisd/isis_circuit.c
# modified: lib/if.c
# modified: lib/if.h
# modified: lib/zclient.c
# modified: zebra/interface.c
# modified: zebra/interface.h
# modified: zebra/kernel_socket.c
# modified: zebra/rt_netlink.c
# modified: zebra/rtadv.c
# modified: zebra/zserv.c
#
# Untracked files:
# "\033\033OA\033OB\033"
# 0001-bgpd-fix-build-on-Solaris.patch
# ldpd/
# redhat/ldpd.init
# redhat/ldpd.service
# tags
#
This commits allow overriding MTU using netlink attributes on
per-route basis. This is useful for routing protocols that can
advertice prefix specific MTUs between routers (e.g. NHRP).
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
(cherry picked from commit b11f3b54c842117e22e2f5cf1561ea34eee8dfcc)
NS_DEFAULT is #defined to 0, We are passing it
in to a function that is taking 'struct zebra_ns *'
which is translating into a NULL pointer. Which
in some situations will cause a crash.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Daniel Walton <dwalton@cumulusnetworks.com>
Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
(cherry picked from commit 1e9fa2763953adc603c3acc4ed2a46c9e72cbb29)
We shouldn't have platform-agnostic code (e.g. zebra/interface.c)
calling platform-specific functions (e.g. netlink_neigh_update).
This commit introduces the kernel_neigh_update() function, which then
has to be implemented by all supported platforms. Currently only Linux
implements this function, which is only used by the RTADV code.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Resolved several memory leaks caused by ifdown/ifup the vrf device or
a swp port. For bgp/zebra/ospf/ospf6, bouncing the vrf device would cause
a linked list, Interface, and route-table to get leaked. For ospf6,
bouncing the swp device also caused leaks of Connected and Prefix entries.
Ticket: CM-10841
Signed-off-by: Don Slice
Reviewed-By: Donald Sharp
Testing Done: Manual testing, bgp and ospf mins passed, smokes had fewer failures than base
Add command and associated functionality to enable dumping
raw netlink messages.
Ticket: CM-6568
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
The patches to allow kernel v6 Route Replacement semantics
to work correctly are on a very recent kernel. If you are
compiling on a linux kernel where it's broken, just
compile with --disable-rr-semantics.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When zebra receives a recvmsg buffer from the kernel
silently exit so that watchquagga will notice and then
restart zebra.
Ticket: CM-11130
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
This makes code more robust, consice and readable.
Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
(cherry picked from commit be6335d682c5ee1b6930345193eda875705fbab2)
In _netlink_route_build_multipath():
- Each time when appending a IPv4 gateway in the message, rtnh_len
is increased by sizeof (struct rtattr) + 4, where we should use
"bytelen" instead of the hard coding "4".
- As what done for IPv4, we should increase rtnh_len accordingly
along with adding a IPv6 gateway, or else the IPv6 gateways will
be lost.
Signed-off-by: Feng Lu <lu.feng@6wind.com>
Reviewed-by: Alain Ritoux <alain.ritoux@6wind.com>
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
(cherry picked from commit 621e2aaf33d8ab73bf44b0eea3f3900135d34996)
Conflicts:
zebra/rt_netlink.c
The only user of this was rib_bogus_ipv6(), which was removed in the
previous commit. Good riddance.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Acked-by: Greg Troxel <gdt@ir.bbn.com>
Acked-by: Feng Lu <lu.feng@6wind.com>
Acked-by: Paul Jakma <paul@jakma.org>
(cherry picked from commit 51bdebad99fe813d1b7104543b352f0e39b1c8dc)
CM-10680
Issue: When BGP daemon is stopped, all the BGP BFD sessions are not getting deleted from PTM.
Root cause: BGP daemon stop causes BFD de-register message to be sent for every peer on which BFD is enabled. But, all the de-register messages from bgpd to zebra are not processed before the socket close. This results in some stale BGP BFD sessions.
Fix: Support for client de-register message has been added in PTM/BFD. Changes in Quagga to support BFD client de-registrations:
− The BFD clients de-registration is sent directly from zebra daemon when zebra client (bgpd, ospfd and ospf6d) socket close is detected.
− Introduced a BFD flag for the zebra clients to prevent BFD de-registration messages from being sent to zebra daemon when the client is shutting down. This reduces the BFD messaging.
CM-10540
Issue: Invalid ptm status “fail” instead of “n/a” being displayed for VRF interfaces.
Root cause: ptm status is not being initialized to “unknown” status when VRF interface is added or changed. The uninitialized value is ‘0’ which is the value for “fail”
Fix: Initialized the ptm status to the correct value.
Signed-off-by: Radhika Mahankali <radhika@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Kanna Rajagopal <kanna@cumulusnetworks.com>
Ticket: CM-10680, CM-10540
Reviewed By: CCR-4653
Testing Done: PTM smoke, BGP smoke and ptmd_test.py:TestMultipleAddrsIntfOspfBgp
Instead of turning on IPv6 RA on every interface as soon as it has an IPv6
address, only enable it upon configuration of BGP neighbor. When the BGP
neighbor is deleted, signal that RAs can be turned off.
To support this, introduce new message interaction between BGP and Zebra.
Also, take appropriate actions in BGP upon interface add/del since the
unnumbered neighbor could exist prior to interface creation etc.
Only unnumbered IPv6 neighbors require RA, the /30 or /31 based neighbors
don't. However, to keep the interaction simple and not have to deal with
too many dynamic conditions (e.g., address deletes or neighbor change to/from
'v6only'), RAs on the interface are triggered upon any unnumbered neighbor
configuration.
BGP-triggered RAs will cause RAs to be initiated on the interface; however,
if BGP asks that RAs be stopped (upon delete of unnumbered neighbor), RAs
will continue to be exchanged if the operator has explicitly enabled.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ticket: CM-10640
Reviewed By: CCR-4589
Testing Done: Various manual and automated (refer to defect)
Quagga does not have proper recovery for route install failure (in
the kernel). The lack of this may not be a significant issue if the
failure is only an exception. However, the introduction of route
replace presents a new failure scenario which was not there earlier.
Before replace, the update operation involved a delete followed by
add; the failure of add would not leave hanging route entries in the
kernel as they would've got deleted first. With route replace, if
the replace fails, recovery action to delete the route is needed, else
the route remains hanging in the kernel.
In particular, with VRFs and in the presence of ECMP/multipath, a
failure mode exists where Quagga thinks that routes have been cleaned
up and deleted from the kernel but the kernel continues to retain them.
This happens when multiple VRF interfaces are moved from one VRF to
another.
This patch addresses this scenario by implementing proper recovery for
route install failure.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ticket: CM-10361
Reviewed By: CCR-4566
Testing Done: bgp-min, ospf-min, bgp-smoke, ospf-smoke and manual
Note: There are some test failures and results aren't consistent across
runs; Daniel has resolved many of these through other fixes.
IFLA_INFO_SLAVE_KIND is a new type of netlink message
If the kernel makes it available compile in support
else we'll just silently do the right thing.
Additionally reduce the test cases for netlink by 1
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: David Ahern <dsa@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Move zebra_vrf_XXX functionality into it's own
file so that we can isolate a bit the api edges
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
The struct zebra_ns was littered throughout the code
base in a half-hazard fashion. Gather up the references
and isolate the code a bit better.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
We were including 'extern struct zebra_t zebrad;' all
over the place. This made no sense. Refactor
into zserv.h where the definition was and remove resulting
unnecessary code.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
The vrf_add_update function does not need to exist.
Move it's constituent parts into the appropriate
vrf_create/vrf_enable functionality as well as
move the zebra_vrf_add_update() function call
into zebra_vrf_enable()
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
vrf_delete_update really belongs in vrf.c broken up
into it's appropriate places.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Zebra in rt_netlink.c has a while (1) loop that handles
recvmsg from the netlink socket. In early bootup a
situation can occur whereby the netlink messages
take a long time to parse. This time starts to
take an exponential amount of time the more netlink
messages that you read in. There reaches
a point where the incoming netlink messages are
coming in at about the same rate that they are processed.
This ends up causing the while (1) loop to never exit.
Eventually this causes quagga to fail when the watchdog message
is never sent to systemd.
This patch attempts to address this deficiency in that
we allow for a pause from reading in netlink messages
to allow other work to be done. This pause drains
the work queue items created by the netlink received
data and allows zebra to respond to other system input.
I believe we will need to come back in and modify zebra
startup a bit more. There are ineffiencies that need
to be addressed as part of boot up.
Ticket: CM-9992
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Wilson Kok <wkok@cumulusnetworks.com>
Zebra code was not handling larger table-ids correctly. There were 2 issues:
a) In the netlink interface, RTA_TABLE was never sent or processed. This
pretty much limited the table-ids that zebra could understand to < 255.
b) In the interface into the zebra RIB (in particular for protocols), there
were some incorrect checks that again assumed the table id should be < 252
or be "main". This is valid only for the Default VRF (for now), for other
VRFs, the table-id should be the value learnt from the kernel.
These two issues are addressed with this change.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Ticket: CM-10087, CM-10091
Reviewed By: CCR-4359
Testing Done: Manual
zebra was not actually deleting the vrf passed in.
Ticket: CM-9412
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
Restrict interfaces on which IPv6 Router Advertisements are allowed. The list
excludes loopback interfaces including the VRF device interface; specific to
Cumulus, it also includes "switch0" and "ethX" interfaces.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Ticket: CM-9849
Reviewed By: CCR-4334
Testing Done: Manual
Invoke VRF change for an interface, if appropriate, upon netlink
notification.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Ticket: CM-9527
Reviewed By: CCR-4174
Testing Done: Manual tests of various scenarios
Modified response to netlink message for VRF creation, allowing it to be
created as an interface and setting the right vrf_id and bringing in the ip address.
Ticket: CM-9277
Signed-off-by: Don Slice
Reviewed-by: Vivek Venkatraman
The earlier change to ignore status for VRF device was not quite perfect. As
defect CM-9437 illustrates, there are situations when Quagga may get a VRF
member interface (that refers to the VRF id of the VRF device) before it gets
the VRF device itself. The code has some logic to handle this, creating a
VRF structure which is partly initialized. The initialization is completed
with some additional incorrect status processing when the VRF is learnt. The
fix done earlier completely ignored the VRF message treating it as a status
change because the VRF is already present, but this left the VRF structure
not fully initialized in Quagga. The fix is to do some additional checks
to handle this scenario.
Fixes: 3e66be2ee6
Ticket: CM-9437
Reviewed By: None
Testing Done: Reproduced problem, verified fix.
Temporary change to ignore status change for a VRF device as it is
incorrectly implemented now. When VRF is also supported as an
interface, the status change will be handled for the interface.
Ticket:
Reviewed By:
Testing Done:
Since the netlink socket is per namespace and not per VRF, do not
invoke vrf_socket().
Note: This needs to be changed when we support multiple namespaces -
needed only for upstreaming.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ticket: CM-9206
Reviewed By: CCR-4127
Testing Done:
When enabling IPv6 Router Advertisements automatically based on the
presence of IPv6 address on an interface, do it only for relevant
interfaces.
Note: This needs a configure option for completion.
Ticket: CM-9358
Reviewed By: CCR-4116
Testing Done: Manual verification
vrf: check netlink message for slave info and set the vrf-id accoringly
When a netlink newlink or link change comes into zebra, check the IFLA_INFO_SLAVE_KIND
to discover if the interface is a member of a vrf or not. Set the vrf-id to the correct
value if the interface is a slave member
Signed-off-by: Don Sice
Reviewed-by:
When a slave device is received via netlink, all the
devices were being treated as vrf's instead of the
myriad of slave devices that are possible.
Add code to check to see if the device is truly a vrf slave
or not.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Fixup the debug handling of vrf's to be a bit
more explicit how we create a vrf internally.
Add code to turn on/off debugging of vrf's.
Ticket: CM-9063
Testing: Manual
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Convert the rest of zebra over to use a Namespae and VRF.
Signed-off-by: Vipin Kumar <vipin@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The NEXTHOP_TYPE_XXX_IFNAME types were never being used. Remove them
and the code associated with them.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
During CR for nexthop upstream it was noticed that usage
of prefix2str was not consistent. This fixes this problem
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
At init time, Zebra queries the kernel for all interfaces. At this time,
an IPv6 address may exist on an interface but IPv6 DAD has not completed,
or it could be that DAD has failed. Zebra should examine the flags on
the address and act accordingly. Otherwise, it may end up with addresses
and routes which are not actually valid in the kernel, and this may lead
to, for example, BGP attempting neighbor connections on an interface on
which the source IPv6 address is not yet valid.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Ticket: CM-7176
Reviewed By: CCR-3815
Testing Done: Verify failed test (needs temporary test change)
Note: Imported from 2.5-br patch zebra-handle-ipv6-addr-status.patch.
Zebra currently performs a delete followed by add when a route needs to be
modified. Change this to use the replace semantics of netlink so that the
operation can possibly be atomic.
Note: This patch handles IPv6 routes, IPv4 already performs a replace.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Ticket: CM-5597
Reviewed By: CCR-3407
Testing Done: Manual testing of various scearnios (Vivek, Satish)
Note: This is an import of patch zebra-ipv6-route-replace.patch from 2.5-br.
The code has tests to see if the MULTIPATH_NUM == 0 and to
treat it like the user has entered 'Maximum PATHS'.
This 0 is treated as 64 internally. Remove this dependency
and setup MULTIPATH_NUM to 64 when --enable-multipath=0 from
the configure cli.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>
This patch lets the netlink sockets work per VRF.
* The definition of "struct nlsock" is moved into zebra/rib.h.
* The previous global variables "netlink" and "netlink_cmd" now
become the members of "struct zebra_vrf", and are initialized
in zebra_vrf_alloc().
* All relative functions now work for a specific VRF, by adding
a new parameter which specifies the working VRF, except those
functions in which the VRF ID can be obtained from the interface.
* kernel_init(), interface_list() and route_read() are now also
working per VRF, and moved from main() to zebra_vrf_enable().
* A new function kernel_terminate() is added to release the
netlink sockets. It is called from zebra_vrf_disable().
* Correct VRF ID, instead of the previous VRF_DEFAULT, are now
passed to the functions of processing interfaces or route
entries.
Signed-off-by: Feng Lu <lu.feng@6wind.com>
Reviewed-by: Alain Ritoux <alain.ritoux@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Donald Sharp <sharpd@cumulusnetworks.com>
Conflicts:
lib/memtypes.c
zebra/rib.h
zebra/rt_netlink.c
Conflicts:
zebra/if_netlink.c
zebra/if_sysctl.c
zebra/kernel_null.c
zebra/rib.h
zebra/rt_netlink.c
zebra/rt_netlink.h
A new member "vrf_id" is added to "struct rib", reflecting the VRF
which it belongs to.
A new parameter "vrf_id" is added to the relative functions where
need, except those:
- which already have the parameter "vrf_id"; or
- which have a parameter in type of "struct rib"; or
- which have a parameter in type of "struct interface".
All incoming routes are set to default VRF.
In fact, all routes in FIB are kept in default VRF. And the logic
is not changed.
Signed-off-by: Feng Lu <lu.feng@6wind.com>
Reviewed-by: Alain Ritoux <alain.ritoux@6wind.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Acked-by: Vincent JARDIN <vincent.jardin@6wind.com>
[DL: conflicts fixed + compile warning fix]
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Conflicts:
zebra/connected.c
zebra/kernel_socket.c
zebra/rib.h
zebra/rt_netlink.c
zebra/zebra_rib.c
zebra/zserv.c
Conflicts:
zebra/connected.c
zebra/interface.c
zebra/kernel_socket.c
zebra/rib.h
zebra/rt_netlink.c
zebra/rtread_getmsg.c
zebra/zebra_rib.c
zebra/zebra_vty.c
zebra/zserv.c
Ticket: CM-6854
Reviewed By: CCR-3297
Testing Done: bgpsmoke, bgpclos to verify setting source (in 2.5-br)
Two pieces prevented the user from specifying a route-map with set src on
IPv4 routes learnt via BGP's RFC 5549 model (v4 prefix with v6 nexthop):
- There was code missing in the section specific to 5549 in setting
the src in the netlink message
- During RIB processing, route-map processing was ignored when the NH
was v6 and the route itself was v4.
As per the code, all route-map processing that uses nexthop validates the
NH type before applying the route-map and so there should be no errors
as a consequence of relaxing bullet 2 above.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ticket: CM-6680
Reviewed-by: CCR-3486
Testing: See bug
In these situations:
(A) user enters under bgp more 'maximum-paths' than zebra is compiled with
warn the user that there is a problem
(B) Zebra receives more maximum paths than what it can handle log the fact
that this happened
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ticket:
Reviewed By: CCR-3335
Testing Done: bgpsmoke, ENHE tests etc.
Add support for filtering routes from upper layer protocols to zebra
via route-maps for IPv6. The same functionality already existed for
IPv4.
In addition, add support for setting source of routes via IPv6 protocol
map.
Signed-off-by: Dinesh G Dutt <ddutt@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Vivek Venkataraman <vivek@cumulusnetworks.com>
Reviewed-by: Vipin Kumar <vipin@cumulusnetworks.com>
Zebra currently performs a delete followed by add when a route needs to be
modified. Change this to use the replace semantics of netlink so that the
operation can possibly be atomic.
Note: Only implemented for IPv4 currently.
be issued to Quagga. Quagga will in turn try to re-add the route(s) back to
the kernel and this will result in an error back from the kernel. This change
is to make sure these error messages are not logged by default. Subsequent
changes will cleanup this handling (to address CM-4577).
Note: This patch should not be upstreamed.