* Implement new dataplane operations
* Convert existing code to use dataplane context object
* Modify function preparing netlink message to use dataplane
context object
Signed-off-by: Jakub Urbańczyk <xthaid@gmail.com>
This commit is the first step to convert IP rule installation to
use dplane thread.
* Add dataplane's internal representation of a pbr rule
* Add dplane stats related to rules
* Introduce a new type of dplane operation
Signed-off-by: Jakub Urbańczyk <xthaid@gmail.com>
Start modifying the OPAQUE zapi message to include optional
unicast destination zapi client info. Add a 'decode' api and
opaque msg struct to encapsulate that optional info.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Change name of an opaque zapi api to 'decode' to align with the
other zapi message parsing apis. Missed that in the original
opaque commits.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Let's prevent nhlfe_alloc from actually returning anything that can fail:
1) nexthop_new -> never returns NULL so checking for NULL here
makes no sense, remove it.
2) lsp not being NULL is a assert condition here as that it's
a precondition for the function to work properly.
3) since nhlfe_alloc cannot return NULL now remove tests
for it in callng functions
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Every time zebra receives a ZEBRA_PW_SET, zebra should call
zebra_evaluate_rnh.
This fixes a race condition where zebra sometimes fails to install a
pseudowire that is 'up', and has a reachable next hop.
Signed-off-by: Karen Schoener <karen@voltanet.io>
Issue:
When BGP sends aggregation routes to zebra, the next hop is black hole.
Then Zebra will try to build the netlink FPM message, but there is no
next hop as it is a black hole route. Then the netlink_route_info_fill
function returns 0. In the result, zebra will crashed in
"assert(data_len)" of zfpm_build_route_updates.
This issue also happen when I create a static black hole route via
staticd.
Fix:
As the netlink message of the blackhole route is legal, it should return
success.
Signed-off-by: Richard Wu <wutong23@baidu.com>
Add initial support to maintain client daemon registrations for
OPAQUE messages. Use the registered zapi client info to forward
copies of OPAQUE messages sent to zebra.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The zapi code processes a batch of incoming messages, using a
fifo. Hand the entire batch into the main zebra handling code,
and let it loop through the individual messages.
Divert the special OPAQUE messages from the normal processing
flow, and offer them to the new zebra_opaque module instead.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add a mutex used to manage the list of zclients. Add a busy
counter to the zapi client session, so that we can use a
client session from another pthread.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add the zebra_opaque module, designed to offload some opaque zapi
message processing to a new, dedicated pthread. Add to the build;
also re-sort the lists of zebra files in subdir.am.
Start, stop, and clean-up the opaque module, integrate with zebra
start and shutdown.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Move some processing of zapi label messages so they can be
handled more efficiently. Handle zapi delete and replace
messages.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Add a 'check' api to hold the code that determines whether an LSP
can be freed or not. Replace calls to the free api with check
calls.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Handle backup nhlfes in LSP zapi messages. Capture backup info
with LSPs, capture backup info in the dataplane LSP processing.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Provide a way for the data plane to indicate pseudowire
status (such as: not forwarding, AC failure).
On a data plane pseudowire install failure, data plane
sets the pseudowire status.
Zebra relays the pseudowire status to LDP.
LDP includes the pseudowire status in the LDP notification
to the LDP peer.
Signed-off-by: Karen Schoener <karen@voltanet.io>
When deleting a p2p address from an interface, include
the destination address. Without this, we don't find the
internal connected datastruct and process the delete
correctly on netlink OSes.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The northbound configuration callbacks should now print error
messages to the provided buffer (args->errmsg) instead of logging
them directly. This will allow the northbound layer to forward the
error messages to the northbound clients in addition to logging them.
NOTE: many callbacks are returning errors without providing any
error message. This needs to be fixed long term.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
`debug zebra packet detail` dumps the full message whereas
it had been dropping exactly 10 bytes, the size of the zebra header
Signed-off-by: Wesley Coakley <wcoakley@cumulusnetworks.com>
Currently zebra when you compile without router advertisements
will just say something like `cannot handle message 42`. Which
is not terribly useful to an end user.
Add some smarts to the zapi message handling to just do nothing
and output a debug if someone has it turned on.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
the interface name was not present in the hook in charge of updating the
interface context to the registered hook service. For that, update the
name before informing it.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
this is used when parsing the newly network namespaces. actually, to
track the link of some interfaces like vxlan interfaces, both link index
and link nsid are necessary. if a vxlan interface is moved to a new
netns, the link information is in the default network namespace, then
LINK_NSID is the value of the netns by default in the new netns. That
value of the default netns in the new netns is not known, because the
system does not automatically assign an NSID of default network
namespace in the new netns. Now a new NSID of default netns, seen from
that new netns, is created. This permits to store at netns creation the
default netns relative value for further usage.
Because the default netns value is set from the new netns perspective,
it is not needed anymore to use the NETNSA_TARGET_NSID attribute only
available in recent kernels.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
the walk routine is used by vxlan service to identify some contexts in
each specific network namespace, when vrf netns backend is used. that
walk mechanism is extended with some additional paramters to the walk
routine.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when duplicate address detection is observed, some incrementation,
some timing mechanisms need to be done. For that the main evpn
configuration is retrieved. Until now, the VRF that was storing the dad
config parameters was the same VRF that hosted the VXLAN interface. With
netns backend, this is not true, as the VXLAN interface is in the
same VRF as the bridge interface. The modification takes same definition
as in BGP, that is to say that there is a single bgp evpn instance, and
this is that instance that will give the correct config settings.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
this change is needed when a MAC/IP entry is learned by zebra, and the
entry happens to be in a different namespace. So that the entry be
active, the correct vni match has to be found.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
all network namespaces are read so as to collect interesting fdb and
neighbor tables for EVPN.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
this information is necessary for local information, because the
interface associated to the mac address is stored with its ifindex, and
the ifindex may not be enough to get to the right interface when it
comes with multiple network namespaces.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when working with vrf netns backend, two bridges interfaces may have the
same bridge interface index, but not the same namespace. because in vrf
netns backend mode, a bridge slave always belong to the same network
namespace, then a check with the namespace id and the ns id of the
bridge interface permits to resolve correctly the interface pointer.
The problem could occur if a same index of two bridge interfaces can be
found on two different namespaces.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
when receiving a netlink API for an interface in a namespace, this
interface may come with LINK_NSID value, which means that the interface
has its link in an other namespace. Unfortunately, the link_nsid value
is self to that namespace, and there is a need to know what is its
associated nsid value from the default namespace point of view.
The information collected previously on each namespace, can then be
compared with that value to check if the link belongs to the default
namespace or not.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
to be able to retrieve the network namespace identifier for each
namespace, the ns id is stored in each ns context. For default
namespace, the netns id is the same as that value.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
as remind, the netns identifiers are local to a namespace. that is to
say that for instance, a vrf <vrfx> will have a netns id value in one
netns, and have an other netns id value in one other netns.
There is a need for zebra daemon to collect some cross information, like
the LINK_NETNSID information from interfaces having link layer in an
other network namespace. For that, it is needed to have a global
overview instead of a relative overview per namespace.
The first brick of this change is an API that sticks to netlink API,
that uses NETNSA_TARGET_NSID. from a given vrf vrfX, and a new vrf
created vrfY, the API returns the value of nsID from vrfX, inside the
new vrf vrfY.
The brick also gets the ns id value of default namespace in each other
namespace. An additional value in ns.h is offered, that permits to
retrieve the default namespace context.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
an incoming bridge index has been found, that is linked with vxlan
interface, and the search for that bridge interface is done. In
vrf-lite, the search is done across the same default namespace, because
bridge and vxlan may not be in the same vrf. But this behaviour is wrong
when using vrf netns backend, as the bridge and the vxlan have to be in
the same vrf ( hence in the same network namespace). To comply with
that, use the netnamespace of the vxlan interface. Like that, the
appropriate nsid is passed as parameter, and consequently, the search is
correct, and the mac address passed to BGP will be ok too.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
other network namespaces are parsed because bridge interface can be
bridged with vxlan interfaces with a link in the default vrf that hosts
l2vpn.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
With vrf-lite mechanisms, it is possible to create layer 3 vnis by
creating a bridge interface in default vr, by creating a vxlan interface
that is attached to that bridge interface, then by moving the vxlan
interface to the wished vrf.
With vrf-netns mechanism, it is slightly different since bridged
interfaces can not be separated in different network namespaces. To make
it work, the setup consists in :
- creating a vxlan interface on default vrf.
- move the vxlan interface to the wished vrf ( with an other netns)
- create a bridge interface in the wished vrf
- attach the vxlan interface to that bridged interface
from that point, if BGP is enabled to advertise vnis in default vrf,
then vxlan interfaces are discovered appropriately in other vrfs,
provided that the link interface still resides in the vrf where l2vpn is
advertised.
to import ipv4 entries from a separate vrf, into the l2vpn, the
configuration of vni in the dedicated vrf + the advertisement of ipv4
entries in bgp vrf will import the entries in the bgp l2vpn.
the modification consists in parsing the vxlan interfaces in all network
namespaces, where the link resides in the same network namespace as the
bgp core instance where bgp l2vpn is enabled.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
the link information of vxlan interface is populated in layer 2
information, as well as in layer 2 vxlan information. This information
will be used later to collect vnis that are in other network namespaces,
but where bgp evpn is enabled on main network namespaces, and those vnis
have the link information in that namespace.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
When fetching the next route node in the RIB, skip the empty ones
to avoid calling other northbound callbacks later unnecessarily.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The motivation for this change is that IPv6 link-local routes don't
conform to the zebra YANG module since they all have the same prefix
(fe80::/64), but zebra's YANG module require each route to have
an unique prefix (the key of the "rib" list). This violation can
cause problems when iterating over the RIB asynchronously, so skip
those routes.
At the end of the day nobody cares about link-local routes anyway :)
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
When checking if a nexthop is active, if it has been marked as onlink,
just check on the presence and status of the nexthop's interface. When
handling client request to create a route, if the client says that the
nexthop is onlink, trust it; when internally (in zebra) determining
that the nexthop is onlink, ensure it is only done in the case of an
interface with a /32 IP address which is the case for OSPF unnumbered.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Stephen Worley <sworley@cumulusnetworks.com>
to make sure that c++ code can include them, avoid using reserved
keywords like 'delete' or 'new'.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
This commit implements:
RIB operational list create/destroy.
Walk over RIB tables using keys.
The first RIB table will be IPV4/unicast (table-id 254)
will be fetched.
Create a new api to fetch RIB table based on
afi-safi and table id as the keys.
remove mandatory true statement from the leaf which
is part of the list key.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
L2VPN PW are very hard to determine why they do not come up. The following
fixes expand the existing show commands in ldp and zebra to display a
reason why the PW is in the DOWN state and also display the labeled nexthop
route selected to reach the PW peer. By adding this information it will
provide the user some guidance on how to debug the PW issue. Also fixed an
assert if labels were changed for a PW that is between directly connected
peers.
Signed-off-by: Lynne Morrison <lynne@voltanet.io>
Loosen the ONLINK restrictions such that when an upper
level protocol sends us a nexthop with an ONLINK attribute
just ensure that interface is up and usable. ONLINK effectively
means we know what we are doing to the kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If you haven't migrated your FPM server to use next hop groups, it is
possible that you want to disable this feature. This commit implements
a toggle to enable/disable next hop groups usage (even if your Linux
kernel is not using it).
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Implement the next hop group send on startup if you are using
them. Normally you will only have them if you are already using this
Linux kernel feature.
NOTE: to make sure all next hop groups exist, we send/enqueue all next
hop groups first and then we send routes. The RIB route walk start is
at the end of the function `fpm_nhg_send()`.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Force off kernel NHG install with netns-based VRFs for
now. There is not really a good solution for allowing
kernel nexthop groups in namespaced based vrfs.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When installing a nexthop group, dump out the ifindex of the
nexthop being installed as a bit more data for the developer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This is an implementation of the IS-IS SR draft [1] for FRR.
The following features are supported:
* IPv4 and IPv6 Prefix-SIDs;
* IPv4 and IPv6 Adj-SIDs and LAN-Adj-SIDs;
* Index and absolute labels;
* The no-php and explicit-null Prefix-SID flags;
* Full integration with the Label Manager.
Known limitations:
* No support for Anycast-SIDs;
* No support for the SID/Label Binding TLV (required for LDP interop).
* No support for persistent Adj-SIDs;
* No support for multiple SRGBs.
[1] draft-ietf-isis-segment-routing-extensions-25
Signed-off-by: Olivier Dugeon <olivier.dugeon@orange.com>
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The netlink_vrf_change() function is called both when a VRF device
is created in the Linux kernel and when it is activated. This
commit changes this function to perform the VRF misconfiguration
detection only when the VRF device is created, as doing the check
twice would cause a false positive followed by a hard failure (not
to mention the double check is unnecessary since the VRF table ID
can't change once the device is created).
Fixes#6319.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Commit e93a6fbb4 from PR3908 changed every interface into an
'unnumbered' interface - even interfaces that do not have
ipv4 at all. Undo that.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The function zebra_vxlan_print_neigh_vni_vtep does not create
a json object when json has been requested from the CLI and as a
result it prints out the information in normal CLI format.
Fix is to allocate the json object when required.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
Reported by testing agency that rfc 4861 section 6.2.1 states
that all implementations must have a configuration knob to change
the setting of the advertised retransmit timer sent in RA packets.
This fix adds that capability.
Ticket: CM-29199
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Intermittently, there is a 30 second delay for a LDP pseudowire to become
operational.
One way to reproduce the issue is: Once PW is up, shutdown link to trigger
a change to the pseudowire's next hop, and then restore link to cause
pseudowire to return to original NH.
Problem Descripton:
The Zebra PW manager installs pseudowires in the data plane when the
following two conditions are met:
1. Pseudowire is labeled via LDP mapping messages
2. A labeled NH route exists to reach the remote pseudowire endpoint
The Zebra PW manager registers a NHT callback when a pseudowire is enabled.
This allows the Zebra PW manager to install or reinstall the pseudowire.
The Zebra PW manager deregisters for the NHT callback when the pseudowire is
disabled. When LDP learns the remote-pseudowire status is 'not forwarding',
LDP notifies Zebra that the pseudowire is disabled.
This creates a race condition where a new labeled NH can be resolved after the
Zebra PW manager deregistered for the NHT callback.
For static pseudowires, it makes sense for Zebra PW manager to deregister for
NHT callbacks for disabled pseudowires. Static pseudowires become disabled
via CLI configuration commands.
For LDP pseudowires, the Zebra PW manager should not deregister for NHT
callbacks for disabled pseudowires.
Overview of changes:
1. Zebra PW manager should not deregister for NHT callbacks when an LDP
pseudowire is disabled.
Zebra PW manager will register for NHT callbacks when the LDP pseudowire
is first enabled.
Zebra PW manager will deregister for NHT callbacks when the LDP
pseudowire is deleted.
2. Remove the 30 second timer that was added in PR4122.
PR4122 tried to fix this race condition with a timer.
Once we eliminate the race condition (by keeping the Zebra PW manager
registered for NHT callbacks), this timer can be removed.
3. Zebra PW manager handling of static pseudowires will remain as-is.
Zebra PW manager will register for NHT callbacks when the static
pseudowire is enabled.
Zebra PW manager will deregister for NHT callbacks when the static
pseudowire is disabled.
Signed-off-by: Lynne Morrison <lynne@voltanet.io>
Signed-off-by: Karen Schoener <karen@voltanet.io>
An async route notification can indicate that installation
has failed, but the handling code wasn't dealing with that
possibility correctly.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
These are easy to get subtly wrong, and doing so can cause
nondeterministic failures when racing in parallel builds.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Having a fixed set of parameters for each northbound callback isn't a
good idea since it makes it difficult to add new parameters whenever
that becomes necessary, as several hundreds or thousands of existing
callbacks need to be updated accordingly.
To remediate this issue, this commit changes the signature of all
northbound callbacks to have a single parameter: a pointer to a
'nb_cb_x_args' structure (where x is different for each type
of callback). These structures encapsulate all real parameters
(both input and output) the callbacks need to have access to. And
adding a new parameter to a given callback is as simple as adding
a new field to the corresponding 'nb_cb_x_args' structure, without
needing to update any instance of that callback in any daemon.
This commit includes a .cocci semantic patch that can be used to
update old code to the new format automatically.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Currently the linux kernel allows you to specify the same
table id -> multiple vrf's. While I am arguing with
the kernel people about proper behavior here let's
just remove this as a possiblity from happening and
mark it a zebra stopable misconfiguration.
(Effectively we are preventing a crash down the line
as that all over FRR we assume it's a unique
mapping not a many to one).
Why fail hard? Because we hope to get the person
who misconfigured it to actually notice immediately
not hours or days down the line when shit hits the fan.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The function rt_netlink.c is using to lookup the vrf by
passed in table id.
I'm also going to pretend that this function is not
so awful to run when we have a large number of routes
incoming.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There are a couple of switch statements in netlink_route_info_encode
in zebra_fpm_netlink.c that had logically dead code. We have
a switch statement let's take actual advantage of it instead
of doing gyrations to what we want.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
- Fix 1 byte overflow when showing GR info in bgpd
- Use PATH_MAX for path buffers
- Use unsigned specifiers for uint16_t's in zebra pbr
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Replace sprintf with snprintf where straightforward to do so.
- sprintf's into local scope buffers of known size are replaced with the
equivalent snprintf call
- snprintf's into local scope buffers of known size that use the buffer
size expression now use sizeof(buffer)
- sprintf(buf + strlen(buf), ...) replaced with snprintf() into temp
buffer followed by strlcat
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Replace all `random()` calls with a function called `frr_weak_random()`
and make it clear that it is only supposed to be used for weak random
applications.
Use the annotation described by the Coverity Scan documentation to
ignore `random()` call warnings.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Call the `dp_fini` callback twice: once at the beginning of the shutdown
and then again right before `exit()`ing zebra.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Coverity is complaining that we are looking beyond the end
of the pointer. Why not just use prefix_cmp here? Since
we are comparing to route_nodes.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Use the zapi client session id in the label manager apis;
use the client struct directly in some code. Assign a session
id to ldpd's sync LM zapi session.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Distinguish zapi sessions, for daemons who use more than one,
by adding a session id. The tuple of proto + instance is not
adequate to support clients who use multiple zapi sessions.
Include the id in the client show output if it's present. Add
a bit of info about this to the developer doc.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
And again for the name. Why on earth would we centralize this, just so
people can forget to update it?
Signed-off-by: David Lamparter <equinox@diac24.net>
Same as before, instead of shoving this into a big central list we can
just put the parent node in cmd_node.
Signed-off-by: David Lamparter <equinox@diac24.net>
There is really no reason to not put this in the cmd_node.
And while we're add it, rename from pointless ".func" to ".config_write".
[v2: fix forgotten ldpd config_write]
Signed-off-by: David Lamparter <equinox@diac24.net>
The only nodes that have this as 0 don't have a "->func" anyway, so the
entire thing is really just pointless.
Signed-off-by: David Lamparter <equinox@diac24.net>
Reported by testing agency that rfc 4861 section 6.2.1 states
that all implementations must have a configuration knob to change
the setting of the advertised hop limit. This fix adds that
capability.
Ticket: CM-29200
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
The netlink_request function takes a `struct nlmsghdr *`
pointer from a common pattern that we use:
struct {
struct nlmsghdr n;
struct fib_rule_hdr frh;
char buf[NL_PKT_BUF_SIZE];
} req;
We were calling it `netlink_request(Socket, &req.n)`
The problem here is that coverity, rightly so, sees that
we access the data after the nlmsghdr in netlink_request and
tells us we have an read beyond end of the structure. While
we know we haven't mangled anything up here because of manual
inspection coverity doesn't have this knowledge implicitly.
So let's modify the code call to netlink_request to pass in the
void pointer of the req structure itself, cast to the appropriate
data structure in the function and do the right thing. Hopefully
the coverity SA will be happy and we can move on with our life.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Implement the fix made in `master` to the remain pieces of code in the
data plane FPM module.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
* Break lines longer than 80 columns.
* Remove space after '('.
* Use '%pIX' instead of 'inet_ntop'.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Store VNI information in the data plane context so we can use it to
build the FPM netlink update with that information later.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Changes:
* Let the package builder scripts know that we have a new module that
needs to be taken care of.
* Include the frr atomic header to avoid undeclared atomic operations.
* Disable build on *BSDs because the code is using some zebra netlink
functions only available for Linux.
* Move data plane FPM module outside old FPM automake definition.
* Fix atomic usage for Ubuntu 14.04 (always use explicit).
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
FPM has a thread to encode and enqueue output buffer that might compete
with zebra RIB/RMAC walk on startup, so lets use atomic operations to
make sure we are not getting statistic/counters wrong.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Enqueue all contexts inside FPM to avoid losing updates and to move all
processing to the FPM thread.
This helps in situations with huge amount of routes (e.g. BGP peer
flapping with a million routes).
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Add counters to debug the output buffer usage and pull down its data
when the remote receiver is slow (so we get more space for writes).
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Implement the code that walks the RMAC to send routes that are already
inside installed in the OS.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Add a public reset api, so a context can be reset and reused;
add apis to init a context for a route or mac update.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Instead of retuning always `0`, lets return the amount of used bytes for
the message. This will be used by the new FPM interface to know how many
bytes we must reserve for the output buffer.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
* Use `inet_ntop` instead of `inet_ntoa`
* Replace function name with `__func__`
* Inline functions
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Generalize the netlink route message building function so it can be used
in the future by the netlink Forwarding Plane Manager (FPM) interface.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
In some places we log the interface but not the vfr the
interface is in. In others we only output the vrf id, which
can be difficult for human to read. This commit makes zebra
debugs more vrf aware.
Signed-off-by: Jakub Urbańczyk <xthaid@gmail.com>
Issue:
For consecutive messages such as
MAC1 -> VTEP1 add
MAC1 -> VTEP2 add
MAC1 -> VTEP1 add
Final state, i.e. (MAC1 -> VTEP1 add) should be sent via FPM.
But, with current code, FPM will send (MAC1 -> VTEP2 add)
RCA:
When FPM receives (MAC1, VTEP1), it stores it in the FPM processing queue and
hash table.
When FPM receives (MAC1, VTEP2), this entry is stored as another node as hash
table key is (mac, vtep and vni)
IF FPM again receives (MAC1, VTEP1), we fetch this node in the hash table
which is already enqueued.
When the FPM queue is processed, we will send FPM message for (MAC1, VTEP1)
first and then for (MAC1, VTEP2)
This sequencing issue happened because the key of the table is (MAC, VTEP, VNI)
Fix:
Change the key of the hash table to (MAC, VNI)
So, every time we receive a new update for (MAC1, VNI1), we will find a node in
the processing queue corresponding to MAC1 if present.
We will update this same node for every operation related to (MAC1, VNI1)
Thus, at the time when FPM processes this node, it will have latest MAC1 info.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
On startup of zebra, read in all ipv4/ipv6 rules from
the kernel and remove any with the zebra proto.
If there are any, this means we failed to remove them
on shutdown due to a crash or something. Without this,
users have to manually remove them with iproute2 or some
such and its really annoying.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Define some explicit rule replace code paths into the dataplane
code and improve the handling around it/releasing the the old
rule from the hash table.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When default route is requested from client, default
route is sent to client if present. When route gets
deleted then delete is sent to clients.
Signed-off-by: Santosh P K <sapk@vmware.com>
zebra should only check whether a get_chunk operation succeeded
when processing the response, rather than insde the get_chunk
call itself. Spllitting the request and response hooks was done
precisely to allow for asynchronous calls to an external label
manager; in this case, the requested chunk is not necessarily
going to be available at request time.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Free unhashable (duplicate NHEs from the kernel) via ID table
cleanup. Since the NHE ID hash table contains extra entries,
that's the one we need to be calling zebra_nhg_hash_free()
on, otherwise we will never free the unhashable NHEs.
This was found via a memleak:
==1478713== HEAP SUMMARY:
==1478713== in use at exit: 10,267 bytes in 46 blocks
==1478713== total heap usage: 76,810 allocs, 76,764 frees, 3,901,237 bytes allocated
==1478713==
==1478713== 208 (88 direct, 120 indirect) bytes in 1 blocks are definitely lost in loss record 35 of 41
==1478713== at 0x483BB1A: calloc (vg_replace_malloc.c:762)
==1478713== by 0x48E35E8: qcalloc (memory.c:110)
==1478713== by 0x451CCB: zebra_nhg_alloc (zebra_nhg.c:369)
==1478713== by 0x453DE3: zebra_nhg_copy (zebra_nhg.c:379)
==1478713== by 0x452670: nhg_ctx_process_new (zebra_nhg.c:1143)
==1478713== by 0x4523A8: nhg_ctx_process (zebra_nhg.c:1234)
==1478713== by 0x452A2D: zebra_nhg_kernel_find (zebra_nhg.c:1294)
==1478713== by 0x4326E0: netlink_nexthop_change (rt_netlink.c:2433)
==1478713== by 0x427320: netlink_parse_info (kernel_netlink.c:945)
==1478713== by 0x432DAD: netlink_nexthop_read (rt_netlink.c:2488)
==1478713== by 0x41B600: interface_list (if_netlink.c:1486)
==1478713== by 0x457275: zebra_ns_enable (zebra_ns.c:127)
Repro with:
ip next add id 1 blackhole
ip next add id 2 blackhole
valgrind /usr/lib/frr/zebra
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The rtadv code has two types of sockets:
a) namespace -> Where each zvrf get's it's own socket
b) vrf lite -> Where we get 1 socket for everything
When we were terminating a vrf we were *always*
killing the (b) socket. This is a mistake in
that other vrf's may need to be communicating.
Modify the code on vrf shutdown to only disable
that vrf's event processing and when we actually
terminate we shut the socket.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We don't want to install backup nexthops - yet - as part of the
nexthop-id-based kernel interactions on netlink platforms. Avoid
mixing backup and primary nexthops in the tree of dependencies
in the ecmp cases.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Include backup nexthops in nhe processing; connect incoming
zapi route data with updated rib/nhg apis; add more debugs in
nhg processing.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Refactor the detailed route debugging so that the dump of nexthops
can be used for both normal/active nexthops and backups (if they
are present).
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Use a backup index in a nexthop directly (if it has a backup
nexthop); revise the zebra nhe/nhg code; revise zapi route
decoding to match; revise the dataplane route datastructs.
Refactor some of the rib_add_multipath code to be prepared to
be called with an nhe, carrying nexthop and (possibly) backup
info together.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Use const with some args to ipaddr, zebra vxlan, mpls
lsp, and nexthop apis; add some extra checks to some
nexthop-related apis.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
If we find that a nexthop is a duplicate, break immediately
rather than continuing to look through the rest of the list.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Properly set the NEXTHOP_GROUP_VALID flag and use it
as a conditional for installation decisions for individual
nexthop and groups containing it.
We set the NEXTHOP_GROUP_VALID flag it is:
1) A fully resolved active nexthop
or
2) Its a group that contains at least one VALID NHE
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were still doing a lookup on the nhe_id from before we
started referencing re->nhe directly.
Change set flag to just use re->nhe directly here since they
should always be the same at this point in the code anyway.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When we find a nexthop ID thats a duplicate in the code that converts
NHG rb trees into a flat list of nexthop IDs for the dataplane,
output a debug message.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When we transform the nexthop group rb trees into a flat
array of IDs to send into the dataplane code (zebra_nhg_nhe2grp),
don't put an ID in there that has not been in installed or is
not currently queued to be installed into the dataplane.
Otherwise, if some of the nexthops fail to install, we will
still try to create a group with them and then the entire group
will fail.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were not properly handling the case of a NHG inside of
another NHG when converting the rb tree of a multilevel NHG
into a flat list of IDs. When constructing, we call the function
zebra_nhg_nhe2grp_internal() recursively so that the rare
case of a group within a group is handled such that its
singleton nexthops are appended to the grp array of IDs
we send to the dataplane code.
Ex)
1:
-> 2:
-> 3
-> 4
->5:
->6
becomes this:
1:
->3
->4
->6
when its sent to the dataplane code for final kernel installation.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
In the netlink code for determining whether to set
a src on the route, we check if the cmd=NEW_ROUTE
but its not possible for this to ever be anything
but a new route since we do a goto skip further up
if its a DEL_ROUTE cmd.
So remove this unnecessary check.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Determine src based on nexthop data even when we are using
kernel nexthop objects.
Before, we were entirely skipping this step and just sending the
nexthop ID, ignoring src determination.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Abstraction the route src determination from a nexthop in the
netlink code into a function for both singlepath and mutlipath
to call.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Line break at the end of the message is implicit for zlog_* and flog_*,
don't put it in the string. Mid-message line breaks are currently
unsupported. (LF is "end of message" in syslog.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Some logging systems are, er, "allergic" to tabs in log messages.
(RFC5424: "The syslog application SHOULD avoid octet values below 32")
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Zebra is currently sending messages on interface add/delete/update,
VRF add/delete, and interface address change - regardless of whether
its clients had requested them. This is problematic for lde and isis,
which only listens to label chunk messages, and only when it is
waiting for one (synchronous client). The effect is the that messages
accumulate on the lde synchronous message queue.
With this change:
- Zebra does not send unsolicited messages to synchronous clients.
- Synchronous clients send a ZEBRA_HELLO to zebra.
The ZEBRA_HELLO contains a new boolean field: sychronous.
- LDP and PIM have been updated to send a ZEBRA_HELLO for their
synchronous clients.
Signed-off-by: Karen Schoener <karen@voltanet.io>
We currently have netlink_neigh_update_ctx,
netlink_vxlan_flood_update_ctx and netlink_macfdb_update_ctx
all of which do slightly different RTM_NEWNEIGH calls into
the kernel. After this change, there will be one common
function.
Signed-off-by: Jakub Urbańczyk <xthaid@gmail.com>
1) When programming a nhg id to the kernel we had no debug of that
is what we are doing.
2) Add debugs to all nexthop information to allow us to follow
which prefix we are talking about. This is especially
useful when dealing with a large number of routes and
you want to grep out one or two too see what is going on.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ensure that any weight associated with the next hop is installed for
IPv4 routes with IPv6 next hops too.
Updates: lib, zebra: Allow for installation of a weighted nexthop
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Add to the ZEBRA_INTERFACE_BFD_DEST_UPDATE code path
in zebra_ptm_redistribute.c the missing c-bit data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem: While zebra going down, rmap update thread is being called as part of
timer event. This make zebra to crash.
RCA: At this time route_map_master_hash is made to 0 by sig int handler.
This is causing Zebrad to crash while executing rmap update thread
Fix: As part of SIGINT handler, before calling routemap_finish,
thread off any routemap update scheduled at that point and make sure that
it wont get scheduled again by making the timeout as 0.
Signed-off-by: Saravanan K <saravanank@vmware.com>
The return type of is_selfroute function is changed from int to bool.
Also remove the redundant invoking of the is_selfroute function in the
calling function netlink_route_change_read_unicast
Fixes: https://github.com/FRRouting/frr/issues/5984
Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
Readd the special MAC that represents the flood (head-end replication) entry
for EVPN-VxLAN upon getting a delete notification for it.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
Ticket: CM-25797
Ticket: CM-26238
Testing Done:
1. evpn-min, evpn-smoke - results summarized in CM-25798
add debug trace in specific neigh request send api
to help debug an issue where synchronous response parse
returns with NLMSG_DONE where there is no ipv6 neigh received.
the count value is set to 1 because the request contained
a spcific neigh.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Memory leak found where ipv6 global prefixes added to the router
advertisement prefix lists were not deleted when the process was
killed.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Add a common api that formats a time interval into a string
with different output for short and longer intervals. We do
this in several places, for cli/ui output.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Upper level clients ask for default routes of a particular family
This change ensures that they only receive the family that they
have asked for.
Discovered when testing in ospf `default-information originate`
=================================================================
==246306==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffffffa2e8 at pc 0x7ffff73c44e2 bp 0x7fffffffa090 sp 0x7fffffffa088
READ of size 16 at 0x7fffffffa2e8 thread T0
#0 0x7ffff73c44e1 in prefix_copy lib/prefix.c:310
#1 0x7ffff741c0aa in route_node_lookup lib/table.c:255
#2 0x5555556cd263 in ospf_external_info_delete ospfd/ospf_asbr.c:178
#3 0x5555556a47cc in ospf_zebra_read_route ospfd/ospf_zebra.c:852
#4 0x7ffff746f5d8 in zclient_read lib/zclient.c:3028
#5 0x7ffff742fc91 in thread_call lib/thread.c:1549
#6 0x7ffff7374642 in frr_run lib/libfrr.c:1093
#7 0x5555555bfaef in main ospfd/ospf_main.c:235
#8 0x7ffff70a2bba in __libc_start_main ../csu/libc-start.c:308
#9 0x5555555bf499 in _start (/usr/lib/frr/ospfd+0x6b499)
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This patch does two things:
1) Ensure the decoding of stream data between pim <-> zebra is properly
decoded and we don't read beyond the end of the stream.
2) In zebra when we are freeing memory alloced ensure that we
actually have memory to delete before we do so.
Ticket: CM-27055
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This code is effectively dead code. SO_PEERCRED is a getsockopt
call not *setsockopt* call. Additionally we are not doing
anything with the failed setsockopt call at all.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There was some code missed during the upstreaming process
due to code squash. Identify and put into a commit
to keep code consistent and correct.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
this flag can be used when one routing daemon wants to force his route
to be injected prioritary with other routes, including selected routes.
for that, do not forget to update the new_selected pointer in the zebra
nexthop tracking algorithm.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The handlers for a couple of the main LSP-oriented zapi
messages explicitly limited themselves to a single out-label.
Allow multiple labels if the sender ... sends them.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
We made the decision to explicitly trust kernel and system routes
of every other type with 058c16b7e2.
So, we should trust directly connected routes the same way, assuming
the interface exists.
Old Behavior:
K 2.2.2.1/32 [0/0] is directly connected, unknown inactive, 00:00:39
New Behavior:
K>* 2.2.2.1/32 [0/0] is directly connected, test1, 00:00:03
As a bonus, this fixes the issues we were seeing with not removing
directly connected routes of certain interface types when
those interfaces go down/are deleted.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
It's been a year search and destroy.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Embed nexthop-group, which is just a pointer, in the zebra
nexthop-hash-entry object, rather than mallocing one.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
When a client connects to zebra with GR capabilities and
then restarts, it might disconnect again even before hello is
sent leading zebra cores.
GR should be supported only for dynamic neighbor who are capable
of restarting.
Signed-off-by: Santosh P K <sapk@vmware.com>
Somewhat gnarly code flow here that might be leaking memory - can't tell
if it's a test artifact or not, but in any case this reduces the
situations in which we need to alloc a block.
And we don't need to check XCALLOC for success...
And we don't need to null check before XFREE...
Or set XFREE'd pointers to NULL...
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Add in a few missing stub route-advert functions; these are
needed to build frr with v6 route adverts disabled.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
As part of checksum calculation for a received packet we were
comparing the checksum returned from in_cksum. Typically
when we calculate the checksum the value stored in the checksum
must be all 0's. Store the received checksum and then set
the checksum to 0 and then compare.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In several places we would send debug messages for failure situations
that really should be errors.
Signed-off-by: Donald Sharpd <sharpd@cumulusnetworks.com>
Using SO_BROADCAST, in the linux kernel, requires a uint32_t to be passed
in for all SOL_SOCKET calls. Modify code to use it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Use the zapi_nexthop struct with the mpls_labels
zapi messages instead of the special-purpose (and
more limited) nexthop struct that was being used.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
RFC 4861 states that ipv6 RA messages sent out an interface should
contain all global ipv6 addresses on that interface. This fix adds
that capability. To override the default flags and timer settings
for a particular prefix, the existing "ipv6 nd prefix ..." command
should be used via vtysh under the appropriate interface.
Ticket: CM-20363
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Today vtysh can show the ip/ip6 routes through several commands:
- show_route_cmd
- show_route_detail_cmd
- show_route_summary_cmd
- show_route_table_cmd
- show_route_table_vrf_cmd
- show_route_all_table_vrf_cmd
Each command has its own set of filter rules:
- show_route_cmd can filter by vrf, protocol, tag, ... but not by table
- show_route_table_cmd always filter by table
- show_route_table_vrf_cmd always filter by table and can filter by vrf
too
- show_route_all_table_vrf_cmd show all route in any table for a vrf (or
all)
To reduce the number of commands and provide a possibility to filter by
any key add possibility for the show_route_cmd to filter by table with a
specific value or all to get route in all tables.
Then the show_route_table_cmd, show_route_table_vrf_cmd and
show_route_all_table_vrf_cmd functions can be removed as they are covered
by the generic show_route_cmd function.
It is to be noted that when zebra is started by default, it is possible
to execute show ip route command with both vrf and table parameters,
whereas before the command was not displayed. This is due to the fact
that this combination is only permitted when zebra is launched with vrf
network namespace mode. There, if zebra is configured with vrf-lite
backend, then a vty error message informs the user that the combination
of both table and vrf is not possible.
Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
The existing behavior is when a remote VTEP is deleted,
its associatedneighbor (arp) and MAC entries are removed from
zebra database and do not wait for explicit type-2 route
withdraw from originating VTEP.
Remote type-2 route delete checks if VTEP is present before
removing the entry.
The behavior works fine when all evpn routes points to the
same nexthop as the VTEP IP.
In MLAG topology with advertise-pip, self type-2 and type-5 routes
are advertised with individual VTEP IP as nexthop ip for the route.
When a new VNI is created, it is assigned individual IP as tunnel-ip
then it transition to anycast IP (of the MLAG). During the transition,
type-3 route (VTEP delete) withdraw is sent for the individual IP.
The remote VTEP delete should not trigger to remove evpn routes pointing
to VTEP IP. Instead the route will be removed via explicit withdraw.
Ticket:CM-27752
Reviewed By:CCR-9722
Testing Done:
In evpn with MLAG deployment with advertise-pip and advertise-svi-ip
enabled, validated remote vtep delete does not remove self type-2 routes
from zebra DB. Upon explicit type-2 withdraw routes are removed.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Use a hash walker/iterator instead of a temporary list to
show zebra's nexthop-groups/nexthop-hash-entries.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The top variable has already been derefed by the time we get
to the test to see if it is non-NULL. No need to check it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Nexthop groups as a whole do not make sense to have a vrf'ness
As that you can have a arbitrary number of nexthops that point
to separate vrf's.
Modify the code to make this distinction, by clearly delineating
the line between the nhg and the nexthop a bit better.
Nexthop groups having a vrf_id only make sense if you are using
network namespaces to represent them.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>