Implement the next hop group send on startup if you are using
them. Normally you will only have them if you are already using this
Linux kernel feature.
NOTE: to make sure all next hop groups exist, we send/enqueue all next
hop groups first and then we send routes. The RIB route walk start is
at the end of the function `fpm_nhg_send()`.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Force off kernel NHG install with netns-based VRFs for
now. There is not really a good solution for allowing
kernel nexthop groups in namespaced based vrfs.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When installing a nexthop group, dump out the ifindex of the
nexthop being installed as a bit more data for the developer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This is an implementation of the IS-IS SR draft [1] for FRR.
The following features are supported:
* IPv4 and IPv6 Prefix-SIDs;
* IPv4 and IPv6 Adj-SIDs and LAN-Adj-SIDs;
* Index and absolute labels;
* The no-php and explicit-null Prefix-SID flags;
* Full integration with the Label Manager.
Known limitations:
* No support for Anycast-SIDs;
* No support for the SID/Label Binding TLV (required for LDP interop).
* No support for persistent Adj-SIDs;
* No support for multiple SRGBs.
[1] draft-ietf-isis-segment-routing-extensions-25
Signed-off-by: Olivier Dugeon <olivier.dugeon@orange.com>
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The netlink_vrf_change() function is called both when a VRF device
is created in the Linux kernel and when it is activated. This
commit changes this function to perform the VRF misconfiguration
detection only when the VRF device is created, as doing the check
twice would cause a false positive followed by a hard failure (not
to mention the double check is unnecessary since the VRF table ID
can't change once the device is created).
Fixes#6319.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Commit e93a6fbb4 from PR3908 changed every interface into an
'unnumbered' interface - even interfaces that do not have
ipv4 at all. Undo that.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The function zebra_vxlan_print_neigh_vni_vtep does not create
a json object when json has been requested from the CLI and as a
result it prints out the information in normal CLI format.
Fix is to allocate the json object when required.
Signed-off-by: Pat Ruddy <pat@voltanet.io>
Reported by testing agency that rfc 4861 section 6.2.1 states
that all implementations must have a configuration knob to change
the setting of the advertised retransmit timer sent in RA packets.
This fix adds that capability.
Ticket: CM-29199
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Intermittently, there is a 30 second delay for a LDP pseudowire to become
operational.
One way to reproduce the issue is: Once PW is up, shutdown link to trigger
a change to the pseudowire's next hop, and then restore link to cause
pseudowire to return to original NH.
Problem Descripton:
The Zebra PW manager installs pseudowires in the data plane when the
following two conditions are met:
1. Pseudowire is labeled via LDP mapping messages
2. A labeled NH route exists to reach the remote pseudowire endpoint
The Zebra PW manager registers a NHT callback when a pseudowire is enabled.
This allows the Zebra PW manager to install or reinstall the pseudowire.
The Zebra PW manager deregisters for the NHT callback when the pseudowire is
disabled. When LDP learns the remote-pseudowire status is 'not forwarding',
LDP notifies Zebra that the pseudowire is disabled.
This creates a race condition where a new labeled NH can be resolved after the
Zebra PW manager deregistered for the NHT callback.
For static pseudowires, it makes sense for Zebra PW manager to deregister for
NHT callbacks for disabled pseudowires. Static pseudowires become disabled
via CLI configuration commands.
For LDP pseudowires, the Zebra PW manager should not deregister for NHT
callbacks for disabled pseudowires.
Overview of changes:
1. Zebra PW manager should not deregister for NHT callbacks when an LDP
pseudowire is disabled.
Zebra PW manager will register for NHT callbacks when the LDP pseudowire
is first enabled.
Zebra PW manager will deregister for NHT callbacks when the LDP
pseudowire is deleted.
2. Remove the 30 second timer that was added in PR4122.
PR4122 tried to fix this race condition with a timer.
Once we eliminate the race condition (by keeping the Zebra PW manager
registered for NHT callbacks), this timer can be removed.
3. Zebra PW manager handling of static pseudowires will remain as-is.
Zebra PW manager will register for NHT callbacks when the static
pseudowire is enabled.
Zebra PW manager will deregister for NHT callbacks when the static
pseudowire is disabled.
Signed-off-by: Lynne Morrison <lynne@voltanet.io>
Signed-off-by: Karen Schoener <karen@voltanet.io>
An async route notification can indicate that installation
has failed, but the handling code wasn't dealing with that
possibility correctly.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
These are easy to get subtly wrong, and doing so can cause
nondeterministic failures when racing in parallel builds.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Having a fixed set of parameters for each northbound callback isn't a
good idea since it makes it difficult to add new parameters whenever
that becomes necessary, as several hundreds or thousands of existing
callbacks need to be updated accordingly.
To remediate this issue, this commit changes the signature of all
northbound callbacks to have a single parameter: a pointer to a
'nb_cb_x_args' structure (where x is different for each type
of callback). These structures encapsulate all real parameters
(both input and output) the callbacks need to have access to. And
adding a new parameter to a given callback is as simple as adding
a new field to the corresponding 'nb_cb_x_args' structure, without
needing to update any instance of that callback in any daemon.
This commit includes a .cocci semantic patch that can be used to
update old code to the new format automatically.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
Currently the linux kernel allows you to specify the same
table id -> multiple vrf's. While I am arguing with
the kernel people about proper behavior here let's
just remove this as a possiblity from happening and
mark it a zebra stopable misconfiguration.
(Effectively we are preventing a crash down the line
as that all over FRR we assume it's a unique
mapping not a many to one).
Why fail hard? Because we hope to get the person
who misconfigured it to actually notice immediately
not hours or days down the line when shit hits the fan.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The function rt_netlink.c is using to lookup the vrf by
passed in table id.
I'm also going to pretend that this function is not
so awful to run when we have a large number of routes
incoming.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There are a couple of switch statements in netlink_route_info_encode
in zebra_fpm_netlink.c that had logically dead code. We have
a switch statement let's take actual advantage of it instead
of doing gyrations to what we want.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
- Fix 1 byte overflow when showing GR info in bgpd
- Use PATH_MAX for path buffers
- Use unsigned specifiers for uint16_t's in zebra pbr
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Replace sprintf with snprintf where straightforward to do so.
- sprintf's into local scope buffers of known size are replaced with the
equivalent snprintf call
- snprintf's into local scope buffers of known size that use the buffer
size expression now use sizeof(buffer)
- sprintf(buf + strlen(buf), ...) replaced with snprintf() into temp
buffer followed by strlcat
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Replace all `random()` calls with a function called `frr_weak_random()`
and make it clear that it is only supposed to be used for weak random
applications.
Use the annotation described by the Coverity Scan documentation to
ignore `random()` call warnings.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Call the `dp_fini` callback twice: once at the beginning of the shutdown
and then again right before `exit()`ing zebra.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Coverity is complaining that we are looking beyond the end
of the pointer. Why not just use prefix_cmp here? Since
we are comparing to route_nodes.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Use the zapi client session id in the label manager apis;
use the client struct directly in some code. Assign a session
id to ldpd's sync LM zapi session.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Distinguish zapi sessions, for daemons who use more than one,
by adding a session id. The tuple of proto + instance is not
adequate to support clients who use multiple zapi sessions.
Include the id in the client show output if it's present. Add
a bit of info about this to the developer doc.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
And again for the name. Why on earth would we centralize this, just so
people can forget to update it?
Signed-off-by: David Lamparter <equinox@diac24.net>
Same as before, instead of shoving this into a big central list we can
just put the parent node in cmd_node.
Signed-off-by: David Lamparter <equinox@diac24.net>
There is really no reason to not put this in the cmd_node.
And while we're add it, rename from pointless ".func" to ".config_write".
[v2: fix forgotten ldpd config_write]
Signed-off-by: David Lamparter <equinox@diac24.net>
The only nodes that have this as 0 don't have a "->func" anyway, so the
entire thing is really just pointless.
Signed-off-by: David Lamparter <equinox@diac24.net>
Reported by testing agency that rfc 4861 section 6.2.1 states
that all implementations must have a configuration knob to change
the setting of the advertised hop limit. This fix adds that
capability.
Ticket: CM-29200
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
The netlink_request function takes a `struct nlmsghdr *`
pointer from a common pattern that we use:
struct {
struct nlmsghdr n;
struct fib_rule_hdr frh;
char buf[NL_PKT_BUF_SIZE];
} req;
We were calling it `netlink_request(Socket, &req.n)`
The problem here is that coverity, rightly so, sees that
we access the data after the nlmsghdr in netlink_request and
tells us we have an read beyond end of the structure. While
we know we haven't mangled anything up here because of manual
inspection coverity doesn't have this knowledge implicitly.
So let's modify the code call to netlink_request to pass in the
void pointer of the req structure itself, cast to the appropriate
data structure in the function and do the right thing. Hopefully
the coverity SA will be happy and we can move on with our life.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Implement the fix made in `master` to the remain pieces of code in the
data plane FPM module.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
* Break lines longer than 80 columns.
* Remove space after '('.
* Use '%pIX' instead of 'inet_ntop'.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Store VNI information in the data plane context so we can use it to
build the FPM netlink update with that information later.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Changes:
* Let the package builder scripts know that we have a new module that
needs to be taken care of.
* Include the frr atomic header to avoid undeclared atomic operations.
* Disable build on *BSDs because the code is using some zebra netlink
functions only available for Linux.
* Move data plane FPM module outside old FPM automake definition.
* Fix atomic usage for Ubuntu 14.04 (always use explicit).
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
FPM has a thread to encode and enqueue output buffer that might compete
with zebra RIB/RMAC walk on startup, so lets use atomic operations to
make sure we are not getting statistic/counters wrong.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Enqueue all contexts inside FPM to avoid losing updates and to move all
processing to the FPM thread.
This helps in situations with huge amount of routes (e.g. BGP peer
flapping with a million routes).
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Add counters to debug the output buffer usage and pull down its data
when the remote receiver is slow (so we get more space for writes).
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Implement the code that walks the RMAC to send routes that are already
inside installed in the OS.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Add a public reset api, so a context can be reset and reused;
add apis to init a context for a route or mac update.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Instead of retuning always `0`, lets return the amount of used bytes for
the message. This will be used by the new FPM interface to know how many
bytes we must reserve for the output buffer.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
* Use `inet_ntop` instead of `inet_ntoa`
* Replace function name with `__func__`
* Inline functions
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Generalize the netlink route message building function so it can be used
in the future by the netlink Forwarding Plane Manager (FPM) interface.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
In some places we log the interface but not the vfr the
interface is in. In others we only output the vrf id, which
can be difficult for human to read. This commit makes zebra
debugs more vrf aware.
Signed-off-by: Jakub Urbańczyk <xthaid@gmail.com>
Issue:
For consecutive messages such as
MAC1 -> VTEP1 add
MAC1 -> VTEP2 add
MAC1 -> VTEP1 add
Final state, i.e. (MAC1 -> VTEP1 add) should be sent via FPM.
But, with current code, FPM will send (MAC1 -> VTEP2 add)
RCA:
When FPM receives (MAC1, VTEP1), it stores it in the FPM processing queue and
hash table.
When FPM receives (MAC1, VTEP2), this entry is stored as another node as hash
table key is (mac, vtep and vni)
IF FPM again receives (MAC1, VTEP1), we fetch this node in the hash table
which is already enqueued.
When the FPM queue is processed, we will send FPM message for (MAC1, VTEP1)
first and then for (MAC1, VTEP2)
This sequencing issue happened because the key of the table is (MAC, VTEP, VNI)
Fix:
Change the key of the hash table to (MAC, VNI)
So, every time we receive a new update for (MAC1, VNI1), we will find a node in
the processing queue corresponding to MAC1 if present.
We will update this same node for every operation related to (MAC1, VNI1)
Thus, at the time when FPM processes this node, it will have latest MAC1 info.
Signed-off-by: Ameya Dharkar <adharkar@vmware.com>
On startup of zebra, read in all ipv4/ipv6 rules from
the kernel and remove any with the zebra proto.
If there are any, this means we failed to remove them
on shutdown due to a crash or something. Without this,
users have to manually remove them with iproute2 or some
such and its really annoying.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Define some explicit rule replace code paths into the dataplane
code and improve the handling around it/releasing the the old
rule from the hash table.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When default route is requested from client, default
route is sent to client if present. When route gets
deleted then delete is sent to clients.
Signed-off-by: Santosh P K <sapk@vmware.com>
zebra should only check whether a get_chunk operation succeeded
when processing the response, rather than insde the get_chunk
call itself. Spllitting the request and response hooks was done
precisely to allow for asynchronous calls to an external label
manager; in this case, the requested chunk is not necessarily
going to be available at request time.
Signed-off-by: Emanuele Di Pascale <emanuele@voltanet.io>
Free unhashable (duplicate NHEs from the kernel) via ID table
cleanup. Since the NHE ID hash table contains extra entries,
that's the one we need to be calling zebra_nhg_hash_free()
on, otherwise we will never free the unhashable NHEs.
This was found via a memleak:
==1478713== HEAP SUMMARY:
==1478713== in use at exit: 10,267 bytes in 46 blocks
==1478713== total heap usage: 76,810 allocs, 76,764 frees, 3,901,237 bytes allocated
==1478713==
==1478713== 208 (88 direct, 120 indirect) bytes in 1 blocks are definitely lost in loss record 35 of 41
==1478713== at 0x483BB1A: calloc (vg_replace_malloc.c:762)
==1478713== by 0x48E35E8: qcalloc (memory.c:110)
==1478713== by 0x451CCB: zebra_nhg_alloc (zebra_nhg.c:369)
==1478713== by 0x453DE3: zebra_nhg_copy (zebra_nhg.c:379)
==1478713== by 0x452670: nhg_ctx_process_new (zebra_nhg.c:1143)
==1478713== by 0x4523A8: nhg_ctx_process (zebra_nhg.c:1234)
==1478713== by 0x452A2D: zebra_nhg_kernel_find (zebra_nhg.c:1294)
==1478713== by 0x4326E0: netlink_nexthop_change (rt_netlink.c:2433)
==1478713== by 0x427320: netlink_parse_info (kernel_netlink.c:945)
==1478713== by 0x432DAD: netlink_nexthop_read (rt_netlink.c:2488)
==1478713== by 0x41B600: interface_list (if_netlink.c:1486)
==1478713== by 0x457275: zebra_ns_enable (zebra_ns.c:127)
Repro with:
ip next add id 1 blackhole
ip next add id 2 blackhole
valgrind /usr/lib/frr/zebra
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The rtadv code has two types of sockets:
a) namespace -> Where each zvrf get's it's own socket
b) vrf lite -> Where we get 1 socket for everything
When we were terminating a vrf we were *always*
killing the (b) socket. This is a mistake in
that other vrf's may need to be communicating.
Modify the code on vrf shutdown to only disable
that vrf's event processing and when we actually
terminate we shut the socket.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We don't want to install backup nexthops - yet - as part of the
nexthop-id-based kernel interactions on netlink platforms. Avoid
mixing backup and primary nexthops in the tree of dependencies
in the ecmp cases.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Include backup nexthops in nhe processing; connect incoming
zapi route data with updated rib/nhg apis; add more debugs in
nhg processing.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Refactor the detailed route debugging so that the dump of nexthops
can be used for both normal/active nexthops and backups (if they
are present).
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Use a backup index in a nexthop directly (if it has a backup
nexthop); revise the zebra nhe/nhg code; revise zapi route
decoding to match; revise the dataplane route datastructs.
Refactor some of the rib_add_multipath code to be prepared to
be called with an nhe, carrying nexthop and (possibly) backup
info together.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Use const with some args to ipaddr, zebra vxlan, mpls
lsp, and nexthop apis; add some extra checks to some
nexthop-related apis.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
If we find that a nexthop is a duplicate, break immediately
rather than continuing to look through the rest of the list.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Properly set the NEXTHOP_GROUP_VALID flag and use it
as a conditional for installation decisions for individual
nexthop and groups containing it.
We set the NEXTHOP_GROUP_VALID flag it is:
1) A fully resolved active nexthop
or
2) Its a group that contains at least one VALID NHE
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were still doing a lookup on the nhe_id from before we
started referencing re->nhe directly.
Change set flag to just use re->nhe directly here since they
should always be the same at this point in the code anyway.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When we find a nexthop ID thats a duplicate in the code that converts
NHG rb trees into a flat list of nexthop IDs for the dataplane,
output a debug message.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When we transform the nexthop group rb trees into a flat
array of IDs to send into the dataplane code (zebra_nhg_nhe2grp),
don't put an ID in there that has not been in installed or is
not currently queued to be installed into the dataplane.
Otherwise, if some of the nexthops fail to install, we will
still try to create a group with them and then the entire group
will fail.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were not properly handling the case of a NHG inside of
another NHG when converting the rb tree of a multilevel NHG
into a flat list of IDs. When constructing, we call the function
zebra_nhg_nhe2grp_internal() recursively so that the rare
case of a group within a group is handled such that its
singleton nexthops are appended to the grp array of IDs
we send to the dataplane code.
Ex)
1:
-> 2:
-> 3
-> 4
->5:
->6
becomes this:
1:
->3
->4
->6
when its sent to the dataplane code for final kernel installation.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
In the netlink code for determining whether to set
a src on the route, we check if the cmd=NEW_ROUTE
but its not possible for this to ever be anything
but a new route since we do a goto skip further up
if its a DEL_ROUTE cmd.
So remove this unnecessary check.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Determine src based on nexthop data even when we are using
kernel nexthop objects.
Before, we were entirely skipping this step and just sending the
nexthop ID, ignoring src determination.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Abstraction the route src determination from a nexthop in the
netlink code into a function for both singlepath and mutlipath
to call.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Line break at the end of the message is implicit for zlog_* and flog_*,
don't put it in the string. Mid-message line breaks are currently
unsupported. (LF is "end of message" in syslog.)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Some logging systems are, er, "allergic" to tabs in log messages.
(RFC5424: "The syslog application SHOULD avoid octet values below 32")
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Zebra is currently sending messages on interface add/delete/update,
VRF add/delete, and interface address change - regardless of whether
its clients had requested them. This is problematic for lde and isis,
which only listens to label chunk messages, and only when it is
waiting for one (synchronous client). The effect is the that messages
accumulate on the lde synchronous message queue.
With this change:
- Zebra does not send unsolicited messages to synchronous clients.
- Synchronous clients send a ZEBRA_HELLO to zebra.
The ZEBRA_HELLO contains a new boolean field: sychronous.
- LDP and PIM have been updated to send a ZEBRA_HELLO for their
synchronous clients.
Signed-off-by: Karen Schoener <karen@voltanet.io>
We currently have netlink_neigh_update_ctx,
netlink_vxlan_flood_update_ctx and netlink_macfdb_update_ctx
all of which do slightly different RTM_NEWNEIGH calls into
the kernel. After this change, there will be one common
function.
Signed-off-by: Jakub Urbańczyk <xthaid@gmail.com>
1) When programming a nhg id to the kernel we had no debug of that
is what we are doing.
2) Add debugs to all nexthop information to allow us to follow
which prefix we are talking about. This is especially
useful when dealing with a large number of routes and
you want to grep out one or two too see what is going on.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Ensure that any weight associated with the next hop is installed for
IPv4 routes with IPv6 next hops too.
Updates: lib, zebra: Allow for installation of a weighted nexthop
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Add to the ZEBRA_INTERFACE_BFD_DEST_UPDATE code path
in zebra_ptm_redistribute.c the missing c-bit data.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem: While zebra going down, rmap update thread is being called as part of
timer event. This make zebra to crash.
RCA: At this time route_map_master_hash is made to 0 by sig int handler.
This is causing Zebrad to crash while executing rmap update thread
Fix: As part of SIGINT handler, before calling routemap_finish,
thread off any routemap update scheduled at that point and make sure that
it wont get scheduled again by making the timeout as 0.
Signed-off-by: Saravanan K <saravanank@vmware.com>
The return type of is_selfroute function is changed from int to bool.
Also remove the redundant invoking of the is_selfroute function in the
calling function netlink_route_change_read_unicast
Fixes: https://github.com/FRRouting/frr/issues/5984
Signed-off-by: Naveen Naidu <naveennaidu479@gmail.com>
Readd the special MAC that represents the flood (head-end replication) entry
for EVPN-VxLAN upon getting a delete notification for it.
Signed-off-by: Vivek Venkatraman <vivek@cumulusnetworks.com>
Reviewed-by: Chirag Shah <chirag@cumulusnetworks.com>
Ticket: CM-25797
Ticket: CM-26238
Testing Done:
1. evpn-min, evpn-smoke - results summarized in CM-25798
add debug trace in specific neigh request send api
to help debug an issue where synchronous response parse
returns with NLMSG_DONE where there is no ipv6 neigh received.
the count value is set to 1 because the request contained
a spcific neigh.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Memory leak found where ipv6 global prefixes added to the router
advertisement prefix lists were not deleted when the process was
killed.
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Add a common api that formats a time interval into a string
with different output for short and longer intervals. We do
this in several places, for cli/ui output.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Upper level clients ask for default routes of a particular family
This change ensures that they only receive the family that they
have asked for.
Discovered when testing in ospf `default-information originate`
=================================================================
==246306==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7fffffffa2e8 at pc 0x7ffff73c44e2 bp 0x7fffffffa090 sp 0x7fffffffa088
READ of size 16 at 0x7fffffffa2e8 thread T0
#0 0x7ffff73c44e1 in prefix_copy lib/prefix.c:310
#1 0x7ffff741c0aa in route_node_lookup lib/table.c:255
#2 0x5555556cd263 in ospf_external_info_delete ospfd/ospf_asbr.c:178
#3 0x5555556a47cc in ospf_zebra_read_route ospfd/ospf_zebra.c:852
#4 0x7ffff746f5d8 in zclient_read lib/zclient.c:3028
#5 0x7ffff742fc91 in thread_call lib/thread.c:1549
#6 0x7ffff7374642 in frr_run lib/libfrr.c:1093
#7 0x5555555bfaef in main ospfd/ospf_main.c:235
#8 0x7ffff70a2bba in __libc_start_main ../csu/libc-start.c:308
#9 0x5555555bf499 in _start (/usr/lib/frr/ospfd+0x6b499)
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This patch does two things:
1) Ensure the decoding of stream data between pim <-> zebra is properly
decoded and we don't read beyond the end of the stream.
2) In zebra when we are freeing memory alloced ensure that we
actually have memory to delete before we do so.
Ticket: CM-27055
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This code is effectively dead code. SO_PEERCRED is a getsockopt
call not *setsockopt* call. Additionally we are not doing
anything with the failed setsockopt call at all.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There was some code missed during the upstreaming process
due to code squash. Identify and put into a commit
to keep code consistent and correct.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
this flag can be used when one routing daemon wants to force his route
to be injected prioritary with other routes, including selected routes.
for that, do not forget to update the new_selected pointer in the zebra
nexthop tracking algorithm.
Signed-off-by: Philippe Guibert <philippe.guibert@6wind.com>
The handlers for a couple of the main LSP-oriented zapi
messages explicitly limited themselves to a single out-label.
Allow multiple labels if the sender ... sends them.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
We made the decision to explicitly trust kernel and system routes
of every other type with 058c16b7e2.
So, we should trust directly connected routes the same way, assuming
the interface exists.
Old Behavior:
K 2.2.2.1/32 [0/0] is directly connected, unknown inactive, 00:00:39
New Behavior:
K>* 2.2.2.1/32 [0/0] is directly connected, test1, 00:00:03
As a bonus, this fixes the issues we were seeing with not removing
directly connected routes of certain interface types when
those interfaces go down/are deleted.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
It's been a year search and destroy.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Embed nexthop-group, which is just a pointer, in the zebra
nexthop-hash-entry object, rather than mallocing one.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
When a client connects to zebra with GR capabilities and
then restarts, it might disconnect again even before hello is
sent leading zebra cores.
GR should be supported only for dynamic neighbor who are capable
of restarting.
Signed-off-by: Santosh P K <sapk@vmware.com>
Somewhat gnarly code flow here that might be leaking memory - can't tell
if it's a test artifact or not, but in any case this reduces the
situations in which we need to alloc a block.
And we don't need to check XCALLOC for success...
And we don't need to null check before XFREE...
Or set XFREE'd pointers to NULL...
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Add in a few missing stub route-advert functions; these are
needed to build frr with v6 route adverts disabled.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
As part of checksum calculation for a received packet we were
comparing the checksum returned from in_cksum. Typically
when we calculate the checksum the value stored in the checksum
must be all 0's. Store the received checksum and then set
the checksum to 0 and then compare.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In several places we would send debug messages for failure situations
that really should be errors.
Signed-off-by: Donald Sharpd <sharpd@cumulusnetworks.com>
Using SO_BROADCAST, in the linux kernel, requires a uint32_t to be passed
in for all SOL_SOCKET calls. Modify code to use it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Use the zapi_nexthop struct with the mpls_labels
zapi messages instead of the special-purpose (and
more limited) nexthop struct that was being used.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
RFC 4861 states that ipv6 RA messages sent out an interface should
contain all global ipv6 addresses on that interface. This fix adds
that capability. To override the default flags and timer settings
for a particular prefix, the existing "ipv6 nd prefix ..." command
should be used via vtysh under the appropriate interface.
Ticket: CM-20363
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Today vtysh can show the ip/ip6 routes through several commands:
- show_route_cmd
- show_route_detail_cmd
- show_route_summary_cmd
- show_route_table_cmd
- show_route_table_vrf_cmd
- show_route_all_table_vrf_cmd
Each command has its own set of filter rules:
- show_route_cmd can filter by vrf, protocol, tag, ... but not by table
- show_route_table_cmd always filter by table
- show_route_table_vrf_cmd always filter by table and can filter by vrf
too
- show_route_all_table_vrf_cmd show all route in any table for a vrf (or
all)
To reduce the number of commands and provide a possibility to filter by
any key add possibility for the show_route_cmd to filter by table with a
specific value or all to get route in all tables.
Then the show_route_table_cmd, show_route_table_vrf_cmd and
show_route_all_table_vrf_cmd functions can be removed as they are covered
by the generic show_route_cmd function.
It is to be noted that when zebra is started by default, it is possible
to execute show ip route command with both vrf and table parameters,
whereas before the command was not displayed. This is due to the fact
that this combination is only permitted when zebra is launched with vrf
network namespace mode. There, if zebra is configured with vrf-lite
backend, then a vty error message informs the user that the combination
of both table and vrf is not possible.
Signed-off-by: Thibaut Collet <thibaut.collet@6wind.com>
The existing behavior is when a remote VTEP is deleted,
its associatedneighbor (arp) and MAC entries are removed from
zebra database and do not wait for explicit type-2 route
withdraw from originating VTEP.
Remote type-2 route delete checks if VTEP is present before
removing the entry.
The behavior works fine when all evpn routes points to the
same nexthop as the VTEP IP.
In MLAG topology with advertise-pip, self type-2 and type-5 routes
are advertised with individual VTEP IP as nexthop ip for the route.
When a new VNI is created, it is assigned individual IP as tunnel-ip
then it transition to anycast IP (of the MLAG). During the transition,
type-3 route (VTEP delete) withdraw is sent for the individual IP.
The remote VTEP delete should not trigger to remove evpn routes pointing
to VTEP IP. Instead the route will be removed via explicit withdraw.
Ticket:CM-27752
Reviewed By:CCR-9722
Testing Done:
In evpn with MLAG deployment with advertise-pip and advertise-svi-ip
enabled, validated remote vtep delete does not remove self type-2 routes
from zebra DB. Upon explicit type-2 withdraw routes are removed.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Use a hash walker/iterator instead of a temporary list to
show zebra's nexthop-groups/nexthop-hash-entries.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The top variable has already been derefed by the time we get
to the test to see if it is non-NULL. No need to check it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Nexthop groups as a whole do not make sense to have a vrf'ness
As that you can have a arbitrary number of nexthops that point
to separate vrf's.
Modify the code to make this distinction, by clearly delineating
the line between the nhg and the nexthop a bit better.
Nexthop groups having a vrf_id only make sense if you are using
network namespaces to represent them.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The zebra implementation of nexthop groups has
two types of nexthops groups currently. Singleton
objects which have afi's and combined nexthop groups
that do not. Specifically call this out in the code
to make this distinction.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Handling capability received from client. It may contain
GR enable/disable, Stale time changes, RIB update complete
for given AFi, ASAFI and instance. It also has changes for
stale route handling.
Signed-off-by: Santosh P K <sapk@vmware.com>
Add a null check in `handle_recursive_depend()` so it
doesn't try to add a NULL pointer to the RB tree.
This was found with clang SA.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were not resetting the nexthop pointer to NULL for each
new read of a nexthop from the zapi route. On the chance we
get a nexthop that does not have a proper type, we will not
create a new nexthop and update that pointer, thus it still
has the last valid one and will create a group with two
pointers to the same nexthop.
Then when it enters any code that iterates the group, it loops
endlessly.
This was found with zapi fuzzing.
```
0x00007f728891f1c3 in jhash2 (k=<optimized out>, length=<optimized out>, initval=12183506) at lib/jhash.c:138
0x00007f728896d92c in nexthop_hash (nexthop=<optimized out>) at lib/nexthop.c:563
0x00007f7288979ece in nexthop_group_hash (nhg=<optimized out>) at lib/nexthop_group.c:394
0x0000000000621036 in zebra_nhg_hash_key (arg=<optimized out>) at zebra/zebra_nhg.c:356
0x00007f72888ec0e1 in hash_get (hash=<optimized out>, data=0x7ffffb94aef0, alloc_func=0x0) at lib/hash.c:138
0x00007f72888ee118 in hash_lookup (hash=0x7f7288de2f10, data=0x7f728908e7fc) at lib/hash.c:183
0x0000000000626613 in zebra_nhg_find (nhe=0x7ffffb94b080, id=0, nhg=0x6020000032d0, nhg_depends=0x0, vrf_id=<optimized out>,
afi=<optimized out>, type=<optimized out>) at zebra/zebra_nhg.c:541
0x0000000000625f39 in zebra_nhg_rib_find (id=0, nhg=<optimized out>, rt_afi=AFI_IP) at zebra/zebra_nhg.c:1126
0x000000000065f953 in rib_add_multipath (afi=AFI_IP, safi=<optimized out>, p=0x7ffffb94b370, src_p=0x0, re=0x6070000013d0,
ng=0x7f728908e7fc) at zebra/zebra_rib.c:2616
0x0000000000768f90 in zread_route_add (client=0x61f000000080, hdr=<optimized out>, msg=<optimized out>, zvrf=<optimized out>)
at zebra/zapi_msg.c:1596
0x000000000077c135 in zserv_handle_commands (client=<optimized out>, msg=0x61b000000780) at zebra/zapi_msg.c:2636
0x0000000000575e1f in main (argc=<optimized out>, argv=<optimized out>) at zebra/main.c:309
```
```
(gdb) p *nhg->nexthop
$4 = {next = 0x5488e0, prev = 0x5488e0, vrf_id = 16843009, ifindex = 16843009, type = NEXTHOP_TYPE_IFINDEX, flags = 8 '\b', {gate = {ipv4 = {s_addr = 0},
ipv6 = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}},
bh_type = BLACKHOLE_UNSPEC}, src = {ipv4 = {s_addr = 0}, ipv6 = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0,
0}, __u6_addr32 = {0, 0, 0, 0}}}}, rmap_src = {ipv4 = {s_addr = 0}, ipv6 = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0,
0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}, resolved = 0x0, rparent = 0x0, nh_label_type = ZEBRA_LSP_NONE, nh_label = 0x0, weight = 1 '\001'}
(gdb) quit
```
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Since we are using a UNIQUE RB tree, we need to handle the
case of adding in a duplicate entry into it.
The list API code returns NULL when a successfull add
occurs, so lets pull that handling further up into
the connected handlers. Then, free the allocated
connected struct if it is a duplicate.
This is a pretty unlikely situation to happen.
Also, pull up the RB handling of _del RB API as well.
This was found with the zapi fuzzing code.
```
==1052840==
==1052840== 200 bytes in 5 blocks are definitely lost in loss record 545 of 663
==1052840== at 0x483BB1A: calloc (vg_replace_malloc.c:762)
==1052840== by 0x48E1008: qcalloc (memory.c:110)
==1052840== by 0x44D357: nhg_connected_new (zebra_nhg.c:73)
==1052840== by 0x44D300: nhg_connected_tree_add_nhe (zebra_nhg.c:123)
==1052840== by 0x44FBDC: depends_add (zebra_nhg.c:1077)
==1052840== by 0x44FD62: depends_find_add (zebra_nhg.c:1090)
==1052840== by 0x44E46D: zebra_nhg_find (zebra_nhg.c:567)
==1052840== by 0x44E1FE: zebra_nhg_rib_find (zebra_nhg.c:1126)
==1052840== by 0x45AD3D: rib_add_multipath (zebra_rib.c:2616)
==1052840== by 0x4977DC: zread_route_add (zapi_msg.c:1596)
==1052840== by 0x49ABB9: zserv_handle_commands (zapi_msg.c:2636)
==1052840== by 0x428B11: main (main.c:309)
```
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Zebra will have special handling for clients with GR enabled.
When client disconnects with GR enabled, then a stale client
will be created and its RIB will be retained till stale timer
or client comes up and updated its RIB.
Co-authored-by: Santosh P K <sapk@vmware.com>
Co-authored-by: Soman K S <somanks@vmware.com>
Signed-off-by: Santosh P K <sapk@vmware.com>
Adding header files changes where structure to hold
received graceful restart info from client is defined.
Also there are changes for show commands where exisiting
commands are extended.
Co-authored-by: Santosh P K <sapk@vmware.com>
Co-authored-by: Soman K S <somanks@vmware.com>
Signed-off-by: Santosh P K <sapk@vmware.com>
Add a config that disables use of kernel-level nexthop ids.
Currently, zebra always uses nexthop ids if the kernel supports
them.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
When we are receiving a kernel route, with an admin distance
of 255 we are not marking it as installed. This route
should be marked as installed.
New behavior:
K>* 4.5.7.0/24 [255/8192] via 192.168.209.1, enp0s8, 00:10:14
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
commit: 0eb97b860d
Removed this chunk of code in zebra:
- if (ifp)
- if (connected_is_unnumbered(ifp))
- SET_FLAG(nexthop->flags, NEXTHOP_FLAG_ONLINK);
Effectively if we had a NEXTHOP_TYPE_IPV4_IFINDEX we would
auto set the onlink flag. This commit dropped it for some reason.
Add it back in an intelligent manner.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
My previous patch to fix a memory leak, caused by not properly freeing
the iptable iface list on stream parse failure, created/exposed a heap
use after free because we were not doing a deep copy
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
With recent changes to the lib nexthop_group
APIs (e1f3a8eb19), we are making
new assumptions that this should be adding a single nexthop
to a group, not a list of nexthops.
This broke the case of a recursive nexthop resolving to a group:
```
D> 2.2.2.1/32 [150/0] via 1.1.1.1 (recursive), 00:00:09
* via 1.1.1.1, dummy1 onlink, 00:00:09
via 1.1.1.2 (recursive), 00:00:09
* via 1.1.1.2, dummy2 onlink, 00:00:09
D> 3.3.3.1/32 [150/0] via 2.2.2.1 (recursive), 00:00:04
* via 1.1.1.1, dummy1 onlink, 00:00:04
K * 10.0.0.0/8 [0/1] via 172.27.227.148, tun0, 00:00:21
```
This group can instead just directly point to the nh that was passed.
Its only being used for a lookup (the memory gets copied and used
elsewhere if the nexthop is not found).
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Make the nexthop_copy/nexthop_dup APIs more consistent by
adding a secondary, non-recursive, version of them. Before,
it was inconsistent whether the APIs were expected to copy
recursive info or not. Make it clear now that the default is
recursive info is copied unless the _no_recurse() version is
called. These APIs are not heavily used so it is fine to
change them for now.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
cb86eba3ab was causing zebra to crash
when handling a nexthop group that had a nexthop which was recursively resolved.
Steps to recreate:
!
nexthop-group red
nexthop 1.1.1.1
nexthop 1.1.1.2
!
sharp install routes 8.8.8.1 nexthop-group red 1
=========================================
==11898== Invalid write of size 8
==11898== at 0x48E53B4: _nexthop_add_sorted (nexthop_group.c:254)
==11898== by 0x48E5336: nexthop_group_add_sorted (nexthop_group.c:296)
==11898== by 0x453593: handle_recursive_depend (zebra_nhg.c:481)
==11898== by 0x451CA8: zebra_nhg_find (zebra_nhg.c:572)
==11898== by 0x4530FB: zebra_nhg_find_nexthop (zebra_nhg.c:597)
==11898== by 0x4536B4: depends_find (zebra_nhg.c:1065)
==11898== by 0x453526: depends_find_add (zebra_nhg.c:1087)
==11898== by 0x451C4D: zebra_nhg_find (zebra_nhg.c:567)
==11898== by 0x4519DE: zebra_nhg_rib_find (zebra_nhg.c:1126)
==11898== by 0x452268: nexthop_active_update (zebra_nhg.c:1729)
==11898== by 0x461517: rib_process (zebra_rib.c:1049)
==11898== by 0x4610C8: process_subq_route (zebra_rib.c:1967)
==11898== Address 0x0 is not stack'd, malloc'd or (recently) free'd
Zebra crashes because we weren't handling the case of the depend nexthop
being recursive.
For this case, we cannot make the function more efficient. A nexthop
could resolve to a group of any size, thus we need allocs/frees.
To solve this and retain the goal of the original patch, we separate out the
two cases so it will still be more efficient if the nexthop is not recursive.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The only two safi's that are usable for zebra for installation
of routes into the rib are SAFI_UNICAST and SAFI_MULTICAST.
The acceptance of other safi's is causing a memory leak:
Direct leak of 56 byte(s) in 1 object(s) allocated from:
#0 0x5332f2 in calloc (/usr/lib/frr/zebra+0x5332f2)
#1 0x7f594adc29db in qcalloc /opt/build/frr/lib/memory.c:110:27
#2 0x686849 in zebra_vrf_get_table_with_table_id /opt/build/frr/zebra/zebra_vrf.c:390:11
#3 0x65a245 in rib_add_multipath /opt/build/frr/zebra/zebra_rib.c:2591:10
#4 0x7211bc in zread_route_add /opt/build/frr/zebra/zapi_msg.c:1616:8
#5 0x73063c in zserv_handle_commands /opt/build/frr/zebra/zapi_msg.c:2682:2
Collapse
Sequence of events:
Upon vrf creation there is a zvrf->table[afi][safi] data structure
that tables are auto created for. These tables only create SAFI_UNICAST
and SAFI_MULTICAST tables. Since these are the only safi types that
are zebra can actually work on. zvrf data structures also have a
zvrf->otable data structure that tracks in a RB tree other tables
that are created ( say you have routes stuck in any random table
in the 32bit route table space in linux ). This data structure is
only used if the lookup in zvrf->table[afi][safi] fails.
After creation if we pass a route down from an upper level protocol
that has non unicast or multicast safi *but* has the actual
tableid of the vrf we are in, the initial lookup will always
return NULL leaving us to look in the otable. This will create
a data structure to track this data.
If after this event you pass in a second route with the same
afi/safi/table_id, the otable will be created and attempted
to be stored, but the RB_TREE_UNIQ data structure when it sees
this will return the original otable returned and the lookup function
zebra_vrf_get_table_with_table_id will just drop the second otable.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
==25402==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 16 byte(s) in 1 object(s) allocated from:
#0 0x533302 in calloc (/usr/lib/frr/zebra+0x533302)
#1 0x7fee84cdc80b in qcalloc /home/qlyoung/frr/lib/memory.c:110:27
#2 0x5a3032 in create_label_chunk /home/qlyoung/frr/zebra/label_manager.c:188:3
#3 0x5a3c2b in assign_label_chunk /home/qlyoung/frr/zebra/label_manager.c:354:8
#4 0x5a2a38 in label_manager_get_chunk /home/qlyoung/frr/zebra/label_manager.c:424:9
#5 0x5a1412 in hook_call_lm_get_chunk /home/qlyoung/frr/zebra/label_manager.c:60:1
#6 0x5a1412 in lm_get_chunk_call /home/qlyoung/frr/zebra/label_manager.c:81:2
#7 0x72a234 in zread_get_label_chunk /home/qlyoung/frr/zebra/zapi_msg.c:2026:2
#8 0x72a234 in zread_label_manager_request /home/qlyoung/frr/zebra/zapi_msg.c:2073:4
#9 0x73150c in zserv_handle_commands /home/qlyoung/frr/zebra/zapi_msg.c:2688:2
When creating label chunk that has a specified base, we eventually are
calling assign_specific_label_chunk. This function finds the appropriate
list node and deletes it from the lbl_mgr.lc_list but since
the function uses list_delete_node() the deletion function that is
specified for lbl_mgr.lc_list is not called thus dropping the memory.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The vrrpd one conflicts with the standalone vrrpd package; also we're
installing daemons to /usr/lib/frr on some systems so they're not on
PATH.
Signed-off-by: David Lamparter <equinox@diac24.net>
Previous patches introduced various issues:
- Removal of stream_free() to fix double free caused memleak
- Patch for memleak was incomplete
This should fix it hopefully.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The existing usage of the rta_nest and addattr_nest
functions were not adding the NLA_F_NESTED flag
to the type. As such the new nexthop functionality was
actually looking for this flag, while apparently older
code did not.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Add error handling for top level failures (not able to
execute command, unable to find vrf for command, etc.)
With this error handling we add a new zapi message type
of ZEBRA_ERROR used when we are unable to properly handle
a zapi command and pass it down into the lower level code.
In the event of this, we reply with a message of type
enum zebra_error_types containing the error type.
The sent packet will look like so:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Length | Marker | Version |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| VRF ID |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Command |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ERROR TYPE |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Also add appropriate hooks for clients to subscribe to for
handling these types of errors.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
There's confusion between the nexthop-group configuration and a
zebra-specific show command. For now, make the zebra show
command string RIB-specific until we're able to unify these
paths.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
=================================================================
==3058==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000010 (pc 0x7f5bf3ef7477 bp 0x7ffdfaa20d40 sp 0x7ffdfaa204c8 T0)
==3058==The signal is caused by a READ memory access.
==3058==Hint: address points to the zero page.
#0 0x7f5bf3ef7476 in memcpy /build/glibc-OTsEL5/glibc-2.27/string/../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:134
#1 0x4d158a in __asan_memcpy (/usr/lib/frr/zebra+0x4d158a)
#2 0x7f5bf58da8ad in stream_put /home/qlyoung/frr/lib/stream.c:605:3
#3 0x67d428 in zsend_ipset_entry_notify_owner /home/qlyoung/frr/zebra/zapi_msg.c:851:2
#4 0x5c70b3 in zebra_pbr_add_ipset_entry /home/qlyoung/frr/zebra/zebra_pbr.c
#5 0x68e1bb in zread_ipset_entry /home/qlyoung/frr/zebra/zapi_msg.c:2465:4
#6 0x68f958 in zserv_handle_commands /home/qlyoung/frr/zebra/zapi_msg.c:2611:3
#7 0x55666d in main /home/qlyoung/frr/zebra/main.c:309:2
#8 0x7f5bf3e5db96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
#9 0x4311d9 in _start (/usr/lib/frr/zebra+0x4311d9)
the ipset->backpointer was NULL as that the hash lookup failed to find
anything. Prevent this crash from happening.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The decoding of _add and _del functions is practically identical
do a bit of work and make them so.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
=================================================================
==13611==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffe9e5c8694 at pc 0x0000004d18ac bp 0x7ffe9e5c8330 sp 0x7ffe9e5c7ae0
WRITE of size 17 at 0x7ffe9e5c8694 thread T0
#0 0x4d18ab in __asan_memcpy (/usr/lib/frr/zebra+0x4d18ab)
#1 0x7f16f04bd97f in stream_get2 /home/qlyoung/frr/lib/stream.c:277:2
#2 0x6410ec in zebra_vxlan_remote_macip_del /home/qlyoung/frr/zebra/zebra_vxlan.c:7718:4
#3 0x68fa98 in zserv_handle_commands /home/qlyoung/frr/zebra/zapi_msg.c:2611:3
#4 0x556add in main /home/qlyoung/frr/zebra/main.c:309:2
#5 0x7f16eea3bb96 in __libc_start_main /build/glibc-OTsEL5/glibc-2.27/csu/../csu/libc-start.c:310
#6 0x431249 in _start (/usr/lib/frr/zebra+0x431249)
This decode is the result of a buffer overflow because we are
not checking ipa_len.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The linux kernel will occassionally send RTM_GETNEIGH when
it expects user space to help in resolution of an ARP entry.
See linux kernel commit:
commit 3e25c65ed085b361cc91a8f02e028f1158c9f255
Author: Tim Gardner <tim.gardner@canonical.com>
Date: Thu Aug 29 06:38:47 2013 -0600
net: neighbour: Remove CONFIG_ARPD
Since we don't care about this, let's just safely ignore this
message for the moment. I imagine in the future we might
care when we implement neighbor managment in the system.
Reported By: Stefan Priebe <s.priebe@profihost.ag>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
There may be logic to prevent this ever happening earlier in the network
read path, but it doesn't hurt to double check it here, because clearly
deeper paths rely on this being the case.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Whatever this BFD re-transmission function is had a few problems.
1. Used memcpy instead of the (more concise) stream APIs, which include
bounds checking.
2. Did not sufficiently check packet sizes.
Actually, 2) is mitigated but is still a problem, because the BFD header
is 2 bytes larger than the "normal" ZAPI header, while the overall
message size remains the same. So if the source message being duplicated
is actually right up against the ZAPI_MAX_PACKET_SIZ, you still can't
fit the whole message into your duplicated message. I have no idea what
the intent was here but at least there's a warning if it happens now.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
- Fix iptable freeing code to free malloc'd list
- malloc iptable in zapi handler and use those functions to free it when
done to fix a linked list memleak
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
We copy a fixed length buffer from the wire but don't ensure it is null
terminated. Then print it as a c-string. Lul.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
further down we hash the src & dst ip, which asserts that the afi is one
of the well known ones, given the field names i assume the correct afis
here are af_inet[6]
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
zebra can catch the kernel's route deletion by netlink.
but current FRR can't delete kernel-route on vrf(l3mdev)
when kernel operator delete the route on out-side of FRR.
It looks problem about kernel-route deletion.
This problem is caused around _nexthop_cmp_no_labels(nh1,nh2)
that checks the each nexthop's member 'vrf_id'.
And _nexthop_cmp_no_labels's caller doesn't set the vrf_id
of nexthop structure. This commit fix that case.
Signed-off-by: Hiroki Shirokura <slank.dev@gmail.com>
router-id is buried deep in "show running-config", this new
command makes it easy to retrieve the user configured router-id.
Example:
# configure terminal
(config)# router-id 1.2.3.4
(config)# end
# show router-id
router-id 1.2.3.4
# configure terminal
(config)# no router-id 1.2.3.4
(config)# end
# show router-id
#
Signed-off-by: Jafar Al-Gharaibeh <jafar@atcorp.com>
We were not setting the RTNH_F_ONLINK flag where appropriate
when creating nexthop objects in the kernel.
Set it on the nhmsg.nh_flags netlink message.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When we are doing a lookup on an individual nexthop,
we should still be passing along the type that gets passed
via the arguments. Otherwise, we will always think we own that
NHE when in reality anyone could have put that into the
kernel.
Before this patch, nexthops in the kernel will get swepped
out even if we didn't create them.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We should be NULL checking the entire re->nhe struct, not
the group inside of it. When we get routes from the kernel
using a nexthop group (and future protocols) they will only
pass us an ID to use. Hence, this struct can (and will be)
NULL on first attach when only passed an ID.
There shouldn't be a situation where we have an re->nhe
and don't have an re->nhe->nhg anyway.
Before this patch you can easily make zebra crash by creating a
route in the kernel using a nexthop group and starting zebra.
`ip next add dev lo id 111`
`ip route add 1.1.1.1/32 nhid 111`
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Older versions of protobuf-c do not support version 3 of the
protocol. Add a check into the system to see if we have
version 3 available and if so, compile it in.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If you compile FRR with no j factor zebra_mlag.c fails to
build because the vtysh extraction methodology runs first
before the protobuf compiler runs and that compilation does
not have the proper dependancy chain built for the inclusions
that zebra_mlag.c had. Moving the DEF* code into a zebra_mlag_vty.c
which can be included in the vtysh extraction code and has
no mlag.proto dependancies makes the compilation work better.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Handle the special case where a route update contains
no installed nexthops - that means the route is not
installed.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
This is pretty much just to get rid of the HAVE_CUMULUS. The
hook/module API is as "wtf" as it was before...
Signed-off-by: David Lamparter <equinox@diac24.net>
Add an api that creates a copy of a list of nexthops and
enforces the canonical sort ordering; consolidate some nhg
code to avoid copy-and-paste. The zebra dplane uses
that api when a plugin sets up a list of nexthops, ensuring
that the plugin's list is ordered when it's processed in
zebra.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The processing of dataplane route notifications was a little
off-target after the nexthop-group re-work. This should allow
notifications to work better.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Linux has the idea of allowing a weight to be sent
down as part of a nexthop group to allow the kernel
to weight particular nexthop paths a bit more or less
than others.
See:
http://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.rpdb.multiple-links.html
Allow for installation into the kernel using the weight attribute
associated with the nexthop.
This code is foundational in that it just sets up the ability
to do this, we do not use it yet. Further commits will
allow for the pass through of this data from upper level protocols.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Use a per-nexthop flag to indicate the presence of labels; add
some utility zapi encode/decode apis for nexthops; use the zapi
apis more consistently.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Use correct state/flags when installing EVPN macs; when we
converted from raw netlink to the zebra dataplane, a state value
got lost.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The flags can be important - like "threaded" - so we need to
actually capture them when plugins are registered.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Replace the existing list of nexthops (via a nexthop_group
struct) in the route_entry with a direct pointer to zebra's
new shared group (from zebra_nhg.h). This allows more
direct access to that shared group and the info it carries.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Problem reported by testing agency that RFC4861 section 6.2.5
states that a router should send an RA with a lifetime of 0
before ceasing to send RAs on the interface, or when the interace
is shutdown, or the router is shutdown. This fix adds that capability.
Ticket: CM-27061
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
For SR-TE we'll need to create Binding-SIDs which are essentially
LSPs that can push multiple outgoing labels. This commit sets the
groundwork for that. Luckily the netlink code didn't need to be
changed since it already supports pushing label stacks.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
The openfabric daemon has a longer name than anticipated for
`show zebra client summary` adjust to allow it to fit without
making columns all blomped.
Before:
robot# show zebra client summ
Name Connect Time Last Read Last Write IPv4 Routes IPv6 Routes
--------------------------------------------------------------------------------
static 00:00:06 00:00:06 00:00:06 4/0 0/0
openfabric 00:00:06 00:00:06 00:00:06 0/0 0/0
After:
[sharpd@robot frr4]$ vtysh -c "show zebra client summ"
Name Connect Time Last Read Last Write IPv4 Routes IPv6 Routes
--------------------------------------------------------------------------------
static 00:02:16 00:02:16 00:02:16 4/0 0/0
openfabric 00:02:16 00:02:16 00:02:16 0/0 0/0
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Problem reported by testing facility that our sending of Router
Advertisements more frequently than once very three seconds is not
compliant with rfc4861. Added a knob to turn off fast retransmits
in order to meet the requirement of the RFC.
Ticket: CM-27063
Signed-off-by: Don Slice <dslice@cumulusnetworks.com>
Macvlan down event have sentinel check of its parent
link presence.
Ticket:CM-26622
Reviewed By:CCR-9326
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
"show vrf vni" and "show evpn vni <l3vni>" commands
need to display correct router mac value.
"show evpn vni <l3vni>" detail l3vni needs to display
system mac as in PIP scenario value can be different.
Syste MAC would be derived from SVI interface MAC wherelse
Router MAC would be derived from macvlan interface MAC value.
Ticket:CM-26710
Reviewed By:CCR-9334
Testing Done:
TORC11# show evpn vni 4001
VNI: 4001
Type: L3
Tenant VRF: vrf1
Local Vtep Ip: 36.0.0.11
Vxlan-Intf: vx-4001
SVI-If: vlan4001
State: Up
VNI Filter: none
System MAC: 00:02:00:00:00:2e
Router MAC: 44:38:39:ff:ff:01
L2 VNIs: 1000
TORC11# show vrf vni
VRF VNI VxLAN IF L3-SVI State Rmac
vrf1 4001 vx-4001 vlan4001 Up 44:38:39:ff:ff:01
TORC11# show evpn vni 4001 json
{
"vni":4001,
"type":"L3",
"localVtepIp":"36.0.0.11",
"vxlanIntf":"vx-4001",
"sviIntf":"vlan4001",
"state":"Up",
"vrf":"vrf1",
"sysMac":"00:02:00:00:00:2e",
"routerMac":"44:38:39:ff:ff:01",
"vniFilter":"none",
"l2Vnis":[
1000,
]
}
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
macvlan interface up/down event triggers
bgp to send updates for evpn routes
with changed RMAC and nexthop IP values.
Ticket:CM-26190
Reviewed By:
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
By default announct Self Type-2 routes with
system IP as nexthop and system MAC as
nexthop.
An API to check type-2 is self route via
checking ipv4/ipv6 address from connected interfaces list.
An API to extract RMAC and nexthop for type-2
routes based on advertise-svi-ip knob is enabled.
When advertise-pip is enabled/disabled, trigger type-2
route update. For self type-2 routes to use
anycast or individual (rmac, nexthop) addresses.
Ticket:CM-26190
Reviewed By:
Testing Done:
Enable 'advertise-svi-ip' knob in bgp default instance.
the vrf instance svi ip is advertised with nexthop
as default instance router-id and RMAC as system MAC.
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Extract mac-vlan interface mac when a l3vni add is sent to bgp
Per L3VNI maintain vrr interface.
An api to extract vrr mac address from a vlan id, associated
master svi device.
When a l3vni operational up event is sent to bgpd,
extract vrr rmac along with svi rmac.
Ticket:CM-26190
Reviewed By:
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
Zebra MLAG is using "t_read" for multiple tasks, such as
1. For opening Communication channel with MLAG
2. In case conncetion fails, same event is used for retries
3. after the connection establishment, same event is used to
read the data from MLAG
since all these taks will never schedule together, this will not
cause any issues.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
edge-2> show evpn vni detail json
{
"vni":79031,
"type":"L3",
...,
...
} <<<<<< no comma
{
"vni":79021,
"type":"L3",
...,
...
} <<<<<< no comma
{
} <<<<<< blank
edge-2>
The fix is to pack json info into json_array before printing it.
Signed-off-by: Lakshman Krishnamoorthy <lkrishnamoor@vmware.com>
Apparently the multipath_num functionatlity has been broken
for a while because we were ignoring the recusive nexthops
when marking them inactive based on it.
This sets them as inactive as well if the parent breaks it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were re-counting the entire group's active number on
every iteration of this nexthop_active_update() loop.
This is not great from a performance perspective but also
it was failing to properly mark things according to the
specified multipath_num.
Since a nexthop is set as active before this check, if its == to
the set ecmp, it gets marked inactive even though if its
under the max ecmp wanted!
ex)
set ecmp to 1.
`/usr/lib/frr/zebra -e 1`
All kernel routes will be marked inactive even with just one nexthop!
K 1.1.1.1/32 [0/0] is directly connected, dummy1 inactive, 00:00:10
K 1.1.1.2/32 [0/0] is directly connected, dummy2 inactive, 00:00:10
K 1.1.1.3/32 [0/0] is directly connected, dummy3 inactive, 00:00:10
K 1.1.1.4/32 [0/0] is directly connected, dummy4 inactive, 00:00:10
K 1.1.1.5/32 [0/0] is directly connected, dummy5 inactive, 00:00:10
K 1.1.1.6/32 [0/0] is directly connected, dummy6 inactive, 00:00:10
K 1.1.1.7/32 [0/0] is directly connected, dummy7 inactive, 00:00:10
K 1.1.1.8/32 [0/0] is directly connected, dummy8 inactive, 00:00:10
K 1.1.1.9/32 [0/0] is directly connected, dummy9 inactive, 00:00:10
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Clean up the relationships between zebra's rib and nexthop-group
headers as prep for adding a nexthop-group pointer to the
route_entry.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
On BSD systems null routes were not being installed into the
kernel. This is because commit 08ea27d112
introduced a bug where we were attempting to use the wrong
prefix afi types and as such we were going down the v6 code path.
test27.lab.netdef.org# show ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued route, r - rejected route
K>* 0.0.0.0/0 [0/0] via 192.168.122.1, 00:00:23
S>* 4.5.6.8/32 [1/0] unreachable (blackhole), 00:00:11
C>* 192.168.122.0/24 [0/1] is directly connected, vtnet0, 00:00:23
test27.lab.netdef.org# exit
[ci@test27 ~/frr]$ netstat -rn
Routing tables
Internet:
Destination Gateway Flags Netif Expire
default 192.168.122.1 UGS vtnet0
4.5.6.8/32 127.0.0.1 UG1B lo0
127.0.0.1 link#2 UH lo0
192.168.122.0/24 link#1 U vtnet0
192.168.122.108 link#1 UHS lo0
Fixes: #4843
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The code for when a new vrf is created to properly handle
router advertisement for it is messed up in several ways:
1) Generation of the zrouter data structure should set the rtadv
socket to -1 so that we don't accidently close someone elses
open file descriptor
2) When you created a new zvrf instance *after* bootup we are XCALLOC'ing
the data structure so the zvrf->fd was 0. The shutdown code was looking
for the >= 0 to know if the fd existed (since fd 0 is valid!)
This sequence of events would cause zebra to consume 100% of the
cpu:
Run zebra by itself ( no other programs )
ip link add vrf1 type vrf table 1003
ip link del vrf vrf1
vtysh -c "configure" -c "no interface vrf1"
This commit fixes this issue.
Fixes: #5376
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When we shut down zebra, we were not doing anything to shut
down the FPM. Perform the necessary occult rituals and
stop the threads from running during early shutdown.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We should be setting the ns->info pointer to NULL when we free
what it points to. Just use XFREE directly on the void * pointer
to do this.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were not connecting the default zebra_ns to the default
ns->info at namespace initialization in zebra. Thus, when
we tried to use the `ns_walk_func()` it would ignore the
default zebra_ns since there is no pointer to it from the
ns struct.
Fix this by connecting them in `zebra_ns_init()` and,
if the default ns is not found, exit with failure
since this is not recoverable.
This was found during a crash where we fail to cancel the kernel_read
thread at termination (via the `ns_walk_func()`) and then we
get a netlink notification trying to use the zns struct that has
already been freed.
```
(gdb) bt
\#0 0x00007fc1134dc7bb in raise () from /lib/x86_64-linux-gnu/libc.so.6
\#1 0x00007fc1134c7535 in abort () from /lib/x86_64-linux-gnu/libc.so.6
\#2 0x00007fc113996f8f in core_handler (signo=11, siginfo=0x7ffe5429d070, context=<optimized out>) at lib/sigevent.c:254
\#3 <signal handler called>
\#4 0x0000561880e15449 in if_lookup_by_index_per_ns (ns=0x0, ifindex=174) at zebra/interface.c:269
\#5 0x0000561880e1642c in if_up (ifp=ifp@entry=0x561883076c50) at zebra/interface.c:1043
\#6 0x0000561880e10723 in netlink_link_change (h=0x7ffe5429d8f0, ns_id=<optimized out>, startup=<optimized out>) at zebra/if_netlink.c:1384
\#7 0x0000561880e17e68 in netlink_parse_info (filter=filter@entry=0x561880e17680 <netlink_information_fetch>, nl=nl@entry=0x561882497238, zns=zns@entry=0x7ffe542a5940,
count=count@entry=5, startup=startup@entry=0) at zebra/kernel_netlink.c:932
\#8 0x0000561880e186a5 in kernel_read (thread=<optimized out>) at zebra/kernel_netlink.c:406
\#9 0x00007fc1139a4416 in thread_call (thread=thread@entry=0x7ffe542a5b70) at lib/thread.c:1599
\#10 0x00007fc113974ef8 in frr_run (master=0x5618823c9510) at lib/libfrr.c:1024
\#11 0x0000561880e0b916 in main (argc=8, argv=0x7ffe542a5f78) at zebra/main.c:483
```
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
1. add the Mlag ProtoBuf Lib to Zebra Compilation
2. Encode the messages with protobuf before writing to MLAG
3. Decode the MLAG Messages using protobuf and write to clients
based on their subscrption.
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
This includes:
1. Processing client Registrations for MLAG
2. storing client Interests for MLAG updates
3. Opening communication channel to MLAG with First client reg
4. Closing Communication channel with last client De-reg
5. Spawning a new thread for handling MLAG updates peocessing
6. adding Test code
7. advertising MLAG Updates to clients based on their interests
Signed-off-by: Satheesh Kumar K <sathk@cumulusnetworks.com>
This code is called from the zebra main pthread during shutdown
but the thread event is scheduled via the zebra dplane pthread.
Hence, we should be using the `thread_cancel_async()` API to
cancel the thread event on a different pthread.
This is only ever hit in the rare case that we still have work left
to do on the update queue during shutdown.
Found via zebra crash:
```
(gdb) bt
\#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
\#1 0x00007f4e4d3f7535 in __GI_abort () at abort.c:79
\#2 0x00007f4e4d3f740f in __assert_fail_base (fmt=0x7f4e4d559ee0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7f4e4d9071d0 "master->owner == pthread_self()",
file=0x7f4e4d906cf8 "lib/thread.c", line=1185, function=<optimized out>) at assert.c:92
\#3 0x00007f4e4d405102 in __GI___assert_fail (assertion=assertion@entry=0x7f4e4d9071d0 "master->owner == pthread_self()", file=file@entry=0x7f4e4d906cf8 "lib/thread.c",
line=line@entry=1185, function=function@entry=0x7f4e4d906b68 <__PRETTY_FUNCTION__.15817> "thread_cancel") at assert.c:101
\#4 0x00007f4e4d8d095a in thread_cancel (thread=0x55b40d01a640) at lib/thread.c:1185
\#5 0x000055b40c291845 in zebra_dplane_shutdown () at zebra/zebra_dplane.c:3274
\#6 0x000055b40c27ee13 in zebra_finalize (dummy=<optimized out>) at zebra/main.c:202
\#7 0x00007f4e4d8d1416 in thread_call (thread=thread@entry=0x7ffcbbc08870) at lib/thread.c:1599
\#8 0x00007f4e4d8a1ef8 in frr_run (master=0x55b40ce35510) at lib/libfrr.c:1024
\#9 0x000055b40c270916 in main (argc=8, argv=0x7ffcbbc08c78) at zebra/main.c:483
(gdb) down
\#4 0x00007f4e4d8d095a in thread_cancel (thread=0x55b40d01a640) at lib/thread.c:1185
1185 assert(master->owner == pthread_self());
(gdb)
```
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Put the code to free the data held by a nhg_ctx
in nhg_ctx_free() as well. We do it similiarly for
the dplane_ctx.
Let nhg_ctx_fini() be any other routines that need to
be handled before freeing.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
SA warned us lookup could be NULL dereferenced in some
paths. Handle the case where we are passed a NULL
nexthop before we try to copy it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were only checking that two nhg_hash_entry's were equal
based on the active nexthop NUMBER. This is not sufficient in
special cases where whats active with one route using it,
might not be active with the other. We can see this with
routes trying to resolve to themselves.
Ex)
1.1.1.0/24
-> 1.1.1.1 dummy1 (inactive)
-> 1.1.1.2 dummy2
1.1.2.0/24
-> 1.1.1.1 dummy1
-> 1.1.1.2 dummy1 (inactive)
Without checking each nexthop individually, they will
hash to the same group since they have the same number of
active nexthops.
Fix this by looping over every nexthop for each nhe (they should
be sorted) and checking if the NEXTHOP_FLAG_ACTIVE flag's match.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We cannot clear the NEXTHOP_FLAG_FIB nexthop flag
when sending routes to the dataplane anymore since
nexthops are now shared.
We were seeing a situation where if we delete a route
using a nexthop group that is still active with another
route, the fib flag was being unset by this code
path despite them still being valid fib nexthops with the
other route.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We were crashing due to a missed label change code path
in mpls_ftn_uninstall() with the zebra_nhg hashing code.
Add a static handler function for label changing everywhere
in that code and use it in mpls_ftn_uninstall().
The crash was found in the ISIS-SR tests:
==23== Thread 1:
==23== Invalid read of size 4
==23== at 0x15B20E: zebra_nhg_hash_equal (zebra_nhg.c:365)
==23== by 0x489A2FD: hash_get (hash.c:143)
==23== by 0x489A4BC: hash_lookup (hash.c:183)
==23== by 0x15B5A3: zebra_nhg_find (zebra_nhg.c:494)
==23== by 0x15C536: zebra_nhg_rib_find (zebra_nhg.c:1070)
==23== by 0x1573E8: mpls_ftn_update (zebra_mpls.c:2661)
==23== by 0x1A2554: zread_mpls_labels_replace (zapi_msg.c:1890)
==23== by 0x1A41CD: zserv_handle_commands (zapi_msg.c:2613)
==23== by 0x199B17: zserv_process_messages (zserv.c:517)
==23== by 0x48EE6B7: thread_call (thread.c:1549)
==23== by 0x48A8AD5: frr_run (libfrr.c:1064)
==23== by 0x1391B7: main (main.c:468)
==23== Address 0x5839330 is 0 bytes inside a block of size 80 free'd
==23== at 0x48369AB: free (vg_replace_malloc.c:530)
==23== by 0x48AEE6C: qfree (memory.c:129)
==23== by 0x15C5F8: zebra_nhg_free (zebra_nhg.c:1095)
==23== by 0x15BC8C: zebra_nhg_handle_uninstall (zebra_nhg.c:734)
==23== by 0x15DCFA: zebra_nhg_uninstall_kernel (zebra_nhg.c:1826)
==23== by 0x15C666: zebra_nhg_decrement_ref (zebra_nhg.c:1106)
==23== by 0x15D9D7: zebra_nhg_re_update_ref (zebra_nhg.c:1711)
==23== by 0x15D8B1: nexthop_active_update (zebra_nhg.c:1660)
==23== by 0x167072: rib_process (zebra_rib.c:1154)
==23== by 0x168D72: process_subq_route (zebra_rib.c:2039)
==23== by 0x168E92: process_subq (zebra_rib.c:2078)
==23== by 0x168F5B: meta_queue_process (zebra_rib.c:2112)
==23== Block was alloc'd at
==23== at 0x4837B65: calloc (vg_replace_malloc.c:752)
==23== by 0x48AED56: qcalloc (memory.c:110)
==23== by 0x15B07B: zebra_nhg_copy (zebra_nhg.c:307)
==23== by 0x15B13E: zebra_nhg_hash_alloc (zebra_nhg.c:329)
==23== by 0x489A339: hash_get (hash.c:148)
==23== by 0x15B6CA: zebra_nhg_find (zebra_nhg.c:532)
==23== by 0x15C536: zebra_nhg_rib_find (zebra_nhg.c:1070)
==23== by 0x15D89A: nexthop_active_update (zebra_nhg.c:1658)
==23== by 0x167072: rib_process (zebra_rib.c:1154)
==23== by 0x168D72: process_subq_route (zebra_rib.c:2039)
==23== by 0x168E92: process_subq (zebra_rib.c:2078)
==23== by 0x168F5B: meta_queue_process (zebra_rib.c:2112)
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
This reverts commit 7d5bb02b1a.
Allow zebra to actually maintain the nexthop group in the
linux kernel.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
In symmetric routing case for evpn (type-2/type-5)
routes, nexthop and Router mac fields can change from
the originating VTEP.
At the receiving VTEP, bgp path info may points to different
nexthop IP (nh1->nh2) and Remote MAC remain the same.
When the bgp sync the route with nexthop and RMAC fields.
For the exisitng rmac entry update/replace with the new
nexthop (VTEP) IP in remote rmac db in Zebra.
Similarly, bgp path info may points different Router-mac
(RMAC1->RMAC2) and the nexthop value remains the same.
In this case, update to the new RMAC value for the
existing remote nexthop in the Zebra' nexthop cache db.
Ticket:CM-26917
Reviewed By:CCR-9435
Testing Done:
Signed-off-by: Chirag Shah <chirag@cumulusnetworks.com>
We were creating `other` tables in rib_del(), vty commands, and
dataplane return callback via the zebra_vrf_table_with_table_id()
API.
Seperate the API into only a lookup, never create
and added another with `get` in the name (following the standard
we use in other table APIs).
Then changed the rib_del(), rib_find_rn_from_ctx(), and show route
summary vty command to use the lookup API instead.
This was found via a crash where two different vrfs though they owned
the table. On delete, one free'd all the nodes, and then the other tried
to use them. It required specific timing of a VRF existing, going away,
and coming back again to cause the crash.
=23464== Invalid read of size 8
==23464== at 0x179EA4: rib_dest_from_rnode (rib.h:433)
==23464== by 0x17ACB1: zebra_vrf_delete (zebra_vrf.c:253)
==23464== by 0x48F3D45: vrf_delete (vrf.c:243)
==23464== by 0x48F4468: vrf_terminate (vrf.c:532)
==23464== by 0x13D8C5: sigint (main.c:172)
==23464== by 0x48DD25C: quagga_sigevent_process (sigevent.c:105)
==23464== by 0x48F0502: thread_fetch (thread.c:1417)
==23464== by 0x48AC82B: frr_run (libfrr.c:1023)
==23464== by 0x13DD02: main (main.c:483)
==23464== Address 0x5152788 is 104 bytes inside a block of size 112 free'd
==23464== at 0x48369AB: free (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23464== by 0x48B25B8: qfree (memory.c:129)
==23464== by 0x48EA335: route_node_destroy (table.c:500)
==23464== by 0x48E967F: route_node_free (table.c:90)
==23464== by 0x48E9742: route_table_free (table.c:124)
==23464== by 0x48E9599: route_table_finish (table.c:60)
==23464== by 0x170CEA: zebra_router_free_table (zebra_router.c:165)
==23464== by 0x170DB4: zebra_router_release_table (zebra_router.c:188)
==23464== by 0x17AAD2: zebra_vrf_disable (zebra_vrf.c:222)
==23464== by 0x48F3F0C: vrf_disable (vrf.c:313)
==23464== by 0x48F3CCF: vrf_delete (vrf.c:223)
==23464== by 0x48F4468: vrf_terminate (vrf.c:532)
==23464== Block was alloc'd at
==23464== at 0x4837B65: calloc (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==23464== by 0x48B24A2: qcalloc (memory.c:110)
==23464== by 0x48EA2FE: route_node_create (table.c:488)
==23464== by 0x48E95C7: route_node_new (table.c:66)
==23464== by 0x48E95E5: route_node_set (table.c:75)
==23464== by 0x48E9EA9: route_node_get (table.c:326)
==23464== by 0x48E1EDB: srcdest_rnode_get (srcdest_table.c:244)
==23464== by 0x16EA4B: rib_add_multipath (zebra_rib.c:2730)
==23464== by 0x1A5310: zread_route_add (zapi_msg.c:1592)
==23464== by 0x1A7B8E: zserv_handle_commands (zapi_msg.c:2579)
==23464== by 0x19D689: zserv_process_messages (zserv.c:523)
==23464== by 0x48F09F8: thread_call (thread.c:1599)
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a dataplane plugin module as a sample or reference for
folks who might like to integrate with the zebra dataplane
subsystem. This isn't part of the FRR build or product; there
are some simple build and load-at-runtime instructions in
comments in the file.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
The zvni_map_to_svi function may return NULL as such prevent
a deref and crash. Found via coverity
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Fix 2 Coverity issues:
1) zebra_nhg.c -> all paths in nhg_ctx_process_finish have
already deref'ed the ctx pointer no need for a test of it
2) the **ifp pointer passed in may be NULL. Prevent an accidental
deref if calling function does not pass in a ifp pointer.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Checkpatch was complaining because this code was extending
beyond 80 characters on a couple lines. Adjusted a conditional
tree to fix that.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a private header file for functions that are internal/special
case like how we do it for `lib/nexthop_group_private.h`.
Remove a bunch of functions from the header file only being used
statically and add some comments for those remaining to indicate
better what their use is.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Re-work the validity setting and checking APIs
for nhg_hash_entry's to make them clearer.
Further, they were originally only beings set
on ifdown and install. Extended their use into
releasing entries and to account for setting
the validity of a recursive dependent.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
The commenting for why we would need to requeue a
group from the kernel to be later processed was not
sufficient. Add a better explanation for the flow
and state of the system.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Change the wording of the flag indicating we have received
a nexthop group from the kernel with a different ID but
is fundamentally identical to one we already have.
It was colliding with a flag of similar name in the nexthop struct.
Change it from NEXTHOP_GROUP_DUPLICATE -> NEXTHOP_GROUP_UNHASHABLE
since it is in fact unhashable.
Also change the wording of functions and comments referencing the same
problem.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When determining whether to set the nhg_hash_entry as
invalid, we should have been checking the depends, not
the dependents. If its a group and at least one of its
depends is valid, the group is still valid.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Guard against an overflow read when processing
nexthop groups from netlink. Add a check to ensure
we don't try to write passed the array size.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Now with this patch we can't use shutdown for cleanup:
```
commit 2fc69f03d2 (pr_5079)
Author: Mark Stapp <mjs@voltanet.io>
Date: Fri Sep 27 12:15:34 2019 -0400
zebra: during shutdown processing, drop dplane results
Don't process dataplane results in zebra during shutdown (after
sigint has been seen). The dplane continues to run in order to
clean up, but zebra main just drops results.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
```
Adjusted nhg uninstall handling to clear data and other
cleanup before sending to the dataplane.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a comment to the header of `zebra_nhg.c` to point the reader
to where the hashtables containing the nhg entries are held.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Reduce the api for deleting nexthops and the containing
group to just one call rather than having a special case
and handling it separately.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
With the new nexthop group shared memory framework, pointers
are being used in route_entry for the nexthop_group. Update
the use of this in `mpls_ftn_uninstall()` to reflect the change.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
If the vrf lookup fails, use the default namespace
to find/delete the nexthop group from the kernel because it
should be there anyway.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Check both the nhg and nexthop are not NULL before passing
them to be hashed. Clang SA caught this.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some boilerplate for nexthop installation for bsd kernels.
They do not support nexthop objects for now so its just boilerplate.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When moving the nexthop group in a route entry to be a pointer,
we missed one wrapped in a `ifndef` for when the kernel doesn't
have netlink.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
We do not need to check that the nexthop is installed or queued
when sending a route deletion since we only need to the prefix for it.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
In lieu of the fact that we probably shouldn't change show
command output too much, changing this to only give nhe_id
output when the user explicitly asks for it. Probably only
going to be used for debugging for now anyway.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Only used the afi passed into `zebra_nhg_find()` for nexthops
that are blackhole/ifindex. Others should use the type actually declared
in the nexthop struct itself.
Basically, nexthop objects of type blackhole/ifindex in the kernel must
have an address family, they cannot be ambigious and be shared.
This is some requirement in the linux ip core code.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add a mechanism to requeue groups we receive from the
kernel if the IDs are in a weird order (Group ID is lower
than individual nexthop IDs for example).
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
If we get a nexthop group from the kernel with labels
and queue it as a context to process later, we have to
free the label stack we allocated.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some getters for the nhg_ctx struct. Probably unnecessary
at this point since they are all static but if they ever become
public it will be nice to have them.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add code for handling nexthop group hash entry encaps
and sending them to the kernel. Add some more debugging
information for the encaps and groups in general.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
There was some code copypasta for mpls stack building in the
netlink install path. Reduced that to a common function.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
When querying for detailed route information, show the nexthop
group id for its nh_hash_entry in the output before listing the
nexthops.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Add some more detailed output to `show nexthop-group`.
It closely resembles the output of `show ip routes`.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>
Optimize the fib and notified nexthop group comparison algorithm
to assume ordering. There were some pretty serious performance hits with
this on high ecmp routes.
Signed-off-by: Stephen Worley <sworley@cumulusnetworks.com>