NetDEF CI has been whining about multiline string style.
Make the strings single-line and call it a day.
Signed-off-by: Trey Aspelund <taspelund@nvidia.com>
Currently when the kernel sends netlink messages to FRR
the buffers to receive this data is of fixed length.
The kernel, with certain configurations, will send
netlink messages that are larger than this fixed length.
This leads to situations where, on startup, zebra gets
really confused about the state of the kernel. Effectively
the current algorithm is this:
read up to buffer in size
while (data to parse)
get netlink message header, look at size
parse if you can
The problem is that there is a 32k buffer we read.
We get the first message that is say 1k in size,
subtract that 1k to 31k left to parse. We then
get the next header and notice that the length
of the message is 33k. Which is obviously larger
than what we read in. FRR has no recover mechanism
nor is there a way to know, a priori, what the maximum
size the kernel will send us.
Modify FRR to look at the kernel message and see if the
buffer is large enough, if not, make it large enough to
read in the message.
This code has to be per netlink socket because of the usage
of pthreads. So add to `struct nlsock` the buffer and current
buffer length. Growing it as necessary.
Fixes: #10404
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Store the fd that corresponds to the appropriate `struct nlsock` and pass
that around in the dplane context instead of the pointer to the nlsock.
Modify the kernel_netlink.c code to store in a hash the `struct nlsock`
with the socket fd as the key.
Why do this? The dataplane context is used to pass around the `struct nlsock`
but the zebra code has a bug where the received buffer for kernel netlink
messages from the kernel is not big enough. So we need to dynamically
grow the receive buffer per socket, instead of having a non-dynamic buffer
that we read into. By passing around the fd we can look up the `struct nlsock`
that will soon have the associated buffer and not have to worry about `const`
issues that will arise.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Store and use the sequence number instead of using what is in
the `struct nlsock`. Future commits are going away from storing
the `struct nlsock` and the copy of the nlsock was guaranteeing
unique sequence numbers per message. So let's store the
sequence number to use instead.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Using memcmp is wrong because struct ipaddr may contain unitialized
padding bytes that should not be compared.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
When using wait for install there exists situations where
zebra will issue several route change operations to the kernel
but end up in a state where we shouldn't be at the end
due to extra data being received. Example:
a) zebra receives from bgp a route change, installs sends the
route to the kernel.
b) zebra receives a route deletion from bgp, removes the
struct route entry and then sends to the kernel a deletion.
c) zebra receives an asynchronous notification that (a) succeeded
but we treat this as a new route.
This is the ships in the night problem. In this case if we receive
notification from the kernel about a route that we know nothing
about and we are not in startup and we are doing asic offload
then we can ignore this update.
Ticket: #2563300
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The ctx->zd_is_update is being set in various
spots based upon the same value that we are
passing into dplane_ctx_ns_init. Let's just
consolidate all this into the dplane_ctx_ns_init
so that the zd_is_udpate value is set at the
same time that we increment the sequence numbers
to use.
As a note for future me's reading this. The sequence
number choosen for the seq number passed to the
kernel is that each context gets a copy of the
appropriate nlsock to use. Since it's a copy
at a point in time, we know we have a unique sequence
number value.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When nl_batch_read_resp gets a full on failure -1 or an implicit
ack 0 from the kernel for a batch of code. Let's immediately
mark all of those in the batch pass/fail as needed. Instead
of having them marked else where.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
I'm seeing this crash in various forms:
Program terminated with signal SIGSEGV, Segmentation fault.
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f418efbc7c0 (LWP 3580253))]
(gdb) bt
(gdb) f 4
267 (*func)(hb, arg);
(gdb) p hb
$1 = (struct hash_bucket *) 0x558cdaafb250
(gdb) p *hb
$2 = {len = 0, next = 0x0, key = 0, data = 0x0}
(gdb)
I've also seen a crash where data is 0x03.
My suspicion is that hash_iterate is calling zebra_nhg_sweep_entry which
does delete the particular entry we are looking at as well as possibly other
entries when the ref count for those entries gets set to 0 as well.
Then we have this loop in hash_iterate.c:
for (i = 0; i < hash->size; i++)
for (hb = hash->index[i]; hb; hb = hbnext) {
/* get pointer to next hash bucket here, in case (*func)
* decides to delete hb by calling hash_release
*/
hbnext = hb->next;
(*func)(hb, arg);
}
Suppose in the previous loop hbnext is set to hb->next and we call
zebra_nhg_sweep_entry. This deletes the previous entry and also
happens to cause the hbnext entry to be deleted as well, because of nhg
refcounts. At this point in time the memory pointed to by hbnext is
not owned by the pthread anymore and we can end up on a state where
it's overwritten by another pthread in zebra with data for other incoming events.
What to do? Let's change the sweep function to a hash_walk and have
it stop iterating and to start over if there is a possible double
delete operation.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add to `show zebra` whether or not RA is compiled into FRR
and whether or not BGP is using RFC 5549 at the moment.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
The dataplane pthread only processes a limited set of incoming
netlink notifications: only register for that set of events,
reducing duplicate incoming netlink messages.
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
Current code treats all metaqueues as lists of route_node structures.
However, some queues contain other structures that need to be cleaned up
differently. Casting the elements of those queues to struct route_node
and dereferencing them leads to a crash. The crash may be seen when
executing bgp_multi_vrf_topo2.
Fix the code by using the proper list element types.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
The name 'opaque' is a little general - call the route_entry
struct 're_opaque' to make it more specific.
Signed-off-by: Mark Stapp <mstapp@nvidia.com>
RA packets are pretty chatty and when there is a warning from
a missconfiguration on the network, the log file gets filed
up with warnings. Modify the code in rtadv.c to only spit
out the warning in these cases at most every 6 hours.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
EVPN route add should be queued to preserve the config order.
In particular, against deletion in rib_delete().
Signed-off-by: Xiao Liang <shaw.leon@gmail.com>
vrf_disable is always called first before
vrf_delete. The rnh_table and rnh_table_multicast tables
are already deleted as part of vrf_disable. No need
to do it again.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
VRF name should not be printed in the config since 574445ec. The update
was done for NB config output but I missed it for regular vty output.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
Add a thread_ignore_late_timer(struct thread *thread) function
that allows thread.c to ignore when timers are late to the party.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>