On a heavily stressed system, getrusage can account for
significant running time due to the switch into the kernel.
Allow the end operator to specify `--disable-cpu-time` to
avoid this call. Additionally, hide `show thread cpu`
when this option is selected.
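A minimal sketch of the guard, assuming a global flag set by the new
option (the flag and struct names are hypothetical, not the actual
implementation):

    /* set once at startup when --disable-cpu-time is given */
    if (cputime_disabled) {
        memset(&now->cpu, 0, sizeof(now->cpu));
        return;
    }
    getrusage(RUSAGE_SELF, &now->cpu);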
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When POLLNVAL is received for an FD, that FD is removed from the
pfd array and the array is compacted using memmove. However, the
vacated index is not cleaned up, so when a new FD later takes up
that index it ends up using stale events with no handler set for
them.
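A sketch of the fix, with the array and counter names assumed: after
compacting, scrub the vacated tail slot so a reused index starts clean.

    memmove(&pfds[i], &pfds[i + 1],
            (pfdcount - i - 1) * sizeof(struct pollfd));
    pfdcount--;
    /* clear the now-unused tail entry so a new FD cannot inherit
     * stale .events/.revents from the old occupant */
    memset(&pfds[pfdcount], 0, sizeof(struct pollfd));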
Signed-off-by: Santosh P K <sapk@vmware.com>
frr_with_mutex(...) { ... } locks and automatically unlocks the listed
mutex(es) when the block is exited. This adds a bit of safety against
forgetting the unlock in error paths and the like, and makes the code
slightly more readable.
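A usage sketch, assuming an already-initialized pthread_mutex_t mtx:

    frr_with_mutex(&mtx) {
        /* mtx is held throughout this block and released
         * automatically when execution leaves it */
        shared_counter++;
    }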
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Running the Debian packaging turns up a bunch of spelling errors:
I: frr: spelling-error-in-binary usr/bin/vtysh occurences occurrences
I: frr: spelling-error-in-binary usr/lib/frr/bfdd Amount of times Number of times
I: frr: spelling-error-in-binary usr/lib/frr/bgpd occurences occurrences
I: frr: spelling-error-in-binary usr/lib/frr/bgpd recieved received
I: frr: spelling-error-in-binary usr/lib/frr/isisd betweeen between
I: frr: spelling-error-in-binary usr/lib/frr/ospf6d Infomation Information
I: frr: spelling-error-in-binary usr/lib/frr/ospfd missmatch mismatch
I: frr: spelling-error-in-binary usr/lib/frr/pimd bootsrap bootstrap
I: frr: spelling-error-in-binary usr/lib/frr/pimd Unknwon Unknown
I: frr: spelling-error-in-binary usr/lib/frr/zebra Requsted Requested
I: frr: spelling-error-in-binary usr/lib/frr/zebra uknown unknown
I: frr: spelling-error-in-binary usr/lib/x86_64-linux-gnu/frr/libfrr.so.0.0.0 overriden overridden
This commit fixes all of them except the bgp `recieved` issue,
because it is part of JSON output; that one will need to go
through a deprecation cycle.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Replaces the use of pqueue_* for the thread_master's timer list with an
instance of DECLARE_HEAP_*.
Signed-off-by: David Lamparter <equinox@diac24.net>
When displaying `show thread poll` data, also show the
function we are supposed to call when the poll
event happens.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When adding a read/write poll event in a development build,
add a bit of code to ensure that we do not already have a read
or write event scheduled.
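A sketch of the check, assuming FRR's dev-build define and the per-fd
thread arrays in the thread_master:

    #ifdef DEV_BUILD
        /* a second read handler for the same fd indicates a bug */
        assert(m->read[fd] == NULL);
    #endif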
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If we have created an fd for i/o and removed the handling
thread, but the fd is still in the poll data structure, there
existed a case where poll would report the fd as ready but we
would immediately do nothing with it, because we had no thread
to hand the event to.
This leads to an infinite loop. Prevent the infinite loop
from happening and log the problem.
We still need to find the cause of this happening, but
let's prevent the system from melting down in the meantime.
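A sketch of the guard, with variable names assumed; the point is to
stop polling the orphaned fd instead of spinning on it:

    if (!thread) {
        zlog_warn("fd %d: poll() reported events 0x%x but no thread to handle them",
                  pollfds[i].fd, pollfds[i].revents);
        pollfds[i].events = 0; /* stop asking poll() about this fd */
        continue;
    }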
Fixes: #2796
Signed-off-by: David Lamparter <equinox@diac24.net>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Various compilers in our CI system were complaining about various
auto-conversions. Let's get these cleaned up a bit more.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Upon startup FRR reads the maximum number of FDs it may
use via the getrlimit call. We then limit the poll data
structure size to that value; the OS also limits our FDs
to that value, because that is what is set. Provide a
methodology by which an interested end user can figure
this data out.
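A sketch of the startup query; getrlimit() with RLIMIT_NOFILE is the
standard call, the surrounding names are assumptions:

    #include <sys/resource.h>

    struct rlimit limit;

    if (getrlimit(RLIMIT_NOFILE, &limit) == 0)
        m->fd_limit = (int)limit.rlim_cur; /* soft limit bounds pollfds */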
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The 'show thread cpu' command referenced a 'b' option, which
is not parsed at all in the parse_filter function. I do not
know what it was referencing, so it has been removed.
Update the help strings to reflect this reality.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This is necessary to avoid a name collision with std::for_each
from C++.
Fixes the compilation of the gRPC northbound module.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
It doesn't make much sense for a hash function to modify its argument,
so const the hash input.
BGP does this in a couple of places; those now cast away the const.
Not great, but no worse than it was.
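The resulting function-pointer shape, sketched (member name assumed):

    unsigned int (*hash_key)(const void *data); /* input is now const */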
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Replaces the use of pqueue_* for the thread_master's timer list with an
instance of DECLARE_SKIPLIST_*.
Signed-off-by: David Lamparter <equinox@diac24.net>
Replaces the open-coded thread_list with a DECLARE_LIST instantiation.
Some function prototypes are actually identical to what was previously
open-coded.
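A usage sketch of the typesafe list macros; the struct and field names
here are illustrative:

    PREDECL_LIST(thread_list);

    struct thread {
        struct thread_list_item threaditem; /* intrusive linkage */
        /* ... */
    };

    DECLARE_LIST(thread_list, struct thread, threaditem);

    /* generated: thread_list_add_tail(), thread_list_pop(),
     * thread_list_first(), thread_list_count(), ... */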
Signed-off-by: David Lamparter <equinox@diac24.net>
C++ doesn't have ISO C11 stdatomic.h or "_Atomic inttype", so use
std::atomic instead to make the headers compatible.
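One common shape for such a compatibility header, as a sketch (the
typedef name is an assumption):

    #ifdef __cplusplus
    #include <atomic>
    typedef std::atomic<unsigned int> atomic_uint_compat;
    #else
    #include <stdatomic.h>
    typedef _Atomic unsigned int atomic_uint_compat;
    #endif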
Signed-off-by: David Lamparter <equinox@diac24.net>
When using getrusage, we have multiple choices about what to ask
for when gathering data about this particular thread of execution:
RUSAGE_SELF -> gather cpu run time for all pthreads associated
with this process.
RUSAGE_THREAD -> gather cpu run time for this particular
pthread.
Clearly, for both slow-thread data gathering and `show thread cpu`,
it is preferable to gather data only about the currently running
pthread. RUSAGE_SELF was probably the original behavior from before
we had multiple pthreads, when the distinction didn't matter.
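A sketch of the preferred call; RUSAGE_THREAD is Linux-specific, so a
fallback to RUSAGE_SELF is kept:

    #include <sys/resource.h>

    struct rusage ru;

    #ifdef RUSAGE_THREAD
        getrusage(RUSAGE_THREAD, &ru); /* this pthread only */
    #else
        getrusage(RUSAGE_SELF, &ru);   /* whole-process fallback */
    #endif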
Prior to this change, 10 iterations of installing and removing
1 million routes in zebra gave us this cpu time for the dataplane
pthread:
Showing statistics for pthread Zebra dplane thread
--------------------------------------------------
CPU (user+system): Real (wall-clock):
Active Runtime(ms) Invoked Avg uSec Max uSecs Avg uSec Max uSecs Type Thread
0 280902.149 326541 860 2609982 550 2468910 E dplane_thread_loop
After this change we are seeing this:
Showing statistics for pthread Zebra dplane thread
--------------------------------------------------
CPU (user+system): Real (wall-clock):
Active Runtime(ms) Invoked Avg uSec Max uSecs Avg uSec Max uSecs Type Thread
0 58045.560 334944 173 277226 539 2502268 E dplane_thread_loop
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The ->hash_cmp and linked list ->cmp functions were sometimes
being used interchangeably and this really is not a good
thing. So let's modify the hash_cmp function pointer to return
a boolean and convert everything to use the new syntax.
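The distinction, sketched (member names assumed):

    bool (*hash_cmp)(const void *a, const void *b); /* equality: true/false */
    int (*cmp)(void *a, void *b);                   /* ordering: <0, 0, >0 */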
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Don't allocate threads on the stack; use the standardized
`thread_get` and `thread_add_unused` to avoid creating corner cases in
the thread API.
This fixes a thread mutex memory leak on FreeBSD.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Two important changes:
* Centralize the thread teardown procedure;
* Save and restore thread mutex context to avoid losing the memory
pointer;
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
The master->unused list was unbounded during normal operation.
A full BGP feed on my machine left 11k threads on the unused
list, taking up over 2MB of data. This seemed a bit excessive;
reduce it to a limit of 10.
Also fix a crash that this exposed, where we assumed that a thread
structure was not deleted.
Future committers can make this configurable or modify the value
to something better for their system; I am dubious of the value
of doing so.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
We were storing the poll data for the read and write
memory information in MTYPE_THREAD, so `show memory`
was not able to show the actual amount of memory
associated with `struct thread`.
Remove unnecessary NULL checks on malloc.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Thread statistics are collected and stored in a hashtable shared across
threads, but while the hashtable itself is protected by a mutex, the
records themselves were not being updated safely. Change all thread
history collection to use atomic operations.
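A sketch of the pattern; the history-record field names are
assumptions:

    #include <stdatomic.h>

    atomic_fetch_add_explicit(&hist->total_calls, 1,
                              memory_order_seq_cst);
    atomic_fetch_add_explicit(&hist->total_cpu, cpu_usec,
                              memory_order_seq_cst);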
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The following nonstandard types are replaced with their standard
equivalents:
- u_char    -> uint8_t
- u_short   -> unsigned short
- u_int     -> unsigned int
- u_long    -> unsigned long
- u_int8_t  -> uint8_t
- u_int16_t -> uint16_t
- u_int32_t -> uint32_t
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
There are some observed instances where we end up trying to cancel a rw
job based on a file descriptor that we don't hold a reference on. The
cancel function specific to rw jobs assumes it is called with a file
descriptor that is valid within pollfds, and will cause a segmentation
fault via buffer overrun if that is not the case.
Instead, log it and move on. Since the fd does not exist, this should
patch over the buggy behavior and provide additional information to
help find the root cause.
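A sketch of the defensive lookup, names assumed: find the fd in
pollfds before touching the array, and bail out if it is absent.

    nfds_t i;

    for (i = 0; i < pfdcount; i++)
        if (pfds[i].fd == fd)
            break;

    if (i == pfdcount) {
        zlog_warn("%s: fd %d not found in pollfds, cannot cancel",
                  __func__, fd);
        return;
    }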
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When we remove a thread from a pqueue, use the saved
index to go to the correct spot immediately instead of
having to search the whole queue for it.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When displaying thread cpu data, display unsigned instead
of signed data when we get really really really large
numbers of invocations.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
list_free is occasionally being used to delete the
list while accidentally not deleting all the nodes.
We keep running across this usage pattern; let's
remove the temptation and only allow list_delete
to handle list deletion.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Convert the list_delete(struct list *) function to use
struct list **. This allows the list pointer to be NULLed.
I keep running into uses of this list_delete function where we
forget to set the returned pointer to NULL, attempt to use
it, and then experience a crash, usually after the developer
has long since left the building.
Let's make the API explicitly set the list pointer
to NULL.
Cynical prediction: this code will expose an attempt
to use the NULL'ed list pointer in some obscure bit
of code.
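A sketch of the new contract (internals abbreviated, MTYPE name
assumed):

    void list_delete(struct list **list)
    {
        if (*list == NULL)
            return;

        list_delete_all_node(*list);   /* free the nodes */
        XFREE(MTYPE_LINK_LIST, *list); /* free the head; *list is now NULL */
    }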
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
list_delete does not set the list pointer to NULL.
Thus when we accidentally use it later, we happily write
off into lala land instead of crashing immediately.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
1) Some hash key functions were converting pointers
directly to a 32 bit value via downcasting. Pointers
are 64 bits on the majority of our platforms.
2) Some hashes were being created with 256 entries;
downsize the hash creation size to more appropriate
values.
3) Add hash names at hash creation time so we can watch
the hash via 'show debugging hashtable'.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
* Fix bad format specifier in thread.[ch]
* Move PRINTF_ATTRIBUTE macro to zebra.h
* Use PRINTF_ATTRIBUTE on termtable printers
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Add support for naming pthreads. Also, note in the output
when we don't have any records yet.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This patch fixes up the show thread commands so that they know about
and operate on all extant thread_masters, since we can now have multiple
running in any given application.
This change also eliminates a heap use-after-free that appeared when
using a single cpu_record shared among multiple threads. Since struct
thread instances hold pointers to bits of memory that are freed when
the global statistics hash table is freed, later accesses were invalid.
By moving the stats hash to be unique to each thread_master, this
problem is sidestepped.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
* Account for the pipe poker in poll() by explicitly returning NULL when
we have no events, timers or file descriptors to work with
* Add a comment explaining exactly what we are doing and why
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
* Update pollfds copy as well as the original
* Keep array count for copy in thread_master
* Remove last remnants of POLLHUP in .events field
* Remove unused snmpcount (lolwut)
* Improve docs
* Add missing do_thread_cancel() call in thread_cancel_event()
* Change thread_fetch() to always enter poll() to avoid starving i/o
* Remember to free up cancel_req when destroying thread_master
* Fix dereference of null pointer
* Fix dead store to timeval
* Fix missing condition for condition variable :-)
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This patch implements an MT-safe version of thread_cancel() in
thread_cancel_async(). Behavior as follows:
* Cancellation requests are queued into a list
* Cancellation requests made from the same pthread as the thread_master
owner are serviced immediately (thread_cancel())
* Cancellation requests made from a separate pthread are queued and the
call blocks on a condition variable until the owning pthread services
the request, at which point the condition variable is signaled and
execution continues (thread_cancel_async()); a sketch of this blocking
path follows
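A minimal sketch of that blocking path; the field and macro names are
assumptions, not the exact implementation:

    pthread_mutex_lock(&master->mtx);
    listnode_add(master->cancel_req, cr); /* queue the request */
    master->canceled = false;
    AWAKEN(master);                       /* poke the owner out of poll() */
    while (!master->canceled)
        pthread_cond_wait(&master->cancel_cond, &master->mtx);
    pthread_mutex_unlock(&master->mtx);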
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
It's just an alias for a millisecond timer, used in exactly nine
places, and serves only to complicate the API.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
If fd_poll() is called with no file descriptors, an incorrect check in
the function prelude causes it to return instantly; for a thread that
wishes to poll but has no file descriptors, this results in busy
waiting. Desired behavior is to block.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
A bunch of pollfds can cause a stack overflow when using a
stack-allocated buffer... silly me...
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When scheduling a task onto a thread master owned by another pthread, we
need to lock the thread master's mutex. However, if the pthread which
owns that thread master is in poll(), we could be stuck waiting for a
very long time. To solve this, we copy all data poll() needs and unlock
during poll(). To break the target pthread out of poll(), thread_master
has gained a pipe whose reading end is passed into poll(). After an event
that requires immediate action by the target pthread, a byte is written
into the pipe in order to wake it up.
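The wakeup itself is tiny; a sketch with names assumed:

    static void master_awaken(struct thread_master *m)
    {
        static const unsigned char wakebyte = 0x01;

        /* the read end of m->io_pipe sits in the pollfd set, so this
         * one byte breaks the owning pthread out of poll() */
        write(m->io_pipe[1], &wakebyte, 1);
    }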
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
[DL: split off from select() removal]
poll() is present on every supported platform and does not have an upper
limit on file descriptors.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
[DL: split off from AWAKEN() change]
Allow some more flexibility in case callers wish to manage their own
thread pointers and don't require or don't want the thread to keep a
back reference to its holding pointer.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The FSF's address changed, and we had a mixture of comment styles for
the GPL file header. (The style with * at the beginning won out with
580 to 141 in existing files.)
Note: I've intentionally left intact other "variations" of the copyright
header, e.g. whether it says "Zebra", "Quagga", "FRR", or nothing.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Passing a stack value to thread_add_* causes thread->ref to become an
invalid pointer when the value goes out of scope.
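An illustration of the bug (handler and surrounding names
hypothetical):

    void schedule_read(struct thread_master *m, int fd)
    {
        struct thread *t = NULL;

        thread_add_read(m, read_handler, NULL, fd, &t);
        /* BUG: on return, &t dangles and thread->ref points into a
         * dead stack frame; the reference needs static or heap
         * lifetime instead */
    }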
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When scheduling a thread, the scheduling function returns a pointer to
the struct thread that was placed on one of the scheduling queues in the
associated thread master. This pointer is used to check whether or not
the thread is scheduled, and is passed to thread_cancel() should the
daemon need to cancel that particular task.
The thread_fetch() function is called to retrieve the next thread to
execute. However, when it returns, the aforementioned pointer is not
updated. As a result, in order for the above use cases to work, every
thread handler function must set the associated pointer to NULL. This is
bug prone, and moreover, not thread safe.
This patch changes the thread scheduling functions to return void. If
the caller needs a reference to the scheduled thread, it must pass in a
pointer to store the pointer to the thread struct in. Subsequent calls
to thread_cancel(), thread_cancel_event() or thread_fetch() will result
in that pointer being nulled before return. These operations occur
within the thread_master critical sections.
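A usage sketch under the changed API (callback and argument names
hypothetical):

    static struct thread *t_read; /* long-lived reference */

    thread_add_read(master, read_cb, arg, fd, &t_read);
    /* t_read is non-NULL while scheduled; thread_cancel() and
     * thread_fetch() null it inside the master's critical sections */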
Overall this should avoid bugs introduced by thread handler funcs
forgetting to null the associated pointer, double-scheduling caused by
overwriting pointers to currently scheduled threads without performing a
nullity check, and the introduction of true kernel threads causing race
conditions within the userspace threading world.
Also removes the return value for thread_execute since it always returns
null...
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The way thread.c is written, a caller who wishes to be able to cancel a
thread or avoid scheduling it twice must keep a reference to the thread.
Typically this is done with a long lived pointer whose value is checked
for null in order to know if the thread is currently scheduled. The
check-and-schedule idiom is so common that several wrapper macros in
thread.h existed solely to provide it.
This patch removes those macros and adds a new parameter to all
thread_add_* functions: a pointer to the struct thread * in which to
store the result of the scheduling call. If the pointer passed is
non-null, the thread will only be scheduled if the pointed-to value is
null. This helps with consistency.
A Coccinelle spatch has been used to transform code of the form:

    if (t == NULL)
        t = thread_add_* (...);

to the form:

    thread_add_* (..., &t);

The THREAD_ON macros have also been transformed to the underlying
thread.c calls.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>