Add a counter to the number of times a thread is starved from
a timer event and add the output to `show thread cpu`
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Add a thread_ignore_late_timer(struct thread *thread) function
that allows thread.c to ignore when timers are late to the party.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Do not return pointer to the newly created thread from various thread_add
functions. This should prevent developers from storing a thread pointer
into some variable without letting the lib know that the pointer is
stored. When the lib doesn't know that the pointer is stored, it doesn't
prevent rescheduling and it can lead to hard to find bugs. If someone
wants to store the pointer, they should pass a double pointer as the last
argument.
Signed-off-by: Igor Ryzhov <iryzhov@nfware.com>
The function thread_is_scheduled allows us to know if
the particular thread is scheduled for execution or not.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
This might be faster if at some point in the future the Linux vDSO
supports CLOCK_THREAD_CPUTIME_ID without making a syscall. (Same
applies for other OSes.)
Signed-off-by: David Lamparter <equinox@diac24.net>
...really no reason to force this into a compile time decision. The
only point is avoiding the getrusage() syscall, which can easily be a
runtime decision.
[v2: also split cputime & walltime limits]
Signed-off-by: David Lamparter <equinox@diac24.net>
Incorporate into the `show thread cpu` the number of times
we have issued warnings about a particular thread being
too slow.
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
When handling a large number of events at one time
FRR will call monotime and getrusage 2 times for each
event. With this change modify the code to change
this to (X events / 2) + 1 calls of getrusage and monotime
Signed-off-by: Donald Sharp <sharpd@nvidia.com>
Extend the thread_cancel_event api so that it's more complete:
look in all the lists of events, including io and timers, for
matching tasks. Add a limited version of the api that only
examines tasks in the event and ready queues.
BGP appears to require the old behavior, so change its macro
to use the more limited cancel api.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
No reason for the thread/task cancellation struct to be public:
move it out of the header file. Also add a flags field.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
gcc 10 complains about some of our format specs, fix them. Use
atomic size_t in thread stats, to work around platform
differences.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Replace all lib/thread cancel macros, use thread_cancel()
everywhere. Only the THREAD_OFF macro and thread_cancel() api are
supported. Also adjust thread_cancel_async() to NULL caller's pointer (if
present).
Signed-off-by: Mark Stapp <mjs@voltanet.io>
Change thread_cancel to take a ** to an event, NULL-check
before dereferencing, and NULL the caller's pointer. Update
many callers to use the new signature.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
This api was earlier present in the daemon code but as multiple daemons
need it moving it to lib will avoid unnecessary copy-paste.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Replaces the use of pqueue_* for the thread_master's timer list with an
instance of DECLARE_HEAP_*.
Signed-off-by: David Lamparter <equinox@diac24.net>
Replaces the use of pqueue_* for the thread_master's timer list with an
instance of DECLARE_SKIPLIST_*.
Signed-off-by: David Lamparter <equinox@diac24.net>
Replaces the open-coded thread_list with a DECLARE_LIST instantiation.
Some function prototypes are actually identical to what was previously
open-coded.
Signed-off-by: David Lamparter <equinox@diac24.net>
These are necessary to use functions defined in these headers from C++.
Signed-off-by: David Lamparter <equinox@diac24.net>
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
C++ doesn't have ISO C11 stdatomic.h or "_Atomic inttype", so use
std::atomic instead to get the headers compatible.
Signed-off-by: David Lamparter <equinox@diac24.net>
Allow the user to specify a run name for display in
'show thread cpu' that is different than the function
name we are calling.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Thread statistics are collected and stored in a hashtable shared across
threads, but while the hashtable itself is protected by a mutex, the
records themselves were not being updated safely. Change all thread
history collection to use atomic operations.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Add support for naming pthreads. Also, note that we don't have any
records yet if that's the case.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This patch fixes up show thread commands so that they know about
and operate on all extant thread_masters, since we can now have multiple
running in any given application.
This change also eliminates a heap use after free that appears when
using a single cpu_record shared among multiple threads. Since struct
thread's have pointers to bits of memory that are freed when the global
statistics hash table is freed, later accesses are invalid. By moving
the stats hash to be unique to each thread_master this problem is
sidestepped.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
* Update pollfds copy as well as the original
* Keep array count for copy in thread_master
* Remove last remnants of POLLHUP in .events field
* Remove unused snmpcount (lolwut)
* Improve docs
* Add missing do_thread_cancel() call in thread_cancel_event()
* Change thread_fetch() to always enter poll() to avoid starving i/o
* Remember to free up cancel_req when destroying thread_master
* Fix dereference of null pointer
* Fix dead store to timeval
* Fix missing condition for condition variable :-)
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This patch implements an MT-safe version of thread_cancel() in
thread_cancel_async(). Behavior as follows:
* Cancellation requests are queued into a list
* Cancellation requests made from the same pthread as the thread_master
owner are serviced immediately (thread_cancel())
* Cancellation requests made from a separate pthread are queued and the
call blocks on a condition variable until the owning pthread services
the request, at which point the condition variable is signaled and
execution continues (thread_cancel_async())
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
it's just an alias for a millisecond timer used in exactly nine places
and serves only to complicate
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
a bunch of pollfds can cause a stack overflow when using a stack
allocated buffer...silly me...
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When scheduling a task onto a thread master owned by another pthread, we
need to lock the thread master's mutex. However, if the pthread which
owns that thread master is in poll(), we could be stuck waiting for a
very long time. To solve this, we copy all data poll() needs and unlock
during poll(). To break the target pthread out of poll(), thread_master
has gained a pipe whose reading end is passed into poll(). After an event
that requires immediate action by the target pthread, a byte is written
into the pipe in order to wake it up.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
[DL: split off from select() removal]
poll() is present on every supported platform and does not have an upper
limit on file descriptors.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
[DL: split off from AWAKEN() change]
Allow some more flexibility in case callers wish to manage their own
thread pointers and don't require or don't want the thread to keep a
back reference to its holding pointer.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The FSF's address changed, and we had a mixture of comment styles for
the GPL file header. (The style with * at the beginning won out with
580 to 141 in existing files.)
Note: I've intentionally left intact other "variations" of the copyright
header, e.g. whether it says "Zebra", "Quagga", "FRR", or nothing.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
When scheduling a thread, the scheduling function returns a pointer to
the struct thread that was placed on one of the scheduling queues in the
associated thread master. This pointer is used to check whether or not
the thread is scheduled, and is passed to thread_cancel() should the
daemon need to cancel that particular task.
The thread_fetch() function is called to retrieve the next thread to
execute. However, when it returns, the aforementioned pointer is not
updated. As a result, in order for the above use cases to work, every
thread handler function must set the associated pointer to NULL. This is
bug prone, and moreover, not thread safe.
This patch changes the thread scheduling functions to return void. If
the caller needs a reference to the scheduled thread, it must pass in a
pointer to store the pointer to the thread struct in. Subsequent calls
to thread_cancel(), thread_cancel_event() or thread_fetch() will result
in that pointer being nulled before return. These operations occur
within the thread_master critical sections.
Overall this should avoid bugs introduced by thread handler funcs
forgetting to null the associated pointer, double-scheduling caused by
overwriting pointers to currently scheduled threads without performing a
nullity check, and the introduction of true kernel threads causing race
conditions within the userspace threading world.
Also removes the return value for thread_execute since it always returns
null...
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>