There are some observed instances where we end up trying to cancel a rw
job based on a file descriptor that we don't have a reference on. The
specific cancel function for rw jobs assumes it's called with a file
descriptor that is valid within pollfds and will cause a segmentation
fault by buffer overrun if this is not the case.
Instead log it and move on. Since the fd does not exist this should
patch over the buggy behavior and provide additional information to help
in finding the root cause.
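A minimal sketch of the guard described above, assuming a flat array of
struct pollfd. The names pollfd_index() and cancel_rw() are hypothetical
and are not the functions in thread.c:

```c
#include <poll.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: find fd in the pollfd array, or return -1. */
int pollfd_index(const struct pollfd *pfds, size_t count, int fd)
{
	for (size_t i = 0; i < count; i++)
		if (pfds[i].fd == fd)
			return (int)i;
	return -1;
}

/*
 * Clear the given event bits for fd. Returns 0 on success. When fd is
 * not tracked, log it and return -1 instead of indexing past the array.
 */
int cancel_rw(struct pollfd *pfds, size_t count, int fd, short state)
{
	int i = pollfd_index(pfds, count, fd);

	if (i < 0) {
		fprintf(stderr, "cancel_rw: fd %d not in pollfds, ignoring\n",
			fd);
		return -1;
	}
	pfds[i].events &= (short)~state;
	return 0;
}
```

The log line is what gives the extra information; the early return is
what avoids the overrun.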
Reviewed-by: Donald Sharp <sharpd@cumulusnetworks.com>
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When we remove a thread from a pqueue, use the saved
index to go to the correct spot immediately instead of
having to search the whole queue for it.
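To illustrate the idea, here is a toy binary min-heap in which every
item remembers its own slot. This is a sketch under assumed names
(struct item, heap_remove() and friends), not the pqueue code itself:

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_CAP 64

struct item {
	long key;     /* ordering key, e.g. a timer expiry */
	size_t index; /* current slot in the heap array, kept up to date */
};

struct heap {
	struct item *slot[HEAP_CAP];
	size_t size;
};

static void place(struct heap *h, size_t i, struct item *it)
{
	h->slot[i] = it;
	it->index = i; /* remember where the item lives */
}

static void sift_up(struct heap *h, size_t i)
{
	struct item *it = h->slot[i];

	while (i > 0 && h->slot[(i - 1) / 2]->key > it->key) {
		place(h, i, h->slot[(i - 1) / 2]);
		i = (i - 1) / 2;
	}
	place(h, i, it);
}

static void sift_down(struct heap *h, size_t i)
{
	struct item *it = h->slot[i];

	for (;;) {
		size_t c = 2 * i + 1;

		if (c >= h->size)
			break;
		if (c + 1 < h->size && h->slot[c + 1]->key < h->slot[c]->key)
			c++;
		if (h->slot[c]->key >= it->key)
			break;
		place(h, i, h->slot[c]);
		i = c;
	}
	place(h, i, it);
}

void heap_push(struct heap *h, struct item *it)
{
	assert(h->size < HEAP_CAP);
	place(h, h->size, it);
	h->size++;
	sift_up(h, it->index);
}

/* Remove an arbitrary item: jump straight to it->index, no search. */
void heap_remove(struct heap *h, struct item *it)
{
	size_t i = it->index;
	struct item *moved;

	assert(i < h->size && h->slot[i] == it);
	h->size--;
	if (i == h->size)
		return; /* it was the last slot */
	moved = h->slot[h->size];
	place(h, i, moved);
	sift_up(h, i);
	sift_down(h, moved->index);
}
```

The cost of removal drops from a linear scan plus a re-heapify to just
the re-heapify.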
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When displaying thread cpu data, display unsigned instead
of signed data when we get really really really large
numbers of invocations.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
list_free is occasionally being used to delete the
list and accidentally not deleting all the nodes.
We keep running across this usage pattern. Let's
remove the temptation and only allow list_delete
to handle list deletion.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Convert the list_delete(struct list *) function to use
struct list **. This is to allow the list pointer to be nulled.
I keep running into uses of this list_delete function where we
forget to set the returned pointer to NULL and attempt to use
it and then experience a crash, usually after the developer
has long since left the building.
Let's make the API explicit in its setting the list pointer
to null.
Cynical Prediction: This code will expose an attempt
to use the NULL'ed list pointer in some obscure bit
of code.
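The shape of the new API can be sketched with a toy singly linked list.
The toy_ names and struct layouts below are illustrative assumptions,
not the linklist code:

```c
#include <stdlib.h>

struct toy_node {
	struct toy_node *next;
	void *data;
};

struct toy_list {
	struct toy_node *head;
	unsigned int count;
};

struct toy_list *toy_list_new(void)
{
	return calloc(1, sizeof(struct toy_list));
}

/* Prepend data; returns 0 on success, -1 on allocation failure. */
int toy_list_add(struct toy_list *l, void *data)
{
	struct toy_node *n = malloc(sizeof(*n));

	if (n == NULL)
		return -1;
	n->data = data;
	n->next = l->head;
	l->head = n;
	l->count++;
	return 0;
}

/* Free every node and the list, then NULL the caller's pointer. */
void toy_list_delete(struct toy_list **lp)
{
	struct toy_node *n, *next;

	if (lp == NULL || *lp == NULL)
		return;
	for (n = (*lp)->head; n != NULL; n = next) {
		next = n->next;
		free(n);
	}
	free(*lp);
	*lp = NULL;
}
```

Because the callee clears the pointer, a stale use becomes a NULL
dereference at the point of the bug, and a second delete is harmless.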
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
list_delete does not set the list pointer to NULL.
Thus when we accidentally use it later we happily write
off into la-la land instead of crashing immediately.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
1) Some hash key functions were converting pointers
directly to a 32 bit value via downcasting. Pointers
are 64 bit on a majority of our platforms.
2) Some hashes were being created with 256 entries,
downsize the hash creation size to more appropriate
values.
3) Add hash names to hash creation so we can watch
the hash via 'show debugging hashtable'
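For point 1, one possible way to derive a 32 bit key from a pointer
without throwing away its upper half is to fold the two halves
together. This is only a sketch; fold64() and ptr_hash_key() are
hypothetical names and the actual key functions may mix differently:

```c
#include <stdint.h>

/* XOR the upper 32 bits into the lower 32 bits. */
uint32_t fold64(uint64_t v)
{
	return (uint32_t)(v ^ (v >> 32));
}

/*
 * Hash key for a pointer. A plain (uint32_t) cast would ignore the
 * upper half of a 64 bit pointer entirely; folding keeps it in play.
 */
uint32_t ptr_hash_key(const void *p)
{
	return fold64((uint64_t)(uintptr_t)p);
}
```

Two values that differ only above bit 31 now produce different keys,
where the downcast would have made them collide.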
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
* Fix bad format specifier in thread.[ch]
* Move PRINTF_ATTRIBUTE macro to zebra.h
* Use PRINTF_ATTRIBUTE on termtable printers
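As an illustration of why the attribute helps, the sketch below assumes
a definition of PRINTF_ATTRIBUTE (the real macro in zebra.h may differ)
and applies it to a made-up printer, cell_printf():

```c
#include <stdarg.h>
#include <stdio.h>

/* Assumed definition, for illustration only. */
#if defined(__GNUC__)
#define PRINTF_ATTRIBUTE(a, b) __attribute__((__format__(__printf__, a, b)))
#else
#define PRINTF_ATTRIBUTE(a, b)
#endif

/* Argument 3 is the format string, varargs start at argument 4. */
int cell_printf(char *buf, size_t len, const char *fmt, ...)
	PRINTF_ATTRIBUTE(3, 4);

int cell_printf(char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}
```

With the attribute in place, a call such as
cell_printf(buf, sizeof(buf), "%d", some_unsigned_long) draws a
-Wformat warning at compile time rather than printing garbage at
runtime.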
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Add support for naming pthreads. Also, when we don't have any records
yet, say so in the output.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This patch fixes up show thread commands so that they know about
and operate on all extant thread_masters, since we can now have multiple
running in any given application.
This change also eliminates a heap use after free that appears when
using a single cpu_record shared among multiple threads. Since each
struct thread has pointers to bits of memory that are freed when the global
statistics hash table is freed, later accesses are invalid. By moving
the stats hash to be unique to each thread_master this problem is
sidestepped.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
* Account for the pipe poker in poll() by explicitly returning NULL when
we have no events, timers or file descriptors to work with
* Add a comment explaining exactly what we are doing and why
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
* Update pollfds copy as well as the original
* Keep array count for copy in thread_master
* Remove last remnants of POLLHUP in .events field
* Remove unused snmpcount (lolwut)
* Improve docs
* Add missing do_thread_cancel() call in thread_cancel_event()
* Change thread_fetch() to always enter poll() to avoid starving i/o
* Remember to free up cancel_req when destroying thread_master
* Fix dereference of null pointer
* Fix dead store to timeval
* Fix missing condition for condition variable :-)
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This patch implements an MT-safe version of thread_cancel() in
thread_cancel_async(). Behavior as follows:
* Cancellation requests are queued into a list
* Cancellation requests made from the same pthread as the thread_master
owner are serviced immediately (thread_cancel())
* Cancellation requests made from a separate pthread are queued and the
call blocks on a condition variable until the owning pthread services
the request, at which point the condition variable is signaled and
execution continues (thread_cancel_async())
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
it's just an alias for a millisecond timer used in exactly nine places
and serves only to complicate
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
If fd_poll() is called with no file descriptors, an incorrect check in
the function prelude causes it to return instantly; for a thread that
wishes to poll but has no file descriptors, this results in busy
waiting. Desired behavior is to block.
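The desired behavior can be checked in isolation: poll() accepts an
empty descriptor set and simply blocks until the timeout. The helper
name sleep_in_poll() is made up for this sketch, and some non-Linux
systems are stricter about a NULL array:

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <stddef.h>
#include <time.h>

/* Wait in poll() with no descriptors; returns elapsed milliseconds. */
long sleep_in_poll(int timeout_ms)
{
	struct timespec a, b;

	clock_gettime(CLOCK_MONOTONIC, &a);
	/* nfds == 0: nothing to watch, so this blocks for the timeout. */
	(void)poll(NULL, 0, timeout_ms);
	clock_gettime(CLOCK_MONOTONIC, &b);
	return (long)(b.tv_sec - a.tv_sec) * 1000
	       + (b.tv_nsec - a.tv_nsec) / 1000000;
}
```

A prelude that returns early when the descriptor count is zero turns
this blocking wait into a spin.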
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
a bunch of pollfds can cause a stack overflow when using a stack
allocated buffer...silly me...
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When scheduling a task onto a thread master owned by another pthread, we
need to lock the thread master's mutex. However, if the pthread which
owns that thread master is in poll(), we could be stuck waiting for a
very long time. To solve this, we copy all data poll() needs and unlock
during poll(). To break the target pthread out of poll(), thread_master
has gained a pipe whose reading end is passed into poll(). After an event
that requires immediate action by the target pthread, a byte is written
into the pipe in order to wake it up.
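The wakeup mechanism is the classic self-pipe pattern. A minimal
single-file sketch follows; struct waker and its functions are
hypothetical names, and the mutex handling described above is left out:

```c
#include <poll.h>
#include <unistd.h>

struct waker {
	int io_pipe[2]; /* [0] is polled by the owner, [1] is poked */
};

int waker_init(struct waker *w)
{
	return pipe(w->io_pipe);
}

/* Called by any pthread after it has queued work for the owner. */
void waker_poke(struct waker *w)
{
	char byte = 1;
	ssize_t n = write(w->io_pipe[1], &byte, 1);

	(void)n; /* a full pipe already guarantees a wakeup */
}

/* Owner side: returns 1 if woken, 0 on timeout, -1 on error. */
int waker_wait(struct waker *w, int timeout_ms)
{
	struct pollfd pfd = {.fd = w->io_pipe[0], .events = POLLIN};
	char drain[64];
	ssize_t r;
	int n = poll(&pfd, 1, timeout_ms);

	if (n <= 0)
		return n;
	/* Consume up to 64 pending bytes so the next poll() can block. */
	r = read(w->io_pipe[0], drain, sizeof(drain));
	(void)r;
	return 1;
}
```

In a real event loop the read end would sit in the same pollfd array as
the daemon's sockets, so one poll() call covers both I/O and wakeups.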
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
[DL: split off from select() removal]
poll() is present on every supported platform and does not have an upper
limit on file descriptors.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
[DL: split off from AWAKEN() change]
Allow some more flexibility in case callers wish to manage their own
thread pointers and don't require or don't want the thread to keep a
back reference to its holding pointer.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The FSF's address changed, and we had a mixture of comment styles for
the GPL file header. (The style with * at the beginning won out with
580 to 141 in existing files.)
Note: I've intentionally left intact other "variations" of the copyright
header, e.g. whether it says "Zebra", "Quagga", "FRR", or nothing.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Passing stack value to thread_add_* causes thread->ref to become an
invalid pointer when the value goes out of scope
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
When scheduling a thread, the scheduling function returns a pointer to
the struct thread that was placed on one of the scheduling queues in the
associated thread master. This pointer is used to check whether or not
the thread is scheduled, and is passed to thread_cancel() should the
daemon need to cancel that particular task.
The thread_fetch() function is called to retrieve the next thread to
execute. However, when it returns, the aforementioned pointer is not
updated. As a result, in order for the above use cases to work, every
thread handler function must set the associated pointer to NULL. This is
bug prone, and moreover, not thread safe.
This patch changes the thread scheduling functions to return void. If
the caller needs a reference to the scheduled thread, it must pass in a
pointer to store the pointer to the thread struct in. Subsequent calls
to thread_cancel(), thread_cancel_event() or thread_fetch() will result
in that pointer being nulled before return. These operations occur
within the thread_master critical sections.
Overall this should avoid bugs introduced by thread handler funcs
forgetting to null the associated pointer, double-scheduling caused by
overwriting pointers to currently scheduled threads without performing a
nullity check, and the introduction of true kernel threads causing race
conditions within the userspace threading world.
Also removes the return value for thread_execute since it always returns
null...
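The back-reference scheme can be sketched with a toy scheduler. All
names here (struct task, task_add(), task_fetch(), ...) are invented
for the example, and the thread_master locking is omitted:

```c
#include <stddef.h>

#define MAX_TASKS 8

struct task {
	int (*func)(void *);
	void *arg;
	struct task **ref; /* caller's pointer to this task, or NULL */
	int scheduled;
};

struct master {
	struct task pool[MAX_TASKS];
};

/* Schedule func. Returns nothing; the handle goes into *t_ptr if given.
 * A full pool silently drops the request in this sketch. */
void task_add(struct master *m, int (*func)(void *), void *arg,
	      struct task **t_ptr)
{
	for (size_t i = 0; i < MAX_TASKS; i++) {
		struct task *t = &m->pool[i];

		if (t->scheduled)
			continue;
		t->func = func;
		t->arg = arg;
		t->ref = t_ptr;
		t->scheduled = 1;
		if (t_ptr)
			*t_ptr = t;
		return;
	}
}

/* Null the caller's pointer on their behalf and free the slot. */
static void task_release(struct task *t)
{
	if (t->ref)
		*t->ref = NULL;
	t->ref = NULL;
	t->scheduled = 0;
}

void task_cancel(struct task *t)
{
	if (t && t->scheduled)
		task_release(t);
}

/* Hand the next scheduled task to the caller through *fetch. */
struct task *task_fetch(struct master *m, struct task *fetch)
{
	for (size_t i = 0; i < MAX_TASKS; i++) {
		struct task *t = &m->pool[i];

		if (!t->scheduled)
			continue;
		task_release(t);
		*fetch = *t;
		return fetch;
	}
	return NULL;
}

/* Sample handler used only to exercise the sketch. */
int bump(void *arg)
{
	(*(int *)arg)++;
	return 0;
}
```

The handler never touches the caller's pointer; by the time it runs,
the scheduler has already cleared it.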
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
The way thread.c is written, a caller who wishes to be able to cancel a
thread or avoid scheduling it twice must keep a reference to the thread.
Typically this is done with a long lived pointer whose value is checked
for null in order to know if the thread is currently scheduled. The
check-and-schedule idiom is so common that several wrapper macros in
thread.h existed solely to provide it.
This patch removes those macros and adds a new parameter to all
thread_add_* functions: a pointer to the struct thread * in which to
store the result of a scheduling call. If the pointer passed is
non-null, the thread will only be scheduled if the struct thread * it
points to is null. This makes the idiom consistent across callers.
A Coccinelle spatch has been used to transform code of the form:
if (t == NULL)
t = thread_add_* (...)
to the form
thread_add_* (..., &t)
The THREAD_ON macros have also been transformed to the underlying
thread.c calls.
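The check that used to live at every call site now lives inside the
scheduling function. A stripped-down sketch, with struct job,
struct queue and job_add() as stand-in names:

```c
#include <stddef.h>
#include <stdlib.h>

struct job {
	void (*func)(void *);
	void *arg;
};

struct queue {
	unsigned int scheduled; /* how many jobs were actually queued */
};

/* Schedule a job unless *t_ptr shows one is already pending. */
void job_add(struct queue *q, void (*func)(void *), void *arg,
	     struct job **t_ptr)
{
	struct job *j;

	if (t_ptr && *t_ptr)
		return; /* the old "if (t == NULL)" check, done for us */
	j = malloc(sizeof(*j));
	if (j == NULL)
		return;
	j->func = func;
	j->arg = arg;
	q->scheduled++;
	if (t_ptr)
		*t_ptr = j;
}
```

Calling job_add() twice with the same &t queues the job once, which is
exactly what the removed wrapper macros were for.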
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Rename HAVE_POLL to HAVE_POLL_CALL. When compiling with
snmp and poll enabled, the old name was causing issues.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This change adds three fields to thread_master and associated code to
use them. The fields are:
* long selectpoll_timeout
This is a millisecond value that, if nonzero, will override the
internally calculated timeout for select()/poll(). -1 indicates
nonblocking while a positive value indicates the desired timeout in
milliseconds.
* bool spin
This indicates whether a call to thread_fetch() should result in a loop
until work is available. By default this is set to true, in order to
keep the default behavior. In this case a return value of NULL indicates
that a fatal signal was received in select() or poll(). If it is set to
false, thread_fetch() will return immediately. NULL is then an
acceptable return value if there is no work to be done.
* bool handle_signals
This indicates whether or not the pthread that owns the thread master
is responsible for handling signals (since this is an MT-unsafe
operation, it is best to have just the root thread do it). It is set to
true by default. Non-root pthreads should set this to false.
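How the timeout override might feed into poll() can be sketched as
follows. The struct below only mirrors the three fields named above;
opts_default() and effective_timeout() are hypothetical helpers, and
treating every negative value like -1 is an assumption of this sketch:

```c
#include <stdbool.h>

struct master_opts {
	long selectpoll_timeout; /* 0: no override, >0: ms, -1: nonblocking */
	bool spin;               /* loop in fetch until work is available */
	bool handle_signals;     /* this pthread handles signals */
};

/* Defaults as described: no override, spin, handle signals. */
struct master_opts opts_default(void)
{
	struct master_opts o = {
		.selectpoll_timeout = 0,
		.spin = true,
		.handle_signals = true,
	};
	return o;
}

/* Pick the timeout, in milliseconds, to hand to poll(). */
int effective_timeout(const struct master_opts *o, int computed_ms)
{
	if (o->selectpoll_timeout > 0)
		return (int)o->selectpoll_timeout;
	if (o->selectpoll_timeout < 0)
		return 0; /* poll() returns immediately with timeout 0 */
	return computed_ms;
}
```

A non-root pthread would start from the defaults and clear
handle_signals before entering its loop.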
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Fixes a few insufficient critical sections. Adds back locking for
thread_cancel(), since while thread_cancel() is only safe to call from
the pthread which owns the thread master due to races involving
thread_fetch() modifying thread master's ready queue, we still need
mutual exclusion here for all of the other public thread.c functions to
maintain their MT-safety.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
This change introduces synchronization mechanisms to thread.c in order
to allow safe concurrent use.
Thread.c should now be thread-safe with respect to:
* struct thread
* struct thread_master
Calls into thread.c for operations upon data of this type should not
require external synchronization.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
In the case where we are using select as
the operator *and* we call
funcname_thread_add_read_write *and* the
fd is already set, we would overwrite
the read/write direction to always be READ.
Clearly this was a bad idea.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Since time is no longer cached, if we schedule something with zero
timeout, it will automatically be negative by the time we reach the
event loop.
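One way to cope with an expiry that is already in the past is to clamp
the remaining time at zero. The function name time_remaining() is an
assumption for this sketch, not the code in thread.c:

```c
#include <sys/time.h>

/* Time left until expiry; an already-due timer yields {0, 0}. */
struct timeval time_remaining(struct timeval now, struct timeval expiry)
{
	struct timeval left;

	left.tv_sec = expiry.tv_sec - now.tv_sec;
	left.tv_usec = expiry.tv_usec - now.tv_usec;
	if (left.tv_usec < 0) {
		/* borrow one second */
		left.tv_sec--;
		left.tv_usec += 1000000;
	}
	if (left.tv_sec < 0) {
		/* expiry is in the past: treat as due right now */
		left.tv_sec = 0;
		left.tv_usec = 0;
	}
	return left;
}
```

A timer scheduled with zero timeout is then handled like any other
expired timer instead of producing a negative wait.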
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
monotime_since() does exactly the same thing.
... and timeval_elapsed is now private to lib/thread.c
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Fix the display of 'show thread cpu' to keep track
of the number of active threads and to display that
information.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
lib: Fix thread_execute_crash
This commit does these things:
1) Make thread_add_unuse own the setting of THREAD_UNUSED.
2) Move thread->hist finding to thread_get.
We are storing the thread->hist even when the thread
is on the unused list. This means that we check to see
if the funcname or func have changed and we get new
history. Else we've probably just retrieved the last
unused which has the same func/funcname. This is
a common practice to do THREAD_OFF/THREAD_ON in
quick succession.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This moves all install_element calls into the file where the DEFUNs are
located. This fixes several small related bugs:
- ospf6d wasn't installing a "no interface FOO" command
- zebra had a useless copy of "interface FOO"
- pimd's copy of "interface FOO" was not setting qobj_index, which means
"description LINE" commands would fail with an error
The next commit will do the actual act of making "foo_cmd" static.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Since we have autoconf results from a wide swath of target platforms, we
can go remove checks that have the same result on all systems.
This also removes several "fallback" implementations of functions that,
at some point in the history, weren't available on all target platforms.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
When displaying thread time for long running/busy
protocols, the space allocated may not be sufficient.
Allow the runtime to take a bit more space.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This is a rather large mechanical commit that splits up the memory types
defined in lib/memtypes.c and distributes them into *_memory.[ch] files
in the individual daemons.
The zebra change is slightly annoying because there is no nice place to
put the #include "zebra_memory.h" statement.
bgpd, ospf6d, isisd and some tests were reusing MTYPEs defined in the
library for their own use. This is bad practice and would break when the
memtypes are made static.
Acked-by: Vincent JARDIN <vincent.jardin@6wind.com>
Acked-by: Donald Sharp <sharpd@cumulusnetworks.com>
[CF: rebased for cmaster-next]
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Signed-off-by: Christian Franke <chris@opensourcerouting.org>
The regular expression for finding DEFUN/ALIAS in
extract.pl looks for "DEFUN (" or "ALIAS (". If
the *.c file does not have this then it will just
silently ignore the cli.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
AgentX fd/timeout handling is rather hackishly monkeyed into thread.c.
Replace with code that uses plain thread_* functions.
NB: Net-SNMP's API rivals Quagga's in terms of age and absence of
documentation. netsnmp_check_outstanding_agent_requests() in particular
seems to be unused and is therefore untested.
The most useful documentation on this is actually the blog post Vincent
Bernat wrote when he originally integrated this into lldpd and Quagga:
https://vincent.bernat.im/en/blog/2012-snmp-event-loop.html
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Another zoo extension, this adds a timer scheduling function that takes
a struct timeval argument (which is actually what the wrappers boil down
to, yet it's not exposed...)
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
QUAGGA_CLK_REALTIME and QUAGGA_CLK_REALTIME_STABILISED aren't used
anywhere in the code. Remove. The enum is kept to avoid having to
change the calls everywhere.
Same applies to the workaround code for systems that don't have a
monotonic clock. None of the systems Quagga works on fall into that
category; Linux, BSD and Solaris all do clock_gettime, for OSX we have
mach_absolute_time() - that covers everything.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
ospf->lsa_refresher_started is only used in relative timing to itself;
replace with monotonic clock which is appropriate for this.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
- HAVE_POLL is overloaded by net-snmp
- missing includes
- ospf6_snmp converted to vrf_iflist()
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Two Fixes:
1) When an fd has both read and write set in its .events
(POLLHUP | POLLIN | POLLOUT) and a
thread_cancel_read_write call is executed
from a protocol, the code was blindly removing
the fd from consideration at all.
2) POLLNVAL was being evaluated before POLLIN|POLLOUT
were being evaluated. While I didn't see a case
of POLLNVAL being included with other .revents flags
I decided to move the POLLNVAL and POLLHUP handling
to the same section of code.
Additionally the function thread_cancel_read_write
was poorly named and led me to poorly implement
the poll version of it. I've renamed the function
thread_cancel_read_or_write in an attempt to
make this problem moot in the future.
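The first fix amounts to clearing only the requested direction and
dropping the fd only once neither direction is wanted. A sketch over a
plain pollfd array, using the made-up name toy_cancel_read_or_write():

```c
#include <poll.h>
#include <stddef.h>

/*
 * Stop watching `state` (POLLIN or POLLOUT) on fd.
 * Returns the new number of entries in pfds.
 */
size_t toy_cancel_read_or_write(struct pollfd *pfds, size_t count, int fd,
				short state)
{
	for (size_t i = 0; i < count; i++) {
		if (pfds[i].fd != fd)
			continue;
		pfds[i].events &= (short)~state;
		if (pfds[i].events & (POLLIN | POLLOUT))
			return count; /* other direction still wanted */
		/* Nothing left to watch: close the gap in the array. */
		for (size_t j = i + 1; j < count; j++)
			pfds[j - 1] = pfds[j];
		return count - 1;
	}
	return count; /* fd not tracked, nothing to do */
}
```

Cancelling the write side of an fd that is also being read leaves the
read side armed.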
Ticket: CM-11027
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
(cherry picked from commit f6da66a913)
now that we know what thread we're currently executing, let's add that
information to SEGV / assert backtraces.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
(cherry picked from commit 615f9f18fc025757a255f936748fc1e86e922783)
the library's thread scheduling functions keep track of the thread
function's name, so far so good. However, copying the compiler-provided
constant into a buffer inside the thread structure is plain useless.
Also, strip_funcname() was trying to support something that never
happens.
Instead, let's use some bytes here to track where threads are scheduled
from. Another commit will print that information on crashes.
Ripping out useless stuff: -64 bytes in the thread structure
Re-add as const ptr: +8 bytes
Extra debug info: +12 bytes
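Roughly, the new bookkeeping looks like the sketch below. The struct
and the TASK_DEBUG_SET macro are hypothetical; the byte counts above
refer to the real thread structure, not to this example:

```c
#include <stdio.h>

struct task_debug {
	const char *funcname;  /* points at the string literal, no copy */
	const char *schedfrom; /* source file that scheduled the task */
	int schedfrom_line;
};

void task_debug_set(struct task_debug *d, const char *funcname,
		    const char *file, int line)
{
	d->funcname = funcname;
	d->schedfrom = file;
	d->schedfrom_line = line;
}

/* Hypothetical wrapper: captures the name and call site automatically. */
#define TASK_DEBUG_SET(d, func)                                              \
	task_debug_set((d), #func, __FILE__, __LINE__)

/* Render the record, e.g. for a crash report. */
int task_debug_format(const struct task_debug *d, char *buf, size_t len)
{
	return snprintf(buf, len, "%s scheduled from %s:%d", d->funcname,
			d->schedfrom, d->schedfrom_line);
}
```

Since #func and __FILE__ expand to string literals with static storage,
storing the pointers is enough and no buffer inside the structure is
needed.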
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
(cherry picked from commit 3493b7731b750cbc62f00be94b624a08ccccf0b2)
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Reviewed-by: Don Slice <dslice@cumulusnetworks.com>
Reviewed-by: Daniel Walton <dwalton@cumulusnetworks.com>