- tracepoint() -> frrtrace()
- tracelog() -> frrtracelog()
- tracepoint_enabled() -> frrtrace_enabled()
Also removes copypasta'd #ifdefs for those LTTng macros, those are
handled in lib/trace.h
Signed-off-by: Quentin Young <qlyoung@nvidia.com>
There are situations where POLLERR will be returned. But
since we were not handling it. Thread processing effectively
is turned into an infinite loop, which is bad.
Modify the code so that if we receive a POLLERR we turn it
into a read event to be handled as an error from the handler
function.
This was discovered in pim:
Thread statistics for pimd:
Showing poll FD's for main
--------------------------
Count: 14/1024
0 fd: 9 events: 1 revents: 0 mroute_read
1 fd: 12 events: 1 revents: 0 vty_accept
2 fd: 13 events: 1 revents: 0 vtysh_accept
3 fd: 11 events: 1 revents: 0 zclient_read
4 fd: 15 events: 1 revents: 0 mroute_read
5 fd: 16 events: 1 revents: 0 mroute_read
6 fd: 17 events: 1 revents: 0 pim_sock_read
7 fd: 19 events: 1 revents: 0 pim_sock_read
8 fd: 21 events: 1 revents: 0 pim_igmp_read
9 fd: 22 events: 1 revents: 0 pim_sock_read
10 fd: 23 events: 1 revents: 0 pim_sock_read
11 fd: 20 events: 1 revents: 0 vtysh_read
12 fd: 18 events: 1 revents: 0 pim_sock_read
13 fd: 24 events: 0 revents: 0
strace was showing this line over and over and over:
poll([{fd=9, events=POLLIN}, {fd=12, events=POLLIN}, {fd=13, events=POLLIN}, {fd=11, events=POLLIN}, {fd=15, events=POLLIN}, {fd=16, events=POLLIN}, {fd=17, events=POLLIN}, {fd=19, events=POLLIN}, {fd=21, events=POLLIN}, {fd=22, events=POLLIN}, {fd=23, events=POLLIN}, {fd=20, events=POLLIN}, {fd=18, events=POLLIN}, {fd=6, events=POLLIN}], 14, 20) = 1 ([{fd=21, revents=POLLERR}])
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Somewhere along the way the indentation for comments got
all messed up. Let's make it follow our standards and
also look right too.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This api was earlier present in the daemon code but as multiple daemons
need it moving it to lib will avoid unnecessary copy-paste.
Signed-off-by: Anuradha Karuppiah <anuradhak@cumulusnetworks.com>
Remove mid-string line breaks, cf. workflow doc:
.. [#tool_style_conflicts] For example, lines over 80 characters are allowed
for text strings to make it possible to search the code for them: please
see `Linux kernel style (breaking long lines and strings)
<https://www.kernel.org/doc/html/v4.10/process/coding-style.html#breaking-long-lines-and-strings>`_
and `Issue #1794 <https://github.com/FRRouting/frr/issues/1794>`_.
Scripted commit, idempotent to running:
```
python3 tools/stringmangle.py --unwrap `git ls-files | egrep '\.[ch]$'`
```
Signed-off-by: David Lamparter <equinox@diac24.net>
Fix a number of library and daemon issues so that daemons can
call frr_fini() during normal termination. Without this,
temporary logging files are left behind in /var/tmp/frr/.
Signed-off-by: Mark Stapp <mjs@voltanet.io>
This is a full rewrite of the "back end" logging code. It now uses a
lock-free list to iterate over logging targets, and the targets
themselves are as lock-free as possible. (syslog() may have a hidden
internal mutex in the C library; the file/fd targets use a single
write() call which should ensure atomicity kernel-side.)
Note that some functionality is lost in this patch:
- Solaris printstack() backtraces are ditched (unlikely to come back)
- the `log-filter` machinery is gone (re-added in followup commit)
- `terminal monitor` is temporarily stubbed out. The old code had a
race condition with VTYs going away. It'll likely come back rewritten
and with vtysh support.
- The `zebra_ext_log` hook is gone. Instead, it's now much easier to
add a "proper" logging target.
v2: TLS buffer to get some actual performance
Signed-off-by: David Lamparter <equinox@diac24.net>
getrusage, in a heavily stressed system, can account for
signficant running time due to process switching to the kernel.
Allow the end-operator to specify `--disable-cpu-time` to
avoid this call. Additionally we cause `show thread cpu` to
not show up if this is selected.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When POLLNVAL is received for a FD then that FD is removed from the
pfd array and also array is rearranged using memmove. When memmove
is used then unused index are not cleanedup. When a new FD takes
up that index then it ends up using stale events without any handler
set for the same.
Signed-off-by: Santosh P K <sapk@vmware.com>
frr_with_mutex(...) { ... } locks and automatically unlocks the listed
mutex(es) when the block is exited. This adds a bit of safety against
forgetting the unlock in error paths & co. and makes the code a slight
bit more readable.
Signed-off-by: David Lamparter <equinox@opensourcerouting.org>
Debian packaging when run finds a bunch of spelling errors:
I: frr: spelling-error-in-binary usr/bin/vtysh occurences occurrences
I: frr: spelling-error-in-binary usr/lib/frr/bfdd Amount of times Number of times
I: frr: spelling-error-in-binary usr/lib/frr/bgpd occurences occurrences
I: frr: spelling-error-in-binary usr/lib/frr/bgpd recieved received
I: frr: spelling-error-in-binary usr/lib/frr/isisd betweeen between
I: frr: spelling-error-in-binary usr/lib/frr/ospf6d Infomation Information
I: frr: spelling-error-in-binary usr/lib/frr/ospfd missmatch mismatch
I: frr: spelling-error-in-binary usr/lib/frr/pimd bootsrap bootstrap
I: frr: spelling-error-in-binary usr/lib/frr/pimd Unknwon Unknown
I: frr: spelling-error-in-binary usr/lib/frr/zebra Requsted Requested
I: frr: spelling-error-in-binary usr/lib/frr/zebra uknown unknown
I: frr: spelling-error-in-binary usr/lib/x86_64-linux-gnu/frr/libfrr.so.0.0.0 overriden overridden
This commit fixes all of them except the bgp `recieved` issue due to
it being part of json output. That one will need to go through
a deprecation cycle.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Replaces the use of pqueue_* for the thread_master's timer list with an
instance of DECLARE_HEAP_*.
Signed-off-by: David Lamparter <equinox@diac24.net>
When displaying `show thread poll` data add the
function we are supposed to call when the poll
event happens.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
When adding a read/write poll event and we are using a developmental
build add a bit of code to ensure that we do not already have an read
or write event scheduled.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
If we have a case where have created a fd for i/o and we have
removed the handling thread but still have the fd in the poll
data structure, there existed a case where we would get
the handle this fd return from poll but we would immediately
do nothing with it because we didn't have a thread to hand
the event to.
This leads to an infinite loop. Prevent the infinite loop
from happening and log the problem.
We still need to find the cause of this happening. But
let's prevent the system from melting down in the mean time.
Fixes: #2796
Signed-off-by: David Lamparter <equinox@diac24.net>
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Various compilers in our CI system were complaining about various
auto-conversions. Let's get these cleaned up a bit more.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Upon startup FRR reads in the MAX_FDS variable from
it's control files via the getrlimit call. We then
setup code to limit the poll data structure size to
that value. The OS also limits our FD's to that value
because that is what is set. Provide a methodology
that a interested end user can figure this data out.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The 'show thread cpu' command referenced a 'b' option. Which
is not parsed at all in the parse_filter function. As such
I do not know what this was referencing as that it has been
removed. Update the help strings to reflect this reality.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
This is necessary to avoid a name collision with std::for_each
from C++.
Fixes the compilation of the gRPC northbound module.
Signed-off-by: Renato Westphal <renato@opensourcerouting.org>
It doesn't make much sense for a hash function to modify its argument,
so const the hash input.
BGP does it in a couple places, those cast away the const. Not great but
not any worse than it was.
Signed-off-by: Quentin Young <qlyoung@cumulusnetworks.com>
Replaces the use of pqueue_* for the thread_master's timer list with an
instance of DECLARE_SKIPLIST_*.
Signed-off-by: David Lamparter <equinox@diac24.net>
Replaces the open-coded thread_list with a DECLARE_LIST instantiation.
Some function prototypes are actually identical to what was previously
open-coded.
Signed-off-by: David Lamparter <equinox@diac24.net>
C++ doesn't have ISO C11 stdatomic.h or "_Atomic inttype", so use
std::atomic instead to get the headers compatible.
Signed-off-by: David Lamparter <equinox@diac24.net>
When using getrusage, we have multiple choices about what
to call for data gathering about this particular thread of execution.
RUSAGE_SELF -> This means gather all cpu run time for all pthreads associated
with this process.
RUSAGE_THREAD -> This means gather all cpu run time for this particular
pthread.
Clearly with data gathering for slow thread as well as `show thread cpu`
it would be preferable to gather only data about the current running
pthread. This probably was the original behavior of using RUSAGE_SELF
when we didn't have multiple pthreads. So it didn't matter so much.
Prior to this change, 10 iterations of 1 million routes install/remove
from zebra would give us this cpu time for the dataplane pthread:
Showing statistics for pthread Zebra dplane thread
--------------------------------------------------
CPU (user+system): Real (wall-clock):
Active Runtime(ms) Invoked Avg uSec Max uSecs Avg uSec Max uSecs Type Thread
0 280902.149 326541 860 2609982 550 2468910 E dplane_thread_loop
After this change we are seeing this:
Showing statistics for pthread Zebra dplane thread
--------------------------------------------------
CPU (user+system): Real (wall-clock):
Active Runtime(ms) Invoked Avg uSec Max uSecs Avg uSec Max uSecs Type Thread
0 58045.560 334944 173 277226 539 2502268 E dplane_thread_loop
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
The ->hash_cmp and linked list ->cmp functions were sometimes
being used interchangeably and this really is not a good
thing. So let's modify the hash_cmp function pointer to return
a boolean and convert everything to use the new syntax.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>
Don't allocate threads in the stack, but use the standardized
`thread_get` and `thread_add_unused` to avoid creating corner cases in
the thread API.
This fixes a thread mutex memory leak in FreeBSD.
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
Two important changes:
* Centralize the thread teardown procedure;
* Save and restore thread mutex context to avoid losing the memory
pointer;
Signed-off-by: Rafael Zalamena <rzalamena@opensourcerouting.org>
The master->unused list was unbounded during normal operation.
A full BGP feed on my machine left 11k threads on the unused
list, taking up over 2mb of data. This seemed a bit excessive,
reduce to a limit of 10.
Also fix a crash that this exposed where we assumed that a thread
structure was not deleted.
Future committers can make this configurable? or modify
the value to something better for their system. I am
dubious of the value of this.
Signed-off-by: Donald Sharp <sharpd@cumulusnetworks.com>