* log: lower IPC connection issues to info level
... in handle_new_connection(). The caller has better context for deciding
whether a problem merits a warning or an error, and the function's return
code is descriptive enough for it to do so. Some problems may be expected,
or can be worked around.
For example, Pacemaker's crm_mon attempts to contact pacemakerd via IPC. On a
Pacemaker Remote node, that IPC will be unavailable, and crm_mon can check the
libqb return code to detect and handle that situation gracefully.
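As a rough sketch of the caller-side handling this enables (not crm_mon's
actual code; the service name, message size, and chosen log levels are just
examples, and it assumes qb_ipcc_connect() leaves the failure reason in
errno):

    #include <errno.h>
    #include <string.h>
    #include <qb/qbipcc.h>
    #include <qb/qblog.h>

    /* The client, not libqb, decides how loudly to report a failed
     * connection attempt. */
    static qb_ipcc_connection_t *
    connect_to_pacemakerd(void)
    {
        qb_ipcc_connection_t *c = qb_ipcc_connect("pacemakerd", 128 * 1024);

        if (c == NULL) {
            if (errno == ENOENT || errno == ECONNREFUSED) {
                /* e.g. a Pacemaker Remote node: expected, handle it quietly */
                qb_log(LOG_INFO, "pacemakerd IPC not available: %s",
                       strerror(errno));
            } else {
                qb_log(LOG_ERR, "could not connect to pacemakerd IPC: %s",
                       strerror(errno));
            }
        }
        return c;
    }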
* log: lower some ringbuffer debug messages to trace level
They're rather noisy, with every shm-based IPC connection generating multiple
obscure messages like:
debug: shm size:1048589; real_size:1052672; rb->word_size:263168
and every disconnect generating the rather unhelpful:
debug: qb_ipcc_disconnect()
along with multiple messages like:
debug: Closing ringbuffer: /dev/shm/qb-10986-11014-34-26VRvs/qb-request-cmap-header
All of these seem appropriate to trace level.
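If those details are still wanted, they can be re-enabled per target; a
minimal sketch using the standard qb_log filter API (the application name
and the stderr target are just examples):

    #include <syslog.h>
    #include <qb/qbdefs.h>
    #include <qb/qblog.h>

    int
    main(void)
    {
        /* Opt back in to trace-level output on stderr for all source files. */
        qb_log_init("trace-demo", LOG_USER, LOG_TRACE);
        qb_log_filter_ctl(QB_LOG_STDERR, QB_LOG_FILTER_ADD,
                          QB_LOG_FILTER_FILE, "*", LOG_TRACE);
        qb_log_ctl(QB_LOG_STDERR, QB_LOG_CONF_ENABLED, QB_TRUE);

        qb_log(LOG_TRACE, "trace messages are now visible");
        qb_log_fini();
        return 0;
    }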
Why just this one?
There are LOADS of asserts in libqb; some are OK and some may be
overkill. This one in particular is causing CI failures
and so annoys me more than the rest.
qb_vsnprintf_serialize() was called with 'max_size' as the limit on the
length of the formatted log message. But the buffer also needs to hold
the log header (whose size is 'actual_size'), so we now pass
't->max_line_length' as the maximum length of the formatted message,
restricting it to the bytes actually left.
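The sizing rule, shown as a standalone analogue rather than the actual
libqb code (buffer and parameter names are illustrative):

    #include <stdarg.h>
    #include <stdio.h>

    /* Format a header and then a message into one buffer, limiting the
     * message to the bytes actually left after the header rather than to
     * the whole buffer size. */
    static void
    format_line(char *buf, size_t buf_size, const char *prefix,
                const char *fmt, ...)
    {
        va_list ap;
        int hdr_len = snprintf(buf, buf_size, "[%s] ", prefix);

        if (hdr_len < 0 || (size_t)hdr_len >= buf_size) {
            return;         /* the header alone did not fit */
        }
        va_start(ap, fmt);
        vsnprintf(buf + hdr_len, buf_size - (size_t)hdr_len, fmt, ap);
        va_end(ap);
    }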
Also added error checks to the blackbox calls at the end of the test,
as these now properly verify that the blackbox is functioning. Before,
they were masking failures.
If the message was too long, then msg_len was added to the buffer size
twice, causing potential data corruption (seen VERY rarely in the CI
test - or, at least, I think it was this).
Also fix a double close() spotted by GCC 13's -fanalyzer.
Otherwise GCC complains that the ‘__builtin_strncpy’ specified bound
depends on the length of the source argument.
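The usual fix for that warning, in sketch form (the buffer size is only an
example):

    #include <string.h>

    #define NAME_LEN 32     /* illustrative destination size */

    static void
    copy_name(char dst[NAME_LEN], const char *src)
    {
        /* Bound the copy by the destination, not by strlen(src), and
         * terminate explicitly; this silences the warning and avoids an
         * unterminated string. */
        strncpy(dst, src, NAME_LEN - 1);
        dst[NAME_LEN - 1] = '\0';
    }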
Signed-off-by: Ferenc Wágner <wferi@debian.org>
A timer in QB_POLL_ENTRY_JOBLIST doesn't necessarily have a
t->timerlist_handle, so that dereference can segfault. Also, the comment
assumes the timers are threaded, which, as we have decided, is
definitely not true, so it's safe to move the check earlier.
In the tests, I've adjusted the timeouts so that they definitely fire at
different times. On some architectures they can fire concurrently and in
the wrong order.
Covscan complained (quite reasonably) that we don't check the blackbox
header when reading it in.
Note that we still get a covscan error for ->shared_data, but that's
really impossible to verify in the read routine, so I'll leave the
covscan waiver to handle that.
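The shape of such a check, with a purely hypothetical header layout (the
real blackbox header differs):

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical on-disk header, used only to show validating what was
     * read before trusting any of its fields. */
    struct bb_header {
        uint32_t magic;
        uint32_t total_size;
    };

    static int
    read_bb_header(FILE *f, struct bb_header *hdr)
    {
        if (fread(hdr, sizeof(*hdr), 1, f) != 1) {
            return -1;                      /* short read: reject the file */
        }
        if (hdr->magic != 0x42424242u) {    /* placeholder magic value */
            return -1;                      /* not a blackbox file */
        }
        return 0;
    }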
* unix: Don't fail on FreeBSD running ZFS
ZFS doesn't support posix_fallocate(), so libqb IPC or ringbuffer setup
would always fail with EINVAL.
As there seems to be no prospect of a more useful return code,
trap it in a QB_BSD #ifdef. That way, if we do have actual errors
in the posix_fallocate() call, the Linux tests should still find them.
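Roughly the shape of the workaround (a sketch, not the actual unix.c
change; the ftruncate() fallback here is just an example of handling the
unsupported case):

    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>

    static int
    alloc_file_space(int fd, off_t size)
    {
        int rc = posix_fallocate(fd, 0, size);     /* returns an errno value */

    #ifdef QB_BSD
        /* ZFS on FreeBSD doesn't support posix_fallocate() and reports
         * EINVAL; treat that as "unsupported" rather than a hard failure. */
        if (rc == EINVAL) {
            rc = (ftruncate(fd, size) == 0) ? 0 : errno;
        }
    #endif
        return rc;
    }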
Also, stick a small sleep in the test_ipc_disconnect_after_created
test to allow the server to shut down before killing it with SIGTERM
and causing a test failure. All the other uses of it seem to have this
sleep!
Previously, when clock_gettime() was available, the time functions would
use it (regardless of success or failure); otherwise they would use
gettimeofday() if available.
Now, the functions first try clock_gettime() if it is available; if it
is unavailable or fails, they try gettimeofday(); and if that is also
unavailable or fails, they fall back to time().
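The resulting fallback chain, sketched (not the actual libqb
implementation; the HAVE_* macros come from configure):

    #include <time.h>
    #include <sys/time.h>

    static void
    get_realtime(struct timespec *ts)
    {
    #ifdef HAVE_CLOCK_GETTIME
        if (clock_gettime(CLOCK_REALTIME, ts) == 0) {
            return;                          /* first choice */
        }
    #endif
    #ifdef HAVE_GETTIMEOFDAY
        {
            struct timeval tv;

            if (gettimeofday(&tv, NULL) == 0) {
                ts->tv_sec = tv.tv_sec;      /* second choice */
                ts->tv_nsec = (long)tv.tv_usec * 1000L;
                return;
            }
        }
    #endif
        ts->tv_sec = time(NULL);             /* last resort */
        ts->tv_nsec = 0;
    }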
configure.ac already defined HAVE_GETTIMEOFDAY, but the uses of gettimeofday()
weren't guarded by it. It obviously doesn't matter on any currently supported
platforms, but it will be needed for planned changes.
The time-related functions have two implementations, one if clock_gettime() is
available and the other if not.
Previously, there was one big ifdef-else with the clock_gettime()
implementation of each function followed by the other implementation of each
function.
With this commit, each function is defined once, with an ifdef-else inside it
with the two implementations of that function. For ease of review, no other
code changes are made, but the intent will become obvious with later changes.
qb_log calls malloc() and probably many other non-signal-safe
functions, so don't call it in the signal handler.
Thanks to Honza for spotting this.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
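The usual signal-safe alternative, as a sketch (the flag name and the
logged message are illustrative): set a flag in the handler and do the
logging from the main loop.

    #include <signal.h>
    #include <syslog.h>
    #include <qb/qblog.h>

    static volatile sig_atomic_t got_sigterm;

    static void
    sigterm_handler(int sig)
    {
        (void)sig;
        got_sigterm = 1;        /* async-signal-safe: only set a flag */
    }

    /* Called from the main loop, where malloc() and friends are safe again. */
    static void
    check_signals(void)
    {
        if (got_sigterm) {
            got_sigterm = 0;
            qb_log(LOG_NOTICE, "received SIGTERM, shutting down");
        }
    }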
* tlist: Add heap-based implementation of timer list
The previous timer list was a sorted-list implementation of a priority
queue and very slow when the number of timers increased. This is mostly
not a problem because usually only a few timers are used, but for
applications where a bigger number of timers is needed it may become one.
The solution is to use a binary-heap-based priority queue, which is much
faster.
The API is unchanged; timerlist_destroy is just added and should be called
to free the heap array. This function also destroys the mutex (something
that was omitted when the mutex was added).
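Why the heap helps, in a minimal sketch (not the tlist code itself):
inserting into a sorted list costs O(n) per timer, while sifting a new
entry up an array-backed binary heap costs O(log n).

    #include <stddef.h>
    #include <stdint.h>

    /* Insert an expiry time into a min-heap stored in an array. */
    static void
    heap_insert(uint64_t *heap, size_t *n, uint64_t expire)
    {
        size_t i = (*n)++;

        heap[i] = expire;
        while (i > 0) {
            size_t parent = (i - 1) / 2;
            uint64_t tmp;

            if (heap[parent] <= heap[i]) {
                break;                  /* heap property restored */
            }
            tmp = heap[parent];         /* swap with parent, continue up */
            heap[parent] = heap[i];
            heap[i] = tmp;
            i = parent;
        }
    }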
* tests: Fix check loop mt test
test_th was accessed by both the main thread and the loop_timer thread,
resulting in failure. The fix is to access test_tht in the loop_timer
thread.
The speed test adds only 10000 items, so it is reasonably fast even with
the sorted-linked-list implementation.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
The message-id parameter enables systemd message catalogs.
To enable message IDs, libqb should be configured with the
--enable-systemd-journal option.
Co-authored-by: root <Aleksei Burlakov>
This is an attempt to make sure that /dev/shm is cleaned up when a
server exits unexpectedly. Normally it's the server's responsibility
to tidy up sockets, but if it crashes or is killed with SIGKILL then
the client (us) makes a reasonable attempt to tidy up the server sockets
we have connected to. The extra delay here just gives the server a chance
to disappear fully. As a client we can get here pretty quickly, but
shutting down a large server may take a little longer, even when
SIGKILLed.
The 1/100th of a second is an arbitrary delay (of course) but seems to
catch most servers in 2 tries or less.
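In outline (a sketch of the idea, not the actual cleanup code; the retry
count and helper name are made up):

    #include <unistd.h>

    /* Give the server a short window to remove its own files before the
     * client removes them on its behalf. */
    static void
    cleanup_server_shm_file(const char *path)
    {
        int i;

        for (i = 0; i < 10; i++) {
            if (access(path, F_OK) != 0) {
                return;                 /* server already tidied up */
            }
            usleep(10000);              /* 1/100th of a second */
        }
        unlink(path);                   /* server is gone (or stuck): remove it */
    }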
* ipc: add qb_ipcc_auth_get() API call
We can't use SO_PEERCRED on the client fd when using socket IPC
because it's a DGRAM socket (pacemaker tries this). So provide
an API to get the server credentials that libqb has already
squirreled away for its own purposes.
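Typical use of the new call, assuming the signature added here,
qb_ipcc_auth_get(conn, &pid, &uid, &gid) (error handling trimmed; the
"is the server root?" question is just an example):

    #include <sys/types.h>
    #include <qb/qbipcc.h>

    /* Ask libqb for the credentials it recorded for the server side of an
     * established connection. */
    static int
    server_is_root(qb_ipcc_connection_t *c)
    {
        pid_t pid;
        uid_t uid;
        gid_t gid;

        if (qb_ipcc_auth_get(c, &pid, &uid, &gid) != 0) {
            return 0;
        }
        return (uid == 0);
    }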
Also, fix some unused-variable compiler warnings in unix.c
when building on systems without posix_fallocate().
Using posix_fallocate() guarantees that, if it succeeds, attempts to
write to the allocated range will not fail for lack of storage space.
This prevents SIGBUS when writing to an mmapped file with no space left
on the device.
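The idea in miniature (a sketch, not the libqb code):

    #include <stddef.h>
    #include <fcntl.h>
    #include <sys/mman.h>

    /* Reserve the blocks up front so that later stores into the mapping
     * cannot run out of space and raise SIGBUS. */
    static void *
    map_file(int fd, size_t size)
    {
        if (posix_fallocate(fd, 0, (off_t)size) != 0) {
            return MAP_FAILED;      /* no space (or unsupported): fail early */
        }
        return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }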
Reported-by: Mikhail Kulagin <m.kulagin at postgrespro dot ru>
Co-authored-by: Ivan Zakharyaschev <imz@altlinux.org>
It's possible that cs->filename or cs->format could be read
in the 'fast' path while the 'slow' path is still constructing
the object. So we need to lock arr_next_lock before copying them
out for the caller.
Also, wthread_should_exit was unprotected.
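Reduced to a sketch with illustrative names (not the actual libqb
threaded-logging code):

    #include <pthread.h>

    static pthread_mutex_t arr_next_lock = PTHREAD_MUTEX_INITIALIZER;

    struct callsite {
        const char *filename;
        const char *format;
    };

    /* Fast path: copy the fields out under the same lock the slow path
     * holds while it fills them in, so a half-constructed entry is never
     * observed. */
    static void
    copy_callsite(const struct callsite *cs, struct callsite *out)
    {
        pthread_mutex_lock(&arr_next_lock);
        out->filename = cs->filename;
        out->format = cs->format;
        pthread_mutex_unlock(&arr_next_lock);
    }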
It's hard to predict the length of formatted output, so we'd better
notice (and abort) if the description is truncated. Incidentally,
mkdtemp() does this for us in the shared memory branch, but do an
explicit check there as well for consistency, and get rid of the wrongly
parametrized strncat(), which risked a buffer overflow (CONNECTION_DESCRIPTION
is not the length of the source "/qb").
Similar truncation checks should be added to qb_ipcs_{shm,us}_connect()
where they build the request/response names, and possibly to other
places using snprintf().
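The check being added, in outline (buffer and field names are only
examples):

    #include <errno.h>
    #include <stdio.h>

    /* snprintf() returns the length it wanted to write; a result >= the
     * buffer size means the description would have been truncated. */
    static int
    build_description(char *desc, size_t desc_len, int pid, int fd, int id)
    {
        int needed = snprintf(desc, desc_len, "%d-%d-%d", pid, fd, id);

        if (needed < 0 || (size_t)needed >= desc_len) {
            errno = ENAMETOOLONG;
            return -1;              /* refuse to continue with a truncated name */
        }
        return 0;
    }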
When qb_ipcs_connection_auth_set() has been used, the ownership of the
temp directory initially set by handle_new_connection() must be updated
as well.
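For context, the call whose effect now also covers the temp directory; a
usage sketch inside a connection-accept callback (the uid/gid/mode values
are just examples):

    #include <stdint.h>
    #include <sys/types.h>
    #include <qb/qbipcs.h>

    static int32_t
    my_connection_accept(qb_ipcs_connection_t *c, uid_t uid, gid_t gid)
    {
        (void)uid;
        (void)gid;
        /* Re-own this connection's IPC resources - including, with this
         * change, its temp directory - for a dedicated service account. */
        qb_ipcs_connection_auth_set(c, 1000, 1000, 0600);
        return 0;                   /* accept the connection */
    }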
The main and most ABI-affecting change for the envisioned 2.0 branch is
the use of callsite info allocated at linker/build time, avoiding the
uneconomical run-time evaluation and, under some circumstances,
dangerous heap allocations.
Considering that the v1.9.0 release (libqb.so.20) was expressly marked as
a tech preview[1,2] (hence something that shall not make it into production
use), there should be no harm in the master branch (which is headed towards
2.0 and beyond) receiving a noticeable SONAME bump (libqb.so.100), so as to
- leave enough space for a possible v1-compatible branch evolution
(for use cases where recompile-everything is a no-go).
in particular, with that branch resuming at libqb.so.30, there would
be room for 99 - 33 = 66 add-new-drop-nothing compatible
changes for that branch (which is more than plentiful)
- indicate more clearly to client space that some big change is going on
This is supposed to be a reasonable trade-off solution that still leaves
enough wiggle room and represents a responsible approach to development
(as the original attempt to prevent an ABI break in the first place was),
allowing for more than an enforced unanimity (which is rather antagonistic
in the free-software realm).
[1] https://lists.clusterlabs.org/pipermail/users/2019-December/026690.html
[2] https://github.com/ClusterLabs/libqb/releases/tag/1.9.0
Signed-off-by: Jan Pokorný <jpokorny@redhat.com>