831 Commits

Author SHA1 Message Date
Chrissie Caulfield
2d03793eb0
unix: Don't fail on FreeBSD running ZFS (#461)
* unix: Don't fail on FreeBSD running ZFS

ZFS doesn't support posix_fallocate() so libqb IPC or RB would
always fail with EINVAL.

As there seems to be no prospect of a more useful return code,
trap it in a QB_BSD #ifdef. That way if we do have actual errors
in the posix_fallocate() call the Linux tests should still find them.

Also, stick a small sleep in the test_ipc_disconnect_after_created
test to allow the server to shut down before killing it with SIGTERM
and causing a test failure. All the other uses of it seem to have this
sleep!
2022-03-17 07:47:39 +00:00
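The EINVAL handling described in 2d03793eb0 could look roughly like the sketch below. It is an illustration only: `example_reserve` is a hypothetical name, and treating EINVAL as "carry on without a reservation" is an assumption about what "trap it" means, not a copy of the libqb code. QB_BSD is the macro named in the commit message.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical helper: reserve len bytes for fd.
 * Returns 0 on success or a negative errno-style value. */
static int example_reserve(int fd, off_t len)
{
    /* posix_fallocate() returns the error number; it does not set errno. */
    int res = posix_fallocate(fd, 0, len);

#ifdef QB_BSD
    /* Assumption for this sketch: on BSD, EINVAL means the filesystem
     * (for example ZFS) cannot preallocate, so continue without it. */
    if (res == EINVAL) {
        res = 0;
    }
#endif
    return -res;
}
```

Because the special case sits inside the `#ifdef`, a Linux build still reports every posix_fallocate() failure, which is the property the commit message relies on.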
Christine Caulfield
f5106342d0 ipcc: Fix errno returned from qb_ipcc_connect
The errno value from qb_ipcc_connect was incorrectly negated
when I introduced qb_ipcc_async_connect()
2022-03-03 07:29:07 +00:00
Ken Gaillot
3fb2b59751 util: reimplement time functions as a series of fallbacks
Previously, when clock_gettime() was available, the time functions would use
that (regardless of success or failure), otherwise they would use
gettimeofday() if available.

Now, the functions first try clock_gettime() if available, but if that is
unavailable or fails, they then try gettimeofday() if available, but if that is
not available or fails, they try time().
2022-02-14 12:25:19 +00:00
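The fallback order described in 3fb2b59751 can be sketched as follows. This is a minimal illustration, not the libqb implementation: `example_epoch_nsec` is a hypothetical name, HAVE_GETTIMEOFDAY is the configure macro mentioned in the log, and HAVE_CLOCK_GETTIME is assumed to be its clock_gettime() counterpart.

```c
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

#define EXAMPLE_NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: nanoseconds since the epoch. Each source is tried
 * only if it was detected at configure time, and the next one is used if
 * the call is unavailable or fails. Returns 0 if every source fails. */
static uint64_t example_epoch_nsec(void)
{
    time_t now;

#ifdef HAVE_CLOCK_GETTIME
    {
        struct timespec ts;
        if (clock_gettime(CLOCK_REALTIME, &ts) == 0) {
            return (uint64_t)ts.tv_sec * EXAMPLE_NSEC_PER_SEC
                   + (uint64_t)ts.tv_nsec;
        }
    }
#endif
#ifdef HAVE_GETTIMEOFDAY
    {
        struct timeval tv;
        if (gettimeofday(&tv, NULL) == 0) {
            return (uint64_t)tv.tv_sec * EXAMPLE_NSEC_PER_SEC
                   + (uint64_t)tv.tv_usec * 1000ULL;
        }
    }
#endif
    /* Last resort: one-second resolution. */
    now = time(NULL);
    if (now == (time_t)-1) {
        return 0;
    }
    return (uint64_t)now * EXAMPLE_NSEC_PER_SEC;
}
```

Keeping all three attempts inside one function is what the earlier refactoring commits in this series (4f82b0b6c4 and following) prepare for.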
Ken Gaillot
da12cc9695 util: use HAVE_GETTIMEOFDAY where appropriate
configure.ac already defined HAVE_GETTIMEOFDAY, but the uses of gettimeofday()
weren't guarded by it. It obviously doesn't matter on any currently supported
platforms, but it will be needed for planned changes.
2022-02-14 12:25:19 +00:00
Ken Gaillot
354c0c2531 util: drop HAVE_CLOCK_GETRES_MONOTONIC configure constant
It doesn't provide a significant benefit over just trying the call.
It was added by 6bd3f086 for Hurd support.
2022-02-14 12:25:19 +00:00
Ken Gaillot
1e67908580 util: add constant for which realtime clock to use
... to reduce code duplication and improve readability
2022-02-14 12:25:19 +00:00
Ken Gaillot
4f82b0b6c4 util: refactor so ifdefs are within each time-related function
The time-related functions have two implementations, one if clock_gettime() is
available and the other if not.

Previously, there was one big ifdef-else with the clock_gettime()
implementation of each function followed by the other implementation of each
function.

With this commit, each function is defined once, with an ifdef-else inside it
with the two implementations of that function. For ease of review, no other
code changes are made, but the intent will become obvious with later changes.
2022-02-14 12:25:19 +00:00
Jakub Jankowski
176eae8f13
Retry if posix_fallocate is interrupted with EINTR (#453)
Every now and then Pacemaker reports errors:

  (pcmk__new_client)        debug: New IPC client 3efdbecf-c2d9-44bc-b4a6-9bcd48021ba1 for PID 27492 with uid 0 and gid 0
  (handle_new_connection)   debug: IPC credentials authenticated (/dev/shm/qb-7271-27492-12-hfPbKY/qb)
  (qb_ipcs_shm_connect)     debug: connecting to client [27492]
  (qb_rb_open_2)    debug: shm size:524301; real_size:528384; rb->word_size:132096
  (qb_rb_open_2)    debug: shm size:524301; real_size:528384; rb->word_size:132096
  (qb_sys_mmap_file_open)   error: couldn't allocate file /dev/shm/qb-7271-27492-12-hfPbKY/qb-event-cib_rw-data: Interrupted system call (4)
  (qb_rb_open_2)    error: couldn't create file for mmap
  (qb_ipcs_shm_rb_open)     error: qb_rb_open:/dev/shm/qb-7271-27492-12-hfPbKY/qb-event-cib_rw: Interrupted system call (4)
  (qb_rb_close_helper)      debug: Free'ing ringbuffer: /dev/shm/qb-7271-27492-12-hfPbKY/qb-response-cib_rw-header
  (qb_rb_close_helper)      debug: Free'ing ringbuffer: /dev/shm/qb-7271-27492-12-hfPbKY/qb-request-cib_rw-header
  (qb_ipcs_shm_connect)     error: shm connection FAILED: Interrupted system call (4)
  (handle_new_connection)   error: Error in connection setup (/dev/shm/qb-7271-27492-12-hfPbKY/qb): Interrupted system call (4)

While it probably might be addressed in Pacemaker code, a simple retry
loop in case posix_fallocate(3) returns EINTR seems to be a decent
workaround.

Fixes: #451

Signed-off-by: Jakub Jankowski <shasta@toxcorp.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
2022-01-14 07:57:25 +00:00
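The retry loop proposed in 176eae8f13 can be sketched as below. `example_allocate` is a hypothetical name, and the sketch assumes only the documented behaviour of posix_fallocate(3), which returns the error number directly.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/* Hypothetical helper: allocate len bytes for fd, retrying while the call
 * is interrupted by a signal. Returns 0 or a negative errno-style value. */
static int example_allocate(int fd, off_t len)
{
    int res;

    do {
        /* The result is in the return value, not in errno. */
        res = posix_fallocate(fd, 0, len);
    } while (res == EINTR);

    return -res;
}
```

Any error other than EINTR still ends the loop and is passed back to the caller, so real allocation failures are not hidden by the retry.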
Chrissie Caulfield
de5ab3029c
ipcc: Add an async connect API (#450) 2022-01-05 10:53:09 +00:00
Christine Caulfield
a2691b9618 Bump library version for v2.0.4 2021-11-12 13:18:47 +00:00
Chrissie Caulfield
d4b49fb5e9
poll: Don't log in a signal handler (#447)
qb_log calls malloc() and probably many other non-signal-safe
functions, so don't call it in the signal handler.

Thanks to Honza for spotting this

Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2021-11-10 12:35:07 +00:00
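A common way to follow the rule in d4b49fb5e9 is to have the handler record the signal in a flag and do the logging later from normal context. The sketch below shows that general pattern with hypothetical names; it is not the code from the libqb poll implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Written by the handler, read by the main loop. */
static volatile sig_atomic_t example_signal_pending = 0;

/* The handler only sets a flag: no logging, no malloc(). */
static void example_handler(int signum)
{
    (void)signum;
    example_signal_pending = 1;
}

/* Install the handler for signum. Returns 0 on success, -1 on error. */
static int example_install(int signum)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = example_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(signum, &sa, NULL);
}

/* Called from the main loop, where logging is safe.
 * Returns 1 if a signal arrived since the last call, else 0. */
static int example_consume_signal(void)
{
    if (example_signal_pending) {
        example_signal_pending = 0;
        return 1;
    }
    return 0;
}
```

The main loop would call `example_consume_signal()` on each iteration and emit its log message there.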
Chrissie Caulfield
a60ca50b67
Fix pthread returns (#444)
pthread calls do not set errno, they return the error directly
2021-08-11 07:55:42 +01:00
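The point of a60ca50b67 is that pthread functions report failure through their return value. A minimal sketch with hypothetical wrapper names, converting that value to the negative-errno convention used elsewhere in this log:

```c
#include <pthread.h>

/* Hypothetical wrappers: pthread_mutex_lock() and pthread_mutex_unlock()
 * return 0 or a positive error number and leave errno alone, so the
 * return value is negated directly instead of reading errno. */
static int example_lock(pthread_mutex_t *m)
{
    int res = pthread_mutex_lock(m);

    return -res; /* not -errno */
}

static int example_unlock(pthread_mutex_t *m)
{
    int res = pthread_mutex_unlock(m);

    return -res;
}
```

Code that checks `errno` after a failed pthread call may see a stale value left by an unrelated earlier call.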
Jan Friesse
48fff5eb58
Implement heap based timer list (#439)
* tlist: Add heap based implementation of timer list

The previous timer list was a sorted-list implementation of a priority
queue and became very slow as the number of timers increased. This is
mostly not a problem because usually only a few timers are used.
But for applications where a larger number of timers is needed
it may become a problem.

The solution is to use a binary-heap-based priority queue, which is much
faster.

The API is unchanged, except that timerlist_destroy is added, which should
be called to free the heap array. This function also destroys the mutex
(omitted when the mutex was added).

* tests: Fix check loop mt test

test_th was accessed both by main thread and loop_timer thread resulting in
failure. Fix is to access test_tht in loop_timer thread.

Speed test is adding only 10000 items so it is reasonably
fast even with sorted linked list implementation.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2021-03-18 07:27:25 +00:00
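To illustrate why a binary heap helps here, the sketch below keeps timer expiry times in an array-backed min-heap: insertion and removal of the earliest timer each cost O(log n), whereas inserting into a sorted list is O(n). All names are hypothetical and the structure is far simpler than the libqb timer list (no callbacks, no locking).

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

struct example_heap {
    uint64_t *expiry; /* heap array, earliest expiry at index 0 */
    size_t size;
    size_t capacity;
};

static void example_heap_init(struct example_heap *h)
{
    h->expiry = NULL;
    h->size = 0;
    h->capacity = 0;
}

/* Counterpart of the destroy call mentioned above: frees the heap array. */
static void example_heap_destroy(struct example_heap *h)
{
    free(h->expiry);
    example_heap_init(h);
}

static int example_heap_push(struct example_heap *h, uint64_t when)
{
    size_t i;

    if (h->size == h->capacity) {
        size_t new_cap = h->capacity ? h->capacity * 2 : 8;
        uint64_t *p = realloc(h->expiry, new_cap * sizeof(*p));
        if (p == NULL) {
            return -ENOMEM;
        }
        h->expiry = p;
        h->capacity = new_cap;
    }
    /* Sift up: move larger parents down until 'when' fits. */
    i = h->size++;
    while (i > 0) {
        size_t parent = (i - 1) / 2;
        if (h->expiry[parent] <= when) {
            break;
        }
        h->expiry[i] = h->expiry[parent];
        i = parent;
    }
    h->expiry[i] = when;
    return 0;
}

/* Remove the earliest expiry time into *when. Returns -ENOENT if empty. */
static int example_heap_pop(struct example_heap *h, uint64_t *when)
{
    size_t i = 0;
    uint64_t last;

    if (h->size == 0) {
        return -ENOENT;
    }
    *when = h->expiry[0];
    last = h->expiry[--h->size];
    /* Sift down: move the smaller child up until 'last' fits. */
    for (;;) {
        size_t child = 2 * i + 1;
        if (child >= h->size) {
            break;
        }
        if (child + 1 < h->size && h->expiry[child + 1] < h->expiry[child]) {
            child++;
        }
        if (last <= h->expiry[child]) {
            break;
        }
        h->expiry[i] = h->expiry[child];
        i = child;
    }
    h->expiry[i] = last;
    return 0;
}
```

Pushing 50, 10, 40, 20, 30 and then popping repeatedly yields the values in ascending order, which is the order in which timers would fire.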
Christine Caulfield
404adbcd99 release: bump library version for 2.0.3 release 2021-03-03 08:32:09 +00:00
Aleksei Burlakov
aae7a0aa5d
syslog: Add a message-id parameter for messages (#433)
The message-id parameter will enable systemd catalogs.
To enable message IDs, libqb should be configured with the
 --enable-systemd-journal option.

Co-authored-by: root <Aleksei Burlakov>
2021-03-01 15:58:50 +00:00
Chrissie Caulfield
d6e2bd1d6b
timers: Add some locking (#436)
Fix several locking issues reported by helgrind
2021-02-08 10:57:42 +00:00
Chrissie Caulfield
991872eded
ipcc: Have a few goes at tidying up after a dead server (#434)
This is an attempt to make sure that /dev/shm is cleaned up when a
server exits unexpectedly. Normally it's the server's responsibility
to tidy up sockets, but if it crashes or is killed with SIGKILL then
the client (us) makes a reasonable attempt to tidy up the server sockets
we have connected. The extra delay here just gives the server a chance to
disappear fully. As a client we can get here pretty quickly but shutting
down a large server may take a little longer even when SIGKILLed.
The 1/100th of a second is an arbitrary delay (of course) but seems to
catch most servers in 2 tries or less.
2021-01-25 12:19:10 +00:00
Chrissie Caulfield
5097155bdf
strlcpy: Check for maxlen underflow (#432)
* strlcpy: Check for maxlen underflow

https://github.com/ClusterLabs/libqb/issues/429

* Always terminate the string if maxlen is > 0
2021-01-13 14:12:02 +00:00
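The two points in 5097155bdf (guard against `maxlen - 1` underflowing when maxlen is 0, and always terminate when maxlen is greater than 0) can be sketched as below. `example_strlcpy` is a hypothetical stand-in that follows the BSD convention of returning the source length; it is not the libqb function itself.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical strlcpy-style copy. Writes at most maxlen bytes to dest,
 * including the terminating NUL, and returns strlen(src). */
static size_t example_strlcpy(char *dest, const char *src, size_t maxlen)
{
    size_t srclen = strlen(src);

    /* With maxlen == 0, maxlen - 1 would wrap around to SIZE_MAX,
     * so nothing is written in that case. */
    if (maxlen > 0) {
        size_t n = (srclen < maxlen - 1) ? srclen : maxlen - 1;

        memcpy(dest, src, n);
        dest[n] = '\0'; /* always terminated when maxlen > 0 */
    }
    return srclen;
}
```

A return value greater than or equal to maxlen tells the caller that the copy was truncated.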
Christine Caulfield
def947efcf lib: Update library version for 2.0.2 release
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
2020-12-03 13:05:43 +00:00
Hideo Yamauchi
51f181b3cd
ipcs : Decrease log level. (#426)
of ipcs event notification 'errors' that can occur in normal use.
2020-12-03 11:33:30 +00:00
Chrissie Caulfield
06ac2d43a9
cov: Quieten some covscan warnings (#427) 2020-12-03 09:39:28 +00:00
Chrissie Caulfield
51f64ed233
ipcs: ftruncate is not supported on WIN32 (#424)
https://github.com/microsoft/WSL/issues/902
2020-10-05 08:08:48 +01:00
Chrissie Caulfield
a52d912675
ipcs: Add missing qb_list_del when freeing server (#423)
* ipcs: Remove list not used

Thanks to minhbq for pointing this out
2020-10-05 08:08:09 +01:00
Chrissie Caulfield
680db526f6
ipc: add qb_ipcc_auth_get() API call (#418)
* ipc: add qb_ipcc_auth_get() API call

We can't use SO_PEERCRED on the client fd when using socket IPC
because it's a DGRAM socket (pacemaker tries this). So provide
an API to get the server credentials that libqb has already
squirreled away for its own purposes.

Also, fix some unused-variable compiler warnings in unix.c
when building on systems without posix_fallocate().
2020-09-28 09:53:21 +01:00
Christine Caulfield
416caf2b92 Bump version for 2.0.1 2020-07-29 08:28:23 +01:00
wladmis
1c6229c171
unix.c: use posix_fallocate() (#409)
Using posix_fallocate() guarantees that, if it succeeds, a subsequent
write to the allocated range will not fail because of a
lack of storage space. This prevents SIGBUS when writing to an
mmapped file with no space left.

Co-Authored-by: Ivan Zakharyaschev <imz@altlinux.org>
Reported-by: Mikhail Kulagin <m.kulagin at postgrespro dot ru>
2020-07-29 07:37:09 +01:00
Chrissie Caulfield
b2acdea2a8
array: More locking fixes (#400)
* array: More locking fixes

helgrind threw out a couple more locking errors in the logging/array
code and we also need to protect a->max_elements
2020-06-08 14:16:58 +01:00
Chrissie Caulfield
49641930d8
log: Fix threading races (#396)
It's possible that cs->filename or cs->format could be read
in the 'fast' path while the 'slow' path is still constructing
the object. So we need to lock arr_next_lock before copying them
out for the caller.

Also wthread_should_exit was unprotected.
2020-06-01 15:41:26 +01:00
Chrissie Caulfield
bdc716036a
Some bugs spotted by coverity (#399) 2020-05-28 07:30:26 +01:00
Jan Pokorný
803d9242ff log: journal: fix forgotten syslog reload when flipped from journal
Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
2020-05-04 08:32:47 +01:00
Ferenc Wágner
2baa2791ce Let remote_tempdir() assume a NUL-terminated name
This is the case already.  We also fix a buffer overflow opportunity in
the memcpy() call by this change.

Conflicts:
	lib/ipc_shm.c
2020-05-01 12:57:51 +01:00
Ferenc Wágner
e26ad0dae1 Make it impossible to truncate or overflow the connection description
It's hard to predict the length of formatted output, so we'd better
notice (and abort) if the description is truncated.  Incidentally,
mkdtemp() does this for us in the shared memory branch, but do an
explicit check there as well for consistency, and get rid of the wrongly
parametrized strncat() risking a buffer overflow (CONNECTION_DESCRIPTION
is not the length of the source "/qb").

Similar truncation checks should be added to qb_ipcs_{shm,us}_connect()
where they build the request/response names, and possibly to other
places using snprintf().
2020-05-01 12:54:30 +01:00
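The truncation check that e26ad0dae1 argues for relies on snprintf() returning the length the output would have had. A minimal sketch, with a hypothetical function name and ENAMETOOLONG chosen here as an assumed error code:

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: format "dir/name" into buf and refuse to continue
 * if the result did not fit. Returns 0 or a negative errno-style value. */
static int example_build_path(char *buf, size_t buflen,
                              const char *dir, const char *name)
{
    int n = snprintf(buf, buflen, "%s/%s", dir, name);

    if (n < 0) {
        return -EINVAL; /* output error */
    }
    if ((size_t)n >= buflen) {
        return -ENAMETOOLONG; /* output was truncated */
    }
    return 0;
}
```

The same check applies anywhere a fixed-size name is built with snprintf(), which is the follow-up the commit message suggests for qb_ipcs_{shm,us}_connect().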
Chris Murphy
08806c5301
master: Issue 390: Clarify documentation of qb_loop_timer_expire_time_get and provide new function to return previously documented behavior (#391)
Includes unit test addition by chrissie-c
2020-04-29 13:20:52 +01:00
Chrissie Caulfield
1daca57c10
trie: Don't assume that chars are unsigned < 126 (#386)
* trie: Don't assume that chars are unsigned < 126

Trie fails on systems with unsigned chars when using characters over
126.
2020-03-09 08:14:39 +00:00
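The general hazard behind 1daca57c10 is that the signedness of plain `char` is implementation-defined, so a character value must be normalised before it is used as an array index. The sketch below shows that generic technique with hypothetical names; it is not taken from the libqb trie code.

```c
#include <limits.h>
#include <stddef.h>

/* One child slot per possible byte value. */
#define EXAMPLE_FANOUT (UCHAR_MAX + 1)

/* Hypothetical helper: map a character to a child index in the range
 * 0..UCHAR_MAX, whether plain char is signed or unsigned here. */
static size_t example_child_index(char c)
{
    return (size_t)(unsigned char)c;
}
```

Without the cast through `unsigned char`, a byte such as 200 becomes a negative index on platforms where `char` is signed.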
Jonas Witschel
99671f4d75 Set correct ownership if qb_ipcs_connection_auth_set() has been used
When qb_ipcs_connection_auth_set() has been used, the ownership of the
temp directory initially set by handle_new_connection() must be updated
as well.
2020-02-10 11:21:45 +01:00
Ferenc Wágner
700fb2b27e Allow group access to the IPC directory
And don't abort if we aren't permitted to chown() it.  The client might
still have the privileges to enter it.
2020-02-10 10:57:16 +01:00
Ferenc Wágner
a8301de262 Errors are represented as negative values 2020-02-10 10:57:01 +01:00
Jan Pokorný
7f891f0069 build: allow for possible v1 branch continuity by generous SONAME offset
The main and the most ABI-touching thing for the envisioned 2.0 branch
is the usage of the linker-build-time allocated callsite info, avoiding
the non-economic evaluations and, under some circumstances dangerous,
heap allocations in the run-time.

Considering that v1.9.0 release (libqb.so.20) was expressly marked as
tech-preview[1,2] (hence something that shall not make it to production
use), there should be no harm for master branch (that is headed towards
2.0 and beyond) to receive noticeable SONAME bump (libqb.so.100) so as to

- leave enough of space for a possible v1-compatible branch evolution
  (for use cases where recompile-everything is a no-go).
  in particular, with resuming with libqb.so.30, there would
  be a room for 99-33 = 63 add-new-drop-nothing compatible
  changes for that branch (which is more than plentiful)

- indicate some big change is going on more clearly towards client space

This is supposed to be a reasonable trade-off solution that would still
leave enough wiggle space, and would represent responsible approach to the
development (like the original attempt to prevent ABI break in the first
place was), allowing for more than an enforced unanimity (rather
antagonistic in the free software realms).

[1] https://lists.clusterlabs.org/pipermail/users/2019-December/026690.html
[2] https://github.com/ClusterLabs/libqb/releases/tag/1.9.0

Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
2020-01-10 12:27:35 +00:00
Chrissie Caulfield
51a03aa9c6 make: Remove splint tests (#374)
Now that splint is actually contradicting errors that come from
the compilers I think it's time to retire it. I could cope with it
being a minor nuisance on the argument that "another check can't
hurt", but contradicting the actual compilers is too much.

The CI has Coverity installed which is much more up-to-date anyway.
Splint hasn't been updated since 2010
2019-12-11 18:46:22 +01:00
Christine Caulfield
3da80822b2 lib: Fix some minor warnings from newer compilers
and doxygen
2019-12-11 11:14:47 +00:00
Jan Friesse
302b564834 ipc: Always initialize response struct
The response structure was not completely initialized when
mkdtemp/chown failed, the server was not yet accepting connections, or
connect failed for some reason.

This is not an issue in itself, but valgrind reports it
as a problem, which makes real problems easy to miss.

The solution is to initialize the response before it is used.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2019-11-08 08:54:04 +00:00
Jan Pokorný
484fddddb8 ringbuffer: fix mistaken errno handling around _rb_chunk_reclaim
Previously, there were two separate logical issues:

- errno could be set negative in qb_rb_chunk_alloc
  when "reclaim" notifier failed

- _rb_chunk_reclaim (note: local scoped, hence comfortable for changes)
  was already setting errno at a single place (coincidentally, in a
  correct way, but that'd be overwritten with the inverse because of the
  previous logical issue in qb_rb_chunk_alloc), so make it set errno
  at each failure path (now also when the internal integrity check in
  _rb_chunk_reclaim failed), sparing the callers from doubling up on that task

Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
2019-07-26 09:39:58 +01:00
Ken Gaillot
aad74c7e39 array,log: Never set errno to a negative value
These are clearly a logical mistake due to the standard practice of returning
-errno on error, but errno itself should never be negative.
2019-07-26 09:39:58 +01:00
Ken Gaillot
7dc8d386fc log: Set errno when qb_log_target_alloc() fails
The callers of qb_log_target_alloc() return -errno when it fails.
However, qb_log_target_alloc() wasn't setting errno.

The only failure case is when QB_TARGET_LOG_MAX (32) logs have been opened, so
it's unlikely to ever be a real-world problem. But in that case, now set errno
to EMFILE ("Too many open files").
2019-07-26 09:39:58 +01:00
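The convention behind aad74c7e39 and 7dc8d386fc (errno holds a positive code, callers return `-errno`) can be sketched with a fixed pool of slots. Every name below is hypothetical; only the limit of 32 and the choice of EMFILE come from the commit message.

```c
#include <errno.h>
#include <stddef.h>

#define EXAMPLE_MAX_TARGETS 32

struct example_target {
    int in_use;
};

static struct example_target example_targets[EXAMPLE_MAX_TARGETS];

/* Hypothetical allocator: returns a free slot, or NULL with errno set to
 * a positive error code when every slot is taken. */
static struct example_target *example_target_alloc(void)
{
    size_t i;

    for (i = 0; i < EXAMPLE_MAX_TARGETS; i++) {
        if (!example_targets[i].in_use) {
            example_targets[i].in_use = 1;
            return &example_targets[i];
        }
    }
    errno = EMFILE; /* positive, never -EMFILE */
    return NULL;
}

/* Hypothetical caller: returns the slot index, or -errno on failure. */
static int example_target_open(void)
{
    struct example_target *t = example_target_alloc();

    if (t == NULL) {
        return -errno;
    }
    return (int)(t - example_targets);
}
```

The 33rd call to `example_target_open()` returns `-EMFILE`, while `errno` itself stays positive.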
Christine Caulfield
9526df0c11 ipc: Remove kqueue EOF log message
It's not logged for epoll systems and just clutters up logs
and slows things down without telling us anything useful.
2019-06-27 13:19:08 +01:00
Christine Caulfield
f1bf5d9da3 ipc: fix force-filesystem-sockets
The /etc/libqb/force-filesystem-sockets option got broken for some
applications in the last security update.
2019-06-24 13:29:34 +01:00
Jan Friesse
ed29f84ab6 ipc: Fix named socket unlink on FreeBSD
Terminating NUL on FreeBSD is not part of the sun_path.
Add it to use sun_path as a parameter of unlink.

Signed-off-by: Jan Friesse <jfriesse@redhat.com>
2019-06-20 08:41:42 +01:00
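One way to handle a sun_path that may lack a terminating NUL, as described in ed29f84ab6, is to copy it into a buffer one byte larger and terminate it explicitly before calling unlink(). The function name is hypothetical and the approach is an assumption about the general technique, not a copy of the libqb fix.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: unlink the filesystem path stored in addr, without
 * assuming that sun_path is NUL-terminated. Returns the unlink() result. */
static int example_unlink_socket_path(const struct sockaddr_un *addr)
{
    char path[sizeof(addr->sun_path) + 1];

    memcpy(path, addr->sun_path, sizeof(addr->sun_path));
    path[sizeof(addr->sun_path)] = '\0'; /* guaranteed terminator */

    return unlink(path);
}
```

If sun_path already contains a NUL, unlink() stops there and the extra terminator has no effect.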
Jan Pokorný
83da9f2109
IPC: server: fix debug message wrt. what actually went wrong
It's misleading towards a random code observer, at least,
hiding the fact that what failed is actually the queueing up
of some handling to perform asynchronously in the future,
rather than invoking it synchronously right away.

Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
2019-06-05 10:38:32 +02:00
Jan Pokorný
97adfa6ba0
IPC: server: avoid temporary channel priority loss, up to deadlock-worth
It turns out that while 7f56f58 allowed for less blocking (thus
throughput increasing) initial handling of connections from clients
within the abstract (out-of-libqb managed) event loop, it unfortunately
subscribes itself back to such polling mechanism for UNIX-socket-check
with a default priority, which can be lower than desired (via explicit
qb_ipcs_request_rate_limit() configuration) for particular channel
(amongst attention-competing siblings in the pool, the term here
refers to associated communication, that is, both server and
on-server abstraction for particular clients).  And priority-based
discrepancies are not forgiven in true priority abiding systems
(that is, unlike with libqb's native event loop harness as detailed
in the previous commit, for which this would be soft-tolerated hence
the problem would not be spotted in the first place -- but that's
explicitly excluded from further discussion).

On top of that, it violates the natural assumption that once (single
threaded, which is imposed by libqb, at least between initial accept()
and after-said-UNIX-socket-check) server accepts the connection, it
shall rather take care of serving it (at least within stated initial
scope of client connection life cycle) rather than be rushing to accept
new ones -- which is exactly what used to happen previously once the
library user set the effective priority in the abstract poll
above the default one.

It's conceivable, just as with the former case of attention-competing
siblings with higher priority whereby they could _infinitely_ live on
at the expense of starving the client in the initial handling phase
(authentication) despite the library user's as-high-as-siblings
intention (for using the default priority for that unconditionally
instead, which we address here), the dead lock is imminent also in
this latter accept-to-client-authentication-handling case as well
if there's an _unlimited_ fast-paced arrival queue (well, limited
by the number of allowable open descriptors within the system,
but for the Linux built-in maximum of 1M, there may be no practical
difference, at least for time-sensitive applications).

The only hope then is that such dead-locks are rather theoretical,
since a "spontaneous" constant stream of either communication on
unrelated, higher-prio sibling channels, or of new connection arrivals
can as well testify to the poor design of the libqb's IPC application.
That being said, unconditional default priority in the isolated
context of initial server-side client authentication is clearly
a bug, but such application shall apply appropriate rate-limiting
measures (exactly on priority basis) to handle unexpected flux
nonetheless.

The fix makes test_ipc_dispatch_*_glib_prio_deadlock_provoke tests pass.

Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
2019-06-05 10:36:55 +02:00
Ferenc Wágner
d90d45f576 doc: qbarray: reword comment about index partitioning 2019-05-07 14:46:02 +01:00