* log: lower IPC connection issues to info level
... in handle_new_connection(). The caller has better context for deciding
whether a problem merits a warning or an error, and the function's return
code is descriptive enough to make that call. Some problems may be expected,
or can be worked around.
For example, Pacemaker's crm_mon attempts to contact pacemakerd IPC. On a
Pacemaker Remote node, that IPC will be unavailable, and crm_mon can check the
libqb return code to detect and handle that situation gracefully.
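As a rough client-side sketch of that pattern (the "pacemakerd" service name
and message size are illustrative, not prescriptive): qb_ipcc_connect()
returns NULL on failure and sets errno, so the caller decides how loudly to
report the problem.

    #include <errno.h>
    #include <stdio.h>
    #include <qb/qbipcc.h>

    int probe_pacemakerd_ipc(void)
    {
        qb_ipcc_connection_t *c = qb_ipcc_connect("pacemakerd", 8192);

        if (c == NULL) {
            if (errno == ENOENT || errno == ECONNREFUSED) {
                /* expected, e.g. on a Pacemaker Remote node: handle quietly */
                return -errno;
            }
            fprintf(stderr, "unexpected IPC failure: %d\n", errno);
            return -errno;
        }
        qb_ipcc_disconnect(c);
        return 0;
    }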
* log: lower some ringbuffer debug messages to trace level
They're rather noisy, with every shm-based IPC connection generating multiple
obscure messages like:
debug: shm size:1048589; real_size:1052672; rb->word_size:263168
and every disconnect generating the rather unhelpful:
debug: qb_ipcc_disconnect()
along with multiple messages like:
debug: Closing ringbuffer: /dev/shm/qb-10986-11014-34-26VRvs/qb-request-cmap-header
All of these seem appropriate to trace level.
* ipc: add qb_ipcc_auth_get() API call
We can't use SO_PEERCRED on the client fd when using socket IPC
because it's a DGRAM socket (pacemaker tries this). So provide
an API to get the server credentials that libqb has already
squirreled away for its own purposes.
Also, fix some unused-variable compiler warnings in unix.c
when building on systems without posix_fallocate().
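A minimal sketch of how a client might use the new call, assuming the
qb_ipcc_auth_get(conn, &pid, &uid, &gid) signature introduced here:

    #include <stdio.h>
    #include <sys/types.h>
    #include <qb/qbipcc.h>

    static void show_server_creds(qb_ipcc_connection_t *c)
    {
        pid_t pid;
        uid_t uid;
        gid_t gid;

        /* returns 0 on success; fills in the credentials libqb cached
         * for the server side of this connection */
        if (qb_ipcc_auth_get(c, &pid, &uid, &gid) == 0) {
            printf("server pid=%ld uid=%ld gid=%ld\n",
                   (long)pid, (long)uid, (long)gid);
        }
    }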
It's hard to predict the length of formatted output, so we'd better
notice (and abort) if the description is truncated. Incidentally,
mkdtemp() does this for us in the shared memory branch, but do an
explicit check there as well for consistency, and get rid of the wrongly
parametrized strncat() that risked a buffer overflow (CONNECTION_DESCRIPTION
is not the length of the source "/qb").
Similar truncation checks should be added to qb_ipcs_{shm,us}_connect()
where they build the request/response names, and possibly to other
places using snprintf().
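For illustration, a generic snprintf() truncation check of the kind
suggested (the function and name format are hypothetical, not libqb's
actual helpers): snprintf() returns the length it would have written,
so a result >= the buffer size means the name was cut short.

    #include <errno.h>
    #include <stdio.h>

    static int build_request_name(char *buf, size_t buflen, const char *base)
    {
        int n = snprintf(buf, buflen, "%s/qb-request", base);

        if (n < 0 || (size_t)n >= buflen) {
            return -ENAMETOOLONG;   /* refuse a truncated name */
        }
        return 0;
    }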
The response structure was not completely initialized when
mkdtemp()/chown() failed, when the server was not yet accepting
connections, or when connect() failed for some other reason.
This is not a real issue, but valgrind reports it as a problem,
making it easy to miss genuine problems.
The solution is to initialize the response before it is used.
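A minimal sketch of the fix, using a stand-in struct since the real
connection response lives in libqb's internal headers:

    #include <stdint.h>
    #include <string.h>

    struct connection_response {      /* stand-in for libqb's internal struct */
        int32_t error;
        char description[256];
    };

    static void prepare_response(struct connection_response *res)
    {
        memset(res, 0, sizeof(*res));   /* every byte defined, valgrind-clean */
    }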
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
On FreeBSD, the terminating NUL is not part of sun_path.
Add it so that sun_path can be passed to unlink().
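A sketch of the portable pattern, assuming nothing beyond standard POSIX
types: copy sun_path into a buffer one byte larger and terminate it
explicitly before using it as a pathname.

    #include <string.h>
    #include <sys/un.h>
    #include <unistd.h>

    static int unlink_socket_path(const struct sockaddr_un *sa)
    {
        char path[sizeof(sa->sun_path) + 1];

        memcpy(path, sa->sun_path, sizeof(sa->sun_path));
        path[sizeof(sa->sun_path)] = '\0';   /* FreeBSD may omit the NUL */
        return unlink(path);
    }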
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
It's misleading to a casual code observer, at least,
hiding the fact that what failed is actually the queuing up
of some handling to be performed asynchronously in the future,
rather than a synchronous invocation right away.
Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
It turns out that while 7f56f58 allowed for less blocking (thus
throughput-increasing) initial handling of connections from clients
within the abstract (out-of-libqb-managed) event loop, it unfortunately
subscribes itself back to that polling mechanism for the UNIX-socket
check with a default priority, which can be lower than desired (via
explicit qb_ipcs_request_rate_limit() configuration) for a particular
channel (amongst attention-competing siblings in the pool; the term
here refers to the associated communication, that is, both the server
and the on-server abstraction for particular clients). And
priority-based discrepancies are not forgiven in true priority-abiding
systems (that is, unlikely with libqb's native event loop harness as
detailed in the previous commit, for which this would be soft-tolerated
and hence the problem would not be spotted in the first place -- but
that's explicitly excluded from further discussion).
On top of that, it violates the natural assumption that once the
(single-threaded, which is imposed by libqb, at least between the
initial accept() and after said UNIX-socket check) server accepts a
connection, it should take care of serving it (at least within the
stated initial scope of the client connection life cycle) rather than
rush to accept new ones -- which is exactly what used to happen
previously once the library user set the effective priority in the
abstract poll above the default one.
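For context, a minimal sketch of how a server raises its channel's
effective priority through the rate-limit API mentioned above (the
mapping of rate limits onto event-loop priorities is as this commit
describes):

    #include <qb/qbipcs.h>

    void raise_channel_priority(qb_ipcs_service_t *service)
    {
        /* QB_IPCS_RATE_FAST corresponds to a higher event-loop priority */
        qb_ipcs_request_rate_limit(service, QB_IPCS_RATE_FAST);
    }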
It's conceivable, just as with the former case of attention-competing
siblings with higher priority, whereby they could _infinitely_ live on
at the expense of starving the client in the initial handling phase
(authentication) despite the library user's as-high-as-siblings
intention (the default priority being used for that unconditionally
instead, which we address here), that a deadlock is imminent in this
latter accept-to-client-authentication-handling case as well if there
is an _unlimited_, fast-paced arrival queue (well, limited by the
number of allowable open descriptors within the system, but for the
Linux built-in maximum of 1M there may be no practical difference, at
least for time-sensitive applications).
The only hope, then, is that such deadlocks are rather theoretical,
since a "spontaneous" constant stream of either communication on
unrelated, higher-priority sibling channels, or of new connection
arrivals, would just as well testify to the poor design of the libqb
IPC application. That being said, an unconditional default priority in
the isolated context of initial server-side client authentication is
clearly a bug, but such an application should apply appropriate
rate-limiting measures (exactly on a priority basis) to handle
unexpected flux nonetheless.
The fix makes test_ipc_dispatch_*_glib_prio_deadlock_provoke tests pass.
Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
Using mkdtemp() makes sure that IPC files are only visible to the
owning (client) process and do not use predictable names outside
of that.
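A rough sketch of the mkdtemp() pattern; the /dev/shm path prefix is
illustrative:

    #include <stdio.h>
    #include <stdlib.h>

    static int make_ipc_dir(char *buf, size_t buflen)
    {
        /* mkdtemp() replaces XXXXXX with a random name and creates
         * the directory mode 0700, so only the owner sees its files */
        if (snprintf(buf, buflen, "/dev/shm/qb-XXXXXX") >= (int)buflen) {
            return -1;
        }
        if (mkdtemp(buf) == NULL) {
            return -1;   /* errno describes the failure */
        }
        return 0;
    }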
This is not meant to be the last word on the subject, it's mainly a
simple way of making the current libqb more secure. Importantly, it's
backwards compatible with an old server.
It calls rmdir on the directory created by mkdtemp way too often, but
it seems to be the only way to be sure that things get cleaned up on
the various types of server/client exit. I'm sure we can come up with
something tidier for master but I hope this, or something similar, will
be OK for 1.0.x.
* ipc_shm: Don't truncate SHM files of an active server
I've put in an extra check so that clients don't truncate the
SHM file if the server still exists. Sadly, on FreeBSD we can't
get the server PID from the client side (unless someone has a patch
handy!), so we still do the truncate when disconnected. As a backstop
(and also to cover the BSD issue) I've added a SIGBUS trap to the
server shutdown so that it doesn't cause a server crash.
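The liveness check could plausibly look like this kill(pid, 0) sketch
(illustrative, not the exact libqb code):

    #include <errno.h>
    #include <signal.h>
    #include <sys/types.h>

    static int server_is_alive(pid_t server_pid)
    {
        if (server_pid <= 0) {
            return 0;   /* PID unknown, e.g. on FreeBSD */
        }
        /* signal 0: nothing is sent, just an existence/permission probe */
        return (kill(server_pid, 0) == 0 || errno == EPERM);
    }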
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed by: Jan Friesse <jfriesse@redhat.com>
* IPC: Allow filesystem sockets to be chosen at run-time on Linux
Most of this patch came from Andrew Beekhof.
Keep a global variable that decides whether to use filesystem sockets
or abstract sockets for IPC connections. This variable is set by the
presence of a file (default /etc/libqb/force-filesystem-sockets).
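A sketch of how such a presence check might look; libqb's actual helper
may differ:

    #include <sys/stat.h>

    static int use_filesystem_sockets(void)
    {
        struct stat st;

        /* mere presence of the file flips IPC to filesystem sockets */
        return (stat("/etc/libqb/force-filesystem-sockets", &st) == 0);
    }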
* tests: Fix test_ipcc_truncate_when_unlink_fails_shm test using FS sockets
When using filesystem sockets, the
test_ipcc_truncate_when_unlink_fails_shm test always failed, because
the unlink() call is wrapped to fail and so never cleans up the old
version of the socket.
The fix is to preemptively remove the file before unlink() gets wrapped.
* doc: Explain the force-filesystem-sockets option
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
* remove pid/euid from qb_ipcc_connection
* use proper #elif defines
* return NULL instead of 0 for pointers
* return -ENOMEM when malloc fails
* remove redundant if check
* use -1 for uid to chown()
In handle_new_connection(), connection_accept() could fail, which would
leave the state of the connection inactive. Previously, in this case,
the socket and the allocated qb_ipcs_connection would be leaked.
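A hypothetical sketch of the fixed error path; connection_accept() is
libqb-internal and only declared here so the fragment stands alone:

    #include <unistd.h>
    #include <qb/qbipcs.h>

    /* libqb-internal; declared here only to keep the sketch self-contained */
    extern int connection_accept(qb_ipcs_connection_t *c);

    static void handle_new_connection_sketch(qb_ipcs_connection_t *c, int sock)
    {
        if (connection_accept(c) != 0) {
            close(sock);                   /* previously leaked */
            qb_ipcs_connection_unref(c);   /* previously leaked */
            return;
        }
        /* ... proceed with the now-active connection ... */
    }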
The IPC server registers the bind socket with the poll loop in order
to be alerted to new connection requests. Upon shutdown, the IPC
server did not remove this poll entry. This patch fixes the resulting
use-after-free.
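A minimal sketch of the shutdown-side fix using libqb's own loop API:

    #include <qb/qbloop.h>

    static void server_shutdown_sketch(qb_loop_t *loop, int bind_fd)
    {
        /* drop the poll entry before freeing server state, so the
         * loop never dispatches against freed memory */
        qb_loop_poll_del(loop, bind_fd);
    }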
This is needed because newer kernels don't correctly support setting
SO_PASSCRED on sockets returned by the accept() call; the socket
option must be set on the server socket (before the accept() call).
For more details, see:
http://patchwork.ozlabs.org/patch/284366/
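For illustration, the standard way to enable credential passing on the
listening socket before accept():

    #include <sys/socket.h>

    static int enable_passcred(int listen_fd)
    {
        int on = 1;

        /* must happen on the listening socket, before accept() */
        return setsockopt(listen_fd, SOL_SOCKET, SO_PASSCRED,
                          &on, sizeof(on));
    }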
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Separate the code into a setup file and a socket backend file; it was
getting messy and confusing. This also prepares for using DGRAM sockets.
This should not result in any logical changes.
Signed-off-by: Angus Salkeld <asalkeld@redhat.com>