This changeset builds on previous 2-3 commits and represents the main
libqb's answer to the original question behind pacemaker's security
defect known as CVE-2016-7035.
Beside the helper partly unifying handling of qb_rb_force_close and
qb_rb_close, it provides the former with ability to use file truncating
as a fallback for when unlinking fails, e.g., because client (note that
mentioned is currently only relevant for the client side as normally
server is responsible for the lifecycle of the materialized files,
unless it crashes and only client is left to do its best) is not the
owner while they are placed at a directory with restricted deletion,
which enforces this very ownership condition.
In practice, this means that, at worst, just the zero-size files are
left behind, so not that much space exhaustion (usually "ramdisk"
like tmpfs is what backs default storage directory /dev/shm, so it
boils down to physical memory exhaustion, even if it can be just
for page cache and related overhead) can happen even on repeated
crashes as the memory mappings are cleared as much as possible.
Also openat/unlinkat functions (sported in qb_sys_unlink_or_truncate_at
as of the previous commit) are, when applicable, used so as to limit
possible race conditions between/during individual path traversals
(both files being got rid of presumably share the same directory).
Few words on which actions are attempted in which order for the
equivalent of qb_rb_force_close now:
There are subtle interactions between what's externally visible
(files) and what's not (memory mappings associated with such files),
and perhaps between memory pages management from the perspective of
the former (usually "ramdisk"/tmpfs) and the latter (mmap + munmap).
If the associated file is no longer publicly exposed by the means of
unlink (even if the object survives internally as refcounting is in
the game, with mmap holding a reference), memory mapping is not
affected. On the other hand, if it's just limited by truncation
to zero size, memory mapping is aware and generates SIGBUS in response
to accessing respective addresses. Similarly, accessing munmap'd
(no refcounting here) memory generates SIGSEGV. For delicacy,
the inputs for all of unlink, truncate, and munmap are stored
at the mmap'd location we are about to drop, but that's just a matter
of making copies ahead of time.
At Ken's suggestion, the scheme is: (unlink or truncate) then munmap,
which has a benefit that externally visible (and program's life span
otherwise surviving!) part is eliminated first, with memory mappings
(disposed at program termination automatically at latest) to follow.
(There was originally a paranoid expectation on my side that truncate
on tmpfs actually does silent munmap, so that our munmap could in fact
tear down the mapping added in the interim by the libraries, signal
handler or due to requirements of another thread, also because of
munmap on the range without any current mappings will not fail, and
thus there's likely no portable way to non-intrusively check the
status, but also due to documented SIGBUS vs. SIGSEGV differences
the whole assumption appears bogus on the second thought.)
Relevant unit tests that exercise client-side unlinking:
- check_ipc: test_ipc_server_fail_shm, test_ipc_exit_shm
- new test in a subsequent commit
These are intended for subsequent qb_rb_{force_,}close refactorization
and utilization of this new truncate as a fallback after unlink failure
as detailed in the commit to follow.
For newer POSIX revision compliant systems, there's "at" variant using
openat/unlinkat functions so that paths do not have to be traversed
in full anew when not needed (as both unlink and truncate operate on
the same path).
...using a new private inline helper that is intended to "decorate"
argument (plus extra reference level added) to qb_rb_{force_,}close().
It is purposefully not hardwired to neither qb_rb_close (it's a public
API function that should not change its semantics) nor qb_rb_force_close
(just for symmetry, preempting issues when the two would differ, and
also makes them more mutually compatible, which is already expected
at qb_ipcc_shm_disconnect).
It sets the original ringbuffer pointer to NULL (having the immediate
impact on other threads/asynchronous handling) and also sets the
(currently underused) reference counter set to exacly 1 (that is
subsequently going to be decremented in qb_rb_close so that it's
sound in the current arrangement).
More in the comment at the helper.
Suitable places are also made to use it right away.
This fixes issue with would-fail-if-applied-to-thread-right-away
qb_log_thread_priority_set invocation when logging thread doesn't
exist yet, which will arrange for calling itself at the time of
thread's birth that is the moment it will actually fail.
In this + lock-could-not-have-been-initialized corner cases, the
already running thread would proceed as allowed by error condition
handling in the main thread, trying to dereference uninitialized
(or outdated) pointer to the lock at hand, resulting in segfault.
Also include the test that would have been caught that (we use the
fact that it doesn't matter whether setting of the scheduler parameters
fails due to bad input or just because of lack of privileges as it's
the failure at the right moment that is of our interest).
See also:
https://github.com/ClusterLabs/libqb/issues/229
If a tag of 0 is passed into the logger and an existing callsite
is found with a non-zero tag, the don't overwrite the existing tag.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
... where appropriate space is measured for, surprisingly, a char,
not for an int. Note that's also the actual type used for both
de-/serializing, so there's no conflict.
Also bother to explain why, now surprisingly for real, an unsigned int
is scraped out from va_list (akin to to STDARG(3)).
* configure.ac: Define QB_GNU.
Add a check for a working clock_getres for the CLOCK_MONOTONIC
option defining HAVE_CLOCK_GETRES_MONOTONIC.
* lib/log_thread.c: Replace second argument of
qb_log_thread_priority_set(): logt_sched_param.sched_priority by 0
when not supported by the OS.
* lib/util.c: Use the CLOCK_REALTIME option in clock_getres() if
HAVE_CLOCK_GETRES_MONOTONIC os not defined.
Original "qb_log_ctl" interface had to be extended for passing read-only
strings (new parameter), resulting in new "qb_log_ctl2" function, which
is what qb_log_ctl calls into with the new parameter set to NULL.
This ensures backward compatibility.
A new QB_LOG_CONF_IDENT configuration directive for the mentioned
interface is added with a goal to set new internal identifier
that is, notably, used for syslog sink. This allows for switching
the identification without a need to reinitialize logging subsystem,
akin to changing target logging facility.
Also a brand new concept of testing syslog sink in particular is
introduced (finally). During initial trial&error stage, it used
LD_PRELOAD hack but it seems that libtool is sophisticated enough
that no such extra intervention is needed and the desired symbol
resolution Just Works (tm). However, the technique is highly
non-portable (there is even a warning about that from libtool,
which is partially on purpose as the _syslog_override.so should
rather be explicit it is by no mean a regular library) and hence
the syslog tests have to be enabled with explicit
./configure --enable-syslog-tests
rather than possibly break on untested platforms (far too many).
The concept can be extended upon, but initially, just the new
feature is being tested.
Post-review: thanks Chrissie for a suggestion how to deal with
extract-arg-and-forget in a less intrusive way (no defines).
Also use the new enum qb_log_target_slot type in for-loops together
with a proper substitute for the literal "0" initializer.
There could be more places that might be type-substituted for this enum
(and hence possibly catch more of an incorrect usage if the compiler
or checker has some notion of enum type narrowing), but leave it as
a possible enhancement for now.
Set the sockets receive buffer size to match the send buffer. On
FreeBSD without this calls to sendto() will result in an ENOBUFS error
if the message is larger than net.local.dgram.recvspace sysctl.
With gcc 5.3.1 20151207:
> log.c: In function 'cs_format':
> log.c:182:2: warning: function might be possible candidate for
> 'gnu_printf' format attribute [-Wsuggest-attribute=format]
> len = vsnprintf(str, QB_LOG_MAX_LEN, cs->format, ap_copy);
> ^
We certainly don't want to disable that warning globally so make use of
diagnostic pragmas for GCC instead in one instance that we cannot
annotate properly.
Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
* remove pid/euid from qb_ipcc_connection
* use proper #elif defines
* return NULL instead of 0 for pointers
* return -ENOMEM when malloc fails
* remove redundant if check
* use -1 for uid to chown()
When creating a unix socket it's default gid is that of the parent
directory. If the SOCKETDIR is owned by root:wheel with 1777 mode
some of the pacemaker daemons end up unable to communicate with one
another due to having insufficient permissions on the sockets.
This can be fixed by setting the client sockets gid to the primary
group of the server socket owner it's attempting to connect to. And,
on the server side by setting the gid to the already captured gid
stored in the connection info. This ensures that regardless of who
owns the socket directory, as long as the applications have r/w
access to it they should work.
Currently the mmap files are created in LOCALSTATEDIR/run on non-Linux
platforms which can be problematic with pacemaker since it spawns processes
as the hacluster user, which by default doesn't have write permissions to
it. Using --with-socket-dir partially fixes the problem by allowing the
unix sockets to reside somewhere else but not the mmap files and this
patch puts them in the same directory.
When using sockets for IPC the file permissions default to whatever
the umask is. This isn't a problem on Linux since it uses abstract
namespace sockets which don't have any permissions, but on other
platforms this causes problems with pacemaker which spawns processes
under the hacluster user and ends up failing to connect.
In handle_new_connection(), connection_accept() could fail, which would
leave the state of the connection inactive. Previously, in this case,
the socket and the allocated qb_ipcs_connection would be leaked.