totem.nodeid is relict from times when nodelist was not required and
totemsrp was sending whole membership with ip addresses.
With Corosync 3 ip addresses are no longer sent so
it is not possible to find "next" node ip address where to send token
(because only nodeid is sent) without having information about all of
the nodes stored locally.
When totem.nodeid was configured it was partly used and other parts
(most notably totemudpu_token_target_set) were using autogenerated
nodeid. Together it was not possible to create even single node
membership.
Solution is to ignore totem.nodeid completely (and display warning when
it is set).
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Support for cgroup v2 is very similar to cgroup v1 just checking (and
writing) different file.
Because of all the problems described later with cgroup v2 new "auto"
mode (new default) is added. This mode first tries to set rr scheduling
and moves Corosync to root cgroup only if it fails.
Testing this feature is a bit harder than with cgroup v1 so it's
probably worh noting in this commit message.
1. Copy some service file (I've used httpd service) and set
CPUQuota=30% in the [service] section.
2. Check /sys/fs/cgroup/cgroup.subtree_control - there should be no
"cpu"
3. Start modified service
4. Check /sys/fs/cgroup/cgroup.subtree_control - there should be "cpu"
5. Start corosync - It should be able to get rt priority
When move_to_root_cgroup is disabled (applies only for kernels
with CONFIG_RT_GROUP_SCHED enabled), behavior differs:
- If corosync is started before modified service, so
there is no "cpu" in /sys/fs/cgroup/cgroup.subtree_control
corosync starts without problem and gets rt priority.
Starting modified service later will never add "cpu" into
/sys/fs/cgroup/cgroup.subtree_control (because corosync is holding
rt priority and it is placed in the non-root cgroup by systemd).
- When corosync is started after modified service, so "cpu"
is in /sys/fs/cgroup/cgroup.subtree_control, corosync is not
able to get RT priority.
It's worth noting problems when cgroup v2 is used together with systemd
logging described in corosync.conf(5) man page.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Support for cgroup v2 is very similar to cgroup v1 just checking (and
writing) different file.
Testing this feature is a bit harder than with cgroup v1 so it's
probably worh noting in this commit message.
1. Copy some service file (I've used httpd service) and set
CPUQuota=30% in the [service] section.
2. Check /sys/fs/cgroup/cgroup.subtree_control - there should be no
"cpu"
3. Start modified service
4. Check /sys/fs/cgroup/cgroup.subtree_control - there should be "cpu"
5. Start corosync - It should be able to get rt priority
When move_to_root_cgroup is disabled, behavior differs:
- If corosync is started before modified service, so
there is no "cpu" in /sys/fs/cgroup/cgroup.subtree_control
corosync starts without problem and gets rt priority.
Starting modified service later will never add "cpu" into
/sys/fs/cgroup/cgroup.subtree_control (because corosync is holding
rt priority and it is placed in the non-root cgroup by systemd).
- When corosync is started after modified service, so "cpu"
is in /sys/fs/cgroup/cgroup.subtree_control, corosync is not
able to get RT priority.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
... to be in align with crypto_cypher and crypto_hash.
Reload (corosync-cfgtool -R) works without any problem and changing of
key is not supported anyway,
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Use knet_get_crypto_list to find knet supported crypto models and use
them instead of hardcoded list.
Also fix compression handling. Previously knet_compression_model
value was not checked at all and was directly passed to knet.
Use knet_get_compress_list to find knet supported compress models and
use them to check validity of config file and for more informative
error message.
Lastly enhance corosync version display with information
about available crypto/compression models.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Don't lock all current and future memory if can't
increase memlock rlimit.
If we fail to increase our RLIMIT_MEMLOCK, then locking all our current
and future memory is extremely dangerous; once our memory use reaches
our RLIMIT_MEMLOCK, memory allocations will start failing, very likely
leading to our entire process crashing.
This can happen if we aren't a privileged process, for example if
running as non-root user, or inside an unprivileged container.
Signed-off-by: Dan Streetman <ddstreet@canonical.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Found by covscan which also didn't like us 'leaking' the
fd to the lockfile. So close that too.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
At the same time simplify the overwrite logic and stop clearing the
umask (which is unexpected and quite pointless here, as applications
can't really protect the users from their own pathological settings).
Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Needs new knet crypto API.
If it's not available, then fall back to the old
API and forbid changing crypto while running.
To avoid us being dependant on the leader node, each
node sends its own crypto_reconfig_phase messages so
we can guarantee that the reconfiguration always completes
on each node.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
This is useful for matching schedmiss event in stats map with logged
event.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Using monotonic time is not working because it doesn't have to match
time from epoch.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
This patch add a stats.schedmiss.* set of entries that
are a record of the last 10 times corosync was not scheduled
in time.
These entries are keypt in reverse order (so stats.schedmiss.0.* is
always the latest one kept) and the values, including the timestamp,
are in milliseconds.
It's also possible to use a cmap tracker to follow these events, which
might be useful.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Previously node id was logged ether as a %d (most often), %u, %x or
PRI.32 and ring id ether as %lld, %llx with various separators (., :, /)
between rep nodeid and seq. This seems to cause confusion.
This patch adds macros CS_PRI_NODE_ID, CS_PRI_RING_ID and
CS_PRI_RING_ID_SEQ (CS prefix = corosync, PRI modeled in spirit of
inttypes.h PRIx32) and makes code use them.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
so that we get the nice log message when attempting to modify them at
runtime, just like for totem.crypto_* and co.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Here we're very far from entering the main loop, even farther from
sending the READY notification to systemd. This sounded awkward:
systemd[1]: Starting Corosync Cluster Engine...
corosync[827]: [MAIN ] Corosync Cluster Engine ('2.99.5'):
started and ready to provide service.
corosync[827]: [MAIN ] Corosync built-in features: dbus monitoring
watchdog augeas systemd xmlconf snmp pie relro bindnow
corosync[827]: [MAIN ] parse error in config: No interfaces defined
corosync[827]: [MAIN ] Corosync Cluster Engine exiting with status 8
at main.c:1378.
systemd[1]: corosync.service: Main process exited, code=exited,
status=8/n/a
systemd[1]: corosync.service: Failed with result 'exit-code'.
systemd[1]: Failed to start Corosync Cluster Engine.
Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
system.run_dir was a little bit unfortunate and confusing name. Rename
to state_dir makes more evident what is content of this directory. To
keep setting consistent with code, get_run_dir is changed to
get_state_dir.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Originally totem.ip_version was used to force ip version used by totem.
With Knet this variable didn't make too much sense so it was not used.
Sadly rely only on DNS resolver order doesn't always work (RFC is quite
complicated, but if IPv6 is not configured then IPv4 is preferred), what
we tried to solve by forcing IPv6 and only if that fails, use IPv4.
Sadly this collides with nss_myhostname which is able to return every
local address and today system usually have at least one autogenerated
link-local IPv6 address so it is able to "overwrite" /etc/hosts.
Solution is to enhance totem.ip_version and use it also for Knet.
totem.ip_version is now just a flag for resolver and can have four
states: ipv4 (only IPv4 is used), ipv6 (only IPv6 is used), ipv4-6 (ask
IPv4 first and if it fails ask for IPv6) and ipv6-4 (ask IPv6 first and
if it fails ask for IPv4). Default for Knet and UDPU transports is
ipv6-4, for UDP it's ipv4, because autogenerated mcast addr doesn't play
too well with ipv6-4.
So everywhere where nss_myhostname becomes problem, it's just possible
to set totem.ip_version to ipv4-6.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
It's required to create TOTEM logsys subsys before totemip_parse is used
(so before totem_config_read). Logsys is not yet fully initialized, but
it's good enough.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
It didn't work anyway (the config system requires whole links
to be configured at once) and caused crashes.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
COROSYNC_MAIN_CONFIG_FILE environment variable was quite well hidden
and it was never used by init script. It also makes quite hard to debug
possible problems.
Replace it by -c option.
Also patch makes use of configuration file path as a base for uidgid.d
directory, so it's no longer needed to keep uidgid.d in sysconfdir.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
The reason for this change is, that number of corosync CLI options
kind of exploded and scheduler based one are really beter to be kept in
config file.
Nice side-effect of this move is better "integration" with systemd,
because currently used EnvironmentFile should be really used for
environment and not that much for passing extra options to CLI.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Feature depends on existence of libqb function qb_log_file_reopen.
New function call is added into CFG service API. This function is
used by corosync-cfgtool which now accepts -L parameter.
Finally, logrotate "postrotate" script is calling
corosync-cfgtool -L to notify corosync, instead of using
copytruncate option.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Libcgroup is deprecated and not shipping with new distributions
(OpenSuSE is one example). Solution is to have a partial implementation
of required functionality of libcgroup in the corosync code.
Patch uses hardcoded cgroup mount point, because most of the systems are
now systemd and systemd is also using hardcoded mountpoint (see
https://github.com/systemd/systemd/blob/master/src/core/mount-setup.c)
Configuration option --enable-cgroup is gone, because it's not needed
any longer.
Big thanks to Christine Caulfield <ccaulfie@redhat.com> for example of
simplified implementation of cgroup management code primitives.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Token_warning is used to present information about
when the token was last received.
Signed-off-by: Chris Walker <cwalker@cray.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Compiler shows warnings about possible not large enough buffer, so check
snprintf return value properly.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
This shrinks the srp_addr (and consequently every packet sent by
corosync) so that instead of containing loads of IP addresses to
identify a node, it just sends the nodeid.
This then allows us to make ring0 optional and replaceable when running
knet.
It also means that we need some other way of identifying the local
node in corosync.conf, so the nodelist.node.name entry is now mandatory
and is mapped to the local host using the same algorithm as used in
cman.
This code needs LOTS of testing as it touches a huge amount of totemsrp
and totemconfig.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
This enables starting the daemon directly in the service file, because
dependent units won't be started until initialization is complete.
Signed-off-by: Ferenc Wágner <wferi@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
libqb seems funny about logging things before its fully configured.
This corosync commit didn't help either:
8b6bd86a55
So to make sure that messages about the config file not being opened
get delivered to the user/syslog we send them directly.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
When corosync is started in environment where it ends in cgroup without
properly set rt_runtime_us it's impossible to get RT priority.
Already implemented workaround is to use higher non-RT priority.
This patch implements another solution. It moves corosync into root cpu
cgroup. Root cpu cgroup hopefully has enough RT budget.
Another solution was mentioned on ML
https://lists.freedesktop.org/archives/systemd-devel/2017-July/039353.html
but this means to generate some "random" values.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
(cherry picked from commit c56086c701)
Icmap is factored out so it's possible to add other
maps for cmap. API call to switch maps from application
end is added.
Corosync-cmapctl is enhanced with -m option.
Stats contains all statistics previously found in runtime.connections,
runtime.services and runtime.totem prefixes together with new knet
related. All stats are read only.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
If the library sent an invalid (ie too high) message ID to
corosync, then it could cause the daemon to crash.
Now we check the message ID before indexing the function array
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Option -P takes numeric value with same meaning
as nice or values min / max, meaning maximal / minimal priority (so
minimal / maximal nice value).
Scheduler / priority setting is moved in code so it is now executed
after logsys is configured so errors are logged.
Setting maximal priority is also used as fallback when realtime
scheduling is requested and sched_setscheduler fails.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
(cherry picked from commit a008448efb)
Man page of mlockall is clear:
Memory locks are not inherited by a child created via fork(2) and are
automatically removed (unlocked) during an execve(2) or when the
process terminates.
So calling mlockall before corosync_tty_detach is noop when corosync is
executed as a daemon (corosync -f was not affected).
This regression is caused by ed7d054e55
(setprio for logsys/qblog was correct, mlockall was not).
Solution is to move corosync_mlockall call on correct place.
Signed-off-by: Andrew Price <anprice@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
In a two-node cluster, I 've one node configured with open-vswtich:
5: br-fixed: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
state UNKNOWN group default
inet 192.168.124.88/24 scope global br-fixed
inet 192.168.124.87/24 scope global secondary br-fixed
inet 192.168.124.83/24 brd 192.168.124.255 scope global secondary
tentative br-fixed
inet 192.168.124.89/24 scope global secondary br-fixed
while I use 192.168.124.83 in node list of corosync.conf with udpu, and
the bind_addr is 192.168.124.0. After upgrading corosync on this node,
the it uses 192.168.124.88 instead of 192.168.124.83. As we can see:
corosync-cfgtool -s
Printing ring status.
Local node ID 1084783704
corosync-quorumtool -s
Membership information:
Nodeid Votes Name
1084783697 1 d52-54-77-77-01-02
1084783699 1 d52-54-77-77-01-01 (local)
while the other node can only see itself:
corosync-cfgtool -s
Printing ring status.
Local node ID 1084783697
RING ID 0
id = 192.168.124.81
status = ring 0 active with no faults
corosync-quorumtool -s
Membership information:
Nodeid Votes Name
1084783697 1 d52-54-77-77-01-02.virtual.cloud.suse.de (local)
this patch will check if there are both nodelist and bindnetaddr and if
so, display warning and use nodelist information.
Signed-off-by: Bin Liu <bliu@suse.com>
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
segv should be handled by corosync, libqb is not the
place to be handling emergency signals.
This currently requires the head of libqb git tree to
generate a blackbox & coredump in the event of a segfault,
but it's better than the write() spin that currently happens.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
LibQB adds default "*" syslog filter so we have to set syslog_priority
as low as possible so filters applied later in
_logsys_config_apply_per_file takes effect.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
This is a big update that removes RRP & MRP from the codebase
and makes knet the default transport for corosync. UDP & UDPU
are still (currently) supported but are deprecated. Also crypto
and mutiple interfaces are only supported over knet.
To compile this codebase you will need to install libknet from
https://github.com/fabbione/kronosnet
The corosync.conf(5) man page has been updated with info on the new
options. Older config files should still work but many options
have changed because of the knet implementation so configs should
be checked carefully. In particular any cluster using using RRP
over UDP or UDPU will not start as RRP is no longer present. If you
need multiple interface support then you should be using the knet transport.
Knet brings many benefits to the corosync codebase, it provides support
for more interfaces than RRP (up to 8), will be more reliable in the event
of network outages and allows dynamic reconfiguration of interfaces.
It also fixes the ifup/ifdown and 127.0.0.1 binding problems that have
plagued corosync/openais from day 1
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Uidgid entries parsed from configuration files now has prefix
(uidgid.config.) so they are distinguishable from dynamically added
entries. Entries added from config file are pruned on reload if no
longer exists in config file (dynamic one stays unaffected). Also whole
uidgid.config. prefix is made read only.
This make PCMK work again after configuration reload is called.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>