Libcgroup is deprecated and not shipping with new distributions
(OpenSuSE is one example). Solution is to have a partial implementation
of required functionality of libcgroup in the corosync code.
Patch uses hardcoded cgroup mount point, because most of the systems are
now systemd and systemd is also using hardcoded mountpoint (see
https://github.com/systemd/systemd/blob/master/src/core/mount-setup.c)
Configuration option --enable-cgroup is gone, because it's not needed
any longer.
Big thanks to Christine Caulfield <ccaulfie@redhat.com> for example of
simplified implementation of cgroup management code primitives.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Token_warning is used to present information about
when the token was last received.
Signed-off-by: Chris Walker <cwalker@cray.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Whilst working on the Reproducible Builds effort [0], we noticed
that corosync could not be built reproducibly.
This is because, whilst it uses SOURCE_DATE_EPOCH[1], the output
varies depending on the current timezone.
(The LC_ALL is not needed as we only use %Y-%m-%d)
This was originally filed in Debian as #896441.
[0] https://reproducible-builds.org/
[1] https://reproducible-builds.org/specs/source-date-epoch/
[2] https://bugs.debian.org/896441
Signed-off-by: Chris Lamb <lamby@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
The _overview manpages are not actually commands and hence do not belong
into manpage section 8. Move corosync_overview to section 7
("Miscellaneous") and the other *_overview pages to section 3 as they
contain API documentation (cf. string(3) for precedence of
multi-function manpages).
Signed-off-by: Christoph Berg <myon@debian.org>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
corosync-qdevice and corosync-qnetd now has a new home
https://github.com/corosync/corosync-qdevice
This will allow us to better react on actual needs of quite independent
projects.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
knet ping_timers are auto-configured according to token value.
This patch also fixes some knet config bugs that resulted in defaults
not being applied when values were removed from corosync.conf.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
KNET has options for nss or openssl crpyto libraries, make this
available to corosync.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
knet allows links to have different IP versions - proivided they
all match per link. So don't force them all to be the same.
I've added a check here to make sure that all nodes on the same
link are using the same IP version.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
use the idea from corosync-cmapctl to set ACTION and params in the first
swtich, and add another swtich to call function based on ACTION and the
params.
Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
It was discovered that pacemaker has been occassionaly relying on
those items configured in corosync.conf (and documenting so), while
backpropagation got stuck somewhere. As the option is deemed generally
beneficial, rectify this gap now and make it standard,
public part of the configuration space, possibly also for other
client SW to use now.
Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
1. multicast address/port is only for UDP
2. change kronosnet to Kronosnet
3. nodeid must be set with KNET, not UDPU
Signed-off-by: Bin Liu <bliu@suse.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Following the well-used scheme:
- expressly given defaults: italics (underline in standard terminals)
- key cross-references: bold (as well as the originals)
+ fix missing paragraph delimiters
+ s/what/which/ and s/on/one/ where appropriate
Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Previously, some enumerations were hard to follow, as they were marked
up all at once, including punctuation and connectives. Also mark up
some expressly given defaults.
Signed-off-by: Jan Pokorný <jpokorny@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Heuristics are set of commands executed locally on startup, cluster
membership change, successful connect to corosync-qnetd and optionally
also at regular times. When all commands finish successfully
(their return error code is zero) on time, heuristics have passed,
otherwise they have failed. The heuristics result is sent to
corosync-qnetd and there it's used in calculations to determine which
partition should be quorate.
Right know, there are some problems (bugs):
- Regular heuristics is supported only by ffsplit. This is not a
problem for clusters with power fencing, but deployments where
non-quorate partition continues to operate may see this as a problem.
- Qdevice-tool status doesn't contain detailed information about
heuristics.
- Qdevice-tool doesn't have a possibility to trigger heuristics
re-execute.
Thanks Chrissie Caulfield for Englishify the man pages.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
RRP doesn't exist any more so all the ring re-enable code is redundant.
I've removed it from the library and all the code that does anything,
but I've left the hole in the IPC just in case old libraries are
hanging around.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
When corosync is started in environment where it ends in cgroup without
properly set rt_runtime_us it's impossible to get RT priority.
Already implemented workaround is to use higher non-RT priority.
This patch implements another solution. It moves corosync into root cpu
cgroup. Root cpu cgroup hopefully has enough RT budget.
Another solution was mentioned on ML
https://lists.freedesktop.org/archives/systemd-devel/2017-July/039353.html
but this means to generate some "random" values.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
(cherry picked from commit c56086c701)
Icmap is factored out so it's possible to add other
maps for cmap. API call to switch maps from application
end is added.
Corosync-cmapctl is enhanced with -m option.
Stats contains all statistics previously found in runtime.connections,
runtime.services and runtime.totem prefixes together with new knet
related. All stats are read only.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Option -P takes numeric value with same meaning
as nice or values min / max, meaning maximal / minimal priority (so
minimal / maximal nice value).
Scheduler / priority setting is moved in code so it is now executed
after logsys is configured so errors are logged.
Setting maximal priority is also used as fallback when realtime
scheduling is requested and sched_setscheduler fails.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
(cherry picked from commit a008448efb)
Instead of currently read bits, number of already read bits is
displayed to let the user know how long it's needed to "press keys"
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
/dev/urandom is good enough for crypto keys and it's not blocking. If
superb randomness is really needed, it's possible to use newly added
option -r.
Also manpage is reworked a bit to use .nf instead of many .br.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Corosync layers don't need to know the knet MTU size - this way
corosync fragments buffers only when they get larger than the
KNET buffer size (64K) and knet fragments below that based on
the actual MTU and transport considerations.
It is also now possible to configure knet to use UDP or SCTP
transports in corosync.conf. This is currently done per-link
so if you have more than 1 link you need several interface{}
stanzas inside totem{} to make it use other than the default
of UDP. if it's useful I might add the option of a global
default.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Commit 8d8d4a936a introduced the
configuration parameter resources.watchdog_device. This commit
introduces the resources section and watchdog_device parameter in
corosync.conf.5.
Signed-off-by: Adrian Vondendriesch <adrian.vondendriesch@credativ.de>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
This is a big update that removes RRP & MRP from the codebase
and makes knet the default transport for corosync. UDP & UDPU
are still (currently) supported but are deprecated. Also crypto
and mutiple interfaces are only supported over knet.
To compile this codebase you will need to install libknet from
https://github.com/fabbione/kronosnet
The corosync.conf(5) man page has been updated with info on the new
options. Older config files should still work but many options
have changed because of the knet implementation so configs should
be checked carefully. In particular any cluster using using RRP
over UDP or UDPU will not start as RRP is no longer present. If you
need multiple interface support then you should be using the knet transport.
Knet brings many benefits to the corosync codebase, it provides support
for more interfaces than RRP (up to 8), will be more reliable in the event
of network outages and allows dynamic reconfiguration of interfaces.
It also fixes the ifup/ifdown and 127.0.0.1 binding problems that have
plagued corosync/openais from day 1
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Uidgid entries parsed from configuration files now has prefix
(uidgid.config.) so they are distinguishable from dynamically added
entries. Entries added from config file are pruned on reload if no
longer exists in config file (dynamic one stays unaffected). Also whole
uidgid.config. prefix is made read only.
This make PCMK work again after configuration reload is called.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
including mentioning corosync-qdevice(5) on the
votequorum(5) and corosync.conf(5) pages.
Thanks to Jan Pokorný for reporting these.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
This patch tidies the two state change callbacks and explains them
in the man page:
The difference between votequorum_nodelist_notification_t and
votequorum_quorum_notification_t is subtle but important.
The 'nodelist' callback is sent at the start of a cluster state
transition and contains the new ring_id and only the list of
nodes that are included in the sync state - ie only active nodes. No
quorum information is included this callback because it is not
available at that time.
The 'quorum' callback is sent after the cluster state transition has
completed and does contain quorum information.
In addition, the nodelist contains a list of all nodes known to
votequorum (whether up or down) and their state as well
as information about the quorum device attached (if any). quorum
callbacks will not be sent for qdevice up and down
events unless they affect quorum.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
- Changed usefull to useful
Change-Id: I2d7872b21e889202cd2b7752db4c76f18fffa95d
Signed-off-by: Richard B Winters <rik@mmogp.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
- "There are informations" changed to "There is information"
- Other occurrences of informations changed to information
Original patch was created by Richard B Winters <rik@mmogp.com>, so
thanks for it.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
- dont to don't
- overriden to overridden
- informations to information
Change-Id: If6644694d750c30ba9f5f43b4eb852485613d64a
Signed-off-by: Richard B Winters <rik@mmogp.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
"allows to" was updated to read "allows one to"
- With a subject it's grammatically correct.
Change-Id: I9559e31c780e211b651744c6eaa056ce8d4c3db1
Signed-off-by: Richard B Winters <rik@mmogp.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Change-Id: Ib80face38dce0345e649297d16cf8a63c5b0e8c1
Signed-off-by: Richard B Winters <rik@mmogp.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Change-Id: I8c5d6af915203533c80e4eaa574e305a46d74815
Signed-off-by: Richard B Winters <rik@mmogp.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Corrected the spelling of retrieve, where it was spelled as retreive.
- There were two cases of this mispelling; one
upper-case and one lower-case
Change-Id: Ic97fd210d8d3ae7e568e5a2e5d97c6220d2ff628
Signed-off-by: Richard B Winters <rik@mmogp.com>
Reviewed-by: Steven Dake <sdake@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
- add link to cmap_keys(8)
- remove link to cpg_groups_get(3)
- add missing cpg_* and votequorum_qdevice_* functions
- corosync-fplay has already been removed by ab32894
Signed-off-by: Ferenc Wágner <wferi@niif.hu>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Experience with larger production clusters showed that setting RR
priority for corosync is viable for prevent random fencing, ...
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
When using multiple interfaces, it's necessary to use different
multicast address/port pair for each interface to make
rrp work correctly. This is now checked in parser.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
With introduction of token_coefficient, token timeout defined in
configuration file may be no longer reflect real token timeout, what may
be confusing.
Enhanced description hopefully fix that.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Some totem configuration values (like token, consensus, ...) are ether
computed or default value is used. It's hard to find out, what
value is really used.
Solution is to store values in cmap.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
While I was looking at the above man page changes I thought I'd review
the rest of it. So here are some more English fixes for the cmap_keys.8
man page
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Adds a -o<a|i|n> option to corosync-votequorum so that the nodes list
can be sorted by Address, node Id or Name. The default remains IP
address.
Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Although YKD is currently unsupported, untested and decprecated it's
handy for testing things in the quorum module.
This patch allows YKD to actually load without an error. It does not fix
anything else in the service!
Also remove vsftype and its reference to YKD being the preferred and
default provider from the corosync.conf man page,
as that hasn't been true for a considerable time.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
It's possible in a two_node cluster (and others but it's more likely
with just two) that a node could be booted up after downtime or failure
and the other node is not available for some reason. In this case it
would not be allowed to proceed because wait_for_all is enforced.
This patch provides a cmap key to clear this flag in the desperate
situation where that becomes necessary. It should only be used with
extreme caution and will be wrapped up in pcs which should also check
that fencing has been run.
Signed-Off-By: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Configuration option quorum.device.sync_timeout is available for setting
qdevice poll timeout for synchronization phase. Default value is 30
sec.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
This is needed for qdevice to be able to process messages during
synchronization phase.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
If votequorum service receives incorrect (not current) ringid, call is
ignored and CS_ERR_MESSAGE_ERROR is returned.
This and previous commits makes incompatible changes in votequorum
API/ABI, so library version is increased.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Returning ring id will be used in poll function.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Allow it to create keyfile not in the hardcoded location.
Drop root checks.
Minor cosmetic fixes to the man-page.
Signed-off-by: Vladislav Bogdanov <bubble@hoster-ok.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
During reload, local_node_pos is deleted and reinstation is handled in
totemconfig after reload is finished. votequorum handles this events and
tries to reload it's configuration. This led to logging a little scary
messages (even nothing bad is happening, because after local_node_pos
reinstation everything back to normal).
Solution is to stop processing events during reload. Sadly, simple
tracking of config.reload_in_progress doesn't work because LibQB events
triggering order is undefined so votequorum reload handler can be called
before totemconfig (and before local_node_pos is reinstatied).
So new config.totemconfig_reload_in_progress key is defined with very
similar semanthic as config.reload_in_progress but set inside
totem_reload_notify function. Votequorum then use this new key.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
Token coefficient is used only when nodelist is specified and contains
at least 3 nodes. If so, real token timeout is then computed as
token + (number_of_nodes - 2) * token_coefficient. This allows cluster
to scale without manually changing token timeout every time new
node is added. This value can be set to 0 resulting in effective
removal of this feature.
Signed-off-by: Jan Friesse <jfriesse@redhat.com>
Reviewed-by: Christine Caulfield <ccaulfie@redhat.com>
This patch adds more flexibility to the auto_tie_breaker feature of
votequorum. With this, not only can the lowest nodeid be used as
a tie breaker, but also the highest, or a node from a nominated list.
If there is a list of nodes, the first node in the list that was not
part of the previous partition is used. This allows the user to
specify a preferred set of nodes but prevents a split-brain if the
cluster divides evenly with a node in each half.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
Man pages for votequorum_qdevice_update and votquorum_qdevice_master_wins
were missing from the last commit.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Improve the man pages for the votequorum qdevice API and include
them in the build. Also improve the testvotequorum2 test program.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
This patch adds the option to store expected_votes to
persistent storage. This is needed to allow_downscale
to operate properly.
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Add the code to do the actual corosync.conf reload to cfg, along with
a corosync-cfgtool -R command to trigger it
Signed-off-by: Christine Caulfield <ccaulfie@redhat.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>
he markup around an example is impossible to lift to XML or HTML
cleanly. Simplifying it fixes the problem.
Signed-off-by: Eric Raymond <esr@thyrsus.com>
Reviewed-by: Jan Friesse <jfriesse@redhat.com>