Update pvecm documentation for corosync 3

Parts about multicast and RRP have been removed entirely. Instead, a new
section 'Corosync Redundancy' has been added, explaining the concept of links
and link priorities.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>

commit a9e7c3aa23, parent 3254bfddb3
pvecm.adoc (446 lines changed)
@@ -56,13 +56,8 @@ Grouping nodes into a cluster has the following advantages:
 Requirements
 ------------

-* All nodes must be in the same network as `corosync` uses IP Multicast
-to communicate between nodes (also see
-http://www.corosync.org[Corosync Cluster Engine]). Corosync uses UDP
-ports 5404 and 5405 for cluster communication.
-+
-NOTE: Some switches do not support IP multicast by default and must be
-manually enabled first.
+* All nodes must be able to connect to each other via UDP ports 5404 and 5405
+for corosync to work.

 * Date and time have to be synchronized.

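Regarding the port requirement above: once corosync is running, you can verify that it is bound to the expected UDP ports with the standard `ss` tool, for example (a sketch; run on each node, output details vary):

[source,bash]
----
# list listening UDP sockets and keep only those owned by corosync
ss -ulpn | grep corosync
----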
@@ -84,6 +79,11 @@ NOTE: While it's possible for {pve} 4.4 and {pve} 5.0 this is not supported as
 production configuration and should only used temporarily during upgrading the
 whole cluster from one to another major version.

+NOTE: Running a cluster of {pve} 6.x with earlier versions is not possible. The
+cluster protocol (corosync) between {pve} 6.x and earlier versions changed
+fundamentally. The corosync 3 packages for {pve} 5.4 are only intended for the
+upgrade procedure to {pve} 6.0.
+

 Preparing Nodes
 ---------------
@@ -96,10 +96,13 @@ Currently the cluster creation can either be done on the console (login via
 `ssh`) or the API, which we have a GUI implementation for (__Datacenter ->
 Cluster__).

-While it's often common use to reference all other nodenames in `/etc/hosts`
-with their IP this is not strictly necessary for a cluster, which normally uses
-multicast, to work. It maybe useful as you then can connect from one node to
-the other with SSH through the easier to remember node name.
+While it's common to reference all nodenames and their IPs in `/etc/hosts` (or
+make their names resolvable through other means), this is not necessary for a
+cluster to work. It may be useful however, as you can then connect from one node
+to the other with SSH via the easier to remember node name (see also
+xref:pvecm_corosync_addresses[Link Address Types]). Note that we always
+recommend to reference nodes by their IP addresses in the cluster configuration.

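For illustration, matching `/etc/hosts` entries could look like this (node names and addresses are placeholders):

----
10.10.10.1 hp1.example.com hp1
10.10.10.2 hp2.example.com hp2
----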
 [[pvecm_create_cluster]]
 Create the Cluster
@@ -113,10 +116,10 @@ node names.
 hp1# pvecm create CLUSTERNAME
 ----

-CAUTION: The cluster name is used to compute the default multicast address.
-Please use unique cluster names if you run more than one cluster inside your
-network. To avoid human confusion, it is also recommended to choose different
-names even if clusters do not share the cluster network.
+NOTE: It is possible to create multiple clusters in the same physical or logical
+network. Use unique cluster names if you do so. To avoid human confusion, it is
+also recommended to choose different names even if clusters do not share the
+cluster network.

 To check the state of your cluster use:

@@ -124,20 +127,6 @@ To check the state of your cluster use:
 hp1# pvecm status
 ----

-Multiple Clusters In Same Network
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-It is possible to create multiple clusters in the same physical or logical
-network. Each cluster must have a unique name, which is used to generate the
-cluster's multicast group address. As long as no duplicate cluster names are
-configured in one network segment, the different clusters won't interfere with
-each other.
-
-If multiple clusters operate in a single network it may be beneficial to setup
-an IGMP querier and enable IGMP Snooping in said network. This may reduce the
-load of the network significantly because multicast packets are only delivered
-to endpoints of the respective member nodes.
-

 [[pvecm_join_node_to_cluster]]
 Adding Nodes to the Cluster
@@ -150,7 +139,7 @@ Login via `ssh` to the node you want to add.
 ----

 For `IP-ADDRESS-CLUSTER` use the IP or hostname of an existing cluster node.
-An IP address is recommended (see xref:pvecm_corosync_addresses[Ring Address Types]).
+An IP address is recommended (see xref:pvecm_corosync_addresses[Link Address Types]).

 CAUTION: A new node cannot hold any VMs, because you would get
 conflicts about identical VM IDs. Also, all existing configuration in
@@ -158,7 +147,7 @@ conflicts about identical VM IDs. Also, all existing configuration in
 workaround, use `vzdump` to backup and restore to a different VMID after
 adding the node to the cluster.

-To check the state of cluster:
+To check the state of the cluster use:

 ----
 # pvecm status
@@ -173,7 +162,7 @@ Date: Mon Apr 20 12:30:13 2015
 Quorum provider: corosync_votequorum
 Nodes: 4
 Node ID: 0x00000001
-Ring ID: 1928
+Ring ID: 1/8
 Quorate: Yes

 Votequorum information
@@ -217,15 +206,15 @@ Adding Nodes With Separated Cluster Network
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 When adding a node to a cluster with a separated cluster network you need to
-use the 'ringX_addr' parameters to set the nodes address on those networks:
+use the 'link0' parameter to set the nodes address on that network:

 [source,bash]
 ----
-pvecm add IP-ADDRESS-CLUSTER -ring0_addr IP-ADDRESS-RING0
+pvecm add IP-ADDRESS-CLUSTER -link0 LOCAL-IP-ADDRESS-LINK0
 ----

-If you want to use the Redundant Ring Protocol you will also want to pass the
-'ring1_addr' parameter.
+If you want to use the built-in xref:pvecm_redundancy[redundancy] of the
+kronosnet transport layer, also use the 'link1' parameter.

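If the node should join with a redundant second link right away, both parameters can be combined; a sketch reusing the placeholders above plus a hypothetical second address:

[source,bash]
----
pvecm add IP-ADDRESS-CLUSTER -link0 LOCAL-IP-ADDRESS-LINK0 -link1 LOCAL-IP-ADDRESS-LINK1
----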
 Remove a Cluster Node
@@ -283,7 +272,7 @@ Date: Mon Apr 20 12:44:28 2015
 Quorum provider: corosync_votequorum
 Nodes: 3
 Node ID: 0x00000001
-Ring ID: 1992
+Ring ID: 1/8
 Quorate: Yes

 Votequorum information
@@ -302,8 +291,8 @@ Membership information
 0x00000003          1 192.168.15.92
 ----

-If, for whatever reason, you want that this server joins the same
-cluster again, you have to
+If, for whatever reason, you want this server to join the same cluster again,
+you have to

 * reinstall {pve} on it from scratch

@@ -329,14 +318,14 @@ storage with another cluster, as storage locking doesn't work over cluster
 boundary. Further, it may also lead to VMID conflicts.

 Its suggested that you create a new storage where only the node which you want
-to separate has access. This can be an new export on your NFS or a new Ceph
+to separate has access. This can be a new export on your NFS or a new Ceph
 pool, to name a few examples. Its just important that the exact same storage
 does not gets accessed by multiple clusters. After setting this storage up move
 all data from the node and its VMs to it. Then you are ready to separate the
 node from the cluster.

-WARNING: Ensure all shared resources are cleanly separated! You will run into
-conflicts and problems else.
+WARNING: Ensure all shared resources are cleanly separated! Otherwise you will
+run into conflicts and problems.

 First stop the corosync and the pve-cluster services on the node:
 [source,bash]
@@ -400,6 +389,7 @@ the nodes can still connect to each other with public key authentication. This
 should be fixed by removing the respective keys from the
 '/etc/pve/priv/authorized_keys' file.

+
 Quorum
 ------

@@ -419,12 +409,13 @@ if it loses quorum.

 NOTE: {pve} assigns a single vote to each node by default.

+
 Cluster Network
 ---------------

 The cluster network is the core of a cluster. All messages sent over it have to
-be delivered reliable to all nodes in their respective order. In {pve} this
-part is done by corosync, an implementation of a high performance low overhead
+be delivered reliably to all nodes in their respective order. In {pve} this
+part is done by corosync, an implementation of a high performance, low overhead
 high availability development toolkit. It serves our decentralized
 configuration file system (`pmxcfs`).
@@ -432,75 +423,57 @@ configuration file system (`pmxcfs`).
 Network Requirements
 ~~~~~~~~~~~~~~~~~~~~
 This needs a reliable network with latencies under 2 milliseconds (LAN
-performance) to work properly. While corosync can also use unicast for
-communication between nodes its **highly recommended** to have a multicast
-capable network. The network should not be used heavily by other members,
-ideally corosync runs on its own network.
-*never* share it with network where storage communicates too.
+performance) to work properly. The network should not be used heavily by other
+members, ideally corosync runs on its own network. Do not use a shared network
+for corosync and storage (except as a potential low-priority fallback in a
+xref:pvecm_redundancy[redundant] configuration).

-Before setting up a cluster it is good practice to check if the network is fit
-for that purpose.
+Before setting up a cluster, it is good practice to check if the network is fit
+for that purpose. To make sure the nodes can connect to each other on the
+cluster network, you can test the connectivity between them with the `ping`
+tool.

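Such a `ping` test might look like the following (a sketch; 10.10.10.2 stands in for another node's cluster address, and the reported average round-trip time should stay well below the 2 ms requirement):

[source,bash]
----
# 10 probes, quiet mode: only the summary with min/avg/max rtt is printed
ping -c 10 -q 10.10.10.2
----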
-* Ensure that all nodes are in the same subnet. This must only be true for the
-network interfaces used for cluster communication (corosync).
-
-* Ensure all nodes can reach each other over those interfaces, using `ping` is
-enough for a basic test.
-
-* Ensure that multicast works in general and a high package rates. This can be
-done with the `omping` tool. The final "%loss" number should be < 1%.
-+
-[source,bash]
-----
-omping -c 10000 -i 0.001 -F -q NODE1-IP NODE2-IP ...
-----
-
-* Ensure that multicast communication works over an extended period of time.
-This uncovers problems where IGMP snooping is activated on the network but
-no multicast querier is active. This test has a duration of around 10
-minutes.
-+
-[source,bash]
-----
-omping -c 600 -i 1 -q NODE1-IP NODE2-IP ...
-----
-
-Your network is not ready for clustering if any of these test fails. Recheck
-your network configuration. Especially switches are notorious for having
-multicast disabled by default or IGMP snooping enabled with no IGMP querier
-active.
-
-In smaller cluster its also an option to use unicast if you really cannot get
-multicast to work.
+If the {pve} firewall is enabled, ACCEPT rules for corosync will automatically
+be generated - no manual action is required.
+
+NOTE: Corosync used Multicast before version 3.0 (introduced in {pve} 6.0).
+Modern versions rely on https://kronosnet.org/[Kronosnet] for cluster
+communication, which, for now, only supports regular UDP unicast.
+
+CAUTION: You can still enable Multicast or legacy unicast by setting your
+transport to `udp` or `udpu` in your xref:pvecm_edit_corosync_conf[corosync.conf],
+but keep in mind that this will disable all cryptography and redundancy support.
+This is therefore not recommended.

 Separate Cluster Network
 ~~~~~~~~~~~~~~~~~~~~~~~~

-When creating a cluster without any parameters the cluster network is generally
-shared with the Web UI and the VMs and its traffic. Depending on your setup
-even storage traffic may get sent over the same network. Its recommended to
-change that, as corosync is a time critical real time application.
+When creating a cluster without any parameters the corosync cluster network is
+generally shared with the Web UI and the VMs and their traffic. Depending on
+your setup, even storage traffic may get sent over the same network. Its
+recommended to change that, as corosync is a time critical real time
+application.

 Setting Up A New Network
 ^^^^^^^^^^^^^^^^^^^^^^^^

-First you have to setup a new network interface. It should be on a physical
+First you have to set up a new network interface. It should be on a physically
 separate network. Ensure that your network fulfills the
 xref:pvecm_cluster_network_requirements[cluster network requirements].

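A sketch of such an interface definition in `/etc/network/interfaces`, assuming a Debian-style network configuration and a hypothetical NIC named `ens19`:

----
auto ens19
iface ens19 inet static
        address 10.10.10.1/25
----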
 Separate On Cluster Creation
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-This is possible through the 'ring0_addr' and 'bindnet0_addr' parameter of
-the 'pvecm create' command used for creating a new cluster.
+This is possible via the 'linkX' parameters of the 'pvecm create'
+command used for creating a new cluster.

-If you have setup an additional NIC with a static address on 10.10.10.1/25
-and want to send and receive all cluster communication over this interface
+If you have set up an additional NIC with a static address on 10.10.10.1/25,
+and want to send and receive all cluster communication over this interface,
 you would execute:

 [source,bash]
 ----
-pvecm create test --ring0_addr 10.10.10.1 --bindnet0_addr 10.10.10.0
+pvecm create test --link0 10.10.10.1
 ----

 To check if everything is working properly execute:
@@ -509,20 +482,20 @@ To check if everything is working properly execute:
 systemctl status corosync
 ----

-Afterwards, proceed as descripted in the section to
+Afterwards, proceed as described above to
 xref:pvecm_adding_nodes_with_separated_cluster_network[add nodes with a separated cluster network].

 [[pvecm_separate_cluster_net_after_creation]]
 Separate After Cluster Creation
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-You can do this also if you have already created a cluster and want to switch
+You can do this if you have already created a cluster and want to switch
 its communication to another network, without rebuilding the whole cluster.
 This change may lead to short durations of quorum loss in the cluster, as nodes
 have to restart corosync and come up one after the other on the new network.

 Check how to xref:pvecm_edit_corosync_conf[edit the corosync.conf file] first.
-The open it and you should see a file similar to:
+Then, open it and you should see a file similar to:

 ----
 logging {
@@ -560,37 +533,41 @@ quorum {
 }

 totem {
-  cluster_name: thomas-testcluster
+  cluster_name: testcluster
   config_version: 3
-  ip_version: ipv4
+  ip_version: ipv4-6
   secauth: on
   version: 2
   interface {
-    bindnetaddr: 192.168.30.50
-    ringnumber: 0
+    linknumber: 0
   }

 }
 ----

-The first you want to do is add the 'name' properties in the node entries if
-you do not see them already. Those *must* match the node name.
+NOTE: `ringX_addr` actually specifies a corosync *link address*, the name "ring"
+is a remnant of older corosync versions that is kept for backwards
+compatibility.
+
+The first thing you want to do is add the 'name' properties in the node entries
+if you do not see them already. Those *must* match the node name.

-Then replace the address from the 'ring0_addr' properties with the new
-addresses. You may use plain IP addresses or also hostnames here. If you use
+Then replace all addresses from the 'ring0_addr' properties of all nodes with
+the new addresses. You may use plain IP addresses or hostnames here. If you use
 hostnames ensure that they are resolvable from all nodes. (see also
-xref:pvecm_corosync_addresses[Ring Address Types])
+xref:pvecm_corosync_addresses[Link Address Types])

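For orientation, a single node entry in the `nodelist` section would then look similar to this hypothetical example, with 'name' matching the node name and 'ring0_addr' already pointing into the new network:

----
node {
  name: hp1
  nodeid: 1
  quorum_votes: 1
  ring0_addr: 10.10.10.1
}
----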
-In my example I want to switch my cluster communication to the 10.10.10.1/25
-network. So I replace all 'ring0_addr' respectively. I also set the bindnetaddr
-in the totem section of the config to an address of the new network. It can be
-any address from the subnet configured on the new network interface.
+In this example, we want to switch the cluster communication to the
+10.10.10.1/25 network. So we replace all 'ring0_addr' respectively.

-After you increased the 'config_version' property the new configuration file
+NOTE: The exact same procedure can be used to change other 'ringX_addr' values
+as well, although we recommend to not change multiple addresses at once, to make
+it easier to recover if something goes wrong.
+
+After we increase the 'config_version' property, the new configuration file
 should look like:

 ----
 logging {
   debug: off
   to_syslog: yes
@@ -626,26 +603,28 @@ quorum {
 }

 totem {
-  cluster_name: thomas-testcluster
+  cluster_name: testcluster
   config_version: 4
-  ip_version: ipv4
+  ip_version: ipv4-6
   secauth: on
   version: 2
   interface {
-    bindnetaddr: 10.10.10.1
-    ringnumber: 0
+    linknumber: 0
   }

 }
 ----

-Now after a final check whether all changed information is correct we save it
-and see again the xref:pvecm_edit_corosync_conf[edit corosync.conf file] section to
-learn how to bring it in effect.
+Then, after a final check if all changed information is correct, we save it and
+once again follow the xref:pvecm_edit_corosync_conf[edit corosync.conf file]
+section to bring it into effect.

-As our change cannot be enforced live from corosync we have to do an restart.
+The changes will be applied live, so restarting corosync is not strictly
+necessary. If you changed other settings as well, or notice corosync
+complaining, you can optionally trigger a restart.

 On a single node execute:

 [source,bash]
 ----
 systemctl restart corosync
@@ -665,7 +644,8 @@ They will then join the cluster membership one by one on the new network.
 Corosync addresses
 ~~~~~~~~~~~~~~~~~~

-A corosync link or ring address can be specified in two ways:
+A corosync link address (for backwards compatibility denoted by 'ringX_addr' in
+`corosync.conf`) can be specified in two ways:

 * **IPv4/v6 addresses** will be used directly. They are recommended, since they
 are static and usually not changed carelessly.
@@ -691,104 +671,132 @@ Nodes that joined the cluster on earlier versions likely still use their
 unresolved hostname in `corosync.conf`. It might be a good idea to replace
 them with IPs or a seperate hostname, as mentioned above.

-[[pvecm_rrp]]
-Redundant Ring Protocol
-~~~~~~~~~~~~~~~~~~~~~~~
-To avoid a single point of failure you should implement counter measurements.
-This can be on the hardware and operating system level through network bonding.
-
-Corosync itself offers also a possibility to add redundancy through the so
-called 'Redundant Ring Protocol'. This protocol allows running a second totem
-ring on another network, this network should be physically separated from the
-other rings network to actually increase availability.
-
-RRP On Cluster Creation
-~~~~~~~~~~~~~~~~~~~~~~~
-
-The 'pvecm create' command provides the additional parameters 'bindnetX_addr',
-'ringX_addr' and 'rrp_mode', can be used for RRP configuration.
+[[pvecm_redundancy]]
+Corosync Redundancy
+-------------------
+
+Corosync supports redundant networking via its integrated kronosnet layer by
+default (it is not supported on the legacy udp/udpu transports). It can be
+enabled by specifying more than one link address, either via the '--linkX'
+parameters of `pvecm` (while creating a cluster or adding a new node) or by
+specifying more than one 'ringX_addr' in `corosync.conf`.
+
+NOTE: To provide useful failover, every link should be on its own
+physical network connection.

-NOTE: See the xref:pvecm_corosync_conf_glossary[glossary] if you do not know what each parameter means.
-
-So if you have two networks, one on the 10.10.10.1/24 and the other on the
-10.10.20.1/24 subnet you would execute:
-
-[source,bash]
-----
-pvecm create CLUSTERNAME -bindnet0_addr 10.10.10.1 -ring0_addr 10.10.10.1 \
--bindnet1_addr 10.10.20.1 -ring1_addr 10.10.20.1
-----
-
-RRP On Existing Clusters
-~~~~~~~~~~~~~~~~~~~~~~~~
-
-You will take similar steps as described in
-xref:pvecm_separate_cluster_net_after_creation[separating the cluster network] to
-enable RRP on an already running cluster. The single difference is, that you
-will add `ring1` and use it instead of `ring0`.
-
-First add a new `interface` subsection in the `totem` section, set its
-`ringnumber` property to `1`. Set the interfaces `bindnetaddr` property to an
-address of the subnet you have configured for your new ring.
-Further set the `rrp_mode` to `passive`, this is the only stable mode.
-
-Then add to each node entry in the `nodelist` section its new `ring1_addr`
-property with the nodes additional ring address.
-
-So if you have two networks, one on the 10.10.10.1/24 and the other on the
-10.10.20.1/24 subnet, the final configuration file should look like:
-
-----
-totem {
-  cluster_name: tweak
-  config_version: 9
-  ip_version: ipv4
-  rrp_mode: passive
-  secauth: on
-  version: 2
-  interface {
-    bindnetaddr: 10.10.10.1
-    ringnumber: 0
-  }
-  interface {
-    bindnetaddr: 10.10.20.1
-    ringnumber: 1
-  }
-}
-
-nodelist {
-  node {
-    name: pvecm1
-    nodeid: 1
-    quorum_votes: 1
-    ring0_addr: 10.10.10.1
-    ring1_addr: 10.10.20.1
-  }
-
-  node {
-    name: pvecm2
-    nodeid: 2
-    quorum_votes: 1
-    ring0_addr: 10.10.10.2
-    ring1_addr: 10.10.20.2
-  }
-
-  [...] # other cluster nodes here
-}
-
-[...] # other remaining config sections here
-----
+Links are used according to a priority setting. You can configure this priority
+by setting 'knet_link_priority' in the corresponding interface section in
+`corosync.conf`, or, preferrably, using the 'priority' parameter when creating
+your cluster with `pvecm`:
+
+----
+ # pvecm create CLUSTERNAME --link0 10.10.10.1,priority=15 --link1 10.20.20.1,priority=20
+----
+
+This would cause 'link1' to be used first, since it has the higher priority.
+
+If no priorities are configured manually (or two links have the same priority),
+links will be used in order of their number, with the lower number having higher
+priority.
+
+Even if all links are working, only the one with the highest priority will see
+corosync traffic. Link priorities cannot be mixed, i.e. links with different
+priorities will not be able to communicate with each other.
+
+Since lower priority links will not see traffic unless all higher priorities
+have failed, it becomes a useful strategy to specify even networks used for
+other tasks (VMs, storage, etc...) as low-priority links. If worst comes to
+worst, a higher-latency or more congested connection might be better than no
+connection at all.

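For the `corosync.conf` route mentioned above, a hypothetical interface section carrying an explicit priority could look like this:

----
totem {
  interface {
    linknumber: 1
    knet_link_priority: 20
  }
}
----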
+Adding Redundant Links To An Existing Cluster
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+To add a new link to a running configuration, first check how to
+xref:pvecm_edit_corosync_conf[edit the corosync.conf file].
+
+Then, add a new 'ringX_addr' to every node in the `nodelist` section. Make
+sure that your 'X' is the same for every node you add it to, and that it is
+unique for each node.
+
+Lastly, add a new 'interface', as shown below, to your `totem`
+section, replacing 'X' with your link number chosen above.
+
+Assuming you added a link with number 1, the new configuration file could look
+like this:
+
+----
+logging {
+  debug: off
+  to_syslog: yes
+}
+
+nodelist {
+
+  node {
+    name: due
+    nodeid: 2
+    quorum_votes: 1
+    ring0_addr: 10.10.10.2
+    ring1_addr: 10.20.20.2
+  }
+
+  node {
+    name: tre
+    nodeid: 3
+    quorum_votes: 1
+    ring0_addr: 10.10.10.3
+    ring1_addr: 10.20.20.3
+  }
+
+  node {
+    name: uno
+    nodeid: 1
+    quorum_votes: 1
+    ring0_addr: 10.10.10.1
+    ring1_addr: 10.20.20.1
+  }
+
+}
+
+quorum {
+  provider: corosync_votequorum
+}
+
+totem {
+  cluster_name: testcluster
+  config_version: 4
+  ip_version: ipv4-6
+  secauth: on
+  version: 2
+  interface {
+    linknumber: 0
+  }
+  interface {
+    linknumber: 1
+  }
+}
+----

-Bring it in effect like described in the
-xref:pvecm_edit_corosync_conf[edit the corosync.conf file] section.
-
-This is a change which cannot take live in effect and needs at least a restart
-of corosync. Recommended is a restart of the whole cluster.
+The new link will be enabled as soon as you follow the last steps to
+xref:pvecm_edit_corosync_conf[edit the corosync.conf file]. A restart should not
+be necessary. You can check that corosync loaded the new link using:
+
+----
+journalctl -b -u corosync
+----
+
+It might be a good idea to test the new link by temporarily disconnecting the
+old link on one node and making sure that its status remains online while
+disconnected:
+
+----
+pvecm status
+----
+
+If you see a healthy cluster state, it means that your new link is being used.

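The state of the individual links can also be inspected with the `corosync-cfgtool` utility shipped with corosync, for example:

[source,bash]
----
# print the status of all configured links on this node
corosync-cfgtool -s
----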
-If you cannot reboot the whole cluster ensure no High Availability services are
-configured and the stop the corosync service on all nodes. After corosync is
-stopped on all nodes start it one after the other again.

 Corosync External Vote Support
 ------------------------------
@@ -832,10 +840,8 @@ for Debian based hosts, other Linux distributions should also have a package
 available through their respective package manager.

 NOTE: In contrast to corosync itself, a QDevice connects to the cluster over
-TCP/IP and thus does not need a multicast capable network between itself and
-the cluster. In fact the daemon may run outside of the LAN and can have
-longer latencies than 2 ms.
+TCP/IP. The daemon may even run outside of the clusters LAN and can have longer
+latencies than 2 ms.

 Supported Setups
 ~~~~~~~~~~~~~~~~
@@ -871,7 +877,6 @@ There are two drawbacks with this:
 If you understand the drawbacks and implications you can decide yourself if
 you should use this technology in an odd numbered cluster setup.

-
 QDevice-Net Setup
 ~~~~~~~~~~~~~~~~~

@@ -923,7 +928,6 @@ Membership information

 which means the QDevice is set up.

-
 Frequently Asked Questions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -961,15 +965,15 @@ pve# pvecm qdevice remove

 //Still TODO
 //^^^^^^^^^^
-//There ist still stuff to add here
+//There is still stuff to add here


 Corosync Configuration
 ----------------------

-The `/etc/pve/corosync.conf` file plays a central role in {pve} cluster. It
-controls the cluster member ship and its network.
-For reading more about it check the corosync.conf man page:
+The `/etc/pve/corosync.conf` file plays a central role in a {pve} cluster. It
+controls the cluster membership and its network.
+For further information about it, check the corosync.conf man page:
 [source,bash]
 ----
 man corosync.conf
@@ -983,23 +987,23 @@ Here are a few best practice tips for doing this.
 Edit corosync.conf
 ~~~~~~~~~~~~~~~~~~

-Editing the corosync.conf file can be not always straight forward. There are
-two on each cluster, one in `/etc/pve/corosync.conf` and the other in
+Editing the corosync.conf file is not always very straightforward. There are
+two on each cluster node, one in `/etc/pve/corosync.conf` and the other in
 `/etc/corosync/corosync.conf`. Editing the one in our cluster file system will
 propagate the changes to the local one, but not vice versa.

 The configuration will get updated automatically as soon as the file changes.
 This means changes which can be integrated in a running corosync will take
-instantly effect. So you should always make a copy and edit that instead, to
-avoid triggering some unwanted changes by an in between safe.
+effect immediately. So you should always make a copy and edit that instead, to
+avoid triggering some unwanted changes by an in-between save.

 [source,bash]
 ----
 cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
 ----

-Then open the Config file with your favorite editor, `nano` and `vim.tiny` are
-preinstalled on {pve} for example.
+Then open the config file with your favorite editor, `nano` and `vim.tiny` are
+preinstalled on any {pve} node for example.

 NOTE: Always increment the 'config_version' number on configuration changes,
 omitting this can lead to problems.
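Put together, one possible editing workflow is sketched below (`nano` is just an example editor; moving the edited copy back over `/etc/pve/corosync.conf` is what activates the change):

[source,bash]
----
cp /etc/pve/corosync.conf /etc/pve/corosync.conf.new
nano /etc/pve/corosync.conf.new    # edit, and remember to bump 'config_version'
mv /etc/pve/corosync.conf.new /etc/pve/corosync.conf
----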
@@ -1026,7 +1030,7 @@ systemctl status corosync
 journalctl -b -u corosync
 ----

-If the change could applied automatically. If not you may have to restart the
+If the change could be applied automatically. If not you may have to restart the
 corosync service via:
 [source,bash]
 ----
@@ -1054,7 +1058,6 @@ corosync[1647]: [SERV ] Service engine 'corosync_quorum' failed to load for re
 It means that the hostname you set for corosync 'ringX_addr' in the
 configuration could not be resolved.

-
 Write Configuration When Not Quorate
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -1080,19 +1083,8 @@ Corosync Configuration Glossary
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 ringX_addr::
-This names the different ring addresses for the corosync totem rings used for
-the cluster communication.
-
-bindnetaddr::
-Defines to which interface the ring should bind to. It may be any address of
-the subnet configured on the interface we want to use. In general its the
-recommended to just use an address a node uses on this interface.
-
-rrp_mode::
-Specifies the mode of the redundant ring protocol and may be passive, active or
-none. Note that use of active is highly experimental and not official
-supported. Passive is the preferred mode, it may double the cluster
-communication throughput and increases availability.
+This names the different link addresses for the kronosnet connections between
+nodes.


 Cluster Cold Start
@@ -1127,10 +1119,10 @@ It makes a difference if a Guest is online or offline, or if it has
 local resources (like a local disk).

 For Details about Virtual Machine Migration see the
-xref:qm_migration[QEMU/KVM Migration Chapter]
+xref:qm_migration[QEMU/KVM Migration Chapter].

 For Details about Container Migration see the
-xref:pct_migration[Container Migration Chapter]
+xref:pct_migration[Container Migration Chapter].

 Migration Type
 ~~~~~~~~~~~~~~
@@ -1155,7 +1147,6 @@ modern systems is lower because they implement AES encryption in
 hardware. The performance impact is particularly evident in fast
 networks where you can transfer 10 Gbps or more.

-
 Migration Network
 ~~~~~~~~~~~~~~~~~

@@ -1175,7 +1166,6 @@ destination node from the network specified in the CIDR form. To
 enable this, the network must be specified so that each node has one,
 but only one IP in the respective network.

-
 Example
 ^^^^^^^
|
Loading…
Reference in New Issue
Block a user