mirror of
https://git.proxmox.com/git/pve-docs
synced 2025-04-29 03:26:01 +00:00
import ha-manager docs
just copied what we have so far
This commit is contained in:
parent
3f13c1c31b
commit
22653ac84b
4
Makefile
4
Makefile
@ -6,6 +6,7 @@ VZDUMP_SOURCES=attributes.txt vzdump.adoc vzdump.1-synopsis.adoc
|
||||
PVEFW_SOURCES=attributes.txt pve-firewall.adoc pve-firewall.8-synopsis.adoc
|
||||
QM_SOURCES=attributes.txt qm.adoc qm.1-synopsis.adoc
|
||||
PCT_SOURCES=attributes.txt pct.adoc pct.1-synopsis.adoc
|
||||
HA_SOURCES=attributes.txt ha-manager.1-synopsis.adoc ha-manager.adoc
|
||||
|
||||
SYSADMIN_SOURCES= \
|
||||
getting-help.adoc \
|
||||
@ -26,6 +27,7 @@ PVE_ADMIN_GUIDE_SOURCES= \
|
||||
${PVEUM_SOURCES} \
|
||||
${PVESM_SOURCES} \
|
||||
${VZDUMP_SOURCES} \
|
||||
${HA_SOURCES} \
|
||||
images/cluster-nwdiag.svg \
|
||||
images/node-nwdiag.svg \
|
||||
pve-bibliography.adoc \
|
||||
@ -71,7 +73,7 @@ all: pve-admin-guide.html
|
||||
|
||||
index.html: index.adoc ${PVE_ADMIN_GUIDE_SOURCES}
|
||||
$(MAKE) NOVIEW=1 pve-admin-guide.pdf pve-admin-guide.html pve-admin-guide.epub
|
||||
$(MAKE) NOVIEW=1 qm.1.html pct.1.html pvesm.1.html pveum.1.html vzdump.1.html pve-firewall.8.html
|
||||
$(MAKE) NOVIEW=1 qm.1.html pct.1.html pvesm.1.html pveum.1.html vzdump.1.html pve-firewall.8.html ha-manager.1.html
|
||||
asciidoc -a "date=$(shell date)" -a "revnumber=${RELEASE}" index.adoc
|
||||
$(BROWSER) index.html &
|
||||
|
||||
|
207
ha-manager.adoc
Normal file
207
ha-manager.adoc
Normal file
@ -0,0 +1,207 @@
|
||||
[[chapter-ha-manager]]
|
||||
ifdef::manvolnum[]
|
||||
PVE({manvolnum})
|
||||
================
|
||||
include::attributes.txt[]
|
||||
|
||||
NAME
|
||||
----
|
||||
|
||||
ha-manager - Proxmox VE HA manager command line interface
|
||||
|
||||
SYNOPSYS
|
||||
--------
|
||||
|
||||
include::ha-manager.1-synopsis.adoc[]
|
||||
|
||||
DESCRIPTION
|
||||
-----------
|
||||
endif::manvolnum[]
|
||||
|
||||
ifndef::manvolnum[]
|
||||
High Availability
|
||||
=================
|
||||
include::attributes.txt[]
|
||||
endif::manvolnum[]
|
||||
|
||||
'ha-manager' handles management of user-defined cluster services. This
|
||||
includes handling of user requests including service start, service
|
||||
disable, service relocate, and service restart. The cluster resource
|
||||
manager daemon also handles restarting and relocating services in the
|
||||
event of failures.
|
||||
|
||||
HOW IT WORKS
|
||||
------------
|
||||
|
||||
The local resource manager ('pve-ha-lrm') is started as a daemon on
|
||||
each node at system start and waits until the HA cluster is quorate
|
||||
and locks are working. After initialization, the LRM determines which
|
||||
services are enabled and starts them. Also the watchdog gets
|
||||
initialized.
|
||||
|
||||
The cluster resource manager ('pve-ha-crm') starts on each node and
|
||||
waits there for the manager lock, which can only be held by one node
|
||||
at a time. The node which successfully acquires the manager lock gets
|
||||
promoted to the CRM, it handles cluster wide actions like migrations
|
||||
and failures.
|
||||
|
||||
When an node leaves the cluster quorum, its state changes to unknown.
|
||||
If the current CRM then can secure the failed nodes lock, the services
|
||||
will be 'stolen' and restarted on another node.
|
||||
|
||||
When a cluster member determines that it is no longer in the cluster
|
||||
quorum, the LRM waits for a new quorum to form. As long as there is no
|
||||
quorum the node cannot reset the watchdog. This will trigger a reboot
|
||||
after 60 seconds.
|
||||
|
||||
CONFIGURATION
|
||||
-------------
|
||||
|
||||
The HA stack is well integrated int the Proxmox VE API2. So, for
|
||||
example, HA can be configured via 'ha-manager' or the PVE web
|
||||
interface, which both provide an easy to use tool.
|
||||
|
||||
The resource configuration file can be located at
|
||||
'/etc/pve/ha/resources.cfg' and the group configuration file at
|
||||
'/etc/pve/ha/groups.cfg'. Use the provided tools to make changes,
|
||||
there shouldn't be any need to edit them manually.
|
||||
|
||||
RESOURCES/SERVICES AGENTS
|
||||
-------------------------
|
||||
|
||||
A resource or also called service can be managed by the
|
||||
ha-manager. Currently we support virtual machines and container.
|
||||
|
||||
GROUPS
|
||||
------
|
||||
|
||||
A group is a collection of cluster nodes which a service may be bound to.
|
||||
|
||||
GROUP SETTINGS
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
nodes::
|
||||
|
||||
list of group node members
|
||||
|
||||
restricted::
|
||||
|
||||
resources bound to this group may only run on nodes defined by the
|
||||
group. If no group node member is available the resource will be
|
||||
placed in the stopped state.
|
||||
|
||||
nofailback::
|
||||
|
||||
the resource won't automatically fail back when a more preferred node
|
||||
(re)joins the cluster.
|
||||
|
||||
|
||||
RECOVERY POLICY
|
||||
---------------
|
||||
|
||||
There are two service recover policy settings which can be configured
|
||||
specific for each resource.
|
||||
|
||||
max_restart::
|
||||
|
||||
maximal number of tries to restart an failed service on the actual
|
||||
node. The default is set to one.
|
||||
|
||||
max_relocate::
|
||||
|
||||
maximal number of tries to relocate the service to a different node.
|
||||
A relocate only happens after the max_restart value is exceeded on the
|
||||
actual node. The default is set to one.
|
||||
|
||||
Note that the relocate count state will only reset to zero when the
|
||||
service had at least one successful start. That means if a service is
|
||||
re-enabled without fixing the error only the restart policy gets
|
||||
repeated.
|
||||
|
||||
ERROR RECOVERY
|
||||
--------------
|
||||
|
||||
If after all tries the service state could not be recovered it gets
|
||||
placed in an error state. In this state the service won't get touched
|
||||
by the HA stack anymore. To recover from this state you should follow
|
||||
these steps:
|
||||
|
||||
* bring the resource back into an safe and consistent state (e.g:
|
||||
killing its process)
|
||||
|
||||
* disable the ha resource to place it in an stopped state
|
||||
|
||||
* fix the error which led to this failures
|
||||
|
||||
* *after* you fixed all errors you may enable the service again
|
||||
|
||||
|
||||
SERVICE OPERATIONS
|
||||
------------------
|
||||
|
||||
This are how the basic user-initiated service operations (via
|
||||
'ha-manager') work.
|
||||
|
||||
enable::
|
||||
|
||||
the service will be started by the LRM if not already running.
|
||||
|
||||
disable::
|
||||
|
||||
the service will be stopped by the LRM if running.
|
||||
|
||||
migrate/relocate::
|
||||
|
||||
the service will be relocated (live) to another node.
|
||||
|
||||
remove::
|
||||
|
||||
the service will be removed from the HA managed resource list. Its
|
||||
current state will not be touched.
|
||||
|
||||
start/stop::
|
||||
|
||||
start and stop commands can be issued to the resource specific tools
|
||||
(like 'qm' or 'pct'), they will forward the request to the
|
||||
'ha-manager' which then will execute the action and set the resulting
|
||||
service state (enabled, disabled).
|
||||
|
||||
|
||||
SERVICE STATES
|
||||
--------------
|
||||
|
||||
stopped::
|
||||
|
||||
Service is stopped (confirmed by LRM)
|
||||
|
||||
request_stop::
|
||||
|
||||
Service should be stopped. Waiting for confirmation from LRM.
|
||||
|
||||
started::
|
||||
|
||||
Service is active an LRM should start it ASAP if not already running.
|
||||
|
||||
fence::
|
||||
|
||||
Wait for node fencing (service node is not inside quorate cluster
|
||||
partition).
|
||||
|
||||
freeze::
|
||||
|
||||
Do not touch the service state. We use this state while we reboot a
|
||||
node, or when we restart the LRM daemon.
|
||||
|
||||
migrate::
|
||||
|
||||
Migrate service (live) to other node.
|
||||
|
||||
error::
|
||||
|
||||
Service disabled because of LRM errors. Needs manual intervention.
|
||||
|
||||
|
||||
ifdef::manvolnum[]
|
||||
include::pve-copyright.adoc[]
|
||||
endif::manvolnum[]
|
||||
|
@ -26,6 +26,7 @@ include::attributes.txt[]
|
||||
| pvesm | link:pvesm.1.html[pvesm.1]
|
||||
| pveum | link:pveum.1.html[pveum.1]
|
||||
| vzdump | link:vzdump.1.html[vzdump.1]
|
||||
| ha-manager | link:ha-manager.1.html[ha-manager.1]
|
||||
| pve-firewall | link:pve-firewall.8.html[pve-firewall.8]
|
||||
|===========================================================
|
||||
|
||||
|
@ -26,6 +26,8 @@ include::pveum.adoc[]
|
||||
|
||||
include::pct.adoc[]
|
||||
|
||||
include::ha-manager.adoc[]
|
||||
|
||||
include::vzdump.adoc[]
|
||||
|
||||
// Return to normal title levels.
|
||||
@ -85,6 +87,14 @@ include::pve-firewall.8-synopsis.adoc[]
|
||||
|
||||
:leveloffset: 0
|
||||
|
||||
*ha-manager* - Proxmox VE HA manager
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
:leveloffset: 1
|
||||
include::ha-manager.1-synopsis.adoc[]
|
||||
|
||||
:leveloffset: 0
|
||||
|
||||
include::pve-bibliography.adoc[]
|
||||
|
||||
:leveloffset: 1
|
||||
|
Loading…
Reference in New Issue
Block a user