mirror of https://git.proxmox.com/git/pve-docs (synced 2025-10-26 16:01:29 +00:00)
			
		
		
		
	ha: add shutdown policy docs
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
		
parent 97d63abc45
commit a4a67cdb74
					
				
							
								
								
									
										106
									
								
								ha-manager.adoc
									
									
									
									
									
								
							
							
						
						
									
									
									
									
									
									
								
							| @ -828,50 +828,90 @@ case, may result in a reset triggered by the watchdog. | |||||||
| Node Maintenance | Node Maintenance | ||||||
| ---------------- | ---------------- | ||||||
| 
 | 
 | ||||||
| It is sometimes possible to shutdown or reboot a node to do | It is sometimes possible to shutdown or reboot a node to do maintenance tasks. | ||||||
| maintenance tasks. Either to replace hardware, or simply to install a | Either to replace hardware, or simply to install a new kernel image. | ||||||
| new kernel image. | This is also true when using the HA stack. The behaviour of the HA stack during | ||||||
|  | a shutdown can be configured. | ||||||
|  | 
 | ||||||
|  | [[ha_manager_shutdown_policy]] | ||||||
|  | Shutdown Policy | ||||||
|  | ~~~~~~~~~~~~~~~ | ||||||
|  | 
 | ||||||
|  | Below you will find a description of the different HA policies for a node | ||||||
|  | shutdown. Currently 'Conditional' is the default due to backward compatibility. | ||||||
|  | Some users may find that 'Migrate' behaves more as expected. | ||||||
|  | 
 | ||||||
|  | Migrate | ||||||
|  | ^^^^^^^ | ||||||
|  | 
 | ||||||
|  | Once the Local Resource Manager (LRM) gets a shutdown request and this policy | ||||||
|  | is enabled, it will mark itself as unavailable for the current HA manager. | ||||||
|  | This triggers a migration of all HA Services currently located on this node. | ||||||
|  | Until all running Services got moved away, the LRM will try to delay the | ||||||
|  | shutdown process. But, this expects that the running services *can* be migrated | ||||||
|  | to another node. In other words, the service must not be locally bound, for | ||||||
|  | example by using hardware passthrough. As non-group member nodes are considered | ||||||
|  | runnable targets if no group member is available, this policy can still be | ||||||
|  | used when making use of group node restrictions. | ||||||
|  | Once the shut down node comes back online again, the previously displaced | ||||||
|  | services will be moved back, if they did not get migrated manually in-between. | ||||||
|  | 
 | ||||||
|  | NOTE: The watchdog is still active during the migration process on shutdown. | ||||||
|  | If the node loses quorum it will be fenced and the services will be recovered. | ||||||
|  | 
 | ||||||
|  | Failover | ||||||
|  | ^^^^^^^^ | ||||||
|  | 
 | ||||||
|  | This mode ensures that all services get stopped, but that they will also be | ||||||
|  | recovered, if the current node is not online soon. It can be useful when doing | ||||||
|  | maintenance on a cluster scale, where live-migrating VMs may not be possible if | ||||||
|  | too many nodes are powered-off at a time, but you still want to ensure HA | ||||||
|  | services get recovered and started again as soon as possible. | ||||||
|  | 
 | ||||||
|  | Freeze | ||||||
|  | ^^^^^^ | ||||||
|  | 
 | ||||||
|  | This mode ensures that all services get stopped and frozen, so that they won't | ||||||
|  | get recovered until the current node is online again. | ||||||
|  | 
 | ||||||
|  | Conditional | ||||||
|  | ^^^^^^^^^^^ | ||||||
|  | 
 | ||||||
|  | .Shutdown | ||||||
|  | 
 | ||||||
|  | A shutdown ('poweroff') is usually done if the node is planned to stay down for | ||||||
|  | some time. The LRM stops all managed services in that case. This means that | ||||||
|  | other nodes will take over those services afterwards. | ||||||
|  | 
 | ||||||
|  | NOTE: Recent hardware has large amounts of memory (RAM). So we stop all | ||||||
|  | resources, then restart them to avoid online migration of all that RAM. If you | ||||||
|  | want to use online migration, you need to invoke that manually before you | ||||||
|  | shutdown the node. | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| Shutdown | .Reboot | ||||||
| ~~~~~~~~ |  | ||||||
| 
 | 
 | ||||||
| A shutdown ('poweroff') is usually done if the node is planned to stay | Node reboots are initiated with the 'reboot' command. This is usually done | ||||||
| down for some time. The LRM stops all managed services in that | after installing a new kernel. Please note that this is different from | ||||||
| case. This means that other nodes will take over those service | ``shutdown'', because the node immediately starts again. | ||||||
| afterwards. |  | ||||||
| 
 | 
 | ||||||
| NOTE: Recent hardware has large amounts of RAM. So we stop all | The LRM tells the CRM that it wants to restart, and waits until the CRM puts | ||||||
| resources, then restart them to avoid online migration of all that | all resources into the `freeze` state (same mechanism is used for | ||||||
| RAM. If you want to use online migration, you need to invoke that | xref:ha_manager_package_updates[Package Updates]). This prevents that those | ||||||
| manually before you shutdown the node. | resources are moved to other nodes. Instead, the CRM starts the resources after | ||||||
| 
 | the reboot on the same node. | ||||||
| 
 |  | ||||||
| Reboot |  | ||||||
| ~~~~~~ |  | ||||||
| 
 |  | ||||||
| Node reboots are initiated with the 'reboot' command. This is usually |  | ||||||
| done after installing a new kernel. Please note that this is different |  | ||||||
| from ``shutdown'', because the node immediately starts again. |  | ||||||
| 
 |  | ||||||
| The LRM tells the CRM that it wants to restart, and waits until the |  | ||||||
| CRM puts all resources into the `freeze` state (same mechanism is used |  | ||||||
| for xref:ha_manager_package_updates[Package Updates]). This prevents |  | ||||||
| that those resources are moved to other nodes. Instead, the CRM start |  | ||||||
| the resources after the reboot on the same node. |  | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| Manual Resource Movement | Manual Resource Movement | ||||||
| ~~~~~~~~~~~~~~~~~~~~~~~~ | ~~~~~~~~~~~~~~~~~~~~~~~~ | ||||||
| 
 | 
 | ||||||
| Last but not least, you can also move resources manually to other | Last but not least, you can also move resources manually to other nodes before | ||||||
| nodes before you shutdown or restart a node. The advantage is that you | you shutdown or restart a node. The advantage is that you have full control, | ||||||
| have full control, and you can decide if you want to use online | and you can decide if you want to use online migration or not. | ||||||
| migration or not. |  | ||||||
| 
 | 
 | ||||||
| NOTE: Please do not 'kill' services like `pve-ha-crm`, `pve-ha-lrm` or | NOTE: Please do not 'kill' services like `pve-ha-crm`, `pve-ha-lrm` or | ||||||
| `watchdog-mux`. They manage and use the watchdog, so this can result | `watchdog-mux`. They manage and use the watchdog, so this can result in an | ||||||
| in a node reboot. | immediate node reboot or even reset. | ||||||
| 
 | 
 | ||||||
| 
 | 
 | ||||||
| ifdef::manvolnum[] | ifdef::manvolnum[] | ||||||
|  | |||||||
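
The four policies added by this commit can be summarised as a small decision table. The following is only a minimal sketch of the behaviour as described in the new text, not the HA stack's actual code: the names `Policy`, `services_action`, and the returned labels are hypothetical and chosen for illustration.

```python
from enum import Enum


class Policy(Enum):
    # Policy names taken from the documentation text in this commit.
    MIGRATE = "migrate"
    FAILOVER = "failover"
    FREEZE = "freeze"
    CONDITIONAL = "conditional"


def services_action(policy: Policy, reboot: bool) -> str:
    """Return a (hypothetical) label for what happens to HA services
    when a node shuts down under the given policy.

    `reboot` is True for a 'reboot' request, False for a 'poweroff'.
    """
    if policy is Policy.MIGRATE:
        # Services are migrated away before the node goes down.
        return "migrate-away"
    if policy is Policy.FAILOVER:
        # Services are stopped and recovered on other nodes
        # if this node does not come back soon.
        return "stop-and-recover"
    if policy is Policy.FREEZE:
        # Services are stopped and frozen until the node is back.
        return "stop-and-freeze"
    # Conditional: the outcome depends on the kind of shutdown.
    return "stop-and-freeze" if reboot else "stop-and-recover"


if __name__ == "__main__":
    for policy in Policy:
        for reboot in (False, True):
            kind = "reboot" if reboot else "poweroff"
            print(f"{policy.value:12s} {kind:9s} -> {services_action(policy, reboot)}")
```

Read this way, 'Conditional' behaves like 'Freeze' on a reboot and like 'Failover' on a poweroff, while 'Migrate' is the only policy that avoids stopping the services.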