Mirror of https://git.proxmox.com/git/pve-docs (synced 2025-08-05 01:37:06 +00:00)
ceph: section language fixup
Mostly fixes minor issues and makes it more in line with our writing guide.
Some sections were reworded for better readability.

Signed-off-by: Dylan Whyte <d.whyte@proxmox.com>
This commit is contained in:
parent d26563851c
commit 40e6c80663

pveceph.adoc (413 changed lines)

--- a/pveceph.adoc
+++ b/pveceph.adoc
@@ -25,11 +25,11 @@ endif::manvolnum[]

 [thumbnail="screenshot/gui-ceph-status.png"]

-{pve} unifies your compute and storage systems, i.e. you can use the same
+{pve} unifies your compute and storage systems, that is, you can use the same
 physical nodes within a cluster for both computing (processing VMs and
 containers) and replicated storage. The traditional silos of compute and
 storage resources can be wrapped up into a single hyper-converged appliance.
-Separate storage networks (SANs) and connections via network attached storages
+Separate storage networks (SANs) and connections via network attached storage
 (NAS) disappear. With the integration of Ceph, an open source software-defined
 storage platform, {pve} has the ability to run and manage Ceph storage directly
 on the hypervisor nodes.
@@ -38,27 +38,27 @@ Ceph is a distributed object store and file system designed to provide
 excellent performance, reliability and scalability.

 .Some advantages of Ceph on {pve} are:
-- Easy setup and management with CLI and GUI support
+- Easy setup and management via CLI and GUI
 - Thin provisioning
-- Snapshots support
+- Snapshot support
 - Self healing
 - Scalable to the exabyte level
 - Setup pools with different performance and redundancy characteristics
 - Data is replicated, making it fault tolerant
-- Runs on economical commodity hardware
+- Runs on commodity hardware
 - No need for hardware RAID controllers
 - Open source

-For small to mid sized deployments, it is possible to install a Ceph server for
-RADOS Block Devices (RBD) directly on your {pve} cluster nodes, see
-xref:ceph_rados_block_devices[Ceph RADOS Block Devices (RBD)]. Recent
-hardware has plenty of CPU power and RAM, so running storage services
+For small to medium-sized deployments, it is possible to install a Ceph server for
+RADOS Block Devices (RBD) directly on your {pve} cluster nodes (see
+xref:ceph_rados_block_devices[Ceph RADOS Block Devices (RBD)]). Recent
+hardware has a lot of CPU power and RAM, so running storage services
 and VMs on the same node is possible.

-To simplify management, we provide 'pveceph' - a tool to install and
-manage {ceph} services on {pve} nodes.
+To simplify management, we provide 'pveceph' - a tool for installing and
+managing {ceph} services on {pve} nodes.

-.Ceph consists of a couple of Daemons, for use as a RBD storage:
+.Ceph consists of multiple Daemons, for use as an RBD storage:
 - Ceph Monitor (ceph-mon)
 - Ceph Manager (ceph-mgr)
 - Ceph OSD (ceph-osd; Object Storage Daemon)
@@ -74,22 +74,22 @@ footnote:[Ceph glossary {cephdocs-url}/glossary].
 Precondition
 ------------

-To build a hyper-converged Proxmox + Ceph Cluster there should be at least
+To build a hyper-converged Proxmox + Ceph Cluster, you must use at least
 three (preferably) identical servers for the setup.

 Check also the recommendations from
 {cephdocs-url}/start/hardware-recommendations/[Ceph's website].

 .CPU
-Higher CPU core frequency reduce latency and should be preferred. As a simple
+A high CPU core frequency reduces latency and should be preferred. As a simple
 rule of thumb, you should assign a CPU core (or thread) to each Ceph service to
 provide enough resources for stable and durable Ceph performance.

 .Memory
 Especially in a hyper-converged setup, the memory consumption needs to be
-carefully monitored. In addition to the intended workload from virtual machines
-and containers, Ceph needs enough memory available to provide excellent and
-stable performance.
+carefully monitored. In addition to the predicted memory usage of virtual
+machines and containers, you must also account for having enough memory
+available for Ceph to provide excellent and stable performance.

 As a rule of thumb, for roughly **1 TiB of data, 1 GiB of memory** will be used
 by an OSD. Especially during recovery, rebalancing or backfilling.
@@ -108,64 +108,65 @@ is also an option if there are no 10 GbE switches available.
 The volume of traffic, especially during recovery, will interfere with other
 services on the same network and may even break the {pve} cluster stack.

-Further, estimate your bandwidth needs. While one HDD might not saturate a 1 Gb
-link, multiple HDD OSDs per node can, and modern NVMe SSDs will even saturate
-10 Gbps of bandwidth quickly. Deploying a network capable of even more bandwidth
-will ensure that it isn't your bottleneck and won't be anytime soon, 25, 40 or
-even 100 GBps are possible.
+Furthermore, you should estimate your bandwidth needs. While one HDD might not
+saturate a 1 Gb link, multiple HDD OSDs per node can, and modern NVMe SSDs will
+even saturate 10 Gbps of bandwidth quickly. Deploying a network capable of even
+more bandwidth will ensure that this isn't your bottleneck and won't be anytime
+soon. 25, 40 or even 100 Gbps are possible.

 .Disks
 When planning the size of your Ceph cluster, it is important to take the
-recovery time into consideration. Especially with small clusters, the recovery
+recovery time into consideration. Especially with small clusters, recovery
 might take long. It is recommended that you use SSDs instead of HDDs in small
 setups to reduce recovery time, minimizing the likelihood of a subsequent
 failure event during recovery.

-In general SSDs will provide more IOPs than spinning disks. This fact and the
-higher cost may make a xref:pve_ceph_device_classes[class based] separation of
-pools appealing. Another possibility to speedup OSDs is to use a faster disk
-as journal or DB/**W**rite-**A**head-**L**og device, see
-xref:pve_ceph_osds[creating Ceph OSDs]. If a faster disk is used for multiple
-OSDs, a proper balance between OSD and WAL / DB (or journal) disk must be
-selected, otherwise the faster disk becomes the bottleneck for all linked OSDs.
+In general SSDs will provide more IOPs than spinning disks. With this in mind,
+in addition to the higher cost, it may make sense to implement a
+xref:pve_ceph_device_classes[class based] separation of pools. Another way to
+speed up OSDs is to use a faster disk as a journal or
+DB/**W**rite-**A**head-**L**og device, see xref:pve_ceph_osds[creating Ceph
+OSDs]. If a faster disk is used for multiple OSDs, a proper balance between OSD
+and WAL / DB (or journal) disk must be selected, otherwise the faster disk
+becomes the bottleneck for all linked OSDs.

-Aside from the disk type, Ceph best performs with an even sized and distributed
-amount of disks per node. For example, 4 x 500 GB disks with in each node is
+Aside from the disk type, Ceph performs best with an even sized and distributed
+amount of disks per node. For example, 4 x 500 GB disks within each node is
 better than a mixed setup with a single 1 TB and three 250 GB disk.

-One also need to balance OSD count and single OSD capacity. More capacity
-allows to increase storage density, but it also means that a single OSD
-failure forces ceph to recover more data at once.
+You also need to balance OSD count and single OSD capacity. More capacity
+allows you to increase storage density, but it also means that a single OSD
+failure forces Ceph to recover more data at once.

 .Avoid RAID
 As Ceph handles data object redundancy and multiple parallel writes to disks
 (OSDs) on its own, using a RAID controller normally doesn’t improve
 performance or availability. On the contrary, Ceph is designed to handle whole
-disks on it's own, without any abstraction in between. RAID controller are not
-designed for the Ceph use case and may complicate things and sometimes even
+disks on it's own, without any abstraction in between. RAID controllers are not
+designed for the Ceph workload and may complicate things and sometimes even
 reduce performance, as their write and caching algorithms may interfere with
 the ones from Ceph.

-WARNING: Avoid RAID controller, use host bus adapter (HBA) instead.
+WARNING: Avoid RAID controllers. Use host bus adapter (HBA) instead.

-NOTE: Above recommendations should be seen as a rough guidance for choosing
-hardware. Therefore, it is still essential to adapt it to your specific needs,
-test your setup and monitor health and performance continuously.
+NOTE: The above recommendations should be seen as a rough guidance for choosing
+hardware. Therefore, it is still essential to adapt it to your specific needs.
+You should test your setup and monitor health and performance continuously.

 [[pve_ceph_install_wizard]]
-Initial Ceph installation & configuration
+Initial Ceph Installation & Configuration
 -----------------------------------------

 [thumbnail="screenshot/gui-node-ceph-install.png"]

 With {pve} you have the benefit of an easy to use installation wizard
 for Ceph. Click on one of your cluster nodes and navigate to the Ceph
-section in the menu tree. If Ceph is not already installed you will be
-offered to do so now.
+section in the menu tree. If Ceph is not already installed, you will see a
+prompt offering to do so.

-The wizard is divided into different sections, where each needs to be
-finished successfully in order to use Ceph. After starting the installation
-the wizard will download and install all required packages from {pve}'s ceph
+The wizard is divided into multiple sections, where each needs to
+finish successfully, in order to use Ceph. After starting the installation,
+the wizard will download and install all the required packages from {pve}'s Ceph
 repository.

 After finishing the first step, you will need to create a configuration.
@@ -175,41 +176,41 @@ xref:chapter_pmxcfs[configuration file system (pmxcfs)].

 The configuration step includes the following settings:

-* *Public Network:* You should setup a dedicated network for Ceph, this
-setting is required. Separating your Ceph traffic is highly recommended,
-because it could lead to troubles with other latency dependent services,
-e.g., cluster communication may decrease Ceph's performance, if not done.
+* *Public Network:* You can set up a dedicated network for Ceph. This
+setting is required. Separating your Ceph traffic is highly recommended.
+Otherwise, it could cause trouble with other latency dependent services,
+for example, cluster communication may decrease Ceph's performance.

 [thumbnail="screenshot/gui-node-ceph-install-wizard-step2.png"]

-* *Cluster Network:* As an optional step you can go even further and
+* *Cluster Network:* As an optional step, you can go even further and
 separate the xref:pve_ceph_osds[OSD] replication & heartbeat traffic
 as well. This will relieve the public network and could lead to
-significant performance improvements especially in big clusters.
+significant performance improvements, especially in large clusters.

 You have two more options which are considered advanced and therefore
-should only changed if you are an expert.
+should only changed if you know what you are doing.

-* *Number of replicas*: Defines the how often a object is replicated
+* *Number of replicas*: Defines how often an object is replicated
 * *Minimum replicas*: Defines the minimum number of required replicas
 for I/O to be marked as complete.

-Additionally you need to choose your first monitor node, this is required.
+Additionally, you need to choose your first monitor node. This step is required.

-That's it, you should see a success page as the last step with further
-instructions on how to go on. You are now prepared to start using Ceph,
-even though you will need to create additional xref:pve_ceph_monitors[monitors],
-create some xref:pve_ceph_osds[OSDs] and at least one xref:pve_ceph_pools[pool].
+That's it. You should now see a success page as the last step, with further
+instructions on how to proceed. Your system is now ready to start using Ceph.
+To get started, you will need to create some additional xref:pve_ceph_monitors[monitors],
+xref:pve_ceph_osds[OSDs] and at least one xref:pve_ceph_pools[pool].

-The rest of this chapter will guide you on how to get the most out of
-your {pve} based Ceph setup, this will include aforementioned and
-more like xref:pveceph_fs[CephFS] which is a very handy addition to your
+The rest of this chapter will guide you through getting the most out of
+your {pve} based Ceph setup. This includes the aforementioned tips and
+more, such as xref:pveceph_fs[CephFS], which is a helpful addition to your
 new Ceph cluster.

 [[pve_ceph_install]]
 Installation of Ceph Packages
 -----------------------------
-Use {pve} Ceph installation wizard (recommended) or run the following
+Use the {pve} Ceph installation wizard (recommended) or run the following
 command on each node:

 [source,bash]
@@ -235,10 +236,10 @@ pveceph init --network 10.10.10.0/24
 ----

 This creates an initial configuration at `/etc/pve/ceph.conf` with a
-dedicated network for ceph. That file is automatically distributed to
-all {pve} nodes by using xref:chapter_pmxcfs[pmxcfs]. The command also
-creates a symbolic link from `/etc/ceph/ceph.conf` pointing to that file.
-So you can simply run Ceph commands without the need to specify a
+dedicated network for Ceph. This file is automatically distributed to
+all {pve} nodes, using xref:chapter_pmxcfs[pmxcfs]. The command also
+creates a symbolic link at `/etc/ceph/ceph.conf`, which points to that file.
+Thus, you can simply run Ceph commands without the need to specify a
 configuration file.

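For quick reference alongside the hunk above, a minimal sketch of the CLI flow this section documents; the subnet is illustrative only, and the `pveceph install` step is assumed to be the command the (unchanged) preceding lines refer to.

[source,bash]
----
# install the Ceph packages on every node (the CLI counterpart of the GUI wizard)
pveceph install

# write the initial /etc/pve/ceph.conf with a dedicated Ceph network;
# pmxcfs distributes it and /etc/ceph/ceph.conf is symlinked to it
pveceph init --network 10.10.10.0/24
----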
@@ -247,11 +248,11 @@ Ceph Monitor
 -----------
 The Ceph Monitor (MON)
 footnote:[Ceph Monitor {cephdocs-url}/start/intro/]
-maintains a master copy of the cluster map. For high availability you need to
-have at least 3 monitors. One monitor will already be installed if you
-used the installation wizard. You won't need more than 3 monitors as long
-as your cluster is small to midsize, only really large clusters will
-need more than that.
+maintains a master copy of the cluster map. For high availability, you need at
+least 3 monitors. One monitor will already be installed if you
+used the installation wizard. You won't need more than 3 monitors, as long
+as your cluster is small to medium-sized. Only really large clusters will
+require more than this.


 [[pveceph_create_mon]]
@@ -261,7 +262,7 @@ Create Monitors
 [thumbnail="screenshot/gui-ceph-monitor.png"]

 On each node where you want to place a monitor (three monitors are recommended),
-create it by using the 'Ceph -> Monitor' tab in the GUI or run.
+create one by using the 'Ceph -> Monitor' tab in the GUI or run:


 [source,bash]
@@ -273,11 +274,11 @@ pveceph mon create
 Destroy Monitors
 ~~~~~~~~~~~~~~~~

-To remove a Ceph Monitor via the GUI first select a node in the tree view and
+To remove a Ceph Monitor via the GUI, first select a node in the tree view and
 go to the **Ceph -> Monitor** panel. Select the MON and click the **Destroy**
 button.

-To remove a Ceph Monitor via the CLI first connect to the node on which the MON
+To remove a Ceph Monitor via the CLI, first connect to the node on which the MON
 is running. Then execute the following command:
 [source,bash]
 ----
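As a companion to the monitor hunks above, a short sketch; `pveceph mon create` is taken from the hunk context, while the `mon destroy` subcommand and its `<monid>` argument are assumptions to be checked against the pveceph man page.

[source,bash]
----
# run on each node that should host a monitor (three are recommended)
pveceph mon create

# remove a monitor again; run on the node that hosts it (subcommand assumed)
pveceph mon destroy <monid>
----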
@@ -290,8 +291,9 @@ NOTE: At least three Monitors are needed for quorum.
 [[pve_ceph_manager]]
 Ceph Manager
 ------------
+
 The Manager daemon runs alongside the monitors. It provides an interface to
-monitor the cluster. Since the Ceph luminous release at least one ceph-mgr
+monitor the cluster. Since the release of Ceph luminous, at least one ceph-mgr
 footnote:[Ceph Manager {cephdocs-url}/mgr/] daemon is
 required.

@@ -299,7 +301,8 @@ required.
 Create Manager
 ~~~~~~~~~~~~~~

-Multiple Managers can be installed, but at any time only one Manager is active.
+Multiple Managers can be installed, but only one Manager is active at any given
+time.

 [source,bash]
 ----
@@ -314,25 +317,25 @@ high availability install more then one manager.
 Destroy Manager
 ~~~~~~~~~~~~~~~

-To remove a Ceph Manager via the GUI first select a node in the tree view and
+To remove a Ceph Manager via the GUI, first select a node in the tree view and
 go to the **Ceph -> Monitor** panel. Select the Manager and click the
 **Destroy** button.

-To remove a Ceph Monitor via the CLI first connect to the node on which the
+To remove a Ceph Monitor via the CLI, first connect to the node on which the
 Manager is running. Then execute the following command:
 [source,bash]
 ----
 pveceph mgr destroy
 ----

-NOTE: A Ceph cluster can function without a Manager, but certain functions like
-the cluster status or usage require a running Manager.
+NOTE: While a manager is not a hard-dependency, it is crucial for a Ceph cluster,
+as it handles important features like PG-autoscaling, device health monitoring,
+telemetry and more.

 [[pve_ceph_osds]]
 Ceph OSDs
 ---------
-Ceph **O**bject **S**torage **D**aemons are storing objects for Ceph over the
+Ceph **O**bject **S**torage **D**aemons store objects for Ceph over the
 network. It is recommended to use one OSD per physical disk.

 NOTE: By default an object is 4 MiB in size.
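A small sketch of the manager lifecycle described above; `pveceph mgr destroy` appears in the hunk, while the `mgr create` subcommand is assumed.

[source,bash]
----
# create an additional (standby) manager on a node; only one is active at a time
pveceph mgr create

# remove a manager; run on the node on which it is running
pveceph mgr destroy
----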
@@ -343,7 +346,7 @@ Create OSDs

 [thumbnail="screenshot/gui-ceph-osd-status.png"]

-You can create an OSD either via the {pve} web-interface, or via CLI using
+You can create an OSD either via the {pve} web-interface or via the CLI using
 `pveceph`. For example:

 [source,bash]
@@ -351,12 +354,12 @@ You can create an OSD either via the {pve} web-interface, or via CLI using
 pveceph osd create /dev/sd[X]
 ----

-TIP: We recommend a Ceph cluster with at least three nodes and a at least 12
+TIP: We recommend a Ceph cluster with at least three nodes and at least 12
 OSDs, evenly distributed among the nodes.

-If the disk was in use before (for example, in a ZFS, or as OSD) you need to
-first zap all traces of that usage. To remove the partition table, boot
-sector and any other OSD leftover, you can use the following command:
+If the disk was in use before (for example, for ZFS or as an OSD) you first need
+to zap all traces of that usage. To remove the partition table, boot sector and
+any other OSD leftover, you can use the following command:

 [source,bash]
 ----
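To illustrate the reuse-a-disk note above: a sketch that wipes a previously used disk and then creates an OSD on it. `pveceph osd create` is from the hunk; the `ceph-volume lvm zap --destroy` call is an assumption about the zap command the docs refer to.

[source,bash]
----
# wipe partition table, boot sector and OSD leftovers (destroys all data on the disk!)
ceph-volume lvm zap /dev/sd[X] --destroy

# create the OSD on the now-clean disk
pveceph osd create /dev/sd[X]
----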
@@ -368,7 +371,7 @@ WARNING: The above command will destroy all data on the disk!
 .Ceph Bluestore

 Starting with the Ceph Kraken release, a new Ceph OSD storage type was
-introduced, the so called Bluestore
+introduced called Bluestore
 footnote:[Ceph Bluestore https://ceph.com/community/new-luminous-bluestore/].
 This is the default when creating OSDs since Ceph Luminous.

@@ -388,25 +391,25 @@ not specified separately.
 pveceph osd create /dev/sd[X] -db_dev /dev/sd[Y] -wal_dev /dev/sd[Z]
 ----

-You can directly choose the size for those with the '-db_size' and '-wal_size'
-parameters respectively. If they are not given the following values (in order)
+You can directly choose the size of those with the '-db_size' and '-wal_size'
+parameters respectively. If they are not given, the following values (in order)
 will be used:

-* bluestore_block_{db,wal}_size from ceph configuration...
+* bluestore_block_{db,wal}_size from Ceph configuration...
 ** ... database, section 'osd'
 ** ... database, section 'global'
 ** ... file, section 'osd'
 ** ... file, section 'global'
 * 10% (DB)/1% (WAL) of OSD size

-NOTE: The DB stores BlueStore’s internal metadata and the WAL is BlueStore’s
+NOTE: The DB stores BlueStore’s internal metadata, and the WAL is BlueStore’s
 internal journal or write-ahead log. It is recommended to use a fast SSD or
 NVRAM for better performance.


 .Ceph Filestore

-Before Ceph Luminous, Filestore was used as default storage type for Ceph OSDs.
+Before Ceph Luminous, Filestore was used as the default storage type for Ceph OSDs.
 Starting with Ceph Nautilus, {pve} does not support creating such OSDs with
 'pveceph' anymore. If you still want to create filestore OSDs, use
 'ceph-volume' directly.
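A hedged example of the DB/WAL sizing options named above; the device letters and sizes are placeholders, and the exact unit handling of '-db_size'/'-wal_size' should be verified against the pveceph man page.

[source,bash]
----
# HDD-backed OSD with RocksDB and WAL on faster devices, sizes set explicitly
# (size values are purely illustrative)
pveceph osd create /dev/sd[X] -db_dev /dev/sd[Y] -db_size 64 -wal_dev /dev/sd[Z] -wal_size 8
----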
@@ -420,42 +423,46 @@ ceph-volume lvm create --filestore --data /dev/sd[X] --journal /dev/sd[Y]
 Destroy OSDs
 ~~~~~~~~~~~~

-To remove an OSD via the GUI first select a {PVE} node in the tree view and go
-to the **Ceph -> OSD** panel. Select the OSD to destroy. Next click the **OUT**
-button. Once the OSD status changed from `in` to `out` click the **STOP**
-button. As soon as the status changed from `up` to `down` select **Destroy**
-from the `More` drop-down menu.
+To remove an OSD via the GUI, first select a {PVE} node in the tree view and go
+to the **Ceph -> OSD** panel. Then select the OSD to destroy and click the **OUT**
+button. Once the OSD status has changed from `in` to `out`, click the **STOP**
+button. Finally, after the status has changed from `up` to `down`, select
+**Destroy** from the `More` drop-down menu.

 To remove an OSD via the CLI run the following commands.

 [source,bash]
 ----
 ceph osd out <ID>
 systemctl stop ceph-osd@<ID>.service
 ----

 NOTE: The first command instructs Ceph not to include the OSD in the data
 distribution. The second command stops the OSD service. Until this time, no
 data is lost.

 The following command destroys the OSD. Specify the '-cleanup' option to
 additionally destroy the partition table.

 [source,bash]
 ----
 pveceph osd destroy <ID>
 ----
-WARNING: The above command will destroy data on the disk!
+
+WARNING: The above command will destroy all data on the disk!


 [[pve_ceph_pools]]
 Ceph Pools
 ----------
-A pool is a logical group for storing objects. It holds **P**lacement
-**G**roups (`PG`, `pg_num`), a collection of objects.
+A pool is a logical group for storing objects. It holds a collection of objects,
+known as **P**lacement **G**roups (`PG`, `pg_num`).


 Create and Edit Pools
 ~~~~~~~~~~~~~~~~~~~~~

-You can create pools through command line or on the web-interface on each {pve}
+You can create pools from the command line or the web-interface of any {pve}
 host under **Ceph -> Pools**.

 [thumbnail="screenshot/gui-ceph-pools.png"]
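Putting the CLI removal steps from the hunk above into one sequence; the commands are the ones named in the diff, with `<ID>` standing for the numeric OSD id.

[source,bash]
----
# stop sending data to the OSD, then stop its service
ceph osd out <ID>
systemctl stop ceph-osd@<ID>.service

# destroy the OSD; '-cleanup' additionally removes the partition table
pveceph osd destroy <ID> -cleanup
----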
@@ -465,7 +472,7 @@ replicas** and a **min_size of 2 replicas**, to ensure no data loss occurs if
 any OSD fails.

 WARNING: **Do not set a min_size of 1**. A replicated pool with min_size of 1
-allows I/O on an object when it has only 1 replica which could lead to data
+allows I/O on an object when it has only 1 replica, which could lead to data
 loss, incomplete PGs or unfound objects.

 It is advised that you calculate the PG number based on your setup. You can
@@ -485,8 +492,8 @@ automatically scale the PG count for a pool in the background.
 pveceph pool create <name> --add_storages
 ----

-TIP: If you would like to automatically also get a storage definition for your
-pool, keep the `Add storages' checkbox ticked in the web-interface, or use the
+TIP: If you would also like to automatically define a storage for your
+pool, keep the `Add as Storage' checkbox checked in the web-interface, or use the
 command line option '--add_storages' at pool creation.

 .Base Options
@@ -526,19 +533,21 @@ manual.
 Destroy Pools
 ~~~~~~~~~~~~~

-To destroy a pool via the GUI select a node in the tree view and go to the
+To destroy a pool via the GUI, select a node in the tree view and go to the
 **Ceph -> Pools** panel. Select the pool to destroy and click the **Destroy**
-button. To confirm the destruction of the pool you need to enter the pool name.
+button. To confirm the destruction of the pool, you need to enter the pool name.

 Run the following command to destroy a pool. Specify the '-remove_storages' to
 also remove the associated storage.

 [source,bash]
 ----
 pveceph pool destroy <name>
 ----

-NOTE: Deleting the data of a pool is a background task and can take some time.
-You will notice that the data usage in the cluster is decreasing.
+NOTE: Pool deletion runs in the background and can take some time.
+You will notice the data usage in the cluster decreasing throughout this
+process.


 PG Autoscaler
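The pool lifecycle from the hunks above in one place; both options are named in the diff.

[source,bash]
----
# create a pool and also add a matching {pve} storage definition
pveceph pool create <name> --add_storages

# destroy it again, removing the associated storage definition as well
pveceph pool destroy <name> -remove_storages
----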
@@ -549,6 +558,7 @@ stored in each pool and to choose the appropriate pg_num values automatically.

 You may need to activate the PG autoscaler module before adjustments can take
 effect.
+
 [source,bash]
 ----
 ceph mgr module enable pg_autoscaler
@@ -562,9 +572,9 @@ much from the current value.
 on:: The `pg_num` is adjusted automatically with no need for any manual
 interaction.
 off:: No automatic `pg_num` adjustments are made, and no warning will be issued
-if the PG count is far from optimal.
+if the PG count is not optimal.

-The scaling factor can be adjusted to facilitate future data storage, with the
+The scaling factor can be adjusted to facilitate future data storage with the
 `target_size`, `target_size_ratio` and the `pg_num_min` options.

 WARNING: By default, the autoscaler considers tuning the PG count of a pool if
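A sketch of per-pool autoscaler tuning; `ceph mgr module enable pg_autoscaler` is from the hunk above, while `pg_autoscale_mode` and `target_size_ratio` are standard upstream Ceph pool properties and are assumed here rather than taken from this diff.

[source,bash]
----
# enable the autoscaler module once per cluster
ceph mgr module enable pg_autoscaler

# per-pool: warn only (or 'on'/'off'), and give the autoscaler a growth hint
ceph osd pool set <pool-name> pg_autoscale_mode warn
ceph osd pool set <pool-name> target_size_ratio 0.3
----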
@@ -579,12 +589,13 @@ Nautilus: PG merging and autotuning].
 [[pve_ceph_device_classes]]
 Ceph CRUSH & device classes
 ---------------------------
-The foundation of Ceph is its algorithm, **C**ontrolled **R**eplication
-**U**nder **S**calable **H**ashing
-(CRUSH footnote:[CRUSH https://ceph.com/wp-content/uploads/2016/08/weil-crush-sc06.pdf]).
+The footnote:[CRUSH
+https://ceph.com/wp-content/uploads/2016/08/weil-crush-sc06.pdf] (**C**ontrolled
+**R**eplication **U**nder **S**calable **H**ashing) algorithm is at the
+foundation of Ceph.

-CRUSH calculates where to store to and retrieve data from, this has the
-advantage that no central index service is needed. CRUSH works with a map of
+CRUSH calculates where to store and retrieve data from. This has the
+advantage that no central indexing service is needed. CRUSH works using a map of
 OSDs, buckets (device locations) and rulesets (data replication) for pools.

 NOTE: Further information can be found in the Ceph documentation, under the
@@ -594,8 +605,8 @@ This map can be altered to reflect different replication hierarchies. The object
 replicas can be separated (eg. failure domains), while maintaining the desired
 distribution.

-A common use case is to use different classes of disks for different Ceph pools.
-For this reason, Ceph introduced the device classes with luminous, to
+A common configuration is to use different classes of disks for different Ceph
+pools. For this reason, Ceph introduced device classes with luminous, to
 accommodate the need for easy ruleset generation.

 The device classes can be seen in the 'ceph osd tree' output. These classes
@@ -627,8 +638,8 @@ ID CLASS WEIGHT TYPE NAME
 14 nvme 0.72769 osd.14
 ----

-To let a pool distribute its objects only on a specific device class, you need
-to create a ruleset with the specific class first.
+To instruct a pool to only distribute objects on a specific device class, you
+first need to create a ruleset for the device class:

 [source, bash]
 ----
@@ -650,10 +661,9 @@ Once the rule is in the CRUSH map, you can tell a pool to use the ruleset.
 ceph osd pool set <pool-name> crush_rule <rule-name>
 ----

-TIP: If the pool already contains objects, all of these have to be moved
-accordingly. Depending on your setup this may introduce a big performance hit
-on your cluster. As an alternative, you can create a new pool and move disks
-separately.
+TIP: If the pool already contains objects, these must be moved accordingly.
+Depending on your setup, this may introduce a big performance impact on your
+cluster. As an alternative, you can create a new pool and move disks separately.


 Ceph Client
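For the device-class discussion above, a sketch of binding a pool to one class; `ceph osd pool set ... crush_rule ...` is taken from the hunk, while the `crush rule create-replicated` form is the standard upstream Ceph command and is assumed here.

[source,bash]
----
# rule restricted to one device class; <root> is typically 'default',
# <failure-domain> typically 'host', <class> e.g. 'ssd' or 'nvme'
ceph osd crush rule create-replicated <rule-name> <root> <failure-domain> <class>

# let an existing pool use that rule
ceph osd pool set <pool-name> crush_rule <rule-name>
----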
@@ -661,17 +671,18 @@ Ceph Client

 [thumbnail="screenshot/gui-ceph-log.png"]

-You can then configure {pve} to use such pools to store VM or
-Container images. Simply use the GUI too add a new `RBD` storage (see
-section xref:ceph_rados_block_devices[Ceph RADOS Block Devices (RBD)]).
+Following the setup from the previous sections, you can configure {pve} to use
+such pools to store VM and Container images. Simply use the GUI to add a new
+`RBD` storage (see section xref:ceph_rados_block_devices[Ceph RADOS Block
+Devices (RBD)]).

 You also need to copy the keyring to a predefined location for an external Ceph
 cluster. If Ceph is installed on the Proxmox nodes itself, then this will be
 done automatically.

-NOTE: The file name needs to be `<storage_id> + `.keyring` - `<storage_id>` is
-the expression after 'rbd:' in `/etc/pve/storage.cfg` which is
-`my-ceph-storage` in the following example:
+NOTE: The filename needs to be `<storage_id> + `.keyring`, where `<storage_id>` is
+the expression after 'rbd:' in `/etc/pve/storage.cfg`. In the following example,
+`my-ceph-storage` is the `<storage_id>`:

 [source,bash]
 ----
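A sketch of the keyring step for an external cluster, matching the (truncated) command visible in the next hunk header; `my-ceph-storage` is the example storage ID from the text.

[source,bash]
----
# the target filename must be <storage_id>.keyring, here 'my-ceph-storage'
cp /etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph/my-ceph-storage.keyring
----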
@@ -683,113 +694,115 @@ cp /etc/ceph/ceph.client.admin.keyring /etc/pve/priv/ceph/my-ceph-storage.keyrin
 CephFS
 ------

-Ceph provides also a filesystem running on top of the same object storage as
-RADOS block devices do. A **M**eta**d**ata **S**erver (`MDS`) is used to map
-the RADOS backed objects to files and directories, allowing to provide a
-POSIX-compliant replicated filesystem. This allows one to have a clustered
-highly available shared filesystem in an easy way if ceph is already used. Its
-Metadata Servers guarantee that files get balanced out over the whole Ceph
-cluster, this way even high load will not overload a single host, which can be
-an issue with traditional shared filesystem approaches, like `NFS`, for
-example.
+Ceph also provides a filesystem, which runs on top of the same object storage as
+RADOS block devices do. A **M**eta**d**ata **S**erver (`MDS`) is used to map the
+RADOS backed objects to files and directories, allowing Ceph to provide a
+POSIX-compliant, replicated filesystem. This allows you to easily configure a
+clustered, highly available, shared filesystem. Ceph's Metadata Servers
+guarantee that files are evenly distributed over the entire Ceph cluster. As a
+result, even cases of high load will not overwhelm a single host, which can be
+an issue with traditional shared filesystem approaches, for example `NFS`.

 [thumbnail="screenshot/gui-node-ceph-cephfs-panel.png"]

-{pve} supports both, using an existing xref:storage_cephfs[CephFS as storage]
-to save backups, ISO files or container templates and creating a
-hyper-converged CephFS itself.
+{pve} supports both creating a hyper-converged CephFS and using an existing
+xref:storage_cephfs[CephFS as storage] to save backups, ISO files, and container
+templates.


 [[pveceph_fs_mds]]
 Metadata Server (MDS)
 ~~~~~~~~~~~~~~~~~~~~~

-CephFS needs at least one Metadata Server to be configured and running to be
-able to work. One can simply create one through the {pve} web GUI's `Node ->
-CephFS` panel or on the command line with:
+CephFS needs at least one Metadata Server to be configured and running, in order
+to function. You can create an MDS through the {pve} web GUI's `Node
+-> CephFS` panel or from the command line with:

 ----
 pveceph mds create
 ----

-Multiple metadata servers can be created in a cluster. But with the default
-settings only one can be active at any time. If an MDS, or its node, becomes
+Multiple metadata servers can be created in a cluster, but with the default
+settings, only one can be active at a time. If an MDS or its node becomes
 unresponsive (or crashes), another `standby` MDS will get promoted to `active`.
-One can speed up the hand-over between the active and a standby MDS up by using
-the 'hotstandby' parameter option on create, or if you have already created it
+You can speed up the handover between the active and standby MDS by using
+the 'hotstandby' parameter option on creation, or if you have already created it
 you may set/add:

 ----
 mds standby replay = true
 ----

-in the ceph.conf respective MDS section. With this enabled, this specific MDS
-will always poll the active one, so that it can take over faster as it is in a
-`warm` state. But naturally, the active polling will cause some additional
-performance impact on your system and active `MDS`.
+in the respective MDS section of `/etc/pve/ceph.conf`. With this enabled, the
+specified MDS will remain in a `warm` state, polling the active one, so that it
+can take over faster in case of any issues.
+
+NOTE: This active polling will have an additional performance impact on your
+system and the active `MDS`.

 .Multiple Active MDS

-Since Luminous (12.2.x) you can also have multiple active metadata servers
-running, but this is normally only useful for a high count on parallel clients,
-as else the `MDS` seldom is the bottleneck. If you want to set this up please
-refer to the ceph documentation. footnote:[Configuring multiple active MDS
-daemons {cephdocs-url}/cephfs/multimds/]
+Since Luminous (12.2.x) you can have multiple active metadata servers
+running at once, but this is normally only useful if you have a high amount of
+clients running in parallel. Otherwise the `MDS` is rarely the bottleneck in a
+system. If you want to set this up, please refer to the Ceph documentation.
+footnote:[Configuring multiple active MDS daemons
+{cephdocs-url}/cephfs/multimds/]

 [[pveceph_fs_create]]
 Create CephFS
 ~~~~~~~~~~~~~

-With {pve}'s CephFS integration into you can create a CephFS easily over the
-Web GUI, the CLI or an external API interface. Some prerequisites are required
+With {pve}'s integration of CephFS, you can easily create a CephFS using the
+web interface, CLI or an external API interface. Some prerequisites are required
 for this to work:

 .Prerequisites for a successful CephFS setup:
-- xref:pve_ceph_install[Install Ceph packages], if this was already done some
-time ago you might want to rerun it on an up to date system to ensure that
-also all CephFS related packages get installed.
+- xref:pve_ceph_install[Install Ceph packages] - if this was already done some
+time ago, you may want to rerun it on an up-to-date system to
+ensure that all CephFS related packages get installed.
 - xref:pve_ceph_monitors[Setup Monitors]
 - xref:pve_ceph_monitors[Setup your OSDs]
 - xref:pveceph_fs_mds[Setup at least one MDS]

-After this got all checked and done you can simply create a CephFS through
+After this is complete, you can simply create a CephFS through
 either the Web GUI's `Node -> CephFS` panel or the command line tool `pveceph`,
-for example with:
+for example:

 ----
 pveceph fs create --pg_num 128 --add-storage
 ----

-This creates a CephFS named `'cephfs'' using a pool for its data named
-`'cephfs_data'' with `128` placement groups and a pool for its metadata named
-`'cephfs_metadata'' with one quarter of the data pools placement groups (`32`).
+This creates a CephFS named 'cephfs', using a pool for its data named
+'cephfs_data' with '128' placement groups and a pool for its metadata named
+'cephfs_metadata' with one quarter of the data pool's placement groups (`32`).
 Check the xref:pve_ceph_pools[{pve} managed Ceph pool chapter] or visit the
-Ceph documentation for more information regarding a fitting placement group
+Ceph documentation for more information regarding an appropriate placement group
 number (`pg_num`) for your setup footnoteref:[placement_groups].
-Additionally, the `'--add-storage'' parameter will add the CephFS to the {pve}
+Additionally, the '--add-storage' parameter will add the CephFS to the {pve}
 storage configuration after it has been created successfully.

 Destroy CephFS
 ~~~~~~~~~~~~~~

-WARNING: Destroying a CephFS will render all its data unusable, this cannot be
+WARNING: Destroying a CephFS will render all of its data unusable. This cannot be
 undone!

-If you really want to destroy an existing CephFS you first need to stop, or
-destroy, all metadata servers (`M̀DS`). You can destroy them either over the Web
-GUI or the command line interface, with:
+If you really want to destroy an existing CephFS, you first need to stop or
+destroy all metadata servers (`M̀DS`). You can destroy them either via the web
+interface or via the command line interface, by issuing

 ----
 pveceph mds destroy NAME
 ----
-on each {pve} node hosting a MDS daemon.
+on each {pve} node hosting an MDS daemon.

-Then, you can remove (destroy) CephFS by issuing a:
+Then, you can remove (destroy) the CephFS by issuing

 ----
 ceph fs rm NAME --yes-i-really-mean-it
 ----
-on a single node hosting Ceph. After this you may want to remove the created
+on a single node hosting Ceph. After this, you may want to remove the created
 data and metadata pools, this can be done either over the Web GUI or the CLI
 with:

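The CephFS creation path from the hunk above, condensed into one sketch; all commands appear in the diff, and the pg_num value is the documented example.

[source,bash]
----
# at least one metadata server is required before a CephFS can be created
pveceph mds create

# create the 'cephfs' filesystem, its data/metadata pools and a {pve} storage entry
pveceph fs create --pg_num 128 --add-storage
----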
@@ -804,33 +817,36 @@ Ceph maintenance
 Replace OSDs
 ~~~~~~~~~~~~

-One of the common maintenance tasks in Ceph is to replace a disk of an OSD. If
-a disk is already in a failed state, then you can go ahead and run through the
-steps in xref:pve_ceph_osd_destroy[Destroy OSDs]. Ceph will recreate those
-copies on the remaining OSDs if possible. This rebalancing will start as soon
-as an OSD failure is detected or an OSD was actively stopped.
+One of the most common maintenance tasks in Ceph is to replace the disk of an
+OSD. If a disk is already in a failed state, then you can go ahead and run
+through the steps in xref:pve_ceph_osd_destroy[Destroy OSDs]. Ceph will recreate
+those copies on the remaining OSDs if possible. This rebalancing will start as
+soon as an OSD failure is detected or an OSD was actively stopped.

 NOTE: With the default size/min_size (3/2) of a pool, recovery only starts when
 `size + 1` nodes are available. The reason for this is that the Ceph object
 balancer xref:pve_ceph_device_classes[CRUSH] defaults to a full node as
 `failure domain'.

-To replace a still functioning disk, on the GUI go through the steps in
+To replace a functioning disk from the GUI, go through the steps in
 xref:pve_ceph_osd_destroy[Destroy OSDs]. The only addition is to wait until
 the cluster shows 'HEALTH_OK' before stopping the OSD to destroy it.

-On the command line use the following commands.
+On the command line, use the following commands:

 ----
 ceph osd out osd.<id>
 ----

 You can check with the command below if the OSD can be safely removed.

 ----
 ceph osd safe-to-destroy osd.<id>
 ----

-Once the above check tells you that it is save to remove the OSD, you can
-continue with following commands.
+Once the above check tells you that it is safe to remove the OSD, you can
+continue with the following commands:

 ----
 systemctl stop ceph-osd@<id>.service
 pveceph osd destroy <id>
@@ -841,7 +857,8 @@ in xref:pve_ceph_osd_create[Create OSDs].

 Trim/Discard
 ~~~~~~~~~~~~
-It is a good measure to run 'fstrim' (discard) regularly on VMs or containers.
+
+It is good practice to run 'fstrim' (discard) regularly on VMs and containers.
 This releases data blocks that the filesystem isn’t using anymore. It reduces
 data usage and resource load. Most modern operating systems issue such discard
 commands to their disks regularly. You only need to ensure that the Virtual
@@ -850,6 +867,7 @@ Machines enable the xref:qm_hard_disk_discard[disk discard option].
 [[pveceph_scrub]]
 Scrub & Deep Scrub
 ~~~~~~~~~~~~~~~~~~
+
 Ceph ensures data integrity by 'scrubbing' placement groups. Ceph checks every
 object in a PG for its health. There are two forms of Scrubbing, daily
 cheap metadata checks and weekly deep data checks. The weekly deep scrub reads
@@ -859,15 +877,16 @@ scrubs footnote:[Ceph scrubbing {cephdocs-url}/rados/configuration/osd-config-re
 are executed.


-Ceph monitoring and troubleshooting
+Ceph Monitoring and Troubleshooting
 -----------------------------------
-A good start is to continuously monitor the ceph health from the start of
-initial deployment. Either through the ceph tools itself, but also by accessing
+
+It is important to continuously monitor the health of a Ceph deployment from the
+beginning, either by using the Ceph tools or by accessing
 the status through the {pve} link:api-viewer/index.html[API].

-The following ceph commands below can be used to see if the cluster is healthy
+The following Ceph commands can be used to see if the cluster is healthy
 ('HEALTH_OK'), if there are warnings ('HEALTH_WARN'), or even errors
-('HEALTH_ERR'). If the cluster is in an unhealthy state the status commands
+('HEALTH_ERR'). If the cluster is in an unhealthy state, the status commands
 below will also give you an overview of the current events and actions to take.

 ----
@@ -877,8 +896,8 @@ pve# ceph -s
 pve# ceph -w
 ----

-To get a more detailed view, every ceph service has a log file under
-`/var/log/ceph/` and if there is not enough detail, the log level can be
+To get a more detailed view, every Ceph service has a log file under
+`/var/log/ceph/`. If more detail is required, the log level can be
 adjusted footnote:[Ceph log and debugging {cephdocs-url}/rados/troubleshooting/log-and-debug/].

 You can find more information about troubleshooting
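To go with the monitoring section above, the basic health checks in one block; `ceph -s` and `ceph -w` are from the diff, while `ceph health` and the `pveceph status` wrapper are assumed standard commands.

[source,bash]
----
# one-line health summary, full status, and a live event stream
ceph health
ceph -s
ceph -w

# {pve} wrapper giving a similar overview (subcommand assumed)
pveceph status
----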