Commit Graph

136 Commits

Fabian Ebner
e6f55a13b0 api: ceph: mon: make checking for duplicate addresses more robust
The address in $mon->{addr} might come with a port attached (this affects
monitors created with PVE 5.4, as reported in the community forum [0]), or it
might even be a hostname (according to the code in Ceph/Services.pm), although
the latter shouldn't happen for configurations created by PVE.

[0]: https://forum.proxmox.com/threads/105904/

Fixes: 9e989449 ("api: ceph: mon: fix handling of IPv6 addresses in assert_mon_prerequisites")
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2022-06-08 08:49:34 +02:00
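
A minimal sketch of the kind of normalization this commit describes: strip an optional port and IPv6 brackets from the stored address before comparing, and resolve the rest to its canonical numeric form. The helper name and the sample data are made up for illustration; this is not the actual code in Ceph/Services.pm.

```perl
use strict;
use warnings;
use Socket qw(getaddrinfo getnameinfo NI_NUMERICHOST NIx_NOSERV SOCK_RAW);

# Hypothetical helper: reduce a monitor address to a canonical, comparable form.
sub canonical_mon_addr {
    my ($addr) = @_;

    if ($addr =~ m/^\[([^\]]+)\](?::\d+)?$/) {       # "[fc00::1]:6789" or "[fc00::1]"
        $addr = $1;
    } elsif ($addr =~ m/^([^:]+)(?::\d+)?$/) {       # "10.0.0.1:6789", plain IPv4 or hostname
        $addr = $1;
    }                                                 # bare IPv6 without brackets stays as-is

    # resolving also canonicalizes, e.g. "fc00:0000::1" -> "fc00::1"; hostnames get resolved
    my ($err, @res) = getaddrinfo($addr, undef, { socktype => SOCK_RAW });
    return $addr if $err || !@res;                    # fall back to the stripped string
    my ($nerr, $host) = getnameinfo($res[0]->{addr}, NI_NUMERICHOST, NIx_NOSERV);
    return $nerr ? $addr : $host;
}

# duplicate check over the already configured monitors (example data)
my @existing = ('10.0.0.1:6789', '[fc00::1]:6789', 'ceph-node2');
my %seen = map { canonical_mon_addr($_) => 1 } @existing;
die "monitor address already in use\n" if $seen{ canonical_mon_addr('fc00:0000::1') };
```
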
Thomas Lamprecht
ea9eea012a api: ceph pool: reword ec desc full textwidth and reword slightly
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-05-04 07:14:56 +02:00
Aaron Lauterer
1401cb4e43 ceph pools create: enhance erasure-code description
Mention which optional parameters will be used for the replicated
metadata pool but won't have an effect on the erasure coded data pool.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2022-05-04 07:12:00 +02:00
Aaron Lauterer
9569939a54 ceph pools create: remove crush_rule for ec pool data
The crush rule is an optional parameter which can be used for the
metadata pool, but the erasure coded data pool will always get its own
crush rule. Therefore this parameter cannot be adapted for the data pool.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2022-05-04 07:11:06 +02:00
Aaron Lauterer
fca1900c76 api: ceph pools: add type to returned properties
The osd dump already contains the pool type in numerical format.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
2022-05-02 15:43:11 +02:00
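
The numeric type from the OSD dump has to be translated for the API consumer at some point; a hedged sketch of that mapping, assuming the usual Ceph pool type codes (1 = replicated, 3 = erasure) and example data in roughly the shape 'ceph osd dump' returns:

```perl
use strict;
use warnings;

# example data, only the relevant fields
my $osd_dump = {
    pools => [
        { pool_name => 'rbd',     type => 1 },
        { pool_name => 'ec-data', type => 3 },
    ],
};

# assumed numeric codes: 1 = replicated, 3 = erasure
my %pool_type = (1 => 'replicated', 3 => 'erasure');

for my $pool (@{ $osd_dump->{pools} }) {
    my $type = $pool_type{ $pool->{type} } // 'unknown';
    print "$pool->{pool_name}: $type\n";
}
```
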
Thomas Lamprecht
ec63b237de api: ceph: fix description indentation style
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-29 14:28:12 +02:00
Thomas Lamprecht
07316e6d04 api: followup: code locality
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-29 14:26:32 +02:00
Aaron Lauterer
4605d6fd22 api: ceph ec pools: make add_storages overridable default
The behavior of always adding the storage config was lost in commit
23c407e. But it is more sensible to make it a default that can be
changed if needed.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2022-04-29 14:24:04 +02:00
Aaron Lauterer
136f761ba7 api: ceph ec pools: schema fixes and enhancements
Ceph requires a minimum value of 2 for 'k'. Add defaults and descriptions
where they were missing.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2022-04-29 14:24:04 +02:00
Thomas Lamprecht
f35e7fcd8e api: ceph ec pools: move to format-str, create ec in worker, reuse $rados
Moved to a format string 'erasure-coded', which also allows dropping most
of the parameter existence checking, as we can set the correct optionality
in there. This also avoids bloating the API too much for just this.

Reuse the $rados connection more often to avoid too much overhead and too
many lingering sockets (the rados connection stays around in the background
to allow efficient reuse).

This really should be three separate commits, but they are too intertwined
and it's too late for me to care, tbh.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-28 20:26:38 +02:00
Aaron Lauterer
3bd128d7a0 ceph pools: allow to create erasure code pools
To use erasure coded (EC) pools for RBD storages, we need two pools. One
regular replicated pool that will hold the RBD omap and other metadata
and the EC pool which will hold the image data.

The coupling happens when an RBD image is created by adding the
--data-pool parameter. This is why we have the 'data-pool' parameter in
the storage configuration.

To follow already established semantics, we will create a 'X-metadata'
and 'X-data' pool. The storage configuration is always added as it is
the only thing that links the two together (besides naming schemes).

Different pg_num defaults are chosen for the replicated metadata pool as
it will not hold a lot of data.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2022-04-28 20:26:38 +02:00
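
A hedged sketch of the pool pairing described above, using the plain ceph/rbd CLIs instead of the RADOS calls the API actually makes; pool names, PG counts and the EC profile name are examples only.

```perl
use strict;
use warnings;
use PVE::Tools qw(run_command);

my $name = 'mypool';    # base name; the storage config will reference both pools

# replicated metadata pool - holds RBD omap/header data, so few PGs suffice
run_command(['ceph', 'osd', 'pool', 'create', "$name-metadata", '32', 'replicated']);

# erasure-coded data pool, using a previously created EC profile (name assumed)
run_command(['ceph', 'osd', 'pool', 'create', "$name-data", '128', '128', 'erasure', 'my-ec-profile']);
run_command(['ceph', 'osd', 'pool', 'set', "$name-data", 'allow_ec_overwrites', 'true']);

# an RBD image couples the two: metadata lives in $name-metadata, data in $name-data
run_command(['rbd', 'create', '--size', '10G', '--data-pool', "$name-data", "$name-metadata/vm-100-disk-0"]);
```

The storage configuration then points at the metadata pool and carries the 'data-pool' option for the EC pool, which is why it is always added: besides the naming scheme, it is the only place the coupling is visible.
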
Aaron Lauterer
29fe1eea7a api: ceph: $get_storages check if data-pool too
When removing a pool, we check against any storage that might have that
pool configured.
We also need to check whether that pool is used as a data-pool.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2022-04-28 20:26:38 +02:00
Fabian Ebner
cffeb11592 api: ceph: create osd: set correct partition type
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-11-11 21:50:51 +01:00
Fabian Ebner
5161a0c2f0 partially fix #2285: api: ceph: create osd: allow using partitions
Note that this not only allows partitions to be used, but, for DB and
WAL disks, also one more type of disk that wasn't allowed before:
GPT-partitioned disks with any partitions detected as used.
The reason is get_disks' behavior:
  * Without $include_partitions=1, the disk will have the same usage
    as its first used partition, and thus wasn't allowed. (Except in
    the case that the usage was LVM, where the check was bypassed, but
    luckily OSD creation just failed later because no Ceph volume
    group would be detected.)
  * With $include_partitions=1, the disk will have usage 'partitions'
    and thus be allowed.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-11-11 21:50:38 +01:00
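
A hedged sketch of the relaxed usage check this enables. The get_disks argument order, the returned structure and the 'partitions' usage value are assumptions for illustration, not the actual Diskmanage contract:

```perl
use strict;
use warnings;
use PVE::Diskmanage;

my $devpath = '/dev/sdb';    # example: a GPT-partitioned disk with used partitions

# assumed call: the last argument stands in for $include_partitions = 1
my $disklist = PVE::Diskmanage::get_disks($devpath, 1, 1);
my $info = (values %$disklist)[0]
    or die "unable to get device info for '$devpath'\n";

# with $include_partitions = 1 such a disk reports the usage 'partitions'
# (assumed value) instead of inheriting its first used partition's usage,
# so only clearly foreign usages are rejected here
die "device '$devpath' is already in use ($info->{used})\n"
    if $info->{used} && $info->{used} ne 'partitions';
```
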
Fabian Ebner
46b1ccc357 api: ceph: create osd: set correct parttype for DB/WAL
The get_ceph_journals function in pve-storage uses this information.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-11-11 21:50:33 +01:00
Dominik Csapak
6ec3dc437b api: cephfs: add 'fs-name' for cephfs storage
so that we can uniquely identify the CephFS (in case there are multiple)

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-11-11 17:52:08 +01:00
Dominik Csapak
352a0e5c93 api: cephfs: add fs_name to 'is mds active' check
so that we check the mds for the correct cephfs we just added

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-11-11 17:52:08 +01:00
Dominik Csapak
028af34e3d api: cephfs: more checks on fs create
Namely, whether the fs already exists, and whether there is currently a
standby MDS that can be used for the new fs.
Previously, only one CephFS was possible, so these checks were not
necessary. Now, with Pacific, it is possible to have multiple CephFS
instances, and we should check for those.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-11-11 17:52:08 +01:00
Dominik Csapak
0ab69d6e88 api: cephfs: refactor {ls, create}_fs
no functional change intended

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-11-11 17:52:08 +01:00
Dominik Csapak
d3eed3b4a8 api: ceph: fix getting ceph versions
Since commit 8a3a300b ("ceph services: drop broadcasting legacy
version pmxcfs KV"), the 'ceph-version' KV is not broadcast anymore, so we
should not query it; use get_ceph_versions instead.

Also drop the other legacy keys for the versions.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-11-10 14:36:22 +01:00
Fabian Ebner
45d602f212 api: ceph: create osd: work around udev bug
There is a udev bug [0] which can ultimately lead to the udev database
for certain devices not being actively updated. The Diskmanage package
relies upon lsblk for certain info, and lsblk queries the udev
database. Ensure the information is updated by manually calling
'udevadm trigger' for the changed devices.

Without the fix, and a bit of bad luck, a cleaned up disk could still
show up as an 'LVM2_member' for example.

[0]: https://github.com/systemd/systemd/issues/18525

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-09-30 18:12:58 +02:00
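
The workaround itself is a plain 'udevadm trigger' on the affected devices; a minimal sketch, with example device paths:

```perl
use strict;
use warnings;
use PVE::Tools qw(run_command);

# Ask udev to re-process the changed block devices so the udev database
# (and therefore lsblk, which the Diskmanage package relies on) is fresh.
# Without this, a just-cleaned disk may still show up as e.g. 'LVM2_member'.
my @changed = ('/dev/sdb', '/dev/sdb1');    # example devices

eval { run_command(['udevadm', 'trigger', @changed]); };
warn "udevadm trigger failed: $@" if $@;
```
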
Fabian Ebner
683a3563e7 api: ceph: create osd: use wipe_blockdev from the Diskmanage package
which is mostly a copy of the wipe_disks helper with the difference
that it also uses wipefs on the device and its partitions.

Remove the wipe_disks helper as no users remain.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-09-30 18:12:58 +02:00
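
A hedged sketch of what a wipe along those lines does; the subroutine name and device paths are illustrative, not the actual Diskmanage implementation:

```perl
use strict;
use warnings;
use PVE::Tools qw(run_command);

sub wipe_blockdev_sketch {
    my ($dev, @partitions) = @_;

    # clear filesystem/RAID/LVM signatures on the partitions and the device itself
    run_command(['wipefs', '--all', @partitions, $dev]);

    # then zero out the beginning of the disk, as the old wipe_disks helper did
    run_command(['dd', 'if=/dev/zero', "of=$dev", 'bs=1M', 'count=200', 'conv=fdatasync']);
}

wipe_blockdev_sketch('/dev/sdb', '/dev/sdb1', '/dev/sdb2');
```
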
Fabian Ebner
e256595683 api: ceph: create osd: re-check disk requirements after fork/lock
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-09-30 18:12:58 +02:00
Alwin Antreich
0b6a283801 fix #2422: allow multiple Ceph public networks
Multiple public networks can be defined in the ceph.conf. The networks need to
be routed to each other.

Support handling multiple IPs for a single monitor. By default, one address from
each public network is selected for monitor creation, but, as before, this can be
overridden with the mon-address parameter, which now takes a list of addresses.

On removal, make sure that all addresses are removed from the mon_host entry in
the ceph configuration.

Originally-by: Alwin Antreich <a.antreich@proxmox.com>
[handling of multiple addresses]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:05 +02:00
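
A hedged sketch of the default address selection described above: for each configured public network, pick the first local address inside it. The network list, the local addresses and the use of NetAddr::IP for the CIDR check are assumptions for illustration:

```perl
use strict;
use warnings;
use NetAddr::IP;

my @public_networks = ('10.10.10.0/24', 'fc00::/64');           # from ceph.conf (example)
my @local_ips       = ('192.168.1.5', '10.10.10.5', 'fc00::5'); # node addresses (example)

my @mon_addrs;
for my $net_str (@public_networks) {
    my $net = NetAddr::IP->new($net_str) or die "bad network '$net_str'\n";
    my ($ip) = grep {
        my $a = NetAddr::IP->new($_);
        $a && $a->version == $net->version && $net->contains($a);
    } @local_ips;
    die "no local address found in public network '$net_str'\n" if !defined($ip);
    push @mon_addrs, $ip;
}

# one address per public network; can still be overridden via --mon-address
print "monitor addresses: @mon_addrs\n";
```
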
Fabian Ebner
815325da0d api: ceph: mon: fix handling of IPv6 addresses in destroymon
by also comparing the canonical form to decide when to remove an address. When
getting the IP from the rados information, also drop any brackets, so our
existing function can handle it. Add the brackets back within the
remove_addr_from_mon_host function.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:05 +02:00
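
A hedged sketch of the bracket handling on removal; the real helper is remove_addr_from_mon_host, this stand-in only illustrates the idea with a hard-coded mon_host line:

```perl
use strict;
use warnings;

sub remove_addr_from_mon_host_sketch {
    my ($monhost, $addr) = @_;

    # mon_host stores IPv6 addresses in brackets, so add them back here
    # (they were dropped earlier to compare canonical forms)
    $addr = "[$addr]" if $addr =~ /:/ && $addr !~ /^\[/;

    # remove the address, an optional trailing port and one adjacent separator
    $monhost =~ s/\Q$addr\E(:\d+)?([ ,;]+|$)//;
    $monhost =~ s/[ ,;]+$//;

    return $monhost;
}

print remove_addr_from_mon_host_sketch('10.0.0.1 [fc00::1]:6789 10.0.0.2', 'fc00::1'), "\n";
# -> "10.0.0.1 10.0.0.2"
```
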
Fabian Ebner
3e10f0fcdb api: ceph: mon: factor out mon_host regex address removal
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:04 +02:00
Fabian Ebner
9e989449ae api: ceph: mon: fix handling of IPv6 addresses in assert_mon_prerequisites
by comparing their canonical forms.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:04 +02:00
Fabian Ebner
4be756f59c api: ceph: mon: add ips_from_mon_host helper
Partially based on pve-storage's CephConfig.pm get_monaddr_list, but the
interface is not the best for the use case here.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:04 +02:00
Fabian Ebner
396acb1577 api: ceph: mon: fix handling of IPv6 addresses in find_mon_ip
by comparing their canonical forms.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:04 +02:00
Fabian Ebner
8ecaa0bfbe api: ceph: create mon: explicitly add subsequent monitors to the monmap
in preparation for supporting multiple addresses. The config section does not
allow more than one public_addr.

Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:04 +02:00
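
A hedged sketch of explicitly adding a later monitor to the monmap before 'ceph-mon --mkfs'; monitor ID, address and file path are examples:

```perl
use strict;
use warnings;
use PVE::Tools qw(run_command);

my $monid   = 'pve-node2';     # example monitor ID
my $monaddr = '10.10.10.2';    # example address
my $monmap  = '/tmp/monmap';

# fetch the current monitor map from the running cluster ...
run_command(['ceph', 'mon', 'getmap', '-o', $monmap]);

# ... and add the new monitor explicitly; the config section only allows a
# single public_addr, hence the monmap route for additional addresses
run_command(['monmaptool', '--add', $monid, $monaddr, $monmap]);

# the resulting map is then passed to 'ceph-mon --mkfs --monmap ...' (not shown)
```
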
Fabian Ebner
57951fc78b api: ceph: create mon: factor out monmaptool command
so it's easier to re-use for a future variant.

Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:04 +02:00
Fabian Ebner
d3b899c144 api: ceph: create mon: handle ms_bind_ipv* options more generally
mostly relevant to prepare support for IPv4/IPv6 dual stack mode as a special
case of the planned support for multiple public networks.

As before, only set the false value when we are dealing with the first address,
but also be explicit about the IPv4 case as the defaults might change in the
future.

Then, when an address of a different type comes along later, set the relevant
bind option to true.

Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:04 +02:00
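
A hedged sketch of the ms_bind handling described above; $cfg stands in for the parsed monitor config section and the address list is example data:

```perl
use strict;
use warnings;

my $cfg = {};    # monitor/global section of ceph.conf, as a hash (example)
my @addresses = ('10.10.10.2', 'fc00::2');

for my $i (0 .. $#addresses) {
    my $is_ipv6 = $addresses[$i] =~ /:/ ? 1 : 0;
    if ($i == 0) {
        # the first address decides the initial setting; set both options
        # explicitly, in case the Ceph defaults change in the future
        $cfg->{ms_bind_ipv6} = $is_ipv6 ? 'true' : 'false';
        $cfg->{ms_bind_ipv4} = $is_ipv6 ? 'false' : 'true';
    } else {
        # a later address of the other type enables dual-stack binding
        $cfg->{ $is_ipv6 ? 'ms_bind_ipv6' : 'ms_bind_ipv4' } = 'true';
    }
}
```
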
Fabian Ebner
6e96b07078 api: ceph: mon: split up arguments for run_command
no functional change is intended.

Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:04 +02:00
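
For context, PVE::Tools::run_command accepts both a single command string and a list of arguments; the refactoring moves to the list form. A small sketch with an example command:

```perl
use PVE::Tools qw(run_command);

# single string: goes through the shell, arguments need manual quoting
run_command("monmaptool --create --clobber /tmp/monmap");

# list form: no shell involved, each argument is passed as-is; easier to
# build up programmatically and safe against word splitting
my @cmd = ('monmaptool', '--create', '--clobber', '/tmp/monmap');
run_command(\@cmd);
```
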
Fabian Ebner
596bb7b11a api: ceph: osd: create: rename size parameters
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-09 11:29:34 +02:00
Thomas Lamprecht
51498a2664 ceph: code/indentation cleanup
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-05-03 14:03:32 +02:00
Thomas Lamprecht
0dd48804e1 api: ceph/monitor: automatically disable insecure global ID reclaim after creating first monitor
Nautilus 14.2.20 and Octopus 15.2.11 fixed a security issue with
reclaiming the global ID auth (CVE-2021-20288). As fixing this issue
means that older clients won't be able to connect anymore, the fix was
put behind a switch, with a HEALTH warning if it was not active
(i.e., if connections from older clients were still allowed).

New installations also have this switch at the insecure level, for
compat reasons, so let's deactivate it ourselves after monitor creation
to avoid the health warning and the slightly insecure setup (in a default
PVE Ceph setup the whole issue was of rather low impact/risk). But only
do so when creating the first monitor of a ceph cluster, to avoid
breaking existing setups by accident.

An admin can always switch it back again, e.g., if they're recovering
from some failure and need to set up fresh monitors but still have old
clients.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-27 12:35:34 +02:00
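
A hedged sketch of the post-create step; the option name is the one behind the CVE-2021-20288 health warning, the surrounding condition is illustrative:

```perl
use strict;
use warnings;
use PVE::Tools qw(run_command);

my $is_first_monitor = 1;    # example condition: only for a brand-new cluster

if ($is_first_monitor) {
    # turn the insecure global ID reclaim off right away, so fresh setups
    # neither show the HEALTH warning nor keep the slightly insecure default;
    # an admin can always switch it back for old clients
    eval {
        run_command([
            'ceph', 'config', 'set', 'mon',
            'auth_allow_insecure_global_id_reclaim', 'false',
        ]);
    };
    warn "could not disable insecure global ID reclaim: $@" if $@;
}
```
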
Thomas Lamprecht
a91bd3c370 api: ceph pool create: replace left-over complex error handling
this was from the time when we had a loop here to add two storages,
one for KRBD-only and one for KRBD-never. Nowadays we can handle the
mixed case just fine, but the patch dropping that forgot to clean up
the error handling.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-21 17:34:23 +02:00
Thomas Lamprecht
84b08e8aec api: ceph/pool: fix formatting of API parameters
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-21 15:45:35 +02:00
Dominik Csapak
08db34257a API2/Ceph/Pools: remove unnecessary boolean conversion
we do nothing with that field, so leave it as it is

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-04-20 18:20:35 +02:00
Alwin Antreich
6b36f36842 ceph: set allowed minimal pg_num down to 1
In Ceph Octopus the device_health_metrics pool is auto-created with 1
PG. Since Ceph has the ability to split/merge PGs, hitting the wrong PG
count is now less of an issue anyhow.

Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-04-20 18:20:35 +02:00
Alwin Antreich
5a3d794242 ceph: add autoscale_status to api calls
The properties target_size_ratio, target_size_bytes and pg_num_min are
used to fine-tune the pg_autoscaler and are set on a pool. The updated
pool list now shows the autoscale settings & status, including the new
(optimal) target PG count, to make it easier for new users to get/set the
correct number of PGs.

Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-04-20 18:20:35 +02:00
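
A hedged sketch of fetching the autoscaler view that gets merged into the pool list; the mon_command usage follows the usual PVE::RADOS pattern, while the field names in the result (pool_name, pg_num_final, ...) are assumptions here:

```perl
use strict;
use warnings;
use PVE::RADOS;

my $rados = PVE::RADOS->new();

# returns one entry per pool with the autoscaler's view
my $autoscale = $rados->mon_command({ prefix => 'osd pool autoscale-status', format => 'json' });

# index by pool name so per-pool values (e.g. the optimal target PG count)
# can be merged into the pool list entries
my $by_pool = { map { $_->{pool_name} => $_ } @$autoscale };
```
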
Thomas Lamprecht
d7a63207a3 ceph: osd_belongs_to_node: only check tree-entries of type host, refactor
We want to check explicitly for type host, so filter for that first
and create a hash map for easier usage afterwards.

Drop the error when there's no tree, as either RADOS error'd on bad
command already, or there really is no tree (but RADOS worked OK), in
which case we simply return that the OSD did not belong to this node.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-20 18:06:07 +02:00
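
A hedged sketch of the described logic; $tree is example data in roughly the shape the RADOS 'osd tree' call returns, and the subroutine name is illustrative:

```perl
use strict;
use warnings;

sub osd_belongs_to_node_sketch {
    my ($tree, $nodename, $osdid) = @_;

    # no tree at all: simply report that the OSD does not belong to this node
    return 0 if !($tree && $tree->{nodes});

    # keep only entries of type 'host' and index them by name
    my $host_entries = {
        map { $_->{name} => $_ } grep { ($_->{type} // '') eq 'host' } @{ $tree->{nodes} }
    };

    my $host = $host_entries->{$nodename} or return 0;
    return grep { $_ == $osdid } @{ $host->{children} // [] };
}

my $tree = { nodes => [ { name => 'pve1', type => 'host', children => [0, 2] } ] };
print osd_belongs_to_node_sketch($tree, 'pve1', 2) ? "belongs\n" : "does not belong\n";
```
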
Dominic Jäger
220173e9c6 Fix #2053: OSD destroy only on specified node
Allow destroying only OSDs that belong to the node that has been specified in
the API path.

So if
 - OSD 1 belongs to node A and
 - OSD 2 belongs to node B
then
 - pvesh delete nodes/A/ceph/osd/1 is allowed but
 - pvesh delete nodes/A/ceph/osd/2 is not

Destroying an OSD via GUI automatically inserts the correct node
into the API path.

pveceph automatically inserts the local node into the API call, too.
Consequently, it can now only destroy local OSDs (fix #2053).
 - pveceph osd destroy 1 is allowed on node A but
 - pveceph osd destroy 2 is not

Signed-off-by: Dominic Jäger <d.jaeger@proxmox.com>
2021-04-20 16:42:12 +02:00
Alwin Antreich
54ba7dd991 ceph: add get api call for single pool
Information about a single pool can now be queried.

Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
2021-02-06 14:28:39 +01:00
Alwin Antreich
461e214155 ceph: add titles to ceph_pool_common_options
Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
2021-02-06 14:25:53 +01:00
Alwin Antreich
51d6db5815 ceph: setpool, use parameter extraction instead
of the unneeded ref copy for params.

Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
2021-02-06 14:24:30 +01:00
Alwin Antreich
56d02a863b api: ceph: subclass pools
for better handling and since the pool endpoints got more entries.

Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
2021-02-06 14:17:53 +01:00
Stoiko Ivanov
c92fc8a1e8 api2: osd destroy: untaint device before pvremove
We get the device list from 'ceph-volume lvm list' and decode the JSON
output, which at that point is tainted (see perlsec(1)).
Untaint it here before calling run_command, because this is currently the
only call-site using the information in a problematic way
(the only other call-site being in pve5to6).

Alternatively, we could untaint while reading the information, but then we
should only return a small subset of the ceph-volume output.

The issue is most likely due to
cb9db10c1a9855cf40ff13e81f9dd97d6a9b2698 in pve-common ('run_command:
improve performance for logging and long lines').

Tested on a virtual test setup by creating OSDs with a second DB disk
and destroying them via the GUI (did not manage to get the error without
the DB disk)

Reported via our community forum:
https://forum.proxmox.com/threads/insecure-dependency-in-exec-during-osd-destroy.79574/

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
2020-11-24 23:37:33 +01:00
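
A minimal sketch of the untainting step itself; the device path stands in for a value decoded from the ceph-volume JSON, and the allowed character set is an assumption:

```perl
#!/usr/bin/perl -T
use strict;
use warnings;

# data decoded from 'ceph-volume lvm list' is tainted under taint mode
# (perlsec(1)), so it must pass through a regex capture before it may be
# used by run_command/exec
my $tainted_dev = '/dev/ceph-vg/osd-block-123';    # example value

my ($dev) = $tainted_dev =~ m|^(/dev/[\w/.\-]+)$|
    or die "invalid device path '$tainted_dev'\n";

# $dev is now untainted and can safely be handed to e.g.
# run_command(['pvremove', $dev]);
```
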
Stoiko Ivanov
259b557cf4 api2: osd destroy: fix error function
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
2020-11-24 23:37:33 +01:00
Alwin Antreich
2184098ed3 Allow setting device class on osd create
In some situations Ceph's auto-detection doesn't recognize the device
class correctly. The option allows setting it directly on OSD create,
instead of altering it afterwards. This way the cluster doesn't need to
shift data back and forth unnecessarily.

Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
2020-07-24 10:26:11 +02:00
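
A hedged sketch of passing the device class through to OSD creation; device path and class are examples, and the exact ceph-volume invocation PVE builds is more involved than this:

```perl
use strict;
use warnings;
use PVE::Tools qw(run_command);

my $devpath  = '/dev/sdb';    # example data device
my $devclass = 'nvme';        # e.g. 'hdd', 'ssd' or 'nvme'; undef keeps auto-detection

my $cmd = ['ceph-volume', 'lvm', 'create', '--data', $devpath];
# setting the class at create time avoids a later change and the data
# movement that a reclassification would trigger
push @$cmd, '--crush-device-class', $devclass if $devclass;

run_command($cmd);
```
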