To get more details for a single OSD, we add two new endpoints:
* nodes/{node}/ceph/osd/{osdid}/metadata
* nodes/{node}/ceph/osd/{osdid}/lv-info
The {osdid} endpoint itself gets a new GET handler to return the index.
The metadata endpoint provides various metadata regarding the OSD, such
as:
* process id
* memory usage
* info about devices used (bdev/block, db, wal)
* size
* disks used (sdX)
...
* network addresses and ports used
...
Memory usage and PID are retrieved from systemd, while the rest can be
retrieved from the metadata provided by Ceph.
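A rough sketch of where that data can come from (the systemd unit name
and the exact fields read are assumptions, not necessarily what the
endpoint uses):
    use strict;
    use warnings;
    use JSON qw(decode_json);
    my $osdid = 0;
    # PID and memory usage via the OSD's systemd unit
    my $sysd = `systemctl show --property MainPID,MemoryCurrent ceph-osd\@${osdid}.service`;
    # devices, sizes and network addresses via Ceph's own OSD metadata
    my $meta = decode_json(`ceph osd metadata $osdid --format json`);
    print $sysd, "devices: $meta->{devices}\n", "front_addr: $meta->{front_addr}\n";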
The second endpoint (lv-info) returns the following information for a
logical volume:
* creation time
* lv name
* lv path
* lv size
* lv uuid
* vg name
Possible volumes are:
* block (default value if not provided)
* db
* wal
'ceph-volume' is used to gather the information, except for the
creation time of the LV, which is retrieved via 'lvs'.
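A sketch of how these values can be gathered (the exact fields and the
ceph-volume JSON layout are assumptions):
    use strict;
    use warnings;
    use JSON qw(decode_json);
    my $osdid = 0;
    # ceph-volume reports which LVs back the OSD's block/db/wal devices
    my $cv = decode_json(`ceph-volume lvm list $osdid --format json`);
    print "$_->{type}: $_->{lv_path}\n" for @{$cv->{$osdid} // []};
    # 'lvs' additionally provides the creation time (lv_time) of each LV
    my $lvs = decode_json(`lvs --reportformat json --units b -o lv_name,lv_path,lv_size,lv_uuid,vg_name,lv_time`);
    print "$_->{lv_name}: created $_->{lv_time}\n" for @{$lvs->{report}->[0]->{lv}};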
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
By switching from 'ceph osd tree' to the 'ceph osd df tree' mon API
equivalent, we get the same data structure with more information per
OSD. One of the additions is the number of PGs stored on that OSD.
The number of PGs per OSD is an important metric, for example when
trying to figure out why performance is not as good as expected.
Therefore, adding it to the OSD overview visible by default should
reduce the number of times one needs to access the CLI.
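Roughly, on the mon API level (treat the exact command parameters as
assumptions):
    use strict;
    use warnings;
    use PVE::RADOS;
    # same tree structure as 'osd tree', but each OSD entry additionally
    # carries e.g. a 'pgs' count
    my $rados = PVE::RADOS->new();
    my $res = $rados->mon_command({ prefix => 'osd df', output_method => 'tree' });
    for my $e (@{$res->{nodes}}) {
        printf "%s: %d PGs\n", $e->{name}, $e->{pgs} if $e->{type} eq 'osd';
    }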
Comparing the runtime cost on a 3-node Ceph cluster with 4 OSDs each,
doing 50k iterations, gives:
                   Rate  osd-df-tree  osd-tree
    osd-df-tree  9141/s           --      -25%
    osd-tree    12136/s          33%        --
So, while definitely a bit slower, it is still in the µs range, and as
such below the HTTP-in-TLS-in-TCP connection setup cost for most users,
so worth it for the extra useful information.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
[ TL: slight rewording of subject and add benchmark data ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
to include a more complete description of the returned data.
Sort properties in alphabetical order if the list is longer.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Since Ceph Luminous (Ceph 12), pools need to be associated with at
least one application. Expose this information here too so that clients
of this endpoint can use it.
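For illustration, a sketch reading that information from the osd dump
via the CLI:
    use strict;
    use warnings;
    use JSON qw(decode_json);
    # each pool carries its enabled applications (e.g. 'rbd', 'cephfs', 'rgw')
    my $dump = decode_json(`ceph osd dump --format json`);
    for my $pool (@{$dump->{pools}}) {
        my @apps = sort keys %{$pool->{application_metadata} // {}};
        print "$pool->{pool_name}: @apps\n";
    }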
Signed-off-by: Stefan Sterz <s.sterz@proxmox.com>
Tested-By: Aaron Lauterer <a.lauterer@proxmox.com>
This avoids errors about the use of uninitialized values if the 'pool'
parameter is not present in the storage configuration.
The 'pool' property for an RBD storage config is not mandatory.
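A minimal sketch of the kind of defaulting that avoids the warning (the
'rbd' fallback mirrors the RBD plugin's usual default and is an
assumption here):
    use strict;
    use warnings;
    my $scfg = {};    # storage section without an explicit 'pool' set
    my $pool = $scfg->{pool} // 'rbd';    # fall back instead of using undef
    print "checking against pool '$pool'\n";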
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Because $mon->{addr} might come with a port attached (affects monitors
created with PVE 5.4 as reported in the community forum [0]), or even
be a hostname (according to the code in Ceph/Services.pm), although the
latter shouldn't happen for configurations created by PVE.
[0]: https://forum.proxmox.com/threads/105904/
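A sketch of the kind of normalization needed before comparing or
resolving the address (the regexes are illustrative, not the patch's
exact code):
    use strict;
    use warnings;
    # reduce '[fc00::1]:6789' or '10.0.0.1:6789' to the bare address
    sub strip_port {
        my ($addr) = @_;
        $addr =~ s/^\[(.*)\](?::\d+)?$/$1/       # [IPv6] with optional :port
            or $addr =~ s/^([^:]+):\d+$/$1/;     # IPv4:port or host:port
        return $addr;
    }
    print strip_port($_), "\n" for '[fc00::1]:6789', '10.0.0.1:6789', 'somehost';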
Fixes: 9e989449 ("api: ceph: mon: fix handling of IPv6 addresses in assert_mon_prerequisites")
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Mention which optional parameters will be used for the replicated
metadata pool but won't have an effect on the erasure coded data pool.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
The crush rule is an optional parameter which can be used for the
metadata pool, but the erasure coded data pool will always get its own
crush rule. Therefore, this parameter cannot be adapted.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
The osd dump already contains the pool type in numerical format.
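For illustration, a small sketch reading it from the CLI output
(1 = replicated, 3 = erasure):
    use strict;
    use warnings;
    use JSON qw(decode_json);
    my %type_name = (1 => 'replicated', 3 => 'erasure');
    my $dump = decode_json(`ceph osd dump --format json`);
    print "$_->{pool_name}: ", $type_name{$_->{type}} // $_->{type}, "\n"
        for @{$dump->{pools}};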
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
The behavior of always adding the storage config was lost in commit
23c407e. But it is more sensible to make it a default that can be
changed if needed.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Moved to a format string 'erasure-coded', which also allows dropping
most of the parameter existence checking, as we can set the correct
optionalness in there. It also avoids bloating the API too much for
just this.
Reuse the $rados connection more often to avoid too much
overhead/lingering sockets (the RADOS connection stays around in the
background to allow efficient reuse).
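A sketch of the reuse pattern (PVE::RADOS usage simplified):
    use strict;
    use warnings;
    use PVE::RADOS;
    # open one connection and pass it to all helpers instead of creating
    # a new one per call
    my $rados = PVE::RADOS->new();
    my $pools = $rados->mon_command({ prefix => 'osd dump' })->{pools};
    my $rules = $rados->mon_command({ prefix => 'osd crush rule ls' });
    print scalar(@$pools), " pools, ", scalar(@$rules), " crush rules\n";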
really should be three separate commits, but too intertwined and too
late for me to care tbh.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
To use erasure coded (EC) pools for RBD storages, we need two pools:
one regular replicated pool that will hold the RBD omap and other
metadata, and the EC pool which will hold the image data.
The coupling happens when an RBD image is created by adding the
--data-pool parameter. This is why we have the 'data-pool' parameter in
the storage configuration.
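A rough sketch of the involved commands, with made-up pool names,
profile and sizes:
    use strict;
    use warnings;
    # the EC pool holds the image data, the replicated pool the metadata;
    # the coupling happens via --data-pool at image creation time
    system('ceph', 'osd', 'pool', 'create', 'foo-metadata', '32', '32', 'replicated');
    system('ceph', 'osd', 'pool', 'create', 'foo-data', '128', '128', 'erasure', 'myprofile');
    system('ceph', 'osd', 'pool', 'set', 'foo-data', 'allow_ec_overwrites', 'true');
    system('rbd', 'create', '--size', '1G', '--data-pool', 'foo-data', 'foo-metadata/test');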
To follow already established semantics, we will create an 'X-metadata'
and an 'X-data' pool. The storage configuration is always added, as it
is the only thing that links the two together (besides naming schemes).
Different pg_num defaults are chosen for the replicated metadata pool as
it will not hold a lot of data.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
When removing a pool, we check against any storage that might have that
pool configured.
We need to check if that pool is used as data-pool too.
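A sketch of the extended check (storage config access simplified, the
'rbd' default is an assumption):
    use strict;
    use warnings;
    use PVE::Storage;
    # when removing $pool, storages using it as 'data-pool' count as users too
    my $pool = 'foo-data';    # hypothetical pool being removed
    my $cfg = PVE::Storage::config();
    for my $storeid (sort keys %{$cfg->{ids}}) {
        my $scfg = $cfg->{ids}->{$storeid};
        next if ($scfg->{type} // '') ne 'rbd';
        if (($scfg->{pool} // 'rbd') eq $pool || ($scfg->{'data-pool'} // '') eq $pool) {
            die "pool '$pool' is still in use by storage '$storeid'\n";
        }
    }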
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Note that this not only allows partitions to be used, but, for DB and
WAL disks, also one more type of disk that wasn't allowed before:
GPT-partitioned disks with any partitions detected as used.
The reason is get_disks' behavior:
* Without $include_partitions=1, the disk will have the same usage
as its first used partition, and thus wasn't allowed. (Except in
the case that usage was LVM, where the check was bypassed, but
luckily OSD creation just failed later because no Ceph volume
group would be detected).
* With $include_partitions=1, the disk will have usage 'partitions'
and thus be allowed.
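A sketch of the caller side (argument order and field names of
get_disks are assumptions):
    use strict;
    use warnings;
    use PVE::Diskmanage;
    # with include-partitions set, partitions show up as their own entries
    # and a partitioned disk is reported with usage 'partitions'
    my $disks = PVE::Diskmanage::get_disks(undef, 1, 1);
    print "$_: " . ($disks->{$_}->{used} // 'unused') . "\n" for sort keys %$disks;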
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
namely whether the fs already exists, and whether there is currently a
standby MDS that can be used for the new fs.
Previously, only one CephFS was possible, so these checks were not
necessary. Now, with Pacific, it is possible to have multiple CephFS
instances, and we should check for those.
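A sketch of both checks via the CLI (the 'standbys' field name from
'ceph fs dump' should be treated as an assumption):
    use strict;
    use warnings;
    use JSON qw(decode_json);
    my $new_fs = 'cephfs2';    # hypothetical name for the new fs
    # check that no fs with that name exists yet
    my $fss = decode_json(`ceph fs ls --format json`);
    die "fs '$new_fs' already exists\n" if grep { $_->{name} eq $new_fs } @$fss;
    # check that at least one standby MDS is available for the new fs
    my $dump = decode_json(`ceph fs dump --format json`);
    die "no standby MDS available\n" if !scalar(@{$dump->{standbys} // []});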
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Since commit 8a3a300b ("ceph services: drop broadcasting legacy
version pmxcfs KV"), the 'ceph-version' KV is not broadcast anymore, so
we should not query it; use get_ceph_versions instead.
Also drop the other legacy keys for the versions.
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
There is a udev bug [0] which can ultimately lead to the udev database
for certain devices not being actively updated. The Diskmanage package
relies upon lsblk for certain info, and lsblk queries the udev
database. Ensure the information is updated by manually calling
'udevadm trigger' for the changed devices.
Without the fix, and a bit of bad luck, a cleaned up disk could still
show up as an 'LVM2_member' for example.
[0]: https://github.com/systemd/systemd/issues/18525
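A sketch of the manual refresh (device paths are hypothetical):
    use strict;
    use warnings;
    use PVE::Tools qw(run_command);
    # nudge udev to refresh its database for the changed devices so that a
    # following lsblk sees the current state
    my @devs = ('/dev/sdf', '/dev/sdf1');
    run_command(['udevadm', 'trigger', @devs]);
    run_command(['udevadm', 'settle']);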
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
which is mostly a copy of the wipe_disks helper with the difference
that it also uses wipefs on the device and its partitions.
Remove the wipe_disks helper as no users remain.
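A rough sketch of such a helper (the additional zeroing of the device
start is an assumption):
    use strict;
    use warnings;
    use PVE::Tools qw(run_command);
    sub wipe_blockdev_sketch {
        my ($dev, @partitions) = @_;
        # clear filesystem/RAID/partition-table signatures on the partitions
        # first, then on the device itself
        run_command(['wipefs', '--all', $_]) for (@partitions, $dev);
        # additionally zero out the start of the device
        run_command(['dd', 'if=/dev/zero', "of=$dev", 'bs=1M', 'count=200', 'conv=fdatasync']);
    }
    wipe_blockdev_sketch('/dev/sdf', '/dev/sdf1');    # hypothetical device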
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Multiple public networks can be defined in the ceph.conf. The networks need to
be routed to each other.
Support handling multiple IPs for a single monitor. By default, one address from
each public network is selected for monitor creation, but, as before, it can be
overwritten with the mon-address parameter, now taking a list of addresses.
On removal, make sure all addresses are removed from the mon_host entry
in the ceph configuration.
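A sketch of the default address selection (the helper name and its
arrayref return value are assumptions):
    use strict;
    use warnings;
    use PVE::Network;
    # pick one local address from each configured public network
    my @public_nets = ('10.10.10.0/24', 'fc00::/64');    # hypothetical ceph.conf values
    my @mon_addrs;
    for my $cidr (@public_nets) {
        my $ips = PVE::Network::get_local_ip_from_cidr($cidr);
        die "no local address within '$cidr'\n" if !scalar(@$ips);
        push @mon_addrs, $ips->[0];
    }
    print "monitor addresses: " . join(', ', @mon_addrs) . "\n";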
Originally-by: Alwin Antreich <a.antreich@proxmox.com>
[handling of multiple addresses]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
by also comparing the canonical form to decide when to remove an address. When
getting the IP from the rados information, also drop any brackets, so our
existing function can handle it. Add the brackets back within the
remove_addr_from_mon_host function.
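A sketch of a canonical comparison (using packed addresses here, not
necessarily the helper the patch uses):
    use strict;
    use warnings;
    use Socket qw(inet_pton AF_INET AF_INET6);
    # compare addresses in packed (canonical) form, so that e.g.
    # 'fc00:0:0::1' and 'fc00::1' count as the same monitor address
    sub same_ip {
        my ($ip1, $ip2) = @_;
        for my $af (AF_INET, AF_INET6) {
            my ($p1, $p2) = (inet_pton($af, $ip1), inet_pton($af, $ip2));
            return 1 if defined($p1) && defined($p2) && $p1 eq $p2;
        }
        return 0;
    }
    print same_ip('fc00:0:0::1', 'fc00::1') ? "same\n" : "different\n";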
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Partially based on pve-storage's CephConfig.pm get_monaddr_list, but the
interface is not the best for the use case here.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
in preparation for supporting multiple addresses. The config section does not
allow more than one public_addr.
Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
mostly relevant to prepare support for IPv4/IPv6 dual stack mode as a
special case of the planned support for multiple public networks.
As before, only set the false value when we are dealing with the first address,
but also be explicit about the IPv4 case as the defaults might change in the
future.
Then, when an address of a different type comes along later, set the relevant
bind option to true.
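A sketch of the described logic (ceph.conf write-out omitted, addresses
made up):
    use strict;
    use warnings;
    use Socket qw(inet_pton AF_INET6);
    sub is_ipv6 { return defined(inet_pton(AF_INET6, $_[0])) ? 1 : 0 }
    my @addresses = ('10.10.10.1', 'fc00::1');    # hypothetical monitor addresses
    my $cfg = {};    # stands in for the monitor's config section
    for my $i (0 .. $#addresses) {
        my ($v4, $v6) = is_ipv6($addresses[$i]) ? ('false', 'true') : ('true', 'false');
        if ($i == 0) {
            # first address: set both options explicitly
            $cfg->{ms_bind_ipv4} = $v4;
            $cfg->{ms_bind_ipv6} = $v6;
        } else {
            # a later address of the other type re-enables its bind option
            $cfg->{ms_bind_ipv4} = 'true' if $v4 eq 'true';
            $cfg->{ms_bind_ipv6} = 'true' if $v6 eq 'true';
        }
    }
    print "ms_bind_ipv4=$cfg->{ms_bind_ipv4} ms_bind_ipv6=$cfg->{ms_bind_ipv6}\n";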
Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Nautilus 14.2.20 and Octopus 15.2.11 fixed a security issue with
reclaiming the global ID auth (CVE-2021-20288). As fixing this issue
means that older clients won't be able to connect anymore, the fix was
put behind a switch, with a HEALTH warning if it is not active (i.e.,
connections from older clients are still allowed).
New installations have this switch at the insecure level too, for
compat reasons, so let's deactivate it ourselves after monitor creation
to avoid the health warning and the slightly insecure setup (in a
default PVE Ceph setup the whole issue was of rather low impact/risk).
But only do so when creating the first monitor of a Ceph cluster, to
avoid breaking existing setups by accident.
An admin can always switch it back again, e.g., if they're recovering
from some failure and need to set up fresh monitors but still have old
clients.
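For illustration, a sketch of the switch being flipped after the first
monitor of the cluster is up:
    use strict;
    use warnings;
    use PVE::Tools qw(run_command);
    # disallow the insecure global_id reclaim; an admin can revert this at any time
    run_command(['ceph', 'config', 'set', 'mon', 'auth_allow_insecure_global_id_reclaim', 'false']);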
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This was from the time when we had a loop here to add two storages,
one KRBD-only and one KRBD-never. Nowadays we can handle the mixed case
just fine, but the patch dropping that forgot to clean up the error
handling.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
In Ceph Octopus the device_health_metrics pool is auto-created with 1
PG. Since Ceph has the ability to split/merge PGs, hitting the wrong PG
count is now less of an issue anyhow.
Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
The properties target_size_ratio, target_size_bytes and pg_num_min are
used to fine-tune the pg_autoscaler and are set on a pool. The updated
pool list now shows the autoscale settings & status, including the new
(optimal) target PGs, to make it easier for new users to get/set the
correct number of PGs.
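For illustration, with a made-up pool name and values, all three knobs
are plain pool settings:
    use strict;
    use warnings;
    use PVE::Tools qw(run_command);
    my $pool = 'mypool';
    run_command(['ceph', 'osd', 'pool', 'set', $pool, 'target_size_ratio', '0.2']);
    run_command(['ceph', 'osd', 'pool', 'set', $pool, 'pg_num_min', '32']);
    run_command(['ceph', 'osd', 'pool', 'autoscale-status']);    # shows the (optimal) target PG count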
Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
We want to check explicitly for type host, so filter for that first
and create a hash map for easier usage afterwards.
Drop the error when there's no tree, as either RADOS already errored
on a bad command, or there really is no tree (but RADOS worked OK), in
which case we simply return that the OSD did not belong to this node.
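A sketch of the filtering (node name is hypothetical):
    use strict;
    use warnings;
    use JSON qw(decode_json);
    # keep only 'host' entries and index them by name, so the local node
    # and its OSD children become a simple hash lookup
    my $tree = decode_json(`ceph osd tree --format json`);
    my $hosts = {};
    for my $e (@{$tree->{nodes} // []}) {
        $hosts->{$e->{name}} = $e if $e->{type} eq 'host';
    }
    my $nodename = 'pve-node-a';
    my $children = $hosts->{$nodename} ? $hosts->{$nodename}->{children} : [];
    print "OSD ids on $nodename: " . join(', ', @$children) . "\n";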
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Allow destroying only OSDs that belong to the node that has been specified in
the API path.
So if
- OSD 1 belongs to node A and
- OSD 2 belongs to node B
then
- pvesh delete nodes/A/ceph/osd/1 is allowed but
- pvesh delete nodes/A/ceph/osd/2 is not
Destroying an OSD via the GUI automatically inserts the correct node
into the API path.
pveceph automatically inserts the local node into the API call, too.
Consequently, it can now only destroy local OSDs (fixes #2053).
- pveceph osd destroy 1 is allowed on node A but
- pveceph osd destroy 2 is not
Signed-off-by: Dominic Jäger <d.jaeger@proxmox.com>