113 Commits

Author SHA1 Message Date
Alwin Antreich
0b6a283801 fix #2422: allow multiple Ceph public networks
Multiple public networks can be defined in the ceph.conf. The networks need to
be routed to each other.

Support handling multiple IPs for a single monitor. By default, one address from
each public network is selected for monitor creation, but, as before, it can be
overwritten with the mon-address parameter, now taking a list of addresses.

On removal, make sure that all addresses are removed from the mon_host entry in
the ceph configuration.

Originally-by: Alwin Antreich <a.antreich@proxmox.com>
[handling of multiple addresses]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:05 +02:00
Fabian Ebner
815325da0d api: ceph: mon: fix handling of IPv6 addresses in destroymon
by also comparing the canonical form to decide when to remove an address. When
getting the IP from the rados information, also drop any brackets, so our
existing function can handle it. Add the brackets back within the
remove_addr_from_mon_host function.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:05 +02:00
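
The canonical-form comparison described in the commit above can be illustrated with a minimal sketch. It is written in Python rather than the project's own language, and the helper names canonical_ip and same_address are hypothetical, not taken from the repository; it only shows why two spellings of one IPv6 address must be normalized before they are compared.

```python
import ipaddress


def canonical_ip(addr: str) -> str:
    """Return the canonical text form of an IP address.

    Optional surrounding brackets (as used for IPv6 in host lists)
    are dropped before parsing.
    """
    stripped = addr.strip()
    if stripped.startswith("[") and stripped.endswith("]"):
        stripped = stripped[1:-1]
    # ip_address() parses IPv4 and IPv6; str() yields the compressed,
    # lower-case form for IPv6.
    return str(ipaddress.ip_address(stripped))


def same_address(a: str, b: str) -> bool:
    """Compare two addresses by their canonical forms."""
    return canonical_ip(a) == canonical_ip(b)
```

With such a helper, "[fd00:0:0:0::1]" and "fd00::1" compare as equal, while a plain string comparison would treat them as different addresses.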
Fabian Ebner
3e10f0fcdb api: ceph: mon: factor out mon_host regex address removal
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:04 +02:00
Fabian Ebner
9e989449ae api: ceph: mon: fix handling of IPv6 addresses in assert_mon_prerequisites
by comparing their canonical forms.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:04 +02:00
Fabian Ebner
4be756f59c api: ceph: mon: add ips_from_mon_host helper
Partially based on pve-storage's CephConfig.pm get_monaddr_list, but the
interface is not the best for the use case here.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:04 +02:00
Fabian Ebner
396acb1577 api: ceph: mon: fix handling of IPv6 addresses in find_mon_ip
by comparing their canonical forms.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:04 +02:00
Fabian Ebner
8ecaa0bfbe api: ceph: create mon: explicitly add subsequent monitors to the monmap
in preparation for supporting multiple addresses. The config section does not
allow more than one public_addr.

Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:04 +02:00
Fabian Ebner
57951fc78b api: ceph: create mon: factor out monmaptool command
so it's easier to re-use for a future variant.

Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:04 +02:00
Fabian Ebner
d3b899c144 api: ceph: create mon: handle ms_bind_ipv* options more generally
mostly relevant to prepare support for IPv4/IPv6 dual stack mode as a special
case of the planned support for multiple public networks.

As before, only set the false value when we are dealing with the first address,
but also be explicit about the IPv4 case as the defaults might change in the
future.

Then, when an address of a different type comes along later, set the relevant
bind option to true.

Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:04 +02:00
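
The bind-option rule from the commit above can be sketched as follows. This is an illustrative Python sketch, not the project's code: the function name bind_options and the returned dictionary are assumptions, and only the option names ms_bind_ipv4 and ms_bind_ipv6 come from Ceph.

```python
import ipaddress


def bind_options(addresses):
    """Derive ms_bind_ipv4/ms_bind_ipv6 values from a list of addresses."""
    options = {}
    for index, addr in enumerate(addresses):
        version = ipaddress.ip_address(addr).version
        this_opt = f"ms_bind_ipv{version}"
        other_opt = "ms_bind_ipv6" if version == 4 else "ms_bind_ipv4"
        if index == 0:
            # First address: be explicit about both options, so the
            # result does not depend on Ceph's defaults.
            options[this_opt] = "true"
            options[other_opt] = "false"
        else:
            # A later address only ever enables its own bind option.
            options[this_opt] = "true"
    return options
```

A single IPv4 address yields ms_bind_ipv4 = true and ms_bind_ipv6 = false; adding an IPv6 address afterwards switches ms_bind_ipv6 to true, which corresponds to the dual stack case.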
Fabian Ebner
6e96b07078 api: ceph: mon: split up arguments for run_command
no functional change is intended.

Reviewed-by: Dominik Csapak <d.csapak@proxmox.com>
Tested-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-18 17:13:04 +02:00
Fabian Ebner
596bb7b11a api: ceph: osd: create: rename size parameters
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-06-09 11:29:34 +02:00
Thomas Lamprecht
51498a2664 ceph: code/indentation cleanup
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-05-03 14:03:32 +02:00
Thomas Lamprecht
0dd48804e1 api: ceph/monitor: automatically disable insecure global ID reclaim after creating first monitor
nautilus 14.2.20 and octopus 15.2.11 fixed a security issue with
reclaiming the global ID auth (CVE-2021-20288). As fixing this issue
means that older clients won't be able to connect anymore, the fix was
done behind a switch, with a HEALTH warning if it was not active
(i.e., disallowed connection from older clients).

New installations have this switch also at the insecure level, for
compat reasons, so let's deactivate it ourselves after monitor creation
to avoid the health warning and slightly insecure setup (in default
PVE ceph the whole issue was of rather low impact/risk). But, only do
so when creating the first monitor of a ceph cluster, to avoid
breaking existing setups by accident.

An admin can always switch it back again, e.g., if they're recovering
from some failure and need to set up fresh monitors but still have old
clients.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-27 12:35:34 +02:00
Thomas Lamprecht
a91bd3c370 api: ceph pool create: replace left-over complex error handling
this was from the time where we had a loop here to add two storages,
one for KRBD-only and one for KRBD-never. Nowadays we can handle the
mixed case just fine, but the patch dropping that forgot to clean up
the error handling.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-21 17:34:23 +02:00
Thomas Lamprecht
84b08e8aec api: ceph/pool: fix formatting of API parameters
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-21 15:45:35 +02:00
Dominik Csapak
08db34257a API2/Ceph/Pools: remove unnecessary boolean conversion
we do nothing with that field, so leave it like it is

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-04-20 18:20:35 +02:00
Alwin Antreich
6b36f36842 ceph: set allowed minimal pg_num down to 1
In Ceph Octopus the device_health_metrics pool is auto-created with 1
PG. Since Ceph has the ability to split/merge PGs, hitting the wrong PG
count is now less of an issue anyhow.

Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-04-20 18:20:35 +02:00
Alwin Antreich
5a3d794242 ceph: add autoscale_status to api calls
The properties target_size_ratio, target_size_bytes and pg_num_min are
used to fine-tune the pg_autoscaler and are set on a pool. The updated
pool list now shows the autoscale settings and status, including the new
(optimal) target PGs, to make it easier for new users to get/set the
correct number of PGs.

Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-04-20 18:20:35 +02:00
Thomas Lamprecht
d7a63207a3 ceph: osd_belongs_to_node: only check tree-entries of type host, refactor
We want to check explicitly for type host, so filter for that first
and create a hash map for easier usage afterwards.

Drop the error when there's no tree, as either RADOS error'd on bad
command already, or there really is no tree (but RADOS worked OK), in
which case we simply return that the OSD did not belong to this node.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-20 18:06:07 +02:00
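
A minimal sketch of the check described in the commit above, in Python rather than the project's own language. The tree layout (a "nodes" list whose entries carry "name", "type" and "children") is an assumption modelled on the JSON output of the OSD tree query, and the function signature is hypothetical.

```python
def osd_belongs_to_node(tree, node_name, osd_id):
    """Return True if osd_id is a child of the host entry node_name."""
    # No tree at all is not an error here: the OSD simply does not
    # belong to the node.
    nodes = (tree or {}).get("nodes") or []
    # Keep only entries of type host and index their children by name.
    hosts = {
        entry["name"]: entry.get("children", [])
        for entry in nodes
        if entry.get("type") == "host"
    }
    return osd_id in hosts.get(node_name, [])
```

Filtering on the host type first avoids matching other tree entries, such as a root or rack, that happen to share a name with a node.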
Dominic Jäger
220173e9c6 Fix #2053: OSD destroy only on specified node
Allow destroying only OSDs that belong to the node that has been specified in
the API path.

So if
 - OSD 1 belongs to node A and
 - OSD 2 belongs to node B
then
 - pvesh delete nodes/A/ceph/osd/1 is allowed but
 - pvesh delete nodes/A/ceph/osd/2 is not

Destroying an OSD via GUI automatically inserts the correct node
into the API path.

pveceph automatically inserts the local node into the API call, too.
Consequently, it can now only destroy local OSDs (fix #2053).
 - pveceph osd destroy 1 is allowed on node A but
 - pveceph osd destroy 2 is not

Signed-off-by: Dominic Jäger <d.jaeger@proxmox.com>
2021-04-20 16:42:12 +02:00
Alwin Antreich
54ba7dd991 ceph: add get api call for single pool
Information of a single pool can be queried.

Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
2021-02-06 14:28:39 +01:00
Alwin Antreich
461e214155 ceph: add titles to ceph_pool_common_options
Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
2021-02-06 14:25:53 +01:00
Alwin Antreich
51d6db5815 ceph: setpool, use parameter extraction instead
of the unneeded ref copy for params.

Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
2021-02-06 14:24:30 +01:00
Alwin Antreich
56d02a863b api: ceph: subclass pools
for better handling and since the pool endpoints got more entries.

Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
2021-02-06 14:17:53 +01:00
Stoiko Ivanov
c92fc8a1e8 api2: osd destroy: untaint device before pvremove
We get the device list from ceph-volume lvm list, and decode the json
output, which at that point is tainted (perlsec (1)).
Untaint it here before calling, because it is currently the only
call-site using the information in a problematic way (run_command).
(the only other call-site being in pve5to6)

Alternatively we could untaint while reading the information, but then
should only return a small subset of the ceph-volume output.

The issue is most likely due to
cb9db10c1a9855cf40ff13e81f9dd97d6a9b2698 in pve-common ('run_command:
improve performance for logging and long lines').

Tested on a virtual test setup by creating OSDs with second DB disk,
and destroying it via GUI (did not manage to get the error without the
DB disk)

Reported via our community forum:
https://forum.proxmox.com/threads/insecure-dependency-in-exec-during-osd-destroy.79574/

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
2020-11-24 23:37:33 +01:00
Stoiko Ivanov
259b557cf4 api2: osd destroy: fix error function
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
2020-11-24 23:37:33 +01:00
Alwin Antreich
2184098ed3 Allow setting device class on osd create
In some situations Ceph's auto-detection doesn't recognize the device
class correctly. The option allows setting it directly on osd create,
instead of altering it afterwards. This way the cluster doesn't need to
shift data back and forth unnecessarily.

Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
2020-07-24 10:26:11 +02:00
Alwin Antreich
e25dda254c Make PVE6 compatible with supported ceph versions
Luminous, Nautilus and Octopus. In Octopus the mon_status was dropped.
Also the ceph status was cleaned up and doesn't provide the mgrmap and
monmap.

The rados queries used in the ceph status API endpoints (cluster / node)
were factored out and merged to one place.

Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
2020-06-03 14:23:38 +02:00
Alwin Antreich
485b2cd10a Fix: ceph: mon_address not considered by new MON
The public_addr option for creating a new MON is only valid for manual
startup (since Ceph Jewel) and is just ignored by ceph-mon during setup.
As the MON is started through systemd after its creation without an IP
specified, it tries to auto-select an IP.

Before this patch the public_addr was only explicitly written to the
ceph.conf if no public_network was set. The mon_address is only needed
in the config on the first start of the MON.

The ceph-mon itself tries to select an IP under the following conditions.
- no public_network or public_addr is in the ceph.conf
    * startup fails

- public_network is in the ceph.conf
    * with a single network, take the first available IP
    * on multiple networks, walk through the list in order and start on
      the first network where an IP is found

Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
2020-04-15 09:52:31 +02:00
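
The selection order listed in the commit above can be sketched like this. It is an illustration in Python of the described behaviour, not ceph-mon's actual implementation; the function name select_mon_ip and its parameters are hypothetical, and returning None stands in for the case where nothing can be selected.

```python
import ipaddress


def select_mon_ip(public_networks, local_ips):
    """Pick the first local IP found in the first matching public network.

    public_networks: CIDR strings in configuration order.
    local_ips: addresses configured on the node.
    """
    for net in public_networks:
        network = ipaddress.ip_network(net, strict=False)
        for ip in local_ips:
            addr = ipaddress.ip_address(ip)
            # Only compare addresses of the same IP version.
            if addr.version == network.version and addr in network:
                return ip
    # No public network configured, or no local IP in any of them.
    return None
```

Because the networks are walked in configuration order, the order of the public_network entries decides which address a monitor ends up with when a node has an IP in more than one of them.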
Dominik Csapak
7e98f79e40 ceph: make all service name regexes the same
instead of having multiple regexes in various places for the name,
define a 'SERVICE_REGEX' in PVE::Ceph::Services, and use that
everywhere in the api where we need it

additionally limit new services to 200 characters, since
systemd units have a limit of 256 characters[0] (including suffix), and
200 seems to be enough.

users can now create ceph services on machines with hostnames
longer than 32 characters

0: https://www.freedesktop.org/software/systemd/man/systemd.unit.html

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-03-04 15:38:09 +01:00
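
A rough sketch of the length reasoning in the commit above, in Python. The pattern below is a hypothetical stand-in and not the SERVICE_REGEX defined in PVE::Ceph::Services, and the unit prefix and suffix are assumptions chosen for illustration; only the limits of 200 and 256 characters come from the commit message.

```python
import re

MAX_ID_LENGTH = 200        # limit for new service names, per the commit message
SYSTEMD_UNIT_LIMIT = 256   # unit name limit including suffix, as cited above

# Hypothetical pattern: alphanumeric, hyphens allowed in the middle.
ID_PATTERN = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9\-]*[A-Za-z0-9])?")


def is_valid_service_id(service_id, unit_prefix="ceph-mon@", unit_suffix=".service"):
    """Check a service ID against the length limit and the assumed pattern."""
    if len(service_id) > MAX_ID_LENGTH:
        return False
    if ID_PATTERN.fullmatch(service_id) is None:
        return False
    # The complete unit name must still fit into the systemd limit.
    unit_length = len(unit_prefix) + len(service_id) + len(unit_suffix)
    return unit_length <= SYSTEMD_UNIT_LIMIT
```

Capping the ID at 200 characters leaves room for the unit prefix and suffix below the 256-character limit, while still admitting hostnames longer than 32 characters.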
Dominik Csapak
a0ef509a66 ceph: do not check ips if no network is configured
the network and the cluster network are optional in the ceph config
and with 'pveceph init', so only check if we have an ip address
from those networks if it is actually configured

otherwise, the createosd call dies with an 'ip' error message
even if it would work

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2020-03-04 15:38:09 +01:00
Thomas Lamprecht
a05349ab35 followup: add a bit of context to error message
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-12-16 15:38:50 +01:00
Aaron Lauterer
05bd76ac0e API: OSD: Fix #2496 Check OSD Network
It's possible to have a situation where the cluster network (used for
inter-OSD traffic) is not configured on a node. The OSD can still be
created but can't communicate.

This check will abort the creation if there is no IP within the subnet
of the cluster network present on the node. If there is no dedicated
cluster network the public network is used as a failsafe even though
this situation should not occur.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2019-12-16 15:12:18 +01:00
Alwin Antreich
4a8145e329 ceph: Create symlink on standalone MGR creation
Ceph MGR fails to start when installed on a node without existing
symlink to /etc/pve/ceph.conf.

Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
2019-12-09 14:11:05 +01:00
Thomas Lamprecht
f6b2b1708f api mon: allow full-mesh routed setup for monitor IP
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-26 15:42:24 +01:00
Thomas Lamprecht
a740deff88 fix typos all over the place
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-09-03 07:55:32 +02:00
Dominik Csapak
385df8382d fix #2341: ceph: osd create: allow db/wal on partitioned disks
It was intended that for partitioned disks, we create one and use it.
Instead, the code always died when the disk was used and not of type 'LVM'.

We now correctly check the two cases:
* used for partitions and has gpt
* used and lvm

The remaining api call handles those two cases correctly

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-08-22 14:09:20 +02:00
Thomas Lamprecht
7ef69f338e ceph tools: factor out frequent keyring and config init check
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-07-23 07:48:45 +02:00
Thomas Lamprecht
cead98bd69 api/osd: opinionated code cleanup of list
among others: reduce use of sub-hash as index for another hash

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-07-22 16:25:07 +02:00
Dominik Csapak
69ad2e539e ceph: osd list: add hostversions to the host nodes
we want to improve the version hints in the osd tree gui and need
the version at the host nodes

we could (and want to) workaround it in the gui to have that
info for both versions of the api call

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-07-22 15:52:07 +02:00
Thomas Lamprecht
67d8218fbd fix #2292: ceph osd create: use size parameter for db/wal
commit 970f96fdbb did not account for
getting the correct size parameter from the api call, so we always
ignored it, resulting in users not being able to set an explicit db/wal
size

Originally-fixed-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-07-19 11:05:49 +02:00
Fabian Grünbichler
b4cb37e057 ceph destroymon: actually die on errors
instead of silently ignoring them. since we are in a task worker here
this is especially important - otherwise the task status/result is also
wrong!

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2019-07-17 13:01:31 +02:00
Thomas Lamprecht
7c9f66d036 followup code cleanup
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-07-15 10:57:00 +02:00
Dominik Csapak
199aa9efb7 ceph: mon list: show only as running when monitor is quorate
nautilus also puts non-running monitors in the monmap, so only show
a monitor as running when it has quorum

this is also not 100% correct, but the only 'correct' alternative is
to try and get/parse the systemd status of the units and broadcast it
to the pmxcfs

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-07-15 10:56:14 +02:00
Thomas Lamprecht
9cc5ac9e75 api/ceph: code cleanup
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-07-11 14:16:11 +02:00
Dominik Csapak
b7701301a8 api/ceph: add osd scrub api call
can be called to (deep) scrub a specific osd

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-07-11 14:16:06 +02:00
Dominik Csapak
351d128f80 ceph: mon create: add known monitor ips to mon_host if it is empty
this fixes an issue where having only one monitor in mon_host, which is
offline, prevents a client connection

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-07-04 09:57:50 +02:00
Dominik Csapak
217dde83f0 ceph: osd: use get-or-create to create a bootstrap-osd key on demand
if for some reason the cluster does not have this key, generate it

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-07-04 09:57:50 +02:00
Dominik Csapak
7712a4e151 ceph: osd create: check for auth before getting bootstrap key
we do not need it if auth is 'none'

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2019-07-04 09:57:50 +02:00
Thomas Lamprecht
8ec913c1cc followup: do not use string comparison for integers
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-07-03 15:34:19 +02:00