mirror of https://git.proxmox.com/git/ceph.git (synced 2025-04-28 10:45:26 +00:00)

import ceph reef 18.2.4

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>

parent e9fe820e7f
commit f38dd50b34
@ -1,7 +1,7 @@
cmake_minimum_required(VERSION 3.16)

project(ceph
VERSION 18.2.2
VERSION 18.2.4
LANGUAGES CXX C ASM)

cmake_policy(SET CMP0028 NEW)
@ -247,6 +247,15 @@ set(HAVE_LIBURING ${WITH_LIBURING})
CMAKE_DEPENDENT_OPTION(WITH_SYSTEM_LIBURING "Require and build with system liburing" OFF
"HAVE_LIBAIO;WITH_BLUESTORE" OFF)

if(WITH_LIBURING)
if(WITH_SYSTEM_LIBURING)
find_package(uring REQUIRED)
else()
include(Builduring)
build_uring()
endif()
endif()

CMAKE_DEPENDENT_OPTION(WITH_BLUESTORE_PMEM "Enable PMDK libraries" OFF
"WITH_BLUESTORE" OFF)
if(WITH_BLUESTORE_PMEM)
@ -679,7 +688,7 @@ if(WITH_SYSTEM_NPM)
message(FATAL_ERROR "Can't find npm.")
endif()
endif()
set(DASHBOARD_FRONTEND_LANGS "" CACHE STRING
set(DASHBOARD_FRONTEND_LANGS "ALL" CACHE STRING
"List of comma separated ceph-dashboard frontend languages to build. \
Use value `ALL` to build all languages")
CMAKE_DEPENDENT_OPTION(WITH_MGR_ROOK_CLIENT "Enable the mgr's Rook support" ON
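The liburing hunk above adds a WITH_SYSTEM_LIBURING switch next to the existing WITH_LIBURING option. As a rough, hedged sketch only (source and build directories are placeholders, not part of this commit), configuring a build against a distribution-provided liburing could look like:

    # assumes the liburing development headers and library are already installed
    cmake -S . -B build -DWITH_LIBURING=ON -DWITH_SYSTEM_LIBURING=ON
    cmake --build build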
@ -1,3 +1,17 @@
>=18.2.2
--------

* RBD: When diffing against the beginning of time (`fromsnapname == NULL`) in
  fast-diff mode (`whole_object == true` with `fast-diff` image feature enabled
  and valid), diff-iterate is now guaranteed to execute locally if exclusive
  lock is available. This brings a dramatic performance improvement for QEMU
  live disk synchronization and backup use cases.
* RADOS: `get_pool_is_selfmanaged_snaps_mode` C++ API has been deprecated
  due to being prone to false negative results. Its safer replacement is
  `pool_is_in_selfmanaged_snaps_mode`.
* RBD: The option ``--image-id`` has been added to the `rbd children` CLI command,
  so it can be run for images in the trash.
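As an editorial illustration of the last item (not part of the upstream notes; the pool name and image id are placeholders), listing the clones of a trashed parent image could look like:

    # list children of an image that now lives in the trash, addressed by id
    rbd children --pool mypool --image-id 10743d8ae2c5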
>=19.0.0

* RGW: S3 multipart uploads using Server-Side Encryption now replicate correctly in
@ -47,6 +61,52 @@
  affected and to clean them up accordingly.
* mgr/snap-schedule: For clusters with multiple CephFS file systems, all the
  snap-schedule commands now expect the '--fs' argument.
* RGW: Fixed an S3 Object Lock bug with PutObjectRetention requests that specify
  a RetainUntilDate after the year 2106. This date was truncated to 32 bits when
  stored, so a much earlier date was used for object lock enforcement. This does
  not affect PutBucketObjectLockConfiguration where a duration is given in Days.
  The RetainUntilDate encoding is fixed for new PutObjectRetention requests, but
  cannot repair the dates of existing object locks. Such objects can be identified
  with a HeadObject request based on the x-amz-object-lock-retain-until-date
  response header (an illustrative check is sketched at the end of this section).
* RADOS: `get_pool_is_selfmanaged_snaps_mode` C++ API has been deprecated
  due to being prone to false negative results. Its safer replacement is
  `pool_is_in_selfmanaged_snaps_mode`.
* RADOS: For bug 62338 (https://tracker.ceph.com/issues/62338), we did not choose
  to condition the fix on a server flag in order to simplify backporting. As
  a result, in rare cases it may be possible for a PG to flip between two acting
  sets while an upgrade to a version with the fix is in progress. If you observe
  this behavior, you should be able to work around it by completing the upgrade or
  by disabling async recovery by setting osd_async_recovery_min_cost to a very
  large value on all OSDs until the upgrade is complete:
  ``ceph config set osd osd_async_recovery_min_cost 1099511627776``
* RADOS: A detailed version of the `balancer status` CLI command in the balancer
  module is now available. Users may run `ceph balancer status detail` to see more
  details about which PGs were updated in the balancer's last optimization.
  See https://docs.ceph.com/en/latest/rados/operations/balancer/ for more information.
* CephFS: For clusters with multiple CephFS file systems, all the snap-schedule
  commands now expect the '--fs' argument (see the sketch after this list).
* CephFS: The period specifier ``m`` now implies minutes and the period specifier
  ``M`` now implies months. This has been made consistent with the rest
  of the system.
* CephFS: Full support for subvolumes and subvolume groups is now available
  for snap_schedule Manager module.

* CephFS: The `subvolume snapshot clone` command now depends on the config option
  `snapshot_clone_no_wait` which is used to reject the clone operation when
  all the cloner threads are busy. This config option is enabled by default which means
  that if no cloner threads are free, the clone request errors out with EAGAIN.
  The value of the config option can be fetched by using:
  `ceph config get mgr mgr/volumes/snapshot_clone_no_wait`
  and it can be disabled by using:
  `ceph config set mgr mgr/volumes/snapshot_clone_no_wait false`
* CephFS: fixes to the implementation of the ``root_squash`` mechanism enabled
  via cephx ``mds`` caps on a client credential require a new client feature
  bit, ``client_mds_auth_caps``. Clients using credentials with ``root_squash``
  without this feature will trigger the MDS to raise a HEALTH_ERR on the
  cluster, MDS_CLIENTS_BROKEN_ROOTSQUASH. See the documentation on this warning
  and the new feature bit for more information.
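A couple of editorial sketches for the items above; these are illustrations only, not part of the upstream release notes. Bucket, key, file system name, path, and schedule values are placeholders, and the exact snap-schedule argument order should be checked against the snap_schedule module documentation:

    # Object Lock: inspect the stored retention date of a possibly affected
    # object via a HeadObject request (AWS CLI shown here as an assumption)
    aws s3api head-object --bucket mybucket --key mykey \
        --query ObjectLockRetainUntilDate

    # snap-schedule on a cluster with several CephFS file systems now takes
    # an explicit --fs; per the period-specifier note, "1h" is hourly,
    # "1m" is every minute and "1M" is monthly
    ceph fs snap-schedule add / 1h --fs cephfs
    ceph fs snap-schedule status / --fs cephfs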
>=18.0.0

@ -54,6 +114,10 @@
  mirroring policies between RGW and AWS, you may wish to set
  "rgw policy reject invalid principals" to "false". This affects only newly set
  policies, not policies that are already in place.
* The CephFS automatic metadata load (sometimes called "default") balancer is
  now disabled by default. The new file system flag `balance_automate`
  can be used to toggle it on or off. It can be enabled or disabled via
  `ceph fs set <fs_name> balance_automate <bool>`.
* RGW's default backend for `rgw_enable_ops_log` changed from RADOS to file.
  The default value of `rgw_ops_log_rados` is now false, and `rgw_ops_log_file_path`
  defaults to "/var/log/ceph/ops-log-$cluster-$name.log".
@ -226,6 +290,11 @@
  than the number mentioned against the config tunable `mds_max_snaps_per_dir`
  so that a new snapshot can be created and retained during the next schedule
  run.
* `ceph config dump --format <json|xml>` output will display the localized
  option names instead of its normalized version. For example,
  "mgr/prometheus/x/server_port" will be displayed instead of
  "mgr/prometheus/server_port". This matches the output of the non pretty-print
  formatted version of the command.

>=17.2.1

@ -291,3 +360,14 @@ Relevant tracker: https://tracker.ceph.com/issues/55715
  request from client(s). This can be useful during some recovery situations
  where it's desirable to bring MDS up but have no client workload.
  Relevant tracker: https://tracker.ceph.com/issues/57090

* New MDSMap field `max_xattr_size` which can be set using the `fs set` command.
  This MDSMap field allows configuring the maximum size allowed for the full
  key/value set of a file system's extended attributes. It effectively replaces
  the old per-MDS `max_xattr_pairs_size` setting, which is now dropped.
  Relevant tracker: https://tracker.ceph.com/issues/55725

* Introduced a new file system flag `refuse_standby_for_another_fs` that can be
  set using the `fs set` command. This flag prevents using a standby for another
  file system (join_fs = X) when a standby for the current file system is not available.
  Relevant tracker: https://tracker.ceph.com/issues/61599
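Both flags above are applied through `fs set`; a hedged example (the file system name and the size value are placeholders):

    ceph fs set cephfs max_xattr_size 65536
    ceph fs set cephfs refuse_standby_for_another_fs true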
@ -1,4 +1,4 @@
Sphinx == 4.5.0
Sphinx == 5.0.2
git+https://github.com/ceph/sphinx-ditaa.git@py3#egg=sphinx-ditaa
git+https://github.com/vlasovskikh/funcparserlib.git
breathe >= 4.20.0,!=4.33
@ -1,6 +1,6 @@
ceph-menv

Environment assistant for use in conjuction with multiple ceph vstart (or more accurately mstart) clusters. Eliminates the need to specify the cluster that is being used with each and every command. Can provide a shell prompt feedback about the currently used cluster.
Environment assistant for use in conjunction with multiple Ceph vstart (or more accurately mstart) clusters. Eliminates the need to specify the cluster that is being used with each and every command. Can provide a shell prompt feedback about the currently used cluster.


Usage:
ceph/ceph.spec
@ -35,8 +35,8 @@
%else
%bcond_with rbd_rwl_cache
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?rhel} < 9
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%if 0%{?rhel} < 9 || 0%{?openEuler}
%bcond_with system_pmdk
%else
%ifarch s390x aarch64
@ -93,7 +93,7 @@
%endif
%endif
%bcond_with seastar
%if 0%{?suse_version}
%if 0%{?suse_version} || 0%{?openEuler}
%bcond_with jaeger
%else
%bcond_without jaeger
@ -112,7 +112,7 @@
# this is tracked in https://bugzilla.redhat.com/2152265
%bcond_with system_arrow
%endif
%if 0%{?fedora} || 0%{?suse_version} || 0%{?rhel} >= 8
%if 0%{?fedora} || 0%{?suse_version} || 0%{?rhel} >= 8 || 0%{?openEuler}
%global weak_deps 1
%endif
%if %{with selinux}
@ -170,7 +170,7 @@
# main package definition
#################################################################################
Name: ceph
Version: 18.2.2
Version: 18.2.4
Release: 0%{?dist}
%if 0%{?fedora} || 0%{?rhel}
Epoch: 2
@ -186,7 +186,7 @@ License: LGPL-2.1 and LGPL-3.0 and CC-BY-SA-3.0 and GPL-2.0 and BSL-1.0 and BSD-
Group: System/Filesystems
%endif
URL: http://ceph.com/
Source0: %{?_remote_tarball_prefix}ceph-18.2.2.tar.bz2
Source0: %{?_remote_tarball_prefix}ceph-18.2.4.tar.bz2
%if 0%{?suse_version}
# _insert_obs_source_lines_here
ExclusiveArch: x86_64 aarch64 ppc64le s390x
@ -211,7 +211,7 @@ BuildRequires: selinux-policy-devel
BuildRequires: gperf
BuildRequires: cmake > 3.5
BuildRequires: fuse-devel
%if 0%{?fedora} || 0%{?suse_version} > 1500 || 0%{?rhel} == 9
%if 0%{?fedora} || 0%{?suse_version} > 1500 || 0%{?rhel} == 9 || 0%{?openEuler}
BuildRequires: gcc-c++ >= 11
%endif
%if 0%{?suse_version} == 1500
@ -222,12 +222,12 @@ BuildRequires: %{gts_prefix}-gcc-c++
BuildRequires: %{gts_prefix}-build
BuildRequires: %{gts_prefix}-libatomic-devel
%endif
%if 0%{?fedora} || 0%{?rhel} == 9
%if 0%{?fedora} || 0%{?rhel} == 9 || 0%{?openEuler}
BuildRequires: libatomic
%endif
%if 0%{with tcmalloc}
# libprofiler did not build on ppc64le until 2.7.90
%if 0%{?fedora} || 0%{?rhel} >= 8
%if 0%{?fedora} || 0%{?rhel} >= 8 || 0%{?openEuler}
BuildRequires: gperftools-devel >= 2.7.90
%endif
%if 0%{?rhel} && 0%{?rhel} < 8
@ -379,7 +379,7 @@ BuildRequires: liblz4-devel >= 1.7
BuildRequires: golang-github-prometheus-prometheus
BuildRequires: jsonnet
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
Requires: systemd
BuildRequires: boost-random
BuildRequires: nss-devel
@ -401,7 +401,7 @@ BuildRequires: lz4-devel >= 1.7
# distro-conditional make check dependencies
%if 0%{with make_check}
BuildRequires: golang
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
BuildRequires: golang-github-prometheus
BuildRequires: libtool-ltdl-devel
BuildRequires: xmlsec1
@ -412,7 +412,6 @@ BuildRequires: xmlsec1-nss
BuildRequires: xmlsec1-openssl
BuildRequires: xmlsec1-openssl-devel
BuildRequires: python%{python3_pkgversion}-cherrypy
BuildRequires: python%{python3_pkgversion}-jwt
BuildRequires: python%{python3_pkgversion}-routes
BuildRequires: python%{python3_pkgversion}-scipy
BuildRequires: python%{python3_pkgversion}-werkzeug
@ -425,7 +424,6 @@ BuildRequires: libxmlsec1-1
BuildRequires: libxmlsec1-nss1
BuildRequires: libxmlsec1-openssl1
BuildRequires: python%{python3_pkgversion}-CherryPy
BuildRequires: python%{python3_pkgversion}-PyJWT
BuildRequires: python%{python3_pkgversion}-Routes
BuildRequires: python%{python3_pkgversion}-Werkzeug
BuildRequires: python%{python3_pkgversion}-numpy-devel
@ -435,7 +433,7 @@ BuildRequires: xmlsec1-openssl-devel
%endif
# lttng and babeltrace for rbd-replay-prep
%if %{with lttng}
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
BuildRequires: lttng-ust-devel
BuildRequires: libbabeltrace-devel
%endif
@ -447,15 +445,18 @@ BuildRequires: babeltrace-devel
%if 0%{?suse_version}
BuildRequires: libexpat-devel
%endif
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
BuildRequires: expat-devel
%endif
#hardened-cc1
%if 0%{?fedora} || 0%{?rhel}
BuildRequires: redhat-rpm-config
%endif
%if 0%{?openEuler}
BuildRequires: openEuler-rpm-config
%endif
%if 0%{with seastar}
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
BuildRequires: cryptopp-devel
BuildRequires: numactl-devel
%endif
@ -543,7 +544,7 @@ Requires: python%{python3_pkgversion}-cephfs = %{_epoch_prefix}%{version}-%{rele
Requires: python%{python3_pkgversion}-rgw = %{_epoch_prefix}%{version}-%{release}
Requires: python%{python3_pkgversion}-ceph-argparse = %{_epoch_prefix}%{version}-%{release}
Requires: python%{python3_pkgversion}-ceph-common = %{_epoch_prefix}%{version}-%{release}
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
Requires: python%{python3_pkgversion}-prettytable
%endif
%if 0%{?suse_version}
@ -615,9 +616,8 @@ Requires: ceph-mgr = %{_epoch_prefix}%{version}-%{release}
Requires: ceph-grafana-dashboards = %{_epoch_prefix}%{version}-%{release}
Requires: ceph-prometheus-alerts = %{_epoch_prefix}%{version}-%{release}
Requires: python%{python3_pkgversion}-setuptools
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
Requires: python%{python3_pkgversion}-cherrypy
Requires: python%{python3_pkgversion}-jwt
Requires: python%{python3_pkgversion}-routes
Requires: python%{python3_pkgversion}-werkzeug
%if 0%{?weak_deps}
@ -626,7 +626,6 @@ Recommends: python%{python3_pkgversion}-saml
%endif
%if 0%{?suse_version}
Requires: python%{python3_pkgversion}-CherryPy
Requires: python%{python3_pkgversion}-PyJWT
Requires: python%{python3_pkgversion}-Routes
Requires: python%{python3_pkgversion}-Werkzeug
Recommends: python%{python3_pkgversion}-python3-saml
@ -645,7 +644,7 @@ Group: System/Filesystems
%endif
Requires: ceph-mgr = %{_epoch_prefix}%{version}-%{release}
Requires: python%{python3_pkgversion}-numpy
%if 0%{?fedora} || 0%{?suse_version}
%if 0%{?fedora} || 0%{?suse_version} || 0%{?openEuler}
Requires: python%{python3_pkgversion}-scikit-learn
%endif
Requires: python3-scipy
@ -665,7 +664,7 @@ Requires: python%{python3_pkgversion}-pyOpenSSL
Requires: python%{python3_pkgversion}-requests
Requires: python%{python3_pkgversion}-dateutil
Requires: python%{python3_pkgversion}-setuptools
%if 0%{?fedora} || 0%{?rhel} >= 8
%if 0%{?fedora} || 0%{?rhel} >= 8 || 0%{?openEuler}
Requires: python%{python3_pkgversion}-cherrypy
Requires: python%{python3_pkgversion}-pyyaml
Requires: python%{python3_pkgversion}-werkzeug
@ -722,7 +721,7 @@ Requires: openssh
Requires: python%{python3_pkgversion}-CherryPy
Requires: python%{python3_pkgversion}-Jinja2
%endif
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Requires: openssh-clients
Requires: python%{python3_pkgversion}-cherrypy
Requires: python%{python3_pkgversion}-jinja2
@ -814,7 +813,7 @@ Requires: ceph-selinux = %{_epoch_prefix}%{version}-%{release}
%endif
Requires: librados2 = %{_epoch_prefix}%{version}-%{release}
Requires: librgw2 = %{_epoch_prefix}%{version}-%{release}
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Requires: mailcap
%endif
%if 0%{?weak_deps}
@ -894,6 +893,7 @@ Requires: parted
Requires: util-linux
Requires: xfsprogs
Requires: python%{python3_pkgversion}-setuptools
Requires: python%{python3_pkgversion}-packaging
Requires: python%{python3_pkgversion}-ceph-common = %{_epoch_prefix}%{version}-%{release}
%description volume
This package contains a tool to deploy OSD with different devices like
@ -905,7 +905,7 @@ Summary: RADOS distributed object store client library
%if 0%{?suse_version}
Group: System/Libraries
%endif
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Obsoletes: ceph-libs < %{_epoch_prefix}%{version}-%{release}
%endif
%description -n librados2
@ -1052,7 +1052,7 @@ Requires: librados2 = %{_epoch_prefix}%{version}-%{release}
%if 0%{?suse_version}
Requires(post): coreutils
%endif
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Obsoletes: ceph-libs < %{_epoch_prefix}%{version}-%{release}
%endif
%description -n librbd1
@ -1096,7 +1096,7 @@ Summary: Ceph distributed file system client library
Group: System/Libraries
%endif
Obsoletes: libcephfs1 < %{_epoch_prefix}%{version}-%{release}
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Obsoletes: ceph-libs < %{_epoch_prefix}%{version}-%{release}
Obsoletes: ceph-libcephfs
%endif
@ -1149,7 +1149,7 @@ descriptions, and submitting the command to the appropriate daemon.
%package -n python%{python3_pkgversion}-ceph-common
Summary: Python 3 utility libraries for Ceph
%if 0%{?fedora} || 0%{?rhel} >= 8
%if 0%{?fedora} || 0%{?rhel} >= 8 || 0%{?openEuler}
Requires: python%{python3_pkgversion}-pyyaml
%endif
%if 0%{?suse_version}
@ -1288,11 +1288,20 @@ Group: System/Monitoring
%description mib
This package provides a Ceph MIB for SNMP traps.
%package node-proxy
Summary: hw monitoring agent for Ceph
BuildArch: noarch
%if 0%{?suse_version}
Group: System/Monitoring
%endif
%description node-proxy
This package provides a Ceph hardware monitoring agent.
#################################################################################
# common
#################################################################################
%prep
%autosetup -p1 -n ceph-18.2.2
%autosetup -p1 -n ceph-18.2.4
%build
# Disable lto on systems that do not support symver attribute
@ -1467,7 +1476,7 @@ install -m 0755 %{buildroot}%{_bindir}/crimson-osd %{buildroot}%{_bindir}/ceph-o
%endif
install -m 0644 -D src/etc-rbdmap %{buildroot}%{_sysconfdir}/ceph/rbdmap
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
install -m 0644 -D etc/sysconfig/ceph %{buildroot}%{_sysconfdir}/sysconfig/ceph
%endif
%if 0%{?suse_version}
@ -1501,7 +1510,7 @@ install -m 0644 -D udev/50-rbd.rules %{buildroot}%{_udevrulesdir}/50-rbd.rules
# sudoers.d
install -m 0440 -D sudoers.d/ceph-smartctl %{buildroot}%{_sysconfdir}/sudoers.d/ceph-smartctl
%if 0%{?rhel} >= 8
%if 0%{?rhel} >= 8 || 0%{?openEuler}
pathfix.py -pni "%{__python3} %{py3_shbang_opts}" %{buildroot}%{_bindir}/*
pathfix.py -pni "%{__python3} %{py3_shbang_opts}" %{buildroot}%{_sbindir}/*
%endif
@ -1538,7 +1547,7 @@ install -m 644 -D -t %{buildroot}%{_datadir}/snmp/mibs monitoring/snmp/CEPH-MIB.
%fdupes %{buildroot}%{_prefix}
%endif
%if 0%{?rhel} == 8
%if 0%{?rhel} == 8 || 0%{?openEuler}
%py_byte_compile %{__python3} %{buildroot}%{python3_sitelib}
%endif
@ -1581,7 +1590,7 @@ rm -rf %{_vpath_builddir}
%{_libdir}/libosd_tp.so*
%endif
%config(noreplace) %{_sysconfdir}/logrotate.d/ceph
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%config(noreplace) %{_sysconfdir}/sysconfig/ceph
%endif
%if 0%{?suse_version}
@ -1614,7 +1623,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph.target ceph-crash.service >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph.target ceph-crash.service
%endif
if [ $1 -eq 1 ] ; then
@ -1625,7 +1634,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph.target ceph-crash.service
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph.target ceph-crash.service
%endif
@ -1722,7 +1731,7 @@ exit 0
%pre common
CEPH_GROUP_ID=167
CEPH_USER_ID=167
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
/usr/sbin/groupadd ceph -g $CEPH_GROUP_ID -o -r 2>/dev/null || :
/usr/sbin/useradd ceph -u $CEPH_USER_ID -o -r -g ceph -s /sbin/nologin -c "Ceph daemons" -d %{_localstatedir}/lib/ceph 2>/dev/null || :
%endif
@ -1768,7 +1777,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-mds@\*.service ceph-mds.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-mds@\*.service ceph-mds.target
%endif
if [ $1 -eq 1 ] ; then
@ -1779,7 +1788,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-mds@\*.service ceph-mds.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-mds@\*.service ceph-mds.target
%endif
@ -1813,7 +1822,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-mgr@\*.service ceph-mgr.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-mgr@\*.service ceph-mgr.target
%endif
if [ $1 -eq 1 ] ; then
@ -1824,7 +1833,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-mgr@\*.service ceph-mgr.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-mgr@\*.service ceph-mgr.target
%endif
@ -1953,7 +1962,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-mon@\*.service ceph-mon.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-mon@\*.service ceph-mon.target
%endif
if [ $1 -eq 1 ] ; then
@ -1964,7 +1973,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-mon@\*.service ceph-mon.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-mon@\*.service ceph-mon.target
%endif
@ -2002,7 +2011,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset cephfs-mirror@\*.service cephfs-mirror.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post cephfs-mirror@\*.service cephfs-mirror.target
%endif
if [ $1 -eq 1 ] ; then
@ -2013,7 +2022,7 @@ fi
%if 0%{?suse_version}
%service_del_preun cephfs-mirror@\*.service cephfs-mirror.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun cephfs-mirror@\*.service cephfs-mirror.target
%endif
@ -2033,6 +2042,7 @@ fi
%files -n ceph-exporter
%{_bindir}/ceph-exporter
%{_unitdir}/ceph-exporter.service
%files -n rbd-fuse
%{_bindir}/rbd-fuse
@ -2050,7 +2060,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-rbd-mirror@\*.service ceph-rbd-mirror.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-rbd-mirror@\*.service ceph-rbd-mirror.target
%endif
if [ $1 -eq 1 ] ; then
@ -2061,7 +2071,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-rbd-mirror@\*.service ceph-rbd-mirror.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-rbd-mirror@\*.service ceph-rbd-mirror.target
%endif
@ -2091,7 +2101,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-immutable-object-cache@\*.service ceph-immutable-object-cache.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-immutable-object-cache@\*.service ceph-immutable-object-cache.target
%endif
if [ $1 -eq 1 ] ; then
@ -2102,7 +2112,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-immutable-object-cache@\*.service ceph-immutable-object-cache.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-immutable-object-cache@\*.service ceph-immutable-object-cache.target
%endif
@ -2145,7 +2155,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-radosgw@\*.service ceph-radosgw.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-radosgw@\*.service ceph-radosgw.target
%endif
if [ $1 -eq 1 ] ; then
@ -2156,7 +2166,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-radosgw@\*.service ceph-radosgw.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-radosgw@\*.service ceph-radosgw.target
%endif
@ -2196,7 +2206,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-osd@\*.service ceph-osd.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-osd@\*.service ceph-osd.target
%endif
if [ $1 -eq 1 ] ; then
@ -2212,7 +2222,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-osd@\*.service ceph-osd.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-osd@\*.service ceph-osd.target
%endif
@ -2251,7 +2261,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-volume@\*.service >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-volume@\*.service
%endif
@ -2259,7 +2269,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-volume@\*.service
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-volume@\*.service
%endif
@ -2620,4 +2630,10 @@ exit 0
%attr(0755,root,root) %dir %{_datadir}/snmp
%{_datadir}/snmp/mibs
%files node-proxy
%{_sbindir}/ceph-node-proxy
%dir %{python3_sitelib}/ceph_node_proxy
%{python3_sitelib}/ceph_node_proxy/*
%{python3_sitelib}/ceph_node_proxy-*
%changelog
@ -35,8 +35,8 @@
%else
%bcond_with rbd_rwl_cache
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?rhel} < 9
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%if 0%{?rhel} < 9 || 0%{?openEuler}
%bcond_with system_pmdk
%else
%ifarch s390x aarch64
@ -93,7 +93,7 @@
%endif
%endif
%bcond_with seastar
%if 0%{?suse_version}
%if 0%{?suse_version} || 0%{?openEuler}
%bcond_with jaeger
%else
%bcond_without jaeger
@ -112,7 +112,7 @@
# this is tracked in https://bugzilla.redhat.com/2152265
%bcond_with system_arrow
%endif
%if 0%{?fedora} || 0%{?suse_version} || 0%{?rhel} >= 8
%if 0%{?fedora} || 0%{?suse_version} || 0%{?rhel} >= 8 || 0%{?openEuler}
%global weak_deps 1
%endif
%if %{with selinux}
@ -211,7 +211,7 @@ BuildRequires: selinux-policy-devel
BuildRequires: gperf
BuildRequires: cmake > 3.5
BuildRequires: fuse-devel
%if 0%{?fedora} || 0%{?suse_version} > 1500 || 0%{?rhel} == 9
%if 0%{?fedora} || 0%{?suse_version} > 1500 || 0%{?rhel} == 9 || 0%{?openEuler}
BuildRequires: gcc-c++ >= 11
%endif
%if 0%{?suse_version} == 1500
@ -222,12 +222,12 @@ BuildRequires: %{gts_prefix}-gcc-c++
BuildRequires: %{gts_prefix}-build
BuildRequires: %{gts_prefix}-libatomic-devel
%endif
%if 0%{?fedora} || 0%{?rhel} == 9
%if 0%{?fedora} || 0%{?rhel} == 9 || 0%{?openEuler}
BuildRequires: libatomic
%endif
%if 0%{with tcmalloc}
# libprofiler did not build on ppc64le until 2.7.90
%if 0%{?fedora} || 0%{?rhel} >= 8
%if 0%{?fedora} || 0%{?rhel} >= 8 || 0%{?openEuler}
BuildRequires: gperftools-devel >= 2.7.90
%endif
%if 0%{?rhel} && 0%{?rhel} < 8
@ -379,7 +379,7 @@ BuildRequires: liblz4-devel >= 1.7
BuildRequires: golang-github-prometheus-prometheus
BuildRequires: jsonnet
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
Requires: systemd
BuildRequires: boost-random
BuildRequires: nss-devel
@ -401,7 +401,7 @@ BuildRequires: lz4-devel >= 1.7
# distro-conditional make check dependencies
%if 0%{with make_check}
BuildRequires: golang
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
BuildRequires: golang-github-prometheus
BuildRequires: libtool-ltdl-devel
BuildRequires: xmlsec1
@ -412,7 +412,6 @@ BuildRequires: xmlsec1-nss
BuildRequires: xmlsec1-openssl
BuildRequires: xmlsec1-openssl-devel
BuildRequires: python%{python3_pkgversion}-cherrypy
BuildRequires: python%{python3_pkgversion}-jwt
BuildRequires: python%{python3_pkgversion}-routes
BuildRequires: python%{python3_pkgversion}-scipy
BuildRequires: python%{python3_pkgversion}-werkzeug
@ -425,7 +424,6 @@ BuildRequires: libxmlsec1-1
BuildRequires: libxmlsec1-nss1
BuildRequires: libxmlsec1-openssl1
BuildRequires: python%{python3_pkgversion}-CherryPy
BuildRequires: python%{python3_pkgversion}-PyJWT
BuildRequires: python%{python3_pkgversion}-Routes
BuildRequires: python%{python3_pkgversion}-Werkzeug
BuildRequires: python%{python3_pkgversion}-numpy-devel
@ -435,7 +433,7 @@ BuildRequires: xmlsec1-openssl-devel
%endif
# lttng and babeltrace for rbd-replay-prep
%if %{with lttng}
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
BuildRequires: lttng-ust-devel
BuildRequires: libbabeltrace-devel
%endif
@ -447,15 +445,18 @@ BuildRequires: babeltrace-devel
%if 0%{?suse_version}
BuildRequires: libexpat-devel
%endif
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
BuildRequires: expat-devel
%endif
#hardened-cc1
%if 0%{?fedora} || 0%{?rhel}
BuildRequires: redhat-rpm-config
%endif
%if 0%{?openEuler}
BuildRequires: openEuler-rpm-config
%endif
%if 0%{with seastar}
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
BuildRequires: cryptopp-devel
BuildRequires: numactl-devel
%endif
@ -543,7 +544,7 @@ Requires: python%{python3_pkgversion}-cephfs = %{_epoch_prefix}%{version}-%{rele
Requires: python%{python3_pkgversion}-rgw = %{_epoch_prefix}%{version}-%{release}
Requires: python%{python3_pkgversion}-ceph-argparse = %{_epoch_prefix}%{version}-%{release}
Requires: python%{python3_pkgversion}-ceph-common = %{_epoch_prefix}%{version}-%{release}
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
Requires: python%{python3_pkgversion}-prettytable
%endif
%if 0%{?suse_version}
@ -615,9 +616,8 @@ Requires: ceph-mgr = %{_epoch_prefix}%{version}-%{release}
Requires: ceph-grafana-dashboards = %{_epoch_prefix}%{version}-%{release}
Requires: ceph-prometheus-alerts = %{_epoch_prefix}%{version}-%{release}
Requires: python%{python3_pkgversion}-setuptools
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
Requires: python%{python3_pkgversion}-cherrypy
Requires: python%{python3_pkgversion}-jwt
Requires: python%{python3_pkgversion}-routes
Requires: python%{python3_pkgversion}-werkzeug
%if 0%{?weak_deps}
@ -626,7 +626,6 @@ Recommends: python%{python3_pkgversion}-saml
%endif
%if 0%{?suse_version}
Requires: python%{python3_pkgversion}-CherryPy
Requires: python%{python3_pkgversion}-PyJWT
Requires: python%{python3_pkgversion}-Routes
Requires: python%{python3_pkgversion}-Werkzeug
Recommends: python%{python3_pkgversion}-python3-saml
@ -645,7 +644,7 @@ Group: System/Filesystems
%endif
Requires: ceph-mgr = %{_epoch_prefix}%{version}-%{release}
Requires: python%{python3_pkgversion}-numpy
%if 0%{?fedora} || 0%{?suse_version}
%if 0%{?fedora} || 0%{?suse_version} || 0%{?openEuler}
Requires: python%{python3_pkgversion}-scikit-learn
%endif
Requires: python3-scipy
@ -665,7 +664,7 @@ Requires: python%{python3_pkgversion}-pyOpenSSL
Requires: python%{python3_pkgversion}-requests
Requires: python%{python3_pkgversion}-dateutil
Requires: python%{python3_pkgversion}-setuptools
%if 0%{?fedora} || 0%{?rhel} >= 8
%if 0%{?fedora} || 0%{?rhel} >= 8 || 0%{?openEuler}
Requires: python%{python3_pkgversion}-cherrypy
Requires: python%{python3_pkgversion}-pyyaml
Requires: python%{python3_pkgversion}-werkzeug
@ -722,7 +721,7 @@ Requires: openssh
Requires: python%{python3_pkgversion}-CherryPy
Requires: python%{python3_pkgversion}-Jinja2
%endif
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Requires: openssh-clients
Requires: python%{python3_pkgversion}-cherrypy
Requires: python%{python3_pkgversion}-jinja2
@ -814,7 +813,7 @@ Requires: ceph-selinux = %{_epoch_prefix}%{version}-%{release}
%endif
Requires: librados2 = %{_epoch_prefix}%{version}-%{release}
Requires: librgw2 = %{_epoch_prefix}%{version}-%{release}
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Requires: mailcap
%endif
%if 0%{?weak_deps}
@ -894,6 +893,7 @@ Requires: parted
Requires: util-linux
Requires: xfsprogs
Requires: python%{python3_pkgversion}-setuptools
Requires: python%{python3_pkgversion}-packaging
Requires: python%{python3_pkgversion}-ceph-common = %{_epoch_prefix}%{version}-%{release}
%description volume
This package contains a tool to deploy OSD with different devices like
@ -905,7 +905,7 @@ Summary: RADOS distributed object store client library
%if 0%{?suse_version}
Group: System/Libraries
%endif
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Obsoletes: ceph-libs < %{_epoch_prefix}%{version}-%{release}
%endif
%description -n librados2
@ -1052,7 +1052,7 @@ Requires: librados2 = %{_epoch_prefix}%{version}-%{release}
%if 0%{?suse_version}
Requires(post): coreutils
%endif
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Obsoletes: ceph-libs < %{_epoch_prefix}%{version}-%{release}
%endif
%description -n librbd1
@ -1096,7 +1096,7 @@ Summary: Ceph distributed file system client library
Group: System/Libraries
%endif
Obsoletes: libcephfs1 < %{_epoch_prefix}%{version}-%{release}
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Obsoletes: ceph-libs < %{_epoch_prefix}%{version}-%{release}
Obsoletes: ceph-libcephfs
%endif
@ -1149,7 +1149,7 @@ descriptions, and submitting the command to the appropriate daemon.
%package -n python%{python3_pkgversion}-ceph-common
Summary: Python 3 utility libraries for Ceph
%if 0%{?fedora} || 0%{?rhel} >= 8
%if 0%{?fedora} || 0%{?rhel} >= 8 || 0%{?openEuler}
Requires: python%{python3_pkgversion}-pyyaml
%endif
%if 0%{?suse_version}
@ -1288,6 +1288,15 @@ Group: System/Monitoring
%description mib
This package provides a Ceph MIB for SNMP traps.
%package node-proxy
Summary: hw monitoring agent for Ceph
BuildArch: noarch
%if 0%{?suse_version}
Group: System/Monitoring
%endif
%description node-proxy
This package provides a Ceph hardware monitoring agent.
#################################################################################
# common
#################################################################################
@ -1467,7 +1476,7 @@ install -m 0755 %{buildroot}%{_bindir}/crimson-osd %{buildroot}%{_bindir}/ceph-o
%endif
install -m 0644 -D src/etc-rbdmap %{buildroot}%{_sysconfdir}/ceph/rbdmap
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
install -m 0644 -D etc/sysconfig/ceph %{buildroot}%{_sysconfdir}/sysconfig/ceph
%endif
%if 0%{?suse_version}
@ -1501,7 +1510,7 @@ install -m 0644 -D udev/50-rbd.rules %{buildroot}%{_udevrulesdir}/50-rbd.rules
# sudoers.d
install -m 0440 -D sudoers.d/ceph-smartctl %{buildroot}%{_sysconfdir}/sudoers.d/ceph-smartctl
%if 0%{?rhel} >= 8
%if 0%{?rhel} >= 8 || 0%{?openEuler}
pathfix.py -pni "%{__python3} %{py3_shbang_opts}" %{buildroot}%{_bindir}/*
pathfix.py -pni "%{__python3} %{py3_shbang_opts}" %{buildroot}%{_sbindir}/*
%endif
@ -1538,7 +1547,7 @@ install -m 644 -D -t %{buildroot}%{_datadir}/snmp/mibs monitoring/snmp/CEPH-MIB.
%fdupes %{buildroot}%{_prefix}
%endif
%if 0%{?rhel} == 8
%if 0%{?rhel} == 8 || 0%{?openEuler}
%py_byte_compile %{__python3} %{buildroot}%{python3_sitelib}
%endif
@ -1581,7 +1590,7 @@ rm -rf %{_vpath_builddir}
%{_libdir}/libosd_tp.so*
%endif
%config(noreplace) %{_sysconfdir}/logrotate.d/ceph
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%config(noreplace) %{_sysconfdir}/sysconfig/ceph
%endif
%if 0%{?suse_version}
@ -1614,7 +1623,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph.target ceph-crash.service >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph.target ceph-crash.service
%endif
if [ $1 -eq 1 ] ; then
@ -1625,7 +1634,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph.target ceph-crash.service
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph.target ceph-crash.service
%endif
@ -1722,7 +1731,7 @@ exit 0
%pre common
CEPH_GROUP_ID=167
CEPH_USER_ID=167
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
/usr/sbin/groupadd ceph -g $CEPH_GROUP_ID -o -r 2>/dev/null || :
/usr/sbin/useradd ceph -u $CEPH_USER_ID -o -r -g ceph -s /sbin/nologin -c "Ceph daemons" -d %{_localstatedir}/lib/ceph 2>/dev/null || :
%endif
@ -1768,7 +1777,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-mds@\*.service ceph-mds.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-mds@\*.service ceph-mds.target
%endif
if [ $1 -eq 1 ] ; then
@ -1779,7 +1788,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-mds@\*.service ceph-mds.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-mds@\*.service ceph-mds.target
%endif
@ -1813,7 +1822,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-mgr@\*.service ceph-mgr.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-mgr@\*.service ceph-mgr.target
%endif
if [ $1 -eq 1 ] ; then
@ -1824,7 +1833,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-mgr@\*.service ceph-mgr.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-mgr@\*.service ceph-mgr.target
%endif
@ -1953,7 +1962,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-mon@\*.service ceph-mon.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-mon@\*.service ceph-mon.target
%endif
if [ $1 -eq 1 ] ; then
@ -1964,7 +1973,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-mon@\*.service ceph-mon.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-mon@\*.service ceph-mon.target
%endif
@ -2002,7 +2011,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset cephfs-mirror@\*.service cephfs-mirror.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post cephfs-mirror@\*.service cephfs-mirror.target
%endif
if [ $1 -eq 1 ] ; then
@ -2013,7 +2022,7 @@ fi
%if 0%{?suse_version}
%service_del_preun cephfs-mirror@\*.service cephfs-mirror.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun cephfs-mirror@\*.service cephfs-mirror.target
%endif
@ -2033,6 +2042,7 @@ fi
%files -n ceph-exporter
%{_bindir}/ceph-exporter
%{_unitdir}/ceph-exporter.service
%files -n rbd-fuse
%{_bindir}/rbd-fuse
@ -2050,7 +2060,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-rbd-mirror@\*.service ceph-rbd-mirror.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-rbd-mirror@\*.service ceph-rbd-mirror.target
%endif
if [ $1 -eq 1 ] ; then
@ -2061,7 +2071,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-rbd-mirror@\*.service ceph-rbd-mirror.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-rbd-mirror@\*.service ceph-rbd-mirror.target
%endif
@ -2091,7 +2101,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-immutable-object-cache@\*.service ceph-immutable-object-cache.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-immutable-object-cache@\*.service ceph-immutable-object-cache.target
%endif
if [ $1 -eq 1 ] ; then
@ -2102,7 +2112,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-immutable-object-cache@\*.service ceph-immutable-object-cache.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-immutable-object-cache@\*.service ceph-immutable-object-cache.target
%endif
@ -2145,7 +2155,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-radosgw@\*.service ceph-radosgw.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-radosgw@\*.service ceph-radosgw.target
%endif
if [ $1 -eq 1 ] ; then
@ -2156,7 +2166,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-radosgw@\*.service ceph-radosgw.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-radosgw@\*.service ceph-radosgw.target
%endif
@ -2196,7 +2206,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-osd@\*.service ceph-osd.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-osd@\*.service ceph-osd.target
%endif
if [ $1 -eq 1 ] ; then
@ -2212,7 +2222,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-osd@\*.service ceph-osd.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-osd@\*.service ceph-osd.target
%endif
@ -2251,7 +2261,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-volume@\*.service >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-volume@\*.service
%endif
@ -2259,7 +2269,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-volume@\*.service
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-volume@\*.service
%endif
@ -2620,4 +2630,10 @@ exit 0
%attr(0755,root,root) %dir %{_datadir}/snmp
%{_datadir}/snmp/mibs
%files node-proxy
%{_sbindir}/ceph-node-proxy
%dir %{python3_sitelib}/ceph_node_proxy
%{python3_sitelib}/ceph_node_proxy/*
%{python3_sitelib}/ceph_node_proxy-*
%changelog
@ -1,7 +1,13 @@
ceph (18.2.2-1jammy) jammy; urgency=medium
ceph (18.2.4-1jammy) jammy; urgency=medium


-- Jenkins Build Slave User <jenkins-build@braggi10.front.sepia.ceph.com> Mon, 04 Mar 2024 20:27:31 +0000
-- Jenkins Build Slave User <jenkins-build@braggi02.front.sepia.ceph.com> Fri, 12 Jul 2024 15:42:34 +0000

ceph (18.2.4-1) stable; urgency=medium

* New upstream release

-- Ceph Release Team <ceph-maintainers@ceph.io> Fri, 12 Jul 2024 09:57:18 -0400

ceph (18.2.2-1) stable; urgency=medium
@ -86,6 +86,9 @@ function(build_arrow)
else()
list(APPEND arrow_CMAKE_ARGS -DCMAKE_BUILD_TYPE=Release)
endif()
# don't add -Werror or debug package builds fail with:
#warning _FORTIFY_SOURCE requires compiling with optimization (-O)
list(APPEND arrow_CMAKE_ARGS -DBUILD_WARNING_LEVEL=PRODUCTION)

# we use an external project and copy the sources to bin directory to ensure
# that object files are built outside of the source tree.
@ -11,6 +11,13 @@ function(build_rocksdb)
-DCMAKE_TOOLCHAIN_FILE=${CMAKE_TOOLCHAIN_FILE})
endif()

list(APPEND rocksdb_CMAKE_ARGS -DWITH_LIBURING=${WITH_LIBURING})
if(WITH_LIBURING)
list(APPEND rocksdb_CMAKE_ARGS -During_INCLUDE_DIR=${URING_INCLUDE_DIR})
list(APPEND rocksdb_CMAKE_ARGS -During_LIBRARIES=${URING_LIBRARY_DIR})
list(APPEND rocksdb_INTERFACE_LINK_LIBRARIES uring::uring)
endif()

if(ALLOCATOR STREQUAL "jemalloc")
list(APPEND rocksdb_CMAKE_ARGS -DWITH_JEMALLOC=ON)
list(APPEND rocksdb_INTERFACE_LINK_LIBRARIES JeMalloc::JeMalloc)
@ -52,12 +59,13 @@ function(build_rocksdb)
endif()
include(CheckCXXCompilerFlag)
check_cxx_compiler_flag("-Wno-deprecated-copy" HAS_WARNING_DEPRECATED_COPY)
set(rocksdb_CXX_FLAGS "${CMAKE_CXX_FLAGS}")
if(HAS_WARNING_DEPRECATED_COPY)
set(rocksdb_CXX_FLAGS -Wno-deprecated-copy)
string(APPEND rocksdb_CXX_FLAGS " -Wno-deprecated-copy")
endif()
check_cxx_compiler_flag("-Wno-pessimizing-move" HAS_WARNING_PESSIMIZING_MOVE)
if(HAS_WARNING_PESSIMIZING_MOVE)
set(rocksdb_CXX_FLAGS "${rocksdb_CXX_FLAGS} -Wno-pessimizing-move")
string(APPEND rocksdb_CXX_FLAGS " -Wno-pessimizing-move")
endif()
if(rocksdb_CXX_FLAGS)
list(APPEND rocksdb_CMAKE_ARGS -DCMAKE_CXX_FLAGS='${rocksdb_CXX_FLAGS}')
@ -84,6 +92,9 @@ function(build_rocksdb)
INSTALL_COMMAND ""
LIST_SEPARATOR !)

# make sure all the link libraries are built first
add_dependencies(rocksdb_ext ${rocksdb_INTERFACE_LINK_LIBRARIES})

add_library(RocksDB::RocksDB STATIC IMPORTED)
add_dependencies(RocksDB::RocksDB rocksdb_ext)
set(rocksdb_INCLUDE_DIR "${rocksdb_SOURCE_DIR}/include")
@ -32,6 +32,8 @@ function(build_uring)
ExternalProject_Get_Property(liburing_ext source_dir)
set(URING_INCLUDE_DIR "${source_dir}/src/include")
set(URING_LIBRARY_DIR "${source_dir}/src")
set(URING_INCLUDE_DIR ${URING_INCLUDE_DIR} PARENT_SCOPE)
set(URING_LIBRARY_DIR ${URING_LIBRARY_DIR} PARENT_SCOPE)

add_library(uring::uring STATIC IMPORTED GLOBAL)
add_dependencies(uring::uring liburing_ext)
ceph/debian/ceph-exporter.install (new file)
@ -0,0 +1,2 @@
lib/systemd/system/ceph-exporter*
usr/bin/ceph-exporter
@ -1,3 +1,4 @@
bcrypt
pyOpenSSL
cephfs
ceph-argparse
@ -91,7 +91,6 @@ Build-Depends: automake,
python3-all-dev,
python3-cherrypy3,
python3-natsort,
python3-jwt <pkg.ceph.check>,
python3-pecan <pkg.ceph.check>,
python3-bcrypt <pkg.ceph.check>,
tox <pkg.ceph.check>,
@ -353,6 +352,30 @@ Description: debugging symbols for ceph-mgr
.
This package contains the debugging symbols for ceph-mgr.

Package: ceph-exporter
Architecture: linux-any
Depends: ceph-base (= ${binary:Version}),
Description: metrics exporter for the ceph distributed storage system
Ceph is a massively scalable, open-source, distributed
storage system that runs on commodity hardware and delivers object,
block and file system storage.
.
This package contains the metrics exporter daemon, which is used to expose
the performance metrics.

Package: ceph-exporter-dbg
Architecture: linux-any
Section: debug
Priority: extra
Depends: ceph-exporter (= ${binary:Version}),
${misc:Depends},
Description: debugging symbols for ceph-exporter
Ceph is a massively scalable, open-source, distributed
storage system that runs on commodity hardware and delivers object,
block and file system storage.
.
This package contains the debugging symbols for ceph-exporter.

Package: ceph-mon
Architecture: linux-any
Depends: ceph-base (= ${binary:Version}),
@ -105,6 +105,7 @@ override_dh_strip:
dh_strip -pceph-mds --dbg-package=ceph-mds-dbg
dh_strip -pceph-fuse --dbg-package=ceph-fuse-dbg
dh_strip -pceph-mgr --dbg-package=ceph-mgr-dbg
dh_strip -pceph-exporter --dbg-package=ceph-exporter-dbg
dh_strip -pceph-mon --dbg-package=ceph-mon-dbg
dh_strip -pceph-osd --dbg-package=ceph-osd-dbg
dh_strip -pceph-base --dbg-package=ceph-base-dbg
ceph/doc/_static/js/pgcalc.js (new file, vendored)
@ -0,0 +1,357 @@
var _____WB$wombat$assign$function_____ = function(name) {return (self._wb_wombat && self._wb_wombat.local_init && self._wb_wombat.local_init(name)) || self[name]; };
if (!self.__WB_pmw) { self.__WB_pmw = function(obj) { this.__WB_source = obj; return this; } }
{
let window = _____WB$wombat$assign$function_____("window");
let self = _____WB$wombat$assign$function_____("self");
let document = _____WB$wombat$assign$function_____("document");
let location = _____WB$wombat$assign$function_____("location");
let top = _____WB$wombat$assign$function_____("top");
let parent = _____WB$wombat$assign$function_____("parent");
let frames = _____WB$wombat$assign$function_____("frames");
let opener = _____WB$wombat$assign$function_____("opener");

var pow2belowThreshold = 0.25
var key_values={};
key_values['poolName'] ={'name':'Pool Name','default':'newPool','description': 'Name of the pool in question. Typical pool names are included below.', 'width':'30%; text-align: left'};
key_values['size'] ={'name':'Size','default': 3, 'description': 'Number of replicas the pool will have. Default value of 3 is pre-filled.', 'width':'10%', 'global':1};
key_values['osdNum'] ={'name':'OSD #','default': 100, 'description': 'Number of OSDs which this Pool will have PGs in. Typically, this is the entire Cluster OSD count, but could be less based on CRUSH rules. (e.g. Separate SSD and SATA disk sets)', 'width':'10%', 'global':1};
key_values['percData'] ={'name':'%Data', 'default': 5, 'description': 'This value represents the approximate percentage of data which will be contained in this pool for that specific OSD set. Examples are pre-filled below for guidance.','width':'10%'};
key_values['targPGsPerOSD'] ={'name':'Target PGs per OSD', 'default':100, 'description': 'This value should be populated based on the following guidance:', 'width':'10%', 'global':1, 'options': [ ['100','If the cluster OSD count is not expected to increase in the foreseeable future.'], ['200', 'If the cluster OSD count is expected to increase (up to double the size) in the foreseeable future.']]}

var notes ={
'totalPerc':'<b>"Total Data Percentage"</b> below table should be a multiple of 100%.',
'totalPGs':'<b>"Total PG Count"</b> below table will be the count of Primary PG copies. However, when calculating total PGs per OSD average, you must include all copies.',
'noDecrease':'It\'s also important to know that the PG count can be increased, but <b>NEVER</b> decreased without destroying / recreating the pool. However, increasing the PG Count of a pool is one of the most impactful events in a Ceph Cluster, and should be avoided for production clusters if possible.',
};

var presetTables={};
presetTables['All-in-One']=[
{ 'poolName' : 'rbd', 'size' : '3', 'osdNum' : '100', 'percData' : '100', 'targPGsPerOSD' : '100'},
];
presetTables['OpenStack']=[
{ 'poolName' : 'cinder-backup', 'size' : '3', 'osdNum' : '100', 'percData' : '25', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'cinder-volumes', 'size' : '3', 'osdNum' : '100', 'percData' : '53', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'ephemeral-vms', 'size' : '3', 'osdNum' : '100', 'percData' : '15', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'glance-images', 'size' : '3', 'osdNum' : '100', 'percData' : '7', 'targPGsPerOSD' : '100'},
];
presetTables['OpenStack w RGW - Jewel and later']=[
{ 'poolName' : '.rgw.root', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.control', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.data.root', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.gc', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.log', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.intent-log', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.meta', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.usage', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.users.keys', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.users.email', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.users.swift', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.users.uid', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.buckets.extra', 'size' : '3', 'osdNum' : '100', 'percData' : '1.0', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.buckets.index', 'size' : '3', 'osdNum' : '100', 'percData' : '3.0', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.buckets.data', 'size' : '3', 'osdNum' : '100', 'percData' : '19', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'cinder-backup', 'size' : '3', 'osdNum' : '100', 'percData' : '18', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'cinder-volumes', 'size' : '3', 'osdNum' : '100', 'percData' : '42.8', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'ephemeral-vms', 'size' : '3', 'osdNum' : '100', 'percData' : '10', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'glance-images', 'size' : '3', 'osdNum' : '100', 'percData' : '5', 'targPGsPerOSD' : '100'},
|
||||
];
|
||||
|
||||
presetTables['Rados Gateway Only - Jewel and later']=[
|
||||
{ 'poolName' : '.rgw.root', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'default.rgw.control', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'default.rgw.data.root', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'default.rgw.gc', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'default.rgw.log', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'default.rgw.intent-log', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'default.rgw.meta', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'default.rgw.usage', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'default.rgw.users.keys', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'default.rgw.users.email', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'default.rgw.users.swift', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'default.rgw.users.uid', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'default.rgw.buckets.extra', 'size' : '3', 'osdNum' : '100', 'percData' : '1.0', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'default.rgw.buckets.index', 'size' : '3', 'osdNum' : '100', 'percData' : '3.0', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'default.rgw.buckets.data', 'size' : '3', 'osdNum' : '100', 'percData' : '94.8', 'targPGsPerOSD' : '100'},
|
||||
];
|
||||
|
||||
presetTables['OpenStack w RGW - Infernalis and earlier']=[
|
||||
{ 'poolName' : '.intent-log', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.log', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.rgw', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.rgw.buckets', 'size' : '3', 'osdNum' : '100', 'percData' : '18', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.rgw.buckets.extra', 'size' : '3', 'osdNum' : '100', 'percData' : '1.0', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.rgw.buckets.index', 'size' : '3', 'osdNum' : '100', 'percData' : '3.0', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.rgw.control', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.rgw.gc', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.rgw.root', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.usage', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.users', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.users.email', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.users.swift', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.users.uid', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'cinder-backup', 'size' : '3', 'osdNum' : '100', 'percData' : '19', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'cinder-volumes', 'size' : '3', 'osdNum' : '100', 'percData' : '42.9', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'ephemeral-vms', 'size' : '3', 'osdNum' : '100', 'percData' : '10', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'glance-images', 'size' : '3', 'osdNum' : '100', 'percData' : '5', 'targPGsPerOSD' : '100'},
|
||||
];
|
||||
|
||||
presetTables['Rados Gateway Only - Infernalis and earlier']=[
|
||||
{ 'poolName' : '.intent-log', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.log', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.rgw', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.rgw.buckets', 'size' : '3', 'osdNum' : '100', 'percData' : '94.9', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.rgw.buckets.extra', 'size' : '3', 'osdNum' : '100', 'percData' : '1.0', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.rgw.buckets.index', 'size' : '3', 'osdNum' : '100', 'percData' : '3.0', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.rgw.control', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.rgw.gc', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.rgw.root', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.usage', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.users', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.users.email', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.users.swift', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : '.users.uid', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
|
||||
];
|
||||
presetTables['RBD and libRados']=[
|
||||
{ 'poolName' : 'rbd', 'size' : '3', 'osdNum' : '100', 'percData' : '75', 'targPGsPerOSD' : '100'},
|
||||
{ 'poolName' : 'myObjects', 'size' : '3', 'osdNum' : '100', 'percData' : '25', 'targPGsPerOSD' : '100'},
|
||||
];
|
||||
|
||||
$(function() {
|
||||
$("#presetType").on("change",changePreset);
|
||||
$("#btnAddPool").on("click",addPool);
|
||||
$("#btnGenCommands").on("click",generateCommands);
|
||||
$.each(presetTables,function(index,value) {
|
||||
selIndex='';
|
||||
if ( index == 'OpenStack w RGW - Jewel and later' )
|
||||
selIndex=' selected';
|
||||
$("#presetType").append("<option value=\""+index+"\""+selIndex+">"+index+"</option>");
|
||||
});
|
||||
changePreset();
|
||||
$("#beforeTable").html("<fieldset id='keyFieldset'><legend>Key</legend><dl class='table-display' id='keyDL'></dl></fieldset>");
|
||||
$.each(key_values, function(index, value) {
|
||||
pre='';
|
||||
post='';
|
||||
if ('global' in value) {
|
||||
pre='<a href="javascript://" onClick="globalChange(\''+index+'\');" title="Change the \''+value['name']+'\' parameter globally">';
|
||||
post='</a>'
|
||||
}
|
||||
|
||||
var dlAdd="<dt id='dt_"+index+"'>"+pre+value['name']+post+"</dt><dd id='dd_"+index+"'>"+value['description'];
|
||||
if ( 'options' in value ) {
|
||||
dlAdd+="<dl class='sub-table'>";
|
||||
$.each(value['options'], function (subIndex, subValue) {
|
||||
dlAdd+="<dt><a href=\"javascript://\" onClick=\"massUpdate('"+index+"','"+subValue[0]+"');\" title=\"Set all '"+value['name']+"' fields to '"+subValue[0]+"'.\">"+subValue[0]+"</a></dt><dd>"+subValue[1]+"</dd>";
|
||||
});
|
||||
dlAdd+="</dl>";
|
||||
}
|
||||
dlAdd+="</dd>";
|
||||
$("#keyDL").append(dlAdd);
|
||||
});
|
||||
$("#afterTable").html("<fieldset id='notesFieldset'><legend>Notes</legend><ul id='notesUL'>\n<ul></fieldset>");
|
||||
$.each(notes,function(index, value) {
|
||||
$("#notesUL").append("\t<li id=\"li_"+index+"\">"+value+"</li>\n");
|
||||
});
|
||||
|
||||
});
|
||||
|
||||
function changePreset() {
|
||||
resetTable();
|
||||
fillTable($("#presetType").val());
|
||||
}
|
||||
|
||||
function resetTable() {
|
||||
$("#pgsperpool").html("");
|
||||
$("#pgsperpool").append("<tr id='headerRow'>\n</tr>\n");
|
||||
$("#headerRow").append("\t<th> </th>\n");
|
||||
var fieldCount=0;
|
||||
var percDataIndex=0;
|
||||
$.each(key_values, function(index, value) {
|
||||
fieldCount++;
|
||||
pre='';
|
||||
post='';
|
||||
var widthAdd='';
|
||||
if ( index == 'percData' )
|
||||
percDataIndex=fieldCount;
|
||||
if ('width' in value)
|
||||
widthAdd=' style=\'width: '+value['width']+'\'';
|
||||
if ('global' in value) {
|
||||
pre='<a href="javascript://" onClick="globalChange(\''+index+'\');" title="Change the \''+value['name']+'\' parameter globally">';
|
||||
post='</a>'
|
||||
}
|
||||
$("#headerRow").append("\t<th"+widthAdd+">"+pre+value['name']+post+"</th>\n");
|
||||
});
|
||||
percDataIndex++;
|
||||
$("#headerRow").append("\t<th class='center'>Suggested PG Count</th>\n");
|
||||
$("#pgsperpool").append("<tr id='totalRow'><td colspan='"+percDataIndex+"' id='percTotal' style='text-align: right; margin-right: 10px;'><strong>Total Data Percentage:</strong> <span id='percTotalValue'>0</span>%</td><td> </td><td id='pgTotal' class='bold pgcount' style='text-align: right;'>PG Total Count: <span id='pgTotalValue'>0</span></td></tr>");
|
||||
}
|
||||
|
||||
function nearestPow2( aSize ){
|
||||
var tmp=Math.pow(2, Math.round(Math.log(aSize)/Math.log(2)));
|
||||
if(tmp<(aSize*(1-pow2belowThreshold)))
|
||||
tmp*=2;
|
||||
return tmp;
|
||||
}
|
||||
|
||||
function globalChange(field) {
|
||||
dialogHTML='<div title="Change \''+key_values[field]['name']+'\' Globally"><form>';
|
||||
dialogHTML+='<label for="value">New '+key_values[field]['name']+' value:</label><br />\n';
|
||||
dialogHTML+='<input type="text" name="globalValue" id="globalValue" value="'+$("#row0_"+field+"_input").val()+'" style="text-align: right;"/>';
|
||||
dialogHTML+='<input type="hidden" name="globalField" id="globalField" value="'+field+'"/>';
|
||||
dialogHTML+='<input type="submit" tabindex="-1" style="position:absolute; top:-1000px">';
|
||||
dialogHTML+='</form>';
|
||||
globalDialog=$(dialogHTML).dialog({
|
||||
autoOpen: true,
|
||||
width: 350,
|
||||
show: 'fold',
|
||||
hide: 'fold',
|
||||
modal: true,
|
||||
buttons: {
|
||||
"Update Value": function() { massUpdate($("#globalField").val(),$("#globalValue").val()); globalDialog.dialog("close"); setTimeout(function() { globalDialog.dialog("destroy"); }, 1000); },
|
||||
"Cancel": function() { globalDialog.dialog("close"); setTimeout(function() { globalDialog.dialog("destroy"); }, 1000); }
|
||||
}
|
||||
});
|
||||
}
|
||||
|
||||
var rowCount=0;
|
||||
function fillTable(presetType) {
|
||||
rowCount=0;
|
||||
$.each(presetTables[presetType], function(index,value) {
|
||||
addTableRow(value);
|
||||
});
|
||||
}
|
||||
|
||||
function addPool() {
|
||||
dialogHTML='<div title="Add Pool"><form>';
|
||||
$.each(key_values, function(index,value) {
|
||||
dialogHTML+='<br /><label for="new'+index+'">'+value['name']+':</label><br />\n';
|
||||
classAdd='right';
|
||||
if ( index == 'poolName' )
|
||||
classAdd='left';
|
||||
dialogHTML+='<input type="text" name="new'+index+'" id="new'+index+'" value="'+value['default']+'" class="'+classAdd+'"/><br />';
|
||||
});
|
||||
dialogHTML+='<input type="submit" tabindex="-1" style="position:absolute; top:-1000px">';
|
||||
dialogHTML+='</form>';
|
||||
addPoolDialog=$(dialogHTML).dialog({
|
||||
autoOpen: true,
|
||||
width: 350,
|
||||
show: 'fold',
|
||||
hide: 'fold',
|
||||
modal: true,
|
||||
buttons: {
|
||||
"Add Pool": function() {
|
||||
var newPoolValues={};
|
||||
$.each(key_values,function(index,value) {
|
||||
newPoolValues[index]=$("#new"+index).val();
|
||||
});
|
||||
addTableRow(newPoolValues);
|
||||
addPoolDialog.dialog("close");
|
||||
setTimeout(function() { addPoolDialog.dialog("destroy"); }, 1000); },
|
||||
"Cancel": function() { addPoolDialog.dialog("close"); setTimeout(function() { addPoolDialog.dialog("destroy"); }, 1000); }
|
||||
}
|
||||
});
|
||||
|
||||
// addTableRow({'poolName':'newPool','size':3, 'osdNum':100,'targPGsPerOSD': 100, 'percData':0});
|
||||
}
|
||||
|
||||
function addTableRow(rowValues) {
|
||||
rowAdd="<tr id='row"+rowCount+"'>\n";
|
||||
rowAdd+="\t<td width='15px' class='inputColor'><a href='javascript://' title='Remove Pool' onClick='$(\"#row"+rowCount+"\").remove();updateTotals();'><span class='ui-icon ui-icon-trash'></span></a></td>\n";
|
||||
$.each(key_values, function(index,value) {
|
||||
classAdd=' center';
|
||||
modifier='';
|
||||
if ( index == 'percData' ) {
|
||||
classAdd='" style="text-align: right;';
|
||||
// modifier=' %';
|
||||
} else if ( index == 'poolName' )
|
||||
classAdd=' left';
|
||||
rowAdd+="\t<td id=\"row"+rowCount+"_"+index+"\"><input type=\"text\" class=\"inputColor "+index+classAdd+"\" id=\"row"+rowCount+"_"+index+"_input\" value=\""+rowValues[index]+"\" onFocus=\"focusMe("+rowCount+",'"+index+"');\" onKeyUp=\"keyMe("+rowCount+",'"+index+"');\" onBlur=\"blurMe("+rowCount+",'"+index+"');\">"+modifier+"</td>\n";
|
||||
});
|
||||
rowAdd+="\t<td id=\"row"+rowCount+"_pgCount\" class='pgcount' style='text-align: right;'>0</td></tr>";
|
||||
$("#totalRow").before(rowAdd);
|
||||
updatePGCount(rowCount);
|
||||
$("[id$='percData_input']").each(function() { var fieldVal=parseFloat($(this).val()); $(this).val(fieldVal.toFixed(2)); });
|
||||
rowCount++;
|
||||
}
|
||||
|
||||
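// updatePGCount() below implements the calculator's core rule: the suggested
// PG count is the nearest power of two of
//   (target PGs per OSD * OSD count * %data) / (100 * replica size),
// subject to a floor of nearestPow2(floor(OSD count / size) + 1), doubled once
// if that floor is still below the OSD count; the larger of the two values wins.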
function updatePGCount(rowID) {
|
||||
if(rowID==-1) {
|
||||
for(var i=0;i<rowCount;i++) {
|
||||
updatePGCount(i);
|
||||
}
|
||||
} else {
|
||||
minValue=nearestPow2(Math.floor($("#row"+rowID+"_osdNum_input").val()/$("#row"+rowID+"_size_input").val())+1);
|
||||
if(minValue<$("#row"+rowID+"_osdNum_input").val())
|
||||
minValue*=2;
|
||||
calcValue=nearestPow2(Math.floor(($("#row"+rowID+"_targPGsPerOSD_input").val()*$("#row"+rowID+"_osdNum_input").val()*$("#row"+rowID+"_percData_input").val())/(100*$("#row"+rowID+"_size_input").val())));
|
||||
if(minValue>calcValue)
|
||||
$("#row"+rowID+"_pgCount").html(minValue);
|
||||
else
|
||||
$("#row"+rowID+"_pgCount").html(calcValue);
|
||||
}
|
||||
updateTotals();
|
||||
}
|
||||
|
||||
function focusMe(rowID,field) {
|
||||
$("#row"+rowID+"_"+field+"_input").toggleClass('inputColor');
|
||||
$("#row"+rowID+"_"+field+"_input").toggleClass('highlightColor');
|
||||
$("#dt_"+field).toggleClass('highlightColor');
|
||||
$("#dd_"+field).toggleClass('highlightColor');
|
||||
updatePGCount(rowID);
|
||||
}
|
||||
|
||||
function blurMe(rowID,field) {
|
||||
focusMe(rowID,field);
|
||||
$("[id$='percData_input']").each(function() { var fieldVal=parseFloat($(this).val()); $(this).val(fieldVal.toFixed(2)); });
|
||||
}
|
||||
|
||||
function keyMe(rowID,field) {
|
||||
updatePGCount(rowID);
|
||||
}
|
||||
|
||||
function massUpdate(field,value) {
|
||||
$("[id$='_"+field+"_input']").val(value);
|
||||
key_values[field]['default']=value;
|
||||
updatePGCount(-1);
|
||||
}
|
||||
|
||||
function updateTotals() {
|
||||
var totalPerc=0;
|
||||
var totalPGs=0;
|
||||
$("[id$='percData_input']").each(function() {
|
||||
totalPerc+=parseFloat($(this).val());
|
||||
if ( parseFloat($(this).val()) > 100 )
|
||||
$(this).addClass('ui-state-error');
|
||||
else
|
||||
$(this).removeClass('ui-state-error');
|
||||
});
|
||||
$("[id$='_pgCount']").each(function() {
|
||||
totalPGs+=parseInt($(this).html());
|
||||
});
|
||||
$("#percTotalValue").html(totalPerc.toFixed(2));
|
||||
$("#pgTotalValue").html(totalPGs);
|
||||
if(parseFloat(totalPerc.toFixed(2)) % 100 != 0) {
|
||||
$("#percTotalValue").addClass('ui-state-error');
|
||||
$("#li_totalPerc").addClass('ui-state-error');
|
||||
} else {
|
||||
$("#percTotalValue").removeClass('ui-state-error');
|
||||
$("#li_totalPerc").removeClass('ui-state-error');
|
||||
}
|
||||
$("#commandCode").html("");
|
||||
}
|
||||
|
||||
function generateCommands() {
|
||||
outputCommands="## Note: The 'while' loops below pause between pools to allow all\n\
|
||||
## PGs to be created. This is a safety mechanism to prevent\n\
|
||||
## saturating the Monitor nodes.\n\
|
||||
## -------------------------------------------------------------------\n\n";
|
||||
for(i=0;i<rowCount;i++) {
|
||||
console.log(i);
|
||||
outputCommands+="ceph osd pool create "+$("#row"+i+"_poolName_input").val()+" "+$("#row"+i+"_pgCount").html()+"\n";
|
||||
outputCommands+="ceph osd pool set "+$("#row"+i+"_poolName_input").val()+" size "+$("#row"+i+"_size_input").val()+"\n";
|
||||
outputCommands+="while [ $(ceph -s | grep creating -c) -gt 0 ]; do echo -n .;sleep 1; done\n\n";
|
||||
}
|
||||
window.location.href = "data:application/download," + encodeURIComponent(outputCommands);
|
||||
}
|
||||
|
||||
|
||||
}
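The sizing rule that ``updatePGCount()`` implements above can be restated outside the browser. The following Python transcription is only a sketch that mirrors the JavaScript (it is not an official Ceph tool), and the example values are illustrative:

.. code-block:: python

   import math

   def nearest_pow2(n, below_threshold=0.25):
       """Round to the nearest power of two, bumping up one step when the
       result falls more than `below_threshold` short of n (mirrors
       nearestPow2() in the calculator)."""
       p = 2 ** round(math.log2(n))
       if p < n * (1 - below_threshold):
           p *= 2
       return p

   def suggested_pg_count(size, osd_num, perc_data, target_pgs_per_osd=100):
       """Suggested PG count for one pool row, as updatePGCount() computes it."""
       # Floor: roughly one PG per OSD serving the pool, after replicas.
       min_value = nearest_pow2(osd_num // size + 1)
       if min_value < osd_num:
           min_value *= 2
       calc = nearest_pow2(max(1, (target_pgs_per_osd * osd_num * perc_data)
                                   // (100 * size)))
       return max(min_value, calc)

   # 100 OSDs, 3 replicas, one pool holding 100% of the data:
   print(suggested_pg_count(size=3, osd_num=100, perc_data=100))   # -> 4096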
|
@ -19,9 +19,14 @@ The Ceph Storage Cluster
|
||||
========================
|
||||
|
||||
Ceph provides an infinitely scalable :term:`Ceph Storage Cluster` based upon
|
||||
:abbr:`RADOS (Reliable Autonomic Distributed Object Store)`, which you can read
|
||||
about in `RADOS - A Scalable, Reliable Storage Service for Petabyte-scale
|
||||
Storage Clusters`_.
|
||||
:abbr:`RADOS (Reliable Autonomic Distributed Object Store)`, a reliable,
|
||||
distributed storage service that uses the intelligence in each of its nodes to
|
||||
secure the data it stores and to provide that data to :term:`client`\s. See
|
||||
Sage Weil's "`The RADOS Object Store
|
||||
<https://ceph.io/en/news/blog/2009/the-rados-distributed-object-store/>`_" blog
|
||||
post for a brief explanation of RADOS and see `RADOS - A Scalable, Reliable
|
||||
Storage Service for Petabyte-scale Storage Clusters`_ for an exhaustive
|
||||
explanation of :term:`RADOS`.
|
||||
|
||||
A Ceph Storage Cluster consists of multiple types of daemons:
|
||||
|
||||
@ -33,11 +38,10 @@ A Ceph Storage Cluster consists of multiple types of daemons:
|
||||
.. _arch_monitor:
|
||||
|
||||
Ceph Monitors maintain the master copy of the cluster map, which they provide
|
||||
to Ceph clients. Provisioning multiple monitors within the Ceph cluster ensures
|
||||
availability in the event that one of the monitor daemons or its host fails.
|
||||
The Ceph monitor provides copies of the cluster map to storage cluster clients.
|
||||
to Ceph clients. The existence of multiple monitors in the Ceph cluster ensures
|
||||
availability if one of the monitor daemons or its host fails.
|
||||
|
||||
A Ceph OSD Daemon checks its own state and the state of other OSDs and reports
|
||||
A Ceph OSD Daemon checks its own state and the state of other OSDs and reports
|
||||
back to monitors.
|
||||
|
||||
A Ceph Manager serves as an endpoint for monitoring, orchestration, and plug-in
|
||||
@ -47,10 +51,11 @@ A Ceph Metadata Server (MDS) manages file metadata when CephFS is used to
|
||||
provide file services.
|
||||
|
||||
Storage cluster clients and :term:`Ceph OSD Daemon`\s use the CRUSH algorithm
|
||||
to compute information about data location. This means that clients and OSDs
|
||||
are not bottlenecked by a central lookup table. Ceph's high-level features
|
||||
include a native interface to the Ceph Storage Cluster via ``librados``, and a
|
||||
number of service interfaces built on top of ``librados``.
|
||||
to compute information about the location of data. Use of the CRUSH algorithm
|
||||
means that clients and OSDs are not bottlenecked by a central lookup table.
|
||||
Ceph's high-level features include a native interface to the Ceph Storage
|
||||
Cluster via ``librados``, and a number of service interfaces built on top of
|
||||
``librados``.
|
||||
|
||||
Storing Data
|
||||
------------
|
||||
@ -61,7 +66,7 @@ comes through a :term:`Ceph Block Device`, :term:`Ceph Object Storage`, the
|
||||
``librados``. The data received by the Ceph Storage Cluster is stored as RADOS
|
||||
objects. Each object is stored on an :term:`Object Storage Device` (this is
|
||||
also called an "OSD"). Ceph OSDs control read, write, and replication
|
||||
operations on storage drives. The default BlueStore back end stores objects
|
||||
operations on storage drives. The default BlueStore back end stores objects
|
||||
in a monolithic, database-like fashion.
|
||||
|
||||
.. ditaa::
|
||||
@ -69,7 +74,7 @@ in a monolithic, database-like fashion.
|
||||
/------\ +-----+ +-----+
|
||||
| obj |------>| {d} |------>| {s} |
|
||||
\------/ +-----+ +-----+
|
||||
|
||||
|
||||
Object OSD Drive
|
||||
|
||||
Ceph OSD Daemons store data as objects in a flat namespace. This means that
|
||||
@ -85,10 +90,10 @@ created date, and the last modified date.
|
||||
/------+------------------------------+----------------\
|
||||
| ID | Binary Data | Metadata |
|
||||
+------+------------------------------+----------------+
|
||||
| 1234 | 0101010101010100110101010010 | name1 = value1 |
|
||||
| 1234 | 0101010101010100110101010010 | name1 = value1 |
|
||||
| | 0101100001010100110101010010 | name2 = value2 |
|
||||
| | 0101100001010100110101010010 | nameN = valueN |
|
||||
\------+------------------------------+----------------/
|
||||
\------+------------------------------+----------------/
|
||||
|
||||
.. note:: An object ID is unique across the entire cluster, not just the local
|
||||
filesystem.
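To make an object's three parts concrete, the following is an illustrative sketch using the ``rados`` Python binding: it stores binary data under an object ID and attaches a metadata key/value pair as an xattr. The pool name ``liverpool`` and the object name are placeholders, and a reachable cluster with a valid ``ceph.conf`` and keyring is assumed.

.. code-block:: python

   import rados

   cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
   cluster.connect()
   try:
       ioctx = cluster.open_ioctx('liverpool')            # pool must already exist
       ioctx.write_full('john', b'hello from librados')   # identifier + binary data
       ioctx.set_xattr('john', 'name1', b'value1')        # metadata as an xattr
       print(ioctx.get_xattr('john', 'name1'))
       ioctx.close()
   finally:
       cluster.shutdown()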
|
||||
@ -128,8 +133,8 @@ massive scale by distributing the work to all the OSD daemons in the cluster
|
||||
and all the clients that communicate with them. CRUSH uses intelligent data
|
||||
replication to ensure resiliency, which is better suited to hyper-scale
|
||||
storage. The following sections provide additional details on how CRUSH works.
|
||||
For a detailed discussion of CRUSH, see `CRUSH - Controlled, Scalable,
|
||||
Decentralized Placement of Replicated Data`_.
|
||||
For an in-depth, academic discussion of CRUSH, see `CRUSH - Controlled,
|
||||
Scalable, Decentralized Placement of Replicated Data`_.
|
||||
|
||||
.. index:: architecture; cluster map
|
||||
|
||||
@ -147,14 +152,14 @@ five maps that constitute the cluster map are:
|
||||
the address, and the TCP port of each monitor. The monitor map specifies the
|
||||
current epoch, the time of the monitor map's creation, and the time of the
|
||||
monitor map's last modification. To view a monitor map, run ``ceph mon
|
||||
dump``.
|
||||
|
||||
dump``.
|
||||
|
||||
#. **The OSD Map:** Contains the cluster ``fsid``, the time of the OSD map's
|
||||
creation, the time of the OSD map's last modification, a list of pools, a
|
||||
list of replica sizes, a list of PG numbers, and a list of OSDs and their
|
||||
statuses (for example, ``up``, ``in``). To view an OSD map, run ``ceph
|
||||
osd dump``.
|
||||
|
||||
osd dump``.
|
||||
|
||||
#. **The PG Map:** Contains the PG version, its time stamp, the last OSD map
|
||||
epoch, the full ratios, and the details of each placement group. This
|
||||
includes the PG ID, the `Up Set`, the `Acting Set`, the state of the PG (for
|
||||
@ -168,8 +173,8 @@ five maps that constitute the cluster map are:
|
||||
{decomp-crushmap-filename}``. Use a text editor or ``cat`` to view the
|
||||
decompiled map.
|
||||
|
||||
#. **The MDS Map:** Contains the current MDS map epoch, when the map was
|
||||
created, and the last time it changed. It also contains the pool for
|
||||
#. **The MDS Map:** Contains the current MDS map epoch, when the map was
|
||||
created, and the last time it changed. It also contains the pool for
|
||||
storing metadata, a list of metadata servers, and which metadata servers
|
||||
are ``up`` and ``in``. To view an MDS map, execute ``ceph fs dump``.
|
||||
|
||||
@ -212,13 +217,13 @@ High Availability Authentication
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The ``cephx`` authentication system is used by Ceph to authenticate users and
|
||||
daemons and to protect against man-in-the-middle attacks.
|
||||
daemons and to protect against man-in-the-middle attacks.
|
||||
|
||||
.. note:: The ``cephx`` protocol does not address data encryption in transport
|
||||
.. note:: The ``cephx`` protocol does not address data encryption in transport
|
||||
(for example, SSL/TLS) or encryption at rest.
|
||||
|
||||
``cephx`` uses shared secret keys for authentication. This means that both the
|
||||
client and the monitor cluster keep a copy of the client's secret key.
|
||||
client and the monitor cluster keep a copy of the client's secret key.
|
||||
|
||||
The ``cephx`` protocol makes it possible for each party to prove to the other
|
||||
that it has a copy of the key without revealing it. This provides mutual
|
||||
@ -235,7 +240,7 @@ Direct interactions between Ceph clients and OSDs require authenticated
|
||||
connections. The ``cephx`` authentication system establishes and sustains these
|
||||
authenticated connections.
|
||||
|
||||
The ``cephx`` protocol operates in a manner similar to `Kerberos`_.
|
||||
The ``cephx`` protocol operates in a manner similar to `Kerberos`_.
|
||||
|
||||
A user invokes a Ceph client to contact a monitor. Unlike Kerberos, each
|
||||
monitor can authenticate users and distribute keys, which means that there is
|
||||
@ -248,7 +253,7 @@ Monitors. The client then uses the session key to request services from the
|
||||
monitors, and the monitors provide the client with a ticket that authenticates
|
||||
the client against the OSDs that actually handle data. Ceph Monitors and OSDs
|
||||
share a secret, which means that the clients can use the ticket provided by the
|
||||
monitors to authenticate against any OSD or metadata server in the cluster.
|
||||
monitors to authenticate against any OSD or metadata server in the cluster.
|
||||
|
||||
Like Kerberos tickets, ``cephx`` tickets expire. An attacker cannot use an
|
||||
expired ticket or session key that has been obtained surreptitiously. This form
|
||||
@ -264,8 +269,8 @@ subsystem generates the username and key, stores a copy on the monitor(s), and
|
||||
transmits the user's secret back to the ``client.admin`` user. This means that
|
||||
the client and the monitor share a secret key.
|
||||
|
||||
.. note:: The ``client.admin`` user must provide the user ID and
|
||||
secret key to the user in a secure manner.
|
||||
.. note:: The ``client.admin`` user must provide the user ID and
|
||||
secret key to the user in a secure manner.
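In practice, a client simply presents its user name and keyring when it connects and the ``cephx`` handshake happens transparently. A minimal sketch with the ``rados`` Python binding follows; the user name and keyring path are placeholders for whatever credentials the administrator created.

.. code-block:: python

   import rados

   # Authenticate as a named user rather than client.admin; the name and
   # keyring path below are placeholders.
   cluster = rados.Rados(
       name='client.john',
       conffile='/etc/ceph/ceph.conf',
       conf={'keyring': '/etc/ceph/ceph.client.john.keyring'},
   )
   cluster.connect()          # cephx handshake: session key, then service tickets
   print(cluster.get_fsid())  # subsequent operations use the ticketed session
   cluster.shutdown()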
|
||||
|
||||
.. ditaa::
|
||||
|
||||
@ -275,7 +280,7 @@ the client and the monitor share a secret key.
|
||||
| request to |
|
||||
| create a user |
|
||||
|-------------->|----------+ create user
|
||||
| | | and
|
||||
| | | and
|
||||
|<--------------|<---------+ store key
|
||||
| transmit key |
|
||||
| |
|
||||
@ -298,25 +303,25 @@ and uses it to sign requests to OSDs and to metadata servers in the cluster.
|
||||
+---------+ +---------+
|
||||
| authenticate |
|
||||
|-------------->|----------+ generate and
|
||||
| | | encrypt
|
||||
| | | encrypt
|
||||
|<--------------|<---------+ session key
|
||||
| transmit |
|
||||
| encrypted |
|
||||
| session key |
|
||||
| |
|
||||
| |
|
||||
|-----+ decrypt |
|
||||
| | session |
|
||||
|<----+ key |
|
||||
| | session |
|
||||
|<----+ key |
|
||||
| |
|
||||
| req. ticket |
|
||||
|-------------->|----------+ generate and
|
||||
| | | encrypt
|
||||
| | | encrypt
|
||||
|<--------------|<---------+ ticket
|
||||
| recv. ticket |
|
||||
| |
|
||||
| |
|
||||
|-----+ decrypt |
|
||||
| | ticket |
|
||||
|<----+ |
|
||||
| | ticket |
|
||||
|<----+ |
|
||||
|
||||
|
||||
The ``cephx`` protocol authenticates ongoing communications between the clients
|
||||
@ -331,7 +336,7 @@ between the client and the daemon.
|
||||
| Client | | Monitor | | MDS | | OSD |
|
||||
+---------+ +---------+ +-------+ +-------+
|
||||
| request to | | |
|
||||
| create a user | | |
|
||||
| create a user | | |
|
||||
|-------------->| mon and | |
|
||||
|<--------------| client share | |
|
||||
| receive | a secret. | |
|
||||
@ -339,7 +344,7 @@ between the client and the daemon.
|
||||
| |<------------>| |
|
||||
| |<-------------+------------>|
|
||||
| | mon, mds, | |
|
||||
| authenticate | and osd | |
|
||||
| authenticate | and osd | |
|
||||
|-------------->| share | |
|
||||
|<--------------| a secret | |
|
||||
| session key | | |
|
||||
@ -355,7 +360,7 @@ between the client and the daemon.
|
||||
| receive response (CephFS only) |
|
||||
| |
|
||||
| make request |
|
||||
|------------------------------------------->|
|
||||
|------------------------------------------->|
|
||||
|<-------------------------------------------|
|
||||
receive response
|
||||
|
||||
@ -364,7 +369,7 @@ daemons. The authentication is not extended beyond the Ceph client. If a user
|
||||
accesses the Ceph client from a remote host, cephx authentication will not be
|
||||
applied to the connection between the user's host and the client host.
|
||||
|
||||
See `Cephx Config Guide`_ for more on configuration details.
|
||||
See `Cephx Config Guide`_ for more on configuration details.
|
||||
|
||||
See `User Management`_ for more on user management.
|
||||
|
||||
@ -418,7 +423,7 @@ the greater cluster provides several benefits:
|
||||
Monitors receive no such message after a configurable period of time,
|
||||
then they mark the OSD ``down``. This mechanism is a failsafe, however.
|
||||
Normally, Ceph OSD Daemons determine if a neighboring OSD is ``down`` and
|
||||
report it to the Ceph Monitors. This contributes to making Ceph Monitors
|
||||
report it to the Ceph Monitors. This contributes to making Ceph Monitors
|
||||
lightweight processes. See `Monitoring OSDs`_ and `Heartbeats`_ for
|
||||
additional details.
|
||||
|
||||
@ -465,7 +470,7 @@ the greater cluster provides several benefits:
|
||||
Write (2) | | | | Write (3)
|
||||
+------+ | | +------+
|
||||
| +------+ +------+ |
|
||||
| | Ack (4) Ack (5)| |
|
||||
| | Ack (4) Ack (5)| |
|
||||
v * * v
|
||||
+---------------+ +---------------+
|
||||
| Secondary OSD | | Tertiary OSD |
|
||||
@ -492,7 +497,7 @@ About Pools
|
||||
|
||||
The Ceph storage system supports the notion of 'Pools', which are logical
|
||||
partitions for storing objects.
|
||||
|
||||
|
||||
Ceph Clients retrieve a `Cluster Map`_ from a Ceph Monitor, and write RADOS
|
||||
objects to pools. The way that Ceph places the data in the pools is determined
|
||||
by the pool's ``size`` or number of replicas, the CRUSH rule, and the number of
|
||||
@ -513,12 +518,12 @@ placement groups in the pool.
|
||||
+--------+ +---------------+
|
||||
| Pool |---------->| CRUSH Rule |
|
||||
+--------+ Selects +---------------+
|
||||
|
||||
|
||||
|
||||
Pools set at least the following parameters:
|
||||
|
||||
- Ownership/Access to Objects
|
||||
- The Number of Placement Groups, and
|
||||
- The Number of Placement Groups, and
|
||||
- The CRUSH Rule to Use.
|
||||
|
||||
See `Set Pool Values`_ for details.
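Pools can also be created programmatically. The sketch below uses the ``rados`` Python binding; the pool name is a placeholder, an admin connection is assumed, and the PG count, replica size, and CRUSH rule are then tuned with the usual ``ceph osd pool set`` commands.

.. code-block:: python

   import rados

   cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
   cluster.connect()
   if not cluster.pool_exists('liverpool'):
       cluster.create_pool('liverpool')   # PG count, replica size and CRUSH
                                          # rule are adjusted afterwards with
                                          # `ceph osd pool set`
   print(cluster.list_pools())
   cluster.shutdown()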
|
||||
@ -531,12 +536,12 @@ Mapping PGs to OSDs
|
||||
|
||||
Each pool has a number of placement groups (PGs) within it. CRUSH dynamically
|
||||
maps PGs to OSDs. When a Ceph Client stores objects, CRUSH maps each RADOS
|
||||
object to a PG.
|
||||
object to a PG.
|
||||
|
||||
This mapping of RADOS objects to PGs implements an abstraction and indirection
|
||||
layer between Ceph OSD Daemons and Ceph Clients. The Ceph Storage Cluster must
|
||||
be able to grow (or shrink) and redistribute data adaptively when the internal
|
||||
topology changes.
|
||||
topology changes.
|
||||
|
||||
If the Ceph Client "knew" which Ceph OSD Daemons were storing which objects, a
|
||||
tight coupling would exist between the Ceph Client and the Ceph OSD Daemon.
|
||||
@ -565,11 +570,11 @@ placement groups, and how it maps placement groups to OSDs.
|
||||
+------+------+-------------+ |
|
||||
| | | |
|
||||
v v v v
|
||||
/----------\ /----------\ /----------\ /----------\
|
||||
/----------\ /----------\ /----------\ /----------\
|
||||
| | | | | | | |
|
||||
| OSD #1 | | OSD #2 | | OSD #3 | | OSD #4 |
|
||||
| | | | | | | |
|
||||
\----------/ \----------/ \----------/ \----------/
|
||||
\----------/ \----------/ \----------/ \----------/
|
||||
|
||||
The client uses its copy of the cluster map and the CRUSH algorithm to compute
|
||||
precisely which OSD it will use when reading or writing a particular object.
|
||||
@ -583,11 +588,11 @@ When a Ceph Client binds to a Ceph Monitor, it retrieves the latest version of
|
||||
the `Cluster Map`_. When a client has been equipped with a copy of the cluster
|
||||
map, it is aware of all the monitors, OSDs, and metadata servers in the
|
||||
cluster. **However, even equipped with a copy of the latest version of the
|
||||
cluster map, the client doesn't know anything about object locations.**
|
||||
cluster map, the client doesn't know anything about object locations.**
|
||||
|
||||
**Object locations must be computed.**
|
||||
|
||||
The client requies only the object ID and the name of the pool in order to
|
||||
The client requires only the object ID and the name of the pool in order to
|
||||
compute the object location.
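The computation is deterministic and cheap. The following is only a simplified stand-in for the real calculation (Ceph hashes the object name with its rjenkins hash and applies a stable modulo before handing the placement group to CRUSH), but it shows the shape of it; the authoritative answer for any object comes from ``ceph osd map <pool> <object>``.

.. code-block:: python

   import zlib

   def locate(object_name, pool_id, pg_num):
       """Toy stand-in for the client-side placement computation: hash the
       object name, fold the hash onto the pool's PG count, and return a
       placement-group ID of the usual '<pool>.<pg>' form. CRUSH then maps
       that PG to an ordered list of OSDs using the current cluster map."""
       h = zlib.crc32(object_name.encode())    # stand-in for Ceph's rjenkins hash
       pg = h % pg_num                         # stand-in for Ceph's stable_mod
       return '{}.{:x}'.format(pool_id, pg)

   print(locate('john', pool_id=7, pg_num=128))   # prints a PG id such as '7.1f'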
|
||||
|
||||
Ceph stores data in named pools (for example, "liverpool"). When a client
|
||||
@ -626,7 +631,7 @@ persists, you may need to refer to the `Troubleshooting Peering Failure`_
|
||||
section.
|
||||
|
||||
.. Note:: PGs that agree on the state of the cluster do not necessarily have
|
||||
the current data yet.
|
||||
the current data yet.
|
||||
|
||||
The Ceph Storage Cluster was designed to store at least two copies of an object
|
||||
(that is, ``size = 2``), which is the minimum requirement for data safety. For
|
||||
@ -656,7 +661,7 @@ epoch.
|
||||
The Ceph OSD daemons that are part of an *Acting Set* might not always be
|
||||
``up``. When an OSD in the *Acting Set* is ``up``, it is part of the *Up Set*.
|
||||
The *Up Set* is an important distinction, because Ceph can remap PGs to other
|
||||
Ceph OSD Daemons when an OSD fails.
|
||||
Ceph OSD Daemons when an OSD fails.
|
||||
|
||||
.. note:: Consider a hypothetical *Acting Set* for a PG that contains
|
||||
``osd.25``, ``osd.32`` and ``osd.61``. The first OSD (``osd.25``), is the
|
||||
@ -676,7 +681,7 @@ process (albeit rather crudely, since it is substantially less impactful with
|
||||
large clusters) where some, but not all of the PGs migrate from existing OSDs
|
||||
(OSD 1, and OSD 2) to the new OSD (OSD 3). Even when rebalancing, CRUSH is
|
||||
stable. Many of the placement groups remain in their original configuration,
|
||||
and each OSD gets some added capacity, so there are no load spikes on the
|
||||
and each OSD gets some added capacity, so there are no load spikes on the
|
||||
new OSD after rebalancing is complete.
|
||||
|
||||
|
||||
@ -823,7 +828,7 @@ account.
|
||||
| | | |
|
||||
| +-------+-------+ |
|
||||
| ^ |
|
||||
| | |
|
||||
| | |
|
||||
| | |
|
||||
+--+---+ +------+ +---+--+ +---+--+
|
||||
name | NYAN | | NYAN | | NYAN | | NYAN |
|
||||
@ -876,7 +881,7 @@ version 1).
|
||||
.. ditaa::
|
||||
|
||||
Primary OSD
|
||||
|
||||
|
||||
+-------------+
|
||||
| OSD 1 | +-------------+
|
||||
| log | Write Full | |
|
||||
@ -921,7 +926,7 @@ as ``D2v2`` ) while others are acknowledged and persisted to storage drives
|
||||
.. ditaa::
|
||||
|
||||
Primary OSD
|
||||
|
||||
|
||||
+-------------+
|
||||
| OSD 1 |
|
||||
| log |
|
||||
@ -930,11 +935,11 @@ as ``D2v2`` ) while others are acknowledged and persisted to storage drives
|
||||
| +----+ +<------------+ Ceph Client |
|
||||
| | v2 | |
|
||||
| +----+ | +-------------+
|
||||
| |D1v1| 1,1 |
|
||||
| +----+ |
|
||||
+------+------+
|
||||
|
|
||||
|
|
||||
| |D1v1| 1,1 |
|
||||
| +----+ |
|
||||
+------+------+
|
||||
|
|
||||
|
|
||||
| +------+------+
|
||||
| | OSD 2 |
|
||||
| +------+ | log |
|
||||
@ -962,7 +967,7 @@ the logs' ``last_complete`` pointer can move from ``1,1`` to ``1,2``.
|
||||
.. ditaa::
|
||||
|
||||
Primary OSD
|
||||
|
||||
|
||||
+-------------+
|
||||
| OSD 1 |
|
||||
| log |
|
||||
@ -971,10 +976,10 @@ the logs' ``last_complete`` pointer can move from ``1,1`` to ``1,2``.
|
||||
| +----+ +<------------+ Ceph Client |
|
||||
| | v2 | |
|
||||
| +----+ | +-------------+
|
||||
| |D1v1| 1,1 |
|
||||
| +----+ |
|
||||
+------+------+
|
||||
|
|
||||
| |D1v1| 1,1 |
|
||||
| +----+ |
|
||||
+------+------+
|
||||
|
|
||||
| +-------------+
|
||||
| | OSD 2 |
|
||||
| | log |
|
||||
@ -986,7 +991,7 @@ the logs' ``last_complete`` pointer can move from ``1,1`` to ``1,2``.
|
||||
| | |D2v1| 1,1 |
|
||||
| | +----+ |
|
||||
| +-------------+
|
||||
|
|
||||
|
|
||||
| +-------------+
|
||||
| | OSD 3 |
|
||||
| | log |
|
||||
@ -1007,7 +1012,7 @@ on **OSD 3**.
|
||||
.. ditaa::
|
||||
|
||||
Primary OSD
|
||||
|
||||
|
||||
+-------------+
|
||||
| OSD 1 |
|
||||
| log |
|
||||
@ -1050,7 +1055,7 @@ will be the head of the new authoritative log.
|
||||
| (down) |
|
||||
| c333 |
|
||||
+------+------+
|
||||
|
|
||||
|
|
||||
| +-------------+
|
||||
| | OSD 2 |
|
||||
| | log |
|
||||
@ -1059,7 +1064,7 @@ will be the head of the new authoritative log.
|
||||
| | +----+ |
|
||||
| | |
|
||||
| +-------------+
|
||||
|
|
||||
|
|
||||
| +-------------+
|
||||
| | OSD 3 |
|
||||
| | log |
|
||||
@ -1079,20 +1084,20 @@ will be the head of the new authoritative log.
|
||||
| 1,1 |
|
||||
| |
|
||||
+------+------+
|
||||
|
||||
|
||||
|
||||
|
||||
The log entry 1,2 found on **OSD 3** is divergent from the new authoritative log
|
||||
provided by **OSD 4**: it is discarded and the file containing the ``C1v2``
|
||||
chunk is removed. The ``D1v1`` chunk is rebuilt with the ``decode`` function of
|
||||
the erasure coding library during scrubbing and stored on the new primary
|
||||
the erasure coding library during scrubbing and stored on the new primary
|
||||
**OSD 4**.
|
||||
|
||||
|
||||
.. ditaa::
|
||||
|
||||
Primary OSD
|
||||
|
||||
|
||||
+-------------+
|
||||
| OSD 4 |
|
||||
| log |
|
||||
@ -1140,7 +1145,7 @@ configured to act as a cache tier, and a backing pool of either erasure-coded
|
||||
or relatively slower/cheaper devices configured to act as an economical storage
|
||||
tier. The Ceph objecter handles where to place the objects and the tiering
|
||||
agent determines when to flush objects from the cache to the backing storage
|
||||
tier. So the cache tier and the backing storage tier are completely transparent
|
||||
tier. So the cache tier and the backing storage tier are completely transparent
|
||||
to Ceph clients.
|
||||
|
||||
|
||||
@ -1150,14 +1155,14 @@ to Ceph clients.
|
||||
| Ceph Client |
|
||||
+------+------+
|
||||
^
|
||||
Tiering is |
|
||||
Tiering is |
|
||||
Transparent | Faster I/O
|
||||
to Ceph | +---------------+
|
||||
Client Ops | | |
|
||||
Client Ops | | |
|
||||
| +----->+ Cache Tier |
|
||||
| | | |
|
||||
| | +-----+---+-----+
|
||||
| | | ^
|
||||
| | | ^
|
||||
v v | | Active Data in Cache Tier
|
||||
+------+----+--+ | |
|
||||
| Objecter | | |
|
||||
@ -1198,11 +1203,11 @@ operations on the outbound data and return the data to the client.
|
||||
|
||||
A Ceph class for a content management system that presents pictures of a
|
||||
particular size and aspect ratio could take an inbound bitmap image, crop it
|
||||
to a particular aspect ratio, resize it and embed an invisible copyright or
|
||||
watermark to help protect the intellectual property; then, save the
|
||||
to a particular aspect ratio, resize it and embed an invisible copyright or
|
||||
watermark to help protect the intellectual property; then, save the
|
||||
resulting bitmap image to the object store.
|
||||
|
||||
See ``src/objclass/objclass.h``, ``src/fooclass.cc`` and ``src/barclass`` for
|
||||
See ``src/objclass/objclass.h``, ``src/fooclass.cc`` and ``src/barclass`` for
|
||||
exemplary implementations.
|
||||
|
||||
|
||||
@ -1279,7 +1284,7 @@ synchronization/communication channel.
|
||||
+----------+ +----------+ +----------+ +---------------+
|
||||
| | | |
|
||||
| | | |
|
||||
| | Watch Object | |
|
||||
| | Watch Object | |
|
||||
|--------------------------------------------------->|
|
||||
| | | |
|
||||
|<---------------------------------------------------|
|
||||
@ -1295,7 +1300,7 @@ synchronization/communication channel.
|
||||
| | | |
|
||||
| | |<-----------------|
|
||||
| | | Ack/Commit |
|
||||
| | Notify | |
|
||||
| | Notify | |
|
||||
|--------------------------------------------------->|
|
||||
| | | |
|
||||
|<---------------------------------------------------|
|
||||
@ -1305,7 +1310,7 @@ synchronization/communication channel.
|
||||
| | Notify | |
|
||||
| | |<-----------------|
|
||||
| | | Notify |
|
||||
| | Ack | |
|
||||
| | Ack | |
|
||||
|----------------+---------------------------------->|
|
||||
| | | |
|
||||
| | Ack | |
|
||||
@ -1313,7 +1318,7 @@ synchronization/communication channel.
|
||||
| | | |
|
||||
| | | Ack |
|
||||
| | |----------------->|
|
||||
| | | |
|
||||
| | | |
|
||||
|<---------------+----------------+------------------|
|
||||
| Complete
|
||||
|
||||
@ -1331,13 +1336,13 @@ volume'. Ceph's striping offers the throughput of RAID 0 striping, the
|
||||
reliability of n-way RAID mirroring and faster recovery.
|
||||
|
||||
Ceph provides three types of clients: Ceph Block Device, Ceph File System, and
|
||||
Ceph Object Storage. A Ceph Client converts its data from the representation
|
||||
Ceph Object Storage. A Ceph Client converts its data from the representation
|
||||
format it provides to its users (a block device image, RESTful objects, CephFS
|
||||
filesystem directories) into objects for storage in the Ceph Storage Cluster.
|
||||
filesystem directories) into objects for storage in the Ceph Storage Cluster.
|
||||
|
||||
.. tip:: The objects Ceph stores in the Ceph Storage Cluster are not striped.
|
||||
Ceph Object Storage, Ceph Block Device, and the Ceph File System stripe their
|
||||
data over multiple Ceph Storage Cluster objects. Ceph Clients that write
|
||||
.. tip:: The objects Ceph stores in the Ceph Storage Cluster are not striped.
|
||||
Ceph Object Storage, Ceph Block Device, and the Ceph File System stripe their
|
||||
data over multiple Ceph Storage Cluster objects. Ceph Clients that write
|
||||
directly to the Ceph Storage Cluster via ``librados`` must perform the
|
||||
striping (and parallel I/O) for themselves to obtain these benefits.
|
||||
|
||||
@ -1380,7 +1385,7 @@ diagram depicts the simplest form of striping:
|
||||
| End cCCC | | End cCCC |
|
||||
| Object 0 | | Object 1 |
|
||||
\-----------/ \-----------/
|
||||
|
||||
|
||||
|
||||
If you anticipate large image sizes, large S3 or Swift objects (e.g., video),
|
||||
or large CephFS directories, you may see considerable read/write performance
|
||||
@ -1420,16 +1425,16 @@ stripe (``stripe unit 16``) in the first object in the new object set (``object
|
||||
+-----------------+--------+--------+-----------------+
|
||||
| | | | +--\
|
||||
v v v v |
|
||||
/-----------\ /-----------\ /-----------\ /-----------\ |
|
||||
/-----------\ /-----------\ /-----------\ /-----------\ |
|
||||
| Begin cCCC| | Begin cCCC| | Begin cCCC| | Begin cCCC| |
|
||||
| Object 0 | | Object 1 | | Object 2 | | Object 3 | |
|
||||
+-----------+ +-----------+ +-----------+ +-----------+ |
|
||||
| stripe | | stripe | | stripe | | stripe | |
|
||||
| unit 0 | | unit 1 | | unit 2 | | unit 3 | |
|
||||
+-----------+ +-----------+ +-----------+ +-----------+ |
|
||||
| stripe | | stripe | | stripe | | stripe | +-\
|
||||
| stripe | | stripe | | stripe | | stripe | +-\
|
||||
| unit 4 | | unit 5 | | unit 6 | | unit 7 | | Object
|
||||
+-----------+ +-----------+ +-----------+ +-----------+ +- Set
|
||||
+-----------+ +-----------+ +-----------+ +-----------+ +- Set
|
||||
| stripe | | stripe | | stripe | | stripe | | 1
|
||||
| unit 8 | | unit 9 | | unit 10 | | unit 11 | +-/
|
||||
+-----------+ +-----------+ +-----------+ +-----------+ |
|
||||
@ -1437,36 +1442,36 @@ stripe (``stripe unit 16``) in the first object in the new object set (``object
|
||||
| unit 12 | | unit 13 | | unit 14 | | unit 15 | |
|
||||
+-----------+ +-----------+ +-----------+ +-----------+ |
|
||||
| End cCCC | | End cCCC | | End cCCC | | End cCCC | |
|
||||
| Object 0 | | Object 1 | | Object 2 | | Object 3 | |
|
||||
| Object 0 | | Object 1 | | Object 2 | | Object 3 | |
|
||||
\-----------/ \-----------/ \-----------/ \-----------/ |
|
||||
|
|
||||
+--/
|
||||
|
||||
|
||||
+--\
|
||||
|
|
||||
/-----------\ /-----------\ /-----------\ /-----------\ |
|
||||
/-----------\ /-----------\ /-----------\ /-----------\ |
|
||||
| Begin cCCC| | Begin cCCC| | Begin cCCC| | Begin cCCC| |
|
||||
| Object 4 | | Object 5 | | Object 6 | | Object 7 | |
|
||||
| Object 4 | | Object 5 | | Object 6 | | Object 7 | |
|
||||
+-----------+ +-----------+ +-----------+ +-----------+ |
|
||||
| stripe | | stripe | | stripe | | stripe | |
|
||||
| unit 16 | | unit 17 | | unit 18 | | unit 19 | |
|
||||
+-----------+ +-----------+ +-----------+ +-----------+ |
|
||||
| stripe | | stripe | | stripe | | stripe | +-\
|
||||
| stripe | | stripe | | stripe | | stripe | +-\
|
||||
| unit 20 | | unit 21 | | unit 22 | | unit 23 | | Object
|
||||
+-----------+ +-----------+ +-----------+ +-----------+ +- Set
|
||||
| stripe | | stripe | | stripe | | stripe | | 2
|
||||
| stripe | | stripe | | stripe | | stripe | | 2
|
||||
| unit 24 | | unit 25 | | unit 26 | | unit 27 | +-/
|
||||
+-----------+ +-----------+ +-----------+ +-----------+ |
|
||||
| stripe | | stripe | | stripe | | stripe | |
|
||||
| unit 28 | | unit 29 | | unit 30 | | unit 31 | |
|
||||
+-----------+ +-----------+ +-----------+ +-----------+ |
|
||||
| End cCCC | | End cCCC | | End cCCC | | End cCCC | |
|
||||
| Object 4 | | Object 5 | | Object 6 | | Object 7 | |
|
||||
| Object 4 | | Object 5 | | Object 6 | | Object 7 | |
|
||||
\-----------/ \-----------/ \-----------/ \-----------/ |
|
||||
|
|
||||
+--/
|
||||
|
||||
Three important variables determine how Ceph stripes data:
|
||||
Three important variables determine how Ceph stripes data:
|
||||
|
||||
- **Object Size:** Objects in the Ceph Storage Cluster have a maximum
|
||||
configurable size (e.g., 2MB, 4MB, etc.). The object size should be large
|
||||
@ -1474,24 +1479,24 @@ Three important variables determine how Ceph stripes data:
|
||||
the stripe unit.
|
||||
|
||||
- **Stripe Width:** Stripes have a configurable unit size (e.g., 64kb).
|
||||
The Ceph Client divides the data it will write to objects into equally
|
||||
sized stripe units, except for the last stripe unit. A stripe width,
|
||||
should be a fraction of the Object Size so that an object may contain
|
||||
The Ceph Client divides the data it will write to objects into equally
|
||||
sized stripe units, except for the last stripe unit. A stripe width,
|
||||
should be a fraction of the Object Size so that an object may contain
|
||||
many stripe units.
|
||||
|
||||
- **Stripe Count:** The Ceph Client writes a sequence of stripe units
|
||||
over a series of objects determined by the stripe count. The series
|
||||
of objects is called an object set. After the Ceph Client writes to
|
||||
over a series of objects determined by the stripe count. The series
|
||||
of objects is called an object set. After the Ceph Client writes to
|
||||
the last object in the object set, it returns to the first object in
|
||||
the object set.
|
||||
|
||||
|
||||
.. important:: Test the performance of your striping configuration before
|
||||
putting your cluster into production. You CANNOT change these striping
|
||||
parameters after you stripe the data and write it to objects.
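The interaction of these variables can be made concrete with a little arithmetic. The sketch below maps a byte offset to the object and intra-object offset it lands in; it is an illustration of the layout shown in the diagram above, not the actual ``librbd``/``libcephfs`` code, and it assumes the stripe unit evenly divides the object size.

.. code-block:: python

   def stripe_location(offset, stripe_unit, stripe_count, object_size):
       """Return (object_index, offset_within_object) for a byte offset,
       following the layout above: stripe units are written round-robin
       across `stripe_count` objects, which together form one object set."""
       assert object_size % stripe_unit == 0
       stripes_per_object = object_size // stripe_unit

       block = offset // stripe_unit             # which stripe unit overall
       stripe_no = block // stripe_count         # which row of stripe units
       stripe_pos = block % stripe_count         # which object within the row
       object_set = stripe_no // stripes_per_object
       object_index = object_set * stripe_count + stripe_pos
       offset_in_object = ((stripe_no % stripes_per_object) * stripe_unit
                           + offset % stripe_unit)
       return object_index, offset_in_object

   # 64 KiB stripe unit, 4 objects per set, 256 KiB objects (4 units each):
   # stripe unit 16 opens the second object set, i.e. object 4 at offset 0.
   print(stripe_location(16 * 65536, 65536, 4, 4 * 65536))   # -> (4, 0)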
|
||||
|
||||
Once the Ceph Client has striped data to stripe units and mapped the stripe
|
||||
units to objects, Ceph's CRUSH algorithm maps the objects to placement groups,
|
||||
and the placement groups to Ceph OSD Daemons before the objects are stored as
|
||||
and the placement groups to Ceph OSD Daemons before the objects are stored as
|
||||
files on a storage drive.
|
||||
|
||||
.. note:: Since a client writes to a single pool, all data striped into objects
|
||||
@ -1513,23 +1518,23 @@ Ceph Clients include a number of service interfaces. These include:
|
||||
that uses ``librbd`` directly--avoiding the kernel object overhead for
|
||||
virtualized systems.
|
||||
|
||||
- **Object Storage:** The :term:`Ceph Object Storage` (a.k.a., RGW) service
|
||||
- **Object Storage:** The :term:`Ceph Object Storage` (a.k.a., RGW) service
|
||||
provides RESTful APIs with interfaces that are compatible with Amazon S3
|
||||
and OpenStack Swift.
|
||||
|
||||
- **Filesystem**: The :term:`Ceph File System` (CephFS) service provides
|
||||
a POSIX compliant filesystem usable with ``mount`` or as
|
||||
and OpenStack Swift.
|
||||
|
||||
- **Filesystem**: The :term:`Ceph File System` (CephFS) service provides
|
||||
a POSIX compliant filesystem usable with ``mount`` or as
|
||||
a filesystem in user space (FUSE).
|
||||
|
||||
Ceph can run additional instances of OSDs, MDSs, and monitors for scalability
|
||||
and high availability. The following diagram depicts the high-level
|
||||
architecture.
|
||||
architecture.
|
||||
|
||||
.. ditaa::
|
||||
|
||||
+--------------+ +----------------+ +-------------+
|
||||
| Block Device | | Object Storage | | CephFS |
|
||||
+--------------+ +----------------+ +-------------+
|
||||
+--------------+ +----------------+ +-------------+
|
||||
|
||||
+--------------+ +----------------+ +-------------+
|
||||
| librbd | | librgw | | libcephfs |
|
||||
@ -1561,10 +1566,10 @@ another application.
|
||||
.. topic:: S3/Swift Objects and Store Cluster Objects Compared
|
||||
|
||||
Ceph's Object Storage uses the term *object* to describe the data it stores.
|
||||
S3 and Swift objects are not the same as the objects that Ceph writes to the
|
||||
S3 and Swift objects are not the same as the objects that Ceph writes to the
|
||||
Ceph Storage Cluster. Ceph Object Storage objects are mapped to Ceph Storage
|
||||
Cluster objects. The S3 and Swift objects do not necessarily
|
||||
correspond in a 1:1 manner with an object stored in the storage cluster. It
|
||||
Cluster objects. The S3 and Swift objects do not necessarily
|
||||
correspond in a 1:1 manner with an object stored in the storage cluster. It
|
||||
is possible for an S3 or Swift object to map to multiple Ceph objects.
|
||||
|
||||
See `Ceph Object Storage`_ for details.
|
||||
@ -1580,7 +1585,7 @@ Ceph Storage Cluster, where each object gets mapped to a placement group and
|
||||
distributed, and the placement groups are spread across separate ``ceph-osd``
|
||||
daemons throughout the cluster.
|
||||
|
||||
.. important:: Striping allows RBD block devices to perform better than a single
|
||||
.. important:: Striping allows RBD block devices to perform better than a single
|
||||
server could!
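As an example of the client side, the ``rbd`` Python binding can create and write such a thin-provisioned, striped image; the pool and image names below are placeholders, and a configured cluster connection is assumed.

.. code-block:: python

   import rados
   import rbd

   cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
   cluster.connect()
   try:
       ioctx = cluster.open_ioctx('rbd')                     # placeholder pool
       rbd.RBD().create(ioctx, 'vm-disk-1', 10 * 1024**3)    # 10 GiB, thin-provisioned
       image = rbd.Image(ioctx, 'vm-disk-1')
       image.write(b'boot sector goes here', 0)              # striped over RADOS objects
       image.close()
       ioctx.close()
   finally:
       cluster.shutdown()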
|
||||
|
||||
Thin-provisioned snapshottable Ceph Block Devices are an attractive option for
|
||||
@ -1589,7 +1594,8 @@ typically deploy a Ceph Block Device with the ``rbd`` network storage driver in
|
||||
QEMU/KVM, where the host machine uses ``librbd`` to provide a block device
|
||||
service to the guest. Many cloud computing stacks use ``libvirt`` to integrate
|
||||
with hypervisors. You can use thin-provisioned Ceph Block Devices with QEMU and
|
||||
``libvirt`` to support OpenStack and CloudStack among other solutions.
|
||||
``libvirt`` to support OpenStack, OpenNebula and CloudStack
|
||||
among other solutions.
|
||||
|
||||
While we do not provide ``librbd`` support with other hypervisors at this time,
|
||||
you may also use Ceph Block Device kernel objects to provide a block device to a
|
||||
@ -1614,7 +1620,7 @@ a Filesystem in User Space (FUSE).
|
||||
|
||||
+-----------------------+ +------------------------+
|
||||
| CephFS Kernel Object | | CephFS FUSE |
|
||||
+-----------------------+ +------------------------+
|
||||
+-----------------------+ +------------------------+
|
||||
|
||||
+---------------------------------------------------+
|
||||
| CephFS Library (libcephfs) |
|
||||
@ -1643,9 +1649,9 @@ CephFS separates the metadata from the data, storing the metadata in the MDS,
|
||||
and storing the file data in one or more objects in the Ceph Storage Cluster.
|
||||
The Ceph filesystem aims for POSIX compatibility. ``ceph-mds`` can run as a
|
||||
single process, or it can be distributed out to multiple physical machines,
|
||||
either for high availability or for scalability.
|
||||
either for high availability or for scalability.
|
||||
|
||||
- **High Availability**: The extra ``ceph-mds`` instances can be `standby`,
|
||||
- **High Availability**: The extra ``ceph-mds`` instances can be `standby`,
|
||||
ready to take over the duties of any failed ``ceph-mds`` that was
|
||||
`active`. This is easy because all the data, including the journal, is
|
||||
stored on RADOS. The transition is triggered automatically by ``ceph-mon``.
|
||||
|
@ -22,20 +22,20 @@ Preparation
|
||||
#. Make sure that the ``cephadm`` command line tool is available on each host
|
||||
in the existing cluster. See :ref:`get-cephadm` to learn how.
|
||||
|
||||
#. Prepare each host for use by ``cephadm`` by running this command:
|
||||
#. Prepare each host for use by ``cephadm`` by running this command on that host:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
cephadm prepare-host
|
||||
|
||||
#. Choose a version of Ceph to use for the conversion. This procedure will work
|
||||
with any release of Ceph that is Octopus (15.2.z) or later, inclusive. The
|
||||
with any release of Ceph that is Octopus (15.2.z) or later. The
|
||||
latest stable release of Ceph is the default. You might be upgrading from an
|
||||
earlier Ceph release at the same time that you're performing this
|
||||
conversion; if you are upgrading from an earlier release, make sure to
|
||||
conversion. If you are upgrading from an earlier release, make sure to
|
||||
follow any upgrade-related instructions for that release.
|
||||
|
||||
Pass the image to cephadm with the following command:
|
||||
Pass the Ceph container image to cephadm with the following command:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
@ -50,25 +50,27 @@ Preparation
|
||||
|
||||
cephadm ls
|
||||
|
||||
Before starting the conversion process, ``cephadm ls`` shows all existing
|
||||
daemons to have a style of ``legacy``. As the adoption process progresses,
|
||||
adopted daemons will appear with a style of ``cephadm:v1``.
|
||||
Before starting the conversion process, ``cephadm ls`` reports all existing
|
||||
daemons with the style ``legacy``. As the adoption process progresses,
|
||||
adopted daemons will appear with the style ``cephadm:v1``.
|
||||
|
||||
|
||||
Adoption process
|
||||
----------------
|
||||
|
||||
#. Make sure that the ceph configuration has been migrated to use the cluster
|
||||
config database. If the ``/etc/ceph/ceph.conf`` is identical on each host,
|
||||
then the following command can be run on one single host and will affect all
|
||||
hosts:
|
||||
#. Make sure that the ceph configuration has been migrated to use the cluster's
|
||||
central config database. If ``/etc/ceph/ceph.conf`` is identical on all
|
||||
hosts, then the following command can be run on one host and will take
|
||||
effect for all hosts:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph config assimilate-conf -i /etc/ceph/ceph.conf
|
||||
|
||||
If there are configuration variations between hosts, you will need to repeat
|
||||
this command on each host. During this adoption process, view the cluster's
|
||||
this command on each host. Be aware that if there are conflicting option
|
||||
settings across hosts, the values from the last host will be used. During this
|
||||
adoption process, view the cluster's central
|
||||
configuration to confirm that it is complete by running the following
|
||||
command:
|
||||
|
||||
@ -76,36 +78,36 @@ Adoption process
|
||||
|
||||
ceph config dump
|
||||
|
||||
#. Adopt each monitor:
|
||||
#. Adopt each Monitor:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
cephadm adopt --style legacy --name mon.<hostname>
|
||||
|
||||
Each legacy monitor should stop, quickly restart as a cephadm
|
||||
Each legacy Monitor will stop, quickly restart as a cephadm
|
||||
container, and rejoin the quorum.
|
||||
|
||||
#. Adopt each manager:
|
||||
#. Adopt each Manager:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
cephadm adopt --style legacy --name mgr.<hostname>
|
||||
|
||||
#. Enable cephadm:
|
||||
#. Enable cephadm orchestration:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph mgr module enable cephadm
|
||||
ceph orch set backend cephadm
|
||||
|
||||
#. Generate an SSH key:
|
||||
#. Generate an SSH key for cephadm:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph cephadm generate-key
|
||||
ceph cephadm get-pub-key > ~/ceph.pub
|
||||
|
||||
#. Install the cluster SSH key on each host in the cluster:
|
||||
#. Install the cephadm SSH key on each host in the cluster:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
@ -118,9 +120,10 @@ Adoption process
|
||||
SSH keys.
|
||||
|
||||
.. note::
|
||||
It is also possible to have cephadm use a non-root user to SSH
|
||||
It is also possible to arrange for cephadm to use a non-root user to SSH
|
||||
into cluster hosts. This user needs to have passwordless sudo access.
|
||||
Use ``ceph cephadm set-user <user>`` and copy the SSH key to that user.
|
||||
Use ``ceph cephadm set-user <user>`` and copy the SSH key to that user's
|
||||
home directory on each host.
|
||||
See :ref:`cephadm-ssh-user`
|
||||
|
||||
#. Tell cephadm which hosts to manage:
|
||||
@ -129,10 +132,10 @@ Adoption process
|
||||
|
||||
ceph orch host add <hostname> [ip-address]
|
||||
|
||||
This will perform a ``cephadm check-host`` on each host before adding it;
|
||||
this check ensures that the host is functioning properly. The IP address
|
||||
argument is recommended; if not provided, then the host name will be resolved
|
||||
via DNS.
|
||||
This will run ``cephadm check-host`` on each host before adding it.
|
||||
This check ensures that the host is functioning properly. The IP address
|
||||
argument is recommended. If the address is not provided, then the host name
|
||||
will be resolved via DNS.
|
||||
|
||||
#. Verify that the adopted monitor and manager daemons are visible:
|
||||
|
||||
@ -153,8 +156,8 @@ Adoption process
|
||||
cephadm adopt --style legacy --name osd.1
|
||||
cephadm adopt --style legacy --name osd.2
|
||||
|
||||
#. Redeploy MDS daemons by telling cephadm how many daemons to run for
|
||||
each file system. List file systems by name with the command ``ceph fs
|
||||
#. Redeploy CephFS MDS daemons (if deployed) by telling cephadm how many daemons to run for
|
||||
each file system. List CephFS file systems by name with the command ``ceph fs
|
||||
ls``. Run the following command on the master nodes to redeploy the MDS
|
||||
daemons:
|
||||
|
||||
@ -189,19 +192,19 @@ Adoption process
|
||||
systemctl stop ceph-mds.target
|
||||
rm -rf /var/lib/ceph/mds/ceph-*
|
||||
|
||||
#. Redeploy RGW daemons. Cephadm manages RGW daemons by zone. For each
|
||||
zone, deploy new RGW daemons with cephadm:
|
||||
#. Redeploy Ceph Object Gateway RGW daemons if deployed. Cephadm manages RGW
|
||||
daemons by zone. For each zone, deploy new RGW daemons with cephadm:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph orch apply rgw <svc_id> [--realm=<realm>] [--zone=<zone>] [--port=<port>] [--ssl] [--placement=<placement>]
|
||||
|
||||
where *<placement>* can be a simple daemon count, or a list of
|
||||
specific hosts (see :ref:`orchestrator-cli-placement-spec`), and the
|
||||
specific hosts (see :ref:`orchestrator-cli-placement-spec`). The
|
||||
zone and realm arguments are needed only for a multisite setup.
|
||||
|
||||
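A hypothetical single-zone example (the service id ``myrgw``, the hostnames,
and the daemon count below are placeholders, not values taken from this
procedure):

.. prompt:: bash #

   ceph orch apply rgw myrgw --placement="2 host1 host2"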
After the daemons have started and you have confirmed that they are
|
||||
functioning, stop and remove the old, legacy daemons:
|
||||
functioning, stop and remove the legacy daemons:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
|
@ -1,36 +1,36 @@
|
||||
=======================
|
||||
Basic Ceph Client Setup
|
||||
=======================
|
||||
Client machines require some basic configuration to interact with
|
||||
Ceph clusters. This section describes how to configure a client machine
|
||||
so that it can interact with a Ceph cluster.
|
||||
Client hosts require basic configuration to interact with
|
||||
Ceph clusters. This section describes how to perform this configuration.
|
||||
|
||||
.. note::
|
||||
Most client machines need to install only the `ceph-common` package
|
||||
and its dependencies. Such a setup supplies the basic `ceph` and
|
||||
`rados` commands, as well as other commands including `mount.ceph`
|
||||
and `rbd`.
|
||||
Most client hosts need to install only the ``ceph-common`` package
|
||||
and its dependencies. Such an installation supplies the basic ``ceph`` and
|
||||
``rados`` commands, as well as other commands including ``mount.ceph``
|
||||
and ``rbd``.
|
||||
|
||||
Config File Setup
|
||||
=================
|
||||
Client machines usually require smaller configuration files (here
|
||||
sometimes called "config files") than do full-fledged cluster members.
|
||||
Client hosts usually require smaller configuration files (here
|
||||
sometimes called "config files") than do back-end cluster hosts.
|
||||
To generate a minimal config file, log into a host that has been
|
||||
configured as a client or that is running a cluster daemon, and then run the following command:
|
||||
configured as a client or that is running a cluster daemon, then
|
||||
run the following command:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph config generate-minimal-conf
|
||||
|
||||
This command generates a minimal config file that tells the client how
|
||||
to reach the Ceph monitors. The contents of this file should usually
|
||||
be installed in ``/etc/ceph/ceph.conf``.
|
||||
to reach the Ceph Monitors. This file should usually
|
||||
be copied to ``/etc/ceph/ceph.conf`` on each client host.
|
||||
|
||||
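For example, a minimal sketch that writes the generated configuration into
place (assumes sufficient privileges to write to ``/etc/ceph`` on the client
host):

.. prompt:: bash #

   ceph config generate-minimal-conf > /etc/ceph/ceph.conf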
Keyring Setup
|
||||
=============
|
||||
Most Ceph clusters run with authentication enabled. This means that
|
||||
the client needs keys in order to communicate with the machines in the
|
||||
cluster. To generate a keyring file with credentials for `client.fs`,
|
||||
the client needs keys in order to communicate with Ceph daemons.
|
||||
To generate a keyring file with credentials for ``client.fs``,
|
||||
log into an running cluster member and run the following command:
|
||||
|
||||
.. prompt:: bash $
|
||||
@ -40,6 +40,10 @@ log into an running cluster member and run the following command:
|
||||
The resulting output is directed into a keyring file, typically
|
||||
``/etc/ceph/ceph.keyring``.
|
||||
|
||||
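As a sketch, the keyring for an existing entity can be captured directly into
that file (``client.fs`` is the example entity used above; substitute the name
of your client):

.. prompt:: bash $

   ceph auth get client.fs > /etc/ceph/ceph.keyring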
To gain a broader understanding of client keyring distribution and administration, you should read :ref:`client_keyrings_and_configs`.
|
||||
To gain a broader understanding of client keyring distribution and administration,
|
||||
you should read :ref:`client_keyrings_and_configs`.
|
||||
|
||||
To see an example that explains how to distribute ``ceph.conf`` configuration files to hosts that are tagged with the ``bare_config`` label, you should read the section called "Distributing ceph.conf to hosts tagged with bare_config" in the section called :ref:`etc_ceph_conf_distribution`.
|
||||
To see an example that explains how to distribute ``ceph.conf`` configuration
|
||||
files to hosts that are tagged with the ``bare_config`` label, you should read
|
||||
the subsection named "Distributing ceph.conf to hosts tagged with bare_config"
|
||||
under the heading :ref:`etc_ceph_conf_distribution`.
|
||||
|
@ -30,8 +30,8 @@ This table shows which version pairs are expected to work or not work together:
|
||||
|
||||
.. note::
|
||||
|
||||
While not all podman versions have been actively tested against
|
||||
all Ceph versions, there are no known issues with using podman
|
||||
While not all Podman versions have been actively tested against
|
||||
all Ceph versions, there are no known issues with using Podman
|
||||
version 3.0 or greater with Ceph Quincy and later releases.
|
||||
|
||||
.. warning::
|
||||
|
@ -74,9 +74,9 @@ To add each new host to the cluster, perform two steps:
|
||||
ceph orch host add host2 10.10.0.102
|
||||
ceph orch host add host3 10.10.0.103
|
||||
|
||||
It is best to explicitly provide the host IP address. If an IP is
|
||||
It is best to explicitly provide the host IP address. If an address is
|
||||
not provided, then the host name will be immediately resolved via
|
||||
DNS and that IP will be used.
|
||||
DNS and the result will be used.
|
||||
|
||||
One or more labels can also be included to immediately label the
|
||||
new host. For example, by default the ``_admin`` label will make
|
||||
@ -104,7 +104,7 @@ To drain all daemons from a host, run a command of the following form:
|
||||
The ``_no_schedule`` and ``_no_conf_keyring`` labels will be applied to the
|
||||
host. See :ref:`cephadm-special-host-labels`.
|
||||
|
||||
If you only want to drain daemons but leave managed ceph conf and keyring
|
||||
If you want to drain daemons but leave managed ``ceph.conf`` and keyring
|
||||
files on the host, you may pass the ``--keep-conf-keyring`` flag to the
|
||||
drain command.
|
||||
|
||||
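For example (``host2`` is a placeholder hostname):

.. prompt:: bash #

   ceph orch host drain host2 --keep-conf-keyring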
@ -115,7 +115,8 @@ drain command.
|
||||
This will apply the ``_no_schedule`` label to the host but not the
|
||||
``_no_conf_keyring`` label.
|
||||
|
||||
All OSDs on the host will be scheduled to be removed. You can check the progress of the OSD removal operation with the following command:
|
||||
All OSDs on the host will be scheduled to be removed. You can check
|
||||
progress of the OSD removal operation with the following command:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
@ -148,7 +149,7 @@ cluster by running the following command:
|
||||
Offline host removal
|
||||
--------------------
|
||||
|
||||
Even if a host is offline and can not be recovered, it can be removed from the
|
||||
If a host is offline and cannot be recovered, it can be removed from the
|
||||
cluster by running a command of the following form:
|
||||
|
||||
.. prompt:: bash #
|
||||
@ -250,8 +251,8 @@ Rescanning Host Devices
|
||||
=======================
|
||||
|
||||
Some servers and external enclosures may not register device removal or insertion with the
|
||||
kernel. In these scenarios, you'll need to perform a host rescan. A rescan is typically
|
||||
non-disruptive, and can be performed with the following CLI command:
|
||||
kernel. In these scenarios, you'll need to perform a device rescan on the appropriate host.
|
||||
A rescan is typically non-disruptive, and can be performed with the following CLI command:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
@ -314,19 +315,43 @@ create a new CRUSH host located in the specified hierarchy.
|
||||
|
||||
.. note::
|
||||
|
||||
The ``location`` attribute will be only affect the initial CRUSH location. Subsequent
|
||||
changes of the ``location`` property will be ignored. Also, removing a host will not remove
|
||||
any CRUSH buckets.
|
||||
The ``location`` attribute will only affect the initial CRUSH location.
|
||||
Subsequent changes of the ``location`` property will be ignored. Also,
|
||||
removing a host will not remove an associated CRUSH bucket unless the
|
||||
``--rm-crush-entry`` flag is provided to the ``orch host rm`` command.
|
||||
|
||||
See also :ref:`crush_map_default_types`.
|
||||
|
||||
Removing a host from the CRUSH map
|
||||
==================================
|
||||
|
||||
The ``ceph orch host rm`` command has support for removing the associated host bucket
|
||||
from the CRUSH map. This is done by providing the ``--rm-crush-entry`` flag.
|
||||
|
||||
.. prompt:: bash [ceph:root@host1/]#
|
||||
|
||||
ceph orch host rm host1 --rm-crush-entry
|
||||
|
||||
When this flag is specified, cephadm will attempt to remove the host bucket
|
||||
from the CRUSH map as part of the host removal process. Note that if
|
||||
it fails to do so, cephadm will report the failure and the host will remain under
|
||||
cephadm control.
|
||||
|
||||
.. note::
|
||||
|
||||
Removal from the CRUSH map will fail if there are OSDs deployed on the
|
||||
host. If you would like to remove all the host's OSDs as well, please start
|
||||
by using the ``ceph orch host drain`` command to do so. Once the OSDs
|
||||
have been removed, you may direct cephadm to remove the CRUSH bucket
|
||||
along with the host using the ``--rm-crush-entry`` flag.
|
||||
|
||||
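As a sketch, the full sequence for a hypothetical host named ``host1`` that
still carries OSDs would therefore be:

.. prompt:: bash #

   ceph orch host drain host1
   ceph orch host rm host1 --rm-crush-entry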
OS Tuning Profiles
|
||||
==================
|
||||
|
||||
Cephadm can be used to manage operating-system-tuning profiles that apply sets
|
||||
of sysctl settings to sets of hosts.
|
||||
Cephadm can be used to manage operating system tuning profiles that apply
|
||||
``sysctl`` settings to sets of hosts.
|
||||
|
||||
Create a YAML spec file in the following format:
|
||||
To do so, create a YAML spec file in the following format:
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
@ -345,18 +370,21 @@ Apply the tuning profile with the following command:
|
||||
|
||||
ceph orch tuned-profile apply -i <tuned-profile-file-name>
|
||||
|
||||
This profile is written to ``/etc/sysctl.d/`` on each host that matches the
|
||||
hosts specified in the placement block of the yaml, and ``sysctl --system`` is
|
||||
This profile is written to a file under ``/etc/sysctl.d/`` on each host
|
||||
specified in the ``placement`` block, then ``sysctl --system`` is
|
||||
run on the host.
|
||||
|
||||
.. note::
|
||||
|
||||
The exact filename that the profile is written to within ``/etc/sysctl.d/``
|
||||
is ``<profile-name>-cephadm-tuned-profile.conf``, where ``<profile-name>`` is
|
||||
the ``profile_name`` setting that you specify in the YAML spec. Because
|
||||
the ``profile_name`` setting that you specify in the YAML spec. We suggest
|
||||
naming these profiles following the usual ``sysctl.d`` ``NN-xxxxx`` convention. Because
|
||||
sysctl settings are applied in lexicographical order (sorted by the filename
|
||||
in which the setting is specified), you may want to set the ``profile_name``
|
||||
in your spec so that it is applied before or after other conf files.
|
||||
in which the setting is specified), you may want to carefully choose
|
||||
the ``profile_name`` in your spec so that it is applied before or after other
|
||||
conf files. Careful selection ensures that values supplied here override or
|
||||
do not override those in other ``sysctl.d`` files as desired.
|
||||
|
||||
.. note::
|
||||
|
||||
@ -365,7 +393,7 @@ run on the host.
|
||||
|
||||
.. note::
|
||||
|
||||
Applying tuned profiles is idempotent when the ``--no-overwrite`` option is
|
||||
Applying tuning profiles is idempotent when the ``--no-overwrite`` option is
|
||||
passed. Moreover, if the ``--no-overwrite`` option is passed, existing
|
||||
profiles with the same name are not overwritten.
|
||||
|
||||
@ -525,7 +553,7 @@ There are two ways to customize this configuration for your environment:
|
||||
|
||||
We do *not recommend* this approach. The path name must be
|
||||
visible to *any* mgr daemon, and cephadm runs all daemons as
|
||||
containers. That means that the file either need to be placed
|
||||
containers. That means that the file must either be placed
|
||||
inside a customized container image for your deployment, or
|
||||
manually distributed to the mgr data directory
|
||||
(``/var/lib/ceph/<cluster-fsid>/mgr.<id>`` on the host, visible at
|
||||
@ -578,8 +606,8 @@ Note that ``man hostname`` recommends ``hostname`` to return the bare
|
||||
host name:
|
||||
|
||||
The FQDN (Fully Qualified Domain Name) of the system is the
|
||||
name that the resolver(3) returns for the host name, such as,
|
||||
ursula.example.com. It is usually the hostname followed by the DNS
|
||||
name that the resolver(3) returns for the host name, for example
|
||||
``ursula.example.com``. It is usually the short hostname followed by the DNS
|
||||
domain name (the part after the first dot). You can check the FQDN
|
||||
using ``hostname --fqdn`` or the domain name using ``dnsdomainname``.
|
||||
|
||||
|
@ -4,7 +4,7 @@
|
||||
Deploying a new Ceph cluster
|
||||
============================
|
||||
|
||||
Cephadm creates a new Ceph cluster by "bootstrapping" on a single
|
||||
Cephadm creates a new Ceph cluster by bootstrapping a single
|
||||
host, expanding the cluster to encompass any additional hosts, and
|
||||
then deploying the needed services.
|
||||
|
||||
@ -18,7 +18,7 @@ Requirements
|
||||
- Python 3
|
||||
- Systemd
|
||||
- Podman or Docker for running containers
|
||||
- Time synchronization (such as chrony or NTP)
|
||||
- Time synchronization (such as Chrony or the legacy ``ntpd``)
|
||||
- LVM2 for provisioning storage devices
|
||||
|
||||
Any modern Linux distribution should be sufficient. Dependencies
|
||||
@ -45,6 +45,13 @@ There are two ways to install ``cephadm``:
|
||||
Choose either the distribution-specific method or the curl-based method. Do
|
||||
not attempt to use both these methods on one system.
|
||||
|
||||
.. note:: Recent versions of cephadm are distributed as an executable compiled
|
||||
from source code. Unlike with earlier versions of Ceph, it is no longer
|
||||
sufficient to copy a single script from Ceph's git tree and run it. If you
|
||||
wish to run cephadm using a development version you should create your own
|
||||
build of cephadm. See :ref:`compiling-cephadm` for details on how to create
|
||||
your own standalone cephadm executable.
|
||||
|
||||
.. _cephadm_install_distros:
|
||||
|
||||
distribution-specific installations
|
||||
@ -85,9 +92,9 @@ that case, you can install cephadm directly. For example:
|
||||
curl-based installation
|
||||
-----------------------
|
||||
|
||||
* First, determine what version of Ceph you will need. You can use the releases
|
||||
* First, determine what version of Ceph you wish to install. You can use the releases
|
||||
page to find the `latest active releases <https://docs.ceph.com/en/latest/releases/#active-releases>`_.
|
||||
For example, we might look at that page and find that ``18.2.0`` is the latest
|
||||
For example, we might find that ``18.2.1`` is the latest
|
||||
active release.
|
||||
|
||||
* Use ``curl`` to fetch a build of cephadm for that release.
|
||||
@ -113,7 +120,7 @@ curl-based installation
|
||||
* If you encounter any issues with running cephadm due to errors including
|
||||
the message ``bad interpreter``, then you may not have Python or
|
||||
the correct version of Python installed. The cephadm tool requires Python 3.6
|
||||
and above. You can manually run cephadm with a particular version of Python by
|
||||
or later. You can manually run cephadm with a particular version of Python by
|
||||
prefixing the command with your installed Python version. For example:
|
||||
|
||||
.. prompt:: bash #
|
||||
@ -121,6 +128,11 @@ curl-based installation
|
||||
|
||||
python3.8 ./cephadm <arguments...>
|
||||
|
||||
* Although the standalone cephadm is sufficient to bootstrap a cluster, it is
|
||||
best to have the ``cephadm`` command installed on the host. To install
|
||||
the packages that provide the ``cephadm`` command, run the following
|
||||
commands:
|
||||
|
||||
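  A sketch of what this typically looks like (the release name here is an
  assumption; substitute the release you chose above):

  .. prompt:: bash #

     ./cephadm add-repo --release reef
     ./cephadm install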
.. _cephadm_update:
|
||||
|
||||
update cephadm
|
||||
@ -166,7 +178,7 @@ What to know before you bootstrap
|
||||
The first step in creating a new Ceph cluster is running the ``cephadm
|
||||
bootstrap`` command on the Ceph cluster's first host. The act of running the
|
||||
``cephadm bootstrap`` command on the Ceph cluster's first host creates the Ceph
|
||||
cluster's first "monitor daemon", and that monitor daemon needs an IP address.
|
||||
cluster's first Monitor daemon.
|
||||
You must pass the IP address of the Ceph cluster's first host to the ``cephadm
|
||||
bootstrap`` command, so you'll need to know the IP address of that host.
|
||||
|
||||
@ -187,13 +199,13 @@ Run the ``ceph bootstrap`` command:
|
||||
|
||||
This command will:
|
||||
|
||||
* Create a monitor and manager daemon for the new cluster on the local
|
||||
* Create a Monitor and a Manager daemon for the new cluster on the local
|
||||
host.
|
||||
* Generate a new SSH key for the Ceph cluster and add it to the root
|
||||
user's ``/root/.ssh/authorized_keys`` file.
|
||||
* Write a copy of the public key to ``/etc/ceph/ceph.pub``.
|
||||
* Write a minimal configuration file to ``/etc/ceph/ceph.conf``. This
|
||||
file is needed to communicate with the new cluster.
|
||||
file is needed to communicate with Ceph daemons.
|
||||
* Write a copy of the ``client.admin`` administrative (privileged!)
|
||||
secret key to ``/etc/ceph/ceph.client.admin.keyring``.
|
||||
* Add the ``_admin`` label to the bootstrap host. By default, any host
|
||||
@ -205,7 +217,7 @@ This command will:
|
||||
Further information about cephadm bootstrap
|
||||
-------------------------------------------
|
||||
|
||||
The default bootstrap behavior will work for most users. But if you'd like
|
||||
The default bootstrap process will work for most users. But if you'd like
|
||||
immediately to know more about ``cephadm bootstrap``, read the list below.
|
||||
|
||||
Also, you can run ``cephadm bootstrap -h`` to see all of ``cephadm``'s
|
||||
@ -216,15 +228,15 @@ available options.
|
||||
journald. If you want Ceph to write traditional log files to ``/var/log/ceph/$fsid``,
|
||||
use the ``--log-to-file`` option during bootstrap.
|
||||
|
||||
* Larger Ceph clusters perform better when (external to the Ceph cluster)
|
||||
* Larger Ceph clusters perform best when (external to the Ceph cluster)
|
||||
public network traffic is separated from (internal to the Ceph cluster)
|
||||
cluster traffic. The internal cluster traffic handles replication, recovery,
|
||||
and heartbeats between OSD daemons. You can define the :ref:`cluster
|
||||
network<cluster-network>` by supplying the ``--cluster-network`` option to the ``bootstrap``
|
||||
subcommand. This parameter must define a subnet in CIDR notation (for example
|
||||
subcommand. This parameter must be a subnet in CIDR notation (for example
|
||||
``10.90.90.0/24`` or ``fe80::/64``).
|
||||
|
||||
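  For example, a hypothetical bootstrap invocation that also sets a cluster
  network (all addresses below are illustrative):

  .. prompt:: bash #

     cephadm bootstrap --mon-ip 10.1.1.10 --cluster-network 10.90.90.0/24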
* ``cephadm bootstrap`` writes to ``/etc/ceph`` the files needed to access
|
||||
* ``cephadm bootstrap`` writes to ``/etc/ceph`` files needed to access
|
||||
the new cluster. This central location makes it possible for Ceph
|
||||
packages installed on the host (e.g., packages that give access to the
|
||||
cephadm command line interface) to find these files.
|
||||
@ -245,12 +257,12 @@ available options.
|
||||
EOF
|
||||
$ ./cephadm bootstrap --config initial-ceph.conf ...
|
||||
|
||||
* The ``--ssh-user *<user>*`` option makes it possible to choose which SSH
|
||||
* The ``--ssh-user *<user>*`` option makes it possible to designate which SSH
|
||||
user cephadm will use to connect to hosts. The associated SSH key will be
|
||||
added to ``/home/*<user>*/.ssh/authorized_keys``. The user that you
|
||||
designate with this option must have passwordless sudo access.
|
||||
|
||||
* If you are using a container on an authenticated registry that requires
|
||||
* If you are using a container image from a registry that requires
|
||||
login, you may add the argument:
|
||||
|
||||
* ``--registry-json <path to json file>``
|
||||
@ -261,7 +273,7 @@ available options.
|
||||
|
||||
Cephadm will attempt to log in to this registry so it can pull your container
|
||||
and then store the login info in its config database. Other hosts added to
|
||||
the cluster will then also be able to make use of the authenticated registry.
|
||||
the cluster will then also be able to make use of the authenticated container registry.
|
||||
|
||||
* See :ref:`cephadm-deployment-scenarios` for additional examples for using ``cephadm bootstrap``.
|
||||
|
||||
@ -326,7 +338,7 @@ Add all hosts to the cluster by following the instructions in
|
||||
|
||||
By default, a ``ceph.conf`` file and a copy of the ``client.admin`` keyring are
|
||||
maintained in ``/etc/ceph`` on all hosts that have the ``_admin`` label. This
|
||||
label is initially applied only to the bootstrap host. We usually recommend
|
||||
label is initially applied only to the bootstrap host. We recommend
|
||||
that one or more other hosts be given the ``_admin`` label so that the Ceph CLI
|
||||
(for example, via ``cephadm shell``) is easily accessible on multiple hosts. To add
|
||||
the ``_admin`` label to additional host(s), run a command of the following form:
|
||||
@ -339,9 +351,10 @@ the ``_admin`` label to additional host(s), run a command of the following form:
|
||||
Adding additional MONs
|
||||
======================
|
||||
|
||||
A typical Ceph cluster has three or five monitor daemons spread
|
||||
A typical Ceph cluster has three or five Monitor daemons spread
|
||||
across different hosts. We recommend deploying five
|
||||
monitors if there are five or more nodes in your cluster.
|
||||
Monitors if there are five or more nodes in your cluster. Most clusters do not
|
||||
benefit from seven or more Monitors.
|
||||
|
||||
Please follow :ref:`deploy_additional_monitors` to deploy additional MONs.
|
||||
|
||||
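For example, a sketch of placing Monitors on three named hosts (the hostnames
are placeholders):

.. prompt:: bash #

   ceph orch apply mon --placement="host1,host2,host3"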
@ -366,12 +379,12 @@ See :ref:`osd_autotune`.
|
||||
|
||||
To deploy hyperconverged Ceph with TripleO, please refer to the TripleO documentation: `Scenario: Deploy Hyperconverged Ceph <https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/cephadm.html#scenario-deploy-hyperconverged-ceph>`_
|
||||
|
||||
In other cases where the cluster hardware is not exclusively used by Ceph (hyperconverged),
|
||||
In other cases where the cluster hardware is not exclusively used by Ceph (converged infrastructure),
|
||||
reduce the memory consumption of Ceph like so:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
# hyperconverged only:
|
||||
# converged only:
|
||||
ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.2
|
||||
|
||||
Then enable memory autotuning:
|
||||
@ -400,9 +413,11 @@ Different deployment scenarios
|
||||
Single host
|
||||
-----------
|
||||
|
||||
To configure a Ceph cluster to run on a single host, use the
|
||||
``--single-host-defaults`` flag when bootstrapping. For use cases of this, see
|
||||
:ref:`one-node-cluster`.
|
||||
To deploy a Ceph cluster running on a single host, use the
|
||||
``--single-host-defaults`` flag when bootstrapping. For use cases, see
|
||||
:ref:`one-node-cluster`. Such clusters are generally not suitable for
|
||||
production.
|
||||
|
||||
|
||||
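For example (a sketch; the monitor IP is a placeholder):

.. prompt:: bash #

   cephadm bootstrap --mon-ip 192.168.0.10 --single-host-defaults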
The ``--single-host-defaults`` flag sets the following configuration options::
|
||||
|
||||
@ -419,8 +434,8 @@ Deployment in an isolated environment
|
||||
-------------------------------------
|
||||
|
||||
You might need to install cephadm in an environment that is not connected
|
||||
directly to the internet (such an environment is also called an "isolated
|
||||
environment"). This can be done if a custom container registry is used. Either
|
||||
directly to the Internet (an "isolated" or "airgapped"
|
||||
environment). This requires the use of a custom container registry. Either
|
||||
of two kinds of custom container registry can be used in this scenario: (1) a
|
||||
Podman-based or Docker-based insecure registry, or (2) a secure registry.
|
||||
|
||||
@ -569,9 +584,9 @@ in order to have cephadm use them for SSHing between cluster hosts
|
||||
Note that this setup does not require installing the corresponding public key
|
||||
from the private key passed to bootstrap on other nodes. In fact, cephadm will
|
||||
reject the ``--ssh-public-key`` argument when passed along with ``--ssh-signed-cert``.
|
||||
Not because having the public key breaks anything, but because it is not at all needed
|
||||
for this setup and it helps bootstrap differentiate if the user wants the CA signed
|
||||
keys setup or standard pubkey encryption. What this means is, SSH key rotation
|
||||
This is not because having the public key breaks anything, but rather because it is not at all needed
|
||||
and helps the bootstrap command differentiate whether the user wants the CA-signed
|
||||
keys setup or standard pubkey encryption. What this means is that SSH key rotation
|
||||
would simply be a matter of getting another key signed by the same CA and providing
|
||||
cephadm with the new private key and signed cert. No additional distribution of
|
||||
keys to cluster nodes is needed after the initial setup of the CA key as a trusted key,
|
||||
|
@ -328,15 +328,15 @@ You can disable this health warning by running the following command:
|
||||
|
||||
Cluster Configuration Checks
|
||||
----------------------------
|
||||
Cephadm periodically scans each of the hosts in the cluster in order
|
||||
to understand the state of the OS, disks, NICs etc. These facts can
|
||||
then be analysed for consistency across the hosts in the cluster to
|
||||
Cephadm periodically scans each host in the cluster in order
|
||||
to understand the state of the OS, disks, network interfaces, etc. This information can
|
||||
then be analyzed for consistency across the hosts in the cluster to
|
||||
identify any configuration anomalies.
|
||||
|
||||
Enabling Cluster Configuration Checks
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The configuration checks are an **optional** feature, and are enabled
|
||||
These configuration checks are an **optional** feature, and are enabled
|
||||
by running the following command:
|
||||
|
||||
.. prompt:: bash #
|
||||
@ -346,7 +346,7 @@ by running the following command:
|
||||
States Returned by Cluster Configuration Checks
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
The configuration checks are triggered after each host scan (1m). The
|
||||
Configuration checks are triggered after each host scan. The
|
||||
cephadm log entries will show the current state and outcome of the
|
||||
configuration checks as follows:
|
||||
|
||||
@ -383,14 +383,14 @@ To list all the configuration checks and their current states, run the following
|
||||
# ceph cephadm config-check ls
|
||||
|
||||
NAME HEALTHCHECK STATUS DESCRIPTION
|
||||
kernel_security CEPHADM_CHECK_KERNEL_LSM enabled checks SELINUX/Apparmor profiles are consistent across cluster hosts
|
||||
os_subscription CEPHADM_CHECK_SUBSCRIPTION enabled checks subscription states are consistent for all cluster hosts
|
||||
public_network CEPHADM_CHECK_PUBLIC_MEMBERSHIP enabled check that all hosts have a NIC on the Ceph public_network
|
||||
kernel_security CEPHADM_CHECK_KERNEL_LSM enabled check that SELINUX/Apparmor profiles are consistent across cluster hosts
|
||||
os_subscription CEPHADM_CHECK_SUBSCRIPTION enabled check that subscription states are consistent for all cluster hosts
|
||||
public_network CEPHADM_CHECK_PUBLIC_MEMBERSHIP enabled check that all hosts have a network interface on the Ceph public_network
|
||||
osd_mtu_size CEPHADM_CHECK_MTU enabled check that OSD hosts share a common MTU setting
|
||||
osd_linkspeed CEPHADM_CHECK_LINKSPEED enabled check that OSD hosts share a common linkspeed
|
||||
network_missing CEPHADM_CHECK_NETWORK_MISSING enabled checks that the cluster/public networks defined exist on the Ceph hosts
|
||||
ceph_release CEPHADM_CHECK_CEPH_RELEASE enabled check for Ceph version consistency - ceph daemons should be on the same release (unless upgrade is active)
|
||||
kernel_version CEPHADM_CHECK_KERNEL_VERSION enabled checks that the MAJ.MIN of the kernel on Ceph hosts is consistent
|
||||
osd_linkspeed CEPHADM_CHECK_LINKSPEED enabled check that OSD hosts share a common network link speed
|
||||
network_missing CEPHADM_CHECK_NETWORK_MISSING enabled check that the cluster/public networks as defined exist on the Ceph hosts
|
||||
ceph_release CEPHADM_CHECK_CEPH_RELEASE enabled check for Ceph version consistency: all Ceph daemons should be the same release unless upgrade is in progress
|
||||
kernel_version CEPHADM_CHECK_KERNEL_VERSION enabled checks that the maj.min version of the kernel is consistent across Ceph hosts
|
||||
|
||||
The name of each configuration check can be used to enable or disable a specific check by running a command of the following form:
|
||||
:
|
||||
@ -414,31 +414,31 @@ flagged as an anomaly and a healthcheck (WARNING) state raised.
|
||||
|
||||
CEPHADM_CHECK_SUBSCRIPTION
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
This check relates to the status of vendor subscription. This check is
|
||||
performed only for hosts using RHEL, but helps to confirm that all hosts are
|
||||
This check relates to the status of OS vendor subscription. This check is
|
||||
performed only for hosts using RHEL and helps to confirm that all hosts are
|
||||
covered by an active subscription, which ensures that patches and updates are
|
||||
available.
|
||||
|
||||
CEPHADM_CHECK_PUBLIC_MEMBERSHIP
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
All members of the cluster should have NICs configured on at least one of the
|
||||
All members of the cluster should have a network interface configured on at least one of the
|
||||
public network subnets. Hosts that are not on the public network will rely on
|
||||
routing, which may affect performance.
|
||||
|
||||
CEPHADM_CHECK_MTU
|
||||
~~~~~~~~~~~~~~~~~
|
||||
The MTU of the NICs on OSDs can be a key factor in consistent performance. This
|
||||
The MTU of the network interfaces on OSD hosts can be a key factor in consistent performance. This
|
||||
check examines hosts that are running OSD services to ensure that the MTU is
|
||||
configured consistently within the cluster. This is determined by establishing
|
||||
configured consistently within the cluster. This is done by determining
|
||||
the MTU setting that the majority of hosts is using. Any anomalies result in a
|
||||
Ceph health check.
|
||||
health check.
|
||||
|
||||
CEPHADM_CHECK_LINKSPEED
|
||||
~~~~~~~~~~~~~~~~~~~~~~~
|
||||
This check is similar to the MTU check. Linkspeed consistency is a factor in
|
||||
consistent cluster performance, just as the MTU of the NICs on the OSDs is.
|
||||
This check determines the linkspeed shared by the majority of OSD hosts, and a
|
||||
health check is run for any hosts that are set at a lower linkspeed rate.
|
||||
This check is similar to the MTU check. Link speed consistency is a factor in
|
||||
consistent cluster performance, as is the MTU of the OSD node network interfaces.
|
||||
This check determines the link speed shared by the majority of OSD hosts, and a
|
||||
health check is run for any hosts that are set at a lower link speed rate.
|
||||
|
||||
CEPHADM_CHECK_NETWORK_MISSING
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
@ -448,15 +448,14 @@ a health check is raised.
|
||||
|
||||
CEPHADM_CHECK_CEPH_RELEASE
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
Under normal operations, the Ceph cluster runs daemons under the same ceph
|
||||
release (that is, the Ceph cluster runs all daemons under (for example)
|
||||
Octopus). This check determines the active release for each daemon, and
|
||||
Under normal operations, the Ceph cluster runs daemons that are of the same Ceph
|
||||
release (for example, Reef). This check determines the active release for each daemon, and
|
||||
reports any anomalies as a healthcheck. *This check is bypassed if an upgrade
|
||||
process is active within the cluster.*
|
||||
is in process.*
|
||||
|
||||
CEPHADM_CHECK_KERNEL_VERSION
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
The OS kernel version (maj.min) is checked for consistency across the hosts.
|
||||
The OS kernel version (maj.min) is checked for consistency across hosts.
|
||||
The kernel version of the majority of the hosts is used as the basis for
|
||||
identifying anomalies.
|
||||
|
||||
|
@ -357,7 +357,9 @@ Or in YAML:
|
||||
Placement by pattern matching
|
||||
-----------------------------
|
||||
|
||||
Daemons can be placed on hosts as well:
|
||||
Daemons can be placed on hosts using a host pattern as well.
|
||||
By default, the host pattern is matched using fnmatch which supports
|
||||
UNIX shell-style wildcards (see https://docs.python.org/3/library/fnmatch.html):
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
@ -385,6 +387,26 @@ Or in YAML:
|
||||
placement:
|
||||
host_pattern: "*"
|
||||
|
||||
The host pattern also has support for using a regex. To use a regex, you
|
||||
must either add "regex: " to the start of the pattern when using the
|
||||
command line, or specify a ``pattern_type`` field to be "regex"
|
||||
when using YAML.
|
||||
|
||||
On the command line:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph orch apply prometheus --placement='regex:FOO[0-9]|BAR[0-9]'
|
||||
|
||||
In YAML:
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
service_type: prometheus
|
||||
placement:
|
||||
host_pattern:
|
||||
pattern: 'FOO[0-9]|BAR[0-9]'
|
||||
pattern_type: regex
|
||||
|
||||
Changing the number of daemons
|
||||
------------------------------
|
||||
|
@ -83,6 +83,37 @@ steps below:
|
||||
|
||||
ceph orch apply grafana
|
||||
|
||||
Enabling security for the monitoring stack
|
||||
----------------------------------------------
|
||||
|
||||
By default, in a cephadm-managed cluster, the monitoring components are set up and configured without enabling security measures.
|
||||
While this suffices for certain deployments, others with strict security needs may find it necessary to protect the
|
||||
monitoring stack against unauthorized access. In such cases, cephadm relies on a specific configuration parameter,
|
||||
`mgr/cephadm/secure_monitoring_stack`, which toggles the security settings for all monitoring components. To activate security
|
||||
measures, set this option to ``true`` with a command of the following form:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph config set mgr mgr/cephadm/secure_monitoring_stack true
|
||||
|
||||
This change will trigger a sequence of reconfigurations across all monitoring daemons, typically requiring
|
||||
a few minutes until all components are fully operational. The updated secure configuration includes the following modifications:
|
||||
|
||||
#. Prometheus: basic authentication is required to access the web portal and TLS is enabled for secure communication.
|
||||
#. Alertmanager: basic authentication is required to access the web portal and TLS is enabled for secure communication.
|
||||
#. Node Exporter: TLS is enabled for secure communication.
|
||||
#. Grafana: TLS is enabled and authentication is required to access the datasource information.
|
||||
|
||||
In this secure setup, users will need to set up authentication
|
||||
(username/password) for both Prometheus and Alertmanager. By default the
|
||||
username and password are set to ``admin``/``admin``. The user can change these
|
||||
values with the commands ``ceph orch prometheus set-credentials`` and ``ceph
|
||||
orch alertmanager set-credentials`` respectively. These commands offer the
|
||||
flexibility to input the username/password either as parameters or via a JSON
|
||||
file, which enhances security. Additionally, Cephadm provides the commands
|
||||
``ceph orch prometheus get-credentials`` and ``ceph orch alertmanager get-credentials`` to
|
||||
retrieve the current credentials.
|
||||
|
||||
.. _cephadm-monitoring-centralized-logs:
|
||||
|
||||
Centralized Logging in Ceph
|
||||
|
@ -15,7 +15,7 @@ Deploying NFS ganesha
|
||||
=====================
|
||||
|
||||
Cephadm deploys NFS Ganesha daemon (or set of daemons). The configuration for
|
||||
NFS is stored in the ``nfs-ganesha`` pool and exports are managed via the
|
||||
NFS is stored in the ``.nfs`` pool and exports are managed via the
|
||||
``ceph nfs export ...`` commands and via the dashboard.
|
||||
|
||||
To deploy a NFS Ganesha gateway, run the following command:
|
||||
|
@ -232,7 +232,7 @@ Remove an OSD
|
||||
|
||||
Removing an OSD from a cluster involves two steps:
|
||||
|
||||
#. evacuating all placement groups (PGs) from the cluster
|
||||
#. evacuating all placement groups (PGs) from the OSD
|
||||
#. removing the PG-free OSD from the cluster
|
||||
|
||||
The following command performs these two steps:
|
||||
|
@ -246,6 +246,7 @@ It is a yaml format file with the following properties:
|
||||
virtual_interface_networks: [ ... ] # optional: list of CIDR networks
|
||||
use_keepalived_multicast: <bool> # optional: Default is False.
|
||||
vrrp_interface_network: <string>/<string> # optional: ex: 192.168.20.0/24
|
||||
health_check_interval: <string> # optional: Default is 2s.
|
||||
ssl_cert: | # optional: SSL certificate and key
|
||||
-----BEGIN CERTIFICATE-----
|
||||
...
|
||||
@ -273,6 +274,7 @@ It is a yaml format file with the following properties:
|
||||
monitor_port: <integer> # ex: 1967, used by haproxy for load balancer status
|
||||
virtual_interface_networks: [ ... ] # optional: list of CIDR networks
|
||||
first_virtual_router_id: <integer> # optional: default 50
|
||||
health_check_interval: <string> # optional: Default is 2s.
|
||||
ssl_cert: | # optional: SSL certificate and key
|
||||
-----BEGIN CERTIFICATE-----
|
||||
...
|
||||
@ -321,6 +323,9 @@ where the properties of this service specification are:
|
||||
keepalived will have different virtual_router_id. In the case of using ``virtual_ips_list``,
|
||||
each IP will create its own virtual router. So the first one will have ``first_virtual_router_id``,
|
||||
second one will have ``first_virtual_router_id`` + 1, etc. Valid values go from 1 to 255.
|
||||
* ``health_check_interval``
|
||||
Default is 2 seconds. This parameter can be used to set the interval between health checks
|
||||
for the haproxy with the backend servers.
|
||||
|
||||
.. _ingress-virtual-ip:
|
||||
|
||||
|
@ -32,7 +32,7 @@ completely by running the following commands:
|
||||
ceph orch set backend ''
|
||||
ceph mgr module disable cephadm
|
||||
|
||||
These commands disable all of the ``ceph orch ...`` CLI commands. All
|
||||
These commands disable all ``ceph orch ...`` CLI commands. All
|
||||
previously deployed daemon containers continue to run and will start just as
|
||||
they were before you ran these commands.
|
||||
|
||||
@ -56,7 +56,7 @@ following form:
|
||||
|
||||
ceph orch ls --service_name=<service-name> --format yaml
|
||||
|
||||
This will return something in the following form:
|
||||
This will return information in the following form:
|
||||
|
||||
.. code-block:: yaml
|
||||
|
||||
@ -252,16 +252,17 @@ For more detail on operations of this kind, see
|
||||
Accessing the Admin Socket
|
||||
--------------------------
|
||||
|
||||
Each Ceph daemon provides an admin socket that bypasses the MONs (See
|
||||
:ref:`rados-monitoring-using-admin-socket`).
|
||||
Each Ceph daemon provides an admin socket that allows runtime option setting and statistic reading. See
|
||||
:ref:`rados-monitoring-using-admin-socket`.
|
||||
|
||||
#. To access the admin socket, enter the daemon container on the host::
|
||||
|
||||
[root@mon1 ~]# cephadm enter --name <daemon-name>
|
||||
|
||||
#. Run a command of the following form to see the admin socket's configuration::
|
||||
#. Run a command of the following forms to see the admin socket's configuration and other available actions::
|
||||
|
||||
[ceph: root@mon1 /]# ceph --admin-daemon /var/run/ceph/ceph-<daemon-name>.asok config show
|
||||
[ceph: root@mon1 /]# ceph --admin-daemon /var/run/ceph/ceph-<daemon-name>.asok help
|
||||
|
||||
Running Various Ceph Tools
|
||||
--------------------------------
|
||||
@ -444,11 +445,11 @@ Running repeated debugging sessions
|
||||
When using ``cephadm shell``, as in the example above, any changes made to the
|
||||
container that is spawned by the shell command are ephemeral. After the shell
|
||||
session exits, the files that were downloaded and installed cease to be
|
||||
available. You can simply re-run the same commands every time ``cephadm
|
||||
shell`` is invoked, but in order to save time and resources one can create a
|
||||
new container image and use it for repeated debugging sessions.
|
||||
available. You can simply re-run the same commands every time ``cephadm shell``
|
||||
is invoked, but to save time and resources you can create a new container image
|
||||
and use it for repeated debugging sessions.
|
||||
|
||||
In the following example, we create a simple file that will construct the
|
||||
In the following example, we create a simple file that constructs the
|
||||
container image. The command below uses podman but it is expected to work
|
||||
correctly even if ``podman`` is replaced with ``docker``::
|
||||
|
||||
@ -463,14 +464,14 @@ correctly even if ``podman`` is replaced with ``docker``::
|
||||
|
||||
The above file creates a new local image named ``ceph:debugging``. This image
|
||||
can be used on the same machine that built it. The image can also be pushed to
|
||||
a container repository or saved and copied to a node runing other Ceph
|
||||
containers. Consult the ``podman`` or ``docker`` documentation for more
|
||||
a container repository or saved and copied to a node that is running other Ceph
|
||||
containers. See the ``podman`` or ``docker`` documentation for more
|
||||
information about the container workflow.
|
||||
|
||||
After the image has been built, it can be used to initiate repeat debugging
|
||||
sessions. By using an image in this way, you avoid the trouble of having to
|
||||
re-install the debug tools and debuginfo packages every time you need to run a
|
||||
debug session. To debug a core file using this image, in the same way as
|
||||
re-install the debug tools and the debuginfo packages every time you need to
|
||||
run a debug session. To debug a core file using this image, in the same way as
|
||||
previously described, run:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
@ -2,7 +2,7 @@
|
||||
Upgrading Ceph
|
||||
==============
|
||||
|
||||
Cephadm can safely upgrade Ceph from one bugfix release to the next. For
|
||||
Cephadm can safely upgrade Ceph from one point release to the next. For
|
||||
example, you can upgrade from v15.2.0 (the first Octopus release) to the next
|
||||
point release, v15.2.1.
|
||||
|
||||
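For example, a sketch of starting an upgrade to a specific release (the
version shown is illustrative):

.. prompt:: bash #

   ceph orch upgrade start --ceph-version 18.2.4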
@ -137,25 +137,25 @@ UPGRADE_NO_STANDBY_MGR
|
||||
----------------------
|
||||
|
||||
This alert (``UPGRADE_NO_STANDBY_MGR``) means that Ceph does not detect an
|
||||
active standby manager daemon. In order to proceed with the upgrade, Ceph
|
||||
requires an active standby manager daemon (which you can think of in this
|
||||
active standby Manager daemon. In order to proceed with the upgrade, Ceph
|
||||
requires an active standby Manager daemon (which you can think of in this
|
||||
context as "a second manager").
|
||||
|
||||
You can ensure that Cephadm is configured to run 2 (or more) managers by
|
||||
You can ensure that Cephadm is configured to run two (or more) Managers by
|
||||
running the following command:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph orch apply mgr 2 # or more
|
||||
|
||||
You can check the status of existing mgr daemons by running the following
|
||||
You can check the status of existing Manager daemons by running the following
|
||||
command:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph orch ps --daemon-type mgr
|
||||
|
||||
If an existing mgr daemon has stopped, you can try to restart it by running the
|
||||
If an existing Manager daemon has stopped, you can try to restart it by running the
|
||||
following command:
|
||||
|
||||
.. prompt:: bash #
|
||||
@ -183,7 +183,7 @@ Using customized container images
|
||||
=================================
|
||||
|
||||
For most users, upgrading requires nothing more complicated than specifying the
|
||||
Ceph version number to upgrade to. In such cases, cephadm locates the specific
|
||||
Ceph version to which to upgrade. In such cases, cephadm locates the specific
|
||||
Ceph container image to use by combining the ``container_image_base``
|
||||
configuration option (default: ``docker.io/ceph/ceph``) with a tag of
|
||||
``vX.Y.Z``.
|
||||
|
@ -1,11 +1,13 @@
|
||||
.. _cephfs_add_remote_mds:
|
||||
|
||||
.. note::
|
||||
It is highly recommended to use :doc:`/cephadm/index` or another Ceph
|
||||
orchestrator for setting up the ceph cluster. Use this approach only if you
|
||||
are setting up the ceph cluster manually. If one still intends to use the
|
||||
manual way for deploying MDS daemons, :doc:`/cephadm/services/mds/` can
|
||||
also be used.
|
||||
.. warning:: The material on this page is to be used only for manually setting
|
||||
up a Ceph cluster. If you intend to use an automated tool such as
|
||||
:doc:`/cephadm/index` to set up a Ceph cluster, do not use the
|
||||
instructions on this page.
|
||||
|
||||
.. note:: If you are certain that you know what you are doing and you intend to
|
||||
manually deploy MDS daemons, see :doc:`/cephadm/services/mds/` before
|
||||
proceeding.
|
||||
|
||||
============================
|
||||
Deploying Metadata Servers
|
||||
|
@ -258,31 +258,47 @@ Clients that are missing newly added features will be evicted automatically.
|
||||
|
||||
Here are the current CephFS features and first release they came out:
|
||||
|
||||
+------------------+--------------+-----------------+
|
||||
| Feature | Ceph release | Upstream Kernel |
|
||||
+==================+==============+=================+
|
||||
| jewel | jewel | 4.5 |
|
||||
+------------------+--------------+-----------------+
|
||||
| kraken | kraken | 4.13 |
|
||||
+------------------+--------------+-----------------+
|
||||
| luminous | luminous | 4.13 |
|
||||
+------------------+--------------+-----------------+
|
||||
| mimic | mimic | 4.19 |
|
||||
+------------------+--------------+-----------------+
|
||||
| reply_encoding | nautilus | 5.1 |
|
||||
+------------------+--------------+-----------------+
|
||||
| reclaim_client | nautilus | N/A |
|
||||
+------------------+--------------+-----------------+
|
||||
| lazy_caps_wanted | nautilus | 5.1 |
|
||||
+------------------+--------------+-----------------+
|
||||
| multi_reconnect | nautilus | 5.1 |
|
||||
+------------------+--------------+-----------------+
|
||||
| deleg_ino | octopus | 5.6 |
|
||||
+------------------+--------------+-----------------+
|
||||
| metric_collect | pacific | N/A |
|
||||
+------------------+--------------+-----------------+
|
||||
| alternate_name | pacific | PLANNED |
|
||||
+------------------+--------------+-----------------+
|
||||
+----------------------------+--------------+-----------------+
|
||||
| Feature | Ceph release | Upstream Kernel |
|
||||
+============================+==============+=================+
|
||||
| jewel | jewel | 4.5 |
|
||||
+----------------------------+--------------+-----------------+
|
||||
| kraken | kraken | 4.13 |
|
||||
+----------------------------+--------------+-----------------+
|
||||
| luminous | luminous | 4.13 |
|
||||
+----------------------------+--------------+-----------------+
|
||||
| mimic | mimic | 4.19 |
|
||||
+----------------------------+--------------+-----------------+
|
||||
| reply_encoding | nautilus | 5.1 |
|
||||
+----------------------------+--------------+-----------------+
|
||||
| reclaim_client | nautilus | N/A |
|
||||
+----------------------------+--------------+-----------------+
|
||||
| lazy_caps_wanted | nautilus | 5.1 |
|
||||
+----------------------------+--------------+-----------------+
|
||||
| multi_reconnect | nautilus | 5.1 |
|
||||
+----------------------------+--------------+-----------------+
|
||||
| deleg_ino | octopus | 5.6 |
|
||||
+----------------------------+--------------+-----------------+
|
||||
| metric_collect | pacific | N/A |
|
||||
+----------------------------+--------------+-----------------+
|
||||
| alternate_name | pacific | 6.5 |
|
||||
+----------------------------+--------------+-----------------+
|
||||
| notify_session_state | quincy | 5.19 |
|
||||
+----------------------------+--------------+-----------------+
|
||||
| op_getvxattr | quincy | 6.0 |
|
||||
+----------------------------+--------------+-----------------+
|
||||
| 32bits_retry_fwd | reef | 6.6 |
|
||||
+----------------------------+--------------+-----------------+
|
||||
| new_snaprealm_info | reef | UNKNOWN |
|
||||
+----------------------------+--------------+-----------------+
|
||||
| has_owner_uidgid | reef | 6.6 |
|
||||
+----------------------------+--------------+-----------------+
|
||||
| client_mds_auth_caps | squid+bp | PLANNED |
|
||||
+----------------------------+--------------+-----------------+
|
||||
|
||||
..
|
||||
Comment: use `git describe --tags --abbrev=0 <commit>` to lookup release
|
||||
|
||||
|
||||
CephFS Feature Descriptions
|
||||
|
||||
@ -340,6 +356,15 @@ Clients can send performance metric to MDS if MDS support this feature.
|
||||
Clients can set and understand "alternate names" for directory entries. This is
|
||||
to be used for encrypted file name support.
|
||||
|
||||
::
|
||||
|
||||
client_mds_auth_caps
|
||||
|
||||
To effectively implement ``root_squash`` in a client's ``mds`` caps, the client
|
||||
must understand that it is enforcing ``root_squash`` and other cap metadata.
|
||||
Clients without this feature are in danger of dropping updates to files. It is
|
||||
recommended to set this feature bit.
|
||||
|
||||
|
||||
Global settings
|
||||
---------------
|
||||
|
@ -47,4 +47,4 @@ client cache.
|
||||
| MDSs | -=-------> | OSDs |
|
||||
+---------------------+ +--------------------+
|
||||
|
||||
.. _Architecture: ../architecture
|
||||
.. _Architecture: ../../architecture
|
||||
|
@ -93,6 +93,15 @@ providing high-availability.
|
||||
.. note:: Deploying a single mirror daemon is recommended. Running multiple
|
||||
daemons is untested.
|
||||
|
||||
The following file types are supported by the mirroring:
|
||||
|
||||
- Regular files (-)
|
||||
- Directory files (d)
|
||||
- Symbolic link file (l)
|
||||
|
||||
Other file types are ignored by mirroring, so they won't be
available on a successfully synchronized peer.
|
||||
|
||||
The mirroring module is disabled by default. To enable the mirroring module,
|
||||
run the following command:
|
||||
|
||||
|
@ -63,6 +63,62 @@ By default, `cephfs-top` uses `client.fstop` user to connect to a Ceph cluster::
|
||||
$ ceph auth get-or-create client.fstop mon 'allow r' mds 'allow r' osd 'allow r' mgr 'allow r'
|
||||
$ cephfs-top
|
||||
|
||||
Description of Fields
|
||||
---------------------
|
||||
|
||||
1. chit : Cap hit
|
||||
Percentage of file capability hits over total number of caps
|
||||
|
||||
2. dlease : Dentry lease
|
||||
Percentage of dentry leases handed out over the total dentry lease requests
|
||||
|
||||
3. ofiles : Opened files
|
||||
Number of opened files
|
||||
|
||||
4. oicaps : Pinned caps
|
||||
Number of pinned caps
|
||||
|
||||
5. oinodes : Opened inodes
|
||||
Number of opened inodes
|
||||
|
||||
6. rtio : Total size of read IOs
|
||||
Number of bytes read in input/output operations generated by all processes
|
||||
|
||||
7. wtio : Total size of write IOs
|
||||
Number of bytes written in input/output operations generated by all processes
|
||||
|
||||
8. raio : Average size of read IOs
|
||||
Mean of the number of bytes read in input/output operations generated by all
processes over the total IO done
|
||||
|
||||
9. waio : Average size of write IOs
|
||||
Mean of the number of bytes written in input/output operations generated by all
processes over the total IO done
|
||||
|
||||
10. rsp : Read speed
|
||||
Speed of read IOs with respect to the duration since the last refresh of clients
|
||||
|
||||
11. wsp : Write speed
|
||||
Speed of write IOs with respect to the duration since the last refresh of clients
|
||||
|
||||
12. rlatavg : Average read latency
|
||||
Mean value of the read latencies
|
||||
|
||||
13. rlatsd : Standard deviation (variance) for read latency
|
||||
Dispersion of the metric for the read latency relative to its mean
|
||||
|
||||
14. wlatavg : Average write latency
|
||||
Mean value of the write latencies
|
||||
|
||||
15. wlatsd : Standard deviation (variance) for write latency
|
||||
Dispersion of the metric for the write latency relative to its mean
|
||||
|
||||
16. mlatavg : Average metadata latency
|
||||
Mean value of the metadata latencies
|
||||
|
||||
17. mlatsd : Standard deviation (variance) for metadata latency
|
||||
Dispersion of the metric for the metadata latency relative to its mean
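
The latency fields above (``rlatavg``/``rlatsd``, ``wlatavg``/``wlatsd`` and
``mlatavg``/``mlatsd``) are simply a mean and the dispersion of the samples
around that mean. As a rough illustration only (this is not cephfs-top's
actual code, and the sample values are made up), the two statistics can be
computed like this:

.. code-block:: python

    import math

    def latency_summary(samples_ms):
        """Return (mean, standard deviation) for a list of latency samples."""
        if not samples_ms:
            return 0.0, 0.0
        mean = sum(samples_ms) / len(samples_ms)
        variance = sum((s - mean) ** 2 for s in samples_ms) / len(samples_ms)
        return mean, math.sqrt(variance)

    # Hypothetical read latencies (in milliseconds) reported by one client.
    avg, sd = latency_summary([1.2, 0.9, 1.4, 5.0, 1.1])
    print(f"rlatavg={avg:.2f}ms rlatsd={sd:.2f}ms")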
|
||||
|
||||
Command-Line Options
|
||||
--------------------
|
||||
|
||||
|
@ -259,3 +259,121 @@ Following is an example of enabling root_squash in a filesystem except within
|
||||
caps mds = "allow rw fsname=a root_squash, allow rw fsname=a path=/volumes"
|
||||
caps mon = "allow r fsname=a"
|
||||
caps osd = "allow rw tag cephfs data=a"
|
||||
|
||||
Updating Capabilities using ``fs authorize``
|
||||
============================================
|
||||
Starting with the Ceph Reef release, ``fs authorize`` can be used not only to
create a new client with caps for a CephFS, but also to add new caps (for
another CephFS or for another path in the same file system) to an already
existing client.
|
||||
|
||||
Let's say we run following and create a new client::
|
||||
|
||||
$ ceph fs authorize a client.x / rw
|
||||
[client.x]
|
||||
key = AQAOtSVk9WWtIhAAJ3gSpsjwfIQ0gQ6vfSx/0w==
|
||||
$ ceph auth get client.x
|
||||
[client.x]
|
||||
key = AQAOtSVk9WWtIhAAJ3gSpsjwfIQ0gQ6vfSx/0w==
|
||||
caps mds = "allow rw fsname=a"
|
||||
caps mon = "allow r fsname=a"
|
||||
caps osd = "allow rw tag cephfs data=a"
|
||||
|
||||
Previously, running ``fs authorize a client.x / rw`` a second time printed an
error message. Since Reef, it instead prints a message reporting that there is
no update::
|
||||
|
||||
$ ./bin/ceph fs authorize a client.x / rw
|
||||
no update for caps of client.x
|
||||
|
||||
Adding New Caps Using ``fs authorize``
|
||||
--------------------------------------
|
||||
Users can now add caps for another path in the same CephFS::
|
||||
|
||||
$ ceph fs authorize a client.x /dir1 rw
|
||||
updated caps for client.x
|
||||
$ ceph auth get client.x
|
||||
[client.x]
|
||||
key = AQAOtSVk9WWtIhAAJ3gSpsjwfIQ0gQ6vfSx/0w==
|
||||
caps mds = "allow r fsname=a, allow rw fsname=a path=some/dir"
|
||||
caps mon = "allow r fsname=a"
|
||||
caps osd = "allow rw tag cephfs data=a"
|
||||
|
||||
And even add caps for another CephFS on Ceph cluster::
|
||||
|
||||
$ ceph fs authorize b client.x / rw
|
||||
updated caps for client.x
|
||||
$ ceph auth get client.x
|
||||
[client.x]
|
||||
key = AQD6tiVk0uJdARAABMaQuLRotxTi3Qdj47FkBA==
|
||||
caps mds = "allow rw fsname=a, allow rw fsname=b"
|
||||
caps mon = "allow r fsname=a, allow r fsname=b"
|
||||
caps osd = "allow rw tag cephfs data=a, allow rw tag cephfs data=b"
|
||||
|
||||
Changing rw permissions in caps
|
||||
-------------------------------
|
||||
|
||||
It's not possible to modify caps by running ``fs authorize``, except in the
case where read/write permissions have to be changed. This is because ``fs
authorize`` would otherwise become ambiguous. For example, a user runs ``fs
authorize cephfs1 client.x /dir1 rw`` to create a client and then runs ``fs
authorize cephfs1 client.x /dir2 rw`` (notice that ``/dir1`` has been changed
to ``/dir2``). The second command could be interpreted either as changing
``/dir1`` to ``/dir2`` in the current cap or as authorizing the client with a
new cap for path ``/dir2``. As seen in the previous sections, the second
interpretation is chosen, so it is impossible to update any part of a granted
capability other than the read/write permissions. The following shows how the
read/write permissions for ``client.x`` (that was created above) can be
changed::
|
||||
|
||||
$ ceph fs authorize a client.x / r
|
||||
[client.x]
|
||||
key = AQBBKjBkIFhBDBAA6q5PmDDWaZtYjd+jafeVUQ==
|
||||
$ ceph auth get client.x
|
||||
[client.x]
|
||||
key = AQBBKjBkIFhBDBAA6q5PmDDWaZtYjd+jafeVUQ==
|
||||
caps mds = "allow r fsname=a"
|
||||
caps mon = "allow r fsname=a"
|
||||
caps osd = "allow r tag cephfs data=a"
|
||||
|
||||
``fs authorize`` never deducts any part of caps
|
||||
-----------------------------------------------
|
||||
It's not possible to remove caps issued to a client by running ``fs
authorize`` again. For example, if a client cap has ``root_squash`` applied
on a certain CephFS, running ``fs authorize`` again for the same CephFS but
without ``root_squash`` will not lead to any update; the client caps will
remain unchanged::
|
||||
|
||||
$ ceph fs authorize a client.x / rw root_squash
|
||||
[client.x]
|
||||
key = AQD61CVkcA1QCRAAd0XYqPbHvcc+lpUAuc6Vcw==
|
||||
$ ceph auth get client.x
|
||||
[client.x]
|
||||
key = AQD61CVkcA1QCRAAd0XYqPbHvcc+lpUAuc6Vcw==
|
||||
caps mds = "allow rw fsname=a root_squash"
|
||||
caps mon = "allow r fsname=a"
|
||||
caps osd = "allow rw tag cephfs data=a"
|
||||
$ ceph fs authorize a client.x / rw
|
||||
[client.x]
|
||||
key = AQD61CVkcA1QCRAAd0XYqPbHvcc+lpUAuc6Vcw==
|
||||
no update was performed for caps of client.x. caps of client.x remains unchanged.
|
||||
|
||||
And if a client already has caps for file system ``a`` and path ``dir1``,
running ``fs authorize`` again for file system ``a`` but path ``dir2`` does not
modify the caps the client already holds; instead, a new cap for ``dir2`` is
granted::
|
||||
|
||||
$ ceph fs authorize a client.x /dir1 rw
|
||||
$ ceph auth get client.x
|
||||
[client.x]
|
||||
key = AQC1tyVknMt+JxAAp0pVnbZGbSr/nJrmkMNKqA==
|
||||
caps mds = "allow rw fsname=a path=/dir1"
|
||||
caps mon = "allow r fsname=a"
|
||||
caps osd = "allow rw tag cephfs data=a"
|
||||
$ ceph fs authorize a client.x /dir2 rw
|
||||
updated caps for client.x
|
||||
$ ceph auth get client.x
|
||||
[client.x]
|
||||
key = AQC1tyVknMt+JxAAp0pVnbZGbSr/nJrmkMNKqA==
|
||||
caps mds = "allow rw fsname=a path=dir1, allow rw fsname=a path=dir2"
|
||||
caps mon = "allow r fsname=a"
|
||||
caps osd = "allow rw tag cephfs data=a"
|
||||
|
@ -15,7 +15,7 @@ Advanced: Metadata repair tools
|
||||
file system before attempting to repair it.
|
||||
|
||||
If you do not have access to professional support for your cluster,
|
||||
consult the ceph-users mailing list or the #ceph IRC channel.
|
||||
consult the ceph-users mailing list or the #ceph IRC/Slack channel.
|
||||
|
||||
|
||||
Journal export
|
||||
|
@ -501,10 +501,14 @@ To initiate a clone operation use::
|
||||
|
||||
$ ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name>
|
||||
|
||||
.. note:: ``subvolume snapshot clone`` command depends upon the above mentioned config option ``snapshot_clone_no_wait``
|
||||
|
||||
If a snapshot (source subvolume) is a part of non-default group, the group name needs to be specified::
|
||||
|
||||
$ ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name> --group_name <subvol_group_name>
|
||||
|
||||
If a snapshot (source subvolume) is a part of non-default group, the group name needs to be specified:
|
||||
|
||||
Cloned subvolumes can be a part of a different group than the source snapshot (by default, cloned subvolumes are created in default group). To clone to a particular group use::
|
||||
|
||||
$ ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name> --target_group_name <subvol_group_name>
|
||||
@ -513,13 +517,15 @@ Similar to specifying a pool layout when creating a subvolume, pool layout can b
|
||||
|
||||
$ ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name> --pool_layout <pool_layout>
|
||||
|
||||
Configure the maximum number of concurrent clones. The default is 4::
|
||||
|
||||
$ ceph config set mgr mgr/volumes/max_concurrent_clones <value>
|
||||
|
||||
To check the status of a clone operation use::
|
||||
|
||||
$ ceph fs clone status <vol_name> <clone_name> [--group_name <group_name>]
|
||||
ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name> --pool_layout <pool_layout>
|
||||
|
||||
To check the status of a clone operation use:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph fs clone status <vol_name> <clone_name> [--group_name <group_name>]
|
||||
|
||||
A clone can be in one of the following states:
|
||||
|
||||
@ -616,6 +622,31 @@ On successful cancellation, the cloned subvolume is moved to the ``canceled`` st
|
||||
|
||||
.. note:: The canceled cloned may be deleted by supplying the ``--force`` option to the `fs subvolume rm` command.
|
||||
|
||||
Configurables
|
||||
~~~~~~~~~~~~~
|
||||
|
||||
Configure the maximum number of concurrent clone operations. The default is 4:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph config set mgr mgr/volumes/max_concurrent_clones <value>
|
||||
|
||||
Configure the ``snapshot_clone_no_wait`` option:

The ``snapshot_clone_no_wait`` config option is used to reject clone creation requests when cloner threads
(the number of which can be configured via the ``max_concurrent_clones`` option above) are not available.
It is enabled by default (that is, the value is set to ``True``); it can be changed by using the following command:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph config set mgr mgr/volumes/snapshot_clone_no_wait <bool>
|
||||
|
||||
The current value of ``snapshot_clone_no_wait`` can be fetched by using the following command:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph config get mgr mgr/volumes/snapshot_clone_no_wait
|
||||
|
||||
|
||||
.. _subvol-pinning:
|
||||
|
||||
|
@ -130,7 +130,9 @@ other daemons, please see :ref:`health-checks`.
|
||||
from properly cleaning up resources used by client requests. This message
|
||||
appears if a client appears to have more than ``max_completed_requests``
|
||||
(default 100000) requests that are complete on the MDS side but haven't
|
||||
yet been accounted for in the client's *oldest tid* value.
|
||||
yet been accounted for in the client's *oldest tid* value. The last tid
|
||||
used by the MDS to trim completed client requests (or flush) is included
|
||||
as part of `session ls` (or `client ls`) command as a debug aid.
|
||||
|
||||
``MDS_DAMAGE``
|
||||
--------------
|
||||
@ -238,3 +240,32 @@ other daemons, please see :ref:`health-checks`.
|
||||
Description
|
||||
All MDS ranks are unavailable resulting in the file system to be completely
|
||||
offline.
|
||||
|
||||
``MDS_CLIENTS_LAGGY``
|
||||
----------------------------
|
||||
Message
|
||||
"Client *ID* is laggy; not evicted because some OSD(s) is/are laggy"
|
||||
|
||||
Description
|
||||
If an OSD is laggy (due to certain conditions such as a network cut-off), it
might make clients laggy as well (sessions might go idle or be unable to flush
dirty data for cap revokes). If ``defer_client_eviction_on_laggy_osds`` is
set to true (the default), client eviction will not take place and this
health warning will be generated instead.
|
||||
|
||||
``MDS_CLIENTS_BROKEN_ROOTSQUASH``
|
||||
---------------------------------
|
||||
Message
|
||||
"X client(s) with broken root_squash implementation (MDS_CLIENTS_BROKEN_ROOTSQUASH)"
|
||||
|
||||
Description
|
||||
A bug was discovered in root_squash which could cause changes made by a
client restricted with root_squash caps to be lost. The fix required a change to the
protocol, so a client upgrade is required.
|
||||
|
||||
This is a HEALTH_ERR warning because of the danger of inconsistency and lost
|
||||
data. It is recommended to either upgrade your clients, discontinue using
|
||||
root_squash in the interim, or silence the warning if desired.
|
||||
|
||||
To evict and permanently block broken clients from connecting to the
|
||||
cluster, set the ``required_client_feature`` bit ``client_mds_auth_caps``.
|
||||
|
@ -116,7 +116,7 @@ The mechanism provided for this purpose is called an ``export pin``, an
|
||||
extended attribute of directories. The name of this extended attribute is
|
||||
``ceph.dir.pin``. Users can set this attribute using standard commands:
|
||||
|
||||
::
|
||||
.. prompt:: bash #
|
||||
|
||||
setfattr -n ceph.dir.pin -v 2 path/to/dir
|
||||
|
||||
@ -128,7 +128,7 @@ pin. In this way, setting the export pin on a directory affects all of its
|
||||
children. However, the parents pin can be overridden by setting the child
|
||||
directory's export pin. For example:
|
||||
|
||||
::
|
||||
.. prompt:: bash #
|
||||
|
||||
mkdir -p a/b
|
||||
# "a" and "a/b" both start without an export pin set
|
||||
@ -173,7 +173,7 @@ immediate children across a range of MDS ranks. The canonical example use-case
|
||||
would be the ``/home`` directory: we want every user's home directory to be
|
||||
spread across the entire MDS cluster. This can be set via:
|
||||
|
||||
::
|
||||
.. prompt:: bash #
|
||||
|
||||
setfattr -n ceph.dir.pin.distributed -v 1 /cephfs/home
|
||||
|
||||
@ -183,7 +183,7 @@ may be ephemerally pinned. This is set through the extended attribute
|
||||
``ceph.dir.pin.random`` with the value set to the percentage of directories
|
||||
that should be pinned. For example:
|
||||
|
||||
::
|
||||
.. prompt:: bash #
|
||||
|
||||
setfattr -n ceph.dir.pin.random -v 0.5 /cephfs/tmp
|
||||
|
||||
@ -205,7 +205,7 @@ Ephemeral pins may override parent export pins and vice versa. What determines
|
||||
which policy is followed is the rule of the closest parent: if a closer parent
|
||||
directory has a conflicting policy, use that one instead. For example:
|
||||
|
||||
::
|
||||
.. prompt:: bash #
|
||||
|
||||
mkdir -p foo/bar1/baz foo/bar2
|
||||
setfattr -n ceph.dir.pin -v 0 foo
|
||||
@ -217,7 +217,7 @@ directory will obey the pin on ``foo`` normally.
|
||||
|
||||
For the reverse situation:
|
||||
|
||||
::
|
||||
.. prompt:: bash #
|
||||
|
||||
mkdir -p home/{patrick,john}
|
||||
setfattr -n ceph.dir.pin.distributed -v 1 home
|
||||
@ -229,7 +229,8 @@ because its export pin overrides the policy on ``home``.
|
||||
To remove a partitioning policy, remove the respective extended attribute
|
||||
or set the value to 0.
|
||||
|
||||
.. code::bash
|
||||
.. prompt:: bash #
|
||||
|
||||
$ setfattr -n ceph.dir.pin.distributed -v 0 home
|
||||
# or
|
||||
$ setfattr -x ceph.dir.pin.distributed home
|
||||
@ -237,10 +238,36 @@ or set the value to 0.
|
||||
For export pins, remove the extended attribute or set the extended attribute
|
||||
value to `-1`.
|
||||
|
||||
.. code::bash
|
||||
.. prompt:: bash #
|
||||
|
||||
$ setfattr -n ceph.dir.pin -v -1 home
|
||||
|
||||
|
||||
Dynamic Subtree Partitioning
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
CephFS has long had a dynamic metadata balancer (sometimes called the "default
balancer") which can split or merge subtrees while placing them on "colder" MDS
ranks. Moving the metadata around can improve overall file system throughput
and cache size.

However, the balancer has suffered from problems with efficiency and performance,
so it is turned off by default. This is to avoid an administrator "turning on
multimds" by increasing the ``max_mds`` setting and then finding that the balancer
has made a mess of the cluster performance (reverting is straightforward but
can take time).
|
||||
|
||||
The setting to turn on the balancer is:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph fs set <fs_name> balance_automate true
|
||||
|
||||
Turning on the balancer should only be done with appropriate configuration,
|
||||
such as with the ``bal_rank_mask`` setting (described below). Careful
|
||||
monitoring of the file system performance and MDS is advised.
|
||||
|
||||
|
||||
Dynamic subtree partitioning with Balancer on specific ranks
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
@ -260,27 +287,27 @@ static pinned subtrees.
|
||||
|
||||
This option can be configured with the ``ceph fs set`` command. For example:
|
||||
|
||||
::
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph fs set <fs_name> bal_rank_mask <hex>
|
||||
|
||||
Each bitfield of the ``<hex>`` number represents a dedicated rank. If the ``<hex>`` is
|
||||
set to ``0x3``, the balancer runs on active ``0`` and ``1`` ranks. For example:
|
||||
|
||||
::
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph fs set <fs_name> bal_rank_mask 0x3
|
||||
|
||||
If the ``bal_rank_mask`` is set to ``-1`` or ``all``, all active ranks are masked
|
||||
and utilized by the balancer. As an example:
|
||||
|
||||
::
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph fs set <fs_name> bal_rank_mask -1
|
||||
|
||||
On the other hand, if the balancer needs to be disabled,
|
||||
the ``bal_rank_mask`` should be set to ``0x0``. For example:
|
||||
|
||||
::
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph fs set <fs_name> bal_rank_mask 0x0
|
||||
|
@ -21,6 +21,14 @@ value::
|
||||
setfattr -n ceph.quota.max_bytes -v 100000000 /some/dir # 100 MB
|
||||
setfattr -n ceph.quota.max_files -v 10000 /some/dir # 10,000 files
|
||||
|
||||
``ceph.quota.max_bytes`` can also be set using human-friendly units::
|
||||
|
||||
setfattr -n ceph.quota.max_bytes -v 100K /some/dir # 100 KiB
|
||||
setfattr -n ceph.quota.max_bytes -v 5Gi /some/dir # 5 GiB
|
||||
|
||||
.. note:: Values will be strictly cast to IEC units even when SI units
|
||||
are input, e.g. 1K to 1024 bytes.
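
As a rough illustration of that strict IEC interpretation, the following
Python sketch (a hypothetical helper, not part of Ceph) converts a
human-friendly value to bytes the same way, so ``K`` resolves to 1024 rather
than 1000:

.. code-block:: python

    # "K"/"Ki" both resolve to 1024, "G"/"Gi" to 1024**3, and so on.
    _IEC = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}

    def quota_to_bytes(value: str) -> int:
        value = value.strip()
        digits = value.rstrip("KMGTi")            # numeric part
        suffix = value[len(digits):].rstrip("i")  # "5Gi" and "5G" are equivalent
        return int(digits) * _IEC[suffix.upper()]

    print(quota_to_bytes("100K"))  # 102400 (not 100000)
    print(quota_to_bytes("5Gi"))   # 5368709120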
|
||||
|
||||
To view quota limit::
|
||||
|
||||
$ getfattr -n ceph.quota.max_bytes /some/dir
|
||||
|
@ -30,9 +30,9 @@ assumed to be keyword arguments too.
|
||||
Snapshot schedules are identified by path, their repeat interval and their start
|
||||
time. The
|
||||
repeat interval defines the time between two subsequent snapshots. It is
|
||||
specified by a number and a period multiplier, one of `h(our)`, `d(ay)` and
|
||||
`w(eek)`. E.g. a repeat interval of `12h` specifies one snapshot every 12
|
||||
hours.
|
||||
specified by a number and a period multiplier, one of `h(our)`, `d(ay)`,
|
||||
`w(eek)`, `M(onth)` and `Y(ear)`. E.g. a repeat interval of `12h` specifies one
|
||||
snapshot every 12 hours.
|
||||
The start time is specified as a time string (more details about passing times
|
||||
below). By default
|
||||
the start time is last midnight. So when a snapshot schedule with repeat
|
||||
@ -52,8 +52,8 @@ space or concatenated pairs of `<number><time period>`.
|
||||
The semantics are that a spec will ensure `<number>` snapshots are kept that are
|
||||
at least `<time period>` apart. For Example `7d` means the user wants to keep 7
|
||||
snapshots that are at least one day (but potentially longer) apart from each other.
|
||||
The following time periods are recognized: `h(our), d(ay), w(eek), m(onth),
|
||||
y(ear)` and `n`. The latter is a special modifier where e.g. `10n` means keep
|
||||
The following time periods are recognized: `h(our)`, `d(ay)`, `w(eek)`, `M(onth)`,
|
||||
`Y(ear)` and `n`. The latter is a special modifier where e.g. `10n` means keep
|
||||
the last 10 snapshots regardless of timing.
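
As an illustration of how such a retention spec breaks down into
``<number><time period>`` pairs, here is a small Python sketch (a hypothetical
parser, not the snap_schedule module's own code):

.. code-block:: python

    import re

    # Period multipliers as described above: h, d, w, M, Y, plus the special "n".
    VALID_PERIODS = "hdwMYn"

    def parse_retention(spec: str) -> dict:
        pairs = re.findall(r"(\d+)([%s])" % VALID_PERIODS, spec)
        if "".join(n + p for n, p in pairs) != spec.replace(" ", ""):
            raise ValueError("unrecognized retention spec: %r" % spec)
        return {period: int(count) for count, period in pairs}

    print(parse_retention("7d4w"))     # {'d': 7, 'w': 4}
    print(parse_retention("24h 10n"))  # {'h': 24, 'n': 10}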
|
||||
|
||||
All subcommands take an optional `fs` argument to specify paths in
|
||||
|
@ -118,10 +118,16 @@ enforces this affinity.
|
||||
When failing over MDS daemons, a cluster's monitors will prefer standby daemons with
|
||||
``mds_join_fs`` equal to the file system ``name`` with the failed ``rank``. If no
|
||||
standby exists with ``mds_join_fs`` equal to the file system ``name``, it will
|
||||
choose an unqualified standby (no setting for ``mds_join_fs``) for the replacement,
|
||||
or any other available standby, as a last resort. Note, this does not change the
|
||||
behavior that ``standby-replay`` daemons are always selected before
|
||||
other standbys.
|
||||
choose an unqualified standby (no setting for ``mds_join_fs``) for the replacement.
|
||||
As a last resort, a standby for another filesystem will be chosen, although this
|
||||
behavior can be disabled:
|
||||
|
||||
::
|
||||
|
||||
ceph fs set <fs name> refuse_standby_for_another_fs true
|
||||
|
||||
Note, configuring MDS file system affinity does not change the behavior that
|
||||
``standby-replay`` daemons are always selected before other standbys.
|
||||
|
||||
Even further, the monitors will regularly examine the CephFS file systems even when
|
||||
stable to check if a standby with stronger affinity is available to replace an
|
||||
|
@ -401,3 +401,64 @@ own copy of the cephadm "binary" use the script located at
|
||||
``./src/cephadm/build.py [output]``.
|
||||
|
||||
.. _Python Zip Application: https://peps.python.org/pep-0441/
|
||||
|
||||
You can pass a limited set of version metadata values to be stored in the
compiled cephadm. These options can be passed to the build script with
the ``--set-version-var`` or ``-S`` option. The values should take the form
``KEY=VALUE`` and valid keys include:
|
||||
* ``CEPH_GIT_VER``
|
||||
* ``CEPH_GIT_NICE_VER``
|
||||
* ``CEPH_RELEASE``
|
||||
* ``CEPH_RELEASE_NAME``
|
||||
* ``CEPH_RELEASE_TYPE``
|
||||
|
||||
Example: ``./src/cephadm/build.py -SCEPH_GIT_VER=$(git rev-parse HEAD) -SCEPH_GIT_NICE_VER=$(git describe) /tmp/cephadm``
|
||||
|
||||
Typically these values will be passed to build.py by other, higher level, build
|
||||
tools - such as cmake.
|
||||
|
||||
The compiled version of the binary may include a curated set of dependencies
|
||||
within the zipapp. The tool used to fetch the bundled dependencies can be
|
||||
Python's ``pip``, locally installed RPMs, or bundled dependencies can be
|
||||
disabled. To select the mode for bundled dependencies use the
|
||||
``--bundled-dependencies`` or ``-B`` option with a value of ``pip``, ``rpm``,
|
||||
or ``none``.
|
||||
|
||||
The compiled cephadm zipapp file retains metadata about how it was built. This
|
||||
can be displayed by running ``cephadm version --verbose``. The command will
|
||||
emit a JSON formatted object showing version metadata (if available), a list of
|
||||
the bundled dependencies generated by the build script (if bundled dependencies
|
||||
were enabled), and a summary of the top-level contents of the zipapp. Example::
|
||||
|
||||
$ ./cephadm version --verbose
|
||||
{
|
||||
"name": "cephadm",
|
||||
"ceph_git_nice_ver": "18.0.0-6867-g6a1df2d0b01",
|
||||
"ceph_git_ver": "6a1df2d0b01da581bfef3357940e1e88d5ce70ce",
|
||||
"ceph_release_name": "reef",
|
||||
"ceph_release_type": "dev",
|
||||
"bundled_packages": [
|
||||
{
|
||||
"name": "Jinja2",
|
||||
"version": "3.1.2",
|
||||
"package_source": "pip",
|
||||
"requirements_entry": "Jinja2 == 3.1.2"
|
||||
},
|
||||
{
|
||||
"name": "MarkupSafe",
|
||||
"version": "2.1.3",
|
||||
"package_source": "pip",
|
||||
"requirements_entry": "MarkupSafe == 2.1.3"
|
||||
}
|
||||
],
|
||||
"zip_root_entries": [
|
||||
"Jinja2-3.1.2-py3.9.egg-info",
|
||||
"MarkupSafe-2.1.3-py3.9.egg-info",
|
||||
"__main__.py",
|
||||
"__main__.pyc",
|
||||
"_cephadmmeta",
|
||||
"cephadmlib",
|
||||
"jinja2",
|
||||
"markupsafe"
|
||||
]
|
||||
}
|
||||
|
@ -148,7 +148,7 @@ options. By default, ``log-to-stdout`` is enabled, and ``--log-to-syslog`` is di
|
||||
vstart.sh
|
||||
---------
|
||||
|
||||
The following options aree handy when using ``vstart.sh``,
|
||||
The following options can be used with ``vstart.sh``.
|
||||
|
||||
``--crimson``
|
||||
Start ``crimson-osd`` instead of ``ceph-osd``.
|
||||
@ -195,9 +195,6 @@ The following options aree handy when using ``vstart.sh``,
|
||||
Valid types include ``HDD``, ``SSD`` (default), ``ZNS``, and ``RANDOM_BLOCK_SSD``
|
||||
Note secondary devices should not be faster than the main device.
|
||||
|
||||
``--seastore``
|
||||
Use SeaStore as the object store backend.
|
||||
|
||||
To start a cluster with a single Crimson node, run::
|
||||
|
||||
$ MGR=1 MON=1 OSD=1 MDS=0 RGW=0 ../src/vstart.sh -n -x \
|
||||
|
@ -1,3 +1,5 @@
|
||||
.. _crimson_dev_doc:
|
||||
|
||||
===============================
|
||||
Crimson developer documentation
|
||||
===============================
|
||||
|
@ -13,20 +13,18 @@ following table shows all the leads and their nicks on `GitHub`_:
|
||||
|
||||
.. _github: https://github.com/
|
||||
|
||||
========= ================ =============
|
||||
Scope Lead GitHub nick
|
||||
========= ================ =============
|
||||
Ceph Sage Weil liewegas
|
||||
RADOS Neha Ojha neha-ojha
|
||||
RGW Yehuda Sadeh yehudasa
|
||||
RGW Matt Benjamin mattbenjamin
|
||||
RBD Ilya Dryomov dis
|
||||
CephFS Venky Shankar vshankar
|
||||
Dashboard Ernesto Puerta epuertat
|
||||
MON Joao Luis jecluis
|
||||
Build/Ops Ken Dreyer ktdreyer
|
||||
Docs Zac Dover zdover23
|
||||
========= ================ =============
|
||||
========= ================== =============
|
||||
Scope Lead GitHub nick
|
||||
========= ================== =============
|
||||
RADOS Radoslaw Zarzynski rzarzynski
|
||||
RGW Casey Bodley cbodley
|
||||
RGW Matt Benjamin mattbenjamin
|
||||
RBD Ilya Dryomov dis
|
||||
CephFS Venky Shankar vshankar
|
||||
Dashboard Nizamudeen A nizamial09
|
||||
Build/Ops Ken Dreyer ktdreyer
|
||||
Docs Zac Dover zdover23
|
||||
========= ================== =============
|
||||
|
||||
The Ceph-specific acronyms in the table are explained in
|
||||
:doc:`/architecture`.
|
||||
|
@ -209,6 +209,15 @@ For example: for the above test ID, the path is::
|
||||
|
||||
This method can be used to view the log more quickly than would be possible through a browser.
|
||||
|
||||
In addition to ``teuthology.log``, some other files are included for debugging
|
||||
purposes:
|
||||
|
||||
* ``unit_test_summary.yaml``: Provides a summary of all unit test failures.
|
||||
Generated (optionally) when the ``unit_test_scan`` configuration option is
|
||||
used in the job's YAML file.
|
||||
|
||||
* ``valgrind.yaml``: Summarizes any Valgrind errors that may occur.
|
||||
|
||||
.. note:: To access archives more conveniently, ``/a/`` has been symbolically
|
||||
linked to ``/ceph/teuthology-archive/``. For instance, to access the previous
|
||||
example, we can use something like::
|
||||
|
@ -2,10 +2,14 @@
|
||||
Ceph Internals
|
||||
================
|
||||
|
||||
.. note:: If you're looking for how to use Ceph as a library from your
|
||||
own software, please see :doc:`/api/index`.
|
||||
.. note:: For information on how to use Ceph as a library (from your own
|
||||
software), see :doc:`/api/index`.
|
||||
|
||||
You can start a development mode Ceph cluster, after compiling the source, with::
|
||||
Starting a Development-mode Ceph Cluster
|
||||
----------------------------------------
|
||||
|
||||
Compile the source and then run the following commands to start a
|
||||
development-mode Ceph cluster::
|
||||
|
||||
cd build
|
||||
OSD=3 MON=3 MGR=3 ../src/vstart.sh -n -x
|
||||
|
@ -218,6 +218,8 @@ we may want to exploit.
|
||||
The dedup-tool needs to be updated to use ``LIST_SNAPS`` to discover
|
||||
clones as part of leak detection.
|
||||
|
||||
.. _osd-make-writeable:
|
||||
|
||||
An important question is how we deal with the fact that many clones
|
||||
will frequently have references to the same backing chunks at the same
|
||||
offset. In particular, ``make_writeable`` will generally create a clone
|
||||
|
@ -23,12 +23,11 @@ The difference between *pool snaps* and *self managed snaps* from the
|
||||
OSD's point of view lies in whether the *SnapContext* comes to the OSD
|
||||
via the client's MOSDOp or via the most recent OSDMap.
|
||||
|
||||
See OSD::make_writeable
|
||||
See :ref:`manifest.rst <osd-make-writeable>` for more information.
|
||||
|
||||
Ondisk Structures
|
||||
-----------------
|
||||
Each object has in the PG collection a *head* object (or *snapdir*, which we
|
||||
will come to shortly) and possibly a set of *clone* objects.
|
||||
Each object has in the PG collection a *head* object and possibly a set of *clone* objects.
|
||||
Each hobject_t has a snap field. For the *head* (the only writeable version
|
||||
of an object), the snap field is set to CEPH_NOSNAP. For the *clones*, the
|
||||
snap field is set to the *seq* of the *SnapContext* at their creation.
|
||||
@ -47,8 +46,12 @@ The *head* object contains a *SnapSet* encoded in an attribute, which tracks
|
||||
3. Overlapping intervals between clones for tracking space usage
|
||||
4. Clone size
|
||||
|
||||
If the *head* is deleted while there are still clones, a *snapdir* object
|
||||
is created instead to house the *SnapSet*.
|
||||
The *head* can't be deleted while there are still clones. Instead, it is
|
||||
marked as whiteout (``object_info_t::FLAG_WHITEOUT``) in order to house the
|
||||
*SnapSet* contained in it.
|
||||
In that case, the *head* object no longer logically exists.
|
||||
|
||||
See: should_whiteout()
|
||||
|
||||
Additionally, the *object_info_t* on each clone includes a vector of snaps
|
||||
for which clone is defined.
|
||||
@ -126,3 +129,111 @@ up to 8 prefixes need to be checked to determine all hobjects in a particular
|
||||
snap for a particular PG. Upon split, the prefixes to check on the parent
|
||||
are adjusted such that only the objects remaining in the PG will be visible.
|
||||
The children will immediately have the correct mapping.
|
||||
|
||||
clone_overlap
|
||||
-------------
|
||||
Each SnapSet attached to the *head* object contains the overlapping intervals
|
||||
between clone objects for optimizing space.
|
||||
The overlapping intervals are stored in the ``clone_overlap`` map; each element of the
map stores a snap ID and the corresponding overlap with the next newest clone.
|
||||
|
||||
See the following example using a 4 byte object:
|
||||
|
||||
+--------+---------+
|
||||
| object | content |
|
||||
+========+=========+
|
||||
| head | [AAAA] |
|
||||
+--------+---------+
|
||||
|
||||
listsnaps output is as follows:
|
||||
|
||||
+---------+-------+------+---------+
|
||||
| cloneid | snaps | size | overlap |
|
||||
+=========+=======+======+=========+
|
||||
| head | - | 4 | |
|
||||
+---------+-------+------+---------+
|
||||
|
||||
After taking a snapshot (ID 1) and re-writing the first 2 bytes of the object,
|
||||
the clone created will overlap with the new *head* object in its last 2 bytes.
|
||||
|
||||
+------------+---------+
|
||||
| object | content |
|
||||
+============+=========+
|
||||
| head | [BBAA] |
|
||||
+------------+---------+
|
||||
| clone ID 1 | [AAAA] |
|
||||
+------------+---------+
|
||||
|
||||
+---------+-------+------+---------+
|
||||
| cloneid | snaps | size | overlap |
|
||||
+=========+=======+======+=========+
|
||||
| 1 | 1 | 4 | [2~2] |
|
||||
+---------+-------+------+---------+
|
||||
| head | - | 4 | |
|
||||
+---------+-------+------+---------+
|
||||
|
||||
By taking another snapshot (ID 2) and this time re-writing only the first byte of the object,
the new clone (ID 2) will overlap with the new *head* object in its last 3 bytes,
while the oldest clone (ID 1) will overlap with the newest clone in its last 2 bytes.
|
||||
|
||||
+------------+---------+
|
||||
| object | content |
|
||||
+============+=========+
|
||||
| head | [CBAA] |
|
||||
+------------+---------+
|
||||
| clone ID 2 | [BBAA] |
|
||||
+------------+---------+
|
||||
| clone ID 1 | [AAAA] |
|
||||
+------------+---------+
|
||||
|
||||
+---------+-------+------+---------+
|
||||
| cloneid | snaps | size | overlap |
|
||||
+=========+=======+======+=========+
|
||||
| 1 | 1 | 4 | [2~2] |
|
||||
+---------+-------+------+---------+
|
||||
| 2 | 2 | 4 | [1~3] |
|
||||
+---------+-------+------+---------+
|
||||
| head | - | 4 | |
|
||||
+---------+-------+------+---------+
|
||||
|
||||
If the *head* object is completely re-written (all 4 bytes are re-written),
the only overlap that remains will be the one between the two clones.
|
||||
|
||||
+------------+---------+
|
||||
| object | content |
|
||||
+============+=========+
|
||||
| head | [DDDD] |
|
||||
+------------+---------+
|
||||
| clone ID 2 | [BBAA] |
|
||||
+------------+---------+
|
||||
| clone ID 1 | [AAAA] |
|
||||
+------------+---------+
|
||||
|
||||
+---------+-------+------+---------+
|
||||
| cloneid | snaps | size | overlap |
|
||||
+=========+=======+======+=========+
|
||||
| 1 | 1 | 4 | [2~2] |
|
||||
+---------+-------+------+---------+
|
||||
| 2 | 2 | 4 | |
|
||||
+---------+-------+------+---------+
|
||||
| head | - | 4 | |
|
||||
+---------+-------+------+---------+
|
||||
|
||||
Lastly, after the last snap (ID 2) is removed and snaptrim kicks in,
|
||||
no overlapping intervals will remain:
|
||||
|
||||
+------------+---------+
|
||||
| object | content |
|
||||
+============+=========+
|
||||
| head | [DDDD] |
|
||||
+------------+---------+
|
||||
| clone ID 1 | [AAAA] |
|
||||
+------------+---------+
|
||||
|
||||
+---------+-------+------+---------+
|
||||
| cloneid | snaps | size | overlap |
|
||||
+=========+=======+======+=========+
|
||||
| 1 | 1 | 4 | |
|
||||
+---------+-------+------+---------+
|
||||
| head | - | 4 | |
|
||||
+---------+-------+------+---------+
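
The arithmetic behind the tables above can be modelled with a short Python
sketch (illustrative only, not the OSD implementation): re-writing a range of
the *head* removes that range from a clone's overlap interval.

.. code-block:: python

    # Intervals are (offset, length) pairs, matching the "[2~2]" notation above.
    def subtract_write(overlap, write_off, write_len):
        """Remove the written range from a single (offset, length) overlap."""
        remaining = []
        start, end = overlap[0], overlap[0] + overlap[1]
        w_start, w_end = write_off, write_off + write_len
        if start < w_start:                        # piece left of the write
            remaining.append((start, min(end, w_start) - start))
        if end > w_end:                            # piece right of the write
            remaining.append((max(start, w_end), end - max(start, w_end)))
        return remaining

    # head [AAAA]: snapshot 1 is taken, then the first 2 bytes are re-written.
    print(subtract_write((0, 4), 0, 2))   # [(2, 2)]  -> overlap "[2~2]"
    # snapshot 2 is taken, then only the first byte is re-written.
    print(subtract_write((0, 4), 0, 1))   # [(1, 3)]  -> overlap "[1~3]"
    # the head is completely re-written: no overlap with the head remains.
    print(subtract_write((0, 4), 0, 4))   # []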
|
||||
|
@ -6,98 +6,93 @@ Concepts
|
||||
--------
|
||||
|
||||
*Peering*
|
||||
the process of bringing all of the OSDs that store
|
||||
a Placement Group (PG) into agreement about the state
|
||||
of all of the objects (and their metadata) in that PG.
|
||||
Note that agreeing on the state does not mean that
|
||||
they all have the latest contents.
|
||||
the process of bringing all of the OSDs that store a Placement Group (PG)
|
||||
into agreement about the state of all of the objects in that PG and all of
|
||||
the metadata associated with those objects. Two OSDs can agree on the state
|
||||
of the objects in the placement group yet still may not necessarily have the
|
||||
latest contents.
|
||||
|
||||
*Acting set*
|
||||
the ordered list of OSDs who are (or were as of some epoch)
|
||||
responsible for a particular PG.
|
||||
the ordered list of OSDs that are (or were as of some epoch) responsible for
|
||||
a particular PG.
|
||||
|
||||
*Up set*
|
||||
the ordered list of OSDs responsible for a particular PG for
|
||||
a particular epoch according to CRUSH. Normally this
|
||||
is the same as the *acting set*, except when the *acting set* has been
|
||||
explicitly overridden via *PG temp* in the OSDMap.
|
||||
the ordered list of OSDs responsible for a particular PG for a particular
|
||||
epoch, according to CRUSH. This is the same as the *acting set* except when
|
||||
the *acting set* has been explicitly overridden via *PG temp* in the OSDMap.
|
||||
|
||||
*PG temp*
|
||||
a temporary placement group acting set used while backfilling the
|
||||
primary osd. Let say acting is [0,1,2] and we are
|
||||
active+clean. Something happens and acting is now [3,1,2]. osd 3 is
|
||||
empty and can't serve reads although it is the primary. osd.3 will
|
||||
see that and request a *PG temp* of [1,2,3] to the monitors using a
|
||||
MOSDPGTemp message so that osd.1 temporarily becomes the
|
||||
primary. It will select osd.3 as a backfill peer and continue to
|
||||
serve reads and writes while osd.3 is backfilled. When backfilling
|
||||
is complete, *PG temp* is discarded and the acting set changes back
|
||||
to [3,1,2] and osd.3 becomes the primary.
|
||||
a temporary placement group acting set that is used while backfilling the
|
||||
primary OSD. Assume that the acting set is ``[0,1,2]`` and we are
|
||||
``active+clean``. Now assume that something happens and the acting set
|
||||
becomes ``[3,1,2]``. Under these circumstances, OSD ``3`` is empty and can't
|
||||
serve reads even though it is the primary. ``osd.3`` will respond by
|
||||
requesting a *PG temp* of ``[1,2,3]`` to the monitors using a ``MOSDPGTemp``
|
||||
message, and ``osd.1`` will become the primary temporarily. ``osd.1`` will
|
||||
select ``osd.3`` as a backfill peer and will continue to serve reads and
|
||||
writes while ``osd.3`` is backfilled. When backfilling is complete, *PG
|
||||
temp* is discarded. The acting set changes back to ``[3,1,2]`` and ``osd.3``
|
||||
becomes the primary.
|
||||
|
||||
*current interval* or *past interval*
|
||||
a sequence of OSD map epochs during which the *acting set* and *up
|
||||
set* for particular PG do not change
|
||||
a sequence of OSD map epochs during which the *acting set* and the *up
|
||||
set* for particular PG do not change.
|
||||
|
||||
*primary*
|
||||
the (by convention first) member of the *acting set*,
|
||||
who is responsible for coordination peering, and is
|
||||
the only OSD that will accept client initiated
|
||||
writes to objects in a placement group.
|
||||
the member of the *acting set* that is responsible for coordinating peering.
|
||||
The only OSD that accepts client-initiated writes to the objects in a
|
||||
placement group. By convention, the primary is the first member of the
|
||||
*acting set*.
|
||||
|
||||
*replica*
|
||||
a non-primary OSD in the *acting set* for a placement group
|
||||
(and who has been recognized as such and *activated* by the primary).
|
||||
a non-primary OSD in the *acting set* of a placement group. A replica has
|
||||
been recognized as a non-primary OSD and has been *activated* by the
|
||||
primary.
|
||||
|
||||
*stray*
|
||||
an OSD who is not a member of the current *acting set*, but
|
||||
has not yet been told that it can delete its copies of a
|
||||
particular placement group.
|
||||
an OSD that is not a member of the current *acting set* and has not yet been
|
||||
told to delete its copies of a particular placement group.
|
||||
|
||||
*recovery*
|
||||
ensuring that copies of all of the objects in a PG
|
||||
are on all of the OSDs in the *acting set*. Once
|
||||
*peering* has been performed, the primary can start
|
||||
accepting write operations, and *recovery* can proceed
|
||||
in the background.
|
||||
the process of ensuring that copies of all of the objects in a PG are on all
|
||||
of the OSDs in the *acting set*. After *peering* has been performed, the
|
||||
primary can begin accepting write operations and *recovery* can proceed in
|
||||
the background.
|
||||
|
||||
*PG info*
|
||||
basic metadata about the PG's creation epoch, the version
|
||||
for the most recent write to the PG, *last epoch started*, *last
|
||||
epoch clean*, and the beginning of the *current interval*. Any
|
||||
inter-OSD communication about PGs includes the *PG info*, such that
|
||||
any OSD that knows a PG exists (or once existed) also has a lower
|
||||
bound on *last epoch clean* or *last epoch started*.
|
||||
basic metadata about the PG's creation epoch, the version for the most
|
||||
recent write to the PG, the *last epoch started*, the *last epoch clean*,
|
||||
and the beginning of the *current interval*. Any inter-OSD communication
|
||||
about PGs includes the *PG info*, such that any OSD that knows a PG exists
|
||||
(or once existed) also has a lower bound on *last epoch clean* or *last
|
||||
epoch started*.
|
||||
|
||||
*PG log*
|
||||
a list of recent updates made to objects in a PG.
|
||||
Note that these logs can be truncated after all OSDs
|
||||
in the *acting set* have acknowledged up to a certain
|
||||
point.
|
||||
a list of recent updates made to objects in a PG. These logs can be
|
||||
truncated after all OSDs in the *acting set* have acknowledged the changes.
|
||||
|
||||
*missing set*
|
||||
Each OSD notes update log entries and if they imply updates to
|
||||
the contents of an object, adds that object to a list of needed
|
||||
updates. This list is called the *missing set* for that <OSD,PG>.
|
||||
the set of all objects that have not yet had their contents updated to match
|
||||
the log entries. The missing set is collated by each OSD. Missing sets are
|
||||
kept track of on an ``<OSD,PG>`` basis.
|
||||
|
||||
*Authoritative History*
|
||||
a complete, and fully ordered set of operations that, if
|
||||
performed, would bring an OSD's copy of a Placement Group
|
||||
up to date.
|
||||
a complete and fully-ordered set of operations that bring an OSD's copy of a
|
||||
Placement Group up to date.
|
||||
|
||||
*epoch*
|
||||
a (monotonically increasing) OSD map version number
|
||||
a (monotonically increasing) OSD map version number.
|
||||
|
||||
*last epoch start*
|
||||
the last epoch at which all nodes in the *acting set*
|
||||
for a particular placement group agreed on an
|
||||
*authoritative history*. At this point, *peering* is
|
||||
deemed to have been successful.
|
||||
the last epoch at which all nodes in the *acting set* for a given placement
|
||||
group agreed on an *authoritative history*. At the start of the last epoch,
|
||||
*peering* is deemed to have been successful.
|
||||
|
||||
*up_thru*
|
||||
before a primary can successfully complete the *peering* process,
|
||||
it must inform a monitor that is alive through the current
|
||||
OSD map epoch by having the monitor set its *up_thru* in the osd
|
||||
map. This helps peering ignore previous *acting sets* for which
|
||||
map. This helps peering ignore previous *acting sets* for which
|
||||
peering never completed after certain sequences of failures, such as
|
||||
the second interval below:
|
||||
|
||||
@ -107,10 +102,9 @@ Concepts
|
||||
- *acting set* = [B] (B restarts, A does not)
|
||||
|
||||
*last epoch clean*
|
||||
the last epoch at which all nodes in the *acting set*
|
||||
for a particular placement group were completely
|
||||
up to date (both PG logs and object contents).
|
||||
At this point, *recovery* is deemed to have been
|
||||
the last epoch at which all nodes in the *acting set* for a given placement
|
||||
group were completely up to date (this includes both the PG's logs and the
|
||||
PG's object contents). At this point, *recovery* is deemed to have been
|
||||
completed.
|
||||
|
||||
Description of the Peering Process
|
||||
|
@ -213,10 +213,24 @@
|
||||
Ceph cluster. See :ref:`the "Cluster Map" section of the
|
||||
Architecture document<architecture_cluster_map>` for details.
|
||||
|
||||
Crimson
|
||||
A next-generation OSD architecture whose core aim is the
|
||||
reduction of latency costs incurred due to cross-core
|
||||
communications. A re-design of the OSD that reduces lock
|
||||
contention by reducing communication between shards in the data
|
||||
path. Crimson improves upon the performance of classic Ceph
|
||||
OSDs by eliminating reliance on thread pools. See `Crimson:
|
||||
Next-generation Ceph OSD for Multi-core Scalability
|
||||
<https://ceph.io/en/news/blog/2023/crimson-multi-core-scalability/>`_.
|
||||
See the :ref:`Crimson developer
|
||||
documentation<crimson_dev_doc>`.
|
||||
|
||||
CRUSH
|
||||
**C**\ontrolled **R**\eplication **U**\nder **S**\calable
|
||||
**H**\ashing. The algorithm that Ceph uses to compute object
|
||||
storage locations.
|
||||
storage locations. See `CRUSH: Controlled, Scalable,
|
||||
Decentralized Placement of Replicated Data
|
||||
<https://ceph.com/assets/pdfs/weil-crush-sc06.pdf>`_.
|
||||
|
||||
CRUSH rule
|
||||
The CRUSH data placement rule that applies to a particular
|
||||
@ -255,17 +269,31 @@
|
||||
Hybrid OSD
|
||||
Refers to an OSD that has both HDD and SSD drives.
|
||||
|
||||
librados
|
||||
An API that can be used to create a custom interface to a Ceph
|
||||
storage cluster. ``librados`` makes it possible to interact
|
||||
with Ceph Monitors and with OSDs. See :ref:`Introduction to
|
||||
librados <librados-intro>`. See :ref:`librados (Python)
|
||||
<librados-python>`.
|
||||
|
||||
LVM tags
|
||||
**L**\ogical **V**\olume **M**\anager tags. Extensible metadata
|
||||
for LVM volumes and groups. They are used to store
|
||||
Ceph-specific information about devices and its relationship
|
||||
with OSDs.
|
||||
|
||||
:ref:`MDS<cephfs_add_remote_mds>`
|
||||
MDS
|
||||
The Ceph **M**\eta\ **D**\ata **S**\erver daemon. Also referred
|
||||
to as "ceph-mds". The Ceph metadata server daemon must be
|
||||
running in any Ceph cluster that runs the CephFS file system.
|
||||
The MDS stores all filesystem metadata.
|
||||
The MDS stores all filesystem metadata. :term:`Client`\s work
|
||||
together with either a single MDS or a group of MDSes to
|
||||
maintain a distributed metadata cache that is required by
|
||||
CephFS.
|
||||
|
||||
See :ref:`Deploying Metadata Servers<cephfs_add_remote_mds>`.
|
||||
|
||||
See the :ref:`ceph-mds man page<ceph_mds_man>`.
|
||||
|
||||
MGR
|
||||
The Ceph manager software, which collects all the state from
|
||||
@ -274,12 +302,30 @@
|
||||
:ref:`MON<arch_monitor>`
|
||||
The Ceph monitor software.
|
||||
|
||||
Monitor Store
|
||||
The persistent storage that is used by the Monitor. This
|
||||
includes the Monitor's RocksDB and all related files in
|
||||
``/var/lib/ceph``.
|
||||
|
||||
Node
|
||||
See :term:`Ceph Node`.
|
||||
|
||||
Object Storage Device
|
||||
See :term:`OSD`.
|
||||
|
||||
OMAP
|
||||
"object map". A key-value store (a database) that is used to
|
||||
reduce the time it takes to read data from and to write to the
|
||||
Ceph cluster. RGW bucket indexes are stored as OMAPs.
|
||||
Erasure-coded pools cannot store RADOS OMAP data structures.
|
||||
|
||||
Run the command ``ceph osd df`` to see your OMAPs.
|
||||
|
||||
See Eleanor Cawthon's 2012 paper `A Distributed Key-Value Store
|
||||
using Ceph
|
||||
<https://ceph.io/assets/pdfs/CawthonKeyValueStore.pdf>`_ (17
|
||||
pages).
|
||||
|
||||
OSD
|
||||
Probably :term:`Ceph OSD`, but not necessarily. Sometimes
|
||||
(especially in older correspondence, and especially in
|
||||
@ -291,18 +337,19 @@
|
||||
mid-2010s to insist that "OSD" should refer to "Object Storage
|
||||
Device", so it is important to know which meaning is intended.
|
||||
|
||||
OSD fsid
|
||||
This is a unique identifier used to identify an OSD. It is
|
||||
found in the OSD path in a file called ``osd_fsid``. The
|
||||
term ``fsid`` is used interchangeably with ``uuid``
|
||||
OSD FSID
|
||||
The OSD fsid is a unique identifier that is used to identify an
|
||||
OSD. It is found in the OSD path in a file called ``osd_fsid``.
|
||||
The term ``FSID`` is used interchangeably with ``UUID``.
|
||||
|
||||
OSD id
|
||||
The integer that defines an OSD. It is generated by the
|
||||
monitors during the creation of each OSD.
|
||||
OSD ID
|
||||
The OSD ID is an integer that is unique to each OSD (each OSD has a
unique OSD ID). Each OSD ID is generated by the monitors during the
creation of its associated OSD.
|
||||
|
||||
OSD uuid
|
||||
This is the unique identifier of an OSD. This term is used
|
||||
interchangeably with ``fsid``
|
||||
OSD UUID
|
||||
The OSD UUID is the unique identifier of an OSD. This term is
|
||||
used interchangeably with ``FSID``.
|
||||
|
||||
Period
|
||||
In the context of :term:`RGW`, a period is the configuration
|
||||
|
183 ceph/doc/hardware-monitoring/index.rst (new file)
@ -0,0 +1,183 @@
|
||||
.. _hardware-monitoring:
|
||||
|
||||
Hardware monitoring
|
||||
===================
|
||||
|
||||
`node-proxy` is the internal name of the running agent that inventories a machine's hardware, reports the various statuses and enables the operator to perform some actions.
It gathers details from the RedFish API, then processes and pushes the data to the agent endpoint in the Ceph manager daemon.
|
||||
|
||||
.. graphviz::
|
||||
|
||||
digraph G {
|
||||
node [shape=record];
|
||||
mgr [label="{<mgr> ceph manager}"];
|
||||
dashboard [label="<dashboard> ceph dashboard"];
|
||||
agent [label="<agent> agent"];
|
||||
redfish [label="<redfish> redfish"];
|
||||
|
||||
agent -> redfish [label=" 1." color=green];
|
||||
agent -> mgr [label=" 2." color=orange];
|
||||
dashboard:dashboard -> mgr [label=" 3."color=lightgreen];
|
||||
node [shape=plaintext];
|
||||
legend [label=<<table border="0" cellborder="1" cellspacing="0">
|
||||
<tr><td bgcolor="lightgrey">Legend</td></tr>
|
||||
<tr><td align="center">1. Collects data from redfish API</td></tr>
|
||||
<tr><td align="left">2. Pushes data to ceph mgr</td></tr>
|
||||
<tr><td align="left">3. Query ceph mgr</td></tr>
|
||||
</table>>];
|
||||
}
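
As a rough sketch of step 1 in the diagram (collecting data from the RedFish
API), the following Python snippet queries a BMC with the ``requests``
library. The address, credentials and resource path are placeholders (the same
placeholder values as in the ``host.yml`` example below); this is not the
actual `node-proxy` code.

.. code-block:: python

    import requests

    BMC = "https://20.20.20.10"   # hypothetical out-of-band address

    def redfish_get(path):
        # verify=False only because many BMCs ship self-signed certificates
        r = requests.get(BMC + path, auth=("admin", "p@ssword"), verify=False)
        r.raise_for_status()
        return r.json()

    # Standard Redfish chassis resource; the exact ID depends on the vendor.
    chassis = redfish_get("/redfish/v1/Chassis/System.Embedded.1")
    print(chassis["Status"]["Health"])   # e.g. "OK", fed into the mgr summary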
|
||||
|
||||
|
||||
Limitations
|
||||
-----------
|
||||
|
||||
For the time being, the `node-proxy` agent relies on the RedFish API.
This implies that both the `node-proxy` agent and the `ceph-mgr` daemon need to be able to access the Out-Of-Band network in order to work.
|
||||
|
||||
|
||||
Deploying the agent
|
||||
-------------------
|
||||
|
||||
| The first step is to provide the out of band management tool credentials.
|
||||
| This can be done when adding the host with a service spec file:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# cat host.yml
|
||||
---
|
||||
service_type: host
|
||||
hostname: node-10
|
||||
addr: 10.10.10.10
|
||||
oob:
|
||||
addr: 20.20.20.10
|
||||
username: admin
|
||||
password: p@ssword
|
||||
|
||||
Apply the spec:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# ceph orch apply -i host.yml
|
||||
Added host 'node-10' with addr '10.10.10.10'
|
||||
|
||||
Deploy the agent:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# ceph config set mgr mgr/cephadm/hw_monitoring true
|
||||
|
||||
CLI
|
||||
---
|
||||
|
||||
| **orch** **hardware** **status** [hostname] [--category CATEGORY] [--format plain | json]
|
||||
|
||||
supported categories are:
|
||||
|
||||
* summary (default)
|
||||
* memory
|
||||
* storage
|
||||
* processors
|
||||
* network
|
||||
* power
|
||||
* fans
|
||||
* firmwares
|
||||
* criticals
|
||||
|
||||
Examples
|
||||
********
|
||||
|
||||
|
||||
hardware health statuses summary
|
||||
++++++++++++++++++++++++++++++++
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# ceph orch hardware status
|
||||
+------------+---------+-----+-----+--------+-------+------+
|
||||
| HOST | STORAGE | CPU | NET | MEMORY | POWER | FANS |
|
||||
+------------+---------+-----+-----+--------+-------+------+
|
||||
| node-10 | ok | ok | ok | ok | ok | ok |
|
||||
+------------+---------+-----+-----+--------+-------+------+
|
||||
|
||||
|
||||
storage devices report
|
||||
++++++++++++++++++++++
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# ceph orch hardware status IBM-Ceph-1 --category storage
|
||||
+------------+--------------------------------------------------------+------------------+----------------+----------+----------------+--------+---------+
|
||||
| HOST | NAME | MODEL | SIZE | PROTOCOL | SN | STATUS | STATE |
|
||||
+------------+--------------------------------------------------------+------------------+----------------+----------+----------------+--------+---------+
|
||||
| node-10 | Disk 8 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT99QLL | OK | Enabled |
|
||||
| node-10 | Disk 10 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT98ZYX | OK | Enabled |
|
||||
| node-10 | Disk 11 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT98ZWB | OK | Enabled |
|
||||
| node-10 | Disk 9 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT98ZC9 | OK | Enabled |
|
||||
| node-10 | Disk 3 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT9903Y | OK | Enabled |
|
||||
| node-10 | Disk 1 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT9901E | OK | Enabled |
|
||||
| node-10 | Disk 7 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT98ZQJ | OK | Enabled |
|
||||
| node-10 | Disk 2 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT99PA2 | OK | Enabled |
|
||||
| node-10 | Disk 4 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT99PFG | OK | Enabled |
|
||||
| node-10 | Disk 0 in Backplane 0 of Storage Controller in Slot 2 | MZ7L33T8HBNAAD3 | 3840755981824 | SATA | S6M5NE0T800539 | OK | Enabled |
|
||||
| node-10 | Disk 1 in Backplane 0 of Storage Controller in Slot 2 | MZ7L33T8HBNAAD3 | 3840755981824 | SATA | S6M5NE0T800554 | OK | Enabled |
|
||||
| node-10 | Disk 6 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT98ZER | OK | Enabled |
|
||||
| node-10 | Disk 0 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT98ZEJ | OK | Enabled |
|
||||
| node-10 | Disk 5 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT99QMH | OK | Enabled |
|
||||
| node-10 | Disk 0 on AHCI Controller in SL 6 | MTFDDAV240TDU | 240057409536 | SATA | 22373BB1E0F8 | OK | Enabled |
|
||||
| node-10 | Disk 1 on AHCI Controller in SL 6 | MTFDDAV240TDU | 240057409536 | SATA | 22373BB1E0D5 | OK | Enabled |
|
||||
+------------+--------------------------------------------------------+------------------+----------------+----------+----------------+--------+---------+
|
||||
|
||||
|
||||
|
||||
firmwares details
|
||||
+++++++++++++++++
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# ceph orch hardware status node-10 --category firmwares
|
||||
+------------+----------------------------------------------------------------------------+--------------------------------------------------------------+----------------------+-------------+--------+
|
||||
| HOST | COMPONENT | NAME | DATE | VERSION | STATUS |
|
||||
+------------+----------------------------------------------------------------------------+--------------------------------------------------------------+----------------------+-------------+--------+
|
||||
| node-10 | current-107649-7.03__raid.backplane.firmware.0 | Backplane 0 | 2022-12-05T00:00:00Z | 7.03 | OK |
|
||||
|
||||
|
||||
... omitted output ...
|
||||
|
||||
|
||||
| node-10 | previous-25227-6.10.30.20__idrac.embedded.1-1 | Integrated Remote Access Controller | 00:00:00Z | 6.10.30.20 | OK |
|
||||
+------------+----------------------------------------------------------------------------+--------------------------------------------------------------+----------------------+-------------+--------+
|
||||
|
||||
|
||||
hardware critical warnings report
|
||||
+++++++++++++++++++++++++++++++++
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# ceph orch hardware status --category criticals
|
||||
+------------+-----------+------------+----------+-----------------+
|
||||
| HOST | COMPONENT | NAME | STATUS | STATE |
|
||||
+------------+-----------+------------+----------+-----------------+
|
||||
| node-10 | power | PS2 Status | critical | unplugged |
|
||||
+------------+-----------+------------+----------+-----------------+
|
||||
|
||||
|
||||
Developers
|
||||
-----------
|
||||
|
||||
.. py:currentmodule:: cephadm.agent
|
||||
.. autoclass:: NodeProxyEndpoint
|
||||
.. automethod:: NodeProxyEndpoint.__init__
|
||||
.. automethod:: NodeProxyEndpoint.oob
|
||||
.. automethod:: NodeProxyEndpoint.data
|
||||
.. automethod:: NodeProxyEndpoint.fullreport
|
||||
.. automethod:: NodeProxyEndpoint.summary
|
||||
.. automethod:: NodeProxyEndpoint.criticals
|
||||
.. automethod:: NodeProxyEndpoint.memory
|
||||
.. automethod:: NodeProxyEndpoint.storage
|
||||
.. automethod:: NodeProxyEndpoint.network
|
||||
.. automethod:: NodeProxyEndpoint.power
|
||||
.. automethod:: NodeProxyEndpoint.processors
|
||||
.. automethod:: NodeProxyEndpoint.fans
|
||||
.. automethod:: NodeProxyEndpoint.firmwares
|
||||
.. automethod:: NodeProxyEndpoint.led
|
||||
|
@ -118,8 +118,9 @@ about Ceph, see our `Architecture`_ section.
|
||||
governance
|
||||
foundation
|
||||
ceph-volume/index
|
||||
releases/general
|
||||
releases/index
|
||||
Ceph Releases (general) <https://docs.ceph.com/en/latest/releases/general/>
|
||||
Ceph Releases (index) <https://docs.ceph.com/en/latest/releases/>
|
||||
security/index
|
||||
hardware-monitoring/index
|
||||
Glossary <glossary>
|
||||
Tracing <jaegertracing/index>
|
||||
|
@ -98,59 +98,7 @@ repository.
|
||||
Updating Submodules
|
||||
-------------------
|
||||
|
||||
#. Determine whether your submodules are out of date:
|
||||
|
||||
.. prompt:: bash $
|
||||
|
||||
git status
|
||||
|
||||
A. If your submodules are up to date
|
||||
If your submodules are up to date, the following console output will
|
||||
appear:
|
||||
|
||||
::
|
||||
|
||||
On branch main
|
||||
Your branch is up to date with 'origin/main'.
|
||||
|
||||
nothing to commit, working tree clean
|
||||
|
||||
If you see this console output, then your submodules are up to date.
|
||||
You do not need this procedure.
|
||||
|
||||
|
||||
B. If your submodules are not up to date
|
||||
If your submodules are not up to date, you will see a message that
|
||||
includes a list of "untracked files". The example here shows such a
|
||||
list, which was generated from a real situation in which the
|
||||
submodules were no longer current. Your list of files will not be the
|
||||
same as this list of files, but this list is provided as an example.
|
||||
If in your case any untracked files are listed, then you should
|
||||
continue to the next step of this procedure.
|
||||
|
||||
::
|
||||
|
||||
On branch main
|
||||
Your branch is up to date with 'origin/main'.
|
||||
|
||||
Untracked files:
|
||||
(use "git add <file>..." to include in what will be committed)
|
||||
src/pybind/cephfs/build/
|
||||
src/pybind/cephfs/cephfs.c
|
||||
src/pybind/cephfs/cephfs.egg-info/
|
||||
src/pybind/rados/build/
|
||||
src/pybind/rados/rados.c
|
||||
src/pybind/rados/rados.egg-info/
|
||||
src/pybind/rbd/build/
|
||||
src/pybind/rbd/rbd.c
|
||||
src/pybind/rbd/rbd.egg-info/
|
||||
src/pybind/rgw/build/
|
||||
src/pybind/rgw/rgw.c
|
||||
src/pybind/rgw/rgw.egg-info/
|
||||
|
||||
nothing added to commit but untracked files present (use "git add" to track)
|
||||
|
||||
#. If your submodules are out of date, run the following commands:
|
||||
If your submodules are out of date, run the following commands:
|
||||
|
||||
.. prompt:: bash $
|
||||
|
||||
@ -158,24 +106,10 @@ Updating Submodules
|
||||
git clean -fdx
|
||||
git submodule foreach git clean -fdx
|
||||
|
||||
If you still have problems with a submodule directory, use ``rm -rf
|
||||
[directory name]`` to remove the directory. Then run ``git submodule update
|
||||
--init --recursive`` again.
|
||||
If you still have problems with a submodule directory, use ``rm -rf [directory
|
||||
name]`` to remove the directory. Then run ``git submodule update --init
|
||||
--recursive --progress`` again.
|
||||
|
||||
#. Run ``git status`` again:
|
||||
|
||||
.. prompt:: bash $
|
||||
|
||||
git status
|
||||
|
||||
Your submodules are up to date if you see the following message:
|
||||
|
||||
::
|
||||
|
||||
On branch main
|
||||
Your branch is up to date with 'origin/main'.
|
||||
|
||||
nothing to commit, working tree clean
|
||||
|
||||
Choose a Branch
|
||||
===============
|
||||
|
@ -251,6 +251,17 @@ openSUSE Tumbleweed
|
||||
The newest major release of Ceph is already available through the normal Tumbleweed repositories.
|
||||
There's no need to add another package repository manually.
|
||||
|
||||
openEuler
|
||||
^^^^^^^^^
|
||||
|
||||
Two major Ceph versions are available in the standard openEuler repositories: Ceph 12.2.8 in the openEuler-20.03-LTS series and Ceph 16.2.7 in the openEuler-22.03-LTS series. There is no need to add another package repository manually.
|
||||
You can install Ceph by running the following command:
|
||||
|
||||
.. prompt:: bash $
|
||||
|
||||
sudo yum -y install ceph
|
||||
|
||||
You can also download packages manually from https://repo.openeuler.org/openEuler-{release}/everything/{arch}/Packages/.
|
||||
|
||||
Ceph Development Packages
|
||||
-------------------------
|
||||
|
@ -4,14 +4,13 @@
|
||||
Installing Ceph
|
||||
===============
|
||||
|
||||
There are multiple ways to install Ceph.
|
||||
There are multiple ways to install Ceph.
|
||||
|
||||
Recommended methods
|
||||
~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
:ref:`Cephadm <cephadm_deploying_new_cluster>` installs and manages a Ceph
|
||||
cluster that uses containers and systemd and is tightly integrated with the CLI
|
||||
and dashboard GUI.
|
||||
:ref:`Cephadm <cephadm_deploying_new_cluster>` is a tool that can be used to
|
||||
install and manage a Ceph cluster.
|
||||
|
||||
* cephadm supports only Octopus and newer releases.
|
||||
* cephadm is fully integrated with the orchestration API and fully supports the
|
||||
@ -59,6 +58,8 @@ tool that can be used to quickly deploy clusters. It is deprecated.
|
||||
|
||||
`github.com/openstack/puppet-ceph <https://github.com/openstack/puppet-ceph>`_ installs Ceph via Puppet.
|
||||
|
||||
`OpenNebula HCI clusters <https://docs.opennebula.io/stable/provision_clusters/hci_clusters/overview.html>`_ deploys Ceph on various cloud platforms.
|
||||
|
||||
Ceph can also be :ref:`installed manually <install-manual>`.
|
||||
|
||||
|
||||
|
@ -461,6 +461,52 @@ In the below instructions, ``{id}`` is an arbitrary name, such as the hostname o
|
||||
|
||||
#. Now you are ready to `create a Ceph file system`_.
|
||||
|
||||
Manually Installing RADOSGW
|
||||
===========================
|
||||
|
||||
For a more involved discussion of the procedure presented here, see `this
|
||||
thread on the ceph-users mailing list
|
||||
<https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/LB3YRIKAPOHXYCW7MKLVUJPYWYRQVARU/>`_.
|
||||
|
||||
#. Install ``radosgw`` packages on the nodes that will be the RGW nodes.
|
||||
|
||||
#. From a monitor or from a node with admin privileges, run a command of the
|
||||
following form:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph auth get-or-create client.short-hostname-of-rgw mon 'allow rw' osd 'allow rwx'
|
||||
|
||||
#. On one of the RGW nodes, do the following:
|
||||
|
||||
a. Create a ``ceph-user``-owned directory. For example:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
install -d -o ceph -g ceph /var/lib/ceph/radosgw/ceph-$(hostname -s)
|
||||
|
||||
b. Enter the directory just created and create a ``keyring`` file:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
touch /var/lib/ceph/radosgw/ceph-$(hostname -s)/keyring
|
||||
|
||||
Use a command similar to this one to put the key from the earlier ``ceph
|
||||
auth get-or-create`` step in the ``keyring`` file. Use your preferred
|
||||
editor:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
$EDITOR /var/lib/ceph/radosgw/ceph-$(hostname -s)/keyring
|
||||
|
||||
c. Repeat these steps on every RGW node.
|
||||
|
||||
#. Start the RADOSGW service by running the following command:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
systemctl start ceph-radosgw@$(hostname -s).service
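
   As a quick sanity check (a sketch that assumes the gateway listens on the
   default port 7480), you can confirm that the gateway answers S3 requests:

   .. prompt:: bash #

      curl http://$(hostname -s):7480

   An anonymous request should return a small ``ListAllMyBucketsResult`` XML
   document.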
|
||||
|
||||
|
||||
Summary
|
||||
=======
|
||||
|
@ -1,5 +1,7 @@
|
||||
:orphan:
|
||||
|
||||
.. _ceph_mds_man:
|
||||
|
||||
=========================================
|
||||
ceph-mds -- ceph metadata server daemon
|
||||
=========================================
|
||||
|
@ -244,45 +244,56 @@ Procedure
|
||||
Manipulating the Object Map Key
|
||||
-------------------------------
|
||||
|
||||
Use the **ceph-objectstore-tool** utility to change the object map (OMAP) key. You need to provide the data path, the placement group identifier (PG ID), the object, and the key in the OMAP.
|
||||
Note
|
||||
Use the **ceph-objectstore-tool** utility to change the object map (OMAP) key.
|
||||
Provide the data path, the placement group identifier (PG ID), the object, and
|
||||
the key in the OMAP.
|
||||
|
||||
Prerequisites
|
||||
^^^^^^^^^^^^^
|
||||
|
||||
* Having root access to the Ceph OSD node.
|
||||
* Stopping the ceph-osd daemon.
|
||||
|
||||
Procedure
|
||||
Commands
|
||||
^^^^^^^^
|
||||
|
||||
Get the object map key:
|
||||
Run the commands in this section as ``root`` on an OSD node.
|
||||
|
||||
Syntax::
|
||||
* **Getting the object map key**
|
||||
|
||||
Syntax:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT get-omap $KEY > $OBJECT_MAP_FILE_NAME
|
||||
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT get-omap $KEY > $OBJECT_MAP_FILE_NAME
|
||||
|
||||
Example::
|
||||
|
||||
[root@osd ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' get-omap "" > zone_info.default.omap.txt
|
||||
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' get-omap "" > zone_info.default.omap.txt
|
||||
|
||||
Set the object map key:
|
||||
* **Setting the object map key**
|
||||
|
||||
Syntax::
|
||||
Syntax:
|
||||
|
||||
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT set-omap $KEY < $OBJECT_MAP_FILE_NAME
|
||||
.. code-block:: bash
|
||||
|
||||
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT set-omap $KEY < $OBJECT_MAP_FILE_NAME
|
||||
|
||||
Example::
|
||||
|
||||
[root@osd ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' set-omap "" < zone_info.default.omap.txt
|
||||
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' set-omap "" < zone_info.default.omap.txt
|
||||
|
||||
Remove the object map key:
|
||||
* **Removing the object map key**
|
||||
|
||||
Syntax::
|
||||
Syntax:
|
||||
|
||||
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT rm-omap $KEY
|
||||
.. code-block:: bash
|
||||
|
||||
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT rm-omap $KEY
|
||||
|
||||
Example::
|
||||
|
||||
[root@osd ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' rm-omap ""
|
||||
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' rm-omap ""
|
||||
|
||||
|
||||
Listing an Object's Attributes
|
||||
|
@ -18,14 +18,16 @@ Synopsis
|
||||
Description
|
||||
===========
|
||||
|
||||
**ceph-osd** is the object storage daemon for the Ceph distributed file
|
||||
system. It is responsible for storing objects on a local file system
|
||||
and providing access to them over the network.
|
||||
**ceph-osd** is the **o**\bject **s**\torage **d**\aemon for the Ceph
|
||||
distributed file system. It manages data on local storage with redundancy and
|
||||
provides access to that data over the network.
|
||||
|
||||
The datapath argument should be a directory on a xfs file system
|
||||
where the object data resides. The journal is optional, and is only
|
||||
useful performance-wise when it resides on a different disk than
|
||||
datapath with low latency (ideally, an NVRAM device).
|
||||
For Filestore-backed clusters, the argument of the ``--osd-data datapath``
|
||||
option (which is ``datapath`` in this example) should be a directory on an XFS
|
||||
file system where the object data resides. The journal is optional. The journal
|
||||
improves performance only when it resides on a different disk than the disk
|
||||
specified by ``datapath`` . The storage medium on which the journal is stored
|
||||
should be a low-latency medium (ideally, an SSD device).
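
A hypothetical Filestore invocation that illustrates these two paths (the OSD
ID, data directory, and journal device below are examples only) might look
like this:

.. code-block:: bash

   # OSD 0 keeps its object data on an XFS mount and its journal on a
   # separate low-latency device.
   ceph-osd -i 0 \
            --osd-data /var/lib/ceph/osd/ceph-0 \
            --osd-journal /dev/nvme0n1p1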
|
||||
|
||||
|
||||
Options
|
||||
|
@ -56,7 +56,7 @@ Options
|
||||
|
||||
.. code:: bash
|
||||
|
||||
[build]$ python3 -m venv venv && source venv/bin/activate && pip3 install cmd2
|
||||
[build]$ python3 -m venv venv && source venv/bin/activate && pip3 install cmd2 colorama
|
||||
[build]$ source vstart_environment.sh && source venv/bin/activate && python3 ../src/tools/cephfs/shell/cephfs-shell
|
||||
|
||||
Commands
|
||||
|
@ -199,6 +199,50 @@ Advanced
|
||||
option is enabled, a namespace operation may complete before the MDS
|
||||
replies, if it has sufficient capabilities to do so.
|
||||
|
||||
:command:`crush_location=x`
|
||||
Specify the location of the client in terms of CRUSH hierarchy (since 5.8).
|
||||
This is a set of key-value pairs separated from each other by '|', with
|
||||
keys separated from values by ':'. Note that '|' may need to be quoted
|
||||
or escaped to avoid it being interpreted as a pipe by the shell. The key
|
||||
is the bucket type name (e.g. rack, datacenter or region with default
|
||||
bucket types) and the value is the bucket name. For example, to indicate
|
||||
that the client is local to rack "myrack", data center "mydc" and region
|
||||
"myregion"::
|
||||
|
||||
crush_location=rack:myrack|datacenter:mydc|region:myregion
|
||||
|
||||
Each key-value pair stands on its own: "myrack" doesn't need to reside in
|
||||
"mydc", which in turn doesn't need to reside in "myregion". The location
|
||||
is not a path to the root of the hierarchy but rather a set of nodes that
|
||||
are matched independently. "Multipath" locations are supported, so it is
|
||||
possible to indicate locality for multiple parallel hierarchies::
|
||||
|
||||
crush_location=rack:myrack1|rack:myrack2|datacenter:mydc
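
   For instance, a hypothetical mount invocation (the client name, file system
   name, and mount point are examples only) that quotes the option so the
   shell does not treat ``|`` as a pipe:

   .. code-block:: bash

      mount -t ceph cephuser@.cephfs=/ /mnt/cephfs \
            -o 'crush_location=rack:myrack|datacenter:mydc|region:myregion'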
|
||||
|
||||
|
||||
:command:`read_from_replica=<no|balance|localize>`
|
||||
- ``no``: Disable replica reads, always pick the primary OSD (since 5.8, default).
|
||||
|
||||
- ``balance``: When a replicated pool receives a read request, pick a random
|
||||
OSD from the PG's acting set to serve it (since 5.8).
|
||||
|
||||
This mode is safe for general use only since Octopus (i.e. after "ceph osd
|
||||
require-osd-release octopus"). Otherwise it should be limited to read-only
|
||||
workloads such as snapshots.
|
||||
|
||||
- ``localize``: When a replicated pool receives a read request, pick the most
|
||||
local OSD to serve it (since 5.8). The locality metric is calculated against
|
||||
the location of the client given with crush_location; a match with the
|
||||
lowest-valued bucket type wins. For example, an OSD in a matching rack
|
||||
is closer than an OSD in a matching data center, which in turn is closer
|
||||
than an OSD in a matching region.
|
||||
|
||||
This mode is safe for general use only since Octopus (i.e. after "ceph osd
|
||||
require-osd-release octopus"). Otherwise it should be limited to read-only
|
||||
workloads such as snapshots.
|
||||
|
||||
|
||||
|
||||
Examples
|
||||
========
|
||||
|
||||
|
@ -333,7 +333,7 @@ Commands
|
||||
be specified.
|
||||
|
||||
:command:`flatten` [--encryption-format *encryption-format* --encryption-passphrase-file *passphrase-file*]... *image-spec*
|
||||
If image is a clone, copy all shared blocks from the parent snapshot and
|
||||
If the image is a clone, copy all shared blocks from the parent snapshot and
|
||||
make the child independent of the parent, severing the link between
|
||||
parent snap and child. The parent snapshot can be unprotected and
|
||||
deleted if it has no further dependent clones.
|
||||
@ -390,7 +390,7 @@ Commands
|
||||
Set metadata key with the value. They will displayed in `image-meta list`.
|
||||
|
||||
:command:`import` [--export-format *format (1 or 2)*] [--image-format *format-id*] [--object-size *size-in-B/K/M*] [--stripe-unit *size-in-B/K/M* --stripe-count *num*] [--image-feature *feature-name*]... [--image-shared] *src-path* [*image-spec*]
|
||||
Create a new image and imports its data from path (use - for
|
||||
Create a new image and import its data from path (use - for
|
||||
stdin). The import operation will try to create sparse rbd images
|
||||
if possible. For import from stdin, the sparsification unit is
|
||||
the data block size of the destination image (object size).
|
||||
@ -402,14 +402,14 @@ Commands
|
||||
of image, but also the snapshots and other properties, such as image_order, features.
|
||||
|
||||
:command:`import-diff` *src-path* *image-spec*
|
||||
Import an incremental diff of an image and applies it to the current image. If the diff
|
||||
Import an incremental diff of an image and apply it to the current image. If the diff
|
||||
was generated relative to a start snapshot, we verify that snapshot already exists before
|
||||
continuing. If there was an end snapshot we verify it does not already exist before
|
||||
applying the changes, and create the snapshot when we are done.
|
||||
|
||||
:command:`info` *image-spec* | *snap-spec*
|
||||
Will dump information (such as size and object size) about a specific rbd image.
|
||||
If image is a clone, information about its parent is also displayed.
|
||||
If the image is a clone, information about its parent is also displayed.
|
||||
If a snapshot is specified, whether it is protected is shown as well.
|
||||
|
||||
:command:`journal client disconnect` *journal-spec*
|
||||
@ -472,7 +472,7 @@ Commands
|
||||
the destination image are lost.
|
||||
|
||||
:command:`migration commit` *image-spec*
|
||||
Commit image migration. This step is run after a successful migration
|
||||
Commit image migration. This step is run after successful migration
|
||||
prepare and migration execute steps and removes the source image data.
|
||||
|
||||
:command:`migration execute` *image-spec*
|
||||
@ -499,14 +499,12 @@ Commands
|
||||
:command:`mirror image disable` [--force] *image-spec*
|
||||
Disable RBD mirroring for an image. If the mirroring is
|
||||
configured in ``image`` mode for the image's pool, then it
|
||||
can be explicitly disabled mirroring for each image within
|
||||
the pool.
|
||||
must be disabled for each image individually.
|
||||
|
||||
:command:`mirror image enable` *image-spec* *mode*
|
||||
Enable RBD mirroring for an image. If the mirroring is
|
||||
configured in ``image`` mode for the image's pool, then it
|
||||
can be explicitly enabled mirroring for each image within
|
||||
the pool.
|
||||
must be enabled for each image individually.
|
||||
|
||||
The mirror image mode can either be ``journal`` (default) or
|
||||
``snapshot``. The ``journal`` mode requires the RBD journaling
|
||||
@ -523,7 +521,7 @@ Commands
|
||||
|
||||
:command:`mirror pool demote` [*pool-name*]
|
||||
Demote all primary images within a pool to non-primary.
|
||||
Every mirroring enabled image will demoted in the pool.
|
||||
Every mirror-enabled image in the pool will be demoted.
|
||||
|
||||
:command:`mirror pool disable` [*pool-name*]
|
||||
Disable RBD mirroring by default within a pool. When mirroring
|
||||
@ -551,7 +549,7 @@ Commands
|
||||
|
||||
The default for *remote client name* is "client.admin".
|
||||
|
||||
This requires mirroring mode is enabled.
|
||||
This requires mirroring to be enabled on the pool.
|
||||
|
||||
:command:`mirror pool peer remove` [*pool-name*] *uuid*
|
||||
Remove a mirroring peer from a pool. The peer uuid is available
|
||||
@ -564,12 +562,12 @@ Commands
|
||||
|
||||
:command:`mirror pool promote` [--force] [*pool-name*]
|
||||
Promote all non-primary images within a pool to primary.
|
||||
Every mirroring enabled image will promoted in the pool.
|
||||
Every mirror-enabled image in the pool will be promoted.
|
||||
|
||||
:command:`mirror pool status` [--verbose] [*pool-name*]
|
||||
Show status for all mirrored images in the pool.
|
||||
With --verbose, also show additionally output status
|
||||
details for every mirroring image in the pool.
|
||||
With ``--verbose``, show additional output status
|
||||
details for every mirror-enabled image in the pool.
|
||||
|
||||
:command:`mirror snapshot schedule add` [-p | --pool *pool*] [--namespace *namespace*] [--image *image*] *interval* [*start-time*]
|
||||
Add mirror snapshot schedule.
|
||||
@ -603,7 +601,7 @@ Commands
|
||||
specified to rebuild an invalid object map for a snapshot.
|
||||
|
||||
:command:`pool init` [*pool-name*] [--force]
|
||||
Initialize pool for use by RBD. Newly created pools must initialized
|
||||
Initialize pool for use by RBD. Newly created pools must be initialized
|
||||
prior to use.
|
||||
|
||||
:command:`resize` (-s | --size *size-in-M/G/T*) [--allow-shrink] [--encryption-format *encryption-format* --encryption-passphrase-file *passphrase-file*]... *image-spec*
|
||||
@ -615,7 +613,7 @@ Commands
|
||||
snapshots, this fails and nothing is deleted.
|
||||
|
||||
:command:`snap create` *snap-spec*
|
||||
Create a new snapshot. Requires the snapshot name parameter specified.
|
||||
Create a new snapshot. Requires the snapshot name parameter to be specified.
|
||||
|
||||
:command:`snap limit clear` *image-spec*
|
||||
Remove any previously set limit on the number of snapshots allowed on
|
||||
@ -625,7 +623,7 @@ Commands
|
||||
Set a limit for the number of snapshots allowed on an image.
|
||||
|
||||
:command:`snap ls` *image-spec*
|
||||
Dump the list of snapshots inside a specific image.
|
||||
Dump the list of snapshots of a specific image.
|
||||
|
||||
:command:`snap protect` *snap-spec*
|
||||
Protect a snapshot from deletion, so that clones can be made of it
|
||||
@ -668,9 +666,11 @@ Commands
|
||||
:command:`trash ls` [*pool-name*]
|
||||
List all entries from trash.
|
||||
|
||||
:command:`trash mv` *image-spec*
|
||||
:command:`trash mv` [--expires-at <expires-at>] *image-spec*
|
||||
Move an image to the trash. Images, even ones actively in-use by
|
||||
clones, can be moved to the trash and deleted at a later time.
|
||||
clones, can be moved to the trash and deleted at a later time. Use
|
||||
``--expires-at`` to set the expiration time of an image after which
|
||||
it's allowed to be removed.
|
||||
|
||||
:command:`trash purge` [*pool-name*]
|
||||
Remove all expired images from trash.
|
||||
@ -678,10 +678,10 @@ Commands
|
||||
:command:`trash restore` *image-id*
|
||||
Restore an image from trash.
|
||||
|
||||
:command:`trash rm` *image-id*
|
||||
Delete an image from trash. If image deferment time has not expired
|
||||
you can not removed it unless use force. But an actively in-use by clones
|
||||
or has snapshots can not be removed.
|
||||
:command:`trash rm` [--force] *image-id*
|
||||
Delete an image from trash. If the image deferment time has not expired
|
||||
it can be removed using ``--force``. An image that is actively in-use by clones
|
||||
or has snapshots cannot be removed.
|
||||
|
||||
:command:`trash purge schedule add` [-p | --pool *pool*] [--namespace *namespace*] *interval* [*start-time*]
|
||||
Add trash purge schedule.
|
||||
|
@ -568,6 +568,9 @@ If the NFS service is running on a non-standard port number:
|
||||
|
||||
.. note:: Only NFS v4.0+ is supported.
|
||||
|
||||
.. note:: As of this writing (01 Jan 2024), no version of Microsoft Windows
|
||||
supports mounting an NFS v4.x export natively.
|
||||
|
||||
Troubleshooting
|
||||
===============
|
||||
|
||||
|
@ -151,3 +151,96 @@ ceph-mgr and check the logs.
|
||||
|
||||
With logging set to debug for the manager the module will print various logging
|
||||
lines prefixed with *mgr[zabbix]* for easy filtering.
|
||||
|
||||
Installing zabbix-agent 2
|
||||
-------------------------
|
||||
|
||||
*The procedures that explain the installation of Zabbix 2 were developed by John Jasen.*
|
||||
|
||||
Follow the instructions in the sections :ref:`mgr_zabbix_2_nodes`,
|
||||
:ref:`mgr_zabbix_2_cluster`, and :ref:`mgr_zabbix_2_server` to install a Zabbix
|
||||
server to monitor your Ceph cluster.
|
||||
|
||||
.. _mgr_zabbix_2_nodes:
|
||||
|
||||
Ceph MGR Nodes
|
||||
^^^^^^^^^^^^^^
|
||||
|
||||
#. Download an appropriate Zabbix release from https://www.zabbix.com/download
|
||||
or install a package from the Zabbix repositories.
|
||||
#. Use your package manager to remove any other Zabbix agents.
|
||||
#. Install ``zabbix-agent 2`` using the instructions at
|
||||
https://www.zabbix.com/download.
|
||||
#. Edit ``/etc/zabbix/zabbix-agent2.conf``. Add your Zabbix monitoring servers
|
||||
and your localhost to the ``Servers`` line of ``zabbix-agent2.conf``::
|
||||
|
||||
Server=127.0.0.1,zabbix2.example.com,zabbix1.example.com
|
||||
#. Start or restart the ``zabbix-agent2`` agent:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
systemctl restart zabbix-agent2
|
||||
|
||||
.. _mgr_zabbix_2_cluster:
|
||||
|
||||
Ceph Cluster
|
||||
^^^^^^^^^^^^
|
||||
|
||||
#. Enable the ``restful`` module:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph mgr module enable restful
|
||||
|
||||
#. Generate a self-signed certificate. This step is optional:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph restful create-self-signed-cert
|
||||
|
||||
#. Create an API user called ``zabbix-monitor``:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph restful create-key zabbix-monitor
|
||||
|
||||
The output of this command, an API key, will look something like this::
|
||||
|
||||
a4bb2019-XXXX-YYYY-ZZZZ-abcdefghij
|
||||
|
||||
#. Save the generated API key. It will be necessary later.
|
||||
#. Test API access by using ``zabbix-get``:
|
||||
|
||||
.. note:: This step is optional.
|
||||
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
zabbix_get -s 127.0.0.1 -k ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
|
||||
|
||||
Example:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
zabbix_get -s 127.0.0.1 -k ceph.ping["https://localhost:8003","zabbix-monitor","a4bb2019-XXXX-YYYY-ZZZZ-abcdefghij"]
|
||||
|
||||
.. note:: You may need to install ``zabbix-get`` via your package manager.
|
||||
|
||||
.. _mgr_zabbix_2_server:
|
||||
|
||||
Zabbix Server
|
||||
^^^^^^^^^^^^^
|
||||
|
||||
#. Create a host for the Ceph monitoring servers.
|
||||
#. Add the template ``Ceph by Zabbix agent 2`` to the host.
|
||||
#. Inform the host of the keys:
|
||||
|
||||
#. Go to “Macros” on the host.
|
||||
#. Show “Inherited and host macros”.
|
||||
#. Change ``{$CEPH.API.KEY}`` and ``{$CEPH.USER}`` to the values provided
|
||||
under ``ceph restful create-key``, above. Example::
|
||||
|
||||
{$CEPH.API.KEY} a4bb2019-XXXX-YYYY-ZZZZ-abcdefghij
|
||||
{$CEPH.USER} zabbix-monitor
|
||||
|
||||
#. Update the host. Within a few cycles, data will populate the server.
|
||||
|
@ -470,5 +470,8 @@ Useful queries
|
||||
rate(ceph_rbd_read_latency_sum[30s]) / rate(ceph_rbd_read_latency_count[30s]) * on (instance) group_left (ceph_daemon) ceph_rgw_metadata
|
||||
|
||||
|
||||
Hardware monitoring
|
||||
===================
|
||||
|
||||
See :ref:`hardware-monitoring`
|
||||
|
||||
|
@ -1,3 +1,5 @@
|
||||
.. _librados-intro:
|
||||
|
||||
==========================
|
||||
Introduction to librados
|
||||
==========================
|
||||
|
@ -1,3 +1,5 @@
|
||||
.. _librados-python:
|
||||
|
||||
===================
|
||||
Librados (Python)
|
||||
===================
|
||||
|
@ -358,7 +358,7 @@ OSD and run the following command:
|
||||
|
||||
ceph-bluestore-tool \
|
||||
--path <data path> \
|
||||
--sharding="m(3) p(3,0-12) o(3,0-13)=block_cache={type=binned_lru} l p" \
|
||||
--sharding="m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} l p" \
|
||||
reshard
|
||||
|
||||
.. confval:: bluestore_rocksdb_cf
|
||||
|
@ -123,11 +123,10 @@ OSD host, run the following commands:
|
||||
ssh {osd-host}
|
||||
sudo mkdir /var/lib/ceph/osd/ceph-{osd-number}
|
||||
|
||||
The ``osd_data`` path ought to lead to a mount point that has mounted on it a
|
||||
device that is distinct from the device that contains the operating system and
|
||||
the daemons. To use a device distinct from the device that contains the
|
||||
The ``osd_data`` path must lead to a device that is not shared with the
|
||||
operating system. To use a device other than the device that contains the
|
||||
operating system and the daemons, prepare it for use with Ceph and mount it on
|
||||
the directory you just created by running the following commands:
|
||||
the directory you just created by running commands of the following form:
|
||||
|
||||
.. prompt:: bash $
|
||||
|
||||
|
@ -151,7 +151,7 @@ generates a catalog of all objects in each placement group and compares each
|
||||
primary object to its replicas, ensuring that no objects are missing or
|
||||
mismatched. Light scrubbing checks the object size and attributes, and is
|
||||
usually done daily. Deep scrubbing reads the data and uses checksums to ensure
|
||||
data integrity, and is usually done weekly. The freqeuncies of both light
|
||||
data integrity, and is usually done weekly. The frequencies of both light
|
||||
scrubbing and deep scrubbing are determined by the cluster's configuration,
|
||||
which is fully under your control and subject to the settings explained below
|
||||
in this section.
|
||||
|
@ -6,12 +6,41 @@
|
||||
|
||||
.. index:: pools; configuration
|
||||
|
||||
Ceph uses default values to determine how many placement groups (PGs) will be
|
||||
assigned to each pool. We recommend overriding some of the defaults.
|
||||
Specifically, we recommend setting a pool's replica size and overriding the
|
||||
default number of placement groups. You can set these values when running
|
||||
`pool`_ commands. You can also override the defaults by adding new ones in the
|
||||
``[global]`` section of your Ceph configuration file.
|
||||
The number of placement groups that the CRUSH algorithm assigns to each pool is
|
||||
determined by the values of variables in the centralized configuration database
|
||||
in the monitor cluster.
|
||||
|
||||
Both containerized deployments of Ceph (deployments made using ``cephadm`` or
|
||||
Rook) and non-containerized deployments of Ceph rely on the values in the
|
||||
central configuration database in the monitor cluster to assign placement
|
||||
groups to pools.
|
||||
|
||||
Example Commands
|
||||
----------------
|
||||
|
||||
To see the value of the variable that governs the number of placement groups in a given pool, run a command of the following form:
|
||||
|
||||
.. prompt:: bash
|
||||
|
||||
ceph config get osd osd_pool_default_pg_num
|
||||
|
||||
To set the value of the variable that governs the number of placement groups in a given pool, run a command of the following form:
|
||||
|
||||
.. prompt:: bash
|
||||
|
||||
ceph config set osd osd_pool_default_pg_num 128
|
||||
|
||||
Manual Tuning
|
||||
-------------
|
||||
In some cases, it might be advisable to override some of the defaults. For
|
||||
example, you might determine that it is wise to set a pool's replica size and
|
||||
to override the default number of placement groups in the pool. You can set
|
||||
these values when running `pool`_ commands.
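
For example (a minimal sketch that assumes a pool named ``mypool``):

.. prompt:: bash

   ceph osd pool create mypool 128
   ceph osd pool set mypool size 3
   ceph osd pool get mypool pg_num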
|
||||
|
||||
See Also
|
||||
--------
|
||||
|
||||
See :ref:`pg-autoscaler`.
|
||||
|
||||
|
||||
.. literalinclude:: pool-pg.conf
|
||||
|
@ -344,12 +344,13 @@ addresses, repeat this process.
|
||||
Changing a Monitor's IP address (Advanced Method)
|
||||
-------------------------------------------------
|
||||
|
||||
There are cases in which the method outlined in :ref"`<Changing a Monitor's IP
|
||||
Address (Preferred Method)> operations_add_or_rm_mons_changing_mon_ip` cannot
|
||||
be used. For example, it might be necessary to move the cluster's monitors to a
|
||||
different network, to a different part of the datacenter, or to a different
|
||||
datacenter altogether. It is still possible to change the monitors' IP
|
||||
addresses, but a different method must be used.
|
||||
There are cases in which the method outlined in
|
||||
:ref:`operations_add_or_rm_mons_changing_mon_ip` cannot be used. For example,
|
||||
it might be necessary to move the cluster's monitors to a different network, to
|
||||
a different part of the datacenter, or to a different datacenter altogether. It
|
||||
is still possible to change the monitors' IP addresses, but a different method
|
||||
must be used.
|
||||
|
||||
|
||||
For such cases, a new monitor map with updated IP addresses for every monitor
|
||||
in the cluster must be generated and injected on each monitor. Although this
|
||||
@ -357,11 +358,11 @@ method is not particularly easy, such a major migration is unlikely to be a
|
||||
routine task. As stated at the beginning of this section, existing monitors are
|
||||
not supposed to change their IP addresses.
|
||||
|
||||
Continue with the monitor configuration in the example from :ref"`<Changing a
|
||||
Monitor's IP Address (Preferred Method)>
|
||||
operations_add_or_rm_mons_changing_mon_ip` . Suppose that all of the monitors
|
||||
are to be moved from the ``10.0.0.x`` range to the ``10.1.0.x`` range, and that
|
||||
these networks are unable to communicate. Carry out the following procedure:
|
||||
Continue with the monitor configuration in the example from
|
||||
:ref:`operations_add_or_rm_mons_changing_mon_ip`. Suppose that all of the
|
||||
monitors are to be moved from the ``10.0.0.x`` range to the ``10.1.0.x`` range,
|
||||
and that these networks are unable to communicate. Carry out the following
|
||||
procedure:
|
||||
|
||||
#. Retrieve the monitor map (``{tmp}`` is the path to the retrieved monitor
|
||||
map, and ``{filename}`` is the name of the file that contains the retrieved
|
||||
@ -448,7 +449,135 @@ and inject the modified monitor map into each new monitor.
|
||||
Migration to the new location is now complete. The monitors should operate
|
||||
successfully.
|
||||
|
||||
Using cephadm to change the public network
|
||||
==========================================
|
||||
|
||||
Overview
|
||||
--------
|
||||
|
||||
The procedure in this overview section provides only the broad outlines of
|
||||
using ``cephadm`` to change the public network.
|
||||
|
||||
#. Create backups of all keyrings, configuration files, and the current monmap.
|
||||
|
||||
#. Stop the cluster and disable ``ceph.target`` to prevent the daemons from
|
||||
starting.
|
||||
|
||||
#. Move the servers and power them on.
|
||||
|
||||
#. Change the network setup as desired.
|
||||
|
||||
|
||||
Example Procedure
|
||||
-----------------
|
||||
|
||||
.. note:: In this procedure, the "old network" has addresses of the form
|
||||
``10.10.10.0/24`` and the "new network" has addresses of the form
|
||||
``192.168.160.0/24``.
|
||||
|
||||
#. Enter the shell of the first monitor:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
cephadm shell --name mon.reef1
|
||||
|
||||
#. Extract the current monmap from ``mon.reef1``:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph-mon -i reef1 --extract-monmap monmap
|
||||
|
||||
#. Print the content of the monmap:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
monmaptool --print monmap
|
||||
|
||||
::
|
||||
|
||||
monmaptool: monmap file monmap
|
||||
epoch 5
|
||||
fsid 2851404a-d09a-11ee-9aaa-fa163e2de51a
|
||||
last_changed 2024-02-21T09:32:18.292040+0000
|
||||
created 2024-02-21T09:18:27.136371+0000
|
||||
min_mon_release 18 (reef)
|
||||
election_strategy: 1
|
||||
0: [v2:10.10.10.11:3300/0,v1:10.10.10.11:6789/0] mon.reef1
|
||||
1: [v2:10.10.10.12:3300/0,v1:10.10.10.12:6789/0] mon.reef2
|
||||
2: [v2:10.10.10.13:3300/0,v1:10.10.10.13:6789/0] mon.reef3
|
||||
|
||||
#. Remove monitors with old addresses:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
monmaptool --rm reef1 --rm reef2 --rm reef3 monmap
|
||||
|
||||
#. Add monitors with new addresses:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
monmaptool --addv reef1 [v2:192.168.160.11:3300/0,v1:192.168.160.11:6789/0] --addv reef2 [v2:192.168.160.12:3300/0,v1:192.168.160.12:6789/0] --addv reef3 [v2:192.168.160.13:3300/0,v1:192.168.160.13:6789/0] monmap
|
||||
|
||||
#. Verify that the changes to the monmap have been made successfully:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
monmaptool --print monmap
|
||||
|
||||
::
|
||||
|
||||
monmaptool: monmap file monmap
|
||||
epoch 4
|
||||
fsid 2851404a-d09a-11ee-9aaa-fa163e2de51a
|
||||
last_changed 2024-02-21T09:32:18.292040+0000
|
||||
created 2024-02-21T09:18:27.136371+0000
|
||||
min_mon_release 18 (reef)
|
||||
election_strategy: 1
|
||||
0: [v2:192.168.160.11:3300/0,v1:192.168.160.11:6789/0] mon.reef1
|
||||
1: [v2:192.168.160.12:3300/0,v1:192.168.160.12:6789/0] mon.reef2
|
||||
2: [v2:192.168.160.13:3300/0,v1:192.168.160.13:6789/0] mon.reef3
|
||||
|
||||
#. Inject the new monmap into the Ceph cluster:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph-mon -i reef1 --inject-monmap monmap
|
||||
|
||||
#. Repeat the steps above for all other monitors in the cluster.
|
||||
|
||||
#. Update ``/var/lib/ceph/{FSID}/mon.{MON}/config``.
|
||||
|
||||
#. Start the monitors.
|
||||
|
||||
#. Update the ceph ``public_network``:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph config set mon public_network 192.168.160.0/24
|
||||
|
||||
#. Update the configuration files of the managers
|
||||
(``/var/lib/ceph/{FSID}/mgr.{mgr}/config``) and start them. Orchestrator
|
||||
will now be available, but it will attempt to connect to the old network
|
||||
because the host list contains the old addresses.
|
||||
|
||||
#. Update the host addresses by running commands of the following form:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph orch host set-addr reef1 192.168.160.11
|
||||
ceph orch host set-addr reef2 192.168.160.12
|
||||
ceph orch host set-addr reef3 192.168.160.13
|
||||
|
||||
#. Wait a few minutes for the orchestrator to connect to each host.
|
||||
|
||||
#. Reconfigure the OSDs so that their config files are automatically updated:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph orch reconfig osd
|
||||
|
||||
*The above procedure was developed by Eugen Block and was successfully tested
|
||||
in February 2024 on Ceph version 18.2.1 (Reef).*
|
||||
|
||||
.. _Manual Deployment: ../../../install/manual-deployment
|
||||
.. _Monitor Bootstrap: ../../../dev/mon-bootstrap
|
||||
|
@ -474,27 +474,25 @@ following command:
|
||||
|
||||
ceph tell mds.{mds-id} config set {setting} {value}
|
||||
|
||||
Example:
|
||||
Example: to enable debug messages, run the following command:
|
||||
|
||||
.. prompt:: bash $
|
||||
|
||||
ceph tell mds.0 config set debug_ms 1
|
||||
|
||||
To enable debug messages, run the following command:
|
||||
To display the status of all metadata servers, run the following command:
|
||||
|
||||
.. prompt:: bash $
|
||||
|
||||
ceph mds stat
|
||||
|
||||
To display the status of all metadata servers, run the following command:
|
||||
To mark the active metadata server as failed (and to trigger failover to a
|
||||
standby if a standby is present), run the following command:
|
||||
|
||||
.. prompt:: bash $
|
||||
|
||||
ceph mds fail 0
|
||||
|
||||
To mark the active metadata server as failed (and to trigger failover to a
|
||||
standby if a standby is present), run the following command:
|
||||
|
||||
.. todo:: ``ceph mds`` subcommands missing docs: set, dump, getmap, stop, setmap
|
||||
|
||||
|
||||
|
@ -57,53 +57,62 @@ case for most clusters), its CRUSH location can be specified as follows::
|
||||
``pod``, ``pdu``, ``rack``, ``chassis``, and ``host``. These defined
|
||||
types suffice for nearly all clusters, but can be customized by
|
||||
modifying the CRUSH map.
|
||||
#. Not all keys need to be specified. For example, by default, Ceph
|
||||
automatically sets an ``OSD``'s location as ``root=default
|
||||
host=HOSTNAME`` (as determined by the output of ``hostname -s``).
|
||||
|
||||
The CRUSH location for an OSD can be modified by adding the ``crush location``
|
||||
option in ``ceph.conf``. When this option has been added, every time the OSD
|
||||
The CRUSH location for an OSD can be set by adding the ``crush_location``
|
||||
option in ``ceph.conf``. For example::
|
||||
|
||||
crush_location = root=default row=a rack=a2 chassis=a2a host=a2a1
|
||||
|
||||
When this option has been added, every time the OSD
|
||||
starts it verifies that it is in the correct location in the CRUSH map and
|
||||
moves itself if it is not. To disable this automatic CRUSH map management, add
|
||||
the following to the ``ceph.conf`` configuration file in the ``[osd]``
|
||||
section::
|
||||
|
||||
osd crush update on start = false
|
||||
osd_crush_update_on_start = false
|
||||
|
||||
Note that this action is unnecessary in most cases.
|
||||
|
||||
If the ``crush_location`` is not set explicitly,
|
||||
a default of ``root=default host=HOSTNAME`` is used for ``OSD``s,
|
||||
where the hostname is determined by the output of the ``hostname -s`` command.
|
||||
|
||||
.. note:: If you switch from this default to an explicitly set ``crush_location``,
|
||||
do not forget to include ``root=default`` because existing CRUSH rules refer to it.
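
To confirm where an OSD has ended up in the hierarchy, commands such as the
following can be used (``osd.7`` is a hypothetical ID):

.. prompt:: bash $

   ceph osd tree
   ceph osd find 7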
|
||||
|
||||
Custom location hooks
|
||||
---------------------
|
||||
|
||||
A custom location hook can be used to generate a more complete CRUSH location
|
||||
on startup. The CRUSH location is determined by, in order of preference:
|
||||
A custom location hook can be used to generate a more complete CRUSH location
|
||||
on startup.
|
||||
|
||||
#. A ``crush location`` option in ``ceph.conf``
|
||||
#. A default of ``root=default host=HOSTNAME`` where the hostname is determined
|
||||
by the output of the ``hostname -s`` command
|
||||
This is useful when some location fields are not known at the time
|
||||
``ceph.conf`` is written (for example, fields ``rack`` or ``datacenter``
|
||||
when deploying a single configuration across multiple datacenters).
|
||||
|
||||
A script can be written to provide additional location fields (for example,
|
||||
``rack`` or ``datacenter``) and the hook can be enabled via the following
|
||||
config option::
|
||||
If configured, executed, and parsed successfully, the hook's output replaces
|
||||
any previously set CRUSH location.
|
||||
|
||||
crush location hook = /path/to/customized-ceph-crush-location
|
||||
The hook can be enabled in ``ceph.conf`` by providing a path to an
|
||||
executable file (often a script). For example::
|
||||
|
||||
crush_location_hook = /path/to/customized-ceph-crush-location
|
||||
|
||||
This hook is passed several arguments (see below). The hook outputs a single
|
||||
line to ``stdout`` that contains the CRUSH location description. The output
|
||||
resembles the following:::
|
||||
line to ``stdout`` that contains the CRUSH location description. The arguments
|
||||
resemble the following::
|
||||
|
||||
--cluster CLUSTER --id ID --type TYPE
|
||||
|
||||
Here the cluster name is typically ``ceph``, the ``id`` is the daemon
|
||||
identifier or (in the case of OSDs) the OSD number, and the daemon type is
|
||||
``osd``, ``mds, ``mgr``, or ``mon``.
|
||||
``osd``, ``mds``, ``mgr``, or ``mon``.
|
||||
|
||||
For example, a simple hook that specifies a rack location via a value in the
|
||||
file ``/etc/rack`` might be as follows::
|
||||
file ``/etc/rack`` (assuming it contains no spaces) might be as follows::
|
||||
|
||||
#!/bin/sh
|
||||
echo "host=$(hostname -s) rack=$(cat /etc/rack) root=default"
|
||||
echo "root=default rack=$(cat /etc/rack) host=$(hostname -s)"
|
||||
|
||||
|
||||
CRUSH structure
|
||||
|
@ -96,7 +96,9 @@ Where:
|
||||
``--force``
|
||||
|
||||
:Description: Override an existing profile by the same name, and allow
|
||||
setting a non-4K-aligned stripe_unit.
|
||||
setting a non-4K-aligned stripe_unit. Overriding an existing
|
||||
profile can be dangerous, and thus ``--yes-i-really-mean-it``
|
||||
must be used as well.
|
||||
|
||||
:Type: String
|
||||
:Required: No.
|
||||
|
@ -179,6 +179,8 @@ This can be enabled only on a pool residing on BlueStore OSDs, since
|
||||
BlueStore's checksumming is used during deep scrubs to detect bitrot
|
||||
or other corruption. Using Filestore with EC overwrites is not only
|
||||
unsafe, but it also results in lower performance compared to BlueStore.
|
||||
Moreover, Filestore is deprecated and any Filestore OSDs in your cluster
|
||||
should be migrated to BlueStore.
|
||||
|
||||
Erasure-coded pools do not support omap, so to use them with RBD and
|
||||
CephFS you must instruct them to store their data in an EC pool and
|
||||
@ -192,6 +194,182 @@ erasure-coded pool as the ``--data-pool`` during image creation:
|
||||
For CephFS, an erasure-coded pool can be set as the default data pool during
|
||||
file system creation or via `file layouts <../../../cephfs/file-layouts>`_.
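
As an illustrative sketch (the file system name ``cephfs``, the erasure-coded
pool ``ec_pool``, and the directory path are assumptions), an EC pool can be
attached as an additional data pool and then selected for a directory with a
file layout:

.. prompt:: bash $

   ceph fs add_data_pool cephfs ec_pool
   setfattr -n ceph.dir.layout.pool -v ec_pool /mnt/cephfs/archive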
|
||||
|
||||
Erasure-coded pool overhead
|
||||
---------------------------
|
||||
|
||||
The overhead factor (space amplification) of an erasure-coded pool
|
||||
is `(k+m) / k`. For a 4,2 profile, the overhead is
|
||||
thus 1.5, which means that 1.5 GiB of underlying storage are used to store
|
||||
1 GiB of user data. Contrast with default three-way replication, with
|
||||
which the overhead factor is 3.0. Do not mistake erasure coding for a free
|
||||
lunch: there is a significant performance tradeoff, especially when using HDDs
|
||||
and when performing cluster recovery or backfill.
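
The arithmetic can be reproduced for any profile; for example (a trivial
sketch using ``awk``):

.. code-block:: bash

   # Overhead factor (k+m)/k for a k=4, m=2 profile.
   awk 'BEGIN { k = 4; m = 2; printf "overhead = %.2f\n", (k + m) / k }'
   # overhead = 1.50  -> 1.5 GiB of raw capacity per 1 GiB of user data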
|
||||
|
||||
Below is a table showing the overhead factors for various values of `k` and `m`.
|
||||
As `m` increases above 2, the incremental capacity overhead gain quickly
|
||||
experiences diminishing returns but the performance impact grows proportionally.
|
||||
We recommend that you do not choose a profile with `k` > 4 or `m` > 2 until
|
||||
and unless you fully understand the ramifications, including the number of
|
||||
failure domains your cluster topology must contain. If you choose `m=1`,
|
||||
expect data unavailability during maintenance and data loss if component
|
||||
failures overlap.
|
||||
|
||||
.. list-table:: Erasure coding overhead
|
||||
:widths: 4 4 4 4 4 4 4 4 4 4 4 4
|
||||
:header-rows: 1
|
||||
:stub-columns: 1
|
||||
|
||||
* -
|
||||
- m=1
|
||||
- m=2
|
||||
- m=3
|
||||
- m=4
|
||||
- m=5
|
||||
- m=6
|
||||
- m=7
|
||||
- m=8
|
||||
- m=9
|
||||
- m=10
|
||||
- m=11
|
||||
* - k=1
|
||||
- 2.00
|
||||
- 3.00
|
||||
- 4.00
|
||||
- 5.00
|
||||
- 6.00
|
||||
- 7.00
|
||||
- 8.00
|
||||
- 9.00
|
||||
- 10.00
|
||||
- 11.00
|
||||
- 12.00
|
||||
* - k=2
|
||||
- 1.50
|
||||
- 2.00
|
||||
- 2.50
|
||||
- 3.00
|
||||
- 3.50
|
||||
- 4.00
|
||||
- 4.50
|
||||
- 5.00
|
||||
- 5.50
|
||||
- 6.00
|
||||
- 6.50
|
||||
* - k=3
|
||||
- 1.33
|
||||
- 1.67
|
||||
- 2.00
|
||||
- 2.33
|
||||
- 2.67
|
||||
- 3.00
|
||||
- 3.33
|
||||
- 3.67
|
||||
- 4.00
|
||||
- 4.33
|
||||
- 4.67
|
||||
* - k=4
|
||||
- 1.25
|
||||
- 1.50
|
||||
- 1.75
|
||||
- 2.00
|
||||
- 2.25
|
||||
- 2.50
|
||||
- 2.75
|
||||
- 3.00
|
||||
- 3.25
|
||||
- 3.50
|
||||
- 3.75
|
||||
* - k=5
|
||||
- 1.20
|
||||
- 1.40
|
||||
- 1.60
|
||||
- 1.80
|
||||
- 2.00
|
||||
- 2.20
|
||||
- 2.40
|
||||
- 2.60
|
||||
- 2.80
|
||||
- 3.00
|
||||
- 3.20
|
||||
* - k=6
|
||||
- 1.16
|
||||
- 1.33
|
||||
- 1.50
|
||||
- 1.66
|
||||
- 1.83
|
||||
- 2.00
|
||||
- 2.17
|
||||
- 2.33
|
||||
- 2.50
|
||||
- 2.66
|
||||
- 2.83
|
||||
* - k=7
|
||||
- 1.14
|
||||
- 1.29
|
||||
- 1.43
|
||||
- 1.58
|
||||
- 1.71
|
||||
- 1.86
|
||||
- 2.00
|
||||
- 2.14
|
||||
- 2.29
|
||||
- 2.43
|
||||
- 2.58
|
||||
* - k=8
|
||||
- 1.13
|
||||
- 1.25
|
||||
- 1.38
|
||||
- 1.50
|
||||
- 1.63
|
||||
- 1.75
|
||||
- 1.88
|
||||
- 2.00
|
||||
- 2.13
|
||||
- 2.25
|
||||
- 2.38
|
||||
* - k=9
|
||||
- 1.11
|
||||
- 1.22
|
||||
- 1.33
|
||||
- 1.44
|
||||
- 1.56
|
||||
- 1.67
|
||||
- 1.78
|
||||
- 1.88
|
||||
- 2.00
|
||||
- 2.11
|
||||
- 2.22
|
||||
* - k=10
|
||||
- 1.10
|
||||
- 1.20
|
||||
- 1.30
|
||||
- 1.40
|
||||
- 1.50
|
||||
- 1.60
|
||||
- 1.70
|
||||
- 1.80
|
||||
- 1.90
|
||||
- 2.00
|
||||
- 2.10
|
||||
* - k=11
|
||||
- 1.09
|
||||
- 1.18
|
||||
- 1.27
|
||||
- 1.36
|
||||
- 1.45
|
||||
- 1.54
|
||||
- 1.63
|
||||
- 1.72
|
||||
- 1.82
|
||||
- 1.91
|
||||
- 2.00
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
Erasure-coded pools and cache tiering
|
||||
-------------------------------------
|
||||
|
@ -21,6 +21,7 @@ and, monitoring an operating cluster.
|
||||
monitoring-osd-pg
|
||||
user-management
|
||||
pg-repair
|
||||
pgcalc/index
|
||||
|
||||
.. raw:: html
|
||||
|
||||
|
@ -517,6 +517,8 @@ multiple monitors are running to ensure proper functioning of your Ceph
|
||||
cluster. Check monitor status regularly in order to ensure that all of the
|
||||
monitors are running.
|
||||
|
||||
.. _display-mon-map:
|
||||
|
||||
To display the monitor map, run the following command:
|
||||
|
||||
.. prompt:: bash $
|
||||
|
68
ceph/doc/rados/operations/pgcalc/index.rst
Normal file
68
ceph/doc/rados/operations/pgcalc/index.rst
Normal file
@ -0,0 +1,68 @@
|
||||
.. _pgcalc:
|
||||
|
||||
|
||||
=======
|
||||
PG Calc
|
||||
=======
|
||||
|
||||
|
||||
.. raw:: html
|
||||
|
||||
|
||||
<link rel="stylesheet" id="wp-job-manager-job-listings-css" href="https://web.archive.org/web/20230614135557cs_/https://old.ceph.com/wp-content/plugins/wp-job-manager/assets/dist/css/job-listings.css" type="text/css" media="all"/>
|
||||
<link rel="stylesheet" id="ceph/googlefont-css" href="https://web.archive.org/web/20230614135557cs_/https://fonts.googleapis.com/css?family=Raleway%3A300%2C400%2C700&ver=5.7.2" type="text/css" media="all"/>
|
||||
<link rel="stylesheet" id="Stylesheet-css" href="https://web.archive.org/web/20230614135557cs_/https://old.ceph.com/wp-content/themes/cephTheme/Resources/Styles/style.min.css" type="text/css" media="all"/>
|
||||
<link rel="stylesheet" id="tablepress-default-css" href="https://web.archive.org/web/20230614135557cs_/https://old.ceph.com/wp-content/plugins/tablepress/css/default.min.css" type="text/css" media="all"/>
|
||||
<link rel="stylesheet" id="jetpack_css-css" href="https://web.archive.org/web/20230614135557cs_/https://old.ceph.com/wp-content/plugins/jetpack/css/jetpack.css" type="text/css" media="all"/>
|
||||
<script type="text/javascript" src="https://web.archive.org/web/20230614135557js_/https://old.ceph.com/wp-content/themes/cephTheme/foundation_framework/js/vendor/jquery.js" id="jquery-js"></script>
|
||||
|
||||
<link rel="stylesheet" href="https://web.archive.org/web/20230614135557cs_/https://ajax.googleapis.com/ajax/libs/jqueryui/1.11.2/themes/smoothness/jquery-ui.css"/>
|
||||
<link rel="stylesheet" href="https://web.archive.org/web/20230614135557cs_/https://old.ceph.com/pgcalc_assets/pgcalc.css"/>
|
||||
<script src="https://ajax.googleapis.com/ajax/libs/jqueryui/1.11.2/jquery-ui.min.js"></script>
|
||||
|
||||
<script src="../../../_static/js/pgcalc.js"></script>
|
||||
<div id="pgcalcdiv">
|
||||
<div id="instructions">
|
||||
<h2>Ceph PGs per Pool Calculator</h2><br/><fieldset><legend>Instructions</legend>
|
||||
<ol>
|
||||
<li>Confirm your understanding of the fields by reading through the Key below.</li>
|
||||
<li>Select a <b>"Ceph Use Case"</b> from the drop down menu.</li>
|
||||
<li>Adjust the values in the <span class="inputColor addBorder" style="font-weight: bold;">"Green"</span> shaded fields below.<br/>
|
||||
<b>Tip:</b> Headers can be clicked to change the value throughout the table.</li>
|
||||
<li>You will see the Suggested PG Count update based on your inputs.</li>
|
||||
<li>Click the <b>"Add Pool"</b> button to create a new line for a new pool.</li>
|
||||
<li>Click the <span class="ui-icon ui-icon-trash" style="display:inline-block;"></span> icon to delete the specific Pool.</li>
|
||||
<li>For more details on the logic used and some important details, see the area below the table.</li>
|
||||
<li>Once all values have been adjusted, click the <b>"Generate Commands"</b> button to get the pool creation commands.</li>
|
||||
</ol></fieldset>
|
||||
</div>
|
||||
<div id="beforeTable"></div>
|
||||
<br/>
|
||||
<p class="validateTips"> </p>
|
||||
<label for="presetType">Ceph Use Case Selector:</label><br/><select id="presetType"></select><button style="margin-left: 200px;" id="btnAddPool" type="button">Add Pool</button><button type="button" id="btnGenCommands" download="commands.txt">Generate Commands</button>
|
||||
<div id="pgsPerPoolTable">
|
||||
<table id="pgsperpool">
|
||||
</table>
|
||||
</div> <!-- id = pgsPerPoolTable -->
|
||||
<br/>
|
||||
<div id="afterTable"></div>
|
||||
<div id="countLogic"><fieldset><legend>Logic behind Suggested PG Count</legend>
|
||||
<br/>
|
||||
<div class="upperFormula">( Target PGs per OSD ) x ( OSD # ) x ( %Data )</div>
|
||||
<div class="lowerFormula">( Size )</div>
|
||||
<ol id="countLogicList">
|
||||
<li>If the value of the above calculation is less than the value of <b>( OSD# ) / ( Size )</b>, then the value is updated to the value of <b>( OSD# ) / ( Size )</b>. This is to ensure even load / data distribution by allocating at least one Primary or Secondary PG to every OSD for every Pool.</li>
|
||||
<li>The output value is then rounded to the <b>nearest power of 2</b>.<br/><b>Tip:</b> The nearest power of 2 provides a marginal improvement in efficiency of the <a href="https://web.archive.org/web/20230614135557/http://ceph.com/docs/master/rados/operations/crush-map/" title="CRUSH Map Details">CRUSH</a> algorithm.</li>
|
||||
<li>If the nearest power of 2 is more than <b>25%</b> below the original value, the next higher power of 2 is used.</li>
|
||||
</ol>
|
||||
<b>Objective</b>
|
||||
<ul><li>The objective of this calculation and the target ranges noted in the "Key" section above are to ensure that there are sufficient Placement Groups for even data distribution throughout the cluster, while not going high enough on the PG per OSD ratio to cause problems during Recovery and/or Backfill operations.</li></ul>
|
||||
<b>Effects of empty or non-active pools:</b>
|
||||
<ul>
|
||||
<li>Empty or otherwise non-active pools should not be considered helpful toward even data distribution throughout the cluster.</li>
|
||||
<li>However, the PGs associated with these empty / non-active pools still consume memory and CPU overhead.</li>
|
||||
</ul>
|
||||
</fieldset>
|
||||
</div>
|
||||
<div id="commands" title="Pool Creation Commands"><code><pre id="commandCode"></pre></code></div>
|
||||
</div>
|
@ -4,6 +4,21 @@
|
||||
Placement Groups
|
||||
==================
|
||||
|
||||
Placement groups (PGs) are subsets of each logical Ceph pool. Placement groups
|
||||
perform the function of placing objects (as a group) into OSDs. Ceph manages
|
||||
data internally at placement-group granularity: this scales better than would
|
||||
managing individual RADOS objects. A cluster that has a larger number of
|
||||
placement groups (for example, 150 per OSD) is better balanced than an
|
||||
otherwise identical cluster with a smaller number of placement groups.
|
||||
|
||||
Ceph’s internal RADOS objects are each mapped to a specific placement group,
|
||||
and each placement group belongs to exactly one Ceph pool.
|
||||
|
||||
See Sage Weil's blog post `New in Nautilus: PG merging and autotuning
|
||||
<https://ceph.io/en/news/blog/2019/new-in-nautilus-pg-merging-and-autotuning/>`_
|
||||
for more information about the relationship of placement groups to pools and to
|
||||
objects.
|
||||
|
||||
.. _pg-autoscaler:
|
||||
|
||||
Autoscaling placement groups
|
||||
@ -131,11 +146,11 @@ The output will resemble the following::
|
||||
if a ``pg_num`` change is in progress, the current number of PGs that the
|
||||
pool is working towards.
|
||||
|
||||
- **NEW PG_NUM** (if present) is the value that the system is recommending the
|
||||
``pg_num`` of the pool to be changed to. It is always a power of 2, and it is
|
||||
present only if the recommended value varies from the current value by more
|
||||
than the default factor of ``3``. To adjust this factor (in the following
|
||||
example, it is changed to ``2``), run the following command:
|
||||
- **NEW PG_NUM** (if present) is the value that the system recommends that the
|
||||
``pg_num`` of the pool should be. It is always a power of two, and it
|
||||
is present only if the recommended value varies from the current value by
|
||||
more than the default factor of ``3``. To adjust this multiple (in the
|
||||
following example, it is changed to ``2``), run the following command:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
@ -168,7 +183,6 @@ The output will resemble the following::
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph osd pool set .mgr crush_rule replicated-ssd
|
||||
ceph osd pool set pool 1 crush_rule to replicated-ssd
|
||||
|
||||
This intervention will result in a small amount of backfill, but
|
||||
typically this traffic completes quickly.
|
||||
@ -626,15 +640,14 @@ pools, each with 512 PGs on 10 OSDs, the OSDs will have to handle ~50,000 PGs
|
||||
each. This cluster will require significantly more resources and significantly
|
||||
more time for peering.
|
||||
|
||||
For determining the optimal number of PGs per OSD, we recommend the `PGCalc`_
|
||||
tool.
|
||||
|
||||
|
||||
.. _setting the number of placement groups:
|
||||
|
||||
Setting the Number of PGs
|
||||
=========================
|
||||
|
||||
:ref:`Placement Group Link <pgcalc>`
|
||||
|
||||
Setting the initial number of PGs in a pool must be done at the time you create
|
||||
the pool. See `Create a Pool`_ for details.
|
||||
|
||||
@ -894,4 +907,3 @@ about it entirely (if it is too new to have a previous version). To mark the
|
||||
|
||||
.. _Create a Pool: ../pools#createpool
|
||||
.. _Mapping PGs to OSDs: ../../../architecture#mapping-pgs-to-osds
|
||||
.. _pgcalc: https://old.ceph.com/pgcalc/
|
||||
|
@ -18,15 +18,17 @@ Pools provide:
|
||||
<../erasure-code>`_, resilience is defined as the number of coding chunks
|
||||
(for example, ``m = 2`` in the default **erasure code profile**).
|
||||
|
||||
- **Placement Groups**: You can set the number of placement groups (PGs) for
|
||||
the pool. In a typical configuration, the target number of PGs is
|
||||
approximately one hundred PGs per OSD. This provides reasonable balancing
|
||||
without consuming excessive computing resources. When setting up multiple
|
||||
pools, be careful to set an appropriate number of PGs for each pool and for
|
||||
the cluster as a whole. Each PG belongs to a specific pool: when multiple
|
||||
pools use the same OSDs, make sure that the **sum** of PG replicas per OSD is
|
||||
in the desired PG-per-OSD target range. To calculate an appropriate number of
|
||||
PGs for your pools, use the `pgcalc`_ tool.
|
||||
- **Placement Groups**: The :ref:`autoscaler <pg-autoscaler>` sets the number
|
||||
of placement groups (PGs) for the pool. In a typical configuration, the
|
||||
target number of PGs is approximately one hundred and fifty PGs per OSD. This
|
||||
provides reasonable balancing without consuming excessive computing
|
||||
resources. When setting up multiple pools, set an appropriate number of PGs
|
||||
for each pool and for the cluster as a whole. Each PG belongs to a specific
|
||||
pool: when multiple pools use the same OSDs, make sure that the **sum** of PG
|
||||
replicas per OSD is in the desired PG-per-OSD target range. See :ref:`Setting
|
||||
the Number of Placement Groups <setting the number of placement groups>` for
|
||||
instructions on how to manually set the number of placement groups per pool
|
||||
(this procedure works only when the autoscaler is not used).
|
||||
|
||||
- **CRUSH Rules**: When data is stored in a pool, the placement of the object
|
||||
and its replicas (or chunks, in the case of erasure-coded pools) in your
|
||||
@ -94,19 +96,12 @@ To get even more information, you can execute this command with the ``--format``
|
||||
Creating a Pool
|
||||
===============
|
||||
|
||||
Before creating a pool, consult `Pool, PG and CRUSH Config Reference`_. Your
|
||||
Ceph configuration file contains a setting (namely, ``pg_num``) that determines
|
||||
the number of PGs. However, this setting's default value is NOT appropriate
|
||||
for most systems. In most cases, you should override this default value when
|
||||
creating your pool. For details on PG numbers, see `setting the number of
|
||||
placement groups`_
|
||||
|
||||
For example:
|
||||
|
||||
.. prompt:: bash $
|
||||
|
||||
osd_pool_default_pg_num = 128
|
||||
osd_pool_default_pgp_num = 128
|
||||
Before creating a pool, consult `Pool, PG and CRUSH Config Reference`_. The
|
||||
Ceph central configuration database in the monitor cluster contains a setting
|
||||
(namely, ``pg_num``) that determines the number of PGs per pool when a pool has
|
||||
been created and no per-pool value has been specified. It is possible to change
|
||||
this value from its default. For more on the subject of setting the number of
|
||||
PGs per pool, see `setting the number of placement groups`_.
|
||||
|
||||
.. note:: In Luminous and later releases, each pool must be associated with the
|
||||
application that will be using the pool. For more information, see
|
||||
@ -742,8 +737,6 @@ Managing pools that are flagged with ``--bulk``
|
||||
===============================================
|
||||
See :ref:`managing_bulk_flagged_pools`.
|
||||
|
||||
|
||||
.. _pgcalc: https://old.ceph.com/pgcalc/
|
||||
.. _Pool, PG and CRUSH Config Reference: ../../configuration/pool-pg-config-ref
|
||||
.. _Bloom Filter: https://en.wikipedia.org/wiki/Bloom_filter
|
||||
.. _setting the number of placement groups: ../placement-groups#set-the-number-of-placement-groups
|
||||
|
@ -121,8 +121,6 @@ your CRUSH map. This procedure shows how to do this.
|
||||
|
||||
rule stretch_rule {
|
||||
id 1
|
||||
min_size 1
|
||||
max_size 10
|
||||
type replicated
|
||||
step take site1
|
||||
step chooseleaf firstn 2 type host
|
||||
@ -141,11 +139,15 @@ your CRUSH map. This procedure shows how to do this.
|
||||
|
||||
#. Run the monitors in connectivity mode. See `Changing Monitor Elections`_.
|
||||
|
||||
.. prompt:: bash $
|
||||
|
||||
ceph mon set election_strategy connectivity
|
||||
|
||||
#. Command the cluster to enter stretch mode. In this example, ``mon.e`` is the
|
||||
tiebreaker monitor and we are splitting across data centers. The tiebreaker
|
||||
monitor must be assigned a data center that is neither ``site1`` nor
|
||||
``site2``. For this purpose you can create another data-center bucket named
|
||||
``site3`` in your CRUSH and place ``mon.e`` there:
|
||||
``site2``. This data center **should not** be defined in your CRUSH map; here
|
||||
we are placing ``mon.e`` in a virtual data center called ``site3``:
|
||||
|
||||
.. prompt:: bash $
|
||||
|
||||
|
@ -175,17 +175,19 @@ For each subsystem, there is a logging level for its output logs (a so-called
|
||||
"log level") and a logging level for its in-memory logs (a so-called "memory
|
||||
level"). Different values may be set for these two logging levels in each
|
||||
subsystem. Ceph's logging levels operate on a scale of ``1`` to ``20``, where
|
||||
``1`` is terse and ``20`` is verbose [#f1]_. As a general rule, the in-memory
|
||||
logs are not sent to the output log unless one or more of the following
|
||||
conditions obtain:
|
||||
``1`` is terse and ``20`` is verbose. In certain rare cases, there are logging
|
||||
levels that can take a value greater than 20. The resulting logs are extremely
|
||||
verbose.
|
||||
|
||||
- a fatal signal is raised or
|
||||
- an ``assert`` in source code is triggered or
|
||||
- upon requested. Please consult `document on admin socket
|
||||
<http://docs.ceph.com/en/latest/man/8/ceph/#daemon>`_ for more details.
|
||||
The in-memory logs are not sent to the output log unless one or more of the
|
||||
following conditions are true:
|
||||
|
||||
.. warning ::
|
||||
.. [#f1] In certain rare cases, there are logging levels that can take a value greater than 20. The resulting logs are extremely verbose.
|
||||
- a fatal signal has been raised or
|
||||
- an assertion within Ceph code has been triggered or
|
||||
- the sending of in-memory logs to the output log has been manually triggered.
|
||||
Consult `the portion of the "Ceph Administration Tool documentation
|
||||
that provides an example of how to submit admin socket commands
|
||||
<http://docs.ceph.com/en/latest/man/8/ceph/#daemon>`_ for more detail.
|
||||
|
||||
Log levels and memory levels can be set either together or separately. If a
|
||||
subsystem is assigned a single value, then that value determines both the log
|
||||
|
@ -85,23 +85,27 @@ Using the monitor's admin socket
|
||||
================================
|
||||
|
||||
A monitor's admin socket allows you to interact directly with a specific daemon
|
||||
by using a Unix socket file. This file is found in the monitor's ``run``
|
||||
directory. The admin socket's default directory is
|
||||
``/var/run/ceph/ceph-mon.ID.asok``, but this can be overridden and the admin
|
||||
socket might be elsewhere, especially if your cluster's daemons are deployed in
|
||||
containers. If you cannot find it, either check your ``ceph.conf`` for an
|
||||
alternative path or run the following command:
|
||||
by using a Unix socket file. This socket file is found in the monitor's ``run``
|
||||
directory.
|
||||
|
||||
The admin socket's default directory is ``/var/run/ceph/ceph-mon.ID.asok``. It
|
||||
is possible to override the admin socket's default location. If the default
|
||||
location has been overridden, then the admin socket will be elsewhere. This is
|
||||
often the case when a cluster's daemons are deployed in containers.
|
||||
|
||||
To find the directory of the admin socket, check either your ``ceph.conf`` for
|
||||
an alternative path or run the following command:
|
||||
|
||||
.. prompt:: bash $
|
||||
|
||||
ceph-conf --name mon.ID --show-config-value admin_socket
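
Once the socket path is known, the daemon can be queried through it directly.
For example (a sketch; the path shown is the default location and ``ID`` is a
placeholder for the monitor's identifier):

.. prompt:: bash $

   ceph --admin-daemon /var/run/ceph/ceph-mon.ID.asok mon_status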
|
||||
|
||||
The admin socket is available for use only when the monitor daemon is running.
|
||||
Whenever the monitor has been properly shut down, the admin socket is removed.
|
||||
However, if the monitor is not running and the admin socket persists, it is
|
||||
likely that the monitor has been improperly shut down. In any case, if the
|
||||
monitor is not running, it will be impossible to use the admin socket, and the
|
||||
``ceph`` command is likely to return ``Error 111: Connection Refused``.
|
||||
The admin socket is available for use only when the Monitor daemon is running.
|
||||
Every time the Monitor is properly shut down, the admin socket is removed. If
|
||||
the Monitor is not running and yet the admin socket persists, it is likely that
|
||||
the Monitor has been improperly shut down. If the Monitor is not running, it
|
||||
will be impossible to use the admin socket, and the ``ceph`` command is likely
|
||||
to return ``Error 111: Connection Refused``.
|
||||
|
||||
To access the admin socket, run a ``ceph tell`` command of the following form
|
||||
(specifying the daemon that you are interested in):
|
||||
@ -110,7 +114,7 @@ To access the admin socket, run a ``ceph tell`` command of the following form
|
||||
|
||||
ceph tell mon.<id> mon_status
|
||||
|
||||
This command passes a ``help`` command to the specific running monitor daemon
|
||||
This command passes a ``mon_status`` command to the specified running Monitor daemon
|
||||
``<id>`` via its admin socket. If you know the full path to the admin socket
|
||||
file, this can be done more directly by running the following command:
|
||||
|
||||
@ -127,10 +131,11 @@ and ``quorum_status``.
|
||||
Understanding mon_status
|
||||
========================
|
||||
|
||||
The status of the monitor (as reported by the ``ceph tell mon.X mon_status``
|
||||
command) can always be obtained via the admin socket. This command outputs a
|
||||
great deal of information about the monitor (including the information found in
|
||||
the output of the ``quorum_status`` command).
|
||||
The status of a Monitor (as reported by the ``ceph tell mon.X mon_status``
|
||||
command) can be obtained via the admin socket. The ``ceph tell mon.X
|
||||
mon_status`` command outputs a great deal of information about the monitor
|
||||
(including the information found in the output of the ``quorum_status``
|
||||
command).
|
||||
|
||||
To understand this command's output, let us consider the following example, in
|
||||
which we see the output of ``ceph tell mon.c mon_status``::
|
||||
@ -160,29 +165,34 @@ which we see the output of ``ceph tell mon.c mon_status``::
|
||||
"name": "c",
|
||||
"addr": "127.0.0.1:6795\/0"}]}}
|
||||
|
||||
It is clear that there are three monitors in the monmap (*a*, *b*, and *c*),
|
||||
the quorum is formed by only two monitors, and *c* is in the quorum as a
|
||||
*peon*.
|
||||
This output reports that there are three monitors in the monmap (*a*, *b*, and
|
||||
*c*), that quorum is formed by only two monitors, and that *c* is in quorum as
|
||||
a *peon*.
|
||||
|
||||
**Which monitor is out of the quorum?**
|
||||
**Which monitor is out of quorum?**
|
||||
|
||||
The answer is **a** (that is, ``mon.a``).
|
||||
The answer is **a** (that is, ``mon.a``). ``mon.a`` is out of quorum.
|
||||
|
||||
**Why?**
|
||||
**How do we know, in this example, that mon.a is out of quorum?**
|
||||
|
||||
When the ``quorum`` set is examined, there are clearly two monitors in the
|
||||
set: *1* and *2*. But these are not monitor names. They are monitor ranks, as
|
||||
established in the current ``monmap``. The ``quorum`` set does not include
|
||||
the monitor that has rank 0, and according to the ``monmap`` that monitor is
|
||||
``mon.a``.
|
||||
We know that ``mon.a`` is out of quorum because the ``quorum`` set reported
|
||||
above lists only ranks *1* and *2*, and rank 0 corresponds to ``mon.a``.
|
||||
|
||||
If we examine the ``quorum`` set, we can see that there are clearly two
|
||||
monitors in the set: *1* and *2*. But these are not monitor names. They are
|
||||
monitor ranks, as established in the current ``monmap``. The ``quorum`` set
|
||||
does not include the monitor that has rank 0, and according to the ``monmap``
|
||||
that monitor is ``mon.a``.
|
||||
|
||||
**How are monitor ranks determined?**
|
||||
|
||||
Monitor ranks are calculated (or recalculated) whenever monitors are added or
|
||||
removed. The calculation of ranks follows a simple rule: the **greater** the
|
||||
``IP:PORT`` combination, the **lower** the rank. In this case, because
|
||||
``127.0.0.1:6789`` is lower than the other two ``IP:PORT`` combinations,
|
||||
``mon.a`` has the highest rank: namely, rank 0.
|
||||
Monitor ranks are calculated (or recalculated) whenever monitors are added to
|
||||
or removed from the cluster. The calculation of ranks follows a simple rule:
|
||||
the **greater** the ``IP:PORT`` combination, the **lower** the rank. In this
|
||||
case, because ``127.0.0.1:6789`` (``mon.a``) is numerically less than the
|
||||
other two ``IP:PORT`` combinations (which are ``127.0.0.1:6790`` for "Monitor
|
||||
b" and ``127.0.0.1:6795`` for "Monitor c"), ``mon.a`` has the highest rank:
|
||||
namely, rank 0.
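
To see the ranks that the cluster has currently assigned (a quick check, not
part of the original text), print the current ``monmap``; each monitor line in
the output shows a rank, an address, and a monitor name:

.. prompt:: bash $

   ceph mon dump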
|
||||
|
||||
|
||||
Most Common Monitor Issues
|
||||
@ -250,14 +260,15 @@ detail`` returns a message similar to the following::
|
||||
Monitors at a wrong address. ``mon_status`` outputs the ``monmap`` that is
|
||||
known to the monitor: determine whether the other Monitors' locations as
|
||||
specified in the ``monmap`` match the locations of the Monitors in the
|
||||
network. If they do not, see `Recovering a Monitor's Broken monmap`_.
|
||||
If the locations of the Monitors as specified in the ``monmap`` match the
|
||||
locations of the Monitors in the network, then the persistent
|
||||
``probing`` state could be related to severe clock skews amongst the monitor
|
||||
nodes. See `Clock Skews`_. If the information in `Clock Skews`_ does not
|
||||
bring the Monitor out of the ``probing`` state, then prepare your system logs
|
||||
and ask the Ceph community for help. See `Preparing your logs`_ for
|
||||
information about the proper preparation of logs.
|
||||
network. If they do not, see :ref:`Recovering a Monitor's Broken monmap
|
||||
<rados_troubleshooting_troubleshooting_mon_recovering_broken_monmap>`. If
|
||||
the locations of the Monitors as specified in the ``monmap`` match the
|
||||
locations of the Monitors in the network, then the persistent ``probing``
|
||||
state could be related to severe clock skews among the monitor nodes. See
|
||||
`Clock Skews`_. If the information in `Clock Skews`_ does not bring the
|
||||
Monitor out of the ``probing`` state, then prepare your system logs and ask
|
||||
the Ceph community for help. See `Preparing your logs`_ for information about
|
||||
the proper preparation of logs.
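
As a quick first check for the clock-skew case (a sketch, not part of the
original text), the monitors' view of time synchronization and any related
health warnings can be inspected with:

.. prompt:: bash $

   ceph time-sync-status
   ceph health detail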
|
||||
|
||||
|
||||
**What does it mean when a Monitor's state is ``electing``?**
|
||||
@ -314,13 +325,16 @@ detail`` returns a message similar to the following::
|
||||
substantiate it. See `Preparing your logs`_ for information about the
|
||||
proper preparation of logs.
|
||||
|
||||
.. _rados_troubleshooting_troubleshooting_mon_recovering_broken_monmap:
|
||||
|
||||
Recovering a Monitor's Broken ``monmap``
|
||||
----------------------------------------
|
||||
Recovering a Monitor's Broken "monmap"
|
||||
--------------------------------------
|
||||
|
||||
This is how a ``monmap`` usually looks, depending on the number of
|
||||
monitors::
|
||||
A monmap can be retrieved by using a command of the form ``ceph tell mon.c
|
||||
mon_status``, as described in :ref:`Understanding mon_status
|
||||
<rados_troubleshoting_troubleshooting_mon_understanding_mon_status>`.
|
||||
|
||||
Here is an example of a ``monmap``::
|
||||
|
||||
epoch 3
|
||||
fsid 5c4e9d53-e2e1-478a-8061-f543f8be4cf8
|
||||
@ -329,61 +343,64 @@ monitors::
|
||||
0: 127.0.0.1:6789/0 mon.a
|
||||
1: 127.0.0.1:6790/0 mon.b
|
||||
2: 127.0.0.1:6795/0 mon.c
|
||||
|
||||
This may not be what you have however. For instance, in some versions of
|
||||
early Cuttlefish there was a bug that could cause your ``monmap``
|
||||
to be nullified. Completely filled with zeros. This means that not even
|
||||
``monmaptool`` would be able to make sense of cold, hard, inscrutable zeros.
|
||||
It's also possible to end up with a monitor with a severely outdated monmap,
|
||||
notably if the node has been down for months while you fight with your vendor's
|
||||
TAC. The subject ``ceph-mon`` daemon might be unable to find the surviving
|
||||
monitors (e.g., say ``mon.c`` is down; you add a new monitor ``mon.d``,
|
||||
then remove ``mon.a``, then add a new monitor ``mon.e`` and remove
|
||||
``mon.b``; you will end up with a totally different monmap from the one
|
||||
``mon.c`` knows).
|
||||
|
||||
In this situation you have two possible solutions:
|
||||
This ``monmap`` is in working order, but your ``monmap`` might not be in
|
||||
working order. The ``monmap`` in a given node might be outdated because the
|
||||
node was down for a long time, during which the cluster's Monitors changed.
|
||||
|
||||
Scrap the monitor and redeploy
|
||||
There are two ways to update a Monitor's outdated ``monmap``:
|
||||
|
||||
You should only take this route if you are positive that you won't
|
||||
lose the information kept by that monitor; that you have other monitors
|
||||
and that they are running just fine so that your new monitor is able
|
||||
to synchronize from the remaining monitors. Keep in mind that destroying
|
||||
a monitor, if there are no other copies of its contents, may lead to
|
||||
loss of data.
|
||||
A. **Scrap the monitor and redeploy.**
|
||||
|
||||
Inject a monmap into the monitor
|
||||
Do this only if you are certain that you will not lose the information kept
|
||||
by the Monitor that you scrap. Make sure that you have other Monitors in
|
||||
good condition, so that the new Monitor will be able to synchronize with
|
||||
the surviving Monitors. Remember that destroying a Monitor can lead to data
|
||||
loss if there are no other copies of the Monitor's contents.
|
||||
|
||||
These are the basic steps:
|
||||
B. **Inject a monmap into the monitor.**
|
||||
|
||||
Retrieve the ``monmap`` from the surviving monitors and inject it into the
|
||||
monitor whose ``monmap`` is corrupted or lost.
|
||||
It is possible to fix a Monitor that has an outdated ``monmap`` by
|
||||
retrieving an up-to-date ``monmap`` from surviving Monitors in the cluster
|
||||
and injecting it into the Monitor that has a corrupted or missing
|
||||
``monmap``.
|
||||
|
||||
Implement this solution by carrying out the following procedure:
|
||||
Implement this solution by carrying out the following procedure:
|
||||
|
||||
1. Is there a quorum of monitors? If so, retrieve the ``monmap`` from the
|
||||
quorum::
|
||||
#. Retrieve the ``monmap`` in one of the two following ways:
|
||||
|
||||
$ ceph mon getmap -o /tmp/monmap
|
||||
a. **IF THERE IS A QUORUM OF MONITORS:**
|
||||
|
||||
Retrieve the ``monmap`` from the quorum:
|
||||
|
||||
2. If there is no quorum, then retrieve the ``monmap`` directly from another
|
||||
monitor that has been stopped (in this example, the other monitor has
|
||||
the ID ``ID-FOO``)::
|
||||
.. prompt:: bash
|
||||
|
||||
$ ceph-mon -i ID-FOO --extract-monmap /tmp/monmap
|
||||
ceph mon getmap -o /tmp/monmap
|
||||
|
||||
3. Stop the monitor you are going to inject the monmap into.
|
||||
b. **IF THERE IS NO QUORUM OF MONITORS:**
|
||||
|
||||
Retrieve the ``monmap`` directly from a Monitor that has been stopped:
|
||||
|
||||
|
||||
4. Inject the monmap::
|
||||
.. prompt:: bash
|
||||
|
||||
$ ceph-mon -i ID --inject-monmap /tmp/monmap
|
||||
ceph-mon -i ID-FOO --extract-monmap /tmp/monmap
|
||||
|
||||
5. Start the monitor
|
||||
In this example, the ID of the stopped Monitor is ``ID-FOO``.
|
||||
|
||||
.. warning:: Injecting ``monmaps`` can cause serious problems because doing
|
||||
so will overwrite the latest existing ``monmap`` stored on the monitor. Be
|
||||
careful!
|
||||
#. Stop the Monitor into which the ``monmap`` will be injected.
|
||||
|
||||
#. Inject the monmap into the stopped Monitor:
|
||||
|
||||
.. prompt:: bash
|
||||
|
||||
ceph-mon -i ID --inject-monmap /tmp/monmap
|
||||
|
||||
#. Start the Monitor.
|
||||
|
||||
.. warning:: Injecting a ``monmap`` into a Monitor can cause serious
|
||||
problems. Injecting a ``monmap`` overwrites the latest existing
|
||||
``monmap`` stored on the monitor. Be careful!
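
Before injecting a ``monmap``, it can be worth confirming that the retrieved
file contains what you expect (a suggested sanity check, not part of the
original procedure):

.. prompt:: bash

   monmaptool --print /tmp/monmap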
|
||||
|
||||
Clock Skews
|
||||
-----------
|
||||
@ -464,12 +481,13 @@ Clock Skew Questions and Answers
|
||||
Client Can't Connect or Mount
|
||||
-----------------------------
|
||||
|
||||
Check your IP tables. Some operating-system install utilities add a ``REJECT``
|
||||
rule to ``iptables``. ``iptables`` rules will reject all clients other than
|
||||
``ssh`` that try to connect to the host. If your monitor host's IP tables have
|
||||
a ``REJECT`` rule in place, clients that are connecting from a separate node
|
||||
will fail and will raise a timeout error. Any ``iptables`` rules that reject
|
||||
clients trying to connect to Ceph daemons must be addressed. For example::
|
||||
If a client can't connect to the cluster or mount, check your iptables. Some
|
||||
operating-system install utilities add a ``REJECT`` rule to ``iptables``.
|
||||
``iptables`` rules will reject all clients other than ``ssh`` that try to
|
||||
connect to the host. If your monitor host's iptables have a ``REJECT`` rule in
|
||||
place, clients that connect from a separate node will fail, and this will raise
|
||||
a timeout error. Look for ``iptables`` rules that reject clients that are
|
||||
trying to connect to Ceph daemons. For example::
|
||||
|
||||
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited
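
One way to address such a rule (an illustrative sketch only; adapt the port
list to your environment) is to insert an ``ACCEPT`` rule for the ports used by
Ceph daemons ahead of the ``REJECT`` rule::

    iptables -I INPUT -p tcp -m multiport --dports 3300,6789,6800:7300 -j ACCEPT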
|
||||
|
||||
@ -487,9 +505,9 @@ Monitor Store Failures
|
||||
Symptoms of store corruption
|
||||
----------------------------
|
||||
|
||||
Ceph monitors store the :term:`Cluster Map` in a key-value store. If key-value
|
||||
store corruption causes a monitor to fail, then the monitor log might contain
|
||||
one of the following error messages::
|
||||
Ceph Monitors maintain the :term:`Cluster Map` in a key-value store. If
|
||||
key-value store corruption causes a Monitor to fail, then the Monitor log might
|
||||
contain one of the following error messages::
|
||||
|
||||
Corruption: error in middle of record
|
||||
|
||||
@ -500,10 +518,10 @@ or::
|
||||
Recovery using healthy monitor(s)
|
||||
---------------------------------
|
||||
|
||||
If there are surviving monitors, we can always :ref:`replace
|
||||
<adding-and-removing-monitors>` the corrupted monitor with a new one. After the
|
||||
new monitor boots, it will synchronize with a healthy peer. After the new
|
||||
monitor is fully synchronized, it will be able to serve clients.
|
||||
If the cluster contains surviving Monitors, the corrupted Monitor can be
|
||||
:ref:`replaced <adding-and-removing-monitors>` with a new Monitor. After the
|
||||
new Monitor boots, it will synchronize with a healthy peer. After the new
|
||||
Monitor is fully synchronized, it will be able to serve clients.
|
||||
|
||||
.. _mon-store-recovery-using-osds:
|
||||
|
||||
@ -511,15 +529,14 @@ Recovery using OSDs
|
||||
-------------------
|
||||
|
||||
Even if all monitors fail at the same time, it is possible to recover the
|
||||
monitor store by using information stored in OSDs. You are encouraged to deploy
|
||||
at least three (and preferably five) monitors in a Ceph cluster. In such a
|
||||
deployment, complete monitor failure is unlikely. However, unplanned power loss
|
||||
in a data center whose disk settings or filesystem settings are improperly
|
||||
configured could cause the underlying filesystem to fail and this could kill
|
||||
all of the monitors. In such a case, data in the OSDs can be used to recover
|
||||
the monitors. The following is such a script and can be used to recover the
|
||||
monitors:
|
||||
|
||||
Monitor store by using information that is stored in OSDs. You are encouraged
|
||||
to deploy at least three (and preferably five) Monitors in a Ceph cluster. In
|
||||
such a deployment, complete Monitor failure is unlikely. However, unplanned
|
||||
power loss in a data center whose disk settings or filesystem settings are
|
||||
improperly configured could cause the underlying filesystem to fail and this
|
||||
could kill all of the monitors. In such a case, data in the OSDs can be used to
|
||||
recover the Monitors. The following is a script that can be used in such a case
|
||||
to recover the Monitors:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
@ -572,10 +589,10 @@ monitors:
|
||||
|
||||
This script performs the following steps:
|
||||
|
||||
#. Collects the map from each OSD host.
|
||||
#. Rebuilds the store.
|
||||
#. Fills the entities in the keyring file with appropriate capabilities.
|
||||
#. Replaces the corrupted store on ``mon.foo`` with the recovered copy.
|
||||
#. Collect the map from each OSD host.
|
||||
#. Rebuild the store.
|
||||
#. Fill the entities in the keyring file with appropriate capabilities.
|
||||
#. Replace the corrupted store on ``mon.foo`` with the recovered copy.
|
||||
|
||||
|
||||
Known limitations
|
||||
@ -587,19 +604,18 @@ The above recovery tool is unable to recover the following information:
|
||||
auth add`` command are recovered from the OSD's copy, and the
|
||||
``client.admin`` keyring is imported using ``ceph-monstore-tool``. However,
|
||||
the MDS keyrings and all other keyrings will be missing in the recovered
|
||||
monitor store. You might need to manually re-add them.
|
||||
Monitor store. It might be necessary to manually re-add them.
|
||||
|
||||
- **Creating pools**: If any RADOS pools were in the process of being created,
|
||||
that state is lost. The recovery tool operates on the assumption that all
|
||||
pools have already been created. If there are PGs that are stuck in the
|
||||
'unknown' state after the recovery for a partially created pool, you can
|
||||
``unknown`` state after the recovery for a partially created pool, you can
|
||||
force creation of the *empty* PG by running the ``ceph osd force-create-pg``
|
||||
command. Note that this will create an *empty* PG, so take this action only
|
||||
if you know the pool is empty.
|
||||
command. This creates an *empty* PG, so take this action only if you are
|
||||
certain that the pool is empty.
|
||||
|
||||
- **MDS Maps**: The MDS maps are lost.
|
||||
|
||||
|
||||
Everything Failed! Now What?
|
||||
============================
|
||||
|
||||
@ -611,16 +627,20 @@ irc.oftc.net), or at ``dev@ceph.io`` and ``ceph-users@lists.ceph.com``. Make
|
||||
sure that you have prepared your logs and that you have them ready upon
|
||||
request.
|
||||
|
||||
See https://ceph.io/en/community/connect/ for current (as of October 2023)
|
||||
information on getting in contact with the upstream Ceph community.
|
||||
The upstream Ceph Slack workspace can be joined at this address:
|
||||
https://ceph-storage.slack.com/
|
||||
|
||||
See https://ceph.io/en/community/connect/ for current (as of December 2023)
|
||||
information on getting in contact with the upstream Ceph community.
|
||||
|
||||
Preparing your logs
|
||||
-------------------
|
||||
|
||||
The default location for monitor logs is ``/var/log/ceph/ceph-mon.FOO.log*``.
|
||||
However, if they are not there, you can find their current location by running
|
||||
the following command:
|
||||
The default location for Monitor logs is ``/var/log/ceph/ceph-mon.FOO.log*``.
|
||||
It is possible that the location of the Monitor logs has been changed from the
|
||||
default. If the location of the Monitor logs has been changed from the default
|
||||
location, find the location of the Monitor logs by running the following
|
||||
command:
|
||||
|
||||
.. prompt:: bash
|
||||
|
||||
@ -631,21 +651,21 @@ cluster's configuration files. If Ceph is using the default debug levels, then
|
||||
your logs might be missing important information that would help the upstream
|
||||
Ceph community address your issue.
|
||||
|
||||
To make sure your monitor logs contain relevant information, you can raise
|
||||
debug levels. Here we are interested in information from the monitors. As with
|
||||
other components, the monitors have different parts that output their debug
|
||||
Raise debug levels to make sure that your Monitor logs contain relevant
|
||||
information. Here we are interested in information from the Monitors. As with
|
||||
other components, the Monitors have different parts that output their debug
|
||||
information on different subsystems.
|
||||
|
||||
If you are an experienced Ceph troubleshooter, we recommend raising the debug
|
||||
levels of the most relevant subsystems. Of course, this approach might not be
|
||||
easy for beginners. In most cases, however, enough information to address the
|
||||
issue will be secured if the following debug levels are entered::
|
||||
levels of the most relevant subsystems. This approach might not be easy for
|
||||
beginners. In most cases, however, enough information to address the issue will
|
||||
be logged if the following debug levels are entered::
|
||||
|
||||
debug_mon = 10
|
||||
debug_ms = 1
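
If you prefer to set these centrally rather than in a local configuration file
(an alternative sketch, not the only way to do it), the same levels can be
stored in the cluster's configuration database:

.. prompt:: bash

   ceph config set mon debug_mon 10/10
   ceph config set mon debug_ms 1/1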
|
||||
|
||||
Sometimes these debug levels do not yield enough information. In such cases,
|
||||
members of the upstream Ceph community might ask you to make additional changes
|
||||
members of the upstream Ceph community will ask you to make additional changes
|
||||
to these or to other debug levels. In any case, it is better for us to receive
|
||||
at least some useful information than to receive an empty log.
|
||||
|
||||
@ -653,10 +673,12 @@ at least some useful information than to receive an empty log.
|
||||
Do I need to restart a monitor to adjust debug levels?
|
||||
------------------------------------------------------
|
||||
|
||||
No, restarting a monitor is not necessary. Debug levels may be adjusted by
|
||||
using two different methods, depending on whether or not there is a quorum:
|
||||
No. It is not necessary to restart a Monitor when adjusting its debug levels.
|
||||
|
||||
There is a quorum
|
||||
There are two different methods for adjusting debug levels. One method is used
|
||||
when there is quorum. The other is used when there is no quorum.
|
||||
|
||||
**Adjusting debug levels when there is a quorum**
|
||||
|
||||
Either inject the debug option into the specific monitor that needs to
|
||||
be debugged::
|
||||
@ -668,17 +690,19 @@ There is a quorum
|
||||
ceph tell mon.* config set debug_mon 10/10
|
||||
|
||||
|
||||
There is no quorum
|
||||
**Adjusting debug levels when there is no quorum**
|
||||
|
||||
Use the admin socket of the specific monitor that needs to be debugged
|
||||
and directly adjust the monitor's configuration options::
|
||||
|
||||
ceph daemon mon.FOO config set debug_mon 10/10
|
||||
|
||||
**Returning debug levels to their default values**
|
||||
|
||||
To return the debug levels to their default values, run the above commands
|
||||
using the debug level ``1/10`` rather than ``10/10``. To check a monitor's
|
||||
current values, use the admin socket and run either of the following commands:
|
||||
using the debug level ``1/10`` rather than the debug level ``10/10``. To check
|
||||
a Monitor's current values, use the admin socket and run either of the
|
||||
following commands:
|
||||
|
||||
.. prompt:: bash
|
||||
|
||||
@ -695,17 +719,17 @@ or:
|
||||
I Reproduced the problem with appropriate debug levels. Now what?
|
||||
-----------------------------------------------------------------
|
||||
|
||||
We prefer that you send us only the portions of your logs that are relevant to
|
||||
your monitor problems. Of course, it might not be easy for you to determine
|
||||
which portions are relevant so we are willing to accept complete and
|
||||
unabridged logs. However, we request that you avoid sending logs containing
|
||||
hundreds of thousands of lines with no additional clarifying information. One
|
||||
common-sense way of making our task easier is to write down the current time
|
||||
and date when you are reproducing the problem and then extract portions of your
|
||||
Send the upstream Ceph community only the portions of your logs that are
|
||||
relevant to your Monitor problems. Because it might not be easy for you to
|
||||
determine which portions are relevant, the upstream Ceph community accepts
|
||||
complete and unabridged logs. But don't send logs containing hundreds of
|
||||
thousands of lines with no additional clarifying information. One common-sense
|
||||
way to help the Ceph community help you is to write down the current time and
|
||||
date when you are reproducing the problem and then extract portions of your
|
||||
logs based on that information.
|
||||
|
||||
Finally, reach out to us on the mailing lists or IRC or Slack, or by filing a
|
||||
new issue on the `tracker`_.
|
||||
Contact the upstream Ceph community on the mailing lists or IRC or Slack, or by
|
||||
filing a new issue on the `tracker`_.
|
||||
|
||||
.. _tracker: http://tracker.ceph.com/projects/ceph/issues/new
|
||||
|
||||
|
File diff suppressed because it is too large
@ -275,6 +275,9 @@ Get User Info
|
||||
|
||||
Get user information.
|
||||
|
||||
Either a ``uid`` or ``access-key`` must be supplied as a request parameter. We recommend supplying uid.
|
||||
If both are provided but correspond to different users, the info for the user specified with ``uid`` will be returned.
|
||||
|
||||
:caps: users=read
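
For illustration (a hypothetical sketch: ``/admin`` is the default admin entry
point, and the host and uid shown are invented), a request that fetches user
info by uid looks like the following, and it must be signed with credentials
belonging to a user that has the ``users=read`` capability::

    GET /admin/user?uid=foo_user&format=json HTTP/1.1
    Host: rgw.example.com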
|
||||
|
||||
|
||||
@ -297,6 +300,13 @@ Request Parameters
|
||||
:Example: ``foo_user``
|
||||
:Required: Yes
|
||||
|
||||
``access-key``
|
||||
|
||||
:Description: The S3 access key of the user for which the information is requested.
|
||||
:Type: String
|
||||
:Example: ``ABCD0EF12GHIJ2K34LMN``
|
||||
:Required: No
|
||||
|
||||
|
||||
Response Entities
|
||||
~~~~~~~~~~~~~~~~~
|
||||
|
@ -4,12 +4,18 @@ Compression
|
||||
|
||||
.. versionadded:: Kraken
|
||||
|
||||
The Ceph Object Gateway supports server-side compression of uploaded objects,
|
||||
using any of Ceph's existing compression plugins.
|
||||
The Ceph Object Gateway supports server-side compression of uploaded objects,
|
||||
using any of the existing compression plugins.
|
||||
|
||||
.. note:: The Reef release added a :ref:`feature_compress_encrypted` zonegroup
|
||||
feature to enable compression with `Server-Side Encryption`_.
|
||||
|
||||
Supported compression plugins include the following:
|
||||
|
||||
* lz4
|
||||
* snappy
|
||||
* zlib
|
||||
* zstd
|
||||
|
||||
Configuration
|
||||
=============
|
||||
@ -18,14 +24,15 @@ Compression can be enabled on a storage class in the Zone's placement target
|
||||
by providing the ``--compression=<type>`` option to the command
|
||||
``radosgw-admin zone placement modify``.
|
||||
|
||||
The compression ``type`` refers to the name of the compression plugin to use
|
||||
when writing new object data. Each compressed object remembers which plugin
|
||||
was used, so changing this setting does not hinder the ability to decompress
|
||||
existing objects, nor does it force existing objects to be recompressed.
|
||||
The compression ``type`` refers to the name of the compression plugin that will
|
||||
be used when writing new object data. Each compressed object remembers which
|
||||
plugin was used, so any change to this setting will neither affect Ceph's
|
||||
ability to decompress existing objects nor require existing objects to be
|
||||
recompressed.
|
||||
|
||||
This compression setting applies to all new objects uploaded to buckets using
|
||||
this placement target. Compression can be disabled by setting the ``type`` to
|
||||
an empty string or ``none``.
|
||||
Compression settings apply to all new objects uploaded to buckets using this
|
||||
placement target. Compression can be disabled by setting the ``type`` to an
|
||||
empty string or ``none``.
|
||||
|
||||
For example::
|
||||
|
||||
@ -62,11 +69,15 @@ For example::
|
||||
Statistics
|
||||
==========
|
||||
|
||||
While all existing commands and APIs continue to report object and bucket
|
||||
sizes based their uncompressed data, compression statistics for a given bucket
|
||||
are included in its ``bucket stats``::
|
||||
Run the ``radosgw-admin bucket stats`` command to see compression statistics
|
||||
for a given bucket:
|
||||
|
||||
.. prompt:: bash
|
||||
|
||||
radosgw-admin bucket stats --bucket=<name>
|
||||
|
||||
::
|
||||
|
||||
$ radosgw-admin bucket stats --bucket=<name>
|
||||
{
|
||||
...
|
||||
"usage": {
|
||||
@ -83,6 +94,9 @@ are included in its ``bucket stats``::
|
||||
...
|
||||
}
|
||||
|
||||
Other commands and APIs will report object and bucket sizes based on their
|
||||
uncompressed data.
|
||||
|
||||
The ``size_utilized`` and ``size_kb_utilized`` fields represent the total
|
||||
size of compressed data, in bytes and kilobytes respectively.
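
As a hypothetical worked example (numbers invented for illustration): if a
bucket reports ``size_kb: 1000`` and ``size_kb_utilized: 400``, the stored data
occupies roughly 40% of its logical size, a compression ratio of about 2.5:1,
before replication or erasure coding is taken into account.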
|
||||
|
||||
|
@ -15,13 +15,13 @@ Storage Clusters. :term:`Ceph Object Storage` supports two interfaces:
|
||||
that is compatible with a large subset of the OpenStack Swift API.
|
||||
|
||||
Ceph Object Storage uses the Ceph Object Gateway daemon (``radosgw``), an HTTP
|
||||
server designed for interacting with a Ceph Storage Cluster. The Ceph Object
|
||||
server designed to interact with a Ceph Storage Cluster. The Ceph Object
|
||||
Gateway provides interfaces that are compatible with both Amazon S3 and
|
||||
OpenStack Swift, and it has its own user management. Ceph Object Gateway can
|
||||
store data in the same Ceph Storage Cluster in which data from Ceph File System
|
||||
clients and Ceph Block Device clients is stored. The S3 API and the Swift API
|
||||
share a common namespace, which makes it possible to write data to a Ceph
|
||||
Storage Cluster with one API and then retrieve that data with the other API.
|
||||
use a single Ceph Storage cluster to store data from Ceph File System and from
|
||||
Ceph Block device clients. The S3 API and the Swift API share a common
|
||||
namespace, which means that it is possible to write data to a Ceph Storage
|
||||
Cluster with one API and then retrieve that data with the other API.
|
||||
|
||||
.. ditaa::
|
||||
|
||||
|
@ -24,49 +24,48 @@ Varieties of Multi-site Configuration
|
||||
|
||||
.. versionadded:: Jewel
|
||||
|
||||
Beginning with the Kraken release, Ceph supports several multi-site
|
||||
configurations for the Ceph Object Gateway:
|
||||
Since the Kraken release, Ceph has supported several multi-site configurations
|
||||
for the Ceph Object Gateway:
|
||||
|
||||
- **Multi-zone:** A more advanced topology, the "multi-zone" configuration, is
|
||||
possible. A multi-zone configuration consists of one zonegroup and
|
||||
multiple zones, with each zone consisting of one or more `ceph-radosgw`
|
||||
instances. **Each zone is backed by its own Ceph Storage Cluster.**
|
||||
- **Multi-zone:** The "multi-zone" configuration has a complex topology. A
|
||||
multi-zone configuration consists of one zonegroup and multiple zones. Each
|
||||
zone consists of one or more `ceph-radosgw` instances. **Each zone is backed
|
||||
by its own Ceph Storage Cluster.**
|
||||
|
||||
The presence of multiple zones in a given zonegroup provides disaster
|
||||
recovery for that zonegroup in the event that one of the zones experiences a
|
||||
significant failure. Beginning with the Kraken release, each zone is active
|
||||
and can receive write operations. A multi-zone configuration that contains
|
||||
multiple active zones enhances disaster recovery and can also be used as a
|
||||
foundation for content delivery networks.
|
||||
significant failure. Each zone is active and can receive write operations. A
|
||||
multi-zone configuration that contains multiple active zones enhances
|
||||
disaster recovery and can be used as a foundation for content-delivery
|
||||
networks.
|
||||
|
||||
- **Multi-zonegroups:** Ceph Object Gateway supports multiple zonegroups (which
|
||||
were formerly called "regions"). Each zonegroup contains one or more zones.
|
||||
If two zones are in the same zonegroup, and if that zonegroup is in the same
|
||||
realm as a second zonegroup, then the objects stored in the two zones share
|
||||
a global object namespace. This global object namespace ensures unique
|
||||
object IDs across zonegroups and zones.
|
||||
If two zones are in the same zonegroup and that zonegroup is in the same
|
||||
realm as a second zonegroup, then the objects stored in the two zones share a
|
||||
global object namespace. This global object namespace ensures unique object
|
||||
IDs across zonegroups and zones.
|
||||
|
||||
Each bucket is owned by the zonegroup where it was created (except where
|
||||
overridden by the :ref:`LocationConstraint<s3_bucket_placement>` on
|
||||
bucket creation), and its object data will only replicate to other zones in
|
||||
that zonegroup. Any request for data in that bucket that are sent to other
|
||||
bucket creation), and its object data will replicate only to other zones in
|
||||
that zonegroup. Any request for data in that bucket that is sent to other
|
||||
zonegroups will redirect to the zonegroup where the bucket resides.
|
||||
|
||||
It can be useful to create multiple zonegroups when you want to share a
|
||||
namespace of users and buckets across many zones, but isolate the object data
|
||||
to a subset of those zones. It might be that you have several connected sites
|
||||
that share storage, but only require a single backup for purposes of disaster
|
||||
recovery. In such a case, it could make sense to create several zonegroups
|
||||
with only two zones each to avoid replicating all objects to all zones.
|
||||
namespace of users and buckets across many zones and isolate the object data
|
||||
to a subset of those zones. Maybe you have several connected sites that share
|
||||
storage but require only a single backup for purposes of disaster recovery.
|
||||
In such a case, you could create several zonegroups with only two zones each
|
||||
to avoid replicating all objects to all zones.
|
||||
|
||||
In other cases, it might make more sense to isolate things in separate
|
||||
realms, with each realm having a single zonegroup. Zonegroups provide
|
||||
flexibility by making it possible to control the isolation of data and
|
||||
metadata separately.
|
||||
In other cases, you might isolate data in separate realms, with each realm
|
||||
having a single zonegroup. Zonegroups provide flexibility by making it
|
||||
possible to control the isolation of data and metadata separately.
|
||||
|
||||
- **Multiple Realms:** Beginning with the Kraken release, the Ceph Object
|
||||
Gateway supports "realms", which are containers for zonegroups. Realms make
|
||||
it possible to set policies that apply to multiple zonegroups. Realms have a
|
||||
- **Multiple Realms:** Since the Kraken release, the Ceph Object Gateway
|
||||
supports "realms", which are containers for zonegroups. Realms make it
|
||||
possible to set policies that apply to multiple zonegroups. Realms have a
|
||||
globally unique namespace and can contain either a single zonegroup or
|
||||
multiple zonegroups. If you choose to make use of multiple realms, you can
|
||||
define multiple namespaces and multiple configurations (this means that each
|
||||
@ -464,8 +463,8 @@ For example:
|
||||
|
||||
.. important:: The following steps assume a multi-site configuration that uses
|
||||
newly installed systems that have not yet begun storing data. **DO NOT
|
||||
DELETE the ``default`` zone or its pools** if you are already using it to
|
||||
store data, or the data will be irretrievably lost.
|
||||
DELETE the** ``default`` **zone or its pools** if you are already using it
|
||||
to store data, or the data will be irretrievably lost.
|
||||
|
||||
Delete the default zone if needed:
|
||||
|
||||
@ -528,6 +527,17 @@ running the following commands on the object gateway host:
|
||||
systemctl start ceph-radosgw@rgw.`hostname -s`
|
||||
systemctl enable ceph-radosgw@rgw.`hostname -s`
|
||||
|
||||
If the ``cephadm`` command was used to deploy the cluster, you will not be able
|
||||
to use ``systemctl`` to start the gateway because no services will exist on
|
||||
which ``systemctl`` could operate. This is due to the containerized nature of
|
||||
the ``cephadm``-deployed Ceph cluster. If you have used the ``cephadm`` command
|
||||
and you have a containerized cluster, you must run a command of the following
|
||||
form to start the gateway:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ceph orch apply rgw <name> --realm=<realm> --zone=<zone> --placement --port
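
As a concrete illustration (the service name, realm, zone, hosts, and port are
all hypothetical), such a command might look like this:

.. prompt:: bash #

   ceph orch apply rgw east-rgw --realm=movies --zone=east-1 --placement="2 host1 host2" --port=8000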
|
||||
|
||||
Checking Synchronization Status
|
||||
-------------------------------
|
||||
|
||||
|
@ -154,6 +154,10 @@ updating, use the name of an existing topic and different endpoint values).
|
||||
[&Attributes.entry.9.key=persistent&Attributes.entry.9.value=true|false]
|
||||
[&Attributes.entry.10.key=cloudevents&Attributes.entry.10.value=true|false]
|
||||
[&Attributes.entry.11.key=mechanism&Attributes.entry.11.value=<mechanism>]
|
||||
[&Attributes.entry.12.key=time_to_live&Attributes.entry.12.value=<seconds to live>]
|
||||
[&Attributes.entry.13.key=max_retries&Attributes.entry.13.value=<retries number>]
|
||||
[&Attributes.entry.14.key=retry_sleep_duration&Attributes.entry.14.value=<sleep seconds>]
|
||||
[&Attributes.entry.15.key=Policy&Attributes.entry.15.value=<policy-JSON-string>]
|
||||
|
||||
Request parameters:
|
||||
|
||||
|
@ -11,16 +11,13 @@ multiple zones.
|
||||
Tuning
|
||||
======
|
||||
|
||||
When ``radosgw`` first tries to operate on a zone pool that does not
|
||||
exist, it will create that pool with the default values from
|
||||
``osd pool default pg num`` and ``osd pool default pgp num``. These defaults
|
||||
are sufficient for some pools, but others (especially those listed in
|
||||
``placement_pools`` for the bucket index and data) will require additional
|
||||
tuning. We recommend using the `Ceph Placement Group’s per Pool
|
||||
Calculator <https://old.ceph.com/pgcalc/>`__ to calculate a suitable number of
|
||||
placement groups for these pools. See
|
||||
`Pools <http://docs.ceph.com/en/latest/rados/operations/pools/#pools>`__
|
||||
for details on pool creation.
|
||||
When ``radosgw`` first tries to operate on a zone pool that does not exist, it
|
||||
will create that pool with the default values from ``osd pool default pg num``
|
||||
and ``osd pool default pgp num``. These defaults are sufficient for some pools,
|
||||
but others (especially those listed in ``placement_pools`` for the bucket index
|
||||
and data) will require additional tuning. See `Pools
|
||||
<http://docs.ceph.com/en/latest/rados/operations/pools/#pools>`__ for details
|
||||
on pool creation.
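
If you want to control these values yourself, the pools can be created before
``radosgw`` first touches them (a sketch; the pool names shown are the defaults
for a zone named ``default`` and the PG counts are only examples):

.. prompt:: bash #

   ceph osd pool create default.rgw.buckets.index 32
   ceph osd pool create default.rgw.buckets.data 128
   ceph osd pool application enable default.rgw.buckets.index rgw
   ceph osd pool application enable default.rgw.buckets.data rgw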
|
||||
|
||||
.. _radosgw-pool-namespaces:
|
||||
|
||||
|
@ -90,7 +90,8 @@ $ sudo ln -sf /usr/local/openresty/bin/openresty /usr/bin/nginx
|
||||
|
||||
Put in-place your Nginx configuration files and edit them according to your environment:
|
||||
|
||||
All Nginx conf files are under: https://github.com/ceph/ceph/tree/main/examples/rgw/rgw-cache
|
||||
All Nginx conf files are under:
|
||||
https://github.com/ceph/ceph/tree/main/examples/rgw/rgw-cache
|
||||
|
||||
`nginx.conf` should go to `/etc/nginx/nginx.conf`
|
||||
|
||||
|
@ -2,14 +2,20 @@
|
||||
Role
|
||||
======
|
||||
|
||||
A role is similar to a user and has permission policies attached to it, that determine what a role can or can not do. A role can be assumed by any identity that needs it. If a user assumes a role, a set of dynamically created temporary credentials are returned to the user. A role can be used to delegate access to users, applications, services that do not have permissions to access some s3 resources.
|
||||
A role is similar to a user. It has permission policies attached to it that
|
||||
determine what it can do and what it cannot do. A role can be assumed by any
|
||||
identity that needs it. When a user assumes a role, a set of
|
||||
dynamically-created temporary credentials are provided to the user. A role can
|
||||
be used to delegate access to users, to applications, and to services that do
|
||||
not have permissions to access certain S3 resources.
|
||||
|
||||
The following radosgw-admin commands can be used to create/ delete/ update a role and permissions associated with a role.
|
||||
The following ``radosgw-admin`` commands can be used to create or delete or
|
||||
update a role and the permissions associated with it.
|
||||
|
||||
Create a Role
|
||||
-------------
|
||||
|
||||
To create a role, execute the following::
|
||||
To create a role, run a command of the following form::
|
||||
|
||||
radosgw-admin role create --role-name={role-name} [--path=="{path to the role}"] [--assume-role-policy-doc={trust-policy-document}]
|
||||
|
||||
@ -23,15 +29,16 @@ Request Parameters
|
||||
|
||||
``path``
|
||||
|
||||
:Description: Path to the role. The default value is a slash(/).
|
||||
:Description: Path to the role. The default value is a slash (``/``).
|
||||
:Type: String
|
||||
|
||||
``assume-role-policy-doc``
|
||||
|
||||
:Description: The trust relationship policy document that grants an entity permission to assume the role.
|
||||
:Description: The trust relationship policy document that grants an entity
|
||||
permission to assume the role.
|
||||
:Type: String
|
||||
|
||||
For example::
|
||||
For example::
|
||||
|
||||
radosgw-admin role create --role-name=S3Access1 --path=/application_abc/component_xyz/ --assume-role-policy-doc=\{\"Version\":\"2012-10-17\",\"Statement\":\[\{\"Effect\":\"Allow\",\"Principal\":\{\"AWS\":\[\"arn:aws:iam:::user/TESTER\"\]\},\"Action\":\[\"sts:AssumeRole\"\]\}\]\}
|
||||
|
||||
@ -51,9 +58,11 @@ For example::
|
||||
Delete a Role
|
||||
-------------
|
||||
|
||||
To delete a role, execute the following::
|
||||
To delete a role, run a command of the following form:
|
||||
|
||||
radosgw-admin role delete --role-name={role-name}
|
||||
.. prompt:: bash
|
||||
|
||||
radosgw-admin role delete --role-name={role-name}
|
||||
|
||||
Request Parameters
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
@ -63,18 +72,23 @@ Request Parameters
|
||||
:Description: Name of the role.
|
||||
:Type: String
|
||||
|
||||
For example::
|
||||
|
||||
radosgw-admin role delete --role-name=S3Access1
|
||||
For example:
|
||||
|
||||
Note: A role can be deleted only when it doesn't have any permission policy attached to it.
|
||||
.. prompt:: bash
|
||||
|
||||
radosgw-admin role delete --role-name=S3Access1
|
||||
|
||||
Note: A role can be deleted only when it has no permission policy attached to
|
||||
it.
|
||||
|
||||
Get a Role
|
||||
----------
|
||||
|
||||
To get information about a role, execute the following::
|
||||
To get information about a role, run a command of the following form:
|
||||
|
||||
radosgw-admin role get --role-name={role-name}
|
||||
.. prompt:: bash
|
||||
|
||||
radosgw-admin role get --role-name={role-name}
|
||||
|
||||
Request Parameters
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
@ -84,9 +98,11 @@ Request Parameters
|
||||
:Description: Name of the role.
|
||||
:Type: String
|
||||
|
||||
For example::
|
||||
For example:
|
||||
|
||||
.. prompt:: bash
|
||||
|
||||
radosgw-admin role get --role-name=S3Access1
|
||||
radosgw-admin role get --role-name=S3Access1
|
||||
|
||||
.. code-block:: javascript
|
||||
|
||||
@ -104,21 +120,26 @@ For example::
|
||||
List Roles
|
||||
----------
|
||||
|
||||
To list roles with a specified path prefix, execute the following::
|
||||
To list roles with a specified path prefix, run a command of the following form:
|
||||
|
||||
radosgw-admin role list [--path-prefix ={path prefix}]
|
||||
.. prompt:: bash
|
||||
|
||||
radosgw-admin role list [--path-prefix ={path prefix}]
|
||||
|
||||
Request Parameters
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
|
||||
``path-prefix``
|
||||
|
||||
:Description: Path prefix for filtering roles. If this is not specified, all roles are listed.
|
||||
:Description: Path prefix for filtering roles. If this is not specified, all
|
||||
roles are listed.
|
||||
:Type: String
|
||||
|
||||
For example::
|
||||
For example:
|
||||
|
||||
.. prompt:: bash
|
||||
|
||||
radosgw-admin role list --path-prefix="/application"
|
||||
radosgw-admin role list --path-prefix="/application"
|
||||
|
||||
.. code-block:: javascript
|
||||
|
||||
@ -134,7 +155,6 @@ For example::
|
||||
}
|
||||
]
|
||||
|
||||
|
||||
Update Assume Role Policy Document of a role
|
||||
--------------------------------------------
|
||||
|
||||
@ -334,6 +354,7 @@ Create a Role
|
||||
-------------
|
||||
|
||||
Example::
|
||||
|
||||
POST "<hostname>?Action=CreateRole&RoleName=S3Access&Path=/application_abc/component_xyz/&AssumeRolePolicyDocument=\{\"Version\":\"2012-10-17\",\"Statement\":\[\{\"Effect\":\"Allow\",\"Principal\":\{\"AWS\":\[\"arn:aws:iam:::user/TESTER\"\]\},\"Action\":\[\"sts:AssumeRole\"\]\}\]\}"
|
||||
|
||||
.. code-block:: XML
|
||||
@ -353,14 +374,18 @@ Delete a Role
|
||||
-------------
|
||||
|
||||
Example::
|
||||
|
||||
POST "<hostname>?Action=DeleteRole&RoleName=S3Access"
|
||||
|
||||
Note: A role can be deleted only when it doesn't have any permission policy attached to it.
|
||||
Note: A role can be deleted only when it doesn't have any permission policy
|
||||
attached to it. If you intend to delete a role, you must first delete any
|
||||
policies attached to it.
|
||||
|
||||
Get a Role
|
||||
----------
|
||||
|
||||
Example::
|
||||
|
||||
POST "<hostname>?Action=GetRole&RoleName=S3Access"
|
||||
|
||||
.. code-block:: XML
|
||||
@ -380,6 +405,7 @@ List Roles
|
||||
----------
|
||||
|
||||
Example::
|
||||
|
||||
POST "<hostname>?Action=ListRoles&RoleName=S3Access&PathPrefix=/application"
|
||||
|
||||
.. code-block:: XML
|
||||
@ -399,18 +425,21 @@ Update Assume Role Policy Document
|
||||
----------------------------------
|
||||
|
||||
Example::
|
||||
|
||||
POST "<hostname>?Action=UpdateAssumeRolePolicy&RoleName=S3Access&PolicyDocument=\{\"Version\":\"2012-10-17\",\"Statement\":\[\{\"Effect\":\"Allow\",\"Principal\":\{\"AWS\":\[\"arn:aws:iam:::user/TESTER2\"\]\},\"Action\":\[\"sts:AssumeRole\"\]\}\]\}"
|
||||
|
||||
Add/Update a Policy attached to a Role
|
||||
---------------------------------------
|
||||
|
||||
Example::
|
||||
|
||||
POST "<hostname>?Action=PutRolePolicy&RoleName=S3Access&PolicyName=Policy1&PolicyDocument=\{\"Version\":\"2012-10-17\",\"Statement\":\[\{\"Effect\":\"Allow\",\"Action\":\[\"s3:CreateBucket\"\],\"Resource\":\"arn:aws:s3:::example_bucket\"\}\]\}"
|
||||
|
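With ``boto3`` the same inline policy is attached via ``put_role_policy``. This is a
sketch; the policy body mirrors the example request above and the ``iam`` client is
the placeholder one from the "Get a Role" section.

.. code-block:: python

   import json

   # Permission policy matching the PutRolePolicy example above.
   policy = {
       "Version": "2012-10-17",
       "Statement": [{
           "Effect": "Allow",
           "Action": ["s3:CreateBucket"],
           "Resource": "arn:aws:s3:::example_bucket",
       }],
   }

   # `iam` is the boto3 IAM client from the "Get a Role" sketch above.
   iam.put_role_policy(
       RoleName="S3Access",
       PolicyName="Policy1",
       PolicyDocument=json.dumps(policy),
   )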
||||
List Permission Policy Names attached to a Role
|
||||
-----------------------------------------------
|
||||
|
||||
Example::
|
||||
|
||||
POST "<hostname>?Action=ListRolePolicies&RoleName=S3Access"
|
||||
|
||||
.. code-block:: XML
|
||||
@ -424,6 +453,7 @@ Get Permission Policy attached to a Role
|
||||
----------------------------------------
|
||||
|
||||
Example::
|
||||
|
||||
POST "<hostname>?Action=GetRolePolicy&RoleName=S3Access&PolicyName=Policy1"
|
||||
|
||||
.. code-block:: XML
|
||||
@ -439,6 +469,7 @@ Delete Policy attached to a Role
|
||||
--------------------------------
|
||||
|
||||
Example::
|
||||
|
||||
POST "<hostname>?Action=DeleteRolePolicy&RoleName=S3Access&PolicyName=Policy1"
|
||||
|
||||
Tag a role
|
||||
@ -447,6 +478,7 @@ A role can have multivalued tags attached to it. These tags can be passed in as
|
||||
AWS does not support multi-valued role tags.
|
||||
|
||||
Example::
|
||||
|
||||
POST "<hostname>?Action=TagRole&RoleName=S3Access&Tags.member.1.Key=Department&Tags.member.1.Value=Engineering"
|
||||
|
||||
.. code-block:: XML
|
||||
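With ``boto3`` the tags are passed as a ``Tags`` list, and repeating a key is how a
multivalued tag (which RGW accepts, unlike AWS) would be expressed. A sketch using
the placeholder ``iam`` client from above; the second "Department" value is added
only for illustration.

.. code-block:: python

   # Repeating the "Department" key illustrates a multivalued tag.
   iam.tag_role(
       RoleName="S3Access",
       Tags=[
           {"Key": "Department", "Value": "Engineering"},
           {"Key": "Department", "Value": "Marketing"},
       ],
   )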
@ -463,6 +495,7 @@ List role tags
|
||||
Lists the tags attached to a role.
|
||||
|
||||
Example::
|
||||
|
||||
POST "<hostname>?Action=ListRoleTags&RoleName=S3Access"
|
||||
|
||||
.. code-block:: XML
|
||||
@ -486,6 +519,7 @@ Delete role tags
|
||||
Delete one or more tags attached to a role.
|
||||
|
||||
Example::
|
||||
|
||||
POST "<hostname>?Action=UntagRoles&RoleName=S3Access&TagKeys.member.1=Department"
|
||||
|
||||
.. code-block:: XML
|
||||
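The corresponding ``boto3`` calls are ``list_role_tags`` and ``untag_role`` (again a
sketch against the placeholder ``iam`` client from above):

.. code-block:: python

   # `iam` is the boto3 IAM client from the "Get a Role" sketch above.
   for tag in iam.list_role_tags(RoleName="S3Access")["Tags"]:
       print(tag["Key"], "=", tag["Value"])

   # Remove every value stored under the "Department" key.
   iam.untag_role(RoleName="S3Access", TagKeys=["Department"])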
@ -500,6 +534,7 @@ Update Role
|
||||
-----------
|
||||
|
||||
Example::
|
||||
|
||||
POST "<hostname>?Action=UpdateRole&RoleName=S3Access&MaxSessionDuration=43200"
|
||||
|
||||
.. code-block:: XML
|
||||
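The matching ``boto3`` call is ``update_role`` (a one-line sketch, still against the
placeholder ``iam`` client from above):

.. code-block:: python

   # `iam` is the boto3 IAM client from the "Get a Role" sketch above.
   iam.update_role(RoleName="S3Access", MaxSessionDuration=43200)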
@ -565,6 +600,3 @@ The following is sample code for adding tags to role, listing tags and untagging
|
||||
'Department',
|
||||
]
|
||||
)
|
||||
|
||||
|
||||
|
||||
|
@ -104,7 +104,7 @@ An example of a role permission policy that uses aws:PrincipalTag is as follows:
|
||||
{
|
||||
"Effect":"Allow",
|
||||
"Action":["s3:*"],
|
||||
"Resource":["arn:aws:s3::t1tenant:my-test-bucket","arn:aws:s3::t1tenant:my-test-bucket/*],"+
|
||||
"Resource":["arn:aws:s3::t1tenant:my-test-bucket","arn:aws:s3::t1tenant:my-test-bucket/*"],
|
||||
"Condition":{"StringEquals":{"aws:PrincipalTag/Department":"Engineering"}}
|
||||
}]
|
||||
}
|
||||
|
@ -32,9 +32,9 @@ the ``librbd`` library.
|
||||
|
||||
Ceph's block devices deliver high performance with vast scalability to
|
||||
`kernel modules`_, or to :abbr:`KVMs (kernel virtual machines)` such as `QEMU`_, and
|
||||
cloud-based computing systems like `OpenStack`_ and `CloudStack`_ that rely on
|
||||
libvirt and QEMU to integrate with Ceph block devices. You can use the same cluster
|
||||
to operate the :ref:`Ceph RADOS Gateway <object-gateway>`, the
|
||||
cloud-based computing systems like `OpenStack`_, `OpenNebula`_ and `CloudStack`_
|
||||
that rely on libvirt and QEMU to integrate with Ceph block devices. You can use
|
||||
the same cluster to operate the :ref:`Ceph RADOS Gateway <object-gateway>`, the
|
||||
:ref:`Ceph File System <ceph-file-system>`, and Ceph block devices simultaneously.
|
||||
|
||||
.. important:: To use Ceph Block Devices, you must have access to a running
|
||||
@ -69,4 +69,5 @@ to operate the :ref:`Ceph RADOS Gateway <object-gateway>`, the
|
||||
.. _kernel modules: ./rbd-ko/
|
||||
.. _QEMU: ./qemu-rbd/
|
||||
.. _OpenStack: ./rbd-openstack
|
||||
.. _OpenNebula: https://docs.opennebula.io/stable/open_cluster_deployment/storage_setup/ceph_ds.html
|
||||
.. _CloudStack: ./rbd-cloudstack
|
||||
|
@ -4,11 +4,11 @@
|
||||
|
||||
.. index:: Ceph Block Device; libvirt
|
||||
|
||||
The ``libvirt`` library creates a virtual machine abstraction layer between
|
||||
hypervisor interfaces and the software applications that use them. With
|
||||
``libvirt``, developers and system administrators can focus on a common
|
||||
The ``libvirt`` library creates a virtual machine abstraction layer between
|
||||
hypervisor interfaces and the software applications that use them. With
|
||||
``libvirt``, developers and system administrators can focus on a common
|
||||
management framework, common API, and common shell interface (i.e., ``virsh``)
|
||||
to many different hypervisors, including:
|
||||
to many different hypervisors, including:
|
||||
|
||||
- QEMU/KVM
|
||||
- XEN
|
||||
@ -18,7 +18,7 @@ to many different hypervisors, including:
|
||||
|
||||
Ceph block devices support QEMU/KVM. You can use Ceph block devices with
|
||||
software that interfaces with ``libvirt``. The following stack diagram
|
||||
illustrates how ``libvirt`` and QEMU use Ceph block devices via ``librbd``.
|
||||
illustrates how ``libvirt`` and QEMU use Ceph block devices via ``librbd``.
|
||||
|
||||
|
||||
.. ditaa::
|
||||
@ -41,10 +41,11 @@ illustrates how ``libvirt`` and QEMU use Ceph block devices via ``librbd``.
|
||||
|
||||
|
||||
The most common ``libvirt`` use case involves providing Ceph block devices to
|
||||
cloud solutions like OpenStack or CloudStack. The cloud solution uses
|
||||
cloud solutions like OpenStack, OpenNebula or CloudStack. The cloud solution uses
|
||||
``libvirt`` to interact with QEMU/KVM, and QEMU/KVM interacts with Ceph block
|
||||
devices via ``librbd``. See `Block Devices and OpenStack`_ and `Block Devices
|
||||
and CloudStack`_ for details. See `Installation`_ for installation details.
|
||||
devices via ``librbd``. See `Block Devices and OpenStack`_,
|
||||
`Block Devices and OpenNebula`_ and `Block Devices and CloudStack`_ for details.
|
||||
See `Installation`_ for installation details.
|
||||
|
||||
You can also use Ceph block devices with ``libvirt``, ``virsh`` and the
|
||||
``libvirt`` API. See `libvirt Virtualization API`_ for details.
|
||||
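For example, the Python binding can retrieve the same domain XML that ``virsh
dumpxml`` prints, which is a quick way to confirm that an ``rbd``-backed ``<disk>``
entry is present. This is only a sketch: it assumes the ``python3-libvirt`` bindings
are installed and reuses the example domain name ``libvirt-virtual-machine`` from
later on this page.

.. code-block:: python

   import libvirt  # provided by the python3-libvirt bindings (assumed installed)

   # Same target as `virsh -c qemu:///system`.
   conn = libvirt.open("qemu:///system")
   try:
       dom = conn.lookupByName("libvirt-virtual-machine")
       xml = dom.XMLDesc(0)
       # A Ceph-backed disk appears as <source protocol='rbd' .../> in the XML.
       print("rbd disk configured:", "protocol='rbd'" in xml)
   finally:
       conn.close()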
@ -62,12 +63,12 @@ Configuring Ceph
|
||||
|
||||
To configure Ceph for use with ``libvirt``, perform the following steps:
|
||||
|
||||
#. `Create a pool`_. The following example uses the
|
||||
#. `Create a pool`_. The following example uses the
|
||||
pool name ``libvirt-pool``.::
|
||||
|
||||
ceph osd pool create libvirt-pool
|
||||
|
||||
Verify the pool exists. ::
|
||||
Verify the pool exists. ::
|
||||
|
||||
ceph osd lspools
|
||||
|
||||
@ -80,23 +81,23 @@ To configure Ceph for use with ``libvirt``, perform the following steps:
|
||||
and references ``libvirt-pool``. ::
|
||||
|
||||
ceph auth get-or-create client.libvirt mon 'profile rbd' osd 'profile rbd pool=libvirt-pool'
|
||||
|
||||
Verify the name exists. ::
|
||||
|
||||
|
||||
Verify the name exists. ::
|
||||
|
||||
ceph auth ls
|
||||
|
||||
**NOTE**: ``libvirt`` will access Ceph using the ID ``libvirt``,
|
||||
not the Ceph name ``client.libvirt``. See `User Management - User`_ and
|
||||
`User Management - CLI`_ for a detailed explanation of the difference
|
||||
between ID and name.
|
||||
**NOTE**: ``libvirt`` will access Ceph using the ID ``libvirt``,
|
||||
not the Ceph name ``client.libvirt``. See `User Management - User`_ and
|
||||
`User Management - CLI`_ for a detailed explanation of the difference
|
||||
between ID and name.
|
||||
|
||||
#. Use QEMU to `create an image`_ in your RBD pool.
|
||||
#. Use QEMU to `create an image`_ in your RBD pool.
|
||||
The following example uses the image name ``new-libvirt-image``
|
||||
and references ``libvirt-pool``. ::
|
||||
|
||||
qemu-img create -f rbd rbd:libvirt-pool/new-libvirt-image 2G
|
||||
|
||||
Verify the image exists. ::
|
||||
Verify the image exists. ::
|
||||
|
||||
rbd -p libvirt-pool ls
|
||||
|
||||
@ -111,7 +112,7 @@ To configure Ceph for use with ``libvirt``, perform the following steps:
|
||||
admin socket = /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok
|
||||
|
||||
The ``client.libvirt`` section name should match the cephx user you created
|
||||
above.
|
||||
above.
|
||||
If SELinux or AppArmor is enabled, note that this could prevent the client
|
||||
process (qemu via libvirt) from doing some operations, such as writing logs
|
||||
or accessing the images or admin socket at the destination locations (``/var/
|
||||
@ -123,7 +124,7 @@ Preparing the VM Manager
|
||||
========================
|
||||
|
||||
You may use ``libvirt`` without a VM manager, but you may find it simpler to
|
||||
create your first domain with ``virt-manager``.
|
||||
create your first domain with ``virt-manager``.
|
||||
|
||||
#. Install a virtual machine manager. See `KVM/VirtManager`_ for details. ::
|
||||
|
||||
@ -131,7 +132,7 @@ create your first domain with ``virt-manager``.
|
||||
|
||||
#. Download an OS image (if necessary).
|
||||
|
||||
#. Launch the virtual machine manager. ::
|
||||
#. Launch the virtual machine manager. ::
|
||||
|
||||
sudo virt-manager
|
||||
|
||||
@ -142,12 +143,12 @@ Creating a VM
|
||||
|
||||
To create a VM with ``virt-manager``, perform the following steps:
|
||||
|
||||
#. Press the **Create New Virtual Machine** button.
|
||||
#. Press the **Create New Virtual Machine** button.
|
||||
|
||||
#. Name the new virtual machine domain. In this example, we
|
||||
use the name ``libvirt-virtual-machine``. You may use any name you wish,
|
||||
but ensure you replace ``libvirt-virtual-machine`` with the name you
|
||||
choose in subsequent commandline and configuration examples. ::
|
||||
but ensure you replace ``libvirt-virtual-machine`` with the name you
|
||||
choose in subsequent commandline and configuration examples. ::
|
||||
|
||||
libvirt-virtual-machine
|
||||
|
||||
@ -155,9 +156,9 @@ To create a VM with ``virt-manager``, perform the following steps:
|
||||
|
||||
/path/to/image/recent-linux.img
|
||||
|
||||
**NOTE:** Import a recent image. Some older images may not rescan for
|
||||
**NOTE:** Import a recent image. Some older images may not rescan for
|
||||
virtual devices properly.
|
||||
|
||||
|
||||
#. Configure and start the VM.
|
||||
|
||||
#. You may use ``virsh list`` to verify the VM domain exists. ::
|
||||
@ -179,11 +180,11 @@ you that root privileges are required. For a reference of ``virsh``
|
||||
commands, refer to `Virsh Command Reference`_.
|
||||
|
||||
|
||||
#. Open the configuration file with ``virsh edit``. ::
|
||||
#. Open the configuration file with ``virsh edit``. ::
|
||||
|
||||
sudo virsh edit {vm-domain-name}
|
||||
|
||||
Under ``<devices>`` there should be a ``<disk>`` entry. ::
|
||||
Under ``<devices>`` there should be a ``<disk>`` entry. ::
|
||||
|
||||
<devices>
|
||||
<emulator>/usr/bin/kvm</emulator>
|
||||
@ -196,18 +197,18 @@ commands, refer to `Virsh Command Reference`_.
|
||||
|
||||
|
||||
Replace ``/path/to/image/recent-linux.img`` with the path to the OS image.
|
||||
The minimum kernel for using the faster ``virtio`` bus is 2.6.25. See
|
||||
The minimum kernel for using the faster ``virtio`` bus is 2.6.25. See
|
||||
`Virtio`_ for details.
|
||||
|
||||
**IMPORTANT:** Use ``sudo virsh edit`` instead of a text editor. If you edit
|
||||
the configuration file under ``/etc/libvirt/qemu`` with a text editor,
|
||||
``libvirt`` may not recognize the change. If there is a discrepancy between
|
||||
the contents of the XML file under ``/etc/libvirt/qemu`` and the result of
|
||||
``sudo virsh dumpxml {vm-domain-name}``, then your VM may not work
|
||||
**IMPORTANT:** Use ``sudo virsh edit`` instead of a text editor. If you edit
|
||||
the configuration file under ``/etc/libvirt/qemu`` with a text editor,
|
||||
``libvirt`` may not recognize the change. If there is a discrepancy between
|
||||
the contents of the XML file under ``/etc/libvirt/qemu`` and the result of
|
||||
``sudo virsh dumpxml {vm-domain-name}``, then your VM may not work
|
||||
properly.
|
||||
|
||||
|
||||
#. Add the Ceph RBD image you created as a ``<disk>`` entry. ::
|
||||
|
||||
#. Add the Ceph RBD image you created as a ``<disk>`` entry. ::
|
||||
|
||||
<disk type='network' device='disk'>
|
||||
<source protocol='rbd' name='libvirt-pool/new-libvirt-image'>
|
||||
@ -216,21 +217,21 @@ commands, refer to `Virsh Command Reference`_.
|
||||
<target dev='vdb' bus='virtio'/>
|
||||
</disk>
|
||||
|
||||
Replace ``{monitor-host}`` with the name of your host, and replace the
|
||||
pool and/or image name as necessary. You may add multiple ``<host>``
|
||||
Replace ``{monitor-host}`` with the name of your host, and replace the
|
||||
pool and/or image name as necessary. You may add multiple ``<host>``
|
||||
entries for your Ceph monitors. The ``dev`` attribute is the logical
|
||||
device name that will appear under the ``/dev`` directory of your
|
||||
VM. The optional ``bus`` attribute indicates the type of disk device to
|
||||
emulate. The valid settings are driver specific (e.g., "ide", "scsi",
|
||||
device name that will appear under the ``/dev`` directory of your
|
||||
VM. The optional ``bus`` attribute indicates the type of disk device to
|
||||
emulate. The valid settings are driver specific (e.g., "ide", "scsi",
|
||||
"virtio", "xen", "usb" or "sata").
|
||||
|
||||
|
||||
See `Disks`_ for details of the ``<disk>`` element, and its child elements
|
||||
and attributes.
|
||||
|
||||
|
||||
#. Save the file.
|
||||
|
||||
#. If your Ceph Storage Cluster has `Ceph Authentication`_ enabled (it does by
|
||||
default), you must generate a secret. ::
|
||||
#. If your Ceph Storage Cluster has `Ceph Authentication`_ enabled (it does by
|
||||
default), you must generate a secret. ::
|
||||
|
||||
cat > secret.xml <<EOF
|
||||
<secret ephemeral='no' private='no'>
|
||||
@ -249,11 +250,11 @@ commands, refer to `Virsh Command Reference`_.
|
||||
|
||||
ceph auth get-key client.libvirt | sudo tee client.libvirt.key
|
||||
|
||||
#. Set the UUID of the secret. ::
|
||||
#. Set the UUID of the secret. ::
|
||||
|
||||
sudo virsh secret-set-value --secret {uuid of secret} --base64 $(cat client.libvirt.key) && rm client.libvirt.key secret.xml
|
||||
|
||||
You must also set the secret manually by adding the following ``<auth>``
|
||||
You must also set the secret manually by adding the following ``<auth>``
|
||||
entry to the ``<disk>`` element you entered earlier (replacing the
|
||||
``uuid`` value with the result from the command line example above). ::
|
||||
|
||||
@ -266,14 +267,14 @@ commands, refer to `Virsh Command Reference`_.
|
||||
<auth username='libvirt'>
|
||||
<secret type='ceph' uuid='{uuid of secret}'/>
|
||||
</auth>
|
||||
<target ...
|
||||
<target ...
|
||||
|
||||
|
||||
**NOTE:** The exemplary ID is ``libvirt``, not the Ceph name
|
||||
``client.libvirt`` as generated at step 2 of `Configuring Ceph`_. Ensure
|
||||
you use the ID component of the Ceph name you generated. If for some reason
|
||||
you need to regenerate the secret, you will have to execute
|
||||
``sudo virsh secret-undefine {uuid}`` before executing
|
||||
**NOTE:** The exemplary ID is ``libvirt``, not the Ceph name
|
||||
``client.libvirt`` as generated at step 2 of `Configuring Ceph`_. Ensure
|
||||
you use the ID component of the Ceph name you generated. If for some reason
|
||||
you need to regenerate the secret, you will have to execute
|
||||
``sudo virsh secret-undefine {uuid}`` before executing
|
||||
``sudo virsh secret-set-value`` again.
|
||||
|
||||
|
||||
@ -285,30 +286,31 @@ To verify that the VM and Ceph are communicating, you may perform the
|
||||
following procedures.
|
||||
|
||||
|
||||
#. Check to see if Ceph is running::
|
||||
#. Check to see if Ceph is running::
|
||||
|
||||
ceph health
|
||||
|
||||
#. Check to see if the VM is running. ::
|
||||
#. Check to see if the VM is running. ::
|
||||
|
||||
sudo virsh list
|
||||
|
||||
#. Check to see if the VM is communicating with Ceph. Replace
|
||||
``{vm-domain-name}`` with the name of your VM domain::
|
||||
#. Check to see if the VM is communicating with Ceph. Replace
|
||||
``{vm-domain-name}`` with the name of your VM domain::
|
||||
|
||||
sudo virsh qemu-monitor-command --hmp {vm-domain-name} 'info block'
|
||||
|
||||
#. Check to see if the device from ``<target dev='vdb' bus='virtio'/>`` exists::
|
||||
|
||||
|
||||
virsh domblklist {vm-domain-name} --details
|
||||
|
||||
If everything looks okay, you may begin using the Ceph block device
|
||||
If everything looks okay, you may begin using the Ceph block device
|
||||
within your VM.
|
||||
|
||||
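As an additional check from a Ceph client node (rather than from inside the VM), the
pool and image created in `Configuring Ceph`_ can be confirmed with the Ceph Python
bindings. This is a sketch only; it assumes ``/etc/ceph/ceph.conf`` and a keyring
readable by the calling user, plus the example names used above.

.. code-block:: python

   import rados
   import rbd

   # Assumes /etc/ceph/ceph.conf and a readable keyring for the default user.
   cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
   cluster.connect()
   try:
       print("pool exists:", cluster.pool_exists("libvirt-pool"))
       ioctx = cluster.open_ioctx("libvirt-pool")
       try:
           # Should include the example image, e.g. "new-libvirt-image".
           print("images:", rbd.RBD().list(ioctx))
       finally:
           ioctx.close()
   finally:
       cluster.shutdown()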
|
||||
.. _Installation: ../../install
|
||||
.. _libvirt Virtualization API: http://www.libvirt.org
|
||||
.. _Block Devices and OpenStack: ../rbd-openstack
|
||||
.. _Block Devices and OpenNebula: https://docs.opennebula.io/stable/open_cluster_deployment/storage_setup/ceph_ds.html#datastore-internals
|
||||
.. _Block Devices and CloudStack: ../rbd-cloudstack
|
||||
.. _Create a pool: ../../rados/operations/pools#create-a-pool
|
||||
.. _Create a Ceph User: ../../rados/operations/user-management#add-a-user
|
||||
|
ceph/doc/rbd/nvmeof-initiator-esx.rst (new file, 70 lines)
@ -0,0 +1,70 @@
|
||||
---------------------------------
|
||||
NVMe/TCP Initiator for VMware ESX
|
||||
---------------------------------
|
||||
|
||||
Prerequisites
|
||||
=============
|
||||
|
||||
- A VMware ESXi host running VMware vSphere Hypervisor (ESXi) 7.0U3 version or later.
|
||||
- Deployed Ceph NVMe-oF gateway.
|
||||
- Ceph cluster with NVMe-oF configuration.
|
||||
- Subsystem defined in the gateway.
|
||||
|
||||
Configuration
|
||||
=============
|
||||
|
||||
The following instructions will use the default vSphere web client and esxcli.
|
||||
|
||||
1. Enable NVMe/TCP on a NIC:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
esxcli nvme fabric enable --protocol TCP --device vmnicN
|
||||
|
||||
Replace ``N`` with the number of the NIC.
|
||||
|
||||
2. Tag a VMkernel NIC to permit NVMe/TCP traffic:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
esxcli network ip interface tag add --interface-name vmkN --tagname NVMeTCP
|
||||
|
||||
Replace ``N`` with the ID of the VMkernel.
|
||||
|
||||
3. Configure the VMware ESXi host for NVMe/TCP:
|
||||
|
||||
#. List the NVMe-oF adapter:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
esxcli nvme adapter list
|
||||
|
||||
#. Discover NVMe-oF subsystems:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
esxcli nvme fabric discover -a NVME_TCP_ADAPTER -i GATEWAY_IP -p 4420
|
||||
|
||||
#. Connect to the NVMe-oF gateway subsystem:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
esxcli nvme connect -a NVME_TCP_ADAPTER -i GATEWAY_IP -p 4420 -s SUBSYSTEM_NQN
|
||||
|
||||
#. List the NVMe/TCP controllers:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
esxcli nvme controller list
|
||||
|
||||
#. List the NVMe-oF namespaces in the subsystem:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
esxcli nvme namespace list
|
||||
|
||||
4. Verify that the initiator has been set up correctly:
|
||||
|
||||
#. From the vSphere client go to the ESXi host.
|
||||
#. On the Storage page go to the Devices tab.
|
||||
#. Verify that the NVMe/TCP disks are listed in the table.
|
ceph/doc/rbd/nvmeof-initiator-linux.rst (new file, 83 lines)
@ -0,0 +1,83 @@
|
||||
==============================
|
||||
NVMe/TCP Initiator for Linux
|
||||
==============================
|
||||
|
||||
Prerequisites
|
||||
=============
|
||||
|
||||
- Kernel 5.0 or later
|
||||
- RHEL 9.2 or later
|
||||
- Ubuntu 24.04 or later
|
||||
- SLES 15 SP3 or later
|
||||
|
||||
Installation
|
||||
============
|
||||
|
||||
1. Install the nvme-cli:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
yum install nvme-cli
|
||||
|
||||
2. Load the NVMe-oF module:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
modprobe nvme-fabrics
|
||||
|
||||
3. Verify the NVMe/TCP target is reachable:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
nvme discover -t tcp -a GATEWAY_IP -s 4420
|
||||
|
||||
4. Connect to the NVMe/TCP target:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
nvme connect -t tcp -a GATEWAY_IP -n SUBSYSTEM_NQN
|
||||
|
||||
Next steps
|
||||
==========
|
||||
|
||||
Verify that the initiator is set up correctly:
|
||||
|
||||
1. List the NVMe block devices:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
nvme list
|
||||
|
||||
2. Create a filesystem on the desired device:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
mkfs.ext4 NVME_NODE_PATH
|
||||
|
||||
3. Mount the filesystem:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
mkdir /mnt/nvmeof
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
mount NVME_NODE_PATH /mnt/nvmeof
|
||||
|
||||
4. List the files on the NVMe-oF-backed filesystem:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
ls /mnt/nvmeof
|
||||
|
||||
5. Create a text file in the ``/mnt/nvmeof`` directory:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
echo "Hello NVME-oF" > /mnt/nvmeof/hello.text
|
||||
|
||||
6. Verify that the file can be accessed:
|
||||
|
||||
.. prompt:: bash #
|
||||
|
||||
cat /mnt/nvmeof/hello.text
|
Some files were not shown because too many files have changed in this diff.