import ceph reef 18.2.4

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Thomas Lamprecht 2024-07-25 18:23:05 +02:00
parent e9fe820e7f
commit f38dd50b34
1399 changed files with 133003 additions and 58787 deletions

View File

@ -1,7 +1,7 @@
cmake_minimum_required(VERSION 3.16)
project(ceph
VERSION 18.2.2
VERSION 18.2.4
LANGUAGES CXX C ASM)
cmake_policy(SET CMP0028 NEW)
@ -247,6 +247,15 @@ set(HAVE_LIBURING ${WITH_LIBURING})
CMAKE_DEPENDENT_OPTION(WITH_SYSTEM_LIBURING "Require and build with system liburing" OFF
"HAVE_LIBAIO;WITH_BLUESTORE" OFF)
if(WITH_LIBURING)
if(WITH_SYSTEM_LIBURING)
find_package(uring REQUIRED)
else()
include(Builduring)
build_uring()
endif()
endif()
CMAKE_DEPENDENT_OPTION(WITH_BLUESTORE_PMEM "Enable PMDK libraries" OFF
"WITH_BLUESTORE" OFF)
if(WITH_BLUESTORE_PMEM)
@ -679,7 +688,7 @@ if(WITH_SYSTEM_NPM)
message(FATAL_ERROR "Can't find npm.")
endif()
endif()
set(DASHBOARD_FRONTEND_LANGS "" CACHE STRING
set(DASHBOARD_FRONTEND_LANGS "ALL" CACHE STRING
"List of comma separated ceph-dashboard frontend languages to build. \
Use value `ALL` to build all languages")
CMAKE_DEPENDENT_OPTION(WITH_MGR_ROOK_CLIENT "Enable the mgr's Rook support" ON

View File

@ -1,3 +1,17 @@
>=18.2.2
--------
* RBD: When diffing against the beginning of time (`fromsnapname == NULL`) in
fast-diff mode (`whole_object == true` with `fast-diff` image feature enabled
and valid), diff-iterate is now guaranteed to execute locally if the exclusive
lock is available. This brings a dramatic performance improvement for QEMU
live disk synchronization and backup use cases (see the Python sketch after this list).
* RADOS: The `get_pool_is_selfmanaged_snaps_mode` C++ API has been deprecated
because it is prone to false-negative results. Its safer replacement is
`pool_is_in_selfmanaged_snaps_mode`.
* RBD: The option ``--image-id`` has been added to the `rbd children` CLI command,
so that it can be run for images in the trash.
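As an illustration of the diff-iterate note above, a minimal Python sketch using the librbd binding (`Image.diff_iterate` with `whole_object=True` and no starting snapshot); the conffile path, pool name, and image name are placeholders, not values taken from this commit.

import rados
import rbd

extents = []

def collect(offset, length, exists):
    # Called once per changed extent; exists=False marks discarded ranges.
    extents.append((offset, length, exists))

cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')  # placeholder path
cluster.connect()
try:
    ioctx = cluster.open_ioctx('rbd')                  # placeholder pool
    try:
        with rbd.Image(ioctx, 'vm-disk-1') as image:   # placeholder image
            # from_snapshot=None diffs against the beginning of time;
            # whole_object=True selects fast-diff mode.
            image.diff_iterate(0, image.size(), None, collect,
                               include_parent=True, whole_object=True)
    finally:
        ioctx.close()
finally:
    cluster.shutdown()

print("%d changed extents" % len(extents))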
>=19.0.0
* RGW: S3 multipart uploads using Server-Side Encryption now replicate correctly in
@ -47,6 +61,52 @@
affected and to clean them up accordingly.
* mgr/snap-schedule: For clusters with multiple CephFS file systems, all the
snap-schedule commands now expect the '--fs' argument.
* RGW: Fixed an S3 Object Lock bug with PutObjectRetention requests that specify
a RetainUntilDate after the year 2106. This date was truncated to 32 bits when
stored, so a much earlier date was used for object lock enforcement. This does
not affect PutBucketObjectLockConfiguration, where a duration is given in Days.
The RetainUntilDate encoding is fixed for new PutObjectRetention requests, but
the dates of existing object locks cannot be repaired. Such objects can be identified
with a HeadObject request based on the x-amz-object-lock-retain-until-date
response header (see the sketch after this list).
* RADOS: The `get_pool_is_selfmanaged_snaps_mode` C++ API has been deprecated
because it is prone to false-negative results. Its safer replacement is
`pool_is_in_selfmanaged_snaps_mode`.
* RADOS: For bug 62338 (https://tracker.ceph.com/issues/62338), we chose not to
condition the fix on a server flag, in order to simplify backporting. As
a result, in rare cases it may be possible for a PG to flip between two acting
sets while an upgrade to a version with the fix is in progress. If you observe
this behavior, you should be able to work around it by completing the upgrade or
by disabling async recovery by setting osd_async_recovery_min_cost to a very
large value on all OSDs until the upgrade is complete:
``ceph config set osd osd_async_recovery_min_cost 1099511627776``
* RADOS: A detailed version of the `balancer status` CLI command in the balancer
module is now available. Users may run `ceph balancer status detail` to see more
details about which PGs were updated in the balancer's last optimization.
See https://docs.ceph.com/en/latest/rados/operations/balancer/ for more information.
* CephFS: For clusters with multiple CephFS file systems, all the snap-schedule
commands now expect the '--fs' argument.
* CephFS: The period specifier ``m`` now implies minutes and the period specifier
``M`` now implies months. This has been made consistent with the rest
of the system.
* CephFS: Full support for subvolumes and subvolume groups is now available
for the snap_schedule Manager module.
* CephFS: The `subvolume snapshot clone` command now depends on the config option
`snapshot_clone_no_wait` which is used to reject the clone operation when
all the cloner threads are busy. This config option is enabled by default, which means
that if no cloner threads are free, the clone request errors out with EAGAIN.
The value of the config option can be fetched by using:
`ceph config get mgr mgr/volumes/snapshot_clone_no_wait`
and it can be disabled by using:
`ceph config set mgr mgr/volumes/snapshot_clone_no_wait false`
* CephFS: fixes to the implementation of the ``root_squash`` mechanism enabled
via cephx ``mds`` caps on a client credential require a new client feature
bit, ``client_mds_auth_caps``. Clients using credentials with ``root_squash``
without this feature will trigger the MDS to raise a HEALTH_ERR on the
cluster, MDS_CLIENTS_BROKEN_ROOTSQUASH. See the documentation on this warning
and the new feature bit for more information.
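To follow the HeadObject suggestion in the object-lock note above, a rough Python sketch using boto3; the endpoint, credentials, and bucket name are placeholders, and the retain-until date must be compared against the date you intended to set.

import boto3

s3 = boto3.client('s3',
                  endpoint_url='http://rgw.example.com:8080',  # placeholder
                  aws_access_key_id='ACCESS',                  # placeholder
                  aws_secret_access_key='SECRET')              # placeholder

paginator = s3.get_paginator('list_objects_v2')
for page in paginator.paginate(Bucket='locked-bucket'):        # placeholder
    for obj in page.get('Contents', []):
        head = s3.head_object(Bucket='locked-bucket', Key=obj['Key'])
        retain = head.get('ObjectLockRetainUntilDate')
        if retain is not None:
            # A truncated (wrapped) value will be decades too early for
            # locks that were meant to extend past 2106.
            print(obj['Key'], retain.isoformat())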
>=18.0.0
@ -54,6 +114,10 @@
mirroring policies between RGW and AWS, you may wish to set
"rgw policy reject invalid principals" to "false". This affects only newly set
policies, not policies that are already in place.
* The CephFS automatic metadata load (sometimes called "default") balancer is
now disabled by default. The new file system flag `balance_automate`
can be used to toggle it on or off. It can be enabled or disabled via
`ceph fs set <fs_name> balance_automate <bool>`.
* RGW's default backend for `rgw_enable_ops_log` changed from RADOS to file.
The default value of `rgw_ops_log_rados` is now false, and `rgw_ops_log_file_path`
defaults to "/var/log/ceph/ops-log-$cluster-$name.log".
@ -226,6 +290,11 @@
than the number mentioned against the config tunable `mds_max_snaps_per_dir`
so that a new snapshot can be created and retained during the next schedule
run.
* `ceph config dump --format <json|xml>` output will display the localized
option names instead of their normalized versions. For example,
"mgr/prometheus/x/server_port" will be displayed instead of
"mgr/prometheus/server_port". This matches the output of the non-pretty-printed
version of the command (see the sketch after this list).
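A small sketch of consuming that JSON output from Python, assuming the `ceph` CLI is on PATH and that each dump entry carries `section`/`name`/`value` keys (the key names are an assumption here, not taken from this commit):

import json
import subprocess

out = subprocess.run(['ceph', 'config', 'dump', '--format', 'json'],
                     check=True, capture_output=True, text=True).stdout
for entry in json.loads(out):
    # 'name' now shows the localized option name, e.g. "mgr/prometheus/x/server_port"
    print(entry['section'], entry['name'], entry.get('value'))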
>=17.2.1
@ -291,3 +360,14 @@ Relevant tracker: https://tracker.ceph.com/issues/55715
request from client(s). This can be useful during some recovery situations
where it's desirable to bring MDS up but have no client workload.
Relevant tracker: https://tracker.ceph.com/issues/57090
* New MDSMap field `max_xattr_size`, which can be set using the `fs set` command.
This MDSMap field configures the maximum size allowed for the full
key/value set of a file system's extended attributes. It effectively replaces
the old per-MDS `max_xattr_pairs_size` setting, which is now dropped.
Relevant tracker: https://tracker.ceph.com/issues/55725
* Introduced a new file system flag `refuse_standby_for_another_fs` that can be
set using the `fs set` command. This flag prevents using a standby for another
file system (join_fs = X) when a standby for the current file system is not
available (see the `fs set` sketch after this list).
Relevant tracker: https://tracker.ceph.com/issues/61599
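A minimal sketch of setting the two `fs set` fields above from Python, assuming the `ceph` CLI is available and a file system named `cephfs`; the 65536-byte value and the boolean form `true` are examples only, not defaults taken from this commit.

import subprocess

def fs_set(fs_name, var, val):
    # `ceph fs set <fs_name> <var> <val>` as described in the notes above.
    subprocess.run(['ceph', 'fs', 'set', fs_name, var, str(val)], check=True)

fs_set('cephfs', 'max_xattr_size', 65536)                  # example size in bytes
fs_set('cephfs', 'refuse_standby_for_another_fs', 'true')  # example boolean form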

View File

@ -1,4 +1,4 @@
Sphinx == 4.5.0
Sphinx == 5.0.2
git+https://github.com/ceph/sphinx-ditaa.git@py3#egg=sphinx-ditaa
git+https://github.com/vlasovskikh/funcparserlib.git
breathe >= 4.20.0,!=4.33

View File

@ -1,6 +1,6 @@
ceph-menv
Environment assistant for use in conjuction with multiple ceph vstart (or more accurately mstart) clusters. Eliminates the need to specify the cluster that is being used with each and every command. Can provide a shell prompt feedback about the currently used cluster.
Environment assistant for use in conjunction with multiple Ceph vstart (or more accurately mstart) clusters. Eliminates the need to specify the cluster that is being used with each and every command. Can provide a shell prompt feedback about the currently used cluster.
Usage:

View File

@ -35,8 +35,8 @@
%else
%bcond_with rbd_rwl_cache
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?rhel} < 9
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%if 0%{?rhel} < 9 || 0%{?openEuler}
%bcond_with system_pmdk
%else
%ifarch s390x aarch64
@ -93,7 +93,7 @@
%endif
%endif
%bcond_with seastar
%if 0%{?suse_version}
%if 0%{?suse_version} || 0%{?openEuler}
%bcond_with jaeger
%else
%bcond_without jaeger
@ -112,7 +112,7 @@
# this is tracked in https://bugzilla.redhat.com/2152265
%bcond_with system_arrow
%endif
%if 0%{?fedora} || 0%{?suse_version} || 0%{?rhel} >= 8
%if 0%{?fedora} || 0%{?suse_version} || 0%{?rhel} >= 8 || 0%{?openEuler}
%global weak_deps 1
%endif
%if %{with selinux}
@ -170,7 +170,7 @@
# main package definition
#################################################################################
Name: ceph
Version: 18.2.2
Version: 18.2.4
Release: 0%{?dist}
%if 0%{?fedora} || 0%{?rhel}
Epoch: 2
@ -186,7 +186,7 @@ License: LGPL-2.1 and LGPL-3.0 and CC-BY-SA-3.0 and GPL-2.0 and BSL-1.0 and BSD-
Group: System/Filesystems
%endif
URL: http://ceph.com/
Source0: %{?_remote_tarball_prefix}ceph-18.2.2.tar.bz2
Source0: %{?_remote_tarball_prefix}ceph-18.2.4.tar.bz2
%if 0%{?suse_version}
# _insert_obs_source_lines_here
ExclusiveArch: x86_64 aarch64 ppc64le s390x
@ -211,7 +211,7 @@ BuildRequires: selinux-policy-devel
BuildRequires: gperf
BuildRequires: cmake > 3.5
BuildRequires: fuse-devel
%if 0%{?fedora} || 0%{?suse_version} > 1500 || 0%{?rhel} == 9
%if 0%{?fedora} || 0%{?suse_version} > 1500 || 0%{?rhel} == 9 || 0%{?openEuler}
BuildRequires: gcc-c++ >= 11
%endif
%if 0%{?suse_version} == 1500
@ -222,12 +222,12 @@ BuildRequires: %{gts_prefix}-gcc-c++
BuildRequires: %{gts_prefix}-build
BuildRequires: %{gts_prefix}-libatomic-devel
%endif
%if 0%{?fedora} || 0%{?rhel} == 9
%if 0%{?fedora} || 0%{?rhel} == 9 || 0%{?openEuler}
BuildRequires: libatomic
%endif
%if 0%{with tcmalloc}
# libprofiler did not build on ppc64le until 2.7.90
%if 0%{?fedora} || 0%{?rhel} >= 8
%if 0%{?fedora} || 0%{?rhel} >= 8 || 0%{?openEuler}
BuildRequires: gperftools-devel >= 2.7.90
%endif
%if 0%{?rhel} && 0%{?rhel} < 8
@ -379,7 +379,7 @@ BuildRequires: liblz4-devel >= 1.7
BuildRequires: golang-github-prometheus-prometheus
BuildRequires: jsonnet
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
Requires: systemd
BuildRequires: boost-random
BuildRequires: nss-devel
@ -401,7 +401,7 @@ BuildRequires: lz4-devel >= 1.7
# distro-conditional make check dependencies
%if 0%{with make_check}
BuildRequires: golang
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
BuildRequires: golang-github-prometheus
BuildRequires: libtool-ltdl-devel
BuildRequires: xmlsec1
@ -412,7 +412,6 @@ BuildRequires: xmlsec1-nss
BuildRequires: xmlsec1-openssl
BuildRequires: xmlsec1-openssl-devel
BuildRequires: python%{python3_pkgversion}-cherrypy
BuildRequires: python%{python3_pkgversion}-jwt
BuildRequires: python%{python3_pkgversion}-routes
BuildRequires: python%{python3_pkgversion}-scipy
BuildRequires: python%{python3_pkgversion}-werkzeug
@ -425,7 +424,6 @@ BuildRequires: libxmlsec1-1
BuildRequires: libxmlsec1-nss1
BuildRequires: libxmlsec1-openssl1
BuildRequires: python%{python3_pkgversion}-CherryPy
BuildRequires: python%{python3_pkgversion}-PyJWT
BuildRequires: python%{python3_pkgversion}-Routes
BuildRequires: python%{python3_pkgversion}-Werkzeug
BuildRequires: python%{python3_pkgversion}-numpy-devel
@ -435,7 +433,7 @@ BuildRequires: xmlsec1-openssl-devel
%endif
# lttng and babeltrace for rbd-replay-prep
%if %{with lttng}
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
BuildRequires: lttng-ust-devel
BuildRequires: libbabeltrace-devel
%endif
@ -447,15 +445,18 @@ BuildRequires: babeltrace-devel
%if 0%{?suse_version}
BuildRequires: libexpat-devel
%endif
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
BuildRequires: expat-devel
%endif
#hardened-cc1
%if 0%{?fedora} || 0%{?rhel}
BuildRequires: redhat-rpm-config
%endif
%if 0%{?openEuler}
BuildRequires: openEuler-rpm-config
%endif
%if 0%{with seastar}
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
BuildRequires: cryptopp-devel
BuildRequires: numactl-devel
%endif
@ -543,7 +544,7 @@ Requires: python%{python3_pkgversion}-cephfs = %{_epoch_prefix}%{version}-%{rele
Requires: python%{python3_pkgversion}-rgw = %{_epoch_prefix}%{version}-%{release}
Requires: python%{python3_pkgversion}-ceph-argparse = %{_epoch_prefix}%{version}-%{release}
Requires: python%{python3_pkgversion}-ceph-common = %{_epoch_prefix}%{version}-%{release}
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
Requires: python%{python3_pkgversion}-prettytable
%endif
%if 0%{?suse_version}
@ -615,9 +616,8 @@ Requires: ceph-mgr = %{_epoch_prefix}%{version}-%{release}
Requires: ceph-grafana-dashboards = %{_epoch_prefix}%{version}-%{release}
Requires: ceph-prometheus-alerts = %{_epoch_prefix}%{version}-%{release}
Requires: python%{python3_pkgversion}-setuptools
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
Requires: python%{python3_pkgversion}-cherrypy
Requires: python%{python3_pkgversion}-jwt
Requires: python%{python3_pkgversion}-routes
Requires: python%{python3_pkgversion}-werkzeug
%if 0%{?weak_deps}
@ -626,7 +626,6 @@ Recommends: python%{python3_pkgversion}-saml
%endif
%if 0%{?suse_version}
Requires: python%{python3_pkgversion}-CherryPy
Requires: python%{python3_pkgversion}-PyJWT
Requires: python%{python3_pkgversion}-Routes
Requires: python%{python3_pkgversion}-Werkzeug
Recommends: python%{python3_pkgversion}-python3-saml
@ -645,7 +644,7 @@ Group: System/Filesystems
%endif
Requires: ceph-mgr = %{_epoch_prefix}%{version}-%{release}
Requires: python%{python3_pkgversion}-numpy
%if 0%{?fedora} || 0%{?suse_version}
%if 0%{?fedora} || 0%{?suse_version} || 0%{?openEuler}
Requires: python%{python3_pkgversion}-scikit-learn
%endif
Requires: python3-scipy
@ -665,7 +664,7 @@ Requires: python%{python3_pkgversion}-pyOpenSSL
Requires: python%{python3_pkgversion}-requests
Requires: python%{python3_pkgversion}-dateutil
Requires: python%{python3_pkgversion}-setuptools
%if 0%{?fedora} || 0%{?rhel} >= 8
%if 0%{?fedora} || 0%{?rhel} >= 8 || 0%{?openEuler}
Requires: python%{python3_pkgversion}-cherrypy
Requires: python%{python3_pkgversion}-pyyaml
Requires: python%{python3_pkgversion}-werkzeug
@ -722,7 +721,7 @@ Requires: openssh
Requires: python%{python3_pkgversion}-CherryPy
Requires: python%{python3_pkgversion}-Jinja2
%endif
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Requires: openssh-clients
Requires: python%{python3_pkgversion}-cherrypy
Requires: python%{python3_pkgversion}-jinja2
@ -814,7 +813,7 @@ Requires: ceph-selinux = %{_epoch_prefix}%{version}-%{release}
%endif
Requires: librados2 = %{_epoch_prefix}%{version}-%{release}
Requires: librgw2 = %{_epoch_prefix}%{version}-%{release}
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Requires: mailcap
%endif
%if 0%{?weak_deps}
@ -894,6 +893,7 @@ Requires: parted
Requires: util-linux
Requires: xfsprogs
Requires: python%{python3_pkgversion}-setuptools
Requires: python%{python3_pkgversion}-packaging
Requires: python%{python3_pkgversion}-ceph-common = %{_epoch_prefix}%{version}-%{release}
%description volume
This package contains a tool to deploy OSD with different devices like
@ -905,7 +905,7 @@ Summary: RADOS distributed object store client library
%if 0%{?suse_version}
Group: System/Libraries
%endif
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Obsoletes: ceph-libs < %{_epoch_prefix}%{version}-%{release}
%endif
%description -n librados2
@ -1052,7 +1052,7 @@ Requires: librados2 = %{_epoch_prefix}%{version}-%{release}
%if 0%{?suse_version}
Requires(post): coreutils
%endif
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Obsoletes: ceph-libs < %{_epoch_prefix}%{version}-%{release}
%endif
%description -n librbd1
@ -1096,7 +1096,7 @@ Summary: Ceph distributed file system client library
Group: System/Libraries
%endif
Obsoletes: libcephfs1 < %{_epoch_prefix}%{version}-%{release}
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Obsoletes: ceph-libs < %{_epoch_prefix}%{version}-%{release}
Obsoletes: ceph-libcephfs
%endif
@ -1149,7 +1149,7 @@ descriptions, and submitting the command to the appropriate daemon.
%package -n python%{python3_pkgversion}-ceph-common
Summary: Python 3 utility libraries for Ceph
%if 0%{?fedora} || 0%{?rhel} >= 8
%if 0%{?fedora} || 0%{?rhel} >= 8 || 0%{?openEuler}
Requires: python%{python3_pkgversion}-pyyaml
%endif
%if 0%{?suse_version}
@ -1288,11 +1288,20 @@ Group: System/Monitoring
%description mib
This package provides a Ceph MIB for SNMP traps.
%package node-proxy
Summary: hw monitoring agent for Ceph
BuildArch: noarch
%if 0%{?suse_version}
Group: System/Monitoring
%endif
%description node-proxy
This package provides a Ceph hardware monitoring agent.
#################################################################################
# common
#################################################################################
%prep
%autosetup -p1 -n ceph-18.2.2
%autosetup -p1 -n ceph-18.2.4
%build
# Disable lto on systems that do not support symver attribute
@ -1467,7 +1476,7 @@ install -m 0755 %{buildroot}%{_bindir}/crimson-osd %{buildroot}%{_bindir}/ceph-o
%endif
install -m 0644 -D src/etc-rbdmap %{buildroot}%{_sysconfdir}/ceph/rbdmap
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
install -m 0644 -D etc/sysconfig/ceph %{buildroot}%{_sysconfdir}/sysconfig/ceph
%endif
%if 0%{?suse_version}
@ -1501,7 +1510,7 @@ install -m 0644 -D udev/50-rbd.rules %{buildroot}%{_udevrulesdir}/50-rbd.rules
# sudoers.d
install -m 0440 -D sudoers.d/ceph-smartctl %{buildroot}%{_sysconfdir}/sudoers.d/ceph-smartctl
%if 0%{?rhel} >= 8
%if 0%{?rhel} >= 8 || 0%{?openEuler}
pathfix.py -pni "%{__python3} %{py3_shbang_opts}" %{buildroot}%{_bindir}/*
pathfix.py -pni "%{__python3} %{py3_shbang_opts}" %{buildroot}%{_sbindir}/*
%endif
@ -1538,7 +1547,7 @@ install -m 644 -D -t %{buildroot}%{_datadir}/snmp/mibs monitoring/snmp/CEPH-MIB.
%fdupes %{buildroot}%{_prefix}
%endif
%if 0%{?rhel} == 8
%if 0%{?rhel} == 8 || 0%{?openEuler}
%py_byte_compile %{__python3} %{buildroot}%{python3_sitelib}
%endif
@ -1581,7 +1590,7 @@ rm -rf %{_vpath_builddir}
%{_libdir}/libosd_tp.so*
%endif
%config(noreplace) %{_sysconfdir}/logrotate.d/ceph
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%config(noreplace) %{_sysconfdir}/sysconfig/ceph
%endif
%if 0%{?suse_version}
@ -1614,7 +1623,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph.target ceph-crash.service >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph.target ceph-crash.service
%endif
if [ $1 -eq 1 ] ; then
@ -1625,7 +1634,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph.target ceph-crash.service
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph.target ceph-crash.service
%endif
@ -1722,7 +1731,7 @@ exit 0
%pre common
CEPH_GROUP_ID=167
CEPH_USER_ID=167
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
/usr/sbin/groupadd ceph -g $CEPH_GROUP_ID -o -r 2>/dev/null || :
/usr/sbin/useradd ceph -u $CEPH_USER_ID -o -r -g ceph -s /sbin/nologin -c "Ceph daemons" -d %{_localstatedir}/lib/ceph 2>/dev/null || :
%endif
@ -1768,7 +1777,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-mds@\*.service ceph-mds.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-mds@\*.service ceph-mds.target
%endif
if [ $1 -eq 1 ] ; then
@ -1779,7 +1788,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-mds@\*.service ceph-mds.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-mds@\*.service ceph-mds.target
%endif
@ -1813,7 +1822,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-mgr@\*.service ceph-mgr.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-mgr@\*.service ceph-mgr.target
%endif
if [ $1 -eq 1 ] ; then
@ -1824,7 +1833,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-mgr@\*.service ceph-mgr.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-mgr@\*.service ceph-mgr.target
%endif
@ -1953,7 +1962,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-mon@\*.service ceph-mon.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-mon@\*.service ceph-mon.target
%endif
if [ $1 -eq 1 ] ; then
@ -1964,7 +1973,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-mon@\*.service ceph-mon.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-mon@\*.service ceph-mon.target
%endif
@ -2002,7 +2011,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset cephfs-mirror@\*.service cephfs-mirror.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post cephfs-mirror@\*.service cephfs-mirror.target
%endif
if [ $1 -eq 1 ] ; then
@ -2013,7 +2022,7 @@ fi
%if 0%{?suse_version}
%service_del_preun cephfs-mirror@\*.service cephfs-mirror.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun cephfs-mirror@\*.service cephfs-mirror.target
%endif
@ -2033,6 +2042,7 @@ fi
%files -n ceph-exporter
%{_bindir}/ceph-exporter
%{_unitdir}/ceph-exporter.service
%files -n rbd-fuse
%{_bindir}/rbd-fuse
@ -2050,7 +2060,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-rbd-mirror@\*.service ceph-rbd-mirror.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-rbd-mirror@\*.service ceph-rbd-mirror.target
%endif
if [ $1 -eq 1 ] ; then
@ -2061,7 +2071,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-rbd-mirror@\*.service ceph-rbd-mirror.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-rbd-mirror@\*.service ceph-rbd-mirror.target
%endif
@ -2091,7 +2101,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-immutable-object-cache@\*.service ceph-immutable-object-cache.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-immutable-object-cache@\*.service ceph-immutable-object-cache.target
%endif
if [ $1 -eq 1 ] ; then
@ -2102,7 +2112,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-immutable-object-cache@\*.service ceph-immutable-object-cache.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-immutable-object-cache@\*.service ceph-immutable-object-cache.target
%endif
@ -2145,7 +2155,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-radosgw@\*.service ceph-radosgw.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-radosgw@\*.service ceph-radosgw.target
%endif
if [ $1 -eq 1 ] ; then
@ -2156,7 +2166,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-radosgw@\*.service ceph-radosgw.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-radosgw@\*.service ceph-radosgw.target
%endif
@ -2196,7 +2206,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-osd@\*.service ceph-osd.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-osd@\*.service ceph-osd.target
%endif
if [ $1 -eq 1 ] ; then
@ -2212,7 +2222,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-osd@\*.service ceph-osd.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-osd@\*.service ceph-osd.target
%endif
@ -2251,7 +2261,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-volume@\*.service >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-volume@\*.service
%endif
@ -2259,7 +2269,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-volume@\*.service
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-volume@\*.service
%endif
@ -2620,4 +2630,10 @@ exit 0
%attr(0755,root,root) %dir %{_datadir}/snmp
%{_datadir}/snmp/mibs
%files node-proxy
%{_sbindir}/ceph-node-proxy
%dir %{python3_sitelib}/ceph_node_proxy
%{python3_sitelib}/ceph_node_proxy/*
%{python3_sitelib}/ceph_node_proxy-*
%changelog

View File

@ -35,8 +35,8 @@
%else
%bcond_with rbd_rwl_cache
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?rhel} < 9
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%if 0%{?rhel} < 9 || 0%{?openEuler}
%bcond_with system_pmdk
%else
%ifarch s390x aarch64
@ -93,7 +93,7 @@
%endif
%endif
%bcond_with seastar
%if 0%{?suse_version}
%if 0%{?suse_version} || 0%{?openEuler}
%bcond_with jaeger
%else
%bcond_without jaeger
@ -112,7 +112,7 @@
# this is tracked in https://bugzilla.redhat.com/2152265
%bcond_with system_arrow
%endif
%if 0%{?fedora} || 0%{?suse_version} || 0%{?rhel} >= 8
%if 0%{?fedora} || 0%{?suse_version} || 0%{?rhel} >= 8 || 0%{?openEuler}
%global weak_deps 1
%endif
%if %{with selinux}
@ -211,7 +211,7 @@ BuildRequires: selinux-policy-devel
BuildRequires: gperf
BuildRequires: cmake > 3.5
BuildRequires: fuse-devel
%if 0%{?fedora} || 0%{?suse_version} > 1500 || 0%{?rhel} == 9
%if 0%{?fedora} || 0%{?suse_version} > 1500 || 0%{?rhel} == 9 || 0%{?openEuler}
BuildRequires: gcc-c++ >= 11
%endif
%if 0%{?suse_version} == 1500
@ -222,12 +222,12 @@ BuildRequires: %{gts_prefix}-gcc-c++
BuildRequires: %{gts_prefix}-build
BuildRequires: %{gts_prefix}-libatomic-devel
%endif
%if 0%{?fedora} || 0%{?rhel} == 9
%if 0%{?fedora} || 0%{?rhel} == 9 || 0%{?openEuler}
BuildRequires: libatomic
%endif
%if 0%{with tcmalloc}
# libprofiler did not build on ppc64le until 2.7.90
%if 0%{?fedora} || 0%{?rhel} >= 8
%if 0%{?fedora} || 0%{?rhel} >= 8 || 0%{?openEuler}
BuildRequires: gperftools-devel >= 2.7.90
%endif
%if 0%{?rhel} && 0%{?rhel} < 8
@ -379,7 +379,7 @@ BuildRequires: liblz4-devel >= 1.7
BuildRequires: golang-github-prometheus-prometheus
BuildRequires: jsonnet
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
Requires: systemd
BuildRequires: boost-random
BuildRequires: nss-devel
@ -401,7 +401,7 @@ BuildRequires: lz4-devel >= 1.7
# distro-conditional make check dependencies
%if 0%{with make_check}
BuildRequires: golang
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
BuildRequires: golang-github-prometheus
BuildRequires: libtool-ltdl-devel
BuildRequires: xmlsec1
@ -412,7 +412,6 @@ BuildRequires: xmlsec1-nss
BuildRequires: xmlsec1-openssl
BuildRequires: xmlsec1-openssl-devel
BuildRequires: python%{python3_pkgversion}-cherrypy
BuildRequires: python%{python3_pkgversion}-jwt
BuildRequires: python%{python3_pkgversion}-routes
BuildRequires: python%{python3_pkgversion}-scipy
BuildRequires: python%{python3_pkgversion}-werkzeug
@ -425,7 +424,6 @@ BuildRequires: libxmlsec1-1
BuildRequires: libxmlsec1-nss1
BuildRequires: libxmlsec1-openssl1
BuildRequires: python%{python3_pkgversion}-CherryPy
BuildRequires: python%{python3_pkgversion}-PyJWT
BuildRequires: python%{python3_pkgversion}-Routes
BuildRequires: python%{python3_pkgversion}-Werkzeug
BuildRequires: python%{python3_pkgversion}-numpy-devel
@ -435,7 +433,7 @@ BuildRequires: xmlsec1-openssl-devel
%endif
# lttng and babeltrace for rbd-replay-prep
%if %{with lttng}
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
BuildRequires: lttng-ust-devel
BuildRequires: libbabeltrace-devel
%endif
@ -447,15 +445,18 @@ BuildRequires: babeltrace-devel
%if 0%{?suse_version}
BuildRequires: libexpat-devel
%endif
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
BuildRequires: expat-devel
%endif
#hardened-cc1
%if 0%{?fedora} || 0%{?rhel}
BuildRequires: redhat-rpm-config
%endif
%if 0%{?openEuler}
BuildRequires: openEuler-rpm-config
%endif
%if 0%{with seastar}
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
BuildRequires: cryptopp-devel
BuildRequires: numactl-devel
%endif
@ -543,7 +544,7 @@ Requires: python%{python3_pkgversion}-cephfs = %{_epoch_prefix}%{version}-%{rele
Requires: python%{python3_pkgversion}-rgw = %{_epoch_prefix}%{version}-%{release}
Requires: python%{python3_pkgversion}-ceph-argparse = %{_epoch_prefix}%{version}-%{release}
Requires: python%{python3_pkgversion}-ceph-common = %{_epoch_prefix}%{version}-%{release}
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
Requires: python%{python3_pkgversion}-prettytable
%endif
%if 0%{?suse_version}
@ -615,9 +616,8 @@ Requires: ceph-mgr = %{_epoch_prefix}%{version}-%{release}
Requires: ceph-grafana-dashboards = %{_epoch_prefix}%{version}-%{release}
Requires: ceph-prometheus-alerts = %{_epoch_prefix}%{version}-%{release}
Requires: python%{python3_pkgversion}-setuptools
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
Requires: python%{python3_pkgversion}-cherrypy
Requires: python%{python3_pkgversion}-jwt
Requires: python%{python3_pkgversion}-routes
Requires: python%{python3_pkgversion}-werkzeug
%if 0%{?weak_deps}
@ -626,7 +626,6 @@ Recommends: python%{python3_pkgversion}-saml
%endif
%if 0%{?suse_version}
Requires: python%{python3_pkgversion}-CherryPy
Requires: python%{python3_pkgversion}-PyJWT
Requires: python%{python3_pkgversion}-Routes
Requires: python%{python3_pkgversion}-Werkzeug
Recommends: python%{python3_pkgversion}-python3-saml
@ -645,7 +644,7 @@ Group: System/Filesystems
%endif
Requires: ceph-mgr = %{_epoch_prefix}%{version}-%{release}
Requires: python%{python3_pkgversion}-numpy
%if 0%{?fedora} || 0%{?suse_version}
%if 0%{?fedora} || 0%{?suse_version} || 0%{?openEuler}
Requires: python%{python3_pkgversion}-scikit-learn
%endif
Requires: python3-scipy
@ -665,7 +664,7 @@ Requires: python%{python3_pkgversion}-pyOpenSSL
Requires: python%{python3_pkgversion}-requests
Requires: python%{python3_pkgversion}-dateutil
Requires: python%{python3_pkgversion}-setuptools
%if 0%{?fedora} || 0%{?rhel} >= 8
%if 0%{?fedora} || 0%{?rhel} >= 8 || 0%{?openEuler}
Requires: python%{python3_pkgversion}-cherrypy
Requires: python%{python3_pkgversion}-pyyaml
Requires: python%{python3_pkgversion}-werkzeug
@ -722,7 +721,7 @@ Requires: openssh
Requires: python%{python3_pkgversion}-CherryPy
Requires: python%{python3_pkgversion}-Jinja2
%endif
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Requires: openssh-clients
Requires: python%{python3_pkgversion}-cherrypy
Requires: python%{python3_pkgversion}-jinja2
@ -814,7 +813,7 @@ Requires: ceph-selinux = %{_epoch_prefix}%{version}-%{release}
%endif
Requires: librados2 = %{_epoch_prefix}%{version}-%{release}
Requires: librgw2 = %{_epoch_prefix}%{version}-%{release}
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Requires: mailcap
%endif
%if 0%{?weak_deps}
@ -894,6 +893,7 @@ Requires: parted
Requires: util-linux
Requires: xfsprogs
Requires: python%{python3_pkgversion}-setuptools
Requires: python%{python3_pkgversion}-packaging
Requires: python%{python3_pkgversion}-ceph-common = %{_epoch_prefix}%{version}-%{release}
%description volume
This package contains a tool to deploy OSD with different devices like
@ -905,7 +905,7 @@ Summary: RADOS distributed object store client library
%if 0%{?suse_version}
Group: System/Libraries
%endif
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Obsoletes: ceph-libs < %{_epoch_prefix}%{version}-%{release}
%endif
%description -n librados2
@ -1052,7 +1052,7 @@ Requires: librados2 = %{_epoch_prefix}%{version}-%{release}
%if 0%{?suse_version}
Requires(post): coreutils
%endif
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Obsoletes: ceph-libs < %{_epoch_prefix}%{version}-%{release}
%endif
%description -n librbd1
@ -1096,7 +1096,7 @@ Summary: Ceph distributed file system client library
Group: System/Libraries
%endif
Obsoletes: libcephfs1 < %{_epoch_prefix}%{version}-%{release}
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
Obsoletes: ceph-libs < %{_epoch_prefix}%{version}-%{release}
Obsoletes: ceph-libcephfs
%endif
@ -1149,7 +1149,7 @@ descriptions, and submitting the command to the appropriate daemon.
%package -n python%{python3_pkgversion}-ceph-common
Summary: Python 3 utility libraries for Ceph
%if 0%{?fedora} || 0%{?rhel} >= 8
%if 0%{?fedora} || 0%{?rhel} >= 8 || 0%{?openEuler}
Requires: python%{python3_pkgversion}-pyyaml
%endif
%if 0%{?suse_version}
@ -1288,6 +1288,15 @@ Group: System/Monitoring
%description mib
This package provides a Ceph MIB for SNMP traps.
%package node-proxy
Summary: hw monitoring agent for Ceph
BuildArch: noarch
%if 0%{?suse_version}
Group: System/Monitoring
%endif
%description node-proxy
This package provides a Ceph hardware monitoring agent.
#################################################################################
# common
#################################################################################
@ -1467,7 +1476,7 @@ install -m 0755 %{buildroot}%{_bindir}/crimson-osd %{buildroot}%{_bindir}/ceph-o
%endif
install -m 0644 -D src/etc-rbdmap %{buildroot}%{_sysconfdir}/ceph/rbdmap
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
install -m 0644 -D etc/sysconfig/ceph %{buildroot}%{_sysconfdir}/sysconfig/ceph
%endif
%if 0%{?suse_version}
@ -1501,7 +1510,7 @@ install -m 0644 -D udev/50-rbd.rules %{buildroot}%{_udevrulesdir}/50-rbd.rules
# sudoers.d
install -m 0440 -D sudoers.d/ceph-smartctl %{buildroot}%{_sysconfdir}/sudoers.d/ceph-smartctl
%if 0%{?rhel} >= 8
%if 0%{?rhel} >= 8 || 0%{?openEuler}
pathfix.py -pni "%{__python3} %{py3_shbang_opts}" %{buildroot}%{_bindir}/*
pathfix.py -pni "%{__python3} %{py3_shbang_opts}" %{buildroot}%{_sbindir}/*
%endif
@ -1538,7 +1547,7 @@ install -m 644 -D -t %{buildroot}%{_datadir}/snmp/mibs monitoring/snmp/CEPH-MIB.
%fdupes %{buildroot}%{_prefix}
%endif
%if 0%{?rhel} == 8
%if 0%{?rhel} == 8 || 0%{?openEuler}
%py_byte_compile %{__python3} %{buildroot}%{python3_sitelib}
%endif
@ -1581,7 +1590,7 @@ rm -rf %{_vpath_builddir}
%{_libdir}/libosd_tp.so*
%endif
%config(noreplace) %{_sysconfdir}/logrotate.d/ceph
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%config(noreplace) %{_sysconfdir}/sysconfig/ceph
%endif
%if 0%{?suse_version}
@ -1614,7 +1623,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph.target ceph-crash.service >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph.target ceph-crash.service
%endif
if [ $1 -eq 1 ] ; then
@ -1625,7 +1634,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph.target ceph-crash.service
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph.target ceph-crash.service
%endif
@ -1722,7 +1731,7 @@ exit 0
%pre common
CEPH_GROUP_ID=167
CEPH_USER_ID=167
%if 0%{?rhel} || 0%{?fedora}
%if 0%{?rhel} || 0%{?fedora} || 0%{?openEuler}
/usr/sbin/groupadd ceph -g $CEPH_GROUP_ID -o -r 2>/dev/null || :
/usr/sbin/useradd ceph -u $CEPH_USER_ID -o -r -g ceph -s /sbin/nologin -c "Ceph daemons" -d %{_localstatedir}/lib/ceph 2>/dev/null || :
%endif
@ -1768,7 +1777,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-mds@\*.service ceph-mds.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-mds@\*.service ceph-mds.target
%endif
if [ $1 -eq 1 ] ; then
@ -1779,7 +1788,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-mds@\*.service ceph-mds.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-mds@\*.service ceph-mds.target
%endif
@ -1813,7 +1822,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-mgr@\*.service ceph-mgr.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-mgr@\*.service ceph-mgr.target
%endif
if [ $1 -eq 1 ] ; then
@ -1824,7 +1833,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-mgr@\*.service ceph-mgr.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-mgr@\*.service ceph-mgr.target
%endif
@ -1953,7 +1962,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-mon@\*.service ceph-mon.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-mon@\*.service ceph-mon.target
%endif
if [ $1 -eq 1 ] ; then
@ -1964,7 +1973,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-mon@\*.service ceph-mon.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-mon@\*.service ceph-mon.target
%endif
@ -2002,7 +2011,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset cephfs-mirror@\*.service cephfs-mirror.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post cephfs-mirror@\*.service cephfs-mirror.target
%endif
if [ $1 -eq 1 ] ; then
@ -2013,7 +2022,7 @@ fi
%if 0%{?suse_version}
%service_del_preun cephfs-mirror@\*.service cephfs-mirror.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun cephfs-mirror@\*.service cephfs-mirror.target
%endif
@ -2033,6 +2042,7 @@ fi
%files -n ceph-exporter
%{_bindir}/ceph-exporter
%{_unitdir}/ceph-exporter.service
%files -n rbd-fuse
%{_bindir}/rbd-fuse
@ -2050,7 +2060,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-rbd-mirror@\*.service ceph-rbd-mirror.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-rbd-mirror@\*.service ceph-rbd-mirror.target
%endif
if [ $1 -eq 1 ] ; then
@ -2061,7 +2071,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-rbd-mirror@\*.service ceph-rbd-mirror.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-rbd-mirror@\*.service ceph-rbd-mirror.target
%endif
@ -2091,7 +2101,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-immutable-object-cache@\*.service ceph-immutable-object-cache.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-immutable-object-cache@\*.service ceph-immutable-object-cache.target
%endif
if [ $1 -eq 1 ] ; then
@ -2102,7 +2112,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-immutable-object-cache@\*.service ceph-immutable-object-cache.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-immutable-object-cache@\*.service ceph-immutable-object-cache.target
%endif
@ -2145,7 +2155,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-radosgw@\*.service ceph-radosgw.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-radosgw@\*.service ceph-radosgw.target
%endif
if [ $1 -eq 1 ] ; then
@ -2156,7 +2166,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-radosgw@\*.service ceph-radosgw.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-radosgw@\*.service ceph-radosgw.target
%endif
@ -2196,7 +2206,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-osd@\*.service ceph-osd.target >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-osd@\*.service ceph-osd.target
%endif
if [ $1 -eq 1 ] ; then
@ -2212,7 +2222,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-osd@\*.service ceph-osd.target
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-osd@\*.service ceph-osd.target
%endif
@ -2251,7 +2261,7 @@ if [ $1 -eq 1 ] ; then
/usr/bin/systemctl preset ceph-volume@\*.service >/dev/null 2>&1 || :
fi
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_post ceph-volume@\*.service
%endif
@ -2259,7 +2269,7 @@ fi
%if 0%{?suse_version}
%service_del_preun ceph-volume@\*.service
%endif
%if 0%{?fedora} || 0%{?rhel}
%if 0%{?fedora} || 0%{?rhel} || 0%{?openEuler}
%systemd_preun ceph-volume@\*.service
%endif
@ -2620,4 +2630,10 @@ exit 0
%attr(0755,root,root) %dir %{_datadir}/snmp
%{_datadir}/snmp/mibs
%files node-proxy
%{_sbindir}/ceph-node-proxy
%dir %{python3_sitelib}/ceph_node_proxy
%{python3_sitelib}/ceph_node_proxy/*
%{python3_sitelib}/ceph_node_proxy-*
%changelog

View File

@ -1,7 +1,13 @@
ceph (18.2.2-1jammy) jammy; urgency=medium
ceph (18.2.4-1jammy) jammy; urgency=medium
-- Jenkins Build Slave User <jenkins-build@braggi10.front.sepia.ceph.com> Mon, 04 Mar 2024 20:27:31 +0000
-- Jenkins Build Slave User <jenkins-build@braggi02.front.sepia.ceph.com> Fri, 12 Jul 2024 15:42:34 +0000
ceph (18.2.4-1) stable; urgency=medium
* New upstream release
-- Ceph Release Team <ceph-maintainers@ceph.io> Fri, 12 Jul 2024 09:57:18 -0400
ceph (18.2.2-1) stable; urgency=medium

View File

@ -86,6 +86,9 @@ function(build_arrow)
else()
list(APPEND arrow_CMAKE_ARGS -DCMAKE_BUILD_TYPE=Release)
endif()
# don't add -Werror or debug package builds fail with:
#warning _FORTIFY_SOURCE requires compiling with optimization (-O)
list(APPEND arrow_CMAKE_ARGS -DBUILD_WARNING_LEVEL=PRODUCTION)
# we use an external project and copy the sources to bin directory to ensure
# that object files are built outside of the source tree.

View File

@ -11,6 +11,13 @@ function(build_rocksdb)
-DCMAKE_TOOLCHAIN_FILE=${CMAKE_TOOLCHAIN_FILE})
endif()
list(APPEND rocksdb_CMAKE_ARGS -DWITH_LIBURING=${WITH_LIBURING})
if(WITH_LIBURING)
list(APPEND rocksdb_CMAKE_ARGS -During_INCLUDE_DIR=${URING_INCLUDE_DIR})
list(APPEND rocksdb_CMAKE_ARGS -During_LIBRARIES=${URING_LIBRARY_DIR})
list(APPEND rocksdb_INTERFACE_LINK_LIBRARIES uring::uring)
endif()
if(ALLOCATOR STREQUAL "jemalloc")
list(APPEND rocksdb_CMAKE_ARGS -DWITH_JEMALLOC=ON)
list(APPEND rocksdb_INTERFACE_LINK_LIBRARIES JeMalloc::JeMalloc)
@ -52,12 +59,13 @@ function(build_rocksdb)
endif()
include(CheckCXXCompilerFlag)
check_cxx_compiler_flag("-Wno-deprecated-copy" HAS_WARNING_DEPRECATED_COPY)
set(rocksdb_CXX_FLAGS "${CMAKE_CXX_FLAGS}")
if(HAS_WARNING_DEPRECATED_COPY)
set(rocksdb_CXX_FLAGS -Wno-deprecated-copy)
string(APPEND rocksdb_CXX_FLAGS " -Wno-deprecated-copy")
endif()
check_cxx_compiler_flag("-Wno-pessimizing-move" HAS_WARNING_PESSIMIZING_MOVE)
if(HAS_WARNING_PESSIMIZING_MOVE)
set(rocksdb_CXX_FLAGS "${rocksdb_CXX_FLAGS} -Wno-pessimizing-move")
string(APPEND rocksdb_CXX_FLAGS " -Wno-pessimizing-move")
endif()
if(rocksdb_CXX_FLAGS)
list(APPEND rocksdb_CMAKE_ARGS -DCMAKE_CXX_FLAGS='${rocksdb_CXX_FLAGS}')
@ -84,6 +92,9 @@ function(build_rocksdb)
INSTALL_COMMAND ""
LIST_SEPARATOR !)
# make sure all the link libraries are built first
add_dependencies(rocksdb_ext ${rocksdb_INTERFACE_LINK_LIBRARIES})
add_library(RocksDB::RocksDB STATIC IMPORTED)
add_dependencies(RocksDB::RocksDB rocksdb_ext)
set(rocksdb_INCLUDE_DIR "${rocksdb_SOURCE_DIR}/include")

View File

@ -32,6 +32,8 @@ function(build_uring)
ExternalProject_Get_Property(liburing_ext source_dir)
set(URING_INCLUDE_DIR "${source_dir}/src/include")
set(URING_LIBRARY_DIR "${source_dir}/src")
set(URING_INCLUDE_DIR ${URING_INCLUDE_DIR} PARENT_SCOPE)
set(URING_LIBRARY_DIR ${URING_LIBRARY_DIR} PARENT_SCOPE)
add_library(uring::uring STATIC IMPORTED GLOBAL)
add_dependencies(uring::uring liburing_ext)

View File

@ -0,0 +1,2 @@
lib/systemd/system/ceph-exporter*
usr/bin/ceph-exporter

View File

@ -1,3 +1,4 @@
bcrypt
pyOpenSSL
cephfs
ceph-argparse

View File

@ -91,7 +91,6 @@ Build-Depends: automake,
python3-all-dev,
python3-cherrypy3,
python3-natsort,
python3-jwt <pkg.ceph.check>,
python3-pecan <pkg.ceph.check>,
python3-bcrypt <pkg.ceph.check>,
tox <pkg.ceph.check>,
@ -353,6 +352,30 @@ Description: debugging symbols for ceph-mgr
.
This package contains the debugging symbols for ceph-mgr.
Package: ceph-exporter
Architecture: linux-any
Depends: ceph-base (= ${binary:Version}),
Description: metrics exporter for the ceph distributed storage system
Ceph is a massively scalable, open-source, distributed
storage system that runs on commodity hardware and delivers object,
block and file system storage.
.
This package contains the metrics exporter daemon, which is used to expose
the performance metrics.
Package: ceph-exporter-dbg
Architecture: linux-any
Section: debug
Priority: extra
Depends: ceph-exporter (= ${binary:Version}),
${misc:Depends},
Description: debugging symbols for ceph-exporter
Ceph is a massively scalable, open-source, distributed
storage system that runs on commodity hardware and delivers object,
block and file system storage.
.
This package contains the debugging symbols for ceph-exporter.
Package: ceph-mon
Architecture: linux-any
Depends: ceph-base (= ${binary:Version}),

View File

@ -105,6 +105,7 @@ override_dh_strip:
dh_strip -pceph-mds --dbg-package=ceph-mds-dbg
dh_strip -pceph-fuse --dbg-package=ceph-fuse-dbg
dh_strip -pceph-mgr --dbg-package=ceph-mgr-dbg
dh_strip -pceph-exporter --dbg-package=ceph-exporter-dbg
dh_strip -pceph-mon --dbg-package=ceph-mon-dbg
dh_strip -pceph-osd --dbg-package=ceph-osd-dbg
dh_strip -pceph-base --dbg-package=ceph-base-dbg

ceph/doc/_static/js/pgcalc.js
View File

@ -0,0 +1,357 @@
var _____WB$wombat$assign$function_____ = function(name) {return (self._wb_wombat && self._wb_wombat.local_init && self._wb_wombat.local_init(name)) || self[name]; };
if (!self.__WB_pmw) { self.__WB_pmw = function(obj) { this.__WB_source = obj; return this; } }
{
let window = _____WB$wombat$assign$function_____("window");
let self = _____WB$wombat$assign$function_____("self");
let document = _____WB$wombat$assign$function_____("document");
let location = _____WB$wombat$assign$function_____("location");
let top = _____WB$wombat$assign$function_____("top");
let parent = _____WB$wombat$assign$function_____("parent");
let frames = _____WB$wombat$assign$function_____("frames");
let opener = _____WB$wombat$assign$function_____("opener");
var pow2belowThreshold = 0.25
var key_values={};
key_values['poolName'] ={'name':'Pool Name','default':'newPool','description': 'Name of the pool in question. Typical pool names are included below.', 'width':'30%; text-align: left'};
key_values['size'] ={'name':'Size','default': 3, 'description': 'Number of replicas the pool will have. Default value of 3 is pre-filled.', 'width':'10%', 'global':1};
key_values['osdNum'] ={'name':'OSD #','default': 100, 'description': 'Number of OSDs which this Pool will have PGs in. Typically, this is the entire Cluster OSD count, but could be less based on CRUSH rules. (e.g. Separate SSD and SATA disk sets)', 'width':'10%', 'global':1};
key_values['percData'] ={'name':'%Data', 'default': 5, 'description': 'This value represents the approximate percentage of data which will be contained in this pool for that specific OSD set. Examples are pre-filled below for guidance.','width':'10%'};
key_values['targPGsPerOSD'] ={'name':'Target PGs per OSD', 'default':100, 'description': 'This value should be populated based on the following guidance:', 'width':'10%', 'global':1, 'options': [ ['100','If the cluster OSD count is not expected to increase in the foreseeable future.'], ['200', 'If the cluster OSD count is expected to increase (up to double the size) in the foreseeable future.']]}
var notes ={
'totalPerc':'<b>"Total Data Percentage"</b> below table should be a multiple of 100%.',
'totalPGs':'<b>"Total PG Count"</b> below table will be the count of Primary PG copies. However, when calculating total PGs per OSD average, you must include all copies.',
'noDecrease':'It\'s also important to know that the PG count can be increased, but <b>NEVER</b> decreased without destroying / recreating the pool. However, increasing the PG Count of a pool is one of the most impactful events in a Ceph Cluster, and should be avoided for production clusters if possible.',
};
var presetTables={};
presetTables['All-in-One']=[
{ 'poolName' : 'rbd', 'size' : '3', 'osdNum' : '100', 'percData' : '100', 'targPGsPerOSD' : '100'},
];
presetTables['OpenStack']=[
{ 'poolName' : 'cinder-backup', 'size' : '3', 'osdNum' : '100', 'percData' : '25', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'cinder-volumes', 'size' : '3', 'osdNum' : '100', 'percData' : '53', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'ephemeral-vms', 'size' : '3', 'osdNum' : '100', 'percData' : '15', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'glance-images', 'size' : '3', 'osdNum' : '100', 'percData' : '7', 'targPGsPerOSD' : '100'},
];
presetTables['OpenStack w RGW - Jewel and later']=[
{ 'poolName' : '.rgw.root', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.control', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.data.root', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.gc', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.log', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.intent-log', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.meta', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.usage', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.users.keys', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.users.email', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.users.swift', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.users.uid', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.buckets.extra', 'size' : '3', 'osdNum' : '100', 'percData' : '1.0', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.buckets.index', 'size' : '3', 'osdNum' : '100', 'percData' : '3.0', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.buckets.data', 'size' : '3', 'osdNum' : '100', 'percData' : '19', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'cinder-backup', 'size' : '3', 'osdNum' : '100', 'percData' : '18', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'cinder-volumes', 'size' : '3', 'osdNum' : '100', 'percData' : '42.8', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'ephemeral-vms', 'size' : '3', 'osdNum' : '100', 'percData' : '10', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'glance-images', 'size' : '3', 'osdNum' : '100', 'percData' : '5', 'targPGsPerOSD' : '100'},
];
presetTables['Rados Gateway Only - Jewel and later']=[
{ 'poolName' : '.rgw.root', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.control', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.data.root', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.gc', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.log', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.intent-log', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.meta', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.usage', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.users.keys', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.users.email', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.users.swift', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.users.uid', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.buckets.extra', 'size' : '3', 'osdNum' : '100', 'percData' : '1.0', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.buckets.index', 'size' : '3', 'osdNum' : '100', 'percData' : '3.0', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'default.rgw.buckets.data', 'size' : '3', 'osdNum' : '100', 'percData' : '94.8', 'targPGsPerOSD' : '100'},
];
presetTables['OpenStack w RGW - Infernalis and earlier']=[
{ 'poolName' : '.intent-log', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.log', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.rgw', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.rgw.buckets', 'size' : '3', 'osdNum' : '100', 'percData' : '18', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.rgw.buckets.extra', 'size' : '3', 'osdNum' : '100', 'percData' : '1.0', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.rgw.buckets.index', 'size' : '3', 'osdNum' : '100', 'percData' : '3.0', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.rgw.control', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.rgw.gc', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.rgw.root', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.usage', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.users', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.users.email', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.users.swift', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.users.uid', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'cinder-backup', 'size' : '3', 'osdNum' : '100', 'percData' : '19', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'cinder-volumes', 'size' : '3', 'osdNum' : '100', 'percData' : '42.9', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'ephemeral-vms', 'size' : '3', 'osdNum' : '100', 'percData' : '10', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'glance-images', 'size' : '3', 'osdNum' : '100', 'percData' : '5', 'targPGsPerOSD' : '100'},
];
presetTables['Rados Gateway Only - Infernalis and earlier']=[
{ 'poolName' : '.intent-log', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.log', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.rgw', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.rgw.buckets', 'size' : '3', 'osdNum' : '100', 'percData' : '94.9', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.rgw.buckets.extra', 'size' : '3', 'osdNum' : '100', 'percData' : '1.0', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.rgw.buckets.index', 'size' : '3', 'osdNum' : '100', 'percData' : '3.0', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.rgw.control', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.rgw.gc', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.rgw.root', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.usage', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.users', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.users.email', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.users.swift', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
{ 'poolName' : '.users.uid', 'size' : '3', 'osdNum' : '100', 'percData' : '0.1', 'targPGsPerOSD' : '100'},
];
presetTables['RBD and libRados']=[
{ 'poolName' : 'rbd', 'size' : '3', 'osdNum' : '100', 'percData' : '75', 'targPGsPerOSD' : '100'},
{ 'poolName' : 'myObjects', 'size' : '3', 'osdNum' : '100', 'percData' : '25', 'targPGsPerOSD' : '100'},
];
$(function() {
$("#presetType").on("change",changePreset);
$("#btnAddPool").on("click",addPool);
$("#btnGenCommands").on("click",generateCommands);
$.each(presetTables,function(index,value) {
selIndex='';
if ( index == 'OpenStack w RGW - Jewel and later' )
selIndex=' selected';
$("#presetType").append("<option value=\""+index+"\""+selIndex+">"+index+"</option>");
});
changePreset();
$("#beforeTable").html("<fieldset id='keyFieldset'><legend>Key</legend><dl class='table-display' id='keyDL'></dl></fieldset>");
$.each(key_values, function(index, value) {
pre='';
post='';
if ('global' in value) {
pre='<a href="javascript://" onClick="globalChange(\''+index+'\');" title="Change the \''+value['name']+'\' parameter globally">';
post='</a>'
}
var dlAdd="<dt id='dt_"+index+"'>"+pre+value['name']+post+"</dt><dd id='dd_"+index+"'>"+value['description'];
if ( 'options' in value ) {
dlAdd+="<dl class='sub-table'>";
$.each(value['options'], function (subIndex, subValue) {
dlAdd+="<dt><a href=\"javascript://\" onClick=\"massUpdate('"+index+"','"+subValue[0]+"');\" title=\"Set all '"+value['name']+"' fields to '"+subValue[0]+"'.\">"+subValue[0]+"</a></dt><dd>"+subValue[1]+"</dd>";
});
dlAdd+="</dl>";
}
dlAdd+="</dd>";
$("#keyDL").append(dlAdd);
});
$("#afterTable").html("<fieldset id='notesFieldset'><legend>Notes</legend><ul id='notesUL'>\n<ul></fieldset>");
$.each(notes,function(index, value) {
$("#notesUL").append("\t<li id=\"li_"+index+"\">"+value+"</li>\n");
});
});
function changePreset() {
resetTable();
fillTable($("#presetType").val());
}
function resetTable() {
$("#pgsperpool").html("");
$("#pgsperpool").append("<tr id='headerRow'>\n</tr>\n");
$("#headerRow").append("\t<th>&nbsp;</th>\n");
var fieldCount=0;
var percDataIndex=0;
$.each(key_values, function(index, value) {
fieldCount++;
pre='';
post='';
var widthAdd='';
if ( index == 'percData' )
percDataIndex=fieldCount;
if ('width' in value)
widthAdd=' style=\'width: '+value['width']+'\'';
if ('global' in value) {
pre='<a href="javascript://" onClick="globalChange(\''+index+'\');" title="Change the \''+value['name']+'\' parameter globally">';
post='</a>'
}
$("#headerRow").append("\t<th"+widthAdd+">"+pre+value['name']+post+"</th>\n");
});
percDataIndex++;
$("#headerRow").append("\t<th class='center'>Suggested PG Count</th>\n");
$("#pgsperpool").append("<tr id='totalRow'><td colspan='"+percDataIndex+"' id='percTotal' style='text-align: right; margin-right: 10px;'><strong>Total Data Percentage:</strong> <span id='percTotalValue'>0</span>%</td><td>&nbsp;</td><td id='pgTotal' class='bold pgcount' style='text-align: right;'>PG Total Count: <span id='pgTotalValue'>0</span></td></tr>");
}
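// Round aSize to the nearest power of two; if rounding down would fall more than
// pow2belowThreshold below the requested size, use the next power of two up instead.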
function nearestPow2( aSize ){
var tmp=Math.pow(2, Math.round(Math.log(aSize)/Math.log(2)));
if(tmp<(aSize*(1-pow2belowThreshold)))
tmp*=2;
return tmp;
}
function globalChange(field) {
dialogHTML='<div title="Change \''+key_values[field]['name']+'\' Globally"><form>';
dialogHTML+='<label for="value">New '+key_values[field]['name']+' value:</label><br />\n';
dialogHTML+='<input type="text" name="globalValue" id="globalValue" value="'+$("#row0_"+field+"_input").val()+'" style="text-align: right;"/>';
dialogHTML+='<input type="hidden" name="globalField" id="globalField" value="'+field+'"/>';
dialogHTML+='<input type="submit" tabindex="-1" style="position:absolute; top:-1000px">';
dialogHTML+='</form>';
globalDialog=$(dialogHTML).dialog({
autoOpen: true,
width: 350,
show: 'fold',
hide: 'fold',
modal: true,
buttons: {
"Update Value": function() { massUpdate($("#globalField").val(),$("#globalValue").val()); globalDialog.dialog("close"); setTimeout(function() { globalDialog.dialog("destroy"); }, 1000); },
"Cancel": function() { globalDialog.dialog("close"); setTimeout(function() { globalDialog.dialog("destroy"); }, 1000); }
}
});
}
var rowCount=0;
function fillTable(presetType) {
rowCount=0;
$.each(presetTables[presetType], function(index,value) {
addTableRow(value);
});
}
function addPool() {
dialogHTML='<div title="Add Pool"><form>';
$.each(key_values, function(index,value) {
dialogHTML+='<br /><label for="new'+index+'">'+value['name']+':</label><br />\n';
classAdd='right';
if ( index == 'poolName' )
classAdd='left';
dialogHTML+='<input type="text" name="new'+index+'" id="new'+index+'" value="'+value['default']+'" class="'+classAdd+'"/><br />';
});
dialogHTML+='<input type="submit" tabindex="-1" style="position:absolute; top:-1000px">';
dialogHTML+='</form>';
addPoolDialog=$(dialogHTML).dialog({
autoOpen: true,
width: 350,
show: 'fold',
hide: 'fold',
modal: true,
buttons: {
"Add Pool": function() {
var newPoolValues={};
$.each(key_values,function(index,value) {
newPoolValues[index]=$("#new"+index).val();
});
addTableRow(newPoolValues);
addPoolDialog.dialog("close");
setTimeout(function() { addPoolDialog.dialog("destroy"); }, 1000); },
"Cancel": function() { addPoolDialog.dialog("close"); setTimeout(function() { addPoolDialog.dialog("destroy"); }, 1000); }
}
});
// addTableRow({'poolName':'newPool','size':3, 'osdNum':100,'targPGsPerOSD': 100, 'percData':0});
}
function addTableRow(rowValues) {
rowAdd="<tr id='row"+rowCount+"'>\n";
rowAdd+="\t<td width='15px' class='inputColor'><a href='javascript://' title='Remove Pool' onClick='$(\"#row"+rowCount+"\").remove();updateTotals();'><span class='ui-icon ui-icon-trash'></span></a></td>\n";
$.each(key_values, function(index,value) {
classAdd=' center';
modifier='';
if ( index == 'percData' ) {
classAdd='" style="text-align: right;';
// modifier=' %';
} else if ( index == 'poolName' )
classAdd=' left';
rowAdd+="\t<td id=\"row"+rowCount+"_"+index+"\"><input type=\"text\" class=\"inputColor "+index+classAdd+"\" id=\"row"+rowCount+"_"+index+"_input\" value=\""+rowValues[index]+"\" onFocus=\"focusMe("+rowCount+",'"+index+"');\" onKeyUp=\"keyMe("+rowCount+",'"+index+"');\" onBlur=\"blurMe("+rowCount+",'"+index+"');\">"+modifier+"</td>\n";
});
rowAdd+="\t<td id=\"row"+rowCount+"_pgCount\" class='pgcount' style='text-align: right;'>0</td></tr>";
$("#totalRow").before(rowAdd);
updatePGCount(rowCount);
$("[id$='percData_input']").each(function() { var fieldVal=parseFloat($(this).val()); $(this).val(fieldVal.toFixed(2)); });
rowCount++;
}
function updatePGCount(rowID) {
if(rowID==-1) {
for(var i=0;i<rowCount;i++) {
updatePGCount(i);
}
} else {
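// Floor: a minimum PG count derived from the OSD count and replica size, so that
// even pools holding little data get a usable number of PGs.
// Suggested count: (target PGs per OSD * OSD count * %data) / (100 * size),
// rounded to the nearest power of two; the larger of the two values is displayed.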
minValue=nearestPow2(Math.floor($("#row"+rowID+"_osdNum_input").val()/$("#row"+rowID+"_size_input").val())+1);
if(minValue<$("#row"+rowID+"_osdNum_input").val())
minValue*=2;
calcValue=nearestPow2(Math.floor(($("#row"+rowID+"_targPGsPerOSD_input").val()*$("#row"+rowID+"_osdNum_input").val()*$("#row"+rowID+"_percData_input").val())/(100*$("#row"+rowID+"_size_input").val())));
if(minValue>calcValue)
$("#row"+rowID+"_pgCount").html(minValue);
else
$("#row"+rowID+"_pgCount").html(calcValue);
}
updateTotals();
}
function focusMe(rowID,field) {
$("#row"+rowID+"_"+field+"_input").toggleClass('inputColor');
$("#row"+rowID+"_"+field+"_input").toggleClass('highlightColor');
$("#dt_"+field).toggleClass('highlightColor');
$("#dd_"+field).toggleClass('highlightColor');
updatePGCount(rowID);
}
function blurMe(rowID,field) {
focusMe(rowID,field);
$("[id$='percData_input']").each(function() { var fieldVal=parseFloat($(this).val()); $(this).val(fieldVal.toFixed(2)); });
}
function keyMe(rowID,field) {
updatePGCount(rowID);
}
function massUpdate(field,value) {
$("[id$='_"+field+"_input']").val(value);
key_values[field]['default']=value;
updatePGCount(-1);
}
function updateTotals() {
var totalPerc=0;
var totalPGs=0;
$("[id$='percData_input']").each(function() {
totalPerc+=parseFloat($(this).val());
if ( parseFloat($(this).val()) > 100 )
$(this).addClass('ui-state-error');
else
$(this).removeClass('ui-state-error');
});
$("[id$='_pgCount']").each(function() {
totalPGs+=parseInt($(this).html());
});
$("#percTotalValue").html(totalPerc.toFixed(2));
$("#pgTotalValue").html(totalPGs);
if(parseFloat(totalPerc.toFixed(2)) % 100 != 0) {
$("#percTotalValue").addClass('ui-state-error');
$("#li_totalPerc").addClass('ui-state-error');
} else {
$("#percTotalValue").removeClass('ui-state-error');
$("#li_totalPerc").removeClass('ui-state-error');
}
$("#commandCode").html("");
}
function generateCommands() {
outputCommands="## Note: The 'while' loops below pause between pools to allow all\n\
## PGs to be created. This is a safety mechanism to prevent\n\
## saturating the Monitor nodes.\n\
## -------------------------------------------------------------------\n\n";
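// Emit one create/size/wait block per pool row currently in the table.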
for(i=0;i<rowCount;i++) {
console.log(i);
outputCommands+="ceph osd pool create "+$("#row"+i+"_poolName_input").val()+" "+$("#row"+i+"_pgCount").html()+"\n";
outputCommands+="ceph osd pool set "+$("#row"+i+"_poolName_input").val()+" size "+$("#row"+i+"_size_input").val()+"\n";
outputCommands+="while [ $(ceph -s | grep creating -c) -gt 0 ]; do echo -n .;sleep 1; done\n\n";
}
window.location.href = "data:application/download," + encodeURIComponent(outputCommands);
}
}

View File

@ -19,9 +19,14 @@ The Ceph Storage Cluster
========================
Ceph provides an infinitely scalable :term:`Ceph Storage Cluster` based upon
:abbr:`RADOS (Reliable Autonomic Distributed Object Store)`, which you can read
about in `RADOS - A Scalable, Reliable Storage Service for Petabyte-scale
Storage Clusters`_.
:abbr:`RADOS (Reliable Autonomic Distributed Object Store)`, a reliable,
distributed storage service that uses the intelligence in each of its nodes to
secure the data it stores and to provide that data to :term:`client`\s. See
Sage Weil's "`The RADOS Object Store
<https://ceph.io/en/news/blog/2009/the-rados-distributed-object-store/>`_" blog
post for a brief explanation of RADOS and see `RADOS - A Scalable, Reliable
Storage Service for Petabyte-scale Storage Clusters`_ for an exhaustive
explanation of :term:`RADOS`.
A Ceph Storage Cluster consists of multiple types of daemons:
@ -33,9 +38,8 @@ A Ceph Storage Cluster consists of multiple types of daemons:
.. _arch_monitor:
Ceph Monitors maintain the master copy of the cluster map, which they provide
to Ceph clients. Provisioning multiple monitors within the Ceph cluster ensures
availability in the event that one of the monitor daemons or its host fails.
The Ceph monitor provides copies of the cluster map to storage cluster clients.
to Ceph clients. The existence of multiple monitors in the Ceph cluster ensures
availability if one of the monitor daemons or its host fails.
A Ceph OSD Daemon checks its own state and the state of other OSDs and reports
back to monitors.
@ -47,10 +51,11 @@ A Ceph Metadata Server (MDS) manages file metadata when CephFS is used to
provide file services.
Storage cluster clients and :term:`Ceph OSD Daemon`\s use the CRUSH algorithm
to compute information about data location. This means that clients and OSDs
are not bottlenecked by a central lookup table. Ceph's high-level features
include a native interface to the Ceph Storage Cluster via ``librados``, and a
number of service interfaces built on top of ``librados``.
to compute information about the location of data. Use of the CRUSH algorithm
means that clients and OSDs are not bottlenecked by a central lookup table.
Ceph's high-level features include a native interface to the Ceph Storage
Cluster via ``librados``, and a number of service interfaces built on top of
``librados``.
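As a rough illustration of the idea only (real Ceph uses CRUSH together with
the cluster map, device weights, and failure domains rather than a plain hash),
a client-side placement computation might be sketched like this; the object,
pool, and OSD values below are made up:

.. code-block:: python

   import hashlib

   def toy_placement(object_id, pool, pg_num, osds, replicas=3):
       """Toy sketch: hash the object name to a placement group, then derive
       an ordered list of OSDs from the PG id. Not the actual CRUSH algorithm."""
       h = int(hashlib.md5(f"{pool}/{object_id}".encode()).hexdigest(), 16)
       pg = h % pg_num
       # Rank the OSDs by a per-PG hash and take the first `replicas` of them.
       ranked = sorted(osds, key=lambda osd: hashlib.md5(f"{pg}.{osd}".encode()).digest())
       return pg, ranked[:replicas]

   print(toy_placement("my-object", "liverpool", pg_num=128, osds=list(range(12))))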
Storing Data
------------
@ -128,8 +133,8 @@ massive scale by distributing the work to all the OSD daemons in the cluster
and all the clients that communicate with them. CRUSH uses intelligent data
replication to ensure resiliency, which is better suited to hyper-scale
storage. The following sections provide additional details on how CRUSH works.
For a detailed discussion of CRUSH, see `CRUSH - Controlled, Scalable,
Decentralized Placement of Replicated Data`_.
For an in-depth, academic discussion of CRUSH, see `CRUSH - Controlled,
Scalable, Decentralized Placement of Replicated Data`_.
.. index:: architecture; cluster map
@ -587,7 +592,7 @@ cluster map, the client doesn't know anything about object locations.**
**Object locations must be computed.**
The client requies only the object ID and the name of the pool in order to
The client requires only the object ID and the name of the pool in order to
compute the object location.
Ceph stores data in named pools (for example, "liverpool"). When a client
@ -1589,7 +1594,8 @@ typically deploy a Ceph Block Device with the ``rbd`` network storage driver in
QEMU/KVM, where the host machine uses ``librbd`` to provide a block device
service to the guest. Many cloud computing stacks use ``libvirt`` to integrate
with hypervisors. You can use thin-provisioned Ceph Block Devices with QEMU and
``libvirt`` to support OpenStack and CloudStack among other solutions.
``libvirt`` to support OpenStack, OpenNebula and CloudStack
among other solutions.
While we do not provide ``librbd`` support with other hypervisors at this time,
you may also use Ceph Block Device kernel objects to provide a block device to a

View File

@ -22,20 +22,20 @@ Preparation
#. Make sure that the ``cephadm`` command line tool is available on each host
in the existing cluster. See :ref:`get-cephadm` to learn how.
#. Prepare each host for use by ``cephadm`` by running this command:
#. Prepare each host for use by ``cephadm`` by running this command on that host:
.. prompt:: bash #
cephadm prepare-host
#. Choose a version of Ceph to use for the conversion. This procedure will work
with any release of Ceph that is Octopus (15.2.z) or later, inclusive. The
with any release of Ceph that is Octopus (15.2.z) or later. The
latest stable release of Ceph is the default. You might be upgrading from an
earlier Ceph release at the same time that you're performing this
conversion; if you are upgrading from an earlier release, make sure to
conversion. If you are upgrading from an earlier release, make sure to
follow any upgrade-related instructions for that release.
Pass the image to cephadm with the following command:
Pass the Ceph container image to cephadm with the following command:
.. prompt:: bash #
@ -50,25 +50,27 @@ Preparation
cephadm ls
Before starting the conversion process, ``cephadm ls`` shows all existing
daemons to have a style of ``legacy``. As the adoption process progresses,
adopted daemons will appear with a style of ``cephadm:v1``.
Before starting the conversion process, ``cephadm ls`` reports all existing
daemons with the style ``legacy``. As the adoption process progresses,
adopted daemons will appear with the style ``cephadm:v1``.
Adoption process
----------------
#. Make sure that the ceph configuration has been migrated to use the cluster
config database. If the ``/etc/ceph/ceph.conf`` is identical on each host,
then the following command can be run on one single host and will affect all
hosts:
#. Make sure that the ceph configuration has been migrated to use the cluster's
central config database. If ``/etc/ceph/ceph.conf`` is identical on all
hosts, then the following command can be run on one host and will take
effect for all hosts:
.. prompt:: bash #
ceph config assimilate-conf -i /etc/ceph/ceph.conf
If there are configuration variations between hosts, you will need to repeat
this command on each host. During this adoption process, view the cluster's
this command on each host, taking care that if there are conflicting option
settings across hosts, the values from the last host will be used. During this
adoption process, view the cluster's central
configuration to confirm that it is complete by running the following
command:
@ -76,36 +78,36 @@ Adoption process
ceph config dump
#. Adopt each monitor:
#. Adopt each Monitor:
.. prompt:: bash #
cephadm adopt --style legacy --name mon.<hostname>
Each legacy monitor should stop, quickly restart as a cephadm
Each legacy Monitor will stop, quickly restart as a cephadm
container, and rejoin the quorum.
#. Adopt each manager:
#. Adopt each Manager:
.. prompt:: bash #
cephadm adopt --style legacy --name mgr.<hostname>
#. Enable cephadm:
#. Enable cephadm orchestration:
.. prompt:: bash #
ceph mgr module enable cephadm
ceph orch set backend cephadm
#. Generate an SSH key:
#. Generate an SSH key for cephadm:
.. prompt:: bash #
ceph cephadm generate-key
ceph cephadm get-pub-key > ~/ceph.pub
#. Install the cluster SSH key on each host in the cluster:
#. Install the cephadm SSH key on each host in the cluster:
.. prompt:: bash #
@ -118,9 +120,10 @@ Adoption process
SSH keys.
.. note::
It is also possible to have cephadm use a non-root user to SSH
It is also possible to arrange for cephadm to use a non-root user to SSH
into cluster hosts. This user needs to have passwordless sudo access.
Use ``ceph cephadm set-user <user>`` and copy the SSH key to that user.
Use ``ceph cephadm set-user <user>`` and copy the SSH key to that user's
home directory on each host.
See :ref:`cephadm-ssh-user`
#. Tell cephadm which hosts to manage:
@ -129,10 +132,10 @@ Adoption process
ceph orch host add <hostname> [ip-address]
This will perform a ``cephadm check-host`` on each host before adding it;
this check ensures that the host is functioning properly. The IP address
argument is recommended; if not provided, then the host name will be resolved
via DNS.
This will run ``cephadm check-host`` on each host before adding it.
This check ensures that the host is functioning properly. The IP address
argument is recommended. If the address is not provided, then the host name
will be resolved via DNS.
#. Verify that the adopted monitor and manager daemons are visible:
@ -153,8 +156,8 @@ Adoption process
cephadm adopt --style legacy --name osd.1
cephadm adopt --style legacy --name osd.2
#. Redeploy MDS daemons by telling cephadm how many daemons to run for
each file system. List file systems by name with the command ``ceph fs
#. Redeploy CephFS MDS daemons (if deployed) by telling cephadm how many daemons to run for
each file system. List CephFS file systems by name with the command ``ceph fs
ls``. Run the following command on the master nodes to redeploy the MDS
daemons:
@ -189,19 +192,19 @@ Adoption process
systemctl stop ceph-mds.target
rm -rf /var/lib/ceph/mds/ceph-*
#. Redeploy RGW daemons. Cephadm manages RGW daemons by zone. For each
zone, deploy new RGW daemons with cephadm:
#. Redeploy Ceph Object Gateway RGW daemons if deployed. Cephadm manages RGW
daemons by zone. For each zone, deploy new RGW daemons with cephadm:
.. prompt:: bash #
ceph orch apply rgw <svc_id> [--realm=<realm>] [--zone=<zone>] [--port=<port>] [--ssl] [--placement=<placement>]
where *<placement>* can be a simple daemon count, or a list of
specific hosts (see :ref:`orchestrator-cli-placement-spec`), and the
specific hosts (see :ref:`orchestrator-cli-placement-spec`). The
zone and realm arguments are needed only for a multisite setup.
After the daemons have started and you have confirmed that they are
functioning, stop and remove the old, legacy daemons:
functioning, stop and remove the legacy daemons:
.. prompt:: bash #

View File

@ -1,36 +1,36 @@
=======================
Basic Ceph Client Setup
=======================
Client machines require some basic configuration to interact with
Ceph clusters. This section describes how to configure a client machine
so that it can interact with a Ceph cluster.
Client hosts require basic configuration to interact with
Ceph clusters. This section describes how to perform this configuration.
.. note::
Most client machines need to install only the `ceph-common` package
and its dependencies. Such a setup supplies the basic `ceph` and
`rados` commands, as well as other commands including `mount.ceph`
and `rbd`.
Most client hosts need to install only the ``ceph-common`` package
and its dependencies. Such an installation supplies the basic ``ceph`` and
``rados`` commands, as well as other commands including ``mount.ceph``
and ``rbd``.
Config File Setup
=================
Client machines usually require smaller configuration files (here
sometimes called "config files") than do full-fledged cluster members.
Client hosts usually require smaller configuration files (here
sometimes called "config files") than do back-end cluster hosts.
To generate a minimal config file, log into a host that has been
configured as a client or that is running a cluster daemon, and then run the following command:
configured as a client or that is running a cluster daemon, then
run the following command:
.. prompt:: bash #
ceph config generate-minimal-conf
This command generates a minimal config file that tells the client how
to reach the Ceph monitors. The contents of this file should usually
be installed in ``/etc/ceph/ceph.conf``.
to reach the Ceph Monitors. This file should usually
be copied to ``/etc/ceph/ceph.conf`` on each client host.
Keyring Setup
=============
Most Ceph clusters run with authentication enabled. This means that
the client needs keys in order to communicate with the machines in the
cluster. To generate a keyring file with credentials for `client.fs`,
the client needs keys in order to communicate with Ceph daemons.
To generate a keyring file with credentials for ``client.fs``,
log into a running cluster member and run the following command:
.. prompt:: bash $
@ -40,6 +40,10 @@ log into an running cluster member and run the following command:
The resulting output is directed into a keyring file, typically
``/etc/ceph/ceph.keyring``.
To gain a broader understanding of client keyring distribution and administration, you should read :ref:`client_keyrings_and_configs`.
To gain a broader understanding of client keyring distribution and administration,
you should read :ref:`client_keyrings_and_configs`.
To see an example that explains how to distribute ``ceph.conf`` configuration files to hosts that are tagged with the ``bare_config`` label, you should read the section called "Distributing ceph.conf to hosts tagged with bare_config" in the section called :ref:`etc_ceph_conf_distribution`.
To see an example that explains how to distribute ``ceph.conf`` configuration
files to hosts that are tagged with the ``bare_config`` label, you should read
the subsection named "Distributing ceph.conf to hosts tagged with bare_config"
under the heading :ref:`etc_ceph_conf_distribution`.

View File

@ -30,8 +30,8 @@ This table shows which version pairs are expected to work or not work together:
.. note::
While not all podman versions have been actively tested against
all Ceph versions, there are no known issues with using podman
While not all Podman versions have been actively tested against
all Ceph versions, there are no known issues with using Podman
version 3.0 or greater with Ceph Quincy and later releases.
.. warning::

View File

@ -74,9 +74,9 @@ To add each new host to the cluster, perform two steps:
ceph orch host add host2 10.10.0.102
ceph orch host add host3 10.10.0.103
It is best to explicitly provide the host IP address. If an IP is
It is best to explicitly provide the host IP address. If an address is
not provided, then the host name will be immediately resolved via
DNS and that IP will be used.
DNS and the result will be used.
One or more labels can also be included to immediately label the
new host. For example, by default the ``_admin`` label will make
@ -104,7 +104,7 @@ To drain all daemons from a host, run a command of the following form:
The ``_no_schedule`` and ``_no_conf_keyring`` labels will be applied to the
host. See :ref:`cephadm-special-host-labels`.
If you only want to drain daemons but leave managed ceph conf and keyring
If you want to drain daemons but leave managed ``ceph.conf`` and keyring
files on the host, you may pass the ``--keep-conf-keyring`` flag to the
drain command.
@ -115,7 +115,8 @@ drain command.
This will apply the ``_no_schedule`` label to the host but not the
``_no_conf_keyring`` label.
All OSDs on the host will be scheduled to be removed. You can check the progress of the OSD removal operation with the following command:
All OSDs on the host will be scheduled to be removed. You can check
progress of the OSD removal operation with the following command:
.. prompt:: bash #
@ -148,7 +149,7 @@ cluster by running the following command:
Offline host removal
--------------------
Even if a host is offline and can not be recovered, it can be removed from the
If a host is offline and can not be recovered, it can be removed from the
cluster by running a command of the following form:
.. prompt:: bash #
@ -250,8 +251,8 @@ Rescanning Host Devices
=======================
Some servers and external enclosures may not register device removal or insertion with the
kernel. In these scenarios, you'll need to perform a host rescan. A rescan is typically
non-disruptive, and can be performed with the following CLI command:
kernel. In these scenarios, you'll need to perform a device rescan on the appropriate host.
A rescan is typically non-disruptive, and can be performed with the following CLI command:
.. prompt:: bash #
@ -314,19 +315,43 @@ create a new CRUSH host located in the specified hierarchy.
.. note::
The ``location`` attribute will be only affect the initial CRUSH location. Subsequent
changes of the ``location`` property will be ignored. Also, removing a host will not remove
any CRUSH buckets.
The ``location`` attribute will only affect the initial CRUSH location.
Subsequent changes of the ``location`` property will be ignored. Also,
removing a host will not remove an associated CRUSH bucket unless the
``--rm-crush-entry`` flag is provided to the ``orch host rm`` command.
See also :ref:`crush_map_default_types`.
Removing a host from the CRUSH map
==================================
The ``ceph orch host rm`` command has support for removing the associated host bucket
from the CRUSH map. This is done by providing the ``--rm-crush-entry`` flag.
.. prompt:: bash [ceph:root@host1/]#
ceph orch host rm host1 --rm-crush-entry
When this flag is specified, cephadm will attempt to remove the host bucket
from the CRUSH map as part of the host removal process. Note that if
it fails to do so, cephadm will report the failure and the host will remain under
cephadm control.
.. note::
Removal from the CRUSH map will fail if there are OSDs deployed on the
host. If you would like to remove all the host's OSDs as well, please start
by using the ``ceph orch host drain`` command to do so. Once the OSDs
have been removed, you may direct cephadm to remove the CRUSH bucket
along with the host using the ``--rm-crush-entry`` flag.
OS Tuning Profiles
==================
Cephadm can be used to manage operating-system-tuning profiles that apply sets
of sysctl settings to sets of hosts.
Cephadm can be used to manage operating system tuning profiles that apply
``sysctl`` settings to sets of hosts.
Create a YAML spec file in the following format:
To do so, create a YAML spec file in the following format:
.. code-block:: yaml
@ -345,18 +370,21 @@ Apply the tuning profile with the following command:
ceph orch tuned-profile apply -i <tuned-profile-file-name>
This profile is written to ``/etc/sysctl.d/`` on each host that matches the
hosts specified in the placement block of the yaml, and ``sysctl --system`` is
This profile is written to a file under ``/etc/sysctl.d/`` on each host
specified in the ``placement`` block, then ``sysctl --system`` is
run on the host.
.. note::
The exact filename that the profile is written to within ``/etc/sysctl.d/``
is ``<profile-name>-cephadm-tuned-profile.conf``, where ``<profile-name>`` is
the ``profile_name`` setting that you specify in the YAML spec. Because
the ``profile_name`` setting that you specify in the YAML spec. We suggest
naming these profiles following the usual ``sysctl.d`` `NN-xxxxx` convention. Because
sysctl settings are applied in lexicographical order (sorted by the filename
in which the setting is specified), you may want to set the ``profile_name``
in your spec so that it is applied before or after other conf files.
in which the setting is specified), you may want to carefully choose
the ``profile_name`` in your spec so that it is applied before or after other
conf files. Careful selection ensures that values supplied here override or
do not override those in other ``sysctl.d`` files as desired.
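Because the ordering is purely lexicographical on the file names, a quick way
to see where a profile will land relative to other conf files is simply to sort
the candidate names (the names below are made-up examples):

.. code-block:: python

   # Illustrative file names only; sysctl applies them in this sorted order.
   confs = ["10-default.conf", "99-override.conf", "dbhosts-cephadm-tuned-profile.conf"]
   print(sorted(confs))
   # ['10-default.conf', '99-override.conf', 'dbhosts-cephadm-tuned-profile.conf']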
.. note::
@ -365,7 +393,7 @@ run on the host.
.. note::
Applying tuned profiles is idempotent when the ``--no-overwrite`` option is
Applying tuning profiles is idempotent when the ``--no-overwrite`` option is
passed. Moreover, if the ``--no-overwrite`` option is passed, existing
profiles with the same name are not overwritten.
@ -525,7 +553,7 @@ There are two ways to customize this configuration for your environment:
We do *not recommend* this approach. The path name must be
visible to *any* mgr daemon, and cephadm runs all daemons as
containers. That means that the file either need to be placed
containers. That means that the file must either be placed
inside a customized container image for your deployment, or
manually distributed to the mgr data directory
(``/var/lib/ceph/<cluster-fsid>/mgr.<id>`` on the host, visible at
@ -578,8 +606,8 @@ Note that ``man hostname`` recommends ``hostname`` to return the bare
host name:
The FQDN (Fully Qualified Domain Name) of the system is the
name that the resolver(3) returns for the host name, such as,
ursula.example.com. It is usually the hostname followed by the DNS
name that the resolver(3) returns for the host name, for example
``ursula.example.com``. It is usually the short hostname followed by the DNS
domain name (the part after the first dot). You can check the FQDN
using ``hostname --fqdn`` or the domain name using ``dnsdomainname``.

View File

@ -4,7 +4,7 @@
Deploying a new Ceph cluster
============================
Cephadm creates a new Ceph cluster by "bootstrapping" on a single
Cephadm creates a new Ceph cluster by bootstrapping a single
host, expanding the cluster to encompass any additional hosts, and
then deploying the needed services.
@ -18,7 +18,7 @@ Requirements
- Python 3
- Systemd
- Podman or Docker for running containers
- Time synchronization (such as chrony or NTP)
- Time synchronization (such as Chrony or the legacy ``ntpd``)
- LVM2 for provisioning storage devices
Any modern Linux distribution should be sufficient. Dependencies
@ -45,6 +45,13 @@ There are two ways to install ``cephadm``:
Choose either the distribution-specific method or the curl-based method. Do
not attempt to use both these methods on one system.
.. note:: Recent versions of cephadm are distributed as an executable compiled
from source code. Unlike for earlier versions of Ceph it is no longer
sufficient to copy a single script from Ceph's git tree and run it. If you
wish to run cephadm using a development version you should create your own
build of cephadm. See :ref:`compiling-cephadm` for details on how to create
your own standalone cephadm executable.
.. _cephadm_install_distros:
distribution-specific installations
@ -85,9 +92,9 @@ that case, you can install cephadm directly. For example:
curl-based installation
-----------------------
* First, determine what version of Ceph you will need. You can use the releases
* First, determine what version of Ceph you wish to install. You can use the releases
page to find the `latest active releases <https://docs.ceph.com/en/latest/releases/#active-releases>`_.
For example, we might look at that page and find that ``18.2.0`` is the latest
For example, we might find that ``18.2.1`` is the latest
active release.
* Use ``curl`` to fetch a build of cephadm for that release.
@ -113,7 +120,7 @@ curl-based installation
* If you encounter any issues with running cephadm due to errors including
the message ``bad interpreter``, then you may not have Python or
the correct version of Python installed. The cephadm tool requires Python 3.6
and above. You can manually run cephadm with a particular version of Python by
or later. You can manually run cephadm with a particular version of Python by
prefixing the command with your installed Python version. For example:
.. prompt:: bash #
@ -121,6 +128,11 @@ curl-based installation
python3.8 ./cephadm <arguments...>
* Although the standalone cephadm is sufficient to bootstrap a cluster, it is
best to have the ``cephadm`` command installed on the host. To install
the packages that provide the ``cephadm`` command, run the following
commands:
.. _cephadm_update:
update cephadm
@ -166,7 +178,7 @@ What to know before you bootstrap
The first step in creating a new Ceph cluster is running the ``cephadm
bootstrap`` command on the Ceph cluster's first host. The act of running the
``cephadm bootstrap`` command on the Ceph cluster's first host creates the Ceph
cluster's first "monitor daemon", and that monitor daemon needs an IP address.
cluster's first Monitor daemon.
You must pass the IP address of the Ceph cluster's first host to the ``ceph
bootstrap`` command, so you'll need to know the IP address of that host.
@ -187,13 +199,13 @@ Run the ``ceph bootstrap`` command:
This command will:
* Create a monitor and manager daemon for the new cluster on the local
* Create a Monitor and a Manager daemon for the new cluster on the local
host.
* Generate a new SSH key for the Ceph cluster and add it to the root
user's ``/root/.ssh/authorized_keys`` file.
* Write a copy of the public key to ``/etc/ceph/ceph.pub``.
* Write a minimal configuration file to ``/etc/ceph/ceph.conf``. This
file is needed to communicate with the new cluster.
file is needed to communicate with Ceph daemons.
* Write a copy of the ``client.admin`` administrative (privileged!)
secret key to ``/etc/ceph/ceph.client.admin.keyring``.
* Add the ``_admin`` label to the bootstrap host. By default, any host
@ -205,7 +217,7 @@ This command will:
Further information about cephadm bootstrap
-------------------------------------------
The default bootstrap behavior will work for most users. But if you'd like
The default bootstrap process will work for most users. But if you'd like
immediately to know more about ``cephadm bootstrap``, read the list below.
Also, you can run ``cephadm bootstrap -h`` to see all of ``cephadm``'s
@ -216,15 +228,15 @@ available options.
journald. If you want Ceph to write traditional log files to ``/var/log/ceph/$fsid``,
use the ``--log-to-file`` option during bootstrap.
* Larger Ceph clusters perform better when (external to the Ceph cluster)
* Larger Ceph clusters perform best when (external to the Ceph cluster)
public network traffic is separated from (internal to the Ceph cluster)
cluster traffic. The internal cluster traffic handles replication, recovery,
and heartbeats between OSD daemons. You can define the :ref:`cluster
network<cluster-network>` by supplying the ``--cluster-network`` option to the ``bootstrap``
subcommand. This parameter must define a subnet in CIDR notation (for example
subcommand. This parameter must be a subnet in CIDR notation (for example
``10.90.90.0/24`` or ``fe80::/64``).
* ``cephadm bootstrap`` writes to ``/etc/ceph`` the files needed to access
* ``cephadm bootstrap`` writes to ``/etc/ceph`` files needed to access
the new cluster. This central location makes it possible for Ceph
packages installed on the host (e.g., packages that give access to the
cephadm command line interface) to find these files.
@ -245,12 +257,12 @@ available options.
EOF
$ ./cephadm bootstrap --config initial-ceph.conf ...
* The ``--ssh-user *<user>*`` option makes it possible to choose which SSH
* The ``--ssh-user *<user>*`` option makes it possible to designate which SSH
user cephadm will use to connect to hosts. The associated SSH key will be
added to ``/home/*<user>*/.ssh/authorized_keys``. The user that you
designate with this option must have passwordless sudo access.
* If you are using a container on an authenticated registry that requires
* If you are using a container image from a registry that requires
login, you may add the argument:
* ``--registry-json <path to json file>``
@ -261,7 +273,7 @@ available options.
Cephadm will attempt to log in to this registry so it can pull your container
and then store the login info in its config database. Other hosts added to
the cluster will then also be able to make use of the authenticated registry.
the cluster will then also be able to make use of the authenticated container registry.
* See :ref:`cephadm-deployment-scenarios` for additional examples for using ``cephadm bootstrap``.
@ -326,7 +338,7 @@ Add all hosts to the cluster by following the instructions in
By default, a ``ceph.conf`` file and a copy of the ``client.admin`` keyring are
maintained in ``/etc/ceph`` on all hosts that have the ``_admin`` label. This
label is initially applied only to the bootstrap host. We usually recommend
label is initially applied only to the bootstrap host. We recommend
that one or more other hosts be given the ``_admin`` label so that the Ceph CLI
(for example, via ``cephadm shell``) is easily accessible on multiple hosts. To add
the ``_admin`` label to additional host(s), run a command of the following form:
@ -339,9 +351,10 @@ the ``_admin`` label to additional host(s), run a command of the following form:
Adding additional MONs
======================
A typical Ceph cluster has three or five monitor daemons spread
A typical Ceph cluster has three or five Monitor daemons spread
across different hosts. We recommend deploying five
monitors if there are five or more nodes in your cluster.
Monitors if there are five or more nodes in your cluster. Most clusters do not
benefit from seven or more Monitors.
Please follow :ref:`deploy_additional_monitors` to deploy additional MONs.
@ -366,12 +379,12 @@ See :ref:`osd_autotune`.
To deploy hyperconverged Ceph with TripleO, please refer to the TripleO documentation: `Scenario: Deploy Hyperconverged Ceph <https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/cephadm.html#scenario-deploy-hyperconverged-ceph>`_
In other cases where the cluster hardware is not exclusively used by Ceph (hyperconverged),
In other cases where the cluster hardware is not exclusively used by Ceph (converged infrastructure),
reduce the memory consumption of Ceph like so:
.. prompt:: bash #
# hyperconverged only:
# converged only:
ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.2
Then enable memory autotuning:
@ -400,9 +413,11 @@ Different deployment scenarios
Single host
-----------
To configure a Ceph cluster to run on a single host, use the
``--single-host-defaults`` flag when bootstrapping. For use cases of this, see
:ref:`one-node-cluster`.
To deploy a Ceph cluster running on a single host, use the
``--single-host-defaults`` flag when bootstrapping. For use cases, see
:ref:`one-node-cluster`. Such clusters are generally not suitable for
production.
The ``--single-host-defaults`` flag sets the following configuration options::
@ -419,8 +434,8 @@ Deployment in an isolated environment
-------------------------------------
You might need to install cephadm in an environment that is not connected
directly to the internet (such an environment is also called an "isolated
environment"). This can be done if a custom container registry is used. Either
directly to the Internet (an "isolated" or "airgapped"
environment). This requires the use of a custom container registry. Either
of two kinds of custom container registry can be used in this scenario: (1) a
Podman-based or Docker-based insecure registry, or (2) a secure registry.
@ -569,9 +584,9 @@ in order to have cephadm use them for SSHing between cluster hosts
Note that this setup does not require installing the corresponding public key
from the private key passed to bootstrap on other nodes. In fact, cephadm will
reject the ``--ssh-public-key`` argument when passed along with ``--ssh-signed-cert``.
Not because having the public key breaks anything, but because it is not at all needed
for this setup and it helps bootstrap differentiate if the user wants the CA signed
keys setup or standard pubkey encryption. What this means is, SSH key rotation
This is not because having the public key breaks anything, but rather because it is not at all needed
and helps the bootstrap command differentiate whether the user wants the CA-signed
key setup or standard pubkey authentication. What this means is that SSH key rotation
would simply be a matter of getting another key signed by the same CA and providing
cephadm with the new private key and signed cert. No additional distribution of
keys to cluster nodes is needed after the initial setup of the CA key as a trusted key,

View File

@ -328,15 +328,15 @@ You can disable this health warning by running the following command:
Cluster Configuration Checks
----------------------------
Cephadm periodically scans each of the hosts in the cluster in order
to understand the state of the OS, disks, NICs etc. These facts can
then be analysed for consistency across the hosts in the cluster to
Cephadm periodically scans each host in the cluster in order
to understand the state of the OS, disks, network interfaces, etc. This information can
then be analyzed for consistency across the hosts in the cluster to
identify any configuration anomalies.
Enabling Cluster Configuration Checks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The configuration checks are an **optional** feature, and are enabled
These configuration checks are an **optional** feature, and are enabled
by running the following command:
.. prompt:: bash #
@ -346,7 +346,7 @@ by running the following command:
States Returned by Cluster Configuration Checks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The configuration checks are triggered after each host scan (1m). The
Configuration checks are triggered after each host scan. The
cephadm log entries will show the current state and outcome of the
configuration checks as follows:
@ -383,14 +383,14 @@ To list all the configuration checks and their current states, run the following
# ceph cephadm config-check ls
NAME HEALTHCHECK STATUS DESCRIPTION
kernel_security CEPHADM_CHECK_KERNEL_LSM enabled checks SELINUX/Apparmor profiles are consistent across cluster hosts
os_subscription CEPHADM_CHECK_SUBSCRIPTION enabled checks subscription states are consistent for all cluster hosts
public_network CEPHADM_CHECK_PUBLIC_MEMBERSHIP enabled check that all hosts have a NIC on the Ceph public_network
kernel_security CEPHADM_CHECK_KERNEL_LSM enabled check that SELINUX/Apparmor profiles are consistent across cluster hosts
os_subscription CEPHADM_CHECK_SUBSCRIPTION enabled check that subscription states are consistent for all cluster hosts
public_network CEPHADM_CHECK_PUBLIC_MEMBERSHIP enabled check that all hosts have a network interface on the Ceph public_network
osd_mtu_size CEPHADM_CHECK_MTU enabled check that OSD hosts share a common MTU setting
osd_linkspeed CEPHADM_CHECK_LINKSPEED enabled check that OSD hosts share a common linkspeed
network_missing CEPHADM_CHECK_NETWORK_MISSING enabled checks that the cluster/public networks defined exist on the Ceph hosts
ceph_release CEPHADM_CHECK_CEPH_RELEASE enabled check for Ceph version consistency - ceph daemons should be on the same release (unless upgrade is active)
kernel_version CEPHADM_CHECK_KERNEL_VERSION enabled checks that the MAJ.MIN of the kernel on Ceph hosts is consistent
osd_linkspeed CEPHADM_CHECK_LINKSPEED enabled check that OSD hosts share a common network link speed
network_missing CEPHADM_CHECK_NETWORK_MISSING enabled check that the cluster/public networks as defined exist on the Ceph hosts
ceph_release CEPHADM_CHECK_CEPH_RELEASE enabled check for Ceph version consistency: all Ceph daemons should be the same release unless upgrade is in progress
kernel_version CEPHADM_CHECK_KERNEL_VERSION enabled checks that the maj.min version of the kernel is consistent across Ceph hosts
The name of each configuration check can be used to enable or disable a specific check by running a command of the following form:
:
@ -414,31 +414,31 @@ flagged as an anomaly and a healthcheck (WARNING) state raised.
CEPHADM_CHECK_SUBSCRIPTION
~~~~~~~~~~~~~~~~~~~~~~~~~~
This check relates to the status of vendor subscription. This check is
performed only for hosts using RHEL, but helps to confirm that all hosts are
This check relates to the status of OS vendor subscription. This check is
performed only for hosts using RHEL and helps to confirm that all hosts are
covered by an active subscription, which ensures that patches and updates are
available.
CEPHADM_CHECK_PUBLIC_MEMBERSHIP
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All members of the cluster should have NICs configured on at least one of the
All members of the cluster should have a network interface configured on at least one of the
public network subnets. Hosts that are not on the public network will rely on
routing, which may affect performance.
CEPHADM_CHECK_MTU
~~~~~~~~~~~~~~~~~
The MTU of the NICs on OSDs can be a key factor in consistent performance. This
The MTU of the network interfaces on OSD hosts can be a key factor in consistent performance. This
check examines hosts that are running OSD services to ensure that the MTU is
configured consistently within the cluster. This is determined by establishing
configured consistently within the cluster. This is done by determining
the MTU setting that the majority of hosts is using. Any anomalies result in a
Ceph health check.
health check.
CEPHADM_CHECK_LINKSPEED
~~~~~~~~~~~~~~~~~~~~~~~
This check is similar to the MTU check. Linkspeed consistency is a factor in
consistent cluster performance, just as the MTU of the NICs on the OSDs is.
This check determines the linkspeed shared by the majority of OSD hosts, and a
health check is run for any hosts that are set at a lower linkspeed rate.
This check is similar to the MTU check. Link speed consistency is a factor in
consistent cluster performance, as is the MTU of the OSD node network interfaces.
This check determines the link speed shared by the majority of OSD hosts, and a
health check is run for any hosts that are set at a lower link speed rate.
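To make the majority-based approach used by the MTU and link speed checks
concrete, the following is a small illustrative sketch (not cephadm's actual
implementation); the host names and values are invented:

.. code-block:: python

   from collections import Counter

   # Hypothetical per-host MTU values gathered from a host scan.
   mtu_by_host = {"host1": 9000, "host2": 9000, "host3": 1500, "host4": 9000}

   majority_mtu, _count = Counter(mtu_by_host.values()).most_common(1)[0]
   anomalies = [host for host, mtu in mtu_by_host.items() if mtu != majority_mtu]
   print(f"majority MTU: {majority_mtu}, anomalous hosts: {anomalies}")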
CEPHADM_CHECK_NETWORK_MISSING
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -448,15 +448,14 @@ a health check is raised.
CEPHADM_CHECK_CEPH_RELEASE
~~~~~~~~~~~~~~~~~~~~~~~~~~
Under normal operations, the Ceph cluster runs daemons under the same ceph
release (that is, the Ceph cluster runs all daemons under (for example)
Octopus). This check determines the active release for each daemon, and
Under normal operations, the Ceph cluster runs daemons that are of the same Ceph
release (for example, Reef). This check determines the active release for each daemon, and
reports any anomalies as a healthcheck. *This check is bypassed if an upgrade
process is active within the cluster.*
is in process.*
CEPHADM_CHECK_KERNEL_VERSION
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The OS kernel version (maj.min) is checked for consistency across the hosts.
The OS kernel version (maj.min) is checked for consistency across hosts.
The kernel version of the majority of the hosts is used as the basis for
identifying anomalies.

View File

@ -357,7 +357,9 @@ Or in YAML:
Placement by pattern matching
-----------------------------
Daemons can be placed on hosts as well:
Daemons can be placed on hosts using a host pattern as well.
By default, the host pattern is matched using fnmatch which supports
UNIX shell-style wildcards (see https://docs.python.org/3/library/fnmatch.html):
.. prompt:: bash #
@ -385,6 +387,26 @@ Or in YAML:
placement:
host_pattern: "*"
The host pattern also has support for using a regex. To use a regex, you
must either add "regex: " to the start of the pattern when using the
command line, or specify a ``pattern_type`` field to be "regex"
when using YAML.
On the command line:
.. prompt:: bash #
ceph orch apply prometheus --placement='regex:FOO[0-9]|BAR[0-9]'
In YAML:
.. code-block:: yaml
service_type: prometheus
placement:
host_pattern:
pattern: 'FOO[0-9]|BAR[0-9]'
pattern_type: regex
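To sanity-check how a given pattern will match your host names before applying
a spec, the two matching modes described above can be approximated with
Python's ``fnmatch`` and ``re`` modules (the host names below are made-up
examples, and ``fullmatch`` is used here only for illustration):

.. code-block:: python

   import fnmatch
   import re

   hosts = ["FOO1", "BAR7", "ceph-node-3"]

   # Default mode: UNIX shell-style wildcards, e.g. host_pattern: "ceph-node-*"
   print([h for h in hosts if fnmatch.fnmatch(h, "ceph-node-*")])   # ['ceph-node-3']

   # Regex mode, e.g. pattern_type: regex
   regex = re.compile(r"FOO[0-9]|BAR[0-9]")
   print([h for h in hosts if regex.fullmatch(h)])                  # ['FOO1', 'BAR7']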
Changing the number of daemons
------------------------------

View File

@ -83,6 +83,37 @@ steps below:
ceph orch apply grafana
Enabling security for the monitoring stack
----------------------------------------------
By default, in a cephadm-managed cluster, the monitoring components are set up and configured without enabling security measures.
While this suffices for certain deployments, others with strict security needs may find it necessary to protect the
monitoring stack against unauthorized access. In such cases, cephadm relies on a specific configuration parameter,
``mgr/cephadm/secure_monitoring_stack``, which toggles the security settings for all monitoring components. To activate security
measures, set this option to ``true`` with a command of the following form:
.. prompt:: bash #
ceph config set mgr mgr/cephadm/secure_monitoring_stack true
This change will trigger a sequence of reconfigurations across all monitoring daemons, typically requiring
a few minutes until all components are fully operational. The updated secure configuration includes the following modifications:
#. Prometheus: basic authentication is required to access the web portal and TLS is enabled for secure communication.
#. Alertmanager: basic authentication is required to access the web portal and TLS is enabled for secure communication.
#. Node Exporter: TLS is enabled for secure communication.
#. Grafana: TLS is enabled and authentication is required to access the datasource information.
In this secure setup, users will need to set up authentication
(username/password) for both Prometheus and Alertmanager. By default the
username and password are set to ``admin``/``admin``. The user can change these
values with the commands ``ceph orch prometheus set-credentials`` and ``ceph
orch alertmanager set-credentials`` respectively. These commands offer the
flexibility to input the username/password either as parameters or via a JSON
file, which enhances security. Additionally, Cephadm provides the commands
``orch prometheus get-credentials`` and ``orch alertmanager get-credentials`` to
retrieve the current credentials.
.. _cephadm-monitoring-centralized-logs:
Centralized Logging in Ceph

View File

@ -15,7 +15,7 @@ Deploying NFS ganesha
=====================
Cephadm deploys NFS Ganesha daemon (or set of daemons). The configuration for
NFS is stored in the ``nfs-ganesha`` pool and exports are managed via the
NFS is stored in the ``.nfs`` pool and exports are managed via the
``ceph nfs export ...`` commands and via the dashboard.
To deploy a NFS Ganesha gateway, run the following command:

View File

@ -232,7 +232,7 @@ Remove an OSD
Removing an OSD from a cluster involves two steps:
#. evacuating all placement groups (PGs) from the cluster
#. evacuating all placement groups (PGs) from the OSD
#. removing the PG-free OSD from the cluster
The following command performs these two steps:

View File

@ -246,6 +246,7 @@ It is a yaml format file with the following properties:
virtual_interface_networks: [ ... ] # optional: list of CIDR networks
use_keepalived_multicast: <bool> # optional: Default is False.
vrrp_interface_network: <string>/<string> # optional: ex: 192.168.20.0/24
health_check_interval: <string> # optional: Default is 2s.
ssl_cert: | # optional: SSL certificate and key
-----BEGIN CERTIFICATE-----
...
@ -273,6 +274,7 @@ It is a yaml format file with the following properties:
monitor_port: <integer> # ex: 1967, used by haproxy for load balancer status
virtual_interface_networks: [ ... ] # optional: list of CIDR networks
first_virtual_router_id: <integer> # optional: default 50
health_check_interval: <string> # optional: Default is 2s.
ssl_cert: | # optional: SSL certificate and key
-----BEGIN CERTIFICATE-----
...
@ -321,6 +323,9 @@ where the properties of this service specification are:
keepalived will have different virtual_router_id. In the case of using ``virtual_ips_list``,
each IP will create its own virtual router. So the first one will have ``first_virtual_router_id``,
second one will have ``first_virtual_router_id`` + 1, etc. Valid values go from 1 to 255.
* ``health_check_interval``
Default is 2 seconds. This parameter sets the interval between the health checks
that haproxy performs against the backend servers.
.. _ingress-virtual-ip:

View File

@ -32,7 +32,7 @@ completely by running the following commands:
ceph orch set backend ''
ceph mgr module disable cephadm
These commands disable all of the ``ceph orch ...`` CLI commands. All
These commands disable all ``ceph orch ...`` CLI commands. All
previously deployed daemon containers continue to run and will start just as
they were before you ran these commands.
@ -56,7 +56,7 @@ following form:
ceph orch ls --service_name=<service-name> --format yaml
This will return something in the following form:
This will return information in the following form:
.. code-block:: yaml
@ -252,16 +252,17 @@ For more detail on operations of this kind, see
Accessing the Admin Socket
--------------------------
Each Ceph daemon provides an admin socket that bypasses the MONs (See
:ref:`rados-monitoring-using-admin-socket`).
Each Ceph daemon provides an admin socket that allows runtime option setting and statistic reading. See
:ref:`rados-monitoring-using-admin-socket`.
#. To access the admin socket, enter the daemon container on the host::
[root@mon1 ~]# cephadm enter --name <daemon-name>
#. Run a command of the following form to see the admin socket's configuration::
#. Run a command of the following forms to see the admin socket's configuration and other available actions::
[ceph: root@mon1 /]# ceph --admin-daemon /var/run/ceph/ceph-<daemon-name>.asok config show
[ceph: root@mon1 /]# ceph --admin-daemon /var/run/ceph/ceph-<daemon-name>.asok help
Running Various Ceph Tools
--------------------------------
@ -444,11 +445,11 @@ Running repeated debugging sessions
When using ``cephadm shell``, as in the example above, any changes made to the
container that is spawned by the shell command are ephemeral. After the shell
session exits, the files that were downloaded and installed cease to be
available. You can simply re-run the same commands every time ``cephadm
shell`` is invoked, but in order to save time and resources one can create a
new container image and use it for repeated debugging sessions.
available. You can simply re-run the same commands every time ``cephadm shell``
is invoked, but to save time and resources you can create a new container image
and use it for repeated debugging sessions.
In the following example, we create a simple file that will construct the
In the following example, we create a simple file that constructs the
container image. The command below uses podman but it is expected to work
correctly even if ``podman`` is replaced with ``docker``::
@ -463,14 +464,14 @@ correctly even if ``podman`` is replaced with ``docker``::
The above file creates a new local image named ``ceph:debugging``. This image
can be used on the same machine that built it. The image can also be pushed to
a container repository or saved and copied to a node runing other Ceph
containers. Consult the ``podman`` or ``docker`` documentation for more
a container repository or saved and copied to a node that is running other Ceph
containers. See the ``podman`` or ``docker`` documentation for more
information about the container workflow.
After the image has been built, it can be used to initiate repeat debugging
sessions. By using an image in this way, you avoid the trouble of having to
re-install the debug tools and debuginfo packages every time you need to run a
debug session. To debug a core file using this image, in the same way as
re-install the debug tools and the debuginfo packages every time you need to
run a debug session. To debug a core file using this image, in the same way as
previously described, run:
.. prompt:: bash #

View File

@ -2,7 +2,7 @@
Upgrading Ceph
==============
Cephadm can safely upgrade Ceph from one bugfix release to the next. For
Cephadm can safely upgrade Ceph from one point release to the next. For
example, you can upgrade from v15.2.0 (the first Octopus release) to the next
point release, v15.2.1.
@ -137,25 +137,25 @@ UPGRADE_NO_STANDBY_MGR
----------------------
This alert (``UPGRADE_NO_STANDBY_MGR``) means that Ceph does not detect an
active standby manager daemon. In order to proceed with the upgrade, Ceph
requires an active standby manager daemon (which you can think of in this
active standby Manager daemon. In order to proceed with the upgrade, Ceph
requires an active standby Manager daemon (which you can think of in this
context as "a second manager").
You can ensure that Cephadm is configured to run 2 (or more) managers by
You can ensure that Cephadm is configured to run two (or more) Managers by
running the following command:
.. prompt:: bash #
ceph orch apply mgr 2 # or more
You can check the status of existing mgr daemons by running the following
You can check the status of existing Manager daemons by running the following
command:
.. prompt:: bash #
ceph orch ps --daemon-type mgr
If an existing mgr daemon has stopped, you can try to restart it by running the
If an existing Manager daemon has stopped, you can try to restart it by running the
following command:
.. prompt:: bash #
@ -183,7 +183,7 @@ Using customized container images
=================================
For most users, upgrading requires nothing more complicated than specifying the
Ceph version number to upgrade to. In such cases, cephadm locates the specific
Ceph version to which to upgrade. In such cases, cephadm locates the specific
Ceph container image to use by combining the ``container_image_base``
configuration option (default: ``docker.io/ceph/ceph``) with a tag of
``vX.Y.Z``.
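For reference, an upgrade can therefore be started either by version or by naming an image explicitly. The commands below are a minimal sketch; the version number and image tag are only examples:

.. prompt:: bash #

ceph orch upgrade start --ceph-version 18.2.4
ceph orch upgrade start --image docker.io/ceph/ceph:v18.2.4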


@ -1,11 +1,13 @@
.. _cephfs_add_remote_mds:
.. note::
It is highly recommended to use :doc:`/cephadm/index` or another Ceph
orchestrator for setting up the ceph cluster. Use this approach only if you
are setting up the ceph cluster manually. If one still intends to use the
manual way for deploying MDS daemons, :doc:`/cephadm/services/mds/` can
also be used.
.. warning:: The material on this page is to be used only for manually setting
up a Ceph cluster. If you intend to use an automated tool such as
:doc:`/cephadm/index` to set up a Ceph cluster, do not use the
instructions on this page.
.. note:: If you are certain that you know what you are doing and you intend to
manually deploy MDS daemons, see :doc:`/cephadm/services/mds/` before
proceeding.
============================
Deploying Metadata Servers


@ -258,31 +258,47 @@ Clients that are missing newly added features will be evicted automatically.
Here are the current CephFS features and first release they came out:
+------------------+--------------+-----------------+
+----------------------------+--------------+-----------------+
| Feature | Ceph release | Upstream Kernel |
+==================+==============+=================+
+============================+==============+=================+
| jewel | jewel | 4.5 |
+------------------+--------------+-----------------+
+----------------------------+--------------+-----------------+
| kraken | kraken | 4.13 |
+------------------+--------------+-----------------+
+----------------------------+--------------+-----------------+
| luminous | luminous | 4.13 |
+------------------+--------------+-----------------+
+----------------------------+--------------+-----------------+
| mimic | mimic | 4.19 |
+------------------+--------------+-----------------+
+----------------------------+--------------+-----------------+
| reply_encoding | nautilus | 5.1 |
+------------------+--------------+-----------------+
+----------------------------+--------------+-----------------+
| reclaim_client | nautilus | N/A |
+------------------+--------------+-----------------+
+----------------------------+--------------+-----------------+
| lazy_caps_wanted | nautilus | 5.1 |
+------------------+--------------+-----------------+
+----------------------------+--------------+-----------------+
| multi_reconnect | nautilus | 5.1 |
+------------------+--------------+-----------------+
+----------------------------+--------------+-----------------+
| deleg_ino | octopus | 5.6 |
+------------------+--------------+-----------------+
+----------------------------+--------------+-----------------+
| metric_collect | pacific | N/A |
+------------------+--------------+-----------------+
| alternate_name | pacific | PLANNED |
+------------------+--------------+-----------------+
+----------------------------+--------------+-----------------+
| alternate_name | pacific | 6.5 |
+----------------------------+--------------+-----------------+
| notify_session_state | quincy | 5.19 |
+----------------------------+--------------+-----------------+
| op_getvxattr | quincy | 6.0 |
+----------------------------+--------------+-----------------+
| 32bits_retry_fwd | reef | 6.6 |
+----------------------------+--------------+-----------------+
| new_snaprealm_info | reef | UNKNOWN |
+----------------------------+--------------+-----------------+
| has_owner_uidgid | reef | 6.6 |
+----------------------------+--------------+-----------------+
| client_mds_auth_caps | squid+bp | PLANNED |
+----------------------------+--------------+-----------------+
..
Comment: use `git describe --tags --abbrev=0 <commit>` to lookup release
CephFS Feature Descriptions
@ -340,6 +356,15 @@ Clients can send performance metric to MDS if MDS support this feature.
Clients can set and understand "alternate names" for directory entries. This is
to be used for encrypted file name support.
::
client_mds_auth_caps
To effectively implement ``root_squash`` in a client's ``mds`` caps, the client
must understand that it is enforcing ``root_squash`` and other cap metadata.
Clients without this feature are in danger of dropping updates to files. It is
recommended to set this feature bit.
Global settings
---------------


@ -47,4 +47,4 @@ client cache.
| MDSs | -=-------> | OSDs |
+---------------------+ +--------------------+
.. _Architecture: ../architecture
.. _Architecture: ../../architecture


@ -93,6 +93,15 @@ providing high-availability.
.. note:: Deploying a single mirror daemon is recommended. Running multiple
daemons is untested.
The following file types are supported by mirroring:
- Regular files (-)
- Directory files (d)
- Symbolic link file (l)
Other file types are ignored by mirroring, so they will not be available on a
successfully synchronized peer.
The mirroring module is disabled by default. To enable the mirroring module,
run the following command:


@ -63,6 +63,62 @@ By default, `cephfs-top` uses `client.fstop` user to connect to a Ceph cluster::
$ ceph auth get-or-create client.fstop mon 'allow r' mds 'allow r' osd 'allow r' mgr 'allow r'
$ cephfs-top
Description of Fields
---------------------
1. chit : Cap hit
Percentage of file capability hits over total number of caps
2. dlease : Dentry lease
Percentage of dentry leases handed out over the total dentry lease requests
3. ofiles : Opened files
Number of opened files
4. oicaps : Pinned caps
Number of pinned caps
5. oinodes : Opened inodes
Number of opened inodes
6. rtio : Total size of read IOs
Number of bytes read in input/output operations generated by all processes
7. wtio : Total size of write IOs
Number of bytes written in input/output operations generated by all processes
8. raio : Average size of read IOs
Mean number of bytes read in input/output operations generated by all
processes over the total IO done
9. waio : Average size of write IOs
Mean number of bytes written in input/output operations generated by all
processes over the total IO done
10. rsp : Read speed
Speed of read IOs with respect to the duration since the last refresh of clients
11. wsp : Write speed
Speed of write IOs with respect to the duration since the last refresh of clients
12. rlatavg : Average read latency
Mean value of the read latencies
13. rlatsd : Standard deviation (variance) for read latency
Dispersion of the metric for the read latency relative to its mean
14. wlatavg : Average write latency
Mean value of the write latencies
15. wlatsd : Standard deviation (variance) for write latency
Dispersion of the metric for the write latency relative to its mean
16. mlatavg : Average metadata latency
Mean value of the metadata latencies
17. mlatsd : Standard deviation (variance) for metadata latency
Dispersion of the metric for the metadata latency relative to its mean
Command-Line Options
--------------------


@ -259,3 +259,121 @@ Following is an example of enabling root_squash in a filesystem except within
caps mds = "allow rw fsname=a root_squash, allow rw fsname=a path=/volumes"
caps mon = "allow r fsname=a"
caps osd = "allow rw tag cephfs data=a"
Updating Capabilities using ``fs authorize``
============================================
Beginning with the Reef release, ``fs authorize`` can be used not only to
create a new client with caps for a CephFS, but also to add new caps (for
another CephFS or for another path in the same file system) to an existing
client.
Let's say we run the following command and create a new client::
$ ceph fs authorize a client.x / rw
[client.x]
key = AQAOtSVk9WWtIhAAJ3gSpsjwfIQ0gQ6vfSx/0w==
$ ceph auth get client.x
[client.x]
key = AQAOtSVk9WWtIhAAJ3gSpsjwfIQ0gQ6vfSx/0w==
caps mds = "allow rw fsname=a"
caps mon = "allow r fsname=a"
caps osd = "allow rw tag cephfs data=a"
Previously, running ``fs authorize a client.x / rw`` a second time printed an
error message. Since Reef, it instead prints a message stating that there is
no update::
$ ./bin/ceph fs authorize a client.x / rw
no update for caps of client.x
Adding New Caps Using ``fs authorize``
--------------------------------------
Users can now add caps for another path in the same CephFS::
$ ceph fs authorize a client.x /dir1 rw
updated caps for client.x
$ ceph auth get client.x
[client.x]
key = AQAOtSVk9WWtIhAAJ3gSpsjwfIQ0gQ6vfSx/0w==
caps mds = "allow r fsname=a, allow rw fsname=a path=some/dir"
caps mon = "allow r fsname=a"
caps osd = "allow rw tag cephfs data=a"
Caps can even be added for another CephFS on the Ceph cluster::
$ ceph fs authorize b client.x / rw
updated caps for client.x
$ ceph auth get client.x
[client.x]
key = AQD6tiVk0uJdARAABMaQuLRotxTi3Qdj47FkBA==
caps mds = "allow rw fsname=a, allow rw fsname=b"
caps mon = "allow r fsname=a, allow r fsname=b"
caps osd = "allow rw tag cephfs data=a, allow rw tag cephfs data=b"
Changing rw permissions in caps
-------------------------------
It is not possible to modify caps by running ``fs authorize``, except when
read/write permissions have to be changed. This is because ``fs authorize``
would otherwise be ambiguous. For example, a user runs ``fs authorize
cephfs1 client.x /dir1 rw`` to create a client and then runs ``fs authorize
cephfs1 client.x /dir2 rw`` (note that ``/dir1`` has changed to ``/dir2``).
The second command could be interpreted either as changing ``/dir1`` to
``/dir2`` in the current cap or as authorizing the client with a new cap for
path ``/dir2``. As seen in the previous sections, the second interpretation is
chosen; it is therefore impossible to update any part of a granted capability
other than the read/write permissions. The following shows how the read/write
permissions for ``client.x`` (the client created above) can be changed::
$ ceph fs authorize a client.x / r
[client.x]
key = AQBBKjBkIFhBDBAA6q5PmDDWaZtYjd+jafeVUQ==
$ ceph auth get client.x
[client.x]
key = AQBBKjBkIFhBDBAA6q5PmDDWaZtYjd+jafeVUQ==
caps mds = "allow r fsname=a"
caps mon = "allow r fsname=a"
caps osd = "allow r tag cephfs data=a"
``fs authorize`` never removes any part of caps
-----------------------------------------------
It is not possible to remove caps issued to a client by running ``fs
authorize`` again. For example, if a client cap has ``root_squash`` applied
on a certain CephFS, running ``fs authorize`` again for the same CephFS but
without ``root_squash`` will not lead to any update; the client's caps will
remain unchanged::
$ ceph fs authorize a client.x / rw root_squash
[client.x]
key = AQD61CVkcA1QCRAAd0XYqPbHvcc+lpUAuc6Vcw==
$ ceph auth get client.x
[client.x]
key = AQD61CVkcA1QCRAAd0XYqPbHvcc+lpUAuc6Vcw==
caps mds = "allow rw fsname=a root_squash"
caps mon = "allow r fsname=a"
caps osd = "allow rw tag cephfs data=a"
$ ceph fs authorize a client.x / rw
[client.x]
key = AQD61CVkcA1QCRAAd0XYqPbHvcc+lpUAuc6Vcw==
no update was performed for caps of client.x. caps of client.x remains unchanged.
Similarly, if a client already has caps for FS name ``a`` and path ``dir1``,
running ``fs authorize`` again for FS name ``a`` but path ``dir2`` does not
modify the caps the client already holds; instead, a new cap for ``dir2`` is
granted::
$ ceph fs authorize a client.x /dir1 rw
$ ceph auth get client.x
[client.x]
key = AQC1tyVknMt+JxAAp0pVnbZGbSr/nJrmkMNKqA==
caps mds = "allow rw fsname=a path=/dir1"
caps mon = "allow r fsname=a"
caps osd = "allow rw tag cephfs data=a"
$ ceph fs authorize a client.x /dir2 rw
updated caps for client.x
$ ceph auth get client.x
[client.x]
key = AQC1tyVknMt+JxAAp0pVnbZGbSr/nJrmkMNKqA==
caps mds = "allow rw fsname=a path=dir1, allow rw fsname=a path=dir2"
caps mon = "allow r fsname=a"
caps osd = "allow rw tag cephfs data=a"


@ -15,7 +15,7 @@ Advanced: Metadata repair tools
file system before attempting to repair it.
If you do not have access to professional support for your cluster,
consult the ceph-users mailing list or the #ceph IRC channel.
consult the ceph-users mailing list or the #ceph IRC/Slack channel.
Journal export


@ -501,10 +501,14 @@ To initiate a clone operation use::
$ ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name>
.. note:: The ``subvolume snapshot clone`` command depends upon the above-mentioned config option ``snapshot_clone_no_wait``.
If a snapshot (source subvolume) is a part of non-default group, the group name needs to be specified::
$ ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name> --group_name <subvol_group_name>
If a snapshot (source subvolume) is part of a non-default group, the group name needs to be specified:
Cloned subvolumes can be part of a different group than the source snapshot (by default, cloned subvolumes are created in the default group). To clone to a particular group, use::
$ ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name> --target_group_name <subvol_group_name>
@ -513,13 +517,15 @@ Similar to specifying a pool layout when creating a subvolume, pool layout can b
$ ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name> --pool_layout <pool_layout>
Configure the maximum number of concurrent clones. The default is 4::
$ ceph config set mgr mgr/volumes/max_concurrent_clones <value>
To check the status of a clone operation use::
$ ceph fs clone status <vol_name> <clone_name> [--group_name <group_name>]
ceph fs subvolume snapshot clone <vol_name> <subvol_name> <snap_name> <target_subvol_name> --pool_layout <pool_layout>
To check the status of a clone operation use:
.. prompt:: bash #
ceph fs clone status <vol_name> <clone_name> [--group_name <group_name>]
A clone can be in one of the following states:
@ -616,6 +622,31 @@ On successful cancellation, the cloned subvolume is moved to the ``canceled`` st
.. note:: A canceled clone may be deleted by supplying the ``--force`` option to the `fs subvolume rm` command.
Configurables
~~~~~~~~~~~~~
Configure the maximum number of concurrent clone operations. The default is 4:
.. prompt:: bash #
ceph config set mgr mgr/volumes/max_concurrent_clones <value>
Configure the ``snapshot_clone_no_wait`` option:
The ``snapshot_clone_no_wait`` config option is used to reject clone-creation requests when cloner threads
(which can be configured using the above option, ``max_concurrent_clones``) are not available.
It is enabled by default (that is, the value is set to ``True``), and it can be configured by using the following command:
.. prompt:: bash #
ceph config set mgr mgr/volumes/snapshot_clone_no_wait <bool>
The current value of ``snapshot_clone_no_wait`` can be fetched by using the following command:
.. prompt:: bash #
ceph config get mgr mgr/volumes/snapshot_clone_no_wait
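For example, to turn the rejection behavior off so that clone requests are queued instead (a brief illustration; adjust the value to your needs):

.. prompt:: bash #

ceph config set mgr mgr/volumes/snapshot_clone_no_wait false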
.. _subvol-pinning:


@ -130,7 +130,9 @@ other daemons, please see :ref:`health-checks`.
from properly cleaning up resources used by client requests. This message
appears if a client appears to have more than ``max_completed_requests``
(default 100000) requests that are complete on the MDS side but haven't
yet been accounted for in the client's *oldest tid* value.
yet been accounted for in the client's *oldest tid* value. The last tid
used by the MDS to trim completed client requests (or flush) is included
in the output of the `session ls` (or `client ls`) command as a debugging aid.
``MDS_DAMAGE``
--------------
@ -238,3 +240,32 @@ other daemons, please see :ref:`health-checks`.
Description
All MDS ranks are unavailable, resulting in the file system being completely
offline.
``MDS_CLIENTS_LAGGY``
----------------------------
Message
"Client *ID* is laggy; not evicted because some OSD(s) is/are laggy"
Description
If one or more OSDs are laggy (due to conditions such as a network cut-off),
clients might become laggy as well (sessions might go idle or be unable to
flush dirty data for cap revokes). If ``defer_client_eviction_on_laggy_osds``
is set to true (the default), client eviction will not take place and this
health warning will be generated instead.
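As a brief sketch of how this behavior can be inspected or changed (the option name is taken from the description above):

.. prompt:: bash #

ceph config get mds defer_client_eviction_on_laggy_osds
ceph config set mds defer_client_eviction_on_laggy_osds false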
``MDS_CLIENTS_BROKEN_ROOTSQUASH``
---------------------------------
Message
"X client(s) with broken root_squash implementation (MDS_CLIENTS_BROKEN_ROOTSQUASH)"
Description
A bug was discovered in root_squash that could potentially lose changes made by a
client restricted with root_squash caps. The fix required a change to the protocol,
so a client upgrade is required.
This is a HEALTH_ERR warning because of the danger of inconsistency and lost
data. It is recommended to either upgrade your clients, discontinue using
root_squash in the interim, or silence the warning if desired.
To evict and permanently block broken clients from connecting to the
cluster, set the ``required_client_feature`` bit ``client_mds_auth_caps``.
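A minimal sketch of doing so with the ``fs required_client_features`` command (substitute your file system name)::

ceph fs required_client_features <fs name> add client_mds_auth_caps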


@ -116,7 +116,7 @@ The mechanism provided for this purpose is called an ``export pin``, an
extended attribute of directories. The name of this extended attribute is
``ceph.dir.pin``. Users can set this attribute using standard commands:
::
.. prompt:: bash #
setfattr -n ceph.dir.pin -v 2 path/to/dir
@ -128,7 +128,7 @@ pin. In this way, setting the export pin on a directory affects all of its
children. However, the parent's pin can be overridden by setting the child
directory's export pin. For example:
::
.. prompt:: bash #
mkdir -p a/b
# "a" and "a/b" both start without an export pin set
@ -173,7 +173,7 @@ immediate children across a range of MDS ranks. The canonical example use-case
would be the ``/home`` directory: we want every user's home directory to be
spread across the entire MDS cluster. This can be set via:
::
.. prompt:: bash #
setfattr -n ceph.dir.pin.distributed -v 1 /cephfs/home
may be ephemerally pinned. This is set through the extended attribute
``ceph.dir.pin.random`` with the value set to the fraction of directories
that should be pinned. For example:
::
.. prompt:: bash #
setfattr -n ceph.dir.pin.random -v 0.5 /cephfs/tmp
@ -205,7 +205,7 @@ Ephemeral pins may override parent export pins and vice versa. What determines
which policy is followed is the rule of the closest parent: if a closer parent
directory has a conflicting policy, use that one instead. For example:
::
.. prompt:: bash #
mkdir -p foo/bar1/baz foo/bar2
setfattr -n ceph.dir.pin -v 0 foo
@ -217,7 +217,7 @@ directory will obey the pin on ``foo`` normally.
For the reverse situation:
::
.. prompt:: bash #
mkdir -p home/{patrick,john}
setfattr -n ceph.dir.pin.distributed -v 1 home
@ -229,7 +229,8 @@ because its export pin overrides the policy on ``home``.
To remove a partitioning policy, remove the respective extended attribute
or set the value to 0.
.. code::bash
.. prompt:: bash #
$ setfattr -n ceph.dir.pin.distributed -v 0 home
# or
$ setfattr -x ceph.dir.pin.distributed home
@ -237,10 +238,36 @@ or set the value to 0.
For export pins, remove the extended attribute or set the extended attribute
value to `-1`.
.. code::bash
.. prompt:: bash #
$ setfattr -n ceph.dir.pin -v -1 home
Dynamic Subtree Partitioning
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CephFS has long had a dynamic metadata balancer (sometimes called the "default
balancer") which can split or merge subtrees while placing them on "colder" MDS
ranks. Moving the metadata around can improve overall file system throughput
and cache size.
However, the balancer has suffered from problems with efficiency and
performance, so it is turned off by default. This is to avoid a situation in
which an administrator "turns on multimds" by increasing the ``max_mds``
setting and then finds that the balancer has made a mess of the cluster's
performance (reverting is straightforward but can take time).
The setting to turn on the balancer is:
.. prompt:: bash #
ceph fs set <fs_name> balance_automate true
Turning on the balancer should only be done with appropriate configuration,
such as with the ``bal_rank_mask`` setting (described below). Careful
monitoring of file system performance and of the MDS is advised.
Dynamic subtree partitioning with Balancer on specific ranks
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@ -260,27 +287,27 @@ static pinned subtrees.
This option can be configured with the ``ceph fs set`` command. For example:
::
.. prompt:: bash #
ceph fs set <fs_name> bal_rank_mask <hex>
Each bit of the ``<hex>`` number represents a dedicated rank. If ``<hex>`` is
set to ``0x3``, the balancer runs on active ranks ``0`` and ``1``. For example:
::
.. prompt:: bash #
ceph fs set <fs_name> bal_rank_mask 0x3
If the ``bal_rank_mask`` is set to ``-1`` or ``all``, all active ranks are masked
and utilized by the balancer. As an example:
::
.. prompt:: bash #
ceph fs set <fs_name> bal_rank_mask -1
On the other hand, if the balancer needs to be disabled,
the ``bal_rank_mask`` should be set to ``0x0``. For example:
::
.. prompt:: bash #
ceph fs set <fs_name> bal_rank_mask 0x0


@ -21,6 +21,14 @@ value::
setfattr -n ceph.quota.max_bytes -v 100000000 /some/dir # 100 MB
setfattr -n ceph.quota.max_files -v 10000 /some/dir # 10,000 files
``ceph.quota.max_bytes`` can also be set using human-friendly units::
setfattr -n ceph.quota.max_bytes -v 100K /some/dir # 100 KiB
setfattr -n ceph.quota.max_bytes -v 5Gi /some/dir # 5 GiB
.. note:: Values are strictly interpreted as IEC units even when SI units
are input, e.g. ``1K`` is interpreted as 1024 bytes.
To view the quota limit::
$ getfattr -n ceph.quota.max_bytes /some/dir


@ -30,9 +30,9 @@ assumed to be keyword arguments too.
Snapshot schedules are identified by path, their repeat interval and their start
time. The
repeat interval defines the time between two subsequent snapshots. It is
specified by a number and a period multiplier, one of `h(our)`, `d(ay)` and
`w(eek)`. E.g. a repeat interval of `12h` specifies one snapshot every 12
hours.
specified by a number and a period multiplier, one of `h(our)`, `d(ay)`,
`w(eek)`, `M(onth)` and `Y(ear)`. E.g. a repeat interval of `12h` specifies one
snapshot every 12 hours.
The start time is specified as a time string (more details about passing times
below). By default
the start time is last midnight. So when a snapshot schedule with repeat
@ -52,8 +52,8 @@ space or concatenated pairs of `<number><time period>`.
The semantics are that a spec will ensure `<number>` snapshots are kept that are
at least `<time period>` apart. For Example `7d` means the user wants to keep 7
snapshots that are at least one day (but potentially longer) apart from each other.
The following time periods are recognized: `h(our), d(ay), w(eek), m(onth),
y(ear)` and `n`. The latter is a special modifier where e.g. `10n` means keep
The following time periods are recognized: `h(our)`, `d(ay)`, `w(eek)`, `M(onth)`,
`Y(ear)` and `n`. The latter is a special modifier where e.g. `10n` means keep
the last 10 snapshots regardless of timing.
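As a brief sketch of how the `M(onth)` multiplier can be used for both schedules and retention (the path and values are only examples)::

$ ceph fs snap-schedule add /some/path 1M
$ ceph fs snap-schedule retention add /some/path 6M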
All subcommands take the optional `fs` argument to specify paths in


@ -118,10 +118,16 @@ enforces this affinity.
When failing over MDS daemons, a cluster's monitors will prefer standby daemons with
``mds_join_fs`` equal to the file system ``name`` with the failed ``rank``. If no
standby exists with ``mds_join_fs`` equal to the file system ``name``, it will
choose an unqualified standby (no setting for ``mds_join_fs``) for the replacement,
or any other available standby, as a last resort. Note, this does not change the
behavior that ``standby-replay`` daemons are always selected before
other standbys.
choose an unqualified standby (no setting for ``mds_join_fs``) for the replacement.
As a last resort, a standby for another filesystem will be chosen, although this
behavior can be disabled:
::
ceph fs set <fs name> refuse_standby_for_another_fs true
Note that configuring MDS file system affinity does not change the behavior that
``standby-replay`` daemons are always selected before other standbys.
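As a brief sketch, the affinity itself is configured by setting ``mds_join_fs`` on a standby daemon (substitute your daemon and file system names)::

ceph config set mds.<daemon name> mds_join_fs <fs name>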
Even further, the monitors will regularly examine the CephFS file systems even when
stable to check if a standby with stronger affinity is available to replace an


@ -401,3 +401,64 @@ own copy of the cephadm "binary" use the script located at
``./src/cephadm/build.py [output]``.
.. _Python Zip Application: https://peps.python.org/pep-0441/
You can pass a limited set of version metadata values to be stored in the
compiled cephadm. These options can be passed to the build script with
the ``--set-version-var`` or ``-S`` option. The values should take the form
``KEY=VALUE`` and valid keys include:
* ``CEPH_GIT_VER``
* ``CEPH_GIT_NICE_VER``
* ``CEPH_RELEASE``
* ``CEPH_RELEASE_NAME``
* ``CEPH_RELEASE_TYPE``
Example: ``./src/cephadm/build.py -SCEPH_GIT_VER=$(git rev-parse HEAD) -SCEPH_GIT_NICE_VER=$(git describe) /tmp/cephadm``
Typically these values will be passed to build.py by other, higher-level build
tools, such as CMake.
The compiled version of the binary may include a curated set of dependencies
within the zipapp. The tool used to fetch the bundled dependencies can be
Python's ``pip``, locally installed RPMs, or bundled dependencies can be
disabled. To select the mode for bundled dependencies use the
``--bundled-dependencies`` or ``-B`` option with a value of ``pip``, ``rpm``,
or ``none``.
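For example, a hypothetical invocation that bundles dependencies with ``pip`` (the output path is arbitrary)::

./src/cephadm/build.py --bundled-dependencies pip /tmp/cephadm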
The compiled cephadm zipapp file retains metadata about how it was built. This
can be displayed by running ``cephadm version --verbose``. The command will
emit a JSON formatted object showing version metadata (if available), a list of
the bundled dependencies generated by the build script (if bundled dependencies
were enabled), and a summary of the top-level contents of the zipapp. Example::
$ ./cephadm version --verbose
{
"name": "cephadm",
"ceph_git_nice_ver": "18.0.0-6867-g6a1df2d0b01",
"ceph_git_ver": "6a1df2d0b01da581bfef3357940e1e88d5ce70ce",
"ceph_release_name": "reef",
"ceph_release_type": "dev",
"bundled_packages": [
{
"name": "Jinja2",
"version": "3.1.2",
"package_source": "pip",
"requirements_entry": "Jinja2 == 3.1.2"
},
{
"name": "MarkupSafe",
"version": "2.1.3",
"package_source": "pip",
"requirements_entry": "MarkupSafe == 2.1.3"
}
],
"zip_root_entries": [
"Jinja2-3.1.2-py3.9.egg-info",
"MarkupSafe-2.1.3-py3.9.egg-info",
"__main__.py",
"__main__.pyc",
"_cephadmmeta",
"cephadmlib",
"jinja2",
"markupsafe"
]
}


@ -148,7 +148,7 @@ options. By default, ``log-to-stdout`` is enabled, and ``--log-to-syslog`` is di
vstart.sh
---------
The following options aree handy when using ``vstart.sh``,
The following options can be used with ``vstart.sh``.
``--crimson``
Start ``crimson-osd`` instead of ``ceph-osd``.
@ -195,9 +195,6 @@ The following options aree handy when using ``vstart.sh``,
Valid types include ``HDD``, ``SSD`` (default), ``ZNS``, and ``RANDOM_BLOCK_SSD``.
Note that secondary devices should not be faster than the main device.
``--seastore``
Use SeaStore as the object store backend.
To start a cluster with a single Crimson node, run::
$ MGR=1 MON=1 OSD=1 MDS=0 RGW=0 ../src/vstart.sh -n -x \


@ -1,3 +1,5 @@
.. _crimson_dev_doc:
===============================
Crimson developer documentation
===============================


@ -13,20 +13,18 @@ following table shows all the leads and their nicks on `GitHub`_:
.. _github: https://github.com/
========= ================ =============
========= ================== =============
Scope Lead GitHub nick
========= ================ =============
Ceph Sage Weil liewegas
RADOS Neha Ojha neha-ojha
RGW Yehuda Sadeh yehudasa
========= ================== =============
RADOS Radoslaw Zarzynski rzarzynski
RGW Casey Bodley cbodley
RGW Matt Benjamin mattbenjamin
RBD Ilya Dryomov dis
CephFS Venky Shankar vshankar
Dashboard Ernesto Puerta epuertat
MON Joao Luis jecluis
Dashboard Nizamudeen A nizamial09
Build/Ops Ken Dreyer ktdreyer
Docs Zac Dover zdover23
========= ================ =============
========= ================== =============
The Ceph-specific acronyms in the table are explained in
:doc:`/architecture`.


@ -209,6 +209,15 @@ For example: for the above test ID, the path is::
This method can be used to view the log more quickly than would be possible through a browser.
In addition to ``teuthology.log``, some other files are included for debugging
purposes:
* ``unit_test_summary.yaml``: Provides a summary of all unit test failures.
Generated (optionally) when the ``unit_test_scan`` configuration option is
used in the job's YAML file.
* ``valgrind.yaml``: Summarizes any Valgrind errors that may occur.
.. note:: To access archives more conveniently, ``/a/`` has been symbolically
linked to ``/ceph/teuthology-archive/``. For instance, to access the previous
example, we can use something like::


@ -2,10 +2,14 @@
Ceph Internals
================
.. note:: If you're looking for how to use Ceph as a library from your
own software, please see :doc:`/api/index`.
.. note:: For information on how to use Ceph as a library (from your own
software), see :doc:`/api/index`.
You can start a development mode Ceph cluster, after compiling the source, with::
Starting a Development-mode Ceph Cluster
----------------------------------------
Compile the source and then run the following commands to start a
development-mode Ceph cluster::
cd build
OSD=3 MON=3 MGR=3 ../src/vstart.sh -n -x


@ -218,6 +218,8 @@ we may want to exploit.
The dedup-tool needs to be updated to use ``LIST_SNAPS`` to discover
clones as part of leak detection.
.. _osd-make-writeable:
An important question is how we deal with the fact that many clones
will frequently have references to the same backing chunks at the same
offset. In particular, ``make_writeable`` will generally create a clone


@ -23,12 +23,11 @@ The difference between *pool snaps* and *self managed snaps* from the
OSD's point of view lies in whether the *SnapContext* comes to the OSD
via the client's MOSDOp or via the most recent OSDMap.
See OSD::make_writeable
See :ref:`manifest.rst <osd-make-writeable>` for more information.
Ondisk Structures
-----------------
Each object has in the PG collection a *head* object (or *snapdir*, which we
will come to shortly) and possibly a set of *clone* objects.
Each object has in the PG collection a *head* object and possibly a set of *clone* objects.
Each hobject_t has a snap field. For the *head* (the only writeable version
of an object), the snap field is set to CEPH_NOSNAP. For the *clones*, the
snap field is set to the *seq* of the *SnapContext* at their creation.
@ -47,8 +46,12 @@ The *head* object contains a *SnapSet* encoded in an attribute, which tracks
3. Overlapping intervals between clones for tracking space usage
4. Clone size
If the *head* is deleted while there are still clones, a *snapdir* object
is created instead to house the *SnapSet*.
The *head* can't be deleted while there are still clones. Instead, it is
marked as whiteout (``object_info_t::FLAG_WHITEOUT``) in order to house the
*SnapSet* contained in it.
In that case, the *head* object no longer logically exists.
See: should_whiteout()
Additionally, the *object_info_t* on each clone includes a vector of the snaps
for which that clone is defined.
@ -126,3 +129,111 @@ up to 8 prefixes need to be checked to determine all hobjects in a particular
snap for a particular PG. Upon split, the prefixes to check on the parent
are adjusted such that only the objects remaining in the PG will be visible.
The children will immediately have the correct mapping.
clone_overlap
-------------
Each SnapSet attached to the *head* object contains the overlapping intervals
between clone objects for optimizing space.
The overlapping intervals are stored within the ``clone_overlap`` map; each element in the
map stores the snap ID and the corresponding overlap with the next newest clone.
See the following example using a 4-byte object:
+--------+---------+
| object | content |
+========+=========+
| head | [AAAA] |
+--------+---------+
The ``listsnaps`` output is as follows:
+---------+-------+------+---------+
| cloneid | snaps | size | overlap |
+=========+=======+======+=========+
| head | - | 4 | |
+---------+-------+------+---------+
After taking a snapshot (ID 1) and re-writing the first 2 bytes of the object,
the clone created will overlap with the new *head* object in its last 2 bytes.
+------------+---------+
| object | content |
+============+=========+
| head | [BBAA] |
+------------+---------+
| clone ID 1 | [AAAA] |
+------------+---------+
+---------+-------+------+---------+
| cloneid | snaps | size | overlap |
+=========+=======+======+=========+
| 1 | 1 | 4 | [2~2] |
+---------+-------+------+---------+
| head | - | 4 | |
+---------+-------+------+---------+
After taking another snapshot (ID 2) and this time re-writing only the first byte of the object,
the newly created clone (ID 2) will overlap with the new *head* object in its last 3 bytes,
while the oldest clone (ID 1) will overlap with the newest clone in its last 2 bytes.
+------------+---------+
| object | content |
+============+=========+
| head | [CBAA] |
+------------+---------+
| clone ID 2 | [BBAA] |
+------------+---------+
| clone ID 1 | [AAAA] |
+------------+---------+
+---------+-------+------+---------+
| cloneid | snaps | size | overlap |
+=========+=======+======+=========+
| 1 | 1 | 4 | [2~2] |
+---------+-------+------+---------+
| 2 | 2 | 4 | [1~3] |
+---------+-------+------+---------+
| head | - | 4 | |
+---------+-------+------+---------+
If the *head* object is then completely re-written (all 4 bytes),
the only overlap that remains will be between the two clones.
+------------+---------+
| object | content |
+============+=========+
| head | [DDDD] |
+------------+---------+
| clone ID 2 | [BBAA] |
+------------+---------+
| clone ID 1 | [AAAA] |
+------------+---------+
+---------+-------+------+---------+
| cloneid | snaps | size | overlap |
+=========+=======+======+=========+
| 1 | 1 | 4 | [2~2] |
+---------+-------+------+---------+
| 2 | 2 | 4 | |
+---------+-------+------+---------+
| head | - | 4 | |
+---------+-------+------+---------+
Lastly, after the last snap (ID 2) is removed and snaptrim kicks in,
no overlapping intervals will remain:
+------------+---------+
| object | content |
+============+=========+
| head | [DDDD] |
+------------+---------+
| clone ID 1 | [AAAA] |
+------------+---------+
+---------+-------+------+---------+
| cloneid | snaps | size | overlap |
+=========+=======+======+=========+
| 1 | 1 | 4 | |
+---------+-------+------+---------+
| head | - | 4 | |
+---------+-------+------+---------+
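The overlap column in the listings above can also be inspected on a live cluster with the ``rados`` utility. The following is a rough sketch using pool snapshots (the pool and object names are only examples); note that ``rados put`` performs a full-object write, so the resulting clone's overlap with the *head* is empty, and it is partial overwrites (as in the examples above) that populate the overlap column::

rados -p test put foo ./data      # create the head object
rados -p test mksnap snap1        # take a pool snapshot (snap ID 1)
rados -p test put foo ./data2     # full rewrite: a clone of the old head is created
rados -p test listsnaps foo       # shows cloneid, snaps, size and overlap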


@ -6,92 +6,87 @@ Concepts
--------
*Peering*
the process of bringing all of the OSDs that store
a Placement Group (PG) into agreement about the state
of all of the objects (and their metadata) in that PG.
Note that agreeing on the state does not mean that
they all have the latest contents.
the process of bringing all of the OSDs that store a Placement Group (PG)
into agreement about the state of all of the objects in that PG and all of
the metadata associated with those objects. Two OSDs can agree on the state
of the objects in the placement group yet still may not necessarily have the
latest contents.
*Acting set*
the ordered list of OSDs who are (or were as of some epoch)
responsible for a particular PG.
the ordered list of OSDs that are (or were as of some epoch) responsible for
a particular PG.
*Up set*
the ordered list of OSDs responsible for a particular PG for
a particular epoch according to CRUSH. Normally this
is the same as the *acting set*, except when the *acting set* has been
explicitly overridden via *PG temp* in the OSDMap.
the ordered list of OSDs responsible for a particular PG for a particular
epoch, according to CRUSH. This is the same as the *acting set* except when
the *acting set* has been explicitly overridden via *PG temp* in the OSDMap.
*PG temp*
a temporary placement group acting set used while backfilling the
primary osd. Let say acting is [0,1,2] and we are
active+clean. Something happens and acting is now [3,1,2]. osd 3 is
empty and can't serve reads although it is the primary. osd.3 will
see that and request a *PG temp* of [1,2,3] to the monitors using a
MOSDPGTemp message so that osd.1 temporarily becomes the
primary. It will select osd.3 as a backfill peer and continue to
serve reads and writes while osd.3 is backfilled. When backfilling
is complete, *PG temp* is discarded and the acting set changes back
to [3,1,2] and osd.3 becomes the primary.
a temporary placement group acting set that is used while backfilling the
primary OSD. Assume that the acting set is ``[0,1,2]`` and we are
``active+clean``. Now assume that something happens and the acting set
becomes ``[3,1,2]``. Under these circumstances, OSD ``3`` is empty and can't
serve reads even though it is the primary. ``osd.3`` will respond by
requesting a *PG temp* of ``[1,2,3]`` to the monitors using a ``MOSDPGTemp``
message, and ``osd.1`` will become the primary temporarily. ``osd.1`` will
select ``osd.3`` as a backfill peer and will continue to serve reads and
writes while ``osd.3`` is backfilled. When backfilling is complete, *PG
temp* is discarded. The acting set changes back to ``[3,1,2]`` and ``osd.3``
becomes the primary.
*current interval* or *past interval*
a sequence of OSD map epochs during which the *acting set* and *up
set* for particular PG do not change
a sequence of OSD map epochs during which the *acting set* and the *up
set* for a particular PG do not change.
*primary*
the (by convention first) member of the *acting set*,
who is responsible for coordination peering, and is
the only OSD that will accept client initiated
writes to objects in a placement group.
the member of the *acting set* that is responsible for coordinating peering.
The only OSD that accepts client-initiated writes to the objects in a
placement group. By convention, the primary is the first member of the
*acting set*.
*replica*
a non-primary OSD in the *acting set* for a placement group
(and who has been recognized as such and *activated* by the primary).
a non-primary OSD in the *acting set* of a placement group. A replica has
been recognized as a non-primary OSD and has been *activated* by the
primary.
*stray*
an OSD who is not a member of the current *acting set*, but
has not yet been told that it can delete its copies of a
particular placement group.
an OSD that is not a member of the current *acting set* and has not yet been
told to delete its copies of a particular placement group.
*recovery*
ensuring that copies of all of the objects in a PG
are on all of the OSDs in the *acting set*. Once
*peering* has been performed, the primary can start
accepting write operations, and *recovery* can proceed
in the background.
the process of ensuring that copies of all of the objects in a PG are on all
of the OSDs in the *acting set*. After *peering* has been performed, the
primary can begin accepting write operations and *recovery* can proceed in
the background.
*PG info*
basic metadata about the PG's creation epoch, the version
for the most recent write to the PG, *last epoch started*, *last
epoch clean*, and the beginning of the *current interval*. Any
inter-OSD communication about PGs includes the *PG info*, such that
any OSD that knows a PG exists (or once existed) also has a lower
bound on *last epoch clean* or *last epoch started*.
basic metadata about the PG's creation epoch, the version for the most
recent write to the PG, the *last epoch started*, the *last epoch clean*,
and the beginning of the *current interval*. Any inter-OSD communication
about PGs includes the *PG info*, such that any OSD that knows a PG exists
(or once existed) also has a lower bound on *last epoch clean* or *last
epoch started*.
*PG log*
a list of recent updates made to objects in a PG.
Note that these logs can be truncated after all OSDs
in the *acting set* have acknowledged up to a certain
point.
a list of recent updates made to objects in a PG. These logs can be
truncated after all OSDs in the *acting set* have acknowledged the changes.
*missing set*
Each OSD notes update log entries and if they imply updates to
the contents of an object, adds that object to a list of needed
updates. This list is called the *missing set* for that <OSD,PG>.
the set of all objects that have not yet had their contents updated to match
the log entries. The missing set is collated by each OSD. Missing sets are
kept track of on an ``<OSD,PG>`` basis.
*Authoritative History*
a complete, and fully ordered set of operations that, if
performed, would bring an OSD's copy of a Placement Group
up to date.
a complete and fully-ordered set of operations that bring an OSD's copy of a
Placement Group up to date.
*epoch*
a (monotonically increasing) OSD map version number
a (monotonically increasing) OSD map version number.
*last epoch start*
the last epoch at which all nodes in the *acting set*
for a particular placement group agreed on an
*authoritative history*. At this point, *peering* is
deemed to have been successful.
the last epoch at which all nodes in the *acting set* for a given placement
group agreed on an *authoritative history*. At the start of the last epoch,
*peering* is deemed to have been successful.
*up_thru*
before a primary can successfully complete the *peering* process,
@ -107,10 +102,9 @@ Concepts
- *acting set* = [B] (B restarts, A does not)
*last epoch clean*
the last epoch at which all nodes in the *acting set*
for a particular placement group were completely
up to date (both PG logs and object contents).
At this point, *recovery* is deemed to have been
the last epoch at which all nodes in the *acting set* for a given placement
group were completely up to date (this includes both the PG's logs and the
PG's object contents). At this point, *recovery* is deemed to have been
completed.
Description of the Peering Process


@ -213,10 +213,24 @@
Ceph cluster. See :ref:`the "Cluster Map" section of the
Architecture document<architecture_cluster_map>` for details.
Crimson
A next-generation OSD architecture whose core aim is the
reduction of latency costs incurred due to cross-core
communications. A re-design of the OSD that reduces lock
contention by reducing communication between shards in the data
path. Crimson improves upon the performance of classic Ceph
OSDs by eliminating reliance on thread pools. See `Crimson:
Next-generation Ceph OSD for Multi-core Scalability
<https://ceph.io/en/news/blog/2023/crimson-multi-core-scalability/>`_.
See the :ref:`Crimson developer
documentation<crimson_dev_doc>`.
CRUSH
**C**\ontrolled **R**\eplication **U**\nder **S**\calable
**H**\ashing. The algorithm that Ceph uses to compute object
storage locations.
storage locations. See `CRUSH: Controlled, Scalable,
Decentralized Placement of Replicated Data
<https://ceph.com/assets/pdfs/weil-crush-sc06.pdf>`_.
CRUSH rule
The CRUSH data placement rule that applies to a particular
@ -255,17 +269,31 @@
Hybrid OSD
Refers to an OSD that has both HDD and SSD drives.
librados
An API that can be used to create a custom interface to a Ceph
storage cluster. ``librados`` makes it possible to interact
with Ceph Monitors and with OSDs. See :ref:`Introduction to
librados <librados-intro>`. See :ref:`librados (Python)
<librados-python>`.
LVM tags
**L**\ogical **V**\olume **M**\anager tags. Extensible metadata
for LVM volumes and groups. They are used to store
Ceph-specific information about devices and their relationship
with OSDs.
:ref:`MDS<cephfs_add_remote_mds>`
MDS
The Ceph **M**\eta\ **D**\ata **S**\erver daemon. Also referred
to as "ceph-mds". The Ceph metadata server daemon must be
running in any Ceph cluster that runs the CephFS file system.
The MDS stores all filesystem metadata.
The MDS stores all filesystem metadata. :term:`Client`\s work
together with either a single MDS or a group of MDSes to
maintain a distributed metadata cache that is required by
CephFS.
See :ref:`Deploying Metadata Servers<cephfs_add_remote_mds>`.
See the :ref:`ceph-mds man page<ceph_mds_man>`.
MGR
The Ceph manager software, which collects all the state from
@ -274,12 +302,30 @@
:ref:`MON<arch_monitor>`
The Ceph monitor software.
Monitor Store
The persistent storage that is used by the Monitor. This
includes the Monitor's RocksDB and all related files in
``/var/lib/ceph``.
Node
See :term:`Ceph Node`.
Object Storage Device
See :term:`OSD`.
OMAP
"object map". A key-value store (a database) that is used to
reduce the time it takes to read data from and to write to the
Ceph cluster. RGW bucket indexes are stored as OMAPs.
Erasure-coded pools cannot store RADOS OMAP data structures.
Run the command ``ceph osd df`` to see your OMAPs.
See Eleanor Cawthon's 2012 paper `A Distributed Key-Value Store
using Ceph
<https://ceph.io/assets/pdfs/CawthonKeyValueStore.pdf>`_ (17
pages).
OSD
Probably :term:`Ceph OSD`, but not necessarily. Sometimes
(especially in older correspondence, and especially in
@ -291,18 +337,19 @@
mid-2010s to insist that "OSD" should refer to "Object Storage
Device", so it is important to know which meaning is intended.
OSD fsid
This is a unique identifier used to identify an OSD. It is
found in the OSD path in a file called ``osd_fsid``. The
term ``fsid`` is used interchangeably with ``uuid``
OSD FSID
The OSD fsid is a unique identifier that is used to identify an
OSD. It is found in the OSD path in a file called ``osd_fsid``.
The term ``FSID`` is used interchangeably with ``UUID``.
OSD id
The integer that defines an OSD. It is generated by the
monitors during the creation of each OSD.
OSD ID
The OSD ID is an integer that is unique to each OSD (each OSD has
a unique OSD ID). Each OSD ID is generated by the monitors during
the creation of its associated OSD.
OSD uuid
This is the unique identifier of an OSD. This term is used
interchangeably with ``fsid``
OSD UUID
The OSD UUID is the unique identifier of an OSD. This term is
used interchangeably with ``FSID``.
Period
In the context of :term:`RGW`, a period is the configuration


@ -0,0 +1,183 @@
.. _hardware-monitoring:
Hardware monitoring
===================
`node-proxy` is the internal name of the running agent that inventories a machine's hardware, provides the different statuses, and enables the operator to perform some actions.
It gathers details from the RedFish API, then processes and pushes the data to the agent endpoint in the Ceph manager daemon.
.. graphviz::
digraph G {
node [shape=record];
mgr [label="{<mgr> ceph manager}"];
dashboard [label="<dashboard> ceph dashboard"];
agent [label="<agent> agent"];
redfish [label="<redfish> redfish"];
agent -> redfish [label=" 1." color=green];
agent -> mgr [label=" 2." color=orange];
dashboard:dashboard -> mgr [label=" 3."color=lightgreen];
node [shape=plaintext];
legend [label=<<table border="0" cellborder="1" cellspacing="0">
<tr><td bgcolor="lightgrey">Legend</td></tr>
<tr><td align="center">1. Collects data from redfish API</td></tr>
<tr><td align="left">2. Pushes data to ceph mgr</td></tr>
<tr><td align="left">3. Query ceph mgr</td></tr>
</table>>];
}
Limitations
-----------
For the time being, the `node-proxy` agent relies on the RedFish API.
This implies that both the `node-proxy` agent and the `ceph-mgr` daemon need to be able to access the out-of-band network in order to work.
Deploying the agent
-------------------
| The first step is to provide the out-of-band management tool credentials.
| This can be done when adding the host with a service spec file:
.. code-block:: bash
# cat host.yml
---
service_type: host
hostname: node-10
addr: 10.10.10.10
oob:
addr: 20.20.20.10
username: admin
password: p@ssword
Apply the spec:
.. code-block:: bash
# ceph orch apply -i host.yml
Added host 'node-10' with addr '10.10.10.10'
Deploy the agent:
.. code-block:: bash
# ceph config set mgr mgr/cephadm/hw_monitoring true
CLI
---
| **orch** **hardware** **status** [hostname] [--category CATEGORY] [--format plain | json]
Supported categories are:
* summary (default)
* memory
* storage
* processors
* network
* power
* fans
* firmwares
* criticals
Examples
********
hardware health statuses summary
++++++++++++++++++++++++++++++++
.. code-block:: bash
# ceph orch hardware status
+------------+---------+-----+-----+--------+-------+------+
| HOST | STORAGE | CPU | NET | MEMORY | POWER | FANS |
+------------+---------+-----+-----+--------+-------+------+
| node-10 | ok | ok | ok | ok | ok | ok |
+------------+---------+-----+-----+--------+-------+------+
storage devices report
++++++++++++++++++++++
.. code-block:: bash
# ceph orch hardware status IBM-Ceph-1 --category storage
+------------+--------------------------------------------------------+------------------+----------------+----------+----------------+--------+---------+
| HOST | NAME | MODEL | SIZE | PROTOCOL | SN | STATUS | STATE |
+------------+--------------------------------------------------------+------------------+----------------+----------+----------------+--------+---------+
| node-10 | Disk 8 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT99QLL | OK | Enabled |
| node-10 | Disk 10 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT98ZYX | OK | Enabled |
| node-10 | Disk 11 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT98ZWB | OK | Enabled |
| node-10 | Disk 9 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT98ZC9 | OK | Enabled |
| node-10 | Disk 3 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT9903Y | OK | Enabled |
| node-10 | Disk 1 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT9901E | OK | Enabled |
| node-10 | Disk 7 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT98ZQJ | OK | Enabled |
| node-10 | Disk 2 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT99PA2 | OK | Enabled |
| node-10 | Disk 4 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT99PFG | OK | Enabled |
| node-10 | Disk 0 in Backplane 0 of Storage Controller in Slot 2 | MZ7L33T8HBNAAD3 | 3840755981824 | SATA | S6M5NE0T800539 | OK | Enabled |
| node-10 | Disk 1 in Backplane 0 of Storage Controller in Slot 2 | MZ7L33T8HBNAAD3 | 3840755981824 | SATA | S6M5NE0T800554 | OK | Enabled |
| node-10 | Disk 6 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT98ZER | OK | Enabled |
| node-10 | Disk 0 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT98ZEJ | OK | Enabled |
| node-10 | Disk 5 in Backplane 1 of Storage Controller in Slot 2 | ST20000NM008D-3D | 20000588955136 | SATA | ZVT99QMH | OK | Enabled |
| node-10 | Disk 0 on AHCI Controller in SL 6 | MTFDDAV240TDU | 240057409536 | SATA | 22373BB1E0F8 | OK | Enabled |
| node-10 | Disk 1 on AHCI Controller in SL 6 | MTFDDAV240TDU | 240057409536 | SATA | 22373BB1E0D5 | OK | Enabled |
+------------+--------------------------------------------------------+------------------+----------------+----------+----------------+--------+---------+
firmwares details
+++++++++++++++++
.. code-block:: bash
# ceph orch hardware status node-10 --category firmwares
+------------+----------------------------------------------------------------------------+--------------------------------------------------------------+----------------------+-------------+--------+
| HOST | COMPONENT | NAME | DATE | VERSION | STATUS |
+------------+----------------------------------------------------------------------------+--------------------------------------------------------------+----------------------+-------------+--------+
| node-10 | current-107649-7.03__raid.backplane.firmware.0 | Backplane 0 | 2022-12-05T00:00:00Z | 7.03 | OK |
... omitted output ...
| node-10 | previous-25227-6.10.30.20__idrac.embedded.1-1 | Integrated Remote Access Controller | 00:00:00Z | 6.10.30.20 | OK |
+------------+----------------------------------------------------------------------------+--------------------------------------------------------------+----------------------+-------------+--------+
hardware critical warnings report
+++++++++++++++++++++++++++++++++
.. code-block:: bash
# ceph orch hardware status --category criticals
+------------+-----------+------------+----------+-----------------+
| HOST | COMPONENT | NAME | STATUS | STATE |
+------------+-----------+------------+----------+-----------------+
| node-10 | power | PS2 Status | critical | unplugged |
+------------+-----------+------------+----------+-----------------+
Developers
-----------
.. py:currentmodule:: cephadm.agent
.. autoclass:: NodeProxyEndpoint
.. automethod:: NodeProxyEndpoint.__init__
.. automethod:: NodeProxyEndpoint.oob
.. automethod:: NodeProxyEndpoint.data
.. automethod:: NodeProxyEndpoint.fullreport
.. automethod:: NodeProxyEndpoint.summary
.. automethod:: NodeProxyEndpoint.criticals
.. automethod:: NodeProxyEndpoint.memory
.. automethod:: NodeProxyEndpoint.storage
.. automethod:: NodeProxyEndpoint.network
.. automethod:: NodeProxyEndpoint.power
.. automethod:: NodeProxyEndpoint.processors
.. automethod:: NodeProxyEndpoint.fans
.. automethod:: NodeProxyEndpoint.firmwares
.. automethod:: NodeProxyEndpoint.led


@ -118,8 +118,9 @@ about Ceph, see our `Architecture`_ section.
governance
foundation
ceph-volume/index
releases/general
releases/index
Ceph Releases (general) <https://docs.ceph.com/en/latest/releases/general/>
Ceph Releases (index) <https://docs.ceph.com/en/latest/releases/>
security/index
hardware-monitoring/index
Glossary <glossary>
Tracing <jaegertracing/index>


@ -98,59 +98,7 @@ repository.
Updating Submodules
-------------------
#. Determine whether your submodules are out of date:
.. prompt:: bash $
git status
A. If your submodules are up to date
If your submodules are up to date, the following console output will
appear:
::
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
If you see this console output, then your submodules are up to date.
You do not need this procedure.
B. If your submodules are not up to date
If your submodules are not up to date, you will see a message that
includes a list of "untracked files". The example here shows such a
list, which was generated from a real situation in which the
submodules were no longer current. Your list of files will not be the
same as this list of files, but this list is provided as an example.
If in your case any untracked files are listed, then you should
continue to the next step of this procedure.
::
On branch main
Your branch is up to date with 'origin/main'.
Untracked files:
(use "git add <file>..." to include in what will be committed)
src/pybind/cephfs/build/
src/pybind/cephfs/cephfs.c
src/pybind/cephfs/cephfs.egg-info/
src/pybind/rados/build/
src/pybind/rados/rados.c
src/pybind/rados/rados.egg-info/
src/pybind/rbd/build/
src/pybind/rbd/rbd.c
src/pybind/rbd/rbd.egg-info/
src/pybind/rgw/build/
src/pybind/rgw/rgw.c
src/pybind/rgw/rgw.egg-info/
nothing added to commit but untracked files present (use "git add" to track)
#. If your submodules are out of date, run the following commands:
If your submodules are out of date, run the following commands:
.. prompt:: bash $
@ -158,24 +106,10 @@ Updating Submodules
git clean -fdx
git submodule foreach git clean -fdx
If you still have problems with a submodule directory, use ``rm -rf
[directory name]`` to remove the directory. Then run ``git submodule update
--init --recursive`` again.
If you still have problems with a submodule directory, use ``rm -rf [directory
name]`` to remove the directory. Then run ``git submodule update --init
--recursive --progress`` again.
#. Run ``git status`` again:
.. prompt:: bash $
git status
Your submodules are up to date if you see the following message:
::
On branch main
Your branch is up to date with 'origin/main'.
nothing to commit, working tree clean
Choose a Branch
===============

View File

@ -251,6 +251,17 @@ openSUSE Tumbleweed
The newest major release of Ceph is already available through the normal Tumbleweed repositories.
There's no need to add another package repository manually.
openEuler
^^^^^^^^^
Two major versions are available in the normal openEuler repositories: Ceph 12.2.8 in the openEuler-20.03-LTS series and Ceph 16.2.7 in the openEuler-22.03-LTS series. There's no need to add another package repository manually.
You can install Ceph by running the following command:
.. prompt:: bash $
sudo yum -y install ceph
You can also download packages manually from https://repo.openeuler.org/openEuler-{release}/everything/{arch}/Packages/.
Ceph Development Packages
-------------------------

View File

@ -9,9 +9,8 @@ There are multiple ways to install Ceph.
Recommended methods
~~~~~~~~~~~~~~~~~~~
:ref:`Cephadm <cephadm_deploying_new_cluster>` installs and manages a Ceph
cluster that uses containers and systemd and is tightly integrated with the CLI
and dashboard GUI.
:ref:`Cephadm <cephadm_deploying_new_cluster>` is a tool that can be used to
install and manage a Ceph cluster.
* cephadm supports only Octopus and newer releases.
* cephadm is fully integrated with the orchestration API and fully supports the
@ -59,6 +58,8 @@ tool that can be used to quickly deploy clusters. It is deprecated.
`github.com/openstack/puppet-ceph <https://github.com/openstack/puppet-ceph>`_ installs Ceph via Puppet.
`OpenNebula HCI clusters <https://docs.opennebula.io/stable/provision_clusters/hci_clusters/overview.html>`_ deploys Ceph on various cloud platforms.
Ceph can also be :ref:`installed manually <install-manual>`.

View File

@ -461,6 +461,52 @@ In the below instructions, ``{id}`` is an arbitrary name, such as the hostname o
#. Now you are ready to `create a Ceph file system`_.
Manually Installing RADOSGW
===========================
For a more involved discussion of the procedure presented here, see `this
thread on the ceph-users mailing list
<https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/LB3YRIKAPOHXYCW7MKLVUJPYWYRQVARU/>`_.
#. Install ``radosgw`` packages on the nodes that will be the RGW nodes.
#. From a monitor or from a node with admin privileges, run a command of the
following form:
.. prompt:: bash #
ceph auth get-or-create client.short-hostname-of-rgw mon 'allow rw' osd 'allow rwx'
#. On one of the RGW nodes, do the following:
a. Create a ``ceph-user``-owned directory. For example:
.. prompt:: bash #
install -d -o ceph -g ceph /var/lib/ceph/radosgw/ceph-$(hostname -s)
b. Enter the directory just created and create a ``keyring`` file:
.. prompt:: bash #
touch /var/lib/ceph/radosgw/ceph-$(hostname -s)/keyring
Use a command similar to this one to put the key from the earlier ``ceph
auth get-or-create`` step in the ``keyring`` file. Use your preferred
editor:
.. prompt:: bash #
$EDITOR /var/lib/ceph/radosgw/ceph-$(hostname -s)/keyring
c. Repeat these steps on every RGW node.
#. Start the RADOSGW service by running the following command:
.. prompt:: bash #
systemctl start ceph-radosgw@$(hostname -s).service
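#. Optionally, verify that the gateway is answering. The check below assumes
the default frontend port of 7480; if you have changed ``rgw_frontends``,
adjust the port accordingly:
.. prompt:: bash #
curl http://localhost:7480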
Summary
=======

View File

@ -1,5 +1,7 @@
:orphan:
.. _ceph_mds_man:
=========================================
ceph-mds -- ceph metadata server daemon
=========================================

View File

@ -244,45 +244,56 @@ Procedure
Manipulating the Object Map Key
-------------------------------
Use the **ceph-objectstore-tool** utility to change the object map (OMAP) key. You need to provide the data path, the placement group identifier (PG ID), the object, and the key in the OMAP.
Note
Use the **ceph-objectstore-tool** utility to change the object map (OMAP) key.
Provide the data path, the placement group identifier (PG ID), the object, and
the key in the OMAP.
Prerequisites
^^^^^^^^^^^^^
* Having root access to the Ceph OSD node.
* Stopping the ceph-osd daemon.
Procedure
Commands
^^^^^^^^
Get the object map key:
Run the commands in this section as ``root`` on an OSD node.
Syntax::
* **Getting the object map key**
Syntax:
.. code-block:: ini
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT get-omap $KEY > $OBJECT_MAP_FILE_NAME
Example::
[root@osd ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' get-omap "" > zone_info.default.omap.txt
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' get-omap "" > zone_info.default.omap.txt
Set the object map key:
* **Setting the object map key**
Syntax::
Syntax:
.. code-block:: ini
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT set-omap $KEY < $OBJECT_MAP_FILE_NAME
Example::
[root@osd ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' set-omap "" < zone_info.default.omap.txt
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' set-omap "" < zone_info.default.omap.txt
Remove the object map key:
* **Removing the object map key**
Syntax::
Syntax:
.. code-block:: ini
ceph-objectstore-tool --data-path $PATH_TO_OSD --pgid $PG_ID $OBJECT rm-omap $KEY
Example::
[root@osd ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' rm-omap ""
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --pgid 0.1c '{"oid":"zone_info.default","key":"","snapid":-2,"hash":235010478,"max":0,"pool":11,"namespace":""}' rm-omap ""
Listing an Object's Attributes

View File

@ -18,14 +18,16 @@ Synopsis
Description
===========
**ceph-osd** is the object storage daemon for the Ceph distributed file
system. It is responsible for storing objects on a local file system
and providing access to them over the network.
**ceph-osd** is the **o**\bject **s**\torage **d**\aemon for the Ceph
distributed file system. It manages data on local storage with redundancy and
provides access to that data over the network.
The datapath argument should be a directory on a xfs file system
where the object data resides. The journal is optional, and is only
useful performance-wise when it resides on a different disk than
datapath with low latency (ideally, an NVRAM device).
For Filestore-backed clusters, the argument of the ``--osd-data datapath``
option (which is ``datapath`` in this example) should be a directory on an XFS
file system where the object data resides. The journal is optional. The journal
improves performance only when it resides on a different disk than the disk
specified by ``datapath`` . The storage medium on which the journal is stored
should be a low-latency medium (ideally, an SSD device).
Options

View File

@ -56,7 +56,7 @@ Options
.. code:: bash
[build]$ python3 -m venv venv && source venv/bin/activate && pip3 install cmd2
[build]$ python3 -m venv venv && source venv/bin/activate && pip3 install cmd2 colorama
[build]$ source vstart_environment.sh && source venv/bin/activate && python3 ../src/tools/cephfs/shell/cephfs-shell
Commands

View File

@ -199,6 +199,50 @@ Advanced
option is enabled, a namespace operation may complete before the MDS
replies, if it has sufficient capabilities to do so.
:command:`crush_location=x`
Specify the location of the client in terms of CRUSH hierarchy (since 5.8).
This is a set of key-value pairs separated from each other by '|', with
keys separated from values by ':'. Note that '|' may need to be quoted
or escaped to avoid it being interpreted as a pipe by the shell. The key
is the bucket type name (e.g. rack, datacenter or region with default
bucket types) and the value is the bucket name. For example, to indicate
that the client is local to rack "myrack", data center "mydc" and region
"myregion"::
crush_location=rack:myrack|datacenter:mydc|region:myregion
Each key-value pair stands on its own: "myrack" doesn't need to reside in
"mydc", which in turn doesn't need to reside in "myregion". The location
is not a path to the root of the hierarchy but rather a set of nodes that
are matched independently. "Multipath" locations are supported, so it is
possible to indicate locality for multiple parallel hierarchies::
crush_location=rack:myrack1|rack:myrack2|datacenter:mydc
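When this option is passed on the ``mount`` command line, quote the whole
option string so that the shell does not interpret '|' as a pipe. A minimal
sketch (the device string, mount point, and credentials are illustrative and
should be adapted to your cluster)::
mount -t ceph cephuser@.cephfs=/ /mnt/cephfs -o 'crush_location=rack:myrack|datacenter:mydc|region:myregion'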
:command:`read_from_replica=<no|balance|localize>`
- ``no``: Disable replica reads, always pick the primary OSD (since 5.8, default).
- ``balance``: When a replicated pool receives a read request, pick a random
OSD from the PG's acting set to serve it (since 5.8).
This mode is safe for general use only since Octopus (i.e. after "ceph osd
require-osd-release octopus"). Otherwise it should be limited to read-only
workloads such as snapshots.
- ``localize``: When a replicated pool receives a read request, pick the most
local OSD to serve it (since 5.8). The locality metric is calculated against
the location of the client given with crush_location; a match with the
lowest-valued bucket type wins. For example, an OSD in a matching rack
is closer than an OSD in a matching data center, which in turn is closer
than an OSD in a matching region.
This mode is safe for general use only since Octopus (i.e. after "ceph osd
require-osd-release octopus"). Otherwise it should be limited to read-only
workloads such as snapshots.
Examples
========

View File

@ -333,7 +333,7 @@ Commands
be specified.
:command:`flatten` [--encryption-format *encryption-format* --encryption-passphrase-file *passphrase-file*]... *image-spec*
If image is a clone, copy all shared blocks from the parent snapshot and
If the image is a clone, copy all shared blocks from the parent snapshot and
make the child independent of the parent, severing the link between
parent snap and child. The parent snapshot can be unprotected and
deleted if it has no further dependent clones.
@ -390,7 +390,7 @@ Commands
Set a metadata key to the specified value. The key-value pairs are displayed by ``image-meta list``.
:command:`import` [--export-format *format (1 or 2)*] [--image-format *format-id*] [--object-size *size-in-B/K/M*] [--stripe-unit *size-in-B/K/M* --stripe-count *num*] [--image-feature *feature-name*]... [--image-shared] *src-path* [*image-spec*]
Create a new image and imports its data from path (use - for
Create a new image and import its data from path (use - for
stdin). The import operation will try to create sparse rbd images
if possible. For import from stdin, the sparsification unit is
the data block size of the destination image (object size).
@ -402,14 +402,14 @@ Commands
of image, but also the snapshots and other properties, such as image_order, features.
:command:`import-diff` *src-path* *image-spec*
Import an incremental diff of an image and applies it to the current image. If the diff
Import an incremental diff of an image and apply it to the current image. If the diff
was generated relative to a start snapshot, we verify that snapshot already exists before
continuing. If there was an end snapshot we verify it does not already exist before
applying the changes, and create the snapshot when we are done.
:command:`info` *image-spec* | *snap-spec*
Will dump information (such as size and object size) about a specific rbd image.
If image is a clone, information about its parent is also displayed.
If the image is a clone, information about its parent is also displayed.
If a snapshot is specified, whether it is protected is shown as well.
:command:`journal client disconnect` *journal-spec*
@ -472,7 +472,7 @@ Commands
the destination image are lost.
:command:`migration commit` *image-spec*
Commit image migration. This step is run after a successful migration
Commit image migration. This step is run after successful migration
prepare and migration execute steps and removes the source image data.
:command:`migration execute` *image-spec*
@ -499,14 +499,12 @@ Commands
:command:`mirror image disable` [--force] *image-spec*
Disable RBD mirroring for an image. If the mirroring is
configured in ``image`` mode for the image's pool, then it
can be explicitly disabled mirroring for each image within
the pool.
must be disabled for each image individually.
:command:`mirror image enable` *image-spec* *mode*
Enable RBD mirroring for an image. If the mirroring is
configured in ``image`` mode for the image's pool, then it
can be explicitly enabled mirroring for each image within
the pool.
must be enabled for each image individually.
The mirror image mode can either be ``journal`` (default) or
``snapshot``. The ``journal`` mode requires the RBD journaling
@ -523,7 +521,7 @@ Commands
:command:`mirror pool demote` [*pool-name*]
Demote all primary images within a pool to non-primary.
Every mirroring enabled image will demoted in the pool.
Every mirror-enabled image in the pool will be demoted.
:command:`mirror pool disable` [*pool-name*]
Disable RBD mirroring by default within a pool. When mirroring
@ -551,7 +549,7 @@ Commands
The default for *remote client name* is "client.admin".
This requires mirroring mode is enabled.
This requires mirroring to be enabled on the pool.
:command:`mirror pool peer remove` [*pool-name*] *uuid*
Remove a mirroring peer from a pool. The peer uuid is available
@ -564,12 +562,12 @@ Commands
:command:`mirror pool promote` [--force] [*pool-name*]
Promote all non-primary images within a pool to primary.
Every mirroring enabled image will promoted in the pool.
Every mirror-enabled image in the pool will be promoted.
:command:`mirror pool status` [--verbose] [*pool-name*]
Show status for all mirrored images in the pool.
With --verbose, also show additionally output status
details for every mirroring image in the pool.
With ``--verbose``, show additional output status
details for every mirror-enabled image in the pool.
:command:`mirror snapshot schedule add` [-p | --pool *pool*] [--namespace *namespace*] [--image *image*] *interval* [*start-time*]
Add mirror snapshot schedule.
@ -603,7 +601,7 @@ Commands
specified to rebuild an invalid object map for a snapshot.
:command:`pool init` [*pool-name*] [--force]
Initialize pool for use by RBD. Newly created pools must initialized
Initialize pool for use by RBD. Newly created pools must be initialized
prior to use.
:command:`resize` (-s | --size *size-in-M/G/T*) [--allow-shrink] [--encryption-format *encryption-format* --encryption-passphrase-file *passphrase-file*]... *image-spec*
@ -615,7 +613,7 @@ Commands
snapshots, this fails and nothing is deleted.
:command:`snap create` *snap-spec*
Create a new snapshot. Requires the snapshot name parameter specified.
Create a new snapshot. Requires the snapshot name parameter to be specified.
:command:`snap limit clear` *image-spec*
Remove any previously set limit on the number of snapshots allowed on
@ -625,7 +623,7 @@ Commands
Set a limit for the number of snapshots allowed on an image.
:command:`snap ls` *image-spec*
Dump the list of snapshots inside a specific image.
Dump the list of snapshots of a specific image.
:command:`snap protect` *snap-spec*
Protect a snapshot from deletion, so that clones can be made of it
@ -668,9 +666,11 @@ Commands
:command:`trash ls` [*pool-name*]
List all entries from trash.
:command:`trash mv` *image-spec*
:command:`trash mv` [--expires-at <expires-at>] *image-spec*
Move an image to the trash. Images, even ones actively in-use by
clones, can be moved to the trash and deleted at a later time.
clones, can be moved to the trash and deleted at a later time. Use
``--expires-at`` to set the expiration time of an image after which
it's allowed to be removed.
:command:`trash purge` [*pool-name*]
Remove all expired images from trash.
@ -678,10 +678,10 @@ Commands
:command:`trash restore` *image-id*
Restore an image from trash.
:command:`trash rm` *image-id*
Delete an image from trash. If image deferment time has not expired
you can not removed it unless use force. But an actively in-use by clones
or has snapshots can not be removed.
:command:`trash rm` [--force] *image-id*
Delete an image from trash. If the image deferment time has not expired
it can be removed only by using ``--force``. An image that is actively in-use by clones
or has snapshots cannot be removed.
:command:`trash purge schedule add` [-p | --pool *pool*] [--namespace *namespace*] *interval* [*start-time*]
Add trash purge schedule.

View File

@ -568,6 +568,9 @@ If the NFS service is running on a non-standard port number:
.. note:: Only NFS v4.0+ is supported.
.. note:: As of this writing (01 Jan 2024), no version of Microsoft Windows
supports mounting an NFS v4.x export natively.
Troubleshooting
===============

View File

@ -151,3 +151,96 @@ ceph-mgr and check the logs.
With logging set to debug for the manager the module will print various logging
lines prefixed with *mgr[zabbix]* for easy filtering.
Installing zabbix-agent 2
-------------------------
*The procedures that explain the installation of Zabbix 2 were developed by John Jasen.*
Follow the instructions in the sections :ref:`mgr_zabbix_2_nodes`,
:ref:`mgr_zabbix_2_cluster`, and :ref:`mgr_zabbix_2_server` to install a Zabbix
server to monitor your Ceph cluster.
.. _mgr_zabbix_2_nodes:
Ceph MGR Nodes
^^^^^^^^^^^^^^
#. Download an appropriate Zabbix release from https://www.zabbix.com/download
or install a package from the Zabbix repositories.
#. Use your package manager to remove any other Zabbix agents.
#. Install ``zabbix-agent 2`` using the instructions at
https://www.zabbix.com/download.
#. Edit ``/etc/zabbix/zabbix-agent2.conf``. Add your Zabbix monitoring servers
and your localhost to the ``Servers`` line of ``zabbix-agent2.conf``::
Server=127.0.0.1,zabbix2.example.com,zabbix1.example.com
#. Start or restart the ``zabbix-agent2`` agent:
.. prompt:: bash #
systemctl restart zabbix-agent2
.. _mgr_zabbix_2_cluster:
Ceph Cluster
^^^^^^^^^^^^
#. Enable the ``restful`` module:
.. prompt:: bash #
ceph mgr module enable restful
#. Generate a self-signed certificate. This step is optional:
.. prompt:: bash #
restful create-self-signed-cert
#. Create an API user called ``zabbix-monitor``:
.. prompt:: bash #
ceph restful create-key zabbix-monitor
The output of this command, an API key, will look something like this::
a4bb2019-XXXX-YYYY-ZZZZ-abcdefghij
#. Save the generated API key. It will be necessary later.
#. Test API access by using ``zabbix-get``:
.. note:: This step is optional.
.. prompt:: bash #
zabbix_get -s 127.0.0.1 -k ceph.ping["${CEPH.CONNSTRING}","${CEPH.USER}","${CEPH.API.KEY}"]
Example:
.. prompt:: bash #
zabbix_get -s 127.0.0.1 -k ceph.ping["https://localhost:8003","zabbix-monitor","a4bb2019-XXXX-YYYY-ZZZZ-abcdefghij"]
.. note:: You may need to install ``zabbix-get`` via your package manager.
.. _mgr_zabbix_2_server:
Zabbix Server
^^^^^^^^^^^^^
#. Create a host for the Ceph monitoring servers.
#. Add the template ``Ceph by Zabbix agent 2`` to the host.
#. Inform the host of the keys:
#. Go to “Macros” on the host.
#. Show “Inherited and host macros”.
Change ``{$CEPH.API.KEY}`` and ``{$CEPH.USER}`` to the values provided
under ``ceph restful create-key``, above. Example::
{$CEPH.API.KEY} a4bb2019-XXXX-YYYY-ZZZZ-abcdefghij
{$CEPH.USER} zabbix-monitor
#. Update the host. Within a few cycles, data will populate the server.
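If no data appears after several cycles, it can be worth confirming from the
Zabbix server that the ``restful`` module endpoint is reachable. The hostname
and port below are the defaults used earlier in this procedure and may differ
in your environment:
.. prompt:: bash #
curl -k https://ceph-mgr-host.example.com:8003/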

View File

@ -470,5 +470,8 @@ Useful queries
rate(ceph_rbd_read_latency_sum[30s]) / rate(ceph_rbd_read_latency_count[30s]) * on (instance) group_left (ceph_daemon) ceph_rgw_metadata
Hardware monitoring
===================
See :ref:`hardware-monitoring`

View File

@ -1,3 +1,5 @@
.. _librados-intro:
==========================
Introduction to librados
==========================

View File

@ -1,3 +1,5 @@
.. _librados-python:
===================
Librados (Python)
===================

View File

@ -358,7 +358,7 @@ OSD and run the following command:
ceph-bluestore-tool \
--path <data path> \
--sharding="m(3) p(3,0-12) o(3,0-13)=block_cache={type=binned_lru} l p" \
--sharding="m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} l p" \
reshard
.. confval:: bluestore_rocksdb_cf

View File

@ -123,11 +123,10 @@ OSD host, run the following commands:
ssh {osd-host}
sudo mkdir /var/lib/ceph/osd/ceph-{osd-number}
The ``osd_data`` path ought to lead to a mount point that has mounted on it a
device that is distinct from the device that contains the operating system and
the daemons. To use a device distinct from the device that contains the
The ``osd_data`` path must lead to a device that is not shared with the
operating system. To use a device other than the device that contains the
operating system and the daemons, prepare it for use with Ceph and mount it on
the directory you just created by running the following commands:
the directory you just created by running commands of the following form:
.. prompt:: bash $

View File

@ -151,7 +151,7 @@ generates a catalog of all objects in each placement group and compares each
primary object to its replicas, ensuring that no objects are missing or
mismatched. Light scrubbing checks the object size and attributes, and is
usually done daily. Deep scrubbing reads the data and uses checksums to ensure
data integrity, and is usually done weekly. The freqeuncies of both light
data integrity, and is usually done weekly. The frequencies of both light
scrubbing and deep scrubbing are determined by the cluster's configuration,
which is fully under your control and subject to the settings explained below
in this section.

View File

@ -6,12 +6,41 @@
.. index:: pools; configuration
Ceph uses default values to determine how many placement groups (PGs) will be
assigned to each pool. We recommend overriding some of the defaults.
Specifically, we recommend setting a pool's replica size and overriding the
default number of placement groups. You can set these values when running
`pool`_ commands. You can also override the defaults by adding new ones in the
``[global]`` section of your Ceph configuration file.
The number of placement groups that the CRUSH algorithm assigns to each pool is
determined by the values of variables in the centralized configuration database
in the monitor cluster.
Both containerized deployments of Ceph (deployments made using ``cephadm`` or
Rook) and non-containerized deployments of Ceph rely on the values in the
central configuration database in the monitor cluster to assign placement
groups to pools.
Example Commands
----------------
To see the value of the variable that governs the number of placement groups in a given pool, run a command of the following form:
.. prompt:: bash
ceph config get osd osd_pool_default_pg_num
To set the value of the variable that governs the number of placement groups in a given pool, run a command of the following form:
.. prompt:: bash
ceph config set osd osd_pool_default_pg_num <value>
Manual Tuning
-------------
In some cases, it might be advisable to override some of the defaults. For
example, you might determine that it is wise to set a pool's replica size and
to override the default number of placement groups in the pool. You can set
these values when running `pool`_ commands.
See Also
--------
See :ref:`pg-autoscaler`.
.. literalinclude:: pool-pg.conf

View File

@ -344,12 +344,13 @@ addresses, repeat this process.
Changing a Monitor's IP address (Advanced Method)
-------------------------------------------------
There are cases in which the method outlined in :ref"`<Changing a Monitor's IP
Address (Preferred Method)> operations_add_or_rm_mons_changing_mon_ip` cannot
be used. For example, it might be necessary to move the cluster's monitors to a
different network, to a different part of the datacenter, or to a different
datacenter altogether. It is still possible to change the monitors' IP
addresses, but a different method must be used.
There are cases in which the method outlined in
:ref:`operations_add_or_rm_mons_changing_mon_ip` cannot be used. For example,
it might be necessary to move the cluster's monitors to a different network, to
a different part of the datacenter, or to a different datacenter altogether. It
is still possible to change the monitors' IP addresses, but a different method
must be used.
For such cases, a new monitor map with updated IP addresses for every monitor
in the cluster must be generated and injected on each monitor. Although this
@ -357,11 +358,11 @@ method is not particularly easy, such a major migration is unlikely to be a
routine task. As stated at the beginning of this section, existing monitors are
not supposed to change their IP addresses.
Continue with the monitor configuration in the example from :ref"`<Changing a
Monitor's IP Address (Preferred Method)>
operations_add_or_rm_mons_changing_mon_ip` . Suppose that all of the monitors
are to be moved from the ``10.0.0.x`` range to the ``10.1.0.x`` range, and that
these networks are unable to communicate. Carry out the following procedure:
Continue with the monitor configuration in the example from
:ref:`operations_add_or_rm_mons_changing_mon_ip`. Suppose that all of the
monitors are to be moved from the ``10.0.0.x`` range to the ``10.1.0.x`` range,
and that these networks are unable to communicate. Carry out the following
procedure:
#. Retrieve the monitor map (``{tmp}`` is the path to the retrieved monitor
map, and ``{filename}`` is the name of the file that contains the retrieved
@ -448,7 +449,135 @@ and inject the modified monitor map into each new monitor.
Migration to the new location is now complete. The monitors should operate
successfully.
Using cephadm to change the public network
==========================================
Overview
--------
The procedure in this overview section provides only the broad outlines of
using ``cephadm`` to change the public network.
#. Create backups of all keyrings, configuration files, and the current monmap.
#. Stop the cluster and disable ``ceph.target`` to prevent the daemons from
starting.
#. Move the servers and power them on.
#. Change the network setup as desired.
Example Procedure
-----------------
.. note:: In this procedure, the "old network" has addresses of the form
``10.10.10.0/24`` and the "new network" has addresses of the form
``192.168.160.0/24``.
#. Enter the shell of the first monitor:
.. prompt:: bash #
cephadm shell --name mon.reef1
#. Extract the current monmap from ``mon.reef1``:
.. prompt:: bash #
ceph-mon -i reef1 --extract-monmap monmap
#. Print the content of the monmap:
.. prompt:: bash #
monmaptool --print monmap
::
monmaptool: monmap file monmap
epoch 5
fsid 2851404a-d09a-11ee-9aaa-fa163e2de51a
last_changed 2024-02-21T09:32:18.292040+0000
created 2024-02-21T09:18:27.136371+0000
min_mon_release 18 (reef)
election_strategy: 1
0: [v2:10.10.10.11:3300/0,v1:10.10.10.11:6789/0] mon.reef1
1: [v2:10.10.10.12:3300/0,v1:10.10.10.12:6789/0] mon.reef2
2: [v2:10.10.10.13:3300/0,v1:10.10.10.13:6789/0] mon.reef3
#. Remove monitors with old addresses:
.. prompt:: bash #
monmaptool --rm reef1 --rm reef2 --rm reef3 monmap
#. Add monitors with new addresses:
.. prompt:: bash #
monmaptool --addv reef1 [v2:192.168.160.11:3300/0,v1:192.168.160.11:6789/0] --addv reef2 [v2:192.168.160.12:3300/0,v1:192.168.160.12:6789/0] --addv reef3 [v2:192.168.160.13:3300/0,v1:192.168.160.13:6789/0] monmap
#. Verify that the changes to the monmap have been made successfully:
.. prompt:: bash #
monmaptool --print monmap
::
monmaptool: monmap file monmap
epoch 4
fsid 2851404a-d09a-11ee-9aaa-fa163e2de51a
last_changed 2024-02-21T09:32:18.292040+0000
created 2024-02-21T09:18:27.136371+0000
min_mon_release 18 (reef)
election_strategy: 1
0: [v2:192.168.160.11:3300/0,v1:192.168.160.11:6789/0] mon.reef1
1: [v2:192.168.160.12:3300/0,v1:192.168.160.12:6789/0] mon.reef2
2: [v2:192.168.160.13:3300/0,v1:192.168.160.13:6789/0] mon.reef3
#. Inject the new monmap into the Ceph cluster:
.. prompt:: bash #
ceph-mon -i reef1 --inject-monmap monmap
#. Repeat the steps above for all other monitors in the cluster.
#. Update ``/var/lib/ceph/{FSID}/mon.{MON}/config``.
#. Start the monitors.
#. Update the ceph ``public_network``:
.. prompt:: bash #
ceph config set mon public_network 192.168.160.0/24
#. Update the configuration files of the managers
(``/var/lib/ceph/{FSID}/mgr.{mgr}/config``) and start them. Orchestrator
will now be available, but it will attempt to connect to the old network
because the host list contains the old addresses.
#. Update the host addresses by running commands of the following form:
.. prompt:: bash #
ceph orch host set-addr reef1 192.168.160.11
ceph orch host set-addr reef2 192.168.160.12
ceph orch host set-addr reef3 192.168.160.13
#. Wait a few minutes for the orchestrator to connect to each host.
#. Reconfigure the OSDs so that their config files are automatically updated:
.. prompt:: bash #
ceph orch reconfig osd
*The above procedure was developed by Eugen Block and was successfully tested
in February 2024 on Ceph version 18.2.1 (Reef).*
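Once the orchestrator has reconnected to the hosts, the following read-only
commands can be used to confirm that the monitor map, the host list, and the
``public_network`` setting all show the new addresses:
.. prompt:: bash #
ceph mon dump
ceph orch host ls
ceph config get mon public_network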
.. _Manual Deployment: ../../../install/manual-deployment
.. _Monitor Bootstrap: ../../../dev/mon-bootstrap

View File

@ -474,27 +474,25 @@ following command:
ceph tell mds.{mds-id} config set {setting} {value}
Example:
Example: to enable debug messages, run the following command:
.. prompt:: bash $
ceph tell mds.0 config set debug_ms 1
To enable debug messages, run the following command:
To display the status of all metadata servers, run the following command:
.. prompt:: bash $
ceph mds stat
To display the status of all metadata servers, run the following command:
To mark the active metadata server as failed (and to trigger failover to a
standby if a standby is present), run the following command:
.. prompt:: bash $
ceph mds fail 0
To mark the active metadata server as failed (and to trigger failover to a
standby if a standby is present), run the following command:
.. todo:: ``ceph mds`` subcommands missing docs: set, dump, getmap, stop, setmap

View File

@ -57,53 +57,62 @@ case for most clusters), its CRUSH location can be specified as follows::
``pod``, ``pdu``, ``rack``, ``chassis``, and ``host``. These defined
types suffice for nearly all clusters, but can be customized by
modifying the CRUSH map.
#. Not all keys need to be specified. For example, by default, Ceph
automatically sets an ``OSD``'s location as ``root=default
host=HOSTNAME`` (as determined by the output of ``hostname -s``).
The CRUSH location for an OSD can be modified by adding the ``crush location``
option in ``ceph.conf``. When this option has been added, every time the OSD
The CRUSH location for an OSD can be set by adding the ``crush_location``
option in ``ceph.conf``. For example::
crush_location = root=default row=a rack=a2 chassis=a2a host=a2a1
When this option has been added, every time the OSD
starts it verifies that it is in the correct location in the CRUSH map and
moves itself if it is not. To disable this automatic CRUSH map management, add
the following to the ``ceph.conf`` configuration file in the ``[osd]``
section::
osd crush update on start = false
osd_crush_update_on_start = false
Note that this action is unnecessary in most cases.
If the ``crush_location`` is not set explicitly,
a default of ``root=default host=HOSTNAME`` is used for ``OSD``s,
where the hostname is determined by the output of the ``hostname -s`` command.
.. note:: If you switch from this default to an explicitly set ``crush_location``,
do not forget to include ``root=default`` because existing CRUSH rules refer to it.
Custom location hooks
---------------------
A custom location hook can be used to generate a more complete CRUSH location
on startup. The CRUSH location is determined by, in order of preference:
A custom location hook can be used to generate a more complete CRUSH location
on startup.
#. A ``crush location`` option in ``ceph.conf``
#. A default of ``root=default host=HOSTNAME`` where the hostname is determined
by the output of the ``hostname -s`` command
This is useful when some location fields are not known at the time
``ceph.conf`` is written (for example, fields ``rack`` or ``datacenter``
when deploying a single configuration across multiple datacenters).
A script can be written to provide additional location fields (for example,
``rack`` or ``datacenter``) and the hook can be enabled via the following
config option::
If configured, executed, and parsed successfully, the hook's output replaces
any previously set CRUSH location.
crush location hook = /path/to/customized-ceph-crush-location
The hook can be enabled in ``ceph.conf`` by providing a path to an
executable file (often a script), example::
crush_location_hook = /path/to/customized-ceph-crush-location
This hook is passed several arguments (see below). The hook outputs a single
line to ``stdout`` that contains the CRUSH location description. The output
resembles the following:::
line to ``stdout`` that contains the CRUSH location description. The arguments
resemble the following::
--cluster CLUSTER --id ID --type TYPE
Here the cluster name is typically ``ceph``, the ``id`` is the daemon
identifier or (in the case of OSDs) the OSD number, and the daemon type is
``osd``, ``mds, ``mgr``, or ``mon``.
``osd``, ``mds``, ``mgr``, or ``mon``.
For example, a simple hook that specifies a rack location via a value in the
file ``/etc/rack`` might be as follows::
file ``/etc/rack`` (assuming it contains no spaces) might be as follows::
#!/bin/sh
echo "host=$(hostname -s) rack=$(cat /etc/rack) root=default"
echo "root=default rack=$(cat /etc/rack) host=$(hostname -s)"
CRUSH structure

View File

@ -96,7 +96,9 @@ Where:
``--force``
:Description: Override an existing profile by the same name, and allow
setting a non-4K-aligned stripe_unit.
setting a non-4K-aligned stripe_unit. Overriding an existing
profile can be dangerous, and thus ``--yes-i-really-mean-it``
must be used as well.
:Type: String
:Required: No.

View File

@ -179,6 +179,8 @@ This can be enabled only on a pool residing on BlueStore OSDs, since
BlueStore's checksumming is used during deep scrubs to detect bitrot
or other corruption. Using Filestore with EC overwrites is not only
unsafe, but it also results in lower performance compared to BlueStore.
Moreover, Filestore is deprecated and any Filestore OSDs in your cluster
should be migrated to BlueStore.
Erasure-coded pools do not support omap, so to use them with RBD and
CephFS you must instruct them to store their data in an EC pool and
@ -192,6 +194,182 @@ erasure-coded pool as the ``--data-pool`` during image creation:
For CephFS, an erasure-coded pool can be set as the default data pool during
file system creation or via `file layouts <../../../cephfs/file-layouts>`_.
Erasure-coded pool overhead
---------------------------
The overhead factor (space amplification) of an erasure-coded pool
is `(k+m) / k`. For a 4,2 profile, the overhead is
thus 1.5, which means that 1.5 GiB of underlying storage are used to store
1 GiB of user data. Contrast with default three-way replication, with
which the overhead factor is 3.0. Do not mistake erasure coding for a free
lunch: there is a significant performance tradeoff, especially when using HDDs
and when performing cluster recovery or backfill.
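To make the arithmetic concrete: with `k=4, m=2`, each object is split into
four data chunks plus two coding chunks, any four of which suffice to
reconstruct the data, so storing 100 GiB of user data consumes roughly 150 GiB
of raw capacity. A minimal sketch of creating such a profile and a pool that
uses it follows; the profile name, pool name, and PG count are illustrative,
and the failure domain should match your cluster topology:
.. prompt:: bash $
ceph osd erasure-code-profile set ec-4-2 k=4 m=2 crush-failure-domain=host
ceph osd pool create ecpool 32 32 erasure ec-4-2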
Below is a table showing the overhead factors for various values of `k` and `m`.
As `m` increases above 2, the incremental capacity overhead gain quickly
experiences diminishing returns but the performance impact grows proportionally.
We recommend that you do not choose a profile with `k` > 4 or `m` > 2 until
and unless you fully understand the ramifications, including the number of
failure domains your cluster topology must contain. If you choose `m=1`,
expect data unavailability during maintenance and data loss if component
failures overlap.
.. list-table:: Erasure coding overhead
:widths: 4 4 4 4 4 4 4 4 4 4 4 4
:header-rows: 1
:stub-columns: 1
* -
- m=1
- m=2
- m=3
- m=4
- m=5
- m=6
- m=7
- m=8
- m=9
- m=10
- m=11
* - k=1
- 2.00
- 3.00
- 4.00
- 5.00
- 6.00
- 7.00
- 8.00
- 9.00
- 10.00
- 11.00
- 12.00
* - k=2
- 1.50
- 2.00
- 2.50
- 3.00
- 3.50
- 4.00
- 4.50
- 5.00
- 5.50
- 6.00
- 6.50
* - k=3
- 1.33
- 1.67
- 2.00
- 2.33
- 2.67
- 3.00
- 3.33
- 3.67
- 4.00
- 4.33
- 4.67
* - k=4
- 1.25
- 1.50
- 1.75
- 2.00
- 2.25
- 2.50
- 2.75
- 3.00
- 3.25
- 3.50
- 3.75
* - k=5
- 1.20
- 1.40
- 1.60
- 1.80
- 2.00
- 2.20
- 2.40
- 2.60
- 2.80
- 3.00
- 3.20
* - k=6
- 1.16
- 1.33
- 1.50
- 1.66
- 1.83
- 2.00
- 2.17
- 2.33
- 2.50
- 2.66
- 2.83
* - k=7
- 1.14
- 1.29
- 1.43
- 1.58
- 1.71
- 1.86
- 2.00
- 2.14
- 2.29
- 2.43
- 2.58
* - k=8
- 1.13
- 1.25
- 1.38
- 1.50
- 1.63
- 1.75
- 1.88
- 2.00
- 2.13
- 2.25
- 2.38
* - k=9
- 1.11
- 1.22
- 1.33
- 1.44
- 1.56
- 1.67
- 1.78
- 1.88
- 2.00
- 2.11
- 2.22
* - k=10
- 1.10
- 1.20
- 1.30
- 1.40
- 1.50
- 1.60
- 1.70
- 1.80
- 1.90
- 2.00
- 2.10
* - k=11
- 1.09
- 1.18
- 1.27
- 1.36
- 1.45
- 1.54
- 1.63
- 1.72
- 1.82
- 1.91
- 2.00
Erasure-coded pools and cache tiering
-------------------------------------

View File

@ -21,6 +21,7 @@ and, monitoring an operating cluster.
monitoring-osd-pg
user-management
pg-repair
pgcalc/index
.. raw:: html

View File

@ -517,6 +517,8 @@ multiple monitors are running to ensure proper functioning of your Ceph
cluster. Check monitor status regularly in order to ensure that all of the
monitors are running.
.. _display-mon-map:
To display the monitor map, run the following command:
.. prompt:: bash $

View File

@ -0,0 +1,68 @@
.. _pgcalc:
=======
PG Calc
=======
.. raw:: html
<link rel="stylesheet" id="wp-job-manager-job-listings-css" href="https://web.archive.org/web/20230614135557cs_/https://old.ceph.com/wp-content/plugins/wp-job-manager/assets/dist/css/job-listings.css" type="text/css" media="all"/>
<link rel="stylesheet" id="ceph/googlefont-css" href="https://web.archive.org/web/20230614135557cs_/https://fonts.googleapis.com/css?family=Raleway%3A300%2C400%2C700&amp;ver=5.7.2" type="text/css" media="all"/>
<link rel="stylesheet" id="Stylesheet-css" href="https://web.archive.org/web/20230614135557cs_/https://old.ceph.com/wp-content/themes/cephTheme/Resources/Styles/style.min.css" type="text/css" media="all"/>
<link rel="stylesheet" id="tablepress-default-css" href="https://web.archive.org/web/20230614135557cs_/https://old.ceph.com/wp-content/plugins/tablepress/css/default.min.css" type="text/css" media="all"/>
<link rel="stylesheet" id="jetpack_css-css" href="https://web.archive.org/web/20230614135557cs_/https://old.ceph.com/wp-content/plugins/jetpack/css/jetpack.css" type="text/css" media="all"/>
<script type="text/javascript" src="https://web.archive.org/web/20230614135557js_/https://old.ceph.com/wp-content/themes/cephTheme/foundation_framework/js/vendor/jquery.js" id="jquery-js"></script>
<link rel="stylesheet" href="https://web.archive.org/web/20230614135557cs_/https://ajax.googleapis.com/ajax/libs/jqueryui/1.11.2/themes/smoothness/jquery-ui.css"/>
<link rel="stylesheet" href="https://web.archive.org/web/20230614135557cs_/https://old.ceph.com/pgcalc_assets/pgcalc.css"/>
<script src="https://ajax.googleapis.com/ajax/libs/jqueryui/1.11.2/jquery-ui.min.js"></script>
<script src="../../../_static/js/pgcalc.js"></script>
<div id="pgcalcdiv">
<div id="instructions">
<h2>Ceph PGs per Pool Calculator</h2><br/><fieldset><legend>Instructions</legend>
<ol>
<li>Confirm your understanding of the fields by reading through the Key below.</li>
<li>Select a <b>"Ceph Use Case"</b> from the drop down menu.</li>
<li>Adjust the values in the <span class="inputColor addBorder" style="font-weight: bold;">"Green"</span> shaded fields below.<br/>
<b>Tip:</b> Headers can be clicked to change the value throughout the table.</li>
<li>You will see the Suggested PG Count update based on your inputs.</li>
<li>Click the <b>"Add Pool"</b> button to create a new line for a new pool.</li>
<li>Click the <span class="ui-icon ui-icon-trash" style="display:inline-block;"></span> icon to delete the specific Pool.</li>
<li>For more details on the logic used and some important details, see the area below the table.</li>
<li>Once all values have been adjusted, click the <b>"Generate Commands"</b> button to get the pool creation commands.</li>
</ol></fieldset>
</div>
<div id="beforeTable"></div>
<br/>
<p class="validateTips">&nbsp;</p>
<label for="presetType">Ceph Use Case Selector:</label><br/><select id="presetType"></select><button style="margin-left: 200px;" id="btnAddPool" type="button">Add Pool</button><button type="button" id="btnGenCommands" download="commands.txt">Generate Commands</button>
<div id="pgsPerPoolTable">
<table id="pgsperpool">
</table>
</div> <!-- id = pgsPerPoolTable -->
<br/>
<div id="afterTable"></div>
<div id="countLogic"><fieldset><legend>Logic behind Suggested PG Count</legend>
<br/>
<div class="upperFormula">( Target PGs per OSD ) x ( OSD # ) x ( %Data )</div>
<div class="lowerFormula">( Size )</div>
<ol id="countLogicList">
<li>If the value of the above calculation is less than the value of <b>( OSD# ) / ( Size )</b>, then the value is updated to the value of <b>( OSD# ) / ( Size )</b>. This is to ensure even load / data distribution by allocating at least one Primary or Secondary PG to every OSD for every Pool.</li>
<li>The output value is then rounded to the <b>nearest power of 2</b>.<br/><b>Tip:</b> The nearest power of 2 provides a marginal improvement in efficiency of the <a href="https://web.archive.org/web/20230614135557/http://ceph.com/docs/master/rados/operations/crush-map/" title="CRUSH Map Details">CRUSH</a> algorithm.</li>
<li>If the nearest power of 2 is more than <b>25%</b> below the original value, the next higher power of 2 is used.</li>
</ol>
<b>Objective</b>
<ul><li>The objective of this calculation and the target ranges noted in the &quot;Key&quot; section above are to ensure that there are sufficient Placement Groups for even data distribution throughout the cluster, while not going high enough on the PG per OSD ratio to cause problems during Recovery and/or Backfill operations.</li></ul>
<b>Effects of empty or non-active pools:</b>
<ul>
<li>Empty or otherwise non-active pools should not be considered helpful toward even data distribution throughout the cluster.</li>
<li>However, the PGs associated with these empty / non-active pools still consume memory and CPU overhead.</li>
</ul>
</fieldset>
</div>
<div id="commands" title="Pool Creation Commands"><code><pre id="commandCode"></pre></code></div>
</div>

View File

@ -4,6 +4,21 @@
Placement Groups
==================
Placement groups (PGs) are subsets of each logical Ceph pool. Placement groups
perform the function of placing objects (as a group) into OSDs. Ceph manages
data internally at placement-group granularity: this scales better than would
managing individual RADOS objects. A cluster that has a larger number of
placement groups (for example, 150 per OSD) is better balanced than an
otherwise identical cluster with a smaller number of placement groups.
Ceph's internal RADOS objects are each mapped to a specific placement group,
and each placement group belongs to exactly one Ceph pool.
See Sage Weil's blog post `New in Nautilus: PG merging and autotuning
<https://ceph.io/en/news/blog/2019/new-in-nautilus-pg-merging-and-autotuning/>`_
for more information about the relationship of placement groups to pools and to
objects.
.. _pg-autoscaler:
Autoscaling placement groups
@ -131,11 +146,11 @@ The output will resemble the following::
if a ``pg_num`` change is in progress, the current number of PGs that the
pool is working towards.
- **NEW PG_NUM** (if present) is the value that the system is recommending the
``pg_num`` of the pool to be changed to. It is always a power of 2, and it is
present only if the recommended value varies from the current value by more
than the default factor of ``3``. To adjust this factor (in the following
example, it is changed to ``2``), run the following command:
- **NEW PG_NUM** (if present) is the value that the system recommends that the
``pg_num`` of the pool should be. It is always a power of two, and it
is present only if the recommended value varies from the current value by
more than the default factor of ``3``. To adjust this multiple (in the
following example, it is changed to ``2``), run the following command:
.. prompt:: bash #
@ -168,7 +183,6 @@ The output will resemble the following::
.. prompt:: bash #
ceph osd pool set .mgr crush_rule replicated-ssd
ceph osd pool set pool 1 crush_rule to replicated-ssd
This intervention will result in a small amount of backfill, but
typically this traffic completes quickly.
@ -626,15 +640,14 @@ pools, each with 512 PGs on 10 OSDs, the OSDs will have to handle ~50,000 PGs
each. This cluster will require significantly more resources and significantly
more time for peering.
For determining the optimal number of PGs per OSD, we recommend the `PGCalc`_
tool.
.. _setting the number of placement groups:
Setting the Number of PGs
=========================
:ref:`Placement Group Link <pgcalc>`
Setting the initial number of PGs in a pool must be done at the time you create
the pool. See `Create a Pool`_ for details.
@ -894,4 +907,3 @@ about it entirely (if it is too new to have a previous version). To mark the
.. _Create a Pool: ../pools#createpool
.. _Mapping PGs to OSDs: ../../../architecture#mapping-pgs-to-osds
.. _pgcalc: https://old.ceph.com/pgcalc/

View File

@ -18,15 +18,17 @@ Pools provide:
<../erasure-code>`_, resilience is defined as the number of coding chunks
(for example, ``m = 2`` in the default **erasure code profile**).
- **Placement Groups**: You can set the number of placement groups (PGs) for
the pool. In a typical configuration, the target number of PGs is
approximately one hundred PGs per OSD. This provides reasonable balancing
without consuming excessive computing resources. When setting up multiple
pools, be careful to set an appropriate number of PGs for each pool and for
the cluster as a whole. Each PG belongs to a specific pool: when multiple
pools use the same OSDs, make sure that the **sum** of PG replicas per OSD is
in the desired PG-per-OSD target range. To calculate an appropriate number of
PGs for your pools, use the `pgcalc`_ tool.
- **Placement Groups**: The :ref:`autoscaler <pg-autoscaler>` sets the number
of placement groups (PGs) for the pool. In a typical configuration, the
target number of PGs is approximately one hundred and fifty PGs per OSD. This
provides reasonable balancing without consuming excessive computing
resources. When setting up multiple pools, set an appropriate number of PGs
for each pool and for the cluster as a whole. Each PG belongs to a specific
pool: when multiple pools use the same OSDs, make sure that the **sum** of PG
replicas per OSD is in the desired PG-per-OSD target range. See :ref:`Setting
the Number of Placement Groups <setting the number of placement groups>` for
instructions on how to manually set the number of placement groups per pool
(this procedure works only when the autoscaler is not used).
- **CRUSH Rules**: When data is stored in a pool, the placement of the object
and its replicas (or chunks, in the case of erasure-coded pools) in your
@ -94,19 +96,12 @@ To get even more information, you can execute this command with the ``--format``
Creating a Pool
===============
Before creating a pool, consult `Pool, PG and CRUSH Config Reference`_. Your
Ceph configuration file contains a setting (namely, ``pg_num``) that determines
the number of PGs. However, this setting's default value is NOT appropriate
for most systems. In most cases, you should override this default value when
creating your pool. For details on PG numbers, see `setting the number of
placement groups`_
For example:
.. prompt:: bash $
osd_pool_default_pg_num = 128
osd_pool_default_pgp_num = 128
Before creating a pool, consult `Pool, PG and CRUSH Config Reference`_. The
Ceph central configuration database in the monitor cluster contains a setting
(namely, ``pg_num``) that determines the number of PGs per pool when a pool has
been created and no per-pool value has been specified. It is possible to change
this value from its default. For more on the subject of setting the number of
PGs per pool, see `setting the number of placement groups`_.
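For example, the cluster-wide default can be inspected, and changed if desired,
before any pools are created. The value shown here is illustrative only:
.. prompt:: bash $
ceph config get osd osd_pool_default_pg_num
ceph config set osd osd_pool_default_pg_num 128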
.. note:: In Luminous and later releases, each pool must be associated with the
application that will be using the pool. For more information, see
@ -742,8 +737,6 @@ Managing pools that are flagged with ``--bulk``
===============================================
See :ref:`managing_bulk_flagged_pools`.
.. _pgcalc: https://old.ceph.com/pgcalc/
.. _Pool, PG and CRUSH Config Reference: ../../configuration/pool-pg-config-ref
.. _Bloom Filter: https://en.wikipedia.org/wiki/Bloom_filter
.. _setting the number of placement groups: ../placement-groups#set-the-number-of-placement-groups

View File

@ -121,8 +121,6 @@ your CRUSH map. This procedure shows how to do this.
rule stretch_rule {
id 1
min_size 1
max_size 10
type replicated
step take site1
step chooseleaf firstn 2 type host
@ -141,11 +139,15 @@ your CRUSH map. This procedure shows how to do this.
#. Run the monitors in connectivity mode. See `Changing Monitor Elections`_.
.. prompt:: bash $
ceph mon set election_strategy connectivity
#. Command the cluster to enter stretch mode. In this example, ``mon.e`` is the
tiebreaker monitor and we are splitting across data centers. The tiebreaker
monitor must be assigned a data center that is neither ``site1`` nor
``site2``. For this purpose you can create another data-center bucket named
``site3`` in your CRUSH and place ``mon.e`` there:
``site2``. This data center **should not** be defined in your CRUSH map, here
we are placing ``mon.e`` in a virtual data center called ``site3``:
.. prompt:: bash $

View File

@ -175,17 +175,19 @@ For each subsystem, there is a logging level for its output logs (a so-called
"log level") and a logging level for its in-memory logs (a so-called "memory
level"). Different values may be set for these two logging levels in each
subsystem. Ceph's logging levels operate on a scale of ``1`` to ``20``, where
``1`` is terse and ``20`` is verbose [#f1]_. As a general rule, the in-memory
logs are not sent to the output log unless one or more of the following
conditions obtain:
``1`` is terse and ``20`` is verbose. In certain rare cases, there are logging
levels that can take a value greater than 20. The resulting logs are extremely
verbose.
- a fatal signal is raised or
- an ``assert`` in source code is triggered or
- upon requested. Please consult `document on admin socket
<http://docs.ceph.com/en/latest/man/8/ceph/#daemon>`_ for more details.
The in-memory logs are not sent to the output log unless one or more of the
following conditions are true:
.. warning ::
.. [#f1] In certain rare cases, there are logging levels that can take a value greater than 20. The resulting logs are extremely verbose.
- a fatal signal has been raised or
- an assertion within Ceph code has been triggered or
- the sending of in-memory logs to the output log has been manually triggered.
Consult `the portion of the Ceph Administration Tool documentation
that provides an example of how to submit admin socket commands
<http://docs.ceph.com/en/latest/man/8/ceph/#daemon>`_ for more detail.
Log levels and memory levels can be set either together or separately. If a
subsystem is assigned a single value, then that value determines both the log

View File

@ -85,23 +85,27 @@ Using the monitor's admin socket
================================
A monitor's admin socket allows you to interact directly with a specific daemon
by using a Unix socket file. This file is found in the monitor's ``run``
directory. The admin socket's default directory is
``/var/run/ceph/ceph-mon.ID.asok``, but this can be overridden and the admin
socket might be elsewhere, especially if your cluster's daemons are deployed in
containers. If you cannot find it, either check your ``ceph.conf`` for an
alternative path or run the following command:
by using a Unix socket file. This socket file is found in the monitor's ``run``
directory.
The admin socket's default directory is ``/var/run/ceph/ceph-mon.ID.asok``. It
is possible to override the admin socket's default location. If the default
location has been overridden, then the admin socket will be elsewhere. This is
often the case when a cluster's daemons are deployed in containers.
To find the directory of the admin socket, check either your ``ceph.conf`` for
an alternative path or run the following command:
.. prompt:: bash $
ceph-conf --name mon.ID --show-config-value admin_socket
The admin socket is available for use only when the monitor daemon is running.
Whenever the monitor has been properly shut down, the admin socket is removed.
However, if the monitor is not running and the admin socket persists, it is
likely that the monitor has been improperly shut down. In any case, if the
monitor is not running, it will be impossible to use the admin socket, and the
``ceph`` command is likely to return ``Error 111: Connection Refused``.
The admin socket is available for use only when the Monitor daemon is running.
Every time the Monitor is properly shut down, the admin socket is removed. If
the Monitor is not running and yet the admin socket persists, it is likely that
the Monitor has been improperly shut down. If the Monitor is not running, it
will be impossible to use the admin socket, and the ``ceph`` command is likely
to return ``Error 111: Connection Refused``.
To access the admin socket, run a ``ceph tell`` command of the following form
(specifying the daemon that you are interested in):
@ -110,7 +114,7 @@ To access the admin socket, run a ``ceph tell`` command of the following form
ceph tell mon.<id> mon_status
This command passes a ``help`` command to the specific running monitor daemon
This command passes a ``help`` command to the specified running Monitor daemon
``<id>`` via its admin socket. If you know the full path to the admin socket
file, this can be done more directly by running the following command:
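For reference, the direct form looks something like the following. The socket
path shown is the default for a monitor named ``a`` and may differ, especially
when daemons run in containers:
.. prompt:: bash $
ceph --admin-daemon /var/run/ceph/ceph-mon.a.asok mon_status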
@ -127,10 +131,11 @@ and ``quorum_status``.
Understanding mon_status
========================
The status of the monitor (as reported by the ``ceph tell mon.X mon_status``
command) can always be obtained via the admin socket. This command outputs a
great deal of information about the monitor (including the information found in
the output of the ``quorum_status`` command).
The status of a Monitor (as reported by the ``ceph tell mon.X mon_status``
command) can be obtained via the admin socket. The ``ceph tell mon.X
mon_status`` command outputs a great deal of information about the monitor
(including the information found in the output of the ``quorum_status``
command).
To understand this command's output, let us consider the following example, in
which we see the output of ``ceph tell mon.c mon_status``::
@ -160,29 +165,34 @@ which we see the output of ``ceph tell mon.c mon_status``::
"name": "c",
"addr": "127.0.0.1:6795\/0"}]}}
It is clear that there are three monitors in the monmap (*a*, *b*, and *c*),
the quorum is formed by only two monitors, and *c* is in the quorum as a
*peon*.
This output reports that there are three monitors in the monmap (*a*, *b*, and
*c*), that quorum is formed by only two monitors, and that *c* is in quorum as
a *peon*.
**Which monitor is out of the quorum?**
**Which monitor is out of quorum?**
The answer is **a** (that is, ``mon.a``).
The answer is **a** (that is, ``mon.a``). ``mon.a`` is out of quorum.
**Why?**
**How do we know, in this example, that mon.a is out of quorum?**
When the ``quorum`` set is examined, there are clearly two monitors in the
set: *1* and *2*. But these are not monitor names. They are monitor ranks, as
established in the current ``monmap``. The ``quorum`` set does not include
the monitor that has rank 0, and according to the ``monmap`` that monitor is
``mon.a``.
We know that ``mon.a`` is out of quorum because its rank, 0, does not appear
in the ``quorum`` set reported by ``mon_status``.
If we examine the ``quorum`` set, we can see that there are clearly two
monitors in the set: *1* and *2*. But these are not monitor names. They are
monitor ranks, as established in the current ``monmap``. The ``quorum`` set
does not include the monitor that has rank 0, and according to the ``monmap``
that monitor is ``mon.a``.
**How are monitor ranks determined?**
Monitor ranks are calculated (or recalculated) whenever monitors are added or
removed. The calculation of ranks follows a simple rule: the **greater** the
``IP:PORT`` combination, the **lower** the rank. In this case, because
``127.0.0.1:6789`` is lower than the other two ``IP:PORT`` combinations,
``mon.a`` has the highest rank: namely, rank 0.
Monitor ranks are calculated (or recalculated) whenever monitors are added to
or removed from the cluster. The calculation of ranks follows a simple rule:
the **greater** the ``IP:PORT`` combination, the **lower** the rank. In this
case, because ``127.0.0.1:6789`` (``mon.a``) is numerically less than the
other two ``IP:PORT`` combinations (which are ``127.0.0.1:6790`` for "Monitor
b" and ``127.0.0.1:6795`` for "Monitor c"), ``mon.a`` has the highest rank:
namely, rank 0.
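One way to confirm rank assignments directly is to print the cluster's current
``monmap`` with ``ceph mon dump``, which lists each Monitor together with its
rank, address, and name (the Monitor names here are just the ones from the
example above):

.. prompt:: bash

   ceph mon dump

In the example above, ``mon.a`` would appear on the line that begins with
``0:``.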
Most Common Monitor Issues
@ -250,14 +260,15 @@ detail`` returns a message similar to the following::
Monitors at a wrong address. ``mon_status`` outputs the ``monmap`` that is
known to the monitor: determine whether the other Monitors' locations as
specified in the ``monmap`` match the locations of the Monitors in the
network. If they do not, see `Recovering a Monitor's Broken monmap`_.
If the locations of the Monitors as specified in the ``monmap`` match the
locations of the Monitors in the network, then the persistent
``probing`` state could be related to severe clock skews amongst the monitor
nodes. See `Clock Skews`_. If the information in `Clock Skews`_ does not
bring the Monitor out of the ``probing`` state, then prepare your system logs
and ask the Ceph community for help. See `Preparing your logs`_ for
information about the proper preparation of logs.
network. If they do not, see :ref:`Recovering a Monitor's Broken monmap
<rados_troubleshooting_troubleshooting_mon_recovering_broken_monmap>`. If
the locations of the Monitors as specified in the ``monmap`` match the
locations of the Monitors in the network, then the persistent ``probing``
state could be related to severe clock skews among the monitor nodes. See
`Clock Skews`_. If the information in `Clock Skews`_ does not bring the
Monitor out of the ``probing`` state, then prepare your system logs and ask
the Ceph community for help. See `Preparing your logs`_ for information about
the proper preparation of logs.
**What does it mean when a Monitor's state is ``electing``?**
@ -314,13 +325,16 @@ detail`` returns a message similar to the following::
substantiate it. See `Preparing your logs`_ for information about the
proper preparation of logs.
.. _rados_troubleshooting_troubleshooting_mon_recovering_broken_monmap:
Recovering a Monitor's Broken ``monmap``
----------------------------------------
Recovering a Monitor's Broken "monmap"
--------------------------------------
This is how a ``monmap`` usually looks, depending on the number of
monitors::
A monmap can be retrieved by using a command of the form ``ceph tell mon.c
mon_status``, as described in :ref:`Understanding mon_status
<rados_troubleshoting_troubleshooting_mon_understanding_mon_status>`.
Here is an example of a ``monmap``::
epoch 3
fsid 5c4e9d53-e2e1-478a-8061-f543f8be4cf8
@ -330,60 +344,63 @@ monitors::
1: 127.0.0.1:6790/0 mon.b
2: 127.0.0.1:6795/0 mon.c
This may not be what you have however. For instance, in some versions of
early Cuttlefish there was a bug that could cause your ``monmap``
to be nullified. Completely filled with zeros. This means that not even
``monmaptool`` would be able to make sense of cold, hard, inscrutable zeros.
It's also possible to end up with a monitor with a severely outdated monmap,
notably if the node has been down for months while you fight with your vendor's
TAC. The subject ``ceph-mon`` daemon might be unable to find the surviving
monitors (e.g., say ``mon.c`` is down; you add a new monitor ``mon.d``,
then remove ``mon.a``, then add a new monitor ``mon.e`` and remove
``mon.b``; you will end up with a totally different monmap from the one
``mon.c`` knows).
This ``monmap`` is in working order, but your ``monmap`` might not be in
working order. The ``monmap`` in a given node might be outdated because the
node was down for a long time, during which the cluster's Monitors changed.
In this situation you have two possible solutions:
There are two ways to update a Monitor's outdated ``monmap``:
Scrap the monitor and redeploy
A. **Scrap the monitor and redeploy.**
You should only take this route if you are positive that you won't
lose the information kept by that monitor; that you have other monitors
and that they are running just fine so that your new monitor is able
to synchronize from the remaining monitors. Keep in mind that destroying
a monitor, if there are no other copies of its contents, may lead to
loss of data.
Do this only if you are certain that you will not lose the information kept
by the Monitor that you scrap. Make sure that you have other Monitors in
good condition, so that the new Monitor will be able to synchronize with
the surviving Monitors. Remember that destroying a Monitor can lead to data
loss if there are no other copies of the Monitor's contents.
Inject a monmap into the monitor
B. **Inject a monmap into the monitor.**
These are the basic steps:
Retrieve the ``monmap`` from the surviving monitors and inject it into the
monitor whose ``monmap`` is corrupted or lost.
It is possible to fix a Monitor that has an outdated ``monmap`` by
retrieving an up-to-date ``monmap`` from surviving Monitors in the cluster
and injecting it into the Monitor that has a corrupted or missing
``monmap``.
Implement this solution by carrying out the following procedure:
1. Is there a quorum of monitors? If so, retrieve the ``monmap`` from the
quorum::
#. Retrieve the ``monmap`` in one of the two following ways:
$ ceph mon getmap -o /tmp/monmap
a. **IF THERE IS A QUORUM OF MONITORS:**
2. If there is no quorum, then retrieve the ``monmap`` directly from another
monitor that has been stopped (in this example, the other monitor has
the ID ``ID-FOO``)::
Retrieve the ``monmap`` from the quorum:
$ ceph-mon -i ID-FOO --extract-monmap /tmp/monmap
.. prompt:: bash
3. Stop the monitor you are going to inject the monmap into.
ceph mon getmap -o /tmp/monmap
4. Inject the monmap::
b. **IF THERE IS NO QUORUM OF MONITORS:**
$ ceph-mon -i ID --inject-monmap /tmp/monmap
Retrieve the ``monmap`` directly from a Monitor that has been stopped:
5. Start the monitor
.. prompt:: bash
.. warning:: Injecting ``monmaps`` can cause serious problems because doing
so will overwrite the latest existing ``monmap`` stored on the monitor. Be
careful!
ceph-mon -i ID-FOO --extract-monmap /tmp/monmap
In this example, the ID of the stopped Monitor is ``ID-FOO``.
#. Stop the Monitor into which the ``monmap`` will be injected.
#. Inject the monmap into the stopped Monitor:
.. prompt:: bash
ceph-mon -i ID --inject-monmap /tmp/monmap
#. Start the Monitor.
.. warning:: Injecting a ``monmap`` into a Monitor can cause serious
problems. Injecting a ``monmap`` overwrites the latest existing
``monmap`` stored on the monitor. Be careful!
Clock Skews
-----------
@ -464,12 +481,13 @@ Clock Skew Questions and Answers
Client Can't Connect or Mount
-----------------------------
Check your IP tables. Some operating-system install utilities add a ``REJECT``
rule to ``iptables``. ``iptables`` rules will reject all clients other than
``ssh`` that try to connect to the host. If your monitor host's IP tables have
a ``REJECT`` rule in place, clients that are connecting from a separate node
will fail and will raise a timeout error. Any ``iptables`` rules that reject
clients trying to connect to Ceph daemons must be addressed. For example::
If a client can't connect to the cluster or mount, check your iptables. Some
operating-system install utilities add a ``REJECT`` rule to ``iptables``.
``iptables`` rules will reject all clients other than ``ssh`` that try to
connect to the host. If your monitor host's iptables have a ``REJECT`` rule in
place, clients that connect from a separate node will fail, and this will raise
a timeout error. Look for ``iptables`` rules that reject clients that are
trying to connect to Ceph daemons. For example::
REJECT all -- anywhere anywhere reject-with icmp-host-prohibited
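If a rule like this is present, one possible remedy is to insert an explicit
``ACCEPT`` rule for the Monitor ports ahead of the ``REJECT`` rule. The
following is only a sketch: the source network (``192.168.0.0/24``) is an
assumption and must be adapted to your environment and firewall tooling:

.. prompt:: bash

   sudo iptables -I INPUT -p tcp -m multiport --dports 3300,6789 -s 192.168.0.0/24 -j ACCEPT

Ports 3300 (msgr2) and 6789 (msgr1) are the default Monitor ports. Remember to
persist any rule changes with your distribution's firewall tools.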
@ -487,9 +505,9 @@ Monitor Store Failures
Symptoms of store corruption
----------------------------
Ceph monitors store the :term:`Cluster Map` in a key-value store. If key-value
store corruption causes a monitor to fail, then the monitor log might contain
one of the following error messages::
Ceph Monitors maintain the :term:`Cluster Map` in a key-value store. If
key-value store corruption causes a Monitor to fail, then the Monitor log might
contain one of the following error messages::
Corruption: error in middle of record
@ -500,10 +518,10 @@ or::
Recovery using healthy monitor(s)
---------------------------------
If there are surviving monitors, we can always :ref:`replace
<adding-and-removing-monitors>` the corrupted monitor with a new one. After the
new monitor boots, it will synchronize with a healthy peer. After the new
monitor is fully synchronized, it will be able to serve clients.
If the cluster contains surviving Monitors, the corrupted Monitor can be
:ref:`replaced <adding-and-removing-monitors>` with a new Monitor. After the
new Monitor boots, it will synchronize with a healthy peer. After the new
Monitor is fully synchronized, it will be able to serve clients.
.. _mon-store-recovery-using-osds:
@ -511,15 +529,14 @@ Recovery using OSDs
-------------------
Even if all monitors fail at the same time, it is possible to recover the
monitor store by using information stored in OSDs. You are encouraged to deploy
at least three (and preferably five) monitors in a Ceph cluster. In such a
deployment, complete monitor failure is unlikely. However, unplanned power loss
in a data center whose disk settings or filesystem settings are improperly
configured could cause the underlying filesystem to fail and this could kill
all of the monitors. In such a case, data in the OSDs can be used to recover
the monitors. The following is such a script and can be used to recover the
monitors:
Monitor store by using information that is stored in OSDs. You are encouraged
to deploy at least three (and preferably five) Monitors in a Ceph cluster. In
such a deployment, complete Monitor failure is unlikely. However, unplanned
power loss in a data center whose disk settings or filesystem settings are
improperly configured could cause the underlying filesystem to fail and this
could kill all of the monitors. In such a case, data in the OSDs can be used to
recover the Monitors. The following is a script that can be used in such a case
to recover the Monitors:
.. code-block:: bash
@ -572,10 +589,10 @@ monitors:
This script performs the following steps:
#. Collects the map from each OSD host.
#. Rebuilds the store.
#. Fills the entities in the keyring file with appropriate capabilities.
#. Replaces the corrupted store on ``mon.foo`` with the recovered copy.
#. Collect the map from each OSD host.
#. Rebuild the store.
#. Fill the entities in the keyring file with appropriate capabilities.
#. Replace the corrupted store on ``mon.foo`` with the recovered copy.
Known limitations
@ -587,19 +604,18 @@ The above recovery tool is unable to recover the following information:
auth add`` command are recovered from the OSD's copy, and the
``client.admin`` keyring is imported using ``ceph-monstore-tool``. However,
the MDS keyrings and all other keyrings will be missing in the recovered
monitor store. You might need to manually re-add them.
Monitor store. It might be necessary to manually re-add them (see the example
that follows this list).
- **Creating pools**: If any RADOS pools were in the process of being created,
that state is lost. The recovery tool operates on the assumption that all
pools have already been created. If there are PGs that are stuck in the
'unknown' state after the recovery for a partially created pool, you can
``unknown`` state after the recovery for a partially created pool, you can
force creation of the *empty* PG by running the ``ceph osd force-create-pg``
command. Note that this will create an *empty* PG, so take this action only
if you know the pool is empty.
command. This creates an *empty* PG, so take this action only if you are
certain that the pool is empty.
- **MDS Maps**: The MDS maps are lost.
Everything Failed! Now What?
============================
@ -611,16 +627,20 @@ irc.oftc.net), or at ``dev@ceph.io`` and ``ceph-users@lists.ceph.com``. Make
sure that you have prepared your logs and that you have them ready upon
request.
See https://ceph.io/en/community/connect/ for current (as of October 2023)
information on getting in contact with the upstream Ceph community.
The upstream Ceph Slack workspace can be joined at this address:
https://ceph-storage.slack.com/
See https://ceph.io/en/community/connect/ for current (as of December 2023)
information on getting in contact with the upstream Ceph community.
Preparing your logs
-------------------
The default location for monitor logs is ``/var/log/ceph/ceph-mon.FOO.log*``.
However, if they are not there, you can find their current location by running
the following command:
The default location for Monitor logs is ``/var/log/ceph/ceph-mon.FOO.log*``.
If the location of the Monitor logs has been changed from the default, find
their current location by running the following command:
.. prompt:: bash
@ -631,21 +651,21 @@ cluster's configuration files. If Ceph is using the default debug levels, then
your logs might be missing important information that would help the upstream
Ceph community address your issue.
To make sure your monitor logs contain relevant information, you can raise
debug levels. Here we are interested in information from the monitors. As with
other components, the monitors have different parts that output their debug
Raise debug levels to make sure that your Monitor logs contain relevant
information. Here we are interested in information from the Monitors. As with
other components, the Monitors have different parts that output their debug
information on different subsystems.
If you are an experienced Ceph troubleshooter, we recommend raising the debug
levels of the most relevant subsystems. Of course, this approach might not be
easy for beginners. In most cases, however, enough information to address the
issue will be secured if the following debug levels are entered::
levels of the most relevant subsystems. This approach might not be easy for
beginners. In most cases, however, enough information to address the issue will
be logged if the following debug levels are entered::
debug_mon = 10
debug_ms = 1
Sometimes these debug levels do not yield enough information. In such cases,
members of the upstream Ceph community might ask you to make additional changes
members of the upstream Ceph community will ask you to make additional changes
to these or to other debug levels. In any case, it is better for us to receive
at least some useful information than to receive an empty log.
@ -653,10 +673,12 @@ at least some useful information than to receive an empty log.
Do I need to restart a monitor to adjust debug levels?
------------------------------------------------------
No, restarting a monitor is not necessary. Debug levels may be adjusted by
using two different methods, depending on whether or not there is a quorum:
No. It is not necessary to restart a Monitor when adjusting its debug levels.
There is a quorum
There are two different methods for adjusting debug levels. One method is used
when there is quorum. The other is used when there is no quorum.
**Adjusting debug levels when there is a quorum**
Either inject the debug option into the specific monitor that needs to
be debugged::
@ -668,17 +690,19 @@ There is a quorum
ceph tell mon.* config set debug_mon 10/10
There is no quorum
**Adjusting debug levels when there is no quorum**
Use the admin socket of the specific monitor that needs to be debugged
and directly adjust the monitor's configuration options::
ceph daemon mon.FOO config set debug_mon 10/10
**Returning debug levels to their default values**
To return the debug levels to their default values, run the above commands
using the debug level ``1/10`` rather than ``10/10``. To check a monitor's
current values, use the admin socket and run either of the following commands:
using the debug level ``1/10`` rather than the debug level ``10/10``. To check
a Monitor's current values, use the admin socket and run either of the
following commands:
.. prompt:: bash
@ -695,17 +719,17 @@ or:
I Reproduced the problem with appropriate debug levels. Now what?
-----------------------------------------------------------------
We prefer that you send us only the portions of your logs that are relevant to
your monitor problems. Of course, it might not be easy for you to determine
which portions are relevant so we are willing to accept complete and
unabridged logs. However, we request that you avoid sending logs containing
hundreds of thousands of lines with no additional clarifying information. One
common-sense way of making our task easier is to write down the current time
and date when you are reproducing the problem and then extract portions of your
Send the upstream Ceph community only the portions of your logs that are
relevant to your Monitor problems. Because it might not be easy for you to
determine which portions are relevant, the upstream Ceph community accepts
complete and unabridged logs. But don't send logs containing hundreds of
thousands of lines with no additional clarifying information. One common-sense
way to help the Ceph community help you is to write down the current time and
date when you are reproducing the problem and then extract portions of your
logs based on that information.
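One way to extract such a time window, assuming the Monitor runs under systemd
with the conventional unit name, is to use ``journalctl``. The unit name,
times, and output file below are only examples:

.. prompt:: bash

   journalctl -u ceph-mon@FOO --since "2024-05-01 10:00:00" --until "2024-05-01 10:30:00" > mon-FOO-excerpt.log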
Finally, reach out to us on the mailing lists or IRC or Slack, or by filing a
new issue on the `tracker`_.
Contact the upstream Ceph community on the mailing lists or IRC or Slack, or by
filing a new issue on the `tracker`_.
.. _tracker: http://tracker.ceph.com/projects/ceph/issues/new

View File

@ -2,25 +2,23 @@
Admin Guide
=============
Once you have your Ceph Object Storage service up and running, you may
administer the service with user management, access controls, quotas
and usage tracking among other features.
After the Ceph Object Storage service is up and running, it can be administered
with user management, access controls, quotas, and usage tracking.
User Management
===============
Ceph Object Storage user management refers to users of the Ceph Object Storage
service (i.e., not the Ceph Object Gateway as a user of the Ceph Storage
Cluster). You must create a user, access key and secret to enable end users to
interact with Ceph Object Gateway services.
Ceph Object Storage user management refers only to users of the Ceph Object
Storage service and not to the Ceph Object Gateway as a user of the Ceph
Storage Cluster. Create a user, access key, and secret key to enable end users
to interact with Ceph Object Gateway services.
There are two user types:
There are two types of user:
- **User:** The term 'user' reflects a user of the S3 interface.
- **User:** The term "user" refers to user of the S3 interface.
- **Subuser:** The term 'subuser' reflects a user of the Swift interface. A subuser
is associated to a user .
- **Subuser:** The term "subuser" refers to a user of the Swift interface. A
subuser is associated with a user.
.. ditaa::
+---------+
@ -31,22 +29,28 @@ There are two user types:
+-----+ Subuser |
+-----------+
You can create, modify, view, suspend and remove users and subusers. In addition
to user and subuser IDs, you may add a display name and an email address for a
user. You can specify a key and secret, or generate a key and secret
automatically. When generating or specifying keys, note that user IDs correspond
to an S3 key type and subuser IDs correspond to a swift key type. Swift keys
also have access levels of ``read``, ``write``, ``readwrite`` and ``full``.
Users and subusers can be created, modified, viewed, suspended and removed.
Display names and email addresses can be added to user profiles. Keys and
secrets can either be specified or generated automatically.
When generating or specifying keys, remember that user IDs correspond to S3 key
types and subuser IDs correspond to Swift key types.
Swift keys have access levels of ``read``, ``write``, ``readwrite`` and
``full``.
Create a User
-------------
To create a user (S3 interface), execute the following::
To create a user (S3 interface), run a command of the following form:
.. prompt:: bash
radosgw-admin user create --uid={username} --display-name="{display-name}" [--email={email}]
For example::
For example:
.. prompt:: bash
radosgw-admin user create --uid=johndoe --display-name="John Doe" --email=john@example.com
@ -75,32 +79,37 @@ For example::
"max_objects": -1},
"temp_url_keys": []}
Creating a user also creates an ``access_key`` and ``secret_key`` entry for use
with any S3 API-compatible client.
The creation of a user entails the creation of an ``access_key`` and a
``secret_key`` entry, which can be used with any S3 API-compatible client.
.. important:: Check the key output. Sometimes ``radosgw-admin``
generates a JSON escape (``\``) character, and some clients
do not know how to handle JSON escape characters. Remedies include
removing the JSON escape character (``\``), encapsulating the string
in quotes, regenerating the key and ensuring that it
does not have a JSON escape character or specify the key and secret
manually.
.. important:: Check the key output. Sometimes ``radosgw-admin`` generates a
JSON escape (``\``) character, and some clients do not know how to handle
JSON escape characters. Remedies include removing the JSON escape character
(``\``), encapsulating the string in quotes, regenerating the key and
ensuring that it does not have a JSON escape character, or specifying the
key and secret manually.
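For example, if a generated key contains an inconvenient escape character, the
key pair can simply be regenerated. The following sketch reuses the ``johndoe``
user from the examples above:

.. prompt:: bash

   radosgw-admin key create --uid=johndoe --key-type=s3 --gen-access-key --gen-secret

Repeat this until the generated ``access_key`` and ``secret_key`` are free of
escape characters, or specify the key and secret manually.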
Create a Subuser
----------------
To create a subuser (Swift interface) for the user, you must specify the user ID
(``--uid={username}``), a subuser ID and the access level for the subuser. ::
To create a subuser (a user of the Swift interface) for the user, specify the
user ID (``--uid={username}``), a subuser ID, and the subuser's access level:
.. prompt:: bash
radosgw-admin subuser create --uid={uid} --subuser={uid} --access=[ read | write | readwrite | full ]
For example::
For example:
.. prompt:: bash
radosgw-admin subuser create --uid=johndoe --subuser=johndoe:swift --access=full
.. note:: ``full`` is not ``readwrite``, as it also includes the access control policy.
.. note:: ``full`` is not the same as ``readwrite``. The ``full`` access level
includes ``read`` and ``write``, but it also includes the access control
policy.
.. code-block:: javascript
@ -133,100 +142,126 @@ For example::
Get User Info
-------------
To get information about a user, you must specify ``user info`` and the user ID
(``--uid={username}``) . ::
To get information about a user, specify ``user info`` and the user ID
(``--uid={username}``). Use a command of the following form:
.. prompt:: bash
radosgw-admin user info --uid=johndoe
Modify User Info
----------------
To modify information about a user, you must specify the user ID (``--uid={username}``)
and the attributes you want to modify. Typical modifications are to keys and secrets,
email addresses, display names and access levels. For example::
To modify information about a user, specify the user ID (``--uid={username}``)
and the attributes that you want to modify. Typical modifications are made to
keys and secrets, email addresses, display names, and access levels. Use a
command of the following form:
.. prompt:: bash
radosgw-admin user modify --uid=johndoe --display-name="John E. Doe"
To modify subuser values, specify ``subuser modify``, user ID and the subuser ID. For example::
To modify subuser values, specify ``subuser modify``, the user ID, and the
subuser ID. Use a command of the following form:
.. prompt:: bash
radosgw-admin subuser modify --uid=johndoe --subuser=johndoe:swift --access=full
User Enable/Suspend
-------------------
User Suspend
------------
When you create a user, the user is enabled by default. However, you may suspend
user privileges and re-enable them at a later time. To suspend a user, specify
``user suspend`` and the user ID. ::
When a user is created, the user is enabled by default. However, it is possible
to suspend user privileges and to re-enable them at a later time. To suspend a
user, specify ``user suspend`` and the user ID in a command of the following
form:
.. prompt:: bash
radosgw-admin user suspend --uid=johndoe
To re-enable a suspended user, specify ``user enable`` and the user ID. ::
User Enable
-----------
To re-enable a suspended user, provide ``user enable`` and specify the user ID
in a command of the following form:
.. prompt:: bash
radosgw-admin user enable --uid=johndoe
.. note:: Disabling the user disables the subuser.
.. note:: Disabling the user also disables any subusers.
Remove a User
-------------
When you remove a user, the user and subuser are removed from the system.
However, you may remove just the subuser if you wish. To remove a user (and
subuser), specify ``user rm`` and the user ID. ::
When you remove a user, you also remove any subusers associated with the user.
It is possible to remove a subuser without removing its associated user. This
is covered in the section called :ref:`Remove a Subuser <radosgw-admin-remove-a-subuser>`.
To remove a user and any subusers associated with it, use the ``user rm``
command and provide the user ID of the user to be removed. Use a command of the
following form:
.. prompt:: bash
radosgw-admin user rm --uid=johndoe
To remove the subuser only, specify ``subuser rm`` and the subuser ID. ::
radosgw-admin subuser rm --subuser=johndoe:swift
Options include:
- **Purge Data:** The ``--purge-data`` option purges all data associated
to the UID.
with the UID.
- **Purge Keys:** The ``--purge-keys`` option purges all keys associated
to the UID.
with the UID.
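For example, to remove the user ``johndoe`` together with all data owned by
that UID, a command like the following might be used. Be certain that the data
is no longer needed before purging it:

.. prompt:: bash

   radosgw-admin user rm --uid=johndoe --purge-data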
.. _radosgw-admin-remove-a-subuser:
Remove a Subuser
----------------
When you remove a sub user, you are removing access to the Swift interface.
The user will remain in the system. To remove the subuser, specify
``subuser rm`` and the subuser ID. ::
Removing a subuser removes access to the Swift interface or to S3. The user
associated with the removed subuser remains in the system after the subuser's
removal.
To remove the subuser, use the command ``subuser rm`` and provide the subuser
ID of the subuser to be removed. Use a command of the following form:
.. prompt:: bash
radosgw-admin subuser rm --subuser=johndoe:swift
Options include:
- **Purge Keys:** The ``--purge-keys`` option purges all keys associated
to the UID.
with the UID.
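For example, to remove the subuser ``johndoe:swift`` and purge its keys in a
single step, a command like the following might be used:

.. prompt:: bash

   radosgw-admin subuser rm --subuser=johndoe:swift --purge-keys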
Add / Remove a Key
------------------------
Add or Remove a Key
--------------------
Both users and subusers require the key to access the S3 or Swift interface. To
Both users and subusers require a key to access the S3 or Swift interface. To
use S3, the user needs a key pair which is composed of an access key and a
secret key. On the other hand, to use Swift, the user typically needs a secret
key (password), and use it together with the associated user ID. You may create
a key and either specify or generate the access key and/or secret key. You may
also remove a key. Options include:
secret key. To use Swift, the user needs a secret key (password), which is used
together with its associated user ID. You can create a key and either specify
or generate the access key or secret key. You can also remove a key. Options
include:
- ``--key-type=<type>`` specifies the key type. The options are: s3, swift
- ``--key-type=<type>`` specifies the key type. The options are: ``s3``, ``swift``
- ``--access-key=<key>`` manually specifies an S3 access key.
- ``--secret-key=<key>`` manually specifies an S3 secret key or a Swift secret key.
- ``--gen-access-key`` automatically generates a random S3 access key.
- ``--gen-secret`` automatically generates a random S3 secret key or a random Swift secret key.
An example how to add a specified S3 key pair for a user. ::
Adding S3 keys
~~~~~~~~~~~~~~
To add a specific S3 key pair for a user, run a command of the following form:
.. prompt:: bash
radosgw-admin key create --uid=foo --key-type=s3 --access-key fooAccessKey --secret-key fooSecretKey
@ -243,9 +278,15 @@ An example how to add a specified S3 key pair for a user. ::
"secret_key": "fooSecretKey"}],
}
Note that you may create multiple S3 key pairs for a user.
.. note:: You can create multiple S3 key pairs for a user.
To attach a specified swift secret key for a subuser. ::
Adding Swift secret keys
~~~~~~~~~~~~~~~~~~~~~~~~
To attach a specific Swift secret key for a subuser, run a command of the
following form:
.. prompt:: bash
radosgw-admin key create --subuser=foo:bar --key-type=swift --secret-key barSecret
@ -263,9 +304,16 @@ To attach a specified swift secret key for a subuser. ::
{ "user": "foo:bar",
"secret_key": "asfghjghghmgm"}]}
Note that a subuser can have only one swift secret key.
.. note:: A subuser can have only one Swift secret key.
Subusers can also be used with S3 APIs if the subuser is associated with a S3 key pair. ::
Associating subusers with S3 key pairs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Subusers can also be used with S3 APIs if the subuser is associated with an
S3 key pair. To associate a subuser with an S3 key pair, run a command of the
following form:
.. prompt:: bash
radosgw-admin key create --subuser=foo:bar --key-type=s3 --access-key barAccessKey --secret-key barSecretKey
@ -286,49 +334,70 @@ Subusers can also be used with S3 APIs if the subuser is associated with a S3 ke
}
To remove a S3 key pair, specify the access key. ::
Removing S3 key pairs
~~~~~~~~~~~~~~~~~~~~~
To remove an S3 key pair, specify the access key to be removed, and run a
command of the following form:
.. prompt:: bash
radosgw-admin key rm --uid=foo --key-type=s3 --access-key=fooAccessKey
To remove the swift secret key. ::
Removing Swift secret keys
~~~~~~~~~~~~~~~~~~~~~~~~~~
To remove a Swift secret key, run a command of the following form:
.. prompt:: bash
radosgw-admin key rm --subuser=foo:bar --key-type=swift
Add / Remove Admin Capabilities
-------------------------------
Add or Remove Admin Capabilities
--------------------------------
The Ceph Storage Cluster provides an administrative API that enables users to
execute administrative functions via the REST API. By default, users do NOT have
access to this API. To enable a user to exercise administrative functionality,
provide the user with administrative capabilities.
execute administrative functions via the REST API. By default, users do NOT
have access to this API. To enable a user to exercise administrative
functionality, provide the user with administrative capabilities.
To add administrative capabilities to a user, execute the following::
To add administrative capabilities to a user, run a command of the following
form:
.. prompt:: bash
radosgw-admin caps add --uid={uid} --caps={caps}
You can add read, write or all capabilities to users, buckets, metadata and
usage (utilization). For example::
usage (utilization). To do this, use a command-line option of the following
form:
--caps="[users|buckets|metadata|usage|zone|amz-cache|info|bilog|mdlog|datalog|user-policy|oidc-provider|roles|ratelimit]=[*|read|write|read, write]"
.. prompt:: bash
For example::
--caps="[users|buckets|metadata|usage|zone|amz-cache|info|bilog|mdlog|datalog|user-policy|oidc-provider|roles|ratelimit]=[\*|read|write|read, write]"
For example:
.. prompt:: bash
radosgw-admin caps add --uid=johndoe --caps="users=*;buckets=*"
To remove administrative capabilities from a user, run a command of the
following form:
To remove administrative capabilities from a user, execute the following::
.. prompt:: bash
radosgw-admin caps rm --uid=johndoe --caps={caps}
Quota Management
================
The Ceph Object Gateway enables you to set quotas on users and buckets owned by
users. Quotas include the maximum number of objects in a bucket and the maximum
storage size a bucket can hold.
The Ceph Object Gateway makes it possible for you to set quotas on users and
buckets owned by users. Quotas include the maximum number of objects in a
bucket and the maximum storage size a bucket can hold.
- **Bucket:** The ``--bucket`` option allows you to specify a quota for
buckets the user owns.
@ -337,38 +406,47 @@ storage size a bucket can hold.
the maximum number of objects. A negative value disables this setting.
- **Maximum Size:** The ``--max-size`` option allows you to specify a quota
size in B/K/M/G/T, where B is the default. A negative value disables this setting.
size in B/K/M/G/T, where B is the default. A negative value disables this
setting.
- **Quota Scope:** The ``--quota-scope`` option sets the scope for the quota.
The options are ``bucket`` and ``user``. Bucket quotas apply to buckets a
user owns. User quotas apply to a user.
The options are ``bucket`` and ``user``. Bucket quotas apply to each bucket
owned by the user. User quotas are summed across all buckets owned by the
user.
Set User Quota
--------------
Before you enable a quota, you must first set the quota parameters.
For example::
To set quota parameters, run a command of the following form:
.. prompt:: bash
radosgw-admin quota set --quota-scope=user --uid=<uid> [--max-objects=<num objects>] [--max-size=<max size>]
For example::
For example:
.. prompt:: bash
radosgw-admin quota set --quota-scope=user --uid=johndoe --max-objects=1024 --max-size=1024B
A negative value for num objects and / or max size means that the
specific quota attribute check is disabled.
Passing a negative value as an argument of ``--max-objects`` or ``--max-size``
disables the given quota attribute.
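For example, to disable only the object-count check for ``johndoe`` while
leaving any size quota in place, a command like the following might be used:

.. prompt:: bash

   radosgw-admin quota set --quota-scope=user --uid=johndoe --max-objects=-1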
Enable/Disable User Quota
-------------------------
Enabling and Disabling User Quota
---------------------------------
Once you set a user quota, you may enable it. For example::
After a user quota is set, it must be enabled in order to take effect. To enable a user quota, run a command of the following form:
.. prompt:: bash
radosgw-admin quota enable --quota-scope=user --uid=<uid>
You may disable an enabled user quota. For example::
To disable an enabled user quota, run a command of the following form:
.. prompt:: bash
radosgw-admin quota disable --quota-scope=user --uid=<uid>
@ -377,22 +455,30 @@ Set Bucket Quota
----------------
Bucket quotas apply to the buckets owned by the specified ``uid``. They are
independent of the user. ::
independent of the user. To set a bucket quota, run a command of the following
form:
.. prompt:: bash
radosgw-admin quota set --uid=<uid> --quota-scope=bucket [--max-objects=<num objects>] [--max-size=<max size]
A negative value for num objects and / or max size means that the
specific quota attribute check is disabled.
A negative value for ``--max-objects`` or ``--max-size`` means that the
specific quota attribute is disabled.
Enable/Disable Bucket Quota
---------------------------
Enabling and Disabling Bucket Quota
-----------------------------------
Once you set a bucket quota, you may enable it. For example::
After a bucket quota has been set, it must be enabled in order to take effect.
To enable a bucket quota, run a command of the following form:
.. prompt:: bash
radosgw-admin quota enable --quota-scope=bucket --uid=<uid>
You may disable an enabled bucket quota. For example::
To disable an enabled bucket quota, run a command of the following form:
.. prompt:: bash
radosgw-admin quota disable --quota-scope=bucket --uid=<uid>
@ -400,9 +486,11 @@ You may disable an enabled bucket quota. For example::
Get Quota Settings
------------------
You may access each user's quota settings via the user information
You can access each user's quota settings via the user information
API. To read user quota setting information with the CLI interface,
execute the following::
run a command of the following form:
.. prompt:: bash
radosgw-admin user info --uid=<uid>
@ -410,9 +498,12 @@ execute the following::
Update Quota Stats
------------------
Quota stats get updated asynchronously. You can update quota
statistics for all users and all buckets manually to retrieve
the latest quota stats. ::
Quota stats are updated asynchronously. To force an update and retrieve the
latest quota statistics for all users and all buckets, run a command of the
following form:
.. prompt:: bash
radosgw-admin user stats --uid=<uid> --sync-stats
@ -421,69 +512,90 @@ the latest quota stats. ::
Get User Usage Stats
--------------------
To see how much of the quota a user has consumed, execute the following::
To see how much of a quota a user has consumed, run a command of the following
form:
.. prompt:: bash
radosgw-admin user stats --uid=<uid>
.. note:: You should execute ``radosgw-admin user stats`` with the
``--sync-stats`` option to receive the latest data.
.. note:: Run ``radosgw-admin user stats`` with the ``--sync-stats`` option to
receive the latest data.
Default Quotas
--------------
You can set default quotas in the config. These defaults are used when
creating a new user and have no effect on existing users. If the
relevant default quota is set in config, then that quota is set on the
new user, and that quota is enabled. See ``rgw bucket default quota max objects``,
``rgw bucket default quota max size``, ``rgw user default quota max objects``, and
``rgw user default quota max size`` in `Ceph Object Gateway Config Reference`_
You can set default quotas in the Ceph Object Gateway config. **These defaults
will be used only when creating new users and will have no effect on existing
users.** If a default quota is set in the Ceph Object Gateway Config, then that
quota is set for all subsequently-created users, and that quota is enabled. See
``rgw_bucket_default_quota_max_objects``,
``rgw_bucket_default_quota_max_size``, ``rgw_user_default_quota_max_objects``,
and ``rgw_user_default_quota_max_size`` in `Ceph Object Gateway Config
Reference`_
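As a minimal sketch, a default per-user object quota could be set in the
central configuration database before new users are created (the value here is
arbitrary and only for illustration):

.. prompt:: bash

   ceph config set client.rgw rgw_user_default_quota_max_objects 1024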
Quota Cache
-----------
Quota statistics are cached on each RGW instance. If there are multiple
instances, then the cache can keep quotas from being perfectly enforced, as
each instance will have a different view of quotas. The options that control
this are ``rgw bucket quota ttl``, ``rgw user quota bucket sync interval`` and
``rgw user quota sync interval``. The higher these values are, the more
efficient quota operations are, but the more out-of-sync multiple instances
will be. The lower these values are, the closer to perfect enforcement
multiple instances will achieve. If all three are 0, then quota caching is
effectively disabled, and multiple instances will have perfect quota
enforcement. See `Ceph Object Gateway Config Reference`_
Quota statistics are cached by each RGW instance. If multiple RGW instances
are deployed, then this cache may prevent quotas from being perfectly
enforced, because each instance may have a different view of quota
utilization.
Here are the options that control this behavior:
- :confval:`rgw_bucket_quota_ttl`
- :confval:`rgw_user_quota_bucket_sync_interval`
- :confval:`rgw_user_quota_sync_interval`
Increasing these values makes quota operations more efficient, at the cost of
increasing the likelihood that the RGW instances will not consistently have
up-to-date quota utilization data. Decreasing these values brings the RGW
instances closer to perfect quota synchronization. If all three values are set
to ``0``, then quota caching is effectively disabled, and multiple instances
will have perfect quota enforcement. See `Ceph Object Gateway Config
Reference`_.
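For example, to opt for strict enforcement at the cost of additional overhead,
all three values could be set to ``0``. The following sketch assumes that the
central configuration database is being used:

.. prompt:: bash

   ceph config set client.rgw rgw_bucket_quota_ttl 0
   ceph config set client.rgw rgw_user_quota_bucket_sync_interval 0
   ceph config set client.rgw rgw_user_quota_sync_interval 0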
Reading / Writing Global Quotas
-------------------------------
You can read and write global quota settings in the period configuration. To
view the global quota settings::
view the global quota settings, run the following command:
.. prompt:: bash
radosgw-admin global quota get
The global quota settings can be manipulated with the ``global quota``
Global quota settings can be manipulated with the ``global quota``
counterparts of the ``quota set``, ``quota enable``, and ``quota disable``
commands. ::
commands, as in the following examples:
.. prompt:: bash
radosgw-admin global quota set --quota-scope bucket --max-objects 1024
radosgw-admin global quota enable --quota-scope bucket
.. note:: In a multisite configuration, where there is a realm and period
.. note:: In a multisite configuration where there is a realm and period
present, changes to the global quotas must be committed using ``period
update --commit``. If there is no period present, the rados gateway(s) must
update --commit``. If no period is present, the RGW instances must
be restarted for the changes to take effect.
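For example, in a multisite deployment the workflow for changing a global
quota might look like the following sketch (the values are arbitrary):

.. prompt:: bash

   radosgw-admin global quota set --quota-scope bucket --max-objects 1024
   radosgw-admin global quota enable --quota-scope bucket
   radosgw-admin period update --commit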
Rate Limit Management
=====================
The Ceph Object Gateway makes it possible to set rate limits on users and
buckets. "Rate limit" includes the maximum number of read operations (read
ops) and write operations (write ops) per minute and the number of bytes per
minute that can be written or read per user or per bucket.
Rate limits can be set on users and on buckets in the Ceph Object Gateway. The
"rate limit" includes the maximum number of read operations (read ops) and
write operations (write ops) per minute, as well as the number of bytes per
minute that can be written or read, per user or per bucket.
Read Requests and Write Requests
--------------------------------
Operations that use the ``GET`` method or the ``HEAD`` method in their REST
requests are "read requests". All other requests are "write requests".
How Metrics Work
----------------
Each object gateway tracks per-user metrics separately from bucket metrics.
These metrics are not shared with other gateways. The configured limits should
be divided by the number of active object gateways. For example, if "user A" is
@ -518,66 +630,90 @@ time has elapsed, "user A" will be able to send ``GET`` requests again.
- **User:** The ``--uid`` option allows you to specify a rate limit for a
user.
- **Maximum Read Ops:** The ``--max-read-ops`` setting allows you to specify
the maximum number of read ops per minute per RGW. A 0 value disables this setting (which means unlimited access).
- **Maximum Read Ops:** The ``--max-read-ops`` setting allows you to limit read
ops per minute per RGW instance. A ``0`` value disables throttling.
- **Maximum Read Bytes:** The ``--max-read-bytes`` setting allows you to specify
the maximum number of read bytes per minute per RGW. A 0 value disables this setting (which means unlimited access).
- **Maximum Read Bytes:** The ``--max-read-bytes`` setting allows you to limit
read bytes per minute per RGW instance. A ``0`` value disables throttling.
- **Maximum Write Ops:** The ``--max-write-ops`` setting allows you to specify
the maximum number of write ops per minute per RGW. A 0 value disables this setting (which means unlimited access).
the maximum number of write ops per minute per RGW instance. A ``0`` value
disables throttling.
- **Maximum Write Bytes:** The ``--max-write-bytes`` setting allows you to specify
the maximum number of write bytes per minute per RGW. A 0 value disables this setting (which means unlimited access).
- **Maximum Write Bytes:** The ``--max-write-bytes`` setting allows you to
specify the maximum number of write bytes per minute per RGW instance. A
``0`` value disables throttling.
- **Rate Limit Scope:** The ``--ratelimit-scope`` option sets the scope for the rate limit.
The options are ``bucket`` , ``user`` and ``anonymous``. Bucket rate limit apply to buckets.
The user rate limit applies to a user. Anonymous applies to an unauthenticated user.
Anonymous scope is only available for global rate limit.
- **Rate Limit Scope:** The ``--ratelimit-scope`` option sets the scope for the
rate limit. The options are ``bucket``, ``user``, and ``anonymous``. The
bucket rate limit applies to buckets, the user rate limit applies to a user,
and the ``anonymous`` option applies to an unauthenticated user. The
anonymous scope is available only for the global rate limit.
Set User Rate Limit
-------------------
Before you enable a rate limit, you must first set the rate limit parameters.
For example::
Before you can enable a rate limit, you must first set the rate limit
parameters. The following is the general form of commands that set rate limit
parameters:
radosgw-admin ratelimit set --ratelimit-scope=user --uid=<uid> <[--max-read-ops=<num ops>] [--max-read-bytes=<num bytes>]
.. prompt:: bash
radosgw-admin ratelimit set --ratelimit-scope=user --uid=<uid>
<[--max-read-ops=<num ops>] [--max-read-bytes=<num bytes>]
[--max-write-ops=<num ops>] [--max-write-bytes=<num bytes>]>
For example::
An example of using ``radosgw-admin ratelimit set`` to set a rate limit might
look like this:
.. prompt:: bash
radosgw-admin ratelimit set --ratelimit-scope=user --uid=johndoe --max-read-ops=1024 --max-write-bytes=10240
A 0 value for num ops and / or num bytes means that the
specific rate limit attribute check is disabled.
A value of ``0`` assigned to ``--max-read-ops``, ``--max-read-bytes``,
``--max-write-ops``, or ``--max-write-bytes`` disables the specified rate
limit.
Get User Rate Limit
-------------------
Get the current configured rate limit parameters
For example::
The ``radosgw-admin ratelimit get`` command returns the currently configured
rate limit parameters.
The following is the general form of the command that returns the current
configured limit parameters:
.. prompt:: bash
radosgw-admin ratelimit get --ratelimit-scope=user --uid=<uid>
For example::
An example of using ``radosgw-admin ratelimit get`` to return the rate limit
parameters might look like this:
.. prompt:: bash
radosgw-admin ratelimit get --ratelimit-scope=user --uid=johndoe
A 0 value for num ops and / or num bytes means that the
specific rate limit attribute check is disabled.
A value of ``0`` assigned to ``--max-read-ops``, ``--max-read-bytes``,
``--max-write-ops``, or ``--max-write-bytes`` disables the specified rate
limit.
Enable/Disable User Rate Limit
------------------------------
Enable and Disable User Rate Limit
----------------------------------
Once you set a user rate limit, you may enable it. For example::
After you have set a user rate limit, you must enable it in order for it to
take effect. Run a command of the following form to enable a user rate limit:
.. prompt:: bash
radosgw-admin ratelimit enable --ratelimit-scope=user --uid=<uid>
You may disable an enabled user rate limit. For example::
To disable an enabled user rate limit, run a command of the following form:
.. prompt:: bash
radosgw-admin ratelimit disable --ratelimit-scope=user --uid=johndoe
@ -586,114 +722,154 @@ Set Bucket Rate Limit
---------------------
Before you enable a rate limit, you must first set the rate limit parameters.
For example::
The following is the general form of commands that set rate limit parameters:
.. prompt:: bash
radosgw-admin ratelimit set --ratelimit-scope=bucket --bucket=<bucket> <[--max-read-ops=<num ops>] [--max-read-bytes=<num bytes>]
[--max-write-ops=<num ops>] [--max-write-bytes=<num bytes>]>
For example::
An example of using ``radosgw-admin ratelimit set`` to set a rate limit for a
bucket might look like this:
.. prompt:: bash
radosgw-admin ratelimit set --ratelimit-scope=bucket --bucket=mybucket --max-read-ops=1024 --max-write-bytes=10240
A 0 value for num ops and / or num bytes means that the
specific rate limit attribute check is disabled.
A value of ``0`` assigned to ``--max-read-ops``, ``--max-read-bytes``,
``--max-write-ops``, or ``--max-write-bytes`` disables the specified bucket rate
limit.
Get Bucket Rate Limit
---------------------
Get the current configured rate limit parameters
For example::
The ``radosgw-admin ratelimit get`` command returns the current configured rate
limit parameters.
radosgw-admin ratelimit get --ratelimit-scope=bucket --bucket=<bucket>
The following is the general form of the command that returns the current
configured limit parameters:
For example::
.. prompt:: bash
radosgw-admin ratelimit get --ratelimit-scope=bucket --bucket=<bucket>
An example of using ``radosgw-admin ratelimit get`` to return the rate limit
parameters for a bucket might look like this:
.. prompt:: bash
radosgw-admin ratelimit get --ratelimit-scope=bucket --bucket=mybucket
A 0 value for num ops and / or num bytes means that the
specific rate limit attribute check is disabled.
A value of ``0`` assigned to ``--max-read-ops``, ``--max-read-bytes``,
``--max-write-ops``, or ``--max-write-bytes`` disables the specified rate
limit.
Enable/Disable Bucket Rate Limit
--------------------------------
Enable and Disable Bucket Rate Limit
------------------------------------
Once you set a bucket rate limit, you may enable it. For example::
After you set a bucket rate limit, you can enable it. The following is the
general form of the ``radosgw-admin ratelimit enable`` command that enables
bucket rate limits:
.. prompt:: bash
radosgw-admin ratelimit enable --ratelimit-scope=bucket --bucket=<bucket>
You may disable an enabled bucket rate limit. For example::
An enabled bucket rate limit can be disabled by running a command of the following form:
.. prompt:: bash
radosgw-admin ratelimit disable --ratelimit-scope=bucket --bucket=mybucket
Reading and Writing Global Rate Limit Configuration
---------------------------------------------------
Reading / Writing Global Rate Limit Configuration
-------------------------------------------------
You can read and write global rate limit settings in the period's configuration.
To view the global rate limit settings, run the following command:
You can read and write global rate limit settings in the period configuration. To
view the global rate limit settings::
.. prompt:: bash
radosgw-admin global ratelimit get
The global rate limit settings can be manipulated with the ``global ratelimit``
counterparts of the ``ratelimit set``, ``ratelimit enable``, and ``ratelimit disable``
commands. Per user and per bucket ratelimit configuration is overriding the global configuration::
counterparts of the ``ratelimit set``, ``ratelimit enable``, and ``ratelimit
disable`` commands. Per-user and per-bucket ratelimit configurations override
the global configuration:
.. prompt:: bash
radosgw-admin global ratelimit set --ratelimit-scope bucket --max-read-ops=1024
radosgw-admin global ratelimit enable --ratelimit-scope bucket
The global rate limit can configure rate limit scope for all authenticated users::
The global rate limit can be used to configure the scope of the rate limit for
all authenticated users:
.. prompt:: bash
radosgw-admin global ratelimit set --ratelimit-scope user --max-read-ops=1024
radosgw-admin global ratelimit enable --ratelimit-scope user
The global rate limit can configure rate limit scope for all unauthenticated users::
The global rate limit can be used to configure the scope of the rate limit for
all unauthenticated users:
.. prompt:: bash
radosgw-admin global ratelimit set --ratelimit-scope=anonymous --max-read-ops=1024
radosgw-admin global ratelimit enable --ratelimit-scope=anonymous
.. note:: In a multisite configuration, where there is a realm and period
present, changes to the global rate limit must be committed using ``period
update --commit``. If there is no period present, the rados gateway(s) must
be restarted for the changes to take effect.
.. note:: In a multisite configuration where a realm and a period are present,
any changes to the global rate limit must be committed using ``period update
--commit``. If no period is present, the rados gateway(s) must be restarted
for the changes to take effect.
Usage
=====
The Ceph Object Gateway logs usage for each user. You can track
user usage within date ranges too.
The Ceph Object Gateway logs the usage of each user. You can track the usage of
each user within a specified date range.
- Add ``rgw_enable_usage_log = true`` in the ``[client.rgw]`` section of
``ceph.conf`` and restart the ``radosgw`` service.
.. note:: Until Ceph has a linkable macro that handles all the many ways that options can be set, we advise that you set ``rgw_enable_usage_log = true`` in central config or in ``ceph.conf`` and restart all RGWs.
- Add ``rgw enable usage log = true`` in [client.rgw] section of ceph.conf and restart the radosgw service.
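As a minimal sketch, on a cluster that uses the centralized configuration
database the usage log could be enabled as follows (all RGW daemons must still
be restarted afterward):

.. prompt:: bash

   ceph config set client.rgw rgw_enable_usage_log true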
Options include:
- **Start Date:** The ``--start-date`` option allows you to filter usage
stats from a particular start date and an optional start time
stats from a specified start date and an optional start time
(**format:** ``yyyy-mm-dd [HH:MM:SS]``).
- **End Date:** The ``--end-date`` option allows you to filter usage up
to a particular date and an optional end time
to a particular end date and an optional end time
(**format:** ``yyyy-mm-dd [HH:MM:SS]``).
- **Log Entries:** The ``--show-log-entries`` option allows you to specify
whether or not to include log entries with the usage stats
whether to include log entries with the usage stats
(options: ``true`` | ``false``).
.. note:: You may specify time with minutes and seconds, but it is stored
with 1 hour resolution.
.. note:: You can specify time to a precision of minutes and seconds, but the
specified time is stored only with a one-hour resolution.
Show Usage
----------
To show usage statistics, specify the ``usage show``. To show usage for a
particular user, you must specify a user ID. You may also specify a start date,
end date, and whether or not to show log entries.::
To show usage statistics, use the ``radosgw-admin usage show`` command. To show
usage for a particular user, you must specify a user ID. You can also specify a
start date, end date, and whether to show log entries. The following is an example
of such a command:
.. prompt:: bash $
radosgw-admin usage show --uid=johndoe --start-date=2012-03-01 --end-date=2012-04-01
You may also show a summary of usage information for all users by omitting a user ID. ::
You can show a summary of usage information for all users by omitting the user
ID, as in the following example command:
.. prompt:: bash $
radosgw-admin usage show --show-log-entries=false
@ -701,9 +877,12 @@ You may also show a summary of usage information for all users by omitting a use
Trim Usage
----------
With heavy use, usage logs can begin to take up storage space. You can trim
usage logs for all users and for specific users. You may also specify date
ranges for trim operations. ::
Usage logs can consume significant storage space, especially over time and with
heavy use. You can trim the usage logs for all users and for specific users.
You can also specify date ranges for trim operations, as in the following
example commands:
.. prompt:: bash $
radosgw-admin usage trim --start-date=2010-01-01 --end-date=2010-12-31
radosgw-admin usage trim --uid=johndoe

View File

@ -275,6 +275,9 @@ Get User Info
Get user information.
Either a ``uid`` or ``access-key`` must be supplied as a request parameter. We recommend supplying uid.
If both are provided but correspond to different users, the info for the user specified with ``uid`` will be returned.
:caps: users=read
@ -297,6 +300,13 @@ Request Parameters
:Example: ``foo_user``
:Required: Yes
``access-key``
:Description: The S3 access key of the user for which the information is requested.
:Type: String
:Example: ``ABCD0EF12GHIJ2K34LMN``
:Required: No
Response Entities
~~~~~~~~~~~~~~~~~

View File

@ -4,12 +4,18 @@ Compression
.. versionadded:: Kraken
The Ceph Object Gateway supports server-side compression of uploaded objects,
using any of Ceph's existing compression plugins.
The Ceph Object Gateway supports server-side compression of uploaded objects,
using any of the existing compression plugins.
.. note:: The Reef release added a :ref:`feature_compress_encrypted` zonegroup
feature to enable compression with `Server-Side Encryption`_.
Supported compression plugins include the following:
* lz4
* snappy
* zlib
* zstd
Configuration
=============
@ -18,14 +24,15 @@ Compression can be enabled on a storage class in the Zone's placement target
by providing the ``--compression=<type>`` option to the command
``radosgw-admin zone placement modify``.
The compression ``type`` refers to the name of the compression plugin to use
when writing new object data. Each compressed object remembers which plugin
was used, so changing this setting does not hinder the ability to decompress
existing objects, nor does it force existing objects to be recompressed.
The compression ``type`` refers to the name of the compression plugin that will
be used when writing new object data. Each compressed object remembers which
plugin was used, so any change to this setting will neither affect Ceph's
ability to decompress existing objects nor require existing objects to be
recompressed.
This compression setting applies to all new objects uploaded to buckets using
this placement target. Compression can be disabled by setting the ``type`` to
an empty string or ``none``.
Compression settings apply to all new objects uploaded to buckets using this
placement target. Compression can be disabled by setting the ``type`` to an
empty string or ``none``.
For example::
@ -62,11 +69,15 @@ For example::
Statistics
==========
While all existing commands and APIs continue to report object and bucket
sizes based their uncompressed data, compression statistics for a given bucket
are included in its ``bucket stats``::
Run the ``radosgw-admin bucket stats`` command to see compression statistics
for a given bucket:
.. prompt:: bash
radosgw-admin bucket stats --bucket=<name>
::
$ radosgw-admin bucket stats --bucket=<name>
{
...
"usage": {
@ -83,6 +94,9 @@ are included in its ``bucket stats``::
...
}
Other commands and APIs will report object and bucket sizes based on their
uncompressed data.
The ``size_utilized`` and ``size_kb_utilized`` fields represent the total
size of compressed data, in bytes and kilobytes respectively.
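
For example, a rough sketch of deriving a bucket's compression ratio from these fields with ``jq`` (the bucket name and the ``rgw.main`` usage category are assumptions for this example; guard against empty buckets in real use):

.. prompt:: bash

# compare compressed (utilized) bytes against logical bytes for one bucket
radosgw-admin bucket stats --bucket=mybucket | \
    jq '.usage["rgw.main"] | {logical: .size, stored: .size_utilized, ratio: (.size_utilized / .size)}'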

View File

@ -15,13 +15,13 @@ Storage Clusters. :term:`Ceph Object Storage` supports two interfaces:
that is compatible with a large subset of the OpenStack Swift API.
Ceph Object Storage uses the Ceph Object Gateway daemon (``radosgw``), an HTTP
server designed for interacting with a Ceph Storage Cluster. The Ceph Object
server designed to interact with a Ceph Storage Cluster. The Ceph Object
Gateway provides interfaces that are compatible with both Amazon S3 and
OpenStack Swift, and it has its own user management. Ceph Object Gateway can
store data in the same Ceph Storage Cluster in which data from Ceph File System
clients and Ceph Block Device clients is stored. The S3 API and the Swift API
share a common namespace, which makes it possible to write data to a Ceph
Storage Cluster with one API and then retrieve that data with the other API.
use a single Ceph Storage cluster to store data from Ceph File System and from
Ceph Block device clients. The S3 API and the Swift API share a common
namespace, which means that it is possible to write data to a Ceph Storage
Cluster with one API and then retrieve that data with the other API.
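
For example, a minimal sketch of that cross-API round trip (the endpoint, bucket name, and credentials are placeholders, and the user is assumed to have both S3 keys and a Swift subuser key):

.. prompt:: bash $

# write an object through the S3 API using the AWS CLI
aws --endpoint-url=http://rgw.example.com:8080 s3 cp ./hello.txt s3://mybucket/hello.txt

# read the same object back through the Swift API
swift -A http://rgw.example.com:8080/auth/1.0 -U johndoe:swift -K {swift_secret_key} \
    download mybucket hello.txt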
.. ditaa::

View File

@ -24,49 +24,48 @@ Varieties of Multi-site Configuration
.. versionadded:: Jewel
Beginning with the Kraken release, Ceph supports several multi-site
configurations for the Ceph Object Gateway:
Since the Kraken release, Ceph has supported several multi-site configurations
for the Ceph Object Gateway:
- **Multi-zone:** A more advanced topology, the "multi-zone" configuration, is
possible. A multi-zone configuration consists of one zonegroup and
multiple zones, with each zone consisting of one or more `ceph-radosgw`
instances. **Each zone is backed by its own Ceph Storage Cluster.**
- **Multi-zone:** The "multi-zone" configuration has a complex topology. A
multi-zone configuration consists of one zonegroup and multiple zones. Each
zone consists of one or more `ceph-radosgw` instances. **Each zone is backed
by its own Ceph Storage Cluster.**
The presence of multiple zones in a given zonegroup provides disaster
recovery for that zonegroup in the event that one of the zones experiences a
significant failure. Beginning with the Kraken release, each zone is active
and can receive write operations. A multi-zone configuration that contains
multiple active zones enhances disaster recovery and can also be used as a
foundation for content delivery networks.
significant failure. Each zone is active and can receive write operations. A
multi-zone configuration that contains multiple active zones enhances
disaster recovery and can be used as a foundation for content-delivery
networks.
- **Multi-zonegroups:** Ceph Object Gateway supports multiple zonegroups (which
were formerly called "regions"). Each zonegroup contains one or more zones.
If two zones are in the same zonegroup, and if that zonegroup is in the same
realm as a second zonegroup, then the objects stored in the two zones share
a global object namespace. This global object namespace ensures unique
object IDs across zonegroups and zones.
If two zones are in the same zonegroup and that zonegroup is in the same
realm as a second zonegroup, then the objects stored in the two zones share a
global object namespace. This global object namespace ensures unique object
IDs across zonegroups and zones.
Each bucket is owned by the zonegroup where it was created (except where
overridden by the :ref:`LocationConstraint<s3_bucket_placement>` on
bucket creation), and its object data will only replicate to other zones in
that zonegroup. Any request for data in that bucket that are sent to other
bucket creation), and its object data will replicate only to other zones in
that zonegroup. Any request for data in that bucket that is sent to other
zonegroups will redirect to the zonegroup where the bucket resides.
It can be useful to create multiple zonegroups when you want to share a
namespace of users and buckets across many zones, but isolate the object data
to a subset of those zones. It might be that you have several connected sites
that share storage, but only require a single backup for purposes of disaster
recovery. In such a case, it could make sense to create several zonegroups
with only two zones each to avoid replicating all objects to all zones.
namespace of users and buckets across many zones and isolate the object data
to a subset of those zones. Maybe you have several connected sites that share
storage but require only a single backup for purposes of disaster recovery.
In such a case, you could create several zonegroups with only two zones each
to avoid replicating all objects to all zones.
In other cases, it might make more sense to isolate things in separate
realms, with each realm having a single zonegroup. Zonegroups provide
flexibility by making it possible to control the isolation of data and
metadata separately.
In other cases, you might isolate data in separate realms, with each realm
having a single zonegroup. Zonegroups provide flexibility by making it
possible to control the isolation of data and metadata separately.
- **Multiple Realms:** Beginning with the Kraken release, the Ceph Object
Gateway supports "realms", which are containers for zonegroups. Realms make
it possible to set policies that apply to multiple zonegroups. Realms have a
- **Multiple Realms:** Since the Kraken release, the Ceph Object Gateway
supports "realms", which are containers for zonegroups. Realms make it
possible to set policies that apply to multiple zonegroups. Realms have a
globally unique namespace and can contain either a single zonegroup or
multiple zonegroups. If you choose to make use of multiple realms, you can
define multiple namespaces and multiple configurations (this means that each
@ -464,8 +463,8 @@ For example:
.. important:: The following steps assume a multi-site configuration that uses
newly installed systems that have not yet begun storing data. **DO NOT
DELETE the ``default`` zone or its pools** if you are already using it to
store data, or the data will be irretrievably lost.
DELETE the** ``default`` **zone or its pools** if you are already using it
to store data, or the data will be irretrievably lost.
Delete the default zone if needed:
@ -528,6 +527,17 @@ running the following commands on the object gateway host:
systemctl start ceph-radosgw@rgw.`hostname -s`
systemctl enable ceph-radosgw@rgw.`hostname -s`
If the ``cephadm`` command was used to deploy the cluster, you will not be able
to start the gateway with ``systemctl`` as shown above, because no
``ceph-radosgw`` services exist for ``systemctl`` to operate on: the gateway
daemons run in containers managed by ``cephadm``. In that case, run a command
of the following form to start the gateway:
.. prompt:: bash #
ceph orch apply rgw <name> --realm=<realm> --zone=<zone> --placement="<placement>" --port=<port>
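
After the service specification has been applied, you can check that the orchestrator has scheduled and started the gateway daemons (output formats vary by release):

.. prompt:: bash #

# list RGW services known to the orchestrator and how many daemons are running
ceph orch ls rgw

# show the individual radosgw daemons and the hosts they were placed on
ceph orch ps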
Checking Synchronization Status
-------------------------------

View File

@ -154,6 +154,10 @@ updating, use the name of an existing topic and different endpoint values).
[&Attributes.entry.9.key=persistent&Attributes.entry.9.value=true|false]
[&Attributes.entry.10.key=cloudevents&Attributes.entry.10.value=true|false]
[&Attributes.entry.11.key=mechanism&Attributes.entry.11.value=<mechanism>]
[&Attributes.entry.12.key=time_to_live&Attributes.entry.12.value=<seconds to live>]
[&Attributes.entry.13.key=max_retries&Attributes.entry.13.value=<retries number>]
[&Attributes.entry.14.key=retry_sleep_duration&Attributes.entry.14.value=<sleep seconds>]
[&Attributes.entry.15.key=Policy&Attributes.entry.15.value=<policy-JSON-string>]
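
As an illustrative sketch, the same attributes can also be passed through the SNS-compatible interface of the AWS CLI (the endpoint, topic name, and attribute values below are placeholders):

.. prompt:: bash $

aws --endpoint-url=http://rgw.example.com:8080 sns create-topic --name=mytopic \
    --attributes='{"push-endpoint": "amqp://broker.example.com:5672", "time_to_live": "300", "max_retries": "5", "retry_sleep_duration": "10"}'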
Request parameters:

View File

@ -11,16 +11,13 @@ multiple zones.
Tuning
======
When ``radosgw`` first tries to operate on a zone pool that does not
exist, it will create that pool with the default values from
``osd pool default pg num`` and ``osd pool default pgp num``. These defaults
are sufficient for some pools, but others (especially those listed in
``placement_pools`` for the bucket index and data) will require additional
tuning. We recommend using the `Ceph Placement Groups per Pool
Calculator <https://old.ceph.com/pgcalc/>`__ to calculate a suitable number of
placement groups for these pools. See
`Pools <http://docs.ceph.com/en/latest/rados/operations/pools/#pools>`__
for details on pool creation.
When ``radosgw`` first tries to operate on a zone pool that does not exist, it
will create that pool with the default values from ``osd pool default pg num``
and ``osd pool default pgp num``. These defaults are sufficient for some pools,
but others (especially those listed in ``placement_pools`` for the bucket index
and data) will require additional tuning. See `Pools
<http://docs.ceph.com/en/latest/rados/operations/pools/#pools>`__ for details
on pool creation.
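
For example, a minimal sketch of pre-creating the bucket index pool with an explicit placement-group count before ``radosgw`` creates it with the defaults (the pool name follows the default zone's naming convention and the ``pg_num`` value of 128 is only illustrative; size it for your cluster):

.. prompt:: bash #

# create the bucket index pool with an explicit pg_num instead of the defaults
ceph osd pool create default.rgw.buckets.index 128

# tag the pool so that Ceph knows it is used by RGW
ceph osd pool application enable default.rgw.buckets.index rgw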
.. _radosgw-pool-namespaces:

View File

@ -90,7 +90,8 @@ $ sudo ln -sf /usr/local/openresty/bin/openresty /usr/bin/nginx
Put in-place your Nginx configuration files and edit them according to your environment:
All Nginx conf files are under: https://github.com/ceph/ceph/tree/main/examples/rgw/rgw-cache
All Nginx conf files are under:
https://github.com/ceph/ceph/tree/main/examples/rgw/rgw-cache
`nginx.conf` should go to `/etc/nginx/nginx.conf`
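
For example, a sketch of putting the main configuration file in place (this assumes the example files were copied into a local ``rgw-cache`` directory and that your distribution uses the stock Nginx paths):

.. prompt:: bash $

sudo cp rgw-cache/nginx.conf /etc/nginx/nginx.conf
# then edit the endpoints, listen ports, and cache paths for your environment
sudo vi /etc/nginx/nginx.conf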

View File

@ -2,14 +2,20 @@
Role
======
A role is similar to a user and has permission policies attached to it, that determine what a role can or can not do. A role can be assumed by any identity that needs it. If a user assumes a role, a set of dynamically created temporary credentials are returned to the user. A role can be used to delegate access to users, applications, services that do not have permissions to access some s3 resources.
A role is similar to a user. It has permission policies attached to it that
determine what it can do and what it cannot do. A role can be assumed by any
identity that needs it. When a user assumes a role, a set of
dynamically-created temporary credentials are provided to the user. A role can
be used to delegate access to users, to applications, and to services that do
not have permissions to access certain S3 resources.
The following radosgw-admin commands can be used to create/ delete/ update a role and permissions associated with a role.
The following ``radosgw-admin`` commands can be used to create, delete, or
update a role and the permissions associated with it.
Create a Role
-------------
To create a role, execute the following::
To create a role, run a command of the following form::
radosgw-admin role create --role-name={role-name} [--path="{path to the role}"] [--assume-role-policy-doc={trust-policy-document}]
@ -23,12 +29,13 @@ Request Parameters
``path``
:Description: Path to the role. The default value is a slash(/).
:Description: Path to the role. The default value is a slash (``/``).
:Type: String
``assume-role-policy-doc``
:Description: The trust relationship policy document that grants an entity permission to assume the role.
:Description: The trust relationship policy document that grants an entity
permission to assume the role.
:Type: String
For example::
@ -51,7 +58,9 @@ For example::
Delete a Role
-------------
To delete a role, execute the following::
To delete a role, run a command of the following form:
.. prompt:: bash
radosgw-admin role delete --role-name={role-name}
@ -63,16 +72,21 @@ Request Parameters
:Description: Name of the role.
:Type: String
For example::
For example:
.. prompt:: bash
radosgw-admin role delete --role-name=S3Access1
Note: A role can be deleted only when it doesn't have any permission policy attached to it.
Note: A role can be deleted only when it has no permission policy attached to
it.
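
For example, a minimal sketch of detaching policies before deleting a role (the role and policy names are carried over from the examples in this document and are only illustrative):

.. prompt:: bash

# list any permission policies still attached to the role
radosgw-admin role-policy list --role-name=S3Access1

# delete each attached policy by name
radosgw-admin role-policy delete --role-name=S3Access1 --policy-name=Policy1

# the role can now be deleted
radosgw-admin role delete --role-name=S3Access1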
Get a Role
----------
To get information about a role, execute the following::
To get information about a role, run a command of the following form:
.. prompt:: bash
radosgw-admin role get --role-name={role-name}
@ -84,7 +98,9 @@ Request Parameters
:Description: Name of the role.
:Type: String
For example::
For example:
.. prompt:: bash
radosgw-admin role get --role-name=S3Access1
@ -104,7 +120,9 @@ For example::
List Roles
----------
To list roles with a specified path prefix, execute the following::
To list roles with a specified path prefix, run a command of the following form:
.. prompt:: bash
radosgw-admin role list [--path-prefix ={path prefix}]
@ -113,10 +131,13 @@ Request Parameters
``path-prefix``
:Description: Path prefix for filtering roles. If this is not specified, all roles are listed.
:Description: Path prefix for filtering roles. If this is not specified, all
roles are listed.
:Type: String
For example::
For example:
.. prompt:: bash
radosgw-admin role list --path-prefix="/application"
@ -134,7 +155,6 @@ For example::
}
]
Update Assume Role Policy Document of a role
--------------------------------------------
@ -334,6 +354,7 @@ Create a Role
-------------
Example::
POST "<hostname>?Action=CreateRole&RoleName=S3Access&Path=/application_abc/component_xyz/&AssumeRolePolicyDocument=\{\"Version\":\"2012-10-17\",\"Statement\":\[\{\"Effect\":\"Allow\",\"Principal\":\{\"AWS\":\[\"arn:aws:iam:::user/TESTER\"\]\},\"Action\":\[\"sts:AssumeRole\"\]\}\]\}"
.. code-block:: XML
@ -353,14 +374,18 @@ Delete a Role
-------------
Example::
POST "<hostname>?Action=DeleteRole&RoleName=S3Access"
Note: A role can be deleted only when it doesn't have any permission policy attached to it.
Note: A role can be deleted only when it doesn't have any permission policy
attached to it. If you intend to delete a role, you must first delete any
policies attached to it.
Get a Role
----------
Example::
POST "<hostname>?Action=GetRole&RoleName=S3Access"
.. code-block:: XML
@ -380,6 +405,7 @@ List Roles
----------
Example::
POST "<hostname>?Action=ListRoles&RoleName=S3Access&PathPrefix=/application"
.. code-block:: XML
@ -399,18 +425,21 @@ Update Assume Role Policy Document
----------------------------------
Example::
POST "<hostname>?Action=UpdateAssumeRolePolicy&RoleName=S3Access&PolicyDocument=\{\"Version\":\"2012-10-17\",\"Statement\":\[\{\"Effect\":\"Allow\",\"Principal\":\{\"AWS\":\[\"arn:aws:iam:::user/TESTER2\"\]\},\"Action\":\[\"sts:AssumeRole\"\]\}\]\}"
Add/ Update a Policy attached to a Role
---------------------------------------
Example::
POST "<hostname>?Action=PutRolePolicy&RoleName=S3Access&PolicyName=Policy1&PolicyDocument=\{\"Version\":\"2012-10-17\",\"Statement\":\[\{\"Effect\":\"Allow\",\"Action\":\[\"s3:CreateBucket\"\],\"Resource\":\"arn:aws:s3:::example_bucket\"\}\]\}"
List Permission Policy Names attached to a Role
-----------------------------------------------
Example::
POST "<hostname>?Action=ListRolePolicies&RoleName=S3Access"
.. code-block:: XML
@ -424,6 +453,7 @@ Get Permission Policy attached to a Role
----------------------------------------
Example::
POST "<hostname>?Action=GetRolePolicy&RoleName=S3Access&PolicyName=Policy1"
.. code-block:: XML
@ -439,6 +469,7 @@ Delete Policy attached to a Role
--------------------------------
Example::
POST "<hostname>?Action=DeleteRolePolicy&RoleName=S3Access&PolicyName=Policy1"
Tag a role
@ -447,6 +478,7 @@ A role can have multivalued tags attached to it. These tags can be passed in as
AWS does not support multi-valued role tags.
Example::
POST "<hostname>?Action=TagRole&RoleName=S3Access&Tags.member.1.Key=Department&Tags.member.1.Value=Engineering"
.. code-block:: XML
@ -463,6 +495,7 @@ List role tags
Lists the tags attached to a role.
Example::
POST "<hostname>?Action=ListRoleTags&RoleName=S3Access"
.. code-block:: XML
@ -486,6 +519,7 @@ Delete role tags
Delete a tag/ tags attached to a role.
Example::
POST "<hostname>?Action=UntagRoles&RoleName=S3Access&TagKeys.member.1=Department"
.. code-block:: XML
@ -500,6 +534,7 @@ Update Role
-----------
Example::
POST "<hostname>?Action=UpdateRole&RoleName=S3Access&MaxSessionDuration=43200"
.. code-block:: XML
@ -565,6 +600,3 @@ The following is sample code for adding tags to role, listing tags and untagging
'Department',
]
)

View File

@ -104,7 +104,7 @@ An example of a role permission policy that uses aws:PrincipalTag is as follows:
{
"Effect":"Allow",
"Action":["s3:*"],
"Resource":["arn:aws:s3::t1tenant:my-test-bucket","arn:aws:s3::t1tenant:my-test-bucket/*],"+
"Resource":["arn:aws:s3::t1tenant:my-test-bucket","arn:aws:s3::t1tenant:my-test-bucket/*"],
"Condition":{"StringEquals":{"aws:PrincipalTag/Department":"Engineering"}}
}]
}

View File

@ -32,9 +32,9 @@ the ``librbd`` library.
Ceph's block devices deliver high performance with vast scalability to
`kernel modules`_, or to :abbr:`KVMs (kernel virtual machines)` such as `QEMU`_, and
cloud-based computing systems like `OpenStack`_ and `CloudStack`_ that rely on
libvirt and QEMU to integrate with Ceph block devices. You can use the same cluster
to operate the :ref:`Ceph RADOS Gateway <object-gateway>`, the
cloud-based computing systems like `OpenStack`_, `OpenNebula`_ and `CloudStack`_
that rely on libvirt and QEMU to integrate with Ceph block devices. You can use
the same cluster to operate the :ref:`Ceph RADOS Gateway <object-gateway>`, the
:ref:`Ceph File System <ceph-file-system>`, and Ceph block devices simultaneously.
.. important:: To use Ceph Block Devices, you must have access to a running
@ -69,4 +69,5 @@ to operate the :ref:`Ceph RADOS Gateway <object-gateway>`, the
.. _kernel modules: ./rbd-ko/
.. _QEMU: ./qemu-rbd/
.. _OpenStack: ./rbd-openstack
.. _OpenNebula: https://docs.opennebula.io/stable/open_cluster_deployment/storage_setup/ceph_ds.html
.. _CloudStack: ./rbd-cloudstack

View File

@ -41,10 +41,11 @@ illustrates how ``libvirt`` and QEMU use Ceph block devices via ``librbd``.
The most common ``libvirt`` use case involves providing Ceph block devices to
cloud solutions like OpenStack or CloudStack. The cloud solution uses
cloud solutions like OpenStack, OpenNebula or CloudStack. The cloud solution uses
``libvirt`` to interact with QEMU/KVM, and QEMU/KVM interacts with Ceph block
devices via ``librbd``. See `Block Devices and OpenStack`_ and `Block Devices
and CloudStack`_ for details. See `Installation`_ for installation details.
devices via ``librbd``. See `Block Devices and OpenStack`_,
`Block Devices and OpenNebula`_ and `Block Devices and CloudStack`_ for details.
See `Installation`_ for installation details.
You can also use Ceph block devices with ``libvirt``, ``virsh`` and the
``libvirt`` API. See `libvirt Virtualization API`_ for details.
@ -309,6 +310,7 @@ within your VM.
.. _Installation: ../../install
.. _libvirt Virtualization API: http://www.libvirt.org
.. _Block Devices and OpenStack: ../rbd-openstack
.. _Block Devices and OpenNebula: https://docs.opennebula.io/stable/open_cluster_deployment/storage_setup/ceph_ds.html#datastore-internals
.. _Block Devices and CloudStack: ../rbd-cloudstack
.. _Create a pool: ../../rados/operations/pools#create-a-pool
.. _Create a Ceph User: ../../rados/operations/user-management#add-a-user

View File

@ -0,0 +1,70 @@
---------------------------------
NVMe/TCP Initiator for VMware ESX
---------------------------------
Prerequisites
=============
- A VMware ESXi host running VMware vSphere Hypervisor (ESXi) version 7.0U3 or later.
- Deployed Ceph NVMe-oF gateway.
- Ceph cluster with NVMe-oF configuration.
- Subsystem defined in the gateway.
Configuration
=============
The following instructions will use the default vSphere web client and esxcli.
1. Enable NVMe/TCP on a NIC:
.. prompt:: bash #
esxcli nvme fabric enable --protocol TCP --device vmnicN
Replace ``N`` with the number of the NIC.
2. Tag a VMKernel NIC to permit NVMe/TCP traffic:
.. prompt:: bash #
esxcli network ip interface tag add --interface-name vmkN --tagname NVMeTCP
Replace ``N`` with the ID of the VMkernel.
3. Configure the VMware ESXi host for NVMe/TCP:
#. List the NVMe-oF adapter:
.. prompt:: bash #
esxcli nvme adapter list
#. Discover NVMe-oF subsystems:
.. prompt:: bash #
esxcli nvme fabric discover -a NVME_TCP_ADAPTER -i GATEWAY_IP -p 4420
#. Connect to the NVMe-oF gateway subsystem:
.. prompt:: bash #
esxcli nvme connect -a NVME_TCP_ADAPTER -i GATEWAY_IP -p 4420 -s SUBSYSTEM_NQN
#. List the NVMe/TCP controllers:
.. prompt:: bash #
esxcli nvme controller list
#. List the NVMe-oF namespaces in the subsystem:
.. prompt:: bash #
esxcli nvme namespace list
4. Verify that the initiator has been set up correctly:
#. From the vSphere client go to the ESXi host.
#. On the Storage page go to the Devices tab.
#. Verify that the NVME/TCP disks are listed in the table.

View File

@ -0,0 +1,83 @@
==============================
NVMe/TCP Initiator for Linux
==============================
Prerequisites
=============
- Kernel 5.0 or later
- RHEL 9.2 or later
- Ubuntu 24.04 or later
- SLES 15 SP3 or later
Installation
============
1. Install the nvme-cli:
.. prompt:: bash #
yum install nvme-cli
2. Load the NVMe-oF module:
.. prompt:: bash #
modprobe nvme-fabrics
3. Verify the NVMe/TCP target is reachable:
.. prompt:: bash #
nvme discover -t tcp -a GATEWAY_IP -s 4420
4. Connect to the NVMe/TCP target:
.. prompt:: bash #
nvme connect -t tcp -a GATEWAY_IP -n SUBSYSTEM_NQN
Next steps
==========
Verify that the initiator is set up correctly:
1. List the NVMe block devices:
.. prompt:: bash #
nvme list
2. Create a filesystem on the desired device:
.. prompt:: bash #
mkfs.ext4 NVME_NODE_PATH
3. Mount the filesystem:
.. prompt:: bash #
mkdir /mnt/nvmeof
.. prompt:: bash #
mount NVME_NODE_PATH /mnt/nvmeof
4. List the contents of the NVMe-oF filesystem:
.. prompt:: bash #
ls /mnt/nvmeof
5. Create a text file in the ``/mnt/nvmeof`` directory:
.. prompt:: bash #
echo "Hello NVME-oF" > /mnt/nvmeof/hello.text
6. Verify that the file can be accessed:
.. prompt:: bash #
cat /mnt/nvmeof/hello.text

Some files were not shown because too many files have changed in this diff.