import new upstream nautilus stable release 14.2.8

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This commit is contained in:
Thomas Lamprecht 2020-03-03 15:27:37 +01:00
parent a0324939f9
commit 92f5a8d42d
11786 changed files with 1045748 additions and 467237 deletions

View File

@ -1,7 +1,7 @@
cmake_minimum_required(VERSION 3.5.1)
project(ceph CXX C ASM)
set(VERSION 14.2.6)
set(VERSION 14.2.8)
if(POLICY CMP0028)
cmake_policy(SET CMP0028 NEW)

View File

@ -1,118 +1,45 @@
14.2.4
14.2.8
------
* In the Zabbix Mgr Module there was a typo in the key being sent
to Zabbix for PGs in backfill_wait state. The key that was sent
was 'wait_backfill' and the correct name is 'backfill_wait'.
Update your Zabbix template accordingly so that it accepts the
new key being sent to Zabbix.
* The following OSD memory config options related to bluestore cache autotuning can now
be configured during runtime:
14.2.3
--------
- osd_memory_base (default: 768 MB)
- osd_memory_cache_min (default: 128 MB)
- osd_memory_expected_fragmentation (default: 0.15)
- osd_memory_target (default: 4 GB)
* Nautilus-based librbd clients can now open images on Jewel clusters.
The above options can be set with::
* The RGW "num_rados_handles" has been removed.
If you were using a value of "num_rados_handles" greater than 1,
multiply your current "objecter_inflight_ops" and
"objecter_inflight_op_bytes" parameters by the old
"num_rados_handles" to get the same throttle behavior.
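The conversion described above is simple arithmetic; a minimal sketch (the numbers below are hypothetical examples, not Ceph defaults):

```python
# Sketch of the num_rados_handles removal: the per-handle throttles are
# multiplied by the removed handle count so the aggregate throttle
# behavior stays the same. Example values are hypothetical.
def converted_throttles(inflight_ops, inflight_op_bytes, old_num_rados_handles):
    """Scale both objecter throttles by the old handle count."""
    return (inflight_ops * old_num_rados_handles,
            inflight_op_bytes * old_num_rados_handles)

# e.g. 1024 in-flight ops and 100 MiB in-flight bytes per handle,
# previously running with 4 handles:
ops, op_bytes = converted_throttles(1024, 100 * 1024 * 1024, 4)
print(ops, op_bytes)  # 4096 419430400
```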
* The ``bluestore_no_per_pool_stats_tolerance`` config option has been
replaced with ``bluestore_fsck_error_on_no_per_pool_stats``
(default: false). The overall default behavior has not changed:
fsck will warn but not fail on legacy stores, and repair will
convert to per-pool stats.
ceph config set global <option> <value>
14.2.2
------
* The MGR now accepts 'profile rbd' and 'profile rbd-read-only' user caps.
These caps can be used to provide users access to MGR-based RBD functionality
such as 'rbd perf image iostat' and 'rbd perf image iotop'.
* The no{up,down,in,out} related commands have been revamped.
There are now 2 ways to set the no{up,down,in,out} flags:
the old 'ceph osd [un]set <flag>' command, which sets cluster-wide flags;
and the new 'ceph osd [un]set-group <flags> <who>' command,
which sets flags in batch at the granularity of any crush node,
or device class.
* The configuration value ``osd_calc_pg_upmaps_max_stddev`` used for upmap
balancing has been removed. Instead use the mgr balancer config
``upmap_max_deviation`` which now is an integer number of PGs of deviation
from the target PGs per OSD. This can be set with a command like
``ceph config set mgr mgr/balancer/upmap_max_deviation 2``. The default
``upmap_max_deviation`` is 1. There are situations where crush rules
would not allow a pool to ever have completely balanced PGs. For example, if
crush requires 1 replica on each of 3 racks, but there are fewer OSDs in 1 of
the racks. In those cases, the configuration value can be increased.
* RGW: radosgw-admin introduces two subcommands that allow the
managing of expire-stale objects that might be left behind after a
bucket reshard in earlier versions of RGW. One subcommand lists such
objects and the other deletes them. Read the troubleshooting section
of the dynamic resharding docs for details.
* RGW: a mismatch between the bucket notification documentation and the actual
message format was fixed. This means that any endpoints receiving bucket
notifications will now receive the same notifications inside a JSON array
named 'Records'. Note that this does not affect pulling bucket notifications
from a subscription in a 'pubsub' zone, as these are already wrapped inside
that array.
14.2.5
------
* Ceph will now issue a health warning if a RADOS pool has a ``pg_num``
value that is not a power of two. This can be fixed by adjusting
the pool to a nearby power of two::
* The telemetry module now has a 'device' channel, enabled by default, that
will report anonymized hard disk and SSD health metrics to telemetry.ceph.com
in order to build and improve device failure prediction algorithms. Because
the content of telemetry reports has changed, you will need to either re-opt-in
with::
ceph osd pool set <pool-name> pg_num <new-pg-num>
ceph telemetry on
Alternatively, the warning can be silenced with::
You can view exactly what information will be reported first with::
ceph telemetry show
ceph telemetry show device # specifically show the device channel
If you are not comfortable sharing device metrics, you can disable that
channel first before re-opting-in::
ceph config set mgr mgr/telemetry/channel_device false
ceph telemetry on
* The telemetry module now reports more information about CephFS file systems,
including:
- how many MDS daemons (in total and per file system)
- which features are (or have been) enabled
- how many data pools
- approximate file system age (year + month of creation)
- how many files, bytes, and snapshots
- how much metadata is being cached
We have also added:
- which Ceph release the monitors are running
- whether msgr v1 or v2 addresses are used for the monitors
- whether IPv4 or IPv6 addresses are used for the monitors
- whether RADOS cache tiering is enabled (and which mode)
- whether pools are replicated or erasure coded, and
which erasure code profile plugin and parameters are in use
- how many hosts are in the cluster, and how many hosts have each type of daemon
- whether a separate OSD cluster network is being used
- how many RBD pools and images are in the cluster, and how many pools have RBD mirroring enabled
- how many RGW daemons, zones, and zonegroups are present; which RGW frontends are in use
- aggregate stats about the CRUSH map, like which algorithms are used, how big buckets are, how many rules are defined, and what tunables are in use
If you had telemetry enabled, you will need to re-opt-in with::
ceph telemetry on
You can view exactly what information will be reported first with::
ceph telemetry show # see everything
ceph telemetry show basic # basic cluster info (including all of the new info)
* A health warning is now generated if the average osd heartbeat ping
time exceeds a configurable threshold for any of the intervals
computed. The OSD computes 1 minute, 5 minute and 15 minute
intervals with average, minimum and maximum values. New configuration
option ``mon_warn_on_slow_ping_ratio`` specifies a percentage of
``osd_heartbeat_grace`` to determine the threshold. A value of zero
disables the warning. New configuration option
``mon_warn_on_slow_ping_time``, specified in milliseconds, overrides the
computed value and causes a warning when OSD heartbeat pings take longer
than the specified amount.
New admin command ``ceph daemon mgr.# dump_osd_network [threshold]`` will
list all connections with a ping time longer than the specified threshold or
the value determined by the config options, for the average of any of the 3 intervals.
New admin command ``ceph daemon osd.# dump_osd_network [threshold]`` will
do the same but only including heartbeats initiated by the specified OSD.
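The threshold logic described above can be sketched as follows. This is an illustration of the documented semantics, not the actual OSD code; the example grace and ratio values are assumptions:

```python
# Sketch of the slow-ping warning threshold described above:
# mon_warn_on_slow_ping_ratio derives a threshold from osd_heartbeat_grace,
# and a non-zero mon_warn_on_slow_ping_time overrides it entirely.
def slow_ping_threshold_ms(heartbeat_grace_s, slow_ping_ratio, slow_ping_time_ms):
    """Return the warning threshold in milliseconds."""
    if slow_ping_time_ms > 0:  # explicit override, in milliseconds
        return slow_ping_time_ms
    return heartbeat_grace_s * 1000 * slow_ping_ratio  # percentage of grace

# With a 20 s grace and a 5% ratio, the threshold works out to 1000 ms:
print(slow_ping_threshold_ms(20, 0.05, 0))    # 1000.0
# An explicit mon_warn_on_slow_ping_time wins:
print(slow_ping_threshold_ms(20, 0.05, 250))  # 250
```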
* New OSD daemon command ``dump_recovery_reservations`` which reveals the
recovery locks held (in_progress) and waiting in priority queues.
* New OSD daemon command ``dump_scrub_reservations`` which reveals the
scrub reservations that are held for local (primary) and remote (replica) PGs.
ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false
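Picking the "nearby power of two" for ``pg_num`` mentioned above can be sketched as follows (an illustrative helper, not part of Ceph; choosing the new value remains the operator's decision):

```python
# Sketch: find the power of two closest to a pool's current pg_num
# (ties round up), as a candidate value for `ceph osd pool set ... pg_num`.
def nearest_power_of_two(n):
    """Return the power of two closest to n; ties round upward."""
    lower = 1 << (n.bit_length() - 1)  # largest power of two <= n
    upper = lower << 1                 # smallest power of two > n
    return upper if (n - lower) >= (upper - n) else lower

for pg_num in (12, 100, 200):
    print(pg_num, "->", nearest_power_of_two(pg_num))
# 12 -> 16, 100 -> 128, 200 -> 256
```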

View File

@ -1,7 +1,7 @@
# Contributor: John Coyle <dx9err@gmail.com>
# Maintainer: John Coyle <dx9err@gmail.com>
pkgname=ceph
pkgver=14.2.6
pkgver=14.2.8
pkgrel=0
pkgdesc="Ceph is a distributed object store and file system"
pkgusers="ceph"
@ -64,7 +64,7 @@ makedepends="
xmlstarlet
yasm
"
source="ceph-14.2.6.tar.bz2"
source="ceph-14.2.8.tar.bz2"
subpackages="
$pkgname-base
$pkgname-common
@ -117,7 +117,7 @@ _sysconfdir=/etc
_udevrulesdir=/etc/udev/rules.d
_python_sitelib=/usr/lib/python2.7/site-packages
builddir=$srcdir/ceph-14.2.6
builddir=$srcdir/ceph-14.2.8
build() {
export CEPH_BUILD_VIRTUALENV=$builddir

View File

@ -101,7 +101,7 @@
# main package definition
#################################################################################
Name: ceph
Version: 14.2.6
Version: 14.2.8
Release: 0%{?dist}
%if 0%{?fedora} || 0%{?rhel}
Epoch: 2
@ -117,7 +117,7 @@ License: LGPL-2.1 and CC-BY-SA-3.0 and GPL-2.0 and BSL-1.0 and BSD-3-Clause and
Group: System/Filesystems
%endif
URL: http://ceph.com/
Source0: %{?_remote_tarball_prefix}ceph-14.2.6.tar.bz2
Source0: %{?_remote_tarball_prefix}ceph-14.2.8.tar.bz2
%if 0%{?suse_version}
# _insert_obs_source_lines_here
ExclusiveArch: x86_64 aarch64 ppc64le s390x
@ -149,12 +149,8 @@ BuildRequires: fuse-devel
%if 0%{?rhel} == 7
# devtoolset offers newer make and valgrind-devel, but the old ones are good
# enough.
%ifarch x86_64
BuildRequires: devtoolset-8-gcc-c++ >= 8.2.1
%else
BuildRequires: devtoolset-7-gcc-c++ >= 7.3.1-5.13
%endif
%else
BuildRequires: gcc-c++
%endif
BuildRequires: gdbm
@ -296,6 +292,7 @@ BuildRequires: python%{_python_buildid}-PyJWT
BuildRequires: python%{_python_buildid}-Routes
BuildRequires: python%{_python_buildid}-Werkzeug
BuildRequires: python%{_python_buildid}-numpy-devel
BuildRequires: rpm-build
BuildRequires: xmlsec1-devel
%endif
%endif
@ -1105,7 +1102,7 @@ This package provides Ceph's default alerts for Prometheus.
# common
#################################################################################
%prep
%autosetup -p1 -n ceph-14.2.6
%autosetup -p1 -n ceph-14.2.8
%build
# LTO can be enabled as soon as the following GCC bug is fixed:
@ -1554,6 +1551,7 @@ fi
%files mgr
%{_bindir}/ceph-mgr
%dir %{_datadir}/ceph/mgr
%{_datadir}/ceph/mgr/alerts
%{_datadir}/ceph/mgr/ansible
%{_datadir}/ceph/mgr/balancer
%{_datadir}/ceph/mgr/crash

View File

@ -149,12 +149,8 @@ BuildRequires: fuse-devel
%if 0%{?rhel} == 7
# devtoolset offers newer make and valgrind-devel, but the old ones are good
# enough.
%ifarch x86_64
BuildRequires: devtoolset-8-gcc-c++ >= 8.2.1
%else
BuildRequires: devtoolset-7-gcc-c++ >= 7.3.1-5.13
%endif
%else
BuildRequires: gcc-c++
%endif
BuildRequires: gdbm
@ -296,6 +292,7 @@ BuildRequires: python%{_python_buildid}-PyJWT
BuildRequires: python%{_python_buildid}-Routes
BuildRequires: python%{_python_buildid}-Werkzeug
BuildRequires: python%{_python_buildid}-numpy-devel
BuildRequires: rpm-build
BuildRequires: xmlsec1-devel
%endif
%endif
@ -1554,6 +1551,7 @@ fi
%files mgr
%{_bindir}/ceph-mgr
%dir %{_datadir}/ceph/mgr
%{_datadir}/ceph/mgr/alerts
%{_datadir}/ceph/mgr/ansible
%{_datadir}/ceph/mgr/balancer
%{_datadir}/ceph/mgr/crash

View File

@ -1,8 +1,20 @@
ceph (14.2.6-1xenial) xenial; urgency=medium
ceph (14.2.8-1xenial) xenial; urgency=medium
*
-- Jenkins Build Slave User <jenkins-build@slave-ubuntu02.front.sepia.ceph.com> Wed, 08 Jan 2020 18:48:19 +0000
-- Jenkins Build Slave User <jenkins-build@ceph-builders> Mon, 02 Mar 2020 18:02:26 +0000
ceph (14.2.8-1) stable; urgency=medium
* New upstream release
-- Ceph Release Team <ceph-maintainers@ceph.com> Mon, 02 Mar 2020 17:49:19 +0000
ceph (14.2.7-1) stable; urgency=medium
* New upstream release
-- Ceph Release Team <ceph-maintainers@ceph.com> Fri, 31 Jan 2020 17:07:50 +0000
ceph (14.2.6-1) stable; urgency=medium

View File

@ -137,14 +137,14 @@ function(do_build_boost version)
check_boost_version("${PROJECT_SOURCE_DIR}/src/boost" ${version})
set(source_dir
SOURCE_DIR "${PROJECT_SOURCE_DIR}/src/boost")
elseif(version VERSION_GREATER 1.67)
elseif(version VERSION_GREATER 1.72)
message(FATAL_ERROR "Unknown BOOST_REQUESTED_VERSION: ${version}")
else()
message(STATUS "boost will be downloaded...")
# NOTE: If you change this version number make sure the package is available
# at the three URLs below (may involve uploading to download.ceph.com)
set(boost_version 1.67.0)
set(boost_sha256 2684c972994ee57fc5632e03bf044746f6eb45d4920c343937a465fd67a5adba)
set(boost_version 1.72.0)
set(boost_sha256 59c9b274bc451cf91a9ba1dd2c7fdcaf5d60b1b3aa83f2c9fa143417cc660722)
string(REPLACE "." "_" boost_version_underscore ${boost_version} )
set(boost_url
https://dl.bintray.com/boostorg/release/${boost_version}/source/boost_${boost_version_underscore}.tar.bz2)
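The pinned ``boost_sha256`` above is compared against the downloaded tarball. The verification step amounts to something like this sketch (generic hashing with a made-up payload, not the actual CMake code):

```python
import hashlib

# Sketch of checksum pinning: downloaded bytes must hash to the
# pinned SHA-256 before the build uses the tarball.
def matches_pinned_sha256(data: bytes, pinned: str) -> bool:
    """Compare the SHA-256 of data against a pinned hex digest."""
    return hashlib.sha256(data).hexdigest() == pinned

payload = b"example tarball contents"          # stand-in for the tarball
pinned = hashlib.sha256(payload).hexdigest()   # stand-in for boost_sha256
print(matches_pinned_sha256(payload, pinned))      # True
print(matches_pinned_sha256(b"tampered", pinned))  # False
```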

File diff suppressed because it is too large

View File

@ -1,5 +1,6 @@
lib/systemd/system/ceph-mgr*
usr/bin/ceph-mgr
usr/share/ceph/mgr/alerts
usr/share/ceph/mgr/ansible
usr/share/ceph/mgr/balancer
usr/share/ceph/mgr/crash

View File

@ -13,7 +13,7 @@
{%- if edit_on_github_url %}
<div id="docubetter" align="right" style="display:none; padding: 15px; font-weight: bold;">
<a id="edit-on-github" href="{{ edit_on_github_url }}" rel="nofollow">{{ _('Edit on GitHub')}}</a> | <a href="https://github.com/ceph/ceph/projects/4">Report a Documentation Bug</a>
<a id="edit-on-github" href="{{ edit_on_github_url }}" rel="nofollow">{{ _('Edit on GitHub')}}</a> | <a href="https://pad.ceph.com/p/Report_Documentation_Bugs">Report a Documentation Bug</a>
</div>
{%- endif %}

View File

@ -15,8 +15,11 @@ follow a predictable and robust way of preparing, activating, and starting OSDs
There is currently support for ``lvm``, and plain disks (with GPT partitions)
that may have been deployed with ``ceph-disk``.
``zfs`` support is available for running a FreeBSD cluster.
* :ref:`ceph-volume-lvm`
* :ref:`ceph-volume-simple`
* :ref:`ceph-volume-zfs`
**Node inventory**
@ -76,3 +79,5 @@ and ``ceph-disk`` is fully disabled. Encryption is fully supported.
simple/activate
simple/scan
simple/systemd
zfs/index
zfs/inventory

View File

@ -26,6 +26,75 @@ the back end can be specified with:
* :ref:`--filestore <ceph-volume-lvm-prepare_filestore>`
* :ref:`--bluestore <ceph-volume-lvm-prepare_bluestore>`
.. _ceph-volume-lvm-prepare_bluestore:
``bluestore``
-------------
The :term:`bluestore` objectstore is the default for new OSDs. It offers a bit
more flexibility for devices compared to :term:`filestore`.
Bluestore supports the following configurations:
* A block device, a block.wal, and a block.db device
* A block device and a block.wal device
* A block device and a block.db device
* A single block device
The bluestore subcommand accepts physical block devices, partitions on
physical block devices or logical volumes as arguments for the various device
parameters. If a physical device is provided, a logical volume will be created.
A volume group will either be created or reused if its name begins with ``ceph``.
This allows a simpler approach to using LVM but at the cost of flexibility:
there are no options or configurations to change how the LV is created.
The ``block`` is specified with the ``--data`` flag, and in its simplest use
case it looks like::
ceph-volume lvm prepare --bluestore --data vg/lv
A raw device can be specified in the same way::
ceph-volume lvm prepare --bluestore --data /path/to/device
For enabling :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required::
ceph-volume lvm prepare --bluestore --dmcrypt --data vg/lv
If a ``block.db`` or a ``block.wal`` is needed (they are optional for
bluestore) they can be specified with ``--block.db`` and ``--block.wal``
accordingly. These can be a physical device, a partition or
a logical volume.
Partitions used for ``block.db`` and ``block.wal`` aren't made into logical
volumes because they can be used as-is.
While creating the OSD directory, the process will use a ``tmpfs`` mount to
place all the files needed for the OSD. These files are initially created by
``ceph-osd --mkfs`` and are fully ephemeral.
A symlink is always created for the ``block`` device, and optionally for
``block.db`` and ``block.wal``. For a cluster with a default name, and an OSD
id of 0, the directory could look like::
# ls -l /var/lib/ceph/osd/ceph-0
lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block -> /dev/ceph-be2b6fbd-bcf2-4c51-b35d-a35a162a02f0/osd-block-25cf0a05-2bc6-44ef-9137-79d65bd7ad62
lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.db -> /dev/sda1
lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.wal -> /dev/ceph/osd-wal-0
-rw-------. 1 ceph ceph 37 Oct 20 13:05 ceph_fsid
-rw-------. 1 ceph ceph 37 Oct 20 13:05 fsid
-rw-------. 1 ceph ceph 55 Oct 20 13:05 keyring
-rw-------. 1 ceph ceph 6 Oct 20 13:05 ready
-rw-------. 1 ceph ceph 10 Oct 20 13:05 type
-rw-------. 1 ceph ceph 2 Oct 20 13:05 whoami
In the above case, a device was used for ``block`` so ``ceph-volume`` created
a volume group and a logical volume using the following convention:
* volume group name: ``ceph-{cluster fsid}`` or if the vg exists already
``ceph-{random uuid}``
* logical volume name: ``osd-block-{osd_fsid}``
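The naming convention above can be sketched as a small helper (hypothetical function, not part of ceph-volume's API; the fsids reuse the example values from the directory listing above):

```python
import uuid

# Sketch of ceph-volume's VG/LV naming convention: the VG is named after
# the cluster fsid, unless a VG with that name already exists, in which
# case a random uuid is used instead; the LV is named after the OSD fsid.
def vg_lv_names(cluster_fsid, osd_fsid, vg_exists=False):
    """Return the (volume group, logical volume) names."""
    vg = f"ceph-{uuid.uuid4() if vg_exists else cluster_fsid}"
    lv = f"osd-block-{osd_fsid}"
    return vg, lv

vg, lv = vg_lv_names("be2b6fbd-bcf2-4c51-b35d-a35a162a02f0",
                     "25cf0a05-2bc6-44ef-9137-79d65bd7ad62")
print(vg)  # ceph-be2b6fbd-bcf2-4c51-b35d-a35a162a02f0
print(lv)  # osd-block-25cf0a05-2bc6-44ef-9137-79d65bd7ad62
```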
.. _ceph-volume-lvm-prepare_filestore:
``filestore``
@ -33,41 +102,47 @@ the back end can be specified with:
This is the OSD backend that allows preparation of logical volumes for
a :term:`filestore` objectstore OSD.
It can use a logical volume for the OSD data and a partitioned physical device
or logical volume for the journal. No special preparation is needed for these
volumes other than following the minimum size requirements for data and
journal.
It can use a logical volume for the OSD data and a physical device, a partition
or logical volume for the journal. A physical device will have a logical volume
created on it. A volume group will either be created or reused if its name begins
with ``ceph``. No special preparation is needed for these volumes other than
following the minimum size requirements for data and journal.
The API call looks like::
The CLI call for a basic standalone filestore OSD looks like this::
ceph-volume lvm prepare --filestore --data volume_group/lv_name --journal journal
ceph-volume lvm prepare --filestore --data <data block device>
To deploy filestore with an external journal::
ceph-volume lvm prepare --filestore --data <data block device> --journal <journal block device>
For enabling :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required::
ceph-volume lvm prepare --filestore --dmcrypt --data volume_group/lv_name --journal journal
ceph-volume lvm prepare --filestore --dmcrypt --data <data block device> --journal <journal block device>
There is flexibility to use a raw device or partition as well for ``--data``
that will be converted to a logical volume. This is not ideal in all situations
since ``ceph-volume`` is just going to create a unique volume group and
a logical volume from that device.
Both the journal and data block device can take three forms:
When using logical volumes for ``--data``, the value *must* be a volume group
name and a logical volume name separated by a ``/``. Since logical volume names
are not enforced for uniqueness, this prevents using the wrong volume. The
``--journal`` can be either a logical volume *or* a partition.
* a physical block device
* a partition on a physical block device
* a logical volume
When using a partition, it *must* contain a ``PARTUUID`` discoverable by
``blkid``, so that it can later be identified correctly regardless of the
device name (or path).
When using logical volumes the value *must* be of the format
``volume_group/logical_volume``. Since logical volume names
are not enforced for uniqueness, this prevents accidentally
choosing the wrong volume.
When using a partition, this is how it would look for ``/dev/sdc1``::
When using a partition, it *must* contain a ``PARTUUID`` that can be
discovered by ``blkid``. This ensures it can later be identified correctly
regardless of the device name (or path).
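Extracting that ``PARTUUID`` from ``blkid`` output might look like this sketch (the sample line and its values are illustrative, not real device output):

```python
import re

# Sketch: pull the PARTUUID tag out of a blkid output line so the
# partition can be found later regardless of its /dev name.
def partuuid_from_blkid_line(line):
    """Return the PARTUUID value from a blkid line, or None."""
    m = re.search(r'PARTUUID="([0-9a-fA-F-]+)"', line)
    return m.group(1) if m else None

sample = ('/dev/sdc1: TYPE="ceph journal" '
          'PARTUUID="586f2b07-7c33-4f42-8cdc-8108f1f7316b"')
print(partuuid_from_blkid_line(sample))
# 586f2b07-7c33-4f42-8cdc-8108f1f7316b
```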
For example: passing a logical volume for data and a partition ``/dev/sdc1`` for
the journal::
ceph-volume lvm prepare --filestore --data volume_group/lv_name --journal /dev/sdc1
For a logical volume, just like for ``--data``, a volume group and logical
volume name are required::
Passing a bare device for data and a logical volume as the journal::
ceph-volume lvm prepare --filestore --data volume_group/lv_name --journal volume_group/journal_lv
ceph-volume lvm prepare --filestore --data /dev/sdc --journal volume_group/journal_lv
A generated uuid is used to ask the cluster for a new OSD. These two pieces are
crucial for identifying an OSD and will later be used throughout the
@ -166,72 +241,6 @@ can be started later (for detailed metadata description see
:ref:`ceph-volume-lvm-tags`).
.. _ceph-volume-lvm-prepare_bluestore:
``bluestore``
-------------
The :term:`bluestore` objectstore is the default for new OSDs. It offers a bit
more flexibility for devices. Bluestore supports the following configurations:
* A block device, a block.wal, and a block.db device
* A block device and a block.wal device
* A block device and a block.db device
* A single block device
It can accept a whole device (or partition), or a logical volume for ``block``.
If a physical device is provided it will then be turned into a logical volume.
This allows a simpler approach at using LVM but at the cost of flexibility:
there are no options or configurations to change how the LV is created.
The ``block`` is specified with the ``--data`` flag, and in its simplest use
case it looks like::
ceph-volume lvm prepare --bluestore --data vg/lv
A raw device can be specified in the same way::
ceph-volume lvm prepare --bluestore --data /path/to/device
For enabling :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required::
ceph-volume lvm prepare --bluestore --dmcrypt --data vg/lv
If a ``block.db`` or a ``block.wal`` is needed (they are optional for
bluestore) they can be specified with ``--block.db`` and ``--block.wal``
accordingly. These can be a physical device (they **must** be a partition) or
a logical volume.
For both ``block.db`` and ``block.wal`` partitions aren't made logical volumes
because they can be used as-is. Logical Volumes are also allowed.
While creating the OSD directory, the process will use a ``tmpfs`` mount to
place all the files needed for the OSD. These files are initially created by
``ceph-osd --mkfs`` and are fully ephemeral.
A symlink is always created for the ``block`` device, and optionally for
``block.db`` and ``block.wal``. For a cluster with a default name, and an OSD
id of 0, the directory could look like::
# ls -l /var/lib/ceph/osd/ceph-0
lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block -> /dev/ceph-be2b6fbd-bcf2-4c51-b35d-a35a162a02f0/osd-block-25cf0a05-2bc6-44ef-9137-79d65bd7ad62
lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.db -> /dev/sda1
lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.wal -> /dev/ceph/osd-wal-0
-rw-------. 1 ceph ceph 37 Oct 20 13:05 ceph_fsid
-rw-------. 1 ceph ceph 37 Oct 20 13:05 fsid
-rw-------. 1 ceph ceph 55 Oct 20 13:05 keyring
-rw-------. 1 ceph ceph 6 Oct 20 13:05 ready
-rw-------. 1 ceph ceph 10 Oct 20 13:05 type
-rw-------. 1 ceph ceph 2 Oct 20 13:05 whoami
In the above case, a device was used for ``block`` so ``ceph-volume`` created
a volume group and a logical volume using the following convention:
* volume group name: ``ceph-{cluster fsid}`` or if the vg exists already
``ceph-{random uuid}``
* logical volume name: ``osd-block-{osd_fsid}``
Crush device class
------------------
@ -300,9 +309,8 @@ Summary
-------
To recap the ``prepare`` process for :term:`bluestore`:
#. Accept a logical volume for block or a raw device (that will get converted
to an lv)
#. Accept partitions or logical volumes for ``block.wal`` or ``block.db``
#. Accepts raw physical devices, partitions on physical devices or logical volumes as arguments.
#. Creates logical volumes on any raw physical devices.
#. Generate a UUID for the OSD
#. Ask the monitor for an OSD ID, reusing the generated UUID
#. OSD data directory is created on a tmpfs mount.
@ -314,7 +322,7 @@ To recap the ``prepare`` process for :term:`bluestore`:
And the ``prepare`` process for :term:`filestore`:
#. Accept only logical volumes for data and journal (both required)
#. Accepts raw physical devices, partitions on physical devices or logical volumes as arguments.
#. Generate a UUID for the OSD
#. Ask the monitor for an OSD ID, reusing the generated UUID
#. OSD data directory is created and data volume mounted

View File

@ -0,0 +1,31 @@
.. _ceph-volume-zfs:
``zfs``
=======
Implements the functionality needed to deploy OSDs from the ``zfs`` subcommand:
``ceph-volume zfs``
The current implementation only works for ZFS on FreeBSD.
**Command Line Subcommands**
* :ref:`ceph-volume-zfs-inventory`
.. not yet implemented
.. * :ref:`ceph-volume-zfs-prepare`
.. * :ref:`ceph-volume-zfs-activate`
.. * :ref:`ceph-volume-zfs-create`
.. * :ref:`ceph-volume-zfs-list`
.. * :ref:`ceph-volume-zfs-scan`
**Internal functionality**
There are other aspects of the ``zfs`` subcommand that are internal and not
exposed to the user; these sections explain how these pieces work together,
clarifying the workflows of the tool.
:ref:`zfs <ceph-volume-zfs-api>`

View File

@ -0,0 +1,19 @@
.. _ceph-volume-zfs-inventory:
``inventory``
=============
The ``inventory`` subcommand queries a host's disk inventory through GEOM and provides
hardware information and metadata on every physical device.
This only works on a FreeBSD platform.
By default the command returns a short, human-readable report of all physical disks.
For programmatic consumption of this report, pass ``--format json`` to generate a
JSON formatted report. This report includes extensive information on the
physical drives, such as disk metadata (like model and size), logical volumes
and whether they are used by Ceph, and whether the disk is usable by Ceph
and, if not, the reasons why.
A device path can be specified to report extensive information on a device in
both plain and JSON formats.

View File

@ -1,51 +1,106 @@
============================
Add/Remove Metadata Server
Deploying Metadata Servers
============================
You must deploy at least one metadata server daemon to use CephFS. Instructions are given here for setting up an MDS manually, but you might prefer to use another tool such as ceph-deploy or ceph-ansible.
Each CephFS file system requires at least one MDS. The cluster operator will
generally use their automated deployment tool to launch required MDS servers as
needed. Rook and ansible (via the ceph-ansible playbooks) are recommended
tools for doing this. For clarity, we also show the systemd commands here which
may be run by the deployment technology if executed on bare-metal.
See `MDS Config Reference`_ for details on configuring metadata servers.
Add a Metadata Server
=====================
Provisioning Hardware for an MDS
================================
#. Create an mds data directory ``/var/lib/ceph/mds/ceph-{$id}``.
The present version of the MDS is single-threaded and CPU-bound for most
activities, including responding to client requests. Even so, an MDS under the
most aggressive client loads still uses about 2 to 3 CPU cores. This is due to
the other miscellaneous upkeep threads working in tandem.
Nevertheless, it is recommended that an MDS server be well provisioned with an
advanced CPU with sufficient cores. Development is on-going to make better use
of available CPU cores in the MDS; it is expected in future versions of Ceph
that the MDS server will improve performance by taking advantage of more cores.
The other dimension to MDS performance is the available RAM for caching. The
MDS necessarily manages a distributed and cooperative metadata cache among all
clients and other active MDSs. Therefore it is essential to provide the MDS
with sufficient RAM to enable faster metadata access and mutation.
Generally, an MDS serving a large cluster of clients (1000 or more) will use at
least 64GB of cache (see also :doc:`/cephfs/cache-size-limits`). An MDS with a larger
cache is not well explored in the largest known community clusters; there may
be diminishing returns where management of such a large cache negatively
impacts performance in surprising ways. It would be best to do analysis with
expected workloads to determine if provisioning more RAM is worthwhile.
In a bare-metal cluster, the best practice is to over-provision hardware for
the MDS server. Even if a single MDS daemon is unable to fully utilize the
hardware, it may be desirable later on to start more active MDS daemons on the
same node to fully utilize the available cores and memory. Additionally, it may
become clear with workloads on the cluster that performance improves with
multiple active MDS on the same node rather than over-provisioning a single
MDS.
Finally, be aware that CephFS is a highly-available file system by supporting
standby MDS (see also :ref:`mds-standby`) for rapid failover. To get a real
benefit from deploying standbys, it is usually necessary to distribute MDS
daemons across at least two nodes in the cluster. Otherwise, a hardware failure
on a single node may result in the file system becoming unavailable.
Co-locating the MDS with other Ceph daemons (hyperconverged) is an effective
and recommended way to accomplish this so long as all daemons are configured to
use available hardware within certain limits. For the MDS, this generally
means limiting its cache size.
Adding an MDS
=============
#. Create an mds data directory ``/var/lib/ceph/mds/ceph-${id}``. The daemon only uses this directory to store its keyring.
#. Edit ``ceph.conf`` and add MDS section. ::
[mds.{$id}]
[mds.${id}]
host = {hostname}
#. Create the authentication key, if you use CephX. ::
$ sudo ceph auth get-or-create mds.{$id} mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-{$id}/keyring
$ sudo ceph auth get-or-create mds.${id} mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-${id}/keyring
#. Start the service. ::
$ sudo service ceph start mds.{$id}
$ sudo systemctl start mds.${id}
#. The status of the cluster shows: ::
#. The status of the cluster should show: ::
mds: cephfs_a-1/1/1 up {0=c=up:active}, 3 up:standby
mds: ${id}:1 {0=${id}=up:active} 2 up:standby
Remove a Metadata Server
========================
.. note:: Ensure that if you remove a metadata server, the remaining metadata
servers will be able to service requests from CephFS clients. If that is not
possible, consider adding a metadata server before destroying the metadata
server you would like to take offline.
Removing an MDS
===============
If you have a metadata server in your cluster that you'd like to remove, you may use
the following method.
#. Create a new Metadata Server as shown in the above section.
#. (Optionally:) Create a new replacement Metadata Server. If there are no
replacement MDS to take over once the MDS is removed, the file system will
become unavailable to clients. If that is not desirable, consider adding a
metadata server before tearing down the metadata server you would like to
take offline.
#. Stop the old Metadata Server and start using the new one. ::
#. Stop the MDS to be removed. ::
$ ceph mds fail <mds name>
$ sudo systemctl stop mds.${id}
#. Remove the ``/var/lib/ceph/mds/ceph-{$id}`` directory on the old Metadata server.
The MDS will automatically notify the Ceph monitors that it is going down.
This enables the monitors to perform instantaneous failover to an available
standby, if one exists. It is unnecessary to use administrative commands to
effect this failover, e.g. through the use of ``ceph mds fail mds.${id}``.
#. Remove the ``/var/lib/ceph/mds/ceph-${id}`` directory on the MDS. ::
$ sudo rm -rf /var/lib/ceph/mds/ceph-${id}
.. _MDS Config Reference: ../mds-config-ref

View File

@ -29,9 +29,9 @@ directory while creating a key for a client using the following syntax. ::
ceph fs authorize *filesystem_name* client.*client_name* /*specified_directory* rw
for example, to restrict client ``foo`` to writing only in the ``bar`` directory of filesystem ``cephfs``, use ::
For example, to restrict client ``foo`` to writing only in the ``bar`` directory of filesystem ``cephfs_a``, use ::
ceph fs authorize cephfs client.foo / r /bar rw
ceph fs authorize cephfs_a client.foo / r /bar rw
results in:
@ -44,7 +44,7 @@ for example, to restrict client ``foo`` to writing only in the ``bar`` directory
To completely restrict the client to the ``bar`` directory, omit the
root directory ::
ceph fs authorize cephfs client.foo /bar rw
ceph fs authorize cephfs_a client.foo /bar rw
Note that if a client's read access is restricted to a path, they will only
be able to mount the filesystem when specifying a readable path in the

View File

@ -8,11 +8,19 @@ Creating pools
A Ceph filesystem requires at least two RADOS pools, one for data and one for metadata.
When configuring these pools, you might consider:
- Using a higher replication level for the metadata pool, as any data
loss in this pool can render the whole filesystem inaccessible.
- Using lower-latency storage such as SSDs for the metadata pool, as this
will directly affect the observed latency of filesystem operations
on clients.
- Using a higher replication level for the metadata pool, as any data loss in
this pool can render the whole filesystem inaccessible.
- Using lower-latency storage such as SSDs for the metadata pool, as this will
directly affect the observed latency of filesystem operations on clients.
- The data pool used to create the file system is the "default" data pool and
the location for storing all inode backtrace information, used for hard link
management and disaster recovery. For this reason, all inodes created in
CephFS have at least one object in the default data pool. If erasure-coded
pools are planned for the file system, it is usually better to use a
replicated pool for the default data pool to improve small-object write and
read performance for updating backtraces. Separately, another erasure-coded
data pool can be added (see also :ref:`ecpool`) that can be used on an entire
hierarchy of directories and files (see also :ref:`file-layouts`).
Refer to :doc:`/rados/operations/pools` to learn more about managing pools. For
example, to create two pools with default settings for use with a filesystem, you
@ -23,6 +31,11 @@ might run the following commands:
$ ceph osd pool create cephfs_data <pg_num>
$ ceph osd pool create cephfs_metadata <pg_num>
Generally, the metadata pool will have at most a few gigabytes of data. For
this reason, a smaller PG count is usually recommended. 64 or 128 is commonly
used in practice for large clusters.
Creating a filesystem
=====================

View File

@ -1,3 +1,4 @@
.. _file-layouts:
File layouts
============

View File

@ -65,14 +65,14 @@ FS Subvolume groups
Create a subvolume group using::
$ ceph fs subvolumegroup create <vol_name> <group_name> [--mode <octal_mode> --pool_layout <data_pool_name>]
$ ceph fs subvolumegroup create <vol_name> <group_name> [--pool_layout <data_pool_name> --uid <uid> --gid <gid> --mode <octal_mode>]
The command succeeds even if the subvolume group already exists.
When creating a subvolume group you can specify its data pool layout (see
:doc:`/cephfs/file-layouts`), and file mode in octal numerals. By default, the
subvolume group is created with an octal file mode '755', and data pool layout
of its parent directory.
:doc:`/cephfs/file-layouts`), uid, gid, and file mode in octal numerals. By default, the
subvolume group is created with an octal file mode '755', uid '0', gid '0' and data pool
layout of its parent directory.
Remove a subvolume group using::
@ -108,17 +108,17 @@ FS Subvolumes
Create a subvolume using::
$ ceph fs subvolume create <vol_name> <subvol_name> [--group_name <subvol_group_name> --mode <octal_mode> --pool_layout <data_pool_name> --size <size_in_bytes>]
$ ceph fs subvolume create <vol_name> <subvol_name> [--size <size_in_bytes> --group_name <subvol_group_name> --pool_layout <data_pool_name> --uid <uid> --gid <gid> --mode <octal_mode>]
The command succeeds even if the subvolume already exists.
When creating a subvolume you can specify its subvolume group, data pool layout,
file mode in octal numerals, and size in bytes. The size of the subvolume is
uid, gid, file mode in octal numerals, and size in bytes. The size of the subvolume is
specified by setting a quota on it (see :doc:`/cephfs/quota`). By default a
subvolume is created within the default subvolume group, and with an octal file
mode '755', data pool layout of its parent directory and no size limit.
mode '755', uid of its subvolume group, gid of its subvolume group, data pool layout of
its parent directory and no size limit.
Remove a subvolume using::
@ -133,6 +133,14 @@ The removal of a subvolume fails if it has snapshots, or is non-existent.
Using the '--force' flag allows the command to succeed even if the subvolume is
non-existent.
Resize a subvolume using::
$ ceph fs subvolume resize <vol_name> <subvol_name> <new_size> [--group_name <subvol_group_name>] [--no_shrink]
The command resizes the subvolume quota using the size specified by 'new_size'.
The '--no_shrink' flag prevents the subvolume from shrinking below its current used size.
The subvolume can be resized to an infinite size by passing 'inf' or 'infinite' as the new_size.
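The resize semantics described above can be sketched in Python. This is an illustrative assumption, not the mgr volumes plugin's actual code; in particular, a quota of ``0`` is assumed here to mean "no size limit":

```python
# Sketch of subvolume resize semantics (illustrative, not the real plugin):
# 'inf'/'infinite' lift the quota entirely, and no_shrink refuses sizes
# below the subvolume's currently used size.

def resolve_new_size(new_size, used_bytes, no_shrink=False):
    """Return the quota in bytes to apply, where 0 means 'no limit'."""
    if isinstance(new_size, str) and new_size.lower() in ("inf", "infinite"):
        return 0  # assumed convention: zero quota == unlimited
    size = int(new_size)
    if no_shrink and size < used_bytes:
        raise ValueError("cannot shrink below used size with --no_shrink")
    return size
```

For example, resizing to ``'infinite'`` removes the quota, while shrinking below the used size only succeeds when ``--no_shrink`` is not given.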
Fetch the absolute path of a subvolume using::

View File

@ -50,8 +50,7 @@ least one :term:`Ceph Metadata Server` running.
.. toctree::
:maxdepth: 1
Add/Remove MDS(s) <add-remove-mds>
MDS states <mds-states>
Provision/Add/Remove MDS(s) <add-remove-mds>
MDS failover and standby configuration <standby>
MDS Configuration Settings <mds-config-ref>
Client Configuration Settings <client-config-ref>
@ -70,7 +69,7 @@ authentication keyring.
.. toctree::
:maxdepth: 1
Create CephFS <createfs>
Create a CephFS file system <createfs>
Mount CephFS <kernel>
Mount CephFS as FUSE <fuse>
Mount CephFS in fstab <fstab>

View File

@ -10,4 +10,5 @@ ceph-volume developer documentation
plugins
lvm
zfs
systemd

View File

@ -0,0 +1,176 @@
.. _ceph-volume-zfs-api:
ZFS
===
The backend of ``ceph-volume zfs`` is ZFS; it relies heavily on the usage of
tags, which are a way for ZFS to allow extending its volume metadata. These
values can later be queried against devices, which is how they get discovered.
Currently this interface is only usable when running on FreeBSD.
.. warning:: These APIs are not meant to be public, but are documented so that
it is clear what the tool is doing behind the scenes. Do not alter
any of these values.
.. _ceph-volume-zfs-tag-api:
Tag API
-------
The process of identifying filesystems, volumes and pools as part of Ceph relies
on applying tags on all volumes. It follows a naming convention for the
namespace that looks like::
ceph.<tag name>=<tag value>
All tags are prefixed by the ``ceph`` keyword to claim ownership of that
namespace and make it easily identifiable. This is how the OSD ID would be used
in the context of zfs tags::
ceph.osd_id=0
Tags on filesystems are stored as properties.
Tags on a zpool are stored in the comment property as a concatenated list
separated by ``;``.
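The concatenated zpool tag format can be illustrated with a small sketch. The helper name is hypothetical; ceph-volume's internal parsing may differ:

```python
# Illustrative parser for the zpool comment property described above:
# a ';'-separated list of ceph.<tag name>=<tag value> entries.

def parse_zpool_tags(comment):
    """Parse a zpool comment string into a dict of ceph tag name -> value."""
    tags = {}
    for entry in comment.split(";"):
        entry = entry.strip()
        if not entry.startswith("ceph."):
            continue  # skip anything in the comment that is not a ceph tag
        key, _, value = entry.partition("=")
        tags[key[len("ceph."):]] = value
    return tags
```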
.. _ceph-volume-zfs-tags:
Metadata
--------
The following describes all the metadata from Ceph OSDs that is stored on a
ZFS filesystem, volume, or pool:
``type``
--------
Describes whether the device is an OSD or Journal, with the ability to expand to
other types when supported.
Example::
ceph.type=osd
``cluster_fsid``
----------------
Example::
ceph.cluster_fsid=7146B649-AE00-4157-9F5D-1DBFF1D52C26
``data_device``
---------------
Example::
ceph.data_device=/dev/ceph/data-0
``data_uuid``
-------------
Example::
ceph.data_uuid=B76418EB-0024-401C-8955-AE6919D45CC3
``journal_device``
------------------
Example::
ceph.journal_device=/dev/ceph/journal-0
``journal_uuid``
----------------
Example::
ceph.journal_uuid=2070E121-C544-4F40-9571-0B7F35C6CB2B
``osd_fsid``
------------
Example::
ceph.osd_fsid=88ab9018-f84b-4d62-90b4-ce7c076728ff
``osd_id``
----------
Example::
ceph.osd_id=1
``block_device``
----------------
Just used on :term:`bluestore` backends. Captures the path to the logical
volume.
Example::
ceph.block_device=/dev/gpt/block-0
``block_uuid``
--------------
Just used on :term:`bluestore` backends. Captures either the logical volume UUID or
the partition UUID.
Example::
ceph.block_uuid=E5F041BB-AAD4-48A8-B3BF-31F7AFD7D73E
``db_device``
-------------
Just used on :term:`bluestore` backends. Captures the path to the logical
volume.
Example::
ceph.db_device=/dev/gpt/db-0
``db_uuid``
-----------
Just used on :term:`bluestore` backends. Captures either the logical volume UUID or
the partition UUID.
Example::
ceph.db_uuid=F9D02CF1-31AB-4910-90A3-6A6302375525
``wal_device``
--------------
Just used on :term:`bluestore` backends. Captures the path to the logical
volume.
Example::
ceph.wal_device=/dev/gpt/wal-0
``wal_uuid``
------------
Just used on :term:`bluestore` backends. Captures either the logical volume UUID or
the partition UUID.
Example::
ceph.wal_uuid=A58D1C68-0D6E-4CB3-8E99-B261AD47CC39
``compression``
---------------
Compression can always be enabled using the native ZFS settings on a volume
or filesystem, and can be activated during creation of the volume or
filesystem.
When activated by ``ceph-volume zfs`` this tag will be created.
Compression set manually AFTER running ``ceph-volume`` will go unnoticed, unless
this tag is also set manually.
Example for an enabled compression device::
ceph.vdo=1

View File

@ -50,6 +50,10 @@ Any options not recognized by ceph-fuse will be passed on to libfuse.
Connect to specified monitor (instead of looking through ceph.conf).
.. option:: -k <path-to-keyring>
Provide path to keyring; useful when it's absent in standard locations.
.. option:: --client_mountpoint/-r root_directory
Use root_directory as the mounted root, rather than the full Ceph tree.

View File

@ -13,6 +13,12 @@ Synopsis
| **osdmaptool** *mapfilename* [--print] [--createsimple *numosd*
[--pgbits *bitsperosd* ] ] [--clobber]
| **osdmaptool** *mapfilename* [--import-crush *crushmap*]
| **osdmaptool** *mapfilename* [--export-crush *crushmap*]
| **osdmaptool** *mapfilename* [--upmap *file*] [--upmap-max *max-optimizations*]
[--upmap-deviation *max-deviation*] [--upmap-pool *poolname*]
[--upmap-save *file*] [--upmap-save *newosdmap*] [--upmap-active]
| **osdmaptool** *mapfilename* [--upmap-cleanup] [--upmap-save *newosdmap*]
Description
@ -21,6 +27,8 @@ Description
**osdmaptool** is a utility that lets you create, view, and manipulate
OSD cluster maps from the Ceph distributed storage system. Notably, it
lets you extract the embedded CRUSH map or import a new CRUSH map.
It can also simulate the upmap balancer mode so you can get a sense of
what is needed to balance your PGs.
Options
@ -111,6 +119,10 @@ Options
mark osds up and in (but do not persist).
.. option:: --mark-out
mark an osd as out (but do not persist)
.. option:: --tree
Displays a hierarchical tree of the map.
@ -119,6 +131,43 @@ Options
clears pg_temp and primary_temp variables.
.. option:: --health
dump health checks
.. option:: --with-default-pool
include default pool when creating map
.. option:: --upmap-cleanup <file>
clean up pg_upmap[_items] entries, writing commands to <file> [default: - for stdout]
.. option:: --upmap <file>
calculate pg upmap entries to balance the pg layout, writing commands to <file> [default: - for stdout]
.. option:: --upmap-max <max-optimizations>
set max upmap entries to calculate [default: 10]
.. option:: --upmap-deviation <max-deviation>
max deviation from target [default: 5]
.. option:: --upmap-pool <poolname>
restrict upmap balancing to one pool; the option can be repeated for multiple pools
.. option:: --upmap-save
write modified OSDMap with upmap changes
.. option:: --upmap-active
Act like an active balancer, keep applying changes until balanced
Example
=======
@ -130,19 +179,19 @@ To view the result::
osdmaptool --print osdmap
To view the mappings of placement groups for pool 0::
To view the mappings of placement groups for pool 1::
osdmaptool --test-map-pgs-dump rbd --pool 0
osdmaptool osdmap --test-map-pgs-dump --pool 1
pool 0 pg_num 8
0.0 [0,2,1] 0
0.1 [2,0,1] 2
0.2 [0,1,2] 0
0.3 [2,0,1] 2
0.4 [0,2,1] 0
0.5 [0,2,1] 0
0.6 [0,1,2] 0
0.7 [1,0,2] 1
1.0 [0,2,1] 0
1.1 [2,0,1] 2
1.2 [0,1,2] 0
1.3 [2,0,1] 2
1.4 [0,2,1] 0
1.5 [0,2,1] 0
1.6 [0,1,2] 0
1.7 [1,0,2] 1
#osd count first primary c wt wt
osd.0 8 5 5 1 1
osd.1 8 1 1 1 1
@ -157,7 +206,7 @@ To view the mappings of placement groups for pool 0::
size 3 8
In which,
#. pool 0 has 8 placement groups. And two tables follow:
#. pool 1 has 8 placement groups. And two tables follow:
#. A table for placement groups. Each row presents a placement group. With columns of:
* placement group id,
@ -201,6 +250,56 @@ placement group distribution, whose standard deviation is 1.41421::
size 20
size 364
To simulate the active balancer in upmap mode::
osdmaptool --upmap upmaps.out --upmap-active --upmap-deviation 6 --upmap-max 11 osdmap
osdmaptool: osdmap file 'osdmap'
writing upmap command output to: upmaps.out
checking for upmap cleanups
upmap, max-count 11, max deviation 6
pools movies photos metadata data
prepared 11/11 changes
Time elapsed 0.00310404 secs
pools movies photos metadata data
prepared 11/11 changes
Time elapsed 0.00283402 secs
pools data metadata movies photos
prepared 11/11 changes
Time elapsed 0.003122 secs
pools photos metadata data movies
prepared 11/11 changes
Time elapsed 0.00324372 secs
pools movies metadata data photos
prepared 1/11 changes
Time elapsed 0.00222609 secs
pools data movies photos metadata
prepared 0/11 changes
Time elapsed 0.00209916 secs
Unable to find further optimization, or distribution is already perfect
osd.0 pgs 41
osd.1 pgs 42
osd.2 pgs 42
osd.3 pgs 41
osd.4 pgs 46
osd.5 pgs 39
osd.6 pgs 39
osd.7 pgs 43
osd.8 pgs 41
osd.9 pgs 46
osd.10 pgs 46
osd.11 pgs 46
osd.12 pgs 46
osd.13 pgs 41
osd.14 pgs 40
osd.15 pgs 40
osd.16 pgs 39
osd.17 pgs 46
osd.18 pgs 46
osd.19 pgs 39
osd.20 pgs 42
Total time elapsed 0.0167765 secs, 5 rounds
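A quick way to judge how balanced a per-OSD PG distribution is, in the spirit of the counts printed above, is to look at its spread. This is only an illustrative helper; osdmaptool computes its own statistics internally:

```python
import statistics

# Sketch: summarize per-OSD PG counts as (min, max, population std dev).
# A smaller spread and standard deviation means a more balanced cluster.

def pg_balance_summary(pg_counts):
    """Return (min, max, population standard deviation) of per-OSD PG counts."""
    return min(pg_counts), max(pg_counts), statistics.pstdev(pg_counts)
```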
Availability
============

ceph/doc/mgr/alerts.rst Normal file
View File

@ -0,0 +1,58 @@
Alerts module
=============
The alerts module can send simple alert messages about cluster health
via e-mail. In the future, it will support other notification methods
as well.
:note: This module is *not* intended to be a robust monitoring
solution. The fact that it is run as part of the Ceph cluster
itself is fundamentally limiting in that a failure of the
ceph-mgr daemon prevents alerts from being sent. This module
can, however, be useful for standalone clusters that have no
other monitoring infrastructure.
Enabling
--------
The *alerts* module is enabled with::
ceph mgr module enable alerts
Configuration
-------------
To configure SMTP, all of the following config options must be set::
ceph config set mgr mgr/alerts/smtp_host *<smtp-server>*
ceph config set mgr mgr/alerts/smtp_destination *<email-address-to-send-to>*
ceph config set mgr mgr/alerts/smtp_sender *<from-email-address>*
By default, the module will use SSL and port 465. To change that,::
ceph config set mgr mgr/alerts/smtp_ssl false # if not SSL
ceph config set mgr mgr/alerts/smtp_port *<port-number>* # if not 465
To authenticate to the SMTP server, you must set the user and password::
ceph config set mgr mgr/alerts/smtp_user *<username>*
ceph config set mgr mgr/alerts/smtp_password *<password>*
By default, the name in the ``From:`` line is simply ``Ceph``. To
change that (e.g., to identify which cluster this is),::
ceph config set mgr mgr/alerts/smtp_from_name 'Ceph Cluster Foo'
By default, the module will check the cluster health once per minute
and, if there is a change, send a message. To change that
frequency,::
ceph config set mgr mgr/alerts/interval *<interval>* # e.g., "5m" for 5 minutes
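Interval strings like ``5m`` above can be sketched as a tiny parser. This is an illustrative assumption about the accepted units, not the alerts module's actual code:

```python
# Hypothetical parser for interval strings such as '5m' (5 minutes).
# The unit table is an assumption for illustration.

_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_interval(text):
    """Convert '5m', '1h', or a bare number of seconds into seconds."""
    text = text.strip()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)  # bare number: already seconds
```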
Commands
--------
To force an alert to be sent immediately,::
ceph alerts send

View File

@ -425,6 +425,12 @@ The format of url is : `<protocol>:<IP-address>:<port>`
above, check your browser's documentation on how to unblock mixed content.
Alternatively, consider enabling SSL/TLS support in Grafana.
If you are using a self-signed certificate in your Grafana setup, you should
disable certificate verification in the dashboard to avoid refused connections,
e.g. those caused by certificates signed by an unknown CA or not matching the
host name::
$ ceph dashboard set-grafana-api-ssl-verify False
You can directly access Grafana Instance as well to monitor your cluster.
.. _dashboard-sso-support:

View File

@ -29,6 +29,7 @@ sensible.
Writing modules <modules>
Writing orchestrator plugins <orchestrator_modules>
Dashboard module <dashboard>
Alerts module <alerts>
DiskPrediction module <diskprediction>
Local pool module <localpool>
RESTful module <restful>

View File

@ -100,6 +100,22 @@ A `template <https://raw.githubusercontent.com/ceph/ceph/9c54334b615362e0a60442c
This template contains all items and a few triggers. You can customize the triggers afterwards to fit your needs.
Multiple Zabbix servers
^^^^^^^^^^^^^^^^^^^^^^^
It is possible to instruct the zabbix module to send data to multiple Zabbix servers.
The *zabbix_host* parameter can be set to multiple hostnames separated by commas.
Hostnames (or IP addresses) can be followed by a colon and a port number. If a port
number is not present, the module will use the port number defined in *zabbix_port*.
::
ceph zabbix config-set zabbix_host "zabbix1,zabbix2:2222,zabbix3:3333"
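Splitting such a value into per-server host/port pairs can be sketched as follows. The default port here is a stand-in for whatever *zabbix_port* is configured to; the actual module code may differ:

```python
# Illustrative parser for a comma-separated zabbix_host value, where each
# entry is 'host' or 'host:port'. DEFAULT_PORT stands in for zabbix_port.

DEFAULT_PORT = 10051  # assumption: the usual Zabbix trapper port

def parse_zabbix_hosts(zabbix_host, default_port=DEFAULT_PORT):
    """Split 'host[:port],host[:port],...' into a list of (host, port)."""
    servers = []
    for item in zabbix_host.split(","):
        host, sep, port = item.strip().partition(":")
        servers.append((host, int(port) if sep else default_port))
    return servers
```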
Manually sending data
---------------------
If needed, the module can be asked to send data immediately instead of waiting for

View File

@ -60,15 +60,6 @@ Ceph configuration file.
:Default: ``30``
``mon pg warn max per osd``
:Description: Issue a ``HEALTH_WARN`` in cluster log if the average number
of PGs per (in) OSD is above this number. (a non-positive number
disables this)
:Type: Integer
:Default: ``300``
``mon pg warn min objects``
:Description: Do not warn if the total number of objects in cluster is below
@ -207,7 +198,7 @@ Ceph configuration file.
value is the same as ``pg_num`` with ``mkpool``.
:Type: 32-bit Integer
:Default: ``8``
:Default: ``32``
``osd pool default pgp num``

View File

@ -1,3 +1,5 @@
.. _ecpool:
=============
Erasure code
=============

View File

@ -636,6 +636,23 @@ The PG count for existing pools can be increased or new pools can be created.
Please refer to :ref:`choosing-number-of-placement-groups` for more
information.
POOL_PG_NUM_NOT_POWER_OF_TWO
____________________________
One or more pools has a ``pg_num`` value that is not a power of two.
Although this is not strictly incorrect, it does lead to a less
balanced distribution of data because some PGs have roughly twice as
much data as others.
This is easily corrected by setting the ``pg_num`` value for the
affected pool(s) to a nearby power of two::
ceph osd pool set <pool-name> pg_num <value>
This health warning can be disabled with::
ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false
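Choosing the "nearby power of two" mentioned above can be sketched in a few lines; the helper name is illustrative:

```python
# Sketch: pick the power of two closest to an existing pg_num, as the
# health warning above suggests. Ties round up to the larger power.

def nearest_power_of_two(pg_num):
    """Return the power of two closest to pg_num (ties round up)."""
    if pg_num < 1:
        raise ValueError("pg_num must be positive")
    lower = 1 << (pg_num.bit_length() - 1)  # largest power of two <= pg_num
    upper = lower << 1                      # smallest power of two > pg_num
    return lower if (pg_num - lower) < (upper - pg_num) else upper
```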
POOL_TOO_FEW_PGS
________________

View File

@ -345,7 +345,7 @@ You may set values for the following keys:
``crush_rule``
:Description: The rule to use for mapping object placement in the cluster.
:Type: Integer
:Type: String
.. _allow_ec_overwrites:

View File

@ -23,14 +23,12 @@ use with::
ceph features
A word of caution
Balancer module
-----------------
This is a new feature and not very user friendly. At the time of this
writing we are working on a new `balancer` module for ceph-mgr that
will eventually do all of this automatically.
The new `balancer` module for ceph-mgr will automatically balance
the number of PGs per OSD. See ``Balancer``.
Until then,
Offline optimization
--------------------
@ -43,7 +41,9 @@ Upmap entries are updated with an offline optimizer built into ``osdmaptool``.
#. Run the optimizer::
osdmaptool om --upmap out.txt [--upmap-pool <pool>] [--upmap-max <max-count>] [--upmap-deviation <max-deviation>]
osdmaptool om --upmap out.txt [--upmap-pool <pool>]
[--upmap-max <max-optimizations>] [--upmap-deviation <max-deviation>]
[--upmap-active]
It is highly recommended that optimization be done for each pool
individually, or for sets of similarly-utilized pools. You can
@ -52,24 +52,34 @@ Upmap entries are updated with an offline optimizer built into ``osdmaptool``.
kind of data (e.g., RBD image pools, yes; RGW index pool and RGW
data pool, no).
The ``max-count`` value is the maximum number of upmap entries to
identify in the run. The default is 100, but you may want to make
this a smaller number so that the tool completes more quickly (but
does less work). If it cannot find any additional changes to make
it will stop early (i.e., when the pool distribution is perfect).
The ``max-optimizations`` value is the maximum number of upmap entries to
identify in the run. The default is `10` like the ceph-mgr balancer module,
but you should use a larger number if you are doing offline optimization.
If it cannot find any additional changes to make it will stop early
(i.e., when the pool distribution is perfect).
The ``max-deviation`` value defaults to `.01` (i.e., 1%). If an OSD
utilization varies from the average by less than this amount it
will be considered perfect.
The ``max-deviation`` value defaults to `5`. If an OSD PG count
varies from the computed target number by less than or equal
to this amount it will be considered perfect.
#. The proposed changes are written to the output file ``out.txt`` in
the example above. These are normal ceph CLI commands that can be
run to apply the changes to the cluster. This can be done with::
The ``--upmap-active`` option simulates the behavior of the active
balancer in upmap mode. It keeps cycling until the OSDs are balanced
and reports how many rounds and how long each round is taking. The
elapsed time for rounds indicates the CPU load ceph-mgr will be
consuming when it tries to compute the next optimization plan.
#. Apply the changes::
source out.txt
The proposed changes are written to the output file ``out.txt`` in
the example above. These are normal ceph CLI commands that can be
run to apply the changes to the cluster.
The above steps can be repeated as many times as necessary to achieve
a perfect distribution of PGs for each set of pools.
You can see some (gory) details about what the tool is doing by
passing ``--debug-osd 10`` to ``osdmaptool``.
passing ``--debug-osd 10`` and even more with ``--debug-crush 10``
to ``osdmaptool``.

View File

@ -144,6 +144,35 @@ Capability syntax follows the form::
the use of this capability is restricted to clients connecting from
this network.
- **Manager Caps:** Manager (``ceph-mgr``) capabilities include
``r``, ``w``, ``x`` access settings or ``profile {name}``. For example: ::
mgr 'allow {access-spec} [network {network/prefix}]'
mgr 'profile {name} [{key1} {match-type} {value1} ...] [network {network/prefix}]'
Manager capabilities can also be specified for specific commands,
all commands exported by a built-in manager service, or all commands
exported by a specific add-on module. For example: ::
mgr 'allow command "{command-prefix}" [with {key1} {match-type} {value1} ...] [network {network/prefix}]'
mgr 'allow service {service-name} {access-spec} [network {network/prefix}]'
mgr 'allow module {module-name} [with {key1} {match-type} {value1} ...] {access-spec} [network {network/prefix}]'
The ``{access-spec}`` syntax is as follows: ::
* | all | [r][w][x]
The ``{service-name}`` is one of the following: ::
mgr | osd | pg | py
The ``{match-type}`` is one of the following: ::
= | prefix | regex
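The three ``{match-type}`` comparisons listed above behave as this sketch suggests; the real capability parser lives inside the Ceph daemons and is not this function:

```python
import re

# Illustrative semantics of the '=', 'prefix', and 'regex' match types
# used when constraining manager capability arguments.

def value_matches(match_type, pattern, value):
    """Check whether value satisfies pattern under the given match type."""
    if match_type == "=":
        return value == pattern           # exact equality
    if match_type == "prefix":
        return value.startswith(pattern)  # leading-substring match
    if match_type == "regex":
        return re.match(pattern, value) is not None  # anchored at start
    raise ValueError("unknown match-type: %s" % match_type)
```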
- **Metadata Server Caps:** For administrators, use ``allow *``. For all
other users, such as CephFS clients, consult :doc:`/cephfs/client-auth`
@ -240,12 +269,15 @@ The following entries describe valid capability profiles:
so they have permissions to add keys, etc. when bootstrapping
an ``rbd-mirror`` daemon.
``profile rbd`` (Monitor and OSD)
``profile rbd`` (Manager, Monitor, and OSD)
:Description: Gives a user permissions to manipulate RBD images. When used
as a Monitor cap, it provides the minimal privileges required
by an RBD client application. When used as an OSD cap, it
provides read-write access to an RBD client application.
by an RBD client application; this includes the ability
to blacklist other client users. When used as an OSD cap, it
provides read-write access to the specified pool to an
RBD client application. The Manager cap supports optional
``pool`` and ``namespace`` keyword arguments.
``profile rbd-mirror`` (Monitor only)
@ -253,9 +285,11 @@ The following entries describe valid capability profiles:
RBD mirroring config-key secrets. It provides the minimal
privileges required for the ``rbd-mirror`` daemon.
``profile rbd-read-only`` (OSD only)
``profile rbd-read-only`` (Manager and OSD)
:Description: Gives a user read-only permissions to RBD images.
:Description: Gives a user read-only permissions to RBD images. The Manager
cap supports optional ``pool`` and ``namespace`` keyword
arguments.
Pool

View File

@ -439,7 +439,18 @@ information stored in OSDs.::
--cap mon 'allow *'
ceph-authtool /path/to/admin.keyring -n client.admin \
--cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'
ceph-monstore-tool $ms rebuild -- --keyring /path/to/admin.keyring
# add one or more ceph-mgr's key to the keyring. in this case, an encoded key
# for mgr.x is added, you can find the encoded key in
# /etc/ceph/${cluster}.${mgr_name}.keyring on the machine where ceph-mgr is
# deployed
ceph-authtool /path/to/admin.keyring --add-key 'AQDN8kBe9PLWARAAZwxXMr+n85SBYbSlLcZnMA==' -n mgr.x \
--cap mon 'allow profile mgr' --cap osd 'allow *' --cap mds 'allow *'
# if your monitors' ids are not single characters like 'a', 'b', 'c', please
# specify them in the command line by passing them as arguments of the "--mon-ids"
# option. if you are not sure, please check your ceph.conf to see if there is any
# sections named like '[mon.foo]'. don't pass the "--mon-ids" option, if you are
# using DNS SRV for looking up monitors.
ceph-monstore-tool $ms rebuild -- --keyring /path/to/admin.keyring --mon-ids alpha beta gamma
# make a backup of the corrupted store.db just in case! repeat for
# all monitors.

View File

@ -258,7 +258,7 @@ pushed or pulled using the pubsub sync module.
"eTag":"",
"versionId":"",
"sequencer": "",
"metadata":""
"metadata":[]
}
},
"eventId":"",
@ -283,7 +283,7 @@ pushed or pulled using the pubsub sync module.
- s3.object.version: object version in case of versioned bucket
- s3.object.sequencer: monotonically increasing identifier of the change per object (hexadecimal format)
- s3.object.metadata: any metadata set on the object sent as: ``x-amz-meta-`` (an extension to the S3 notification API)
- s3.eventId: not supported (an extension to the S3 notification API)
- s3.eventId: unique ID of the event, that could be used for acking (an extension to the S3 notification API)
.. _PubSub Module : ../pubsub-module
.. _S3 Notification Compatibility: ../s3-notification-compatibility

View File

@ -438,7 +438,7 @@ the events will have an S3-compatible record format (JSON):
"eTag":"",
"versionId":"",
"sequencer":"",
"metadata":""
"metadata":[]
}
},
"eventId":"",
@ -452,7 +452,6 @@ the events will have an S3-compatible record format (JSON):
- requestParameters: not supported
- responseElements: not supported
- s3.configurationId: notification ID that created the subscription for the event
- s3.eventId: unique ID of the event, that could be used for acking (an extension to the S3 notification API)
- s3.bucket.name: name of the bucket
- s3.bucket.ownerIdentity.principalId: owner of the bucket
- s3.bucket.arn: ARN of the bucket

View File

@ -35,13 +35,13 @@ recommended that you utilize a more restricted user wherever possible.
To `create a Ceph user`_, with ``ceph`` specify the ``auth get-or-create``
command, user name, monitor caps, and OSD caps::
ceph auth get-or-create client.{ID} mon 'profile rbd' osd 'profile {profile name} [pool={pool-name}][, profile ...]'
ceph auth get-or-create client.{ID} mon 'profile rbd' osd 'profile {profile name} [pool={pool-name}][, profile ...]' mgr 'profile rbd [pool={pool-name}]'
For example, to create a user ID named ``qemu`` with read-write access to the
pool ``vms`` and read-only access to the pool ``images``, execute the
following::
ceph auth get-or-create client.qemu mon 'profile rbd' osd 'profile rbd pool=vms, profile rbd-read-only pool=images'
ceph auth get-or-create client.qemu mon 'profile rbd' osd 'profile rbd pool=vms, profile rbd-read-only pool=images' mgr 'profile rbd pool=images'
The output from the ``ceph auth get-or-create`` command will be the keyring for
the specified user, which can be written to ``/etc/ceph/ceph.client.{ID}.keyring``.

View File

@ -4,6 +4,17 @@
See `Block Device`_ for additional details.
Generic IO Settings
===================
``rbd compression hint``
:Description: Hint to send to the OSDs on write operations. If set to `compressible` and the OSD `bluestore compression mode` setting is `passive`, the OSD will attempt to compress the data. If set to `incompressible` and the OSD compression setting is `aggressive`, the OSD will not attempt to compress the data.
:Type: Enum
:Required: No
:Default: ``none``
:Values: ``none``, ``compressible``, ``incompressible``
Cache Settings
=======================

View File

@ -132,9 +132,9 @@ Setup Ceph Client Authentication
If you have `cephx authentication`_ enabled, create a new user for Nova/Cinder
and Glance. Execute the following::
ceph auth get-or-create client.glance mon 'profile rbd' osd 'profile rbd pool=images'
ceph auth get-or-create client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images'
ceph auth get-or-create client.cinder-backup mon 'profile rbd' osd 'profile rbd pool=backups'
ceph auth get-or-create client.glance mon 'profile rbd' osd 'profile rbd pool=images' mgr 'profile rbd pool=images'
ceph auth get-or-create client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images' mgr 'profile rbd pool=volumes, profile rbd pool=vms'
ceph auth get-or-create client.cinder-backup mon 'profile rbd' osd 'profile rbd pool=backups' mgr 'profile rbd pool=backups'
Add the keyrings for ``client.cinder``, ``client.glance``, and
``client.cinder-backup`` to the appropriate nodes and change their ownership::

View File

@ -148,25 +148,33 @@ function install_pkg_on_ubuntu {
function install_boost_on_ubuntu {
local codename=$1
if dpkg -s ceph-libboost1.67-dev &> /dev/null; then
$SUDO env DEBIAN_FRONTEND=noninteractive apt-get -y remove 'ceph-libboost.*1.67.*'
$SUDO rm /etc/apt/sources.list.d/ceph-libboost1.67.list
fi
local project=libboost
local ver=1.72
local sha1=1d7c7a00cc3f37e340bae0360191a757b44ec80c
install_pkg_on_ubuntu \
ceph-libboost1.67 \
dd38c27740c1f9a9e6719a07eef84a1369dc168b \
$project \
$sha1 \
$codename \
ceph-libboost-atomic1.67-dev \
ceph-libboost-chrono1.67-dev \
ceph-libboost-container1.67-dev \
ceph-libboost-context1.67-dev \
ceph-libboost-coroutine1.67-dev \
ceph-libboost-date-time1.67-dev \
ceph-libboost-filesystem1.67-dev \
ceph-libboost-iostreams1.67-dev \
ceph-libboost-program-options1.67-dev \
ceph-libboost-python1.67-dev \
ceph-libboost-random1.67-dev \
ceph-libboost-regex1.67-dev \
ceph-libboost-system1.67-dev \
ceph-libboost-thread1.67-dev \
ceph-libboost-timer1.67-dev
ceph-libboost-atomic$ver-dev \
ceph-libboost-chrono$ver-dev \
ceph-libboost-container$ver-dev \
ceph-libboost-context$ver-dev \
ceph-libboost-coroutine$ver-dev \
ceph-libboost-date-time$ver-dev \
ceph-libboost-filesystem$ver-dev \
ceph-libboost-iostreams$ver-dev \
ceph-libboost-program-options$ver-dev \
ceph-libboost-python$ver-dev \
ceph-libboost-random$ver-dev \
ceph-libboost-regex$ver-dev \
ceph-libboost-system$ver-dev \
ceph-libboost-test$ver-dev \
ceph-libboost-thread$ver-dev \
ceph-libboost-timer$ver-dev
}
function version_lt {
@ -350,7 +358,7 @@ else
$SUDO yum -y install centos-release-scl-rh
$SUDO yum-config-manager --disable centos-sclo-rh
$SUDO yum-config-manager --enable centos-sclo-rh-testing
dts_ver=7
dts_ver=8
;;
esac
elif test $ID = rhel -a $MAJOR_VERSION = 7 ; then
@ -375,7 +383,11 @@ else
opensuse*|suse|sles)
echo "Using zypper to install dependencies"
zypp_install="zypper --gpg-auto-import-keys --non-interactive install --no-recommends"
$SUDO $zypp_install systemd-rpm-macros
$SUDO $zypp_install systemd-rpm-macros rpm-build || exit 1
if [ -e /usr/bin/python2 ] ; then
# see https://tracker.ceph.com/issues/23981
$SUDO $zypp_install python2-virtualenv python2-devel || exit 1
fi
munge_ceph_spec_in $for_make_check $DIR/ceph.spec
$SUDO $zypp_install $(rpmspec -q --buildrequires $DIR/ceph.spec) || exit 1
$SUDO $zypp_install libxmlsec1-1 libxmlsec1-nss1 libxmlsec1-openssl1 xmlsec1-devel xmlsec1-openssl-devel

View File

@ -61,25 +61,11 @@ download_boost() {
rm -rf src/boost
}
_python_autoselect() {
python_command=
for interpreter in python2.7 python3 ; do
type $interpreter > /dev/null 2>&1 || continue
python_command=$interpreter
break
done
if [ -z "$python_command" ] ; then
echo "Could not find a suitable python interpreter! Bailing out."
exit 1
fi
echo $python_command
}
build_dashboard_frontend() {
CURR_DIR=`pwd`
TEMP_DIR=`mktemp -d`
$CURR_DIR/src/tools/setup-virtualenv.sh --python=$(_python_autoselect) $TEMP_DIR
$CURR_DIR/src/tools/setup-virtualenv.sh $TEMP_DIR
$TEMP_DIR/bin/pip install nodeenv
$TEMP_DIR/bin/nodeenv -p --node=10.13.0
cd src/pybind/mgr/dashboard/frontend
@ -152,8 +138,8 @@ ln -s . $outfile
tar cvf $outfile.version.tar $outfile/src/.git_version $outfile/ceph.spec $outfile/alpine/APKBUILD
# NOTE: If you change this version number make sure the package is available
# at the three URLs referenced below (may involve uploading to download.ceph.com)
boost_version=1.67.0
download_boost $boost_version 2684c972994ee57fc5632e03bf044746f6eb45d4920c343937a465fd67a5adba \
boost_version=1.72.0
download_boost $boost_version 59c9b274bc451cf91a9ba1dd2c7fdcaf5d60b1b3aa83f2c9fa143417cc660722 \
https://dl.bintray.com/boostorg/release/$boost_version/source \
https://downloads.sourceforge.net/project/boost/boost/$boost_version \
https://download.ceph.com/qa

View File

@ -116,7 +116,7 @@
}
],
"thresholds": "1,2",
"timeFrom": "1m",
"timeFrom": null,
"title": "Health Status",
"transparent": false,
"type": "singlestat",
@ -402,49 +402,49 @@
"steppedLine": false,
"targets": [
{
"expr": "ceph_pg_total",
"expr": "sum(ceph_pg_total)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Total",
"refId": "A"
},
{
"expr": "ceph_pg_active",
"expr": "sum(ceph_pg_active)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Active",
"refId": "B"
},
{
"expr": "ceph_pg_total - ceph_pg_active",
"expr": "sum(ceph_pg_total - ceph_pg_active)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Inactive",
"refId": "G"
},
{
"expr": "ceph_pg_undersized",
"expr": "sum(ceph_pg_undersized)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Undersized",
"refId": "F"
},
{
"expr": "ceph_pg_degraded",
"expr": "sum(ceph_pg_degraded)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Degraded",
"refId": "C"
},
{
"expr": "ceph_pg_inconsistent",
"expr": "sum(ceph_pg_inconsistent)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Inconsistent",
"refId": "D"
},
{
"expr": "ceph_pg_down",
"expr": "sum(ceph_pg_down)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Down",

View File

@ -38,17 +38,7 @@
"graphTooltip": 0,
"id": null,
"iteration": 1557386759572,
"links": [
{
"asDropdown": true,
"icon": "external link",
"tags": [
"overview"
],
"title": "Shortcuts",
"type": "dashboards"
}
],
"links": [],
"panels": [
{
"gridPos": {
@ -527,7 +517,7 @@
}
],
"thresholds": [],
"timeFrom": "15m",
"timeFrom": null,
"timeShift": null,
"title": "Network drop rate",
"tooltip": {
@ -711,7 +701,7 @@
}
],
"thresholds": [],
"timeFrom": "15m",
"timeFrom": null,
"timeShift": null,
"title": "Network error rate",
"tooltip": {
@ -1221,5 +1211,5 @@
"timezone": "browser",
"title": "Host Details",
"uid": "rtOg0AiWz",
"version": 3
"version": 4
}

View File

@ -503,7 +503,7 @@
"step": 240
}
],
"timeFrom": "2m",
"timeFrom": null,
"timeShift": null,
"title": "OSD Objectstore Types",
"type": "grafana-piechart-panel",
@ -620,7 +620,7 @@
"step": 2
}
],
"timeFrom": "2m",
"timeFrom": null,
"timeShift": null,
"title": "OSD Size Summary",
"type": "grafana-piechart-panel",
@ -781,7 +781,7 @@
}
],
"thresholds": [],
"timeFrom": "36h",
"timeFrom": null,
"timeShift": null,
"title": "Read/Write Profile",
"tooltip": {

View File

@ -8,7 +8,10 @@ groups:
severity: critical
type: ceph_default
annotations:
description: Ceph in health_error state for more than 5m
description: >
Ceph in HEALTH_ERROR state for more than 5 minutes.
Please check "ceph health detail" for more information.
- alert: health warn
expr: ceph_health_status == 1
for: 15m
@ -16,7 +19,10 @@ groups:
severity: warning
type: ceph_default
annotations:
description: Ceph in health_warn for more than 15m.
description: >
Ceph has been in HEALTH_WARN for more than 15 minutes.
Please check "ceph health detail" for more information.
- name: mon
rules:
- alert: low monitor quorum count
@ -25,16 +31,32 @@ groups:
severity: critical
type: ceph_default
annotations:
description: Monitor count in quorum is low.
description: |
Monitor count in quorum is below three.
Only {{ $value }} of {{ with query "count(ceph_mon_quorum_status)" }}{{ . | first | value }}{{ end }} monitors are active.
The following monitors are down:
{{- range query "(ceph_mon_quorum_status == 0) + on(ceph_daemon) group_left(hostname) (ceph_mon_metadata * 0)" }}
- {{ .Labels.ceph_daemon }} on {{ .Labels.hostname }}
{{- end }}
- name: osd
rules:
- alert: 10% OSDs down
expr: sum(ceph_osd_up) / count(ceph_osd_in) <= 0.9
expr: (sum(ceph_osd_up) / count(ceph_osd_up)) * 100 <= 90
labels:
severity: critical
type: ceph_default
annotations:
description: More than 10% of OSDs are down.
description: |
Only {{ $value | humanize }}% of OSDs are up; {{ with query "count(ceph_osd_up == 0)" }}{{ . | first | value }}{{ end }} of {{ with query "count(ceph_osd_up)" }}{{ . | first | value }}{{ end }} OSDs are down (>= 10%).
The following OSDs are down:
{{- range query "(ceph_osd_up * on(ceph_daemon) group_left(hostname) ceph_osd_metadata) == 0" }}
- {{ .Labels.ceph_daemon }} on {{ .Labels.hostname }}
{{- end }}
- alert: OSD down
expr: count(ceph_osd_up == 0) > 0
for: 15m
@ -42,35 +64,63 @@ groups:
severity: warning
type: ceph_default
annotations:
description: One or more OSDs down for more than 15 minutes.
description: |
{{ $s := "" }}{{ if gt $value 1.0 }}{{ $s = "s" }}{{ end }}
{{ $value }} OSD{{ $s }} down for more than 15 minutes.
{{ $value }} of {{ query "count(ceph_osd_up)" | first | value }} OSDs are down.
The following OSD{{ $s }} {{ if eq $s "" }}is{{ else }}are{{ end }} down:
{{- range query "(ceph_osd_up * on(ceph_daemon) group_left(hostname) ceph_osd_metadata) == 0"}}
- {{ .Labels.ceph_daemon }} on {{ .Labels.hostname }}
{{- end }}
- alert: OSDs near full
expr: ((ceph_osd_stat_bytes_used / ceph_osd_stat_bytes) and on(ceph_daemon) ceph_osd_up == 1) > 0.8
expr: |
(
((ceph_osd_stat_bytes_used / ceph_osd_stat_bytes) and on(ceph_daemon) ceph_osd_up == 1)
* on(ceph_daemon) group_left(hostname) ceph_osd_metadata
) * 100 > 90
for: 5m
labels:
severity: critical
type: ceph_default
annotations:
description: OSD {{ $labels.ceph_daemon }} is dangerously full, over 80%.
# alert on single OSDs flapping
- alert: flap osd
expr: rate(ceph_osd_up[5m])*60 > 1
description: >
OSD {{ $labels.ceph_daemon }} on {{ $labels.hostname }} is
dangerously full: {{ $value | humanize }}%
- alert: flapping OSD
expr: |
(
rate(ceph_osd_up[5m])
* on(ceph_daemon) group_left(hostname) ceph_osd_metadata
) * 60 > 1
labels:
severity: warning
type: ceph_default
annotations:
description: >
OSD {{ $labels.ceph_daemon }} was marked down and back up at least once a
minute for 5 minutes.
OSD {{ $labels.ceph_daemon }} on {{ $labels.hostname }} was
marked down and back up {{ $value | humanize }} times a
minute for 5 minutes.
# alert on high deviation from average PG count
- alert: high pg count deviation
expr: abs(((ceph_osd_numpg > 0) - on (job) group_left avg(ceph_osd_numpg > 0) by (job)) / on (job) group_left avg(ceph_osd_numpg > 0) by (job)) > 0.35
expr: |
abs(
(
(ceph_osd_numpg > 0) - on (job) group_left avg(ceph_osd_numpg > 0) by (job)
) / on (job) group_left avg(ceph_osd_numpg > 0) by (job)
) * on(ceph_daemon) group_left(hostname) ceph_osd_metadata > 0.30
for: 5m
labels:
severity: warning
type: ceph_default
annotations:
description: >
OSD {{ $labels.ceph_daemon }} deviates by more than 30% from
average PG count.
OSD {{ $labels.ceph_daemon }} on {{ $labels.hostname }} deviates
by more than 30% from average PG count.
# alert on high commit latency...but how high is too high
- name: mds
rules:
@ -81,30 +131,38 @@ groups:
- name: pgs
rules:
- alert: pgs inactive
expr: ceph_pg_total - ceph_pg_active > 0
expr: ceph_pool_metadata * on(pool_id,instance) group_left() (ceph_pg_total - ceph_pg_active) > 0
for: 5m
labels:
severity: critical
type: ceph_default
annotations:
description: One or more PGs are inactive for more than 5 minutes.
description: >
{{ $value }} PGs have been inactive for more than 5 minutes in pool {{ $labels.name }}.
Inactive placement groups aren't able to serve read/write
requests.
- alert: pgs unclean
expr: ceph_pg_total - ceph_pg_clean > 0
expr: ceph_pool_metadata * on(pool_id,instance) group_left() (ceph_pg_total - ceph_pg_clean) > 0
for: 15m
labels:
severity: warning
type: ceph_default
annotations:
description: One or more PGs are not clean for more than 15 minutes.
description: >
{{ $value }} PGs haven't been clean for more than 15 minutes in pool {{ $labels.name }}.
Unclean PGs haven't been able to completely recover from a
previous failure.
- name: nodes
rules:
- alert: root volume full
expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} < 0.05
expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100 < 5
labels:
severity: critical
type: ceph_default
annotations:
description: Root volume (OSD and MON store) is dangerously full (< 5% free).
description: >
Root volume (OSD and MON store) is dangerously full: {{ $value | humanize }}% free.
# alert on nic packet errors and drops rates > 1 packet/s
- alert: network packets dropped
expr: irate(node_network_receive_drop_total{device!="lo"}[5m]) + irate(node_network_transmit_drop_total{device!="lo"}[5m]) > 1
@ -115,8 +173,11 @@ groups:
description: >
Node {{ $labels.instance }} experiences packet drop > 1
packet/s on interface {{ $labels.device }}.
- alert: network packet errors
expr: irate(node_network_receive_errs_total{device!="lo"}[5m]) + irate(node_network_transmit_errs_total{device!="lo"}[5m]) > 1
expr: |
irate(node_network_receive_errs_total{device!="lo"}[5m]) +
irate(node_network_transmit_errs_total{device!="lo"}[5m]) > 1
labels:
severity: warning
type: ceph_default
@ -124,31 +185,48 @@ groups:
description: >
Node {{ $labels.instance }} experiences packet errors > 1
packet/s on interface {{ $labels.device }}.
# predict fs fillup times
# predict fs fill-up times
- alert: storage filling
expr: ((node_filesystem_free_bytes) / deriv(node_filesystem_free_bytes[2d]) <= 5) > 0
expr: |
(
(
node_filesystem_free_bytes / deriv(node_filesystem_free_bytes[2d])
* on(instance) group_left(nodename) node_uname_info
) <= 5
) > 0
labels:
severity: warning
type: ceph_default
annotations:
description: >
Mountpoint {{ $labels.mountpoint }} will be full in less than 5 days
assuming the average fillup rate of the past 48 hours.
Mountpoint {{ $labels.mountpoint }} on {{ $labels.nodename }}
will be full in less than 5 days assuming the average fill-up
rate of the past 48 hours.
- name: pools
rules:
- alert: pool full
expr: ceph_pool_stored / ceph_pool_max_avail * on(pool_id) group_right ceph_pool_metadata > 0.9
expr: |
ceph_pool_stored / ceph_pool_max_avail
* on(pool_id) group_right ceph_pool_metadata * 100 > 90
labels:
severity: critical
type: ceph_default
annotations:
description: Pool {{ $labels.name }} at 90% capacity or over.
description: Pool {{ $labels.name }} at {{ $value | humanize }}% capacity.
- alert: pool filling up
expr: (((ceph_pool_max_avail - ceph_pool_stored) / deriv(ceph_pool_max_avail[2d])) * on(pool_id) group_right ceph_pool_metadata <=5) > 0
expr: |
(
(
(ceph_pool_max_avail - ceph_pool_stored) / deriv(ceph_pool_max_avail[2d])
) * on(pool_id) group_right ceph_pool_metadata <= 5
) > 0
labels:
severity: warning
type: ceph_default
annotations:
description: >
Pool {{ $labels.name }} will be full in less than 5 days
assuming the average fillup rate of the past 48 hours.
assuming the average fill-up rate of the past 48 hours.
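Several alert expressions above were rewritten from fractional thresholds (e.g. `< 0.05`) to percentage form (`* 100 < 5`) so that `{{ $value }}` renders as a human-readable percentage in the annotations. The two forms classify identically; a minimal check in shell (awk arithmetic, illustrative value only):

```shell
# r stands in for a filesystem free-space ratio; both threshold styles
# agree on whether it trips the alert, only the rendered value differs.
awk 'BEGIN { r = 0.03; print (r < 0.05), (r * 100 < 5) }'
# prints: 1 1
```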

View File

@ -1,5 +1,7 @@
overrides:
ceph:
log-whitelist:
- SLOW_OPS
conf:
osd:
filestore flush min: 0

View File

@ -2,8 +2,11 @@
Setup
=====
$ RO_KEY=$(ceph auth get-or-create-key client.ro mon 'profile rbd' mgr 'profile rbd' osd 'profile rbd-read-only')
$ rbd create --size 10 img
$ rbd snap create img@snap
$ rbd snap protect img@snap
$ rbd clone img@snap cloneimg
$ rbd create --size 1 imgpart
$ DEV=$(sudo rbd map imgpart)
$ cat <<EOF | sudo sfdisk $DEV >/dev/null 2>&1
@ -144,10 +147,16 @@ R/O, unpartitioned:
.*BLKROSET: Permission denied (re)
[1]
$ sudo blockdev --setrw $DEV
.*BLKROSET: Read-only file system (re)
[1]
$ blockdev --getro $DEV
0
1
$ dd if=/dev/urandom of=$DEV bs=1k seek=1 count=1 status=none
dd: error writing '/dev/rbd?': Operation not permitted (glob)
[1]
$ blkdiscard $DEV
blkdiscard: /dev/rbd?: BLKDISCARD ioctl failed: Operation not permitted (glob)
[1]
$ sudo rbd unmap $DEV
R/O, partitioned:
@ -174,18 +183,30 @@ R/O, partitioned:
.*BLKROSET: Permission denied (re)
[1]
$ sudo blockdev --setrw ${DEV}p1
.*BLKROSET: Read-only file system (re)
[1]
$ blockdev --setrw ${DEV}p2
.*BLKROSET: Permission denied (re)
[1]
$ sudo blockdev --setrw ${DEV}p2
.*BLKROSET: Read-only file system (re)
[1]
$ blockdev --getro ${DEV}p1
0
1
$ blockdev --getro ${DEV}p2
0
1
$ dd if=/dev/urandom of=${DEV}p1 bs=1k seek=1 count=1 status=none
dd: error writing '/dev/rbd?p1': Operation not permitted (glob)
[1]
$ blkdiscard ${DEV}p1
blkdiscard: /dev/rbd?p1: BLKDISCARD ioctl failed: Operation not permitted (glob)
[1]
$ dd if=/dev/urandom of=${DEV}p2 bs=1k seek=1 count=1 status=none
dd: error writing '/dev/rbd?p2': Operation not permitted (glob)
[1]
$ blkdiscard ${DEV}p2
blkdiscard: /dev/rbd?p2: BLKDISCARD ioctl failed: Operation not permitted (glob)
[1]
$ sudo rbd unmap $DEV
@ -270,6 +291,45 @@ Partitioned:
$ sudo rbd unmap $DEV
read-only OSD caps
==================
R/W:
$ DEV=$(sudo rbd map --id ro --key $(echo $RO_KEY) img)
rbd: sysfs write failed
rbd: map failed: (1) Operation not permitted
[1]
R/O:
$ DEV=$(sudo rbd map --id ro --key $(echo $RO_KEY) --read-only img)
$ blockdev --getro $DEV
1
$ sudo rbd unmap $DEV
Snapshot:
$ DEV=$(sudo rbd map --id ro --key $(echo $RO_KEY) img@snap)
$ blockdev --getro $DEV
1
$ sudo rbd unmap $DEV
R/W, clone:
$ DEV=$(sudo rbd map --id ro --key $(echo $RO_KEY) cloneimg)
rbd: sysfs write failed
rbd: map failed: (1) Operation not permitted
[1]
R/O, clone:
$ DEV=$(sudo rbd map --id ro --key $(echo $RO_KEY) --read-only cloneimg)
$ blockdev --getro $DEV
1
$ sudo rbd unmap $DEV
rw -> ro with open_count > 0
============================
@ -288,6 +348,8 @@ Teardown
$ rbd snap purge imgpart >/dev/null 2>&1
$ rbd rm imgpart >/dev/null 2>&1
$ rbd rm cloneimg >/dev/null 2>&1
$ rbd snap unprotect img@snap
$ rbd snap purge img >/dev/null 2>&1
$ rbd rm img >/dev/null 2>&1

View File

@ -0,0 +1,31 @@
journaling makes the image only unwritable, rather than both unreadable
and unwritable:
$ rbd create --size 1 --image-feature layering,exclusive-lock,journaling img
$ rbd snap create img@snap
$ rbd snap protect img@snap
$ rbd clone --image-feature layering,exclusive-lock,journaling img@snap cloneimg
$ DEV=$(sudo rbd map img)
rbd: sysfs write failed
rbd: map failed: (6) No such device or address
[6]
$ DEV=$(sudo rbd map --read-only img)
$ blockdev --getro $DEV
1
$ sudo rbd unmap $DEV
$ DEV=$(sudo rbd map cloneimg)
rbd: sysfs write failed
rbd: map failed: (6) No such device or address
[6]
$ DEV=$(sudo rbd map --read-only cloneimg)
$ blockdev --getro $DEV
1
$ sudo rbd unmap $DEV
$ rbd rm --no-progress cloneimg
$ rbd snap unprotect img@snap
$ rbd snap rm --no-progress img@snap
$ rbd rm --no-progress img

View File

@ -24,18 +24,18 @@ Write to first and last sectors and make sure we hit the right objects:
Dump first and last megabytes:
$ DEV=$(sudo rbd map hugeimg/img)
$ hexdump -n 1048576 $DEV
$ dd if=$DEV bs=1M count=1 status=none | hexdump
0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
*
0000200 0000 0000 0000 0000 0000 0000 0000 0000
*
0100000
$ hexdump -s 4611686018426339328 $DEV
3ffffffffff00000 0000 0000 0000 0000 0000 0000 0000 0000
$ dd if=$DEV bs=1M skip=4398046511103 status=none | hexdump
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
3ffffffffffffe00 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
00ffe00 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
*
4000000000000000
0100000
$ sudo rbd unmap $DEV
$ ceph osd pool delete hugeimg hugeimg --yes-i-really-really-mean-it >/dev/null 2>&1
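The `dd` invocations replace `hexdump`'s byte offsets with skip counts expressed in 1 MiB blocks. As a sanity check that `skip=4398046511103` with `bs=1M` lands on the same byte offset the old `hexdump -s 4611686018426339328` used (plain shell arithmetic, assuming a 64-bit shell):

```shell
# 4398046511103 one-MiB blocks times 1048576 bytes per block should
# equal the byte offset previously passed to hexdump -s.
echo $(( 4398046511103 * 1048576 ))
# prints: 4611686018426339328
```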

View File

@ -293,8 +293,6 @@ function test_kill_daemon() {
kill_daemon $pidfile TERM || return 1
done
ceph osd dump | grep "osd.0 down" || return 1
name_prefix=mgr
for pidfile in $(find $dir 2>/dev/null | grep $name_prefix'[^/]*\.pid') ; do
#
@ -381,7 +379,6 @@ function test_kill_daemons() {
# killing just the osd and verify the mon still is responsive
#
kill_daemons $dir TERM osd || return 1
ceph osd dump | grep "osd.0 down" || return 1
#
# kill the mgr
#
@ -780,6 +777,7 @@ function destroy_osd() {
ceph osd out osd.$id || return 1
kill_daemons $dir TERM osd.$id || return 1
ceph osd down osd.$id || return 1
ceph osd purge osd.$id --yes-i-really-mean-it || return 1
teardown $dir/$id || return 1
rm -fr $dir/$id
@ -930,8 +928,10 @@ function test_wait_for_osd() {
run_mon $dir a --osd_pool_default_size=1 || return 1
run_mgr $dir x || return 1
run_osd $dir 0 || return 1
run_osd $dir 1 || return 1
wait_for_osd up 0 || return 1
kill_daemons $dir TERM osd || return 1
wait_for_osd up 1 || return 1
kill_daemons $dir TERM osd.0 || return 1
wait_for_osd down 0 || return 1
( TIMEOUT=1 ; ! wait_for_osd up 0 ) || return 1
teardown $dir || return 1
@ -1313,6 +1313,36 @@ function test_get_num_active_clean() {
teardown $dir || return 1
}
##
# Return the number of active or peered PGs in the cluster. A PG matches if
# ceph pg dump pgs reports it is either **active** or **peered** and
# not **stale**.
#
# @param STDOUT the number of active or peered PGs
# @return 0 on success, 1 on error
#
function get_num_active_or_peered() {
local expression
expression+="select(contains(\"active\") or contains(\"peered\")) | "
expression+="select(contains(\"stale\") | not)"
ceph --format json pg dump pgs 2>/dev/null | \
jq ".pg_stats | [.[] | .state | $expression] | length"
}
function test_get_num_active_or_peered() {
local dir=$1
setup $dir || return 1
run_mon $dir a --osd_pool_default_size=1 || return 1
run_mgr $dir x || return 1
run_osd $dir 0 || return 1
create_rbd_pool || return 1
wait_for_clean || return 1
local num_peered=$(get_num_active_or_peered)
test "$num_peered" = $PG_NUM || return 1
teardown $dir || return 1
}
#######################################################################
##
@ -1588,6 +1618,64 @@ function test_wait_for_clean() {
teardown $dir || return 1
}
##
# Wait until the cluster becomes peered, or give up if it makes no
# progress for $WAIT_FOR_CLEAN_TIMEOUT seconds.
# Progress is measured either via the **get_is_making_recovery_progress**
# predicate or by a change in the number of peered PGs (as returned by
# get_num_active_or_peered).
#
# @return 0 if the cluster is peered, 1 otherwise
#
function wait_for_peered() {
local cmd=$1
local num_peered=-1
local cur_peered
local -a delays=($(get_timeout_delays $WAIT_FOR_CLEAN_TIMEOUT .1))
local -i loop=0
flush_pg_stats || return 1
while test $(get_num_pgs) == 0 ; do
sleep 1
done
while true ; do
# Comparing get_num_active_or_peered & get_num_pgs is used to determine
# if the cluster is peered. Inlining the comparison here avoids
# multiple calls of get_num_active_or_peered.
cur_peered=$(get_num_active_or_peered)
test $cur_peered = $(get_num_pgs) && break
if test $cur_peered != $num_peered ; then
loop=0
num_peered=$cur_peered
elif get_is_making_recovery_progress ; then
loop=0
elif (( $loop >= ${#delays[*]} )) ; then
ceph report
return 1
fi
# eval is a no-op if cmd is empty
eval $cmd
sleep ${delays[$loop]}
loop+=1
done
return 0
}
function test_wait_for_peered() {
local dir=$1
setup $dir || return 1
run_mon $dir a --osd_pool_default_size=2 || return 1
run_osd $dir 0 || return 1
run_mgr $dir x || return 1
create_rbd_pool || return 1
! WAIT_FOR_CLEAN_TIMEOUT=1 wait_for_clean || return 1
run_osd $dir 1 || return 1
wait_for_peered || return 1
teardown $dir || return 1
}
#######################################################################
##

View File

@ -67,9 +67,9 @@ function TEST_balancer() {
ceph balancer pool add $TEST_POOL1 || return 1
ceph balancer pool add $TEST_POOL2 || return 1
ceph balancer pool ls || return 1
eval POOL=$(ceph balancer pool ls | jq '.[0]')
eval POOL=$(ceph balancer pool ls | jq 'sort | .[0]')
test "$POOL" = "$TEST_POOL1" || return 1
eval POOL=$(ceph balancer pool ls | jq '.[1]')
eval POOL=$(ceph balancer pool ls | jq 'sort | .[1]')
test "$POOL" = "$TEST_POOL2" || return 1
ceph balancer pool rm $TEST_POOL1 || return 1
ceph balancer pool rm $TEST_POOL2 || return 1
@ -104,7 +104,7 @@ function TEST_balancer() {
! ceph balancer optimize plan_upmap $TEST_POOL || return 1
ceph balancer status || return 1
eval RESULT=$(ceph balancer status | jq '.optimize_result')
test "$RESULT" = "Unable to find further optimization, or pool(s)' pg_num is decreasing, or distribution is already perfect" || return 1
test "$RESULT" = "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect" || return 1
ceph balancer on || return 1
ACTIVE=$(ceph balancer status | jq '.active')
@ -118,6 +118,102 @@ function TEST_balancer() {
teardown $dir || return 1
}
function TEST_balancer2() {
local dir=$1
TEST_PGS1=118
TEST_PGS2=132
TOTAL_PGS=$(expr $TEST_PGS1 + $TEST_PGS2)
OSDS=5
DEFAULT_REPLICAS=3
# Integer average of PGS per OSD (70.8), so each OSD >= this
FINAL_PER_OSD1=$(expr \( $TEST_PGS1 \* $DEFAULT_REPLICAS \) / $OSDS)
# Integer average of PGS per OSD (150)
FINAL_PER_OSD2=$(expr \( \( $TEST_PGS1 + $TEST_PGS2 \) \* $DEFAULT_REPLICAS \) / $OSDS)
CEPH_ARGS+="--osd_pool_default_pg_autoscale_mode=off "
CEPH_ARGS+="--debug_osd=20 "
setup $dir || return 1
run_mon $dir a || return 1
run_mgr $dir x || return 1
for i in $(seq 0 $(expr $OSDS - 1))
do
run_osd $dir $i || return 1
done
ceph osd set-require-min-compat-client luminous
ceph config set mgr mgr/balancer/upmap_max_deviation 1
ceph balancer mode upmap || return 1
ceph balancer on || return 1
ceph config set mgr mgr/balancer/sleep_interval 5
create_pool $TEST_POOL1 $TEST_PGS1
wait_for_clean || return 1
# Wait up to 2 minutes
OK=no
for i in $(seq 1 25)
do
sleep 5
if grep -q "Optimization plan is almost perfect" $dir/mgr.x.log
then
OK=yes
break
fi
done
test $OK = "yes" || return 1
# Plan is found, but PGs still need to move
sleep 30
ceph osd df
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[0].pgs')
test $PGS -ge $FINAL_PER_OSD1 || return 1
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[1].pgs')
test $PGS -ge $FINAL_PER_OSD1 || return 1
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[2].pgs')
test $PGS -ge $FINAL_PER_OSD1 || return 1
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[3].pgs')
test $PGS -ge $FINAL_PER_OSD1 || return 1
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[4].pgs')
test $PGS -ge $FINAL_PER_OSD1 || return 1
create_pool $TEST_POOL2 $TEST_PGS2
# Wait up to 2 minutes
OK=no
for i in $(seq 1 25)
do
sleep 5
COUNT=$(grep "Optimization plan is almost perfect" $dir/mgr.x.log | wc -l)
if test $COUNT = "2"
then
OK=yes
break
fi
done
test $OK = "yes" || return 1
# Plan is found, but PGs still need to move
sleep 30
ceph osd df
# We should be within plus or minus 1 of FINAL_PER_OSD2
# This is because here each pool is balanced independently
MIN=$(expr $FINAL_PER_OSD2 - 1)
MAX=$(expr $FINAL_PER_OSD2 + 1)
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[0].pgs')
test $PGS -ge $MIN -a $PGS -le $MAX || return 1
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[1].pgs')
test $PGS -ge $MIN -a $PGS -le $MAX || return 1
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[2].pgs')
test $PGS -ge $MIN -a $PGS -le $MAX || return 1
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[3].pgs')
test $PGS -ge $MIN -a $PGS -le $MAX || return 1
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[4].pgs')
test $PGS -ge $MIN -a $PGS -le $MAX || return 1
teardown $dir || return 1
}
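The expected per-OSD PG counts in the test above come from integer division, which `expr` truncates. The same arithmetic checked directly (the numbers mirror the test's constants; a standalone sketch, not part of the test):

```shell
# (118 PGs * 3 replicas) / 5 OSDs = 70.8, truncated to 70 by expr,
# so each OSD must hold at least 70 PGs after the first balance pass.
expr \( 118 \* 3 \) / 5
# prints: 70
# ((118 + 132) * 3) / 5 = 150 exactly, the per-OSD target once the
# second pool is created.
expr \( \( 118 + 132 \) \* 3 \) / 5
# prints: 150
```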
main balancer "$@"
# Local Variables:

View File

@ -237,5 +237,53 @@ function TEST_0_mds() {
kill_daemons $dir KILL mds.a
}
function TEST_0_osd() {
local dir=$1
CEPH_ARGS="$ORIG_CEPH_ARGS --mon-host=$CEPH_MON_A "
run_mon $dir a --public-addr=$CEPH_MON_A || return 1
run_mgr $dir x || return 1
run_osd $dir 0 || return 1
run_osd $dir 1 || return 1
run_osd $dir 2 || return 1
run_osd $dir 3 || return 1
ceph osd erasure-code-profile set ec-profile m=2 k=2 crush-failure-domain=osd || return 1
ceph osd pool create ec 8 erasure ec-profile || return 1
wait_for_clean || return 1
# with min_size 3, we can stop only 1 osd
ceph osd pool set ec min_size 3 || return 1
wait_for_clean || return 1
ceph osd ok-to-stop 0 || return 1
ceph osd ok-to-stop 1 || return 1
ceph osd ok-to-stop 2 || return 1
ceph osd ok-to-stop 3 || return 1
! ceph osd ok-to-stop 0 1 || return 1
! ceph osd ok-to-stop 2 3 || return 1
# with min_size 2 we can stop 2 osds
ceph osd pool set ec min_size 2 || return 1
wait_for_clean || return 1
ceph osd ok-to-stop 0 1 || return 1
ceph osd ok-to-stop 2 3 || return 1
! ceph osd ok-to-stop 0 1 2 || return 1
! ceph osd ok-to-stop 1 2 3 || return 1
# we should get the same result with one of the osds already down
kill_daemons $dir TERM osd.0 || return 1
ceph osd down 0 || return 1
wait_for_peered || return 1
ceph osd ok-to-stop 0 || return 1
ceph osd ok-to-stop 0 1 || return 1
! ceph osd ok-to-stop 0 1 2 || return 1
! ceph osd ok-to-stop 1 2 3 || return 1
}
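The min_size checks above follow from simple shard arithmetic; a sketch of why ok-to-stop accepts one OSD but rejects two while min_size is 3 (values mirror the test's EC profile):

```shell
# k=2, m=2 erasure coding gives 4 shards per PG; stopping S OSDs is only
# OK while the shards that remain stay at or above the pool's min_size.
k=2; m=2; min_size=3
for S in 1 2; do
    remaining=$(( k + m - S ))
    if [ "$remaining" -ge "$min_size" ]; then
        echo "stop $S: ok"
    else
        echo "stop $S: not ok"
    fi
done
# prints: stop 1: ok
#         stop 2: not ok
```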
main ok-to-stop "$@"

View File

@ -49,7 +49,7 @@ function get_num_in_state() {
}
function wait_for_state() {
function wait_for_not_state() {
local state=$1
local num_in_state=-1
local cur_in_state
@ -78,15 +78,15 @@ function wait_for_state() {
}
function wait_for_backfill() {
function wait_for_not_backfilling() {
local timeout=$1
wait_for_state backfilling $timeout
wait_for_not_state backfilling $timeout
}
function wait_for_active() {
function wait_for_not_activating() {
local timeout=$1
wait_for_state activating $timeout
wait_for_not_state activating $timeout
}
# All tests are created in an environment which has fake total space
@ -149,8 +149,8 @@ function TEST_backfill_test_simple() {
done
sleep 5
wait_for_backfill 240 || return 1
wait_for_active 60 || return 1
wait_for_not_backfilling 240 || return 1
wait_for_not_activating 60 || return 1
ERRORS=0
if [ "$(ceph pg dump pgs | grep +backfill_toofull | wc -l)" != "1" ];
@ -228,8 +228,8 @@ function TEST_backfill_test_multi() {
done
sleep 5
wait_for_backfill 240 || return 1
wait_for_active 60 || return 1
wait_for_not_backfilling 240 || return 1
wait_for_not_activating 60 || return 1
ERRORS=0
full="$(ceph pg dump pgs | grep +backfill_toofull | wc -l)"
@ -380,8 +380,8 @@ function TEST_backfill_test_sametarget() {
ceph osd pool set $pool2 size 2
sleep 5
wait_for_backfill 240 || return 1
wait_for_active 60 || return 1
wait_for_not_backfilling 240 || return 1
wait_for_not_activating 60 || return 1
ERRORS=0
if [ "$(ceph pg dump pgs | grep +backfill_toofull | wc -l)" != "1" ];
@ -515,8 +515,8 @@ function TEST_backfill_multi_partial() {
ceph osd in osd.$fillosd
sleep 15
wait_for_backfill 240 || return 1
wait_for_active 60 || return 1
wait_for_not_backfilling 240 || return 1
wait_for_not_activating 60 || return 1
flush_pg_stats || return 1
ceph pg dump pgs
@ -698,8 +698,8 @@ function TEST_ec_backfill_simple() {
ceph pg dump pgs
wait_for_backfill 240 || return 1
wait_for_active 60 || return 1
wait_for_not_backfilling 240 || return 1
wait_for_not_activating 60 || return 1
ceph pg dump pgs
@ -822,8 +822,8 @@ function TEST_ec_backfill_multi() {
sleep 10
wait_for_backfill 240 || return 1
wait_for_active 60 || return 1
wait_for_not_backfilling 240 || return 1
wait_for_not_activating 60 || return 1
ceph pg dump pgs
@ -961,8 +961,8 @@ function SKIP_TEST_ec_backfill_multi_partial() {
sleep 10
ceph pg dump pgs
wait_for_backfill 240 || return 1
wait_for_active 60 || return 1
wait_for_not_backfilling 240 || return 1
wait_for_not_activating 60 || return 1
ceph pg dump pgs
@ -1069,8 +1069,8 @@ function SKIP_TEST_ec_backfill_multi_partial() {
ceph osd in osd.$fillosd
sleep 15
wait_for_backfill 240 || return 1
wait_for_active 60 || return 1
wait_for_not_backfilling 240 || return 1
wait_for_not_activating 60 || return 1
ERRORS=0
if [ "$(ceph pg dump pgs | grep -v "^1.0" | grep +backfill_toofull | wc -l)" != "1" ];

View File

@ -49,7 +49,6 @@ function get_num_in_state() {
function wait_for_state() {
local state=$1
local num_in_state=-1
local cur_in_state
local -a delays=($(get_timeout_delays $2 5))
local -i loop=0
@ -61,11 +60,8 @@ function wait_for_state() {
while true ; do
cur_in_state=$(get_num_in_state ${state})
test $cur_in_state = "0" && break
if test $cur_in_state != $num_in_state ; then
loop=0
num_in_state=$cur_in_state
elif (( $loop >= ${#delays[*]} )) ; then
test $cur_in_state -gt 0 && break
if (( $loop >= ${#delays[*]} )) ; then
ceph pg dump pgs
return 1
fi

View File

@ -3,7 +3,8 @@ meta:
overrides:
ceph_ansible:
ansible-version: "2.8"
ansible-version: '2.8.1'
branch: stable-4.0
vars:
ceph_conf_overrides:
global:

View File

@ -0,0 +1,11 @@
overrides:
ceph:
conf:
global:
lockdep: true
tasks:
- cephfs_test_runner:
modules:
- tasks.cephfs.test_admin

View File

@ -1,11 +0,0 @@
overrides:
ceph:
conf:
global:
lockdep: true
tasks:
- cephfs_test_runner:
modules:
- tasks.cephfs.test_config_commands

View File

@ -12,6 +12,7 @@ overrides:
- Scrub error on inode
- Metadata damage detected
- inconsistent rstat on inode
- Error recovering journal
tasks:
- cephfs_test_runner:

View File

@ -0,0 +1,5 @@
tasks:
- cephfs_test_runner:
modules:
- tasks.cephfs.test_openfiletable

View File

@ -1,3 +1,15 @@
overrides:
ceph:
log-whitelist:
- OSD full dropping all updates
- OSD near full
- pausewr flag
- failsafe engaged, dropping updates
- failsafe disengaged, no longer dropping
- is full \(reached quota
- POOL_FULL
- POOL_BACKFILLFULL
tasks:
- cephfs_test_runner:
modules:

View File

@ -9,3 +9,5 @@ overrides:
- evicting unresponsive client
- POOL_APP_NOT_ENABLED
- has not responded to cap revoke by MDS for over
- MDS_CLIENT_LATE_RELEASE
- responding to mclientcaps

View File

@ -17,6 +17,8 @@ overrides:
mds heartbeat grace: 60
mon:
mon osd crush smoke test: false
osd:
osd fast shutdown: false
valgrind:
mon: [--tool=memcheck, --leak-check=full, --show-reachable=yes]
osd: [--tool=memcheck]

View File

@ -1,3 +1,7 @@
overrides:
ceph:
log-whitelist:
- SLOW_OPS
tasks:
- workunit:
clients:

View File

@ -1,12 +0,0 @@
overrides:
ceph:
conf:
global:
lockdep: true
tasks:
- cephfs_test_runner:
fail_on_skip: false
modules:
- tasks.cephfs.test_config_commands

View File

@ -11,6 +11,7 @@ overrides:
- Scrub error on inode
- Metadata damage detected
- inconsistent rstat on inode
- Error recovering journal
tasks:
- cephfs_test_runner:

View File

@ -1,5 +1,7 @@
overrides:
ceph:
log-whitelist:
- SLOW_OPS
conf:
osd:
filestore flush min: 0

View File

@ -1,5 +0,0 @@
tasks:
- cram:
clients:
client.0:
- qa/rbd/krbd_blkroset.t

View File

@ -0,0 +1,6 @@
tasks:
- cram:
clients:
client.0:
- qa/rbd/krbd_blkroset.t
- qa/rbd/krbd_get_features.t

View File

@ -1,4 +1,5 @@
tasks:
- cephfs_test_runner:
fail_on_skip: false
modules:
- tasks.cephfs.test_exports

View File

@ -4,6 +4,7 @@ overrides:
ceph:
log-whitelist:
- evicting unresponsive client
- RECENT_CRASH
tasks:
- cephfs_test_runner:

View File

@ -28,6 +28,7 @@ tasks:
- default.rgw.log
- s3readwrite:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0
readwrite:
bucket: rwtest

View File

@ -2,6 +2,10 @@ tasks:
- install:
- exec:
mon.b:
- sudo systemctl stop chronyd.service || true
- sudo systemctl stop systemd-timesync.service || true
- sudo systemctl stop ntpd.service || true
- sudo systemctl stop ntp.service || true
- date -u -s @$(expr $(date -u +%s) + 2)
- ceph:
wait-for-healthy: false
@ -11,6 +15,7 @@ tasks:
- overall HEALTH_
- \(MON_CLOCK_SKEW\)
- \(MGR_DOWN\)
- \(MON_DOWN\)
- \(PG_
- \(SLOW_OPS\)
- No standby daemons available

View File

@ -23,6 +23,8 @@ overrides:
osd max object namespace len: 64
mon:
mon osd crush smoke test: false
osd:
osd fast shutdown: false
valgrind:
mon: [--tool=memcheck, --leak-check=full, --show-reachable=yes]
osd: [--tool=memcheck]

View File

@ -2,6 +2,11 @@ overrides:
ceph:
log-whitelist:
- must scrub before tier agent can activate
conf:
osd:
# override short_pg_log_entries.yaml (which sets these under [global])
osd_min_pg_log_entries: 3000
osd_max_pg_log_entries: 3000
tasks:
- exec:
client.0:

View File

@ -2,6 +2,11 @@ overrides:
ceph:
log-whitelist:
- must scrub before tier agent can activate
conf:
osd:
# override short_pg_log_entries.yaml (which sets these under [global])
osd_min_pg_log_entries: 3000
osd_max_pg_log_entries: 3000
tasks:
- exec:
client.0:

View File

@ -2,6 +2,11 @@ overrides:
ceph:
log-whitelist:
- must scrub before tier agent can activate
conf:
osd:
# override short_pg_log_entries.yaml (which sets these under [global])
osd_min_pg_log_entries: 3000
osd_max_pg_log_entries: 3000
tasks:
- exec:
client.0:

View File

@ -2,6 +2,11 @@ overrides:
ceph:
log-whitelist:
- must scrub before tier agent can activate
conf:
osd:
# override short_pg_log_entries.yaml (which sets these under [global])
osd_min_pg_log_entries: 3000
osd_max_pg_log_entries: 3000
tasks:
- exec:
client.0:

View File

@ -2,6 +2,11 @@ overrides:
ceph:
log-whitelist:
- must scrub before tier agent can activate
conf:
osd:
# override short_pg_log_entries.yaml (which sets these under [global])
osd_min_pg_log_entries: 3000
osd_max_pg_log_entries: 3000
tasks:
- exec:
client.0:

View File

@ -2,6 +2,11 @@ overrides:
ceph:
log-whitelist:
- must scrub before tier agent can activate
conf:
osd:
# override short_pg_log_entries.yaml (which sets these under [global])
osd_min_pg_log_entries: 3000
osd_max_pg_log_entries: 3000
tasks:
- exec:
client.0:

View File

@ -13,6 +13,8 @@ overrides:
debug refs: 5
mon:
mon osd crush smoke test: false
osd:
osd fast shutdown: false
log-whitelist:
- overall HEALTH_
# valgrind is slow.. we might get PGs stuck peering etc

View File

@ -4,6 +4,7 @@ tasks:
- rgw: [client.0]
- s3readwrite:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0
readwrite:
bucket: rwtest

View File

@ -4,6 +4,7 @@ tasks:
- rgw: [client.0]
- s3roundtrip:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0
roundtrip:
bucket: rttest

View File

@ -4,4 +4,5 @@ tasks:
- rgw: [client.0]
- swift:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0

View File

@ -11,6 +11,8 @@ overrides:
osd heartbeat grace: 40
mon:
mon osd crush smoke test: false
osd:
osd fast shutdown: false
valgrind:
mon: [--tool=memcheck, --leak-check=full, --show-reachable=yes]
osd: [--tool=memcheck]

View File

@ -1,6 +1,7 @@
tasks:
- s3readwrite:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0
readwrite:
bucket: rwtest

View File

@ -1,6 +1,7 @@
tasks:
- s3roundtrip:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0
roundtrip:
bucket: rttest

View File

@ -1,4 +1,5 @@
tasks:
- swift:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0

View File

@ -1,4 +1,5 @@
tasks:
- swift:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0

View File

@ -12,6 +12,8 @@ overrides:
osd heartbeat grace: 40
mon:
mon osd crush smoke test: false
osd:
osd fast shutdown: false
valgrind:
mon: [--tool=memcheck, --leak-check=full, --show-reachable=yes]
osd: [--tool=memcheck]

View File

@ -9,6 +9,7 @@ tasks:
- rgw: [client.0]
- s3tests:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0
overrides:
ceph:

View File

@ -5,6 +5,7 @@ tasks:
- rgw: [client.0]
- s3tests:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0
overrides:
ceph:

View File

@ -5,4 +5,5 @@ tasks:
- rgw: [client.0]
- swift:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0

View File

@ -11,7 +11,7 @@ tasks:
- s3tests:
client.0:
rgw_server: client.0
force-branch: master
force-branch: ceph-nautilus
overrides:
ceph:
fs: xfs

View File

@ -11,7 +11,7 @@ tasks:
- s3tests:
client.0:
rgw_server: client.0
force-branch: master
force-branch: ceph-nautilus
overrides:
ceph:
fs: xfs

View File

@ -12,7 +12,7 @@ tasks:
- s3tests:
client.0:
rgw_server: client.0
force-branch: master
force-branch: ceph-nautilus
overrides:
ceph:
fs: xfs

View File

@ -0,0 +1 @@
../.qa/

View File

@ -0,0 +1 @@
../.qa/

View File

@ -0,0 +1 @@
../.qa/

Some files were not shown because too many files have changed in this diff.