import new upstream nautilus stable release 14.2.8

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This commit is contained in:
Thomas Lamprecht 2020-03-03 15:27:37 +01:00
parent a0324939f9
commit 92f5a8d42d
11786 changed files with 1045748 additions and 467237 deletions

View File

@ -1,7 +1,7 @@
cmake_minimum_required(VERSION 3.5.1)
project(ceph CXX C ASM)
set(VERSION 14.2.6)
set(VERSION 14.2.8)
if(POLICY CMP0028)
cmake_policy(SET CMP0028 NEW)

View File

@ -1,118 +1,45 @@
14.2.4
14.2.8
------
* In the Zabbix Mgr Module there was a typo in the key being sent
to Zabbix for PGs in backfill_wait state. The key that was sent
was 'wait_backfill' and the correct name is 'backfill_wait'.
Update your Zabbix template accordingly so that it accepts the
new key being sent to Zabbix.
* The following OSD memory config options related to bluestore cache autotuning can now
be configured during runtime:
14.2.3
--------
- osd_memory_base (default: 768 MB)
- osd_memory_cache_min (default: 128 MB)
- osd_memory_expected_fragmentation (default: 0.15)
- osd_memory_target (default: 4 GB)
* Nautilus-based librbd clients can now open images on Jewel clusters.
The above options can be set with::
* The RGW "num_rados_handles" has been removed.
If you were using a value of "num_rados_handles" greater than 1,
multiply your current "objecter_inflight_ops" and
"objecter_inflight_op_bytes" parameters by the old
"num_rados_handles" to get the same throttle behavior.
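The conversion described above is simple arithmetic; a minimal sketch (the numbers below are hypothetical examples, not Ceph defaults):

```python
# Sketch of the num_rados_handles removal: the per-handle throttles are
# multiplied by the removed handle count so the aggregate throttle
# behavior stays the same. Example values are hypothetical.
def converted_throttles(inflight_ops, inflight_op_bytes, old_num_rados_handles):
    """Scale both objecter throttles by the old handle count."""
    return (inflight_ops * old_num_rados_handles,
            inflight_op_bytes * old_num_rados_handles)

# e.g. 1024 in-flight ops and 100 MiB in-flight bytes per handle,
# previously running with 4 handles:
ops, op_bytes = converted_throttles(1024, 100 * 1024 * 1024, 4)
print(ops, op_bytes)  # 4096 419430400
```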
* The ``bluestore_no_per_pool_stats_tolerance`` config option has been
replaced with ``bluestore_fsck_error_on_no_per_pool_stats``
(default: false). The overall default behavior has not changed:
fsck will warn but not fail on legacy stores, and repair will
convert to per-pool stats.
ceph config set global <option> <value>
14.2.2
------
* The MGR now accepts 'profile rbd' and 'profile rbd-read-only' user caps.
These caps can be used to provide users access to MGR-based RBD functionality
such as 'rbd perf image iostat' and 'rbd perf image iotop'.
* The no{up,down,in,out} related commands have been revamped.
There are now 2 ways to set the no{up,down,in,out} flags:
the old 'ceph osd [un]set <flag>' command, which sets cluster-wide flags;
and the new 'ceph osd [un]set-group <flags> <who>' command,
which sets flags in batch at the granularity of any crush node,
or device class.
* The configuration value ``osd_calc_pg_upmaps_max_stddev`` used for upmap
balancing has been removed. Instead use the mgr balancer config
``upmap_max_deviation`` which now is an integer number of PGs of deviation
from the target PGs per OSD. This can be set with a command like
``ceph config set mgr mgr/balancer/upmap_max_deviation 2``. The default
``upmap_max_deviation`` is 1. There are situations where crush rules
would not allow a pool to ever have completely balanced PGs. For example, if
crush requires 1 replica on each of 3 racks, but there are fewer OSDs in 1 of
the racks. In those cases, the configuration value can be increased.
* RGW: radosgw-admin introduces two subcommands that allow the
managing of expire-stale objects that might be left behind after a
bucket reshard in earlier versions of RGW. One subcommand lists such
objects and the other deletes them. Read the troubleshooting section
of the dynamic resharding docs for details.
* RGW: a mismatch between the bucket notification documentation and the actual
message format was fixed. This means that any endpoints receiving bucket
notifications will now receive the same notifications inside a JSON array
named 'Records'. Note that this does not affect pulling bucket notifications
from a subscription in a 'pubsub' zone, as these are already wrapped inside
that array.
14.2.5
------
* Ceph will now issue a health warning if a RADOS pool has a ``pg_num``
value that is not a power of two. This can be fixed by adjusting
the pool to a nearby power of two::
* The telemetry module now has a 'device' channel, enabled by default, that
will report anonymized hard disk and SSD health metrics to telemetry.ceph.com
in order to build and improve device failure prediction algorithms. Because
the content of telemetry reports has changed, you will need to either re-opt-in
with::
ceph osd pool set <pool-name> pg_num <new-pg-num>
ceph telemetry on
Alternatively, the warning can be silenced with::
You can view exactly what information will be reported first with::
ceph telemetry show
ceph telemetry show device # specifically show the device channel
If you are not comfortable sharing device metrics, you can disable that
channel first before re-opting-in::
ceph config set mgr mgr/telemetry/channel_device false
ceph telemetry on
* The telemetry module now reports more information about CephFS file systems,
including:
- how many MDS daemons (in total and per file system)
- which features are (or have been) enabled
- how many data pools
- approximate file system age (year + month of creation)
- how many files, bytes, and snapshots
- how much metadata is being cached
We have also added:
- which Ceph release the monitors are running
- whether msgr v1 or v2 addresses are used for the monitors
- whether IPv4 or IPv6 addresses are used for the monitors
- whether RADOS cache tiering is enabled (and which mode)
- whether pools are replicated or erasure coded, and
which erasure code profile plugin and parameters are in use
- how many hosts are in the cluster, and how many hosts have each type of daemon
- whether a separate OSD cluster network is being used
- how many RBD pools and images are in the cluster, and how many pools have RBD mirroring enabled
- how many RGW daemons, zones, and zonegroups are present; which RGW frontends are in use
- aggregate stats about the CRUSH map, like which algorithms are used, how big buckets are, how many rules are defined, and what tunables are in use
If you had telemetry enabled, you will need to re-opt-in with::
ceph telemetry on
You can view exactly what information will be reported first with::
ceph telemetry show # see everything
ceph telemetry show basic # basic cluster info (including all of the new info)
* A health warning is now generated if the average osd heartbeat ping
time exceeds a configurable threshold for any of the intervals
computed. The OSD computes 1 minute, 5 minute and 15 minute
intervals with average, minimum and maximum values. New configuration
option ``mon_warn_on_slow_ping_ratio`` specifies a percentage of
``osd_heartbeat_grace`` to determine the threshold. A value of zero
disables the warning. New configuration option
``mon_warn_on_slow_ping_time``, specified in milliseconds, overrides the
computed value and causes a warning when OSD heartbeat pings take longer
than the specified amount.
New admin command ``ceph daemon mgr.# dump_osd_network [threshold]`` will
list all connections with a ping time longer than the specified threshold or
the value determined by the config options, for the average of any of the 3 intervals.
New admin command ``ceph daemon osd.# dump_osd_network [threshold]`` will
do the same but only including heartbeats initiated by the specified OSD.
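The threshold logic described above can be sketched as follows. This is an illustration of the documented semantics, not the actual OSD code; the example grace and ratio values are assumptions:

```python
# Sketch of the slow-ping warning threshold described above:
# mon_warn_on_slow_ping_ratio derives a threshold from osd_heartbeat_grace,
# and a non-zero mon_warn_on_slow_ping_time overrides it entirely.
def slow_ping_threshold_ms(heartbeat_grace_s, slow_ping_ratio, slow_ping_time_ms):
    """Return the warning threshold in milliseconds."""
    if slow_ping_time_ms > 0:  # explicit override, in milliseconds
        return slow_ping_time_ms
    return heartbeat_grace_s * 1000 * slow_ping_ratio  # percentage of grace

# With a 20 s grace and a 5% ratio, the threshold works out to 1000 ms:
print(slow_ping_threshold_ms(20, 0.05, 0))    # 1000.0
# An explicit mon_warn_on_slow_ping_time wins:
print(slow_ping_threshold_ms(20, 0.05, 250))  # 250
```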
* New OSD daemon command ``dump_recovery_reservations`` which reveals the
recovery locks held (in_progress) and waiting in priority queues.
* New OSD daemon command ``dump_scrub_reservations`` which reveals the
scrub reservations that are held for local (primary) and remote (replica) PGs.
ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false
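Picking the "nearby power of two" for ``pg_num`` mentioned above can be sketched as follows (an illustrative helper, not part of Ceph; choosing the new value remains the operator's decision):

```python
# Sketch: find the power of two closest to a pool's current pg_num
# (ties round up), as a candidate value for `ceph osd pool set ... pg_num`.
def nearest_power_of_two(n):
    """Return the power of two closest to n; ties round upward."""
    lower = 1 << (n.bit_length() - 1)  # largest power of two <= n
    upper = lower << 1                 # smallest power of two > n
    return upper if (n - lower) >= (upper - n) else lower

for pg_num in (12, 100, 200):
    print(pg_num, "->", nearest_power_of_two(pg_num))
# 12 -> 16, 100 -> 128, 200 -> 256
```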

View File

@ -1,7 +1,7 @@
# Contributor: John Coyle <dx9err@gmail.com>
# Maintainer: John Coyle <dx9err@gmail.com>
pkgname=ceph
pkgver=14.2.6
pkgver=14.2.8
pkgrel=0
pkgdesc="Ceph is a distributed object store and file system"
pkgusers="ceph"
@ -64,7 +64,7 @@ makedepends="
xmlstarlet
yasm
"
source="ceph-14.2.6.tar.bz2"
source="ceph-14.2.8.tar.bz2"
subpackages="
$pkgname-base
$pkgname-common
@ -117,7 +117,7 @@ _sysconfdir=/etc
_udevrulesdir=/etc/udev/rules.d
_python_sitelib=/usr/lib/python2.7/site-packages
builddir=$srcdir/ceph-14.2.6
builddir=$srcdir/ceph-14.2.8
build() {
export CEPH_BUILD_VIRTUALENV=$builddir

View File

@ -101,7 +101,7 @@
# main package definition
#################################################################################
Name: ceph
Version: 14.2.6
Version: 14.2.8
Release: 0%{?dist}
%if 0%{?fedora} || 0%{?rhel}
Epoch: 2
@ -117,7 +117,7 @@ License: LGPL-2.1 and CC-BY-SA-3.0 and GPL-2.0 and BSL-1.0 and BSD-3-Clause and
Group: System/Filesystems
%endif
URL: http://ceph.com/
Source0: %{?_remote_tarball_prefix}ceph-14.2.6.tar.bz2
Source0: %{?_remote_tarball_prefix}ceph-14.2.8.tar.bz2
%if 0%{?suse_version}
# _insert_obs_source_lines_here
ExclusiveArch: x86_64 aarch64 ppc64le s390x
@ -149,12 +149,8 @@ BuildRequires: fuse-devel
%if 0%{?rhel} == 7
# devtoolset offers newer make and valgrind-devel, but the old ones are good
# enough.
%ifarch x86_64
BuildRequires: devtoolset-8-gcc-c++ >= 8.2.1
%else
BuildRequires: devtoolset-7-gcc-c++ >= 7.3.1-5.13
%endif
%else
BuildRequires: gcc-c++
%endif
BuildRequires: gdbm
@ -296,6 +292,7 @@ BuildRequires: python%{_python_buildid}-PyJWT
BuildRequires: python%{_python_buildid}-Routes
BuildRequires: python%{_python_buildid}-Werkzeug
BuildRequires: python%{_python_buildid}-numpy-devel
BuildRequires: rpm-build
BuildRequires: xmlsec1-devel
%endif
%endif
@ -1105,7 +1102,7 @@ This package provides Ceph's default alerts for Prometheus.
# common
#################################################################################
%prep
%autosetup -p1 -n ceph-14.2.6
%autosetup -p1 -n ceph-14.2.8
%build
# LTO can be enabled as soon as the following GCC bug is fixed:
@ -1554,6 +1551,7 @@ fi
%files mgr
%{_bindir}/ceph-mgr
%dir %{_datadir}/ceph/mgr
%{_datadir}/ceph/mgr/alerts
%{_datadir}/ceph/mgr/ansible
%{_datadir}/ceph/mgr/balancer
%{_datadir}/ceph/mgr/crash

View File

@ -149,12 +149,8 @@ BuildRequires: fuse-devel
%if 0%{?rhel} == 7
# devtoolset offers newer make and valgrind-devel, but the old ones are good
# enough.
%ifarch x86_64
BuildRequires: devtoolset-8-gcc-c++ >= 8.2.1
%else
BuildRequires: devtoolset-7-gcc-c++ >= 7.3.1-5.13
%endif
%else
BuildRequires: gcc-c++
%endif
BuildRequires: gdbm
@ -296,6 +292,7 @@ BuildRequires: python%{_python_buildid}-PyJWT
BuildRequires: python%{_python_buildid}-Routes
BuildRequires: python%{_python_buildid}-Werkzeug
BuildRequires: python%{_python_buildid}-numpy-devel
BuildRequires: rpm-build
BuildRequires: xmlsec1-devel
%endif
%endif
@ -1554,6 +1551,7 @@ fi
%files mgr
%{_bindir}/ceph-mgr
%dir %{_datadir}/ceph/mgr
%{_datadir}/ceph/mgr/alerts
%{_datadir}/ceph/mgr/ansible
%{_datadir}/ceph/mgr/balancer
%{_datadir}/ceph/mgr/crash

View File

@ -1,8 +1,20 @@
ceph (14.2.6-1xenial) xenial; urgency=medium
ceph (14.2.8-1xenial) xenial; urgency=medium
*
-- Jenkins Build Slave User <jenkins-build@slave-ubuntu02.front.sepia.ceph.com> Wed, 08 Jan 2020 18:48:19 +0000
-- Jenkins Build Slave User <jenkins-build@ceph-builders> Mon, 02 Mar 2020 18:02:26 +0000
ceph (14.2.8-1) stable; urgency=medium
* New upstream release
-- Ceph Release Team <ceph-maintainers@ceph.com> Mon, 02 Mar 2020 17:49:19 +0000
ceph (14.2.7-1) stable; urgency=medium
* New upstream release
-- Ceph Release Team <ceph-maintainers@ceph.com> Fri, 31 Jan 2020 17:07:50 +0000
ceph (14.2.6-1) stable; urgency=medium

View File

@ -137,14 +137,14 @@ function(do_build_boost version)
check_boost_version("${PROJECT_SOURCE_DIR}/src/boost" ${version})
set(source_dir
SOURCE_DIR "${PROJECT_SOURCE_DIR}/src/boost")
elseif(version VERSION_GREATER 1.67)
elseif(version VERSION_GREATER 1.72)
message(FATAL_ERROR "Unknown BOOST_REQUESTED_VERSION: ${version}")
else()
message(STATUS "boost will be downloaded...")
# NOTE: If you change this version number make sure the package is available
# at the three URLs below (may involve uploading to download.ceph.com)
set(boost_version 1.67.0)
set(boost_sha256 2684c972994ee57fc5632e03bf044746f6eb45d4920c343937a465fd67a5adba)
set(boost_version 1.72.0)
set(boost_sha256 59c9b274bc451cf91a9ba1dd2c7fdcaf5d60b1b3aa83f2c9fa143417cc660722)
string(REPLACE "." "_" boost_version_underscore ${boost_version} )
set(boost_url
https://dl.bintray.com/boostorg/release/${boost_version}/source/boost_${boost_version_underscore}.tar.bz2)
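The pinned ``boost_sha256`` above is compared against the downloaded tarball. The verification step amounts to something like this sketch (generic hashing with a made-up payload, not the actual CMake code):

```python
import hashlib

# Sketch of checksum pinning: downloaded bytes must hash to the
# pinned SHA-256 before the build uses the tarball.
def matches_pinned_sha256(data: bytes, pinned: str) -> bool:
    """Compare the SHA-256 of data against a pinned hex digest."""
    return hashlib.sha256(data).hexdigest() == pinned

payload = b"example tarball contents"          # stand-in for the tarball
pinned = hashlib.sha256(payload).hexdigest()   # stand-in for boost_sha256
print(matches_pinned_sha256(payload, pinned))      # True
print(matches_pinned_sha256(b"tampered", pinned))  # False
```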

File diff suppressed because it is too large

View File

@ -1,5 +1,6 @@
lib/systemd/system/ceph-mgr*
usr/bin/ceph-mgr
usr/share/ceph/mgr/alerts
usr/share/ceph/mgr/ansible
usr/share/ceph/mgr/balancer
usr/share/ceph/mgr/crash

View File

@ -13,7 +13,7 @@
{%- if edit_on_github_url %}
<div id="docubetter" align="right" style="display:none; padding: 15px; font-weight: bold;">
<a id="edit-on-github" href="{{ edit_on_github_url }}" rel="nofollow">{{ _('Edit on GitHub')}}</a> | <a href="https://github.com/ceph/ceph/projects/4">Report a Documentation Bug</a>
<a id="edit-on-github" href="{{ edit_on_github_url }}" rel="nofollow">{{ _('Edit on GitHub')}}</a> | <a href="https://pad.ceph.com/p/Report_Documentation_Bugs">Report a Documentation Bug</a>
</div>
{%- endif %}

View File

@ -15,8 +15,11 @@ follow a predictable and robust way of preparing, activating, and starting OSDs
There is currently support for ``lvm``, and plain disks (with GPT partitions)
that may have been deployed with ``ceph-disk``.
``zfs`` support is available for running a FreeBSD cluster.
* :ref:`ceph-volume-lvm`
* :ref:`ceph-volume-simple`
* :ref:`ceph-volume-zfs`
**Node inventory**
@ -76,3 +79,5 @@ and ``ceph-disk`` is fully disabled. Encryption is fully supported.
simple/activate
simple/scan
simple/systemd
zfs/index
zfs/inventory

View File

@ -26,6 +26,75 @@ the back end can be specified with:
* :ref:`--filestore <ceph-volume-lvm-prepare_filestore>`
* :ref:`--bluestore <ceph-volume-lvm-prepare_bluestore>`
.. _ceph-volume-lvm-prepare_bluestore:
``bluestore``
-------------
The :term:`bluestore` objectstore is the default for new OSDs. It offers a bit
more flexibility for devices compared to :term:`filestore`.
Bluestore supports the following configurations:
* A block device, a block.wal, and a block.db device
* A block device and a block.wal device
* A block device and a block.db device
* A single block device
The bluestore subcommand accepts physical block devices, partitions on
physical block devices or logical volumes as arguments for the various device
parameters. If a physical device is provided, a logical volume will be created.
A volume group will either be created or reused if its name begins with ``ceph``.
This allows a simpler approach to using LVM but at the cost of flexibility:
there are no options or configurations to change how the LV is created.
The ``block`` is specified with the ``--data`` flag, and in its simplest use
case it looks like::
ceph-volume lvm prepare --bluestore --data vg/lv
A raw device can be specified in the same way::
ceph-volume lvm prepare --bluestore --data /path/to/device
For enabling :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required::
ceph-volume lvm prepare --bluestore --dmcrypt --data vg/lv
If a ``block.db`` or a ``block.wal`` is needed (they are optional for
bluestore) they can be specified with ``--block.db`` and ``--block.wal``
accordingly. These can be a physical device, a partition or
a logical volume.
Partitions used for ``block.db`` and ``block.wal`` aren't made into logical
volumes because they can be used as-is.
While creating the OSD directory, the process will use a ``tmpfs`` mount to
place all the files needed for the OSD. These files are initially created by
``ceph-osd --mkfs`` and are fully ephemeral.
A symlink is always created for the ``block`` device, and optionally for
``block.db`` and ``block.wal``. For a cluster with a default name, and an OSD
id of 0, the directory could look like::
# ls -l /var/lib/ceph/osd/ceph-0
lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block -> /dev/ceph-be2b6fbd-bcf2-4c51-b35d-a35a162a02f0/osd-block-25cf0a05-2bc6-44ef-9137-79d65bd7ad62
lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.db -> /dev/sda1
lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.wal -> /dev/ceph/osd-wal-0
-rw-------. 1 ceph ceph 37 Oct 20 13:05 ceph_fsid
-rw-------. 1 ceph ceph 37 Oct 20 13:05 fsid
-rw-------. 1 ceph ceph 55 Oct 20 13:05 keyring
-rw-------. 1 ceph ceph 6 Oct 20 13:05 ready
-rw-------. 1 ceph ceph 10 Oct 20 13:05 type
-rw-------. 1 ceph ceph 2 Oct 20 13:05 whoami
In the above case, a device was used for ``block`` so ``ceph-volume`` created
a volume group and a logical volume using the following convention:
* volume group name: ``ceph-{cluster fsid}`` or if the vg exists already
``ceph-{random uuid}``
* logical volume name: ``osd-block-{osd_fsid}``
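The naming convention above can be sketched as a small helper (hypothetical function, not part of ceph-volume's API; the fsids reuse the example values from the directory listing above):

```python
import uuid

# Sketch of ceph-volume's VG/LV naming convention: the VG is named after
# the cluster fsid, unless a VG with that name already exists, in which
# case a random uuid is used instead; the LV is named after the OSD fsid.
def vg_lv_names(cluster_fsid, osd_fsid, vg_exists=False):
    """Return the (volume group, logical volume) names."""
    vg = f"ceph-{uuid.uuid4() if vg_exists else cluster_fsid}"
    lv = f"osd-block-{osd_fsid}"
    return vg, lv

vg, lv = vg_lv_names("be2b6fbd-bcf2-4c51-b35d-a35a162a02f0",
                     "25cf0a05-2bc6-44ef-9137-79d65bd7ad62")
print(vg)  # ceph-be2b6fbd-bcf2-4c51-b35d-a35a162a02f0
print(lv)  # osd-block-25cf0a05-2bc6-44ef-9137-79d65bd7ad62
```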
.. _ceph-volume-lvm-prepare_filestore:
``filestore``
@ -33,41 +102,47 @@ the back end can be specified with:
This is the OSD backend that allows preparation of logical volumes for
a :term:`filestore` objectstore OSD.
It can use a logical volume for the OSD data and a partitioned physical device
or logical volume for the journal. No special preparation is needed for these
volumes other than following the minimum size requirements for data and
journal.
It can use a logical volume for the OSD data and a physical device, a partition
or logical volume for the journal. A physical device will have a logical volume
created on it. A volume group will either be created or reused if its name begins
with ``ceph``. No special preparation is needed for these volumes other than
following the minimum size requirements for data and journal.
The API call looks like::
The CLI call for a basic standalone filestore OSD looks like this::
ceph-volume lvm prepare --filestore --data volume_group/lv_name --journal journal
ceph-volume lvm prepare --filestore --data <data block device>
To deploy filestore with an external journal::
ceph-volume lvm prepare --filestore --data <data block device> --journal <journal block device>
For enabling :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required::
ceph-volume lvm prepare --filestore --dmcrypt --data volume_group/lv_name --journal journal
ceph-volume lvm prepare --filestore --dmcrypt --data <data block device> --journal <journal block device>
There is flexibility to use a raw device or partition as well for ``--data``
that will be converted to a logical volume. This is not ideal in all situations
since ``ceph-volume`` is just going to create a unique volume group and
a logical volume from that device.
Both the journal and data block device can take three forms:
When using logical volumes for ``--data``, the value *must* be a volume group
name and a logical volume name separated by a ``/``. Since logical volume names
are not enforced for uniqueness, this prevents using the wrong volume. The
``--journal`` can be either a logical volume *or* a partition.
* a physical block device
* a partition on a physical block device
* a logical volume
When using a partition, it *must* contain a ``PARTUUID`` discoverable by
``blkid``, so that it can later be identified correctly regardless of the
device name (or path).
When using logical volumes the value *must* be of the format
``volume_group/logical_volume``. Since logical volume names
are not enforced for uniqueness, this prevents accidentally
choosing the wrong volume.
When using a partition, this is how it would look for ``/dev/sdc1``::
When using a partition, it *must* contain a ``PARTUUID`` that can be
discovered by ``blkid``. This ensures it can later be identified correctly
regardless of the device name (or path).
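Extracting that ``PARTUUID`` from ``blkid`` output might look like this sketch (the sample line and its values are illustrative, not real device output):

```python
import re

# Sketch: pull the PARTUUID tag out of a blkid output line so the
# partition can be found later regardless of its /dev name.
def partuuid_from_blkid_line(line):
    """Return the PARTUUID value from a blkid line, or None."""
    m = re.search(r'PARTUUID="([0-9a-fA-F-]+)"', line)
    return m.group(1) if m else None

sample = ('/dev/sdc1: TYPE="ceph journal" '
          'PARTUUID="586f2b07-7c33-4f42-8cdc-8108f1f7316b"')
print(partuuid_from_blkid_line(sample))
# 586f2b07-7c33-4f42-8cdc-8108f1f7316b
```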
For example: passing a logical volume for data and a partition ``/dev/sdc1`` for
the journal::
ceph-volume lvm prepare --filestore --data volume_group/lv_name --journal /dev/sdc1
For a logical volume, just like for ``--data``, a volume group and logical
volume name are required::
Passing a bare device for data and a logical volume as the journal::
ceph-volume lvm prepare --filestore --data volume_group/lv_name --journal volume_group/journal_lv
ceph-volume lvm prepare --filestore --data /dev/sdc --journal volume_group/journal_lv
A generated uuid is used to ask the cluster for a new OSD. These two pieces are
crucial for identifying an OSD and will later be used throughout the
@ -166,72 +241,6 @@ can be started later (for detailed metadata description see
:ref:`ceph-volume-lvm-tags`).
.. _ceph-volume-lvm-prepare_bluestore:
``bluestore``
-------------
The :term:`bluestore` objectstore is the default for new OSDs. It offers a bit
more flexibility for devices. Bluestore supports the following configurations:
* A block device, a block.wal, and a block.db device
* A block device and a block.wal device
* A block device and a block.db device
* A single block device
It can accept a whole device (or partition), or a logical volume for ``block``.
If a physical device is provided it will then be turned into a logical volume.
This allows a simpler approach at using LVM but at the cost of flexibility:
there are no options or configurations to change how the LV is created.
The ``block`` is specified with the ``--data`` flag, and in its simplest use
case it looks like::
ceph-volume lvm prepare --bluestore --data vg/lv
A raw device can be specified in the same way::
ceph-volume lvm prepare --bluestore --data /path/to/device
For enabling :ref:`encryption <ceph-volume-lvm-encryption>`, the ``--dmcrypt`` flag is required::
ceph-volume lvm prepare --bluestore --dmcrypt --data vg/lv
If a ``block.db`` or a ``block.wal`` is needed (they are optional for
bluestore) they can be specified with ``--block.db`` and ``--block.wal``
accordingly. These can be a physical device (they **must** be a partition) or
a logical volume.
For both ``block.db`` and ``block.wal`` partitions aren't made logical volumes
because they can be used as-is. Logical Volumes are also allowed.
While creating the OSD directory, the process will use a ``tmpfs`` mount to
place all the files needed for the OSD. These files are initially created by
``ceph-osd --mkfs`` and are fully ephemeral.
A symlink is always created for the ``block`` device, and optionally for
``block.db`` and ``block.wal``. For a cluster with a default name, and an OSD
id of 0, the directory could look like::
# ls -l /var/lib/ceph/osd/ceph-0
lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block -> /dev/ceph-be2b6fbd-bcf2-4c51-b35d-a35a162a02f0/osd-block-25cf0a05-2bc6-44ef-9137-79d65bd7ad62
lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.db -> /dev/sda1
lrwxrwxrwx. 1 ceph ceph 93 Oct 20 13:05 block.wal -> /dev/ceph/osd-wal-0
-rw-------. 1 ceph ceph 37 Oct 20 13:05 ceph_fsid
-rw-------. 1 ceph ceph 37 Oct 20 13:05 fsid
-rw-------. 1 ceph ceph 55 Oct 20 13:05 keyring
-rw-------. 1 ceph ceph 6 Oct 20 13:05 ready
-rw-------. 1 ceph ceph 10 Oct 20 13:05 type
-rw-------. 1 ceph ceph 2 Oct 20 13:05 whoami
In the above case, a device was used for ``block`` so ``ceph-volume`` created
a volume group and a logical volume using the following convention:
* volume group name: ``ceph-{cluster fsid}`` or if the vg exists already
``ceph-{random uuid}``
* logical volume name: ``osd-block-{osd_fsid}``
Crush device class
------------------
@ -300,9 +309,8 @@ Summary
-------
To recap the ``prepare`` process for :term:`bluestore`:
#. Accept a logical volume for block or a raw device (that will get converted
to an lv)
#. Accept partitions or logical volumes for ``block.wal`` or ``block.db``
#. Accepts raw physical devices, partitions on physical devices or logical volumes as arguments.
#. Creates logical volumes on any raw physical devices.
#. Generate a UUID for the OSD
#. Ask the monitor for an OSD ID, reusing the generated UUID
#. OSD data directory is created on a tmpfs mount.
@ -314,7 +322,7 @@ To recap the ``prepare`` process for :term:`bluestore`:
And the ``prepare`` process for :term:`filestore`:
#. Accept only logical volumes for data and journal (both required)
#. Accepts raw physical devices, partitions on physical devices or logical volumes as arguments.
#. Generate a UUID for the OSD
#. Ask the monitor for an OSD ID, reusing the generated UUID
#. OSD data directory is created and data volume mounted

View File

@ -0,0 +1,31 @@
.. _ceph-volume-zfs:
``zfs``
=======
Implements the functionality needed to deploy OSDs from the ``zfs`` subcommand:
``ceph-volume zfs``
The current implementation only works for ZFS on FreeBSD.
**Command Line Subcommands**
* :ref:`ceph-volume-zfs-inventory`
.. not yet implemented
.. * :ref:`ceph-volume-zfs-prepare`
.. * :ref:`ceph-volume-zfs-activate`
.. * :ref:`ceph-volume-zfs-create`
.. * :ref:`ceph-volume-zfs-list`
.. * :ref:`ceph-volume-zfs-scan`
**Internal functionality**
There are other aspects of the ``zfs`` subcommand that are internal and not
exposed to the user; these sections explain how these pieces work together,
clarifying the workflows of the tool.
:ref:`zfs <ceph-volume-zfs-api>`

View File

@ -0,0 +1,19 @@
.. _ceph-volume-zfs-inventory:
``inventory``
=============
The ``inventory`` subcommand queries a host's disk inventory through GEOM and provides
hardware information and metadata on every physical device.
This only works on a FreeBSD platform.
By default the command returns a short, human-readable report of all physical disks.
For programmatic consumption of this report, pass ``--format json`` to generate a
JSON formatted report. This report includes extensive information on the
physical drives, such as disk metadata (like model and size), logical volumes
and whether they are used by Ceph, and whether the disk is usable by Ceph
and, if not, the reasons why.
A device path can be specified to report extensive information on a device in
both plain and JSON formats.

View File

@ -1,51 +1,106 @@
============================
Add/Remove Metadata Server
Deploying Metadata Servers
============================
You must deploy at least one metadata server daemon to use CephFS. Instructions are given here for setting up an MDS manually, but you might prefer to use another tool such as ceph-deploy or ceph-ansible.
Each CephFS file system requires at least one MDS. The cluster operator will
generally use their automated deployment tool to launch required MDS servers as
needed. Rook and ansible (via the ceph-ansible playbooks) are recommended
tools for doing this. For clarity, we also show the systemd commands here which
may be run by the deployment technology if executed on bare-metal.
See `MDS Config Reference`_ for details on configuring metadata servers.
Add a Metadata Server
=====================
Provisioning Hardware for an MDS
================================
#. Create an mds data directory ``/var/lib/ceph/mds/ceph-{$id}``.
The present version of the MDS is single-threaded and CPU-bound for most
activities, including responding to client requests. Even so, an MDS under the
most aggressive client loads still uses about 2 to 3 CPU cores. This is due to
the other miscellaneous upkeep threads working in tandem.
Nevertheless, it is recommended that an MDS server be well provisioned with an
advanced CPU with sufficient cores. Development is on-going to make better use
of available CPU cores in the MDS; it is expected in future versions of Ceph
that the MDS server will improve performance by taking advantage of more cores.
The other dimension to MDS performance is the available RAM for caching. The
MDS necessarily manages a distributed and cooperative metadata cache among all
clients and other active MDSs. Therefore it is essential to provide the MDS
with sufficient RAM to enable faster metadata access and mutation.
Generally, an MDS serving a large cluster of clients (1000 or more) will use at
least 64GB of cache (see also :doc:`/cephfs/cache-size-limits`). An MDS with a larger
cache is not well explored in the largest known community clusters; there may
be diminishing returns where management of such a large cache negatively
impacts performance in surprising ways. It would be best to do analysis with
expected workloads to determine if provisioning more RAM is worthwhile.
In a bare-metal cluster, the best practice is to over-provision hardware for
the MDS server. Even if a single MDS daemon is unable to fully utilize the
hardware, it may be desirable later on to start more active MDS daemons on the
same node to fully utilize the available cores and memory. Additionally, it may
become clear with workloads on the cluster that performance improves with
multiple active MDS on the same node rather than over-provisioning a single
MDS.
Finally, be aware that CephFS is a highly-available file system by supporting
standby MDS (see also :ref:`mds-standby`) for rapid failover. To get a real
benefit from deploying standbys, it is usually necessary to distribute MDS
daemons across at least two nodes in the cluster. Otherwise, a hardware failure
on a single node may result in the file system becoming unavailable.
Co-locating the MDS with other Ceph daemons (hyperconverged) is an effective
and recommended way to accomplish this so long as all daemons are configured to
use available hardware within certain limits. For the MDS, this generally
means limiting its cache size.
Adding an MDS
=============
#. Create an mds data directory ``/var/lib/ceph/mds/ceph-${id}``. The daemon only uses this directory to store its keyring.
#. Edit ``ceph.conf`` and add MDS section. ::
[mds.{$id}]
[mds.${id}]
host = {hostname}
#. Create the authentication key, if you use CephX. ::
$ sudo ceph auth get-or-create mds.{$id} mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-{$id}/keyring
$ sudo ceph auth get-or-create mds.${id} mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' > /var/lib/ceph/mds/ceph-${id}/keyring
#. Start the service. ::
$ sudo service ceph start mds.{$id}
$ sudo systemctl start mds.${id}
#. The status of the cluster shows: ::
#. The status of the cluster should show: ::
mds: cephfs_a-1/1/1 up {0=c=up:active}, 3 up:standby
mds: ${id}:1 {0=${id}=up:active} 2 up:standby
Remove a Metadata Server
========================
.. note:: Ensure that if you remove a metadata server, the remaining metadata
servers will be able to service requests from CephFS clients. If that is not
possible, consider adding a metadata server before destroying the metadata
server you would like to take offline.
Removing an MDS
===============
If you have a metadata server in your cluster that you'd like to remove, you may use
the following method.
#. Create a new Metadata Server as shown in the above section.
#. (Optionally:) Create a new replacement Metadata Server. If there are no
replacement MDS to take over once the MDS is removed, the file system will
become unavailable to clients. If that is not desirable, consider adding a
metadata server before tearing down the metadata server you would like to
take offline.
#. Stop the old Metadata Server and start using the new one. ::
#. Stop the MDS to be removed. ::
$ ceph mds fail <mds name>
$ sudo systemctl stop mds.${id}
#. Remove the ``/var/lib/ceph/mds/ceph-{$id}`` directory on the old Metadata server.
The MDS will automatically notify the Ceph monitors that it is going down.
This enables the monitors to perform instantaneous failover to an available
standby, if one exists. It is unnecessary to use administrative commands to
effect this failover, e.g. through the use of ``ceph mds fail mds.${id}``.
#. Remove the ``/var/lib/ceph/mds/ceph-${id}`` directory on the MDS. ::
$ sudo rm -rf /var/lib/ceph/mds/ceph-${id}
.. _MDS Config Reference: ../mds-config-ref

View File

@ -29,9 +29,9 @@ directory while creating a key for a client using the following syntax. ::
ceph fs authorize *filesystem_name* client.*client_name* /*specified_directory* rw
for example, to restrict client ``foo`` to writing only in the ``bar`` directory of filesystem ``cephfs``, use ::
For example, to restrict client ``foo`` to writing only in the ``bar`` directory of filesystem ``cephfs_a``, use ::
ceph fs authorize cephfs client.foo / r /bar rw
ceph fs authorize cephfs_a client.foo / r /bar rw
results in:
@ -44,7 +44,7 @@ for example, to restrict client ``foo`` to writing only in the ``bar`` directory
To completely restrict the client to the ``bar`` directory, omit the
root directory ::
ceph fs authorize cephfs client.foo /bar rw
ceph fs authorize cephfs_a client.foo /bar rw
Note that if a client's read access is restricted to a path, they will only
be able to mount the filesystem when specifying a readable path in the

View File

@ -8,11 +8,19 @@ Creating pools
A Ceph filesystem requires at least two RADOS pools, one for data and one for metadata.
When configuring these pools, you might consider:
- Using a higher replication level for the metadata pool, as any data
loss in this pool can render the whole filesystem inaccessible.
- Using lower-latency storage such as SSDs for the metadata pool, as this
will directly affect the observed latency of filesystem operations
on clients.
- Using a higher replication level for the metadata pool, as any data loss in
this pool can render the whole filesystem inaccessible.
- Using lower-latency storage such as SSDs for the metadata pool, as this will
directly affect the observed latency of filesystem operations on clients.
- The data pool used to create the file system is the "default" data pool and
the location for storing all inode backtrace information, used for hard link
management and disaster recovery. For this reason, all inodes created in
CephFS have at least one object in the default data pool. If erasure-coded
pools are planned for the file system, it is usually better to use a
replicated pool for the default data pool to improve small-object write and
read performance for updating backtraces. Separately, another erasure-coded
data pool can be added (see also :ref:`ecpool`) that can be used on an entire
hierarchy of directories and files (see also :ref:`file-layouts`).
Refer to :doc:`/rados/operations/pools` to learn more about managing pools. For
example, to create two pools with default settings for use with a filesystem, you
@ -23,6 +31,11 @@ might run the following commands:
$ ceph osd pool create cephfs_data <pg_num>
$ ceph osd pool create cephfs_metadata <pg_num>
Generally, the metadata pool will have at most a few gigabytes of data. For
this reason, a smaller PG count is usually recommended. 64 or 128 is commonly
used in practice for large clusters.
Creating a filesystem
=====================

View File

@ -1,3 +1,4 @@
.. _file-layouts:
File layouts
============

View File

@ -65,14 +65,14 @@ FS Subvolume groups
Create a subvolume group using::
$ ceph fs subvolumegroup create <vol_name> <group_name> [--mode <octal_mode> --pool_layout <data_pool_name>]
$ ceph fs subvolumegroup create <vol_name> <group_name> [--pool_layout <data_pool_name> --uid <uid> --gid <gid> --mode <octal_mode>]
The command succeeds even if the subvolume group already exists.
When creating a subvolume group you can specify its data pool layout (see
:doc:`/cephfs/file-layouts`), and file mode in octal numerals. By default, the
subvolume group is created with an octal file mode '755', and data pool layout
of its parent directory.
:doc:`/cephfs/file-layouts`), uid, gid, and file mode in octal numerals. By default, the
subvolume group is created with an octal file mode '755', uid '0', gid '0' and data pool
layout of its parent directory.
Remove a subvolume group using::
@ -108,17 +108,17 @@ FS Subvolumes
Create a subvolume using::
$ ceph fs subvolume create <vol_name> <subvol_name> [--group_name <subvol_group_name> --mode <octal_mode> --pool_layout <data_pool_name> --size <size_in_bytes>]
$ ceph fs subvolume create <vol_name> <subvol_name> [--size <size_in_bytes> --group_name <subvol_group_name> --pool_layout <data_pool_name> --uid <uid> --gid <gid> --mode <octal_mode>]
The command succeeds even if the subvolume already exists.
When creating a subvolume you can specify its subvolume group, data pool layout,
file mode in octal numerals, and size in bytes. The size of the subvolume is
uid, gid, file mode in octal numerals, and size in bytes. The size of the subvolume is
specified by setting a quota on it (see :doc:`/cephfs/quota`). By default a
subvolume is created within the default subvolume group, and with an octal file
mode '755', data pool layout of its parent directory and no size limit.
mode '755', uid of its subvolume group, gid of its subvolume group, data pool layout of
its parent directory and no size limit.
Remove a subvolume using::
@ -133,6 +133,14 @@ The removal of a subvolume fails if it has snapshots, or is non-existent.
Using the '--force' flag allows the command to succeed even if the subvolume is
non-existent.
Resize a subvolume using::
$ ceph fs subvolume resize <vol_name> <subvol_name> <new_size> [--group_name <subvol_group_name>] [--no_shrink]
The command resizes the subvolume quota using the size specified by 'new_size'.
The '--no_shrink' flag prevents the subvolume from shrinking below its current used size.
The subvolume can be resized to an infinite size by passing 'inf' or 'infinite' as the new_size.
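The resize semantics described above can be sketched in Python. This is an illustrative assumption, not the mgr volumes plugin's actual code; in particular, a quota of ``0`` is assumed here to mean "no size limit":

```python
# Sketch of subvolume resize semantics (illustrative, not the real plugin):
# 'inf'/'infinite' lift the quota entirely, and no_shrink refuses sizes
# below the subvolume's currently used size.

def resolve_new_size(new_size, used_bytes, no_shrink=False):
    """Return the quota in bytes to apply, where 0 means 'no limit'."""
    if isinstance(new_size, str) and new_size.lower() in ("inf", "infinite"):
        return 0  # assumed convention: zero quota == unlimited
    size = int(new_size)
    if no_shrink and size < used_bytes:
        raise ValueError("cannot shrink below used size with --no_shrink")
    return size
```

For example, resizing to ``'infinite'`` removes the quota, while shrinking below the used size only succeeds when ``--no_shrink`` is not given.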
Fetch the absolute path of a subvolume using::

View File

@ -50,8 +50,7 @@ least one :term:`Ceph Metadata Server` running.
.. toctree::
:maxdepth: 1
Add/Remove MDS(s) <add-remove-mds>
MDS states <mds-states>
Provision/Add/Remove MDS(s) <add-remove-mds>
MDS failover and standby configuration <standby>
MDS Configuration Settings <mds-config-ref>
Client Configuration Settings <client-config-ref>
@ -70,7 +69,7 @@ authentication keyring.
.. toctree::
:maxdepth: 1
Create CephFS <createfs>
Create a CephFS file system <createfs>
Mount CephFS <kernel>
Mount CephFS as FUSE <fuse>
Mount CephFS in fstab <fstab>

View File

@ -10,4 +10,5 @@ ceph-volume developer documentation
plugins
lvm
zfs
systemd

View File

@ -0,0 +1,176 @@
.. _ceph-volume-zfs-api:
ZFS
===
The backend of ``ceph-volume zfs`` is ZFS; it relies heavily on the usage of
tags, which are a way for ZFS to allow extending its volume metadata. These
values can later be queried against devices, which is how they get discovered.
Currently this interface is only usable when running on FreeBSD.
.. warning:: These APIs are not meant to be public, but are documented so that
it is clear what the tool is doing behind the scenes. Do not alter
any of these values.
.. _ceph-volume-zfs-tag-api:
Tag API
-------
The process of identifying filesystems, volumes and pools as part of Ceph relies
on applying tags on all volumes. It follows a naming convention for the
namespace that looks like::
ceph.<tag name>=<tag value>
All tags are prefixed by the ``ceph`` keyword to claim ownership of that
namespace and make it easily identifiable. This is how the OSD ID would be used
in the context of zfs tags::
ceph.osd_id=0
Tags on filesystems are stored as properties.
Tags on a zpool are stored in the comment property as a concatenated list
separated by ``;``.
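The concatenated zpool tag format can be illustrated with a small sketch. The helper name is hypothetical; ceph-volume's internal parsing may differ:

```python
# Illustrative parser for the zpool comment property described above:
# a ';'-separated list of ceph.<tag name>=<tag value> entries.

def parse_zpool_tags(comment):
    """Parse a zpool comment string into a dict of ceph tag name -> value."""
    tags = {}
    for entry in comment.split(";"):
        entry = entry.strip()
        if not entry.startswith("ceph."):
            continue  # skip anything in the comment that is not a ceph tag
        key, _, value = entry.partition("=")
        tags[key[len("ceph."):]] = value
    return tags
```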
.. _ceph-volume-zfs-tags:
Metadata
--------
The following describes all the metadata from Ceph OSDs that is stored on a
ZFS filesystem, volume, or pool:
``type``
--------
Describes whether the device is an OSD or Journal, with the ability to expand to
other types when supported.
Example::
ceph.type=osd
``cluster_fsid``
----------------
Example::
ceph.cluster_fsid=7146B649-AE00-4157-9F5D-1DBFF1D52C26
``data_device``
---------------
Example::
ceph.data_device=/dev/ceph/data-0
``data_uuid``
-------------
Example::
ceph.data_uuid=B76418EB-0024-401C-8955-AE6919D45CC3
``journal_device``
------------------
Example::
ceph.journal_device=/dev/ceph/journal-0
``journal_uuid``
----------------
Example::
ceph.journal_uuid=2070E121-C544-4F40-9571-0B7F35C6CB2B
``osd_fsid``
------------
Example::
ceph.osd_fsid=88ab9018-f84b-4d62-90b4-ce7c076728ff
``osd_id``
----------
Example::
ceph.osd_id=1
``block_device``
----------------
Just used on :term:`bluestore` backends. Captures the path to the logical
volume.
Example::
ceph.block_device=/dev/gpt/block-0
``block_uuid``
--------------
Just used on :term:`bluestore` backends. Captures either the logical volume UUID or
the partition UUID.
Example::
ceph.block_uuid=E5F041BB-AAD4-48A8-B3BF-31F7AFD7D73E
``db_device``
-------------
Just used on :term:`bluestore` backends. Captures the path to the logical
volume.
Example::
ceph.db_device=/dev/gpt/db-0
``db_uuid``
-----------
Just used on :term:`bluestore` backends. Captures either the logical volume UUID or
the partition UUID.
Example::
ceph.db_uuid=F9D02CF1-31AB-4910-90A3-6A6302375525
``wal_device``
--------------
Just used on :term:`bluestore` backends. Captures the path to the logical
volume.
Example::
ceph.wal_device=/dev/gpt/wal-0
``wal_uuid``
------------
Just used on :term:`bluestore` backends. Captures either the logical volume UUID or
the partition UUID.
Example::
ceph.wal_uuid=A58D1C68-0D6E-4CB3-8E99-B261AD47CC39
``compression``
---------------
Compression can always be enabled using the native ZFS settings on a volume
or filesystem, and can be activated during creation of the volume or
filesystem.
When activated by ``ceph-volume zfs`` this tag will be created.
Compression set manually AFTER running ``ceph-volume`` will go unnoticed, unless
this tag is also set manually.
Example for an enabled compression device::
ceph.vdo=1

View File

@ -50,6 +50,10 @@ Any options not recognized by ceph-fuse will be passed on to libfuse.
Connect to specified monitor (instead of looking through ceph.conf).
.. option:: -k <path-to-keyring>
Provide path to keyring; useful when it's absent in standard locations.
.. option:: --client_mountpoint/-r root_directory
Use root_directory as the mounted root, rather than the full Ceph tree.

View File

@ -13,6 +13,12 @@ Synopsis
| **osdmaptool** *mapfilename* [--print] [--createsimple *numosd*
[--pgbits *bitsperosd* ] ] [--clobber]
| **osdmaptool** *mapfilename* [--import-crush *crushmap*]
| **osdmaptool** *mapfilename* [--export-crush *crushmap*]
| **osdmaptool** *mapfilename* [--upmap *file*] [--upmap-max *max-optimizations*]
[--upmap-deviation *max-deviation*] [--upmap-pool *poolname*]
[--upmap-save *file*] [--upmap-save *newosdmap*] [--upmap-active]
| **osdmaptool** *mapfilename* [--upmap-cleanup] [--upmap-save *newosdmap*]
Description
@ -21,6 +27,8 @@ Description
**osdmaptool** is a utility that lets you create, view, and manipulate
OSD cluster maps from the Ceph distributed storage system. Notably, it
lets you extract the embedded CRUSH map or import a new CRUSH map.
It can also simulate the upmap balancer mode so you can get a sense of
what is needed to balance your PGs.
Options
@ -111,6 +119,10 @@ Options
mark osds up and in (but do not persist).
.. option:: --mark-out
mark an osd as out (but do not persist)
.. option:: --tree
Displays a hierarchical tree of the map.
@ -119,6 +131,43 @@ Options
clears pg_temp and primary_temp variables.
.. option:: --health
dump health checks
.. option:: --with-default-pool
include default pool when creating map
.. option:: --upmap-cleanup <file>
clean up pg_upmap[_items] entries, writing commands to <file> [default: - for stdout]
.. option:: --upmap <file>
calculate pg upmap entries to balance the pg layout, writing commands to <file> [default: - for stdout]
.. option:: --upmap-max <max-optimizations>
set max upmap entries to calculate [default: 10]
.. option:: --upmap-deviation <max-deviation>
max deviation from target [default: 5]
.. option:: --upmap-pool <poolname>
restrict upmap balancing to one pool; the option can be repeated for multiple pools
.. option:: --upmap-save
write modified OSDMap with upmap changes
.. option:: --upmap-active
Act like an active balancer, keep applying changes until balanced
Example
=======
@ -130,19 +179,19 @@ To view the result::
osdmaptool --print osdmap
To view the mappings of placement groups for pool 0::
To view the mappings of placement groups for pool 1::
osdmaptool --test-map-pgs-dump rbd --pool 0
osdmaptool osdmap --test-map-pgs-dump --pool 1
pool 0 pg_num 8
0.0 [0,2,1] 0
0.1 [2,0,1] 2
0.2 [0,1,2] 0
0.3 [2,0,1] 2
0.4 [0,2,1] 0
0.5 [0,2,1] 0
0.6 [0,1,2] 0
0.7 [1,0,2] 1
1.0 [0,2,1] 0
1.1 [2,0,1] 2
1.2 [0,1,2] 0
1.3 [2,0,1] 2
1.4 [0,2,1] 0
1.5 [0,2,1] 0
1.6 [0,1,2] 0
1.7 [1,0,2] 1
#osd count first primary c wt wt
osd.0 8 5 5 1 1
osd.1 8 1 1 1 1
@ -157,7 +206,7 @@ To view the mappings of placement groups for pool 0::
size 3 8
In which,
#. pool 0 has 8 placement groups. And two tables follow:
#. pool 1 has 8 placement groups. And two tables follow:
#. A table for placement groups. Each row presents a placement group. With columns of:
* placement group id,
@ -201,6 +250,56 @@ placement group distribution, whose standard deviation is 1.41421::
size 20
size 364
To simulate the active balancer in upmap mode::
osdmaptool --upmap upmaps.out --upmap-active --upmap-deviation 6 --upmap-max 11 osdmap
osdmaptool: osdmap file 'osdmap'
writing upmap command output to: upmaps.out
checking for upmap cleanups
upmap, max-count 11, max deviation 6
pools movies photos metadata data
prepared 11/11 changes
Time elapsed 0.00310404 secs
pools movies photos metadata data
prepared 11/11 changes
Time elapsed 0.00283402 secs
pools data metadata movies photos
prepared 11/11 changes
Time elapsed 0.003122 secs
pools photos metadata data movies
prepared 11/11 changes
Time elapsed 0.00324372 secs
pools movies metadata data photos
prepared 1/11 changes
Time elapsed 0.00222609 secs
pools data movies photos metadata
prepared 0/11 changes
Time elapsed 0.00209916 secs
Unable to find further optimization, or distribution is already perfect
osd.0 pgs 41
osd.1 pgs 42
osd.2 pgs 42
osd.3 pgs 41
osd.4 pgs 46
osd.5 pgs 39
osd.6 pgs 39
osd.7 pgs 43
osd.8 pgs 41
osd.9 pgs 46
osd.10 pgs 46
osd.11 pgs 46
osd.12 pgs 46
osd.13 pgs 41
osd.14 pgs 40
osd.15 pgs 40
osd.16 pgs 39
osd.17 pgs 46
osd.18 pgs 46
osd.19 pgs 39
osd.20 pgs 42
Total time elapsed 0.0167765 secs, 5 rounds
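A quick way to judge how balanced a per-OSD PG distribution is, in the spirit of the counts printed above, is to look at its spread. This is only an illustrative helper; osdmaptool computes its own statistics internally:

```python
import statistics

# Sketch: summarize per-OSD PG counts as (min, max, population std dev).
# A smaller spread and standard deviation means a more balanced cluster.

def pg_balance_summary(pg_counts):
    """Return (min, max, population standard deviation) of per-OSD PG counts."""
    return min(pg_counts), max(pg_counts), statistics.pstdev(pg_counts)
```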
Availability
============

ceph/doc/mgr/alerts.rst Normal file
View File

@ -0,0 +1,58 @@
Alerts module
=============
The alerts module can send simple alert messages about cluster health
via e-mail. In the future, it will support other notification methods
as well.
:note: This module is *not* intended to be a robust monitoring
solution. The fact that it is run as part of the Ceph cluster
itself is fundamentally limiting in that a failure of the
ceph-mgr daemon prevents alerts from being sent. This module
can, however, be useful for standalone clusters that have no
other monitoring infrastructure.
Enabling
--------
The *alerts* module is enabled with::
ceph mgr module enable alerts
Configuration
-------------
To configure SMTP, all of the following config options must be set::
ceph config set mgr mgr/alerts/smtp_host *<smtp-server>*
ceph config set mgr mgr/alerts/smtp_destination *<email-address-to-send-to>*
ceph config set mgr mgr/alerts/smtp_sender *<from-email-address>*
By default, the module will use SSL and port 465. To change that,::
ceph config set mgr mgr/alerts/smtp_ssl false # if not SSL
ceph config set mgr mgr/alerts/smtp_port *<port-number>* # if not 465
To authenticate to the SMTP server, you must set the user and password::
ceph config set mgr mgr/alerts/smtp_user *<username>*
ceph config set mgr mgr/alerts/smtp_password *<password>*
By default, the name in the ``From:`` line is simply ``Ceph``. To
change that (e.g., to identify which cluster this is),::
ceph config set mgr mgr/alerts/smtp_from_name 'Ceph Cluster Foo'
By default, the module will check the cluster health once per minute
and, if there is a change, send a message. To change that
frequency,::
ceph config set mgr mgr/alerts/interval *<interval>* # e.g., "5m" for 5 minutes
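Interval strings like ``5m`` above can be sketched as a tiny parser. This is an illustrative assumption about the accepted units, not the alerts module's actual code:

```python
# Hypothetical parser for interval strings such as '5m' (5 minutes).
# The unit table is an assumption for illustration.

_UNITS = {"s": 1, "m": 60, "h": 3600, "d": 86400}

def parse_interval(text):
    """Convert '5m', '1h', or a bare number of seconds into seconds."""
    text = text.strip()
    if text and text[-1] in _UNITS:
        return int(text[:-1]) * _UNITS[text[-1]]
    return int(text)  # bare number: already seconds
```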
Commands
--------
To force an alert to be sent immediately,::
ceph alerts send

View File

@ -425,6 +425,12 @@ The format of url is : `<protocol>:<IP-address>:<port>`
above, check your browser's documentation on how to unblock mixed content.
Alternatively, consider enabling SSL/TLS support in Grafana.
If you are using a self-signed certificate in your Grafana setup, you should
disable certificate verification in the dashboard to avoid refused connections,
e.g. those caused by certificates signed by an unknown CA or not matching the
host name::
$ ceph dashboard set-grafana-api-ssl-verify False
You can directly access Grafana Instance as well to monitor your cluster.
.. _dashboard-sso-support:

View File

@ -29,6 +29,7 @@ sensible.
Writing modules <modules>
Writing orchestrator plugins <orchestrator_modules>
Dashboard module <dashboard>
Alerts module <alerts>
DiskPrediction module <diskprediction>
Local pool module <localpool>
RESTful module <restful>

View File

@ -100,6 +100,22 @@ A `template <https://raw.githubusercontent.com/ceph/ceph/9c54334b615362e0a60442c
This template contains all items and a few triggers. You can customize the triggers afterwards to fit your needs.
Multiple Zabbix servers
^^^^^^^^^^^^^^^^^^^^^^^
It is possible to instruct the zabbix module to send data to multiple Zabbix servers.
The *zabbix_host* parameter can be set to multiple hostnames separated by commas.
Hostnames (or IP addresses) can be followed by a colon and a port number. If a port
number is not present, the module will use the port number defined in *zabbix_port*.
::
ceph zabbix config-set zabbix_host "zabbix1,zabbix2:2222,zabbix3:3333"
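Splitting such a value into per-server host/port pairs can be sketched as follows. The default port here is a stand-in for whatever *zabbix_port* is configured to; the actual module code may differ:

```python
# Illustrative parser for a comma-separated zabbix_host value, where each
# entry is 'host' or 'host:port'. DEFAULT_PORT stands in for zabbix_port.

DEFAULT_PORT = 10051  # assumption: the usual Zabbix trapper port

def parse_zabbix_hosts(zabbix_host, default_port=DEFAULT_PORT):
    """Split 'host[:port],host[:port],...' into a list of (host, port)."""
    servers = []
    for item in zabbix_host.split(","):
        host, sep, port = item.strip().partition(":")
        servers.append((host, int(port) if sep else default_port))
    return servers
```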
Manually sending data
---------------------
If needed, the module can be asked to send data immediately instead of waiting for

View File

@ -60,15 +60,6 @@ Ceph configuration file.
:Default: ``30``
``mon pg warn max per osd``
:Description: Issue a ``HEALTH_WARN`` in cluster log if the average number
of PGs per (in) OSD is above this number. (a non-positive number
disables this)
:Type: Integer
:Default: ``300``
``mon pg warn min objects``
:Description: Do not warn if the total number of objects in cluster is below
@ -207,7 +198,7 @@ Ceph configuration file.
value is the same as ``pg_num`` with ``mkpool``.
:Type: 32-bit Integer
:Default: ``8``
:Default: ``32``
``osd pool default pgp num``

View File

@ -1,3 +1,5 @@
.. _ecpool:
=============
Erasure code
=============

View File

@ -636,6 +636,23 @@ The PG count for existing pools can be increased or new pools can be created.
Please refer to :ref:`choosing-number-of-placement-groups` for more
information.
POOL_PG_NUM_NOT_POWER_OF_TWO
____________________________
One or more pools has a ``pg_num`` value that is not a power of two.
Although this is not strictly incorrect, it does lead to a less
balanced distribution of data because some PGs have roughly twice as
much data as others.
This is easily corrected by setting the ``pg_num`` value for the
affected pool(s) to a nearby power of two::
ceph osd pool set <pool-name> pg_num <value>
This health warning can be disabled with::
ceph config set global mon_warn_on_pool_pg_num_not_power_of_two false
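Choosing the "nearby power of two" mentioned above can be sketched in a few lines; the helper name is illustrative:

```python
# Sketch: pick the power of two closest to an existing pg_num, as the
# health warning above suggests. Ties round up to the larger power.

def nearest_power_of_two(pg_num):
    """Return the power of two closest to pg_num (ties round up)."""
    if pg_num < 1:
        raise ValueError("pg_num must be positive")
    lower = 1 << (pg_num.bit_length() - 1)  # largest power of two <= pg_num
    upper = lower << 1                      # smallest power of two > pg_num
    return lower if (pg_num - lower) < (upper - pg_num) else upper
```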
POOL_TOO_FEW_PGS
________________

View File

@ -345,7 +345,7 @@ You may set values for the following keys:
``crush_rule``
:Description: The rule to use for mapping object placement in the cluster.
:Type: Integer
:Type: String
.. _allow_ec_overwrites:

View File

@ -23,14 +23,12 @@ use with::
ceph features
A word of caution
Balancer module
-----------------
This is a new feature and not very user friendly. At the time of this
writing we are working on a new `balancer` module for ceph-mgr that
will eventually do all of this automatically.
The new `balancer` module for ceph-mgr will automatically balance
the number of PGs per OSD. See ``Balancer``.
Until then,
Offline optimization
--------------------
@ -43,7 +41,9 @@ Upmap entries are updated with an offline optimizer built into ``osdmaptool``.
#. Run the optimizer::
osdmaptool om --upmap out.txt [--upmap-pool <pool>] [--upmap-max <max-count>] [--upmap-deviation <max-deviation>]
osdmaptool om --upmap out.txt [--upmap-pool <pool>]
[--upmap-max <max-optimizations>] [--upmap-deviation <max-deviation>]
[--upmap-active]
It is highly recommended that optimization be done for each pool
individually, or for sets of similarly-utilized pools. You can
@ -52,24 +52,34 @@ Upmap entries are updated with an offline optimizer built into ``osdmaptool``.
kind of data (e.g., RBD image pools, yes; RGW index pool and RGW
data pool, no).
The ``max-count`` value is the maximum number of upmap entries to
identify in the run. The default is 100, but you may want to make
this a smaller number so that the tool completes more quickly (but
does less work). If it cannot find any additional changes to make
it will stop early (i.e., when the pool distribution is perfect).
The ``max-optimizations`` value is the maximum number of upmap entries to
identify in the run. The default is `10` like the ceph-mgr balancer module,
but you should use a larger number if you are doing offline optimization.
If it cannot find any additional changes to make it will stop early
(i.e., when the pool distribution is perfect).
The ``max-deviation`` value defaults to `.01` (i.e., 1%). If an OSD
utilization varies from the average by less than this amount it
will be considered perfect.
The ``max-deviation`` value defaults to `5`. If an OSD PG count
varies from the computed target number by less than or equal
to this amount it will be considered perfect.
#. The proposed changes are written to the output file ``out.txt`` in
the example above. These are normal ceph CLI commands that can be
run to apply the changes to the cluster. This can be done with::
The ``--upmap-active`` option simulates the behavior of the active
balancer in upmap mode. It keeps cycling until the OSDs are balanced
and reports how many rounds and how long each round is taking. The
elapsed time for rounds indicates the CPU load ceph-mgr will be
consuming when it tries to compute the next optimization plan.
#. Apply the changes::
source out.txt
The proposed changes are written to the output file ``out.txt`` in
the example above. These are normal ceph CLI commands that can be
run to apply the changes to the cluster.
The above steps can be repeated as many times as necessary to achieve
a perfect distribution of PGs for each set of pools.
You can see some (gory) details about what the tool is doing by
passing ``--debug-osd 10`` to ``osdmaptool``.
passing ``--debug-osd 10`` and even more with ``--debug-crush 10``
to ``osdmaptool``.

View File

@ -144,6 +144,35 @@ Capability syntax follows the form::
the use of this capability is restricted to clients connecting from
this network.
- **Manager Caps:** Manager (``ceph-mgr``) capabilities include
``r``, ``w``, ``x`` access settings or ``profile {name}``. For example: ::
mgr 'allow {access-spec} [network {network/prefix}]'
mgr 'profile {name} [{key1} {match-type} {value1} ...] [network {network/prefix}]'
Manager capabilities can also be specified for specific commands,
all commands exported by a built-in manager service, or all commands
exported by a specific add-on module. For example: ::
mgr 'allow command "{command-prefix}" [with {key1} {match-type} {value1} ...] [network {network/prefix}]'
mgr 'allow service {service-name} {access-spec} [network {network/prefix}]'
mgr 'allow module {module-name} [with {key1} {match-type} {value1} ...] {access-spec} [network {network/prefix}]'
The ``{access-spec}`` syntax is as follows: ::
* | all | [r][w][x]
The ``{service-name}`` is one of the following: ::
mgr | osd | pg | py
The ``{match-type}`` is one of the following: ::
= | prefix | regex
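The three ``{match-type}`` comparisons listed above behave as this sketch suggests; the real capability parser lives inside the Ceph daemons and is not this function:

```python
import re

# Illustrative semantics of the '=', 'prefix', and 'regex' match types
# used when constraining manager capability arguments.

def value_matches(match_type, pattern, value):
    """Check whether value satisfies pattern under the given match type."""
    if match_type == "=":
        return value == pattern           # exact equality
    if match_type == "prefix":
        return value.startswith(pattern)  # leading-substring match
    if match_type == "regex":
        return re.match(pattern, value) is not None  # anchored at start
    raise ValueError("unknown match-type: %s" % match_type)
```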
- **Metadata Server Caps:** For administrators, use ``allow *``. For all
other users, such as CephFS clients, consult :doc:`/cephfs/client-auth`
@ -240,12 +269,15 @@ The following entries describe valid capability profiles:
so they have permissions to add keys, etc. when bootstrapping
an ``rbd-mirror`` daemon.
``profile rbd`` (Monitor and OSD)
``profile rbd`` (Manager, Monitor, and OSD)
:Description: Gives a user permissions to manipulate RBD images. When used
as a Monitor cap, it provides the minimal privileges required
by an RBD client application. When used as an OSD cap, it
provides read-write access to an RBD client application.
by an RBD client application; this includes the ability
to blacklist other client users. When used as an OSD cap, it
provides read-write access to the specified pool to an
RBD client application. The Manager cap supports optional
``pool`` and ``namespace`` keyword arguments.
``profile rbd-mirror`` (Monitor only)
@ -253,9 +285,11 @@ The following entries describe valid capability profiles:
RBD mirroring config-key secrets. It provides the minimal
privileges required for the ``rbd-mirror`` daemon.
``profile rbd-read-only`` (OSD only)
``profile rbd-read-only`` (Manager and OSD)
:Description: Gives a user read-only permissions to RBD images.
:Description: Gives a user read-only permissions to RBD images. The Manager
cap supports optional ``pool`` and ``namespace`` keyword
arguments.
Pool

View File

@ -439,7 +439,18 @@ information stored in OSDs.::
--cap mon 'allow *'
ceph-authtool /path/to/admin.keyring -n client.admin \
--cap mon 'allow *' --cap osd 'allow *' --cap mds 'allow *'
ceph-monstore-tool $ms rebuild -- --keyring /path/to/admin.keyring
# add one or more ceph-mgr's key to the keyring. in this case, an encoded key
# for mgr.x is added, you can find the encoded key in
# /etc/ceph/${cluster}.${mgr_name}.keyring on the machine where ceph-mgr is
# deployed
ceph-authtool /path/to/admin.keyring --add-key 'AQDN8kBe9PLWARAAZwxXMr+n85SBYbSlLcZnMA==' -n mgr.x \
--cap mon 'allow profile mgr' --cap osd 'allow *' --cap mds 'allow *'
# if your monitors' ids are not single characters like 'a', 'b', 'c', please
# specify them in the command line by passing them as arguments of the "--mon-ids"
# option. if you are not sure, please check your ceph.conf to see if there is any
# sections named like '[mon.foo]'. don't pass the "--mon-ids" option, if you are
# using DNS SRV for looking up monitors.
ceph-monstore-tool $ms rebuild -- --keyring /path/to/admin.keyring --mon-ids alpha beta gamma
# make a backup of the corrupted store.db just in case! repeat for
# all monitors.

View File

@ -258,7 +258,7 @@ pushed or pulled using the pubsub sync module.
"eTag":"",
"versionId":"",
"sequencer": "",
"metadata":""
"metadata":[]
}
},
"eventId":"",
@ -283,7 +283,7 @@ pushed or pulled using the pubsub sync module.
- s3.object.version: object version in case of versioned bucket
- s3.object.sequencer: monotonically increasing identifier of the change per object (hexadecimal format)
- s3.object.metadata: any metadata set on the object sent as: ``x-amz-meta-`` (an extension to the S3 notification API)
- s3.eventId: not supported (an extension to the S3 notification API)
- s3.eventId: unique ID of the event, that could be used for acking (an extension to the S3 notification API)
.. _PubSub Module : ../pubsub-module
.. _S3 Notification Compatibility: ../s3-notification-compatibility

View File

@ -438,7 +438,7 @@ the events will have an S3-compatible record format (JSON):
"eTag":"",
"versionId":"",
"sequencer":"",
"metadata":""
"metadata":[]
}
},
"eventId":"",
@ -452,7 +452,6 @@ the events will have an S3-compatible record format (JSON):
- requestParameters: not supported
- responseElements: not supported
- s3.configurationId: notification ID that created the subscription for the event
- s3.eventId: unique ID of the event, that could be used for acking (an extension to the S3 notification API)
- s3.bucket.name: name of the bucket
- s3.bucket.ownerIdentity.principalId: owner of the bucket
- s3.bucket.arn: ARN of the bucket

View File

@ -35,13 +35,13 @@ recommended that you utilize a more restricted user wherever possible.
To `create a Ceph user`_, with ``ceph`` specify the ``auth get-or-create``
command, user name, monitor caps, and OSD caps::
ceph auth get-or-create client.{ID} mon 'profile rbd' osd 'profile {profile name} [pool={pool-name}][, profile ...]'
ceph auth get-or-create client.{ID} mon 'profile rbd' osd 'profile {profile name} [pool={pool-name}][, profile ...]' mgr 'profile rbd [pool={pool-name}]'
For example, to create a user ID named ``qemu`` with read-write access to the
pool ``vms`` and read-only access to the pool ``images``, execute the
following::
ceph auth get-or-create client.qemu mon 'profile rbd' osd 'profile rbd pool=vms, profile rbd-read-only pool=images'
ceph auth get-or-create client.qemu mon 'profile rbd' osd 'profile rbd pool=vms, profile rbd-read-only pool=images' mgr 'profile rbd pool=images'
The output from the ``ceph auth get-or-create`` command will be the keyring for
the specified user, which can be written to ``/etc/ceph/ceph.client.{ID}.keyring``.

View File

@ -4,6 +4,17 @@
See `Block Device`_ for additional details.
Generic IO Settings
===================
``rbd compression hint``
:Description: Hint to send to the OSDs on write operations. If set to `compressible` and the OSD `bluestore compression mode` setting is `passive`, the OSD will attempt to compress the data. If set to `incompressible` and the OSD compression setting is `aggressive`, the OSD will not attempt to compress the data.
:Type: Enum
:Required: No
:Default: ``none``
:Values: ``none``, ``compressible``, ``incompressible``
Cache Settings
=======================

View File

@ -132,9 +132,9 @@ Setup Ceph Client Authentication
If you have `cephx authentication`_ enabled, create a new user for Nova/Cinder
and Glance. Execute the following::
ceph auth get-or-create client.glance mon 'profile rbd' osd 'profile rbd pool=images'
ceph auth get-or-create client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images'
ceph auth get-or-create client.cinder-backup mon 'profile rbd' osd 'profile rbd pool=backups'
ceph auth get-or-create client.glance mon 'profile rbd' osd 'profile rbd pool=images' mgr 'profile rbd pool=images'
ceph auth get-or-create client.cinder mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images' mgr 'profile rbd pool=volumes, profile rbd pool=vms'
ceph auth get-or-create client.cinder-backup mon 'profile rbd' osd 'profile rbd pool=backups' mgr 'profile rbd pool=backups'
Add the keyrings for ``client.cinder``, ``client.glance``, and
``client.cinder-backup`` to the appropriate nodes and change their ownership::

View File

@ -148,25 +148,33 @@ function install_pkg_on_ubuntu {
function install_boost_on_ubuntu {
local codename=$1
if dpkg -s ceph-libboost1.67-dev &> /dev/null; then
$SUDO env DEBIAN_FRONTEND=noninteractive apt-get -y remove 'ceph-libboost.*1.67.*'
$SUDO rm /etc/apt/sources.list.d/ceph-libboost1.67.list
fi
local project=libboost
local ver=1.72
local sha1=1d7c7a00cc3f37e340bae0360191a757b44ec80c
install_pkg_on_ubuntu \
ceph-libboost1.67 \
dd38c27740c1f9a9e6719a07eef84a1369dc168b \
$project \
$sha1 \
$codename \
ceph-libboost-atomic1.67-dev \
ceph-libboost-chrono1.67-dev \
ceph-libboost-container1.67-dev \
ceph-libboost-context1.67-dev \
ceph-libboost-coroutine1.67-dev \
ceph-libboost-date-time1.67-dev \
ceph-libboost-filesystem1.67-dev \
ceph-libboost-iostreams1.67-dev \
ceph-libboost-program-options1.67-dev \
ceph-libboost-python1.67-dev \
ceph-libboost-random1.67-dev \
ceph-libboost-regex1.67-dev \
ceph-libboost-system1.67-dev \
ceph-libboost-thread1.67-dev \
ceph-libboost-timer1.67-dev
ceph-libboost-atomic$ver-dev \
ceph-libboost-chrono$ver-dev \
ceph-libboost-container$ver-dev \
ceph-libboost-context$ver-dev \
ceph-libboost-coroutine$ver-dev \
ceph-libboost-date-time$ver-dev \
ceph-libboost-filesystem$ver-dev \
ceph-libboost-iostreams$ver-dev \
ceph-libboost-program-options$ver-dev \
ceph-libboost-python$ver-dev \
ceph-libboost-random$ver-dev \
ceph-libboost-regex$ver-dev \
ceph-libboost-system$ver-dev \
ceph-libboost-test$ver-dev \
ceph-libboost-thread$ver-dev \
ceph-libboost-timer$ver-dev
}
function version_lt {
@ -350,7 +358,7 @@ else
$SUDO yum -y install centos-release-scl-rh
$SUDO yum-config-manager --disable centos-sclo-rh
$SUDO yum-config-manager --enable centos-sclo-rh-testing
dts_ver=7
dts_ver=8
;;
esac
elif test $ID = rhel -a $MAJOR_VERSION = 7 ; then
@ -375,7 +383,11 @@ else
opensuse*|suse|sles)
echo "Using zypper to install dependencies"
zypp_install="zypper --gpg-auto-import-keys --non-interactive install --no-recommends"
$SUDO $zypp_install systemd-rpm-macros
$SUDO $zypp_install systemd-rpm-macros rpm-build || exit 1
if [ -e /usr/bin/python2 ] ; then
# see https://tracker.ceph.com/issues/23981
$SUDO $zypp_install python2-virtualenv python2-devel || exit 1
fi
munge_ceph_spec_in $for_make_check $DIR/ceph.spec
$SUDO $zypp_install $(rpmspec -q --buildrequires $DIR/ceph.spec) || exit 1
$SUDO $zypp_install libxmlsec1-1 libxmlsec1-nss1 libxmlsec1-openssl1 xmlsec1-devel xmlsec1-openssl-devel

View File

@ -61,25 +61,11 @@ download_boost() {
rm -rf src/boost
}
_python_autoselect() {
python_command=
for interpreter in python2.7 python3 ; do
type $interpreter > /dev/null 2>&1 || continue
python_command=$interpreter
break
done
if [ -z "$python_command" ] ; then
echo "Could not find a suitable python interpreter! Bailing out."
exit 1
fi
echo $python_command
}
build_dashboard_frontend() {
CURR_DIR=`pwd`
TEMP_DIR=`mktemp -d`
$CURR_DIR/src/tools/setup-virtualenv.sh --python=$(_python_autoselect) $TEMP_DIR
$CURR_DIR/src/tools/setup-virtualenv.sh $TEMP_DIR
$TEMP_DIR/bin/pip install nodeenv
$TEMP_DIR/bin/nodeenv -p --node=10.13.0
cd src/pybind/mgr/dashboard/frontend
@ -152,8 +138,8 @@ ln -s . $outfile
tar cvf $outfile.version.tar $outfile/src/.git_version $outfile/ceph.spec $outfile/alpine/APKBUILD
# NOTE: If you change this version number make sure the package is available
# at the three URLs referenced below (may involve uploading to download.ceph.com)
boost_version=1.67.0
download_boost $boost_version 2684c972994ee57fc5632e03bf044746f6eb45d4920c343937a465fd67a5adba \
boost_version=1.72.0
download_boost $boost_version 59c9b274bc451cf91a9ba1dd2c7fdcaf5d60b1b3aa83f2c9fa143417cc660722 \
https://dl.bintray.com/boostorg/release/$boost_version/source \
https://downloads.sourceforge.net/project/boost/boost/$boost_version \
https://download.ceph.com/qa

View File

@ -116,7 +116,7 @@
}
],
"thresholds": "1,2",
"timeFrom": "1m",
"timeFrom": null,
"title": "Health Status",
"transparent": false,
"type": "singlestat",
@ -402,49 +402,49 @@
"steppedLine": false,
"targets": [
{
"expr": "ceph_pg_total",
"expr": "sum(ceph_pg_total)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Total",
"refId": "A"
},
{
"expr": "ceph_pg_active",
"expr": "sum(ceph_pg_active)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Active",
"refId": "B"
},
{
"expr": "ceph_pg_total - ceph_pg_active",
"expr": "sum(ceph_pg_total - ceph_pg_active)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Inactive",
"refId": "G"
},
{
"expr": "ceph_pg_undersized",
"expr": "sum(ceph_pg_undersized)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Undersized",
"refId": "F"
},
{
"expr": "ceph_pg_degraded",
"expr": "sum(ceph_pg_degraded)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Degraded",
"refId": "C"
},
{
"expr": "ceph_pg_inconsistent",
"expr": "sum(ceph_pg_inconsistent)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Inconsistent",
"refId": "D"
},
{
"expr": "ceph_pg_down",
"expr": "sum(ceph_pg_down)",
"format": "time_series",
"intervalFactor": 1,
"legendFormat": "Down",

View File

@ -38,17 +38,7 @@
"graphTooltip": 0,
"id": null,
"iteration": 1557386759572,
"links": [
{
"asDropdown": true,
"icon": "external link",
"tags": [
"overview"
],
"title": "Shortcuts",
"type": "dashboards"
}
],
"links": [],
"panels": [
{
"gridPos": {
@ -527,7 +517,7 @@
}
],
"thresholds": [],
"timeFrom": "15m",
"timeFrom": null,
"timeShift": null,
"title": "Network drop rate",
"tooltip": {
@ -711,7 +701,7 @@
}
],
"thresholds": [],
"timeFrom": "15m",
"timeFrom": null,
"timeShift": null,
"title": "Network error rate",
"tooltip": {
@ -1221,5 +1211,5 @@
"timezone": "browser",
"title": "Host Details",
"uid": "rtOg0AiWz",
"version": 3
"version": 4
}

View File

@ -503,7 +503,7 @@
"step": 240
}
],
"timeFrom": "2m",
"timeFrom": null,
"timeShift": null,
"title": "OSD Objectstore Types",
"type": "grafana-piechart-panel",
@ -620,7 +620,7 @@
"step": 2
}
],
"timeFrom": "2m",
"timeFrom": null,
"timeShift": null,
"title": "OSD Size Summary",
"type": "grafana-piechart-panel",
@ -781,7 +781,7 @@
}
],
"thresholds": [],
"timeFrom": "36h",
"timeFrom": null,
"timeShift": null,
"title": "Read/Write Profile",
"tooltip": {

View File

@ -8,7 +8,10 @@ groups:
severity: critical
type: ceph_default
annotations:
description: Ceph in health_error state for more than 5m
description: >
Ceph in HEALTH_ERROR state for more than 5 minutes.
Please check "ceph health detail" for more information.
- alert: health warn
expr: ceph_health_status == 1
for: 15m
@ -16,7 +19,10 @@ groups:
severity: warning
type: ceph_default
annotations:
description: Ceph in health_warn for more than 15m.
description: >
Ceph has been in HEALTH_WARN for more than 15 minutes.
Please check "ceph health detail" for more information.
- name: mon
rules:
- alert: low monitor quorum count
@ -25,16 +31,32 @@ groups:
severity: critical
type: ceph_default
annotations:
description: Monitor count in quorum is low.
description: |
Monitor count in quorum is below three.
Only {{ $value }} of {{ with query "count(ceph_mon_quorum_status)" }}{{ . | first | value }}{{ end }} monitors are active.
The following monitors are down:
{{- range query "(ceph_mon_quorum_status == 0) + on(ceph_daemon) group_left(hostname) (ceph_mon_metadata * 0)" }}
- {{ .Labels.ceph_daemon }} on {{ .Labels.hostname }}
{{- end }}
- name: osd
rules:
- alert: 10% OSDs down
expr: sum(ceph_osd_up) / count(ceph_osd_in) <= 0.9
expr: (sum(ceph_osd_up) / count(ceph_osd_up)) * 100 <= 90
labels:
severity: critical
type: ceph_default
annotations:
description: More than 10% of OSDs are down.
description: |
Only {{ $value | humanize }}% of OSDs are up; {{ with query "count(ceph_osd_up == 0)" }}{{ . | first | value }}{{ end }} of {{ with query "count(ceph_osd_up)" }}{{ . | first | value }}{{ end }} OSDs are down (>= 10%).
The following OSDs are down:
{{- range query "(ceph_osd_up * on(ceph_daemon) group_left(hostname) ceph_osd_metadata) == 0" }}
- {{ .Labels.ceph_daemon }} on {{ .Labels.hostname }}
{{- end }}
- alert: OSD down
expr: count(ceph_osd_up == 0) > 0
for: 15m
@ -42,35 +64,63 @@ groups:
severity: warning
type: ceph_default
annotations:
description: One or more OSDs down for more than 15 minutes.
description: |
{{ $s := "" }}{{ if gt $value 1.0 }}{{ $s = "s" }}{{ end }}
{{ $value }} OSD{{ $s }} down for more than 15 minutes.
{{ $value }} of {{ query "count(ceph_osd_up)" | first | value }} OSDs are down.
The following OSD{{ $s }} {{ if eq $s "" }}is{{ else }}are{{ end }} down:
{{- range query "(ceph_osd_up * on(ceph_daemon) group_left(hostname) ceph_osd_metadata) == 0"}}
- {{ .Labels.ceph_daemon }} on {{ .Labels.hostname }}
{{- end }}
- alert: OSDs near full
expr: ((ceph_osd_stat_bytes_used / ceph_osd_stat_bytes) and on(ceph_daemon) ceph_osd_up == 1) > 0.8
expr: |
(
((ceph_osd_stat_bytes_used / ceph_osd_stat_bytes) and on(ceph_daemon) ceph_osd_up == 1)
* on(ceph_daemon) group_left(hostname) ceph_osd_metadata
) * 100 > 90
for: 5m
labels:
severity: critical
type: ceph_default
annotations:
description: OSD {{ $labels.ceph_daemon }} is dangerously full, over 80%.
# alert on single OSDs flapping
- alert: flap osd
expr: rate(ceph_osd_up[5m])*60 > 1
description: >
OSD {{ $labels.ceph_daemon }} on {{ $labels.hostname }} is
dangerously full: {{ $value | humanize }}%
- alert: flapping OSD
expr: |
(
rate(ceph_osd_up[5m])
* on(ceph_daemon) group_left(hostname) ceph_osd_metadata
) * 60 > 1
labels:
severity: warning
type: ceph_default
annotations:
description: >
OSD {{ $labels.ceph_daemon }} was marked down and back up at least once a
minute for 5 minutes.
OSD {{ $labels.ceph_daemon }} on {{ $labels.hostname }} was
marked down and back up {{ $value | humanize }} times a
minute for 5 minutes.
# alert on high deviation from average PG count
- alert: high pg count deviation
expr: abs(((ceph_osd_numpg > 0) - on (job) group_left avg(ceph_osd_numpg > 0) by (job)) / on (job) group_left avg(ceph_osd_numpg > 0) by (job)) > 0.35
expr: |
abs(
(
(ceph_osd_numpg > 0) - on (job) group_left avg(ceph_osd_numpg > 0) by (job)
) / on (job) group_left avg(ceph_osd_numpg > 0) by (job)
) * on(ceph_daemon) group_left(hostname) ceph_osd_metadata > 0.30
for: 5m
labels:
severity: warning
type: ceph_default
annotations:
description: >
OSD {{ $labels.ceph_daemon }} deviates by more than 30% from
average PG count.
OSD {{ $labels.ceph_daemon }} on {{ $labels.hostname }} deviates
by more than 30% from average PG count.
# alert on high commit latency...but how high is too high
- name: mds
rules:
@ -81,30 +131,38 @@ groups:
- name: pgs
rules:
- alert: pgs inactive
expr: ceph_pg_total - ceph_pg_active > 0
expr: ceph_pool_metadata * on(pool_id,instance) group_left() (ceph_pg_total - ceph_pg_active) > 0
for: 5m
labels:
severity: critical
type: ceph_default
annotations:
description: One or more PGs are inactive for more than 5 minutes.
description: >
{{ $value }} PGs have been inactive for more than 5 minutes in pool {{ $labels.name }}.
Inactive placement groups aren't able to serve read/write
requests.
- alert: pgs unclean
expr: ceph_pg_total - ceph_pg_clean > 0
expr: ceph_pool_metadata * on(pool_id,instance) group_left() (ceph_pg_total - ceph_pg_clean) > 0
for: 15m
labels:
severity: warning
type: ceph_default
annotations:
description: One or more PGs are not clean for more than 15 minutes.
description: >
{{ $value }} PGs haven't been clean for more than 15 minutes in pool {{ $labels.name }}.
Unclean PGs haven't been able to completely recover from a
previous failure.
- name: nodes
rules:
- alert: root volume full
expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} < 0.05
expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100 < 5
labels:
severity: critical
type: ceph_default
annotations:
description: Root volume (OSD and MON store) is dangerously full (< 5% free).
description: >
Root volume (OSD and MON store) is dangerously full: {{ $value | humanize }}% free.
# alert on nic packet errors and drops rates > 1 packet/s
- alert: network packets dropped
expr: irate(node_network_receive_drop_total{device!="lo"}[5m]) + irate(node_network_transmit_drop_total{device!="lo"}[5m]) > 1
@ -115,8 +173,11 @@ groups:
description: >
Node {{ $labels.instance }} experiences packet drop > 1
packet/s on interface {{ $labels.device }}.
- alert: network packet errors
expr: irate(node_network_receive_errs_total{device!="lo"}[5m]) + irate(node_network_transmit_errs_total{device!="lo"}[5m]) > 1
expr: |
irate(node_network_receive_errs_total{device!="lo"}[5m]) +
irate(node_network_transmit_errs_total{device!="lo"}[5m]) > 1
labels:
severity: warning
type: ceph_default
@ -124,31 +185,48 @@ groups:
description: >
Node {{ $labels.instance }} experiences packet errors > 1
packet/s on interface {{ $labels.device }}.
# predict fs fillup times
# predict fs fill-up times
- alert: storage filling
expr: ((node_filesystem_free_bytes) / deriv(node_filesystem_free_bytes[2d]) <= 5) > 0
expr: |
(
(
node_filesystem_free_bytes / deriv(node_filesystem_free_bytes[2d])
* on(instance) group_left(nodename) node_uname_info
) <= 5
) > 0
labels:
severity: warning
type: ceph_default
annotations:
description: >
Mountpoint {{ $labels.mountpoint }} will be full in less than 5 days
assuming the average fillup rate of the past 48 hours.
Mountpoint {{ $labels.mountpoint }} on {{ $labels.nodename }}
will be full in less than 5 days assuming the average fill-up
rate of the past 48 hours.
- name: pools
rules:
- alert: pool full
expr: ceph_pool_stored / ceph_pool_max_avail * on(pool_id) group_right ceph_pool_metadata > 0.9
expr: |
ceph_pool_stored / ceph_pool_max_avail
* on(pool_id) group_right ceph_pool_metadata * 100 > 90
labels:
severity: critical
type: ceph_default
annotations:
description: Pool {{ $labels.name }} at 90% capacity or over.
description: Pool {{ $labels.name }} at {{ $value | humanize }}% capacity.
- alert: pool filling up
expr: (((ceph_pool_max_avail - ceph_pool_stored) / deriv(ceph_pool_max_avail[2d])) * on(pool_id) group_right ceph_pool_metadata <=5) > 0
expr: |
(
(
(ceph_pool_max_avail - ceph_pool_stored) / deriv(ceph_pool_max_avail[2d])
) * on(pool_id) group_right ceph_pool_metadata <= 5
) > 0
labels:
severity: warning
type: ceph_default
annotations:
description: >
Pool {{ $labels.name }} will be full in less than 5 days
assuming the average fillup rate of the past 48 hours.
assuming the average fill-up rate of the past 48 hours.
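Several alert expressions above were rewritten from fractional thresholds (e.g. `< 0.05`) to percentage form (`* 100 < 5`) so that `{{ $value }}` renders as a human-readable percentage in the annotations. The two forms classify identically; a minimal check in shell (awk arithmetic, illustrative value only):

```shell
# r stands in for a filesystem free-space ratio; both threshold styles
# agree on whether it trips the alert, only the rendered value differs.
awk 'BEGIN { r = 0.03; print (r < 0.05), (r * 100 < 5) }'
# prints: 1 1
```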

View File

@ -1,5 +1,7 @@
overrides:
ceph:
log-whitelist:
- SLOW_OPS
conf:
osd:
filestore flush min: 0

View File

@ -2,8 +2,11 @@
Setup
=====
$ RO_KEY=$(ceph auth get-or-create-key client.ro mon 'profile rbd' mgr 'profile rbd' osd 'profile rbd-read-only')
$ rbd create --size 10 img
$ rbd snap create img@snap
$ rbd snap protect img@snap
$ rbd clone img@snap cloneimg
$ rbd create --size 1 imgpart
$ DEV=$(sudo rbd map imgpart)
$ cat <<EOF | sudo sfdisk $DEV >/dev/null 2>&1
@ -144,10 +147,16 @@ R/O, unpartitioned:
.*BLKROSET: Permission denied (re)
[1]
$ sudo blockdev --setrw $DEV
.*BLKROSET: Read-only file system (re)
[1]
$ blockdev --getro $DEV
0
1
$ dd if=/dev/urandom of=$DEV bs=1k seek=1 count=1 status=none
dd: error writing '/dev/rbd?': Operation not permitted (glob)
[1]
$ blkdiscard $DEV
blkdiscard: /dev/rbd?: BLKDISCARD ioctl failed: Operation not permitted (glob)
[1]
$ sudo rbd unmap $DEV
R/O, partitioned:
@ -174,18 +183,30 @@ R/O, partitioned:
.*BLKROSET: Permission denied (re)
[1]
$ sudo blockdev --setrw ${DEV}p1
.*BLKROSET: Read-only file system (re)
[1]
$ blockdev --setrw ${DEV}p2
.*BLKROSET: Permission denied (re)
[1]
$ sudo blockdev --setrw ${DEV}p2
.*BLKROSET: Read-only file system (re)
[1]
$ blockdev --getro ${DEV}p1
0
1
$ blockdev --getro ${DEV}p2
0
1
$ dd if=/dev/urandom of=${DEV}p1 bs=1k seek=1 count=1 status=none
dd: error writing '/dev/rbd?p1': Operation not permitted (glob)
[1]
$ blkdiscard ${DEV}p1
blkdiscard: /dev/rbd?p1: BLKDISCARD ioctl failed: Operation not permitted (glob)
[1]
$ dd if=/dev/urandom of=${DEV}p2 bs=1k seek=1 count=1 status=none
dd: error writing '/dev/rbd?p2': Operation not permitted (glob)
[1]
$ blkdiscard ${DEV}p2
blkdiscard: /dev/rbd?p2: BLKDISCARD ioctl failed: Operation not permitted (glob)
[1]
$ sudo rbd unmap $DEV
@ -270,6 +291,45 @@ Partitioned:
$ sudo rbd unmap $DEV
read-only OSD caps
==================
R/W:
$ DEV=$(sudo rbd map --id ro --key $(echo $RO_KEY) img)
rbd: sysfs write failed
rbd: map failed: (1) Operation not permitted
[1]
R/O:
$ DEV=$(sudo rbd map --id ro --key $(echo $RO_KEY) --read-only img)
$ blockdev --getro $DEV
1
$ sudo rbd unmap $DEV
Snapshot:
$ DEV=$(sudo rbd map --id ro --key $(echo $RO_KEY) img@snap)
$ blockdev --getro $DEV
1
$ sudo rbd unmap $DEV
R/W, clone:
$ DEV=$(sudo rbd map --id ro --key $(echo $RO_KEY) cloneimg)
rbd: sysfs write failed
rbd: map failed: (1) Operation not permitted
[1]
R/O, clone:
$ DEV=$(sudo rbd map --id ro --key $(echo $RO_KEY) --read-only cloneimg)
$ blockdev --getro $DEV
1
$ sudo rbd unmap $DEV
rw -> ro with open_count > 0
============================
@ -288,6 +348,8 @@ Teardown
$ rbd snap purge imgpart >/dev/null 2>&1
$ rbd rm imgpart >/dev/null 2>&1
$ rbd rm cloneimg >/dev/null 2>&1
$ rbd snap unprotect img@snap
$ rbd snap purge img >/dev/null 2>&1
$ rbd rm img >/dev/null 2>&1

View File

@ -0,0 +1,31 @@
journaling makes the image only unwritable, rather than both unreadable
and unwritable:
$ rbd create --size 1 --image-feature layering,exclusive-lock,journaling img
$ rbd snap create img@snap
$ rbd snap protect img@snap
$ rbd clone --image-feature layering,exclusive-lock,journaling img@snap cloneimg
$ DEV=$(sudo rbd map img)
rbd: sysfs write failed
rbd: map failed: (6) No such device or address
[6]
$ DEV=$(sudo rbd map --read-only img)
$ blockdev --getro $DEV
1
$ sudo rbd unmap $DEV
$ DEV=$(sudo rbd map cloneimg)
rbd: sysfs write failed
rbd: map failed: (6) No such device or address
[6]
$ DEV=$(sudo rbd map --read-only cloneimg)
$ blockdev --getro $DEV
1
$ sudo rbd unmap $DEV
$ rbd rm --no-progress cloneimg
$ rbd snap unprotect img@snap
$ rbd snap rm --no-progress img@snap
$ rbd rm --no-progress img

View File

@ -24,18 +24,18 @@ Write to first and last sectors and make sure we hit the right objects:
Dump first and last megabytes:
$ DEV=$(sudo rbd map hugeimg/img)
$ hexdump -n 1048576 $DEV
$ dd if=$DEV bs=1M count=1 status=none | hexdump
0000000 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
*
0000200 0000 0000 0000 0000 0000 0000 0000 0000
*
0100000
$ hexdump -s 4611686018426339328 $DEV
3ffffffffff00000 0000 0000 0000 0000 0000 0000 0000 0000
$ dd if=$DEV bs=1M skip=4398046511103 status=none | hexdump
0000000 0000 0000 0000 0000 0000 0000 0000 0000
*
3ffffffffffffe00 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
00ffe00 cdcd cdcd cdcd cdcd cdcd cdcd cdcd cdcd
*
4000000000000000
0100000
$ sudo rbd unmap $DEV
$ ceph osd pool delete hugeimg hugeimg --yes-i-really-really-mean-it >/dev/null 2>&1
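The `dd` invocations replace `hexdump`'s byte offsets with skip counts expressed in 1 MiB blocks. As a sanity check that `skip=4398046511103` with `bs=1M` lands on the same byte offset the old `hexdump -s 4611686018426339328` used (plain shell arithmetic, assuming a 64-bit shell):

```shell
# 4398046511103 one-MiB blocks times 1048576 bytes per block should
# equal the byte offset previously passed to hexdump -s.
echo $(( 4398046511103 * 1048576 ))
# prints: 4611686018426339328
```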

View File

@ -293,8 +293,6 @@ function test_kill_daemon() {
kill_daemon $pidfile TERM || return 1
done
ceph osd dump | grep "osd.0 down" || return 1
name_prefix=mgr
for pidfile in $(find $dir 2>/dev/null | grep $name_prefix'[^/]*\.pid') ; do
#
@ -381,7 +379,6 @@ function test_kill_daemons() {
# killing just the osd and verify the mon still is responsive
#
kill_daemons $dir TERM osd || return 1
ceph osd dump | grep "osd.0 down" || return 1
#
# kill the mgr
#
@ -780,6 +777,7 @@ function destroy_osd() {
ceph osd out osd.$id || return 1
kill_daemons $dir TERM osd.$id || return 1
ceph osd down osd.$id || return 1
ceph osd purge osd.$id --yes-i-really-mean-it || return 1
teardown $dir/$id || return 1
rm -fr $dir/$id
@ -930,8 +928,10 @@ function test_wait_for_osd() {
run_mon $dir a --osd_pool_default_size=1 || return 1
run_mgr $dir x || return 1
run_osd $dir 0 || return 1
run_osd $dir 1 || return 1
wait_for_osd up 0 || return 1
kill_daemons $dir TERM osd || return 1
wait_for_osd up 1 || return 1
kill_daemons $dir TERM osd.0 || return 1
wait_for_osd down 0 || return 1
( TIMEOUT=1 ; ! wait_for_osd up 0 ) || return 1
teardown $dir || return 1
@ -1313,6 +1313,36 @@ function test_get_num_active_clean() {
teardown $dir || return 1
}
##
# Return the number of active or peered PGs in the cluster. A PG matches if
# ceph pg dump pgs reports it is either **active** or **peered** and
# not **stale**.
#
# @param STDOUT the number of active or peered PGs
# @return 0 on success, 1 on error
#
function get_num_active_or_peered() {
local expression
expression+="select(contains(\"active\") or contains(\"peered\")) | "
expression+="select(contains(\"stale\") | not)"
ceph --format json pg dump pgs 2>/dev/null | \
jq ".pg_stats | [.[] | .state | $expression] | length"
}
function test_get_num_active_or_peered() {
local dir=$1
setup $dir || return 1
run_mon $dir a --osd_pool_default_size=1 || return 1
run_mgr $dir x || return 1
run_osd $dir 0 || return 1
create_rbd_pool || return 1
wait_for_clean || return 1
local num_peered=$(get_num_active_or_peered)
test "$num_peered" = $PG_NUM || return 1
teardown $dir || return 1
}
#######################################################################
##
@ -1588,6 +1618,64 @@ function test_wait_for_clean() {
teardown $dir || return 1
}
##
# Wait until the cluster becomes peered, or give up if it makes no
# progress for $WAIT_FOR_CLEAN_TIMEOUT seconds.
# Progress is measured either via the **get_is_making_recovery_progress**
# predicate or by a change in the number of peered PGs (as returned by
# get_num_active_or_peered).
#
# @return 0 if the cluster is peered, 1 otherwise
#
function wait_for_peered() {
local cmd=$1
local num_peered=-1
local cur_peered
local -a delays=($(get_timeout_delays $WAIT_FOR_CLEAN_TIMEOUT .1))
local -i loop=0
flush_pg_stats || return 1
while test $(get_num_pgs) == 0 ; do
sleep 1
done
while true ; do
# Comparing get_num_active_or_peered & get_num_pgs is used to determine
# if the cluster is peered. Inlining the comparison here avoids
# multiple calls of get_num_active_or_peered.
cur_peered=$(get_num_active_or_peered)
test $cur_peered = $(get_num_pgs) && break
if test $cur_peered != $num_peered ; then
loop=0
num_peered=$cur_peered
elif get_is_making_recovery_progress ; then
loop=0
elif (( $loop >= ${#delays[*]} )) ; then
ceph report
return 1
fi
# eval is a no-op if cmd is empty
eval $cmd
sleep ${delays[$loop]}
loop+=1
done
return 0
}
function test_wait_for_peered() {
local dir=$1
setup $dir || return 1
run_mon $dir a --osd_pool_default_size=2 || return 1
run_osd $dir 0 || return 1
run_mgr $dir x || return 1
create_rbd_pool || return 1
! WAIT_FOR_CLEAN_TIMEOUT=1 wait_for_clean || return 1
run_osd $dir 1 || return 1
wait_for_peered || return 1
teardown $dir || return 1
}
#######################################################################
##

View File

@ -67,9 +67,9 @@ function TEST_balancer() {
ceph balancer pool add $TEST_POOL1 || return 1
ceph balancer pool add $TEST_POOL2 || return 1
ceph balancer pool ls || return 1
eval POOL=$(ceph balancer pool ls | jq '.[0]')
eval POOL=$(ceph balancer pool ls | jq 'sort | .[0]')
test "$POOL" = "$TEST_POOL1" || return 1
eval POOL=$(ceph balancer pool ls | jq '.[1]')
eval POOL=$(ceph balancer pool ls | jq 'sort | .[1]')
test "$POOL" = "$TEST_POOL2" || return 1
ceph balancer pool rm $TEST_POOL1 || return 1
ceph balancer pool rm $TEST_POOL2 || return 1
@ -104,7 +104,7 @@ function TEST_balancer() {
! ceph balancer optimize plan_upmap $TEST_POOL || return 1
ceph balancer status || return 1
eval RESULT=$(ceph balancer status | jq '.optimize_result')
test "$RESULT" = "Unable to find further optimization, or pool(s)' pg_num is decreasing, or distribution is already perfect" || return 1
test "$RESULT" = "Unable to find further optimization, or pool(s) pg_num is decreasing, or distribution is already perfect" || return 1
ceph balancer on || return 1
ACTIVE=$(ceph balancer status | jq '.active')
@ -118,6 +118,102 @@ function TEST_balancer() {
teardown $dir || return 1
}
function TEST_balancer2() {
local dir=$1
TEST_PGS1=118
TEST_PGS2=132
TOTAL_PGS=$(expr $TEST_PGS1 + $TEST_PGS2)
OSDS=5
DEFAULT_REPLICAS=3
# Integer average of PGS per OSD (70.8), so each OSD >= this
FINAL_PER_OSD1=$(expr \( $TEST_PGS1 \* $DEFAULT_REPLICAS \) / $OSDS)
# Integer average of PGS per OSD (150)
FINAL_PER_OSD2=$(expr \( \( $TEST_PGS1 + $TEST_PGS2 \) \* $DEFAULT_REPLICAS \) / $OSDS)
CEPH_ARGS+="--osd_pool_default_pg_autoscale_mode=off "
CEPH_ARGS+="--debug_osd=20 "
setup $dir || return 1
run_mon $dir a || return 1
run_mgr $dir x || return 1
for i in $(seq 0 $(expr $OSDS - 1))
do
run_osd $dir $i || return 1
done
ceph osd set-require-min-compat-client luminous
ceph config set mgr mgr/balancer/upmap_max_deviation 1
ceph balancer mode upmap || return 1
ceph balancer on || return 1
ceph config set mgr mgr/balancer/sleep_interval 5
create_pool $TEST_POOL1 $TEST_PGS1
wait_for_clean || return 1
# Wait up to 2 minutes
OK=no
for i in $(seq 1 25)
do
sleep 5
if grep -q "Optimization plan is almost perfect" $dir/mgr.x.log
then
OK=yes
break
fi
done
test $OK = "yes" || return 1
# Plan is found, but PGs still need to move
sleep 30
ceph osd df
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[0].pgs')
test $PGS -ge $FINAL_PER_OSD1 || return 1
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[1].pgs')
test $PGS -ge $FINAL_PER_OSD1 || return 1
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[2].pgs')
test $PGS -ge $FINAL_PER_OSD1 || return 1
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[3].pgs')
test $PGS -ge $FINAL_PER_OSD1 || return 1
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[4].pgs')
test $PGS -ge $FINAL_PER_OSD1 || return 1
create_pool $TEST_POOL2 $TEST_PGS2
# Wait up to 2 minutes
OK=no
for i in $(seq 1 25)
do
sleep 5
COUNT=$(grep "Optimization plan is almost perfect" $dir/mgr.x.log | wc -l)
if test $COUNT = "2"
then
OK=yes
break
fi
done
test $OK = "yes" || return 1
# Plan is found, but PGs still need to move
sleep 30
ceph osd df
# We should be within plus or minus 1 of FINAL_PER_OSD2
# This is because here each pool is balanced independently
MIN=$(expr $FINAL_PER_OSD2 - 1)
MAX=$(expr $FINAL_PER_OSD2 + 1)
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[0].pgs')
test $PGS -ge $MIN -a $PGS -le $MAX || return 1
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[1].pgs')
test $PGS -ge $MIN -a $PGS -le $MAX || return 1
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[2].pgs')
test $PGS -ge $MIN -a $PGS -le $MAX || return 1
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[3].pgs')
test $PGS -ge $MIN -a $PGS -le $MAX || return 1
PGS=$(ceph osd df --format=json-pretty | jq '.nodes[4].pgs')
test $PGS -ge $MIN -a $PGS -le $MAX || return 1
teardown $dir || return 1
}
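The expected per-OSD PG counts in the test above come from integer division, which `expr` truncates. The same arithmetic checked directly (the numbers mirror the test's constants; a standalone sketch, not part of the test):

```shell
# (118 PGs * 3 replicas) / 5 OSDs = 70.8, truncated to 70 by expr,
# so each OSD must hold at least 70 PGs after the first balance pass.
expr \( 118 \* 3 \) / 5
# prints: 70
# ((118 + 132) * 3) / 5 = 150 exactly, the per-OSD target once the
# second pool is created.
expr \( \( 118 + 132 \) \* 3 \) / 5
# prints: 150
```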
main balancer "$@"
# Local Variables:

View File

@ -237,5 +237,53 @@ function TEST_0_mds() {
kill_daemons $dir KILL mds.a
}
function TEST_0_osd() {
local dir=$1
CEPH_ARGS="$ORIG_CEPH_ARGS --mon-host=$CEPH_MON_A "
run_mon $dir a --public-addr=$CEPH_MON_A || return 1
run_mgr $dir x || return 1
run_osd $dir 0 || return 1
run_osd $dir 1 || return 1
run_osd $dir 2 || return 1
run_osd $dir 3 || return 1
ceph osd erasure-code-profile set ec-profile m=2 k=2 crush-failure-domain=osd || return 1
ceph osd pool create ec 8 erasure ec-profile || return 1
wait_for_clean || return 1
# with min_size 3, we can stop only 1 osd
ceph osd pool set ec min_size 3 || return 1
wait_for_clean || return 1
ceph osd ok-to-stop 0 || return 1
ceph osd ok-to-stop 1 || return 1
ceph osd ok-to-stop 2 || return 1
ceph osd ok-to-stop 3 || return 1
! ceph osd ok-to-stop 0 1 || return 1
! ceph osd ok-to-stop 2 3 || return 1
# with min_size 2 we can stop 2 osds
ceph osd pool set ec min_size 2 || return 1
wait_for_clean || return 1
ceph osd ok-to-stop 0 1 || return 1
ceph osd ok-to-stop 2 3 || return 1
! ceph osd ok-to-stop 0 1 2 || return 1
! ceph osd ok-to-stop 1 2 3 || return 1
# we should get the same result with one of the osds already down
kill_daemons $dir TERM osd.0 || return 1
ceph osd down 0 || return 1
wait_for_peered || return 1
ceph osd ok-to-stop 0 || return 1
ceph osd ok-to-stop 0 1 || return 1
! ceph osd ok-to-stop 0 1 2 || return 1
! ceph osd ok-to-stop 1 2 3 || return 1
}
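The min_size checks above follow from simple shard arithmetic; a sketch of why ok-to-stop accepts one OSD but rejects two while min_size is 3 (values mirror the test's EC profile):

```shell
# k=2, m=2 erasure coding gives 4 shards per PG; stopping S OSDs is only
# OK while the shards that remain stay at or above the pool's min_size.
k=2; m=2; min_size=3
for S in 1 2; do
    remaining=$(( k + m - S ))
    if [ "$remaining" -ge "$min_size" ]; then
        echo "stop $S: ok"
    else
        echo "stop $S: not ok"
    fi
done
# prints: stop 1: ok
#         stop 2: not ok
```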
main ok-to-stop "$@"

View File

@ -49,7 +49,7 @@ function get_num_in_state() {
}
function wait_for_state() {
function wait_for_not_state() {
local state=$1
local num_in_state=-1
local cur_in_state
@ -78,15 +78,15 @@ function wait_for_state() {
}
function wait_for_backfill() {
function wait_for_not_backfilling() {
local timeout=$1
wait_for_state backfilling $timeout
wait_for_not_state backfilling $timeout
}
function wait_for_active() {
function wait_for_not_activating() {
local timeout=$1
wait_for_state activating $timeout
wait_for_not_state activating $timeout
}
# All tests are created in an environment which has fake total space
@ -149,8 +149,8 @@ function TEST_backfill_test_simple() {
done
sleep 5
wait_for_backfill 240 || return 1
wait_for_active 60 || return 1
wait_for_not_backfilling 240 || return 1
wait_for_not_activating 60 || return 1
ERRORS=0
if [ "$(ceph pg dump pgs | grep +backfill_toofull | wc -l)" != "1" ];
@ -228,8 +228,8 @@ function TEST_backfill_test_multi() {
done
sleep 5
wait_for_backfill 240 || return 1
wait_for_active 60 || return 1
wait_for_not_backfilling 240 || return 1
wait_for_not_activating 60 || return 1
ERRORS=0
full="$(ceph pg dump pgs | grep +backfill_toofull | wc -l)"
@ -380,8 +380,8 @@ function TEST_backfill_test_sametarget() {
ceph osd pool set $pool2 size 2
sleep 5
wait_for_backfill 240 || return 1
wait_for_active 60 || return 1
wait_for_not_backfilling 240 || return 1
wait_for_not_activating 60 || return 1
ERRORS=0
if [ "$(ceph pg dump pgs | grep +backfill_toofull | wc -l)" != "1" ];
@ -515,8 +515,8 @@ function TEST_backfill_multi_partial() {
ceph osd in osd.$fillosd
sleep 15
wait_for_backfill 240 || return 1
wait_for_active 60 || return 1
wait_for_not_backfilling 240 || return 1
wait_for_not_activating 60 || return 1
flush_pg_stats || return 1
ceph pg dump pgs
@ -698,8 +698,8 @@ function TEST_ec_backfill_simple() {
ceph pg dump pgs
wait_for_backfill 240 || return 1
wait_for_active 60 || return 1
wait_for_not_backfilling 240 || return 1
wait_for_not_activating 60 || return 1
ceph pg dump pgs
@ -822,8 +822,8 @@ function TEST_ec_backfill_multi() {
sleep 10
wait_for_backfill 240 || return 1
wait_for_active 60 || return 1
wait_for_not_backfilling 240 || return 1
wait_for_not_activating 60 || return 1
ceph pg dump pgs
@ -961,8 +961,8 @@ function SKIP_TEST_ec_backfill_multi_partial() {
sleep 10
ceph pg dump pgs
wait_for_backfill 240 || return 1
wait_for_active 60 || return 1
wait_for_not_backfilling 240 || return 1
wait_for_not_activating 60 || return 1
ceph pg dump pgs
@ -1069,8 +1069,8 @@ function SKIP_TEST_ec_backfill_multi_partial() {
ceph osd in osd.$fillosd
sleep 15
wait_for_backfill 240 || return 1
wait_for_active 60 || return 1
wait_for_not_backfilling 240 || return 1
wait_for_not_activating 60 || return 1
ERRORS=0
if [ "$(ceph pg dump pgs | grep -v "^1.0" | grep +backfill_toofull | wc -l)" != "1" ];

View File

@ -49,7 +49,6 @@ function get_num_in_state() {
function wait_for_state() {
local state=$1
local num_in_state=-1
local cur_in_state
local -a delays=($(get_timeout_delays $2 5))
local -i loop=0
@ -61,11 +60,8 @@ function wait_for_state() {
while true ; do
cur_in_state=$(get_num_in_state ${state})
test $cur_in_state = "0" && break
if test $cur_in_state != $num_in_state ; then
loop=0
num_in_state=$cur_in_state
elif (( $loop >= ${#delays[*]} )) ; then
test $cur_in_state -gt 0 && break
if (( $loop >= ${#delays[*]} )) ; then
ceph pg dump pgs
return 1
fi

View File

@ -3,7 +3,8 @@ meta:
overrides:
ceph_ansible:
ansible-version: "2.8"
ansible-version: '2.8.1'
branch: stable-4.0
vars:
ceph_conf_overrides:
global:

View File

@ -0,0 +1,11 @@
overrides:
ceph:
conf:
global:
lockdep: true
tasks:
- cephfs_test_runner:
modules:
- tasks.cephfs.test_admin

View File

@ -1,11 +0,0 @@
overrides:
ceph:
conf:
global:
lockdep: true
tasks:
- cephfs_test_runner:
modules:
- tasks.cephfs.test_config_commands

View File

@ -12,6 +12,7 @@ overrides:
- Scrub error on inode
- Metadata damage detected
- inconsistent rstat on inode
- Error recovering journal
tasks:
- cephfs_test_runner:

View File

@ -0,0 +1,5 @@
tasks:
- cephfs_test_runner:
modules:
- tasks.cephfs.test_openfiletable

View File

@ -1,3 +1,15 @@
overrides:
ceph:
log-whitelist:
- OSD full dropping all updates
- OSD near full
- pausewr flag
- failsafe engaged, dropping updates
- failsafe disengaged, no longer dropping
- is full \(reached quota
- POOL_FULL
- POOL_BACKFILLFULL
tasks:
- cephfs_test_runner:
modules:

View File

@ -9,3 +9,5 @@ overrides:
- evicting unresponsive client
- POOL_APP_NOT_ENABLED
- has not responded to cap revoke by MDS for over
- MDS_CLIENT_LATE_RELEASE
- responding to mclientcaps

View File

@ -17,6 +17,8 @@ overrides:
mds heartbeat grace: 60
mon:
mon osd crush smoke test: false
osd:
osd fast shutdown: false
valgrind:
mon: [--tool=memcheck, --leak-check=full, --show-reachable=yes]
osd: [--tool=memcheck]

View File

@ -1,3 +1,7 @@
overrides:
ceph:
log-whitelist:
- SLOW_OPS
tasks:
- workunit:
clients:

View File

@ -1,12 +0,0 @@
overrides:
ceph:
conf:
global:
lockdep: true
tasks:
- cephfs_test_runner:
fail_on_skip: false
modules:
- tasks.cephfs.test_config_commands

View File

@ -11,6 +11,7 @@ overrides:
- Scrub error on inode
- Metadata damage detected
- inconsistent rstat on inode
- Error recovering journal
tasks:
- cephfs_test_runner:

View File

@ -1,5 +1,7 @@
overrides:
ceph:
log-whitelist:
- SLOW_OPS
conf:
osd:
filestore flush min: 0

View File

@ -1,5 +0,0 @@
tasks:
- cram:
clients:
client.0:
- qa/rbd/krbd_blkroset.t

View File

@ -0,0 +1,6 @@
tasks:
- cram:
clients:
client.0:
- qa/rbd/krbd_blkroset.t
- qa/rbd/krbd_get_features.t

View File

@ -1,4 +1,5 @@
tasks:
- cephfs_test_runner:
fail_on_skip: false
modules:
- tasks.cephfs.test_exports

View File

@ -4,6 +4,7 @@ overrides:
ceph:
log-whitelist:
- evicting unresponsive client
- RECENT_CRASH
tasks:
- cephfs_test_runner:

View File

@ -28,6 +28,7 @@ tasks:
- default.rgw.log
- s3readwrite:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0
readwrite:
bucket: rwtest

View File

@ -2,6 +2,10 @@ tasks:
- install:
- exec:
mon.b:
- sudo systemctl stop chronyd.service || true
- sudo systemctl stop systemd-timesync.service || true
- sudo systemctl stop ntpd.service || true
- sudo systemctl stop ntp.service || true
- date -u -s @$(expr $(date -u +%s) + 2)
- ceph:
wait-for-healthy: false
@ -11,6 +15,7 @@ tasks:
- overall HEALTH_
- \(MON_CLOCK_SKEW\)
- \(MGR_DOWN\)
- \(MON_DOWN\)
- \(PG_
- \(SLOW_OPS\)
- No standby daemons available

View File

@ -23,6 +23,8 @@ overrides:
osd max object namespace len: 64
mon:
mon osd crush smoke test: false
osd:
osd fast shutdown: false
valgrind:
mon: [--tool=memcheck, --leak-check=full, --show-reachable=yes]
osd: [--tool=memcheck]

View File

@ -2,6 +2,11 @@ overrides:
ceph:
log-whitelist:
- must scrub before tier agent can activate
conf:
osd:
# override short_pg_log_entries.yaml (which sets these under [global])
osd_min_pg_log_entries: 3000
osd_max_pg_log_entries: 3000
tasks:
- exec:
client.0:

View File

@ -2,6 +2,11 @@ overrides:
ceph:
log-whitelist:
- must scrub before tier agent can activate
conf:
osd:
# override short_pg_log_entries.yaml (which sets these under [global])
osd_min_pg_log_entries: 3000
osd_max_pg_log_entries: 3000
tasks:
- exec:
client.0:

View File

@ -2,6 +2,11 @@ overrides:
ceph:
log-whitelist:
- must scrub before tier agent can activate
conf:
osd:
# override short_pg_log_entries.yaml (which sets these under [global])
osd_min_pg_log_entries: 3000
osd_max_pg_log_entries: 3000
tasks:
- exec:
client.0:

View File

@ -2,6 +2,11 @@ overrides:
ceph:
log-whitelist:
- must scrub before tier agent can activate
conf:
osd:
# override short_pg_log_entries.yaml (which sets these under [global])
osd_min_pg_log_entries: 3000
osd_max_pg_log_entries: 3000
tasks:
- exec:
client.0:

View File

@ -2,6 +2,11 @@ overrides:
ceph:
log-whitelist:
- must scrub before tier agent can activate
conf:
osd:
# override short_pg_log_entries.yaml (which sets these under [global])
osd_min_pg_log_entries: 3000
osd_max_pg_log_entries: 3000
tasks:
- exec:
client.0:

View File

@ -2,6 +2,11 @@ overrides:
ceph:
log-whitelist:
- must scrub before tier agent can activate
conf:
osd:
# override short_pg_log_entries.yaml (which sets these under [global])
osd_min_pg_log_entries: 3000
osd_max_pg_log_entries: 3000
tasks:
- exec:
client.0:

View File

@ -13,6 +13,8 @@ overrides:
debug refs: 5
mon:
mon osd crush smoke test: false
osd:
osd fast shutdown: false
log-whitelist:
- overall HEALTH_
# valgrind is slow.. we might get PGs stuck peering etc

View File

@ -4,6 +4,7 @@ tasks:
- rgw: [client.0]
- s3readwrite:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0
readwrite:
bucket: rwtest

View File

@ -4,6 +4,7 @@ tasks:
- rgw: [client.0]
- s3roundtrip:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0
roundtrip:
bucket: rttest

View File

@ -4,4 +4,5 @@ tasks:
- rgw: [client.0]
- swift:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0

View File

@ -11,6 +11,8 @@ overrides:
osd heartbeat grace: 40
mon:
mon osd crush smoke test: false
osd:
osd fast shutdown: false
valgrind:
mon: [--tool=memcheck, --leak-check=full, --show-reachable=yes]
osd: [--tool=memcheck]

View File

@ -1,6 +1,7 @@
tasks:
- s3readwrite:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0
readwrite:
bucket: rwtest

View File

@ -1,6 +1,7 @@
tasks:
- s3roundtrip:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0
roundtrip:
bucket: rttest

View File

@ -1,4 +1,5 @@
tasks:
- swift:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0

View File

@ -1,4 +1,5 @@
tasks:
- swift:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0

View File

@ -12,6 +12,8 @@ overrides:
osd heartbeat grace: 40
mon:
mon osd crush smoke test: false
osd:
osd fast shutdown: false
valgrind:
mon: [--tool=memcheck, --leak-check=full, --show-reachable=yes]
osd: [--tool=memcheck]

View File

@ -9,6 +9,7 @@ tasks:
- rgw: [client.0]
- s3tests:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0
overrides:
ceph:

View File

@ -5,6 +5,7 @@ tasks:
- rgw: [client.0]
- s3tests:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0
overrides:
ceph:

View File

@ -5,4 +5,5 @@ tasks:
- rgw: [client.0]
- swift:
client.0:
force-branch: ceph-nautilus
rgw_server: client.0

View File

@ -11,7 +11,7 @@ tasks:
- s3tests:
client.0:
rgw_server: client.0
force-branch: master
force-branch: ceph-nautilus
overrides:
ceph:
fs: xfs

View File

@ -11,7 +11,7 @@ tasks:
- s3tests:
client.0:
rgw_server: client.0
force-branch: master
force-branch: ceph-nautilus
overrides:
ceph:
fs: xfs

View File

@ -12,7 +12,7 @@ tasks:
- s3tests:
client.0:
rgw_server: client.0
force-branch: master
force-branch: ceph-nautilus
overrides:
ceph:
fs: xfs

View File

@ -0,0 +1 @@
../.qa/

View File

@ -0,0 +1 @@
../.qa/

View File

@ -0,0 +1 @@
../.qa/

Some files were not shown because too many files have changed in this diff.