.\"
 | 
						|
.\" CDDL HEADER START
 | 
						|
.\"
 | 
						|
.\" The contents of this file are subject to the terms of the
 | 
						|
.\" Common Development and Distribution License (the "License").
 | 
						|
.\" You may not use this file except in compliance with the License.
 | 
						|
.\"
 | 
						|
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
 | 
						|
.\" or https://opensource.org/licenses/CDDL-1.0.
 | 
						|
.\" See the License for the specific language governing permissions
 | 
						|
.\" and limitations under the License.
 | 
						|
.\"
 | 
						|
.\" When distributing Covered Code, include this CDDL HEADER in each
 | 
						|
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
 | 
						|
.\" If applicable, add the following below this CDDL HEADER, with the
 | 
						|
.\" fields enclosed by brackets "[]" replaced with your own identifying
 | 
						|
.\" information: Portions Copyright [yyyy] [name of copyright owner]
 | 
						|
.\"
 | 
						|
.\" CDDL HEADER END
 | 
						|
.\"
 | 
						|
.\" Copyright (c) 2007, Sun Microsystems, Inc. All Rights Reserved.
 | 
						|
.\" Copyright (c) 2012, 2018 by Delphix. All rights reserved.
 | 
						|
.\" Copyright (c) 2012 Cyril Plisko. All Rights Reserved.
 | 
						|
.\" Copyright (c) 2017 Datto Inc.
 | 
						|
.\" Copyright (c) 2018 George Melikov. All Rights Reserved.
 | 
						|
.\" Copyright 2017 Nexenta Systems, Inc.
 | 
						|
.\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
 | 
						|
.\"
 | 
						|
.Dd April 7, 2023
 | 
						|
.Dt ZPOOLCONCEPTS 7
 | 
						|
.Os
 | 
						|
.
 | 
						|
.Sh NAME
 | 
						|
.Nm zpoolconcepts
 | 
						|
.Nd overview of ZFS storage pools
 | 
						|
.
 | 
						|
.Sh DESCRIPTION
 | 
						|
.Ss Virtual Devices (vdevs)
 | 
						|
A "virtual device" describes a single device or a collection of devices,
 | 
						|
organized according to certain performance and fault characteristics.
 | 
						|
The following virtual devices are supported:
 | 
						|
.Bl -tag -width "special"
 | 
						|
.It Sy disk
 | 
						|
A block device, typically located under
 | 
						|
.Pa /dev .
 | 
						|
ZFS can use individual slices or partitions, though the recommended mode of
 | 
						|
operation is to use whole disks.
 | 
						|
A disk can be specified by a full path, or it can be a shorthand name
 | 
						|
.Po the relative portion of the path under
 | 
						|
.Pa /dev
 | 
						|
.Pc .
 | 
						|
A whole disk can be specified by omitting the slice or partition designation.
 | 
						|
For example,
 | 
						|
.Pa sda
 | 
						|
is equivalent to
 | 
						|
.Pa /dev/sda .
 | 
						|
When given a whole disk, ZFS automatically labels the disk, if necessary.
 | 
						|
.It Sy file
 | 
						|
A regular file.
 | 
						|
The use of files as a backing store is strongly discouraged.
 | 
						|
It is designed primarily for experimental purposes, as the fault tolerance of a
 | 
						|
file is only as good as the file system on which it resides.
 | 
						|
A file must be specified by a full path.
 | 
						|
.It Sy mirror
 | 
						|
A mirror of two or more devices.
 | 
						|
Data is replicated in an identical fashion across all components of a mirror.
 | 
						|
A mirror with
 | 
						|
.Em N No disks of size Em X No can hold Em X No bytes and can withstand Em N-1
 | 
						|
devices failing, without losing data.
 | 
						|
.It Sy raidz , raidz1 , raidz2 , raidz3
 | 
						|
A distributed-parity layout, similar to RAID-5/6, with improved distribution of
 | 
						|
parity, and which does not suffer from the RAID-5/6
 | 
						|
.Qq write hole ,
 | 
						|
.Pq in which data and parity become inconsistent after a power loss .
 | 
						|
Data and parity are striped across all disks within a raidz group, though not
necessarily in a consistent stripe width.
.Pp
A raidz group can have single, double, or triple parity, meaning that the
raidz group can sustain one, two, or three failures, respectively, without
losing any data.
The
.Sy raidz1
vdev type specifies a single-parity raidz group; the
.Sy raidz2
vdev type specifies a double-parity raidz group; and the
.Sy raidz3
vdev type specifies a triple-parity raidz group.
The
.Sy raidz
vdev type is an alias for
.Sy raidz1 .
.Pp
A raidz group with
.Em N No disks of size Em X No with Em P No parity disks can hold approximately
.Em (N-P)*X No bytes and can withstand Em P No devices failing without losing data .
The minimum number of devices in a raidz group is one more than the number of
parity disks.
The recommended number is between 3 and 9 to help increase performance.
.It Sy draid , draid1 , draid2 , draid3
A variant of raidz that provides integrated distributed hot spares, allowing
for faster resilvering, while retaining the benefits of raidz.
A dRAID vdev is constructed from multiple internal raidz groups, each with
.Em D No data devices and Em P No parity devices .
These groups are distributed over all of the children in order to fully
utilize the available disk performance.
.Pp
Unlike raidz, dRAID uses a fixed stripe width (padding as necessary with
zeros) to allow fully sequential resilvering.
This fixed stripe width significantly affects both usable capacity and IOPS.
For example, with the default
.Em D=8 No and Em 4 KiB No disk sectors the minimum allocation size is Em 32 KiB .
If using compression, this relatively large allocation size can reduce the
effective compression ratio.
When using ZFS volumes (zvols) and dRAID, the default of the
.Sy volblocksize
property is increased to account for the allocation size.
If a dRAID pool will hold a significant amount of small blocks, it is
recommended to also add a mirrored
.Sy special
vdev to store those blocks.
.Pp
With regard to I/O, performance is similar to raidz since, for any read, all
.Em D No data disks must be accessed .
Delivered random IOPS can be reasonably approximated as
.Sy floor((N-S)/(D+P))*single_drive_IOPS .
.Pp
Like raidz, a dRAID can have single, double, or triple parity.
The
.Sy draid1 ,
.Sy draid2 ,
and
.Sy draid3
types can be used to specify the parity level.
The
.Sy draid
vdev type is an alias for
.Sy draid1 .
.Pp
A dRAID with
.Em N No disks of size Em X , D No data disks per redundancy group , Em P
.No parity level, and Em S No distributed hot spares can hold approximately
.Em (N-S)*(D/(D+P))*X No bytes and can withstand Em P
devices failing without losing data.
.It Sy draid Ns Oo Ar parity Oc Ns Oo Sy \&: Ns Ar data Ns Sy d Oc Ns Oo Sy \&: Ns Ar children Ns Sy c Oc Ns Oo Sy \&: Ns Ar spares Ns Sy s Oc
A non-default dRAID configuration can be specified by appending one or more
of the following optional arguments to the
.Sy draid
keyword:
.Bl -tag -compact -width "children"
.It Ar parity
The parity level (1-3).
.It Ar data
The number of data devices per redundancy group.
In general, a smaller value of
.Em D No will increase IOPS, improve the compression ratio ,
and speed up resilvering at the expense of total usable capacity.
Defaults to
.Em 8 , No unless Em N-P-S No is less than Em 8 .
.It Ar children
The expected number of children.
Useful as a cross-check when listing a large number of devices.
An error is returned when the provided number of children differs.
.It Ar spares
The number of distributed hot spares.
Defaults to zero.
.El
.It Sy spare
A pseudo-vdev which keeps track of available hot spares for a pool.
For more information, see the
.Sx Hot Spares
section.
.It Sy log
A separate intent log device.
If more than one log device is specified, then writes are load-balanced between
devices.
Log devices can be mirrored.
However, raidz vdev types are not supported for the intent log.
For more information, see the
.Sx Intent Log
section.
.It Sy dedup
A device solely dedicated for deduplication tables.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one dedup device is specified, then
allocations are load-balanced between those devices.
.It Sy special
A device dedicated solely for allocating various kinds of internal metadata,
and optionally small file blocks.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one special device is specified, then
allocations are load-balanced between those devices.
.Pp
For more information on special allocations, see the
.Sx Special Allocation Class
section.
.It Sy cache
A device used to cache storage pool data.
A cache device cannot be configured as a mirror or raidz group.
For more information, see the
.Sx Cache Devices
section.
.El
.Pp
Virtual devices cannot be nested arbitrarily.
A mirror, raidz or draid virtual device can only be created with files or disks.
Mirrors of mirrors or other such combinations are not allowed.
.Pp
A pool can have any number of virtual devices at the top of the configuration
.Po known as
.Qq root vdevs
.Pc .
Data is dynamically distributed across all top-level devices to balance data
among devices.
As new virtual devices are added, ZFS automatically places data on the newly
available devices.
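.Pp
For example, the following illustrative command
.Pq device names are placeholders
adds another mirrored top-level vdev to an existing pool:
.Dl # Nm zpool Cm add Ar mypool Sy mirror Ar sde sdf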
.Pp
Virtual devices are specified one at a time on the command line,
separated by whitespace.
Keywords like
.Sy mirror No and Sy raidz
are used to distinguish where a group ends and another begins.
For example, the following creates a pool with two root vdevs,
each a mirror of two disks:
.Dl # Nm zpool Cm create Ar mypool Sy mirror Ar sda sdb Sy mirror Ar sdc sdd
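.Pp
The same syntax is used for the other vdev types.
As an illustration
.Pq again with placeholder device names ,
the first command below creates a pool backed by a double-parity raidz group
of six disks, and the second creates a pool backed by a dRAID vdev with
double parity, four data disks per redundancy group, eleven children, and one
distributed spare:
.Dl # Nm zpool Cm create Ar mypool Sy raidz2 Ar sda sdb sdc sdd sde sdf
.Dl # Nm zpool Cm create Ar mypool Sy draid2:4d:11c:1s Ar sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk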
.
.Ss Device Failure and Recovery
ZFS supports a rich set of mechanisms for handling device failure and data
corruption.
All metadata and data is checksummed, and ZFS automatically repairs bad data
from a good copy, when corruption is detected.
.Pp
In order to take advantage of these features, a pool must make use of some form
of redundancy, using either mirrored or raidz groups.
While ZFS supports running in a non-redundant configuration, where each root
vdev is simply a disk or file, this is strongly discouraged.
A single case of bit corruption can render some or all of your data unavailable.
.Pp
A pool's health status is described by one of three states:
.Sy online , degraded , No or Sy faulted .
An online pool has all devices operating normally.
A degraded pool is one in which one or more devices have failed, but the data is
still available due to a redundant configuration.
A faulted pool has corrupted metadata, or one or more faulted devices, and
insufficient replicas to continue functioning.
.Pp
The health of the top-level vdev, such as a mirror or raidz device,
is potentially impacted by the state of its associated vdevs
or component devices.
A top-level vdev or component device is in one of the following states:
.Bl -tag -width "DEGRADED"
.It Sy DEGRADED
One or more top-level vdevs is in the degraded state because one or more
component devices are offline.
Sufficient replicas exist to continue functioning.
.Pp
One or more component devices is in the degraded or faulted state, but
sufficient replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The number of checksum errors exceeds acceptable levels and the device is
degraded as an indication that something may be wrong.
ZFS continues to use the device as necessary.
.It
The number of I/O errors exceeds acceptable levels.
The device could not be marked as faulted because there are insufficient
replicas to continue functioning.
.El
.It Sy FAULTED
One or more top-level vdevs is in the faulted state because one or more
component devices are offline.
Insufficient replicas exist to continue functioning.
.Pp
One or more component devices is in the faulted state, and insufficient
replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The device could be opened, but the contents did not match expected values.
.It
The number of I/O errors exceeds acceptable levels and the device is faulted to
prevent further use of the device.
.El
.It Sy OFFLINE
The device was explicitly taken offline by the
.Nm zpool Cm offline
command.
.It Sy ONLINE
The device is online and functioning.
.It Sy REMOVED
The device was physically removed while the system was running.
Device removal detection is hardware-dependent and may not be supported on all
platforms.
.It Sy UNAVAIL
The device could not be opened.
If a pool is imported when a device was unavailable, then the device will be
identified by a unique identifier instead of its path since the path was never
correct in the first place.
.El
.Pp
Checksum errors represent events where a disk returned data that was expected
to be correct, but was not.
In other words, these are instances of silent data corruption.
The checksum errors are reported in
.Nm zpool Cm status
and
.Nm zpool Cm events .
When a block is stored redundantly, a damaged block may be reconstructed
(e.g. from raidz parity or a mirrored copy).
In this case, ZFS reports the checksum error against the disks that contained
damaged data.
If a block is unable to be reconstructed (e.g. due to 3 disks being damaged
in a raidz2 group), it is not possible to determine which disks were silently
corrupted.
In this case, checksum errors are reported for all disks on which the block
is stored.
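.Pp
For example, per-device error counters, along with any files affected by
unrecoverable errors, can be displayed with:
.Dl # Nm zpool Cm status Fl v Ar pool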
.Pp
If a device is removed and later re-attached to the system,
ZFS attempts to bring the device online automatically.
Device attachment detection is hardware-dependent
and might not be supported on all platforms.
.
.Ss Hot Spares
ZFS allows devices to be associated with pools as
.Qq hot spares .
These devices are not actively used in the pool.
But when an active device fails,
it is automatically replaced by a hot spare.
To create a pool with hot spares, specify a
.Sy spare
vdev with any number of devices.
For example,
.Dl # Nm zpool Cm create Ar pool Sy mirror Ar sda sdb Sy spare Ar sdc sdd
.Pp
Spares can be shared across multiple pools, and can be added with the
.Nm zpool Cm add
command and removed with the
.Nm zpool Cm remove
command.
Once a spare replacement is initiated, a new
.Sy spare
vdev is created within the configuration that will remain there until the
original device is replaced.
At this point, the hot spare becomes available again, if another device fails.
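.Pp
For example, the following illustrative commands
.Pq with placeholder device names
add a spare to an existing pool and later remove it:
.Dl # Nm zpool Cm add Ar pool Sy spare Ar sde
.Dl # Nm zpool Cm remove Ar pool sde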
.Pp
If a pool has a shared spare that is currently being used, the pool cannot be
exported, since other pools may use this shared spare,
which may lead to data corruption.
.Pp
Shared spares add some risk.
If the pools are imported on different hosts,
and both pools suffer a device failure at the same time,
both could attempt to use the spare at the same time.
This may not be detected, resulting in data corruption.
.Pp
An in-progress spare replacement can be cancelled by detaching the hot spare.
If the original faulted device is detached, then the hot spare assumes its
place in the configuration, and is removed from the spare list of all active
pools.
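.Pp
For example, an in-progress replacement by the hot spare
.Ar sde
.Pq an illustrative device name
can be cancelled with:
.Dl # Nm zpool Cm detach Ar pool sde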
.Pp
The
.Sy draid
vdev type provides distributed hot spares.
These hot spares are named after the dRAID vdev they're a part of
.Po Sy draid1 Ns - Ns Ar 2 Ns - Ns Ar 3 No specifies spare Ar 3 No of vdev Ar 2 ,
.No which is a single parity dRAID Pc
and may only be used by that dRAID vdev.
Otherwise, they behave the same as normal hot spares.
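.Pp
For example, a failed child of that dRAID vdev could be replaced by one of its
distributed spares with
.Pq illustrative names :
.Dl # Nm zpool Cm replace Ar pool sdf draid1-2-3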
.Pp
Spares cannot replace log devices.
.
.Ss Intent Log
The ZFS Intent Log (ZIL) satisfies POSIX requirements for synchronous
transactions.
For instance, databases often require their transactions to be on stable storage
devices when returning from a system call.
NFS and other applications can also use
.Xr fsync 2
to ensure data stability.
By default, the intent log is allocated from blocks within the main pool.
However, it might be possible to get better performance using separate intent
log devices such as NVRAM or a dedicated disk.
For example:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy log Ar sdc
.Pp
Multiple log devices can also be specified, and they can be mirrored.
See the
.Sx EXAMPLES
section for an example of mirroring multiple log devices.
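.Pp
As an illustration
.Pq with placeholder device names ,
a mirrored log can be requested at pool creation time:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy log mirror Ar sdc sdd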
.Pp
Log devices can be added, replaced, attached, detached, and removed.
In addition, log devices are imported and exported as part of the pool
that contains them.
Mirrored devices can be removed by specifying the top-level mirror vdev.
.
.Ss Cache Devices
Devices can be added to a storage pool as
.Qq cache devices .
These devices provide an additional layer of caching between main memory and
disk.
For read-heavy workloads, where the working set size is much larger than what
can be cached in main memory, using cache devices allows much more of this
working set to be served from low latency media.
Using cache devices provides the greatest performance improvement for random
read workloads of mostly static content.
.Pp
To create a pool with cache devices, specify a
.Sy cache
vdev with any number of devices.
For example:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy cache Ar sdc sdd
.Pp
Cache devices cannot be mirrored or part of a raidz configuration.
If a read error is encountered on a cache device, that read I/O is reissued to
the original storage pool device, which might be part of a mirrored or raidz
configuration.
.Pp
The content of the cache devices is persistent across reboots and restored
asynchronously when importing the pool in L2ARC (persistent L2ARC).
This can be disabled by setting
.Sy l2arc_rebuild_enabled Ns = Ns Sy 0 .
For cache devices smaller than
.Em 1 GiB ,
ZFS does not write the metadata structures
required for rebuilding the L2ARC, to conserve space.
This can be changed with
.Sy l2arc_rebuild_blocks_min_l2size .
The cache device header
.Pq Em 512 B
is updated even if no metadata structures are written.
Setting
.Sy l2arc_headroom Ns = Ns Sy 0
will result in scanning the full-length ARC lists for cacheable content to be
written in L2ARC (persistent ARC).
If a cache device is added with
.Nm zpool Cm add ,
its label and header will be overwritten and its contents will not be
restored in L2ARC, even if the device was previously part of the pool.
If a cache device is onlined with
.Nm zpool Cm online ,
its contents will be restored in L2ARC.
This is useful in case of memory pressure,
where the contents of the cache device are not fully restored in L2ARC.
The user can off- and online the cache device when there is less memory
pressure, to fully restore its contents to L2ARC.
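.Pp
For example, a cache device can be taken offline and later brought back online
to trigger such a restore
.Pq device name is illustrative :
.Dl # Nm zpool Cm offline Ar pool sdc
.Dl # Nm zpool Cm online Ar pool sdc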
.
.Ss Pool checkpoint
Before starting critical procedures that include destructive actions
.Pq like Nm zfs Cm destroy ,
an administrator can checkpoint the pool's state and, in the case of a
mistake or failure, rewind the entire pool back to the checkpoint.
Otherwise, the checkpoint can be discarded when the procedure has completed
successfully.
.Pp
A pool checkpoint can be thought of as a pool-wide snapshot and should be used
with care as it contains every part of the pool's state, from properties to vdev
configuration.
Thus, certain operations are not allowed while a pool has a checkpoint.
Specifically, vdev removal/attach/detach, mirror splitting, and
changing the pool's GUID.
Adding a new vdev is supported, but in the case of a rewind it will have to be
added again.
Finally, users of this feature should keep in mind that scrubs in a pool that
has a checkpoint do not repair checkpointed data.
.Pp
To create a checkpoint for a pool:
.Dl # Nm zpool Cm checkpoint Ar pool
.Pp
To later rewind to its checkpointed state, you need to first export it and
then rewind it during import:
.Dl # Nm zpool Cm export Ar pool
.Dl # Nm zpool Cm import Fl -rewind-to-checkpoint Ar pool
.Pp
To discard the checkpoint from a pool:
.Dl # Nm zpool Cm checkpoint Fl d Ar pool
.Pp
Dataset reservations (controlled by the
.Sy reservation No and Sy refreservation
properties) may be unenforceable while a checkpoint exists, because the
checkpoint is allowed to consume the dataset's reservation.
Finally, data that is part of the checkpoint but has been freed in the
current state of the pool won't be scanned during a scrub.
.
.Ss Special Allocation Class
Allocations in the special class are dedicated to specific block types.
By default, this includes all metadata, the indirect blocks of user data, and
any deduplication tables.
The class can also be provisioned to accept small file blocks.
.Pp
A pool must always have at least one normal
.Pq non- Ns Sy dedup Ns /- Ns Sy special
vdev before
other devices can be assigned to the special class.
If the
.Sy special
class becomes full, then allocations intended for it
will spill back into the normal class.
.Pp
Deduplication tables can be excluded from the special class by unsetting the
.Sy zfs_ddt_data_is_special
ZFS module parameter.
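.Pp
On Linux, for example, ZFS module parameters are exposed under
.Pa /sys/module/zfs/parameters ,
so this parameter can be changed at runtime:
.Dl # echo 0 > /sys/module/zfs/parameters/zfs_ddt_data_is_special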
.Pp
Inclusion of small file blocks in the special class is opt-in.
Each dataset can control the size of small file blocks allowed
in the special class by setting the
.Sy special_small_blocks
property to nonzero.
See
.Xr zfsprops 7
for more info on this property.
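.Pp
For example, the following illustrative commands
.Pq device and dataset names are placeholders
add a mirrored special vdev to an existing pool and allow a dataset to store
blocks of 32 KiB or smaller in the special class:
.Dl # Nm zpool Cm add Ar pool Sy special mirror Ar sdc sdd
.Dl # Nm zfs Cm set Sy special_small_blocks Ns = Ns Sy 32K Ar pool/dataset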