mirror_ubuntu-kernels/drivers/md
NeilBrown f466722ca6 md: Change handling of save_raid_disk and metadata update during recovery.
Since commit d70ed2e4fa
   MD: Allow restarting an interrupted incremental recovery.

we don't write out the metadata to devices while they are recovering.
This had a good reason, but has unfortunate consequences.  This patch
writes metadata to recovering devices again, and extends the metadata
so that it can describe a bitmap-guided recovery.

At issue is what happens if the array is shut down while a recovery is
happening, particularly a bitmap-guided recovery.
Ideally the recovery should pick up where it left off.
However, the metadata cannot represent the state "A recovery is in
progress, guided by the bitmap".

Before the above mentioned commit, we wrote metadata to the device
which said "this is being recovered and it is up to <here>".  So after
a restart, a full recovery (not bitmap-guided) would happen from
wherever it was up to.

After the commit the metadata wasn't updated so it still said "This
device is fully in sync with <this> event count".  That leads to a
bitmap-based recovery following the whole bitmap, which should be a
lot less work than a full recovery from some starting point.  So this
was an improvement.

However, updating some metadata but not all leads to other problems.
In particular, the metadata written to the fully up-to-date devices
records that the array has all devices present (even though some are
recovering).  So on restart, mdadm wants to find all devices and
expects them to have current event counts.
Obviously they don't (some have old event counts), so when assembling
with --incremental, mdadm waits indefinitely for the rest of the
expected devices.

It really is wrong not to update all the metadata together.  Updating
only part of it is bound to cause confusion.
Instead, we should make it possible to record the truth in the
metadata.  i.e. we need to be able to record that a device is being
recovered based on the bitmap.
We already have a feature flag to say that recovery is happening.  We
now add another one to say that it is a bitmap-based recovery.

With this we can remove the code that disables the write-out of
metadata on some devices.

So this patch:
 - moves the setting of 'saved_raid_disk' from add_new_disk to
   the validate_super methods.  This makes sure it is always set
   properly, both when adding a new device to an array, and when
   assembling an array from a collection of devices.
 - Adds a metadata flag MD_FEATURE_RECOVERY_BITMAP which is only
   used if MD_FEATURE_RECOVERY_OFFSET is set, and records that a
   bitmap-based recovery is allowed.
   This is only present in v1.x metadata. v0.90 doesn't support
   devices which are in the middle of recovery at all.
 - Only skips writing metadata to Faulty devices.

 - Also allows rdev state to be set to "-insync" via sysfs.
   This can be used for external-metadata arrays.  When the
   'role' is set the device is assumed to be in-sync.  If, after
   setting the role, we set the state to "-insync", the role is
   moved to saved_raid_disk which effectively says the device is
   partly in-sync with that slot and needs a bitmap recovery.

Cc: Andrei Warkentin <andreiw@vmware.com>
Signed-off-by: NeilBrown <neilb@suse.de>
2014-01-14 16:44:21 +11:00
bcache bcache: defensively handle format strings 2013-11-10 21:56:43 -08:00
persistent-data dm space map disk: optimise sm_disk_dec_block 2013-11-09 18:20:24 -05:00
bitmap.c sysfs: clean up sysfs_get_dirent() 2013-09-26 15:33:18 -07:00
bitmap.h md/bitmap: record the space available for the bitmap in the superblock. 2012-05-22 13:55:34 +10:00
dm-bio-prison.c dm: add cache target 2013-03-01 22:45:51 +00:00
dm-bio-prison.h dm: add cache target 2013-03-01 22:45:51 +00:00
dm-bio-record.h
dm-bufio.c drivers: convert shrinkers to new count/scan API 2013-09-10 18:56:32 -04:00
dm-bufio.h dm bufio: prefetch 2012-03-28 18:41:29 +01:00
dm-cache-block-types.h dm: add cache target 2013-03-01 22:45:51 +00:00
dm-cache-metadata.c dm cache metadata: check the metadata version when reading the superblock 2013-11-11 11:37:49 -05:00
dm-cache-metadata.h dm cache: add passthrough mode 2013-11-11 11:37:49 -05:00
dm-cache-policy-cleaner.c dm cache: policy change version from string to integer set 2013-03-20 17:21:27 +00:00
dm-cache-policy-internal.h dm cache: add remove_cblock method to policy interface 2013-11-11 11:37:50 -05:00
dm-cache-policy-mq.c dm cache: resolve small nits and improve Documentation 2013-11-12 13:11:09 -05:00
dm-cache-policy.c dm cache: return -EINVAL if the user specifies unknown cache policy 2013-11-09 18:20:18 -05:00
dm-cache-policy.h dm cache: add remove_cblock method to policy interface 2013-11-11 11:37:50 -05:00
dm-cache-target.c dm cache: resolve small nits and improve Documentation 2013-11-12 13:11:09 -05:00
dm-crypt.c tree-wide: use reinit_completion instead of INIT_COMPLETION 2013-11-15 09:32:21 +09:00
dm-delay.c dm: rename request variables to bios 2013-03-01 22:45:47 +00:00
dm-exception-store.c dm: replace simple_strtoul 2012-07-27 15:07:59 +01:00
dm-exception-store.h dm snapshot: test chunk size against both origin and snapshot 2010-08-12 04:13:51 +01:00
dm-flakey.c dm flakey: correct ctr alloc failure mesg 2013-07-10 23:41:17 +01:00
dm-io.c dm: add reserved_bio_based_ios module parameter 2013-09-23 10:42:24 -04:00
dm-ioctl.c dm: allow remove to be deferred 2013-11-09 18:20:22 -05:00
dm-kcopyd.c dm: stop using WQ_NON_REENTRANT 2013-08-23 09:02:13 -04:00
dm-linear.c dm: rename request variables to bios 2013-03-01 22:45:47 +00:00
dm-log-userspace-base.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
dm-log-userspace-transfer.c connector/userns: replace netlink uses of cap_raised() with capable() 2012-05-10 23:21:39 -04:00
dm-log-userspace-transfer.h dm log: userspace add luid to distinguish between concurrent log instances 2009-09-04 20:40:34 +01:00
dm-log.c dm: use memweight() 2012-07-30 17:25:16 -07:00
dm-mpath.c dm mpath: requeue I/O during pg_init 2013-11-05 11:20:34 -05:00
dm-mpath.h
dm-path-selector.c md: Add module.h to all files using it implicitly 2011-10-31 19:31:18 -04:00
dm-path-selector.h dm mpath: add start_io and nr_bytes to path selectors 2009-06-22 10:12:27 +01:00
dm-queue-length.c dm: reject trailing characters in sccanf input 2012-03-28 18:41:26 +01:00
dm-raid1.c dm: stop using WQ_NON_REENTRANT 2013-08-23 09:02:13 -04:00
dm-raid.c MD: Remember the last sync operation that was performed 2013-06-26 12:38:24 +10:00
dm-region-hash.c dm raid1: fix crash with mirror recovery and discard 2012-07-20 14:25:03 +01:00
dm-round-robin.c dm: reject trailing characters in sccanf input 2012-03-28 18:41:26 +01:00
dm-service-time.c dm: reject trailing characters in sccanf input 2012-03-28 18:41:26 +01:00
dm-snap-persistent.c dm snapshot: fix data corruption 2013-10-16 03:17:47 +01:00
dm-snap-transient.c md: Add in export.h for files using EXPORT_SYMBOL 2011-10-31 19:31:19 -04:00
dm-snap.c dm-snapshot: fix performance degradation due to small hash size 2013-09-20 10:36:34 -04:00
dm-stats.c dm stats: fix possible counter corruption on 32-bit systems 2013-09-18 14:41:06 -04:00
dm-stats.h dm: add statistics support 2013-09-05 20:46:06 -04:00
dm-stripe.c dm stripe: silence a couple sparse warnings 2013-09-06 11:36:01 -04:00
dm-switch.c dm: add switch target 2013-07-10 23:41:19 +01:00
dm-sysfs.c Driver core: Constify struct sysfs_ops in struct kobj_type 2010-03-07 17:04:49 -08:00
dm-table.c dm table: print error on preresume failure 2013-11-09 18:20:21 -05:00
dm-target.c dm: allow error target to replace bio-based and request-based targets 2013-09-05 20:46:05 -04:00
dm-thin-metadata.c dm thin: generate event when metadata threshold passed 2013-05-10 14:37:21 +01:00
dm-thin-metadata.h dm thin: generate event when metadata threshold passed 2013-05-10 14:37:21 +01:00
dm-thin.c dm thin: do not expose non-zero discard limits if discards disabled 2013-09-23 10:42:06 -04:00
dm-uevent.c md: Add in export.h for files using EXPORT_SYMBOL 2011-10-31 19:31:19 -04:00
dm-uevent.h
dm-verity.c dm verity: use __ffs and __fls 2013-07-10 23:41:17 +01:00
dm-zero.c dm: rename request variables to bios 2013-03-01 22:45:47 +00:00
dm.c dm: allow remove to be deferred 2013-11-09 18:20:22 -05:00
dm.h dm: allow remove to be deferred 2013-11-09 18:20:22 -05:00
faulty.c block: Add bio_end_sector() 2013-03-23 14:15:29 -07:00
Kconfig dm: fix Kconfig menu indentation 2013-11-09 18:20:22 -05:00
linear.c block: Add bio_end_sector() 2013-03-23 14:15:29 -07:00
linear.h md/linear: typedef removal: linear_conf_t -> struct linear_conf 2011-10-11 16:48:54 +11:00
Makefile dm: add statistics support 2013-09-05 20:46:06 -04:00
md.c md: Change handling of save_raid_disk and metadata update during recovery. 2014-01-14 16:44:21 +11:00
md.h md: fix problem when adding device to read-only array with bitmap. 2014-01-14 16:44:08 +11:00
multipath.c MD: change the parameter of md thread 2012-10-11 13:34:00 +11:00
multipath.h md/multipath: typedef removal: multipath_conf_t -> struct mpconf 2011-10-11 16:48:57 +11:00
raid0.c md: fix buglet in RAID5 -> RAID0 conversion. 2013-06-26 12:38:19 +10:00
raid0.h md: add proper merge_bvec handling to RAID0 and Linear. 2012-03-19 12:46:39 +11:00
raid1.c md/raid1: fix request counting bug in new 'barrier' code. 2014-01-14 16:44:07 +11:00
raid1.h raid1: Rewrite the implementation of iobarrier. 2013-11-19 15:19:18 +11:00
raid5.c md/raid5: fix a recently broken BUG_ON(). 2014-01-14 16:44:07 +11:00
raid5.h md update for 3.13. 2013-11-20 13:05:25 -08:00
raid10.c md/raid10: fix bug when raid10 recovery fails to recover a block. 2014-01-14 16:44:08 +11:00
raid10.h MD RAID10: Improve redundancy for 'far' and 'offset' algorithms (part 1) 2013-02-26 11:55:30 +11:00