diff --git a/debian/badblockhowto.html b/debian/badblockhowto.html index 8b9e6b8..52ea120 100644 --- a/debian/badblockhowto.html +++ b/debian/badblockhowto.html @@ -1,13 +1,8 @@ -
Table of Contents
Handling bad blocks is a difficult problem as it often involves decisions about losing information. Modern storage devices tend to handle the simple cases automatically, for example by writing @@ -35,10 +30,10 @@ the media. Even though such a remapping can be done by a disk drive transparently, there is still a lingering worry about media deterioration and the disk running out of spare sectors to remap.
-Can smartmontools help? As the SMART acronym -[1] -suggests, the smartctl command and the -smartd daemon concentrate on monitoring and analysis. +Can smartmontools help? As the SMART acronym +[1] +suggests, the smartctl command and the +smartd daemon concentrate on monitoring and analysis. So apart from changing some reporting settings, smartmontools will not modify the raw data in a device. Also smartmontools only works with physical devices; it does not know about partitions and file systems. @@ -64,14 +59,14 @@ Another approach is to ignore the upper level consequences (e.g. corrupting a file or worse damage to a file system) and use the facilities offered by a storage device to repair the damage. The SCSI disk command set is used to elaborate on this low level approach. -
This section contains examples of what to do at the file system level when smartmontools reports a bad block. These examples assume the Linux operating system and either the ext2/ext3 or ReiserFS file system. The various Linux commands shown have man pages and the reader is encouraged -to examine these. Of note is the dd command which is +to examine these. Of note is the dd command which is often used in repair work -[2] +[2] and has a unique command line syntax.
The authors would like to thank Sergey Vlasov, Theodore Ts'o, @@ -79,7 +74,7 @@ Michael Bendzick, and others for explaining this approach. The authors would like to add text showing how to do this for other file systems, in particular XFS, and JFS: please email if you can provide this information. -
+
In this example, the disk is failing self-tests at Logical Block Address LBA = 0x016561e9 = 23421417. The LBA counts sectors in units of 512 bytes, and starts at zero. @@ -170,6 +165,18 @@ and the file that contains that inode: root]# debugfs debugfs 1.32 (09-Nov-2002) debugfs: open /dev/hda3 +debugfs: testb 2269012 +Block 2269012 not in use +
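The block number 2269012 handed to testb comes from the usual mapping fs_block = (LBA - partition_start) * 512 / 4096, where the partition start sector is taken from fdisk -ul and is not repeated here. The parts of that arithmetic that are visible above can be checked from a shell:

```shell
# The self-test log gives the failing LBA in hex; confirm the decimal
# value quoted above, and the sector span of the 4096-byte block.
lba=$(printf '%d' 0x016561e9)   # hex LBA from the self-test log
echo "$lba"                     # -> 23421417
# A 4096-byte file system block covers 8 sectors, so block 2269012
# begins at this partition-relative sector:
echo $((2269012 * 8))           # -> 18152096
```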
+ +If the block is not in use, as in the above example, then you can skip +the rest of this step and go ahead to Step Five. +
+If, on the other hand, the block is in use, we want to identify +the file that uses it: +
+debugfs: testb 2269012 +Block 2269012 marked in use debugfs: icheck 2269012 Block Inode number 2269012 41032 @@ -177,11 +184,46 @@ debugfs: ncheck 41032 Inode Pathname 41032 /S1/R/H/714197568-714203359/H-R-714202192-16.gwf
-
In this example, you can see that the problematic file (with the mount
point included in the path) is:
/data/S1/R/H/714197568-714203359/H-R-714202192-16.gwf
+When we are working with an ext3 file system, it may happen that the +affected file is the journal itself. Generally, if this is the case, +the inode number will be very small. In any case, debugfs will not +be able to get the file name: +
+debugfs: testb 2269012 +Block 2269012 marked in use +debugfs: icheck 2269012 +Block Inode number +2269012 8 +debugfs: ncheck 8 +Inode Pathname +debugfs: +
+
+To get around this situation, we can remove the journal altogether: +
+tune2fs -O ^has_journal /dev/hda3 +
+ +and then start again with Step Four: this time we should see that the +bad block is no longer in use. If we removed the journal file, at +the end of the whole procedure we should remember to rebuild it: +
+tune2fs -j /dev/hda3 +
+
+Fifth Step +NOTE: This last step will permanently and irretrievably destroy the contents +of the file system block that is damaged: if the block was allocated to +a file, some of the data that is in this file is going to be overwritten +with zeros. You will not be able to recover that data unless you can +replace the file with a fresh or correct version. +
To force the disk to reallocate this bad block we'll write zeros to the bad block, and sync the disk:
@@ -189,11 +231,6 @@ root]# dd if=/dev/zero of=/dev/hda3 bs=4096 count=1 seek=2269012 root]# sync
-NOTE: This last step has permanently - and irretrievably destroyed some of -the data that was in this file. Don't do this unless you don't need -the file or you can replace it with a fresh or correct version. -
Now everything is back to normal: the sector has been reallocated. Compare the output just below to similar output near the top of this article: @@ -207,9 +244,11 @@ ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_
Note: for some disks it may be necessary to update the SMART Attribute values by using -smartctl -t offline /dev/hda +smartctl -t offline /dev/hda
-The disk now passes its self-tests again: +We have corrected the first errored block. If more than one block +was errored, we should repeat all the steps for each subsequent one. +After we do that, the disk will pass its self-tests again:
root]# smartctl -t long /dev/hda [wait until test completes, then] @@ -235,7 +274,7 @@ ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_ 197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
-
On this drive, the first sign of trouble was this email from smartd:
To: ballen @@ -248,7 +287,7 @@ On this drive, the first sign of trouble was this email from smartd: Device: /dev/hda, Self-Test Log error count increased from 0 to 1
-Running smartctl -a /dev/hda confirmed the problem: +Running smartctl -a /dev/hda confirmed the problem:
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error @@ -354,6 +393,12 @@ Inode Pathname 45192 /S1/R/H/714979488-714985279/H-R-714979984-16.gwf debugfs: quit
+Note that the first few steps of this procedure could also be done +with a single command, which is very helpful if there are many bad +blocks (thanks to Danie Marais for pointing this out): +
+debugfs: icheck 3778301 3778302 3778303 +
And finally, just to confirm that this is really the damaged file:
@@ -396,14 +441,14 @@ Num Test_Description Status Remaining LifeTime(hours) LBA # 1 Extended offline Completed without error 00% 692 - # 2 Extended offline Completed: read failure 80% 682 0x021d9f44
-
This section was written by Kay Diederichs. Even though this section assumes Linux and the ext2/ext3 file system, the strategy should be more generally applicable.
I read your badblocks-howto at and greatly benefited from it. One thing that's (maybe) missing is that often the -smartctl -t long scan finds a bad sector which is +smartctl -t long scan finds a bad sector which is not assigned to any file. In that case it does not help to run debugfs, or rather debugfs reports the fact that no file owns that sector. Furthermore, @@ -418,12 +463,12 @@ huge file on that file system.
creates the file. Leave it running until the partition/file system is full. This will make the disk reallocate those sectors which do not -belong to a file. Check the smartctl -a output after +belong to a file. Check the smartctl -a output after that and make sure that the sectors are reallocated. If any remain, use the debugfs method. Of course the usual caveats apply - back it up first, and so on. -
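The fill-and-delete idea can be sketched as below. The target path is a stand-in (a temporary file here, so the sketch is safe to run); in real use you would create the file on the affected file system and omit count= so that dd keeps writing until the file system is full:

```shell
fill=$(mktemp)                    # stand-in for a file on the affected fs
dd if=/dev/zero of="$fill" bs=4096 count=4 2>/dev/null  # drop count= to fill the fs
sync                              # make sure the writes reach the media
rm "$fill"                        # reclaim the space afterwards
```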
This section was written by Joachim Jautz with additions from Manfred Schwarb.
@@ -436,15 +481,15 @@ smartd[575]: Device: /dev/hda, 1 Offline uncorrectable sectors
[Step 0] The SMART selftest/error log
-(see smartctl -l selftest) indicated there was a problem
+(see smartctl -l selftest) indicated there was a problem
with block address (i.e. the 512 byte sector at) 58656333. The partition
-table (e.g. see sfdisk -luS /dev/hda or
-fdisk -ul /dev/hda) indicated that this block was in the
+table (e.g. see sfdisk -luS /dev/hda or
+fdisk -ul /dev/hda) indicated that this block was in the
/dev/hda3
partition which contained a ReiserFS file
system. That partition started at block address 54781650.
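The 4 KB file system block used in the later steps follows from these two numbers, with the division truncated to an integer:

```shell
bad=58656333       # failing 512-byte sector from the self-test log
start=54781650     # first sector of /dev/hda3 from the partition table
bs=4096            # ReiserFS block size (reported by debugreiserfs)
# Sector offset within the partition, times 512 bytes, divided by the
# block size; shell integer arithmetic truncates, which is what we want.
echo $(( (bad - start) * 512 / bs ))   # -> 484335
```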
While doing the initial analysis it may also be useful to take a copy -of the disk attributes returned by smartctl -A /dev/hda. +of the disk attributes returned by smartctl -A /dev/hda. Specifically the values associated with the "Reallocated_Sector_Ct" and "Reallocated_Event_Count" attributes (for ATA disks, the grown list (GLIST) length for SCSI disks). If these are incremented at the end of the procedure @@ -463,7 +508,7 @@ Blocksize: 4096
It is re-assuring that the calculated 4 KB damaged block address in
/dev/hda3
is less than "Count of blocks on the
-device" shown in the output of debugreiserfs shown above.
+device" shown in the output of debugreiserfs shown above.
[Step 3] Try to get more info about this block => reading the block fails as expected but at least we see now that it seems to be unused. @@ -487,7 +532,7 @@ advice then if you have just a few bad blocks, try writing to the bad blocks and see if the drive remaps the bad blocks (that means it takes a block it has in reserve and allocates it for use for that block number). If it cannot remap the block, use the -badblock option (-B) with reiserfs utils to handle +badblock option (-B) with reiserfs utils to handle this block correctly.
bread: Cannot read the block (484335): (Input/output error). @@ -497,24 +542,24 @@ Aborted So it looks like we have the right (i.e. faulty) block address.[Step 4] Try then to find the affected file -[3]: +[3]:
-tar -cO /mydir >/dev/null +tar -cO /mydir | cat >/dev/null If you do not find any unreadable files, then the block may be free or located in some metadata of the file system.
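An equivalent way to force every file to be read, using only find and cat, might look like this (the directory defaults to the current one so the sketch stands alone; in real use point it at the mount point, e.g. /mydir):

```shell
dir=${dir:-.}      # placeholder: set to the mount point to be checked
# Read every regular file; an unreadable one produces an I/O error on
# stderr together with its name.
find "$dir" -type f -exec cat {} + > /dev/null
```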
[Step 5] Try your luck: bang the affected block with -badblocks -n (non-destructive read-write mode, do unmount +badblocks -n (non-destructive read-write mode, do unmount first), if you are very lucky the failure is transient and you can provoke reallocation -[4]: +[4]:
# badblocks -b 4096 -p 3 -s -v -n /dev/hda3 `expr 484335 + 100` `expr 484335 - 100`-check success with debugreiserfs -1 484335 /dev/hda3. +check success with debugreiserfs -1 484335 /dev/hda3. Otherwise:
[Step 6] Perform this step only if Step 5 has failed @@ -535,23 +580,23 @@ This could take a long time so you probably better go for lunch ...
[Step 8] Proceed as stated earlier. For example, sync disk and run a long selftest that should succeed now. -
This section first looks at a damaged partition table. Then it ignores the upper level impact of a bad block and just repairs the underlying sector so that defective sector will not cause problems in the future. -
+
Some software failures can lead to zeroes or random data being written on the first block of a disk. For disks that use a DOS-based partitioning scheme this will overwrite the partition table which is found at the end of the first block. This is a single point of failure so after the -damage tools like fdisk have no alternate data to use +damage tools like fdisk have no alternate data to use so they report no partitions or a damaged partition table.
One utility that may help is
-
+
testdisk
which can scan a disk looking for
partitions and recreate a partition table if requested.
-[6]
+[6]
Programs that create DOS partitions often place the first partition at logical block address 63. In Linux @@ -574,7 +619,7 @@ a disk. The extended DOS partition table is placed elsewhere on a disk. Again there is only one copy of it so it represents another single point of failure. All DOS partition information can be read in a form that can be used to recreate the tables with the -sfdisk command. Obviously this needs to be done +sfdisk command. Obviously this needs to be done beforehand and the file put on other media. Here is how to fetch the partition table information:
@@ -594,15 +639,15 @@ block(s) holding the partition table(s) and puts it in changes the partition tables as indicated bymy_disk_partition_info.txt
. For what it is worth the author did test this on his system! -[7] +[7]For creating, destroying, resizing, checking and copying partitions, and the file systems on them, GNU's - +
parted
is worth examining. -The +TheLarge Disk HOWTO
is also a useful resource. -
This section was written by Frederic BOITEUX. It was titled: "HOW TO LOCATE AND REPAIR BAD BLOCKS ON AN LVM VOLUME".
@@ -750,10 +795,10 @@ renounce
Search / correction follows the same scheme as for simple partitions : -
+
find possible impacted files with debugfs (icheck <fs block nb>, then ncheck <icheck nb>). -
+
reallocate bad block writing zeros in it, *using the fs block size* :
@@ -762,7 +807,7 @@ dd if=/dev/zero of=/dev/WDC80Go/ext1 count=1 bs=4096 seek=1378581
Et voilà ! -
The SCSI disk command set and associated disk architecture are assumed in this section. SCSI disks have their own logical to physical mapping allowing a damaged sector (usually carrying 512 bytes of data) to be @@ -789,11 +834,11 @@ sectors are a scarce resource.
Once a SCSI disk format has completed successfully, other problems may appear over time. These fall into two categories: -
+
recoverable: the Error Correction Codes (ECC) detect a problem but it is small enough to be corrected. Optionally other strategies such as retrying the access may retrieve the data. -
+
unrecoverable: try as it may, the disk logic and ECC algorithms cannot recover the data. This is often reported as a medium error. @@ -808,12 +853,12 @@ the incoming data detects a CRC error due to a bad cable or termination. Depending on the disk vendor, recoverable errors can be ignored. After all, some disks have up to 68 bytes of ECC above the payload size of 512 bytes so why use up spare sectors which are limited in number -[8] +[8] ? If the disk can recover the data and does decide to re-allocate (reassign) a sector, then first it checks the settings of the ARRE and AWRE bits in the read-write error recovery mode page. Usually these bits are set -[9] +[9] enabling automatic (read or write) re-allocation. The automatic re-allocation may also fail if the zone (or disk) has run out of spare sectors. @@ -825,7 +870,7 @@ disk to spend too long trying to recover an error. Unrecoverable errors will cause a medium error sense key, perhaps with some useful additional sense information. If the extended background self test includes a full disk read scan, one would expect the -self test log to list the bad block, as shown in the section called “Repairs in a file system”. +self test log to list the bad block, as shown in the section called “Repairs in a file system”. Recent SCSI disks with a periodic background scan should also list unrecoverable read errors (and some recoverable errors as well). The advantage of the background scan is that it runs to completion while self @@ -839,7 +884,7 @@ command will reassign one or more blocks, attempting to (partially ?) recover the data (a forlorn hope at this stage), fetch an unused spare sector from the current zone while adding the damaged old sector to the GLIST (hence the name "grown" list).
The contents of the GLIST may not be that interesting -but smartctl prints out the number of entries in the grown +but smartctl prints out the number of entries in the grown list and if that number grows quickly, the disk may be approaching the end of its useful life.
@@ -847,26 +892,26 @@ Here is an alternate brute force technique to consider: if the data on the SCSI or ATA disk has all been backed up (e.g. is held on the other disks in a RAID 5 enclosure), then simply reformatting the disk may be the least cumbersome approach. -
+
Given a "bad block", it still may be useful to look at the -fdisk command (if the disk has multiple partitions) +fdisk command (if the disk has multiple partitions) to find out which partition is involved, then use -debugfs (or a similar tool for the file system in +debugfs (or a similar tool for the file system in question) to find out which, if any, file or other part of the file system -may have been damaged. This is discussed in the section called “Repairs in a file system”. +may have been damaged. This is discussed in the section called “Repairs in a file system”.
Then a program that can execute the REASSIGN BLOCKS SCSI command is required. In Linux (2.4 and 2.6 series), FreeBSD, Tru64(OSF) and Windows -the author's sg_reassign utility in the sg3_utils +the author's sg_reassign utility in the sg3_utils package can be used. Also found in that package is -sg_verify which can be used to check that a block is +sg_verify which can be used to check that a block is readable.
Assume that logical block address 1193046 (which is 123456 in hex) is
corrupt
-[10]
+[10]
on the disk at /dev/sdb
. A long selftest command like
-smartctl -t long /dev/sdb may result in log results
+smartctl -t long /dev/sdb may result in log results
like this:
# smartctl -l selftest /dev/sdb @@ -882,7 +927,7 @@ Num Test Status segment LifeTime LBA_first_err [SK AS # 3 Background short Completed - 194 - [- - -]
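The correspondence between the decimal LBA 1193046 and the hex 0x123456 used in such logs can be checked with printf:

```shell
printf '%d\n' 0x123456    # hex LBA from the log -> 1193046
printf '0x%x\n' 1193046   # and back again -> 0x123456
```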
-The sg_verify utility can be used to confirm that there +The sg_verify utility can be used to confirm that there is a problem at that address:
# sg_verify --lba=1193046 /dev/sdb @@ -911,17 +956,17 @@ length:The GLIST length has grown by one as expected. If the disk was unable to recover any data, then the "new" block at lba 0x123456 has vendor specific -data in it. The sg_reassign utility can also do bulk -reassigns, see man sg_reassign for more information. +data in it. The sg_reassign utility can also do bulk +reassigns, see man sg_reassign for more information.
-The dd command could be used to read the contents of +The dd command could be used to read the contents of the "new" block:
# dd if=/dev/sdb iflag=direct skip=1193046 of=blk.img bs=512 count=1
and a hex editor -[11] +[11] used to view and potentially change the
blk.img
file. An alteredblk.img
file (or/dev/zero
) could be written back with: @@ -936,36 +981,38 @@ a superblock or a directory. Even if a full backup of the disk is available, or the disk has been "ejected" from a RAID, it may still be worthwhile to reassign the bad block(s) that caused the problem (or simply format the disk (see -sg_format in the sg3_utils package)) and re-use the +sg_format in the sg3_utils package)) and re-use the disk later (not unlike the way a replacement disk from a manufacturer might be used).-CVS $Id: badblockhowto.xml,v 1.4 2007/01/31 13:56:32 dpgilbert Exp $ -
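Returning to the blk.img copy made with dd above: if no hex editor is at hand, od from coreutils can at least provide the viewing half. The zero-filled blk.img below is fabricated so the sketch stands alone:

```shell
dd if=/dev/zero of=blk.img bs=512 count=1 2>/dev/null  # stand-in for the real copy
od -A x -t x1z blk.img | head   # hex offsets and bytes, ASCII alongside
rm blk.img
```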
[1] Self-Monitoring, Analysis and Reporting Technology -> SMART -
[2] -Starting with GNU coreutils release 5.3.0, the dd +
[2]
+Starting with GNU coreutils release 5.3.0, the dd
command in Linux includes the options 'iflag=direct' and 'oflag=direct'.
-Using these with the dd commands should be helpful,
+Using these with the dd commands should be helpful,
because adding these flags should avoid any interaction
with the block buffering IO layer in Linux and permit direct reads/writes
-from the raw device. Use dd --help to see if your
+from the raw device. Use dd --help to see if your
version of dd supports these options. If not, the latest code for dd
-can be found at
+can be found at
alpha.gnu.org/gnu/coreutils
.
-
[3] -Do not use tar cf /dev/null, see -info tar. -
[4] +
[3]
+Do not use tar -c -f /dev/null or
+tar -cO /mydir >/dev/null. GNU tar does not
+actually read the files if /dev/null
is used as
+archive path or as standard output, see info tar.
+
[4] Important: the chosen block range is arbitrary, but do not test only a single block, as bad blocks are often clustered ("social"). Do not make the range too large either, as this test is probably not completely free of risk. -
[5] +
[5] The rather awkward `expr 484335 + 100` (note the back quotes) can be replaced with $((484335+100)) if the bash shell is being used. Similarly the last argument can become $((484335-100)) . -
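Both spellings evaluate to the same number, as a quick check shows:

```shell
expr 484335 + 100        # -> 484435
echo $((484335 + 100))   # -> 484435
```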
[6] -testdisk scans the media for the beginning of file +
[6]
+testdisk scans the media for the beginning of file
systems that it recognizes. It can be tricked by data that looks
like the beginning of a file system or an old file system from a
previous partitioning of the media (disk). So care should be taken.
@@ -974,24 +1021,24 @@ extended partitions lie wholly within a extended partition table
allocation. Also if the root partition of a Linux/Unix installation
can be found then the /etc/fstab
file is a useful
resource for finding the partition numbers of other partitions.
-
[7] +
[7] Thanks to Manfred Schwarb for the information about storing partition table(s) beforehand. -
[8] +
[8] Detecting and fixing an error with ECC "on the fly" and not going the further step and reassigning the block in question may explain why some disks have large numbers in their read error counter log. Various worried users have reported large numbers in the "errors corrected without substantial delay" counter field which is in the "Errors corrected by ECC fast" column in -the smartctl -l error output. -
[9] +the smartctl -l error output. +
[9] Often disks inside a hardware RAID have the ARRE and AWRE bits cleared (disabled) so the RAID controller can do things manually or flag the disk for replacement. -
[10] +
[10] In this case the corruption was manufactured by using the WRITE LONG -SCSI command. See sg_write_long in sg3_utils. -
[11] +SCSI command. See sg_write_long in sg3_utils. +
[11] Most window managers have a handy calculator that will do hex to decimal conversions. More work may be needed at the file system level,