mirror of
https://git.proxmox.com/git/mirror_smartmontools-debian
synced 2025-08-03 04:03:03 +00:00
Updated debian/badblockhowto.html
Closes: #540359 Thanks: Francesco Potorti`
parent
9ca7a79c40
commit
d494f0050a
247
debian/badblockhowto.html
vendored
@ -1,13 +1,8 @@
|
||||
<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Bad block HOWTO for smartmontools</title><meta name="generator" content="DocBook XSL Stylesheets V1.69.1"><meta name="description" content="
|
||||
This article describes what actions might be taken when smartmontools
|
||||
detects a bad block on a disk. It demonstrates how to identify the file
|
||||
associated with an unreadable disk sector, and how to force that sector
|
||||
to reallocate.
|
||||
"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="article" lang="en"><div class="titlepage"><div><div><h1 class="title"><a name="index"></a>Bad block HOWTO for smartmontools</h1></div><div><div class="author"><h3 class="author"><span class="firstname">Bruce</span> <span class="surname">Allen</span></h3><div class="affiliation"><div class="address"><p><br>
|
||||
<code class="email"><<a href="mailto:smartmontools-support@lists.sourceforge.net">smartmontools-support@lists.sourceforge.net</a>></code><br>
|
||||
<html><head><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"><title>Bad block HOWTO for smartmontools</title><meta name="generator" content="DocBook XSL Stylesheets V1.75.2"><meta name="description" content="This article describes what actions might be taken when smartmontools detects a bad block on a disk. It demonstrates how to identify the file associated with an unreadable disk sector, and how to force that sector to reallocate."></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="article" title="Bad block HOWTO for smartmontools"><div class="titlepage"><div><div><h2 class="title"><a name="index"></a>Bad block HOWTO for smartmontools</h2></div><div><div class="author"><h3 class="author"><span class="firstname">Bruce</span> <span class="surname">Allen</span></h3><div class="affiliation"><div class="address"><p><br>
|
||||
<code class="email"><<a class="email" href="mailto:smartmontools-support@lists.sourceforge.net">smartmontools-support@lists.sourceforge.net</a>></code><br>
|
||||
</p></div></div></div></div><div><div class="author"><h3 class="author"><span class="firstname">Douglas</span> <span class="surname">Gilbert</span></h3><div class="affiliation"><div class="address"><p><br>
|
||||
<code class="email"><<a href="mailto:smartmontools-support@lists.sourceforge.net">smartmontools-support@lists.sourceforge.net</a>></code><br>
|
||||
</p></div></div></div></div><div><p class="copyright">Copyright © 2004, 2005, 2006, 2007 Bruce Allen</p></div><div><div class="legalnotice"><a name="id4710404"></a><p>
|
||||
<code class="email"><<a class="email" href="mailto:smartmontools-support@lists.sourceforge.net">smartmontools-support@lists.sourceforge.net</a>></code><br>
|
||||
</p></div></div></div></div><div><p class="copyright">Copyright © 2004, 2005, 2006, 2007 Bruce Allen</p></div><div><div class="legalnotice" title="Legal Notice"><a name="id2541562"></a><p>
|
||||
Permission is granted to copy, distribute and/or modify this document
|
||||
under the terms of the GNU Free Documentation License, Version 1.1
|
||||
or any later version published by the Free Software Foundation;
|
||||
@ -15,18 +10,18 @@
|
||||
no Back-Cover Texts.
|
||||
</p><p>
|
||||
For an online copy of the license see
|
||||
<a href="http://www.fsf.org/copyleft/fdl.html" target="_top">
|
||||
<a class="ulink" href="http://www.fsf.org/copyleft/fdl.html" target="_top">
|
||||
<code class="literal">www.fsf.org/copyleft/fdl.html</code></a>.
|
||||
</p></div></div><div><p class="pubdate">2007-01-23</p></div><div><div class="revhistory"><table border="1" width="100%" summary="Revision history"><tr><th align="left" valign="top" colspan="3"><b>Revision History</b></th></tr><tr><td align="left">Revision 1.1</td><td align="left">2007-01-23</td><td align="left">dpg</td></tr><tr><td align="left" colspan="3">
|
||||
add sections on ReiserFS and partition table damage
|
||||
</td></tr><tr><td align="left">Revision 1.0</td><td align="left">2006-11-14</td><td align="left">dpg</td></tr><tr><td align="left" colspan="3">
|
||||
merge BadBlockHowTo.txt and BadBlockSCSIHowTo.txt
|
||||
</td></tr></table></div></div><div><div class="abstract"><p class="title"><b>Abstract</b></p><p>
|
||||
</td></tr></table></div></div><div><div class="abstract" title="Abstract"><p class="title"><b>Abstract</b></p><p>
|
||||
This article describes what actions might be taken when smartmontools
|
||||
detects a bad block on a disk. It demonstrates how to identify the file
|
||||
associated with an unreadable disk sector, and how to force that sector
|
||||
to reallocate.
|
||||
</p></div></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="sect1"><a href="#intro">Introduction</a></span></dt><dt><span class="sect1"><a href="#rfile">Repairs in a file system</a></span></dt><dd><dl><dt><span class="sect2"><a href="#e2_example1">ext2/ext3 first example</a></span></dt><dt><span class="sect2"><a href="#e2_example2">ext2/ext3 second example</a></span></dt><dt><span class="sect2"><a href="#unassigned">Unassigned sectors</a></span></dt><dt><span class="sect2"><a href="#reiserfs_ex">ReiserFS example</a></span></dt></dl></dd><dt><span class="sect1"><a href="#sdisk">Repairs at the disk level</a></span></dt><dd><dl><dt><span class="sect2"><a href="#partition">Partition table problems</a></span></dt><dt><span class="sect2"><a href="#lvm">LVM repairs</a></span></dt><dt><span class="sect2"><a href="#bb">Bad block reassignment</a></span></dt></dl></dd></dl></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="intro"></a>Introduction</h2></div></div></div><p>
|
||||
</p></div></div></div><hr></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="sect1"><a href="#intro">Introduction</a></span></dt><dt><span class="sect1"><a href="#rfile">Repairs in a file system</a></span></dt><dd><dl><dt><span class="sect2"><a href="#e2_example1">ext2/ext3 first example</a></span></dt><dt><span class="sect2"><a href="#e2_example2">ext2/ext3 second example</a></span></dt><dt><span class="sect2"><a href="#unassigned">Unassigned sectors</a></span></dt><dt><span class="sect2"><a href="#reiserfs_ex">ReiserFS example</a></span></dt></dl></dd><dt><span class="sect1"><a href="#sdisk">Repairs at the disk level</a></span></dt><dd><dl><dt><span class="sect2"><a href="#partition">Partition table problems</a></span></dt><dt><span class="sect2"><a href="#lvm">LVM repairs</a></span></dt><dt><span class="sect2"><a href="#bb">Bad block reassignment</a></span></dt></dl></dd></dl></div><div class="sect1" title="Introduction"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="intro"></a>Introduction</h2></div></div></div><p>
|
||||
Handling bad blocks is a difficult problem as it often involves
|
||||
decisions about losing information. Modern storage devices tend
|
||||
to handle the simple cases automatically, for example by writing
|
||||
@ -35,10 +30,10 @@ the media. Even though such a remapping can be done by a disk
|
||||
drive transparently, there is still a lingering worry about media
|
||||
deterioration and the disk running out of spare sectors to remap.
|
||||
</p><p>
|
||||
Can smartmontools help? As the <span class="acronym">SMART</span> acronym
|
||||
<sup>[<a name="id4710480" href="#ftn.id4710480">1</a>]</sup>
|
||||
suggests, the <span><strong class="command">smartctl</strong></span> command and the
|
||||
<span><strong class="command">smartd</strong></span> daemon concentrate on monitoring and analysis.
|
||||
Can smartmontools help? As the <acronym class="acronym">SMART</acronym> acronym
|
||||
<sup>[<a name="id2506421" href="#ftn.id2506421" class="footnote">1</a>]</sup>
|
||||
suggests, the <span class="command"><strong>smartctl</strong></span> command and the
|
||||
<span class="command"><strong>smartd</strong></span> daemon concentrate on monitoring and analysis.
|
||||
So apart from changing some reporting settings, smartmontools will not
|
||||
modify the raw data in a device. Also smartmontools only works with
|
||||
physical devices; it does not know about partitions and file systems.
|
||||
@ -64,14 +59,14 @@ Another approach is to ignore the upper level consequences (e.g. corrupting
|
||||
a file or worse damage to a file system) and use the facilities offered by
|
||||
a storage device to repair the damage. The SCSI disk command set is used
|
||||
to elaborate on this low level approach.
|
||||
</p></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="rfile"></a>Repairs in a file system</h2></div></div></div><p>
|
||||
</p></div><div class="sect1" title="Repairs in a file system"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="rfile"></a>Repairs in a file system</h2></div></div></div><p>
|
||||
This section contains examples of what to do at the file system level
|
||||
when smartmontools reports a bad block. These examples assume the Linux
|
||||
operating system and either the ext2/ext3 or ReiserFS file system. The
|
||||
various Linux commands shown have man pages and the reader is encouraged
|
||||
to examine these. Of note is the <span><strong class="command">dd</strong></span> command which is
|
||||
to examine these. Of note is the <span class="command"><strong>dd</strong></span> command which is
|
||||
often used in repair work
|
||||
<sup>[<a name="id4710574" href="#ftn.id4710574">2</a>]</sup>
|
||||
<sup>[<a name="id2506498" href="#ftn.id2506498" class="footnote">2</a>]</sup>
|
||||
and has a unique command line syntax.
|
||||
</p><p>
|
||||
The authors would like to thank Sergey Vlasov, Theodore Ts'o,
|
||||
@ -79,7 +74,7 @@ Michael Bendzick, and others for explaining this approach. The authors would
|
||||
like to add text showing how to do this for other file systems, in
|
||||
particular XFS, and JFS: please email if you can provide this
|
||||
information.
|
||||
</p><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="e2_example1"></a>ext2/ext3 first example</h3></div></div></div><p>
|
||||
</p><div class="sect2" title="ext2/ext3 first example"><div class="titlepage"><div><div><h3 class="title"><a name="e2_example1"></a>ext2/ext3 first example</h3></div></div></div><p>
|
||||
In this example, the disk is failing self-tests at Logical Block
|
||||
Address LBA = 0x016561e9 = 23421417. The LBA counts sectors in units
|
||||
of 512 bytes, and starts at zero.
|
||||
@ -170,6 +165,18 @@ and the file that contains that inode:
|
||||
root]# debugfs
|
||||
debugfs 1.32 (09-Nov-2002)
|
||||
debugfs: open /dev/hda3
|
||||
debugfs: testb 2269012
|
||||
Block 2269012 not in use
|
||||
</pre><p>
|
||||
|
||||
If the block is not in use, as in the above example, then you can skip
|
||||
the rest of this step and go ahead to Step Five.
|
||||
</p><p>
|
||||
If, on the other hand, the block is in use, we want to identify
|
||||
the file that uses it:
|
||||
</p><pre class="programlisting">
|
||||
debugfs: testb 2269012
|
||||
Block 2269012 marked in use
|
||||
debugfs: icheck 2269012
|
||||
Block Inode number
|
||||
2269012 41032
|
||||
@ -177,11 +184,46 @@ debugfs: ncheck 41032
|
||||
Inode Pathname
|
||||
41032 /S1/R/H/714197568-714203359/H-R-714202192-16.gwf
|
||||
</pre><p>
|
||||
|
||||
In this example, you can see that the problematic file (with the mount
|
||||
point included in the path) is:
|
||||
<code class="filename">/data/S1/R/H/714197568-714203359/H-R-714202192-16.gwf</code>
|
||||
</p><p>
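The pathname printed by debugfs is relative to the root of the file system,
so the mount point has to be put in front of it. A minimal Python sketch of
that step, assuming the mount point is /data (as the full path above implies);
the function name is illustrative only and is not part of debugfs:
</p>

```python
import posixpath


def full_path_from_ncheck(ncheck_output, mount_point):
    """Map the pathname printed by debugfs `ncheck` to a path under the mount point."""
    for line in ncheck_output.splitlines():
        parts = line.split(None, 1)
        # Skip the "Inode Pathname" header and any blank lines.
        if len(parts) == 2 and parts[0].isdigit():
            return posixpath.join(mount_point, parts[1].strip().lstrip("/"))
    return None  # no pathname reported for this inode


# Sample text copied from the ncheck output shown above.
sample = "Inode\tPathname\n41032\t/S1/R/H/714197568-714203359/H-R-714202192-16.gwf\n"
print(full_path_from_ncheck(sample, "/data"))
```

<p>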
|
||||
When we are working with an ext3 file system, it may happen that the
|
||||
affected file is the journal itself. Generally, if this is the case,
|
||||
the inode number will be very small. In any case, debugfs will not
|
||||
be able to get the file name:
|
||||
</p><pre class="programlisting">
|
||||
debugfs: testb 2269012
|
||||
Block 2269012 marked in use
|
||||
debugfs: icheck 2269012
|
||||
Block Inode number
|
||||
2269012 8
|
||||
debugfs: ncheck 8
|
||||
Inode Pathname
|
||||
debugfs:
|
||||
</pre><p>
|
||||
</p><p>
|
||||
To get around this situation, we can remove the journal altogether:
|
||||
</p><pre class="programlisting">
|
||||
tune2fs -O ^has_journal /dev/hda3
|
||||
</pre><p>
|
||||
|
||||
and then start again with Step Four: we should see this time that the
|
||||
bad block is no longer in use. If we removed the journal file, at
|
||||
the end of the whole procedure we should remember to rebuild it:
|
||||
</p><pre class="programlisting">
|
||||
tune2fs -j /dev/hda3
|
||||
</pre><p>
|
||||
</p><p>
|
||||
Fifth Step
|
||||
<span class="emphasis"><em>NOTE:</em></span> This last step will <span class="emphasis"><em>permanently
|
||||
|
||||
</em></span> and irretrievably <span class="emphasis"><em>destroy</em></span> the contents
|
||||
of the file system block that is damaged: if the block was allocated to
|
||||
a file, some of the data that is in this file is going to be overwritten
|
||||
with zeros. You will not be able to recover that data unless you can
|
||||
replace the file with a fresh or correct version.
|
||||
</p><p>
|
||||
To force the disk to reallocate this bad block we'll write zeros to
|
||||
the bad block, and sync the disk:
|
||||
</p><pre class="programlisting">
|
||||
@ -189,11 +231,6 @@ root]# dd if=/dev/zero of=/dev/hda3 bs=4096 count=1 seek=2269012
|
||||
root]# sync
|
||||
</pre><p>
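Because dd is given bs=4096, the seek value counts 4096-byte blocks from the
start of the partition, and the whole file system block is zeroed rather than
only the failing 512-byte sector. A rough Python sketch of that arithmetic,
assuming 512-byte sectors; the function name is hypothetical:
</p>

```python
def dd_write_span(seek, bs, count=1, sector_size=512):
    """Return (byte_offset, first_sector, sector_count) that dd would overwrite.

    All values are relative to the start of the output device, which is the
    partition /dev/hda3 in the example above, not the whole disk.
    """
    byte_offset = seek * bs
    first_sector = byte_offset // sector_size
    sector_count = (count * bs) // sector_size
    return byte_offset, first_sector, sector_count


# Values from the dd command above: bs=4096 count=1 seek=2269012
offset, first, n = dd_write_span(seek=2269012, bs=4096)
print(offset, first, n)
```

<p>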
|
||||
</p><p>
|
||||
<span class="emphasis"><em>NOTE:</em></span> This last step has <span class="emphasis"><em>permanently
|
||||
</em></span> and irretrievably <span class="emphasis"><em>destroyed</em></span> some of
|
||||
the data that was in this file. Don't do this unless you don't need
|
||||
the file or you can replace it with a fresh or correct version.
|
||||
</p><p>
|
||||
Now everything is back to normal: the sector has been reallocated.
|
||||
Compare the output just below to similar output near the top of this
|
||||
article:
|
||||
@ -207,9 +244,11 @@ ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_
|
||||
</pre><p>
|
||||
|
||||
Note: for some disks it may be necessary to update the SMART Attribute values by using
|
||||
<span><strong class="command">smartctl -t offline /dev/hda</strong></span>
|
||||
<span class="command"><strong>smartctl -t offline /dev/hda</strong></span>
|
||||
</p><p>
|
||||
The disk now passes its self-tests again:
|
||||
We have corrected the first bad block. If more than one block
|
||||
was in error, we should repeat all the steps for each of the others.
|
||||
After we do that, the disk will pass its self-tests again:
|
||||
|
||||
</p><pre class="programlisting">
|
||||
root]# smartctl -t long /dev/hda [wait until test completes, then]
|
||||
@ -235,7 +274,7 @@ ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_
|
||||
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
|
||||
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
|
||||
</pre><p>
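When several disks have to be checked, the raw values in the last column can
be read by a script. A minimal Python sketch, assuming the usual ten-column
layout of the attribute table (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST,
THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE) and an integer raw value, which
holds for the two attributes shown above; the function name is illustrative only:
</p>

```python
def parse_attribute_line(line):
    """Split one row of the smartctl attribute table into (id, name, raw_value)."""
    fields = line.split()
    # Columns: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    if len(fields) < 10:
        raise ValueError("not an attribute row: %r" % line)
    return int(fields[0]), fields[1], int(fields[9])


# The two rows shown in the listing above.
rows = [
    "197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0",
    "198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0",
]
raw = {name: value for _, name, value in map(parse_attribute_line, rows)}
print(raw)
```

<p>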
|
||||
</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="e2_example2"></a>ext2/ext3 second example</h3></div></div></div><p>
|
||||
</p></div><div class="sect2" title="ext2/ext3 second example"><div class="titlepage"><div><div><h3 class="title"><a name="e2_example2"></a>ext2/ext3 second example</h3></div></div></div><p>
|
||||
On this drive, the first sign of trouble was this email from smartd:
|
||||
</p><pre class="programlisting">
|
||||
To: ballen
|
||||
@ -248,7 +287,7 @@ On this drive, the first sign of trouble was this email from smartd:
|
||||
Device: /dev/hda, Self-Test Log error count increased from 0 to 1
|
||||
</pre><p>
|
||||
</p><p>
|
||||
Running <span><strong class="command">smartctl -a /dev/hda</strong></span> confirmed the problem:
|
||||
Running <span class="command"><strong>smartctl -a /dev/hda</strong></span> confirmed the problem:
|
||||
|
||||
</p><pre class="programlisting">
|
||||
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
|
||||
@ -354,6 +393,12 @@ Inode Pathname
|
||||
45192 /S1/R/H/714979488-714985279/H-R-714979984-16.gwf
|
||||
debugfs: quit
|
||||
</pre><p>
|
||||
Note that the first few steps of this procedure could also be done
|
||||
with a single command, which is very helpful if there are many bad
|
||||
blocks (thanks to Danie Marais for pointing this out):
|
||||
</p><pre class="programlisting">
|
||||
debugfs: icheck 3778301 3778302 3778303
|
||||
</pre><p>
|
||||
</p><p>
|
||||
And finally, just to confirm that this is really the damaged file:
|
||||
</p><p>
|
||||
@ -396,14 +441,14 @@ Num Test_Description Status Remaining LifeTime(hours) LBA
|
||||
# 1 Extended offline Completed without error 00% 692 -
|
||||
# 2 Extended offline Completed: read failure 80% 682 0x021d9f44
|
||||
</pre><p>
|
||||
</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="unassigned"></a>Unassigned sectors</h3></div></div></div><p>
|
||||
</p></div><div class="sect2" title="Unassigned sectors"><div class="titlepage"><div><div><h3 class="title"><a name="unassigned"></a>Unassigned sectors</h3></div></div></div><p>
|
||||
This section was written by Kay Diederichs. Even though this section
|
||||
assumes Linux and the ext2/ext3 file system, the strategy should be
|
||||
more generally applicable.
|
||||
</p><p>
|
||||
I read your badblocks-howto at and greatly
|
||||
benefited from it. One thing that's (maybe) missing is that often the
|
||||
<span><strong class="command">smartctl -t long</strong></span> scan finds a bad sector which is
|
||||
<span class="command"><strong>smartctl -t long</strong></span> scan finds a bad sector which is
|
||||
<span class="emphasis"><em> not</em></span> assigned to
|
||||
any file. In that case it does not help to run debugfs, or rather
|
||||
debugfs reports the fact that no file owns that sector. Furthermore,
|
||||
@ -418,12 +463,12 @@ huge file on that file system.
|
||||
</pre><p>
|
||||
creates the file. Leave it running until the partition/file system is
|
||||
full. This will make the disk reallocate those sectors which do not
|
||||
belong to a file. Check the <span><strong class="command">smartctl -a</strong></span> output after
|
||||
belong to a file. Check the <span class="command"><strong>smartctl -a</strong></span> output after
|
||||
that and make
|
||||
sure that the sectors are reallocated. If any remain, use the debugfs
|
||||
method. Of course the usual caveats apply - back it up first, and so
|
||||
on.
|
||||
</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="reiserfs_ex"></a>ReiserFS example</h3></div></div></div><p>
|
||||
</p></div><div class="sect2" title="ReiserFS example"><div class="titlepage"><div><div><h3 class="title"><a name="reiserfs_ex"></a>ReiserFS example</h3></div></div></div><p>
|
||||
This section was written by Joachim Jautz with additions from Manfred
|
||||
Schwarb.
|
||||
</p><p>
|
||||
@ -436,15 +481,15 @@ smartd[575]: Device: /dev/hda, 1 Offline uncorrectable sectors
|
||||
</pre><p>
|
||||
</p><p>
|
||||
[Step 0] The SMART selftest/error log
|
||||
(see <span><strong class="command">smartctl -l selftest</strong></span>) indicated there was a problem
|
||||
(see <span class="command"><strong>smartctl -l selftest</strong></span>) indicated there was a problem
|
||||
with block address (i.e. the 512 byte sector at) 58656333. The partition
|
||||
table (e.g. see <span><strong class="command">sfdisk -luS /dev/hda</strong></span> or
|
||||
<span><strong class="command">fdisk -ul /dev/hda</strong></span>) indicated that this block was in the
|
||||
table (e.g. see <span class="command"><strong>sfdisk -luS /dev/hda</strong></span> or
|
||||
<span class="command"><strong>fdisk -ul /dev/hda</strong></span>) indicated that this block was in the
|
||||
<code class="filename">/dev/hda3</code> partition which contained a ReiserFS file
|
||||
system. That partition started at block address 54781650.
|
||||
</p><p>
|
||||
While doing the initial analysis it may also be useful to take a copy
|
||||
of the disk attributes returned by <span><strong class="command">smartctl -A /dev/hda</strong></span>.
|
||||
of the disk attributes returned by <span class="command"><strong>smartctl -A /dev/hda</strong></span>.
|
||||
Specifically the values associated with the "Reallocated_Sector_Ct" and
|
||||
"Reallocated_Event_Count" attributes (for ATA disks, the grown list (GLIST)
|
||||
length for SCSI disks). If these are incremented at the end of the procedure
|
||||
@ -463,7 +508,7 @@ Blocksize: 4096
|
||||
</pre><p>
|
||||
It is reassuring that the calculated 4 KB damaged block address in
|
||||
<code class="filename">/dev/hda3</code> is less than "Count of blocks on the
|
||||
device" shown in the output of <span><strong class="command">debugreiserfs</strong></span> shown above.
|
||||
device" shown in the output of <span class="command"><strong>debugreiserfs</strong></span> shown above.
|
||||
</p><p>
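The calculated block address can be reproduced from the numbers in Step 0.
A minimal Python sketch of that arithmetic, assuming 512-byte sectors and the
4096-byte block size reported by debugreiserfs; the function name is
illustrative only:
</p>

```python
SECTOR_SIZE = 512  # bytes per LBA sector, as assumed throughout this HOWTO


def fs_block_for_lba(lba, partition_start, fs_block_size):
    """Return the file system block, counted from the partition start, holding the sector."""
    if lba < partition_start:
        raise ValueError("sector lies before the start of the partition")
    return (lba - partition_start) * SECTOR_SIZE // fs_block_size


# Values from Step 0: bad sector 58656333, /dev/hda3 starts at sector 54781650.
block = fs_block_for_lba(58656333, 54781650, 4096)
print(block)  # 484335
```

<p>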
|
||||
[Step 3] Try to get more info about this block => reading the block
|
||||
fails as expected but at least we see now that it seems to be unused.
|
||||
@ -487,7 +532,7 @@ advice then if you have just a few bad blocks, try writing to the
|
||||
bad blocks and see if the drive remaps the bad blocks (that means
|
||||
it takes a block it has in reserve and allocates it for use in place
|
||||
of that block number). If it cannot remap the block, use
|
||||
<span><strong class="command">badblock</strong></span> option (-B) with reiserfs utils to handle
|
||||
<span class="command"><strong>badblock</strong></span> option (-B) with reiserfs utils to handle
|
||||
this block correctly.
|
||||
</p><pre class="programlisting">
|
||||
bread: Cannot read the block (484335): (Input/output error).
|
||||
@ -497,24 +542,24 @@ Aborted
|
||||
So it looks like we have the right (i.e. faulty) block address.
|
||||
</p><p>
|
||||
[Step 4] Try then to find the affected file
|
||||
<sup>[<a name="id4711397" href="#ftn.id4711397">3</a>]</sup>:
|
||||
<sup>[<a name="id2550815" href="#ftn.id2550815" class="footnote">3</a>]</sup>:
|
||||
</p><pre class="programlisting">
|
||||
tar -cO /mydir >/dev/null
|
||||
tar -cO /mydir | cat >/dev/null
|
||||
</pre><p>
|
||||
If you do not find any unreadable files, then the block may be free or
|
||||
located in some metadata of the file system.
|
||||
</p><p>
|
||||
[Step 5] Try your luck: bang the affected block with
|
||||
<span><strong class="command">badblocks -n</strong></span> (non-destructive read-write mode, do unmount
|
||||
<span class="command"><strong>badblocks -n</strong></span> (non-destructive read-write mode, do unmount
|
||||
first). If you are very lucky, the failure is transient and you can provoke
|
||||
reallocation
|
||||
<sup>[<a name="id4711431" href="#ftn.id4711431">4</a>]</sup>:
|
||||
<sup>[<a name="id2550862" href="#ftn.id2550862" class="footnote">4</a>]</sup>:
|
||||
</p><pre class="programlisting">
|
||||
# badblocks -b 4096 -p 3 -s -v -n /dev/hda3 `expr 484335 + 100` `expr 484335 - 100`
|
||||
</pre><p>
|
||||
<sup>[<a name="id4711447" href="#ftn.id4711447">5</a>]</sup>
|
||||
<sup>[<a name="id2550876" href="#ftn.id2550876" class="footnote">5</a>]</sup>
|
||||
</p><p>
|
||||
check success with <span><strong class="command">debugreiserfs -1 484335 /dev/hda3</strong></span>.
|
||||
check success with <span class="command"><strong>debugreiserfs -1 484335 /dev/hda3</strong></span>.
|
||||
Otherwise:
|
||||
</p><p>
|
||||
[Step 6] Perform this step <span class="emphasis"><em>only</em></span> if Step 5 has failed
|
||||
@ -535,23 +580,23 @@ This could take a long time so you probably better go for lunch ...
|
||||
</p><p>
|
||||
[Step 8] Proceed as stated earlier. For example, sync disk and run a long
|
||||
selftest that should succeed now.
|
||||
</p></div></div><div class="sect1" lang="en"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sdisk"></a>Repairs at the disk level</h2></div></div></div><p>
|
||||
</p></div></div><div class="sect1" title="Repairs at the disk level"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="sdisk"></a>Repairs at the disk level</h2></div></div></div><p>
|
||||
This section first looks at a damaged partition table. Then it ignores
|
||||
the upper level impact of a bad block and just repairs the underlying
|
||||
sector so that the defective sector will not cause problems in the future.
|
||||
</p><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="partition"></a>Partition table problems</h3></div></div></div><p>
|
||||
</p><div class="sect2" title="Partition table problems"><div class="titlepage"><div><div><h3 class="title"><a name="partition"></a>Partition table problems</h3></div></div></div><p>
|
||||
Some software failures can lead to zeroes or random data being written
|
||||
on the first block of a disk. For disks that use a DOS-based partitioning
|
||||
scheme this will overwrite the partition table which is found at the
|
||||
end of the first block. This is a single point of failure so after the
|
||||
damage tools like <span><strong class="command">fdisk</strong></span> have no alternate data to use
|
||||
damage tools like <span class="command"><strong>fdisk</strong></span> have no alternate data to use
|
||||
so they report no partitions or a damaged partition table.
|
||||
</p><p>
|
||||
One utility that may help is
|
||||
<a href="http://www.cgsecurity.org/wiki/TestDisk" target="_top">
|
||||
<a class="ulink" href="http://www.cgsecurity.org/wiki/TestDisk" target="_top">
|
||||
<code class="literal">testdisk</code></a> which can scan a disk looking for
|
||||
partitions and recreate a partition table if requested.
|
||||
<sup>[<a name="id4711568" href="#ftn.id4711568">6</a>]</sup>
|
||||
<sup>[<a name="id2550980" href="#ftn.id2550980" class="footnote">6</a>]</sup>
|
||||
</p><p>
|
||||
Programs that create DOS partitions
|
||||
often place the first partition at logical block address 63. In Linux
|
||||
@ -574,7 +619,7 @@ a disk. The extended DOS partition table is placed elsewhere on
|
||||
a disk. Again there is only one copy of it so it represents another
|
||||
single point of failure. All DOS partition information can be
|
||||
read in a form that can be used to recreate the tables with the
|
||||
<span><strong class="command">sfdisk</strong></span> command. Obviously this needs to be done
|
||||
<span class="command"><strong>sfdisk</strong></span> command. Obviously this needs to be done
|
||||
beforehand and the file put on other media. Here is how to fetch the
|
||||
partition table information:
|
||||
</p><pre class="programlisting">
|
||||
@ -594,15 +639,15 @@ block(s) holding the partition table(s) and puts it in
|
||||
changes the partition tables as indicated by
|
||||
<code class="filename">my_disk_partition_info.txt</code>. For what it is worth the
|
||||
author did test this on his system!
|
||||
<sup>[<a name="id4711687" href="#ftn.id4711687">7</a>]</sup>
|
||||
<sup>[<a name="id2551099" href="#ftn.id2551099" class="footnote">7</a>]</sup>
|
||||
</p><p>
|
||||
For creating, destroying, resizing, checking and copying partitions, and
|
||||
the file systems on them, GNU's
|
||||
<a href="http://www.gnu.org/software/parted" target="_top">
|
||||
<a class="ulink" href="http://www.gnu.org/software/parted" target="_top">
|
||||
<code class="literal">parted</code></a> is worth examining.
|
||||
The <a href="http://www.tldp.org/HOWTO/Large-Disk-HOWTO.html" target="_top">
|
||||
The <a class="ulink" href="http://www.tldp.org/HOWTO/Large-Disk-HOWTO.html" target="_top">
|
||||
<code class="literal">Large Disk HOWTO</code></a> is also a useful resource.
|
||||
</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="lvm"></a>LVM repairs</h3></div></div></div><p>
|
||||
</p></div><div class="sect2" title="LVM repairs"><div class="titlepage"><div><div><h3 class="title"><a name="lvm"></a>LVM repairs</h3></div></div></div><p>
|
||||
This section was written by Frederic BOITEUX. It was titled: "HOW TO
|
||||
LOCATE AND REPAIR BAD BLOCKS ON AN LVM VOLUME".
|
||||
</p><p>
|
||||
@ -750,10 +795,10 @@ renounce
|
||||
</p><p>
|
||||
Search / correction follows the same scheme as for simple
|
||||
partitions :
|
||||
</p><div class="itemizedlist"><ul type="disc"><li><p>
|
||||
</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>
|
||||
find possible impacted files with debugfs (icheck <fs block nb>,
|
||||
then ncheck <icheck nb>).
|
||||
</p></li><li><p>
|
||||
</p></li><li class="listitem"><p>
|
||||
reallocate the bad block by writing zeros to it, *using the fs block size* :
|
||||
</p></li></ul></div><p>
|
||||
</p><p>
|
||||
@ -762,7 +807,7 @@ dd if=/dev/zero of=/dev/WDC80Go/ext1 count=1 bs=4096 seek=1378581
|
||||
</pre><p>
|
||||
</p><p>
|
||||
Et voilà !
|
||||
</p></div><div class="sect2" lang="en"><div class="titlepage"><div><div><h3 class="title"><a name="bb"></a>Bad block reassignment</h3></div></div></div><p>
|
||||
</p></div><div class="sect2" title="Bad block reassignment"><div class="titlepage"><div><div><h3 class="title"><a name="bb"></a>Bad block reassignment</h3></div></div></div><p>
|
||||
The SCSI disk command set and associated disk architecture are assumed
|
||||
in this section. SCSI disks have their own logical to physical mapping
|
||||
allowing a damaged sector (usually carrying 512 bytes of data) to be
|
||||
@ -789,11 +834,11 @@ sectors are a scarce resource.
|
||||
</p><p>
|
||||
Once a SCSI disk format has completed successfully, other problems
|
||||
may appear over time. These fall into two categories:
|
||||
</p><div class="itemizedlist"><ul type="disc"><li><p>
|
||||
</p><div class="itemizedlist"><ul class="itemizedlist" type="disc"><li class="listitem"><p>
|
||||
recoverable: the Error Correction Codes (ECC) detect a problem
|
||||
but it is small enough to be corrected. Optionally other strategies
|
||||
such as retrying the access may retrieve the data.
|
||||
</p></li><li><p>
|
||||
</p></li><li class="listitem"><p>
|
||||
unrecoverable: try as it may, the disk logic and ECC algorithms
|
||||
cannot recover the data. This is often reported as a
|
||||
<span class="emphasis"><em>medium error</em></span>.
|
||||
@ -808,12 +853,12 @@ the incoming data detects a CRC error due to a bad cable or termination.
|
||||
Depending on the disk vendor, recoverable errors can be ignored. After all,
|
||||
some disks have up to 68 bytes of ECC above the payload size of 512 bytes
|
||||
so why use up spare sectors which are limited in number
|
||||
<sup>[<a name="id4712485" href="#ftn.id4712485">8</a>]</sup>
|
||||
<sup>[<a name="id2551516" href="#ftn.id2551516" class="footnote">8</a>]</sup>
|
||||
?
|
||||
If the disk can recover the data and does decide to re-allocate (reassign)
|
||||
a sector, then first it checks the settings of the ARRE and AWRE bits in the
|
||||
read-write error recovery mode page. Usually these bits are set
|
||||
<sup>[<a name="id4712514" href="#ftn.id4712514">9</a>]</sup>
|
||||
<sup>[<a name="id2551535" href="#ftn.id2551535" class="footnote">9</a>]</sup>
|
||||
enabling automatic (read or write) re-allocation. The automatic
|
||||
re-allocation may also fail if the zone (or disk) has run out of spare
|
||||
sectors.
|
||||
@ -825,7 +870,7 @@ disk to spend too long trying to recover an error.
|
||||
Unrecoverable errors will cause a <span class="emphasis"><em>medium error</em></span> sense
|
||||
key, perhaps with some useful additional sense information. If the extended
|
||||
background self test includes a full disk read scan, one would expect the
|
||||
self test log to list the bad block, as shown in <a href="#rfile" title="Repairs in a file system">the section called “Repairs in a file system”</a>.
|
||||
self test log to list the bad block, as shown in <a class="xref" href="#rfile" title="Repairs in a file system">the section called “Repairs in a file system”</a>.
|
||||
Recent SCSI disks with a periodic background scan should also list
|
||||
unrecoverable read errors (and some recoverable errors as well). The
|
||||
advantage of the background scan is that it runs to completion while self
|
||||
@ -839,7 +884,7 @@ command will reassign one or more blocks, attempting to (partially ?) recover
|
||||
the data (a forlorn hope at this stage), fetch an unused spare sector from the
|
||||
current zone while adding the damaged old sector to the GLIST (hence the
|
||||
name "grown" list). The contents of the GLIST may not be that interesting
|
||||
but <span><strong class="command">smartctl</strong></span> prints out the number of entries in the grown
|
||||
but <span class="command"><strong>smartctl</strong></span> prints out the number of entries in the grown
|
||||
list and if that number grows quickly, the disk may be approaching the end
|
||||
of its useful life.
|
||||
</p><p>
|
||||
@ -847,26 +892,26 @@ Here is an alternate brute force technique to consider: if the data on the
|
||||
SCSI or ATA disk has all been backed up (e.g. is held on the other disks in
|
||||
a RAID 5 enclosure), then simply reformatting the disk may be the least
|
||||
cumbersome approach.
|
||||
</p><div class="sect3" lang="en"><div class="titlepage"><div><div><h4 class="title"><a name="sexample"></a>Example</h4></div></div></div><p>
|
||||
</p><div class="sect3" title="Example"><div class="titlepage"><div><div><h4 class="title"><a name="sexample"></a>Example</h4></div></div></div><p>
|
||||
Given a "bad block", it still may be useful to look at the
|
||||
<span><strong class="command">fdisk</strong></span> command (if the disk has multiple partitions)
|
||||
<span class="command"><strong>fdisk</strong></span> command (if the disk has multiple partitions)
|
||||
to find out which partition is involved, then use
|
||||
<span><strong class="command">debugfs</strong></span> (or a similar tool for the file system in
|
||||
<span class="command"><strong>debugfs</strong></span> (or a similar tool for the file system in
|
||||
question) to find out which, if any, file or other part of the file system
|
||||
may have been damaged. This is discussed in <a href="#rfile" title="Repairs in a file system">the section called “Repairs in a file system”</a>.
|
||||
may have been damaged. This is discussed in <a class="xref" href="#rfile" title="Repairs in a file system">the section called “Repairs in a file system”</a>.
|
||||
</p><p>
|
||||
Then a program that can execute the REASSIGN BLOCKS SCSI command is
|
||||
required. In Linux (2.4 and 2.6 series), FreeBSD, Tru64 (OSF) and Windows,
|
||||
the author's <span><strong class="command">sg_reassign</strong></span> utility in the sg3_utils
|
||||
the author's <span class="command"><strong>sg_reassign</strong></span> utility in the sg3_utils
|
||||
package can be used. Also found in that package is
|
||||
<span><strong class="command">sg_verify</strong></span> which can be used to check that a block is
|
||||
<span class="command"><strong>sg_verify</strong></span> which can be used to check that a block is
|
||||
readable.
|
||||
</p><p>
|
||||
Assume that logical block address 1193046 (which is 123456 in hex) is
|
||||
corrupt
|
||||
<sup>[<a name="id4712652" href="#ftn.id4712652">10</a>]</sup>
|
||||
<sup>[<a name="id2551756" href="#ftn.id2551756" class="footnote">10</a>]</sup>
|
||||
on the disk at <code class="filename">/dev/sdb</code>. A long selftest command like
|
||||
<span><strong class="command">smartctl -t long /dev/sdb</strong></span> may result in log results
|
||||
<span class="command"><strong>smartctl -t long /dev/sdb</strong></span> may result in log results
|
||||
like this:
|
||||
</p><pre class="programlisting">
|
||||
# smartctl -l selftest /dev/sdb
|
||||
@ -882,7 +927,7 @@ Num Test Status segment LifeTime LBA_first_err [SK AS
|
||||
# 3 Background short Completed - 194 - [- - -]
|
||||
</pre><p>
|
||||
</p><p>
|
||||
The <span><strong class="command">sg_verify</strong></span> utility can be used to confirm that there
|
||||
The <span class="command"><strong>sg_verify</strong></span> utility can be used to confirm that there
|
||||
is a problem at that address:
|
||||
</p><pre class="programlisting">
|
||||
# sg_verify --lba=1193046 /dev/sdb
|
||||
@ -911,17 +956,17 @@ length:
|
||||
</p><p>
|
||||
The GLIST length has grown by one as expected. If the disk was unable to
|
||||
recover any data, then the "new" block at LBA 0x123456 has vendor-specific
|
||||
data in it. The <span><strong class="command">sg_reassign</strong></span> utility can also do bulk
|
||||
reassigns; see <span><strong class="command">man sg_reassign</strong></span> for more information.
|
||||
data in it. The <span class="command"><strong>sg_reassign</strong></span> utility can also do bulk
|
||||
reassigns; see <span class="command"><strong>man sg_reassign</strong></span> for more information.
|
||||
</p><p>
|
||||
The <span><strong class="command">dd</strong></span> command could be used to read the contents of
|
||||
The <span class="command"><strong>dd</strong></span> command could be used to read the contents of
|
||||
the "new" block:
|
||||
</p><pre class="programlisting">
|
||||
# dd if=/dev/sdb iflag=direct skip=1193046 of=blk.img bs=512 count=1
|
||||
</pre><p>
|
||||
</p><p>
|
||||
and a hex editor
|
||||
<sup>[<a name="id4712776" href="#ftn.id4712776">11</a>]</sup>
|
||||
<sup>[<a name="id2551874" href="#ftn.id2551874" class="footnote">11</a>]</sup>
|
||||
used to view and potentially change the
|
||||
<code class="filename">blk.img</code> file. An altered <code class="filename">blk.img</code>
|
||||
file (or <code class="filename">/dev/zero</code>) could be written back with:
|
||||
@ -936,36 +981,38 @@ a superblock or a directory.
|
||||
Even if a full backup of the disk is available, or the disk has been
|
||||
"ejected" from a RAID, it may still be worthwhile to reassign the bad
|
||||
block(s) that caused the problem (or simply format the disk (see
|
||||
<span><strong class="command">sg_format</strong></span> in the sg3_utils package)) and re-use the
|
||||
<span class="command"><strong>sg_format</strong></span> in the sg3_utils package)) and re-use the
|
||||
disk later (not unlike the way a replacement disk from a manufacturer
|
||||
might be used).
|
||||
</p><p>
|
||||
CVS $Id: badblockhowto.xml,v 1.4 2007/01/31 13:56:32 dpgilbert Exp $
|
||||
</p></div></div></div><div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a name="ftn.id4710480" href="#id4710480">1</a>] </sup>
|
||||
$Id: badblockhowto.xml 2873 2009-08-11 21:46:20Z dipohl $
|
||||
</p></div></div></div><div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a name="ftn.id2506421" href="#id2506421" class="para">1</a>] </sup>
|
||||
Self-Monitoring, Analysis and Reporting Technology -> SMART
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id4710574" href="#id4710574">2</a>] </sup>
|
||||
Starting with GNU coreutils release 5.3.0, the <span><strong class="command">dd</strong></span>
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id2506498" href="#id2506498" class="para">2</a>] </sup>
|
||||
Starting with GNU coreutils release 5.3.0, the <span class="command"><strong>dd</strong></span>
|
||||
command in Linux includes the options 'iflag=direct' and 'oflag=direct'.
|
||||
Using these with the <span><strong class="command">dd</strong></span> commands should be helpful,
|
||||
Using these with the <span class="command"><strong>dd</strong></span> commands should be helpful,
|
||||
because adding these flags should avoid any interaction
|
||||
with the block buffering IO layer in Linux and permit direct reads/writes
|
||||
from the raw device. Use <span><strong class="command">dd --help</strong></span> to see if your
|
||||
from the raw device. Use <span class="command"><strong>dd --help</strong></span> to see if your
|
||||
version of dd supports these options. If not, the latest code for dd
|
||||
can be found at <a href="http://alpha.gnu.org/gnu/coreutils" target="_top">
|
||||
can be found at <a class="ulink" href="http://alpha.gnu.org/gnu/coreutils" target="_top">
|
||||
<code class="literal">alpha.gnu.org/gnu/coreutils</code></a>.
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id4711397" href="#id4711397">3</a>] </sup>
|
||||
Do not use <span><strong class="command">tar cf /dev/null</strong></span>, see
|
||||
<span><strong class="command">info tar</strong></span>.
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id4711431" href="#id4711431">4</a>] </sup>
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id2550815" href="#id2550815" class="para">3</a>] </sup>
|
||||
Do not use <span class="command"><strong>tar -c -f /dev/null</strong></span> or
|
||||
<span class="command"><strong>tar -cO /mydir >/dev/null</strong></span>. GNU tar does not
|
||||
actually read the files if <code class="filename">/dev/null</code> is used as
|
||||
archive path or as standard output; see <span class="command"><strong>info tar</strong></span>.
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id2550862" href="#id2550862" class="para">4</a>] </sup>
|
||||
Important: the chosen block range is arbitrary, but do not test only a single
|
||||
block, as bad blocks are often social (they tend to appear in groups). Do not
|
||||
make the range too large either, as this test is probably not risk-free.
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id4711447" href="#id4711447">5</a>] </sup>
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id2550876" href="#id2550876" class="para">5</a>] </sup>
|
||||
The rather awkward `expr 484335 + 100` (note the back quotes) can be replaced
|
||||
with $((484335+100)) if the bash shell is being used. Similarly the last
|
||||
argument can become $((484335-100)) .
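</p><p>
As a minimal sketch of the two forms side by side: the variable names below
are illustrative only, and the $((...)) form assumes a shell with POSIX-style
arithmetic expansion, such as bash. Both forms should yield the same block
number.
</p><pre class="programlisting">
#!/bin/sh
# Block number taken from the text; the +/-100 window is the one used there.
block=484335

# Older form: expr inside back quotes.
upper_expr=`expr $block + 100`

# Arithmetic expansion form.
upper=$((block + 100))
lower=$((block - 100))

echo "$lower $upper $upper_expr"
</pre><p>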
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id4711568" href="#id4711568">6</a>] </sup>
|
||||
<span><strong class="command">testdisk</strong></span> scans the media for the beginning of file
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id2550980" href="#id2550980" class="para">6</a>] </sup>
|
||||
<span class="command"><strong>testdisk</strong></span> scans the media for the beginning of file
|
||||
systems that it recognizes. It can be tricked by data that looks
|
||||
like the beginning of a file system or an old file system from a
|
||||
previous partitioning of the media (disk). So care should be taken.
|
||||
@ -974,24 +1021,24 @@ extended partitions lie wholly within a extended partition table
|
||||
allocation. Also if the root partition of a Linux/Unix installation
|
||||
can be found then the <code class="filename">/etc/fstab</code> file is a useful
|
||||
resource for finding the partition numbers of other partitions.
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id4711687" href="#id4711687">7</a>] </sup>
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id2551099" href="#id2551099" class="para">7</a>] </sup>
|
||||
Thanks to Manfred Schwarb for the information about storing partition
|
||||
table(s) beforehand.
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id4712485" href="#id4712485">8</a>] </sup>
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id2551516" href="#id2551516" class="para">8</a>] </sup>
|
||||
Detecting and fixing an error with ECC "on the fly" and not going the further
|
||||
step and reassigning the block in question may explain why some disks have
|
||||
large numbers in their read error counter log. Various worried users have
|
||||
reported large numbers in the "errors corrected without substantial delay"
|
||||
counter field which is in the "Errors corrected by ECC fast" column in
|
||||
the <span><strong class="command">smartctl -l error</strong></span> output.
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id4712514" href="#id4712514">9</a>] </sup>
|
||||
the <span class="command"><strong>smartctl -l error</strong></span> output.
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id2551535" href="#id2551535" class="para">9</a>] </sup>
|
||||
Often disks inside a hardware RAID have the ARRE and AWRE bits
|
||||
cleared (disabled) so the RAID controller can do things manually or flag
|
||||
the disk for replacement.
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id4712652" href="#id4712652">10</a>] </sup>
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id2551756" href="#id2551756" class="para">10</a>] </sup>
|
||||
In this case the corruption was manufactured by using the WRITE LONG
|
||||
SCSI command. See <span><strong class="command">sg_write_long</strong></span> in sg3_utils.
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id4712776" href="#id4712776">11</a>] </sup>
|
||||
SCSI command. See <span class="command"><strong>sg_write_long</strong></span> in sg3_utils.
|
||||
</p></div><div class="footnote"><p><sup>[<a name="ftn.id2551874" href="#id2551874" class="para">11</a>] </sup>
|
||||
Most window managers have a handy calculator that will do hex to
|
||||
decimal conversions. More work may be needed at the file system level,
|
||||
</p></div></div></div></body></html>