[H-GEN] Disk array about to croak?

Benjamin Fowler ben.fowler.bjf at gmail.com
Wed Sep 17 06:16:00 EDT 2014


Hello all,

I have a little HP Mediasmart server which I've redone with Debian. It runs
a 4-drive SATA disk array, which runs ext4 over LVM over MD/softraid (raid
5). It's a neat little machine, which has been going quite nicely for
hosting all my media and network backups.

Until now, that is. I've been noticing kernel errors in my daily logwatch
emails (sample output is pasted at the end of this message).

So what I _think_ is happening is that the first disk in the array (sda) is
getting read errors. It hasn't been failed out of the array yet. Would I be
right in saying that the first disk is about to give up the ghost?

(Guess it's time to start thinking about moving the root and boot disks off
the array -- this little server only has 4 disk controllers, and all of
them are for the disk array. If I lose the first drive, the (headless!!)
machine is basically toast until I can rebuild a network installer and TFTP
boot into a recovery disk image with a network console :-/...)


WARNING:  Kernel Errors Present
             res 41/40:00:58:94:43/00:00:1a:00:00/40 Emask 0x409 (media error) <F> ...:  6 Time(s)
    ata1.00: error: { UNC } ...:  6 Time(s)
    end_request: I/O error, dev sda, sector ...:  1 Time(s)
    md/raid:md1: read error corrected (8 sec ...:  1 Time(s)
    sd 0:0:0:0: [sda]  Add. Sense: Unrecovered read error - auto reallocat ...:  1 Time(s)
    sd 0:0:0:0: [sda]  Sense Key : Medium Error [current] [descr ...:  1 Time(s)

 1 Time(s):         1a 43 94 58
 1 Time(s):         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
 1 Time(s): Descriptor sense data with sense descriptors (in hex):
 2 Time(s): ata1.00: cmd 60/08:00:58:94:43/00:00:1a:00:00/40 tag 0 ncq 4096 in
 3 Time(s): ata1.00: cmd 60/08:08:58:94:43/00:00:1a:00:00/40 tag 1 ncq 4096 in
 1 Time(s): ata1.00: cmd 60/08:28:58:94:43/00:00:1a:00:00/40 tag 5 ncq 4096 in
 6 Time(s): ata1.00: configured for UDMA/133
 5 Time(s): ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x0
 1 Time(s): ata1.00: exception Emask 0x0 SAct 0x60 SErr 0x0 action 0x0
 6 Time(s): ata1.00: failed command: READ FPDMA QUEUED
 6 Time(s): ata1.00: irq_stat 0x40000008
 6 Time(s): ata1.00: status: { DRDY ERR }
 6 Time(s): ata1: EH complete
 1 Time(s): raid5_end_read_request: 43 callbacks suppressed
 1 Time(s): sd 0:0:0:0: [sda]  Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
 1 Time(s): sd 0:0:0:0: [sda] CDB: Read(10): 28 00 1a 43 94 58 00 00 08 00
 1 Time(s): sd 0:0:0:0: [sda] Unhandled sense code
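
For what it's worth, every failed command above seems to point at the same spot on the disk. Here is a rough Python sketch that decodes the Read(10) CDB from the log into a starting sector and a length. The function name decode_read10 is made up for illustration, and the 512-byte logical sector size is an assumption, although it is consistent with the "ncq 4096 in" on the ata1.00 cmd lines.

```python
def decode_read10(cdb_hex):
    """Return (starting LBA, sector count) from a SCSI READ(10) CDB.

    cdb_hex is the CDB as space-separated hex bytes, as printed by the kernel.
    """
    cdb = bytes.fromhex(cdb_hex)
    # READ(10) is a 10-byte command whose opcode is 0x28.
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2-5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, count


SECTOR_SIZE = 512  # assumption: 512-byte logical sectors

lba, count = decode_read10("28 00 1a 43 94 58 00 00 08 00")
print(lba, count, count * SECTOR_SIZE)  # 440636504 8 4096
```

The decoded LBA, 0x1a439458, matches the "1a 43 94 58" bytes in the sense data, so the log shows one 8-sector read at a single location failing repeatedly rather than errors scattered across the disk.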


Cheers, Ben.