[H-GEN] Disk array about to croak?
Snowy Angelique Maslov
snowy at snowy.org
Wed Sep 17 07:22:46 EDT 2014
On 17/09/2014 8:16 PM, Benjamin Fowler wrote:
> Hello all,
>
> I have a little HP Mediasmart server which I've redone with Debian. It
> runs a 4-drive SATA disk array, which runs ext4 over LVM over
> MD/softraid (raid 5). It's a neat little machine, which has been going
> quite nicely for hosting all my media and network backups.
>
> Until now, that is. I've been noticing the following sort of output in
> my daily logwatch emails:
>
> So what I _think_ is happening is that the first disk in the array is
> getting read errors. It hasn't failed out yet. Would I be right in
> saying that the first disk is about to give up the ghost?
>
> (Guess it's time to start thinking about moving the root and boot
> disks off the array -- this little server only has 4 disk controllers,
> and all of them are for the disk array. If I lose the first drive, the
> (headless!!) machine is basically toast until I can rebuild a network
> installer and TFTP boot into a recovery disk image with a network
> console :-/...)
>
>
> WARNING: Kernel Errors Present
> res 41/40:00:58:94:43/00:00:1a:00:00/40 Emask 0x409 (media error) <F> ...: 6 Time(s)
> ata1.00: error: { UNC } ...: 6 Time(s)
> end_request: I/O error, dev sda, sector ...: 1 Time(s)
> md/raid:md1: read error corrected (8 sec ...: 1 Time(s)
> sd 0:0:0:0: [sda] Add. Sense: Unrecovered read error - auto reallocat ...: 1 Time(s)
> sd 0:0:0:0: [sda] Sense Key : Medium Error [current] [descr ...: 1 Time(s)
>
> 1 Time(s): 1a 43 94 58
> 1 Time(s): 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> 1 Time(s): Descriptor sense data with sense descriptors (in hex):
> 2 Time(s): ata1.00: cmd 60/08:00:58:94:43/00:00:1a:00:00/40 tag 0 ncq 4096 in
> 3 Time(s): ata1.00: cmd 60/08:08:58:94:43/00:00:1a:00:00/40 tag 1 ncq 4096 in
> 1 Time(s): ata1.00: cmd 60/08:28:58:94:43/00:00:1a:00:00/40 tag 5 ncq 4096 in
> 6 Time(s): ata1.00: configured for UDMA/133
> 5 Time(s): ata1.00: exception Emask 0x0 SAct 0x3 SErr 0x0 action 0x0
> 1 Time(s): ata1.00: exception Emask 0x0 SAct 0x60 SErr 0x0 action 0x0
> 6 Time(s): ata1.00: failed command: READ FPDMA QUEUED
> 6 Time(s): ata1.00: irq_stat 0x40000008
> 6 Time(s): ata1.00: status: { DRDY ERR }
> 6 Time(s): ata1: EH complete
> 1 Time(s): raid5_end_read_request: 43 callbacks suppressed
> 1 Time(s): sd 0:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> 1 Time(s): sd 0:0:0:0: [sda] CDB: Read(10): 28 00 1a 43 94 58 00 00 08 00
> 1 Time(s): sd 0:0:0:0: [sda] Unhandled sense code
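For what it's worth, the failing sector can be read straight out of the
Read(10) CDB line in that log. Here is a rough sketch of the decoding,
assuming a POSIX-ish shell; the function name read10_decode is made up for
illustration. In a Read(10) CDB, byte 0 is the opcode (28), bytes 2-5 are
the LBA and bytes 7-8 are the transfer length in sectors.

```shell
# Decode the LBA and sector count from a SCSI Read(10) CDB given as hex bytes.
# Hypothetical helper: byte 0 = opcode, bytes 2-5 = LBA, bytes 7-8 = length.
read10_decode() {
    set -- $1    # split the CDB string into one positional parameter per byte
    [ "$1" = "28" ] || { echo "not a Read(10) CDB" >&2; return 1; }
    echo "lba=$((0x$3$4$5$6)) sectors=$((0x$8$9))"
}

# CDB copied from the logwatch output above
read10_decode "28 00 1a 43 94 58 00 00 08 00"
```

The 8 sectors agree with the "ncq 4096 in" on the failed commands (8 x 512
bytes), and the LBA bytes are the same "1a 43 94 58" that show up in the
sense data.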
Certainly not a healthy disk, Ben. I'd run a smartctl self-test on it to
be sure, but I would bet money that the drive is on its way out. To run
a quick test:
# smartctl --test=short /dev/sda
That should take a minute or two (smartctl prints the estimated
completion time). Once it has finished, run:
# smartctl -a /dev/sda
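That displays the full SMART report for the drive, including the
self-test log and attributes such as Reallocated_Sector_Ct and
Current_Pending_Sector.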
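If the full report is too noisy, something like the following pulls out
just the attributes that usually track bad sectors. It is only a sketch:
the function name smart_sector_counts is made up, and it assumes the usual
ten-column attribute table printed by "smartctl -A" (attribute name in
column 2, raw value in column 10). Not every drive reports all three
attributes.

```shell
# Print the raw values of the SMART attributes that usually track bad sectors.
# Reads `smartctl -A` output on stdin, so it also works on a saved report.
smart_sector_counts() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable)$/ {
        print $2 "=" $10
    }'
}

# Usage (as root), assuming the suspect disk is /dev/sda as in the log:
#   smartctl -A /dev/sda | smart_sector_counts
```

Non-zero raw values here, especially ones that keep climbing between runs,
would back up the read errors in the kernel log.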
--
Snowy Angelique Maslov <snowy at snowy.org>