[H-GEN] DMA query

Jason Parker-Burlingham jasonp at panix.com
Thu May 6 00:12:24 EDT 2004


I'm trying to come up with an explanation for strange behavior
observed on a SAMBA-based fileserver.  The computer in question
doesn't really have a shortage of problems, but lately it has
intermittently stopped answering requests from the network: users
can't load their profiles, so they call us, usually after unplugging
anything that looks handy and rebooting like mad.

I managed to get there quickly enough today to discover that although
the relevant processes were running, the host didn't appear to be on
the network (as I recall, it would answer its own broadcast pings, but
no other host answered them, and other hosts couldn't ping it either).

So I decided to start working the problem from the physical layer on
up, but as soon as I unplugged and reinserted the host's network cable
on the switch, everything started to work again (I had a continuous
ping running on a machine across the room and noticed almost
immediately).  So I think I have a badly behaved network card.

But I suspect the card may not be the true cause:

I'd been noticing DMA timeouts and IDE resets, too.  The night before
the events described above, I removed a suspect disk, hoping that'd
put an end to them, started a backup to tape, and left.  Here's what
happened, starting with when the kernel detected my using the tape
unit, and ending with the last messages from the kernel before I
arrived back at the client's a few hours later:

   hdd: attached ide-tape driver.
   ide-tape: hdd <-> ht0: Seagate STT20000A rev 8A51
   ide-tape: hdd <-> ht0: 1000KBps, 6*54kB buffer, 9720kB pipeline, 110ms tDSC, DMA
   hdd: error waiting for DMA
   hdd: dma timeout retry: status=0xd0 { Busy }
   hdd: DMA disabled
   hdd: ATAPI reset complete
   ide-tape: ht0: I/O error, pc =  a, key =  2, asc =  4, ascq =  1
   [lots of ht0 errors just like above]
   ide-tape: Couldn't write a filemark
   ide-tape: ht0: I/O error, pc = 10, key =  2, asc =  4, ascq =  1
   ide-tape: ht0: I/O error, pc = 10, key =  2, asc =  4, ascq =  1
   hdc: attached ide-cdrom driver.
   hdc: ATAPI CD-ROM drive, 128kB Cache, UDMA(33)
   Uniform CD-ROM driver Revision: 3.12
   cdrom: This disc doesn't have any tracks I recognize!
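
Assuming ide-tape prints the packet-command opcode (pc) and the ATAPI sense key, ASC and ASCQ in hexadecimal, the tape errors above can be decoded with the standard SCSI/ATAPI tables: opcode 0x0A is WRITE(6), 0x10 is WRITE FILEMARKS, sense key 2 is NOT READY, and ASC/ASCQ 04h/01h is "logical unit is in process of becoming ready". The following is a minimal Python sketch of that decoding; the `decode` helper is hypothetical, and its lookup tables cover only the values seen in this log.

```python
import re

# Lookup tables covering only the values that appear in this log.
OPCODES = {0x0A: "WRITE(6)", 0x10: "WRITE FILEMARKS"}
SENSE_KEYS = {0x2: "NOT READY"}
ASC_ASCQ = {(0x04, 0x01): "LOGICAL UNIT IS IN PROCESS OF BECOMING READY"}

# Assumption: each field is printed in hex, padded with spaces to width 2.
LINE_RE = re.compile(
    r"I/O error, pc =\s*([0-9a-fA-F]+), key =\s*([0-9a-fA-F]+), "
    r"asc =\s*([0-9a-fA-F]+), ascq =\s*([0-9a-fA-F]+)"
)


def decode(line):
    """Return a readable description of an ide-tape I/O error line, or None."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    pc, key, asc, ascq = (int(g, 16) for g in m.groups())
    return "{} failed: {} ({})".format(
        OPCODES.get(pc, "opcode 0x%02x" % pc),
        SENSE_KEYS.get(key, "sense key 0x%x" % key),
        ASC_ASCQ.get((asc, ascq), "asc/ascq 0x%02x/0x%02x" % (asc, ascq)),
    )


if __name__ == "__main__":
    for line in [
        "ide-tape: ht0: I/O error, pc =  a, key =  2, asc =  4, ascq =  1",
        "ide-tape: ht0: I/O error, pc = 10, key =  2, asc =  4, ascq =  1",
    ]:
        print(decode(line))
```

If that reading is right, the drive answered "not ready, becoming ready" to every write and filemark request after the ATAPI reset, which would be consistent with the tape unit never fully recovering from the reset.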

Now, what's curious is that I had disabled and re-enabled networking
(basically, re-running ifconfig) before deciding to work more
methodically, but the kernel's messages about eth0 make me
suspicious:

   # dmesg|grep eth0:
   eth0: RealTek RTL8139 at 0xc8868f00, 00:50:bf:3a:3c:9e, IRQ 9
   eth0:  Identified 8139 chip type 'RTL-8139B'
   eth0: link up, 100Mbps, full-duplex, lpa 0x45E1

[this is where I started a packet sniffer after things broke]
   eth0: Promiscuous mode enabled.
   eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
   eth0: link down
   eth0: link up, 100Mbps, full-duplex, lpa 0x45E1
   eth0: link down
   eth0: link up, 100Mbps, full-duplex, lpa 0x45E1

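Notice the unpaired "link up" just after promiscuous mode was enabled:
the driver reports the link coming up again without ever having
reported it going down.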
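
The same check can be done mechanically on longer logs. This is a minimal Python sketch, assuming the input is complete dmesg lines in the order the kernel printed them; the `unpaired_link_ups` helper is hypothetical. It tracks whether the link is currently believed up and flags any "link up" that arrives while it already is.

```python
# Kernel messages for eth0, in the order dmesg printed them above.
LOG = [
    "eth0: RealTek RTL8139 at 0xc8868f00, 00:50:bf:3a:3c:9e, IRQ 9",
    "eth0:  Identified 8139 chip type 'RTL-8139B'",
    "eth0: link up, 100Mbps, full-duplex, lpa 0x45E1",
    "eth0: Promiscuous mode enabled.",
    "eth0: link up, 100Mbps, full-duplex, lpa 0x45E1",
    "eth0: link down",
    "eth0: link up, 100Mbps, full-duplex, lpa 0x45E1",
    "eth0: link down",
    "eth0: link up, 100Mbps, full-duplex, lpa 0x45E1",
]


def unpaired_link_ups(lines, iface="eth0"):
    """Return indices of 'link up' lines with no 'link down' since the last 'link up'."""
    flagged = []
    link_is_up = False  # the first 'link up' after boot is treated as normal
    for i, line in enumerate(lines):
        if f"{iface}: link up" in line:
            if link_is_up:
                flagged.append(i)
            link_is_up = True
        elif f"{iface}: link down" in line:
            link_is_up = False
    return flagged


if __name__ == "__main__":
    for i in unpaired_link_ups(LOG):
        print(i, LOG[i])
```

On the messages above it flags only the "link up" printed right after promiscuous mode was enabled.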

My best theory is that the IDE bus reset tickles the network card into
an unresponsive state, and reinserting the network cable (or
rebooting) is sufficient to fix it.  The only problem with this theory
is that I don't know enough to rule it in or out.

This may all be moot since the client is getting badly-needed
replacement hardware.  But I still want to know if I'm just making
shit up, or if I could conceivably be on the right track.

(Oh, and backups to tape when you've disabled DMA on all devices are
*awful*.)

jason
-- 
http://panix.com/~jasonp?BabyPictures