Re: random disk-related hard lockups

From: Alexander Yurchenko (grange@rt.mipt.ru)
Date: Tue Jul 08 2003 - 04:10:46 CDT


Sorry, but all our mind readers are on vacation. Please provide a dmesg.

On Mon, Jul 07, 2003 at 11:46:54PM -0500, Matt Garman wrote:
> Every now and then my OpenBSD 3.3 system will lock up hard ("hard"
> meaning that I have to actually power down using the hardware power
> switch).
>
> The problem is almost certainly disk-related. In addition to being my
> gateway/firewall/NAT machine, I use the OBSD system to back up my main
> computer every night using rsync. This is (potentially) a fairly
> disk-intensive operation, since I back up over 50 GB of data (though most
> of it is static).
>
> Anyway, just before a crash, I get the following kinds of messages in
> /var/log/messages:
>
> Jun 19 03:05:35 septictank /bsd: wd0(pciide0:0:0): timeout
> Jun 19 03:05:35 septictank /bsd: type: ata
> Jun 19 03:05:35 septictank /bsd: c_bcount: 65536
> Jun 19 03:05:35 septictank /bsd: c_skip: 0
> Jun 19 03:05:35 septictank /bsd: pciide0:0:0: bus-master DMA error: missing interrupt, status=0x20
> Jun 19 03:05:35 septictank /bsd: pciide0 channel 0: reset failed for drive 0
> Jun 19 03:05:35 septictank /bsd: wd0e: device timeout reading fsbn 16130096 of 16130096-16130223 (wd0 bn 204873056; cn 203247 tn 1 sn 17), retrying
> Jun 19 03:05:35 septictank /bsd: wd0: soft error (corrected)
>
> ... and more messages that are essentially the same.
>
> And here's the most recent lockup:
>
> Jul 4 03:03:40 septictank /bsd: wd0(pciide0:0:0): timeout
> Jul 4 03:03:40 septictank /bsd: type: ata
> Jul 4 03:03:40 septictank /bsd: c_bcount: 8192
> Jul 4 03:03:40 septictank /bsd: c_skip: 0
> Jul 4 03:03:40 septictank /bsd: pciide0:0:0: bus-master DMA error: missing interrupt, status=0x20
> Jul 4 03:03:40 septictank /bsd: pciide0 channel 0: reset failed for drive 0
> Jul 4 03:03:40 septictank /bsd: wd0e: device timeout writing fsbn 16935104 of 16935104-16935119 (wd0 bn 205678064; cn 204045 tn 11 sn 11), retrying
> Jul 4 03:04:10 septictank /bsd: pciide0:0:0: not ready, st=0xd0, err=0x00
> Jul 4 03:04:10 septictank /bsd: wd0e: device timeout writing fsbn 16935104 of 16935104-16935119 (wd0 bn 205678064; cn 204045 tn 11 sn 11), retrying
> Jul 4 03:04:10 septictank ppp[21778]: tun0: Warning: Carrier settings ignored
> Jul 4 03:04:10 septictank /bsd: wd0: soft error (corrected)
>
> ... and more messages that are essentially the same.
>
> The kernel is the generic (stock) 3.3 kernel. I used the exact same
> hardware and rsync process with OpenBSD 3.2, and never seemed to have
> this kind of problem.
>
> Thanks for any feedback or suggestions!
> Matt
>
> --
> Matt Garman, garman@raw-sewage.net
> ``I ain't never seen no whiskey, the blues made me sloppy drunk!''
> -- Sleepy John Estes, ``Leaving Trunk''

--
   Alexander Yurchenko (aka grange)