em driver in 3.5 - watchdog timeout -- resetting messages

From: Bill Marquette (billmucsecurity.com)
Date: Mon May 03 2004 - 19:24:55 CDT


I'm seeing random "watchdog timeout -- resetting" messages right after
boot in 3.5 on cards that I'd never previously had issues with (OpenBSD
3.3). The message will move between em0 and em1 (but apparently never hit
em2 and em3) and sometimes not appear at all. I probably wouldn't have
paid much attention to this if I wasn't trying to get carp up and running
on these interfaces. The short version (more detail below) is that the
machine receives all traffic destined for its own address and certain
broadcast and multicast traffic (arp and carp), but nothing sent to the
virtual mac.

The card having issues is an Intel Pro/1000 MF Dual port server adapter
(LC form factor connectors) with the 82546EB chipset. I also have a dual
port Intel Pro/1000 MT (copper) card in the same machine (not sure on the
chipset offhand) running with apparently no problems (so far - this is em2
and em3).

Longer description:
The machine is sitting on a switch with spanning tree enabled for the port
it's attached to. A second identically configured machine (same issues)
is also on the same switch...fwiw, these are Compaq DL380 G3's on a Cisco
6509 switch. When running tcpdump after the watchdog message I cannot see
the STP traffic (odd, spent some time blaming the switch), nor can I see
traffic destined for the mac address of the CARP interface (it was
MASTER). I can however see arp being sent to the all FF's mac looking for
the mac of the CARP interface and the (correct) reply. I can also see
traffic destined solely for the real IP address of my interface.
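
To narrow down which frames go missing, it can help to capture on the exact
destination MACs. A minimal sketch, assuming the standard CARP virtual MAC
layout (00:00:5e:00:01:<vhid>) and the IEEE STP bridge group address
01:80:c2:00:00:00. The helper names and the vhid of 1 used below are
placeholders, not values taken from this setup:

```sh
# Build the CARP virtual MAC for a given vhid (1-255).
# The vhid is whatever was configured on the carp interface.
carp_vmac() {
    printf '00:00:5e:00:01:%02x\n' "$1"
}

# Build a tcpdump filter matching frames addressed to the CARP virtual MAC
# or to the IEEE STP bridge group address.
carp_stp_filter() {
    printf 'ether dst %s or ether dst 01:80:c2:00:00:00\n' "$(carp_vmac "$1")"
}
```

Something like `tcpdump -n -e -i em0 $(carp_stp_filter 1)` would then show
only those frames, with -e printing the link-level header so the destination
MAC is visible.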

I can "fix" it during that boot by ifconfiging the interface down and then
back up.
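
The down/up workaround could be scripted along these lines. This is only a
sketch: the function names and the IFCONFIG override are made up for
illustration, and it assumes the watchdog message appears in dmesg exactly as
shown at the end of this post:

```sh
# Print the unique em interfaces that logged a watchdog reset.
# Reads dmesg-style text on stdin.
watchdog_ifs() {
    sed -n 's/^\(em[0-9][0-9]*\): watchdog timeout -- resetting.*$/\1/p' | sort -u
}

# Take each interface named on stdin down and then back up.
# IFCONFIG is a hypothetical override so the commands can be previewed
# (IFCONFIG="echo ifconfig") before running them for real as root.
bounce_ifs() {
    while read -r ifn; do
        ${IFCONFIG:-ifconfig} "$ifn" down
        ${IFCONFIG:-ifconfig} "$ifn" up
    done
}
```

Usage would be `dmesg | watchdog_ifs | bounce_ifs`, for example from a local
startup script. One caveat: the message buffer may still hold entries from an
earlier boot, in which case an interface that came up cleanly this time would
get bounced as well.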

Any thoughts? I've tried pulling down the -current cvs revs and muddled
through getting the kernel to compile just to see what would happen - core
dump shortly after boot (odd, but I probably screwed something up, no
biggie). If I can reboot enough times I eventually win the lottery and
get both of the "faulty" em interfaces to come up w/out the error. While
it's fixable I'd like to not have to ifconfig down/up all my interfaces on
boot if I don't absolutely have to. The only thing that I can think of
that I haven't tried is to split the card out to different IRQs, but that
would be a step back from the behaviour in 3.3 (and I don't have any IRQs
that aren't already overused).

Thanks

--Bill

Obligatory dmesg:
OpenBSD 3.5 (GENERIC) #0: Fri Apr 9 17:26:12 CDT 2004
    rootxxxx.xxxx.com:/usr/src/sys/arch/i386/compile/GENERIC
cpu0: Intel(R) Xeon(TM) CPU 2.80GHz ("GenuineIntel" 686-class) 2.79 GHz
cpu0: FPU,V86,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,SBF,CNXT-ID
real mem = 536428544 (523856K)
avail mem = 491294720 (479780K)
using 4278 buffers containing 26923008 bytes (26292K) of memory
mainbus0 (root)
bios0 at mainbus0: AT/286+(00) BIOS, date 12/31/99, BIOS32 rev. 0 0xf0000
pcibios0 at bios0: rev. 2.1 0xf0000/0x2000
pcibios0: PCI BIOS has 9 Interrupt Routing table entries
pcibios0: no compatible PCI ICU found
pcibios0: Warning, unable to fix up PCI interrupt routing
pcibios0: PCI bus #0 is the last bus
bios0: ROM list: 0xc0000/0x8000 0xc8000/0x6000 0xee000/0x2000!
pci0 at mainbus0 bus 0: configuration mode 1 (no bios)
pchb0 at pci0 dev 0 function 0 "ServerWorks CMIC_LE Host" rev 0x13
pchb1 at pci0 dev 0 function 1 "ServerWorks CMIC_LE Host" rev 0x00
pci1 at pchb1 bus 3
em0 at pci1 dev 1 function 0 "Intel PRO/1000MF (PWLA8492MF)" rev 0x01: irq 11, address: 00:07:e9:09:0d:c6
em1 at pci1 dev 1 function 1 "Intel PRO/1000MF (PWLA8492MF)" rev 0x01: irq 11, address: 00:07:e9:09:0d:c7
pchb2 at pci0 dev 0 function 2 vendor "ServerWorks", unknown product 0x0 rev 0x00
pci2 at pchb2 bus 1
vga1 at pci0 dev 3 function 0 "ATI Rage XL" rev 0x27
wsdisplay0 at vga1: console (80x25, vt100 emulation)
wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
vendor "Compaq", unknown product 0xb203 (class system subclass miscellaneous, rev 0x01) at pci0 dev 4 function 0 not configured
vendor "Compaq", unknown product 0xb204 (class system subclass miscellaneous, rev 0x01) at pci0 dev 4 function 2 not configured
pcib0 at pci0 dev 15 function 0 "ServerWorks CSB5 SouthBridge" rev 0x93
pciide0 at pci0 dev 15 function 1 "ServerWorks CSB5 IDE" rev 0x93: DMA
atapiscsi0 at pciide0 channel 0 drive 0
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0: <TEAC, CD-224E, 9.9A> SCSI0 5/cdrom removable
cd0(pciide0:0:0): using PIO mode 4, DMA mode 2
pchb3 at pci0 dev 15 function 3 "ServerWorks CSB5 PCI" rev 0x00
pchb4 at pci0 dev 16 function 0 "ServerWorks CIOBX2" rev 0x05
pchb5 at pci0 dev 16 function 2 "ServerWorks CIOBX2" rev 0x05
pci3 at pchb5 bus 6
ppb0 at pci3 dev 1 function 0 "DPT PCI-PCI" rev 0x02
pci4 at ppb0 bus 7
iop0 at pci3 dev 1 function 1 "DPT SmartRAID (I2O)" rev 0x02: I2O adapter <ADAPTEC 2100S>
iop0: interrupting at irq 7
em2 at pci3 dev 2 function 0 "Intel PRO/1000MT (82546GB)" rev 0x03: irq 10, address: 00:04:23:a5:94:28
em3 at pci3 dev 2 function 1 "Intel PRO/1000MT (82546GB)" rev 0x03: irq 10, address: 00:04:23:a5:94:29
"Compaq PCI Hotplug" rev 0x14 at pci3 dev 30 function 0 not configured
pchb6 at pci0 dev 17 function 0 "ServerWorks CIOBX2" rev 0x05
pchb7 at pci0 dev 17 function 2 "ServerWorks CIOBX2" rev 0x05
pci5 at pchb7 bus 2
bge0 at pci5 dev 1 function 0 "Broadcom BCM5703X" rev 0x02: irq 10: address: 00:0b:cd:43:1f:7d
brgphy0 at bge0 phy 1: BCM5703 10/100/1000baseTX PHY, rev. 2
bge1 at pci5 dev 2 function 0 "Broadcom BCM5703X" rev 0x02: irq 10: address: 00:0b:cd:43:1f:7c
brgphy1 at bge1 phy 1: BCM5703 10/100/1000baseTX PHY, rev. 2
isa0 at pcib0
isadma0 at isa0
pckbc0 at isa0 port 0x60/5
pckbd0 at pckbc0 (kbd slot)
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pmsi0 at pckbc0 (aux slot)
pckbc0: using irq 12 for aux slot
wsmouse0 at pmsi0 mux 0
pcppi0 at isa0 port 0x61
midi0 at pcppi0: <PC speaker>
sysbeep0 at pcppi0
npx0 at isa0 port 0xf0/16: using exception 16
pccom0 at isa0 port 0x3f8/8 irq 4: ns16550a, 16 byte fifo
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
fd0 at fdc0 drive 0: 1.44MB 80 cyl, 2 head, 18 sec
biomask c0c0 netmask ccc0 ttymask dcc2
pctr: user-level cycle counter enabled
iop0: configuring...
ioprbs0 at iop0 tid 517: <ADAPTEC, RAID-1, 370F> direct access, fixed
scsibus1 at ioprbs0: 1 targets
sd0 at scsibus1 targ 0 lun 0: <I2O, Container #00, > SCSI2 0/direct fixed
sd0: 34732MB, 8820 cyl, 128 head, 63 sec, 512 bytes/sec, 71131136 sec total
device (class 0x80) at iop0 tid 8 not configured
dkcsum: sd0 matched BIOS disk 80
root on sd0a
rootdev=0x400 rrootdev=0xd00 rawdev=0xd02
em1: watchdog timeout -- resetting