Re: Timeout Problem with mail3.tk-online.net

From: Mark Martinec (Mark.Martinec+postfixijs.si)
Date: Thu Sep 07 2006 - 04:40:49 CDT


> Ralf says that if he forces the other end to go below 1360 byte
> packets, mail comes across just fine.

...TCP segment size, not packet size, sorry.

Out of curiosity I re-examined the tcpdump, which shows the following
consistently:

- all TCP data segments below a threshold size have correct TCP checksum,
  and all have a PUSH flag set in the TCP header;

- all TCP data segments at full negotiated size have incorrect TCP checksum,
  and all have a PUSH flag *cleared* (IP header checksum is correct, and
  data look correct at first sight);

- incorrect checksum is always off-by-one, i.e.:
    cksum 0x9402 (incorrect (-> 0x9401)
    cksum 0xe03c (incorrect (-> 0xe03b)
    cksum 0xce90 (incorrect (-> 0xce8f)
  as if the checksum had been recomputed after the packet was modified, using
  2's complement arithmetic instead of the 1's complement arithmetic that TCP
  requires. Note that this is not a single-bit error, i.e. it is not simply
  a consequence of the PUSH bit having been cleared;
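
To make the off-by-one observation concrete, here is a minimal Python sketch. The three checksum pairs are copied from the tcpdump output above; the sample 16-bit words and all function names are made up for illustration and are not taken from the capture. It only shows that dropping a single end-around carry (plain 2's complement addition) yields a checksum exactly one higher than the correct value. It does not prove that this is what the faulty device actually does.

```python
# Checksum pairs from the tcpdump output: (value in packet, value expected).
OBSERVED = [(0x9402, 0x9401), (0xE03C, 0xE03B), (0xCE90, 0xCE8F)]


def bits_differing(a, b):
    """Number of bit positions in which two 16-bit values differ."""
    return bin(a ^ b).count("1")


def ones_complement_sum(words):
    """16-bit sum with end-around carry, as the Internet checksum requires."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def truncating_sum(words):
    """16-bit sum that silently discards the carry (2's complement style)."""
    total = 0
    for w in words:
        total = (total + w) & 0xFFFF
    return total


def checksum(total):
    """The checksum field is the bitwise complement of the 16-bit sum."""
    return ~total & 0xFFFF


# Hypothetical data, chosen so that exactly one carry occurs while summing.
SAMPLE_WORDS = [0x4500, 0xF0F0, 0x1234]

good = checksum(ones_complement_sum(SAMPLE_WORDS))
bad = checksum(truncating_sum(SAMPLE_WORDS))

for seen, expected in OBSERVED:
    print(f"seen {seen:#06x}, expected {expected:#06x}: "
          f"difference {seen - expected}, "
          f"{bits_differing(seen, expected)} bits differ")
print(f"sample: correct {good:#06x}, carry dropped {bad:#06x}")
```

In all three observed pairs the value in the packet is one higher than the expected value, and more than one bit differs, which matches the direction of the error a single lost carry would produce.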

p0f, run on the same tcpdump capture, reports:
  193.22.182.31:60193 - Solaris 10 (beta) (NAT!)
    -> 160.45.207.131:25 (distance 14, ...)

Mail header indicates the sending MTA runs qmail.
It is likely, but not certain, that this really is a Solaris box.
Likewise, NAT is likely, but not certain.

There are lots of pieces that may be wrong at the sending site.
NAT does need to recalculate checksums, although a broken NAT would
probably get all packets wrong, not just the full-size ones. ADSL use of
PPPoE, or some link-level re-encapsulation or fragmentation/reassembly,
may be the problem, or may only be triggering a problem elsewhere.
Since the sending site indicates MSS lower than is usual even for PPPoE,
they may be using some further encapsulation, e.g. a VPN tunnel.
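
For reference, the usual MSS arithmetic can be sketched as below. The header sizes assume IPv4 and TCP without options, and the function names are made up for illustration. The MSS actually advertised by the sending site is not quoted in this message, so the 1360 used here is only the threshold mentioned at the top, taken as an illustrative value.

```python
IPV4_HEADER = 20     # bytes, no IP options
TCP_HEADER = 20      # bytes, no TCP options
ETHERNET_MTU = 1500  # bytes
PPPOE_OVERHEAD = 8   # bytes: 6-byte PPPoE header plus 2-byte PPP protocol ID


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - TCP_HEADER


def extra_overhead(mss):
    """Bytes unaccounted for if the path were plain PPPoE over Ethernet."""
    return mss_for_mtu(ETHERNET_MTU - PPPOE_OVERHEAD) - mss


print("Ethernet MSS:", mss_for_mtu(ETHERNET_MTU))
print("PPPoE MSS:   ", mss_for_mtu(ETHERNET_MTU - PPPOE_OVERHEAD))
# 1360 is an illustrative value only, not the MSS seen in the capture.
print("Extra overhead at MSS 1360:", extra_overhead(1360))
```

An MSS well below the PPPoE figure leaves several dozen bytes per packet unexplained, which is the kind of gap an additional tunnel header would fill.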
Or a firewall / traffic shaper that is too clever for its own good.

A workaround is to reduce the MTU at the sending site, although this
just sweeps the problem under the carpet. A true solution would be
to find the guilty piece and fix/replace/junk it.

See this article:
  http://www.sigcomm.org/sigcomm2000/conf/abstract/9-1.htm
  http://www.sigcomm.org/sigcomm2000/conf/paper/sigcomm2000-9-1.ps.gz

It mentions that bad TCP checksums can coincide with PUSH flags being cleared,
but it does not investigate the reason. Does this ring a bell for anybody?

  Mark