OSEC

Neohapsis is currently accepting applications for employment. For more information, please visit our website www.neohapsis.com or email hr@neohapsis.com
Re: Possible MX Lookup/Ordering Issue

gordanbobich.net
Date: Thu Nov 01 2007 - 10:49:38 CDT


On Thu, 1 Nov 2007, Wietse Venema wrote:

> We don't need no steenkeeng debug logs at this point. The
> normal logging show exactly which IP addresses the SMTP
> client tries to connect to.
>
> You may obfuscate the recipient (as long as you give the domain
> name in the text) but please do not remove process ID, time
> stamp or other information.

Here is a sample log extract. I had to triple check that there are no
related records inbetween, since the entry lines are intertwined (it's a
mail relay for some thousands of users).

Ironically, the mail eventually got out through avt0 (which is a reject
target), because I resolved the problem locally by setting up an uptables
rule to DNAT the decoy targets to the real target, so that even if the MTA
picks the wrong MX, it'll get the mail through correctly to my incoming
server. This demonstrates that postfix does seem to roll-over the list and
try again through the list, but it remains unexplained why it went to
avt15 decoy first (no, there are no logs inbetween for message
5C6037A2683, I triple-checked).

Nov 1 07:15:30 smtp-out-b postfix/smtpd[22565]: connect from
213.177.244.231.adsl.griffin.net.uk[213.177.244.231]
Nov 1 07:15:31 smtp-out-b throttle[22567]: IP address 213.177.244.231
allowed
Nov 1 07:15:33 smtp-out-b postfix/smtpd[22565]: 5C6037A2683:
client=213.177.244.231.adsl.griffin.net.uk[213.177.244.231]
Nov 1 07:15:33 smtp-out-b postfix/cleanup[22462]: 5C6037A2683:
message-id=<!&!AAAAAAAAAAAYAAAAAAAAAMZwGI7IkCJPiEIFgLPzqtnCgAAAEAAAA
FTJx60TftpBkIoHmlwF8GsBAAAAAA==zoom.co.uk>
Nov 1 07:15:33 smtp-out-b postfix/qmgr[20307]: 5C6037A2683:
from=<foozoom.co.uk>, size=2903, nrcpt=1 (queue active)
Nov 1 07:15:36 smtp-out-b postfix/smtpd[22565]: disconnect from
213.177.244.231.adsl.griffin.net.uk[213.177.244.231]
Nov 1 07:16:52 smtp-out-b postfix/smtp[22405]: 5C6037A2683:
to=<bargriffin.com>, relay=none, delay=81, status=deferred (connect t
o avt15.mail.griffin.com[85.189.39.55]: No route to host)
Nov 1 07:38:45 smtp-out-b postfix/qmgr[20307]: 5C6037A2683:
from=<foozoom.co.uk>, size=2903, nrcpt=1 (queue active)
Nov 1 07:40:10 smtp-out-b postfix/smtp[23233]: 5C6037A2683:
to=<bargriffin.com>, relay=none, delay=1479, status=deferred (connect
  to avt15.mail.griffin.com[85.189.39.55]: No route to host)
Nov 1 08:12:05 smtp-out-b postfix/qmgr[20307]: 5C6037A2683:
from=<foozoom.co.uk>, size=2903, nrcpt=1 (queue active)
Nov 1 08:13:23 smtp-out-b postfix/smtp[24076]: 5C6037A2683:
to=<bargriffin.com>, relay=none, delay=3472, status=deferred (connect
  to avt15.mail.griffin.com[85.189.39.55]: No route to host)
Nov 1 09:18:45 smtp-out-b postfix/qmgr[20307]: 5C6037A2683:
from=<foozoom.co.uk>, size=2903, nrcpt=1 (queue active)
Nov 1 09:20:17 smtp-out-b postfix/smtp[25788]: 5C6037A2683:
to=<bargriffin.com>, relay=none, delay=7486, status=deferred (connect
  to avt15.mail.griffin.com[85.189.39.55]: No route to host)
Nov 1 10:42:06 smtp-out-b postfix/qmgr[27856]: 5C6037A2683:
from=<foozoom.co.uk>, size=2903, nrcpt=1 (queue active)
Nov 1 10:44:13 smtp-out-b postfix/smtp[27831]: 5C6037A2683:
to=<bargriffin.com>, relay=none, delay=12522, status=deferred (connec
t to avt15.mail.griffin.com[85.189.39.55]: No route to host)
Nov 1 11:30:35 smtp-out-b postfix/qmgr[29559]: 5C6037A2683:
from=<foozoom.co.uk>, size=2903, nrcpt=1 (queue active)
Nov 1 11:34:30 smtp-out-b postfix/smtp[29644]: 5C6037A2683:
to=<bargriffin.com>, relay=none, delay=15539, status=deferred (connec
t to avt15.mail.griffin.com[85.189.39.55]: No route to host)
Nov 1 11:48:38 smtp-out-b postfix/qmgr[29559]: 5C6037A2683:
from=<foozoom.co.uk>, size=2903, nrcpt=1 (queue active)
Nov 1 11:52:49 smtp-out-b postfix/smtp[29645]: 5C6037A2683:
to=<bargriffin.com>, relay=none, delay=16638, status=deferred (connec
t to avt15.mail.griffin.com[85.189.39.55]: No route to host)
Nov 1 12:27:21 smtp-out-b postfix/qmgr[29559]: 5C6037A2683:
from=<foozoom.co.uk>, size=2903, nrcpt=1 (queue active)
Nov 1 12:30:05 smtp-out-b postfix/smtp[31630]: 5C6037A2683:
to=<bargriffin.com>, relay=avt0.mail.griffin.com[85.189.39.40], delay
=18874, status=sent (250 Ok: queued as 2FF8D5D68DC)
Nov 1 12:30:05 smtp-out-b postfix/qmgr[29559]: 5C6037A2683: removed

The MX records for griffin.com (target domain in this case) are (done
from the same machine the mail log extract came from):
# dig mx griffin.com

; <<>> DiG 9.3.1 <<>> mx griffin.com
;; global options: printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 37746
;; flags: qr rd ra; QUERY: 1, ANSWER: 17, AUTHORITY: 2, ADDITIONAL: 4

;; QUESTION SECTION:
;griffin.com. IN MX

;; ANSWER SECTION:
griffin.com. 8 IN MX 105 avt5.mail.griffin.com.
griffin.com. 8 IN MX 106 avt6.mail.griffin.com.
griffin.com. 8 IN MX 107 avt7.mail.griffin.com.
griffin.com. 8 IN MX 108 avt8.mail.griffin.com.
griffin.com. 8 IN MX 109 avt9.mail.griffin.com.
griffin.com. 8 IN MX 110
avt10.mail.griffin.com.
griffin.com. 8 IN MX 111
avt11.mail.griffin.com.
griffin.com. 8 IN MX 112
avt12.mail.griffin.com.
griffin.com. 8 IN MX 113
avt13.mail.griffin.com.
griffin.com. 8 IN MX 114
avt14.mail.griffin.com.
griffin.com. 8 IN MX 115
avt15.mail.griffin.com.
griffin.com. 8 IN MX 10 avt0.mail.griffin.com.
griffin.com. 8 IN MX 30 av.griffin.com.
griffin.com. 8 IN MX 101 avt1.mail.griffin.com.
griffin.com. 8 IN MX 102 avt2.mail.griffin.com.
griffin.com. 8 IN MX 103 avt3.mail.griffin.com.
griffin.com. 8 IN MX 104 avt4.mail.griffin.com.

;; AUTHORITY SECTION:
griffin.com. 9395 IN NS auth2.dns.griffin.com.
griffin.com. 9395 IN NS auth1.dns.griffin.com.

;; ADDITIONAL SECTION:
avt0.mail.griffin.com. 80346 IN A 85.189.39.40
av.griffin.com. 7 IN A 85.189.39.7
avt1.mail.griffin.com. 944 IN A 85.189.39.41
avt2.mail.griffin.com. 72155 IN A 85.189.39.42

;; Query time: 1 msec
;; SERVER: 85.189.39.5#53(85.189.39.5)
;; WHEN: Thu Nov 1 15:46:31 2007
;; MSG SIZE rcvd: 503

So, according to the priority list, it should have tried avt0 first, av
second, avt1 3rd, etc. Yet it clearly didn't. It went straight to avt15
and got stuck there for over 5 hours.

Gordan