Re: mystery: why did postfix stop delivering?

From: Wietse Venema (wietse@porcupine.org)
Date: Tue Jul 10 2007 - 11:55:53 CDT


Ofer Inbar:
> My problem is "solved", I just don't know what caused it. I'm posting
> here looking for ideas I can investigate, not necessarily solutions.
>
> RedHat EL4
> postfix 2.2.10
>
> This morning, postfix just stopped delivering mail, but kept accepting
> new messages both from local submission and smtp, and queuing them.
> It was very responsive on the SMTP port. The queue kept getting
> larger and larger but for several hours, it did not make any attempts
> to deliver any of them.
>
> When I noticed there was a problem, I started watching the logfile.
> I ran top and saw that there was plenty of free memory and CPU and I/O
> wait was fluctuating but spending most of its time very low. Of course
> that doesn't tell me what the conditions were at the time this first began.
>
> - I tried sending myself an email and saw it get queued and just sit there.
>
> - I ran "postfix flush" and saw no extra activity on the log file, just
> a slow but steady trickle of incoming messages being queued.
>
> - I ran postfix stop, followed by postfix start, and *whoosh*
> everything in the queue got delivered within a few minutes

The master kept spawning new smtpd/cleanup/trivial-rewrite processes,
so this would be a stuck queue manager. Such a problem resolves
itself in a few hours: the watchdog timer forces a process exit of
a stuck process (modulo !#%& Linux brain damage), after which a
new queue manager process is started.
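
One way to check the stuck-queue-manager theory against the syslog
file is to compare the last time each Postfix daemon logged anything.
The Python sketch below is only an illustration, not part of Postfix:
the function names, the sample log lines, and the one-hour threshold
are all assumptions, and it relies on the traditional syslog timestamp
(which carries no year, so the year is passed in). If smtpd and cleanup
keep logging while qmgr went silent hours earlier, that is consistent
with a queue manager that stopped doing work.

```python
import re
from datetime import datetime, timedelta

# Matches the daemon tag in a syslog line, e.g. "postfix/qmgr[1234]:".
DAEMON_RE = re.compile(r"postfix/([\w-]+)\[\d+\]:")


def last_activity(lines, year):
    """Map each Postfix daemon name to the time of its latest log line.

    Assumes the traditional syslog prefix "Mon DD HH:MM:SS" in the
    first 15 characters; lines that do not fit are skipped.
    """
    last = {}
    for line in lines:
        match = DAEMON_RE.search(line)
        if not match:
            continue
        try:
            stamp = datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
        except ValueError:
            continue
        daemon = match.group(1)
        last[daemon] = max(last.get(daemon, stamp), stamp)
    return last


def quiet_daemons(last, reference="smtpd", threshold=timedelta(hours=1)):
    """List daemons whose latest log line is more than `threshold`
    older than the latest line from the `reference` daemon."""
    if reference not in last:
        return []
    cutoff = last[reference] - threshold
    return sorted(name for name, seen in last.items() if seen < cutoff)


# Made-up sample lines, for illustration only.
SAMPLE = [
    "Jul 10 03:00:01 mail postfix/smtpd[100]: connect from unknown[192.0.2.1]",
    "Jul 10 03:00:02 mail postfix/qmgr[50]: ABC123: from=<a@example.com>, size=100, nrcpt=1 (queue active)",
    "Jul 10 03:00:03 mail postfix/smtp[101]: ABC123: to=<b@example.com>, status=sent",
    "Jul 10 06:30:00 mail postfix/smtpd[200]: connect from unknown[192.0.2.2]",
    "Jul 10 06:30:01 mail postfix/cleanup[201]: DEF456: message-id=<x@example.com>",
]

if __name__ == "__main__":
    print(quiet_daemons(last_activity(SAMPLE, 2007)))
```

On the sample lines this reports qmgr and smtp as quiet, while smtpd
and cleanup are still active. On a real system, read the lines from the
mail log instead of SAMPLE; a quiet qmgr on an idle machine proves
nothing, so this is only meaningful while mail is being queued.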

> Now, looking at the postfix syslog file for evidence, I see nothing
> unusual. I can see the point where postfix stopped trying to deliver,
> but that's all I notice ... no more attempts to deliver messages after
> a certain point, while log entries about incoming messages and the
> occasional statistics continue.
>
> I also see nothing unusual in /var/log/messages at the time.
>
> Any ideas of what might cause postfix to act like this?

No. In 10 years of existence the Postfix queue manager has no record
of getting stuck SPONTANEOUSLY. If this is a recurring problem with
the queue manager then I would first suspect a bad executable, bad
library or bad kernel. If you have recurring failures with random
processes then I would suspect bad hardware.

        Wietse