Re: active queue growing fast: advice?

From: Noel Jones (njonesmegan.vbhcs.org)
Date: Fri Aug 17 2007 - 11:31:09 CDT


At 10:56 AM 8/17/2007, Dave McGuire wrote:

> Hi folks. I have a production mail server that has been badly
>constipated since yesterday afternoon and I'm trying to figure out
>why. I could really use a little help if someone has a moment.
>
> The basic configuration is a front-end system running Postfix
>v2.4.0, with amavisd-new/SpamAssassin/ClamAV, which filters mail and
>then sends it along to multiple destinations via transport maps, one
>of which is a second machine on the local network also running
>Postfix v2.4.0. Pretty much everything for both of these machines is
>stored in a MySQL server running on a third system.
>
> They've been running great for many months, with no recent
>configuration changes that I am aware of, but suddenly the active
>queue on the front-end machine has become large, around 10K messages
>and climbing, and very little mail is getting past the front-end
>machine. I see no obvious problems in the logs.
>
> A quick run of "qshape active" suggests that a large volume of
>mail destined for the second server mentioned above is sitting in the
>active queue for reasons unknown. The second server is on the local
>network, is under very little load, and is working fine otherwise. I
>can telnet to port 25 on that box and manually send stuff through;
>everything behaves as expected.
>
> I can't seem to figure out why the front-end machine is having
>trouble sending messages to the destinations configured in the
>transport map, even the one that's local, fast, and not busy. It
>acts as if it's not even trying. Is there some way to get some
>visibility into what Postfix is doing in that regard? I've checked
>over the logs and spotted nothing useful, but of course the log
>messages are blowing by so fast (as usual) that finding anything in
>there is rough.

Are these messages to valid recipients? The most common source of
queue congestion is accepting mail for undeliverable recipients and
then having to bounce it.

Next you need to determine if the final destination or the
content_filter is the bottleneck.
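
One rough way to get that visibility from the logs is to group the
delivery log lines by their relay= field and compare the delay= values.
The sketch below assumes the usual Postfix delivery line layout
("relay=..., delay=..., delays=..."); the function name and the sample
lines are made up for illustration, and the relay on port 10024 is only
a guess at where amavisd-new might be listening on your system.

```python
import re
from collections import defaultdict

# Matches the relay= and delay= fields of a Postfix delivery log line.
# "delays=" is not matched because the pattern requires "delay=" exactly.
LINE_RE = re.compile(r"relay=(?P<relay>[^,]+), .*?\bdelay=(?P<delay>[0-9.]+)")


def summarize_relays(lines):
    """Return {relay: (count, mean_delay_seconds)} for delivery log lines."""
    totals = defaultdict(lambda: [0, 0.0])
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a delivery line
        entry = totals[match.group("relay")]
        entry[0] += 1
        entry[1] += float(match.group("delay"))
    return {relay: (n, total / n) for relay, (n, total) in totals.items()}


# Illustrative lines only, not taken from a real log.
SAMPLE = [
    "Aug 17 11:00:01 mx postfix/smtp[101]: A1: to=<a@example.com>, "
    "relay=127.0.0.1[127.0.0.1]:10024, delay=40, delays=0.1/30/0.1/9.8, "
    "dsn=2.0.0, status=sent (250 ok)",
    "Aug 17 11:00:02 mx postfix/smtp[102]: B2: to=<b@example.com>, "
    "relay=127.0.0.1[127.0.0.1]:10024, delay=60, delays=0.1/50/0.1/9.8, "
    "dsn=2.0.0, status=sent (250 ok)",
    "Aug 17 11:00:03 mx postfix/smtp[103]: C3: to=<c@example.com>, "
    "relay=10.0.0.2[10.0.0.2]:25, delay=2, delays=0.1/0.1/0.1/1.7, "
    "dsn=2.0.0, status=sent (250 ok)",
]

if __name__ == "__main__":
    for relay, (count, mean_delay) in summarize_relays(SAMPLE).items():
        print(relay, count, mean_delay)
```

Feed it lines from your mail log instead of SAMPLE. A relay with few
deliveries and large delays is the likely bottleneck; keep in mind that
delay= also includes time the message spent waiting in the queue.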

Check some messages with "postcat" to see if they are waiting for the
content_filter or for the final destination. Note that "qshape"
doesn't differentiate mail waiting for the content_filter from mail
waiting for the final destination.
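
If you want to check more than a handful of messages, the postcat test
can be scripted. This is only a sketch: it assumes that a message still
waiting for the filter shows an envelope record printed as a line
starting with "content_filter:", and that "postcat -q" is run with
enough privilege to read the queue. The function names are made up.

```python
import subprocess


def waiting_for_filter(postcat_text):
    """True if the postcat output contains a content_filter envelope record.

    Assumption: the record is printed as a line starting with
    "content_filter:"; verify this against your own postcat output.
    """
    return any(
        line.startswith("content_filter:")
        for line in postcat_text.splitlines()
    )


def postcat_queue_id(queue_id):
    """Run `postcat -q QUEUE_ID` and return its output as text."""
    result = subprocess.run(
        ["postcat", "-q", queue_id],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

For example, waiting_for_filter(postcat_queue_id(some_id)) for each
queue ID of interest, where some_id is a queue ID taken from the logs
or from mailq output.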

Check the throughput of amavisd-new. If it has suddenly started
taking longer to process messages, that would explain your
backlog. Common problems related to amavisd-new are:
- SpamAssassin timing out or taking unreasonably long
    - due to a corrupted Bayes database
    - a dead RBL lookup timing out
    - some system module failing
- or ClamAV taking unreasonably long
    - because clamd has failed and the slower "clamscan" backup is running
    - some misconfiguration

The above is not a complete list of what can go wrong ... ;-)

To debug, first increase logging of amavisd-new to level 2 to get
more detailed timing information, and run the "amavisd-nanny" program
to monitor the behavior of amavisd-new.

Good luck.

---
Noel Jones