Postfix Archives: Re: Slow downstream site wedged my Postfix...

Subject: Re: Slow downstream site wedged my Postfix...
From: Wietse Venema (wietse@porcupine.org)
Date: Tue Jan 25 2000 - 10:41:07 CST


In this case, the solution is for Postfix to not accept tons of
mail for the downed site to begin with.

This is a strategy that I thought up in off-list discussions with
Patrick Rak. Although we disagreed initially on implementation
details, the general approach is that the master maintains a count
that is incremented when mail is inserted into the incoming queue,
and decremented when mail is delivered (or deferred, as the case
may be).
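
To make the bookkeeping concrete, here is a minimal sketch of such a
counter. It is written in Python purely for illustration and is not
Postfix code; the class name and method names are hypothetical.

```python
import threading


class QueueLoadCounter:
    """Hypothetical stand-in for the count the master would maintain."""

    def __init__(self):
        self._lock = threading.Lock()
        self._count = 0

    def message_queued(self):
        # Mail was inserted into the incoming queue: increment.
        with self._lock:
            self._count += 1
            return self._count

    def message_finished(self):
        # Mail was delivered or deferred: decrement, never below zero.
        with self._lock:
            if self._count > 0:
                self._count -= 1
            return self._count

    def value(self):
        # Current number of messages accepted but not yet handled.
        with self._lock:
            return self._count
```

The lock is only there because, in this sketch, several threads might
report events at once; how the real master would serialize updates is
left open here.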

Once the count exceeds some threshold, Postfix inserts delays into
the SMTP dialog, so it still accepts mail, just a bit slower.
If lots of mail comes in over a few connections this strategy
effectively slows down the sender and does not affect other inbound
mail too much. If the mail flood comes from all over the network,
eventually the connect() queue in the kernel will fill up and client
connections will start to time out. That's still better than Postfix
becoming clogged up.
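
One possible shape for the delay rule, again as a Python sketch rather
than actual Postfix code. The function name, the threshold of 1000,
the 5 second cap, and the linear ramp are all assumptions chosen for
illustration; the proposal above only says that delays are inserted
once the count exceeds some threshold.

```python
def smtp_delay_seconds(count, threshold=1000, max_delay=5.0):
    """Return how long to pause before the next SMTP reply.

    count     -- messages accepted but not yet delivered or deferred
    threshold -- assumed level below which no delay is inserted
    max_delay -- assumed upper bound on the pause, in seconds
    """
    if count <= threshold:
        # Normal load: answer the client immediately.
        return 0.0
    # Above the threshold the pause grows linearly with the excess
    # and reaches max_delay once the count is twice the threshold.
    excess = count - threshold
    return min(max_delay, max_delay * excess / threshold)
```

An SMTP server process would sleep for the returned number of seconds
before sending each reply, so a client that floods the queue is slowed
down while mail is still accepted.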

What do you people think of this solution?

        Wietse

Andrew McNamara:
> And the baby chewed the axle off. 8-)
>
> A downstream site using us as an MX secondary all but completely wedged
> Postfix on one of our servers today. I think I remember Wietse
> mentioning this scenario before.
>
> The primary MX host of the site in question was very slow accepting
> e-mail, and a lot of mail built up for them on one of our postfix
> machines (around 10000 jobs).
>
> Because they weren't actually down, qmgr didn't move their jobs into
> the deferred queue. Eventually, all the jobs in the active queue were
> for this one site (qmgr_message_active_limit = 5000), and legitimate
> mail started backing up in the incoming directory. Complaints about
> delays of several hours from other sites prompted me to start poking.
>
> As I understand it, qmgr already knows how many connections it has open
> to a given site (so it can implement initial_destination_concurrency
> limits). I wonder if it would be feasible for it to also track how many
> jobs in the active queue were for a given site, and either move them
> straight into deferred or just ignore them in incoming when they pass
> some threshold?
>
> ---
> Andrew McNamara (System Architect)
>
> connect.com.au Pty Ltd
> Lvl 3, 213 Miller St, North Sydney, NSW 2060, Australia
> Phone: +61 2 9409 2117, Fax: +61 2 9409 2111
>



This archive was generated by hypermail 2b27 : Tue Jan 25 2000 - 10:42:47 CST