Re: canonical regexp help

From: mouss (usebsdfree.fr)
Date: Wed Sep 13 2006 - 16:32:20 CDT

Mark Edwards wrote:
> On Sep 11, 2006, at 2:26 PM, mouss wrote:
>> there should be no place for wishful thinking here. you get N mails,
>> each takes K seconds to filter. if you do online filtering, you can
>> serve these without risking a client timeout. a client timeout is bad
>> (it means data is transferred multiple times without success).
>> there's no "he will retry later", because you're not providing any
>> guarantee that a retry will have more success.
> Yes, but in practice legitimate clients do retry.

They will, but your server _may_ still make them time out. So in theory
(I know, it's just theory), it is possible for multiple clients not to
be able to send you mail.
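
To put rough numbers on the N mails / K seconds argument, here is a
minimal sketch. Everything in it is an assumption for illustration: the
function names, the idea that all N mails arrive at once, the fixed
number of filter processes ("workers"), and the timeout value you pass
in. It does not model how Postfix actually schedules its processes.

```python
import math


def worst_case_wait(n_mails, k_seconds, workers=1):
    """Seconds until the last of n_mails is filtered, assuming all of
    them arrive at once and each of `workers` filter processes handles
    one message at a time, taking k_seconds per message."""
    if n_mails <= 0:
        return 0
    # Messages are handled in "rounds" of at most `workers` at a time.
    rounds = math.ceil(n_mails / workers)
    return rounds * k_seconds


def may_time_out(n_mails, k_seconds, client_timeout, workers=1):
    """True if the slowest client would wait longer than its timeout
    (client_timeout is whatever the sending side uses; assumed here)."""
    return worst_case_wait(n_mails, k_seconds, workers) > client_timeout


# Hypothetical figures: 5 seconds per message, 2 filter processes,
# clients that give up after 300 seconds.
print(may_time_out(100, 5, 300, workers=2))
print(may_time_out(200, 5, 300, workers=2))
```

With these made-up figures, a burst of 100 mails still fits inside the
timeout, while a burst of 200 does not, which is the case where clients
start retrying without any guarantee of doing better.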

> I don't know what the applicable RFCs are for this, but I have never
> actually observed a case where a client actually gave up and bounced.
> That happens if a server is offline for days, yes, but that isn't a
> scenario that I would tolerate. If things were so shaky that mails
> were timing out that many times in succession, there's no way I would
> leave that pre-cache filter implemented. In practice, I have
> successfully run pre-cache filtering on a small server (10 mail users)
> with no such ill effects.

Well, as long as you don't get much mail, it's ok. But you need to keep
your eyes on the system a little more than if you used an
after-the-queue filter.

>> come on. we all have a lot of spam. I receive a huge quantity of spam
>> but that's not a reason to make me choose random approaches to fight
>> it, because I assume that if I can't use email the way I want it,
>> then I lost the battle, and I also believe that arbitrary approaches
>> won't work. but I may be too rational...
> How is this an arbitrary approach?

I was referring to "postfix stop/start", which I used as an "extreme" example.

> I don't see any other reasonable approach. Sending mail that gets
> identified by SpamAssassin into a blackhole with no notification is
> breaking email far worse than risking possible timeouts. Mail should
> never be deleted without at least some attempt at notification as to why.

True. This is why I deliver spam, either to "Inbox" or to a "Junk"
folder (user prefs). No RFC requires users to read their mail, and I am
not responsible for any loss:)

> If I just tag the spams and send them through, users will sort and
> delete them without looking at them, which is effectively the same
> thing as blackholing them.

No. Not at all. If the recipient deletes the message, it's his problem.

> It might be "correct" on the part of the administrator, but it's
> still breaking the system. And doing no spam filtering and forcing it
> all on the user is arguably the worst and most arbitrary approach of all.
> A properly configured pre-cache spam filter with some kind of
> notification of why the mail was rejected is the only reasonable
> approach for a small site, and I'm totally comfortable with it. My
> users, at least, are eternally grateful. I understand it's not a
> perfect setup, but it's the least bad of the options as far as I can tell.

The problem here is that you are deciding what is spam and what is not,
and this decision is configured locally. So if the sender (assuming it
was not forged) receives your bounce, he may feel helpless. Your users
get headers with SA rules, so they can see which rules were
triggered. The remote user doesn't have this chance: he will get a
"techy" bounce message.

The other problem is that you don't give your users a chance to tell you
that SA was wrong in its decision, so you don't detect false positives.

> Anyway, thanks for the input, if someone can suggest a better approach
> to filter spam safely I'm all ears, but otherwise I'm happy with the
> pre-cache SpamAssassin approach for now.

As long as you're happy...