From: Wietse Venema (wietse_at_porcupine.org)
Date: Mon Sep 02 2002 - 09:58:05 CDT

    Interesting. This certainly pushes the limit of usability for a
    solution that does not match the problem well (Solution: match each
    message body line as a completely independent piece of data.
    Problem: mail content is organized into clumps of similar content).
    My fault. Neat hack, though.

            Wietse

    Bert Driehuis:
    > I ran into performance issues with body_checks. My cleanup daemon was
    > using huge amounts of CPU time, and profiling showed that (surprise!)
    > pcre_exec was the culprit. I've toyed with solutions outside of
    > Postfix (like merging as many rules as possible into one regexp), but
    > this quickly proved itself to be unworkable (mechanising it requires a
    > PhD in traveling salesman planning, PCRE limitations show, and the
    > resulting regexp is error prone and hard to debug, and barely faster to
    > boot). After a bunch of false starts, this is what I think is the bottom
    > line:
    >
    > The bottleneck of Postfix's current PCRE dictionary is that for every
    > line, all regexps have to be tried. With 800 body checks, as I had at
    > some stage, that's millions of regexp matches for a single message with
    > an attachment of a couple hundred K. Header_checks don't hurt as badly,
    > but if you handle large volumes of small messages and have more than a
    > few rules, this will still take a significant bite out of rule checking
    > (especially if you wish to keep your ruleset legible).
    >
    > I tried to optimize it by skipping irrelevant checks. I considered
    > doing this in a way that's completely hidden to the user, but PCRE is
    > too rich to safely automate this, and besides, I didn't want to burden
    > the Postfix code with complex (and thus, hard to test) code.
    >
    > The attached diff allows for an IF .. ENDIF construct, like this:
    >
    > IF /http/
    > /http:\/\/banned_site\// REJECT
    > /http:\/\/anothersite\// REJECT
    > /http:\/\/yetanothersite\// REJECT
    > ENDIF
    > IF /</
    > /<script.*evil.js/ REJECT
    > /<embed src.*evil.swf/ REJECT
    > ENDIF
    >
    > The regexps between the IF and the ENDIF are not evaluated unless the
    > regexp after the IF matches. Some trivial reorganisations of my
    > body_checks reduced the CPU load to a third of what it was before, when
    > run on my spam archive (I don't have long term measurements from a live
    > system yet, but I expect my spam mailbox to be richer in the words I key
    > off than real-life mail). I haven't even started going over the whole
    > set; I'm currently still evaluating about 100 rules that probably
    > share more keywords to key off.
    >
    > This still doesn't turn PCRE checks into a procedural language, which is
    > probably a good thing; anyway, this is intended for optimization only.
    >
    > I'd like to get feedback both on the quality of the solution, and on its
    > necessity. I'll be happy to keep this as a private solution and maintain
    > it like Jozsef Kadlecsik's per-user UCE patch, but maybe Wietse can be
    > swayed into implementing this or something like it in the official
    > Postfix release eventually. Please drop me a note in private if you just
    > want to say "me too" or "I don't care", so as not to swamp this list
    > with the "I just want to be counted" responses. It's busy enough as-is
    > :-) Google turned up a few concerns about the regexp overhead, but not
    > enough to convince me that my performance issue is typical, so feedback
    > is definitely appreciated.
    >
    > Note that this patch addresses PCRE exclusively, but the classic regex
    > implementation is similar and I'll be happy to code this up for regex if
    > there is interest.
    >
    > Cheers,
    >
    > -- Bert
    >
    > --
    > Bert Driehuis, MIS -- bert_driehuisnl.compuware.com -- +31-20-3116119
    > Dihydrogen Monoxide kills! Join the campaign at http://www.dhmo.org/

    [ Attachment, skipping... ]
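
Since the attached diff is not reproduced here, the following is a minimal Python sketch of the IF .. ENDIF idea described in the quoted message, not the actual patch. It assumes that blocks do not nest and that each rule is a /regexp/ followed by a single-word action. The names parse_table, check_line and estimate_matches are made up for illustration, and Python's re module stands in for PCRE. The estimate at the end assumes an attachment encoded as 78-byte lines (76 base64 characters plus CRLF), which the message does not state.

```python
import re

# Rule table copied from the example in the quoted message.
TABLE = r"""
IF /http/
/http:\/\/banned_site\// REJECT
/http:\/\/anothersite\// REJECT
/http:\/\/yetanothersite\// REJECT
ENDIF
IF /</
/<script.*evil.js/ REJECT
/<embed src.*evil.swf/ REJECT
ENDIF
"""


def parse_table(text):
    """Parse the table into a list of (guard, rules) blocks.

    guard is a compiled regexp, or None for a rule outside any IF block.
    rules is a list of (compiled regexp, action) pairs.
    Assumption: IF blocks do not nest and actions are a single word.
    """
    blocks = []
    current = None  # the IF block currently open, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("IF "):
            pattern = line[3:].strip()
            current = (re.compile(pattern[1:-1]), [])  # strip the slashes
            blocks.append(current)
        elif line == "ENDIF":
            current = None
        else:
            pattern, action = line.rsplit(None, 1)
            rule = (re.compile(pattern[1:-1]), action)
            if current is not None:
                current[1].append(rule)
            else:
                blocks.append((None, [rule]))
    return blocks


def check_line(line, blocks, gated=True):
    """Return (action or None, number of regexp searches performed).

    With gated=False the IF guards are ignored and every rule is tried,
    which models a flat table without the IF .. ENDIF construct.
    """
    tried = 0
    for guard, rules in blocks:
        if gated and guard is not None:
            tried += 1
            if not guard.search(line):
                continue  # guard failed: skip every rule in this block
        for regex, action in rules:
            tried += 1
            if regex.search(line):
                return action, tried
    return None, tried


def estimate_matches(size_bytes, bytes_per_line, n_rules):
    """Rough count of regexp searches when every rule runs on every line."""
    return (size_bytes // bytes_per_line) * n_rules


BLOCKS = parse_table(TABLE)
for sample in ("plain text", "see http://banned_site/ now"):
    print(sample, check_line(sample, BLOCKS), check_line(sample, BLOCKS, gated=False))

# 200 KiB attachment, 78-byte lines, 800 rules: about 2.1 million searches.
print(estimate_matches(200 * 1024, 78, 800))
```

On a line that matches neither guard, the gated table performs 2 searches instead of 5. Each guard that does match adds one search on top of the rules it protects, so the saving depends on how rarely the guard patterns occur in real mail.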
