From: Steve Manes (smanes_at_magpie.com)
Date: Sun Sep 01 2002 - 17:57:23 CDT

    At 05:11 PM 9/1/2002 -0500, Schmehl, Paul L wrote:
    >I think you need to reread Paul's thesis. Rather than using fifteen
    >keywords, his filter calculates the "spam value" of the fifteen *most
    >significant* words found in the message. IOW, it looks at *all* the
    >strings in the message - header and body - assigns values to each and
    >every string and then sums the values of the fifteen most significant
    >words.

    What makes this work is how the filter learns from the messages you've
    marked/deleted as spam and the ones you haven't, building two dictionaries
    of the most significant words in the two sets of messages. If the filter
    sees that you always mark as spam those messages containing the word
    "madam", there's a high probability that all incoming messages containing
    that word are also spam. The trick is to develop enough sample points for
    the filter to sum the weights of words in a message and return a score,
    rather than do what a conventional spam filter does and toss anything that
    contains the phrase "dear friend", which could have innocent uses too. You
    would tighten up what the filter delivers to your mailbox by lowering the
    threshold score, say from 95% to 85%.
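
    To make the idea concrete, here is a minimal Python sketch of such a
    filter. It is not Paul's code. The function names, the tokenizer, the
    0.4 default for unseen words, and the 0.01/0.99 clamps are all
    assumptions made for illustration. Also note that instead of literally
    summing the word weights, the sketch combines the fifteen most
    significant per-word probabilities with a naive Bayes product, which is
    one common way to turn them into a single score between 0 and 1.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of letters, digits, $, ' and - count as words.
TOKEN_RE = re.compile(r"[A-Za-z0-9$'-]+")


def tokenize(text):
    """Split a message (headers and body alike) into lowercase tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def train(messages):
    """Build one dictionary: word -> number of messages containing it."""
    counts = Counter()
    for msg in messages:
        counts.update(set(tokenize(msg)))
    return counts


def word_spam_prob(word, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate how strongly a single word points to spam (0.01 to 0.99)."""
    s = spam_counts.get(word, 0) / max(n_spam, 1)
    h = ham_counts.get(word, 0) / max(n_ham, 1)
    if s + h == 0:
        return 0.4  # assumed default for words never seen in training
    return min(0.99, max(0.01, s / (s + h)))


def spam_score(message, spam_counts, ham_counts, n_spam, n_ham, top_n=15):
    """Score a message using only its top_n most significant words."""
    probs = [
        word_spam_prob(w, spam_counts, ham_counts, n_spam, n_ham)
        for w in set(tokenize(message))
    ]
    # "Most significant" here means farthest from the neutral value 0.5.
    probs.sort(key=lambda p: abs(p - 0.5), reverse=True)
    spam_like, ham_like = 1.0, 1.0
    for p in probs[:top_n]:
        spam_like *= p
        ham_like *= 1.0 - p
    return spam_like / (spam_like + ham_like)


def is_spam(score, threshold=0.95):
    """Lowering the threshold (say to 0.85) makes the filter stricter."""
    return score >= threshold
```

    You would train it on the two piles of mail, for example
    spam_counts = train(spam_msgs) and ham_counts = train(good_msgs), then
    pass each incoming message to spam_score() and compare the result
    against whatever threshold you are comfortable with.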

    It's an adaptive rather than blindly reactive filter that learns what you
    like and don't like. I could see this being deployed across a large ISP
    too. It's a pretty cool idea.
