Subject: Re: idea: content filtering
From: Russ Allbery (rrastanford.edu)
Date: Fri May 12 2000 - 13:36:50 CDT


Bennett Todd <betrahul.net> writes:
> 2000-05-11-21:29:51 Russ Allbery:

>> But it's *way* easier and generally actually faster to write your
>> filters in Perl than in C.

> I actually write my filters in PCRE regexps. They're read and applied
> first by the teensy C screening program, then by the persistent perl
> daemon. The teensy program doesn't even try to look at anything larger
> than a configurable threshold --- 1MB by default; the perl daemon does
> paragraph-at-a-time examination, and has the code to reformat the
> message to disable any worm it may contain.

I strongly suggest that you benchmark your C screening program linked with
PCRE against an embedded Perl filter to do the same task, if you haven't
already. I'll bet you'll find the speed difference to be negligible with
a well-written regex.
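
A rough sketch of the kind of timing harness I mean, written in Python only
to show the shape of the measurement; it is not either of the two
implementations being compared. The pattern, the sample messages, and the
names make_messages and time_filter are all made up for illustration. The
same loop, pointed at the C/PCRE program and at the embedded Perl filter
with identical input, would give the two numbers to compare.

```python
import re
import time

# Hypothetical filter pattern: flags an attachment whose name ends in .vbs.
PATTERN = re.compile(
    r"^content-disposition:.*filename=.*\.vbs",
    re.IGNORECASE | re.MULTILINE,
)


def make_messages(count=2000):
    """Build synthetic messages; every tenth one carries a .vbs attachment."""
    messages = []
    for i in range(count):
        lines = [
            "From: sender@example.com",
            "Subject: test %d" % i,
            "",
            "body text " * 50,
        ]
        if i % 10 == 0:
            lines.append(
                'Content-Disposition: attachment; filename="note.txt.vbs"'
            )
        messages.append("\n".join(lines))
    return messages


def time_filter(messages, pattern, repeats=5):
    """Return (best elapsed seconds, number of matching messages)."""
    best = float("inf")
    hits = 0
    for _ in range(repeats):
        start = time.perf_counter()
        hits = sum(1 for m in messages if pattern.search(m))
        best = min(best, time.perf_counter() - start)
    return best, hits


if __name__ == "__main__":
    elapsed, hits = time_filter(make_messages(), PATTERN)
    print("best run: %.6f s, %d messages flagged" % (elapsed, hits))
```

Taking the best of several runs reduces noise from whatever else the machine
is doing. Real mail, not synthetic messages, is what the comparison should
finally be run against.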

Note also that mail is in the same situation as news, namely that it's
almost completely I/O bound and processor time is almost free provided
that you structure things so that disk I/O can be going on at the same
time as the processing.
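
One way to get that overlap, sketched in Python under some assumptions: a
reader thread pulls message files off disk into a bounded queue while the
main thread runs the filter on whatever has already arrived. The pattern,
the queue depth, and the names reader and scan_overlapped are hypothetical;
a real mail or news server would more likely use its own event loop or
multiple processes.

```python
import queue
import re
import threading

# Hypothetical filter pattern, applied to raw message bytes.
PATTERN = re.compile(rb"\.vbs\b", re.IGNORECASE)

# Sentinel telling the consumer that no more messages are coming.
_DONE = object()


def reader(paths, out):
    """Read each file and hand its contents to the scanning thread."""
    try:
        for path in paths:
            with open(path, "rb") as fh:
                out.put((path, fh.read()))
    finally:
        # Always signal completion, even if a read fails part way through.
        out.put(_DONE)


def scan_overlapped(paths, depth=8):
    """Return the paths whose contents match PATTERN.

    The bounded queue lets disk reads run ahead of the filter by at most
    `depth` messages, so reading and scanning proceed at the same time.
    """
    pending = queue.Queue(maxsize=depth)
    worker = threading.Thread(target=reader, args=(paths, pending), daemon=True)
    worker.start()

    flagged = []
    while True:
        item = pending.get()
        if item is _DONE:
            break
        path, data = item
        if PATTERN.search(data):
            flagged.append(path)
    worker.join()
    return flagged
```

Blocking file reads release Python's interpreter lock, so the reader thread
can wait on the disk while the main thread is busy matching. If the filter
is cheap relative to the disk, the total time ends up close to the read time
alone, which is the sense in which the processor time is nearly free.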

-- 
Russ Allbery (rrastanford.edu)             <http://www.eyrie.org/~eagle/>