Subject: Re: Content filtering 101
From: Bennett Todd (bet@rahul.net)
Date: Wed Jun 07 2000 - 10:41:48 CDT
- In reply to: Liviu Daia: "Content filtering 101"
2000-06-07-09:16:00 Liviu Daia:
> (2) Bennett Todd's tailbiter, version 1.1 (with the latest patch).
>
> It has the same major drawbacks as the previous versions of
> macofida:
Yup. I was trying to figure out the filtering interface and get a
simple tool in place for function and performance testing; I'm going
to take a look at the latest macofida as a framework for running my
filter, although I want to lose the factor-of-4 performance hit
first. Then again, I guess I oughta try macofida and make sure it
suffers a comparable performance hit; the problem could be in my
choice of Perl modules for handling SMTP, and in the way I'm using
them.
> - it can't send back an error code to the client process, so if
> something goes wrong the best it can do is to save the message to
> /var/tmp/tailbiter.<PID> (provided it managed to read it first;
> otherwise the message is simply lost);
If it didn't manage to read it first, I'd like to hope that Postfix
would have noticed that the dialogue failed, and held the message.
If not, there's not much I can do. It's true that it'd be sexier to
be able to report other problems (can't relay back into Postfix,
out of memory, out of disk, whatever) back to the sending
Postfix, to get the message held where it is. That's the main
incentive I see to switch to macofida.
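At the protocol level, "reporting a problem back" just means answering the end of the DATA phase with a temporary (4xx) reply instead of 250; the sending Postfix then keeps the message queued and retries. A minimal Perl sketch of that mapping follows. The condition names and the helpers end_of_data_reply() and is_temporary_failure() are hypothetical, not code from tailbiter or macofida.

```perl
use strict;
use warnings;

# Hypothetical table mapping a filter-side problem to the reply the filter
# would give at the end of DATA. Per RFC 821, 250 means accepted, while
# 451 (local error in processing) and 452 (insufficient system storage)
# are temporary failures: the client keeps the message queued and retries.
my %reply_for = (
    ok            => '250 Ok',
    reinject_fail => '451 Local error in processing: cannot relay back into Postfix',
    out_of_memory => '452 Insufficient system storage',
    out_of_disk   => '452 Insufficient system storage',
);

sub end_of_data_reply {
    my ($condition) = @_;
    # Unknown problems also get a 4xx, so the message is held, not lost.
    return exists $reply_for{$condition}
        ? $reply_for{$condition}
        : '451 Local error in processing';
}

# True if the reply tells the client to keep the message and try later.
sub is_temporary_failure {
    my ($reply) = @_;
    return $reply =~ /^4\d\d / ? 1 : 0;
}
```

Defaulting unknown conditions to 451 is a deliberate choice here: for a filter sitting in the mail path, holding a message is almost always better than bouncing or dropping it.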
> - the message is loaded completely in memory; that happens because of
> the way Net::SMTP::Server::Client works, so saving it to a temporary
> file won't change that;
And indeed that's necessary if the scanning algorithm you wish to
apply is running a regexp (or series of regexps) over the whole
message in multi-line match mode.
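To illustrate why such a scan needs the whole message in memory, here is a small Perl sketch (5.005 or later, since it uses qr//). The pattern and the two helper names are made up for illustration. The pattern spans a folded header line, so it matches the message as a single scalar but never matches any individual line.

```perl
use strict;
use warnings;

# Hypothetical pattern that only matches across a line break: a MIME
# Content-Type header whose name= parameter sits on a folded continuation
# line. /m lets ^ match at each line start; /i ignores case.
my $pat = qr/^Content-Type:[^\n]*\n[ \t]+name="[^"]*\.vbs"/im;

# Scan the whole message held in one scalar.
sub matches_whole {
    my ($msg) = @_;
    return $msg =~ $pat ? 1 : 0;
}

# Scan one line at a time instead (split /^/ keeps each line's newline).
sub matches_line_by_line {
    my ($msg) = @_;
    for my $line (split /^/, $msg) {
        return 1 if $line =~ $pat;
    }
    return 0;
}
```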
> - it uses Net::SMTP::Server for spawning children, which looks much less
> robust than Net::Daemon, at least AFAICT.
Sorry? Net::SMTP::Server doesn't spawn children; if I didn't
explicitly fork right in the Perl I wrote, this would be a
single-threaded, purely sequential SMTP server. And given the
benchmark results I got (no improvement from multi-processing),
that'd probably be more efficient. But Linux forks quickly, so I'm not
going to sweat that performance hit right now. The main reason I
forked was so that when a message is read entirely into memory, one
gigantic bloater of a message wouldn't leave the long-lived,
persistent daemon all bloated up.
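A minimal sketch of that fork-per-message idea, using a hypothetical handle_in_child() helper (not the code tailbiter actually uses): the child does the memory-hungry work and exits, so the memory goes back to the OS, while the parent returns at once to accept the next connection.

```perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $work in a forked child. Whatever the child
# allocates (e.g. a huge message slurped into a scalar) is released when
# it exits, so the long-lived parent process never grows.
sub handle_in_child {
    my ($work) = @_;
    my $pid = fork();
    die "fork failed: $!\n" unless defined $pid;
    if ($pid == 0) {
        # Child: exit 0 if $work completes, 1 if it dies. POSIX::_exit
        # skips END blocks and flushing of output buffers copied from
        # the parent.
        my $ok = eval { $work->(); 1 };
        POSIX::_exit($ok ? 0 : 1);
    }
    # Parent: return immediately so it can go back to accept().
    return $pid;
}
```

A real daemon would also reap finished children, for example by calling waitpid(-1, POSIX::WNOHANG()) in a loop from a $SIG{CHLD} handler, so they don't pile up as zombies.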
> It also has a few other minor annoyances, like calling
> "gethostbyaddr" for no real reason (fun when not running a DNS),
Well, it seemed to me the natural way to check that the
connection was coming from localhost. It's easy enough to comment
out (#) if you don't want that check, or to recode it to compare the
raw peer address against a packed 127.0.0.1. Whatever.
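Comparing the raw address needs no name lookup at all. A sketch using the standard Socket module; peer_is_localhost() is a hypothetical helper, and it only handles IPv4, matching the setup discussed here.

```perl
use strict;
use warnings;
use Socket qw(inet_aton unpack_sockaddr_in pack_sockaddr_in);

# Packed form of 127.0.0.1. inet_aton on a dotted quad does no DNS lookup.
my $LOOPBACK = inet_aton('127.0.0.1');

# Hypothetical check: takes the packed sockaddr returned by getpeername()
# on an accepted socket and compares its raw IPv4 address byte-for-byte.
sub peer_is_localhost {
    my ($packed_sockaddr) = @_;
    my (undef, $addr) = unpack_sockaddr_in($packed_sockaddr);
    return $addr eq $LOOPBACK ? 1 : 0;
}
```

In the server it would be called as peer_is_localhost(getpeername($client)), where $client is the accepted socket.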
> [...] and not being able to cope with my (admittedly not the
> latest-and-greatest) Perl 5.004_04 at home, because:
because 5.004_04 doesn't support pre-compiling regexps with qr//,
which first appeared in Perl 5.005; that's why. If you want to recode
the filter to recompile the regexps for every message so you can keep
using an unfortunately old release, that shouldn't be too hard; in
fact, I think it'll work if you just apply this:
--- tailbiter Tue Jun 6 09:59:57 2000
+++ tailbiter.oldperl Wed Jun 7 11:31:37 2000
@@ -64,7 +64,7 @@
 open(FP, "<$pat") or die "$0: $pat: $!\n";
 while (<FP>) {
     chomp;
-    push @re, qr/$_/im;
+    push @re, $_;
 }
 close FP;
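One caveat worth checking: qr/$_/im carries the /i and /m flags inside the compiled object, but once @re holds plain strings, those flags have to be supplied where the match happens. Whether the patch alone is enough therefore depends on how the matching loop (not shown here) applies the patterns. A sketch of an old-perl-compatible matcher; the pattern strings and first_match() are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical pattern strings, as if read line by line from the pattern file.
my @re = ('^Subject:.*ILOVEYOU', '\.vbs$');

# With plain strings, /i and /m must be given at the match site; a bare
# "$msg =~ $re[$i]" would silently drop them. Perl recompiles the
# interpolated pattern whenever it differs from the previous one, i.e.
# once per pattern per message in this loop.
sub first_match {
    my ($msg, @patterns) = @_;
    for my $i (0 .. $#patterns) {
        return $i if $msg =~ /$patterns[$i]/im;
    }
    return -1;
}
```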
> (a) Basically, the above setup is useless for testing filter
> performance. What seems to happen here is that Postfix will
> happily pump up the messages to the filter at full speed (because
> of the 1000 limit above I get essentially the same rate as in the
> unfiltered case), a huge queue is created by the second smtpd, and
> smtp-source returns without waiting for it to drain. stat-ing the
> spool would probably interfere with the results, so how do we test
> this? Comments / corrections / suggestions welcome.
Grab timing from the logfile rather than from the injector. Remember
to reset the logfile between runs, both to ensure identical testing
circumstances (probably not significant) and to make it really easy
to sort logfile entries by the run they belong to.
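A sketch of pulling timing out of a freshly reset log. It assumes classic syslog timestamps ("Jun  7 10:41:48 host postfix/smtpd[101]: ...") and a run shorter than 24 hours; summarise_run() and syslog_secs() are hypothetical helpers, and counting "status=sent" lines is just one reasonable measure of completed deliveries.

```perl
use strict;
use warnings;

# Seconds past midnight from a classic syslog line; undef if no timestamp.
sub syslog_secs {
    my ($line) = @_;
    my ($h, $m, $s) = $line =~ /^\w{3}\s+\d+\s+(\d\d):(\d\d):(\d\d)\s/
        or return;
    return $h * 3600 + $m * 60 + $s;
}

# Summarise one run: wall-clock seconds between the first and last postfix
# entries, plus the number of lines reporting status=sent.
sub summarise_run {
    my @pf = grep { m{ postfix/} } @_;
    return (0, 0) unless @pf;
    my $start = syslog_secs($pf[0]);
    my $end   = syslog_secs($pf[-1]);
    return (0, 0) unless defined $start && defined $end;
    my $elapsed = $end - $start;
    $elapsed += 86_400 if $elapsed < 0;     # the run crossed midnight
    my $sent = grep { /status=sent/ } @pf;  # scalar context: a count
    return ($elapsed, $sent);
}
```

Usage would be along the lines of my ($secs, $sent) = summarise_run(<$log>), with $log opened on whatever file your syslog writes mail entries to.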
> (b) Running tinydns as above seems to be important. Even with
> "disable_dns_lookups = yes", Postfix tries to resolve localhost.
> Without actually looking into it, I'd say $disable_dns_lookups
> only affects smtp, while smtpd still tries to resolve client's IP.
> Wietse?
Tries to _resolve_ localhost? Or to gethostbyaddr localhost? The
latter should be answered from /etc/hosts if you really have no DNS
at all, provided your host lookup order (e.g. /etc/nsswitch.conf or
/etc/host.conf) includes files.
I ran dnscache as my resolver, and it has localhost wired in, so I
didn't notice that potential problem.
> (c) I didn't try running tailbiter instead of macofida, but I suspect
> the same thing happens with it: the "backlog" observed earlier by
> Bennett is actually the queue created by the second smtpd, and
> the big speed difference is actually due to "gethostbyaddr" and /
> or other DNS lookup failures. Again, comments / corrections are
> welcome.
Now _that's_ a fascinating theory; I should be able to test it
easily enough...
Nope, I don't believe that's the case, at least with my test setup.
When I page through a logfile from my test run, I see 5-10 messages
accepted by smtpd for each one smtp passes on into the filter,
until the injector visibly stops loading messages and the
backlog drains. The filter kept being invoked until very
near the end of the logfile, while the back-end smtp delivering into
/dev/null does not seem to have accumulated any backlog.
Would you like me to email you a log? Only about 50KB compressed
with bzip2.
-Bennett