OSEC

Neohapsis is currently accepting applications for employment. For more information, please visit our website www.neohapsis.com or email hr@neohapsis.com
 
Re: [Dailydave] approximate string matching - Bloom filters

From: Fausett, Mark (US SSA) (mark.fausettbaesystems.com)
Date: Fri Sep 01 2006 - 12:23:43 CDT


Bloom filters are approximate in a different sense though -- Think of
them as space efficient, but lossy token sets; you put a bunch of tokens
in, and subsequently can query whether a particular token was placed
into the set; to some degree of confidence.
Bloom filters are subject to false positives -- they'll sometimes
incorrectly tell you that a token is in the set -- but not false
negatives. Because hashing functions are used to insert tokens into the
bloom filter, the false positives have nothing to do with approximate
string matches.

Cheers,

Mark Fausett
-----Original Message-----
From: dailydave-bounceslists.immunitysec.com
[mailto:dailydave-bounceslists.immunitysec.com] On Behalf Of Martin
Roesch
Sent: Friday, September 01, 2006 10:06 AM
To: Mateusz Berezecki
Cc: dailydavelists.immunitysec.com
Subject: Re: [Dailydave] approximate string matching

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Have you taken a look at Bloom Filters? I'm not sure what your exact
requirements are but Bloom Filters are fast and do probabilistic pattern
matching. Check out:

http://en.wikipedia.org/wiki/Bloom_filter
http://www.arl.wustl.edu/~sarang/analysis.pdf

They're easy to implement and fast, so maybe that will work for you.
Hope that helps!

      -Marty

...SNIP...

_______________________________________________
Dailydave mailing list
Dailydavelists.immunitysec.com
http://lists.immunitysec.com/mailman/listinfo/dailydave