Subject: alternate hashing scheme (was: Re: Feature ... more generic $mail_spool ...)
From: Piotr Klaban (makleroryl.man.torun.pl)
Date: Fri Oct 06 2000 - 08:25:35 CDT

On Mon, Oct 02, 2000 at 11:20:21PM +0200, Brad Knowles wrote:
> The one thing I will observe is that this kind of hashing
> mechanism will only scale so far -- once you get past 100-200k, it
> won't really work so well any more. In fact, it may well break down

This is what I wanted to say while reading the previous messages
in this thread. I have implemented a better (IMHO) hashing scheme.
Please point out any problems that may arise.

The scheme is:
- deliver to files that are named after uids rather than usernames.
  Each user needs to have a uid, either from passwd or from a database.
  The mailbox (mbox format) is a file whose path is built from the uid:

perl -nle '$_ = substr("00" . $_, -3) if $_ < 100; print
        join("/", "/root",
        join("/", split(//, substr($_, 0, -2))),
        $_)'

uid         homedir/spool
1           /root/0/001
2           /root/0/002
3           /root/0/003
5           /root/0/005
10          /root/0/010
99          /root/0/099
10000       /root/1/0/0/10000
123456789   /root/1/2/3/4/5/6/7/123456789
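
For readers who do not use Perl, the mapping in the table above can be sketched in Python. This is only an illustration derived from the listed examples: the function name spool_path and the root parameter are hypothetical, and "/root" is simply the prefix used in the table.

```python
def spool_path(uid: int, root: str = "/root") -> str:
    """Map a numeric uid to a mailbox path, following the table above.

    Hypothetical helper: name and signature are assumptions for illustration.
    """
    if uid < 0:
        raise ValueError("uid must be non-negative")
    # uids below 100 are left-padded with zeros to three digits (7 -> "007").
    name = str(uid).zfill(3)
    # Every digit except the last two becomes one directory level.
    dirs = list(name[:-2])
    # The file name is the whole (padded) uid.
    return "/".join([root, *dirs, name])


if __name__ == "__main__":
    for uid in (1, 10, 99, 10000, 123456789):
        print(uid, spool_path(uid))
```

The file name keeps the full uid, so a mailbox can still be identified if it is moved out of its directory.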

The first version of the hashing scheme may help
you understand the whole system:

  In the first version the uid was split into digits:
  the last two digits are the file name, and the preceding digits
  are used to create the directory structure:
        12345    -> /root/1/2/3/45
        98765432 -> /root/9/8/7/6/5/4/32
  So there are no more than (100+10) = 110 files/directories
  in each subdirectory: at most 100 two-digit file names
  and 10 one-digit directory names.
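
A minimal Python sketch of this first version, under some assumptions: the name spool_path_v1 is hypothetical, "/root" is taken from the first example, and uids below 100 are rejected because the description does not say how the first version handled them.

```python
def spool_path_v1(uid: int, root: str = "/root") -> str:
    """First-version mapping: leading digits are directories,
    the last two digits are the file name.

    Hypothetical helper for illustration only.
    """
    if uid < 100:
        # Assumption: behaviour for one- and two-digit uids is not
        # described for this version, so this sketch refuses them.
        raise ValueError("uid must have at least three digits")
    digits = str(uid)
    dirs = list(digits[:-2])   # one directory level per leading digit
    name = digits[-2:]         # two-digit file name, "00" to "99"
    return "/".join([root, *dirs, name])


# Upper bound on entries in any one directory:
# 10 possible subdirectory names ("0".."9") plus 100 file names ("00".."99").
MAX_ENTRIES_PER_DIR = 10 + 100

if __name__ == "__main__":
    print(spool_path_v1(12345))
    print(spool_path_v1(98765432))
    print(MAX_ENTRIES_PER_DIR)
```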

The more users exist (and so the longer the uids get), the deeper the
directory structure becomes.

In my system I use this hashing scheme for users with uid > 500.
The problem is with uids < 100, where I just add one or two zeros at the
beginning (20 -> 020, 7 -> 007 [1]).

> Of course, you can't get rid of directory hashing schemes like
> this, because not everyone has (or can have) a filesystem that can
> properly handle many thousands of files in a single directory.

Right, for now ext2 cannot handle more than 1000 files
in a directory well.

[1] 007 is probably the property of James B.

Piotr Klaban