OSEC

Neohapsis is currently accepting applications for employment. For more information, please visit our website www.neohapsis.com or email hr@neohapsis.com
RE: improving random record selection

From: Jerry Schwartz (jschwartzthe-infoshop.com)
Date: Mon May 19 2008 - 09:24:56 CDT


>-----Original Message-----
>From: Rob Wultsch [mailto:wultschgmail.com]
>Sent: Saturday, May 17, 2008 6:47 PM
>To: Scott Haneda
>Cc: mysqllists.mysql.com
>Subject: Re: improving random record selection
>
>On Sat, May 17, 2008 at 2:32 PM, Scott Haneda <talklistsnewgeo.com>
>wrote:
>> $sql = "
>> SELECT storage_path, image_md5, t.id
>> FROM images AS t
>> JOIN
>> (SELECT CEIL(MAX(id)*RAND()) AS id FROM images)
>AS x
>> ON (t.id >= x.id)
>> AND (t.approved = 1) AND (t.ip_address !=
>> '$exclude_ip') LIMIT 1";
>I am going to reformat (whitespace only) your query a bit to start out
>with:
>SELECT storage_path, image_md5, t.id
>FROM images AS t
> JOIN (
> SELECT CEIL(
> MAX(id)*RAND()
> ) AS id
> FROM images
> ) AS x ON (t.id >= x.id)
> AND (t.approved = 1)
> AND (t.ip_address != '$exclude_ip')
>LIMIT 1
>
>I am going to break this up a bit:
>SELECT storage_path, image_md5, t.id
>FROM images AS t
>JOIN ....
>This should be mostly self explanatory. t.id specifies the table
>because id is ambiguous (x.id is created later on). Please note that I
>never use JOIN by itself. I would make this an INNER JOIN.
>
>SELECT CEIL(
> MAX(id)*RAND()
> ) AS id
>FROM images AS x
>
>MAX(id) find the largest id that currently exists. This value is then
>multiplied by whatever rand returns, which would be a between 0 and 1.
>The result of the multiplication is then rounded up, and aliased as
>id. The the table (of one row) is then aliased as x. So you now have
>x.id which is a random number between 0 and the largest id value that
>currently exists.
>
>ON (t.id >= x.id)
> AND (t.approved = 1)
> AND (t.ip_address != '$exclude_ip')
>Finally we have your JOIN condition. It says, for the table aliased as
>t, the id must be great than or equal to x.id (which was explained
>above). This will eliminate some portion of the images table from the
>possibility of being selected. Next all rows in the same table where
>approved is not equal to 1 should be removed. Finally all rows that
>fail t.ip_address != '$exclude_ip' get excluded.
>
>LIMIT 1
>Only return one row.
>
>Problems:
>1. You should be using:
> AND (t.approved = 1)
> AND (t.ip_address != '$exclude_ip')
>in the subquery. If x.id is larger than the largest row that fits
>those conditions you will get no results.
>2. There is no ORDER BY clause. There is nothing telling MySQL use the
>t.id which is next largest value above x.id. MySQL will probably pick
>out the right row, because they are probably stored in order. You
>probably can get away with not having the ORDER BY clause, and it will
>cost you extra cycles. How many extra cycles depends on how out of
>order the table is. You can reorder the row by id using:
>ALTER TABLE images ORDER BY id;
>3. If the holes in your data are not distributed equally...
>
>Suggested new query:
>SELECT storage_path, image_md5, t.id
>FROM images AS t
> INNER JOIN (
> SELECT CEIL(
> MAX(id)*RAND()
> ) AS id
> FROM images
> WHERE x.approved = 1
> AND x.ip_address != '$exclude_ip'
> ) AS x ON (t.id >= x.id)
>ORDER BY t.id ASC
>LIMIT 1
>
[JS] I might not understand what this is doing, but I think it will
preferentially sample the ids that are at the end of a gap.
>--
>Rob Wultsch
>wultschgmail.com
>wultsch (aim)
>
>--
>MySQL General Mailing List
>For list archives: http://lists.mysql.com/mysql
>To unsubscribe: http://lists.mysql.com/mysql?unsub=jschwartzthe-
>infoshop.com

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql