OSEC

Neohapsis is currently accepting applications for employment. For more information, please visit our website www.neohapsis.com or email hr@neohapsis.com
Re: removing duplicate entries

From: Moon's Father (yueliangdao0608gmail.com)
Date: Tue Aug 12 2008 - 00:27:58 CDT


If you're good at Chinese ,just visit here.
http://blog.chinaunix.net/u/29134/showart_375303.html

On Thu, Aug 7, 2008 at 10:34 PM, Magnus Smith <
Magnus.Smithsunderland.gov.uk> wrote:

> I think got it in the end by doing a union and a join.
>
> delete AA, PA from ACCOUNTACTION AA, ACCOUNTPAYMENTACTION PA
> where AA.ID = PA.ID and AA.ID in
> (select D.ID from
>
> (select A1.ID from ACCOUNTACTION A1
>
> left join
>
> (
> select * from ACCOUNTACTION A2
> where A2.ACTIONDATE like '2008-08-01 02:00%'
> group by A2.ACCOUNT_ID
> having count(A2.ACCOUNT_ID) > 1
>
> union
>
> select * from ACCOUNTACTION A3
> where A3.ACTIONDATE like '2008-08-01 02:00%'
> group by A3.ACCOUNT_ID having count(A3.ACCOUNT_ID) = 1
> ) as U1
>
> on A1.ID = U1.ID
> where A1.ACTIONDATE like '2008-08-01 02:00%'
> and U1.ID is NULL
> ) as D
> );
>
> Thanks for the pointers ;-)
>
>
>
>
> -----Original Message-----
> From: Magnus Smith [mailto:Magnus.Smithsunderland.gov.uk]
> Sent: 07 August 2008 10:35
> To: Ananda Kumar
> Cc: mysqllists.mysql.com
> Subject: RE: removing duplicate entries
>
> Yes I can see you are correct. I tried setting up a little test case
> myself.
>
> CREATE TABLE ACCOUNTACTION (
> ID INT NOT NULL PRIMARY KEY,
> ACTIONDATE DATETIME,
> ACCOUNT_ID INT NOT NULL
> );
>
> CREATE TABLE ACCOUNTPAYMENTACTION (
> ID INT NOT NULL PRIMARY KEY,
> AMOUNT INT
> );
>
> INSERT INTO ACCOUNTACTION (ID, ACTIONDATE, ACCOUNT_ID)
> VALUES('001', '2008-08-01 02:00:00', '101'),
> ('002', '2008-08-01 02:00:00', '101'),
> ('003', '2008-08-01 02:00:00', '101'),
> ('004', '2008-08-01 02:00:00', '102'),
> ('005', '2008-08-01 02:00:00', '103'),
> ('006', '2008-08-01 02:00:00', '104'),
> ('007', '2008-08-01 02:00:00', '104'),
> ('008', '2008-08-01 02:00:00', '105'),
> ('009', '2008-08-01 03:00:00', '104'),
> ('010', '2008-08-01 03:00:00', '105'),
> ('011', '2008-08-01 02:00:00', '106');
>
> INSERT INTO ACCOUNTPAYMENTACTION (ID, AMOUNT)
> VALUES('001', '1000'),
> ('002', '1000'),
> ('003', '1000'),
> ('004', '1000'),
> ('005', '1000'),
> ('006', '1000'),
> ('007', '1000'),
> ('008', '1000'),
> ('009', '1000'),
> ('010', '1000'),
> ('011', '1000');
>
>
> I got the following query that seems to work on my test case.
>
> I create a union of everything that is not a duplicate and then take the
> ones that are not in this to be the duplicates
>
>
> select ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID from ACCOUNTACTION
> where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
> and (ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID)
> not in
> (select ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID from ACCOUNTACTION
> where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
> group by ACCOUNTACTION.ACCOUNT_ID
> having count(ACCOUNTACTION.ACCOUNT_ID) > 1
> union
> select ACCOUNTACTION.ID, ACCOUNTACTION.ACCOUNT_ID from ACCOUNTACTION
> where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
> group by ACCOUNTACTION.ACCOUNT_ID having count(ACCOUNTACTION.ACCOUNT_ID)
> = 1);
>
>
> The problem is that when I try to apply this to my real database tables
> that are quite large then the query does not return.
>
> I am thinking that there must be a more efficient way to write the
> query?
>
> I would be pleased to hear any suggestions - thanks
>
>
>
>
> ________________________________
>
> From: Ananda Kumar [mailto:anandklgmail.com]
> Sent: 06 August 2008 13:51
> To: Magnus Smith
> Cc: mysqllists.mysql.com
> Subject: Re: removing duplicate entries
>
>
> I just did a test case here
>
>
> select * from amc_25;
> +------+
> | id |
> +------+
> | 2 |
> | 14 |
> | 1 |
> | 2 |
> +------+
> 4 rows in set (0.01 sec)
>
> select id from amc_25 group by id having count(id) >1 and id!=min(id);
>
> Empty set (0.00 sec)
>
>
>
> It does not give me any rows.
>
> R u sure the rows returned, are the one you want to keep are indeed
> duplicates
>
>
>
>
> On 8/6/08, Magnus Smith <Magnus.Smithsunderland.gov.uk> wrote:
>
> When I try the first suggestion (i) then I get all the 1682
> duplicate rows. The thing is that I need to keep the originals which
> are the ones with the lowest ACCOUNTACTION.ID <http://accountaction.id/>
> value.
>
> The second suggestion (ii) gives me 563 rows that are the
> duplicates with the lowest ACCOUNTACTION.ID <http://accountaction.id/>
> which are the ones I wish to keep
>
> So the ones I want to delete are the ones in (i) and not (ii)
>
> When I use
>
>
> select ACCOUNTACTION.ID <http://accountaction.id/> from
> ACCOUNTACTION
> where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
> group by ACCOUNTACTION.ACCOUNT_ID
> having (count(ACCOUNTACTION.ACCOUNT_ID) > 1
> and ACCOUNTACTION.ID <http://accountaction.id/> !=
> min(ACCOUNTACTION.ID <http://accountaction.id/> ));
>
> then I get 1119 rows which is all the duplicates in (i) less the
> originals in (ii)
>
>
> The problem I'm having is using this in a delete statement.
>
>
> ________________________________
>
> From: Ananda Kumar [mailto:anandklgmail.com]
> Sent: 06 August 2008 10:11
> To: Magnus Smith
> Cc: mysqllists.mysql.com
> Subject: Re: removing duplicate entries
>
>
>
> I doubt the belwo sql will give you duplcates
>
> select ACCOUNTACTION.ID <http://accountaction.id/> from
> ACCOUNTACTION
> where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
> group by ACCOUNTACTION.ACCOUNT_ID
> having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and ACCOUNTACTION.ID
> <http://accountaction.id/> !=
> min(ACCOUNTACTION.ID <http://accountaction.id/> ));
>
> The reason being, for duplicates records accountaction.id
> <http://accountaction.id/> will always equal to min(accountaction.id
> <http://accountaction.id/> ).
>
>
> try this
> select ACCOUNTACTION.ID <http://accountaction.id/> from
> ACCOUNTACTION
> where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
> group by ACCOUNTACTION.ACCOUNT_ID
> having (count(ACCOUNTACTION.ACCOUNT_ID) > 1
>
> or
>
> select ACCOUNTACTION.ID <http://accountaction.id/> from
> ACCOUNTACTION
> where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
> group by ACCOUNTACTION.ACCOUNT_ID
> having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and ACCOUNTACTION.ID
> <http://accountaction.id/> =min(accountaction.id
> <http://accountaction.id/> );
>
> I would use the first select statement.
>
>
>
> On 8/6/08, Magnus Smith <Magnus.Smithsunderland.gov.uk> wrote:
>
> I have the following two tables
>
> ACCOUNTACTION
>
> +-------------------+--------------+------+-----+---------+-------+
> | Field | Type | Null | Key |
> Default | Extra |
>
> +-------------------+--------------+------+-----+---------+-------+
> | ID | bigint(20) | NO | PRI |
> | |
> | AccountActionType | varchar(31) | YES | | NULL
> | |
> | DESCRIPTION | varchar(255) | YES | | NULL
> | |
> | ACTIONDATE | datetime | YES | | NULL
> | |
> | ACCOUNT_ID | bigint(20) | YES | MUL | NULL
> | |
>
> +-------------------+--------------+------+-----+---------+-------+
>
> and
>
> ACCOUNTPAYMENTACTION
>
> +---------------+------------+------+-----+---------+-------+
> | Field | Type | Null | Key | Default |
> Extra |
>
> +---------------+------------+------+-----+---------+-------+
> | ID | bigint(20) | NO | PRI | |
> |
> | AMOUNTINPENCE | bigint(20) | YES | | NULL |
> |
>
> +---------------+------------+------+-----+---------+-------+
>
> ACCOUNTPAYMENTACTION shares the primary key with
> ACCOUNTACTION
>
> I need to remove duplicate entries that occured at a
> specific time in
> ACCOUNTACTION I then plan to remove the rows in
> ACCOUNTPAYMENTACTION
> that are no longer referenced in ACCOUNTACTION by using
> an outer join
>
> I can select the duplicate records in ACCOUNTACTION
> using
>
> select ACCOUNTACTION.ID <http://accountaction.id/> from
> ACCOUNTACTION
> where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
> group by ACCOUNTACTION.ACCOUNT_ID
> having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and
> ACCOUNTACTION.ID <http://accountaction.id/> !=
> min(ACCOUNTACTION.ID <http://accountaction.id/> ));
>
> I am trying to delete these records but am having
> trouble with the sql
> delete
>
> I tried the following but nothing happened
>
> delete ACCOUNTACTION where ACCOUNTACTION.ID
> <http://accountaction.id/> in
> (select ACCOUNTACTION.ID <http://accountaction.id/>
> from ACCOUNTACTION
> where ACCOUNTACTION.ACTIONDATE like '2008-08-01 02:00%'
> group by ACCOUNTACTION.ACCOUNT_ID
> having (count(ACCOUNTACTION.ACCOUNT_ID) > 1 and
> ACCOUNTACTION.ID <http://accountaction.id/> !=
> min(ACCOUNTACTION.ID <http://accountaction.id/> )));
>
> Can anyone help me?
>
>
>
>
>
>
> --
> MySQL General Mailing List
> For list archives: http://lists.mysql.com/mysql
> To unsubscribe:
> http://lists.mysql.com/mysql?unsub=yueliangdao0608gmail.com
>
>

--
I'm a MySQL DBA in china.
More about me just visit here:
http://yueliangdao0608.cublog.cn