SQL: How to delete rows from table based on a criteria

Question

I have the following table:

tbl
source    type    date
---       ---     ---
google    A       2010-02-25
google    A       2013-04-11
facebook  C       2008-10-22
facebook  C       2007-01-28

I want to keep only a single row for each source: the row with min(date) within that source. The table consists of millions of records, and I'm looking for an efficient way to delete the redundant ones.

Does this table have a unique identifier (e.g. a primary key or an id field)?
– Chris J, Commented Oct 9, 2017 at 13:40

Gordon Linoff · Accepted Answer · 2017-10-09 13:46:56Z

3

In MySQL, you can do this using a join:

delete t
    from t join
         (select source, min(date) as mindate
          from t
          group by source
         ) tt
         on t.source = tt.source
    where t.date > tt.mindate;

The only way I can think of, offhand, to make this more efficient is to store the aggregation result in a temporary table and add an index to it.
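A minimal sketch of that idea, assuming MySQL and the table t from the query above. The temporary table name min_dates and the index name idx_source are hypothetical, and source is assumed to be an indexable type such as VARCHAR:

```sql
-- Materialize the per-source minimum date once (min_dates is a hypothetical name).
CREATE TEMPORARY TABLE min_dates AS
SELECT source, MIN(date) AS mindate
FROM t
GROUP BY source;

-- Index the join column so the DELETE can look up each source quickly.
ALTER TABLE min_dates ADD INDEX idx_source (source);

-- Same join-delete as above, but against the indexed temporary table.
DELETE t
FROM t
JOIN min_dates tt ON t.source = tt.source
WHERE t.date > tt.mindate;

DROP TEMPORARY TABLE min_dates;
```

Whether this actually beats the single-statement version depends on the data and on existing indexes on t, so it is worth comparing both on a copy of the table first.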

I can also add that, regardless of how the rows to delete are determined, deleting lots of rows from a table is inefficient. Usually, I would recommend a three-step approach:

1. Write a query to generate the table you want and store the results in a temporary table.
2. Truncate the original table.
3. Re-insert the (much) smaller number of rows.
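The three steps could be sketched as follows, assuming MySQL and that t has exactly the columns source, type and date from the question. The temporary table name keep_rows is hypothetical:

```sql
-- Step 1: store the rows to keep (earliest date per source) in a temporary table.
CREATE TEMPORARY TABLE keep_rows AS
SELECT t.source, t.type, t.date
FROM t
JOIN (SELECT source, MIN(date) AS mindate
      FROM t
      GROUP BY source) tt
  ON t.source = tt.source
 AND t.date = tt.mindate;

-- Step 2: empty the original table.
TRUNCATE TABLE t;

-- Step 3: put the surviving rows back.
INSERT INTO t (source, type, date)
SELECT source, type, date
FROM keep_rows;

DROP TEMPORARY TABLE keep_rows;
```

Two caveats with this sketch: in MySQL, TRUNCATE TABLE commits implicitly and cannot be rolled back, so the contents of keep_rows should be checked before step 2. Also, if several rows of one source share the minimum date, all of them are kept.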

answered Oct 9, 2017 at 13:46

Madhukar · Accepted Answer · 2017-10-09 13:53:02Z

0

In Microsoft SQL, you can try this.

-- Number the rows of each source by date (oldest first), then delete all but the first.
;WITH cte
        AS (SELECT ROW_NUMBER() OVER (PARTITION BY source
                                        ORDER BY date) RN
            FROM   tbl)
DELETE FROM cte
WHERE  RN > 1;

edited Oct 9, 2017 at 13:53

answered Oct 9, 2017 at 13:48

7 Comments

Madhukar Over a year ago

@GordonLinoff, I'm sorry. The question doesn't say that it is for MySQL.

Michael K Over a year ago

There is a mysql tag, but probably should be in the title too if that's what it should be.

Madhukar Over a year ago

@mikato, yes, correct! There is also an SQL tag, which makes the question look like a generic one.

Parfait Over a year ago

And what is Microsoft SQL? Likely you meant Microsoft SQL Server or Microsoft TSQL. This is not splitting hairs since MS Access has a SQL dialect of its own distinct from MSSQL and both are MS products. And Microsoft Query spans several interfaces.

Madhukar Over a year ago

@Parfait, You're correct. I meant Microsoft SQL Server or Transact SQL here.

Nayan Sharma · Accepted Answer · 2017-10-09 14:10:55Z

0

delete from t
where (source, date) not in
      (select al.source, al.d
       from (select source, min(date) as d from t group by source) al);

answered Oct 9, 2017 at 14:10

1 Comment

Toby Speight Over a year ago

Thank you for this code snippet, which might provide some limited, immediate help. A proper explanation would greatly improve its long-term value by showing why this is a good solution to the problem, and would make it more useful to future readers with other, similar questions. Please edit your answer to add some explanation, including the assumptions you've made.

Gaetano Piazzolla · Accepted Answer · 2017-10-09 14:21:28Z

0

Add an identity column to the table that contains the duplicates, so that an auto-incrementing serial number acts as a unique row identifier:

 alter table tbl add sno int identity(1,1)

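This query selects, for each source, the sno of the row with min(date):

(select sno from
   (select sno, row_number() over (partition by source order by date) as rn
    from tbl) r
 where rn = 1)

For the sample data, assuming sno was assigned in the order shown, "sno" will be equal to "1" and "4".

Now left join the table to this result, and delete the rows that have no match in it (T.sno is null):

delete E from tbl E
    left join
    (select sno from
       (select sno, row_number() over (partition by source order by date) as rn
        from tbl) r
     where rn = 1) T on E.sno = T.sno
where T.sno is null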

Solution adapted from method 3 of this link: LINK

answered Oct 9, 2017 at 14:21
