
I have a huge table - 36 million rows - in SQLite3. In this very large table, there are two columns:

  • hash - text
  • d - real

Some of the rows are duplicates; that is, both hash and d have the same values. If two hashes are identical, then so are the values of d. However, two identical d's do not imply two identical hashes.

I want to delete the duplicate rows. I don't have a primary key column.

What's the fastest way to do this?

4 Answers

You need a way to distinguish the rows. Based on your comment, you could use the special rowid column for that.

To delete duplicates by keeping the lowest rowid per (hash,d):

DELETE FROM YourTable
WHERE rowid NOT IN
      (
      SELECT MIN(rowid)
      FROM YourTable
      GROUP BY hash, d
      );
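As a runnable sanity check, here is the same statement exercised through Python's sqlite3 module on a toy in-memory table (the table and column names match the sketch above; the sample rows are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE YourTable (hash TEXT, d REAL)")
con.executemany("INSERT INTO YourTable VALUES (?, ?)",
                [("a", 1.0), ("a", 1.0), ("b", 2.0), ("a", 1.0), ("c", 3.0)])

# Keep the lowest rowid in each (hash, d) group; delete everything else.
con.execute("""
    DELETE FROM YourTable
    WHERE rowid NOT IN (
        SELECT MIN(rowid) FROM YourTable GROUP BY hash, d
    )
""")

rows = con.execute("SELECT hash, d FROM YourTable ORDER BY hash").fetchall()
print(rows)  # one row per distinct (hash, d) pair remains
```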

1 Comment

To run this on a large table you also likely want to first CREATE INDEX YourTable_hash_d ON YourTable(hash, d) which will speed things up dramatically as per .expert.

I guess the fastest would be to let the database itself do the work: add a new table with the same columns but with proper constraints (a unique index on the hash/d pair), iterate through the original table, and try to insert each record into the new table, ignoring constraint-violation errors (i.e. continue iterating when exceptions are raised).

Then delete the old table and rename the new to the old one.
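In SQLite this approach doesn't even need exception handling: INSERT OR IGNORE silently skips rows that would violate the unique constraint. A minimal sketch via Python's sqlite3 (table and column names here are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE old_table (hash TEXT, d REAL)")
con.executemany("INSERT INTO old_table VALUES (?, ?)",
                [("a", 1.0), ("a", 1.0), ("b", 2.0)])

# New table with a unique constraint on the (hash, d) pair.
con.execute("CREATE TABLE new_table (hash TEXT, d REAL, UNIQUE(hash, d))")

# INSERT OR IGNORE drops duplicate rows instead of raising an error.
con.execute("INSERT OR IGNORE INTO new_table SELECT hash, d FROM old_table")

# Swap the new table in for the old one.
con.execute("DROP TABLE old_table")
con.execute("ALTER TABLE new_table RENAME TO old_table")

rows = con.execute("SELECT hash, d FROM old_table ORDER BY hash").fetchall()
print(rows)
```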

1 Comment

Not as elegant as simply altering the table, I guess, BUT one really good thing about your approach is that you can re-run it as many times as you like without touching/destroying the source data until you're absolutely happy with the results.

The proposed solution didn't work for me, so I ended up doing this:

CREATE TABLE temp_table AS SELECT DISTINCT * FROM your_table;
DROP TABLE your_table;
ALTER TABLE temp_table RENAME TO your_table;



If adding a primary key is not an option, then one approach would be to store the distinct duplicated records in a temp table, delete all of those duplicated records from the existing table, and then add the records back into the original table from the temp table.

For example (written for SQL Server 2008, but the technique is the same for any database):

DECLARE @original AS TABLE([hash] varchar(20), [d] float)
INSERT INTO @original VALUES('A', 1)
INSERT INTO @original VALUES('A', 2)
INSERT INTO @original VALUES('A', 1)
INSERT INTO @original VALUES('B', 1)
INSERT INTO @original VALUES('C', 1)
INSERT INTO @original VALUES('C', 1)

DECLARE @temp AS TABLE([hash] varchar(20), [d] float)
INSERT INTO @temp
SELECT [hash], [d] FROM @original 
GROUP BY [hash], [d]
HAVING COUNT(*) > 1

DELETE O
FROM @original O
JOIN @temp T ON T.[hash] = O.[hash] AND T.[d] = O.[d]

INSERT INTO @original
SELECT [hash], [d] FROM @temp

SELECT * FROM @original

I'm not sure if sqlite has a ROW_NUMBER() type function, but if it does you could also try some of the approaches listed here: Delete duplicate records from a SQL table without a primary key
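SQLite does have window functions, including ROW_NUMBER(), as of version 3.25. Assuming a new-enough SQLite, a hedged sketch of that route (numbering rows within each (hash, d) group by rowid and deleting all but the first; table name is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (hash TEXT, d REAL)")
con.executemany("INSERT INTO t VALUES (?, ?)",
                [("a", 1.0), ("a", 1.0), ("a", 1.0), ("b", 2.0)])

# ROW_NUMBER() requires SQLite >= 3.25; rn > 1 marks the duplicates.
con.execute("""
    DELETE FROM t WHERE rowid IN (
        SELECT rowid FROM (
            SELECT rowid,
                   ROW_NUMBER() OVER (PARTITION BY hash, d ORDER BY rowid) AS rn
            FROM t
        ) WHERE rn > 1
    )
""")

rows = con.execute("SELECT hash, d FROM t ORDER BY hash").fetchall()
print(rows)
```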

1 Comment

+1, not sure if sqlite supports the delete <alias> from <table> <alias> syntax though
