Delete duplicate rows in several PostgreSQL tables

Question

I have a Postgres database with more than 1000 tables, named like table1, table2, table3.

I imported all of these tables with a script, and apparently the script had problems during the import.

Many tables have duplicate rows (all values exactly the same).

I can go into each table and delete the duplicate rows using DBeaver, but with over 1000 tables that is very time-consuming.

Example of tables:

table1

name      gender     age
a         m          20
a         m          20
b         f          21
b         f          21

table2

fruit     hobby      
x         running
x         running
y         stamp
y         stamp

How can I do the following:

Identify tables in postgres with duplicate rows.
Delete all duplicate rows, leaving 1 record.

I need to do this on all 1000+ tables at once.

This would be easier to answer with some examples of the records you'd like to delete.
– Jake Worth, Commented Jul 30, 2021 at 18:13
Hi @dang: a solution would need a stored procedure that loops over the tables and removes the duplicate records from each one.
– Rahul Biswas, Commented Jul 30, 2021 at 18:15

Akhilesh Mishra · Accepted Answer · 2021-07-31 04:53:35Z


Since you want to automate deduplication across all tables, you need a PL/pgSQL function, in which you can build and run dynamic queries.

Try this function:

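create or replace function func_dedup(_schemaname varchar) returns void as
$$
declare
    _rec record;
begin
    -- Loop over the ordinary tables (not views) in the given schema.
    for _rec in
        select table_name
        from information_schema.tables
        where table_schema = _schemaname
          and table_type = 'BASE TABLE'
    loop
        -- %I quotes identifiers, so mixed-case or unusual table names still work.
        execute format('create temp table tab_temp as select distinct * from %I.%I',
                       _schemaname, _rec.table_name);
        execute format('truncate %I.%I', _schemaname, _rec.table_name);
        execute format('insert into %I.%I select * from tab_temp',
                       _schemaname, _rec.table_name);
        -- Drop the temp table so its name can be reused in the next iteration.
        execute 'drop table tab_temp';
    end loop;
end;
$$
language plpgsql;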

Now call the function like this:

select * from func_dedup('your_schema');


Steps:

1. Get the list of all tables in your schema with a query like the one below, and loop over the result, one table at a time.

select table_name from information_schema.tables where table_schema=_schemaname

2. Insert all distinct records into a TEMP TABLE.
3. Truncate the main table.
4. Insert all the data from the TEMP TABLE back into the main table.
5. Drop the TEMP TABLE. (Dropping it matters because the same temp table name is reused in the next loop iteration.)
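
To make the steps concrete, here is a minimal sketch of what one loop iteration amounts to for a single table. It assumes the table1 from the question is reachable through the current search_path. The begin/commit wrapper is an addition that is not part of the function above; it is there so that a failure partway through does not leave the table empty.

```sql
-- Sketch for one table only; table1 is the sample table from the question.
begin;

-- Insert all distinct records into a temp table.
create temp table tab_temp as select distinct * from table1;

-- Empty the main table (TRUNCATE is transactional in PostgreSQL).
truncate table1;

-- Copy the deduplicated rows back.
insert into table1 select * from tab_temp;

-- Drop the temp table so the name is free for the next table.
drop table tab_temp;

commit;

-- With the four sample rows from the question, two rows remain.
select count(*) from table1;
```

Note that TRUNCATE is refused when another table references the target through a foreign key, so tables like that would need separate handling.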

Note: if your tables are very large, consider using a regular table instead of a TEMP TABLE.

edited Jul 31, 2021 at 4:53

answered Jul 31, 2021 at 4:44


1 Comment

dang Over a year ago

Is there a way to find all the tables where I have duplicate rows?
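
One possible way to answer this, sketched under assumptions: the function name func_find_dups and its output columns tbl and extra_rows are hypothetical and do not come from the answer above. For each table it compares the total row count with the count of distinct rows, and it returns only the tables where the two differ.

```sql
-- Hypothetical helper: lists tables in a schema that contain duplicate rows,
-- with the number of surplus copies in each.
create or replace function func_find_dups(_schemaname varchar)
returns table (tbl text, extra_rows bigint) as
$$
declare
    _rec record;
begin
    for _rec in
        select table_name::text as table_name
        from information_schema.tables
        where table_schema = _schemaname
          and table_type = 'BASE TABLE'
    loop
        -- Total rows minus distinct rows = number of surplus duplicate rows.
        execute format(
            'select (select count(*) from %1$I.%2$I)
                  - (select count(*) from (select distinct * from %1$I.%2$I) d)',
            _schemaname, _rec.table_name)
        into extra_rows;

        -- Report only tables that actually have duplicates.
        if extra_rows > 0 then
            tbl := _rec.table_name;
            return next;
        end if;
    end loop;
end;
$$
language plpgsql;

-- Usage, with the same placeholder schema name as in the answer:
select * from func_find_dups('your_schema');
```

This scans every table twice, so it may be slow on large schemas. Like the answer's function, it relies on select distinct *, which raises an error for tables containing a column type that has no equality operator (json, for example).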
