
I have a Postgres database with more than 1000 tables (table1, table2, table3, and so on).

I imported all of these tables with a script, and apparently the script had problems during the import.

Many tables now contain duplicate rows (rows where every value is exactly the same).

I can open each table in DBeaver and delete the duplicate rows by hand, but with over 1000 tables that is very time-consuming.

Example tables:

table1

name      gender     age
a         m          20
a         m          20
b         f          21
b         f          21

table2

fruit     hobby      
x         running
x         running
y         stamp
y         stamp

How can I do the following:

  • Identify tables in postgres with duplicate rows.
  • Delete all duplicate rows, leaving 1 record.

I need to do this on all 1000+ tables at once.

  • This would be easier to answer with some examples of the records you'd like to delete. Commented Jul 30, 2021 at 18:13
  • Hi @dang: to solve this, you need to write a stored procedure that loops over the tables and removes the duplicate records from each one. Commented Jul 30, 2021 at 18:15
  • @JakeWorth I added examples of the tables. Commented Jul 30, 2021 at 19:05

1 Answer


Since you want to automate the deduplication of every table, you need a PL/pgSQL function, which lets you build and run dynamic queries.

Try this function:

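create or replace function func_dedup(_schemaname varchar) returns void as
$$
declare
    _rec record;
begin
    -- Loop over every ordinary table (views excluded) in the given schema
    for _rec in
        select table_name
        from information_schema.tables
        where table_schema = _schemaname
          and table_type = 'BASE TABLE'
    loop
        -- Copy the distinct rows into a temporary table.
        -- %I quotes identifiers safely; names are schema-qualified.
        execute format('create temp table tab_temp as select distinct * from %I.%I',
                       _schemaname, _rec.table_name);
        -- Empty the original table (fails if other tables reference it via FK)
        execute format('truncate %I.%I', _schemaname, _rec.table_name);
        -- Reload the original table from the deduplicated copy
        execute format('insert into %I.%I select * from tab_temp',
                       _schemaname, _rec.table_name);
        -- Drop the temp table so the name can be reused on the next iteration
        execute 'drop table tab_temp';
    end loop;
end;
$$
language plpgsql;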
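The dynamic statements are safest when table names are passed to format() through the %I placeholder rather than glued on with ||: %I double-quotes a name only when it needs it, so tables whose names contain uppercase letters, spaces, or reserved words still work, while plain concatenation passes the name through unquoted. A minimal illustration (the table names here are made up):

```sql
-- %I leaves a plain lowercase name as-is
select format('truncate %I.%I', 'public', 'table1');    -- truncate public.table1
-- %I double-quotes a name that needs it (space, uppercase)
select format('truncate %I.%I', 'public', 'My Table');  -- truncate public."My Table"
-- || concatenation does no quoting, producing 'truncate My Table',
-- which is not a valid statement
select 'truncate ' || 'My Table';
```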

Then call the function like this:

select * from func_dedup('your_schema');


Steps:

  1. Get the list of all tables in your schema with the query below, and loop over it:
select table_name from information_schema.tables where table_schema=_schemaname
  2. Insert all distinct records into a TEMP TABLE.
  3. Truncate the main table.
  4. Insert all the data from the TEMP TABLE back into the main table.
  5. Drop the TEMP TABLE. (Dropping it is important because the next loop iteration creates a temp table with the same name.)

Note: if your tables are very large, consider using a regular table instead of a TEMP TABLE.
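The function above rewrites every table whether or not it has duplicates, while the question's first bullet (and the comment below) asks how to find the affected tables. A minimal sketch, assuming the same schema-wide loop: for each table, compare count(*) with the number of distinct rows; the two differ exactly when the table contains fully duplicated rows. The name func_find_dup_tables and its output columns are hypothetical, not part of the original answer. Like the dedup function, it relies on SELECT DISTINCT *, so it will fail on tables with columns that have no equality operator (for example json or point), and it scans each table twice.

```sql
-- Hypothetical helper (not from the original answer): list the ordinary
-- tables in a schema that contain at least one fully duplicated row.
create or replace function func_find_dup_tables(_schemaname text)
returns table (tab_name text, total_rows bigint, distinct_rows bigint) as
$$
declare
    _rec record;
    _total bigint;
    _distinct bigint;
begin
    for _rec in
        select t.table_name
        from information_schema.tables t
        where t.table_schema = _schemaname
          and t.table_type = 'BASE TABLE'
    loop
        -- Count all rows in the table
        execute format('select count(*) from %I.%I', _schemaname, _rec.table_name)
            into _total;
        -- Count the rows that remain after removing exact duplicates
        execute format('select count(*) from (select distinct * from %I.%I) d',
                       _schemaname, _rec.table_name)
            into _distinct;
        -- Emit a result row only when the counts differ
        if _total > _distinct then
            tab_name := _rec.table_name;
            total_rows := _total;
            distinct_rows := _distinct;
            return next;
        end if;
    end loop;
end;
$$
language plpgsql;
```

Usage: select * from func_find_dup_tables('your_schema'); lists only the tables that still need deduplication, together with how many rows each would shrink by.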


1 Comment

Is there a way to find all the tables where I have duplicate rows?
