0

There is a table with duplicates rows, where all column values are equal:

+------+---------+------------+
|  id  |  value  | timestamp  |
+------+---------+------------+
| 1    |  500    | 2019-10-12 |
| 2    |  400    | 2019-10-11 |
| 1    |  500    | 2019-10-12 |
+------+---------+------------+

I want to keep one of those equal rows and delete the others. I came up with:

DELETE
FROM
  `table` t1
WHERE (
  SELECT
    ROW_NUMBER() OVER (PARTITION BY id),
  FROM
    `table` t2
  WHERE
    t1.id = t2.id 
) > 1

However this does not work:

Correlated subqueries that reference other tables are not supported unless they can be de-correlated, such as by transforming them into an efficient JOIN.

Any ideas how to remove duplicate rows?

1 Answer 1

1

Below is for BigQuery Standard SQL

... where all column values are equal
So you can use simple DISTINCT * and instead of DELETE use CREATE / REPLACE to write back to the same table

#standardSQL
CREATE OR REPLACE TABLE `project.dataset.table`   
PARTITION BY date
SELECT DISTINCT * 
FROM `project.dataset.table`     

In PARTITION BY clause - you should add the fields you use to partition the original table

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.