with de_duplicate (ad_id, id_type, lat, long) AS (
select ad_id, id_type, lat, long,
Row_Number() over(partition by ad_id,id_type, lat, long) AS duplicate_count
from tempschema.temp_test)
select * from de_duplicate;

The query above runs successfully, but when I try to perform a delete operation:

with de_duplicate(ad_id, id_type, lat, long) AS 
(
select ad_id, id_type, lat, long,
Row_Number() over(partition by ad_id,id_type, lat, long) AS duplicate_count
from tempschema.temp_test
)
delete from de_duplicate where duplicate_count > 1;

It throws an error: Amazon Invalid operation: syntax error at or near "delete" Position: 190;

I am running these queries on a Redshift cluster. Any thoughts?

  • This syntax is not allowed in Redshift. Commented Jan 18, 2018 at 1:20
  • Who normally deletes over a CTE, anyone? Commented Jan 18, 2018 at 1:41
  • This is a temporary table for which I don't need a backup. Any help with the right syntax, @GordonLinoff, is greatly appreciated. Thanks. Commented Jan 18, 2018 at 2:10

2 Answers


Consider converting the CTE into a subquery and adding the unique_id to match against the outer query:

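DELETE FROM tempschema.temp_test
WHERE unique_id IN
  (SELECT sub.unique_id
   FROM 
      (SELECT unique_id, ad_id, id_type, lat, long,
              -- ORDER BY makes the kept row deterministic (lowest unique_id)
              ROW_NUMBER() OVER (PARTITION BY ad_id, id_type, lat, long
                                 ORDER BY unique_id) AS dup_count
        FROM tempschema.temp_test) sub
   -- rows numbered 2 and up are the duplicates to delete
   WHERE sub.dup_count > 1);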

Alternatively, consider deleting using an aggregate subquery:

DELETE FROM tempschema.temp_test
WHERE unique_id NOT IN
   (SELECT MIN(unique_id)
    FROM tempschema.temp_test
    GROUP BY ad_id, id_type, lat, long)

Of course, both approaches assume you have a unique_id in the table, but they can be adjusted if not.
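If the table has no unique_id, one possible adjustment is to rebuild its contents from the distinct rows. This is only a sketch: it assumes temp_test contains exactly the four columns shown in the question, and temp_test_dedup is a hypothetical name for the scratch table.

```sql
BEGIN;

-- Keep one copy of every (ad_id, id_type, lat, long) combination.
CREATE TEMP TABLE temp_test_dedup AS
SELECT DISTINCT ad_id, id_type, lat, long
FROM tempschema.temp_test;

-- Empty the source table. DELETE is used rather than TRUNCATE
-- so the step stays inside this transaction.
DELETE FROM tempschema.temp_test;

-- Reload the de-duplicated rows.
INSERT INTO tempschema.temp_test (ad_id, id_type, lat, long)
SELECT ad_id, id_type, lat, long
FROM temp_test_dedup;

COMMIT;
```

If the table carries additional columns, SELECT DISTINCT over all of them only removes rows that are identical in every column, so the column list would need adjusting.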


2 Comments

The SELECT statement works fine, but the DELETE statement throws a similar error.
For which approach, both? If the first, then that's interesting. Redshift tends to follow PostgreSQL's SQL dialect, but I guess CTEs are not allowed in DELETE. Consider moving the entire CTE into the FROM clause of the subquery.

I understand what you're trying to do, and it's a common problem, but the approach has two issues:

1) You're trying to delete from the result of your query (de_duplicate), not from the source table (tempschema.temp_test). Even if you identify duplicates in the de_duplicate statement, deleting from that result does nothing to the source table tempschema.temp_test.

2) A CTE (WITH clause) doesn't work directly with DELETE and UPDATE; they require joined subqueries.

The two possible approaches in your case:

1) Use a joined subquery if you have a unique ID and a duplication criterion in your table (val in the test case below, so id=3 and id=4 are duplicates):

create table test1 (id integer, val integer);
insert into test1 values (1,1),(2,2),(3,3),(4,3);

delete from test1 using (
    select id
    from (
        -- number rows within each val; the highest id gets rn = 1 and is kept
        select id, row_number() over (partition by val order by id desc) as rn
        from test1
    ) t
    where rn > 1
) s
where test1.id = s.id;
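To confirm the delete did what was intended, a quick check along these lines can help. It is a sketch against the test1 table above; copies is just an illustrative column alias. With the four test rows, the row with id=3 is removed and id=4 is kept, because the window orders by id desc.

```sql
-- Any val that still appears more than once; no rows are expected here.
select val, count(*) as copies
from test1
group by val
having count(*) > 1;

-- The rows that survived the delete.
select id, val
from test1
order by id;
```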

2) Create a cleaned staging table and swap the tables:

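create table tempschema.temp_test_staging (like tempschema.temp_test);
-- list the columns explicitly so duplicate_count is not inserted;
-- this assumes temp_test has exactly these four columns
insert into tempschema.temp_test_staging
select ad_id, id_type, lat, long
from (
    select ad_id, id_type, lat, long,
    Row_Number() over(partition by ad_id,id_type, lat, long) AS duplicate_count
    from tempschema.temp_test
) t
where duplicate_count=1;
-- swap: keep the original as temp_test_old until the result is verified
alter table tempschema.temp_test rename to temp_test_old;
alter table tempschema.temp_test_staging rename to temp_test;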
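Before discarding the old table, it may be worth comparing the two. The following is a sketch that assumes the four columns from the question are the full duplication key; the aliases are illustrative.

```sql
-- Number of distinct combinations in the old table ...
select count(*) as distinct_old
from (
    select distinct ad_id, id_type, lat, long
    from tempschema.temp_test_old
) d;

-- ... which should match the row count of the new table.
select count(*) as rows_new
from tempschema.temp_test;

-- Any combination still duplicated in the new table; no rows are expected.
select ad_id, id_type, lat, long, count(*) as copies
from tempschema.temp_test
group by ad_id, id_type, lat, long
having count(*) > 1;

-- Once the numbers look right, the old copy can be dropped.
drop table tempschema.temp_test_old;
```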

1 Comment

Thanks @AlexYes. As I am trying to delete duplicates, I selected only the rows with dup_count equal to 1. I think the problem is solved for me.
