I have the following table in Redshift:

guest_id  name    rownum
1         Safvan  1
1         Safvan  2
1         Thomas  3
2         Anandu  1
2         Manish  2

For each guest_id partition, I need to delete every record except the one with the maximum rownum.

The result should look like this:

guest_id  name    rownum
1         Thomas  3
2         Manish  2

Thanks in advance for your help.

  • Please show your current attempt and describe what is wrong with it. Commented Sep 16, 2021 at 7:09
  • That helped me somewhat; it is a good thread. I have posted my solution as an answer. Commented Sep 16, 2021 at 8:21

3 Answers

1

My solution is:

create table table_rownum as (
    select
        *,
        row_number() over (partition by guest_id order by rownum desc) as rownum_temp
    from
        table_orig
);

delete from table_rownum where rownum_temp<>1;

alter table table_rownum drop column rownum_temp;

truncate table table_orig;

insert into table_orig (select * from table_rownum);

drop table table_rownum;

Please suggest a better solution if there is one.

1 Comment

You can use a CTE to remove the duplicate records; please check my answer. There is no need to create an extra table or to drop or truncate anything. A single DELETE is enough.
0

The subquery returns the maximum rownum for each guest_id. Join it to the main table on guest_id, then delete the rows whose rownum differs from that maximum.

-- SQL Server (T-SQL) syntax: delete through the alias r
DELETE r
FROM redshift r
INNER JOIN (SELECT guest_id
                 , MAX(rownum) rownum
            FROM redshift
            GROUP BY guest_id) t
        ON r.guest_id = t.guest_id
       AND r.rownum != t.rownum;

You can check it at https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=47081e3517000949460932808ac9f09d

Deleting the duplicate records with a CTE:

-- SQL Server (T-SQL) syntax: row_num = 1 marks the row to keep in each guest_id partition
WITH t_cte AS (
       SELECT *
            , ROW_NUMBER() OVER (PARTITION BY guest_id ORDER BY rownum DESC) row_num
       FROM redshift
)
DELETE r
FROM t_cte c
INNER JOIN redshift r
        ON c.guest_id = r.guest_id
       AND c.rownum = r.rownum
WHERE c.row_num > 1;

You can check it at https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=90b7099ca779c0836b90278ae1b3635a
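Both queries above use the SQL Server form of DELETE with a JOIN, which PostgreSQL rejects. A minimal sketch of the same subquery approach in PostgreSQL-style DELETE ... USING syntax, which Redshift also documents, might look like the following. It assumes the table is named redshift with the columns from the question, and it has not been run against a Redshift cluster here:

```sql
-- Assumption: table "redshift" with columns guest_id, name, rownum.
-- The derived table t holds the maximum rownum per guest_id;
-- every row that does not carry that maximum is deleted.
DELETE FROM redshift
USING (
    SELECT guest_id, MAX(rownum) AS max_rownum
    FROM redshift
    GROUP BY guest_id
) t
WHERE redshift.guest_id = t.guest_id
  AND redshift.rownum <> t.max_rownum;
```

Running a SELECT with the same join and WHERE condition first is a cheap way to preview which rows would be removed.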

7 Comments

When using the CTE here, will the DELETE hit the original table redshift? Or did you forget to put the original table in the main query?
WITH cte AS ( SELECT * , ROW_NUMBER() OVER (PARTITION BY guest_id ORDER BY rownum DESC) row_num FROM redshift ) DELETE FROM redshift rs inner join cte on rs.rownum=cte.row_num WHERE cte.row_num<>1; Is this the correct query?
Please check the given URL, where you can inspect the data.
Redshift is basically built on Postgres, and the query throws an error in Postgres.
You can use the first query.
-3

DELETE FROM table_name WHERE rownum NOT IN (SELECT MAX(rownum) FROM table_name GROUP BY column_name);
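A minimal sketch of a variant that compares each row only against the maximum of its own guest_id, using a correlated subquery. The table name redshift and the alias r2 are assumptions taken from the question and the other answers, and the statement has not been tested on Redshift, which places some restrictions on correlated subqueries:

```sql
-- Assumption: table "redshift" with columns guest_id, name, rownum.
-- A row is deleted when a larger rownum exists for the same guest_id.
DELETE FROM redshift
WHERE rownum < (
    SELECT MAX(r2.rownum)
    FROM redshift r2
    WHERE r2.guest_id = redshift.guest_id
);
```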

2 Comments

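This compares rownum against the maximums of all groups at once, so a row survives whenever its rownum equals the maximum of any guest_id, not just its own. With the sample data, the row (1, Safvan, 2) would be kept because 2 is the maximum for guest_id 2.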
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
