Remove duplicates on specific rows in Oracle query

Question

I've been trying to query an Oracle database and remove duplicate rows based on specific column values, without much success.

My current Oracle query is the following:

SELECT col1, col2, col3, col4
FROM myTable
WHERE myConditions
ORDER BY col5

The problem is that col4 has several possible values for each combination of col1/col2/col3, and I only want to keep the value from the row where col5 is minimal (hence the ORDER BY in the query).

In pandas this would be the equivalent of running this query and then running df.drop_duplicates(subset = ["col1", "col2", "col3"]) on the result.

I've tried using the FIRST_VALUE function to achieve this, but my Oracle skills are limited. Would anyone know how to modify the query to get the desired result?

Here's an example of the desired query output for a given table:

Table:

  col1 col2  col3  col4  col5
0    A    A     1     3     1
1    A    A     1     5     2
2    A    A     2     2     1
3    A    A     2    -1     2
4    A    B     1     0     3
5    A    B     1     0     4
6    A    B     2     1     4
7    A    B     2     4     3

Query output:

  col1 col2  col3  col4
0    A    A     1     3
1    A    A     2     2
2    A    B     1     0
3    A    B     2     4

Thorsten Kettner · Accepted Answer · 2021-08-10 16:18:33Z

1

In Oracle you can use KEEP FIRST for this:

SELECT col1, col2, col3, MIN(col4) KEEP (DENSE_RANK FIRST ORDER BY col5) AS col4
FROM myTable
WHERE myConditions
GROUP BY col1, col2, col3
ORDER BY col1, col2, col3;

(It doesn't matter whether you use MIN(col4) or MAX(col4) here by the way, because you only expect one row with the minimum col5 per col1, col2, and col3, so there are no ties to deal with.)
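To make the KEEP FIRST semantics concrete, here is a rough pure-Python emulation run against the sample table from the question. It is only a sketch of the logic, not how Oracle implements it: the helper name keep_first and its agg parameter are made up for illustration. Within each (col1, col2, col3) group, only the rows tied for the lowest col5 are passed to the aggregate, which is why MIN and MAX agree when there are no ties.

```python
from collections import defaultdict

# Sample rows from the question: (col1, col2, col3, col4, col5)
ROWS = [
    ("A", "A", 1, 3, 1),
    ("A", "A", 1, 5, 2),
    ("A", "A", 2, 2, 1),
    ("A", "A", 2, -1, 2),
    ("A", "B", 1, 0, 3),
    ("A", "B", 1, 0, 4),
    ("A", "B", 2, 1, 4),
    ("A", "B", 2, 4, 3),
]


def keep_first(rows, agg=min):
    """Emulate agg(col4) KEEP (DENSE_RANK FIRST ORDER BY col5) per group.

    Hypothetical helper for illustration; not an Oracle API.
    """
    groups = defaultdict(list)
    for col1, col2, col3, col4, col5 in rows:
        groups[(col1, col2, col3)].append((col5, col4))

    result = []
    for key in sorted(groups):  # mirrors ORDER BY col1, col2, col3
        pairs = groups[key]
        lowest_col5 = min(col5 for col5, _ in pairs)
        # Only rows tied for the lowest col5 reach the aggregate.
        candidates = [col4 for col5, col4 in pairs if col5 == lowest_col5]
        result.append(key + (agg(candidates),))
    return result


if __name__ == "__main__":
    for row in keep_first(ROWS):
        print(row)
```

If two rows in a group shared the same minimal col5 but had different col4 values, keep_first(rows, min) and keep_first(rows, max) would differ, and that is the only case where the choice of aggregate matters.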


Kristian Fitzgerald · Accepted Answer · 2021-08-10 16:08:10Z

1

Does this give you what you are looking for?

SELECT DISTINCT * FROM (
    SELECT col1, col2, col3, FIRST_VALUE(col4) OVER (PARTITION BY col1, col2, col3 ORDER BY col5 ASC) AS col4_min5
    FROM myTable
    WHERE myConditions
) tbl

edited Aug 10, 2021 at 16:08

answered Aug 10, 2021 at 15:40


4 Comments

Erlinska Over a year ago

Not quite, but the size of the output is the right one. Here, for each possible value of (col1/col2/col3), the value of col4 is the minimum value of col4 in myTable, and the value of col5 is the minimal value of col5 at which col4 is minimal. What I'm trying to get is the same table where col4 is filled with its value when col5 is minimal, and without col5 in the output.

Kristian Fitzgerald Over a year ago

Ah, I see what you are looking for now. I updated the code above. It is MS SQL, but I think it should work in Oracle too.

user5683823 Over a year ago

The modified version does not aggregate over the first three columns; it will produce as many rows in the output as there are in the input. The correct answer uses an aggregate function, the FIRST function, as demonstrated in the "Correct Answer".

Kristian Fitzgerald Over a year ago

The DISTINCT keyword is performing that aggregation.
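One way to check the row count being discussed is to run the FIRST_VALUE plus DISTINCT query against the question's sample table. The sketch below uses Python's built-in sqlite3 module as a stand-in, since an Oracle instance is not assumed to be available; it needs SQLite 3.25 or later for window functions, and Oracle may differ in details. The WHERE myConditions placeholder is left out, an outer ORDER BY is added only to make the result order deterministic, and the function name run_distinct_first_value is made up for illustration.

```python
import sqlite3

# Sample rows from the question: (col1, col2, col3, col4, col5)
ROWS = [
    ("A", "A", 1, 3, 1),
    ("A", "A", 1, 5, 2),
    ("A", "A", 2, 2, 1),
    ("A", "A", 2, -1, 2),
    ("A", "B", 1, 0, 3),
    ("A", "B", 1, 0, 4),
    ("A", "B", 2, 1, 4),
    ("A", "B", 2, 4, 3),
]

# Same shape as the answer's query, minus the WHERE placeholder.
QUERY = """
SELECT DISTINCT * FROM (
    SELECT col1, col2, col3,
           FIRST_VALUE(col4) OVER (
               PARTITION BY col1, col2, col3 ORDER BY col5 ASC
           ) AS col4_min5
    FROM myTable
) tbl
ORDER BY col1, col2, col3
"""


def run_distinct_first_value(rows):
    """Load rows into an in-memory SQLite table and run QUERY (illustrative helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE myTable ("
            "col1 TEXT, col2 TEXT, col3 INTEGER, col4 INTEGER, col5 INTEGER)"
        )
        conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in run_distinct_first_value(ROWS):
        print(row)
```

In this SQLite run, the inner query returns one row per input row, and DISTINCT then collapses the repeated (col1, col2, col3, col4_min5) tuples, leaving one row per group for the sample data.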
