I've been trying, without much success, to query an Oracle database and remove duplicate rows based on specific column values.

My current Oracle query is the following:

SELECT col1, col2, col3, col4
FROM myTable
WHERE myConditions
ORDER BY col5

The problem is that col4 has several possible values for each combination of col1/col2/col3, and I only want to keep the value where col5 is minimal (hence the ORDER BY in the query).

In pandas this would be the equivalent of running this query and then running df.drop_duplicates(subset = ["col1", "col2", "col3"]) on the result.

I've tried using the FIRST_VALUE function to achieve this, but my Oracle skills are limited. Would anyone know how to modify the query to get the desired results?

Here's an example of the desired query output for a given table:

Table :

  col1 col2  col3  col4  col5
0    A    A     1     3     1
1    A    A     1     5     2
2    A    A     2     2     1
3    A    A     2    -1     2
4    A    B     1     0     3
5    A    B     1     0     4
6    A    B     2     1     4
7    A    B     2     4     3

Query output :

  col1 col2  col3  col4
0    A    A     1     3
1    A    A     2     2
2    A    B     1     0
3    A    B     2     4

2 Answers

In Oracle you can use KEEP FIRST for this:

SELECT col1, col2, col3, MIN(col4) KEEP (DENSE_RANK FIRST ORDER BY col5) AS col4
FROM myTable
WHERE myConditions
GROUP BY col1, col2, col3
ORDER BY col1, col2, col3;

(By the way, it doesn't matter whether you use MIN(col4) or MAX(col4) here: you expect only one row with the minimum col5 per combination of col1, col2, and col3, so there are no ties to resolve.)
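
To make the KEEP FIRST logic concrete, here is a minimal plain-Python sketch of what the aggregate computes on the sample table from the question. It is only an emulation for illustration, not how Oracle implements it; the names `ROWS` and `keep_first` are made up for this example.

```python
from collections import defaultdict

# Sample rows from the question: (col1, col2, col3, col4, col5)
ROWS = [
    ("A", "A", 1, 3, 1),
    ("A", "A", 1, 5, 2),
    ("A", "A", 2, 2, 1),
    ("A", "A", 2, -1, 2),
    ("A", "B", 1, 0, 3),
    ("A", "B", 1, 0, 4),
    ("A", "B", 2, 1, 4),
    ("A", "B", 2, 4, 3),
]


def keep_first(rows, agg=min):
    """Emulate agg(col4) KEEP (DENSE_RANK FIRST ORDER BY col5),
    grouped by (col1, col2, col3). Hypothetical helper for illustration."""
    groups = defaultdict(list)
    for col1, col2, col3, col4, col5 in rows:
        groups[(col1, col2, col3)].append((col5, col4))

    result = []
    for key in sorted(groups):
        # DENSE_RANK FIRST: keep only the rows that share the lowest col5.
        lowest = min(col5 for col5, _ in groups[key])
        candidates = [col4 for col5, col4 in groups[key] if col5 == lowest]
        # The aggregate (MIN or MAX) only matters if several rows tie on col5.
        result.append(key + (agg(candidates),))
    return result


for row in keep_first(ROWS):
    print(row)
```

With the sample data, `keep_first(ROWS, agg=min)` and `keep_first(ROWS, agg=max)` return the same rows, because no group has two rows sharing the lowest col5. If such a tie existed with different col4 values, the choice of aggregate would decide which one is returned.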

Does this give you what you are looking for?

SELECT DISTINCT * FROM (
    SELECT col1, col2, col3,
           FIRST_VALUE(col4) OVER (PARTITION BY col1, col2, col3 ORDER BY col5 ASC) AS col4_min5
    FROM myTable
    WHERE myConditions
) tbl
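
One way to check how many rows this query returns is to run it against the sample table from the question. The sketch below uses Python's built-in sqlite3 module as a stand-in for Oracle, which is an assumption: it requires an SQLite build with window-function support (3.25 or later), and the `WHERE myConditions` placeholder is left out because the actual conditions are not given.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myTable "
    "(col1 TEXT, col2 TEXT, col3 INTEGER, col4 INTEGER, col5 INTEGER)"
)
conn.executemany(
    "INSERT INTO myTable VALUES (?, ?, ?, ?, ?)",
    [
        ("A", "A", 1, 3, 1),
        ("A", "A", 1, 5, 2),
        ("A", "A", 2, 2, 1),
        ("A", "A", 2, -1, 2),
        ("A", "B", 1, 0, 3),
        ("A", "B", 1, 0, 4),
        ("A", "B", 2, 1, 4),
        ("A", "B", 2, 4, 3),
    ],
)

# Same query as above, minus the WHERE placeholder, plus an ORDER BY
# so the output order is deterministic.
QUERY = """
SELECT DISTINCT * FROM (
    SELECT col1, col2, col3,
           FIRST_VALUE(col4) OVER (PARTITION BY col1, col2, col3 ORDER BY col5 ASC) AS col4_min5
    FROM myTable
) tbl
ORDER BY col1, col2, col3
"""

result = conn.execute(QUERY).fetchall()
for row in result:
    print(row)
```

The inner query still produces one row per input row, but within each (col1, col2, col3) partition every row carries the same FIRST_VALUE, so DISTINCT collapses them to one row per combination. If two rows in a partition tied on the lowest col5 with different col4 values, the value picked would depend on row order, so this relies on col5 being unique at its minimum.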

4 Comments

Not quite, but the output has the right size. Here, for each possible value of (col1/col2/col3), the value of col4 is the minimum value of col4 in myTable, and the value of col5 is the minimal value of col5 at which col4 is minimal. What I'm trying to get is the same table where col4 holds its value at the row where col5 is minimal, without col5 in the output.
Ah, I see what you are looking for now. I updated the code above. It is MS SQL, but I think it should work in Oracle too.
The modified version does not aggregate over the first three columns; it will produce as many rows in the output as there are in the input. The correct answer uses an aggregate function, the FIRST function, demonstrated in the "Correct Answer".
The DISTINCT keyword is performing that aggregation.
