I've been trying to query an Oracle database and remove duplicate rows based on specific column values, without much success.
My current Oracle query is the following:
SELECT col1, col2, col3, col4
FROM myTable
WHERE myConditions
ORDER BY col5
The problem is that col4 can take several values for a given combination of col1/col2/col3, and I only want to keep the value from the row where col5 is minimal (hence the ORDER BY in the query).
In pandas, this would be equivalent to running this query and then calling df.drop_duplicates(subset=["col1", "col2", "col3"]) on the result, which by default keeps the first row of each col1/col2/col3 group.
I've tried using the FIRST_VALUE function to achieve this, but my Oracle skills are limited. Would anyone know how to modify the query to get the desired result?
Here's an example of the desired query output for a given table:
Table:
   col1  col2  col3  col4  col5
0  A     A     1      3    1
1  A     A     1      5    2
2  A     A     2      2    1
3  A     A     2     -1    2
4  A     B     1      0    3
5  A     B     1      0    4
6  A     B     2      1    4
7  A     B     2      4    3
Query output:
   col1  col2  col3  col4
0  A     A     1     3
1  A     A     2     2
2  A     B     1     0
3  A     B     2     4
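To pin down the behaviour I'm after, here is a small plain-Python sketch that reproduces the expected output from the example table above. It is only an illustration of the semantics (one row per col1/col2/col3, taken from the row with the smallest col5), not the Oracle query I'm looking for. The names `rows` and `keep_min_col5` are made up for this example.

```python
# Sample rows from the example table: (col1, col2, col3, col4, col5)
rows = [
    ("A", "A", 1, 3, 1),
    ("A", "A", 1, 5, 2),
    ("A", "A", 2, 2, 1),
    ("A", "A", 2, -1, 2),
    ("A", "B", 1, 0, 3),
    ("A", "B", 1, 0, 4),
    ("A", "B", 2, 1, 4),
    ("A", "B", 2, 4, 3),
]


def keep_min_col5(rows):
    """For each (col1, col2, col3), keep the row whose col5 is smallest."""
    best = {}
    for row in rows:
        key = row[:3]  # grouping columns: col1, col2, col3
        # Keep the first row seen for a key, or replace it if col5 is smaller
        if key not in best or row[4] < best[key][4]:
            best[key] = row
    # Drop col5 and return the rows ordered by the grouping columns
    return [row[:4] for _, row in sorted(best.items())]


result = keep_min_col5(rows)
for row in result:
    print(*row)
```

If two rows of the same group shared the same minimal col5, this sketch would keep whichever comes first in `rows`. The example data has no such ties, so I haven't decided how they should be broken.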