Oracle SQL How to find duplicate values in different columns?

Question

I have a set of rows with many columns. For example,

ID | Col1 | Col2 | Col3 | Duplicate
------------------------------------
81 | 101  | 102  | 101  | YES
82 | 101  | 103  | 104  | NO

I need to calculate the "Duplicate" column. Row 81 is a duplicate because it has the same value in Col1 and Col3. I know there is the LEAST function, which is similar to the MIN function but works across columns. Does something similar exist to achieve this?

The approach I have in mind is to write all possible combinations in a case like this:

SELECT ID, col1, col2, col3, 
       CASE WHEN col1 = col2 or col1 = col3 or col2 = col3 then 1 else 0 end as Duplicate
FROM table

But I would like to avoid that, since in some cases I have too many columns, and writing every combination by hand is very prone to errors.

What is the best way to solve this?

That may be long code to write, and prone to errors, but it is the most efficient way to solve the problem. Alternatively, you could unpivot and look for duplicates with standard tools, but that will take a lot longer (not least because it will have to group again by ID, right after you gave up that grouping by unpivoting the data). Also, it is not clear how you would use either LEAST or MIN to find out whether there are duplicates. And can NULL appear in the columns, and if so, how should it be treated when deciding whether there are duplicates?
– user5683823, Commented Jul 13, 2017 at 16:56
I don't want to use the LEAST function for this. I was just asking whether a function to find duplicates, with a syntax similar to LEAST, exists. And yes, unpivoting is not a valid option for my situation.
– user7792598, Commented Jul 13, 2017 at 17:04

Gordon Linoff · Accepted Answer · 2017-07-13 16:52:59Z

2

Hmmm. You are looking for within-row duplicates. This is painful. More recent versions of Oracle support lateral joins. But for just a handful of non-NULL columns, you can do:

select id, col1, col2, col3,
       (case when col1 in (col2, col3) or col2 in (col3) then 1 else 0 end) as Duplicate
from t;

For each additional column, you need to add one more IN comparison and extend the existing IN lists with the new column.
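As a rough sketch of that extension, assuming a hypothetical fourth column col4 in the same table t: every existing IN list gains col4, and col3 gets a comparison of its own. With n columns this covers n(n-1)/2 pairs, so four columns need six comparisons. Like the query above, it assumes the columns are non-NULL.

```sql
-- Sketch only: col4 is a hypothetical extra column, not part of the original table.
select id, col1, col2, col3, col4,
       (case when col1 in (col2, col3, col4)  -- col1 against every later column
               or col2 in (col3, col4)        -- col2 against every later column
               or col3 in (col4)              -- last remaining pair
             then 1 else 0 end) as Duplicate
from t;
```

Each column is compared only with the columns that come after it, which keeps every pair covered exactly once.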

answered Jul 13, 2017 at 16:52

Gordon Linoff


3 Comments

user5683823 Over a year ago

The lateral clause (Oracle 12.1 and above) is an interesting idea, but it is still not clear how it would help. The values are still in a row and need to be unpivoted; but lateral allows that to be done one row at a time, keeping the grouping that is otherwise lost with a standard unpivot. In any case, upvoted for thinking of using lateral if available.

user7792598 Over a year ago

I think I will go with this, however, Lateral is new for me, and I do have Oracle 12c. Where can I find a simple example to learn its usage?

user5683823 Over a year ago

@user7792598 - I posted a solution along those lines. You may google for "Oracle 12.1 LATERAL" and see what comes back.

user5683823 · Accepted Answer · 2017-07-13 17:15:55Z

Something like this... note that in the lateral clause we still need to unpivot, but that is one row at a time - resulting in possibly much faster execution than simple unpivot and standard aggregation.

with
     input_data ( id, col1, col2, col3 ) as (
       select 81, 101, 102, 101 from dual union all
       select 82, 101, 103, 104 from dual
     )
-- End of simulated input data (for testing purposes only).
-- Solution (SQL query) begins BELOW THIS LINE.
select i.id, i.col1, i.col2, i.col3, l.duplicates
from   input_data i,
         lateral ( select  case when count (distinct val) = count(val) 
                                then 'NO' else 'YES'
                           end  as duplicates
                   from    input_data
                   unpivot ( val for col in ( col1, col2, col3 ) )
                   where   id = i.id
                 ) l
;

ID  COL1  COL2  COL3  DUPLICATES
--  ----  ----  ----  ----------
81   101   102   101  YES
82   101   103   104  NO

Vamsi Prabhala · Accepted Answer · 2017-07-13 17:04:46Z

0

You can do this by unpivoting, then counting the distinct values per id and checking whether that count equals the number of rows for that id. Equal means there are no duplicates. Then left join this result to the original table to calculate the duplicate column.

SELECT t.*,
       CASE WHEN x.id IS NOT NULL THEN 'Yes' ELSE 'No' END AS duplicate
FROM t
LEFT JOIN
  (SELECT id
   FROM
     (SELECT *
      FROM t 
      unpivot (val FOR col IN (col1,col2,col3)) u 
     ) t
   GROUP BY id
   HAVING count(*)<>count(DISTINCT val)
  ) x ON x.id=t.id

answered Jul 13, 2017 at 17:04

Vamsi Prabhala


Bill Karwin · Accepted Answer · 2017-07-13 18:06:19Z

0

The best way^† is to avoid storing repeating groups of columns. If you have multiple columns that essentially store comparable data (i.e. a multi-valued attribute), move the data to a dependent table, and use one column.

CREATE TABLE child (
 ref_id INT,
 col INT
);

-- Multi-row VALUES requires Oracle 23c or later; on older versions, issue one INSERT per row.
INSERT INTO child VALUES
(81, 101), (81, 102), (81, 101),
(82, 101), (82, 103), (82, 104);

Then it's easier to find cases where a value occurs more than once:

SELECT ref_id, col, COUNT(*)
FROM child
GROUP BY ref_id, col
HAVING COUNT(*) > 1;

If you can't change the structure of the table, you could simulate it using UNIONs:

SELECT id, col, COUNT(*)
FROM (
    SELECT id, col1 AS col FROM mytable
    UNION ALL SELECT id, col2 FROM mytable
    UNION ALL SELECT id, col3 FROM mytable
    -- ... for more columns ...
) t
GROUP BY id, col
HAVING COUNT(*) > 1;
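If what you need is the YES/NO flag per id from the question rather than the list of repeated values, the same UNION ALL unpivot can be grouped by id alone. This is only a sketch under the same placeholder names (mytable, col1 to col3). It assumes NULLs should never count as duplicates, since both COUNT(col) and COUNT(DISTINCT col) ignore them.

```sql
-- Sketch: one YES/NO flag per id, built on the UNION ALL unpivot above.
-- Assumption: NULL values are ignored rather than treated as duplicates.
SELECT id,
       CASE WHEN COUNT(col) > COUNT(DISTINCT col) THEN 'YES' ELSE 'NO' END AS duplicate
FROM (
    SELECT id, col1 AS col FROM mytable
    UNION ALL SELECT id, col2 FROM mytable
    UNION ALL SELECT id, col3 FROM mytable
) t
GROUP BY id;
```

With the sample rows from the question, id 81 has three values but only two distinct ones, so it is flagged YES, while id 82 is flagged NO.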

^† Best for the query you are trying to run. A denormalized storage strategy might be better for some other types of queries.

edited Jul 13, 2017 at 18:06

answered Jul 13, 2017 at 17:01

Bill Karwin


3 Comments

user5683823 Over a year ago

best is a very strong term. For speed of processing, in a warehousing environment data is sometimes (often?) stored the way the OP has it - then queries against this table don't have to do all the grouping / "partition by" (as for analytic functions) again.

Bill Karwin Over a year ago

@mathguy, "best" is relative to the query at hand. Any optimization helps a specific type of query, at the expense of other queries. The OP is asking how to do a specific query, and the data is not stored in the best way for that query. I have edited the answer with a footnote about this.

user5683823 Over a year ago

Right. You know that, and I kind of know that, but future visitors to this thread may not. The clarification you added is perfect - that's precisely what I had in mind.

fg78nc · Accepted Answer · 2017-07-14 06:29:31Z

0

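-- Note: a NULL in col1 is also reported as 'Duplicate', because NULLIF then returns NULL.
SELECT ID, col1, col2,
       NVL2(NULLIF(col1, col2), 'Not duplicate', 'Duplicate')
FROM table;

If you want to compare more than two columns, you can implement the same logic with COALESCE.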

edited Jul 14, 2017 at 6:29

answered Jul 14, 2017 at 5:59

fg78nc


Dheeraj kumar · Accepted Answer · 2017-07-14 06:52:06Z

0

I think you want data that does not contain any duplicate values in the table. If that is right, use a SELECT DISTINCT statement like:

SELECT DISTINCT * FROM TABLE_NAME

The result will contain duplicate-free data.
Note: this also works for a particular column, like:

SELECT DISTINCT col1 FROM TABLE_NAME

answered Jul 14, 2017 at 6:52

Dheeraj kumar
