0

I have a set of rows with many columns. For example,

ID | Col1 | Col2 | Col3 | Duplicate
------------------------------------
81 | 101  | 102  | 101  | YES
82 | 101  | 103  | 104  | NO

I need to calculate the "Duplicate" column. It is duplicate because it has the same value in Col1 and Col3. I know there is the LEAST function, which is similar to the MIN function but with columns. Does something similar to achieve this exists?

The approach I have in mind is to write all possible combinations in a case like this:

SELECT ID, col1, col2, col3, 
       CASE WHEN col1 = col2 or col1 = col3 or col2 = col3 then 1 else 0 end as Duplicate
FROM table

But, I wish to avoid that, since I have too many columns in some cases, and is very prone to errors.

What is the best way to solve this?

2
  • That may be long code to write, and prone to errors, but it is the most efficient way to solve the problem. Alternatively, you could unpivot and look for duplicates with standard tools, but that will take a lot longer (not least because it will have to group again by ID, right after you just gave up that grouping by unpivoting the data). Then: It is not clear how you would use either LEAST or MIN to find out if there are duplicates. And: can null appear in the columns, and if so how do you treat them for deciding if there are duplicates? Commented Jul 13, 2017 at 16:56
  • I don't want to use the LEAST function for this. I was just asking if a function to find duplicates with a syntax similar to LEAST existed. And yes, unpivoting is not a valid option for my situation. Commented Jul 13, 2017 at 17:04

6 Answers 6

2

Hmmm. You are looking for within-row duplicates. This is painful. More recent versions of Oracle support lateral joins. But for just a handful of non-NULL columns, you can do:

select id, col1, col2, col3,
       (case when col1 in (col2, col3) or col2 in (col3) then 1 else 0 end) as Duplicate
from t;

For each additional column, you need to add one more in comparison and update the other in-lists.

Sign up to request clarification or add additional context in comments.

3 Comments

The lateral clause (Oracle 12.1 and above) is an interesting idea, but it is still not clear how it would help. The values are still in a row and need to be unpivoted; but lateral allows that to be done one row at a time, keeping the grouping that is otherwise lost with a standard unpivot. In any case, upvoted for thinking of using lateral if available.
I think I will go with this, however, Lateral is new for me, and I do have Oracle 12c. Where can I find a simple example to learn its usage?
@user7792598 - I posted a solution along those lines. You may google for "Oracle 12.1 LATERAL" and see what comes back.
1

Something like this... note that in the lateral clause we still need to unpivot, but that is one row at a time - resulting in possibly much faster execution than simple unpivot and standard aggregation.

with
     input_data ( id, col1, col2, col3 ) as (
       select 81, 101, 102, 101 from dual union all
       select 82, 101, 103, 104 from dual
     )
-- End of simulated input data (for testing purposes only).
-- Solution (SQL query) begins BELOW THIS LINE.
select i.id, i.col1, i.col2, i.col3, l.duplicates
from   input_data i,
         lateral ( select  case when count (distinct val) = count(val) 
                                then 'NO' else 'YES'
                           end  as duplicates
                   from    input_data
                   unpivot ( val for col in ( col1, col2, col3 ) )
                   where   id = i.id
                 ) l
;

ID  COL1  COL2  COL3  DUPLICATES
--  ----  ----  ----  ----------
81   101   102   101  YES
82   101   103   104  NO 

Comments

0

You can do this by unpivoting and then counting the distinct values per id and checking if it equals the number of rows for that id. Equal means there are no duplicates. Then left join this result to the original table to caclulate the duplicate column.

SELECT t.*,
       CASE WHEN x.id IS NOT NULL THEN 'Yes' ELSE 'No' END AS duplicate
FROM t
LEFT JOIN
  (SELECT id
   FROM
     (SELECT *
      FROM t 
      unpivot (val FOR col IN (col1,col2,col3)) u 
     ) t
   GROUP BY id
   HAVING count(*)<>count(DISTINCT val)
  ) x ON x.id=t.id

Comments

0

The best way is to avoid storing repeating groups of columns. If you have multiple columns that essentially store comparable data (i.e. a multi-valued attribute), move the data to a dependent table, and use one column.

CREATE TABLE child (
 ref_id INT,
 col INT
);

INSERT INTO child VALUES
(81, 101), (81, 102), (81, 101),
(82, 101), (82, 103), (82, 104);

Then it's easier to find cases where a value occurs more than once:

SELECT id, col, COUNT(*)
FROM child
GROUP BY id, col
HAVING COUNT(*) > 1;

If you can't change the structure of the table, you could simulate it using UNIONs:

SELECT id, col1, COUNT(*)
FROM (
    SELECT id, col1 AS col FROM mytable
    UNION ALL SELECT id, col2 FROM mytable
    UNION ALL SELECT id, col3 FROM mytable
    ... for more columns ...
) t
GROUP BY id, col
HAVING COUNT(*) > 1;

Best for the query you are trying to run. A denormalized storage strategy might be better for some other types of queries.

3 Comments

best is a very strong term. For speed of processing, in a warehousing environment data is sometimes (often?) stored the way the OP has it - then queries against this table don't have to do all the grouping / "partition by" (as for analytic functions) again.
@mathguy, "best" is relative to the query at hand. Any optimization helps a specific type of query, at the expense of other queries. The OP is asking how to do a specific query, and the data is not stored in the best way for that query. I have edited the answer with a footnote about this.
Right. You know that, and I kind of know that, but future visitors to this thread may not. The clarification you added is perfect - that's precisely what I had in mind.
0
SELECT ID, col1, col2, 
    NVL2(NULLIF(col1, col2), 'Not duplicate', 'Duplicate')
       FROM table;

If you want to compare more than 2 columns can implement same logic with COALESCE

Comments

0

I think you want to use fresh data that doesnot contains any duplicate values inside table if it right then use SELECT DISTINCT statement like

SELECT DISTINCT * FROM TABLE_NAME

It will conatins duplicate free data,
Note: It will also applicable for a particular column like

SELECT DISTINCT col1 FROM TABLE_NAME

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.