How to find duplicates in SQL where not all columns are duplicate (only some)? [duplicate]

Question

Is it possible in SQL to create a query that returns all rows where some columns are duplicates, but not all?

For example, consider this hypothetical SQL table with five rows:

| Column_A | Column_B | Column_C |
| -------- | -------- | -------- |
| ABC      | DEF      | GHI      |
| ABC      | DEF      | JKL      |
| DEF      | GHI      | GHI      |
| DEF      | GHI      | JKL      |
| ABC      | GHI      | GHI      |

The question I'm asking is this: how can I write a query that selects all rows whose Column_A and Column_B values are both equal to those of at least one other row in the table?

To remove any ambiguity, here is a concrete problem; solving it would resolve my issue:

What SQL query will return exactly these four rows and no other rows?

| Column_A | Column_B | Column_C |
| -------- | -------- | -------- |
| ABC      | DEF      | GHI      |
| ABC      | DEF      | JKL      |
| DEF      | GHI      | GHI      |
| DEF      | GHI      | JKL      |

To do this, the query must check whether the Column_A and Column_B values are duplicated in other rows, while ignoring Column_C.

I thought that using GROUP BY and HAVING would work, but GROUP BY collapses each group into a single output row, so it returns only one row per duplicated (Column_A, Column_B) pair. I need to return every row whose Column_A and Column_B values are duplicated, even when Column_C differs.
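A minimal sketch of the GROUP BY behaviour described above, run in SQLite through Python's standard sqlite3 module. The table name `t` and the in-memory database are assumptions made for illustration. The query reports which pairs are duplicated (one row per pair, plus a count), not the original rows themselves:

```python
import sqlite3

# Hypothetical table "t" holding the five sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Column_A TEXT, Column_B TEXT, Column_C TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("ABC", "DEF", "GHI"),
        ("ABC", "DEF", "JKL"),
        ("DEF", "GHI", "GHI"),
        ("DEF", "GHI", "JKL"),
        ("ABC", "GHI", "GHI"),
    ],
)

# GROUP BY collapses each (Column_A, Column_B) pair into one output row,
# so HAVING can only tell us WHICH pairs are duplicated; Column_C is lost.
dup_pairs = conn.execute(
    """
    SELECT Column_A, Column_B, COUNT(*) AS n
    FROM t
    GROUP BY Column_A, Column_B
    HAVING COUNT(*) > 1
    ORDER BY Column_A, Column_B
    """
).fetchall()

print(dup_pairs)  # [('ABC', 'DEF', 2), ('DEF', 'GHI', 2)]
```

These pairs are exactly what the answers below feed back into a filter on the full table.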

Is this possible in SQL, and if so, how?

select * from t where (a, b) in (select a, b from t group by a, b having count(*) > 1) ?
– The Impaler, Commented Apr 8, 2024 at 19:07
@TheImpaler: works - IF your RDBMS supports this kind of syntax - not all RDBMSs do ... (and the OP unfortunately didn't mention which RDBMS he's using...)
– marc_s, Commented Apr 8, 2024 at 19:31
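For databases that lack the row-value `(a, b) IN (...)` syntax, a common alternative is to join the table back to the grouped duplicate pairs. The sketch below is an assumption, not something posted in the thread; it uses SQLite via Python's sqlite3 and a hypothetical table `t`. The derived table is aliased without `AS` because some systems (Oracle, for example) reject `AS` on table aliases:

```python
import sqlite3

# Hypothetical table "t" holding the five sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Column_A TEXT, Column_B TEXT, Column_C TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("ABC", "DEF", "GHI"),
        ("ABC", "DEF", "JKL"),
        ("DEF", "GHI", "GHI"),
        ("DEF", "GHI", "JKL"),
        ("ABC", "GHI", "GHI"),
    ],
)

# Inner join to the set of duplicated pairs: a row survives only if its
# (Column_A, Column_B) pair appears in "dups". No row-value syntax needed.
rows = conn.execute(
    """
    SELECT t.Column_A, t.Column_B, t.Column_C
    FROM t
    JOIN (
        SELECT Column_A, Column_B
        FROM t
        GROUP BY Column_A, Column_B
        HAVING COUNT(*) > 1
    ) dups
      ON dups.Column_A = t.Column_A
     AND dups.Column_B = t.Column_B
    ORDER BY t.Column_A, t.Column_B, t.Column_C
    """
).fetchall()

print(rows)
# [('ABC', 'DEF', 'GHI'), ('ABC', 'DEF', 'JKL'), ('DEF', 'GHI', 'GHI'), ('DEF', 'GHI', 'JKL')]
```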
Window functions are probably the most efficient: COUNT(*) OVER (PARTITION BY Column_A, Column_B), then check that for > 1
– Charlieface, Commented Apr 8, 2024 at 19:38
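A sketch of the window-function approach from the comment above, assuming a hypothetical table `t` and SQLite 3.25 or later (the first SQLite release with window functions), driven from Python's sqlite3. A window function cannot be referenced in the same query's WHERE clause, so the count is computed in a derived table and filtered outside it:

```python
import sqlite3

# Hypothetical table "t" holding the five sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Column_A TEXT, Column_B TEXT, Column_C TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("ABC", "DEF", "GHI"),
        ("ABC", "DEF", "JKL"),
        ("DEF", "GHI", "GHI"),
        ("DEF", "GHI", "JKL"),
        ("ABC", "GHI", "GHI"),
    ],
)

# Each row gets the size of its (Column_A, Column_B) partition without
# collapsing rows; the outer query keeps rows whose partition has > 1 row.
rows = conn.execute(
    """
    SELECT Column_A, Column_B, Column_C
    FROM (
        SELECT Column_A, Column_B, Column_C,
               COUNT(*) OVER (PARTITION BY Column_A, Column_B) AS pair_count
        FROM t
    ) counted
    WHERE pair_count > 1
    ORDER BY Column_A, Column_B, Column_C
    """
).fetchall()

print(rows)
# [('ABC', 'DEF', 'GHI'), ('ABC', 'DEF', 'JKL'), ('DEF', 'GHI', 'GHI'), ('DEF', 'GHI', 'JKL')]
```

One behavioural difference worth checking against your data: PARTITION BY places NULLs in the same partition, so rows sharing a NULL in Column_A or Column_B count as duplicates here, whereas the `=`-based and `IN`-based queries never match NULLs.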

marc_s · Accepted Answer · 2024-04-08 19:30:59Z


You can select all the columns and, in the WHERE clause, use a subquery to keep only the rows whose (Column_A, Column_B) pair occurs more than once.

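SELECT
    Column_A, Column_B, Column_C
FROM
    my_table        -- placeholder name: TABLE is a reserved word and fails unquoted
WHERE
    -- row-value IN: keep rows whose (Column_A, Column_B) pair is duplicated
    (Column_A, Column_B) IN
         (SELECT Column_A, Column_B
          FROM my_table
          GROUP BY Column_A, Column_B
          HAVING COUNT(*) > 1);   -- pairs that occur more than once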
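To check the accepted query against the sample data, here is a minimal sketch that runs it in SQLite through Python's sqlite3 module. The table name `t` is an assumption, and SQLite needs version 3.15 or later for the row-value `(Column_A, Column_B) IN (...)` syntax:

```python
import sqlite3

# Hypothetical table "t" holding the five sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Column_A TEXT, Column_B TEXT, Column_C TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("ABC", "DEF", "GHI"),
        ("ABC", "DEF", "JKL"),
        ("DEF", "GHI", "GHI"),
        ("DEF", "GHI", "JKL"),
        ("ABC", "GHI", "GHI"),
    ],
)

# The accepted answer's query; ORDER BY is added only to make the output stable.
rows = conn.execute(
    """
    SELECT Column_A, Column_B, Column_C
    FROM t
    WHERE (Column_A, Column_B) IN (
        SELECT Column_A, Column_B
        FROM t
        GROUP BY Column_A, Column_B
        HAVING COUNT(*) > 1
    )
    ORDER BY Column_A, Column_B, Column_C
    """
).fetchall()

print(rows)
# [('ABC', 'DEF', 'GHI'), ('ABC', 'DEF', 'JKL'), ('DEF', 'GHI', 'GHI'), ('DEF', 'GHI', 'JKL')]
```

The result is exactly the four rows requested in the question; the (ABC, GHI, GHI) row is excluded because its pair occurs only once.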

answered Apr 8, 2024 at 19:09 by Élisabeth Louchard; edited Apr 8, 2024 at 19:30 by marc_s


Comment by marc_s (over a year ago):

Word of advice: not all SQL-based RDBMSs support this syntax.....
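Another formulation that avoids row-value syntax is a correlated COUNT(*) subquery: for each row, count how many rows share its Column_A and Column_B, and keep the row if that count exceeds one. This is an alternative not posted in the thread, sketched here in SQLite via Python's sqlite3 with a hypothetical table `t`. It needs no primary key to tell rows apart, but the subquery runs per row, so on large tables an index on (Column_A, Column_B) is likely to matter:

```python
import sqlite3

# Hypothetical table "t" holding the five sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Column_A TEXT, Column_B TEXT, Column_C TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("ABC", "DEF", "GHI"),
        ("ABC", "DEF", "JKL"),
        ("DEF", "GHI", "GHI"),
        ("DEF", "GHI", "JKL"),
        ("ABC", "GHI", "GHI"),
    ],
)

# Correlated subquery: "other" scans the same table, matched on the outer
# row's pair; the outer row itself is counted, hence "> 1".
rows = conn.execute(
    """
    SELECT t.Column_A, t.Column_B, t.Column_C
    FROM t
    WHERE (
        SELECT COUNT(*)
        FROM t other
        WHERE other.Column_A = t.Column_A
          AND other.Column_B = t.Column_B
    ) > 1
    ORDER BY t.Column_A, t.Column_B, t.Column_C
    """
).fetchall()

print(rows)
# [('ABC', 'DEF', 'GHI'), ('ABC', 'DEF', 'JKL'), ('DEF', 'GHI', 'GHI'), ('DEF', 'GHI', 'JKL')]
```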
