1

I need to select duplicate rows based on two columns in a join, and i can't seem to figure out how that is done.

Currently i got this:

SELECT s.name,administrative_site_id as adm_id,s.external_code,si.identifier_value 
FROM suppliers s
INNER JOIN suppliers_identifier si
ON s.id = si.supplier_id

And the output is something along the lines of below:

| Name       | adm_id      | external_code |identifier_value  |
|:-----------|------------:|:------------: |:----------------:|
| Warlob     |     66323   |    ext531     |    id444         |
| Ozzy       |     53123   |    ext632     |    id333         |
| Motorhead  |     521     |    ext733     |    id222         |
| Perez      |     123     |    ext833     |    id111         |
| Starlight  |     521     |    ext934     |    id222         |
| Aligned    |     123     |    ext235     |    id111         |

What i am looking for, is how to simply select these 4 rows, as they are duplicates based on column: adm_id and Identifier_value

| Name       | adm_id      | external_code |identifier_value  |
|:-----------|------------:|:------------: |:----------------:|
| Motorhead  |     521     |    ext733     |    id222         |
| Perez      |     123     |    ext833     |    id111         |
| Starlight  |     521     |    ext934     |    id222         |
| Aligned    |     123     |    ext235     |    id111         |
2

2 Answers 2

3

First group by ADM_ID, IDENTIFIER_VALUE and find groups that has more than one row in it. Then select all rows that has these couples

SELECT S.NAME
      ,ADMINISTRATIVE_SITE_ID AS ADM_ID
      ,S.EXTERNAL_CODE
      ,SI.IDENTIFIER_VALUE
  FROM SUPPLIERS S INNER JOIN SUPPLIERS_IDENTIFIER SI ON S.ID = SI.SUPPLIER_ID
 WHERE (ADMINISTRATIVE_SITE_ID, SI.IDENTIFIER_VALUE) IN (SELECT ADMINISTRATIVE_SITE_ID AS ADM_ID, SI.IDENTIFIER_VALUE
                                                           FROM SUPPLIERS S INNER JOIN SUPPLIERS_IDENTIFIER SI ON S.ID = SI.SUPPLIER_ID
                                                         GROUP BY ADM_ID, IDENTIFIER_VALUE
                                                         HAVING COUNT(*) > 1)
Sign up to request clarification or add additional context in comments.

1 Comment

This does exactly what i wanted! only had to change the GROUP BY to directly reference ADMINISTRATIVE_SITE_ID instead of ADM_ID.
0

Or an alternate way that may perform better on big datasets:

with t as (
SELECT s.name,administrative_site_id as adm_id,s.external_code,si.identifier_value 
COUNT(*) OVER (PARTITION BY administrative_site_id ,identifier_value ) AS cnt
FROM suppliers s
INNER JOIN suppliers_identifier si
ON s.id = si.supplier_id)
select name, adm_id, external_code, identifier_value 
from t
where cnt > 1

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.