MySQL - Finding Duplicated Data from Two Columns

Question

I have an arbitrarily large MySQL table that contains duplicated rows. To determine which rows are duplicated, I need to match on the data from two columns. A modified snippet of the table is below.

mysql> select * from DATA_STATUS where METADATA_ID='6ac00785-abcd-3f4a-defg-12b8ed23abff';
+--------+------------+--------------------------------------+-------------+
| ID     | STATUS     | METADATA_ID                          | METADATA_FK |
+--------+------------+--------------------------------------+-------------+
| 1      |          3 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |        1234 |
| 2      |          3 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |        1234 |
| 3      |          0 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |        1234 |
| 4      |          0 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |        1234 |
| 5      |          1 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |        1234 |
| 6      |          2 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |        1234 |
+--------+------------+--------------------------------------+-------------+

I want to select, across the entire table, the rows whose METADATA_ID appears more than once among the rows that have a STATUS of 3. I know how to query a table for duplicates in one column, but I am struggling to combine the duplicate check with other conditions.

From the example data, the row IDs that match this condition are 1 and 2 but not 3.

EDIT: Additional information for clarification and TL;DR conditions

The overall criteria for a duplicate are STATUS = 3 and a METADATA_ID that occurs more than once (count > 1). The snippet below shows the rows that meet this.

+--------+------------+--------------------------------------+-------------+
| ID     | STATUS     | METADATA_ID                          | METADATA_FK |
+--------+------------+--------------------------------------+-------------+
| 1      |          3 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |        1234 |
| 2      |          3 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |        1234 |
+--------+------------+--------------------------------------+-------------+

I want the query to return either a single row containing the ID, STATUS and METADATA_ID (METADATA_FK is optional) when a duplicate is found, or all instances of the duplication; either is fine. A row is not counted as a duplicate if its STATUS is not 3 or its METADATA_ID exists only once in the table.

Mattia Nocerino · Accepted Answer · 2017-04-21 10:18:49Z

1

Try this:

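select *
from yourtable  -- placeholder name; the question's table is DATA_STATUS
where
  status = 3 and
  metadata_id in (
        -- metadata_id values that occur more than once among the status = 3 rows
        select metadata_id
        from yourtable
        where status = 3
        group by metadata_id
        having count(*) > 1
  );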
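To try the query above without a MySQL server, here is a minimal sketch that runs the same logic against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in for MySQL here, the table and column names are taken from the question's DATA_STATUS snippet, and row 7 is a hypothetical extra row added to show that a STATUS of 3 alone is not enough to qualify.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DATA_STATUS ("
    "ID INTEGER PRIMARY KEY, STATUS INTEGER, METADATA_ID TEXT, METADATA_FK INTEGER)"
)
DUP = "6ac00785-abcd-3f4a-defg-12b8ed23abff"
rows = [
    (1, 3, DUP, 1234),
    (2, 3, DUP, 1234),
    (3, 0, DUP, 1234),
    (4, 0, DUP, 1234),
    (5, 1, DUP, 1234),
    (6, 2, DUP, 1234),
    # Hypothetical row: STATUS is 3 but its METADATA_ID occurs only once.
    (7, 3, "hypothetical-unique-id", 5678),
]
conn.executemany("INSERT INTO DATA_STATUS VALUES (?, ?, ?, ?)", rows)

# Outer filter keeps STATUS = 3 rows; the subquery keeps only METADATA_ID
# values that appear more than once among those STATUS = 3 rows.
query = """
SELECT ID, STATUS, METADATA_ID
FROM DATA_STATUS
WHERE STATUS = 3
  AND METADATA_ID IN (
        SELECT METADATA_ID
        FROM DATA_STATUS
        WHERE STATUS = 3
        GROUP BY METADATA_ID
        HAVING COUNT(*) > 1
  )
ORDER BY ID
"""
duplicate_ids = [row[0] for row in conn.execute(query)]
print(duplicate_ids)  # [1, 2]
```

Rows 3 and 4 share a METADATA_ID but have a STATUS of 0, and row 7 has a STATUS of 3 but no twin, so only rows 1 and 2 come back.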


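If you only need one row per duplicated METADATA_ID, you can use this simpler query. It names the grouped columns and aggregates ID, because select * combined with group by is rejected when ONLY_FULL_GROUP_BY is enabled (the default since MySQL 5.7.5):

select min(id) as id, status, metadata_id, count(*) as duplicates
from yourtable
where status = 3
group by metadata_id, status
having count(*) > 1;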
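As a rough illustration of the one-row-per-duplicate variant, the sketch below groups the STATUS = 3 rows and reports one summary row per duplicated METADATA_ID. It again uses in-memory SQLite as a stand-in for MySQL; the FIRST_ID and COPIES aliases and row 7 are hypothetical names and data chosen for the example, and every selected column is either grouped or aggregated so the statement does not rely on lenient GROUP BY handling.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DATA_STATUS ("
    "ID INTEGER PRIMARY KEY, STATUS INTEGER, METADATA_ID TEXT, METADATA_FK INTEGER)"
)
DUP = "6ac00785-abcd-3f4a-defg-12b8ed23abff"
conn.executemany(
    "INSERT INTO DATA_STATUS VALUES (?, ?, ?, ?)",
    [
        (1, 3, DUP, 1234),
        (2, 3, DUP, 1234),
        (3, 0, DUP, 1234),
        (4, 0, DUP, 1234),
        (5, 1, DUP, 1234),
        (6, 2, DUP, 1234),
        # Hypothetical row: STATUS is 3 but its METADATA_ID occurs only once.
        (7, 3, "hypothetical-unique-id", 5678),
    ],
)

# One summary row per duplicated METADATA_ID: the lowest ID in the group
# plus the number of copies found with STATUS = 3.
query = """
SELECT MIN(ID) AS FIRST_ID, STATUS, METADATA_ID, COUNT(*) AS COPIES
FROM DATA_STATUS
WHERE STATUS = 3
GROUP BY METADATA_ID, STATUS
HAVING COUNT(*) > 1
"""
summary = conn.execute(query).fetchall()
print(summary)  # [(1, 3, '6ac00785-abcd-3f4a-defg-12b8ed23abff', 2)]
```

MIN(ID) is just one way to pick a representative row; MAX(ID) would work equally well if the newest copy is the one of interest.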

edited Apr 21, 2017 at 10:18

answered Apr 21, 2017 at 9:29


7 Comments

Donglecow Over a year ago

Thanks. I tried the previous answer and the updated answer on the actual table. The first query worked as expected, however the updated one didn't. It looks like it's not checking for duplicate metadata_id values.

Mattia Nocerino Over a year ago

The second one is retrieving all the rows that are duplicates (row 1 and row 2) because I thought that was what you were asking. Maybe I didn't get the question; can you provide a sample output from the input you provided?

Donglecow Over a year ago

Sure. I'll edit the question to clarify this and add in some additional rows to further help.

Donglecow Over a year ago

I've made the edit now. I'll try out your updated suggestion too. Sorry I can't use the real data as it contains sensitive information.

Donglecow Over a year ago

Thanks, the top example seems to have worked like a charm! The second example also confirms and is helpful just for finding those METADATA_ID values of records that do have duplicates without pulling back all of the rows that are duplicates.


Kickstart · Accepted Answer · 2017-04-21 11:01:22Z

1

Assuming you want all the records which have those fields duplicated:

SELECT some_table.ID,
       some_table.STATUS,
       some_table.METADATA_ID,
       some_table.METADATA_FK
FROM
(
    -- Combinations that occur more than once among the STATUS = 3 rows
    SELECT STATUS,
           METADATA_ID,
           METADATA_FK
    FROM some_table
    WHERE STATUS = 3
    GROUP BY STATUS, METADATA_ID, METADATA_FK
    HAVING COUNT(*) > 1
) sub0
-- Join back to the table to fetch every row of each duplicated combination
INNER JOIN some_table
ON sub0.STATUS = some_table.STATUS
AND sub0.METADATA_ID = some_table.METADATA_ID
AND sub0.METADATA_FK = some_table.METADATA_FK;

I have assumed that METADATA_FK is part of the uniqueness of a record.
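The effect of that assumption can be seen in a small sketch, again using in-memory SQLite as a stand-in for MySQL. Rows 7 and 8 are hypothetical: they share a METADATA_ID and a STATUS of 3 but differ in METADATA_FK, so the join on all three columns does not treat them as duplicates, while a check on METADATA_ID alone would.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; names follow the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DATA_STATUS ("
    "ID INTEGER PRIMARY KEY, STATUS INTEGER, METADATA_ID TEXT, METADATA_FK INTEGER)"
)
DUP = "6ac00785-abcd-3f4a-defg-12b8ed23abff"
conn.executemany(
    "INSERT INTO DATA_STATUS VALUES (?, ?, ?, ?)",
    [
        (1, 3, DUP, 1234),
        (2, 3, DUP, 1234),
        (3, 0, DUP, 1234),
        # Hypothetical rows: same METADATA_ID and STATUS, different METADATA_FK.
        (7, 3, "hypothetical-shared-id", 1),
        (8, 3, "hypothetical-shared-id", 2),
    ],
)

# Derived table finds duplicated (STATUS, METADATA_ID, METADATA_FK) triples;
# the join pulls back every row belonging to such a triple.
join_query = """
SELECT t.ID
FROM (
    SELECT STATUS, METADATA_ID, METADATA_FK
    FROM DATA_STATUS
    WHERE STATUS = 3
    GROUP BY STATUS, METADATA_ID, METADATA_FK
    HAVING COUNT(*) > 1
) AS sub0
INNER JOIN DATA_STATUS AS t
    ON sub0.STATUS = t.STATUS
   AND sub0.METADATA_ID = t.METADATA_ID
   AND sub0.METADATA_FK = t.METADATA_FK
ORDER BY t.ID
"""
# For contrast: duplicates judged on METADATA_ID alone.
id_only_query = """
SELECT ID
FROM DATA_STATUS
WHERE STATUS = 3
  AND METADATA_ID IN (
        SELECT METADATA_ID FROM DATA_STATUS
        WHERE STATUS = 3
        GROUP BY METADATA_ID
        HAVING COUNT(*) > 1
  )
ORDER BY ID
"""
join_ids = [row[0] for row in conn.execute(join_query)]
id_only_ids = [row[0] for row in conn.execute(id_only_query)]
print(join_ids)     # [1, 2]
print(id_only_ids)  # [1, 2, 7, 8]
```

Which result is wanted depends on whether two rows with the same METADATA_ID but different METADATA_FK values should count as the same record.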

edited Apr 21, 2017 at 11:01

answered Apr 21, 2017 at 10:10


2 Comments

Donglecow Over a year ago

Many thanks. I can't suggest an edit, but my MySQL client threw an error because of the comma on the ON sub0.STATUS = some_table.STATUS, line. Other than that, the query ran through, however it doesn't appear to be checking for a STATUS of 3 which is a condition for identifying the duplicates.

Kickstart Over a year ago

Done the minor fixes for those.
