2

I have an arbitrarily large MySQL table where there are duplicated rows, however to determine which rows are duplicated I need to match the data from two columns. A modified snippet of the table is below.

mysql> select * from DATA_STATUS where METADATA_ID='6ac00785-abcd-3f4a-defg-12b8ed23abff';
+--------+------------+--------------------------------------+-------------+
| ID     | STATUS     |  METADATA_ID                         | METADATA_FK |
+--------+------------+--------------------------------------+-------------+
| 1      |          3 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |       1234  |
+--------+------------+--------------------------------------+-------------+
| 2      |          3 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |       1234  |
+--------+------------+--------------------------------------+-------------+
| 3      |          0 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |       1234  |
+--------+------------+--------------------------------------+-------------+
| 4      |          0 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |       1234  |
+--------+------------+--------------------------------------+-------------+
| 5      |          1 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |       1234  |
+--------+------------+--------------------------------------+-------------+
| 6      |          2 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |       1234  |
+--------+------------+--------------------------------------+-------------+

I want to do a select on the entire table where there are multiple of the same METADATA_ID where the duplicated METADATA_ID rows have a STATUS of 3. I know how to query a table for duplicates in one column, but am struggling to figure out how to match on duplicates and other conditions.

From the example data, the row IDs that match this condition are 1 and 2 but not 3.


EDIT: Additional information for clarification and TL;DR conditions

The overall criteria for a duplicate is STATUS=3 and METADATA_ID > 1, the snippet below shows the rows that meet this.

+--------+------------+--------------------------------------+-------------+
| ID     | STATUS     |  METADATA_ID                         | METADATA_FK |
+--------+------------+--------------------------------------+-------------+
| 1      |          3 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |       1234  |
+--------+------------+--------------------------------------+-------------+
| 2      |          3 | 6ac00785-abcd-3f4a-defg-12b8ed23abff |       1234  |
+--------+------------+--------------------------------------+-------------+

I want the query to either pull back just one row that contains the ID, STATUS and METADATA_ID (METADATA_FK is optional) when a duplicate is found, or all instances of the duplication, either is fine. The data is not counted as a duplicate if STATUS is not 3 or the METADATA_ID only exists once in the table.

2 Answers 2

1

Try this:

select * 
from yourtable
where 
  status_id = 3 and 
  metadata_id in (
        select metadata_id 
        from yourtable
        where status_id = 3 
        group by metadata_id 
        having count(*) > 1
  );

Working example

If not all rows are necessary you can use this simple query:

select * from yourtable where status_id = 3 group by metadata_id having count(*) > 1;
Sign up to request clarification or add additional context in comments.

7 Comments

Thanks. I tried the previous answer and the updated answer on the actual table. The first query worked as expected, however the updated one didn't. It looks like it's not checking for duplicate metadata_id values.
The second one is retrieving all the rows that are duplicates (row 1 and row 2) because i thought that was what you were asking. Maybe i didn't get the question, can you provide a sample output from the input you provided?
Sure. I'll edit the question to clarify this and add in some additional rows to further help.
I've made the edit now. I'll try out your updated suggestion too. Sorry I can't use the real data as it contains sensitive information.
Thanks, the top example seems to have worked like a charm! The second example also confirms and is helpful just for finding those METADATA_ID values of records that do have duplicates without pulling back all of the rows that are duplicates.
|
1

Assuming you want all the records which have those fields duplicated:-

SELECT some_table.ID, 
        some_table.STATUS, 
        some_table.METADATA_ID, 
        some_table.METADATA_FK
FROM
(
    SELECT STATUS, 
        METADATA_ID, 
        METADATA_FK
    FROM some_table
    WHERE status_id = 3
    GROUP BY STATUS, METADATA_ID, METADATA_FK
    HAVING COUNT(*) > 1
) sub0
INNER JOIN some_table
ON sub0.STATUS = some_table.STATUS
AND sub0.METADATA_ID = some_table.METADATA_ID
AND sub0.METADATA_FK = some_table.METADATA_FK

I have assumed that metafata_fk is part of the uniqueness of a record

2 Comments

Many thanks. I can't suggest an edit, but my MySQL client threw an error because of the comma on the ON sub0.STATUS = some_table.STATUS, line. Other than that, the query ran through, however it doesn't appear to be checking for a STATUS of 3 which is a condition for identifying the duplicates.
Done the minor fixes for those.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.