2

I'm writing a python script that searches for duplicate entries in a table in a mysql database using python-mysql-connector. I want the function to output the duplicate entries in a table of customer information. I'm not sure how to store the duplicates and keep track of the index of the item in the table. Should they be stored in a list or set?

import mysql.connector

dbconnect = mysql.connector.connect(host='localhost', user='root', password='wordpass', db='contacts')


cur= dbconnect.cursor(buffered= True)


rows= cur.fetchall()


def find_duplicates(query):

    for row in rows:

       query= cur.execute ("SELECT id, name, address1, city, postal_code COUNT(*) FROM customer " 
    "GROUP BY name, address1, city, postal_code HAVING COUNT(*) > 1")

       if row in cur.fetchone():
         return row
       else:
         cur.fetchone()

1 Answer 1

1

I think you can change your query to return you the full duplicated result set. Something like that should work I think:

SELECT t.* FROM customer AS t 
    INNER JOIN (
        SELECT name, address1, city, postal_code 
        FROM customer GROUP BY name, address1, city, postal_code 
        HAVING COUNT(*) > 1) AS td 
    ON t.name = td.name AND t.address1 = td.address1 
        AND t.city = td.city AND t.postal_code = td.postal_code;

Once you have all the dupes in hand with IDs you can present group them in python easily I think.

Sign up to request clarification or add additional context in comments.

4 Comments

Thanks for your input. Would that query be better than the one I currently have: SELECT id, name, address1, city, postal_code, COUNT() FROM customer GROUP BY name, address1, city, postal_code HAVING COUNT() > 1 For multiple SQL solutions in other posts, this query seems to work.
Your query will only return one result per duplicate set, so you will not display all duplicate entries from the table using it.
I see. I'll try incorporating this query into my code and see if it generates the result I'm looking for. So, it will still be necessary to iterate through the rows like: rows= cur.fetchall() for row in rows: cur.execute(query) #query goes here
Yes, you still need to iterate through the rows to show them to users. If you have too many results look into how to page the results with ORDER BY and LIMIT

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.