I've spent days researching the proper way to find duplicate rows based on specific fields, and I think I need a little more help. This is the query I use to find the duplicates:

    SELECT *
    FROM enrollees
    INNER JOIN (
        -- One row per group of identical enrollees: lowest id and group size
        SELECT first_name, last_name, address1, city, state, zip, program_instance_id,
               MIN(id) AS MinId, COUNT(id) AS count
        FROM enrollees
        GROUP BY first_name, last_name, address1, city, state, zip, program_instance_id
    ) b
    ON enrollees.first_name = b.first_name
        AND enrollees.last_name = b.last_name
        AND enrollees.address1 = b.address1
        AND enrollees.city = b.city
        AND enrollees.state = b.state
        AND enrollees.zip = b.zip
        AND enrollees.program_instance_id = b.program_instance_id
        AND b.count > 1
        AND enrollees.id != b.MinId;

The goal is to take the duplicates and put them in an archive table (enrollees_duplicates), then delete the duplicates from the live table (enrollees). I tried writing one query to find and insert the duplicate rows but it gives me the following error:

"Column count doesn't match value count at row 1"

The query I tried using is:

    INSERT INTO enrollees_duplicates (SELECT *
        FROM enrollees
        INNER JOIN (
            SELECT first_name, last_name, address1, city, state, zip, program_instance_id,
                   MIN(id) AS MinId, COUNT(id) AS count
            FROM enrollees
            GROUP BY first_name, last_name, address1, city, state, zip, program_instance_id
        ) b
        ON enrollees.first_name = b.first_name
            AND enrollees.last_name = b.last_name
            AND enrollees.address1 = b.address1
            AND enrollees.city = b.city
            AND enrollees.state = b.state
            AND enrollees.zip = b.zip
            AND enrollees.program_instance_id = b.program_instance_id
            AND b.count > 1
            AND enrollees.id != b.MinId);

I assume it is because I'm not retrieving all of the columns in the INNER JOIN select? If that's the case, wouldn't it still throw the same error if I changed it to SELECT * (with the MinId and count additions) because there would be two extra columns that don't exist in the new table?

Is there any way to do all of the work in SQL, without having to SELECT the duplicates, store them in a PHP array, and then run further queries to pull each row, INSERT it into the duplicate table, and delete the duplicate row?

My intention was to use two queries. One to insert all duplicate rows into the archive table and another to delete the duplicate rows. If it could, somehow, be made into one query that finds the duplicates, inserts them into the archive table, and then deletes them - all in one run, that would be even better.

I'm new to this field, so any help or guidance would be appreciated.

  • Suggestion: please try to keep your question's description to the point. That helps and even encourages people trying to help you. Commented Oct 18, 2013 at 19:31
  • I was only trying to be thorough and explain exactly what my objective is and which route I was taking, in case a better, more efficient way could be suggested. Sorry if it seems like rambling; I just wanted to give as much information as I could so no one had to ask for more. Commented Oct 18, 2013 at 19:35

2 Answers

"Column count doesn't match value count at row 1"

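This error means the SELECT returns a different number of columns than enrollees_duplicates has. Check whether the tables enrollees_duplicates and enrollees have different structures.

It might be better to use a DELETE trigger on enrollees that copies each deleted row into the archive table (http://dev.mysql.com/doc/refman/5.0/en/create-trigger.html).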
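
As a rough sketch of the trigger idea, the script below archives a row at the moment it is deleted. It makes several assumptions that are not in the question: it uses Python's built-in sqlite3 module instead of MySQL so that it runs on its own, the tables are cut down to three columns, and the trigger name archive_enrollee is hypothetical. In MySQL the equivalent trigger would also need FOR EACH ROW after the table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stand-in for the MySQL database (assumption)
conn.executescript("""
-- Cut-down schema for illustration; the real tables have more columns.
CREATE TABLE enrollees (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE enrollees_duplicates (id INTEGER, first_name TEXT, last_name TEXT);

-- Hypothetical trigger: copy each row into the archive just before it is deleted.
CREATE TRIGGER archive_enrollee BEFORE DELETE ON enrollees
BEGIN
    INSERT INTO enrollees_duplicates (id, first_name, last_name)
    VALUES (OLD.id, OLD.first_name, OLD.last_name);
END;

INSERT INTO enrollees (id, first_name, last_name) VALUES
    (1, 'Ann', 'Lee'), (2, 'Ann', 'Lee'), (3, 'Bob', 'Ray');
""")

# Deleting the duplicate row fires the trigger, which archives it first.
conn.execute("DELETE FROM enrollees WHERE id = 2")
conn.commit()

archived = conn.execute(
    "SELECT id, first_name, last_name FROM enrollees_duplicates"
).fetchall()
remaining = [row[0] for row in conn.execute("SELECT id FROM enrollees ORDER BY id")]
```

One thing to weigh with this design: the trigger fires for every DELETE on enrollees, so rows removed for other reasons would land in the archive table as well.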

The problem was that my outer select was just '*', which returns every column of 'enrollees' plus every column of the subquery b (including MinId and count), so the column count no longer matched the archive table. Selecting only the columns of the 'enrollees' table (enrollees.*) and none of the subquery's columns corrects the column difference.

    INSERT INTO enrollees_duplicates (SELECT enrollees.*
        FROM enrollees
        INNER JOIN (
            -- One row per group of identical enrollees: lowest id and group size
            SELECT first_name, last_name, address1, city, state, zip, program_instance_id,
                   MIN(id) AS MinId, COUNT(id) AS count
            FROM enrollees
            GROUP BY first_name, last_name, address1, city, state, zip, program_instance_id
        ) b
        ON enrollees.first_name = b.first_name
            AND enrollees.last_name = b.last_name
            AND enrollees.address1 = b.address1
            AND enrollees.city = b.city
            AND enrollees.state = b.state
            AND enrollees.zip = b.zip
            AND enrollees.program_instance_id = b.program_instance_id
            AND b.count > 1
            AND enrollees.id != b.MinId);
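
For the second step (deleting the archived rows), one possible approach is sketched below. It is not from the original post: it runs the INSERT above and a follow-up DELETE inside a single transaction, and it uses Python's built-in sqlite3 module as a stand-in for MySQL so the script is runnable on its own. The table definitions, the sample rows, and the function name archive_and_delete are assumptions for illustration, and the DELETE assumes every id in enrollees_duplicates refers to a row that should leave the live table.

```python
import sqlite3

# Same duplicate-finding query as above, selecting only the enrollees columns.
ARCHIVE_SQL = """
INSERT INTO enrollees_duplicates
SELECT enrollees.*
FROM enrollees
INNER JOIN (
    SELECT first_name, last_name, address1, city, state, zip, program_instance_id,
           MIN(id) AS MinId, COUNT(id) AS count
    FROM enrollees
    GROUP BY first_name, last_name, address1, city, state, zip, program_instance_id
) b
ON enrollees.first_name = b.first_name
    AND enrollees.last_name = b.last_name
    AND enrollees.address1 = b.address1
    AND enrollees.city = b.city
    AND enrollees.state = b.state
    AND enrollees.zip = b.zip
    AND enrollees.program_instance_id = b.program_instance_id
    AND b.count > 1
    AND enrollees.id != b.MinId
"""

# Delete exactly the rows that now exist in the archive (assumption: archive ids
# only ever come from the live table).
DELETE_SQL = "DELETE FROM enrollees WHERE id IN (SELECT id FROM enrollees_duplicates)"


def archive_and_delete(conn):
    """Archive the duplicates, then delete them; returns the number of rows deleted."""
    with conn:  # commits if both statements succeed, rolls back if either fails
        conn.execute(ARCHIVE_SQL)
        return conn.execute(DELETE_SQL).rowcount


# Assumed schema, reduced to the columns the query touches.
SCHEMA = ("(id INTEGER, first_name TEXT, last_name TEXT, address1 TEXT, "
          "city TEXT, state TEXT, zip TEXT, program_instance_id INTEGER)")
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrollees " + SCHEMA)
conn.execute("CREATE TABLE enrollees_duplicates " + SCHEMA)
rows = [
    (1, "Ann", "Lee", "1 Main St", "Troy", "NY", "12180", 7),
    (2, "Ann", "Lee", "1 Main St", "Troy", "NY", "12180", 7),  # duplicate of id 1
    (3, "Ann", "Lee", "1 Main St", "Troy", "NY", "12180", 8),  # different program
    (4, "Ann", "Lee", "1 Main St", "Troy", "NY", "12180", 7),  # duplicate of id 1
]
conn.executemany("INSERT INTO enrollees VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)

deleted = archive_and_delete(conn)
kept_ids = [r[0] for r in conn.execute("SELECT id FROM enrollees ORDER BY id")]
archived_ids = [r[0] for r in conn.execute("SELECT id FROM enrollees_duplicates ORDER BY id")]
```

In MySQL the same two statements could be wrapped in START TRANSACTION and COMMIT, provided the tables use a transactional engine such as InnoDB. Deleting by the ids already in the archive, rather than re-running the duplicate search, means a row is only removed from enrollees if its copy was actually stored.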
