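I've spent several days researching the proper way to find duplicate rows based on specific fields, and I think I need a little more help. This query finds the duplicates (every row in a duplicate group except the one with the lowest id):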
SELECT *
FROM enrollees
INNER JOIN (
    SELECT first_name, last_name, address1, city, state, zip, program_instance_id,
           MIN(id) AS MinId, COUNT(id) AS count
    FROM enrollees
    GROUP BY first_name, last_name, address1, city, state, zip, program_instance_id
) b
ON enrollees.first_name = b.first_name
AND enrollees.last_name = b.last_name
AND enrollees.address1 = b.address1
AND enrollees.city = b.city
AND enrollees.state = b.state
AND enrollees.zip = b.zip
AND enrollees.program_instance_id = b.program_instance_id
AND b.count > 1
AND enrollees.id != b.MinId;
The goal is to take the duplicates and put them in an archive table (enrollees_duplicates), then delete the duplicates from the live table (enrollees). I tried writing one query to find and insert the duplicate rows but it gives me the following error:
"Column count doesn't match value count at row 1"
The query I tried using is:
INSERT INTO enrollees_duplicates (SELECT *
FROM enrollees
INNER JOIN (
    SELECT first_name, last_name, address1, city, state, zip, program_instance_id,
           MIN(id) AS MinId, COUNT(id) AS count
    FROM enrollees
    GROUP BY first_name, last_name, address1, city, state, zip, program_instance_id
) b
ON enrollees.first_name = b.first_name
AND enrollees.last_name = b.last_name
AND enrollees.address1 = b.address1
AND enrollees.city = b.city
AND enrollees.state = b.state
AND enrollees.zip = b.zip
AND enrollees.program_instance_id = b.program_instance_id
AND b.count > 1
AND enrollees.id != b.MinId);
I assume it is because I'm not retrieving all of the columns in the INNER JOIN select? If that's the case, wouldn't it still throw the same error if I changed it to SELECT * (with the MinId and count additions) because there would be two extra columns that don't exist in the new table?
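One variant I could try rests on a different reading of the error: SELECT * over a join returns the columns of both enrollees and the derived table b, so the SELECT already produces more columns than the archive table has. Below is a minimal sketch that selects only the live table's columns. It assumes enrollees_duplicates has exactly the same columns, in the same order, as enrollees; the alias e and the name min_id are my own additions, and the count filter is moved into a HAVING clause.

```sql
-- Sketch: archive every duplicate except the lowest-id row of each group.
-- Assumes enrollees_duplicates mirrors the column layout of enrollees.
INSERT INTO enrollees_duplicates
SELECT e.*                          -- only enrollees' columns, none from b
FROM enrollees AS e
INNER JOIN (
    SELECT first_name, last_name, address1, city, state, zip, program_instance_id,
           MIN(id) AS min_id        -- the row to keep in each group
    FROM enrollees
    GROUP BY first_name, last_name, address1, city, state, zip, program_instance_id
    HAVING COUNT(*) > 1             -- only groups that actually have duplicates
) AS b
    ON  e.first_name = b.first_name
    AND e.last_name = b.last_name
    AND e.address1 = b.address1
    AND e.city = b.city
    AND e.state = b.state
    AND e.zip = b.zip
    AND e.program_instance_id = b.program_instance_id
    AND e.id <> b.min_id;           -- skip the kept row
```

If the archive table has extra or reordered columns, an explicit column list on both the INSERT and the SELECT would be needed instead of e.*.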
Is there any way to do all of the work in SQL, without having to SELECT the duplicates, store them in a PHP array, and then run further queries to pull each row, INSERT it into the duplicate table, and delete it from the live table?
My intention was to use two queries. One to insert all duplicate rows into the archive table and another to delete the duplicate rows. If it could, somehow, be made into one query that finds the duplicates, inserts them into the archive table, and then deletes them - all in one run, that would be even better.
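The two-query plan could be sketched roughly as follows, wrapped in a transaction so that both steps succeed or fail together. This assumes both tables use a transactional engine such as InnoDB, that id is the primary key of enrollees, and that archived rows keep their original id in enrollees_duplicates. As far as I can tell, MySQL has no single statement that inserts into one table and deletes from another, so a transaction seems to be the closest thing to "one run".

```sql
START TRANSACTION;

-- Step 1: run the INSERT INTO enrollees_duplicates ... SELECT statement here,
-- so the duplicates are archived before anything is removed.

-- Step 2: delete from the live table only the rows that now exist in the archive.
-- Assumes the archive preserves each row's original id.
DELETE e
FROM enrollees AS e
INNER JOIN enrollees_duplicates AS d
    ON d.id = e.id;

COMMIT;
```

Deleting by joining against the archive, rather than re-running the duplicate search, means a row can only be removed once a copy of it has been stored.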
I'm new to this field, so any help or guidance would be appreciated.