3

I have a mySQL database with a tad under 2 million rows. The database is non-interactive, so efficiency isn't key.

The (simplified) structure I have is:

`id` int(11) NOT NULL auto_increment
  `category` varchar(64) NOT NULL
  `productListing` varchar(256) NOT NULL

Now the problem I would like to solve is, I want to find duplicates on productListing field, merge the data on the category field into a single result - deleting the duplicates.

So given the following data:

+----+-----------+---------------------------+
| id | category  | productListing            |
+----+-----------+---------------------------+
|  1 | Category1 | productGroup1             | 
|  2 | Category2 | productGroup1             | 
|  3 | Category3 | anotherGroup9             | 
+----+-----------+---------------------------+

What I want to end up is with:

+----+----------------------+---------------------------+
| id | category             | productListing            |
+----+----------------------+---------------------------+
|  1 | Category1,Category2  | productGroup1             | 
|  3 | Category3            | anotherGroup9             | 
+----+----------------------+---------------------------+

What's the most efficient way to do this either in pure mySQL query or php?

2 Answers 2

2

I think you're looking for GROUP_CONCAT:

SELECT GROUP_CONCAT(category), productListing
FROM YourTable
GROUP BY productListing

I would create a new table, inserting the updated values, delete the old one and rename the new table to the old one's name:

CREATE TABLE new_YourTable SELECT GROUP_CONCAT(...;
DROP TABLE YourTable;
RENAME TABLE new_YourTable TO YourTable;
-- don't forget to add triggers, indexes, foreign keys, etc. to new table
Sign up to request clarification or add additional context in comments.

2 Comments

This seems to be doing what I want, I guess the next question is - is storing this type of data in a comma separated list the right way to go about things? I'm thinking it may make more sense to create a new table, but unless there is a benefit I won't really bother.
Storing the data in a comma seperated list is not the right way to go. A column should have one value. Otherwise updates would be very inefficient.
0
SELECT  MIN(id), GROUP_CONCAT(category SEPARATOR ',' ORDER BY id), productListing
FROM    mytable
GROUP BY
        productListing

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.