
I'd like to merge rows based on multiple criteria, essentially removing duplicates where I get to define what "duplicate" means. Here is an example table:

     ╔═════╦═══════╦═════╦═══════╗
     ║ id* ║ name  ║ age ║ grade ║
     ╠═════╬═══════╬═════╬═══════╣
     ║  1  ║ John  ║ 11  ║   5   ║
     ║  2  ║ John  ║ 11  ║   5   ║
     ║  3  ║ John  ║ 11  ║   6   ║
     ║  4  ║ Sam   ║ 14  ║   7   ║
     ║  5  ║ Sam   ║ 14  ║   7   ║
     ╚═════╩═══════╩═════╩═══════╝

In my example, let's say I want to treat rows as duplicates when name, age, and grade all match, ignoring id. The result should be:

     ╔═════╦═══════╦═════╦═══════╗
     ║ id* ║ name  ║ age ║ grade ║
     ╠═════╬═══════╬═════╬═══════╣
     ║  1  ║ John  ║ 11  ║   5   ║
     ║  3  ║ John  ║ 11  ║   6   ║
     ║  4  ║ Sam   ║ 14  ║   7   ║
     ╚═════╩═══════╩═════╩═══════╝

I don't particularly care whether the id column is renumbered sequentially afterward, but I suppose that would be nice.

Can I do this in MySQL?

  • Do you mean when you query it, or to update that table? Commented Sep 10, 2015 at 19:55
  • I would like to update the table. Commented Sep 10, 2015 at 19:56
  • You're probably better off dumping the result into a temp table (based on that one answer down there), and then truncating the original table and dumping this data back in. Commented Sep 10, 2015 at 19:57

2 Answers


My suggestion, based on my above comment.

SELECT distinct name, age, grade 
into tempTable
from theTable

This ignores the IDs and writes only the distinct rows into a new table.

Then you can either drop the old table and rename the new one, or truncate the old table and dump this data back in.
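In MySQL and MariaDB, SELECT ... INTO only stores a result in variables or writes it to a file; it cannot create a table, which is why the command fails there with "Undeclared variable: tempTable". A minimal sketch of the same approach in MySQL syntax, using CREATE TABLE ... AS SELECT and the truncate-and-reload route. The sample data, column types, and the AUTO_INCREMENT id are assumptions made so the script runs on its own.

```sql
-- Hypothetical sample data matching the question; column types are assumptions.
DROP TABLE IF EXISTS tempTable;
DROP TABLE IF EXISTS theTable;
CREATE TABLE theTable (
  id    INT AUTO_INCREMENT PRIMARY KEY,
  name  VARCHAR(50),
  age   INT,
  grade INT
);
INSERT INTO theTable (name, age, grade) VALUES
  ('John', 11, 5), ('John', 11, 5), ('John', 11, 6),
  ('Sam', 14, 7), ('Sam', 14, 7);

-- MySQL/MariaDB equivalent of "SELECT ... INTO tempTable": create the table from a query.
CREATE TABLE tempTable AS
SELECT DISTINCT name, age, grade
FROM theTable;

-- Empty the original table; TRUNCATE also resets the AUTO_INCREMENT counter.
TRUNCATE TABLE theTable;

-- Reload the distinct rows; ORDER BY makes the newly assigned ids deterministic.
INSERT INTO theTable (name, age, grade)
SELECT name, age, grade
FROM tempTable
ORDER BY name, age, grade;

DROP TABLE tempTable;
```

Because TRUNCATE resets AUTO_INCREMENT, the surviving rows get ids 1, 2 and 3. Dropping theTable and renaming tempTable instead also works, but the renamed table would have only the three selected columns: CREATE TABLE ... SELECT does not copy the id column or create any indexes.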


5 Comments

Conceptually, this makes sense. I've never used INTO. Does tempTable have to exist before you run that command? When I try it, I get "Undeclared variable: tempTable".
No. In fact, the table should not exist first. This will create the thing for you. The column definitions will be created automatically based on the columns you're using to create it. Depending on your database, you may need to name your table with a schema, like "dbo.tempTable".
Okay, maybe this explains it: stackoverflow.com/questions/2949653/…. I'm using MariaDB.
I didn't realize there was a difference. I've edited my question to remove references to SQL and will propose an edit to your answer.
Well, the question was already plussed up because it seems it's helpful. And the comments here show how we arrived at a solution - up to and including two different methods (in syntax). So, there is no need to edit. In any case - glad to be of help...

You could just delete the duplicates in place like this:

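-- keep the lowest id in each (name, age, grade) group and delete every other row
delete test
from test
inner join (
  select name, age, grade, min(id) as minid
  from test
  group by name, age, grade
) main
  on  test.name  = main.name
  and test.age   = main.age
  and test.grade = main.grade
  and test.id    > main.minid;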

Example: http://sqlfiddle.com/#!9/f1a38/1
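Another common MySQL idiom for the in-place delete is a self-join: every row is compared with the other rows of the same table, and any row that has a twin with the same name, age, and grade but a lower id is removed. The sketch below previews the doomed rows first and then, since the question says sequential ids would be nice, renumbers the survivors. The sample data and column types are assumptions; the renumbering assigns a user variable inside an expression, which recent MySQL 8.0 releases deprecate (it still runs, with a warning).

```sql
-- Hypothetical sample data matching the question; column types are assumptions.
DROP TABLE IF EXISTS test;
CREATE TABLE test (
  id    INT AUTO_INCREMENT PRIMARY KEY,
  name  VARCHAR(50),
  age   INT,
  grade INT
);
INSERT INTO test (name, age, grade) VALUES
  ('John', 11, 5), ('John', 11, 5), ('John', 11, 6),
  ('Sam', 14, 7), ('Sam', 14, 7);

-- Preview: rows that have a lower-id twin and will be deleted.
SELECT DISTINCT t1.*
FROM test AS t1
INNER JOIN test AS t2
  ON  t1.name  = t2.name
  AND t1.age   = t2.age
  AND t1.grade = t2.grade
  AND t1.id    > t2.id;

-- Delete them, so only the lowest id in each group survives.
-- Note: = never matches NULLs; MySQL's null-safe <=> would treat NULLs as equal.
DELETE t1
FROM test AS t1
INNER JOIN test AS t2
  ON  t1.name  = t2.name
  AND t1.age   = t2.age
  AND t1.grade = t2.grade
  AND t1.id    > t2.id;

-- Optional: renumber the surviving ids as 1, 2, 3, ... in their current order.
SET @n := 0;
UPDATE test SET id = (@n := @n + 1) ORDER BY id;
ALTER TABLE test AUTO_INCREMENT = 1;
```

Renumbering in ascending id order never collides with an existing key, because each row's new id is at most its old one. For InnoDB, setting AUTO_INCREMENT below the current maximum simply moves the counter to the maximum plus one.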

