1

I have duplicate results like below where some column may have data and may not

| contact_info | icon | id  | title         | lastmodified_by  |
+--------------+------+-----+---------------+------------------+
|          169 |  305 | 123 | Whakarewarewa | 2011100400305262 |
|         NULL | NULL | 850 | Whakarewarewa | NULL             |
+--------------+------+-----+---------------+----------------



| contact_info | icon | id  | title         | lastmodified_by  |
+--------------+------+-----+---------------+------------------+
|         NULL | NULL | 123 | Paris         | NULL             |
|         NULL | NULL | 850 | Paris         | NULL             |
+--------------+------+-----+---------------+----------------

I want to delete record having less Data and if the all the Field values are exact same then delete any row. There are thousand records like this.

5
  • It is obvious that there would be multiple records else i wouldn't have asked this question here on stackoverflow. Commented Jun 5, 2012 at 8:29
  • This question was askwed trillions of time, here, here, here, here, here... Commented Jun 5, 2012 at 8:44
  • This question is different from all those. Here there are 3 cases 1) Columns having exactly same data 2) One column is having data and other not . 3) Cannot delete all min ids or max ids because data can be either in max id or min id Commented Jun 5, 2012 at 8:48
  • @lorenzo-s this links describe how to remove duplicates, but does not consider NULL values - 'record having less Data'. Commented Jun 5, 2012 at 8:49
  • @saunjean Ok, but at least you could have tried something starting from these answer, and then posting your code together with question. Commented Jun 5, 2012 at 8:49

3 Answers 3

3

Try this two-step solution:

Run this query to vew all duplicates - record having less Data -

SELECT t1.* FROM table t1
  JOIN (
    SELECT
      title,
      MIN(IF(contact_info IS NULL, 0, 1) + IF(contact_info IS NULL, 0, 1) + IF(lastmodified_by IS NULL, 0, 1)) min_value_data,
      MAX(IF(contact_info IS NULL, 0, 1) + IF(contact_info IS NULL, 0, 1) + IF(lastmodified_by IS NULL, 0, 1)) max_value_data
    FROM table GROUP BY title HAVING min_value_data <> max_value_data
  ) t2
  ON t1.title = t2.title AND IF(t1.contact_info IS NULL, 0, 1) + IF(t1.contact_info IS NULL, 0, 1) + IF(t1.lastmodified_by IS NULL, 0, 1) <> t2.max_value_data

Rewrite it to DELETE statement and execute.


Then run this query to remove all duplicates except min ID:

DELETE t1 FROM table t1
  JOIN (SELECT MIN(id) id, title FROM table GROUP BY title) t2
    ON t1.id <> t2.id AND t1.title = t2.title;
Sign up to request clarification or add additional context in comments.

6 Comments

What if any record with min id has no data but max id has data in it. It will delete id with max id and keep id with min id???
Yes, you are right. I read your edited question; and the query should be rewrited.
It is giving this "ERROR 1052 (23000): Column 'title' in field list is ambiguous" . Then i thought to put t1.title in starting of the query but then it gives "ERROR 1054 (42S22): Unknown column 't1.value_data' in 'on clause'"
Add table alias (the field should be t1.title).
I Tried the same thing before posting the comment and the error is still the same "ERROR 1054 (42S22): Unknown column 't1.value_data' in 'on clause'"
|
1

Use this to select duplicates, feel free to alter this to a delete statement:

SELECT * FROM `test`,
(SELECT title, count( title ) AS ttl
FROM `test`
GROUP BY title
HAVING ttl >1) AS sub

WHERE test.title = sub.title
AND contact_info IS NULL AND lastmodified_by IS NULL 

4 Comments

This query fails in the case Where title=Paris . It lists both the Paris in the Result. This is the case where both results are exact duplicates including all fields
Correct, I did not see your Paris example. How about this: Showing rows 0 - 1 (2 total, Query took 0.0007 sec) SELECT * FROM test AS main, ( SELECT title, count( title ) AS ttl FROM test GROUP BY title HAVING ttl >1 ) AS sub WHERE main.title = sub.title AND main.contact_info IS NULL AND main.lastmodified_by IS NULL GROUP BY main.contact_info, main.icon, main.title, main.lastmodified_by
This query is almost correct except the part that Table in which rows to be modified or deleted cannot be referenced in Subquery.so i cannot delete using this query
See my new answer below. using more queries.
0

Main table = tes1

Create temp

CREATE TEMPORARY TABLE my_temp ( id INT(20) NOT NULL ) ENGINE=MEMORY;

Fill with id's to remove

INSERT INTO my_temp (id) SELECT id FROM tes1 AS main, ( SELECT title, count( title ) AS ttl FROM tes1 GROUP BY title HAVING ttl >1 ) AS sub WHERE main.title = sub.title AND main.contact_info IS NULL AND main.lastmodified_by IS NULL GROUP BY main.contact_info, main.icon, main.title, main.lastmodified_by;

Delete!

DELETE FROM tes1 WHERE id IN (select id from my_temp);

Cleanup, note: do we really need this?

DROP TABLE my_temp;

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.