
I found out in this answer that I can easily delete duplicate rows (where duplication is based on N columns) from a table with raw SQL.

Is there an equivalent using the Django ORM? The only material I found for Django dealt with duplicates based on a single column.

Note: I know there is a way to prevent future duplicates (based on several fields) in Django, using the unique_together Meta option (but I didn't know that before).

Thanks.

  • Could you show some of the rows and columns of your table? Commented Sep 21, 2015 at 8:05

2 Answers


A direct translation of the SQL in the other answer into the Django ORM:

from django.db.models import Min
# First select the min ids
min_id_objects = MyModel.objects.values('A', 'B').annotate(minid=Min('id'))
min_ids = [obj['minid'] for obj in min_id_objects]
# Now delete 
MyModel.objects.exclude(id__in=min_ids).delete()

This results in 2 separate SQL queries instead of the single nested query given in the other answer, but I think that is good enough.


3 Comments

If you have default ordering for your model, make sure to remove it by putting .order_by() at the end: min_id_objects = MyModel.objects.values('A', 'B').annotate(minid=Min('id')).order_by(). Otherwise, your default order field will appear in the GROUP BY clause of the query, which could mean you miss duplicates.
@neowang It seems this query could be a problem with a large MyModel dataset and only a few duplicates, because the WHERE ... IN clause created by exclude() can be huge.
One caveat I found: if you want to delete the first object that was created rather than the last one, use Max instead of Min here.
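The two comments above (a huge id list when duplicates are rare, and Min versus Max) can be illustrated with a small plain-Python sketch. It collects the ids to delete rather than the ids to keep, so the list stays short when there are few duplicates. The helper name ids_to_delete and the sample rows are hypothetical; with Django, the rows could presumably come from MyModel.objects.values_list('id', 'A', 'B') and the result be passed to MyModel.objects.filter(id__in=...).delete(). Note that this reads every row into Python, so it trades a large IN clause for memory.

```python
from collections import defaultdict


def ids_to_delete(rows, keep="first"):
    """Return ids of duplicate rows, keeping one id per (A, B) group.

    rows: iterable of (id, A, B) tuples (hypothetical shape).
    keep: "first" keeps the smallest id (like Min),
          "last" keeps the largest id (like Max).
    """
    # Group the ids by the columns that define a duplicate.
    groups = defaultdict(list)
    for row_id, a, b in rows:
        groups[(a, b)].append(row_id)

    pick = min if keep == "first" else max
    doomed = []
    for ids in groups.values():
        if len(ids) > 1:  # only groups that actually contain duplicates
            survivor = pick(ids)
            doomed.extend(i for i in ids if i != survivor)
    return sorted(doomed)


# Made-up sample data: ids 1, 2 and 4 share the same (A, B) values.
rows = [(1, "x", 1), (2, "x", 1), (3, "y", 2), (4, "x", 1), (5, "y", 3)]
print(ids_to_delete(rows))           # [2, 4]
print(ids_to_delete(rows, "last"))   # [1, 2]
```

Only groups with more than one row contribute ids, so the resulting list grows with the number of duplicates, not with the size of the table.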

You can add a RunSQL operation containing the SQL that removes the duplicates to one of your migrations, before the operation that adds the uniqueness constraint.

A remark: if you are using sqlmigrate, RunSQL has the advantage that its SQL is included in the resulting migration SQL.
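As a rough sketch of what such a statement might look like, the snippet below runs one possible de-duplication query against an in-memory SQLite database using only the standard library. The table name myapp_mymodel and the columns a and b are assumptions for illustration, and the SQL is not necessarily the one from the linked answer. In a migration, the same string could be passed as migrations.RunSQL(DEDUP_SQL). Some databases (MySQL, for example) reject a subquery on the table being deleted from, so the statement may need adapting.

```python
import sqlite3

# Hypothetical table and column names; adjust to your own schema.
# Keeps the row with the smallest id in each (a, b) group.
DEDUP_SQL = """
DELETE FROM myapp_mymodel
WHERE id NOT IN (
    SELECT MIN(id) FROM myapp_mymodel GROUP BY a, b
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE myapp_mymodel (id INTEGER PRIMARY KEY, a TEXT, b INTEGER)"
)
# Made-up sample rows; ids 1, 2 and 4 are duplicates on (a, b).
conn.executemany(
    "INSERT INTO myapp_mymodel (a, b) VALUES (?, ?)",
    [("x", 1), ("x", 1), ("y", 2), ("x", 1), ("y", 3)],
)

conn.execute(DEDUP_SQL)
remaining = conn.execute(
    "SELECT id, a, b FROM myapp_mymodel ORDER BY id"
).fetchall()
print(remaining)  # [(1, 'x', 1), (3, 'y', 2), (5, 'y', 3)]
```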

2 Comments

I had actually already added the uniqueness constraint (before deleting the duplicates) and applied the migration.
You can delete everything from the DB, delete the migrations from the point where you added the constraint, add the RunSQL operation, and recreate the migrations (I am assuming this is not in production).
