I'm using SQLAlchemy to manage a database and I'm trying to delete all rows that contain duplicates. The table has an id (primary key) and domain name.

Example:
ID | Domain
1 | example-1.com
2 | example-2.com
3 | example-1.com

In this case I want to delete 1 instance of example-1.com. Sometimes I will need to delete more than 1 but in general the database should not have a domain more than once and if it does, only the first row should be kept and the others should be deleted.

1 Answer

Assuming your model looks something like this:

import sqlalchemy as sa
from sqlalchemy import orm

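# declarative_base is available from sqlalchemy.orm in SQLAlchemy 1.4 and later;
# on older versions use: from sqlalchemy.ext.declarative import declarative_base
Base = orm.declarative_base()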


class Domain(Base):
    __tablename__ = 'domain_names'

    id = sa.Column(sa.Integer, primary_key=True)
    domain = sa.Column(sa.String)

Then you can delete the duplicates like this:

# Create a query that identifies the row for each domain with the lowest id
inner_q = session.query(sa.func.min(Domain.id)).group_by(Domain.domain)
aliased = sa.alias(inner_q)
# Select the rows that do not match the subquery
q = session.query(Domain).filter(~Domain.id.in_(aliased))

# Delete the unmatched rows. The ORM loads each object and deletes it by primary key
# when the session flushes, so this is not a single bulk DELETE ... WHERE ... IN statement
for domain in q:
    session.delete(domain)
session.commit()

# Show remaining rows
for domain in session.query(Domain):
    print(domain)
print()

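If you are not using the ORM, the Core equivalent (written for SQLAlchemy 1.4 or later) is:

meta = sa.MetaData()
# Reflect the existing table; autoload_with alone is enough (autoload=True was removed in 2.0)
domains = sa.Table('domain_names', meta, autoload_with=engine)

# select() takes columns positionally; the list form select([...]) was removed in 2.0
inner_q = sa.select(sa.func.min(domains.c.id)).group_by(domains.c.domain)
# Wrap the subquery in an alias so that it is rendered as a derived table
aliased = sa.alias(inner_q)

# engine.begin() commits on success; with engine.connect() on 2.0 the DELETE
# would be rolled back unless conn.commit() is called explicitly
with engine.begin() as conn:
    conn.execute(domains.delete().where(~domains.c.id.in_(aliased)))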

This answer is based on the SQL provided in this answer. There are other ways of deleting duplicates, which you can see in the other answers on the link, or by googling "sql delete duplicates" or similar.
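For reference, the underlying SQL pattern can be tried without SQLAlchemy at all. The following is a minimal sketch that uses Python's built-in sqlite3 module and the sample rows from the question; the in-memory database and the inserted rows are assumptions made purely for illustration. Some databases (MySQL, for instance) may reject a subquery on the table being deleted from unless it is wrapped in a derived table, which appears to be what the alias in the SQLAlchemy code is for.

```python
import sqlite3

# In-memory database populated with the sample rows from the question
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE domain_names (id INTEGER PRIMARY KEY, domain TEXT)")
conn.executemany(
    "INSERT INTO domain_names (id, domain) VALUES (?, ?)",
    [(1, "example-1.com"), (2, "example-2.com"), (3, "example-1.com")],
)

# Keep the row with the lowest id for each domain and delete every other row
conn.execute(
    """
    DELETE FROM domain_names
    WHERE id NOT IN (
        SELECT MIN(id) FROM domain_names GROUP BY domain
    )
    """
)
conn.commit()

# Show remaining rows
remaining = conn.execute(
    "SELECT id, domain FROM domain_names ORDER BY id"
).fetchall()
print(remaining)  # [(1, 'example-1.com'), (2, 'example-2.com')]
```

Row 3 is removed because it shares its domain with row 1, which has the lower id.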

Comments

What is 'Base' in the class? This code is not runnable. What am I missing that others didn't?
@Apostolos It's the Base class as used in the standard SQLAlchemy docs. I've edited the question to show one way of generating it.
Thanks for your response. I get "AttributeError: module 'sqlalchemy.orm' has no attribute 'declarative_base'". (My 'sqlalchemy' version is 1.3.24.)
For such an old version you'll need to use from sqlalchemy.ext.declarative import declarative_base.
Thanks @snakecharmerb. I just upgraded 'sqlalchemy' to 2.0.23. (These guys seem to work a lot on their packages! 🙂) Strange thing, 'sqlalchemy' was part of the 'chatterbot' package the last version of which I installed a couple of weeks ago ...