python sqlalchemy distinct column values

Question

I have 6 tables in my SQLite database, each with 6 columns (Date, user, NormalA, specialA, contact, remarks) and 1000+ rows.

How can I use SQLAlchemy to go through the Date column, find duplicate dates, and delete the duplicate rows?

Q1: Do you also have a separate primary key column? Q2: Why is the fact that you have 6 tables important for this question?
– van, Commented Mar 20, 2016 at 18:45

van · Accepted Answer · 2016-03-25 18:34:39Z

3

Assuming this is your model:

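from sqlalchemy import Column, DateTime, Integer, String
from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+; earlier: sqlalchemy.ext.declarative

Base = declarative_base()  # base class for mapped classes

class MyTable(Base):
    __tablename__ = 'my_table'
    id = Column(Integer, primary_key=True)
    date = Column(DateTime)
    user = Column(String)
    # the other columns do not matter here; only `id` and `date` are used
    # what matters is that `id` is a primary key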

The following are two ways to delete your data:

1. Find duplicates, mark them for deletion and commit the transaction.
2. Create a single SQL query which will perform the deletion on the database directly.

For both of them a helper sub-query will be used:

from sqlalchemy import func

# helper subquery: find first row (by primary key) for each unique date
subq = (
    session.query(MyTable.date, func.min(MyTable.id).label("min_id"))
    .group_by(MyTable.date)
).subquery('date_min_id')
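To see what the helper query selects, here is a minimal sketch against a throwaway in-memory SQLite database. The sample rows and the in-memory engine are assumptions made for illustration, and the imports assume SQLAlchemy 1.4 or later. It runs the same grouped query directly and collects one (date, lowest id) pair per distinct date, which are the rows that would be kept.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class MyTable(Base):
    __tablename__ = "my_table"
    id = Column(Integer, primary_key=True)
    date = Column(DateTime)
    user = Column(String)


engine = create_engine("sqlite://")  # in-memory database, illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    # hypothetical sample data: ids 1, 2 and 4 share the same date
    session.add_all([
        MyTable(id=1, date=datetime(2016, 3, 1), user="a"),
        MyTable(id=2, date=datetime(2016, 3, 1), user="b"),
        MyTable(id=3, date=datetime(2016, 3, 2), user="c"),
        MyTable(id=4, date=datetime(2016, 3, 1), user="d"),
    ])
    session.commit()

    # same grouping as the helper subquery, executed as a plain query
    rows = (
        session.query(MyTable.date, func.min(MyTable.id).label("min_id"))
        .group_by(MyTable.date)
        .order_by(MyTable.date)
        .all()
    )
    keepers = [(row.date, row.min_id) for row in rows]
    print(keepers)
```

Every row whose id is not listed as a `min_id` for its date is a duplicate under this definition.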

Option-1: Find duplicates, mark them for deletion and commit the transaction

from sqlalchemy import and_

# query to find all duplicates (every row that is not the first one for its date)
q_duplicates = (
    session
    .query(MyTable)
    .join(subq, and_(
        MyTable.date == subq.c.date,
        MyTable.id != subq.c.min_id)
    )
)

for x in q_duplicates:
    print("Will delete %s" % x)
    session.delete(x)
session.commit()

Option-2: Create a single SQL query which will perform deletion on the database directly

sq = (
    session
    .query(MyTable.id)
    .join(subq, and_(
        MyTable.date == subq.c.date,
        MyTable.id != subq.c.min_id)
    )
).subquery("subq")

# bulk delete; dq receives the number of rows matched
dq = (
    session
    .query(MyTable)
    .filter(MyTable.id.in_(sq))
).delete(synchronize_session=False)
session.commit()
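For anyone who wants to try the single-statement delete end to end, the following is a rough sketch on an in-memory SQLite database. It is a variation rather than the exact code above: the id subselect is built with `select()` (SQLAlchemy 1.4 or later assumed) and passed to `in_()` directly, and the sample rows and the names `first_per_date` and `duplicate_ids` are made up for the example.

```python
from datetime import datetime

from sqlalchemy import (
    Column, DateTime, Integer, String, and_, create_engine, func, select,
)
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class MyTable(Base):
    __tablename__ = "my_table"
    id = Column(Integer, primary_key=True)
    date = Column(DateTime)
    user = Column(String)


engine = create_engine("sqlite://")  # in-memory database, illustration only
Base.metadata.create_all(engine)

# first row (lowest primary key) for each distinct date
first_per_date = (
    select(MyTable.date, func.min(MyTable.id).label("min_id"))
    .group_by(MyTable.date)
    .subquery("date_min_id")
)

# ids of every row that shares its date with an earlier row
duplicate_ids = select(MyTable.id).join(
    first_per_date,
    and_(
        MyTable.date == first_per_date.c.date,
        MyTable.id != first_per_date.c.min_id,
    ),
)

with Session(engine) as session:
    # hypothetical sample data: ids 2 and 4 duplicate the date of id 1
    session.add_all([
        MyTable(id=1, date=datetime(2016, 3, 1), user="a"),
        MyTable(id=2, date=datetime(2016, 3, 1), user="b"),
        MyTable(id=3, date=datetime(2016, 3, 2), user="c"),
        MyTable(id=4, date=datetime(2016, 3, 1), user="d"),
    ])
    session.commit()

    # one DELETE statement; no objects are loaded into the session
    deleted = (
        session.query(MyTable)
        .filter(MyTable.id.in_(duplicate_ids))
        .delete(synchronize_session=False)
    )
    session.commit()

    remaining = [row.id for row in session.query(MyTable.id).order_by(MyTable.id)]
    print(deleted, remaining)
```

Because `synchronize_session=False` skips updating objects already loaded in the session, this form is best run before any of the affected rows have been loaded, or followed by expiring the session.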

answered Mar 25, 2016 at 18:34

van


2 Comments

Apostolos Over a year ago

What is 'Base' or what does it stand for in the Class? This code is not runnable without defining it.

van Over a year ago

@Apostolos, please see the Declarative Mapping part of documentation. Base is the base class for mapped classes/tables.

Joost Döbken (last edited by Community) · Accepted Answer · 2017-05-23 12:09:46Z

1

Inspired by "Find duplicate values in SQL table", this might help you select the duplicate dates:

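from sqlalchemy import func

# each date that occurs more than once, with the number of rows that share it
query = session.query(
    MyTable.date, func.count(MyTable.id)
).\
    group_by(MyTable.date).\
    having(func.count(MyTable.id) > 1).all()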

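If you only want to show the unique dates, DISTINCT is what you might need. Note that DISTINCT ON is PostgreSQL-specific and is not available in SQLite, so a plain distinct() on the date column is the option here.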
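As a small illustration of the difference between listing unique dates and listing duplicated ones, here is a sketch on an in-memory SQLite database. The model mirrors the `MyTable` assumed in the other answer, the sample rows are invented, and the imports assume SQLAlchemy 1.4 or later.

```python
from datetime import datetime

from sqlalchemy import Column, DateTime, Integer, String, create_engine, func
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class MyTable(Base):
    __tablename__ = "my_table"
    id = Column(Integer, primary_key=True)
    date = Column(DateTime)
    user = Column(String)


engine = create_engine("sqlite://")  # in-memory database, illustration only
Base.metadata.create_all(engine)

with Session(engine) as session:
    # hypothetical sample data: three rows on one date, one row on another
    session.add_all([
        MyTable(id=1, date=datetime(2016, 3, 1), user="a"),
        MyTable(id=2, date=datetime(2016, 3, 1), user="b"),
        MyTable(id=3, date=datetime(2016, 3, 2), user="c"),
        MyTable(id=4, date=datetime(2016, 3, 1), user="d"),
    ])
    session.commit()

    # SELECT DISTINCT date: every date once, duplicated or not
    unique_dates = [
        row.date
        for row in session.query(MyTable.date).distinct().order_by(MyTable.date)
    ]

    # GROUP BY ... HAVING COUNT > 1: only the dates that repeat
    duplicated = [
        tuple(row)
        for row in session.query(MyTable.date, func.count(MyTable.id))
        .group_by(MyTable.date)
        .having(func.count(MyTable.id) > 1)
    ]
    print(unique_dates, duplicated)
```

`distinct()` only changes what is returned by the SELECT; it does not remove anything from the table.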

edited May 23, 2017 at 12:09

CommunityBot


answered Mar 16, 2016 at 12:39

Joost Döbken


2 Comments

jake wong Over a year ago

Can't do a func.count. There doesn't seem to be that choice.

Parfait Over a year ago

@jakewong Did you import the function: from sqlalchemy import func?

Florent B. · Accepted Answer · 2016-03-23 09:35:24Z

While I like the object-oriented approach of SQLAlchemy, sometimes I find it easier to use SQL directly. Since the records don't have a key, we need the row number (_ROWID_) to delete the targeted records, and I don't think the API provides it.

So first we connect to the database:

from sqlalchemy import create_engine
db = create_engine(r'sqlite:///C:\temp\example.db')
eng = db.engine

Then to list all the records:

# exec_driver_sql requires SQLAlchemy 1.4+
with eng.connect() as conn:
    for row in conn.exec_driver_sql("SELECT * FROM TableA;"):
        print(row)

And to display all the duplicated records where the dates are identical:

# exec_driver_sql requires SQLAlchemy 1.4+
with eng.connect() as conn:
    rows = conn.exec_driver_sql("""
      SELECT * FROM {table}
      WHERE {field} IN (SELECT {field} FROM {table} GROUP BY {field} HAVING COUNT(*) > 1)
      ORDER BY {field};
      """.format(table="TableA", field="Date"))
    for row in rows:
        print(row)

Now that we have identified all the duplicates, they probably need to be fixed first if the other fields differ:

with eng.begin() as conn:  # commits when the block exits without error
    conn.exec_driver_sql("UPDATE TableA SET NormalA=18, specialA=20 WHERE Date = '2016-18-12';")
    conn.exec_driver_sql("UPDATE TableA SET NormalA=4,  specialA=8  WHERE Date = '2015-18-12';")

And finally, to keep the first inserted record for each date and delete the later duplicates:

with eng.begin() as conn:  # commits when the block exits without error
    result = conn.exec_driver_sql("""
      DELETE FROM {table}
      WHERE _ROWID_ NOT IN (SELECT MIN(_ROWID_) FROM {table} GROUP BY {field});
      """.format(table="TableA", field="Date"))
    print(result.rowcount)  # number of rows deleted

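Or, to keep the last inserted record for each date and delete the earlier duplicates:

with eng.begin() as conn:  # commits when the block exits without error
    result = conn.exec_driver_sql("""
      DELETE FROM {table}
      WHERE _ROWID_ NOT IN (SELECT MAX(_ROWID_) FROM {table} GROUP BY {field});
      """.format(table="TableA", field="Date"))
    print(result.rowcount)  # number of rows deleted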
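Since the question mentions several tables with the same layout, the rowid-based delete could be wrapped in a small helper and applied to each table in turn. The sketch below is one possible way to do that, not a tested recipe for the asker's database: the helper name `delete_duplicate_dates`, the table names in `TABLES`, the reduced two-column layout and the sample rows are all hypothetical, and `exec_driver_sql` assumes SQLAlchemy 1.4 or later.

```python
from sqlalchemy import create_engine

engine = create_engine("sqlite://")  # in-memory database, illustration only

TABLES = ["TableA", "TableB"]  # hypothetical table names


def delete_duplicate_dates(conn, table, field="Date"):
    """Keep the lowest-rowid row per distinct `field` value; return rows deleted.

    Identifiers cannot be bound as parameters, so `table` and `field`
    must come from a trusted, hard-coded list, never from user input.
    """
    sql = (
        'DELETE FROM "{t}" WHERE rowid NOT IN '
        '(SELECT MIN(rowid) FROM "{t}" GROUP BY "{f}")'
    ).format(t=table, f=field)
    return conn.exec_driver_sql(sql).rowcount


# set up sample tables; engine.begin() commits when the block exits cleanly
with engine.begin() as conn:
    for name in TABLES:
        conn.exec_driver_sql(
            'CREATE TABLE "{}" (Date TEXT, user TEXT)'.format(name)
        )
        conn.exec_driver_sql(
            'INSERT INTO "{}" (Date, user) VALUES '
            "('2016-03-01', 'a'), ('2016-03-01', 'b'), ('2016-03-02', 'c')".format(name)
        )

# remove the duplicates from every table in one transaction
with engine.begin() as conn:
    deleted = {name: delete_duplicate_dates(conn, name) for name in TABLES}

with engine.connect() as conn:
    kept_users = [
        row[0]
        for row in conn.exec_driver_sql('SELECT user FROM "TableA" ORDER BY rowid')
    ]

print(deleted, kept_users)
```

Running all the deletes inside a single `engine.begin()` block means that either every table is cleaned or, if one statement fails, none of them is changed.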
