6

I want to select the count of all contacts with the same email address that have more than one duplicate. I can't get this query working in SQLAlchey with PostgreSQL.

SELECT count(*), email FROM contact group by email having count(*) > 1

I tried this:

all_records = db.session.query(Contact).options(
    load_only('email')).group_by(Contact.email).having(
    func.count('*') > 1).all()
sqlalchemy.exc.ProgrammingError: (psycopg2.ProgrammingError) column "contact.id" must appear in the GROUP BY clause or be used in an aggregate function
LINE 1: SELECT contact.id AS contact_id, contact.email AS contact_em...
           ^
[SQL: 'SELECT contact.id AS contact_id, contact.email AS contact_email \nFROM contact GROUP BY contact.email \nHAVING count(%(count_1)s) > %(count_2)s'] [parameters: {'count_1': '*', 'count_2': 1}]

And I tried this:

all_records = db.session.query(func.count(Contact.id)).options(
    load_only('email')).group_by(Contact.email).having(
    func.count('*') > 1).all()
sqlalchemy.exc.ArgumentError
sqlalchemy.exc.ArgumentError: Wildcard loader can only be used with exactly one entity.  Use Load(ent) to specify specific entities.

It works correctly if I execute raw SQL:

all_records = db.session.execute(
    "SELECT count(*), email FROM contact group by email"
    " having count(*) > 1").fetchall()

I'm using Flask-SQLAlchemy, but here's a minimal SQLAlchemy setup to demonstrate the issue:

import sqlalchemy as sa
from sqlalchemy import orm
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Contact(Base):
    __tablename__ = 'contact'
    id = sa.Column(sa.Integer, primary_key=True)
    email = sa.Column(sa.String)

engine = sa.create_engine('postgresql:///example', echo=True)
Base.metadata.create_all(engine)
session = orm.Session(engine)
session.add_all((
    Contact(email='[email protected]'),
    Contact(email='[email protected]'),
    Contact(email='[email protected]'),
    Contact(email='[email protected]'),
    Contact(email='[email protected]'),
))
session.commit()

# first failed query
all_records = session.query(Contact).options(
    orm.load_only('email')).group_by(Contact.email).having(
    sa.func.count('*') > 1).all()

# second failed query
all_records = db.session.query(sa.func.count(Contact.id)).options(
    orm.load_only('email')).group_by(Contact.email).having(
    sa.func.count('*') > 1).all()

With the sample data, I expect to get one result row, 3, [email protected].

1
  • 2
    As to why id is added even though you specify load_only(), see this. Commented Oct 27, 2017 at 16:19

2 Answers 2

8

You're not building the same query in SQLAlchemy that you're writing manually.

You want to select the count of each email that has more than one occurrence.

q = session.query(
    db.func.count(Contact.email),
    Contact.email
).group_by(
    Contact.email
).having(
    db.func.count(Contact.email) > 1
)
print(q)
SELECT count(contact.email) AS count_1, contact.email AS contact_email 
FROM contact GROUP BY contact.email 
HAVING count(contact.email) > %(count_2)s

The first query fails because you query the entire model, so SQLAlchemy selects all columns. You can only select grouped columns when using group_by. SQLAlchemy must always select the primary key when querying the entire model, load_only doesn't affect that.

The second query fails because load_only only works when selecting an entire model, but you're selecting an aggregate and a column.

Sign up to request clarification or add additional context in comments.

Comments

4

Just select what you would in a text query:

db.session.query(func.count('*'), Contact.email).\
    group_by(Contact.email).\
    having(func.count('*') > 1).\
    all()

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.