13

I am having hard time optimizing my SQLAlchemy queries. My SQL knowledge is very basic, and I just can't get the stuff I need from the SQLAlchemy docs.

Suppose the following very basic one-to-many relationship:

class Parent(Base):
    __tablename__ = "parents"
    id = Column(Integer, primary_key = True)
    children = relationship("Child", backref = "parent")

class Child(Base):
    __tablename__ = "children"
    id = Column(Integer, primary_key = True)
    parent_id = Column(Integer, ForeignKey("parents.id"))
    naughty = Column(Boolean)

How could I:

  • Query tuples of (Parent, count_of_naughty_children, count_of_all_children) for each parent?

After decent time spent googling, I found how to query those values separately:

# The following returns tuples of (Parent, count_of_all_children):
session.query(Parent, func.count(Child.id)).outerjoin(Child, Parent.children).\
    group_by(Parent.id)
# The following returns tuples of (Parent, count_of_naughty_children):
al = aliased(Children, session.query(Children).filter_by(naughty = True).\
    subquery())
session.query(Parent, func.count(al.id)).outerjoin(al, Parent.children).\
    group_by(Parent.id)

I tried to combine them in different ways, but didn't manage to get what I want.

  • Query all parents which have more than 80% naughty children? Edit: naughty could be NULL.

I guess this query is going to be based on the previous one, filtering by naughty/all ratio.

Any help is appreciated.

EDIT : Thanks to Antti Haapala's help, I found solution to the second question:

avg = func.avg(func.coalesce(Child.naughty, 0)) # coalesce() treats NULLs as 0
# avg = func.avg(Child.naughty) - if you want to ignore NULLs
session.query(Parent).join(Child, Parent.children).group_by(Parent).\
    having(avg > 0.8)

It finds average if children's naughty variable, treating False and NULLs as 0, and True as 1. Tested with MySQL backend, but should work on others, too.

2 Answers 2

12

the count() sql aggretate function is pretty simple; it gives you the total number of non-null values in each group. With that in mind, we can adjust your query to give you the proper result.

print (Query([
    Parent,
    func.count(Child.id),
    func.count(case(
        [((Child.naughty == True), Child.id)], else_=literal_column("NULL"))).label("naughty")])

    .join(Parent.children).group_by(Parent)
    )

Which produces the following sql:

SELECT 
 parents.id AS parents_id, 
 count(children.id) AS count_1, 
 count(CASE WHEN (children.naughty = 1) 
       THEN children.id 
       ELSE NULL END) AS naughty 
FROM parents 
JOIN children ON parents.id = children.parent_id 
GROUP BY parents.id
Sign up to request clarification or add additional context in comments.

Comments

5

If your query is only to get the parents who have > 80 % children naughty, you can on most databases cast the naughty to integer, then take average of it; then having this average greater than 0.8.

Thus you get something like

from sqlalchemy.sql.expression import cast

naughtyp = func.avg(cast(Child.naughty, Integer))
session.query(Parent, func.count(Child.id), naughtyp).join(Child)\
    .group_by(Parent.id).having(naughtyp > 0.8).all()

2 Comments

Thanks a lot, that did the trick. But I didn't mention that in my real model "naughty" could be null - sorry, my fault. Null values are getting ignored by avg, so this solution isn't exactly what I want.
func.coalesce() helped me to fix this ^^

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.