
I am trying to translate a fairly short bit of SQL into a SQLAlchemy ORM query. The SQL uses Postgres's generate_series to produce a set of dates, and my goal is to build a set of time series arrays grouped by one of the columns.

The tables (simplified) are:

counts:
-----------------
count   (Integer)
day     (Date)
placeID (foreign key related to places)

"counts_pkey" PRIMARY KEY (day, placeID)

places:
-----------------
id
name   (varchar)

The output I'm after is a time series of counts for each place including null values when counts are not reported for a day. For example, this would correspond to a series over four days:

    array_agg    |    name
-----------------+-------------------
 {NULL,0,7,NULL} | A Place
 {NULL,1,NULL,2} | Some other place
 {5,NULL,3,NULL} | Yet another

I can do this fairly easily by taking a CROSS JOIN on a date range and places and joining that with the counts:

SELECT array_agg(counts.count), places.name 
FROM generate_series('2018-11-01', '2018-11-04', interval '1 days') as day 
CROSS JOIN  places 
LEFT OUTER JOIN counts on counts.day = day.day AND counts.PlaceID = places.id 
GROUP BY places.name;
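
To make the intent of that query concrete, here is a minimal plain-Python sketch of the same logic: every day in the range is paired with every place (the CROSS JOIN), and a missing count becomes None (the LEFT OUTER JOIN). The sample data and the helper names `date_range` and `time_series` are assumptions made up for illustration; they are not part of the real schema or of any library.

```python
from datetime import date, timedelta

# Hypothetical sample data, invented for this sketch.
places = {1: "A Place", 2: "Some other place"}
counts = {  # keyed by (day, placeID), mirroring the composite primary key
    (date(2018, 11, 2), 1): 0,
    (date(2018, 11, 3), 1): 7,
    (date(2018, 11, 2), 2): 1,
    (date(2018, 11, 4), 2): 2,
}


def date_range(start, stop):
    """Yield every day from start to stop inclusive, like generate_series with a 1-day step."""
    day = start
    while day <= stop:
        yield day
        day += timedelta(days=1)


def time_series(start, stop):
    """Pair every place with every day; days without a count yield None."""
    return {
        name: [counts.get((day, place_id)) for day in date_range(start, stop)]
        for place_id, name in places.items()
    }


result = time_series(date(2018, 11, 1), date(2018, 11, 4))
```

Note that the Python version always lists values in day order, whereas array_agg without an ORDER BY makes no ordering guarantee.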

What I can't seem to figure out is how to get SQLAlchemy to do this. After a lot of digging, I found an old Google Groups thread with an approach that almost works, leading to this:

date_list = select([column('generate_series')])\
.select_from(func.generate_series(backthen, today, '1 day'))\
.alias('date_list')

time_series = db.session.query(Place.name, func.array_agg(Count.count))\
.select_from(date_list)\
.outerjoin(Count, (Count.day == date_list.c.generate_series) & (Count.placeID == Place.id ))\
.group_by(Place.name)

This creates a sub-select for the time series, but it produces a database error:

There is an entry for table "places", but it cannot be referenced from this part of the query.

So my question is: how would you do this in SQLAlchemy? I'm also open to the idea that this is difficult because my approach with the SQL is bone-headed.

1 Answer

The problem is that, given this query construct, SQLAlchemy produces a query along the lines of

SELECT ...
FROM places,
     (...) AS date_list LEFT OUTER JOIN count ON ... AND count."placeID" = places.id
...

There are two FROM-list items: places and the join. FROM-list items cannot reference each other (see note 1), hence the error caused by places.id in the ON clause.

SQLAlchemy does not support an explicit CROSS JOIN, but a CROSS JOIN is equivalent to an INNER JOIN ON (TRUE). You can also skip wrapping the function expression in a subquery and use it directly by giving it an alias:

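from sqlalchemy import column, func, true

# Alias the set-returning function directly; no sub-select is needed.
date_list = func.generate_series(backthen, today, '1 day').alias('gen_day')

# JOIN ... ON true acts as the CROSS JOIN between places and the dates.
time_series = session.query(Place.name, func.array_agg(Count.count))\
    .join(date_list, true())\
    .outerjoin(Count, (Count.day == column('gen_day')) &
                      (Count.placeID == Place.id))\
    .group_by(Place.name)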

1: Except function-call FROM-items, or using LATERAL.
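
The claim that a CROSS JOIN behaves like an INNER JOIN with an always-true condition can be checked with a small sketch. This one uses Python's built-in sqlite3 module rather than Postgres so that it runs without a server. The `days` table is a hypothetical stand-in for generate_series, and `ON 1 = 1` is used as a portable spelling of ON (TRUE); both are assumptions of this illustration, not part of the original query.

```python
import sqlite3

# In-memory SQLite database with made-up sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE days (day TEXT);
    CREATE TABLE places (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO days VALUES ('2018-11-01'), ('2018-11-02');
    INSERT INTO places VALUES (1, 'A Place'), (2, 'Some other place');
""")

# Explicit CROSS JOIN: every day paired with every place.
cross = conn.execute(
    "SELECT days.day, places.name FROM days CROSS JOIN places "
    "ORDER BY days.day, places.name"
).fetchall()

# INNER JOIN with an always-true condition: the same pairs.
inner = conn.execute(
    "SELECT days.day, places.name FROM days INNER JOIN places ON 1 = 1 "
    "ORDER BY days.day, places.name"
).fetchall()

conn.close()
```

Both queries return the same four (day, name) pairs, which is why `.join(date_list, true())` can stand in for the CROSS JOIN.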
