
I've spent the last day trying to get an aggregation over a time series from my db. I tried to use the Django ORM but quickly gave up and went running back to SQL. I don't think there's a way to use PostgreSQL's generate_series with the ORM; I assume they'd prefer you to use itertools or another method in Python.

I have a model much like this:

class Vote(models.Model):
    value = models.IntegerField(default=0)
    timestamp = models.DateTimeField('date voted', auto_now_add=True)
    location = models.ForeignKey('location', on_delete=models.CASCADE)

What I want to do is show a series of metrics over time: for now, an aggregation per hour of the current day for the current user. The user has a timezone set (defaults to 'America/Chicago'). I've been fiddling with the Postgres query, inserting tons of AT TIME ZONE casts in an effort to wrangle the bounds and return values of the query. I had it returning the correct results late last night, but this morning it's off again. I know it's got to be something very dumb that I'm doing. I even resorted to double-casting timestamps because of the weird way Postgres handles AT TIME ZONE (correcting TO UTC instead of FROM).
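For readers puzzled by the "TO instead of FROM" behaviour: in Postgres, AT TIME ZONE works in two directions depending on the input type. Applied to a timestamp without time zone, it treats the value as wall-clock time in the named zone and yields a timestamptz; applied to a timestamptz, it yields the wall-clock time in that zone as a plain timestamp. The following is only a rough Python analogy, not what Postgres runs internally. The helper names are hypothetical, and a fixed UTC-6 offset stands in for 'America/Chicago' in winter.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-6 offset standing in for 'America/Chicago' in winter.
# A real zone object (zoneinfo or pytz) would also follow DST rules.
CHICAGO_WINTER = timezone(timedelta(hours=-6), "CST")


def naive_at_time_zone(naive, tz):
    """Analogy for `timestamp AT TIME ZONE zone`: read the wall-clock value as being in tz."""
    return naive.replace(tzinfo=tz)


def aware_at_time_zone(aware, tz):
    """Analogy for `timestamptz AT TIME ZONE zone`: return the wall-clock value in tz."""
    return aware.astimezone(tz).replace(tzinfo=None)


noon_naive = datetime(2016, 2, 16, 12, 0)
as_instant = naive_at_time_zone(noon_naive, CHICAGO_WINTER)

noon_utc = datetime(2016, 2, 16, 12, 0, tzinfo=timezone.utc)
as_wall_clock = aware_at_time_zone(noon_utc, CHICAGO_WINTER)

print(as_instant.astimezone(timezone.utc))  # 2016-02-16 18:00:00+00:00
print(as_wall_clock)                        # 2016-02-16 06:00:00
```

Chaining two AT TIME ZONE casts therefore flips the type twice, which is one reason stacked casts are easy to get wrong.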

Again, I'd like to show buckets of aggregates for each hour of the user's current day up to/including 'now'.
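To make the target concrete, here is a minimal sketch of the bucket boundaries described above, computed in Python rather than with generate_series. The function name `hour_bucket_starts` is hypothetical, and the fixed UTC-6 offset is again an assumption standing in for the user's zone. It finds the user's local midnight, converts it to UTC, and steps forward one hour at a time up to and including the bucket containing "now".

```python
from datetime import datetime, timedelta, timezone


def hour_bucket_starts(now_utc, user_tz):
    """Return the UTC start time of each hourly bucket from the user's local midnight to now."""
    local_now = now_utc.astimezone(user_tz)
    local_midnight = local_now.replace(hour=0, minute=0, second=0, microsecond=0)
    # Step in UTC so every bucket is exactly one elapsed hour long.
    start = local_midnight.astimezone(timezone.utc)
    buckets = []
    while start <= now_utc:
        buckets.append(start)
        start += timedelta(hours=1)
    return buckets


# Assumption: fixed UTC-6 offset standing in for 'America/Chicago' in winter.
user_tz = timezone(timedelta(hours=-6))
now_utc = datetime(2016, 2, 16, 20, 30, tzinfo=timezone.utc)  # 14:30 local time
buckets = hour_bucket_starts(now_utc, user_tz)
print(len(buckets), buckets[0], buckets[-1])
```

With the values above, the first bucket starts at 06:00 UTC (local midnight) and the last at 20:00 UTC (14:00 local), giving 15 buckets.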

This is my current query:

WITH hour_intervals AS (
    SELECT * FROM generate_series(date_trunc('day',(SELECT TIMESTAMP 'today' AT TIME ZONE 'UTC' AT TIME ZONE %s)), (LOCALTIMESTAMP AT TIME ZONE 'UTC' AT TIME ZONE %s), '1 hour') start_time
)

SELECT f.start_time,
COUNT(id) total,
COUNT(CASE WHEN value > 0 THEN 1 END) AS positive_votes,
COUNT(CASE WHEN value = 0 THEN 1 END) AS indifferent_votes,
COUNT(CASE WHEN value < 0 THEN 1 END) AS negative_votes,
SUM(CASE WHEN value > 0 THEN 2 WHEN value = 0 THEN 1 WHEN value < 0 THEN -4 END) AS score

FROM votes_vote m
RIGHT JOIN hour_intervals f 
        ON m.timestamp AT TIME ZONE %s >= f.start_time AND m.timestamp AT TIME ZONE %s < f.start_time + '1 hour'::interval
        AND m.location_id = %s
GROUP BY f.start_time
ORDER BY f.start_time

DEBUGGING INFO
Django 1.9.2 and my settings.py has USE_TZ=True
Postgres 9.5.2 and my login role for django has

ALTER ROLE yesno_django
  SET client_encoding = 'utf8';
ALTER ROLE yesno_django
  SET default_transaction_isolation = 'read committed';
ALTER ROLE yesno_django
  SET TimeZone = 'UTC';

UPDATE: Fiddling with the query some more, I now have a working query for today's votes:

WITH hour_intervals AS (
    SELECT * FROM generate_series((SELECT TIMESTAMP 'today' AT TIME ZONE 'UTC'), (LOCALTIMESTAMP AT TIME ZONE 'UTC' AT TIME ZONE %s), '1 hour') start_time
)

SELECT f.start_time,
COUNT(id) total,
COUNT(CASE WHEN value > 0 THEN 1 END) AS positive_votes,
COUNT(CASE WHEN value = 0 THEN 1 END) AS indifferent_votes,
COUNT(CASE WHEN value < 0 THEN 1 END) AS negative_votes,
SUM(CASE WHEN value > 0 THEN 2 WHEN value = 0 THEN 1 WHEN value < 0 THEN -4 END) AS score

FROM votes_vote m
RIGHT JOIN hour_intervals f 
        ON m.timestamp AT TIME ZONE %s >= f.start_time AND m.timestamp AT TIME ZONE %s < f.start_time + '1 hour'::interval
        AND m.location_id = %s
GROUP BY f.start_time
ORDER BY f.start_time

How come the query I had earlier worked perfectly from 7pm to 10pmish last night but then fails today? Should I expect this new query to fall down as well?

Can someone explain where I went wrong the first time (or every time)?

  • Why can't you use DATE_TRUNC? Django has a built-in option for using it. Commented Feb 16, 2016 at 16:26
  • @GwynBleidD like this? votes = Vote.objects.filter(location=l).filter(timestamp__date=timezone.now().date()).extra({"hour":"date_trunc('hour',timestamp)"}).values("hour").order_by().annotate(score=score_annotation, count=Count('id')) I think it's close -- I'm going to play with this method a bit more. thanks! Commented Feb 16, 2016 at 17:07
  • I meant date_trunc from SQL, but if you don't strictly have to use your method to generate that query, I can post a full answer that creates pretty much the same results. Commented Feb 16, 2016 at 17:13
  • @GwynBleidD I'd love to see it -- the QuerySet I posted above doesn't work, actually. Commented Feb 16, 2016 at 17:37

2 Answers

First, add related_name='votes' to your foreign key to location for better control. Then, using the location model, you can do:

from django.db.models import Count, Case, Sum, When, IntegerField
from django.db.models.expressions import DateTime

queryset = location.objects.annotate(
    datetimes=DateTime('votes__timestamp', 'hour', tz),
    positive_votes=Count(Case(
        When(votes__value__gt=0, then=1),
        default=None,
        output_field=IntegerField())),
    indifferent_votes=Count(Case(
        When(votes__value=0, then=1),
        default=None,
        output_field=IntegerField())),
    negative_votes=Count(Case(
        When(votes__value__lt=0, then=1),
        default=None,
        output_field=IntegerField())),
    score=Sum(Case(
        When(votes__value__lt=0, then=-4),
        When(votes__value=0, then=1),
        When(votes__value__gt=0, then=2),
        output_field=IntegerField())),
    ).values_list('datetimes', 'positive_votes', 'indifferent_votes', 'negative_votes', 'score').distinct().order_by('datetimes')

That will generate statistics for each location. You can of course filter it to any location or time range.
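Note that a queryset like this returns rows only for hours that contain at least one vote, whereas generate_series yields a complete timeline. One possible way to fill the gaps in Python is sketched below. The function name `fill_missing_hours` is hypothetical, the tuple order follows the values_list call above, and using zero for empty hours is a choice made here (the raw SQL version would return NULL for score in an empty bucket).

```python
from datetime import datetime, timedelta


def fill_missing_hours(rows, first_hour, last_hour):
    """Insert zeroed rows for hours that have no votes.

    rows: iterable of (hour_start, positive, indifferent, negative, score) tuples.
    """
    by_hour = {row[0]: row for row in rows}
    filled = []
    hour = first_hour
    while hour <= last_hour:
        # Keep the real row if it exists, otherwise emit a zeroed placeholder.
        filled.append(by_hour.get(hour, (hour, 0, 0, 0, 0)))
        hour += timedelta(hours=1)
    return filled


# Made-up sample rows for illustration only: votes at 09:00 and 11:00, none at 10:00.
sample_rows = [
    (datetime(2016, 2, 16, 9, 0), 2, 1, 0, 5),
    (datetime(2016, 2, 16, 11, 0), 0, 0, 1, -4),
]
timeline = fill_missing_hours(
    sample_rows, datetime(2016, 2, 16, 9, 0), datetime(2016, 2, 16, 11, 0)
)
for row in timeline:
    print(row)
```

The hour keys must match the values the database returns exactly (same truncation and the same aware or naive form), otherwise the dictionary lookup will miss.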

5 Comments

Thanks! I took the SQL this generates and ran it against my db and got the right results...however, I'm getting ValueError: Database returned an invalid value in QuerySet.datetimes(). Are time zone definitions for your database and pytz installed? when calling from Django. Looks like there might be a bug: code.djangoproject.com/ticket/25937#comment:1
One note: there is one regression with this approach. It leaves gaps where there are no results, whereas generate_series gives a complete timeline.
If tz is None and you have timezone support globally enabled in Django, it will throw that error, so you must set the timezone every time. And yes, one drawback of that query is that it omits hours without any votes.
I set tz = timezone.get_current_timezone() before calling the query. Should it be done a different way? timezone.activate(timezone.get_current_timezone()) ?
If you have USE_TZ set to True, you must pass a time zone object as the third parameter of DateTime. If you have USE_TZ set to False, try passing None instead.

If the datetime fields you are dealing with allow nulls, you can work around https://code.djangoproject.com/ticket/25937 with the following:

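from datetime import datetime
from django.db.models import Count, Value
from django.db.models.functions import Coalesce, TruncMonth
from django.utils import timezone

Potato.objects.annotate(
    time=Coalesce(
        TruncMonth('removed', tzinfo=timezone.UTC()),
        Value(datetime.min.replace(tzinfo=timezone.UTC()))),  # sentinel for NULL
).values('time').annotate(c=Count('pk'))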

This replaces the NULL times with an easy-to-spot sentinel. If you were already using datetime.min, you'll have to come up with something else.

I'm using this in production, but I've found that where TruncMonth() on its own would give you local time, once you put Coalesce() around it you can get only naive or UTC values.
