django queryset aggregation count counting wrong thing

Question

This is a continuation question from:
Django queryset get distinct column values with respect to other column

My Problem:

Using aggregate count in Django counts the wrong thing in the queryset, or as far as I can see something that is not even in my queryset.

What I did I used:

queryset.order_by('col1', 'col2').distinct('col1', 'col2').values('col2')

to get the values of col2 of a model where all the rows have a distinct pair of values in (col1, col2). There is an example in the link above. I printed my queryset and it looks good, I have

[{'col2': value1}, ... , {'col2': value1},{'col2': value2}, ..., {'col2': value2},...]

I now want to count how much each value appears in the queryset I got. I do this using aggregation count. I have:

a = {'count': Count(F(col2), distinct=False)}
queryset.annotate(**a)

I tried this with ``distinct=True` as well but no luck

I would expect to get [{col2:value1, count: num1}, {col2:value2, count: num2} ...].
Instead I get [{col2: value1, count: num11}, {col2: value1, count: num12}, ... ,{col2: value1, count: num_1n}, {col2: value2, count: num21}, ... ,{col2: value1, count: num_2n}, ...]. Where as far as I can tell num11, ...., num_1n are the amount of lines value1 existed in col2 with any specific value in col1, previous to me using order_by('col1', 'col2').distinct('col1', 'col2').values('col2') when preparing the query.

I can't figure out what can cause this. I tried looking in the queryset.query parameter but I can't understand where I am going wrong.

Any help would be greatly appreciated.

Shouldn't you Count(F('col1'))?

willeM_ Van Onsem
– willeM_ Van Onsem

2018-10-20 15:37:58 +00:00
Commented Oct 20, 2018 at 15:37 — willeM_ Van Onsem
– willeM_ Van Onsem, Commented Oct 20, 2018 at 15:37

willeM_ Van Onsem · Accepted Answer · 2018-10-20 15:40:59Z

4

The .order_by should specify only 'col2', like:

queryset.values('col2').annotate(
    count=Count('col1', distinct=True)
).order_by('col2')

This will thus yield a QuerySet that looks like:

< QuerySet [
    {'col2': 1, 'count': 4 },
    {'col2': 2, 'count': 2 }
]>

So that means that there are four distinct values for col1 given col2 has value 1, and two distinct values for col1 given col2 has value 2.

This will construct a query like:

SELECT col2, COUNT(DISTINCT col1) AS count
FROM some_table
GROUP BY col2
ORDER BY col2

The .distinct(..) is here not necessary since due to the GROUP BY we will only obtain distinct col2 values, and because we COUNT(DISTINCT ..) this thus means that each distinct value of col1 is counted once.

answered Oct 20, 2018 at 15:40

willeM_ Van Onsem

482k33 gold badges483 silver badges624 bronze badges

Sign up to request clarification or add additional context in comments.

Collectives™ on Stack Overflow

django queryset aggregation count counting wrong thing

1 Answer 1

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related