
I have the following model:

class Item(models.Model):

    unique_code = models.CharField(max_length=100)
    category_code = models.CharField(max_length=100)
    label = models.CharField(max_length=100)

I would like to get:

  • the count of the different category_codes used

  • the count of the different unique_codes used

  • the count of the different combinations of category_code and unique_code used


Any ideas?

3 Answers


Django ORM solution, as requested:

the count of the different category_codes used:

category_codes_cnt = Item.objects.values('category_code').distinct().count()

count of the different unique_codes used:

unique_codes_cnt = Item.objects.values('unique_code').distinct().count()

count of the different combination of category_code and unique_code used:

codes_cnt = Item.objects.values('category_code', 'unique_code').distinct().count()
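
If you'd rather get the first two counts in a single round trip, Django's Count aggregate accepts distinct=True; this is standard ORM, though the result keys below are just illustrative names:

from django.db.models import Count

counts = Item.objects.aggregate(
    category_codes_cnt=Count('category_code', distinct=True),
    unique_codes_cnt=Count('unique_code', distinct=True),
)
# -> {'category_codes_cnt': ..., 'unique_codes_cnt': ...}

The combination count still needs the values().distinct().count() form above, since Count() aggregates a single expression.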

Don't waste too much time trying to finesse a cool SQL solution.

from collections import defaultdict

# One pass over the table, tallying occurrences of each code and code pair.
count_cat_code = defaultdict(int)
count_unique_code = defaultdict(int)
count_combo_code = defaultdict(int)
for obj in Item.objects.all():
    count_cat_code[obj.category_code] += 1
    count_unique_code[obj.unique_code] += 1
    count_combo_code[obj.category_code, obj.unique_code] += 1

# The three counts you asked for are then just the dict sizes:
# len(count_cat_code), len(count_unique_code), len(count_combo_code).

That will do it. And it will work reasonably quickly. Indeed, if you do some benchmarking, you may find that -- sometimes -- it's as fast as a "pure SQL" statement.

[Why? Because an RDBMS must use a fairly inefficient algorithm for GROUP BY and counts. In Python we have the luxury of making assumptions based on our application and our knowledge of the data. In this case, for example, I assumed it would all fit in memory, an assumption the RDBMS's internal algorithms cannot make.]
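
If memory is a worry with a big table, here is a minimal variation of the same loop using the standard values_list() and iterator() queryset methods, so only the two columns are fetched and rows are streamed rather than cached; collections.Counter is interchangeable with the defaultdicts above:

from collections import Counter

count_cat_code = Counter()
count_unique_code = Counter()
count_combo_code = Counter()
# Fetch only the two columns, and stream rows instead of
# caching the whole queryset in memory.
rows = Item.objects.values_list('category_code', 'unique_code').iterator()
for category_code, unique_code in rows:
    count_cat_code[category_code] += 1
    count_unique_code[unique_code] += 1
    count_combo_code[category_code, unique_code] += 1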

Comments

@S.Lott: Thanks!! :) But what if it contains 1 million rows, which is roughly the case here. Any tips?
Doesn't matter. SQL isn't much faster than Python for this. All million rows must be read no matter what. SQL often has to sort them into temporary storage before it can do the GROUP BY, leading to O(n log n) performance. In Python you make one fast pass through the data, collecting everything you need. Your hash tables are all O(1), so the whole thing is O(n).
OTOH, that does mean every row has to be marshalled/transmitted/unmarshalled, which may turn out to be more overhead than hashing. Don't guess, measure. At least for a small example I used, PostgreSQL will do a hash aggregate anyway.
@araqnid: "does mean every row has to be marshalled/transmitted/unmarshalled". Yes. "which may turn out to be more overhead than hashing"? What? The database can't do the hashing for you. And GROUP BY in SQL may not involve any hashing. GROUP BY in SQL often involves a huge, expensive sort. A table scan can be faster.
This will be significantly slower than a SQL query that calculates the totals within the database and transmits only the results to the client, especially if the objects are large and plentiful.
select count(distinct unique_code) as unique_code_count,
       count(distinct category_code) as category_code_count,
       count(*) as combination_count
from (select unique_code, category_code
      from item
      group by unique_code, category_code) combination
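
To run that from Django rather than a database shell, here is a minimal sketch using the standard connection.cursor() API; note the table name item matches the SQL above, while a default Django project would name it <app_label>_item:

from django.db import connection

query = """
select count(distinct unique_code) as unique_code_count,
       count(distinct category_code) as category_code_count,
       count(*) as combination_count
from (select unique_code, category_code
      from item
      group by unique_code, category_code) combination
"""
with connection.cursor() as cursor:
    cursor.execute(query)
    unique_code_count, category_code_count, combination_count = cursor.fetchone()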

