Python collections.Counter() runtime

Question

I just ran into a problem where I need to turn a list, e.g. l = [1, 2, 3, 4], into a dict, e.g. {1: 1, 2: 1, 3: 1, 4: 1}. I want to know whether I should use collections.Counter() or write a loop myself to do this. Is the built-in method faster than writing the loop myself?

Use collections.Counter() because this is what it does. Why write your own code? ;)
– Moinuddin Quadri, Commented Nov 9, 2016 at 18:29
You can always test if something is faster, with the timeit module. In Python 3, the Counter object has C performance improvements and is very fast indeed.
– Martijn Pieters, Commented Nov 9, 2016 at 18:39
@MartijnPieters: I measured it with timeit, and my own code is faster than Counter. Maybe I did something wrong, but everything looks the same to me.
– Moinuddin Quadri, Commented Nov 9, 2016 at 18:43
@anonymous: I did mention that in Python 3 it is faster. Are you using Python 2 perhaps? See my answer below for a benchmark.
– Martijn Pieters, Commented Nov 9, 2016 at 18:47

Martijn Pieters · Accepted Answer · 2016-11-09 19:03:17Z

5

You can always test if something is faster, with the timeit module. In Python 3, the Counter object has C performance improvements and is very fast indeed:

>>> from timeit import timeit
>>> import random, string
>>> from collections import Counter, defaultdict
>>> def count_manually(it):
...     res = defaultdict(int)
...     for el in it:
...         res[el] += 1
...     return res
...
>>> test_data = [random.choice(string.printable) for _ in range(10000)]
>>> timeit('count_manually(test_data)', 'from __main__ import test_data, count_manually', number=2000)
1.4321454349992564

>>> timeit('Counter(test_data)', 'from __main__ import test_data, Counter', number=2000)
0.776072466003825

Here Counter() was nearly twice as fast (roughly 0.78 s versus 1.43 s for 2000 runs).

That said, unless you are counting in a performance-critical section of your code, focus on readability and maintainability, and in that respect a Counter() wins hands-down over write-your-own code.

Next to all that, Counter() objects offer functionality on top of dictionaries: they can be treated as multisets (you can sum or subtract counters, and produce unions or intersections), and they can efficiently give you the top N elements by count.
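As a brief illustration of those extras, here is a minimal sketch; the two sample strings are made up for the example and are not part of the original benchmark:

```python
from collections import Counter

# Two sample multisets of characters (illustrative data only).
a = Counter("abracadabra")   # a:5, b:2, r:2, c:1, d:1
b = Counter("alakazam")      # a:4, l:1, k:1, z:1, m:1

total = a + b    # sum: counts are added per element
diff = a - b     # subtraction: only positive counts are kept
union = a | b    # union: maximum count per element
inter = a & b    # intersection: minimum count per element

# The N most frequent elements, as (element, count) pairs.
top_two = a.most_common(2)
print(total["a"], dict(diff), dict(inter), top_two[0])
```

A plain dict or defaultdict would need hand-written loops for each of these operations.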

edited Nov 9, 2016 at 19:03

answered Nov 9, 2016 at 18:40

Martijn Pieters


2 Comments

Moinuddin Quadri Over a year ago

Yes you are right. In Python 2 it is slower, but faster in Python 3

Sharon Tan Over a year ago

Thanks very much!!

Moinuddin Quadri · Accepted Answer · 2016-11-09 20:00:14Z

1

It depends on whether you prioritize readability or efficiency. Let's look at both implementations first. I will use this list for the sample run:

my_list = [1, 2, 3, 4, 4, 5, 4, 3, 2]

Using collections.Counter():

from collections import Counter
d = Counter(my_list)

Using collections.defaultdict() to create my own counter:

from collections import defaultdict
d = defaultdict(int)
for i in my_list:
    d[i] += 1

As you can see, collections.Counter() is more readable.

Let's look at efficiency using timeit:

In Python 2.7:

mquadri$ python -m "timeit" -c "from collections import defaultdict" "d=defaultdict(int)" "for i in [1, 2, 3, 4, 4, 5, 4, 3, 2]: d[i] += 1"
100000 loops, best of 3: 2.95 usec per loop

mquadri$ python -m "timeit" -c "from collections import Counter" "Counter([1, 2, 3, 4, 4, 5, 4, 3, 2])"
100000 loops, best of 3: 6.5 usec per loop

collections.Counter() is about twice as slow as the hand-written loop (6.5 vs 2.95 usec).

In Python 3:

mquadri$ python3 -m "timeit" -c "from collections import defaultdict" "d=defaultdict(int)" "for i in [1, 2, 3, 4, 4, 5, 4, 3, 2]: d[i] += 1"
100000 loops, best of 3: 3.1 usec per loop

mquadri$ python3 -m "timeit" -c "from collections import Counter" "Counter([1, 2, 3, 4, 4, 5, 4, 3, 2])"
100000 loops, best of 3: 5.57 usec per loop

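With this nine-element list, collections.Counter() is still slower than the hand-written loop in Python 3, by a factor of about 1.8 (5.57 vs 3.1 usec). The input is probably too small for Counter's C-accelerated counting to outweigh the cost of constructing the object; on larger inputs, such as the 10,000-element benchmark in the other answer, Counter() comes out ahead.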
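To check how input size affects the comparison, a sketch along the following lines could be used. It assumes Python 3.6 or later, and the helper names count_manually and compare are chosen for this example. The test data and functions are passed in through timeit's globals argument, so only the counting itself is timed. No timings are shown here because they vary by machine and Python version.

```python
import random
import string
from collections import Counter, defaultdict
from timeit import timeit


def count_manually(items):
    """Count occurrences with a plain loop over a defaultdict."""
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return counts


def compare(size, number=200):
    """Return (loop_seconds, counter_seconds) for a random list of `size` items."""
    data = [random.choice(string.printable) for _ in range(size)]
    # Names made visible to the timed statements; building the data is not timed.
    env = {"data": data, "count_manually": count_manually, "Counter": Counter}
    loop_time = timeit("count_manually(data)", globals=env, number=number)
    counter_time = timeit("Counter(data)", globals=env, number=number)
    return loop_time, counter_time


if __name__ == "__main__":
    for size in (9, 100, 10_000):
        loop_time, counter_time = compare(size)
        print(f"n={size:>6}: loop {loop_time:.4f}s, Counter {counter_time:.4f}s")
```

Running it on both a tiny and a large list shows whether the fixed overhead or the per-element cost dominates on your interpreter.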

edited Nov 9, 2016 at 20:00

answered Nov 9, 2016 at 18:46

Moinuddin Quadri


1 Comment

manesioz Over a year ago

Still I think it's safe to say that it's O(n) complexity either way.
