I just ran into a problem where I need to turn a list, e.g. l = [1, 2, 3, 4], into a dict of counts, e.g. {1: 1, 2: 1, 3: 1, 4: 1}. Should I use collections.Counter() or write the loop myself? Is the built-in method faster than a hand-written loop?
2 Answers
You can always test if something is faster, with the timeit module. In Python 3, the Counter object has C performance improvements and is very fast indeed:
>>> from timeit import timeit
>>> import random, string
>>> from collections import Counter, defaultdict
>>> def count_manually(it):
...     res = defaultdict(int)
...     for el in it:
...         res[el] += 1
...     return res
...
>>> test_data = [random.choice(string.printable) for _ in range(10000)]
>>> timeit('count_manually(test_data)', 'from __main__ import test_data, count_manually', number=2000)
1.4321454349992564
>>> timeit('Counter(test_data)', 'from __main__ import test_data, Counter', number=2000)
0.776072466003825
Here Counter() was roughly twice as fast (about 1.8 times).
That said, unless you are counting in a performance-critical section of your code, focus on readability and maintainability, and in that respect Counter() wins hands-down over write-your-own code.
Next to all that, Counter() objects offer functionality on top of dictionaries: they can be treated as multisets (you can sum or subtract counters, and produce unions or intersections), and they can efficiently give you the top N elements by count.
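As a minimal sketch of those multiset operations (the sample lists a and b below are made up for illustration and are not from the question):

```python
from collections import Counter

# Two illustrative counters; the input lists are arbitrary sample data.
a = Counter([1, 2, 2, 3, 3, 3])   # counts: 1 -> 1, 2 -> 2, 3 -> 3
b = Counter([2, 3, 3, 4])         # counts: 2 -> 1, 3 -> 2, 4 -> 1

total = a + b   # add the counts of each element
diff = a - b    # subtract counts, keeping only positive results
union = a | b   # per-element maximum of the two counts
inter = a & b   # per-element minimum of the two counts

# The two most frequent elements of a, as (element, count) pairs.
top_two = a.most_common(2)

print(total, diff, union, inter, top_two, sep="\n")
```

Doing the same with a plain dict or defaultdict would need a separate loop for each of these operations.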
It depends on whether you care more about readability or efficiency. Let's look at both implementations first. I will use this list for the sample runs:
my_list = [1, 2, 3, 4, 4, 5, 4, 3, 2]
Using collections.Counter():
from collections import Counter
d = Counter(my_list)
Using collections.defaultdict() to build my own counter:
from collections import defaultdict
d = defaultdict(int)
for i in [1, 2, 3, 4, 4, 5, 4, 3, 2]:
    d[i] += 1
As you can see, collections.Counter() is more readable.
Let's look at efficiency using timeit:
In Python 2.7:
mquadri$ python -m "timeit" -c "from collections import defaultdict" "d=defaultdict(int)" "for i in [1, 2, 3, 4, 4, 5, 4, 3, 2]: d[i] += 1"
100000 loops, best of 3: 2.95 usec per loop
mquadri$ python -m "timeit" -c "from collections import Counter" "Counter([1, 2, 3, 4, 4, 5, 4, 3, 2])"
100000 loops, best of 3: 6.5 usec per loop
The collections.Counter() implementation is about 2 times slower than my own code.
In Python 3:
mquadri$ python3 -m "timeit" -c "from collections import defaultdict" "d=defaultdict(int)" "for i in [1, 2, 3, 4, 4, 5, 4, 3, 2]: d[i] += 1"
100000 loops, best of 3: 3.1 usec per loop
mquadri$ python3 -m "timeit" -c "from collections import Counter" "Counter([1, 2, 3, 4, 4, 5, 4, 3, 2])"
100000 loops, best of 3: 5.57 usec per loop
On this nine-element list, collections.Counter() is again slower than my own code, by a factor of about 1.8.
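The two answers time inputs of very different sizes: a nine-element list here, and a 10000-element list in the first answer. That may well be why the results disagree, since the fixed cost of constructing a Counter weighs more on a tiny input. The sketch below times both approaches on both sizes so you can check this on your own machine. The helper name compare and the repetition counts are illustrative choices, and the actual timings will vary with the Python version and hardware.

```python
from collections import Counter, defaultdict
from timeit import timeit
import random
import string


def count_manually(items):
    """Count occurrences with a hand-written loop over a defaultdict."""
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return counts


def compare(items, number):
    """Return (loop_seconds, counter_seconds) for `number` runs of each approach."""
    loop_time = timeit(lambda: count_manually(items), number=number)
    counter_time = timeit(lambda: Counter(items), number=number)
    return loop_time, counter_time


# The short list from this answer and a long list like the one in the first answer.
small = [1, 2, 3, 4, 4, 5, 4, 3, 2]
large = [random.choice(string.printable) for _ in range(10000)]

for label, data, number in (("small", small, 20000), ("large", large, 200)):
    loop_time, counter_time = compare(data, number)
    print(f"{label}: loop {loop_time:.4f}s, Counter {counter_time:.4f}s")
```

Building the test data outside the timed callables keeps list construction and imports out of the measurement.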
collections.Counter() because this is what it does. Why write your own code ;)
[...] timeit and my own code is faster than Counter. Maybe I did something wrong, but everything looks the same to me.