I have a list which has repeating items and I want a list of the unique items with their frequency.
For example, I have ['a', 'a', 'b', 'b', 'b'], and I want [('a', 2), ('b', 3)].
Looking for a simple way to do this without looping twice.
With Python 2.7+, you can use collections.Counter.
Otherwise, see this counter recipe.
Under Python 2.7+:
from collections import Counter
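items = ['a', 'a', 'b', 'b', 'b']  # avoid shadowing the built-in input()
c = Counter(items)
print(list(c.items()))  # list() is needed on Python 3, where items() returns a view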
Output is:
[('a', 2), ('b', 3)]
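As a rough sketch of how this behaves when the input is not already grouped: the example below assumes Python 3.7+, where Counter keeps keys in order of first appearance, and uses most_common() when you want the pairs sorted by frequency instead. The variable names are illustrative only.

```python
from collections import Counter

items = ['a', 'b', 'b', 'a', 'b']  # deliberately not grouped

counts = Counter(items)

# Pairs in order of first appearance (Python 3.7+)
pairs = list(counts.items())

# Pairs sorted by descending count
by_frequency = counts.most_common()

print(pairs)         # [('a', 2), ('b', 3)]
print(by_frequency)  # [('b', 3), ('a', 2)]
```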
If your items are grouped (i.e. similar items come together in a bunch), the most efficient method to use is itertools.groupby:
>>> import itertools
>>> [(key, len(list(group))) for key, group in itertools.groupby(['a', 'a', 'b', 'b', 'b'])]
[('a', 2), ('b', 3)]
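The "grouped" condition matters: groupby only merges adjacent equal items. A small sketch of what happens otherwise, and of sorting first as one possible workaround (at the cost of an O(n log n) sort, and only if the items are orderable):

```python
from itertools import groupby

ungrouped = ['a', 'b', 'a', 'b', 'b']

# Without grouping, each run of equal neighbours is counted separately
runs = [(key, len(list(group))) for key, group in groupby(ungrouped)]

# Sorting first makes equal items adjacent, so the totals come out right
totals = [(key, len(list(group))) for key, group in groupby(sorted(ungrouped))]

print(runs)    # [('a', 1), ('b', 1), ('a', 1), ('b', 2)]
print(totals)  # [('a', 2), ('b', 3)]
```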
groupby is the efficient and preferred approach.
If you are willing to use a third-party library, NumPy offers a convenient solution. This is particularly efficient if your list contains only numeric data.
import numpy as np
L = ['a', 'a', 'b', 'b', 'b']
res = list(zip(*np.unique(L, return_counts=True)))
# equivalent to [('a', 2), ('b', 3)], but the elements are NumPy scalars rather than plain str/int
To understand the syntax, note that np.unique with return_counts=True returns a tuple of two arrays, the unique values and their counts:
uniq, counts = np.unique(L, return_counts=True)
print(uniq) # ['a' 'b']
print(counts) # [2 3]
See also: What are the advantages of NumPy over regular Python lists?
I know this isn't a one-liner, but I like it because it makes clear that we pass over the initial list of values only once (instead of calling count on it):
>>> from collections import defaultdict
>>> l = ['a', 'a', 'b', 'b', 'b']
>>> d = defaultdict(int)
>>> for i in l:
...     d[i] += 1
...
>>> d
defaultdict(<type 'int'>, {'a': 2, 'b': 3})
>>> list(d.items())
[('a', 2), ('b', 3)]
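The session above is from Python 2. A minimal Python 3 sketch of the same single-pass idea, wrapped in a helper (count_items is a name made up for this example, not a library function):

```python
from collections import defaultdict

def count_items(values):
    """Return (item, count) pairs in order of first appearance."""
    counts = defaultdict(int)  # missing keys start at 0
    for value in values:       # single pass over the input
        counts[value] += 1
    return list(counts.items())

print(count_items(['a', 'a', 'b', 'b', 'b']))  # [('a', 2), ('b', 3)]
```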
Another way to do this would be
mylist = [1, 1, 2, 3, 3, 3, 4, 4, 4, 4]
mydict = {}
for i in mylist:
    if i in mydict:
        mydict[i] += 1
    else:
        mydict[i] = 1
then to get the list of tuples,
mytups = [(i, mydict[i]) for i in mydict]
This goes over the list only once, but it also has to traverse the dictionary once. However, if the list contains many duplicates, the dictionary will be much smaller than the list and therefore faster to traverse.
Nevertheless, not a very pretty or concise bit of code, I'll admit.
mytups = list(mydict.items()) is a simpler way to get the list of tuples (on Python 2, mydict.items() already returns a list).
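If the if/else feels too verbose, one possible tightening is dict.get with a default of 0. This is only a sketch of the same plain-dict approach, assuming Python 3.7+ for the ordering shown in the output comment:

```python
mylist = [1, 1, 2, 3, 3, 3, 4, 4, 4, 4]

mydict = {}
for i in mylist:
    # get() returns 0 the first time a value is seen
    mydict[i] = mydict.get(i, 0) + 1

# The second traversal touches only the unique keys, not the whole list
mytups = list(mydict.items())
print(mytups)  # [(1, 2), (2, 1), (3, 3), (4, 4)]
```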