44

I have a list which has repeating items and I want a list of the unique items with their frequency.

For example, I have ['a', 'a', 'b', 'b', 'b'], and I want [('a', 2), ('b', 3)].

Looking for a simple way to do this without looping twice.

3 Comments
  • Just so you know... the answer you accepted violates your "without looping twice" constraint. (I'm commenting here so that you get notified :-). Commented Mar 6, 2010 at 15:41
  • Can you just clarify your question a little bit too? Are your items always grouped together? Or can they appear in any order in the list? Commented Mar 6, 2010 at 15:57
  • Yes, Tom. My question does not specify this, but in my particular situation the values come sorted. Thanks. Commented Mar 6, 2010 at 16:02

10 Answers

75

With Python 2.7+, you can use collections.Counter.

Otherwise, see this Counter recipe.

Under Python 2.7+:

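from collections import Counter

items = ['a', 'a', 'b', 'b', 'b']
c = Counter(items)  # counts everything in a single pass

print(list(c.items()))  # list() is needed on Python 3, where items() returns a view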

Output is:

[('a', 2), ('b', 3)]
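If you also want the pairs ordered by frequency, Counter provides most_common(), which returns (item, count) pairs sorted from the highest count down; passing n keeps only the top n. A small sketch:

```python
from collections import Counter

items = ['a', 'a', 'b', 'b', 'b']
c = Counter(items)

# most_common() sorts the (item, count) pairs by count, highest first;
# most_common(n) keeps only the first n of them.
by_count = c.most_common()
top_one = c.most_common(1)

print(by_count)  # [('b', 3), ('a', 2)]
print(top_one)   # [('b', 3)]
```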


16
>>> mylist=['a', 'a', 'b', 'b', 'b']
>>> [ (i,mylist.count(i)) for i in set(mylist) ]
[('a', 2), ('b', 3)]
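Note that this calls count() once per distinct item, so the list is scanned once per unique value (plus once to build the set), and the tuple order follows the set's iteration order, which is not guaranteed. If a stable order matters, a small variation (a sketch, still with the same repeated scans) iterates over dict.fromkeys(mylist) instead, which keeps items in first-appearance order on Python 3.7+:

```python
mylist = ['b', 'a', 'b', 'a', 'b']

# dict.fromkeys keeps one key per distinct item, in first-appearance order
# (dicts preserve insertion order from Python 3.7), so the result order is
# deterministic, unlike iterating over a set.
pairs = [(x, mylist.count(x)) for x in dict.fromkeys(mylist)]

print(pairs)  # [('b', 3), ('a', 2)]
```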


15

If your items are grouped (i.e. equal items are adjacent, as in a sorted list), the most efficient method is itertools.groupby:

>>> import itertools
>>> [(key, len(list(group))) for key, group in itertools.groupby(['a', 'a', 'b', 'b', 'b'])]
[('a', 2), ('b', 3)]
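To illustrate the grouping caveat discussed in the comments below, here is a sketch with an unsorted input: groupby alone yields one tuple per run of equal items, while sorting first gives one tuple per distinct item. It also counts each group with sum(1 for _ in group), which avoids building a temporary list:

```python
from itertools import groupby

data = ['b', 'a', 'b', 'a', 'b']  # not grouped

# groupby only merges *adjacent* equal items, so unsorted input
# produces one tuple per run rather than one per distinct item.
runs = [(k, sum(1 for _ in g)) for k, g in groupby(data)]

# Sorting first makes equal items adjacent (an extra O(n log n) step).
totals = [(k, sum(1 for _ in g)) for k, g in groupby(sorted(data))]

print(runs)    # [('b', 1), ('a', 1), ('b', 1), ('a', 1), ('b', 1)]
print(totals)  # [('a', 2), ('b', 3)]
```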

6 Comments

@Tom: I'm aware of this limitation. When the items are grouped, however, groupby is the efficient and preferred approach
You should make that clear... notice the constraint in the question says "I have a list which has repeating items"... the list the OP gave was just an example. I don't think this solution is general enough. If the OP specified that the input list always had the elements grouped, I would agree.
@Tom: you're right - I've updated the answer (BTW I assumed from his "repeating items" that they're grouped)
Ok Eli... thanks for the update :-). I revoke my -1 because your answer is now more clear.
Is there a way to sort the resulting tuple list by count?
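On the last comment's question: any list of (item, count) tuples can be ordered by count with sorted() and a key function. A minimal sketch, with the largest count first:

```python
from itertools import groupby

pairs = [(k, sum(1 for _ in g)) for k, g in groupby(['a', 'a', 'b', 'b', 'b'])]

# Sort on the count (second element of each tuple), largest first.
by_count = sorted(pairs, key=lambda pair: pair[1], reverse=True)

print(by_count)  # [('b', 3), ('a', 2)]
```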
7

If you are willing to use a 3rd party library, NumPy offers a convenient solution. This is particularly efficient if your list contains only numeric data.

import numpy as np

L = ['a', 'a', 'b', 'b', 'b']

res = list(zip(*np.unique(L, return_counts=True)))

# [('a', 2), ('b', 3)], as NumPy str/int64 scalars

To understand the syntax: with return_counts=True, np.unique returns a tuple of two arrays, the sorted unique values and their counts, which zip(*...) then pairs up:

uniq, counts = np.unique(L, return_counts=True)

print(uniq)    # ['a' 'b']
print(counts)  # [2 3]
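Two details worth knowing, shown in a small sketch with an unsorted list: np.unique returns the unique values in sorted order rather than first-appearance order, and the zipped elements are NumPy scalars. Calling .tolist() on both arrays gives plain Python str and int values:

```python
import numpy as np

L = ['b', 'a', 'b', 'a', 'b']
uniq, counts = np.unique(L, return_counts=True)

# np.unique sorts its output, so 'a' comes first even though 'b' appears first.
# .tolist() converts the NumPy scalars to plain Python str and int.
res = list(zip(uniq.tolist(), counts.tolist()))

print(res)  # [('a', 2), ('b', 3)]
```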

See also: What are the advantages of NumPy over regular Python lists?


3

I know this isn't a one-liner, but I like it because it's clear that it passes over the initial list of values only once (instead of calling count on it for each value):

>>> from collections import defaultdict
>>> l = ['a', 'a', 'b', 'b', 'b']
>>> d = defaultdict(int)  # missing keys start at int() == 0
>>> for i in l:
...     d[i] += 1
...
>>> d
defaultdict(<class 'int'>, {'a': 2, 'b': 3})
>>> list(d.items())
[('a', 2), ('b', 3)]


3

The "old school" way:

>>> alist = ['a', 'a', 'b', 'b', 'b']
>>> d = {}
>>> for i in alist:
...     if i not in d: d[i] = 1  # Python 2 code often wrote d.has_key(i), which Python 3 removed
...     else: d[i] += 1
...
>>> d
{'a': 2, 'b': 3}
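The if/else can be collapsed with dict.get(), which returns a default (here 0) when the key is missing. A minimal sketch of the same single-pass loop:

```python
alist = ['a', 'a', 'b', 'b', 'b']
d = {}
for i in alist:
    # get() returns 0 for a key that is not there yet, so no branch is needed
    d[i] = d.get(i, 0) + 1

print(d)                # {'a': 2, 'b': 3}
print(list(d.items()))  # [('a', 2), ('b', 3)]
```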


1

Another way to do this:

mylist = [1, 1, 2, 3, 3, 3, 4, 4, 4, 4]
mydict = {}
for i in mylist:
    if i in mydict: mydict[i] += 1
    else: mydict[i] = 1

Then, to get the list of tuples:

mytups = [(i, mydict[i]) for i in mydict]

This only goes over the list once, but it also has to traverse the dictionary once. However, if the list contains many duplicates, the dictionary is much smaller than the list, so traversing it is cheap.

Nevertheless, not a very pretty or concise bit of code, I'll admit.

3 Comments

This is identical in spirit to my solution... except defaultdict consolidates the first part (since you don't have to check for existence) and list(mydict.items()) is shorter than the list comprehension.
mytups = list(mydict.items()) is a simpler way to get the list of tuples (on Python 3, items() alone returns a view rather than a list).
Thanks @Paul and @Tom. It seems like there is always a better way to do something in Python. :)
1

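A solution without hashing (it counts runs of adjacent equal items, so like groupby it assumes the input is grouped):

from functools import reduce  # reduce is no longer a builtin in Python 3

def lcount(lst):
    # Fold left: if b repeats the last run's item, bump that run's count; otherwise start a new run
    return reduce(lambda a, b: a[0:-1] + [(a[-1][0], a[-1][1] + 1)] if a and b == a[-1][0] else a + [(b, 1)], lst, [])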

>>> lcount([])
[]
>>> lcount(['a'])
[('a', 1)]
>>> lcount(['a', 'a', 'a', 'b', 'b'])
[('a', 3), ('b', 2)]
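Each step of that reduce builds a new accumulator list with +, so the fold copies the list of runs on every element; with many short runs this becomes quadratic. A sketch of the same run-length idea as a plain loop (the name lcount_loop is hypothetical) mutates one list in place and runs in linear time. Like lcount, it only merges adjacent equal items:

```python
def lcount_loop(lst):
    """Run-length count of adjacent equal items, without hashing.

    Hypothetical helper: same output as lcount, but it appends to a single
    list in place instead of building a new list at every step.
    """
    result = []
    for item in lst:
        if result and result[-1][0] == item:
            # Same as the previous item: bump the count of the last run.
            result[-1] = (item, result[-1][1] + 1)
        else:
            # A new run starts here.
            result.append((item, 1))
    return result

print(lcount_loop([]))                         # []
print(lcount_loop(['a', 'a', 'a', 'b', 'b']))  # [('a', 3), ('b', 2)]
print(lcount_loop(['a', 'b', 'a']))            # [('a', 1), ('b', 1), ('a', 1)]
```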


1

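Convert your data into a pandas Series s. Note that this prints a frequency-of-frequencies table: for each distinct count i, the number of items that occur exactly i times.

import pandas as pd
s = pd.Series(['a', 'a', 'b', 'b', 'b'])  # any list-like data
for i in sorted(s.value_counts().unique()):
    print(i, (s.value_counts() == i).sum())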


0

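With pandas you can do the following (the resulting dict is ordered by count, most frequent first):

import pandas as pd
dict(pd.Series(my_list).value_counts())  # top-level pd.value_counts() is deprecated in recent pandas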
