Get frequency count of elements in an array

Question

Hi, I have a list of values. I want to get another list with the number of times every value in that list occurs. This is fairly easy, but I also need the values that are not present in the original list to appear in the frequency list, with a count of 0. For example:

I = [0,1,1,2,2,2,4,4,5,5,6,6,6,8,8,8]

What you expect:

freqI = [1,2,3,2,2,2,3,3]

What I need:

freqI = [1,2,3,0,2,2,3,0,3]

As you can see 3 and 7 are not present in I, though they are still accounted for in the frequency list.

My initial try ended up giving me the first kind of solution (without the intermediate values):

import operator

d = {x: I.count(x) for x in I}

sorted_x = sorted(d.items(), key=operator.itemgetter(0))

How can I get the frequency count (aka histogram) of my array, with the intermediate values present?

What is toList() and why are you calling it on something that's already a list?
– Henry Keiter, Commented May 27, 2013 at 19:35
How are you limiting the set of "values which are not present in the original list", which is infinite?
– jscs, Commented May 27, 2013 at 19:37
The largest number is the last number in the frequency list. (So everything below the maximum should be taken into account.)
– Olivier_s_j, Commented May 27, 2013 at 19:50
@Ojtwist if the list is always sorted then jamylak's solution must be preferred.
– Ashwini Chaudhary, Commented May 27, 2013 at 20:20

Ashwini Chaudhary · Accepted Answer · 2013-05-27 19:48:33Z

8

>>> lis = [0,1,1,2,2,2,4,4,5,5,6,6,6,8,8,8]
>>> maxx,minn = max(lis),min(lis)
>>> from collections import Counter
>>> c = Counter(lis)
>>> [c[i] for i in range(minn, maxx + 1)]
[1, 2, 3, 0, 2, 2, 3, 0, 3]

Or, as suggested by @DSM, we can take the min and max from the Counter's keys instead of scanning the list:

>>> [c[i] for i in range(min(c), max(c) + 1)]
[1, 2, 3, 0, 2, 2, 3, 0, 3]
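
The same idea can be wrapped in a reusable function. This is only a minimal sketch, assuming the input holds integers; the name `dense_counts` and the choice to return an empty list for empty input are assumptions for illustration, not part of the answer above.

```python
from collections import Counter

def dense_counts(values):
    """Return counts for every integer from min(values) to max(values)."""
    if not values:
        return []  # assumption: an empty input yields an empty histogram
    c = Counter(values)
    # Counter returns 0 for keys it has not seen, so gaps are filled with 0.
    return [c[i] for i in range(min(c), max(c) + 1)]

print(dense_counts([0, 1, 1, 2, 2, 2, 4, 4, 5, 5, 6, 6, 6, 8, 8, 8]))
# prints [1, 2, 3, 0, 2, 2, 3, 0, 3]
```

Note that the result starts at the smallest value present, not at 0, so a list whose minimum is 3 produces a histogram whose first entry is the count of 3.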

edited May 27, 2013 at 19:48

answered May 27, 2013 at 19:37


2 Comments

DSM Over a year ago

Not that it really matters, but if you put off computing the maximum and the minimum until you take the range, then you'll only need to scan through the keys and not the list. (IOW, for i in range(min(c), max(c)+1) or something.)

Ashwini Chaudhary Over a year ago

@DSM good point, this will surely improve the average case complexity.

Colonel Panic · Accepted Answer · 2013-05-27 19:37:46Z

5

How about

>>> I = [0,1,1,2,2,2,4,4,5,5,6,6,6,8,8,8]
>>> from collections import Counter
>>> frequencies = Counter(I)
>>> frequencies
Counter({2: 3, 6: 3, 8: 3, 1: 2, 4: 2, 5: 2, 0: 1})

You can query the counter for any number. For numbers it hasn't seen, it gives 0:

>>> frequencies[42]
0
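
One detail that may matter here: looking up a missing key in a Counter returns 0 without adding that key, so probing many absent values does not grow the counter. A small sketch of that behavior (the variable names `missing` and `still_absent` are made up for illustration):

```python
from collections import Counter

frequencies = Counter([0, 1, 1, 2])

missing = frequencies[42]              # 0, and no KeyError is raised
still_absent = 42 not in frequencies   # True: the lookup did not insert the key

print(missing, still_absent)
```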

answered May 27, 2013 at 19:37


jamylak · Accepted Answer · 2013-05-27 20:05:47Z

2

Your list looks like it's in sorted order. If so, the first and last elements are already the minimum and maximum, so you can skip the min/max scan:

>>> from collections import Counter
>>> I = [0,1,1,2,2,2,4,4,5,5,6,6,6,8,8,8]
>>> c = Counter(I)
>>> [c[i] for i in range(I[0], I[-1]+1)]
[1, 2, 3, 0, 2, 2, 3, 0, 3]

answered May 27, 2013 at 20:05


YXD · Accepted Answer · 2013-05-27 22:45:33Z

2

[I.count(k) for k in range(max(I) + 1)]

edited May 27, 2013 at 22:45

answered May 27, 2013 at 19:36


4 Comments

Alfe Over a year ago

+1 for brevity. Searching the max first might not be the best solution, though.

Ashwini Chaudhary Over a year ago

count() will result in O(N^2) complexity, this can be done in O(N).

YXD Over a year ago

@AshwiniChaudhary yep agree, your solution is more efficient.

DSM Over a year ago

Note that because of the absence of +1, this doesn't show the 3 corresponding to 8,8,8.
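
The complexity point raised in these comments can be sketched as follows. Calling `I.count(k)` rescans the whole list for every `k`, whereas a preallocated list of zeros can be filled in a single pass. This is a minimal sketch, assuming the values are non-negative integers and the histogram should start at 0; the function name `histogram` is hypothetical.

```python
def histogram(values):
    """Count each integer in range(max(values) + 1) with one pass over values."""
    if not values:
        return []  # assumption: an empty input yields an empty histogram
    # One slot per integer from 0 up to and including the maximum.
    counts = [0] * (max(values) + 1)
    for v in values:
        counts[v] += 1  # assumes v >= 0; a negative v would index from the end
    return counts

print(histogram([0, 1, 1, 2, 2, 2, 4, 4, 5, 5, 6, 6, 6, 8, 8, 8]))
# prints [1, 2, 3, 0, 2, 2, 3, 0, 3]
```

The work is proportional to the length of the list plus the size of the value range, so this approach can use a lot of memory when the maximum is large and the data is sparse.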

Henry Keiter · Accepted Answer · 2013-05-27 19:37:34Z

1

You're close; you just want to iterate over a range of integers instead of your actual list. Something like this:

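# note: lowercase variable names are Python standard and good coding practice!
list_of_ints = [0,1,1,2,2,2,4,4,5,5,6,6,6,8,8,8]
# range() excludes its stop value, so add 1 to include the largest element
d = {n: list_of_ints.count(n) for n in range(max(list_of_ints) + 1)}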

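Note that the upper bound comes from max(list_of_ints), which is just the biggest element in your list, since you didn't specify one. Because range() excludes its stop value, the stop has to be one more than that maximum for the largest element to be counted. Obviously you could hardcode this number instead, or if you want to restrict your histogram to the range of data in the list, you can make it range(min(list_of_ints), max(list_of_ints) + 1).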

answered May 27, 2013 at 19:37
