Hi, I have a list of values. I want to get another list with the number of times each value in that list occurs. This is fairly easy, but I also need the values that are not present in the original list to appear in the frequency list, with a count of 0. For example:

I = [0,1,1,2,2,2,4,4,5,5,6,6,6,8,8,8]

What you expect:

freqI = [1,2,3,2,2,2,3,3]

What I need:

freqI = [1,2,3,0,2,2,3,0,3]

As you can see, 3 and 7 are not present in I, but they are still accounted for in the frequency list.

My initial attempt gave me the first kind of result (without the intermediate values):

import operator

d = {x: I.count(x) for x in I}
sorted_x = sorted(d.iteritems(), key=operator.itemgetter(0))

How can I get the frequency count (aka histogram) of my list, with the intermediate values present?

  • what is toList() and why are you calling it on something that's already a list? Commented May 27, 2013 at 19:35
  • It was a remnant of my previous code. I cleaned it up. Commented May 27, 2013 at 19:37
  • How are you limiting the set of "values which are not present in the original list", which is infinite? Commented May 27, 2013 at 19:37
  • The largest number is the last number in the frequency list. (So everything below the maximum should be taken into account.) Commented May 27, 2013 at 19:50
  • @Ojtwist if the list is always sorted then jamylak's solution must be preferred. Commented May 27, 2013 at 20:20

5 Answers 5

8
>>> lis = [0,1,1,2,2,2,4,4,5,5,6,6,6,8,8,8]
>>> maxx,minn = max(lis),min(lis)
>>> from collections import Counter
>>> c = Counter(lis)
>>> [c[i] for i in xrange(minn,maxx+1)]
[1, 2, 3, 0, 2, 2, 3, 0, 3]

or as suggested by @DSM we can get min and max from the dict itself:

>>> [c[i] for i in xrange( min(c) , max(c)+1)]
[1, 2, 3, 0, 2, 2, 3, 0, 3]
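
On Python 3, where xrange no longer exists, the same idea could be wrapped in a small function. This is only a sketch: the name dense_counts is made up for illustration, and it assumes the values are integers.

```python
from collections import Counter


def dense_counts(values):
    """Count every integer from min(values) to max(values), inclusive.

    Hypothetical helper name; assumes `values` contains only integers.
    """
    if not values:
        return []
    counts = Counter(values)
    # Taking min/max over the Counter scans only the distinct keys.
    # Missing keys read as 0, so the gaps are filled automatically.
    return [counts[i] for i in range(min(counts), max(counts) + 1)]


I = [0, 1, 1, 2, 2, 2, 4, 4, 5, 5, 6, 6, 6, 8, 8, 8]
print(dense_counts(I))  # [1, 2, 3, 0, 2, 2, 3, 0, 3]
```

The empty-list guard is there because min() and max() raise ValueError on an empty sequence.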

2 Comments

Not that it really matters, but if you put off computing the maximum and the minimum until you take the range, then you'll only need to scan through the keys and not the list. (IOW, for i in range(min(c), max(c)+1) or something.)
@DSM good point, this will surely improve the average case complexity.
5

How about

>>> I = [0,1,1,2,2,2,4,4,5,5,6,6,6,8,8,8]
>>> from collections import Counter
>>> frequencies = Counter(I)
>>> frequencies
Counter({2: 3, 6: 3, 8: 3, 1: 2, 4: 2, 5: 2, 0: 1})

You can query the counter for any number. For numbers it hasn't seen, it returns 0:

>>> frequencies[42]
0
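
A small sketch of why this is convenient, assuming Python 3: looking up a missing key on a Counter returns 0 without adding that key, whereas a plain dict would need .get() with a default.

```python
from collections import Counter

frequencies = Counter([0, 1, 1, 2])

print(frequencies[42])    # 0
print(42 in frequencies)  # False: the lookup did not insert a key

# A plain dict raises KeyError on frequencies_dict[42], so use .get()
frequencies_dict = dict(frequencies)
print(frequencies_dict.get(42, 0))  # 0
```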


2

Your list looks like it's in sorted order. If so, you can read the minimum and maximum straight from its ends instead of computing them:

>>> from collections import Counter
>>> I = [0,1,1,2,2,2,4,4,5,5,6,6,6,8,8,8]
>>> c = Counter(I)
>>> [c[i] for i in range(I[0], I[-1]+1)]
[1, 2, 3, 0, 2, 2, 3, 0, 3]


2
[I.count(k) for k in range(max(I) + 1)]

4 Comments

+1 for brevity. Searching the max first might not be the best solution, though.
count() will result in O(N^2) complexity, this can be done in O(N).
@AshwiniChaudhary yep agree, your solution is more efficient.
Note that because of the absence of +1, this doesn't show the 3 corresponding to 8,8,8.
1

You're close; you just want to iterate over a range of candidate values instead of over the list itself. Something like this:

# note: lowercase variable names are Python standard and good coding practice!
# the +1 is needed because range() excludes its upper bound
d = {n: list_of_ints.count(n) for n in range(max(list_of_ints) + 1)}

Note that I'm using max(list_of_ints), which is just the biggest element in your list (your I), since you didn't specify an upper bound. You could hardcode this number instead, or if you want to restrict your histogram to the range of data in I, you can make it range(min(I), max(I) + 1).
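
Since calling count() once per candidate value rescans the whole list each time, a single pass may be preferable for large inputs. Here is a minimal Python 3 sketch with optional hardcoded bounds; the function name histogram and the parameters lo and hi are hypothetical, and the values are assumed to be integers.

```python
def histogram(values, lo=None, hi=None):
    """Count integers in [lo, hi] in one pass over `values`.

    Hypothetical helper: `lo` and `hi` default to the data's own
    minimum and maximum when not given.
    """
    if not values and (lo is None or hi is None):
        return []
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    counts = [0] * (hi - lo + 1)
    for v in values:
        # Values outside the requested bounds are ignored.
        if lo <= v <= hi:
            counts[v - lo] += 1
    return counts


I = [0, 1, 1, 2, 2, 2, 4, 4, 5, 5, 6, 6, 6, 8, 8, 8]
print(histogram(I))        # [1, 2, 3, 0, 2, 2, 3, 0, 3]
print(histogram(I, 0, 9))  # [1, 2, 3, 0, 2, 2, 3, 0, 3, 0]
```

Passing explicit bounds covers the hardcoded-upper-bound case: with hi=9 the result gains a trailing 0 for the value 9.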
