3

Hi I want to be able to count the occurrences of items from my list by indexes of a nested list.

That is if my list is

keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
        'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']

and my nested list looks like:

[['Three' 'One' 'Ten']
 ['Three' 'Five' 'Nine']
 ['Two' 'Five' 'Three']
 ['Two' 'Three' 'Eight']
 ['One' 'Three' 'Nine']]

How many times does 'One' occur at index 0 etc for each item, is what I want to know.

I am using numpy arrays to build list and am creating output from weighted random. I want to be able to run the test over say 1000 lists and count the index occurrences to determine how the changes I make elsewhere in my program affect the end result.

I have found examples such as https://stackoverflow.com/a/10741692/461887

import numpy as np
x = np.array([1,1,1,2,2,2,5,25,1,1])
y = np.bincount(x)
ii = np.nonzero(y)[0]
zip(ii,y[ii]) 
# [(1, 5), (2, 3), (5, 1), (25, 1)]

But this appears not to work with nested lists. Also been looking under indexing in the numpy cookbook - indexing and histogram & digitize in the example list but I just can't seem to find a function that could do this.

Updated to include example data output:

Assunming 100 deep nested lists

{'One': 19, 'Two': 16, 'Three': 19, 'Four': 11, 'Five': 7, 'Six': 8, 'Seven' 4, 'Eight' 3,
            'Nine' 5, 'Ten': 1, 'Eleven': 2, 'Twelve': 1, 'Thirteen': 1, 'Fourteen': 3, 'Fifteen': 0}

Or as in treddy's example

array([19, 16, 19, 11, 7, 8, 4, 3, 5, 1, 2, 1, 1, 3, 0])
6
  • Does the location of the values in the nested list matter? The bincount solution would work if you just flatten the array. What do you mean by "for each item"? Is 'item' one of the sublists, or is it one of the keys? Commented Nov 23, 2013 at 4:56
  • @askewchan. I would like to know how many times 'one' 'two' 'three' etc occur at index 0. Commented Nov 23, 2013 at 5:04
  • 1
    @sayth it is unclear what you are asking... could you add an example about how should your output look like? Commented Nov 23, 2013 at 7:05
  • 1
    @SaulloCastro updated to include example output. Commented Nov 23, 2013 at 10:13
  • @sayth I assume that desired result is dictionary, not list? Commented Nov 23, 2013 at 10:17

3 Answers 3

4

You'd better to add example output you want to get for your example, but for now looks like collections.Counter will do the job:

>>> data = [['Three','One','Ten'],
...  ['Three','Five','Nine'],
...  ['Two','Five','Three'],
...  ['Two','Three','Eight'],
...  ['One','Three','Nine']]
... 
>>> 
>>> from collections import Counter
>>> [Counter(x) for x in data]
[Counter({'Three': 1, 'Ten': 1, 'One': 1}), Counter({'Nine': 1, 'Five': 1, 'Three': 1}), Counter({'Five': 1, 'Two': 1, 'Three': 1}), Counter({'Eight': 1, 'Two': 1, 'Three': 1}), Counter({'Nine': 1, 'Three': 1, 'One': 1})]

update:

As you gave desired output, I think the idea for you would be - fatten the list, use Counter to count occurences, and then create dictionary (or OrderedDict if order matters for you):

>>> from collections import Counter, OrderedDict
>>> c = Counter(e for l in data for e in l)
>>> c
Counter({'Three': 5, 'Two': 2, 'Nine': 2, 'Five': 2, 'One': 2, 'Ten': 1, 'Eight': 1})

or if you need only first entry in each list:

>>> c = Counter(l[0] for l in data)
>>> c
Counter({'Three': 2, 'Two': 2, 'One': 1})

simple dictionary:

>>> {x:c[x] for x in keys} 
{
    'Twelve': 0, 'Seven': 0,
    'Ten': 1, 'Fourteen': 0,
    'Nine': 2, 'Six': 0
    'Three': 5, 'Two': 2,
    'Four': 0, 'Eleven': 0,
    'Five': 2, 'Thirteen': 0,
    'Eight': 1, 'One': 2, 'Fifteen': 0
}

or OrderedDict:

>>> OrderedDict((x, c[x]) for x in keys)
OrderedDict([('One', 2), ('Two', 2), ('Three', 5), ('Four', 0), ('Five', 2), ('Six', 0), ('Seven', 0), ('Eight', 1), ('Nine', 2), ('Ten', 1), ('Eleven', 0), ('Twelve', 0), ('Thirteen', 0), ('Fourteen', 0), ('Fifteen', 0)])

And, just in case, if you don' need zeroes in your otput, you could just use Counter to get number of occurences:

>>> c['Nine']   # Key is in the Counter, returns number of occurences
2
>>> c['Four']   # Key is not in the Counter, returns 0
0
Sign up to request clarification or add additional context in comments.

2 Comments

counter only counts in each nested list, I am trying to count the first entry from each nested list as a total of all nested lists.
@sayth ah ok, then just change counter creation like c = Counter(l[0] for l in data), other code is good
3

The OP asked a numpy question and collections Counter and OrderDict will certainly work, but here's a numpy answer:

In [1]: # from original posting:
In [2]: keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
...:         'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']
In [3]: data = [['Three', 'One', 'Ten'],
...:            ['Three', 'Five', 'Nine'],
...:            ['Two', 'Five', 'Three'],
...:            ['Two', 'Three', 'Eight'],
...:            ['One', 'Three', 'Nine']]
In [4]: # make it numpy
In [5]: import numpy as np
In [6]: keys = np.array(keys)
In [7]: data = np.array(data)
In [8]: # if you only want counts for column 0
In [9]: counts = np.sum(keys == data[:,[0]], axis=0)
In [10]: # view it
In [11]: zip(keys, counts)
Out[11]:
[('One', 1),
('Two', 2),
('Three', 2), ...
In [12]: # if you wanted counts for all columns (newaxis here sets-up 3D broadcasting)
In [13]: counts = np.sum(keys[:,np.newaxis,np.newaxis] == data, axis=1)
In [14]: # view it (you could use zip without pandas, this is just for looks)
In [15]: import pandas as pd
In [16]: pd.DataFrame(counts, index=keys)
Out[16]:
          0  1  2
One       1  1  0
Two       2  0  0
Three     2  2  1
Four      0  0  0
Five      0  2  0 ...

1 Comment

+1, good one, just thought that I should mention standard python collections
1

You are correct that numpy.bincount accepts a 1D array-like object, so a nested list or array with more than 1 dimension can't be used directly, but you can simply use numpy array slicing to select the first column of your 2D array and bin count the occurrence of each digit within the range of values in that column:

keys = numpy.arange(1,16) #don't really need to use this
two_dim_array_for_counting = numpy.array([[3,1,10],\
                                      [3,5,9],\
                                      [2,5,3],\
                                      [2,3,8],\
                                      [1,3,9]])
numpy.bincount(two_dim_array_for_counting[...,0]) #only count all rows in the first column
Out[36]: array([0, 1, 2, 2]) #this output means that the digit 0 occurs 0 times, 1 occurs once, 2 occurs twice, and three occurs twice

No digits greater than 3 occur in the first column so the output array only has 4 elements counting occurrences of 0, 1, 2, 3 digits in first column.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.