numpy - how do I count the occurrence of items in nested lists by index?

Question

Hi, I want to count the occurrences of the items in my list at each index of a nested list.

That is, if my list is

keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
        'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']

and my nested list looks like:

[['Three' 'One' 'Ten']
 ['Three' 'Five' 'Nine']
 ['Two' 'Five' 'Three']
 ['Two' 'Three' 'Eight']
 ['One' 'Three' 'Nine']]

What I want to know is how many times 'One' occurs at index 0, and likewise for every other item.

I am using numpy arrays to build the list and am generating the output from a weighted random choice. I want to run the test over, say, 1000 lists and count the occurrences at each index, to see how the changes I make elsewhere in my program affect the end result.
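A minimal sketch of that kind of test run, assuming each nested list is three independent weighted draws from `keys` (with replacement). The weights are hypothetical, made up for illustration, and `numpy.random.default_rng` needs NumPy 1.17 or later:

```python
from collections import Counter

import numpy as np

keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
        'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']

# Hypothetical weights: earlier keys are more likely. They must sum to 1.
raw = np.arange(len(keys), 0, -1, dtype=float)
weights = raw / raw.sum()

rng = np.random.default_rng(seed=0)  # fixed seed so runs are comparable
# 1000 nested lists of 3 items each, drawn with replacement (an assumption)
data = rng.choice(keys, size=(1000, 3), p=weights)

# Count how often each key appears at index 0 across all 1000 lists
first_column = Counter(data[:, 0].tolist())
counts_at_0 = {k: first_column[k] for k in keys}  # keys that never occur get 0
print(counts_at_0)
```

Keeping the seed fixed means that a change in the counts between two runs comes from your program changes rather than from the random draws.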

I have found examples such as https://stackoverflow.com/a/10741692/461887

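import numpy as np
x = np.array([1, 1, 1, 2, 2, 2, 5, 25, 1, 1])
y = np.bincount(x)             # y[v] is the number of times v occurs in x
ii = np.nonzero(y)[0]          # the values that occur at least once
list(zip(ii.tolist(), y[ii].tolist()))  # list() is needed in Python 3, where zip is lazy
# [(1, 5), (2, 3), (5, 1), (25, 1)]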

But this doesn't seem to work with nested lists. I have also looked at the indexing section of the numpy cookbook, and at histogram and digitize in its example list, but I can't find a function that does this.
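One way to make `np.bincount` work with the string data is to map each string to its position in `keys` first, then count each column separately. A minimal sketch, assuming every entry in the nested list appears in `keys`; the `minlength` argument keeps one slot per key even when a key never occurs in a column:

```python
import numpy as np

keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
        'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']
data = [['Three', 'One', 'Ten'],
        ['Three', 'Five', 'Nine'],
        ['Two', 'Five', 'Three'],
        ['Two', 'Three', 'Eight'],
        ['One', 'Three', 'Nine']]

# Translate each string into its integer position in `keys` (KeyError if unknown)
index_of = {key: i for i, key in enumerate(keys)}
codes = np.array([[index_of[item] for item in row] for row in data])

# One bincount per column; row k of the result is the count for keys[k]
counts = np.stack([np.bincount(codes[:, col], minlength=len(keys))
                   for col in range(codes.shape[1])], axis=1)

print(counts[:, 0])  # counts at index 0, in the same order as `keys`
print(dict(zip(keys, counts[:, 0].tolist())))  # the same counts as a dict
```

The array and dict printed at the end correspond to the two output formats shown in the question.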

Updated to include example data output:

Assuming 100 nested lists:

{'One': 19, 'Two': 16, 'Three': 19, 'Four': 11, 'Five': 7, 'Six': 8, 'Seven': 4, 'Eight': 3,
 'Nine': 5, 'Ten': 1, 'Eleven': 2, 'Twelve': 1, 'Thirteen': 1, 'Fourteen': 3, 'Fifteen': 0}

Or, as in treddy's example:

array([19, 16, 19, 11, 7, 8, 4, 3, 5, 1, 2, 1, 1, 3, 0])

Does the location of the values in the nested list matter? The bincount solution would work if you just flatten the array. What do you mean by "for each item"? Is 'item' one of the sublists, or is it one of the keys? – askewchan, Commented Nov 23, 2013 at 4:56

@askewchan I would like to know how many times 'One', 'Two', 'Three', etc. occur at index 0. – sayth, Commented Nov 23, 2013 at 5:04

@sayth It is unclear what you are asking... could you add an example of what your output should look like? – Saullo G. P. Castro, Commented Nov 23, 2013 at 7:05

@sayth I assume the desired result is a dictionary, not a list? – roman, Commented Nov 23, 2013 at 10:17

roman · Accepted Answer · 2013-11-23 10:22:59Z

4

You should add the output you expect for your example, but for now it looks like collections.Counter will do the job:

>>> data = [['Three','One','Ten'],
...  ['Three','Five','Nine'],
...  ['Two','Five','Three'],
...  ['Two','Three','Eight'],
...  ['One','Three','Nine']]
... 
>>> 
>>> from collections import Counter
>>> [Counter(x) for x in data]
[Counter({'Three': 1, 'Ten': 1, 'One': 1}), Counter({'Nine': 1, 'Five': 1, 'Three': 1}), Counter({'Five': 1, 'Two': 1, 'Three': 1}), Counter({'Eight': 1, 'Two': 1, 'Three': 1}), Counter({'Nine': 1, 'Three': 1, 'One': 1})]
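The list comprehension above builds one Counter per sublist (per row). If the goal is counts per index (per column), transposing with `zip(*data)` first gives one Counter per position. A short sketch using the same `data`:

```python
from collections import Counter

data = [['Three', 'One', 'Ten'],
        ['Three', 'Five', 'Nine'],
        ['Two', 'Five', 'Three'],
        ['Two', 'Three', 'Eight'],
        ['One', 'Three', 'Nine']]

# zip(*data) yields the columns: every item at index 0, then index 1, then index 2
per_index = [Counter(column) for column in zip(*data)]

print(per_index[0]['One'])    # -> 1: one list has 'One' at index 0
print(per_index[1]['Three'])  # -> 2: two lists have 'Three' at index 1
```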

update:

Since you gave the desired output, I think the approach for you would be: flatten the list, use Counter to count the occurrences, and then create a dictionary (or an OrderedDict if order matters to you):

>>> from collections import Counter, OrderedDict
>>> c = Counter(e for l in data for e in l)
>>> c
Counter({'Three': 5, 'Two': 2, 'Nine': 2, 'Five': 2, 'One': 2, 'Ten': 1, 'Eight': 1})

Or, if you need only the first entry in each list:

>>> c = Counter(l[0] for l in data)
>>> c
Counter({'Three': 2, 'Two': 2, 'One': 1})

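As a simple dictionary (the outputs below use the flattened counter from the first example, not the first-entry counter):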

>>> {x: c[x] for x in keys}
{'One': 2, 'Two': 2, 'Three': 5, 'Four': 0, 'Five': 2, 'Six': 0, 'Seven': 0, 'Eight': 1, 'Nine': 2, 'Ten': 1, 'Eleven': 0, 'Twelve': 0, 'Thirteen': 0, 'Fourteen': 0, 'Fifteen': 0}

Or as an OrderedDict:

>>> OrderedDict((x, c[x]) for x in keys)
OrderedDict([('One', 2), ('Two', 2), ('Three', 5), ('Four', 0), ('Five', 2), ('Six', 0), ('Seven', 0), ('Eight', 1), ('Nine', 2), ('Ten', 1), ('Eleven', 0), ('Twelve', 0), ('Thirteen', 0), ('Fourteen', 0), ('Fifteen', 0)])
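Since Python 3.7, a plain dict keeps insertion order, so the dictionary comprehension already comes out in the order of `keys` and OrderedDict is optional. A small sketch wrapping the approach for any index; `count_at` is a hypothetical helper name, not a library function:

```python
from collections import Counter

def count_at(data, index, keys):
    """Count how often each key occurs at position `index` of the sublists.

    Keys that never occur are reported as 0; the result follows the order of `keys`.
    """
    c = Counter(row[index] for row in data)
    return {key: c[key] for key in keys}

keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
        'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']
data = [['Three', 'One', 'Ten'],
        ['Three', 'Five', 'Nine'],
        ['Two', 'Five', 'Three'],
        ['Two', 'Three', 'Eight'],
        ['One', 'Three', 'Nine']]

print(count_at(data, 0, keys))  # counts at index 0, zeros included
```

Passing 1 or 2 as `index` gives the counts for the other positions.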

And, just in case: if you don't need zeroes in your output, you could just use the Counter directly to get the number of occurrences:

>>> c['Nine']   # Key is in the Counter, returns number of occurrences
2
>>> c['Four']   # Key is not in the Counter, returns 0
0
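One detail worth knowing: looking up a missing key in a Counter returns 0 without adding that key, so the Counter does not grow as you query it. A quick check with the flattened counts from above (rebuilt here so the snippet runs on its own):

```python
from collections import Counter

data = [['Three', 'One', 'Ten'],
        ['Three', 'Five', 'Nine'],
        ['Two', 'Five', 'Three'],
        ['Two', 'Three', 'Eight'],
        ['One', 'Three', 'Nine']]
c = Counter(item for row in data for item in row)

print(c['Four'])         # -> 0: missing keys count as zero
print('Four' in c)       # -> False: the lookup did not insert 'Four'
print(c.most_common(1))  # the most frequent item and its count
```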

edited Nov 23, 2013 at 10:22

answered Nov 23, 2013 at 9:03

roman



2 Comments

sayth Over a year ago

Counter only counts within each nested list; I am trying to count the first entry of every nested list, totalled across all the nested lists.

roman Over a year ago

@sayth Ah, OK. Then just change the Counter creation to c = Counter(l[0] for l in data); the rest of the code stays the same.

Phil Cooper · Accepted Answer · 2013-11-23 15:50:21Z

3

The OP asked a numpy question, and collections.Counter and OrderedDict will certainly work, but here's a numpy answer:

In [1]: # from original posting:
In [2]: keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
...:         'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']
In [3]: data = [['Three', 'One', 'Ten'],
...:            ['Three', 'Five', 'Nine'],
...:            ['Two', 'Five', 'Three'],
...:            ['Two', 'Three', 'Eight'],
...:            ['One', 'Three', 'Nine']]
In [4]: # make it numpy
In [5]: import numpy as np
In [6]: keys = np.array(keys)
In [7]: data = np.array(data)
In [8]: # if you only want counts for column 0
In [9]: counts = np.sum(keys == data[:,[0]], axis=0)
In [10]: # view it
In [11]: list(zip(keys, counts))  # list() is needed in Python 3, where zip is lazy
Out[11]:
[('One', 1),
('Two', 2),
('Three', 2), ...
In [12]: # if you wanted counts for all columns (newaxis here sets-up 3D broadcasting)
In [13]: counts = np.sum(keys[:,np.newaxis,np.newaxis] == data, axis=1)
In [14]: # view it (you could use zip without pandas, this is just for looks)
In [15]: import pandas as pd
In [16]: pd.DataFrame(counts, index=keys)
Out[16]:
          0  1  2
One       1  1  0
Two       2  0  0
Three     2  2  1
Four      0  0  0
Five      0  2  0 ...
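The broadcasting version can be wrapped in a small function that returns a plain (keys × columns) array, so pandas is not needed to inspect it. A sketch under the same data; `counts_by_column` is a hypothetical name. Note that the comparison builds a temporary boolean array of shape (len(keys), rows, columns), so memory grows with all three sizes; for 1000 lists of 3 items and 15 keys that is only 45,000 booleans.

```python
import numpy as np

def counts_by_column(keys, data):
    """Return an int array where entry [k, j] counts keys[k] at column j."""
    keys = np.asarray(keys)
    data = np.asarray(data)
    # (K, 1, 1) == (R, C) broadcasts to (K, R, C); summing over rows leaves (K, C)
    return (keys[:, np.newaxis, np.newaxis] == data).sum(axis=1)

keys = ['One', 'Two', 'Three', 'Four', 'Five', 'Six', 'Seven', 'Eight',
        'Nine', 'Ten', 'Eleven', 'Twelve', 'Thirteen', 'Fourteen', 'Fifteen']
data = [['Three', 'One', 'Ten'],
        ['Three', 'Five', 'Nine'],
        ['Two', 'Five', 'Three'],
        ['Two', 'Three', 'Eight'],
        ['One', 'Three', 'Nine']]

counts = counts_by_column(keys, data)
for key, row in zip(keys, counts.tolist()):
    print(key, row)  # e.g. the row for 'Three' holds its counts at index 0, 1, 2
```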

answered Nov 23, 2013 at 15:50

Phil Cooper


1 Comment

roman Over a year ago

+1, good one; I just thought I should mention the standard Python collections.

treddy · Accepted Answer · 2013-11-23 07:43:22Z

You are correct that numpy.bincount accepts only a 1D array-like object, so a nested list or an array with more than one dimension can't be passed directly. However, you can use numpy array slicing to select the first column of your 2D array and bincount the occurrences of each value in that column (here the words are replaced by the numbers they name):

import numpy

keys = numpy.arange(1, 16)  # not actually needed for the count
two_dim_array_for_counting = numpy.array([[3, 1, 10],
                                          [3, 5, 9],
                                          [2, 5, 3],
                                          [2, 3, 8],
                                          [1, 3, 9]])
numpy.bincount(two_dim_array_for_counting[..., 0])  # count all rows, first column only
# array([0, 1, 2, 2]): 0 occurs 0 times, 1 occurs once, 2 occurs twice, and 3 occurs twice

No value greater than 3 occurs in the first column, so the output array has only 4 elements, counting the occurrences of 0, 1, 2 and 3 in the first column.
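If you want the output to have one slot per possible value, including values that never appear in that column, `numpy.bincount` accepts a `minlength` argument. A sketch assuming the values run from 1 to 15, so slot 0 is simply unused:

```python
import numpy

two_dim_array_for_counting = numpy.array([[3, 1, 10],
                                          [3, 5, 9],
                                          [2, 5, 3],
                                          [2, 3, 8],
                                          [1, 3, 9]])

# minlength=16 guarantees slots 0..15 even though column 0 only holds 1..3
first_column_counts = numpy.bincount(two_dim_array_for_counting[:, 0], minlength=16)
print(first_column_counts[1:])  # drop slot 0, so position i-1 holds the count of value i
```

With a fixed length, the results of many test runs can be stacked or compared element by element.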
