How to count an element in each list in a data frame with pandas?

Question

Given such a data frame df:

0     1
1     [12]
1     [13]
2     [11,12]
1     [10,0,1]
....

I'd like to count how many times a certain value, for instance 12, occurs in each list in df. So I tried:

df.apply(list.count('12'))

but got this error: TypeError: descriptor 'count' requires a 'list' object but received a 'str'. But the values in df[1] really are lists! How can I fix this? Thanks!

jezrael · Accepted Answer · 2016-02-27 13:07:22Z

1

I think you can first select the column as a Series with ix and then apply a function that calls x.count(12) on each list:

import pandas as pd

d = { 0:pd.Series([1,1,2,1]),
      1:pd.Series([[12], [13], [11,12 ],[10,0,1]])}

df = pd.DataFrame(d)  

print df 
   0           1
0  1        [12]
1  1        [13]
2  2    [11, 12]
3  1  [10, 0, 1]

print df.ix[:, 1]
0          [12]
1          [13]
2      [11, 12]
3    [10, 0, 1]
Name: 1, dtype: object

print df.ix[:, 1].apply(lambda x: x.count(12))   
0    1
1    0
2    1
3    0
Name: 1, dtype: int64

Or select the column by position with iloc:

print df.iloc[:, 1].apply(lambda x: x.count(12))   
0    1
1    0
2    1
3    0
Name: 1, dtype: int64
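
On current pandas versions the .ix indexer is no longer available (it was deprecated in 0.20 and removed in 1.0), and print is a function in Python 3. A minimal sketch of the same steps for a recent setup, assuming the same sample data as above, might look like this (.loc selects by label, .iloc by position):

```python
import pandas as pd

# Same sample data as in the answer above
df = pd.DataFrame({0: [1, 1, 2, 1],
                   1: [[12], [13], [11, 12], [10, 0, 1]]})

# Label-based selection of the column named 1
by_label = df.loc[:, 1].apply(lambda x: x.count(12))

# Position-based selection of the second column
by_position = df.iloc[:, 1].apply(lambda x: x.count(12))

print(by_label.tolist())     # [1, 0, 1, 0]
print(by_position.tolist())  # [1, 0, 1, 0]
```

Both selections return the same Series here because the column labelled 1 also sits at position 1.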

EDIT:

I think column 1 contains NaN values.

You can filter them out with notnull() before applying the count:

print df 
   0           1
0  1         NaN
1  1        [13]
2  2    [11, 12]
3  1  [10, 0, 1]

print df.ix[:, 1].notnull()
0    False
1     True
2     True
3     True
Name: 1, dtype: bool

print df.ix[df.ix[:, 1].notnull(), 1].apply(lambda x: x.count(12))   
1    0
2    1
3    0
Name: 1, dtype: int64
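
If the NaN rows should stay in the result with a count of 0 instead of being dropped, one possible variation is to check the type inside the applied function. This is only a sketch on a recent pandas version; the helper name count_in_list is made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [1, 1, 2, 1],
                   1: [np.nan, [13], [11, 12], [10, 0, 1]]})

def count_in_list(value, target):
    """Hypothetical helper: occurrences of target if value is a list, else 0."""
    return value.count(target) if isinstance(value, list) else 0

# Every row is kept, so the result stays aligned with df's index
counts = df[1].apply(lambda v: count_in_list(v, 12))
print(counts.tolist())  # [0, 0, 1, 0]
```

Keeping all rows makes it straightforward to assign the result back as a new column of df.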

EDIT2:

If you want to filter by index (e.g. 0:2) and also exclude NaN in column 1:

print df 
   0           1
0  1         NaN
1  1        [13]
2  2    [11, 12]
3  1  [10, 0, 1]

# filter df by index: only labels 0 to 2 (inclusive)
print df.ix[0:2, 1]
0         NaN
1        [13]
2    [11, 12]
Name: 1, dtype: object

# boolean Series: True where the filtered column is not null
print df.ix[0:2, 1].notnull()
0    False
1     True
2     True
Name: 1, dtype: bool

# get column 1: first restricted to index 0:2, then to the non-null values
print df.ix[0:2, 1][df.ix[0:2, 1].notnull()]
1        [13]
2    [11, 12]
Name: 1, dtype: object

# same as above, but more readable
df1 = df.ix[0:2, 1]
print df1
0         NaN
1        [13]
2    [11, 12]
Name: 1, dtype: object

print df1[df1.notnull()]
1        [13]
2    [11, 12]
Name: 1, dtype: object

#apply count
print df1[df1.notnull()].apply(lambda x: x.count(12))   
1    0
2    1
Name: 1, dtype: int64
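
On a recent pandas version, where .ix is no longer available, the same two filters could be sketched with .loc and dropna(). This assumes the sample data shown above; dropna() is swapped in here for the notnull() mask and is not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [1, 1, 2, 1],
                   1: [np.nan, [13], [11, 12], [10, 0, 1]]})

# .loc slices by label and includes both endpoints, so 0:2 keeps rows 0, 1 and 2
subset = df.loc[0:2, 1]

# dropna() removes the NaN row before counting
counts = subset.dropna().apply(lambda x: x.count(12))
print(counts.to_dict())  # {1: 0, 2: 1}
```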

edited Feb 27, 2016 at 13:07

answered Feb 27, 2016 at 9:34

9 Comments

Romain Over a year ago

Hello, why are you using iloc or ix when simple name indexing with df[1] does the job?

jezrael Over a year ago

I think it is best practice, but I can't find a good source.

Romain Over a year ago

I see what you mean: to avoid the ambiguity between label access and positional access. A quote from the pandas documentation (Indexing and Selecting Data): "Integers are valid labels, but they refer to the label and not the position."

jezrael Over a year ago

Yes, you are right. I think it is better to use ix or iloc for integer column names.

user4462740 Over a year ago

I tried your answer too but got this error: AttributeError: 'float' object has no attribute 'count'

Romain · Accepted Answer · 2016-02-27 12:40:20Z

1

The count has to be applied to the column, one list at a time:

import pandas as pd

# Test data
df = pd.DataFrame({1: [[1], [12], [13], [11,12], [10,0,1]]})

df[1].apply(lambda x: x.count(12))

0    0
1    1
2    0
3    1
4    0
Name: 1, dtype: int64
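
For context on the error in the question: list.count('12') is evaluated before apply ever runs. Calling the method on the list class itself makes Python treat '12' as the object to search in, and because that is a str rather than a list, a TypeError is raised. The sketch below reproduces this and contrasts it with passing a function to apply; the exact wording of the error message differs between Python versions.

```python
import pandas as pd

df = pd.DataFrame({1: [[1], [12], [13], [11, 12], [10, 0, 1]]})

error_raised = False
try:
    # Evaluated immediately: '12' is taken as the instance, not as the value to count
    df.apply(list.count('12'))
except TypeError as exc:
    error_raised = True
    print("TypeError:", exc)

# Passing a function lets apply call it once per element of the column
counts = df[1].apply(lambda x: x.count(12))
print(counts.tolist())  # [0, 1, 0, 1, 0]
```

Note also that the lists hold integers, so the value to count is 12 rather than the string '12'.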

A modification to handle the case where some values are not stored in a list:

# An example with values not stored in list 
df = pd.DataFrame({1: [12, [12], [13], [11,12], [10,0,1], 1]})

_check = 12
df[1].apply(lambda v: v.count(_check) if isinstance(v, list) else int(v == _check))

0    1
1    1
2    0
3    1
4    0
5    0
Name: 1, dtype: int64
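
As an alternative to a Python-level lambda, a sketch using Series.explode (available since pandas 0.25, so newer than this answer) flattens the lists, compares every element with the target and sums the matches per original row. Scalars pass through explode unchanged, so the mixed column from the example above is handled as well.

```python
import pandas as pd

s = pd.Series([12, [12], [13], [11, 12], [10, 0, 1], 1])
target = 12

# explode() gives one row per list element and repeats the original index label
flat = s.explode()

# Compare each element with the target, then add up the matches per original row
counts = flat.eq(target).astype(int).groupby(level=0).sum()
print(counts.tolist())  # [1, 1, 0, 1, 0, 0]
```

Whether this is faster than apply depends on the data, so it is worth timing both on a realistic sample.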

edited Feb 27, 2016 at 12:40

answered Feb 27, 2016 at 9:30

4 Comments

user4462740 Over a year ago

Why did I get this error: AttributeError: 'float' object has no attribute 'count'?

Romain Over a year ago

@ZICHAO LI Probably because not all the elements in the column are lists, but that is not what you said in the question.

Romain Over a year ago

@ZICHAOLI I just modified my answer to handle the case where some values are not stored in a list. Let me know if it works.

user4462740 Over a year ago

Indeed they are all in a list. But your solution is still very helpful. Thanks a lot!

Alexander · Accepted Answer · 2016-02-27 16:57:21Z

0

You can use a conditional generator expression:

import numpy as np
import pandas as pd

df = pd.DataFrame({0: [1, 1, 2, 1, 1, 2], 1: [np.nan, [13], [11, 12], [10, 0, 1], [12], [np.nan, 12]]})

target = 12
>>> sum(sub_list.count(target)
...     for sub_list in df.iloc[:, 1]
...     if not np.isnan(sub_list).all())
3

This is like the following conditional list comprehension:

>>> [sub_list.count(12) for sub_list in df.iloc[:, 1] if not np.isnan(sub_list).all()]
[0, 1, 0, 1, 1]

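The difference is that the generator expression passed to sum() evaluates each item lazily instead of first building the entire list, so it generally uses less memory. Note that the sum() version returns the total across all rows, whereas the list comprehension returns one count per non-NaN row.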
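
To make the contrast concrete, here is a small sketch that computes both results from the same column. It assumes the sample data above and swaps in an isinstance check for the np.isnan test, which is a simplification and not part of the original answer.

```python
import numpy as np
import pandas as pd

col = pd.Series([np.nan, [13], [11, 12], [10, 0, 1], [12], [np.nan, 12]])
target = 12

# List comprehension: builds the full list of per-row counts (NaN rows skipped)
per_row = [v.count(target) for v in col if isinstance(v, list)]

# Generator expression: feeds the counts to sum() one at a time, no intermediate list
total = sum(v.count(target) for v in col if isinstance(v, list))

print(per_row)  # [0, 1, 0, 1, 1]
print(total)    # 3
```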

edited Feb 27, 2016 at 16:57

answered Feb 27, 2016 at 16:51
