3

Given a data frame df like this:

0     1
1     [12]
1     [13]
2     [11,12]
1     [10,0,1]
....

I'd like to count how often a certain value, for instance 12, occurs in each list of df. So I tried:

df.apply(list.count('12'))

but got the error: TypeError: descriptor 'count' requires a 'list' object but received a 'str'. But the values in df[1] are lists! How can I fix this? Thanks!

3 Answers

1

You can first select the column as a Series with ix and then apply the function x.count(12) to it:

import pandas as pd

d = { 0:pd.Series([1,1,2,1]),
      1:pd.Series([[12], [13], [11,12 ],[10,0,1]])}

df = pd.DataFrame(d)  

print df 
   0           1
0  1        [12]
1  1        [13]
2  2    [11, 12]
3  1  [10, 0, 1]

print df.ix[:, 1]
0          [12]
1          [13]
2      [11, 12]
3    [10, 0, 1]
Name: 1, dtype: object

print df.ix[:, 1].apply(lambda x: x.count(12))   
0    1
1    0
2    1
3    0
Name: 1, dtype: int64

Or use iloc for selecting:

print df.iloc[:, 1].apply(lambda x: x.count(12))   
0    1
1    0
2    1
3    0
Name: 1, dtype: int64
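
Note that `.ix` was deprecated in pandas 0.20 and removed in pandas 1.0, and the `print` statements above are Python 2 syntax. Here is a minimal sketch of the same selection on a recent pandas under Python 3, assuming the same sample frame as above. The names `by_position` and `by_label` are only illustrative:

```python
import pandas as pd

# Same sample data as above: column 1 holds Python lists
df = pd.DataFrame({0: [1, 1, 2, 1],
                   1: [[12], [13], [11, 12], [10, 0, 1]]})

# Positional selection: the second column, whatever its label is
by_position = df.iloc[:, 1].apply(lambda x: x.count(12))

# Label selection: the column whose label is the integer 1
by_label = df.loc[:, 1].apply(lambda x: x.count(12))

print(by_position.tolist())
print(by_label.tolist())
```

Here the two selections agree because the column labelled 1 also sits at position 1. With other integer labels they could differ, which is the ambiguity `.iloc` and `.loc` make explicit.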

EDIT:

I think column 1 contains NaN.

In that case, you can keep only the non-null rows with notnull before applying the count:

print df 
   0           1
0  1         NaN
1  1        [13]
2  2    [11, 12]
3  1  [10, 0, 1]

print df.ix[:, 1].notnull()
0    False
1     True
2     True
3     True
Name: 1, dtype: bool

print df.ix[df.ix[:, 1].notnull(), 1].apply(lambda x: x.count(12))   
1    0
2    1
3    0
Name: 1, dtype: int64

EDIT2:

If you want to filter by index (e.g. 0:2) and also exclude NaN in column 1:

print df 
   0           1
0  1         NaN
1  1        [13]
2  2    [11, 12]
3  1  [10, 0, 1]

#filter df by index - only 0 to 2 
print df.ix[0:2, 1]
0         NaN
1        [13]
2    [11, 12]
Name: 1, dtype: object

#boolean Series: True where the filtered column is not null
print df.ix[0:2, 1].notnull()
0    False
1     True
2     True
Name: 1, dtype: bool

#get column 1: first filter to index 0:2, then keep only non-null values
print df.ix[0:2, 1][df.ix[0:2, 1].notnull()]
1        [13]
2    [11, 12]
Name: 1, dtype: object

#same as above, but more readable
df1 =  df.ix[0:2, 1]
print df1
0         NaN
1        [13]
2    [11, 12]
Name: 1, dtype: object

print df1[df1.notnull()]
1        [13]
2    [11, 12]
Name: 1, dtype: object

#apply count
print df1[df1.notnull()].apply(lambda x: x.count(12))   
1    0
2    1
Name: 1, dtype: int64
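
The same filtering can be sketched on a recent pandas, where `.ix` is no longer available. This assumes the frame shown above with a NaN in column 1. `.loc[0:2, 1]` slices by label and includes both endpoints, and `dropna()` is used here in place of the `notnull()` mask:

```python
import numpy as np
import pandas as pd

# Sample frame from above, with a missing value in column 1
df = pd.DataFrame({0: [1, 1, 2, 1],
                   1: [np.nan, [13], [11, 12], [10, 0, 1]]})

# Rows with index labels 0 through 2 (inclusive), column labelled 1
subset = df.loc[0:2, 1]

# Drop the NaN row, then count occurrences of 12 in each remaining list
counts = subset.dropna().apply(lambda x: x.count(12))

print(counts.to_dict())
```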

9 Comments

Hello, why are you using iloc or ix when simple name indexing with df[1] does the job?
I think it is best practice, but I can't find a good source.
I see what you mean: to avoid the ambiguity between label access and positional access. A quote from the pandas documentation (Indexing and Selecting Data): "Integers are valid labels, but they refer to the label and not the position."
Yes, you are right. I think it is better to use ix or iloc for integer column names.
I tried your answer too but got this error: AttributeError: 'float' object has no attribute 'count'
1

The count has to be applied on the column.

import pandas as pd

# Test data
df = pd.DataFrame({1: [[1], [12], [13], [11,12], [10,0,1]]})

df[1].apply(lambda x: x.count(12))

0    0
1    1
2    0
3    1
4    0
Name: 1, dtype: int64
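
As for the error in the question: `list.count('12')` is evaluated before `apply` ever runs, and calling the method on the `list` class itself makes Python treat `'12'` as the list to search, hence the `TypeError`. Note also that `'12'` is a string, while the lists hold integers. A small sketch in plain Python, with illustrative variable names:

```python
values = [11, 12]

# Calling through the class needs the list instance as the first argument,
# so these two calls are equivalent
print(list.count(values, 12))
print(values.count(12))

# The string '12' never equals the integer 12
print(values.count('12'))

# With only one argument, '12' is taken as the instance and rejected
raised = False
try:
    list.count('12')
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```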

A modification to handle the case where some values are not stored in a list:

# An example with values not stored in list 
df = pd.DataFrame({1: [12, [12], [13], [11,12], [10,0,1], 1]})

_check = 12
df[1].apply(lambda l: l.count(_check) if isinstance(l, list) else int(l == _check))

0    1
1    1
2    0
3    1
4    0
5    0
Name: 1, dtype: int64

4 Comments

Why did I get this error: AttributeError: 'float' object has no attribute 'count'?
@ZICHAO LI Probably because not all the elements in the column are lists, which is not what you said in the question.
@ZICHAOLI I just modified my answer to handle the case where some values are not stored in a list. Let me know if it works.
Indeed they are all in a list. But your solution is still very helpful. Thanks a lot!
0

You can use a conditional generator expression:

import numpy as np
import pandas as pd

df = pd.DataFrame({0: [1, 1, 2, 1, 1, 2], 1: [np.nan, [13], [11, 12], [10, 0, 1], [12], [np.nan, 12]]})

target = 12
>>> sum(sub_list.count(target)
...     for sub_list in df.iloc[:, 1]
...     if not np.isnan(sub_list).all())
3

This is like the following conditional list comprehension:

>>> [sub_list.count(12) for sub_list in df.iloc[:, 1] if not np.isnan(sub_list).all()]
[0, 1, 0, 1, 1]

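The difference is that the generator expression passed to sum yields one count at a time instead of first building the entire list, so it generally uses less memory. Note that sum returns the total across all rows, while the list comprehension keeps one count per non-null row.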
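
To make the two forms concrete without pandas, here is a minimal sketch in plain Python. The list `column` is a stand-in for `df.iloc[:, 1]`, and `is_missing` is a hypothetical helper that replaces the `np.isnan(...).all()` test used above:

```python
import math

# Stand-in for df.iloc[:, 1]: lists plus one bare NaN for a missing cell
column = [float("nan"), [13], [11, 12], [10, 0, 1], [12], [float("nan"), 12]]
target = 12

def is_missing(value):
    # A missing cell is a bare float NaN rather than a list
    return isinstance(value, float) and math.isnan(value)

# List comprehension: builds the full list of per-row counts
per_row = [cell.count(target) for cell in column if not is_missing(cell)]

# Generator expression: feeds counts to sum() one at a time
total = sum(cell.count(target) for cell in column if not is_missing(cell))

print(per_row)  # [0, 1, 0, 1, 1]
print(total)    # 3
```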
