1

I have pandas dataframe of the form,df=

index,result1,result2,result3 
  0     s       u       s     
  1     u       s       u   
  2     s                     
  3     s       s       u 

i would like to add another column that contains a list of the number of times s occurs in that row, for example

index,result1,result2,result3,count 
  0     s       u       s      2
  1     u       s       u      1
  2     s                      1
  3     s       s       u      2

i have tried the following code

col=['result1','result2','result3']
df[cols].count(axis=1)

but this returns

0,3
1,3
2,1
3,3

so this counts the number of elements, i then tried

df[df[cols]=='s'].count(axis=1)

but this returned the following error: "Could not compare ['s'] with block values"

Any help would be greatly appreciated

2
  • What are the missing values here? Are they blank strings or NaN? what does df.info() show? df=='s' will work if you have all str or mixed dtypes but if you have any pure numerical columns or rows then this won't work, this will happen if you have any rows with all NaN You can try df.fillna('', inplace=True) then (df[cols] = 's').count(axis=1) should work Commented Nov 11, 2016 at 10:57
  • @WGP, Perhaps, df['count'] = (df[cols].values=='s').sum(1) would be a nice alternative? Commented Nov 11, 2016 at 11:09

1 Answer 1

2

For me works cast to string by astype numeric and NaN columns return your error:

print (df)
   index result1 result2  result3  result4
0      0       s       u        7      NaN
1      1       u       s        7      NaN
2      2       s     NaN        8      NaN
3      3       s       s        7      NaN
4      4     NaN     NaN        2      NaN

print (df.dtypes)
index        int64
result1     object
result2     object
result3      int64
result4    float64
dtype: object

cols = ['result1','result2','result3','result4']
df['count'] = df[df[cols].astype(str) == 's'].count(axis=1)
print (df)
   index result1 result2  result3  result4  count
0      0       s       u        7      NaN      1
1      1       u       s        7      NaN      1
2      2       s     NaN        8      NaN      1
3      3       s       s        7      NaN      2
4      4     NaN     NaN        2      NaN      0

Or sum only True values from boolean mask:

print (df[cols].astype(str) == 's')

  result1 result2 result3 result4
0    True   False   False   False
1   False    True   False   False
2    True   False   False   False
3    True    True   False   False
4   False   False   False   False

cols = ['result1','result2','result3','result4']
df['count'] = (df[cols].astype(str) =='s').sum(axis=1)
print (df)
   index result1 result2  result3  result4  count
0      0       s       u        7      NaN      1
1      1       u       s        7      NaN      1
2      2       s     NaN        8      NaN      1
3      3       s       s        7      NaN      2
4      4     NaN     NaN        2      NaN      0

Another nice solution is from Nickil Maveli - use numpy:

df['count'] = (df[cols].values=='s').sum(axis=1)
Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.