Conditional counting across a row in pandas when matching a string

Question

I have pandas dataframe of the form,df=

index,result1,result2,result3 
  0     s       u       s     
  1     u       s       u   
  2     s                     
  3     s       s       u

i would like to add another column that contains a list of the number of times s occurs in that row, for example

index,result1,result2,result3,count 
  0     s       u       s      2
  1     u       s       u      1
  2     s                      1
  3     s       s       u      2

i have tried the following code

col=['result1','result2','result3']
df[cols].count(axis=1)

but this returns

0,3
1,3
2,1
3,3

so this counts the number of elements, i then tried

df[df[cols]=='s'].count(axis=1)

but this returned the following error: "Could not compare ['s'] with block values"

Any help would be greatly appreciated

What are the missing values here? Are they blank strings or NaN? what does df.info() show? df=='s' will work if you have all str or mixed dtypes but if you have any pure numerical columns or rows then this won't work, this will happen if you have any rows with all NaN You can try df.fillna('', inplace=True) then (df[cols] = 's').count(axis=1) should work — EdChum
– EdChum, Commented Nov 11, 2016 at 10:57
@WGP, Perhaps, df['count'] = (df[cols].values=='s').sum(1) would be a nice alternative? — Nickil Maveli
– Nickil Maveli, Commented Nov 11, 2016 at 11:09

Community · Accepted Answer · 2017-05-23 12:13:43Z

For me works cast to string by astype numeric and NaN columns return your error:

print (df)
   index result1 result2  result3  result4
0      0       s       u        7      NaN
1      1       u       s        7      NaN
2      2       s     NaN        8      NaN
3      3       s       s        7      NaN
4      4     NaN     NaN        2      NaN

print (df.dtypes)
index        int64
result1     object
result2     object
result3      int64
result4    float64
dtype: object

cols = ['result1','result2','result3','result4']
df['count'] = df[df[cols].astype(str) == 's'].count(axis=1)
print (df)
   index result1 result2  result3  result4  count
0      0       s       u        7      NaN      1
1      1       u       s        7      NaN      1
2      2       s     NaN        8      NaN      1
3      3       s       s        7      NaN      2
4      4     NaN     NaN        2      NaN      0

Or sum only True values from boolean mask:

print (df[cols].astype(str) == 's')

  result1 result2 result3 result4
0    True   False   False   False
1   False    True   False   False
2    True   False   False   False
3    True    True   False   False
4   False   False   False   False

cols = ['result1','result2','result3','result4']
df['count'] = (df[cols].astype(str) =='s').sum(axis=1)
print (df)
   index result1 result2  result3  result4  count
0      0       s       u        7      NaN      1
1      1       u       s        7      NaN      1
2      2       s     NaN        8      NaN      1
3      3       s       s        7      NaN      2
4      4     NaN     NaN        2      NaN      0

Another nice solution is from Nickil Maveli - use numpy:

df['count'] = (df[cols].values=='s').sum(axis=1)

Collectives™ on Stack Overflow

Conditional counting across a row in pandas when matching a string

1 Answer 1

Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Comments

Your Answer

Sign up or log in

Post as a guest

Related