27

I have the following pandas DataFrame.

import pandas as pd
df = pd.read_csv('filename.csv')

print(df)

     dog      A         B           C
0     dog1    0.787575  0.159330    0.053095
1     dog10   0.770698  0.169487    0.059815
2     dog11   0.792689  0.152043    0.055268
3     dog12   0.785066  0.160361    0.054573
4     dog13   0.795455  0.150464    0.054081
5     dog14   0.794873  0.150700    0.054426
..    ....
8     dog19   0.811585  0.140207    0.048208
9     dog2    0.797202  0.152033    0.050765
10    dog20   0.801607  0.145137    0.053256
11    dog21   0.792689  0.152043    0.055268
    ....

I create a new column by summing columns "A", "B", "C" as follows:

df['total_ABC'] = df[["A", "B", "B"]].sum(axis=1)

Now I would like to do this based on a conditional, i.e. if "A" < 0.78 then create a new summed column df['smallA_sum'] = df[["A", "B", "B"]].sum(axis=1). Otherwise, the value should be zero.

How does one create conditional statements like this?

My thought would be to use

df['smallA_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if row['A'] < 0.78))

However, this doesn't work and I'm not able to specify axis.

How do you create a column based on the values of other columns?

You could also do something like for each df['dog'] == 'dog2', create column dog2_sum, i.e.

 df['dog2_sum'] = df1.apply(lambda row: (row['A']+row['B']+row['C']) if df['dog'] == 'dog2'))

but my approach is incorrect.

2 Answers 2

36

The following should work, here we mask the df where the condition is met, this will set NaN to the rows where the condition isn't met so we call fillna on the new col:

In [67]:
df = pd.DataFrame(np.random.randn(5,3), columns=list('ABC'))
df

Out[67]:
          A         B         C
0  0.197334  0.707852 -0.443475
1 -1.063765 -0.914877  1.585882
2  0.899477  1.064308  1.426789
3 -0.556486 -0.150080 -0.149494
4 -0.035858  0.777523 -0.453747

In [73]:    
df['total'] = df.loc[df['A'] > 0,['A','B']].sum(axis=1)
df['total'].fillna(0, inplace=True)
df

Out[73]:
          A         B         C     total
0  0.197334  0.707852 -0.443475  0.905186
1 -1.063765 -0.914877  1.585882  0.000000
2  0.899477  1.064308  1.426789  1.963785
3 -0.556486 -0.150080 -0.149494  0.000000
4 -0.035858  0.777523 -0.453747  0.000000

Another approach is to call where on the sum result, this takes a value param to return when the condition isn't met:

In [75]:
df['total'] = df[['A','B']].sum(axis=1).where(df['A'] > 0, 0)
df

Out[75]:
          A         B         C     total
0  0.197334  0.707852 -0.443475  0.905186
1 -1.063765 -0.914877  1.585882  0.000000
2  0.899477  1.064308  1.426789  1.963785
3 -0.556486 -0.150080 -0.149494  0.000000
4 -0.035858  0.777523 -0.453747  0.000000
Sign up to request clarification or add additional context in comments.

Comments

1

Another approach is to use numpy.where() method to select values. It returns elements chosen from the sum result if the condition is met, 0 otherwise. Due to a lower overhead, numpy methods are usually faster than their pandas cousins. Barring numba-jitted or Cython loops, this is the fastest approach for this specific task.

import numpy as np
df['Total'] = np.where(df['A'] < 0.78, df[['A','B','C']].sum(axis=1), 0)

or

df['total'] = np.where(df['dog'] == 'dog2', df[['A','B','C']].sum(axis=1), 0)

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.