Conditional statement / If statement with Dataframes

Question

I am trying to assign a value to the column 'Percentage' based on two other columns, 'Class' and 'Value'.

Below is a link that has my dataframe: https://filebin.net/fo2wk7crmwf0fycc

This is the logic that I want to be applied:

If df['Class'] equals 2 or 3, and if df['Value'] is <= 0.5, set df['Percentage'] to 0
If df['Class'] equals 2 or 3, and if df['Value'] is > 0.5 and <= 0.7, set df['Percentage'] to 0.25
If df['Class'] equals 2 or 3, and if df['Value'] is > 0.7 and <= 0.9, set df['Percentage'] to 0.5
Else set df['Percentage'] to 1

Below is an example of the output I am looking for:

Class	Value	Percentage
2	0.01	0
2	0.6	0.25
3	0.9	0.5
3	3	1

Thank you

piRSquared · Accepted Answer · 2021-03-22 17:42:43Z

Numpy and `searchsorted`

With searchsorted you only need the interior cut points; the outer boundaries (0 and 1 in this case) can be left out.

import numpy as np

bins = np.array([.5, .7, .9])
labels = np.array([0, .25, .5, 1])
cut = bins.searchsorted(df.Value)  # index of the bin each Value falls into
results = labels[cut]              # map each bin index to its percentage

df.assign(Percentage=np.where(df['Class'].isin([2, 3]), results, 1))

       Class     Value  Percentage
0          2  0.000620         0.0
1          2  0.000620         0.0
2          3  0.001240         0.0
3          4  0.000620         1.0
4          5  0.000620         1.0
...      ...       ...         ...
14782      5  0.001178         1.0
14783      2  0.001116         0.0
14784      3  0.001178         0.0
14785      5  0.000310         1.0
14786      5  0.001116         1.0

[14787 rows x 3 columns]
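
To see how the bin edges are treated, here is a minimal sketch on the four sample rows from the question, plus one extra row with Class 4 that I added to exercise the "else" branch. The small DataFrame and the name `idx` are assumptions for illustration, not part of the original data. With the default `side='left'`, `searchsorted` puts a value equal to a cut point into the lower bin, which matches the `<=` upper bounds in the question.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, plus a hypothetical Class 4 row.
df = pd.DataFrame({
    "Class": [2, 2, 3, 3, 4],
    "Value": [0.01, 0.6, 0.9, 3.0, 0.1],
})

bins = np.array([0.5, 0.7, 0.9])        # interior cut points only
labels = np.array([0, 0.25, 0.5, 1])    # one more label than cut points

# side="left" (the default): a value equal to a cut point lands in the lower bin.
idx = bins.searchsorted(df["Value"].to_numpy())

# Rows outside Class 2 or 3 always get 1.
df["Percentage"] = np.where(df["Class"].isin([2, 3]), labels[idx], 1)
print(df)
```

Values above the last cut point (such as 3) get index 3 and therefore the last label, so no explicit upper boundary is needed.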

Pandas `cut`

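When using pd.cut you DO need the boundaries because Pandas will create intervals. Use -np.inf and np.inf as the outer edges: with 0 and 1, a Value of exactly 0 or above 1 (such as the 3 in the question's example) falls outside every interval and becomes NaN.

#                        /   boundaries     \
#                       ↓                    ↓
cut = pd.cut(df.Value, [-np.inf, .5, .7, .9, np.inf], labels=[0, .25, .5, 1])

df.assign(Percentage=np.where(df['Class'].isin([2, 3]), cut, 1))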

       Class     Value  Percentage
0          2  0.000620         0.0
1          2  0.000620         0.0
2          3  0.001240         0.0
3          4  0.000620         1.0
4          5  0.000620         1.0
...      ...       ...         ...
14782      5  0.001178         1.0
14783      2  0.001116         0.0
14784      3  0.001178         0.0
14785      5  0.000310         1.0
14786      5  0.001116         1.0

[14787 rows x 3 columns]
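
The choice of outer edges matters for `pd.cut`. The sketch below compares edges of 0 and 1 with infinite edges on the sample values from the question. The names `bounded` and `open_ended` are mine, chosen for illustration.

```python
import numpy as np
import pandas as pd

values = pd.Series([0.01, 0.6, 0.9, 3.0])   # sample values from the question
labels = [0, 0.25, 0.5, 1]

# Outer edges 0 and 1: anything outside (0, 1] matches no interval.
bounded = pd.cut(values, [0, 0.5, 0.7, 0.9, 1], labels=labels)

# Infinite outer edges: every value falls into some interval.
open_ended = pd.cut(values, [-np.inf, 0.5, 0.7, 0.9, np.inf], labels=labels)

print(bounded.isna().tolist())              # the value 3.0 is NaN here
print(open_ended.astype(float).tolist())    # every value gets a label
```

Intervals are closed on the right by default, so 0.9 falls in (0.7, 0.9] and gets 0.5, consistent with the `<=` bounds in the question.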

SeaBean · Accepted Answer · 2021-03-22 17:23:58Z

You can also use plain np.where, as below:

import numpy as np
df['Percentage'] = np.where((df['Class'].isin([2, 3]) & (df['Value'] <= 0.5)), 0, 
                            np.where((df['Class'].isin([2, 3]) & (df['Value'] > 0.5) & (df['Value'] <= 0.7)), 0.25,
                                np.where((df['Class'].isin([2, 3]) & (df['Value'] > 0.7) & (df['Value'] <= 0.9) ), 0.5, 1)))

np.where works like an if-then-else statement, which makes the logic easy to follow.

       Class     Value  Percentage
0          2  0.000620         0.0
1          2  0.000620         0.0
2          3  0.001240         0.0
3          4  0.000620         1.0
4          5  0.000620         1.0
...      ...       ...         ...
14782      5  0.001178         1.0
14783      2  0.001116         0.0
14784      3  0.001178         0.0
14785      5  0.000310         1.0
14786      5  0.001116         1.0

[14787 rows x 3 columns]
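
If the nesting becomes hard to read, the same chain can be written flat with `np.select`. This is a different function from the one used in this answer, shown here only as a sketch on the question's sample rows, plus a hypothetical Class 4 row. The names `in_class`, `conditions` and `choices` are mine.

```python
import numpy as np
import pandas as pd

# Sample rows from the question, plus a hypothetical Class 4 row.
df = pd.DataFrame({
    "Class": [2, 2, 3, 3, 4],
    "Value": [0.01, 0.6, 0.9, 3.0, 0.1],
})

in_class = df["Class"].isin([2, 3])
v = df["Value"]

# Conditions are checked in order; the first one that is True wins.
conditions = [
    in_class & (v <= 0.5),
    in_class & (v > 0.5) & (v <= 0.7),
    in_class & (v > 0.7) & (v <= 0.9),
]
choices = [0.0, 0.25, 0.5]

# default plays the role of the final "else" branch.
df["Percentage"] = np.select(conditions, choices, default=1.0)
print(df)
```

Each condition lines up with its result, so adding another range means adding one entry to each list rather than another level of nesting.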


1 Comment

user11427018 Over a year ago

Thank you so much, this solution works perfectly!
