I am trying to assign a value to the column 'Percentage' based on two other columns, 'Class' and 'Value'.

Below is a link that has my dataframe: https://filebin.net/fo2wk7crmwf0fycc

This is the logic that I want to be applied:

If df['Class'] equals 2 or 3, and if df['Value'] is <= 0.5, set df['Percentage'] to 0
If df['Class'] equals 2 or 3, and if df['Value'] is > 0.5 and <= 0.7, set df['Percentage'] to 0.25
If df['Class'] equals 2 or 3, and if df['Value'] is > 0.7 and <= 0.9, set df['Percentage'] to 0.5
Else set df['Percentage'] to 1

Below is an example of the output I am looking for:

Class  Value  Percentage
2      0.01   0
2      0.6    0.25
3      0.9    0.5
3      3      1

Thank you

2 Answers

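NumPy and searchsorted

With searchsorted you pass only the interior thresholds. The outer boundaries (0 and 1 in this case) are not needed, because any Value above the last threshold simply maps to the final label.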

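import numpy as np

bins = np.array([.5, .7, .9])        # upper edges; a Value equal to an edge stays in the lower bin
labels = np.array([0, .25, .5, 1])   # one more label than there are edges
cut = bins.searchsorted(df.Value)    # bin index (0 to 3) for every row
results = labels[cut]

# keep the binned result only where Class is 2 or 3; every other row gets 1
df.assign(Percentage=np.where(df['Class'].isin([2, 3]), results, 1))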

       Class     Value  Percentage
0          2  0.000620         0.0
1          2  0.000620         0.0
2          3  0.001240         0.0
3          4  0.000620         1.0
4          5  0.000620         1.0
...      ...       ...         ...
14782      5  0.001178         1.0
14783      2  0.001116         0.0
14784      3  0.001178         0.0
14785      5  0.000310         1.0
14786      5  0.001116         1.0

[14787 rows x 3 columns]
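
To check the boundary behaviour without the linked file, here is a minimal sketch on the four example rows from the question. The extra Class 5 row is an assumption added here to exercise the "else" branch; it is not part of the question's example.

```python
import numpy as np
import pandas as pd

# The question's four example rows, plus one Class 5 row (added for illustration)
df = pd.DataFrame({"Class": [2, 2, 3, 3, 5],
                   "Value": [0.01, 0.6, 0.9, 3.0, 0.6]})

bins = np.array([0.5, 0.7, 0.9])
labels = np.array([0, 0.25, 0.5, 1])

# Default side='left': a Value equal to an edge (0.9 here) falls in the lower bin
idx = bins.searchsorted(df["Value"].to_numpy())

# assign returns a new frame; df itself is left unchanged
out = df.assign(Percentage=np.where(df["Class"].isin([2, 3]), labels[idx], 1))
print(out)
```

Note that assign does not modify df in place, so the result has to be bound to a name (or written back with df['Percentage'] = ...) to keep it.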

Pandas cut

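When using pd.cut you DO need the outer boundaries because Pandas builds its intervals from consecutive edges. Use -np.inf and np.inf rather than 0 and 1: with finite outer edges, any Value at or below 0 or above 1 (such as the 3 in the question's example) falls outside every interval and comes back as NaN.

#                        /    boundaries    \
#                       ↓                    ↓
cut = pd.cut(df.Value, [-np.inf, .5, .7, .9, np.inf], labels=[0, .25, .5, 1])

df.assign(Percentage=np.where(df['Class'].isin([2, 3]), cut, 1))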

       Class     Value  Percentage
0          2  0.000620         0.0
1          2  0.000620         0.0
2          3  0.001240         0.0
3          4  0.000620         1.0
4          5  0.000620         1.0
...      ...       ...         ...
14782      5  0.001178         1.0
14783      2  0.001116         0.0
14784      3  0.001178         0.0
14785      5  0.000310         1.0
14786      5  0.001116         1.0

[14787 rows x 3 columns]
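
The choice of outer edges matters once a Value lies outside the 0 to 1 range. The sketch below compares finite and infinite outer edges on the Value column of the question's four example rows; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

values = pd.Series([0.01, 0.6, 0.9, 3.0])
labels = [0, 0.25, 0.5, 1]

# Finite outer edges: 3.0 lies outside (0.9, 1] and comes back as NaN
finite = pd.cut(values, [0, 0.5, 0.7, 0.9, 1], labels=labels).astype(float)

# Infinite outer edges: every real number lands in some interval
open_ended = pd.cut(values, [-np.inf, 0.5, 0.7, 0.9, np.inf],
                    labels=labels).astype(float)

print(finite.tolist())
print(open_ended.tolist())
```

The linked data appears to contain only small values, which would explain why the printed output above looks the same either way.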

You can also use plain nested np.where calls:

import numpy as np

in_class = df['Class'].isin([2, 3])  # rows the thresholds apply to
# each nested call only sees rows the previous condition rejected,
# so the lower bounds (> 0.5, > 0.7) do not need to be repeated
df['Percentage'] = np.where(in_class & (df['Value'] <= 0.5), 0,
                   np.where(in_class & (df['Value'] <= 0.7), 0.25,
                   np.where(in_class & (df['Value'] <= 0.9), 0.5, 1)))

np.where works like a vectorized if-then-else: where the condition is true it takes the first value, otherwise the second, so the nesting reads as an if / elif / else chain.

       Class     Value  Percentage
0          2  0.000620         0.0
1          2  0.000620         0.0
2          3  0.001240         0.0
3          4  0.000620         1.0
4          5  0.000620         1.0
...      ...       ...         ...
14782      5  0.001178         1.0
14783      2  0.001116         0.0
14784      3  0.001178         0.0
14785      5  0.000310         1.0
14786      5  0.001116         1.0

[14787 rows x 3 columns]
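
If the nesting gets hard to read, np.select expresses the same chain as a flat list of conditions. This is a different NumPy function from the one used in this answer, shown here only as a sketch on the question's example rows (the Class 5 row is an added assumption).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Class": [2, 2, 3, 3, 5],
                   "Value": [0.01, 0.6, 0.9, 3.0, 0.6]})

in_class = df["Class"].isin([2, 3])

# Conditions are checked in order; the first one that matches a row wins
conditions = [
    in_class & (df["Value"] <= 0.5),
    in_class & (df["Value"] <= 0.7),
    in_class & (df["Value"] <= 0.9),
]
choices = [0.0, 0.25, 0.5]

# default plays the role of the final "else" branch
df["Percentage"] = np.select(conditions, choices, default=1.0)
print(df)
```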

1 Comment

Thank you so much, this solution works perfectly!
