Condition is ignored using .where() in Pandas

Question

I'm trying to calculate the Bonus Pay given to our couriers based on their Success Rate, which is based on the number of Eligible orders they've successfully delivered.

Here's my code:

from openpyxl import load_workbook
import pandas as pd

df = pd.read_excel(r'path\filename.xlsx')


df['DeliveredAt'] = pd.to_datetime(df['DeliveredAt'].astype(str))

df['Date'] = df['DeliveredAt'].dt.strftime('%d/%m/%y')

df['Total Orders'] = df['OrderNumber']

df['Total Orders'] = 1

df['Eligible'] = df['DeliveryOnTime'].isin(["On-time", "Early"])

df['Success Rate'] = (df['Eligible']/df['Total Orders'])*100

df['Bonus'] = df['Eligible'].mul(1.2).where(df['Success Rate'] >= 95)

per_day = df.groupby(['Date', 'Courier']).agg({'Success Rate': 'mean', 'Total Orders': 'count', 'Eligible': 'sum', 'Incentive': 'sum'})

per_courier = df.groupby('Courier').agg({'Total Orders': 'count', 'Incentive': 'sum'})

I'm having issues with

df['Bonus'] = df['Eligible'].mul(1.2).where(df['Success Rate'] >= 95)

It seems to ignore the condition that Success Rate has to be over 95 for it to qualify for a Bonus. Right now, each row has a Bonus pay which is calculated from the number of Eligible * 1.2 regardless of the success rate value.

Sample Data:

data = {'ID': [1, 1, 1, 2, 2, 3, 4, 5, 5],
         'DeliveryOnTime': ["On-time", "Late", "Early", "On-time", "On-time", "Late", "Early", "Early", "Late"],
      }

df = pd.DataFrame(data)

#For the sake of example data, the count of `DeliveryOnTime` will be the total number of orders. 
df['Total Orders'] = df['DeliveryOnTime'].count() 
df['Eligible'] = df['DeliveryOnTime'].isin(["On-time", "Early"])
df['Success Rate'] = (df['Eligible']/df['Total Orders'])*100
df['Bonus'] = df['Eligible'].mul(1.2).where(df['Success Rate'] >= 95)

Current Output

    ID       Eligible      Total Orders    Success Rate   Bonus
     1           2              3            66.66         2.4
     2           2              2             100          2.4
     3           0              1              0            0
     4           1              1             100          1.2
     5           1              2              50          1.2

This is wrong as ID 1 and 5 should not have any Bonus at all with the Success Rate being lower than 95.

Expected Output

ID       Eligible     Total Orders    Success Rate    Bonus
 1           2              3            66.66          0
 2           2              2             100          2.4
 3           0              1              0            0
 4           1              1             100          1.2
 5           1              2              50           0

I'm not sure if it's something I have to change in the Per_day or Per_courier section of the .groupby() as that seems to be where most of my issues have occurred when writing this code.

Thanks

Please rework your question to put emphasis on your minimal reproducible example (provide the expected output).
– mozway, Commented Nov 26, 2021 at 9:50
@mozway Hi, thanks for your reply. I've added an expected output from the sample now.
– cchev, Commented Nov 26, 2021 at 10:09

Rupal Shah · Accepted Answer · 2021-11-26 10:18:52Z

1

As I understand it, you need to calculate the Bonus based on each courier's success rate. To do that, first count the eligible deliveries for each courier.

Using your own example:

data = {'ID': [1, 1, 1, 2, 2, 3, 4, 5, 5],
         'DeliveryOnTime': ["On-time", "Late", "Early", "On-time", "On-time", "Late", "Early", "Early", "Late"],
      }

df = pd.DataFrame(data)

df['Eligible'] = df['DeliveryOnTime'].isin(["On-time", "Early"])
df['Total Orders'] = df['DeliveryOnTime'].count() 

df.replace([True,False], [1,0],inplace=True)
per_courier_eligible = df.groupby('ID').agg({'Eligible': 'sum','Total Orders':'count'})
per_courier_eligible['Success Rate'] = per_courier_eligible['Eligible']/per_courier_eligible['Total Orders']
# Bonus is Eligible * 1.2 where the success rate is at least 95%, otherwise 0.
per_courier_eligible['Bonus'] = (per_courier_eligible['Eligible'] * 1.2).where(per_courier_eligible['Success Rate'] >= 0.95, 0)
print(per_courier_eligible)

answered Nov 26, 2021 at 10:18

Rupal Shah


3 Comments

Rupal Shah Over a year ago

replace that statement with the following: per_courier_eligible['Bonus'] = per_courier_eligible['Bonus'].where(per_courier_eligible['Success Rate'] >= 0.95, per_courier_eligible['Eligible'] * 1.2)

cchev Over a year ago

Hi thanks for your help, I didn't use your code but I used an element of filtering it through each ID for it to work. I appreciate your help!

Rupal Shah Over a year ago

It's good you could make your code work.
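
A note on the `.where()` call in the comment above: `Series.where(cond, other)` keeps the original value where `cond` is True and substitutes `other` where it is False. As written, that call appears to keep the unmultiplied `Eligible` count for qualifying couriers and apply the 1.2 multiplier to everyone else, which looks inverted relative to the expected output. A small sketch of the semantics, with made-up series `s` and `rate` standing in for the per-courier columns:

```python
import pandas as pd

# Hypothetical per-courier values, mirroring the sample in the question.
s = pd.Series([2, 2, 0, 1, 1])                 # eligible deliveries
rate = pd.Series([0.67, 1.0, 0.0, 1.0, 0.5])   # success rate as a fraction

# Keep s * 1.2 where the rate qualifies; use 0 everywhere else.
bonus = s.mul(1.2).where(rate >= 0.95, 0)
print(bonus.tolist())  # [0.0, 2.4, 0.0, 1.2, 0.0]
```

Without the second argument, the non-qualifying entries would be NaN rather than 0.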

rl4215 · Accepted Answer · 2021-11-26 10:18:52Z

1

Is the problem with the order of .mul and .where?

This worked when I tried to reproduce:

df['Bonus'] = df['Eligible'].where(df['Success Rate'] >= 95).mul(1.2)

answered Nov 26, 2021 at 10:18

rl4215


1 Comment

cchev Over a year ago

Hey thanks for your answer but unfortunately it still multiplies those under 95 Success Rate even with it switched round.
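
A possible explanation for why swapping the order makes no difference: in the question's code `Total Orders` is 1 on every row, so the row-level `Success Rate` can only be 0 or 100. The condition is applied, but only per order, and summing `Bonus` per courier afterwards still gives Eligible * 1.2. A minimal sketch, using a reduced version of the sample data (IDs 1 and 5 only) and assuming `Total Orders = 1` per row as in the original code:

```python
import pandas as pd

# Reduced sample: only the two IDs that should not earn a bonus.
df = pd.DataFrame({
    'ID': [1, 1, 1, 5, 5],
    'DeliveryOnTime': ["On-time", "Late", "Early", "Early", "Late"],
})
df['Total Orders'] = 1  # one row per order, as in the question's code
df['Eligible'] = df['DeliveryOnTime'].isin(["On-time", "Early"])

# Per row, the rate is 100 for an eligible order and 0 otherwise.
df['Success Rate'] = df['Eligible'] / df['Total Orders'] * 100

# Both orders of .mul() and .where() give 1.2 on eligible rows and NaN elsewhere.
mul_then_where = df['Eligible'].mul(1.2).where(df['Success Rate'] >= 95)
where_then_mul = df['Eligible'].where(df['Success Rate'] >= 95).mul(1.2)
print(mul_then_where.tolist())
print(where_then_mul.tolist())

# Summing per ID skips the NaN rows, so each ID ends up with Eligible * 1.2.
df['Bonus'] = mul_then_where
per_id = df.groupby('ID').agg({'Eligible': 'sum', 'Bonus': 'sum'})
print(per_id)
```

The threshold only becomes meaningful once the rate is computed per courier, after aggregation.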

cchev · Accepted Answer · 2021-12-17 15:12:14Z

0

I used an element of what @Rupal Shah said by filtering it per ID for the Success Rate and Bonus pay to work.

All I had to do was create a .groupby():

ID = df.groupby(['ID', 'Date']).agg({'Total Orders':'sum','Eligible': 'sum'})

After this, I changed the calculations that followed to use ID. instead of df., e.g.:

ID['Success Rate'] = (ID['Eligible']/ID['Total Orders'])*100
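
Putting that together for the sample data, a minimal sketch might look like the following. It assumes one row per order and groups by `ID` only, since the sample has no `Date` column. The name `per_id` is chosen here for illustration, and `0` as the fallback for `.where()` is taken from the expected output in the question.

```python
import pandas as pd

# Sample data from the question; 'ID' stands in for the courier.
data = {
    'ID': [1, 1, 1, 2, 2, 3, 4, 5, 5],
    'DeliveryOnTime': ["On-time", "Late", "Early", "On-time", "On-time",
                       "Late", "Early", "Early", "Late"],
}
df = pd.DataFrame(data)

df['Total Orders'] = 1  # assumption: one row per order
df['Eligible'] = df['DeliveryOnTime'].isin(["On-time", "Early"])

# Aggregate first, so the rate is computed per ID rather than per row.
per_id = df.groupby('ID').agg({'Total Orders': 'sum', 'Eligible': 'sum'})
per_id['Success Rate'] = per_id['Eligible'] / per_id['Total Orders'] * 100

# Keep Eligible * 1.2 where the rate qualifies; otherwise use 0.
per_id['Bonus'] = per_id['Eligible'].mul(1.2).where(per_id['Success Rate'] >= 95, 0)
print(per_id)
```

For the real data, adding `'Date'` to the `groupby` keys should give the per-day version of the same table.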

edited Dec 17, 2021 at 15:12

answered Nov 26, 2021 at 11:01

cchev

