
I have a pandas DataFrame like this example:

import pandas as pd

df = pd.DataFrame({
    'steps': ['step1', 'step2', 'step3', 'step4', 'step5'],
    'qty': [100, 95, 92, 87, 78]},
    index=[0, 1, 2, 3, 4])

I would like to calculate the abandonment rate at each step, expressed as a percentage of the first value.

Expected output:

Steps   qty   Tx %
Step1   100   0.00%
Step2   95    5.00%
Step3   92    3.00%
Step4   87    5.00%
Step5   78    9.00%

I thought about using pct_change() (as in df['qty'].pct_change()), but it doesn't give the result I expect.
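For context, a minimal check, assuming the example df above, suggests why the numbers differ: pct_change() divides each change by the previous row's value, while the desired figure divides the drop by the first value. The variable names below are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'steps': ['step1', 'step2', 'step3', 'step4', 'step5'],
    'qty': [100, 95, 92, 87, 78],
})

# pct_change compares each row with the previous row:
# (qty[i] - qty[i-1]) / qty[i-1]
relative_to_previous = df['qty'].pct_change() * 100

# The desired figure divides the drop by the first value instead:
# (qty[i-1] - qty[i]) / qty[0]
relative_to_first = (df['qty'].shift() - df['qty']) / df['qty'].iloc[0] * 100

# The first row is NaN in both series because it has no previous row
print(relative_to_previous.round(2).tolist())
print(relative_to_first.round(2).tolist())
```

Apart from the sign, the two series only agree when the previous value happens to equal the first value, which here is the case for step2 alone.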

The manual calculation would be something like:


values = [(1 - df['qty'][0]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
          (1 - df['qty'][1]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
          (1 - df['qty'][2]/df['qty'][0]) - (1-df['qty'][1]/df['qty'][0]),
          (1 - df['qty'][3]/df['qty'][0]) - (1-df['qty'][2]/df['qty'][0]),
          (1 - df['qty'][4]/df['qty'][0]) - (1-df['qty'][3]/df['qty'][0])]

However, this does not scale, especially as the number of steps or time periods grows.

Could someone suggest a function or an approach that makes this calculation simpler?

3 Answers

2

You can use pd.Series.shift to shift the qty column one row down, then subtract the original column from the shifted one:

import pandas as pd

df = pd.DataFrame({
    'steps': ['step1','step2', 'step3', 'step4','step5'],
    'qty': [100, 95, 92, 87, 78],
})

df['Tx %'] = df.qty.shift() - df.qty
print(df)

# output:
#    steps  qty  Tx %
# 0  step1  100   NaN
# 1  step2   95   5.0
# 2  step3   92   3.0
# 3  step4   87   5.0
# 4  step5   78   9.0
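The values above match the expected percentages only because the first qty is exactly 100. A minimal sketch, assuming the same df, that divides by the first value and fills the first row with 0 could look like this. It uses Series.diff, which is a swapped-in equivalent of the shift-based subtraction rather than the code from this answer, and the name drop is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'steps': ['step1', 'step2', 'step3', 'step4', 'step5'],
    'qty': [100, 95, 92, 87, 78],
})

# Series.diff() returns qty[i] - qty[i-1]; negating it gives the drop per step
drop = -df['qty'].diff()

# Same values as the shift-based expression (the first row is NaN in both)
assert drop.equals(df['qty'].shift() - df['qty'])

# Express the drop as a percentage of the first value. .iloc[0] is positional,
# so it does not depend on the index labels.
df['Tx %'] = (drop / df['qty'].iloc[0] * 100).fillna(0)
print(df)
```

Using .iloc[0] instead of [0] is a deliberate choice here: it keeps working if the DataFrame is later given a non-integer or re-ordered index.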

3 Comments

I think you should add / df.qty[0] to the calculation
@NoThlnG this will depend on whether OP wants to use percentage as a number (5 as in 5%) or as a fraction (0.05) (the question is not very clear regarding this).
This works! df['Tx %'] = round(((1 - df.qty / df.qty[0]) - (1- df.qty.shift() / df.qty[0]))*100, 2)
1

This works:

# you have
values = [(1 - df['qty'][0]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
          (1 - df['qty'][1]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
          (1 - df['qty'][2]/df['qty'][0]) - (1-df['qty'][1]/df['qty'][0]),
          (1 - df['qty'][3]/df['qty'][0]) - (1-df['qty'][2]/df['qty'][0]),
          (1 - df['qty'][4]/df['qty'][0]) - (1-df['qty'][3]/df['qty'][0])]

# 1s cancel out, so the above is equivalent to
values = [( - df['qty'][0]/df['qty'][0]) + (df['qty'][0]/df['qty'][0]),
          ( - df['qty'][1]/df['qty'][0]) + (df['qty'][0]/df['qty'][0]),
          ( - df['qty'][2]/df['qty'][0]) + (df['qty'][1]/df['qty'][0]),
          ( - df['qty'][3]/df['qty'][0]) + (df['qty'][2]/df['qty'][0]),
          ( - df['qty'][4]/df['qty'][0]) + (df['qty'][3]/df['qty'][0])]

# since every element is divided by df['qty'][0], you can take it out, so the above is equivalent to
values = [( - df['qty'][0] + df['qty'][0]) / df['qty'][0],
          ( - df['qty'][1] + df['qty'][0]) / df['qty'][0],
          ( - df['qty'][2] + df['qty'][1]) / df['qty'][0],
          ( - df['qty'][3] + df['qty'][2]) / df['qty'][0],
          ( - df['qty'][4] + df['qty'][3]) / df['qty'][0]]

# the right-hand term is the left-hand term shifted one row down, so it's equivalent to:
values = (- df['qty'] + df['qty'].shift()).fillna(0) / df['qty'][0]

# in summary
df['Tx %'] = (df['qty'].shift() - df['qty']).fillna(0) / df['qty'][0] * 100
print(df)

# output:
#    steps  qty  Tx %
# 0  step1  100   0.0
# 1  step2   95   5.0
# 2  step3   92   3.0
# 3  step4   87   5.0
# 4  step5   78   9.0
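Since the question mentions more steps or time periods, the same derivation can be wrapped in a small helper. This is only a sketch under stated assumptions: the function name abandonment_pct, the funnel DataFrame, and the 'feb' figures are hypothetical and not part of the original data.

```python
import pandas as pd

def abandonment_pct(counts):
    """Drop between consecutive rows as a percentage of the first row.

    Accepts a Series, or a DataFrame with one column per period.
    """
    drop = counts.shift() - counts
    # For a DataFrame, counts.iloc[0] is the first row, and the division
    # aligns on column labels, so each column uses its own first value.
    return (drop / counts.iloc[0] * 100).fillna(0).round(2)

# Hypothetical funnel with two periods; 'jan' reuses the question's numbers
funnel = pd.DataFrame(
    {'jan': [100, 95, 92, 87, 78], 'feb': [200, 180, 170, 150, 120]},
    index=['step1', 'step2', 'step3', 'step4', 'step5'],
)
result = abandonment_pct(funnel)
print(result)
```

Because nothing in the helper refers to a specific row label or column name, adding steps or periods only changes the input, not the code.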


0

This resolved my question:

df['Tx %'] = round(((1 - df.qty / df.qty[0]) - (1- df.qty.shift() / df.qty[0]))*100, 2)

Thanks to @jfaccioni
