Is there a Python function to calculate the diff based on the first element?

Question

I have a pandas dataframe, like this example:

df = pd.DataFrame({
    'steps': ['step1','step2', 'step3', 'step4','step5'],
    'qty': [100, 95, 92, 87, 78]},
    index=[0,1,2,3,4])

I would like to calculate the percentage of abandonments in each step, based on the first value.

Output:

Steps   qty Tx %
Step1   100 0,00%
Step2   95  5,00%
Step3   92  3,00%
Step4   87  5,00%
Step5   78  9,00%

I thought about using Series.pct_change(), but it doesn't give what I expect, since it divides each change by the previous value rather than by the first one.
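
For context, here is a minimal sketch of the difference, using the example data from this question. The names vs_previous and vs_first are only illustrative. The two columns agree on step2 only because its previous value is also the first value.

```python
import pandas as pd

df = pd.DataFrame({
    'steps': ['step1', 'step2', 'step3', 'step4', 'step5'],
    'qty': [100, 95, 92, 87, 78],
})

# pct_change divides each difference by the PREVIOUS row's value
vs_previous = df['qty'].pct_change() * 100

# the figure asked for divides each difference by the FIRST row's value
vs_first = df['qty'].diff() / df['qty'].iloc[0] * 100

# both are negative here because qty decreases from step to step
print(pd.DataFrame({'vs_previous': vs_previous, 'vs_first': vs_first}).round(2))
```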

The manual calculation would be something like:


values = [(1 - df['qty'][0]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
          (1 - df['qty'][1]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
          (1 - df['qty'][2]/df['qty'][0]) - (1-df['qty'][1]/df['qty'][0]),
          (1 - df['qty'][3]/df['qty'][0]) - (1-df['qty'][2]/df['qty'][0]),
          (1 - df['qty'][4]/df['qty'][0]) - (1-df['qty'][3]/df['qty'][0])]

However, this does not scale as the number of steps or time periods grows.

Is there a function, or a general approach, that makes this calculation simpler?

jfaccioni · Accepted Answer · 2022-05-16 19:15:14Z

You can use pd.Series.shift to shift the qty column one element down, then subtract the original column from the shifted one. The first row has no previous value, so its result is NaN:

import pandas as pd

df = pd.DataFrame({
    'steps': ['step1','step2', 'step3', 'step4','step5'],
    'qty': [100, 95, 92, 87, 78],
})

df['Tx %'] = df.qty.shift() - df.qty
print(df)

# output:
#    steps  qty  Tx %
# 0  step1  100   NaN
# 1  step2   95   5.0
# 2  step3   92   3.0
# 3  step4   87   5.0
# 4  step5   78   9.0
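
As a possible variation (not part of the original answer), the same drop can be written with Series.diff, which computes qty - qty.shift(). The sketch below also divides by the first value, so the result is a percentage even when the first quantity is not 100, and fills the first row with 0 to match the output in the question. The name drop is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'steps': ['step1', 'step2', 'step3', 'step4', 'step5'],
    'qty': [100, 95, 92, 87, 78],
})

# diff() returns qty - qty.shift(), so negating it gives the drop per step
drop = -df['qty'].diff()

# divide by the first value to express each drop as a percentage of the start;
# the first row has no previous value, so fill its NaN with 0
df['Tx %'] = (drop / df['qty'].iloc[0] * 100).fillna(0)
print(df)
```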

3 Comments

nxht Over a year ago

I think you should add / df.qty[0] to the calculation

jfaccioni Over a year ago

@NoThlnG this will depend on whether OP wants to use percentage as a number (5 as in 5%) or as a fraction (0.05) (the question is not very clear regarding this).

Ana Beatriz Over a year ago

This works! df['Tx %'] = round(((1 - df.qty / df.qty[0]) - (1- df.qty.shift() / df.qty[0]))*100, 2)

cottontail · Accepted Answer · 2022-05-16 19:30:31Z

This works:

# you have
values = [(1 - df['qty'][0]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
          (1 - df['qty'][1]/df['qty'][0]) - (1-df['qty'][0]/df['qty'][0]),
          (1 - df['qty'][2]/df['qty'][0]) - (1-df['qty'][1]/df['qty'][0]),
          (1 - df['qty'][3]/df['qty'][0]) - (1-df['qty'][2]/df['qty'][0]),
          (1 - df['qty'][4]/df['qty'][0]) - (1-df['qty'][3]/df['qty'][0])]

# 1s cancel out, so the above is equivalent to
values = [( - df['qty'][0]/df['qty'][0]) + (df['qty'][0]/df['qty'][0]),
          ( - df['qty'][1]/df['qty'][0]) + (df['qty'][0]/df['qty'][0]),
          ( - df['qty'][2]/df['qty'][0]) + (df['qty'][1]/df['qty'][0]),
          ( - df['qty'][3]/df['qty'][0]) + (df['qty'][2]/df['qty'][0]),
          ( - df['qty'][4]/df['qty'][0]) + (df['qty'][3]/df['qty'][0])]

# since every element is divided by df['qty'][0], you can take it out, so the above is equivalent to
values = [( - df['qty'][0] + df['qty'][0]) / df['qty'][0],
          ( - df['qty'][1] + df['qty'][0]) / df['qty'][0],
          ( - df['qty'][2] + df['qty'][1]) / df['qty'][0],
          ( - df['qty'][3] + df['qty'][2]) / df['qty'][0],
          ( - df['qty'][4] + df['qty'][3]) / df['qty'][0]]

# in each row, the second term is the first term's column shifted one row down, so it's equivalent to:
values = (- df['qty'] + df['qty'].shift()).fillna(0) / df['qty'][0]

# in summary
df['Tx %'] = (df['qty'].shift() - df['qty']).fillna(0) / df['qty'][0] * 100
print(df)

# output:
#    steps  qty  Tx %
# 0  step1  100   0.0
# 1  step2   95   5.0
# 2  step3   92   3.0
# 3  step4   87   5.0
# 4  step5   78   9.0
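
To make this reusable for any number of steps, the one-liner can be wrapped in a small function. This is only a sketch: dropoff_pct is a hypothetical helper name, not a pandas function. It uses .iloc[0] instead of [0] because df['qty'][0] looks up the label 0, so it depends on the index containing that label, while .iloc[0] always takes the first row.

```python
import pandas as pd

def dropoff_pct(qty: pd.Series) -> pd.Series:
    """Per-step drop as a percentage of the first value (hypothetical helper)."""
    first = qty.iloc[0]  # positional, so it works for any index labels
    return (qty.shift() - qty).fillna(0) / first * 100

# same numbers as the question, but with a non-default index
qty = pd.Series([100, 95, 92, 87, 78], index=list('abcde'))
result = dropoff_pct(qty)
print(result)

# sanity check: the per-step drops add up to the total drop
# relative to the first value (up to floating-point rounding)
total = (1 - qty.iloc[-1] / qty.iloc[0]) * 100
print(result.sum(), total)
```

With a DataFrame, this could be applied as df['Tx %'] = dropoff_pct(df['qty']).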

Ana Beatriz · Accepted Answer · 2022-05-16 19:32:14Z

This answered my question:

df['Tx %'] = round(((1 - df.qty / df.qty[0]) - (1- df.qty.shift() / df.qty[0]))*100, 2)

Thanks to @jfaccioni
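
As a quick check, the sketch below compares this longer expression with the shorter shifted difference, using the example data from the question. The names first, long_form and short_form are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'steps': ['step1', 'step2', 'step3', 'step4', 'step5'],
    'qty': [100, 95, 92, 87, 78],
})

first = df['qty'].iloc[0]

# the long form used in this answer
long_form = ((1 - df['qty'] / first) - (1 - df['qty'].shift() / first)) * 100

# the simplified form: the 1s cancel, leaving a shifted difference
short_form = (df['qty'].shift() - df['qty']) / first * 100

# both leave NaN in the first row, since shift() has no previous value there
print(pd.DataFrame({'long': long_form.round(2), 'short': short_form.round(2)}))
```

If the first row should read 0 rather than NaN, as in the expected output of the question, .fillna(0) can be appended to either form.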
