I have a pandas DataFrame like this example:
import pandas as pd

df = pd.DataFrame({
    'steps': ['step1', 'step2', 'step3', 'step4', 'step5'],
    'qty': [100, 95, 92, 87, 78]},
    index=[0, 1, 2, 3, 4])
I would like to calculate the percentage of abandonment at each step, expressed relative to the first value.
Expected output:
Steps   qty   Tx %
Step1   100   0,00%
Step2    95   5,00%
Step3    92   3,00%
Step4    87   5,00%
Step5    78   9,00%
I thought about using df['qty'].pct_change(), but it does not work as expected: it computes each change relative to the previous row, not relative to the first value.
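To illustrate the mismatch, here is a minimal sketch using the example data above. The variable names prev_based and explicit are only illustrative; the point is that pct_change() uses the previous row as the denominator.

```python
import pandas as pd

df = pd.DataFrame({
    'steps': ['step1', 'step2', 'step3', 'step4', 'step5'],
    'qty': [100, 95, 92, 87, 78],
})

# pct_change() divides each row-to-row difference by the PREVIOUS row's
# value, so the denominator changes at every step (100, 95, 92, 87).
prev_based = df['qty'].pct_change()

# The same quantity written out explicitly, for comparison.
explicit = df['qty'].diff() / df['qty'].shift(1)

print(prev_based)
```

For step3 this gives (92 - 95) / 95 rather than the (92 - 95) / 100 that the expected output is based on, and the first row is NaN instead of 0.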
The manual calculation would be something like:
values = [
    (1 - df['qty'][0] / df['qty'][0]) - (1 - df['qty'][0] / df['qty'][0]),
    (1 - df['qty'][1] / df['qty'][0]) - (1 - df['qty'][0] / df['qty'][0]),
    (1 - df['qty'][2] / df['qty'][0]) - (1 - df['qty'][1] / df['qty'][0]),
    (1 - df['qty'][3] / df['qty'][0]) - (1 - df['qty'][2] / df['qty'][0]),
    (1 - df['qty'][4] / df['qty'][0]) - (1 - df['qty'][3] / df['qty'][0]),
]
However, writing each term by hand does not scale as the number of steps or time periods grows.
Could someone suggest a function, or explain an approach, that makes this calculation simpler?
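For reference, each term of the manual calculation reduces algebraically to (qty[i-1] - qty[i]) / qty[0]. Assuming that simplification is what is wanted, a vectorized sketch might look like the following. The column name tx_pct and the variables first and drop are hypothetical, chosen only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'steps': ['step1', 'step2', 'step3', 'step4', 'step5'],
    'qty': [100, 95, 92, 87, 78],
})

# Denominator for every step: the first value (assumed to be the baseline).
first = df['qty'].iloc[0]

# Units lost between consecutive steps. The first row has no predecessor,
# so shift(1) yields NaN there; it is replaced with 0.
drop = (df['qty'].shift(1) - df['qty']).fillna(0)

# Percentage of the baseline lost at each step (hypothetical column name).
df['tx_pct'] = drop * 100 / first

print(df)
```

With the example data, tx_pct comes out as 0, 5, 3, 5, 9, matching the expected output. Formatting the values with a decimal comma and a percent sign would be a separate display step.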