I want to calculate the cumulative sum of values from different columns based on month.
Code:
import pandas as pd
import numpy as np
data = {'month': ['April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'January', 'February', 'March'],
'kpi': ['sales', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales'],
're': [10, 20, 30, 40, 50, 60, 70, 80, 90, 10, 10, 20],
're3+9': [10, 20, 30, 40, 50, 60, 70, 80, 90, 10, 10, 20],
're6+6': [10, 20, 30, 40, 50, 60, 70, 80, 90, 10, 10, 20],
're9+3': [10, 20, 30, 40, 50, 60, 70, 80, 90, 10, 10, 20]
}
# Create DataFrame
df = pd.DataFrame(data)
print(df)
df['re_final'] = np.where((df['month'] == 'April') | (df['month'] == 'May') | (df['month'] == 'June'),
df.groupby(['kpi'], sort=False)['re'].cumsum(), df['re_final'])
df['re_final'] = np.where((df['month'] == 'July') | (df['month'] == 'August') | (df['month'] == 'September'),
df.groupby(['kpi'], sort=False)['re'].cumsum(), df['re_final'])
df['re_final'] = np.where((df['month'] == 'October') | (df['month'] == 'November') | (df['month'] == 'December'),
df.groupby(['kpi'], sort=False)['re'].cumsum(), df['re_final'])
df['re_final'] = np.where((df['month'] == 'January') | (df['month'] == 'February') | (df['month'] == 'March'),
df.groupby(['kpi'], sort=False)['re'].cumsum(), df['re_final'])
print(df)
Here there is four columns named re,re3+9,re6+6,re9+3 for taking the cumulative sum values.I want to calculate cumulative sum based on the below conditions:
- If the months are April,May and June, cumulative sum will be taken from column
re - If the months are July,August and September , cumulative sum will be taken from
re3+9 - If the months are October,November and December , cumulative sum will be taken from
re6+6 - If the months are January,February and March,Cumulative sum will be taken from
re9+3
My expected output should be like this:
month kpi re re3+9 re6+6 re9+3 re_final
0 April sales 10 10 10 10 10
1 May sales 20 20 20 20 30
2 June sales 30 30 30 30 60
3 July sales 40 40 40 40 100
4 August sales 50 50 50 50 150
5 September sales 60 60 60 60 210
6 October sales 70 70 70 70 280
7 November sales 80 80 80 80 360
8 December sales 90 90 90 90 450
9 January sales 10 10 10 10 460
10 February sales 10 10 10 10 470
11 March sales 20 20 20 20 490
But when I use the above code, I got an error like below:
Traceback (most recent call last):
File "/home/anf/anaconda3/envs/pa/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 're_final'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/anf/pa/s.py", line 19, in <module>
df.groupby(['kpi'], sort=False)['re'].cumsum(), df['re_final'])
File "/home/anf/anaconda3/envs/pa/lib/python3.9/site-packages/pandas/core/frame.py", line 3505, in __getitem__
indexer = self.columns.get_loc(key)
File "/home/anf/anaconda3/envs/pa/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
raise KeyError(key) from err
KeyError: 're_final'
Can anyone suggest a solution to solve this issue?