How to calculate cumulative sum from different columns based on months using pandas dataframe?

Question

I want to calculate the cumulative sum of values from different columns based on month.

Code:

import pandas as pd
import numpy as np

data = {'month': ['April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'January', 'February', 'March'],
    'kpi': ['sales', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales'],
    're': [10, 20, 30, 40, 50, 60, 70, 80, 90, 10, 10, 20],
    're3+9': [10, 20, 30, 40, 50, 60, 70, 80, 90, 10, 10, 20],
    're6+6': [10, 20, 30, 40, 50, 60, 70, 80, 90, 10, 10, 20],
    're9+3': [10, 20, 30, 40, 50, 60, 70, 80, 90, 10, 10, 20]
    }

# Create DataFrame
df = pd.DataFrame(data)
print(df)
df['re_final'] = np.where((df['month'] == 'April') | (df['month'] == 'May') | (df['month'] == 'June'),
                                    df.groupby(['kpi'], sort=False)['re'].cumsum(), df['re_final'])
df['re_final'] = np.where((df['month'] == 'July') | (df['month'] == 'August') | (df['month'] == 'September'),
                                    df.groupby(['kpi'], sort=False)['re'].cumsum(), df['re_final'])
df['re_final'] = np.where((df['month'] == 'October') | (df['month'] == 'November') | (df['month'] == 'December'),
                                    df.groupby(['kpi'], sort=False)['re'].cumsum(), df['re_final'])
df['re_final'] = np.where((df['month'] == 'January') | (df['month'] == 'February') | (df['month'] == 'March'),
                                    df.groupby(['kpi'], sort=False)['re'].cumsum(), df['re_final'])
print(df)

Here there is four columns named re,re3+9,re6+6,re9+3 for taking the cumulative sum values.I want to calculate cumulative sum based on the below conditions:

If the months are April,May and June, cumulative sum will be taken from column re
If the months are July,August and September , cumulative sum will be taken from re3+9
If the months are October,November and December , cumulative sum will be taken from re6+6
If the months are January,February and March,Cumulative sum will be taken from re9+3

My expected output should be like this:

       month             kpi  re  re3+9  re6+6  re9+3     re_final
0       April           sales  10     10     10     10     10
1         May           sales  20     20     20     20     30
2        June           sales  30     30     30     30     60
3        July           sales  40     40     40     40     100
4      August           sales  50     50     50     50     150
5   September           sales  60     60     60     60     210
6     October           sales  70     70     70     70     280
7    November           sales  80     80     80     80     360    
8    December           sales  90     90     90     90     450
9     January           sales  10     10     10     10     460
10   February           sales  10     10     10     10     470
11      March           sales  20     20     20     20     490

But when I use the above code, I got an error like below:

Traceback (most recent call last):
File "/home/anf/anaconda3/envs/pa/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in   pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 're_final'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/anf/pa/s.py", line 19, in <module>
    df.groupby(['kpi'], sort=False)['re'].cumsum(), df['re_final'])
  File "/home/anf/anaconda3/envs/pa/lib/python3.9/site-packages/pandas/core/frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/anf/anaconda3/envs/pa/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 're_final'

Can anyone suggest a solution to solve this issue?

Ynjxsjmh · Accepted Answer · 2022-09-27 10:49:34Z

1

You can create re_final column before your multiple np.where

df['re_final'] = -1

I would suggest you use np.select for multiple conditions and Series.isin for checking value in list, for example:

g = df.groupby(['kpi'], sort=False)

df['re_final'] = np.select(
    [(df['month'].isin(['April', 'May', 'June']),
     (df['month'].isin(['July', 'August', 'September'])),
     ], # conditions
    [g['re'].cumsum(),
     g['re+9'].cumsum()], # choices
    -1, # default value
)

edited Sep 27, 2022 at 10:49

answered Sep 27, 2022 at 9:50

Ynjxsjmh

30.3k7 gold badges43 silver badges64 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

Abin Benny Over a year ago

Is there closing bracket missing in (df['month'].isin(['April', 'May', 'June'])@Ynjxsjmh

Abin Benny Over a year ago

Add ) at the end of (df['month'].isin(['April', 'May', 'June']) to avoid error

Ynjxsjmh Over a year ago

@AbinBenny Yes, didn't test this, thanks for pointing out.

mozway · Accepted Answer · 2022-09-27 10:02:36Z

0

I would use a dictionary to perform the month/column mapping, then use an indexing lookup combined with numpy.cumsum:

d = {'re9+3': ['January', 'February', 'March'],
     're':    ['April', 'May', 'June'],
     're3+9': ['July', 'August', 'September'],
     're6+6': ['October', 'November', 'December'],
    }

# reverse dictionary
d2 = {k:v for v,l in d.items() for k in l}

idx, cols = pd.factorize(df['month'].map(d2))
df['re_final'] = np.cumsum(df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx])

output:

        month    kpi  re  re3+9  re6+6  re9+3  re_final
0       April  sales  10     10     10     10        10
1         May  sales  20     20     20     20        30
2        June  sales  30     30     30     30        60
3        July  sales  40     40     40     40       100
4      August  sales  50     50     50     50       150
5   September  sales  60     60     60     60       210
6     October  sales  70     70     70     70       280
7    November  sales  80     80     80     80       360
8    December  sales  90     90     90     90       450
9     January  sales  10     10     10     10       460
10   February  sales  10     10     10     10       470
11      March  sales  20     20     20     20       490

answered Sep 27, 2022 at 10:02

mozway

267k13 gold badges56 silver badges106 bronze badges

2 Comments

Abin Benny Over a year ago

How to perform groubby@mozway

mozway Over a year ago

@Abin what do you mean? There is no need for a groupby.

Collectives™ on Stack Overflow

How to calculate cumulative sum from different columns based on months using pandas dataframe?

2 Answers 2

3 Comments

2 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

2 Answers 2

3 Comments

2 Comments

Your Answer

Sign up or log in

Post as a guest

Related