0

I want to calculate the cumulative sum of values from different columns based on month.

Code:

import pandas as pd
import numpy as np

data = {'month': ['April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December', 'January', 'February', 'March'],
    'kpi': ['sales', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales', 'sales'],
    're': [10, 20, 30, 40, 50, 60, 70, 80, 90, 10, 10, 20],
    're3+9': [10, 20, 30, 40, 50, 60, 70, 80, 90, 10, 10, 20],
    're6+6': [10, 20, 30, 40, 50, 60, 70, 80, 90, 10, 10, 20],
    're9+3': [10, 20, 30, 40, 50, 60, 70, 80, 90, 10, 10, 20]
    }

# Create DataFrame
df = pd.DataFrame(data)
print(df)
df['re_final'] = np.where((df['month'] == 'April') | (df['month'] == 'May') | (df['month'] == 'June'),
                                    df.groupby(['kpi'], sort=False)['re'].cumsum(), df['re_final'])
df['re_final'] = np.where((df['month'] == 'July') | (df['month'] == 'August') | (df['month'] == 'September'),
                                    df.groupby(['kpi'], sort=False)['re'].cumsum(), df['re_final'])
df['re_final'] = np.where((df['month'] == 'October') | (df['month'] == 'November') | (df['month'] == 'December'),
                                    df.groupby(['kpi'], sort=False)['re'].cumsum(), df['re_final'])
df['re_final'] = np.where((df['month'] == 'January') | (df['month'] == 'February') | (df['month'] == 'March'),
                                    df.groupby(['kpi'], sort=False)['re'].cumsum(), df['re_final'])
print(df)

Here there is four columns named re,re3+9,re6+6,re9+3 for taking the cumulative sum values.I want to calculate cumulative sum based on the below conditions:

  1. If the months are April,May and June, cumulative sum will be taken from column re
  2. If the months are July,August and September , cumulative sum will be taken from re3+9
  3. If the months are October,November and December , cumulative sum will be taken from re6+6
  4. If the months are January,February and March,Cumulative sum will be taken from re9+3

My expected output should be like this:

       month             kpi  re  re3+9  re6+6  re9+3     re_final
0       April           sales  10     10     10     10     10
1         May           sales  20     20     20     20     30
2        June           sales  30     30     30     30     60
3        July           sales  40     40     40     40     100
4      August           sales  50     50     50     50     150
5   September           sales  60     60     60     60     210
6     October           sales  70     70     70     70     280
7    November           sales  80     80     80     80     360    
8    December           sales  90     90     90     90     450
9     January           sales  10     10     10     10     460
10   February           sales  10     10     10     10     470
11      March           sales  20     20     20     20     490

But when I use the above code, I got an error like below:

Traceback (most recent call last):
File "/home/anf/anaconda3/envs/pa/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
return self._engine.get_loc(casted_key)
File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in   pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 're_final'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/anf/pa/s.py", line 19, in <module>
    df.groupby(['kpi'], sort=False)['re'].cumsum(), df['re_final'])
  File "/home/anf/anaconda3/envs/pa/lib/python3.9/site-packages/pandas/core/frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/anf/anaconda3/envs/pa/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 're_final'

Can anyone suggest a solution to solve this issue?

2 Answers 2

1

You can create re_final column before your multiple np.where

df['re_final'] = -1

I would suggest you use np.select for multiple conditions and Series.isin for checking value in list, for example:

g = df.groupby(['kpi'], sort=False)

df['re_final'] = np.select(
    [(df['month'].isin(['April', 'May', 'June']),
     (df['month'].isin(['July', 'August', 'September'])),
     ], # conditions
    [g['re'].cumsum(),
     g['re+9'].cumsum()], # choices
    -1, # default value
)
Sign up to request clarification or add additional context in comments.

3 Comments

Is there closing bracket missing in (df['month'].isin(['April', 'May', 'June'])@Ynjxsjmh
Add ) at the end of (df['month'].isin(['April', 'May', 'June']) to avoid error
@AbinBenny Yes, didn't test this, thanks for pointing out.
0

I would use a dictionary to perform the month/column mapping, then use an indexing lookup combined with numpy.cumsum:

d = {'re9+3': ['January', 'February', 'March'],
     're':    ['April', 'May', 'June'],
     're3+9': ['July', 'August', 'September'],
     're6+6': ['October', 'November', 'December'],
    }

# reverse dictionary
d2 = {k:v for v,l in d.items() for k in l}

idx, cols = pd.factorize(df['month'].map(d2))
df['re_final'] = np.cumsum(df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx])

output:

        month    kpi  re  re3+9  re6+6  re9+3  re_final
0       April  sales  10     10     10     10        10
1         May  sales  20     20     20     20        30
2        June  sales  30     30     30     30        60
3        July  sales  40     40     40     40       100
4      August  sales  50     50     50     50       150
5   September  sales  60     60     60     60       210
6     October  sales  70     70     70     70       280
7    November  sales  80     80     80     80       360
8    December  sales  90     90     90     90       450
9     January  sales  10     10     10     10       460
10   February  sales  10     10     10     10       470
11      March  sales  20     20     20     20       490

2 Comments

How to perform groubby@mozway
@Abin what do you mean? There is no need for a groupby.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.