Pandas Sum values from different columns based on dates

Question

I'm working with a pandas DataFrame and I'm trying to add a new column that sums values from different rows. For each row, the sum must cover the current month and the previous month (current month - 1, to be precise).

I have something like this:

Period  Value
2015-01 1
2015-09 2
2015-10 1
2015-11 3
2015-12 1

I would like to create a new column containing the sum of 'Value' for the current 'Period' and for ('Period' - 1 month), if that month exists. Example:

Period  Value Result
2015-01 1     1
2015-09 2     2
2015-10 1     3
2015-11 3     4
2015-12 1     4

I tried to use a lambda function with something like:

df['Result'] = df.apply(lambda x: df.loc[(df.Period <= x.Period) & 
                                         (x.Period >= df.Period-1),
                                         ['Value']].sum(), axis=1)

It was based on other answers, but I'm not sure whether it is the best way to do this or how to make it work. It does not raise any Python error, but it does not give my expected output either.

UPDATE

I'm testing @taras's answer on a simple example with three columns:

Account Period  Value
15035   2015-01 1
15035   2015-09 1
15035   2015-10 1

The expected result would be:

Account Period  Value
15035   2015-01 1
15035   2015-09 1
15035   2015-10 2

But I'm getting:

Account Period  Value
15035   2015-01 1
15035   2015-09 2
15035   2015-10 2

When inspecting

print(df.loc[df.index - 1, 'Value'].fillna(0).values)

I'm getting [ 0. 1. 1.] (it should be [ 0. 0. 1.]). By looking at

print(df.loc[df.index - 1, 'Period'].fillna(0).values)

I'm getting [0 Period('2015-01', 'M') Period('2015-09', 'M')] (which looks like the index is getting the value from the previous row, and not the previous month).

Am I doing something wrong?

Period is a PeriodIndex, obtained by using the function dt.to_period("M") on the column (it was previously a datetime).
– LuizF Gonçalves, Commented Jul 18, 2018 at 16:43

taras · Accepted Answer · 2018-07-18 17:41:00Z

2

You can compute the index labels of the previous month's rows with

idx = df.index - pd.DateOffset(months=1)

and then simply add it to your Value column

df.loc[idx, 'Value'].fillna(0).values + df['Value']

which results in

Period
2015-01-01    1.0
2015-09-01    2.0
2015-10-01    3.0
2015-11-01    4.0
2015-12-01    4.0
Name: Value, dtype: float64
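For readers on a recent pandas release, here is a minimal self-contained sketch of the same idea. It assumes the frame is indexed by month-start timestamps, as in the output above, and it swaps df.loc for Series.reindex, because .loc raises a KeyError in current pandas when some of the requested labels (here 2014-12 and 2015-08) are missing. The sample frame and the Result column name are taken from the question; this is an illustration, not the answer's exact code.

```python
import pandas as pd

# Sample data from the question, indexed by month-start timestamps.
df = pd.DataFrame(
    {"Value": [1, 2, 1, 3, 1]},
    index=pd.to_datetime(
        ["2015-01-01", "2015-09-01", "2015-10-01", "2015-11-01", "2015-12-01"]
    ),
)
df.index.name = "Period"

# Labels of the previous month for every row.
idx = df.index - pd.DateOffset(months=1)

# reindex returns NaN for labels that are absent from the index,
# which fillna(0) then turns into "nothing to add".
prev = df["Value"].reindex(idx).fillna(0).to_numpy()

df["Result"] = df["Value"] + prev
print(df)
```

The positional NumPy array is used on purpose: the looked-up values carry the shifted labels, so adding them as a Series would align on the wrong index.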

Update: since you use pd.PeriodIndex rather than pd.DatetimeIndex, idx can be computed in a much simpler way:

idx = df.index - 1

because your period is 1 month.

So, to wrap up, the whole thing can be expressed in one quite simple expression:

df.loc[df.index - 1, 'Value'].fillna(0).values + df['Value']
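One caveat: df.index - 1 means "previous month" only when Period is actually the index of the frame. If Period is an ordinary column and the frame still has its default integer index, df.index - 1 addresses the previous row instead, which appears to be what the update in the question shows. The sketch below assumes the three-row example from that update, with Period as a column and one row per period; it moves Period into the index first and uses reindex so that missing months become NaN rather than an error. The names by_period and Result are illustrative.

```python
import pandas as pd

# The three-row example from the question's update; Period starts as a column.
df = pd.DataFrame(
    {
        "Account": [15035, 15035, 15035],
        "Period": pd.PeriodIndex(["2015-01", "2015-09", "2015-10"], freq="M"),
        "Value": [1, 1, 1],
    }
)

# Make Period the index so that "index - 1" means "previous month".
by_period = df.set_index("Period")["Value"]

# Look up each previous month; absent months come back as NaN, then 0.
prev = by_period.reindex(by_period.index - 1).fillna(0).to_numpy()

df["Result"] = df["Value"].to_numpy() + prev
print(df)
```

With several accounts in the same frame the periods would no longer be unique, so the lookup would have to be done per account (for example inside a groupby); that case is not covered here.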

edited Jul 18, 2018 at 17:41

answered Jul 18, 2018 at 15:33

taras


7 Comments

LuizF Gonçalves Over a year ago

The Period column is a PeriodIndex, obtained by using the function dt.to_period("M") on the column (it was previously a datetime). Is there any workaround using this type? Oh, and in your answer it seems to have a ")" not needed in the end

taras Over a year ago

@LuizFGonçalves, I have updated the answer, so it properly handles PeriodIndex

LuizF Gonçalves Over a year ago

@taras, I will try it to see if there are any other problems with it. Thanks

LuizF Gonçalves Over a year ago

@taras I'm testing here and it seems your solution for PeriodIndex is not considering the previous month (current - 1), but the previous row. The second row is adding the value of the first one, but it should not (as 2015-09 should be related to 2015-08 and not 2015-01).

LuizF Gonçalves Over a year ago

My complete data has one column before Period that is not important to the question itself (It is a value that repeats in all rows), but I'm afraid that maybe it could be why your solution is not working for me. Is this column a problem?


Yuca · Accepted Answer · 2018-07-18 17:24:45Z

1

You can self-join on an auxiliary column that holds the previous month for each row:

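import pandas as pd

# Previous month for each row, converted back to a monthly Period so it matches the dtype of 'Period'
df['prev'] = (df.Period.apply(lambda x: x.to_timestamp()) - pd.DateOffset(months=1)).dt.to_period('M')
# Left self-join: pair each row with the row for its previous month, if one exists
aux = df.merge(df, how='left', left_on='prev', right_on='Period')
# Value_y is NaN when the previous month is missing, so count it as 0
df['sum'] = (aux.Value_x + aux.Value_y.fillna(0)).values
df = df.drop('prev', axis=1)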
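As a self-contained illustration of the self-join, the sketch below rebuilds the sample data from the question and computes the previous month by subtracting 1 from the Period column, the variant the asker reports using in the comments. It assumes Period is a period[M] column with one row per month; the suffix _prev and the trimmed right-hand frame are choices made for this example only.

```python
import pandas as pd

# Sample data from the question, with Period as a monthly period column.
df = pd.DataFrame(
    {
        "Period": pd.PeriodIndex(
            ["2015-01", "2015-09", "2015-10", "2015-11", "2015-12"], freq="M"
        ),
        "Value": [1, 2, 1, 3, 1],
    }
)

# Previous month as a Period, so it has the same dtype as the join key.
df["prev"] = df["Period"] - 1

# Left self-join: each row is matched with the row for its previous month, if any.
aux = df.merge(
    df[["Period", "Value"]],
    how="left",
    left_on="prev",
    right_on="Period",
    suffixes=("", "_prev"),
)

# Value_prev is NaN where the previous month is absent; count it as 0.
df["sum"] = (aux["Value"] + aux["Value_prev"].fillna(0)).to_numpy()
df = df.drop(columns="prev")
print(df)
```

A left join keeps the row order of the left frame, which is why the result can be assigned back positionally.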

edited Jul 18, 2018 at 17:24

answered Jul 18, 2018 at 15:40

Yuca


3 Comments

LuizF Gonçalves Over a year ago

I think your solution is going to work, but I have to adapt some parts of the code to make it work correctly with PeriodIndex (pd.DateOffset(months=1) did not work, for example). I'm doing it right now and then I'll update you on the result. Thanks

Yuca Over a year ago

If you have a PeriodIndex you can replace Period.apply with Period.to_timestamp() directly :)

LuizF Gonçalves Over a year ago

I changed the lambda function to (lambda x: x - 1) and it seems to work well with PeriodIndex. I'm upvoting it and waiting for taras answer to decide on the best solution. Thanks a lot for your help.
