I'm working with a pandas dataframe and I'm trying to add a new column that sums the values of different rows. The sum must be based on the previous date (current month - 1, to be precise).

I have something like this:

Period  Value
2015-01 1
2015-09 2
2015-10 1
2015-11 3
2015-12 1

I would like to create a new column holding the sum of 'Value' for the current 'Period' and for the previous month ('Period' - 1 month), if that month exists. Example:

Period  Value Result
2015-01 1     1
2015-09 2     2
2015-10 1     3
2015-11 3     4
2015-12 1     4

I tried to use a lambda function with something like:

df['Result'] = df.apply(lambda x: df.loc[(df.Period <= x.Period) & 
                                         (x.Period >= df.Period-1),
                                         ['Value']].sum(), axis=1)

It was based on other answers, but I'm not sure whether it is the best approach or how to make it work (it does not raise any Python error, but it does not give my expected output either).

UPDATE

I'm testing @taras's answer on a simple example with three columns:

Account Period  Value
15035   2015-01 1
15035   2015-09 1
15035   2015-10 1

The expected result would be:

Account Period  Value
15035   2015-01 1
15035   2015-09 1
15035   2015-10 2

But I'm getting:

Account Period  Value
15035   2015-01 1
15035   2015-09 2
15035   2015-10 2

When inspecting

print(df.loc[df.index - 1, 'Value'].fillna(0).values)

I'm getting [ 0. 1. 1.] (it should be [ 0. 0. 1.]). By looking at

print(df.loc[df.index - 1, 'Period'].fillna(0).values)

I'm getting [0 Period('2015-01', 'M') Period('2015-09', 'M')] (which looks like the index is getting the value from the previous row, and not the previous month).

Am I doing something wrong?

  • what's the type of Period? string? Commented Jul 18, 2018 at 15:29
  • Period is a PeriodIndex, obtained by using the function dt.to_period("M") on the column (it was previously a datetime). Commented Jul 18, 2018 at 16:43

2 Answers

You can compute the index of rows for previous month with

idx = df.index - pd.DateOffset(months=1)

and then simply add it to your Value column

df.loc[idx, 'Value'].fillna(0).values + df['Value']

which results in

Period
2015-01-01    1.0
2015-09-01    2.0
2015-10-01    3.0
2015-11-01    4.0
2015-12-01    4.0
Name: Value, dtype: float64

Update: since you use pd.PeriodIndex rather than pd.DatetimeIndex, idx can be computed in a much simpler way:

idx = df.index - 1

because your period is 1 month.

So, to wrap up, the whole thing can be expressed in one quite simple expression:

df.loc[df.index - 1, 'Value'].fillna(0).values + df['Value']
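
The expression above only looks up the previous month when Period is the index of the dataframe. With Period as an ordinary column and a default integer index, df.index - 1 is simply the previous row label, which matches the behaviour reported in the question's update. A minimal sketch of the same idea, assuming Period is set as a monthly PeriodIndex; it uses reindex instead of .loc, because .loc raises a KeyError for missing labels in pandas 1.0 and later. The column name Result is taken from the question.

```python
import pandas as pd

# Sample data from the question, with Period as a monthly PeriodIndex.
# If Period is a regular column, call df.set_index("Period") first.
df = pd.DataFrame(
    {"Value": [1, 2, 1, 3, 1]},
    index=pd.PeriodIndex(
        ["2015-01", "2015-09", "2015-10", "2015-11", "2015-12"],
        freq="M",
        name="Period",
    ),
)

# df.index - 1 shifts every period back by one month.
# reindex returns NaN for months that are absent instead of raising.
prev = df["Value"].reindex(df.index - 1)

# Missing previous months contribute 0; to_numpy() avoids label alignment,
# since prev is labelled by the shifted periods.
df["Result"] = (df["Value"] + prev.fillna(0).to_numpy()).astype(int)

print(df)
```

Here 2015-09 keeps its own value because 2015-08 is not in the index, while 2015-10 picks up the value from 2015-09.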

7 Comments

The Period column is a PeriodIndex, obtained by using the function dt.to_period("M") on the column (it was previously a datetime). Is there any workaround using this type? Oh, and in your answer it seems to have a ")" not needed in the end
@LuizFGonçalves, I have updated the answer, so it properly handles PeriodIndex
@taras, I will try it to see if there are any other problems with it. Thanks
@taras I'm testing here and it seems your solution for PeriodIndex is not considering the previous month (current - 1), but the previous row. The second row is adding the value of the first one, but it should not (as 2015-09 should be related to 2015-08 and not 2015-01).
My complete data has one column before Period that is not important to the question itself (It is a value that repeats in all rows), but I'm afraid that maybe it could be why your solution is not working for me. Is this column a problem?

You can self-join on an auxiliary column that holds the previous month of each row:

import pandas as pd

# Previous month; subtracting 1 from a monthly Period shifts it back one month
df['prev'] = df.Period - 1
# Left self-join: each row picks up the row whose Period equals its 'prev', if any
aux = df.merge(df, how='left', left_on='prev', right_on='Period')
# Value_y is NaN when the previous month is absent, so count it as 0
df['sum'] = (aux.Value_x + aux.Value_y.fillna(0)).values
df = df.drop('prev', axis=1)
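
The question's update adds an Account column, so the join may also need to match on the account. The following is a sketch under that assumption, not part of the original answer: the names lookup and prev_value are hypothetical, and Period is assumed to be a column with monthly period dtype.

```python
import pandas as pd

# Three-column sample from the question's update.
df = pd.DataFrame(
    {
        "Account": [15035, 15035, 15035],
        "Period": pd.PeriodIndex(["2015-01", "2015-09", "2015-10"], freq="M"),
        "Value": [1, 1, 1],
    }
)

# Auxiliary column: the month before each row's Period.
df["prev"] = df["Period"] - 1

# Lookup table keyed by (Account, Period), renamed so that it joins on "prev".
lookup = df[["Account", "Period", "Value"]].rename(
    columns={"Period": "prev", "Value": "prev_value"}
)

# Left join keeps every original row; prev_value is NaN when the
# previous month does not exist for that account.
aux = df.merge(lookup, how="left", on=["Account", "prev"])

df["Result"] = (aux["Value"] + aux["prev_value"].fillna(0)).astype(int).to_numpy()
df = df.drop(columns="prev")

print(df)
```

Joining on both keys keeps values from one account out of another account's sum. The sketch assumes each (Account, Period) pair appears only once; duplicates would multiply rows in the join.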

3 Comments

I think your solution is going to work, but I have to adapt some parts of the code to make it work correctly with PeriodIndex (pd.DateOffset(months=1) did not work, for example). I'm doing it right now and then I'll update you on the result. Thanks
If you have a PeriodIndex you can replace the Period.apply to Period.to_timestamp() directly :)
I changed the lambda function to (lambda x: x - 1) and it seems to work well with PeriodIndex. I'm upvoting it and waiting for taras answer to decide on the best solution. Thanks a lot for your help.
