Build rows in Python Dataframe, based on values in previous column

Question

My input looks like this:

import datetime as dt
import pandas as pd

some_money = [34,42,300,450,550]
df = pd.DataFrame({'TIME': ['2020-01', '2019-12', '2019-11', '2019-10', '2019-09'],
                   'MONEY': some_money})
df

Producing the following:

I want to add 3 more columns holding the MONEY value from 1, 2 and 3 months earlier, like this (color coding for illustrative purposes):

This is what I have tried:

prev_period_money = ["m-1", "m-2", "m-3"]
for m in prev_period_money:
    df[m] = df["MONEY"] - 10 #well, it "works", but it gives df["MONEY"]- 10...

The TIME column is already sorted, so it can be ignored. (It would be great, though, if someone could show the "magic" of deriving the previous months from TIME itself.)

jezrael · Accepted Answer · 2020-02-05 09:52:35Z

2

With pandas 0.24+, pass fill_value=0 to Series.shift; the new columns then keep their integer dtype:

for x in range(1,4):
    df[f"m-{x}"] = df["MONEY"].shift(periods=-x, fill_value=0)

print (df)
      TIME  MONEY  m-1  m-2  m-3
0  2020-01     34   42  300  450
1  2019-12     42  300  450  550
2  2019-11    300  450  550    0
3  2019-10    450  550    0    0
4  2019-09    550    0    0    0

For pandas below 0.24, it is necessary to replace the missing values and convert to integers:

for x in range(1,4):
    df[f"m-{x}"] = df["MONEY"].shift(periods=-x).fillna(0).astype(int)
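To make the dtype point concrete, here is a minimal sketch comparing the variants on a single shift. The variable names are made up for illustration, and the int64 result assumes pandas 0.24 or newer.

```python
import pandas as pd

money = pd.Series([34, 42, 300, 450, 550], name="MONEY")

# Without fill_value the vacated slot becomes NaN, which forces float64.
shifted_nan = money.shift(-1)

# With fill_value=0 (pandas 0.24+) the integer dtype is preserved.
shifted_zero = money.shift(-1, fill_value=0)

# The older two-step route yields the same values with an integer dtype.
shifted_legacy = money.shift(-1).fillna(0).astype(int)

print(shifted_nan.dtype)   # float64
print(shifted_zero.dtype)  # int64
```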

edited Feb 5, 2020 at 9:52

answered Feb 5, 2020 at 9:46

jezrael

2 Comments

sammywemmy Over a year ago

much shorter syntax.

Vityata Over a year ago

The fillna(0) per column is actually quite important, otherwise the df = df.fillna(0) may "pollute" the whole table.
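A small sketch of what that "pollution" can look like. The FEE column is hypothetical and is not part of the question's data; it stands in for any other column where a missing value is meaningful.

```python
import pandas as pd

df = pd.DataFrame({
    "TIME": ["2020-01", "2019-12", "2019-11"],
    "MONEY": [34, 42, 300],
    "FEE": [1.5, float("nan"), 2.0],  # hypothetical column with a genuine gap
})

# Per-column fill: only the new column is touched, the gap in FEE survives.
per_column = df.copy()
per_column["m-1"] = per_column["MONEY"].shift(-1).fillna(0)

# Whole-frame fill: the unknown FEE silently becomes 0.0 as well.
whole_frame = df.copy()
whole_frame["m-1"] = whole_frame["MONEY"].shift(-1)
whole_frame = whole_frame.fillna(0)

print(per_column["FEE"].isna().sum())   # 1
print(whole_frame["FEE"].isna().sum())  # 0
```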

Boendal · Accepted Answer · 2020-02-05 09:43:51Z

1

This is quite easy with Series.shift.

The following gives the desired output (after df.fillna(0) the new columns are floats):

df["m-1"] = df["MONEY"].shift(periods=-1)
df["m-2"] = df["MONEY"].shift(periods=-2)
df["m-3"] = df["MONEY"].shift(periods=-3)
df = df.fillna(0)

This works only if the rows are already ordered by TIME. Otherwise, sort them first.
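If the rows are not ordered, one possible way to sort them before shifting is sketched below. It assumes TIME always uses the 'YYYY-MM' format, which sorts chronologically as plain text; the shuffled input is made up for illustration.

```python
import pandas as pd

# Same values as in the question, but deliberately shuffled.
df = pd.DataFrame({
    "TIME": ["2019-11", "2020-01", "2019-09", "2019-12", "2019-10"],
    "MONEY": [300, 34, 550, 42, 450],
})

# Newest month first, then renumber the rows from 0.
df = df.sort_values("TIME", ascending=False).reset_index(drop=True)

for x in range(1, 4):
    df[f"m-{x}"] = df["MONEY"].shift(periods=-x, fill_value=0)

print(df)
```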

answered Feb 5, 2020 at 9:43

Boendal

2 Comments

Vityata Over a year ago

If you knew that "shift" exists... I was expecting "Offset" as a term. :) Thanks :)

Boendal Over a year ago

@Vityata that's true. Knowledge is power. Btw instead of three lines you can also integrate that in your for loop.
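On the "offset" idea and the question's wish to read the months from TIME itself: a minimal sketch, assuming TIME holds unique 'YYYY-MM' strings. It parses them into monthly periods and looks up MONEY by calendar month rather than by row position, so row order and gaps between months do not matter. The names periods and money_by_period are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "TIME": ["2020-01", "2019-12", "2019-11", "2019-10", "2019-09"],
    "MONEY": [34, 42, 300, 450, 550],
})

# Parse the strings into monthly periods so "k months earlier" is calendar arithmetic.
periods = pd.PeriodIndex(df["TIME"], freq="M")

# Lookup table: month -> MONEY (requires each month to appear only once).
money_by_period = pd.Series(df["MONEY"].to_numpy(), index=periods)

for k in range(1, 4):
    # periods - k steps every month back by k; months absent from the table get 0.
    df[f"m-{k}"] = money_by_period.reindex(periods - k, fill_value=0).to_numpy()

print(df)
```

For sorted data without gaps this gives the same numbers as shift. When a month is missing from the table, the lookup returns 0 for it instead of silently taking the value from the next available row.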

sammywemmy · Accepted Answer · 2020-02-05 09:46:10Z

1

My suggestion: use a list comprehension with the shift function to get your three columns, concatenate them along the columns, and then concatenate the result with the original dataframe.

pd.concat(
    [df, pd.concat([df.MONEY.shift(-i) for i in range(1, 4)], axis=1)],
    axis=1,
).fillna(0)


      TIME  MONEY  MONEY  MONEY  MONEY
0  2020-01     34   42.0  300.0  450.0
1  2019-12     42  300.0  450.0  550.0
2  2019-11    300  450.0  550.0    0.0
3  2019-10    450  550.0    0.0    0.0
4  2019-09    550    0.0    0.0    0.0
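The output above ends up with several columns all called MONEY. One possible way around that, sketched here as a variation rather than as the answer's own code, is to rename each shifted Series before concatenating. The names shifted and result are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "TIME": ["2020-01", "2019-12", "2019-11", "2019-10", "2019-09"],
    "MONEY": [34, 42, 300, 450, 550],
})

# Give every shifted Series its own name so the concatenated columns are distinct.
shifted = [df["MONEY"].shift(-i).rename(f"m-{i}") for i in range(1, 4)]

# A single concat along the columns is enough once the pieces are named.
result = pd.concat([df, *shifted], axis=1).fillna(0)

print(result.columns.tolist())  # ['TIME', 'MONEY', 'm-1', 'm-2', 'm-3']
```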

edited Feb 5, 2020 at 9:46

answered Feb 5, 2020 at 9:43

sammywemmy

Zaraki Kenpachi · Accepted Answer · 2020-02-05 10:01:09Z

1

import pandas as pd

some_money = [34,42,300,450,550]

df = pd.DataFrame({'TIME': ['2020-01', '2019-12', '2019-11', '2019-10', '2019-09'], 'MONEY':some_money})

prev_period_money = ["m-1", "m-2", "m-3"]
for count, m in enumerate(prev_period_money, start=1):
    # drop the first `count` rows, then renumber from 0 so the values move up on assignment
    df[m] = df['MONEY'].iloc[count:].reset_index(drop=True)

# rows without an earlier month are NaN after the assignment; replace them with 0
df = df.fillna(0)

Output:

      TIME  MONEY    m-1    m-2    m-3
0  2020-01     34   42.0  300.0  450.0
1  2019-12     42  300.0  450.0  550.0
2  2019-11    300  450.0  550.0    0.0
3  2019-10    450  550.0    0.0    0.0
4  2019-09    550    0.0    0.0    0.0

edited Feb 5, 2020 at 10:01

answered Feb 5, 2020 at 9:52

Zaraki Kenpachi

2 Comments

Vityata Over a year ago

What is the idea behind dropping and resetting the index?

Zaraki Kenpachi Over a year ago

@Vityata iloc selects the rows but keeps their original index labels, so the index has to be reset before assigning.
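A short sketch of why the reset matters: pandas aligns on index labels when a Series is assigned as a column, so the slice has to be renumbered for the values to move up. The column names aligned and shifted are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"MONEY": [34, 42, 300, 450, 550]})

# The slice keeps the original labels 1..4.
tail = df["MONEY"].iloc[1:]
print(tail.index.tolist())  # [1, 2, 3, 4]

# Assigned as-is, each value lands back on its own row: nothing is shifted.
df["aligned"] = tail

# After the reset the labels are 0..3, so every value moves up by one row.
df["shifted"] = tail.reset_index(drop=True)

print(df)
```

Here drop=True discards the old labels instead of keeping them as an extra column.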
