Create new Pandas columns using the value from previous row

Question

I need to create two new Pandas columns using the logic and value from the previous row.

I have the following data:

Day Vol    Price    Income   Outgoing
1   499     75      
2   3233    90      
3   1812    70      
4   2407    97      
5   3474    82      
6   1057    53      
7   2031    68      
8   304     78      
9   1339    62      
10  2847    57      
11  3767    93      
12  1096    83      
13  3899    88      
14  4090    63      
15  3249    52      
16  1478    52      
17  4926    75      
18  1209    52      
19  1982    90      
20  4499    93

My challenge is to come up with logic where the Income and Outgoing columns (currently empty) hold the value of Vol * Price.

The Income column should carry this value when the previous day's Price is lower than the current day's. The Outgoing column should carry it when the previous day's Price is higher than the current day's. Everywhere else, Income and Outgoing should be NaN. If the Price is unchanged, that day's value is to be dropped.

The logic should start on the second day (n + 1): the first row is skipped because it has no previous day to compare with.

I tried using shift, for example:

if sample_data['Price'].shift(1) < sample_data['Price'].shift(2)):
    sample_data['Income'] = sample_data['Vol'] * sample_data['Price']
else:
    sample_data['Outgoing'] = sample_data['Vol'] * sample_data['Price']

But it isn't working.
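
A likely reason the attempt fails: `if` needs a single True/False, but comparing two Series yields a boolean Series, so pandas raises a ValueError. The attempt also compares shift(1) with shift(2), i.e. yesterday with the day before, rather than today with yesterday. Below is a minimal sketch of an element-wise alternative. It assumes a four-row slice of the data above (days 14 to 17), and the names `prev_price`, `rose`, and `fell` are hypothetical:

```python
import pandas as pd

# Illustrative slice of the question's data (days 14-17).
df = pd.DataFrame({
    "Day": [14, 15, 16, 17],
    "Vol": [4090, 3249, 1478, 4926],
    "Price": [63, 52, 52, 75],
})

prev_price = df["Price"].shift(1)  # previous row's Price; NaN on the first row
rose = df["Price"] > prev_price    # element-wise comparison -> boolean Series
fell = df["Price"] < prev_price

# A boolean Series has no single truth value, so `if rose:` raises ValueError.
try:
    if rose:
        pass
    raised = False
except ValueError:
    raised = True

# Series.where keeps values where the mask is True and writes NaN elsewhere.
vol_price = df["Vol"] * df["Price"]
df["Income"] = vol_price.where(rose)
df["Outgoing"] = vol_price.where(fell)

print(raised)
print(df)
```

Because both comparisons are strict, the first row (nothing to compare against) and any row with an unchanged Price end up NaN in both columns.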

I feel there must be a simpler, more comprehensive approach. Could someone please help?

Update (The final output should look like this):

For day 16, the data is deleted because days 15 and 16 have the same price.

What should your output look like with the given input? Your wording is a little unclear.
– G. Anderson, commented May 24, 2019 at 22:09

Andy Hayden · Accepted Answer · 2019-05-24 22:17:04Z


I'd calculate the product and the mask separately, and then update the columns:

In [11]: vol_price = df["Vol"] * df["Price"]

In [12]: incoming = df["Price"].diff() < 0

In [13]: df.loc[incoming, "Income"] = vol_price

In [14]: df.loc[~incoming, "Outgoing"] = vol_price

In [15]: df
Out[15]:
    Day   Vol  Price    Income  Outgoing
0     1   499     75       NaN   37425.0
1     2  3233     90       NaN  290970.0
2     3  1812     70  126840.0       NaN
3     4  2407     97       NaN  233479.0
4     5  3474     82  284868.0       NaN
5     6  1057     53   56021.0       NaN
6     7  2031     68       NaN  138108.0
7     8   304     78       NaN   23712.0
8     9  1339     62   83018.0       NaN
9    10  2847     57  162279.0       NaN
10   11  3767     93       NaN  350331.0
11   12  1096     83   90968.0       NaN
12   13  3899     88       NaN  343112.0
13   14  4090     63  257670.0       NaN
14   15  3249     52  168948.0       NaN
15   16  1478     52       NaN   76856.0
16   17  4926     75       NaN  369450.0
17   18  1209     52   62868.0       NaN
18   19  1982     90       NaN  178380.0
19   20  4499     93       NaN  418407.0

Or is it the other way around:

In [21]: incoming = df["Price"].diff() > 0

In [22]: df.loc[incoming, "Income"] = vol_price

In [23]: df.loc[~incoming, "Outgoing"] = vol_price

In [24]: df
Out[24]:
    Day   Vol  Price    Income  Outgoing
0     1   499     75       NaN   37425.0
1     2  3233     90  290970.0       NaN
2     3  1812     70       NaN  126840.0
3     4  2407     97  233479.0       NaN
4     5  3474     82       NaN  284868.0
5     6  1057     53       NaN   56021.0
6     7  2031     68  138108.0       NaN
7     8   304     78   23712.0       NaN
8     9  1339     62       NaN   83018.0
9    10  2847     57       NaN  162279.0
10   11  3767     93  350331.0       NaN
11   12  1096     83       NaN   90968.0
12   13  3899     88  343112.0       NaN
13   14  4090     63       NaN  257670.0
14   15  3249     52       NaN  168948.0
15   16  1478     52       NaN   76856.0
16   17  4926     75  369450.0       NaN
17   18  1209     52       NaN   62868.0
18   19  1982     90  178380.0       NaN
19   20  4499     93  418407.0       NaN


8 Comments

Dinesh Over a year ago

@QuangHoang, could you please elaborate on what you meant?

Quang Hoang Over a year ago

The row for day 16 should be NaN in both Income and Outgoing.

Dinesh Over a year ago

@Andy, how can we skip the first row and apply the logic from the second row onwards? I'm trying to start it from (n + 1).

Andy Hayden Over a year ago

@QuangHoang You could mask that out after with df.loc[0, ["Income", "Outgoing"]] = np.nan

Andy Hayden Over a year ago

@Dinesh if you do df.loc[0, ["Income", "Outgoing"]] = np.nan it will update the first row of Income and Outgoing to be NaN
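
The two points raised in these comments (skip the first row, blank out days with an unchanged Price) can be handled with a single mask applied after the fact. This is only a sketch under assumptions: it uses a four-row slice of the question's data (days 14 to 17), follows the answer's second variant, and the names `change`, `no_signal`, and `trimmed` are hypothetical. Note that day 14 is blanked here only because it is the first row of the slice.

```python
import numpy as np
import pandas as pd

# Illustrative slice of the question's data, with the two empty columns.
df = pd.DataFrame({
    "Day": [14, 15, 16, 17],
    "Vol": [4090, 3249, 1478, 4926],
    "Price": [63, 52, 52, 75],
    "Income": np.nan,
    "Outgoing": np.nan,
})

# The answer's second variant: Income when Price rose, Outgoing otherwise.
vol_price = df["Vol"] * df["Price"]
change = df["Price"].diff()
incoming = change > 0
df.loc[incoming, "Income"] = vol_price
df.loc[~incoming, "Outgoing"] = vol_price

# Rows with no usable comparison: the first row (diff is NaN)
# and rows where the Price did not change (diff is 0).
no_signal = change.isna() | change.eq(0)
df.loc[no_signal, ["Income", "Outgoing"]] = np.nan

# Optionally drop the rows that carry no value in either column.
trimmed = df.dropna(subset=["Income", "Outgoing"], how="all")

print(df)
print(trimmed)
```

Whether to keep the blanked rows as NaN or drop them with `dropna` depends on which reading of the question is intended; both results are produced above.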
