Pandas DataFrame: Add new column with calculated values based on previous row

Question

I have the following Pandas DataFrame df:

            name          rating    val1     val2        val3
<DATE>                                                 
2020-10-16  cool_name      23.0  1.700079  1.515385  0.184694
2020-10-19  cool_name      -3.0  1.230071  1.289615 -0.059545
2020-10-20  cool_name     -11.0  0.007064  0.675135 -0.668071
2020-10-21  cool_name     -21.0 -2.093643 -0.408622 -1.685021
2020-10-22  cool_name      -5.0 -2.384278 -0.638191 -1.746087

How can I add a new column called "calculated" which is calculated this way:

df['calculated'] = df['calculated'(previous day)] + df['val3'(current day)]

If I try to do it this way, I'll receive a key error (I am not even sure if it is shift(1) or shift(-1)):

df['calculated'] = df['calculated'].shift() + df['val3']

I think this is because the first row has no previous row with a "calculated" value. However, I don't know how to solve this problem.

I tried various solutions and searched for answers, but unfortunately I'm stuck. Any help would be highly appreciated.

It doesn't exist yet. 'calculated' is a new column that should be computed from the previous row's value and the current row's value (see the second part of the question).
– PythonLearner, Commented Jun 27, 2021 at 15:52
Do you want to add df['val3'] to all the columns of the previous day?
– Anurag Dabas, Commented Jun 27, 2021 at 15:55
I want to add a new column called 'calculated'. Its value should be: Last_row_value(calculated) + current_row_value(val3)
– PythonLearner, Commented Jun 27, 2021 at 15:57

SeaBean · Accepted Answer · 2021-06-27 18:23:09Z


Your formula amounts to a running total of val3, accumulated day by day (you were probably thinking of an Excel-style formula). So you can use cumsum(), as follows:

df['calculated'] = df['val3'].cumsum()

You got the error because the right-hand side reads df['calculated'] before that column has been defined, so pandas raises a KeyError.
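Here is a minimal sketch of where the error comes from. The small frame is rebuilt from the val3 values in the question, so its construction is an assumption for illustration only. It also touches on the shift(1) versus shift(-1) doubt: shift() with its default of 1 moves values down one row, so each row sees the previous row's value.

```python
import pandas as pd

# Small frame rebuilt from the question's val3 column (illustrative only).
df = pd.DataFrame(
    {"val3": [0.184694, -0.059545, -0.668071, -1.685021, -1.746087]},
    index=pd.to_datetime(
        ["2020-10-16", "2020-10-19", "2020-10-20", "2020-10-21", "2020-10-22"]
    ),
)

# The right-hand side is evaluated first, and it reads a column that
# does not exist yet, so a KeyError is raised before anything is assigned.
error_name = None
try:
    df["calculated"] = df["calculated"].shift() + df["val3"]
except KeyError as exc:
    error_name = type(exc).__name__
    print("KeyError:", exc)

# shift() defaults to periods=1: each row receives the previous row's value,
# and the first row becomes NaN because it has no predecessor.
previous_val3 = df["val3"].shift()
print(previous_val3)
```

So shift(1) is the "previous row" direction, while shift(-1) would pull values up from the next row.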

cumsum() gives the same result directly, without relying on a column that is not yet defined.

Result:

print(df)

                 name  rating      val1      val2      val3  calculated
2020-10-16  cool_name    23.0  1.700079  1.515385  0.184694    0.184694
2020-10-19  cool_name    -3.0  1.230071  1.289615 -0.059545    0.125149
2020-10-20  cool_name   -11.0  0.007064  0.675135 -0.668071   -0.542922
2020-10-21  cool_name   -21.0 -2.093643 -0.408622 -1.685021   -2.227943
2020-10-22  cool_name    -5.0 -2.384278 -0.638191 -1.746087   -3.974030
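To check that cumsum() really matches the row-by-row formula, here is a rough sketch that compares it with an explicit loop. It assumes the value "before the first row" is 0, which is what the result above implies. The name start is hypothetical, added only to show how a nonzero starting value could be handled.

```python
import numpy as np
import pandas as pd

# val3 values copied from the question (illustrative frame, other columns omitted).
df = pd.DataFrame({"val3": [0.184694, -0.059545, -0.668071, -1.685021, -1.746087]})

# Explicit recurrence: calculated[i] = calculated[i-1] + val3[i],
# assuming the value before the first row is 0.
running = 0.0
looped = []
for value in df["val3"]:
    running += value
    looped.append(running)

# Vectorized equivalent.
df["calculated"] = df["val3"].cumsum()
matches = np.allclose(df["calculated"], looped)
print(matches)

# Hypothetical nonzero starting value: add it to the running total.
start = 10.0
with_start = start + df["val3"].cumsum()
print(with_start)
```

The loop is only there for comparison; on real data the cumsum() version is the one to prefer, since it avoids iterating over rows in Python.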

edited Jun 27, 2021 at 18:23

answered Jun 27, 2021 at 15:59


1 Comment

PythonLearner Over a year ago

You are absolutely right about my formula, and it is funny that I couldn't see that. Your answer is very simple and easy to understand, which is why I will go with this one. Thanks!
