
I have the following Pandas DataFrame df:

            name          rating    val1     val2        val3
<DATE>                                                 
2020-10-16  cool_name      23.0  1.700079  1.515385  0.184694
2020-10-19  cool_name      -3.0  1.230071  1.289615 -0.059545
2020-10-20  cool_name     -11.0  0.007064  0.675135 -0.668071
2020-10-21  cool_name     -21.0 -2.093643 -0.408622 -1.685021
2020-10-22  cool_name      -5.0 -2.384278 -0.638191 -1.746087


       

How can I add a new column called "calculated" which is calculated this way:

df['calculated'] = df['calculated'(previous day)] + df['val3'(current day)]

If I try it this way, I get a KeyError (I am also not sure whether it should be shift(1) or shift(-1)):

df['calculated'] = df['calculated'].shift() + df['val3']

I think this is because the first row has no previous row with a "calculated" value. However, I don't know how to solve this problem.

I tried various solutions and searched for answers, but unfortunately I'm stuck. Any help would be highly appreciated.

  • Where is the 'calculated' column that you are shifting? Commented Jun 27, 2021 at 15:50
  • It doesn't exist yet. 'calculated' is a new column which should be calculated from the previous value and today's value (see the second part of the question). Commented Jun 27, 2021 at 15:52
  • Do you want to add df['val3'] to all the columns of the previous day? Commented Jun 27, 2021 at 15:55
  • I want to add a new column called 'calculated'. Its value should be: Last_row_value(calculated) + current_row_value(val3) Commented Jun 27, 2021 at 15:57

1 Answer


Your formula amounts to a running total of val3, day by day (you were probably thinking of it as an Excel-style formula). As such, you can use cumsum(), as follows:

df['calculated'] = df['val3'].cumsum()

You got the error because the right-hand side of your assignment reads df['calculated'] before that column exists, so pandas raises a KeyError. The first row lacking a previous value is not what triggers it.

With cumsum() you get the same result directly, without relying on a column that is not yet defined.
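To make the cause of the error concrete, here is a minimal sketch. It assumes a small frame holding only the val3 values from the question; the names raised, prev_val3 and next_val3 are just illustrative. It also touches on the shift(1) versus shift(-1) doubt: shift() with no argument moves values down by one row, so each row sees the previous row's value.

```python
import pandas as pd

# Illustrative frame: only the val3 column and dates from the question.
df = pd.DataFrame(
    {"val3": [0.184694, -0.059545, -0.668071, -1.685021, -1.746087]},
    index=pd.to_datetime(
        ["2020-10-16", "2020-10-19", "2020-10-20", "2020-10-21", "2020-10-22"]
    ),
)

# Reading a column that has not been created yet raises KeyError,
# before shift() is ever called.
try:
    df["calculated"].shift() + df["val3"]
    raised = False
except KeyError:
    raised = True

# shift() defaults to periods=1: each row receives the previous row's value
# and the first row becomes NaN. shift(-1) pulls the next row's value instead.
prev_val3 = df["val3"].shift()
next_val3 = df["val3"].shift(-1)
```

So "previous day" corresponds to shift() or shift(1), but that alone cannot express a column defined in terms of itself, which is why the cumsum() line is the simpler route.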

Result of the cumsum() assignment:

print(df)

                 name  rating      val1      val2      val3  calculated
2020-10-16  cool_name    23.0  1.700079  1.515385  0.184694    0.184694
2020-10-19  cool_name    -3.0  1.230071  1.289615 -0.059545    0.125149
2020-10-20  cool_name   -11.0  0.007064  0.675135 -0.668071   -0.542922
2020-10-21  cool_name   -21.0 -2.093643 -0.408622 -1.685021   -2.227943
2020-10-22  cool_name    -5.0 -2.384278 -0.638191 -1.746087   -3.974030
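As a sanity check, the following sketch compares the row-by-row recurrence from the question with cumsum(). It assumes the running total starts from 0 before the first row, which the question does not state explicitly; the names running, looped, vectorized and matches are only illustrative.

```python
import numpy as np
import pandas as pd

# val3 values copied from the question.
val3 = pd.Series([0.184694, -0.059545, -0.668071, -1.685021, -1.746087])

# Explicit recurrence: calculated[i] = calculated[i-1] + val3[i],
# assuming the total is 0 before the first row.
running = 0.0
looped = []
for v in val3:
    running += v
    looped.append(running)

# Vectorized equivalent.
vectorized = val3.cumsum()

# True when both approaches agree up to floating-point tolerance.
matches = np.allclose(looped, vectorized.to_numpy())
```

If the series were meant to begin from some nonzero starting value, adding that constant to the cumsum() result would be the analogous adjustment.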

1 Comment

You are absolutely right about my formula, and it is funny that I couldn't see that. Your answer is very simple and easy to understand, which is why I will go with this one. Thanks!
