
I'm trying to update the values of a given column based on values stored in the previous row but from different columns.

I can do it with a for loop, which works well on small data sets, but on a large data.table (say over 1 million rows) it of course takes ages. The following is a small example:

library(data.table)

DT <- data.table(Year = 2019:2038, Area = 500, Cos = c(0,0,0,150,0,0,
  0,0,350,0,0,0,0,0,0,0,120,200,80,100), Rep = c(0,0,0,0,150,0,0,0,0,
  350,0,0,0,0,0,0,0,0,0,0), Calc = c(500,500,500,500,500,500,500,500,
  500,500,500,500,500,500,500,500,500,380,180,100))

Basically, I want to replicate the column "Calc", which is calculated as follows:

1) If row == 1

Calc[1] == Area[1]

2) For rows > 1

Calc[i] == Rep[i] + Calc[i-1] - Cos[i-1]
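
For reference, the two rules above can be written as a plain loop. This is only a sketch of the slow baseline the question describes, not code from the original post; the names `calc`, `rep_v`, `cos_v` and `loopCalc` are placeholders chosen here.

```r
library(data.table)

# Same example data as above, without the target column Calc
DT <- data.table(
  Year = 2019:2038,
  Area = 500,
  Cos  = c(0,0,0,150,0,0,0,0,350,0,0,0,0,0,0,0,120,200,80,100),
  Rep  = c(0,0,0,0,150,0,0,0,0,350,0,0,0,0,0,0,0,0,0,0)
)

n     <- nrow(DT)
rep_v <- DT$Rep            # pull the columns out once, outside the loop
cos_v <- DT$Cos
calc  <- numeric(n)

calc[1L] <- DT$Area[1L]    # rule 1: first row starts at Area[1]
for (i in seq_len(n)[-1L]) {
  # rule 2: current Rep plus previous Calc minus previous Cos
  calc[i] <- rep_v[i] + calc[i - 1L] - cos_v[i - 1L]
}

DT[, loopCalc := calc]
```

Each iteration depends on the previous result, which is why the loop cannot be vectorised directly and becomes slow on millions of rows.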

I would appreciate any feedback

Many thanks


2 Answers


In this particular case, you can use:

DT[, newCalc := Area[1L] + cumsum(Rep - shift(Cos, fill=0L))]

output:

    Year Area Cos Rep Calc newCalc
 1: 2019  500   0   0  500     500
 2: 2020  500   0   0  500     500
 3: 2021  500   0   0  500     500
 4: 2022  500 150   0  500     500
 5: 2023  500   0 150  500     500
 6: 2024  500   0   0  500     500
 7: 2025  500   0   0  500     500
 8: 2026  500   0   0  500     500
 9: 2027  500 350   0  500     500
10: 2028  500   0 350  500     500
11: 2029  500   0   0  500     500
12: 2030  500   0   0  500     500
13: 2031  500   0   0  500     500
14: 2032  500   0   0  500     500
15: 2033  500   0   0  500     500
16: 2034  500   0   0  500     500
17: 2035  500 120   0  500     500
18: 2036  500 200   0  380     380
19: 2037  500  80   0  180     180
20: 2038  500 100   0  100     100
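
Why this works: unrolling the recurrence gives Calc[i] = Area[1] + sum over k = 2..i of (Rep[k] - Cos[k-1]), which is a cumulative sum. One caveat is that the cumsum above also adds Rep[1] to the starting value; that is harmless here because Rep[1] is 0, which is presumably why the answer is limited to "this particular case". A minimal sketch that forces the first increment to zero, assuming the rule Calc[1] == Area[1] must hold exactly (`delta` and `newCalc2` are names made up for this example):

```r
library(data.table)

DT <- data.table(
  Year = 2019:2038,
  Area = 500,
  Cos  = c(0,0,0,150,0,0,0,0,350,0,0,0,0,0,0,0,120,200,80,100),
  Rep  = c(0,0,0,0,150,0,0,0,0,350,0,0,0,0,0,0,0,0,0,0),
  Calc = c(rep(500, 17), 380, 180, 100)
)

DT[, newCalc2 := {
  delta <- Rep - shift(Cos, fill = 0)  # Rep[i] - Cos[i-1] for each row
  delta[1L] <- 0                       # row 1 is pinned to Area[1]
  Area[1L] + cumsum(delta)
}]

# Check against the expected column from the question
stopifnot(isTRUE(all.equal(DT$newCalc2, DT$Calc)))
```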

We can use Reduce with accumulate = TRUE

DT[, newCalc := Reduce(`+`, Rep - shift(Cos, fill = 0), 
         init = Area[1], accumulate = TRUE)[-1]]
DT
#    Year Area Cos Rep Calc newCalc
# 1: 2019  500   0   0  500     500
# 2: 2020  500   0   0  500     500
# 3: 2021  500   0   0  500     500
# 4: 2022  500 150   0  500     500
# 5: 2023  500   0 150  500     500
# 6: 2024  500   0   0  500     500
# 7: 2025  500   0   0  500     500
# 8: 2026  500   0   0  500     500
# 9: 2027  500 350   0  500     500
#10: 2028  500   0 350  500     500
#11: 2029  500   0   0  500     500
#12: 2030  500   0   0  500     500
#13: 2031  500   0   0  500     500
#14: 2032  500   0   0  500     500
#15: 2033  500   0   0  500     500
#16: 2034  500   0   0  500     500
#17: 2035  500 120   0  500     500
#18: 2036  500 200   0  380     380
#19: 2037  500  80   0  180     180
#20: 2038  500 100   0  100     100

Or the same with accumulate from purrr (loaded as part of the tidyverse)

library(tidyverse)
DT %>% 
   mutate(newCalc = accumulate(Rep - lag(Cos, default = 0),
          .init = first(Area), `+`)[-1])
