I'm trying to loop through a data.table, processing each row, in order to:
- produce output that combines the results from every row processed
- record various details of the processing in a separate table, statsTable, which is also updated at other stages of the process
The real processing is more complex (each iteration of lapply contributes records to the output) and handles larger volumes; the code below is a heavily simplified version written for this question.
However, I can't see how to update statsTable from inside lapply: the assignment has no effect once the function returns, so procTime stays at zero. I believe this is by design, so that functions can't have unintended side effects. Is there a way to do this while still using one of the apply functions? I know I could use a for loop, but would prefer not to if possible.
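A minimal sketch of the behaviour described above, assuming the data.table package is installed. The table `tbl` and the functions `update_copy()` and `update_by_ref()` are hypothetical names used only for illustration. The first function uses the same replacement-style assignment as the `statsTable[year == fileYear]$procTime = elapsed` line in the question; inside a function, R's copy-on-modify semantics make this create a local copy of the table, so the global table is unchanged. The second uses data.table's `:=` operator, which updates the existing table by reference, so the change survives after lapply() returns.

```r
library(data.table)

tbl <- data.table(year = 2016:2020, procTime = 0)

# Replacement-style assignment (same form as in the question). Inside a
# function this creates a *local* `tbl`; the global table is left untouched.
update_copy <- function(y) {
  tbl[year == y]$procTime <- 1
  invisible(NULL)
}

# data.table's `:=` modifies the existing table in place (by reference),
# so the update is still visible after lapply() returns.
update_by_ref <- function(y) {
  tbl[year == y, procTime := 2]
  invisible(NULL)
}

invisible(lapply(2016:2017, update_copy))
unchanged <- all(tbl$procTime == 0)
print(tbl$procTime)  # [1] 0 0 0 0 0

invisible(lapply(2016:2017, update_by_ref))
print(tbl$procTime)  # [1] 2 2 0 0 0
```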
library(data.table)
library(dplyr)  # for bind_rows()

mainTable <- data.table(year = 2016:2020, value = runif(5, min = 0, max = 50000000))
statsTable <- data.table(year = 2016:2020, procTime = 0)
setkey(statsTable, year)

output <- bind_rows(lapply(mainTable$year, function(fileYear) {
  randomValue <- as.integer(mainTable[year == fileYear]$value)
  print(paste0(fileYear, ":", randomValue))
  start <- proc.time()[[3]]
  for (i in seq_len(randomValue)) {}  # dummy workload standing in for the real processing
  elapsed <- proc.time()[[3]] - start
  # Intended to record the timing, but statsTable$procTime is still 0 after lapply() returns
  statsTable[year == fileYear]$procTime <- elapsed
  print(elapsed)
  data.table(year = fileYear, loopsPerSecond = randomValue / elapsed)
}))

print(output)
print(statsTable)
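One possible restructuring that keeps the lapply function free of side effects, sketched under some assumptions: each iteration returns a list holding both its output row and its stats row, the two parts are combined with data.table's `rbindlist()`, and statsTable is then updated by reference with an update join (`on = "year"` plus the `i.` prefix). The names `results` and `stats` are hypothetical, and the workload is reduced (max 5e6 instead of 50000000) only so the sketch runs quickly.

```r
library(data.table)

# Smaller hypothetical workload so the sketch runs quickly.
mainTable  <- data.table(year = 2016:2020, value = runif(5, min = 0, max = 5e6))
statsTable <- data.table(year = 2016:2020, procTime = 0)
setkey(statsTable, year)

# Each iteration returns two one-row tables instead of writing to statsTable.
results <- lapply(mainTable$year, function(fileYear) {
  n <- as.integer(mainTable[year == fileYear, value])
  start <- proc.time()[[3]]
  for (i in seq_len(n)) {}  # dummy workload, as in the question
  elapsed <- proc.time()[[3]] - start
  list(
    output = data.table(year = fileYear, loopsPerSecond = n / elapsed),
    stats  = data.table(year = fileYear, procTime = elapsed)
  )
})

# Split the per-iteration results into two combined tables.
output <- rbindlist(lapply(results, `[[`, "output"))
stats  <- rbindlist(lapply(results, `[[`, "stats"))

# Update join: copy the timings into statsTable by reference, matching on year.
statsTable[stats, on = "year", procTime := i.procTime]

print(output)
print(statsTable)
```

Because the update join also works by reference, the same pattern could presumably be reused at the other stages that write to statsTable, without any of the processing functions needing to modify it directly.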