3

I am using setDT() to add additional columns to a data.table, but the columns from this call:

setDT(mydata)[, paste0('F2_E',2:30) := lapply(.SD, function(x) log(value/x)), .SDcols = 32:60][]

are not being added when you run this script:

library(data.table)
library(zoo)
date = seq(as.Date("2016-01-01"),as.Date("2016-05-10"),"day")
value =seq(1,131,1)
mydata = data.frame (date, value)
mydata
setDT(mydata)[, paste0('F1',2:30) := lapply(2:30, function(x) rollmeanr(value, x, fill = rep(NA,x-1)) ),][]
setDT(mydata)[, paste0('F2',2:30) := lapply(2:30, function(x) rollapply(value,x,FUN="median",align="right",fill=NA))][]
setDT(mydata)[, paste0('F1_E',2:30) := lapply(.SD, function(x) log(value/x)     ), .SDcols = 3:31][]
setDT(mydata)[, paste0('F2_E',2:30) := lapply(.SD, function(x) log(value/x)), .SDcols = 32:60][]
rbind(colnames(mydata))


rbind(colnames(mydata))
     [,1]   [,2]    [,3]  [,4]  [,5]  [,6]  [,7]  [,8]  [,9]  [,10] [,11]  [,12]  [,13]  [,14]  [,15]  [,16]  [,17]  [,18]  [,19]  [,20]  [,21]  [,22]  [,23]  [,24]  [,25]  [,26]  [,27] 
[1,] "date" "value" "F12" "F13" "F14" "F15" "F16" "F17" "F18" "F19" "F110" "F111" "F112" "F113" "F114" "F115" "F116" "F117" "F118" "F119" "F120" "F121" "F122" "F123" "F124" "F125" "F126"
     [,28]  [,29]  [,30]  [,31]  [,32] [,33] [,34] [,35] [,36] [,37] [,38] [,39] [,40]  [,41]  [,42]  [,43]  [,44]  [,45]  [,46]  [,47]  [,48]  [,49]  [,50]  [,51]  [,52]  [,53]  [,54] 
[1,] "F127" "F128" "F129" "F130" "F22" "F23" "F24" "F25" "F26" "F27" "F28" "F29" "F210" "F211" "F212" "F213" "F214" "F215" "F216" "F217" "F218" "F219" "F220" "F221" "F222" "F223" "F224"
     [,55]  [,56]  [,57]  [,58]  [,59]  [,60]  [,61]   [,62]   [,63]   [,64]   [,65]   [,66]   [,67]   [,68]   [,69]    [,70]    [,71]    [,72]    [,73]    [,74]    [,75]    [,76]    [,77]   
[1,] "F225" "F226" "F227" "F228" "F229" "F230" "F1_E2" "F1_E3" "F1_E4" "F1_E5" "F1_E6" "F1_E7" "F1_E8" "F1_E9" "F1_E10" "F1_E11" "F1_E12" "F1_E13" "F1_E14" "F1_E15" "F1_E16" "F1_E17" "F1_E18"
     [,78]    [,79]    [,80]    [,81]    [,82]    [,83]    [,84]    [,85]    [,86]    [,87]    [,88]    [,89]   
[1,] "F1_E19" "F1_E20" "F1_E21" "F1_E22" "F1_E23" "F1_E24" "F1_E25" "F1_E26" "F1_E27" "F1_E28" "F1_E29" "F1_E30"

You can see there are no F2_E2, F2_E3, etc. columns.

Why would those columns not be added?

4
  • I don't think you need to setDT() every time. Just do setDT(mydata) once and then use mydata[, blah := blah, by=blah] from then on. mydata[, paste0('F2_E',2:30) := lapply(.SD, function(x) log(value/x)), .SDcols = 32:60] works just fine for me. Commented Jun 6, 2016 at 23:03
  • @latemail it didn't work for me. What were the final dimensions for mydata after the four runs? Commented Jun 6, 2016 at 23:14
  • @PierreLafortune - dim(mydata) is 131 118 Commented Jun 6, 2016 at 23:20
  • I get 131 89 on data.table_1.9.6 Commented Jun 6, 2016 at 23:22

2 Answers

2

Short answer:

Call setDT(mydata) once, as a separate statement. Then run all your assignment statements on mydata directly.

Additionally, if you're going to add a lot of columns, use the function alloc.col() to over-allocate more column slots up front, at least until the next release (v1.9.8), i.e.:

setDT(mydata)
truelength(mydata) # [1] 100
alloc.col(mydata, 1000L)
truelength(mydata) # [1] 1000

In the current development version, v1.9.7, we've increased the over-allocation to 1024, by default. So this should happen extremely rarely.


A quick and slightly detailed explanation:

This happens because data.table over-allocates column pointers during its creation, and the default over-allocation length is 100 columns. You can check this with truelength(). See ?truelength.

require(data.table)
mydata = data.frame (x=1, y=2)
setDT(mydata)      ## convert to data.table by reference
length(mydata)     ## equals the columns assigned
# [1] 2
truelength(mydata) ## total number of column slots allocated
# [1] 100

Let's add 30 more columns the way you did.

setDT(mydata)[, paste0("z", 1:30) := 1L]
length(mydata)     ## [1] 32
truelength(mydata) ## [1] 100

And another 30.

setDT(mydata)[, paste0("z", 31:60) := 1L]
length(mydata)     ## [1] 62
truelength(mydata) ## [1] 100

And another 30.

setDT(mydata)[, paste0("z", 61:90) := 1L]
length(mydata)     ## [1] 92
truelength(mydata) ## [1] 100

Now, the next time we do this, we have to add 30 more columns, but only 8 slots are free. So we need to create another object with even more over-allocated slots, assign all columns currently in mydata to the new object, and finally assign it back to mydata. This is handled internally and automatically so that the user doesn't have to keep track. So the next time we do:

setDT(mydata)[, paste0("z", 91:120) := 1L]

The function [.data.table realises it needs to over-allocate again, and does so, and the new columns get added to the new object. The issue is assigning this new object back to mydata, which lives in the parent frame of [.data.table. That is done through an assign() call, which only accepts a variable name as character input, and setDT(mydata) is a function call, not a name. So the re-assignment step fails, and the over-allocated object with the new columns is never reflected back to the original object. If you'd done mydata[, paste0(..) := ...] instead, the input object mydata is a name and can be used to assign the over-allocated result back to the original object. That's why the suggestion from @thelatemail works.
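As a rough sketch of the working pattern (reusing the toy x/y table from above, not the original rolling-mean data), convert once with setDT() and then always assign through the name mydata. The loop and the variable start are only there for illustration. On versions with the 100-slot default the fourth batch forces a re-allocation, which succeeds here because mydata is a plain name; on versions with the larger default no re-allocation should be needed at all.

```r
library(data.table)

mydata <- data.frame(x = 1, y = 2)
setDT(mydata)                       # convert once, by reference

# Add 120 columns in four batches of 30, always via the name `mydata`
# (never via setDT(mydata)[...]), so any re-allocation can be assigned back.
for (start in c(1L, 31L, 61L, 91L)) {
  mydata[, paste0("z", start:(start + 29L)) := 1L]
}

length(mydata)                      # 2 original columns + 120 new ones
```

All four batches, including z91 to z120, should end up in mydata.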

If this is all too advanced, just upgrade to the devel version and the problem goes away in practice: it is very unlikely to happen there (unless you want more than 1024 columns in your data.table).


I've filed #1731 to remind us to come back to this and see if there are other ways to get around this case.


2 Comments

can you advise on what/where the devel version is?
@user3022875 see Installation wiki page
0

I am running into the same issue that you are. We can try to ping the data.table experts to understand the sticking point. @latemail gets the desired output so it works for someone.

In the meantime, since you are running the same operation on the third and fourth calls, we can combine them into one to get the desired output:

mydata[, paste0(rep(c('F1_E', 'F2_E'),each=29), rep(2:30, 2)) := lapply(.SD, function(x) log(value/x)), .SDcols = 3:60][]

 dim(mydata)
[1] 131 118

Edit

Remove the square brackets and it will work. I'll scratch my head for a while until I figure out why.

7 Comments

I also removed the [] at the end as well.
That's it. It works without the open close brackets at the end. Weird
I'm really surprised that that would be the solution.
I can't create a simple version that stuffs up, e.g. - dat <- mtcars; setDT(dat); dat[, paste0("new",1:2) := lapply(.SD, sqrt), .SDcols=1:2][] seems to work fine.
I'll ask Arun about it