What is an efficient way to modify a variable or a series of variables in a data.table to add or change attributes? For example, the label function in the R Hmisc package adds a label attribute to a variable and gives it class "labelled", so that [ and other operators preserve the label (the package defines [.labelled). Likewise, Hmisc::units specifies units of measurement. I typically add labels and units using, for example:
label(age) <- 'Age at study entry' # or:
label(mydataframe$age) <- 'Age at study entry' # or:
mydataframe <- upData(mydataframe, labels=c(age='Age at study entry'), units=c(age='year'))
data.table preserves attributes correctly for simple operations:
require(Hmisc)
require(data.table)
x1 <- runif(10)
label(x1) <- 'This is a label'
units(x1) <- 'xunits'
x2 <- rnorm(10)
d <- data.table(x1, x2)
label(d$x1) # label there
label(d[,'x1']) # label gone
label(d[1:3,]$x1) # label there
label(d[,x1]) # label there
units(d[1:2,x1]) # yes
So the primary question is how to insert attributes into an already existing data.table object with minimal memory use and execution time.
d[,'x1'] returns a data.table and d[1:3,]$x1 returns the vector... the actual label attribute follows the vector.
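Since the attribute lives on the column vector, one possible approach is data.table's setattr(), which sets an attribute by reference instead of copying the object. The following is a minimal sketch, not a definitive answer: it assumes that Hmisc reads the plain "label" and "units" attributes, and the named vector labs is a hypothetical helper introduced only for illustration. It does not set the "labelled" class that Hmisc's label<- also adds.

```r
library(data.table)

d <- data.table(x1 = runif(10), x2 = rnorm(10))
addr_before <- address(d$x1)  # memory address of the column vector

# setattr() modifies the object in place, so the column is not copied
setattr(d$x1, "label", "This is a label")
setattr(d$x1, "units", "xunits")

# Several columns at once; 'labs' is a hypothetical named vector of labels
labs <- c(x1 = "This is a label", x2 = "Another label")
for (v in names(labs)) setattr(d[[v]], "label", labs[[v]])

attr(d$x1, "label")                    # label stored on the column
attr(d$x2, "label")
identical(address(d$x1), addr_before)  # checks that x1 was not copied
```

If the "labelled" class is needed so that subsetting keeps the label, it could presumably be set the same way with setattr(d$x1, "class", ...), but that should be checked against how your version of Hmisc builds the class vector.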