
What is an efficient way to modify one or more variables in a data.table to add or change attributes? For example, the label function in the R Hmisc package adds a label attribute to a variable and gives it class "labelled", so that the [ operator and other operators preserve the label (the package defines [.labelled). Likewise, Hmisc::units specifies units of measurement. I typically add labels and units with, for example:

label(age) <- 'Age at study entry'               # or:
label(mydataframe$age) <- 'Age at study entry'   # or:
mydataframe <- upData(mydataframe, labels=c(age='Age at study entry'), units=c(age='year'))

data.table correctly preserves attributes for simple operations:

require(Hmisc)
require(data.table)
x1 <- runif(10)
label(x1) <- 'This is a label'
units(x1) <- 'xunits'
x2 <- rnorm(10)
d <- data.table(x1, x2)
label(d$x1)        # label there
label(d[, 'x1'])   # label gone: d[, 'x1'] is a one-column data.table, not the vector
label(d[1:3, ]$x1) # label there
label(d[, x1])     # label there
units(d[1:2, x1])  # units there too

So the primary question is how to insert attributes into an existing data.table object with minimal memory use and execution time.

  • What is an example of a typical data.table operation where your labels are not preserved? Commented Jun 23, 2020 at 14:10
  • Great question and I should have addressed that. Will edit top post now. Commented Jun 23, 2020 at 14:14
  • Note that d[,'x1'] returns a data.table, while d[1:3,]$x1 returns the vector; the label attribute follows the vector. Commented Jun 23, 2020 at 14:20
  • You can subscribe to this issue to be informed about the feature: github.com/Rdatatable/data.table/issues/623. You may also want to upvote it. Commented Jun 24, 2020 at 21:22

1 Answer


Setting variable attributes (including labels) in data.table:

To set a label easily and efficiently (by reference), use setattr().

Example:

library(data.table)
iDT <- data.table(iris)

setattr(iDT$Species, "label", "Know the species")
attributes(iDT$Species)

# $levels
# [1] "setosa"     "versicolor" "virginica" 
# 
# $class
# [1] "factor"
# 
# $label
# [1] "Know the species"
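Because the question is about memory use, it can help to confirm that setattr() really works by reference. The sketch below compares the column's memory address before and after setting the attribute, using data.table::address(). It assumes a reasonably recent data.table; if the two addresses match, the column was not copied.

```r
library(data.table)

iDT <- data.table(iris)

# Memory address of the Species column before modification
before <- address(iDT[["Species"]])

# setattr() changes the attribute in place; the column vector is not copied
setattr(iDT$Species, "label", "Know the species")

# Memory address after modification
after <- address(iDT[["Species"]])

identical(before, after)      # TRUE when no copy was made
attr(iDT$Species, "label")
```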

Comments

Wonderful. How would I use the full capabilities of Hmisc::label to set several attributes at once (label and units, plus possibly class) so that they survive subsetting?
I guess you could try something like setattr(iDT$Species, "class", c("labelled", "factor")), with "labelled" first so that [.labelled is dispatched? In general I am not convinced attributes are the best place to keep metadata.
@FrankHarrell Your question is a bit unclear. data.table preserves attributes. I'd suggest you provide a simple setlabel function modeled after this answer for use with data.table (but it could be used elsewhere too).
I will try that. It would be nice if a compact syntax would work, i.e., one that looks more like setkey(). I've wondered too whether there is a better place to store metadata, but I've never found one that is safe for subsetting and that doesn't require a bit more bookkeeping.
I'm going to modify the R Hmisc package upData function to call setattr in a loop over the variables being given labels or units. data.table is blazing fast at doing this.
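The comments suggest writing a small setlabel-style function around setattr() and calling setattr in a loop over the variables. Below is a minimal sketch of that idea. setlabels() is a hypothetical helper, not part of data.table or Hmisc. It takes named vectors of labels and units, sets them by reference, and puts "labelled" first in the class vector, mirroring what Hmisc's label<- does, so that [.labelled can dispatch when Hmisc is loaded.

```r
library(data.table)

# Hypothetical helper (not in data.table or Hmisc): set "label" and/or "units"
# attributes on several columns by reference, and put "labelled" first in the
# class vector so that Hmisc's [.labelled method is used when Hmisc is loaded.
setlabels <- function(DT, labels = NULL, units = NULL) {
  vars <- union(names(labels), names(units))
  unknown <- setdiff(vars, names(DT))
  if (length(unknown))
    stop("unknown column(s): ", paste(unknown, collapse = ", "))
  for (v in vars) {
    col <- DT[[v]]   # same vector as the column; no copy is made
    if (v %in% names(labels)) setattr(col, "label", labels[[v]])
    if (v %in% names(units))  setattr(col, "units", units[[v]])
    # Prepend "labelled" without duplicating it on repeated calls
    setattr(col, "class", c("labelled", setdiff(class(col), "labelled")))
  }
  invisible(DT)
}

# Usage: label two columns and give one of them units, all by reference
DT <- data.table(age = c(34, 51, 47), sex = factor(c("f", "m", "f")))
setlabels(DT,
          labels = c(age = "Age at study entry", sex = "Sex"),
          units  = c(age = "year"))
attributes(DT$age)
```

Taking named vectors keeps the call close to the labels= and units= arguments of upData, while the work is still done with setattr() and no column is copied.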