I have a data set that contains some logical columns, and I would like to replace values that are TRUE with the corresponding column name. I asked a similar question here and found a workable solution with the help of suggestions from other Stack Overflow users. However, that solution does not use data.table syntax and copies the whole data set instead of replacing by reference, which is time-consuming.
What is the most appropriate way to do this using data.table syntax?
I tried this:
# Load library
library(data.table)
# Create dummy data.table:
mydt <- data.table(id = c(1, 2, 3, 4, 5),
                   ptname = c("jack", "jill", "jo", "frankie", "claire"),
                   sex = c("m", "f", "f", "m", "f"),
                   apple = c(TRUE, FALSE, FALSE, TRUE, TRUE),
                   orange = c(FALSE, TRUE, FALSE, TRUE, FALSE),
                   pear = c(TRUE, TRUE, TRUE, TRUE, FALSE))
# View dummy data:
> mydt
   id  ptname sex apple orange  pear
1:  1    jack   m  TRUE  FALSE  TRUE
2:  2    jill   f FALSE   TRUE  TRUE
3:  3      jo   f FALSE  FALSE  TRUE
4:  4 frankie   m  TRUE   TRUE  TRUE
5:  5  claire   f  TRUE  FALSE FALSE
# Function to recode values in a data.table:
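recode.multi <- function(datacol, oldval, newval) {
  # Build a lookup vector: values are the new codes, names are the old ones
  trans <- setNames(newval, oldval)
  # Look each element of the column up by name
  trans[match(datacol, names(trans))]
}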
# Get a list of all the logical columns in the data set:
logicalcols <- names(which(sapply(mydt, is.logical)))
# Apply the function to convert 'TRUE' to the relevant column names:
mydt[, (logicalcols) := lapply(.SD, recode.multi,
                               oldval = c(FALSE, TRUE),
                               newval = c("FALSE", names(.SD))),
     .SDcols = logicalcols]
# View the result:
> mydt
   id  ptname sex apple orange  pear
1:  1    jack   m apple  FALSE apple
2:  2    jill   f FALSE  apple apple
3:  3      jo   f FALSE  FALSE apple
4:  4 frankie   m apple  apple apple
5:  5  claire   f apple  FALSE FALSE
This isn't correct: instead of using each column's own name as the replacement value, it just reuses the first one ("apple" in this case) for every column.
Moreover, if I reverse the order of old and new values, the function ignores my character string replacement for the second value and uses the first two column names as replacements in all cases:
# Apply the function with order of old and new values reversed:
mydt[, (logicalcols) := lapply(.SD, recode.multi,
                               oldval = c(TRUE, FALSE),
                               newval = c(names(.SD), "FALSE")),
     .SDcols = logicalcols]
# View the result:
> mydt
   id  ptname sex  apple orange   pear
1:  1    jack   m  apple orange  apple
2:  2    jill   f orange  apple  apple
3:  3      jo   f orange orange  apple
4:  4 frankie   m  apple  apple  apple
5:  5  claire   f  apple orange orange
I'm probably missing something simple, but does anyone know why the function does not iterate through the column names, and how to edit it so that it does?
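One way to investigate, offered as a sketch rather than a definitive diagnosis: rebuild by hand the lookup vector that recode.multi constructs in the first attempt. The values below are typed in manually, on the assumption that names(.SD) evaluates once inside j to all three logical column names, so every call to the function receives the same four-element newval.

```r
# Arguments as recode.multi would receive them in the first attempt
# (assumption: names(.SD) expands to all three column names).
oldval <- c(FALSE, TRUE)
newval <- c("FALSE", "apple", "orange", "pear")

# setNames() has only two names for four values, so the last two
# entries get NA names and can never be reached by name.
trans <- setNames(newval, oldval)
names(trans)

# A logical column is matched as "TRUE"/"FALSE", so TRUE always lands
# on the second entry ("apple"), whichever column is being recoded.
lookup_result <- unname(trans[match(c(TRUE, FALSE), names(trans))])
lookup_result
```

Under the same assumption, the reversed call pairs TRUE with "apple" and FALSE with "orange", which would be consistent with the second result shown above.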
My expected output would be as follows:
> mydt
   id  ptname sex apple orange  pear
1:  1    jack   m apple  FALSE  pear
2:  2    jill   f FALSE orange  pear
3:  3      jo   f FALSE  FALSE  pear
4:  4 frankie   m apple orange  pear
5:  5  claire   f apple  FALSE FALSE
Alternatively, any other suggestions for concise data.table syntax to achieve this would be much appreciated.
lapply iterates on one thing at a time (.SD here). If you need it to iterate over .SD and names(.SD), try Map.
mydt[, (logicalcols) := mapply(recode.multi, datacol = .SD, oldval = c(TRUE, FALSE), newval = c(names(.SD), "FALSE"), SIMPLIFY = FALSE), .SDcols = logicalcols] almost gets me there, except that the FALSE values are converted to NAs.
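A minimal sketch of the Map idea, assuming the dummy data from the question: Map() walks over the columns of .SD and over names(.SD) in parallel, so each column is paired with its own name. The anonymous helper and its argument names (col, nm) are illustrative, not part of the original code. It handles FALSE explicitly with ifelse() instead of passing oldval/newval vectors through the iteration, which is one way to keep the FALSE values from turning into NA.

```r
library(data.table)

# Dummy data from the question
mydt <- data.table(id = 1:5,
                   ptname = c("jack", "jill", "jo", "frankie", "claire"),
                   sex = c("m", "f", "f", "m", "f"),
                   apple  = c(TRUE, FALSE, FALSE, TRUE, TRUE),
                   orange = c(FALSE, TRUE, FALSE, TRUE, FALSE),
                   pear   = c(TRUE, TRUE, TRUE, TRUE, FALSE))

logicalcols <- names(which(sapply(mydt, is.logical)))

# Pair each logical column with its own name; TRUE becomes the column
# name and FALSE becomes the string "FALSE". The := assignment replaces
# the listed columns in mydt by reference.
mydt[, (logicalcols) := Map(function(col, nm) ifelse(col, nm, "FALSE"),
                            .SD, names(.SD)),
     .SDcols = logicalcols]

mydt
```

A for loop over logicalcols calling set(mydt, j = col, value = ifelse(mydt[[col]], col, "FALSE")) should give the same result and also updates by reference; which of the two reads better is largely a matter of taste.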