I have a data.table that contains several binary columns sharing the same set of values, which I would like to recode in one operation. I modified a function originally written for data.frames, but I am not sure whether my version really takes advantage of data.table's speed. Specifically, I suspect the function might still be copying values.
How can I make sure that the function replaces values by reference?
Here is a toy data set:
library(data.table)

# Example data:
id <- c(1,2,3,4,5)
fruit <- c("apple", "orange", "banana", "strawbery", "rasberry")
mydate <- c("2015-09-01", "2015-09-02", "2015-11-15", "2016-02-24", "2016-03-08")
eaten <- c("y", "y", "n", "y", "u")
present <- c("n", "n", "y", "y", "y")
dt <- data.table(id, fruit, mydate, eaten, present)
dt[, mydate := as.Date(mydate, format = "%Y-%m-%d")]
dt[, sex := c("m", "f", "f", "m", "f")]
# Columns to update:
bincols <- c("eaten", "present")
Before recoding, the data looks like this:
> dt
   id     fruit     mydate eaten present sex
1:  1     apple 2015-09-01     y       n   m
2:  2    orange 2015-09-02     y       n   f
3:  3    banana 2015-11-15     n       y   f
4:  4 strawbery 2016-02-24     y       y   m
5:  5  rasberry 2016-03-08     u       y   f
Here is the function:
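recode.multi <- function(datacols, oldval, newval) {
  # Replace each old value with the new value at the same position.
  # Loop over the values to recode, not over the rows of the column.
  for (i in seq_along(oldval)) {
    datacols[datacols == oldval[i]] = newval[i]
  }
  datacols
}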
... applied to the data:
dt[, (bincols) := lapply(.SD, recode.multi, oldval = c("u", "n", "y"), newval = c(NA_real_, 0, 1)), .SDcols = bincols]
... and the output. The values are updated as desired, but I am not sure whether the columns are being copied during this process:
> dt
   id     fruit     mydate eaten present sex
1:  1     apple 2015-09-01     1       0   m
2:  2    orange 2015-09-02     1       0   f
3:  3    banana 2015-11-15     0       1   f
4:  4 strawbery 2016-02-24     1       1   m
5:  5  rasberry 2016-03-08    NA       1   f
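One way to look into the copying question is data.table's address() function, which reports the memory address of an object. The following is only a minimal sketch on a hypothetical three-row table `chk` (not the data above), comparing addresses before and after a whole-column `:=` and before and after a sub-assignment with `i`:

```r
library(data.table)

# Hypothetical miniature table, used only for this check
chk <- data.table(id = 1:3, eaten = c("y", "n", "u"))

tbl_before <- address(chk)        # address of the table itself
col_before <- address(chk$eaten)  # address of the column vector

# Whole-column replacement: the right-hand side is a freshly built vector
chk[, eaten := toupper(eaten)]
same_table  <- address(chk) == tbl_before        # was the table kept?
same_column <- address(chk$eaten) == col_before  # was the column vector kept?

# Sub-assignment with i: values are written into the existing column
col_before_sub <- address(chk$eaten)
chk[eaten == "Y", eaten := "1"]
same_column_sub <- address(chk$eaten) == col_before_sub

print(c(same_table = same_table,
        same_column = same_column,
        same_column_sub = same_column_sub))
```

As far as I understand it, the table itself should not be copied in either case. A whole-column `:=` puts a newly allocated vector in place of the old column, while an assignment restricted by `i` writes into the existing vector.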
I tried changing the last '=' in the function to ':=', but got an error about checking whether 'datacols' is a data.table. Adding a clause to the function that checks is.data.table == TRUE did not solve the problem (the same error was returned).
Any thoughts on the most data.table-appropriate way to approach this function would be much appreciated.
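For reference, one possible direction is sketched below. It is an assumption about what a more data.table-style version could look like, not a confirmed best practice: it loops over the columns with set() and recodes each one with a match() lookup. The table `dt2` is a hypothetical cut-down copy of the example data, and `oldval` and `newval` are taken from the call above.

```r
library(data.table)

# Hypothetical cut-down copy of the example data
dt2 <- data.table(
  id      = 1:5,
  eaten   = c("y", "y", "n", "y", "u"),
  present = c("n", "n", "y", "y", "y")
)

bincols <- c("eaten", "present")
oldval  <- c("u", "n", "y")
newval  <- c(NA_real_, 0, 1)

for (col in bincols) {
  # match() gives the position of each value in oldval (NA if absent);
  # indexing newval by that position yields the recoded numeric vector.
  # set() then assigns the new column without copying the whole table.
  set(dt2, j = col, value = newval[match(dt2[[col]], oldval)])
}

print(dt2)
```

Because the columns change type from character to numeric here, a new vector per column has to be allocated in any case. Note also that any value not listed in `oldval` becomes NA with this lookup.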
"... '=' in the function to ':='." Which '=' were you trying to change?
recode.multi <- function(datacols, oldval, newval) {
  for (i in 1:length(datacols)) {
    datacols[datacols == oldval[i]] := newval[i]
  }
  datacols
}