data.table assignment by reference using lapply and also returning the rest of the columns [duplicate]

Question

I want to apply a function over a subset of columns and get back both the modified columns and the remaining columns that weren't touched. Is there a way to do this with data.table? I wasn't able to figure out the syntax.

In this example I have NAs and want to overwrite them with something else for a few different columns. I need a way to also return other columns that weren't touched.

library(data.table)

# make data set
a <- sample(c(letters[1:5], NA), 50, replace=TRUE)
b <- sample(c(LETTERS[1:5], NA), 50, replace=TRUE)
c <- sample(runif(50))

x <- data.table(a,b,c)

# function to apply to a single column
overwriteNA <- function(vec, new="") ifelse(is.na(vec), new, vec)

# Only returns .SDcols but would like to also include rest of columns in data.table
x[, lapply(.SD, overwriteNA), .SDcols=c("a", "b")]

# Need something along these lines (pseudocode, not valid syntax)
x[, `:=` lapply(.SD, overwriteNA), .SDcols=c("a", "b")]

David Arenburg · Accepted Answer · 2016-07-21 05:09:22Z


Try

x[,  c("a", "b") := lapply(.SD, overwriteNA), .SDcols = c("a", "b")]

Edit:

Per the OP's additional request:

myCols <- c("a", "b")  
x[, (myCols) := lapply(.SD, overwriteNA), .SDcols = myCols]
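
To see that the assignment modifies only the listed columns and leaves the rest in place, here is a minimal sketch. The small fixed table is an illustrative stand-in for the question's random data, and `overwriteNA` is copied from the question.

```r
library(data.table)

# Small deterministic table (illustrative data, not the question's random sample)
x <- data.table(a = c("a", NA, "c"),
                b = c(NA, "B", "C"),
                c = c(0.1, 0.2, 0.3))

# Same helper as in the question
overwriteNA <- function(vec, new = "") ifelse(is.na(vec), new, vec)

myCols <- c("a", "b")

# Parentheses make data.table use the value of myCols as the target column names;
# := updates x by reference, so no reassignment is needed
x[, (myCols) := lapply(.SD, overwriteNA), .SDcols = myCols]

print(x)  # columns a, b and c are all present; only a and b were modified
```

Because `:=` works by reference, `x` itself is changed. If the original table must be preserved, work on `copy(x)` instead.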

edited Jul 21, 2016 at 5:09

answered May 18, 2014 at 21:32


7 Comments

iAmAMutt Over a year ago

Thanks. Any idea how to do so with a character vector of column names? This doesn't seem to work. myCols = c("a", "b"); x[, myCols := lapply(.SD, overwriteNA), .SDcols=myCols]

skan Over a year ago

Why do you use eval()?

David Arenburg Over a year ago

@skan Because you need to tell data.table to evaluate myCols. Otherwise it will create a new column called myCols. You can also evaluate it with just parentheses, though.
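
The difference described in this comment can be sketched as follows, assuming a recent data.table version. The tables `dt` and `dt2` are hypothetical and exist only for this illustration.

```r
library(data.table)

myCols <- c("a", "b")

# Bare symbol on the left of := is treated as a literal column name
dt <- data.table(a = 1:3, b = 4:6)
dt[, myCols := 0L]
names(dt)   # should now include a new column literally named "myCols"

# Parenthesised symbol is evaluated, so its value supplies the column names
dt2 <- data.table(a = 1:3, b = 4:6)
dt2[, (myCols) := 0L]
names(dt2)  # still just "a" and "b", both overwritten with 0
```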

skan Over a year ago

Hello @DavidArenburg, what's the difference between using (myCols) and .(myCols)? I've seen the latter option, with a dot, in some examples.

David Arenburg Over a year ago

@skan, no, list isn't evaluating the expression within it, so you can't use it in this case.
