
SO #24833247 covers nearly all the use cases for passing column names dynamically to a data.table within a function. However, it misses one I'm currently trying to address: passing variables to the i expression.

I'm trying to refactor some data cleansing code into a function that converts certain values to NA after I've pulled the data into a data.table.

For example, given the following:

dt <- data.table(colA = c('A', 'b', '~', 'd', ''), colB = c('', '?', 'a1', 'a2', 'z4'))
dt[colA %in% c('~', ''), colA := NA]
dt[colB %in% c('~', ''), colB := NA]

I want a generic function that replaces the '~', '?' and '' values with NA, instead of having to explicitly code each transformation.

dt <- data.table(colA = c('A', 'b', '~', 'd', ''), colB = c('', '?', 'a1', 'a2', 'z4'))
clearCol(dt, colA)
clearCol(dt, colB)

The j expression is straightforward:

clearCol <- function(dt, f) {
  f = substitute(f)
  dt[,(f) := NA]
}
clearCol(data.table(colA = c('A', 'b', '~', 'd', '')), colA)[]
   colA
1:   NA
2:   NA
3:   NA
4:   NA
5:   NA

However, extending it to add the variable to the i expression fails:

clearCol <- function(dt, f) {
  f = substitute(f)
  dt[(f) %in% c('~', ''),(f) := NA]
}
clearCol(data.table(colA = c('A', 'b', '~', 'd', '')), colA)[]
 Error in match(x, table, nomatch = 0L) : 'match' requires vector arguments
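The error makes sense once you inspect what substitute() hands back: a name (an unevaluated symbol), not a character vector, so `(f) %in% c('~', '')` ends up calling match() on a language object. A minimal sketch (using quote(), which produces the same symbol that substitute(f) does inside the function):

```r
f <- quote(colA)
class(f)       # "name" -- an unevaluated symbol, not a vector
is.vector(f)   # FALSE, which is why match()/%in% rejects it
```

In j, data.table special-cases `(f) :=` to mean "the column named by f", but in i the symbol is used as-is, so it must be evaluated (or deparsed and looked up with get()) first.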

Swapping to this seems to work, but the lack of output with verbose = TRUE (compared to the hard-coded method at the top) leaves me concerned that it will not scale well to the large data sets I'm working with:

clearCol <- function(dt, f) {
  f = deparse(substitute(f))
  dt[get(f) %in% c('~', ''),(f) := NA]
}
clearCol(data.table(colA = c('A', 'b', '~', 'd', '')), colA)[]
   colA
1:    A
2:    b
3:   NA
4:    d
5:   NA

Is there another way of doing what I want?


1 Answer


You can follow FAQ 1.6 to get the verbose output:

cc = function(d, col, vs = c("~", ""), verb = FALSE){
  col = substitute(col)
  ix  = substitute(col %in% vs)
  d[eval(ix), as.character(col) := NA, verbose = verb ][]
}

dt <- data.table(colA = c('A', 'b', '~', 'd', ''), colB = c('', '?', 'a1', 'a2', 'z4'))
cc(dt, colA, verb = TRUE)

which gives

Creating new index 'colA'
Starting bmerge ...done in 0 secs
Detected that j uses these columns: <none> 
Assigning to 2 row subset of 5 rows
Dropping index 'colA' due to update on 'colA' (column 1)
   colA colB
1:    A     
2:    b    ?
3:   NA   a1
4:    d   a2
5:   NA   z4

However, notice what the verbose output is saying here. It's creating an index (assuming you didn't do something to create it already, which seems likely since the data was only just read in)... and then it's removing that index (since it is invalidated by the edit to the column). That hardly sounds like something that would do much to contribute to efficiency.

If you really want to do this efficiently, there are a couple options:

  1. Use na.strings when reading the data in
  2. Use set if you have a ton of columns and somehow can't do #1
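Option 2 might look something like the sketch below (clearAll is a hypothetical helper name, not from the answer above); set() skips the overhead of [.data.table, which adds up when looping over many columns:

```r
library(data.table)

# Replace the given values with NA in every character column, by reference.
clearAll <- function(d, vs = c("~", "?", "")) {
  for (col in names(d)) {
    if (is.character(d[[col]])) {
      w <- which(d[[col]] %in% vs)       # row positions to blank out
      if (length(w)) set(d, i = w, j = col, value = NA_character_)
    }
  }
  d[]
}

dt <- data.table(colA = c('A', 'b', '~', 'd', ''),
                 colB = c('', '?', 'a1', 'a2', 'z4'))
clearAll(dt)
```

Because set() modifies the table by reference, no copy of dt is made, and non-character columns are left untouched.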

3 Comments

In the actual use case, the data is coming through a JDBC database connection, so na.strings isn't an option. However, the substitute/eval combination definitely gets me what I want. Thanks
Ok. Btw, data.table, R and other things are discussed over here, mostly on weekdays: chat.stackoverflow.com/rooms/75819/gmts I don't want to spam github emails, but that's an interesting example in your last comment there. Odd that you again managed to have typos since it looks like copy-paste.
The lesson is run your code, don't just write your code
