
Here's a data.table

library(data.table)

dt <- data.table(group = c("a","a","a","b","b","b"), x = c(1,3,5,1,3,5), y = c(3,5,8,2,8,9))
dt
   group x y
1:     a 1 3
2:     a 3 5
3:     a 5 8
4:     b 1 2
5:     b 3 8
6:     b 5 9

And here's a function that operates on a data.table and returns a data.table:

myfunc <- function(dt){
  # Hyman spline interpolation (which preserves monotonicity)

  newdt <- data.table(x = seq(min(dt$x), max(dt$x)))
  newdt$y <- spline(x = dt$x, y = dt$y, xout = newdt$x, method = "hyman")$y
  return(newdt)
}

How do I apply myfunc to each subset of dt defined by the "group" column? In other words, I want an efficient, generalized way to do this:

result <- rbind(myfunc(dt[group=="a"]), myfunc(dt[group=="b"]))
result
    x     y
 1: 1 3.000
 2: 2 3.875
 3: 3 5.000
 4: 4 6.375
 5: 5 8.000
 6: 1 2.000
 7: 2 5.688
 8: 3 8.000
 9: 4 8.875
10: 5 9.000
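As a sketch of a more general version of the hand-written rbind above, assuming a data.table version recent enough to provide split() with a `by` argument and rbindlist() with `idcol`, the question's original data.table-returning myfunc can be applied to every group without naming the groups. The object names `pieces` and `result` are illustrative only.

```r
library(data.table)

dt <- data.table(group = c("a","a","a","b","b","b"),
                 x = c(1,3,5,1,3,5),
                 y = c(3,5,8,2,8,9))

# The question's function, unchanged: takes a data.table, returns a data.table
myfunc <- function(dt){
  # Hyman spline interpolation (which preserves monotonicity)
  newdt <- data.table(x = seq(min(dt$x), max(dt$x)))
  newdt$y <- spline(x = dt$x, y = dt$y, xout = newdt$x, method = "hyman")$y
  newdt
}

# One data.table per group, in order of first appearance ("a", then "b")
pieces <- split(dt, by = "group")

# Apply myfunc to each piece and stack the results;
# idcol turns the list names into a "group" column
result <- rbindlist(lapply(pieces, myfunc), idcol = "group")
result
```

This still builds a separate data.table per group, so it generalizes the rbind without addressing the efficiency concerns raised in the answer.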

EDIT: I've updated my sample dataset and myfunc because I think the original version was too simplistic and invited workarounds to the actual problem I'm trying to solve.

  • Your function creates unnecessary copies; just do dt[, .(x = seq(min(x), max(x) + 1), y = rep(y, each = 2)), by = group] Commented Mar 31, 2015 at 21:09
  • Alternatively, define your function as follows: myfunc <- function(x, y){ list(x = seq(min(x), max(x)+1), y = rep(y, each=2))} and then do dt[, myfunc(x, y), by = group] Commented Mar 31, 2015 at 21:12
  • @DavidArenburg see my edit (sorry) Commented Mar 31, 2015 at 21:18
  • @Ben, @DavidArenburg's comment still holds. Have your function return a list, not a data.table, and do dt[, myfunc(x, y), by = group]. Commented Mar 31, 2015 at 21:19
  • Actually your new function returns an error now. Commented Mar 31, 2015 at 21:24

1 Answer


The whole idea of data.table is to be both memory-efficient and fast. So we avoid using $ within the data.table scope (except in rare situations), and we avoid creating data.table objects inside a data.table call (at the time of writing, even .SD carries some overhead).
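For comparison, here is a minimal sketch of the .SD route being discouraged. It keeps the question's data.table-returning function (renamed `myfunc_dt` here purely for illustration) and passes each group's subset to it as .SD. It works, but every group call uses $ and builds a fresh data.table inside j.

```r
library(data.table)

dt <- data.table(group = c("a","a","a","b","b","b"),
                 x = c(1,3,5,1,3,5),
                 y = c(3,5,8,2,8,9))

# Hypothetical name: the question's function, taking a (sub)table
myfunc_dt <- function(d){
  newdt <- data.table(x = seq(min(d$x), max(d$x)))
  newdt$y <- spline(x = d$x, y = d$y, xout = newdt$x, method = "hyman")$y
  newdt
}

# .SD is the current group's rows (without the `by` column);
# a new data.table is allocated for every group
res_sd <- dt[, myfunc_dt(.SD), by = group]
res_sd
```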

In your case you can take advantage of data.table's non-standard evaluation capabilities and define your function as follows:

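myfunc <- function(x, y){
   # integer grid spanning the observed x range of the group
   temp = seq(min(x), max(x))
   # Hyman spline interpolation (preserves monotonicity)
   y = spline(x = x, y = y, xout = temp, method = "hyman")$y
   # return a plain list; data.table builds the result columns from it
   list(x = temp, y = y)
}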
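To see what this version hands back to data.table, it can be called on plain vectors, here the x and y values of group "a" from the sample data. This is an illustrative check, not part of the original answer; `out` is a hypothetical name.

```r
myfunc <- function(x, y){
   temp = seq(min(x), max(x))
   y = spline(x = x, y = y, xout = temp, method = "hyman")$y
   list(x = temp, y = y)
}

# A list of two equal-length vectors: no data.table is created here,
# so inside dt[, ..., by = group] data.table can stack these directly
out <- myfunc(x = c(1, 3, 5), y = c(3, 5, 8))
str(out)
```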

Then the implementation within the dt scope is straightforward:

dt[, myfunc(x, y), by = group]
#     group x      y
#  1:     a 1 3.0000
#  2:     a 2 3.8750
#  3:     a 3 5.0000
#  4:     a 4 6.3750
#  5:     a 5 8.0000
#  6:     b 1 2.0000
#  7:     b 2 5.6875
#  8:     b 3 8.0000
#  9:     b 4 8.8750
# 10:     b 5 9.0000
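As a rough sanity check (an addition, not part of the original answer), the grouped call can be compared with the question's per-group rbind approach applied to the same list-returning function; `grouped` and `manual` are illustrative names.

```r
library(data.table)

dt <- data.table(group = c("a","a","a","b","b","b"),
                 x = c(1,3,5,1,3,5),
                 y = c(3,5,8,2,8,9))

myfunc <- function(x, y){
   temp = seq(min(x), max(x))
   y = spline(x = x, y = y, xout = temp, method = "hyman")$y
   list(x = temp, y = y)
}

# Grouped call from the answer
grouped <- dt[, myfunc(x, y), by = group]

# Per-group calls stacked by hand; a list returned from j becomes a data.table
manual <- rbind(dt[group == "a", myfunc(x, y)],
                dt[group == "b", myfunc(x, y)])

# Same x and y row for row; the grouped version also keeps `group`
all.equal(grouped$x, manual$x)
all.equal(grouped$y, manual$y)
```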

1 Comment

NSE is "non-standard evaluation", eh? So Google suggests, anyway.
