I have the following data

library(data.table)
set.seed(42)
dat <- list(data.table(id=1:10, group=rep(1:2, each=5), x=rnorm(10)), 
            data.table(id=1:10, group=rep(1:2, each=5), x=rnorm(10)))

to which I would like to apply this function element by element and group by group.

subs = function(x, ..., verbose=FALSE){
  L   = substitute(list(...))[-1]
  mon = data.table(cond = as.character(L))[, skip := FALSE]

  for (i in seq_along(L)){
    d = eval( substitute(x[cond, verbose=v], list(cond = L[[i]], v = verbose)) )
    if (nrow(d)){
      x = d
    } else {
      mon[i, skip := TRUE]
    }    
  }
  #print(mon)
  return(x)
}

However, when I run this code

# fails: object 'cutoff' not found
out <- lapply(1:2, function(h){
    res <- list()
    d <- dat[[h]] 
    for(k in 1:2){
        g <- d[group==k]
        cutoff <- 1
        print(cutoff)
        res[[k]] <- subs(g, x>cutoff)
    }
    res
})

I receive the error that object cutoff cannot be found, although it is printed correctly. However, when I apply the same for-loop outside of the lapply(), it appears to work.

d1 <- dat[[1]]
s <- list()
for(k in 1:2){
    g <- d1[group==k]
    cutoff <- 1
    s[[k]] <- subs(g, x>cutoff)
}

> s
[[1]]
   id group        x
1:  1     1 1.370958

[[2]]
   id group        x
1:  7     2 1.511522
2:  9     2 2.018424

This leads me to suspect that wrapping the loop in lapply() causes the error, but I find it hard to see what the error is, let alone how to fix it.

Edit

Data with two variables:

set.seed(42)
dat <- list(data.table(id=1:10, group=rep(1:2, each=5), x=rnorm(10), y=11:20), 
            data.table(id=1:10, group=rep(1:2, each=5), x=rnorm(10), y=11:20))

with expected result

[[1]]
   id group          x   y
1:  9     2  2.0184237  19
2:  1     1  1.3709584  11
3:  2     1 -0.5646982  12
4:  3     1  0.3631284  13
5:  4     1  0.6328626  14
6:  5     1  0.4042683  15

[[2]]
   id group          x   y
1:  2     1  2.2866454  12
2: 10     2  1.3201133  20
  • I think your function subs() is not aware of the object cutoff, as you're not passing it through the arguments. substitute() returns a parse tree, but doesn't create cutoff? For example, subs = function(x, ..., verbose=FALSE, cutoff = cutoff) and res[[k]] <- subs(g, 1 > cutoff, cutoff = cutoff) would work. Edit: outside the lapply() you create it in the global environment, which subs() can access there? Commented Aug 21, 2019 at 12:08
  • Correct, subs() passes x>cutoff as is to L, not x>1. Insert browser() at the first line of subs() and rerun the code. Commented Aug 21, 2019 at 12:12
  • @Arcoutte, I don't quite understand. If possible, I would like to keep the conditions entered as flexible as possible so I can apply subs() to different situations. Are you saying I have to define variables in the definition of the function? Commented Aug 21, 2019 at 12:26
  • I checked and think you are right: It creates a global variable. I still don't understand how to do it within lapply(). Commented Aug 21, 2019 at 12:29
  • Change L in subs to L=list(...) and res[[k]] to res[[k]] <- subs(g, substitute(x>cutoff)). This works fine for one and two conditions in subs, but I don't know if it will scale to your real case. Commented Aug 21, 2019 at 13:50

1 Answer

If you use non-standard evaluation you always pay a price. Here it is a scoping issue.

It works like this:

subs = function(x, ..., verbose=FALSE){
  L   = substitute(list(...))[-1]
  mon = data.table(cond = as.character(L))[, skip := FALSE]

  for (i in seq_along(L)){
    d = eval( substitute(x[cond,, #needed to add this comma, don't know why
                           verbose=v], list(cond = L[[i]], v = verbose)))
    if (nrow(d)){
      x = d
    } else {
      mon[i, skip := TRUE]
    }    
  }
  #print(mon)
  return(x)
}

out <- lapply(1:2, function(h){
  res <- list()
  d <- dat[[h]] 
  for(k in 1:2){
    g <- d[group==k]

    cutoff <- 1
    res[[k]] <- eval(substitute(subs(g, x>cutoff), list(cutoff = cutoff)))
  }
  res
})
#works
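To illustrate the scoping issue, here is a minimal base-R sketch. It is a simplified analogue, not data.table's actual implementation: filter_nse() and local_cutoff are hypothetical names, and the assumption is that the condition is looked up first in the data, then in the frame of the function that evaluates it, and from there in the global environment. The frame of the calling function is never searched, which is why a variable created inside lapply()'s anonymous function is not found, while a global one is.

```r
# Hypothetical stand-in for subs(): captures the condition unevaluated and
# evaluates it against the columns of df, falling back to this function's
# own frame (and then the global environment), never the caller's frame.
filter_nse <- function(df, cond) {
  e <- substitute(cond)
  keep <- eval(e, df, environment())
  df[keep, , drop = FALSE]
}

df <- data.frame(x = c(0.5, 1.5, 2.5))

caller <- function() {
  local_cutoff <- 1  # exists only in caller()'s frame

  # Fails: local_cutoff is not visible from inside filter_nse()
  failed <- inherits(
    try(filter_nse(df, x > local_cutoff), silent = TRUE),
    "try-error"
  )

  # Works: the value 1 is spliced into the call before it is evaluated,
  # so filter_nse() only ever sees x > 1
  ok <- eval(substitute(
    filter_nse(df, x > local_cutoff),
    list(local_cutoff = local_cutoff)
  ))

  list(failed = failed, ok = ok)
}

res <- caller()
```

The second call mirrors the eval(substitute(...)) wrapper used in the lapply() above: the symbol is replaced by its value in the caller, so nothing has to be looked up later.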

Is there a particular reason for not using data.table's by parameter?

Edit:

In the comments you explain: "The point of subs() is to apply multiple conditions (if multiple are passed to it) unless one would result in an empty subset."

I would use a different approach then:

subs = function(x, ..., verbose=FALSE){
  L   = substitute(list(...))[-1]

  for (i in seq_along(L)){
    d = eval( substitute(x[cond, , verbose=v], list(cond = L[[i]], v = verbose)))
    x <- rbind(d, x[!d, on = "group"]) 
  }

  return(x)
}

out <- lapply(dat, function(d){

  cutoff <- 2 #to get empty groups

  eval(substitute(subs(d, x>cutoff), list(cutoff = cutoff)))

})

#[[1]]
#   id group          x
#1:  9     2  2.0184237
#2:  1     1  1.3709584
#3:  2     1 -0.5646982
#4:  3     1  0.3631284
#5:  4     1  0.6328626
#6:  5     1  0.4042683
#
#[[2]]
#   id group          x
#1:  2     1  2.2866454
#2:  6     2  0.6359504
#3:  7     2 -0.2842529
#4:  8     2 -2.6564554
#5:  9     2 -2.4404669
#6: 10     2  1.3201133

Beware that this does not retain the ordering.

Another option that retains the ordering:

subs = function(x, ..., verbose=FALSE){
  L   = substitute(list(...))[-1]

  for (i in seq_along(L)){
    x = eval( substitute(x[, {
      res <- .SD[cond];
      if (nrow(res) > 0) res else .SD 
    }, by = "group", verbose=v], list(cond = L[[i]], v = verbose)))
  }

  return(x)
}

The by variable could be passed as a function parameter and then substituted in together with the condition.

I haven't done benchmarks comparing the efficiency of these two.
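Passing the by variable as a function parameter, as mentioned above, could look roughly like the following sketch. The name subs_by() and the example data dt are made up for illustration, and the grouping column is assumed to be given as a character string.

```r
library(data.table)

# Sketch: hypothetical variant of subs() that takes the grouping column
# as a character string and substitutes it into the by argument.
subs_by <- function(x, by, ..., verbose = FALSE) {
  L <- substitute(list(...))[-1]

  for (i in seq_along(L)) {
    x <- eval(substitute(
      x[, {
        res <- .SD[cond]
        # keep the whole group if the condition would empty it
        if (nrow(res) > 0) res else .SD
      }, by = b, verbose = v],
      list(cond = L[[i]], b = by, v = verbose)
    ))
  }

  x
}

# Small deterministic example: group 2 has no rows with x > 1
dt <- data.table(id = 1:6,
                 group = rep(1:2, each = 3),
                 x = c(0.5, 1.5, 2.5, 0.1, 0.2, 0.3))

out <- subs_by(dt, "group", x > 1)
```

Here group 1 is reduced to the rows satisfying the condition, while group 2 is returned unchanged because the condition would leave it empty. Note that the grouping column is moved to the front of the result.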

21 Comments

Do you have a suggestion how to use [..., by=group] together with subs() ? I am all ears! Also: Is the drop=FALSE necessary? I thought I read data.table made this obsolete.
Sorry, artifacts from testing.
Can you change subs? Then just pass a value that is then passed on to by.
Background: The point of subs() is to apply multiple conditions (if multiple are passed to it) unless one would result in an empty subset. It was Frank's solution to an earlier question of mine. As long as it retains this feature, it can be changed. I would welcome this even, since the for-loop over groups is not ideal.
Excellent! To improve on your edit, I wonder whether subs = function(x, by, ..., verbose=FALSE){... grouping = as.character(by) ... by = as.character(grouping) ...}, where by picks up the grouping variable would work. Does this make sense?