variables not recognized in for-loop nested within lapply

Question

I have the following data

library(data.table)

set.seed(42)
dat <- list(data.table(id=1:10, group=rep(1:2, each=5), x=rnorm(10)),
            data.table(id=1:10, group=rep(1:2, each=5), x=rnorm(10)))

to which I would like to apply this function element by element and group by group.

subs = function(x, ..., verbose=FALSE){
  L   = substitute(list(...))[-1]
  mon = data.table(cond = as.character(L))[, skip := FALSE]

  for (i in seq_along(L)){
    d = eval( substitute(x[cond, verbose=v], list(cond = L[[i]], v = verbose)) )
    if (nrow(d)){
      x = d
    } else {
      mon[i, skip := TRUE]
    }    
  }
  #print(mon)
  return(x)
}

However, when I run this code

# fails inside lapply(): object 'cutoff' not found
out <- lapply(1:2, function(h){
    res <- list()
    d <- dat[[h]] 
    for(k in 1:2){
        g <- d[group==k]
        cutoff <- 1
        print(cutoff)
        res[[k]] <- subs(g, x>cutoff)
    }
    res
})

I receive the error that object cutoff cannot be found, although it is printed correctly. However, when I apply the same for-loop outside of the lapply(), it appears to work.

d1 <- dat[[1]]
s <- list()
for(k in 1:2){
    g <- d1[group==k]
    cutoff <- 1
    s[[k]] <- subs(g, x>cutoff)
}

> s
[[1]]
   id group        x
1:  1     1 1.370958

[[2]]
   id group        x
1:  7     2 1.511522
2:  9     2 2.018424

This leads me to suspect that it is the nesting inside lapply() that causes the error, but I find it hard to see what the error is, let alone how to fix it.

Edit

Data with two variables:

set.seed(42)
dat <- list(data.table(id=1:10, group=rep(1:2, each=5), x=rnorm(10), y=11:20), 
            data.table(id=1:10, group=rep(1:2, each=5), x=rnorm(10), y=11:20))

with expected result

[[1]]
   id group          x   y
1:  9     2  2.0184237  19
2:  1     1  1.3709584  11
3:  2     1 -0.5646982  12
4:  3     1  0.3631284  13
5:  4     1  0.6328626  14
6:  5     1  0.4042683  15

[[2]]
   id group          x   y
1:  2     1  2.2866454  12
2: 10     2  1.3201133  20

I think your function subs() is not aware of the object cutoff, as you're not passing it through the arguments. substitute() returns the parse tree but doesn't create cutoff? subs = function(x, ..., verbose=FALSE, cutoff = cutoff) and res[[k]] <- subs(g, 1 > cutoff, cutoff = cutoff) would, for example, work. Edit: outside the lapply() you create it in the global environment, which subs() can access there?
– Arcoutte, Commented Aug 21, 2019 at 12:08
Correct, subs() passes x>cutoff as is to L, not x>1. Insert browser() at the first line of subs() and rerun the code.
– A. Suliman, Commented Aug 21, 2019 at 12:12
@Arcoutte, I don't quite understand. If possible, I would like to keep the conditions entered as flexible as possible so I can apply subs() to different situations. Are you saying I have to define variables in the definition of the function?
– bumblebee, Commented Aug 21, 2019 at 12:26
I checked and think you are right: It creates a global variable. I still don't understand how to do it within lapply().
– bumblebee, Commented Aug 21, 2019 at 12:29
Change L in subs to L=list(...) and res[[k]] to res[[k]] <- subs(g, substitute(x>cutoff)). Works fine for one and two conditions in subs, but I don't know if it will scale to your real case.
– A. Suliman, Commented Aug 21, 2019 at 13:50

Roland · Accepted Answer · 2019-09-02 13:15:34Z

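If you use non-standard evaluation you always pay a price. Here it is a scoping issue: subs() evaluates the captured condition in its own frame, whose enclosing environment is the global environment. A cutoff defined at top level is therefore found, while a cutoff that is local to the function passed to lapply() is not.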

It works like this:

subs = function(x, ..., verbose=FALSE){
  L   = substitute(list(...))[-1]
  mon = data.table(cond = as.character(L))[, skip := FALSE]

  for (i in seq_along(L)){
    d = eval( substitute(x[cond,, #needed to add this comma, don't know why
                           verbose=v], list(cond = L[[i]], v = verbose)))
    if (nrow(d)){
      x = d
    } else {
      mon[i, skip := TRUE]
    }    
  }
  #print(mon)
  return(x)
}

out <- lapply(1:2, function(h){
  res <- list()
  d <- dat[[h]] 
  for(k in 1:2){
    g <- d[group==k]

    cutoff <- 1
    res[[k]] <- eval(substitute(subs(g, x>cutoff), list(cutoff = cutoff)))
  }
  res
})
#works
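
The scoping behaviour can be reproduced without data.table. The following is a minimal sketch and not part of the original answer: capture_and_eval() and local_cutoff are hypothetical stand-ins for subs() and cutoff, and the sketch assumes no global object named local_cutoff exists.

```r
# Hypothetical stand-in for subs(): captures its argument unevaluated and
# evaluates it in its own frame, whose enclosure is the global environment.
capture_and_eval <- function(expr) {
  e <- substitute(expr)
  eval(e)
}

caller <- function() {
  local_cutoff <- 1  # lives only in caller()'s frame

  # Direct call: local_cutoff is looked up starting from capture_and_eval()'s
  # frame, where it does not exist, so this raises an error.
  direct <- try(capture_and_eval(local_cutoff + 1), silent = TRUE)

  # Workaround used above: substitute the value into the call first, so the
  # captured expression is 1 + 1 and no variable lookup is needed.
  injected <- eval(substitute(capture_and_eval(local_cutoff + 1),
                              list(local_cutoff = local_cutoff)))

  list(direct_failed = inherits(direct, "try-error"), injected = injected)
}

res <- caller()
```

Here res$direct_failed records that the direct call could not find the local variable, while res$injected holds the result of the call with the value substituted in beforehand.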

Is there a particular reason for not using data.table's by parameter?

Edit:

Background: The point of subs() is to apply multiple conditions (if multiple are passed to it) unless one would result in an empty subset.

I would use a different approach then:

subs = function(x, ..., verbose=FALSE){
  L   = substitute(list(...))[-1]

  for (i in seq_along(L)){
    d = eval( substitute(x[cond, , verbose=v], list(cond = L[[i]], v = verbose)))
    x <- rbind(d, x[!d, on = "group"]) 
  }

  return(x)
}

out <- lapply(dat, function(d){

  cutoff <- 2 #to get empty groups

  eval(substitute(subs(d, x>cutoff), list(cutoff = cutoff)))

})

#[[1]]
#   id group          x
#1:  9     2  2.0184237
#2:  1     1  1.3709584
#3:  2     1 -0.5646982
#4:  3     1  0.3631284
#5:  4     1  0.6328626
#6:  5     1  0.4042683
#
#[[2]]
#   id group          x
#1:  2     1  2.2866454
#2:  6     2  0.6359504
#3:  7     2 -0.2842529
#4:  8     2 -2.6564554
#5:  9     2 -2.4404669
#6: 10     2  1.3201133

Beware that this does not retain the ordering.

Another option that retains the ordering:

subs = function(x, ..., verbose=FALSE){
  L   = substitute(list(...))[-1]

  for (i in seq_along(L)){
    x = eval( substitute(x[, {
      res <- .SD[cond];
      if (nrow(res) > 0) res else .SD 
    }, by = "group", verbose=v], list(cond = L[[i]], v = verbose)))
  }

  return(x)
}

The by variable could be passed as a function parameter and then substituted in together with the condition.
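
A possible sketch of that idea, assuming the grouping column is passed as a character string. The name subs_by, the argument order, and the toy data are assumptions made for illustration; they are not taken from the answer.

```r
library(data.table)

# Hypothetical variant of subs(): `by` is a character vector of grouping
# columns that is substituted into the call together with each condition.
subs_by <- function(x, by, ..., verbose = FALSE) {
  L <- substitute(list(...))[-1]

  for (i in seq_along(L)) {
    x <- eval(substitute(
      x[, {
        res <- .SD[cond]
        # keep the filtered rows unless the group would become empty
        if (nrow(res) > 0) res else .SD
      }, by = b, verbose = v],
      list(cond = L[[i]], b = by, v = verbose)
    ))
  }

  x
}

# Toy data (an assumption): group 1 has rows with x > 1, group 2 has none.
dt <- data.table(id = 1:6,
                 group = rep(1:2, each = 3),
                 x = c(0.5, 1.5, 2.5, 0.1, 0.2, 0.3))

out_by <- subs_by(dt, "group", x > 1)
```

With this toy data, group 1 is reduced to the rows where x > 1, and group 2 is returned unchanged because the condition would otherwise empty it. When the condition refers to a variable that is local to the caller, the same eval(substitute(...)) wrapping shown earlier is still needed.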

I haven't done benchmarks comparing the efficiency of these two.

edited Sep 2, 2019 at 13:15

answered Aug 21, 2019 at 13:13


21 Comments

bumblebee Over a year ago

Do you have a suggestion how to use [..., by=group] together with subs() ? I am all ears! Also: Is the drop=FALSE necessary? I thought I read data.table made this obsolete.

Roland Over a year ago

Sorry, artifacts from testing.

Roland Over a year ago

Can you change subs? Then just pass a value that is then passed on to by.

bumblebee Over a year ago

Background: The point of subs() is to apply multiple conditions (if multiple are passed to it) unless one would result in an empty subset. It was Frank's solution to an earlier question of mine. As long as it retains this feature, it can be changed. I would welcome this even, since the for-loop over groups is not ideal.

bumblebee Over a year ago

Excellent! To improve on your edit, I wonder whether subs = function(x, by, ..., verbose=FALSE){... grouping = as.character(by) ... by = as.character(grouping) ...}, where by picks up the grouping variable would work. Does this make sense?
