I'm trying to pass two separate lists of variable names into a data.table (v1.9.4). It returns the correct columns, but it strips the variable names. This works as expected:

dt <- data.table(a=1:3, b=4:6, c=7:9, d=10:12)
dt
   a b c  d
1: 1 4 7 10
2: 2 5 8 11
3: 3 6 9 12

It also works fine to pass a single list of names:

dt[,list(a,b)]
   a b
1: 1 4
2: 2 5
3: 3 6

But when I need to pass multiple lists, it returns the correct columns but strips the variable names:

dt[,c(list(a,b), list(c,d))]
   V1 V2 V3 V4
1:  1  4  7 10
2:  2  5  8 11
3:  3  6  9 12

Why two lists? I'm using multiple quote()'d lists of variables. I've read FAQ question 1.6, and I know that one workaround is to use a character vector using with=FALSE. But my real use case involves passing a mix of names and expressions to a function, e.g.,

varnames <- quote(list(a,b))
expr <- quote(list(a*b, c+d))
myFunc <- function(dt, varnames, expr) {
  dt[,c(varnames, expr)]
}

And I'd like the "varnames" columns to keep their proper names, as they do when you pass a single list:

dt[,list(a,b,a*b,c+d)]
   a b V3 V4
1: 1 4  4 17
2: 2 5 10 19
3: 3 6 18 21

How can I combine multiple lists in a data.table such that it still returns the proper column names? (I'm not completely sure if this is a data.table issue or if I'm just doing something silly in the way I'm trying to combine lists in R, but c() seems to do what I want.)

  • It works with named lists, though I'm not sure whether that would be acceptable for your purposes: dt[,c(list(a = a, b = b), list(c = c,d = d))] Commented May 5, 2015 at 19:59
  • Interesting--thanks! That certainly qualifies as a workaround, and it's better than the FAQ 1.6 workaround, since that will also work with expressions (not just variable names). But I'd still like to know if there's a way to combine unnamed lists so it works with the regular (clean!) data.table syntax. (Or if I'm doing something stupid with the way I'm combining quoted lists.) Commented May 5, 2015 at 20:12
  • @cauchy You're not doing anything wrong. data.table is doing you a favor in the simple list case by converting the unnamed list into a named one, but it can't do that in more complicated cases because the intention is not clear. Commented May 5, 2015 at 20:15
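To illustrate the comment above, here is a minimal base R sketch (no data.table involved) of why c() has nothing to work with: an unnamed list() call stores only the values, not the symbols used to build it. The variables a and b are plain vectors standing in for the columns. As the comment explains, the names seen in the simple dt[,list(a,b)] case are filled in by data.table itself.

```r
# Base R only: an unnamed list() keeps values, not the symbols used to build it.
a <- 1:3
b <- 4:6

unnamed <- c(list(a, b), list(a * b))
names(unnamed)   # NULL: there are no names for c() to carry over

named <- c(list(a = a, b = b), list(a * b))
names(named)     # "a" "b" "": names survive only where they were given explicitly
```

This is why the named-list form in the first comment keeps its column names while the unnamed form falls back to V1, V2, and so on.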

2 Answers

Another option is to construct the full call ahead of time:

varnames[4:5] <- expr[2:3]  # this results in `list(a, b, a * b, c + d)`
dt[, eval(varnames)]

produces:

   a b V3 V4
1: 1 4  4 17
2: 2 5 10 19
3: 3 6 18 21

More generally, suppose you have a list of quoted lists of expressions:

exprlist <- list(quote(list(a, b)), quote(list(c, c %% a)), quote(list(a + b)))
expr <-  as.call(Reduce(function(x, y) c(as.list(x), as.list(y)[-1]), exprlist))  # @eddi
dt[, eval(expr)]
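The Reduce approach above could be wrapped in a small helper so that the data.table call stays short. This is only a sketch: combine_quoted and select_cols are hypothetical names, not part of data.table, and it assumes every argument is a quoted list(...) call.

```r
library(data.table)

# Hypothetical helper: splice several quoted list(...) calls into one list(...) call.
combine_quoted <- function(...) {
  exprlist <- list(...)
  # Keep the first call whole; drop the leading `list` symbol from each later one.
  as.call(Reduce(function(x, y) c(as.list(x), as.list(y)[-1]), exprlist))
}

# Hypothetical wrapper: build the combined call first, then evaluate it in j.
select_cols <- function(dt, varnames, expr) {
  jexpr <- combine_quoted(varnames, expr)
  dt[, eval(jexpr)]
}

dt <- data.table(a = 1:3, b = 4:6, c = 7:9, d = 10:12)

combined <- combine_quoted(quote(list(a, b)), quote(list(a * b, c + d)))
combined   # list(a, b, a * b, c + d)

res <- select_cols(dt, quote(list(a, b)), quote(list(a * b, c + d)))
names(res)
```

Building the call into a variable before the dt[...] call mirrors the dt[, eval(varnames)] pattern above, so the bare variable names should keep their column names in the same way.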

5 Comments

Cool! I didn't realize that could be done. Also, never seen a case where that sort of assignment works, but c(varnames,expr[-1]) does not. exprs are weird.
pretty cool - I suggest simplifying to: as.call(Reduce(function(x, y) c(as.list(x), as.list(y)[-1]), exprlist))
@eddi, great, I was trying to do something of the sort but couldn't quite get it to work.
@Frank, yes that is because there isn't a c method for calls that does what @eddi's suggestion does. Note however that you can actually c two expressions as produced by expression().
Thanks! I learned a lot from this one! I still wish there were a cleaner way to do this, but this lets me write a simple helper function that combines lists and doesn't pollute the data.table[ call too badly.

Here's a possible workaround using .SD:

varnames <- quote(list(a,b))
expr <- quote(list(a*b, c+d))

myFunc <- function(dt, varnames, expr) {
  dt[, c(.SD[, eval(varnames)], eval(expr))]
}

myFunc(dt, varnames, expr)

#    a b V1 V2
# 1: 1 4  4 17
# 2: 2 5 10 19
# 3: 3 6 18 21

1 Comment

Wow, I'm surprised to see that it works in this case. I guess .SD returns columns with names attached to them, so it works for the same reason as the named list suggested by docendo discimus above. The main downside for me is that I have tens of thousands of columns and only need a small number of variables, and .SD is really inefficient when you're not using most of the columns.
