
Sometimes, I have a keyed data.table which I'd like to subset according to its key and an unkeyed column. What's the simplest/fastest way to do this?

What feels most natural results in an error:

dt <- data.table(id = 1:100, var = rnorm(100), key = "id")
dt[.(seq(1, 100, 2)) & var > 0, ]

The next cleanest thing is to chain:

dt[.(seq(1, 100, 2))][var > 0, ]

And of course we can ditch using binary search at all (I think this is clearly to be avoided):

dt[id %in% seq(1, 100, 2) & var > 0, ]
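
A quick way to confirm that the chained form and the `%in%` form select the same rows is to compare them on a seeded table. This is only a minimal sketch: the seed and the names `ids`, `chained`, and `scanned` are illustrative and not part of the original post.

```r
library(data.table)

set.seed(1)  # arbitrary seed so the example is reproducible
dt <- data.table(id = 1:100, var = rnorm(100), key = "id")
ids <- seq(1L, 99L, by = 2L)  # odd ids, same values as seq(1, 100, 2)

# Keyed subset on id first, then an ordinary filter on var
chained <- dt[.(ids)][var > 0]

# The same selection written as one logical expression in i
scanned <- dt[id %in% ids & var > 0]

# Both should contain the same rows in the same order
identical(chained$id, scanned$id)
identical(chained$var, scanned$var)
```

Which form is faster depends on table size and data.table version, so it is worth timing both on realistic data before settling on one.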

Is there an approach I'm missing? Also, any particular reason why the first is an error? The syntax seems clear enough to me.

  • I'm betting on the "clean" chain. If your second condition is an inequality, I doubt the current system of indexing can help. There is "auto indexing" on equality conditions now, but I'm not sure about the details; it's mentioned in the news: github.com/Rdatatable/data.table. If you need to do a by=.EACHI with your subset, you'll have to switch the chain around, I guess: dt[var>2][.(seq(1,100,2)),...do stuff...,by=.EACHI] Commented May 14, 2015 at 1:09
  • see comments here Commented May 14, 2015 at 3:51
  • so it seems like the answer really depends on what I want to do in j, is that safe to say? Commented May 14, 2015 at 4:26

1 Answer

As of this writing, the native way to express:

dt[.(seq(1, 100, 2)) & var > 0, j] #some expression j

is the following:

dt[.(seq(1, 100, 2)), .SD[var > 0, j]]

The more I work with data.table, the more natural this is, but it still looks a bit unintuitive. C'est la vie.
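
To check that the `.SD` form matches the chained form, here is a small sketch with a concrete j. The choice of `mean(var)` for j, the seed, and the names `ids`, `via_sd`, and `via_chain` are assumptions made purely for illustration.

```r
library(data.table)

set.seed(1)  # arbitrary seed so the example is reproducible
dt <- data.table(id = 1:100, var = rnorm(100), key = "id")
ids <- seq(1L, 99L, by = 2L)  # odd ids, same values as seq(1, 100, 2)

# Keyed subset in i; inside j, .SD holds the rows matched by i, which are
# then filtered on var before the inner j (here mean(var)) is evaluated
via_sd <- dt[.(ids), .SD[var > 0, mean(var)]]

# Chained form: keyed subset first, then filter and compute
via_chain <- dt[.(ids)][var > 0, mean(var)]

all.equal(via_sd, via_chain)
```

Both forms do the keyed lookup first and only then filter on `var`, so the choice between them is mostly a matter of readability.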
