
Here is an example:

DT = data.table(x=1:4, y=6:9, z=3:6)
setkey(DT, x, y)

Each join column has multiple values to match:

xc = c(1, 2, 4)
yc = c(6, 9)
DT[J(xc, yc), nomatch=0]
   x y z
1: 1 6 3

This use of J() returns only a single row. What I actually want is a join that behaves like the %in% operator:

DT[x %in% xc & y %in% yc]
   x y z
1: 1 6 3
2: 4 9 6

But using the %in% operator turns the search into a vector scan, which is very slow compared to binary search. To get binary search, I build every possible combination of the join values:

xc2 = rep(xc, length(yc))
yc2 = unlist(lapply(yc, rep, length(xc)))
DT[J(xc2, yc2), nomatch=0]
   x y z
1: 1 6 3
2: 4 9 6

But building xc2 and yc2 this way makes the code difficult to read. Is there a better way to get the speed of binary search and the simplicity of the %in% operator in this case?

I think you're looking for a cross join, which is what the function CJ does. Try DT[CJ(xc,yc), nomatch=0L]. Feel free to post this as the answer (if correct) and accept it. Commented Sep 1, 2014 at 15:46

1 Answer

Answering so that this question no longer shows up among the open questions for the DT tag.
The code from Arun's comment, DT[CJ(xc,yc), nomatch=0L], will do the job.
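
For illustration, here is a minimal sketch of that suggestion applied to the question's own DT, xc and yc. The names combos and res are introduced here only for the example. CJ() builds every combination of its arguments, so it stands in for the hand-built xc2 and yc2 while the lookup remains a keyed join.

```r
library(data.table)

# Data and key exactly as in the question
DT = data.table(x = 1:4, y = 6:9, z = 3:6)
setkey(DT, x, y)

xc = c(1, 2, 4)
yc = c(6, 9)

# CJ() (cross join) returns a data.table with one row per combination
# of xc and yc: 3 * 2 = 6 rows here. 'combos' is an illustrative name.
combos = CJ(xc, yc)

# Keyed join of the combinations against DT; nomatch = 0L drops
# combinations that have no matching row in DT.
res = DT[combos, nomatch = 0L]
print(res)
```

On this data the join should keep the same two rows as the %in% filter, (1, 6, 3) and (4, 9, 6).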
