I have two huge data.tables (dt1 and dt2) that are almost identical except for one column. I want to join the tables on the other p-1 columns, where p <- ncol(dt1). Should I setkey() on those p-1 columns and join using dt1[dt2]? If so, how can I pass the column names to setkey(), since I can't pass a quoted string as an argument?

Here is some simulated data:

dt1 <- data.table(matrix(rnorm(260), 10, 26))
setnames(dt1, letters)
dt2 <- copy(dt1)
dt2[,z:=rnorm(10)]

## Sections below won't run
setkey(dt1, get(letters[-which(letters=="z")]))
setkey(dt2, get(letters[-which(letters=="z")]))
dt1[dt2]

2 Answers

Use setkeyv, which takes the key columns as a character vector:

setkeyv(dt1, letters[-which(letters=="z")])
setkeyv(dt2, letters[-which(letters=="z")])
dt1[dt2]
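
As a variation on the above, here is a minimal sketch that skips setting keys and passes the same character vector to the `on=` argument of `[.data.table` instead. It assumes a data.table version that supports `on=` (1.9.6 or later); the names `key_cols` and `joined` and the `set.seed()` call are only for illustration and are not part of the original question.

```r
library(data.table)

# Rebuild the simulated data from the question (seed added for reproducibility)
set.seed(1)
dt1 <- data.table(matrix(rnorm(260), 10, 26))
setnames(dt1, letters)
dt2 <- copy(dt1)
dt2[, z := rnorm(10)]

# Every column except the one that differs
key_cols <- setdiff(names(dt1), "z")

# Ad hoc join on those columns; neither table is re-sorted or keyed
joined <- dt1[dt2, on = key_cols]

# dt1's z keeps its name, dt2's z comes back as i.z
names(joined)[26:27]
```

Because no key is set, the row order of dt1 and dt2 is left untouched, which may matter if other code relies on it.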

If you know the name of the differing column, this works:

merge(dt1,dt2,names(dt1)[-grep("z",names(dt1))])

It also preserves both versions of the differing column, as z.x and z.y in the merged result.
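
A slightly more defensive sketch of the same idea, assuming the differing column is literally named "z": `setdiff()` matches the name exactly, whereas `grep("z", ...)` would also drop any other column whose name merely contains a "z". The names `by_cols` and `merged` are illustrative.

```r
library(data.table)

# Simulated data as in the question (seed added for reproducibility)
set.seed(1)
dt1 <- data.table(matrix(rnorm(260), 10, 26))
setnames(dt1, letters)
dt2 <- copy(dt1)
dt2[, z := rnorm(10)]

# Exact-name exclusion of the differing column
by_cols <- setdiff(names(dt1), "z")

# merge() dispatches to the data.table method; clashing columns get .x/.y suffixes
merged <- merge(dt1, dt2, by = by_cols)

# Check that both versions of z survived
c("z.x", "z.y") %in% names(merged)
```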
