R data.table merge tables grouping by multiple columns

Question

I have two huge data tables (dt1 and dt2) that are identical except for one column. I want to join them on the other p-1 columns, where p <- ncol(dt1). Should I call setkey() on those p-1 columns and then join with dt1[dt2]? If so, how do I pass the column names to setkey(), given that I can't pass quoted strings as arguments?

Here is some simulated data:

library(data.table)

dt1 <- data.table(matrix(rnorm(260), 10, 26))
setnames(dt1, letters)
dt2 <- copy(dt1)
dt2[,z:=rnorm(10)]

## The lines below don't run
setkey(dt1, get(letters[-which(letters=="z")]))
setkey(dt2, get(letters[-which(letters=="z")]))
dt1[dt2]

eddi · Accepted Answer · 2014-07-23 15:09:41Z

2

Use setkeyv, which takes the key columns as a character vector:

setkeyv(dt1, letters[-which(letters=="z")])
setkeyv(dt2, letters[-which(letters=="z")])
dt1[dt2]
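
As a minimal sketch of the approach above, the following rebuilds the question's simulated data and runs the keyed join end to end. The names key_cols, keyed and adhoc are illustrative and not from the original post, and set.seed() is added only for reproducibility. The last step shows an alternative that is an assumption about your setup: the on= argument, available in data.table 1.9.6 and later, joins on the same columns without setting keys.

```r
library(data.table)

set.seed(1)  # only for reproducibility of the simulated data
dt1 <- data.table(matrix(rnorm(260), 10, 26))
setnames(dt1, letters)
dt2 <- copy(dt1)
dt2[, z := rnorm(10)]

# Every column except "z" (illustrative helper name)
key_cols <- setdiff(names(dt1), "z")

# setkeyv() accepts the key columns as a character vector
setkeyv(dt1, key_cols)
setkeyv(dt2, key_cols)
keyed <- dt1[dt2]

# Alternative without keys: an ad hoc join via on=
adhoc <- dt1[dt2, on = key_cols]

# dt1's z stays as "z"; dt2's z appears as "i.z"
names(keyed)
```

In the joined result, the z column from dt1 keeps its name and the one from dt2 is prefixed with "i.".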


bjoseph · Answer · 2014-07-23 15:04:56Z

0

If you know the name of the column that differs, this works:

merge(dt1, dt2, by = setdiff(names(dt1), "z"))

The result also keeps both versions of the differing column, as z.x (from dt1) and z.y (from dt2).
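
A short sketch of the merge() route on the question's simulated data, showing where the two z columns end up. The names by_cols, merged and merged_named are illustrative, and the custom suffixes are just an example of the suffixes argument of merge(), not something the original answer used.

```r
library(data.table)

set.seed(1)  # only for reproducibility of the simulated data
dt1 <- data.table(matrix(rnorm(260), 10, 26))
setnames(dt1, letters)
dt2 <- copy(dt1)
dt2[, z := rnorm(10)]

# Join on every column except "z" (illustrative helper name)
by_cols <- setdiff(names(dt1), "z")

# Default suffixes: the differing column appears as z.x and z.y
merged <- merge(dt1, dt2, by = by_cols)

# Custom suffixes make the origin of each column explicit
merged_named <- merge(dt1, dt2, by = by_cols, suffixes = c(".dt1", ".dt2"))

names(merged_named)
```

Using setdiff() matches the column name exactly, whereas a grep() on "z" would also drop any other column whose name merely contains that letter.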
