
I have two data.tables, dt and dt1:

library(data.table)
dt <- data.table(id = c(rep(2, 3), rep(4, 2)), year = c(2005:2007, 2005:2006), event = c(1, 0, 0, 0, 1))
dt1 <- data.table(id = rep(2, 5), year = 2005:2009, performance = 1000:1004)

dt

   id year event
1:  2 2005     1
2:  2 2006     0
3:  2 2007     0
4:  4 2005     0
5:  4 2006     1

dt1

   id year performance
1:  2 2005        1000
2:  2 2006        1001
3:  2 2007        1002
4:  2 2008        1003
5:  2 2009        1004

I would like to subset dt, keeping only the rows whose combination of the first and second columns (id and year) also appears in dt1. The result should be stored in a new object, without overwriting dt. This is what I'd like to obtain:

   id year event
1:  2 2005     1
2:  2 2006     0
3:  2 2007     0

I tried to do this using the following code:

dt.sub<-dt[dt[,c(1:2)] %in% dt1[,c(1:2)],]

but it didn't work: I got back a data.table identical to dt. I think there are at least two mistakes in my code. The first is that I am probably selecting the columns of the data.table the wrong way. The second, and fairly evident, is that %in% works on vectors, not on multi-column objects. Nevertheless, I can't find a better way to do it...
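Following the reasoning above (that %in% compares plain vectors), one way to make the original idea work is to collapse the two columns into a single key vector per table before comparing. This is a minimal sketch, not taken from either answer; the "_" separator and the names key_dt and key_dt1 are illustrative assumptions, and the approach assumes the separator never occurs inside the id or year values.

```r
library(data.table)

dt  <- data.table(id = c(rep(2, 3), rep(4, 2)), year = c(2005:2007, 2005:2006), event = c(1, 0, 0, 0, 1))
dt1 <- data.table(id = rep(2, 5), year = 2005:2009, performance = 1000:1004)

# Collapse the two matching columns into one character vector per table
# (e.g. "2_2005"), so that %in% can compare them element by element.
key_dt  <- paste(dt$id,  dt$year,  sep = "_")
key_dt1 <- paste(dt1$id, dt1$year, sep = "_")

# Logical row filter; the result is a new object and dt is left untouched.
dt.sub <- dt[key_dt %in% key_dt1]
print(dt.sub)
```

Because the filter is an ordinary logical vector, dt keeps its original row order and no key is set on either table.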

Thank you in advance for your help!

2 Answers

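library(data.table)
setkeyv(dt, c('id', 'year'))    # sorts dt by reference and sets (id, year) as its key
setkeyv(dt1, c('id', 'year'))   # same for dt1
dt[dt1, nomatch = 0]            # keyed join; nomatch = 0 drops dt1 rows with no match in dt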

Output:

> dt[dt1,nomatch=0]
   id year event performance
1:  2 2005     1        1000
2:  2 2006     0        1001
3:  2 2007     0        1002
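Note that setkeyv() reorders dt by reference and sets its key, and the join above also carries over dt1's performance column. A sketch of an alternative, assuming a data.table version that supports the on= argument (1.9.6 or later): join against only the unique id/year pairs of dt1, so dt stays unkeyed and the result keeps just dt's columns. The unique() call is an added assumption to avoid duplicated rows if dt1 ever repeats a pair.

```r
library(data.table)

dt  <- data.table(id = c(rep(2, 3), rep(4, 2)), year = c(2005:2007, 2005:2006), event = c(1, 0, 0, 0, 1))
dt1 <- data.table(id = rep(2, 5), year = 2005:2009, performance = 1000:1004)

# Ad-hoc join via on=: no setkey() call, so dt keeps its row order and has no key.
# Joining against only the (id, year) columns of dt1 means no dt1 columns are added.
dt.sub <- dt[unique(dt1[, .(id, year)]), on = c("id", "year"), nomatch = 0L]
print(dt.sub)
```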

9 Comments

Many thanks! This one is probably faster on very large data.tables.
If you don't want the performance column, then doing dt[dt1, list(event), nomatch=0L] should be a tad faster...
data.table supplies its own merge method, which works along these lines. I would expect the speed to be similar.
?merge already mentions that X[Y ...] is faster than merge; however, merge is not that slow (especially for the task in this question).
@Riccardo, no, as long as you don't use :=, there's no update by reference happening. You can just try it out yourself. Do ans <- dt[dt1, list(event), nomatch=0L] and check ans, dt and dt1.
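The "try it out yourself" check from the comment above can be sketched as follows: take a copy() of dt, run the join without :=, and compare; then contrast with an update join that does use := and therefore modifies its target in place. The names before, ans and upd are illustrative only.

```r
library(data.table)

dt  <- data.table(id = c(rep(2, 3), rep(4, 2)), year = c(2005:2007, 2005:2006), event = c(1, 0, 0, 0, 1))
dt1 <- data.table(id = rep(2, 5), year = 2005:2009, performance = 1000:1004)
setkeyv(dt, c("id", "year"))
setkeyv(dt1, c("id", "year"))

# Snapshot of dt taken before the join, for comparison.
before <- copy(dt)

# Join without := : a new object is returned and dt is not modified.
ans <- dt[dt1, list(event), nomatch = 0L]
print(ans)
print(isTRUE(all.equal(dt, before)))

# Join with := : the target table (here a copy of dt) is updated by reference.
upd <- copy(dt)
upd[dt1, performance := i.performance]
print(names(upd))
```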

Use merge:

merge(dt,dt1, by=c("year","id"))
   year id event performance
1: 2005  2     1        1000
2: 2006  2     0        1001
3: 2007  2     0        1002
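If only dt's columns are wanted, the same merge() approach can be applied to just the id/year columns of dt1. A minimal sketch; the unique() call is an added safeguard against duplicated pairs in dt1, not part of the original answer. merge() returns a new object, so dt is left as it was.

```r
library(data.table)

dt  <- data.table(id = c(rep(2, 3), rep(4, 2)), year = c(2005:2007, 2005:2006), event = c(1, 0, 0, 0, 1))
dt1 <- data.table(id = rep(2, 5), year = 2005:2009, performance = 1000:1004)

# Inner join (the merge() default) against the unique (id, year) pairs only,
# so no performance column is brought over from dt1.
dt.sub <- merge(dt, unique(dt1[, .(id, year)]), by = c("id", "year"))
print(dt.sub)
```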

1 Comment

OMG, sometimes the solution is so easy that you can't see it... Thanks!
