29

I'm trying to match 4 variables pairwise and add a column with the lookup value. In base, I would do merge(df1,df2, by.x=c("lsr","ppr"),by.y=c("li","pro")), where df1 has 9 cols and df2 (2 being lsr and pro) df2 has only 3, li, pro, and the "value" I'm interested in, alpha.

This works fine, but as I'm beginning to be a huge fan of data.table, I would like to do this in the data.table way - and because I have some millions of rows - so base merge is slow (I saw, that the by.x, and by.y feature is pending for data.table, but maybe there is a workaround). See some sample data below:

df2:
         alpha         li        pro
      1: 0.5000000 0.01666667 0.01666667
      2: 0.3295455 0.03333333 0.01666667
      3: 0.2435897 0.05000000 0.01666667
      4: 0.1917808 0.06666667 0.01666667
      5: 0.1571429 0.08333333 0.01666667
df1:     
          demand rtime    mcv         mck        ppr       mlv         mlk        lsr
      1:    0.3     1 357.57700 0.099326944 0.01666667 558.27267 0.155075741 0.01666667
      2:    0.3    10 548.75433 0.152431759 0.01666667 614.30667 0.170640741 0.03333333
      3:    0.3    11 314.55767 0.087377130 0.01666667 636.48100 0.176800278 0.03333333
      4:    0.3     2 312.15033 0.086708426 0.01666667 677.48100 0.188189167 0.06666667
      5:    0.3     3 454.47867 0.126244074 0.01666667 608.92067 0.169144630 0.01666667
     ---                                                                               
6899196:    0.6     5 537.92673 0.149424093 1.00000000 537.92673 0.149424093 1.00000000
6899197:    0.6     6 277.34732 0.077040923 1.00000000 277.34732 0.077040923 1.00000000
6899198:    0.6     7  73.31484 0.020365235 1.00000000  73.31484 0.020365235 1.00000000
6899199:    0.6     8  32.04197 0.008900546 1.00000000  32.04197 0.008900546 1.00000000
6899200:    0.6     9  14.59008 0.004052799 1.00000000  14.59008 0.004052799 1.00000000

Last, maybe of interest is, that in df2 I have unique rows, and in df1, I have lots of duplicates in respect to lsr and ppr. I also tried to set two keys and join them the data.table way, and adding a new column with alpha. But without success.

7
  • If you don't need an outer join, you can just do a left, a right join or an inner join using binary join. The question is which one you want. Though judging by your base R code it seems like you wan an inner join? Also, which column you want to keep? You also don't have pro column in df1 Commented May 21, 2015 at 10:34
  • Thanks. Actually I believe, in this case the inner and the left joins are the same, as all combinations of lsr and ppr, or of li and pro are included in both df1 and df2. And not having pro in df1 is a typo!! Sorry for that, I meant, ppr in df1, I'll edit and correct it. I'd like to keep all columns from df1 and add alpha as a new column Commented May 21, 2015 at 11:27
  • If all you need is a left join to df1 while updating alpha column, it is a simple binary join. Try setkey(df1, lsr, ppr) ; setkey(df2, li, pro) ; df1[df2, alpha := i.alpha] ; df1 Commented May 21, 2015 at 11:44
  • 1
    So if you already have an alpha column in your df1 set, i will tell data.table to get the information from the alpha in df2. It is always safer to put i, before the column names on the RHS in order to avoid possible unexpected results. Commented May 21, 2015 at 12:00
  • 6
    OK. now I'm really a fan of data.table. Thanks! Commented May 21, 2015 at 12:07

1 Answer 1

29

You can use the statement provided by David Arenburg in comment:

setkey(df1, lsr, ppr)
setkey(df2, li, pro)
df1[df2, alpha := i.alpha]

From the current devel version, 1.9.5, we can perform joins directly without having to set keys using the on argument:

df1[df2, alpha := i.alpha, on = c(lsr="li", ppr="pro")]

If you don't want to install the devel version, then you can wait until this is pushed as v1.9.6 on CRAN.

Sign up to request clarification or add additional context in comments.

2 Comments

Is df1[df2, alpha := i.alpha] supposed to add a column to df1? It's not doing anything for me.
yes, please submit bug report to data.table repository providing reproducible example and session info

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.