85

I am trying to LEFT Join 2 data frames but I do not want join all the variables from the second data set:

As an example, I have dataset 1 (DF1):

  Cl    Q   Sales  Date
   A    2   30     01/01/2014
   A    3   24     02/01/2014
   A    1   10     03/01/2014
   B    4   10     01/01/2014
   B    1   20     02/01/2014
   B    3   30     03/01/2014

And I would like to left join dataset 2 (DF2):

Client  LO  CON
   A    12  CA
   B    11  US
   C    12  UK
   D    10  CA
   E    15  AUS
   F    91  DD

I am able to left join with the following code:

merge(x = DF1, y = DF2, by = "Client", all.x=TRUE) :

   Client Q    Sales   Date             LO      CON
   A      2    30      01/01/2014       12      CA
   A      3    24      02/01/2014       12      CA
   A      1    10      03/01/2014       12      CA
   B      4    10      01/01/2014       11      US
   B      1    20      02/01/2014       11      US
   B      3    30      03/01/2014       11      US

However, it merges both column LO and CON. I would only like to merge the column LO.

   Client Q    Sales   Date             LO      
   A      2    30      01/01/2014       12      
   A      3    24      02/01/2014       12      
   A      1    10      03/01/2014       12      
   B      4    10      01/01/2014       11      
   B      1    20      02/01/2014       11     
   B      3    30      03/01/2014       11      
1
  • 4
    you need to rename "Cl" into "Client" in DF1 (in the written example above). Otherwise the merge function does not work. Commented Nov 24, 2014 at 23:56

6 Answers 6

137

You can do this by subsetting the data you pass into your merge:

merge(x = DF1, y = DF2[ , c("Client", "LO")], by = "Client", all.x=TRUE)

Or you can simply delete the column after your current merge :)

Sign up to request clarification or add additional context in comments.

1 Comment

remembering that you need to include the variable you are merging by
31

I think it's a little simpler to use the dplyr functions select and left_join ; at least it's easier for me to understand. The join function from dplyr are made to mimic sql arguments.

 library(tidyverse)

 DF2 <- DF2 %>%
   select(client, LO)

 joined_data <- left_join(DF1, DF2, by = "Client")

You don't actually need to use the "by" argument in this case because the columns have the same name.

Comments

6

Alternative solution using left_join() and select() from the dplyr package, without intermediate steps:

DF1 <- DF1 %>%
  left_join(DF2, by = "Client") %>%
  select(-CON)

Comments

5

Nothing elegant but this could be another satisfactory answer.

merge(x = DF1, y = DF2, by = "Client", all.x=TRUE)[,c("Client","LO","CON")]

This will be useful especially when you don't need the keys that were used to join the tables in your results.

Comments

4

One-liner with dplyr

DF_joined <- left_join(DF1, select(DF2, -CON), by = "Client")

Comments

3

For Client column in both tables:

DF_joined <- DF1 %>% left_join(DF2 %>% select(Client,CON))

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.