R: Merge two data frames based on value in column and return all values of both data frames

Question

Let's say I have the following data frames:

Now I want to merge both data frames on column "a" to get the following data frame:

On my dataset I tried:

merge <- merge(x = df1, y = df2, by = "a", all = TRUE)

However, although df1 has 50,000 entries, df2 has 100,000 entries, and there are definitely matching values in column a, the merged data frame has over one million entries. I do not understand this. As I understand it, the merged data frame should have at most 150,000 entries, and that only when no values in column a are shared between the two data frames.

Using the example datasets you provided above, the example works, so the problem is likely related to the format or structure of your datasets. What is the output of str(df1) and str(df2)? One thing to be aware of: if non-key columns have the same names in both datasets, merge() keeps both and tells them apart by appending .x (first dataset) and .y (second dataset). I'm assuming this was just for the example though.
– m.evans, Commented Mar 16, 2020 at 17:24
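
A row count far above 150,000 is what merge() produces when values in the key column repeat within both data frames: every row with a given key in x is paired with every row with that key in y. Whether this applies to the asker's data is an assumption; the toy data frames x and y below are made up purely to sketch the effect and a way to check for it.

```r
# Hypothetical toy data: the key value 1 appears 3 times in x and 4 times in y.
x <- data.frame(a = c(1, 1, 1, 2), b = 1:4)
y <- data.frame(a = c(1, 1, 1, 1, 3), c = 1:5)

m <- merge(x, y, by = "a", all = TRUE)

# Key 1 contributes 3 * 4 = 12 rows; keys 2 and 3 contribute one row each.
nrow(m)  # 14, although nrow(x) + nrow(y) is only 9

# Count how often each key occurs; counts above 1 in both tables multiply.
table(x$a)
table(y$a)

# anyDuplicated() returns 0 when all keys are unique, a positive index otherwise.
anyDuplicated(x$a) > 0
anyDuplicated(y$a) > 0
```

Running the two anyDuplicated() checks on column a of df1 and df2 would show whether repeated keys explain the inflated row count.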

Chris Ruehlemann · Accepted Answer · 2020-03-16 17:36:26Z

3

I think what you want to do is not merge but rather rbind the two data frames and remove the duplicated rows:

DATA:

df1 <- data.frame(a = c(1,4,9),
                  b = c(2,3,7),
                  c = c(3,3,3),
                  d = c(4,4,4))
df2 <- data.frame(a = c(1,2,3),
                  b = c(2,2,2),
                  c = c(3,3,3),
                  d = c(4,4,4))

SOLUTION:

Row-bind df1 and df2:

df3 <- rbind(df1, df2)

Remove the duplicate rows:

df3 <- df3[!duplicated(df3), ]

RESULT:

> df3
  a b c d
1 1 2 3 4
2 4 3 3 4
3 9 7 3 4
5 2 2 3 4
6 3 2 3 4
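
As a cross-check, the same two steps can be sketched in one call with unique(), which for data frames drops rows that repeat an earlier row in every column. The row-name reset and the final row-count check are additions for illustration, not part of the answer above.

```r
df1 <- data.frame(a = c(1, 4, 9), b = c(2, 3, 7), c = c(3, 3, 3), d = c(4, 4, 4))
df2 <- data.frame(a = c(1, 2, 3), b = c(2, 2, 2), c = c(3, 3, 3), d = c(4, 4, 4))

# Row-bind, then keep only the first occurrence of each complete row.
df3 <- unique(rbind(df1, df2))

# Optional: renumber the rows after the duplicate has been dropped.
rownames(df3) <- NULL

# Unlike merge(), this can never yield more rows than both inputs combined.
stopifnot(nrow(df3) <= nrow(df1) + nrow(df2))
```

With the asker's data, this approach is bounded by 150,000 rows by construction.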


akrun · Accepted Answer · 2020-03-16 18:28:52Z

1

With the tidyverse, we can use bind_rows() and distinct():

library(dplyr)
# Stack the two data frames, then keep only the distinct rows
bind_rows(df1, df2) %>%
  distinct()

data

df1 <- structure(list(a = c(1, 4, 9), b = c(2, 3, 7), c = c(3, 3, 3), 
    d = c(4, 4, 4)), class = "data.frame", row.names = c(NA, 
-3L))

df2 <- structure(list(a = c(1, 2, 3), b = c(2, 2, 2), c = c(3, 3, 3), 
    d = c(4, 4, 4)), class = "data.frame", row.names = c(NA, 
-3L))


Yuriy Saraykin · Accepted Answer · 2020-03-16 20:12:31Z

0

This can also be done with dplyr::union(), which returns the rows of both data frames with duplicates removed:

dplyr::union(df1, df2)


ThomasIsCoding · Accepted Answer · 2020-03-16 20:29:57Z

0

Here is another base R solution using rbind + %in%:

dfout <- rbind(df1, subset(df2, !a %in% df1$a))

which gives

> dfout
   a b c d
1  1 2 3 4
2  4 3 3 4
3  9 7 3 4
21 2 2 3 4
31 3 2 3 4
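
Note that this variant filters on column a alone, whereas duplicated() compares whole rows. The two agree on the example data but can differ when a key is shared and the other columns are not. The data frame df2b below is a made-up variation of df2, used only to sketch that difference.

```r
df1  <- data.frame(a = c(1, 4, 9), b = c(2, 3, 7), c = c(3, 3, 3), d = c(4, 4, 4))
# Hypothetical variation of df2: same key a = 1 as in df1, but a different b.
df2b <- data.frame(a = c(1, 2, 3), b = c(5, 2, 2), c = c(3, 3, 3), d = c(4, 4, 4))

# Key-based: drops the df2b row with a = 1 because that key exists in df1.
by_key <- rbind(df1, subset(df2b, !a %in% df1$a))

# Row-based: keeps it, because the full row (1, 5, 3, 4) does not occur in df1.
by_row <- unique(rbind(df1, df2b))

nrow(by_key)  # 5
nrow(by_row)  # 6

# Optional: reset the row names that rbind() made unique (such as "21", "31").
rownames(by_key) <- NULL
```

Which behaviour is wanted depends on whether a row from df2 with a known key but new values should be kept or discarded.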
