
I have two dataframes like this:

# first data frame
df_1 <- data.frame(x = c('x4','x4','x5','x5','x5','x6','x6'),
                   y = c(0,0,1,1,1,0,0))
# second data frame
df_2 <- data.frame(x = c('x4','x4','x5','x5','x5','x7','x7'),
                   z = c(1,1,1,1,1,0,0))

I would like to merge them on column x, keeping only the rows whose x value appears in both data frames. Example output:

x y z
x4 0 1
x4 0 1
x5 1 1
x5 1 1
x5 1 1

I tried this

merge(x = df_1, y = df_2, by = "x", all = TRUE)

but it doesn't give the expected result. What can I do?
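For context, all = TRUE requests a full outer join: keys found in either data frame are kept, and unmatched rows are padded with NA, so it moves away from the goal rather than towards it. A small sketch with the question's data (full_res is just an illustrative name):

```r
df_1 <- data.frame(x = c('x4','x4','x5','x5','x5','x6','x6'),
                   y = c(0,0,1,1,1,0,0))
df_2 <- data.frame(x = c('x4','x4','x5','x5','x5','x7','x7'),
                   z = c(1,1,1,1,1,0,0))

# Full outer join: keys present in either data frame are kept
full_res <- merge(df_1, df_2, by = "x", all = TRUE)
nrow(full_res)            # 17: 13 matched rows, 2 unmatched x6 rows, 2 unmatched x7 rows
unique(full_res$x)        # keys x4, x5, x6 and x7 are all present
sum(is.na(full_res$z))    # 2: the x6 rows have no z
sum(is.na(full_res$y))    # 2: the x7 rows have no y
```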

The result from

merge(df_1, df_2)
    x y z
1  x4 0 1
2  x4 0 1
3  x4 0 1
4  x4 0 1
5  x5 1 1
6  x5 1 1
7  x5 1 1
8  x5 1 1
9  x5 1 1
10 x5 1 1
11 x5 1 1
12 x5 1 1
13 x5 1 1
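The 13 rows are what an m:n join produces: each df_1 row with a given x is paired with every df_2 row with the same x, so a shared key contributes (rows in df_1) × (rows in df_2) rows. A small base R check of that arithmetic with the question's data:

```r
df_1 <- data.frame(x = c('x4','x4','x5','x5','x5','x6','x6'),
                   y = c(0,0,1,1,1,0,0))
df_2 <- data.frame(x = c('x4','x4','x5','x5','x5','x7','x7'),
                   z = c(1,1,1,1,1,0,0))

n_1 <- table(df_1$x)   # rows per key in df_1: x4 = 2, x5 = 3, x6 = 2
n_2 <- table(df_2$x)   # rows per key in df_2: x4 = 2, x5 = 3, x7 = 2
common <- intersect(names(n_1), names(n_2))   # "x4" "x5"

# Each shared key contributes (rows in df_1) * (rows in df_2) rows
rows_per_key <- n_1[common] * n_2[common]
rows_per_key        # x4 = 4, x5 = 9
sum(rows_per_key)   # 13, the same as nrow(merge(df_1, df_2))
```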

Using this:

intersect(df_1$x, df_2$x)
[1] "x4" "x5"

it is possible to see which values are common to both data frames. Is it possible to use this as the rule, so that only the common rows are merged?
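intersect() can indeed serve as the filtering rule, combined with %in%. A minimal base R sketch (common, df_1_common and df_2_common are illustrative names); note that filtering removes x6 and x7 but does not by itself stop the merge from crossing the duplicated keys:

```r
df_1 <- data.frame(x = c('x4','x4','x5','x5','x5','x6','x6'),
                   y = c(0,0,1,1,1,0,0))
df_2 <- data.frame(x = c('x4','x4','x5','x5','x5','x7','x7'),
                   z = c(1,1,1,1,1,0,0))

common <- intersect(df_1$x, df_2$x)          # "x4" "x5"
df_1_common <- df_1[df_1$x %in% common, ]    # drops the x6 rows
df_2_common <- df_2[df_2$x %in% common, ]    # drops the x7 rows
nrow(df_1_common)   # 5
nrow(df_2_common)   # 5

# The merge is still m:n on x4 and x5
nrow(merge(df_1_common, df_2_common, by = "x"))   # 13
```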

  • @jogo thank you. I tried the approach in that question, but it didn't work for me. In the x column I have repeated names, and I would like to merge by them and keep them. Please see the update with the simple merge in my question; this is not the output I expected. Commented Jan 31, 2018 at 12:52
  • It is an m:n join, e.g. each row of df_1 with "x4" is crossed with each row of df_2 with "x4", so you get 2*2 = 4 rows in the result. So please define the logic to reduce the result! Commented Jan 31, 2018 at 12:57
  • @jogo thank you. I don't think merge is the right solution. Please see my expected result. The only thing the two data frames have in common is column x, and from it I know which rows share the same value. I would like to create a new data frame based on this, with the other columns carried over from the original data frames. That's why my expected output has y and z. Commented Jan 31, 2018 at 13:03
  • Are you looking for cbind(df_1[df_1$x %in% df_2$x,], z=df_2[df_2$x %in% df_1$x, "z"])? Commented Jan 31, 2018 at 13:05
  • @jogo yes, this is a solution, but it is a little hard for me to apply to my real dataset, which has many more columns. I am only trying to find a way to merge two data frames into a new one based on a column, keeping only the rows whose value appears in both data frames. Commented Jan 31, 2018 at 13:14
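One way to extend the cbind() idea to any number of columns, assuming that rows sharing an x value should be paired in their order of appearance (the pairing jogo's cbind() produces for this data), is to add a within-key row counter and merge on x plus that counter. row_in_key is a hypothetical helper column, not part of the original data:

```r
df_1 <- data.frame(x = c('x4','x4','x5','x5','x5','x6','x6'),
                   y = c(0,0,1,1,1,0,0))
df_2 <- data.frame(x = c('x4','x4','x5','x5','x5','x7','x7'),
                   z = c(1,1,1,1,1,0,0))

# Hypothetical helper: position of each row within its x group (1, 2, ...)
df_1$row_in_key <- ave(seq_along(df_1$x), df_1$x, FUN = seq_along)
df_2$row_in_key <- ave(seq_along(df_2$x), df_2$x, FUN = seq_along)

# Inner merge on key + position: the k-th x4 row of df_1 meets the k-th x4 row of df_2
res <- merge(df_1, df_2, by = c("x", "row_in_key"))
res$row_in_key <- NULL   # drop the helper column
res
```

All remaining columns of both data frames are carried along without being named. If one side has more rows for a key than the other, the surplus rows are dropped, so this can differ from deduplicating with unique() when group sizes differ.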

1 Answer


With base R, as jogo points out, simply run

merge(df_1, unique(df_2))
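unique() removes only rows that are identical in every column, so this works when the duplicate rows of df_2 are exact copies, as in the example. A small sketch that checks this assumption before merging (df_2_u is an illustrative name):

```r
df_1 <- data.frame(x = c('x4','x4','x5','x5','x5','x6','x6'),
                   y = c(0,0,1,1,1,0,0))
df_2 <- data.frame(x = c('x4','x4','x5','x5','x5','x7','x7'),
                   z = c(1,1,1,1,1,0,0))

df_2_u <- unique(df_2)
# Stop early if some x value still has several distinct rows in df_2_u;
# the merge would then still multiply rows for that key
stopifnot(!anyDuplicated(df_2_u$x))

res <- merge(df_1, df_2_u, by = "x")   # inner join: x6 and x7 are dropped
res
```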

With tidyverse,

library(tidyverse)

inner_join(df_1, unique(df_2), by = "x")
   x y z
1 x4 0 1
2 x4 0 1
3 x5 1 1
4 x5 1 1
5 x5 1 1
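The choice of join type matters here: dplyr's left_join() keeps every row of the left table, including the two x6 rows that have no match in df_2 (their z is filled with NA), whereas inner_join() keeps only keys present in both. In base R the same distinction is the all.x argument of merge(); a sketch with illustrative names:

```r
df_1 <- data.frame(x = c('x4','x4','x5','x5','x5','x6','x6'),
                   y = c(0,0,1,1,1,0,0))
df_2 <- data.frame(x = c('x4','x4','x5','x5','x5','x7','x7'),
                   z = c(1,1,1,1,1,0,0))

df_2_u <- unique(df_2)
inner_res <- merge(df_1, df_2_u, by = "x")                # keys in both
left_res  <- merge(df_1, df_2_u, by = "x", all.x = TRUE)  # every df_1 row
nrow(inner_res)   # 5
nrow(left_res)    # 7: the two x6 rows are kept, with z = NA
left_res[is.na(left_res$z), ]
```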


  • Thank you, but as you can see in the output, x4 should appear 2 times and it appears 4 times. It seems like the join duplicates the rows.
  • All right, I edited the answer. You need to reduce df_2 to its unique rows; unique() does the trick for a data.frame.
