Given two dataframes, foo and bar, whose row names and column names partially overlap:

foo <- iris[1:10,-c(4,5)]
#   Sepal.Length Sepal.Width Petal.Length
# 1           5.1         3.5          1.4
# 2           4.9         3.0          1.4
# 3           4.7         3.2          1.3
# 4           4.6         3.1          1.5
# 5           5.0         3.6          1.4
# 6           5.4         3.9          1.7
# 7           4.6         3.4          1.4
# 8           5.0         3.4          1.5
# 9           4.4         2.9          1.4
# 10          4.9         3.1          1.5

bar <- iris[3:13,-c(3,5)]
bar[1:8, ] <- bar[1:8, ] * 2
#    Sepal.Length Sepal.Width Petal.Width
# 3           9.4         6.4         0.4
# 4           9.2         6.2         0.4
# 5          10.0         7.2         0.4
# 6          10.8         7.8         0.8
# 7           9.2         6.8         0.6
# 8          10.0         6.8         0.4
# 9           8.8         5.8         0.4
# 10          9.8         6.2         0.2
# 11          5.4         3.7         0.2
# 12          4.8         3.4         0.2
# 13          4.8         3.0         0.1

How can I merge the dataframes such that both rows and columns are padded for missing cases, while prioritising the results of one dataframe for overlapping elements? In this example, it is the overlapping results in bar that I wish to prioritise.

merge(..., by = "row.names", all = TRUE) is close, in that it retains all 13 rows, and returns missing values as NA:

foobar <- merge(foo, bar, by = "row.names", all = TRUE)
#    Row.names Sepal.Length.x Sepal.Width.x Petal.Length Sepal.Length.y Sepal.Width.y Petal.Width
# 1          1            5.1           3.5          1.4             NA            NA          NA
# 2         10            4.9           3.1          1.5            9.8           6.2         0.2
# 3         11             NA            NA           NA            5.4           3.7         0.2
# 4         12             NA            NA           NA            4.8           3.4         0.2
# 5         13             NA            NA           NA            4.8           3.0         0.1
# 6          2            4.9           3.0          1.4             NA            NA          NA
# 7          3            4.7           3.2          1.3            9.4           6.4         0.4
# 8          4            4.6           3.1          1.5            9.2           6.2         0.4
# 9          5            5.0           3.6          1.4           10.0           7.2         0.4
# 10         6            5.4           3.9          1.7           10.8           7.8         0.8
# 11         7            4.6           3.4          1.4            9.2           6.8         0.6
# 12         8            5.0           3.4          1.5           10.0           6.8         0.4
# 13         9            4.4           2.9          1.4            8.8           5.8         0.4

However, it creates separate .x and .y columns for the columns that share a name, rather than combining them into one.

The desired output is:

#    Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1           5.1         3.5          1.4          NA # unique to foo
# 2           4.9         3.0          1.4          NA # unique to foo
# 3           9.4         6.4          1.3          0.4 # overlap, retained from bar
# 4           9.2         6.2          1.5          0.4 # 
# 5          10.0         7.2          1.4          0.4 # .
# 6          10.8         7.8          1.7          0.8 # .
# 7           9.2         6.8          1.4          0.6 # .
# 8          10.0         6.8          1.5          0.4 # 
# 9           8.8         5.8          1.4          0.4 # 
# 10          9.8         6.2          1.5          0.2 # overlap, retained from bar
# 11          5.4         3.7           NA          0.2 # unique to bar
# 12          4.8         3.4           NA          0.2 # unique to bar
# 13          4.8         3.0           NA          0.1 # unique to bar

My intuition is to subset the data into two disjoint sets, and the set of intersecting elements in bar, then merge these, but I'm sure there is a more elegant solution!
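
A minimal base-R sketch of that idea, assuming the row names are the key and every column is numeric (rows and cols are hypothetical helper names, not part of the original data): build an all-NA frame over the union of row and column names, fill it from foo, then let bar overwrite.

```r
# Example data from the question
foo <- iris[1:10, -c(4, 5)]
bar <- iris[3:13, -c(3, 5)]
bar[1:8, ] <- bar[1:8, ] * 2

# Union of row and column names, in order of first appearance
rows <- union(rownames(foo), rownames(bar))
cols <- union(names(foo), names(bar))

# All-NA frame covering every row and column (assumes numeric data)
foobar <- as.data.frame(matrix(NA_real_,
                               nrow = length(rows), ncol = length(cols),
                               dimnames = list(rows, cols)))

# Fill from foo first, then bar, so bar wins wherever the two overlap
foobar[rownames(foo), names(foo)] <- foo
foobar[rownames(bar), names(bar)] <- bar
```

Because bar is assigned last, its values replace those from foo in the overlapping cells, while cells covered by neither frame stay NA.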

2 Answers

(Edited) The package plyr is awesome for this sort of thing. Just do:

 library(plyr)
 foo$ID <- row.names(foo)
 bar$ID <- row.names(bar)
 foobar <- join(foo, bar, type = "full", by = "ID")

Joining by row.names did not work, as Flodl noted in the comments, so I added an explicit "ID" column to join on.

6 Comments

Error in `[.data.frame`(x, by) : undefined columns selected
Furthermore, the help page suggests that we should expect the result to be the same as from merge.
Now this is not doing the overwriting like the OP wants. Please test and compare with his expected output.
Ah, I see... Yes, I think any solution I would have wouldn't be any better than the one voidHead is thinking of.
join(bar, foo, type = "full", by = "ID", match = "first") seems more like it, if the OP does not care about the order of the rows and columns.

I see the glowing recommendation for plyr::join but do not see how it is much different than what the base merge offers:

 merge(foo, bar, by=c("Sepal.Length", "Sepal.Width"), all=TRUE)
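
If the row names, rather than the measurement columns, are the intended key, one possible sketch in base R is to merge on row.names and then collapse each .x/.y pair, preferring the value from bar. The names m, shared and nm are illustrative only, and the sketch assumes the row names are integer-like and that an NA in bar should fall back to foo.

```r
# Example data from the question
foo <- iris[1:10, -c(4, 5)]
bar <- iris[3:13, -c(3, 5)]
bar[1:8, ] <- bar[1:8, ] * 2

# Full outer merge on row names; shared columns get .x (foo) and .y (bar)
m <- merge(foo, bar, by = "row.names", all = TRUE)
shared <- intersect(names(foo), names(bar))

for (nm in shared) {
  x <- m[[paste0(nm, ".x")]]
  y <- m[[paste0(nm, ".y")]]
  m[[nm]] <- ifelse(is.na(y), x, y)  # take bar's value unless it is NA
  m[[paste0(nm, ".x")]] <- NULL
  m[[paste0(nm, ".y")]] <- NULL
}

# Restore numeric row order, row names, and the original column order
m <- m[order(as.integer(as.character(m$Row.names))), ]
rownames(m) <- as.character(m$Row.names)
m$Row.names <- NULL
m <- m[, union(names(foo), names(bar))]
```

Note that a genuine NA in bar would be replaced by the value from foo here, which may or may not be the desired behaviour.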

5 Comments

Well, it is clearly not what the OP wants. Just compare your output with the OP's.
Agreed not clear. I assumed that the difference in the Petal.Width values were explained by laziness on the part of the OP. The missing calculated text values are explained by laziness on my part.
@BondedDust Which Petal.Width values are you referring to? I constructed the expected output by hand, but I believe it's consistent with the example data.
All those Petal.Length values less than 1.0. There are none such in the original.
Right you are. Corrected.
