Given two dataframes whose names overlap partially, foo and bar:
foo <- iris[1:10,-c(4,5)]
# Sepal.Length Sepal.Width Petal.Length
# 1 5.1 3.5 1.4
# 2 4.9 3.0 1.4
# 3 4.7 3.2 1.3
# 4 4.6 3.1 1.5
# 5 5.0 3.6 1.4
# 6 5.4 3.9 1.7
# 7 4.6 3.4 1.4
# 8 5.0 3.4 1.5
# 9 4.4 2.9 1.4
# 10 4.9 3.1 1.5
bar <- iris[3:13,-c(3,5)]
bar[1:8, ] <- bar[1:8, ] * 2
# Sepal.Length Sepal.Width Petal.Width
# 3 9.4 6.4 0.4
# 4 9.2 6.2 0.4
# 5 10.0 7.2 0.4
# 6 10.8 7.8 0.8
# 7 9.2 6.8 0.6
# 8 10.0 6.8 0.4
# 9 8.8 5.8 0.4
# 10 9.8 6.2 0.2
# 11 5.4 3.7 0.2
# 12 4.8 3.4 0.2
# 13 4.8 3.0 0.1
How can I merge the dataframes such that both rows and columns are padded for missing cases, while prioritising the results of one dataframe for overlapping elements? In this example, it is the overlapping results in bar that I wish to prioritise.
merge(..., by = "row.names", all = TRUE) is close, in that it retains all 13 rows, and returns missing values as NA:
foobar <- merge(foo, bar, by = "row.names", all = TRUE)
# Row.names Sepal.Length.x Sepal.Width.x Petal.Length Sepal.Length.y Sepal.Width.y Petal.Width
# 1 1 5.1 3.5 1.4 NA NA NA
# 2 10 4.9 3.1 1.5 9.8 6.2 0.2
# 3 11 NA NA NA 5.4 3.7 0.2
# 4 12 NA NA NA 4.8 3.4 0.2
# 5 13 NA NA NA 4.8 3.0 0.1
# 6 2 4.9 3.0 1.4 NA NA NA
# 7 3 4.7 3.2 1.3 9.4 6.4 0.4
# 8 4 4.6 3.1 1.5 9.2 6.2 0.4
# 9 5 5.0 3.6 1.4 10.0 7.2 0.4
# 10 6 5.4 3.9 1.7 10.8 7.8 0.8
# 11 7 4.6 3.4 1.4 9.2 6.8 0.6
# 12 8 5.0 3.4 1.5 10.0 6.8 0.4
# 13 9 4.4 2.9 1.4 8.8 5.8 0.4
However, it creates a distinct column for each column in the constituent dataframes, regardless of the fact that they share names.
The desired output would be as such:
# Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1 5.1 3.5 1.4 NA # unique to foo
# 2 4.9 3.0 1.4 NA # unique to foo
# 3 9.4 6.4 1.3 0.4 # overlap, retained from bar
# 4 9.2 6.2 1.5 0.4 #
# 5 10.0 7.2 1.4 0.4 # .
# 6 10.8 7.8 1.7 0.8 # .
# 7 9.2 6.8 1.4 0.6 # .
# 8 10.0 6.8 1.5 0.4 #
# 9 8.8 5.8 1.4 0.4 #
# 10 9.8 6.2 1.5 0.2 # overlap, retained from bar
# 11 5.4 3.7 NA 0.2 # unique to bar
# 12 4.8 3.4 NA 0.2 # unique to bar
# 13 4.8 3.0 NA 0.1 # unique to bar
My intuition is to subset the data into two disjoint sets, and the set of intersecting elements in bar, then merge these, but I'm sure there is a more elegant solution!