0

I have 5 dataframes each with a different number of rows. I need to merge them all together based on values in col1 of each dataframe. Observe

df1 <- read.table(text="
   col1    col2
    A        5
    B        3
    C        6
    E        7", header=TRUE, stringsAsFactors=FALSE)

df2 <- read.table(text="
   col1    col2
    A        5
    B        6
    C       7
    M       8
    Z       9", header=T, stringsAsFactors=FALSE)

But I need it to produce:

   newdf
     col1    col2(#from df1)   col3(#from df2)
      A          5                  5
      B          3                  6
      C          6                  7  
      E          7                  0
      M          0                  8
      Z          0                  9

I have tried to merge a few at a time by='col1' but to no luck. Any tips?

What I have tried:

posidf<-merge(df1,df2,df3,df4,df5,all.x=TRUE)
#wont execute
posidf<-merge(df1,df2,df3,df4,df5,by="col1",all.x=TRUE)
#wont execute
posidf<-merge(df1,df2,df3,df4,df5,by="col1")
Error in fix.by(by.x, x) : 
'by' must specify one or more columns as numbers, names or logical
5
  • 1
    What exactly did your merge command look like? What does "no avail" mean exactly? Did you get an error message? Did you get output other than what you expected? Was the problem missing rows? Did you try setting the all= parameter? Commented Jul 10, 2018 at 15:14
  • 1
    Possible duplicate: stackoverflow.com/questions/5965698/… Commented Jul 10, 2018 at 15:15
  • @MrFlick edited. I tried the 'all-' as well. No Avail means it didn't work out. Commented Jul 10, 2018 at 15:24
  • merge() only takes two data.frames. You cannot pass in an arbitrary number of them? Check out the ?merge help page. You would have to repeatedly call something like merge(merge(merge(a,b), c, d) . Did you try it with just two data frames like the example you actually included in your question? Commented Jul 10, 2018 at 15:26
  • 3
    Possible duplicate of merge-multiple-data-frames-in-a-list Commented Jul 10, 2018 at 15:30

3 Answers 3

3

I suspect you are looking for something like this example:

merge(df1, df2, by = "col1", all.x=TRUE, all.y=TRUE)

Edit:

col1 <- c('A', 'B', 'C', 'E')
col2 <- c(5, 3, 6 ,7)
df1 <- data.frame(col1, col2)

col1 <- c('A', 'B', 'C', 'M', 'Z')
col2 <- c(5, 6, 7 ,8, 9)
df2 <- data.frame(col1, col2)

col1 <- c('A', 'B', 'C')
col2 <- c(10, 29, 7)
df3 <- data.frame(col1, col2)

col1 <- c('A', 'S', 'T')
col2 <- c(7 ,8, 9)
df4 <- data.frame(col1, col2)

col1 <- c('B', 'C')
col2 <- c(7 ,8)
df5 <- data.frame(col1, col2)

frame_list <- list(df1, df2, df3, df4, df5)
frame_names <- list('df1', 'df2', 'df3', 'df4', 'df5')
counter <- 0

df <- data.frame(Date=as.Date(character()),
                 File=character(), 
                 User=character(), 
                 stringsAsFactors=FALSE) 

df <- data.frame(matrix(ncol = 2, nrow = 0))
colnames(df) <- c("col1","col2")

for (d in frame_list) {
  counter <- counter + 1
  colnames(d) <- c("col1", paste0('col2_',frame_names[counter]))
  df <- merge(df, d, by = "col1", all.x=TRUE, all.y=TRUE)
}

df$col2 <- NULL
df[is.na(df)] <- 0
Sign up to request clarification or add additional context in comments.

1 Comment

Check out the updated question. The problem appears to be that the OP is trying to merge 5 data.frames, not just 2. This doesn't seem to address that problem.
1

Assuming your merge looks like this and your example is representative of your data, there are no issues.

newdf <- merge(x = df1, y = df2,'col1')

##  col1 col2.x col2.y
##1    A      5      5
##2    B      3      6
##3    C      6      7

If this is not the output you expect, you may want to look into the different types of merging: inner join (above), outer join, left outer, and right outer.

Outer join

merge(x = df1, y = df2, by = 'col1', all = TRUE)

##  col1 col2.x col2.y
##1    A      5      5
##2    B      3      6
##3    C      6      7
##4    E      7     NA
##5    M     NA      8
##6    Z     NA      9

Left outer

merge(x = df1, y = df2, by = 'col1', all.x = TRUE)

##  col1 col2.x col2.y
##1    A      5      5
##2    B      3      6
##3    C      6      7
##4    E      7     NA

Right outer

merge(x = df1, y = df2, by = 'col1', all.y = TRUE)

##  col1 col2.x col2.y
##1    A      5      5
##2    B      3      6
##3    C      6      7
##4    M     NA      8
##5    Z     NA      9

3 Comments

Check out the updated question. The problem appears to be that the OP is trying to merge 5 data.frames, not just 2. This doesn't seem to address that problem.
Thanks. Assumed the example was accurate to the problem. I'll edit a new response
I agree the example was misleading. But I also think the duplicate suggested by @zx8754 takes care of the problem.
1

I guess what you would like is a full_join. Using dplyr:

library(dplyr)

df1 <- data.frame(
  col1 = c("A", "B", "C", "E"),
  col2 = c(5, 3, 6, 7)
)

df2 <- data.frame(
  col1 = c("A", "B", "C", "M", "Z"),
  col2 = c(5, 6, 7, 8, 9)
)


df_merged <- full_join(x = df1, y = df2, by = "col1") 

If you want a 0 instead of a NA, replace them with

df_merged[is.na(df_merged)] <- 0

  col1 col2.x col2.y
1    A      5      5
2    B      3      6
3    C      6      7
4    E      7      0
5    M      0      8
6    Z      0      9

Edit for multiple data frames

Store them in a list and use reduce with the respective join, here full_join

set.seed(123)
df_list <- replicate(5, data.frame(col1 = LETTERS[sample(1:26, 5)], col2 = sample(1:9, 5)), simplify = F)
reduce(df_list, full_join, by = "col1")

   col1 col2.x col2.y col2.x.x col2.y.y col2
1     E      9     NA       NA        1    4
2     F      5     NA       NA       NA   NA
3     N      2     NA       NA        8   NA
4     X      7      4       NA       NA    6
5     P      8     NA       NA        5   NA
6     D     NA      8       NA       NA   NA
7     Q     NA      5       NA       NA   NA
8     J     NA      1       NA       NA   NA
9     U     NA      2       NA        9    8
10    V     NA     NA        1       NA   NA
11    M     NA     NA        7       NA   NA
12    B     NA     NA        8       NA   NA
13    H     NA     NA        9       NA   NA
14    I     NA     NA        4       NA   NA
15    K     NA     NA       NA        6   NA
16    W     NA     NA       NA       NA    9
17    O     NA     NA       NA       NA    3

But as @zx8754 suggested in the comment: duplicate from here Simultaneously merge multiple data.frames in a list

2 Comments

Check out the updated question. The problem appears to be that the OP is trying to merge 5 data.frames, not just 2. This doesn't seem to address that problem.
I see, updated the response which copies from stackoverflow.com/questions/8091303/… as already @zx8754 suggested

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.