R: Updating a data frame with another data frame

Question

Let's say our initial data frame looks like this:

df1 = data.frame(Index=c(1:6),A=c(1:6),B=c(1,2,3,NA,NA,NA),C=c(1,2,3,NA,NA,NA))

> df1
  Index A  B  C
1     1 1  1  1
2     2 2  2  2
3     3 3  3  3
4     4 4 NA NA
5     5 5 NA NA
6     6 6 NA NA

Another data frame contains new information for col B and C

df2 = data.frame(Index=c(4,5,6),B=c(4,4,4),C=c(5,5,5))

> df2
  Index B C
1     4 4 5
2     5 4 5
3     6 4 5

How can you update the missing values in df1 so it looks like this:

  Index A B C
1     1 1 1 1
2     2 2 2 2
3     3 3 3 3
4     4 4 4 5
5     5 5 4 5
6     6 6 4 5

My attempt:

library(dplyr)

> full_join(df1,df2)
Joining by: c("Index", "B", "C")
  Index  A  B  C
1     1  1  1  1
2     2  2  2  2
3     3  3  3  3
4     4  4 NA NA
5     5  5 NA NA
6     6  6 NA NA
7     4 NA  4  5
8     5 NA  4  5
9     6 NA  4  5

Which as you can see has created duplicate rows for the 4,5,6 index instead of replacing the NA values.

Any help would be greatly appreciated!

thelatemail · Accepted Answer · 2016-05-19 03:07:05Z

6

merge then aggregate:

aggregate(. ~ Index, data=merge(df1, df2, all=TRUE), na.omit, na.action=na.pass )

#  Index B C A
#1     1 1 1 1
#2     2 2 2 2
#3     3 3 3 3
#4     4 4 5 4
#5     5 4 5 5
#6     6 4 5 6

Or in dplyr speak:

df1 %>% 
    full_join(df2) %>%
    group_by(Index) %>%
    summarise_each(funs(na.omit))

#Joining by: c("Index", "B", "C")
#Source: local data frame [6 x 4]
#
#  Index     A     B     C
#  (dbl) (int) (dbl) (dbl)
#1     1     1     1     1
#2     2     2     2     2
#3     3     3     3     3
#4     4     4     4     5
#5     5     5     4     5
#6     6     6     4     5

edited May 19, 2016 at 3:07

answered May 19, 2016 at 2:55

thelatemail

94.3k12 gold badges140 silver badges197 bronze badges

Sign up to request clarification or add additional context in comments.

2 Comments

Zyferion Over a year ago

Thanks, it worked great! I really need to learn more about the aggregate function. Is there also a way to update entries that aren't necessarily NA?

Peter King Over a year ago

I tried the aggregate option where the master file (MF) had 9634 obs of 430 variables and the update file (UF) had 9249 obs of 4 identically named variables. The result file has 9224 obs of 430 variables (So considerable loss of masterfile data). The dplyr option (which was MUCH slower) [warning summarise_each() deprecated in dplyr 0.7.0] - and in fact never completed. So be warned you will lose NA data from your master file with this solution.

akrun · Accepted Answer · 2016-05-19 02:18:07Z

5

We can use join from data.table. Convert the 'data.frame' to 'data.table' (setDT(df1), join on with 'df1' using "Index" and assign (:=), the values in 'B' and 'C' with 'i.B' and 'i.C'.

library(data.table)
setDT(df1)[df2, c('B', 'C') := .(i.B, i.C), on = "Index"]
df1
#   Index A B C
#1:     1 1 1 1
#2:     2 2 2 2
#3:     3 3 3 3
#4:     4 4 4 5
#5:     5 5 4 5
#6:     6 6 4 5

answered May 19, 2016 at 2:18

akrun

891k38 gold badges590 silver badges700 bronze badges

4 Comments

Zyferion Over a year ago

Hi akrun, I couldn't get this solution to work for me

akrun Over a year ago

@Zyferion My solution was based on the example you provided. It works for me with data.table 1.9.6. Also, as your comment was not very informative (i.e. not providing any error message or anything), I can't comment.

Zyferion Over a year ago

Error in [.data.table(setDT(df1), df2, :=(c("B", "C"), .(i.B, i.C)), : unused argument (on = "Index")

akrun Over a year ago

@Zyferion It could be because you are using an older version of data.table. Please update to new version

LMc · Accepted Answer · 2022-06-27 22:02:19Z

5

As of dplyr >= 1.0.0 you can use rows_update:

library(dplyr)

df1 %>% 
  rows_update(df2, by = "Index")

  Index A B C
1     1 1 1 1
2     2 2 2 2
3     3 3 3 3
4     4 4 4 5
5     5 5 4 5
6     6 6 4 5

Alternatively, there is rows_patch:

rows_patch() works like rows_update() but only overwrites NA values.

edited Jun 27, 2022 at 22:02

answered May 11, 2022 at 16:54

LMc

19k4 gold badges41 silver badges54 bronze badges

1 Comment

ok1more Over a year ago

This only work if the both data frame has same columns

Zyferion · Accepted Answer · 2016-05-20 08:04:23Z

For those interested, I've extended this problem to:

- handle updating a data frame with another data frame with new columns

- replace any existing entries regardless if they're NA or not.

Heres the solution I found using the aggregate function from @thelatemail :)

df1 = data.frame(Index=c(1:6),A=c(1:6),B=c(1,2,3,3,3,3),C=c(1,2,3,3,3,3))

df2 = data.frame(Index=c(4,5,6),B=c(4,4,4),C=c(5,5,5),D=c(6,6,6),E=c(7,7,7))

df3 = full_join(df1,df2)

# Create a function na.omit.last 
na.omit.last = function(x){
  x <- na.omit(x)
  x <- last(x)
}

# For the columns not in df1 
dfA = aggregate(. ~ Index, df3, na.omit,na.action = na.pass)
dfA = dfA[,-(1:ncol(df1))] 
dfA = data.frame(lapply(dfA,as.numeric))

dfB = aggregate(. ~ Index, df3[,1:ncol(df1)], na.omit.last, na.action = na.pass)

# If there are more columns in df2 append dfA
if (ncol(df2) > ncol(df1)) {
  df3 = cbind(dfB,dfA)
}  else {
    df3 = dfB
  }

print(df3)

TomNash · Accepted Answer · 2016-05-19 01:59:56Z

2

Not sure what the general case or conditions would be, but this works for this instance without dplyr

df3 <- as.matrix(df1)
df3[which(is.na(df3))] <- as.matrix(df2)
df3 <- as.data.frame(df3)
df3

  A B C
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 5
5 5 4 5
6 6 4 5

answered May 19, 2016 at 1:59

TomNash

3,3272 gold badges28 silver badges68 bronze badges

1 Comment

Zyferion Over a year ago

thanks this could be useful - i've updated the question and added an index column though

Collectives™ on Stack Overflow

R: Updating a data frame with another data frame

5 Answers 5

2 Comments

4 Comments

1 Comment

For those interested, I've extended this problem to:

Comments

1 Comment

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

5 Answers 5

2 Comments

4 Comments

1 Comment

For those interested, I've extended this problem to:

Comments

1 Comment

Your Answer

Sign up or log in

Post as a guest

Linked

Related