I have one master dataframe df:

df <- data.frame(c("A", "B", "C"), c(1,2,3), c(3,1,2), c(4,2,1), rep(NA, 3), rep(NA, 3))
colnames(df) <- c("text", "var1", "var2", "var3", "value1", "value2")

And another dataframe df.upd with new information:

df.upd <- data.frame(c(1,2), c(3,1), c(4,2), c(0.5, 0.6), c(12, 20))
colnames(df.upd) <- c("var1", "var2", "var3", "value1", "value2")

Printed, they look like this:

> df
  text var1 var2 var3 value1 value2
1    A    1    3    4     NA     NA
2    B    2    1    2     NA     NA
3    C    3    2    1     NA     NA
> df.upd
  var1 var2 var3 value1 value2
1    1    3    4    0.5     12
2    2    1    2    0.6     20

I want to match on the columns "var1", "var2" and "var3" and update the columns "value1" and "value2". Here, rows 1 and 2 of df.upd would update rows 1 and 2 of df. In general, row x of df.upd should update row y of df whenever all(as.numeric(df.upd[x, 1:3]) == as.numeric(df[y, 2:4])) is TRUE.

The master df has around 30k rows and 60 columns, so a for loop is not an option. Any idea how to accomplish this faster?

  • You can use merge with all.x = TRUE (a left join), then use ifelse with is.na to update the relevant columns, and finally drop the extra columns. Commented May 23, 2018 at 8:20
  • Check out stackoverflow.com/questions/36347213/…... i.e. library(data.table); cols <- paste0('value', 1:2); setDT(df)[setDT(df.upd), (cols) := mget(paste0("i.", cols)), on=.(var1, var2, var3)] Commented May 23, 2018 at 8:25
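
The merge suggestion from the first comment could look roughly like the sketch below. It is base R only and rebuilds the data from the question. The names keys, vals, row_id, m and result, and the ".new" suffix, are illustrative choices, not part of the original comment.

```r
# Rebuild the example data from the question.
df <- data.frame(text = c("A", "B", "C"),
                 var1 = c(1, 2, 3), var2 = c(3, 1, 2), var3 = c(4, 2, 1),
                 value1 = NA, value2 = NA)
df.upd <- data.frame(var1 = c(1, 2), var2 = c(3, 1), var3 = c(4, 2),
                     value1 = c(0.5, 0.6), value2 = c(12, 20))

keys <- c("var1", "var2", "var3")  # columns to match on (assumed names)
vals <- c("value1", "value2")      # columns to update

# merge() may reorder rows, so remember the original order first.
df$row_id <- seq_len(nrow(df))

# Left join: every row of df is kept; columns coming from df.upd
# that clash with df get the suffix ".new".
m <- merge(df, df.upd, by = keys, all.x = TRUE, suffixes = c("", ".new"))

# Take the new value where a match was found, otherwise keep the old one.
for (v in vals) {
  new <- m[[paste0(v, ".new")]]
  m[[v]] <- ifelse(is.na(new), m[[v]], new)
}

# Restore the original row order and column layout, dropping helpers.
m <- m[order(m$row_id), ]
df$row_id <- NULL
result <- m[, names(df)]
rownames(result) <- NULL
```

Two caveats with this sketch: an NA in df.upd never overwrites an existing value, and duplicated key combinations in df.upd would duplicate rows in the result.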

1 Answer

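It's a for-loop answer, but the loop runs over columns rather than rows, and each assignment is vectorized, so it might still be useful and fast. Note that it copies df.upd into the first rows of df by position and does not match on var1, var2 and var3, so it only gives the desired result when the rows of df.upd line up with the first rows of df, as they do in the example.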

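# Columns present in both data frames
ind <- intersect(names(df), names(df.upd))
for (i in ind) {
  # Overwrite the first nrow(df.upd) rows of df, column by column
  df[seq_len(nrow(df.upd)), i] <- df.upd[, i]
}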
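
If the rows of df.upd are not guaranteed to line up with the first rows of df, the same column loop can be combined with a key lookup. The following is a minimal sketch, not the answer's original method. It assumes that each key combination occurs at most once in df and that the separator "|" never appears in the key values. The names keys, vals, key_df, key_upd, idx and hit are illustrative.

```r
# Example data from the question; df.upd is deliberately reversed
# to show that row order does not matter here.
df <- data.frame(text = c("A", "B", "C"),
                 var1 = c(1, 2, 3), var2 = c(3, 1, 2), var3 = c(4, 2, 1),
                 value1 = NA, value2 = NA)
df.upd <- data.frame(var1 = c(2, 1), var2 = c(1, 3), var3 = c(2, 4),
                     value1 = c(0.6, 0.5), value2 = c(20, 12))

keys <- c("var1", "var2", "var3")
vals <- setdiff(intersect(names(df), names(df.upd)), keys)

# Build one string key per row from the key columns.
key_df  <- do.call(paste, c(df[keys], sep = "|"))
key_upd <- do.call(paste, c(df.upd[keys], sep = "|"))

# For each row of df.upd, the row of df it matches (NA if none).
idx <- match(key_upd, key_df)
hit <- !is.na(idx)

# Loop over the value columns only; each assignment is vectorized.
for (v in vals) {
  df[idx[hit], v] <- df.upd[hit, v]
}
```

Because match() does a single hashed lookup over the key vectors, this should cope with a master table of around 30k rows, although that has not been measured here.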