I'm trying to replace values in myDF1 with values from myDF2 where rows match on the column "studyno", but the solutions I have found so far don't give me the desired output.

Below are the data.frames:

myDF1 <- structure(list(studyno = c("J1000/9", "J1000/9", "J1000/9", "J1000/9", 
"J1000/9", "J1000/9"), date = structure(c(17123, 17127, 17135, 
17144, 17148, 17155), class = "Date"), pf_mcl = c(NA_integer_, 
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
), year = c(2016, 2016, 2016, 2016, 2016, 2016)), .Names = c("studyno", 
"date", "pf_mcl", "year"), row.names = c(NA, 6L), class = "data.frame")

myDF2 <- structure(list(studyno = c("J740/4", "J1000/9", "J895/7", "J931/6", 
"J609/1", "J941/3"), pf_mcl = c(0L, 0L, 0L, 0L, 0L, 0L)), .Names = c("studyno", 
"pf_mcl"), row.names = c(NA, 6L), class = "data.frame")

One solution I tried, shown below, seemed to work. However, I find that whatever values were in myDF1 before have been removed.

myDF1$pf_mcl <- myDF2$pf_mcl[match(myDF1$studyno, myDF2$studyno)]
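The likely reason the line above wipes existing values: match() returns NA for every studyno in myDF1 that has no counterpart in myDF2, and indexing with NA yields NA, so unmatched rows are overwritten with NA. A minimal sketch of restricting the assignment to matched rows, using toy data (df1, df2 and their values are made up for illustration, not taken from the question):

```r
# Toy data (hypothetical values, not the asker's real data)
df1 <- data.frame(studyno = c("A", "B", "C"),
                  pf_mcl  = c(NA, 5L, 7L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(studyno = c("A", "C"),
                  pf_mcl  = c(0L, 1L),
                  stringsAsFactors = FALSE)

idx <- match(df1$studyno, df2$studyno)  # NA where df1 has no counterpart in df2
hit <- !is.na(idx)                      # TRUE for rows of df1 that do match

# Overwrite only the matched rows; unmatched rows keep their original value
df1$pf_mcl[hit] <- df2$pf_mcl[idx[hit]]
```

Here row "B" has no match in df2 and keeps its value of 5, while "A" and "C" take the values from df2. This assumes studyno is unique in df2; match() only returns the first match.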
  • Can you clarify the output you want, & how your proposed solution differs? It seems to me that if you want to "replace values in myDF1 from myDF2", then the "values [that] were in myDF1 before" should "have been removed", so I think I'm missing something. Commented Oct 13, 2017 at 16:43
  • You should look into the merge function. Commented Oct 13, 2017 at 16:53
  • Hi @gung, sorry for not being clear. myDF2 is a subset of myDF1; however, myDF2 is better curated than myDF1. I have found that some rows in myDF1 have missing values, so I am looking for a match in myDF2 and updating those values in myDF1. However, I don't want to lose the values in rows that don't match, which is what the script I posted was doing. Let me know if I need to add more detail. Commented Oct 13, 2017 at 16:55
  • Hi, @Kelli-Jean, an example please. I have seen some solutions with the merge function and still wasn't getting the right output. Commented Oct 13, 2017 at 16:57

1 Answer

# Merge myDF1 and myDF2 by "studyno", keeping all the rows in myDF1
agg_df = merge(myDF1, myDF2, by = "studyno", all.x = TRUE)
# Populate pf_mcl in the merged data frame: use pf_mcl from myDF2 where it is available;
# otherwise (no matching row in myDF2) keep pf_mcl from myDF1
agg_df$pf_mcl = ifelse(is.na(agg_df$pf_mcl.y), agg_df$pf_mcl.x, agg_df$pf_mcl.y)
# Keep only the original columns of myDF1, in their original order
myDF1 = agg_df[, names(myDF1)]
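One thing to be aware of: merge() sorts the result on the key column by default, so the rows of myDF1 may come back in a different order. If the original order matters, a possible sketch is to carry a row index through the merge and sort on it afterwards. The toy data and the helper column .row below are hypothetical, used only for illustration:

```r
# Toy data (hypothetical values); note the keys are deliberately unsorted
df1 <- data.frame(studyno = c("C", "A", "B"),
                  pf_mcl  = c(7L, NA, 5L),
                  stringsAsFactors = FALSE)
df2 <- data.frame(studyno = c("A", "C"),
                  pf_mcl  = c(0L, 1L),
                  stringsAsFactors = FALSE)

df1$.row <- seq_len(nrow(df1))                      # remember the original row order
m <- merge(df1, df2, by = "studyno", all.x = TRUE)  # result is sorted by studyno

# Prefer the value from df2 (pf_mcl.y); fall back to df1 (pf_mcl.x) when there is no match
m$pf_mcl <- ifelse(is.na(m$pf_mcl.y), m$pf_mcl.x, m$pf_mcl.y)

m <- m[order(m$.row), ]                             # restore the original order
df1 <- m[, c("studyno", "pf_mcl")]
rownames(df1) <- NULL
```

After this, df1 is back in the order C, A, B, with "C" and "A" updated from df2 and "B" left unchanged.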

3 Comments

Hi @Kelli-Jean, thanks for the solution, and pardon my explanation. Let me elaborate further. As I mentioned earlier, myDF2 is a well-curated subset of myDF1. Some rows in the two datasets match based on "studyno", and for those rows the values in myDF1$pf_mcl may be missing or wrong. All I want to do is identify a matching row in myDF2 and populate myDF1$pf_mcl with the value in myDF2$pf_mcl. If a row does not match, the value should remain the same. I don't know whether it's worth mentioning, but the two data frames have other columns; I have selected a few for example purposes.
@K.Wamae I updated my answer. If this is still not the answer you are expecting, can you provide a data set that has records where your solution is not working? And the expected output. Thanks!
Dear @Kelli-Jean, I have tested it and it works perfectly. Thank you big time for the solution...
