
I have the following dataframes:

db1 = data.frame(name = c('a', 'b', 'c', 'd'), age = c('10', '20', '30', '40'), tier = NA)
db2 = data.frame(name = c('a', 'a', 'c', 'b'), age = c('10', '10', '30', '20'), tier = c('1', '3', '4', '2'))

I want to copy the tier values from db2 into db1's tier column wherever both name and age match.

I can do this with a for loop, but with thousands of rows it takes far too long. Is there a faster way?

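# Nested loop: compares every row of db1 with every row of db2 (O(n * m)).
# If several db2 rows match, the last one wins.
for (i in seq_len(nrow(db1))) {
  for (j in seq_len(nrow(db2))) {
    if (db1$name[i] == db2$name[j] && db1$age[i] == db2$age[j]) {
      db1$tier[i] <- db2$tier[j]
    }
  }
}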
  • Drop the tier column from db1 (db1$tier <- NULL). Then this is a simple merge: merge(db1, db2), or, more explicitly, merge(db1, db2, by = c('name', 'age')). Commented Jun 9, 2021 at 13:32
  • This works, thank you. If you post an answer, I'll accept it. Commented Jun 9, 2021 at 13:38
  • What should happen if a key matches twice, as with name=a and age=10 in your case? Take the first? Commented Jun 9, 2021 at 13:43
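Before deciding between the first and the last match, it can help to see which keys are actually duplicated in db2. A minimal base R sketch (the names `keys` and `dup_rows` are just illustrative):

```r
db2 <- data.frame(name = c("a", "a", "c", "b"),
                  age  = c("10", "10", "30", "20"),
                  tier = c("1", "3", "4", "2"),
                  stringsAsFactors = FALSE)

# duplicated() flags repeats scanning forward; fromLast = TRUE scans backward.
# OR-ing both marks every row that belongs to a duplicated (name, age) group.
keys <- db2[c("name", "age")]
dup_rows <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
db2[dup_rows, ]
```

Here only the two a/10 rows are flagged, so that is the only key where "first" and "last" give different answers.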

4 Answers


If taking the first match when a key matches several times is acceptable (your code takes the last), you can use match(), together with interaction() to match on several columns at once.

db1$tier <- db2$tier[match(interaction(db1[c("name","age")]),
                           interaction(db2[c("name","age")]))]
db1
#  name age tier
#1    a  10    1
#2    b  20    2
#3    c  30    4
#4    d  40 <NA>
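One caveat: interaction() builds its labels by pasting the key columns together with sep = "." by default, so two different pairs can collapse into the same label when the values themselves contain a dot (name "a.1" with age "0" and name "a" with age "1.0" both become "a.1.0"). A sketch that passes a separator assumed never to appear in the data ("\r" here is an arbitrary choice, not a requirement):

```r
db1 <- data.frame(name = c("a", "b", "c", "d"),
                  age  = c("10", "20", "30", "40"),
                  tier = NA,
                  stringsAsFactors = FALSE)
db2 <- data.frame(name = c("a", "a", "c", "b"),
                  age  = c("10", "10", "30", "20"),
                  tier = c("1", "3", "4", "2"),
                  stringsAsFactors = FALSE)

# With the default sep = ".", these two different pairs get the same label.
collide <- as.character(interaction(list(c("a.1", "a"), c("0", "1.0"))))
# With a separator that does not occur in the data, they stay distinct.
safe <- as.character(interaction(list(c("a.1", "a"), c("0", "1.0")), sep = "\r"))

# Same first-match lookup as above, using the safer separator.
key1 <- interaction(db1[c("name", "age")], sep = "\r")
key2 <- interaction(db2[c("name", "age")], sep = "\r")
db1$tier <- db2$tier[match(key1, key2)]  # NA where no db2 row matches
db1
```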

Or, to take the last match (as your loop does), additionally use rev():

db1$tier <- rev(db2$tier)[match(interaction(db1[c("name","age")]),
                                rev(interaction(db2[c("name","age")])))]
db1
#  name age tier
#1    a  10    3
#2    b  20    2
#3    c  30    4
#4    d  40 <NA>
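Both variants can be wrapped in one small function. This is a sketch only: `fill_by_keys` and its arguments are hypothetical names, not part of base R or any package. It reuses the match()/interaction() approach, with a `last` flag choosing which duplicate wins:

```r
# Hypothetical helper: copy column `value` from `lookup` into `target`
# wherever all `keys` columns match. last = FALSE takes the first match,
# last = TRUE reproduces the original loop (last match wins).
fill_by_keys <- function(target, lookup, keys, value, last = FALSE) {
  key_t <- interaction(target[keys], sep = "\r")
  key_l <- interaction(lookup[keys], sep = "\r")
  vals  <- lookup[[value]]
  if (last) {
    # Reversing the lookup makes match() hit the last occurrence first.
    key_l <- rev(key_l)
    vals  <- rev(vals)
  }
  # match() compares factors as character and returns NA for no match.
  target[[value]] <- vals[match(key_t, key_l)]
  target
}

db1 <- data.frame(name = c("a", "b", "c", "d"),
                  age  = c("10", "20", "30", "40"),
                  tier = NA,
                  stringsAsFactors = FALSE)
db2 <- data.frame(name = c("a", "a", "c", "b"),
                  age  = c("10", "10", "30", "20"),
                  tier = c("1", "3", "4", "2"),
                  stringsAsFactors = FALSE)

first <- fill_by_keys(db1, db2, c("name", "age"), "tier")
last  <- fill_by_keys(db1, db2, c("name", "age"), "tier", last = TRUE)
```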

1 Comment

Thanks! I haven't come across interaction() before.

Drop the tier column and use merge():

db1$tier <- NULL
merge(db1, db2)

#  name age tier
#1    a  10    1
#2    a  10    3
#3    b  20    2
#4    c  30    4

If you want d (which has no match in db2) in the final output, use all.x = TRUE:

merge(db1, db2, all.x = TRUE)

#  name age tier
#1    a  10    1
#2    a  10    3
#3    b  20    2
#4    c  30    4
#5    d  40 <NA>
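Two things to watch with merge(): the result is sorted by the key columns rather than kept in db1's order, and a key that matches several db2 rows yields several output rows (a/10 appears twice above). A sketch that restores db1's row order and counts the extra rows; the temporary column name `.row` is an arbitrary choice for this example:

```r
db1 <- data.frame(name = c("a", "b", "c", "d"),
                  age  = c("10", "20", "30", "40"),
                  stringsAsFactors = FALSE)  # tier already dropped
db2 <- data.frame(name = c("a", "a", "c", "b"),
                  age  = c("10", "10", "30", "20"),
                  tier = c("1", "3", "4", "2"),
                  stringsAsFactors = FALSE)

# Remember db1's original row order in a temporary column.
db1$.row <- seq_len(nrow(db1))
merged <- merge(db1, db2, by = c("name", "age"), all.x = TRUE)

# merge() sorts by the key columns; put rows back in db1's order.
merged <- merged[order(merged$.row), ]

# A positive count means some name/age pairs matched several db2 rows.
n_extra <- nrow(merged) - nrow(db1)

merged$.row <- NULL
rownames(merged) <- NULL
merged
```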


You can use merge() plus duplicated() to keep only the first match for each name/age pair:

subset(
  merge(db1, db2, by = c("name", "age"), all.x = TRUE),
  !duplicated(cbind(name, age)),
  select = -tier.x
)

which gives you

  name age tier.y
1    a  10      1
3    b  20      2
4    c  30      4
5    d  40   <NA>
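An alternative sketch with the same result: deduplicate db2 first and drop tier from db1, so merge() needs no suffixes and the column stays named tier. Using fromLast = TRUE in duplicated() would instead keep the last match, as the original loop does:

```r
db1 <- data.frame(name = c("a", "b", "c", "d"),
                  age  = c("10", "20", "30", "40"),
                  tier = NA,
                  stringsAsFactors = FALSE)
db2 <- data.frame(name = c("a", "a", "c", "b"),
                  age  = c("10", "10", "30", "20"),
                  tier = c("1", "3", "4", "2"),
                  stringsAsFactors = FALSE)

db1$tier <- NULL  # avoid tier.x / tier.y suffixes

# Keep only the first db2 row for each (name, age) key.
db2_first <- db2[!duplicated(db2[c("name", "age")]), ]

# all.x = TRUE keeps db1 rows without a match (d gets NA).
result <- merge(db1, db2_first, by = c("name", "age"), all.x = TRUE)
result
```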


This is a simple join.

library(dplyr)
db3 <- full_join(db2, db1, by = c("name" = "name", "age" = "age"), suffix = c("", ".x"))

  name age tier tier.x
1    a  10    1     NA
2    a  10    3     NA
3    c  30    4     NA
4    b  20    2     NA
5    d  40 <NA>     NA

I am assuming you want the tier values from db2 shown. Since db1's tier column is all NA, you could also drop it before the join; here the leftover tier.x column is dropped afterwards:

db3$tier.x = NULL
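Note that full_join() would also keep any db2 rows with no counterpart in db1. If the goal is only to fill db1, a left join keeps exactly db1's rows. A sketch, assuming dplyr is installed; distinct() with .keep_all = TRUE keeps the first db2 row per name/age pair, so duplicated keys do not add rows:

```r
library(dplyr)

db1 <- data.frame(name = c("a", "b", "c", "d"),
                  age  = c("10", "20", "30", "40"),
                  tier = NA,
                  stringsAsFactors = FALSE)
db2 <- data.frame(name = c("a", "a", "c", "b"),
                  age  = c("10", "10", "30", "20"),
                  tier = c("1", "3", "4", "2"),
                  stringsAsFactors = FALSE)

db3 <- db1 %>%
  select(-tier) %>%                               # drop the all-NA column
  left_join(
    distinct(db2, name, age, .keep_all = TRUE),   # first db2 row per key
    by = c("name", "age")
  )
db3
```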
