Replace values in data frame based on other data frame in R

Question

In the example below, userids is my reference data frame and userdata is the data frame where the replacements should take place.

> userids <- data.frame(USER=c('Ann','Jim','Lee','Bob'),ID=c(1,2,3,4))
> userids
  USER ID
1  Ann  1
2  Jim  2
3  Lee  3
4  Bob  4

> userdata <- data.frame(INFO=c('foo','bar','foo','bar'), ID=c('Bob','Jim','Ann','Lee'),AGE=c('43','33','53','26'), FRIENDID=c('Ann',NA,'Lee','Jim'))
> userdata
  INFO  ID AGE FRIENDID
1  foo Bob  43      Ann
2  bar Jim  33       NA
3  foo Ann  53      Lee
4  bar Lee  26      Jim

How do I replace ID and FRIENDID in userdata with the ID corresponding to USER in userids?

The desired output:

  INFO  ID AGE FRIENDID
1  foo   4  43        1
2  bar   2  33       NA
3  foo   1  53        3
4  bar   3  26        2

What do you mean by "correct"? Do you want to match userids$USER to userdata$ID?
– Richie Cotton, Commented Feb 25, 2013 at 15:10
@Robert, it'd help to have the desired output (to avoid these confusions, for the next time).
– Arun, Commented Feb 25, 2013 at 15:15

Arun · Accepted Answer · 2013-02-25 15:08:46Z

26

Use match:

userdata$ID <- userids$ID[match(userdata$ID, userids$USER)]
userdata$FRIENDID <- userids$ID[match(userdata$FRIENDID, userids$USER)]
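
To see why this works, here is a small self-contained sketch using the data from the question. It only uses base R; the intermediate variable `pos` is introduced here for illustration and is not part of the original answer.

```r
# Data from the question
userids <- data.frame(USER = c("Ann", "Jim", "Lee", "Bob"), ID = c(1, 2, 3, 4))
userdata <- data.frame(INFO = c("foo", "bar", "foo", "bar"),
                       ID = c("Bob", "Jim", "Ann", "Lee"),
                       AGE = c("43", "33", "53", "26"),
                       FRIENDID = c("Ann", NA, "Lee", "Jim"))

# match() gives, for each name, its position in userids$USER (NA if absent)
pos <- match(userdata$ID, userids$USER)
pos
# [1] 4 2 1 3

# Indexing userids$ID by those positions yields the numeric IDs
userdata$ID <- userids$ID[pos]

# An NA name matches nothing, so it stays NA after the lookup
userdata$FRIENDID <- userids$ID[match(userdata$FRIENDID, userids$USER)]
userdata$FRIENDID
# [1]  1 NA  3  2
```

Because match() returns NA for anything it cannot find, names missing from userids also come out as NA rather than raising an error.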

answered Feb 25, 2013 at 15:08

Tyler Rinker · Accepted Answer · 2013-02-26 14:57:41Z

2

This is a possibility:

library(qdap)
userdata$FRIENDID <- lookup(userdata$FRIENDID, userids)
userdata$ID <- lookup(userdata$ID, userids)

Or, to win the one-line prize:

userdata[, c(2, 4)] <- lapply(userdata[, c(2, 4)], lookup, key.match=userids)
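
If qdap is not available, the same "apply one lookup to several columns" idea can be sketched in base R with a named vector. This does not use qdap; the names `key` and `to_id` are hypothetical helpers introduced only for this illustration.

```r
# Data from the question
userids <- data.frame(USER = c("Ann", "Jim", "Lee", "Bob"), ID = c(1, 2, 3, 4))
userdata <- data.frame(INFO = c("foo", "bar", "foo", "bar"),
                       ID = c("Bob", "Jim", "Ann", "Lee"),
                       AGE = c("43", "33", "53", "26"),
                       FRIENDID = c("Ann", NA, "Lee", "Jim"))

# Named vector acting as the lookup table: names are users, values are IDs
key <- setNames(userids$ID, as.character(userids$USER))

# Hypothetical helper (not a qdap function): look up each name; NA stays NA
to_id <- function(x) unname(key[as.character(x)])

# Apply the same lookup to both columns in one step
cols <- c("ID", "FRIENDID")
userdata[cols] <- lapply(userdata[cols], to_id)
userdata
```

Selecting the columns by name rather than by position (2 and 4) keeps the call working if the column order changes.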

edited Feb 26, 2013 at 14:57

answered Feb 26, 2013 at 7:41

7 Comments

N8TRO Over a year ago

qdap looks pretty great, but I'm not seeing it in my repositories.

Tyler Rinker Over a year ago

Not sure why. Maybe it's because it's a newer release. Try install.packages("qdap"), or for the development version you could use: library(devtools); install_github("qdap", "trinker")

N8TRO Over a year ago

Failed. ERROR: dependency 'openNLP' is not available for package 'qdap'

Tyler Rinker Over a year ago

What OS are you using? If mac you have to compile from source. See this for details: trinker.github.com/qdap_install/installation

Tyler Rinker Over a year ago

@agstudy. Missed that. You're correct. I edited to reflect this.

agstudy · Accepted Answer · 2013-02-25 17:13:13Z

0

Here is an attempt using sqldf to get the result with multiple joins on different columns.

  library(sqldf)
  sqldf('SELECT d.INFO, d.AGE, i2.ID AS ID, i1.ID AS FRIENDID
         FROM
         userdata d
         INNER JOIN
         userids i1 ON (i1.USER = d.FRIENDID)
         INNER JOIN
         userids i2 ON (i2.USER = d.ID)')

  INFO AGE ID FRIENDID
1  foo  43  4        1
2  foo  53  1        3
3  bar  26  3        2

But this removes the rows with NA! Maybe someone can suggest how to deal with NA.

EDIT

Thanks to G. Grothendieck's comment: replacing INNER with LEFT gives the desired result.

  sqldf('SELECT d.INFO, d.AGE, i2.ID AS ID, i1.ID AS FRIENDID
         FROM
         userdata d
         LEFT JOIN
         userids i1 ON (i1.USER = d.FRIENDID)
         LEFT JOIN
         userids i2 ON (i2.USER = d.ID)')

  INFO AGE ID FRIENDID
1  foo  43  4        1
2  bar  33  2       NA
3  foo  53  1        3
4  bar  26  3        2
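
The difference between the two join types can also be seen without sqldf. The following is a rough base R sketch using merge(), not the method from this answer; `friend_key` and `FRIEND_NUM` are hypothetical names chosen to avoid a clash with userdata$ID.

```r
# Data from the question
userids <- data.frame(USER = c("Ann", "Jim", "Lee", "Bob"), ID = c(1, 2, 3, 4))
userdata <- data.frame(INFO = c("foo", "bar", "foo", "bar"),
                       ID = c("Bob", "Jim", "Ann", "Lee"),
                       AGE = c("43", "33", "53", "26"),
                       FRIENDID = c("Ann", NA, "Lee", "Jim"))

# Lookup table keyed on the friend's name (hypothetical column names)
friend_key <- data.frame(FRIENDID = userids$USER, FRIEND_NUM = userids$ID)

# Default merge behaves like an INNER JOIN: the row with FRIENDID = NA is dropped
inner <- merge(userdata, friend_key, by = "FRIENDID")
nrow(inner)
# [1] 3

# all.x = TRUE behaves like a LEFT JOIN: every row of userdata is kept
left <- merge(userdata, friend_key, by = "FRIENDID", all.x = TRUE)
nrow(left)
# [1] 4
```

Note that merge() may reorder the rows by the join key, so the result is not guaranteed to follow the original row order of userdata.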

edited Feb 25, 2013 at 17:13

answered Feb 25, 2013 at 15:45

1 Comment

G. Grothendieck Over a year ago

Regarding your question: replace the two instances of INNER with LEFT.

Umair Rafique · Accepted Answer · 2017-07-17 18:02:40Z

0

Here's a possible solution, which will also work on datasets with multiple records of each ID, though we will need to coerce the ID and FRIENDID variables to character first:

> userdata$ID <- sapply(userdata$ID, function(x){gsub(x, userids[userids$USER==x, 2], x)})
> userdata$FRIENDID <- sapply(userdata$FRIENDID, function(x){gsub(x, userids[userids$USER==x, 2], x)})
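
One way to read the coercion note is sketched below on a small made-up vector in which one name appears twice. The vector `ids` and the helper `replace_one` are hypothetical and only for illustration; the NA guard and the final as.numeric() step are additions, not part of the answer's code.

```r
userids <- data.frame(USER = c("Ann", "Jim", "Lee", "Bob"), ID = c(1, 2, 3, 4))

# Hypothetical column in which "Bob" appears twice, stored as a factor on purpose
ids <- factor(c("Bob", "Jim", "Bob", NA))

# Step 1: coerce the factor to character before doing any string replacement
ids_chr <- as.character(ids)

# Step 2: replace each name with its ID, leaving NA entries untouched
replace_one <- function(x) {
  if (is.na(x)) return(NA_character_)
  gsub(x, userids[userids$USER == x, 2], x, fixed = TRUE)
}
ids_new <- vapply(ids_chr, replace_one, character(1), USE.NAMES = FALSE)

# Step 3: gsub() returns character, so convert back to numeric if needed
as.numeric(ids_new)
# [1]  4  2  4 NA
```

Using fixed = TRUE treats each name as a literal string rather than a regular expression, which avoids surprises if a name contains special characters.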

answered Jul 17, 2017 at 18:02
