Match and replace values from one dataframe with another

Question

Let's say I have two data frames like the ones below (the real dataset has many more rows and columns):

df = data.frame("Worker" = c("JBB","JDD","MB","JBB"),
                 "Age" = c(4,5,6,4))

df2 = data.frame("Initials" = c("JBB","JDD","MB","JOD"),
                 "Worker" = c("Joe Bloggs/JBB", "Jane Doe/JDD", 
                                "Mr. Big/MB", "John Doe/JOD"))

I would like to replace the Worker column in df with the matching Worker value from df2. More workers will be added to both data frames in the future, so it would be nice to have a quick and easy way to do this, rather than manually doing something like this for each set of initials:

df$Worker <- gsub("JBB", "Joe Bloggs/JBB", df$Worker, perl = TRUE)

Perhaps a loop, or simply some kind of tidyverse replace() solution?
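A loop along those lines is possible. The minimal base R sketch below (not from the original thread) assumes the sample data above and character columns; stringsAsFactors = FALSE matters on R versions before 4.0. It overwrites exact matches instead of calling gsub(), so a value that has already been replaced is never matched again.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps Worker as
# character on R versions before 4.0, where factors were the default
df <- data.frame(Worker = c("JBB", "JDD", "MB", "JBB"),
                 Age = c(4, 5, 6, 4),
                 stringsAsFactors = FALSE)
df2 <- data.frame(Initials = c("JBB", "JDD", "MB", "JOD"),
                  Worker = c("Joe Bloggs/JBB", "Jane Doe/JDD",
                             "Mr. Big/MB", "John Doe/JOD"),
                  stringsAsFactors = FALSE)

# One pass per row of the lookup table: compare whole values with `==`
# (not gsub), so "JBB" is never matched inside "Joe Bloggs/JBB"
for (i in seq_len(nrow(df2))) {
  hit <- df$Worker == df2$Initials[i]
  df$Worker[hit] <- df2$Worker[i]
}

print(df)
```

This does one full comparison per lookup row, so on large tables a join or match() is usually the better fit.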

I have tried various joins, but they don't work for me.

I have also tried:

df %>%
  mutate(new_Worker = case_when(df$Worker == df2$Initials ~ df2$Worker))

This gives errors too.
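The case_when() attempt compares the two columns position by position rather than looking each value up. A small base R illustration with the sample data (assumed to be character columns) shows the difference. When the two data frames have different numbers of rows, `==` also recycles the shorter vector, which is a plausible source of the errors on the real data.

```r
df <- data.frame(Worker = c("JBB", "JDD", "MB", "JBB"),
                 Age = c(4, 5, 6, 4),
                 stringsAsFactors = FALSE)
df2 <- data.frame(Initials = c("JBB", "JDD", "MB", "JOD"),
                  Worker = c("Joe Bloggs/JBB", "Jane Doe/JDD",
                             "Mr. Big/MB", "John Doe/JOD"),
                  stringsAsFactors = FALSE)

# `==` pairs element 1 with element 1, element 2 with element 2, ...;
# row 4 ("JBB") is compared with "JOD" and gives FALSE
positional <- df$Worker == df2$Initials
print(positional)  # TRUE TRUE TRUE FALSE

# match() searches all of df2$Initials for each value of df$Worker and
# returns the position of the first hit (NA if there is none)
idx <- match(df$Worker, df2$Initials)
print(idx)         # 1 2 3 1
print(df2$Worker[idx])
```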

My suggestion is to separate the initials from the actual name and use them for the join, since df presumably has initials, i.e. remove everything before the / and join. – NelsonGon, commented Jul 27, 2022 at 9:21
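If the lookup table only had the combined "Name/INITIALS" column, that suggestion could be sketched in base R as follows. The object name lookup is hypothetical, and the regular expression assumes the initials always follow the last "/".

```r
df <- data.frame(Worker = c("JBB", "JDD", "MB", "JBB"),
                 Age = c(4, 5, 6, 4),
                 stringsAsFactors = FALSE)
# Hypothetical lookup table holding only the combined "Name/INITIALS" labels
lookup <- data.frame(Worker = c("Joe Bloggs/JBB", "Jane Doe/JDD",
                                "Mr. Big/MB", "John Doe/JOD"),
                     stringsAsFactors = FALSE)

# Greedy "^.*/" removes everything up to and including the last "/",
# leaving just the initials to use as the key
lookup$Initials <- sub("^.*/", "", lookup$Worker)

# Swap each set of initials in df for the matching full label
df$Worker <- lookup$Worker[match(df$Worker, lookup$Initials)]
print(df)
```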
Hi @NelsonGon, joins don't really do what I want. I need the "full name/initials" part of the second df added to the first one, something like dplyr::case_when(), but I can't figure out the logic or how best to implement it efficiently. I have edited the question to reflect what I tried with joins. – McMahok, commented Jul 27, 2022 at 9:28
What is wrong with this approach: df %>% rename(Initials = Worker) %>% left_join(df2)? – Mohan Govindasamy, commented Jul 27, 2022 at 10:11
Yes, @MohanGovindasamy, this does in fact also work. As you can see below, I had some errant whitespace that I hadn't picked up. – McMahok, commented Jul 27, 2022 at 10:46

PaulS · Accepted Answer · 2022-07-27 10:11:30Z

3

A possible solution:

library(dplyr)

inner_join(df, df2, by = c("Worker" = "Initials"))

#>   Worker Age       Worker.y
#> 1    JBB   4 Joe Bloggs/JBB
#> 2    JDD   5   Jane Doe/JDD
#> 3     MB   6     Mr. Big/MB
#> 4    JBB   4 Joe Bloggs/JBB
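The join above adds the full label as a second column (Worker.y) rather than replacing Worker, and inner_join() drops any row whose initials have no match in df2. If the aim is to overwrite Worker in place and keep every row, one base R sketch (assuming character columns, as in the sample data) uses match() and leaves unmatched initials as they were:

```r
df <- data.frame(Worker = c("JBB", "JDD", "MB", "JBB"),
                 Age = c(4, 5, 6, 4),
                 stringsAsFactors = FALSE)
df2 <- data.frame(Initials = c("JBB", "JDD", "MB", "JOD"),
                  Worker = c("Joe Bloggs/JBB", "Jane Doe/JDD",
                             "Mr. Big/MB", "John Doe/JOD"),
                  stringsAsFactors = FALSE)

# Row of df2 for each worker in df; NA where the initials are not found
idx <- match(df$Worker, df2$Initials)
found <- !is.na(idx)

# Overwrite only the matched rows; unmatched rows keep their initials
df$Worker[found] <- df2$Worker[idx[found]]
print(df)
```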

answered Jul 27, 2022 at 10:11

PaulS


3 Comments

McMahok Over a year ago

So this works on the dummy dataset provided, but not on the larger one? I can't figure out why; I just get an empty data frame.

PaulS Over a year ago

Without the true data, it is hard to figure out the reason behind that...

McMahok Over a year ago

Sorry, it works! There was an extra bit of whitespace in the data frame. D'oh!
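Stray whitespace is a common reason for keys that look identical but never match. A small sketch with hypothetical padded initials shows the effect, and the fix with base R's trimws() applied to both sides before matching:

```r
df2 <- data.frame(Initials = c("JBB", "JDD", "MB", "JOD"),
                  Worker = c("Joe Bloggs/JBB", "Jane Doe/JDD",
                             "Mr. Big/MB", "John Doe/JOD"),
                  stringsAsFactors = FALSE)

# Hypothetical keys with stray leading/trailing spaces
workers <- c("JBB ", " JDD", "MB", "JBB")

# "JBB " and " JDD" are different strings from "JBB" and "JDD"
before <- match(workers, df2$Initials)
print(before)  # NA NA 3 1

# trimws() removes leading and trailing whitespace on both sides
after <- match(trimws(workers), trimws(df2$Initials))
print(after)   # 1 2 3 1
```

Trimming the key columns the same way before calling a join should have the same effect.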

mrjoh3 · Accepted Answer · 2022-07-27 10:32:28Z

0

A simple solution is to rename the Worker column in df as you do the join:

library(dplyr)

left_join(rename(df, Initials = Worker),
          df2, by = "Initials")

This results in a data.frame with the columns Initials, Age and Worker. It also assumes that df is the data and df2 is the lookup table.

I don't think you would use case_when() for this example, since there are presumably a large number of initials and each would need its own condition.
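With many initials, a lookup table avoids writing one case_when() condition per worker. A named character vector is a compact base R way to express that. This is a sketch assuming the sample data; unmatched initials become NA.

```r
df <- data.frame(Worker = c("JBB", "JDD", "MB", "JBB"),
                 Age = c(4, 5, 6, 4),
                 stringsAsFactors = FALSE)
df2 <- data.frame(Initials = c("JBB", "JDD", "MB", "JOD"),
                  Worker = c("Joe Bloggs/JBB", "Jane Doe/JDD",
                             "Mr. Big/MB", "John Doe/JOD"),
                  stringsAsFactors = FALSE)

# Names are the initials, values are the full "Name/INITIALS" labels
lookup <- setNames(df2$Worker, df2$Initials)

# Indexing by name returns one label per row of df (NA if not found);
# unname() drops the names carried over from the lookup vector
df$Worker <- unname(lookup[df$Worker])
print(df)
```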

Another option is to filter and pull the values from df2 for each worker:

library(dplyr)
library(purrr)

df |>
  mutate(Worker = map_chr(Worker,
                          ~ filter(df2, Initials == .x) |>
                            pull(Worker)))

The map_chr() is needed because the filter has to be run once for each value of Worker; otherwise the nested filter does not work.
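map_chr() requires each call to return exactly one string, so it stops with an error when a set of initials has no match (or several) in df2. A base R analogue with vapply() that returns NA in those cases instead might look like this; the NA fallback is a design choice for this sketch, not part of the original answer.

```r
df <- data.frame(Worker = c("JBB", "JDD", "MB", "JBB"),
                 Age = c(4, 5, 6, 4),
                 stringsAsFactors = FALSE)
df2 <- data.frame(Initials = c("JBB", "JDD", "MB", "JOD"),
                  Worker = c("Joe Bloggs/JBB", "Jane Doe/JDD",
                             "Mr. Big/MB", "John Doe/JOD"),
                  stringsAsFactors = FALSE)

df$Worker <- vapply(df$Worker, function(w) {
  # All labels whose initials equal this worker's initials
  hit <- df2$Worker[df2$Initials == w]
  # Accept a single unambiguous match; otherwise fall back to NA
  if (length(hit) == 1) hit else NA_character_
}, character(1), USE.NAMES = FALSE)
print(df)
```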

answered Jul 27, 2022 at 10:32

mrjoh3
