1

Let's say I have two dataframes like the ones below (the real dataset has many more rows and columns):

df = data.frame("Worker" = c("JBB","JDD","MB","JBB"),
                 "Age" = c(4,5,6,4))

df2 = data.frame("Initials" = c("JBB","JDD","MB","JOD"),
                 "Worker" = c("Joe Bloggs/JBB", "Jane Doe/JDD", 
                                "Mr. Big/MB", "John Doe/JOD"))

I would like to replace the Worker column in df with the Worker column from df2. In the future, more workers will be added to both dataframes, so it would be nice to have a quick and easy way to do this, rather than manually doing something like this for each set of initials:

df$Worker<-paste(gsub("JBB", "Joe Bloggs/JBB", df$Worker, perl=TRUE))

Perhaps a loop, or simply some kind of tidyverse::replace solution?

I have tried various joins, but they don't work for me.

I have also tried:

df %>%
mutate(new_Worker = case_when(df$Worker == df2$Initials ~ df2$Worker)

This gives errors too.

4
  • 2
    My suggestion is to separate the initials from the full name and use them for the join, since I presume df has only the initials, i.e. remove everything before the / and then join. Commented Jul 27, 2022 at 9:21
  • 1
    Hi @NelsonGon, joins don't really do what I want. I need the "full name/initials" part of the second df added to the first one. Something like tidyverse::case_when, but I just can't figure out the logic or how best to implement it efficiently. I have edited the question to reflect that I tried joins. Commented Jul 27, 2022 at 9:28
  • 2
    What is wrong with this approach: df %>% rename(Initials = Worker) %>% left_join(df2)? Commented Jul 27, 2022 at 10:11
  • Yes @MohanGovindasamy, this does in fact also work. If you see below I had some errant whitespace which I hadn't picked up. Commented Jul 27, 2022 at 10:46

2 Answers

3

A possible solution:

library(dplyr)

inner_join(df, df2, by = c("Worker" = "Initials"))

#>   Worker Age       Worker.y
#> 1    JBB   4 Joe Bloggs/JBB
#> 2    JDD   5   Jane Doe/JDD
#> 3     MB   6     Mr. Big/MB
#> 4    JBB   4 Joe Bloggs/JBB
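
If the goal is to overwrite Worker in df rather than end up with an extra Worker.y column, one possible follow-up is to rename the lookup column before joining and then fold it back in. This is only a sketch: the names lookup, FullName and replaced are made up for illustration, and stringsAsFactors = FALSE is there just so the columns are character on older R versions.

```r
library(dplyr)

df <- data.frame(Worker = c("JBB", "JDD", "MB", "JBB"),
                 Age = c(4, 5, 6, 4),
                 stringsAsFactors = FALSE)
df2 <- data.frame(Initials = c("JBB", "JDD", "MB", "JOD"),
                  Worker = c("Joe Bloggs/JBB", "Jane Doe/JDD",
                             "Mr. Big/MB", "John Doe/JOD"),
                  stringsAsFactors = FALSE)

# Rename the lookup's Worker column first so the join does not add a suffix
lookup <- rename(df2, FullName = Worker)

replaced <- df %>%
  left_join(lookup, by = c("Worker" = "Initials")) %>%
  # Use the full name where one was found; otherwise keep the initials
  mutate(Worker = coalesce(FullName, Worker)) %>%
  select(-FullName)
```

Using left_join with coalesce keeps rows whose initials are missing from df2, whereas inner_join would drop them.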

3 Comments

So this works on the dummy dataset provided but not on the larger one? I can't figure out why; I just get an empty dataframe.
Without the true data, it is hard to figure out the reason behind that...
Sorry, it works. There was an extra bit of whitespace in the dataframe! D'oh.
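
For anyone hitting the same empty result: stray whitespace in the key column is one thing that can be checked before joining. The sketch below uses a made-up df_messy (not the asker's real data) to show how anti_join can list the rows that fail to match and how trimws can clean the key; unmatched, cleaned and joined are illustrative names.

```r
library(dplyr)

# Hypothetical data: two of the initials carry stray whitespace
df_messy <- data.frame(Worker = c("JBB ", " JDD", "MB"),
                       Age = c(4, 5, 6),
                       stringsAsFactors = FALSE)
df2 <- data.frame(Initials = c("JBB", "JDD", "MB", "JOD"),
                  Worker = c("Joe Bloggs/JBB", "Jane Doe/JDD",
                             "Mr. Big/MB", "John Doe/JOD"),
                  stringsAsFactors = FALSE)

# Rows of df_messy that have no partner in df2 (useful for diagnosis)
unmatched <- anti_join(df_messy, df2, by = c("Worker" = "Initials"))

# Strip leading and trailing whitespace from the key, then join
cleaned <- mutate(df_messy, Worker = trimws(Worker))
joined <- inner_join(cleaned, df2, by = c("Worker" = "Initials"))
```

Printing the unmatched keys with dQuote(unmatched$Worker) makes the extra spaces visible.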
0

A simple solution is to rename the Worker column in df as you do the join:

library(dplyr)

left_join(rename(df, Initials = Worker),
          df2, by = "Initials")

This results in a data.frame with the columns 'Initials', 'Age' and 'Worker'. It also assumes that df is the data and that df2 is the lookup list.

I don't think case_when suits this example: presumably there is a large number of Initials, and each one would need its own condition.

The other option is to filter and pull the values from df2:

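library(dplyr)
library(purrr)

# For each value of Worker, find the matching row in df2 and extract its Worker
df |>
  mutate(Worker = map_chr(Worker,
                          ~ filter(df2, Initials == .x) |>
                            pull(Worker)))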

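The map_chr above is needed so that the nested filter runs once per value of Worker (passed in as .x); without it, the nested filter does not work. Note that map_chr expects exactly one string per element, so this errors if any value has no match in df2.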
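
As a base R alternative to the row-by-row filter, the same lookup can be sketched with match(), which is vectorised and needs no packages. This is a swapped-in technique rather than part of the answer above; idx is an illustrative name, and keeping the original initials when there is no match is an assumption about the desired behaviour.

```r
df <- data.frame(Worker = c("JBB", "JDD", "MB", "JBB"),
                 Age = c(4, 5, 6, 4),
                 stringsAsFactors = FALSE)
df2 <- data.frame(Initials = c("JBB", "JDD", "MB", "JOD"),
                  Worker = c("Joe Bloggs/JBB", "Jane Doe/JDD",
                             "Mr. Big/MB", "John Doe/JOD"),
                  stringsAsFactors = FALSE)

# For each value in df$Worker, the position of its first match in
# df2$Initials (NA when there is none)
idx <- match(df$Worker, df2$Initials)

# Assumption: keep the original initials where df2 has no entry
df$Worker <- ifelse(is.na(idx), df$Worker, df2$Worker[idx])
```

Because match() returns only the first matching position, duplicated Initials in df2 would be resolved silently, so it is worth checking that the lookup keys are unique.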
