
I want to create a new column in my data frame based on the values of another column that contains a set of strings. Some of the strings I want to change; the others I want to keep as they are.

To keep the code short, I want to do this with one vector of strings that specifies which strings to change and a second vector with the strings they should be changed into.

I usually do this with dplyr's mutate() and case_when(). In the following code, I change Paul and Barbara to Anna and Fred, respectively, while keeping the other names.

library(dplyr)
library(tibble)

a<-rep(c("Paul", "Barbara","Joey","Iris"),3)
test<-enframe(a)

mutate(test,
  name2 = case_when(
   value == "Paul" ~ "Anna",
   value == "Barbara" ~ "Fred", 
   TRUE ~ value)
)

Because the real dataset is much longer, I would like to use vectors of strings as described above. Using value %in% b works to find the matching rows, but using vector d to supply the replacements throws an error:

b<-c("Paul","Barbara") #only Paul and Barbara need to change
d<-c("Anna","Fred") #they need to change to Anna and Fred

mutate(test,
       name2 = case_when(
           value %in% b ~ d,
           TRUE ~ value)
)

Error in mutate():
! Problem while computing name2 = case_when(value %in% b ~ d, TRUE ~ value).
Caused by error in case_when():
! value %in% b ~ d must be length 12 or one, not 2.
Run rlang::last_error() to see where the error occurred.

I was hoping that when a value matched the second element of b, the second element of d would be used. Clearly it does not work that way: value %in% b returns a vector of 12 TRUE/FALSE values, while d has only 2 elements. Is there any way to work with vectors of strings like this?
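The position-by-position mapping described above is what base R's match() computes: match(value, b) returns, for each value, its index in b (or NA when it is not in b), so d[match(value, b)] picks the replacement at the same position. A minimal sketch, assuming b and d have the same length and order (this is an alternative approach, not the method used in the answer below):

```r
library(dplyr)
library(tibble)

test <- enframe(rep(c("Paul", "Barbara", "Joey", "Iris"), 3))

b <- c("Paul", "Barbara")  # old values
d <- c("Anna", "Fred")     # replacements, in the same order as b

# match() gives each value's position in b, or NA if it is not in b;
# d[NA] is NA, and coalesce() falls back to the original value there
result <- test %>%
  mutate(name2 = coalesce(d[match(value, b)], value))

print(result)
```

Because d[match(value, b)] already has one element per row, it could also serve as the right-hand side inside case_when(), e.g. value %in% b ~ d[match(value, b)].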

Answer

I would do it like this:

lkp <- c("Anna","Fred") %>%
   setNames(c("Paul", "Barbara"))

test %>%
   mutate(name2 = coalesce(lkp[value], value))
# # A tibble: 12 × 3
#     name value   name2
#    <int> <chr>   <chr>
#  1     1 Paul    Anna 
#  2     2 Barbara Fred 
#  3     3 Joey    Joey 
#  4     4 Iris    Iris 
#  5     5 Paul    Anna 
#  6     6 Barbara Fred 
#  7     7 Joey    Joey 
#  8     8 Iris    Iris 
#  9     9 Paul    Anna 
# 10    10 Barbara Fred 
# 11    11 Joey    Joey 
# 12    12 Iris    Iris 

The idea is to create a named vector whose values are the new values and whose names are the old values. Indexing it with lkp[value] is then a simple lookup: any value that is not a name in the lookup vector returns NA, and coalesce() replaces those NAs with the original values.
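To see the two steps separately, here is a small sketch on a plain character vector (the four-element value vector is made up for illustration):

```r
library(dplyr)

lkp <- c(Paul = "Anna", Barbara = "Fred")  # names = old values, values = new values
value <- c("Paul", "Joey", "Barbara", "Iris")

# Step 1: index by name; names missing from lkp return NA.
# unname() drops the names that character indexing carries along.
looked_up <- unname(lkp[value])
# looked_up is c("Anna", NA, "Fred", NA)

# Step 2: coalesce() keeps the first non-NA value at each position
name2 <- coalesce(looked_up, value)
# name2 is c("Anna", "Joey", "Fred", "Iris")
print(name2)
```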


Comment

Many thanks! For my specific table I had to wrap value in as.character(), because my column was in fact a factor.
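The factor issue in this comment is worth spelling out: when a vector is indexed with a factor, R uses the factor's integer codes rather than its labels, so lkp[value] silently looks up the wrong positions. A minimal sketch, using a hypothetical factor version of the example data (test_fct):

```r
library(dplyr)
library(tibble)

# Hypothetical factor version of the example column
test_fct <- tibble(value = factor(rep(c("Paul", "Barbara", "Joey", "Iris"), 3)))
lkp <- c(Paul = "Anna", Barbara = "Fred")

# Pitfall: levels are sorted (Barbara, Iris, Joey, Paul), so indexing with the
# factor uses codes 4, 1, 3, 2 for Paul, Barbara, Joey, Iris, not the names
wrong <- unname(lkp[test_fct$value])
# wrong[1:4] is c(NA, "Anna", NA, "Fred"): Barbara is mapped to "Anna"

# Fix: convert to character before the lookup
# (this also turns the value column itself into character)
result <- test_fct %>%
  mutate(
    value = as.character(value),
    name2 = coalesce(unname(lkp[value]), value)
  )
print(result)
```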
