Please help! I am working with medication data that contain a lot of misspellings. I am trying to replace several values (e.g. "Orange", "orange", "ORANGE", "Orangee") across about 50 columns. The column names all start with "medication" followed by a number; our data are longitudinal, so the same mistakes can appear in the 3-month column, the 6-month column, etc. At the moment I am using this:

df$medication1[df$medication1 %in% c("Orange", "orange", "ORANGE","Orangee")] <- "Orange"

I have copied and pasted the same code, changing the column name each time, but please, please help me do this with a loop or something! We have 6 columns for every time point and 10 time points!

  • What are the column names? Try df %>% mutate(across(yourcols, ~ replace(.x, .x %in% c("Orange", "orange", "ORANGE","Orangee"), "Orange"))) Commented Jun 1, 2022 at 19:13
  • sub("^(orange)e$", "\\1", tolower(df$medication1)) should work Commented Jun 1, 2022 at 19:17
  • @onyambu ... but that might also target other fruit names or strings. Commented Jun 1, 2022 at 19:18
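The exact-match suggestion in the first comment can be made runnable as a minimal sketch. The data frame, the id column and the misspellings vector below are hypothetical stand-ins, and the columns are assumed to be character vectors whose names start with "medication". Because %in% compares whole strings, values such as "Apple" are left alone, which avoids the over-matching concern raised above.

```r
library(dplyr)

# Hypothetical example data; the real column names are assumed
# to start with "medication"
df <- tibble(
    id = 1:4,
    medication1 = c("Orange", "orange", "ORANGE", "Orangee"),
    medication2 = c("Apple", "Orangee", NA, "orange")
)

# Every spelling that should be mapped to "Orange"
misspellings <- c("Orange", "orange", "ORANGE", "Orangee")

cleaned <- df %>%
    mutate(across(
        starts_with("medication"),
        # replace() overwrites only the elements found in `misspellings`;
        # NA is not in the vector, so missing values stay NA
        ~ replace(.x, .x %in% misspellings, "Orange")
    ))
```

Columns that do not start with "medication" (here, id) are passed through unchanged.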

2 Answers

You could use grepl here with a regex pattern:

df$medication1[grepl("(?i)^orangee?$", df$medication1)] <- "Orange"

3 Comments

You are right!!
But that would only replace it in the column medication1. I have around 50 columns; will I have to code it for each individual column? Surely there is a better way to do this, where I can specify all the columns I want this change applied to at once?
You can use dplyr's mutate_all or mutate_at to apply the same function to more than one column.
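Without dplyr, the same multi-column replacement can be sketched in base R. This is only an illustration under stated assumptions: the example data are hypothetical, the columns are assumed to be character (not factor), and case-insensitivity is requested through grepl's ignore.case argument rather than an inline (?i) flag.

```r
# Hypothetical example data
df <- data.frame(
    id = 1:3,
    medication1 = c("orange", "Orangee", "Apple"),
    medication2 = c("ORANGE", NA, "Orange"),
    stringsAsFactors = FALSE
)

# Names of all columns that start with "medication"
med_cols <- grep("^medication", names(df), value = TRUE)

# Apply the same fix to each selected column and write the results back
df[med_cols] <- lapply(df[med_cols], function(x) {
    # grepl() returns FALSE for NA, so missing values are left untouched
    x[grepl("^orangee?$", x, ignore.case = TRUE)] <- "Orange"
    x
})
```

Selecting the columns by name pattern means none of the 50 or so column names has to be typed out.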
Expanding on the previous answer:

library(dplyr)
library(stringr)

df <- tibble(
    medication1 = c("Orange", "orange", "ORANGE","Orangee"),
    medication2 = c("Orange", "orange", "ORANGE","Orangee"),
    medication3 = c("Orange", "orange", "ORANGE","Orangee"))

df
#> # A tibble: 4 x 3
#>   medication1 medication2 medication3
#>   <chr>       <chr>       <chr>      
#> 1 Orange      Orange      Orange     
#> 2 orange      orange      orange     
#> 3 ORANGE      ORANGE      ORANGE     
#> 4 Orangee     Orangee     Orangee

df %>% 
    mutate_all(.funs = ~ str_replace_all(.x, pattern = "(?i)^orangee?$", replacement = "Orange"))
#> # A tibble: 4 x 3
#>   medication1 medication2 medication3
#>   <chr>       <chr>       <chr>      
#> 1 Orange      Orange      Orange     
#> 2 Orange      Orange      Orange     
#> 3 Orange      Orange      Orange     
#> 4 Orange      Orange      Orange

Created on 2022-11-16 with reprex v2.0.2

This applies the same replacement in each column.

EDIT:

To mutate only columns that start with the word medication, you could do the following:

df %>% 
    mutate(across(
        starts_with("medication"), 
        ~ str_replace_all(.x, pattern = "(?i)^orangee?$", replacement = "Orange")
    ))
#> # A tibble: 4 x 3
#>   medication1 medication2 medication3
#>   <chr>       <chr>       <chr>      
#> 1 Orange      Orange      Orange     
#> 2 Orange      Orange      Orange     
#> 3 Orange      Orange      Orange     
#> 4 Orange      Orange      Orange

Created on 2022-11-18 with reprex v2.0.2

2 Comments

Thank you, but in this case I would still have to manually write down all 50 columns. Is there a way to create a loop that takes all columns including the word "medication"?
The mutate_all function changes all columns without having to list any names. The mutate_at function, or mutate combined with across, can select columns starting with a character string: mutate(across(starts_with("medication"), ~ str_replace_all(...))). Does that make sense?
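For columns that merely include the word "medication" rather than start with it, the tidyselect helper contains() can be swapped in for starts_with(). The sketch below assumes a hypothetical column name (month3_medication1) purely for illustration, and uses stringr's regex() with ignore_case = TRUE in place of the inline (?i) flag.

```r
library(dplyr)
library(stringr)

# Hypothetical example data; month3_medication1 is an invented name
# showing a column where "medication" is not at the start
df <- tibble(
    id = 1:2,
    medication1 = c("orange", "Apple"),
    month3_medication1 = c("Orangee", "ORANGE")
)

cleaned <- df %>%
    mutate(across(
        # contains() matches the literal string anywhere in the column name
        contains("medication"),
        ~ str_replace_all(.x, regex("^orangee?$", ignore_case = TRUE), "Orange")
    ))
```

No column names are listed by hand; any column whose name contains "medication" is picked up.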
