Replace several values in different columns at once - R

Question

Please help! I am working with medication data that has a lot of misspellings. I am trying to replace several values (e.g. "Orange", "orange", "ORANGE", "Orangee") across several columns (about 50), all starting with "medication" followed by a number. Our data is longitudinal, so the same mistakes could be in the 3-month column, the 6-month column, etc. At the moment I am using this:

df$medication1[df$medication1 %in% c("Orange", "orange", "ORANGE","Orangee")] <- "Orange"

I have copied and pasted the same code and changed the column name each time, but please help me do this with a loop or something! We have 6 columns for every time point and 10 time points!

What are the column names? Try df %>% mutate(across(yourcols, ~ replace(.x, .x %in% c("Orange", "orange", "ORANGE","Orangee"), "Orange")))
– akrun, Commented Jun 1, 2022 at 19:13
sub("^(orange)e$", "\\1", tolower(df$medication1)) should work
– Onyambu, Commented Jun 1, 2022 at 19:17
@onyambu ... but that might also target other fruit names or strings.
– Tim Biegeleisen, Commented Jun 1, 2022 at 19:18

Tim Biegeleisen · Accepted Answer · 2022-06-01 19:16:06Z

2

You could use grepl here with a regex pattern:

df$medication1[grepl("(?i)^orangee?$", df$medication1)] <- "Orange"
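To apply the same replacement to every medication column in base R, one option is to pick the columns by name and loop over them with lapply. This is a minimal sketch, assuming the columns are character vectors whose names all begin with "medication". The example data frame and the name `med_cols` are illustrative and not from the question, and case-insensitivity is passed through `ignore.case = TRUE` rather than the inline `(?i)` flag.

```r
# Illustrative data; the real data frame has about 50 such columns
df <- data.frame(
  id = 1:4,
  medication1 = c("Orange", "orange", "ORANGE", "Orangee"),
  medication2 = c("Apple", "Orangee", "orange", NA),
  stringsAsFactors = FALSE
)

# Names of all columns that start with "medication"
med_cols <- grep("^medication", names(df), value = TRUE)

# Replace every case-insensitive match of "orange"/"orangee" in each selected column
df[med_cols] <- lapply(df[med_cols], function(x) {
  x[grepl("^orangee?$", x, ignore.case = TRUE)] <- "Orange"
  x
})
```

Values that do not match the pattern, including NA, are left unchanged, and columns outside `med_cols` are not touched.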



3 Comments

Onyambu Over a year ago

You are right!!

NoobR Over a year ago

But that would only replace it in the column medication1. I have around 50 columns; will I have to code it for each individual column? Surely there is a better way to do this, where I can specify all the columns I want this change to happen in at once?

nd37255 Over a year ago

You can use dplyr's mutate_all or mutate_at to apply the same function to more than 1 column

nd37255 · Accepted Answer · 2022-11-18 16:14:37Z

0

Expanding on the previous answer:

library(dplyr)
library(stringr)

df <- tibble(
    medication1 = c("Orange", "orange", "ORANGE","Orangee"),
    medication2 = c("Orange", "orange", "ORANGE","Orangee"),
    medication3 = c("Orange", "orange", "ORANGE","Orangee"))

df
#> # A tibble: 4 x 3
#>   medication1 medication2 medication3
#>   <chr>       <chr>       <chr>      
#> 1 Orange      Orange      Orange     
#> 2 orange      orange      orange     
#> 3 ORANGE      ORANGE      ORANGE     
#> 4 Orangee     Orangee     Orangee

df %>% 
    mutate_all(.funs = ~ str_replace_all(.x, pattern = "(?i)^orangee?$", replacement = "Orange"))
#> # A tibble: 4 x 3
#>   medication1 medication2 medication3
#>   <chr>       <chr>       <chr>      
#> 1 Orange      Orange      Orange     
#> 2 Orange      Orange      Orange     
#> 3 Orange      Orange      Orange     
#> 4 Orange      Orange      Orange

Created on 2022-11-16 with reprex v2.0.2

This applies the same replacement in each column.

EDIT:

To mutate only columns that start with the word medication, you could do the following:

df %>% 
    mutate(across(
        starts_with("medication"), 
        ~ str_replace_all(.x, pattern = "(?i)^orangee?$", replacement = "Orange")
    ))
#> # A tibble: 4 x 3
#>   medication1 medication2 medication3
#>   <chr>       <chr>       <chr>      
#> 1 Orange      Orange      Orange     
#> 2 Orange      Orange      Orange     
#> 3 Orange      Orange      Orange     
#> 4 Orange      Orange      Orange

Created on 2022-11-18 with reprex v2.0.2

edited Nov 18, 2022 at 16:14

answered Nov 16, 2022 at 13:34

nd37255


2 Comments

NoobR Over a year ago

Thank you, but in this case I would still have to manually write down all 50 columns. Is there a way to create a loop that takes all columns including the word "medication"?

nd37255 Over a year ago

The mutate_all function would change all columns without having to list any names. The mutate_at function, or mutate with across, can select columns starting with a character string: mutate(across(starts_with("medication"), ~ str_replace_all(...))). Does that make sense?
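If "medication" appears somewhere other than the start of the column names, tidyselect's contains() can be swapped in for starts_with(). The sketch below assumes hypothetical column names such as `m3_medication1`; only the "medication" part comes from the question. The exact-match `%in%` replacement follows akrun's comment on the question, and `variants` and `cleaned` are illustrative names.

```r
library(dplyr)

# Hypothetical column names; only the "medication" part is from the question
df <- tibble(
  id = 1:3,
  m3_medication1 = c("orange", "Apple", "ORANGE"),
  m6_medication1 = c("Orangee", "Orange", NA)
)

# Spellings to collapse into the single value "Orange"
variants <- c("Orange", "orange", "ORANGE", "Orangee")

# Apply the replacement to every column whose name contains "medication"
cleaned <- df %>%
  mutate(across(
    contains("medication"),
    ~ replace(.x, .x %in% variants, "Orange")
  ))
```

No column names are typed out by hand here. Matching on an explicit list of spellings, rather than a regex, avoids accidentally rewriting other values that happen to fit a pattern.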
