1

I have a dataframe of patient IDs and their various medications. Their medications are each listed in the format "name,code,class,dose" in one string separated by commas. Code for a sample dataframe below:

id <- c(1,2,3)
med1 <- c("Penicillin,1,Antibiotic,5mg","Fluoxetine,5,Antidepressant,20mg","Olanzapine,9,Antipsychotic,2mg")
med2 <- c("Aspirin,4,Blood thinner,81mg",NA,"Lisinopril,3,Antihypertensive,10mg")
med3 <- c(NA,NA,"Amlodipine,2,Antihypertensive,5mg")
meds <- data.frame(id, med1, med2, med3)
meds

enter image description here

However, I want to separate these out by each category, so that it would look like:

meds2 <- meds %>% separate(med1, c('med_name1', 'med_code1', "med_class1", "med_dose1"), sep=",") %>% separate(med2, c('med_name2', 'med_code2', "med_class2", "med_dose2"), sep=",") %>% separate(med3, c('med_name3', 'med_code3', "med_class3", "med_dose3"), sep=",")
meds2

enter image description here

I am able to do this with the code I included, but I have to manually write out the separate function for every single medication. In my actual dataset, some patients have over 100 medications, so I would not be able to do this. I tried to write a for loop using the separate function but cannot get it to work:

meds <- for(i in 1:3){
  separate(meds2[i+1], c(paste0("med_name", i), paste0("med_code", i), paste0("med_class", i), paste0("med_dose", i), sep=","))
}

Does anyone know where I might be going wrong or know of another approach to achieve this? Thanks so much!

2 Answers 2

1

We could pivot long, separate into columns, and pivot wide again incorporating those:

meds %>%
  pivot_longer(-id) %>%
  separate(value, c("med_name", "med_code", "med_class", "med_dose"), sep = ",") %>%
  mutate(name = readr::parse_number(name)) %>% # "med1" -> "1"
  pivot_wider(names_from = name, names_sep = "", names_vary = "slowest",
              values_from = starts_with("med"))

Result

# A tibble: 3 × 13
     id med_name1  med_code1 med_class1     med_dose1 med_name2  med_code2 med_class2       med_dose2 med_name3  med_code3 med_class3       med_dose3
  <dbl> <chr>      <chr>     <chr>          <chr>     <chr>      <chr>     <chr>            <chr>     <chr>      <chr>     <chr>            <chr>    
1     1 Penicillin 1         Antibiotic     5mg       Aspirin    4         Blood thinner    81mg      NA         NA        NA               NA       
2     2 Fluoxetine 5         Antidepressant 20mg      NA         NA        NA               NA        NA         NA        NA               NA       
3     3 Olanzapine 9         Antipsychotic  2mg       Lisinopril 3         Antihypertensive 10mg      Amlodipine 2         Antihypertensive 5mg  
Sign up to request clarification or add additional context in comments.

Comments

1

An easier option would be to split the columns by , and then use unnest_wider

library(dplyr)
library(tidyr)
nc <- ncol(meds)-1
nm1 <- paste0("med_", rep(c("name", "code", "class", "dose"),
      nc), rep(seq_len(nc), each = 4))

meds %>% 
  mutate(across(starts_with('med'),
       ~strsplit(as.character(.x), ",\\s*"))) %>%
  unnest_wider(where(is.list), names_sep = "") %>%
  setNames(c("id", nm1)) %>%
  type.convert(as.is = TRUE)

-output

# A tibble: 3 × 13
     id med_name1  med_code1 med_class1     med_dose1 med_name2  med_code2 med_class2       med_dose2 med_name3  med_code3 med_class3       med_dose3
  <int> <chr>          <int> <chr>          <chr>     <chr>          <int> <chr>            <chr>     <chr>          <int> <chr>            <chr>    
1     1 Penicillin         1 Antibiotic     5mg       Aspirin            4 Blood thinner    81mg      <NA>              NA <NA>             <NA>     
2     2 Fluoxetine         5 Antidepressant 20mg      <NA>              NA <NA>             <NA>      <NA>              NA <NA>             <NA>     
3     3 Olanzapine         9 Antipsychotic  2mg       Lisinopril         3 Antihypertensive 10mg      Amlodipine         2 Antihypertensive 5mg   

Or using read.csv within mutate

meds %>%
  mutate(across(starts_with('med'),
  ~ read.csv(text = .x, header = FALSE, 
   col.names = c("name", "code", "class", "dose")))) %>% 
  unnest_wider(where(is.data.frame), names_sep = "_")

-output

# A tibble: 3 × 13
     id med1_name  med1_code med1_class     med1_dose med2_name  med2_code med2_class         med2_dose med3_name  med3_code med3_class         med3_dose
  <dbl> <chr>          <int> <chr>          <chr>     <chr>          <int> <chr>              <chr>     <chr>          <int> <chr>              <chr>    
1     1 Penicillin         1 Antibiotic     5mg       Aspirin            4 "Blood thinner"    "81mg"    <NA>              NA ""                 ""       
2     2 Fluoxetine         5 Antidepressant 20mg      <NA>              NA ""                 ""        <NA>              NA ""                 ""       
3     3 Olanzapine         9 Antipsychotic  2mg       Lisinopril         3 "Antihypertensive" "10mg"    Amlodipine         2 "Antihypertensive" "5mg"    

Or we could do this easily in base R with read.csv

lst1 <- lapply(meds[-1], function(x) 
   read.csv(text = x, header = FALSE, col.names =
    c("name", "code", "class", "dose"), fill = TRUE))
 cbind(meds[1], do.call(cbind, unname(Map(function(x, y)
   setNames(x, paste0(y, "_", names(x))), lst1, names(lst1)))))

-output

 id  med1_name med1_code     med1_class med1_dose  med2_name med2_code       med2_class med2_dose  med3_name med3_code       med3_class med3_dose
1  1 Penicillin         1     Antibiotic       5mg    Aspirin         4    Blood thinner      81mg       <NA>        NA                           
2  2 Fluoxetine         5 Antidepressant      20mg       <NA>        NA                                  <NA>        NA                           
3  3 Olanzapine         9  Antipsychotic       2mg Lisinopril         3 Antihypertensive      10mg Amlodipine         2 Antihypertensive       5mg

9 Comments

Thank you so much for your quick response - when I run this, I get the following error: Error: Problem with mutate() input ..1. ℹ ..1 = across(starts_with("med"), ~strsplit(.x, ",\\s*")). x non-character argument Do you know what could be the issue?
Now I am getting the error message: This tidyselect interface doesn't support predicates yet. ℹ Contact the package author and suggest using eval_select() Is this another issue with my version of R or the version of the package?
R version 3.6.0, tidyr 1.0.2, and dplyr 1.0.6
sorry, it won't work
i'll try to update my versions and then see, thanks for your help!
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.