
I have a data frame that shows ICD-10 codes for people who have died (decedents). Each row in the data frame corresponds to a decedent, each of whom can have up to twenty conditions listed as contributing factors to his or her death. I want to create a new column that shows whether a decedent had any ICD-10 code for diabetes (1 for yes, 0 for no). The codes for diabetes fall within E10-E14; that is, a diabetes code must start with one of the strings in the following vector, while the fourth character can take on different values:

diabetes <- c("E10","E11","E12","E13","E14")
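Because every diabetes code begins with one of these three-character prefixes, the vector can be turned into a single regular expression anchored at the start of the string with ^. A minimal base R sketch (the names pattern and codes are illustrative, not from the data):

```r
diabetes <- c("E10", "E11", "E12", "E13", "E14")

# Join the prefixes with "|" and anchor the alternation with "^",
# so a code only matches if it *begins* with one of the prefixes
pattern <- paste0("^(", paste(diabetes, collapse = "|"), ")")
print(pattern)
#> [1] "^(E10|E11|E12|E13|E14)"

codes <- c("E112", "I250", "E149", "E669")
grepl(pattern, codes)
#> [1]  TRUE FALSE  TRUE FALSE

# Equivalent check without regular expressions: compare the first three characters
substr(codes, 1, 3) %in% diabetes
#> [1]  TRUE FALSE  TRUE FALSE
```

The substr() version avoids regular expressions entirely, which works here because every prefix has the same length.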

This is a small, made-up example of what the data looks like:

original <- structure(list(acond1 = c("E112", "I250", "A419", "E149"), acond2 = c("I255", 
"B341", "F179", "F101"), acond3 = c("I258", "B348", "I10", "I10"
), acond4 = c("I500", "E669", "I694", "R092")), row.names = c(NA, 
-4L), class = c("tbl_df", "tbl", "data.frame"))
acond1 acond2 acond3 acond4
E112 I255 I258 I500
I250 B341 B348 E669
A419 F179 I10 I694
E149 F101 I10 R092

This is my desired result:

acond1 acond2 acond3 acond4 diabetes
E112 I255 I258 I500 1
I250 B341 B348 E669 0
A419 F179 I10 I694 0
E149 F101 I10 R092 1

There have been a couple of other posts on this type of question (e.g., "Using if else on a dataframe across multiple columns" and "Str_detect multiple columns using across"), but I can't seem to put it all together. Here is what I have tried so far, unsuccessfully:

library(tidyverse)
library(stringr)

#attempt 1
original %>%
  mutate_at(vars(contains("acond")), ifelse(str_detect(.,paste0("^(", 
  paste(diabetes, collapse = "|"), ")")), 1, 0))

#attempt 2
original %>%
  unite(col = "all_conditions", starts_with("acond"), sep = ", ", remove = FALSE) %>%
  mutate(diabetes = if_else(str_detect(.,paste0("^(", paste(diabetes, collapse = "|"), ")")), 1, 0))
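Attempt 2 passes the whole data frame (.) to str_detect() instead of the united all_conditions column, and once the codes are joined with ", " the ^ anchor only checks the first code in the string. A minimal sketch of one way to repair it, assuming dplyr, tidyr and stringr are installed (united_pattern and repaired are illustrative names):

```r
library(dplyr)
library(tidyr)
library(stringr)

original <- tibble(
  acond1 = c("E112", "I250", "A419", "E149"),
  acond2 = c("I255", "B341", "F179", "F101"),
  acond3 = c("I258", "B348", "I10", "I10"),
  acond4 = c("I500", "E669", "I694", "R092")
)
diabetes <- c("E10", "E11", "E12", "E13", "E14")

# After unite(), the codes are joined by ", ", so a diabetes code starts either
# at the very beginning of the string or immediately after a ", " separator
united_pattern <- paste0("(^|, )(", paste(diabetes, collapse = "|"), ")")

repaired <- original %>%
  unite(col = "all_conditions", starts_with("acond"), sep = ", ", remove = FALSE) %>%
  # Test the united column itself, not the whole data frame `.`
  mutate(diabetes = if_else(str_detect(all_conditions, united_pattern), 1, 0)) %>%
  # Drop the helper column so only the acond columns and the flag remain
  select(-all_conditions)

repaired
```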

Any help would be appreciated.

library(tidyverse)

diabetes_pattern <- c("E10","E11","E12","E13","E14") %>% 
  str_c(collapse = "|")

original <-
  structure(
    list(
      acond1 = c("E112", "I250", "A419", "E149"),
      acond2 = c("I255", "B341", "F179", "F101"),
      acond3 = c("I258", "B348", "I10", "I10"),
      acond4 = c("I500", "E669", "I694", "R092")
    ),
    row.names = c(NA,-4L),
    class = c("tbl_df", "tbl", "data.frame")
  )

original %>% 
  rowwise() %>% 
  mutate(diabetes = +any(str_detect(string = c_across(everything()), pattern = diabetes_pattern)))
#> # A tibble: 4 x 5
#> # Rowwise: 
#>   acond1 acond2 acond3 acond4 diabetes
#>   <chr>  <chr>  <chr>  <chr>     <int>
#> 1 E112   I255   I258   I500          1
#> 2 I250   B341   B348   E669          0
#> 3 A419   F179   I10    I694          0
#> 4 E149   F101   I10    R092          1

original %>% 
  # rowSums() counts the matching columns; "> 0" turns that count into a 0/1 flag
  mutate(diabetes = as.numeric(rowSums(across(.cols = everything(), ~str_detect(.x, diabetes_pattern))) > 0))
#> # A tibble: 4 x 5
#>   acond1 acond2 acond3 acond4 diabetes
#>   <chr>  <chr>  <chr>  <chr>     <dbl>
#> 1 E112   I255   I258   I500          1
#> 2 I250   B341   B348   E669          0
#> 3 A419   F179   I10    I694          0
#> 4 E149   F101   I10    R092          1

Created on 2022-01-23 by the reprex package (v2.0.1)
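One caveat, since each decedent can have up to twenty conditions: in the real data the unused acondN columns will likely hold NA. str_detect() returns NA for a missing string, so rowSums() over those results returns NA for any row with a missing code unless na.rm = TRUE is passed. A minimal sketch, assuming dplyr and stringr, with illustrative NA values added to the example data (with_na and flagged are hypothetical names):

```r
library(dplyr)
library(stringr)

diabetes_pattern <- str_c("^(", str_c(c("E10", "E11", "E12", "E13", "E14"), collapse = "|"), ")")

# Illustrative data: the last two decedents have only three conditions, so acond4 is NA
with_na <- tibble(
  acond1 = c("E112", "I250", "A419", "E149"),
  acond2 = c("I255", "B341", "F179", "F101"),
  acond3 = c("I258", "B348", "I10", "I10"),
  acond4 = c("I500", "E669", NA, NA)
)

flagged <- with_na %>%
  mutate(
    # Without na.rm, any NA in a row propagates: rows 3 and 4 become NA
    naive = rowSums(across(starts_with("acond"), ~ str_detect(.x, diabetes_pattern))),
    # na.rm = TRUE treats missing codes as "no match"; "> 0" gives a 0/1 flag
    diabetes = +(rowSums(across(starts_with("acond"), ~ str_detect(.x, diabetes_pattern)), na.rm = TRUE) > 0)
  )

flagged
```

By contrast, grepl() and substr(x, 1, 3) %in% ... both return FALSE for a missing code, so approaches built on them do not need na.rm.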


I would like to add an update to this question, because I found that the accepted dplyr answer takes a very long time to execute.

Instead, you could vectorize the check over the codes and the columns you are looking for.

library(tidyverse)
original <-
  structure(
    list(
      acond1 = c("E112", "I250", "A419", "E149"),
      acond2 = c("I255", "B341", "F179", "F101"),
      acond3 = c("I258", "B348", "I10", "I10"),
      acond4 = c("I500", "E669", "I694", "R092")
    ),
    row.names = c(NA,-4L),
    class = c("tbl_df", "tbl", "data.frame")
  )

# Vectors of the columns to search and the codes to look for;
# add or remove entries here to change what the code below checks.
dia <- c("acond1", "acond2", "acond3", "acond4")
diabetes_pattern <- c("E10","E11","E12","E13","E14")

identified_diabetes <- original |> 
  mutate(diabetes = +(if_any(any_of(dia), \(x) substr(x, 1,3) %in% c(diabetes_pattern))))


This should return the same desired output, but it benchmarks drastically faster:

original %>% 
  rowwise() %>% 
  mutate(diabetes = any(grepl(paste(diabetes_pattern, collapse = "|"), c_across(starts_with("ac")))) * 1) %>% 
  ungroup()

replications elapsed
100    0.45

versus

original |> 
  mutate(diabetes = +(if_any(any_of(dia), \(x) substr(x, 1,3) %in% c(diabetes_pattern))))

replications elapsed
100    0.14

The absolute times are small on this tiny example, but as the dataset gets larger (I tried it on a data frame of >250k rows and ~100 columns), the latter is a much faster way to check this.
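Much of the difference likely comes from how often R-level functions are called: a row-wise approach calls the matching function once per row, while a column-wise approach calls it once per column on a whole vector. A rough base R sketch of the same comparison on synthetic data (the sizes, the code pool and all names here are illustrative assumptions, and timings will vary by machine):

```r
set.seed(1)

# Illustrative synthetic data: 20,000 decedents x 20 condition columns,
# drawn from a small pool of codes (not real mortality data)
code_pool <- c("E112", "E149", "I250", "I255", "I10", "F179", "R092", "A419")
n_rows <- 20000
n_cols <- 20
big <- as.data.frame(
  matrix(sample(code_pool, n_rows * n_cols, replace = TRUE), nrow = n_rows),
  stringsAsFactors = FALSE
)
names(big) <- paste0("acond", seq_len(n_cols))

diabetes <- c("E10", "E11", "E12", "E13", "E14")
pattern <- paste0("^(", paste(diabetes, collapse = "|"), ")")

# Row-wise: one grepl() call per row
row_time <- system.time(
  row_wise <- apply(big, 1, function(x) any(grepl(pattern, x))) * 1
)

# Column-wise: one vectorized comparison per column, then OR the columns together
col_time <- system.time(
  col_wise <- Reduce(`|`, lapply(big, function(x) substr(x, 1, 3) %in% diabetes)) * 1
)

# Both approaches should produce the same 0/1 flags
stopifnot(identical(unname(row_wise), col_wise))
print(rbind(row_wise = row_time, column_wise = col_time)[, "elapsed"])
```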


Here's a base R approach using apply

dia <- paste(c("E10","E11","E12","E13","E14"), collapse="|")

df$diabetes <- apply(df, 1, function(x) any(grepl(dia,x)))*1

df
  acond1 acond2 acond3 acond4 diabetes
1   E112   I255   I258   I500        1
2   I250   B341   B348   E669        0
3   A419   F179    I10   I694        0
4   E149   F101    I10   R092        1

With dplyr

library(dplyr)

df %>% 
  rowwise() %>% 
  mutate(diabetes=any(grepl(dia,c_across(starts_with("ac"))))*1) %>% 
  ungroup
# A tibble: 4 × 5
  acond1 acond2 acond3 acond4 diabetes
  <chr>  <chr>  <chr>  <chr>     <dbl>
1 E112   I255   I258   I500          1
2 I250   B341   B348   E669          0
3 A419   F179   I10    I694          0
4 E149   F101   I10    R092          1

Data

df <- structure(list(acond1 = c("E112", "I250", "A419", "E149"), acond2 = c("I255", 
"B341", "F179", "F101"), acond3 = c("I258", "B348", "I10", "I10"
), acond4 = c("I500", "E669", "I694", "R092")), class = "data.frame", row.names = c(NA, 
-4L))


If we want to use across with ifelse and str_detect, we could:

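  1. create a pattern with paste and collapse for str_detect
  2. mutate across all columns, using an anonymous ~ifelse with the condition and .names to name the new columns
  3. unite the new columns
  4. use parse_number from the readr package as a trick to read the flag back out of the united string (note that parse_number reads only the first number, so this flags a row only when the match is in acond1)
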
diabetes <- c("E10","E11","E12","E13","E14")

pattern <- paste(diabetes, collapse = "|")

library(tidyverse)

original %>% 
  mutate(across(everything(), ~ifelse(str_detect(., pattern), 1, 0), .names = "new_{col}")) %>% 
  unite(New_Col, starts_with('new'), na.rm = TRUE, sep = ' ') %>% 
  mutate(diabetes = parse_number(New_Col), .keep="unused")
  acond1 acond2 acond3 acond4 diabetes
  <chr>  <chr>  <chr>  <chr>     <dbl>
1 E112   I255   I258   I500          1
2 I250   B341   B348   E669          0
3 A419   F179   I10    I694          0
4 E149   F101   I10    R092          1
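Since parse_number() reads only the first number in the united string, a row whose only diabetes code sits in acond2 or later (united as "0 1 0 0") comes back as 0. A minimal sketch that keeps the across()/ifelse()/str_detect() idea but combines the flag columns with rowSums() instead, assuming dplyr and stringr (the fifth decedent and the names original5 and result are illustrative):

```r
library(dplyr)
library(stringr)

diabetes <- c("E10", "E11", "E12", "E13", "E14")
pattern <- paste(diabetes, collapse = "|")

# Illustrative fifth decedent whose diabetes code is in acond2, not acond1
original5 <- tibble(
  acond1 = c("E112", "I250", "A419", "E149", "I10"),
  acond2 = c("I255", "B341", "F179", "F101", "E119"),
  acond3 = c("I258", "B348", "I10", "I10", "I500"),
  acond4 = c("I500", "E669", "I694", "R092", "R092")
)

result <- original5 %>%
  # One 0/1 flag column per condition column, as in step 2
  mutate(across(everything(), ~ ifelse(str_detect(., pattern), 1, 0), .names = "new_{col}")) %>%
  # A row is flagged if any of its flag columns is 1, whatever its position
  mutate(diabetes = as.numeric(rowSums(across(starts_with("new_"))) > 0)) %>%
  # Drop the helper flag columns
  select(-starts_with("new_"))

result
```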
