I need to create dummy (binary) variables from character (string) variables. My data look like this:
library(tidyverse)  # provides tribble(), %>%, mutate() and fct_collapse()
dat <- tribble(
  ~pat_id, ~icd9_1,  ~icd9_2,
  1,       "414.01", "414.01",
  2,       "411.89", NA,
  3,       NA,       "410.71",
  4,       NA,       NA,
  5,       NA,       "410.51",
  6,       NA,       "272.0, 410.71"
)
dat
# A tibble: 6 x 3
#   pat_id icd9_1 icd9_2
#    <dbl> <chr>  <chr>
#        1 414.01 414.01
#        2 411.89 <NA>
#        3 <NA>   410.71
#        4 <NA>   <NA>
#        5 <NA>   410.51
#        6 <NA>   272.0, 410.71
I want to create three new binary variables:
icd9_bin_1: binary (0/1) for icd9_1, i.e. whether a code is recorded
icd9_bin_2: binary (0/1) for icd9_2
icd9_bin: binary (0/1) for either icd9_1 OR icd9_2
What is the fastest way to create these binary variables?
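A minimal base-R sketch, assuming "binary" means 1 when a code is recorded and 0 when the cell is NA (the same icd9 / no icd9 dx split as in the attempt below). It rebuilds the example with data.frame() instead of tribble() so it runs without packages. Because !is.na() is vectorised, there is no NA replacement, factor conversion or level listing, which is where the slowness comes from.

```r
# Rebuild the example data with base R so the sketch runs without packages
dat <- data.frame(
  pat_id = 1:6,
  icd9_1 = c("414.01", "411.89", NA, NA, NA, NA),
  icd9_2 = c("414.01", NA, "410.71", NA, "410.51", "272.0, 410.71"),
  stringsAsFactors = FALSE
)

# 1 when a code is recorded, 0 when the cell is NA
dat$icd9_bin_1 <- as.integer(!is.na(dat$icd9_1))
dat$icd9_bin_2 <- as.integer(!is.na(dat$icd9_2))

# 1 when either column has a code
dat$icd9_bin <- as.integer(!is.na(dat$icd9_1) | !is.na(dat$icd9_2))

dat
```

With dplyr loaded, the same logic should fit in one mutate() call, e.g. `mutate(icd9_bin_1 = as.integer(!is.na(icd9_1)), icd9_bin_2 = as.integer(!is.na(icd9_2)), icd9_bin = pmax(icd9_bin_1, icd9_bin_2))`, since mutate() lets later arguments use columns created earlier in the same call.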
I've tried replacing NAs with 0, converting to a factor and then recoding, but that took forever:
# get structure
dat$icd9_1 %>% str()
# get rid of NAs (replace with 0s)
dat$icd9_1[is.na(dat$icd9_1)] <- 0
# turn into factor
dat$icd9_1 <- factor(dat$icd9_1)
# get levels
dat$icd9_1 %>% levels()
# use fct_collapse
dat %>%
  mutate(icd9_bin_1 = fct_collapse(
    icd9_1,
    `icd9` = c("411.89", "414.01"),
    `no icd9 dx` = c("0")
  ))
# A tibble: 6 x 4
#   pat_id icd9_1 icd9_2        icd9_bin_1
#    <dbl> <fctr> <chr>         <fctr>
#        1 414.01 414.01        icd9
#        2 411.89 <NA>          icd9
#        3 0      410.71        no icd9 dx
#        4 0      <NA>          no icd9 dx
#        5 0      410.51        no icd9 dx
#        6 0      272.0, 410.71 no icd9 dx
I'm looking for a more elegant solution. Ideas?
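A sketch of a shorter route for the labelled version, assuming only presence or absence of a code matters: since the label depends only on whether the cell is missing, is.na() can drive the recode directly. That removes the need to replace NAs with "0" and to enumerate every level in fct_collapse(), which becomes impractical with dozens of codes. It is essentially the if_else() idea from the comments, written with base ifelse() so it runs without packages.

```r
dat <- data.frame(
  pat_id = 1:6,
  icd9_1 = c("414.01", "411.89", NA, NA, NA, NA),
  icd9_2 = c("414.01", NA, "410.71", NA, "410.51", "272.0, 410.71"),
  stringsAsFactors = FALSE
)

# The label depends only on missingness, so is.na() drives the recode;
# the NAs stay NA and no ICD-9 level has to be listed by hand
dat$icd9_bin_1 <- factor(
  ifelse(is.na(dat$icd9_1), "no icd9 dx", "icd9"),
  levels = c("no icd9 dx", "icd9")
)

table(dat$icd9_bin_1)
```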
9_1 only.
dat$icd9_bin_1 <- if_else(is.na(dat$icd9_1), "no icd9 dx", "icd9")?

I'm tired, so I'm probably missing something... icd9_bin_1. After these two are created, I use mutate and if_else to create the binary for either icd9_1 or icd9_2.

dat[c('icd9_bin_1', 'icd9_bin_2')] <- paste(c('yes', 'no')[is.na(dat[-1]) + 1L], rep(names(dat[-1]), each=nrow(dat)), sep='-')

dplyr solution that lets me create all three variables in one pipe? The actual data have up to 50 different icd9 levels across several variables.
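Because the real data have many icd9 columns, here is a base-R sketch that avoids naming each one. It assumes a hypothetical naming rule: every column whose name starts with "icd9_" is a diagnosis column. It builds one 0/1 indicator per column plus an "any code" indicator, and the number of distinct ICD-9 levels does not matter because only missingness is tested.

```r
dat <- data.frame(
  pat_id = 1:6,
  icd9_1 = c("414.01", "411.89", NA, NA, NA, NA),
  icd9_2 = c("414.01", NA, "410.71", NA, "410.51", "272.0, 410.71"),
  stringsAsFactors = FALSE
)

# Assumption: every column whose name starts with "icd9_" is a diagnosis column.
# Capture the names before adding the new columns, which also start with "icd9_".
icd_cols <- grep("^icd9_", names(dat), value = TRUE)

# One 0/1 indicator per diagnosis column: icd9_1 -> icd9_bin_1, icd9_2 -> icd9_bin_2, ...
bin_names <- sub("^icd9_", "icd9_bin_", icd_cols)
dat[bin_names] <- lapply(dat[icd_cols], function(x) as.integer(!is.na(x)))

# 1 if any diagnosis column in the row is non-missing
dat$icd9_bin <- as.integer(rowSums(!is.na(dat[icd_cols])) > 0)

dat
```

In a dplyr pipe (dplyr 1.0.0 or later), `mutate(across(starts_with("icd9_"), ~ as.integer(!is.na(.x)), .names = "{.col}_bin"))` should produce the per-column indicators in one step, named icd9_1_bin, icd9_2_bin and so on.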