Goal: Clean a data frame that has a column (call it v1) containing one or, often, more than one value in each cell. I would like to generate multiple binary variables (say v1_1, v1_2, v1_3) based on the values contained in the cells of v1. (In reality, I have a very large, ugly Excel dataset from elsewhere with many cells that hold multiple values, and I would like to sort them into binary columns efficiently, ideally with tidyverse tools, though base R works too.)

Reproducible example:

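# v1 mixes numbers and strings, so c() coerces the whole column to character
df <- data.frame(caseID = c(1:5),
                 v1 = c(2, 1, "1,3", 1, "2, 3"))
df

# Desired result: one 0/1 indicator column per distinct value found in v1
desired_df <- data.frame(caseID = c(1:5),
                      v1_1 = c(0, 1, 1, 1, 0),
                      v1_2 = c(1, 0, 0, 0, 1),
                      v1_3 = c(0, 0, 1, 0, 1))
desired_df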
    Try cbind(df[1], as.data.frame.matrix(table(stack(setNames(strsplit(as.character(df$v1), ",\\s*"), df$caseID))[2:1]))) Commented Nov 17, 2017 at 17:32
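
The one-liner in the comment above is dense, so here is a rough step-by-step sketch of the same base R idea. The intermediate names (parts, long, counts, wide, result) are only illustrative, and the "v1_" prefix is added here to match the desired output; the original comment leaves the columns named 1, 2, 3. Note that table() produces counts, so a value repeated within a single cell would give a number greater than 1 rather than a strict 0/1 flag.

```r
df <- data.frame(caseID = 1:5,
                 v1 = c(2, 1, "1,3", 1, "2, 3"))

# Split each cell on commas, allowing optional spaces after the comma
parts <- strsplit(as.character(df$v1), ",\\s*")

# Name each list element by its caseID so stack() records where values came from
names(parts) <- df$caseID

# Long format: one row per (value, caseID) pair, in columns `values` and `ind`
long <- stack(parts)

# Cross-tabulate caseID (rows) against the values (columns)
counts <- table(long$ind, long$values)

# Convert the table to a data frame and prefix the column names
wide <- as.data.frame.matrix(counts)
names(wide) <- paste0("v1_", names(wide))

result <- cbind(df["caseID"], wide)
result
```

This assumes caseID is unique; duplicated IDs would be merged into a single row by table().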

1 Answer

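A solution using dplyr and tidyr: split v1 into one row per value, add an indicator column, spread the values into columns (filling missing combinations with 0), and prefix the new column names with "v1_".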

library(dplyr)
library(tidyr)


df2 <- df %>%
  separate_rows(v1) %>%
  mutate(Value = 1) %>%
  spread(v1, Value, fill = 0) %>%
  rename_at(vars(-caseID), funs(paste0("v1_", .)))
df2 
#   caseID v1_1 v1_2 v1_3
# 1      1    0    1    0
# 2      2    1    0    0
# 3      3    1    0    1
# 4      4    1    0    0
# 5      5    0    1    1
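
In current tidyverse releases, spread() is superseded by pivot_wider() and funs() has been deprecated since dplyr 0.8.0, so the pipeline above may emit warnings. A minimal sketch of an equivalent, assuming tidyr 1.1.0 or later (for the scalar values_fill and the names_sort argument); the name df3 is only illustrative and this is not part of the original answer:

```r
library(dplyr)
library(tidyr)

df <- data.frame(caseID = 1:5,
                 v1 = c(2, 1, "1,3", 1, "2, 3"))

df3 <- df %>%
  # One row per value; split on a comma followed by optional spaces
  separate_rows(v1, sep = ",\\s*") %>%
  # Indicator that this caseID contains this value
  mutate(Value = 1) %>%
  # One column per distinct value, 0 where the value is absent
  pivot_wider(names_from = v1, values_from = Value,
              values_fill = 0, names_prefix = "v1_",
              names_sort = TRUE)
df3
```

Here names_prefix replaces the separate renaming step, and names_sort = TRUE orders the new columns by value rather than by first appearance. The result is a tibble; wrap it in as.data.frame() if a plain data frame is needed.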