Convert multiple columns to binary in R

Question

Hi, I have a dataset with multiple columns whose values are either NA or "Y". I want to recode these as 0 and 1, respectively.

I am fairly new to R and am trying to find the best way to loop through these variables and recode them.

STATE<-c(NA, "WA", "NY", NA, NA)  
x<-c(NA,"Y",NA,NA,"Y")
y<-c(NA,NA,"Y",NA,"Y")
z<-c("Y","Y",NA, NA, NA)
mydata<-data.frame(x,y,z)

I have a large dataset with many of these variables. However, some columns (such as STATE) I wish to leave alone. Any help would be greatly appreciated. Thanks.

xraynaud · Accepted Answer · 2017-04-03 20:58:59Z

2

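You can use nested ifelse() calls:

ifelse(is.na(mydata), 0, ifelse(mydata == "Y", 1, as.matrix(mydata)))

This returns a matrix in which NA elements of mydata become 0, "Y" elements become 1, and any other value is kept (if such values exist, the whole result is coerced to character). Wrap the call in as.data.frame() if you need a data frame back.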
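A minimal sketch applying the call above to the question's sample data (without STATE) and turning the resulting matrix back into a data frame; the name recoded is only illustrative.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps the columns character
x <- c(NA, "Y", NA, NA, "Y")
y <- c(NA, NA, "Y", NA, "Y")
z <- c("Y", "Y", NA, NA, NA)
mydata <- data.frame(x, y, z, stringsAsFactors = FALSE)

# Nested ifelse(): NA -> 0, "Y" -> 1, any other value kept.
# ifelse() returns a matrix here, so convert it back to a data frame.
recoded <- as.data.frame(
  ifelse(is.na(mydata), 0, ifelse(mydata == "Y", 1, as.matrix(mydata)))
)
print(recoded)
```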

You added the binary tag. R has a logical type (TRUE/FALSE), so if you want binary values, you can use

ifelse(is.na(mydata), FALSE, ifelse(mydata == "Y", TRUE, as.matrix(mydata)))

instead.
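A short sketch of the logical version on the sample data. It simplifies the call slightly (an assumption: anything other than "Y" is treated as FALSE rather than kept) and shows that logical values count as 0 and 1 in arithmetic, for example in colSums().

```r
x <- c(NA, "Y", NA, NA, "Y")
y <- c(NA, NA, "Y", NA, "Y")
z <- c("Y", "Y", NA, NA, NA)
mydata <- data.frame(x, y, z, stringsAsFactors = FALSE)

# NA -> FALSE; otherwise TRUE only where the value is "Y"
flags <- ifelse(is.na(mydata), FALSE, mydata == "Y")

# TRUE counts as 1 and FALSE as 0, so colSums() counts the "Y"s per column
counts <- colSums(flags)
print(counts)
print(TRUE + TRUE)
```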

answered Apr 3, 2017 at 20:58

xraynaud



2 Comments

pyll Over a year ago

Is there a way to perform this action only on selected variables (see edit)? Also, I think you're right: binary is what I want, as long as the values behave as 1 and 0 in arithmetic. Is TRUE + TRUE equal to 2?

xraynaud Over a year ago

If you want to modify only some columns, you can do something like mydata[c('x','y')] <- ifelse(is.na(mydata[c('x','y')]), 0, ifelse(mydata[c('x','y')] == "Y", 1, as.matrix(mydata[c('x','y')]))), where c('x','y') contains the names of the columns you wish to recode. And yes, TRUE + TRUE is 2.
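A sketch of the column-subset approach on a data frame that also contains STATE. The helper vector cols is hypothetical; here it is built with setdiff() so that every column except STATE is recoded.

```r
STATE <- c(NA, "WA", "NY", NA, NA)
x <- c(NA, "Y", NA, NA, "Y")
y <- c(NA, NA, "Y", NA, "Y")
z <- c("Y", "Y", NA, NA, NA)
mydata <- data.frame(x, y, z, STATE, stringsAsFactors = FALSE)

# Hypothetical helper: the names of the columns to recode (all but STATE)
cols <- setdiff(names(mydata), "STATE")

# Recode only those columns; assigning the matrix back into mydata[cols]
# leaves STATE untouched
mydata[cols] <- ifelse(is.na(mydata[cols]), 0,
                       ifelse(mydata[cols] == "Y", 1, as.matrix(mydata[cols])))
print(mydata)
```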

David Pinto · Accepted Answer · 2017-04-03 23:42:28Z

1

I think the best way is to use the mutate_each() function from the dplyr package:

library(dplyr)

STATE  <- c(NA, "WA", "NY", NA, NA)  
x      <- c(NA, "Y", NA, NA, "Y")
y      <- c(NA, NA, "Y", NA, "Y")
z      <- c("Y", "Y", NA, NA, NA)
mydata <- data.frame(x, y, z, STATE)

mydata <- mutate_each(mydata, funs(ifelse(is.na(.), 0, 1)), -STATE)

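It applies the function specified inside funs() to each variable; the dot . stands for the current variable. To skip one or more variables, write their names with a - in front: -var1, -var2, ... Note that mutate_each() and funs() have since been deprecated; in dplyr 1.0.0 and later the equivalent call is mutate(mydata, across(-STATE, ~ ifelse(is.na(.x), 0, 1))).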
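For comparison, a base-R sketch of the same idea with no package dependency: every column except STATE gets the answer's rule (NA becomes 0, any other value becomes 1). The names keep_as_is and to_recode are illustrative only.

```r
STATE <- c(NA, "WA", "NY", NA, NA)
x <- c(NA, "Y", NA, NA, "Y")
y <- c(NA, NA, "Y", NA, "Y")
z <- c("Y", "Y", NA, NA, NA)
mydata <- data.frame(x, y, z, STATE, stringsAsFactors = FALSE)

# Columns to leave alone, playing the role of -STATE in mutate_each()
keep_as_is <- "STATE"
to_recode <- setdiff(names(mydata), keep_as_is)

# Same rule as funs(ifelse(is.na(.), 0, 1)), applied column by column
mydata[to_recode] <- lapply(mydata[to_recode],
                            function(col) ifelse(is.na(col), 0, 1))
str(mydata)
```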

answered Apr 3, 2017 at 23:42

David Pinto



someguyinafloppyhat · Accepted Answer · 2017-04-03 21:06:04Z

0

First, make sure the character vectors are not coded as factors (data.frame() converted strings to factors by default before R 4.0.0):

mydata <- data.frame(x, y, z, stringsAsFactors = FALSE)

Then:

mydata[mydata == "Y"] <- 1
mydata[is.na(mydata)] <- 0
mydata
  x y z
1 0 0 1
2 1 0 1
3 0 1 0
4 0 0 0
5 1 1 0
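Two details of this approach are easy to miss, sketched below on the question's sample data. With factor columns, assigning 1 to a level that does not exist produces NA (with an "invalid factor level, NA generated" warning). With character columns the recoding works, but the results are the strings "0" and "1"; the final as.integer() pass is an addition, not part of the original answer.

```r
x <- c(NA, "Y", NA, NA, "Y")
y <- c(NA, NA, "Y", NA, "Y")
z <- c("Y", "Y", NA, NA, NA)

# 1) Factor column: 1 is not a level, so the "Y" entries become NA instead
fdata <- data.frame(x = factor(x))
suppressWarnings(fdata[fdata == "Y"] <- 1)
print(fdata$x)  # every entry is now NA

# 2) Character columns: recoding works, but the values are character
mydata <- data.frame(x, y, z, stringsAsFactors = FALSE)
mydata[mydata == "Y"] <- 1
mydata[is.na(mydata)] <- 0
classes_before <- sapply(mydata, class)
print(classes_before)

# Convert every column to integer so the 0/1 values behave as numbers
mydata[] <- lapply(mydata, as.integer)
print(colSums(mydata))
```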

answered Apr 3, 2017 at 21:06

someguyinafloppyhat

