Hi, I have a dataset with multiple columns that are populated with either NA or "Y". I want to recode these values as 0 and 1, respectively.

I am fairly new to R and am trying to determine the best way to loop through these variables and recode them.

STATE<-c(NA, "WA", "NY", NA, NA)  
x<-c(NA,"Y",NA,NA,"Y")
y<-c(NA,NA,"Y",NA,"Y")
z<-c("Y","Y",NA, NA, NA)
mydata<-data.frame(x,y,z)

I have a large dataset with many of these variables. However, I want to leave some of them (such as STATE) alone. Any help would be greatly appreciated. Thanks.

3 Answers

You can use ifelse:

ifelse(is.na(mydata), 0, ifelse(mydata == "Y", 1, mydata))

This returns 0 where an element of mydata is NA and 1 where it is "Y"; any other value is meant to be kept as it is. The result is returned rather than stored, so assign it to a variable if you want to keep it.
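A minimal sketch of the same ifelse idea, applied column by column and stored back in the data frame, which also makes it straightforward to leave columns such as STATE untouched. The name flag_cols is a hypothetical helper introduced for illustration, and the sketch assumes the flag columns are character vectors holding only NA or "Y".

```r
# Example data from the question, with STATE included.
STATE <- c(NA, "WA", "NY", NA, NA)
x <- c(NA, "Y", NA, NA, "Y")
y <- c(NA, NA, "Y", NA, "Y")
z <- c("Y", "Y", NA, NA, NA)
mydata <- data.frame(STATE, x, y, z, stringsAsFactors = FALSE)

# Hypothetical helper: names of the columns to recode (STATE is left out).
flag_cols <- c("x", "y", "z")

# For each selected column: NA becomes 0, "Y" becomes 1.
mydata[flag_cols] <- lapply(mydata[flag_cols], function(col) {
  ifelse(is.na(col), 0L, as.integer(col == "Y"))
})

mydata
```

Working on one column at a time keeps each ifelse call on a plain vector, so the result can be written straight back into the data frame.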

You added the binary tag. R has a logical type (TRUE/FALSE), so if you want binary values, you should use

ifelse(is.na(mydata), FALSE, ifelse(mydata == "Y", TRUE, mydata))

instead.
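As a small illustration of how logical values behave in arithmetic, here is a sketch on a single vector. It assumes the vector holds only NA or "Y"; the name x_flag is made up for this example.

```r
x <- c(NA, "Y", NA, NA, "Y")

# TRUE where the value is "Y", FALSE where it is NA.
x_flag <- !is.na(x) & x == "Y"
x_flag            # FALSE TRUE FALSE FALSE TRUE

# Logical values are treated as 1 and 0 in arithmetic.
sum(x_flag)       # 2
TRUE + TRUE       # 2

# Convert explicitly if integer 0/1 values are preferred.
as.integer(x_flag)  # 0 1 0 0 1
```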

2 Comments

Is there a way to only perform this action for selected variables (see edit)? Also, I think you're right...binary is what I want. That is, if they resolve to 1 and 0 in arithmetic functions. Is TRUE + TRUE equal to 2?
If you want to modify only some columns, you can do something like mydata[c('x','y')] = ifelse(is.na(mydata[c('x','y')]), 0, ifelse(mydata[c('x','y')] == "Y", 1, mydata[c('x','y')])), where c('x','y') contains the names of the columns you wish to recode. And yes, TRUE + TRUE = 2.

I think the best way is to use the mutate_each() function from the dplyr package:

library(dplyr)

STATE  <- c(NA, "WA", "NY", NA, NA)  
x      <- c(NA, "Y", NA, NA, "Y")
y      <- c(NA, NA, "Y", NA, "Y")
z      <- c("Y", "Y", NA, NA, NA)
mydata <- data.frame(x, y, z, STATE)

mydata <- mutate_each(mydata, funs(ifelse(is.na(.), 0, 1)), -STATE)

It will apply the function specified inside funs() to each variable. The dot . stands for the variable currently being processed. To skip one or more variables, write their names with a - before them: -var1, -var2, ...
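In more recent dplyr releases, mutate_each() and funs() are deprecated. A rough equivalent, assuming dplyr 1.0.0 or later is installed, is mutate() combined with across(). The name recoded is just an illustrative choice for the result.

```r
library(dplyr)

STATE  <- c(NA, "WA", "NY", NA, NA)
x      <- c(NA, "Y", NA, NA, "Y")
y      <- c(NA, NA, "Y", NA, "Y")
z      <- c("Y", "Y", NA, NA, NA)
mydata <- data.frame(x, y, z, STATE, stringsAsFactors = FALSE)

# Apply the recoding to every column except STATE.
# .x plays the role of the dot in funs(): the column being processed.
recoded <- mutate(mydata, across(-STATE, ~ ifelse(is.na(.x), 0, 1)))

recoded
```

As in the original call, any non-missing value is turned into 1, so this relies on the columns containing only NA or "Y".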

First, you need to make sure the character vectors are not coded as factors:

mydata <- data.frame(x, y, z, stringsAsFactors = FALSE)

Then:

mydata[mydata=="Y"] <- 1
mydata[is.na(mydata)] <- 0
mydata
  x y z
1 0 0 1
2 1 0 1
3 0 1 0
4 0 0 0
5 1 1 0
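One thing to watch with this approach: because the columns start out as character vectors, the assigned values are stored as the strings "1" and "0", even though they print without quotes. A short sketch of checking this and converting to integers, if arithmetic on the columns is needed (classes_before is a name made up for this example):

```r
x <- c(NA, "Y", NA, NA, "Y")
y <- c(NA, NA, "Y", NA, "Y")
z <- c("Y", "Y", NA, NA, NA)
mydata <- data.frame(x, y, z, stringsAsFactors = FALSE)

mydata[mydata == "Y"] <- 1
mydata[is.na(mydata)] <- 0

# The columns are still character at this point.
classes_before <- sapply(mydata, class)
classes_before

# Convert every column to integer while keeping the data frame structure.
mydata[] <- lapply(mydata, as.integer)
sapply(mydata, class)
```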
