Replacing multiple character strings in specific data frame columns in R

Question

I've looked all around for this but have found no answers. I have a data frame with columns whose levels include values such as "Unknown", "No response", or "Refused to answer". All of these are useless to me for analysis, so I want to replace them all with NA.

Note that I do not want to replace them across the entire data frame, only in specific columns! Other columns contain values with the same names that are actually useful to me, and I want to leave those alone.

I've managed to replace them one at a time by using:

data$col1 <- factor(gsub("Unknown", "NA", data$col1))

but that only works for one string at a time. If I try to add multiple strings, R throws an error. Is there a more efficient way to do this?

I'm relatively new to coding, please be gentle!

Use na.strings in read.csv, i.e., while reading the dataset you can specify which values should be converted to NA: dat <- read.csv("yourfile.csv", na.strings = c("Unknown", "No response", "Refused to answer"))
– akrun, Commented Dec 4, 2016 at 3:40
Try data$col1 <- factor(gsub("Unknown|No response|Refused to answer", "NA", data$col1)).
– R. Schifini, Commented Dec 4, 2016 at 3:43
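The comment above combines the strings into one pattern with alternation (`|`). A minimal sketch of the same idea that avoids two pitfalls: it anchors the pattern with `^` and `$` so only whole values match (an unanchored pattern would also rewrite text such as "Unknown origin"), and it assigns the real missing value NA rather than the string "NA". The data frame and its column names are made up for illustration.

```r
# Hypothetical example data: col1 should be cleaned, col2 must stay untouched
data <- data.frame(
  col1 = c("Yes", "Unknown", "No response", "Unknown origin", "No"),
  col2 = c("Unknown", "A", "B", "C", "D"),
  stringsAsFactors = FALSE
)

# Anchored alternation: matches only entries that are exactly one of the three strings
pattern <- "^(Unknown|No response|Refused to answer)$"

# grepl() gives a logical index; assign the missing value NA, not the string "NA"
data$col1[grepl(pattern, data$col1)] <- NA

print(data)
```

Only "Unknown" and "No response" in col1 become NA; "Unknown origin" and all of col2 are left as they were. If any of the target strings contained regex metacharacters (such as `.` or `(`), they would need escaping, which is one reason the `%in%` approach is often simpler.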

akrun · Accepted Answer · 2016-12-04 03:44:09Z


If we need to change multiple values to NA, one option is to use the na.strings argument of read.csv/read.table while reading the data:

dat <- read.csv("yourfile.csv", na.strings = c("Unknown", "No response", 
             "Refused to answer"))
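To see what `na.strings` does without a file on disk, the sketch below passes a small made-up CSV to `read.csv()` through its `text` argument. The conversion applies to every column, which is why this option alone does not fit a case where some columns must be protected.

```r
# Made-up CSV content; "Unknown" appears in both the answer and source columns
csv <- "id,answer,source
1,Yes,Unknown
2,Unknown,Survey
3,Refused to answer,Phone"

dat <- read.csv(text = csv,
                na.strings = c("Unknown", "No response", "Refused to answer"))

# Both columns are affected: na.strings cannot be limited to particular columns
print(dat)
```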

However, here the replacement is needed only in specific columns. In that case, create an index of those columns, loop through them, and replace the values using a logical index built with %in% (this assumes the strings are whole values, not substrings):

columnsOfInterest <- c(1, 4, 5) #just for an example
df1[columnsOfInterest] <- lapply(df1[columnsOfInterest], function(x)
         replace(x, x %in% c("Unknown", "No response", "Refused to answer"), NA))
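A self-contained sketch of the same approach on a toy data frame. The column names `q1`, `q2`, `comment` and the data are hypothetical, and the columns are selected by name rather than by position, which keeps working if the column order changes. The `droplevels()` step is an addition not in the answer: for factor columns it removes the now-empty levels, and it is skipped for character columns.

```r
df1 <- data.frame(
  q1 = factor(c("Yes", "Unknown", "No")),
  q2 = factor(c("Refused to answer", "No", "No response")),
  comment = c("Unknown", "fine", "No response"),  # must be left as-is
  stringsAsFactors = FALSE
)

bad_values <- c("Unknown", "No response", "Refused to answer")
columnsOfInterest <- c("q1", "q2")  # selected by name (hypothetical columns)

# For each selected column, set entries exactly equal to a bad value to NA;
# %in% compares factor values by their labels
df1[columnsOfInterest] <- lapply(df1[columnsOfInterest], function(x) {
  x <- replace(x, x %in% bad_values, NA)
  if (is.factor(x)) droplevels(x) else x  # drop unused levels on factors only
})

str(df1)
```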

NOTE: replacing with the quoted string "NA" is rather useless, as it is just another character value; instead we need the actual missing value NA.
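A quick check of why the quoted string is not the same as a missing value: `is.na()` and functions that skip missing data recognise only the real NA, so "NA" is treated as an ordinary category.

```r
x_string  <- c("Yes", "NA", "No")  # "NA" here is just a character string
x_missing <- c("Yes", NA, "No")    # NA here is a real missing value

is.na(x_string)   # FALSE FALSE FALSE
is.na(x_missing)  # FALSE  TRUE FALSE

# table() drops real NAs by default but counts the string "NA" as a category
table(x_string)
table(x_missing)
```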

