Still pretty new to coding and I'm running into subsetting issues all the time. In this case, my goal is to remove NA values from my dataframe.
col1 <- c("text", NA, "text", NA)
col2 <- c(NA, "text", "text", NA)
col3 <- c("text", NA, "text", NA)
col4 <- c(17, 22, NA, NA)
col5 <- c(3, NA, 3, 17)
df <- data.frame(col1, col2, col3, col4, col5)
When I just use df[is.na(df)] <- 0 or df[is.na(df)] <- "" I get an error, which I understand is because I'm assigning values of the wrong type to some of the columns. There is no 'numeric' empty string, and there is no string with the integer value 0.
What I want is to convert all NA in the numeric columns to 0 and all NA in character columns to "". I figured out how to logically address the two parts of the question:
is.na(df)
> col1 col2 col3 col4 col5
> [1,] FALSE TRUE FALSE FALSE FALSE
> [2,] TRUE FALSE TRUE FALSE TRUE
> [3,] FALSE FALSE FALSE TRUE FALSE
> [4,] TRUE TRUE TRUE TRUE FALSE
unlist(lapply(df, is.numeric), use.names=FALSE)
> [1] FALSE FALSE FALSE TRUE TRUE
Now with this, of course, I could simply write a for-loop to go through each column, determine whether it is numeric, and then replace NA accordingly in that column. Likewise, if I understand correctly, I could extend the vector resulting from unlist into a 20-element vector and subset with df[ x == y ] <- 0 and df[ x != y ] <- "". Or I could create a couple of new dataframes, change NA accordingly, and then reassemble.
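For reference, here is a minimal sketch of the first idea (the per-column loop), plus a variant that reuses the is.numeric() result to select whole groups of columns at once. The variant is not quite the 20-element-vector idea, just a close relative of it. The names filled, filled2, nm, fill and num are made up for illustration, and stringsAsFactors = FALSE is an assumption so that the text columns are character vectors rather than factors.

```r
# Rebuild the example data. stringsAsFactors = FALSE is an assumption here so
# that the text columns are plain character vectors on every R version.
df <- data.frame(
  col1 = c("text", NA, "text", NA),
  col2 = c(NA, "text", "text", NA),
  col3 = c("text", NA, "text", NA),
  col4 = c(17, 22, NA, NA),
  col5 = c(3, NA, 3, 17),
  stringsAsFactors = FALSE
)

# Idea 1: loop over the columns and choose the replacement by column type.
filled <- df
for (nm in names(filled)) {
  fill <- if (is.numeric(filled[[nm]])) 0 else ""
  filled[[nm]][is.na(filled[[nm]])] <- fill
}

# Variant: use the is.numeric() mask to treat each group of columns in one go.
num <- vapply(df, is.numeric, logical(1))
filled2 <- df
filled2[num][is.na(filled2[num])] <- 0     # numeric columns: NA becomes 0
filled2[!num][is.na(filled2[!num])] <- ""  # other columns: NA becomes ""
```

Both versions leave the column types as they were, since each replacement value matches the type of the column it goes into.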
But there has to be a simpler way of doing this. I am guessing that this is an issue I will continue to run into, so I am hoping rather than just getting a solution, I can actually understand how to do this 'right' (which in R will probably give me 8 suggestions from 5 people).