Still pretty new to coding and I'm running into subsetting issues all the time. In this case, my goal is to remove NA values from my dataframe.

col1 <- c("text", NA, "text", NA)
col2 <- c(NA, "text", "text", NA)
col3 <- c("text", NA, "text", NA)
col4 <- c(17, 22, NA, NA)
col5 <- c(3, NA, 3, 17)

df <- data.frame(col1, col2, col3, col4, col5)

When I just use df[is.na(df)] <- 0 or df[is.na(df)] <- "" I get an error, which I understand is because I'm assigning the wrong type of value to the wrong column type. There is no 'numeric' empty string and there is no string with the integer value 0.

What I want is to convert all NA in the numeric columns to 0 and all NA in character columns to "". I figured out how to logically address the two parts of the question:

is.na(df)

>       col1  col2  col3  col4  col5
> [1,] FALSE  TRUE FALSE FALSE FALSE
> [2,]  TRUE FALSE  TRUE FALSE FALSE
> [3,] FALSE FALSE FALSE FALSE FALSE
> [4,]  TRUE  TRUE  TRUE FALSE FALSE

unlist(lapply(df, is.numeric), use.names=FALSE)

> [1] FALSE FALSE FALSE  TRUE  TRUE

Now with this, of course, I could simply write a for-loop to go through each column, determine whether it is numeric, and then replace NA accordingly in that column. Likewise, if I understand correctly, I could extend the vector resulting from unlist into a 20-element vector and subset by df[ x == y ] <- 0 and df[ x != y] <- "". Or I could create a couple of new data frames, change NA accordingly, and then reassemble.
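For reference, the loop version described above might look like the following minimal sketch. It uses a two-column cut of the example data and assumes the text columns are character rather than factor, hence the explicit stringsAsFactors = FALSE.

```r
# Reduced version of the example data; stringsAsFactors = FALSE keeps
# the text column as character on every R version
df <- data.frame(
  col1 = c("text", NA, "text", NA),
  col4 = c(17, 22, NA, NA),
  stringsAsFactors = FALSE
)

# Visit each column by name and pick the replacement from its type
for (name in names(df)) {
  if (is.numeric(df[[name]])) {
    df[[name]][is.na(df[[name]])] <- 0
  } else if (is.character(df[[name]])) {
    df[[name]][is.na(df[[name]])] <- ""
  }
}

df
```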

But there has to be a simpler way of doing this. I am guessing that this is an issue I will continue to run into, so I am hoping rather than just getting a solution, I can actually understand how to do this 'right' (which in R will probably give me 8 suggestions from 5 people).

3 Answers

You can treat numeric and character columns separately.

char_cols <- sapply(df, is.character)
num_cols <- sapply(df, is.numeric)

df[char_cols][is.na(df[char_cols])] <- ''
df[num_cols][is.na(df[num_cols])] <- 0
df

#  col1 col2 col3 col4 col5
#1 text      text   17    3
#2      text        22    0
#3 text text text    0    3
#4                   0   17
2 Comments

Way simpler haha! Nice!
Yes. This is it. I am not sure why I wasn't able to sort this out. Just write it up step by step using an intermediate variable. Once it works you can always go back and remove the variable by nesting the function. Thanks.
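The step-by-step version with an intermediate variable mentioned in the comment could look roughly like this. It is only a sketch on a reduced copy of the data; the name chars is made up for illustration. The logical vector picks the columns, and the logical matrix returned by is.na() picks the cells inside that subset.

```r
df <- data.frame(
  col1 = c("text", NA, "text", NA),
  col2 = c(NA, "text", "text", NA),
  col4 = c(17, 22, NA, NA),
  stringsAsFactors = FALSE
)

char_cols <- sapply(df, is.character)  # one TRUE/FALSE per column

chars <- df[char_cols]                 # intermediate: only the character columns
chars[is.na(chars)] <- ""              # logical matrix selects the NA cells
df[char_cols] <- chars                 # write the cleaned columns back

df
```

Once this works, the intermediate variable can be folded away again, which gives the one-liner from the answer.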

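If you define the data frame with strings as factors turned off, you can subset the entire data frame at once to replace NA values with the empty string. Note that this also turns the numeric columns into character columns, because "" cannot be stored in a numeric vector: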

df <- data.frame(col1, col2, col3, col4, col5,
                 stringsAsFactors=FALSE)
df[is.na(df)] <- ""
df

  col1 col2 col3 col4 col5
1 text      text   17    3
2      text        22     
3 text text text         3
4                       17
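One thing worth checking with this approach is the column types afterwards. A small sketch on a reduced copy of the data, comparing the classes before and after the replacement; the numeric column should come back as character, since "" has no numeric representation:

```r
df <- data.frame(
  col1 = c("text", NA, "text", NA),
  col4 = c(17, 22, NA, NA),
  stringsAsFactors = FALSE
)

classes_before <- sapply(df, class)
df[is.na(df)] <- ""
classes_after <- sapply(df, class)

# Side-by-side view of the column classes
rbind(before = classes_before, after = classes_after)
```

If col4 is still needed for arithmetic, replacing NA by type (0 for numeric columns, "" for character columns) avoids this coercion.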

1 Comment

Thank you. I noticed that my columns were coerced to factors but wasn’t sure I could stop that. That said, independent of this solution, how would I go about subsetting a data frame with a logical vector and a matrix at the same time?

You can use mutate_if from dplyr to replace NA based on the column types:

library(dplyr)

df %>%
  mutate_if(is.numeric, ~replace(.x, is.na(.x), 0)) %>%
  mutate_if(is.character, ~replace(.x, is.na(.x), ""))

Output:

  col1 col2 col3 col4 col5
1 text      text   17    3
2      text        22    0
3 text text text    0    3
4                   0   17

A possible way to do that in base R:

# Identify the numeric and character columns
num_cols <- sapply(df, is.numeric)
char_cols <- sapply(df, is.character)

# Replace NA column by column and assign the result back to df
# (lapply keeps each column's type; apply would return a matrix)
df[num_cols] <- lapply(df[num_cols], function(col) replace(col, is.na(col), 0))
df[char_cols] <- lapply(df[char_cols], function(col) replace(col, is.na(col), ''))
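The same idea can also be written as a single pass over all columns. This is a sketch under the assumption that only numeric and character columns need filling; fill_na is a made-up helper name, not a base R function.

```r
# Hypothetical helper: choose the fill value from the column's own type
fill_na <- function(col) {
  if (is.numeric(col)) {
    replace(col, is.na(col), 0)
  } else if (is.character(col)) {
    replace(col, is.na(col), "")
  } else {
    col  # leave other types (factor, Date, logical, ...) untouched
  }
}

df <- data.frame(
  col1 = c("text", NA, "text", NA),
  col4 = c(17, 22, NA, NA),
  col5 = c(3, NA, 3, 17),
  stringsAsFactors = FALSE
)

# Assigning into df[] keeps df a data frame instead of turning it into a list
df[] <- lapply(df, fill_na)
df
```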

3 Comments

Thank you. I can kinda see how this works, but I don’t really understand the syntax of the mutate_if statement. First it passes each individual vector to the function is.numeric, but what do the ~ symbol and the .x indicate?
You're welcome! mutate_if applies a function to each column only if the column matches a condition (for example, if it's numeric). The ~ just specifies that we're passing a function (in our case, replace), where .x is the column. It treats each column (.x) independently. Let me know if it's clear enough.
You can also replace ~replace(.x, is.na(.x), 0) with ~ifelse(is.na(.x), 0, .x) and it should work.
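To make the ~ and .x part more concrete: the formula is shorthand that dplyr turns into a one-argument function, with .x standing for that argument. A base R sketch of what happens to a single numeric column (fill_zero is a made-up name for illustration):

```r
col4 <- c(17, 22, NA, NA)

# What ~replace(.x, is.na(.x), 0) stands for, written as an ordinary function
fill_zero <- function(.x) replace(.x, is.na(.x), 0)

with_replace <- fill_zero(col4)
with_ifelse <- ifelse(is.na(col4), 0, col4)

with_replace
with_ifelse
```

Both variants fill the NA positions with 0 and leave the other values alone, which is why they are interchangeable inside mutate_if here.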
