Subsetting dataframe with logical matrix and vector

Question

Still pretty new to coding and I'm running into subsetting issues all the time. In this case, my goal is to remove NA values from my dataframe.

col1 <- c("text", NA, "text", NA)
col2 <- c(NA, "text", "text", NA)
col3 <- c("text", NA, "text", NA)
col4 <- c(17, 22, NA, NA)
col5 <- c(3, NA, 3, 17)

df <- data.frame(col1, col2, col3, col4, col5)

When I just use df[is.na(df)] <- 0 or df[is.na(df)] <- "" I get an error, which I understand is because I'm assigning values of the wrong type to the columns: there is no 'numeric' empty string, and there is no string with the integer value 0.

What I want is to convert all NA in the numeric columns to 0 and all NA in character columns to "". I figured out how to logically address the two parts of the question:

is.na(df)

>       col1  col2  col3  col4  col5
> [1,] FALSE  TRUE FALSE FALSE FALSE
> [2,]  TRUE FALSE  TRUE FALSE  TRUE
> [3,] FALSE FALSE FALSE  TRUE FALSE
> [4,]  TRUE  TRUE  TRUE  TRUE FALSE

unlist(lapply(df, is.numeric), use.names=FALSE)

> [1] FALSE FALSE FALSE  TRUE  TRUE

Now with this, of course, I could simply write a for-loop to go through each column, determine whether it is numeric, and then replace NA accordingly in that column. Likewise, if I understand correctly, I could extend the vector resulting from unlist into a 20-element vector and subset with df[ x == y ] <- 0 and df[ x != y ] <- "". Or I could create a couple of new dataframes, change NA accordingly, and then reassemble.

But there has to be a simpler way of doing this. I am guessing that this is an issue I will continue to run into, so I am hoping rather than just getting a solution, I can actually understand how to do this 'right' (which in R will probably give me 8 suggestions from 5 people).

Ronak Shah · Accepted Answer · 2020-12-11 04:53:33Z

3

You can treat numeric and character columns separately.

char_cols <- sapply(df, is.character)
num_cols <- sapply(df, is.numeric)

df[char_cols][is.na(df[char_cols])] <- ''
df[num_cols][is.na(df[num_cols])] <- 0
df

#  col1 col2 col3 col4 col5
#1 text      text   17    3
#2      text        22    0
#3 text text text    0    3
#4                   0   17
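
The same type-by-type replacement can also be written as a single pass over the columns. This is only a minimal sketch and not part of the original answer: the helper name fill_na_by_type is hypothetical, and columns that are neither numeric nor character are assumed to be left untouched.

```r
# Example data from the question; stringsAsFactors = FALSE keeps text as character
df <- data.frame(
  col1 = c("text", NA, "text", NA),
  col2 = c(NA, "text", "text", NA),
  col3 = c("text", NA, "text", NA),
  col4 = c(17, 22, NA, NA),
  col5 = c(3, NA, 3, 17),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pick the replacement value from the column's type
fill_na_by_type <- function(col) {
  if (is.numeric(col)) {
    replace(col, is.na(col), 0)
  } else if (is.character(col)) {
    replace(col, is.na(col), "")
  } else {
    col  # other types (factor, Date, ...) are returned unchanged
  }
}

# df[] <- ... replaces every column while keeping the data frame structure
df[] <- lapply(df, fill_na_by_type)
```

This is essentially the for-loop from the question, expressed with lapply so that no index bookkeeping is needed.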

2 Comments

Cainã Max Couto da Silva Over a year ago

Way simpler haha! Nice!

Mario Niepel Over a year ago

Yes. This is it. I am not sure why I wasn't able to sort this out. Just write it up step by step using an intermediate variable. Once it works you can always go back and remove the variable by nesting the function. Thanks.

Tim Biegeleisen · Answer · 2020-12-11 04:23:56Z

2

If you define the data frame with strings as factors turned off, then you can simply subset the entire data frame to replace NA values with empty string:

df <- data.frame(col1, col2, col3, col4, col5,
                 stringsAsFactors=FALSE)
df[is.na(df)] <- ""
df

  col1 col2 col3 col4 col5
1 text      text   17    3
2      text        22     
3 text text text         3
4                       17
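
One side effect worth checking: writing "" into a numeric column coerces that whole column to character. The sketch below rebuilds the example data so that it is self-contained and compares the column classes before and after the blanket assignment.

```r
# Example data from the question
df <- data.frame(
  col1 = c("text", NA, "text", NA),
  col2 = c(NA, "text", "text", NA),
  col3 = c("text", NA, "text", NA),
  col4 = c(17, 22, NA, NA),
  col5 = c(3, NA, 3, 17),
  stringsAsFactors = FALSE
)

classes_before <- sapply(df, class)

# Blanket replacement: every NA cell, whatever its column type, becomes ""
df[is.na(df)] <- ""

classes_after <- sapply(df, class)

classes_before[["col4"]]  # "numeric"
classes_after[["col4"]]   # "character"
```

If col4 and col5 need to stay numeric for later arithmetic, a type-aware replacement is the safer choice.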

1 Comment

Mario Niepel Over a year ago

Thank you. I noticed that my columns were coerced to factors but wasn’t sure I could stop that. That said, independent of this solution, how would I go about subsetting a dataframe with a logical vector and a logical matrix at the same time?
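
One possible way to combine the two masks, sketched here under the assumption that the goal is the one from the question (0 for numeric NA, "" for character NA): stretch the per-column vector into a matrix of the same shape as is.na(df), then combine the two with &. The names na_mat, is_num and num_mat are illustrative only.

```r
# Example data from the question
df <- data.frame(
  col1 = c("text", NA, "text", NA),
  col2 = c(NA, "text", "text", NA),
  col3 = c("text", NA, "text", NA),
  col4 = c(17, 22, NA, NA),
  col5 = c(3, NA, 3, 17),
  stringsAsFactors = FALSE
)

na_mat <- is.na(df)               # 4 x 5 logical matrix: which cells are NA
is_num <- sapply(df, is.numeric)  # length-5 logical vector: one entry per column

# Stretch the column vector to the matrix's shape;
# byrow = TRUE repeats the 5 values on each of the 4 rows
num_mat <- matrix(is_num, nrow = nrow(df), ncol = ncol(df), byrow = TRUE)

# Cells that are NA and sit in a numeric column
df[na_mat & num_mat] <- 0

# Cells that are NA and sit in a non-numeric column
df[na_mat & !num_mat] <- ""
```

As far as I understand the data frame assignment method, a logical matrix index is processed column by column, so columns whose mask is entirely FALSE are not written to and keep their type.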

Cainã Max Couto da Silva · Answer · 2020-12-11 04:47:42Z

2

You can use mutate_if from dplyr to replace NA based on the column types:

library(dplyr)

df %>%
  mutate_if(is.numeric, ~replace(.x, is.na(.x), 0)) %>%
  mutate_if(is.character, ~replace(.x, is.na(.x), ""))

Output:

  col1 col2 col3 col4 col5
1 text      text   17    3
2      text        22    0
3 text text text    0    3
4                   0   17

A possible way to do that in base R:

# Identify the numeric and character columns
num_cols <- sapply(df, is.numeric)
char_cols <- sapply(df, is.character)

# Replace NA accordingly, assigning the result back so that df is actually modified
df[num_cols] <- lapply(df[num_cols], function(col) replace(col, is.na(col), 0))
df[char_cols] <- lapply(df[char_cols], function(col) replace(col, is.na(col), ''))

edited Dec 11, 2020 at 4:47

answered Dec 11, 2020 at 4:27

3 Comments

Mario Niepel Over a year ago

Thank you. I can kind of see how this works, but I don’t really understand the syntax of the mutate_if statement. First it passes each individual vector to the function is.numeric, but what do the ~ symbol and the .x indicate?

Cainã Max Couto da Silva Over a year ago

You're welcome! mutate_if applies a function to each column only if the column matches a condition (for example, if it's numeric). The ~ just specifies that we're passing a function (in our case, replace), where .x is the column. It treats each column (.x) independently. Let me know if it's clear enough.

Cainã Max Couto da Silva Over a year ago

You can also replace ~replace(.x, is.na(.x), 0) with ~ifelse(is.na(.x), 0, .x) and it should work.
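
To make the ~ and .x shorthand concrete, here is a small sketch (assuming dplyr is installed) that writes the same replacement twice, once with the formula shorthand and once with an ordinary anonymous function. The last variant uses across() with where(), which assumes dplyr 1.0.0 or later, where mutate_if is marked as superseded.

```r
library(dplyr)

# Example data from the question
df <- data.frame(
  col1 = c("text", NA, "text", NA),
  col2 = c(NA, "text", "text", NA),
  col3 = c("text", NA, "text", NA),
  col4 = c(17, 22, NA, NA),
  col5 = c(3, NA, 3, 17),
  stringsAsFactors = FALSE
)

# Formula shorthand: .x stands for the column currently being processed
out_formula <- df %>%
  mutate_if(is.numeric, ~ replace(.x, is.na(.x), 0))

# The same thing spelled out as a regular anonymous function
out_function <- df %>%
  mutate_if(is.numeric, function(col) replace(col, is.na(col), 0))

# across()/where() form, handling both column types in one mutate() call
out_across <- df %>%
  mutate(
    across(where(is.numeric), ~ replace(.x, is.na(.x), 0)),
    across(where(is.character), ~ replace(.x, is.na(.x), ""))
  )
```

In the first two variants only the numeric columns are changed, so the character columns still contain NA; the across() variant fills both.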
