I am trying to find the total count of all missing values including NA, "", and NULL per column in a data frame. The summary() function only shows the NA values and even the VIM package does the same.

In the PASWR::titanic3 dataset, there are factor columns containing empty strings, and these are not being captured in my missingness analysis.

What is a good approach to include the counts of these missing values? Additionally, is there a way to show all the types/frequency of missing values?

Thanks in advance.

You can simply convert all forms of missing values to NA before using summary(). Commented Oct 28, 2018 at 22:42

2 Answers


You should try a user-defined function. Here is the one I came up with:

library(tidyverse)

test_function <- function(vector){
    ## Compare as character so factor columns are handled like strings
    values <- as.character(vector)

    ## TRUE where the element is NA, "", or the literal string "NULL"
    ## (an R vector cannot hold an actual NULL element)
    x <- is.na(values) | values %in% c("", "NULL")

    ## Returns the sum of the logical vector (FALSE = 0, TRUE = 1)
    return(sum(x))
}

To apply the function to every column of a data frame you can use any of the apply functions, but I recommend sapply, since it returns a named vector with one count per column.

## Create a data frame with mock data
## (mixing strings and numbers in c() coerces the whole column to character)

test_df <- tibble(x = c(NA, NA, NA, "", "", 1, 2, 3),
                  y = c(NA, "", "", "", "", "", "", 1),
                  z = c(0, 0, 0, 0, 0, 0, 0, 0))

## Assign the result to a new variable
total_missing_by_column <- sapply(test_df, test_function)

##You can also build a data frame with the variables and the total missing

tibble(variable = colnames(test_df),
   total_missing = sapply(test_df, test_function))
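
The question also asks how to see the types and frequencies of missing values, not only the total. A minimal sketch of one way to do that in base R follows. The helper name count_missing_types and the mock data are made up for illustration, and treating the literal string "NULL" as missing is an assumption about how the data was encoded.

```r
# Hypothetical helper: counts each kind of missing marker in one column.
# Treating the literal string "NULL" as missing is an assumption.
count_missing_types <- function(column) {
  values <- as.character(column)  # also handles factor columns
  c(na        = sum(is.na(values)),
    empty     = sum(values %in% ""),
    null_text = sum(values %in% "NULL"))
}

# Mock data (assumed for illustration), including a factor column
mock_df <- data.frame(
  x = c(NA, NA, NA, "", "", "1", "2", "3"),
  y = factor(c(NA, "", "", "", "NULL", "", "", "1")),
  z = c(0, 0, 0, 0, 0, 0, 0, 0)
)

# sapply returns one column per variable; t() flips it so that each
# row is a variable and each column is a type of missing value
missing_breakdown <- t(sapply(mock_df, count_missing_types))
missing_breakdown

# Row sums give the total missing count per variable
rowSums(missing_breakdown)
```

Because the counts stay separate, you can see at a glance whether a column is missing mostly through NA or through empty strings.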

Hope it helps

Simply convert the missing-value markers other than NA to NA, column by column:

df[] <- lapply(df, function(col) replace(col, col %in% c("NULL", ""), NA))
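
Here is a rough sketch of the whole convert-then-count workflow, using a column-wise replacement so that factor columns are covered too. The mock data frame and its column names are invented for illustration, and "NULL" is assumed to be stored as a literal string.

```r
# Mock data (assumed for illustration); "NULL" here is a literal string
df <- data.frame(
  name  = c("a", "", "NULL", NA),
  cabin = factor(c("", "C85", "", "E46")),
  age   = c(22, NA, 35, 40)
)

# Replace "" and "NULL" with NA in every column;
# assigning into df[] keeps the object a data frame
df[] <- lapply(df, function(col) replace(col, col %in% c("NULL", ""), NA))

# Drop factor levels that are no longer used, such as ""
df <- droplevels(df)

# Missing values per column, now that everything is NA
na_counts <- colSums(is.na(df))
na_counts
```

After the conversion, summary(df) and other NA-aware tools report every kind of missing value in one count.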
