I have a data frame with about 90 variables and over 1 million observations. I want to calculate the percentage of NA values in each variable. I have the following code: sum(is.na(dataframe$variable)) / nrow(dataframe) * 100 My question is: how can I apply this calculation to all 90 variables without having to type every variable name in the code?

  • lapply(df, yourfunction) Commented Nov 5, 2015 at 16:12
  • Welcome to StackOverflow! Please read the info about how to ask a good question and how to give a reproducible example. This will make it much easier for others to help you. Commented Nov 5, 2015 at 16:13

2 Answers

Use lapply() with your method:

lapply(df, function(x) sum(is.na(x))/nrow(df)*100)
1 Comment

Or this, which returns the proportion of NA values rather than the percentage (multiply by 100 to convert): lapply(df, function(x) mean(is.na(x)))
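
If a named numeric vector is more convenient than a list, the same idea can be sketched in base R as follows. This is a minimal sketch, not the answerer's own code: the names pct_na and pct_na_matrix are made up for illustration, and the small df only mirrors the sample data posted in this thread.

```r
# Small sample data frame (same shape as the example data in this thread)
df <- data.frame(x = 1:10, y = 1:10, z = 1:10)
df$x[c(2, 5, 7)] <- NA
df$y[c(4, 5)] <- NA

# Column by column: vapply() checks that each column yields exactly one number
# and returns a named numeric vector instead of a list
pct_na <- vapply(df, function(x) mean(is.na(x)) * 100, numeric(1))

# Whole-frame alternative: is.na(df) builds a full logical matrix first,
# so it needs more memory on a very large data frame
pct_na_matrix <- colMeans(is.na(df)) * 100

print(pct_na)
```

On this sample data both approaches give 30 for x, 20 for y and 0 for z. With about 90 columns and over a million rows, the column-by-column version is probably the safer choice, since it never materialises the full logical matrix.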

If you want to return a data.frame rather than a list (via lapply()) or a vector (via sapply()), you can use summarise_each() from the dplyr package. Both versions below return the proportion of NA values; multiply by 100 for a percentage:

library(dplyr)

df %>%
  summarise_each(funs(sum(is.na(.)) / length(.)))

or, even more concisely:

df %>% summarise_each(funs(mean(is.na(.)))) 
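
In later dplyr releases, summarise_each() and funs() are deprecated. Assuming dplyr 1.0.0 or newer is installed, roughly the same result can be sketched with summarise() and across(). The name pct_na_df is hypothetical, and the * 100 is added here to get percentages as the question asks.

```r
library(dplyr)

# Small sample data frame (same shape as the example data in this thread)
df <- data.frame(x = 1:10, y = 1:10, z = 1:10)
df$x[c(2, 5, 7)] <- NA
df$y[c(4, 5)] <- NA

# One-row data frame holding the percentage of NA values per column
pct_na_df <- df %>%
  summarise(across(everything(), ~ mean(is.na(.x)) * 100))

print(pct_na_df)
```

The result is a single-row data frame with one column per original variable, so it can be passed on to further dplyr steps.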

Sample data used in the examples above:

df <- data.frame(
  x = 1:10,
  y = 1:10,
  z = 1:10
)

df$x[c(2, 5, 7)] <- NA
df$y[c(4, 5)] <- NA
