I have a data frame with about 90 variables and over 1 million observations. I want to calculate the percentage of NA values in each variable. I have the following code:
sum(is.na(dataframe$variable)) / nrow(dataframe) * 100
My question is: how can I apply this calculation to all 90 variables without typing every variable name in the code?
2 Answers
Use lapply() with your method:
lapply(df, function(x) sum(is.na(x))/nrow(df)*100)
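If a named numeric vector is more convenient than a list, the same calculation can be sketched in base R as follows. The sample data here is only an assumption for illustration, mirroring the small example data frame used in this thread.

```r
# Sample data (illustrative only; mirrors the example data frame in this thread)
df <- data.frame(x = 1:10, y = 1:10, z = 1:10)
df$x[c(2, 5, 7)] <- NA
df$y[c(4, 5)] <- NA

# is.na(df) returns a logical matrix; colMeans() averages each column,
# giving the proportion of NA values, which is then scaled to a percentage.
pct_na <- colMeans(is.na(df)) * 100

# Equivalent column-by-column form; sapply() simplifies the result
# to a named numeric vector instead of a list.
pct_na_sapply <- sapply(df, function(x) mean(is.na(x)) * 100)

print(pct_na)
```

With a data frame as large as the one in the question, the sapply() form may be preferable, because is.na(df) builds one logical matrix covering every cell before any averaging happens.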
1 Comment
davechilders
or this:
lapply(df, function(x) mean(is.na(x)))
If you want to return a data.frame rather than a list (via lapply()) or a vector (via sapply()), you can use summarise_each from the dplyr package:
library(dplyr)
df %>%
  summarise_each(funs(sum(is.na(.)) / length(.)))
or, even more concisely:
df %>% summarise_each(funs(mean(is.na(.))))
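Note that summarise_each() and funs() are deprecated in more recent dplyr releases. A minimal sketch of the same idea using across(), assuming dplyr 1.0.0 or later is installed, could look like this. The sample data is again only for illustration, and the result is multiplied by 100 to give a percentage, as asked in the question, rather than a proportion.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, where across() is available

# Sample data (illustrative only; mirrors the example data frame in this thread)
df <- data.frame(x = 1:10, y = 1:10, z = 1:10)
df$x[c(2, 5, 7)] <- NA
df$y[c(4, 5)] <- NA

# across(everything(), ...) applies the lambda to every column;
# mean(is.na(.x)) is the proportion of NA values in that column.
na_pct <- df %>%
  summarise(across(everything(), ~ mean(is.na(.x)) * 100))

print(na_pct)
```

The result is a one-row data.frame with one column per original variable.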
data
df <- data.frame(
  x = 1:10,
  y = 1:10,
  z = 1:10
)
df$x[c(2, 5, 7)] <- NA
df$y[c(4, 5)] <- NA
lapply(df, yourfunction)  # yourfunction: any function that takes a single column