How can I apply a function to all data frame variables?

Question

I have a data frame with about 90 variables and over 1 million observations. I want to calculate the percentage of NA values in each variable. I have the following code:

sum(is.na(dataframe$variable)) / nrow(dataframe) * 100

My question is: how can I apply this to all 90 variables without having to type every variable name in the code?

Welcome to StackOverflow! Please read the info about how to ask a good question and how to give a reproducible example. This will make it much easier for others to help you. (Jaap, commented Nov 5, 2015 at 16:13)

maccruiskeen · Accepted Answer · 2015-11-05 16:14:05Z

3

Use lapply() with your method:

lapply(df, function(x) sum(is.na(x))/nrow(df)*100)
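If a plain named vector is more convenient than a list, the same function can be passed to sapply() or vapply(). The sketch below uses a small hypothetical data frame (the columns a and b are made up for illustration); vapply() additionally checks that every column yields exactly one number.

```r
# Hypothetical toy data: column a has 2 NAs out of 4, column b has 1 NA out of 4
df <- data.frame(a = c(1, NA, 3, NA), b = c("u", "v", NA, "w"))

# Same function as the lapply() call, but sapply() simplifies the result
# to a named numeric vector with one entry per column
pct_na <- sapply(df, function(x) sum(is.na(x)) / nrow(df) * 100)
print(pct_na)

# vapply() does the same job but errors if any column does not return
# a single numeric value, which makes the result type predictable
pct_na_strict <- vapply(df, function(x) mean(is.na(x)) * 100, numeric(1))
```

Both calls visit one column at a time, so they never build a copy of the whole data set.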

answered Nov 5, 2015 at 16:14

maccruiskeen

2,828 reputation · 2 gold badges · 15 silver badges · 23 bronze badges


Comment from davechilders (over a year ago):

or this: lapply(df, function(x) mean(is.na(x)))
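The mean() shortcut works because R treats TRUE as 1 and FALSE as 0, so the mean of is.na(x) is the share of missing values. Note that it returns a proportion between 0 and 1, not the percentage the question asks for. A minimal check on a hypothetical vector:

```r
x <- c(4, NA, 7, NA, NA)  # hypothetical column: 3 missing values out of 5

prop_sum  <- sum(is.na(x)) / length(x)  # explicit count divided by length
prop_mean <- mean(is.na(x))             # TRUE = 1, FALSE = 0, so this is the NA share

pct <- prop_mean * 100                  # scale the proportion to a percentage
```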

davechilders · Accepted Answer · 2015-11-05 16:22:40Z

3

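If you want a data.frame back rather than a list (via lapply()) or a vector (via sapply()), you can use summarise_each() from the dplyr package. (Note: summarise_each() and funs() have since been deprecated; current dplyr versions use summarise() with across() instead.)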

library(dplyr)

df %>%
  summarise_each(funs(sum(is.na(.)) / length(.)))

or, even more concisely:

df %>% summarise_each(funs(mean(is.na(.))))
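A sketch of the same summary written with across(), assuming dplyr 1.0.0 or later. It recreates the example data from this answer so it runs on its own, and returns a one-row data frame with one column per variable.

```r
library(dplyr)

# Example data from this answer: x has 3 NAs, y has 2, z has none
df <- data.frame(x = 1:10, y = 1:10, z = 1:10)
df$x[c(2, 5, 7)] <- NA
df$y[c(4, 5)] <- NA

# across() applies the lambda to every column picked by everything()
na_share <- df %>%
  summarise(across(everything(), ~ mean(is.na(.x))))

# Multiply by 100 inside the lambda to get percentages, as in the question
na_pct <- df %>%
  summarise(across(everything(), ~ mean(is.na(.x)) * 100))
```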

Data used in the examples above:

df <- data.frame(
  x = 1:10,
  y = 1:10,
  z = 1:10
)

df$x[c(2, 5, 7)] <- NA
df$y[c(4, 5)] <- NA
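For comparison, base R can produce the same per-column percentages without dplyr: is.na() on a data frame returns a logical matrix with one column per variable, and colMeans() averages each column. This is a sketch on the example data above, not part of the original answer.

```r
# Recreate the example data from this answer
df <- data.frame(x = 1:10, y = 1:10, z = 1:10)
df$x[c(2, 5, 7)] <- NA
df$y[c(4, 5)] <- NA

# is.na(df) is a 10 x 3 logical matrix; colMeans() gives the NA share
# of each column, and * 100 turns it into a percentage
na_pct <- colMeans(is.na(df)) * 100
```

One trade-off: is.na(df) materialises a logical matrix the size of the whole data set. For the question's roughly 90 columns by 1 million rows that is about 90 million logical values (R stores each in 4 bytes), so the per-column lapply()/sapply() approach is lighter on memory.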

answered Nov 5, 2015 at 16:22

davechilders

9,193 reputation · 2 gold badges · 22 silver badges · 19 bronze badges
