I have been trying to build a new data frame from several computations with lapply(). After reading several questions (1, 2, 3), this is what I have so far:

lapply(mtcars, function(x) c(colnames(x), 
                             NROW(unique(x)), 
                             sum(is.na(x)), 
                             round(sum(is.na(x))/NROW(x),2)   
                        )
       )

However, colnames(x) doesn't return the column name, because x is a plain vector inside the function. Second, I can't figure out how to turn this output into a data frame:

lapply(mtcars, function(x) data.frame(NROW(unique(x)), # if I put colnames(x) here it gives an error
                                      sum(is.na(x)), 
                                      round(sum(is.na(x))/NROW(x),2)   
                        )
       )

As the attempts above suggest, the final data frame should have a structure like this:

| Variable_name | sum_unique | NA_count | NA_percent |

1 Answer

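The following works. First, build a list in which each element is a one-row data frame, then combine those data frames with do.call(rbind, ...) to get the final output. Iterating over column indices instead of over the columns themselves keeps the column name available inside the function.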

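# seq_len() is safer than 1:ncol() because it handles zero columns correctly
lst <- lapply(seq_len(ncol(mtcars)), function(i) {
  x <- mtcars[[i]]  # the i-th column as a plain vector
  data.frame(
    Variable_name = colnames(mtcars)[[i]],  # look the name up by index
    sum_unique = NROW(unique(x)),
    NA_count = sum(is.na(x)),
    NA_percent = round(sum(is.na(x)) / NROW(x), 2)
  )
})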

do.call(rbind, lst)
#    Variable_name sum_unique NA_count NA_percent
# 1            mpg         25        0          0
# 2            cyl          3        0          0
# 3           disp         27        0          0
# 4             hp         22        0          0
# 5           drat         22        0          0
# 6             wt         29        0          0
# 7           qsec         30        0          0
# 8             vs          2        0          0
# 9             am          2        0          0
# 10          gear          3        0          0
# 11          carb          6        0          0
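
As a variation on the index-based loop above, here is a minimal base R sketch that passes each column and its name together through Map(). It also addresses the original problem directly: inside lapply(mtcars, ...), each x is a bare vector, so colnames(x) returns NULL and the name has to be supplied separately. The helper name summarise_column is hypothetical, chosen only for this illustration.

```r
# `summarise_column` is a hypothetical helper name used for illustration.
# It returns a one-row data frame describing a single column.
summarise_column <- function(x, name) {
  data.frame(
    Variable_name = name,
    sum_unique    = length(unique(x)),
    NA_count      = sum(is.na(x)),
    NA_percent    = round(mean(is.na(x)), 2)  # mean of a logical = proportion TRUE
  )
}

# Map() walks the columns and their names in parallel.
pieces <- Map(summarise_column, mtcars, names(mtcars))

# Stack the one-row data frames into a single data frame.
result <- do.call(rbind, pieces)
rownames(result) <- NULL  # drop the row names inherited from the list names
```

Here mean(is.na(x)) is equivalent to sum(is.na(x)) / NROW(x) for a vector, so the result should match the table above.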

Since you tagged this post with tidyverse, here is an alternative that uses map_dfr() from purrr, which leads to more concise code. The .id argument stores the name of each list element (here, the column name) in a new column.

library(tidyverse)

map_dfr(mtcars, function(x){
  tibble(sum_unique = NROW(unique(x)), 
         NA_count = sum(is.na(x)), 
         NA_percent = round(sum(is.na(x))/NROW(x),2))
}, .id = "Variable_name")
# # A tibble: 11 x 4
#    Variable_name sum_unique NA_count NA_percent
#    <chr>              <int>    <int>      <dbl>
#  1 mpg                   25        0          0
#  2 cyl                    3        0          0
#  3 disp                  27        0          0
#  4 hp                    22        0          0
#  5 drat                  22        0          0
#  6 wt                    29        0          0
#  7 qsec                  30        0          0
#  8 vs                     2        0          0
#  9 am                     2        0          0
# 10 gear                   3        0          0
# 11 carb                   6        0          0
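
One caveat: in purrr 1.0.0 and later, map_dfr() is marked as superseded in favour of map() followed by list_rbind(). The sketch below shows the same computation in that style, assuming purrr >= 1.0.0 is installed; it is an alternative spelling, not a change in approach.

```r
library(purrr)
library(tibble)

# One single-row tibble per column of mtcars; map() keeps the column names
# as the names of the resulting list.
pieces <- map(mtcars, function(x) {
  tibble(
    sum_unique = length(unique(x)),
    NA_count   = sum(is.na(x)),
    NA_percent = round(mean(is.na(x)), 2)
  )
})

# names_to plays the role of .id in map_dfr(): the list names become a column.
result <- list_rbind(pieces, names_to = "Variable_name")
```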

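Finally, here is another solution using functions from dplyr and tidyr. It computes every statistic for every column in a single wide row (with names such as mpg_sum_unique), then reshapes the result to one row per variable. Because separate() splits at the first underscore and extra = "merge" keeps the remainder together, this relies on the column names containing no underscores.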

mtcars %>%
  summarize_all(
    list(
      sum_unique = function(x) NROW(unique(x)), 
      NA_count = function(x) sum(is.na(x)), 
      NA_percent = function(x) round(sum(is.na(x))/NROW(x),2)
    )
  ) %>%
  pivot_longer(everything(), 
               names_to = "column", 
               values_to = "value") %>%
  separate(column, into = c("Variable_name", "parameter"), sep = "_", extra = "merge") %>%
  pivot_wider(names_from = "parameter", values_from = "value")
# # A tibble: 11 x 4
#    Variable_name sum_unique NA_count NA_percent
#    <chr>              <dbl>    <dbl>      <dbl>
#  1 mpg                   25        0          0
#  2 cyl                    3        0          0
#  3 disp                  27        0          0
#  4 hp                    22        0          0
#  5 drat                  22        0          0
#  6 wt                    29        0          0
#  7 qsec                  30        0          0
#  8 vs                     2        0          0
#  9 am                     2        0          0
# 10 gear                   3        0          0
# 11 carb                   6        0          0
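
In dplyr 1.0.0 and later, summarize_all() is superseded by across(). The following is a sketch of the same idea in that style, assuming dplyr >= 1.0.0 and tidyr >= 1.0.0. The double-underscore separator is an arbitrary choice made here so that the split is unambiguous; using the special ".value" name in pivot_longer() removes the need for separate() and pivot_wider().

```r
library(dplyr)
library(tidyr)

result <- mtcars %>%
  summarise(across(
    everything(),
    list(
      sum_unique = function(x) length(unique(x)),
      NA_count   = function(x) sum(is.na(x)),
      NA_percent = function(x) round(mean(is.na(x)), 2)
    ),
    .names = "{.col}__{.fn}"  # e.g. mpg__sum_unique
  )) %>%
  pivot_longer(
    everything(),
    # The part before "__" becomes Variable_name; the part after it
    # names the output column (".value").
    names_to  = c("Variable_name", ".value"),
    names_sep = "__"
  )
```

Because each statistic ends up in its own column without passing through a shared value column, the integer counts are not coerced to double along the way.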