I am switching from dplyr to data.table and have not yet found a data.table way to build a table of summary statistics from a user-defined function. Until now I have done this with dplyr, using the code below. Is there a neat way to get the same result with data.table?

library(dplyr)
library(mlbench)
data(BostonHousing)
df <- BostonHousing

fun_stats <- function(x) {
  # Return the list directly so the result is visible to the caller
  list(
    min  = min(x, na.rm = TRUE),
    max  = max(x, na.rm = TRUE),
    mean = mean(x, na.rm = TRUE)
  )
}

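stats <- df %>%
  select_if(is.numeric) %>%          # keep numeric columns only
  purrr::map(fun_stats) %>%          # one list of statistics per column
  bind_rows(., .id = "var") %>%      # stack into one row per variable
  mutate(across(where(is.numeric)))  # no function supplied, so values are left unchanged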

1 Answer

library(data.table)
library(mlbench)
data(BostonHousing)
dt <- as.data.table(BostonHousing)

fun_stats <- function(x) {
  # Return the list directly so the result is visible to the caller
  list(
    min  = min(x, na.rm = TRUE),
    max  = max(x, na.rm = TRUE),
    mean = mean(x, na.rm = TRUE)
  )
}

dt[, rbindlist(lapply(.SD, fun_stats), idcol = "var"), 
   .SDcols = is.numeric]
#>         var       min      max        mean
#>      <char>     <num>    <num>       <num>
#>  1:    crim   0.00632  88.9762   3.6135236
#>  2:      zn   0.00000 100.0000  11.3636364
#>  3:   indus   0.46000  27.7400  11.1367787
#>  4:     nox   0.38500   0.8710   0.5546951
#>  5:      rm   3.56100   8.7800   6.2846344
#>  6:     age   2.90000 100.0000  68.5749012
#>  7:     dis   1.12960  12.1265   3.7950427
#>  8:     rad   1.00000  24.0000   9.5494071
#>  9:     tax 187.00000 711.0000 408.2371542
#> 10: ptratio  12.60000  22.0000  18.4555336
#> 11:       b   0.32000 396.9000 356.6740316
#> 12:   lstat   1.73000  37.9700  12.6530632
#> 13:    medv   5.00000  50.0000  22.5328063

Created on 2022-06-24 by the reprex package (v2.0.1)
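
To see what the one-liner is doing, it may help to split it into two steps on a small table. This is only a sketch: the `toy` table and its values are made up for illustration (they are not part of BostonHousing), and it assumes data.table v1.13.0 or later so that `.SDcols` accepts a function.

```r
library(data.table)

# Hypothetical toy data with one non-numeric column and one NA
toy <- data.table(a = c(1, 2, 3), b = c(10, NA, 30), g = c("x", "y", "z"))

fun_stats <- function(x) {
  list(
    min  = min(x, na.rm = TRUE),
    max  = max(x, na.rm = TRUE),
    mean = mean(x, na.rm = TRUE)
  )
}

# Step 1: apply fun_stats to each numeric column, giving a named list of lists
per_col <- lapply(toy[, .SD, .SDcols = is.numeric], fun_stats)

# Step 2: stack the lists into rows; idcol turns the list names into a "var" column
stats <- rbindlist(per_col, idcol = "var")
print(stats)
```

The character column `g` is dropped by `.SDcols = is.numeric`, and `na.rm = TRUE` keeps the `NA` in `b` from propagating into the statistics.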

2 Comments

Note that in .SDcols you can provide a function (.SDcols = is.numeric) instead of .SDcols = sapply(dt, is.numeric). Is there any reason why you chose the latter?
Thanks @B.ChristianKamgang. I thought that was only in the dev version, but it has been available since v1.13.0 (current CRAN is v1.14.2), so I've edited my answer.
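
For anyone comparing the two forms mentioned in these comments, here is a small sketch showing that they select the same columns. The `toy` table is hypothetical, and the function form assumes data.table v1.13.0 or later, as noted above.

```r
library(data.table)

# Hypothetical toy data: one character, one double, one integer column
toy <- data.table(id = c("a", "b"), x = c(1.5, 2.5), n = c(1L, 2L))

# Function form: data.table applies is.numeric to each column itself
by_function <- toy[, .SD, .SDcols = is.numeric]

# Logical-vector form: the selection is computed up front with sapply
by_logical <- toy[, .SD, .SDcols = sapply(toy, is.numeric)]

# Both forms keep the same columns in the same order
same_columns <- identical(names(by_function), names(by_logical))
print(same_columns)
```

The function form is shorter and does not repeat the table name, which is convenient when the table is itself the result of an expression.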
