I want to create a table of summary statistics by applying several summary functions to multiple variables. I've managed to do it using summarise() and across(), but the result is a wide data frame that is hard to read. Is there a better alternative (perhaps using purrr), or is there an easy way of reshaping the data?

Here is a reproducible example (the funs list contains additional functions I've created myself):

data <- as.data.frame(cbind(estimator1 = rnorm(3), 
                            estimator2 = runif(3)))
funs <- list(mean = mean, median = median)

If I use summarise and across I obtain:

estimator1_mean estimator1_median estimator2_mean estimator2_median
0.9506083          1.138536       0.5789924         0.7598719
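The call that produces the wide result is not shown; a minimal sketch of what it presumably looks like is below. The seed and the use of data.frame() are assumptions made only so the sketch is reproducible.

```r
library(dplyr)

set.seed(1)  # assumed seed, only for reproducibility of the sketch
data <- data.frame(estimator1 = rnorm(3), estimator2 = runif(3))
funs <- list(mean = mean, median = median)

# across() applies every function in `funs` to every column and names
# each result "<column>_<function>", which gives a single wide row.
wide <- data |>
  summarise(across(everything(), funs))

names(wide)
```

The single underscore in each resulting column name is what makes the later reshaping by names_sep = "_" possible.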

What I would like to obtain is:

         estimator1 estimator2
mean     0.9506083  0.5789924        
median   1.138536   0.7598719
  • You could do: tidyr::pivot_longer(df, everything(), names_sep = "_", names_to = c(".value", "metric")) after summarise/across Commented Mar 8, 2023 at 10:32

2 Answers

base R approach:

Using sapply:

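sapply(data, \(x) sapply(funs, \(f) f(x))) nests one sapply() call inside another. The outer call passes each column x of data to an anonymous function; the inner call then applies each function f in funs to that column.

Both are anonymous functions written with the \() shorthand for function(), available since R 4.1. \(x) takes one argument x (a column of data), and \(f) takes one argument f (one of the summary functions), so \(f) f(x) simply calls f on the column.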
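To make the shorthand more concrete, here is a small sketch that writes the same computation with function() and a named helper. The fixed data values and the helper name apply_all_funs are made up for illustration.

```r
# Small fixed data set (illustrative values, not from the question)
data <- data.frame(estimator1 = c(1, 2, 6), estimator2 = c(0.2, 0.4, 0.9))
funs <- list(mean = mean, median = median)

# Shorthand form from the answer (requires R >= 4.1)
short <- sapply(data, \(x) sapply(funs, \(f) f(x)))

# Same computation spelled out with function() and a named helper
apply_all_funs <- function(x) {
  # x is one column of `data`; call each function in `funs` on it
  sapply(funs, function(f) f(x))
}
long <- sapply(data, apply_all_funs)

identical(short, long)  # TRUE: both give a matrix with rows mean, median
```

The inner sapply() returns a named vector (one entry per function) for each column, and the outer sapply() binds those vectors together as the columns of a matrix.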

Given funs <- list(mean = mean, median = median), the call applies mean() and median() to each column of data and returns a matrix with one row per function and one column per variable:

sapply(data, \(x) sapply(funs, \(f) f(x) ))
       estimator1 estimator2
mean    0.3081365  0.4251447
median  0.2159416  0.3198206
1 Comment

Thank you, this is also very clean. Could you please elaborate on the '\(x)' and the '\(f) f(x)'? I don't understand intuitively what is going on.
You can use pivot_longer() with the special ".value" entry in names_to. ".value" indicates that the corresponding component of the column name defines the name of the output column containing the cell values, overriding values_to entirely (see the pivot_longer() documentation), e.g.:

  library(dplyr)  
  data |>
    summarise(across(everything(), list(mean = mean, median = median, var = var))) |>
    tidyr::pivot_longer(cols = everything(), names_to = c(".value", "stats"), names_sep = "_")

  stats  estimator1 estimator2
  <chr>       <dbl>      <dbl>
1 mean        0.221    0.448  
2 median      0.110    0.429  
3 var         0.770    0.00288
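If the statistics should appear as row names, as in the layout the question asks for, one possible follow-up step is tibble::column_to_rownames(). This is a sketch under the assumption that a plain data frame is acceptable as the result; the fixed data values are illustrative.

```r
library(dplyr)

# Small fixed data set (illustrative values, not from the question)
data <- data.frame(estimator1 = c(1, 2, 6), estimator2 = c(0.2, 0.4, 0.9))
funs <- list(mean = mean, median = median)

result <- data |>
  summarise(across(everything(), funs)) |>
  tidyr::pivot_longer(everything(),
                      names_to = c(".value", "stats"),
                      names_sep = "_") |>
  # tibbles do not keep row names, so this step returns a plain data frame
  tibble::column_to_rownames("stats")

result
```

The result has rows named mean and median and one column per estimator, which matches the shape of the base R sapply() matrix.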
