Suppose I have the following function:

SlowFunction <- function(vector) {
  list(
    mean = mean(vector),
    sd   = sd(vector)
  )
}

And I would like to use dplyr::summarise to write the results to a data frame:

iris %>% 
  dplyr::group_by(Species) %>% 
  dplyr::summarise(
    mean = SlowFunction(Sepal.Length)$mean,
    sd   = SlowFunction(Sepal.Length)$sd
    )

Does anyone have a suggestion for how I can do this by calling SlowFunction once instead of twice? (In my real code, SlowFunction is slow and has to be called many times.) Splitting SlowFunction into two parts is not an option. In other words, I would like to fill multiple columns of a data frame in one statement.


4 Answers


Without changing your current SlowFunction, one way is to use do:

library(dplyr)

iris %>% 
   group_by(Species) %>% 
   do(data.frame(SlowFunction(.$Sepal.Length)))

#  Species     mean    sd
#  <fct>      <dbl> <dbl>
#1 setosa      5.01 0.352
#2 versicolor  5.94 0.516
#3 virginica   6.59 0.636

Or with group_split and purrr::map_dfr:

bind_cols(Species = unique(iris$Species), iris %>%
     group_split(Species) %>%
     purrr::map_dfr(~SlowFunction(.$Sepal.Length)))

One option is to store the output of SlowFunction in a list column of data frames and then use unnest (from tidyr):

iris %>%
    group_by(Species) %>%
    summarise(res = list(as.data.frame(SlowFunction(Sepal.Length)))) %>%
    tidyr::unnest(res)
## A tibble: 3 x 3
#  Species     mean    sd
#  <fct>      <dbl> <dbl>
#1 setosa      5.01 0.352
#2 versicolor  5.94 0.516
#3 virginica   6.59 0.636

1 Comment

Thanks, I've compared the answers and this one is in my case by far the fastest. It is 2x faster than using "do" and 2.5x faster than using group_map!
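
Timings like these depend on the machine, the data, and the package versions, so it can be worth re-measuring locally. The following is only a rough sketch using base R's system.time; the helper time_it and the wrappers via_do and via_list_col are hypothetical names introduced here for illustration, and the repetition count is arbitrary.

```r
library(dplyr)
library(tidyr)

SlowFunction <- function(vector) {
  list(mean = mean(vector), sd = sd(vector))
}

# Hypothetical helper: call f() n times and return the elapsed seconds.
time_it <- function(f, n = 20) {
  system.time(for (i in seq_len(n)) f())[["elapsed"]]
}

# The "do" approach from the first answer.
via_do <- function() {
  iris %>%
    group_by(Species) %>%
    do(data.frame(SlowFunction(.$Sepal.Length)))
}

# The list-column + unnest approach from this answer.
via_list_col <- function() {
  iris %>%
    group_by(Species) %>%
    summarise(res = list(as.data.frame(SlowFunction(Sepal.Length)))) %>%
    unnest(res)
}

timings <- c(do = time_it(via_do), list_column = time_it(via_list_col))
print(timings)
```

With a function as cheap as this toy SlowFunction, the numbers mostly reflect dplyr overhead; with a genuinely slow function, the cost of the function itself is likely to dominate.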

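With dplyr 0.8.0 we can use group_map. The output from SlowFunction needs to be converted to a data frame. (From dplyr 0.8.1 onward, group_map returns a list; group_modify is the variant that returns the grouped tibble shown below.)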

library(dplyr)

iris %>% 
  group_by(Species) %>% 
  group_map(~SlowFunction(.x$Sepal.Length) %>% as.data.frame())
# # A tibble: 3 x 3
# # Groups:   Species [3]
#   Species     mean    sd
#   <fct>      <dbl> <dbl>
# 1 setosa      5.01 0.352
# 2 versicolor  5.94 0.516
# 3 virginica   6.59 0.636
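
For dplyr 0.8.1 and later, where group_map returns a list, a minimal sketch of the same idea with group_modify might look like this. It assumes the original list-returning SlowFunction; the name result is only for illustration.

```r
library(dplyr)

SlowFunction <- function(vector) {
  list(mean = mean(vector), sd = sd(vector))
}

# group_modify() applies the function once per group; the function must
# return a data frame, and the grouping column is added back automatically.
result <- iris %>%
  group_by(Species) %>%
  group_modify(~ as.data.frame(SlowFunction(.x$Sepal.Length)))

print(result)
```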


We can change SlowFunction to return a tibble:

SlowFunction <- function(vector) {
  tibble(
    mean = mean(vector),
    sd   = sd(vector)
  )
}

and then wrap the summarise output in a list column and unnest it:

iris %>% 
    group_by(Species) %>% 
    summarise(out = list(SlowFunction(Sepal.Length))) %>%
    tidyr::unnest(out)
# A tibble: 3 x 3
#  Species     mean    sd
#  <fct>      <dbl> <dbl>
#1 setosa      5.01 0.352
#2 versicolor  5.94 0.516
#3 virginica   6.59 0.636
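
Assuming dplyr 1.0.0 or later, the list column and the unnest step may not be needed at all: an unnamed expression inside summarise that returns a one-row data frame is unpacked into multiple columns. A minimal sketch, using the original list-returning SlowFunction and an illustrative variable name result:

```r
library(dplyr)

SlowFunction <- function(vector) {
  list(mean = mean(vector), sd = sd(vector))
}

# The unnamed data frame returned inside summarise() is spliced into
# one column per element, so SlowFunction runs once per group.
result <- iris %>%
  group_by(Species) %>%
  summarise(as.data.frame(SlowFunction(Sepal.Length)))

print(result)
```

This keeps the call inside summarise, which is close to what the question asked for, at the cost of requiring a newer dplyr.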
