I have a data frame that looks like this:

library(dplyr)

set.seed(10)  # fixed seed so the example is reproducible
sample <- tibble(group = c('A','B','C','C',NA,'D'),
                 var_hello = rnorm(6),
                 var_how = rnorm(6),
                 var_are = rnorm(6),
                 var_you = rnorm(6),
                 var_buddy = rnorm(6))
# A tibble: 6 × 6
  group   var_hello    var_how     var_are    var_you  var_buddy
  <chr>       <dbl>      <dbl>       <dbl>      <dbl>      <dbl>
1     A  0.01874617 -1.2080762 -0.23823356  0.9255213 -1.2651980
2     B -0.18425254 -0.3636760  0.98744470  0.4829785 -0.3736616
3     C -1.37133055 -1.6266727  0.74139013 -0.5963106 -0.6875554
4     C -0.59916772 -0.2564784  0.08934727 -2.1852868 -0.8721588
5  <NA>  0.29454513  1.1017795 -0.95494386 -0.6748659 -0.1017610
6     D  0.38979430  0.7557815 -0.19515038 -2.1190612 -0.2537805

In my original dataset, there are many, many var_something variables.

I would like to group_by('group') and compute the mean of a subset of these var_something variables, but even this subset can be large, so I don't want to type a mutate call manually for every variable.

In the example, I am interested in the variables in the following list: ['var_hello', 'var_are']

I don't know how to code that up efficiently in dplyr. In Pandas, one could simply write

for var in ['var_hello', 'var_are']:
    # transform keeps one value per row, so the result aligns with sample's index
    sample['computation_' + var] = sample.groupby('group')[var].transform('mean')

Note how I can automatically create the new column names (of the form computation_var_hello). What is the best way to achieve that in dplyr?

Many thanks!

@ProcrastinatusMaximus This is not the case. Here the challenge is to compute something for only a subset of my columns and to avoid typing them all manually, as in Jake's solution. Commented Nov 23, 2016 at 16:11

1 Answer

You can do this simply by using group_by and summarize_each. You then specify which variables you want to summarize, then replace the prefix in the names using setNames.

sample %>%
   group_by(group) %>%
   summarize_each(funs(mean), var_hello, var_are) %>% 
   setNames(gsub("var_","computation_var_",colnames(.)))
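
If the subset of columns is itself long, the names can be held in a character vector rather than typed into the call. The following is only a sketch, assuming dplyr 1.0.0 or later, where across() and all_of() are available (summarize_each and funs have since been deprecated). The names vars, group_means and with_means are made up for illustration.

```r
library(dplyr)

set.seed(10)
sample <- tibble(group = c('A','B','C','C',NA,'D'),
                 var_hello = rnorm(6),
                 var_how = rnorm(6),
                 var_are = rnorm(6),
                 var_you = rnorm(6),
                 var_buddy = rnorm(6))

# Hypothetical vector holding the columns of interest; it could be built
# programmatically, e.g. with grep() on names(sample).
vars <- c("var_hello", "var_are")

# One row per group, with columns named computation_<original name>.
group_means <- sample %>%
  group_by(group) %>%
  summarise(across(all_of(vars), mean, .names = "computation_{.col}"))

# Same computation, but keeping every original row (closer to the
# per-row columns the Pandas loop in the question creates).
with_means <- sample %>%
  group_by(group) %>%
  mutate(across(all_of(vars), mean, .names = "computation_{.col}")) %>%
  ungroup()
```

The .names glue string builds the new column names, so no setNames step is needed. Rows where group is NA are kept as a group of their own.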

1 Comment

Thanks, but that does not work. The problem is that I have many variables, and even my subset list can be very large, so I don't want to type the mutate statement manually for each of them.
