Dplyr: how to loop over specific columns whose names are in a list?

Question

I have a dataframe that looks like this:

library(dplyr)  # provides tibble() and the %>% pipe

set.seed(10)
# tibble() replaces the deprecated data_frame()
sample <- tibble(group = c('A','B','C','C',NA,'D'),
                 var_hello = rnorm(6),
                 var_how = rnorm(6),
                 var_are = rnorm(6),
                 var_you = rnorm(6),
                 var_buddy = rnorm(6))
# A tibble: 6 × 6
  group   var_hello    var_how     var_are    var_you  var_buddy
  <chr>       <dbl>      <dbl>       <dbl>      <dbl>      <dbl>
1     A  0.01874617 -1.2080762 -0.23823356  0.9255213 -1.2651980
2     B -0.18425254 -0.3636760  0.98744470  0.4829785 -0.3736616
3     C -1.37133055 -1.6266727  0.74139013 -0.5963106 -0.6875554
4     C -0.59916772 -0.2564784  0.08934727 -2.1852868 -0.8721588
5  <NA>  0.29454513  1.1017795 -0.95494386 -0.6748659 -0.1017610
6     D  0.38979430  0.7557815 -0.19515038 -2.1190612 -0.2537805

In my original dataset, there are many, many var_something variables.

I would like to group_by(group) and compute the mean of a subset of these var_something variables, but even this subset can be large, so I don't want to type a separate mutate call by hand for every variable.

In the example, I am interested in the variables in the following list: ['var_hello', 'var_are'].

I don't know how to code that up efficiently in dplyr. In Pandas, one could simply write:

for var in ['var_hello', 'var_are']:
    # transform() returns the group mean aligned to the original rows
    sample['computation_' + var] = sample.groupby('group')[var].transform('mean')

Note how I can automatically create the new column names (of the form computation_var_hello). What is the best way to achieve that in dplyr?

Many thanks!

@ProcrastinatusMaximus this is not the case. Here the challenge is to compute something for only a subset of my columns without having to type them all manually, as Jake's solution requires.
– ℕʘʘḆḽḘ, Commented Nov 23, 2016 at 16:11

Jake Kaupp · Accepted Answer · 2016-11-23 16:45:03Z

2

You can do this simply by using group_by and summarize_each. You then specify which variables you want to summarize, then replace the prefix in the names using setNames.

sample %>%
   group_by(group) %>%
   summarize_each(funs(mean), var_hello, var_are) %>% 
   setNames(gsub("var_","computation_var_",colnames(.)))
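On dplyr 1.0.0 and later, summarize_each and funs are deprecated. A minimal sketch of the same idea with across() follows, assuming the column names are held in a character vector. The name vars and the toy values are illustrative and not from the original post.

```r
library(dplyr)

# Toy data standing in for the question's `sample` (values are made up)
sample <- tibble(group = c('A', 'B', 'C', 'C', NA, 'D'),
                 var_hello = c(1, 2, 3, 5, 7, 11),
                 var_how   = c(0, 0, 0, 0, 0, 0),
                 var_are   = c(2, 4, 6, 10, 14, 22))

# Hypothetical name: the subset of columns to summarize, as strings
vars <- c("var_hello", "var_are")

result <- sample %>%
  group_by(group) %>%
  # all_of() selects columns from a character vector;
  # .names builds the new column names from the old ones
  summarise(across(all_of(vars), mean, .names = "computation_{.col}"),
            .groups = "drop")

print(result)
```

Because the selection comes from vars, extending the subset only means extending that vector; no column needs to be typed inside the pipeline.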

edited Nov 23, 2016 at 16:45

answered Nov 23, 2016 at 16:06

1 Comment

ℕʘʘḆḽḘ Over a year ago

Thanks, but that does not work. The problem is that I have many variables, and even my subset list can be very large, so I don't want to manually type the mutate statement for each of them.
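If the goal is to keep every row and add the group means as new columns, as the Pandas loop in the question does, one possible sketch uses mutate() with across() driven by a character vector. This assumes dplyr 1.0.0 or later. The names vars and with_means and the toy values are illustrative, not from the original post.

```r
library(dplyr)

# Toy data standing in for the question's `sample` (values are made up)
sample <- tibble(group = c('A', 'B', 'C', 'C', NA, 'D'),
                 var_hello = c(1, 2, 3, 5, 7, 11),
                 var_how   = c(0, 0, 0, 0, 0, 0),
                 var_are   = c(2, 4, 6, 10, 14, 22))

# Hypothetical name: the columns of interest, however long the list is
vars <- c("var_hello", "var_are")

with_means <- sample %>%
  group_by(group) %>%
  # One group mean per selected column, repeated on every row of the group
  mutate(across(all_of(vars), mean, .names = "computation_{.col}")) %>%
  ungroup()

print(with_means)
```

The original columns are left untouched; only the computation_ columns are added, and columns outside vars (here var_how) are ignored.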
