I'm using dplyr to aggregate a column of a data frame, as shown below.

> data <- data.frame(a=rep(1:2,3), b=c(6:11))
> data
  a  b
1 1  6
2 2  7
3 1  8
4 2  9
5 1 10
6 2 11
> data %>% group_by(a) %>% summarize(tot=sum(b))
# A tibble: 2 x 2
      a   tot
  <int> <int>
1     1    24
2     2    27

This is perfect. However, I want to create a reusable function for this, so that the column name can be passed as an argument.

Looking at answers to related questions, I tried the following.

sumByColumn <- function(df, colName) {
  df %>%
  group_by(a) %>%
  summarize(tot=sum(colName))
  df
}

However, I'm not able to get it working.

> sumByColumn(data, "b")

 Error in summarise_impl(.data, dots) : 
  Evaluation error: invalid 'type' (character) of argument. 

> sumByColumn(data, b)

 Error in summarise_impl(.data, dots) : 
  Evaluation error: object 'b' not found. 
> 

4 Answers

This works with the tidy evaluation syntax in recent versions of dplyr (as can be seen on GitHub):

library(dplyr)
library(rlang)
sumByColumn <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!! sym(colName)))
}

sumByColumn(data, "b")
## A tibble: 2 x 2
#      a   tot
#  <int> <int>
#1     1    24
#2     2    27

Alternatively, to pass b as a bare (unquoted) column name:

library(dplyr)
sumByColumn <- function(df, colName) {
  myenc <- enquo(colName)
  df %>%
    group_by(a) %>%
    summarize(tot = sum(!!myenc))
}

sumByColumn(data, b)
## A tibble: 2 x 2
#      a   tot
#  <int> <int>
#1     1    24
#2     2    27

2 Comments

This works. However, if I add filter(!!myenc > 7) before group_by, it doesn't return any rows. What is the right way to specify the column name inside filter()?
This is covered in the documentation. In older versions of rlang, !! has the same low precedence as !, so !!myenc > 7 unquotes the whole comparison instead of just the column. Use parentheses, i.e. filter((!!myenc) > 7), or the functional form, i.e. filter(UQ(myenc) > 7). Then it works fine. Note that UQ() has since been deprecated, so the parenthesized form is the safer choice.
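The filter case raised in the comments above can be sketched as follows. This is a minimal illustration rather than code from the original answer: the function name sumByColumnFiltered and the minValue argument are made up for the example, and it assumes a dplyr version that re-exports enquo(). The parentheses around !!col keep the unquoting limited to the column regardless of how the installed rlang version parses !!.

```r
library(dplyr)

data <- data.frame(a = rep(1:2, 3), b = 6:11)

# Hypothetical helper: filter on the passed column, then sum it per group.
sumByColumnFiltered <- function(df, colName, minValue) {
  col <- enquo(colName)            # capture the bare column name
  df %>%
    filter((!!col) > minValue) %>% # parentheses: unquote only the column
    group_by(a) %>%
    summarize(tot = sum(!!col))
}

result <- sumByColumnFiltered(data, b, 7)
result
```

With the sample data, only the rows where b is 8, 9, 10 or 11 survive the filter, so each group is summed over two rows.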
We can use {{}}:

library(dplyr)

sumByColumn <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(tot=sum({{colName}}))
}

sumByColumn(data, b)

#      a   tot
#  <int> <int>
#1     1    24
#2     2    27
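The same operator can be applied to more than one argument. As a rough sketch, assuming rlang 0.4.0 or later, the grouping column can be passed in as well. The name sumBy and its arguments groupCol and valueCol are illustrative only and do not appear in the answer above.

```r
library(dplyr)

data <- data.frame(a = rep(1:2, 3), b = 6:11)

# Hypothetical generalization: both columns are passed as bare names.
sumBy <- function(df, groupCol, valueCol) {
  df %>%
    group_by({{ groupCol }}) %>%
    summarize(tot = sum({{ valueCol }}))
}

result <- sumBy(data, a, b)
result
```

Called as sumBy(data, a, b), this should reproduce the totals shown above for sumByColumn(data, b).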

2 Comments

Where is {{ documented? I'd like to read more about it.
@filups21 They talk a little about it in the 2019 release for rlang 0.4.0. You will sometimes see it called the curly curly operator.
We can use the .data pronoun.

library(dplyr)

sumByColumn <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarise(tot = sum(.data[[colName]]))
}

sumByColumn(data, "b")

#      a   tot
#* <int> <int>
#1     1    24
#2     2    27

dplyr also provides scoped helper functions for this, such as summarise_at, which accepts the arguments vars and funs:

sumByColumn <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize_at(vars(colName), funs(tot = sum))
}

This gives the same result:

## A tibble: 2 x 2
#      a   tot
#  <int> <int>
#1     1    24
#2     2    27

3 Comments

Note that the last line could be: summarize_at(colName, sum)
@G.Grothendieck, funs(tot = sum) in case OP wanted to rename the column
Verbs with the _at suffix are now superseded in favor of across(), and funs() is deprecated, so the other answers are preferable for people who find this question nowadays.
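For readers on dplyr 1.0.0 or later, a possible across()-based equivalent of the summarize_at version is sketched below. It is not taken from the answer: the name sumByColumnAcross is hypothetical, and the "{.col}_tot" naming pattern is an assumption chosen for the example, which yields a column called b_tot rather than tot.

```r
library(dplyr)

data <- data.frame(a = rep(1:2, 3), b = 6:11)

# Hypothetical rewrite: colName is a string, selected with all_of().
sumByColumnAcross <- function(df, colName) {
  df %>%
    group_by(a) %>%
    summarize(across(all_of(colName), sum, .names = "{.col}_tot"))
}

result <- sumByColumnAcross(data, "b")
result
```

Using all_of() makes it explicit that colName is a character vector of column names rather than a column in the data frame.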
