Using dplyr summarise in R with dynamic variable

Question

I am trying to use summarise and group_by from dplyr in R. However, when I use a variable in place of explicitly naming the summarised column, it returns the sum of dist for the entire data set on every row rather than grouping properly. The difference is easy to see by comparing TestBad and TestGood below. I just want to replicate TestGood's results using the GraphVar variable, as in TestBad.

    require("dplyr")
    GraphVar <- "dist"

    TestBad <- summarise(group_by_(cars,"speed"),Sum=sum(cars[[GraphVar]],na.rm=TRUE),Count=n())

    TestGood <- summarise(group_by_(cars,"speed"),Sum=sum(dist,na.rm=TRUE),Count=n())

Thanks!

You'll need the standard evaluation functions from dplyr. See an example here and the nse vignette here.
– aosmith, Commented Aug 31, 2016 at 14:36
@aosmith They're already using standard evaluation (group_by_) and are having trouble with it, I reckon.
– Frank, Commented Aug 31, 2016 at 14:36

aosmith · Accepted Answer · 2020-02-25 15:25:27Z

14

As of February 2020, the rlang package provides tidyeval tools for this. In particular, if the column names are strings, you can use the .data pronoun.

    library(dplyr)
    GraphVar <- "dist"
    cars %>%
        group_by(.data[["speed"]]) %>%
        summarise(Sum = sum(.data[[GraphVar]], na.rm = TRUE),
                  Count = n())
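The same .data pattern can be wrapped in a function so that both the grouping column and the summed column arrive as strings. This is only a minimal sketch: the helper name sum_by and its argument names are hypothetical, not part of dplyr.

```r
library(dplyr)

# Hypothetical helper (not a dplyr function): group by one column and
# sum another, with both column names supplied as strings.
sum_by <- function(data, group_col, value_col) {
  data %>%
    group_by(.data[[group_col]]) %>%
    summarise(Sum = sum(.data[[value_col]], na.rm = TRUE),
              Count = n())
}

GraphVar <- "dist"
res <- sum_by(cars, "speed", GraphVar)
```

Because .data[[value_col]] is evaluated inside each group, Sum holds the per-group total, unlike cars[[GraphVar]], which always refers to the full column.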

While they will be superseded (but not deprecated) in dplyr 1.0.0, the scoped helper *_at() functions are useful when working with strings.

    cars %>%
        group_by_at("speed") %>%
        summarise_at(.vars = vars(GraphVar),
                     .funs = list(Sum = ~sum(., na.rm = TRUE),
                                  Count = ~n()))

In 2016 you needed the standard evaluation function summarise_() along with lazyeval::interp(). This still works in 2020 but has been deprecated.

    library(lazyeval)
    cars %>%
        group_by_("speed") %>%
        summarise_(Sum = interp(~sum(var, na.rm = TRUE), var = as.name(GraphVar)),
                   Count = ~n())

edited Feb 25, 2020 at 15:25

answered Aug 31, 2016 at 14:47


6 Comments

user680111 Over a year ago

this usage is deprecated

aosmith Over a year ago

@user680111 Yes, this answer is from 2016, which predates the current tidyeval approach. Was the downvote to ask for an updated answer or something else?

user680111 Over a year ago

Yeah, an update would be appreciated. Most of the solutions for dynamic variable selection in dplyr rely on obsolete usage.

aosmith Over a year ago

@user680111 I updated yesterday. It's actually interesting that the old way, while deprecated, still works.

Indranil Gayen Over a year ago

How do I use the .data pronoun for more than one variable?


James Baye · Accepted Answer · 2020-12-24 13:36:54Z

5

The latest usage for referring to one or more columns by name seems to be

    cars %>% group_by(across("speed")) %>% ...
    cars %>% group_by(across(c("speed", "dist"))) %>% ...

See vignette("colwise"), section Other verbs.
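For several columns given as character vectors, across() can be used in both group_by() and summarise(). The following is a sketch that assumes dplyr 1.0.0 or later; it uses the built-in mtcars data set because cars has only two columns, and the variable names group_vars and sum_vars as well as the "Sum_" prefix are illustrative choices.

```r
library(dplyr)

# Column names held as strings (illustrative choices from mtcars)
group_vars <- c("cyl", "gear")
sum_vars   <- c("mpg", "hp")

res <- mtcars %>%
  group_by(across(all_of(group_vars))) %>%
  summarise(across(all_of(sum_vars),
                   ~ sum(.x, na.rm = TRUE),
                   .names = "Sum_{.col}"),   # e.g. Sum_mpg, Sum_hp
            Count = n(),
            .groups = "drop")
```

Wrapping the character vectors in all_of() makes it explicit that they are external vectors of column names and raises an error if any named column is missing.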

answered Dec 24, 2020 at 13:36
