I am trying to use summarise and group_by from dplyr in R. However, when I use a variable in place of explicitly naming the summarised column, it uses the sum of dist for the entire data set in each row rather than grouping properly. This is easy to see in the difference between TestBad and TestGood below. I just want to replicate TestGood's results using the GraphVar variable, as in TestBad.

    require("dplyr")
    GraphVar <- "dist"

    TestBad <- summarise(group_by_(cars,"speed"),Sum=sum(cars[[GraphVar]],na.rm=TRUE),Count=n())

    TestGood <- summarise(group_by_(cars,"speed"),Sum=sum(dist,na.rm=TRUE),Count=n())

Thanks!

  • You'll need the standard evaluation functions from dplyr. See an example here and the nse vignette here. (Commented Aug 31, 2016 at 14:36)
  • @aosmith They're already using standard evaluation (group_by_) and are having trouble with it, I reckon. (Commented Aug 31, 2016 at 14:36)

2 Answers

As of February 2020, there are tidyeval tools for this from the rlang package. In particular, if you are working with strings you can use the .data pronoun.

    library(dplyr)
    GraphVar = "dist"
    cars %>%
         group_by(.data[["speed"]]) %>%
         summarise(Sum = sum(.data[[GraphVar]], na.rm = TRUE),
                   Count = n() )
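
As a rough check that the .data approach reproduces what the question calls TestGood, here is a minimal sketch that wraps it in a function and compares it with the version that names the columns directly. The helper name `sum_by` and its argument names are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): group by one column and
# sum another, with both column names passed as strings.
sum_by <- function(df, group_var, sum_var) {
  df %>%
    group_by(.data[[group_var]]) %>%
    summarise(Sum = sum(.data[[sum_var]], na.rm = TRUE),
              Count = n())
}

by_string <- sum_by(cars, "speed", "dist")

# Reference result with the column names written out directly.
by_name <- cars %>%
  group_by(speed) %>%
  summarise(Sum = sum(dist, na.rm = TRUE),
            Count = n())

# Compare the summarised columns; this errors if they differ.
stopifnot(isTRUE(all.equal(by_string$Sum, by_name$Sum)),
          isTRUE(all.equal(by_string$Count, by_name$Count)))
```

Because .data[[sum_var]] is looked up inside the data mask, it refers to the current group's slice of the column, whereas cars[[GraphVar]] in TestBad always refers to the full, ungrouped column.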

While they will be superseded (but not deprecated) in dplyr 1.0.0, the scoped helper *_at() functions are useful when working with strings.

    cars %>%
         group_by_at("speed") %>%
         summarise_at(.vars = vars(GraphVar),
                      .funs = list(Sum = ~sum(., na.rm = TRUE),
                                   Count = ~n() ) )

In 2016 you needed the standard evaluation function summarise_() along with lazyeval::interp(). This still works in 2020 but has been deprecated.

    library(lazyeval)
    cars %>%
        group_by_("speed") %>%
        summarise_(Sum = interp(~sum(var, na.rm = TRUE), var = as.name(GraphVar)),
                   Count = ~n() )

6 Comments

This usage is deprecated.
@user680111 Yes, this answer is from 2016, which predates the current tidyeval approach. Was the downvote to ask for an updated answer or something else?
Yeah, an update would be appreciated. Most of the solutions for dynamic variable selection in dplyr rely on obsolete usage.
@user680111 I updated yesterday. It's actually interesting that the old way, while deprecated, still works.
How do you use the .data pronoun with more than one variable?

The latest usage for referring to one or more columns by name seems to be

    cars %>% group_by(across("speed")) %>% ...
    cars %>% group_by(across(c("speed", "dist"))) %>% ...

See vignette("colwise"), section Other verbs.
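
To make the elided part concrete, here is a minimal sketch that uses across() for both the grouping and the summarising step, with the column names held in character vectors. It assumes dplyr 1.0.0 or later; the variable names `group_vars` and `sum_vars` and the "Sum_{.col}" naming pattern are illustrative choices, not something the answer above prescribes.

```r
library(dplyr)

# Assumed inputs: column names stored as strings.
group_vars <- c("speed")
sum_vars   <- c("dist")

result <- cars %>%
  # all_of() marks the character vectors as column selections
  group_by(across(all_of(group_vars))) %>%
  summarise(
    # Sum every selected column; output columns are named Sum_<column>
    across(all_of(sum_vars), ~ sum(.x, na.rm = TRUE), .names = "Sum_{.col}"),
    Count = n(),
    .groups = "drop"
  )
```

Adding more names to either vector extends the grouping or the set of summed columns without changing the pipeline itself.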
