8

Doesn't work:

mydat <- data.frame(`Col 1`=1:5, `Col 2`=1:5, check.names=F)
xcol <- "Col 1"
ycol <- "Col 2"
ggplot(data=mydat, aes_string(x=xcol, y=ycol)) + geom_point()

Works:

mydat <- data.frame(`A`=1:5, `B`=1:5)
xcol <- "A"
ycol <- "B"
ggplot(data=mydat, aes_string(x=xcol, y=ycol)) + geom_point()

Also works:

mydat <- data.frame(`Col 1`=1:5, `Col 2`=1:5, check.names=F)
ggplot(data=mydat, aes(x=`Col 1`, y=`Col 2`)) + geom_point()

What's the issue?

4
  • The docs for aes_string show that (1) weirdly named columns don't always work well (see the second-to-last set of examples), and (2) aes_string and aes_ are being deprecated in favor of tidyeval. Commented Aug 2, 2018 at 16:49
  • @camille Thanks, do you have a link explaining tidyeval? Commented Aug 2, 2018 at 19:19
  • Sure, here's one: colinfay.me/tidyeval-1 Commented Aug 2, 2018 at 19:33
  • Also, it's interesting to see answers to this post, since a few are from before tidyeval was implemented in ggplot, and a few are from post-implementation Commented Aug 2, 2018 at 19:35

3 Answers 3

11

UPDATE: Note that in more recent versions of ggplot2, the use of aes_string is discouraged. Instead, if you need to get a column value from a string, use the .data pronoun:

ggplot(data=mydat, aes(x=.data[[xcol]], y=.data[[ycol]])) + geom_point()
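For completeness, here is a self-contained sketch of the .data approach using the data from the question. The helper name plot_cols is hypothetical (it is not part of ggplot2), and the explicit labs() call is an assumption about what you want on the axes rather than something the approach requires.

```r
library(ggplot2)

# Same data as in the question: column names contain spaces
mydat <- data.frame(`Col 1` = 1:5, `Col 2` = 1:5, check.names = FALSE)

# Hypothetical helper: takes the column names as plain strings
plot_cols <- function(data, xcol, ycol) {
  ggplot(data, aes(x = .data[[xcol]], y = .data[[ycol]])) +
    geom_point() +
    # Set the axis labels explicitly so they do not depend on how the
    # installed ggplot2 version derives default labels from .data[[...]]
    labs(x = xcol, y = ycol)
}

p <- plot_cols(mydat, "Col 1", "Col 2")
p
```

Because the strings are only ever used for subsetting with [[ ]], nothing is parsed, so spaces or operators in the names should not matter here.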

ORIGINAL ANSWER: Values passed to aes_string are parse()-d. This is because you can pass things like aes_string(x="log(price)"), where you aren't passing a column name but an expression. So your string is treated as an expression, and when it is parsed, the space makes it an invalid expression. You can "fix" this by wrapping column names in quotes. For example, this works:

mydat <- data.frame(`Col 1`=1:5, `Col 2`=1:5, check.names=F)
xcol <- "Col 1"
ycol <- "Col 2"
ggplot(data=mydat, aes_string(x=shQuote(xcol), y=shQuote(ycol))) + geom_point()

We just use shQuote() to put quotes around our values (single quotes by default). You could also have embedded the backticks in your string, like you did in the other example:

mydat <- data.frame(`Col 1`=1:5, `Col 2`=1:5, check.names=F)
xcol <- "`Col 1`"
ycol <- "`Col 2`"
ggplot(data=mydat, aes_string(x=xcol, y=ycol)) + geom_point()

But the real best way to deal with this is to not use column names that are not valid variable names.
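The parsing behaviour described above can be checked with base R alone. This is only a sketch of what parse() does with each kind of string; it makes no claim about what a particular ggplot2 version does with the result afterwards.

```r
# A bare name containing a space is not a valid R expression, so parsing fails
res <- try(parse(text = "Col 1"), silent = TRUE)
inherits(res, "try-error")   # TRUE

# With backticks embedded in the string, it parses to a single symbol
bt <- parse(text = "`Col 1`")[[1]]
is.name(bt)                  # TRUE

# A string can also hold a whole expression, which is why parsing happens at all
ex <- parse(text = "log(price)")[[1]]
is.call(ex)                  # TRUE

# A string wrapped by shQuote() parses too, but as a character constant
# rather than a symbol, so it is worth checking that the resulting plot
# really maps the column and not a constant string
sq <- parse(text = shQuote("Col 1"))[[1]]
is.character(sq)             # TRUE
```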


7 Comments

Thanks, time to re-familiarize myself with tidyverse =/
Well, this isn't the "tidyverse" way to do things any more. This is the legacy ggplot way. With modern tidyverse programming you would use quosures (or expr() or sym()) and expand those into aes(). Still doesn't really help with column names with spaces though. Those are just evil.
I disagree that names with spaces are "not valid variable names". For example, you can do this: `x 2` <- 1 or use an explicit assign to the global environment without issue.
@thc Ok. I guess I meant variable names that don't require being surrounded by quotes. You can never use that name without also typing the quotes. And most people go to great lengths just to avoid a few extra characters (à la non-standard evaluation).
@thc I mean you can also do `4$.^` <- 3, but I would be reluctant to call that a valid variable name. The ticks really let you circumvent the normal variable name rules.
4

Here's a tidyeval approach, which is what the tidyverse development crew is moving towards in place of aes_ or aes_string. Tidyeval is tricky at first, but pretty well documented.

This recipe sheet isn't ggplot-specific, but it's on my bookmarks toolbar because it's pretty handy.

In this case, you want to write a function to handle making your plot. This function takes a data frame and two bare column names as arguments. Then you turn the column names into quosures with enquo, then !! unquotes them for use in aes.

library(ggplot2)

mydat <- data.frame(`Col 1`=1:5, `Col 2`=1:5, check.names=F)

pts <- function(data, xcol, ycol) {
  x_var <- enquo(xcol)
  y_var <- enquo(ycol)
  ggplot(data, aes(x = !!x_var, y = !!y_var)) +
    geom_point()
}

pts(mydat, `Col 1`, `Col 2`)

But also like @MrFlick said, do whatever you can to just use valid column names, because why not?
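If the column names arrive as strings, as in the question, rather than as bare names, a variation on the same idea is to convert each string to a symbol with sym() and unquote it with !!. This is a minimal sketch; the function name pts_chr is hypothetical, and it assumes the rlang package is installed (it is a dependency of ggplot2).

```r
library(ggplot2)

mydat <- data.frame(`Col 1` = 1:5, `Col 2` = 1:5, check.names = FALSE)

# Hypothetical variant of pts() that accepts column names as strings
pts_chr <- function(data, xcol, ycol) {
  # sym() turns a string into a symbol without parsing it as code,
  # so spaces in the name are not a problem
  x_var <- rlang::sym(xcol)
  y_var <- rlang::sym(ycol)
  ggplot(data, aes(x = !!x_var, y = !!y_var)) +
    geom_point()
}

p_chr <- pts_chr(mydat, "Col 1", "Col 2")
p_chr
```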

3 Comments

Thanks. It's mostly because readxl preserves spaces, and it saves having to perform re-labeling of axes.
But adding a labs line to your plot is probably easier than having to write a whole tidyeval wrapper function, no?
It's still useful to be able to do this, because you might be writing a plotting function that end-users of your R package will use. It's nice to automatically set axis labels that don't look like programmer variable names, which saves your end-users from having to add nice labs manually.
3

To whom it may still concern: if the column name happens to contain a space or math symbols like >, <, or =, one easy workaround is to wrap your string with as.name() when passing it to aes_string().
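A minimal sketch of that workaround follows. The column name `Col > 1` is made up for illustration, and since aes_string() is deprecated in recent ggplot2 releases, this may emit a deprecation warning or stop working in future versions.

```r
library(ggplot2)

# Made-up column name containing both spaces and an operator
mydat <- data.frame(`Col > 1` = 1:5, `Col 2` = 1:5, check.names = FALSE)
xcol <- "Col > 1"
ycol <- "Col 2"

# as.name() builds a symbol directly, so the string is never parsed as code
x_sym <- as.name(xcol)
is.name(x_sym)       # TRUE
eval(x_sym, mydat)   # looks the symbol up as a column: 1 2 3 4 5

# Passing symbols instead of strings to aes_string() (deprecated API)
p_sym <- ggplot(mydat, aes_string(x = as.name(xcol), y = as.name(ycol))) +
  geom_point()
p_sym
```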
