ggplot aes_string doesn't work with spaces

Question

Doesn't work:

library(ggplot2)

mydat <- data.frame(`Col 1`=1:5, `Col 2`=1:5, check.names=FALSE)
xcol <- "Col 1"
ycol <- "Col 2"
ggplot(data=mydat, aes_string(x=xcol, y=ycol)) + geom_point()

Works:

mydat <- data.frame(`A`=1:5, `B`=1:5)
xcol <- "A"
ycol <- "B"
ggplot(data=mydat, aes_string(x=xcol, y=ycol)) + geom_point()

Also works:

mydat <- data.frame(`Col 1`=1:5, `Col 2`=1:5, check.names=FALSE)
ggplot(data=mydat, aes(x=`Col 1`, y=`Col 2`)) + geom_point()

What's the issue?

The docs for aes_string show that (1) weirdly named columns don't always work well (see the second-to-last set of examples), and (2) aes_string and aes_ are being deprecated in favor of tidyeval.
– camille, Commented Aug 2, 2018 at 16:49
Also, it's interesting to see answers to this post, since a few are from before tidyeval was implemented in ggplot, and a few are from post-implementation.
– camille, Commented Aug 2, 2018 at 19:35

MrFlick · Accepted Answer · 2021-10-22 17:43:13Z

11

UPDATE: Note that in more recent versions of ggplot2, the use of aes_string is discouraged. Instead, if you need to get a column value from a string, use the .data pronoun:

ggplot(data=mydat, aes(x=.data[[xcol]], y=.data[[ycol]])) + geom_point()

ORIGINAL ANSWER: Values passed to aes_string are parse()-d, because you can pass things like aes_string(x="log(price)"), where the string is an expression rather than a column name. Your string is therefore treated as an expression, and when it is parsed, the space makes it an invalid expression. You can "fix" this by wrapping column names in quotes. For example, this works:

mydat <- data.frame(`Col 1`=1:5, `Col 2`=1:5, check.names=FALSE)
xcol <- "Col 1"
ycol <- "Col 2"
ggplot(data=mydat, aes_string(x=shQuote(xcol), y=shQuote(ycol))) + geom_point()

We just use shQuote() to put quotes around our values. You could also have embedded the backticks in your string, as you did in the other example:

mydat <- data.frame(`Col 1`=1:5, `Col 2`=1:5, check.names=FALSE)
xcol <- "`Col 1`"
ycol <- "`Col 2`"
ggplot(data=mydat, aes_string(x=xcol, y=ycol)) + geom_point()
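The parsing behavior described above can be checked in base R, without ggplot2. The following is a minimal sketch; the helper name parses_ok is made up for illustration. It shows that a bare name with a space fails to parse, that the backtick-wrapped form parses to a symbol, and that the shQuote() form parses to a character constant rather than a symbol. It may therefore be worth confirming that the shQuote() plot shows the column's values and not a single constant.

```r
# Hypothetical helper: TRUE if `txt` parses as R code, FALSE otherwise.
parses_ok <- function(txt) {
  !inherits(try(parse(text = txt), silent = TRUE), "try-error")
}

parses_ok("Col 1")        # FALSE: the space makes this an invalid expression
parses_ok("log(price)")   # TRUE: an ordinary expression
parses_ok("`Col 1`")      # TRUE: backticks make it a single name

# What each parseable form turns into:
backticked <- parse(text = "`Col 1`")[[1]]
quoted     <- parse(text = shQuote("Col 1"))[[1]]

is.symbol(backticked)     # TRUE: refers to the column named "Col 1"
is.character(quoted)      # TRUE: the literal string "Col 1", not a column
```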

But the real best way to deal with this is to not use column names that are not valid variable names.

edited Oct 22, 2021 at 17:43

answered Aug 2, 2018 at 17:52

MrFlick


7 Comments

thc Over a year ago

Thanks, time to re-familiarize myself with tidyverse =/

MrFlick Over a year ago

Well, this isn't the "tidyverse" way to do things any more. This is the legacy ggplot way. With modern tidyverse programming you would use quosures (or expr() or sym()) and expand those into aes(). Still doesn't really help with column names with spaces though. Those are just evil.

thc Over a year ago

I disagree that names with spaces are "not valid variable names". For example, you can do this: `x 2` <- 1 or use an explicit assign to the global environment without issue.

MrFlick Over a year ago

@thc Ok. I guess I meant variable names that don’t require being surrounded by quotes. You can never use that name without also typing the quotes. And most people go to great lengths just to avoid a few extra characters (ala non-standard evaluation)

MrFlick Over a year ago

@thc I mean you can also do `4$.^` <- 3, but I would be reluctant to call that a valid variable name. The ticks really let you circumvent the normal variable name rules.


camille · Answer · 2018-08-02 19:57:19Z

4

Here's a tidyeval approach, which is what the tidyverse development crew is moving towards in place of aes_ or aes_string. Tidyeval is tricky at first, but pretty well documented.

This recipe sheet isn't ggplot-specific, but it's on my bookmarks toolbar because it's pretty handy.

In this case, you want to write a function to handle making your plot. This function takes a data frame and two bare column names as arguments. Then you turn the column names into quosures with enquo, then !! unquotes them for use in aes.

library(ggplot2)

mydat <- data.frame(`Col 1`=1:5, `Col 2`=1:5, check.names=FALSE)

pts <- function(data, xcol, ycol) {
  x_var <- enquo(xcol)
  y_var <- enquo(ycol)
  ggplot(data, aes(x = !!x_var, y = !!y_var)) +
    geom_point()
}

pts(mydat, `Col 1`, `Col 2`)

But also like @MrFlick said, do whatever you can to just use valid column names, because why not?
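For reference, rlang 0.4.0 and later added the {{ }} ("curly-curly") shorthand, which combines the enquo() and !! steps. Here is a minimal sketch of the same function using it, assuming ggplot2 3.0.0 or later and rlang 0.4.0 or later are installed; pts2 is a made-up name:

```r
library(ggplot2)

mydat <- data.frame(`Col 1` = 1:5, `Col 2` = 1:5, check.names = FALSE)

# Same idea as pts() above: bare column names go in,
# and {{ }} forwards them to aes().
pts2 <- function(data, xcol, ycol) {
  ggplot(data, aes(x = {{ xcol }}, y = {{ ycol }})) +
    geom_point()
}

# Build the plot object; print(p) draws it.
p <- pts2(mydat, `Col 1`, `Col 2`)
```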

answered Aug 2, 2018 at 19:57

camille


3 Comments

thc Over a year ago

Thanks. It's mostly because readxl preserves spaces, and keeping them saves having to re-label the axes.

camille Over a year ago

But adding a labs line to your plot is probably easier than having to write a whole tidyeval wrapper function, no?

Dario Over a year ago

It's still useful to be able to do this because you might be writing a plotting function that end-users of your R package will use and it's nice to automatically set axis labels on the plot which do not look like programmer variable names and saves your end-user from having to manually add nice labs themselves.
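The package-author scenario in the last comment could be sketched as follows. This is only an illustration, assuming a ggplot2 version that supports the .data pronoun (3.0.0 or later); plot_cols is a made-up function name. It takes column names as strings, so names with spaces need no quoting tricks, and it reuses the same strings as axis labels.

```r
library(ggplot2)

mydat <- data.frame(`Col 1` = 1:5, `Col 2` = 1:5, check.names = FALSE)

# Hypothetical wrapper: `xcol` and `ycol` are column names given as strings.
plot_cols <- function(data, xcol, ycol) {
  ggplot(data, aes(x = .data[[xcol]], y = .data[[ycol]])) +
    geom_point() +
    labs(x = xcol, y = ycol)  # label the axes with the original column names
}

# Build the plot object; print(p) draws it.
p <- plot_cols(mydat, "Col 1", "Col 2")
```

Setting the labels explicitly avoids relying on how a given ggplot2 version derives default axis titles from a .data[[...]] expression.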

TQCH · Answer · 2021-01-28 23:16:23Z

3

To whom it may still concern: if the column name happens to contain a space or math symbols like >, <, or =, one easy workaround is to wrap your string in as.name() when passing it to aes_string().
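A minimal sketch of this workaround, assuming an installed ggplot2 that still exports aes_string() (it is deprecated and may emit a warning in newer releases). as.name() converts the string to a symbol, so, as far as I can tell, aes_string() has nothing left to parse:

```r
library(ggplot2)

mydat <- data.frame(`Col 1` = 1:5, `Col 2` = 1:5, check.names = FALSE)
xcol <- "Col 1"
ycol <- "Col 2"

# as.name() builds a symbol directly, so the space never reaches the parser.
x_sym <- as.name(xcol)
is.symbol(x_sym)  # TRUE

# Build the plot object; print(p) draws it.
p <- ggplot(data = mydat, aes_string(x = as.name(xcol), y = as.name(ycol))) +
  geom_point()
```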

edited Jan 28, 2021 at 23:16

answered Oct 19, 2020 at 2:29

TQCH
