
I have a dataset with 2 columns as follows:

# loading some libraries
library( data.table )
library( ggplot2 )
library( grid )
library( gridExtra )

# generating the data
set.seed( 2017 )
dt = data.table( x = rnorm( 500 ), y = rnorm( 500, 1, 0.5 ) )

I run k-means with the kmeans() function, using 2 and 3 centers, as follows:

# cluster the data
n_k = 2:3
for ( i in seq_along( n_k ) ) {
  assign( paste0( "cl_", n_k[ i ] ), 
          kmeans( dt[ , .( x, y ) ], centers = n_k[ i ] ) )
  dt[ , ( paste0( "cl_", n_k[ i ] ) ) := 
    as.factor( get( paste0( "cl_", n_k[ i ] ) )$cluster ) ][]
}

So now I have added the columns cl_2 and cl_3 to my dataset dt. I want to use these two columns as the color mapping in two plots generated with ggplot2. So far, I build the two plots in a for loop again. Everything works except the color specification: both plots ignore column cl_2 and are colored by cl_3 only. Here is the plot generation:

# building plots
for ( i in seq_along( n_k ) ) {
  assign( paste0( "p_", n_k[ i ] ),
          ggplot( data = dt,
                  aes( x = x, y = y,
                       color = get( paste0( "cl_", n_k[ i ] ) ) ) ) +
            geom_point() +
            ggtitle( paste0( "kmeans with ", n_k[ i ], " centers" ) ) )
}

I plot these as follows:

grid.arrange( p_2, p_3, ncol = 2 )

What puzzles me is that if I build the two plots manually, everything works as expected. For instance, the following produces correct results:

p_2 = ggplot( data = dt, aes( x = x, y = y, 
                              color = get( paste0( "cl_", n_k[ 1 ] ) ) ) ) +
  geom_point()
p_3 = ggplot( data = dt, aes( x = x, y = y, 
                              color = get( paste0( "cl_", n_k[ 2 ] ) ) ) ) +
  geom_point()

Any hints on what I am doing wrong?

  • Take a look at ?aes_. This lets you use columns by names given as strings Commented Jun 29, 2017 at 9:20
  • @GregordeCillia good call, but it didn't work... Commented Jun 29, 2017 at 9:25
  • Yes my bad, I meant aes_string. aes_ also works, but the syntax is a little different. Commented Jun 29, 2017 at 9:32

1 Answer


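You can use aes_string to refer to columns through strings rather than using get. The loop version fails because aes() does not evaluate get( paste0( "cl_", n_k[ i ] ) ) right away: it stores the expression and evaluates it only when the plot is drawn. By then the loop has finished with i equal to 2, so both plots look up cl_3. Your manual version works because n_k[ 1 ] and n_k[ 2 ] do not depend on i. aes_string, on the other hand, receives the already-computed string on each iteration. It is important, though, that you then also write "x" rather than x, since aes_string expects every aesthetic as a string and "mixed types" are not allowed.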
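A minimal reproduction makes the delayed evaluation in the loop visible. This sketch swaps dt for a hypothetical six-row data frame toy whose cl_2 and cl_3 columns have 2 and 3 levels, builds the plots in the same kind of loop, and then uses ggplot2's layer_data() to count the colors each plot actually uses.

```r
library( ggplot2 )

# Hypothetical stand-in for dt: cl_2 has 2 levels, cl_3 has 3 levels
toy = data.frame( x = 1:6, y = 1:6,
                  cl_2 = factor( c( 1, 1, 1, 2, 2, 2 ) ),
                  cl_3 = factor( c( 1, 1, 2, 2, 3, 3 ) ) )
n_k = 2:3

plots = list()
for ( i in seq_along( n_k ) ) {
  # aes() keeps the unevaluated expression get( paste0( "cl_", n_k[ i ] ) )
  plots[[ i ]] = ggplot( toy, aes( x = x, y = y,
                                   color = get( paste0( "cl_", n_k[ i ] ) ) ) ) +
    geom_point()
}

# The expression is evaluated only now, when each plot is built, and i is 2,
# so both plots look up cl_3
n_colours = sapply( plots, function( p ) length( unique( layer_data( p )$colour ) ) )
print( n_colours )
```

Both entries come out as 3; if the column name had been fixed during the loop, the first plot would use only 2 colors.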

aes_() and aes_string() require you to quote the inputs explicitly, either with "" for aes_string() or with quote() or ~ for aes_() (aes_q() is an alias for aes_()). This makes aes_() and aes_string() easy to program with.
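The two quoting styles can be compared side by side. This is a sketch assuming a ggplot2 version that still ships aes_string() and aes_() (recent releases mark both as deprecated and print a warning); the toy data frame and the variable col are hypothetical.

```r
library( ggplot2 )

# Hypothetical toy data with one 2-level cluster column
toy = data.frame( x = 1:10, y = 1:10, cl_2 = factor( rep( 1:2, 5 ) ) )
col = "cl_2"

# aes_string(): every aesthetic is a string, including "x" and "y"
p_string = ggplot( toy, aes_string( x = "x", y = "y", color = col ) ) +
  geom_point()

# aes_(): aesthetics are formulas (~x) or names, here built from the string with as.name()
p_quoted = ggplot( toy, aes_( x = ~x, y = ~y, color = as.name( col ) ) ) +
  geom_point()

# Both mappings color the points by cl_2, so the built colors match
same_colours = identical( layer_data( p_string )$colour, layer_data( p_quoted )$colour )
print( same_colours )
```

Both mappings receive the column name as a value at the moment they are created, which is why they behave correctly inside a loop.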

# loading some libraries
library( data.table )
library( ggplot2 )
library( grid )
library( gridExtra )

# generating the data
set.seed( 2017 )
dt = data.table( x = rnorm( 500 ), y = rnorm( 500, 1, 0.5 ) )

# cluster the data
n_k = 2:3
for ( i in seq_along( n_k ) ) {
  assign( paste0( "cl_", n_k[ i ] ),
          kmeans( dt[ , .( x, y ) ], centers = n_k[ i ] ) )
  dt[ , ( paste0( "cl_", n_k[ i ] ) ) :=
        as.factor( get( paste0( "cl_", n_k[ i ] ) )$cluster ) ][]
}

# building plots
for ( i in seq_along( n_k ) ) {
  assign( paste0( "p_", n_k[ i ] ),
          ggplot( data = dt,
                  aes_string( x = "x", y = "y",
                        color = paste0( "cl_", n_k[ i ] ) ) ) +
            geom_point() +
            ggtitle( paste0( "kmeans with ", n_k[ i ], " centers" ) ) )
}

grid.arrange( p_2, p_3, ncol = 2 )
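In current ggplot2 releases, aes_string() is deprecated in favor of tidy evaluation inside aes(), where the .data pronoun looks up a column from a string, as in .data[[ col ]]. A sketch of the same task under that assumption, which also keeps the plots in a list instead of using assign(): because col is a local variable of the function passed to lapply(), each plot keeps its own column name. In a plain for loop, .data[[ col ]] would hit the same delayed-evaluation problem as get(), since col would only be looked up when the plot is drawn.

```r
library( data.table )
library( ggplot2 )
library( gridExtra )

# generating the data
set.seed( 2017 )
dt = data.table( x = rnorm( 500 ), y = rnorm( 500, 1, 0.5 ) )
n_k = 2:3

# add one factor column per k (cl_2, cl_3) without assign()
for ( k in n_k ) {
  col = paste0( "cl_", k )
  dt[ , ( col ) := as.factor( kmeans( .SD, centers = k )$cluster ),
      .SDcols = c( "x", "y" ) ]
}

# each lapply() call has its own `col`, so every plot keeps its own column
plots = lapply( n_k, function( k ) {
  col = paste0( "cl_", k )
  ggplot( dt, aes( x = x, y = y, color = .data[[ col ]] ) ) +
    geom_point() +
    ggtitle( paste0( "kmeans with ", k, " centers" ) )
} )

do.call( grid.arrange, c( plots, ncol = 2 ) )
```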

1 Comment

I think I did that...sorry
