2

This is my first question on stackoverlow, please correct me if I am not following correct question protocols.

I am trying to create some graphs for data that has been collected over three time points (time 1, time 2, time 3) which equates to X1..., X2... and X3... at the beginning of column names. The graphs are also separated by the column $Group from the data frame.

I have no problem creating the graphs, I just have many variables (~170) and am wanting to compare time 1 vs time 2, time 2 vs time 3, etc. so am trying to work a shortcut to be running this kind of code rather than having to type out each one individually.

As indicated above, I have created variable names like X1... X2... which indicate the time that the variable was recorded i.e. X1BCSTCAT = time 1; X2BCSTCAT = time 2; X3BCSTCAT = time 3. Here is a small sample of what my data looks like:

df <- structure(list(ID = structure(1:6, .Label = c("101","102","103","118","119","120"), class = "factor"), 
                   Group = structure(c(1L,1L,1L,2L,2L,2L), .Label = c("C8","TC"), class = "factor"), 
                   Wave = structure(c(1L, 2L, 3L, 4L, 1L, 2L), .Label = c("A","B","C","D"), class = "factor"), 
                   Yr = structure(c(1L, 2L, 1L, 2L, 1L, 2L), .Label = c("3","5"), class = c("ordered", "factor")), 
                   Age.Yr. = c(10.936,10.936, 9.311, 10.881, 10.683, 11.244), 
                   Training..hr. = c(10.667,10.333, 10.667, 10.333, 10.333, 10.333), 
                   X1BCSTCAT = c(-0.156,0.637,-1.133,0.637,2.189,1.229), 
                   X1BCSTCR = c(0.484,0.192, -1.309, 0.912, 1.902, 0.484), 
                   X1BCSTPR = c(-1.773,0.859, 0.859, 0.12, -1.111, 0.12), 
                   X2BCSTCAT = c(1.006, -0.379,-1.902, 0.444, 2.074, 1.006), 
                   X2BCSTCR = c(0.405, -0.457,-1.622, 1.368, 1.981, 0.168), 
                   X2BCSTPR = c(-0.511, -0.036,2.189, -0.036, -0.894, 0.949),
                   X3BCSTCAT = c(1.18, -1.399,-1.399, 1.18, 1.18, 1.18), 
                   X3BCSTCR = c(0.967, -1.622, -1.622,0.967, 0.967, 1.255), 
                   X3BCSTPR = c(-1.282, -1.282, 1.539,1.539, 0.792, 0.792)), 
              row.names = c(1L, 2L, 3L, 4L, 5L,8L), class = "data.frame")

Here is some working code to create one graph using ggplot for time 1 vs time 2 data on one variable:

library(ggplot2)

p <- ggplot(df, aes(x=df$X1BCSTCAT, y=df$X2BCSTCAT, shape = df$Group, color = df$Group)) + 
  geom_point() + geom_smooth(method=lm, aes(fill=df$Group), fullrange = TRUE) + 
  labs(title="BCSTCAT", x="Time 1", y = "Time 2") + 
  scale_color_manual(name = "Group",labels = c("C8","TC"),values = c("blue", "red")) +
  scale_shape_manual(name = "Group",labels = c("C8","TC"),values = c(16, 17)) +
  scale_fill_manual(name = "Group",labels = c("C8", "TC"),values = c("light blue", "pink"))

So I am really trying to create some kind of a shortcut where R will cycle through and match up variable names X1... vs X2... and so on and create the graphs. I assume there must be some way to plot either based upon matching column numbers e.g. df[,7] vs df[,10] and iterating through this process or plotting by actually matching the names (where the only difference in variable names is the number which indicates time).

I have previously cycled through creating individual graphs using the lapply function, but have no idea where to even start with trying to do this one.

1

1 Answer 1

2

A solution using tidyeval approach. We will need ggplot2 v3.0.0 (remember to restart your R session)

install.packages("ggplot2", dependencies = TRUE)
  • First we build a function that takes column and group names as inputs. Note the use of rlang::sym, rlang::quo_name & !!.

  • Then create 2 name vectors for x- & y- values so that we can loop through them simultaneously using purrr::map2.

library(rlang)
library(tidyverse)

df <- structure(list(ID = structure(1:6, .Label = c("101","102","103","118","119","120"), class = "factor"), 
                   Group = structure(c(1L,1L,1L,2L,2L,2L), .Label = c("C8","TC"), class = "factor"), 
                   Wave = structure(c(1L, 2L, 3L, 4L, 1L, 2L), .Label = c("A","B","C","D"), class = "factor"), 
                   Yr = structure(c(1L, 2L, 1L, 2L, 1L, 2L), .Label = c("3","5"), class = c("ordered", "factor")), 
                   Age.Yr. = c(10.936,10.936, 9.311, 10.881, 10.683, 11.244), 
                   Training..hr. = c(10.667,10.333, 10.667, 10.333, 10.333, 10.333), 
                   X1BCSTCAT = c(-0.156,0.637,-1.133,0.637,2.189,1.229), 
                   X1BCSTCR = c(0.484,0.192, -1.309, 0.912, 1.902, 0.484), 
                   X1BCSTPR = c(-1.773,0.859, 0.859, 0.12, -1.111, 0.12), 
                   X2BCSTCAT = c(1.006, -0.379,-1.902, 0.444, 2.074, 1.006), 
                   X2BCSTCR = c(0.405, -0.457,-1.622, 1.368, 1.981, 0.168), 
                   X2BCSTPR = c(-0.511, -0.036,2.189, -0.036, -0.894, 0.949),
                   X3BCSTCAT = c(1.18, -1.399,-1.399, 1.18, 1.18, 1.18), 
                   X3BCSTCR = c(0.967, -1.622, -1.622,0.967, 0.967, 1.255), 
                   X3BCSTPR = c(-1.282, -1.282, 1.539,1.539, 0.792, 0.792)), 
              row.names = c(1L, 2L, 3L, 4L, 5L,8L), class = "data.frame")

# define a function that accept strings as input
pair_plot <- function(x_var, y_var, group_var) {

  # convert strings to symbols
  x_var <- rlang::sym(x_var)
  y_var <- rlang::sym(y_var)
  group_var <- rlang::sym(group_var)

  # unquote symbols using !! 
  ggplot(df, aes(x = !! x_var, y = !! y_var, shape = !! group_var, color = !! group_var)) + 
    geom_point() + geom_smooth(method = lm, aes(fill = !! group_var), fullrange = TRUE) + 
    labs(title = "BCSTCAT", x = rlang::quo_name(x_var), y = rlang::quo_name(y_var)) +
    scale_color_manual(name = "Group", labels = c("C8", "TC"), values = c("blue", "red")) +
    scale_shape_manual(name = "Group", labels = c("C8", "TC"), values = c(16, 17)) +
    scale_fill_manual(name = "Group",  labels = c("C8", "TC"), values = c("light blue", "pink")) +
    theme_bw()
}

# Test if the new function works
pair_plot("X1BCSTCAT", "X2BCSTCAT", "Group")

# Create 2 parallel lists 
list_x <- colnames(df)[-c(1:6, (ncol(df)-2):(ncol(df)))]
list_x
#> [1] "X1BCSTCAT" "X1BCSTCR"  "X1BCSTPR"  "X2BCSTCAT" "X2BCSTCR"  "X2BCSTPR"

list_y <- lead(colnames(df)[-(1:6)], 3)[1:length(list_x)]
list_y
#> [1] "X2BCSTCAT" "X2BCSTCR"  "X2BCSTPR"  "X3BCSTCAT" "X3BCSTCR"  "X3BCSTPR"

# Loop through 2 lists simultaneously 
# Supply inputs to pair_plot function using purrr::map2
map2(list_x, list_y, ~ pair_plot(.x, .y, "Group"))

Sample outputs:

#> [[1]]

#> 
#> [[2]]

Created on 2018-05-24 by the reprex package (v0.2.0).

Sign up to request clarification or add additional context in comments.

2 Comments

Hi @Tung, thanks for your answer. I have installed ggplot via devtools but I am getting this error message when I try and run the test function for pair_plot: Error in get(Info[i, 1], envir = env) : lazy-load database '/Library/Frameworks/R.framework/Versions/3.5/Resources/library/ggplot2/R/ggplot2.rdb' is corrupt In addition: Warning message: In get(Info[i, 1], envir = env) : internal error -3 in R_decompress1
Sorry, fixed this issue. Had to remove and reinstall with dependencies and restart R session. All good. Thanks so much for your answer!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.