I have a large dataset with many dependent variables but only two independent variables (which I will use over and over again to sort the dependent variables). Each dependent variable was measured twice, once before and once after treatment. I would like to write a function that produces a graph for any one of these dependent variables, taking as its arguments the two column names of whichever dependent variable I wish to graph.

I have generated a toy dataset to illustrate my problem. 't1DV1' and 't2DV1' are the pre- and post-treatment scores for dependent variable 1. 't1DV2' and 't2DV2' are the pre- and post-treatment scores for dependent variable 2. 'group' is the independent variable.

group <- factor(rep(c("A", "B"), 10))
t1DV1 <- runif(20, min = 0, max = 10)
t2DV1 <- runif(20, min = 0, max = 10)
t1DV2 <- runif(20, min = 0, max = 10)
t2DV2 <- runif(20, min = 0, max = 10)

df <- data.frame(group, t1DV1, t2DV1, t1DV2, t2DV2)

df

I tried writing the following function:

DVGraph <- function (DV1, DV2) { 

require(tidyr)

dfLong <- gather(df, prePost, Score, DV1:DV1)

require(ggplot2)

barGraph <- ggplot(dfLong, aes(group, Score, fill = prePost)) + 
  geom_bar(stat = "identity", position = "dodge", size = 0.5) +
  scale_fill_manual(values = c("#999999", "#666666")) +
  xlab("") +
  ylab("Scores") +
  theme_bw()

return(barGraph)

}

I then tried calling it with the first pair of repeated-measures variables (I could equally have used the second pair, i.e. t1DV2 and t2DV2):

DVGraph(t1DV1, t2DV1)

But I get an error.

I also tried quoting the column names, like so:

DVGraph("t1DV1", "t2DV1")

But I got another (different) error.

Does anyone know how I might go about this?

1 Answer

Alter your gather call to the following:

dfLong <- gather(df, prePost, Score, DV1, DV2)

Then when you call your function, use the column numbers instead of the column names:

DVGraph(2, 3)

Alternatively, you can replace gather() with melt() from reshape2 and capture the arguments with substitute(), which lets you call the function with unquoted column names:

DVGraph <- function (DV1, DV2) { 

  require(reshape2)
  require(ggplot2)

  # Capture the unquoted arguments and convert them to column-name strings
  measureCols <- c(as.character(substitute(DV1)), as.character(substitute(DV2)))

  # Reshape to long format: one row per pre/post score
  dfLong <- melt(df, measure.vars = measureCols,
                 variable.name = "prePost", value.name = "Score")

  barGraph <- ggplot(dfLong, aes(group, Score, fill = prePost)) + 
    geom_bar(stat = "identity", position = "dodge", size = 0.5) +
    scale_fill_manual(values = c("#999999", "#666666")) +
    xlab("") +
    ylab("Scores") +
    theme_bw()

  return(barGraph)

}

DVGraph(t1DV1, t2DV1)
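
If you would rather pass the column names as quoted strings and avoid substitute() altogether, the reshaping step can also be done in base R. This is only a sketch under a few assumptions: `toLong` is a made-up helper name, the data frame is passed in as an argument instead of being read from the global environment, and `toy` is a small stand-in for the question's df.

```r
# Hypothetical helper (not from the original post): stack two columns into
# long format using base R only. `data`, `pre`, and `post` are illustrative names.
toLong <- function(data, pre, post) {
  # Fail early if any of the needed columns is missing
  stopifnot(all(c("group", pre, post) %in% names(data)))
  data.frame(
    group   = rep(data$group, times = 2),
    prePost = factor(rep(c(pre, post), each = nrow(data)), levels = c(pre, post)),
    Score   = c(data[[pre]], data[[post]])
  )
}

# Toy data shaped like the question's data frame
set.seed(1)
toy <- data.frame(
  group = factor(rep(c("A", "B"), 10)),
  t1DV1 = runif(20, min = 0, max = 10),
  t2DV1 = runif(20, min = 0, max = 10)
)

dfLong <- toLong(toy, "t1DV1", "t2DV1")
str(dfLong)
```

The resulting dfLong has the same group, prePost, and Score columns that the ggplot() call above expects, so the plotting code should not need to change.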

Update:

If you want to do what you asked about in your comment, the quick fix is to recognize that substitute() returns a symbol rather than a string, so c() combines "group" with the two symbols into a list, which cannot be used to index columns. Converting each symbol with as.character(substitute()) gives a character vector instead:

createFrame <- function (DV1, DV2) { 
  extractCols <- c("group", as.character(substitute(DV1)), as.character(substitute(DV2)))
  newFrame <- df[,extractCols]
  return(newFrame) 
}

createFrame(t1DV1, t2DV1) 
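
To see the difference for yourself, here is a small check in base R. The helper names `showRaw` and `showChar` are invented for this illustration only; they are not part of any package.

```r
# Hypothetical helpers, for illustration only
showRaw  <- function(x) substitute(x)                # returns the argument as a symbol
showChar <- function(x) as.character(substitute(x))  # returns the argument as a string

class(showRaw(t1DV1))    # "name"
class(showChar(t1DV1))   # "character"

# A string and symbols cannot share an atomic vector, so c() falls back to a list
mixed <- c("group", showRaw(t1DV1), showRaw(t2DV1))
class(mixed)             # "list"

# With everything converted to strings, c() yields a plain character vector
allChar <- c("group", showChar(t1DV1), showChar(t2DV1))
class(allChar)           # "character"
```

Because substitute() never evaluates its argument, these calls work even if no object named t1DV1 exists in your workspace.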

2 Comments

Thanks Sam! Worked a treat.
Sam, I have a lot of variables in my dataset. A LOT. So in addition to the above, I really need a way of creating a reduced version of my data frame within the function. Using the substitute() approach you showed me, I tried this: createFrame <- function (DV1, DV2) { extractCols <- c("group", substitute(DV1), substitute(DV2)); newFrame <- df[,extractCols]; return(newFrame) } createFrame(t1DV1, t2DV1) But this does not work. Any idea how I might do this? Or is this perhaps a separate question for the boards?
