1

I'm struggling with ggplot2 and have been looking for a solution online for several hours. Maybe one of you can help? I have a data set that looks like this (several hundred observations):

Y-AXIS X-AXIS SUBJECT
2.2796598 F1 1
0.9118639 F1 2
2.7111228 F3 3
2.7111228 F2 4
2.2796598 F4 5
2.3876401 F10 6
.... ... ...

Y-AXIS is a continuous value larger than 0 (the upper limit can vary from data set to data set, but is typically < 100). X-AXIS is a categorical variable with 10 levels. SUBJECT refers to an individual and, across the entire data set, each individual has exactly 10 observations, exactly 1 for each level of the categorical variable.

To generate a box plot, I used ggplot like this:

plot1 <- ggplot(longdata,
         aes(x = X_axis, y = Y_axis)) +
         geom_boxplot() + 
         ylim(0, 12.5) +
         stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "purple")

That results in the boxplot I have in mind. You can check out the result here if you like: boxplot

So far so good. What I want to do next, and hopefully someone can help me, is this: for one specific SUBJECT, I want to plot a line through their 10 scores in the same figure, on top of the boxplot. An example of what I have in mind can be found here: boxplot with data of one subject as a line. In this case, I simply assumed that the outliers belong to the same case; this is just an assumption. The data of an individual case can also look like this: boxplot with data of a second subject as a line

Additional tips on how to customize that line (colour, thickness, etc.) would also be appreciated. Many thanks!

4
  • what have you tried so far? Commented Oct 14, 2022 at 10:08
  • How do you know which subject you need to highlight? Have you looked at a line plot of all subjects? As in `ggplot(longdata, aes(x=X_axis, y=Y_axis, color=Subject)) + geom_line() + ylim(...)` Commented Oct 14, 2022 at 10:12
  • @gira, you can think of the to-be-plotted line as the (10) scores of an individual and the boxplot summarizes the scores of the sample to which the selected individual belongs. Ultimately, it would be great if selection of the relevant subject data is just a simple parameter. I just tried your suggestion and it doesn't do what I have in mind, but thanks! Commented Oct 14, 2022 at 10:24
  • @PaulStaffordAllen. For example, I tried to generate a second plot with the same dimensions using a reduced data set that included the data of just 1 subject. It's very easy to make ggplot2 draw the correct line then (for that specific subject). I then tried to overlay these two plots, but without any success. Commented Oct 14, 2022 at 10:29

2 Answers

1
library(ggplot2)

It is always a good idea to include a reproducible example of your data. If you can't share the real data, you can simulate what you need:

set.seed(123)
simulated_data <- data.frame(
  subject = rep(1:10, each = 10),
  xaxis = rep(paste0('F', 1:10), times = 10),
  yaxis = runif(100, 0, 100)
)

In ggplot2, each geom can take its own data argument. For your line, just use a subset of your original data, limited to the desired subject.

Colors and other visual elements of the line are simple to set, as the call below shows:

ggplot() +
  geom_boxplot(data = simulated_data, aes(xaxis, yaxis)) +
  geom_line(
    data = simulated_data[simulated_data$subject == 1,], 
    aes(xaxis, yaxis),
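    color = 'red',  # line colour
    linetype = 2,   # dashed line
    size = 1,       # thickness; ggplot2 >= 3.4.0 prefers linewidth = 1
    group = 1       # connect all points as one line (x is discrete)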
  )

Created on 2022-10-14 with reprex v2.0.2
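Since the question mentions wanting the subject to be "just a simple parameter", the same idea can be wrapped in a small function. This is only a sketch: `plot_subject()` and its arguments `subject_id` and `line_colour` are names made up for illustration (they are not part of ggplot2), and it assumes the column names of the simulated data above.

```r
library(ggplot2)

set.seed(123)
simulated_data <- data.frame(
  subject = rep(1:10, each = 10),
  xaxis = rep(paste0("F", 1:10), times = 10),
  yaxis = runif(100, 0, 100)
)

# Hypothetical helper (not a ggplot2 function): boxplot of all subjects,
# with the scores of one chosen subject overlaid as a line.
plot_subject <- function(data, subject_id, line_colour = "red") {
  # Keep only the rows that belong to the chosen subject
  one_subject <- data[data$subject == subject_id, ]

  ggplot(data, aes(xaxis, yaxis)) +
    geom_boxplot() +
    geom_line(
      data = one_subject,
      aes(group = 1),        # x is discrete, so group the points explicitly
      colour = line_colour
    )
}

# Boxplot of everyone, with subject 3 drawn on top in blue
p <- plot_subject(simulated_data, subject_id = 3, line_colour = "blue")
```

Printing `p` draws the figure; switching to another subject is then just a matter of changing `subject_id`.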


2 Comments

This is great! Many thanks. That works like a beauty. I'm a novice, so I'm not allowed to cast a vote, but I truly appreciate your help. Have a nice day!
@AdriaanSpruyt - you can click the "tick" on Johan Rosa's post to mark it as your "accepted answer". This means anyone searching a similar problem may see that yours has an answer on it they could come and check out.
1
library(ggplot2)
library(dplyr)

# Simulate some data absent a reproducible example
set.seed(42)  # arbitrary seed so the simulated values are repeatable
testData <- data.frame(
  y = runif(300,0,100),
  x = as.factor(paste0("F",rep(1:10,times=30))),
  SUBJECT = as.factor(rep(1:30, each = 10))
)

# Copy your plot with my own data + ylimits
plot1 <- ggplot(testData,
                aes(x = x, y = y)) +
  geom_boxplot() + 
  ylim(0, 100) +
  stat_summary(fun = "mean", geom = "point", shape = 2, size = 3, color = "purple")

# add the geom_line for subject 1
plot1 +
  geom_line(data = filter(testData, SUBJECT == 1),
             mapping = aes(x=x, y=y, group = SUBJECT))

My answer is very similar to Johan Rosa's but his doesn't use additional packages and makes the aesthetic options for the geom_line much more apparent - I'd follow his example if I were you!
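One thing to watch with either answer: labels such as F1 to F10 are sorted alphabetically by default (F1, F10, F2, ...), so both the boxes and the line follow that order. If the numeric order matters, a possible fix is to set the factor levels explicitly before plotting. The sketch below reuses the simulated column names from the first answer; it is an illustration, not the asker's real data.

```r
library(ggplot2)

set.seed(123)
simulated_data <- data.frame(
  subject = rep(1:10, each = 10),
  xaxis = rep(paste0("F", 1:10), times = 10),
  yaxis = runif(100, 0, 100)
)

# Set the level order explicitly; the default would be alphabetical
# (F1, F10, F2, ...), which puts F10 right after F1 on the x axis.
simulated_data$xaxis <- factor(
  simulated_data$xaxis,
  levels = paste0("F", 1:10)
)

p_ordered <- ggplot(simulated_data, aes(xaxis, yaxis)) +
  geom_boxplot() +
  geom_line(
    data = simulated_data[simulated_data$subject == 1, ],
    aes(group = 1),  # x is discrete, so group the points explicitly
    colour = "red"
  )
```

Subsetting keeps the factor levels, so the line for the selected subject follows the same F1 to F10 order as the boxes.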
