ggplot: lineplot of means of two groups

Question

I have searched and searched in the stacks for an answer to my question; this one approaches my question but I have been unsuccessful in modifying the code to fix my graph.

I have data, reshaped in long format, that looks like this:

ID          Var1      GenePosition   ContinuousOutcomeVar
1           control      X20068492 0.092813611
2           control      X20068492 0.001746708
3           case         X20068492 0.069251157
4           case         X20068492 0.003639304

Each ID has one value for ContinuousOutcomeVar per position, and there are 86 positions and 10 IDs. I want to plot a line graph with position on the x axis and the continuous outcome variable on the y axis. I want two groups: a case group and control group, so there should be two dots for every position: one is the mean value for cases, and one is the mean value for controls. Then I want a line that connects the cases, and a line that connects the controls. I know this is easy, but I'm new to R - I've been working at it for 8 hours and I can't quite get it right. Below is what I have; I'd really appreciate some insight. If this exists somewhere in the stacks, I really apologize...I honestly looked all over and tried modifying a lot of code but still haven't gotten it right.

My code: This code plots all the values for all IDs at each position, and connects them for the two groups. It gives me a black dot at the mean of all 10 values per position (I think):

lineplot <- ggplot(data=seq.long, aes(x=Position, y=PMethyl, 
    group=CACO, colour=CACO)) +
    stat_summary (fun.y=mean, geom="point", aes(group=1), color="black") +      
    geom_line() + geom_point()

I can't get R to not plot all 10 points; just two means (one per case/control group) per position, with cases' & controls' values each connected by a line across the x axis.

Didzis Elferts · Accepted Answer · 2013-03-04 08:31:51Z

3

First, I adjusted your original sample data to contain more than one unique GenePosition.

dput(seq.long)
structure(list(ID = 1:8, Var1 = structure(c(2L, 2L, 1L, 1L, 2L, 
2L, 1L, 1L), .Label = c("case", "control"), class = "factor"), 
    GenePosition = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L
    ), .Label = c("X20068492", "X20068493"), class = "factor"), 
    ContinuousOutcomeVar = c(0.092813611, 0.001746708, 0.069251157, 
    0.003639304, 0.112813611, 0.002746708, 0.089251157, 0.004639304
    )), .Names = c("ID", "Var1", "GenePosition", "ContinuousOutcomeVar"
), class = "data.frame", row.names = c(NA, -8L))

If you just want to show one value for each combination of GenePosition and Var1, it is easier to calculate the mean values before plotting. You can do that with the ddply() function from the plyr package.

library(plyr)    
seq.long.sum<-ddply(seq.long,.(Var1,GenePosition),
       summarize, value = mean(ContinuousOutcomeVar))
seq.long.sum
     Var1 GenePosition      value
1    case    X20068492 0.03644523
2    case    X20068493 0.04694523
3 control    X20068492 0.04728016
4 control    X20068493 0.05778016

Now, with this new data frame, you only have to supply the x and y values. Var1 should be used in both colour= and group= so that each group gets a different color and the points within each group are connected by a line.

library(ggplot2)
ggplot(seq.long.sum,aes(x=GenePosition,y=value,colour=Var1,group=Var1))+
   geom_point()+geom_line()
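If plyr is not available, the same pre-summary step can be sketched with base R's aggregate(). This is only an alternative under the assumption that the data look like the sample above; the data frame is rebuilt here with data.frame() so the snippet runs on its own.

```r
# Same sample data as the dput() output above, rebuilt with data.frame()
seq.long <- data.frame(
  ID = 1:8,
  Var1 = factor(rep(c("control", "control", "case", "case"), times = 2)),
  GenePosition = factor(rep(c("X20068492", "X20068493"), each = 4)),
  ContinuousOutcomeVar = c(0.092813611, 0.001746708, 0.069251157, 0.003639304,
                           0.112813611, 0.002746708, 0.089251157, 0.004639304)
)

# One mean per Var1 x GenePosition combination, using only base R
seq.long.sum <- aggregate(ContinuousOutcomeVar ~ Var1 + GenePosition,
                          data = seq.long, FUN = mean)

# Rename the summarised column so it matches the ddply() result
names(seq.long.sum)[3] <- "value"
seq.long.sum
```

The resulting seq.long.sum has the same columns as the ddply() version (possibly in a different row order), so it can be passed to the same ggplot() call.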


edited Mar 4, 2013 at 8:31

answered Mar 4, 2013 at 7:53


1 Comment

Jess Over a year ago

Thanks for teaching me something new! Also learned that if I use "transform" with ddply instead of summarize it keeps all the other vars in my dataframe. I appreciate your help!
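A minimal sketch of what this comment describes, assuming the sample data from the answer above: with transform, ddply() returns every original row and column and adds the group mean as a new column. The name seq.long.all is only illustrative.

```r
library(plyr)

# Same sample data as in the answer, rebuilt with data.frame()
seq.long <- data.frame(
  ID = 1:8,
  Var1 = factor(rep(c("control", "control", "case", "case"), times = 2)),
  GenePosition = factor(rep(c("X20068492", "X20068493"), each = 4)),
  ContinuousOutcomeVar = c(0.092813611, 0.001746708, 0.069251157, 0.003639304,
                           0.112813611, 0.002746708, 0.089251157, 0.004639304)
)

# transform keeps all 8 rows and all original columns,
# and repeats the group mean on every row of its group
seq.long.all <- ddply(seq.long, .(Var1, GenePosition), transform,
                      value = mean(ContinuousOutcomeVar))
seq.long.all
```

Because the mean is repeated on every row, plotting value from this data frame with geom_point() draws overlapping points; the summarize version is still the simpler input for the line plot.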

孟泽楷 · Accepted Answer · 2024-01-05 03:10:02Z

0

1. First, create the data as Didzis Elferts provided:

data <- structure(list(ID = 1:8, Var1 = structure(c(2L, 2L, 1L, 1L, 2L, 
2L, 1L, 1L), .Label = c("case", "control"), class = "factor"), 
    GenePosition = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L
    ), .Label = c("X20068492", "X20068493"), class = "factor"), 
    ContinuousOutcomeVar = c(0.092813611, 0.001746708, 0.069251157, 
    0.003639304, 0.112813611, 0.002746708, 0.089251157, 0.004639304
    )), .Names = c("ID", "Var1", "GenePosition", "ContinuousOutcomeVar"
), class = "data.frame", row.names = c(NA, -8L))

2. Then create the plot with the code below:

library(ggplot2)  # the fun= argument requires ggplot2 3.3.0 or later
ggplot(data,aes(x=GenePosition,y=ContinuousOutcomeVar,color=Var1,group=Var1))+
    stat_summary(fun = 'mean',geom = 'point')+
    stat_summary(fun = 'mean',geom = 'line')
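To check that stat_summary() really draws one mean per group and position, the computed layer data can be compared with means calculated separately. This is a sketch using ggplot2's layer_data() helper; the names p, plotted and expected are only illustrative.

```r
library(ggplot2)

# Same sample data as above, rebuilt with data.frame()
data <- data.frame(
  ID = 1:8,
  Var1 = factor(rep(c("control", "control", "case", "case"), times = 2)),
  GenePosition = factor(rep(c("X20068492", "X20068493"), each = 4)),
  ContinuousOutcomeVar = c(0.092813611, 0.001746708, 0.069251157, 0.003639304,
                           0.112813611, 0.002746708, 0.089251157, 0.004639304)
)

p <- ggplot(data, aes(x = GenePosition, y = ContinuousOutcomeVar,
                      color = Var1, group = Var1)) +
  stat_summary(fun = "mean", geom = "point") +
  stat_summary(fun = "mean", geom = "line")

# Values that the first layer (the points) will actually draw
plotted <- layer_data(p, 1)

# Group means computed independently with base R
expected <- tapply(data$ContinuousOutcomeVar,
                   list(data$Var1, data$GenePosition), mean)

# Compare the two sets of means, ignoring order
all.equal(sort(as.numeric(plotted$y)), sort(as.vector(expected)))
```

The points layer should contain four rows, one per combination of Var1 and GenePosition, rather than the eight raw observations.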

edited Jan 5, 2024 at 3:10

answered Jan 4, 2024 at 11:54


2 Comments

Leon Samson Over a year ago

Hi 孟泽楷, it's probably better to create the summary stats separately as suggested here, because then you don't have to calculate the values twice (once for the points and once for the lines). Regardless of that, your answer is missing the summary function within stat_summary(). I would suggest adding it and sharing an output figure as well, thanks!

孟泽楷 Over a year ago

Hi Samson, I appreciate your kind advice. I think the code I provided is another solution, one that needs no pre-summary step. As for the missing argument you mentioned, leaving it at its default would be fine for this question. However, I have edited the answer to specify mean explicitly. The output figure is exactly the same as the accepted answer's.
