Making ggplot2's "geom_point" variable depending on certain conditions

Question

I have an R script that generates plots from the run-time data of a simulation. However, errors sometimes occur during the runs. They leave null run-time values, which make the plots suggest that the run time was shorter than it really was.

Here's an example of what the data in the "data" data frame might look like:

| Version | TotalMean | TestNum |  Case |
|:-------:|:---------:|:-------:|:-----:|
| 1.0.1   |       350 |       1 | Case1 |
| 1.0.2   |       430 |       2 | Case1 |
| 1.0.4   |       470 |       3 | Case1 |
| 1.0.7   |       445 |       4 | Case1 |
| 1.0.1   |       320 |       1 | Case2 |
| 1.0.2   |       280 |       2 | Case2 |
| 1.0.4   |       450 |       3 | Case2 |
| 1.0.7   |       420 |       4 | Case2 |
| 1.0.1   |       335 |       1 | Case3 |
| 1.0.2   |       415 |       2 | Case3 |
| 1.0.4   |       465 |       3 | Case3 |
| 1.0.7   |       430 |       4 | Case3 |
| 1.0.1   |       310 |       1 | Case4 |
| 1.0.2   |       375 |       2 | Case4 |
| 1.0.4   |       425 |       3 | Case4 |
| 1.0.7   |       410 |       4 | Case4 |

Note that the table contains no null values, because the way the TotalMean column is calculated hides them. However, the data frame that TotalMean is calculated from does contain nulls. Is there any way to make geom_point depend on whether there are null values in that underlying table, for example by changing the shape and size of the point?

Use the code below to create a working example. Version 1.0.2 in Case2 has an anomalous value because it had null values in the original table.

```r
library(ggplot2)

# Example summary data: one mean run time per Version and Case
Version <- c("1.0.1","1.0.2","1.0.4","1.0.7","1.0.1","1.0.2","1.0.4","1.0.7","1.0.1","1.0.2","1.0.4","1.0.7","1.0.1","1.0.2","1.0.4","1.0.7")
TotalMean <- c(350,430,470,445,320,280,450,420,335,415,465,430,310,375,425,410)
TestNum <- c(1,2,3,4,1,2,3,4,1,2,3,4,1,2,3,4)
Case <- c("Case1","Case1","Case1","Case1","Case2","Case2","Case2","Case2","Case3","Case3","Case3","Case3","Case4","Case4","Case4","Case4")
data <- data.frame(Version, TotalMean, TestNum, Case)
# Order the Version levels by test number so the x-axis follows release order
versions <- unique(data[order(data$TestNum), "Version"])
data$Version <- factor(data$Version, levels = versions)
```

Here's the ggplot2 code I use to create the chart:

```r
g <- ggplot(data, aes(color = Case, x = Version, y = TotalMean, group = Case)) +
    geom_line() + geom_point(shape = 16, size = 2) + coord_cartesian(ylim = c(0, 550)) +
    labs(x = "Version", y = "Run Time (minutes)") +
    # `fun` replaced `fun.y` in ggplot2 3.3.0; older versions need fun.y = sum
    stat_summary(fun = sum, geom = "line") +
    theme(plot.title = element_text(face = "bold", size = 16, vjust = 1.5)) +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
    theme(axis.title.y = element_text(vjust = 1))
g
```

There's a way. You need to do it when you pre-process the data, i.e. when you are calculating TotalMean. – M--, Commented Jun 9, 2017 at 19:45

You should make a column with any(is.null(x)) and set the shape in ggplot according to that column. – M--, Commented Jun 9, 2017 at 19:48

@Masoud I can use any(is.null(x)), but how would I set the shape in ggplot according to the column that results from that? – Neal, Commented Jun 9, 2017 at 19:59

Change geom_point(shape = 16, size = 2) to geom_point(shape = IsNullColumn, size = 2). Let's make that column a numeric one instead of Boolean. – M--, Commented Jun 9, 2017 at 20:54

@Masoud I don't suppose you might want to write this out as an answer so I could visualize it a little better? For one thing, how can I still control the shape of geom_point if shape is set to IsNullColumn? How does all of this exactly communicate together? If you're changing the column to a numeric (I assume 1 for TRUE and 0 for FALSE), how does shape = IsNullColumn work anymore? – Neal, Commented Jun 14, 2017 at 13:33
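Putting the two comment suggestions together, here is a minimal sketch that assumes the nulls reach R as `NA` (which is how a data frame stores missing values). `is.null()` asks whether a whole object is `NULL`, so `any(is.null(x))` stays `FALSE` for a vector that merely contains `NA`; `is.na()` is the element-wise test to use instead. The flag column must then be mapped inside `aes()`: written as `geom_point(shape = IsNullColumn)`, the name is looked up as an ordinary R object, not as a column. A 0/1 numeric flag will not work either, because ggplot2 refuses to map a continuous variable to shape; a logical, character or factor column does, and `scale_shape_manual()` then chooses the actual shapes. The vectors `first_run`/`second_run`, the data frame `df` and the column `IsNull` are hypothetical.

```r
library(ggplot2)

# Hypothetical raw runs for three versions; NA marks a run that failed
first_run  <- c(320, 560, 450)
second_run <- c(320, NA, 450)

# is.null() tests whether the whole object is NULL, so it cannot see an NA inside a vector
any(is.null(second_run))  # FALSE
# is.na() is the element-wise test for missing values
is.na(second_run)         # FALSE TRUE FALSE

df <- data.frame(
  Version   = c("1.0.1", "1.0.2", "1.0.4"),
  TotalMean = rowMeans(cbind(first_run, second_run), na.rm = TRUE),
  # Logical flag: TRUE when either run is missing (discrete, unlike a 0/1 number)
  IsNull    = is.na(first_run) | is.na(second_run)
)

p <- ggplot(df, aes(x = Version, y = TotalMean)) +
  # Mapping the flag inside aes() makes the shape depend on each row
  geom_point(aes(shape = IsNull), size = 3) +
  # The scale decides which shape each flag value gets: 16 = filled circle, 4 = cross
  scale_shape_manual(values = c("FALSE" = 16, "TRUE" = 4), name = "Has null runs")
p
```

So the column only says which group each point belongs to; the shapes themselves are still under your control through the scale.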

M-- · Accepted Answer · 2017-06-15 03:18:14Z

I made a data frame (its structure is provided at the bottom) that looks like this:

```r
#    Version First_Run Second_Run TestNum  Case
# 1    1.0.1       350        350       1 Case1
# 2    1.0.2       430        430       2 Case1
# 3    1.0.4       470        470       3 Case1
# 4    1.0.7       445        445       4 Case1
# 5    1.0.1       320        320       1 Case2
# 6    1.0.2       560         NA       2 Case2
# 7    1.0.4       450        450       3 Case2
# 8    1.0.7       420        420       4 Case2
# 9    1.0.1       335        335       1 Case3
# 10   1.0.2       415        415       2 Case3
# 11   1.0.4       465        465       3 Case3
# 12   1.0.7       430        430       4 Case3
# 13   1.0.1       310        310       1 Case4
# 14   1.0.2       375        375       2 Case4
# 15   1.0.4       425        425       3 Case4
# 16   1.0.7       410        410       4 Case4
```

Then I calculated the mean and a column for shape:

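```r
# Mean of the two runs per row; na.rm = TRUE skips a missing run instead of returning NA
data$TotalMean <- rowMeans(subset(data, select = c(First_Run, Second_Run)), na.rm = TRUE)

# "b" if either run is NA (any arithmetic involving NA gives NA), otherwise "a"
data$shapeflag <- ifelse(is.na(data$First_Run * data$Second_Run), "b", "a")
```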

Note: na.rm = TRUE drops NA values when the mean is computed, so you can use it in your own calculation to correct the mean while still keeping the shapeflag column to identify the specific runs that returned NULL. You can see that the sixth row now gets a TotalMean of 560 instead of 280.
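To make the 560 versus 280 difference concrete, here is a small check with the sixth row's values. The assumption (not stated in the question) is that 280 presumably came from counting the failed run as 0, which halves the mean. The last line shows why the `shapeflag` line above multiplies the two runs: any arithmetic with `NA` yields `NA`.

```r
# Values from the sixth row: the second run failed
first_run  <- 560
second_run <- NA

# Presumably how 280 arose: the failed run was counted as 0
(first_run + 0) / 2                          # 280
# R's default: a missing value makes the mean missing
mean(c(first_run, second_run))               # NA
# With na.rm = TRUE only the completed run is averaged
mean(c(first_run, second_run), na.rm = TRUE) # 560
# Arithmetic with NA gives NA, so the product flags the row
is.na(first_run * second_run)                # TRUE
```

`rowMeans(..., na.rm = TRUE)` applies the same rule row by row.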

This is how the data set finally looks:

```r
#    Version First_Run Second_Run TestNum  Case TotalMean shapeflag
# 1    1.0.1       350        350       1 Case1       350         a
# 2    1.0.2       430        430       2 Case1       430         a
# 3    1.0.4       470        470       3 Case1       470         a
# 4    1.0.7       445        445       4 Case1       445         a
# 5    1.0.1       320        320       1 Case2       320         a
# 6    1.0.2       560         NA       2 Case2       560         b
# 7    1.0.4       450        450       3 Case2       450         a
# 8    1.0.7       420        420       4 Case2       420         a
# 9    1.0.1       335        335       1 Case3       335         a
# 10   1.0.2       415        415       2 Case3       415         a
# 11   1.0.4       465        465       3 Case3       465         a
# 12   1.0.7       430        430       4 Case3       430         a
# 13   1.0.1       310        310       1 Case4       310         a
# 14   1.0.2       375        375       2 Case4       375         a
# 15   1.0.4       425        425       3 Case4       425         a
# 16   1.0.7       410        410       4 Case4       410         a
```
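The answer assumes each run has its own column. If the raw run times are instead stored one row per run (the question does not show the layout of the underlying table), a sketch of producing the same two columns with base R's `aggregate()` follows. The data frame `runs_long` and its columns `RunTime` and `HasNA` are hypothetical; `na.action = na.pass` keeps the `NA` rows so that the flag can see them, and `na.rm = TRUE` is passed through to `mean()`.

```r
# Hypothetical long-format raw data: one row per run
runs_long <- data.frame(
  Version = c("1.0.2", "1.0.2", "1.0.4", "1.0.4"),
  Case    = "Case2",
  RunTime = c(560, NA, 450, 450)
)

# Mean run time per Version/Case, ignoring missing runs
means <- aggregate(RunTime ~ Version + Case, data = runs_long,
                   FUN = mean, na.rm = TRUE, na.action = na.pass)
names(means)[names(means) == "RunTime"] <- "TotalMean"

# Flag groups that contain at least one missing run
flags <- aggregate(RunTime ~ Version + Case, data = runs_long,
                   FUN = function(x) any(is.na(x)), na.action = na.pass)
names(flags)[names(flags) == "RunTime"] <- "HasNA"

# Combine both summaries and derive the same "a"/"b" flag the answer uses
summary_df <- merge(means, flags, by = c("Version", "Case"))
summary_df$shapeflag <- ifelse(summary_df$HasNA, "b", "a")
summary_df
```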

Now we can set the shape based on a variable in the data frame within aes:

```r
g <- ggplot(data, aes(color = Case, x = Version, y = TotalMean, group = Case,
                      shape = shapeflag)) + # Set the shape from the flag column
  geom_line() + geom_point(size = 3) + coord_cartesian(ylim = c(0, 550)) +
  labs(x = "Version", y = "Run Time (minutes)") +
  # `fun` replaced `fun.y` in ggplot2 3.3.0; older versions need fun.y = sum
  stat_summary(fun = sum, geom = "line") +
  theme(plot.title = element_text(face = "bold", size = 16, vjust = 1.5)) +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
  theme(axis.title.y = element_text(vjust = 1)) +
  scale_shape_discrete(labels = c("norm", "null"), name = "runs") # Edit the legend
```

Printing `g` draws the plot:

```r
g
```
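The question also asked about changing the size. As a sketch going beyond the original answer, `shapeflag` can be mapped to both `shape` and `size` in `geom_point()`, with manual scales choosing the values; giving both scales the same `name` and `labels` should let ggplot2 merge them into one legend. The data frame `df` is a hypothetical cut-down stand-in for the answer's `data` (Case2 only), and the shape and size values are arbitrary choices.

```r
library(ggplot2)

# Hypothetical stand-in for the answer's data: only Case2, with the flagged 1.0.2 run
df <- data.frame(
  Version   = factor(c("1.0.1", "1.0.2", "1.0.4", "1.0.7")),
  TotalMean = c(320, 560, 450, 420),
  Case      = "Case2",
  shapeflag = c("a", "b", "a", "a")
)

p <- ggplot(df, aes(x = Version, y = TotalMean, colour = Case, group = Case)) +
  geom_line() +
  # Shape and size both follow the flag, so runs with nulls stand out
  geom_point(aes(shape = shapeflag, size = shapeflag)) +
  # 16 = filled circle, 17 = filled triangle; same name/labels on both scales
  scale_shape_manual(values = c(a = 16, b = 17),
                     labels = c("norm", "null"), name = "runs") +
  scale_size_manual(values = c(a = 2, b = 4),
                    labels = c("norm", "null"), name = "runs")
p
```

Mapping the flag only inside `geom_point()`, rather than in the global `aes()`, keeps the line layers from receiving aesthetics they do not use.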

Data:

```r
data <- structure(list(
  Version = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L),
                      .Label = c("1.0.1", "1.0.2", "1.0.4", "1.0.7"), class = "factor"),
  First_Run = c(350, 430, 470, 445, 320, 560, 450, 420, 335, 415, 465, 430, 310, 375, 425, 410),
  Second_Run = c(350, 430, 470, 445, 320, NA, 450, 420, 335, 415, 465, 430, 310, 375, 425, 410),
  TestNum = c(1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4),
  Case = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L),
                   .Label = c("Case1", "Case2", "Case3", "Case4"), class = "factor")),
  .Names = c("Version", "First_Run", "Second_Run", "TestNum", "Case"),
  row.names = c(NA, -16L), class = "data.frame")
```
