My data looks like the following. I need to create a line plot or bar plot of the average val for each group, i.e. for each combination of status and category in the CSV file.
The data in dput format:

df <-
structure(list(val = c(4608, 4137, 6507, 5124, 
3608, 34377, 5507, 5624, 4608, 4137, 6507, 5124, 
3608, 3437, 5507, 5507, 5624), status = c("1x", 
"1x", "1x", "2x", "2x", "2x", "2x", "2x", "50xy", 
"50xy", "50xy", "60xy", "60xy", "70xy", "xyz", 
"xyz", "xyz"), category = c("A", "C", "A", "A", 
"A", "B", "B", "C", "B", "C", "A", "B", "C", 
"B", "B", "C", "C")), row.names = c(NA, 
-17L), class = "data.frame")

I tried the following code but could not figure out the whole thing.

library(ggplot2)
ggplot(df, aes(x = status, y = val, group = category, color = source)) + 
      geom_smooth(method = "loess")

Any help plotting them group-wise (for example, the mean val for status 2x and category B) in a single window would be much appreciated. Thank you.

2 Answers

You can do:

library(dplyr)
library(ggplot2)
df %>%
    group_by(category, status) %>%
    mutate(agg = mean(val)) %>%
    ggplot(., aes(status, agg, fill = category, color=status))+
    geom_col(position = "dodge")
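
As a hedged variant of the above: mutate() keeps one row per observation, so each bar is drawn once per row. A minimal sketch, assuming reasonably recent dplyr and ggplot2 versions, that collapses each group to a single row with summarise() and keeps the bar widths equal when a status lacks some categories. The names means and p are my own, and the data frame is rebuilt from the question so the snippet runs on its own:

```r
library(dplyr)
library(ggplot2)

# Same data as in the question, rebuilt so the snippet is self-contained
df <- data.frame(
  val = c(4608, 4137, 6507, 5124, 3608, 34377, 5507, 5624, 4608,
          4137, 6507, 5124, 3608, 3437, 5507, 5507, 5624),
  status = c("1x", "1x", "1x", "2x", "2x", "2x", "2x", "2x", "50xy",
             "50xy", "50xy", "60xy", "60xy", "70xy", "xyz", "xyz", "xyz"),
  category = c("A", "C", "A", "A", "A", "B", "B", "C", "B",
               "C", "A", "B", "C", "B", "B", "C", "C")
)

# One row per status/category combination, holding the group mean
means <- df %>%
  group_by(status, category) %>%
  summarise(agg = mean(val), .groups = "drop")

# preserve = "single" keeps every bar the same width, even where a
# status has fewer categories than the others (e.g. 70xy has only B)
p <- ggplot(means, aes(status, agg, fill = category)) +
  geom_col(position = position_dodge(preserve = "single"))
p
```

Swapping geom_col() for geom_line() with aes(group = category, colour = category) should give the line version of the same summary.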

3 Comments

The OP wants to plot the means of groups, not just the values. Also just as shorthand, geom_col() is equivalent to geom_bar(stat = "identity")
The bars have uneven widths, the last solution in this answer solved it for me.
Thank you @YOLO for the hint, it worked. I later converted it into line plots.

This question already has an accepted answer, which requires computing the aggregated mean(val) for each status, category group beforehand.

However, ggplot2 includes transformations (or stats) which enable us to create the desired plot in one go without utilizing other packages:

library(ggplot2)
# `fun` replaces `fun.y`, which was deprecated in ggplot2 3.3.0
ggplot(df, aes(x = status, y = val, group = category, colour = category)) +
  stat_summary(geom = "line", fun = "mean")

This creates a line plot of the mean values as requested by the OP.

Alternatively, we can tell geom_line to use a summary statistic:

ggplot(df, aes(status, val, group = category, colour = category)) +
  geom_line(stat = "summary", fun = "mean")

which creates the same plot.

stat_summary() can also be used to show the original data and the summary statistics combined in one plot:

ggplot(df, aes(status, val, group = category, colour = category)) +
  geom_point() +                             # raw observations
  stat_summary(geom = "line", fun = "mean")  # group means as lines

This can help to better understand the structure of the underlying data, e.g., to spot outliers. Note that the y scale differs from the previous plot because it now has to cover the raw values as well as the means.
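
To check the numbers behind the lines without loading any extra package, the same group means can be computed with base R's aggregate(). This is only a sketch for verification. The name means is my own, and the data frame is rebuilt from the question so the snippet runs on its own:

```r
# Same data as in the question, rebuilt so the snippet is self-contained
df <- data.frame(
  val = c(4608, 4137, 6507, 5124, 3608, 34377, 5507, 5624, 4608,
          4137, 6507, 5124, 3608, 3437, 5507, 5507, 5624),
  status = c("1x", "1x", "1x", "2x", "2x", "2x", "2x", "2x", "50xy",
             "50xy", "50xy", "60xy", "60xy", "70xy", "xyz", "xyz", "xyz"),
  category = c("A", "C", "A", "A", "A", "B", "B", "C", "B",
               "C", "A", "B", "C", "B", "B", "C", "C")
)

# Mean of val for every status/category combination present in the data
means <- aggregate(val ~ status + category, data = df, FUN = mean)

# The largest mean belongs to 2x / B: (34377 + 5507) / 2 = 19942,
# which is driven by the single value 34377
means[which.max(means$val), ]
```

The single large value in the 2x / B group is what pulls that line far above the others.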
