Plot points from one df, plot errorbar from another

Question

Raw data looks like:

Restaurant     Question               rating
McDonalds      How was the food?      5
McDonalds      How were the drinks?   3
McDonalds      How were the workers?  2
Burger_King    How was the food?      1
Burger_King    How were the drinks?   3
Burger_King    How were the workers?  4

Averages looks like:

Question              average_rating    error
How was the food?     3.13              0.7
How were the drinks?  2.37              0.56

How do I make a plot of points (x = question, y = rating, fill = restaurant) with the raw data, then plot the error bars (ymin/ymax = average_rating ± error) on top of it?

tribbles for convenience:

tribble(
  ~restaurant, ~question,  ~rating,
  "McDonalds", "How was the food?", 5,
  "McDonalds", "How were the drinks?", 3,
  "McDonalds", "How were the drinks?", 2,
  "BurgerKing", "How was the food?", 1,
  "BurgerKing", "How were the drinks?", 3,
  "BurgerKing", "How were the drinks?", 4
)

tribble(
  ~question, ~average_rating, ~error,
  "How was the food?", 3.13, 0.7,
  "How were the drinks?", 2.37, 0.56
)

"a plot of points with the raw data" could mean a bunch of things in this context. Please be specific about how you want to plot your data.
– Axeman, Commented Feb 14, 2020 at 21:38
How are you going to plot the error bars? Your errors are currently by restaurant, not by question.
– StupidWolf, Commented Feb 14, 2020 at 22:04
Does this answer your question? How to draw means and error bars on axes in ggplot2 R
– Matias Andina, Commented Feb 14, 2020 at 22:05

dc37 · Accepted Answer · 2020-02-16 04:05:02Z

3

Your desired output does not match your current data frames, because your second data frame contains the average rating per restaurant, not per question (as pointed out by @StupidWolf). So either you plot with restaurant on the x axis, which is easy to do, or you merge both data frames and add Average_rating as an extra level of the discrete variable question.
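
For the first option, which the answer does not show, a minimal sketch might look like the following. It assumes a per-restaurant data frame df2 rebuilt from the two Average_rating rows printed further down, and it copies df1 from the question's first tribble; the name p_by_restaurant is only illustrative.

```r
library(tibble)
library(ggplot2)

# Raw ratings, copied from the question's first tribble
df1 <- tribble(
  ~restaurant, ~question, ~rating,
  "McDonalds", "How was the food?", 5,
  "McDonalds", "How were the drinks?", 3,
  "McDonalds", "How were the drinks?", 2,
  "BurgerKing", "How was the food?", 1,
  "BurgerKing", "How were the drinks?", 3,
  "BurgerKing", "How were the drinks?", 4
)
# Use the same restaurant spelling in both data frames
df1$restaurant <- sub("BurgerKing", "Burger_King", df1$restaurant)

# ASSUMPTION: per-restaurant averages, reconstructed from the printed output
df2 <- tribble(
  ~restaurant, ~average_rating, ~error,
  "McDonalds", 3.13, 0.7,
  "Burger_King", 2.37, 0.56
)

# Raw ratings as jittered points, one mean with error range per restaurant
p_by_restaurant <- ggplot(df1, aes(x = restaurant, y = rating, color = question)) +
  geom_jitter(width = 0.2) +
  geom_pointrange(data = df2, inherit.aes = FALSE,
                  aes(x = restaurant, y = average_rating,
                      ymin = average_rating - error,
                      ymax = average_rating + error))
p_by_restaurant
```

Here the color aesthetic is mapped to question only so the raw points stay distinguishable; that mapping is a choice made for this sketch, not something the answer specifies.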

For the second option, I did the following:

library(dplyr)
df2 %>% mutate(question = "Average_rating") %>%
  rename(rating = average_rating) %>% full_join(df1,.) %>%
  mutate(restaurant = sub("BurgerKing","Burger_King",restaurant)) 
Joining, by = c("restaurant", "question", "rating")
# A tibble: 8 x 4
  restaurant  question             rating error
  <chr>       <chr>                 <dbl> <dbl>
1 McDonalds   How was the food?      5    NA   
2 McDonalds   How were the drinks?   3    NA   
3 McDonalds   How were the drinks?   2    NA   
4 Burger_King How was the food?      1    NA   
5 Burger_King How were the drinks?   3    NA   
6 Burger_King How were the drinks?   4    NA   
7 McDonalds   Average_rating         3.13  0.7 
8 Burger_King Average_rating         2.37  0.56

Then, if you want to add the plot, you can do the following:

library(ggplot2)
library(dplyr)

# Append the averages as an extra "Average_rating" level of question
df2 %>%
  mutate(question = "Average_rating") %>%
  rename(rating = average_rating) %>%
  full_join(df1, .) %>%
  # Use one spelling for the restaurant name across all rows
  mutate(restaurant = sub("BurgerKing", "Burger_King", restaurant)) %>%
  ggplot(aes(x = question, y = rating, color = restaurant)) +
  geom_point(position = position_dodge(0.9)) +
  # Raw rows have error = NA, so only the averages get error bars
  geom_errorbar(aes(ymin = rating - error, ymax = rating + error),
                width = 0.1, position = position_dodge(0.9))

EDIT: Plotting means and error bars per question

With your new data frame of average ratings per question (df3 in the code below), you can use geom_pointrange as follows:

ggplot(df1, aes(x = question, y = rating, color = restaurant))+
  geom_jitter(width = 0.2)+
  geom_pointrange(inherit.aes = FALSE,
                  data = df3, 
                  aes(x = question, 
                      y = average_rating,
                      ymin = average_rating-error,
                      ymax = average_rating+error))

Does this answer your question?

edited Feb 16, 2020 at 4:05

answered Feb 14, 2020 at 22:05


2 Comments

Sas Over a year ago

Thank you. Why did you use geom_pointrange() instead of geom_errorbar()? Is there a way to make the second error bars look the same as the first?

dc37 Over a year ago

You're welcome. geom_pointrange plots the mean point and its error bar in a single call. If you want the same appearance as the first plot, use geom_point and geom_errorbar as in the first example.
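
The suggestion in this last comment could be sketched roughly as follows: the raw points come from df1 and the means with capped error bars come from the per-question averages. This is a sketch under assumptions, not the answerer's own code: df1 and df3 are copied from the question's two tribbles, and the name p is only illustrative.

```r
library(tibble)
library(ggplot2)

# Raw ratings, copied from the question's first tribble
df1 <- tribble(
  ~restaurant, ~question, ~rating,
  "McDonalds", "How was the food?", 5,
  "McDonalds", "How were the drinks?", 3,
  "McDonalds", "How were the drinks?", 2,
  "BurgerKing", "How was the food?", 1,
  "BurgerKing", "How were the drinks?", 3,
  "BurgerKing", "How were the drinks?", 4
)

# Per-question averages, copied from the question's second tribble
# (ASSUMPTION: this is what the answer calls df3)
df3 <- tribble(
  ~question, ~average_rating, ~error,
  "How was the food?", 3.13, 0.7,
  "How were the drinks?", 2.37, 0.56
)

p <- ggplot(df1, aes(x = question, y = rating, color = restaurant)) +
  # Raw ratings, jittered horizontally as in the answer
  geom_jitter(width = 0.2) +
  # Mean point per question, drawn from the second data frame
  geom_point(data = df3, inherit.aes = FALSE,
             aes(x = question, y = average_rating)) +
  # Capped error bars around each mean, also from the second data frame
  geom_errorbar(data = df3, inherit.aes = FALSE,
                aes(x = question,
                    ymin = average_rating - error,
                    ymax = average_rating + error),
                width = 0.1)
p
```

Setting inherit.aes = FALSE on the two summary layers keeps them from looking for the rating and restaurant columns, which exist only in df1.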
