R plotly grouped boxplot highlighting a specific value for each category

Question

I have the following code that works fine on my end:

library(plotly)  # plot_ly(), layout() and the %>% pipe
library(tibble)  # tibble()

# Seed the pseudo-random number generator for reproducible results
set.seed(1234)
# Create three variables
income <- round(rnorm(500,  # 500 random data point values
                      mean = 10000,  # mean of 10000
                      sd = 1000),  # standard deviation of 1000
                digits = 2)  # round the random values to two decimal points
stage <- sample(c("Early",
                  "Mid",
                  "Late"),  # sample space of the stage variable
                500,  # 500 random data point values
                replace = TRUE)  # sample with replacement
country <- sample(c("USA",
                    "Canada"),  # sample space of the country variable
                  500,  # 500 random data point values
                  replace = TRUE)  # sample with replacement
# Create tibble
df1 <- tibble(Income = income,  # create an Income variable for the income data point values
              Stage = stage,  # create a Stage variable for the stage data point values
              Country = country)  # create a Country variable for the country data point values

df1 <- as.data.frame(df1)
# Flag the first row of each Country/Stage combination as the value to highlight
df1$HIGHLIGHT <- 'NO'
df1$TMP = paste0(df1$Country, "_", df1$Stage)
idx <- duplicated(df1$TMP)
df1$HIGHLIGHT[!idx] = 'YES'

plot_ly(df1,
        x = ~Country,
        y = ~Income,
        color = ~Stage,
        type = "box") %>% 
  layout(boxmode = "group",
         title = "Income by career stage",
         xaxis = list(title = "Country",
                      zeroline = FALSE),
         yaxis = list(title = "Income",
                      zeroline = FALSE))

However, I would like to add a red dot over each individual boxplot showing the most recent value, i.e. the row where column "HIGHLIGHT" is "YES". This lets users see not only the distribution for each box but also where the most recent value is positioned. I can't find a way to add those red dots. Any suggestions? Thank you

I tried experimenting with add_markers but no luck. Here is the code I piped onto your plotly code, if you want to try and pick it up from here. The key might be getting the boxplot to group based on Stage without using the boxmode argument: add_markers(data = df1 %>% filter(HIGHLIGHT == "YES") %>% group_by(Stage), x = ~Country, y = ~Income) – Harrison Jones, Apr 8, 2022 at 21:10

The problem is that I do get the red dots, but they all land at a single position per group instead of on each box. Maybe you need to add the markers manually? – Quinten, Apr 9, 2022 at 9:26

Kat · Accepted Answer · 2022-04-11 15:09:31Z

I couldn't find what I would call an easy or intuitive way of doing this, but I did find a way that works.

I used the x-axis coordinates to align the points horizontally and the income values to place them vertically. Because an annotation in plotly needs some text to draw, I used an asterisk. I started with a period, but the points looked misplaced, because a period sits at the bottom of the text's line space rather than at its center.

Let me know if this is what you were looking for.

library(dplyr)  # filter(), group_by(), summarise()

# first find the values needed
df1 %>% filter(HIGHLIGHT == "YES") %>% 
  group_by(Country, Stage) %>% 
  summarise(Income = Income)
# # A tibble: 6 × 3
# # Groups:   Country [2]
#   Country Stage Income
#   <chr>   <chr>  <dbl>
# 1 Canada  Early  7654.
# 2 Canada  Late   9002.
# 3 Canada  Mid    8793.
# 4 USA     Early 11084.
# 5 USA     Late   9110.
# 6 USA     Mid   10277.

Then extract the values needed for the plot. Note the order here as well: it is the same order in which the boxes currently appear in the plot (Country first, then Stage).
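For readers not using dplyr, the same values can be pulled out in base R. This is a swapped-in equivalent, not the code used in this answer; it makes the ordering explicit by sorting the highlighted rows by Country and then Stage, on the assumption (consistent with the order noted above) that plotly draws both in alphabetical order. The data frame `toy` is a hypothetical stand-in for `df1` that reuses the rounded values from the summary output above.

```r
# Hypothetical stand-in for df1, using the rounded values shown above
toy <- data.frame(
  Country   = c("USA", "Canada", "USA", "Canada", "Canada", "USA"),
  Stage     = c("Mid", "Early", "Late", "Mid", "Late", "Early"),
  Income    = c(10277, 7654, 9110, 8793, 9002, 11084),
  HIGHLIGHT = "YES"  # recycled to every row
)

# Keep the highlighted rows and sort them by Country, then Stage.
# Assumption: this alphabetical order matches the box order in the plot.
hl <- toy[toy$HIGHLIGHT == "YES", ]
hl <- hl[order(hl$Country, hl$Stage), ]
newY <- hl$Income
newY
#> [1]  7654  9002  8793 11084  9110 10277
```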

Using trial and error, and knowing that on a categorical axis Canada is centered at x = 0 and the US at x = 1 (plotly places each category at its index position), I tried a few offsets until I found ones that work.

The box centers on the x-axis are -.235, 0, .235, .765, 1, and 1.235.
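The trial-and-error offsets are close to what plotly's grouped box layout appears to produce. A minimal sketch, assuming plotly.js's default `boxgap` of 0.3 and one box per Stage inside every Country: the boxes of one category then share 1 - boxgap = 0.7 of the category's unit width, split evenly. `box_centres()` is a hypothetical helper, not part of plotly, and the formula follows my reading of plotly.js's box positioning, so it may differ between versions.

```r
# Hypothetical helper: approximate centres of grouped boxes on a categorical
# plotly x-axis, where categories sit at 0, 1, 2, ...
# Assumes the default boxgap = 0.3 and n_groups boxes in every category.
box_centres <- function(n_categories, n_groups, boxgap = 0.3) {
  k <- seq_len(n_groups) - 1                           # box index within a group
  offsets <- (1 - boxgap) * (-0.5 + (k + 0.5) / n_groups)
  # Add each offset to each category position; boxes vary fastest,
  # giving the order Category 1 boxes, then Category 2 boxes, ...
  as.vector(outer(offsets, seq_len(n_categories) - 1, `+`))
}

# Two countries, three stages
round(box_centres(n_categories = 2, n_groups = 3), 3)
#> [1] -0.233  0.000  0.233  0.767  1.000  1.233
```

The result can stand in for the hard-coded `x` vector, provided the highlighted values are in the same Country-then-Stage order; if the plot sets `layout(boxgap = ...)`, pass the same value here.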

Next, I created the x and y for the annotation.

newY = df1 %>% filter(HIGHLIGHT == "YES") %>% 
  group_by(Country, Stage) %>% 
  summarise(Income = Income) %>% 
  ungroup() %>% 
  pull(Income)

x = c(-.235, 0, .235, .765, 1, 1.235)

Then I put it all together. In your code for the plot, most of the variables are capitalized, but they aren't in the data. I just changed them in the data.

(plt = plot_ly(df1,
               x = ~Country,
               y = ~Income,
               color = ~Stage,
               type = "box") %>% 
    layout(boxmode = "group",
           title = "Income by career stage",
           xaxis = list(title = "Country",
                        zeroline = FALSE),
           yaxis = list(title = "Income",
                        zeroline = FALSE),
           annotations = list(x = x,
                              y = newY,
                              text = "*",
                              hovertext = newY,
                              font = list(size = 20,
                                          color = "red"),
                              showarrow = FALSE,
                              valign = "middle",
                              xanchor = "center",  # valid values: "auto", "left", "center", "right"
                              yanchor = "middle" ) 
    ) # end layout
) # end print
