I would like to create a plot that shows three counts per year: the total number of applicants (Total), the number with "Yes" in Interview, and the number with "Yes" in Hire, all broken down by a fourth variable (Year). Note that there is no actual Total variable; it is just the total number of observations.

I am trying to do this in ggplot2, but nothing I've tried has produced the result I'm looking for. I can easily get one variable dodged and plotted using geom_bar, but I am unsure how to represent two different variables at once.

 app <- structure(list(Applicant_Name = c("Aaraf", "Alaina", 
 "Aleena", "Alejandra", "Alexa", "Alexander", 
 "Alexandra", "Alexandra", "Alexandria", 
 "Alexis"), Interview = c("No", "No", "Yes", "Yes", "No", 
 "Yes", "Yes", "Yes", "Yes", "Yes"), Hire = c("No", "No", "Yes", 
 "No", "No", "No", "No", "No", "Yes", "Yes"), Year = c(2022, 2020, 
 2021, 2021, 2022, 2022, 2020, 2020, 2020, 2022), School = c("School of Business", 
 "Columbian Coll of Arts & Sci", "Milken Inst Sch of Public Hlth", 
 "Columbian Coll of Arts & Sci", "School of Engin & App Sc", "Columbian Coll of Arts & Sci", 
 "Columbian Coll of Arts & Sci", "Columbian Coll of Arts & Sci", 
 "School of Business", "Columbian Coll of Arts & Sci"), Major = c("Pre-Business Administration", 
 "Biological Anthropology", "Public Health", "Biological Anthropology", 
 "Systems Engineering", "Arts & Sciences", "Neuroscience", "English", 
 "International Business", "Arts & Sciences"), Ethnicity = c("Black or African American", 
 "White", "White", "Nonresident alien", "White", "White", "Race/ethnicity unknown", 
 "Two or More Race Codes", "Black or African American", "Black or African American"
 ), Sex = c("Female", "Female", "Female", "Female", "Female", 
 "Male", "Female", "Female", "Female", "Female"), GPA = c(3.221428, 
 3.230158, 3.429268, 3.576595, 3.86, 4, 3.460759, 3.89315, 3.227631, 
 1.433333)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", 
 "data.frame"))

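 library(ggplot2)
 # after_stat(count) replaces the deprecated ..count.. notation (ggplot2 >= 3.3.0)
 ggplot(app, aes(Year, after_stat(count))) + geom_bar(aes(fill = Hire), position = "dodge")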

Ideally, I would like a plot showing our total number of applicants (all observations) next to total number of Interview=Yes next to total number of Hire=Yes, broken down by year.

Here is a visual example with my lovely artistic ability. https://i.sstatic.net/mYr8w.jpg

1 Answer

Using dplyr and tidyr to directly get the data you want to plot:

library(dplyr)
library(tidyr)
library(ggplot2)
app2 <- app %>% 
  group_by(Year) %>% 
  summarise(Total = n(),
            Interviewed = sum(Interview == "Yes"),
            Hired = sum(Hire == "Yes")) %>% 
  gather( "category", "counts", -Year)

And then plotting is straightforward:

ggplot(app2, aes(Year, counts)) + 
  geom_col(aes(fill = category), position = "dodge")
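As a side note, gather() is superseded in current tidyr, so here is a rough sketch of the same reshaping with pivot_longer(), assuming tidyr 1.0.0 or later. The cut-down app data frame below keeps only the three columns the plot needs, copied from the question; the names app_long and p and the factor(Year) call (to get a discrete axis with one tick per year) are my own additions, not part of the code above.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Cut-down copy of the question's data: only the columns used for the plot
app <- data.frame(
  Interview = c("No", "No", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes"),
  Hire      = c("No", "No", "Yes", "No", "No", "No", "No", "No", "Yes", "Yes"),
  Year      = c(2022, 2020, 2021, 2021, 2022, 2022, 2020, 2020, 2020, 2022)
)

# One row per Year/category pair, with the count in `counts`
app_long <- app %>%
  group_by(Year) %>%
  summarise(Total = n(),
            Interviewed = sum(Interview == "Yes"),
            Hired = sum(Hire == "Yes")) %>%
  pivot_longer(-Year, names_to = "category", values_to = "counts")

# factor(Year) treats the year as discrete rather than continuous
p <- ggplot(app_long, aes(factor(Year), counts)) +
  geom_col(aes(fill = category), position = "dodge") +
  labs(x = "Year")
p
```

The rows come out in a different order than with gather(), but the plot should look the same because geom_col only cares about the Year/category pairs.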


3 Comments

Awesome! Yeah, this is exactly it. Now, do you know how I can have the biggest one (Total) be leftmost, with the smaller ones following? So Total, Interviewed, Hired, as opposed to the opposite?
Sorry, meant to tag.
Just set the factor levels explicitly: add %>% mutate(category = factor(category, levels = c("Total","Interviewed","Hired"))) to the end of the app2 pipe.
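
For reference, a sketch of how the whole pipe might look with that mutate() step appended. The cut-down app data frame is copied from the question (only the columns the plot uses), and the plot object name p is just for illustration.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Cut-down copy of the question's data: only the columns used for the plot
app <- data.frame(
  Interview = c("No", "No", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes"),
  Hire      = c("No", "No", "Yes", "No", "No", "No", "No", "No", "Yes", "Yes"),
  Year      = c(2022, 2020, 2021, 2021, 2022, 2022, 2020, 2020, 2020, 2022)
)

app2 <- app %>%
  group_by(Year) %>%
  summarise(Total = n(),
            Interviewed = sum(Interview == "Yes"),
            Hired = sum(Hire == "Yes")) %>%
  gather("category", "counts", -Year) %>%
  # Explicit level order controls the left-to-right order of the dodged bars
  mutate(category = factor(category, levels = c("Total", "Interviewed", "Hired")))

p <- ggplot(app2, aes(Year, counts)) +
  geom_col(aes(fill = category), position = "dodge")
p
```

Without the explicit levels, the character column is ordered alphabetically (Hired, Interviewed, Total), which is why the bars came out reversed.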
