I am hoping to use ggplot to construct a barplot of frequencies (or just % 1s) of a bunch of binary variables, and am having trouble getting them all together on one plot.

The variables all stem from the same question in a survey, so ideally it'd be nice to have data that is tidy with one column for this variable, but respondents could select more than one option and I'm hoping to retain that instead of having a "more than one selected" option. Here is a slice of the data:

structure(list(gender = structure(c("Male", "Male", "Female", 
"Female", "Female", "Female", "Male", "Male", "Male", "Male"), label = "Q4", format.stata = "%24s"), 
    var1 = structure(c("0", "0", "1", "1", "0", "0", "0", "0", 
    "0", "0"), format.stata = "%9s"), var2 = structure(c("0", 
    "98", "1", "0", "0", "0", "0", "0", "0", "0"), format.stata = "%9s"), 
    var3 = structure(c("0", "0", "0", "0", "0", "0", "0", "0", 
    "0", "0"), format.stata = "%9s"), var4 = structure(c("1", 
    "0", "1", "0", "0", "0", "1", "1", "0", "0"), format.stata = "%9s"), 
    var5 = structure(c("1", "0", "0", "0", "0", "1", "0", "0", 
    "0", "0"), format.stata = "%9s")), row.names = c(NA, -10L
), class = c("tbl_df", "tbl", "data.frame"))
  • Can you clarify what you are looking to construct based on the data provided? It's not clear to me how you want to organize your intended chart, given the example data. Also, is there supposed to be a 98 in there, or is it supposed to be all 1's and 0's? Commented Apr 6, 2021 at 1:36
  • @chemdork123 Sorry, the 98 represents a missing value. And the ideal chart would have var1, var2, var3, etc. along the x axis and with a frequency or percentage of 1s along the y for each respective var. Commented Apr 6, 2021 at 2:17

1 Answer

Get the data in long format so that it is easier to plot.

library(tidyverse)

df %>%
  pivot_longer(cols = starts_with('var')) %>%
  group_by(name) %>%
  summarise(frequency_of_1 = sum(value == 1)) %>%
  # For the proportion of 1s, use mean() instead of sum() (multiply by 100 for a percentage):
  # summarise(frequency_of_1 = mean(value == 1)) %>%
  ggplot() + aes(name, frequency_of_1) + geom_col()


In base R you can do this with colSums and barplot.

barplot(colSums(df[-1] == 1))
# For the proportion of 1s (multiply by 100 for a percentage):
# barplot(colMeans(df[-1] == 1))
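
One caveat, going by the asker's comment that 98 codes a missing value: with sum(value == 1) the 98 is simply not counted, but with mean() it stays in the denominator and lowers the proportion. A minimal sketch of one way to exclude it, assuming that is what you want; the small `df` below is a hypothetical stand-in for the question's data, and `plot_data` and `pct_of_1` are names made up for illustration:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the question's data (values stored as character)
df <- tibble(
  gender = c("Male", "Male", "Female", "Female"),
  var1   = c("0", "0", "1", "1"),
  var2   = c("0", "98", "1", "0")
)

plot_data <- df %>%
  pivot_longer(cols = starts_with("var")) %>%
  # Assumption: 98 means "missing", so turn it into NA
  mutate(value = na_if(value, "98")) %>%
  group_by(name) %>%
  # na.rm = TRUE drops the NA rows from the denominator
  summarise(pct_of_1 = 100 * mean(value == "1", na.rm = TRUE))

ggplot(plot_data, aes(name, pct_of_1)) +
  geom_col() +
  labs(x = NULL, y = "% answering 1")
```

If the 98 rows should count as "not selected" instead, skip the mutate() step and keep the original pipeline.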

2 Comments

Thank you very much, this is perfect. Is there an easy way to break each value down by something like gender? It doesn't seem to work the same way as it would with a table that hasn't been through pivot_longer (which would just be ggplot(d, aes(x, fill = gender)) or whatever)...
You need to include gender in group_by. Try: df %>% pivot_longer(cols = starts_with('var')) %>% group_by(gender, name) %>% summarise(frequency_of_1 = sum(value == 1)) %>% ggplot() + aes(name, frequency_of_1, fill = gender) + geom_col(position = 'dodge')
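
Laid out over several lines, the grouped pipeline from the comment above might look like the sketch below. The small `df` is a hypothetical stand-in for the question's data, `by_gender` is a name made up for illustration, and `.groups = "drop"` is an optional addition that only ungroups the result after summarising:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical stand-in for the question's data (values stored as character)
df <- tibble(
  gender = c("Male", "Male", "Female", "Female"),
  var1   = c("0", "0", "1", "1"),
  var4   = c("1", "0", "1", "0")
)

by_gender <- df %>%
  pivot_longer(cols = starts_with("var")) %>%
  # Group by gender as well as variable name to get one count per pair
  group_by(gender, name) %>%
  summarise(frequency_of_1 = sum(value == "1"), .groups = "drop")

# position = "dodge" places the gender bars side by side for each variable
ggplot(by_gender, aes(name, frequency_of_1, fill = gender)) +
  geom_col(position = "dodge")
```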
