I am trying to make a grouped bar chart with data in long form.

Here is the data:

structure(list(group = c("group1", "group2", "group3", "group1", 
"group2", "group1", "group1", "group1", "group4", "group1", "group4", 
"group4", "group1", "group4", "group1", "group1", "group2", "group1", 
"group4", "group2", "group4", "group2", "group3", "group3", "group1", 
"group1", "group3", "group3", "group1", "group1", "group3", "group1", 
"group4", "group3", "group3", "group1", "group2", "group1", "group4", 
"group1", "group3", "group3", "group3", "group2", "group2", "group4", 
"group3", "group3", "group3", "group2", "group3", "group2", "group1", 
"group1", "group3", "group1", "group1", "group2", "group4", "group1", 
"group4", "group1", "group1", "group4", "group1", "group3", "group4", 
"group1", "group4", "group2", "group4", "group1", "group2", "group4", 
"group1", "group4", "group1", "group2", "group1", "group1", "group1", 
"group1", "group2", "group1", "group3", "group1", "group1", "group1", 
"group3", "group4", "group1", "group3", "group1", "group3", "group4", 
"group1", "group2", "group1", "group3", "group1"), category = c("category4", 
"category5", "category2", "category4", "category3", "category6", 
"category3", "category1", "category4", "category2", "category6", 
"category6", "category5", "category5", "category4", "category4", 
"category1", "category6", "category1", "category4", "category6", 
"category6", "category2", "category6", "category3", "category2", 
"category6", "category3", "category6", "category1", "category6", 
"category2", "category2", "category2", "category5", "category1", 
"category1", "category4", "category3", "category4", "category4", 
"category5", "category1", "category3", "category5", "category2", 
"category2", "category5", "category5", "category2", "category6", 
"category6", "category5", "category1", "category4", "category3", 
"category6", "category1", "category6", "category3", "category2", 
"category2", "category3", "category2", "category2", "category5", 
"category4", "category4", "category4", "category4", "category1", 
"category5", "category6", "category5", "category4", "category5", 
"category1", "category2", "category3", "category5", "category3", 
"category2", "category4", "category6", "category4", "category6", 
"category1", "category4", "category4", "category3", "category4", 
"category5", "category5", "category6", "category4", "category3", 
"category5", "category3", "category3", "category1"), count = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 
0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0)), row.names = c(NA, 
-100L), class = c("tbl_df", "tbl", "data.frame"))

When I run the following:

pivot_sample %>% 
  ggplot(aes(x=group,fill=category))+
  geom_bar()

The default stat_count() seems to work just fine with the default position="stack". However, when I switch to position="dodge" in the code below:

pivot_sample %>% 
  ggplot(aes(x=group,y=count,fill=category))+
  geom_bar(position = "dodge",stat = "identity")

It won't count the count variable.

I am sure there is something basic I am missing and could use another perspective. Do I need to use a count function for the y= argument in the aes()?

All help would be appreciated!

  • In the first one, you are not using the 'count' column, so geom_bar() counts the rows itself. The dodged version of that would be pivot_sample %>% ggplot(aes(x = group, fill = category)) + geom_bar(position = "dodge"). stat = 'identity' uses the exact values from the 'count' column instead of aggregating. Commented Apr 23, 2021 at 19:50
  • @akrun, That's a good point. I was trying to plot with the data in this format to make it easier for looping through plots, but it seems like it may be a better idea to loop through the variables to create a list of summary tables that can be plotted rather than try using operations on a count variable. Commented Apr 24, 2021 at 15:40

1 Answer

OP, the simple answer is to add position="dodge" to your original plot code. That separates the bars according to the group aesthetic (you have not specified one, so the bar geom falls back to the fill aesthetic to define the groups):

pivot_sample %>%
  ggplot(aes(x=group, fill=category)) +
  geom_bar(position='dodge')

The reason is that the default for the stat argument in geom_bar() is stat="count". This counts the observations (rows) for each bar and plots that number on the y axis. You can refer to the computed value with the .. notation, ..count.., but that is not necessary with geom_bar(). The code below is a longer form that produces the same plot:

pivot_sample %>%
  ggplot(aes(x=group, fill=category)) +
  geom_bar(position='dodge', aes(y=..count..), stat="count")

Note that your data frame has a column called "count", but pivot_sample$count is not what is accessed when you write ..count.. in the aesthetic. What is accessed there is the result computed by stat="count".
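In ggplot2 3.3.0 and later, the same computed value can also be written as after_stat(count). The sketch below uses a small made-up data frame (toy, with invented values standing in for pivot_sample) and layer_data() to show that the bar heights come from the row counts, not from the column named count:

```r
library(ggplot2)

# Made-up stand-in for pivot_sample; the values are hypothetical
toy <- data.frame(
  group    = c("group1", "group1", "group1", "group2", "group2"),
  category = c("category1", "category1", "category2", "category1", "category2"),
  count    = c(0, 1, 0, 0, 1)
)

# y is taken from the value computed by stat_count(), not from toy$count
p <- ggplot(toy, aes(x = group, fill = category)) +
  geom_bar(aes(y = after_stat(count)), position = "dodge")

# One row per bar; the count column here holds the number of rows per bar
bars <- layer_data(p)
bars[, c("x", "fill", "count")]
```

The counts across all bars add up to the number of rows in toy, whatever the count column contains.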

What happened when you used stat="identity"? The "identity" stat plots the actual value on the y axis. You specified y=count, so the value of the column pivot_sample$count was plotted for every row at its group and category. geom_bar() with stat="identity" is the same as geom_col() (which should be used in that case), and it requires both x and y aesthetics. With the default position="stack", bars that share an x position are stacked, which adds up the values of the y aesthetic (pivot_sample$count). With position="dodge" nothing is added up: rows that share a group and category are drawn on top of each other.

In the plot you showed using stat="identity" and position="dodge", the height of each bar is therefore the largest value of pivot_sample$count among the overlapping rows for that bar, not their sum. Every value in that column is 0 or 1, so that's why it looks the way it does.

Note that geom_bar() using stat="count" counts observations, whereas stat="identity" uses the values as they are; they are only totalled when the bars are stacked.
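If the totals of the count column are what you want on the y axis, one way to guarantee that regardless of the position setting is to sum first and plot the summary table. This is only a sketch: toy is a made-up data frame standing in for pivot_sample, and totals and total are names chosen for illustration.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for pivot_sample; the values are hypothetical
toy <- data.frame(
  group    = c("group1", "group1", "group1", "group2", "group2"),
  category = c("category1", "category1", "category2", "category1", "category2"),
  count    = c(0, 1, 0, 0, 1)
)

# One row per group/category combination, holding the summed count
totals <- toy %>%
  group_by(group, category) %>%
  summarise(total = sum(count), .groups = "drop")

# geom_col() plots the values as given, so each dodged bar is one total
p <- ggplot(totals, aes(x = group, y = total, fill = category)) +
  geom_col(position = "dodge")
p
```

Because each bar now corresponds to exactly one row of totals, nothing overlaps and the y axis extends to the largest total.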

3 Comments

Thanks, that clarifies a lot. What I realized after reading your post is that my example above is a sample of 100 observations from the larger dataset that I anonymized. Adding position="dodge", like you said, counts the category column. But I am looking for the count described in your last paragraph, and I am still confused by that paragraph. Why is the y axis the range of possible values (i.e. 0-1) rather than extending to the sum of the values in the count column? Do I need to set the y-axis range myself?
Can you clarify my last question regarding the y-axis scale?
If you want to show pivot_sample$count on the y axis, use geom_col() instead of geom_bar(). Regardless, the reason you see only 0 to 1 is that the data you supplied contains only 0 or 1 in pivot_sample$count for every group/category combination. If your full dataset has different values, you would see that. If you want to add up the values within each grouping, then you may want to use stat_summary().
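A minimal sketch of the stat_summary() route mentioned above, using a made-up data frame (toy, hypothetical values) in place of pivot_sample. It assumes ggplot2 3.3.0 or later, where the summary function is passed as fun:

```r
library(ggplot2)

# Made-up stand-in for pivot_sample; the values are hypothetical
toy <- data.frame(
  group    = c("group1", "group1", "group1", "group2", "group2"),
  category = c("category1", "category1", "category2", "category1", "category2"),
  count    = c(0, 1, 0, 0, 1)
)

# Sum the y values within each group/category, then draw dodged bars
p <- ggplot(toy, aes(x = group, y = count, fill = category)) +
  stat_summary(fun = sum, geom = "bar", position = "dodge")

# The y column of the built layer holds the summed value for each bar
heights <- layer_data(p)
heights[, c("x", "fill", "y")]
```

This keeps the data in long form and leaves the aggregation to ggplot2, which may suit looping over several variables.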
