I want to create a bar plot that shows the number of bugs in 3 different life stages. For each date (x) I want three bars representing the number of individuals in each life stage.

My data frame built from the raw data looks similar to this (simplified example):

# create dataframe
date <- c("01/02/2018","14/02/2018","20/02/2018","03/03/2018","15/03/2018")
adult <- c(5,2,3,1,1) 
larvae <- c(6,5,9,7,12) 
nymph <- c(4,4,8,13,10)
df <- data.frame(date,adult,larvae,nymph)

        date adult larvae nymph
1 01/02/2018     5      6     4
2 14/02/2018     2      5     4
3 20/02/2018     3      9     8
4 03/03/2018     1      7    13
5 15/03/2018     1     12    10

The only way I know to plot this with ggplot is to turn the variables into factor levels of a new variable, say stage, and put all the counts into a variable counts.

That dataframe would look like this:

df2
         date  stage counts
1  01/02/2018  adult      5
2  14/02/2018  adult      2
3  20/02/2018  adult      3
4  03/03/2018  adult      1
5  15/03/2018  adult      1
6  01/02/2018 larvae      6
7  14/02/2018 larvae      5
8  20/02/2018 larvae      9
9  03/03/2018 larvae      7
10 15/03/2018 larvae     12
11 01/02/2018  nymph      4
12 14/02/2018  nymph      4
13 20/02/2018  nymph      8
14 03/03/2018  nymph     13
15 15/03/2018  nymph     10

Plotting this df is easy:

library(ggplot2)
ggplot(df2, aes(date, counts, fill = stage)) +
  geom_col(position = "dodge")

To get from df to df2 I use rather long workarounds: extracting columns, creating new vectors with rep("stagename", x) to add to the data frame, repeating the whole data frame with rbind() once for each variable I want to turn into a factor level, etc. (I have used several methods before, but all of them are quite long.)

So I have 2 questions:

1) Is there a quick way to turn the different variables into factor levels of one new variable? I'm talking about a large data frame with several other variables that need to stay as they are.

2) Is there a way to get the same type of bar plot without having to transform the data frame?

I was trying something like this, but that's certainly not correct:

ggplot(data=df) +
  geom_col(aes(x=date,y=adult),fill="blue") +
  geom_col(aes(x=date,y=nymph),fill="green") +
  geom_col(aes(x=date,y=larvae),fill="yellow")

I've searched for similar questions, but can't find a problem quite like mine. Mine is also twofold: if I can make the ggplot without the transformation, that would be even better.

I've recently discovered the tidyverse package and assume a solution for the transformation lies in there, but I haven't yet come across anything that allows a quick transformation of this kind. I'd prefer solutions using that package if possible.

1 Answer

library(tidyverse)
df %>% 
  gather(stage, counts, -date) %>%
  ggplot(aes(date, counts, fill = stage)) +
  geom_col(position = "dodge") 

This should do what you want.
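For reference, in tidyr 1.0.0 and later `gather()` is superseded by `pivot_longer()`. Below is a minimal sketch of the same reshaping with the newer function, assuming the `df` from the question. Parsing `date` into a `Date` is an assumption added here (day/month/year format), so that the x axis sorts chronologically instead of alphabetically; note that this also makes the axis continuous, so bars are spaced by actual date.

```r
library(tidyr)
library(ggplot2)

# Example data from the question
df <- data.frame(
  date   = c("01/02/2018", "14/02/2018", "20/02/2018", "03/03/2018", "15/03/2018"),
  adult  = c(5, 2, 3, 1, 1),
  larvae = c(6, 5, 9, 7, 12),
  nymph  = c(4, 4, 8, 13, 10)
)

# Assumption: the strings are day/month/year
df$date <- as.Date(df$date, format = "%d/%m/%Y")

# Stack every column except date: the old column names go into "stage",
# their contents go into "counts"
df_long <- pivot_longer(df, cols = -date, names_to = "stage", values_to = "counts")

# Same dodged bar plot as in the answer above
p <- ggplot(df_long, aes(date, counts, fill = stage)) +
  geom_col(position = "dodge")
p
```

`df_long` has one row per date and stage (15 rows here), which is the shape of `df2` in the question, only ordered by date first.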

2 Comments

It does, and it is fantastically short, love it! But how does it work? How does R (or gather) know what I mean by "stage" and "counts"? I'd like to understand it so I can use it myself next time. The help page does not explain how R links the key and value to the correct columns.
Hey, maybe it would have been more intuitive to use gather("adult", "larvae", "nymph", key = "stage", value = "counts"). You select the columns that hold values rather than variables. The key is the new column that receives the old column names (stage in your data), and the value is the new column that receives their contents (counts in your data). In my answer, gather() gathers all columns except date; -date (i.e. exclude date) comes from the dplyr::select() notation. This chapter might be of use to you!
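To illustrate the explicit form described in the comment above, here is a small sketch in which the columns to gather are named and everything else is left alone. The `site` column is hypothetical, added only to stand in for the "other variables that need to stay" from the question; `factor_key = TRUE` is an optional `gather()` argument that stores the key as a factor with levels in the original column order.

```r
library(tidyr)

# Example data from the question, plus a hypothetical extra column `site`
df <- data.frame(
  date   = c("01/02/2018", "14/02/2018", "20/02/2018", "03/03/2018", "15/03/2018"),
  site   = c("A", "A", "B", "B", "A"),  # hypothetical, not in the original data
  adult  = c(5, 2, 3, 1, 1),
  larvae = c(6, 5, 9, 7, 12),
  nymph  = c(4, 4, 8, 13, 10)
)

# Only adult, larvae and nymph are gathered; date and site are kept
# and repeated for each stage
long <- gather(df, key = "stage", value = "counts",
               adult, larvae, nymph, factor_key = TRUE)

str(long)
```

The result has the columns `date`, `site`, `stage` and `counts`, with `stage` already a factor, so it can go straight into the `ggplot()` call from the answer.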
