I am trying to reproduce a complex data visualisation like the one in the picture below, but with R and ggplot2.

enter image description here

As observed:

  1. there are 6 different groups ("Africa", "Asia", "Europe", etc.), each labelled above its set of charts;
  2. each set comprises 3 area plots per continent;
  3. the x axis appears only on one set, the last row (Oceania);
  4. the legend appears only once, above the plot;
  5. there are two legends above the plot: risk groups and conditions;
  6. as you can see, Africa has population in millions (one chart), risk groups, and conditions.

I am trying to achieve the same result with 2 of my datasets. For India, for example, I want a single row with one chart for symptoms and a second chart for comorbidities, and the same for the UK and Pakistan. Here are some fake datasets I created:

  1. https://github.com/gabrielburcea/stackoverflow_fake_data/blob/master/fake_symptoms.csv
  2. https://github.com/gabrielburcea/stackoverflow_fake_data/blob/master/fake_comorbidities%202.csv

I tried creating a small dataset for each country, then building 2 plots (one for symptoms and one for comorbidities) and adding them together. But this is heavy work, and it raises many other issues. Here is one example:

library(dplyr)  # provides %>% and filter()

india_count_symptoms <- count_symptoms %>%
  dplyr::filter(Country == "India")

india_count_symptoms$symptoms <- as.factor(india_count_symptoms$symptoms)
india_count_symptoms$Count <- as.numeric(india_count_symptoms$Count)

library(viridis)

india_sympt_plot <- ggplot2::ggplot(india_count_symptoms, ggplot2::aes(x = age_band, y = Count, group = symptoms, fill = symptoms)) +
  ggplot2::geom_area(position = "fill", color = "white") + 
  ggplot2::scale_x_discrete(limits = c("0-19", "20-39", "40-59","60+"), expand = c(0, 0)) +
  ggplot2::scale_y_continuous(expand = ggplot2::expansion(mult = c(0, 0.1))) + 
  viridis::scale_fill_viridis(discrete = TRUE)

india_sympt_plot  

This is what I got:

enter image description here

And as you can see:

a. the age bands aren't nicely aligned;

b. with this approach I end up with a separate legend for each plot for each country;

c. the y axis does not show the counts; it runs from 0 to 1, which is not intuitive;

d. I would have to do the same for comorbidities and would hit the same three problems.

So I am looking for an easier approach that produces a plot similar to the first picture, meeting conditions 1 to 5 above, but for my 3 countries and for symptoms and comorbidities. My real dataset is bigger, with 5 countries, but the plotting is the same: symptoms and comorbidities.

Is there a better way of achieving this with ggplot2, in RStudio?

  • A few comments: (a) What do you mean by "the age bands aren't nicely aligned"? Do you mean you want them rotated so they are vertical? See this FAQ on rotating and spacing axis labels in ggplot. Commented Nov 14, 2020 at 17:16
  • (b) Re legends: I would suggest using facets, probably facet_grid. This will simplify your code and automatically combine legends. Commented Nov 14, 2020 at 17:17
  • As observed, 0-19 and 63+ start at the beginning of the x axis and end at the end of the axis. I can rotate them anyway; it's just that other issues emerge, as described above. Commented Nov 14, 2020 at 17:18
  • (c) position = 'fill' tells geom_area that you want the y axis to fill the space from 0 to 1. Remove that setting and the defaults will show you your counts. Commented Nov 14, 2020 at 17:18
  • Yes, when I removed position = "fill" I had no areas plotted, and I looked everywhere to see how to get the areas plotted. I spent 4 hours trying to solve this one little thing. Commented Nov 14, 2020 at 17:20
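On point (a), one possible explanation is the `expand = c(0, 0)` on the discrete x scale: it removes the padding around the first and last categories, so "0-19" and "60+" sit exactly on the panel edges. The sketch below compares the x range with and without that setting. The `toy` data frame and its values are invented for illustration; only the column names mirror the question.

```r
library(ggplot2)

# Toy data, invented for illustration: 2 symptoms x 4 age bands
toy <- data.frame(
  age_band = rep(c("0-19", "20-39", "40-59", "60+"), times = 2),
  symptoms = rep(c("cough", "fever"), each = 4),
  Count    = c(10, 40, 30, 20, 5, 25, 35, 15)
)

base <- ggplot(toy, aes(x = age_band, y = Count, group = symptoms, fill = symptoms)) +
  geom_area(color = "white")

# No padding: the first and last age bands touch the panel edges
p_tight <- base + scale_x_discrete(expand = c(0, 0))

# Default padding: room is left on both sides of the outer age bands
p_default <- base

# Helper (hypothetical name) that returns the x range the panel actually uses
x_range <- function(p) ggplot_build(p)$layout$panel_params[[1]]$x.range

range_tight   <- x_range(p_tight)
range_default <- x_range(p_default)
```

Comparing `range_tight` with `range_default` shows how much space the default expansion adds. For area charts, filling the panel edge to edge is often what you want, so this may be a matter of taste rather than a bug.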

1 Answer

This is a good start - I'm not clear on some of your goals, but this answer should get you over the immediate obstacles.

## load the packages used below (dplyr provides %>%)
library(ggplot2)
library(dplyr)

## read in your data
count_symptoms = readr::read_csv("https://github.com/gabrielburcea/stackoverflow_fake_data/raw/master/fake_symptoms.csv")

## as mentioned in comments, removing `position = 'fill'` lets your chart show counts.
## (I'm skipping the unnecessary data conversions)
## And I'm removing the `ggplot2::` to make the code more readable...
## No other changes are made

india_count_symptoms <- count_symptoms %>%
  dplyr::filter(Country == "India")

india_sympt_plot <- ggplot(india_count_symptoms, aes(x = age_band, y = Count, group = symptoms, fill = symptoms)) +
  geom_area(color = "white") + 
  scale_x_discrete(limits = c("0-19", "20-39", "40-59","60+"), expand = c(0, 0)) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) + 
  viridis::scale_fill_viridis(discrete = TRUE)

enter image description here
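To see the difference between the default stacking and `position = "fill"` in numbers, here is a minimal sketch. The `toy` data frame is made up for illustration and is not the linked CSV; only the column names follow the question.

```r
library(ggplot2)

# Toy data, invented for illustration: 2 symptoms x 4 age bands
toy <- data.frame(
  age_band = rep(c("0-19", "20-39", "40-59", "60+"), times = 2),
  symptoms = rep(c("cough", "fever"), each = 4),
  Count    = c(10, 40, 30, 20, 5, 25, 35, 15)
)

# Default position ("stack"): the y axis shows summed counts
p_stack <- ggplot(toy, aes(x = age_band, y = Count, group = symptoms, fill = symptoms)) +
  geom_area(color = "white")

# position = "fill": every age band is rescaled so the stack tops out at 1
p_fill <- ggplot(toy, aes(x = age_band, y = Count, group = symptoms, fill = symptoms)) +
  geom_area(position = "fill", color = "white")

# Highest stacked value that each version draws
max_stack <- max(layer_data(p_stack)$ymax, na.rm = TRUE)
max_fill  <- max(layer_data(p_fill)$ymax, na.rm = TRUE)
```

Here `max_stack` is the largest summed count across age bands, while `max_fill` is 1, which is why the original y axis stopped at 1.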

Now, instead of making individual plots for each country, let's use facets:

## same plot code as above, but we give it the whole data set
## and add the `facet_grid` on
ggplot(count_symptoms, aes(x = age_band, y = Count, group = symptoms, fill = symptoms)) +
  geom_area(color = "white") + 
  scale_x_discrete(limits = c("0-19", "20-39", "40-59","60+"), expand = c(0, 0)) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1))) +  
  viridis::scale_fill_viridis(discrete = TRUE) + 
  facet_grid(Country ~ .)

enter image description here

Notice we have a single legend. You can re-position it easily as shown here. Probably the next change I'd make is adding the argument labels = scales::comma to your scale_y_continuous, so that large counts are printed with thousands separators. I have no idea what your issue is with the x-axis labels.
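As a rough sketch of those two tweaks together, the following moves the legend above the panels with `theme(legend.position = "top")` and formats the y axis with `scales::comma`. The `toy` data and its counts are invented for illustration.

```r
library(ggplot2)

# Toy data, invented for illustration: 2 countries x 2 symptoms x 4 age bands
toy <- data.frame(
  Country  = rep(c("India", "UK"), each = 8),
  age_band = rep(c("0-19", "20-39", "40-59", "60+"), times = 4),
  symptoms = rep(rep(c("cough", "fever"), each = 4), times = 2),
  Count    = c(1200, 4500, 3800, 2100,  900, 2600, 3100, 1500,
                800, 3900, 4200, 2500,  600, 2200, 2900, 1700)
)

p <- ggplot(toy, aes(x = age_band, y = Count, group = symptoms, fill = symptoms)) +
  geom_area(color = "white") +
  # scales::comma formats the axis breaks with thousands separators
  scale_y_continuous(labels = scales::comma, expand = expansion(mult = c(0, 0.1))) +
  facet_grid(Country ~ .) +
  # a single shared legend, placed above the panels
  theme(legend.position = "top")

p
```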

For the complete figure, I'd suggest doing one facet_grid plot for each column, and then use the patchwork package to combine them into one image. See how far you can get based on this, and if you continue to have issues ask a new question focused on the next step.
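A minimal sketch of that idea, under some assumptions: both data frames below are invented, the column name `comorbidities` is a guess at how the second dataset is laid out, and `area_column()` is a hypothetical helper written for this example. Each column of the figure is its own `facet_grid` plot, and patchwork places them side by side.

```r
library(ggplot2)
library(patchwork)

age_bands <- c("0-19", "20-39", "40-59", "60+")
countries <- c("India", "Pakistan", "UK")

# Toy data, invented for illustration
toy_symptoms <- expand.grid(age_band = age_bands, symptoms = c("cough", "fever"),
                            Country = countries, stringsAsFactors = FALSE)
toy_symptoms$Count <- seq_len(nrow(toy_symptoms)) * 10

# The column name `comorbidities` is an assumption about the second dataset
toy_comorb <- expand.grid(age_band = age_bands, comorbidities = c("asthma", "diabetes"),
                          Country = countries, stringsAsFactors = FALSE)
toy_comorb$Count <- rev(seq_len(nrow(toy_comorb))) * 5

# Hypothetical helper: one column of the figure, one facet row per country
area_column <- function(data, fill_var, title) {
  ggplot(data, aes(x = age_band, y = Count,
                   group = .data[[fill_var]], fill = .data[[fill_var]])) +
    geom_area(color = "white") +
    facet_grid(Country ~ .) +
    labs(title = title, fill = fill_var)
}

p_symptoms <- area_column(toy_symptoms, "symptoms", "Symptoms")
p_comorb   <- area_column(toy_comorb, "comorbidities", "Comorbidities")

# `|` puts the plots side by side; guides = "collect" gathers the legends,
# and `&` applies the theme to every plot in the patchwork
combined <- (p_symptoms | p_comorb) +
  plot_layout(guides = "collect") &
  theme(legend.position = "top")

combined
```

Because each column is a separate plot, each one keeps its own fill scale and y scale, which a single faceted plot cannot do.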

4 Comments

Hello Gregor, thank you very much for your help. However, I have added some information to my post, as I missed something. Yes, what you have done works very well, but I want 2 charts in one row for India, one for symptoms and the other for comorbidities, and the same for the other countries.
Right, and I said in my answer "I'd suggest doing one facet_grid plot for each column, and then use the patchwork package to combine them into one image. See how far you can get based on this, and if you continue to have issues ask a new question focused on the next step.".
You won't be able to use different fill or y scales within the same faceted plot. So each column of charts will need to be developed separately and then stuck together. I recommend the patchwork package for sticking them together. I think this answer should get you far enough that you can develop each of them separately. If you run into new issues, ask a new question showing where you're at and what the current issue is.
Took your suggestions in. Quite nice. Will post other issues connected to this aspect.
