How to create a plot of two variables in a melted data frame in ggplot2?

Question

I have the following data frame about US counties, grouped by their income decile and by who won the 2016 election:

# A tibble: 1,188 x 5
# Groups:   day_month_year, deciles_income [270]
   day_month_year deciles_income winner2016      key            mean_spend_cases
   <date>                  <int> <chr>           <chr>                     <dbl>
 1 2020-01-12                  1 Donald Trump    mean_spend_all         0.00108 
 2 2020-01-12                  1 Hillary Clinton mean_spend_all         0.0196  
 3 2020-01-12                  2 Donald Trump    mean_spend_all        -0.000334
 4 2020-01-12                  2 Hillary Clinton mean_spend_all         0.00664 
 5 2020-01-12                  3 Donald Trump    mean_spend_all         0.00807 
 6 2020-01-12                  3 Hillary Clinton mean_spend_all         0.0257  
 7 2020-01-12                  4 Donald Trump    mean_spend_all        -0.00491 
 8 2020-01-12                  4 Hillary Clinton mean_spend_all        -0.0119  
 9 2020-01-12                  5 Donald Trump    mean_spend_all         0.000497
10 2020-01-12                  5 Hillary Clinton mean_spend_all         0.00001 
# … with 1,178 more rows

In the key column, I have melted the variables mean_spend_all and new_case_rate_07da. I am trying to create a plot with time on the x-axis that shows two lines for the development of new cases (each line colored by whether the county's 2016 winner was Trump or Clinton) and points for the change in spending (again colored by the winner2016 column).

I then use facet_wrap() so that I get ten panels, one per income decile of the counties' residents. Finally, I would like to add a line of best fit for the change in spending, for which I am using stat_smooth().
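For reference, "melting" here means reshaping the two measure columns into key/value pairs. A minimal sketch, assuming tidyr's pivot_longer() and a hypothetical two-row wide table `wide` (the spending values are copied from the first two rows above; the case rates are made up for illustration):

```r
library(tidyr)

# Hypothetical wide data: one row per date x decile x winner,
# with the two measures as separate columns (case rates are made up)
wide <- data.frame(
  day_month_year = as.Date(c("2020-01-12", "2020-01-12")),
  deciles_income = c(1L, 1L),
  winner2016 = c("Donald Trump", "Hillary Clinton"),
  mean_spend_all = c(0.00108, 0.0196),
  new_case_rate_07da = c(0.5, 1.2)
)

# Melt: each measure column becomes a (key, mean_spend_cases) pair,
# so every original row turns into two rows
long <- pivot_longer(
  wide,
  cols = c(mean_spend_all, new_case_rate_07da),
  names_to = "key",
  values_to = "mean_spend_cases"
)
print(long)
```

Each plot layer can then select its own rows, e.g. subset(long, key == "mean_spend_all").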

Ideally, the graph would look similar to this but with added lines for the case rate:

ggplot(data = group_by(afc, winner2016),
      aes(x = afc$day_month_year)) +
 geom_point(aes(color = winner2016, y = filter(afc, key == "mean_spend_all")$mean_spend_cases *100)) +
 geom_line(aes(color = winner2016, y = filter(afc, key == "new_case_rate_07da")$mean_spend_cases)) +
 facet_wrap(afc$deciles_income)+
 labs(title = "Change in spending for counties grouped by decile of income", 
      x = "Decile of a County by income", 
      y = "Change in consumer spending relative to January 14")+
 stat_smooth(aes(color = (afc$winner2016))) +
 scale_y_continuous(limits = c(-30,15))

However, I am getting the error "Aesthetics must be either length 1 or the same as the data (1188): y" which I assume is because of using filter().

This is the structure of a 30-row sample (dput() output):

structure(list(day_month_year = structure(c(18301, 18434, 18406, 
18301, 18287, 18406, 18350, 18399, 18329, 18308, 18343, 18413, 
18308, 18434, 18280, 18273, 18371, 18434, 18273, 18448, 18287, 
18434, 18350, 18343, 18427, 18273, 18399, 18273, 18294, 18427
), tzone = "Europe/Prague", class = "Date"), deciles_income = c(9L, 
5L, 4L, 6L, 8L, 8L, 2L, 10L, 8L, 2L, 1L, 4L, 8L, 2L, 7L, 6L, 
5L, 9L, 8L, 3L, 5L, 8L, 8L, 8L, 9L, 7L, 9L, 6L, 9L, 8L), winner2016 = c("Hillary Clinton", 
"Hillary Clinton", "Hillary Clinton", "Donald Trump", "Donald Trump", 
"Hillary Clinton", "Donald Trump", "Donald Trump", "Hillary Clinton", 
"Donald Trump", "Donald Trump", "Donald Trump", "Donald Trump", 
"Hillary Clinton", "Hillary Clinton", "Hillary Clinton", "Hillary Clinton", 
"Hillary Clinton", "Hillary Clinton", "Hillary Clinton", "Donald Trump", 
"Donald Trump", "Hillary Clinton", "Donald Trump", NA, "Donald Trump", 
"Donald Trump", "Donald Trump", NA, "Hillary Clinton"), key = c("new_case_rate_07da", 
"new_case_rate_07da", "mean_spend_all", "new_case_rate_07da", 
"mean_spend_all", "new_case_rate_07da", "new_case_rate_07da", 
"new_case_rate_07da", "new_case_rate_07da", "mean_spend_all", 
"mean_spend_all", "new_case_rate_07da", "mean_spend_all", "new_case_rate_07da", 
"new_case_rate_07da", "mean_spend_all", "new_case_rate_07da", 
"new_case_rate_07da", "new_case_rate_07da", "mean_spend_all", 
"mean_spend_all", "new_case_rate_07da", "new_case_rate_07da", 
"new_case_rate_07da", "mean_spend_all", "new_case_rate_07da", 
"new_case_rate_07da", "new_case_rate_07da", "mean_spend_all", 
"mean_spend_all"), mean_spend_cases = c(NA, 7.15300714285714, 
-0.0640216666666667, 0, 0.0156585338983051, 4.90477891156463, 
1.04001215805471, 4.98906868131868, NA, -0.0116506382978723, 
-0.0940805, 3.22004958592133, 0.0157676779661017, 10.4577329192547, 
NA, -0.0137643636363636, 3.87815714285714, 5.65400529100529, 
NA, 0.00507125, 0.0140480451612903, 5.29207102502018, 3.33591666666667, 
0.280013559322034, 0.0406, NA, 4.06433752775722, NA, 0.00533333333333333, 
-0.109501666666667)), row.names = c(NA, -30L), groups = structure(list(
    day_month_year = structure(c(18273, 18273, 18273, 18280, 
    18287, 18287, 18294, 18301, 18301, 18308, 18308, 18329, 18343, 
    18343, 18350, 18350, 18371, 18399, 18399, 18406, 18406, 18413, 
    18427, 18427, 18434, 18434, 18434, 18434, 18448), tzone = "Europe/Prague", class = "Date"), 
    deciles_income = c(6L, 7L, 8L, 7L, 5L, 8L, 9L, 6L, 9L, 2L, 
    8L, 8L, 1L, 8L, 2L, 8L, 5L, 9L, 10L, 4L, 8L, 4L, 8L, 9L, 
    2L, 5L, 8L, 9L, 3L), .rows = structure(list(c(16L, 28L), 
        26L, 19L, 15L, 21L, 5L, 29L, 4L, 1L, 10L, 13L, 9L, 11L, 
        24L, 7L, 23L, 17L, 27L, 8L, 3L, 6L, 12L, 30L, 25L, 14L, 
        2L, 22L, 18L, 20L), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, 29L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

How would you approach the problem?

Could you please run dput(afc[sample(nrow(afc),30),]) and paste the output into your question so that we can help you? – Duck, Commented Aug 15, 2020 at 11:55
I have added a possible sketch for your problem using the data you provided. I hope that helps! – Duck, Commented Aug 16, 2020 at 17:18

Duck · Accepted Answer · 2020-08-16 17:17:34Z

2

Could you please try the following code? There is little data, but I believe it is enough to sketch what you want:

library(ggplot2)
#Plot: points and smoother use the spending rows, the line uses the case-rate rows
#method = "gam" requires the mgcv package to be installed
ggplot()+
  geom_point(data=subset(afc,key == "mean_spend_all"),aes(x=day_month_year,
                                                          y=mean_spend_cases *100,
                                                          color = winner2016))+
  stat_smooth(data=subset(afc,key == "mean_spend_all"),
              formula = y~as.numeric(x),method = "gam",se = FALSE,
              aes(x=day_month_year,y=mean_spend_cases*100,color = winner2016))+
  geom_line(data=subset(afc,key == "new_case_rate_07da"),aes(x=day_month_year,
                                                             y=mean_spend_cases,
                                                             color = winner2016)) +
  facet_wrap(.~deciles_income,scales = 'free')+
  theme(legend.position = 'top')+ylab('')

This will produce the following output (few data points):

With more data, that should change. I was not sure what you wanted from stat_smooth(), so I have used the settings you can see. Here is how it works without facets:

ggplot()+
  geom_point(data=subset(afc,key == "mean_spend_all"),aes(x=day_month_year,
                                                          y=mean_spend_cases *100,
                                                          color = winner2016))+
  stat_smooth(data=subset(afc,key == "mean_spend_all"),
              formula = y~as.numeric(x),method = "gam",se = FALSE,
              aes(x=day_month_year,y=mean_spend_cases*100,color = winner2016))

The output:

With more data you should get the proper curves. I have used gam, but you should choose whichever smoothing method you prefer.
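A self-contained version of the same one-subset-per-layer approach, runnable without the original afc data. This is a sketch under assumptions: the toy data frame `toy` is hypothetical with random values, and method = "lm" is swapped in for gam to draw a straight line of best fit (as the question asks) without needing mgcv.

```r
library(ggplot2)

# Hypothetical toy data in the same long ("melted") layout as afc;
# the numbers are random and only stand in for the real data
set.seed(42)
toy <- expand.grid(
  day_month_year = seq(as.Date("2020-01-12"), by = "week", length.out = 8),
  deciles_income = 1:2,
  winner2016 = c("Donald Trump", "Hillary Clinton"),
  key = c("mean_spend_all", "new_case_rate_07da"),
  stringsAsFactors = FALSE
)
toy$mean_spend_cases <- ifelse(toy$key == "mean_spend_all",
                               rnorm(nrow(toy), sd = 0.05),  # spending change (fraction)
                               runif(nrow(toy), 0, 10))      # new case rate

# Each layer gets its own subset of the melted data
spend <- subset(toy, key == "mean_spend_all")
cases <- subset(toy, key == "new_case_rate_07da")

p <- ggplot() +
  geom_point(data = spend,
             aes(x = day_month_year, y = mean_spend_cases * 100, color = winner2016)) +
  # straight line of best fit (lm), one per winner and panel
  stat_smooth(data = spend,
              aes(x = day_month_year, y = mean_spend_cases * 100, color = winner2016),
              method = "lm", formula = y ~ x, se = FALSE) +
  geom_line(data = cases,
            aes(x = day_month_year, y = mean_spend_cases, color = winner2016)) +
  facet_wrap(~ deciles_income) +
  theme(legend.position = "top")

# Build the plot without drawing it, to inspect the per-layer data
built <- ggplot_build(p)
```

Because each geom receives its own `data`, every aesthetic is evaluated against a data frame of matching length, which is what avoids the "Aesthetics must be either length 1 or the same as the data" error.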

answered Aug 16, 2020 at 17:17

Duck



2 Comments

Otakar Kořínek Over a year ago

Thank you very much! Can I just ask what was the problem in my code? And what is the difference in setting the x variable inside the ggplot function and the geom_point function (in the way the computer interprets it)?

Duck Over a year ago

@OtakarKořínek Sure. Your code had trouble with the data, partly because you used $ to refer to variables. In ggplot2, aes() maps the x and y coordinates, which both your points and your lines need. Since you want to plot two different elements, it is better to define the data and coordinates separately in each geom, as you can see. This can also be done in the global ggplot() call, but then the data must be in a format where every variable can be read directly, whereas in your case each layer uses a subset. That is why your code broke down. Let me know if that is clear for you :)
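To illustrate global versus per-layer definitions: a minimal sketch, assuming a recent ggplot2, where aesthetics set in ggplot() are inherited by every layer and each layer's `data` argument is a function that receives the plot data and returns that layer's subset. The toy data frame `toy` is hypothetical and its values are made up.

```r
library(ggplot2)

# Hypothetical toy data in long format (values are made up)
toy <- data.frame(
  day_month_year = rep(as.Date("2020-01-12") + 7 * 0:3, times = 2),
  winner2016 = "Donald Trump",
  key = rep(c("mean_spend_all", "new_case_rate_07da"), each = 4),
  mean_spend_cases = c(0.01, -0.02, 0.03, 0.00, 1, 2, 4, 3)
)

# x, y and color are mapped once in ggplot() and inherited by both layers;
# each layer's `data` function receives `toy` and keeps only its own rows
p <- ggplot(toy, aes(x = day_month_year, y = mean_spend_cases, color = winner2016)) +
  geom_point(data = function(d) subset(d, key == "mean_spend_all")) +
  geom_line(data = function(d) subset(d, key == "new_case_rate_07da"))

built <- ggplot_build(p)
```

Here the mappings can stay global because each aesthetic is evaluated against the layer's own subset, so lengths always agree.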

Ben Norris · Answer · 2020-08-15 12:19:32Z

0

In your ggplot() call you define your x aesthetic as afc$day_month_year, which is every value in that column. Then in your two geom_ layers, you define y as a subset of afc$mean_spend_cases, which has a different number of points. You need to define and subset the x aesthetic in your geom_ layers as well:

library(dplyr)
library(ggplot2)

ggplot(data = group_by(afc, winner2016)) +
  geom_point(aes(x = filter(afc, key == "mean_spend_all")$day_month_year,
                 color = winner2016, 
                 y = filter(afc, key == "mean_spend_all")$mean_spend_cases *100)) +
  geom_line(aes(x = filter(afc, key == "new_case_rate_07da")$day_month_year,
                color = winner2016, 
                y = filter(afc, key == "new_case_rate_07da")$mean_spend_cases))
# ...followed by the remaining layers from your question (facet_wrap(), labs(), etc.)
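The length rule behind this error can be reproduced in isolation. A minimal sketch with a hypothetical four-row data frame `toy`: mapping `y` to a two-element filtered vector fails when the plot is built, while passing the filtered rows as the layer's own data works.

```r
library(ggplot2)

# Hypothetical four-row long table (values are made up)
toy <- data.frame(
  t = 1:4,
  key = c("a", "a", "b", "b"),
  value = c(1, 2, 3, 4)
)

# Fails: the layer data has 4 rows, but y is a filtered vector of length 2
bad <- ggplot(toy, aes(x = t)) +
  geom_point(aes(y = value[key == "a"]))

# The check runs when the plot is built (or printed), not when it is created
err <- tryCatch({
  ggplot_build(bad)
  NULL
}, error = function(e) conditionMessage(e))

# Works: the filtered rows are passed as the layer's own data,
# so every aesthetic has as many values as that data has rows
good <- ggplot() +
  geom_point(data = subset(toy, key == "a"), aes(x = t, y = value))
good_built <- ggplot_build(good)
```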

answered Aug 15, 2020 at 12:19

Ben Norris


2 Comments

Otakar Kořínek Over a year ago

That also gives the error "Aesthetics must be either length 1 or the same as the data (1188): x and y"

Ben Norris Over a year ago

Yes. I can see why. You need to filter the color aesthetic as well so that it has the same length. Duck's answer fixes that.
