I am trying to figure out how to return multiple columns alongside a nested aggregate (the max of a sum) in SQL.

Based on data from CDC's Serotypes of concern: Illnesses and Outbreaks, I want to know what food caused the most illnesses for each year, so each row would look like: the year, the food category, and the total number of illnesses, which is basically max(sum(No_of_illnesses)). Sample data (found in CDC link above):

table_id Food_category Year_first_ill Serotype No_of_illnesses No_of_outbreak Pathogen Yr Year_range Running_total_by_year_range
Pork_Adelaide_2011-2015 Pork 2011 Adelaide 0 0 Salmonella 2020 2011-2015 0
Pork_Adelaide_2011-2015 Pork 2012 Adelaide 0 0 Salmonella 2020 2011-2015 0
... ... ... ... ... ... ... ... ... ...
Chicken_Anatum_2011-2015 Chicken 2011 Anatum 0 0 Salmonella 2020 2011-2015 0

In the end, what I'd like returned is all three columns for the max(Total_Illnesses) grouped by year with its corresponding food category, so the result would look partially like this:

Year Food Total_Illnesses
2011 Chicken 545
2012 Chicken 544
... ... ...
2022 Beef 384
2023 Chicken 113

To do that, I wrote a sub-query that sums the number of illnesses by food category for each year, and then tried to find the max of those sums. The two suggestions I've read online are a) grouping by both columns and b) using a window function. My two attempts:

select Year_first_ill, Food_category, max(total)
from (select Year_first_ill, Food_category, sum(No_of_illnesses) as total
      from salmonella
      group by Food_category, Year_first_ill) s
group by Food_category, Year_first_ill

which doesn't return the max; it essentially returns the sums from the sub-query, just in a different, odd order (I don't know if this is relevant, but the rows are grouped by food for 2011-20 and by year for the last 3 years):

Year Food Total_Illnesses
2011 Pork 238
2012 Pork 14
... ... ...
2011 Chicken 545
2012 Chicken 544
... ... ...
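
A likely reason the outer max() changes nothing: the outer query groups by the same two columns as the sub-query, so every group holds exactly one row and max(total) is simply that row's total. The sketch below illustrates this by grouping on the year alone, which does yield the per-year maximum but loses the food. The table salmonella_demo and all of its counts are invented for illustration and are not taken from the CDC data.

```sql
-- Hypothetical demo table; rows and counts are made up for illustration.
DROP TABLE IF EXISTS salmonella_demo;
CREATE TABLE salmonella_demo (
    Food_category   VARCHAR(20),
    Year_first_ill  INT,
    Serotype        VARCHAR(20),
    No_of_illnesses INT
);

INSERT INTO salmonella_demo VALUES
    ('Chicken', 2011, 'Adelaide', 30),
    ('Chicken', 2011, 'Anatum',   20),
    ('Pork',    2011, 'Adelaide', 10),
    ('Pork',    2011, 'Anatum',    5),
    ('Chicken', 2012, 'Adelaide',  8),
    ('Beef',    2012, 'Adelaide', 25),
    ('Beef',    2012, 'Anatum',   15);

-- Grouping the derived table by year only: each group now contains
-- several food totals, so MAX() has something to compare.
SELECT Year_first_ill, MAX(total) AS max_total
FROM (
    SELECT Year_first_ill, Food_category, SUM(No_of_illnesses) AS total
    FROM salmonella_demo
    GROUP BY Year_first_ill, Food_category
) AS s
GROUP BY Year_first_ill
ORDER BY Year_first_ill;
-- With the demo rows above: (2011, 50) and (2012, 40)
```

Adding Food_category to that select list is normally rejected under MySQL 8's default ONLY_FULL_GROUP_BY mode, because the column is neither grouped nor aggregated. This is why a plain GROUP BY alone cannot return all three columns.
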

select Year_first_ill, Food_category, MAX(total) OVER (PARTITION BY Year_first_ill)
from (select Year_first_ill, Food_category, sum(No_of_illnesses) as total
      from salmonella
      group by Food_category, Year_first_ill) s

which returns the correct max for each year, but repeated for each food:

Year Food Total_Illnesses
2011 Turkey 545
2011 Pork 545
2011 Beef 545
2011 Chicken 545
... ... ...
2023 Pork 113
2023 Chicken 113
2023 Beef 113

I am unable to correctly return all three columns. So, how can I return multiple columns that correspond to nested aggregate functions in SQL?

Note: I am using DB Fiddle, MySQL v8. Link to code.
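
For reference, one common way to get all three columns is to rank the per-food totals within each year and keep only the top-ranked rows. The sketch below assumes MySQL 8 (for CTEs and window functions) and uses a hypothetical table salmonella_demo with invented counts. The names totals, ranked and rnk are illustrative only.

```sql
-- Hypothetical demo table; rows and counts are made up for illustration.
DROP TABLE IF EXISTS salmonella_demo;
CREATE TABLE salmonella_demo (
    Food_category   VARCHAR(20),
    Year_first_ill  INT,
    Serotype        VARCHAR(20),
    No_of_illnesses INT
);

INSERT INTO salmonella_demo VALUES
    ('Chicken', 2011, 'Adelaide', 30),
    ('Chicken', 2011, 'Anatum',   20),
    ('Pork',    2011, 'Adelaide', 10),
    ('Pork',    2011, 'Anatum',    5),
    ('Chicken', 2012, 'Adelaide',  8),
    ('Beef',    2012, 'Adelaide', 25),
    ('Beef',    2012, 'Anatum',   15);

WITH totals AS (
    -- Step 1: total illnesses per year and food category.
    SELECT Year_first_ill, Food_category, SUM(No_of_illnesses) AS total
    FROM salmonella_demo
    GROUP BY Year_first_ill, Food_category
),
ranked AS (
    -- Step 2: rank the foods within each year, largest total first.
    SELECT Year_first_ill, Food_category, total,
           RANK() OVER (PARTITION BY Year_first_ill ORDER BY total DESC) AS rnk
    FROM totals
)
-- Step 3: keep only the top-ranked food(s) for each year.
SELECT Year_first_ill  AS Year,
       Food_category   AS Food,
       total           AS Total_Illnesses
FROM ranked
WHERE rnk = 1
ORDER BY Year_first_ill;
-- With the demo rows above: (2011, Chicken, 50) and (2012, Beef, 40)
```

RANK() returns every food tied for first place in a year. Swapping in ROW_NUMBER() would instead keep exactly one row per year, chosen arbitrarily among ties unless the ORDER BY adds a tie-breaker.
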

  • Please provide a minimal reproducible example with sample data, desired results and your attempt, including a link to your DB Fiddle. Note that sample data is not the entire external dataset you linked to; it's a small representation of the data, contained within your question. Commented Oct 5 at 6:25
  • Changed duplicate link from "max of sum" to "greatest-n-per-group" (how to pick one row per year) Commented Oct 5 at 10:52
  • 1
    Changed the duplicate back as this is the exact duplicate - same questuon with different data. Commented Oct 5 at 12:52
  • @shadow The answers to that question are truly awful. The accepted answer won't even run in MySQL 8, and none make use of MySQL 8's analytic functions (whereas my alternative has both styles). By all means link to a different greatest-n-per-group question, but not the one you chose. Commented Oct 5 at 13:14
  • 2
    You want select year_first_ill, food_category, total from (select year_first_ill, food_category, sum(no_of_illnesses) as total, max(sum(no_of_illnesses)) over (partition by year_first_ill) as max_total from salmonella group by food_category, year_first_ill) s where total = max_total order by year_first_ill; Commented Oct 6 at 14:54
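
Laid out over several lines, the query from the last comment reads as below. It computes each food's total and the year's largest total side by side, then keeps the rows where the two match. This is a sketch run against a hypothetical table salmonella_demo whose counts are invented. It assumes MySQL 8, where a window function may wrap an aggregate in a grouped query.

```sql
-- Hypothetical demo table; rows and counts are made up for illustration.
DROP TABLE IF EXISTS salmonella_demo;
CREATE TABLE salmonella_demo (
    Food_category   VARCHAR(20),
    Year_first_ill  INT,
    Serotype        VARCHAR(20),
    No_of_illnesses INT
);

INSERT INTO salmonella_demo VALUES
    ('Chicken', 2011, 'Adelaide', 30),
    ('Chicken', 2011, 'Anatum',   20),
    ('Pork',    2011, 'Adelaide', 10),
    ('Pork',    2011, 'Anatum',    5),
    ('Chicken', 2012, 'Adelaide',  8),
    ('Beef',    2012, 'Adelaide', 25),
    ('Beef',    2012, 'Anatum',   15);

SELECT Year_first_ill, Food_category, total
FROM (
    SELECT Year_first_ill,
           Food_category,
           -- Total for this (year, food) group.
           SUM(No_of_illnesses) AS total,
           -- Largest of those totals within the same year; the window
           -- function is evaluated after the GROUP BY.
           MAX(SUM(No_of_illnesses)) OVER (PARTITION BY Year_first_ill) AS max_total
    FROM salmonella_demo
    GROUP BY Food_category, Year_first_ill
) AS s
WHERE total = max_total
ORDER BY Year_first_ill;
-- With the demo rows above: (2011, Chicken, 50) and (2012, Beef, 40)
```

If two foods tie for the highest total in a year, both rows are returned.
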
