1

I have a database with:

[business] must have a [location]
[location] may have many [business]
[location] may have many [postcode]
[postcode] may be for many [location]
[postcode] must have a [state]

I want a query to return the top 10 locations for each state in terms of the business count.

To make more sense, it may come out:

Western Australia
  - perth
  - bunbury
  - etc. up to 10

(i.e. perth has the most businesses listed in Western Australia)

Victoria
  - melbourne
  - st kilda
  - etc. up to 10

Etc.

I was hoping to achieve this without using UNION. I haven't done complex SQL for a while and it's hurting my head.

I am using MySQL.

A similar way to put this would be if you had categories, products and orders and wanted the the top ten products in terms of order count for each category.

4
  • 1
    You're going to need to include your table definitions if you want people to help you craft a MySQL query Commented Jan 13, 2011 at 3:10
  • If you use the categories, products and orders example at the end you could imagine a standard table definition. It's just as much a conceptual question about how to achieve this. I'm happy to write the query I just have no idea how to structure it without using union. Commented Jan 13, 2011 at 3:15
  • Tagged with greatest-n-per-group to guide you to similar questions -- I saw someone else do this (and took the exact comment! :-). Anyway, this would done with SQL Server with ROW_NUMBER(...PARTITION ON...) so it's a matter of finding a (nice) MySQL equivalent, if it exits. Commented Jan 13, 2011 at 5:01
  • This question perked my interest, so I started looking at it ... best explanation/example I could find was explainextended.com/2009/03/05/row-sampling. It uses some messy 'side-effecting' variables but the article states it runs in Omega(n) so it should be fine. This is similar as cyberkiwi proposed below. Commented Jan 13, 2011 at 5:06

1 Answer 1

1

This query should work.

select * from (
    select a.state, a.location_id, C,
        @n:=case when @s=a.state then @n+1 else 1 end counter,
        @s:=a.state
    from (select @s:=null) b, (
        select pc.state, b.location_id, COUNT(b.location_id) C
        from postcode pc
        inner join location_postcode lp on lp.postcode_id=pc.id
        inner join business b on b.location_id=lp.location_id
        group by pc.state, b.location_id
        order by pc.state, C desc
    ) a
) c
where counter <= 10

I used field names that should be easy to follow, assuming these tables exist with stated relationships:

business M-1 location
location M-M postcode
    (expands to) => location 1-M location_postcode M-1 postcode
postcode M-1 state

The query has beentested with this data:

create table business (location_id int);
insert into business select floor(rand()*10);
insert into business select floor(rand()*10) from business;
insert into business select floor(rand()*10) from business;
insert into business select floor(rand()*10) from business;
insert into business select floor(rand()*10) from business;
insert into business select floor(rand()*10) from business;
insert into business select floor(rand()*10) from business;
insert into business select floor(rand()*10) from business;
insert into business select floor(rand()*10) from business;
insert into business select floor(rand()*10) from business;
create table location_postcode (location_id int, postcode_id int);
insert into location_postcode select 1,1;
insert into location_postcode select 2,1;
insert into location_postcode select 3,1;
insert into location_postcode select 4,2;
insert into location_postcode select 5,1;
insert into location_postcode select 5,2;
insert into location_postcode select 6,1;
insert into location_postcode select 6,3;
insert into location_postcode select 7,1;
insert into location_postcode select 7,4;
insert into location_postcode select 8,5;
insert into location_postcode select 9,6;
insert into location_postcode select 10,7;
create table postcode (id int, state int);
insert into postcode select 1,1;
insert into postcode select 2,2;
insert into postcode select 3,3;
insert into postcode select 4,4;
insert into postcode select 5,4;
insert into postcode select 6,5;
insert into postcode select 7,5;

Which doesn't quite create enough records for a "top 10" from each, but you will see how the COUNTER column is ranking correctly. To see it working against this small data set, first leave this filter in there

where counter <= 10

to check the COUNTER column, then reduce it to something like 2 or 3, to show only the top 2 or 3 per state.

Sign up to request clarification or add additional context in comments.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.