19

I am writing a SQL query using PostgreSQL that needs to rank people that "arrive" at some location. Not everyone arrives however. I am using a rank() window function to generate arrival ranks, but in the places where the arrival time is null, rather than returning a null rank, the rank() aggregate function just treats them as if they arrived after everyone else. What I want to happen is that these no-shows get a rank of NULL instead of this imputed rank.

Here is an example. Suppose I have a table dinner_show_up that looks like this:

 | Person | arrival_time | Restaurant |
 +--------+--------------+------------+
 | Dave   |     7        | in_and_out |
 | Mike   |     2        | in_and_out | 
 | Bob    |   NULL       | in_and_out | 

Bob never shows up. The query I'm writing would be:

select Person, 
       rank() over (partition by Restaurant order by arrival_time asc) 
               as arrival_rank
from dinner_show_up; 

And the result will be

 | Person | arrival_rank | 
 +--------+--------------+
 | Dave   |     2        | 
 | Mike   |     1        | 
 | Bob    |     3        |  

What I want to happen instead is this:

 | Person | arrival_rank | 
 +--------+--------------+
 | Dave   |     2        | 
 | Mike   |     1        | 
 | Bob    |     NULL     |  

4 Answers 4

22

Just use a case statement around the rank():

select Person, 
       (case when arrival_time is not null
             then rank() over (partition by Restaurant order by arrival_time asc) 
        end) as arrival_rank
from dinner_show_up; 
Sign up to request clarification or add additional context in comments.

2 Comments

+1 Interestingly, in SQL Server rank will process nulls before not nulls and rank will have gap, looks like postgres processing nulls after not nulls or don't count them at all?
@RomanPekar You can define whether or not you want NULLs first or last in the ORDER BY of the RANK() function (or regular queries of course) -> postgresql.org/docs/current/static/indexes-ordering.html
14

A more general solution for all aggregate functions, not only rank(), is to partition by 'arrival_time is not null' in the over() clause. That will cause all null arrival_time rows to be placed into the same group and given the same rank, leaving the non-null rows to be ranked relative only to each other.

For the sake of a meaningful example, I mocked up a CTE having more rows than the intial problem set. Please forgive the wide rows, but I think they better contrast the differing techniques.

with dinner_show_up("person", "arrival_time", "restaurant") as (values
   ('Dave' ,    7, 'in_and_out')
  ,('Mike' ,    2, 'in_and_out')
  ,('Bob'  , null, 'in_and_out')
  ,('Peter',    3, 'in_and_out')
  ,('Jane' , null, 'in_and_out')
  ,('Merry',    5, 'in_and_out')
  ,('Sam'  ,    5, 'in_and_out')
  ,('Pip'  ,    9, 'in_and_out')
)

select 
   person
  ,case when arrival_time is not null then         rank() over (                                      order by arrival_time) end as arrival_rank_without_partition
  ,case when arrival_time is not null then         rank() over (partition by arrival_time is not null order by arrival_time) end as arrival_rank_with_partition
  ,case when arrival_time is not null then percent_rank() over (                                      order by arrival_time) end as arrival_pctrank_without_partition
  ,case when arrival_time is not null then percent_rank() over (partition by arrival_time is not null order by arrival_time) end as arrival_pctrank_with_partition
from dinner_show_up

This query gives the same results for arrival_rank_with/without_partition. However, the results for percent_rank() do differ: without_partition is wrong, ranging from 0% to 71.4%, whereas with_partition correctly gives pctrank() ranging from 0% to 100%.

This same pattern applies to the ntile() aggregate function, as well.

It works by separating all null values from non-null values for purposes of the ranking. This ensures that Jane and Bob are excluded from the percentile ranking of 0% to 100%.

 |person|arrival_rank_without_partition|arrival_rank_with_partition|arrival_pctrank_without_partition|arrival_pctrank_with_partition|
 +------+------------------------------+---------------------------+---------------------------------+------------------------------+
 |Jane  |null                          |null                       |null                             |null                          |
 |Bob   |null                          |null                       |null                             |null                          |
 |Mike  |1                             |1                          |0                                |0                             |
 |Peter |2                             |2                          |0.14                             |0.2                           |
 |Sam   |3                             |3                          |0.28                             |0.4                           |
 |Merry |4                             |4                          |0.28                             |0.4                           |
 |Dave  |5                             |5                          |0.57                             |0.8                           |
 |Pip   |6                             |6                          |0.71                             |1.0                           |

1 Comment

Very good and detailed answer. Especially like the inclusion of how to get the percent rank correct. Thanks!
0
select Person, 
   rank() over (partition by Restaurant order by arrival_time asc) 
           as arrival_rank
from dinner_show_up
where arrival_time is not null
union 
select Person,NULL as arrival_rank
from dinner_show_up
where arrival_time is null;

1 Comment

First - you need to change union or union all. Otherwise postgres will waste alot of time on serching for duplicates. Secong - this query involves 2 table scans - it will still be ineficient.
0

if using Snowflake SQL (untested for other SQL), you can embed an "iff" condition in the window function order to effectively remove or de-prioritize certain scenarios

select person
   , rank() over (
        partition by restaurant 
        order by iff(arrival_time is null, 0, arrival_time) asc 
        ) as arrival_rank
from dinner_show_up

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.