Returning MIN and MAX values and ignoring nulls - populate null values with preceding non-null value

Question

Using a table of events, I need to return the date and type for:

the first event
the most recent (non-null) event

The most recent event could have null values; in that case, the query needs to return the most recent non-null values instead.

I found a few articles and posts here on SO that are similar (maybe even identical), but I am not able to decode or understand the solutions, e.g.:

Fill null values with last non-null amount - Oracle SQL

https://www.itprotoday.com/sql-server/last-non-null-puzzle

https://koukia.ca/common-sql-problems-filling-null-values-with-preceding-non-null-values-ad538c9e62a6

The table is as follows. There are additional columns, but I am only including 3 for the sake of simplicity. Also note that the first Type and Date could be null, in which case returning null is desired.

╔═══════╦════════╦════════════╗
║ Email ║  Type  ║    Date    ║
╠═══════╬════════╬════════════╣
║ A     ║ Create ║ 2019-04-01 ║
║ A     ║ Update ║ 2019-04-02 ║
║ A     ║ null   ║ null       ║
╚═══════╩════════╩════════════╝

The output should be:

╔═══════╦═══════════╦════════════╦══════════╦════════════╗
║ Email ║ FirstType ║ FirstDate  ║ LastType ║  LastDate  ║
╠═══════╬═══════════╬════════════╬══════════╬════════════╣
║ A     ║ Create    ║ 2019-04-01 ║ Update   ║ 2019-04-02 ║
╚═══════╩═══════════╩════════════╩══════════╩════════════╝

The first method I tried was to join the table to itself using a subquery that finds the MIN and MAX dates using case statements:

select
  T1.Email,
  max(case when T1.Date = T2.Min_Date then T1.Type end) as FirstType,
  max(case when T1.Date = T2.Min_Date then T1.Date end) as FirstDate,
  max(case when T1.Date = T2.Max_Date then T1.Type end) as LastType,
  max(case when T1.Date = T2.Max_Date then T1.Date end) as LastDate
from
  T1
join
  (select
    Email,
    max(Date) as Max_Date,
    min(Date) as Min_Date
  from
    T1
  group by
    Email
  ) T2
on
  T1.Email = T2.Email
group by
  T1.Email

This seemed to work for the MIN values, but the MAX values would return null.

To solve the problem of returning the last non-null value, I attempted this:

select
   Email,
   max(Date) over (partition by Email rows unbounded preceding) as LastDate,
   max(Type) over (partition by Email rows unbounded preceding) as LastType
from
   T1
group by
   Email,
   Date,
   Type
However, this gives a result of 3 rows, instead of 1.

I'll admit I don't quite understand analytic functions since I have not had to deal with them at length. Any help would be greatly appreciated.

Edit: The example above is an accurate representation of what the data could look like; however, the example below is the exact sample data that I am using.

Sample:

╔═══════╦════════╦════════════╗
║ Email ║  Type  ║    Date    ║
╠═══════╬════════╬════════════╣
║ A     ║ Create ║ 2019-04-01 ║
║ A     ║ null   ║ null       ║
╚═══════╩════════╩════════════╝

Desired Outcome:

╔═══════╦═══════════╦════════════╦══════════╦════════════╗
║ Email ║ FirstType ║ FirstDate  ║ LastType ║  LastDate  ║
╠═══════╬═══════════╬════════════╬══════════╬════════════╣
║ A     ║ Create    ║ 2019-04-01 ║ Create   ║ 2019-04-01 ║
╚═══════╩═══════════╩════════════╩══════════╩════════════╝

Additional Use-Case:

╔═══════╦════════╦════════════╗
║ Email ║  Type  ║    Date    ║
╠═══════╬════════╬════════════╣
║ A     ║ null   ║ null       ║
║ A     ║ Create ║ 2019-04-01 ║
╚═══════╩════════╩════════════╝

Desired Outcome:

╔═══════╦═══════════╦════════════╦══════════╦════════════╗
║ Email ║ FirstType ║ FirstDate  ║ LastType ║  LastDate  ║
╠═══════╬═══════════╬════════════╬══════════╬════════════╣
║ A     ║ null      ║ null       ║ Create   ║ 2019-04-01 ║
╚═══════╩═══════════╩════════════╩══════════╩════════════╝

Gordon Linoff · Accepted Answer · 2019-04-29 20:45:11Z

0

Use window functions and conditional aggregation:

select t.email,
       max(case when seqnum = 1 then type end) as first_type,
       max(case when seqnum = 1 then date end) as first_date,
       -- seqnum_nonull restarts for the null and non-null groups, so also require a non-null type
       max(case when seqnum_nonull = 1 and type is not null then type end) as last_type,
       max(case when seqnum_nonull = 1 and type is not null then date end) as last_date
from (select t.*,
             -- 1 = earliest row per email
             row_number() over (partition by email order by date) as seqnum,
             -- 1 = latest row within each null / non-null group, hence the descending order
             row_number() over (partition by email, (case when type is null then 1 else 2 end) order by date desc) as seqnum_nonull
      from t
     ) t
group by t.email;
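
Note that the date column alone cannot tell a leading null row from a trailing one, because a null date has no position in time. The following is only a sketch of the same technique under an assumption that is not in the question: a hypothetical column `seq` that records the order in which events arrived. The inline `VALUES` data is also made up for illustration and mirrors the "Additional Use-Case" sample.

```sql
-- Sketch only: `seq` is a hypothetical arrival-order column, not part of the original table.
WITH t AS (
  SELECT * FROM VALUES
    (1, 'A', CAST(NULL AS STRING), CAST(NULL AS DATE)),
    (2, 'A', 'Create', DATE '2019-04-01')
  AS v(seq, email, type, date)
)
SELECT email,
       MAX(CASE WHEN rn_first = 1 THEN type END) AS first_type,
       MAX(CASE WHEN rn_first = 1 THEN date END) AS first_date,
       -- rn_last restarts per null / non-null group, so keep only the non-null group
       MAX(CASE WHEN rn_last = 1 AND type IS NOT NULL THEN type END) AS last_type,
       MAX(CASE WHEN rn_last = 1 AND type IS NOT NULL THEN date END) AS last_date
FROM (
  SELECT t.*,
         -- first event per email, null or not, by arrival order
         ROW_NUMBER() OVER (PARTITION BY email ORDER BY seq) AS rn_first,
         -- latest event within the null group and within the non-null group
         ROW_NUMBER() OVER (PARTITION BY email,
                                         CASE WHEN type IS NULL THEN 1 ELSE 2 END
                            ORDER BY seq DESC) AS rn_last
  FROM t
) s
GROUP BY email;
```

With this data the first event stays null and the last event is the Create row. Swapping the two `seq` values (Create first, null second) returns the Create row for both the first and last events.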

edited Apr 29, 2019 at 20:45

answered Apr 26, 2019 at 0:38

Gordon Linoff


3 Comments

A.Coustic Over a year ago

Thanks for the reply - this code returns an error in Databricks that I can't debug. Error in SQL statement: ParseException: mismatched input 'from' expecting <EOF>(line 6, pos 0)

Gordon Linoff Over a year ago

@A.Coustic . . . There was a typo in the definition of seqnum_nonull that might've caused your problem.

A.Coustic Over a year ago

The typo fix did solve the problem, however the output isn't exactly as desired. The sample dataset I am using only has 2 records: the first is non-null and the second has null values. The outcome returns the null value as the first event, and the non-null as the second event (I think this is flipped?). The desired outcome should return the first record for both the first and last events. Also I believe there is a typo in the last case statement where it is mixing type and date.

wBob · Accepted Answer · 2019-04-29 21:05:31Z

0

As Spark SQL window functions support the NULLS LAST|FIRST syntax, you could use that and then specify a pivot with multiple aggregates for rn values 1 and 2. I could do with seeing some more sample data, but this works for your dataset:

%sql
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp;

;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp
)
SELECT *
FROM cte
PIVOT ( MAX(date), MAX(type) FOR rn In ( 1, 2 ) )

Rename the columns by supplying your required parts in the query, eg

-- Pivot and rename columns
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp
)
SELECT *
FROM cte
PIVOT ( MAX(date) AS Date, MAX(type) AS Type FOR rn In ( 1 First, 2 Last ) )

Alternately supply a column list, eg

-- Pivot and rename columns
;WITH cte AS
(
SELECT *, ROW_NUMBER() OVER( PARTITION BY email ORDER BY date NULLS LAST ) rn
FROM tmp
), cte2 AS
(
SELECT *
FROM cte
PIVOT ( MAX(date) AS Date, MAX(type) AS Type FOR rn In ( 1 First, 2 Last ) )
) 
SELECT *
FROM cte2 AS (Email, FirstDate, FirstType, LastDate, LastType)

This simple query uses ROW_NUMBER to assign a row number to the dataset ordered by the date column, but using the NULLS LAST syntax to ensure null rows appear last in the numbering. The PIVOT then converts the rows to columns.
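
To see the numbering in isolation, here is a minimal sketch on the question's first sample. The inline `VALUES` data and the column names `rn_first` and `rn_last` are assumptions for illustration. The second row number, ordered descending, is not part of the queries above; it is included because pivoting on rn values 1 and 2 treats the second row as the last one, which, as far as I can tell, only holds when there are exactly two non-null rows per email.

```sql
-- Sketch only: inline sample data standing in for the `tmp` table.
WITH tmp AS (
  SELECT * FROM VALUES
    ('A', 'Create', DATE '2019-04-01'),
    ('A', 'Update', DATE '2019-04-02'),
    ('A', CAST(NULL AS STRING), CAST(NULL AS DATE))
  AS v(email, type, date)
)
SELECT email, type, date,
       -- earliest dated row gets 1; null dates are numbered last
       ROW_NUMBER() OVER (PARTITION BY email ORDER BY date ASC NULLS LAST) AS rn_first,
       -- latest dated row gets 1; null dates are numbered last here too
       ROW_NUMBER() OVER (PARTITION BY email ORDER BY date DESC NULLS LAST) AS rn_last
FROM tmp;
-- Create row: rn_first = 1, rn_last = 2
-- Update row: rn_first = 2, rn_last = 1
-- null row:   rn_first = 3, rn_last = 3
```

Filtering on `rn_last = 1` picks the most recent non-null row however many rows an email has, so it may be a safer way to select the last event than a fixed rn value of 2.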

edited Apr 29, 2019 at 21:05

answered Apr 27, 2019 at 23:08

wBob


5 Comments

A.Coustic Over a year ago

Thanks, this seems to work. How can I rename the output columns? Can you also briefly explain what is going on here?

wBob Over a year ago

Any update @A.Coustic? Consider marking this as the answer if it works and answers your question.

A.Coustic Over a year ago

It looks like the output isn't exactly as desired. The sample dataset I am using only has 2 records, the latter being the null record (in retrospect, I should have used this sample for my post). It is returning the non-null values for the first event and the null values for the last event. The desired output should return the first record for both the first and last events, as null is only valid if it is the first record. I can't say what the output is using the sample data originally provided in the post.

wBob Over a year ago

Please provide some accurate sample data and expected results.

A.Coustic Over a year ago

Please see my edited post. Both examples are use-cases so I am not sure if that affects the code.
