I've got some data in a Postgres table that looks like:

Name | Date      | Balance
--------------------------
A    |2020-01-01 |    1
B    |2020-01-01 |    0
B    |2020-01-02 |    2
A    |2020-01-03 |    5

(note that A is missing a value for 2020-01-02 and B for 2020-01-03)

I'd like to fill in each missing date with the most recent balance for that name. In other words, I'd like:

Name | Date      | Balance
--------------------------
A    |2020-01-01 |    1
B    |2020-01-01 |    0
A    |2020-01-02 |    1 <--- filled in with previous balance
B    |2020-01-02 |    2
A    |2020-01-03 |    5
B    |2020-01-03 |    2 <--- filled in with previous balance

Note that in reality, several dates may be missing in a row, in which case the most recent one for that name should always be selected.

1 Answer

I am thinking generate_series() and window functions:

select 
    n.name, 
    s.date, 
    coalesce(t.balance, lag(t.balance) over(partition by n.name order by s.date)) balance
from (select generate_series(min(date), max(date), interval '1 day') date from mytable) s
cross join (select distinct name from mytable) n
left join mytable t on t.name = n.name and t.date = s.date
order by n.name, s.date

If you may have several missing dates in a row, then a little more logic is needed - this basically emulates lag() with the ignore nulls option:

select
    name,
    date,
    coalesce(balance, first_value(balance) over(partition by name, grp order by date)) balance
from (
    select 
        n.name, 
        s.date, 
        t.balance,
        sum( (t.balance is not null)::int ) over(partition by n.name order by s.date) grp
    from (select generate_series(min(date), max(date), interval '1 day') date from mytable) s
    cross join (select distinct name from mytable) n
    left join mytable t on t.name = n.name and t.date = s.date
) t
order by name, date
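
To see why the grp column works, it may help to run the inner query on its own. The sketch below is only an illustration: it rebuilds the four rows from the question in a CTE named mytable (so it does not need the real table), and it adds a ::date cast to generate_series(), which is my addition to keep the output as plain dates.

```sql
-- Sample rows copied from the question; the CTE stands in for the real table.
with mytable(name, date, balance) as (
    values
        ('A', date '2020-01-01', 1),
        ('B', date '2020-01-01', 0),
        ('B', date '2020-01-02', 2),
        ('A', date '2020-01-03', 5)
)
select
    n.name,
    s.date,
    t.balance,
    -- running count of non-null balances: it only increases on rows that
    -- have a real value, so every gap row shares the grp of the last real row
    sum((t.balance is not null)::int) over(partition by n.name order by s.date) as grp
from (select generate_series(min(date), max(date), interval '1 day')::date as date from mytable) s
cross join (select distinct name from mytable) n
left join mytable t on t.name = n.name and t.date = s.date
order by n.name, s.date;

-- name | date       | balance | grp
-- A    | 2020-01-01 |       1 |   1
-- A    | 2020-01-02 |  (null) |   1
-- A    | 2020-01-03 |       5 |   2
-- B    | 2020-01-01 |       0 |   1
-- B    | 2020-01-02 |       2 |   2
-- B    | 2020-01-03 |  (null) |   2
```

Each (name, grp) pair contains exactly one non-null balance followed by its gap rows, which is why the outer query can take the first value of the group. A name whose first row is missing gets grp 0 and stays null, since there is no earlier balance to carry forward.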

2 Comments

This is great, thanks! One last question: what if I wanted to filter to dates on or after a certain date, but still have the coalesced balance include the most recent value (even if that value is from before the given date)?
@Octodone: you would need to turn the query into a subquery, and do the date filtering in the outer query.
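
A minimal sketch of that suggestion, assuming the mytable(name, date, balance) layout from the question. The cutoff date 2020-01-02 and the subquery aliases g and filled are illustrative choices, not from the original thread, and the ::date cast is added so the filter compares dates with dates.

```sql
select name, date, balance
from (
    -- gap-filled result, computed over the full date range
    select
        name,
        date,
        coalesce(balance, first_value(balance) over(partition by name, grp order by date)) as balance
    from (
        select
            n.name,
            s.date,
            t.balance,
            sum((t.balance is not null)::int) over(partition by n.name order by s.date) as grp
        from (select generate_series(min(date), max(date), interval '1 day')::date as date from mytable) s
        cross join (select distinct name from mytable) n
        left join mytable t on t.name = n.name and t.date = s.date
    ) g
) filled
-- the filter runs after the window functions, so rows before the cutoff
-- still contribute their balance to later gap rows
where date >= date '2020-01-02'
order by name, date;
```

Putting the condition in the inner query instead would drop the earlier rows before the window functions run, so a gap right after the cutoff would have nothing to carry forward.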
