
I have a database with time-related information. I want a list with a value for every minute, like this:

12:00:00  3
12:01:00  4
12:02:00  5
12:03:00  5
12:04:00  5
12:05:00  3

But when there is no data for some minutes, I get a result like this:

12:00:00  3
12:01:00  4
12:02:00  5
12:03:00  NULL
12:04:00  NULL
12:05:00  3

I want to fill each NULL value with the previous non-NULL value.

This query generates a time series with one row per minute, then left joins it to the data in my database.

I read something about using window functions to fill NULL values with the previous non-NULL value, but I can't figure out how to implement this in my query. Can someone point me in the right direction?

I tried the solution from PostgreSQL use value from previous row if missing, but the NULL values are still there.

This is my query:

SELECT
    date,
    close
FROM generate_series(
  '2017-11-01 09:00'::timestamp,
  '2017-11-01 23:59'::timestamp,
  '1 minute') AS date
LEFT OUTER JOIN
 (SELECT
    date_trunc('minute', market_summary."timestamp") as day,
    LAST(current, timestamp) AS close
    FROM market_summary
  WHERE created_at >= '2017-11-01 09:00'
    AND created_at < '2017-11-01 23:59'
    GROUP BY day
 ) results
ON (date = results.day)
ORDER BY date

4 Answers


I found the following method easier:

Create the given data sample:

WITH example (date,close) AS 
(VALUES 
    ('12:00:00',3),
    ('12:01:00',4),
    ('12:02:00',5),
    ('12:03:00',NULL),
    ('12:04:00',NULL), 
    ('12:05:00',3)
) 
SELECT * INTO TEMPORARY TABLE market_summary FROM example;

Query to fill each NULL with the previous non-NULL value:

select 
    date, 
    close, 
    first_value(close) over (partition by grp_close order by date) as corrected_close
from (
      select date, close,
             sum(case when close is not null then 1 end) over (order by date) as grp_close
      from   market_summary
) t
order by date

Returns:

date      | close | corrected_close
-----------------------------------
12:00:00  | 3     | 3
12:01:00  | 4     | 4
12:02:00  | 5     | 5
12:03:00  | NULL  | 5
12:04:00  | NULL  | 5
12:05:00  | 3     | 3
  • close: existing value
  • corrected_close: corrected value

1 Comment

I found that you may need to ORDER BY date in the first_value clause as well, otherwise you may still get NULLs: first_value(close) over (partition by grp_close order by date) as corrected_close
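The grouped first_value technique above (with the ORDER BY from the comment applied) can be checked end-to-end. This is a sketch using Python's sqlite3 rather than PostgreSQL, since SQLite also supports these window functions (3.25+); the table name and sample values follow the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE market_summary (date TEXT, close INTEGER);
INSERT INTO market_summary VALUES
    ('12:00:00', 3), ('12:01:00', 4), ('12:02:00', 5),
    ('12:03:00', NULL), ('12:04:00', NULL), ('12:05:00', 3);
""")

# The running count of non-NULL closes assigns each NULL row to the
# group of the last non-NULL row before it; first_value then picks
# that row's close for the whole group.
rows = conn.execute("""
    SELECT date, close,
           FIRST_VALUE(close) OVER (PARTITION BY grp_close ORDER BY date)
               AS corrected_close
    FROM (
        SELECT date, close,
               SUM(CASE WHEN close IS NOT NULL THEN 1 END)
                   OVER (ORDER BY date) AS grp_close
        FROM market_summary
    )
    ORDER BY date
""").fetchall()
```

rows then matches the corrected_close column of the table above, with the two NULL minutes filled with 5.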

I found a solution on the page: http://www.postgresql-archive.org/lag-until-you-get-something-OVER-window-function-td5824644.html

CREATE OR REPLACE FUNCTION GapFillInternal( 
    s anyelement, 
    v anyelement) RETURNS anyelement AS 
$$ 
BEGIN 
  RETURN COALESCE(v,s); 
END; 
$$ LANGUAGE PLPGSQL IMMUTABLE; 

CREATE AGGREGATE GapFill(anyelement) ( 
  SFUNC=GapFillInternal, 
  STYPE=anyelement 
); 

postgres=# select id, natural_key, gapfill(somebody) OVER (ORDER BY 
natural_key, id) from lag_test; 
 id │ natural_key │ gapfill 
────┼─────────────┼───────── 
  1 │           1 │ 
  2 │           1 │ Kirk 
  3 │           1 │ Kirk 
  4 │           2 │ Roybal 
  5 │           2 │ Roybal 
  6 │           2 │ Roybal 
(6 rows) 



Here is one method:

select ms.*, ms_prev.close as lag_close
from (select ms.*,
             max(date) filter (where close is not null) over (order by date rows between unbounded preceding and 1 preceding) as dprev
      from market_summary ms
     ) ms left join
     market_summary ms_prev
     on ms_prev.date = ms.dprev
order by ms.date;

If, however, you only have one or two NULLs in a row, it is probably simpler to use:

select ms.*,
       coalesce(lag(ms.close, 1) over (order by date),
                lag(ms.close, 2) over (order by date),
                lag(ms.close, 3) over (order by date)
               ) as prev_close
from market_summary ms;
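The LAG/COALESCE fallback can be sketched the same way in Python's sqlite3, which also supports LAG since 3.25. One variation here: putting close itself first in the COALESCE makes the query return the filled value for every row, using the question's sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE market_summary (date TEXT, close INTEGER);
INSERT INTO market_summary VALUES
    ('12:00:00', 3), ('12:01:00', 4), ('12:02:00', 5),
    ('12:03:00', NULL), ('12:04:00', NULL), ('12:05:00', 3);
""")

# COALESCE falls back through up to three lagged values, so runs of
# at most three consecutive NULLs get filled; longer gaps stay NULL.
rows = conn.execute("""
    SELECT date,
           COALESCE(close,
                    LAG(close, 1) OVER (ORDER BY date),
                    LAG(close, 2) OVER (ORDER BY date),
                    LAG(close, 3) OVER (ORDER BY date)) AS filled_close
    FROM market_summary
    ORDER BY date
""").fetchall()
```

For the two-NULL gap in the sample, the second lag catches the 12:04:00 row, so the series comes back as 3, 4, 5, 5, 5, 3.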



Here is how to do this in vanilla Postgres, with a couple of custom functions.

Schema (PostgreSQL v12)

CREATE TABLE test (ts timestamp, email varchar, title varchar);
insert into test values
('2017-01-01', '[email protected]', 'Old title'),
('2017-01-02', '[email protected]', null),
('2017-01-03', '[email protected]', 'New Title'),
('2017-01-04', '[email protected]', null),
('2017-01-05', '[email protected]', null),
('2017-01-06', '[email protected]', 'Newer Title'),
('2017-01-07', '[email protected]', null),
('2017-01-08', '[email protected]', null);

 -- The built-in function coalesce is not an aggregate function, nor is it variadic.
 -- It might just be a compiler construct.
 -- So we define our own version.
 CREATE FUNCTION f_coalesce(a anyelement, b anyelement) RETURNS anyelement AS '
    SELECT COALESCE(a,b);
 ' LANGUAGE SQL PARALLEL SAFE;
 -- Aggregate coalesce that keeps the first non-null value it sees
CREATE AGGREGATE agg_coalesce (anyelement)
(
    sfunc = f_coalesce,
    stype = anyelement
);

Query #1

SELECT
    ts,
    email,

    array_agg(title) FILTER (WHERE title is not null ) OVER ( 
        order by ts desc ROWS BETWEEN current row and unbounded following 
    ) as title_array,
    (array_agg(title) FILTER (WHERE title is not null ) OVER ( 
        order by ts desc ROWS BETWEEN current row and unbounded following )
    )[1] as title,
    COALESCE(
        agg_coalesce(title) OVER ( 
            order by ts desc ROWS BETWEEN current row and unbounded following 
        ),
        (select title from test 
            where title is not null 
            and ts < '2017-01-02'
            order by ts desc limit 1 )
    ) as title_locf 
from test
where ts >= '2017-01-02'
order by ts desc;

Gist: https://gist.github.com/DanielJoyce/cc9f80d4326b7cb40d07af2ffb069b74
