
I have a database with time-related information. I want a list with a value for every minute, like this:

12:00:00  3
12:01:00  4
12:02:00  5
12:03:00  5
12:04:00  5
12:05:00  3

But when there is no data for some minutes, I get a result like this:

12:00:00  3
12:01:00  4
12:02:00  5
12:03:00  NULL
12:04:00  NULL
12:05:00  3

I want to fill each NULL value with the previous non-NULL value.

This query generates a time series with one row per minute, then left joins it to the data in my database.

I read something about using window functions to fill NULL values with the previous non-NULL value, but I can't figure out how to implement this in my query. Can someone point me in the right direction?

I tried the solution from PostgreSQL use value from previous row if missing, but the NULL values are still there.

This is my query:

SELECT
    date,
    close
FROM generate_series(
  '2017-11-01 09:00'::timestamp,
  '2017-11-01 23:59'::timestamp,
  '1 minute') AS date
LEFT OUTER JOIN
 (SELECT
    date_trunc('minute', market_summary."timestamp") as day,
    LAST(current, timestamp) AS close
    FROM market_summary
  WHERE created_at >= '2017-11-01 09:00'
    AND created_at < '2017-11-01 23:59'
    GROUP BY day
 ) results
ON (date = results.day)
ORDER BY date

4 Answers


I found the following method easier:

Create the given data sample:

WITH example (date,close) AS 
(VALUES 
    ('12:00:00',3),
    ('12:01:00',4),
    ('12:02:00',5),
    ('12:03:00',NULL),
    ('12:04:00',NULL), 
    ('12:05:00',3)
) 
SELECT * INTO TEMPORARY TABLE market_summary FROM example;

Query to fill each NULL with the previous non-NULL value:

select 
    date, 
    close, 
    first_value(close) over (partition by grp_close order by date) as corrected_close
from (
      select date, close,
             sum(case when close is not null then 1 end) over (order by date) as grp_close
      from   market_summary
) t
order by date

Returns:

date      | close | corrected_close
-----------------------------------
12:00:00  | 3     | 3
12:01:00  | 4     | 4
12:02:00  | 5     | 5
12:03:00  | NULL  | 5
12:04:00  | NULL  | 5
12:05:00  | 3     | 3
  • close: existing value
  • corrected_close: corrected value

1 Comment

I found that you may need to ORDER BY date in the first_value clause as well, otherwise you may still get NULLs: first_value(close) over (partition by grp_close order by date) as corrected_close
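The grouped first_value technique above (with the ORDER BY from the comment applied) can be checked end-to-end. This is a sketch using Python's sqlite3 rather than PostgreSQL, since SQLite also supports these window functions (3.25+); the table name and sample values follow the question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE market_summary (date TEXT, close INTEGER);
INSERT INTO market_summary VALUES
    ('12:00:00', 3), ('12:01:00', 4), ('12:02:00', 5),
    ('12:03:00', NULL), ('12:04:00', NULL), ('12:05:00', 3);
""")

# The running count of non-NULL closes assigns each NULL row to the
# group of the last non-NULL row before it; first_value then picks
# that row's close for the whole group.
rows = conn.execute("""
    SELECT date, close,
           FIRST_VALUE(close) OVER (PARTITION BY grp_close ORDER BY date)
               AS corrected_close
    FROM (
        SELECT date, close,
               SUM(CASE WHEN close IS NOT NULL THEN 1 END)
                   OVER (ORDER BY date) AS grp_close
        FROM market_summary
    )
    ORDER BY date
""").fetchall()
```

rows then matches the corrected_close column of the table above, with the two NULL minutes filled with 5.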

I found a solution on the page: http://www.postgresql-archive.org/lag-until-you-get-something-OVER-window-function-td5824644.html

CREATE OR REPLACE FUNCTION GapFillInternal( 
    s anyelement, 
    v anyelement) RETURNS anyelement AS 
$$ 
BEGIN 
  RETURN COALESCE(v,s); 
END; 
$$ LANGUAGE PLPGSQL IMMUTABLE; 

CREATE AGGREGATE GapFill(anyelement) ( 
  SFUNC=GapFillInternal, 
  STYPE=anyelement 
); 

postgres=# select id, natural_key, gapfill(somebody) OVER (ORDER BY 
natural_key, id) from lag_test; 
 id │ natural_key │ gapfill 
────┼─────────────┼───────── 
  1 │           1 │ 
  2 │           1 │ Kirk 
  3 │           1 │ Kirk 
  4 │           2 │ Roybal 
  5 │           2 │ Roybal 
  6 │           2 │ Roybal 
(6 rows) 



Here is one method:

select ms.*, ms_prev.close as lag_close
from (select ms.*,
             max(date) filter (where close is not null) over (order by date rows between unbounded preceding and 1 preceding) as dprev
      from market_summary ms
     ) ms left join
     market_summary ms_prev
     on ms_prev.date = ms.dprev
order by ms.date;

If, however, you only have one or two NULLs in a row, it is probably simpler to use:

select ms.*,
       coalesce(lag(ms.close, 1) over (order by date),
                lag(ms.close, 2) over (order by date),
                lag(ms.close, 3) over (order by date)
               ) as prev_close
from market_summary ms;
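The LAG/COALESCE fallback can be sketched the same way in Python's sqlite3, which also supports LAG since 3.25. One variation here: putting close itself first in the COALESCE makes the query return the filled value for every row, using the question's sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE market_summary (date TEXT, close INTEGER);
INSERT INTO market_summary VALUES
    ('12:00:00', 3), ('12:01:00', 4), ('12:02:00', 5),
    ('12:03:00', NULL), ('12:04:00', NULL), ('12:05:00', 3);
""")

# COALESCE falls back through up to three lagged values, so runs of
# at most three consecutive NULLs get filled; longer gaps stay NULL.
rows = conn.execute("""
    SELECT date,
           COALESCE(close,
                    LAG(close, 1) OVER (ORDER BY date),
                    LAG(close, 2) OVER (ORDER BY date),
                    LAG(close, 3) OVER (ORDER BY date)) AS filled_close
    FROM market_summary
    ORDER BY date
""").fetchall()
```

For the two-NULL gap in the sample, the second lag catches the 12:04:00 row, so the series comes back as 3, 4, 5, 5, 5, 3.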



Here is how to do this in vanilla Postgres, with a couple of custom functions.

Schema (PostgreSQL v12)

CREATE TABLE test (ts timestamp, email varchar, title varchar);
insert into test values
('2017-01-01', '[email protected]', 'Old title'),
('2017-01-02', '[email protected]', null),
('2017-01-03', '[email protected]', 'New Title'),
('2017-01-04', '[email protected]', null),
('2017-01-05', '[email protected]', null),
('2017-01-06', '[email protected]', 'Newer Title'),
('2017-01-07', '[email protected]', null),
('2017-01-08', '[email protected]', null);

 -- The built-in function coalesce is not an aggregate function, nor is it variadic.
 -- It might just be a compiler construct.
 -- So we define our own version.
 CREATE FUNCTION f_coalesce(a anyelement, b anyelement) RETURNS anyelement AS '
    SELECT COALESCE(a,b);
 ' LANGUAGE SQL PARALLEL SAFE;
 -- Aggregate coalesce that keeps the first non-null value it sees
CREATE AGGREGATE agg_coalesce (anyelement)
(
    sfunc = f_coalesce,
    stype = anyelement
);

Query #1

SELECT
    ts,
    email,

    array_agg(title) FILTER (WHERE title is not null ) OVER ( 
        order by ts desc ROWS BETWEEN current row and unbounded following 
    ) as title_array,
    (array_agg(title) FILTER (WHERE title is not null ) OVER ( 
        order by ts desc ROWS BETWEEN current row and unbounded following )
    )[1] as title,
    COALESCE(
        agg_coalesce(title) OVER ( 
            order by ts desc ROWS BETWEEN current row and unbounded following 
        ),
        (select title from test 
            where title is not null 
            and ts < '2017-01-02'
            order by ts desc limit 1 )
    ) as title_locf 
from test
where ts >= '2017-01-02'
order by ts desc;

Gist: https://gist.github.com/DanielJoyce/cc9f80d4326b7cb40d07af2ffb069b74
