PostgreSQL: Forward fill NULL values with previous NOT NULL value in group

Question

I'm trying fill NULL values in multiple columns (different column types INT, VARCHAR) with previous NOT NULL value in a group ordered by date. Considering following table:

CREATE TABLE IF NOT EXISTS test (
   id VARCHAR,
   date DATE,
   value_1 INT,
   value_2 VARCHAR
);


INSERT INTO test VALUES 
(1, '2022-01-04', 5, 'asdf'),
(1, '2022-01-03', NULL, NULL),
(1, '2022-01-02', NULL, 'def'),
(1, '2022-01-01', 4, NULL),
(2, '2022-01-04', 1, 'a'),
(2, '2022-01-03', NULL, NULL),
(2, '2022-01-02', 2, 'b'),
(2, '2022-01-01', NULL, NULL);

I want to get here (please consider value_1 --> INTEGER, value_2 string):

You said you want to set null values with previous values. So, why does value_1 end up being 4 for 2022-01-02 and 2022-01-03 instead of 5? After all, value_1 was 5 for 2022-01-01. — Lajos Arpad
– Lajos Arpad, Commented Feb 15, 2023 at 16:18

Zegarek · Accepted Answer · 2023-03-10 12:29:37Z

6

Short window-functions alternative: online demo

SELECT 
  id
, date
,(JSONB_AGG(value_1)FILTER(WHERE value_1 IS NOT NULL)OVER w->>-1)::int AS value_1
, JSONB_AGG(value_2)FILTER(WHERE value_2 IS NOT NULL)OVER w->>-1    AS    value_2
FROM test
WINDOW w AS (PARTITION BY id ORDER BY date) 
ORDER BY id, date DESC;

The principle remains the same as in Lukasz' and Atmo's answers: it constructs a frame looking back at value_Ns from previous dates for the same ID, gets rid of NULL's then picks the latest non-NULL.

It's just shortened and cleaned up by having:

The window defined once and re-used.
An aggregate filter, to skip NULLs instead of collecting and having to remove them.
A jsonb array. Unlike a regular array it allows for negative subscripting, eliminating the need for a subquery, a CTE or duplicating the code in order to call array_upper() and get the last element. That's at the price of having to cast non-text types.

Negative index is equal to positive on a reversed array, so the same can be done with a regular array as long as its constructed in reverse, by inverting the frame clause and order: demo

edited Mar 10, 2023 at 12:29

answered Mar 5, 2023 at 14:33

Zegarek

30k5 gold badges27 silver badges32 bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

Zegarek Over a year ago

I'm posting this separately since it's fundamentally different from the approach I initially suggested - which I still somewhat prefer for it's conceptual simplicity, being a more direct translation of the spec.

Atmo · Accepted Answer · 2023-02-28 12:52:06Z

One day, PostgreSQL may support the IGNORE NULLS option for LEAD and LAG functions.

In the mean time, you must use window functions to build groups, then select the maximum in each group.

SELECT id, date,
       MAX(value_1) OVER (PARTITION BY id, grp_1) AS value_1,
       MAX(value_2) OVER (PARTITION BY id, grp_2) AS value_2
FROM(
    SELECT *,
          COUNT(value_1) OVER (PARTITION BY id ORDER BY Date DESC) as grp_1,
          COUNT(value_2) OVER (PARTITION BY id ORDER BY Date DESC) as grp_2
    FROM test
) T
ORDER BY ID, date

Explanation:

COUNT(value_1) OVER (PARTITION BY id ORDER BY Date ASC/DESC) is a cumulative count that only increases when value_1 is not null. The result is, for each consecutive value it takes (1, 2, 3), there will be 1 record where value_1 is not null, and 0, 1 or several records where the field is null.

Example:

First record encountered (following the ORDER BY Date specified in the window);
value_1 is null => count is 0.
Next record, value_1 is not null => count is 1.
Next record, value_1 is null => count stays at 1.
Next record, value_1 is null => count stays at 1.
Next record, value_1 is not null => count is 2.
... and so on.

Using this pattern, the MAX(value_1) OVER (PARTITION BY id, grp_1) works on this partition of record by taking the non-null value out of each group.

Group [count = 0] has only 1 record with null => The max is null.
Group [count = 1] has 1 non null and 2 null values => The max is the non null value.
Group [count = 2] has 1 non null value (and an unspecified number of null values) => The max is the non null value.

The pattern keeps going; for every value of count (except 0), the max window function always has exactly 1 non null value to pick.

Edit:
The above query is the answer to the original version of the question. Now that dates have been changed to be in the exact opposite order as before, the 3 occurrences of ORDER BY date must all be inverted to match. Basically, 2 negatives make a positive.

SELECT id, date,
       MAX(value_1) OVER (PARTITION BY id, grp_1) AS value_1,
       MAX(value_2) OVER (PARTITION BY id, grp_2) AS value_2
FROM(
    SELECT *,
          COUNT(value_1) OVER (PARTITION BY id ORDER BY Date ASC) as grp_1,
          COUNT(value_2) OVER (PARTITION BY id ORDER BY Date ASC) as grp_2
    FROM test
) T
ORDER BY ID, date DESC

instead of inserting max_value for NULL value_1, can we provide the previous value till we get new value?
I think you are talking about the gap and island problem. It is solved with LEAD/LAG window functions. You can search the site for this term to find some questions/answers showing how.

Zegarek · Accepted Answer · 2023-03-03 20:15:28Z

PostgreSQL distinct on can help whenever your spec says you want one, top record in a group. You can use it in a scalar subquery in an update: online demo

update test t1 
set value_1=(select distinct on (t2.id) value_1--The top value_1 (latest by date)
             from test t2                      --that's in the same table   ^
             where t2.date < t1.date           --in an earlier row          |
             and   t2.id   = t1.id             --for the same id            |
             and   t2.value_1 is not null      --and isn't a null.          |
             order by t2.id, date desc)        --The top meaning the latest_/
where t1.value_1 is null;

Since all it does is go to a row with a null and from its context, point back at a row you want to borrow the value from, it's fairly intuitive.

Lukasz Szozda · Accepted Answer · 2023-03-04 09:40:25Z

1

Using windowed function:

ARRAY_AGG building array with defined order
ARRAY_REMOVE removing NULLs
getting the last item in the array (cte/or replicating entire query) - unfortunately no way to access it as (array)[-1]

Query version readable:

WITH cte AS (
  SELECT *,
    ARRAY_REMOVE((ARRAY_AGG(value_1) OVER(PARTITION BY id ORDER BY date)), NULL) 
       AS value_1_arr,
    ARRAY_REMOVE((ARRAY_AGG(value_2) OVER(PARTITION BY id ORDER BY date)), NULL) 
       AS value_2_arr
  FROM test
)
SELECT id, date,
    value_1_arr[array_upper(value_1_arr, 1)] AS value_1,
    value_2_arr[array_upper(value_2_arr, 1)] AS value_2
FROM cte
ORDER BY id, date DESC;

Debug (array_is_build, it is a matter of taking last item)

WITH cte AS (
  SELECT *,
    ARRAY_REMOVE((ARRAY_AGG(value_1) OVER(PARTITION BY id ORDER BY date)), NULL) 
       AS value_1_arr,
    ARRAY_REMOVE((ARRAY_AGG(value_2) OVER(PARTITION BY id ORDER BY date)), NULL) 
       AS value_2_arr
  FROM test
)
SELECT *
FROM cte
ORDER BY id, date DESC;

Single query (lack of [-1] and duplicating the code makes it "ugly"

SELECT id, date,   
  (ARRAY_REMOVE((ARRAY_AGG(value_1) OVER(PARTITION BY id ORDER BY date)), NULL))
  [array_upper(ARRAY_REMOVE((ARRAY_AGG(value_1) OVER(PARTITION BY id ORDER BY date)), NULL),1)]
 AS value_1, 
  (ARRAY_REMOVE((ARRAY_AGG(value_2) OVER(PARTITION BY id ORDER BY date)), NULL))
  [array_upper(ARRAY_REMOVE((ARRAY_AGG(value_2) OVER(PARTITION BY id ORDER BY date)), NULL),1)]
 AS value_2
FROM test
ORDER BY id, date DESC;

Output:

db<>fiddle demo

edited Mar 4, 2023 at 9:40

answered Mar 4, 2023 at 0:38

Lukasz Szozda

181k26 gold badges278 silver badges326 bronze badges

3 Comments

Zegarek Over a year ago

It breaks.

Lukasz Szozda Over a year ago

@Zegarek Thank you for extending the initial data set. The version with windowed MAX worked for OP's dataset but not in general. I have rewritten my answer and provided an ARRAY_AGG equivalent of LEAD/LAG with IGNORE NULLS

Zegarek Over a year ago

You have my upvote. After this I gave window functions a shot and ended up solving the subscript issue, if you mind taking a look.

Collectives™ on Stack Overflow

PostgreSQL: Forward fill NULL values with previous NOT NULL value in group

4 Answers 4

1 Comment

2 Comments

Comments

3 Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

1 Comment

2 Comments

Comments

3 Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related