While inserting, replace NULL with previous non-null value

Question

I have a Postgres table with columns like this:

(id, foreign_key, big_int, value_1, value_2)

id is the auto-increment primary key, and there is a unique key on the foreign_key and big_int columns. I am inserting several million records into this table using several large INSERT ... VALUES (),(),...,() statements.

Usually one of value_1 and value_2 is NULL; both can also be set, but they are never both NULL.
What I want to achieve is to eliminate NULLs (apart from the initial ones) by replacing each NULL in the INSERT with the previous non-null value of value_1 or value_2 for the same foreign_key, taken from the row with the highest big_int. Due to the unique key there is only one such row, and it should also be the row inserted just before the current one, if that makes things easier.
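The table described above might be declared roughly like this. It is only a sketch: the column types are assumptions, and the CHECK constraint is an optional way of encoding "they are never both NULL"; neither is taken from the post.

```sql
-- Sketch of the table from the question; column types are assumptions.
CREATE TABLE my_table (
  id          bigserial PRIMARY KEY,      -- auto-increment primary key
  foreign_key integer   NOT NULL,
  big_int     bigint    NOT NULL,
  value_1     integer,                    -- may be NULL
  value_2     integer,                    -- may be NULL
  UNIQUE (foreign_key, big_int),          -- the unique key from the question
  -- optional (assumption): value_1 and value_2 are never both NULL
  CHECK (value_1 IS NOT NULL OR value_2 IS NOT NULL)
);
```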

How can I do that quickly? I stumbled upon the LAG() function, but I couldn't figure out how to use it properly.

Example:

Let's assume I am inserting the following values:

INSERT INTO my_table (foreign_key, big_int, value_1, value_2) VALUES
(1, 5000, NULL, 50  ),
(1, 5001, 51,   NULL),
(2, 6000, 20,   NULL),
(1, 5002, NULL, 52  ),
(2, 6001, 21,   22  ),
(1, 5003, 53,   54  );

Then the resulting table is supposed to look like this (omitting the id column):

1 | 5000 | NULL | 50    -> inserted as is since there is no previous row for foreign_key 1
1 | 5001 | 51   | 50    -> 50 for value_2 column copied from previous row for foreign_key 1
2 | 6000 | 20   | NULL  -> inserted as is since there is no previous row for foreign_key 2
1 | 5002 | 51   | 52    -> 51 for value_1 column copied from previous row for foreign_key 1
2 | 6001 | 21   | 22    -> no need to copy since both values have been provided
1 | 5003 | 53   | 54    -> no need to copy since both values have been provided

I am sure I have to modify the INSERT statement itself; anything else (triggers, etc.) seems too slow.

A table of sample data is worth a thousand words for SO SQL questions. That being said, can you include some sample data? – Tim Biegeleisen, Commented Jul 8, 2020 at 10:19
I think that you should use some algorithm to store the data before inserting. – Nurbek Boymurodov, Commented Jul 8, 2020 at 12:24

Bergi · Accepted Answer · 2020-07-08 12:50:30Z


You can use a CTE for the VALUES to make use of the lag window function in the INSERT:

WITH input(foreign_key, big_int, value_1, value_2) AS (VALUES
  (…),
  …)
INSERT INTO my_table (foreign_key, big_int, value_1, value_2)
SELECT
  foreign_key,
  big_int,
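  -- fall back to the previous input row of the same foreign_key
  COALESCE(value_1, lag(value_1) OVER (PARTITION BY foreign_key ORDER BY big_int)),
  COALESCE(value_2, lag(value_2) OVER (PARTITION BY foreign_key ORDER BY big_int))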
FROM input;

(online demo)
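lag() looks exactly one row back, so if the same column is NULL in two consecutive input rows of one foreign_key, the second row still ends up NULL. A common workaround, which is a different technique from the answer's lag() call and does not rely on any IGNORE NULLS option, is to number the runs with a running count() (count ignores NULLs, so it only increases on a non-NULL value) and then take the single non-NULL value of each run. Window functions cannot be nested, hence the two steps. This sketch uses a temporary stand-in for my_table with assumed integer types and the question's sample rows; like the answer, it only sees rows inside the current batch.

```sql
-- Temporary stand-in for my_table (column types are assumptions).
CREATE TEMP TABLE my_table (
  id          bigserial PRIMARY KEY,
  foreign_key integer NOT NULL,
  big_int     bigint  NOT NULL,
  value_1     integer,
  value_2     integer,
  UNIQUE (foreign_key, big_int)
);

WITH input(foreign_key, big_int, value_1, value_2) AS (VALUES
  (1, 5000, NULL, 50  ),
  (1, 5001, 51,   NULL),
  (2, 6000, 20,   NULL),
  (1, 5002, NULL, 52  ),
  (2, 6001, 21,   22  ),
  (1, 5003, 53,   54  )
), grouped AS (
  -- running count of non-NULL values: every NULL shares its group number
  -- with the last non-NULL value before it (per foreign_key)
  SELECT input.*,
         count(value_1) OVER w AS grp_1,
         count(value_2) OVER w AS grp_2
  FROM input
  WINDOW w AS (PARTITION BY foreign_key ORDER BY big_int)
)
INSERT INTO my_table (foreign_key, big_int, value_1, value_2)
SELECT foreign_key,
       big_int,
       -- each group contains at most one non-NULL value, so max() returns it
       max(value_1) OVER (PARTITION BY foreign_key, grp_1),
       max(value_2) OVER (PARTITION BY foreign_key, grp_2)
FROM grouped;
```

With the sample rows this yields the expected table from the question; group 0 of a key (rows before its first non-NULL value) keeps its initial NULLs.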

answered Jul 8, 2020 at 12:50

Bergi



5 Comments

8192K Over a year ago

This would work with one INSERT, right? But if I have several INSERTs, the SELECT ... FROM input would only work on the current set of VALUES, wouldn't it? Could I do something like SELECT ... FROM input UNION SELECT (..same columns..) FROM my_table?

8192K Over a year ago

Should be UNION ALL

8192K Over a year ago

But with such a large UNION ALL table I wouldn't have indexes and the functions used in the SELECT would take forever. Is this correct?

Bergi Over a year ago

Oh, you mean you want to take the current max values from the table? I guess you can do that with COALESCE(value_1, lag…, (SELECT value_1 FROM my_table WHERE foreign_key = input.foreign_key AND value_1 IS NOT NULL ORDER BY big_int DESC LIMIT 1)), which should use the existing unique index on (foreign_key, big_int) to find the last non-null value_1.
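Putting that suggestion together gives something like the following minimal sketch. The temporary table, its types, and the two batches are illustrative assumptions. The scalar subquery is correlated on foreign_key so the fallback comes from the same key's newest stored row; because it runs inside the INSERT, it sees the table as it was before this statement. Within the batch, the one-row limitation of lag() still applies.

```sql
-- Temporary stand-in for my_table (column types are assumptions).
CREATE TEMP TABLE my_table (
  id          bigserial PRIMARY KEY,
  foreign_key integer NOT NULL,
  big_int     bigint  NOT NULL,
  value_1     integer,
  value_2     integer,
  UNIQUE (foreign_key, big_int)
);

-- An earlier batch that is already stored (and already filled).
INSERT INTO my_table (foreign_key, big_int, value_1, value_2) VALUES
  (1, 5000, NULL, 50),
  (1, 5001, 51,   50);

-- Next batch: the first row of each foreign_key has no lag() predecessor
-- in the batch, so it falls back to the newest stored non-NULL value.
WITH input(foreign_key, big_int, value_1, value_2) AS (VALUES
  (1, 5002, NULL, 52  ),
  (2, 6000, 20,   NULL),
  (1, 5003, 53,   54  )
)
INSERT INTO my_table (foreign_key, big_int, value_1, value_2)
SELECT
  i.foreign_key,
  i.big_int,
  COALESCE(i.value_1,
           lag(i.value_1) OVER w,
           (SELECT t.value_1 FROM my_table t
             WHERE t.foreign_key = i.foreign_key AND t.value_1 IS NOT NULL
             ORDER BY t.big_int DESC LIMIT 1)),
  COALESCE(i.value_2,
           lag(i.value_2) OVER w,
           (SELECT t.value_2 FROM my_table t
             WHERE t.foreign_key = i.foreign_key AND t.value_2 IS NOT NULL
             ORDER BY t.big_int DESC LIMIT 1))
FROM input i
WINDOW w AS (PARTITION BY i.foreign_key ORDER BY i.big_int);
```

Here row (1, 5002) gets value_1 = 51 from the stored row 5001, while (2, 6000) keeps value_2 NULL because nothing is stored for key 2 yet.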

8192K Over a year ago

Thank you! However, this did not work. Also, unfortunately, every CTE I tried was too slow. I therefore created a second table that stores the max(big_int) values by foreign_key. I load this small table into memory and build my INSERTs with values taken directly from that in-memory copy. The copy is updated with every new max of big_int and written back to the DB asynchronously every few seconds.
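The write-back step of that approach could look roughly like this upsert. The side table name latest_values, its columns and types, and the sample rows are assumptions for illustration; the comment does not show the actual table. The WHERE guard keeps a delayed flush from overwriting newer state with an older big_int.

```sql
-- Hypothetical side table (name and types are assumptions): the newest
-- big_int and its already-filled values for each foreign_key.
CREATE TABLE IF NOT EXISTS latest_values (
  foreign_key integer PRIMARY KEY,
  big_int     bigint  NOT NULL,
  value_1     integer,
  value_2     integer
);

-- Periodic write-back of the in-memory state. Send at most one row per
-- foreign_key per statement: ON CONFLICT DO UPDATE cannot update the
-- same row twice in one command.
INSERT INTO latest_values AS lv (foreign_key, big_int, value_1, value_2)
VALUES
  (1, 5003, 53, 54),
  (2, 6001, 21, 22)
ON CONFLICT (foreign_key) DO UPDATE
SET big_int = EXCLUDED.big_int,
    value_1 = EXCLUDED.value_1,
    value_2 = EXCLUDED.value_2
-- only overwrite when the incoming row is newer than the stored one
WHERE EXCLUDED.big_int > lv.big_int;
```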
