
I have a Postgres table with columns like this:

(id, foreign_key, big_int, value_1, value_2)

id is the autoincrement primary key, and there is a unique key on the (foreign_key, big_int) columns. I am inserting several million records into this table using several large INSERT .. VALUES (),()...() calls.

Usually one of value_1 or value_2 is NULL. Both can be set, but they are never both NULL.
What I want is to eliminate every NULL except the initial ones: a NULL in the INSERT should be replaced with the previous non-null value of that column (value_1 or value_2), taken from the row with the same foreign_key that has the highest big_int. Because of the unique index there is only one such row, and it should also be the row inserted just before the new one, if that makes things easier.

How can I do that in a fast way? I stumbled upon the LAG() function, but I didn't figure out how to use it properly.

Example:

Let's assume I am inserting the following values:

INSERT INTO my_table (foreign_key, big_int, value_1, value_2) VALUES
(1, 5000, NULL, 50  ),
(1, 5001, 51,   NULL),
(2, 6000, 20,   NULL),
(1, 5002, NULL, 52  ),
(2, 6001, 21,   22  ),
(1, 5003, 53,   54  );

Then the resulting table is supposed to look like this (omitting the id column):

1 | 5000 | NULL | 50    -> inserted as is since there is no previous row for foreign_key 1
1 | 5001 | 51   | 50    -> 50 for value_2 column copied from previous row for foreign_key 1
2 | 6000 | 20   | NULL  -> inserted as is since there is no previous row for foreign_key 2
1 | 5002 | 51   | 52    -> 51 for value_1 column copied from previous row for foreign_key 1
2 | 6001 | 21   | 22    -> no need to copy since both values have been provided
1 | 5003 | 53   | 54    -> no need to copy since both values have been provided

I am sure I have to modify the INSERT statement. Anything else (triggers, etc.) seems too slow.
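For reference, the rule described above can be written down as a small function. This is only a sketch of the desired semantics, not a solution to the speed problem: Python is used purely for illustration, the name fill_forward is hypothetical, and it assumes the rows are processed in ascending big_int order for each foreign_key.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# (foreign_key, big_int, value_1, value_2); None stands for SQL NULL.
Row = Tuple[int, int, Optional[int], Optional[int]]


def fill_forward(rows: Iterable[Row]) -> List[Row]:
    """Replace None with the last value seen for the same foreign_key.

    Assumption: rows arrive in ascending big_int order per foreign_key.
    """
    last: Dict[int, Tuple[Optional[int], Optional[int]]] = {}
    result: List[Row] = []
    for fk, big_int, v1, v2 in rows:
        prev_v1, prev_v2 = last.get(fk, (None, None))
        # Keep a provided value; otherwise fall back to the previous one.
        v1 = prev_v1 if v1 is None else v1
        v2 = prev_v2 if v2 is None else v2
        last[fk] = (v1, v2)
        result.append((fk, big_int, v1, v2))
    return result


sample = [
    (1, 5000, None, 50),
    (1, 5001, 51, None),
    (2, 6000, 20, None),
    (1, 5002, None, 52),
    (2, 6001, 21, 22),
    (1, 5003, 53, 54),
]
filled = fill_forward(sample)
for row in filled:
    print(row)
```

Applied to the sample rows, this reproduces the expected table shown above, including the initial NULLs for each foreign_key.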

  • A table of sample data is worth a thousand words for SO SQL questions. That being said, can you include some sample data? Commented Jul 8, 2020 at 10:19
  • I can, after lunch! Commented Jul 8, 2020 at 10:37
  • Well, hopefully I've given you some food for thought :-) Commented Jul 8, 2020 at 10:38
  • @TimBiegeleisen There you go! Commented Jul 8, 2020 at 12:14
  • I think you should use some algorithm to store the data before inserting it Commented Jul 8, 2020 at 12:24

1 Answer

You can use a CTE for the VALUES to make use of the lag window function in the INSERT:

WITH input(foreign_key, big_int, value_1, value_2) AS (VALUES
  (…),
  …)
INSERT INTO my_table (foreign_key, big_int, value_1, value_2)
SELECT
  foreign_key,
  big_int,
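  -- PARTITION BY foreign_key keeps lag() from reading another key's row
  COALESCE(value_1, lag(value_1) OVER (PARTITION BY foreign_key ORDER BY big_int)),
  COALESCE(value_2, lag(value_2) OVER (PARTITION BY foreign_key ORDER BY big_int))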
FROM input;

(online demo)
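To try the idea without a Postgres instance, the following sketch runs a CTE-plus-lag INSERT against an in-memory SQLite database through Python's sqlite3 module. SQLite is only a stand-in here (it needs version 3.25 or newer for window functions), the table definition is an assumption based on the question, and the window is partitioned by foreign_key so that values are only carried over within the same key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed schema, modelled on the question's description.
conn.execute("""
    CREATE TABLE my_table (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        foreign_key INTEGER NOT NULL,
        big_int INTEGER NOT NULL,
        value_1 INTEGER,
        value_2 INTEGER,
        UNIQUE (foreign_key, big_int)
    )
""")

# The sample rows from the question, filled via lag() inside the INSERT.
conn.execute("""
    WITH input(foreign_key, big_int, value_1, value_2) AS (VALUES
        (1, 5000, NULL, 50),
        (1, 5001, 51,   NULL),
        (2, 6000, 20,   NULL),
        (1, 5002, NULL, 52),
        (2, 6001, 21,   22),
        (1, 5003, 53,   54)
    )
    INSERT INTO my_table (foreign_key, big_int, value_1, value_2)
    SELECT
        foreign_key,
        big_int,
        COALESCE(value_1, lag(value_1) OVER (PARTITION BY foreign_key ORDER BY big_int)),
        COALESCE(value_2, lag(value_2) OVER (PARTITION BY foreign_key ORDER BY big_int))
    FROM input
""")

rows = conn.execute("""
    SELECT foreign_key, big_int, value_1, value_2
    FROM my_table
    ORDER BY foreign_key, big_int
""").fetchall()
for row in rows:
    print(row)
```

Note that lag() looks back exactly one row in the input, so two consecutive NULLs in the same column for the same foreign_key would leave the second one unfilled. The sample data does not contain that case.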


5 Comments

This would work with one INSERT, right? But if I have several INSERTs, the SELECT ... FROM input would only work on the current set of VALUES, wouldn't it? Could I do something like SELECT ... FROM input UNION SELECT (..same columns..) FROM my_table?
Should be UNION ALL
But with such a large UNION ALL table I wouldn't have indexes and the functions used in the SELECT would take forever. Is this correct?
Oh, you mean you want to take the current max values from the table? I guess you can do that by using COALESCE(value_1, lag…, (SELECT value_1 FROM my_table WHERE value_1 IS NOT NULL ORDER BY big_int DESC LIMIT 1)), which should use the existing index on big_int to find the last non-null value_1.
Thank you! However, this did not work. Also, any CTE I tried was too slow, unfortunately. I therefore created a second table which stores the max(big_int) values by foreign_key. I load this small table into memory and create my INSERTs with direct values from that table in memory. The content is updated with every new max for big_int and written back to the DB asynchronously every few seconds.
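The in-memory approach from the last comment could look roughly like the sketch below. It is an illustration under assumptions, not the commenter's actual code: the class LastValueCache and its methods are hypothetical names, the side table is assumed to hold one (foreign_key, big_int, value_1, value_2) row per foreign_key, and the asynchronous write-back is left out. Rows are assumed to arrive in ascending big_int order per foreign_key.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# (foreign_key, big_int, value_1, value_2); None stands for SQL NULL.
Row = Tuple[int, int, Optional[int], Optional[int]]


class LastValueCache:
    """Holds the latest filled row per foreign_key (hypothetical helper)."""

    def __init__(self, snapshot: Iterable[Row] = ()) -> None:
        # snapshot: rows loaded from the assumed side table at startup.
        self._latest: Dict[int, Tuple[int, Optional[int], Optional[int]]] = {
            fk: (big_int, v1, v2) for fk, big_int, v1, v2 in snapshot
        }
        self._dirty: Set[int] = set()

    def fill(self, batch: Iterable[Row]) -> List[Row]:
        """Return the batch with None replaced by the cached last values."""
        filled: List[Row] = []
        for fk, big_int, v1, v2 in batch:
            cached = self._latest.get(fk)
            if cached is not None:
                _, prev_v1, prev_v2 = cached
                v1 = prev_v1 if v1 is None else v1
                v2 = prev_v2 if v2 is None else v2
            # Only a new maximum for big_int replaces the cached row.
            if cached is None or big_int > cached[0]:
                self._latest[fk] = (big_int, v1, v2)
                self._dirty.add(fk)
            filled.append((fk, big_int, v1, v2))
        return filled

    def pop_dirty(self) -> List[Row]:
        """Rows changed since the last call, to be written back to the DB."""
        rows = [(fk, *self._latest[fk]) for fk in sorted(self._dirty)]
        self._dirty.clear()
        return rows


cache = LastValueCache()
first = cache.fill([(1, 5000, None, 50), (1, 5001, 51, None), (2, 6000, 20, None)])
second = cache.fill([(1, 5002, None, 52), (2, 6001, 21, 22), (1, 5003, 53, 54)])
to_persist = cache.pop_dirty()
print(first, second, to_persist, sep="\n")
```

Because the cache survives between calls, values carry over across separate INSERT batches, which is the case the CTE alone does not cover. The filled rows can then be sent as plain VALUES.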
