I have a Postgres table with columns like this:
(id, foreign_key, big_int, value_1, value_2)
id is the auto-increment primary key, and there is a unique key on (foreign_key, big_int). I am inserting several million records into this table using several large INSERT ... VALUES (...), (...), ..., (...) statements.
Usually one of value_1 and value_2 is NULL. Both can also be set, but they are never both NULL.
I want to eliminate every NULL except the initial ones. A NULL in the INSERT should be replaced by the previous non-NULL value of value_1 or value_2 for the same foreign_key, taken from the row for that foreign_key with the largest big_int. Because of the unique key there is only one such row, and, if it makes things easier, it is also the row inserted right before the one being inserted.
How can I do this quickly? I came across the LAG() window function, but I couldn't figure out how to use it properly.
Example:
Let's assume I am inserting the following values:
INSERT INTO my_table (foreign_key, big_int, value_1, value_2) VALUES (1, 5000, NULL, 50 ), (1, 5001, 51, NULL), (2, 6000, 20, NULL), (1, 5002, NULL, 52 ), (2, 6001, 21, 22 ), (1, 5003, 53, 54 );
Then the resulting table is supposed to look like this (omitting the id column):
foreign_key | big_int | value_1 | value_2
1 | 5000 | NULL | 50   -> inserted as is, since there is no previous row for foreign_key 1
1 | 5001 | 51   | 50   -> 50 for value_2 copied from the previous row for foreign_key 1
2 | 6000 | 20   | NULL -> inserted as is, since there is no previous row for foreign_key 2
1 | 5002 | 51   | 52   -> 51 for value_1 copied from the previous row for foreign_key 1
2 | 6001 | 21   | 22   -> no need to copy, since both values have been provided
1 | 5003 | 53   | 54   -> no need to copy, since both values have been provided
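A minimal sketch of the gap-filling step on the example rows, assuming the whole batch is available in one statement (here as a CTE with the hypothetical name batch). As far as I know, Postgres's LAG() does not accept IGNORE NULLS, and plain LAG(value_1) only looks exactly one row back, so it breaks as soon as two consecutive rows for the same foreign_key are NULL in the same column. The sketch uses a common workaround instead: a running COUNT(value_1) only increases on non-NULL rows, so every NULL row lands in the same group as the last non-NULL row before it, and MAX() over that group returns the carried-forward value.

```sql
-- Carry the last non-NULL value_1/value_2 forward per foreign_key,
-- ordered by big_int ("batch" is a hypothetical name for the new rows).
WITH batch (foreign_key, big_int, value_1, value_2) AS (
    VALUES (1, 5000, NULL, 50),
           (1, 5001, 51,   NULL),
           (2, 6000, 20,   NULL),
           (1, 5002, NULL, 52),
           (2, 6001, 21,   22),
           (1, 5003, 53,   54)
),
grouped AS (
    SELECT b.*,
           -- COUNT(col) skips NULLs, so the counter only grows on rows that
           -- carry a value; NULL rows share the group of the last value.
           COUNT(value_1) OVER w AS grp_1,
           COUNT(value_2) OVER w AS grp_2
    FROM batch AS b
    WINDOW w AS (PARTITION BY foreign_key ORDER BY big_int)
)
SELECT foreign_key,
       big_int,
       -- Each (foreign_key, grp) group holds at most one non-NULL value,
       -- so MAX() returns it; group 0 (leading NULLs) stays NULL.
       MAX(value_1) OVER (PARTITION BY foreign_key, grp_1) AS value_1,
       MAX(value_2) OVER (PARTITION BY foreign_key, grp_2) AS value_2
FROM grouped
ORDER BY foreign_key, big_int;
```

ORDER BY big_int inside the window is what defines "previous"; since (foreign_key, big_int) is unique, there are no ties within a partition. On the example data this should reproduce the expected table, sorted by foreign_key and big_int.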
I am sure I have to modify the INSERT statement itself. Anything else (triggers, etc.) seems too slow.
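Because the data arrives in several INSERT statements, the first row for a foreign_key in a new batch may need values from a row stored by an earlier batch. A sketch of one way to keep this inside a single INSERT, assuming big_int only grows per foreign_key across batches (as the question suggests) and that every stored row was already filled when it was inserted: take the latest stored row per foreign_key as a seed, run the running-COUNT/MAX gap fill over seed plus batch, and insert only the non-seed rows. The column types, the CHECK constraint, the CTE names, and the sample rows are all assumptions for illustration.

```sql
-- Hypothetical column types; adjust to the real schema.
CREATE TABLE my_table (
    id          bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    foreign_key integer NOT NULL,
    big_int     bigint  NOT NULL,
    value_1     integer,
    value_2     integer,
    UNIQUE (foreign_key, big_int),
    -- Optional: enforce "value_1 and value_2 are never both NULL".
    CHECK (value_1 IS NOT NULL OR value_2 IS NOT NULL)
);

-- Illustration only: rows an earlier batch has already stored (and filled).
INSERT INTO my_table (foreign_key, big_int, value_1, value_2) VALUES
    (1, 5000, NULL, 50), (1, 5001, 51, 50), (1, 5002, 51, 52), (1, 5003, 53, 54),
    (2, 6000, 20, NULL), (2, 6001, 21, 22);

-- One new batch. The casts on the first row pin the column types, so a
-- column that is NULL in every row of a batch is not resolved as text.
WITH batch (foreign_key, big_int, value_1, value_2) AS (
    VALUES (1, 5004::bigint, NULL::integer, 55::integer),
           (2, 6002, 23, NULL),
           (2, 6003, NULL, 24),
           (3, 7000, NULL, 30)
),
-- Latest stored row per foreign_key in this batch; it already carries the
-- last known value_1/value_2 because stored rows were filled on insert.
seed AS (
    SELECT DISTINCT ON (t.foreign_key)
           t.foreign_key, t.big_int, t.value_1, t.value_2
    FROM my_table AS t
    WHERE t.foreign_key IN (SELECT foreign_key FROM batch)
    ORDER BY t.foreign_key, t.big_int DESC
),
combined AS (
    SELECT foreign_key, big_int, value_1, value_2, false AS is_seed FROM batch
    UNION ALL
    SELECT foreign_key, big_int, value_1, value_2, true FROM seed
),
-- Running count of non-NULLs: it only grows on rows that carry a value,
-- so each NULL row shares a group with the last non-NULL row before it.
grouped AS (
    SELECT c.*,
           COUNT(value_1) OVER w AS grp_1,
           COUNT(value_2) OVER w AS grp_2
    FROM combined AS c
    WINDOW w AS (PARTITION BY foreign_key ORDER BY big_int)
),
-- Each group has at most one non-NULL value, so MAX() returns it.
filled AS (
    SELECT foreign_key, big_int, is_seed,
           MAX(value_1) OVER (PARTITION BY foreign_key, grp_1) AS value_1,
           MAX(value_2) OVER (PARTITION BY foreign_key, grp_2) AS value_2
    FROM grouped
)
-- Insert only the new rows; the seed rows are already in the table.
INSERT INTO my_table (foreign_key, big_int, value_1, value_2)
SELECT foreign_key, big_int, value_1, value_2
FROM filled
WHERE NOT is_seed;
```

With these sample rows, (1, 5004) should get value_1 = 53 from the stored row 5003, (2, 6002) should get value_2 = 22, (2, 6003) should get value_1 = 23 from the same batch, and (3, 7000) keeps its initial NULL. The filter on is_seed sits in the outer query so the window functions still see the seed rows. DISTINCT ON ... ORDER BY big_int DESC can use the unique index on (foreign_key, big_int); with many distinct foreign_key values per batch, a LATERAL subquery with ORDER BY big_int DESC LIMIT 1 may be worth comparing.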
:-)