Context
I have an UPDATE statement in Postgres that essentially restructures a jsonb column. Here's what it does, if you're curious:
    metadata = (
        jsonb_build_object(
            'input_file_url', metadata -> 'input_file_url',
            'standard_encoding_state', metadata -> 'standard_encoding_state',
            'state', metadata -> 'state',
            'error', metadata -> 'error',
            'io_file_and_run_strategy_response', (
                jsonb_build_object(
                    'input_file_url', metadata -> 'input_file_url',
                    'output_file_url', metadata -> 'output_file_url',
                    'download_encode_upload_duration', metadata -> 'download_encode_upload_duration',
                    'output_video_size', metadata -> 'output_video_size',
                    'input_video_size', metadata -> 'input_video_size',
                    'download_source_start_time', metadata -> 'download_source_start_time',
                    'download_source_end_time', metadata -> 'download_source_end_time',
                    'encode_local_file_start_time', metadata -> 'encode_local_file_start_time',
                    'encode_local_file_end_time', metadata -> 'encode_local_file_end_time',
                    'upload_encoded_start_time', metadata -> 'upload_encoded_start_time',
                    'upload_encoded_end_time', metadata -> 'upload_encoded_end_time'
                )
            )
        )
    )
As you can see, it's just changing the structure of the JSON object.
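For completeness, the full statement has roughly this shape; the table name below is a placeholder, most of the keys are omitted to keep it short, and the WHERE guard is just one way to make the statement skip rows that were already restructured if it has to be re-run:

    -- sketch only: "encoding_jobs" is a placeholder table name and most of the
    -- keys are omitted to keep the example short
    UPDATE encoding_jobs
    SET metadata = jsonb_build_object(
            'input_file_url', metadata -> 'input_file_url',
            'state', metadata -> 'state',
            -- ...remaining top-level keys...
            'io_file_and_run_strategy_response', jsonb_build_object(
                'output_file_url', metadata -> 'output_file_url'
                -- ...remaining nested keys...
            )
        )
    -- optional guard: only touch rows that haven't been restructured yet,
    -- so re-running (or batching) the statement skips finished rows
    WHERE NOT (metadata ? 'io_file_and_run_strategy_response');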
This table has tens of millions of records. When I run the query, it usually takes about 20 minutes (I've been testing on a local copy of the database).
However, earlier today it took about 2 hours. When I looked at pg_stat_activity, I saw the query waiting on an IO / DataFileRead wait event, and it also looked like autovacuum was running at the same time (pg_locks showed nothing actually blocked, but my theory is that autovacuum was slowing down disk reads).
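For reference, a query along these lines shows the wait event for each active backend and whether anything actually holds a conflicting lock (pg_blocking_pids() returns an empty array when nothing does):

    -- wait state and blockers for every non-idle backend
    SELECT pid,
           backend_type,
           state,
           wait_event_type,
           wait_event,
           pg_blocking_pids(pid) AS blocked_by,
           left(query, 80)       AS query
    FROM pg_stat_activity
    WHERE state <> 'idle';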
Actual Question
My theory for why it took so long is that autovacuum kicked in because so many records were updated.
Could there be any benefit to adding a date filter to the update, so that instead of running ONE BIG update, I run many smaller updates in a loop using the date filter?
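To make that concrete, the batched version I have in mind is roughly the following PL/pgSQL sketch; the table name, the created_at column, and the date range are placeholders for my real schema, the SET expression is abbreviated, and the COMMIT inside a DO block needs PostgreSQL 11+ running outside an explicit transaction:

    -- one day's worth of rows per transaction, committed as we go so vacuum can
    -- reclaim the dead row versions between batches instead of all at the end
    DO $$
    DECLARE
        d        date := date '2024-01-01';  -- placeholder: first day that has rows
        last_day date := date '2024-12-31';  -- placeholder: last day that has rows
    BEGIN
        WHILE d <= last_day LOOP
            UPDATE encoding_jobs
            -- same jsonb_build_object(...) expression as above, abbreviated here
            SET metadata = jsonb_build_object('state', metadata -> 'state')
            WHERE created_at >= d
              AND created_at <  d + 1;
            COMMIT;          -- requires PostgreSQL 11+, DO run with autocommit
            d := d + 1;
        END LOOP;
    END $$;

The idea is that each batch is its own transaction, so dead row versions can be vacuumed away between batches instead of accumulating across one long-running transaction.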
Or is it always better to just give Postgres one big query and let it do the most optimized thing?