I have a string column in the database that needs to be converted to an array type. The table should not be locked while this runs, and I also need to rebuild the indexes.

ALTER TABLE sites ALTER COLUMN rtb_id TYPE varchar[] USING string_to_array(rtb_id, '');
CREATE INDEX CONCURRENTLY rtb_id_search ON sites(rtb_id) USING array_to_string;
DROP INDEX CONCURRENTLY ix_sites_bundle_trgm_gin;
DROP INDEX CONCURRENTLY ix_sites_name_trgm_gin;

Is this the way to do it?

Edit:

ALTER TABLE sites ADD COLUMN rtb_ids varchar[]
...
BEFORE INSERT OR UPDATE ... FOR EACH ROW trigger that sets NEW.rtb_ids := string_to_array(NEW.rtb_id, ' ') for each row.
In batches, UPDATE sites SET rtb_ids = string_to_array(rtb_id, ' ')
...
VACUUM sites; 
CREATE INDEX CONCURRENTLY rtb_ids_search ON sites USING gin (rtb_ids);

ALTER TABLE sites DROP COLUMN rtb_id; 

Thanks

  • Hard to say. Please provide essential information: table and index definitions, sample values for rtb_id, and describe your use case and rationale. It's very odd that an ID column would be an array. And a default btree index on an array column doesn't seem to make a lot of sense ... Commented Sep 28, 2015 at 22:45

1 Answer

It is not possible to do it without locks. You can do it with relatively few short-lived strong locks, though.

The ALTER TABLE you have at the moment will take an exclusive lock for a long while, because it does a full table rewrite.

Instead you'll need to:

  • ALTER TABLE sites ADD COLUMN rtb_id_new varchar[]
  • Create a BEFORE INSERT OR UPDATE ... FOR EACH ROW trigger that sets NEW.rtb_id_new := string_to_array(NEW.rtb_id,' ') for each row.
  • In batches, UPDATE sites SET rtb_id_new = string_to_array(rtb_id,' ')
  • Once all values are populated, VACUUM sites; then ALTER TABLE sites ALTER COLUMN rtb_id_new SET NOT NULL. This will take an exclusive lock for long enough to do a sequential scan, so it's not going to be super-fast. On PostgreSQL 9.5 the lock taken is weaker and won't stop SELECTs.
  • Build your indexes CONCURRENTLY
  • ALTER TABLE sites DROP COLUMN rtb_id; ALTER TABLE sites RENAME COLUMN rtb_id_new TO rtb_id;
  • If you need to add any UNIQUE constraints, add them with ALTER TABLE ... ADD CONSTRAINT ... UNIQUE USING INDEX so they reuse the indexes already built, minimising lock durations.
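A sketch of the whole sequence in SQL, under some assumptions the answer doesn't spell out: values in rtb_id are space-separated, and the new index is GIN (the usual choice for array containment queries; a plain btree on an array is rarely useful). Trigger, function, and index names here are made up:

```sql
-- 1. New column: no table rewrite, only a brief lock.
ALTER TABLE sites ADD COLUMN rtb_id_new varchar[];

-- 2. Trigger keeps rows written during the backfill in sync.
CREATE OR REPLACE FUNCTION sites_fill_rtb_id_new() RETURNS trigger AS $$
BEGIN
    NEW.rtb_id_new := string_to_array(NEW.rtb_id, ' ');
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER sites_fill_rtb_id_new_trg
    BEFORE INSERT OR UPDATE ON sites
    FOR EACH ROW EXECUTE PROCEDURE sites_fill_rtb_id_new();

-- 3. Backfill old rows in batches, each batch in its own transaction.

-- 4. Once everything is populated:
VACUUM sites;
ALTER TABLE sites ALTER COLUMN rtb_id_new SET NOT NULL;

-- 5. Index without blocking writes.
CREATE INDEX CONCURRENTLY rtb_ids_search ON sites USING gin (rtb_id_new);

-- 6. Swap the columns; drop the trigger first, since it reads rtb_id.
DROP TRIGGER sites_fill_rtb_id_new_trg ON sites;
ALTER TABLE sites DROP COLUMN rtb_id;
ALTER TABLE sites RENAME COLUMN rtb_id_new TO rtb_id;
```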

This isn't totally lock-free. In particular the NOT NULL constraint will hurt, because PostgreSQL doesn't (yet) know how to add a NOT NULL constraint as NOT VALID and then validate it.
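As a partial workaround (a sketch; verify the behaviour on your version), a CHECK constraint can be added as NOT VALID and validated later; the validation step does a full scan but takes only a SHARE UPDATE EXCLUSIVE lock, so reads and writes keep running:

```sql
-- Brief lock only: existing rows are not checked yet.
ALTER TABLE sites
    ADD CONSTRAINT rtb_id_new_not_null CHECK (rtb_id_new IS NOT NULL) NOT VALID;

-- Full scan, but concurrent reads and writes are allowed.
ALTER TABLE sites VALIDATE CONSTRAINT rtb_id_new_not_null;
```

Note this is not identical to a real NOT NULL column marking (it can't serve as the basis for a primary key, for example), but it enforces the same rule on new writes immediately.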


2 Comments

How would looping through every row in the rtb_id column, applying string_to_array, and inserting into rtb_ids work? Just edited my question; I'm not sure how I would do that part of the query.
@FranGoitia Do UPDATEs with ranges of keys in the WHERE clause. A simple script will be useful. Don't use plpgsql because you need them to be separate transactions and vacuum between them to have any benefit. If the table is small (less than a few hundred thousand rows) just do one UPDATE for the whole lot.
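For illustration, that batched approach might look like this from an autocommit psql session (the integer id key column and the 50 000-row batch size are assumptions):

```sql
-- Under autocommit, each statement is its own transaction.
UPDATE sites SET rtb_id_new = string_to_array(rtb_id, ' ')
WHERE  id BETWEEN 1 AND 50000 AND rtb_id_new IS NULL;
VACUUM sites;

UPDATE sites SET rtb_id_new = string_to_array(rtb_id, ' ')
WHERE  id BETWEEN 50001 AND 100000 AND rtb_id_new IS NULL;
VACUUM sites;
-- ... continue over the remaining key ranges up to max(id).
```

The rtb_id_new IS NULL guard makes each batch safe to re-run, and vacuuming between batches lets dead row versions be reused so the table doesn't bloat.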
