How can I make a PostgreSQL function run in parallel?

Question

I have thousands of tables with the same structure, and I need to add an index to each of them.

To simplify the work, I wrote a function like this:

CREATE OR REPLACE FUNCTION createIndexTD(
  IN tbl_name information_schema.sql_identifier,
  OUT result BOOLEAN)
  LANGUAGE 'plpgsql' AS
$func$
BEGIN
    EXECUTE format('CREATE INDEX idx_%s_td_time ON %s USING BRIN(trade_day, ABS(EXTRACT(EPOCH FROM (tick_time - brok_time))));',
            tbl_name, tbl_name);
    SELECT TRUE INTO result;
END
$func$;

And then call it in a query like this:

SELECT createIndexTD("table_name")
FROM information_schema."tables"
WHERE table_schema = 'public' AND table_type = 'BASE TABLE'
  AND "table_name" LIKE 'ticks%';

It works correctly but uses only one core of my CPU, even though the machine has 12 cores (24 threads) and I have configured 12 workers in my PostgreSQL instance.

The amount of data is huge.

Is there a way to make this function run in parallel?

In other words, how can I create indexes on multiple different tables concurrently?

Thanks!

------------------------- Updated ---------------------------------

Following the hint from John K. N., I added PARALLEL SAFE and COST 2000 to my function and tried again. Judging by the output of the Linux command top, there was still only one PostgreSQL worker doing the job.

Then I edited my postgresql.conf, switched the force-parallel option (force_parallel_mode) to on, and restarted PostgreSQL.

This time, PostgreSQL 11 refused to run my function and said:

ERROR: cannot execute CREATE INDEX during a parallel operation.

John K. N. · Accepted Answer · 2023-04-21 10:04:56Z

You can add the PARALLEL SAFE parameter to the definition of your function so that the planner is allowed to run it in a parallel plan, and also set the COST parameter high to encourage the planner to choose parallel execution (as pointed out by Laurenz Albe in the comments):

CREATE OR REPLACE FUNCTION createIndexTD(
  IN tbl_name information_schema.sql_identifier,
  OUT result BOOLEAN)
  LANGUAGE 'plpgsql' 
  PARALLEL SAFE -- << add here
  COST 200 -- << include a high cost
AS
$func$
BEGIN
    EXECUTE format('CREATE INDEX idx_%s_td_time ON %s USING BRIN(trade_day, ABS(EXTRACT(EPOCH FROM (tick_time - brok_time))));',
            tbl_name, tbl_name);
    SELECT TRUE INTO result;
END
$func$;

However, there are some restrictions:

PARALLEL

PARALLEL UNSAFE indicates that the function can't be executed in parallel mode and the presence of such a function in an SQL statement forces a serial execution plan. This is the default. PARALLEL RESTRICTED indicates that the function can be executed in parallel mode, but the execution is restricted to parallel group leader. PARALLEL SAFE indicates that the function is safe to run in parallel mode without restriction.

Functions should be labeled parallel unsafe if they modify any database state, or if they make changes to the transaction such as using sub-transactions, or if they access sequences or attempt to make persistent changes to settings (e.g., setval). They should be labeled as parallel restricted if they access temporary tables, client connection state, cursors, prepared statements, or miscellaneous backend-local state which the system cannot synchronize in parallel mode (e.g., setseed cannot be executed other than by the group leader because a change made by another process would not be reflected in the leader). In general, if a function is labeled as being safe when it is restricted or unsafe, or if it is labeled as being restricted when it is in fact unsafe, it may throw errors or produce wrong answers when used in a parallel query. C-language functions could in theory exhibit totally undefined behavior if mislabeled, since there is no way for the system to protect itself against arbitrary C code, but in most likely cases the result will be no worse than for any other function. If in doubt, functions should be labeled as UNSAFE, which is the default.

COST execution_cost

A positive number giving the estimated execution cost for the function, in units of cpu_operator_cost. If the function returns a set, this is the cost per returned row. If the cost is not specified, 1 unit is assumed for C-language and internal functions, and 100 units for functions in all other languages. Larger values cause the planner to try to avoid evaluating the function more often than necessary.

Reference: CREATE FUNCTION (PostgreSQL | Documentation Ver. 15)

Parallel index creation is supported in PostgreSQL but only for B-Tree indexes:

PostgreSQL can build indexes while leveraging multiple CPUs in order to process the table rows faster. This feature is known as parallel index build. For index methods that support building indexes in parallel (currently, only B-tree), maintenance_work_mem specifies the maximum amount of memory that can be used by each index build operation as a whole, regardless of how many worker processes were started. Generally, a cost model automatically determines how many worker processes should be requested, if any.

Reference: CREATE INDEX (PostgreSQL | Documentation Ver. 15)

As ypercubeᵀᴹ pointed out in his comment, you are creating a BRIN index. Creating a BRIN index in parallel is not supported.

See the section quoted above from the CREATE INDEX documentation.
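
Since a BRIN build cannot use parallel workers, and CREATE INDEX is refused inside a parallel operation, one way to keep several cores busy is to issue the statements from several client connections at once, one table per statement. Below is a minimal Python sketch of that idea, not a tested solution for your setup. The names quote_ident, build_index_sql, create_indexes and run_sql are hypothetical. run_sql is assumed to be a callable you supply that opens its own connection (for example with a driver such as psycopg2) and executes a single statement. The dry run at the bottom only prints the generated statements.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_index_sql(table: str) -> str:
    """Build the CREATE INDEX statement from the question for one table."""
    index_name = quote_ident(f"idx_{table}_td_time")
    return (
        f"CREATE INDEX {index_name} ON {quote_ident(table)} "
        "USING BRIN (trade_day, "
        "ABS(EXTRACT(EPOCH FROM (tick_time - brok_time))))"
    )


def create_indexes(
    tables: Iterable[str],
    run_sql: Callable[[str], None],
    workers: int = 4,
) -> List[str]:
    """Run one CREATE INDEX per table on a pool of worker threads.

    run_sql is a hypothetical callable: it is assumed to open its own
    database connection, execute the statement, commit, and close.
    """
    statements = [build_index_sql(t) for t in tables]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every task and re-raises the first failure, if any.
        list(pool.map(run_sql, statements))
    return statements


if __name__ == "__main__":
    # Dry run: nothing is sent to a database, the statements are only printed.
    for stmt in create_indexes(["ticks_2020", "ticks_2021"], lambda s: None):
        print(stmt)
```

Each CREATE INDEX still runs on a single core, so any speed-up comes from building the indexes of several tables at the same time. The worker count of 4 is an assumption to tune against the available I/O and the maintenance_work_mem each build may use.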

If you really want the function to be executed in parallel, you have to mark it PARALLEL SAFE. PARALLEL RESTRICTED is not good enough. Moreover, you should set the COST of the function high, so that the optimizer knows it is an expensive function and that parallel execution could be useful.
– Laurenz Albe, Commented Apr 21, 2023 at 6:27
Thank you, gentlemen! I tried with PARALLEL SAFE, but it looks like only one worker was still doing the job. I confirmed my settings: max_worker_processes = 24, max_parallel_maintenance_workers = 8, max_parallel_workers_per_gather = 24, parallel_leader_participation = on, max_parallel_workers = 24.
– Leon, Commented Apr 21, 2023 at 8:10
I got ERROR: cannot execute CREATE INDEX during a parallel operation...
– Leon, Commented Apr 21, 2023 at 8:22
What happens if you test a CREATE INDEX statement manually outside your function? Does it run in parallel? What version of PostgreSQL are you using? If possible, add the details of your error message to your question via the edit link. Thanks.
– John K. N., Commented Apr 21, 2023 at 8:29
"Parallel index creation is supported in PostgreSQL but only for B-Tree indexes": you are trying to create a BRIN index.
– ypercubeᵀᴹ, Commented Apr 21, 2023 at 9:58
