My database design relies on many functions, and a lot of them are very slow. So I decided to create indexes on some of them to speed up execution. However, I cannot get PostgreSQL (9.6) to actually use my index.
Consider this table "user":
id integer | name jsonb
1 | {"last_names": ["Tester"], "first_names": ["Teddy","Eddy"]}
2 | {"last_names": ["Miller"], "first_names": ["Lisa","Emma"]}
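For anyone who wants to reproduce this, the table above could be created roughly as follows. This is only a sketch: the column types come from the header row, while the PRIMARY KEY constraint is my assumption and not necessarily part of the real schema.

```sql
-- Sketch of the sample table; "user" must be quoted because it is a reserved word.
CREATE TABLE "user" (
    id   integer PRIMARY KEY,  -- assumption: the real table may not declare a key
    name jsonb
);

-- The two sample rows shown above.
INSERT INTO "user" (id, name) VALUES
    (1, '{"last_names": ["Tester"], "first_names": ["Teddy", "Eddy"]}'),
    (2, '{"last_names": ["Miller"], "first_names": ["Lisa", "Emma"]}');
```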
Often I need the name as a single string, which I build with an expression like this:
SELECT array_to_string(jsonb_arr2text_arr(name->'last_names'), ' ') || ', ' || array_to_string(jsonb_arr2text_arr(name->'first_names'), ' ') FROM "user";
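Note that jsonb_arr2text_arr is one of my own helper functions and not a PostgreSQL built-in. Its real definition is not shown here; a hypothetical stand-in that converts a jsonb array into a text[] might look like this, assuming the input is always a JSON array of strings:

```sql
-- Hypothetical stand-in for the helper; the real definition may differ.
CREATE OR REPLACE FUNCTION jsonb_arr2text_arr(arr jsonb)
RETURNS text[] AS
$BODY$
    -- jsonb_array_elements_text expands the JSON array into a set of text values,
    -- and ARRAY(...) collects that set back into a text[].
    SELECT ARRAY(SELECT jsonb_array_elements_text(arr));
$BODY$
LANGUAGE sql IMMUTABLE;
```

It has to be declared IMMUTABLE for any function built on top of it to be honestly usable in an index expression.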
I decided to put that functionality into a function, because it is used on multiple tables:
CREATE OR REPLACE FUNCTION public.concat_name(name jsonb)
RETURNS text AS
$BODY$
SELECT pg_sleep(50);
SELECT array_to_string(jsonb_arr2text_arr(name->'last_names'), ' ') || ', ' || array_to_string(jsonb_arr2text_arr(name->'first_names'), ' ');
$BODY$
LANGUAGE sql IMMUTABLE SECURITY DEFINER
COST 100;
As you can see, I've added an artificial delay to test whether the index is actually used. I then created an index like this:
CREATE INDEX user_concat_name_idx ON "user" (concat_name(name));
which succeeds and takes the expected time (because of the pg_sleep). I then run a query:
SELECT concat_name(name) FROM "user";
However, the index is not used and the query is very slow. EXPLAIN tells me that the planner does a sequential scan (Seq Scan) on "user" instead.
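The plan can be checked along these lines. Plain EXPLAIN only plans the query and does not execute it, so the pg_sleep is not triggered. The enable_seqscan setting is a standard planner option that only discourages sequential scans for the current session; I'm using it here purely as a diagnostic, not as a proposed fix.

```sql
-- Show the chosen plan without running the query.
EXPLAIN SELECT concat_name(name) FROM "user";

-- Diagnostic only: make sequential scans look very expensive for this session,
-- then check whether the planner picks a different plan.
SET enable_seqscan = off;
EXPLAIN SELECT concat_name(name) FROM "user";

-- Restore the default planner behaviour.
RESET enable_seqscan;
```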
I did a little research, and many people state that when the table is small, or when the query retrieves (almost) the whole table, the query planner considers a sequential scan more efficient than an index lookup. However, for functions, especially slow ones, that doesn't make sense to me. Even if you query a table that contains only one row, using a function index could dramatically decrease the execution time if the query includes a function that takes 50 seconds per call.
So, in my opinion, the query planner should compare the time it takes to look up the indexed value with the time it takes to execute the function. The size of the table, or the number of rows the query returns, shouldn't matter here. And if the function takes 50 seconds to execute, looking up the index should always win.
So, what can I do to make the query planner use the index instead of executing the function anew each time?