I have a big SQL query with many computed columns in the SELECT list. There is also an ORDER BY on one of those computed columns and a LIMIT of only 100 rows. But Postgres calculates all the columns for every row, not just for the 100 rows that are returned.
Let me explain with an example.
Let's create a test table:
CREATE TABLE test_main(col1 INTEGER);
And fill it with some random data:
DO
$do$
BEGIN
    FOR r IN 1..100000 LOOP
        INSERT INTO test_main(col1) VALUES (trunc(random()*1000));
    END LOOP;
END
$do$;
Then create some additional tables:
CREATE TABLE test_main_agg1(
col1 INTEGER,
val INTEGER
);
CREATE TABLE test_main_agg2(
col1 INTEGER,
val INTEGER
);
And fill them too:
DO
$do$
DECLARE
    r test_main%rowtype;
BEGIN
    FOR r IN SELECT * FROM test_main LOOP
        FOR i IN 1..5 LOOP
            INSERT INTO test_main_agg1(col1, val) VALUES (r.col1, trunc(random()*1000));
            INSERT INTO test_main_agg2(col1, val) VALUES (r.col1, trunc(random()*1000));
        END LOOP;
    END LOOP;
END
$do$;
And, of course, create some indexes:
CREATE INDEX test_main_indx ON test_main(col1);
CREATE INDEX test_main_agg1_val_indx ON test_main_agg1(col1,val);
CREATE INDEX test_main_agg2_val_indx ON test_main_agg2(col1,val);
Now, if we execute this query:
SELECT col1,
(SELECT MAX(val) FROM test_main_agg1 g WHERE g.col1=m.col1) max_val1,
(SELECT MAX(val) FROM test_main_agg2 g WHERE g.col1=m.col1) max_val2
FROM test_main m
LIMIT 100;
It will be very fast because of the indexes. If we add ORDER BY col1, it is still fast. But if we use ORDER BY max_val1, it takes about 2 seconds.
If we run EXPLAIN ANALYZE on the query with `ORDER BY max_val1`, we will see these rows:
SubPlan 4
-> Result (cost=4.06..4.07 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=100000)
InitPlan 3 (returns $3)
-> Limit (cost=0.42..4.06 rows=1 width=4) (actual time=0.010..0.010 rows=1 loops=100000)
-> Index Only Scan Backward using test_main_agg2_val_indx on test_main_agg2 g_1 (cost=0.42..1818.25 rows=500 width=4) (actual time=0.010..0.010 rows=1 loops=100000)
Index Cond: ((col1 = m.col1) AND (val IS NOT NULL))
Heap Fetches: 100000
This means that Postgres calculates max_val2 for all 100,000 rows, not only for the final 100. I understand why Postgres needs to calculate max_val1 for every row (it sorts by that column), but not why it calculates max_val2 before the sort and limit.
Are there any hints, or something similar, to tell Postgres to calculate such columns only after it has executed the ordering and the limit?
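One workaround sketch (not a hint, just a query rewrite): compute only the ordering column in a subquery together with the ORDER BY and LIMIT, and compute the remaining columns in the outer query. PostgreSQL does not flatten a subquery that contains a LIMIT, so max_val2 should then be evaluated only for the 100 surviving rows:

```sql
-- Inner query: compute only max_val1, sort, and cut to 100 rows.
-- Outer query: compute max_val2 just for those 100 rows.
SELECT t.col1,
       t.max_val1,
       (SELECT MAX(val) FROM test_main_agg2 g WHERE g.col1 = t.col1) AS max_val2
FROM (
    SELECT col1,
           (SELECT MAX(val) FROM test_main_agg1 g WHERE g.col1 = m.col1) AS max_val1
    FROM test_main m
    ORDER BY max_val1
    LIMIT 100
) t;
```

Whether this actually avoids the extra work can be checked with EXPLAIN ANALYZE: the SubPlan for test_main_agg2 should show loops=100 instead of loops=100000.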
The DO blocks can be replaced with simple INSERT statements, e.g.:
INSERT INTO test_main(col1) SELECT trunc(random()*1000) FROM generate_series(1, 100000);
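The same set-based approach works for the aggregate tables, assuming the goal is five rows per test_main row (a sketch; a CROSS JOIN against generate_series multiplies each row by five):

```sql
INSERT INTO test_main_agg1(col1, val)
SELECT m.col1, trunc(random()*1000)
FROM test_main m
CROSS JOIN generate_series(1, 5);

INSERT INTO test_main_agg2(col1, val)
SELECT m.col1, trunc(random()*1000)
FROM test_main m
CROSS JOIN generate_series(1, 5);
```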