I have following query:
SELECT
Sum(fact_individual_re.quality_hours) AS C0,
dim_gender.name AS C1,
dim_date.year AS C2
FROM
fact_individual_re
INNER JOIN dim_date ON fact_individual_re.dim_date_id = dim_date.id
INNER JOIN dim_gender ON fact_individual_re.dim_gender_id = dim_gender.id
GROUP BY dim_date.year,dim_gender.name
ORDER BY dim_date.year ASC,dim_gender.name ASC,Sum(fact_individual_re.quality_hours) ASC
When explaining it's plan, HASH JOIN is taking most time. Is there any way to minimize the time for HASH JOIN:
Sort (cost=190370.50..190370.55 rows=20 width=18) (actual time=4005.152..4005.154 rows=20 loops=1)
Sort Key: dim_date.year, dim_gender.name, (sum(fact_individual_re.quality_hours))
Sort Method: quicksort Memory: 26kB
-> Finalize GroupAggregate (cost=190369.07..190370.07 rows=20 width=18) (actual time=4005.106..4005.135 rows=20 loops=1)
Group Key: dim_date.year, dim_gender.name
-> Sort (cost=190369.07..190369.27 rows=80 width=18) (actual time=4005.100..4005.103 rows=100 loops=1)
Sort Key: dim_date.year, dim_gender.name
Sort Method: quicksort Memory: 32kB
-> Gather (cost=190358.34..190366.54 rows=80 width=18) (actual time=4004.966..4005.020 rows=100 loops=1)
Workers Planned: 4
Workers Launched: 4
-> Partial HashAggregate (cost=189358.34..189358.54 rows=20 width=18) (actual time=3885.254..3885.259 rows=20 loops=5)
Group Key: dim_date.year, dim_gender.name
-> Hash Join (cost=125.17..170608.34 rows=2500000 width=14) (actual time=2.279..2865.808 rows=2000000 loops=5)
Hash Cond: (fact_individual_re.dim_gender_id = dim_gender.id)
-> Hash Join (cost=124.13..150138.54 rows=2500000 width=12) (actual time=2.060..2115.234 rows=2000000 loops=5)
Hash Cond: (fact_individual_re.dim_date_id = dim_date.id)
-> Parallel Seq Scan on fact_individual_re (cost=0.00..118458.00 rows=2500000 width=12) (actual time=0.204..982.810 rows=2000000 loops=5)
-> Hash (cost=78.50..78.50 rows=3650 width=8) (actual time=1.824..1.824 rows=3650 loops=5)
Buckets: 4096 Batches: 1 Memory Usage: 175kB
-> Seq Scan on dim_date (cost=0.00..78.50 rows=3650 width=8) (actual time=0.143..1.030 rows=3650 loops=5)
-> Hash (cost=1.02..1.02 rows=2 width=10) (actual time=0.193..0.193 rows=2 loops=5)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on dim_gender (cost=0.00..1.02 rows=2 width=10) (actual time=0.181..0.182 rows=2 loops=5)
Planning time: 0.609 ms
Execution time: 4020.423 ms
(26 rows)
I am using postgresql v10.