I'm using Oracle 11g and working on a legacy system. I have two tables, A and B. A has ~40M rows and B ~4M rows. A has a foreign key on B (B is the parent of A).
A query does essentially this:
select count(A.id), sum(A.something1), avg(B.something2), ....
from A inner join B on A.id_b = B.id
where A.filter_column in (...)
group by A.filter_column
The condition on filter_column is such that 5-10% of the rows of table A are selected, and the join picks up 20-25% of the rows of B. The end result is about 100 rows containing data aggregated from the selected rows of A and B.
The query originally contained a hint forcing Oracle to join A to B with nested loops. Removing the hint changes the execution plan from nested loops to a hash join. I expected that to run faster; to my great surprise, it actually runs slower!
Trying to find out why, I collected execution statistics with one of Tom Kyte's scripts ("runstats") and found that the nested loops version of the query did 0 (!) physical reads, while the hash join version did a lot of them. I ran the test many times, alternating which version ran first, but the results were always about the same. Furthermore, the hash join plan produces less logical I/O, yet it is still slower.
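For reference, the hinted version presumably looked something like the sketch below. The exact hint text is my assumption (I don't have the original); LEADING and USE_NL are the standard Oracle hints for forcing this kind of plan:

```sql
-- Sketch of the hinted query (hint text is an assumption, not the original).
-- LEADING(a) makes A the driving table; USE_NL(b) forces a nested loops
-- join into B, probing B's primary key index once per selected row of A.
select /*+ leading(a) use_nl(b) */
       count(a.id), sum(a.something1), avg(b.something2)
from   A a
       inner join B b on a.id_b = b.id
where  a.filter_column in (...)
group  by a.filter_column;
```

Removing the `/*+ ... */` comment is what lets the optimizer switch to the hash join plan.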
Does anybody have an explanation for that?
Thank you!