1

I have kinda complex question. Let's say that I have 7 tables (20mil+ rows each) (Table1, Table2 ...) with corresponding pk (pk1, pk2, ....) (cardinality among all tables is 1:1) I want to get my final table (using hash join) as:

Create table final_table as select   
t1.column1,  
t2.column2,  
t3.column3,  
t4.column4,  
t5.column5,  
t6.column6,  
t7.column7  
from table1 t1  
join table2 t2 on t1.pk1 = t2.pk2  
join table2 t3 on t1.pk1 = t3.pk3   
join table2 t4 on t1.pk1 = t4.pk4     
join table2 t5 on t1.pk1 = t5.pk5  
join table2 t6 on t1.pk1 = t6.pk6  
join table2 t7 on t1.pk1 = t7.pk7

I would like to know if it would be faster to create partial tables and then final table, like this?

Create table partial_table1 as select   
t1.column1,  
t2.column2  
from table1 t1  
join table2 t2 on t1.pk1 = t2.pk2



create table partial_table2 as select   
t1.column1, t1.column2  
t3.column3  
from partial_table1 t1  
join table3 t3 on t1.pk1 = t3.pk3



create table partial_table3 as select  
t1.column1, t1.column2, t1.column3   
t4.column4   
from partial_table1 t1   
join table3 t4 on t1.pk1 = t4.pk4



...  
...  
... 

I know it depends on RAM (because I want to use hash join), actual server usage, etc.. I am not looking for specific answer, I am looking for some explanations why and in what situations would it be better to use partial results or why it would it be better to use all 7 joins in 1 select.
Thanks, I hope that my question is easy to understand.

1
  • If the data in the base tables is updated regularly, one thing to worry about is your derived table data going stale. Commented Mar 16, 2018 at 23:17

2 Answers 2

1

In general, it is not better to create temporary tables. SQL engines have an optimization phase and this optimization phase should do well as figuring out the best query plan.

In the case of a bunch of joins, this is mostly about join order, use of indexes, and the optimal algorithm.

This is a good default attitude. Does it mean that temporary tables are never useful for performance optimization? Not at all. Here are some exceptions:

  • The optimizer generates a suboptimal query plan. In this case, query hints can push the optimizer in the right direction. And, temporary tables can help.
  • Indexing the temporary tables. Sometimes an index on the temporary tables can be a big win for performance. The optimizer might not pick this up.
  • Re-use of temporary tables across queries.

For your particular goal of using hash joins, you can use a query hint to ensure that the optimizer does what you would like. I should note that if the joins are on primary keys, then a hash join might not be the optimal algorithm.

Sign up to request clarification or add additional context in comments.

2 Comments

thanks, this is the answer I was looking for. Yes I am gonna use /*+ use_hash*/. I forgot to mention where condition where indexes on temp tables will help. Great answer
Avoid hints unless you know what you are doing. Let the optimizer do its job first. Make sure you have fresh stats. If this still does not work, then investigate hints.
0

It is not a good idea to create temporary tables in your database. To Optimize your query for reporting purposes or faster results trying using views and it can lead to much better results.

For your specific case, you want to use hash join can you please explain a bit more like why you want to use that in particular because the optimizer will determine the best plan by itself and you don't need to worry about the type of join it performs.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.