what you are seeing is a result of the IN () bieng subject to bind variable peeking. when you have histograms you write a query like "where a = 'a'" oracle will use the histogram to guess how many rows will be returned (same idea with an inlist operator, which iterates for each item and aggregates rows). if no histograms it will make a guess in the form of rows/distinct values.
In a subquery oracle doesn't do this (in most cases..there is a unique case where it does).
for example:
SQL> create table test
2 (val1 number, val2 varchar2(20), val3 number);
Table created.
Elapsed: 00:00:00.02
SQL>
SQL> insert into test select 1, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100;
100 rows created.
Elapsed: 00:00:00.01
SQL> insert into test select 2, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 1000;
1000 rows created.
Elapsed: 00:00:00.02
SQL> insert into test select 3, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100;
100 rows created.
Elapsed: 00:00:00.00
SQL> insert into test select 4, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100000;
100000 rows created.
so i have a table with 101200 rows. for VAL1 , 100 are "1" 1000 are "2" 100 are "3" and 100k are "4".
now if histograms are gathered (and we do want them in this case)
SQL> exec dbms_stats.gather_table_stats(user , 'test', degree=>4, method_opt=>'for all indexed columns size 4', estimate_percent=>100);
SQL> exec dbms_stats.gather_table_stats(user , 'lookup', degree=>4, method_opt =>'for all indexed columns size 3', estimate_percent=>100);
we see the following:
SQL> explain plan for select * from test where val1 in (1, 2, 3) ;
Explained.
SQL> @explain ""
Plan hash value: 3165434153
--------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1200 | 19200 | 23 (0)| 00:00:01 |
| 1 | INLIST ITERATOR | | | | | |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST | 1200 | 19200 | 23 (0)| 00:00:01 |
|* 3 | INDEX RANGE SCAN | TEST1 | 1200 | | 4 (0)| 00:00:01 |
--------------------------------------------------------------------------------------
vs
SQL> explain plan for select * from test where val1 in (select id from lookup where str = 'A') ;
Explained.
SQL> @explain ""
Plan hash value: 441162525
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25300 | 518K| 106 (3)| 00:00:02 |
| 1 | NESTED LOOPS | | 25300 | 518K| 106 (3)| 00:00:02 |
| 2 | TABLE ACCESS BY INDEX ROWID| LOOKUP | 1 | 5 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | LOOKUP1 | 1 | | 0 (0)| 00:00:01 |
|* 4 | TABLE ACCESS FULL | TEST | 25300 | 395K| 105 (3)| 00:00:02 |
----------------------------------------------------------------------------------------
where lookup table is
SQL> select * From lookup;
ID STR
---------- ----------
1 A
2 B
3 C
4 D
(str is unique indexed and has histograms).
notice a bang on cardinality of 1200 for the inlist and a good plan, but a wildly inaccurate one on the sub query? Oracle hasn't computed histograms on the join condition, instead it has said "look, i dont know what id will be, so ill guess total rows(100k+1000+100+100)/distinct values(4) = 25300 and use that. as such its picked a full table scan.
that's all great, but how to fix it? if you know that this sub query will match a small number of rows (we do). then you have to hint the outer query to try to have it use an index. like:
SQL> explain plan for select /*+ index(t) */ * from test t where val1 in (select id from lookup where str = 'A') ;
Explained.
SQL> @explain
Plan hash value: 702117913
----------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 25300 | 518K| 456 (1)| 00:00:06 |
| 1 | NESTED LOOPS | | 25300 | 518K| 456 (1)| 00:00:06 |
| 2 | TABLE ACCESS BY INDEX ROWID| LOOKUP | 1 | 5 | 1 (0)| 00:00:01 |
|* 3 | INDEX UNIQUE SCAN | LOOKUP1 | 1 | | 0 (0)| 00:00:01 |
| 4 | TABLE ACCESS BY INDEX ROWID| TEST | 25300 | 395K| 455 (1)| 00:00:06 |
|* 5 | INDEX RANGE SCAN | TEST1 | 25300 | | 61 (2)| 00:00:01 |
----------------------------------------------------------------------------------------
another thing is in my particular case. as val1=4 is most of the table, lets say i have my standard query:
select * from test t where val1 in (select id from lookup where str = :B1);
for the possible :B1 inputs. if i know that the valid values passed in are A, B and C (ie not D which maps to id=4) . i can add this trick:
SQL> explain plan for select * from test t where val1 in (select id from lookup where str = :b1 and id in (1, 2, 3)) ;
Explained.
SQL> @explain ""
Plan hash value: 771376936
--------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 250 | 5250 | 24 (5)| 00:00:01 |
|* 1 | HASH JOIN | | 250 | 5250 | 24 (5)| 00:00:01 |
|* 2 | VIEW | index$_join$_002 | 1 | 5 | 1 (100)| 00:00:01 |
|* 3 | HASH JOIN | | | | | |
|* 4 | INDEX RANGE SCAN | LOOKUP1 | 1 | 5 | 0 (0)| 00:00:01 |
| 5 | INLIST ITERATOR | | | | | |
|* 6 | INDEX UNIQUE SCAN | SYS_C002917051 | 1 | 5 | 0 (0)| 00:00:01 |
| 7 | INLIST ITERATOR | | | | | |
| 8 | TABLE ACCESS BY INDEX ROWID| TEST | 1200 | 19200 | 23 (0)| 00:00:01 |
|* 9 | INDEX RANGE SCAN | TEST1 | 1200 | | 4 (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------
now notice oracle has got a reasonable card (its pushed the 1,2,3 onto the TEST table and got 1200..not 100% accurate, as i was only filtering on noe of them but ive told oralce CERTAINLY NOT 4!
selectstatement, but these are not i thegroup bystatement.INNER JOINmight return a different result than anINexpression. Those are not always equivalent.