Optimizing sql query with subselect list in clause

Question

I'm using oracle 11g and trying to optimize a query.

The basic structure of the query is:

SELECT val1, val2, val3,
FROM 
table_name
WHERE
val1 in (subselect statement is here, it selects a list of possible values for 
    val1 from another table) 
and val5>=X and val5<=Y
group by val1
order by val2 desc;

My issue is that when I use a subselect, the cost is 3130. If I fill in the results of the subselect by hand - so, for example

field1 in (1, 2, 3, 4, 5, 6)

Where (1, 2, 3, 4, 5, 6) is the results of the subselect, which in this case is all possible values of field 1, the cost of the query is 14, and oracle uses an "inlist iterator" for the group by part of the query. The results of the two queries are identical.

My question is how to mimic the behaviour of manually listing the possible values of field1 with a subselect statement. The reason I don't list those values in the query is that the possible values change based on one of the other fields, so the subselect is pulling the possible values of field1 from a 2nd table based on, say, field2.

I have an index of val1, val5, so it isn't doing any full table scans - it does do a range scan in both cases, but in the subselect case the range scan is much more expensive. However it isn't the most expensive part of the subselect query. The most expensive part is the group by, which is a HASH.

Edit - Yes, the query isn't syntactically correct - I didn't want to put up anything too specific. The actual query is fine - the selects use valid group by functions.

The subselect returns 6 values, but it can be anywhere from 1-50 or so based on the other value.

Edit2 - What I ended up doing was 2 separate queries so I could generate the list used in the subselect. I actually tried a similar test in sqlite, and it does the same thing, so this isn't just Oracle.

Your query is not syntactically correct. You have val2 and val3 in the select statement, but these are not i the group by statement. — Gordon Linoff
– Gordon Linoff, Commented Nov 9, 2012 at 22:30
Use an INNER JOIN instead of a subquery and compare the results. IT should perform much better. — Eli Gassert
– Eli Gassert, Commented Nov 9, 2012 at 22:31
how many values subselect returns? How much percentage of data in column val1 those values cover? — sufleR
– sufleR, Commented Nov 9, 2012 at 22:33
@EliGassert: an INNER JOIN might return a different result than an IN expression. Those are not always equivalent. — user330315
– user330315, Commented Nov 9, 2012 at 23:08
In this case an inner join would give too many (duplicate) results. Sure. That's easily fixed and tested/profiled. It still warrants a test imo — Eli Gassert
– Eli Gassert, Commented Nov 10, 2012 at 0:19

DazzaL · Accepted Answer · 2012-11-09 23:37:01Z

what you are seeing is a result of the IN () bieng subject to bind variable peeking. when you have histograms you write a query like "where a = 'a'" oracle will use the histogram to guess how many rows will be returned (same idea with an inlist operator, which iterates for each item and aggregates rows). if no histograms it will make a guess in the form of rows/distinct values. In a subquery oracle doesn't do this (in most cases..there is a unique case where it does).

for example:

SQL> create table test
  2  (val1 number, val2 varchar2(20), val3 number);

Table created.

Elapsed: 00:00:00.02
SQL>
SQL> insert into test select 1, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100;

100 rows created.

Elapsed: 00:00:00.01
SQL> insert into test select 2, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 1000;

1000 rows created.

Elapsed: 00:00:00.02
SQL> insert into test select 3, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100;

100 rows created.

Elapsed: 00:00:00.00
SQL> insert into test select 4, 'aaaaaaaaaa', mod(rownum, 5) from dual connect by level <= 100000;

100000 rows created.

so i have a table with 101200 rows. for VAL1 , 100 are "1" 1000 are "2" 100 are "3" and 100k are "4".

now if histograms are gathered (and we do want them in this case)

SQL> exec dbms_stats.gather_table_stats(user , 'test', degree=>4, method_opt=>'for all indexed columns size 4', estimate_percent=>100);

SQL> exec dbms_stats.gather_table_stats(user , 'lookup', degree=>4, method_opt =>'for all indexed columns size 3', estimate_percent=>100);

we see the following:

SQL> explain plan for select * from test where val1 in (1, 2, 3) ;

Explained.

SQL> @explain ""

Plan hash value: 3165434153

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |  1200 | 19200 |    23   (0)| 00:00:01 |
|   1 |  INLIST ITERATOR             |       |       |       |            |          |
|   2 |   TABLE ACCESS BY INDEX ROWID| TEST  |  1200 | 19200 |    23   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | TEST1 |  1200 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

vs

SQL> explain plan for select * from test where val1 in (select id from lookup where str = 'A') ;

Explained.

SQL> @explain ""

Plan hash value: 441162525

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         | 25300 |   518K|   106   (3)| 00:00:02 |
|   1 |  NESTED LOOPS                |         | 25300 |   518K|   106   (3)| 00:00:02 |
|   2 |   TABLE ACCESS BY INDEX ROWID| LOOKUP  |     1 |     5 |     1   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | LOOKUP1 |     1 |       |     0   (0)| 00:00:01 |
|*  4 |   TABLE ACCESS FULL          | TEST    | 25300 |   395K|   105   (3)| 00:00:02 |
----------------------------------------------------------------------------------------

where lookup table is

SQL> select * From lookup;

        ID STR
---------- ----------
         1 A
         2 B
         3 C
         4 D

(str is unique indexed and has histograms).

notice a bang on cardinality of 1200 for the inlist and a good plan, but a wildly inaccurate one on the sub query? Oracle hasn't computed histograms on the join condition, instead it has said "look, i dont know what id will be, so ill guess total rows(100k+1000+100+100)/distinct values(4) = 25300 and use that. as such its picked a full table scan.

that's all great, but how to fix it? if you know that this sub query will match a small number of rows (we do). then you have to hint the outer query to try to have it use an index. like:

SQL> explain plan for select /*+ index(t) */ * from test t where val1 in (select id from lookup where str = 'A') ;

Explained.

SQL> @explain

Plan hash value: 702117913

----------------------------------------------------------------------------------------
| Id  | Operation                    | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |         | 25300 |   518K|   456   (1)| 00:00:06 |
|   1 |  NESTED LOOPS                |         | 25300 |   518K|   456   (1)| 00:00:06 |
|   2 |   TABLE ACCESS BY INDEX ROWID| LOOKUP  |     1 |     5 |     1   (0)| 00:00:01 |
|*  3 |    INDEX UNIQUE SCAN         | LOOKUP1 |     1 |       |     0   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID| TEST    | 25300 |   395K|   455   (1)| 00:00:06 |
|*  5 |    INDEX RANGE SCAN          | TEST1   | 25300 |       |    61   (2)| 00:00:01 |
----------------------------------------------------------------------------------------

another thing is in my particular case. as val1=4 is most of the table, lets say i have my standard query: select * from test t where val1 in (select id from lookup where str = :B1);

for the possible :B1 inputs. if i know that the valid values passed in are A, B and C (ie not D which maps to id=4) . i can add this trick:

SQL> explain plan for select  * from test t where val1 in (select id from lookup where str = :b1 and id in (1, 2, 3)) ;

Explained.

SQL> @explain ""

Plan hash value: 771376936

--------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                  |   250 |  5250 |    24   (5)| 00:00:01 |
|*  1 |  HASH JOIN                    |                  |   250 |  5250 |    24   (5)| 00:00:01 |
|*  2 |   VIEW                        | index$_join$_002 |     1 |     5 |     1 (100)| 00:00:01 |
|*  3 |    HASH JOIN                  |                  |       |       |            |          |
|*  4 |     INDEX RANGE SCAN          | LOOKUP1          |     1 |     5 |     0   (0)| 00:00:01 |
|   5 |     INLIST ITERATOR           |                  |       |       |            |          |
|*  6 |      INDEX UNIQUE SCAN        | SYS_C002917051   |     1 |     5 |     0   (0)| 00:00:01 |
|   7 |   INLIST ITERATOR             |                  |       |       |            |          |
|   8 |    TABLE ACCESS BY INDEX ROWID| TEST             |  1200 | 19200 |    23   (0)| 00:00:01 |
|*  9 |     INDEX RANGE SCAN          | TEST1            |  1200 |       |     4   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------

now notice oracle has got a reasonable card (its pushed the 1,2,3 onto the TEST table and got 1200..not 100% accurate, as i was only filtering on noe of them but ive told oralce CERTAINLY NOT 4!

Thanks for the detailed response. I'm going to end up generating the list for the subselect manually and adding it to the query, since it's so much faster this way. It takes my main query from 45 seconds to < 1s.

sufleR · Accepted Answer · 2012-11-09 23:11:52Z

2

I have done some research and I think everything is explained here: oracle docs.
Just look in "How the CBO Evaluates IN-List Iterators" and compare it to "How the CBO Evaluates the IN Operator".

Your query with "field1 in (1, 2, 3, 4, 5, 6)" is matching first case but query with subselect is rewritten by Oracle.

So every query with subselect or join will have similar cost to yours unless you find very tricky way to put return from subquery as parameters.

You can always try to set more memory to sorts.

edited Nov 9, 2012 at 23:11

answered Nov 9, 2012 at 23:05

sufleR

2,99320 silver badges31 bronze badges

Comments

Gordon Linoff · Accepted Answer · 2012-11-09 22:44:53Z

1

You might be able to fix the statement by adding indexes on the subselect. However, you would have to post the query and execution plan to understand that. By the way, how long does the subselect itself take?

You can try one of the following two versions:

select val1, val2, val3
from table_name join
     (select distinct val from (subselect here)) t
     on table_name.val1 = t.val
where val5>=X and val5<=Y
group by val1, val2, val3
order by val2 desc;

or:

select val1, val2, val3
from table_name
where val5>=X and val5<=Y and
      exists (select 1 from (subselect here) t where t.val = table_name.val1)
group by val1, val2, val3
order by val2 desc;

These are semantically equivalent, and one of them might optimize better.

One other possibility that might work is to do the filtering after the group by. Something like:

select t.*
from (select val1, val2, val3
      from table_name
      where val5>=X and val5<=Y and
      group by val1, val2, val3
     ) t
where val1 in (subselect here)
order by val2 desc;

edited Nov 9, 2012 at 22:44

answered Nov 9, 2012 at 22:34

Gordon Linoff

1.3m62 gold badges706 silver badges857 bronze badges

5 Comments

user1813522 Over a year ago

The cost of the subselect is 2, it has an index. For your first example, the cost is the same as using a subselect. The 2nd way has a cost of 3128 instead of 3130.

Gordon Linoff Over a year ago

Oracle would appear to be choosing the wrong algorithm for joining the tables. You could update table statistics to see if that helps.

user1813522 Over a year ago

Unfortunately I can't depend on the table statistics being correct. Is there a way to hint oracle to force it to do what I want? I looked around google for a hint to force oracle to use an inlist interator but I didn't see one. Maybe I need to select into a varray?

Gordon Linoff Over a year ago

I would suggest the USE_HASH or NO_USE_NL hints (docs.oracle.com/cd/B12037_01/server.101/b10752/…). These are join hints for tables, and the problem is probably that the resulting query is doing a nested loop join.

user1813522 Over a year ago

It's doing a HASH JOIN for the subselect. If I can't figure out a way to do this in the SQL statement, I'll just run two separate statements into the DB and build the 2nd one with the results from the first.

Guillaume Outters · Accepted Answer · 2024-09-10 09:18:21Z

0

Since Oracle 10g R2 you can use the precompute_subquery hint:

WHERE val1 in (SELECT /*+ precompute_subquery */ …)

See https://stackoverflow.com/a/68492163/1346819 for a great explanation including this hint and all cautions on its use.

answered Sep 10, 2024 at 9:18

Guillaume Outters

7,5121 gold badge22 silver badges28 bronze badges

Collectives™ on Stack Overflow

Optimizing sql query with subselect list in clause

4 Answers 4

1 Comment

Comments

5 Comments

Comments

Your Answer

Linked

Hot Network Questions

Collectives™ on Stack Overflow

4 Answers 4

1 Comment

Comments

5 Comments

Comments

Your Answer

Sign up or log in

Post as a guest

Linked

Related