Oracle SQL performance optimization

Question

I have two SQL statements that I expected to perform similarly, but SQL 1 takes 0.065 seconds while SQL 2 takes over 10 seconds, with just 8000 records in total. Could anyone help explain this? How can I optimize SQL 2?

SQL 1:

select
    job_id,
    JOB_DESCRIPTION,
    REGEXP_COUNT(JOB_Description, '(ABC|DEF)([[:digit:]]){5}') as occurrences 
from smms.job 
where TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017;

SQL 2:

select job_id, JOB_Description 
from (
    select 
        job_id, 
        JOB_DESCRIPTION,
        REGEXP_COUNT(JOB_Description, '(ABC|DEF)([[:digit:]]){5}') as occurrences 
    from smms.job 
    where TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017
) 
where occurrences > 0;

Although this is somewhat separate from your question, why does "SQL 2" use the subquery? occurrences is not used in the final select list, so I don't see why you don't just use REGEXP_COUNT directly in the WHERE clause, i.e. SELECT job_id, job_description FROM smms.job WHERE TO_NUMBER... = 2017 AND REGEXP_COUNT... > 0
– EdmCoff, Commented Aug 27, 2018 at 4:10
What are the execution plans of the queries? What happens if TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017 is changed to CREATE_DATE >= DATE'2017-01-01' AND CREATE_DATE < DATE'2018-01-01'?
– Andrei Odegov, Commented Aug 27, 2018 at 4:15
The clause you mentioned was my first version, but it also takes over 10 seconds. That's why I tried different clauses to optimize it.
– MemoryLeak, Commented Aug 27, 2018 at 4:26
I guess that version 1 is able to restrict the rather expensive regexp operation to a smaller result set. The execution plans for both versions should show the difference (or maybe a SQL trace with event 10046). Is there an index on create_date?
– Martin Preiss, Commented Aug 27, 2018 at 6:00
SQL 3: select count(*) from smms.job where REGEXP_COUNT(JOB_Description, '(ABC|DEF)([[:digit:]]){5}') > 0 takes 10 seconds as well, which means the performance has nothing to do with the CREATE_DATE filter; it depends wholly on the REGEXP_COUNT clause.
– MemoryLeak, Commented Aug 27, 2018 at 6:13
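
The execution plans asked about in the comments could be obtained roughly as follows. This is only a sketch using EXPLAIN PLAN and DBMS_XPLAN.DISPLAY against SQL 2 as posted; it shows the optimizer's estimated plan, which may differ from the plan actually used at run time.

```sql
-- Store the optimizer's estimated plan for SQL 2 in the plan table.
explain plan for
select job_id, job_description
from (
    select job_id,
           job_description,
           regexp_count(job_description, '(ABC|DEF)([[:digit:]]){5}') as occurrences
    from   smms.job
    where  to_number(to_char(create_date, 'YYYY')) = 2017
)
where occurrences > 0;

-- Display the stored plan. The "Predicate Information" section lists
-- the filter predicates applied at each step of the plan.
select * from table(dbms_xplan.display);
```

Repeating the same two statements for SQL 1 allows the two plans to be compared side by side.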

Martin Preiss · Accepted Answer · 2018-08-27 07:46:29Z

1

Thinking about the information again, I guess the two strategies are:

SQL 1:

1. Filter the rows with TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017.
2. Apply the function REGEXP_COUNT(JOB_Description, '(ABC|DEF)([[:digit:]]){5}') to the resulting rows.

SQL 2:

1. Apply the function REGEXP_COUNT(JOB_Description, '(ABC|DEF)([[:digit:]]){5}') to all rows.
2. Filter the result with TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017.

Since regexp functions are very expensive in Oracle, this could explain the difference in performance.

Version 2 could be optimized with hints, for example with MATERIALIZE if you add a CTE.
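
A minimal sketch of the CTE idea, assuming the table and columns from the question. The CTE name jobs_2017 is made up for illustration, and MATERIALIZE is a widely used hint that, as far as I know, is not part of Oracle's documented hint list, so its behavior is not guaranteed across versions.

```sql
-- jobs_2017 is a hypothetical CTE name.
-- The MATERIALIZE hint asks Oracle to evaluate the CTE once into a
-- temporary result, so the date filter runs before the regexp.
with jobs_2017 as (
    select /*+ materialize */
           job_id,
           job_description
    from   smms.job
    where  to_number(to_char(create_date, 'YYYY')) = 2017
)
select job_id,
       job_description
from   jobs_2017
where  regexp_count(job_description, '(ABC|DEF)([[:digit:]]){5}') > 0;
```

The intent is that REGEXP_COUNT only ever sees the rows that survived the year filter. Whether that actually helps depends on how many rows the filter removes.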


1 Comment

MemoryLeak Over a year ago

Yes, you are correct! If I first apply TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017 in a CTE with /*+ materialize */ and then apply REGEXP_COUNT, performance improves greatly. Without the materialize hint, no matter how I reorganize the filter clauses, the execution plan and processing time stay the same. Thank you so much!

Thomas Strub · Accepted Answer · 2018-08-27 11:20:43Z

0

As Martin pointed out, the issue is the expensive regexp_count function. So the question reduces to:

Why is:

  select * from (
  with dat as (select level lv, rpad('X',500,'X') txt from dual connect by level <= 20000)
  select lv, 
         REGEXP_COUNT(txt, '(ABC|DEF)([[:digit:]]){5}') as occurrences 
  from   dat 
  --where  REGEXP_COUNT(txt, '(ABC|DEF)([[:digit:]]){5}') > 1
  ) where rownum > 1

0.019 seconds and

  select * from (
  with dat as (select level lv, rpad('X',500,'X') txt from dual connect by level <= 20000)
  select lv, 
         REGEXP_COUNT(txt, '(ABC|DEF)([[:digit:]]){5}') as occurrences 
  from   dat 
  where  REGEXP_COUNT(txt, '(ABC|DEF)([[:digit:]]){5}') > 1
  ) where rownum > 1

6.7 seconds? Oracle evaluates regexp_count in both statements, so there must be a difference between how it is evaluated in the WHERE clause and how it is evaluated in the select list.


2 Comments

MemoryLeak Over a year ago

I don't quite understand your point, but rownum > 1 can materialize the SQL, thus improving the performance.

Thomas Strub Over a year ago

rownum > 1 is not there to improve the performance. It is there to measure how long a statement takes without having to fetch to the end of the cursor. Queries 1 and 2 do the same thing, yet query 2 is many times slower.

Stefanos Zilellis · Accepted Answer · 2018-10-18 14:37:05Z

0

In SQL 1, Oracle filters by (TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017) and then executes (REGEXP_COUNT) once per returned row.

In SQL 2, it filters by the result of (REGEXP_COUNT), which means it executes the function against all table rows. Then, on that result, it filters by (TO_NUMBER(to_char(CREATE_DATE,'YYYY')) = 2017).

To prove this, execute SQL 1 without the filter. It will take approximately as much time as SQL 2, maybe a little more.

To optimize, you need to be 100% sure that the SQL 1 filter is applied first. A sure way would be to execute SQL 1 and store the results in a temporary/memory table, then apply the SQL 2 filter to them.
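
The check described above amounts to running SQL 1 with its WHERE clause removed, for example:

```sql
-- SQL 1 without the CREATE_DATE filter: REGEXP_COUNT now has to be
-- evaluated for every row that is fetched from smms.job.
select job_id,
       job_description,
       regexp_count(job_description, '(ABC|DEF)([[:digit:]]){5}') as occurrences
from   smms.job;
```

Note that the client needs to fetch all rows for the timing to be comparable; many tools only fetch the first page of results by default.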
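
One way to sketch the temporary-table approach is with a global temporary table. The table name job_2017_tmp is hypothetical, and this assumes the session has the privilege to create tables in its own schema.

```sql
-- Step 1: apply only the cheap year filter and keep the result.
-- job_2017_tmp is a hypothetical name. ON COMMIT PRESERVE ROWS keeps the
-- rows for the rest of the session (the CREATE itself commits).
create global temporary table job_2017_tmp
on commit preserve rows
as
select job_id,
       job_description
from   smms.job
where  to_number(to_char(create_date, 'YYYY')) = 2017;

-- Step 2: run the expensive regexp only against the pre-filtered rows.
select job_id,
       job_description
from   job_2017_tmp
where  regexp_count(job_description, '(ABC|DEF)([[:digit:]]){5}') > 0;

-- Cleanup: empty the table first so the session no longer holds rows in it,
-- then drop it.
truncate table job_2017_tmp;
drop table job_2017_tmp;
```

This trades an extra write for a guaranteed evaluation order, so it is mainly worthwhile when the first filter removes most of the rows.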
