I am writing a SQL query that needs to be highly optimized so that it does not time out, but I have no idea how to optimize the following query any further:

select distinct j.job,f.path,p.path 
from fixes f, jobs j, paths p where f.job=j.id and p.id =f.path 
and (p.path like '//Tools/Web/%' or p.path = '//Tools/Web');

I have created indexes on the following fields (essentially everything):

  • jobs.id
  • jobs.job
  • paths.path
  • paths.id
  • fixes.job
  • fixes.path

Each of the "fixes", "jobs", and "paths" tables has ~50,000 rows, and the current timeout is 6 minutes.

The EXPLAIN command shows the following output, which I am still trying to decipher:

id  select_type  table  type    possible_keys  key      key_len  ref     rows   Extra
1   SIMPLE       j      index   PRIMARY        job      62       (null)  73226  Using index; Using temporary
1   SIMPLE       f      ref     path,job       job      8        j.id    825
1   SIMPLE       p      eq_ref  PRIMARY,path   PRIMARY  8        f.path  1      Using where

The table creation statements for the 'paths' table:

CREATE TABLE `paths` (
   `id` bigint(20) NOT NULL AUTO_INCREMENT,
   `path` varchar(250) NOT NULL,
   PRIMARY KEY (`id`),
   UNIQUE KEY `path` (`path`)
 ) ENGINE=InnoDB  DEFAULT CHARSET=utf8;
  • Please do not ever use implicit joins. They are a SQL antipattern: they are more difficult to maintain and far more subject to accidental cross joins. Commented Oct 12, 2012 at 16:59
  • How long does the query take without the string comparisons on the path? Commented Oct 12, 2012 at 17:06
  • Duration: 0.078 sec, and fetch time is 0.015 sec, without string comparison Commented Oct 12, 2012 at 17:26
  • Can you edit the question and add the CREATE TABLE statements for the 3 tables? Commented Oct 12, 2012 at 17:29
  • And how long does the query take without the DISTINCT? Commented Oct 12, 2012 at 17:31

3 Answers

2

Wouldn't this get the same results?

select distinct j.job,f.path,p.path  
from fixes f
join  jobs j on  f.job=j.id 
join  paths p  on p.id =f.path  
where p.path like '//Tools/Web%' 

OR is almost always a costly feature.

You could also try a UNION query; it is often faster than an OR.

select  j.job,f.path,p.path  
from fixes f
join  jobs j on  f.job=j.id 
join  paths p  on p.id =f.path  
where p.path like '//Tools/Web/%' 
union 
select  j.job,f.path,p.path  
from fixes f
join  jobs j on  f.job=j.id 
join  paths p  on p.id =f.path  
where  p.path = '//Tools/Web';

10 Comments

This would also get '//Tools/WebDesign/'.
Correct. There are ~50,000 paths, and there will be more as time goes on. It is possible to have paths that are "prefixes" of others in this way.
@ypercube, so would his original query (at least in SQL Server it would).
@ChenXie did you try it and see if it returned different results?
@HLGEM Thanks for the discussion. The OR was the culprit in this case. After switching to the UNION version, the problem is solved and the query now performs very well, with a duration of around 0.1 s.
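
The prefix concern raised in these comments can be checked directly. A minimal sketch, assuming MySQL (where LIKE evaluates to 1 or 0); the path `//Tools/WebDesign/x` is a made-up value used only for illustration:

```sql
-- '//Tools/WebDesign/x' is an invented path, not taken from the real data.
SELECT
    '//Tools/WebDesign/x' LIKE '//Tools/Web%'  AS matches_without_slash,  -- 1
    '//Tools/WebDesign/x' LIKE '//Tools/Web/%' AS matches_with_slash,     -- 0
    '//Tools/Web'         LIKE '//Tools/Web/%' AS bare_dir_with_slash;    -- 0
```

So the single-LIKE rewrite is broader than the condition in the question, while the UNION version keeps the original semantics by handling the bare `//Tools/Web` path in its own branch.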
2

Do you need the DISTINCT? Depending on your dataset, you may not require it. You could try rewriting the query without it, and start the WHERE clause with the p.path conditions. You could also try starting from the paths table and joining the other two tables to it.

E.g.

    select j.job,f.fix,p.path 
    from paths p
    join fixes f on (f.path = p.id)
    join jobs j on (f.job = j.id)
    where (p.path like '//Tools/Web/%' or p.path = '//Tools/Web')
    group by j.job, f.fix, p.path

If you need the distinct, the GROUP BY might help. Also, you have two columns called "path" in your original query.

4 Comments

It did improve performance somewhat. The query now takes ~5 min, instead of timing out at 6 min.
Out of curiosity, how long does the query take if you remove the fixes and jobs tables? I.e. SELECT path FROM paths WHERE (path like '//Tools/Web/%') or (path = '//Tools/Web'). It might help you track down the slow table.
Without joining those tables, the query is fast. So here is the thing, without joining tables, the string comparison is fast; without the string comparison, the joining operation is fast. But when it comes together, things just won't work
What happens if you just JOIN fixes and jobs? You could SELECT the path information into a temporary table, then run the join on the results to see where it is slowing down.
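
The temporary-table idea from the last comment might look roughly like this. It is only a sketch, assuming MySQL; `web_paths` is a hypothetical table name, and the approach has not been timed against the asker's data:

```sql
-- Step 1: materialize the matching paths (this filter was reported as fast on its own).
-- web_paths is a hypothetical name chosen for this sketch.
CREATE TEMPORARY TABLE web_paths AS
SELECT id, path
FROM paths
WHERE path LIKE '//Tools/Web/%' OR path = '//Tools/Web';

-- Step 2: join the (presumably small) result to the other two tables.
SELECT DISTINCT j.job, f.path, wp.path
FROM web_paths wp
JOIN fixes f ON f.path = wp.id
JOIN jobs j ON j.id = f.job;

-- Clean up once the result has been read.
DROP TEMPORARY TABLE web_paths;
```

Running the two steps separately also shows which one is slow, which is the point of the suggestion.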
1

Run EXPLAIN on your SQL query to see whether these indexes are actually used by it.

I am sure your indexes are wrong, because 6 minutes is a lot of time for a query.
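
For example, prefixing the statement with EXPLAIN shows the plan MySQL picks. This sketch uses the explicit-join form of the query from the question:

```sql
-- Prints one row per table in the join, in the order MySQL will read them.
EXPLAIN
SELECT DISTINCT j.job, f.path, p.path
FROM fixes f
JOIN jobs j ON f.job = j.id
JOIN paths p ON p.id = f.path
WHERE p.path LIKE '//Tools/Web/%' OR p.path = '//Tools/Web';
```

In the output, the `key` column names the index chosen for each table and `rows` is the optimizer's estimate of how many rows it will examine. In the plan posted in the question, the first row reads `jobs` with type `index` and about 73,226 rows, which is a scan of the entire `job` index before the `paths` filter is applied at all.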
