MySQL query optimization with join and subquery

Question

Can someone please help with this? According to the slow log, the query below takes 11 seconds to run and is eating up server resources. How do I rewrite this query so that it runs faster?

P.S: The tables are indexed.

The query :

SELECT SUM(the_val) AS value
FROM
  (SELECT DISTINCT basic_data.id,
                   att2.the_val
   FROM province_create
   INNER JOIN basic_data ON province_create.province = basic_data.province
   INNER JOIN att2 ON att2.church_id = basic_data.id
   WHERE province_create.block = 0
     AND att2.month = 'Feb'
     AND att2.year = '2017'
     AND basic_data.parish = 1
     AND att2.report = 'ATTENDANCE'
     AND province_create.disable = 0 ) t1;

The EXPLAIN report:

id | select_type | table           | type | possible_keys           | key | key_len | ref                                   | rows  | Extra
1  | PRIMARY     |                 | ALL  |                         |     |         |                                       | 38339 |
2  | DERIVED     | province_create | ALL  | kk,province,kkk         |     |         |                                       | 261   | Using where; Using temporary
2  | DERIVED     | basic_data      | ref  | PRIMARY,kk,kkk,k,parish | kk  | 56      | databaseuser.province_create.province | 39    | Using index; Distinct
2  | DERIVED     | att2            | ref  | indpull,mmm             | mmm | 57      | databaseuser.basic_data.id            | 1     | Using where; Distinct

(a) How many records are involved? (b) Would it make a difference if you didn't include basic_data.id in your subquery?
– Manngo, Commented Apr 9, 2017 at 11:34
Please show a sample of the original data. Why is SELECT DISTINCT needed?
– Gordon Linoff, Commented Apr 9, 2017 at 11:40
@Manngo. (a) The province_create table - about 300 records. The b
– uzor, Commented Apr 9, 2017 at 11:42
@uzor Is basic_data.id a primary key? There's usually not much point in including a primary key in a SELECT DISTINCT clause, as it's already distinct. Also, did you really want att2.the_val to be distinct? I have no idea what the value means, but you're excluding multiple occurrences of the value. In other words, why are they distinct?
– Manngo, Commented Apr 9, 2017 at 11:51
@Manngo (a) The province_create table has about 300 records, the basic_data table about 50,000 records, and the att2 table almost 2 million records. (b) The SELECT DISTINCT basic_data.id is needed because some ids in the att2 table appear more than once. When I took it out, it gave me a different (incorrect) result.
– uzor, Commented Apr 9, 2017 at 11:54

Gordon Linoff · Accepted Answer · 2017-04-09 11:45:33Z

1

First, let me assume that SELECT DISTINCT is not needed. Then the query can be written as:

SELECT SUM(a.the_val)
FROM province_create pc INNER JOIN
     basic_data bd
     ON pc.province = bd.province INNER JOIN
     att2 a
     ON a.church_id = bd.id
WHERE pc.block = 0 AND
      a.month = 'Feb' AND
      a.year = '2017' AND
      bd.parish = 1 AND
      a.report = 'ATTENDANCE' AND
      pc.disable = 0 ;

Second, you should try indexes on the tables. It is hard to tell what the best index would be, so try adding the following:

att2(year, month, report, church_id, the_val)
basic_data(id, province, parish)
province_create(province, disable)

These indexes should help even if the SELECT DISTINCT is needed. However, for best performance you need to understand why you are getting duplicates and fix the root cause of that problem.
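
As a rough sketch, the suggested indexes could be created as shown here. The index names (idx_att2_period and so on) are made up for illustration; only the table names and column lists come from the suggestion above. Whether the optimizer actually uses them depends on the data, so it is worth re-running EXPLAIN afterwards.

```sql
-- Index names are hypothetical; pick whatever fits your naming scheme.
-- `year` and `month` are quoted only as a precaution, since they are also MySQL keywords.
CREATE INDEX idx_att2_period
    ON att2 (`year`, `month`, report, church_id, the_val);

CREATE INDEX idx_basic_data_lookup
    ON basic_data (id, province, parish);

CREATE INDEX idx_province_create_lookup
    ON province_create (province, disable);

-- Check the new plan for the rewritten query.
EXPLAIN
SELECT SUM(a.the_val)
FROM province_create pc
INNER JOIN basic_data bd ON pc.province = bd.province
INNER JOIN att2 a ON a.church_id = bd.id
WHERE pc.block = 0
  AND pc.disable = 0
  AND bd.parish = 1
  AND a.`year` = '2017'
  AND a.`month` = 'Feb'
  AND a.report = 'ATTENDANCE';
```

The att2 index starts with the three equality filters and ends with church_id and the_val, so the join column and the summed value can be read from the index itself.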

3 Comments

uzor Over a year ago

Thanks, it improved performance significantly. However, the issue is that in the att2 table the church_id (which is a foreign key to basic_data.id) is not unique, i.e. there are times when a particular combination of church_id, month, year and report appears more than once. The value I am getting from your solution is higher than it should be. So how can I (a) make sure that it does not double-sum, or (b) is there a way I can search for rows in the att2 table that are duplicates (based on basic_data.id) and delete them? Thanks, your help is appreciated.

Gordon Linoff Over a year ago

@uzor . . . Does the index work on your original query?

uzor Over a year ago

Yes, it worked on the original query but worked better with your solution. Right now I am researching ways of deleting duplicate entries in a MySQL table so that I can adopt your solution.
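
For the duplicate problem raised in these comments, one possible approach is sketched below. It is not from the answer itself and rests on assumptions that are not confirmed in the thread: that duplicate att2 rows for the same church, month, year and report carry the same the_val, that basic_data.id is unique, and that each province appears only once in province_create. The alias per_church and the column name copies are made up for illustration.

```sql
-- (b) List the combinations that occur more than once in att2.
SELECT church_id, `month`, `year`, report, COUNT(*) AS copies
FROM att2
GROUP BY church_id, `month`, `year`, report
HAVING COUNT(*) > 1;

-- (a) Sum without double counting: collapse att2 to one row per church first.
-- Assumption: duplicates hold the same the_val, so MAX() simply picks that value.
-- If duplicates can hold different values, this is NOT equivalent to the
-- original SELECT DISTINCT, which would have counted each distinct value.
SELECT SUM(per_church.the_val) AS value
FROM (
    SELECT a.church_id, MAX(a.the_val) AS the_val
    FROM att2 a
    WHERE a.`year` = '2017'
      AND a.`month` = 'Feb'
      AND a.report = 'ATTENDANCE'
    GROUP BY a.church_id
) per_church
INNER JOIN basic_data bd ON bd.id = per_church.church_id
INNER JOIN province_create pc ON pc.province = bd.province
WHERE bd.parish = 1
  AND pc.block = 0
  AND pc.disable = 0;
```

Running the first query before deleting anything shows how widespread the duplicates are and whether the copies really agree on the_val.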
