
I have a very large table, and the following query takes 990 seconds to complete. The bdate and itype columns are indexed. What else can I optimize or change?

SELECT s, count(*) as total
FROM  `mt_ex_15` 
WHERE bdate > '2014-10-01' and bdate < '2014-11-01'
and itype = '3'
group by s
order by total desc

EDIT: Here is the EXPLAIN

id  select_type  table     type  possible_keys  key    key_len  ref    rows      Extra
1   SIMPLE       mt_ex_15  ref   itype,bdate,s  itype  2        const  44157686  Using where; Using temporary; Using filesort

EDIT: I suspect I need to tune my database or my.cnf, because even the following query took 40 seconds:

SELECT count(*) as total
FROM  `mt_ex_15` 
WHERE bdate > '2015-02-01' and bdate < '2015-03-01'

And here is the explain:

id  select_type  table     type   possible_keys  key    key_len  ref   rows     Extra
1   SIMPLE       mt_ex_15  range  bdate          bdate  3        NULL  4494019  Using where; Using index

4 Comments

  • Table definition would be nice. EDIT: As it stands, your query would be best served by an index on (bdate, itype, s). Is your bdate a DATE or a DATETIME? Commented Apr 22, 2015 at 19:40
  • Do you have an index on s? How many distinct values of s are there? Since you are sorting on a calculated value, the sort may still take some time if s has a large number of distinct values. Commented Apr 22, 2015 at 19:41
  • There are 8 distinct values for s, and yes, it is indexed. Commented Apr 22, 2015 at 20:01
  • Consider a multi-column index on (itype, s, bdate). Assuming InnoDB, you need to increase the size of the InnoDB buffer pool if simple COUNT queries against an index are taking that long. Commented Apr 22, 2015 at 20:25
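
The information the commenters ask for can be gathered with a few read-only statements. This is only a sketch: it assumes the table uses InnoDB, which the thread does not confirm, and it changes nothing on the server.

```sql
-- Full table definition, including column types and every existing index.
SHOW CREATE TABLE `mt_ex_15`;

-- Current InnoDB buffer pool size, in bytes.
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Logical read requests versus reads that had to go to disk.
-- A high share of disk reads suggests the working set does not fit in the pool.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
```

If the pool turns out to be small relative to the roughly 50 GB of data mentioned later in the thread, raising innodb_buffer_pool_size in my.cnf is the usual next step. The right value depends on the RAM available on the host, which is not stated here.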

3 Answers


For this query:

SELECT s, count(*) as total
FROM  `mt_ex_15` 
WHERE bdate > '2014-10-01' and bdate < '2014-11-01' and itype = '3'
group by s
order by total desc

The best index is mt_ex_15(itype, bdate, s). The engine should be able to take full advantage of the index for the WHERE clause. In addition, this is a covering index, so the original data does not need to be touched for this query.
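
A minimal sketch of creating that index and checking the resulting plan follows. The index name is hypothetical, and the unquoted constant assumes itype is an integer column, as the note further down suggests.

```sql
-- Hypothetical index name; the column order matches the recommendation above.
CREATE INDEX `mt_ex_15_itype_bdate_s` ON `mt_ex_15` (`itype`, `bdate`, `s`);

-- Re-run EXPLAIN to see whether the optimizer picks the new index.
EXPLAIN
SELECT s, COUNT(*) AS total
  FROM `mt_ex_15`
 WHERE itype = 3
   AND bdate > '2014-10-01'
   AND bdate < '2014-11-01'
 GROUP BY s
 ORDER BY total DESC;
```

If the index is used, the key column should show the new name and Extra should include "Using index". "Using temporary; Using filesort" may still appear, because the rows come back ordered by bdate rather than by s.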

If you had a list of all available "s" values, you could do this as a correlated subquery:

select s.*,
       (select count(*)
        from mt_ex_15 m
        where m.s = s.s and m.itype = 3 and m.bdate > '2014-10-01' and m.bdate < '2014-11-01'
       ) total
from s
having total > 0 -- using a convenient MySQL extension
order by total desc;

The best index for this query is mt_ex_15(s, itype, bdate).

Note: if itype is really an integer, you should remove the quotes around the constant. They are misleading.


1 Comment

+1. For a limited number of values of s, using a correlated subquery to return the count for each value of s is a workable approach. But, as Gordon notes, for best performance we really need a suitable index available. (With the index Gordon suggests, the standalone index on s would be redundant and could be dropped.) @Gordon: we could get the list of distinct values of s from an inline view, e.g. FROM (SELECT DISTINCT s FROM mt_ex_15) s. Also, the query from the OP doesn't include "zero" counts, so we'd also need to add HAVING total > 0 to replicate that behavior.
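
Putting the comment's inline view together with the correlated subquery from the answer gives roughly the following. It is a sketch, not a tested query: the alias d is made up here, and itype = 3 again assumes an integer column.

```sql
SELECT d.s,
       (SELECT COUNT(*)
          FROM `mt_ex_15` m
         WHERE m.s = d.s
           AND m.itype = 3
           AND m.bdate > '2014-10-01'
           AND m.bdate < '2014-11-01') AS total
  FROM (SELECT DISTINCT s FROM `mt_ex_15`) d  -- inline view supplying the list of s values
HAVING total > 0                              -- drop zero counts, as the original query does
 ORDER BY total DESC;
```

With an index whose leading column is s, the DISTINCT can be resolved from the index, and each of the eight subqueries becomes a range scan on (s, itype, bdate).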

Use EXPLAIN to see the execution plan.

Lacking any information about the table, we're really just guessing.

I'd try achieving the specified result like this:

CREATE INDEX `mt_ex_15_IX1` ON `mt_ex_15` (`itype`,`s`,`bdate`);

SELECT t.s
     , SUM(t.bdate > '2014-10-01' AND t.bdate < '2014-11-01') AS `total`
  FROM `mt_ex_15` t
 WHERE t.itype = '3'
 GROUP BY t.s
HAVING `total` > 0
 ORDER BY t.s DESC

Comparing the EXPLAIN output from this and from the original will (likely) show that the two queries are using different execution plans.

FOLLOWUP

With a suitable index, MySQL can avoid an expensive "Using filesort" operation. The index I recommended above will render the index on just the itype column redundant, and that index could be dropped. (Any query that was making use of that index can make use of the new index, since itype is the leading column.)

The recommendation for the new index is based on the query... an equality predicate on itype (make that column the leading column), followed by s since there's a GROUP BY on that column. Including the bdate column in the index means that the query can be satisfied from the index, without a lookup to the underlying data page.

We'd expect the EXPLAIN output "Extra" column to show "Using index", and not show "Using filesort".
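
One way to verify this before dropping anything is sketched below. The index name itype is taken from the key column of the EXPLAIN output in the question; that it covers only the itype column is an assumption, which SHOW INDEX should confirm or refute.

```sql
-- List every index on the table with its columns, to confirm names and contents.
SHOW INDEX FROM `mt_ex_15`;

-- Check the plan for the rewritten query once the new index exists.
EXPLAIN
SELECT t.s
     , SUM(t.bdate > '2014-10-01' AND t.bdate < '2014-11-01') AS `total`
  FROM `mt_ex_15` t
 WHERE t.itype = '3'
 GROUP BY t.s
HAVING `total` > 0
 ORDER BY t.s DESC;

-- Only after the plan shows the new index in use; left commented out on purpose.
-- ALTER TABLE `mt_ex_15` DROP INDEX `itype`;
```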

If adding an index is out of the question, then your best shot at avoiding a "Using filesort" is going to be to make use of an existing index that has column s as the leading column. But that means that the query is going to need to examine every row in the table; if the columns bdate and itype aren't included in the index, then that means an index lookup to every row in the table. But, this may perform faster. Check the output from EXPLAIN for this query:

EXPLAIN
SELECT t.s
     , SUM(t.itype = '3' AND t.bdate > '2014-10-01' AND t.bdate < '2014-11-01') 
       AS `total`
  FROM `mt_ex_15` t
 GROUP BY t.s
HAVING `total` > 0
 ORDER BY t.s DESC

5 Comments

id  select_type  table     type  possible_keys  key    key_len  ref    rows      Extra
1   SIMPLE       mt_ex_15  ref   itype,bdate,s  itype  2        const  44157686  Using where; Using temporary; Using filesort
Adding an index to 50 GB of data with 200M rows will kill me :)
Try doing it on a replica if your DB is in use. What is killing you is the ORDER BY on a derived field: there is no way to index that. The DB dev workaround is to use an intermediate table, a real table. Use a stored procedure. 1. Create a temp table with columns and indexes. 2. Select the grouped results into it. 3. Select from it, using the index for the ORDER BY. If that still does not run in an acceptable time, create 2 tables: one for the select with the WHERE conditions and one for the grouped results. Then select with ORDER BY.
There will be 7-8 different results. I don't need the ORDER BY. I think it is all about the GROUP BY. Of course I am not sure, but I will try it now.
@blacksun: My bad, I missed the ORDER BY total. MySQL is going to need to do a sort operation to get that; but a sort on 8 rows isn't going to take any time. The real problem is the sort operation required by the GROUP BY operation. MySQL can make use of an appropriate index to avoid that sort (it can retrieve rows pre-sorted using the index). To do the "group by s" operation, MySQL needs the rows "in order" by "s". The queries in my answer are designed to trick MySQL into avoiding the expensive sort operation on a boatload of rows.
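
The intermediate-table workaround suggested in the comments above might look like the sketch below. The table and index names are hypothetical. It also departs slightly from the steps as described: the table is created directly from the SELECT so that the unknown type of s does not have to be guessed, and the index is added afterwards.

```sql
-- Steps 1 and 2: materialize the grouped counts (hypothetical table name).
CREATE TEMPORARY TABLE tmp_s_totals AS
SELECT s, COUNT(*) AS total
  FROM `mt_ex_15`
 WHERE itype = 3                     -- assumes itype is an integer column
   AND bdate > '2014-10-01'
   AND bdate < '2014-11-01'
 GROUP BY s;

-- Index the derived column so the final ORDER BY can use it.
ALTER TABLE tmp_s_totals ADD INDEX idx_total (total);

-- Step 3: read the results back in sorted order.
SELECT s, total
  FROM tmp_s_totals
 ORDER BY total DESC;

DROP TEMPORARY TABLE tmp_s_totals;
```

As the last comment points out, sorting about 8 result rows costs next to nothing, so this mainly pays off when the grouped result is large. The expensive part here is still the GROUP BY over the base table.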

GROUP BY s ORDER BY total -- You are stuck with at least one "filesort". Depending on various things, the sort may actually be in RAM.

An off-the-wall suggestion:

  • Change to GROUP BY itype, s -- The unnecessary field in the GROUP BY may lead to a better EXPLAIN.
  • INDEX(itype, s, bdate) -- in that order

If you are using MySQL 5.6.16 or later, ALTER TABLE ... ALGORITHM = INPLACE will be less invasive.
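
A sketch of what that statement could look like for the index suggested above. The index name is borrowed from the other answer, and LOCK = NONE is an extra request that is not part of the suggestion here.

```sql
-- Build the index in place while allowing concurrent reads and writes.
-- If the server cannot honor ALGORITHM or LOCK as requested, the statement
-- fails with an error instead of silently falling back to a table copy.
ALTER TABLE `mt_ex_15`
  ADD INDEX `mt_ex_15_IX1` (`itype`, `s`, `bdate`),
  ALGORITHM = INPLACE,
  LOCK = NONE;
```

On a table of this size the build will still take a long time and generate heavy I/O, so running it off-peak or on a replica first, as suggested in the comments on the other answer, remains sensible.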

If bdate is a DATE, then bdate > '2014-10-01' eliminates Oct. 1; was that intentional?
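
If the intent was to cover the whole of October 2014, a half-open range would include the first day of the month. This sketch assumes that intent, and again assumes itype is an integer column.

```sql
SELECT s, COUNT(*) AS total
  FROM `mt_ex_15`
 WHERE bdate >= '2014-10-01'   -- inclusive lower bound: Oct. 1 is counted
   AND bdate <  '2014-11-01'   -- exclusive upper bound: correct for DATE and DATETIME alike
   AND itype = 3
 GROUP BY s
 ORDER BY total DESC;
```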
