
The table item_tag_map has two columns, item_id and tag_id, and both are indexed.

Here's a data sample:

item_id     tag_id
1           1
1           3
4           7
1           5
3           1
3           8
6           8
10          4

Now I want to get the IDs of items that have any of the tags 1, 2, 3, 5, sorted by how many of those tags each item has.

Here's a result sample:

item_id     count(m.tag_id)
1           3
3           1

The SQL I tried was:

SELECT m.item_id, COUNT(m.tag_id) FROM item_tag_map AS m
WHERE m.tag_id IN (1, 2, 3, 5)
GROUP BY m.item_id
ORDER BY COUNT(m.tag_id)
LIMIT 10
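To check what the query should return, here is a minimal sketch that loads the sample rows into an in-memory SQLite database and runs it. SQLite stands in for MySQL only so the example runs anywhere; the index names idx_item_id and idx_tag_id are made up for illustration. The result sample lists item 1 first, which is a descending sort, while ORDER BY defaults to ascending, so this sketch adds DESC.

```python
import sqlite3

# Sample rows from the question: (item_id, tag_id)
SAMPLE_ROWS = [(1, 1), (1, 3), (4, 7), (1, 5), (3, 1), (3, 8), (6, 8), (10, 4)]

QUERY = """
SELECT m.item_id, COUNT(m.tag_id) FROM item_tag_map AS m
WHERE m.tag_id IN (1, 2, 3, 5)
GROUP BY m.item_id
ORDER BY COUNT(m.tag_id) DESC
LIMIT 10
"""


def build_db() -> sqlite3.Connection:
    """Create item_tag_map in memory with one index per column, as in the question."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE item_tag_map (item_id INTEGER, tag_id INTEGER)")
    conn.execute("CREATE INDEX idx_item_id ON item_tag_map (item_id)")
    conn.execute("CREATE INDEX idx_tag_id ON item_tag_map (tag_id)")
    conn.executemany("INSERT INTO item_tag_map VALUES (?, ?)", SAMPLE_ROWS)
    return conn


if __name__ == "__main__":
    conn = build_db()
    for row in conn.execute(QUERY):
        print(row)
```

Only the rows (1, 1), (1, 3), (1, 5) and (3, 1) pass the filter, so item 1 counts 3 and item 3 counts 1, matching the result sample.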

There are about 10k rows in this table, and the query is very slow. When I removed every COUNT() expression, it became much faster.

Why does COUNT() slow this query down, and how can I optimize it?

  • What does EXPLAIN say? It's almost certainly the "ORDER BY" clause that is causing the delay, but the only way to be sure is to EXPLAIN the query. Commented Oct 29, 2012 at 9:03
  • @NevilleK I don't think it's the ORDER BY, because just adding COUNT(m.tag_id) to the SELECT list slows it down. Here's the EXPLAIN result: id: 1; select_type: SIMPLE; table: m; type: index; possible_keys: tag_id; key: item_id; key_len: 4; ref: NULL; Extra: Using where; Using temporary; Using filesort Commented Oct 29, 2012 at 9:08
  • Have you tried what Kali suggests? The explain suggests that the query is using a temporary table (to store the result of the select) and then a file sort on that temporary table (to achieve the "order by"). It also suggests that it's not using the index on TAG_ID - which is odd. Commented Oct 29, 2012 at 9:17
  • @NevilleK I tried USE INDEX (tag_id) to force it to use the tag_id index instead of item_id. It became much faster, although the query still took 2 s to finish. Commented Oct 29, 2012 at 9:33
  • I don't think it's necessarily the index which is causing the slow down. Can you remove the order by clause and tell us if the query is faster? Commented Oct 29, 2012 at 9:40

1 Answer


This is because of ORDER BY COUNT(m.tag_id).
MySQL needs to fetch all rows (i.e. do a full table scan) to calculate the count for each value of item_id.

MySQL cannot use an index for this sort, because the sort key only exists after the rows have been grouped and counted (as you can see with EXPLAIN SELECT ..).

When you remove COUNT() from the ORDER BY clause, MySQL is able to use the index for sorting.
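MySQL reports that extra sort step as "Using filesort". SQLite's rough equivalent in EXPLAIN QUERY PLAN is a "USE TEMP B-TREE FOR ORDER BY" line. The sketch below uses SQLite only because it ships with Python; the plan wording is SQLite's, not MySQL's, and may differ between SQLite versions. It shows that ordering by the aggregate needs a sort of its own that no index can supply:

```python
import sqlite3


def plan_details(conn: sqlite3.Connection, sql: str) -> list[str]:
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item_tag_map (item_id INTEGER, tag_id INTEGER)")
conn.execute("CREATE INDEX idx_item_id ON item_tag_map (item_id)")
conn.execute("CREATE INDEX idx_tag_id ON item_tag_map (tag_id)")

# The query from the question: the sort key COUNT(m.tag_id) is computed
# per group, so it cannot be read in order from any index.
BY_COUNT = """
SELECT m.item_id, COUNT(m.tag_id) FROM item_tag_map AS m
WHERE m.tag_id IN (1, 2, 3, 5)
GROUP BY m.item_id
ORDER BY COUNT(m.tag_id) DESC
LIMIT 10
"""

for detail in plan_details(conn, BY_COUNT):
    print(detail)
```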


One possible solution is to create a materialized view, where the DBMS caches the count of tag_id values per item_id in a separate table.

MySQL doesn't support materialized views natively, but you can simulate them:
populate the table once using the query in question (INSERT INTO tag_counts SELECT ...) and then keep it up to date with AFTER INSERT and AFTER DELETE triggers on item_tag_map.
Alternatively, a third-party tool named FlexViews automates this process for you.

That's how I kept my statistics database, which grows by millions of rows per week, responsive.
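A minimal sketch of the trigger-maintained summary table, again in SQLite so it runs standalone. It assumes the tag filter IN (1, 2, 3, 5) is fixed in advance; the tag_count column and the trigger and index names are hypothetical. The table is populated once from the original query, after which the triggers keep each item's count current, so the final read sorts on a stored, indexed column instead of an aggregate.

```python
import sqlite3

# Sample rows from the question: (item_id, tag_id)
SAMPLE_ROWS = [(1, 1), (1, 3), (4, 7), (1, 5), (3, 1), (3, 8), (6, 8), (10, 4)]

# Assumption: the tag filter is fixed, so it is baked into the triggers.
WANTED_TAGS = "(1, 2, 3, 5)"

BASE_SCHEMA = """
CREATE TABLE item_tag_map (item_id INTEGER, tag_id INTEGER);
CREATE INDEX idx_item_id ON item_tag_map (item_id);
CREATE INDEX idx_tag_id ON item_tag_map (tag_id);

-- Summary table standing in for the materialized view.
CREATE TABLE tag_counts (
    item_id   INTEGER PRIMARY KEY,
    tag_count INTEGER NOT NULL
);
CREATE INDEX idx_tag_count ON tag_counts (tag_count);
"""

TRIGGERS = f"""
-- A new matching row bumps (or creates) the item's cached count.
CREATE TRIGGER item_tag_map_ai AFTER INSERT ON item_tag_map
FOR EACH ROW WHEN NEW.tag_id IN {WANTED_TAGS}
BEGIN
    INSERT OR IGNORE INTO tag_counts (item_id, tag_count) VALUES (NEW.item_id, 0);
    UPDATE tag_counts SET tag_count = tag_count + 1 WHERE item_id = NEW.item_id;
END;

-- A deleted matching row lowers the count; items reaching zero are dropped.
CREATE TRIGGER item_tag_map_ad AFTER DELETE ON item_tag_map
FOR EACH ROW WHEN OLD.tag_id IN {WANTED_TAGS}
BEGIN
    UPDATE tag_counts SET tag_count = tag_count - 1 WHERE item_id = OLD.item_id;
    DELETE FROM tag_counts WHERE item_id = OLD.item_id AND tag_count <= 0;
END;
"""


def build(rows):
    """Load the base table, populate tag_counts once, then install the triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(BASE_SCHEMA)
    conn.executemany("INSERT INTO item_tag_map VALUES (?, ?)", rows)
    # One-off initial population, i.e. INSERT INTO tag_counts SELECT ...
    conn.execute(
        "INSERT INTO tag_counts (item_id, tag_count) "
        "SELECT item_id, COUNT(tag_id) FROM item_tag_map "
        f"WHERE tag_id IN {WANTED_TAGS} GROUP BY item_id"
    )
    conn.executescript(TRIGGERS)
    return conn


def top_items(conn, limit=10):
    """Read the cached counts; the sort key is now a stored, indexed column."""
    return conn.execute(
        "SELECT item_id, tag_count FROM tag_counts "
        "ORDER BY tag_count DESC, item_id LIMIT ?",
        (limit,),
    ).fetchall()
```

In MySQL the same idea would be written with FOR EACH ROW triggers that test NEW.tag_id / OLD.tag_id inside an IF ... END IF block (MySQL triggers have no WHEN clause), and INSERT ... ON DUPLICATE KEY UPDATE could replace the INSERT OR IGNORE / UPDATE pair.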


2 Comments

Thanks for your answer; I'll consider your method, but I don't think you explained the reason well. I posted the EXPLAIN result in a comment on the question: type: index; possible_keys: tag_id; key: item_id; key_len: 4. I think that means it does use an index.
@LotusH No, that does not mean it's using an index. You must look at the "Extra" column: Using where; Using temporary; Using filesort. That means what I said: MySQL is doing a full table scan. "possible_keys" just lists keys that could potentially be used; it does not mean one can be used. That would be indicated by "Using index" in the "Extra" column.
