
I have a fairly simple table used for logging visits to members' profiles, with a multi-column key (member_id, visitor_id, month_visited) and a more precise date. The month_visited column is a CHAR(7) holding values such as '2013-10'.

At the start of each month, I want to compact the previous month's data into another table, and then delete it from this one.

My query is simply:

DELETE FROM visits WHERE month_visited = '2013-10'

It takes AGES to remove these rows, like several minutes on my dedicated server. The same goes when I just run a simple SELECT COUNT(*) FROM visits.

I have 1.8M entries for 2013-10.

When I try

EXPLAIN SELECT * FROM visits WHERE month_visited = "2013-10"

it tells me:

id  select_type  table   type  possible_keys      key                key_len  ref    rows     Extra
1   SIMPLE       visits  ref   idx_month_visited  idx_month_visited  21       const  1782148  Using where

"using where", seriously??

EDIT: Sorry, I forgot to mention that I also added an INDEX on just the month_visited column :) (as the EXPLAIN shows, actually, but it does not seem to use it...)

How can I improve those (obviously) simple queries? I am a noob in MySQL but I don't think it is quite normal that it takes minutes to perform these queries.

Thanks for any input!

Best regards,

  • How many rows does this table have? Commented Nov 1, 2013 at 1:52
  • I'm asking because in my -limited- experience, when an index is not being used, it's generally because using it won't help much; that is, it will not save much time compared with a full table scan (this tends to happen when the cardinality of the index is low) Commented Nov 1, 2013 at 1:55
  • Also, deleting is a "write" action. Indices optimize reads, at the cost of making writes more expensive (because of index rebuilding on writes). So the fact that you have some complex indices does not help, but aggravates the problem. Commented Nov 1, 2013 at 2:00
  • Hi Juan. I have 1.8M rows. When I try to delete them, "show processlist" shows that the query has been running for more than 2,000 seconds, and it is not over yet. Something is definitely wrong. An index should ease the selection of the rows to be deleted, hence the "DELETE" query should be fast and not take an hour! Commented Nov 1, 2013 at 2:12
  • Mmm, but if your table has 1.8M rows and you have 1.8M rows that meet the condition of the query, there's no use for an index. An index makes sense when it narrows down the number of rows to retrieve; otherwise it offers no real gain and could even impose some extra overhead. Also, an index may, in the best case, make a SELECT more efficient. But it won't make writes (insert, update and delete) work faster; on the contrary, it will make them perform worse. Commented Nov 1, 2013 at 2:22

2 Answers

5

I'm summarizing my comments in this answer.

In general, when an index is not being used, it's because using it won't help much. That is, it will not save much time compared with a full table scan (this tends to happen when the cardinality of the index is low). This seems to be the case here, since the number of rows you want to select is about the same as the number of rows in the table. In this case, a full scan is usually cheaper than using the index.
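One way to check whether the index is selective enough to be worth using is to look at its reported cardinality and at how the rows are spread across months. This is only a sketch against the visits table from the question; visit_count is an alias chosen here for illustration.

```sql
-- List the indexes on visits; the Cardinality column is MySQL's estimate
-- of the number of distinct values in each index.
SHOW INDEX FROM visits;

-- Count rows per month to see what share of the table one month represents.
-- visit_count is an illustrative alias, not a column from the question.
SELECT month_visited, COUNT(*) AS visit_count
FROM visits
GROUP BY month_visited
ORDER BY month_visited;
```

If nearly all rows fall into the month being deleted, the index cannot narrow the work down much.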

Also, deleting is a "write" action. Indices optimize reads, at the cost of making writes more expensive (because every index on the table has to be updated on each write). So the fact that you have some complex indices does not help; it aggravates the problem. An index makes sense when it narrows down the number of rows to retrieve; otherwise it offers no real gain and could even impose some extra overhead. In the best case, an index makes a SELECT more efficient, but it won't make writes (insert, update and delete) faster; on the contrary, it will make them slower.

So, you should try to get rid of the indices that are not absolutely necessary. Remember that an index is a trade-off: it might make read operations (select) faster, at the expense of making write operations (insert, update, delete) slower, because the index has to be updated on every write.

You might want to give this a try: "If you are going to delete many rows from a table, it might be faster to use DELETE QUICK followed by OPTIMIZE TABLE. This rebuilds the index rather than performing many index block merge operations." dev.mysql.com/doc/refman/5.0/en/delete.html
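Applied to the table and month from the question, the manual's suggestion would look roughly like the sketch below. The QUICK modifier is documented for MyISAM tables, and the question does not say which storage engine visits uses, so treat the MyISAM assumption as exactly that.

```sql
-- Assumption: visits is a MyISAM table (QUICK is documented for MyISAM).
-- QUICK skips merging index leaf blocks while the rows are deleted.
DELETE QUICK FROM visits WHERE month_visited = '2013-10';

-- Rebuild the table and its indexes once, after the bulk delete.
OPTIMIZE TABLE visits;
```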

Yet another option (may work or not, just thinking out loud here): if you want to delete all but a few rows from visits, perhaps you could insert the rows matching "WHERE month_visited != '2013-10'" into an auxiliary table, TRUNCATE visits, then insert the rows from the aux table back into visits, and finally TRUNCATE the aux table. As you point out, though, you'll need some sort of locking while this process is running.
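The copy-out, truncate, copy-back idea could be sketched as follows. The auxiliary table name visits_keep is hypothetical, and the sketch assumes nothing else writes to visits while it runs: any row inserted between the first copy and the TRUNCATE would be lost.

```sql
-- visits_keep is a hypothetical auxiliary table with the same structure.
CREATE TABLE visits_keep LIKE visits;

-- Copy only the rows that should survive.
INSERT INTO visits_keep
SELECT * FROM visits WHERE month_visited <> '2013-10';

-- Empty the main table in one step instead of deleting row by row.
TRUNCATE TABLE visits;

-- Put the surviving rows back, then empty the auxiliary table.
INSERT INTO visits SELECT * FROM visits_keep;
TRUNCATE TABLE visits_keep;
```

This only pays off when the rows to keep are a small fraction of the table, since every kept row is written twice.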


1

Multi-column keys can only be used if the leading key components appear in the condition. In your case, this means your key (member_id, visitor_id, month_visited) will only be used if your condition includes

  • member_id, or
  • member_id and visitor_id, or
  • member_id and visitor_id and month_visited.

Create a key that has month_visited as the first component.
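A minimal sketch of such a key, with a follow-up EXPLAIN to see which key the optimizer picks. The index name idx_month_first is hypothetical; if an index on month_visited already exists, as the EXPLAIN output in the question suggests, the first statement is redundant.

```sql
-- idx_month_first is a hypothetical name for an index led by month_visited.
CREATE INDEX idx_month_first ON visits (month_visited);

-- Check the "key" column of the output to see which index is chosen.
EXPLAIN SELECT * FROM visits WHERE month_visited = '2013-10';
```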

2 Comments

Sorry I forgot to specify that I also added an INDEX on month_visited, as it is shown in the EXPLAIN, but still, Mysql does not seem to be willing to use it!
Just for information: I have launched my PHP script (that performs the data compacting, plus selecting of the rows for json backup, plus deletion of the rows), and it has been running for more than 30 minutes now and it is still not finished.
