1
+----------------------------+------------------------------------------------------------------------------+------+-----+---------+----------------+
| Field                      | Type                                                                         | Null | Key | Default | Extra          |
+----------------------------+------------------------------------------------------------------------------+------+-----+---------+----------------+
| type                       | enum('Website','Facebook','Twitter','Linkedin','Youtube','SeatGeek','Yahoo') | NO   | MUL | NULL    |                |
| name                       | varchar(100)                                                                 | YES  | MUL | NULL    |                |
| processing_interface_id    | bigint(20)                                                                   | YES  | MUL | NULL    |                |
| processing_interface_table | varchar(100)                                                                 | YES  | MUL | NULL    |                |
| create_time                | datetime                                                                     | YES  | MUL | NULL    |                |
| run_time                   | datetime                                                                     | YES  | MUL | NULL    |                |
| completed_time             | datetime                                                                     | YES  | MUL | NULL    |                |
| reserved                   | int(10)                                                                      | YES  | MUL | NULL    |                |
| params                     | text                                                                         | YES  |     | NULL    |                |
| params_md5                 | varchar(100)                                                                 | YES  | MUL | NULL    |                |
| priority                   | int(10)                                                                      | YES  | MUL | NULL    |                |
| id                         | bigint(20) unsigned                                                          | NO   | PRI | NULL    | auto_increment |
| status                     | varchar(40)                                                                  | NO   | MUL | none    |                |
+----------------------------+------------------------------------------------------------------------------+------+-----+---------+----------------+

select *  from remote_request use index ( processing_order )  where remote_request.status = 'none' and type = 'Facebook' and reserved = '0' order by priority desc limit 0, 40;

This table receives an extremely large amount of writes and reads. each remote_request ends up being a process, which can spawn anywhere between 0 and 5 other remote_requests depending on the type of request, and what the request does.

The table is currently sitting at about 3.5 Million records, and it goes to a snail pace when the site itself is under heavy load and I have more then 50 or more instances running simultaneously. (REST requests are the purpose of the table just in case you were not sure).

As the table grows it just gets worse and worse. I can clear the processed requests out on a daily basis but ultimatly this is not fixing the problem.

What I need is for this query to always have a very low response ratio.

Here are the current indexes on the table.

+----------------+------------+----------------------------------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table          | Non_unique | Key_name                         | Seq_in_index | Column_name                | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------------+------------+----------------------------------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| remote_request |          0 | PRIMARY                          |            1 | id                         | A         |     2403351 |     NULL | NULL   |      | BTREE      |         |               |
| remote_request |          1 | type_index                       |            1 | type                       | A         |          18 |     NULL | NULL   |      | BTREE      |         |               |
| remote_request |          1 | processing_interface_id_index    |            1 | processing_interface_id    | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | processing_interface_table_index |            1 | processing_interface_table | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | create_time_index                |            1 | create_time                | A         |      160223 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | run_time_index                   |            1 | run_time                   | A         |      343335 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | completed_time_index             |            1 | completed_time             | A         |      267039 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | reserved_index                   |            1 | reserved                   | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | params_md5_index                 |            1 | params_md5                 | A         |     2403351 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | priority_index                   |            1 | priority                   | A         |         716 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | status_index                     |            1 | status                     | A         |          18 |     NULL | NULL   |      | BTREE      |         |               |
| remote_request |          1 | name_index                       |            1 | name                       | A         |          18 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | processing_order                 |            1 | priority                   | A         |         200 |     NULL | NULL   | YES  | BTREE      |         |               |
| remote_request |          1 | processing_order                 |            2 | status                     | A         |         200 |     NULL | NULL   |      | BTREE      |         |               |
| remote_request |          1 | processing_order                 |            3 | type                       | A         |         200 |     NULL | NULL   |      | BTREE      |         |               |
| remote_request |          1 | processing_order                 |            4 | reserved                   | A         |         200 |     NULL | NULL   | YES  | BTREE      |         |               |
+----------------+------------+----------------------------------+--------------+----------------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

Any idea how i solve this? Is it not possible to make some sort of complicated index that would automatic order them with priority, then take the first 40 that match the 'Facebook' type? It currently is scanning more then 500k rows of the table before it returns a result which is grossly inefficient.

Some other version of the query that I have been tinkering with are:

select *  from remote_request use index ( type_index,status_index,reserved_index,priority_index )  where remote_request.status = 'none' and type = 'Facebook' and reserv                          ed = '0' order by priority desc limit 0, 40

It would be amazing if we could get the rows scanned to under 1000 rows depending on just how many types of requests enter the table.

Thanks in advance, this might be a real nutcracker for most except the most experienced mysql experts?

1
  • Even though you cannot show the intensity of data, an SQLfiddle example would make it easier to see the issue Commented Aug 25, 2013 at 6:31

2 Answers 2

1

Your four-column index has the right columns, but in the wrong order.

You want the index to first look up matching rows, which you do by three columns. You are looking up by three equality conditions, so you know that once the index finds the set of matching rows, the order of these rows is basically a tie with respect to those first three columns. So to resolve the tie, add as the fourth column the column by which you wanted to sort.

If you do that, then the ORDER BY becomes a no-op, because the query can just read the rows in the order they are stored in the index.

So I would create the following index:

CREATE INDEX processing_order2 ON remote_request 
 (status, type, reserved, priority);

There's probably not too much significance to the order of the first three columns, since they're all in equality terms combined with AND. But the priority column belongs at the end.

You may also like to read my presentation How to Design Indexes, Really.

By the way, using USE INDEX() shouldn't be necessary if you have the right index, MySQL's optimizer will choose it automatically most of the time. But USE INDEX() can block the optimizer from considering a new index that you create, so it becomes a disadvantage for code maintenance.

Sign up to request clarification or add additional context in comments.

2 Comments

I really really like this answer, let me try it out.
Glad to help! You can see that the of columns in an index really matters. And learning how to predict the best columns is not that hard.
1

This isn't a complete answer but it was too long for a comment:

Are you actually searching on all of those indexes? If not get rid of some. Extra indexes slow down writes.

Secondly use EXPLAIN on your query and don't specify an index when you do. See how MySQL wants to process it rather than forcing an option (Generally it does the right thing).

Finally sorting is likely what hurts you the most. If you don't sort it probably gets the records pretty quickly. It has to scan and sort every row that meets your criteria before it can return the top 40.

Options:

  1. Try creating a VIEW (not as familiar with VIEWS but it might work)
  2. Split this table into smaller tables
  3. use a third party tool such as Sphinx or Lucene to create specialized indexes to search on. (I've used Sphinx for something like this before. You can find it at http://sphinxsearch.com/).
  4. Or look into using a NoSQL solution where you can use a Map function to do it.

Edit I read a bit about using VIEW and I don't think it will help you in your case because you have such a large table. See the answer in this thread: Using MySQL views to increase performance

2 Comments

1) view I dont think is any good because it will end up having to recalculate a lot... as for each insert it forces a recalculate of the view? 2) smaller tables wont work for this particular thing, i guess i could break the table apart based on type however. 3) shinx seems overkill. 4) no sql also seems over kill just yet.. as far as indexes, I am using for sure 85% of them elsewhere to make other queries faster so removing them is not really an option per say.. i like your response however.. its definitely thinking. I might have to break it down on type.
Yes you're right. The result would be cached but if you have a lot of writes it can hurt you.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.