1

I have the following query:

SELECT `assignments`.`id`
FROM `assignments` 
WHERE
  `assignments`.`account_id` = 742 
  AND `assignments`.`method` != 'stray' 
  AND (
    `assignments`.`judge_id` = 2349724 
    OR (
      `assignments`.`role_id` IN (234, 8745) 
      AND `assignments`.`judge_id` IS null
    )
  );

This table currently has 6.6 million records, and has quite a bit of traffic. Our slowest query is the one above, and even with an index that targets account_id, method, judge_id and role_id, it's taking approximately 0.5s to run.

The query does use the provided index but doesn't seem to give it much of a boost.

What can I do here to improve the query and get this down to under 100ms? 6.6 million records really isn't that much =\

I'd also like to add, that if I just restrict the query to the account_id clause (which has its own index), the speed is about the same. So I'm really perplexed.

The following is the execution plan on just using account_id:

EXPLAIN select `assignments`.id FROM `assignments`WHERE `assignments`.`account_id` = 374;
+----+-------------+-------------+------------+------+----------------------------------------------------------------------+------------------------------+---------+-------+------+----------+-------------+
| id | select_type | table       | partitions | type | possible_keys                                                        | key                          | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-------------+------------+------+----------------------------------------------------------------------+------------------------------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | assignments | NULL       | ref  | assignments_account_id_index,assignments_account_id_updated_at_index | assignments_account_id_index | 9       | const |  965 |   100.00 | Using index |
+----+-------------+-------------+------------+------+----------------------------------------------------------------------+------------------------------+---------+-------+------+----------+-------------+

Create table syntax:

CREATE TABLE `assignments` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `key` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `batch` int(10) unsigned NOT NULL,
  `account_id` bigint(20) unsigned DEFAULT NULL,
  `season_id` bigint(20) unsigned DEFAULT NULL,
  `judge_id` bigint(20) unsigned DEFAULT NULL,
  `role_id` bigint(20) unsigned DEFAULT NULL,
  `entry_id` bigint(20) unsigned NOT NULL,
  `score_set_id` bigint(20) unsigned NOT NULL,
  `slug` char(8) COLLATE utf8_unicode_ci DEFAULT NULL,
  `method` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
  `original_method` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `status` varchar(255) COLLATE utf8_unicode_ci NOT NULL DEFAULT 'none',
  `locked` tinyint(1) NOT NULL DEFAULT '0',
  `conflict_of_interest` tinyint(1) NOT NULL DEFAULT '0',
  `raw_score` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `raw_total` double NOT NULL DEFAULT '0',
  `weighted_score` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `weighted_total` double NOT NULL DEFAULT '0',
  `weight_sum` decimal(8,2) NOT NULL DEFAULT '0.00',
  `progress` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `consensus` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `top_pick_preference` tinyint(3) unsigned DEFAULT NULL,
  `top_pick_winner` tinyint(1) NOT NULL DEFAULT '0',
  `top_pick_rank` int(11) DEFAULT NULL,
  `total_votes` bigint(20) unsigned NOT NULL DEFAULT '0',
  `scored_at` timestamp NULL DEFAULT NULL,
  `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `updated_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  PRIMARY KEY (`id`),
  UNIQUE KEY `assignments_key_unique` (`key`),
  KEY `assignments_account_id_index` (`account_id`),
  KEY `assignments_judge_id_index` (`judge_id`),
  KEY `assignments_role_id_index` (`role_id`),
  KEY `assignments_entry_id_index` (`entry_id`),
  KEY `assignments_score_set_id_index` (`score_set_id`),
  KEY `assignments_season_id_index` (`season_id`),
  KEY `assignments_slug_index` (`slug`),
  KEY `assignments_status_index` (`status`),
  KEY `assignments_method_index` (`method`),
  KEY `assignments_original_method_index` (`original_method`),
  KEY `assignments_account_id_updated_at_index` (`account_id`,`updated_at`)
) ENGINE=InnoDB AUTO_INCREMENT=661994447 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
17
  • MySql's ability to take advantage of a compound index ends on the first range condition. In this case an index on (account_id, method, judge_id, role_id) can only take advantage of the index for account_id and method, since != is a "range" condition. Commented Mar 3, 2020 at 22:55
  • The problem is that even if I restrict the query to just account_id, as mentioned - it's still just as fast/slow. Commented Mar 3, 2020 at 22:55
  • In regards to the method aspect though, should I simply define what values are valid, rather than what is not? Commented Mar 3, 2020 at 22:56
  • 1
    Can you post the query execution plane? Commented Mar 3, 2020 at 22:57
  • 1
    @Uueerdo - OR using the same column is turned into IN. Otherwise, OR is very deadly to optimization. IN ranks somewhere between = and a 'range', depending on several things. Commented Mar 4, 2020 at 0:19

4 Answers 4

3

Since range conditions negatively impact MySQL's ability to take advantage of indexes, UNION can sometimes be used (at the cost of some duplication of query syntax):

SELECT a.id 
FROM `assignments` AS a
WHERE a.`account_id` = 742 
   AND a.`judge_id` = 2349724 
   AND a.`method` != 'stray' 
UNION ALL
SELECT a.id 
FROM `assignments` AS a
WHERE a.`account_id` = 742 
   AND a.`judge_id` IS NULL AND a.`role_id` IN (234, 8745)
   AND a.`method` != 'stray'
;

A compound index on account_id, judge_id, method_id or account_id, judge_id, role_id would be greatly beneficial to the performance of the above query. ...and if I am not mistaken, the first of these could benefit the first half, and the second of these could benefit the second half (but there is such as thing as over indexing as well).

Sign up to request clarification or add additional context in comments.

14 Comments

It's not. In fact, it doesn't do a damn thing. As mentioned in the initial post, even restricting the query to just account_id has the same query performance..
It's not what? Did you try this with either of the new indexes I suggested? (Splitting it out into a union like this allows the judge_id values to no longer be handled as range conditions, letting the query take better advantage of an index starting in account_id, judge_id.)
@Oddman and, as GMB pointed out in his answer, it does not look like you have any compound indexes; individual indexes are not as helpful in this case as MySQL can only use one index per table reference in a query; it will not combine the individual ones.
Please re-read what I just said: using ONLY the account_id clause, it has the same query performance (and account_id has its own index). I have indeed tried compound indexes. The latter point is why I'm so damn confused. Why is just querying by account_id still slow?
You need, and can use, both of these indexes: (account_id, judge_id, method) and (account_id, judge_id, role_id). The OR is the Optimizer killer more than the range. If you disagree, let's see the EXPLAIN SELECT...
|
2

The things I would look at and do if I were you in order to solve the problem:

Check the number of items you would get

SELECT count(*)
FROM `assignments` 
WHERE
  `assignments`.`account_id` = 742 
  AND `assignments`.`method` != 'stray' 
  AND (
    `assignments`.`judge_id` = 2349724 
    OR (
      `assignments`.`role_id` IN (234, 8745) 
      AND `assignments`.`judge_id` IS null
    )
  );

tells us how many records your specific query should result.

SELECT `assignments`.`id`
FROM `assignments` 
WHERE
  `assignments`.`account_id` = 742;

tells you how many assignments are linked to a given account. If the counts are significantly quicker than the actual selection, that might mean somethings. Also, if there are many records, it could take a lot of time to load that into memory and send it to another computer via a network.

Check whether anything is quick for the table

SELECT `assignments`.`id`
FROM `assignments` limit 0, 100;

If this is slow, then you might have issues with your network.

Make a copy of your database

Make a dump and recreate your database and run your queries in this newly created sandbox, so you will see whether other commands are slowing you down or not. If other queries are slowing you down, then maybe write operations are causing the slowdown. If write-locks are slowing you down, then you might want to group your write operations into batches and execute them together at certain times.

Make proper indexes

Create multidimensional indexes on your table, using the fields you are using as filters in the where clause as UUeerdo and GMB have already suggested in their answers.

4 Comments

This is a great response, thank you. A count performs roughly the same, no real difference there. The main issue is that even just querying by account_id is slow, regardless of anything else. Any ideas?
@Oddman yes, I understand your point. The fact that the count performs roughly the same shows that the writing and reading to and from memory is insignificant as a burden in comparison to the search itself. The next thing I would check is whether running a query like SELECT `assignments`.`id` FROM `assignments` limit 0, 100; loading arbitrary records performs well. If not, then it's not an issue with your filters, but rather with the traffic of your db.
aha! As in the network my be the bottleneck? I'll do some more tests. Thank you.
@Oddman at least we need some tests to confirm or refute that hypothesis. Doing the experiments in this direction will inherently be useful. If it's a network issue, then you find the problem to solve. If not, then you successfully narrowed down the problem-space.
1

This might not be a complete answer, but: you don't have the right index in place. A compound index is different than individual indexes on each column.

Consider, instead:

(account_id, judge_id, role_id, method, id)

The entire index might not be actually used due the AND/OR/IN, but this at least gives the query planner a chance. You might also want to try it against Uueerdo's union all query (upvoted).

4 Comments

Thanks GMB, please read the entire post :) The issue isn't the index, I've tried with numerous indexes and none have improved it. Additionally, when just using account_id as a clause, it's just as slow.
@Oddman: then you should probably show the execution plans for this specific index (or the ones suggested by Uueerdo). The query plan that you have showed hardly relate to your real query.
my main point is that even when just using account_id (and the provided plan), it's just as slow. Not really sure how much more specific I can be? Any suggestions?
This index here is "covering". EXPLAIN will indicate it by saying Using index. But "covering" only provides a small boost in performance. Getting rid of the OR is the key.
0

first thing that comes into my mind is you should not need to have 'assignments' everywhere in your query.

select id FROM `assignments`
WHERE `account_id` = 742
AND `method` != 'stray'
AND (`judge_id` = 2349724 
OR (`role_id` IN (234, 8745) 
AND `judge_id` IS null));

should work just fine. Ok, that won't solve your problem.

Maybe you can query the ID first and the method after that, like so:

    select id FROM `assignments`
WHERE `account_id` = 742
    AND (`judge_id` = 2349724 
OR (`role_id` IN (234, 8745) 
AND `judge_id` IS null))
AND `method` != 'stray';

Just an idea.

4 Comments

Thanks, Michael. as mentioned in the post, even if I restrict it to just account_id (which has its own index), it's still too slow.
MySQL doesn't (generally) care about the order of the conditions.
As it turns out, the Parser will add back the table name! This can be seen by doing EXPLAIN EXTENDED SELECT ...; SHOW WARNINGs;. (In newer versions EXTENDED can be omitted.)
To add to what Rick said; while omitting table names may make your query string shorter, it will not make your query process any faster. The request might get sent across the network a little faster, but it also might slow the server's parsing a little since it has to identify which table which field name is referring to.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.