
I've got a table in a MySQL database with more than 20,000,000 rows. The query below runs fine on a small number of rows, but takes 2-3 seconds on the full table. How can I optimize it to run in under 1 second at least? Note: the problem is the sub-query SELECT read_state FROM messages... Query:

SELECT SQL_NO_CACHE users.id AS uid,
       name,
       avatar,
       avatar_date,
       driver,
       msg,
       date,
       messages.removed,
       from_id = 528798 AS outbox,
       !(0 IN
         (SELECT read_state
          FROM messages AS msgs FORCE KEY (user_id_2)
          WHERE (msgs.from_id = messages.from_id
                 OR msgs.from_id = messages.user_id)
            AND msgs.user_id = 528798
            AND removed = 0
         )) AS read_state
FROM dialog,
     messages,
     users
WHERE messages.id = mid
  AND ((uid1 = 528798 AND users.id = uid2)
       OR (uid2 = 528798 AND users.id = uid1))
ORDER BY date DESC;

show index from messages;

+----------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table    | Non_unique | Key_name    | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+----------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| messages |          0 | PRIMARY     |            1 | id          | A         |    27531939 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | to_number   |            1 | to_number   | A         |          22 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | from_id     |            1 | from_id     | A         |      529460 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | from_id     |            2 | to_number   | A         |      529460 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | user_id_2   |            1 | user_id     | A         |      655522 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | user_id_2   |            2 | read_state  | A         |      917731 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | user_id_2   |            3 | removed     | A         |      949377 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | idx_user_id |            1 | user_id     | A         |      809762 |     NULL | NULL   |      | BTREE      |         |               |
| messages |          1 | idx_from_id |            1 | from_id     | A         |      302548 |     NULL | NULL   |      | BTREE      |         |               |
+----------+------------+-------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

desc messages;

+------------+-------------+------+-----+---------+----------------+
| Field      | Type        | Null | Key | Default | Extra          |
+------------+-------------+------+-----+---------+----------------+
| id         | int(11)     | NO   | PRI | NULL    | auto_increment |
| from_id    | int(11)     | NO   | MUL | NULL    |                |
| user_id    | int(11)     | NO   | MUL | NULL    |                |
| group_id   | int(11)     | NO   |     | NULL    |                |
| to_number  | varchar(30) | NO   | MUL | NULL    |                |
| msg        | text        | NO   |     | NULL    |                |
| image      | varchar(20) | NO   |     | NULL    |                |
| date       | bigint(20)  | NO   |     | NULL    |                |
| read_state | tinyint(1)  | NO   |     | 0       |                |
| removed    | tinyint(1)  | NO   |     | NULL    |                |
+------------+-------------+------+-----+---------+----------------+

EXPLAIN EXTENDED:

+----+--------------------+----------+-------------+---------------+-----------+---------+--------------------+--------+----------+---------------------------------------------------------------------------+
| id | select_type        | table    | type        | possible_keys | key       | key_len | ref                | rows   | filtered | Extra                                                                     |
+----+--------------------+----------+-------------+---------------+-----------+---------+--------------------+--------+----------+---------------------------------------------------------------------------+
|  1 | PRIMARY            | dialog   | index_merge | uid1,uid2     | uid1,uid2 | 4,4     | NULL               |   1707 |   100.00 | Using sort_union(uid1,uid2); Using where; Using temporary; Using filesort |
|  1 | PRIMARY            | users    | ALL         | PRIMARY       | NULL      | NULL    | NULL               | 608993 |   100.00 | Range checked for each record (index map: 0x1)                            |
|  1 | PRIMARY            | messages | eq_ref      | PRIMARY       | PRIMARY   | 4       | numbers.dialog.mid |      1 |   100.00 |                                                                           |
|  2 | DEPENDENT SUBQUERY | msgs     | ref         | user_id_2     | user_id_2 | 6       | const,const,const  |   2607 |   100.00 | Using where                                                               |
+----+--------------------+----------+-------------+---------------+-----------+---------+--------------------+--------+----------+---------------------------------------------------------------------------+
  • You should be able to rewrite that sub-query as a JOIN. That would turn it into a non-correlated sub-query, which should be more efficient. However, the query mixes implicit and explicit join syntax, and you haven't supplied the table definitions needed to sort out the joins (e.g. you join on messages.id = mid, but nothing says which table mid belongs to). Adding the definitions would help. Commented May 14, 2014 at 12:58
  • From your EXPLAIN, I would guess the problem is the users table rather than the messages table. Do you have information on that table as well? Commented May 14, 2014 at 13:10
  • Just a marginal note: in my opinion you can drop the indexes idx_user_id (user_id_2 will be used instead) and idx_from_id (from_id will be used instead) to save disk space and speed up data-manipulation operations (INSERT, UPDATE, DELETE). Commented May 14, 2014 at 14:02
  • Could you quickly edit the query so all columns are written as table.column (or alias.column)? Then we'd know which table each column comes from. Commented May 14, 2014 at 14:07
  • Of course you should never use implicit join syntax to begin with; it is poor technique and a SQL antipattern. Commented May 15, 2014 at 15:13
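The redundant-index suggestion in the comments could be applied like this (a sketch only; verify with EXPLAIN against your own workload first, since these indexes are only redundant if the optimizer always prefers the wider composite indexes):

```sql
-- user_id_2 (user_id, read_state, removed) already covers lookups by user_id
-- via its leftmost column, and from_id (from_id, to_number) covers lookups by
-- from_id alone, so the single-column duplicates can likely be dropped:
ALTER TABLE messages
  DROP INDEX idx_user_id,
  DROP INDEX idx_from_id;
```

Dropping them saves disk space and removes two index maintenance costs from every INSERT, UPDATE and DELETE on a 27-million-row table.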

1 Answer


Making a few guesses, something like this might be more efficient:

SELECT DISTINCT users.id AS uid,
  name,
  avatar,
  avatar_date,
  driver,
  msg,
  `DATE`,
  messages.removed,
  from_id = 528798 AS outbox,
  CASE WHEN msgs.read_state IS NULL THEN 1 ELSE 0 END AS read_state
FROM messages
INNER JOIN dialog ON messages.id = dialog.mid
INNER JOIN users ON (dialog.uid1 = 528798 AND users.id = dialog.uid2) OR (dialog.uid2 = 528798 AND users.id = dialog.uid1)
LEFT OUTER JOIN messages msgs ON msgs.read_state = 0 AND msgs.user_id = 528798 AND removed = 0 AND (msgs.from_id = messages.from_id OR msgs.from_id = messages.user_id)
ORDER BY `DATE` DESC;

This does an extra LEFT JOIN against messages, then uses CASE to convert the result to 0 or 1.

The DISTINCT should cope when the LEFT JOIN brings back multiple matching rows (if that cannot happen, you can eliminate the DISTINCT).
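An alternative sketch (my own variation, not from the original answer) avoids the DISTINCT entirely by expressing the read-state check as NOT EXISTS, which the server can stop evaluating at the first matching row:

```sql
SELECT users.id AS uid,
       name, avatar, avatar_date, driver, msg, `DATE`,
       messages.removed,
       from_id = 528798 AS outbox,
       -- 1 when no unread (read_state = 0) message exists; mirrors the
       -- original !(0 IN (...)) expression, given read_state is NOT NULL.
       -- `removed = 0` is assumed to mean msgs.removed, as in the original.
       NOT EXISTS (SELECT 1
                   FROM messages msgs
                   WHERE msgs.read_state = 0
                     AND msgs.user_id = 528798
                     AND msgs.removed = 0
                     AND (msgs.from_id = messages.from_id
                          OR msgs.from_id = messages.user_id)) AS read_state
FROM messages
INNER JOIN dialog ON messages.id = dialog.mid
INNER JOIN users ON (dialog.uid1 = 528798 AND users.id = dialog.uid2)
                 OR (dialog.uid2 = 528798 AND users.id = dialog.uid1)
ORDER BY `DATE` DESC;
```

Because nothing is joined in for the read-state check, each dialog row appears exactly once, so no de-duplication step is needed.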

I suspect the OR conditions in the join onto users will not be that efficient. It may be better to replace the INNER JOIN against users with two LEFT OUTER JOINs. Something like this:

SELECT DISTINCT COALESCE(users1.id, users2.id) AS uid,
  COALESCE(users1.name, users2.name),
  COALESCE(users1.avatar, users2.avatar),
  COALESCE(users1.avatar_date, users2.avatar_date),
  COALESCE(users1.driver, users2.driver),
  msg,
  `DATE`,
  messages.removed,
  from_id = 528798 AS outbox,
  CASE WHEN msgs.read_state IS NULL THEN 1 ELSE 0 END AS read_state
FROM messages
INNER JOIN dialog ON messages.id = dialog.mid
LEFT OUTER JOIN users users1 ON (dialog.uid1 = 528798 AND users1.id = dialog.uid2)
LEFT OUTER JOIN users users2 ON (dialog.uid2 = 528798 AND users2.id = dialog.uid1)
LEFT OUTER JOIN messages msgs ON msgs.read_state = 0 AND msgs.user_id = 528798 AND removed = 0 AND (msgs.from_id = messages.from_id OR msgs.from_id = messages.user_id)
WHERE users1.id IS NOT NULL
OR users2.id IS NOT NULL
ORDER BY `DATE` DESC;

5 Comments

Marvellous: the bad query - 21 rows in set (1.46 sec); the good one (yours) - 29 rows in set (0.01 sec). Thanks a lot.
Sorry, I've hit a problem: DISTINCT does not give unique uids when the other columns differ. I tested the SQL with GROUP BY uid, and it takes longer than the original query. It would be great to see a version like the first one that returns unique uids. Anyway, thanks for the help; I've upvoted your answer.
GROUP BY would probably cause odd problems. Do you have a bit of sample data that demonstrates the problem? I have just spotted an error with my query, though: I should have used COALESCE on the uid as well.
Here is the first SQL's sample-data result:
610253 0 0 Роо 1400157327052 0 0 1
610253 0 0 Рррррр 1400157320994 0 0 1
610253 0 0 Пьо 1400157309489 0 0 1
610253 0 0 rwerwfew 1400157293064 0 1 1
610253 0 0 Asd 1400157276348 0 1 1
610253 0 0 1400154840172 0 0 1
610253 0 0 1400153603744 0 0 1
None of those rows are duplicates (DISTINCT eliminates duplicate rows, not rows that merely share a value in one column), so they wouldn't be eliminated, and GROUP BY uid would eliminate all except one (and which one is undefined). How do you want to decide which row to keep for each uid? I can't see anything in your original query that would eliminate these duplicates. I can probably work out a solution, but I would need a small sample of data (without that and the table definitions I can't test anything).
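One common pattern for the "one row per uid" requirement discussed above is a group-wise maximum. This is a sketch of my own, under the assumption that the row to keep is the one with the latest date and that (uid, date) pairs are unique; the `...` placeholders stand for the answer's full query:

```sql
-- Keep only the row carrying the latest `date` per uid by joining the result
-- set against a per-uid MAX(`DATE`) derived table.
SELECT t.*
FROM ( ...the answer's query, without ORDER BY... ) AS t
INNER JOIN (SELECT uid, MAX(`DATE`) AS max_date
            FROM ( ...the same query... ) AS t2
            GROUP BY uid) AS latest
        ON latest.uid = t.uid
       AND latest.max_date = t.`DATE`
ORDER BY t.`DATE` DESC;
```

Unlike a bare GROUP BY uid, which (in pre-5.7 MySQL) picks an undefined row for the non-grouped columns, this makes the choice of surviving row explicit.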
