We've got a relatively straightforward query that does LEFT JOINs across 4 tables. A is the "main" table or the top-most table in the hierarchy. B links to A, C links to B. Furthermore, X links to A. So the hierarchy is basically
A
C => B => A
X => A
The query is essentially:
SELECT
a.*, b.*, c.*, x.*
FROM
a
LEFT JOIN b ON b.a_id = a.id
LEFT JOIN c ON c.b_id = b.id
LEFT JOIN x ON x.a_id = a.id
WHERE
b.flag = true
ORDER BY
x.date DESC
LIMIT 25
Via EXPLAIN, I've confirmed that the correct indexes are in place, and that the built-in MySQL query optimizer is using those indexes correctly and properly.
So here's the strange part...
When we run the query as is, it takes about 1.1 seconds to run.
However, after doing some checking, it seems that if I removed most of the SELECT fields, I get a significant speed boost.
So if instead we made this into a two-step query process:
- First query same as above except change the SELECT clause to only
SELECT a.idinstead ofSELECT * - Second query also same as above, except change the WHERE clause to only do an
a.id INagains the result of Query 1 instead of what we have before
The result is drastically different. It's .03 seconds for the first query and .02 for the second query.
Doing this two-step query in code essentially gives us a 20x boost in performance.
So here's my question:
Shouldn't this type of optimization already be done within the DB engine? Why does the difference in which fields that are actually SELECTed make a difference on the overall performance of the query?
At the end of the day, it's merely selecting the exact same 25 rows and returning the exact same full contents of those 25 rows. So, why the wide disparity in performance?
ADDED 2012-08-24 13:02 PM PDT
Thanks eggyal and invertedSpear for the feedback. First off, it's not a caching issue -- I've run tests running both queries multiple times (about 10 times) alternating between each approach. The result averages at 1.1 seconds for the first (single query) approach and .03+.02 seconds for the second (2 query) approach.
In terms of indexes, I thought I had done an EXPLAIN to ensure that we're going thru the keys, and for the most part we are. However, I just did a quick check again and one interesting thing to note:
The slower "single query" approach doesn't show the Extra note of "Using index" for the third line:
+----+-------------+-------+--------+------------------------+-------------------+---------+-------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+------------------------+-------------------+---------+-------------------------------+------+----------------------------------------------+
| 1 | SIMPLE | t1 | index | PRIMARY | shop_group_id_idx | 5 | NULL | 102 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY | PRIMARY | 4 | dbmodl_v18.t1.organization_id | 1 | Using where |
| 1 | SIMPLE | t0 | ref | bundle_idx,shop_id_idx | shop_id_idx | 4 | dbmodl_v18.t1.organization_id | 309 | |
| 1 | SIMPLE | t3 | eq_ref | PRIMARY | PRIMARY | 4 | dbmodl_v18.t0.id | 1 | |
+----+-------------+-------+--------+------------------------+-------------------+---------+-------------------------------+------+----------------------------------------------+
While it does show "Using index" for when we query for just the IDs:
+----+-------------+-------+--------+------------------------+-------------------+---------+-------------------------------+------+----------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+-------+--------+------------------------+-------------------+---------+-------------------------------+------+----------------------------------------------+
| 1 | SIMPLE | t1 | index | PRIMARY | shop_group_id_idx | 5 | NULL | 102 | Using index; Using temporary; Using filesort |
| 1 | SIMPLE | t2 | eq_ref | PRIMARY | PRIMARY | 4 | dbmodl_v18.t1.organization_id | 1 | Using where |
| 1 | SIMPLE | t0 | ref | bundle_idx,shop_id_idx | shop_id_idx | 4 | dbmodl_v18.t1.organization_id | 309 | Using index |
| 1 | SIMPLE | t3 | eq_ref | PRIMARY | PRIMARY | 4 | dbmodl_v18.t0.id | 1 | |
+----+-------------+-------+--------+------------------------+-------------------+---------+-------------------------------+------+----------------------------------------------+
The strange thing is that both do list the correct index being used... but I guess it begs the questions:
Why are they different (considering all the other clauses are the exact same)? And is this an indication of why it's slower?
Unfortunately, the MySQL docs do not give much information for when the "Extra" column is blank/null in the EXPLAIN results.
EXPLAINshowsUsing index), you may see a significant perfomance boost. Is this what's happening?Flush TablesandReset query cachebetween tests will give you truer benchmarks.