
I have four tables that I am trying to join and output the result to a new table. My code looks like this:

create table tbl
select a.dte, a.permno, (ret - rf) f0_xs_ret, (xs_ret - (betav*xs_mkt)) f0_resid, mkt_cap last_year_mkt_cap, betav beta_value
from a inner join b using (dte)
inner join c on (year(a.dte) = c.yr and a.permno = c.permno)
inner join d on (a.permno = d.permno and year(a.dte)-1 = year(d.dte));

All of the tables have multiple indexes. In table a, (dte, permno) identifies a unique record; in table b, dte does; in table c, (yr, permno) does; and in table d, (dte, permno) does. The EXPLAIN output for the SELECT part of the query is:

+----+-------------+-------+--------+-------------------+---------+---------+----------------------------------+--------+-------------------+
| id | select_type | table | type   | possible_keys     | key     | key_len | ref                              | rows   | Extra             |
+----+-------------+-------+--------+-------------------+---------+---------+----------------------------------+--------+-------------------+
|  1 | SIMPLE      | d     | ALL    | idx1              | NULL    | NULL    | NULL                             | 264129 |                   |
|  1 | SIMPLE      | c     | ref    | idx2              | idx2    | 4       | achernya.d.permno                |     16 |                   |
|  1 | SIMPLE      | b     | ALL    | PRIMARY,idx2      | NULL    | NULL    | NULL                             |  12336 | Using join buffer |
|  1 | SIMPLE      | a     | eq_ref | PRIMARY,idx1,idx2 | PRIMARY | 7       | achernya.b.dte,achernya.d.permno |      1 | Using where       |
+----+-------------+-------+--------+-------------------+---------+---------+----------------------------------+--------+-------------------+

Why does MySQL have to read so many rows to process this? If I am reading this correctly, it has to read 264129*16*12336 rows, which should take a good month.

Could someone please explain what's going on here?

  • Oh, I see. My understanding was that for every row read in the first table, it would have to read 16*12336 rows in the other tables. I assumed it would simply go down the rows of the first table and then do sequential reads of the others for each row of the first. Is that incorrect? Commented Aug 15, 2012 at 3:26
  • Are you fairly certain that is the way to understand the rows column? I understand that the indexes narrow down the number of rows to scan by letting MySQL go to a subset of the table, but wouldn't it still have to read that entire subset for each row in my original table? Commented Aug 15, 2012 at 3:32
  • I read this online: "Computing rows to be examined is more complicated. It is frequent approach to take number of rows estimated from each row in a join and multiply them ..." He goes on to say that it's inaccurate, etc., but he doesn't say anything about adding them up. Do you have any sources where I could check this out? If it were just a sum, my query would be done already. Commented Aug 15, 2012 at 3:39
  • Source for the above: mysqlperformanceblog.com/2006/07/24/… Commented Aug 15, 2012 at 3:40
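For reference, here is the multiplication the question and comments are discussing, using the rows values from the EXPLAIN output above. The MySQL manual's EXPLAIN documentation suggests taking the product of the rows column as a rough indication of how many rows MySQL must examine, so the product (not the sum) is the conventional reading, although it remains an estimate. One caveat, as I understand MySQL's block nested-loop join: "Using join buffer" on b means the combined d/c rows are buffered and b is scanned once per buffer fill rather than once per preceding row, even though each b row is still compared against every buffered row.

```sql
-- Arithmetic only: multiply the rows column from the EXPLAIN output.
-- rows_reaching_b: estimated d-by-c combinations fed into b's join buffer.
SELECT 264129 * 16         AS rows_reaching_b,
       264129 * 16 * 12336 AS estimated_row_combinations;
-- rows_reaching_b = 4226064, estimated_row_combinations = 52132725504
```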

1 Answer


MySQL has to read all those rows because you're applying functions to columns in your join conditions. An index on dte cannot be used to resolve YEAR(dte) in a query. If you want this to be fast, put the year in its own column, index that column, and join on it, even if that means some denormalization.
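A minimal sketch of that approach for this query, assuming a.dte and d.dte are DATE columns and that the names yr (on a and d) and idx_permno_yr are free to use; those names are hypothetical, not from the question. Table c already has yr, and b is joined on dte directly, so only a and d get the new column. The select list is the question's own, with explicit AS added.

```sql
-- Hypothetical column/index names; assumes a.dte and d.dte are DATE columns.
-- 1. Materialize the year so the joins compare plain columns.
ALTER TABLE a ADD COLUMN yr SMALLINT;
UPDATE a SET yr = YEAR(dte);

ALTER TABLE d ADD COLUMN yr SMALLINT;
UPDATE d SET yr = YEAR(dte);

-- 2. Index the columns used to look up d rows from a (permno first, then yr).
CREATE INDEX idx_permno_yr ON d (permno, yr);

-- 3. Join on the stored columns. The "- 1" stays on a's side, so d.yr is a
--    bare column and idx_permno_yr is eligible for the lookup into d.
CREATE TABLE tbl
SELECT a.dte, a.permno,
       (ret - rf) AS f0_xs_ret,
       (xs_ret - (betav * xs_mkt)) AS f0_resid,
       mkt_cap AS last_year_mkt_cap,
       betav AS beta_value
FROM a
INNER JOIN b USING (dte)
INNER JOIN c ON (c.yr = a.yr AND c.permno = a.permno)
INNER JOIN d ON (d.permno = a.permno AND d.yr = a.yr - 1);
```

If rows in a or d change later, yr has to be kept in sync by hand. On MySQL 5.7 and later, a stored generated column such as yr SMALLINT AS (YEAR(dte)) STORED would maintain it automatically.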

As for the indexed columns you don't apply functions to, MySQL may still skip them if the index wouldn't provide much benefit, or if they aren't the leftmost column of the index and your join condition doesn't use that index's leftmost prefix.
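To illustrate the leftmost-prefix rule, suppose d had a composite index on (permno, dte). The index name idx_permno_dte and the literal values below are hypothetical. Running EXPLAIN on each statement is the way to confirm what the optimizer actually picks, since, as the manual notes, it may still prefer a table scan.

```sql
-- Hypothetical composite index; permno is its leftmost column.
CREATE INDEX idx_permno_dte ON d (permno, dte);

-- Uses the leftmost prefix (permno): eligible for an index lookup.
EXPLAIN SELECT * FROM d WHERE permno = 10001;

-- Uses both columns: equality on permno plus a range on the bare dte column.
EXPLAIN SELECT * FROM d
WHERE permno = 10001
  AND dte >= '2011-01-01' AND dte < '2012-01-01';

-- Only the permno part is usable: YEAR(dte) hides dte from the index.
EXPLAIN SELECT * FROM d WHERE permno = 10001 AND YEAR(dte) = 2011;

-- Skips the leftmost column, so idx_permno_dte cannot normally drive a lookup.
EXPLAIN SELECT * FROM d WHERE dte = '2011-06-30';
```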

Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.)

http://dev.mysql.com/doc/refman/5.0/en/mysql-indexes.html


2 Comments

That makes sense. I just did that, actually: I indexed the year columns and things sped up significantly. Thanks!
By the way, did I read that right, that you multiply the numbers in the rows column to get an estimate?
