0

I have a query like this:

SELECT * FROM mytable WHERE
((num=8198747 AND class='A') OR
 (num=1646463 AND class='B') OR
 (num=4099442 AND class='C') OR
 (num=1176312 AND class='A') OR
 (num=2146847 AND class='B') OR
 (num=7000296 AND class='F') OR
 --...about 400 more clauses like this
)

SHOW INDEXES FROM mytable;
+---------+------------+----------+--------------+---------------+
| Table   | Non_unique | Key_name | Seq_in_index | Column_name   |
+---------+------------+----------+--------------+---------------+
| mytable |          0 | PRIMARY  |            1 | id            |
| mytable |          1 | nc_idx   |            1 | num           |
| mytable |          1 | nc_idx   |            2 | class         |
+---------+------------+----------+--------------+---------------+
3 rows in set (0.00 sec)

My understanding is that for each of the ~400 clauses in the query, it will do a separate BTREE lookup on num=XXXXXXXX. Is there any value in changing the query to:

SELECT * FROM mytable WHERE
((class='A' AND num IN (8198747, 1176312, ...)) OR
 (class='B' AND num IN (1247910, 1248192, ...)) OR
 (class='F' AND num IN (7244626, 9084903, ...)) OR
 --...for each class in the query
)

after adding a new index cn_idx on class and num, in that order? I think it won't be much faster, since class is just one char, and thus the number of BTREE lookups will be the same. But, each subtree will be shorter. Thoughts?

5
  • Assuming your classes go from A-Z, which is 26 characters - it seems pointless to have it indexed as the selectivity is low. On the other hand, and I realize this isn't related to your direct question about B-TREEs - why would anyone have 400 clauses like this in the first place? It seems absolutely ineffective to construct such a query. Commented Mar 16, 2011 at 19:01
  • I agree. I didn't write said query, but I'm tasked with optimizing it. How would you write it? Commented Mar 16, 2011 at 19:03
  • Since I have no idea what the query is actually supposed to do - I just know I wouldn't index a column that would produce low index selectivity :) Commented Mar 16, 2011 at 19:05
  • Basically, I need to query something like 400 rows from a table with over 1 million values. The selectivity of num is nearly 1, and that of class is nearly 0. I'm starting to think that there is no way around the query I have (which is not that slow). Commented Mar 16, 2011 at 19:45
  • If we're talking about InnoDB, you can increase innodb_buffer_pool_size variable in order to squeeze out more performance. However, since you got num with high selectivity - get rid of the index on class as it won't do anything performance-wise, it'll just waste more space for no reason. Commented Mar 16, 2011 at 19:47

1 Answer 1

2

In short

  • just stick to OR
  • the index on num is the only index that will help this query
  • an composite index on (num,class) has negligible effect, since num is already very selective

Two other ways to write it, using UNION ALL

SELECT * FROM mytable
WHERE (num=8198747 AND class='A')
UNION ALL
SELECT * FROM mytable
WHERE (num=1646463 AND class='B')
UNION ALL
SELECT * FROM mytable
WHERE (num=4099442 AND class='C')
UNION ALL
... etc ...

This may work well if you don't have too many num/class pairs, as each one will individually perform an index seek. (A composite index on num/class will work better than num and class indexes single-field indexes. class/num is less selective)

The second way uses JOIN mechanics, by making a virtual table out of the num/class pairs:

SELECT t.*
FROM mytable t
JOIN (
    select 8198747 as num, 'A' as class union all
    select 1646463, 'B' union all
    select 4099442, 'C' union all
    ... etc ...
    ) v on v.num=t.num and v.class=t.class

Performance comparison

Create a table

create table mytable (
id int auto_increment primary key,
num int, 
class char(1), 
other varchar(10), 
date timestamp default current_timestamp) ENGINE InnoDB;

Fill it up with 1 million records
(Note: data properties - selectivity: num ~ 1, class ~ 1/26)

insert into mytable(num, class, other)
select rand()*100000000, char(rand()*26+65), concat('',rand()*10000000)
from
(select 1 a union all select 2 union all select 3 union all select 4
union all select 5 union all select 6 union all select 7
union all select 8 union all select 9 union all select 0) a,
(select 1 a union all select 2 union all select 3 union all select 4
union all select 5 union all select 6 union all select 7
union all select 8 union all select 9 union all select 0) b,
(select 1 a union all select 2 union all select 3 union all select 4
union all select 5 union all select 6 union all select 7
union all select 8 union all select 9 union all select 0) c,
(select 1 a union all select 2 union all select 3 union all select 4
union all select 5 union all select 6 union all select 7
union all select 8 union all select 9 union all select 0) d,
(select 1 a union all select 2 union all select 3 union all select 4
union all select 5 union all select 6 union all select 7
union all select 8 union all select 9 union all select 0) e,
(select 1 a union all select 2 union all select 3 union all select 4
union all select 5 union all select 6 union all select 7
union all select 8 union all select 9 union all select 0) f

Create the indexes

create index nc_num on mytable(num);
create index nc_class on mytable(class);

Select using OR

select * from mytable
WHERE
(num=38142659 and class='T') OR
(num=42476845 and class='E') OR
(num=45205882 and class='B') OR
(num=84861596 and class='K') OR
..... 100 in total

Output of Show profiles: (run set profiling=1; once. then run the queries. run show profiles to see the last timings)

Duration: 0.00003025

Explain extended (add explain extended before the query)

"id";"select_type";"table";"type";"possible_keys";"key";"key_len";"ref";"rows";"Extra"
"1";"SIMPLE";"mytable";"range";"nc_num,nc_class";"nc_num";"5";NULL;"125";"Using where"

Select using UNION ALL between num/class

SELECT * FROM mytable WHERE (num=38142659 AND class='T') UNION ALL
SELECT * FROM mytable WHERE (num=42476845 AND class='E') UNION ALL
SELECT * FROM mytable WHERE (num=45205882 AND class='B') UNION ALL
SELECT * FROM mytable WHERE (num=84861596 AND class='K') UNION ALL
.... 100 in total

Show Profile

Duration: 0.00069525

Explain extended

"id";"select_type";"table";"type";"possible_keys";"key";"key_len";"ref";"rows";"Extra"
"1";"PRIMARY";"mytable";"ref";"nc_num,nc_class";"nc_num";"5";"const";"1";"Using where"
"2";"UNION";"mytable";"ref";"nc_num,nc_class";"nc_num";"5";"const";"1";"Using where"
"3";"UNION";"mytable";"ref";"nc_num,nc_class";"nc_num";"5";"const";"1";"Using where"
"4";"UNION";"mytable";"ref";"nc_num,nc_class";"nc_num";"5";"const";"1";"Using where"
... etc

Select using Union All to build up a virtual table

SELECT t.*
FROM mytable t
JOIN (
select 41805446 num, 'X' collate utf8_general_ci class union all
select 84867135, 'P' union all
select 52747446, 'R' union all
.... etc...
) v on v.num=t.num and v.class=t.class

Show profile

Duration: 0.00026100

Explain extended

"id";"select_type";"table";"type";"possible_keys";"key";"key_len";"ref";"rows";"Extra"
"1";"PRIMARY";"<derived2>";"ALL";NULL;NULL;NULL;NULL;"100";""
"1";"PRIMARY";"t";"ref";"nc_num";"nc_num";"5";"v.num";"1";"Using where"
"2";"DERIVED";NULL;NULL;NULL;NULL;NULL;NULL;NULL;"No tables used"
"3";"UNION";NULL;NULL;NULL;NULL;NULL;NULL;NULL;"No tables used"
"4";"UNION";NULL;NULL;NULL;NULL;NULL;NULL;NULL;"No tables used"
....
"101";"UNION";NULL;NULL;NULL;NULL;NULL;NULL;NULL;"No tables used"
NULL;"UNION RESULT";"<union2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,...>";"ALL";NULL;NULL;NULL;NULL;NULL;""

Select using IN

SELECT * FROM mytable WHERE
(class='A' and num in (28538065
)) or (class='B' and num in (70851926
,90457823
,94804149
)) or (class='C' and num in (74179050
,43280101
,24562525
,56859448
,38226813
,33532373
,93501613
,28634136
,8204338
,15636810
)) or (class='D' and num in (26672499
.... etc

Show profile

Duration: 0.00003125

Explain extended

"id";"select_type";"table";"type";"possible_keys";"key";"key_len";"ref";"rows";"Extra"
"1";"SIMPLE";"mytable";"range";"nc_num,nc_class";"nc_num";"5";NULL;"136";"Using where"
Sign up to request clarification or add additional context in comments.

3 Comments

You do know this is MySQL we're talking about here, not MSSQL? SELECT * doesn't have ANY performance implications here. Unions will only make the whole thing slower, making temp tables and what not. Explanation on what the query does would help definitely, as there's always a better way to achieve something and query with 400 where conditions seems like a mistake in design.
Great, then you do know how many times the table will be opened in order to do all the UNIONs and how SELECT * works with MySQL and InnodB, which will help you realize why you're wrong. Happy to help.
Thank you for this extremely detailed comparison. I can see why you have 20k+ points.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.