
I have this query:

SELECT stringa FROM table WHERE stringb = 'x' OR stringb = 'y' OR stringb = 'z'

That's only a shortened version; the actual query has over 1,000 OR clauses.

It takes minutes to execute, which is no good.

I've tried doing one query at a time like so:

SELECT stringa FROM table WHERE stringb = 'x'
SELECT stringa FROM table WHERE stringb = 'y'
SELECT stringa FROM table WHERE stringb = 'z'

But that takes even longer. I also tried one big query like so:

SELECT stringa FROM table WHERE stringb = 'x'
UNION
SELECT stringa FROM table WHERE stringb = 'y'
UNION
SELECT stringa FROM table WHERE stringb = 'z'

But again that took even longer.

Any suggestions for improving performance would be greatly appreciated. The table uses the MyISAM engine, if that matters.

Edit:

Here's the structure of the table:

Columns:

key (CHAR PRIMARY), stringa (CHAR), stringb (CHAR)

And the rows look like so: (key - stringa - stringb)

key - a - b
key - a - c
key - a - d
key - a - e
key - a - f
key - b - b
key - b - c
key - b - d
key - c - c
key - c - d
key - c - f
key - d - f

And so on. There are nearly a million rows.

I need to select all 'stringa' where 'stringb' equals a OR b OR c, etc.

Of course, stringa and stringb aren't really just 'a' and 'b'; they contain strings between 3 and 80 characters long.
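To make the requirement concrete, here is a small Python sketch that loads the sample rows above into an in-memory SQLite database and runs the lookup with an IN list. SQLite stands in for MySQL here, and the table name t and the helper stringa_for are hypothetical. Note that the same stringa can come back more than once, because it appears with several stringb values.

```python
import sqlite3

# Sample rows from the question as (stringa, stringb); the key column is omitted.
ROWS = [
    ("a", "b"), ("a", "c"), ("a", "d"), ("a", "e"), ("a", "f"),
    ("b", "b"), ("b", "c"), ("b", "d"),
    ("c", "c"), ("c", "d"), ("c", "f"),
    ("d", "f"),
]

def stringa_for(values):
    """Return every stringa whose stringb is one of `values`, in insertion order."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (stringa TEXT, stringb TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
    placeholders = ", ".join("?" for _ in values)
    sql = f"SELECT stringa FROM t WHERE stringb IN ({placeholders}) ORDER BY rowid"
    result = [row[0] for row in conn.execute(sql, list(values))]
    conn.close()
    return result

print(stringa_for(["b", "c"]))  # ['a', 'a', 'b', 'b', 'c']
```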

I hope that helps.

  • If your single-column queries took a long time, you are very likely missing an index. Commented Aug 3, 2012 at 21:26
  • Usually a "select *" makes it hard for the optimizer of any database system to use any sort of index, and performance suffers badly. Try to select only the column values you need. Put an index on the column or columns you need, if there isn't one. Commented Aug 3, 2012 at 21:27
  • Where are you getting your OR clause comparisons from? Can't you do a SELECT column FROM table WHERE a IN (SELECT valid_terms FROM other_table)? Or is each OR really on a different column with a single value? If it's the latter, something's not right. Commented Aug 3, 2012 at 21:27
  • Are all three columns, a, b and c, indexed? Show us the table and index definitions. We'll have a better chance of helping you. Commented Aug 3, 2012 at 21:27
  • 1000 OR clauses? There's something horribly wrong with what you are doing. Need more info, but given your example any solution is going to be a painful amount of work. Commented Aug 3, 2012 at 21:28

5 Answers


First, change the column data type to varchar. Despite what you may have heard about char supposedly being faster, the tradeoff is to save a tiny bit of CPU for a huge increase in I/O (a very bad trade).

Second, you need an index on column stringb if it doesn't already have one. Indexes do not have to be unique.
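A quick way to see what the index changes is to compare the query plan before and after creating it. The sketch below uses SQLite's EXPLAIN QUERY PLAN from Python as a stand-in for MySQL's EXPLAIN; the table t, the index name idx_stringb and the plan helper are illustrative assumptions, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

def plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Join the detail column of EXPLAIN QUERY PLAN into one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (stringa TEXT, stringb TEXT)")
query = "SELECT stringa FROM t WHERE stringb = ?"

before = plan(conn, query, ("x",))                       # no index yet: a full SCAN of t
conn.execute("CREATE INDEX idx_stringb ON t (stringb)")  # a plain, non-unique index
after = plan(conn, query, ("x",))                        # now a SEARCH using idx_stringb

print(before)
print(after)
```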

Third, many DBMSes have no problem with thousands of OR conditions, though usually such a thing is expressed as WHERE stringb IN ('a', 'b', 'c', 'd', 'e' ...).
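If the list of values is built by a client program, a sketch like the following (Python with SQLite as a stand-in; table t and lookup_in are hypothetical) binds the values as parameters of an IN list rather than concatenating 1,000 OR clauses. It splits long lists into chunks because SQLite versions before 3.32.0 limit a statement to 999 bound parameters by default; in MySQL, the number of values in an IN() list is limited only by max_allowed_packet.

```python
import sqlite3
from typing import Iterable, List

CHUNK = 500  # below SQLite's pre-3.32 default cap of 999 bound parameters

def lookup_in(conn: sqlite3.Connection, values: Iterable[str]) -> List[str]:
    """Fetch stringa for every stringb in `values`, using parameterized IN lists."""
    unique = list(dict.fromkeys(values))  # a value repeated across chunks would repeat rows
    out: List[str] = []
    for start in range(0, len(unique), CHUNK):
        chunk = unique[start:start + CHUNK]
        placeholders = ",".join("?" * len(chunk))
        sql = f"SELECT stringa FROM t WHERE stringb IN ({placeholders})"
        out.extend(row[0] for row in conn.execute(sql, chunk))
    return out

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (stringa TEXT, stringb TEXT)")
conn.execute("CREATE INDEX idx_stringb ON t (stringb)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(f"a{i}", f"b{i}") for i in range(2000)])

wanted = [f"b{i}" for i in range(0, 2000, 2)]  # 1,000 search values
rows = lookup_in(conn, wanted)
print(len(rows))  # 1000
```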

Finally, in many cases a JOIN, if it doesn't give better performance (which is possible in some DBMSes or situations), will at least give greater clarity and reuse. For example, many people create a string split function that, when passed a string in the format 'a,b,c,d,e', returns a rowset containing each item in a separate row. Joining to this rowset is then easy, and as long as the client can construct the string to be split, the query can be driven dynamically.
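The split-then-join idea can be sketched without a user-defined function by using a recursive common table expression. This example runs on SQLite from Python; the CTE name split and the table t are hypothetical, and values that themselves contain commas are not handled. It is written for SQLite: MySQL only gained WITH RECURSIVE in 8.0, and a port would need changes such as CONCAT() in place of ||.

```python
import sqlite3

# Split a comma-separated parameter into rows with a recursive CTE, then join.
# Each step peels off the text before the first comma; a trailing comma is
# appended so the last item is handled like the others.
SPLIT_JOIN_SQL = """
WITH RECURSIVE split(item, rest) AS (
    SELECT '', ? || ','
    UNION ALL
    SELECT substr(rest, 1, instr(rest, ',') - 1),
           substr(rest, instr(rest, ',') + 1)
    FROM split
    WHERE rest <> ''
)
SELECT t.stringa
FROM t
JOIN split AS s ON t.stringb = s.item
WHERE s.item <> ''   -- drop the empty seed row (and any empty items)
ORDER BY t.rowid
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (stringa TEXT, stringb TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("a", "x"), ("b", "y"), ("c", "z"), ("d", "w")])

print([row[0] for row in conn.execute(SPLIT_JOIN_SQL, ("x,z",))])  # ['a', 'c']
```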

Here's one possible way to do a JOIN:

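CREATE TEMPORARY TABLE SearchValues (
   Value varchar(80) NOT NULL PRIMARY KEY  -- KEYS is reserved in MySQL; values run up to 80 chars
);

-- INSERT IGNORE skips duplicate search values instead of failing on the primary key
INSERT IGNORE INTO SearchValues (Value) VALUES ('x'), ('y'), ('z');

SELECT T.stringa
FROM
   YourTable T
   INNER JOIN SearchValues K
      ON T.stringb = K.Value;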
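Here is a temporary-table join sketched in Python against SQLite, for experimenting locally. The table and function names are hypothetical, INSERT OR IGNORE is SQLite's counterpart of MySQL's INSERT IGNORE, and the PRIMARY KEY on the temporary table both indexes and de-duplicates the search values.

```python
import sqlite3
from typing import Iterable, List

def lookup_via_temp_table(conn: sqlite3.Connection, values: Iterable[str]) -> List[str]:
    """Load the search values into a temp table and join against it."""
    conn.execute("DROP TABLE IF EXISTS temp.search_values")
    # PRIMARY KEY indexes the lookup values; INSERT OR IGNORE drops duplicates.
    conn.execute("CREATE TEMP TABLE search_values (value TEXT PRIMARY KEY)")
    conn.executemany("INSERT OR IGNORE INTO search_values (value) VALUES (?)",
                     ((v,) for v in values))
    rows = conn.execute(
        "SELECT t.stringa FROM t JOIN search_values AS k ON t.stringb = k.value"
    ).fetchall()
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (stringa TEXT, stringb TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("a", "x"), ("b", "y"), ("c", "z"), ("d", "x")])
print(sorted(lookup_via_temp_table(conn, ["x", "z", "x"])))  # ['a', 'c', 'd']
```

The join order is left to the query planner, so the result is sorted before printing.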

You need to create an index on the stringb column.

Your issue is more that you are doing a full table scan than the efficiency of the OR. Lists of values are traditionally written as an IN clause, though in some databases this has no effect on performance.

Also, are your fields declared as CHAR or VARCHAR? If they are CHAR, that is probably the root of the performance problem: CHAR values are padded out with spaces, greatly increasing the storage footprint and lengthening comparisons.
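The padding cost is simple arithmetic. The sketch below compares the bytes needed to store a handful of strings as CHAR(90) versus VARCHAR, assuming a single-byte character set such as latin1 (where a VARCHAR of up to 255 bytes carries a 1-byte length prefix) and ignoring per-row overhead. The lengths are made-up samples; with a multi-byte character set like utf8, MySQL can reserve up to 3 bytes per character for CHAR, which makes the gap larger.

```python
from typing import List

def char_bytes(lengths: List[int], width: int = 90) -> int:
    """CHAR(width) pads every value to `width` characters (1 byte each here)."""
    return width * len(lengths)

def varchar_bytes(lengths: List[int]) -> int:
    """VARCHAR stores the actual characters plus a 1-byte length prefix."""
    return sum(n + 1 for n in lengths)

lengths = [3, 10, 25, 40, 80]  # hypothetical sample of string lengths
print(char_bytes(lengths), varchar_bytes(lengths))  # 450 163
```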

2 Comments

Thanks for your answer. Sorry, I'm a newbie so please bear with me, but don't indexes have to be unique? My stringb values aren't unique. All 3 columns are declared as CHAR(90). I read somewhere that CHAR creates bigger tables because of the padding but is quicker because SQL doesn't have to determine where the data ends, as it does with VARCHAR. Have I been misinformed there?
Indexes do not have to be unique! Also, change your column definitions to varchar. You are storing 90+ chars for each string, even an empty one. This change will probably have a big impact on your query, since you'll reduce the size of the data on disk by an order of magnitude.

First, as others have suggested VARCHAR is a better choice for this data than CHAR. CHAR won't be faster.

Consider partitioning the table by KEY(stringb) with PARTITIONS 8 (the number 8 is arbitrary) and adding an index on (stringb, stringa). This will reduce I/O, and because the index covers the query, stringa can be returned from the index without reading the table rows.
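SQLite has no partitioning, but the covering-index half of this suggestion can be checked from Python: with an index on (stringb, stringa), the query plan should report a covering index, meaning stringa is read from the index without touching the table rows. The table and index names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (stringa TEXT, stringb TEXT)")
# Composite index: stringb first for the lookup, stringa second so the
# query can be answered from the index alone (a "covering" index).
conn.execute("CREATE INDEX idx_b_a ON t (stringb, stringa)")

rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT stringa FROM t WHERE stringb IN ('x', 'y', 'z')"
).fetchall()
detail = " | ".join(str(row[-1]) for row in rows)
print(detail)  # expected to mention COVERING INDEX idx_b_a
```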

Run the equality lookups in parallel. Running:

SELECT stringa FROM table WHERE stringb  in('x',...)
SELECT stringa FROM table WHERE stringb  in('y',...)
SELECT stringa FROM table WHERE stringb  in('z',...)

in three threads (each on its own connection) will give a significant performance improvement.

You then just need to put the results back together, which isn't difficult. Shard-Query can be used to automatically parallelize queries with IN() lists if you want to look into it:

http://code.google.com/p/shard-query


Try

SELECT stringa FROM table WHERE stringb = 'x' 
UNION ALL
SELECT stringa FROM table WHERE stringb = 'y' 
UNION ALL
SELECT stringa FROM table WHERE stringb = 'z' 

or

SELECT stringa FROM table WHERE stringb in ( 'x', 'y', 'z')

Or @ErikE's solution if you truly have a thousand OR conditions.

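UNION ALL should be considerably faster than UNION: since the WHERE conditions of your selects are mutually exclusive, you don't need the query to remove duplicates the way UNION does. (Note that UNION also collapses repeated stringa values, whereas UNION ALL, like the original OR query, returns them all.)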
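The duplicate handling is easy to check. In the sketch below (SQLite from Python, with a made-up table t), the stringa value 'a' appears under two different stringb values, so UNION ALL returns the same row count as the IN/OR query while UNION returns fewer. The values are interpolated from a fixed tuple only for brevity; real input should be bound as parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (stringa TEXT, stringb TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("a", "x"), ("a", "y"), ("b", "z")])

def count(sql: str) -> int:
    """Number of rows a query returns."""
    return len(conn.execute(sql).fetchall())

parts = [f"SELECT stringa FROM t WHERE stringb = '{v}'" for v in ("x", "y", "z")]
n_or = count("SELECT stringa FROM t WHERE stringb IN ('x', 'y', 'z')")
n_union_all = count(" UNION ALL ".join(parts))
n_union = count(" UNION ".join(parts))  # removes the second 'a'
print(n_or, n_union_all, n_union)  # 3 3 2
```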

1 Comment

UNION and UNION ALL are both serialized over each query, so it buys no advantage over running the queries serially, and ends up being slower because it stages all rows into a temporary table.

Although I consider @HLGEM's second suggestion the best, you can also try using a regular expression on column stringb.

2 Comments

Supposing the database in question supports regular expressions, a single regex of alternatives can be written, like stringb matching a|b|c|... This is just an option. I really don't know whether it would solve the problem, but I would try it. Of course, building regular expressions for your database is out of the scope of this question.
It would do a full table scan, so it isn't very useful. In MySQL you would use the RLIKE operator: WHERE stringb RLIKE 'a|b|c' (anchored as '^(a|b|c)$' if it should match whole values rather than substrings).
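SQLite parses X REGEXP Y but ships no regexp() implementation, so the Python sketch below registers one built on re.search to show the anchoring difference. It is a stand-in for MySQL's RLIKE, and the table t and the helper matches are hypothetical. As noted above, neither form can use an index on stringb.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite turns `X REGEXP Y` into a call regexp(Y, X), i.e. (pattern, value).
conn.create_function(
    "regexp", 2,
    lambda pattern, value: value is not None and re.search(pattern, value) is not None,
)
conn.execute("CREATE TABLE t (stringa TEXT, stringb TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?)",
                 [("a", "abc"), ("b", "b"), ("c", "xyz")])

def matches(pattern: str):
    """stringa of every row whose stringb matches `pattern`."""
    sql = "SELECT stringa FROM t WHERE stringb REGEXP ? ORDER BY rowid"
    return [row[0] for row in conn.execute(sql, (pattern,))]

print(matches("a|b|c"))      # ['a', 'b']  ('abc' matches as a substring)
print(matches("^(a|b|c)$"))  # ['b']
```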
