0

I have a table with following columns:

CREATE TABLE some_table (
    col_1 ...
    col_2 ...
    col_3 ...
    col_4 ...
    col_5 ...

    # some more columns
) ENGINE = InnoDB DEFAULT CHARSET = utf8 DEFAULT COLLATE utf8_unicode_ci;

and I perform the following search:

SELECT * FROM some_table t
WHERE (t.col_1 LIKE :col1
    OR t.col_2 = :col2
    OR t.col_3 = :col3
    OR t.col_4 = :col4)
AND t.col_5 > :col5;

I populate :col1, :col2, etc. within the code. So ideally my index would be:

ALTER TABLE some_table
ADD INDEX my_index (col_1, col_2, col_3, col_4, col_5);

However, a new requirement showed up, and when :col1, col2, etc. are NULL I need to skip them. Two alternatives are present:

  1. Make the SQL Query dynamic with multiple indexes:

I can query like:

SELECT * FROM some_table t
WHERE (t.col_1 LIKE :col1
    -- OR t.col_2 LIKE :col2 -- Remove these columns from query if they're null
    -- OR t.col_3 LIKE :col3 -- Remove these columns from query if they're null
    OR t.col_4 LIKE :col4)
AND t.col_5 LIKE :col5;

but then I would need a lot more indexes because an index with col_1, col_2, col_3, col_4, col_5 would cover say a col_1, col_2, col_3 query but not col_1, col_3, col_4 query as per here and here. So my indexes become:

ALTER TABLE some_table
ADD INDEX index_1 (col_1, col_2, col_3, col_4, col_5),

ADD     INDEX index_2(col_1, col_2, col_3, col_5),
ADD     INDEX index_3(col_1, col_2, col_4, col_5),
ADD     INDEX index_4(col_1, col_3, col_4, col_5),
ADD     INDEX index_5(col_2, col_3, col_4, col_5),

ADD     INDEX index_6 (col_1, col_2, col_5),
ADD     INDEX index_7 (col_1, col_3, col_5),
ADD     INDEX index_8 (col_1, col_4, col_5),
ADD     INDEX index_9 (col_2, col_3, col_5),
ADD     INDEX index_10 (col_2, col_4, col_5),
ADD     INDEX index_11 (col_3, col_4, col_5),

ADD     INDEX index_12 (col_1, col_5),
ADD     INDEX index_13 (col_2, col_5),
ADD     INDEX index_14 (col_3, col_5),
ADD     INDEX index_15(col_4, col_5),
ADD     INDEX index_16 (col_5);

I do not really care about my read performance, but a lot of storage overhead might be an issue in the long term.

  1. Second alternative: Keep the single index (my_index) as is, but change query:
SELECT * FROM some_table t
WHERE ((t.col_1 IS NOT NULL AND t.col_1 LIKE :col1)
    OR (t.col_2 IS NOT NULL AND t.col_2 LIKE :col2)
    OR (t.col_3 IS NOT NULL AND t.col_3 LIKE :col3)
    OR (t.col_4 IS NOT NULL AND t.col_4 LIKE :col4))
AND t.col_5 LIKE :col5;

I don't know if it would be enough to trick MySQL to use the index because if say :col1 is indeed NULL the condition becomes: t.col_2 IS NOT NULL AND t.col_2 LIKE NULL and since this is an impossible condition, the optimizer could remove it.

With the data currently in hand, both queries run about the same time. What is the correct way to proceed with this?

3
  • @Akina Hi, thanks for the comment. Can you elaborate on normalize? The problem is that the data is not mine, so I have very few options (like creating indexes) for the moment. Commented Jan 15, 2021 at 12:51
  • @Akina, sorry was not specific enough, The columns are not dependent, to be exact with the context, in an online payment transaction, columns are, merchantId, customerId, transactionDate... Commented Jan 15, 2021 at 12:58
  • col5 is the only part that will be used; putting it last makes the index useless. See my answer for more discussion. Commented Jan 15, 2021 at 18:52

2 Answers 2

1

OR won't use an index (except in the case of "covering"). So, don't bother with those multi-column indexes.

A trick to avoid OR is to use UNION. More discussion: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#or

You are using LIKE. Does this mean that the user might say LIKE '%abc'? If so, that also cannot use an index, even with the UNION technique. That is, if you discover any fill-ins that start with a wildcard, then your original OR is probably optimal (though slow).

Are the search values "words"? In particular, the same word? If so, there is an excellent (though not perfect) solution:

FULLTEXT(col1, col2, col3, col4)

together with

MATCH(col1, col2, col3, col4) AGAINST("+word" IN BOOLEAN MODE)

That will be faster than any other approach, but it depends on the search strings being "words" and of a minimal length and not being "a stop word".

Whatever approach you settle on, I recommend you "construct" the query in your app so that you can skip col2 if the user does not need to test with it, etc.

As for AND col5 LIKE ... -- Again this is costly if it starts with a wildcard. If it does not, then this will be beneficial:

INDEX(col5)

If no wildcard, then that is equivalent to AND col5 = '...', which optimizes nicely. If it ends with a wildcard, it is a "range", somewhat like BETWEEN. This is also efficient.

If you have both:

WHERE MATCH(col1, col2, col3, col4) AGAINST("+word" IN BOOLEAN MODE)
  AND col5 ...

The Optimizer will perform the MATCH first; any index on col5 or leading wildcard won't matter.

Since the columns are merchantId, customerId, transactionDate, etc it further behooves you to "construct" the query. If it looks like a date, then test only transactionDate. That gets rid of OR and leads to possibly having INDEX(transactionDate, col5). (We can give you better advice if you are more specific with the column names and types). Also, are those Ids numbers or strings? What are some examples of where your LIKE might involve a wildcard? Or do the users even know about such?

Sign up to request clarification or add additional context in comments.

3 Comments

I'm always looking for an exact match (apart from transactionDate, which is col_5, and the condition is >). So I interpret this as: I should convert to UNIONs, with INDEX (col_1, transaction_date), INDEX (col_2, transaction_date), INDEX (col_3, transaction_date) and INDEX (col_4, transaction_date). Would another INDEX (transaction_date) help?
Not performing the UNION for the columns where :colN is null will also take care of handling of the null values.
@HasanCanSaral - Yes. That is part of "constructing" the query -- leave out stuff that is not needed based on user input. And, yes; build a suitable index for each of the SELECTs in the UNION; and they will usually be different.
0

Just a suggestion A part the problems related to a propably bad attribute design .. A lot of your indexes are totally useless

ADD     INDEX index_1 (col_1, col_2, col_3, col_4, col_5),

ADD     INDEX index_2(col_1, col_2, col_3, col_5),
ADD     INDEX index_3(col_1, col_2, col_4, col_5),
ADD     INDEX index_4(col_1, col_3, col_4, col_5),
ADD     INDEX index_5(col_2, col_3, col_4, col_5),

ADD     INDEX index_6 (col_1, col_2, col_5),
ADD     INDEX index_7 (col_1, col_3, col_5),
ADD     INDEX index_8 (col_1, col_4, col_5),

all these indexes work in the same way ..

the db engine use a index evaluating the columns from left to right .. so you are just replicate several time the same value without any effort for performance

if you want better performance you should change the left to right column order eg_

ADD     INDEX index_2(col_2, col_3, col_4, col_1),
ADD     INDEX index_3(col_3, col_4, col_5, col_1),

anyway you should add the index for the more frequente and selective column

9 Comments

Thank you for the answer. So indexes with 5 columns for every possible sequence would cover all my queries?
Pratically yes .. even the concept is more complex .. anyway .. i suggest you a review for table struct ..and normalize these columns in a proper relation
Can you elaborate on the meaning of normalize? Thanks.
depend of the meaning of the columns ..if the columns refer to data complety different then the normalize in not necessary . if the columns are value repetition of same contents then you should create a detail table for one or more vale related to you main . record .. but without a real data sample is not possible answer properly .
They are absolutely independent. To be exact with the context, in an online payment transaction, columns are, merchantId, customerId, transactionDate...
|

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.