2

I have the values of 'foo', 'bar' and 'buz'. I'd like to select rows 1 and 3 from the following table:

+----+------+------+------+
| Id | Col1 | Col2 | Col3 |
+----+------+------+------+
|  1 | foo  |      |      |
|  2 | foo  | bar  | buz  |
|  3 |      | bar  | buz  |
+----+------+------+------+

What would be the optimal MySQL query for that considering the following:

  • empty spaces are null
  • I have n columns where 4 < n < 20
  • there will be couple hundred rows
  • I will be selecting particular combinations of columns in a row where only provided values can be set and the rest needs to be null. E.g.: I need a query of this kind

SELECT * FROM my_table WHERE Col1 = foo OR (Col2 = bar AND Col3 = buz);

that'll return only rows 1 and 3.

In other words I'd like to have a query that returns rows only with certain combinations of values, not the rows where coln = 'some_val' and we dont' care about the others.

The reason for this is that I'd like to have 5 combinations in 1 query and I'd like to avoid writing (...) coln <> NULL(...) everywhere.

I'd appreciate your suggestions.

5
  • You want to have a query that returns a records which have at least on entry in one of the three columns col1, col2, col3? Commented Feb 25, 2015 at 11:17
  • can you explain your last bullet-point a bit more? Commented Feb 25, 2015 at 11:26
  • @blckbird, Chris: I've edited my question Commented Feb 25, 2015 at 11:37
  • Could you please give a few more examples? In particular, which, if any, of the following rows would you like returned: (4, foo, bar, NULL), (5, xyzzy, NULL, NULL), (6, bar, NULL, NULL), (7, NULL, NULL, NULL)? Commented Feb 25, 2015 at 11:40
  • @Ilmari Karonen I would like the db to assume, that for every field not provided in a query it needs to be NULL for the whole row to be returned. Even if provided values match. Commented Feb 25, 2015 at 11:51

2 Answers 2

3

You would use where clause. To find the three values:

where 'foo' in (col1, col2, col3, . . . ) or
      'bar' in (col1, col2, col3, . . . ) or
      'buz' in (col1, col2, col3, . . . )

Limiting the results to those three values is more difficult. You need to do a comparison on each column:

where ( (col1 in ('foo', 'bar', 'buz') or col1 is null) or
        (col2 in ('foo', 'bar', 'buz') or col2 is null) or
        . . . 
      )

You can use a spreadsheet to generate the code, if you don't want to type it all in.

Another alternative would be the following kludge:

where replace(replace(replace(concat_ws('', col1, col2, col3, . . . ),
                              'foo', ''
                             ), 'bar', ''
                      ), 'buz', ''
             ) = ''

This concatenates the values together and then removes the ones you care about. If everything is removed, then the row matches.

As a note: this problem suggests that you have a poor data structure. You are storing things in columns that should be in rows.

Sign up to request clarification or add additional context in comments.

3 Comments

Thanks Gordon, I'm going through your answer. Just a comment on the last sentence. I could store these details in separate tables but then I'd have x5 db queries in my script. Also, there will be an 'infinite' number of rows and finite number of columns.
@Alan . . . You would store the values in a single table, with one row per column. That is, you would use a normalized data format.
. . I am not sure if serialization is the best option here. If I serialize this data I will have to do SELECT * each time. With this structure and WHERE clause I will be selecting max 5 results from couple hundred rows a query. Unless it's more efficient to do it that way.... Thanks for the suggestion, I'll have to research it.
1

I would like the db to assume, that for every field not provided in a query it needs to be NULL for the whole row to be returned. Even if provided values match.

In that case, the trivial solution would be:

SELECT * FROM my_table WHERE
  (Col1 = 'foo' AND Col2 IS NULL AND Col3 IS NULL) OR
  (Col1 IS NULL AND Col2 = 'bar' AND Col3 = 'buz') OR
  /* ... */

In many cases, this can be a perfectly good solution. However, it can sometimes suffer suboptimal performance due to MySQL's poor index use for OR queries. In particular, if you have a combined index on all the columns, and want to make full use of it, it can be better to rewrite the query to use UNION (or UNION ALL) instead of OR:

SELECT * FROM my_table WHERE
  (Col1 = 'foo' AND Col2 IS NULL AND Col3 IS NULL) UNION ALL
SELECT * FROM my_table WHERE
  (Col1 IS NULL AND Col2 = 'bar' AND Col3 = 'buz') UNION ALL
  /* ... */

See this SQLFiddle for a demonstration of the two techniques, and compare the execution plans (the execution times are rather meaningless with such a small table; you'd need many more rows to see any systematic difference). Looking at the KEY_LEN and REF columns, you can see that, for the OR query, MySQL ends up using the index only for Col1, whereas with UNION ALL, it can use the whole index.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.