I have a MySQL column that holds various string data; it is a VARCHAR field.

The table has more than 100k records, and I'd like to SELECT only the records in which this field starts with any character other than 1, 2, 3, 4, 5, 6, 7, 8, or 9.

Is it faster to:

  • write a REGEXP in the SQL query, or
  • just select all records and filter them in PHP with a regular expression?

Comments

  • It's likely much faster in MySQL. See "Select Query | Select Entires That Don't Start With A Number - MySQL". Commented Nov 6, 2017 at 17:20
  • Do it on the database, that way you are not returning a massive amount of redundant data. This type of thing is exactly what databases are designed to do. Commented Nov 6, 2017 at 17:20
  • Your second option is absolutely not good. When it comes to DB data access, let the VERY powerful DB engine do as much of the work as possible, no matter how complex the SQL statements become. Commented Nov 6, 2017 at 17:24
  • Don't worry about faster. The difference will be insignificant except under huge volumes of data. Instead, focus on which is easier to read and maintain. Commented Nov 6, 2017 at 17:24
  • @AndyLester a for loop in PHP on 100K records will definitely show a performance drop. Believe me, I've tried it... Commented Nov 6, 2017 at 17:25

2 Answers

2

The SQL query will be faster, hands down. This sort of thing is precisely what SQL is meant to be used for.
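
As a rough sketch of what such a query could look like, the filter can be written with MySQL's `NOT REGEXP` operator. The table name `items` and the column names `id` and `code` are hypothetical; they do not come from the question.

```sql
-- Hypothetical names: table `items`, columns `id` and `code` (VARCHAR).
-- Keep only rows whose value does not begin with one of the digits 1-9.
SELECT id, code
FROM items
WHERE code NOT REGEXP '^[1-9]';
```

Rows where `code` is NULL are not returned by this predicate, because the comparison evaluates to NULL for them.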

To clarify for future reference: when you need the DB to return a specific data set, you should let the DB deal with constructing the dataset by using a SQL query. Your application code can then have one or more abstractions that represent and handle the resulting dataset for your business use case, but it should not do the DB engine's job.

TL;DR: building a dataset from DB tables is a data access layer concern; handling abstractions related to business entities is an application layer concern.

0

I reopened because the dup (Select Query | Select Entires That Don't Start With A Number - MySQL) had the inverse condition -- "rows not starting with ..."

There is a significant optimization for rows starting with some consecutive set of characters, such as 1..9:

INDEX(col)

SELECT ... WHERE col >= '1'
             AND col  < CHAR(ORD('9') + 1)

With that index, this would scan only the rows starting with 1..9 rather than the entire table, which is what the PHP approach would require and what all three answers on the other Question require.
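
For the condition as worded in the Question (values that do not start with 1..9), the complementary ranges could be used in the same spirit. This is only a sketch: `tbl` and `col` are placeholder names, and it assumes a collation that orders ASCII characters by code point (for example a `_bin` collation), so that ':' is the character immediately after '9'. Under other collations the boundary may fall elsewhere.

```sql
-- Placeholder names: tbl, col. Assumes INDEX(col) exists and that the
-- column's collation orders ASCII by code point (':' follows '9').
SELECT col
FROM tbl
WHERE col < '1'
   OR col >= ':';
```

Whether the optimizer actually uses the index here depends on how many rows match; if most of the table qualifies, it may reasonably choose a full scan instead.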

A second reason this is not a dup of that other one -- the Question here is more about PHP vs MySQL. The main performance argument for doing it in MySQL is to save the transmission time.

If you need a fancier REGEXP, you could switch to MariaDB, which has (I think) the same regexp engine as PHP. If you need something too complex for SQL, even with a better regexp, then you may be forced to go to PHP. But even in that case, filter as much as you can in SQL -- to minimize the amount of data being shoveled over the 'wire'.
