Should I perform regex filtering in MySQL or PHP?

Question

I have a MySQL column that holds various string data; it is a VARCHAR field.

The table has more than 100k records, and I'd like to filter a query on this field to SELECT only the records in which this field starts with any character other than 1, 2, 3, 4, 5, 6, 7, 8, or 9.

Is it faster to:

write a REGEXP in the SQL query, or
just select all records and filter them out in PHP by performing a PHP REGEX?

It's likely much faster in MySQL. See Select Query | Select Entires That Don't Start With A Number - MySQL
– ctwheels, Commented Nov 6, 2017 at 17:20
Do it on the database, that way you are not returning a massive amount of redundant data. This type of thing is exactly what databases are designed to do.
– Alex K., Commented Nov 6, 2017 at 17:20
Your second option is absolutely not good. When it's about db data access, let the VERY powerful db engine do all the possible jobs. No matter how complex the sql statements become.
– user7941334, Commented Nov 6, 2017 at 17:24
Don't worry about faster. The difference will be insignificant except under huge volumes of data. Instead, focus on which is easier to read and maintain.
– Andy Lester, Commented Nov 6, 2017 at 17:24
@AndyLester a for loop in PHP on 100K records will definitely show a performance drop. Believe me, I've tried it...
– ctwheels, Commented Nov 6, 2017 at 17:25

SrThompson · Accepted Answer · 2017-11-06 17:44:14Z

The SQL query will be faster, hands down. This sort of thing is precisely what SQL is meant to be used for.

To clarify for future reference: when you need the DB to return a specific data set, you should let the DB deal with constructing the dataset by using a SQL query. Your application code can then have one or more abstractions that represent and handle the resulting dataset for your business use case, but it should not do the DB engine's job.

TL;DR: building a dataset from DB tables is a data access layer concern; handling abstractions related to business entities is an application layer concern.
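
As a rough sketch of what the in-database filter could look like for the condition in the question (first character is not one of 1 to 9), assuming a hypothetical table `my_table` with an `id` column and the VARCHAR column named `col` (none of these names come from the question):

```sql
-- Hypothetical names: my_table, id, col.
-- Keep only rows whose value does not begin with a digit from 1 to 9.
SELECT id, col
FROM my_table
WHERE col NOT REGEXP '^[1-9]';
```

Rows where `col` is NULL are not returned by this query, because `NULL NOT REGEXP ...` evaluates to NULL; add `OR col IS NULL` if those rows should be included.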

edited Nov 6, 2017 at 17:44

answered Nov 6, 2017 at 17:25

SrThompson


Rick James · Accepted Answer · 2017-11-07 15:46:08Z

I reopened because the dup (Select Query | Select Entires That Don't Start With A Number - MySQL) had the inverse condition -- "rows not starting with ..."

There is a significant optimization for rows starting with some consecutive set of characters, such as 1..9:

INDEX(col)

SELECT ... WHERE col >= '1'
             AND col  < CHAR(ORD('9') + 1)

This would scan only the 1..9 rows rather than the entire table. The PHP approach would require a full scan, as would all three answers on the other Question.
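
One way to check whether the range form is really served by the index is to create the index and look at the plan with EXPLAIN. This is only a sketch, assuming a hypothetical table `my_table` with columns `id` and `col` and a hypothetical index name `idx_col`; the plan the optimizer picks can depend on the column's collation and on how many rows match:

```sql
-- Hypothetical names: my_table, id, col, idx_col.
ALTER TABLE my_table ADD INDEX idx_col (col);

-- Ask the optimizer how it would execute the range query.
EXPLAIN
SELECT id, col
FROM my_table
WHERE col >= '1'
  AND col <  CHAR(ORD('9') + 1);
```

In the EXPLAIN output, a `type` of `range` with `key` set to `idx_col` suggests that only the matching part of the index is read, while a `type` of `ALL` indicates a full table scan.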

A second reason this is not a dup of that other one -- the Question here is more about PHP vs MySQL. The main performance argument for doing it in MySQL is to save the transmission time.

If you need a fancier REGEXP, you could switch to MariaDB, which has (I think) the same regexp engine as PHP. If you need something too complex for SQL, even with a better regexp, then you may be forced to go to PHP. But even in that case, filter as much as you can in SQL -- to minimize the amount of data being shoveled over the 'wire'.
