I have a table with a column that contains a regular expression used to match rows from other tables. I then need to run a query like this:

SELECT st.id
FROM some_table st
WHERE '1234' REGEXP st.regexp;

As long as the regexp column contains a valid expression or NULL, the query runs fine. But if any row holds an invalid regular expression, the entire query fails with error 3685. It is then virtually impossible to find the offending row, as there is no function for validating a regular expression. Something like VALID_REGEXP() would solve it:

SELECT *
FROM some_table st
WHERE NOT VALID_REGEXP(st.regexp);

I am validating on INSERT/UPDATE by running:

SELECT '' REGEXP 'regexp-to-test'

But if an invalid expression finds its way in anyway, there is no way to locate it among several million rows, as you would have to test them one by one and look for error 3685.

Any hint on how to, in one query, find all rows with invalid regular expressions in their regexp column?

SELECT '' REGEXP <expression>

will tell me whether an expression is valid: it returns a row if the expression is valid and fails with error 3685 if it is not. But testing row by row is not an option, as there is a huge number of rows to test.

  • What version of MySQL are you running? Commented Oct 3, 2023 at 11:44

1 Answer

Because MySQL doesn't have a built-in regex validator function, you'll need to use something like a shell script to test and report all the regex patterns in your table. After that, you can prevent this from happening again by adding a CHECK constraint that runs the stored regex pattern against a known match (or known non-match) stored in the same row: if the pattern is invalid, the CHECK constraint fails, so invalid patterns cannot be stored in the table in future.

Part 1: Validate patterns in the shell:

Using this QA as a source, and this one too.

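# -N drops the column-name header; --raw stops mysql from escaping backslashes in its output
mysql -N --raw -e "SELECT pattern FROM my_patterns" | while IFS= read -r pattern; do

    # read -r keeps backslashes in the pattern intact; grep exits with status 2 if the pattern does not compile
    echo "foobar" | grep -E "^${pattern}"

done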

You'll need to update the data manually based on how each grep goes.
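To avoid reading through the grep output by eye, the loop can be wrapped so that it prints only the patterns grep refuses to compile. This is a minimal sketch, not a definitive implementation: the function name find_invalid_patterns is made up for illustration, and it relies on grep exiting with status 2 on an error, such as a pattern that does not compile. Keep in mind that grep -E uses POSIX extended syntax while MySQL 8.0 uses the ICU library, so a pattern such as \d may be treated differently by the two. Treat the result as a first pass rather than a definitive list.

```shell
#!/bin/sh
# Hypothetical helper: reads one pattern per line on stdin and prints
# only the patterns that grep -E cannot compile.
find_invalid_patterns() {
    while IFS= read -r pattern; do
        # Match against a single empty line; only the exit status matters:
        # 0 = matched, 1 = no match, 2 = error (e.g. invalid pattern).
        printf '\n' | grep -E -e "$pattern" >/dev/null 2>&1
        if [ $? -ge 2 ]; then
            printf '%s\n' "$pattern"
        fi
    done
}

# Intended usage (table and column names as in this answer):
#   mysql -N --raw -e "SELECT pattern FROM my_patterns" | find_invalid_patterns
```

Each line it prints is a candidate to fix or delete in the table before adding the CHECK constraint from Part 2.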

Part 2: Prevent invalid patterns from being stored with a CHECK constraint:

For example:

CREATE TABLE my_patterns (
  patId   int          NOT NULL AUTO_INCREMENT,
  pattern varchar(255) NOT NULL,
  test    varchar(50)  NOT NULL,
  
  CONSTRAINT PK_patterns PRIMARY KEY ( patId ),
  
  CONSTRAINT CK_pattern_test CHECK ( REGEXP_LIKE( test, pattern ) = 1 )
);

INSERT INTO my_patterns ( pattern, test ) VALUES ( '\\d\\d\\d', '123' ); -- OK

INSERT INTO my_patterns ( pattern, test ) VALUES ( '\\d\\w\\d', '1a3' ); -- OK

INSERT INTO my_patterns ( pattern, test ) VALUES ( '][', 'aaa' ); -- fails due to invalid pattern

Schema Error: Check constraint 'CK_pattern_test' is violated.

Live example.

1 Comment

Thanks. Part 2 solves the problem. The project isn't live yet, so with this CHECK at the DB level I won't risk getting corrupt data in, and there will be no need for a query to find invalid expressions.
