
I know there are already some threads about matching regexes against a list, e.g. How do you use a regex in a list comprehension in Python?, but I don't think those approaches are very scalable.

My question is how to do the regex matching as efficiently as possible. For example, I have the profanity word list below (it has 2,000 lines in total):

.*damn
bollock.*
...

(You get the idea…)

What I want to do is find out, as fast as possible, whether a sentence contains any profanity word/pattern. Concatenating all these patterns into one pattern with | would produce a huge regex. Does anyone have an idea how to optimize this in Python?
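
For reference, a minimal sketch of the naive single-pattern approach (the word list and helper name below are just illustrative, and the leading/trailing .* from the file are dropped because search() makes them redundant):

import re

# Stand-in for the real 2,000-entry list loaded from the profanity file.
profanity_words = ["damn", "bollock"]

# Join everything into one big alternation -- this is the "super-huge" pattern.
big_pattern = re.compile("|".join(re.escape(w) for w in profanity_words),
                         re.IGNORECASE)

def contains_profanity(sentence):
    # search() already scans the whole sentence, so ".*" prefixes/suffixes
    # in the original patterns add nothing here.
    return big_pattern.search(sentence) is not None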

  • Beware the Scunthorpe problem… Commented May 19, 2017 at 3:48
  • I don't know who downvoted this. Any suggestions? Commented May 19, 2017 at 9:04
  • Likely due to the gratuitous use of profanity in the original post. Commented May 19, 2017 at 9:42

1 Answer


I would give this library a try:

https://code.google.com/archive/p/esmre/

Regular expression acceleration in Python using Aho-Corasick

Or this:

https://github.com/WojciechMula/pyahocorasick/
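
Here is a minimal sketch of how pyahocorasick could be used for this kind of check, assuming the patterns can be reduced to plain substrings (the .* prefixes/suffixes in your list are redundant for a containment test); the list and function names are placeholders:

import ahocorasick

# Stand-in for the real 2,000-entry word list.
profanity_words = ["damn", "bollock"]

automaton = ahocorasick.Automaton()
for idx, word in enumerate(profanity_words):
    automaton.add_word(word, (idx, word))  # payload returned on each match
automaton.make_automaton()                 # build the Aho-Corasick links

def contains_profanity(sentence):
    # iter() scans the sentence once, yielding (end_index, payload) for
    # every occurrence of any listed word.
    for _end, (_idx, _word) in automaton.iter(sentence.lower()):
        return True
    return False

print(contains_profanity("well damn it"))  # True
print(contains_profanity("hello world"))   # False

The automaton is built once up front, and each lookup is then a single linear scan of the sentence, regardless of how many words are in the list.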
