
I know there are already some threads about matching regexes against a list, e.g. How do you use a regex in a list comprehension in Python?, but I don't think those approaches are very scalable.

My question is how to do the regex matching as efficiently as possible. For example, I have the profanity word list below (it has 2,000 lines in total):

.*damn
bollock.*
...

(You get the idea…)

What I want to do is find out, as fast as possible, whether a sentence contains any profanity word/pattern. Concatenating all these patterns into one pattern with | would produce a huge regex. Does anyone have an idea how to optimize this in Python?
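
For reference, a minimal sketch of the naive single-pattern approach (the word list and helper name below are just illustrative, and the leading/trailing .* from the file are dropped because search() makes them redundant):

import re

# Stand-in for the real 2,000-entry list loaded from the profanity file.
profanity_words = ["damn", "bollock"]

# Join everything into one big alternation -- this is the "super-huge" pattern.
big_pattern = re.compile("|".join(re.escape(w) for w in profanity_words),
                         re.IGNORECASE)

def contains_profanity(sentence):
    # search() already scans the whole sentence, so ".*" prefixes/suffixes
    # in the original patterns add nothing here.
    return big_pattern.search(sentence) is not None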

  • Beware the Scunthorpe problem… Commented May 19, 2017 at 3:48
  • I don't know who downvoted this. Any suggestions? Commented May 19, 2017 at 9:04
  • Likely due to the gratuitous use of profanity in the original post. Commented May 19, 2017 at 9:42

1 Answer


I would give this library a try:

https://code.google.com/archive/p/esmre/

Regular expression acceleration in Python using Aho-Corasick

Or this:

https://github.com/WojciechMula/pyahocorasick/
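
Here is a minimal sketch of how pyahocorasick could be used for this kind of check, assuming the patterns can be reduced to plain substrings (the .* prefixes/suffixes in your list are redundant for a containment test); the list and function names are placeholders:

import ahocorasick

# Stand-in for the real 2,000-entry word list.
profanity_words = ["damn", "bollock"]

automaton = ahocorasick.Automaton()
for idx, word in enumerate(profanity_words):
    automaton.add_word(word, (idx, word))  # payload returned on each match
automaton.make_automaton()                 # build the Aho-Corasick links

def contains_profanity(sentence):
    # iter() scans the sentence once, yielding (end_index, payload) for
    # every occurrence of any listed word.
    for _end, (_idx, _word) in automaton.iter(sentence.lower()):
        return True
    return False

print(contains_profanity("well damn it"))  # True
print(contains_profanity("hello world"))   # False

The automaton is built once up front, and each lookup is then a single linear scan of the sentence, regardless of how many words are in the list.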
