
Say I have a string "ldhjshjds HdAjhdshj4 Hdsshj4 kdskjdshjdsjds"

I only want to match substrings (alphanumeric only) that start with "H", and only if the substring is between 10 and 20 characters long.

"HdAjhdshj4" would be a match. "Hdsshj4" would not.

Would such a regex be costly on CPU cycles?

3 Answers


r"\bH[A-Za-z0-9]{9,19}\b" looks for precisely that.
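
As a quick check, here is a minimal sketch that runs the pattern against the string from the question. The variable names are only illustrative. The leading `H` counts as one character, so `{9,19}` gives a total length of 10 to 20.

```python
import re

# Sample string taken from the question.
s = "ldhjshjds HdAjhdshj4 Hdsshj4 kdskjdshjdsjds"

# \b anchors both ends at a word boundary; H plus 9-19 alphanumerics = 10-20 chars.
pattern = re.compile(r"\bH[A-Za-z0-9]{9,19}\b")

matches = pattern.findall(s)
print(matches)  # ['HdAjhdshj4']

# A 30-character word starting with H is rejected outright,
# rather than matched on its first 20 characters.
too_long = pattern.findall("H" + "a" * 29)
print(too_long)  # []
```

Note that `\b` treats punctuation such as `-` or `,` as a boundary too, so `x-HdAjhdshj4` would still yield a match.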


3 Comments

It should probably be r"\bH[A-Za-z0-9]{9,19}\b"; without the word anchors, it won't insist on the "word" starting with an H, and it won't really enforce the length limit (it will match the first 20 characters of a 30-character word, which does not seem to be desired). Also, side note: you've got a typo in the character class; you typed a-Z instead of a-z.
Touché, from the examples given it does not seem like OP desires matches inside of words. Edited the answer to your expression.
Oh, and just to answer the OP's other question: With the word-boundary anchors, the literal leading H, and bounded quantifiers, this regex should be pretty darn cheap to run. Profile if you have performance issues, but I don't foresee any.
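
If you do want to profile it, a rough sketch with the standard-library `timeit` module could look like the following. The input size and repetition count are arbitrary assumptions, and the timing will vary by machine, so no expected figure is shown.

```python
import re
import timeit

pattern = re.compile(r"\bH[A-Za-z0-9]{9,19}\b")

# Hypothetical workload: the question's string repeated 1000 times.
text = "ldhjshjds HdAjhdshj4 Hdsshj4 kdskjdshjdsjds " * 1000

# Time 100 full scans of the text.
elapsed = timeit.timeit(lambda: pattern.findall(text), number=100)
match_count = len(pattern.findall(text))

print(f"{match_count} matches per pass, {elapsed:.4f}s for 100 passes")
```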

Use negative lookarounds to require that the match is not adjacent to any non-whitespace character on either side.

re.findall(r'(?<!\S)H[A-Za-z0-9]{9,19}(?!\S)', s)
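
To illustrate how this differs from a `\b`-based pattern, here is a small sketch. The sample tokens are made up for the comparison; they are not from the question.

```python
import re

# Hypothetical input: one token glued to "x-", one followed by a comma.
s = "HdAjhdshj4 x-Hyphenated1 Htrailing99, Hlastword10"

# \b accepts any non-word character (like "-" or ",") as a delimiter.
word_boundary = re.findall(r"\bH[A-Za-z0-9]{9,19}\b", s)

# The negative lookarounds accept only whitespace or the string edges.
whitespace_only = re.findall(r"(?<!\S)H[A-Za-z0-9]{9,19}(?!\S)", s)

print(word_boundary)    # ['HdAjhdshj4', 'Hyphenated1', 'Htrailing99', 'Hlastword10']
print(whitespace_only)  # ['HdAjhdshj4', 'Hlastword10']
```

Which behavior is right depends on whether punctuation should count as a delimiter in your data.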




You can do this using lookarounds that require whitespace, or the start or end of the string, on each side of the match.

re.findall(r'(?:^|(?<=\s))H[A-Za-z0-9]{9,19}(?=\s|$)', s)
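
As a sanity check, the sketch below compares this pattern with the negative-lookaround form `(?<!\S)...(?!\S)` on a few inputs. The sample strings other than the first are assumptions chosen to cover the edges of the string, punctuation, and a newline.

```python
import re

negative = re.compile(r"(?<!\S)H[A-Za-z0-9]{9,19}(?!\S)")
positive = re.compile(r"(?:^|(?<=\s))H[A-Za-z0-9]{9,19}(?=\s|$)")

samples = [
    "ldhjshjds HdAjhdshj4 Hdsshj4 kdskjdshjdsjds",  # string from the question
    "HdAjhdshj4",                                   # match spans the whole string
    "x-HdAjhdshj4 HdAjhdshj4,",                     # punctuation on either side
    "HdAjhdshj4\nHdAjhdshj4",                       # newline as the separator
]

# Pair up the results of both patterns for each sample.
results = [(negative.findall(t), positive.findall(t)) for t in samples]

for text, (neg, pos) in zip(samples, results):
    print(repr(text), neg, pos)
```

On these inputs both patterns return the same lists, so the choice between them is mostly a matter of readability.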

