
I'm filtering tweets in my application and want to return all tweets that contain a certain word. For example, if I am filtering on BBC, I want every variant of it: BBC, bbc, BBC1, #BBC, @bbc. How could I write the regex?

So far I'm doing:

re.compile(r'#|@[0-9]'+term, re.IGNORECASE)

term is a list of words, and I want to match only those words, either on their own or with an extra @, # or digits (0-9) before or after them.

Thanks

  • The plus sign should probably be outside the brackets... Commented Nov 16, 2012 at 22:59
  • If I do that, I get this error: "Encountered Exception: unsupported operand type(s) for &: 'str' and 'int'" Commented Nov 16, 2012 at 23:03
  • Outside the brackets, not outside the quotes! It's still part of the regex... Commented Nov 16, 2012 at 23:13
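Separately from the quantifier placement discussed in the comments, the pattern in the question also has a precedence problem: '|' binds more loosely than concatenation, so r'#|@[0-9]' + term compiles to "#" OR "@<digit><term>", not "an optional #, @ or digit followed by term". A quick check, assuming term is a single string such as "BBC" (concatenating a list to a string would raise a TypeError instead); the sample strings are made up for illustration:

```python
import re

term = "BBC"  # assumption: a single word, not the list from the question
# '|' has the lowest precedence, so this is "#" OR "@<digit>BBC".
broken = re.compile(r'#|@[0-9]' + term, re.IGNORECASE)

print(broken.findall("Watching #news tonight"))  # ['#'] : any '#' matches
print(broken.search("I love bbc") is None)       # True : plain 'bbc' is missed
```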

1 Answer


Use the '\b' word-boundary assertion to match whole words only:

import re
re.compile(r'(?:#|@)?\b[0-9]*%s[0-9]*\b' % re.escape(term), re.IGNORECASE)
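The pattern above handles a single term, while the question's term is a list. A minimal sketch, assuming the terms can simply be escaped and joined into one alternation group; build_pattern and filter_tweets are hypothetical helper names and the tweets are invented sample data. It places the optional '#'/'@' before '\b', because '\b' does not match between a space (or the start of the string) and '#', so a prefix placed after '\b' would be left out of the match.

```python
import re

def build_pattern(terms):
    """Compile one case-insensitive regex matching any word in `terms`,
    optionally prefixed by '#' or '@' and optionally padded with digits.
    Assumes `terms` is non-empty (an empty group would match everywhere)."""
    # Escape each term so characters like '.' or '+' match literally,
    # then join them into a single non-capturing alternation group.
    alternatives = "|".join(re.escape(t) for t in terms)
    # A boundary exists between '#' and 'B', so '\b' goes after the prefix.
    return re.compile(r'(?:#|@)?\b[0-9]*(?:%s)[0-9]*\b' % alternatives,
                      re.IGNORECASE)

def filter_tweets(tweets, terms):
    """Return only the tweets that mention at least one of `terms`."""
    pattern = build_pattern(terms)
    return [t for t in tweets if pattern.search(t)]

tweets = [
    "Watching BBC1 tonight",
    "@bbc great coverage",
    "#BBC news at ten",
    "The BBCnews account",  # 'BBC' followed by letters: not a whole word
    "Nothing relevant here",
]
print(filter_tweets(tweets, ["BBC", "CNN"]))
# ['Watching BBC1 tonight', '@bbc great coverage', '#BBC news at ten']
print(build_pattern(["BBC"]).findall("#BBC and @bbc and bbc1"))
# ['#BBC', '@bbc', 'bbc1']
```

Compiling a single alternation once, rather than one regex per term, keeps the per-tweet work to a single search call.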