I have a regex that matches all three characters words in a string:
\b[^\s]{3}\b
When I use it with the string:
And the tiger attacked you.
this is the result:
regex = re.compile("\b[^\s]{3}\b")
regex.findall(string)
[u'And', u'the', u'you']
As you can see it matches you as a word of three characters, but I want the expression to take "you." with the "." as a 4 chars word.
I have the same problem with ",", ";", ":", etc.
I'm pretty new with regex but I guess it happens because those characters are treated like word boundaries.
Is there a way of doing this?
Thanks in advance,
EDIT
Thaks to the answers of @BrenBarn and @Kendall Frey I managed to get to the regex I was looking for:
(?<!\w)[^\s]{3}(?=$|\s)
you.to be treated as 4-char words so it doesn't match the regex\Zand not$? I think they will do the same thing in this case, but$is more recognizable.