5

I have tried this command in python console:

re.match('^\<.+\>([\w\s-,]+)\<.+\>$', 'Carrier-A')

and I got:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/re.py", line 141, in match
    return _compile(pattern, flags).match(string)
  File "/usr/lib/python2.7/re.py", line 251, in _compile
    raise error, v # invalid expression
sre_constants.error: bad character range

but when I use:

re.match('^\<.+\>([\w\s,-]+)\<.+\>$', 'Carrier-A')

no error is raised.

What should I know about how characters are ordered inside a character class?

1
  • I think the reason is that \s is itself a set of characters, so it cannot be used as the start or end of a character range. See my answer for a more thorough explanation. Commented Jan 15, 2017 at 9:47

3 Answers

11

A dash -, when used within square brackets [], has a special meaning: it defines a range of characters. E.g., [\s-,] means "any character from \s to ," (which is not possible, because \s is not a single character). However, the dash loses its special meaning if it is either the first or the last character inside the square brackets. That's why your second regex is correct.
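To make the difference concrete, here is a minimal sketch that compiles only the character class from each of the two patterns. It assumes Python 3, where the error type is exposed as re.error; the variable names are illustrative.

```python
import re

bad = r'[\w\s-,]+'   # dash sits between \s and ',', so it is read as a range
good = r'[\w\s,-]+'  # dash is the last character, so it is a literal dash

try:
    re.compile(bad)
    bad_compiles = True
except re.error:
    # The pattern is rejected at compile time, before any matching happens.
    bad_compiles = False

# The second class accepts word characters, whitespace, commas and dashes.
matched_text = re.match(good, 'Carrier-A').group(0)
print(bad_compiles, matched_text)
```

Because the failure happens while compiling the pattern, the string being matched plays no role in the error.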


1 Comment

@MYGz: \s is not just space, but a shorthand for "any kind of whitespace" (including tabs, carriage returns, line feeds etc.). So it doesn't correspond to a single ASCII value/Unicode code point.
3

Inside a character class, the character - specifies a range of characters, and the range is defined by the ASCII numbers (code points) of its two endpoints. So the left side must always have a lower ASCII number than the right side, and both sides must be single characters. Whenever your regex doesn't meet these criteria, Python raises that error. In this case the range is meaningless: \s-, would mean "any character between whitespace and comma", but \s is a whole set of characters, not a single endpoint.
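The ordering rule can be checked with a short sketch. It assumes Python 3; compiles is a hypothetical helper written for this example, not part of the re module.

```python
import re

def compiles(pattern):
    """Return True if the pattern compiles, False if re rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

forward = compiles(r'[a-z]')    # 'a' (97) to 'z' (122): valid range
backward = compiles(r'[z-a]')   # 'z' (122) to 'a' (97): rejected

# A range between punctuation characters silently includes everything in
# between: '+' is 43 and '.' is 46, so ',' (44) and '-' (45) match as well.
in_range = re.findall(r'[+-.]', '+,-./')
print(forward, backward, in_range)
```

The last line shows why an unintended range can be worse than an error: it compiles fine and quietly matches extra characters.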

If you want to use the hyphen literally, you have two options in Python. The first is escaping it with a backslash, as in [\w\s\-,]. The second is putting it at the very beginning or end of the character class, as you did: [\w\s,-]

Read more: http://www.regular-expressions.info/charclass.html
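As a rough check that these spellings behave the same, the sketch below matches one sample string against each variant. It assumes Python 3, and the sample string is made up for illustration.

```python
import re

candidates = [
    r'[\w\s\-,]+',  # hyphen escaped with a backslash
    r'[\w\s,-]+',   # hyphen as the last character of the class
    r'[-\w\s,]+',   # hyphen as the first character of the class
]

sample = 'Carrier-A, B'
# Each variant should consume the whole sample, hyphen included.
results = [re.match(pattern, sample).group(0) for pattern in candidates]
print(results)
```

Escaping is arguably the most robust choice, since the class keeps working if someone later appends more characters after the hyphen.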

1 Comment

@MYGz It's not only the "space" character; \s is a whitespace shorthand that matches all whitespace characters, including tab, space, etc. That makes it ambiguous as a range endpoint.
2

The - (minus sign) is used to define character ranges inside [], which is why you got an error when using:

re.match('^\<.+\>([\w\s-,]+)\<.+\>$', 'Carrier-A')

It reads as if you were asking for a character range from \s to , which is not valid.

In your second expression, [\w\s,-]+, there is no character range, since the - is at the end of the character class (the part between []), so it causes no issues.

The reason character ranges starting or ending with \s do not work can be seen in this quote from Python's documentation:

\s

For Unicode (str) patterns: Matches Unicode whitespace characters (which includes [ \t\n\r\f\v], and also many other characters, for example the non-breaking spaces mandated by typography rules in many languages). If the ASCII flag is used, only [ \t\n\r\f\v] is matched (but the flag affects the entire regular expression, so in such cases using an explicit [ \t\n\r\f\v] may be a better choice). For 8-bit (bytes) patterns: Matches characters considered whitespace in the ASCII character set; this is equivalent to [ \t\n\r\f\v].
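The quoted behaviour can be observed with a small sketch. It assumes Python 3 (re.fullmatch and re.ASCII are not available in Python 2.7), and the list of sample characters is chosen for illustration.

```python
import re

# Six ASCII whitespace characters plus a non-breaking space (U+00A0).
samples = [' ', '\t', '\n', '\r', '\f', '\v', '\u00a0']

# Default str patterns use Unicode rules, so \s also matches U+00A0.
unicode_hits = [bool(re.fullmatch(r'\s', ch)) for ch in samples]

# With re.ASCII, \s is limited to [ \t\n\r\f\v].
ascii_hits = [bool(re.fullmatch(r'\s', ch, re.ASCII)) for ch in samples]
print(unicode_hits, ascii_hits)
```

Since \s stands for several unrelated code points at once, it has no single position that could serve as the start or end of a range.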

3 Comments

The reason is simple, IMHO: \s does not represent a single character, but, as per re's doc, "Unicode whitespace characters (which includes [ \t\n\r\f\v], and also many other characters, for example the non-breaking spaces mandated by typography rules in many languages)". So how could anyone build a character range with \s?
That makes sense. Update it in your answer.
@MYGz, updated accordingly, thanks for the heads-up :)
