27

I have a question: can I say \t is equivalent to \s+ in a regular expression? I have some lines of code:

>>> import re
>>> b = '\tNadya Carson'
>>> c = re.compile(r'\s\s*')
>>> c
<_sre.SRE_Pattern object at 0x02729800>
>>> c.sub('',b)
'NadyaCarson'
>>> c = re.compile(r'\s\s+')
>>> c
<_sre.SRE_Pattern object at 0x027292F0>

Up to this point I get a pattern object, but when I try to substitute the whitespace with an empty string, the result still contains \t instead of it being removed:

>>> c.sub('',b)
'\tNadya Carson'

Why is the sub method not working in this case? Thank you!

  • \t is the escape sequence for tab only. \s is the special escape for space, tab, newline etc. Commented Apr 22, 2014 at 16:18
  • When you use \s\s+ you are looking for a space followed by 1 or more spaces. It will not match \tNadya, in fact it won't match any letter at all. Commented Apr 22, 2014 at 16:20
  • @Havenard: Say rather that \s\s+ matches a whitespace character followed by one or more whitespace characters. I know that's what you meant, but it's important to be precise in your phrasing when you're talking to regex beginners. Commented Apr 22, 2014 at 17:30

4 Answers

35

\t is not equivalent to \s+, but \s+ does match a tab (\t).

The problem in your example is that the second pattern \s\s+ is looking for two or more whitespace characters, and \t is only one whitespace character.

Here are some examples that should help you understand:

>>> import re
>>> result = re.match(r'\s\s+', '\t')
>>> print(result)
None
>>> result = re.match(r'\s\s+', '\t\t')
>>> print(result)
<_sre.SRE_Match object at 0x10ff228b8>

\s\s+ would also match ' \t', '\n\t', ' \n \t \t\n'.
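A quick way to check strings like these is re.fullmatch, which only succeeds when the entire string matches the pattern. This is a minimal sketch assuming Python 3.4+ (re.fullmatch does not exist in Python 2); the sample list is just the strings from this answer plus a lone tab:

```python
import re

pattern = re.compile(r'\s\s+')  # one whitespace character, then one or more

samples = ['\t', '\t\t', ' \t', '\n\t', ' \n \t \t\n']
# fullmatch succeeds only if the whole string matches the pattern
results = {s: pattern.fullmatch(s) is not None for s in samples}
for s, ok in results.items():
    print(repr(s), ok)
```

Only the lone '\t' fails, because it is a single whitespace character.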

Also, \s\s* is equivalent to \s+. Both will match one or more whitespace characters.
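To see the equivalence in practice, the sketch below compares both patterns on a few hand-picked strings, including b from the question. It is a spot check, not a proof, and assumes Python 3:

```python
import re

b = '\tNadya Carson'  # the string from the question

one_or_more = re.compile(r'\s+')
one_then_zero_or_more = re.compile(r'\s\s*')

for s in [b, '', 'no_whitespace', '  \t\n  x  y', '\t\t']:
    # Both patterns should find exactly the same whitespace runs
    assert one_or_more.findall(s) == one_then_zero_or_more.findall(s)

print(one_or_more.sub('', b))            # NadyaCarson
print(one_then_zero_or_more.sub('', b))  # NadyaCarson
```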


1 Comment

\s alone also matches \t. The second regex fails because it requires two or more whitespace characters, yes, but note also that the first regex works because the \s* part allows zero occurrences of \s.
7

\s+ is not equivalent to \t because \s does not mean <space>, but instead means <whitespace>. A literal space (a tab is sometimes displayed as four of them, depending on the application) is simply ' '. That is, hitting the spacebar creates a literal space. That's hardly surprising.
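The difference between a literal space and \s can be checked character by character. This is a minimal sketch assuming Python 3; the candidate list is the six ASCII whitespace characters (for str patterns, Python's \s also matches other Unicode whitespace, which is not shown here):

```python
import re

candidates = [' ', '\t', '\n', '\r', '\f', '\v']

# A literal space in the pattern only matches the space character itself
literal_space = [c for c in candidates if re.fullmatch(' ', c)]
# \s matches every whitespace character in the list
whitespace = [c for c in candidates if re.fullmatch(r'\s', c)]

print(literal_space)            # [' ']
print(whitespace == candidates) # True
```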

\s\s will never match a lone \t: \t IS whitespace, so the first \s matches it, but then there is nothing left for the second \s. It will match \t\t, because that is two whitespace characters (both tabs). When your regex runs \s\s+, it looks for one whitespace character followed by one, two, three, or really ANY number more. When it reads your regex it does this:

\s\s+

(Regular expression visualization: Debuggex demo)

The \t matches the first \s, but when the regex reaches the second \s the next character is N, which is not whitespace, so it spits the \t back out saying "Oh, nope, never mind."
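The failure can be made visible by trying the pattern at every start position, which is conceptually what search() and sub() do while scanning the string. This is a sketch assuming Python 3, using the optional pos argument of Pattern.match:

```python
import re

b = '\tNadya Carson'
pattern = re.compile(r'\s\s+')

# Try the pattern at each start position in turn
hits = [pos for pos in range(len(b) + 1) if pattern.match(b, pos)]
print(hits)                     # []
print(pattern.search(b))        # None
print(pattern.sub('', b) == b)  # True: nothing matched, so nothing is replaced
```

Both whitespace characters in b (the tab and the space) are isolated, so no position offers two in a row.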

Your first regex does this:

\s\s*

(Regular expression visualization: Debuggex demo)

Again, the \t matches your first \s. When the regex continues, the next character (N) does not match the second \s, so it takes the "high road" instead and skips over \s* entirely. That's why \s\s* matches: the * quantifier includes "or zero", while the + quantifier does not.
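Listing the matches of \s\s* on the question's string shows that each match is a single whitespace character, with \s* contributing zero characters. A minimal sketch assuming Python 3:

```python
import re

b = '\tNadya Carson'
pattern = re.compile(r'\s\s*')

# \s consumes one whitespace character; \s* then matches zero more here
spans = [(m.start(), m.end(), m.group()) for m in pattern.finditer(b)]
print(spans)  # [(0, 1, '\t'), (6, 7, ' ')]
```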


5

can I say \t is equivalent to \s+ in regular expression.?

No.

\t

Match a tab character

\s+

Matches a “whitespace character” (spaces, tabs, and line breaks) between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
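The "greedy, giving back as needed" part of that description can be seen directly. This sketch assumes Python 3; the input string and the extra trailing \s in the second pattern are illustrative choices, not part of the original answer:

```python
import re

text = ' \t\n x'

# \t matches only a tab, so it fails at position 0 (a space)
print(re.match(r'\t', text))  # None

# \s+ greedily consumes the whole whitespace run before 'x'
m = re.match(r'\s+', text)
print(repr(m.group()))  # ' \t\n '

# With a trailing \s, the + gives back one character so the match can succeed
m2 = re.match(r'(\s+)\s', text)
print(repr(m2.group(1)))  # ' \t\n'
```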


1

No way. \s+ means one or more whitespace characters, BUT \t is one specific whitespace character occurring once.

So \s+ matches \t, but the reverse is not true.
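The one-way relationship can be phrased as a subset check over a handful of sample strings. This is only an illustration on hand-picked inputs (assuming Python 3), not a general proof:

```python
import re

samples = ['\t', ' ', '\n', '\t\t', ' \t\n']

# Strings matched in full by each pattern
tab_only = {s for s in samples if re.fullmatch(r'\t', s)}
one_or_more_ws = {s for s in samples if re.fullmatch(r'\s+', s)}

print(tab_only <= one_or_more_ws)  # True: whatever \t matches, \s+ matches too
print(one_or_more_ws <= tab_only)  # False: \s+ matches more than \t does
```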

