64

I expect the code below to echo "yes", but it does not. For some reason it won't match the single quote. Why?

str="{templateUrl: '}"
regexp="templateUrl:[\s]*'"

if [[ $str =~ $regexp ]]; then
  echo "yes"
else
  echo "no"
fi

4 Answers

120

Replace:

regexp="templateUrl:[\s]*'"

With:

regexp="templateUrl:[[:space:]]*'"

According to man bash, the =~ operator supports "extended regular expressions" as defined in man 3 regex. man 3 regex says it supports the POSIX standard and refers the reader to man 7 regex. The POSIX standard supports [:space:] as the character class for whitespace.

The GNU bash manual documents the supported character classes as follows:

Within ‘[’ and ‘]’, character classes can be specified using the syntax [:class:], where class is one of the following classes defined in the POSIX standard:

alnum alpha ascii blank cntrl digit graph lower print
punct space upper word xdigit

The only mention of \s that I found in the GNU bash documentation was for an unrelated use in prompts, such as PS1, not in regular expressions.

The Meaning of *

[[:space:]] will match exactly one white space character. [[:space:]]* will match zero or more white space characters.
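As a quick illustration of the "zero or more" behaviour, here is a minimal sketch. The helper name matches_template_url is made up for this example, and the test strings are variations on the one from the question:

```shell
#!/bin/bash
# Hypothetical helper (not from the question): prints "yes" if the
# argument matches the corrected pattern, "no" otherwise.
matches_template_url() {
  local regexp="templateUrl:[[:space:]]*'"
  if [[ $1 =~ $regexp ]]; then
    echo "yes"
  else
    echo "no"
  fi
}

matches_template_url "{templateUrl:'}"     # no whitespace before the quote
matches_template_url "{templateUrl: '}"    # one space
matches_template_url "{templateUrl:   '}"  # several spaces
matches_template_url "{templateUrl:x'}"    # a non-whitespace character intervenes
```

The first three calls should print yes and the last one no, since * accepts any run of whitespace, including an empty one, but nothing else.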

The Difference Between space and blank

POSIX regular expressions offer two classes of whitespace: [[:space:]] and [[:blank:]]:

  • [[:blank:]] means space and tab. This makes it similar to: [ \t].

  • [[:space:]], in addition to space and tab, includes newline, carriage return, form feed, and vertical tab. This makes it similar to: [ \t\n\r\f\v].

A key advantage of using character classes is that they are locale-aware, so they remain safe to use with Unicode text.
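To see the difference between the two classes in practice, here is a small sketch. The check helper and the test string are assumptions made up for this example; the string puts a newline, rather than a space or tab, between the colon and the quote:

```shell
#!/bin/bash
# Test string (made up for this example): a newline sits between
# the colon and the single quote.
str=$'templateUrl:\n\''
blank_re="templateUrl:[[:blank:]]*'"
space_re="templateUrl:[[:space:]]*'"

# Hypothetical helper: prints "match" or "no match" for a string and a pattern.
check() {
  if [[ $1 =~ $2 ]]; then
    echo "match"
  else
    echo "no match"
  fi
}

check "$str" "$blank_re"   # [[:blank:]] covers only space and tab
check "$str" "$space_re"   # [[:space:]] also covers newline
```

The first call should print no match and the second match, so [[:blank:]] is the better choice when a pattern must not run across line breaks.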


3 Comments

Note that [:space:] means all whitespace, including carriage returns and newlines; while [:blank:] means "horizontal" whitespace (spaces and tabs) -- regular-expressions.info/posixbrackets.html
For just matching a literal space, you can also escape it with a backslash, i.e.: regexp="templateUrl:\ *'"
@ChristophThiede Yes, that's true. Actually, though, you don't need the backslash. regexp="templateUrl: *'" also works. In either case, of course, this limits the regular expression to matching an actual ASCII blank. The other whitespace characters that may be recognized by [[:blank:]] or [[:space:]] are not matched.
4

Get rid of the square brackets in the regular expression:

regexp="templateUrl:\s*'"

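With the square brackets present, the \s inside is interpreted literally, as a set containing the characters \ and s. Your intent is clearly to match the whitespace character class, for which \s is shorthand, so no square brackets are needed. Note that \s is an extension rather than part of POSIX extended regular expressions, so this only works where the system's regex library supports it.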
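A minimal sketch of the literal interpretation, assuming the POSIX bracket-expression rules described above. The probe helper is made up for this example, and the pattern is anchored so that it tests exactly one character:

```shell
#!/bin/bash
# The bracket expression from the question, anchored with ^ and $ so
# that it must match a single character.
regexp='^[\s]$'

# Hypothetical helper: reports whether one character matches the pattern.
probe() {
  if [[ $1 =~ $regexp ]]; then
    echo "match"
  else
    echo "no match"
  fi
}

probe "s"   # the letter s is a member of the set
probe " "   # a space is not
```

On a regex library that follows these rules, the letter s should match and the space should not, which is why the pattern in the question never gets past the space in the input.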

$ uname -a
Linux noname 3.13.0-24-generic #47-Ubuntu SMP Fri May 2 23:30:00 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
$ bash --version
GNU bash, version 4.3.11(1)-release (x86_64-pc-linux-gnu)
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it. 
There is NO WARRANTY, to the extent permitted by law.
$ cat test.sh
str="{templateUrl: '}" 
regexp="templateUrl:\s*'"

if [[ $str =~ $regexp ]]; then
  echo "yes"
else
  echo "no"
fi
$ bash test.sh
yes 

3 Comments

Did you test it? Using regexp="templateUrl:\s*'" still echoes "no" for me.
I ran your script verbatim - and it echoed yes for me. I'm running on a Linux Mint 17 box. I'll update the answer to reflect as such.
You are right, I switched to a Mac and got different results from my Linux box. It appears that bash on OS X (at least the flavors that you and I have) defaults to strict POSIX notation - you should go with the answers from @John1024 or heemayl
3

This should work:

#!/bin/bash
str="{templateUrl: '}"
regexp="templateUrl:[[:space:]]*'"

if [[ $str =~ $regexp ]]; then
  echo "yes"
else
  echo "no"
fi

If you want to match zero or more whitespace characters, the * needs to be added after [[:space:]].

5 Comments

* seems to be needed.
In my GNU bash, version 4.2.25, it is not needed.
I see. Ugh, this is so annoying.
@crzrcn The * is needed if you want to match zero or more spaces. It is not needed if you want to match exactly one space.
Yup, that's why I said that it seems to be needed. I want to match zero or more spaces.
1

This is another way that works, if you want to match only the literal space character rather than the whole space character class.

#!/bin/bash
str="{templateUrl: '}"
if [[ $str =~ templateUrl:" "*"'" ]]; then
  echo "yes"
else
  echo "no"
fi

credit to Malak Younes.
