
Trying to match the hash character (#) fails, but matching any of the other alternatives in the regex succeeds.

Why does this fail?

Thanks,

Joe

UNIT = [ 'floor', 'fl', '#', 'penthouse', 'mezzanine', 'basement', 'room' ]

unit_regex = "\\b(" + UNIT.to_a.join("|") + ")\\b"

unit_regexp = Regexp.new(unit_regex, Regexp::IGNORECASE)

x = unit_regexp.match('#')  # => nil, whereas unit_regexp.match('floor') succeeds
Your real problem is what "word boundary" means. It roughly means "a word character on one side and nothing or a non-word character on the other side", but # is not a word character. I think you're going to have to be a little more explicit in your regex about what you're trying to match. – Commented Dec 28, 2015 at 3:10

1 Answer


As noted in the comments, your problem is that \b is a word boundary inside a regex (unless it is inside a character class; sigh, the \b in /[\b]/ is a backspace, just like in a double-quoted string). A word boundary is roughly

a word character on one side and nothing or a non-word character on the other side

But # is not a word character, so in the string '#' there is no position where /\b/ can match, and your whole regex fails to match.
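A quick irb-style check makes this concrete. It is only a sketch: the test strings are made up for illustration, and the pattern is trimmed to a few of the question's units.

```ruby
# \b is zero-width: it matches only at a position with a word character
# (letter, digit or underscore) on one side and a non-word character, or
# the edge of the string, on the other.
p('#' =~ /\b/)    # => nil: '#' contains no word character, so no boundary exists
p('fl' =~ /\b/)   # => 0:   boundary just before "f"
p('# 5' =~ /\b/)  # => 2:   the first boundary is just before "5", not around "#"

# Hence the question's pattern can never match a lone '#': the leading \b
# fails at every position, so the '#' alternative is never even tried.
unit_regexp = /\b(floor|fl|#|penthouse)\b/i
p unit_regexp.match('#')  # => nil
```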

You're going to have to be more explicit about what you're trying to match. A first stab would be "the beginning of the string or whitespace" instead of the first \b and "the end of the string or whitespace" instead of the second \b. That could be expressed like this:

unit_regex = '(?<=\A|\s)(' + UNIT.to_a.join('|') + ')(?=\z|\s)'

Note that I've switched to single quotes to avoid all the double-escaping hassles. The ?<= is a positive lookbehind: it means that (\A|\s) needs to be there, but it won't be part of the match; similarly, ?= is a positive lookahead. See the manual for more details. Also note that we're using \A rather than ^, since in Ruby ^ matches the beginning of any line, not just the beginning of the string; similarly, we use \z instead of $ because \z matches only the end of the string, whereas $ matches the end of any line.
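A minimal check of the lookaround version, assuming the UNIT list from the question; the inputs 'Apt # 5', '2nd Floor' and 'flooring' are made-up examples. The last two lines illustrate the \A versus ^ point on a two-line string.

```ruby
UNIT = ['floor', 'fl', '#', 'penthouse', 'mezzanine', 'basement', 'room']

# A unit must be preceded by the start of the string or whitespace
# (lookbehind) and followed by the end of the string or whitespace
# (lookahead). Neither lookaround consumes any characters.
unit_regex  = '(?<=\A|\s)(' + UNIT.join('|') + ')(?=\z|\s)'
unit_regexp = Regexp.new(unit_regex, Regexp::IGNORECASE)

p unit_regexp.match('#')[1]          # => "#"
p unit_regexp.match('Apt # 5')[1]    # => "#"
p unit_regexp.match('2nd Floor')[1]  # => "Floor"
p unit_regexp.match('flooring')      # => nil: "floor" is followed by "i"

# Ruby's ^ also matches right after a newline; \A matches only at the
# very start of the string.
p("1st\nroom" =~ /^room/)   # => 4
p("1st\nroom" =~ /\Aroom/)  # => nil
```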

You may need to tweak the regex depending on your data but hopefully that will get you started.
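One possible tweak, offered as a sketch and not as part of the answer above: if a unit ever contains a regex metacharacter (the "apt." entry below is hypothetical), joining raw strings with | would treat "." as "any character". Regexp.union escapes each string for you. The sketch also writes the lookarounds as (?<!\S) and (?!\S), which mean "not preceded/followed by a non-whitespace character" and so behave the same as (?<=\A|\s) and (?=\z|\s).

```ruby
# Hypothetical extended list: "apt." contains the metacharacter "."
UNIT = ['floor', 'fl', '#', 'penthouse', 'mezzanine', 'basement', 'room', 'apt.']

# Regexp.union escapes each string, so "." matches only a literal dot.
# .source is interpolated rather than the Regexp itself, because the
# Regexp's to_s form "(?-mix:...)" would switch off the /i flag inside.
alternatives = Regexp.union(UNIT)
unit_regexp  = /(?<!\S)(#{alternatives.source})(?!\S)/i

p unit_regexp.match('Apt. 4B')[1]  # => "Apt."
p unit_regexp.match('aptx 4B')     # => nil: the "." is literal
p unit_regexp.match('#')[1]        # => "#"
```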


1 Comment

Thank you all. I completely missed the word boundary issue.
