I have a string called 'raw'. I am trying to parse it in Ruby in the following way:

raw = "HbA1C ranging 8.0—10.0%"
raw.scan /\d*\.?\d+[ ]*(-+|\342\200\224)[ ]*\d*\.?\d+/

The output from the above is []. I think it should be: ["8.0—10.0"].

Does anyone have any insight into what is wrong with the above regex statement?

Note: \342\200\224 is the UTF-8 byte sequence, written as octal escapes, for the em dash (U+2014).

The piece that is not working is: (-+|\342\200\224)

I think it should be equivalent to saying, match on 1 or more - OR match on the string \342\200\224.

Any help would be greatly appreciated!

  • What happens if the string contains a hyphen instead of an em-dash? i.e.: "HbA1C ranging 8.0-10.0%" Commented May 12, 2010 at 1:37

2 Answers

1

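The original regex works for me (Ruby 1.8.7); it just needs the group to be non-capturing so that scan outputs the entire match. When a regex contains a capturing group, String#scan returns the captures rather than the whole match. Alternatively, switch to String#[] or String#match instead of String#scan and leave the regex unchanged.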

raw = "HbA1C ranging 8.0—10.0%"
raw.scan /\d*\.?\d+[ ]*(?:-+|\342\200\224)[ ]*\d*\.?\d+/
# => ["8.0—10.0"]
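
To make the difference concrete, here is a minimal sketch comparing the capturing and non-capturing forms. It assumes Ruby 1.9 or later with UTF-8 strings, and it writes the em dash as \u2014 instead of the octal bytes used above; that substitution and the variable names are my own choices for illustration, not part of the original code.

```ruby
# Assumption: Ruby 1.9+ with UTF-8 strings; \u2014 stands in for \342\200\224.
raw = "HbA1C ranging 8.0\u201410.0%"

# Capturing group: scan returns one array of captures per match.
capturing = /\d*\.?\d+[ ]*(-+|\u2014)[ ]*\d*\.?\d+/
with_group = raw.scan(capturing)
# => [["—"]]

# Non-capturing group (?:...): scan returns the full matched text.
non_capturing = /\d*\.?\d+[ ]*(?:-+|\u2014)[ ]*\d*\.?\d+/
whole_match = raw.scan(non_capturing)
# => ["8.0—10.0"]

# String#[] returns the whole first match even with a capturing group.
first_match = raw[capturing]
# => "8.0—10.0"
```

The same patterns also match a plain hyphen, e.g. "8.0-10.0", because of the -+ alternative.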

For testing/building regular expressions in Ruby there's a fantastic tool over at http://rubular.com that makes it a lot easier. http://rubular.com/r/b1318BBimb is the edited regex with a few test cases to make sure it works against them.

0
raw = "HbA1C ranging 8.0—10.0%"
raw.scan(/\d+\.\d+.+\d+\.\d+/)
#=> ["8.0\342\200\22410.0"]
