I am having problems extracting values from a string using a RegEx match. The string I am working with is below.

533 x 1981mm, 35mm Thick - Non Fire Door: £33.14

The RegEx I have, shown further down, works fine if the string is as follows:

533 x 1981mm, 35mm Thick: £33.14

^(?<first>\d+)\s*x\s*(?<second>\d+)mm,\s*(?<third>\d+)mm Thick: £(?<price>\d+\.\d+)$

My question is, how can I change the RegEx to ignore anything between the last 'mm' and the '£' sign?

What my code does is extract the millimetre measurements, convert them into inches and return a string to my method. The rest of the code is as follows.

var first = Int32.Parse(match.Groups["first"].Value);
var second = Int32.Parse(match.Groups["second"].Value);
var third = Int32.Parse(match.Groups["third"].Value);
var price = Decimal.Parse(match.Groups["price"].Value, CultureInfo.InvariantCulture);

Thank you gurus!

  • Maybe I don't understand your question, but shouldn't ^(?<first>\d+)\s*x\s*(?<second>\d+)mm,\s*(?<third>\d+)mm .* £(?<price>\d+\.\d+)$ do the trick? Commented Dec 20, 2013 at 10:56
  • Should work as far as I can see, but .* is dangerous because you'll get a greedy match. If you only expect to see one amount like £00.00 in the entire string following the matched mm, then that's fine. EDIT: actually, as it's matching the end of the string after that, then yes, no worries, it should work. Commented Dec 20, 2013 at 11:03

2 Answers

Replace mm Thick: £ with mm.*?£.

The .*? means "match any character (.) any number of times, including zero (*), as few times as possible (?)"
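
As a minimal sketch, here is that replacement applied to the pattern from the question, assuming C# as in the question's code. The class name DoorSpec and the method TryParse are hypothetical names chosen for illustration; only the pattern and the group names come from the question.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class DoorSpec
{
    // The question's pattern with "mm Thick: £" replaced by "mm.*?£".
    private static readonly Regex Pattern = new Regex(
        @"^(?<first>\d+)\s*x\s*(?<second>\d+)mm,\s*(?<third>\d+)mm.*?£(?<price>\d+\.\d+)$");

    // Hypothetical helper: returns false when the input does not match.
    public static bool TryParse(string input, out int first, out int second,
                                out int third, out decimal price)
    {
        first = 0;
        second = 0;
        third = 0;
        price = 0m;

        Match match = Pattern.Match(input);
        if (!match.Success)
        {
            return false;
        }

        // Same extraction as in the question, with an explicit culture throughout.
        first = int.Parse(match.Groups["first"].Value, CultureInfo.InvariantCulture);
        second = int.Parse(match.Groups["second"].Value, CultureInfo.InvariantCulture);
        third = int.Parse(match.Groups["third"].Value, CultureInfo.InvariantCulture);
        price = decimal.Parse(match.Groups["price"].Value, CultureInfo.InvariantCulture);
        return true;
    }
}
```

Both strings from the question, with and without " - Non Fire Door", should pass through TryParse with this pattern.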


4 Comments

In this case the lazy operator ? isn't needed, as the end of the string $ is being matched after the last amount. It's better to have it match greedily than lazily, as the regex engine doesn't need to backtrack as often: see link
You're right (along with your other comment) that this won't affect the output. However, I would've expected this to be more efficient in the usual "only one £" case - it'll capture until it meets the pound sign then immediately move on to match the price, rather than capturing everything, failing to match a £ at the end of the string, then backtracking character-by-character out of the greedy match until it finds the £ again.
If I've correctly read how regex engines work generally, then no. With a lazy repetition operator .*?, for every character in the test string (whilst lazy matching) the engine checks forward to see if the characters beyond match; if not, it backtracks to the character after the test character and repeats. To get it working as you describe, you would use the alternative answer provided by @MarkO
@JamesS You're right. Of course it'll try and fail to match a £ after every character the .*? matches, which will (in the usual case) be more failures than .* backtracking over the price until it finds the £.

Use [^£]+ to get 1 or more characters which are not a £.

^(?<first>\d+)\s*x\s*(?<second>\d+)mm,\s*(?<third>\d+)mm[^£]+£(?<price>\d+\.\d+)$
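
A quick sketch for checking this pattern, assuming C# as in the question. The class and member names are hypothetical. Note that [^£]+ needs at least one character between the last mm and the £; the [^£]* variant below is not part of this answer and is included only to show the difference when nothing sits between them.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // The parts of the question's pattern that stay unchanged.
    private const string Head = @"^(?<first>\d+)\s*x\s*(?<second>\d+)mm,\s*(?<third>\d+)mm";
    private const string Tail = @"£(?<price>\d+\.\d+)$";

    // The pattern from this answer: one or more non-£ characters before the £.
    public static readonly Regex OneOrMore = new Regex(Head + "[^£]+" + Tail);

    // Assumed variant: zero or more non-£ characters before the £.
    public static readonly Regex ZeroOrMore = new Regex(Head + "[^£]*" + Tail);
}
```

For example, NegatedClassDemo.OneOrMore.IsMatch("533 x 1981mm, 35mm Thick - Non Fire Door: £33.14") should succeed, while an input with the £ directly after the last mm would only match the ZeroOrMore variant.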
