Simple C# regex match issue

Question

I am having problems extracting values from a string using a regex match. The string I am working with is below.

533 x 1981mm, 35mm Thick - Non Fire Door: £33.14

The regex I have works fine if the string is as follows:

533 x 1981mm, 35mm Thick: £33.14

This is the regex:

^(?<first>\d+)\s*x\s*(?<second>\d+)mm,\s*(?<third>\d+)mm Thick: £(?<price>\d+\.\d+)$

My question is, how can I change the RegEx to ignore anything between the last 'mm' and the '£' sign?

What my code does is extract the millimetre measurements, convert them into inches and return a string to my method. The rest of the code is as follows.

var first = Int32.Parse(match.Groups["first"].Value);
var second = Int32.Parse(match.Groups["second"].Value);
var third = Int32.Parse(match.Groups["third"].Value);
var price = Decimal.Parse(match.Groups["price"].Value, CultureInfo.InvariantCulture);

Thank you gurus!

Maybe I don't understand your question, but shouldn't ^(?<first>\d+)\s*x\s*(?<second>\d+)mm,\s*(?<third>\d+)mm .* £(?<price>\d+\.\d+)$ do the trick?
– Tim Zimmermann, Commented Dec 20, 2013 at 10:56
Should work as far as I can see, but .* is dangerous because you'll get a greedy match. If you only expect to see one amount like £00.00 in the entire string following the matched mm, then that's fine though. EDIT: actually, as it's matching the end of the string after that, then yes, no worries, it should work.
– James S, Commented Dec 20, 2013 at 11:03

Rawling · Accepted Answer · 2013-12-20 10:58:00Z

3

Replace mm Thick: £ with mm.*?£.

The .*? means "match any character (.) any number of times, including zero (*), as few times as possible (?)"
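As a minimal sketch of that replacement, the snippet below plugs mm.*?£ into the pattern from the question and runs it against both sample strings. The class and method names (LazyFillerExample, TryParse) are hypothetical and only there for illustration; the parsing calls mirror the ones in the question.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class LazyFillerExample
{
    // The question's pattern with "mm Thick: £" replaced by "mm.*?£".
    private static readonly Regex Pattern = new Regex(
        @"^(?<first>\d+)\s*x\s*(?<second>\d+)mm,\s*(?<third>\d+)mm.*?£(?<price>\d+\.\d+)$");

    // Hypothetical helper: returns false when the input does not match.
    public static bool TryParse(string input, out int first, out int second,
                                out int third, out decimal price)
    {
        first = second = third = 0;
        price = 0m;

        Match match = Pattern.Match(input);
        if (!match.Success)
        {
            return false;
        }

        first = Int32.Parse(match.Groups["first"].Value, CultureInfo.InvariantCulture);
        second = Int32.Parse(match.Groups["second"].Value, CultureInfo.InvariantCulture);
        third = Int32.Parse(match.Groups["third"].Value, CultureInfo.InvariantCulture);
        price = Decimal.Parse(match.Groups["price"].Value, CultureInfo.InvariantCulture);
        return true;
    }

    public static void Main()
    {
        string[] inputs =
        {
            "533 x 1981mm, 35mm Thick - Non Fire Door: £33.14",
            "533 x 1981mm, 35mm Thick: £33.14"
        };

        foreach (string input in inputs)
        {
            int first, second, third;
            decimal price;
            bool ok = TryParse(input, out first, out second, out third, out price);
            Console.WriteLine("{0}: {1} x {2} x {3} mm, price {4}",
                ok, first, second, third, price.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```

Because the pattern is still anchored with ^ and $, anything between the last mm and the £ is skipped without loosening the rest of the match.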


4 Comments

James S Over a year ago

In this case the lazy operator ? isn't needed, as the end of the string $ is being matched after the last amount. It's better to have a greedy match than a lazy one, as the regex engine doesn't need to backtrack as often: see link

Rawling Over a year ago

You're right (along with your other comment) that this won't affect the output. However, I would've expected this to be more efficient in the usual "only one £" case - it'll capture until it meets the pound sign then immediately move on to match the price, rather than capturing everything, failing to match a £ at the end of the string, then backtracking character-by-character out of the greedy match until it finds the £ again.

James S Over a year ago

If I've correctly read how regex engines generally work, then no. With a lazy repetition operator .*?, for every character in the test string (whilst lazy matching) the engine checks forward to see if the characters beyond match; if not, it backtracks to the character after the test character and repeats. To get it working as you describe, you would use the alternative answer provided by @MarkO.

Rawling Over a year ago

@JamesS You're right. Of course it'll try and fail to match a £ after every character the .*? matches, which will (in the usual case) be more failures than .* backtracking over the price until it finds the £.
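To illustrate the point both commenters agree on (the output is the same either way), here is a small sketch that runs the question's pattern with a greedy .* and a lazy .*? filler against the same input. The class and helper names are hypothetical, and the snippet only compares captures; it makes no attempt to measure the backtracking cost discussed above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyVsLazy
{
    // Fixed parts of the question's pattern; only the filler between them varies.
    private const string Prefix = @"^(?<first>\d+)\s*x\s*(?<second>\d+)mm,\s*(?<third>\d+)mm";
    private const string Suffix = @"£(?<price>\d+\.\d+)$";

    // Hypothetical helper: returns the captured price, or an empty string on no match.
    public static string PriceWith(string filler, string input)
    {
        Match match = Regex.Match(input, Prefix + filler + Suffix);
        return match.Success ? match.Groups["price"].Value : string.Empty;
    }

    public static void Main()
    {
        const string input = "533 x 1981mm, 35mm Thick - Non Fire Door: £33.14";

        // The trailing $ anchor forces the same price capture in both cases.
        Console.WriteLine("greedy: " + PriceWith(".*", input));
        Console.WriteLine("lazy:   " + PriceWith(".*?", input));
    }
}
```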

MarkO · Answer · 2013-12-20 10:59:04Z

1

Use [^£]+ to get 1 or more characters which are not a £.

^(?<first>\d+)\s*x\s*(?<second>\d+)mm,\s*(?<third>\d+)mm[^£]+£(?<price>\d+\.\d+)$
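A short sketch of this pattern in use follows; the class and method names are hypothetical. It also shows one consequence of + meaning "one or more": an input with nothing at all between the last mm and the £ does not match, which would need [^£]* instead.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegatedClassExample
{
    // The pattern from this answer: [^£]+ consumes everything up to the pound sign.
    private static readonly Regex Pattern = new Regex(
        @"^(?<first>\d+)\s*x\s*(?<second>\d+)mm,\s*(?<third>\d+)mm[^£]+£(?<price>\d+\.\d+)$");

    // Hypothetical helper wrapping Regex.IsMatch.
    public static bool Matches(string input)
    {
        return Pattern.IsMatch(input);
    }

    public static void Main()
    {
        // Both strings from the question have text between "mm" and "£", so they match.
        Console.WriteLine(Matches("533 x 1981mm, 35mm Thick - Non Fire Door: £33.14"));
        Console.WriteLine(Matches("533 x 1981mm, 35mm Thick: £33.14"));

        // Made-up input with nothing between "mm" and "£": [^£]+ needs at least one character.
        Console.WriteLine(Matches("533 x 1981mm, 35mm£33.14"));
    }
}
```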
