I'm scratching my head trying to come up with a regex that extracts numbers from strings that are differently formatted. For example:

'1', '1.1', '1,1', '1,000,000.20', '1.00000020', '1.000.000,20', '10.20001'

I currently use the regex [-+]?[0-9]*[.,]?[0-9]+(?:[eE][-+]?[0-9]+)? and it works well in most cases, except for 1,000,000.20 and 1.000.000,20.

Do you have any idea how I can tweak this regex so that it also works with those two examples?

  • In your case, you may just capture what is inside single quotes, '([^']+)'. Commented Jan 29, 2018 at 10:56
  • stackoverflow.com/questions/5917082/… might be useful. In particular you can use "Commas optional as long as they're consistent" twice (with ',' and '.' exchanged). Commented Jan 29, 2018 at 10:58
  • Are you using a programming language here? Commented Jan 29, 2018 at 10:59
  • @joanfihu Note that if your texts are clean and numbers are not following each other (after a comma or dot), you might try something like \d[\d.,]*(?:[eE][-+]?\d+)?. Enclose with word boundaries if necessary. Commented Jan 29, 2018 at 11:14
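The loose pattern from the last comment can be tried out with a small sketch like the one below. It assumes Python (the question does not name a language), and the names `LOOSE_RE` and `extract_candidates` are made up for illustration. The pattern is wrapped in `\b` word boundaries, as the comment suggests, so trailing punctuation is not swallowed.

```python
import re

# Pattern from the comment, enclosed in word boundaries.
# It grabs any run of digits, dots, and commas that starts with a digit,
# optionally followed by an exponent such as e-9.
LOOSE_RE = re.compile(r"\b\d[\d.,]*(?:[eE][-+]?\d+)?\b")


def extract_candidates(text):
    """Return every number-like substring found in text (no validation)."""
    return LOOSE_RE.findall(text)


text = "Prices: 1,000,000.20 and 1.000.000,20, then 10.20001 or 1e-9."
print(extract_candidates(text))
```

This only extracts candidates; it does not check that the separators are used consistently, so a stricter pattern is still needed if malformed numbers should be rejected.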

1 Answer

(?!\d+,\d+\.\d+,|\d+\.\d+,\d+\.)^([+-]?(?:\d+|\d{1,3}(?:[.,]\d{3})*)(?:[.,]\d+|[eE][+-]?(?:\d+|\d{1,3}(?:[.,]\d{3})*))?)$

Perhaps something like this?

This matches all of the examples you listed, plus numbers in exponent notation such as 1e10 and 1e-9.

It does not match numbers that mix commas and dots inconsistently, e.g. 10.234245,214, 10.234,245.214 or 10,234.245,214.

It also allows a leading + or - sign.

Check it out on Regex101
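For a quick local check, here is a minimal sketch that runs the pattern against the examples from the question. It assumes Python, and the names `NUMBER_RE` and `is_number` are made up for illustration. The final dot inside the lookahead is written as `\.` here, which is an assumption about the intent (it mirrors the first alternative); with an unescaped dot, a value like 1.000,20 would be rejected.

```python
import re

# Pattern from the answer. The last dot of the lookahead is escaped
# (assumption about the intent, so that 1.000,20 is accepted).
NUMBER_RE = re.compile(
    r"(?!\d+,\d+\.\d+,|\d+\.\d+,\d+\.)"
    r"^([+-]?(?:\d+|\d{1,3}(?:[.,]\d{3})*)"
    r"(?:[.,]\d+|[eE][+-]?(?:\d+|\d{1,3}(?:[.,]\d{3})*))?)$"
)


def is_number(text):
    """True if the whole string is a number in one of the accepted formats."""
    return NUMBER_RE.match(text) is not None


accepted = ["1", "1.1", "1,1", "1,000,000.20", "1.00000020",
            "1.000.000,20", "10.20001", "1e10", "1e-9", "-42"]
rejected = ["10.234245,214", "10.234,245.214", "10,234.245,214"]

for s in accepted + rejected:
    print(s, is_number(s))
```

Because the pattern is anchored with ^ and $, it validates one candidate string at a time. To pull numbers out of longer text, the candidates have to be split out first and then checked individually.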
