I'm trying to match numbers in scientific notation (regex from here):

import re

scinot = re.compile(r'[+\-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)')
re.findall(scinot, 'x = 1e4')
['1e4']
re.findall(scinot, 'x = c1e4')
['1e4']

I'd like it to match x = 1e4 but not x = c1e4. What should I change?

Update: The answer here has the same problem: it incorrectly matches 'x = c1e4'.

  • @sunkuet02 Clarified why that answer doesn't work. Commented Jan 16, 2017 at 4:36
  • What are you trying to do? Commented Jan 16, 2017 at 4:40
  • @Blender trying to match numbers in scientific notation, but not match variable names containing that pattern. Commented Jan 16, 2017 at 13:55
  • Possible duplicate of Parsing scientific notation sensibly? Commented Jan 16, 2017 at 15:53
  • @hek2mgl Thanks for the unnecessary downvote. That's the post I started from, linked where I say "regex from here". Commented Jan 16, 2017 at 16:43

3 Answers

Add an end-of-string anchor to the regex, and require whitespace or an equals sign before the number:

[\s=]+([+-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+))$
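
A minimal sketch of how this pattern behaves with `re.findall`, assuming Python's standard `re` module. The names `anchored` and `find_trailing_number` are hypothetical and used only for illustration. Because the pattern contains exactly one capture group, `findall` returns the group's text, so the leading whitespace or `=` is consumed but not reported.

```python
import re

# Pattern from the answer above. The capture group isolates the number
# from the whitespace or "=" that must precede it.
anchored = re.compile(r'[\s=]+([+-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+))$')


def find_trailing_number(text):
    """Return the captured numbers (findall reports group 1 only)."""
    return anchored.findall(text)


print(find_trailing_number('x = 1e4'))           # ['1e4']
print(find_trailing_number('x=1e6'))             # ['1e6']
print(find_trailing_number('x = c1e4'))          # []
print(find_trailing_number('x = 1e4 and more'))  # [] because of the $ anchor
```

The last call shows the cost of the `$` anchor: the number is only found when it ends the string.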

8 Comments

Thanks @Toto. That doesn't match x = 1e6.
@BogdanVasilescu: Do you mean it doesn't match the whole string x = 1e4 or 1e4 alone?
I meant it doesn't match 1e6 from x = 1e6. Sorry about the confusion. re.findall('^[+-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)$', 'x = 1e6') returns [].
@BogdanVasilescu: Remove the caret (start of string) and add [\s=], see my edit.
I'd prefer if = is not part of the match; right now I get ['=1e6'] for x=1e6.

Simply add [^\w]? to exclude all alphanumeric characters that precede your first digit:

 [+\-]?[^\w]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)

Technically, the \w will also exclude numeric characters, but that's fine because the rest of your regex will catch them.

If you want to be truly rigorous, you can replace \w with A-Za-z:

 [+\-]?[^A-Za-z]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)

Another sneaky way is to simply add a space at the beginning of your regex - that will force all your matches to have to begin with whitespace.
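
A minimal sketch, assuming Python's standard `re` module, of how the optional character class behaves on the asker's input. Since `[^\w]?` may match zero characters, the engine can still begin a match directly at the digit that follows a letter. The negative lookbehind `(?<!\w)` shown next to it is a swapped-in alternative, not part of the suggestion above, and the names `NUMBER`, `optional_class`, and `lookbehind` are hypothetical.

```python
import re

# Number part shared by both patterns (taken from the question's regex).
NUMBER = r'(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)'

# Pattern suggested above: the "?" lets the class match nothing at all.
optional_class = re.compile(r'[+\-]?[^\w]?' + NUMBER)

# Assumed alternative: a zero-width negative lookbehind that rejects any
# match starting immediately after a word character (letter, digit, "_").
lookbehind = re.compile(r'(?<!\w)[+\-]?' + NUMBER)

print(optional_class.findall('x = c1e4'))  # ['1e4']
print(lookbehind.findall('x = c1e4'))      # []
print(lookbehind.findall('x = 1e4'))       # ['1e4']
print(lookbehind.findall('x=1e4'))         # ['1e4']
print(lookbehind.findall('x = -1.5E-3'))   # ['-1.5E-3']
```

The lookbehind consumes no characters, so neither the `=` nor the whitespace becomes part of the match, and the number does not have to end the string.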

1 Comment

Thanks @Akshat. Your suggestion still matches c1e4 unfortunately. Also, if I add a space (nice idea!), I won't be able to match x=1e4 anymore.

scinot = re.compile(r'[-+]?[\d]+\.?[\d]*[Ee](?:[-+]?[\d]+)?')

This regex should find every number written in scientific notation in the text.

By the way, here is a link to a similar question: Extract scientific number from string

3 Comments

Not really: re.findall('([-+])?(\d+)(\.\d*)?[eE]([-+]?\d+)*', 'x = c1e6') returns [('', '1', '', '6')]
Try now @BogdanVasilescu
still not what I need. re.findall('[-+]?[0-9]+\.?[0-9]*[Ee](?:\ *[-+]?\ *[0-9]+)?', 'x = c1e6') incorrectly matches ['1e6']
