2

How can I create a regular expression for accepting numbers in Python? The numbers can be either integers, floats or of the format 3e+3 or 3e-3.

I want to match only the beginning of the string, and if a number in any of the above mentioned formats is present, return that number and the rest of the string.

Edit:

For example,

Input>> 290.07abcd Output>> [290.07, abcd]

Input>> abc123 Output>> None

Also, only the first occurrence is to be checked for.

For example,

Input>> -390-400abc

Output>> [-390, -400abc]

How can I do this using Python? I have tried the following, but it is not giving me the expected output:

import re
r = input()
x = re.search('^[+-]?\d*(\.\d+)?([+-][eE]\d+)?', r)
if x:
    print("x present: ", x.group())
else:
    print(None)

For example,

Input>> 100abc

Output>> x present: 100


Input>> abc100

Output>> x present:

Expected Output>> None

4
  • Try something and show what you tried, the result, and what input/output shows your attempt isn't right. Commented Dec 20, 2018 at 6:05
  • You show a few specific examples, but are you interested in matching anything that would be accepted by the Python interpreter as a floating point value? For example, .123e2 is a valid floating point expression (no leading 0 before the decimal point, and no explicit sign in the exponent). Commented Dec 20, 2018 at 6:42
  • @WarrenWeckesser yes, I would like to accept anything that is a valid floating point number Commented Dec 20, 2018 at 6:50
  • This is not a duplicate of stackoverflow.com/questions/42142309/…. The linked question is about a much more restricted set of inputs. Note that there is no mention of scientific notation in the question or in the answer of the so-called duplicate. Commented Dec 20, 2018 at 13:21

3 Answers

2

Here's one possibility. The pattern for a number is

number_pattern = r"[+-]?((\d+\.\d*)|(\.\d+)|(\d+))([eE][+-]?\d+)?"

The pattern consists of:

  • optional sign;
  • three alternatives for the main part of the number:
    • one or more digits, followed by a decimal point, followed by zero or more digits;
    • a decimal point, followed by one or more digits;
    • one or more digits (no decimal point);
  • optional exponential part, consisting of:
    • e or E;
    • optional sign;
    • one or more digits.

The first and third alternatives for the main part of the number can be combined to consist of one or more digits, optionally followed by a decimal point followed by zero or more digits. The number pattern is then

number_pattern = r"[+-]?((\d+(\.\d*)?)|(\.\d+))([eE][+-]?\d+)?"
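
As a readability aid, the same structure can be written with re.VERBOSE so that each part carries its own comment. This is only a sketch: the name NUMBER_RE and the use of non-capturing groups are choices made for this example, not part of the pattern above.

```python
import re

# Same structure as the combined pattern, spelled out with re.VERBOSE.
# NUMBER_RE is a name chosen for this sketch.
NUMBER_RE = re.compile(r"""
    [+-]?                 # optional sign
    (?:
        \d+ (?: \.\d* )?  # digits, optionally followed by a point and more digits
      | \.\d+             # or a leading point followed by digits
    )
    (?: [eE][+-]?\d+ )?   # optional exponent: e/E, optional sign, digits
""", re.VERBOSE)

for text in ["290.07abcd", "-390-400abc", ".123e2", "abc123"]:
    m = NUMBER_RE.match(text)  # match() only tries the start of the string
    print(text, "->", m.group() if m else None)
```

Because match() is used, a string such as abc123 gives None rather than finding 123 further along.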

You can use this to create a function that does what you asked:

import re

pattern = "(" + number_pattern + ")(.*)"
compiled = re.compile(pattern)

def number_split(s):
    match = compiled.match(s)
    if match is None:
        return None
    groups = match.groups()
    return groups[0], groups[-1]

Some examples:

In [4]: print(number_split("290.07abcd"))
('290.07', 'abcd')

In [5]: print(number_split("abc123"))
None

In [6]: print(number_split("-390-400abc"))
('-390', '-400abc')

In [7]: print(number_split("0.e-3"))
('0.e-3', '')

In [8]: print(number_split("0x"))
('0', 'x')

In [9]: print(number_split(".123e2"))
('.123e2', '')
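
The question's expected output shows the number itself rather than its text, so a variant that converts the matched text with float() might look like the sketch below. The name split_number_value is hypothetical, and slicing the input at match.end() is an assumption made here so that the remainder is kept even when it contains newlines.

```python
import re

# Same number pattern, with non-capturing groups; compiled once.
_number = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")

def split_number_value(s):
    """Return (float_value, rest_of_string), or None if s has no leading number."""
    match = _number.match(s)
    if match is None:
        return None
    # Every string this pattern accepts is also accepted by float().
    return float(match.group()), s[match.end():]

print(split_number_value("290.07abcd"))
print(split_number_value("-390-400abc"))
print(split_number_value("abc123"))
```

Note that integers come back as floats here (-390.0 rather than -390); keeping them as int would need an extra check on the matched text.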

1 Comment

Thanks, this works. I just wanted to confirm if the following is acceptable as well:

def number_split(s):
    x = re.search(number_pattern, s)
    if x:
        return x.group(), s[x.end():]
    else:
        return None

s = input()
x = number_split(s)
print(x)
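
One thing worth checking with the re.search variant: search scans the whole string, while match only tries the pattern at position 0. With an unanchored pattern the two can therefore differ on inputs such as abc123. A small sketch of the difference follows; the function names split_with_search and split_with_match are made up for this comparison.

```python
import re

number_pattern = r"[+-]?((\d+(\.\d*)?)|(\.\d+))([eE][+-]?\d+)?"

def split_with_search(s):
    # re.search looks for the first match anywhere in the string.
    x = re.search(number_pattern, s)
    return (x.group(), s[x.end():]) if x else None

def split_with_match(s):
    # re.match only tries the pattern at the start of the string.
    x = re.match(number_pattern, s)
    return (x.group(), s[x.end():]) if x else None

print(split_with_search("abc123"))      # ('123', '')
print(split_with_match("abc123"))       # None
print(split_with_match("290.07abcd"))   # ('290.07', 'abcd')
```

So the search version would need a leading ^ in the pattern (or re.match) to return None for abc123.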
1

Try this pattern:

\d+(\.\d+)?(e[+-]\d+)?

This matches:

100
100.123
100e+3
100.123e-3
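
A rough sketch of how this pattern could be applied to the start of a string with re.match is shown below. The sample inputs beyond the four listed above are chosen for illustration only.

```python
import re

pattern = re.compile(r"\d+(\.\d+)?(e[+-]\d+)?")

def split_prefix(text):
    # split_prefix is a hypothetical helper name for this sketch.
    m = pattern.match(text)  # only matches at the start of the string
    return (m.group(), text[m.end():]) if m else None

for text in ["100", "100.123e-3abc", "abc100", "-5", "1e5"]:
    print(repr(text), "->", split_prefix(text))
```

As written, the pattern has no leading sign, accepts only a lowercase e, and requires a sign in the exponent, so -5 gives None and 1e5 is split into '1' and 'e5'.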


7 Comments

I like your regex, and I extended it like this: ^(\d+(?:\.\d+)?(?:[eE][+-]?\d+)?) It will only match numbers at the beginning.
@YangHG Depending on how the OP plans to use the pattern in his Python script, the starting and ending anchors may be inappropriate.
And your point is? Do you have reason to expect such a value?
About .123e2: the Python interpreter accepts that as a valid floating point number. In the question, the only examples given have a sign after e, so maybe Mihika isn't looking to match the generality of the Python interpreter.
Agreed about waiting for clarification... but flag for removal? I guess I don't know how comment flags are supposed to be used.
1

You can use this

^[+-]?\d*(\.\d+)?([+-][eE]\d+)?$
  • ^ - Start of string.
  • [+-]? - Optionally matches + or -.
  • \d* - Matches zero or more digits.
  • (\.\d+)? - Optionally matches . followed by one or more digits.
  • ([+-][eE]\d+)? - Optionally matches + or - followed by e or E followed by one or more digits.
  • $ - End of string.
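
A quick way to see what this pattern accepts is to run it over a handful of inputs. The sketch below does that; the sample list is chosen for illustration, and the helper name accepts is hypothetical.

```python
import re

pattern = re.compile(r"^[+-]?\d*(\.\d+)?([+-][eE]\d+)?$")

def accepts(text):
    # True if the whole string is matched by the pattern.
    return pattern.search(text) is not None

samples = ["100", "-2.5", ".123", "3e+3", ".123e2", "3+e3", "100abc", ""]
for text in samples:
    print(repr(text), "->", accepts(text))
```

Running it shows True for 100, -2.5, .123, 3+e3 and the empty string, and False for 3e+3, .123e2 and 100abc. The sign sits before e/E in the pattern, every part is optional, and the trailing $ rules out a number followed by other text.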


8 Comments

I would also allow +/- at the start of the string and e/E as the exponent marker; you then need one or more digits after the e/E.
@SteveBarnes added + or - at the start and at least one digit after e/E
Have you tested this with .123?
+ or - are optional after e. Currently your pattern doesn't match .123e2.
About .123e2: The question says "How can I create a regular expression for accepting numbers in Python". The Python interpreter accepts .123e2 as a valid floating point number. In the question, the only examples given have a sign after e, so maybe Mihika isn't looking to match the generality of the Python interpreter.
