Extracting float or int number and substring from a string

Question

I've just learned regex in Python 3 and was trying to solve a problem. The problem is something like this:

You are given a string where the first part is a float or integer number and the next part is a substring. You must split the number and the substring and return them as a list. The substring will only contain letters from a-z and A-Z. The numbers can be negative. For example:

Input: 2.5ax
Output: ['2.5', 'ax']

Input: -5bcf
Output: ['-5', 'bcf']

Input: -69.67Gh
Output: ['-69.67', 'Gh']

and so on.

I made several attempts with regex to solve the problem.

1st attempt:

import re
i=input()
print(re.findall(r'^(-?\d+(\.\d+)?)|[a-zA-Z]+$',i))

For the input -2.55xy, the expected output was ['-2.55', 'xy'], but the actual output was:

[('-2.55', '.55'), ('', '')]

2nd attempt: My second attempt was similar to the first, with slightly different grouping:

import re
i=input()
print(re.findall(r'^(-?(\d+\.\d+)|\d+)|[a-zA-Z]+$',i))

For the same input -2.55xy, the output came as:

[('-2.55', '2.55'), ('', '')]

3rd attempt: My next attempt was:

import re
i=input()
print(re.findall(r'^-?[1-9.]+|[a-z|A-Z]+$',i))

This matched the expected output for -2.55xy and for the sample examples. But when the input is 2..5 or something similar, it also treats that as a float.

4th attempt:

import re
i=input()
value=re.findall(r"[a-zA-Z]+",i)
print([i.replace(value[0],""),value[0]])

This also matches the expected output but has the same problem as the 3rd one. Also, it doesn't look like an effective way to do it.

Conclusion: I don't understand why my 1st and 2nd attempts aren't working. The output is a list of tuples, probably because of the groups, but I don't know the exact reason or how to fix it. Maybe I misunderstood how the pattern works. Also, why doesn't the substring show up in the output? In the end, I want to know what the mistake in my code is and how I can write better, more efficient code to solve the problem. Thank you, and sorry for my bad English.

The fourth bird · Accepted Answer · 2022-09-02 13:58:31Z

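The alternation | matches either the left part or the right part. In your first two attempts both capture groups are in the left part, so when the right part matches the letters, re.findall has only empty group values to report for that match, which is where ('', '') comes from.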

If the chars a-zA-Z are after the digit, you don't need the alternation | and you can use 2 capture groups to get the matches in that order.

Then using re.findall will return a list of tuples for the capture group values.

(-?\d+(?:\.\d+)?)([a-zA-Z]+)

Explanation

( Capture group 1
- -?\d+ Match an optional - and 1+ digits
- (?:\.\d+)? Optionally match . and 1+ digits using a non capture group (so it is not outputted separately by re.findall)
) Close group 1
( Capture group 2
- [a-zA-Z]+ Match 1+ times a char a-z or A-Z
) Close group 2
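
As a quick check of the explanation above, here is a small sketch that compares a capturing and a non-capturing inner group on the question's input -2.55xy, and then reruns the question's first attempt. The variable names are only illustrative.

```python
import re

s = "-2.55xy"

# Capturing inner group: findall reports both groups for the match.
with_capture = re.findall(r"(-?\d+(\.\d+)?)", s)
print(with_capture)        # [('-2.55', '.55')]

# Non-capturing inner group: only the single outer group is reported,
# so findall returns plain strings instead of tuples.
without_capture = re.findall(r"(-?\d+(?:\.\d+)?)", s)
print(without_capture)     # ['-2.55']

# First attempt from the question: the letters are matched by the
# right-hand branch, which has no groups, so both groups come back empty.
with_alternation = re.findall(r"^(-?\d+(\.\d+)?)|[a-zA-Z]+$", s)
print(with_alternation)    # [('-2.55', '.55'), ('', '')]
```

So the letters are matched in the first attempt, they are just not inside any capture group.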

import re

strings = [
    "2.5ax",
    "-5bcf",
    "-69.67Gh",
]

pattern = r"(-?\d+(?:\.\d+)?)([a-zA-Z]+)"
for s in strings:
    print(re.findall(pattern, s))

Output

[('2.5', 'ax')]
[('-5', 'bcf')]
[('-69.67', 'Gh')]
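
If you want the plain list from the question rather than a list of tuples, one possible variation (a sketch, not part of the original answer) is to use re.fullmatch with the same pattern and convert the groups to a list. The helper name split_number_and_letters is made up for this example. Because fullmatch must consume the whole string, an input like 2..5ax is rejected instead of being partially matched.

```python
import re

# Same pattern as in the answer above, compiled once.
PATTERN = re.compile(r"(-?\d+(?:\.\d+)?)([a-zA-Z]+)")

def split_number_and_letters(text):
    """Return [number, letters], or None if text does not have that shape.

    Hypothetical helper, named only for this illustration.
    """
    match = PATTERN.fullmatch(text)
    if match is None:
        return None
    return list(match.groups())

for s in ["2.5ax", "-5bcf", "-69.67Gh", "2..5ax"]:
    print(s, "->", split_number_and_letters(s))
```

The first three inputs give ['2.5', 'ax'], ['-5', 'bcf'] and ['-69.67', 'Gh']; the last one gives None.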

edited Sep 2, 2022 at 13:58

answered Sep 2, 2022 at 9:41

The fourth bird

2 Comments

Samsil Arefeen Over a year ago

OK, I understand how the capture group works, but can you please explain how the '?:' works in your pattern? Does it somehow remove (\.\d+) as a capture group?

The fourth bird Over a year ago

@SamsilArefeen I have added an explanation of the pattern. (?: starts a non-capture group: you can still make that whole part optional, but because re.findall returns capture group values, making the group non-capturing keeps it out of the result.

LetzerWille · Accepted Answer · 2022-09-02 14:25:17Z

Lookahead and lookbehind in re.split simplify things sometimes.

(?<=\d) look behind
(?=[a-zA-Z]) look ahead

That is, split at the position between the digit and the letter.

import re

strings = [
    "2.5ax",
    "-5bcf",
    "-69.67Gh",
]

for s in strings:
    print(re.split(r'(?<=\d)(?=[a-zA-Z])', s))

Output

['2.5', 'ax']
['-5', 'bcf']
['-69.67', 'Gh']
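
One thing to keep in mind with this approach: the split only finds the boundary between a digit and a letter and does not check that the left part is a valid number. The sketch below (variable names are illustrative) shows that on the 2..5 case from the question. It also needs Python 3.7 or later, since that is when re.split started supporting patterns that match an empty string.

```python
import re

# Zero-width split point: a digit behind, a letter ahead.
BOUNDARY = re.compile(r"(?<=\d)(?=[a-zA-Z])")

well_formed = BOUNDARY.split("-69.67Gh")
malformed = BOUNDARY.split("2..5ax")
no_boundary = BOUNDARY.split("abc")

print(well_formed)   # ['-69.67', 'Gh']
print(malformed)     # ['2..5', 'ax']  (the number part is not validated)
print(no_boundary)   # ['abc']         (no split point, one-element list)
```

If the input may be malformed, checking the length of the result and validating the number part separately would still be needed.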

answered Sep 2, 2022 at 14:25

LetzerWille
