How do I capture the first numeric element in a string in python? [duplicate]

Question

I have the code below:

import re
age = []

txt = ('9', "10y", "4y",'unknown')
for t in txt:
    if t.isdigit() is True:
        age.append(re.search(r'\d+',t).group(0))
    else:
        age.append('unknown')
print(age)

and I get: ['9', 'unknown', 'unknown', 'unknown']

I get the 9, but I also need the 10 in the second position, the 4 in the third, and unknown for the last.
Can anyone point me in the right direction? Thank you for your help!

I don't know why this was marked as a duplicate; I don't see any duplication.
– Mehdi Golzadeh, Commented Oct 30, 2020 at 0:09
I think the flag is correct. I spent an hour searching Stack Overflow before submitting the question and kept getting hung up on different items. I didn't see the answer that my question is similar to. The answer to that question is similar to @Erfan's pandas solution; I must have missed it. Thank you all for the help.
– pauliec, Commented Oct 30, 2020 at 11:41

Erfan · Accepted Answer · 2020-10-29 22:35:43Z

2

We can make use of the fact that re.search returns None when not finding any digit:

import re

txt = ('9', "10y", "4y", 'unknown')
age = []
for t in txt:
    # re.search returns a match object, or None if t contains no digits
    num = re.search(r'\d+', t)
    if num:
        age.append(num.group(0))
    else:
        age.append('unknown')
print(age)

['9', '10', '4', 'unknown']
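
The same idea can be wrapped in a small helper, which may be handy if the lookup is needed in more than one place. This is only a sketch: the name first_number and its default parameter are made up for illustration and are not part of the original answer.

```python
import re

def first_number(text, default='unknown'):
    """Return the first run of digits in text, or default if there is none."""
    match = re.search(r'\d+', text)
    # A match object is truthy; None (no digits found) is falsy.
    return match.group(0) if match else default

txt = ('9', '10y', '4y', 'unknown')
age = [first_number(t) for t in txt]
print(age)  # ['9', '10', '4', 'unknown']
```

Passing a different default (for example None) lets the caller decide how missing values are represented.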

Since you tagged pandas, if you have a column, use str.extract:

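import pandas as pd

# expand=False returns a Series (rather than a one-column DataFrame) for a single group
pd.Series(txt).str.extract(r'(\d+)', expand=False)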

0      9
1     10
2      4
3    NaN
dtype: object


1 Comment

pauliec Over a year ago

Thank you!!! That's it! Geesh...I should have put more context in the question....I have been doing some volunteer work for a pet shelter...one of the columns in the data frame is xY yM for years and months. I only need the age to do some analysis so the pandas idea is probably the way to go. thanks again!

sahasrara62 · Answer · 2020-10-29 22:39:53Z

0

import re
age = []

txt = ('9', "10y22", "4y", 'unknown')

for t in txt:
    res = re.findall('[0-9]+', t)
    if res:
        age.append(res[0])
    else:
        age.append("unknown")
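
For context, re.findall returns a list of every non-overlapping match and an empty list when nothing matches, so res[0] is the first number found. A minimal sketch comparing it with re.search on the same inputs follows; the two function names are illustrative only and do not come from the answer.

```python
import re

def first_via_findall(text):
    # findall collects every run of digits; the list is empty if there are none.
    res = re.findall(r'[0-9]+', text)
    return res[0] if res else 'unknown'

def first_via_search(text):
    # search stops at the first run of digits and returns None if there is none.
    match = re.search(r'[0-9]+', text)
    return match.group(0) if match else 'unknown'

txt = ('9', '10y22', '4y', 'unknown')
print(re.findall(r'[0-9]+', '10y22'))        # ['10', '22']
print([first_via_findall(t) for t in txt])   # ['9', '10', '4', 'unknown']
print([first_via_search(t) for t in txt])    # ['9', '10', '4', 'unknown']
```

Both give the same first number; findall simply does extra work when only the first match is needed.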


tetektoza · Answer · 2020-10-29 22:54:37Z

0

import re

age = []

txt = ('9', "10y", "4y", 'unknown')
for t in txt:
    # If the string is not all digits, drop its last character (e.g. the "y" in "10y").
    if len(t) > 1 and not t.isdigit():
        t = t[:-1]
    if t.isdigit():
        age.append(re.search(r'\d+', t).group(0))
    else:
        age.append('unknown')
print(age)

Check this out. The len check makes sure the string is longer than one character; if the string is also not made up entirely of digits, its last character is removed. The rest follows your original algorithm. You can adapt it further to fit your requirements, since the question didn't specify much.
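
One thing to watch when trimming a trailing character this way: str.replace removes every occurrence of the given character, while slicing drops only the final one. A small sketch of the difference follows; the function names and the sample input '2y3y' are hypothetical, chosen only to make the difference visible.

```python
def strip_with_replace(t):
    # Removes every occurrence of the last character, not just the final one.
    return t.replace(t[-1], '')

def strip_with_slice(t):
    # Drops only the final character.
    return t[:-1]

sample = '2y3y'  # hypothetical input with a repeated unit letter
print(strip_with_replace(sample))  # 23
print(strip_with_slice(sample))    # 2y3
```

With replace, '2y3y' collapses to '23' and would pass an isdigit check, which may not be what is wanted; with slicing it stays '2y3' and is treated as non-numeric.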

edited Oct 29, 2020 at 22:54

answered Oct 29, 2020 at 22:49


1 Comment

pauliec Over a year ago

Thank you, this is really cool!
