
I have the code below:

import re
age = []

txt = ('9', "10y", "4y",'unknown')
for t in txt:
    if t.isdigit() is True:
        age.append(re.search(r'\d+',t).group(0))
    else:
        age.append('unknown')
print(age)

and I get: ['9', 'unknown', 'unknown', 'unknown']

So I get the 9, but I also need 10 in the second position, 4 in the third, and unknown for the last.
Can anyone point me in the right direction? Thank you for your help!

  • I don't know why this was marked as a duplicate; I don't see any duplication. Commented Oct 30, 2020 at 0:09
  • I think the flag is correct. I spent an hour searching Stack Overflow before submitting the question and got hung up on different items. I didn't see the answer that my question is similar to. The answer to that question is similar to @Erfan's pandas solution; I must have missed it. Thank you all for the help. Commented Oct 30, 2020 at 11:41

3 Answers


We can make use of the fact that re.search returns None when not finding any digit:

import re

txt = ('9', "10y", "4y", 'unknown')
age = []
for t in txt:
    # re.search returns a match object, or None if t contains no digits
    num = re.search(r'\d+', t)
    if num:
        age.append(num.group(0))
    else:
        age.append('unknown')
print(age)  # ['9', '10', '4', 'unknown']
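
The same idea can be wrapped in a small helper, which may be easier to reuse. This is only a sketch: the name extract_age and its default parameter are made up for illustration and are not part of the question.

```python
import re

def extract_age(value, default='unknown'):
    """Return the first run of digits in value, or default if there is none.

    extract_age is a hypothetical helper name used only for this example.
    """
    match = re.search(r'\d+', value)  # None when value contains no digits
    return match.group(0) if match else default

txt = ('9', '10y', '4y', 'unknown')
age = [extract_age(t) for t in txt]
print(age)  # ['9', '10', '4', 'unknown']
```

Note that only the first run of digits is kept, so a value such as '10y22' would give '10'.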

Since you tagged pandas, if you have a column, use str.extract:

import pandas as pd
pd.Series(txt).str.extract(r'(\d+)', expand=False)
0      9
1     10
2      4
3    NaN
dtype: object


1 Comment

Thank you! That's it! Geesh, I should have put more context in the question. I have been doing some volunteer work for a pet shelter, and one of the columns in the data frame is in the form xY yM, for years and months. I only need the age to do some analysis, so the pandas idea is probably the way to go. Thanks again!
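
For values in the xY yM form described in this comment, a plain-Python sketch might look like the following. The exact format is an assumption (an optional number followed by Y, then an optional number followed by M, e.g. "2Y 3M"), and AGE_PATTERN and age_in_months are hypothetical names chosen for illustration.

```python
import re

# Assumed format: optional "<n>Y" then optional "<n>M", e.g. "2Y 3M", "4y", "6M".
# This pattern is a guess at the column's layout, not taken from real data.
AGE_PATTERN = re.compile(r'\s*(?:(\d+)\s*Y)?\s*(?:(\d+)\s*M)?\s*', re.IGNORECASE)

def age_in_months(text):
    """Return the age in months, or None if text does not fit the assumed format."""
    m = AGE_PATTERN.fullmatch(text)
    # Both groups are optional, so also reject a match that captured no number
    if m is None or (m.group(1) is None and m.group(2) is None):
        return None
    years = int(m.group(1) or 0)
    months = int(m.group(2) or 0)
    return years * 12 + months

print(age_in_months('2Y 3M'))    # 27
print(age_in_months('unknown'))  # None
```

Returning None for unparseable values keeps them easy to filter out before any analysis.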
import re
age = []

txt = ('9', "10y22", "4y", 'unknown')

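for t in txt:
    # findall returns every run of digits; an empty list means none were found
    res = re.findall('[0-9]+', t)
    if res:
        age.append(res[0])  # keep only the first run, e.g. '10' from "10y22"
    else:
        age.append("unknown")
print(age)  # ['9', '10', '4', 'unknown']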

import re


age = []

txt = ('9', "10y", "4y",'unknown')
for t in txt:
    if len(t) > 1 and not t.isdigit():
        t = t[:-1]  # drop only the final character, not every occurrence of it
    if t.isdigit() is True:
        age.append(re.search(r'\d+',t).group(0))
    else:
        age.append('unknown')
print(age)

Check this out. The len check makes sure the string is longer than one character; if the string is not made up entirely of digits, its last character is removed (replaced with an empty string, not a space). The rest follows your original algorithm. Note that this only handles a single trailing character such as "y". You can adapt it further to fit your requirements, since you didn't specify much.
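
If the unit can be longer than one letter, one possible variation is to strip all trailing ASCII letters without using a regex. This is a minimal sketch under that assumption; strip_unit is a hypothetical name, and values with digits after the letters (such as "10y22") are still treated as unknown.

```python
import string

def strip_unit(value, default='unknown'):
    """Strip trailing ASCII letters and return the rest if it is all digits.

    strip_unit is a hypothetical helper name used only for this example.
    """
    digits = value.rstrip(string.ascii_letters)  # "10y" -> "10", "unknown" -> ""
    return digits if digits.isdigit() else default

txt = ('9', '10y', '4y', 'unknown')
age = [strip_unit(t) for t in txt]
print(age)  # ['9', '10', '4', 'unknown']
```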

1 Comment

Thank you, this is really cool!
