0

I need to find birth years that start with 18xx or 19xx in a string.

I'm using regex to solve the task.

My test string is testbirtdays = 'ABCDEFG 01.19.1701 1801 02.18.1901 2001'

import re

def getNumbers(s):
    array = re.findall(r'[0-9]+', s)  # every run of digits, of any length
    return array

I can use this function, but the output will be:

getNumbers(testbirtdays)

#['01', '19', '1701', '1801', '02', '18', '1901', '2001']

My function can't do two things:

  1. I need only numbers starting with 18 or 19

  2. I need only 4-digit numbers, to get only years and ignore months/days

So I need output like:

#['1801','1901']

5 Answers

3

You may use

r'(?<![0-9])1[89][0-9]{2}(?![0-9])'

Or, with word boundaries:

r'\b1[89][0-9]{2}\b'

See the regex demo #1 and regex demo #2.

Regex details:

  • (?<![0-9]) - no ASCII digit allowed immediately to the left (or \b - a word boundary, in the second pattern)
  • 1 - the digit 1
  • [89] - 8 or 9
  • [0-9]{2} - two ASCII digits
  • (?![0-9]) - no ASCII digit allowed immediately to the right (or \b - a word boundary, in the second pattern)
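The two patterns are not interchangeable: \b also treats letters and underscores as word characters, while the lookarounds only reject neighbouring digits. A minimal sketch of the difference, using a made-up sample string (the string and variable names are assumptions, not from the question):

```python
import re

# Hypothetical sample: years glued to letters/underscores, plus a 5-digit number
sample = 'id_1901 x1801 1999 21850'

# Lookarounds only forbid a digit on either side of the year
lookaround_hits = re.findall(r'(?<![0-9])1[89][0-9]{2}(?![0-9])', sample)

# Word boundaries also forbid letters and underscores on either side
boundary_hits = re.findall(r'\b1[89][0-9]{2}\b', sample)

print(lookaround_hits)  # ['1901', '1801', '1999']
print(boundary_hits)    # ['1999']
```

Which one fits depends on whether a year directly attached to a letter or underscore should count as a match.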

See the Python demo:

import re

def getNumbers(s): 
    return re.findall(r'(?<![0-9])1[89][0-9]{2}(?![0-9])', s) 

testbirtdays = 'ABCDEFG 01.19.1701 1801 02.18.1901 2001'
print(getNumbers(testbirtdays)) # => ['1801', '1901']

2

Here is one way:

import re

re.findall(r'\b18\d{2}\b|\b19\d{2}\b', testbirtdays)

output:

['1801', '1901']
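One detail worth knowing when using \d instead of [0-9]: for str patterns in Python 3, \d matches any Unicode decimal digit unless the re.ASCII flag is set. A small sketch with an invented input containing Arabic-Indic digits (the input and variable names are assumptions for illustration only):

```python
import re

# Hypothetical input: '18' followed by two Arabic-Indic digits, then a normal year
text = '18\u0660\u0661 1901'

# Default (Unicode) matching: \d accepts the Arabic-Indic digits too
unicode_hits = re.findall(r'\b1[89]\d{2}\b', text)

# With re.ASCII, \d is restricted to 0-9, like [0-9]
ascii_hits = re.findall(r'\b1[89]\d{2}\b', text, flags=re.ASCII)

print(len(unicode_hits))  # 2
print(ascii_hits)         # ['1901']
```

For plain ASCII input such as the test string in the question, both variants return the same result.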

1

You need a more specific regex like 1[8-9][0-9]{2}: a 1, then an 8 or 9, then 2 digits.

You can also use (?:18|19)[0-9]{2}: start with 18 or 19, then 2 more digits.

import re

def getNumbers(value):
    return re.findall(r'1[8-9][0-9]{2}', value)

r = getNumbers('ABCDEFG 01.19.1701 1801 02.18.1901 2001')
print(r)  # ['1801', '1901']

1

Try this:

import re

def get_years(s):
    return re.findall(r"((?:18|19)\d{2})\b", s)

print(get_years(testbirtdays))

Output:

['1801', '1901']

1
import re

test = 'ABCDEFG 01.19.1701 1801 02.18.1901 2001'
pattern = r'1[89]\d{2}'
re.findall(pattern, test)

The pattern looks for 1 followed by 8 or 9, and 2 more digits.

Output:

['1801', '1901']

6 Comments

The | inside a character class [] does not mean OR; it matches a literal pipe character.
But this is the output I got: ['1801', '1901']
You get that output, but the pattern would also match 1|01.
Ah, right. Now I get it - [] by itself does the OR part, so it should just be [89].
Exactly, but you will still get partial matches, for example in 55180155, which I don't think is the desired result. The answer provided by Wiktor Stribiżew shows how that can be prevented.
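The partial-match problem from the last comment can be reproduced with a short sketch. The variable names are illustrative; the strict pattern is the digit-lookaround one from the top answer:

```python
import re

loose = r'1[89]\d{2}'                         # no guards around the year
strict = r'(?<![0-9])1[89][0-9]{2}(?![0-9])'  # rejects digits on both sides

embedded = '55180155'
loose_hits = re.findall(loose, embedded)
strict_hits = re.findall(strict, embedded)

print(loose_hits)   # ['1801']  (matched inside the longer number)
print(strict_hits)  # []

# On the question's test string both patterns agree
testbirtdays = 'ABCDEFG 01.19.1701 1801 02.18.1901 2001'
same_on_sample = re.findall(loose, testbirtdays) == re.findall(strict, testbirtdays)
```

The test string from the question does not expose the difference, which is why the unguarded pattern appears to work there.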