How can I extract a numeric expression from a string when it may or may not contain an underscore or hyphen? For example, 2016-03, 2016_03, or simply 201603.
Sample strings:
s = 'Total revenue for 2016-03 is 3000 €' # Output 2016-03
s = 'Total revenue for 2016_03 is 3000 €' # Output 2016_03
s = 'Total revenue for 201603 is 3000 €' # Output 201603
The expression has 6 digits; if it contains either - or _, the total length is 7. There is no other number in the entire string.
I don't know how to use if-else in a regex so that it can include the logic of length 6 or 7. For simple strings like 201603, I am able to do it:
import re
print(re.findall(r'\d{6}', 'Total revenue for 201603 is 3000 €'))
['201603']
print(re.findall(r'\d{6}', 'Total revenue for 2016-03 is 3000 €'))
[]
Note: I am looking for a solution where, in theory, the _ or - could be anywhere within the 6-digit number, like 123-456, 123456, 12345-6, and so on.
(?<=^Total revenue for )(\d+[-_]?\d+)
\d{6} matches at least 6 digits in a row... r'(?<!\S)(?=\d+[_-]\d+)[\d_-]{6,7}(?!\S)'. Probably, it will be simpler to split with whitespace and then test with ^(?=.{6,7}$)\d+[-_]\d+$
- or _ Total revenue for 201603 is 3000 €?
Again, sorry, I edited the comment above and added the link to the regex demo. I do not like this pattern since there are too many checks involved. Probably, the (?!\S) is still better at the end: r'(?<!\S)(?=\d+(?:[_-]\d+)?)[\d_-]{6,7}(?!\S)' or even doubled: r'(?<!\S)(?=\d+(?:[_-]\d+)?(?!\S))[\d_-]{6,7}(?!\S)'. Too much redundancy. I would combine a regex with some code.
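The last suggestion, combining a regex with some code, could be sketched roughly as below. This is only one possible reading of it, not a pattern taken from the comments: the names TOKEN and extract_period are made up for illustration, and it assumes the expression is delimited by whitespace (or the string boundaries) and contains at most one - or _ between digits. The regex finds candidate tokens, and plain Python then checks that a candidate has exactly 6 digits.

```python
import re

# Hypothetical helper pattern: a whitespace-delimited token made of digits,
# optionally with a single - or _ somewhere between two digit runs.
TOKEN = re.compile(r'(?<!\S)\d+(?:[-_]\d+)?(?!\S)')

def extract_period(text):
    """Return the first token with exactly 6 digits, or None if there is none."""
    for match in TOKEN.finditer(text):
        token = match.group()
        # Strip the optional separator and count what is left.
        digits = token.replace('-', '').replace('_', '')
        if len(digits) == 6:
            return token
    return None

for s in ('Total revenue for 2016-03 is 3000 €',
          'Total revenue for 2016_03 is 3000 €',
          'Total revenue for 201603 is 3000 €'):
    print(extract_period(s))
```

Counting the digits in code rather than with {6,7} avoids a side effect of the single-pattern versions: a class like [\d_-]{6,7} also accepts 7 plain digits, or 5 digits plus a separator, when those appear as a standalone token.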