Python3 regex findall

Question

Here is my issue. Given the list below:

a = ['COP' , '\t\t\t', 'Basis', 'Notl', 'dv01', '6m', '9m', '1y',
     '18m', '2y', '3y', "15.6", 'mm', '4.6', '4y', '5y', '10', 'mm',
     '4.6', '6y', '7y', '8y', '9y', '10y', '20y', 'TOTAL', '\t\t9.2' ]

I'm trying to get output like the sample below. The most important point about the rows: after each term ending in "y" or "m", a number should appear only if one actually follows it in the list. Example: ('3y', '15.6', '')

SAMPLE OUTPUT (forget about the tuple structure, I just want the values)

('6m', '', '')
('9m', '', '')
('1y', '', '')
('18m', '', '')
('2y', '', '')
('3y', '15.6', '')
('4y', '', '')
('5y', '10', '')
('6y', '', '')
('7y', '', '')
('8y', '', '')
('9y', '', '')
('10y', '', '')
('20y', '', '')

I used a regex that I expected to return:

all numbers followed by "y" or "m" => (\b\d+[ym]\b)
and then any number (integer or not) if it appears (meaning zero or more times) => (\b[0-9]+.[0-9]\b)

Here is what I tried, using Python 3 and re.findall(), but I got an empty result:

import re

rule2 = re.compile(r"(\b\d+[ym]\b)(\b[0-9]+.*[0-9]*\b)+")
a_str = " ".join(a)  # treat the whole list as one space-separated string
OUT2 = re.findall(rule2, a_str)
print(OUT2)
# prints []

Why am I not getting the correct result?

I think that there's a problem with the \b word boundary token. You cannot chain 2 word boundary tokens.
– Jean-François Fabre ♦, Commented Jan 30, 2020 at 20:37

Jean-François Fabre · Accepted Answer · 2020-01-30 20:43:30Z

3

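You cannot use the word boundary twice in a row here: \b is zero-width, so after (\d+[ym]\b) nothing consumes the space before the number and [0-9]+ never gets a chance to match. Since the data is separated by non-word characters (anything that is not a letter, digit or underscore), use \W+ instead.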
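A small check of the word-boundary point, as a sketch on a shortened, made-up input string rather than the full list (the names `sample`, `chained` and `separated` are only for illustration):

```python
import re

sample = "3y 15.6"  # made-up two-token input, not the full list

# Two \b in a row both match at the same spot (right after "y"),
# but the next character is a space, so [0-9]+ cannot match there.
chained = re.findall(r"(\d+[ym]\b)(\b[0-9]+)", sample)

# \W+ consumes the separator, so the digits become reachable.
separated = re.findall(r"(\d+[ym])\W+([0-9]+)", sample)

print(chained)    # []
print(separated)  # [('3y', '15')]
```

The second pattern still stops at "15" because the fractional part is not handled yet.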

Then escape the dot and make it optional, or you're not going to match 10. Don't use .* either: it is greedy and will match too much.

With those changes, the pattern below yields more or less what you're looking for (note that strictly matching numbers, whether integers or floats, is trickier than this, so it isn't perfect):
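To see both effects in isolation, here is a minimal sketch on made-up strings (the variable names and the inputs "3y 15.6 mm 4.6 4y" and "15x6" are assumptions for illustration only):

```python
import re

sample = "3y 15.6 mm 4.6 4y"  # made-up input

# Greedy .* runs to the end of the string, swallowing later tokens.
greedy = re.findall(r"(\d+[ym])\W+([0-9]+.*[0-9]*)", sample)
print(greedy)  # [('3y', '15.6 mm 4.6 4y')]

# An unescaped dot matches any character, not just a literal ".".
unescaped = re.fullmatch(r"[0-9]+.[0-9]", "15x6")
print(unescaped is not None)  # True

# A mandatory dot rejects plain integers; an optional one accepts them.
required_dot = re.fullmatch(r"[0-9]+\.[0-9]", "10")
optional_dot = re.fullmatch(r"[0-9]+\.?[0-9]*", "10")
print(required_dot is None, optional_dot is not None)  # True True
```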

import re

# term ending in y/m, then separator characters, then an integer or decimal
rule2 = re.compile(r"\b(\d+[ym])\W+([0-9]+\.?[0-9]*)\b")
a_str = " ".join(a)
OUT2 = re.findall(rule2, a_str)
print(OUT2)

[('3y', '15.6'), ('5y', '10')]
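The result above lists only the terms that are followed by a number. If every term should appear, with an empty string where no number follows (as in the question's sample output), one possible variation is to make the number part an optional group. This is a sketch and not part of the original answer: the name `rule_all` is hypothetical, and it assumes that a "number" is digits with an optional decimal part, delimited by whitespace.

```python
import re

a = ['COP', '\t\t\t', 'Basis', 'Notl', 'dv01', '6m', '9m', '1y',
     '18m', '2y', '3y', '15.6', 'mm', '4.6', '4y', '5y', '10', 'mm',
     '4.6', '6y', '7y', '8y', '9y', '10y', '20y', 'TOTAL', '\t\t9.2']

# rule_all is a hypothetical name. The optional group captures a number
# only when it is a complete whitespace-delimited token: (?!\S) rejects
# cases like the "9" in "9m", so the next term is not swallowed.
rule_all = re.compile(r"\b(\d+[ym])\b(?:\s+(\d+(?:\.\d+)?)(?!\S))?")

out_all = rule_all.findall(" ".join(a))
for row in out_all:
    print(row)
```

Each row is a 2-tuple such as ('6m', '') or ('3y', '15.6'); terms with no following number get an empty string in the second position.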



1 Comment

MasterOfTheHouse Over a year ago

Thank you. Actually, I'm interested in this solution because I previously saw a solution you gave to another problem. I tried to solve it and overworked it, and you solved it with a couple of regexes! Since then, I have been trying to solve it and to understand how you arrived at that solution. After 3 days I see that I failed to treat the problem as just one string, and that I do not have deep knowledge of or practice with regex.
