
Here is my issue. Given the list below:

a = ['COP' , '\t\t\t', 'Basis', 'Notl', 'dv01', '6m', '9m', '1y',
     '18m', '2y', '3y', "15.6", 'mm', '4.6', '4y', '5y', '10', 'mm',
     '4.6', '6y', '7y', '8y', '9y', '10y', '20y', 'TOTAL', '\t\t9.2' ]

I'm trying to get output like the sample below. The most important point is that each row starts with a token ending in "y" or "m", followed by a number only if one comes right after it in the list. Example: ('3y', '15.6', '')

SAMPLE OUTPUT (ignore the tuple structure; I just want the values)

('6m', '', '')
('9m', '', '')
('1y', '', '')
('18m', '', '')
('2y', '', '')
('3y', '15.6', '')
('4y', '', '')
('5y', '10', '')
('6y', '', '')
('7y', '', '')
('8y', '', '')
('9y', '', '')
('10y', '', '')
('20y', '', '')

I used the following regex, which should have matched:

  1. all numbers followed by "y" or "m" => (\b\d+[ym]\b)
  2. then any number (integer or decimal), if one appears (meaning zero or more times) => (\b[0-9]+.[0-9]\b)

Here is what I did, using Python 3 and re.findall(), but I still got no result:

import re

rule2 = re.compile(r"(\b\d+[ym]\b)(\b[0-9]+.*[0-9]*\b)+")
a_str = " ".join(a)  # join the list into one space-separated string
OUT2 = re.findall(rule2, a_str)
print(OUT2)
# prints: []

Why am I not getting the correct result?

  • I think there's a problem with the \b word boundary token. You cannot chain two word boundary tokens. Commented Jan 30, 2020 at 20:37
  • Let me give that a try. Commented Jan 30, 2020 at 20:39
  • It still did not work after I removed \b. Commented Jan 30, 2020 at 20:42

1 Answer

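You cannot use a word boundary twice in a row. Two consecutive \b tokens assert the same position, so your pattern requires the number to start immediately after the "y" or "m", but the joined string has a space there. Since the data is separated by characters that are not letters or digits, use \W+ between the two groups instead.

Then escape the dot and make it optional, or you're not going to match 10. Don't use .* as it will match too much (regex greediness).

Putting those changes together yields more or less what you're looking for (note that strictly matching numbers, integers or floats, is trickier than this, so it isn't perfect):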
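To make the three pitfalls above concrete, here is a small sketch on short sample strings taken from your data. The variable names are only for illustration.

```python
import re

# Two adjacent \b assert the same position, so a digit would have to follow
# the "y"/"m" directly; the space in between makes the match fail.
chained = re.findall(r"(\b\d+[ym]\b)(\b[0-9]+)", "3y 15.6")

# Greedy .* runs to the end of the line and swallows every later token.
greedy = re.search(r"[0-9]+.*[0-9]*", "15.6 mm 4.6 4y").group()

# An unescaped, mandatory dot means "any one character", so the pattern
# needs at least three characters and the two-character "10" cannot match.
strict_dot = re.fullmatch(r"[0-9]+.[0-9]", "10")
optional_dot = re.fullmatch(r"[0-9]+\.?[0-9]*", "10")

print(chained)               # []
print(greedy)                # 15.6 mm 4.6 4y
print(strict_dot)            # None
print(optional_dot.group())  # 10
```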

rule2 = re.compile(r"\b(\d+[ym])\W+([0-9]+\.?[0-9]*)\b")
a_str = " ".join(a)
OUT2 = re.findall(rule2, a_str)
print(OUT2)

[('3y', '15.6'), ('5y', '10')]
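If you also want the rows that have no number, as in your sample output, one possible variation is to make the whole number part optional. This is only a sketch under the assumption that a "number" is digits with an optional decimal part; the names tenor_rule and rows are mine, and it returns pairs rather than your three-element tuples. The (?!\w) lookahead keeps the next tenor (such as the "9" in "9m") from being taken as a number.

```python
import re

a = ['COP', '\t\t\t', 'Basis', 'Notl', 'dv01', '6m', '9m', '1y',
     '18m', '2y', '3y', "15.6", 'mm', '4.6', '4y', '5y', '10', 'mm',
     '4.6', '6y', '7y', '8y', '9y', '10y', '20y', 'TOTAL', '\t\t9.2']

# Group 1: a tenor such as "6m" or "10y".
# Optional part: separators, then a number that is not followed by a
# word character, so "9m" is never read as the number "9".
tenor_rule = re.compile(r"\b(\d+[ym])\b(?:\W+(\d+(?:\.\d+)?)(?!\w))?")

a_str = " ".join(a)
rows = tenor_rule.findall(a_str)  # unmatched optional groups come back as ''

for row in rows:
    print(row)
```

With the list above this prints one pair per tenor, for example ('6m', '') and ('3y', '15.6').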

1 Comment

Thank you. I'm interested in this solution because I previously saw an answer you gave to another problem. I had overworked my own attempt, and you solved it with a couple of regexes! Since then, I have been trying to solve this one and to understand how you arrived at that solution. After 3 days, I see that I failed to treat the problem as a single string, and that I don't have deep knowledge of or practice with regex.
