3

I am having trouble understanding findall, which says...

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result.

Why doesn't this basic IP regex work with findall as expected? The matches do not overlap, and regexpal confirms that the pattern in re_str matches both addresses in the test string.


Expected: ['1.2.2.3', '123.345.34.3']

Actual: ['2.', '34.']

import re

re_str = r'(\d{1,3}\.){3}\d{1,3}'
line = 'blahblah -- 1.2.2.3 blah 123.345.34.3'
matches = re.findall(re_str, line)
print(matches)    # ['2.', '34.']

2 Answers

3

When you use parentheses in your regex, re.findall() will return only the parenthesized groups, not the entire matched string. Put a ?: after the ( to tell it not to use the parentheses to extract a group, and then the results should be the entire matched string.
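
As a minimal sketch of that difference, the snippet below runs findall on the question's test string with both forms of the pattern. The variable names (capturing, non_capturing, with_group, without_group) are only illustrative and not from the original post.

```python
import re

line = 'blahblah -- 1.2.2.3 blah 123.345.34.3'

# Same pattern twice: once with a capturing group, once with (?:...)
capturing = r'(\d{1,3}\.){3}\d{1,3}'
non_capturing = r'(?:\d{1,3}\.){3}\d{1,3}'

# With one capturing group, findall returns that group's text per match
with_group = re.findall(capturing, line)

# With no capturing groups, findall returns the whole match
without_group = re.findall(non_capturing, line)

print(with_group)     # ['2.', '34.']
print(without_group)  # ['1.2.2.3', '123.345.34.3']
```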


1

This is because capturing groups return only the last match if they're repeated.

Instead, you should make the repeating group non-capturing, and use a non-repeated capture at an outer layer:

re_str = r'((?:\d{1,3}\.){3}\d{1,3})'

Note that for findall, if there is no capturing group, the whole match is automatically selected (like \0), so you could drop the outer capture:

re_str = r'(?:\d{1,3}\.){3}\d{1,3}'
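
To see the "last match" behaviour directly, here is a small sketch that inspects a single match object for the question's original pattern. It also shows re.finditer as one possible alternative if you would rather keep the capturing group; that alternative is a suggestion added here, and the names first, whole_match, last_repeat and all_matches are illustrative.

```python
import re

line = 'blahblah -- 1.2.2.3 blah 123.345.34.3'
re_str = r'(\d{1,3}\.){3}\d{1,3}'  # original pattern, repeated capturing group

first = re.search(re_str, line)
whole_match = first.group(0)   # the entire matched text
last_repeat = first.group(1)   # only the final repetition of the group
print(whole_match)   # 1.2.2.3
print(last_repeat)   # 2.

# finditer yields match objects, so group(0) is still available
# even though the pattern contains a capturing group
all_matches = [m.group(0) for m in re.finditer(re_str, line)]
print(all_matches)   # ['1.2.2.3', '123.345.34.3']
```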
