Python: Non capturing group is not working in Regex

Question

I'm using a non-capturing group in my regex, i.e. (?:.*), but it's not working.

I can still see the text it matches in the result. How can I ignore it so that it is not captured in the result?

Code:

import re

text = '12:37:25.790 08/05/20 Something   P  LR    0.156462 sccm   Pt   25.341343 psig something-else'

pattern = [r'(?P<time>\d\d:\d\d:\d\d.\d\d\d)\s{1}',
           r'(?P<date>\d\d/\d\d/\d\d)\s',
           r'(?P<pr>(?:.*)Pt\s{3}\d*[.]?\d*\s[a-z]+)'
          ]

result = re.search(''.join(pattern), text)

Output:

>>> result.group('pr')
'Something   P  LR    0.156462 sccm   Pt   25.341343 psig'

Expected output:

'Pt   25.341343 psig'

More info:

>>> result.groups()
('12:37:25.790', '08/05/20', 'Something   P  LR    0.156462 sccm   Pt   25.341343 psig')

Remove (?:.*): regex101.com/r/X69k0V/1. If the digits must not be optional, use (?P<pr>Pt\s{3}\d+(?:\.\d+)?\s[a-z]+)
– The fourth bird, Commented Aug 7, 2020 at 7:57
@Thefourthbird There is one more group before this pattern that I want to capture, so it's not possible to remove this.
– shaik moeed, Commented Aug 7, 2020 at 7:59
What do you want to capture before it? Like this? .*\b(?P<pr>Pt\s{3}\d+(?:\.\d+)?\s[a-z]+) regex101.com/r/sA3atN/1
– The fourth bird, Commented Aug 7, 2020 at 7:59
Like this? (?P<time>\d\d:\d\d:\d\d.\d\d\d)\s{1}(?P<date>\d\d/\d\d/\d\d)\s.*?(?P<pr>Pt\s{3}\d*[.]?\d*\s[a-z]+) regex101.com/r/dQULqz/1
– The fourth bird, Commented Aug 7, 2020 at 8:06

The fourth bird · Accepted Answer · 2020-08-07 08:27:33Z

1

The .* is inside the named group, so whatever it matches becomes part of that group. You have to place it outside the group, and possibly make it non-greedy too.

The updated pattern could look like:

(?P<time>\d\d:\d\d:\d\d.\d\d\d)\s{1}(?P<date>\d\d/\d\d/\d\d)\s.*?(?P<pr>Pt\s{3}\d*[.]?\d*\s[a-z]+)

Note that with the current pattern the number is optional, because every part of \d*[.]?\d* is optional. You can omit {1} as well.

If the number after Pt cannot be empty, you can update the pattern to use \d+(?:\.\d+)?, which matches at least a single digit:

(?P<time>\d\d:\d\d:\d\d.\d{3})\s(?P<date>\d\d/\d\d/\d\d)\s.*?(?P<pr>Pt\s{3}\d+(?:\.\d+)?\s[a-z]+)

(?P<time> Group time
- \d\d:\d\d:\d\d.\d{3} Match a time-like format
)\s Close group and match a whitespace char
(?P<date> Group date
- \d\d/\d\d/\d\d Match a date-like pattern
)\s Close group and match a whitespace char
.*? Match any char except a newline, as few times as possible
(?P<pr> Group pr
- Pt\s{3} Match Pt and 3 whitespace chars
- \d+(?:\.\d+)? Match 1+ digits with an optional decimal part
- \s[a-z]+ Match a whitespace char and 1+ chars a-z
) Close group
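
As a quick check, the pattern above can be run against the text from the question. This is only a sketch: the pattern is split across several raw-string literals for readability, and the variable names are illustrative.

```python
import re

text = ('12:37:25.790 08/05/20 Something   P  LR    0.156462 sccm   '
        'Pt   25.341343 psig something-else')

# Adjacent raw-string literals are concatenated into a single pattern.
pattern = re.compile(
    r'(?P<time>\d\d:\d\d:\d\d.\d{3})\s'         # time, then one whitespace char
    r'(?P<date>\d\d/\d\d/\d\d)\s'               # date, then one whitespace char
    r'.*?'                                      # skipped text, outside any group
    r'(?P<pr>Pt\s{3}\d+(?:\.\d+)?\s[a-z]+)'     # the part we actually want
)

result = pattern.search(text)
print(result.groupdict())
```

Because .*? sits outside the named groups, the skipped text is still part of the overall match (result.group(0)) but does not show up in result.group('pr').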


edited Aug 7, 2020 at 8:27

answered Aug 7, 2020 at 8:12

The fourth bird


2 Comments

shaik moeed Over a year ago

Thanks for the answer. What is the reason for adding non-capturing for decimal number i.e., (?:\.\d+)?

The fourth bird Over a year ago

@shaikmoeed It means that you first match 1 or more digits and optionally match a dot followed by 1 or more digits. So the non capturing group makes matching a dot and the following digits as a whole optional to prevent matching 123. for example.
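
The difference between the two number patterns can be seen with re.fullmatch. A minimal sketch; the names loose and strict and the sample strings are made up for illustration.

```python
import re

loose = re.compile(r'\d*[.]?\d*')       # every part is optional
strict = re.compile(r'\d+(?:\.\d+)?')   # digits required; ".digits" optional as one unit

for sample in ['25.341343', '123', '123.', '']:
    # Print whether each pattern matches the whole sample string.
    print(repr(sample),
          bool(loose.fullmatch(sample)),
          bool(strict.fullmatch(sample)))
```

The loose pattern accepts all four samples, including '123.' and the empty string, while the strict one accepts only '25.341343' and '123'.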

Thierry Lathuille · Accepted Answer · 2020-08-07 08:06:03Z

1

Move the non-capturing group out of your named group. Using a non-capturing group means that no new group will be created in the match, not that the text it matches will be removed from any enclosing group.

import re

text = 'Something   P  LR    0.156462 sccm   Pt   25.341343 psig something-else'

pattern = r'(?:.*)(?P<pr>Pt\s{3}\d*[.]?\d*\s[a-z]+)'

result = re.search(pattern, text)
print(result.group('pr'))

Output:

Pt   25.341343 psig

Note that the specific non-capturing group you used can be excluded completely, as it basically means that you want your regex to be preceded by anything, and that's what search will do anyway.
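
To illustrate that last point, here is a small comparison of the pattern with and without the leading (?:.*), using the same sample text. The variable names are only for illustration.

```python
import re

text = 'Something   P  LR    0.156462 sccm   Pt   25.341343 psig something-else'

with_prefix = re.search(r'(?:.*)(?P<pr>Pt\s{3}\d*[.]?\d*\s[a-z]+)', text)
without_prefix = re.search(r'(?P<pr>Pt\s{3}\d*[.]?\d*\s[a-z]+)', text)

# The named group is the same either way.
print(with_prefix.group('pr'))
print(without_prefix.group('pr'))

# Only the overall match differs: the prefix version also spans the leading text.
print(with_prefix.group(0))
print(without_prefix.group(0))
```

One caveat: if the text contained several candidates, the greedy .* would make the regex settle on the last one, whereas the version without the prefix returns the first one that search finds.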

answered Aug 7, 2020 at 8:06

Thierry Lathuille


1 Comment

shaik moeed Over a year ago

Thanks for explaining. It works after keeping a non-capturing group before the pr group.

soulmerge · Accepted Answer · 2020-08-07 08:04:21Z

0

I think there is some confusion about the meaning of "non-capturing" here: it does not mean that the result omits this part, but that no separate match group is created for it in the result.

Example where the same regex is executed with a capturing, and a non-capturing group:

>>> import re
>>> match = re.search(r'(?P<grp>foo(.*))', 'foobar')
>>> match.groups()
('foobar', 'bar')
>>> match = re.search(r'(?P<grp>foo(?:.*))', 'foobar')
>>> match.groups()
('foobar',)

Note that match.group(0) is the same in both cases (group 0 contains the matching part of the string in full).
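
The same comparison as a script, also checking group 0 and the named group. A sketch with illustrative variable names; Pattern.groups gives the number of capturing groups in the compiled pattern.

```python
import re

capturing = re.search(r'(?P<grp>foo(.*))', 'foobar')
non_capturing = re.search(r'(?P<grp>foo(?:.*))', 'foobar')

# The overall match and the named group are identical in both cases.
print(capturing.group(0), non_capturing.group(0))
print(capturing.group('grp'), non_capturing.group('grp'))

# Only the number of capturing groups in the pattern differs.
print(capturing.re.groups, non_capturing.re.groups)
```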

answered Aug 7, 2020 at 8:04

soulmerge

