
I'm using a non-capturing group in a regex, i.e. (?:.*), but it's not working.

I can still see the text it matches in the result. How can I ignore it so that it is not captured in the result?

Code:

import re

text = '12:37:25.790 08/05/20 Something   P  LR    0.156462 sccm   Pt   25.341343 psig something-else'

# Raw strings keep the regex backslashes (\d, \s) intact
pattern = [r'(?P<time>\d\d:\d\d:\d\d.\d\d\d)\s{1}',
           r'(?P<date>\d\d/\d\d/\d\d)\s',
           r'(?P<pr>(?:.*)Pt\s{3}\d*[.]?\d*\s[a-z]+)'
          ]

result = re.search(''.join(pattern), text)

Output:

>>> result.group('pr')
            
'Something   P  LR    0.156462 sccm   Pt   25.341343 psig'

Expected output:

'Pt   25.341343 psig'

More info:

>>> result.groups()
            
('12:37:25.790', '08/05/20', 'Something   P  LR    0.156462 sccm   Pt   25.341343 psig')
4
  • Remove (?:.*) regex101.com/r/X69k0V/1 and if the digits can not be optional (?P<pr>Pt\s{3}\d+(?:\.\d+)?\s[a-z]+) Commented Aug 7, 2020 at 7:57
  • @Thefourthbird There is one more group before this pattern, which I want to capture. So, it's not possible to remove this. Commented Aug 7, 2020 at 7:59
  • What do you want to capture before? Like this? .*\b(?P<pr>Pt\s{3}\d+(?:\.\d+)?\s[a-z]+) regex101.com/r/sA3atN/1 Commented Aug 7, 2020 at 7:59
  • 1
    Like this? (?P<time>\d\d:\d\d:\d\d.\d\d\d)\s{1}(?P<date>\d\d/\d\d/\d\d)\s.*?(?P<pr>Pt\s{3}\d*[.]?\d*\s[a-z]+) regex101.com/r/dQULqz/1 Commented Aug 7, 2020 at 8:06

3 Answers 3

1

The .* is inside the named group, so whatever it matches becomes part of that group. You have to place it outside the group, and possibly make it non-greedy too.

The updated pattern could look like:

(?P<time>\d\d:\d\d:\d\d.\d\d\d)\s{1}(?P<date>\d\d/\d\d/\d\d)\s.*?(?P<pr>Pt\s{3}\d*[.]?\d*\s[a-z]+)

Note that with the current pattern the number is optional, as every part of it (\d*, [.]? and \d*) can match nothing. You can omit {1} as well.

If the number after Pt cannot be empty, you can update the pattern using \d+(?:\.\d+)?, which matches at least a single digit:

(?P<time>\d\d:\d\d:\d\d.\d{3})\s(?P<date>\d\d/\d\d/\d\d)\s.*?(?P<pr>Pt\s{3}\d+(?:\.\d+)?\s[a-z]+)
  • (?P<time> Group time
    • \d\d:\d\d:\d\d.\d{3} Match a time-like format (the unescaped . matches any character; use \. to require a literal dot)
  • )\s Close group and match a whitespace char
  • (?P<date> Group date
    • \d\d/\d\d/\d\d Match a date-like pattern
  • )\s Close group and match a whitespace char
  • .*? Match any char except a newline, as few times as possible
  • (?P<pr> Group pr
    • Pt\s{3} Match Pt and 3 whitespace chars
    • \d+(?:\.\d+)? Match 1+ digits with an optional decimal part
    • \s[a-z]+ Match a whitespace char and 1+ chars a-z
  • ) Close group
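
As a quick check, the pattern above can be run against the sample line from the question. This is only a minimal sketch: splitting the pattern over several string literals is my own choice for readability, and the time separator is left as an unescaped . exactly as in the pattern above.

```python
import re

text = '12:37:25.790 08/05/20 Something   P  LR    0.156462 sccm   Pt   25.341343 psig something-else'

pattern = (
    r'(?P<time>\d\d:\d\d:\d\d.\d{3})\s'
    r'(?P<date>\d\d/\d\d/\d\d)\s'
    r'.*?'  # lazy filler, outside every named group, so it is not captured
    r'(?P<pr>Pt\s{3}\d+(?:\.\d+)?\s[a-z]+)'
)

result = re.search(pattern, text)
print(result.group('pr'))   # Pt   25.341343 psig
print(result.groupdict())
```

The text between the date and Pt is still consumed by the match as a whole (it is part of result.group(0)), but it no longer belongs to any named group.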



2 Comments

Thanks for the answer. What is the reason for adding a non-capturing group for the decimal part of the number, i.e. (?:\.\d+)?
@shaikmoeed It means that you first match 1 or more digits and then optionally match a dot followed by 1 or more digits. The non-capturing group makes the dot and the following digits optional as a whole, which prevents matching, for example, 123. with a trailing dot.
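
To make the difference between the two number sub-patterns concrete, here is a small sketch that tests each one on its own with fullmatch. The names loose and strict and the sample strings are just illustrative choices, not something from the question.

```python
import re

loose = re.compile(r'\d*[.]?\d*')      # every part is optional
strict = re.compile(r'\d+(?:\.\d+)?')  # digits, then optionally ".digits" as one unit

for sample in ['25.341343', '25', '123.', '.', '']:
    # fullmatch succeeds only if the whole sample fits the pattern
    print(repr(sample), bool(loose.fullmatch(sample)), bool(strict.fullmatch(sample)))
```

The loose version accepts all five samples, including the lone dot and the empty string, while the strict version accepts only the first two.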
1

Move the non-capturing group out of your named group. Using a non-capturing group means that no new group is created in the match; it does not mean that the text it matches is removed from any enclosing group.

import re

text = 'Something   P  LR    0.156462 sccm   Pt   25.341343 psig something-else'

pattern = r'(?:.*)(?P<pr>Pt\s{3}\d*[.]?\d*\s[a-z]+)'

result = re.search(pattern, text)
print(result.group('pr'))

Output:

Pt   25.341343 psig

Note that the specific non-capturing group you used can be left out completely here: it only says that the part you want may be preceded by anything, and re.search already looks for a match at any position in the string.
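
One caveat worth checking before dropping the prefix: if a line can contain more than one reading, a greedy .* in front selects the last one, while a bare re.search finds the first. A minimal sketch, using a made-up line with two readings (the text and the names reading, first and last are assumptions for illustration):

```python
import re

reading = r'(?P<pr>Pt\s{3}\d*[.]?\d*\s[a-z]+)'
text = 'Pt   1.0 psig and Pt   2.0 psig'   # hypothetical line with two readings

# Without a prefix, search stops at the first position where the pattern fits
first = re.search(reading, text).group('pr')

# A greedy .* first runs to the end of the line and backtracks from there
last = re.search(r'(?:.*)' + reading, text).group('pr')

print(first)  # Pt   1.0 psig
print(last)   # Pt   2.0 psig
```

For the sample line in the question there is only one Pt reading, so both forms give the same result.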

1 Comment

Thanks for explaining. It works after keeping a non-capturing group before the pr group.
0

I think there is some confusion about the meaning of "non-capturing" here: it does not mean that the result omits this part, but that no separate match group is created for it in the result.

Example where the same regex is executed with a capturing, and a non-capturing group:

>>> import re
>>> match = re.search(r'(?P<grp>foo(.*))', 'foobar')
>>> match.groups()
('foobar', 'bar')
>>> match = re.search(r'(?P<grp>foo(?:.*))', 'foobar')
>>> match.groups()
('foobar',)

Note that match.group(0) is the same in both cases (group 0 contains the matching part of the string in full).
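
Following the same foo/bar idea, this sketch shows what does change the content of a named group: moving the unwanted part outside of it. The variable names inside and outside are only for illustration.

```python
import re

# (?:foo) inside the named group: its text is still part of 'grp'
inside = re.search(r'(?P<grp>(?:foo)bar)', 'foobar')

# (?:foo) in front of the named group: 'grp' holds only what follows
outside = re.search(r'(?:foo)(?P<grp>bar)', 'foobar')

print(inside.group('grp'))   # foobar
print(outside.group('grp'))  # bar
print(outside.group(0))      # foobar
```

In both cases group 0 still covers foobar; only the boundaries of the named group differ.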
