Python regex to catch two kinds of comment

Question

Example:

a = "bzzzzzz <!-- blabla --> blibli * bloblo * blublu"

I want to capture the first comment. A comment can take either of two forms:

(<!-- .* -->) or (\* .* \*)

This works:

re.search("<!--(?P<comment> .* )-->",a).group(1)

So does this:

re.search("\*(?P<comment> .* )\*",a).group(1)

But I want to match one form or the other and capture it as comment, so I tried something like:

re.search("(<!--(?P<comment> .* )-->|\*(?P<comment> .* )\*)",a).group(1)

But it doesn't work.

Thanks

BTW, your regexes are greedy and would fail on something like  real material .
– Kirk Strauser, Commented Sep 23, 2011 at 15:45

eph · Accepted Answer · 2011-09-23 15:35:30Z

2

Try a conditional expression:

>>> for m in re.finditer(r"(?:(<!--)|(\*))(?P<comment> .*? )(?(1)-->)(?(2)\*)", a):
...   print(m.group('comment'))
...
 blabla
 bloblo
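
The same idea can be spelled out with re.VERBOSE so that each part of the conditional pattern carries its own comment. This is only a sketch: it drops the literal spaces around the body (whitespace in the pattern is ignored in verbose mode) and strips the result instead, which is an assumption about what is wanted rather than part of the answer above.

```python
import re

a = "bzzzzzz <!-- blabla --> blibli * bloblo * blublu"

pattern = re.compile(r"""
    (?:(<!--)|(\*))      # group 1: HTML-style opener, or group 2: star opener
    (?P<comment>.*?)     # the comment body, matched non-greedily
    (?(1)-->)            # if group 1 took part, require the HTML-style closer
    (?(2)\*)             # if group 2 took part, require the star closer
""", re.VERBOSE)

# Strip the surrounding spaces from each body after matching.
comments = [m.group("comment").strip() for m in pattern.finditer(a)]
print(comments)  # ['blabla', 'bloblo']
```

A conditional with no "else" branch, such as (?(1)-->), simply matches the empty string when the group did not take part in the match.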

gurney alex · Accepted Answer · 2011-09-23 15:23:18Z

1

The exception you get in the "doesn't work" part is quite explicit about what is wrong:

sre_constants.error: redefinition of group name 'comment' as group 3; was group 2

Both groups have the same name; just rename the second one:

>>> re.search(r"(<!--(?P<comment> .* )-->|\*(?P<comment2> .* )\*)",a).group(1)
'<!-- blabla -->'
>>> re.search(r"(<!--(?P<comment> .* )-->|\*(?P<comment2> .* )\*)",a).groups()
('<!-- blabla -->', ' blabla ', None)
>>> re.findall(r"(<!--(?P<comment> .* )-->|\*(?P<comment2> .* )\*)",a)
[('<!-- blabla -->', ' blabla ', ''), ('* bloblo *', '', ' bloblo ')]
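
With two differently named groups, the caller still has to pick whichever one took part in the match. One way to do that might look like the sketch below. The helper name first_comment is made up for illustration, and the pattern uses non-greedy .*? and drops the outer group, both of which are changes from the pattern shown above.

```python
import re

a = "bzzzzzz <!-- blabla --> blibli * bloblo * blublu"

# Assumption: non-greedy bodies and no outer group, unlike the original pattern.
pattern = re.compile(r"<!--(?P<comment> .*? )-->|\*(?P<comment2> .*? )\*")

def first_comment(text):
    """Return the body of the first comment in text, or None if there is none."""
    m = pattern.search(text)
    if m is None:
        return None
    # Only one alternative can match, so the other named group is None.
    html_body, star_body = m.group("comment", "comment2")
    return html_body if html_body is not None else star_body

print(repr(first_comment(a)))  # ' blabla '
```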

Chriszuma · Accepted Answer · 2011-09-23 15:29:05Z

1

As Gurney pointed out, you have two captures with the same name. Since you're not actually using the name, just leave that out.

Also, the r"" raw string notation is a good habit.

Oh, and a third thing: you're grabbing the wrong index. 0 is the whole match, 1 is the whole "either-or" block, and 2 will be the inner capture that was successful.

re.search(r"(<!--( .* )-->|\*( .* )\*)",a).group(2)
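
It may be worth checking the group numbering before relying on index 2. Python's re module numbers capturing groups by the position of their opening parenthesis, so in this pattern the body of the star form lands in group 3, and group 2 is None when only the star form matches. A small check, using two made-up test strings:

```python
import re

pattern = re.compile(r"(<!--( .* )-->|\*( .* )\*)")

html = pattern.search("bzzzzzz <!-- blabla --> blibli")
star = pattern.search("blibli * bloblo * blublu")

print(pattern.groups)   # 3
print(html.groups())    # ('<!-- blabla -->', ' blabla ', None)
print(star.groups())    # ('* bloblo *', None, ' bloblo ')

# Picking whichever inner group took part in the match:
body = star.group(2) if star.group(2) is not None else star.group(3)
print(repr(body))       # ' bloblo '
```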

edited Sep 23, 2011 at 15:29

answered Sep 23, 2011 at 15:22

1 Comment

Chriszuma Over a year ago

There can never be an index 3 with this regex.

Kirk Strauser · Accepted Answer · 2011-09-23 16:06:34Z

re.findall might be a better fit for this:

import re

# Keep your regex simple. You'll thank yourself a year from now. Note that
# this doesn't include the surrounding spaces. It also uses non-greedy matching
# so that you can embed multiple comments on the same line, and it doesn't
# break on strings like '<!-- first comment --> fragment -->'.
pattern = re.compile(r"(?:<!-- (.*?) -->|\* (.*?) \*)")

inputstring = 'bzzzzzz <!-- blabla --> blibli * bloblo * blublu foo ' \
              '<!-- another comment --> goes here'

# Now use re.findall to search the string. Each match will return a tuple
# with two elements: one for each of the groups in the regex above. Pick the
# non-blank one. This works even when both groups are empty; you just get an
# empty string.
results = [first or second for first, second in pattern.findall(inputstring)]
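
To see what this approach returns, and what the non-greedy quantifier buys, here is a small self-contained sketch. The function name extract_comments and the greedy comparison pattern are illustrative additions, not part of the answer above.

```python
import re

# Same pattern as in the answer above.
pattern = re.compile(r"(?:<!-- (.*?) -->|\* (.*?) \*)")

def extract_comments(text):
    """Return the body of every comment in text, in order of appearance."""
    return [first or second for first, second in pattern.findall(text)]

sample = "bzzzzzz <!-- blabla --> blibli * bloblo * blublu"
tricky = "<!-- first comment --> fragment -->"

print(extract_comments(sample))   # ['blabla', 'bloblo']
print(extract_comments(tricky))   # ['first comment']

# For comparison, a greedy version runs on to the last closing marker.
greedy = re.compile(r"<!-- (.*) -->")
print(greedy.findall(tricky))     # ['first comment --> fragment']
```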

user557597 · Accepted Answer · 2011-09-23 16:25:35Z

0

You could go one of two ways (if supported by Python):

2: Conditional expression (?(condition)yes-pattern|no-pattern)
(?:(|\*) Here the condition is whether group 1 was captured.

Modifiers: s (single line) and g (global).
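
Python's re module does support the (?(id)yes-pattern|no-pattern) form. The sketch below is one guess at how the idea could be written in Python, not a reconstruction of the pattern in this answer. It assumes re.DOTALL stands in for the "s" modifier and finditer for the "g" modifier, and the sample text with an embedded newline is made up for illustration.

```python
import re

# Group 1 captures the HTML-style opener; the star opener is left uncaptured.
# The conditional then asks: did group 1 take part? If yes expect "-->", else "*".
pattern = re.compile(r"(?:(<!--)|\*)(?P<comment>.*?)(?(1)-->|\*)", re.DOTALL)

text = "bzzzzzz <!-- bla\nbla --> blibli * bloblo * blublu"

# finditer walks over every match, which plays the role of a "global" search.
comments = [m.group("comment").strip() for m in pattern.finditer(text)]
print(comments)  # ['bla\nbla', 'bloblo']
```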

edited Sep 23, 2011 at 16:25

answered Sep 23, 2011 at 16:19

user557597
