Negative lookbehind in Python regular expressions

Question

I am trying to parse a list of data out of a file using Python; however, I don't want to extract any data that is commented out. The data is structured like this:

#commented out block
uncommented block
#   commented block

I only want to retrieve the middle item, so I am trying to exclude the items that start with a hash. The issue is that some hashes are directly next to the commented text and some aren't, and the expression I currently have only works when items are commented like the first example above:

(?<!#)(commented)

I tried adding \s+ to the negative lookbehind, but then I get a complaint that the expression does not have an obvious maximum length. Is there any way to do what I'm attempting?

Thanks in advance,

Dan

Maybe you just need something like ^([^#].*)

– Andrew, Commented Oct 20, 2010 at 16:40
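A minimal sketch of that suggestion with Python's re module, assuming the whole file has been read into one string. re.MULTILINE makes ^ match at the start of every line, and findall returns the captured line text.

```python
import re

# Sample data from the question, read as a single string.
data = """#commented out block
uncommented block
#   commented block
"""

# ^ anchors at each line start (re.MULTILINE); [^#] requires the first
# character of the line to be anything other than '#'.
result = re.findall(r"^([^#].*)", data, flags=re.MULTILINE)
print(result)  # ['uncommented block']
```

Two caveats worth checking against real data: [^#] also matches a newline, so a blank line followed by a comment ("a\n\n#c") yields the match '\n#c', which [^#\n] avoids; and an indented comment such as "   # note" still matches, because only the very first character is tested.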

SilentGhost · Accepted Answer · 2010-10-20 17:04:02Z

6

Why use a regex? String methods would do just fine:

>>> s = """#commented out block
uncommented block
#   commented block
""".splitlines()
>>> for line in s:
    not line.lstrip().startswith('#')


False
True
False
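A minimal sketch of the same string-method idea applied to a file, assuming one item per line. The generator name uncommented_lines is hypothetical, and skipping blank lines is an added choice that the answer does not make.

```python
import io


def uncommented_lines(lines):
    """Yield non-blank lines whose first non-whitespace character is not '#'."""
    for line in lines:
        stripped = line.lstrip()
        # Skip blank lines and comment lines (including indented ones).
        if stripped and not stripped.startswith("#"):
            # File iteration keeps the trailing newline; drop it.
            yield line.rstrip("\n")


# Any iterable of strings works, including a file from open(path).
# io.StringIO stands in for a real file here.
sample = io.StringIO("#commented out block\nuncommented block\n#   commented block\n")
print(list(uncommented_lines(sample)))  # ['uncommented block']
```

Because it only needs an iterable of strings, the same generator can wrap an open file object and process it lazily, line by line.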

edited Oct 20, 2010 at 17:04

answered Oct 20, 2010 at 16:42

SilentGhost



3 Comments

user395760 Over a year ago

+1 Regexes are great... for certain problems. For others, there are much better (and less cryptic) solutions ;)

JoshD Over a year ago

+1: use the right tool for the job. It's not always necessary to bring out the sledgehammer.

Dan Over a year ago

I ended up combining a regular expression search with a check on the results for a # at the start. I only wanted to pull out certain sections of the file that contained certain bits, which is why I used a regex to search for those pieces.
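A rough sketch of that two-step approach, assuming line-oriented data. The target pattern is hypothetical: it stands in for the "certain bits" the real file is searched for.

```python
import re

text = """#commented out block
uncommented block
#   commented block
"""

# Hypothetical target: any whole line that mentions "commented".
pattern = re.compile(r"^.*commented.*$", re.MULTILINE)

# Step 1: the regex finds every candidate line, commented or not.
candidates = pattern.findall(text)

# Step 2: plain string methods discard lines that begin with '#',
# allowing for leading whitespace before the hash.
kept = [line for line in candidates if not line.lstrip().startswith("#")]
print(kept)  # ['uncommented block']
```

Splitting the work this way keeps the regex focused on what to find, while the comment check stays a simple, readable string test.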

JoshD · Accepted Answer · 2010-10-20 16:56:53Z

4

As SilentGhost indicated, a regular expression isn't the best solution to this problem, but I thought I'd address the negative lookbehind.

You thought of doing this:

(?<!#\s+)(commented)

This doesn't work, because Python only allows fixed-width lookbehinds, and \s+ can match any number of characters. You could do something like this:

(?<!#)(\s+commented)

This would match the lines you want, but of course, you'd have to strip the whitespace off the captured group. Again, string manipulation is better suited to what you're doing, but I wanted to show how a negative lookbehind could work, since you asked.
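To check the patterns discussed above, here is a small test run against the three sample lines from the question. It shows that re rejects the variable-width lookbehind at compile time, and that (?<!#)(\s+commented) does not behave as intended on this data: it finds a match in "#   commented block" (the match simply starts one space after the #) and none in "uncommented block", where no whitespace precedes "commented". The strict pattern is a hypothetical variant, not part of the answer: it forbids both # and whitespace immediately before the match start, so the lookbehind stays one character wide. (The third-party regex package on PyPI does accept variable-width lookbehinds, if an extra dependency is acceptable.)

```python
import re

lines = ["#commented out block", "uncommented block", "#   commented block"]

# A variable-width lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r"(?<!#\s+)(commented)")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# The pattern from the answer: one-character lookbehind, whitespace moved out.
naive = re.compile(r"(?<!#)(\s+commented)")

# Hypothetical variant: the match may not start right after '#' or whitespace,
# so it cannot begin partway through a "#   " prefix.
strict = re.compile(r"(?<![#\s])(\s*commented)")

naive_hits = [bool(naive.search(line)) for line in lines]
strict_hits = [bool(strict.search(line)) for line in lines]
print(variable_width_ok, naive_hits, strict_hits)
# False [False, False, True] [False, True, False]
```

The strict variant is still tied to this toy data (it looks for the literal word "commented"); for real files, the line-based string checks in the other answers are easier to get right.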

answered Oct 20, 2010 at 16:56

JoshD



ghostdog74 · Accepted Answer · 2010-10-20 17:03:15Z

0

>>> s = """#commented out block
... uncommented block
...    #   commented block
... """
>>> for i in s.splitlines():
...    if not i.lstrip().startswith("#"):
...       print(i)
...
uncommented block

answered Oct 20, 2010 at 17:03

ghostdog74



Ranel Padon · Accepted Answer · 2024-05-29 19:39:52Z

0

I had a similar use case: parsing CI/YAML files. I found it simpler to remove the commented lines with a regex first, and only then search:

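import re

# ci_file (an open file object) and PATTERN (the regex to search for)
# come from the surrounding code and are not defined here.
text = ci_file.read()

# Remove comments first: everything from '#' to the end of the line.
# Leaving the newline in place (rather than matching '#.*\n') keeps a line
# with a trailing comment from being joined onto the following line.
any_commented_line = r'#.*'
text = re.sub(any_commented_line, '', text)

# Search for the target pattern.
match = re.search(PATTERN, text)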

This simplified the logic in my case.
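A self-contained sketch of this remove-then-search approach on a made-up CI snippet. The YAML content and the image: search pattern are hypothetical. It strips with #.* rather than #.*\n so that the newline survives and no two lines get glued together.

```python
import re

# Hypothetical CI/YAML snippet; the first "image:" line is commented out.
text = """\
# image: python:2.7
image: python:3.12  # pinned
script:
  - pytest
"""

# Drop everything from '#' to the end of each line ('.' does not match
# a newline, so the line break itself is kept).
# Caveat: a '#' inside a quoted value, such as "#fff", would be stripped too.
cleaned = re.sub(r"#.*", "", text)

# Hypothetical target pattern: the value of the first remaining "image:" key.
match = re.search(r"image:\s*(\S+)", cleaned)
print(match.group(1))  # python:3.12
```

Searching the raw text instead would return python:2.7 from the commented-out line, which is exactly what removing comments first avoids.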

answered May 29, 2024 at 19:39

Ranel Padon
