3

I am trying to parse a list of data out of a file using Python, but I don't want to extract any data that is commented out. An example of the way the data is structured is:

#commented out block
uncommented block
#   commented block

I am trying to retrieve only the middle item, so I want to exclude the items with hashes at the start. The issue is that some hashes are directly next to the commented items and some aren't, and the expression I currently have only works if items have been commented as in the first line of the example above:

(?<!#)(commented)

I tried adding \s+ to the negative lookbehind, but then I get a complaint that the expression does not have an obvious maximum length. Is there any way to do what I'm attempting to do?

Thanks in advance,

Dan

    Maybe you just need something like ^([^#].*) Commented Oct 20, 2010 at 16:40

4 Answers

6

Why use a regex? String methods would do just fine:

>>> s = """#commented out block
... uncommented block
... #   commented block
... """.splitlines()
>>> for line in s:
...     not line.lstrip().startswith('#')
...
False
True
False
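
The same check can be wrapped in a small helper when reading from a file. This is only a minimal sketch: the name `uncommented_lines` is hypothetical, `io.StringIO` stands in for a real file object, and it assumes a line counts as commented when its first non-whitespace character is `#`.

```python
import io

def uncommented_lines(lines):
    """Yield lines whose first non-whitespace character is not '#'.

    Blank lines are skipped as well (an assumption; remove the
    `stripped` test below to keep them).
    """
    for line in lines:
        stripped = line.lstrip()
        if stripped and not stripped.startswith('#'):
            yield line.rstrip('\n')

# io.StringIO stands in for an open file object here.
sample = io.StringIO(
    "#commented out block\n"
    "uncommented block\n"
    "#   commented block\n"
)
result = list(uncommented_lines(sample))
print(result)
```

Because the helper accepts any iterable of lines, an open file can be passed to it directly.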

3 Comments

+1 Regexes are great... for certain problems. For others, there are much better (and less cryptic) solutions ;)
+1: use the right tool for the job. It's not always necessary to bring out the sledgehammer.
I ended up doing a combination of regular expression search and then checking the results for #s at the start. I only wanted to pull out certain sections of a file, that contained certain bits, hence why I used regex to search through for those pieces.
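
A rough sketch of the combined approach described in the comment above: search with a regex, then discard any hit whose line starts with a hash. The helper name `find_uncommented` and the pattern `\w*commented` are made up for illustration, and only whole-line comments are considered (a `#` later in the line is not treated as a comment).

```python
import re

def find_uncommented(pattern, text):
    """Return matches of `pattern` whose line does not start with '#'
    (ignoring leading whitespace)."""
    hits = []
    for m in re.finditer(pattern, text):
        # Find where the line containing this match begins.
        line_start = text.rfind('\n', 0, m.start()) + 1
        prefix = text[line_start:m.start()]
        # Keep the match only if the text before it on the same line
        # does not begin with a hash.
        if not prefix.lstrip().startswith('#'):
            hits.append(m.group(0))
    return hits

text = "#commented out block\nuncommented block\n#   commented block\n"
hits = find_uncommented(r'\w*commented', text)
print(hits)
```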
4

As SilentGhost indicated, a regular expression isn't the best solution to this problem, but I thought I'd address the negative look behind.

You thought of doing this:

(?<!#\s+)(commented)

This doesn't work, because Python's re module requires a look-behind to have a fixed width, and \s+ can match any number of characters. You could do something like this:

(?<!#)(\s+commented)

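This compiles, because the look-behind is now a single character, but be careful: when more than one whitespace character separates the hash from the word, \s+ can start matching at the second whitespace character, which is not preceded by a hash, so a line like "#   commented block" still matches. You would also have to strip the whitespace off the captured group. Again, string manipulation is better for what you're doing, but I wanted to show how negative look-behind could work since you were asking.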
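
If a regex is still wanted, one alternative (not from the original answer) is to anchor at the start of the line and use a negative look-ahead, which has no width restriction in `re`. The sketch below assumes whole-line comments only; the variable names are illustrative.

```python
import re

text = "#commented out block\nuncommented block\n#   commented block\n"

# A variable-width look-behind is rejected when the pattern is compiled.
try:
    re.compile(r'(?<!#\s+)(commented)')
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# A look-ahead may be any width, so test for the hash from the line start.
# [ \t]* is used instead of \s* so the check stays on the current line.
pattern = re.compile(r'^(?![ \t]*#).*commented.*$', re.MULTILINE)
matches = pattern.findall(text)
print(lookbehind_ok, matches)
```

With `re.MULTILINE`, `^` and `$` match at each line boundary, so every line is tested on its own.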


0
>>> s = """#commented out block
... uncommented block
...    #   commented block
... """
>>> for i in s.splitlines():
...    if not i.lstrip().startswith("#"):
...       print(i)
...
uncommented block


0

I had a similar use case when parsing CI/YAML files. I found it simpler to remove the commented lines with a regex first, and then run the real search on what is left:

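import re

# ci_file is an already-open file object (defined elsewhere).
text = ci_file.read()

# Remove whole-line comments first (leading spaces/tabs allowed).
any_commented_line = r'^[ \t]*#.*\n?'
text = re.sub(any_commented_line, '', text, flags=re.MULTILINE)

# Search for the target pattern; PATTERN is defined elsewhere.
match = re.search(PATTERN, text)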

This simplified the logic in my case.

