I always have a hard time understanding the logic of regular expressions in Python.

all_lines = '#hello\n#monica, how re "u?\n#hello#robert\necho\nfall and spring'

I want to retrieve the substring that starts at the first # and runs up to (but not including) the first \n that comes right after the last #, i.e., '#hello\n#monica, how re "u?\n#hello#robert'

So if I try:

>>> import re
>>> all_lines = '#hello\n#monica, how re "u?\n#hello#robert\necho'
>>> RE_HARD = re.compile(r'(^#.*\n)')
>>> mo = re.search(RE_HARD, all_lines)
>>> print(mo.group(0))
#hello

Now, if I hardcode what follows the first \n after the last #, i.e., echo, I get:

>>> all_lines = '#hello\n#monica, how re "u?\n#hello#robert\necho'
>>> RE_HARD = re.compile(r'(^#.*echo)')
>>> mo = re.search(RE_HARD, all_lines)
>>> print(mo.group(0))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

I get an error and have no idea why; the pattern seems the same as before.

This is still not what I want anyway, since in reality any character or string may follow the first \n that comes after the last #.

2 Answers

This program matches the pattern you requested.

#!/usr/bin/python

import re

all_lines = '#hello\n#monica, how re "u?\n#hello#robert\necho'

regex = re.compile(
    r'''\#             # first hash
        .*             # anything, newlines included, up to the last hash (note: .* is greedy)
        \#             # last hash
        .*?$           # rest of that line (note: .*? is non-greedy)
    ''',
    # Flags: 
    #   DOTALL: Make the '.' match any character at all, including a newline
    #   VERBOSE: Allow comments in pattern
    #   MULTILINE: Allow $ to match end of line
    re.DOTALL | re.VERBOSE | re.MULTILINE)

print(re.search(regex, all_lines).group())

Reference: http://docs.python.org/2/library/re.html
Demo: http://ideone.com/aZjjVj
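
As for why the second attempt in the question fails: by default . does not match a newline, so ^#.*echo cannot reach an echo that sits on a later line, and re.search returns None. Here is a minimal sketch of the difference, reusing the string from the question (the names no_dotall and with_dotall are only illustrative):

```python
import re

all_lines = '#hello\n#monica, how re "u?\n#hello#robert\necho'

# Without DOTALL, '.' stops at '\n', so the pattern cannot span lines.
no_dotall = re.search(r'^#.*echo', all_lines)

# With DOTALL, '.' also matches '\n', so the same pattern spans lines.
with_dotall = re.search(r'^#.*echo', all_lines, re.DOTALL)

print(no_dotall)  # no match object is returned

# re.search returns None on failure, so test the result before calling group().
if with_dotall is not None:
    print(with_dotall.group(0))
```

Testing the result before calling group() avoids the AttributeError shown in the question.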

Regular expressions are powerful, but sometimes they are overkill. String methods should accomplish what you need with much less thought:

>>> my_string = '#hello\n#monica, how re "u?\n#hello#robert\necho\nfall and spring'
>>> hash_positions = [index for index, c in enumerate(my_string) if c == '#']
>>> hash_positions
[0, 7, 27, 33]
>>> first = hash_positions[0]
>>> last = hash_positions[-1]
>>> new_line_after_last_hash = my_string.index('\n',last)
>>> new_line_after_last_hash
40
>>> new_string = my_string[first:new_line_after_last_hash]
>>> new_string
'#hello\n#monica, how re "u?\n#hello#robert'
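
The same steps could be wrapped in a small function. This is only a sketch: the name hash_block is made up for illustration, and it assumes that when no newline follows the last #, the slice should run to the end of the string (str.index would raise ValueError in that case, so str.find is used instead).

```python
def hash_block(text):
    """Return text from the first '#' to the end of the line holding the last '#'."""
    first = text.find('#')
    if first == -1:
        return None  # no '#' anywhere in the text

    last = text.rfind('#')
    end = text.find('\n', last)
    if end == -1:
        end = len(text)  # assumption: the last '#' is on the final line
    return text[first:end]


my_string = '#hello\n#monica, how re "u?\n#hello#robert\necho\nfall and spring'
print(repr(hash_block(my_string)))
```

Using find and rfind also avoids building the full list of hash positions, since only the first and last ones are needed.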
