logic of regex in python

Question

I always have a hard time understanding the logic of regex in python.

all_lines = '#hello\n#monica, how re "u?\n#hello#robert\necho\nfall and spring'

I want to retrieve the substring that starts at the first # and runs up to (but not including) the first \n that comes right after the last #, i.e., '#hello\n#monica, how re "u?\n#hello#robert'

So if I try:

>>> import re
>>> all_lines = '#hello\n#monica, how re "u?\n#hello#robert\necho'
>>> RE_HARD = re.compile(r'(^#.*\n)')
>>> mo = re.search(RE_HARD, all_lines)
>>> print(mo.group(0))
#hello

Now, if I hardcode what comes after the first \n following the last # (i.e., echo), I get:

>>> all_lines = '#hello\n#monica, how re "u?\n#hello#robert\necho'
>>> RE_HARD = re.compile(r'(^#.*echo)')
>>> mo = re.search(RE_HARD, all_lines)
>>> print(mo.group(0))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'group'

I get an error, and I have no idea why; it seems the same as before.

This is still not what I want anyway, since in reality the text after the first \n that follows the last # could be any character or string...
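A minimal sketch, assuming Python 3, of why the two attempts behave differently: without the re.DOTALL flag, '.' matches any character except a newline. In the first attempt, .*\n therefore stops at the first line break; in the second, .* can never reach echo, which sits on a later line, so re.search returns None and calling .group() on None raises the AttributeError.

```python
import re

all_lines = '#hello\n#monica, how re "u?\n#hello#robert\necho'

# Without re.DOTALL, '.' matches any character except '\n',
# so '.*' cannot run past the end of the first line.
first_try = re.search(r'(^#.*\n)', all_lines)
second_try = re.search(r'(^#.*echo)', all_lines)

print(repr(first_try.group(0)))  # '#hello\n'
print(second_try)                # None, so calling .group() on it raises AttributeError

# With re.DOTALL, '.' also matches '\n', so '.*' may span several lines.
with_dotall = re.search(r'(^#.*echo)', all_lines, re.DOTALL)
if with_dotall is not None:      # check before calling .group()
    print(repr(with_dotall.group(0)))
```

With DOTALL the hardcoded pattern matches the whole string up to echo, which is why the answers below rely on that flag.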

Try this resource: regex101.com/#python

– Roberto, Commented Mar 29, 2014 at 3:04

Robᵩ · Accepted Answer · 2014-03-29 02:04:36Z

2

This program matches the pattern you request.

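#!/usr/bin/python

import re

all_lines = '#hello\n#monica, how re "u?\n#hello#robert\necho'

regex = re.compile(
    r'''\#             # first hash
        .*             # anything, newlines included (note: .* is greedy, so it reaches the LAST hash)
        \#             # last hash
        .*?$           # rest of that line (note: .*? is non-greedy, so it stops at the first end of line)
    ''',
    # Flags:
    #   DOTALL: Make the '.' match any character at all, including a newline
    #   VERBOSE: Allow whitespace and comments in the pattern (so a literal '#' must be written '\#')
    #   MULTILINE: Allow $ to match at the end of each line, not only at the end of the string
    re.DOTALL | re.VERBOSE | re.MULTILINE)

mo = regex.search(all_lines)
if mo:  # search() returns None when there is no match
    print(mo.group())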

Reference: http://docs.python.org/2/library/re.html
Demo: http://ideone.com/aZjjVj
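For comparison, a sketch of the same pattern written without re.VERBOSE (so '#' needs no escaping), assuming Python 3 and run against the full string from the question, including the trailing 'fall and spring' line. Like the verbose version, it needs at least two '#' characters in the text to match.

```python
import re

# The full string from the question, including the trailing 'fall and spring' line.
all_lines = '#hello\n#monica, how re "u?\n#hello#robert\necho\nfall and spring'

# Same pattern as the verbose version, on one line:
#   #      first hash
#   .*#    greedy run (newlines included, thanks to DOTALL), backtracked to the LAST hash
#   .*?$   shortest run up to the next end of line (MULTILINE lets $ match before '\n')
pattern = re.compile(r'#.*#.*?$', re.DOTALL | re.MULTILINE)

mo = pattern.search(all_lines)
result = mo.group() if mo else None
print(repr(result))  # '#hello\n#monica, how re "u?\n#hello#robert'
```

The lines after the last '#' (echo, fall and spring) are excluded because the non-greedy .*? stops at the first position where $ can match.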

edited Mar 29, 2014 at 2:04

answered Mar 29, 2014 at 1:59


PyNEwbie · Accepted Answer · 2014-03-29 02:13:46Z

0

Regular expressions are powerful, but sometimes they are overkill. String methods can do what you need here with much less effort:

>>> my_string = '#hello\n#monica, how re "u?\n#hello#robert\necho\nfall and spring'
>>> hash_positions = [index for index, c in enumerate(my_string) if c == '#']
>>> hash_positions
[0, 7, 27, 33]
>>> first = hash_positions[0]
>>> last = hash_positions[-1]
>>> new_line_after_last_hash = my_string.index('\n',last)
>>> new_line_after_last_hash
40
>>> new_string = my_string[first:new_line_after_last_hash]
>>> new_string
'#hello\n#monica, how re "u?\n#hello#robert'
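The session above only uses the first and last '#' positions, which str.find and str.rfind return directly. A minimal sketch, assuming Python 3; the helper name slice_first_hash_to_line_end is hypothetical. It also covers two edge cases the session would trip on: no '#' at all (hash_positions[0] raises IndexError) and no newline after the last '#' (str.index raises ValueError).

```python
def slice_first_hash_to_line_end(text):
    """Hypothetical helper: return text from the first '#' up to (not including)
    the first newline after the last '#', or None if text contains no '#'."""
    first = text.find('#')        # -1 when there is no '#'
    if first == -1:
        return None
    last = text.rfind('#')        # position of the last '#'
    end = text.find('\n', last)   # first newline after the last '#'
    if end == -1:                 # no newline follows: take the rest of the string
        end = len(text)
    return text[first:end]


my_string = '#hello\n#monica, how re "u?\n#hello#robert\necho\nfall and spring'
print(repr(slice_first_hash_to_line_end(my_string)))  # '#hello\n#monica, how re "u?\n#hello#robert'
```

Unlike the regex in the other answer, this version also works when the text contains a single '#'.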

edited Mar 29, 2014 at 2:13

answered Mar 29, 2014 at 2:07
