2

I have a large, multi-line string with multiple entries following a similar format. I'd like to split it into a list of strings for each entry.

I tried the following:

myre = re.compile(r'Record\sTime.*-{5}', re.DOTALL)
return re.findall(myre, text)

In this case, entries start with 'Record Time' and end with '-----'. Instead of returning one item per entry, the code above returns a single item that starts at the beginning of the first entry and ends at the end of the last one.

I could probably make this work by using a regex to find the end of a segment and then repeating the search on a slice of the original text starting there, but that seems messy.

3 Answers

5

You need to turn the .* into a reluctant match, by adding a question mark:

.*?

Otherwise the .* is greedy and matches as much as it can: everything from just after the first 'Record Time' to just before the last '-----', so the whole text comes back as one match.

See Greedy vs. Reluctant vs. Possessive Quantifiers
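
To make the difference concrete, here is a minimal sketch comparing the two quantifiers. The sample text and the variable names are assumptions made for illustration; only the pattern comes from the question.

```python
import re

# Hypothetical sample input: two short entries in the question's format.
text = """Record Time
a
-----
Record Time
b
-----
"""

# Greedy: .* runs to the last '-----' in the text.
greedy = re.compile(r'Record\sTime.*-{5}', re.DOTALL)
# Reluctant: .*? stops at the first '-----' after each 'Record Time'.
reluctant = re.compile(r'Record\sTime.*?-{5}', re.DOTALL)

greedy_matches = greedy.findall(text)
reluctant_matches = reluctant.findall(text)

print(len(greedy_matches))     # 1
print(len(reluctant_matches))  # 2
```

With the reluctant version, findall returns one string per entry, which is the list the question asks for.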


1

You can use this trick to avoid a reluctant quantifier; it emulates an atomic group: (?=(...))\1. It's slightly off-topic, but it can be useful. Note the raw string: without the r prefix, Python reads \1 as the character \x01 instead of a backreference.

myre = re.compile(r'Record\sTime(?:(?=([^-]+|-(?!-{4})))\1)+-{5}')
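
A short sketch of how this pattern could be used. The sample string and the names atomic_like and entries are assumptions made for illustration. Because the pattern contains a capturing group, re.findall would return only that group's last captured chunk, so the sketch uses finditer with group(0) to get each whole entry.

```python
import re

# Hypothetical sample: the first entry contains a lone '-' to show that
# single dashes inside a record do not end the match early.
sample = "Record Time\n1 - 2\n-----\nRecord Time\n3\n-----\n"

# (?=(...))\1 captures inside a lookahead, then consumes the capture,
# which emulates an atomic group. No re.DOTALL is needed because
# [^-] already matches newlines.
atomic_like = re.compile(r'Record\sTime(?:(?=([^-]+|-(?!-{4})))\1)+-{5}')

# group(0) is the full match; group(1) would only hold the last chunk.
entries = [m.group(0) for m in atomic_like.finditer(sample)]

for entry in entries:
    print(repr(entry))
```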


1

Something like this:

txt='''\
Record Time
1
2
3
-----

Record Time
4
5
-----
Record Time
6
7
8
'''

import re
pat=re.compile(r'^Record Time$(.*?)(?:^-{5}|\Z)', re.S | re.M)
# Iterate over the matches; group(1) is the text between the markers.
for i, m in enumerate(pat.finditer(txt)):
    print('block:', i)
    print(m.group(1).strip())

Prints:

block: 0
1
2
3
block: 1
4
5
block: 2
6
7
8
