Split text into sections using Python regex

Question

I have a large, multi-line string with multiple entries following a similar format. I'd like to split it into a list of strings for each entry.

I tried the following:

myre = re.compile(r'Record\sTime.*-{5}', re.DOTALL)
return myre.findall(text)

In this case, entries start with 'Record Time' and end with '-----'. Instead of behaving as I'd like, the code above returns a single item, starting at the beginning of the first entry and ending at the end of the last one.

I could probably make this work by using a regex to find the end of a segment, then repeating with a slice of the original text starting there, but that seems messy.

Community · Accepted Answer · 2017-05-23 11:49:41Z

5

You need to turn the .* into a reluctant match, by adding a question mark:

.*?

Otherwise .* is greedy and matches as much as it can: everything from just after the first 'Record Time' up to the last '-----', so the whole text comes back as a single match.

See Greedy vs. Reluctant vs. Possessive Quantifiers
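
As a minimal sketch of the difference, assuming a small made-up sample (the `text` value and the variable names below are illustrative, not the asker's data):

```python
import re

# Hypothetical sample data; the asker's real input is not shown in the question.
text = (
    "Record Time 10:00\n"
    "alpha\n"
    "-----\n"
    "Record Time 11:00\n"
    "beta\n"
    "-----\n"
)

# Greedy: .* runs to the last '-----' in the text.
greedy = re.compile(r'Record\sTime.*-{5}', re.DOTALL)
# Reluctant: .*? stops at the first '-----' after each 'Record Time'.
reluctant = re.compile(r'Record\sTime.*?-{5}', re.DOTALL)

greedy_matches = greedy.findall(text)
reluctant_matches = reluctant.findall(text)

print(len(greedy_matches))     # number of greedy matches
print(len(reluctant_matches))  # number of reluctant matches
for entry in reluctant_matches:
    print(repr(entry))
```

With this sample, the greedy pattern should give one match spanning both records, while the reluctant one should give one match per record.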

edited May 23, 2017 at 11:49

CommunityBot

answered Jan 11, 2014 at 17:40

NPE


Casimir et Hippolyte · Accepted Answer · 2014-01-11 17:55:35Z

1

You can avoid a reluctant quantifier by using a trick that emulates an atomic group: (?=(...))\1. It's a little off-topic, but it can be useful:

# Raw string: in a plain string literal, \1 would become chr(1) instead of a backreference.
myre = re.compile(r'Record\sTime(?:(?=([^-]+|-(?!-{4})))\1)+-{5}')
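
A hedged usage sketch for this pattern, assuming made-up sample text (the `text` and `entries` names are illustrative). Because the pattern contains a capturing group, `findall()` would return that group rather than the whole entry, so this sketch uses `finditer()` with `group(0)` instead:

```python
import re

# Hypothetical sample data for illustration only; note the single '-' inside
# the first record, which must not end the entry.
text = (
    "Record Time 10:00\n"
    "a-b\n"
    "-----\n"
    "Record Time 11:00\n"
    "beta\n"
    "-----\n"
)

# Raw string so that \1 stays a backreference.
myre = re.compile(r'Record\sTime(?:(?=([^-]+|-(?!-{4})))\1)+-{5}')

# group(0) is the full matched entry; group(1) is only the last captured chunk.
entries = [m.group(0) for m in myre.finditer(text)]

for entry in entries:
    print(repr(entry))
```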

answered Jan 11, 2014 at 17:55

Casimir et Hippolyte


dawg · Accepted Answer · 2014-01-11 18:05:52Z

1

Something like this:

txt='''\
Record Time
1
2
3
-----

Record Time
4
5
-----
Record Time
6
7
8
'''

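import re

# Capture the body between each 'Record Time' header line and the closing
# '-----' line (or the end of the text, for an unterminated final record).
pat = re.compile(r'^Record Time$(.*?)(?:^-{5}|\Z)', re.S | re.M)
for i, m in enumerate(pat.finditer(txt)):
    print('block:', i)
    print(m.group(1).strip())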

Prints:

block: 0
1
2
3
block: 1
4
5
block: 2
6
7
8
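
Since the question asks for a list of strings, the same pattern could be wrapped in a small helper. This is only a sketch: the name `split_records` and the sample text are hypothetical, and it assumes each header line is exactly 'Record Time':

```python
import re

# Same pattern as above: header line, lazily captured body, then either a
# '-----' line or the end of the text.
RECORD_PAT = re.compile(r'^Record Time$(.*?)(?:^-{5}|\Z)', re.S | re.M)

def split_records(text):
    """Return the stripped body of each record as a list of strings."""
    return [m.group(1).strip() for m in RECORD_PAT.finditer(text)]

# Hypothetical sample; the last record has no closing '-----'.
sample = "Record Time\n1\n2\n-----\nRecord Time\n3\n"
records = split_records(sample)
print(records)
```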

edited Jan 11, 2014 at 18:05

answered Jan 11, 2014 at 17:44

dawg
