Given a test string:
teststr= 'chapter 1 Here is a block of text from chapter one. chapter 2 Here is another block of text from the second chapter. chapter 3 Here is the third and final block of text.'
I want to create a list of results like this:
result=['chapter 1 Here is a block of text from chapter one.','chapter 2 Here is another block of text from the second chapter.','chapter 3 Here is the third and final block of text.']
Using re.findall('chapter [0-9]',teststr)
I get ['chapter 1', 'chapter 2', 'chapter 3']
That's fine if all I wanted were the chapter numbers, but I want the chapter number plus all the text up to the next chapter number. In the case of the last chapter, I want to get the chapter number and the text all the way to the end.
Trying re.findall('chapter [0-9].*',teststr) yields the greedy result:
['chapter 1 Here is a block of text from chapter one. chapter 2 Here is another block of text from the second chapter. chapter 3 Here is the third and final block of text.']
I'm not great with regular expressions so any help would be appreciated.
Use a pattern that consumes text only until the next chapter heading:
pattern = re.compile(r'chapter (?:(?!\s+chapter \d+).)+')
and then use
matches = pattern.findall(teststr)
To match case-insensitively, add the inline flag:
pattern = re.compile(r'(?i)chapter (?:(?!\s+chapter \d+).)+')
Alternatively, maybe re.split(r'(?!^)(?=chapter \d)', teststr) is enough?
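Here is a minimal sketch that runs the findall and split suggestions against the teststr from the question. The names expected, by_findall and by_split are only illustrative. It assumes Python 3.7 or later, because re.split only splits on zero-width matches from that version on.

```python
import re

teststr = ('chapter 1 Here is a block of text from chapter one. '
           'chapter 2 Here is another block of text from the second chapter. '
           'chapter 3 Here is the third and final block of text.')

expected = [
    'chapter 1 Here is a block of text from chapter one.',
    'chapter 2 Here is another block of text from the second chapter.',
    'chapter 3 Here is the third and final block of text.',
]

# Tempered dot: take one character at a time, but only while the text ahead
# is NOT "whitespace + chapter + digits". The match therefore ends just
# before the space that precedes the next chapter heading.
pattern = re.compile(r'chapter (?:(?!\s+chapter \d+).)+')
by_findall = pattern.findall(teststr)

# Zero-width split: cut before every "chapter <digit>" except at the very
# start of the string. Each piece keeps the space that preceded the next
# heading, so strip() is applied afterwards.
by_split = [part.strip() for part in re.split(r'(?!^)(?=chapter \d)', teststr)]

print(by_findall == expected)
print(by_split == expected)
```

The findall version leaves the separating whitespace out of each match, while the split version keeps it at the end of each piece until strip() removes it. Note that "from chapter one." is not treated as a boundary in either case, since both lookaheads require a digit after "chapter ".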