I would like to split a text document on two consecutive newline characters (\n\n):
# document example
field1: content asd..\n\nfield2: content qwe...\n\nfield3: content asfdqegt
However, a field's content sometimes contains \n\n itself (see field2):
field1: content asd..\n\nfield2: content\n\nqwe...\n\nfield3: content asfdqegt
Because of this, I can't simply use \n\n as the separator.
Actual behavior:
s = "field1: content asd..\n\nfield2: content\n\nqwe...\n\nfield3: content asfdqegt"
s.split("\n\n")
['field1: content asd..',
'field2: content',
'qwe...',
'field3: content asfdqegt']
Expected output (only the \n\n inside field2's content, between field2: and field3:, should be removed, not every \n\n in the document; after that, splitting on \n\n works):
s.split("\n\n")
['field1: content asd..', 'field2: contentqwe...', 'field3: content asfdqegt']
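One way to get this output, sketched under the assumption that every field label looks like "field<digits>:" (FIELD_LABEL and split_fields below are hypothetical names for illustration), is to split only on the \n\n that is immediately followed by a label, using a lookahead so the label is not consumed, and then drop the \n\n left inside each field:

```python
import re

# Assumption: field labels look like "field<digits>:". Adjust if real labels differ.
FIELD_LABEL = r"field\d+:"

def split_fields(text: str) -> list[str]:
    # Split only where "\n\n" is followed by a field label; the lookahead
    # (?=...) checks for the label without removing it from the next part.
    parts = re.split(r"\n\n(?=" + FIELD_LABEL + ")", text)
    # Remove the "\n\n" that remains inside a field's content.
    return [part.replace("\n\n", "") for part in parts]

s = "field1: content asd..\n\nfield2: content\n\nqwe...\n\nfield3: content asfdqegt"
print(split_fields(s))
# ['field1: content asd..', 'field2: contentqwe...', 'field3: content asfdqegt']
```

This works only if a field's content never contains \n\n followed by something that itself looks like a label; if it can, the label pattern needs to be stricter.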
My attempts:
import re
re.sub(r"(?<=field1: )(\n)(?<=field3: )", "", s) # does nothing
re.sub(r"\n", "", s) # replaces all \n, not just between field2 and field3
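The first attempt can never match: it needs "field1: " to end right before a \n, and then, after that \n is consumed, the second lookbehind (?<=field3: ) needs "field3: " to end at the same position, which is impossible. Lookbehinds only test the text directly adjacent to the match; they cannot express "somewhere between field2 and field3". A minimal sketch of a working substitution, again assuming labels look like "field<digits>:", deletes only the \n\n that is not followed by a label (negative lookahead) and then splits normally:

```python
import re

s = "field1: content asd..\n\nfield2: content\n\nqwe...\n\nfield3: content asfdqegt"

# Delete every "\n\n" that is NOT immediately followed by a field label.
# Assumption: labels look like "field<digits>:".
cleaned = re.sub(r"\n\n(?!field\d+:)", "", s)

# The remaining "\n\n" are real separators, so a plain split now works.
fields = cleaned.split("\n\n")
print(fields)
# ['field1: content asd..', 'field2: contentqwe...', 'field3: content asfdqegt']
```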