I am trying to split a text wherever a line sits between \n\n and \n, in that order. Take this string, for example:

\n\nMy take on fruits.\n\nHealthy Fruits\nAn apple is a fruit and it\'s very good.\n\nPears are good as well. Bananas are very good too and healthy.\n\nSour Fruits\nOranges are on the sour side and contains a lot of vitamin C.\n\nGrapefruits are even more sour, if you can believe it.

My desired output is:

[('Healthy Fruits', "An apple is a fruit and it's very good.", 'Pears are good as well. Bananas are very good too and healthy.'), ('Sour Fruits', 'Oranges are on the sour side and contains a lot of vitamin C.', 'Grapefruits are even more sour, if you can believe it.')]

I want to parse it like this because anything between \n\n and \n is a title and the rest is the text under that title (so "Healthy Fruits" and "Sour Fruits" are the titles). I am not sure if this is the best way to grab the titles and their text.

  • re.split(r'\n\n?', ur_txt) Commented Apr 20, 2021 at 14:19
  • Maybe re.findall('(?<!\n)\n\n(.+)\n(?!\n)((?s:.*?))(?=\n\n|\Z)', text) will do. Commented Apr 20, 2021 at 14:20
  • @dawg Thanks, I edited my question. I wanted to group the last sentence with Grapefruits with the Oranges sentence as they are part of the same title. Would this be possible? Commented Apr 20, 2021 at 14:38
  • Why with regex? I would use another way. And by the way, why do you expect the last one, "Grapefruits are even more sour, if you can believe it."? Commented Apr 20, 2021 at 14:39
  • @WiktorStribiżew Thanks, I edited my question. I wanted to group the last sentence with Grapefruits with the Oranges sentence as they are part of the same title. Would this be possible? Right now it just takes the Oranges sentence instead of combining Oranges and Grapefruits sentences together into one string. I would like: [('Healthy Fruits', "An apple is a fruit and it's very good. Bananas are very good too and healthy."), ('Sour Fruits', 'Oranges are on the sour side and contains a lot of vitamin C.', 'Grapefruits are even more sour, if you can believe it.')] Commented Apr 20, 2021 at 14:39

2 Answers

Given:

txt='''\
\n\nMy take on fruits.\n\nHealthy Fruits\nAn apple is a fruit and it\'s very good.\n\nPears are good as well. Bananas are very good too and healthy.\n\nSour Fruits\nOranges are on the sour side and contains a lot of vitamin C.\n\nGrapefruits are even more sour, if you can believe it.'''

desired=[('Healthy Fruits',   "An apple is a fruit and it's very good.", 'Pears are good as well. Bananas are very good too and healthy.'),  ('Sour Fruits',   'Oranges are on the sour side and contains a lot of vitamin C.', 'Grapefruits are even more sour, if you can believe it.')]

You can use the regex:

r'\n\n([\s\S]*?)(?=(?:\n\n.*\n[^\n])|\Z)'

Python demo:

>>> import re
>>> sp=[tuple(re.split('\n+',l)) for l in re.findall(r'\n\n([\s\S]*?)(?=(?:\n\n.*\n[^\n])|\Z)',txt) if '\n' in l]

>>> sp
[('Healthy Fruits', "An apple is a fruit and it's very good.", 'Pears are good as well. Bananas are very good too and healthy.'), ('Sour Fruits', 'Oranges are on the sour side and contains a lot of vitamin C.', 'Grapefruits are even more sour, if you can believe it.')]

>>> sp==desired
True
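
To make the pattern easier to read, here is a sketch of the same regex written with re.VERBOSE and wrapped in a function. The names SECTION_RE and split_sections are made up for illustration and are not part of the answer above. The sketch assumes, as the one-liner does, that a title is always followed by a single \n and that paragraphs are separated by \n\n.

```python
import re

# Same pattern as in the one-liner, spelled out piece by piece.
# re.VERBOSE ignores unescaped whitespace in the pattern, so the
# newlines are written as the escape \n.
SECTION_RE = re.compile(r"""
    \n\n                  # a blank line opens a candidate chunk
    ([\s\S]*?)            # lazily capture everything, newlines included
    (?=                   # stop right before either
        \n\n .* \n [^\n]  #   blank line, title line, newline, then text
      |
        \Z                #   or the end of the string
    )
""", re.VERBOSE)


def split_sections(text):
    """Return one tuple per titled section: (title, paragraph, ...)."""
    return [
        tuple(re.split(r"\n+", chunk))
        for chunk in SECTION_RE.findall(text)
        if "\n" in chunk  # chunks without a newline (the intro line) have no title
    ]


sample = "\n\nIntro.\n\nTitle A\nText a1.\n\nText a2.\n\nTitle B\nText b1."
sections = split_sections(sample)
```

The lookahead is what keeps a section's later paragraphs attached to its title: a \n\n only ends the capture when the line after it is itself followed by a single newline and more text, which is how the next title is recognised.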

1 Comment

Thank you very much. This was exactly what I was looking for.

This is not regex, but it works:

text="\n\nMy take on fruits.\n\nHealthy Fruits\nAn apple is a fruit and it\'s very good. Bananas are very good too and healthy.\n\nSour Fruits\nOranges are on the sour side and contains a lot of vitamin C.\n\nGrapefruits are even more sour, if you can believe it."
NewList=[]
# Paragraphs are separated by blank lines
Newtext=text.split("\n\n")
for line in Newtext:
    # A paragraph that contains "\n" is a title followed by its text
    if "\n" in line:
        NewList.extend(line.split('\n'))

# Join the trailing paragraph onto the last collected string, with a space between them
NewList[-1]=NewList[-1]+" "+Newtext[-1]
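
The snippet above builds one flat list. If the tuples from the question are needed, the same split-based idea can be extended as in the sketch below. The function name group_by_title is hypothetical, and the sketch assumes that only title blocks contain a single \n, so a body paragraph with its own line break would be mistaken for a title.

```python
def group_by_title(text):
    """Group blank-line-separated paragraphs under the title that precedes them."""
    sections = []
    for block in text.split("\n\n"):
        if not block:
            continue  # skip empty pieces produced by leading separators
        if "\n" in block:
            # "Title\nFirst paragraph" starts a new section.
            sections.append(block.split("\n"))
        elif sections:
            # A plain paragraph belongs to the most recent title.
            sections[-1].append(block)
        # Plain paragraphs before any title (the intro line) are dropped.
    return [tuple(section) for section in sections]


sample = "\n\nIntro.\n\nTitle A\nText a1.\n\nText a2.\n\nTitle B\nText b1."
grouped = group_by_title(sample)
```

Keeping each paragraph as its own tuple element, instead of concatenating it onto the previous string, is what matches the desired output in the question.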

1 Comment

Thank you for your help!
