I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'Text': ['Hello, I have some text.</p> I would like to split it into sentences. </p> However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.</p> I also need to convert lists in df which is tricky.',
'Hello, I have some text.</p> I would like to split it into sentences. </p> However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.</p> I also need to convert lists in df which is tricky.',
'Hello, I have some text.</p> I would like to split it into sentences. </p> However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.</p> I also need to convert lists in df which is tricky.']})
What I want to do is first split the text in each row into sentences, then convert the resulting lists into a dataframe with a column that keeps track of which sentences belong to which text.
To split the text I do:
df.Text.str.split('</p>')
df.Text.str.split('</p>')[0]
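For reference, here is a small sketch of what the split returns. It uses a shortened, made-up stand-in for the dataframe above (the sample strings are hypothetical), and the stripping step is only one way to remove the whitespace left around each piece:

```python
import pandas as pd

# Shortened, hypothetical stand-in for the dataframe above
df = pd.DataFrame({'Text': ['First.</p> Second. </p> Third.',
                            'Only one.</p> And two.']})

# Each cell becomes a Python list of pieces; rows may have different lengths
parts = df['Text'].str.split('</p>')

# Strip the whitespace left around each piece
parts = parts.apply(lambda pieces: [p.strip() for p in pieces])

print(parts[0])  # ['First.', 'Second.', 'Third.']
```

So the result is still a Series with one list per original row, not yet one row per sentence.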
As you can see, every element in the original dataframe contains the 4 sentences that I split. I now want to create a dataframe like the following one:
ID Text
1.1 Hello, I have some text.
1.2 I would like to split it into sentences.
1.3 However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.
1.4 I also need to convert lists in df which is tricky.
2.1 Hello, I have some text.
2.2 I would like to split it into sentences.
2.3 However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.
2.4 I also need to convert lists in df which is tricky.
3.1 Hello, I have some text.
3.2 I would like to split it into sentences.
3.3 However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.
3.4 I also need to convert lists in df which is tricky.
Can anyone help me do it?
Thanks!
PS. In the real data, the texts do not all split into the same number of sentences as they do above.
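
One direction that might work is sketched below. It assumes pandas 0.25 or later (for Series.explode), uses a shortened, made-up dataframe with an uneven number of sentences per row, and the names out, row and pos are purely illustrative:

```python
import pandas as pd

# Hypothetical sample with an uneven number of sentences per text
df = pd.DataFrame({'Text': ['A one.</p> A two. </p> A three.',
                            'B one.</p> B two.']})

# One row per sentence; 'row' remembers which original text it came from
out = (df['Text'].str.split('</p>')
                 .explode()
                 .str.strip()
                 .rename_axis('row')
                 .reset_index())

# 1-based position of each sentence within its original text
out['pos'] = out.groupby('row').cumcount() + 1

# Build string IDs such as "1.1", "1.2", "2.1"
out['ID'] = (out['row'] + 1).astype(str) + '.' + out['pos'].astype(str)
out = out[['ID', 'Text']]

print(out)
```

The ID is kept as a string on purpose: as a float, 1.10 would be indistinguishable from 1.1 once a text has ten or more sentences. Keeping the row and pos columns instead of dropping them would probably make re-joining the sentences easier later on.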