I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'Text': ['Hello, I have some text.</p> I would like to split it into sentences. </p> However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.</p> I also need to convert lists in df which is tricky.',
                       'Hello, I have some text.</p> I would like to split it into sentences. </p> However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.</p> I also need to convert lists in df which is tricky.',
                       'Hello, I have some text.</p> I would like to split it into sentences. </p> However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.</p> I also need to convert lists in df which is tricky.']})

What I want to do is first split the text in each row, then convert the resulting lists into a dataframe and create a column that keeps track of which sentences belong to which text.

To split the text I do:

df.Text.str.split('</p>')

df.Text.str.split('</p>')[0]

As you can see, every element in the original dataframe contains 4 sentences, which I split. I now want to create a dataframe like the following one:

ID   Text
1.1  Hello, I have some text.
1.2  I would like to split it into sentences.
1.3  However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.
1.4  I also need to convert lists in df which is tricky.
2.1  Hello, I have some text.
2.2  I would like to split it into sentences.
2.3  However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.
2.4  I also need to convert lists in df which is tricky.
3.1  Hello, I have some text.
3.2  I would like to split it into sentences.
3.3  However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.
3.4  I also need to convert lists in df which is tricky.

Can anyone help me do it?

Thanks!

PS. In the real example, the sentences are not evenly split as above.

1 Answer

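You could use str.split to split the strings, then explode to create one row per sentence, and finally rebuild the index. Because cumcount numbers the sentences within each text, this also works when the texts contain different numbers of sentences: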

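# Split each text on '</p>' and give every sentence its own row;
# explode repeats the original row label on each new row
df2 = (df.assign(Text=df['Text'].str.split('</p>'))
         .explode('Text')
       )

# Text number: original row label + 1, as a string
idx = df2.index.to_series().add(1).astype(str)
# Sentence number: running count within each text, starting at 1
idx2 = idx.groupby(idx).cumcount().add(1).astype(str)

df2.index = idx+'.'+idx2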

output:

                                                  Text
1.1                           Hello, I have some text.
1.2          I would like to split it into sentences. 
1.3   However, when it comes to splitting I want se...
1.4   I also need to convert lists in df which is t...
2.1                           Hello, I have some text.
2.2          I would like to split it into sentences. 
2.3   However, when it comes to splitting I want se...
2.4   I also need to convert lists in df which is t...
3.1                           Hello, I have some text.
3.2          I would like to split it into sentences. 
3.3   However, when it comes to splitting I want se...
3.4   I also need to convert lists in df which is t...
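
Since the question mentions unevenly split texts and re-joining the sentences later, here is a minimal sketch of the same idea on a small made-up dataframe (the short 'A.', 'B.' texts and the names parts, out and rejoined are assumptions for illustration, not part of the original data). It also strips the whitespace left around each sentence and builds the labels with a plain list comprehension instead of adding two Series:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the two texts hold a different number of sentences
df = pd.DataFrame({'Text': ['A.</p> B. </p> C.', 'D.</p> E.']})

# One row per sentence; explode repeats the original row label,
# and strip removes the spaces left around each sentence
parts = df['Text'].str.split('</p>').explode().str.strip()

# Text number from the repeated row label, sentence number from a
# running count within each text (both starting at 1)
text_no = parts.index.to_numpy() + 1
sent_no = parts.groupby(level=0).cumcount().to_numpy() + 1

out = pd.DataFrame(
    {'Text': parts.to_numpy()},
    index=[f'{t}.{s}' for t, s in zip(text_no, sent_no)],
)

# Re-join: group on the part of the label before the dot
text_ids = np.array([label.split('.')[0] for label in out.index])
rejoined = out.groupby(text_ids, sort=False)['Text'].agg(' '.join)
```

Note that the re-joined strings use a single space as separator, so they match the original texts only up to the '</p>' markers and the surrounding whitespace. If you need exact round-tripping, skip the strip step and join with '</p>' instead.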