How to index elements of list in a dataframe in Python?

Question

I have the following dataframe:

import pandas as pd

df = pd.DataFrame({'Text': ['Hello, I have some text.</p> I would like to split it into sentences. </p> However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.</p> I also need to convert lists in df which is tricky.',
                       'Hello, I have some text.</p> I would like to split it into sentences. </p> However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.</p> I also need to convert lists in df which is tricky.',
                       'Hello, I have some text.</p> I would like to split it into sentences. </p> However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.</p> I also need to convert lists in df which is tricky.']})

What I want to do is first split the text in each row, then convert the resulting lists into a dataframe with a column that keeps track of which text each sentence belongs to and its position within that text.

To split the text I do:

df.Text.str.split('</p>')

df.Text.str.split('</p>')[0]

As you can see, every element in the original dataframe contains 4 sentences, which I split. I now want to create a dataframe like the following one:

ID   Text
1.1  Hello, I have some text.
1.2  I would like to split it into sentences.
1.3  However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.
1.4  I also need to convert lists in df which is tricky.
2.1  Hello, I have some text.
2.2  I would like to split it into sentences.
2.3  However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.
2.4  I also need to convert lists in df which is tricky.
3.1  Hello, I have some text.
3.2  I would like to split it into sentences.
3.3  However, when it comes to splitting I want sentences to be indexed so that I can re-join them correctly.
3.4  I also need to convert lists in df which is tricky.

Can anyone help me do it?

Thanks!

PS. In the real example, the sentences are not evenly split as above.

mozway · Accepted Answer · 2022-02-15 13:53:46Z

You could use str.split to split the strings, then explode to create one row per sentence (each new row keeps the index label of the row it came from), and finally rework the index:

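# split each text into a list of sentences, then create one row per sentence
df2 = (df.assign(Text=df['Text'].str.split('</p>'))
         .explode('Text')
       )

# explode repeats the original row label; shift it so that it starts at 1
idx = df2.index.to_series().add(1).astype(str)
# number the sentences within each original row, starting at 1
idx2 = idx.groupby(idx).cumcount().add(1).astype(str)

df2.index = idx+'.'+idx2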

output:

                                                  Text
1.1                           Hello, I have some text.
1.2          I would like to split it into sentences. 
1.3   However, when it comes to splitting I want se...
1.4   I also need to convert lists in df which is t...
2.1                           Hello, I have some text.
2.2          I would like to split it into sentences. 
2.3   However, when it comes to splitting I want se...
2.4   I also need to convert lists in df which is t...
3.1                           Hello, I have some text.
3.2          I would like to split it into sentences. 
3.3   However, when it comes to splitting I want se...
3.4   I also need to convert lists in df which is t...
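
Since the question mentions that the real texts are not evenly split and that the sentences need to be re-joined later, here is a minimal sketch of the same split/explode idea on uneven data. It is a variation, not the code above: it keeps the identifiers as regular columns instead of the index. The sample texts and the column names text_no, sent_no and ID are made up for illustration, and it assumes a default 0-based index on df.

```python
import pandas as pd

# Hypothetical sample: each row holds a different number of sentences
df = pd.DataFrame({'Text': ['A.</p> B.</p> C.', 'D.', 'E.</p> F.']})

# One row per sentence; the original row label becomes the column text_no
out = (df.assign(Text=df['Text'].str.split('</p>'))
         .explode('Text')
         .rename_axis('text_no')
         .reset_index())

out['text_no'] += 1                                    # 1-based text number
out['sent_no'] = out.groupby('text_no').cumcount() + 1  # position within the text
out['ID'] = out['text_no'].astype(str) + '.' + out['sent_no'].astype(str)

# Re-join the sentences of each text in their original order
rejoined = (out.sort_values(['text_no', 'sent_no'])
               .groupby('text_no')['Text']
               .agg('</p>'.join))

print(out[['ID', 'Text']])
print(rejoined.tolist() == df['Text'].tolist())
```

Keeping text_no and sent_no as integers may be safer for sorting than the combined string, because string IDs such as '1.10' sort before '1.2'.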
