How to split text of a dataframe column into multiple columns?

Question

I'm trying to read a string from an Excel sheet, split it into words, and then print the words or write them back out. But when I load the data with pandas and call split() on it, I get an error saying that a DataFrame doesn't support split.

The Excel sheet has this line in it:

I expect an output like this:

import numpy
import pandas as pd
df = pd.read_excel('eng.xlsx')
txt = df

x = txt.split()

print(x)


AttributeError: 'DataFrame' object has no attribute 'split'

Yohann L. · Accepted Answer · 2019-11-13 09:05:31Z

3

That's because you are calling split() on a DataFrame, and split() is a string method: a DataFrame has no such attribute. You need to apply it to each string in a column instead.
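To illustrate the point, here is a minimal sketch that uses an in-memory DataFrame as a stand-in for the Excel data (the column name `text` is only an assumption for this example). It checks that the DataFrame itself has no split attribute, while each string stored in it does.

```python
import pandas as pd

# Stand-in for the Excel data; the column name "text" is hypothetical
df = pd.DataFrame({"text": ["This is my first sentence",
                            "This is a second sentence with more words"]})

# The DataFrame itself has no split(), which is what triggers the AttributeError
print(hasattr(df, "split"))

# Each cell holds a plain Python str, and str does have split()
first_cell = df["text"].iloc[0]
print(hasattr(first_cell, "split"))

# Split one cell, then every cell of the column
words = first_cell.split()
word_lists = df["text"].apply(lambda s: s.split())
print(words)
print(word_lists)
```

The answer's full code, which also pads the shorter rows with NaN and writes the result back to Excel, follows.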

import pandas as pd
import numpy as np

def append_nan(x, max_len):
    """
    Function to append NaN value into a list based on a max length
    """
    if len(x) < max_len:
        x += [np.nan]*(max_len - len(x))
    return x

# Example dataframe, in case you have no Excel file at hand:
#df = pd.DataFrame(['This is my first sentence', 'This is a second sentence with more words'])
# header=None keeps the first row as data and gives the columns integer labels
df = pd.read_excel('your_file.xlsx', header=None)
col_names = df.columns.values.tolist()
df_output = df.copy()

# Split your strings
df_output[col_names[0]] = df[col_names[0]].apply(lambda x: x.split(' '))
# Get the maximum number of words across all your sentences
max_len = max(map(len, df_output[col_names[0]]))

# Pad the shorter lists with NaN so every row has the same number of entries
df_output[col_names[0]] = df_output[col_names[0]].apply(lambda x: append_nan(x, max_len))

# Create columns names and build your dataframe
column_names = ["word_"+str(d) for d in range(max_len)]
df_output = pd.DataFrame(list(df_output[col_names[0]]), columns=column_names)

# Then you can save it
df_output.to_excel('output.xlsx')
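As a more compact alternative to the manual padding above (a different route, using pandas' built-in `Series.str.split` with `expand=True`), the sketch below splits one column into one column per word and leaves the unused cells as missing values. It uses the two example sentences from the commented-out line rather than an Excel file, so the data here is a stand-in.

```python
import pandas as pd

# Stand-in data: the two example sentences, in a single column labelled 0
df = pd.DataFrame(["This is my first sentence",
                   "This is a second sentence with more words"])

# Split on whitespace; expand=True spreads the words over separate columns
# and fills the shorter rows with missing values
words = df[0].str.split(expand=True)

# Name the columns word_0, word_1, ... as in the code above
words.columns = ["word_" + str(i) for i in range(words.shape[1])]

print(words)
```

From there, `words.to_excel('output.xlsx')` would save the result in the same way as before.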

edited Nov 13, 2019 at 9:05

answered Nov 12, 2019 at 13:35

Yohann L.


9 Comments

programming freak Over a year ago

That was perfect for me, but how do I retrieve the sentences from the Excel sheet rather than writing them myself?

Yohann L. Over a year ago

Same as you did, using pd.read_excel(), and remove my line where I define the dataframe with text.

programming freak Over a year ago

Sorry for bothering you. When I try to use the Excel file in place of your sentences, it gives me an error: data = pd.read_excel('eng.xlsx', "Sheet1"); df = pd.DataFrame(data); df_output = df.copy()

Yohann L. Over a year ago

Which error? It's probably because you need to change the column name

programming freak Over a year ago

My column doesn't have a name. The Excel file just has 2 columns in it, which are first line and second line longer. The error is KeyError: 0
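A likely cause of that KeyError: 0, assuming the sheet was read without header=None: pandas then takes the first row as the column labels, so no column is labelled 0. The sketch below reproduces this with a hypothetical in-memory DataFrame (the label and cell text are made up for illustration) and shows that selecting by position with `iloc` works whatever the column is called.

```python
import pandas as pd

# Hypothetical stand-in for a sheet read WITHOUT header=None:
# the first row has become the column label
df = pd.DataFrame({"first line": ["second line longer"]})

# Looking up the label 0 fails because the only label is "first line"
raised = False
try:
    df[0]
except KeyError as err:
    raised = True
    print("KeyError:", err)

# Selecting by position does not depend on the column's label
first_col = df.iloc[:, 0]
split_words = first_col.apply(lambda s: s.split())
print(split_words.tolist())
```

Reading the file with header=None, as in the answer's code, avoids the problem from the start because the columns are then labelled 0, 1, and so on.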
