I'm trying to retrieve a string from an Excel sheet, split it into words, and then print it or write it back into a new string. But when I read the data with pandas and try to split it, I get an error saying that DataFrame doesn't support the split function.

The Excel sheet has this line in it:

enter image description here

I expect an output like this:

enter image description here

import numpy
import pandas as pd
df = pd.read_excel('eng.xlsx')
txt = df

x = txt.split()

print(x)


AttributeError: 'DataFrame' object has no attribute 'split'

1 Answer

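That's because you are calling split() on the whole DataFrame, and a DataFrame has no such method. split() is a string method, so it has to be applied to each string in a column, for example with apply():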

import pandas as pd
import numpy as np

def append_nan(x, max_len):
    """
    Function to append NaN value into a list based on a max length
    """
    if len(x) < max_len:
        x += [np.nan]*(max_len - len(x))
    return x

# I define here a dataframe for the example
#df = pd.DataFrame(['This is my first sentence', 'This is a second sentence with more words'])
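# header=None keeps the first row as data and labels the columns 0, 1, ...
# (read_excel has no `index` argument; the related option is `index_col`)
df = pd.read_excel('your_file.xlsx', index_col=None, header=None)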
col_names = df.columns.values.tolist()
df_output = df.copy()

# Split your strings
df_output[col_names[0]] = df[col_names[0]].apply(lambda x: x.split(' '))
# Get the maximum length of all your sentences
max_len = max(map(len, df_output[col_names[0]]))

# Append NaN values so that every row has the same number of columns
df_output[col_names[0]] = df_output[col_names[0]].apply(lambda x: append_nan(x, max_len))

# Create columns names and build your dataframe
column_names = ["word_"+str(d) for d in range(max_len)]
df_output = pd.DataFrame(list(df_output[col_names[0]]), columns=column_names)

# Then you can save it
df_output.to_excel('output.xlsx')
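
As a shorter variant of the steps above, the splitting and padding can probably be done in one call with the pandas string accessor. This is only a sketch: the in-memory DataFrame stands in for the Excel data, and the variable name `words` is an assumption, not part of the original answer.

```python
import pandas as pd

# Hypothetical in-memory data standing in for the first Excel column
df = pd.DataFrame(['This is my first sentence',
                   'This is a second sentence with more words'])

# .str.split() splits each cell on whitespace; expand=True spreads the words
# over separate columns and pads shorter rows with missing values
words = df.iloc[:, 0].str.split(expand=True)
words.columns = ["word_" + str(i) for i in range(words.shape[1])]

print(words)
```

The result can be saved with `words.to_excel('output.xlsx')` in the same way as `df_output`.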

9 Comments

That was perfect for me, but how do I retrieve the sentences from the Excel sheet rather than writing them myself?
Same as you did, using pd.read_excel() and removing the line where I define the dataframe with text.
Sorry for bothering you. When I try to use the Excel file in place of your sentences it gives me an error: data = pd.read_excel('eng.xlsx', "Sheet1"); df = pd.DataFrame(data); df_output = df.copy()
Which error? It's probably because you need to change the column name.
My column doesn't have a name. The Excel file just has 2 columns in it, which are "first line" and "second line longer". The error is KeyError: 0
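
One plausible cause of the KeyError: 0 (an assumption, since the full traceback is not shown) is that read_excel was called without header=None, so the first row became the column labels and no column is labelled 0. The sketch below reproduces that situation with a hypothetical in-memory DataFrame and selects the first column by position instead of by label.

```python
import pandas as pd

# Hypothetical frame mimicking read_excel WITHOUT header=None:
# the first spreadsheet row has become the column label
df = pd.DataFrame({"first line": ["second line longer"]})

try:
    df[0]  # label lookup: there is no column labelled 0
    raised = False
except KeyError:
    raised = True

# Positional lookup works whatever the column labels are
first_col = df.iloc[:, 0]
split_words = first_col.str.split().tolist()

print(raised, split_words)
```

Passing header=None to read_excel, as in the answer, avoids the problem at the source because the columns are then labelled 0, 1, and so on.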