Type error when importing pandas DataFrame from excel file in python

Question

I'm trying to save a pandas DataFrame as an Excel file, read it back in, and convert it back to a dictionary. The DataFrame is quite large. For instance, consider the following code:

import pandas as pd

path = 'file.xlsx'
dict1 = {'a' : [3, [1, 2, 3], 'text1'],
         'b' : [4, [4, 5, 6, 7], 'text2']}
print('\n\nType 1:', type(dict1['a'][1]))

df1 = pd.DataFrame(dict1)
df1.to_excel(path, sheet_name='Sheet1')
print("\n\nSaved df:\n", df1 , '\n\n')

df2 = pd.read_excel(path, sheet_name='Sheet1')
print("\n\nLoaded df:\n", df2 , '\n\n')

dict2 = df2.to_dict(orient='list')
print("New dict:", dict2, '\n\n')
print('Type 2:', type(dict2['a'][1]))

The output is:

Type 1: <class 'list'>


Saved df:
            a             b
0          3             4
1  [1, 2, 3]  [4, 5, 6, 7]
2      text1         text2




Loaded df:
            a             b
0          3             4
1  [1, 2, 3]  [4, 5, 6, 7]
2      text1         text2


New dict: {'a': [3, '[1, 2, 3]', 'text1'], 'b': [4, '[4, 5, 6, 7]', 'text2']}


Type 2: <class 'str'>

Could you help me get back the original dictionary with the same element types? Thank you!

jwalton · Accepted Answer · 2019-02-26 20:38:08Z

read_excel has an option (dtype) that lets us change the dtype of columns as they're read in, but there is no equivalent option for rows. Excel has no list cell type, so each list is stored as text and comes back as a str. That means we have to do the type conversion ourselves, after the data has been read in.

As you've shown in your question, df2['a'][1] has type str, but you'd like it to have type list.

Say we have a string l = '[1, 2, 3]'. We can convert it to a list of ints ([1, 2, 3]) with [int(val) for val in l.strip('[]').split(',')]. We can then use this together with the .apply method to get what we want:

df2.iloc[1] = df2.iloc[1].apply(lambda x : [int(val) for val in x.strip('[]').split(',')])
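The comprehension above assumes every element of the list is an integer. As an alternative sketch (not part of the original answer), the standard library's ast.literal_eval parses the string as a Python literal, so it should also cope with floats or quoted strings inside the list. The variable names here are only for illustration:

```python
import ast

s = '[1, 2, 3]'

# The approach from the answer: strip the brackets, split on commas, cast to int.
via_split = [int(val) for val in s.strip('[]').split(',')]

# Alternative: let Python parse the literal. Only literals are evaluated,
# so arbitrary code in the cell is not executed.
via_literal = ast.literal_eval(s)

# literal_eval keeps the element types found in the string.
mixed = ast.literal_eval("[1, 2.5, 'x']")

print(via_split, via_literal, mixed)
```

Note that ast.literal_eval raises ValueError or SyntaxError for strings that are not valid literals (such as text1), so it needs a guard before being applied to arbitrary cells.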

Putting this example back together we have:

import pandas as pd

# Data as read in by read_excel method
df2 = pd.DataFrame({'a' : [3, '[1, 2, 3]', 'text1'],
                   'b' : [4, '[4, 5, 6, 7]', 'text2']})
print('Type: ', type(df2['a'][1]))
#Type:  <class 'str'>

# Convert strings in row 1 to lists
df2.iloc[1] = df2.iloc[1].apply(lambda x : [int(val) for val in x.strip('[]').split(',')])

print('Type: ', type(df2['a'][1]))
#Type:  <class 'list'>

dict2 = df2.to_dict(orient='list')
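If the DataFrame is large and you don't know in advance which rows hold lists, one possible variation is to run a guarded parser over every cell. This is a minimal sketch under the assumption that any string cell which parses to a Python list should become a list and everything else should stay unchanged; parse_list_cell is a hypothetical helper name, not a pandas function:

```python
import ast
import pandas as pd

def parse_list_cell(value):
    """Return a list if `value` is a string holding a list literal, else `value` unchanged."""
    if not isinstance(value, str):
        return value
    try:
        parsed = ast.literal_eval(value)
    except (ValueError, SyntaxError):
        # Ordinary text such as 'text1' is not a valid literal; keep it as-is.
        return value
    # Only accept lists, so a text cell like '3' is not silently turned into an int.
    return parsed if isinstance(parsed, list) else value

# Data as read in by read_excel
df2 = pd.DataFrame({'a': [3, '[1, 2, 3]', 'text1'],
                    'b': [4, '[4, 5, 6, 7]', 'text2']})

# Apply the parser column by column (Series.map works element-wise).
restored = df2.copy()
for col in restored.columns:
    restored[col] = restored[col].map(parse_list_cell)

dict2 = restored.to_dict(orient='list')
print(type(dict2['a'][1]))
```

This visits every cell, so on a very large frame it may be worth restricting the loop to the columns known to contain lists.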
