I have a dataframe with a column whose values are lists. When I write the dataframe to a CSV file and read it back, each list comes back as a string. Is there a way to safely read/write dataframes that have lists as members?

from pandas import DataFrame, read_csv

df1 = DataFrame({'a': [['john quincy', 'tom jones', 'jerry rice'], ['bob smith', 'sally ride', 'little wayne'], ['seven', 'eight', 'nine'], ['ten', 'eleven', 'twelve']], 'b': [9, 2, 4, 5], 'c': [7, 3, 0, 9]})

df1.to_csv('temp.csv')
df2 = read_csv('temp.csv')

#note how the list (df1) has been converted to a string (df2)
df1['a'][0]
['john quincy', 'tom jones', 'jerry rice']

df2['a'][0]
"['john quincy', 'tom jones', 'jerry rice']"
  • After reading in don't you want the reverse? lambda L: L.split(',') - not join again... Commented Nov 1, 2012 at 20:01
  • I put that in to show that it has been converted to a string. It was just illustrating the case. If you open the temp file you will see that the list column has quotes around it. Commented Nov 1, 2012 at 20:18
  • submitted an issue on pandas github github.com/pydata/pandas/issues/2158 Commented Nov 1, 2012 at 20:51

2 Answers

2

There is no need to convert the list to a string yourself; lists are written as their string representation automatically. Just write the dataframe containing the list column, then use ast.literal_eval on that column of df2 after reading it back.

                                             a  b  c
0   ['john quincy', 'tom jones', 'jerry rice']  9  7
1  ['bob smith', 'sally ride', 'little wayne']  2  3
2                   ['seven', 'eight', 'nine']  4  0
3                  ['ten', 'eleven', 'twelve']  5  9

df1.to_csv('temp.csv')
df2 = read_csv('temp.csv')

Use ast.literal_eval to get the string back to list:

import ast
df2['a'] = df2['a'].apply(lambda x: ast.literal_eval(x))
type(df2['a'][1])

Output:

list
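
As a variation on the same idea, the conversion can probably be done while the file is being parsed, using the converters argument of read_csv. The following is a minimal sketch, not part of the original answer: it assumes a reasonably recent pandas, writes to an in-memory buffer instead of temp.csv so it is self-contained, and passes index=False (an addition here) so the index is not written as an extra column.

```python
import ast
import io

import pandas as pd

df1 = pd.DataFrame({
    'a': [['john quincy', 'tom jones', 'jerry rice'],
          ['bob smith', 'sally ride', 'little wayne']],
    'b': [9, 2],
    'c': [7, 3],
})

# Assumption: an in-memory buffer stands in for temp.csv.
buf = io.StringIO()
df1.to_csv(buf, index=False)
buf.seek(0)

# converters maps a column name to a function applied to each raw string
# in that column during parsing, so column 'a' comes back as lists.
df2 = pd.read_csv(buf, converters={'a': ast.literal_eval})

print(type(df2['a'][0]))
print(df2['a'][0])
```

ast.literal_eval only evaluates Python literals, so this works as long as the list elements are themselves literals such as strings and numbers.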

3 Comments

root, thanks for your input. The string conversion part was misleading. I want to do that as part of my script, but I was getting an error because I expected a list and got a string back instead.
@root, your answer works great. I rewrote the question for clarity, although I would prefer it if the pandas read/write parsers recognized this automatically.
Note: you can do this without the lambda: just do .apply(ast.literal_eval)
1

The problem is here:

df2['a'] =df2['a'].map(f)
                   ^^^^^^

Where f = lambda x : ','.join(x)

There's no point joining it again, you want to split it to a list:

df2['a'] = df2['a'].map(lambda L: L.split(','))
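
To make the join/split pairing concrete, here is a minimal sketch of the full round trip. It assumes that no list element contains a comma (otherwise split would cut it in the wrong place), and it uses an in-memory buffer and index=False, neither of which appears in the original post.

```python
import io

import pandas as pd

df1 = pd.DataFrame({
    'a': [['seven', 'eight', 'nine'], ['ten', 'eleven', 'twelve']],
    'b': [4, 5],
})

# Before writing: join each list into one comma-separated string.
out = df1.copy()
out['a'] = out['a'].map(','.join)

# Assumption: an in-memory buffer stands in for temp.csv.
buf = io.StringIO()
out.to_csv(buf, index=False)
buf.seek(0)

# After reading: split each string back into a list.
df2 = pd.read_csv(buf)
df2['a'] = df2['a'].map(lambda L: L.split(','))

print(df2['a'][0])
```

Note that this only reverses an explicit join; applied to a column that was written as raw lists, split(',') would leave the brackets and quotes inside the pieces.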

1 Comment

I rewrote the answer to remove the map function as it was more confusing than helpful. Thank you for your answer.
