1

I have a pandas dataframe (df1) consisting of 56 rows and 9 columns, one column ['Matrix'] contains a dataframe (df2) per cell. Saving and loading this df1 turns df2 into a string. However, I need this to be a dataframe.

I came across several solutions (see below) but can't seem to apply these to my situation as I have a dataframe containing a dataframe.

df2 is build up out of two columns ('Mean', 'SD') and consists out of 95k rows per df1 row (sample below (they're not all 0 ;)).

My gut feeling is that my approach itself is suboptimal to start with.

Could someone give me a nudge in the right direction?

' Mean SD\r\n0 0.00 0.000000\r\n1 0.00 0.000000\r\n2 0.00 0.000000\r\n3 0.00 0.000000\r\n4 0.00 0.000000\r\n5 0.00 0.000000\r\n6 0.00 0.000000\r\n7 0.00 0.000000\r\n8 0.00 0.000000\r\n9 ...... 0.000000\r\n95194 0.00 0.000000\r\n95195 0.00 0.000000\r\n95196 0.00 0.000000\r\n95197 0.00 0.000000\r\n95198 0.00 0.000000\r\n95199 0.00 0.000000\r\n\r\n[95200 rows x 2 columns]'

1
  • 1
    You just need to loop through df1 and convert the cell to a dataframe. I haven't tried it in a while but you can probably load it up as a StringIO object and read that object: stackoverflow.com/questions/22604564/… . Then you can store it in a dictionary with each key referring to the row number or something like that. (You can also keep it df1, but that normally wouldn't make much sense from a tabular pov) Commented Jan 18, 2020 at 7:46

1 Answer 1

1

You should use json file format as saving your DataFrame.

Consider the following simplified example:

import pandas as pd

df1 = pd.DataFrame([[0,1], [2,3]], columns=["first", "second"])
df2 = pd.DataFrame([[4,5,6], [7,8,9]])
df1["dataf"] = [df2, df2]

print("\nDataFrame df1:")
print("**************")  
print(df1)
print("\ndataf column:")
print("***************")  
print(df1["dataf"])
print("\ndataf column cell:")
print("********************")  
print(df1["dataf"][0])
print("Type of dataf cells:", type(df1["dataf"][0]))

Out:

DataFrame df1:
**************
   first  second                             dataf
0      0       1     0  1  2
0  4  5  6
1  7  8  9
1      2       3     0  1  2
0  4  5  6
1  7  8  9

dataf column:
***************
0       0  1  2
0  4  5  6
1  7  8  9
1       0  1  2
0  4  5  6
1  7  8  9
Name: dataf, dtype: object

dataf column cell:
********************
   0  1  2
0  4  5  6
1  7  8  9
Type of dataf cells: <class 'pandas.core.frame.DataFrame'>

Now save our DataFrame as json using pandas.DataFrame.to_json:

df1.to_json("test.json")

Loading our data back with pandas.read_json:

df1 = pd.read_json("test.json")

print("\nDataFrame df1:")
print("**************")  
print(df1)
print("\ndataf column:")
print("***************")  
print(df1["dataf"])
print("\ndataf column cell:")
print("********************")  
print(df1["dataf"][0])
print("Type of dataf cells:", type(df1["dataf"][0]))

Out:

DataFrame df1:
**************
   first  second                                              dataf
0      0       1  {'0': {'0': 4, '1': 7}, '1': {'0': 5, '1': 8},...
1      2       3  {'0': {'0': 4, '1': 7}, '1': {'0': 5, '1': 8},...

dataf column:
***************
0    {'0': {'0': 4, '1': 7}, '1': {'0': 5, '1': 8},...
1    {'0': {'0': 4, '1': 7}, '1': {'0': 5, '1': 8},...
Name: dataf, dtype: object

dataf column cell:
********************
{'0': {'0': 4, '1': 7}, '1': {'0': 5, '1': 8}, '2': {'0': 6, '1': 9}}
Type of dataf cells: <class 'dict'>

We can simply convert our effected columns to DataFrames using pandas.DataFrame.apply

df1["dataf"] = df1["dataf"].apply(lambda x: pd.DataFrame(x))

print("\nDataFrame df1:")
print("**************")  
print(df1)
print("\ndataf column:")
print("***************")  
print(df1["dataf"])
print("\ndataf column cell:")
print("********************")  
print(df1["dataf"][0])
print("Type of dataf cells:", type(df1["dataf"][0]))

Out:

DataFrame df1:
**************
   first  second                             dataf
0      0       1     0  1  2
0  4  5  6
1  7  8  9
1      2       3     0  1  2
0  4  5  6
1  7  8  9

dataf column:
***************
0       0  1  2
0  4  5  6
1  7  8  9
1       0  1  2
0  4  5  6
1  7  8  9
Name: dataf, dtype: object

dataf column cell:
********************
   0  1  2
0  4  5  6
1  7  8  9
Type of dataf cells: <class 'pandas.core.frame.DataFrame'>

You can see that our core code is so simple as follows:

df1.to_json("test.json")
df1 = pd.read_json("test.json")
df1["dataf"] = df1["dataf"].apply(lambda x: pd.DataFrame(x))
Sign up to request clarification or add additional context in comments.

1 Comment

Amazing, thank you! As mentioned, I was doubting my approach with the csv export. This is perfect.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.