I have a DataFrame that contains JSON columns. It is quite large and not very efficient to work with, so I would like to store it as a nested DataFrame.
A sample of the DataFrame looks like this:
id date ag marks
0 I2213 2022-01-01 13:28:05.448054 [{'type': 'A', 'values': {'X': {'F1': 0.1, 'F2': 0.2}, 'U': {'F1': 0.3, 'F2': 0.4}}}, {'type': 'B', 'results': {'Y': {'F1': 0.3, 'F2': 0.2}}}] [{'type': 'A', 'marks': {'X': 0.5, 'U': 0.7}}, {'type': 'B', 'marks': {'Y': 0.4}}]
1 I2213 2022-01-01 14:28:05.448054 [{'type': 'B', 'values': {'Z': {'F1': 0.4, 'F2': 0.2}}}] [{'type': 'A', 'marks': {'X': 0.4, 'U': 0.6}}, {'type': 'B', 'marks': {'Y': 0.3, 'Z': 0.4}}]
2 I2213 2022-01-03 15:28:05.448054 [{'type': 'A', 'values': {'X': {'F1': 0.2, 'F2': 0.1}}}] [{'type': 'A', 'marks': {'X': 0.2, 'U': 0.9}}, {'type': 'B', 'marks': {'Y': 0.2}}]
The result should be grouped by date. Code for generating the sample DataFrame:
from datetime import datetime, timedelta

import pandas as pd


def sample_data():
    # Nested payloads are stored as strings (Python-literal style, single quotes).
    ag_data = [
        "[{'type': 'A', 'values': {'X': {'F1': 0.1, 'F2': 0.2}, 'U': {'F1': 0.3, 'F2': 0.4}}}, {'type': 'B', 'results': {'Y': {'F1': 0.3, 'F2': 0.2}}}]",
        "[{'type': 'B', 'values': {'Z': {'F1': 0.4, 'F2': 0.2}}}]",
        "[{'type': 'A', 'values': {'X': {'F1': 0.2, 'F2': 0.1}}}]",
    ]
    marks_data = [
        "[{'type': 'A', 'marks': {'X': 0.5, 'U': 0.7}}, {'type': 'B', 'marks': {'Y': 0.4}}]",
        "[{'type': 'A', 'marks': {'X': 0.4, 'U': 0.6}}, {'type': 'B', 'marks': {'Y': 0.3, 'Z': 0.4}}]",
        "[{'type': 'A', 'marks': {'X': 0.2, 'U': 0.9}}, {'type': 'B', 'marks': {'Y': 0.2}}]",
    ]
    date_data = [
        datetime.now() - timedelta(3, seconds=7200),
        datetime.now() - timedelta(3, seconds=3600),
        datetime.now() - timedelta(1),
    ]
    df = pd.DataFrame()
    df['date'] = date_data
    df['ag'] = ag_data
    df['marks'] = marks_data
    df['id'] = 'I2213'  # the same id is broadcast to every row
    return df
I tried JSON normalization, but it creates the DataFrame in a columnar (wide) layout. With a = sample_data() and the standard json module imported:
d = a['ag'].apply(lambda x: pd.json_normalize(json.loads(x.replace("'", '"'))))
For the first row this gives a DataFrame with the columns type, values.X.F1, values.X.F2, values.U.F1, values.U.F2, results.Y.F1, results.Y.F2. The issue is how to put the dict keys such as X, Y, F1 and F2 into rows instead of columns.
Is it possible to achieve the desired format as shown in the image?
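Since the image is not reproduced here, the following is only a sketch of one possible long layout, not necessarily the exact target format. It walks the nested structure directly and emits one row per (type, key, metric). The function name ag_to_long and the column names field, key, metric and value are made up for illustration, and ast.literal_eval is used instead of the quote replacement because the strings are Python literals rather than strict JSON.

```python
import ast

import pandas as pd


def ag_to_long(df):
    """Flatten the stringified 'ag' column into one row per (type, key, metric)."""
    rows = []
    for rec_id, date, raw in zip(df["id"], df["date"], df["ag"]):
        # Each cell is a string holding a list of dicts.
        for entry in ast.literal_eval(raw):
            entry_type = entry["type"]
            # Every other field ('values' or 'results' in the sample)
            # maps key -> {metric: number}.
            for field, nested in entry.items():
                if field == "type":
                    continue
                for key, metrics in nested.items():
                    for metric, value in metrics.items():
                        rows.append(
                            (rec_id, date, entry_type, field, key, metric, value)
                        )
    columns = ["id", "date", "type", "field", "key", "metric", "value"]
    return pd.DataFrame(rows, columns=columns)


# Small stand-in for the first two rows of the sample data (fixed timestamps).
sample = pd.DataFrame({
    "id": ["I2213", "I2213"],
    "date": pd.to_datetime(["2022-01-01 13:28:05", "2022-01-01 14:28:05"]),
    "ag": [
        "[{'type': 'A', 'values': {'X': {'F1': 0.1, 'F2': 0.2}, 'U': {'F1': 0.3, 'F2': 0.4}}}, {'type': 'B', 'results': {'Y': {'F1': 0.3, 'F2': 0.2}}}]",
        "[{'type': 'B', 'values': {'Z': {'F1': 0.4, 'F2': 0.2}}}]",
    ],
})

long_df = ag_to_long(sample)

# A MultiIndex gives the "nested" view: date, then type, then key, then metric.
nested = long_df.set_index(["date", "type", "key", "metric"])["value"]
print(nested)
```

Grouping by calendar day could then be done on long_df with something like long_df.groupby(long_df["date"].dt.date), if that is what "grouped by date" is meant to cover.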

melt: pandas.pydata.org/pandas-docs/stable/user_guide/… or stack: pandas.pydata.org/pandas-docs/stable/user_guide/…
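A minimal sketch of the melt suggestion, applied to a single ag cell: normalize to the wide layout first, melt the dotted columns into rows, then split the dotted path back into its parts. The column names path, field, key and metric are illustrative, and the split assumes that none of the dict keys themselves contain a dot.

```python
import ast

import pandas as pd

raw = "[{'type': 'A', 'values': {'X': {'F1': 0.1, 'F2': 0.2}, 'U': {'F1': 0.3, 'F2': 0.4}}}, {'type': 'B', 'results': {'Y': {'F1': 0.3, 'F2': 0.2}}}]"

# Wide layout: one column per dotted path, e.g. values.X.F1, results.Y.F2.
wide = pd.json_normalize(ast.literal_eval(raw))

# Turn every dotted column into a row; drop the paths a given type does not have.
long = (
    wide.melt(id_vars=["type"], var_name="path", value_name="value")
    .dropna(subset=["value"])
    .reset_index(drop=True)
)

# Split 'values.X.F1' into field='values', key='X', metric='F1'.
parts = long["path"].str.split(".", expand=True)
parts.columns = ["field", "key", "metric"]

long = pd.concat([long[["type"]], parts, long[["value"]]], axis=1)
print(long)
```

stack would be an alternative for the same reshaping step; melt is used here only because it keeps the type column alongside each value without extra index handling.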