I have created a function responsible for parsing a list of JSON objects, extracting the useful fields, and creating a Pandas DataFrame.
import pandas as pd

def parse_metrics_to_df(metrics):
    def extract_details(row):
        # Lift the nested fields up to the top level.
        row['trial'] = row['agent']['trial']
        row['numerosity'] = row['agent']['numerosity']
        row['reliable'] = row['agent']['reliable']
        row['was_correct'] = row['performance']['was_correct']
        return row

    df = pd.DataFrame(metrics)
    df = df.apply(extract_details, axis=1)
    df.drop(['agent', 'environment', 'performance'], axis=1, inplace=True)
    df.set_index('trial', inplace=True)
    return df
The metrics argument is an array of JSON documents looking similar to this (first two rows shown):
[{'agent': {'fitness': 25.2375,
            'numerosity': 1,
            'population': 1,
            'reliable': 0,
            'steps': 1,
            'total_steps': 1,
            'trial': 0},
  'environment': None,
  'performance': {'was_correct': True}},
 {'agent': {'fitness': 23.975625,
            'numerosity': 1,
            'population': 1,
            'reliable': 0,
            'steps': 1,
            'total_steps': 2,
            'trial': 1},
  'environment': None,
  'performance': {'was_correct': False}}]
It is then executed as follows:
df = parse_metrics_to_df(metrics)
The code works as expected, but it is extremely slow: parsing an array with a million objects takes nearly an hour.
How can I do this properly?
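For reference, here is a sketch of an alternative I am considering (names like parse_metrics_to_df_fast are my own; I have not benchmarked it at the million-row scale). It replaces the row-wise apply with pd.json_normalize, which flattens the nested dicts in a single pass instead of invoking a Python function per row:

```python
import pandas as pd

def parse_metrics_to_df_fast(metrics):
    # Flatten nested dicts into dotted column names, e.g. 'agent.trial'.
    df = pd.json_normalize(metrics)
    df = df.rename(columns={
        'agent.trial': 'trial',
        'agent.numerosity': 'numerosity',
        'agent.reliable': 'reliable',
        'performance.was_correct': 'was_correct',
    })
    # Keep only the fields the original function produced.
    df = df[['trial', 'numerosity', 'reliable', 'was_correct']]
    return df.set_index('trial')
```

Is this the right direction, or is a plain list comprehension that builds the columns directly even faster?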
