I have a list of JSON objects that may contain duplicate item/location pairs, and for each pair I want to keep only the entry with the highest risk:
[{
'item': 'itemone',
'location': 'locationone',
'risk_level': 'Low'
#Other values are omitted
},
{
'item': 'itemone',
'location': 'locationone',
'risk_level': 'High'
},
{
'item': 'itemone',
'location': 'locationone',
'risk_level': 'Moderate'
},
{
'item': 'itemone',
'location': 'locationone',
'risk_level': 'High'
},
{
'item': 'itemtwo',
'location': 'locationtwo',
'risk_level': 'Low'
}]
I have tried converting it into a pandas DataFrame, sorting it by risk_level, and using drop_duplicates. However, this causes issues with other values in the JSON (e.g. None is converted to NaN, ints to floats, etc.), so I don't think it's feasible.
import pandas as pd

# Convert to a DataFrame and drop duplicate insights, keeping the highest severity.
# Sorting the raw strings orders them alphabetically, which does not match the
# severity order, so map risk_level onto an ordered categorical first.
severity_order = ['High', 'Moderate', 'Low']
dfInsights = pd.DataFrame(response['data'])
dfInsights = dfInsights.reindex(columns=list(response['data'][0].keys()))
dfInsights['risk_level'] = pd.Categorical(
    dfInsights['risk_level'], categories=severity_order, ordered=True)
dfInsights.sort_values(['risk_level'], inplace=True)
dfInsights.drop_duplicates(['item', 'location'], keep='first', inplace=True)
dfToJSON = dfInsights.to_dict(orient='records')
I would like the result to be:
[{
'item': 'itemone',
'location': 'locationone',
'risk_level': 'High'
},
{
'item': 'itemtwo',
'location': 'locationtwo',
'risk_level': 'Low'
}]
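Since the pandas round trip mangles the other values, one alternative is to skip pandas entirely and bucket the records in a plain dict keyed on (item, location). A minimal sketch, assuming the only levels are Low/Moderate/High (the rank mapping is my assumption, not from the original data):

```python
# Rank each risk level so entries can be compared numerically.
# Assumption: 'Low' < 'Moderate' < 'High' are the only possible values.
RISK_RANK = {'Low': 0, 'Moderate': 1, 'High': 2}

def dedupe_highest_risk(records):
    """Keep one record per (item, location), preferring the highest risk_level."""
    best = {}
    for rec in records:
        key = (rec['item'], rec['location'])
        # Replace the stored record only if this one is strictly riskier,
        # so ties keep the first occurrence.
        if key not in best or RISK_RANK[rec['risk_level']] > RISK_RANK[best[key]['risk_level']]:
            best[key] = rec
    return list(best.values())

data = [
    {'item': 'itemone', 'location': 'locationone', 'risk_level': 'Low'},
    {'item': 'itemone', 'location': 'locationone', 'risk_level': 'High'},
    {'item': 'itemone', 'location': 'locationone', 'risk_level': 'Moderate'},
    {'item': 'itemone', 'location': 'locationone', 'risk_level': 'High'},
    {'item': 'itemtwo', 'location': 'locationtwo', 'risk_level': 'Low'},
]
print(dedupe_highest_risk(data))
# [{'item': 'itemone', 'location': 'locationone', 'risk_level': 'High'},
#  {'item': 'itemtwo', 'location': 'locationtwo', 'risk_level': 'Low'}]
```

Because the original dicts are passed through untouched, None stays None and ints stay ints, which sidesteps the dtype coercion problem entirely.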