I am trying to take the following deeply nested JSON and (eventually) turn it into a CSV. Below is just a sample; the full JSON file is huge (12 GB).
{'reporting_entity_name':'Blue Cross and Blue Shield of Alabama',
'reporting_entity_type':'health insurance issuer',
'last_updated_on':'2022-06-10',
'version':'1.1.0',
'in_network':[
{'negotiation_arrangement': 'ffs',
'name': 'xploration of Kidney',
'billing_code_type': 'CPT',
'billing_code_type_version': '2022',
'billing_code': '50010',
'description': 'Renal Exploration, Not Necessitating Other Specific Procedures',
'negotiated_rates': [{
'negotiated_prices': [
{'negotiated_type': 'negotiated',
'negotiated_rate': 993.0,
'expiration_date': '2022-06-30',
'service_code': ['21', '22', '24'],
'billing_class': 'professional'},
{'negotiated_type': 'negotiated',
'negotiated_rate': 1180.68,
'expiration_date': '2022-06-30',
'service_code': ['21', '22', '24'],
'billing_class': 'professional'},
{'negotiated_type': 'negotiated',
'negotiated_rate': 1283.95,
'expiration_date': '2022-06-30',
'service_code': ['21', '22', '24'],
'billing_class': 'professional'},
{'negotiated_type': 'negotiated',
'negotiated_rate': 1042.65,
'expiration_date': '2022-06-30',
'service_code': ['21', '22', '24'],
'billing_class': 'professional'},
{'negotiated_type': 'negotiated',
'negotiated_rate': 1290.9,
'expiration_date': '2022-06-30',
'service_code': ['21', '22', '24'],
'billing_class': 'professional'},
{'negotiated_type': 'negotiated',
'negotiated_rate': 1241.25,
'expiration_date': '2022-06-30',
'service_code': ['21', '22', '24'],
'billing_class': 'professional'}
]}]}]}
The end goal is to have a data frame, or dictionary, that I can then write to a CSV. I would like each row to have these columns:
{'reporting_entity_name':'','reporting_entity_type':'','last_updated_on':'','version':'','negotiation_arrangement':'','name':'','billing_code_type':'','billing_code_type_version':'','billing_code':'','description':'','provider_groups':'','negotiated_type':'','negotiated_rate':'','expiration_date':'','service_code':'','billing_class':''}
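To make the target shape concrete, here is a minimal sketch of the row expansion I have in mind, run on a trimmed copy of the sample (two prices instead of six). The names iter_rows, COLUMNS, and the key lists are made up for illustration, joining service_code with "|" is just an assumed cell format, and provider_groups defaults to an empty string because it does not appear in the sample. It works on an already-loaded dictionary, so it does not address the 12 GB input by itself.

```python
import csv
import io

# Scalar fields that should repeat on every output row (hypothetical grouping).
HEADER_KEYS = ["reporting_entity_name", "reporting_entity_type",
               "last_updated_on", "version"]
ITEM_KEYS = ["negotiation_arrangement", "name", "billing_code_type",
             "billing_code_type_version", "billing_code", "description"]
PRICE_KEYS = ["negotiated_type", "negotiated_rate", "expiration_date",
              "service_code", "billing_class"]
COLUMNS = HEADER_KEYS + ITEM_KEYS + ["provider_groups"] + PRICE_KEYS


def iter_rows(doc):
    """Yield one flat dict per entry in negotiated_prices."""
    header = {k: doc.get(k, "") for k in HEADER_KEYS}
    for item in doc.get("in_network", []):
        item_fields = {k: item.get(k, "") for k in ITEM_KEYS}
        for rate in item.get("negotiated_rates", []):
            # provider_groups is absent from the trimmed sample; default to "".
            groups = rate.get("provider_groups", "")
            for price in rate.get("negotiated_prices", []):
                row = {**header, **item_fields, "provider_groups": groups}
                for k in PRICE_KEYS:
                    row[k] = price.get(k, "")
                # Assumed format: join the list so it fits in one CSV cell.
                row["service_code"] = "|".join(row["service_code"])
                yield row


sample = {
    "reporting_entity_name": "Blue Cross and Blue Shield of Alabama",
    "reporting_entity_type": "health insurance issuer",
    "last_updated_on": "2022-06-10",
    "version": "1.1.0",
    "in_network": [{
        "negotiation_arrangement": "ffs",
        "name": "xploration of Kidney",
        "billing_code_type": "CPT",
        "billing_code_type_version": "2022",
        "billing_code": "50010",
        "description": "Renal Exploration, Not Necessitating Other Specific Procedures",
        "negotiated_rates": [{
            "negotiated_prices": [
                {"negotiated_type": "negotiated", "negotiated_rate": 993.0,
                 "expiration_date": "2022-06-30",
                 "service_code": ["21", "22", "24"],
                 "billing_class": "professional"},
                {"negotiated_type": "negotiated", "negotiated_rate": 1180.68,
                 "expiration_date": "2022-06-30",
                 "service_code": ["21", "22", "24"],
                 "billing_class": "professional"},
            ],
        }],
    }],
}

rows = list(iter_rows(sample))

# Write to an in-memory buffer here; a real run would open a file instead.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=COLUMNS)
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()
```

Because iter_rows is a generator, the output side can be written one row at a time without holding every row in memory. The open problem is the input side: this sketch still needs the whole document loaded as a dictionary first, which is exactly what I cannot do with the full file.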
So far I have tried pandas json_normalize, flatten, and a few custom modules I found on GitHub, but none of them seem to normalize/flatten the nested data into new rows, only into new columns. Because the dataset is so large, I'm worried that walking it recursively or with a bunch of nested loops will quickly eat up all my memory. Thanks in advance for any advice you can offer!