I have a Kafka stream to consume that contains information in JSON form.
I need to convert that JSON data into a pandas DataFrame to feed it into a data warehouse.
The problem is that the JSON structure keeps changing depending on the event type.
Example:
The first event comes in with the structure as:
{
    "organization": "nation1",
    "job_id": 1,
    "job_name": "job1",
    "state": {
        "started": "no"
    },
    "timestamp": 1570357814930
}
Then another event comes in with this structure:
{
    "organization": "nation1",
    "job_id": 1,
    "job_name": "job1",
    "state": {
        "started": "yes",
        "attended": "yes"
    },
    "timestamp": 1570357814988
}
Notice the change in state object above.
Assume that the lowest-level structure/hierarchy is not going to change, i.e. the state object can have at most the started and attended key-value pairs, but no more. As the first event shows, though, state may contain only started.
How can I make sure that I get a pandas DataFrame with a consistent set of columns for such a scenario? Keep in mind that the actual JSON will have many such fields/maps with a dynamic structure like this.

Is json_normalize the way to go?
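A minimal sketch of the direction I'm considering, using the two example events above (the dotted column names like state.started are what pd.json_normalize produces for nested objects; the expected column list is my assumption of the full schema):

```python
import pandas as pd

# The two example events; the second carries an extra "state.attended" key.
events = [
    {
        "organization": "nation1",
        "job_id": 1,
        "job_name": "job1",
        "state": {"started": "no"},
        "timestamp": 1570357814930,
    },
    {
        "organization": "nation1",
        "job_id": 1,
        "job_name": "job1",
        "state": {"started": "yes", "attended": "yes"},
        "timestamp": 1570357814988,
    },
]

# json_normalize flattens nested objects into dotted columns
# (state -> state.started, state.attended); keys that are missing
# from an event simply come out as NaN in that row.
df = pd.json_normalize(events)

# Reindex against the full expected schema (my assumption) so every
# batch ends up with the same columns no matter which keys the
# individual events carried.
expected = ["organization", "job_id", "job_name",
            "state.started", "state.attended", "timestamp"]
df = df.reindex(columns=expected)
```

After the reindex, the first row has NaN in state.attended and the second has "yes", so downstream loading into the warehouse sees a stable column layout.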