
We have an existing script that reads a JSON file from S3 and converts it to Parquet format. The data arrives in the format below, and the code below can read it:

JSON file content:

[ {"Id":"123124","Account__c":"0ereeraw334U","Active__c":"true"} ]

Existing code to convert it into a DataFrame:

df = pd.read_json(obj['Body'], dtype='unicode', convert_dates=False)

How can I read the JSON data below in the same way?

{"cust_land_detail":[ {"Id":"45634653","Account__c":"sersff23se","Active__c":"true"} ] }

NB: this file has a root element, so the existing code does not decode the records the same way.


I found a solution using json_normalize, as below:

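import pandas as pd
df = pd.read_json(obj['Body'], dtype='unicode', convert_dates=False)
# Expand the list of records under the root key into columns (pd.json_normalize, pandas >= 1.0)
data = pd.json_normalize(df['cust_land_detail'])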
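A self-contained version of the approach above, with the S3 object body replaced by an in-memory io.StringIO (an assumption so the demo runs without S3). It passes dtype=False rather than dtype='unicode', so read_json does no dtype conversion at all on the column of nested dicts; treat that as a judgment call, not the answer author's setting.

```python
import io

import pandas as pd

# Hypothetical stand-in for obj['Body'] from S3.
body = io.StringIO(
    '{"cust_land_detail":[ {"Id":"45634653","Account__c":"sersff23se","Active__c":"true"} ] }'
)

# read_json yields a single 'cust_land_detail' column whose cells are dicts.
df = pd.read_json(body, dtype=False, convert_dates=False)

# json_normalize turns each dict into a row with one column per key.
data = pd.json_normalize(df["cust_land_detail"].tolist())
print(data)
```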

I receive two types of JSON file, one without a root element and one with a root element, so I read the JSON with read_json and then, when the root element matches a root name passed in as an argument, normalize the records under it.
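A minimal sketch of that dispatch, assuming the root element name is passed in as an optional argument (the function name read_records and its root parameter are hypothetical). If root is given and shows up as a column after read_json, the nested records are normalized; otherwise the frame is already one column per field and is returned as-is.

```python
import io
from typing import IO, Optional

import pandas as pd


def read_records(body: IO[str], root: Optional[str] = None) -> pd.DataFrame:
    """Read a JSON payload that is either a bare array of records or an
    object wrapping that array under a root key (hypothetical helper)."""
    # dtype=False: no dtype inference, so string values such as "123124" stay strings.
    df = pd.read_json(body, dtype=False, convert_dates=False)
    if root is not None and root in df.columns:
        # Rooted file: expand the list of dicts stored under the root key.
        return pd.json_normalize(df[root].tolist())
    # Root-less file: read_json already produced one column per field.
    return df


flat = io.StringIO('[ {"Id":"123124","Account__c":"0ereeraw334U","Active__c":"true"} ]')
rooted = io.StringIO(
    '{"cust_land_detail":[ {"Id":"45634653","Account__c":"sersff23se","Active__c":"true"} ] }'
)
print(read_records(flat))
print(read_records(rooted, root="cust_land_detail"))
```

Both calls return a frame with the same Id, Account__c and Active__c columns, so the Parquet-writing step downstream does not need to know which shape the file had.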


Do you have access to whatever generates the data going into the S3 bucket? If so, it might make sense to simplify the format at the source. This assumes you are the only consumer of this data, or that all consumers are OK with the change.

Other alternatives off the top of my head:

  • Parse the JSON, then serialize only the array back to a string -- this is expensive, though, because you end up parsing the JSON twice:
    import json
    s = json.dumps(json.loads(s)["cust_land_detail"])

  • Manually slice out the chunk you need -- assuming the structure is simple, known, and not likely to change:
    s = s.strip()  # drop a trailing newline so the closing '}' is the last character
    preamble, postamble = '{"cust_land_detail":', '}'
    s = s[len(preamble):-len(postamble)]
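Both alternatives above can be checked side by side on the sample payload. This stdlib-only sketch assumes there is no whitespace between the opening brace and the root key, which the manual slice depends on; it confirms both produce a JSON array string that parses to the same records.

```python
import json

raw = '{"cust_land_detail":[ {"Id":"45634653","Account__c":"sersff23se","Active__c":"true"} ] }\n'

# Alternative 1: parse, then re-serialize only the array (the JSON is parsed twice overall).
via_roundtrip = json.dumps(json.loads(raw)["cust_land_detail"])

# Alternative 2: slice the array text out directly; brittle if the layout changes.
preamble, postamble = '{"cust_land_detail":', '}'
s = raw.strip()
via_slice = s[len(preamble):-len(postamble)]

# Both strings describe the same list of records.
assert json.loads(via_roundtrip) == json.loads(via_slice)
print(via_slice)
```

Either resulting string can be wrapped in io.StringIO and passed to the existing pd.read_json call unchanged, since it is now a bare JSON array like the root-less files.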
