I would like to deserialize JSON for which I have predefined the schema. Here is a typical JSON file I deal with:
{"op": "mcm",
 "id": 1,
 "clk": "AKjT4QEAl5q/AQCW7rIB",
 "pt": 1563999965598,
 "mc": [{"id": "1.160679253",
         "rc": [{"atl": [[1.18, 88.5],
                         [1.17, 152.86],
                         [1.16, 175.96],
                         [1.14, 93.3],
                         [1.08, 28.08],
                         [1.07, 8.84],
                         [1.02, 129.74]],
                 "id": 1}]}]}
for which I would like a schema like this:
{"op": String,
 "id": Integer,
 "clk": String,
 "pt": Integer,
 "mc": [{"id": String,
         "rc": [{"atl": Array(Decimal),
                 "id": Integer}]}]}
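In Python terms, the mapping I have in mind would look roughly like this (hypothetical notation on my side: each leaf is the Python type the corresponding JSON value should end up as):

```python
from decimal import Decimal

# Hypothetical schema: leaves are target Python types, and lists mirror the
# nesting of the data ("atl" is a list of [price, size] pairs).
schema = {
    "op": str,
    "id": int,
    "clk": str,
    "pt": int,
    "mc": [{
        "id": str,              # market id should stay a string
        "rc": [{
            "atl": [[Decimal]],
            "id": int,
        }],
    }],
}
```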
I know it is possible to do that with PySpark, but I am looking for a lighter solution (something on top of the `json` package, for example).
Here is what I have tried so far:
- Deserializing the JSON file normally and then using a custom function to cast each element of the resulting dictionary to its target type: I am afraid that by converting the string to a float and then the float to a `Decimal` I would get rounding errors (see the sketch after this list).
- Using a custom [`JSONDecoder`](https://docs.python.org/3/library/json.html#json.JSONDecoder) with custom `parse_float`, `parse_int`, and `parse_constant` functions: those functions only take the string to be parsed as an argument, so I would have to treat `'1.160679253'` (just after `pt`) and `'1.18'` (just after `atl`) the same way, whereas I want `'1.160679253'` to remain a string and `'1.18'` to be cast as a `Decimal`.
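To make both concerns concrete, here is a minimal sketch against a trimmed, hand-written excerpt of the record above (not meant as a working solution):

```python
import json
from decimal import Decimal

raw = '{"pt": 1563999965598, "mc": [{"id": "1.160679253", "rc": [{"atl": [[1.18, 88.5]], "id": 1}]}]}'

# First attempt: deserialize normally and cast afterwards. By the time the
# cast happens, 1.18 is already a binary float, and Decimal() preserves its
# rounding error instead of removing it:
plain = json.loads(raw)
print(Decimal(plain["mc"][0]["rc"][0]["atl"][0][0]))
# -> the long expansion of the nearest binary float, not Decimal('1.18')

# Second attempt: parse_float=Decimal keeps the value exact, but the hook is
# given only the numeral text, never the surrounding key, so there is no way
# to decide per field which numbers should be converted and which should not:
typed = json.loads(raw, parse_float=Decimal)
print(typed["mc"][0]["rc"][0]["atl"][0][0])  # Decimal('1.18'), exact
```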
Thanks in advance for your help.