I would like to deserialize JSON for which I have predefined the schema. Here is a typical JSON file I deal with:
{"op": "mcm",
 "id": 1,
 "clk": "AKjT4QEAl5q/AQCW7rIB",
 "pt": 1563999965598,
 "mc": [{"id": "1.160679253",
         "rc": [{"atl": [[1.18, 88.5],
                         [1.17, 152.86],
                         [1.16, 175.96],
                         [1.14, 93.3],
                         [1.08, 28.08],
                         [1.07, 8.84],
                         [1.02, 129.74]],
                 "id": 1}]}]}
for which I would like a schema like this:
{"op": String,
 "id": Integer,
 "clk": String,
 "pt": Integer,
 "mc": [{"id": String,
         "rc": [{"atl": Array(Decimal),
                 "id": Integer}]}]}
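In Python terms, the mapping I have in mind would look roughly like this (hypothetical notation on my side: each leaf is the Python type the corresponding JSON value should end up as):

```python
from decimal import Decimal

# Hypothetical schema: leaves are target Python types, and lists mirror the
# nesting of the data ("atl" is a list of [price, size] pairs).
schema = {
    "op": str,
    "id": int,
    "clk": str,
    "pt": int,
    "mc": [{
        "id": str,              # market id should stay a string
        "rc": [{
            "atl": [[Decimal]],
            "id": int,
        }],
    }],
}
```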
I know it is possible to do that with PySpark, but I am looking for a lighter solution (something on top of the `json` package, for example).
Here is what I have tried so far:
- Deserializing the JSON file normally and then using a custom function to cast each element of the resulting dictionary to its target type: I am afraid that by converting the string to a float and then the float to a `Decimal` I would get rounding errors (see the sketch after this list).
- Using a custom [`JSONDecoder`](https://docs.python.org/3/library/json.html#json.JSONDecoder) with custom `parse_float`, `parse_int`, and `parse_constant` functions: those functions only take the string to be parsed as an argument, so I would have to treat `'1.160679253'` (just after `pt`) and `'1.18'` (just after `atl`) the same way, whereas I want `'1.160679253'` to remain a string and `'1.18'` to be cast as a `Decimal`.
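To make both concerns concrete, here is a minimal sketch against a trimmed, hand-written excerpt of the record above (not meant as a working solution):

```python
import json
from decimal import Decimal

raw = '{"pt": 1563999965598, "mc": [{"id": "1.160679253", "rc": [{"atl": [[1.18, 88.5]], "id": 1}]}]}'

# First attempt: deserialize normally and cast afterwards. By the time the
# cast happens, 1.18 is already a binary float, and Decimal() preserves its
# rounding error instead of removing it:
plain = json.loads(raw)
print(Decimal(plain["mc"][0]["rc"][0]["atl"][0][0]))
# -> the long expansion of the nearest binary float, not Decimal('1.18')

# Second attempt: parse_float=Decimal keeps the value exact, but the hook is
# given only the numeral text, never the surrounding key, so there is no way
# to decide per field which numbers should be converted and which should not:
typed = json.loads(raw, parse_float=Decimal)
print(typed["mc"][0]["rc"][0]["atl"][0][0])  # Decimal('1.18'), exact
```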
Thanks in advance for your help.